Study 010

Do AI Crawlers Prefer Markdown?

Every few weeks someone posts with total confidence that AI crawlers love Markdown, or that they ignore it entirely. Nobody seems to run the test. So we did: same URL, two formats, four crawlers. They don't agree with each other.

9,033
Pages randomized
9K
Experiment events
26K
Live retrieval fetches
71.9%
Smaller payload
Last updated · May 5, 2026

Half the GEO industry will tell you AI crawlers love Markdown. The other half says it makes no difference. Almost nobody has the data to back either claim. We built the test that people keep talking about but nobody seems to actually run: every public page on trakkr.ai gets randomly served as either Markdown or HTML, that assignment stays locked to the URL, and we track what four different AI crawlers do with each version.

Short version: three of the four haven't picked a side. The one that has (GPTBot) leans hard toward HTML, but GPTBot is OpenAI's training scraper, not the one that decides what gets cited when you're talking to ChatGPT. Worth knowing, but I wouldn't act on it yet.

What I find more interesting sits underneath the headline results. Once you strip the nav, scripts, and tracking pixels out of a typical HTML page, what's left is just the answer. The Markdown version is roughly a quarter of the size. Cheaper to fetch, faster to parse. If retrieval cost ever becomes a routing signal inside the AI labs (and I'd bet it already is in at least one of them), that's the thing to watch. Numbers below refresh every Monday.

[01]

The experiment

Design

Same page. Two formats. Locked to the URL.

Every public page on trakkr.ai gets one of two surfaces (Markdown or HTML), based on a hash of its URL. Same content, same canonical, different format. The crawlers don't know they're in a test; they just see a page. And because the assignment is locked to the URL, the same page always serves the same surface. When a crawler comes back, we know which arm it's on. That's what makes the comparison clean.

01The pages
9,033
pages in the test

Every public trakkr.ai URL enters once, then stays put for the rest of the experiment.

02The split
50 / 50
random but stable
4,516 Markdown · 4,517 HTML
03The data
0K
crawler fetches measured

Concentrated across 0 recommendation pages, our most heavily crawled URLs.

Which pages count

Every public, indexable trakkr.ai page with a stable URL. Login-walled routes, redirects, and preview pages are left out.

How pages get assigned

Each URL is hashed with the experiment ID, and the hash decides Markdown or HTML. Same URL, same surface, every fetch; a crawler can't end up seeing both versions. Observed split: 50.0% Markdown, the rest HTML.
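The assignment scheme described above can be sketched in a few lines. This is an illustration, not trakkr's actual code: the experiment-ID string and function name are invented, and the real implementation may hash differently. But the property that matters (same URL, same surface, every fetch) holds for any deterministic hash.

```python
import hashlib

EXPERIMENT_ID = "markdown-vs-html-001"  # hypothetical ID; the real one isn't published


def assign_surface(url: str, experiment_id: str = EXPERIMENT_ID) -> str:
    """Deterministically bucket a URL into 'markdown' or 'html'.

    Hashing url + experiment ID means the same URL always lands in the
    same arm, so a crawler can never see both versions of one page.
    """
    digest = hashlib.sha256(f"{experiment_id}:{url}".encode()).hexdigest()
    return "markdown" if int(digest, 16) % 2 == 0 else "html"


# Same URL, same surface, every call:
assert assign_surface("https://trakkr.ai/some-page") == assign_surface("https://trakkr.ai/some-page")
```

Because SHA-256 output is effectively uniform, the split over many URLs lands near 50/50 without any coordination or stored state, which is why the observed split above sits at 4,516 vs 4,517.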

What 'recommendation pages' means

Our pages at /ai-recommends/<product>/<audience> (think “best AI transcription for nonprofits”). There are thousands of them, which is why this slice has enough volume to detect small effects. The results scoreboard runs against this slice.

[02]

Three OpenAI bots, three different jobs


OpenAI runs three crawlers. Most write-ups blur them into one.

They shouldn't. Each one does a different job, runs on a different schedule, and responds to format differently. If you count GPTBot fetches as evidence of live citations, or read ChatGPT-User numbers as proof of search indexing, you're measuring the wrong thing. We keep all three separate from here.

01Search

OAI-SearchBot

Search index crawler

Crawls steadily, like a search engine

Pulls pages into OpenAI's search index, the system that decides what surfaces inside ChatGPT Search. If you want to show up when ChatGPT searches the web, this is the crawler whose preferences matter most.

02Interaction

ChatGPT-User

Live retrieval fetcher

Spikes when users ask questions

Opens a page in real time when someone in ChatGPT asks a question and the model decides it needs more context to answer. Pure conversation-time demand. Whatever a user asks, this is the bot that goes and fetches.

03Training

GPTBot

Training data scraper

Comes in heavy bursts on a schedule

Pulls pages into the corpus used to train future versions of GPT. Tells you about training pipeline preferences, not whether your page gets cited when a real user is talking to ChatGPT today.

[03]

The results so far

Baseline results · recommendation pages

One settled signal, three still moving.

Across our recommendation pages plus ChatGPT-User live retrieval, only 1 of 5 crawlers shows a clear Markdown-vs-HTML preference. One leans in a direction but doesn't have enough data yet to call. Three sit flat, reaching both formats at the same rate. Click any row to see the numbers behind it.

OAI-SearchBot reached Markdown pages slightly more often than HTML, but the gap is still inside the noise band. Worth watching as more data lands.

Reached Markdown pages: 47.2% (528 of 1,119) · Reached HTML pages: 44.4% (502 of 1,130) · Significance: p=0.189

ChatGPT-User isn't a background crawler. It's the fetch ChatGPT makes in real time when someone asks a question and the model needs to read a page to answer. It reached Markdown and HTML pages at almost identical rates. So at conversation time, format isn't deciding which page gets opened; the user's question is. We break that down in the next section.

Reached Markdown: 76.7% · Reached HTML: 77.0% · Live fetches: 27,810 · Unique pages: 1,728 · Significance: p=0.859 (flat)

GPTBot is dramatically skipping Markdown pages. Remember, though: GPTBot is OpenAI's training scraper, not the bot that decides which page gets cited when someone is actually talking to ChatGPT. We treat this as a signal about how training data gets selected, not about live answers.

Reached Markdown pages: 2.5% (28 of 1,119) · Reached HTML pages: 31.9% (361 of 1,130) · Significance: p<0.001

Perplexity's crawler is essentially neutral. There's a tiny HTML-side edge, but nothing that clears the bar for statistical significance.

Reached Markdown pages: 8.4% (94 of 1,119) · Reached HTML pages: 9.7% (110 of 1,130) · Significance: p=0.271

Claude's crawler leans HTML by a couple of points. Directionally interesting, but not yet statistically settled; file under 'watch this space.'

Reached Markdown pages: 8.9% (100 of 1,119) · Reached HTML pages: 11.0% (124 of 1,130) · Significance: p=0.107
← Prefers HTML · Prefers Markdown →

How to read this

GPTBot's big HTML lean is the only statistically settled result on the board, and we're careful with it. GPTBot is OpenAI's training scraper; it tells you something about how future model versions get fed, not whether your page gets cited when someone talks to ChatGPT today. Interesting, but not something I'd change my site over.

The result I'd actually act on is OAI-SearchBot's. That's the crawler behind ChatGPT Search; when ChatGPT goes looking for fresh information on the open web, this is what it sends. It leans Markdown by a few percentage points right now, but not by enough to be statistically confident yet. The weekly tracker further down is where we'll see if that changes.

Everyone else (ChatGPT-User, Perplexity, Claude) sits roughly flat. Markdown and HTML get reached at about the same rate. Which makes sense if you think about it: these systems are chasing the user's question, not the page's format. The flat line is actually the interesting result here. At conversation time, what your page is about matters more than how you serve it.

[04]

What people actually ask about

ChatGPT-User · last 7 days

Live retrieval follows the question, not the format.

ChatGPT-User isn't a crawler on a schedule. It opens a page mid-conversation because someone asked ChatGPT something and the model needed a real page to answer. Across 28K of these live fetches on our site, demand tracks the topics people are actually asking about, and it spreads roughly evenly across both arms of the experiment. Format doesn't seem to matter here. The question does.

Live fetches
28K
Unique pages
1,728
What ChatGPT pulled the most
top 5 categories · last 7 days

Sanity check: ChatGPT-User reached 76.7% of Markdown-assigned pages and 77.0% of HTML-assigned pages, within 0.3 percentage points of each other (p=0.859). So the category gaps above are about what people asked, not which arm of the experiment got more traffic.

[05]

Why Markdown might pull ahead anyway

Markdown is much smaller
75.9%

smaller than the same page in HTML

OAI-SearchBot's Markdown lean is small for now. But here's why I think that line could keep growing: strip the nav, scripts, tracking pixels, and CSS chrome from a typical HTML page and what's left is just the answer. The Markdown version of that same page is roughly a quarter of the size. Cheaper to fetch, faster to parse. If retrieval cost ever becomes a routing signal inside the labs (and I'd bet it already is in at least one), that gap starts to matter.

Markdown payload · 24% of the HTML payload (−75.9%)
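To make the "strip the chrome" intuition concrete, here is a minimal, stdlib-only sketch that drops script and style content, keeps visible text, and compares payload sizes. The sample page and the extractor are invented for illustration; real pipelines, and real savings, depend on the page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


# A toy page: a little answer wrapped in a lot of chrome.
html_page = """<html><head><style>nav{color:red}</style>
<script>trackPageview()</script></head>
<body><nav><a href="/">Home</a> <a href="/pricing">Pricing</a></nav>
<main><h1>Best AI transcription for nonprofits</h1>
<p>Our pick handles nonprofit budgets well and keeps transcripts accurate.</p>
</main></body></html>"""

extractor = TextExtractor()
extractor.feed(html_page)
markdown_ish = "\n".join(extractor.parts)

saving = 1 - len(markdown_ish.encode()) / len(html_page.encode())
print(f"{saving:.0%} smaller")  # most of the payload was chrome, not answer
```

Even on this tiny example the tracking script and styling vanish while the heading and answer survive; on a production page carrying full nav, CSS, and analytics, the gap widens toward the ~76% measured above.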
Fetches we served as Markdown
0,745
across the randomized URLs
Fetches we served as HTML
0,322
across the randomized URLs
Total bytes saved
71.9%
vs serving everything as HTML
[06]

What to watch each week

How this updates

The whole page rebuilds every Monday.

Every Monday at 09:00 UTC, the previous week's data lands and this whole page refreshes. Numbers, prose, and analysis are all pinned to the same snapshot. If you cite something here, it's tied to that week's run. The line I'm watching closest is OAI-SearchBot's Markdown lean. If it crosses into statistically significant territory, the rail below is where you'll see it happen.

Latest run
May 4, 2026
09:00 UTC
Next run
May 11, 2026
09:00 UTC
OAI-SearchBot Markdown preference · week by week
1 run so far · 11 to come
Apr 27 – May 4: +2.8% · remaining weekly points fill in through Jul 13

Baseline snapshot. Public weekly rebuilds start the Monday after this.

[07]

How this study works

Methodology

How this is built, and what it can’t yet say.

Every Monday's run produces one snapshot that the whole page is built from. If the prose ever says something the data doesn't back, the snapshot is what's true. Below is the short version of how it works, what it can't tell you yet, and any caveats from this week.

Cadence
Mondays at 09:00 UTC
Pages
9,033
Split
50% Markdown

How it’s built

1. Each eligible page is locked to either Markdown or HTML using a hash of its URL; same URL, same surface, every time.

2. The headline metric is page-level coverage: of all the pages assigned to a variant, what share did each crawler actually reach?

3. We track request counts and bytes transferred too, but treat them as secondary signals because a few popular URLs can dominate raw volume; differences between Markdown and HTML are tested with two-proportion z-tests and reported as percentage-point gaps with p-values.

4. ChatGPT-User is reported separately from OAI-SearchBot because it's user-triggered live retrieval, not background indexing; different signal entirely.
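The two-proportion z-test named in step 3 can be reproduced directly from the scoreboard counts. A minimal sketch (not the study's actual code) that recovers the OAI-SearchBot p-value from its published numbers:

```python
from math import erfc, sqrt


def two_proportion_z(hit1: int, n1: int, hit2: int, n2: int):
    """Two-sided two-proportion z-test on page-level coverage.

    hit = pages the crawler reached, n = pages assigned to that arm.
    """
    p1, p2 = hit1 / n1, hit2 / n2
    pooled = (hit1 + hit2) / (n1 + n2)  # coverage under the null: no format effect
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value


# OAI-SearchBot row from the scoreboard: 47.2% Markdown vs 44.4% HTML
z, p = two_proportion_z(528, 1119, 502, 1130)
print(round(p, 3))  # ≈ 0.19: the Markdown lean is not yet significant
```

Running the same function on the GPTBot row (28 of 1,119 vs 361 of 1,130) drives the p-value far below 0.001, which is why that is the one result the scoreboard treats as settled.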

What it can’t say

1. We identify bots by their user-agent string in this baseline. Cloudflare's verified-bot signal isn't in the dataset yet, so a determined spoofer could be miscounted.

2. ChatGPT-User wasn't part of the experiment cohort during the first window; it shows up here as historical retrieval demand. It joins the main scoreboard from 2026-05-06 onward.

3. We're measuring whether crawlers fetch a page, not whether the AI ended up citing it in an answer. Those are linked questions, but they're not the same.

4. Some live retrieval still hits old recommendation URLs we retired months ago. Those aren't part of the active test pool; they're a side artifact of how AI systems hold onto URLs they've seen before.
