Install crawler tracking
Pick your platform and walk through the full crawler-tracking setup — one picker, eighteen methods (Cloudflare, Vercel, WordPress, and more), every step from connection to verification.
Pick your platform. Each path is a step-by-step setup with prereqs, exact click paths, and troubleshooting — the same shape as the in-product stack picker on the Crawler page.
Hosting platforms
Read from the CDN or platform serving your site.
CMS & site builders
Drop-in plugin for WordPress, Cloudflare proxy for everything else.
Connect the site, install the Trakkr plugin, done.
Server-side · ~7 min
Proxy your Webflow site through Cloudflare.
Server-side · ~15 min
Proxy your storefront through Cloudflare.
Server-side · ~15 min
Proxy your HubSpot CMS site through Cloudflare.
Server-side · ~15 min
External nameservers + Cloudflare proxy.
Server-side · ~20 min
Proxy through Cloudflare so crawlers are visible.
Server-side · ~15 min
Proxy your Framer site through Cloudflare.
Server-side · ~15 min
Proxy Ghost(Pro) through Cloudflare.
Server-side · ~15 min
Self-hosted runtime
POST from your own server, edge function, or log forwarder.
Drop a proxy / middleware file into your app.
Real-time · ~5 min
~25 lines of middleware near the top of the app.
Real-time · ~5 min
Add a log_by_lua_block to your server config.
Real-time · ~10 min
Lambda@Edge on the Viewer Request hook.
Real-time · ~10 min
Forward DataStream 2 via HTTPS endpoint.
Real-time · ~8 min
Real-Time Log Streaming over HTTPS.
Real-time · ~5 min
POST batches to a unique webhook URL from anything.
Real-time · ~5 min
Cloudflare - install guide
Any site already proxied through Cloudflare (orange cloud).
If your site sits behind Cloudflare, this is the easiest setup. Cloudflare already sees every request that hits your site, including the AI bots blocked at the WAF before they reach your origin. Trakkr reads that data through a scoped, read-only API token, so you don't have to change DNS, modify your firewall, or deploy any code.
Before you start
- A Cloudflare account with at least one zone (domain).
- The zone must be proxied through Cloudflare — the orange-cloud icon in DNS settings. DNS-only zones don't generate the analytics Trakkr needs.
- Permission to create API tokens in your Cloudflare account.
Step 1 — Create a scoped API token
The fastest path is the pre-filled token template — Cloudflare opens the create-token screen with the right permissions, name, and resource scope already selected. The same link is exposed inside the Trakkr setup modal.
**Open the pre-filled Cloudflare token template →**
Review the pre-filled values, then click Continue to summary → Create Token → Copy the token now (Cloudflare only shows it once; refresh and it's gone).
Or pick the permissions manually
If the template doesn't open or you'd rather walk through it yourself:
1. Open dash.cloudflare.com/profile/api-tokens and click Create Token.
2. Pick Create Custom Token (not one of the listed templates).
3. Add three permissions using Cloudflare's three-level selector (scope group → category → access level):
| Permission row | Scope group | Category | Access level |
|---|---|---|---|
| 1 | Zone | Analytics | Read |
| 2 | Account | Account Analytics | Read |
| 3 | Zone | Zone | Read |
The Zone > Analytics > Read row is the one most setups miss. Without it the token verifies and lists your zones, but Cloudflare denies the analytics query and no crawler data comes through.
4. Under Account Resources, choose your account (or "All accounts").
5. Under Zone Resources, either include All zones from an account or pick the specific zone(s) you want to track.
6. Click Continue to summary → Create Token → copy the token immediately.
Step 2 — Connect in Trakkr
1. In Trakkr, open the Crawler page (click Add source in the header if you already have other sources). Under Hosting platform, pick Cloudflare.
2. Paste your token. The modal shows a one-click "Open Cloudflare token template" shortcut if you skipped Step 1.
3. Click Verify token.
4. Choose the zone you want to track and click Connect zone.
That's the whole flow. Trakkr starts pulling crawler data within a few minutes. The dashboard fills in automatically as new bot visits arrive.
What Trakkr can do with the token
The token's read scopes let Trakkr:
- List your zones so you can pick which domain to track.
- Read Cloudflare's GraphQL analytics to find AI crawler hits, broken out by bot, status code, country, and time.
It cannot modify DNS, firewall rules, caching, Workers, or anything else. Read-only, by design.
Adaptive analytics — what's the tradeoff?
Cloudflare's analytics layer is sampled on very high-traffic zones. On a busy site, low-volume crawler traffic can be slightly underreported (the dashboard will still surface every bot, but exact daily counts on the long tail can be approximate).
If you need per-request capture — for example, a publisher tracking every PerplexityBot hit individually — pair the Cloudflare integration with a Cloudflare Worker. The Worker captures every request and sends it to Trakkr directly. The Workers path is offered as an optional step in the in-product setup after the API token connection succeeds.
Troubleshooting
"Failed to verify token"
- Double-check the token has all three reads: Zone Analytics: Read, Account Analytics: Read, and Zone: Read.
- Make sure the token isn't restricted to zones you no longer have active.
- Tokens are case-sensitive and long — make sure there's no leading/trailing whitespace.
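To rule out the token before going back to Trakkr, you can check it yourself against Cloudflare's API. The sketch below calls Cloudflare's `GET /user/tokens/verify` and `GET /zones` endpoints; the function name and return shape are illustrative, not part of Trakkr. A passing result confirms the token is active and shows which zones it can list, but it does not prove the Zone > Analytics > Read permission is present.

```typescript
// Sketch only: checks a Cloudflare API token with Cloudflare's public v4 API.
// It does NOT verify the Zone > Analytics > Read permission.
const CF_API = "https://api.cloudflare.com/client/v4";

// Injectable so the check can run without network access.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface TokenCheck {
  active: boolean;
  zones: string[];
}

async function checkCloudflareToken(
  rawToken: string,
  fetchImpl: FetchLike = fetch,
): Promise<TokenCheck> {
  // Pasted tokens often carry stray spaces or a trailing newline.
  const token = rawToken.trim();
  const headers = { Authorization: `Bearer ${token}` };

  // 1. Is the token valid and active?
  const verifyRes = await fetchImpl(`${CF_API}/user/tokens/verify`, { headers });
  const verify = (await verifyRes.json()) as {
    success: boolean;
    result?: { status?: string };
  };
  if (!verify.success || verify.result?.status !== "active") {
    return { active: false, zones: [] };
  }

  // 2. Which zones can it list? An empty array usually means the token's
  //    Zone Resources point at zones outside this account.
  const zonesRes = await fetchImpl(`${CF_API}/zones?per_page=50`, { headers });
  const zonesBody = (await zonesRes.json()) as {
    success: boolean;
    result?: { name: string }[];
  };
  const zones = zonesBody.success ? (zonesBody.result ?? []).map((z) => z.name) : [];
  return { active: true, zones };
}

// Usage (needs a real token):
// checkCloudflareToken(process.env.CF_TOKEN ?? "").then(console.log);
```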
"Cloudflare denied analytics access for this zone"
The token is valid and your zones load, but the analytics query is rejected. This almost always means the Zone > Analytics > Read permission is missing (the Account > Account Analytics > Read row alone isn't enough on every account). Edit the token in Cloudflare API Tokens, add a Zone > Analytics > Read row, save, and reconnect. The pre-filled template above already includes it.
"No zones appear after verification"
The token verified but Trakkr couldn't list any zones. This almost always means the token's Zone Resources are scoped to a zone that isn't in your account. Regenerate the token with All zones or the correct specific zones.
"Verification works but I see no real bot data"
Cloudflare's data lags by a few minutes for low-volume sites. If 24 hours pass and the Feed is still empty:
1. Click Send verification in the Crawler header to confirm the pipeline.
2. Open the Access tab. A `Disallow: /` in your `robots.txt` under `User-agent: GPTBot`, or a WAF rule blocking AI bots, will show up here as a finding.
3. Check that the zone's orange-cloud proxy is still on (someone may have toggled it to DNS-only).
Disconnecting
Removing the connection in Trakkr (Crawler → the connection card → Disconnect) stops the data flow immediately. You can also revoke the token from Cloudflare API Tokens. Neither action touches your site, DNS, or traffic.
Vercel - install guide
Sites deployed on Vercel Pro or Enterprise (Log Drains required).
If your site is deployed on Vercel, the install takes about a minute. You install Trakkr through Vercel's marketplace; Trakkr then creates a Log Drain on the project you select and filters the resulting log stream for AI crawler hits — every human visitor is discarded on arrival. No code changes, no environment variables, no DNS.
Before you start
- A Vercel project that's actively serving traffic.
- A Vercel Pro or Enterprise plan. Log Drains are not available on the Hobby plan.
- Permission to install integrations on the Vercel team that owns the project.
- A logged-in Vercel session in the same browser you're using for Trakkr (so the redirect doesn't ask you to sign in mid-flow).
Step 1 — Install the Trakkr integration on Vercel
1. In Trakkr, open the Crawler page (click Add source in the header if you already have other sources). Under Hosting platform, pick Vercel.
2. Click Continue to Vercel.
3. Vercel opens the marketplace install screen for the Trakkr integration. Pick the team that owns the project you want to track, review the permissions Vercel shows, and click Install.
4. Vercel handles the consent flow and sends you back to Trakkr automatically.
Step 2 — Pick a project
After installation, Trakkr lists every project you have access to. Pick the one matching the site you want to track and click Continue. Trakkr then calls Vercel's API to create a Log Drain on that project — no manual configuration in the Vercel dashboard.
The Log Drain starts streaming within a few minutes. From that point, every production request to the project flows through Trakkr's filter and AI crawler hits land in your Feed in real time.
Which deployments are tracked
Trakkr tracks production traffic on the project — your main domain and any production aliases. Preview and branch deployments aren't included by default, which is what most teams want for AI crawler tracking (bots almost always crawl the canonical production URL).
Step 3 — Verify
In Trakkr's Crawler header, click Send verification. Trakkr fetches your homepage with a GPTBot User-Agent — the synthetic ping should appear in your Feed with a Verified badge within 30 seconds, confirming the full pipeline (Vercel → Log Drain → Trakkr ingest → dashboard) is healthy.
Real AI bot visits arrive as crawlers discover (or revisit) the site. Most actively-published sites see their first real GPTBot, ClaudeBot, or PerplexityBot hit within an hour.
How the Log Drain works
For every request hitting your project, Vercel sends a log line to Trakkr's ingest endpoint. Trakkr inspects the User-Agent:
- AI crawler detected (GPTBot, ClaudeBot, PerplexityBot, ChatGPT-User, etc.) — recorded against your brand, surfaces in the dashboard.
- Anything else — discarded immediately. Human visitor data is never stored.
The pipeline runs entirely server-side, so it catches every bot — including ones that never execute JavaScript and ones blocked by Vercel's edge middleware before the page is served.
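To make the filtering step concrete, here is a minimal sketch of "keep crawlers, discard everyone else" applied to a batch of drain entries. It illustrates the idea and is not Trakkr's ingest code: the `DrainEntry` shape is an assumed, simplified subset of Vercel's Log Drain format, and the bot pattern is shortened for brevity.

```typescript
// Assumed, simplified subset of a Vercel Log Drain entry; check Vercel's
// Log Drains schema for the authoritative field list.
interface DrainEntry {
  timestamp: number; // milliseconds since epoch
  proxy?: {
    method?: string;
    host?: string;
    path?: string; // may include the query string
    userAgent?: string[];
    referer?: string;
    statusCode?: number;
    clientIp?: string;
  };
}

interface CrawlerVisit {
  timestamp: string;
  url: string;
  requestPath: string;
  method: string;
  referrer: string;
  ip: string;
  statusCode?: number;
  userAgent: string;
}

// Shortened bot list for illustration only.
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot/i;

// Keep AI crawler requests; every other entry is dropped and never stored.
function filterCrawlerHits(entries: DrainEntry[]): CrawlerVisit[] {
  const visits: CrawlerVisit[] = [];
  for (const entry of entries) {
    const p = entry.proxy;
    if (!p) continue; // build or function logs carry no request metadata
    const ua = (p.userAgent ?? []).join(" ");
    if (!AI_BOT_RE.test(ua)) continue; // human or non-AI traffic
    const path = p.path ?? "/";
    visits.push({
      timestamp: new Date(entry.timestamp).toISOString(),
      url: `https://${p.host ?? ""}${path}`,
      requestPath: path.split("?")[0],
      method: p.method ?? "GET",
      referrer: p.referer ?? "",
      ip: p.clientIp ?? "",
      statusCode: p.statusCode,
      userAgent: ua,
    });
  }
  return visits;
}
```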
Troubleshooting
"Log Drain setup failed"
The most common cause is being on a Hobby plan. Vercel rejects Log Drain creation requests from Hobby projects. Upgrade or pick a different method.
If you're on Pro/Enterprise and still see this, disconnect and reconnect — the OAuth token may have lost permissions.
"Install callback errored / popup closed early"
- Browser pop-ups blocked? Allow pop-ups for `trakkr.ai` and retry.
- Multiple Vercel accounts? Make sure you install onto the team that owns the project, not your personal team. The marketplace install screen has a team picker at the top.
"No data after connecting"
- Log Drains take 1–5 minutes to activate after creation.
- Verify the project is actually receiving traffic (open the Vercel deployment in a browser).
- Click Send verification in Trakkr to confirm the ingest pipeline is healthy.
Disconnecting
From Trakkr, open Crawler, find the Vercel connection card, and click Disconnect. Trakkr removes the Log Drain from your Vercel project in the same step. If Vercel's API hiccups during cleanup, you can also remove the drain or the Trakkr integration from your team's Integrations page in Vercel — it's safe to do either way.
Netlify - install guide
Sites hosted on Netlify (any plan).
If your site is hosted on Netlify, you OAuth once, pick a site, and then add a single Edge Function file to your site's repo. Once committed and deployed, that function inspects every request at Netlify's edge and forwards AI crawler hits to Trakkr in real time.
Before you start
- A Netlify account with a deployed site whose repo you can commit to.
- Permission to authorize third-party apps on your Netlify team.
- The site must allow Edge Functions (enabled by default on every plan; some restrictive enterprise setups disable them).
Step 1 — Connect via OAuth
1. In Trakkr, open the Crawler page (click Add source in the header if you already have other sources). Under Hosting platform, pick Netlify.
2. Click Continue to Netlify. You'll be redirected to Netlify in the same tab.
3. Approve Trakkr's access. The grant lets us list the sites on your team so you can pick the right one; we don't read site content, deploy code, or touch your DNS.
4. Netlify redirects you back to Trakkr to finish setup.
Step 2 — Pick a site
After OAuth, Trakkr lists every site you have access to. Pick the one matching the brand and click Continue.
At this point Trakkr records the site selection and shows the Pending Edge Setup card with everything you need to paste into your repo — three environment variables and a single TypeScript file.
Step 3 — Add the edge function to your repo
1. In your site's repo, create the file `netlify/edge-functions/trakkr-crawler.ts` (create the `netlify/edge-functions/` folder if it doesn't exist; Netlify picks up that path automatically, so no `netlify.toml` change is required).
2. Copy the edge template from the Trakkr setup card and paste it into that file.
3. Commit and push.
Step 4 — Set the three environment variables
In Netlify, open Site configuration → Environment variables (or use the Netlify CLI) and add:
```
TRAKKR_CONNECTION_ID=<your-connection-id>
TRAKKR_WEBHOOK_SECRET=<your-webhook-secret>
TRAKKR_INGEST_URL=https://api.trakkr.ai/crawler-connect/ingest/netlify
```

All three values are visible in the Trakkr setup card. The function uses them to authenticate every POST it sends; without them, the edge function still runs but Trakkr rejects the payloads as unauthorised.
Step 5 — Trigger a deploy and verify
On the next push, Netlify builds and activates the edge function automatically. If you want it live immediately without a code change, trigger a manual deploy from the Netlify dashboard:
Deploys → Trigger deploy → Clear cache and deploy site
Then in Trakkr click Send verification in the Crawler header. The synthetic GPTBot ping should appear in the Feed within 30 seconds.
How the Edge Function works
The function runs at Netlify's edge, in front of your origin, on every request to your site. For each request it lets the response through first, then — only if the request came from a known AI crawler — fires a small POST to Trakkr in the background.
- AI crawler detected (GPTBot, ClaudeBot, PerplexityBot, ChatGPT-User, Claude-User, Perplexity-User, OAI-SearchBot, Applebot, Bytespider, CCBot, and more) — the visit is sent to Trakkr without blocking the response.
- Human visitor — passed through with zero side effects.
The outbound POST is fire-and-forget (via Netlify's waitUntil), so it doesn't add any perceptible latency for your visitors. The function never modifies your HTML, headers, or behavior.
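The template in the Trakkr setup card is the one to deploy. As a rough sketch of the same idea, assuming Netlify's `context.next()` and `context.waitUntil()` APIs and the `Netlify.env.get()` global, the handler below serves the page first, queues the POST in the background only for crawler User-Agents, and skips sending when any of the three environment variables is missing. The `x-trakkr-*` header names are hypothetical: how the real template transmits the connection ID and secret isn't documented here. The bot list is also shortened.

```typescript
// Local stand-ins for Netlify's types so the sketch is self-contained; in a real
// project, `import type { Context } from "@netlify/edge-functions"` instead.
interface EdgeContext {
  next(): Promise<Response>;
  waitUntil(promise: Promise<unknown>): void;
}
declare const Netlify: { env: { get(name: string): string | undefined } };

type FetchLike = (url: string, init?: RequestInit) => Promise<unknown>;

interface TrakkrEnv {
  connectionId: string;
  secret: string;
  ingestUrl: string;
}

// Shortened bot list for illustration only.
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot/i;

// Returns null and logs a warning if any of the three variables is missing.
function readTrakkrEnv(get: (name: string) => string | undefined): TrakkrEnv | null {
  const connectionId = get("TRAKKR_CONNECTION_ID");
  const secret = get("TRAKKR_WEBHOOK_SECRET");
  const ingestUrl = get("TRAKKR_INGEST_URL");
  if (!connectionId || !secret || !ingestUrl) {
    console.warn("trakkr: TRAKKR_CONNECTION_ID, TRAKKR_WEBHOOK_SECRET and TRAKKR_INGEST_URL must all be set");
    return null;
  }
  return { connectionId, secret, ingestUrl };
}

function createHandler(env: TrakkrEnv | null, send: FetchLike) {
  return async (request: Request, context: EdgeContext): Promise<Response> => {
    const response = await context.next(); // serve the page first
    const ua = request.headers.get("user-agent") ?? "";
    if (env && AI_BOT_RE.test(ua)) {
      const url = new URL(request.url);
      const body = JSON.stringify([{
        timestamp: new Date().toISOString(),
        url: url.href,
        requestPath: url.pathname,
        method: request.method,
        referrer: request.headers.get("referer") ?? "",
        statusCode: response.status,
        userAgent: ua,
      }]);
      // Hypothetical header names; the real template decides how credentials are sent.
      const headers = {
        "content-type": "application/json",
        "x-trakkr-connection-id": env.connectionId,
        "x-trakkr-secret": env.secret,
      };
      // Background POST: the visitor's response is returned without waiting on it.
      context.waitUntil(send(env.ingestUrl, { method: "POST", headers, body }).catch(() => undefined));
    }
    return response;
  };
}

export default (request: Request, context: EdgeContext) =>
  createHandler(readTrakkrEnv((name) => Netlify.env.get(name)), fetch)(request, context);

export const config = { path: "/*" };
```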
Troubleshooting
"Edge Function deploy didn't activate"
Netlify only activates edge changes on a new deploy. Trigger one manually from the dashboard.
"OAuth didn't complete"
- Did the redirect get blocked? Allow `trakkr.ai` and `app.netlify.com` and retry.
- Multiple Netlify teams? Make sure you authorize from the correct team. Trakkr's connection is scoped to whichever team you choose.
"Edge function runs but Trakkr Feed is empty"
Almost always missing or wrong env vars. Open the Netlify deploy logs and search for TRAKKR_ — if you see warnings about undefined env vars, fix them in Site configuration and trigger another deploy.
"No data after deploy"
- The function needs a fresh deploy to go live.
- Confirm the site is receiving real traffic (visit it in a browser yourself).
- Click Send verification in Trakkr to confirm the ingest pipeline.
- Open the Access tab in Trakkr. A `robots.txt` block or an Edge rule denying bots will show up as a finding.
"I see verification visits but no real crawls"
Real AI crawler traffic comes in waves, not on a fixed schedule. Most sites see their first non-verification GPTBot or ClaudeBot hit within 24 hours. If your site is newer or low-traffic, give it a few days.
Disconnecting
From Trakkr (Crawler → the connection card → Disconnect): the Edge Function will stop forwarding successfully because the connection no longer recognises the env-var pair. To remove the function entirely, delete netlify/edge-functions/trakkr-crawler.ts from your repo and deploy. You can also revoke Trakkr's OAuth grant in Netlify under User settings → Applications → Authorized apps → Trakkr.
Next.js (self-hosted) - install guide
Self-hosted Next.js apps not deployed on Vercel.
For self-hosted Next.js apps (anything not running on Vercel), Trakkr ships a drop-in proxy/middleware file that detects AI crawler User-Agents at the edge and POSTs them to a shared ingest URL using a per-connection bearer token. Response latency stays unchanged — the forwarding happens in the background via waitUntil.
Before you start
- A Next.js 13+ app you can deploy a code change to.
- Your bearer token from Trakkr (see Step 1). The ingest URL is the same for every connection (`https://api.trakkr.ai/crawler-connect/ingest/manual`); the bearer token is unique to your connection and identifies it server-side.
Step 1 — Create the connection in Trakkr
1. Open the Crawler page in Trakkr. If you haven't connected any sources yet you'll see the stack picker straight away; otherwise click Add source in the header.
2. Under Self-hosted runtime, pick Next.js.
3. Give the source a display name (e.g. `Production Next.js app`).
4. Trakkr creates a pending connection and shows you the snippet with the ingest URL and a unique bearer token (`trk_…`) baked in. Keep that tab open; you'll paste from it in Step 2.
Step 2 — Drop in the proxy / middleware file
Save the snippet from the in-app setup as proxy.ts (Next.js 16+) or middleware.ts (Next.js 15 and earlier) at the root of your app or src/ directory. The exported function name must match the file: proxy for proxy.ts, middleware for middleware.ts.
The full template is in the in-app setup so it can include your real ingest URL and bearer token. Here's the shape:
```ts
import { NextResponse, type NextFetchEvent, type NextRequest } from 'next/server'

const TRAKKR_URL = process.env.TRAKKR_INGEST_URL!
const TRAKKR_TOKEN = process.env.TRAKKR_BEARER_TOKEN!
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i

// The export name must match the file: `proxy` in proxy.ts, `middleware` in middleware.ts.
export function proxy(request: NextRequest, event: NextFetchEvent) {
  const response = NextResponse.next()
  const ua = request.headers.get('user-agent') || ''
  // Human traffic passes straight through with no side effects.
  if (!AI_BOT_RE.test(ua)) return response

  const payload = [{
    timestamp: new Date().toISOString(),
    url: request.nextUrl.href,
    requestPath: request.nextUrl.pathname,
    method: request.method,
    referrer: request.headers.get('referer') || '',
    ip: request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() || '',
    country: request.headers.get('x-vercel-ip-country') || request.headers.get('cf-ipcountry') || '',
    userAgent: ua,
  }]

  // Send in the background so the response is not held up; errors are swallowed.
  event.waitUntil(
    fetch(TRAKKR_URL, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${TRAKKR_TOKEN}`,
      },
      body: JSON.stringify(payload),
      cache: 'no-store',
    }).catch(() => undefined),
  )
  return response
}

export const config = {
  matcher: ['/((?!api|_next/static|_next/image|favicon.ico|robots.txt|sitemap.xml).*)'],
}
```

Step 3 — Set environment variables
Store the ingest URL and bearer token as environment variables in your deployment platform (Docker, Kubernetes, Render, Railway, Fly, your own bare-metal — wherever Next.js runs):
```
TRAKKR_INGEST_URL=https://api.trakkr.ai/crawler-connect/ingest/manual
TRAKKR_BEARER_TOKEN=trk_<the rest of your token>
```

The ingest URL is shared across every Next.js / Node / Nginx / custom connection; your connection identity is carried inside the bearer token. Copy the exact `trk_…` string from the in-app setup so you don't lose any characters.
You can also paste both values directly into the snippet (as fallbacks), but env vars are the right pattern for rotating credentials.
Step 4 — Deploy and verify
1. Deploy the updated app.
2. In Trakkr, click Send verification in the Crawler header. The synthetic GPTBot ping should show up in the Feed within 30 seconds.
Real bot traffic begins flowing as soon as crawlers visit the deployed site. Most actively-crawled sites see the first real hit within hours.
Troubleshooting
"Verification arrives but no real events"
- The snippet's matcher skips `/api`, `/_next/static`, `/_next/image`, `favicon.ico`, `robots.txt`, and `sitemap.xml`. Adjust it if you want to capture bots requesting those paths.
- Some bots only hit specific paths. Open the Access tab to make sure none of your routes are returning 404 to GPTBot.
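If real crawls still don't show up, check whether the User-Agent strings in your own access logs actually match the snippet's pattern. This small sketch reuses the snippet's `AI_BOT_RE` verbatim; `isTrackedCrawler` is a hypothetical helper and the sample strings are illustrative, so paste your own. Note that Googlebot is not matched by this pattern.

```typescript
// Same pattern as AI_BOT_RE in the snippet, copied verbatim.
const AI_BOT_RE =
  /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i;

function isTrackedCrawler(userAgent: string): boolean {
  return AI_BOT_RE.test(userAgent);
}

// Replace these examples with User-Agent strings copied from your access logs.
const samples = [
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)",
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
];

for (const ua of samples) {
  console.log(isTrackedCrawler(ua) ? "tracked" : "ignored", ua);
}
```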
"Build fails with 'process is not defined'"
Edge middleware in Next.js doesn't have full Node process access. The snippet uses process.env, which is supported in Edge — but only if the env vars are set at build time on platforms like Vercel/Cloudflare. On Node runtimes you have full access. If you're targeting Edge runtime, set env vars during the deployment step.
"Webhook returns 401"
Bearer token mismatch. Re-copy the full trk_… string from the connection card in Trakkr (Crawler → the connection card → Reveal the token).
"Real client IP shows as the load balancer / Vercel edge / Cloudflare IP"
The snippet reads x-forwarded-for only. If you sit behind Cloudflare or Sucuri, the real client IP is also exposed as cf-connecting-ip or x-sucuri-clientip — prefer those in front of x-forwarded-for if you need accurate per-country attribution. The Trakkr WordPress plugin shows the full priority order (WordPress install); the Next.js template intentionally keeps it short.
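A minimal sketch of that priority order, using only the headers named above (`clientIp` is a hypothetical helper, not part of the Trakkr template). Only honour these headers when your traffic really passes through that proxy; otherwise a client can set them itself.

```typescript
// Client-IP headers, highest priority first (only the ones named above).
const CLIENT_IP_HEADERS = ["cf-connecting-ip", "x-sucuri-clientip"];

function clientIp(headers: Headers): string {
  for (const name of CLIENT_IP_HEADERS) {
    const value = headers.get(name)?.trim();
    if (value) return value;
  }
  // Fall back to the left-most X-Forwarded-For entry, as the snippet does today.
  return headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "";
}

// In the Next.js snippet, the payload line would become:
//   ip: clientIp(request.headers),
```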
Node / Express - install guide
Express, Fastify, Koa, NestJS, or any Node HTTP server.
If your app runs on Node — Express, Fastify, Koa, NestJS, raw http — Trakkr ships a tiny middleware (~25 lines) that hooks the response.finish event and forwards AI crawler hits in the background.
Before you start
- A Node app you can add middleware to.
- Your bearer token from Trakkr (Step 1). The ingest URL is the same for every webhook connection (`https://api.trakkr.ai/crawler-connect/ingest/manual`); the bearer token identifies your connection server-side.
Step 1 — Create the connection in Trakkr
1. Open the Crawler page in Trakkr. If you have no sources yet you'll land on the stack picker; otherwise click Add source in the header.
2. Under Self-hosted runtime, pick Node / Express.
3. Name the source (e.g. `API + app server`).
4. Trakkr creates the connection and shows the middleware template with the ingest URL and a unique bearer token (`trk_…`) pre-filled.
Step 2 — Mount the middleware
For Express, drop this in near the top of your app — before route handlers, so it runs on every request:
```js
const TRAKKR_URL = process.env.TRAKKR_INGEST_URL
const TRAKKR_TOKEN = process.env.TRAKKR_BEARER_TOKEN
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i

function trakkrCrawlerMiddleware(req, res, next) {
  // Runs once the response has been sent, so the request is never delayed.
  res.on('finish', () => {
    const ua = req.get('user-agent') || ''
    if (!AI_BOT_RE.test(ua)) return

    const protocol = req.headers['x-forwarded-proto'] || req.protocol || 'https'
    const host = req.get('host') || 'localhost'
    const url = `${protocol}://${host}${req.originalUrl || req.url}`
    const ip = (req.headers['x-forwarded-for'] || req.ip || '').toString().split(',')[0].trim()

    // Fire-and-forget POST (global fetch needs Node 18+); failures are swallowed.
    void fetch(TRAKKR_URL, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        authorization: `Bearer ${TRAKKR_TOKEN}`,
      },
      body: JSON.stringify([{
        timestamp: new Date().toISOString(),
        url,
        requestPath: req.path,
        method: req.method,
        referrer: req.get('referer') || '',
        ip,
        country: req.get('cf-ipcountry') || req.get('x-vercel-ip-country') || '',
        statusCode: res.statusCode,
        userAgent: ua,
      }]),
    }).catch(() => undefined)
  })
  next()
}

// Behind a load balancer or proxy, this makes req.ip the real client address.
app.set('trust proxy', true)
app.use(trakkrCrawlerMiddleware)
```

The full template (with your live URL and token baked in) lives in the in-app setup. Copy it from there to avoid hand-pasting credentials.
app.set('trust proxy', true) is important if your app sits behind a load balancer, reverse proxy, or CDN. Without it, req.ip returns the proxy's address instead of the real client IP. Express's trust proxy docs cover the safer per-hop alternatives if true is too permissive.

Other frameworks
- Fastify: register a hook on `onResponse`, same payload shape.
- Koa: after `await next()`, attach the listener with `ctx.res.on('finish', ...)`.
- NestJS: wrap it as a Nest middleware via `MiddlewareConsumer.apply(...).forRoutes('*')`.
- Raw `http`: listen for the response's `'finish'` event directly.
The payload contract is the same across all of them. Trakkr accepts an array of crawler-visit objects with the fields shown in the snippet.
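For the raw `http` case, here is a minimal TypeScript sketch assuming Node 18+ (for the global `fetch`). The `CrawlerVisit` type spells out the payload fields the Express snippet sends; `trackCrawler` and `postToTrakkr` are illustrative names, and the bot list is shortened. The listener is attached per request and only fires on `'finish'`, which is also why `statusCode` is reliable there.

```typescript
import { createServer, type IncomingMessage, type ServerResponse } from "node:http";

// Payload fields as sent by the Express snippet; Trakkr accepts an array of these.
interface CrawlerVisit {
  timestamp: string;
  url: string;
  requestPath: string;
  method: string;
  referrer: string;
  ip: string;
  country: string;
  statusCode: number;
  userAgent: string;
}

type SendVisits = (visits: CrawlerVisit[]) => Promise<unknown>;

// Shortened bot list for brevity; use the full pattern from the in-app template.
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot/i;

// Registers a 'finish' listener; nothing is sent until the response is complete.
function trackCrawler(req: IncomingMessage, res: ServerResponse, send: SendVisits): void {
  res.on("finish", () => {
    const ua = req.headers["user-agent"] ?? "";
    if (!AI_BOT_RE.test(ua)) return;

    // Read a header as a single string (Node may hand back arrays for repeats).
    const header = (name: string): string => {
      const value = req.headers[name];
      return (Array.isArray(value) ? value[0] : value) ?? "";
    };
    const scheme = header("x-forwarded-proto").split(",")[0].trim() || "http";
    const path = req.url ?? "/";
    const visit: CrawlerVisit = {
      timestamp: new Date().toISOString(),
      url: `${scheme}://${header("host") || "localhost"}${path}`,
      requestPath: path.split("?")[0],
      method: req.method ?? "GET",
      referrer: header("referer"),
      ip: header("x-forwarded-for").split(",")[0].trim() || req.socket.remoteAddress || "",
      country: header("cf-ipcountry"),
      statusCode: res.statusCode,
      userAgent: ua,
    };
    // Fire-and-forget: a failed POST never affects the request that was served.
    void send([visit]).catch(() => undefined);
  });
}

const postToTrakkr: SendVisits = (visits) =>
  fetch(process.env.TRAKKR_INGEST_URL ?? "", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.TRAKKR_BEARER_TOKEN ?? ""}`,
    },
    body: JSON.stringify(visits),
  });

const server = createServer((req, res) => {
  trackCrawler(req, res, postToTrakkr);
  res.end("hello");
});
// server.listen(3000);
```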
Step 3 — Set environment variables
```
TRAKKR_INGEST_URL=https://api.trakkr.ai/crawler-connect/ingest/manual
TRAKKR_BEARER_TOKEN=trk_<the rest of your token>
```

The ingest URL is shared across every Next.js / Node / Nginx / custom connection. Your connection identity is carried inside the bearer token, so copy the full `trk_…` string from the in-app setup.
Step 4 — Deploy and verify
Deploy the change, then in Trakkr click Send verification in the Crawler header. The synthetic GPTBot visit should appear in the Feed within 30 seconds.
Why res.on('finish', …) and not direct sending?
The middleware sends crawler data after the response has gone out. The request handler never waits on the ingest call, so user-facing latency is zero. If the ingest call fails (network blip, Trakkr maintenance), the .catch(() => undefined) swallows it — your app never crashes because of analytics.
Troubleshooting
"Webhook returns 401"
Bearer token mismatch. Re-copy the full trk_… string from the connection card in Trakkr (Crawler → the connection card → Reveal).
"No data at all, including verification"
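- Is `TRAKKR_INGEST_URL` set in the running process? `console.log(TRAKKR_URL)` to confirm.
- Outbound HTTPS blocked from your environment? Run `curl -sS -o /dev/null -w '%{http_code}\n' "$TRAKKR_INGEST_URL"` from a shell on the same host. Any HTTP status code, even a 4xx, means the endpoint is reachable; a connection error or timeout points to blocked egress.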
"req.ip is always the load balancer IP"
Add app.set('trust proxy', true) (or a stricter per-hop setting). Without it, Express ignores X-Forwarded-For and req.ip becomes the immediate connecting address.
"Bots hit but the wrong status code is recorded"
The middleware uses res.statusCode from the finish event, which is correct after redirects/errors are committed. If you're using a custom error handler that mutates the status after next() returns, attach the listener inside the error path too.
"IP attribution is wrong (showing Cloudflare/Sucuri IPs)"
Express's req.ip returns the connecting IP, which under a CDN/WAF is the proxy's address. With app.set('trust proxy', true) it walks X-Forwarded-For instead — but Sucuri uses X-Sucuri-ClientIP and Cloudflare's preferred header is CF-Connecting-IP, neither of which Express reads by default. If country attribution looks off, change the snippet's ip line to prefer those headers before falling back to x-forwarded-for. The WordPress install lists the full priority order.
Nginx / OpenResty - install guide
OpenResty deployments, or plain Nginx with a log shipper.
For OpenResty (Nginx with the lua-nginx-module), Trakkr ships a log_by_lua_block you paste into your server config. The block runs after the response has already been sent to the client, in a background timer — so it never adds latency.
If you're on plain Nginx without Lua, see the "No Lua?" section below.
Before you start
- OpenResty, or Nginx compiled with `lua-nginx-module`.
- The `lua-resty-http` library: `opm get ledgetech/lua-resty-http`.
- Permission to edit and reload your Nginx config.
- Your bearer token from Trakkr (Step 1). The ingest URL is shared across every webhook connection (`https://api.trakkr.ai/crawler-connect/ingest/manual`); the bearer token identifies your connection.
Step 1 — Create the connection in Trakkr
1. Open the Crawler page in Trakkr (you'll see the stack picker if no sources are connected yet, or click Add source in the header if you already have some).
2. Under Self-hosted runtime, pick Nginx / OpenResty.
3. Name the source (e.g. `Edge gateway`).
4. Trakkr generates the snippet with the ingest URL and a unique bearer token (`trk_…`) baked in. Copy it from the in-app setup so you don't risk hand-pasting credentials.
Step 2 — Add DNS resolver and CA bundle to your http { } block
OpenResty's HTTP client needs a DNS resolver and a CA bundle to reach Trakkr over HTTPS. Add these to the http { } block (skip any you already have):
```nginx
http {
    resolver 1.1.1.1 8.8.8.8 valid=300s ipv6=off;
    lua_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    lua_ssl_verify_depth 3;
    ...
}
```

The CA bundle path depends on your distro:
| Distro | Path |
|---|---|
| Debian / Ubuntu / Alpine | /etc/ssl/certs/ca-certificates.crt |
| RHEL / CentOS / Amazon Linux | /etc/pki/tls/certs/ca-bundle.crt |
If neither path exists on your server, find / -name 'ca-certificates.crt' -o -name 'ca-bundle.crt' 2>/dev/null will surface the right one.
Step 3 — Add the log_by_lua_block to your server
Paste the snippet from the in-app setup into the server { } block for your site. Here's the shape — the in-app version includes your real URL and token:
```nginx
log_by_lua_block {
    -- PCRE pattern, matched case-insensitively ("i") and compiled once ("jo").
    local AI_BOT_RE = [[GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot]]
    local ua = ngx.var.http_user_agent or ""
    if not ngx.re.find(ua, AI_BOT_RE, "ijo") then return end

    local cjson = require "cjson.safe"
    local payload = cjson.encode({{
        timestamp = os.date("!%Y-%m-%dT%H:%M:%SZ"),
        url = ngx.var.scheme .. "://" .. (ngx.var.host or "") .. (ngx.var.request_uri or ""),
        requestPath = ngx.var.uri,
        method = ngx.req.get_method(),
        referrer = ngx.var.http_referer or "",
        ip = ngx.var.remote_addr or "",
        country = ngx.var.geoip_country_code or ngx.var.http_cf_ipcountry or "",
        statusCode = tonumber(ngx.var.status),
        userAgent = ua,
    }})
    if not payload then return end

    -- Cosockets aren't available in the log phase, so the POST runs in a zero-delay timer.
    ngx.timer.at(0, function(premature, body)
        if premature then return end
        local httpc = require("resty.http").new()
        httpc:set_timeouts(2000, 2000, 5000)  -- connect, send, read (ms)
        local res, err = httpc:request_uri("https://api.trakkr.ai/crawler-connect/ingest/manual", {
            method = "POST",
            body = body,
            headers = {
                ["Content-Type"] = "application/json",
                ["Authorization"] = "Bearer trk_<the rest of your token>",
            },
            ssl_verify = true,
        })
        -- Log failures so they show up in the error log as "trakkr: ingest ..." lines.
        if not res then
            ngx.log(ngx.ERR, "trakkr: ingest failed: ", err)
        elseif res.status >= 300 then
            ngx.log(ngx.ERR, "trakkr: ingest returned HTTP ", res.status)
        end
    end, payload)
}
```

The ingest URL is shared across every webhook connection. Your connection identity is carried inside the trk_… bearer token, so copy the full string from the in-app setup.
Step 4 — Reload and verify
```shell
nginx -t && nginx -s reload
```

Then in Trakkr click Send verification in the Crawler header.
No Lua? Ship JSON access logs to the webhook
If you can't run OpenResty or add the lua module, the equivalent path is:
1. Configure Nginx to emit JSON access logs (`log_format json_combined ... ; access_log /var/log/nginx/access.json json_combined;`).
2. Run a log shipper (Vector, Fluent Bit, Filebeat) that POSTs the log lines, wrapped in a JSON array, to your Trakkr webhook URL with the same bearer token.
Trakkr accepts the same payload shape regardless of source — so a Vector pipeline filtering Nginx access logs by User-Agent and forwarding to the webhook is functionally identical to the Lua block above.
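To make step 2 concrete, here is a minimal Node.js sketch of the transform a shipper performs. It parses one JSON access-log line, keeps the line only if the User-Agent matches an AI crawler, and renames the fields to Trakkr's payload keys. The log keys (`time`, `scheme`, `host`, `request_uri`, `uri`, `method`, `status`, `referer`, `remote_addr`, `user_agent`) are assumptions about how you might define `json_combined`, for example with `log_format json_combined escape=json '...'`. Rename them to match your own format. The bot pattern is the one from the Lua block above.

```javascript
// Minimal sketch: map Nginx JSON access-log lines to Trakkr visit objects.
// The log keys used here are hypothetical; match them to your log_format.
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i

function toTrakkrVisit(line) {
  let entry
  try {
    entry = JSON.parse(line)
  } catch {
    return null // skip malformed lines instead of failing the whole batch
  }
  const ua = entry.user_agent || ''
  if (!AI_BOT_RE.test(ua)) return null // forward AI crawler hits only
  return {
    timestamp: entry.time,
    url: `${entry.scheme || 'https'}://${entry.host || ''}${entry.request_uri || ''}`,
    requestPath: entry.uri,
    method: entry.method || 'GET',
    statusCode: Number(entry.status), // $status is logged as a quoted string
    referrer: entry.referer || '',
    ip: entry.remote_addr || '',
    userAgent: ua,
  }
}

// One crawler line and one human line; only the crawler survives the filter.
const lines = [
  '{"time":"2026-04-06T12:00:00+00:00","scheme":"https","host":"example.com","request_uri":"/page?x=1","uri":"/page","method":"GET","status":"200","referer":"","remote_addr":"203.0.113.45","user_agent":"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"}',
  '{"time":"2026-04-06T12:00:01+00:00","scheme":"https","host":"example.com","request_uri":"/","uri":"/","method":"GET","status":"200","referer":"","remote_addr":"198.51.100.7","user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0"}',
]
const batch = lines.map(toTrakkrVisit).filter(Boolean)
console.log(JSON.stringify(batch)) // the JSON array body to POST
```

In Vector or Fluent Bit, you would express the same mapping in the shipper's own transform language. What matters is the field renaming and the array wrapper.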
Troubleshooting
"no resolver defined to resolve …"
You didn't add a resolver directive in http { }. Add it and reload.
"unable to get local issuer certificate"
The lua_ssl_trusted_certificate path is wrong for your distro. Check the table above and fix it.
"the timer callback errored: lua-resty-http is not installed"
Install the library: `opm get ledgetech/lua-resty-http`. If you don't use OPM, the library is also available via LuaRocks (`luarocks install lua-resty-http`), or you can vendor its `lib/resty/` Lua files into a directory on your `lua_package_path`.
"Lua block runs but Trakkr Feed stays empty"
Tail your error log for trakkr: ingest … messages. Common causes:
- Outbound HTTPS blocked from your server (firewall or VPC egress rules).
- Bearer token mismatch — copy fresh from Trakkr.
- The DNS resolver in `http { }` can't reach Cloudflare's `1.1.1.1`. Try `8.8.8.8`.
AWS CloudFront - install guide
Sites delivered through a CloudFront distribution.
For sites delivered through AWS CloudFront, Trakkr ships a Lambda@Edge function that runs on Viewer Request — every request, including cache hits — and POSTs AI crawler hits to the shared Trakkr ingest URL using your per-connection bearer token.
Before you start
- An existing CloudFront distribution.
- The ability to publish a Lambda function in `us-east-1` (Lambda@Edge requirement).
- Permission to associate Lambda functions with CloudFront behaviors.
- Your bearer token from Trakkr (Step 1). The ingest URL is shared (`https://api.trakkr.ai/crawler-connect/ingest/manual`); the bearer token identifies your connection server-side.
(Note: the in-product setup card currently mentions Origin Request — that copy is wrong. Use Viewer Request so you also catch CloudFront cache hits.)
Step 1 — Create the connection in Trakkr
1. Open the Crawler page in Trakkr (stack picker on empty state; otherwise click Add source in the header).
2. Under Self-hosted runtime, pick AWS CloudFront.
3. Name the source (e.g. `Primary distribution`).
4. Copy the Lambda@Edge template from the in-app setup. The ingest URL and a unique bearer token (`trk_…`) are pre-filled.
Step 2 — Create the Lambda function
1. In the AWS Console, switch your region to us-east-1 (N. Virginia). Lambda@Edge requires it.
2. Go to Lambda → Create function → Author from scratch.
3. Name it `trakkr-crawler-edge`.
4. Runtime: Node.js 18.x or newer.
5. Permissions: create a new role with basic Lambda permissions plus Lambda@Edge permissions. The console's Basic Lambda@Edge permissions (for CloudFront trigger) policy template covers this. If you've used Lambda@Edge before, you can pick your existing role (e.g. `lambda-edge-role`) from the dropdown instead.
6. Paste the function code from the in-app setup into the editor.
Here's the shape (the in-app version pre-fills your bearer token):
```javascript
'use strict'
const https = require('https')
const { URL } = require('url')
const TRAKKR_URL = 'https://api.trakkr.ai/crawler-connect/ingest/manual'
const TRAKKR_TOKEN = 'trk_<the rest of your token>'
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i

function postToTrakkr(payload) {
  return new Promise((resolve) => {
    const target = new URL(TRAKKR_URL)
    const body = JSON.stringify(payload)
    const req = https.request({
      hostname: target.hostname,
      port: target.port || 443,
      path: target.pathname + target.search,
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'content-length': Buffer.byteLength(body),
        authorization: `Bearer ${TRAKKR_TOKEN}`,
      },
      timeout: 1200,
    }, (res) => { res.on('data', () => {}); res.on('end', resolve) })
    req.on('error', resolve)
    req.on('timeout', () => { req.destroy(); resolve() })
    req.write(body)
    req.end()
  })
}

exports.handler = async (event) => {
  const request = event.Records[0].cf.request
  const headers = request.headers || {}
  const ua = (headers['user-agent'] || [{}])[0].value || ''
  if (!AI_BOT_RE.test(ua)) return request
  const host = (headers.host || [{}])[0].value || ''
  await postToTrakkr([{
    timestamp: new Date().toISOString(),
    url: `https://${host}${request.uri}${request.querystring ? `?${request.querystring}` : ''}`,
    requestPath: request.uri,
    method: request.method,
    referrer: (headers.referer || [{}])[0].value || '',
    ip: request.clientIp || '',
    userAgent: ua,
  }])
  return request
}
```

Step 3 — Publish a numbered version
Lambda@Edge associations require a numbered version, not $LATEST.
1. In the Lambda console: Actions → Publish new version.
2. Add a description (e.g. `v1 - Trakkr crawler tracking`).
3. Click Publish.
Copy the function's full ARN — it ends with :1 (or whatever version number).
Step 4 — Associate with CloudFront
1. Go to your CloudFront distribution → Behaviors tab.
2. Pick the default behavior (`*`) → Edit.
3. Scroll to Function associations at the bottom.
4. Under Viewer Request, set:
   - Function type: Lambda@Edge
   - Function ARN: paste the versioned ARN from Step 3.
5. Save changes. CloudFront begins deploying the change, which takes 5 to 10 minutes globally.
If you have multiple behaviors that serve HTML, repeat for each one. Static-asset behaviors (CSS, JS, fonts) can skip this — bots don't usually hit assets.
Step 5 — Verify
Wait for the CloudFront deploy to finish (the status flips from "In progress" back to "Deployed"). Then in Trakkr click Send verification in the Crawler header.
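If Send verification succeeds but real crawler hits never appear, the handler's matching logic is one thing to rule out. The sketch below re-implements the handler's header reading and URL building so you can replay a synthetic Viewer Request event locally in Node.js. It does not load your deployed function, `visitFromEvent` is a hypothetical helper, and every value in the sample event is made up.

```javascript
// Minimal sketch: replay a synthetic CloudFront Viewer Request event through the
// same User-Agent check and URL construction as the Lambda@Edge handler.
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i

function firstHeader(headers, name) {
  // CloudFront keys headers by lowercase name; each value is an array of { key, value }
  return (headers[name] || [{}])[0].value || ''
}

function visitFromEvent(event) {
  const request = event.Records[0].cf.request
  const headers = request.headers || {}
  const ua = firstHeader(headers, 'user-agent')
  if (!AI_BOT_RE.test(ua)) return null // the handler returns early here
  const host = firstHeader(headers, 'host')
  const qs = request.querystring ? `?${request.querystring}` : ''
  return {
    url: `https://${host}${request.uri}${qs}`,
    requestPath: request.uri,
    method: request.method,
    ip: request.clientIp || '',
    userAgent: ua,
  }
}

// Synthetic event, trimmed to the fields the handler reads.
const event = {
  Records: [{
    cf: {
      request: {
        clientIp: '203.0.113.45',
        method: 'GET',
        uri: '/pricing',
        querystring: 'plan=pro',
        headers: {
          host: [{ key: 'Host', value: 'www.example.com' }],
          'user-agent': [{ key: 'User-Agent', value: 'Mozilla/5.0 (compatible; ClaudeBot/1.0)' }],
        },
      },
    },
  }],
}

console.log(visitFromEvent(event))
```

If this returns `null` for a User-Agent you expect to match, widen the regex in the deployed function and publish a new version.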
Troubleshooting
"Distribution deploy fails with LambdaValidationError"
The function isn't in us-east-1, or you tried to associate $LATEST instead of a numbered version. Re-create in us-east-1 and publish a version.
"Function executes but Trakkr Feed is empty"
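Tail the CloudWatch logs for the function. Lambda@Edge writes them in the region closest to the edge location that served the request, not necessarily us-east-1, under the log group `/aws/lambda/us-east-1.trakkr-crawler-edge`. Open CloudWatch in the regions your traffic flows through (us-east-1, us-west-2, eu-west-1, etc.).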
"Bearer token rotated, function still sends the old one"
Lambda@Edge has no env vars. Edit the function, paste the new token, Publish new version, and re-associate the new ARN in the CloudFront behavior. The old version stays live until the distribution finishes redeploying.
Akamai - install guide
Sites behind Akamai with DataStream 2 available.
For sites behind Akamai, Trakkr uses Akamai's DataStream 2 product to forward request logs to a custom HTTPS endpoint — the shared Trakkr ingest URL, authenticated with a per-connection Basic Auth pair.
Before you start
- Akamai Control Center access with DataStream 2 entitlement (most Akamai contracts include it; verify before starting).
- The Akamai property you want to track.
- Your Basic Auth username and password from Trakkr (Step 1). The ingest URL is shared (`https://api.trakkr.ai/crawler-connect/ingest/manual`); your connection identity is carried by the Basic Auth header.
Step 1 — Create the connection in Trakkr
1. Open the Crawler page in Trakkr (stack picker on empty state; otherwise click Add source in the header). Under Self-hosted runtime, pick Akamai.
2. Name the source (e.g. `Akamai property #12345`).
3. Trakkr generates the ingest URL plus a Basic Auth username (your connection ID) and password (your webhook secret). Copy all three. You'll paste them into the Akamai console next.
Step 2 — Create a DataStream in Akamai
1. In Akamai Control Center, navigate to DataStream 2 → Create Stream.
2. Select the property you want to track.
3. Choose JSON as the stream format. Akamai DataStream 2 emits newline-delimited JSON (NDJSON) by default. Trakkr's ingest endpoint accepts both NDJSON and array-of-JSON, so no extra formatting is needed.
4. Pick the destination: Custom HTTPS connector.
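To illustrate the two body shapes mentioned in item 3, the hypothetical `parseIngestBody` helper below normalizes either NDJSON or a JSON array into the same list of records. It's a sketch for understanding the formats, not Trakkr's actual parser, and the record fields are placeholders.

```javascript
// Illustrative only: normalize an NDJSON body or a JSON-array body into one
// array of records. Not Trakkr's parser; record fields are placeholders.
function parseIngestBody(body) {
  const text = body.trim()
  if (text.startsWith('[')) return JSON.parse(text) // array of JSON objects
  return text
    .split('\n')
    .filter((line) => line.trim() !== '') // NDJSON: one object per non-empty line
    .map((line) => JSON.parse(line))
}

const ndjson = '{"path":"/a","ua":"GPTBot/1.0"}\n{"path":"/b","ua":"ClaudeBot/1.0"}\n'
const array = '[{"path":"/a","ua":"GPTBot/1.0"},{"path":"/b","ua":"ClaudeBot/1.0"}]'

console.log(parseIngestBody(ndjson))
console.log(parseIngestBody(array))
```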
Step 3 — Configure the HTTPS connector
| Field | Value |
|---|---|
| Endpoint URL | https://api.trakkr.ai/crawler-connect/ingest/manual |
| Content-Type | application/json |
| Authentication | Basic |
| Username | The username Trakkr generated (your connection ID) |
| Password | The password Trakkr generated (your webhook secret) |
Akamai will send a sample validation payload to the endpoint to confirm reachability. Trakkr accepts and discards validation pings without storing them.
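Basic Auth is just the username and password joined by a colon and base64-encoded (RFC 7617), so you can test the pair Trakkr generated before involving Akamai. This is a minimal Node.js 18+ sketch. It assumes the `/validate` dry-run endpoint described in the Other / Custom guide accepts the same Basic Auth pair. `basicAuthHeader` and `validateCredentials` are hypothetical helper names.

```javascript
// Minimal sketch: build the Basic Auth header DataStream sends and try it
// against Trakkr's dry-run endpoint. Helper names are hypothetical.
const VALIDATE_URL = 'https://api.trakkr.ai/crawler-connect/ingest/manual/validate'

function basicAuthHeader(username, password) {
  // RFC 7617: "Basic " + base64("username:password"); values are sent byte-for-byte, so case matters
  return 'Basic ' + Buffer.from(`${username}:${password}`, 'utf8').toString('base64')
}

// Node 18+ (global fetch). Assumes /validate accepts the same Basic Auth pair.
// A 401 points at a username/password mismatch.
async function validateCredentials(username, password) {
  const res = await fetch(VALIDATE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: basicAuthHeader(username, password),
    },
    body: JSON.stringify([{
      timestamp: new Date().toISOString(),
      url: 'https://example.com/page',
      userAgent: 'Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)',
    }]),
  })
  return { status: res.status, report: await res.text() }
}

// Usage: validateCredentials('<username>', '<password>').then(console.log)
console.log(basicAuthHeader('Aladdin', 'open sesame')) // RFC 7617's worked example
```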
Step 4 — Pick the right log fields
You want the fields Trakkr needs to attribute a bot visit. Select these in the DataStream field picker:
- Request Host
- Request Path
- Request Method
- User-Agent
- Client IP
- Status Code
- Country / Region
- Request Time (seconds)
You can include more fields — Trakkr ignores anything beyond what it uses — but you should at minimum include this set.
Step 5 — Optional User-Agent filter
If you want to keep your DataStream volume tight, add a match rule:
```
User-Agent matches: GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot|Meta-ExternalFetcher
```

Without the filter, Akamai sends every request and Trakkr does the bot detection. With the filter, only matched requests are sent. That is cheaper and faster, but the filter list needs maintenance when new bots ship. Trakkr's server-side detector covers the full bot set automatically; the Akamai filter is just a volume optimization.
Step 6 — Activate the stream
Save and activate the DataStream. Akamai begins forwarding in batches, typically within 10 minutes. Then in Trakkr click Send verification in the Crawler header to test end-to-end.
Troubleshooting
"Akamai validation failed"
- Confirm the username/password match exactly — Basic Auth is case-sensitive.
- Confirm the endpoint URL is reachable from public internet (no IP allowlisting beyond what Akamai's documentation specifies).
"Data arrives but with wrong field names"
DataStream allows custom field naming. Trakkr accepts both standard CloudFront-style and Akamai-style fields — the in-app docs list the accepted aliases. If you renamed fields, either rename back to defaults or open a support ticket and we'll add the alias.
"No country data on visits"
Country is part of the DataStream field set but isn't enabled by default on all contracts. In the DataStream config, confirm Country or Geo fields are selected.
"Volume is huge and you're worried about cost"
Apply the User-Agent match rule from Step 5. It typically cuts DataStream egress by 99% on sites that aren't dominated by bot traffic.
Fastly - install guide
Sites on Fastly using the standard logging surface.
For sites on Fastly, Trakkr uses Fastly's Real-Time Log Streaming with an HTTPS endpoint — forwarding crawler hits to the shared Trakkr ingest URL as they happen, authenticated with a per-connection bearer token.
Before you start
- A Fastly service serving your site.
- Permission to add logging endpoints on the service.
- Your bearer token from Trakkr (Step 1). The ingest URL is shared (`https://api.trakkr.ai/crawler-connect/ingest/manual`); the bearer token identifies your connection.
Step 1 — Create the connection in Trakkr
1. Open the Crawler page in Trakkr (stack picker on empty state; otherwise click Add source in the header). Under Self-hosted runtime, pick Fastly.
2. Name the source.
3. Trakkr generates the ingest URL and a unique bearer token (`trk_…`). The setup screen also shows the VCL log format string you'll need in Step 3.
Step 2 — Add an HTTPS logging endpoint in Fastly
1. In Fastly, open the service → Logging → Create endpoint → HTTPS.
2. Configure:
| Field | Value |
|---|---|
| URL | https://api.trakkr.ai/crawler-connect/ingest/manual |
| Method | POST |
| Content-Type | application/json |
| Custom header | Authorization: Bearer trk_<the rest of your token> |
| JSON format | Array of JSON |
3. Domain ownership challenge: Fastly validates that you own the destination domain by requesting `/.well-known/fastly/logging/challenge` on it. Trakkr serves that challenge automatically at `api.trakkr.ai`, so no separate proxy or DNS change is needed.
Step 3 — Set the log format
In the same endpoint configuration, paste the Trakkr log format under Log format:
```
{
  "timestamp": "%{strftime({"%Y-%m-%dT%H:%M:%S%z"}, time.start)}V",
  "host": "%{if(req.http.Fastly-Orig-Host, req.http.Fastly-Orig-Host, req.http.Host)}V",
  "url": "%{json.escape(req.url)}V",
  "request_method": "%{json.escape(req.method)}V",
  "request_referer": "%{json.escape(req.http.referer)}V",
  "request_user_agent": "%{json.escape(req.http.User-Agent)}V",
  "client_ip": "%{req.http.Fastly-Client-IP}V",
  "country_code": "%{client.geo.country_code}V",
  "response_status": %{resp.status}V
}
```

The in-app setup includes this format pre-escaped for Fastly's textarea, so copy from there rather than from this page.
Step 4 — Optional User-Agent filter
Fastly can pre-filter log lines using a VCL condition, which keeps egress volume down. In the endpoint, set a Condition:
```
req.http.User-Agent ~ "GPTBot|ChatGPT|ClaudeBot|Claude-User|Perplexity|OAI-Search|Bytespider|Amazonbot"
```

Without the condition, every request is sent to Trakkr and Trakkr's server-side detector handles the filtering. With it, only matched requests leave Fastly, which typically cuts log volume by 99%+ on sites with normal human traffic.
Step 5 — Activate the version and verify
1. Activate the new VCL version (Fastly applies changes on activation, not on save).
2. In Trakkr, click Send verification in the Crawler header. The synthetic GPTBot ping should appear within 30 seconds.
Troubleshooting
"Domain ownership challenge fails"
The challenge endpoint must respond with the expected token. Fastly requests the challenge on the endpoint's host, api.trakkr.ai, not on your own domain, and Trakkr always serves it there. If you still see a failure, the most common cause is a misconfigured ingest URL. Re-copy it from Trakkr.
"Logs not arriving"
- Did you activate the VCL version? Saving alone doesn't apply changes.
- Bearer token mismatch — re-copy from Trakkr.
- Try without the VCL condition first to confirm the pipeline, then re-add the filter once data flows.
"Volume looks low / missing crawler hits"
If you set the VCL filter, the regex needs to include any bots you care about. Trakkr's default detector covers ~30 patterns; the Fastly regex above hits the most common ones but misses the long tail. Either widen the regex or remove the filter and let Trakkr do server-side detection.
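To see which crawlers a trimmed filter drops, you can run a handful of User-Agent strings through both patterns. This sketch compares the Fastly condition from Step 4 with the longer list from the Nginx and Lambda@Edge templates. The Fastly condition is matched case-sensitively here, on the assumption that VCL's `~` operator behaves that way by default. The sample User-Agents are simplified, made-up examples.

```javascript
// Illustrative comparison: short Fastly condition vs the longer template pattern.
// FASTLY_FILTER has no /i flag, mirroring a case-sensitive VCL ~ match.
const FASTLY_FILTER = /GPTBot|ChatGPT|ClaudeBot|Claude-User|Perplexity|OAI-Search|Bytespider|Amazonbot/
const FULL_PATTERN = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i

const samples = [
  'Mozilla/5.0 (compatible; GPTBot/1.0)',
  'Mozilla/5.0 (compatible; CCBot/2.0)',
  'Mozilla/5.0 (compatible; MistralAI-User/1.0)',
  'Mozilla/5.0 (compatible; Claude-SearchBot/1.0)',
  'Mozilla/5.0 (compatible; Diffbot/0.1)',
]

// User-Agents the full pattern recognizes but the Fastly condition never forwards
const missed = samples.filter((ua) => FULL_PATTERN.test(ua) && !FASTLY_FILTER.test(ua))
console.log(missed)
```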
Other / Custom - install guide
Any stack with access to request logs or a log forwarder.
If your stack isn't covered by the first-class integrations, you can POST batched crawler visits to Trakkr's manual ingest endpoint from anything — a custom edge stack, a log shipper (Vector, Fluent Bit, Filebeat, Logstash), a homegrown analytics service, an old-school cron job that tails access.log.
This is the lowest common denominator. As long as you can make an authenticated HTTPS POST with a JSON body, you can ship crawler data to Trakkr.
Before you start
- The ability to make outbound HTTPS POSTs from your source.
- Access to a stream of request data (access logs, log shipper output, edge function output, message queue, etc.).
- Your bearer token from Trakkr (Step 1). The ingest URL is shared (`https://api.trakkr.ai/crawler-connect/ingest/manual`); the bearer token identifies your connection.
Step 1 — Create the connection in Trakkr
1. Open the Crawler page in Trakkr (stack picker on empty state; otherwise click Add source in the header). Under Self-hosted runtime, pick Other / Custom.
2. Name the source so future-you knows what's behind it (e.g. `Vector → Trakkr (us-east edge)`).
3. Trakkr generates the ingest URL and a unique bearer token (`trk_…`). The same connection also supports Basic Auth if your shipper prefers that: the username is your connection ID and the password is your webhook secret, both shown in the in-app setup.
Step 2 — The payload contract
POST a JSON array of crawler-visit objects to the shared ingest URL with the bearer token in the Authorization header. You can send a single visit per request (array with one item) or batch up to a few hundred per call.
```shell
curl -X POST "https://api.trakkr.ai/crawler-connect/ingest/manual" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer trk_<the rest of your token>" \
  -d '[{
    "timestamp": "2026-04-06T12:00:00Z",
    "url": "https://example.com/page",
    "requestPath": "/page",
    "userAgent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "ip": "203.0.113.45",
    "statusCode": 200,
    "country": "US"
  }]'
```

Required fields
| Field | Type | Notes |
|---|---|---|
| timestamp | string (ISO 8601) | When the request happened. |
| userAgent | string | Trakkr does server-side bot detection, so send the raw UA. |
| url | string | The full URL hit. Alternative: requestPath + host. |
Optional but useful fields
| Field | Type | Notes |
|---|---|---|
| requestPath | string | Path component, if url is a full URL. |
| method | string | GET / POST / HEAD. Defaults to GET. |
| statusCode | number | HTTP status returned. |
| referrer | string | Referer header. |
| ip | string | Client IP (informational only, never used for security). |
| country | string | ISO 3166 alpha-2 (e.g. US, GB). |
Anything else you send is ignored.
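Before pointing a shipper at live ingestion, a quick local check catches the usual 400 causes. This minimal sketch encodes the required-field rules from the tables above. `checkBatch` is a hypothetical helper, and the server's response (or the `/validate` endpoint) remains authoritative.

```javascript
// Minimal client-side check against the payload contract in the tables above.
// Returns a list of human-readable problems; an empty list means the batch looks valid.
function checkBatch(batch) {
  if (!Array.isArray(batch)) return ['body must be a JSON array, not a single object']
  const problems = []
  batch.forEach((visit, i) => {
    if (typeof visit.timestamp !== 'string' || Number.isNaN(Date.parse(visit.timestamp))) {
      problems.push(`[${i}] timestamp must be an ISO 8601 string`)
    }
    if (typeof visit.userAgent !== 'string' || visit.userAgent === '') {
      problems.push(`[${i}] userAgent is required`)
    }
    // url is required; requestPath + host is the documented alternative
    if (!visit.url && !(visit.requestPath && visit.host)) {
      problems.push(`[${i}] url (or requestPath + host) is required`)
    }
    if (visit.statusCode !== undefined && typeof visit.statusCode !== 'number') {
      problems.push(`[${i}] statusCode must be a number`)
    }
  })
  return problems
}

const good = [{ timestamp: '2026-04-06T12:00:00Z', url: 'https://example.com/page', userAgent: 'GPTBot/1.0' }]
console.log(checkBatch(good))    // a well-formed one-item batch
console.log(checkBatch(good[0])) // the classic mistake: a bare object instead of an array
```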
Step 3 — Filter (or don't)
You can either:
- Send everything. Trakkr's server-side detector picks out AI crawlers and discards everything else. Wastes bandwidth, but you never have to maintain the bot pattern list.
- Pre-filter at the source. Faster and cheaper, but the filter regex needs to evolve as new bots ship.
The pre-filter regex Trakkr uses (matching ~30 bots, including ChatGPT-User, GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Bytespider, MistralAI-User, etc.) is shown in the in-app setup. Copy it from there.
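For the pre-filter option, the sketch below shows filtering, batching, and POSTing. It targets Node.js 18+ for the built-in `fetch`. The bot regex is copied from the Nginx and Lambda@Edge templates on this page and may lag the in-app list. The batch size of 100 is an assumption in line with the Vector example. `prefilter`, `chunk`, and `shipVisits` are hypothetical helpers, not part of any Trakkr SDK.

```javascript
// Minimal sketch: keep AI crawler visits, split them into batches, POST each batch.
const INGEST_URL = 'https://api.trakkr.ai/crawler-connect/ingest/manual'
const AI_BOT_RE = /GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Claude-SearchBot|Bytespider|CCBot|Amazonbot|Applebot(?!-Extended)|Meta-ExternalFetcher|MistralAI-User|Google-Agent|anthropic-ai|cohere-ai|Claude-Web|Claude-Code|Diffbot/i

function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

function prefilter(visits) {
  return visits.filter((v) => AI_BOT_RE.test(v.userAgent || ''))
}

// Stops at the first non-2xx response so the error body (which names the bad field) surfaces.
async function shipVisits(visits, token, batchSize = 100) {
  for (const batch of chunk(prefilter(visits), batchSize)) {
    const res = await fetch(INGEST_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
      body: JSON.stringify(batch),
    })
    if (!res.ok) throw new Error(`Trakkr ingest returned ${res.status}: ${await res.text()}`)
  }
}

// Usage: shipVisits(visitsFromYourSource, process.env.TRAKKR_TOKEN)
const sample = [{ userAgent: 'Mozilla/5.0 (compatible; GPTBot/1.0)' }, { userAgent: 'Mozilla/5.0 Chrome/124.0' }]
console.log(prefilter(sample).length)
```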
Step 4 — Verify
Send a sample event from your stack and watch the Trakkr Feed. The simplest test: run the curl example above with your real bearer token, then refresh the Feed.
For end-to-end pipeline validation: click Send verification in the Crawler header. This triggers a synthetic fetch of your homepage tagged as GPTBot, which exercises the full Trakkr pipeline (ingest → detector → dashboard).
For a dry run, POST to `https://api.trakkr.ai/crawler-connect/ingest/manual/validate` instead. It takes the same payload and the same auth, but returns a per-entry normalization report without writing to BigQuery. This is useful when you're building a Vector / Fluent Bit pipeline and want to confirm field mapping before switching on live ingestion.
Patterns by source
Vector / Fluent Bit / Filebeat
For Vector, use the http sink:

```toml
[sinks.trakkr]
type = "http"
inputs = ["access_logs_filtered"]
uri = "https://api.trakkr.ai/crawler-connect/ingest/manual"
encoding.codec = "json"
batch.max_events = 100
headers.Authorization = "Bearer trk_<the rest of your token>"
```

A custom edge worker / function
Same as the Cloudflare Worker or Lambda@Edge templates — match User-Agent, build the payload, POST in the background.
A daily cron tailing access.log
```shell
# Follows the live log continuously; for a once-a-day cron job, read yesterday's rotated file instead of tail -F
# Keep AI crawler lines, build payloads, then POST in batches of 100
tail -F /var/log/nginx/access.log \
  | grep --line-buffered -E "(GPTBot|ChatGPT-User|ClaudeBot|Claude-User|PerplexityBot|Perplexity-User|OAI-SearchBot|Bytespider|Amazonbot)" \
  | jq -R 'split(" ") | { ... payload here ... }' \
  | <batch and POST>
```

Troubleshooting
"Webhook returns 401"
Bearer token mismatch. Open the connection card in Trakkr → Reveal to see the full trk_… token and re-copy.
"Webhook returns 400"
The payload structure is off. Most common: sending a single object instead of an array, or missing timestamp / userAgent / url. The response body explains which field failed.
"Events arrive but aren't classified as bots"
Trakkr only stores requests it recognizes as AI crawlers. If you sent a Chrome User-Agent, it'll be silently discarded — that's the design. Confirm the User-Agent field in your payload actually contains GPTBot or another known bot string.
"Token leaked / I want to rotate"
There's no in-app rotate yet. The safe path is: open the connection in Trakkr → Disconnect, then create a fresh Other / Custom source. The new connection gets a brand-new bearer token (and Basic Auth pair). Swap the credentials in your shipper, then verify.
WordPress - install guide
WordPress sites, including those behind Sucuri, Cloudflare, or any WAF.
WordPress crawler tracking uses a small Trakkr plugin that runs at the origin on every request. No front-end scripts, no browser weight, full visibility behind a CDN or WAF.
Four steps, 5 to 10 minutes for most teams.
1. Connect your WordPress site under Integrations → Sites.
2. Enable crawler tracking from the Crawler page.
3. Install the Trakkr plugin in WordPress.
4. Verify.
The plugin reads X-Sucuri-ClientIP, CF-Connecting-IP, True-Client-IP, and similar headers automatically, so country and IP attribution still work. See Trakkr behind a WAF for the full picture, including bot allowlisting.
Step 1. Connect your WordPress site
The Trakkr plugin syncs data over WordPress's REST API. To talk to that API, Trakkr needs an authenticated connection to your site, stored under Sites, separate from crawler tracking itself.
1. In Trakkr, click Integrations in the sidebar, then pick the Sites category (or go straight to `/sites`).
2. Click Connect destination, then pick WordPress.
3. Pick a flow.
Quick Connect (OAuth). Trakkr redirects you to your wp-admin, you approve the connection, and you come back authenticated.
Manual Connection (Application Password). For sites where the OAuth redirect doesn't fit, such as custom security plugins, IP-locked admin areas, or multisite setups.
Creating an Application Password
1. In WordPress admin, go to Users → Profile for the admin user you want Trakkr to act as.
2. Scroll to the Application Passwords section near the bottom.
3. Type `Trakkr` in the name field and click Add New. (The button may render as Add New Application Password on older WordPress versions.)
4. WordPress shows the generated password once, as `xxxx xxxx xxxx xxxx xxxx xxxx`. Copy it now.
5. Back in Trakkr's Manual Connection form, enter the WordPress username and paste the password. Spaces are stripped automatically.
The WordPress user must have the manage_options capability. The default Administrator role has it, and any custom role with that capability granted also works. Editor and lower roles can't reach Trakkr's sync endpoints because the plugin checks for manage_options on every request.

Once the connection succeeds, the WordPress site appears on the Sites page with a green Connected status.
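Because the plugin requires manage_options, it can save a round trip to confirm the account before connecting. This is a minimal Node.js 18+ sketch. It assumes your site exposes the standard REST API under `/wp-json` over HTTPS and allows Application Passwords. It sends the Application Password as Basic Auth and reads the `capabilities` map that `GET /wp/v2/users/me?context=edit` returns for the authenticated user. The helper names are hypothetical.

```javascript
// Minimal sketch: confirm an Application Password belongs to a user with manage_options.
function appPasswordAuth(username, appPassword) {
  // WordPress displays the password in groups of four; drop the spaces before encoding
  const compact = appPassword.replace(/\s+/g, '')
  return 'Basic ' + Buffer.from(`${username}:${compact}`).toString('base64')
}

function hasManageOptions(user) {
  // With context=edit, the users/me response includes a capabilities map
  return Boolean(user && user.capabilities && user.capabilities.manage_options)
}

// Node 18+ (global fetch). siteUrl example: 'https://example.com'
async function checkTrakkrUser(siteUrl, username, appPassword) {
  const res = await fetch(`${siteUrl}/wp-json/wp/v2/users/me?context=edit`, {
    headers: { Authorization: appPasswordAuth(username, appPassword) },
  })
  if (!res.ok) throw new Error(`WordPress REST API returned ${res.status}`)
  return hasManageOptions(await res.json())
}

// Usage: checkTrakkrUser('https://example.com', 'admin', 'xxxx xxxx xxxx xxxx xxxx xxxx').then(console.log)
```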
Step 2. Enable crawler tracking
1. In Trakkr, open the Crawler page from the sidebar. If you already have other sources connected, click Add source in the header to open the stack picker.
2. Find the WordPress chip (it appears once you have a connected WordPress site without a crawler connection yet) and click it.
3. Pick the connected WordPress site from the list.
4. Click Enable tracking.
Trakkr immediately checks whether the Trakkr plugin is already installed on the site. Two possible outcomes:
- Plugin detected and active. You'll see a "WordPress connected" confirmation. Tracking is live. Skip to Step 4.
- Plugin not detected. You'll see an "Almost there" screen with a Download plugin button. Continue to Step 3.
Step 3. Install the plugin
1. In Trakkr's "Almost there" screen, click Download plugin. You'll get a `trakkr-crawler.zip` file.
2. In wp-admin, go to Plugins → Add New → Upload Plugin.
3. Click Choose File, pick the `.zip` you just downloaded, and click Install Now.
4. Once installed, click Activate Plugin.
That's it. No plugin configuration screens, no settings to fill in. The plugin starts detecting AI bot User-Agents on the very next request that hits your site.
Step 4. Verify
Back in Trakkr's Crawler page, click Send test ping in the header. Within 30 seconds, three synthetic events (GPTBot, PerplexityBot, ChatGPT-User) appear in the Feed with a Verified badge. That confirms Trakkr is receiving and rendering events.
Real bot traffic comes in waves, not on a schedule. Most active WordPress sites see their first real GPTBot, ClaudeBot, or PerplexityBot hit within 24 hours of install.
Behind a CDN, WAF, or security plugin
The plugin transparently reads X-Sucuri-ClientIP, CF-Connecting-IP, and similar proxy headers, so country and IP attribution still work behind a WAF. Two failure modes to know about. Heavy CDN caching hides a small fraction of requests from the origin, and security plugins (Sucuri, Wordfence, iThemes) often block AI user-agents by default. Per-platform allowlist recipes and the full cache story are on Trakkr behind a WAF.
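The following illustrative sketch shows how proxy-header attribution works, using the case of picking the visitor IP at an origin behind a proxy. The header list and its order are assumptions for this example, not the Trakkr plugin's actual code. Header names are lowercased, as Node.js delivers them.

```javascript
// Illustrative only: choose the visitor IP from proxy headers, falling back to the
// TCP peer. Only trust these headers when the request really came through your
// proxy; otherwise any client can spoof them.
const PROXY_IP_HEADERS = ['x-sucuri-clientip', 'cf-connecting-ip', 'true-client-ip', 'x-forwarded-for']

function clientIp(headers, remoteAddr) {
  for (const name of PROXY_IP_HEADERS) {
    const value = headers[name]
    // X-Forwarded-For can hold a chain "client, proxy1, proxy2"; the first entry is the client
    if (value) return value.split(',')[0].trim()
  }
  return remoteAddr // no proxy header present: the connecting address is the visitor
}

console.log(clientIp({ 'cf-connecting-ip': '203.0.113.45' }, '172.68.0.1'))
```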
Troubleshooting
Webflow - install guide
Webflow-hosted sites with no access to request logs.
Webflow doesn't expose request logs and doesn't run third-party code at the edge, so a JavaScript pixel inside a Webflow site only sees crawlers that execute JavaScript — and most AI crawlers don't. The fix: put Cloudflare in front of Webflow as a transparent proxy. Webflow keeps serving every page; Cloudflare observes every request, including bot hits Webflow itself can't surface.
How this works
1. You add your domain to Cloudflare and recreate the Webflow DNS records there.
2. You switch your registrar to use Cloudflare's nameservers.
3. Webflow continues to serve every page. Nothing about how your site renders changes, and you don't need to touch the Designer.
4. Trakkr reads bot traffic from Cloudflare's analytics with a scoped read-only token.
The whole flow takes about 15 minutes the first time and zero minutes thereafter.
Before you start
- Admin access to your Webflow project.
- A free or paid Cloudflare account.
- Control over your domain's nameservers at your registrar (Namecheap, GoDaddy, Google Domains, Cloudflare Registrar, etc.).
Step 1 — Note your current Webflow DNS records
1. In Webflow, open Project Settings → Publishing.
2. Note the DNS records for your custom domain. Webflow shows the exact values, typically an A record for the root domain plus a CNAME for the `www` subdomain. Webflow's canonical IPs change over time, so always read them from Webflow's panel rather than copying from this page.
Take a screenshot or copy these somewhere — you'll recreate them in Cloudflare in Step 2.
Step 2 — Add your domain to Cloudflare
1. In Cloudflare, click Add a site and enter your domain.
2. Pick the Free plan. It's enough for crawler tracking.
3. Cloudflare scans your existing DNS records. Some may import automatically; verify they match the values Webflow showed you.
4. Add or correct records until the Cloudflare zone has exactly the same A / CNAME values Webflow published.
5. Make sure each record's proxy status is the orange cloud (Proxied), not grey (DNS only). Cloudflare can only see traffic for proxied records.
6. Cloudflare gives you two nameservers (e.g. `mia.ns.cloudflare.com`, `bob.ns.cloudflare.com`).
Step 3 — Update nameservers at your registrar
1. Log in to your domain registrar.
2. Find the nameserver settings for the domain.
3. Replace the existing nameservers with the two from Cloudflare.
4. Save.
Propagation usually completes in 10 minutes to a few hours. Cloudflare emails you when it sees your domain on its nameservers.
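You can also check delegation yourself instead of waiting for Cloudflare's email. Here is a minimal Node.js sketch using the built-in `dns` module's `resolveNs`. `isCloudflareDelegated` and `checkDelegation` are hypothetical helpers. A "not yet" answer may just mean your resolver still caches the old nameservers.

```javascript
// Minimal sketch: has the domain's NS delegation switched to Cloudflare yet?
const dns = require('dns').promises

function isCloudflareDelegated(nameservers) {
  // Cloudflare-assigned nameservers look like mia.ns.cloudflare.com
  return nameservers.length > 0 &&
    nameservers.every((ns) => ns.toLowerCase().replace(/\.$/, '').endsWith('.ns.cloudflare.com'))
}

async function checkDelegation(domain) {
  const nameservers = await dns.resolveNs(domain)
  return { nameservers, onCloudflare: isCloudflareDelegated(nameservers) }
}

// Usage: checkDelegation('example.com').then(console.log)
```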
Step 4 — Connect Webflow in Trakkr
1. In Trakkr, open the Crawler page and pick Webflow under Hosted CMS (or use the picker on the empty state).
2. The modal walks you through three short steps: Overview, DNS, Connect. If you already pointed your domain at Cloudflare above, tick "I already have Cloudflare in front of this domain" on the Overview step to skip the DNS reminder.
3. Open the pre-filled Cloudflare token template the modal links to, copy the token, and paste it back. Pick the zone that fronts your Webflow site.
Trakkr starts pulling crawler data within a few minutes. The connection appears as "yoursite.com via Webflow" in your Connections list.
Troubleshooting
"I don't want to move nameservers"
Cloudflare's free plan requires using its nameservers. If that's not an option, the alternatives are:
- A different CMS that supports custom edge code (Webflow doesn't on standard plans).
- The Webflow → Cloudflare CNAME setup with partial DNS, only available on Cloudflare Business and Enterprise. Contact Cloudflare sales if you need this.
"Site went down during the switch"
Almost always a record mismatch. Compare your Cloudflare DNS to Webflow's Project Settings → Publishing panel exactly — the IPs there are the source of truth, not anything copied from elsewhere.
"Cloudflare shows traffic but Trakkr Feed is empty"
Confirm the records are proxied (orange cloud), not DNS-only (grey cloud). DNS-only zones don't generate analytics.
"Can I keep my existing JavaScript pixel?"
Yes. A JS pixel inside a Webflow site catches crawlers that execute JavaScript (Perplexity-User, ChatGPT-User in some cases). It misses GPTBot, ClaudeBot, Bytespider, and most training crawlers, which don't run JS. The Cloudflare proxy method catches every crawler server-side. If both are running, Trakkr deduplicates events so you won't double-count.
Shopify - install guide
Shopify storefronts on standard plans.
Shopify doesn't expose request logs to merchants on standard plans, and third-party edge code isn't supported on the storefront. The fix: put Cloudflare in front of Shopify as a transparent proxy — Shopify keeps serving every product and content page, Cloudflare observes every request, including AI crawler hits that Shopify itself can't show you.
How this works
1. You add your domain to Cloudflare and recreate the Shopify DNS records there.
2. You switch your registrar to use Cloudflare's nameservers.
3. Shopify continues to serve every storefront page. Theme rendering, Liquid templates, and the rest of the merchant experience are unchanged.
4. Trakkr reads bot traffic from Cloudflare's analytics with a scoped read-only token.
The whole flow takes about 15 minutes the first time and zero minutes thereafter.
Before you start
- Shopify Admin access.
- A primary domain connected to Shopify via DNS, not transferred to Shopify. If your domain was bought through Shopify and they hold it, you can still do this — see the note in Step 1.
- A free or paid Cloudflare account.
- Control over your domain's nameservers at your registrar.
Step 1 — Check your current Shopify DNS
1. In Shopify Admin, go to Settings → Domains.
2. Confirm your primary domain is connected via DNS (not transferred). Shopify shows you the exact A record (for the root domain) and CNAME (for `www`) that your domain should use. Read those values from Shopify directly. Shopify's canonical IPs change over time, and copying a stale one will take your store offline.
Note the values Shopify displays — you'll recreate them exactly in Cloudflare.
Bought your domain through Shopify?
Shopify-managed domains can't use external nameservers, so the standard flow doesn't apply directly. Two options:
1. Transfer the domain out of Shopify to a registrar that supports custom nameservers (Cloudflare Registrar is the simplest). Then follow the standard flow below.
2. Contact Cloudflare for partial DNS (CNAME setup, Business+ plan). Shopify-managed DNS can host a CNAME pointing into Cloudflare, but it's a more involved setup.
Step 2 — Add your domain to Cloudflare
- 1.In Cloudflare, Add a site → enter your domain.
- 2.Pick the Free plan — it's enough for crawler tracking.
- 3.Cloudflare imports existing DNS records. Verify each one matches the values Shopify showed you; add or correct anything that doesn't.
- 4.Make sure each record's proxy status is the orange cloud (Proxied), not grey.
- 5.Cloudflare gives you two nameservers.
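If you'd rather not compare the imported records by eye, the check in item 3 can be scripted. A minimal Python sketch, assuming you've typed the values from Shopify Admin into `expected` and copied the Cloudflare records into `actual` by hand. `DnsRecord`, `diff_records`, and every sample value (documentation IPs, `shops.example.net`) are hypothetical placeholders, not Shopify's real targets.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class DnsRecord(NamedTuple):
    type: str     # "A", "AAAA", "CNAME", ...
    name: str     # fully qualified hostname, e.g. "www.example.com"
    content: str  # IP address or CNAME target


def _norm(value: str) -> str:
    # DNS names are case-insensitive and may carry a trailing root dot.
    return value.strip().lower().rstrip(".")


def diff_records(expected: List[DnsRecord],
                 actual: List[DnsRecord]) -> Tuple[List[DnsRecord], List[DnsRecord]]:
    """Return (missing, mismatched): expected records with no (type, name)
    match in `actual`, and expected records whose content differs."""
    seen = defaultdict(set)  # (type, name) -> set of normalized contents
    for rec in actual:
        seen[(rec.type.upper(), _norm(rec.name))].add(_norm(rec.content))

    missing, mismatched = [], []
    for want in expected:
        key = (want.type.upper(), _norm(want.name))
        if key not in seen:
            missing.append(want)
        elif _norm(want.content) not in seen[key]:
            mismatched.append(want)
    return missing, mismatched


# Placeholder values only: read the real ones from Shopify Admin.
expected = [
    DnsRecord("A", "example.com", "192.0.2.10"),
    DnsRecord("CNAME", "www.example.com", "shops.example.net"),
]
actual = [
    DnsRecord("A", "example.com", "192.0.2.99"),         # stale IP
    DnsRecord("MX", "example.com", "mail.example.com"),  # unrelated record
]
missing, mismatched = diff_records(expected, actual)
print("missing:", missing)
print("mismatched:", mismatched)
```

Two empty lists mean Cloudflare mirrors what Shopify expects. Matching on (type, name) rather than row order lets several A records on the same name compare correctly.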
Step 3 — Update nameservers at your registrar
- 1.In your registrar dashboard, find the nameserver settings for the domain.
- 2.Replace the existing ones with the two from Cloudflare.
- 3.Save.
DNS propagation takes minutes to hours. Cloudflare emails you when your domain is active.
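You can also confirm the switch yourself instead of waiting for the email. A minimal sketch, assuming you pass it the text printed by `dig +short NS example.com` (optionally queried against a public resolver such as `@1.1.1.1`). Cloudflare assigns nameservers under `ns.cloudflare.com`; the function name and the sample nameserver names below are illustrative.

```python
def nameservers_on_cloudflare(dig_output: str) -> bool:
    """True when every NS host in `dig +short NS <domain>` output is a
    Cloudflare nameserver (*.ns.cloudflare.com)."""
    hosts = [
        line.strip().lower().rstrip(".")
        for line in dig_output.splitlines()
        if line.strip()
    ]
    # An empty answer means the lookup failed or hasn't propagated: not ready.
    return bool(hosts) and all(h.endswith(".ns.cloudflare.com") for h in hosts)


# Text captured from `dig +short NS example.com` (names are illustrative).
print(nameservers_on_cloudflare("ada.ns.cloudflare.com.\nbob.ns.cloudflare.com.\n"))
```

A mixed answer means either propagation is still in progress or one of the old nameservers was left in place at the registrar.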
Step 4 — Connect Shopify in Trakkr
- 1.In Trakkr, open the Crawler page and pick Shopify under Hosted CMS (or use the picker on the empty state).
- 2.The modal walks you through three short steps — Overview, DNS, Connect. If you already pointed your domain at Cloudflare above, tick "I already have Cloudflare in front of this domain" on the Overview step to skip the DNS reminder.
- 3.Open the pre-filled Cloudflare token template the modal links to, create the token, copy it, and paste it back into the modal. Pick the zone that fronts your storefront.
Trakkr starts pulling crawler data within a few minutes. The connection appears as "yourstore.com via Shopify" in your Connections list.
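If the modal rejects the token, you can test it directly against Cloudflare's token-verify endpoint (`GET /client/v4/user/tokens/verify`, which applies to user-level API tokens). A minimal standard-library sketch; `build_verify_request`, `token_is_active`, and `verify_token` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

VERIFY_URL = "https://api.cloudflare.com/client/v4/user/tokens/verify"


def build_verify_request(token: str) -> urllib.request.Request:
    """GET request that asks Cloudflare whether `token` is valid."""
    return urllib.request.Request(
        VERIFY_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_is_active(body: dict) -> bool:
    """Interpret the JSON body the verify endpoint returns."""
    result = body.get("result") or {}
    return bool(body.get("success")) and result.get("status") == "active"


def verify_token(token: str) -> bool:
    """Call the endpoint (needs network access)."""
    try:
        with urllib.request.urlopen(build_verify_request(token), timeout=10) as resp:
            return token_is_active(json.load(resp))
    except urllib.error.HTTPError:
        # Cloudflare answers invalid tokens with a 4xx status, which urlopen raises.
        return False
```

A `True` from `verify_token("<your token>")` confirms only that the token is valid; it does not check which permissions or zones the token covers.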
Troubleshooting
"Storefront broke after the switch"
Almost always a record mismatch. Compare your Cloudflare DNS to Shopify Admin → Settings → Domains exactly. The canonical A/CNAME values shown there are the source of truth.
"Cloudflare shows traffic but Trakkr Feed is empty"
Confirm records are proxied (orange cloud). DNS-only zones produce no analytics for Trakkr to read.
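To find grey-cloud records across a whole zone, list them through Cloudflare's DNS records API (`GET /client/v4/zones/{zone_id}/dns_records`, using an API token that can read DNS for the zone) and look at each record's `proxied` flag. A minimal sketch, assuming you already have the parsed JSON response; the function name and sample data are hypothetical.

```python
from typing import List

# Only these record types can be orange-clouded.
PROXYABLE_TYPES = {"A", "AAAA", "CNAME"}


def dns_only_web_records(api_body: dict) -> List[str]:
    """List A/AAAA/CNAME records that are DNS-only (grey cloud)."""
    return [
        f"{rec['type']} {rec['name']}"
        for rec in api_body.get("result", [])
        if rec.get("type") in PROXYABLE_TYPES and not rec.get("proxied", False)
    ]


sample = {
    "success": True,
    "result": [
        {"type": "A", "name": "example.com", "content": "192.0.2.10", "proxied": True},
        {"type": "CNAME", "name": "www.example.com", "content": "shops.example.net", "proxied": False},
        {"type": "MX", "name": "example.com", "content": "mail.example.com", "proxied": False},
    ],
}
print(dns_only_web_records(sample))
```

Not every hit needs fixing: hostnames used only for mail or other non-HTTP services are often meant to stay DNS-only. The hostnames that serve your storefront should be proxied.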
"Shopify Admin says my domain isn't connected"
This can happen if the Cloudflare records don't exactly mirror Shopify's expected values. Visit https://yourstore.com from an incognito browser — if it loads your store, the connection is fine and Shopify Admin's status is stale. Re-verify in Admin after Cloudflare DNS fully propagates.
HubSpot - install guide
HubSpot CMS sites without edge log access.
HubSpot CMS doesn't expose raw edge request logs and doesn't run third-party code at the edge. The fix: put Cloudflare in front of HubSpot as a transparent proxy. HubSpot keeps serving every page; Cloudflare observes every request, including AI crawler hits HubSpot can't show you.
How this works
- 1.You add your domain to Cloudflare and recreate the HubSpot DNS records there.
- 2.You switch your registrar to use Cloudflare's nameservers.
- 3.HubSpot continues to serve every page — landing pages, website pages, blog posts, knowledge base, all unchanged.
- 4.Trakkr reads bot traffic from Cloudflare's analytics with a scoped read-only token.
The whole flow takes about 15 minutes the first time and zero minutes thereafter.
Before you start
- HubSpot CMS admin access for the site whose domain you want to track.
- A free or paid Cloudflare account.
- Control over your domain's nameservers at your registrar.
Step 1 — Note your HubSpot DNS records
- 1.In HubSpot, go to Settings → Website → Domains & URLs.
- 2.Note the records HubSpot is using for the connected domain. HubSpot shows the exact CNAME target hostname (it's account-specific and changes over time) — read it from HubSpot rather than copying from this page.
- 3.Note whether you're connecting the root domain (example.com), a subdomain (www.example.com or blog.example.com), or both. HubSpot configures each separately and you'll mirror that in Cloudflare.
example.com and Cloudflare will handle the rest. If you're only tracking a subdomain like www.example.com, this isn't a concern.
Step 2 — Add your domain to Cloudflare
- 1.In Cloudflare, Add a site → enter your domain → pick the Free plan.
- 2.Cloudflare imports existing records. Verify each one matches the values HubSpot showed you in Step 1; add or correct anything missing.
- 3.Set the proxy status for each record to the orange cloud (Proxied). Cloudflare can only generate analytics for proxied records.
- 4.Cloudflare gives you two nameservers.
Step 3 — Update nameservers at your registrar
- 1.At your registrar, replace the existing nameservers with the two from Cloudflare.
- 2.Save.
DNS propagation takes minutes to hours.
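Rather than re-checking by hand, you can poll any readiness check with exponential backoff. A minimal sketch; `wait_until` is a hypothetical helper, and its default timings (first retry after 30 s, at most 15 min between checks, give up after 6 h) are arbitrary choices, not HubSpot or Cloudflare guidance. `check` can be any function that returns `True` once the domain is live, for example one that looks for `ns.cloudflare.com` hosts in `dig +short NS` output.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 6 * 3600,
               first_delay_s: float = 30.0,
               max_delay_s: float = 900.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `check` until it returns True or `timeout_s` elapses.

    The wait doubles after each failed attempt, capped at `max_delay_s`.
    `sleep` and `clock` are injectable so the loop can be tested instantly.
    """
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

Backoff keeps the early checks frequent, when a fast propagation is likely, without hammering resolvers for hours if it's slow.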
Step 4 — Connect HubSpot in Trakkr
- 1.In Trakkr, open the Crawler page and pick HubSpot under Hosted CMS (or use the picker on the empty state).
- 2.The modal walks you through three short steps — Overview, DNS, Connect. If you already pointed your domain at Cloudflare above, tick "I already have Cloudflare in front of this domain" on the Overview step to skip the DNS reminder.
- 3.Open the pre-filled Cloudflare token template the modal links to, create the token, copy it, and paste it back into the modal. Pick the zone that fronts your HubSpot site.
Trakkr starts pulling crawler data within a few minutes. The connection appears as "yoursite.com via HubSpot" in your Connections list.
track.hubspot.com domain) is unaffected by this setup.
Troubleshooting
"I don't want to move nameservers"
Cloudflare's free plan requires using its nameservers. If that's not an option, the alternative is the partial DNS / CNAME setup on Cloudflare Business and Enterprise — Cloudflare can sit in front of HubSpot via a CNAME, without taking over the zone. Contact Cloudflare sales if you need that path.
"HubSpot pages serving in the wrong language or wrong locale"
HubSpot uses request headers to choose locale variants. Cloudflare's default settings preserve these, but aggressive Cloudflare caching can end up serving cached non-localized pages. Keep Cloudflare's caching at Cache Level: Standard (the default) and HubSpot's locale routing works normally.
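To confirm the zone is still on Standard without opening the dashboard, read the setting through Cloudflare's zone settings API (`GET /client/v4/zones/{zone_id}/settings/cache_level`). The API uses different names from the dashboard: "Standard" is reported as `aggressive`. A minimal sketch, assuming an API token that can read zone settings; the helper names are hypothetical.

```python
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"

# Cloudflare API values -> labels shown in the dashboard.
CACHE_LEVEL_LABELS = {
    "aggressive": "Standard",
    "basic": "No query string",
    "simplified": "Ignore query string",
}


def cache_level_request(zone_id: str, token: str) -> urllib.request.Request:
    """GET request for the zone's current cache level."""
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/settings/cache_level",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def describe_cache_level(body: dict) -> str:
    """Turn the API's JSON response into the dashboard label."""
    value = (body.get("result") or {}).get("value", "")
    return CACHE_LEVEL_LABELS.get(value, f"unrecognized ({value!r})")
```

Send the request with `urllib.request.urlopen`, parse the body with `json.load`, and pass it to `describe_cache_level`; anything other than "Standard" is worth a look if locales are misbehaving.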
"Cloudflare shows traffic but Trakkr Feed is empty"
Confirm records are proxied (orange cloud). DNS-only mode doesn't generate the analytics Trakkr needs.
"HubSpot's domain connection check fails after the switch"
Run the HubSpot domain verification again after Cloudflare's nameservers fully propagate. If a record is missing or the proxy is grey-cloud, HubSpot may not see the expected SSL chain — fix the record, wait for propagation, retry.
Squarespace - install guide
Squarespace sites with a domain you can repoint.
Squarespace doesn't surface server logs and doesn't run third-party edge code on standard plans, so a JavaScript pixel inside a Squarespace site misses most AI crawlers. The fix: put Cloudflare in front of Squarespace as a transparent proxy. Squarespace keeps building and serving every page; Cloudflare observes every request, including the bots Squarespace itself doesn't show you.
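Why a pixel misses these visits: most AI crawlers fetch the raw HTML and never run the page's JavaScript, so a client-side script never fires. A proxy sees the `User-Agent` header on every request instead. A minimal sketch of matching that header; the token list is a small, illustrative sample of publicly documented crawler names, not Trakkr's classification list, and User-Agents can be spoofed.

```python
from typing import Optional

# Illustrative sample only; real crawler lists are longer and change often.
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "CCBot",
)


def ai_crawler_name(user_agent: str) -> Optional[str]:
    """Return the first crawler token found in a User-Agent string, or None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "GPTBot/1.1; +https://openai.com/gptbot)")
print(ai_crawler_name(ua))  # GPTBot
```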
How this works
- 1.You add your domain to Cloudflare and recreate the Squarespace DNS records there.
- 2.You switch your registrar to use Cloudflare's nameservers.
- 3.Squarespace continues to serve every page — builder, member areas, commerce, all unchanged.
- 4.Trakkr reads bot traffic from Cloudflare's analytics with a scoped read-only token.
The whole flow takes about 20 minutes the first time and zero minutes thereafter. If your domain was bought through Squarespace, it takes longer than the other CMSes because you'll transfer it out first.
Before you start
- Squarespace admin access.
- A domain you can repoint to Cloudflare nameservers. Squarespace-purchased domains require an extra step — see Step 1.
- A free or paid Cloudflare account.
Step 1 — Use external DNS in Squarespace
- 1.In Squarespace, open Settings → Domains.
- 2.If you bought the domain elsewhere (Namecheap, GoDaddy, etc.), you're already using external DNS — note the A and CNAME records Squarespace gave you to point at their servers. The exact values are shown inside the Squarespace panel.
- 3.If you bought the domain through Squarespace, it can't use external nameservers while Squarespace manages it. Transfer the domain to a registrar that supports custom nameservers (Cloudflare Registrar is the cleanest) before continuing. Squarespace has a step-by-step guide in their help center; search for "transfer a domain away from Squarespace".
Step 2 — Add your domain to Cloudflare
- 1.In Cloudflare, Add a site → enter the domain → pick the Free plan.
- 2.Cloudflare imports existing records. Verify each one matches what Squarespace displayed in Step 1; add or correct anything missing.
- 3.Set proxy status to the orange cloud (Proxied) on each record.
- 4.Cloudflare gives you two nameservers.
Step 3 — Update nameservers at your registrar
- 1.Find the nameserver settings for the domain at the registrar (after transferring out of Squarespace if needed).
- 2.Replace the existing ones with the two from Cloudflare.
- 3.Save.
DNS propagation takes minutes to hours.
Step 4 — Connect Squarespace in Trakkr
- 1.In Trakkr, open the Crawler page and pick Squarespace under Hosted CMS (or use the picker on the empty state).
- 2.The modal walks you through three short steps — Overview, DNS, Connect. If you already pointed your domain at Cloudflare above, tick "I already have Cloudflare in front of this domain" on the Overview step to skip the DNS reminder.
- 3.Open the pre-filled Cloudflare token template the modal links to, create the token, copy it, and paste it back into the modal. Pick the zone that fronts your Squarespace site.
Trakkr starts pulling crawler data within a few minutes. The connection appears as "yoursite.com via Squarespace" in your Connections list.
Troubleshooting
"I can't change nameservers because my domain is managed by Squarespace"
You'll need to transfer the domain out of Squarespace first. Cloudflare Registrar is a common destination — at-cost pricing and custom nameservers supported natively. Once transferred, repoint nameservers to Cloudflare.
"Site started serving a Squarespace 'website not found' page"
The A or CNAME records in Cloudflare don't match Squarespace's expected values. Open Squarespace's domain settings, confirm the canonical A/CNAME, copy exactly into Cloudflare, and wait for propagation.
"Cloudflare shows traffic but Trakkr Feed is empty"
Confirm records are proxied (orange cloud). DNS-only zones don't produce analytics.
Wix - install guide
Wix sites where you control DNS.
Wix doesn't expose raw request logs and doesn't run third-party edge code, so client-side scripts miss most AI crawlers. The fix: put Cloudflare in front of Wix as a transparent proxy. Wix keeps serving every page; Cloudflare observes every request, including AI crawler hits Wix itself can't show you.
How this works
- 1.You add your domain to Cloudflare and recreate the Wix DNS records there.
- 2.You switch your registrar to use Cloudflare's nameservers.
- 3.Wix continues to serve every page — Editor, ADI, stores, members area, all unchanged.
- 4.Trakkr reads bot traffic from Cloudflare's analytics with a scoped read-only token.
The whole flow takes about 15 minutes the first time and zero minutes thereafter.
Before you start
- Wix Editor access for the site whose domain you want to track.
- A domain you control DNS for. Wix-bought domains can also work — see Step 1.
- A free or paid Cloudflare account.
Step 1 — Note your Wix DNS records
- 1.In Wix, go to Settings → Domains.
- 2.Note the A and CNAME records for the connected domain. The exact values are shown inside the Wix panel — read them from there rather than copying from this page.
If the domain was purchased through Wix, you can still proxy through Cloudflare — Wix lets you change nameservers from the domain management screen.
Step 2 — Add your domain to Cloudflare
- 1.In Cloudflare, Add a site → enter your domain → pick the Free plan.
- 2.Cloudflare imports existing records. Verify each one matches the values Wix showed you in Step 1, and add or correct anything missing.
- 3.Set proxy status to the orange cloud (Proxied).
- 4.Cloudflare gives you two nameservers.
Step 3 — Update nameservers at your registrar
- 1.Replace the existing nameservers with Cloudflare's two.
- 2.Save.
DNS propagation takes minutes to hours.
Step 4 — Connect Wix in Trakkr
- 1.In Trakkr, open the Crawler page and pick Wix under Hosted CMS (or use the picker on the empty state).
- 2.The modal walks you through three short steps — Overview, DNS, Connect. If you already pointed your domain at Cloudflare above, tick "I already have Cloudflare in front of this domain" on the Overview step to skip the DNS reminder.
- 3.Open the pre-filled Cloudflare token template the modal links to, create the token, copy it, and paste it back into the modal. Pick the zone that fronts your Wix site.
Trakkr starts pulling crawler data within a few minutes. The connection appears as "yoursite.com via Wix" in your Connections list.
Troubleshooting
"Wix Editor says my site isn't published"
Cloudflare doesn't affect Wix's publishing state. If the Editor shows "not published" after the switch, click Publish in Wix — the orange-cloud Cloudflare proxy will serve the updated site as soon as Wix's origin does.
"Cloudflare shows traffic but Trakkr Feed is empty"
Confirm records are proxied (orange cloud). DNS-only zones don't generate analytics.
"I see fewer visits than I expected"
Wix's CDN caches many pages aggressively. AI crawlers typically request URLs with custom User-Agents that skip the cache and reach the proxied origin, so coverage is usually high. If you need lossless capture on a very heavily cached site, deploy the optional Trakkr Cloudflare Worker alongside this proxy.
Framer - install guide
Framer-hosted sites with a custom domain.
Framer doesn't expose request logs and doesn't run third-party edge code on its hosting plans, so a JavaScript pixel inside a Framer site misses non-JS crawlers. The fix: put Cloudflare in front of Framer as a transparent proxy. Framer keeps serving every page; Cloudflare observes every request, including the AI crawlers Framer itself can't show you.
How this works
- 1.You add your domain to Cloudflare and recreate the Framer DNS records there.
- 2.You switch your registrar to use Cloudflare's nameservers.
- 3.Framer continues to serve every page — design changes still publish normally, and you don't touch the editor.
- 4.Trakkr reads bot traffic from Cloudflare's analytics with a scoped read-only token.
The whole flow takes about 15 minutes the first time and zero minutes thereafter.
Before you start
- Framer admin access to the project whose domain you want to track.
- A free or paid Cloudflare account.
- Control over your domain's nameservers at your registrar.
Step 1 — Note your Framer DNS records
- 1.In Framer, open Site Settings → Domains.
- 2.Copy the CNAME record exactly as Framer shows it. Framer's custom-domain target hostname can change over time, so always read it from the Framer panel rather than copying from this page.
Step 2 — Add your domain to Cloudflare
- 1.In Cloudflare, Add a site → enter the domain → pick the Free plan.
- 2.Cloudflare imports existing DNS records. Verify the Framer CNAME target matches what Framer showed you in Step 1; add or correct it if needed.
- 3.Set the proxy status to the orange cloud (Proxied). DNS-only mode means Cloudflare doesn't generate the analytics signal Trakkr needs.
- 4.Cloudflare gives you two nameservers.
Step 3 — Update nameservers at your registrar
- 1.Find your domain in the registrar's dashboard.
- 2.Replace the existing nameservers with Cloudflare's two.
- 3.Save.
DNS propagation takes minutes to hours.
Step 4 — Connect Framer in Trakkr
- 1.In Trakkr, open the Crawler page and pick Framer under Hosted CMS (or use the picker on the empty state).
- 2.The modal walks you through three short steps — Overview, DNS, Connect. If you already pointed your domain at Cloudflare above, tick "I already have Cloudflare in front of this domain" on the Overview step to skip the DNS reminder.
- 3.Open the pre-filled Cloudflare token template the modal links to, create the token, copy it, and paste it back into the modal. Pick the zone that fronts your Framer site.
Trakkr starts pulling crawler data within a few minutes. The connection appears as "yoursite.com via Framer" in your Connections list.
Troubleshooting
"Framer site shows the Cloudflare error page"
The CNAME in Cloudflare doesn't match Framer's canonical record. Copy the exact CNAME from Framer's domain settings, paste it into Cloudflare, and wait for propagation.
"Site loads but with Framer's default subdomain instead of my custom one"
Custom-domain hosting on Framer requires the CNAME to point at the exact target shown in Framer's Site Settings → Domains. If Cloudflare has a different target, Framer falls back to its default subdomain. Fix the CNAME target to match the Framer panel.
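Once a record is orange-clouded, public DNS lookups return Cloudflare's own addresses rather than your CNAME target, so `dig` can't tell you whether the target is right. Check the record inside Cloudflare instead, for example in the JSON from the DNS records API (`GET /client/v4/zones/{zone_id}/dns_records`). A minimal sketch; `cname_status` and the targets `framer-target.example.net` and `old-target.example.net` are hypothetical placeholders, so compare against the hostname Framer shows you.

```python
def _norm(host: str) -> str:
    # Hostnames compare case-insensitively; ignore a trailing root dot.
    return host.strip().lower().rstrip(".")


def cname_status(api_body: dict, hostname: str, expected_target: str) -> str:
    """Return 'ok', 'missing', or 'wrong target: <value>' for one hostname."""
    for rec in api_body.get("result", []):
        if rec.get("type") == "CNAME" and _norm(rec.get("name", "")) == _norm(hostname):
            actual = rec.get("content", "")
            if _norm(actual) == _norm(expected_target):
                return "ok"
            return f"wrong target: {actual}"
    return "missing"


records = {
    "result": [
        {"type": "CNAME", "name": "www.example.com",
         "content": "old-target.example.net", "proxied": True},
    ]
}
print(cname_status(records, "www.example.com", "framer-target.example.net"))
```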
"Cloudflare shows traffic but Trakkr Feed is empty"
Confirm records are proxied (orange cloud), not DNS-only (grey). DNS-only zones produce no analytics.
Ghost - install guide
Ghost(Pro) and self-hosted Ghost without origin log access.
Ghost(Pro) doesn't give you raw access logs and doesn't run third-party code at the edge. The fix: put Cloudflare in front of Ghost as a transparent proxy. Ghost keeps publishing pages from its origin; Cloudflare observes every request so Trakkr can see crawler hits Ghost itself doesn't expose.
This guide covers both Ghost(Pro) (managed hosting) and self-hosted Ghost where you don't have access to the underlying server logs.
How this works
- 1.You add your domain to Cloudflare and recreate the Ghost DNS records there.
- 2.You switch your registrar to use Cloudflare's nameservers.
- 3.Ghost continues to serve every page — publishing, members, newsletters, all unchanged.
- 4.Trakkr reads bot traffic from Cloudflare's analytics with a scoped read-only token.
The whole flow takes about 15 minutes the first time and zero minutes thereafter.
Before you start
- Ghost Admin access for the publication whose domain you want to track.
- A free or paid Cloudflare account.
- Control over your domain's nameservers at your registrar.
Step 1 — Note your Ghost DNS records
- 1.In Ghost Admin, go to Settings → Custom domain.
- 2.Note the record Ghost is using. Ghost(Pro) typically uses a CNAME pointing to <your-subdomain>.ghost.io; self-hosted Ghost uses an A record pointing at your server IP. Read the exact values from the panel rather than copying from this page.
Step 2 — Add your domain to Cloudflare
- 1.In Cloudflare, Add a site → enter the domain → pick the Free plan.
- 2.Cloudflare imports existing DNS records. Verify each one matches what Ghost displayed in Step 1; add or correct anything missing.
- 3.Set proxy status to the orange cloud (Proxied).
- 4.Cloudflare gives you two nameservers.
Step 3 — Update nameservers at your registrar
- 1.Replace the existing nameservers with Cloudflare's two.
- 2.Save.
DNS propagation takes minutes to hours.
Step 4 — Connect Ghost in Trakkr
- 1.In Trakkr, open the Crawler page and pick Ghost under Hosted CMS (or use the picker on the empty state).
- 2.The modal walks you through three short steps — Overview, DNS, Connect. If you already pointed your domain at Cloudflare above, tick "I already have Cloudflare in front of this domain" on the Overview step to skip the DNS reminder.
- 3.Open the pre-filled Cloudflare token template the modal links to, create the token, copy it, and paste it back into the modal. Pick the zone that fronts your Ghost site.
Trakkr starts pulling crawler data within a few minutes. The connection appears as "yoursite.com via Ghost" in your Connections list.
Troubleshooting
"Ghost members area broke after Cloudflare"
Ghost's members and newsletter features use specific request paths and headers. Cloudflare's defaults preserve these, but aggressive caching or Rocket Loader can interfere. Disable Rocket Loader on the Ghost zone (Cloudflare → Speed → Optimization) and keep cache level at Standard.
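If you manage several Ghost zones, the same change can be made through Cloudflare's zone settings API (`PATCH /client/v4/zones/{zone_id}/settings/rocket_loader` with the body `{"value": "off"}`). This needs an API token that can edit zone settings; the read-only token you gave Trakkr can't make this change. A minimal sketch that only builds the request; `zone_setting_patch`, `ZONE_ID`, and `API_TOKEN` are hypothetical placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def zone_setting_patch(zone_id: str, token: str,
                       setting: str, value: str) -> urllib.request.Request:
    """Build a PATCH request that sets one zone setting, e.g. rocket_loader=off."""
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/settings/{setting}",
        data=json.dumps({"value": value}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


req = zone_setting_patch("ZONE_ID", "API_TOKEN", "rocket_loader", "off")
# To apply it: urllib.request.urlopen(req)
```

The same helper covers the cache-level advice. Cloudflare's API calls the dashboard's Standard level `aggressive`, so pass `"cache_level", "aggressive"`.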
"Cloudflare shows traffic but Trakkr Feed is empty"
Confirm records are proxied (orange cloud). DNS-only mode doesn't generate analytics.
"Self-hosted and want per-request capture"
If you're self-hosting Ghost behind Nginx, the Nginx / OpenResty method gives you per-request capture without relying on Cloudflare's sampled analytics. You can pair both — Trakkr deduplicates events at ingest.
