Crawler Tracking
Connect Cloudflare, Vercel, Netlify, Next.js, CloudFront, WordPress, Wix, Node, Nginx, or webhook-based edge stacks to send AI crawler visits into Trakkr.
- Pick the right server-side capture path for your site
- Connect Cloudflare, Vercel, Netlify, Next.js, AWS CloudFront, WordPress, Wix, Webflow, Shopify, Squarespace, Framer, Ghost, Node / Express, Nginx / OpenResty, or Akamai / Fastly / other webhook sources
- Verify the pipeline works in 30 seconds with the synthetic verification ping
- Backfill historical visits automatically when you connect a server-side platform
This page is the install hub. If you want to understand what crawler tracking is and how to read your data, start with AI Crawlers. If you're here to wire it up, you're in the right place.
Trakkr captures crawler visits from server-side sources: your CDN, hosting provider, edge function, CMS proxy, or log forwarder. This is more accurate than browser-based tracking because many AI bots do not execute JavaScript and some are blocked before a page loads.
Older tracking-pixel installs continue to report data, but new setups should use one of the server-side sources below. If you are maintaining an existing pixel install, keep it in place until your server-side source is connected and verified.
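As a rough illustration of why server-side capture works where JavaScript tracking doesn't: AI crawlers identify themselves in the `User-Agent` header of every request, so a CDN or origin server can match them without any script executing in a browser. The sketch below is illustrative only; the bot names are examples, not Trakkr's actual match list.

```typescript
// Illustrative sketch: match a request's User-Agent against known AI
// crawler names. This pattern list is an example, not Trakkr's full list.
const AI_CRAWLER_PATTERNS: RegExp[] = [
  /GPTBot/i,          // OpenAI
  /ClaudeBot/i,       // Anthropic
  /PerplexityBot/i,   // Perplexity
  /Google-Extended/i, // Google AI training opt-out agent
];

function matchAiCrawler(userAgent: string): string | null {
  for (const pattern of AI_CRAWLER_PATTERNS) {
    const m = userAgent.match(pattern);
    if (m) return m[0]; // the matched bot name
  }
  return null; // not a known AI crawler
}
```

Because this check runs at the CDN or origin, it also sees bots that your WAF later blocks and requests for URLs that 404.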
Server-side platform connections
Server-side integrations read from your CDN, host, or log drain directly. They see every request that hits your origin - including non-JS bots, blocked bots, and 404s on URLs that no longer exist.
| Platform | Auth | Realtime | Plan requirements |
|---|---|---|---|
| Cloudflare | API token | No | All Cloudflare plans |
| Vercel | OAuth | Yes | Vercel Pro or Enterprise |
| Netlify | OAuth | Yes | All Netlify plans |
| Next.js self-hosted | Webhook | Yes | Any self-hosted Next.js deployment |
| AWS CloudFront | Webhook | Yes | Lambda@Edge on any CloudFront distribution |
| WordPress | Existing adapter | No | Trakkr WordPress plugin |
| Hosted CMS | Cloudflare proxy | No | Wix, Webflow, Shopify, Squarespace, Framer, Ghost |
| Node / Express | Webhook | Yes | Any Node or Express server |
| Nginx / OpenResty | Webhook | Yes | OpenResty or nginx with a log shipper |
| Akamai / Fastly / Other | Webhook | Yes | Any CDN or edge stack that can POST visits |
Dedicated guides exist for Cloudflare, Vercel, Netlify, and WordPress. The other webhook-based runtimes are configured directly in the in-app setup flow with copy-paste templates:
- Cloudflare Setup - Create a scoped read-only API token in Cloudflare and paste it into Trakkr
- Vercel Setup - OAuth into Vercel and let Trakkr install a Log Drain
- Netlify Setup - OAuth into Netlify and let Trakkr deploy an Edge Function
- WordPress Setup - Enable crawler tracking on a connected WordPress site through the Trakkr plugin
- Wix, Webflow, Shopify, Squarespace, Framer, Ghost - Choose your CMS in the setup flow, proxy the site through Cloudflare, then connect the Cloudflare zone
- Next.js self-hosted - Copy the Proxy or middleware snippet from the setup flow and redeploy
- AWS CloudFront - Copy the Lambda@Edge template and attach it to Origin Request in CloudFront
- Node / Express - Copy the Express middleware snippet and mount it near the top of your app
- Nginx / OpenResty - Copy the OpenResty log hook or ship JSON access logs into the webhook
- Akamai / Fastly / Other - Use the webhook examples for Akamai DataStream, Fastly log streaming, or your own edge forwarder
Webhook runtimes and edge forwarders
If you're on Next.js, CloudFront, Express, Nginx / OpenResty, Akamai, Fastly, or a custom server, you can still get server-side tracking via Trakkr's webhook ingest path.
1. Open Crawler → Connect platform
2. Choose Next.js self-hosted, AWS CloudFront, Node / Express, Nginx / OpenResty, or Akamai / Fastly / Other
3. Trakkr generates a unique webhook URL and bearer token for your brand
4. Configure your runtime, CDN log forwarder, or middleware to POST AI crawler visits to that URL
5. Use the dry-run validation endpoint or the built-in verification step to test before going live
6. Once events are flowing, the connection switches to "Active"
Trakkr ships starter templates for Next.js Proxy, Express middleware, OpenResty log hooks, Lambda@Edge, Akamai DataStream, Fastly log streaming, and a generic webhook example directly in the dashboard.
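A minimal sketch of what a webhook forwarder might look like in Node. The URL, token, and payload fields here are placeholders, not Trakkr's documented schema; copy the real template for your runtime from the dashboard.

```typescript
// Hypothetical forwarder sketch. TRAKKR_WEBHOOK_URL, TRAKKR_TOKEN, and the
// Visit fields are placeholders -- use the template from the Trakkr dashboard.
const TRAKKR_WEBHOOK_URL = "https://example.invalid/ingest"; // placeholder
const TRAKKR_TOKEN = "YOUR_BEARER_TOKEN";                    // placeholder

interface Visit {
  url: string;
  userAgent: string;
  status: number;
  timestamp: string;
}

function buildVisit(url: string, userAgent: string, status: number): Visit {
  return { url, userAgent, status, timestamp: new Date().toISOString() };
}

// Fire-and-forget POST so visit logging never blocks or fails the response.
function forwardVisit(visit: Visit): void {
  fetch(TRAKKR_WEBHOOK_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${TRAKKR_TOKEN}`,
    },
    body: JSON.stringify(visit),
  }).catch(() => {
    // Swallow network errors: a logging failure must not affect the request.
  });
}
```

In Express you would call `forwardVisit` from middleware mounted near the top of the app; in an edge function or log shipper, from wherever the request record is available.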
Choosing your path
| If you... | Use |
|---|---|
| Run a server-rendered or SSG site | Server-side connection |
| Are behind Cloudflare with no other constraints | Cloudflare server-side |
| Use Wix, Webflow, Shopify, Squarespace, Framer, or Ghost | Hosted CMS via Cloudflare |
| Use Vercel or Netlify hosting | Their respective OAuth flow |
| Self-host on Next.js, Node, or Nginx | The matching first-class webhook runtime |
| Run on Akamai, Fastly, or an unsupported edge stack | Akamai / Fastly / Other |
| Want the most accurate data | Any server-side connection |
| Have a JavaScript-rendered SPA without SSR | Server-side connection |
You can connect more than one. Trakkr deduplicates events at ingest, so multiple server-side sources on the same site are safe.
Verifying your setup
Whichever path you chose, verify it the same way:
1. Open Crawler in the sidebar
2. Click Send Verification in the header
3. Wait ~30 seconds and refresh the Feed
You should see a "Verified ✓" event appear with GPTBot as the bot name. This confirms the entire pipeline (your site → Trakkr's ingest → BigQuery → the dashboard) is working.
If the synthetic event arrives but real crawler events are still empty after 24 hours, the issue is upstream of Trakkr - usually a robots.txt block, a WAF rule, or a DNS misconfiguration. Check the Access tab for findings.
Connection management
Once a connection is live, you can manage it from the Crawler dashboard.
| Action | What it does |
|---|---|
| Sync now | Manually pull recent visits from the platform |
| Backfill | Re-sync a wider time window (clears the dedup ledger for that period) |
| View logs | See the last N sync attempts with status, visit counts, and error details |
| Pause | Stop syncing without disconnecting (useful during maintenance windows) |
| Disconnect | Remove the connection entirely. Cleans up Vercel Log Drains automatically |
Each connection has a health indicator showing Active, Pending, Error, or Paused. Errors include the underlying message - usually expired credentials or a permission change on the platform side.
Troubleshooting
"No crawler data showing" after install
1. Click Send Verification to confirm the pipeline works
2. If verification works but real visits don't appear, check your `robots.txt` for AI bot blocks
3. Check your CDN's bot management or WAF for rules that might be blocking the bots before they reach your site
4. Wait 24 hours - some AI bots crawl on a weekly cycle and may not have visited yet
"Connection shows Error status"
- Open the connection's logs in the Connections panel
- Look for "401 Unauthorized" - usually means the platform credentials expired or were revoked
- For OAuth connections (Vercel, Netlify), reconnect to refresh the token
- For Cloudflare, regenerate the API token if it has been deleted on the Cloudflare side
"I see verification visits but no real crawls"
- Open the Access tab and check for blocking findings
- Look at your `robots.txt` for `Disallow: /` rules under AI bot user agents
- Check your CDN for bot management rules that may be challenging or blocking AI bots
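For example, a rule like the following in `robots.txt` would explain a total absence of GPTBot visits even though verification succeeds (bot name illustrative):

```txt
# A Disallow under an AI bot's user agent blocks that crawler site-wide
User-agent: GPTBot
Disallow: /
```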
Next steps
AI Crawlers
Read the dashboard - hero stats, the page funnel, and AI insights.
Cloudflare Setup
Connect a Cloudflare zone in under five minutes.
JavaScript Rendering
Make sure AI crawlers can read your client-rendered pages.