Crawler Tracking
Connect Cloudflare, Vercel, Netlify, Next.js, CloudFront, WordPress, Wix, Node, Nginx, or webhook-based edge stacks to send AI crawler visits into Trakkr.
- Pick the right server-side capture path for your site
- Connect Cloudflare, Vercel, Netlify, Next.js, AWS CloudFront, WordPress, Wix, Webflow, Shopify, Squarespace, Framer, Ghost, Node / Express, Nginx / OpenResty, or Akamai / Fastly / other webhook sources
- Verify the pipeline works in 30 seconds with the synthetic verification ping
- Backfill historical visits automatically when you connect a server-side platform
This page is the install hub. If you want to understand what crawler tracking is and how to read your data, start with AI Crawlers. If you're here to wire it up, you're in the right place.
Trakkr captures crawler visits from server-side sources: your CDN, hosting provider, edge function, CMS proxy, or log forwarder. This is more accurate than browser-based tracking because many AI bots do not execute JavaScript and some are blocked before a page loads.
Older tracking-pixel installs continue to report data, but new setups should use one of the server-side sources below. If you are maintaining an existing pixel install, keep it in place until your server-side source is connected and verified.
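As a rough illustration of why server-side capture works where JavaScript tracking doesn't: AI crawlers identify themselves in the `User-Agent` header of every request, so a CDN or origin server can match them without any script executing in a browser. The sketch below is illustrative only; the bot names are examples, not Trakkr's actual match list.

```typescript
// Illustrative sketch: match a request's User-Agent against known AI
// crawler names. This pattern list is an example, not Trakkr's full list.
const AI_CRAWLER_PATTERNS: RegExp[] = [
  /GPTBot/i,          // OpenAI
  /ClaudeBot/i,       // Anthropic
  /PerplexityBot/i,   // Perplexity
  /Google-Extended/i, // Google AI training opt-out agent
];

function matchAiCrawler(userAgent: string): string | null {
  for (const pattern of AI_CRAWLER_PATTERNS) {
    const m = userAgent.match(pattern);
    if (m) return m[0]; // the matched bot name
  }
  return null; // not a known AI crawler
}
```

Because this check runs at the CDN or origin, it also sees bots that your WAF later blocks and requests for URLs that 404.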
Server-side platform connections
Server-side integrations read from your CDN, host, or log drain directly. They see every request that hits your origin - including non-JS bots, blocked bots, and 404s on URLs that no longer exist.
| Platform | Auth | Realtime | Plan requirements |
|---|---|---|---|
| Cloudflare | API token | No | All Cloudflare plans |
| Vercel | OAuth | Yes | Vercel Pro or Enterprise |
| Netlify | OAuth | Yes | All Netlify plans |
| Next.js self-hosted | Webhook | Yes | Any self-hosted Next.js deployment |
| AWS CloudFront | Webhook | Yes | Lambda@Edge on any CloudFront distribution |
| WordPress | Existing adapter | No | Trakkr WordPress plugin |
| Hosted CMS | Cloudflare proxy | No | Wix, Webflow, Shopify, Squarespace, Framer, Ghost |
| Node / Express | Webhook | Yes | Any Node or Express server |
| Nginx / OpenResty | Webhook | Yes | OpenResty or nginx with a log shipper |
| Akamai / Fastly / Other | Webhook | Yes | Any CDN or edge stack that can POST visits |
Dedicated guides exist for Cloudflare, Vercel, Netlify, and WordPress. The other webhook-based runtimes are configured directly in the in-app setup flow with copy-paste templates:
- Cloudflare Setup - Create a scoped read-only API token in Cloudflare and paste it into Trakkr
- Vercel Setup - OAuth into Vercel and let Trakkr install a Log Drain
- Netlify Setup - OAuth into Netlify and let Trakkr deploy an Edge Function
- WordPress Setup - Enable crawler tracking on a connected WordPress site through the Trakkr plugin
- Wix, Webflow, Shopify, Squarespace, Framer, Ghost - Choose your CMS in the setup flow, proxy the site through Cloudflare, then connect the Cloudflare zone
- Next.js self-hosted - Copy the Proxy or middleware snippet from the setup flow and redeploy
- AWS CloudFront - Copy the Lambda@Edge template and attach it to Origin Request in CloudFront
- Node / Express - Copy the Express middleware snippet and mount it near the top of your app
- Nginx / OpenResty - Copy the OpenResty log hook or ship JSON access logs into the webhook
- Akamai / Fastly / Other - Use the webhook examples for Akamai DataStream, Fastly log streaming, or your own edge forwarder
Webhook runtimes and edge forwarders
If you're on Next.js, CloudFront, Express, Nginx / OpenResty, Akamai, Fastly, or a custom server, you can still get server-side tracking via Trakkr's webhook ingest path.
1. Open Crawler → Connect platform
2. Choose Next.js self-hosted, AWS CloudFront, Node / Express, Nginx / OpenResty, or Akamai / Fastly / Other
3. Trakkr generates a unique webhook URL and bearer token for your brand
4. Configure your runtime, CDN log forwarder, or middleware to POST AI crawler visits to that URL
5. Use the dry-run validation endpoint or the built-in verification step to test before going live
6. Once events are flowing, the connection switches to "Active"
Trakkr ships starter templates for Next.js Proxy, Express middleware, OpenResty log hooks, Lambda@Edge, Akamai DataStream, Fastly log streaming, and a generic webhook example directly in the dashboard.
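A minimal sketch of what a webhook forwarder might look like in Node. The URL, token, and payload fields here are placeholders, not Trakkr's documented schema; copy the real template for your runtime from the dashboard.

```typescript
// Hypothetical forwarder sketch. TRAKKR_WEBHOOK_URL, TRAKKR_TOKEN, and the
// Visit fields are placeholders -- use the template from the Trakkr dashboard.
const TRAKKR_WEBHOOK_URL = "https://example.invalid/ingest"; // placeholder
const TRAKKR_TOKEN = "YOUR_BEARER_TOKEN";                    // placeholder

interface Visit {
  url: string;
  userAgent: string;
  status: number;
  timestamp: string;
}

function buildVisit(url: string, userAgent: string, status: number): Visit {
  return { url, userAgent, status, timestamp: new Date().toISOString() };
}

// Fire-and-forget POST so visit logging never blocks or fails the response.
function forwardVisit(visit: Visit): void {
  fetch(TRAKKR_WEBHOOK_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${TRAKKR_TOKEN}`,
    },
    body: JSON.stringify(visit),
  }).catch(() => {
    // Swallow network errors: a logging failure must not affect the request.
  });
}
```

In Express you would call `forwardVisit` from middleware mounted near the top of the app; in an edge function or log shipper, from wherever the request record is available.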
Choosing your path
| If you... | Use |
|---|---|
| Run a server-rendered or SSG site | Server-side connection |
| Are behind Cloudflare with no other constraints | Cloudflare server-side |
| Use Wix, Webflow, Shopify, Squarespace, Framer, or Ghost | Hosted CMS via Cloudflare |
| Use Vercel or Netlify hosting | Their respective OAuth flow |
| Self-host on Next.js, Node, or Nginx | The matching first-class webhook runtime |
| Run on Akamai, Fastly, or an unsupported edge stack | Akamai / Fastly / Other |
| Want the most accurate data | Any server-side connection |
| Have a JavaScript-rendered SPA without SSR | Server-side connection |
You can connect more than one. Trakkr deduplicates events at ingest, so multiple server-side sources on the same site are safe.
Verifying your setup
Whichever path you chose, verify it the same way:
1. Open Crawler in the sidebar
2. Click Send Verification in the header
3. Wait ~30 seconds and refresh the Feed
You should see a "Verified ✓" event appear with GPTBot as the bot name. This confirms the entire pipeline (your site → Trakkr's ingest → BigQuery → the dashboard) is working.
If the synthetic event arrives but real crawler events are still empty after 24 hours, the issue is upstream of Trakkr - usually a robots.txt block, a WAF rule, or a DNS misconfiguration. Check the Access tab for findings.
Connection management
Once a connection is live, you can manage it from the Crawler dashboard.
| Action | What it does |
|---|---|
| Sync now | Manually pull recent visits from the platform |
| Backfill | Re-sync a wider time window (clears the dedup ledger for that period) |
| View logs | See the last N sync attempts with status, visit counts, and error details |
| Pause | Stop syncing without disconnecting (useful during maintenance windows) |
| Disconnect | Remove the connection entirely. Cleans up Vercel Log Drains automatically |
Each connection has a health indicator showing Active, Pending, Error, or Paused. Errors include the underlying message - usually expired credentials or a permission change on the platform side.
Troubleshooting
"No crawler data showing" after install
1. Click Send Verification to confirm the pipeline works
2. If verification works but real visits don't appear, check your `robots.txt` for AI bot blocks
3. Check your CDN's bot management or WAF for rules that might be blocking the bots before they reach your site
4. Wait 24 hours - some AI bots crawl on a weekly cycle and may not have visited yet
"Connection shows Error status"
- Open the connection's logs in the Connections panel
- Look for "401 Unauthorized" - usually means the platform credentials expired or were revoked
- For OAuth connections (Vercel, Netlify), reconnect to refresh the token
- For Cloudflare, regenerate the API token if it has been deleted on the Cloudflare side
"I see verification visits but no real crawls"
- Open the Access tab and check for blocking findings
- Look at your `robots.txt` for `Disallow: /` rules under AI bot user agents
- Check your CDN for bot management rules that may be challenging or blocking AI bots
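For example, a rule like the following in `robots.txt` would explain a total absence of GPTBot visits even though verification succeeds (bot name illustrative):

```txt
# A Disallow under an AI bot's user agent blocks that crawler site-wide
User-agent: GPTBot
Disallow: /
```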
Next steps
AI Crawlers
Read the dashboard - hero stats, the page funnel, and AI insights.
Cloudflare Setup
Connect a Cloudflare zone in under five minutes.
JavaScript Rendering
Make sure AI crawlers can read your client-rendered pages.