Crawl Insights works on your server logs. Connect a data source and Peec identifies the AI bot visits in your traffic, categorizes each bot, and keeps your dashboard up to date.
There are eight ways to connect. Pick the one that matches your hosting setup:
| Integration | How it connects |
|---|---|
| AWS CloudFront | CloudFormation stack deployed into your AWS account |
| Google Cloud CDN | One-line setup script for your GCP project |
| Cloudflare | Worker deployed to your zone |
| Vercel | Log Drain on your Vercel team |
| WordPress | Plugin installed on your site |
| Akamai | DataStream created via the Akamai API |
| Generic webhook | Your system posts logs to a Peec endpoint |
| File upload | Upload a CSV or CLF log file directly |
You can manage or disconnect your data source at any time from Settings.
Every integration filters your traffic the same way: only requests from known AI crawlers and agents are stored, and all other traffic is discarded at ingest. See the supported AI bots below.
AWS CloudFront
Peec deploys a CloudFormation stack into your AWS account. The stack receives your CloudFront access logs, filters them for AI crawler traffic, and forwards the matching entries to Peec automatically.
You’ll need permission to deploy CloudFormation stacks, plus your CloudFront Distribution ID(s).
- Select your AWS region in Peec.
- Click Open in AWS Console. A CloudFormation quick-create page opens with all parameters pre-filled.
- Deploy the stack. This usually takes 1 to 2 minutes.
- Back in Peec, enter your CloudFront Distribution ID(s), comma-separated if you have several.
The status changes to Stack deployed once the integration is live.
Google Cloud CDN
Peec generates a one-line setup command pre-filled with your credentials. Running it configures a Cloud Logging sink and deploys a Cloud Function that filters your CDN access logs and forwards AI crawler traffic to Peec.
You’ll need a GCP project with permission to enable APIs, create Pub/Sub topics, and deploy Cloud Functions, plus your Cloud Load Balancer URL map name.
- Select your GCP region in Peec.
- Copy the generated setup command.
- Run it in your terminal or Google Cloud Shell. The script asks for your Cloud Load Balancer URL map name, then completes setup and activates the integration.
The status moves from Awaiting activation to Integration active.
Cloudflare
Peec uses your Cloudflare API token to deploy a Worker to your selected zone. The Worker captures AI crawler requests in real time and forwards them to Peec without affecting your site’s response times.
Create an API token
- Navigate to your profile in Cloudflare.
- Select API Tokens from the left-hand panel.
- Select Create Token, then click Get Started (the first option).
- Give your token a name.
- Add three permission fields by clicking Add More under the permissions section. Add them in this order:
  - Workers Scripts > Edit
  - Zone > Zone > Read
  - Zone > Workers Routes > Edit
- Leave everything else unchanged, continue to the summary, and click Create Token.
Deploy the Worker
- Enter your Cloudflare API token in Peec and press Enter to validate it.
- Select your Zone from the dropdown.
- Click Deploy worker.
The Worker goes live immediately across all requests to that zone.
Cloudflare Worker limits
When you connect through Cloudflare Workers, every request your site receives, including from AI bots, counts toward your Workers request quota.
All Cloudflare accounts include the Workers Free plan by default, which allows up to 100,000 requests per day and resets daily. This limit applies regardless of your Cloudflare site plan, because Business and Enterprise site plans are separate from the Workers plan. On high-traffic sites, this limit can be reached quickly.
Once the limit is hit, two things happen:
- Peec stops receiving log data: For the remainder of that day, no log data from your domain reaches Peec, which leaves gaps in your Crawl Insights.
- Visitors see errors: Worker routes are set to Fail closed by default, so once the limit is hit, Cloudflare serves an Error 1027 page instead of your site.
What to do in such situations:
- Enable a minimum safeguard: In your Cloudflare dashboard, set the route mode to Fail open. Your site remains accessible to visitors, though data collection is still paused for the rest of the day.
- Upgrade to a Workers Paid plan: The Workers Paid plan removes the daily request cap, ensuring uninterrupted log ingestion and continuous data in Crawl Insights. See Cloudflare’s pricing page for details.
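To judge how close a site is to the Workers Free quota, a rough back-of-the-envelope calculation can help. The sketch below is only an estimate: it assumes a constant request rate, which real traffic rarely has, and the function names are hypothetical helpers rather than anything provided by Peec or Cloudflare.

```python
# Daily request cap on the Workers Free plan, as stated in the text above.
WORKERS_FREE_DAILY_LIMIT = 100_000


def hours_until_limit(requests_per_second: float,
                      daily_limit: int = WORKERS_FREE_DAILY_LIMIT) -> float:
    """Hours of steady traffic needed to use up the daily quota.

    Assumption: the request rate is constant over the day.
    """
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return daily_limit / requests_per_second / 3600


def exceeds_free_plan(requests_per_day: int,
                      daily_limit: int = WORKERS_FREE_DAILY_LIMIT) -> bool:
    """True when a day's total requests would go past the quota."""
    return requests_per_day > daily_limit


if __name__ == "__main__":
    # A site averaging 5 requests per second over a full day.
    print(f"Quota used up after about {hours_until_limit(5):.1f} hours")
    print("Exceeds free plan:", exceeds_free_plan(5 * 86_400))
```

At an average of 5 requests per second, the quota would be used up after roughly five and a half hours, so a site at that traffic level would likely need Fail open mode or a Workers Paid plan.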
Vercel
Peec uses your Vercel API token to create a Log Drain on your selected team. Vercel streams access logs directly to Peec, where they’re filtered for AI crawler traffic.
You’ll need a Vercel API token with access to the team you want to monitor.
- Enter your Vercel API token in Peec and press Enter to validate it.
- Select your Team from the dropdown.
- Select the Projects to monitor, or choose All projects.
- Click Deploy drain.
WordPress
Peec generates a WordPress plugin pre-configured with your organization’s credentials. Once activated, the plugin logs AI crawler visits to Peec without impacting your site’s performance. It skips internal WordPress paths (health checks, admin AJAX, cron) and correctly identifies visitor IPs when your site sits behind a proxy or CDN.
- Click Download Plugin in Peec. A .zip file is generated with your credentials embedded.
- In your WordPress admin panel, go to Plugins → Add New → Upload Plugin.
- Upload the .zip file and click Activate Plugin.
The plugin confirms the connection on activation, and the status changes to Plugin active.
Akamai
Peec uses your Akamai EdgeGrid credentials to create a DataStream via the Akamai API. The stream delivers access logs to Peec every 30 seconds.
You’ll need the EdgeGrid credentials from your .edgerc file: Host, Client Token, Client Secret, and Access Token.
- Enter your Akamai EdgeGrid credentials in Peec and press Enter to validate them.
- Select a Group from the dropdown.
- Select the Properties to monitor.
- Click Deploy DataStream.
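If you want to pull the four values out of an existing .edgerc file before pasting them into Peec, a small script can do it. The sketch below assumes the usual INI-style .edgerc layout with a `[default]` section and the keys `host`, `client_token`, `client_secret`, and `access_token`. The function name and the label mapping are hypothetical, so adjust the section name if your file uses a different one.

```python
import configparser

# Keys as they usually appear in an .edgerc section, mapped to the
# field names that Peec asks for. The mapping itself is illustrative.
EDGERC_KEYS = {
    "host": "Host",
    "client_token": "Client Token",
    "client_secret": "Client Secret",
    "access_token": "Access Token",
}


def read_edgerc(text: str, section: str = "default") -> dict[str, str]:
    """Return the four EdgeGrid values from the text of an .edgerc file."""
    # Interpolation is disabled so that a '%' inside a secret is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found")
    missing = [key for key in EDGERC_KEYS if not parser.has_option(section, key)]
    if missing:
        raise KeyError(f"missing keys in [{section}]: {', '.join(missing)}")
    return {label: parser.get(section, key).strip()
            for key, label in EDGERC_KEYS.items()}
```

A typical call would be `read_edgerc(open(path).read())`, where `path` points at your own .edgerc file. Treat the returned values as secrets and avoid printing them to shared logs.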
Generic webhook
If your provider isn’t listed above, you can send logs to Peec yourself. Peec gives you a webhook endpoint and an API key. You configure your own system, whether that’s a CDN, log shipper, reverse proxy, or custom application, to POST access log batches to the endpoint. Peec validates each payload and keeps the AI crawler traffic.
- Click Generate API Key in Peec.
- Click Reserve API Key to lock it in.
- Copy the Webhook URL and required headers.
- Configure your system to POST log batches to the endpoint.
- Click Confirm webhook deployment.
Regenerating the API key invalidates the previous one. Update your sender configuration before switching keys.
Endpoint
POST https://api.peec.ai/agent-analytics/generic-access-log
Authorization: Bearer <your-api-key>
x-org-id: <your-organization-id>
Content-Type: application/json
Request body
Send an array of log objects, with a maximum of 500 entries per request.
[
  {
    "timestamp": "2024-01-15T12:34:56Z",
    "request_method": "GET",
    "request_url": "https://example.com/blog/my-post",
    "response_status": 200,
    "user_agent": "GPTBot/2.0",
    "country_code": "US",
    "client_ip": "1.2.3.4",
    "referer": "https://some-site.com/"
  }
]
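A sender for this endpoint could look like the sketch below, which splits a list of log entries into batches of at most 500 and builds one POST request per batch with the three headers shown above. It uses only the Python standard library. The function names are hypothetical, and because the response format is not described here, the sketch only relies on `urlopen` raising an error for 4xx and 5xx responses.

```python
import json
import urllib.request
from typing import Iterator

WEBHOOK_URL = "https://api.peec.ai/agent-analytics/generic-access-log"
MAX_ENTRIES_PER_REQUEST = 500  # limit stated in the text above


def batches(entries: list[dict],
            size: int = MAX_ENTRIES_PER_REQUEST) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def build_request(batch: list[dict], api_key: str,
                  org_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one batch."""
    return urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(batch).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "x-org-id": org_id,
            "Content-Type": "application/json",
        },
    )


def send_all(entries: list[dict], api_key: str, org_id: str) -> None:
    """Send every batch in order; urlopen raises HTTPError on 4xx/5xx."""
    for batch in batches(entries):
        request = build_request(batch, api_key, org_id)
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
```

Keeping request construction separate from sending makes it easy to inspect the payload and headers before any traffic leaves your system. A production sender would likely add retries and backoff, which this sketch leaves out.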
| Field | Type | Required | Notes |
|---|---|---|---|
| timestamp | string | Yes | ISO 8601 datetime |
| request_method | string | Yes | e.g. GET, POST |
| request_url | string | Yes | Full URL including scheme and host |
| response_status | number | Yes | HTTP status code (100–599) |
| user_agent | string | Yes | Raw User-Agent header value |
| country_code | string | No | ISO 3166-1 alpha-2 |
| client_ip | string | No | IPv4 or IPv6 |
| referer | string | No | Referring URL |
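Checking entries on your side before posting can catch malformed records early. The sketch below mirrors the required fields from the table above. It is not Peec's own validation, whose exact rules are not described here, and the function name and error messages are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlsplit

REQUIRED_FIELDS = (
    "timestamp", "request_method", "request_url",
    "response_status", "user_agent",
)


def validation_errors(entry: dict) -> list[str]:
    """Return a list of problems found in one log entry (empty if none)."""
    missing = [f"missing required field: {name}"
               for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        return missing

    errors = []
    try:
        # Older Python versions reject a trailing "Z" in fromisoformat(),
        # so it is rewritten as an explicit UTC offset first.
        datetime.fromisoformat(str(entry["timestamp"]).replace("Z", "+00:00"))
    except ValueError:
        errors.append("timestamp is not an ISO 8601 datetime")

    url = urlsplit(str(entry["request_url"]))
    if not url.scheme or not url.netloc:
        errors.append("request_url must include scheme and host")

    status = entry["response_status"]
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if (isinstance(status, bool) or not isinstance(status, int)
            or not 100 <= status <= 599):
        errors.append("response_status must be a number from 100 to 599")

    if not isinstance(entry["user_agent"], str):
        errors.append("user_agent must be a string")
    return errors
```

Entries that return a non-empty list can be dropped or logged before the batch is built, so one bad record does not put a whole request at risk.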
File upload
To analyze a specific time period, or if you’d rather not set up a live connection, upload a log file directly from your browser. Peec parses the file, finds the AI crawler entries, and imports them into your dashboard.
- Drag and drop your log file (.csv or CLF .log), or browse for it.
- For CLF files, enter your domain (e.g. example.com) when prompted, so Peec can build full URLs from the request paths.
- Peec processes the file and shows how many AI bot requests it found.
- Click Upload to import them. A progress bar tracks completion.
CSV files use the following columns. Column order doesn’t matter:
| Column | Required | Notes |
|---|---|---|
| timestamp | Yes | ISO 8601 datetime |
| request_method | Yes | e.g. GET |
| response_status | Yes | HTTP status code |
| user_agent | Yes | Raw User-Agent string |
| request_url | Yes | Full URL |
| client_ip | No | IPv4 or IPv6 |
| referer | No | Referring URL |
| country_code | No | ISO 3166-1 alpha-2 |
CSV example:
timestamp,request_method,response_status,user_agent,request_url
2024-01-15T12:34:56Z,GET,200,GPTBot/2.0,https://example.com/blog/post
CLF example:
1.2.3.4 - - [15/Jan/2024:12:34:56 +0000] "GET /path HTTP/1.1" 200 1234 "https://referer" "GPTBot/2.0"
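To see how a CLF line maps onto the CSV columns, the sketch below converts one combined-format line into a row, using a domain to build the full URL from the request path. This illustrates the mapping and is not Peec's parser. The regular expression, the function name, and the choice of `https` as the scheme are assumptions.

```python
import re
from datetime import datetime, timezone

# Common Log Format with the optional "referer" and "user agent" fields
# of the combined format appended.
CLF_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request_method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<response_status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def clf_to_row(line: str, domain: str) -> dict:
    """Convert one CLF line into a dict keyed by the CSV column names."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("line is not in Common Log Format")
    parsed = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        # Normalized to UTC and written as an ISO 8601 datetime.
        "timestamp": parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_method": match["request_method"],
        "response_status": int(match["response_status"]),
        "user_agent": match["user_agent"] or "",
        # Assumption: the site is served over https.
        "request_url": f"https://{domain}{match['path']}",
        "client_ip": match["client_ip"],
        "referer": match["referer"] or "",
    }
```

Rows produced this way can be written out with `csv.DictWriter` if you prefer to upload a CSV file instead of the raw log.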
Supported AI bots
All integrations detect and track the following AI crawlers and agents. Only requests whose User-Agent matches one of these are stored.
| Bot | Description |
|---|---|
| GPTBot | OpenAI web crawler |
| ChatGPT-User | ChatGPT browsing requests |
| OAI-SearchBot | OpenAI search crawler |
| ClaudeBot | Anthropic web crawler |
| Claude-Web | Anthropic browsing (legacy) |
| Claude-SearchBot | Anthropic search crawler |
| Claude-User | Claude browsing requests |
| Claude-Code | Claude Code agent |
| anthropic-ai | Anthropic general crawler |
| PerplexityBot | Perplexity AI crawler |
| Perplexity-User | Perplexity browsing requests |
| Google-Extended | Google AI training crawler |
| Google-CloudVertexBot | Google Vertex AI crawler |
| Google-Agent | Google AI agent |
| GoogleAgent-Mariner | Google Mariner agent |
| Gemini-Deep-Research | Google Gemini deep research agent |
| Meta-ExternalAgent | Meta AI agent |
| meta-webindexer | Meta web indexer |
| meta-externalfetcher | Meta external fetcher |
| FacebookBot | Meta/Facebook crawler |
| Applebot | Apple web crawler |
| Applebot-Extended | Apple AI training crawler |
| Amazonbot | Amazon web crawler |
| Amzn-SearchBot | Amazon search AI crawler |
| AzureAI-SearchBot | Microsoft Azure AI crawler |
| GrokBot | xAI Grok crawler |
| Grok-DeepSearch | xAI Grok deep search agent |
| xAI-Grok | xAI Grok agent |
| DeepSeekBot | DeepSeek AI crawler |
| MistralAI-User | Mistral AI browsing requests |
| cohere-ai | Cohere AI crawler |
| cohere-training-data-crawler | Cohere training data crawler |
| PanguBot | Huawei PanGu crawler |
| Ai2Bot | Allen Institute for AI crawler |
| Ai2Bot-Dolma | Allen Institute Dolma dataset crawler |
| CCBot | Common Crawl bot |
| Bytespider | ByteDance/TikTok crawler |
| Diffbot | Diffbot AI crawler |
| DuckAssistBot | DuckDuckGo AI assistant crawler |
| YouBot | You.com crawler |
| quillbot.com | QuillBot AI crawler |
| Webzio-Extended | Webz.io extended crawler |
| omgili | Webhose/Omgili crawler |
| omgilibot | Webhose/Omgili bot |
| Timpibot | Timpi search crawler |
| NovaAct | Amazon Nova Act agent |
| Manus-User | Manus AI agent |
| MyCentralAIScraperBot | MyCentral AI scraper |
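If you want to pre-filter logs before sending them, User-Agent matching can be approximated as in the sketch below. How Peec matches User-Agent strings internally is not described here, so the case-insensitive substring match is an assumption. The token list is a short excerpt from the table above, and the function name is hypothetical.

```python
from typing import Optional

# A short excerpt of the bot names from the table above; extend as needed.
AI_BOT_TOKENS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "PerplexityBot", "Applebot-Extended", "Applebot",
)


def match_ai_bot(user_agent: str,
                 tokens: tuple[str, ...] = AI_BOT_TOKENS) -> Optional[str]:
    """Return the first bot token found in the User-Agent, or None.

    Assumption: a case-insensitive substring match is sufficient.
    Longer tokens are tried first so that "Applebot-Extended" is not
    reported as plain "Applebot".
    """
    lowered = user_agent.lower()
    for token in sorted(tokens, key=len, reverse=True):
        if token.lower() in lowered:
            return token
    return None
```

Substring matching is simple but trusts the User-Agent header, which any client can set. Verifying a crawler's source IP ranges is a separate concern that this sketch does not cover.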