Crawlability shows you which AI bots are allowed or blocked by a site’s robots.txt file. Select a tracked domain and get results instantly with no setup or account connection required. Peec checks the domain’s robots.txt against 40+ AI bots from 20+ vendors, then shows you the status for each. We categorize bots based on publicly available data and their stated purpose. As real-world behavior becomes clearer, categories may be refined to ensure accuracy.

Crawlability table

The table breaks down the status for each individual bot:
  • Bot: the user-agent identifier (e.g., GPTBot, ClaudeBot)
  • Platform: the AI vendor behind the bot (e.g., OpenAI, Anthropic, Google)
  • Bot type: Training, Search, User Query, or Other
  • Status: Allowed, Partial, or Blocked
  • Reason: how the status was determined, either from explicit rules for this bot or inherited from the global wildcard (*) rules
Use the in-table search bar to find a specific bot, or filter by platform, bot type or status using the filters at the top.
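The Status and Reason columns can be approximated with a short script. The following is a minimal sketch, not Peec's actual implementation: the function names `parse_groups` and `classify`, the sample robots.txt, and the rule that separates Partial from Blocked are all assumptions made for illustration. It matches user-agent names exactly (case-insensitively), which is a simplification.

```python
# Hypothetical sketch: derive a Status and Reason per bot from robots.txt text.
# The Allowed/Partial/Blocked thresholds below are assumptions, not Peec's logic.

SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Disallow:
"""


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent name to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []  # user-agents of the group currently being read
    in_rules = False        # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups


def classify(groups: dict[str, list[tuple[str, str]]], bot: str) -> tuple[str, str]:
    """Return (status, reason) for one bot."""
    key = bot.lower()
    if key in groups:
        rules, reason = groups[key], "explicit rules"
    elif "*" in groups:
        rules, reason = groups["*"], "wildcard (*)"
    else:
        return "Allowed", "no matching rules"
    # An empty Disallow value blocks nothing, so it is ignored here.
    disallows = [path for kind, path in rules if kind == "disallow" and path]
    has_allow = any(kind == "allow" and path for kind, path in rules)
    if not disallows:
        status = "Allowed"
    elif "/" in disallows and not has_allow:
        status = "Blocked"
    else:
        status = "Partial"
    return status, reason
```

With the sample file, `classify(parse_groups(SAMPLE_ROBOTS), "GPTBot")` reports the bot as blocked by its own explicit rules, while a bot with no dedicated group falls back to the wildcard (*) group.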

URL Tester

Enter any URL on your domain to see which AI bots your robots.txt rules allow or block from crawling it. You can then use this insight to decide whether allowing different bots to crawl that URL is beneficial.
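A similar per-URL check can be reproduced locally with Python's standard `urllib.robotparser` module. This is a rough sketch under stated assumptions: the helper name `check_url`, the example.com URLs, and the sample rules are hypothetical, and the standard-library parser may resolve conflicting rules differently from Peec or from individual crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for illustration.
SAMPLE_RULES = """\
User-agent: GPTBot
Disallow: /blog/

User-agent: *
Disallow: /admin/
"""


def check_url(robots_txt: str, url: str, bots: list[str]) -> dict[str, bool]:
    """Return {bot: True if allowed to fetch url} for each bot name."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    result = check_url(
        SAMPLE_RULES,
        "https://example.com/blog/post",
        ["GPTBot", "ClaudeBot"],
    )
    for bot, allowed in result.items():
        print(f"{bot}: {'Allowed' if allowed else 'Blocked'}")
```

To test a live site instead of inline text, `RobotFileParser` also offers `set_url()` followed by `read()`, which fetches the robots.txt file over the network.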

Interpreting Crawlability

If a bot is blocked, it can’t access your content. This means it can’t use your site as a source in its responses. Use Crawlability to:
  • Catch accidental blocking before it affects your AI visibility
  • Understand which AI ecosystems can and can’t access your content
  • Verify that changes to your robots.txt are working as expected
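Verifying a robots.txt change can also be scripted before deployment. The sketch below compares an old and a new version of the file and reports only the bots whose access to a given path changed. The function name `diff_access`, the default path, and both sample files are assumptions for illustration; it relies on Python's standard `urllib.robotparser`, whose rule matching may differ from a given crawler's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical before/after robots.txt versions.
OLD_ROBOTS = """\
User-agent: *
Disallow:
"""

NEW_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def _parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def diff_access(
    old_txt: str, new_txt: str, bots: list[str], url: str = "/"
) -> dict[str, tuple[bool, bool]]:
    """Return {bot: (allowed_before, allowed_after)} for bots whose access changed."""
    old, new = _parser_for(old_txt), _parser_for(new_txt)
    changes = {}
    for bot in bots:
        before, after = old.can_fetch(bot, url), new.can_fetch(bot, url)
        if before != after:
            changes[bot] = (before, after)
    return changes
```

An empty result means the edit did not change access for any of the listed bots at that path; a `(True, False)` entry flags a bot that the new file blocks, which is the accidental-blocking case worth catching early.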

Bots

The Bots table shows which bots from which vendors access your pages, and the type each one falls under.
| AI bot | Platform | Type | Purpose / Note |
| --- | --- | --- | --- |
| YouBot | You.com | Other | Fetches pages to power You.com’s AI search results. |
| omgili | Webz.io | Training | Forum and discussion crawler for structured dataset building. |
| Perplexity-User | Perplexity | User Query | Used during a user’s Deep Research session. |
| Amazonbot | Amazon | Training | General training for Titan/Olympus models. |
| Google-Agent | Google | User Query | Used by Google agents to navigate the web and perform actions upon user request (e.g. Project Mariner). |
| cohere-training-data-crawler | Cohere | Training | Specialized crawler for raw training data. |
| ClaudeBot | Claude (Anthropic) | Training | Official training bot for Anthropic models. |
| Gemini-Deep-Research | Google | User Query | High-intensity agent for user-requested research. |
| Google-CloudVertexBot | Google | Search | Crawling for Google Cloud Vertex AI services. |
| Google-Extended | Google | Training | Opt-out token for Gemini training and AI product improvement. |
| PanguBot | PanGu (Huawei) | Training | Training for Huawei’s Pangu models. |
| ChatGPT-User | ChatGPT (OpenAI) | User Query | Visits links directly provided by a user. |
| CCBot | Common Crawl | Training | Massive open-source web archive for AI labs. |
| GrokBot | Grok (xAI) | Training | Real-time web search and training for Grok 3/4 models. |
| DuckAssistBot | DuckDuckGo | User Query | Summarizes pages for DuckDuckGo’s AI responses. |
| omgilibot | Webz.io | Other | Forum-specific crawler variant. Commercial data product. |
| Diffbot | Diffbot | Training | Structured data extraction as a service. |
| GoogleAgent-Mariner | Google | User Query | Action Agent: Can fill forms and click buttons. |
| TikTokSpider | ByteDance | Other | Specialized scraper for TikTok’s AI data. |
| Webzio-Extended | Webz.io | Training | Large-scale data scraping for AI providers. |
| Bytespider | ByteDance | Training | Training for TikTok and ByteDance AI. |
| Applebot-Extended | Apple | Training | Used for training Apple’s generative features. |
| OAI-SearchBot | ChatGPT (OpenAI) | Search | Real-time retriever for ChatGPT answers. |
| DeepSeekBot | DeepSeek | Training | Training for the DeepSeek model series. |
| PerplexityBot | Perplexity | Search | Fact-checking and retrieval for Perplexity. |
| Claude-Web | Claude (Anthropic) | Other | Legacy bot for web browsing during Claude interactions. |
| Grok-DeepSearch | Grok (xAI) | Search | Real-time web search for Grok’s deep research feature. |
| Ai2Bot-Dolma | Allen Institute | Training | Specifically builds the Dolma open dataset. |
| Manus-User | Meta | User Query | Action Agent: Navigates and interacts with sites. |
| FacebookBot | Meta | Training | Web crawler used by Meta for AI training data collection. |
| AzureAI-SearchBot | Microsoft | Search | Web retrieval for Azure AI services. |
| xAI-Grok | Grok (xAI) | Search | General-purpose web search bot for xAI/Grok. |
| Timpibot | Timpi | Training | Decentralized search engine for AI. |
| Claude-SearchBot | Claude (Anthropic) | Search | Anthropic’s specific bot for its search features. |
| MistralAI-User | Mistral | User Query | On-demand browser for Mistral users. |
| Claude-User | Claude (Anthropic) | User Query | Triggered when a user prompts with a specific link. |
| Amzn-SearchBot | Amazon | Search | Search bot for Amazon’s AI shopping features. |
| MyCentralAIScraperBot | Unknown | Other | Centralized AI data collection tool. |
| GPTBot | ChatGPT (OpenAI) | Training | Primary crawler for foundational training. |
| anthropic-ai | Claude (Anthropic) | Training | General data collection and model training. |
| meta-webindexer | Meta | Search | Search indexing for Meta’s AI assistants. |
| NovaAct | Amazon | User Query | Agent for automated web-based workflows. |
| meta-externalfetcher | Meta | User Query | Used for real-time link expansion on Meta. |
| CloudVertexBot | Google | Training | Cloud-based AI deployment and indexing. |
| Ai2Bot | Allen Institute | Training | General-purpose web crawler for Allen Institute AI research. |
| Meta-ExternalAgent | Meta | Training | High-velocity training crawler for Llama. |
| quillbot.com | QuillBot | User Query | Fetches content to power QuillBot’s AI writing tools. |
| Applebot | Apple | Search | Gathers data to power Spotlight, Siri, and Safari search functionality. |
| cohere-ai | Cohere | Training | Training for enterprise-grade LLMs. |