| YouBot | You.com | Other | Fetches pages to power You.com’s AI search results. |
| omgili | Webz.io | Training | Forum and discussion crawler for structured dataset building. |
| Perplexity-User | Perplexity | User Query | Fetches pages on a user’s behalf while Perplexity answers their query. |
| Amazonbot | Amazon | Training | General training for Titan/Olympus models. |
| Google-Agent | Google | User Query | Used by Google agents to navigate the web and perform actions upon user request (e.g. Project Mariner). |
| cohere-training-data-crawler | Cohere | Training | Specialized crawler for raw training data. |
| ClaudeBot | Claude (Anthropic) | Training | Official training bot for Anthropic models. |
| Gemini-Deep-Research | Google | User Query | Fetches many pages per session for user-requested Gemini Deep Research reports. |
| Google-CloudVertexBot | Google | Search | Crawls sites at the site owner’s request when building Vertex AI Agents. |
| Google-Extended | Google | Training | Opt-out token for Gemini training and AI product improvement. |
| PanguBot | PanGu (Huawei) | Training | Training for Huawei’s Pangu models. |
| ChatGPT-User | ChatGPT (OpenAI) | User Query | Visits links directly provided by a user. |
| CCBot | Common Crawl | Training | Builds Common Crawl’s open web archive, widely used as AI training data. |
| GrokBot | Grok (xAI) | Training | Real-time web search and training for Grok 3/4 models. |
| DuckAssistBot | DuckDuckGo | User Query | Summarizes pages for DuckDuckGo’s AI responses. |
| omgilibot | Webz.io | Other | Forum-specific crawler variant that feeds Webz.io’s commercial data products. |
| Diffbot | Diffbot | Training | Structured data extraction as a service. |
| GoogleAgent-Mariner | Google | User Query | Action Agent: Can fill forms and click buttons. |
| TikTokSpider | ByteDance | Other | Specialized scraper for TikTok’s AI data. |
| Webzio-Extended | Webz.io | Training | Large-scale data scraping for AI providers. |
| Bytespider | ByteDance | Training | Training for TikTok and ByteDance AI. |
| Applebot-Extended | Apple | Training | Opt-out token controlling whether Applebot-crawled content is used to train Apple’s generative AI models. |
| OAI-SearchBot | ChatGPT (OpenAI) | Search | Indexes sites so they can be surfaced and linked in ChatGPT search results. |
| DeepSeekBot | DeepSeek | Training | Training for the DeepSeek model series. |
| PerplexityBot | Perplexity | Search | Indexes sites to surface and link them in Perplexity’s search results. |
| Claude-Web | Claude (Anthropic) | Other | Legacy bot for web browsing during Claude interactions. |
| Grok-DeepSearch | Grok (xAI) | Search | Real-time web search for Grok’s deep research feature. |
| Ai2Bot-Dolma | Allen Institute | Training | Specifically builds the Dolma open dataset. |
| Manus-User | Meta | User Query | Action Agent: Navigates and interacts with sites. |
| FacebookBot | Meta | Training | Web crawler used by Meta for AI training data collection. |
| AzureAI-SearchBot | Microsoft | Search | Web retrieval for Azure AI services. |
| xAI-Grok | Grok (xAI) | Search | General-purpose web search bot for xAI/Grok. |
| Timpibot | Timpi | Training | Crawls the web for Timpi, a decentralized search engine for AI. |
| Claude-SearchBot | Claude (Anthropic) | Search | Indexes content to improve the quality of Claude’s search results. |
| MistralAI-User | Mistral | User Query | On-demand browser for Mistral users. |
| Claude-User | Claude (Anthropic) | User Query | Fetches pages when a user’s request to Claude requires it, such as a specific link. |
| Amzn-SearchBot | Amazon | Search | Search bot for Amazon’s AI shopping features. |
| MyCentralAIScraperBot | Unknown | Other | Centralized AI data collection tool. |
| GPTBot | ChatGPT (OpenAI) | Training | Primary crawler for collecting OpenAI foundation-model training data. |
| anthropic-ai | Claude (Anthropic) | Training | General data collection and model training. |
| meta-webindexer | Meta | Search | Search indexing for Meta’s AI assistants. |
| NovaAct | Amazon | User Query | Agent for automated web-based workflows. |
| meta-externalfetcher | Meta | User Query | Fetches individual links on user request, e.g. for real-time link expansion in Meta products. |
| CloudVertexBot | Google | Training | Cloud-based AI deployment and indexing. |
| Ai2Bot | Allen Institute | Training | General-purpose web crawler for Allen Institute AI research. |
| Meta-ExternalAgent | Meta | Training | High-velocity training crawler for Llama. |
| quillbot.com | QuillBot | User Query | Fetches content to power QuillBot’s AI writing tools. |
| Applebot | Apple | Search | Gathers data to power Spotlight, Siri, and Safari search functionality. |
| cohere-ai | Cohere | Training | Training for enterprise-grade LLMs. |
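The tokens in the first column are what `User-agent` lines in robots.txt match against. As a minimal sketch, assuming a hypothetical policy that blocks four training tokens from the table (GPTBot, ClaudeBot, CCBot, Google-Extended) while leaving search and user-query agents alone, Python’s standard-library `urllib.robotparser` can check which tokens the policy admits. The helper name `allowed_agents` and the example URL are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block a few training tokens from the table and
# allow every other agent (search and user-query agents included).
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Hypothetical helper: map each agent token to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without a network read().
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    tokens = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
              "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
    for token, ok in allowed_agents(ROBOTS_TXT, tokens, "https://example.com/article").items():
        print(f"{token:16} {'allowed' if ok else 'blocked'}")
```

Two caveats apply to this sketch. `RobotFileParser` matches a rule’s token as a case-insensitive substring of the agent name, so very short tokens can match more agents than intended. More generally, robots.txt is advisory: it only affects crawlers that choose to honor it.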