If you’ve been checking your server logs lately, you’ve probably noticed some unfamiliar names showing up — things like GPTBot, ClaudeBot, CCBot, or Bytespider. These are AI crawlers. They’re visiting your site to collect data, often to train or power AI models.
Whether you want to stop them completely or just control which ones get access, you can block AI crawlers in your robots.txt file. This guide shows you exactly how.
Why Bloggers Are Blocking AI Crawlers in 2026
A year or two ago, most people had never heard of AI crawlers. Now they’re showing up in Google Analytics, server access logs, and WordPress security plugins on a daily basis.
The concern is simple: your content took time to write. You did the research, the editing, the formatting. AI companies are sending bots to collect that content at scale — sometimes to train models, sometimes to generate AI summaries that show up instead of your actual page.
That’s the core issue. It’s not just about traffic. It’s about control over your own content.
Some bloggers are blocking all AI crawlers. Others are selectively blocking only the training bots while keeping the user-facing ones (like Bing’s AI search). Both approaches are valid, and both can be done with a few lines in your robots.txt file.
The question isn’t really can you block them — you can, and it’s easy. The question is which ones should you block, and for what reason.
Training Bots vs User-Facing Bots — Know the Difference
Not all AI crawlers are the same, and this is where most guides get it wrong.
There are two types of AI bots visiting your site right now:
Training bots: These crawl your content to build or improve an AI model. Your content becomes part of their dataset. Examples: GPTBot (OpenAI), CCBot (Common Crawl), Bytespider (ByteDance/TikTok).
User-facing bots: These crawl your content to give real users an AI-powered answer, often in search results or a chat interface. If you block these, your content may stop appearing in AI-powered search features. Examples: Bingbot (when used for Copilot answers), PerplexityBot. Keep in mind that Bingbot is also Bing’s regular search crawler, so blocking it removes you from Bing search as well.
The practical difference matters. If you block a training bot, your content won’t end up in future model training datasets. If you block a user-facing bot, you might lose visibility in AI-generated search summaries.
A lot of site owners want to block the training bots but keep the user-facing ones. The good news is you can do that — you just need to know the specific user-agent strings for each bot.
Relevant read: Google’s documentation on how crawlers interact with robots.txt. It’s useful background on how the robots.txt protocol actually works.
Complete List of AI Crawlers and Their User-Agents (2026)
| Bot Name | Company | Type | User-Agent String | Respects robots.txt? |
|---|---|---|---|---|
| GPTBot | OpenAI | Training | GPTBot | Yes |
| ClaudeBot | Anthropic | Training | ClaudeBot | Yes |
| CCBot | Common Crawl | Training | CCBot | Yes |
| Bytespider | ByteDance | Training | Bytespider | Inconsistent |
| Google-Extended | Google | Training (Gemini) | Google-Extended | Yes |
| PerplexityBot | Perplexity AI | User-facing | PerplexityBot | Yes |
| FacebookBot | Meta | Training/Indexing | FacebookBot | Yes |
| Applebot-Extended | Apple | Training | Applebot-Extended | Yes |
| cohere-ai | Cohere | Training | cohere-ai | Yes |
| Omgilibot | Webz.io | Data collection | omgilibot | Mostly |
| DuckAssistBot | DuckDuckGo | User-facing AI | DuckAssistBot | Yes |
| YouBot | You.com | User-facing AI | YouBot | Yes |
| Timpibot | Timpi | Indexing/AI | Timpibot | Mostly |
| img2dataset | Various | Image training | img2dataset | No |
Note: “Respects robots.txt” means the company has publicly stated they will honor disallow rules. Bytespider and img2dataset have been reported to ignore them in some cases. For those, server-level blocking via .htaccess or your firewall is a more reliable option.
Source: OpenAI’s GPTBot documentation and Anthropic’s crawler policy. Both confirm they honor robots.txt disallow rules.
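To make "server-level blocking" concrete, here is a minimal sketch of the logic in Python: inspect the User-Agent header and refuse the request before any content is served. It assumes a site running behind a WSGI application, which most WordPress blogs are not, so treat it as an illustration of what an .htaccess or firewall rule does rather than something to install. The names `block_ai_crawlers` and `BLOCKED_AGENTS` are made up for this example.

```python
# Hypothetical blocklist: the two bots this guide flags as unreliable.
BLOCKED_AGENTS = ("bytespider", "img2dataset")


def block_ai_crawlers(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests from blocked user-agents get a 403."""

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in user_agent for name in blocked):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everyone else reaches the real application untouched.
        return app(environ, start_response)

    return middleware
```

Unlike robots.txt, this kind of rule doesn't rely on the bot cooperating. It can still be sidestepped by a crawler that sends a fake user-agent string.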
How to Block AI Crawlers in Your Robots.txt (Copy-Paste Code)
Your robots.txt file lives at the root of your website — so it’s accessible at yourdomain.com/robots.txt. If you’re on WordPress, it’s either auto-generated or you can edit it through your SEO plugin (more on that below).
Here are the exact rules you need:
Block All AI Training Bots
If you want to stop all known AI training crawlers from accessing your content, add this to your robots.txt:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: Timpibot
Disallow: /
User-agent: img2dataset
Disallow: /
Block Specific Bots Only
If you only care about one or two — say, you want to block OpenAI but not Google’s Gemini training bot — just add the relevant lines:
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
Recommended: Allow User-Facing, Block Training Only
This is the approach most SEO-conscious bloggers are taking. You keep access open for AI search tools (which can still send you traffic) and block the pure training crawlers:
# Block training bots
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: img2dataset
Disallow: /
User-agent: Bytespider
Disallow: /
# Allow user-facing AI search bots
User-agent: PerplexityBot
Allow: /
User-agent: DuckAssistBot
Allow: /
User-agent: YouBot
Allow: /
Instead of copying all this manually, you can use our free robots.txt generator — it has a dedicated AI Bots tab where you can toggle each of the 14 crawlers above on or off and see the output update in real time. Much faster than editing raw code.
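If you’d rather script the file yourself, the rules are regular enough to generate from two lists. The sketch below is one way to do it in Python. The function name `build_robots_txt` and the example lists are illustrative, not part of any tool mentioned in this guide.

```python
def build_robots_txt(blocked, allowed=()):
    """Return robots.txt text with one group per user-agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in allowed]
    # A blank line between groups is optional but easier to read.
    return "\n\n".join(groups) + "\n"


# Example lists drawn from the table in this guide; edit to taste.
training_bots = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]
search_bots = ["PerplexityBot", "DuckAssistBot"]
print(build_robots_txt(training_bots, search_bots))
```

Keeping the bot names in a list makes it easy to add a new crawler later without retyping the whole file.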
How to Add These Rules to WordPress
WordPress generates a virtual robots.txt file automatically. You have three ways to edit it:
Option 1 — Use the Rank Math or Yoast robots.txt editor
Both plugins have a built-in editor. In Rank Math, go to: Rank Math → General Settings → Edit robots.txt. Paste your rules there and save. The plugin takes care of the rest.
Option 2 — Use a physical robots.txt file
Create a file called robots.txt in your site’s root folder (via FTP or your host’s file manager) and paste the rules. WordPress will use this file instead of the auto-generated one.
Option 3 — Skip the manual work
Use our robots.txt generator, select the bots you want to block in the AI Bots tab, copy the output, and paste it into Rank Math or your physical file. Done in about 60 seconds.
Tip: After editing your robots.txt, open the robots.txt report in Google Search Console (it replaced the older robots.txt tester) to confirm the file is fetched and parsed without errors.
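For a quick local check before you publish, Python’s standard library ships a robots.txt parser, `urllib.robotparser`. The sketch below feeds it a shortened rule set and asks whether each bot may fetch a page. The `allowed_agents` helper and the example.com URL are placeholders, and real crawlers may interpret edge cases differently from Python’s parser, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the "block training, allow user-facing" rules.
RULES = """\
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Allow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/sample-post/"):
    """Map each user-agent to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


results = allowed_agents(RULES, ["GPTBot", "CCBot", "PerplexityBot", "Googlebot"])
for agent, ok in results.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

A bot with no matching group (Googlebot here) is treated as allowed, which is why regular search crawlers are unaffected by these rules.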
Should You Block ALL AI Crawlers? (The Trade-off)
The honest answer is: it depends on what you’re trying to protect and what you’re willing to give up.
Here’s how to think through it:
Block all AI crawlers if:
- Your content is original, research-heavy, or creative work you don’t want scraped
- You’ve seen AI tools reproducing your content verbatim
- You’re not relying on AI-powered search features for traffic
Don’t block everything if:
- You want to appear in Perplexity, Copilot, or other AI search tools
- Your niche relies on AI-powered discovery (tech, health, recipes)
- You’re open to AI summarization as long as users can still click through
One thing worth knowing: robots.txt is a voluntary standard, so your rules only work on bots that choose to honor it. Most major ones (GPTBot, ClaudeBot, Google-Extended) say they do. But some smaller or less reputable bots don’t bother. If you want complete control, you may also need to block certain user-agents at the server level.
For most bloggers, the selective approach works fine: block the training bots, allow the user-facing ones, and check your server logs every few months to catch any new ones.
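Checking your logs doesn’t have to be manual. The sketch below counts how many access-log lines mention each bot name. It assumes a plain-text log where the user-agent appears somewhere on each line, which is typical for Apache and Nginx. The `count_ai_hits` function, the bot list, and the sample lines are illustrative, and your real log location and format will depend on your host.

```python
from collections import Counter

# Names taken from the table in this guide; extend as new crawlers appear.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot"]


def count_ai_hits(log_lines, bots=AI_BOTS):
    """Count log lines mentioning each bot name (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in bots:
            if bot.lower() in lowered:
                hits[bot] += 1
    return hits


# Made-up sample lines standing in for a real access log.
sample = [
    '203.0.113.5 - - "GET /post/ HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.2 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 Firefox/120.0"',
]
print(count_ai_hits(sample).most_common())
```

To run it on a real log, pass an open file object instead of `sample`. A bot that keeps appearing after you’ve disallowed it is a candidate for server-level blocking.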
Our robots.txt generator is updated regularly with new AI bots as they’re identified, so you don’t have to manually track every new crawler that appears. Just open the AI Bots tab, toggle what you want, and copy the output.
Final Thoughts
Blocking AI crawlers isn’t about being anti-technology. It’s about having control over content you created. The robots.txt file has always been the standard way to tell bots what they can and can’t do — AI crawlers are just the newest type of bot you might want to talk to.
The rules are simple, the code is easy to copy, and you can always adjust later. If you want to handle it without touching code at all, head over to our robots.txt generator and use the AI Bots tab to build your file in about a minute.
Also worth reading: Dark Visitors’ AI crawler directory — a regularly updated database of known AI bots, their user-agents, and whether they honor robots.txt. Useful for staying current.
Quick Recap
- AI crawlers fall into two types: training bots (collect data for models) and user-facing bots (power AI search results)
- You can block any of them using User-agent and Disallow rules in your robots.txt
- Most major bots (GPTBot, ClaudeBot, Google-Extended) officially respect these rules
- WordPress users can edit robots.txt through Rank Math, Yoast, or a physical file
- For the fastest setup, use the AI Bots tab in our robots.txt generator
Last updated: 2026. Bot user-agent strings are accurate as of publication. Check official documentation for each company for any changes.