If you’ve been checking your server logs lately, you’ve probably noticed some unfamiliar names showing up — things like GPTBot, ClaudeBot, CCBot, or Bytespider. These are AI crawlers. They’re visiting your site to collect data, often to train or power AI models.

Whether you want to stop them completely or just control which ones get access, you can block AI crawlers in robots.txt with a few lines of text. This guide shows you exactly how.

Why Bloggers Are Blocking AI Crawlers in 2026

A year or two ago, most people had never heard of AI crawlers. Now they’re showing up in server access logs and WordPress security plugins on a daily basis.

The concern is simple: your content took time to write. You did the research, the editing, the formatting. AI companies are sending bots to collect that content at scale — sometimes to train models, sometimes to generate AI summaries that show up instead of your actual page.

That’s the core issue. It’s not just about traffic. It’s about control over your own content.

Some bloggers are blocking all AI crawlers. Others are selectively blocking only the training bots while keeping the user-facing ones (like Bing’s AI search). Both approaches are valid, and both can be done with a few lines in your robots.txt file.

The real question isn’t whether you can block them (you can, and it’s easy). It’s which ones you should block, and why.

Training Bots vs User-Facing Bots — Know the Difference

Not all AI crawlers are the same, and this is where most guides get it wrong.

There are two types of AI bots visiting your site right now:

Training bots: These crawl your content to build or improve an AI model. Your content becomes part of their dataset. Examples: GPTBot (OpenAI), CCBot (Common Crawl), Bytespider (ByteDance/TikTok).

User-facing bots: These crawl your content to give real users an AI-powered answer, often in search results or a chat interface. If you block these, your content may stop appearing in AI-powered search features. Examples: Bingbot (when used for Copilot answers), PerplexityBot.

The practical difference matters. If you block a training bot, your content won’t end up in future model training datasets. If you block a user-facing bot, you might lose visibility in AI-generated search summaries.

A lot of site owners want to block the training bots but keep the user-facing ones. The good news is you can do that — you just need to know the specific user-agent strings for each bot.

Relevant read: Google’s documentation on how crawlers interact with robots.txt — useful background on how the robots.txt protocol actually works.


Complete List of AI Crawlers and Their User-Agents (2026)

| Bot Name | Company | Type | User-Agent String | Respects robots.txt? |
|---|---|---|---|---|
| GPTBot | OpenAI | Training | GPTBot | Yes |
| ClaudeBot | Anthropic | Training | ClaudeBot | Yes |
| CCBot | Common Crawl | Training | CCBot | Yes |
| Bytespider | ByteDance | Training | Bytespider | Inconsistent |
| Google-Extended | Google | Training (Gemini) | Google-Extended | Yes |
| PerplexityBot | Perplexity AI | User-facing | PerplexityBot | Yes |
| FacebookBot | Meta | Training/Indexing | FacebookBot | Yes |
| Applebot-Extended | Apple | Training | Applebot-Extended | Yes |
| cohere-ai | Cohere | Training | cohere-ai | Yes |
| Omgilibot | Webz.io | Data collection | omgilibot | Mostly |
| DuckAssistBot | DuckDuckGo | User-facing AI | DuckAssistBot | Yes |
| YouBot | You.com | User-facing AI | YouBot | Yes |
| Timpibot | Timpi | Indexing/AI | Timpibot | Mostly |
| img2dataset | Various | Image training | img2dataset | No |

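Note: “Respects robots.txt” means the company has publicly stated it will honor disallow rules. Bytespider and img2dataset have been reported to ignore them in some cases; for those, server-level blocking via .htaccess or your firewall is more reliable. Also note that Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers: the actual fetching is done by Googlebot and Applebot, so these two can only be controlled through robots.txt, not blocked by user-agent at the server level.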

Source: OpenAI’s GPTBot documentation and Anthropic’s crawler policy — both confirm they honor robots.txt disallow rules.
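If your site happens to run as a Python WSGI application (an assumption; on Apache or Nginx you would express the same idea in server config), a server-level block can be sketched as a small middleware that answers 403 Forbidden whenever the User-Agent header contains a listed token. BLOCKED_AGENTS, block_ai_agents and status_for are hypothetical names, and the example user-agent strings are made up. Google-Extended and Applebot-Extended are left out on purpose because they never appear in request headers. A User-Agent header is easy to fake, so this stops noncompliant bots that identify themselves honestly, not determined scrapers.

```python
from typing import Callable, Iterable

# Hypothetical blocklist: case-insensitive substrings of the User-Agent header.
# These are the bots the table reports as ignoring robots.txt.
BLOCKED_AGENTS = ("bytespider", "img2dataset")


def block_ai_agents(app: Callable, blocked: Iterable[str] = BLOCKED_AGENTS) -> Callable:
    """Wrap a WSGI app so requests from blocked user agents get a 403."""
    needles = tuple(token.lower() for token in blocked)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(needle in user_agent for needle in needles):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other request falls through to the real application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


site = block_ai_agents(hello_app)


def status_for(user_agent: str) -> str:
    """Call the wrapped app directly and return the HTTP status line it sets."""
    statuses = []
    site({"HTTP_USER_AGENT": user_agent}, lambda status, headers: statuses.append(status))
    return statuses[0]


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; Bytespider)"))       # 403 Forbidden
    print(status_for("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # 200 OK
```

In a real deployment you would wrap your framework's WSGI callable instead of hello_app. Matching on substrings keeps the rule working when a bot appends version numbers or contact URLs to its user-agent string.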


How to Block AI Crawlers in Your Robots.txt (Copy-Paste Code)

Your robots.txt file lives at the root of your website — so it’s accessible at yourdomain.com/robots.txt. If you’re on WordPress, it’s either auto-generated or you can edit it through your SEO plugin (more on that below).
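Because crawlers look for the file only at the root of each host, www.yourdomain.com and shop.yourdomain.com each need their own robots.txt. The lookup can be sketched in a few lines of standard-library Python; robots_url is a hypothetical helper name, not part of any crawler’s published code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    The scheme and network location (host plus any port) are kept; the path is
    replaced with /robots.txt and the query string and fragment are dropped.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://yourdomain.com/2026/05/my-post/?utm_source=x#comments"))
    print(robots_url("https://shop.yourdomain.com/product/42"))
```

The first call prints https://yourdomain.com/robots.txt and the second https://shop.yourdomain.com/robots.txt, which is why rules on your main domain don’t cover a store or docs subdomain.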

Here are the exact rules you need:

Block All AI Training Bots

If you want to stop all known AI training crawlers from accessing your content, add this to your robots.txt:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: Timpibot
Disallow: /

User-agent: img2dataset
Disallow: /
```
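If you’d rather keep the bot list in one place than edit the file by hand, a small function can emit the same one-group-per-bot layout. This is a sketch: build_robots_txt and AI_TRAINING_BOTS are hypothetical names, and the token list simply mirrors the block above.

```python
# The training-bot tokens from the block above; edit the list to suit your site.
AI_TRAINING_BOTS = [
    "GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Bytespider",
    "FacebookBot", "Applebot-Extended", "cohere-ai", "omgilibot",
    "Timpibot", "img2dataset",
]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Return robots.txt text with one 'Disallow: /' group per user-agent token."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # A blank line separates groups, and the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_TRAINING_BOTS), end="")
```

The printed text matches the block above line for line, so you can paste it wherever you manage robots.txt and regenerate it whenever the list changes.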

Block Specific Bots Only

If you only care about one or two bots (say, you want to block OpenAI’s GPTBot and Common Crawl’s CCBot but leave Google’s Gemini training token alone), add just the relevant lines:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```
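RFC 9309, the robots.txt standard, also lets several User-agent lines share one set of rules, so the two groups above can be collapsed into one. To confirm that a standards-following parser reads it the way you intend, Python’s standard-library urllib.robotparser is a convenient stand-in. It is a simplified parser, not the one OpenAI or Common Crawl actually runs, so treat this as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Both tokens share one group: each listed crawler obeys the Disallow line below them.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

PAGE = "https://yourdomain.com/my-post/"
for agent in ("GPTBot", "CCBot", "Google-Extended"):
    allowed = parser.can_fetch(agent, PAGE)
    print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

GPTBot and CCBot come back blocked, while Google-Extended, which matches no group and has no User-agent: * group to fall back on, is allowed.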

Recommended: Allow User-Facing, Block Training Only

This is the approach most SEO-conscious bloggers are taking. You keep access open for AI search tools (which can still send you traffic) and block the pure training crawlers:

```
# Block training bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: img2dataset
Disallow: /

# Allow user-facing AI search bots
User-agent: PerplexityBot
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: YouBot
Allow: /
```

Instead of copying all this manually, you can use our free robots.txt generator — it has a dedicated AI Bots tab where you can toggle each of the 14 crawlers above on or off and see the output update in real time. Much faster than editing raw code.
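The explicit Allow groups in the recommended configuration have a side effect worth checking. Under RFC 9309 a crawler obeys only the group that names it and ignores the User-agent: * group, so an Allow: / group for PerplexityBot also exempts it from any site-wide rules you set for everyone else. The sketch below uses Python’s standard-library urllib.robotparser as a simplified stand-in for a real crawler’s parser; the User-agent: * rule for /wp-admin/ and the name SomeOtherBot are assumed examples, not part of the configuration above.

```python
from urllib.robotparser import RobotFileParser

# Assumed example: a site-wide rule for every crawler, plus one training-bot
# block and one explicit Allow group like those in the recommended configuration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

ADMIN_URL = "https://yourdomain.com/wp-admin/options.php"
for agent in ("SomeOtherBot", "GPTBot", "PerplexityBot"):
    # SomeOtherBot has no group of its own, so it falls back to the * group.
    print(agent, parser.can_fetch(agent, ADMIN_URL))
```

Here SomeOtherBot and GPTBot are refused /wp-admin/ while PerplexityBot is allowed in. If you want user-facing bots to keep respecting your site-wide rules, either leave them out of the file (unlisted crawlers fall back to the * group) or repeat those Disallow lines inside their group.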


How to Add These Rules to WordPress

WordPress generates a virtual robots.txt file automatically. You have two main ways to edit it, plus a shortcut:

Option 1 — Use the Rank Math or Yoast robots.txt editor

Both plugins have a built-in editor. In Rank Math, go to Rank Math → General Settings → Edit robots.txt; in Yoast, go to Yoast SEO → Tools → File editor. Paste your rules there and save. The plugin takes care of the rest.

Option 2 — Use a physical robots.txt file

Create a file called robots.txt in your site’s root folder (via FTP or your host’s file manager) and paste the rules. Your web server then serves that file directly, so WordPress’s auto-generated version is bypassed.

Option 3 — Skip the manual work

Use our robots.txt generator, select the bots you want to block in the AI Bots tab, copy the output, and paste it into Rank Math or your physical file. Done in about 60 seconds.

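Tip: After editing your robots.txt, open yourdomain.com/robots.txt in a browser to confirm the new rules are live, then check the robots.txt report in Google Search Console (which replaced the old robots.txt Tester) for fetch or parsing errors.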
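Search Console reports on how Google fetches and parses the file, so it won’t tell you how GPTBot or ClaudeBot read it. For a rough local check, the sketch below asks Python’s urllib.robotparser whether each token from the table may fetch your home page. The audit function and the sample rules are illustrative assumptions, and real crawlers use their own parsers: Python’s, for instance, applies rules in file order rather than the longest-match rule in RFC 9309, so results can differ when Allow and Disallow lines overlap.

```python
from urllib.robotparser import RobotFileParser

# The 14 user-agent tokens from the table earlier in this guide.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Google-Extended",
    "PerplexityBot", "FacebookBot", "Applebot-Extended", "cohere-ai",
    "omgilibot", "DuckAssistBot", "YouBot", "Timpibot", "img2dataset",
]

# Replace this with the contents of your own robots.txt.
MY_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def audit(robots_txt: str, url: str = "https://yourdomain.com/") -> dict[str, bool]:
    """Map each AI bot token to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


if __name__ == "__main__":
    for bot, allowed in audit(MY_ROBOTS_TXT).items():
        print(f"{bot:18} {'allowed' if allowed else 'BLOCKED'}")
```

With the sample rules, only GPTBot and ClaudeBot are reported as BLOCKED; every other token is allowed because nothing in the file names it.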



Should You Block ALL AI Crawlers? (The Trade-off)

The honest answer is: it depends on what you’re trying to protect and what you’re willing to give up.

Here’s how to think through it:

Block all AI crawlers if:

Don’t block everything if:

One thing worth knowing: robots.txt is a voluntary standard, so your rules only work on bots that choose to follow them. Most major ones (GPTBot, ClaudeBot, Google-Extended) do, but some smaller or less reputable bots don’t bother. For firmer control, you may also need to block certain user-agents at the server level, keeping in mind that a bot can fake its user-agent string.

For most bloggers, the selective approach works fine: block the training bots, allow the user-facing ones, and check your server logs every few months to catch any new ones.
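That log check can be partly automated. The sketch below counts requests per AI bot, looking for each token only inside the User-Agent field so that a blog post URL like /block-gptbot/ isn’t miscounted. It assumes the combined log format that Apache and Nginx commonly use, where the User-Agent is the last quoted field; the sample lines are invented, and count_ai_bots is a hypothetical name. Google-Extended and Applebot-Extended are omitted because they are robots.txt tokens and never appear in request headers.

```python
import re
from collections import Counter

# AI bot tokens that appear in request User-Agent headers.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot",
    "FacebookBot", "cohere-ai", "omgilibot", "DuckAssistBot", "YouBot",
    "Timpibot", "img2dataset",
]

# In the combined log format the User-Agent is the last double-quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_bots(log_lines):
    """Count requests per AI bot, matching tokens only inside the User-Agent field."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match is None:
            continue  # skip lines that are not in combined format
        user_agent = match.group(1).lower()
        for bot in AI_BOTS:
            if bot.lower() in user_agent:
                counts[bot] += 1
                break  # count each request at most once
    return counts


# Made-up sample lines (documentation IP ranges, invented user-agent strings).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2026:10:00:00 +0000] "GET /block-gptbot/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Jan/2026:10:01:00 +0000] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [10/Jan/2026:10:02:00 +0000] "GET /why-i-block-claudebot/ HTTP/1.1" 200 4096 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.5 - - [10/Jan/2026:10:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 310 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

if __name__ == "__main__":
    for bot, hits in count_ai_bots(SAMPLE_LOG).most_common():
        print(bot, hits)
```

On the sample this reports GPTBot twice and ClaudeBot once; the browser visit to a URL containing "claudebot" is ignored. To scan a real log you could pass an open file object instead of the list, since the function only iterates over lines.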

Our robots.txt generator is updated regularly with new AI bots as they’re identified, so you don’t have to manually track every new crawler that appears. Just open the AI Bots tab, toggle what you want, and copy the output.


Final Thoughts

Blocking AI crawlers isn’t about being anti-technology. It’s about having control over content you created. The robots.txt file has always been the standard way to tell bots what they can and can’t do — AI crawlers are just the newest type of bot you might want to talk to.

The rules are simple, the code is easy to copy, and you can always adjust later. If you want to handle it without touching code at all, head over to our robots.txt generator and use the AI Bots tab to build your file in about a minute.

Also worth reading: Dark Visitors’ AI crawler directory — a regularly updated database of known AI bots, their user-agents, and whether they honor robots.txt. Useful for staying current.


Quick Recap


Last updated: 2026. Bot user-agent strings are accurate as of publication. Check official documentation for each company for any changes.


SEO Checklist (Rank Math On-Page)

| Element | Notes |
|---|---|
| Primary keyword in H1 | “How to Block AI Crawlers in Robots.txt” |
| Primary keyword in meta title | Included |
| Primary keyword in meta description | “block ai crawlers in robots.txt” |
| Primary keyword in first 100 words | Appears in intro paragraph |
| Primary keyword in H2 headings | Multiple H2s include keyword variations |
| Internal links to /robots-txt-generator/ | 4 internal links placed naturally |
| Outbound links | 5 outbound links (Google, OpenAI, Anthropic, Dark Visitors, GSC) |
| Image alt texts with primary keyword | All 3 images include keyword |
| Word count | ~1,600 words |
| Table of structured data | Bot comparison table |
| Code blocks | 3 copy-paste code examples |
| LSI / related keywords used | robots.txt AI bots, user-agent strings, GPTBot, training crawlers, WordPress robots.txt |