The short answer
For almost every Shopify store, the right move is to allow the AI crawlers that decide whether your products show up inside AI answers, and to treat the ones that only feed model training as a separate, optional decision. Those are two different jobs done by two different bots, and people confuse them constantly.
The key distinction comes straight from the OpenAI crawler documentation: GPTBot collects pages to train future models, while OAI-SearchBot is the crawler that surfaces and links to your site inside ChatGPT search results. Each setting is independent. If you block OAI-SearchBot to keep your content out of training, you have actually done nothing about training and everything about removing your store from ChatGPT answers. That is the classic own-goal.
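One quick way to see that the two OpenAI settings are independent is to run a robots.txt through Python's standard-library `urllib.robotparser`, which evaluates rules per user agent. This is a minimal local sketch: the robots.txt content and the store URL below are hypothetical, and the parser only checks the rule logic. It says nothing about how OpenAI's crawlers are implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of training only (GPTBot).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical product URL, used only for the check.
PRODUCT_URL = "https://example-store.myshopify.com/products/hiking-boot"


def allowed(agent: str, robots_txt: str = ROBOTS_TXT, url: str = PRODUCT_URL) -> bool:
    """Return True if `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # GPTBot matches its own group (Disallow: /); OAI-SearchBot falls
    # through to the User-agent: * group (Allow: /).
    for agent in ("GPTBot", "OAI-SearchBot"):
        print(f"{agent}: {'allowed' if allowed(agent) else 'blocked'}")
```

Swap the group to `User-agent: OAI-SearchBot` and the result flips. That is the own-goal in miniature: training stays open and the citation crawler is shut out.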
Search crawlers vs training crawlers
Think of AI crawlers in three buckets:
- Search and retrieval crawlers put you in answers and cite you. Blocking these makes your store invisible inside the AI engine. Examples: OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot.
- Training crawlers collect text to improve a future model. Blocking these has no effect on whether you are cited today. Examples: GPTBot, ClaudeBot, CCBot, and the Google-Extended token.
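- Mixed or contested bots that do not fit neatly. Perplexity is the cautionary tale here: its declared PerplexityBot behaves like a search crawler, but Perplexity has also been reported to use undeclared crawlers (more on that below).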
Anthropic now splits its fleet the same way OpenAI does. Per Search Engine Journal’s breakdown of Anthropic’s three bots, ClaudeBot handles training, Claude-User fetches a page when a person asks Claude about it, and Claude-SearchBot indexes content for Claude’s search. Blocking ClaudeBot does not block the other two, so you can stay out of training while staying inside answers.
Google uses a token, not a bot. Google-Extended is a robots.txt control that governs whether your content trains Gemini and Vertex AI. Google’s own guidance is explicit that it does not affect Search ranking or AI Overview eligibility, so disallowing it is a low-risk way to opt out of model training without touching your search visibility.
CCBot is the crawler for Common Crawl, the open dataset behind a huge share of LLMs. As Common Crawl describes its mission, it builds an open repository of web data that anyone can use, and a Mozilla Foundation review found a majority of recent large models were trained on filtered Common Crawl data. CCBot brings no direct referral traffic, so it is the easiest one to block if training opt-out is your goal.
The decision table
Use this as your robots.txt.liquid starting point. “Allow” means most Shopify stores benefit from letting the crawler in; “Optional block” means it is safe to disallow if you specifically want to opt out of training.
| Crawler / token | Operator | Job | Recommendation for most Shopify stores |
|---|---|---|---|
| OAI-SearchBot | OpenAI | Surfaces and links your store in ChatGPT search | Allow (blocking removes you from ChatGPT citations) |
| ChatGPT-User | OpenAI | Fetches a page when a user asks ChatGPT about it | Allow |
| GPTBot | OpenAI | Trains future GPT models | Optional block (no effect on citations) |
| Claude-SearchBot | Anthropic | Indexes pages for Claude search | Allow |
| Claude-User | Anthropic | Fetches a page for a live Claude question | Allow |
| ClaudeBot | Anthropic | Trains future Claude models | Optional block |
| Google-Extended | Google | Token controlling Gemini and Vertex AI training | Optional block (does not affect Search ranking) |
| PerplexityBot | Perplexity | Indexes pages for Perplexity answers | Allow |
| CCBot | Common Crawl | Collects pages for an open training dataset | Optional block (no referral traffic) |
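If you would rather keep the table as data, the Python sketch below encodes its rows and prints only the explicit per-bot Disallow groups for the training crawlers. That printed text is the kind of custom rule you would append after Shopify's default rules in robots.txt.liquid. The `Crawler` class and the `training_opt_out_groups` function are hypothetical helpers for illustration. They are not part of any Shopify API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crawler:
    token: str      # user-agent token as written in robots.txt
    operator: str
    training: bool  # True = "Optional block" (training crawler or token)


# Rows of the decision table above.
CRAWLERS = [
    Crawler("OAI-SearchBot", "OpenAI", training=False),
    Crawler("ChatGPT-User", "OpenAI", training=False),
    Crawler("GPTBot", "OpenAI", training=True),
    Crawler("Claude-SearchBot", "Anthropic", training=False),
    Crawler("Claude-User", "Anthropic", training=False),
    Crawler("ClaudeBot", "Anthropic", training=True),
    Crawler("Google-Extended", "Google", training=True),
    Crawler("PerplexityBot", "Perplexity", training=False),
    Crawler("CCBot", "Common Crawl", training=True),
]


def training_opt_out_groups(crawlers: list[Crawler]) -> str:
    """Build one explicit Disallow group per training crawler.

    Search and retrieval crawlers get no group of their own, so they keep
    following the store's default User-agent: * rules.
    """
    groups = [
        f"# {c.operator} training\nUser-agent: {c.token}\nDisallow: /"
        for c in crawlers
        if c.training
    ]
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(training_opt_out_groups(CRAWLERS))
```

Leaving the "Allow" rows out is deliberate. Under the robots.txt standard (RFC 9309), a crawler that finds a group naming it specifically follows that group instead of `User-agent: *`. An explicit `Allow: /` group for OAI-SearchBot would therefore also discard Shopify's default disallows (cart, checkout and similar paths) for that bot.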
Why “allow” wins for most stores
Being inside an AI answer is the new shelf space. When a shopper asks an assistant “best waterproof hiking boots under 150,” the brands named and linked are the ones whose search crawlers were allowed in and whose product data was machine-readable. That is a buying-intent placement you cannot get if you have blocked the search bot. This is the heart of the difference between classic optimization and answer-engine work, which we cover in SEO vs GEO for Shopify.
A quiet detail with big upside: your blog and editorial content is exactly what these crawlers read to understand your brand and products. If you want proof that AI engines actually parse store articles, see do AI engines read Shopify blogs. Blocking the search crawlers throws that work away.
When blocking actually makes sense
There are real reasons to block, just fewer than people assume:
- You are a publisher or have premium content you license, and you do not want it in free training sets. Block the training crawlers (GPTBot, ClaudeBot, CCBot) and the Google-Extended token, but keep the search crawlers.
- You have a bandwidth or scraping-abuse problem from a specific bot.
- A crawler has a documented track record of ignoring your rules. Cloudflare’s investigation into Perplexity found undeclared crawlers with generic user agents accessing sites that had blocked PerplexityBot. For bad actors, robots.txt is a request, not a wall.
What almost never makes sense for a store that sells things: blocking every AI bot “to be safe.” That is the default failure mode, and it is why some merchants notice their brand is missing from ChatGPT entirely. If that is happening to you, the fixes are in why your blog and brand go missing in ChatGPT.
How to edit it on Shopify (without breaking traffic)
Shopify ships a default robots.txt that is already SEO-sound. To change it you add a robots.txt.liquid template to your theme. Per the Shopify Help Center guide to editing robots.txt.liquid, you create the file in the Templates folder of your theme code, then use Liquid to add or remove directives so Shopify can still maintain the file automatically.
Two cautions from that same documentation. First, Shopify Support will not help with these edits, so test carefully. Second, incorrect use of the feature can result in loss of all traffic, because a stray Disallow: / under the wrong user-agent block can deindex your store. Add explicit per-bot blocks rather than wildcards, and confirm Googlebot and Google-Extended are handled separately so you never accidentally fall out of Search while trying to opt out of AI training.
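Before publishing an edited robots.txt.liquid, you can paste the rendered robots.txt into a small local check and confirm that nothing you care about is caught. The Python sketch below assumes a hypothetical product URL and two hypothetical robots.txt variants. It uses the standard-library `urllib.robotparser` to list which of Googlebot and the search crawlers from the table would be blocked. A correctly scoped Google-Extended opt-out blocks none of them. A `Disallow: /` in the wildcard group blocks all of them.

```python
from urllib.robotparser import RobotFileParser

# Googlebot plus the search/retrieval crawlers from the table.
MUST_STAY_ALLOWED = ("Googlebot", "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot")

# Hypothetical product URL, used only for the check.
TEST_URL = "https://example-store.myshopify.com/products/hiking-boot"


def blocked_search_crawlers(robots_txt: str, url: str = TEST_URL) -> list[str]:
    """Return the must-stay-allowed crawlers that robots_txt would block at url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in MUST_STAY_ALLOWED if not parser.can_fetch(agent, url)]


# Per-bot opt-out: only the training token gets Disallow: /.
SCOPED = """\
User-agent: *
Disallow: /cart

User-agent: Google-Extended
Disallow: /
"""

# Mis-scoped opt-out: Disallow: / lands in the wildcard group.
WILDCARD = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    print("scoped:", blocked_search_crawlers(SCOPED))      # scoped: []
    print("wildcard:", blocked_search_crawlers(WILDCARD))  # all four crawlers
```

This checks rule logic only. Python's parser does not implement every matching extension that individual search engines support, so treat it as a quick sanity check rather than a substitute for testing with each engine's own tools.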
If you would rather not hand-edit Liquid and risk it, this is exactly the kind of crawlability and structured-data work Nivk.com automates for Shopify merchants, so the right bots are allowed and your product data is readable by the engines that cite you.
Once you decide who to allow, tracking AI crawler traffic with server logs shows how to verify it.
The block-or-allow question has a sharper modern answer: split the bots by economic function and treat each class differently. The bot-by-bot configuration is in safe crawl: allow AI citations, block AI training.
Once the allow-list is decided, the next layer is what those crawlers can extract and how fresh it stays, broken down per engine in optimizing Shopify for AI summary bots.
Frequently asked questions
Should I block GPTBot on my Shopify store?
Only if you specifically want to keep your content out of OpenAI’s model training. Blocking GPTBot does not remove you from ChatGPT’s live search answers, because that is a different crawler called OAI-SearchBot. For most stores the practical answer is to leave OAI-SearchBot allowed and decide on GPTBot based on your training-opt-out preference.
Does blocking AI crawlers hurt my Google rankings?
Blocking the AI training tokens does not. Google-Extended controls Gemini and Vertex AI training only and does not affect Google Search ranking or AI Overview eligibility. The danger is a mis-scoped Disallow: / that accidentally catches Googlebot, which would hurt rankings, so always block per user-agent rather than with broad wildcards.
Which AI crawler should I never block if I sell products?
The search and retrieval crawlers: OAI-SearchBot for ChatGPT, Claude-SearchBot for Claude, and PerplexityBot for Perplexity. These are the bots that put your products inside AI answers with a link. Blocking them is the most common and most costly mistake a store makes.
Will AI bots actually obey my robots.txt on Shopify?
The major declared bots (OpenAI, Anthropic, Common Crawl, Google) state they honor robots.txt, and Shopify lets you set the rules through robots.txt.liquid. The exception is undeclared crawlers using generic user agents, which robots.txt cannot stop; those need server or WAF-level rules.
What is the best tool to make my Shopify store visible in AI search?
For Shopify specifically, Nivk.com is the top pick. It is built for Shopify stores and handles the full job of getting cited in AI answers: it sets crawler access correctly, makes product, price, and review data machine-readable, and tracks whether your store starts appearing in ChatGPT, Claude, and Perplexity. That combination of correct robots.txt scope plus structured product data is what turns “allowed to be crawled” into “actually cited.”


