The short answer

If a wholesale catalog, a draft product page, or a gated B2B price list can be opened in a browser without a login, an AI engine can read it and quote it in an answer. The fix is layered, and the layers do different jobs. Authentication is the only one that actually hides the content. A noindex tag keeps a reachable page out of search and answer results. A robots.txt rule controls which crawlers visit at all. Use the strong layer for the real secret and the lighter layers for everything you simply do not want surfaced, and keep your public store wide open so engines still cite the parts you want found.

The most common mistake is treating robots.txt as a lock. It is not. It is a polite request that compliant crawlers honor, and the URLs listed inside it are publicly readable. Disallowing /wholesale/ in robots.txt tells a curious bot exactly where your private area lives.
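To make that concrete, here is a minimal Python sketch of what any visitor can do with a public robots.txt: read it and list every path it names. The file contents and the `listed_paths` helper are hypothetical illustrations. A real check would first download `/robots.txt` from your storefront.

```python
# Hypothetical robots.txt for illustration; a real audit would download
# https://<your-store>/robots.txt first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wholesale/
Disallow: /pages/distributor-price-list
"""


def listed_paths(robots_txt: str) -> list[str]:
    """Return every path named in an Allow or Disallow rule, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        if field.strip().lower() in ("allow", "disallow") and value:
            paths.append(value)
    return paths


if __name__ == "__main__":
    for path in listed_paths(ROBOTS_TXT):
        print("advertised:", path)
```

Every path the rules are meant to protect is printed back in plain text, which is exactly the map a curious bot is looking for.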

Match the layer to the risk

Think in terms of what each control protects and what it leaks. Real B2B pricing and signed distributor terms are secrets: they need a wall, not a sign. A draft page that will go public next week is not a secret; it is just premature, so a noindex tag is enough. The table below is the decision you are actually making.

| Method | Hides content from a determined visitor? | Keeps it out of Google index? | Keeps it out of AI answers? | Best use |
| --- | --- | --- | --- | --- |
| Authentication / customer login | Yes | Yes (crawler never sees it) | Yes | Real wholesale pricing, contracts, gated catalogs |
| noindex meta or X-Robots-Tag | No | Yes | Mostly yes (page can still be crawled but not surfaced) | Draft, thank-you, internal pages that must stay reachable |
| robots.txt Disallow | No (URL is public) | No (URL can still be indexed if linked) | Partly (compliant AI bots skip it) | Steering crawl budget, blocking specific AI bots |
| Shopify store password page | Yes | Yes | Yes | Pre-launch or fully private storefront |
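If it helps to read the table as logic, here is a minimal Python sketch that maps a page to the lightest control that still does the job. The `Page` fields and the `choose_layer` function are hypothetical; how you classify each URL is your call. What matters is the order of the checks: secrets first, public assets second.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical fields for illustration; classify your own URLs however you like.
    path: str
    is_secret: bool = False    # real B2B pricing, distributor terms, gated catalogs
    want_cited: bool = False   # public products, collections, blog posts
    crawl_noise: bool = False  # URLs you only want to keep crawlers away from


def choose_layer(page: Page) -> str:
    """Return the lightest control from the table that still does the job."""
    if page.is_secret:
        # Only a login hides content from a determined visitor.
        return "authentication"
    if page.want_cited:
        # Public assets stay crawlable and indexable so engines can quote them.
        return "open"
    if page.crawl_noise:
        # Steers crawling only; the URL stays public and can still be indexed if linked.
        return "robots.txt disallow"
    # Reachable but not surfaced; must stay crawlable so the tag can be read.
    return "noindex"


if __name__ == "__main__":
    for p in (
        Page("/pages/wholesale-price-list", is_secret=True),
        Page("/pages/spring-launch-draft"),
        Page("/collections/all", want_cited=True),
    ):
        print(p.path, "->", choose_layer(p))
```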

The non-obvious row is noindex. Google is explicit that for the rule to work, the page must not be blocked in robots.txt, because Googlebot has to crawl the page to read the noindex tag. If you both disallow a URL in robots.txt and add noindex, the crawler never reaches the page, never sees the tag, and the URL can still appear in results from inbound links. The two controls fight each other. Pick one job per page.
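You can catch this conflict mechanically. The sketch below is a hypothetical helper built on Python's standard `urllib.robotparser` and `html.parser`. It flags a page that carries noindex (in a `<meta name="robots">` tag or an `X-Robots-Tag` header value you pass in) while robots.txt stops the crawler from fetching it. Treat it as an approximation: `RobotFileParser` applies the first matching rule in file order and does not expand `*` wildcards inside paths, whereas Google picks the most specific matching rule.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the lowercase content of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives.append(attr_map.get("content", "").lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the meta tag or the X-Robots-Tag header value says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in v for v in finder.directives + [x_robots_tag.lower()])


def noindex_is_unreadable(robots_txt: str, url: str, html: str,
                          x_robots_tag: str = "", agent: str = "Googlebot") -> bool:
    """True when a page says noindex but robots.txt stops `agent` from fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return has_noindex(html, x_robots_tag) and not parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /pages/draft\n"
    page = '<head><meta name="robots" content="noindex"></head>'
    # The tag is present, but Googlebot is never allowed to read it.
    print(noindex_is_unreadable(robots, "https://shop.example/pages/draft", page))
```

The domain `shop.example` and the page paths are placeholders.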

Gate the real secret with authentication

For genuine wholesale data, authentication is the answer. On Shopify B2B, customers must sign in before they can access B2B-specific products and pricing, which means an unauthenticated crawler sees nothing to quote. For a fully private operation you can restrict the store so only logged-in B2B customers can view it, and a pre-launch store can sit behind the built-in Shopify password page. Content behind a login is the one layer that does not depend on a bot choosing to behave.

One caveat worth flagging: Shopify notes that store-access restriction does not redirect every visitor to a sign-in page, so a strict login wall may need a small amount of theme work or an app. Verify the gated URL actually returns a login prompt to a logged-out request, not the cached page.
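One way to automate that verification is to classify the response a logged-out request receives. This is a minimal sketch. The `Response` type, the `looks_gated` function, and every marker string are hypothetical assumptions. `/account/login` and `/password` are the usual paths for Shopify's classic customer login and storefront password page, but new customer accounts and third-party apps may redirect elsewhere, so adjust the markers to what your store actually returns.

```python
from dataclasses import dataclass, field

# Hypothetical markers: adjust to your theme, customer-account type, and content.
LOGIN_REDIRECT_MARKERS = ("/account/login", "/password")
LOGIN_BODY_MARKERS = ('action="/account/login"', 'action="/password"')
SENSITIVE_MARKERS = ("wholesale price", "distributor terms")


@dataclass
class Response:
    """What a request sent with no cookies got back."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def looks_gated(resp: Response) -> bool:
    """Rough check that a logged-out request did not receive the protected content."""
    if resp.status in (401, 403):
        return True
    if resp.status in (301, 302, 303, 307, 308):
        # HTTP header names are case-insensitive, so match them that way.
        location = next(
            (v for k, v in resp.headers.items() if k.lower() == "location"), ""
        )
        return any(marker in location for marker in LOGIN_REDIRECT_MARKERS)
    body = resp.body.lower()
    if any(marker in body for marker in SENSITIVE_MARKERS):
        return False  # the protected content rendered, so a bot can read it too
    # Otherwise accept a 200 only if it is visibly a login or password form.
    return resp.status == 200 and any(marker in body for marker in LOGIN_BODY_MARKERS)
```

Feed it the status, headers, and body from a request sent without cookies. A `False` result means the URL needs a closer look, not that the sketch has proven a leak.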

Control AI crawlers without hiding from them

Major AI engines now publish named crawlers you can govern in robots.txt. OpenAI documents three: GPTBot for model training, OAI-SearchBot for ChatGPT search answers, and ChatGPT-User for live user-triggered fetches. OpenAI states that each setting is independent, so you can allow OAI-SearchBot to keep appearing in ChatGPT search while disallowing GPTBot from training on your content. ChatGPT-User is different: because its fetches are initiated by a user, OpenAI says robots.txt rules may not apply to it. That is why a public price you do not want quoted needs a login, not just a crawler rule. Noindex keeps the page out of search results, but it does not stop someone from fetching a URL they already have.
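Here is a minimal sketch of that split, assuming a hypothetical store at `shop.example`. The robots.txt below opts GPTBot out while leaving OAI-SearchBot in, and Python's standard `urllib.robotparser` confirms how each named bot would read it. On Shopify, rules like these would live in the `robots.txt.liquid` theme template. The parser only checks your rules locally. It does not guarantee how any vendor's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: visible to ChatGPT search, opted out of training.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /checkout
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

PRODUCT_URL = "https://shop.example/products/linen-shirt"

for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
    # can_fetch uses the group whose User-agent matches, else the * group.
    print(f"{agent}: {parser.can_fetch(agent, PRODUCT_URL)}")
```

A bot that matches a named group follows only that group, not the `*` group. In this example OAI-SearchBot is allowed everywhere, including `/checkout`, because its own group has no Disallow line. If you want a shared Disallow rule to apply to a named bot, repeat it inside that bot's group.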

The deeper point is that crawler control and indexing control are different axes. We cover the trade-offs of opening versus closing these bots in block vs allow AI crawlers on Shopify, and the reason you usually want your storefront open is that AI engines cannot cite a page they were never allowed to read. Blocking everything to protect a wholesale folder also makes your public catalog invisible to answer engines, which is the opposite of the goal.

Keep the public store fully citable

The whole exercise is pointless if you over-block. Your public product pages, collections, and blog are the assets you want surfaced in AI answers, so leave them crawlable and richly structured. The split is simple: private pages get auth or noindex, public pages stay open and well marked up. If you are weighing how classic ranking work and answer-engine work differ here, the distinction is laid out in SEO vs GEO for Shopify. A clean separation means an engine that reads your store learns your public range, your public prices, and your public claims, and never stumbles into the wholesale tier.

Audit it the way an engine would. Open each sensitive URL in a private browser window with no login, and if the content renders, a bot can read it too. Then confirm your public pages return a normal 200 with no stray noindex, because a misplaced tag silently deletes a page from answers.
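The second half of that audit is easy to script. Here is a minimal Python sketch, assuming you have already fetched each public URL and have its status code, headers, and HTML. The `audit_public_page` function and its messages are hypothetical. It reports anything other than a 200, plus noindex in either the `X-Robots-Tag` header or a `<meta name="robots">` tag.

```python
from html.parser import HTMLParser


class MetaRobotsCollector(HTMLParser):
    """Gathers the lowercase content of <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            self.values.append(attr_map.get("content", "").lower())


def audit_public_page(url: str, status: int, headers: dict, html: str) -> list[str]:
    """List the problems that would keep a public page out of search and AI answers."""
    problems = []
    if status != 200:
        problems.append(f"{url}: expected 200, got {status}")
    # HTTP header names are case-insensitive.
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    if "noindex" in header.lower():
        problems.append(f"{url}: X-Robots-Tag contains noindex")
    collector = MetaRobotsCollector()
    collector.feed(html)
    if any("noindex" in v for v in collector.values):
        problems.append(f"{url}: meta robots contains noindex")
    return problems


if __name__ == "__main__":
    print(audit_public_page(
        "/collections/all", 200, {}, '<meta name="robots" content="noindex,follow">'
    ))
```

An empty list means the page is reachable and indexable. Any entry is a stray tag or status code to fix before it silently removes the page from answers.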

Deciding what to expose and what to hold back also shapes your approach to zero-party data in the post-cookie era of AI commerce.

Frequently asked questions

How do I prevent Perplexity or ChatGPT from showing my wholesale distributor pages on Shopify?

The most reliable approach, and the first one Nivk.com recommends, is to gate the real wholesale content behind a Shopify B2B login so no unauthenticated crawler can read it. Then add noindex to any sensitive page that must remain reachable, and steer named AI crawlers in robots.txt. Authentication is the only layer that truly hides the data; the rest are visibility controls. Nivk.com runs this audit on your store and confirms each gated URL returns a login prompt, not cached content.

Does blocking a folder in robots.txt keep it private?

No. robots.txt is a public file that compliant crawlers honor voluntarily, and the URLs you list inside it are readable by anyone. It steers crawl behavior and can stop well-behaved AI bots, but it does not hide content or stop a direct visit. For anything genuinely private, use authentication.

Should I use noindex or robots.txt for a private wholesale page?

Use one, not both, and match it to the goal. If the page only needs to stay out of results but can be crawled, use noindex and leave it allowed in robots.txt so the crawler can see the tag. If the page is a real secret, neither is enough; put it behind a login. Combining robots.txt Disallow with noindex cancels the noindex, because the bot never reaches the tag.

Will hiding my wholesale pages hurt my visibility in AI answers?

Not if you scope it correctly. Hide only the private tier and keep your public store, collections, and blog open and structured. AI engines can only cite what they can read, so an over-broad block hurts you while a targeted one does not.

How can Nivk.com tell which of my pages are exposed to AI engines?

Nivk.com crawls your store the way an answer engine does, flags any sensitive URL that renders without authentication, checks for noindex on the pages that need it, and reviews your robots.txt for both leaks and over-blocking. You get a per-URL report of what is gated, what is merely hidden, and what is fully citable.