Question 1

Where do I put the robots.txt file?

Accepted Answer

At the root of your host: https://yourdomain.com/robots.txt. Crawlers ignore robots.txt files in subfolders (like /blog/robots.txt), and each subdomain needs its own file. In the Next.js App Router, you can also generate it programmatically via app/robots.ts.
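
A minimal sketch of the programmatic route, assuming a Next.js App Router project. The default-exported function in app/robots.ts follows the Next.js convention. The RobotsConfig type is a local stand-in so the snippet compiles on its own (in a real project you would likely annotate the return value with MetadataRoute.Robots from 'next'), and the /admin/ path and sitemap URL are placeholders.

```typescript
// app/robots.ts
// Local stand-in for Next.js's MetadataRoute.Robots type, so this sketch
// type-checks without the framework installed.
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};

type RobotsConfig = {
  rules: RobotsRule | RobotsRule[];
  sitemap?: string | string[];
};

// Next.js calls the default export and serves the result at /robots.txt.
export default function robots(): RobotsConfig {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: "/admin/", // placeholder path
      },
    ],
    sitemap: "https://yourdomain.com/sitemap.xml", // placeholder URL
  };
}
```

If you use a static public/robots.txt file instead, you would normally pick one approach or the other rather than both.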

Question 2

Will blocking AI crawlers stop my content from showing up in ChatGPT?

Accepted Answer

Only partly. Blocking stops future training crawls, but content that was already crawled stays in models trained on it. OpenAI also runs separate crawlers for separate jobs: GPTBot collects training data, OAI-SearchBot powers ChatGPT search, and ChatGPT-User fetches pages when a user asks for them. Blocking GPTBot keeps you out of future training; blocking OAI-SearchBot keeps you out of ChatGPT's live search citations.
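
As an illustration of treating the two crawler types differently, here is a small sketch that renders a robots.txt blocking GPTBot while leaving OAI-SearchBot allowed. The Group type and renderRobotsTxt helper are hypothetical names made up for this example; only the user-agent tokens come from the answer above.

```typescript
// One robots.txt group: a user-agent token plus its allow/disallow paths.
type Group = { userAgent: string; allow?: string[]; disallow?: string[] };

// Render groups as robots.txt text, with a blank line between groups.
function renderRobotsTxt(groups: Group[]): string {
  const blocks = groups.map((g) => {
    const lines = [`User-agent: ${g.userAgent}`];
    for (const path of g.allow ?? []) lines.push(`Allow: ${path}`);
    for (const path of g.disallow ?? []) lines.push(`Disallow: ${path}`);
    return lines.join("\n");
  });
  return blocks.join("\n\n") + "\n";
}

// Policy: opt out of training crawls, stay eligible for search citations.
const aiPolicy: Group[] = [
  { userAgent: "GPTBot", disallow: ["/"] },     // training crawler: blocked
  { userAgent: "OAI-SearchBot", allow: ["/"] }, // search crawler: allowed
];

const aiRobotsTxt = renderRobotsTxt(aiPolicy);
```

Swapping the allow and disallow entries gives the opposite policy; other AI vendors publish their own user-agent tokens, which you would add as further groups.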

Question 3

Should I block SEO scrapers like Ahrefs and Semrush?

Accepted Answer

Usually no. Blocking AhrefsBot or SemrushBot removes your site from their datasets, so the backlink reports and site audits you run on your own site in those tools become incomplete. The exception is if your content is high-value and you don't want competitors mining your site structure.

Question 4

What's the difference between Disallow in robots.txt and noindex?

Accepted Answer

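A Disallow rule in robots.txt tells crawlers not to fetch the page. noindex (in a robots meta tag or an X-Robots-Tag HTTP header) lets crawlers fetch the page but tells them not to show it in search results. Don't combine the two on the same URL: a crawler that is blocked from fetching a page never sees its noindex, so the URL can still end up indexed from external links. For pages you want crawled but not indexed (like paginated archives), use only noindex. For genuinely sensitive pages, rely on authentication, because robots.txt is publicly readable and purely advisory.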
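
To make the noindex side concrete, here is a rough sketch of picking an X-Robots-Tag header by URL path. The robotsHeaderFor function and the /blog/page/N pattern are hypothetical; the point is only that the page stays fetchable and carries the directive in its response.

```typescript
// Decide which X-Robots-Tag header (if any) to send for a URL path.
// The path pattern is a hypothetical example of a paginated archive.
function robotsHeaderFor(pathname: string): Record<string, string> {
  const isPaginatedArchive = /^\/blog\/page\/\d+\/?$/.test(pathname);
  // "noindex, follow": keep the page out of results, still follow its links.
  return isPaginatedArchive ? { "X-Robots-Tag": "noindex, follow" } : {};
}

// The HTML equivalent is: <meta name="robots" content="noindex, follow">
const archiveHeaders = robotsHeaderFor("/blog/page/2");
const postHeaders = robotsHeaderFor("/blog/my-post");
```

You would merge the returned object into the response headers in whatever server or framework you use; the header form is handy for non-HTML files such as PDFs, where a meta tag is not an option.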

Question 5

Does Googlebot respect Crawl-delay?

Accepted Answer

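No. Googlebot ignores Crawl-delay entirely. Bing respects it, and Yandex and Yahoo have historically respected it, so check their current documentation. Google retired the Search Console crawl rate limiter in early 2024; to slow Googlebot temporarily, return 500, 503, or 429 status codes for a short period instead.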
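
For crawlers that ignore Crawl-delay, one server-side option is to answer with a temporary error status while the server is under load. The sketch below assumes a hypothetical loadRatio input (0 to 1) and a made-up 0.9 threshold, and it matches the user-agent string naively. Keep any such throttling short-lived, since sustained error responses can get URLs dropped from the index.

```typescript
// Minimal response shape for this sketch; not tied to any framework.
type ThrottleDecision = { status: number; headers: Record<string, string> };

// Return a temporary 503 for Googlebot when the server is over a load threshold.
// `loadRatio` and the 0.9 threshold are hypothetical inputs for illustration.
function crawlThrottle(userAgent: string, loadRatio: number): ThrottleDecision | null {
  const isGooglebot = /Googlebot/i.test(userAgent);
  if (isGooglebot && loadRatio > 0.9) {
    // Retry-After is the standard HTTP hint for when to try again (seconds).
    return { status: 503, headers: { "Retry-After": "3600" } };
  }
  return null; // serve the request normally
}
```

A caller would check the result before rendering the page and send the 503 when it is non-null. User-agent strings can be spoofed, so a production setup would likely verify the crawler another way.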

Robots.txt Generator

Frequently asked questions