What is a robots.txt file?

A robots.txt file is a plain-text file placed at the root of your website that tells search engine crawlers which pages or sections they may or may not crawl. It follows the Robots Exclusion Protocol, standardized as RFC 9309.

Can robots.txt block a page from appearing in Google?

Not reliably. robots.txt prevents crawling, not indexing. Google may still index a URL that other pages link to and show it in results without a description. To keep a page out of the index, use a noindex meta tag or an X-Robots-Tag header instead, and leave the page crawlable so the crawler can see that instruction.

robots.txt Generator

Generate a robots.txt file for search engine crawlers

Sitemap URL

Disallow Paths (one per line)

Allow Paths (one per line)

User-Agent

Crawl Delay (seconds)

Result

robots.txt Content

User-agent: *
Disallow:

Rules

0 rules defined

Crawl Status

Open (all allowed)

Related Tools

Sitemap XML Generator

Generate a sitemap.xml file

Meta Tag Generator

Generate HTML meta tags

Canonical URL Builder

Build canonical link tags

About This Tool

Builds a robots.txt file from a structured form: user-agent rules, allow/disallow paths, sitemap URLs, and crawl-delay directives. Output follows the Robots Exclusion Protocol as documented in RFC 9309; Crawl-delay and Sitemap are widely used fields that sit outside the core protocol.
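As a rough illustration of how form fields like these could map to file content, here is a minimal Python sketch. The function name build_robots_txt and its parameters are hypothetical and are not the tool's actual implementation; the sketch only assumes one user-agent group per file.

```python
def build_robots_txt(user_agent="*", disallow=(), allow=(),
                     crawl_delay=None, sitemaps=()):
    """Assemble robots.txt text from form-style inputs (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    if not disallow and not allow:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        # Non-standard field; support varies by crawler.
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemaps:
        # Sitemap lines are independent of any user-agent group.
        lines.append("")
        lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(disallow=["/cart", "/checkout"],
                           sitemaps=["https://example.com/sitemap.xml"]))
```

Called with no arguments, the sketch produces the same open default that the form shows before any rules are defined.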

Directive names are case-insensitive, but path matching is case-sensitive. Support for non-standard fields such as crawl-delay varies by search engine: Google ignores it, while some other engines honor it.

The Robots Exclusion Protocol was first proposed in 1994 by Martijn Koster and remained an informal convention until being codified as RFC 9309 in 2022. The format is line-oriented: each block opens with one or more 'User-agent: <name>' lines naming a crawler, followed by 'Allow:' and 'Disallow:' lines listing paths. A wildcard 'User-agent: *' applies to crawlers not specifically named. The file lives at the root of a host (https://example.com/robots.txt), and crawlers fetch it before any other URL on that host. Subdomains require their own robots.txt; one at the apex does not cover sub.example.com.
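Because the file is scoped to a single host, the robots.txt location for any page can be derived from that page's scheme and host alone. The following Python sketch shows one way to do this with the standard library; robots_txt_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain and port); replace the
    # path with /robots.txt and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://example.com/products/widget?id=3"))
    print(robots_txt_url("https://sub.example.com/blog/post"))
```

The two calls yield different files, which is why a robots.txt at the apex does not cover sub.example.com.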

A worked example: a typical e-commerce robots.txt blocks crawlers from the cart and checkout while allowing the product catalog:

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /api/
Allow: /api/sitemap.xml
Sitemap: https://example.com/sitemap.xml

The Allow exception for /api/sitemap.xml overrides the broader /api/ disallow because RFC 9309 specifies that the most specific (longest matching) rule wins. Google and Bing follow this convention; older or smaller crawlers may evaluate rules in document order, where the first match takes precedence.
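The difference between the two precedence strategies can be sketched in Python. This is a minimal illustration that assumes plain prefix matching without wildcards; is_allowed_longest_match and is_allowed_first_match are hypothetical names, not functions from any library.

```python
# Rules from the e-commerce example, in document order.
RULES = [
    ("disallow", "/cart"),
    ("disallow", "/checkout"),
    ("disallow", "/api/"),
    ("allow", "/api/sitemap.xml"),
]


def is_allowed_longest_match(path, rules):
    """Longest matching rule wins; on a tie, Allow is preferred."""
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule value matches nothing
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


def is_allowed_first_match(path, rules):
    """Document-order evaluation: the first matching rule decides."""
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):
            return kind == "allow"
    return True


if __name__ == "__main__":
    for path in ("/api/sitemap.xml", "/api/orders", "/products/1"):
        print(path,
              is_allowed_longest_match(path, RULES),
              is_allowed_first_match(path, RULES))
```

With these rules, /api/sitemap.xml is crawlable under longest-match evaluation but blocked under first-match evaluation, because the broader /api/ line appears earlier in the file. Placing Allow exceptions before the Disallow they carve out of keeps both kinds of crawler in agreement.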

Limitations are widely misunderstood. Disallow blocks crawling, not indexing. A page that cannot be crawled can still appear in search results if other sites link to it, sometimes with no description because the crawler never read the content. Suppressing indexing requires a noindex meta tag or X-Robots-Tag HTTP header, which can only be discovered if the page is crawlable. Robots.txt is also strictly advisory; well-behaved crawlers honor it, malicious scrapers ignore it. Sensitive paths should never be hidden by robots.txt alone, because the file itself is publicly readable and effectively advertises the locations of restricted content. Authentication, server-side IP blocking, or rate limiting are appropriate for genuine access control.
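To make the X-Robots-Tag option concrete, here is a minimal sketch of a Python WSGI application that adds the header for selected paths. The path prefixes and the response body are illustrative assumptions. For the header to have any effect, these paths must not be disallowed in robots.txt, since a crawler that never requests the page never sees the header.

```python
# Hypothetical path prefixes that should stay out of search indexes.
NOINDEX_PREFIXES = ("/cart", "/checkout")


def app(environ, start_response):
    """Tiny WSGI app that marks some responses as noindex."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        # Asks compliant crawlers not to index this response.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serves on localhost:8000 until interrupted.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

The header is still only a request to crawlers; it does not replace authentication for content that must stay private.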

Wildcard syntax (* and $) is supported by Google, Bing, Yandex, and most major engines. It began as an extension to the original 1994 spec, and RFC 9309 now defines both characters: * matches any sequence of characters and $ anchors the end of the URL path. Crawl-delay (a numeric pause between requests, in seconds) is honored by some engines, such as Bing, but ignored by Google, which uses its own crawl-rate algorithm. The Sitemap directive is widely supported and is independent of user-agent blocks.
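One way to see how the two wildcard characters behave is to translate a robots.txt path pattern into a regular expression. The Python sketch below is an illustration under simple assumptions (no percent-encoding normalization); pattern_to_regex and matches are hypothetical names.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with ".*" for each "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start, which gives prefix semantics
    # unless the pattern ends with "$".
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(matches("/*.pdf$", "/docs/file.pdf"))      # anchored at the end
    print(matches("/*.pdf$", "/docs/file.pdf?x=1"))  # trailing text after .pdf
    print(matches("/private*", "/private/data"))     # plain prefix match
```

Without a trailing $, a pattern behaves as a prefix, so /*.pdf also matches /a.pdf.html.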

The about text and FAQ on this page were drafted with AI assistance and reviewed by a member of the Coherence Daddy team before publishing. See our Content Policy for editorial standards.
