
Robots.txt, done right.

Build a valid robots.txt in 60 seconds. User-agent rules, allow and disallow paths, sitemap location. Block AI crawlers or let them in. Copy-paste to your site root.


Save as robots.txt in your site root

The four directives

What each line actually does.

Robots.txt has a tiny core vocabulary: user-agent, disallow, allow, sitemap. That is it. Master those four, plus wildcards and the non-standard crawl-delay, and you control every crawler that honors the file. Here is what each one means and the common patterns.

User-agent Target

User-agent: * (all crawlers)

Which crawler the rules that follow apply to. * means every crawler. Use Googlebot, Bingbot, GPTBot, ClaudeBot, or PerplexityBot to target specific ones. To give several crawlers the same rules, list each User-agent on its own line before the rules. A crawler that finds a group naming it follows only that group and ignores the * group.
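You can sanity-check grouping with Python's standard-library `urllib.robotparser`, which parses a file and answers "may this agent fetch this URL?". The sketch below assumes a hypothetical site at example.com. The standard-library parser is simpler than Google's: it ignores * and $ inside paths and applies rules first-match. Treat it as a rough check, not a reproduction of any search engine.

```python
from urllib.robotparser import RobotFileParser

# Two User-agent lines before the rules form one group, so both AI crawlers
# share "Disallow: /". Crawlers not named anywhere fall back to the * group.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts an iterable of lines

print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))       # False
print(parser.can_fetch("ClaudeBot", "https://example.com/blog/post"))    # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/users"))  # False
```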

Disallow Block

Disallow: /admin/

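Path patterns the user-agent should NOT crawl. Start each path with a slash. Every rule is a prefix match, so the trailing slash matters. Disallow: /admin/ blocks only what is inside that folder. Disallow: /admin also blocks /admin.html and /administrator. Use * for wildcards. An empty Disallow: means "block nothing" (i.e., allow all).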
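Here is a minimal sketch of prefix matching, again using Python's `urllib.robotparser` with hypothetical rules for example.com. `allowed` is an illustrative helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "AnyBot") -> bool:
    """Parse a robots.txt string and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


NO_SLASH = "User-agent: *\nDisallow: /admin\n"
WITH_SLASH = "User-agent: *\nDisallow: /admin/\n"
EMPTY = "User-agent: *\nDisallow:\n"

# No trailing slash: a plain prefix, so /administrator is caught too.
print(allowed(NO_SLASH, "/administrator"))    # False
# Trailing slash: only paths inside the /admin/ folder are blocked.
print(allowed(WITH_SLASH, "/administrator"))  # True
print(allowed(WITH_SLASH, "/admin/users"))    # False
# Empty Disallow: nothing is blocked.
print(allowed(EMPTY, "/admin/users"))         # True
```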

Allow Override

Allow: /admin/public/

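Carve out exceptions inside a Disallow path. If /admin/ is blocked but /admin/public/ should be crawled, add an Allow line for the inner path. Google and the RFC 9309 standard apply the most specific (longest) matching rule regardless of order, and Allow wins a tie. Some simpler parsers take the first matching line instead, so listing Allow before the Disallow it overrides is the safe habit.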
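Parsers resolve an Allow/Disallow conflict in one of two ways, each sketchable in a few lines of Python. Both functions are hypothetical helpers written for illustration. They handle plain prefixes only (no * or $). The longest-match version follows RFC 9309: the longest matching path wins, and Allow wins a tie.

```python
# One user-agent group, in file order: the Disallow comes first.
RULES = [("disallow", "/admin/"), ("allow", "/admin/public/")]


def first_match_allowed(rules, path):
    """Order-based: the first rule whose path prefixes `path` decides."""
    for kind, prefix in rules:
        if path.startswith(prefix):
            return kind == "allow"
    return True  # no rule matched, so crawling is allowed


def longest_match_allowed(rules, path):
    """Specificity-based: the longest matching prefix decides, whatever its
    position in the file; on a tie in length, Allow wins."""
    best_len, allowed = -1, True  # nothing matched yet, so allowed by default
    for kind, prefix in rules:
        if path.startswith(prefix):
            is_allow = kind == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


for path in ("/admin/public/pricing", "/admin/settings"):
    print(path, first_match_allowed(RULES, path), longest_match_allowed(RULES, path))
# /admin/public/pricing False True
# /admin/settings False False
```

With the Disallow listed first, the order-based parser blocks /admin/public/pricing and the longest-match parser allows it. Moving the Allow line to the top makes both agree.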

Sitemap Pointer

Sitemap: https://yoursite.com/sitemap.xml

Tells crawlers where to find your XML sitemap. Use absolute URLs (not relative). It can appear anywhere in the file and applies to all crawlers whatever group it sits in, but by convention it goes on the last line. Multiple sitemaps are allowed, one per line. Helps crawlers discover new pages.
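Python's `urllib.robotparser` (Python 3.8+) collects every Sitemap line through `site_maps()`. This sketch assumes a second, hypothetical blog-sitemap.xml. It adds an illustrative `is_absolute` helper for the absolute-URL rule.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
Sitemap: https://yoursite.com/blog-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
sitemaps = parser.site_maps() or []  # site_maps() returns None if there are none


def is_absolute(url: str) -> bool:
    """An absolute sitemap URL has both a scheme and a host."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


for url in sitemaps:
    print(url, is_absolute(url))
print(is_absolute("/sitemap.xml"))  # False: a relative path is not valid here
```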

Wildcards Pattern

Disallow: /*.pdf$ (all PDFs, $ = end)

* matches any sequence. $ marks end of URL. Disallow: /*?* blocks any URL with query parameters. Disallow: /tag/* blocks all tag pages. Google and Bing both honor wildcards; older or niche crawlers may not.
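One way to implement wildcard matching, assuming the Google/Bing semantics described above, is to translate each pattern into a regular expression. `robots_pattern_to_regex` and `rule_matches` are hypothetical helpers, not a library API. They only decide whether one rule matches one path.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters (including none); a '$' at the very
    end anchors the match to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters such as '.' and '?', then restore '*' as '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    # re.match anchors only at the start, which gives robots.txt prefix matching.
    return robots_pattern_to_regex(pattern).match(path) is not None


print(rule_matches("/*.pdf$", "/files/report.pdf"))       # True
print(rule_matches("/*.pdf$", "/files/report.pdf?dl=1"))  # False: .pdf is not at the end
print(rule_matches("/*?*", "/shop?page=2"))                # True: has a query string
print(rule_matches("/*?*", "/shop"))                       # False
print(rule_matches("/tag/*", "/tag/python"))               # True
```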

Crawl-delay Throttle

Crawl-delay: 10 (seconds between requests)

Tells the crawler to wait N seconds between requests. It is not part of the robots.txt standard. Google IGNORES it (and retired its Search Console crawl-rate setting in early 2024). Bing and many smaller crawlers respect it. Useful to throttle aggressive bots that strain your server.
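Python's `urllib.robotparser` exposes the value through `crawl_delay()`, which a polite crawler can read before scheduling requests. The file below is hypothetical. The Bingbot group repeats `Disallow: /admin/` on purpose. Under the usual group-selection rule, a crawler with its own group does not also pick up the * rules. A delay-only group would therefore silently unblock /admin/ for Bingbot.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /admin/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay from the group that applies to the agent,
# or None when that group sets none.
print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None
```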

User-agent names you should know

Googlebot

Main Google Search crawler. Almost no site should block it. Blocking it cuts your pages out of Google Search's crawl, and Google Search drives the bulk of organic traffic for most sites.

GPTBot

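OpenAI's crawler for collecting model training data. Block it to opt out of training. ChatGPT search uses a separate agent, OAI-SearchBot, and user-initiated browsing uses ChatGPT-User. Blocking GPTBot alone therefore does not remove you from ChatGPT citations.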

ClaudeBot

Anthropic's crawler for Claude. Blocking it opts your content out of Claude model training. Check Anthropic's crawler documentation for any separate agents it uses for search or user-requested fetches.

Sources: Google Search Central docs, OpenAI GPTBot documentation, Anthropic Crawler documentation, current as of 2024.

Common questions

Honest answers.

What is robots.txt?

Robots.txt is a plain-text file at the root of your website (yoursite.com/robots.txt) that tells crawlers which URLs they may crawl and which to skip. Well-behaved crawlers read it before fetching anything else on the site. Compliance is voluntary: the file is a set of instructions, not access control, so scrapers that ignore it are not stopped. It is still the simplest, most universal way to control what Google, Bing, and AI crawlers like GPTBot and ClaudeBot can access on your site.

Where do I put the robots.txt file?

At the root of your domain: yoursite.com/robots.txt (not yoursite.com/folder/robots.txt). Each subdomain, such as blog.yoursite.com, needs its own file. Most CMSes have a setting for this. WordPress: Yoast or RankMath plugin. systeme.io: handled automatically based on your domain. Webflow: Site settings > SEO > robots.txt. Hand-coded sites: upload the file to your web root via FTP or your hosting panel.
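Crawlers locate the file from the scheme and host alone, which is why a copy in a subfolder is never read. Here is a minimal sketch in Python. `robots_txt_url` is a hypothetical helper for illustration.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    The file always lives at the root of the scheme + host (+ port), so the
    path of the page is dropped and each subdomain gets its own file.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_txt_url("https://yoursite.com/folder/page.html"))
# https://yoursite.com/robots.txt
print(robots_txt_url("https://blog.yoursite.com/post?id=7"))
# https://blog.yoursite.com/robots.txt
```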

Does robots.txt prevent Google from indexing a page?

No. Robots.txt only tells crawlers what to crawl. A blocked page can still appear in search results if other sites link to it (Google indexes the URL but cannot read the content). To prevent indexing entirely, use a noindex meta tag on the page itself OR remove the page and serve a 404. Robots.txt is a crawl control, not an indexing control.

What is the difference between Disallow and noindex?

Disallow (in robots.txt) tells crawlers not to FETCH the page. Noindex (in a meta tag or HTTP header) tells crawlers not to INDEX the page (it can still be fetched). Critical implication: if you Disallow a page in robots.txt, Google cannot see the noindex tag on that page. The URL can then stay indexed, usually shown without a description. To deindex something properly, allow crawling AND set noindex.
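The order of events can be sketched in Python. `crawler_sees_noindex` and `RobotsMetaFinder` are hypothetical helpers built on the standard-library `html.parser`. They model a compliant crawler that checks robots.txt first and reads the meta tag or X-Robots-Tag header only if it is allowed to fetch the page. Real crawlers also handle agent-specific forms such as `googlebot: noindex`, which this sketch ignores.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def crawler_sees_noindex(crawl_allowed: bool, html: str, x_robots_tag: str = "") -> bool:
    """Whether a compliant crawler ever learns that the page is noindex."""
    if not crawl_allowed:
        return False  # blocked by robots.txt: page never fetched, noindex never read
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    return "noindex" in finder.directives or "noindex" in header


PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'
print(crawler_sees_noindex(crawl_allowed=False, html=PAGE))  # False
print(crawler_sees_noindex(crawl_allowed=True, html=PAGE))   # True
```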

What user-agents should I block?

For most sites: none. Search crawlers (Googlebot, Bingbot) drive traffic, and blocking them hurts SEO. AI crawlers (GPTBot, ClaudeBot, PerplexityBot) are now a strategic decision. Block them to keep your content out of AI training, or allow them so you can be cited as a source in chatbot and AI search responses. SEO-tool crawlers such as SemrushBot and AhrefsBot can be blocked if they overload your server. Be aware that this also keeps your site's data out of those tools, which you might rely on yourself. Scrapers that ignore robots.txt are not stopped by it at all.

Should I block AI crawlers like GPTBot and ClaudeBot?

Depends on your strategy. Block them if you sell content (publishers, course creators) and do not want it used for training without compensation. Allow them if you want visibility in ChatGPT, Claude, and Perplexity, since AI engines can only cite sources they were trained on or can fetch. Major AI crawlers and their robots.txt names:

- GPTBot (OpenAI)
- ClaudeBot (Anthropic)
- Google-Extended (a token controlling Gemini training; it does not affect Google Search or AI Overviews, which rely on Googlebot)
- PerplexityBot (Perplexity)
- CCBot (Common Crawl, whose dataset is used to train many models)
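Here is a minimal sketch of writing such a file in Python. It assumes a hypothetical `build_robots_txt` helper (not this generator's actual code) and the crawler names listed above. The round trip through `urllib.robotparser` is a quick check that the groups parse the way you intended.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot"]


def build_robots_txt(blocked_agents, disallow_for_all=(), sitemap=None):
    """Assemble a robots.txt: one group blocking the listed agents site-wide,
    a * group for everyone else, and an optional absolute sitemap URL."""
    lines = []
    if blocked_agents:
        lines += [f"User-agent: {agent}" for agent in blocked_agents]
        lines += ["Disallow: /", ""]
    lines.append("User-agent: *")
    # With no paths to block, emit an empty Disallow, which means "block nothing".
    lines += [f"Disallow: {path}" for path in disallow_for_all] or ["Disallow:"]
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_CRAWLERS, ["/admin/"], "https://yoursite.com/sitemap.xml")
print(robots_txt)

# Round-trip check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://yoursite.com/blog/"))     # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/"))  # True
```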

Is my data kept private?

Yes. The generator runs in your browser. Nothing you enter is sent to systeme.io or any other server, stored, or logged. You can verify in DevTools by watching the network tab while you use the tool.

systeme.io handles robots.txt for you

Run your funnels on systeme.io.

Build landing pages, sales funnels, online courses, email automations, and affiliate programs on one platform. Robots.txt, sitemap, SSL handled automatically. Free plan, 2,000 contacts.
