Robots.txt Generator
Generate a robots.txt file for your website

- CMS presets
- Allow/Disallow rules
- Sitemap URL
- Download file
Googlebot is crawling your development environment. Your staging site is indexed. Private admin pages are appearing in search results. All because your robots.txt was missing or misconfigured.
Robots.txt is the first thing search engines read before crawling your site. It tells crawlers what they can and can't access. Get it wrong, and you're either invisible to search engines or exposing pages you wanted private.
This generator creates properly formatted robots.txt files with common configurations for WordPress, Next.js, and custom setups.
What is robots.txt?
robots.txt is a text file placed in your website's root directory that instructs web crawlers which pages or sections they should or shouldn't access. It's part of the Robots Exclusion Protocol, a voluntary standard that well-behaved crawlers follow.
Basic structure:

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
```
Well-behaved crawlers (Google, Bing) respect robots.txt. Malicious bots ignore it. It's for directing search engines, not securing sensitive content.
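You can sanity-check a file like this with Python's standard-library parser before deploying it. A minimal sketch (the example.com URLs are placeholders):

```python
from urllib import robotparser

ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Blocked path: matches the "Disallow: /admin/" prefix rule
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
# Unlisted path: no rule matches, so crawling is allowed by default
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True
```

Note that `urllib.robotparser` implements the original prefix-matching standard, so it is a quick syntax check rather than an exact model of how Google interprets wildcard rules.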
Why People Actually Need This Tool
A missing robots.txt leaves search engines guessing. A wrong robots.txt can deindex your entire site. Getting it right is essential.
- SEO control: Direct search engines to your important content.
- Crawl budget management: Prevent crawlers from wasting time on irrelevant pages.
- Environment protection: Keep dev and staging sites out of search results.
- Admin section privacy: Hide admin, login, and backend pages from indexes.
- Duplicate content prevention: Block print versions, filtered pages, and other duplicates.
- Sitemap advertisement: Tell crawlers where your sitemap lives.
- Crawler-specific rules: Apply different rules to different search engines.
How to Use the Robots.txt Generator
1. Select a preset: WordPress, Next.js, or start from scratch.
2. Add rules: Specify which paths to allow or disallow.
3. Set user agents: Apply rules to all crawlers or specific ones.
4. Add sitemap: Include your sitemap URL for discovery.
5. Generate and download: Save as robots.txt in your site root.
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Which crawler this rule applies to | User-agent: Googlebot |
| Disallow | Paths crawlers should not access | Disallow: /admin/ |
| Allow | Override disallow for specific paths | Allow: /admin/public-page |
| Sitemap | Location of your sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests (honored by Bing and others; ignored by Googlebot) | Crawl-delay: 10 |
Disallow prevents crawling but doesn't remove pages from search results. A blocked page can still appear (typically as a bare URL with no snippet) if other sites link to it. For true deindexing, use a `<meta name="robots" content="noindex">` tag or an `X-Robots-Tag` response header, and leave the page crawlable so search engines can actually see the directive.
Real-World Use Cases
1. The WordPress Standard Setup
Context: WordPress site with standard structure.
Problem: Need to block wp-admin, wp-includes, and other WordPress internals.
Solution: Generate WordPress-specific robots.txt with proper exclusions.
Outcome: Search engines crawl content, not WordPress system files.
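As a sketch, the WordPress preset output would look much like the default that WordPress core itself serves as a virtual robots.txt (the sitemap URL is a placeholder):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml
```

The admin-ajax.php exception matters: many themes and plugins call it from the front end, so blocking it can break site features. Blocking /wp-includes/ is legacy advice; current guidance is to leave CSS and JavaScript crawlable so search engines can render pages correctly.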
2. The E-commerce Filter Prevention
Context: Online store with 10,000 products and filter combinations.
Problem: Search engines indexing /products?color=red&size=large creates millions of duplicate pages.
Solution: Disallow filtered URLs: Disallow: /products?*
Outcome: Crawl budget focused on canonical product pages. No duplicate content penalties.
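A sketch of the rule, with a more targeted alternative. Note that `*` and `$` pattern matching is an extension supported by major engines such as Google and Bing, not part of the original Robots Exclusion Protocol, and a trailing `*` is redundant because rules are prefix matches:

```
User-agent: *
# Blocks every /products URL that has a query string
Disallow: /products?*
# More targeted alternative: block only the filter parameters,
# leaving things like ?page=2 crawlable
# Disallow: /*?*color=
# Disallow: /*?*size=
```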
3. The Staging Site Block
Context: Staging environment at staging.example.com accidentally indexed.
Problem: Staging pages competing with production in search results.
Solution: Serve a staging-only robots.txt containing `User-agent: *` followed by `Disallow: /` to block all crawling.
Outcome: Staging removed from search index within weeks.
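The complete staging robots.txt is just two lines:

```
User-agent: *
Disallow: /
```

One caveat: a full Disallow prevents crawlers from ever seeing a noindex tag, so pages that are already indexed can linger as URL-only results. HTTP authentication on the staging host is the more airtight option.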
4. The API Documentation Exception
Context: Site has /api/ path with both docs (public) and endpoints (private).
Problem: Want /api/docs/ indexed but not /api/v1/.
Solution: Disallow: /api/ + Allow: /api/docs/
Outcome: Documentation discoverable, API endpoints hidden.
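A sketch of this configuration, verified with Python's standard-library parser. The Allow line is listed first because `urllib.robotparser` applies the first matching rule; Google instead ranks rules by specificity (longest path wins), so either order works there:

```python
from urllib import robotparser

# Allow the docs subtree, disallow the rest of /api/.
ROBOTS = """\
User-agent: *
Allow: /api/docs/
Disallow: /api/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("*", "https://example.com/api/docs/auth"))  # True
print(rp.can_fetch("*", "https://example.com/api/v1/users"))   # False
```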
5. The Print Page Exclusion
Context: CMS generates /print/ versions of every page.
Problem: Print pages duplicating regular pages in search index.
Solution: Disallow: /print/
Outcome: No more duplicate content from print-friendly pages.
6. The Multi-Crawler Configuration
Context: Want Google to crawl everything but block aggressive bots.
Problem: Some crawlers ignore crawl etiquette.
Solution: Allow Googlebot fully, set Crawl-delay for others, block known bad bots.
Outcome: Good crawlers unrestricted, aggressive ones slowed down.
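A sketch of such a file ("SomeAggressiveBot" is a placeholder for whichever bot's user-agent string you want to block):

```
# Googlebot: no restrictions (an empty Disallow allows everything)
User-agent: Googlebot
Disallow:

# Everyone else: throttle (Googlebot ignores Crawl-delay anyway)
User-agent: *
Crawl-delay: 10

# Known bad actor: block entirely
User-agent: SomeAggressiveBot
Disallow: /
```

Keep in mind that a crawler obeys only the most specific group that matches it, and truly abusive bots ignore robots.txt entirely; those need rate limiting or blocking at the server level.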
7. The Sitemap Advertisement
Context: Large site with multiple sitemaps needs discovery.
Problem: Search engines not finding all sitemaps.
Solution: Add all sitemap URLs to robots.txt.
Outcome: All sitemaps discovered without manual Search Console submission.
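Sitemap lines are independent of User-agent groups, so multiple sitemaps can simply be stacked (URLs are placeholders):

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-products.xml
Sitemap: https://example.com/sitemap-blog.xml
```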
Common Mistakes and How to Avoid Them
- Disallow: / blocks your entire site. Triple-check robots.txt before deploying, especially to production.
- Paths are case-sensitive: Disallow: /Admin/ does not block /admin/.
- The file must be named robots.txt and live at the root of the host (https://example.com/robots.txt); each subdomain needs its own copy.
- robots.txt is publicly readable, so never treat it as a directory of secrets: listing /private/ in it advertises that the path exists.
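As a safety net, a pre-deploy check can refuse to ship a file that disallows the whole site. A minimal sketch that flags a bare `Disallow: /` for any user agent:

```python
def blocks_entire_site(robots_txt: str) -> bool:
    """Return True if any group disallows the site root outright."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip() == "/":
            return True
    return False

print(blocks_entire_site("User-agent: *\nDisallow: /"))        # True
print(blocks_entire_site("User-agent: *\nDisallow: /admin/"))  # False
```

The check is intentionally simple; it does not understand wildcard equivalents such as `Disallow: /*`, so extend it if your files use pattern rules.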
Privacy and Data Handling
This Robots.txt Generator operates entirely in your browser.
- No URLs or rules are sent to any server.
- No configurations are stored.
- No account required.
- Works completely offline.
Your site structure and crawler rules stay private.
Conclusion
Robots.txt is simple in concept but critical in execution. A few lines of text control how search engines discover and index your content. Get it right, and crawlers focus on what matters. Get it wrong, and you're either invisible or exposing content you wanted hidden.
This generator creates correct robots.txt files with proper syntax and common configurations. No more syntax errors, no more accidental site-wide blocks, no more missing sitemaps.
Control what gets crawled. The robots are listening.