🤖

Robots.txt Generator

Generate a robots.txt file to control search engine crawlers. Create user-agent rules, allow/disallow paths, set crawl delays, and add sitemap URLs. Perfect for managing bot access to your website.


How to Use Robots.txt Generator

What is Robots.txt?

Robots.txt is a text file placed in your website root directory that tells search engine crawlers which pages or sections of your site they can or cannot access. It is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web. While not all crawlers respect robots.txt, major search engines like Google, Bing, and Yahoo follow these directives.
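A minimal robots.txt that lets every crawler in while keeping one directory private looks like this (the /private/ path and sitemap URL are just illustrations):

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml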

How to Use This Tool

Step 1: Choose a Preset

Start with one of the preset configurations:

  • Allow All Crawlers: Permit all search engines to crawl everything (default for most sites)
  • Standard Website: Block admin, private, and API directories
  • Blog/News Site: Block WordPress admin while allowing content directories
  • E-commerce Store: Block cart, checkout, and account pages while allowing products
  • Block All Crawlers: Prevent all crawlers from indexing your site (staging/development)

Click any preset to instantly load its configuration.

Step 2: Select User-agent

The user-agent specifies which crawler(s) the rules apply to:

All Crawlers (*)

  • Applies rules to all search engine bots
  • Most common choice for general sites
  • Can be overridden by specific user-agent rules

Specific Crawlers:

  • Googlebot: Google Search crawler
  • Googlebot-Image: Google Image Search
  • Bingbot: Microsoft Bing crawler
  • Slurp: Yahoo Search crawler
  • DuckDuckBot: DuckDuckGo crawler
  • Baiduspider: Baidu (Chinese search engine)
  • YandexBot: Yandex (Russian search engine)
  • facebookexternalhit: Facebook link preview crawler
  • Twitterbot: Twitter/X link preview crawler

You can create multiple user-agent blocks with different rules for each crawler.
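For example, a sketch like the following gives Googlebot full access, slows down Bingbot, and applies a general rule to everyone else (the paths and delay value are illustrative):

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Crawl-delay: 5

User-agent: *
Disallow: /private/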

Step 3: Configure Allow Rules

Allow rules explicitly permit crawlers to access specific paths:

When to use Allow:

  • Override broader Disallow rules
  • Permit access to specific subdirectories within blocked directories
  • Example: Block /admin/ but allow /admin/public/ (see the snippet at the end of this step)

Path Syntax:

  • / = Allow the root and everything below it (unless a more specific Disallow rule blocks a path)
  • /blog/ = Allow blog directory and all subdirectories
  • /products/ = Allow products directory
  • Leave the Allow field empty if you do not need any exceptions (for example, when blocking everything)

Best Practices:

  • Allow rules take precedence over Disallow rules at the same specificity level
  • Use Allow sparingly, primarily to create exceptions
  • Most sites do not need explicit Allow rules
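The exception mentioned above looks like this in practice (directory names are illustrative):

User-agent: *
Disallow: /admin/
Allow: /admin/public/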

Step 4: Configure Disallow Rules

Disallow rules block crawlers from accessing specific paths:

Common Paths to Block:

  • /admin/ = Admin panel, control panel
  • /wp-admin/ = WordPress admin dashboard
  • /private/ = Private files and directories
  • /temp/ or /tmp/ = Temporary files
  • /api/ = API endpoints
  • /cgi-bin/ = CGI scripts
  • /search/ = Search results pages (duplicate content)
  • /cart/ = Shopping cart pages
  • /checkout/ = Checkout flow pages
  • /account/ = User account pages
  • /login/ and /register/ = Authentication pages
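Combining a few of these into one block might look like the following (adjust the directories to match your site):

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search/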

Path Syntax:

  • / = Block everything
  • /admin/ = Block admin directory and all subdirectories
  • /secret.html = Block specific file
  • /*? = Block all URLs with query parameters
  • /*.pdf$ = Block all PDF files
  • /*sessionid= = Block URLs with session IDs

Wildcards:

  • * = Matches any sequence of characters
  • $ = End of URL
  • Example: /private/*.pdf$ blocks all PDFs in private directory
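Putting the path syntax and wildcards together, a block such as this could be used (paths are illustrative):

User-agent: *
Disallow: /search*
Disallow: /*sessionid=
Disallow: /private/*.pdf$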

Step 5: Set Crawl Delay (Optional)

Crawl-delay specifies the number of seconds crawlers should wait between requests:

When to use:

  • Limit server load from aggressive crawlers
  • Prevent bandwidth exhaustion
  • Protect resource-intensive pages

Values:

  • 0 = No delay (not recommended, omit the directive instead)
  • 1-5 = Light delay for fast servers
  • 10 = Common value when throttling is genuinely needed
  • 30-60 = Heavy delay for slow servers or heavy scrapers

Important Notes:

  • Google ignores Crawl-delay; use Google Search Console instead
  • Bing and Yandex respect Crawl-delay
  • Values that are too high may reduce how often your site is crawled
  • Most modern sites do not need this unless experiencing crawler issues
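If you do decide to throttle a specific crawler, a block like this could be used (the bot and value are illustrative; Google will ignore the directive):

User-agent: Bingbot
Crawl-delay: 10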

Step 6: Add Sitemap URL (Optional but Recommended)

The Sitemap directive tells crawlers where to find your XML sitemap:

Format:

  • Sitemap: https://example.com/sitemap.xml
  • Must be absolute URL (include https://)
  • Can list multiple sitemaps on separate lines

Benefits:

  • Helps search engines discover all your pages
  • Improves indexing efficiency
  • Provides metadata about page priority and update frequency

Common Sitemap Locations:

  • /sitemap.xml = Root level (most common)
  • /sitemap_index.xml = Sitemap index file
  • /blog/sitemap.xml = Subdirectory sitemap
  • Multiple sitemaps are allowed
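Sitemap lines stand on their own, outside any user-agent block, and several can be listed (URLs are illustrative):

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml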

Step 7: Copy or Download the File

Two options to save your robots.txt:

Copy Button:

  • Copies content to clipboard
  • Paste into a text editor
  • Save as robots.txt (exactly that name; avoid double extensions like robots.txt.txt)

Download Button:

  • Downloads file directly as robots.txt
  • Ready to upload to your server
  • Preserves correct formatting

Step 8: Upload to Your Website

Upload robots.txt to your website root directory:

File Location:

  • Must be at: https://yoursite.com/robots.txt
  • NOT in subdirectories: /blog/robots.txt (will not work)
  • NOT with different names: robots.txt.txt (invalid)
  • Case-sensitive: robots.txt not Robots.txt

Upload Methods:

FTP/SFTP:

  1. Connect to your server via FTP client (FileZilla, Cyberduck)
  2. Navigate to root directory (public_html, www, or htdocs)
  3. Upload robots.txt file
  4. Set file permissions to 644 (readable by all)

cPanel File Manager:

  1. Log into cPanel
  2. Open File Manager
  3. Navigate to public_html directory
  4. Upload robots.txt file
  5. Verify file is not hidden

WordPress:

  1. Use FTP to upload to root directory (same level as wp-config.php)
  2. Or use All in One SEO / Yoast SEO plugin robots.txt editor
  3. Some WordPress plugins auto-generate robots.txt (check first)

Next.js/Vercel:

  1. Place robots.txt in /public/ directory
  2. Deployed to root automatically
  3. Or use next-sitemap package for dynamic generation

Nginx:

  1. Upload to web root directory (usually /var/www/html)
  2. Ensure proper permissions (644)
  3. Restart Nginx if needed

Step 9: Test Your Robots.txt

Verify your robots.txt file is working correctly:

Manual Check:

  1. Visit: https://yoursite.com/robots.txt
  2. Verify content displays correctly in browser
  3. Check for any 404 errors

Google Search Console:

  1. Go to: search.google.com/search-console
  2. Select your property
  3. Navigate to Settings → robots.txt report (the legacy robots.txt Tester has been retired)
  4. Review when the file was last fetched and whether any errors were reported
  5. Request a recrawl of robots.txt if you have recently updated it

Bing Webmaster Tools:

  1. Go to: bing.com/webmasters
  2. Select your site
  3. Go to Configure My Site → Crawl Control
  4. View current robots.txt
  5. Test URLs against rules

Online Validators:

  • Ryte.com robots.txt validator
  • Technical SEO robots.txt tester
  • Screaming Frog robots.txt analyzer

Robots.txt Syntax and Rules

Basic Structure

User-agent: *
Allow: /
Disallow: /private/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml

Multiple User-agent Blocks

You can define different rules for different crawlers:

# Allow Googlebot to access everything
User-agent: Googlebot
Allow: /

# Block other crawlers from certain paths
User-agent: *
Disallow: /private/
Disallow: /admin/

Path Matching Rules

Exact Path:

Disallow: /admin/

Blocks: /admin/, /admin/users/, /admin/settings.php
Allows: /administrator/ (different path)

Wildcard (*):

Disallow: /search*

Blocks: /search, /search/, /search?q=test, /searchresults

End Anchor ($):

Disallow: /*.pdf$

Blocks: /documents/file.pdf, /downloads/guide.pdf
Allows: /pdf/ (a directory, not a file)

Query Parameters:

Disallow: /*?

Blocks all URLs with query parameters

Case Sensitivity:

  • Paths are case-sensitive: /Admin/ and /admin/ are different paths
  • User-agent names are case-insensitive: Googlebot = googlebot

Allow vs Disallow Priority

When rules conflict, the most specific rule wins:

User-agent: *
Disallow: /admin/
Allow: /admin/public/

Result: /admin/public/ is allowed, rest of /admin/ is blocked

Comments

Use # for comments:

# Block admin area
User-agent: *
Disallow: /admin/  # Admin panel

# Allow public resources
Allow: /public/

Common Use Cases

Standard Public Website

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Allows all crawlers to index everything.

WordPress Site

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml

Blocks WordPress admin while allowing AJAX endpoints.

E-commerce Store

User-agent: *
Allow: /
Allow: /products/
Allow: /categories/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

Allows product pages, blocks transactional pages.

Staging/Development Site

User-agent: *
Disallow: /

Blocks all crawlers from accessing the entire site.

Block Specific Crawlers

# Block bad bots
User-agent: BadBot
User-agent: ScraperBot
Disallow: /

# Allow good bots
User-agent: *
Allow: /

Blocks specific malicious crawlers.

Prevent Image Indexing

User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Allow: /

Blocks Google from indexing images while allowing text crawling.

Important Limitations

Robots.txt is NOT Security

What it does NOT do:

  • Does not prevent malicious bots from accessing pages
  • Does not hide pages from search results if linked externally
  • Does not remove pages already indexed by search engines
  • Can be ignored by any crawler (it is just a request, not enforcement)

For actual security:

  • Use password protection (.htaccess, server authentication)
  • Implement IP whitelisting
  • Use noindex meta tags or X-Robots-Tag headers
  • Apply proper file permissions

Cannot Remove Indexed Pages

If pages are already indexed:

  • Robots.txt will not remove them from search results
  • Use a noindex meta tag instead: <meta name="robots" content="noindex">
  • Or use the X-Robots-Tag HTTP header
  • Make sure the page is not blocked in robots.txt, otherwise crawlers will never see the noindex directive
  • Then request removal in Google Search Console
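The X-Robots-Tag is sent as an HTTP response header rather than in the page markup; a response carrying it might look like this (the other headers are only illustrative):

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
X-Robots-Tag: noindex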

File Must Be Accessible

  • Robots.txt must return 200 OK status
  • Must be plain text (text/plain)
  • Must be UTF-8 encoded
  • Maximum size: 500 KiB (recommended under 100 KB)
  • Avoid redirects (301/302); some crawlers follow only a limited number of hops, so serve the file directly
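A correctly served robots.txt therefore returns something like the following (the headers shown are an illustrative example of a typical response):

HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8

User-agent: *
Disallow: /private/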

Troubleshooting

Robots.txt Not Working?

Check File Location:

  • Must be at exact path: /robots.txt
  • Not in subdirectory or with wrong name
  • Case-sensitive filename

Verify File Permissions:

  • Set to 644 (readable by all)
  • Not executable (do not use 777)

Test Accessibility:

  • Visit https://yoursite.com/robots.txt directly in a browser
  • Confirm it returns a 200 OK response with plain-text content

Syntax Errors:

  • Check spelling: User-agent, not Useragent (the hyphen is required; directive names themselves are case-insensitive)
  • Use one directive per line, with a colon between the directive and its value
  • No stray spaces or special characters

Pages Still Being Indexed?

Solutions:

  • Add noindex meta tag: <meta name="robots" content="noindex">
  • Wait for next crawl (can take weeks)
  • Use Google Search Console Removals tool
  • Check for external links pointing to blocked pages


Related Marketing & SEO Tools

🏷️

Meta Tag Generator

Generate HTML meta tags for SEO optimization. Create title, description, keywords, viewport, charset, robots, and author meta tags. Perfect for improving search engine rankings and social sharing.

Use Tool →
🔍

Google SERP Simulator

Preview how your title and meta description appear in Google search results. See real-time character counts, pixel width estimates, and desktop/mobile previews to optimize your SEO.

Use Tool →

FAQ Schema Generator

Generate JSON-LD FAQPage schema markup for SEO. Add questions and answers to create structured data that helps search engines display FAQ rich snippets in search results.

Use Tool →
🍞

Breadcrumb Schema Generator

Generate JSON-LD BreadcrumbList schema markup for SEO. Add breadcrumb items with names and URLs to create structured data that helps search engines understand your site hierarchy.

Use Tool →
🐦

Twitter Card Generator

Generate Twitter Card meta tags for Twitter/X sharing. Create summary cards, large image cards, app cards, and player cards. Optimize how your links appear on Twitter with custom titles, descriptions, and images.

Use Tool →
📱

Open Graph Generator

Generate Facebook Open Graph meta tags for social media sharing. Create og:title, og:description, og:image, og:url, and og:type tags. Perfect for optimizing how your links appear on Facebook, LinkedIn, WhatsApp, and Slack.

Use Tool →
🛍️

Product Schema Generator

Generate JSON-LD Product schema markup for SEO. Add product details like name, price, brand, rating, and availability to create structured data for rich search results.

Use Tool →
