🤖

Robots.txt Generator

Generate a robots.txt file to control search engine crawlers. Create user-agent rules, allow/disallow paths, set crawl delays, and add sitemap URLs. Perfect for managing bot access to your website.


How to Use Robots.txt Generator

What is Robots.txt?

Robots.txt is a text file placed in your website root directory that tells search engine crawlers which pages or sections of your site they can or cannot access. It is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web. While not all crawlers respect robots.txt, major search engines like Google, Bing, and Yahoo follow these directives.
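A minimal robots.txt that lets every crawler in while keeping one directory private looks like this (the /private/ path and sitemap URL are just illustrations):

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml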

How to Use This Tool

Step 1: Choose a Preset

Start with one of the preset configurations:

  • Allow All Crawlers: Permit all search engines to crawl everything (default for most sites)
  • Standard Website: Block admin, private, and API directories
  • Blog/News Site: Block WordPress admin while allowing content directories
  • E-commerce Store: Block cart, checkout, and account pages while allowing products
  • Block All Crawlers: Prevent all crawlers from indexing your site (staging/development)

Click any preset to instantly load its configuration.

Step 2: Select User-agent

The user-agent specifies which crawler(s) the rules apply to:

All Crawlers (*)

  • Applies rules to all search engine bots
  • Most common choice for general sites
  • Can be overridden by specific user-agent rules

Specific Crawlers:

  • Googlebot: Google Search crawler
  • Googlebot-Image: Google Image Search
  • Bingbot: Microsoft Bing crawler
  • Slurp: Yahoo Search crawler
  • DuckDuckBot: DuckDuckGo crawler
  • Baiduspider: Baidu (Chinese search engine)
  • YandexBot: Yandex (Russian search engine)
  • facebookexternalhit: Facebook link preview crawler
  • Twitterbot: Twitter/X link preview crawler

You can create multiple user-agent blocks with different rules for each crawler.
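For example, a sketch like the following gives Googlebot full access, slows down Bingbot, and applies a general rule to everyone else (the paths and delay value are illustrative):

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Crawl-delay: 5

User-agent: *
Disallow: /private/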

Step 3: Configure Allow Rules

Allow rules explicitly permit crawlers to access specific paths:

When to use Allow:

  • Override broader Disallow rules
  • Permit access to specific subdirectories within blocked directories
  • Example: Block /admin/ but allow /admin/public/ (see the snippet at the end of this step)

Path Syntax:

  • / = Allow the root and everything below it (unless a more specific Disallow rule blocks a path)
  • /blog/ = Allow blog directory and all subdirectories
  • /products/ = Allow products directory
  • Leave the Allow field empty if you do not need any exceptions (for example, when blocking everything)

Best Practices:

  • Allow rules take precedence over Disallow rules at the same specificity level
  • Use Allow sparingly, primarily to create exceptions
  • Most sites do not need explicit Allow rules
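The exception mentioned above looks like this in practice (directory names are illustrative):

User-agent: *
Disallow: /admin/
Allow: /admin/public/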

Step 4: Configure Disallow Rules

Disallow rules block crawlers from accessing specific paths:

Common Paths to Block:

  • /admin/ = Admin panel, control panel
  • /wp-admin/ = WordPress admin dashboard
  • /private/ = Private files and directories
  • /temp/ or /tmp/ = Temporary files
  • /api/ = API endpoints
  • /cgi-bin/ = CGI scripts
  • /search/ = Search results pages (duplicate content)
  • /cart/ = Shopping cart pages
  • /checkout/ = Checkout flow pages
  • /account/ = User account pages
  • /login/ and /register/ = Authentication pages
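Combining a few of these into one block might look like the following (adjust the directories to match your site):

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search/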

Path Syntax:

  • / = Block everything
  • /admin/ = Block admin directory and all subdirectories
  • /secret.html = Block specific file
  • /*? = Block all URLs with query parameters
  • /*.pdf$ = Block all PDF files
  • /*sessionid= = Block URLs with session IDs

Wildcards:

  • * = Matches any sequence of characters
  • $ = End of URL
  • Example: /private/*.pdf$ blocks all PDFs in private directory
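Putting the path syntax and wildcards together, a block such as this could be used (paths are illustrative):

User-agent: *
Disallow: /search*
Disallow: /*sessionid=
Disallow: /private/*.pdf$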

Step 5: Set Crawl Delay (Optional)

Crawl-delay specifies the number of seconds crawlers should wait between requests:

When to use:

  • Limit server load from aggressive crawlers
  • Prevent bandwidth exhaustion
  • Protect resource-intensive pages

Values:

  • 0 = No delay (not recommended, omit the directive instead)
  • 1-5 = Light delay for fast servers
  • 10 = Common value when throttling is genuinely needed
  • 30-60 = Heavy delay for slow servers or heavy scrapers

Important Notes:

  • Google ignores Crawl-delay; use Google Search Console instead
  • Bing and Yandex respect Crawl-delay
  • Values that are too high may reduce how often your site is crawled
  • Most modern sites do not need this unless experiencing crawler issues
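If you do decide to throttle a specific crawler, a block like this could be used (the bot and value are illustrative; Google will ignore the directive):

User-agent: Bingbot
Crawl-delay: 10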

Step 6: Add Sitemap URL (Optional but Recommended)

The Sitemap directive tells crawlers where to find your XML sitemap:

Format:

  • Sitemap: https://example.com/sitemap.xml
  • Must be absolute URL (include https://)
  • Can list multiple sitemaps on separate lines

Benefits:

  • Helps search engines discover all your pages
  • Improves indexing efficiency
  • Provides metadata about page priority and update frequency

Common Sitemap Locations:

  • /sitemap.xml = Root level (most common)
  • /sitemap_index.xml = Sitemap index file
  • /blog/sitemap.xml = Subdirectory sitemap
  • Multiple sitemaps are allowed
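Sitemap lines stand on their own, outside any user-agent block, and several can be listed (URLs are illustrative):

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml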

Step 7: Copy or Download the File

Two options to save your robots.txt:

Copy Button:

  • Copies content to clipboard
  • Paste into a text editor
  • Save as robots.txt (exactly that name; avoid double extensions like robots.txt.txt)

Download Button:

  • Downloads file directly as robots.txt
  • Ready to upload to your server
  • Preserves correct formatting

Step 8: Upload to Your Website

Upload robots.txt to your website root directory:

File Location:

  • Must be at: https://yoursite.com/robots.txt
  • NOT in subdirectories: /blog/robots.txt (will not work)
  • NOT with different names: robots.txt.txt (invalid)
  • Case-sensitive: robots.txt not Robots.txt

Upload Methods:

FTP/SFTP:

  1. Connect to your server via FTP client (FileZilla, Cyberduck)
  2. Navigate to root directory (public_html, www, or htdocs)
  3. Upload robots.txt file
  4. Set file permissions to 644 (readable by all)

cPanel File Manager:

  1. Log into cPanel
  2. Open File Manager
  3. Navigate to public_html directory
  4. Upload robots.txt file
  5. Verify file is not hidden

WordPress:

  1. Use FTP to upload to root directory (same level as wp-config.php)
  2. Or use All in One SEO / Yoast SEO plugin robots.txt editor
  3. Some WordPress plugins auto-generate robots.txt (check first)

Next.js/Vercel:

  1. Place robots.txt in /public/ directory
  2. Deployed to root automatically
  3. Or use next-sitemap package for dynamic generation

Nginx:

  1. Upload to web root directory (usually /var/www/html)
  2. Ensure proper permissions (644)
  3. Restart Nginx if needed

Step 9: Test Your Robots.txt

Verify your robots.txt file is working correctly:

Manual Check:

  1. Visit: https://yoursite.com/robots.txt
  2. Verify content displays correctly in browser
  3. Check for any 404 errors

Google Search Console:

  1. Go to: search.google.com/search-console
  2. Select your property
  3. Navigate to Settings → robots.txt report (the legacy robots.txt Tester has been retired)
  4. Review when the file was last fetched and whether any errors were reported
  5. Request a recrawl of robots.txt if you have recently updated it

Bing Webmaster Tools:

  1. Go to: bing.com/webmasters
  2. Select your site
  3. Go to Configure My Site → Crawl Control
  4. View current robots.txt
  5. Test URLs against rules

Online Validators:

  • Ryte.com robots.txt validator
  • Technical SEO robots.txt tester
  • Screaming Frog robots.txt analyzer

Robots.txt Syntax and Rules

Basic Structure

User-agent: *
Allow: /
Disallow: /private/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml

Multiple User-agent Blocks

You can define different rules for different crawlers:

# Allow Googlebot to access everything
User-agent: Googlebot
Allow: /

# Block other crawlers from certain paths
User-agent: *
Disallow: /private/
Disallow: /admin/

Path Matching Rules

Exact Path:

Disallow: /admin/

Blocks: /admin/, /admin/users/, /admin/settings.php
Allows: /administrator/ (different path)

Wildcard (*):

Disallow: /search*

Blocks: /search, /search/, /search?q=test, /searchresults

End Anchor ($):

Disallow: /*.pdf$

Blocks: /documents/file.pdf, /downloads/guide.pdf
Allows: /pdf/ (a directory, not a file)

Query Parameters:

Disallow: /*?

Blocks all URLs with query parameters

Case Sensitivity:

  • Paths are case-sensitive: /Admin/ and /admin/ are different paths
  • User-agent names are case-insensitive: Googlebot = googlebot

Allow vs Disallow Priority

When rules conflict, the most specific rule wins:

User-agent: *
Disallow: /admin/
Allow: /admin/public/

Result: /admin/public/ is allowed, rest of /admin/ is blocked

Comments

Use # for comments:

# Block admin area
User-agent: *
Disallow: /admin/  # Admin panel

# Allow public resources
Allow: /public/

Common Use Cases

Standard Public Website

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Allows all crawlers to index everything.

WordPress Site

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml

Blocks WordPress admin while allowing AJAX endpoints.

E-commerce Store

User-agent: *
Allow: /
Allow: /products/
Allow: /categories/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

Allows product pages, blocks transactional pages.

Staging/Development Site

User-agent: *
Disallow: /

Blocks all crawlers from accessing the entire site.

Block Specific Crawlers

# Block bad bots
User-agent: BadBot
User-agent: ScraperBot
Disallow: /

# Allow good bots
User-agent: *
Allow: /

Blocks specific malicious crawlers.

Prevent Image Indexing

User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Allow: /

Blocks Google from indexing images while allowing text crawling.

Important Limitations

Robots.txt is NOT Security

What it does NOT do:

  • Does not prevent malicious bots from accessing pages
  • Does not hide pages from search results if linked externally
  • Does not remove pages already indexed by search engines
  • Can be ignored by any crawler (it is just a request, not enforcement)

For actual security:

  • Use password protection (.htaccess, server authentication)
  • Implement IP whitelisting
  • Use noindex meta tags or X-Robots-Tag headers
  • Apply proper file permissions

Cannot Remove Indexed Pages

If pages are already indexed:

  • Robots.txt will not remove them from search results
  • Use a noindex meta tag instead: <meta name="robots" content="noindex">
  • Or use the X-Robots-Tag HTTP header
  • Make sure the page is not blocked in robots.txt, otherwise crawlers will never see the noindex directive
  • Then request removal in Google Search Console
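The X-Robots-Tag is sent as an HTTP response header rather than in the page markup; a response carrying it might look like this (the other headers are only illustrative):

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
X-Robots-Tag: noindex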

File Must Be Accessible

  • Robots.txt must return 200 OK status
  • Must be plain text (text/plain)
  • Must be UTF-8 encoded
  • Maximum size: 500 KiB (recommended under 100 KB)
  • Avoid redirects (301/302); some crawlers follow only a limited number of hops, so serve the file directly
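A correctly served robots.txt therefore returns something like the following (the headers shown are an illustrative example of a typical response):

HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8

User-agent: *
Disallow: /private/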

Troubleshooting

Robots.txt Not Working?

Check File Location:

  • Must be at exact path: /robots.txt
  • Not in subdirectory or with wrong name
  • Case-sensitive filename

Verify File Permissions:

  • Set to 644 (readable by all)
  • Not executable (do not use 777)

Test Accessibility:

  • Visit https://yoursite.com/robots.txt directly in a browser
  • Confirm it returns a 200 OK response with plain-text content

Syntax Errors:

  • Check spelling: User-agent, not Useragent (the hyphen is required; directive names themselves are case-insensitive)
  • Use one directive per line, with a colon between the directive and its value
  • No stray spaces or special characters

Pages Still Being Indexed?

Solutions:

  • Add noindex meta tag: <meta name="robots" content="noindex">
  • Wait for next crawl (can take weeks)
  • Use Google Search Console Removals tool
  • Check for external links pointing to blocked pages


Related Marketing & SEO Tools

🏷️

Meta Tag Generator

Generate HTML meta tags for SEO optimization. Create title, description, keywords, viewport, charset, robots, and author meta tags. Perfect for improving search engine rankings and social sharing.

Use Tool →
🔍

Google SERP Simulator

Preview how your title and meta description appear in Google search results. See real-time character counts, pixel width estimates, and desktop/mobile previews to optimize your SEO.

Use Tool →

FAQ Schema Generator

Generate JSON-LD FAQPage schema markup for SEO. Add questions and answers to create structured data that helps search engines display FAQ rich snippets in search results.

Use Tool →
🍞

Breadcrumb Schema Generator

Generate JSON-LD BreadcrumbList schema markup for SEO. Add breadcrumb items with names and URLs to create structured data that helps search engines understand your site hierarchy.

Use Tool →
🐦

Twitter Card Generator

Generate Twitter Card meta tags for Twitter/X sharing. Create summary cards, large image cards, app cards, and player cards. Optimize how your links appear on Twitter with custom titles, descriptions, and images.

Use Tool →
📱

Open Graph Generator

Generate Facebook Open Graph meta tags for social media sharing. Create og:title, og:description, og:image, og:url, and og:type tags. Perfect for optimizing how your links appear on Facebook, LinkedIn, WhatsApp, and Slack.

Use Tool →
🛍️

Product Schema Generator

Generate JSON-LD Product schema markup for SEO. Add product details like name, price, brand, rating, and availability to create structured data for rich search results.

Use Tool →
