Browsershots robots.txt error — “Browsershots was blocked by robots.txt”

December 16, 2009 by davidwank

Browsershots.org is a fantastic resource for web developers because it allows you to test how a web page will look in many different browsers. The most common problem people run into with the service, however, is the robots.txt error: “Browsershots was blocked by yoursite.com/robots.txt.”

Here’s the short version of why you are getting this Browsershots robots.txt error, along with the solution:
Because you are developing a test site for a client, you naturally have a disallow statement in the robots.txt file on your development server. Because of that disallow statement, Browsershots cannot access the directory to create the preview screenshots. What you need to do is temporarily change robots.txt to allow Browsershots to reach the specific directory where your site is located. Once Browsershots finishes its work, you can restore robots.txt to its original disallow statement.
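
As a quick preview, assuming your test site lives in a directory named /testsite (a made-up name for illustration), the temporary edit is just an Allow rule in front of the existing Disallow:

[shell]
# Temporary robots.txt while Browsershots runs
User-agent: *
Allow: /testsite
Disallow: /
[/shell]

and once the screenshots are done, you revert to:

[shell]
# Original robots.txt, restored afterwards
User-agent: *
Disallow: /
[/shell]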

Below is a more detailed explanation of this concept:

Background
The robots.txt file is a proverbial “gatekeeper” that tells a search engine spider which directories it can and cannot index. Some search engines do not respect the instructions in robots.txt, but most major engines, such as Google, currently do appear to honor them. In general, a production website has a robots.txt file to spell out exactly which parts of the site spiders are welcome to crawl and index.
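
For reference, the most wide-open robots.txt you can publish pairs a wildcard user agent with an empty Disallow value; an empty Disallow means nothing is off limits:

[shell]
# Allow every spider to crawl everything
User-agent: *
Disallow:
[/shell]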

In addition to “telling” web spiders what they can look at or index, the robots.txt file can also list what spiders shouldn’t see. Let’s look at an example:
Assume you have a website with a root directory /, a /content directory, and a /development directory, and you want search engines to index / and /content but not /development.

You could write a robots.txt file that looks like this:

[shell]
User-agent: *
Allow: /
Disallow: /development
[/shell]

Real-World Example
Moving back to how robots.txt affects Browsershots.org and causes this error: most web developers have a domain they use for testing, so they can put development versions of websites online for clients to see. Because the test site is live on the internet, albeit on a test domain rather than the client’s actual domain, Google and other search engines may pick up this development site and index it, especially if you leave it on the test domain for a while. While you eventually want your client’s site indexed, you do not want a search engine indexing a client’s development website. Imagine you are building a website called helpspa.com (what a great idea), and when someone searches for helpspa.com on Google they get directed to www.developersite.com/test/helpspa.com. You will not have a happy client!

Thus, in order to leave the test site up for the client to see while preventing Google and other search engines from finding it, you would create a robots.txt file that blocks the test directory. Using the example above, just substitute the directory you wish to block from search engine spiders into your own robots.txt file, and make sure that this copy is the one on the server. But now when you go to browsershots.org, you will get an error about Browsershots not being able to access the site. So you go back into robots.txt and either remove the disallow statement or modify the file so that your directory is available for Browsershots to view, as sketched below. Then, when Browsershots is done, just remember to change robots.txt back to its original form, with the disallow statement.
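
One low-tech way to manage this back-and-forth is to keep both versions of the file on the server and swap them as needed. Here is a sketch, assuming shell access to the web root; the filenames robots-open.txt and robots-blocked.txt are made up for illustration:

[shell]
# Serve the permissive version before submitting to browsershots.org
cp robots-open.txt robots.txt

# ...run the Browsershots test...

# Put the disallow version back once the screenshots are done
cp robots-blocked.txt robots.txt
[/shell]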

To give you an example of how I do it: I have a test domain for my client sites. I’ll make up a name here, but the concept is the same. Here is the directory structure:

www.myfictitioustestsite.com/client1
www.myfictitioustestsite.com/client2
www.myfictitioustestsite.com/client3
www.myfictitioustestsite.com/client4

I have a robots.txt that looks like this:
[shell]
User-agent: *
Disallow: /
[/shell]

In this manner everything is blocked from search engines while I work. If I want to use Browsershots to test the development site for client 2, I’d change it to something like this:
[shell]
User-agent: *
Allow: /client2
Disallow: /
[/shell]
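
Before submitting the site to Browsershots, it is worth confirming that the server is actually serving the version of the file you just edited (an old cached copy, or a copy uploaded to the wrong directory, will still block the service). A quick check with curl against my made-up test domain:

[shell]
curl http://www.myfictitioustestsite.com/robots.txt
[/shell]

If the output shows the Allow line for the client directory, you are ready to run the test.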

So that’s the story — let me know if you have any questions.

Filed Under: Web Development
