How to Write a robots.txt File

It's easy to write a valid robots.txt file that search engine spiders can understand and follow. This how-to takes you through the steps.
Difficulty: Average
Time Required: 5 minutes
Here's How:
In a text editor, create a file named robots.txt. Note that the name must be all lowercase, even if your Web pages are hosted on a Windows Web server. You'll need to save this file to the root of your Web server, because that is the only place robots look for it. For example:
http://webdesign.about.com/robots.txt
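Because the file lives at the root, its address can be worked out from any page address on the same host. A minimal Python sketch of that idea follows; the helper name robots_url and the page path are illustrative assumptions, not part of the original how-to.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page path below is a made-up example.
print(robots_url("http://webdesign.about.com/some/page.htm"))
```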
The format of the robots.txt file is one directive per line:
User-agent: robot
Disallow: files or directories
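You can use the wildcard * to indicate all robots, or name one robot specifically. For example:
To specify all robots:
User-agent: *
To specify a single robot by name:
User-agent: Googlebot
The * is only recognized on its own. A partial pattern such as A* (all robots that start with the letter A) is not part of the robots.txt standard, so list each robot by name instead.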
The disallow lines can specify files or directories:
Don't allow robots to view any files on the site:
Disallow: /
Don't allow robots to view the index.html file:
Disallow: /index.html
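One way to check a Disallow line before uploading it is to feed the rules to Python's standard-library urllib.robotparser. This is only a sketch: the host www.example.com and the agent name ExampleBot are placeholders, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

file_rules = """\
User-agent: *
Disallow: /index.html
"""

file_parser = RobotFileParser()
file_parser.parse(file_rules.splitlines())

# ExampleBot and www.example.com are placeholders for illustration.
index_allowed = file_parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
about_allowed = file_parser.can_fetch("ExampleBot", "http://www.example.com/about.html")
print(index_allowed, about_allowed)  # False True
```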
If you leave the Disallow blank, all files can be retrieved. For example, you might want the Googlebot to see everything on your site:
User-agent: Googlebot
Disallow:
If you disallow a directory, all files below it are disallowed as well:
Disallow: /norobots/
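The directory rule works by prefix: any path that begins with /norobots/ is covered. The sketch below illustrates this with Python's urllib.robotparser, again using a placeholder host and agent name. Note that the bare path /norobots, without the trailing slash, does not start with the disallowed prefix.

```python
from urllib.robotparser import RobotFileParser

dir_rules = """\
User-agent: *
Disallow: /norobots/
"""

dir_parser = RobotFileParser()
dir_parser.parse(dir_rules.splitlines())


def dir_allowed(path):
    # www.example.com and ExampleBot are placeholders for illustration.
    return dir_parser.can_fetch("ExampleBot", "http://www.example.com" + path)


print(dir_allowed("/norobots/secret.html"))       # False: directly inside the directory
print(dir_allowed("/norobots/deep/nested.html"))  # False: below it as well
print(dir_allowed("/public.html"))                # True: outside the directory
print(dir_allowed("/norobots"))                   # True: no trailing slash, so no prefix match
```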
You can also use multiple Disallows for one User-agent, to deny access to multiple areas:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
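If you maintain several groups, generating the file can help keep each directive on its own line. The following is a small sketch under that assumption; build_robots_txt is a hypothetical helper written for this example, not a standard function.

```python
def build_robots_txt(groups):
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    lines = []
    for agent, paths in groups.items():
        lines.append(f"User-agent: {agent}")
        # An empty list becomes a blank Disallow, which permits everything.
        for path in paths or [""]:
            lines.append(f"Disallow: {path}".rstrip())
        lines.append("")  # a blank line separates one group from the next
    return "\n".join(lines)


robots_text = build_robots_txt({"*": ["/cgi-bin/", "/images/"]})
print(robots_text)
```

Passing an empty list for an agent, for example {"Googlebot": []}, produces the blank Disallow form described earlier.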
You can include comments in your robots.txt file by putting a pound sign (#) at the front of the line to be commented:
# Allow Googlebot anywhere
User-agent: Googlebot
Disallow:
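To see that comments do not change the rules, the sketch below parses a commented file with Python's urllib.robotparser. The rules, host, and agent name are made up for illustration. This particular parser also strips a comment that trails a directive, which goes slightly beyond the full-line form shown above.

```python
from urllib.robotparser import RobotFileParser

commented_rules = """\
# Keep all robots out of the scripts directory
User-agent: *
Disallow: /cgi-bin/  # this parser strips trailing comments as well
"""

comment_parser = RobotFileParser()
comment_parser.parse(commented_rules.splitlines())

# ExampleBot and www.example.com are placeholders for illustration.
script_allowed = comment_parser.can_fetch("ExampleBot", "http://www.example.com/cgi-bin/form.cgi")
page_allowed = comment_parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
print(script_allowed, page_allowed)  # False True
```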
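Robots that follow the current standard do not simply apply the rules in order. A robot looks for the group that names it specifically and follows only that group; it falls back to the wildcard group only when no group names it. For example, if you set Googlebot specifically in one group, Googlebot will ignore a group elsewhere in the file that is set to a wildcard.
# Allow Googlebot anywhere
User-agent: Googlebot
Disallow:

# Allow no other bots on the site
User-agent: *
Disallow: /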
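The sketch below runs the Googlebot example through Python's urllib.robotparser, once as written and once with the two groups swapped. In this parser, at least, the named group applies to Googlebot wherever it appears in the file; other crawlers may behave differently. The agent name SomeOtherBot, the helper may_fetch, and the host are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

GOOGLEBOT_FIRST = """\
# Allow Googlebot anywhere
User-agent: Googlebot
Disallow:

# Allow no other bots on the site
User-agent: *
Disallow: /
"""

WILDCARD_FIRST = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""


def may_fetch(robots_text, agent, url="http://www.example.com/page.html"):
    """Parse robots_text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


for label, text in (("Googlebot first", GOOGLEBOT_FIRST), ("wildcard first", WILDCARD_FIRST)):
    print(label, may_fetch(text, "Googlebot"), may_fetch(text, "SomeOtherBot"))
```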
Tips:
Find robot User-agent names in your Web log
Always follow the capitalization of the file and directory names. If you disallow /IMAGES, the robots will still spider your /images folder. Agent names, by contrast, are matched without regard to case under the current standard
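Put your most specific directives first and your more inclusive ones (with wildcards) last. Robots that follow the current standard do not depend on this order, but it keeps the file easy to read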
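The capitalization tip can be checked with a short sketch using Python's urllib.robotparser: a rule for /IMAGES/ leaves /images/ open. The host and agent name are placeholders.

```python
from urllib.robotparser import RobotFileParser

case_rules = """\
User-agent: *
Disallow: /IMAGES/
"""

case_parser = RobotFileParser()
case_parser.parse(case_rules.splitlines())

# ExampleBot and www.example.com are placeholders for illustration.
upper_allowed = case_parser.can_fetch("ExampleBot", "http://www.example.com/IMAGES/logo.gif")
lower_allowed = case_parser.can_fetch("ExampleBot", "http://www.example.com/images/logo.gif")
print(upper_allowed, lower_allowed)  # False True: only the uppercase path is blocked
```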

via webdesign.about.com

Sanity for Superheroes
