A text file placed on a website to instruct search engine crawlers which pages or sections of the site should or should not be indexed.
Robots.txt is a simple text file that sits on the server with your web site. It’s basically your web site’s way of giving search engines instructions about how they crawl and index your web site.
Search engines tend to look for the robots.txt file when they first visit a site. They can visit and index your site whether you have a robots.txt file or not; having one simply helps them along the way.
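All of the major search engines read and follow the instructions in a robots.txt file. That means it’s a pretty effective way to keep their crawlers away from content you don’t want in the search indexes. Keep in mind that the file is a request, not an enforcement mechanism: a blocked page can still show up in results if other sites link to it, and badly behaved robots can ignore the file entirely.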
A word of warning. Some sites will tell you to use robots.txt to block premium content you don’t want people to see, but this isn’t a good idea. The file is publicly readable, so listing a folder in it tells anyone who looks exactly where that content lives. Most search engines will respect your robots.txt file and ignore the content you want to have blocked, but a far safer option is to hide that premium content behind a login. Requiring a username and password to access the content you want hidden from the public will do a much more effective job of keeping both search engines and people out.
What Does Robots.txt Look Like?
The average robots.txt file is one of the simplest pieces of code you’ll ever write or edit.
If you want to have a robots.txt file for the engines to visit, but don’t want to give them any special instructions, simply open up a text editor and type in the following:
User-Agent: *
Disallow:
The “User-Agent” part specifies which search engines you are giving the directions to. Using the asterisk means you are giving directions to ALL search engines.
The “Disallow” part specifies what content you don’t want the search engines to visit. If you don’t want to block the search engines from any area of your web site, you simply leave the value after the colon blank.
For most small web sites, those two simple lines are all you really need.
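If you’d like to confirm how a crawler is likely to read those two lines, here is a minimal sketch using Python’s standard urllib.robotparser module. The bot name “ExampleBot” and the page addresses are made up for illustration, and a real search engine’s parser may differ in its details.

```python
from urllib.robotparser import RobotFileParser

# The two-line "no special instructions" file described above.
ALLOW_ALL_RULES = [
    "User-Agent: *",
    "Disallow:",
]


def is_allowed(rules, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse the lines directly; no network access needed
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and these URLs are hypothetical, chosen only for the demo.
home_ok = is_allowed(ALLOW_ALL_RULES, "ExampleBot", "http://www.yourdomain.com/")
page_ok = is_allowed(
    ALLOW_ALL_RULES, "ExampleBot", "http://www.yourdomain.com/print-ready/page.html"
)
print(home_ok, page_ok)
```

With a blank Disallow value, both checks should come back as allowed, which matches the idea that nothing on the site is blocked.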
If your web site is a little bit larger, or you have a lot of folders on your server, you may want to use the robots.txt file to give some instructions about which content to avoid.
A good example of this would be a site that has printer-friendly versions of all of its content housed in a folder called “print-ready.” There’s no reason for the search engines to index both forms of the content, so it’s a good idea to go ahead and block the engines from visiting the printer-friendly versions.
In this case, you’d leave the “user-agent” section alone, but would add the print-ready folder to the “disallow” line. That robots.txt file would look like this:
User-Agent: *
Disallow: /print-ready/
It’s important to note the forward slashes before and after the folder name. The search engines will tack that folder on to the end of the domain name they are visiting.
That means the /print-ready/ folder is found at www.yourdomain.com/print-ready/. If it’s actually found at www.yourdomain.com/css/print-ready/ you’ll need to format your robots.txt this way:
User-Agent: *
Disallow: /css/print-ready/
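The difference between the two folder locations can be checked with a short sketch, again using Python’s standard urllib.robotparser module. The helper name blocked_paths, the bot name “ExampleBot” and the sample page paths are assumptions made for this example; the Disallow values are the ones shown above.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(rules, paths, user_agent="ExampleBot"):
    """Return the paths that the given robots.txt lines tell user_agent to skip."""
    parser = RobotFileParser()
    parser.parse(rules)
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical pages on the site, written as paths after the domain name.
PATHS = [
    "/index.html",
    "/print-ready/page.html",
    "/css/print-ready/page.html",
]

top_level_rules = ["User-Agent: *", "Disallow: /print-ready/"]
nested_rules = ["User-Agent: *", "Disallow: /css/print-ready/"]

top_level_blocked = blocked_paths(top_level_rules, PATHS)
nested_blocked = blocked_paths(nested_rules, PATHS)
print(top_level_blocked)
print(nested_blocked)
```

Each rule only blocks paths that start with the exact folder path you give, so the first file leaves the copy under /css/ reachable and the second leaves the top-level folder reachable.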
You can also edit the “user-agent” line to refer to specific search engines. To do this, you’ll need to look up the name of a search engine’s robot. (For instance, Google’s robot is called “googlebot” and Yahoo’s is called “slurp.”)
If you want to set up your robots.txt file to give instructions ONLY to Google, you would format it like this:
User-Agent: googlebot
Disallow: /css/print-ready/
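To see that this file speaks only to Google’s robot, here is a small sketch using Python’s standard urllib.robotparser module. The page address is invented for the demo, and the result reflects how this Python parser reads the file, which is assumed to be close to how the engines themselves behave.

```python
from urllib.robotparser import RobotFileParser

# The Google-only file shown above.
GOOGLE_ONLY_RULES = [
    "User-Agent: googlebot",
    "Disallow: /css/print-ready/",
]

parser = RobotFileParser()
parser.parse(GOOGLE_ONLY_RULES)

# Hypothetical page inside the blocked folder.
url = "http://www.yourdomain.com/css/print-ready/page.html"

google_allowed = parser.can_fetch("googlebot", url)  # the rule names this robot
slurp_allowed = parser.can_fetch("slurp", url)       # no rule mentions this robot
print(google_allowed, slurp_allowed)
```

Because the file has no “User-Agent: *” section, robots other than googlebot find no instructions that apply to them and are free to visit the folder.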
How Do I Put Robots.txt on My Site?
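Once you’ve written your robots.txt file to reflect the directions you want to give the search engines, you simply save the text file as “robots.txt” and upload it to the root folder of your web site, so that it can be reached at www.yourdomain.com/robots.txt. Search engines only look for the file at that address, so a copy placed in a subfolder will be ignored.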
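As a quick illustration of where the engines expect to find the file, the sketch below works out the robots.txt address for any page on a site using Python’s standard urllib.parse module. The function name robots_txt_url and the sample page address are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and domain, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside the site.
location = robots_txt_url("http://www.yourdomain.com/css/print-ready/page.html")
print(location)
```

However deep the page sits in your folders, the address always points back to the root of the domain, which is why the file has to be uploaded there.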