If you have a website, you know how important search engine placement is to driving new clients to your site. What you may not realize is that you can control which pages the search engines see by uploading a simple file to your site.
Before I get into the details, I think it's important to talk a little bit about how search engines work. Each of the major search engines (Google, Yahoo, MSN, and Ask) uses programs called "spiders" or "robots" that try to visit every web page on the internet and add each one to the engine's index. Once the spider adds your site to the index, the search engine then decides where each page will rank for certain terms.
The first thing a spider does when it visits a website is look for a "robots.txt" file in the site's root directory. This file tells the spider which areas of your site it is not allowed to visit. If you don't have one (or if yours is blank), you are telling the spider, "Please index my entire site."
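To see what a blank file means in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name can_crawl and the example.com address are made up for illustration, and real spiders use their own parsers, so treat this as an approximation of how they read the file.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, spider: str = "*") -> bool:
    """Return True if the given robots.txt text lets `spider` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(spider, url)


# A blank robots.txt contains no rules, so nothing is off limits.
blank_file = ""
print(can_crawl(blank_file, "http://example.com/contact/"))
```

With no rules to consult, the parser allows every URL, which is the "Please index my entire site" behavior described above.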
Believe it or not, this is a problem!
It may seem counterintuitive to block the search engines from accessing certain areas of your site, but otherwise the spiders are going to spend a considerable amount of time indexing pages that will never rank, never bring you traffic, and never bring a client through your door. Blocking these pages from the spiders can also help concentrate your site's PageRank on your optimized pages, which may help them rank higher in the search engine results pages (SERPs).
What type of pages should we block from the spiders? Anything that isn't optimized for search engine placement. A typical list would include contact pages, image galleries, policy pages, etc.
It may help to look at an example, so let's take a look at a robots.txt file I am familiar with (I wrote it) on agentBOOST.com.
The first line of the file looks like this:
User-Agent: *
This line says that the rules that follow apply to every spider (the * is a wildcard that matches all of them). For an example where individual spiders are addressed, take a look at ActiveRain's robots.txt file, which gives specific instructions to Googlebot (Google's spider) and ShopWiki.
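Here is a rough sketch of how per-spider rules play out, again using Python's standard urllib.robotparser. The rules, the /private/ and /drafts/ directories, and the name SomeOtherBot are all invented for this example; they are not taken from any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for every spider, one just for Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(spider: str, path: str) -> bool:
    """Check a path on a placeholder host against the rules above."""
    return parser.can_fetch(spider, "http://example.com" + path)


for spider in ("Googlebot", "SomeOtherBot"):
    for path in ("/private/", "/drafts/"):
        print(spider, path, "allowed" if allowed(spider, path) else "blocked")
```

Note that in this sketch a spider with its own group follows only that group: Googlebot is kept out of /drafts/ but not /private/, while every other spider falls back to the * rules.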
Going back to agentBOOST.com's robots.txt file, the ten lines after the User-Agent line all begin with "Disallow:" which is then followed by a directory on our site. It should come as no surprise that each of these lines is telling the spiders that they are not allowed access to a certain directory.
The first two lines (/terms/ and /privacy/) disallow the search engine spiders from indexing agentBOOST's Terms of Service and Privacy Policy. While both of these pages are important to our users, I don't see the benefit of having search engines wasting their time or our bandwidth/PageRank on these pages; we don't aspire to rank for the term "Privacy Policy"!
The next five lines (/user/, /agent/, /bid/, /property/, and /logout/) block the spiders from trying to index areas of the site that were built for our members to navigate the site, but not for search engines. This seems like a good time to point out a very important, powerful, and dangerous aspect of the robots.txt file:
When you disallow a directory in your robots.txt file, it also blocks every file and subdirectory under that directory!
We don't have to add lines for /user/register/ or /user/password/, for instance, because these are subdirectories of /user/. Just make sure you don't abuse this power by adding "Disallow: /", which will block your entire site!
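The sketch below illustrates why one line covers a whole directory tree: a Disallow rule is matched against the start of the path. It uses Python's standard urllib.robotparser; the helper name blocked_paths, the example.com host, and the /listings/ directory are assumptions added for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str]) -> list[str]:
    """Return the subset of `paths` that the rules block for every spider."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path matters to the rules.
    return [p for p in paths if not parser.can_fetch("*", "http://example.com" + p)]


paths = ["/", "/user/", "/user/register/", "/user/password/", "/listings/"]

one_directory = "User-agent: *\nDisallow: /user/"
whole_site = "User-agent: *\nDisallow: /"

print(blocked_paths(one_directory, paths))
print(blocked_paths(whole_site, paths))
```

The first rule set blocks only the three /user/ paths, while "Disallow: /" blocks all five, because every path on the site begins with a slash.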
The next two lines (/blog/category/ and /blog/feed/) block the spiders from indexing areas of our blog that may be considered duplicate content. The last line (/blog/subscribe) disallows our blog's subscribe page, which isn't optimized for anything in particular.
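Putting the pieces together, the file described in this walkthrough can be approximated and tested with a short Python script. This is a reconstruction from the ten paths named in the article, not a copy of the live file, and the two blog URLs checked at the end are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the paths named in this article; the live file may differ.
DISALLOWED = [
    "/terms/", "/privacy/",
    "/user/", "/agent/", "/bid/", "/property/", "/logout/",
    "/blog/category/", "/blog/feed/",
    "/blog/subscribe",
]

robots_txt = "User-Agent: *\n" + "\n".join(f"Disallow: {path}" for path in DISALLOWED)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def spider_may_visit(path: str) -> bool:
    """True if a generic spider may fetch this path under the rules above."""
    return parser.can_fetch("*", "http://agentBOOST.com" + path)


print(robots_txt)
# The two paths below are hypothetical examples, one blocked and one open.
for path in ("/blog/category/seo/", "/blog/some-post/"):
    print(path, "->", "allowed" if spider_may_visit(path) else "blocked")
```

Running a check like this before uploading a new robots.txt is a cheap way to confirm that the pages you want ranked are still open to the spiders.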
Remember, search engines have finite resources and billions (trillions?) of pages to index. When the spider comes to visit your site, don't let it waste time on pages that aren't going to do you any good! Using a robots.txt file is a great way to hold the spider's hand and bring it to the content you worked so hard to optimize.
I hope you found this quick tutorial on robots.txt helpful and informative.
If you'd like us to show you how to get the most cost-effective real estate leads, with no monthly fee and no percentage of your commission, please visit us at http://agentBOOST.com
Chris
http://agentBOOST.com