In case you missed it, there has been a good debate in the many comments on a recent blog post by Mary McKnight. It centered on a basic question:
Is there more than one class of index in Google?
And if so, would having a larger percentage of your pages in the "secondary index" affect how discoverable your content is? The answer is clearly yes - read on.
I've done a little more research and found that the "yellow band" (Mary's term) and the "secondary index" (my term) are more commonly known as the "supplemental index".
"Hey, pages get added to the supplemental index using automatic algorithms. You can imagine a lot of useful criteria, including that we saw a url during the main crawl but didn't have a chance to crawl it when we first saw it. Think of this as icing on the cake. If there's an obscure search, we're willing to do extra work with this new experimental feature to turn up more results. The net outcome is more search results for people doing power searches." - GoogleGuy, Aug 27, 2003 (this is the first indication that Google had started experimenting with the supplemental index)
"As Google explains it, it’s a question of priorities. Supplemental results have a secondary priority. So they’re spidered less frequently and may well have less information held about them in the database. Google says that the PageRank is unaffected [by the supplemental index]. Currently there seem to be few supplemental results showing in typical keyword searches. That suggests to me it’s better to do what it takes to get your web pages into the regular index and avoid the supplemental index." - Barry Welford (Supplemental Results - A Word to the Wise)
On a forum, I noticed that one webmaster grappling with the supplemental index wrote:
"With a casual inspection I could see that all these pages in the supplemental were the php based dynamic URLs. Google does not seem to index them and though they are linked to high pagerank pages, they can not get out of the supplemental. So the only way to reduce such instances is to rewrite your applications which generate the dynamic URLS and make them search engine friendly."
This is untrue. Blogsite is 100% dynamic, and our customers average more than 90% of all their Blogsite pages in the primary index; they achieve this by doing nothing special - they just blog. We believe our high rate of success is related to the architecture of our presentation layer (i.e., the way our platform generates HTML). Not many folks realize it, but the MyST platform (the foundation of Blogsite and Real Estate Blogsites) was designed for knowledge management and a high degree of search optimization.
Shimon Sandler offers a list of reasons why pages get shoved into the supplemental index:
- You have little unique text on your webpages (maybe a lot of images, and little text),
- Duplicate content,
- Your Title and Description meta tags are all identical,
- Your pages have similar header, sidebar, and footer sections,
- Your pages are dynamically generated from a database,
- Possibly most of your links are reciprocal links (not one-way incoming links),
- Orphaned web pages, which are pages that no one links to, including yourself.
Many of these points suggest (although not conclusively) that architectural issues in your HTML affect your ability to avoid the supplemental index. This seems to corroborate what we see with Blogsite.
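One item on Shimon's list, identical Title and Description meta tags, is easy to check on your own site. Here is a minimal sketch, assuming you have already pulled each page's title and description into a dictionary; the `find_shared_meta` name and the input format are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_shared_meta(pages):
    """Group URLs whose (title, description) pair is identical.

    pages: dict mapping URL -> (title, description). This input format is a
    hypothetical assumption; fill it from your own crawl or CMS export.
    """
    groups = defaultdict(list)
    for url, (title, description) in pages.items():
        # Normalize whitespace and case so trivially different tags still match.
        key = (" ".join(title.split()).lower(), " ".join(description.split()).lower())
        groups[key].append(url)
    # Only groups containing more than one URL indicate shared meta tags.
    return {key: sorted(urls) for key, urls in groups.items() if len(urls) > 1}


pages = {
    "/post?id=1": ("My Blog", "Thoughts on stuff"),
    "/post?id=2": ("My Blog", "Thoughts on stuff"),
    "/post?id=3": ("Supplemental index notes", "What I learned"),
}
print(find_shared_meta(pages))
# {('my blog', 'thoughts on stuff'): ['/post?id=1', '/post?id=2']}
```

Any group this reports is a set of pages that a search engine sees with the same title and description, which is exactly the pattern the list warns about.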
The best evidence and overview of the supplemental index can be found at SEO Adept.
"The supplemental index is not a good place for your pages to be, as pages in the supplemental index have almost no chance of ranking for good keywords." - Staying Out of Google's Supplemental Index
SEO Adept also offers these tips to help you get those pages out of the supplemental class.
- Make sure that your pages have enough content. Extremely short blog posts and other very brief pages sometimes end up in the supplemental index.
- Make sure that your pages have unique content, from each other and from other pages on the Internet.
- Make sure that no one is duplicating your pages elsewhere on the Internet. You can run a search on some of the unique phrases in your page to see if other pages may be similar.
- Try to acquire more and better links to your supplementally indexed pages. Try to get keywords that people are searching for into the anchor text of links coming from authoritative, similarly themed pages.
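The first two tips (enough content, and content that is unique from your other pages) can be roughly self-audited. The sketch below uses a common textbook technique, word shingling with Jaccard similarity, which is an assumption on my part and not how Google actually detects duplicates. The `audit` function, its input format, and the `min_words` and `max_similarity` thresholds are all hypothetical. It only compares your own pages with each other; checking for copies elsewhere on the Internet still means searching for unique phrases as described above.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def audit(pages, min_words=150, max_similarity=0.8):
    """Flag thin pages and near-duplicate page pairs.

    pages: dict mapping URL -> visible body text (hypothetical input format).
    min_words and max_similarity are illustrative thresholds, not Google's.
    """
    thin = sorted(url for url, text in pages.items() if len(text.split()) < min_words)
    sets = {url: shingles(text) for url, text in pages.items()}
    urls = sorted(pages)
    similar = []
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            score = jaccard(sets[u], sets[v])
            if score >= max_similarity:
                similar.append((u, v, round(score, 2)))
    return thin, similar


pages = {
    "/a": "the quick brown fox jumps over the lazy dog",
    "/b": "the quick brown fox jumps over the lazy dog",
    "/c": "an entirely different post about index penetration",
}
thin, similar = audit(pages, min_words=8)
print(thin)     # ['/c']
print(similar)  # [('/a', '/b', 1.0)]
```

The pairwise loop is fine for a blog of a few hundred posts; a much larger site would want a faster near-duplicate method.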
These tips all make sense of course - nothing new here. What *is* new (to me anyway) is that the supplemental index is apparently quite real and avoiding it is an important success factor in terms of your online marketing strategy. Given this understanding, I'm going to continue to use the ratio of pages in the primary index to total pages in the index as a measure of index penetration success. This seems to be an excellent measure of blogging success because blogging already does a good job of addressing many of the [apparent] reasons that pages get supplementalized.
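The measure itself is simple arithmetic: pages in the primary index divided by total pages in the index. Here is a minimal sketch, assuming you have already counted both figures yourself; the `index_penetration` name is hypothetical and the 450 and 500 below are made-up numbers, not measurements.

```python
def index_penetration(primary_pages, total_indexed_pages):
    """Fraction of indexed pages that are in the primary (non-supplemental) index."""
    if total_indexed_pages <= 0:
        raise ValueError("total_indexed_pages must be positive")
    if not 0 <= primary_pages <= total_indexed_pages:
        raise ValueError("primary_pages must be between 0 and total_indexed_pages")
    return primary_pages / total_indexed_pages


# Hypothetical counts: 450 of a site's 500 indexed pages are in the primary index.
ratio = index_penetration(450, 500)
print(f"{ratio:.0%}")  # 90%
```

Tracking this ratio over time, rather than raw page counts, shows whether a growing site is also keeping its new pages out of the supplemental index.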