PIP: Percentage of Indexed Pages by Google
I've heard a lot lately about PIP (Percentage of Indexed Pages).  PIP essentially refers to what share of the pages Google has indexed from your website it actually includes in its search results.  Yes, while Google indexes all or most of your pages, it will "omit" some for one reason or another.  Ever wonder what pressing that "omitted results" link at the bottom of your search results would show?  Those are the pages Google is suppressing in the search results because it thinks they are irrelevant to your search or appear to have repeated content.

Want to see the pages Google is omitting from your site?

Use RSS Pieces' free Google Suppressed Pages Tool

Some people suggest that PIP is the all-important ratio for determining if a site is "successful" on the web.  I, and much of the web development community, beg to differ.  And to prove my point from an unbiased perspective, I will show you that while RSS Pieces has an extremely high PIP at over 90%, I run neck and neck with my friend and competitor Jim Cronin's site, Real Estate Tomato (another real estate blog seller), which has an extremely low PIP.  Both sites are successful on the web, meaning they both drive tons of traffic and generate quality leads daily.  In fact, you've probably visited one or both of the sites recently.

Compare RSS Pieces and Tomato Traffic:
Alexa Traffic Comparison

How to determine your PIP:

PIP = (number of pages included in the search results) / (all pages indexed by Google)

  • Go to Google and run a site:www.yourdomainname.com query.  When you get the results, click "next" until you reach the end of your results.  On the very last page of your results you will see numbers in the upper right-hand corner.  It will say results x of y total results, where x is the number of pages displayed and y is the number of pages indexed.

For example, this is the last page of RSS Pieces' included results: it shows that of the 342 pages Google indexed, it displays 313 results, meaning Google has chosen to omit 29 of our pages for some reason or another.  My PIP is 313/342 = 91.5%.

Jim Cronin's results from a site query are here.  Only 6 of Jim's 153 pages are included in the Google results pages, which puts his PIP at 6/153 = 3.9%.
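
As a quick check of the arithmetic above, here is a minimal Python sketch. The function name `pip_percent` is only an illustrative choice, and the page counts are the ones read off the two site: queries described in this post.

```python
def pip_percent(included: int, indexed: int) -> float:
    """Return PIP as a percentage: pages included in results / pages indexed."""
    if indexed <= 0:
        raise ValueError("indexed must be a positive page count")
    if not 0 <= included <= indexed:
        raise ValueError("included must be between 0 and indexed")
    return 100.0 * included / indexed


# Counts taken from the two site: queries described in the post.
rss_pieces = pip_percent(313, 342)
tomato = pip_percent(6, 153)

print(f"RSS Pieces: {rss_pieces:.1f}% ({342 - 313} pages omitted)")
print(f"Real Estate Tomato: {tomato:.1f}%")
```

Plugging in your own x and y from the last page of a site: query gives your PIP the same way.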

According to some logic, RSS Pieces should appear higher in the Search Engine Results pages and drive substantially more traffic than the Tomato, however, this simply isn't true.  PIP is not an indicator of a website's success.  Traffic and lead generation are what determine a website's success, and those numbers are directly related to how good the content is on the page.  Both Jim and I drive substantial traffic to our sites because we consistently put out quality content.  We do not populate our sites with tons of RSS feeds from various sources.  We subscribe to the theory that unique, relevant information is what people want, and we are winning the traffic wars.

Conclusion: Do not be fooled by dressed-up metrics like PIP.  Be sure to use the metrics the rest of the web community uses when determining your site's success... traffic and leads.

For more information about measuring your website traffic, read:
Measuring and boosting traffic in 1 hour/day
 

84 Comments on Myth: High percentage of pages indexed by Google = online success

OCT 31 2006
260,761 Points 67 Featured Posts Localism Sponsor Outside Blog
Thank you, again, for this info. I am learning more and more every day!
8:53am • #1
599,049 Points 59 Featured Posts Localism Sponsor Outside Blog Hit Router

Goodness, I don't think I have ever heard of PIP.  I have heard not all pages are indexed and did not realize that the omitted results are THAT.

Thanks as usual.  

  

 

9:07am • #2
244,607 Points 5 Featured Posts Localism Sponsor Outside Blog
Interesting stuff, I think that I should pay more attention.
9:07am • #3
4 Featured Posts

Now I'm thoroughly confused.  I just checked my blogs

http://LongIslandsBestHomes.blogspot.com and http://BloggingLongIsland.wordpress.com and was shocked by the discrepancy between the two.  Is there any wisdom you can share with regard to this?  My wordpress blog has each article listed separately while the blogger one has none.

10:35am • #4
8 Featured Posts

Mary - Excellent post as usual. I'm glad to see the debate for PIP bubble up - heck I didn't even know it was called PIP, but the fact that there's a name for it suggests it might be useful to understand.

"PIP is not an indicator of a website's success."

I agree, but traffic is not necessarily an indicator of business success either. I'm sure you agree that many indicators are necessary and traffic is certainly a great one.

This reminds me - there's a type of service that involves safety management processes with liquid nitrogen (for oil refineries I believe). The company that provides that service depends on just a few dozen prospect clicks a year to be successful, so to them, "high traffic" doesn't have the same definition as it might for you, me, or real estate agents. Indeed, traffic is a contextual determinant of success.

"According to some logic, RSS Pieces should appear higher in the Search Engine Results pages and drive substantially more traffic than the Tomato, however, this simply isn't true."

I think this is an invalid assertion - you and Jim are each in different businesses. Jim is broadcasting to a much larger audience about far more general ideas than RSS Pieces. Aren't you comparing apples and tangelos?

"Be sure to use the metrics the rest of the web community uses when determining your site's success... traffic and leads."

I agree - traffic and leads are excellent metrics, but just because the web community for your industry has a set way of doing things, consider that some metrics were established before search engines were able to calculate things like PIP. The web (after all) is changing in dramatic ways and blogs are used for a variety of business objectives. Doesn't it make sense to be more open-minded to improving business and technical metrics?

In my view, we might learn a bunch about our online marketing strategies by employing additional metrics - like:

  • Ratio of comments to total blog posts
  • Ratio of comments to total blog page hits
  • Ratio of contact page hits to total page hits
  • Ratio of total page hits to sales
  • Ratio of pages indexed to sales
  • Ratio of blog page hits to property search requests

When tracked month over month, these metrics tell you new things about your blogging effort. For example, when the ratio of comments to total blog page hits is increasing, you are becoming increasingly more efficient about using your blog content to connect with readers. A rising ratio of contact page hits from your blog compared to total blog page hits indicates your content is compelling people to call you.
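
As a rough illustration of tracking a couple of these ratios month over month, here is a small Python sketch. All of the counts are made-up numbers for illustration only, and the field names (`comments`, `blog_hits`, `contact_hits`) are hypothetical; substitute whatever your own stats package reports.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical monthly counts, for illustration only.
months = [
    {"month": "Sep", "comments": 40, "blog_hits": 4000, "contact_hits": 60},
    {"month": "Oct", "comments": 66, "blog_hits": 5500, "contact_hits": 110},
]


def monthly_ratios(rows):
    """Compute comments-per-hit and contact-hits-per-hit for each month."""
    return [
        {
            "month": r["month"],
            "comments_per_hit": ratio(r["comments"], r["blog_hits"]),
            "contact_per_hit": ratio(r["contact_hits"], r["blog_hits"]),
        }
        for r in rows
    ]


trend = monthly_ratios(months)

# Compare each month with the one before it.
for prev, cur in zip(trend, trend[1:]):
    for key in ("comments_per_hit", "contact_per_hit"):
        direction = "rising" if cur[key] > prev[key] else "flat or falling"
        print(f"{prev['month']} -> {cur['month']} {key}: {direction}")
```

The absolute values matter less than the direction of each ratio from one month to the next.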

PIP is no different - but it measures something we've not paid much attention to - how effective your pages are at getting into the primary index. It's my feeling (however simple the logic may be) that to get good search recommendations you have to have pages in the primary index; the greater the percentage, the better. I guess I struggle with anyone arguing for a smaller percentage. Historically this may not have been so important because search engines were on a tear to see who had the biggest index. This is no longer true - now they are on a tear to provide relevant results - happy days for SEO'ists indeed.

No single metric is ideal, but all of them serve to help you measure success in many discrete and valuable ways.

10:59am • #5
8 Featured Posts

Geri -

I think the difference may be that the BlogSpot template is not generating the HTML (i.e., the presentation layout) of your blog as efficiently as your Wordpress template. This is what I was referring to in my comment - how can a blog do well with one page in the index? More specifically, how can it possibly do well if competitors in your market have blogs with 200 times the pages in the primary index?

As a simple test - take one of your post titles and search for it verbatim in Google - I picked this one because it's unique and should be easily found if it's in the index. Now look at the first 100 hits - is your domain (LongIslandsBestHomes.blogspot.com) anywhere to be found? I looked and I couldn't see it. This tells me that however nice your blog is, this particular post is incapable of generating a search referral from Google. This may not be true of all search engines, but it seems to be the case with Google. You can easily test other engines to see if it's a pattern.

How you fix the template in BlogSpot so that it gets more pages in the index is a mystery to me.

11:37am • #6
123,112 Points 26 Featured Posts Outside Blog

Bill,

I don't know about your test.  I tried it half a dozen times with article titles from the Tomato, which apparently is PIP challenged, and all my articles faithfully land the #1 spot in Google.  Wouldn't I too have trouble appearing in the SE's based on titles if I was suffering from low PIP?  Now what?

3:05pm • #7

However, could we differentiate your blog and The Real Estate Tomato as 'thought leader blogs', which you most certainly are? :-) Not everyone can be a 'thought leader' and many don't have the time to try, so maybe the 'rules of engagement' between a Realtor and the average Home Buyer or Seller are different?

If you're a Realtor and a lot of the folks searching on your area for specific information (i.e. Summit NJ 4 bed 4 bath) want simple & immediate gratification, isn't the big picture/best case scenario to be capable of answering as many of their searches as possible?

If so, maybe having the most pages, the most keywords and the most key phrases indexed is the best chance scenario for being there, for generating more traffic and for getting more leads?

chris
3:35pm • #8
8 Featured Posts

Jim:

I think it's possible to find all artifacts that are in the index - whether primary or secondary. I believe that your weblog service is doing fine at getting them into the index, however, what I don't clearly understand is when Google looks to the secondary index for content when making recommendations. I heard it uses content there if there is a distinct lack of good material in the primary index from all pages competing for that query result.

This item is in the secondary index, and it comes up #80-something in Google. That was the first one I tried.

3:41pm • #9
140,919 Points 14 Featured Posts Outside Blog

Mary I wanted to try your tool, but you have gone over your API limit. Maybe you could allow users to use their own API key and then this wouldn't be an issue. Would make for much better link bait if those that end up trying the tool get to actually use it ;) .

Also the text "I found no rows. Either the URL is bad or all of the calls to the Google API have been used for today." is the same color as the background so I only noticed it when I highlighted the area on accident as I am stuck using my laptop.

BTW, the article is good info as always. 

4:45pm • #10
35 Featured Posts

Ken, working on it all... we noticed it too.

 

Which tool were you using when you had to roll over the text to see the API error.  Suppressed or Banned tool?

5:08pm • #11
8 Featured Posts

Ken/Mary -

I was trying to use the suppressed pages tool and I got the same result. I suspect if we just wait until midnight, we can probably run it again.

bf

5:18pm • #12
8 Featured Posts

Jim:

I did a little more looking and found these blog titles to be in the Google secondary index but unavailable as a recommended page even when requesting the exact title such as "Real Estate Tomato: Buying Homes Online - The Video".

While this exact item can be found in the index as a quoted string, there's no recommendation for it even when the complete title is searched for.

"Wouldn't I too have trouble appearing in the SE's based on titles if I was suffering from low PIP?"

Yes, that's what I've noticed on hundreds of blogs and websites - pages in the secondary index are discovery-challenged. The data seems to indicate this is the case for your blog and everyone's blog for that matter. However, I cannot stress enough that these are simply anecdotal instances that seem to suggest there is something about the secondary index that is not as valuable as the primary index. This doesn't rule out the possibility that secondary items will be recommended, but it does indicate that it's harder to get Google to recommend them even when they are asked for by exact title.

5:46pm • #13
136,105 Points 17 Featured Posts Outside Blog

Superb analysis Mary. My real estate site has very few pages showing up in the search results (mainly my homepage), with everything else being categorized as supplemental pages; Ken shared this with me.

But I do get the benefit of all those pages, and my homepage gets pulled for search terms that relate to my supplemental page indexing.

Another key post here :)

11:07pm • #14
NOV 01 2006
35 Featured Posts

There is no "secondary index"

I have not read any credible sources that suggest that a "secondary index" exists.  I do not believe there is such a thing as a "primary index" and a "secondary index" ("primary index" is a database term, and specifically an SQL term, not a search engine indexing term), and I have not found anything in Google's own documentation or their public patent information that shows they exist.  The only primary index that is mentioned is always in the context of database stuff and not related to Google.  If you have seen contradictory information, please show me.

Remember that when we refer to Google and an index, we are not talking about an index on a field or set of fields, as you would expect when you talk about a database; we are talking about a block of pages.  The two are different beasts.  That having been said, there is no secondary index as it relates to Google's representation of web pages.  What really happens is that there is one index, albeit spread over hundreds of datacenters each sporting their own full copies, but still just one index.

In that index there is a value that represents link credibility (LC).  It would have a range of 0..n, but for my comments I will say 0 to 1.  This information comes from Google's patent data and their own developer documentation.  When a site is pure, the content is clean, and the context is proper for the site and page, you get an LC of 1.  Each time you drift by mentioning poison words, including dramatically off-topic information, spamming keywords or other data, or violating the terms of use (for example, altering content for a search engine), you get points deducted from the score, with total failure being represented by a 0.  Somewhere between 0 and 1 you establish a line of trust that signifies a good or acceptable page, and we will say that score is 0.5.  So a page with a score under 0.5 is completely dropped from the index.

Now you have a second line in the sand that means a page is truly on topic, and we will say that score is 0.7.  Because you have one true index, discard trash pages completely, and then have a range of acceptable versus unacceptable, you now have a very simple criterion by which you decide if a page will be shown or not shown.  It is simply too much of a stretch to imply that Google maintains a different list of suppressed pages in its (quote fingers) "secondary index."  I would never design a database like that because it overly complicates a very simple task and its design can be limiting.  So... there is no secondary index, but instead a factor that means don't index, index but suppress, or index and show, all handily represented by one floating point number in a table.
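
To make the single-score idea concrete, here is a minimal Python sketch of the model described in this comment. It is only an illustration of the reasoning, not a description of how Google actually works: the 0.5 and 0.7 cutoffs are the example values used above, the names `DROP_BELOW`, `SHOW_FROM`, and `classify` are hypothetical, and treating a score of exactly 0.7 as "shown" is an assumption.

```python
DROP_BELOW = 0.5  # example "line of trust" from the comment above
SHOW_FROM = 0.7   # example "truly on topic" line from the comment above


def classify(lc: float) -> str:
    """Map one link credibility (LC) score in [0, 1] to one of three outcomes."""
    if not 0.0 <= lc <= 1.0:
        raise ValueError("LC is assumed to lie between 0 and 1")
    if lc < DROP_BELOW:
        return "not indexed"
    if lc < SHOW_FROM:
        return "indexed but suppressed"  # the "yellow band"
    return "indexed and shown"


for score in (0.3, 0.6, 0.9):
    print(score, "->", classify(score))
```

One number and two cutoffs are enough to produce all three behaviors, which is the point of the argument: no separate list of suppressed pages is needed.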

Why long tail searches return what are considered omitted results
(Quote fingers again) PIP and SERP
When you run a short tail search you get the top indexed pages based on trust, context, pagerank, etc. and that is just as you expect.  When your terms exist on a page but it is suppressed then your page would not be returned near the top of the list.  Let's say that you do a long tail search and none of the prime pages really fit; well, now you fall back to the omitted results.  So being in the yellow band (that range of suppressed pages) doesn't preclude you being returned in the results but it does limit you when a person does a prime/generalized search.
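
The fallback behavior described above can be sketched in a few lines of Python. This is a deliberately simplified toy under stated assumptions, not Google's algorithm: the page records, the `lc` scores, the 0.7 cutoff, and the plain substring matching are all hypothetical, and every page in the list is assumed to already be in the index.

```python
SHOW_FROM = 0.7  # example cutoff between suppressed and prime pages


def search(pages, terms):
    """Return prime pages matching all terms; fall back to suppressed matches."""
    wanted = [t.lower() for t in terms]
    matches = [p for p in pages if all(t in p["text"].lower() for t in wanted)]
    prime = [p for p in matches if p["lc"] >= SHOW_FROM]
    chosen = prime if prime else matches  # fall back only when no prime page fits
    return sorted(chosen, key=lambda p: p["lc"], reverse=True)


# Hypothetical indexed pages, for illustration only.
pages = [
    {"url": "/summit-nj-homes", "lc": 0.9, "text": "Summit NJ homes for sale"},
    {"url": "/summit-nj-colonial", "lc": 0.6, "text": "Summit NJ 4 bed 4 bath colonial"},
]

short_tail = search(pages, ["summit", "nj"])
long_tail = search(pages, ["4 bed", "4 bath"])

print([p["url"] for p in short_tail])
print([p["url"] for p in long_tail])
```

In this toy, the suppressed page never appears for the general query but is the only answer for the specific one, which mirrors the short tail versus long tail distinction made above.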

4:43am • #15
1 Featured Post
Mary, do websites with frames have a much harder time being spidered?  My Google rankings have been increasing very fast each month, but many of my interior pages have not been indexed yet.
6:19am • #16
35 Featured Posts

Jennifer,

That is exactly what is happening.  Search engines (for the most part) can't read frames, so they throw out a lot of the information contained in them.  The best way to rectify that situation is to switch to a tableless CSS design.  A developer should charge you no more than $500 to convert the site and then you will be good to go!

6:41am • #17
8 Featured Posts

Mary - thanks for the detailed explanation of the index machinery. I really appreciate the effort.

"When your terms exist on a page but it is suppressed then your page would not be returned near the top of the list."

So, you agree - there is a class of indexed pages [in Google] that are suppressed. A number of [alleged] SEO experts have referred to this as the secondary index. Whether Google intended it to be called "secondary" or something else is irrelevant - your comments suggest the "yellow band" exists and that corroborates my observations as well.

"So being in the yellow band (that range of suppressed pages) doesn't preclude you being returned in the results but it does limit you when a person does a prime/generalized search."

Ok, this all makes sense and seems to confirm what I've been seeing. Having said that, do you agree that seeing a larger percentage of your web pages in the "yellow band" is not as good as seeing a smaller percentage of your web pages in the yellow band?

9:32am • #18
8 Featured Posts

Jennifer -

"The best way to rectify that situation is to switch to a tableless CSS design."

There's not a lot of data to suggest that tables versus CSS-tableless sites have distinct SEO advantages, but there's plenty of debate about the subject. While I agree that CSS provides some architectural advantages in performance and other aspects of maintenance, there aren't a lot of web consultants who know how to do this, and even fewer who understand how to do it well in an SEO-friendly way. If you can find someone who knows how to use CSS for SEO benefit, ask for a few references and then full speed ahead.

10:02am • #19
140,919 Points 14 Featured Posts Outside Blog

I can point you to sites that use tables that are at the top of some very tough terms, but they are done correctly. Most sites that use frames aren't designed correctly and that is the issue with them.

IMO, CSS is the way to go, when I switched from tables to CSS I removed 50% of my file size so the pages load faster and my rankings went up a little (1-3 spots). That might have had nothing to do with the CSS, but it's my opinion that it did.

BTW, Mary it was the suppressed tool that I tried. 

10:17am • #20
35 Featured Posts

The SEO benefit of switching from a site designed using frames which cannot be read/indexed fully by search engines to a site designed using tableless CSS is that the content inside the site can be fully read by a search engine. SEO is the process of making a site more readable and indexable to a search engine so architecture itself is important as well as the many other factors involved.  

Read SEOmoz's Beginner's Guide to SEO; it states that CSS and tableless design with valid HTML is the preferred architecture for SEO. SEOmoz is the industry's premier SEO provider and research firm.  They have a number of articles that cover this topic.  One quote I like is "Valid HTML & CSS - Although arguments exist about the necessity for full validation of HTML and CSS in accordance with W3C guidelines, it is generally agreed that code must meet minimum requirements of functionality and successful display in order to be spidered and cached properly by the search engines."

10:24am • #21
1 Featured Post
Thanks Mary, that's what I thought.  My webhost Advanced Access is switching over to tableless CSS within a month or two.  I will definitely switch it over!
10:50am • #22
8 Featured Posts

Mary -

I agree - CSS is a good way to improve site quality and SEO performance.

"The SEO benefit of switching from a site designed using frames ... to a site designed using tableless CSS is that the content inside the site can be fully read by a search engine."

And I agree that this is an accurate statement and may be a good strategy for Jennifer. However, the way you said it suggests it's the only model Jennifer should consider to repair her site, which is not true. In my view, it's important to call out some alternatives that will also correct her situation - especially given that not every HTML programmer is up on this new CSS model for eliminating tables.

10:58am • #23
35 Featured Posts
CSS was approved 8 years ago as a standard, it is not new.  The other option is straight HTML which will cost the same amount of money and is far less functional and flexible than a design driven by CSS.
12:22pm • #24
8 Featured Posts

Mary -

"CSS was approved 8 years ago as a standard, it is not new."

Indeed, I wasn't suggesting that CSS was a new idea. Rather, don't you agree that the use of pure CSS as a replacement for tables *is* a fairly new concept?

12:31pm • #25
136,105 Points 17 Featured Posts Outside Blog

This is a highly debatable topic :) great insight from everyone!

12:49pm • #26
8 Featured Posts

Toby - yes, lots'o'fun when you get two techno-geeks in the same room. And there's actually two topics here -

  • CSS to replace tables - a great topic and fantastic solution to some age-old problems;
  • Yellow-band index class - a very new and not fully understood aspect of Google.

Mary and I agree on many things - more entertaining debate to come I'm sure.

1:23pm • #27
8 Featured Posts

Mary -

Perhaps you missed this question - I was really hoping to get your opinion on this:

"So being in the yellow band (that range of suppressed pages) doesn't preclude you being returned in the results but it does limit you when a person does a prime/generalized search."

Having said that, do you agree that seeing a smaller percentage of your web pages in the "yellow band" is better than seeing a larger percentage in the yellow band?

2:55pm • #28
NOV 02 2006
599,910 Points 244 Featured Posts Localism Sponsor Outside Blog
Ok, what the hell are y'all talking about? Never mind, don't even try to explain it. This stuff is WAY out of my realm of understanding. Great discussion though.
7:26am • #29
599,049 Points 59 Featured Posts Localism Sponsor Outside Blog Hit Router
it is over my head too... but I am reading it in hopes that some of it will sink in to my brain at some point.  I will have an Uh Huh!  moment
7:33am • #30
8 Featured Posts

Bryant - LOL! Made me chuckle aloud first thing this morning - a great way to start a day. ;-)

As it turns out, I was trying to get my head around these concepts as well, so I did a little more research - maybe it will help.

My simple conclusion is that Google uses some sort of mechanism to determine content strength and content weakness for each page. Pages seem to fall in one or the other category. Those that are stronger have a good chance of being recommended (for relevant queries); those that are weaker don't.

Mary has named the ratio of pages in the two classes as PIP (i.e., pages in primary) - a ratio that is relatively easy to monitor - in fact she developed a tool for it. I believe that the percentage of a web site's pages that stay out of the "supplemental" (i.e., yellow band, or secondary) index is one additional measure of the discoverability of your web content - specifically useful for blogs. Simply stated, the higher the PIP ratio, the more effective your content is at getting recognized. It seems to be an elegant way to monitor indexability - an important metric because without indexed pages, it's darn difficult to generate traffic from search referrals.

9:01am • #31

While indexing and the ratio of indexed and suppressed pages are an important part of discoverability (for lack of a real word) they are not the ultimate answer to search referrals.  There is just no substitute for real and meaningful content.  No amount of indexing in the world is going to help a post that is woefully off-topic or just plain boring.  

I think it is sometimes easy to fall back to metrics and use them to explain shortcomings rather than address the real concern which is compelling content and a sense of purpose to a website.

 

10:20pm • #32
8 Featured Posts

Kevin -

Couldn't agree more. The fact that you agree as to the importance of avoiding anything that suppresses indexed pages is exactly how I feel.

"There is just no substitute for real and meaningful content."

Again - 100% agreement on that. ;-) But for a moment, imagine you have the best content about subject "x" and you publish it in a site whose architecture struggles to keep your very good content out of the yellow band - that's not a good outcome either.

"... easy to fall back to metrics and use them to explain shortcomings rather than address the real concern which is compelling content and a sense of purpose to a website."

If I had to pick, I would definitely pick good content over poor presentation architecture. ;-) However, what if my competitor had mediocre content and a better index footprint - who would be likely to get more search referrals?

10:33pm • #33

To address your last comment.  I have looked at quite a few sites in my day and I have analyzed sites that are poorly indexed and survive on quality content alone.  

If the content is of the same quality, the better crafted site should win but it also comes down to the reputation of the site, backlinks and many other factors that are too numerous to list.

I have one foot stuck in the real world and one in the perfect world.  In the perfect world content would be king because that is what people are really after.  In the real world presentation architectures and marketing can overcome weak content but have you really done anyone a favor?

10:44pm • #34
8 Featured Posts

Kevin -

"In the perfect world content would be king because that is what people are really after..."

Yep - it turns out that search engines want what people want. ;-) If only humans would recognize that - we'd see people doing one thing - just writing good information. Unfortunately, humans tend to think they are smarter than search engines and that they can exert control over them. This was true a few years ago, but that's not necessarily true now. So businesses that recognize strategic behavior (content and brand building) is better than tactical behavior (misspelling keywords intentionally), will have a competitive advantage soon if not now.

"In the real world presentation architectures and marketing can overcome weak content but have you really done anyone a favor?"

Excellent question.

To be clear - I'm just a geek that builds information systems and products. I started in 1981 - first product was LapLink (75 million units in 20 years). I was involved with it for about 8 yrs. And I noticed that while it was the coolest file transfer program on the planet - people did stupid stuff with it - even NASA shot their foot off occasionally with it.

Blogs and km systems are much the same - people use them in good ways and bad ways - the common thread is that a small portion fail to recognize that their own responsibility is as important as the technology itself. Blogsite is 49% of the solution - the customer (and their content) is the real success factor.

I've learned over nearly three decades of building cool software that you have to build hardened, well researched and nearly perfect architectures to meet scalability, security, and marketability requirements. Most systems aren't built for more than 36 months. The MyST platform (the underpinnings of Blogsite) was built to last well beyond 2012. If you slight the customer on any of these points or the longevity of your foundation, your technology won't make it to "the show" - another term for "get acquired".

If you try to force or control how your customers use your technology, you'll quickly realize that you can't earn profits - it simply doesn't scale well. Imagine if Stihl refused to sell chain saws to anyone that used them improperly. What if Toyota didn't allow you to put goats in your 4Runner? These are indeed dumb things - some even dangerous. ;-) But you don't see companies trying to control how people use these tools.

So, you have to trust the market to seek out its own natural level of competence. We have hundreds of Blogsite customers and some of them abuse this elegant piece of technology - but that's their decision, not ours. We provide counsel and tips and even free training, but there's a limit. But just because there are a few people that refuse to do some smart things, we can't dismiss building the right architecture to achieve maximum and optimal penetration.

Have we done anyone a favor? Sure - Intel, GFT, Real Estate Blogsites, PowerPR, VeriSign, UpData, AOL (soon, The Stones, Sting, and The Who) - they all benefit from deeper indexing and without thinking or worrying (or even knowing about it) - their content avoids the supplemental index because we continuously engage in debate (like this) to better test our knowledge and fortify it with new understanding.

11:18pm • #35
NOV 03 2006
123,112 Points 26 Featured Posts Outside Blog

Kevin-

That's what we're all looking to hear.   No blackhat... just good, relevant and unique content.

12:47am • #36
When I said "you" I meant it in a hypothetical sense and not actually directed at you as a person or company.  What I am addressing is the concept of artificially elevating content not on the merit of the content but on indexing, tricks and misdirection.
6:30am • #37
8 Featured Posts

Kevin -

"When I said "you" I meant it in a hypothetical sense and not actually directed at you as a person or company."

And I took it that way - I just wanted to respond in a less hypothetical way. ;-) Plus, I was trying to avoid doing some real work last night.

"What I am addressing is the concept of artificially elevating content not on the merit of the content but on indexing, tricks and misdirection."

You and I could not be more on the same page. I work in five different vertical segments with Blogsite, and I'm astonished at some of the things people believe are good practice in terms of SEO activity. Using 20 domains to trick search crawlers into thinking your web site is at the center of the universe is just silly thinking. When I see folks spending hours every week to think up words to intentionally misspell, I find it humorous because it's a tactic that will work until the instant one of three things happens -

  1. Search crawlers begin dinging site owners for poor spelling quality;
  2. Web browsers begin to auto-spell correct what people type;
  3. Search engine UI's begin to auto-spell correct query forms.

In all of the cases of bad behavior, there seems to be a common thread - their efforts are tactical, not strategic. I get the sense you prefer strategic, sustainable SEO. I do as well.

We can't blame anyone for their nefarious attempts to create better visibility in the past. All of us were roped into this behavior to some degree - after all, search crawlers weren't very smart and competitors are sometimes ruthless. However, crawlers are not as dumb as they used to be, and smart companies have begun to recognize this shift. They also recognize that content consumers are getting smarter about discrete search techniques, and search engines are learning ways to deliver better relevance.

Soon (I hope sooner than later) it will be embarrassing to admit that you have 6 domains, participate in a link farm, a web ring, and misspell words intentionally to trick crawlers into thinking you are something you aren't. That day is coming and smart companies are preparing for it by creating good, clean information. The most competitive businesses will follow this strategy because it's sustainable. Any other approach might produce short term results, but they will forever be retuning their tactics, a costly way to work.

8:42am • #38
8 Featured Posts

Kevin -

"... not on the merit of the content but on indexing, ..."

I singled this thought out of *my* previous rant because I sense that you think getting content indexed is without merit. Did I misread your comment?

8:46am • #39

It is possible for something to be important and yet less important than something else.  I feel that indexing is a necessary evil and it takes a backseat to content. I want to spend my day reading content and spend very little time wondering how to find it.

I am not sure that I like the word "rant."  One definition of rant is "to talk in a noisy, excited, or declamatory manner" and another is "to scold vehemently."  Neither term applies to the way that I expressed my opinion.

 

9:04am • #40
8 Featured Posts

Kevin -

Sorry - poorly constructed sentence -- I meant *my* rant. I singled this thought out so that *my* rant wouldn't overshadow your point.

bf

10:05am • #41
8 Featured Posts

Kevin - so, back to your comment...

"I feel that indexing is a necessary evil and it takes a backseat to content."

That would be ideal, but if your content isn't indexed, no one will see it, so in the current Internet architectural climate, the two are equally important I think.

To be clear, my analysis of the supplemental index is an attempt to understand what's important (at a technical level) - more pages in the prime index or less? I think we all agree comprehensive (primary) indexing of as many pages as possible has merit because content is more useful if it is more discoverable. In fact, content has zero value until it is findable. How valuable it may be after it is found (and by whom) is contextual.

"I want to spend my day reading content and spend very little time wondering how to find it."

Indeed - this post by my co-founder [of MyST] is relevant.

"I've long believed that the ultimate search technology is one you don't explicitly use.  Imagine turning the search paradigm on its head—instead of us finding stuff, why not stuff finding us?  Pushing this idea to the extreme, our applications would understand what we are working on and automatically provide us with exactly the information that we need in every specific context.  In that scenario, we would never need to search for stuff because the right stuff would find us." - F. Andy Seidl, MyST Technology Partners

Andy published those words in 2004, but we wrote papers and code on this subject in 2001 - that code is presently in use at Borland and embedded in StarTeam products.

10:17am • #42

Bill,

In a perfect world the producer of content and consumer of content would never have to concern themselves with SEO and its effect on content, would not have to phrase their searches specifically to get the results they desire and they wouldn't have to spend so much time on the propagation of data.

A search engine should be a tool.  It really should be no more complex than a screwdriver.  Generic, easy to understand and grasp and performs a myriad of tasks without having to worry about getting a rubberized grip, three sided handle and hardened tip.  It should do the job and get out of the way so you can do what you want which is read something. 

11:11am • #43

Since I know that I will be asked, here is my brief assessment of search engines.

 Yahoo is an example of a good search engine that has gone bad.  It has too many options, loads slowly and its results are returned slowly as well.  There are far too many categories on Yahoo that have nothing to do with their original purpose which is search and directory functions.

MSN Live is OK.  Not great but certainly not bad.

Google is quite good.  Easy interface and typically returns decent matches for your search but their perversion of search data that is used to affect the final results is bad.  Their results were more relevant a few years ago than they are now.

And the site that I love to hate, Technorati.  Awful search engine, bad indexing and their site in general is flaky.  I have been there many times and had it say that a site has x pages linked from n blogs but a click of the link returns nothing.  It is either growing pains or bad design.  Hint - another indication of bad design is searching for a domain that is hyphenated.  Try it, you'll hate it too.

Kevin

11:31am • #44
8 Featured Posts

Kevin -

"In a perfect world the producer of content and consumer of content would never have to concern themselves with SEO and its affect on content, would not have to phrase their searches specifically to get the results they desire and they wouldn't have to spend so much time on the propagation of data."

I agree with this - it's the fundamental value proposition for Blogsite. We advise our business blogging clients to just write good content. There's a segment of the business blogging market that has no desire to learn the machinery of blogging, such as trackbacks, pinging, proprietary template languages or try to understand how web services and proper HTML structures obtain greater visibility. They simply want to benefit from blogging.

We also advise them to invest a little in learning *how* to write good content and to be good participants in the conversational web. This is typically not a challenge for our training staff because companies that are likely to care about such things as quality assurance reporting, security, and other differentiators that are not typically found in personal blogs, are also good marketers to begin with and have staff with skills that know what they want to say. Of course, they too need to understand how blogging differs from polished marketing prose and there are plenty of experts and blog coaches out there to assist.

There are other segments of business bloggers that want to do everything themselves, so your definition of "perfect world" may be in conflict with some folks. We respect those viewpoints because at the end of the day, they know what's best for their own business and technical strategy. Besides, some folks *enjoy* building their own online systems just as some people enjoy changing their own oil. ;-)

"A search engine should ... do the job and get out of the way so you can do what you want which is read something."

Um, yea - no argument there. It's why we offer direct integration with Microsoft Office Research Services. Some clients want search to blend into Microsoft Office tools. As such, we've developed a .NET integration that allows your MyST-based content to magically infuse keywords (and topic tags) as Microsoft Office Smart Tags in all Office apps. This is rarely of interest with advertorial blogsite customers - more popular with customers that are blogging inward for KM purposes.

Great discussion BTW.

bf

11:50am • #45
8 Featured Posts

Kevin -

"And the site that I love to hate, Technorati."

Just as I thought - you and I think much alike. Technorati is on my short list of domains that I prefer to avoid. Unfortunately (in the blogging business) they have mind-share and money and our clients see a fair bit of referrals from their domain. However, there are more evil things we should be aware of regarding this site -- the dangers of using Technorati Tags.

"[Google] results were more relevant a few years ago than they are now."

That's true for many reasons - the biggest is the size of the index. There are scaling limitations that come into play when you have to recommend just ten pages from 30 million pages about the same subject. When you think about it this way, you quickly realize that Google has a big responsibility not to mention a huge technical challenge.

12:00pm • #46

About Google

True.  But when they had 2 billion pages their search results were better and 2 billion is quite a few pages.  One of their shortcomings is they dampen new sites too much and give free rein to sites at the other end of the spectrum.  The disparity created by this system can be a source of frustration to users that can't find what they need even when they know what it is and instead are greeted with the same old, same old.  It always is disheartening to people that write solid content and are arbitrarily confined by the "Trust Box" mechanism and share space with content that should be there.

About Technorati 

The problems in Technorati run so deeply into the heart of the system that there is likely no repair that can be made short of replacement.  All of the aggregation systems suffer from the same flaws so I rarely have interest in Digg, del.icio.us and other sites of the same ilk because they never do exactly what I need.  It may be that my requirements are too specific to be rolled into a social system.

The nice part about ActiveRain is that it doesn't regurgitate content but instead promotes the creation of new and unique content.  Hats off to them.

 

1:12pm • #47
8 Featured Posts

Kevin -

"One of [Google's] shortcomings is they dampen new sites too much and give free reign to sites at the other end of the spectrum."

Well, some folks think that's a good heuristic of course. If you were going to recommend the best Chinese establishment to your best friend, you might be inclined to go with a place that's been around for a while -- one that you have great trust in. Google believes online recommendations should mimic life in similar ways. I suspect they also have the data to prove that domains with long histories are more likely to produce good content. The nefarious use of domains by black-hat SEO'ists has created this problem, so we can't expect Google to tolerate it.

1:58pm • #48
NOV 04 2006

"So being in the yellow band (that range of suppressed pages) doesn't preclude you being returned in the results but it does limit you when a person does a prime/generalized search."

How about explaining or giving an example of someone doing a prime/generalized search versus one that isn't a prime/generalized search?  Some may not understand what you mean by that type of search.  Some would say a short tail search versus long tail search.  Most just search without thinking what type of search they are making.

Jacob Reynolds
1:42pm • #49

A prime/generalized search is a search that has only a few terms and is generic but a sub-prime search would involve more terms and very specific terms.  You can consider them to be similar in concept to short-tail and long-tail searches. 

The thing that really differentiates long-tail and sub-prime searches is that a sub-prime search has to have extremely specific terms that must be present to return a result and these sub-prime results tend (but not always) to come from the "yellow band."

An example of a sub-prime search is literally taking text right out of an article and doing a search for it and you may go so far as even putting it into quotes so you get exactly that result. 

For example:

searching Google for this term "Myth: High percentage of pages indexed by Google." 

It is sub-prime because you have to know exactly what you are looking for or you won't find it.

A long tail search on the other hand would be just adding terms to a search to narrow down the results.  It would be like doing a search for "real estate in Galveston" and then doing a search like "real estate in Galveston with gulf view."  The second query merely refined the search.

Having failed to get a good result from a short-tail or long-tail search, a search engine would fall back and try a sub-prime search.  The search engine has now just scraped the bottom of the barrel to find anything that would resemble a match.
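As a rough illustration of the distinction drawn in this comment, here is a minimal Python sketch. The function names and the sample page text are hypothetical, and "sub-prime" is this commenter's own term, not one that Google uses. A long-tail query only requires that every term appear somewhere on the page, while a quoted (sub-prime) query requires the exact phrase.

```python
def matches_all_terms(query, text):
    """Long-tail style: every query term must appear somewhere in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query.lower().split())


def matches_exact_phrase(phrase, text):
    """Quoted, 'sub-prime' style: the phrase must appear verbatim."""
    return phrase.lower() in text.lower()


# Hypothetical page text, for illustration only.
page = "Real estate in Galveston with gulf view and beach access"

print(matches_all_terms("estate Galveston", page))     # terms present, any order
print(matches_exact_phrase("estate Galveston", page))  # exact phrase is absent
```

The second call fails even though both words are on the page, which is why a quoted search demands that you already know the wording you are looking for.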

3:32pm • #50
8 Featured Posts

Jacob -

I don't like to put words in anyone's mouth, but if I were allowed the latitude of rephrasing Mary's comment that you quoted, I would say this:

"So being in the supplemental index (that range of suppressed pages) doesn't preclude you being returned in the results but it does limit you when a person searches for a subject that can be easily satisfied with content from a competitors site without going to pages that are in the supplemental index."

As you very correctly point out, there are not two types of searches - there are simply searches for subject matter. If you happen to have a site with lots of good content about subject matter "x" when a search query asks for subject matter "x", Google will advance recommendations to domains about subject matter "x" from its pool of pages in the primary index. If it finds few or no pages about "x" from any content provider, it will call upon the supplemental index to satisfy that request.

This is the precise point I was trying to make - a high percentage of content in the supplemental index puts you at a disadvantage irrespective of whether that query was very discrete (i.e., unique and long tail'ish) or very ambiguous (i.e., popular and short tail'ish).

Having said that [Kevin] - your comments are misleading. Suggesting that search phrases indicate whether the results are likely to come from either primary or supplemental search indices is patently false. You can easily find pages in the primary index with a "sub-prime" search, and you can also find pages in the supplemental index with a "prime" search; each outcome is possible given a specific search context and the topology of content available about that context.

The reasons that pages happen to end up in the supplemental index have nothing to do with the way Google determines what to recommend given any specific query. In fact, pages that are *in* the supplemental are placed there before any searches are conducted. According to [some] SEO experts on this subject, pages typically get sidelined in supplemental for architectural reasons - it's spelled out here.

8:03pm • #51

If you read my comments, you will see that I did not say that search phrases indicate positions or existence in the primary or "secondary" indexes.  The examples given were simplified to demonstrate the difference between sub-prime and long tail and not imply a direct link between terms and SERP. 

If you want to discuss the parts that I was addressing, I am happy to but do not put words in my mouth or put a spin on my message.  The entire comment was made to define the differences in search terminology.  

If you would like to discuss this one on one, please feel free but I will let you know right now that I am currently consulting with a startup that is redefining search engine technology and this is my day job.  And BTW - This is not my first exposure working for search engine companies.

8:24pm • #52
8 Featured Posts

Kevin -

I'm sorry you thought I was putting words in your mouth - not my intention. These are your words:

"a sub-prime search has to have extremely specific terms that must be present to return a result"

This is untrue for the reasons cited in my comment. Furthermore, *every* search must contain extremely specific terms that actually appear in the content for it to be returned in any result page.

This is why I said your comments were misleading - you can get both prime and supplemental pages to appear in results using literal text, quoted text, and more terse phrases with just a few keywords. The fact that it's a more detailed query or a more ambiguous query is irrelevant when measuring the outcome - the determining factor is the context of the query in relation to the available information to satisfy that request. Your comments may have been accurate as far as you envisioned, but they were misleading - no big deal.

The behavior of results from prime to supplemental is what I've observed - you were just saying something that didn't reflect my own reality. ;-) And to be clear - the only reason I'm ranting on in this particular thread was to get some guidance and possible consensus from experts on this very basic question that no one seems to want to answer:

Is it better to have a larger percentage of your website pages in the primary index? ;-)

BTW - there's no need to react defensively - this is just a conversation. If you see something stupid or misleading that I've said, tell me I'm wrong - that's what makes a good learning experience for everyone. We (as techno-geeks) also have a responsibility to the members of this forum - we need to be explicit about our ideas and continually provide clarity to our comments, else we may be misinterpreted.

And best of luck on the new startup - I think search is ready to be redefined. I also enjoyed your comments on the trouble with existing search solutions - they all need to have a little rethinking.

bf

9:00pm • #53

I was giving a loose definition to the four terms and brief examples of their use.  The rest of this comment is meant to address the yellow and green band phenomena that we see daily.  The most important part to realize is that suppressed pages, by their very nature, have been determined (whether right or wrong) to be irrelevant for any number of reasons ranging from indexing mistakes, to poorly constructed pages and will likely be excluded from prime searches and may be exposed more fully from a sub-prime query.

BTW - If you watch Google closely you will see that it will return pages that do not match the search terms.  This isn't caused by keyword spamming or other techniques but instead by Google's ability to "guess" at the intent of the search by using historical data that has been cultivated to improve searches.  So it is possible to have a page that is missing major search terms and yet still show up in the index.

You need to realize that there are no absolutes in SERP because each result returned is keyed to the relevance of the search terms, trust in the index and other factors, so a prime search can yield pages in the previously named "yellow band" but it occurs because the value of the page is combined with many other factors that eventually produce the result set being returned.

All things being equal, a prime search will prefer to return rows from the primary result set but it can be outranked by other factors.  That having been said, a sub-prime search can be seen to return rows from the primary result set as well.  The entire process is much like a man walking a tight rope.  There are always adjustments being made in an effort to find the happy medium.

Either way a sub-prime query, while it may return results in and out of the primary index, is undesirable because to be effective it demands a level of specificity that a typical user will not know or even want to know.

One of my favorite examples of sub-prime is trying to find song lyrics.  Try to find the David Bowie song "Changes" without resorting to sub-prime methods - given that you are searching by lyrics because you don't know the artist.  You can do it but you will see that it will be buried amongst many other pages with similar terms.  The words "changes, turn and face the stranger" would appear on millions of pages and odds are 90% are completely irrelevant and I don't know about you but I would not be interested in evaluating 900,000 rows to find that needle in the vinyl haystack.  Turn that search into a classic sub-prime and it will be much more accurate in the returned rows but it demands that I have more data at hand than may be feasible.  This example of sub-prime is also for illustration and the rows returned would likely be prime but we would need to use sub-prime methods to locate them.

So in a roundabout fashion I will answer this question:

Is it better to have a larger percentage of your website pages in the primary index?

The answer... Drum roll please....

If you had identical content and setting aside trust, PageRank and other factors - the site with a higher PIP will usually win because the pages in question are likely part of the green band and considered more desirable. 

But this sets aside the premise (and my belief) that content itself will have the last word in search results because a highly optimized site about the existence of gnats in the canals on Mars and their desire to eat really good Chinese food will probably not get much traffic.
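For reference, the PIP ratio being compared here is the one defined in the original post: pages displayed in the results divided by all pages Google reports as indexed. A minimal Python sketch, using the post's own 313-of-342 figures (the function name is just for illustration):

```python
def pip(displayed, indexed):
    """Percentage of Indexed Pages as a fraction: displayed / indexed."""
    if indexed <= 0:
        raise ValueError("indexed must be positive")
    if not 0 <= displayed <= indexed:
        raise ValueError("displayed must be between 0 and indexed")
    return displayed / indexed


# Figures from the original post: 313 displayed out of 342 indexed.
print(f"{pip(313, 342):.1%}")  # prints 91.5%
```

As the comment notes, the ratio says nothing about whether the content itself is worth finding.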

9:49pm • #54

" There is no "secondary index" "

Ok so we have one saying there is no secondary or supplemental index.

 "So being in the supplemental index"

And then we have another school of thought that says there is one.

I have greatly enjoyed the reading thus far, but would love to find a definitive answer.  From what I have searched there seems to be a supplemental index or what  one would call supplemental search results.

There seem to be several credible sources talking on this, but how credible?  I can't say as I don't know all the players, but I would love to get to the bottom of it. 

Jacob Reynolds
9:58pm • #55

This is a separate comment because it is off topic but I have seen this asked somewhere else and it applies to PIP.

CSS and/or HTML faults can cause pages to be suppressed.  I want to put emphasis on this because it is real and is currently causing people great discomfort as they try to diagnose a visually appealing site that is suffering in Google's indexes.

In an effort to nip search and indexing problems for our new engine in the bud we have done some preliminary analysis of sites that are performing poorly in various search engines.  One site that we have reviewed belongs to a blogger here on ActiveRain and they have a classic PIP problem that can be traced directly to a decision made regarding formatting and use of HTML.  In their case they have a header image nested in a div element and an adjacent div that has text in it.  This text has been suppressed by CSS into visual non-existence but from a strict spider standpoint it actually still has text and is beside a graphical device which has no SEO value.  The result is that Google is reading this people-invisible text as an introductory paragraph for the page and because their blog is templated that same text appears on all the pages. Net result?  All but the home page are suppressed and they are challenged when it comes to searching via Google.  BTW - To the site involved, I am going to send some recommendations to you to clear this up. 

This is an example of bad HTML and not bad CSS. 
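One way to spot this kind of people-invisible text is to scan a page for elements hidden by inline styles. The sketch below is only an assumption about how such a check might look: it uses Python's standard `html.parser`, the hint list and function names are made up for illustration, and it only sees inline `style` attributes. Text hidden through an external stylesheet, as may have been the case on the site described, would need a real CSS parser.

```python
from html.parser import HTMLParser

# Inline-style fragments that commonly hide text (illustrative, not exhaustive).
HIDING_HINTS = ("display:none", "visibility:hidden", "text-indent:-")

# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link", "area",
             "base", "col", "embed", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []        # one flag per open element: hidden or not
        self.hidden_text = []  # text found inside hidden elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        hidden_here = any(hint in style for hint in HIDING_HINTS)
        inherited = self.stack[-1] if self.stack else False
        self.stack.append(hidden_here or inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1]:
            self.hidden_text.append(text)


def find_hidden_text(html):
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


# Hypothetical template: header image beside a div whose text is pushed off-screen.
page = ('<div><img src="header.png"></div>'
        '<div style="text-indent: -9999px">Welcome to my blog</div>'
        '<p>Today\'s post</p>')
print(find_hidden_text(page))
```

If the same hidden string turns up on every templated page, that is a hint the crawler may be reading it as a repeated introductory paragraph.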

 

10:06pm • #56

Jacob

There is no supplemental index but there are supplemental search results.  Pages are given a weight or trust based on many factors and I know that I have seen Mary link to a dissection of patent filings by Google that discuss page freshness, site/domain freshness, contextual factors and trust to name a few that are rolled together into a value for a given page.  If a page falls below all acceptable levels, it will be discarded and not just suppressed.  If a page is in the mid-range or yellow band it will be suppressed and the last area is highly relevant and trusted values that are considered part of the primary search result.
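The three bands described above can be sketched as a simple threshold check. Everything in this Python example is hypothetical: the score scale, the cut-off values and the function name are invented for illustration, and nothing here reflects how Google actually weighs or classifies pages.

```python
def classify_page(score, discard_below=0.2, primary_from=0.7):
    """Map a combined page score in [0, 1] to one of three bands.

    The thresholds are made-up placeholders, not known values.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < discard_below:
        return "discarded"    # below all acceptable levels
    if score < primary_from:
        return "suppressed"   # the mid-range "yellow band"
    return "primary"          # highly relevant and trusted


for s in (0.1, 0.5, 0.9):
    print(s, classify_page(s))
```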

What I have just said is based on previous search engine experience in internet/intranet implementations and also current work/research that I am doing.  Because of IP and patent issues there is some information that I am not allowed to go into but I will say that the new search engine will enter a public beta in January.  

I hate being so cryptic but non-disclosure agreements are a bear.

 

10:18pm • #57
8 Featured Posts

Kevin -

Excellent stuff - so many things to learn, so little time. I'm slammed by some production schedules this weekend but I promise to dig deeper and ask more questions.

"... because a highly optimized site about the existence of gnats in the canals on Mars and their desire to eat really good Chinese food will probably not get much traffic."

Humorous and educational. I love it. ;-) It's true, this page will not get much traffic, but in the rare instance when one (and likely only one) person ever asks for it, do you want it to be your domain (and brand) that's seen, or your competitor's? This is the basis of my conclusion that content dominance is possible if you take the position that the long tail is likely to give companies that ever so slight edge that is sometimes the difference between capturing one additional customer while denying the competition that same customer.

"So it is possible to have a page that is missing major search terms and yet still show up in the index."

This, I did not know or even conceive was possible. Can you share a few examples?

"Either way a sub-prime query, while it may return results in and out of the primary index, is undesirable because to be effective it demands a level of specificity that a typical user will not know or even want to know."

This is completely predictable and very logical - and it's difficult to argue with intuitive conclusions. I once believed the same until I started to look closely at the tail of referrers. I discovered an increased propensity of highly complex queries - many from machines - mostly RSS requests. How these machine-based queries are being formed and with what tools, I'm not entirely certain, but it appears to be happening with greater frequency. If my observations are true and seen by other observers, would it not make sense to attempt to gain a competitive advantage by ensuring that your content is seen by these "beasts"? I see your point, but if it's inexpensive to be discovered in these other very discrete results, why not increase your brand awareness?

"Try to find the David Bowie song "Changes" without resorting to sub-prime methods ..."

As Jacob points out - people don't think in terms of prime and sub-prime queries - I still don't quite see that queries fall into two categories - although there are certainly degrees of ambiguity. In fact, all queries have a degree of ambiguity - some are very ambiguous, others are less ambiguous, but no query is absolute - at least not in the mind of the search user.

I want to see a Broadway play while I'm in NYC - I search for "plays". But I get ESPN's Baseball Play of the Day. My search was ambiguous, so I add "broadway" and I have less ambiguous results. I want a cheap matinee - a comedy - so I add many more terms until I stop searching. At what point does this process cross the line between prime and sub-prime? In my view, it never crosses the line because there is no such line.

"green band ... considered more desirable."

I agree - thanks for putting a stake in the ground. Is it [one] good metric? I'm not convinced but I'm tilting that way because it seems to be better than the alternative (i.e., a high percentage of yellow-band pages).

"CSS and/or HTML faults can cause pages to be suppressed.  I want to put emphasis on this because it is real and is currently causing people great discomfort as they try to diagnose a visually appealing site that is suffering in Google's indexes."

I agree - this is a very interesting subject because it's really at the heart of the next discussion - how to avoid the supplemental index. It's also one area that we haven't paid much attention to in our own presentation layer. While our XSL models are not optimized for CSS and HTML accuracy, we seem to have had good luck avoiding the supplemental index. But that doesn't mean we know what's best.

10:43pm • #58

Kevin,

I see the term "yellow band" used several times.  I assume this is a made up term as I have never seen Google use this term or any other SEO specialist... at least not until now.  Am I correct that this is a term coined here in this post?

"There is no supplemental index but there are supplemental search results."

I'll be honest... it is hard to know who is correct in the discussion of supplemental index, search, or whatever you want to call it.  

Here is a site that seems to have a slight bit of specific knowledge on the subject, but it all goes back to who is credible.  This article I believe is from 2003, but it may still be relevant.  Thoughts?

http://searchenginewatch.com/showPage.html?page=3071371

Jacob Reynolds
10:53pm • #59

Oops... meant to hyperlink. 

http://searchenginewatch.com/showPage.html?page=3071371

Jacob Reynolds
11:03pm • #60
8 Featured Posts

Kevin -

"There is no supplemental index but there are supplemental search results."

That's evasive. Forgive me for drilling on this, but are you saying there's no such physical supplemental index, but you are willing to admit there's a logical supplemental index from which supplemental results are derived? If so, you're on a semantic slippery slope.

Jacob -

Kevin may be technically accurate in the words he chose, but the reality is that there's evidence to suggest there's an apparent treatment in Google where pages are discriminated against for reasons that are not (and may never be) totally understood. I concluded this long ago when I noticed this phrase at the end of almost every Google search result -

In order to show you the most relevant results, we have omitted some entries very similar to the 6 already displayed.
If you like, you can
repeat the search with the omitted results included.

Omitted results? Why would Google omit results even when there are only six results displayed? Logically, this tells me there's a supplemental index, or at least a set of attributes that comprise results of a supplemental nature. ;-)

11:03pm • #61

I have to share a secret... There are times I may pop into a forum or a blog and play like I know nothing to see if I can help draw out the "accurate truth" in a discussion.  I have found it better to not always debate, but at times to simply question things and shake the tree a bit and see where the *experts* take us.

I am fully aware of the whole supplemental results effect, but the one answer that has eluded me is whether there is a physical "supplemental index" that Google has created as suggested by some and referenced here.  Perhaps I'll never get that answer.

While searching on this exact subject I ran across this site and it has been enjoyable reading.  I will have to come back every now and then and see how things progress.

Thanks to everyone contributing thus far. 

Jacob Reynolds
11:14pm • #63
136,105 Points 17 Featured Posts Outside Blog

Jacob,

"There are times I may pop into a forum or a blog and play like I know nothing"

My question is: why? That seems like a self defeating purpose or a method that has hidden means.

11:17pm • #64

Toby, I couldn't get that link to work... it gave an "unexpected error".

"shake the tree a bit and see where the *experts* take us"

 In reference to my comment above, when I say experts I consider those providing detailed commenting here as experts or close to it so please don't take it as sarcastic.  I realized it may sound that way after posting it, so I wanted to clarify.

Jacob Reynolds
11:20pm • #65
136,105 Points 17 Featured Posts Outside Blog

Jacob, the link was set to "members only" which is why there was an error, my bad.

I don't take your remarks as sarcastic but this blog is a debate and I'm trying to provide info as needed.

Supplemental results are a fact in my book and here is a link that categorizes my own pages as supplemental, http://www.google.com/search?hl=en&lr=&rls=GGGL%2CGGGL%3A2006-33%2CGGGL%3Aen&q=site%3Awww.barnettassociates.net&btnG=Search

11:24pm • #66
8 Featured Posts

Toby - Thanks for this - I know I've seen that word before, but now that you point it out, all I have to say is DUH... ;-) It was hiding in plain sight all along. Like I keep saying - I have lots to learn. I should start by opening my eyes. ha ha

Jacob - I sensed that you were doing that - I'll bet you like to play poker too. I like your style - in my view there's nothing wrong with baiting people with good questions as you attempt to probe the universe for the truth. Good defense attorneys do this with great respect for the truth and [sometimes] great benefit to society. Hmmm - that's the first good feeling I've had for lawyers in decades.

11:29pm • #67

Toby,

Why?  As far as "play like I know nothing". 

I should have probably worded it... Many times I pop into a forum or a blog and simply ask questions to help draw out the truth.  I guess I don't really "play like I know nothing" I simply may ask questions instead of trying to persuade individuals to see things my way.  Mainly because my way may not be correct hence asking questions that spur further debate from those that seem to be more educated on the matter.

I appreciate you questioning me on this as I made it sound more "secretive / hidden" when really it's just a choice as to whether I decide to be more inquisitive or more authoritative as a way to bring out more discussion.

Jacob Reynolds
11:31pm • #68
8 Featured Posts

Jacob -

"whether there is a physical "supplemental index"

Does it matter? A physical implementation seems to be irrelevant - after all, it's just an implementation detail. Whether it's virtual, logical, or physical - it seems to impact content providers.

11:34pm • #69

"Does it matter?"

 From the standpoint of how it affects my sites or anyone else's. No.

 From the standpoint of the constant student and desire to know the truth.  Absolutely. ;)

Jacob Reynolds
11:38pm • #70
136,105 Points 17 Featured Posts Outside Blog

Now I am getting confounded about which side you both are on, Jacob and Bill.

As for how it affects a website (ours or another's), I say the information is highly important. If pages are being classified in a secondary index then they are not achieving the most value for the most competitive search terms for our sites.

Jacob, do you operate a site that we can use as a reference to gauge you?

11:52pm • #71
Nov 05, 2006

"Now I am getting confounded about which side you both are on, Jacob and Bill."

Please don't let me confuse things... I was just responding to Bill's question as to whether it mattered whether there was a "physical" supplemental index or a "logical" implementation of it.

And whether it is physical or not doesn't really matter... the fact is Google does return supplemental results whether it be from a physical separate supplemental index or calculated from just one index.

I was just saying that from the standpoint of physical or logical it doesn't matter, because either way the end result is supplemental results.

But from someone that always strives to learn and grow I always like to know how things are implemented... even if the end results are the same. 

Jacob Reynolds
12:02am • #72
8 Featured Posts

Toby -

"If pages are being classified in a secondary index then they are not achieving the most value for the most competitive search terms for our sites."

I agree completely. While the measure of index penetration (i.e., number of pages in the index) is a good one, a high percentage of pages in the primary index is yet another measure of [positive] online discoverability. With all that has been contributed here, there doesn't seem to be any doubt that this is important - how important is still a matter of debate. But there's no debate - this is the most [findable] and comprehensive discussion on the web about this subject. ;-)

I've been very vocal in this thread because Mary's original assertion advised folks to not be fooled by pretty graphics that describe the relationship between primary and supplemental index penetration. As it happens, one of our vertical market resellers (for real estate) actually publishes pretty graphics that document how well our platform does at avoiding the supplemental index - they provide this data as a quality assurance report for customers and occasionally use them as sales tools for prospects.

I am clearly biased about this subject and wanted to defend the use of pretty graphics when they contain the truth. ;-) I'm also biased towards the truth and Mary's assertions reflected a version of reality that didn't seem to reflect the evidence I was seeing. This blog post did what good blog posts do - it brought out a deeper understanding and I learned a lot from you (Toby), Kevin, Mary, Jim, etc...

Matt should transform this into an eBook. ;-)

12:07am • #73
136,105 Points 17 Featured Posts Outside Blog

"does at avoiding the primary index"

Now I think I am completely confused... How can avoiding (not being categorized in) the primary index be a benefit to a website? I know the goal is to achieve maximum exposure while providing quality content, but the idea of being in a secondary index as a positive is downright confusing.

(I'm not sure I completely understand the usage of the term "pretty graphics")

E-Book, I'd read it, a bazillion times ;)

12:14am • #74
8 Featured Posts
Ooops - good catch - I've been at this silicon-based device way too long. I fixed the comment. Sorry to drive you to the brink of crazy.
12:25am • #75
136,105 Points 17 Featured Posts Outside Blog

No worries Bill (crazy is as crazy does)   :P

but I would still love to know Jacob's frame of reference, either his website or another's.

No offense Jacob, it just irks me when people don't leave a trackback during a debate.

12:31am • #76
8 Featured Posts

Toby - 

"pretty graphics"

I used the term pretty graphics, well - because they are I guess. But to be accurate, Mary said:

"Conclusion: Do not be fooled by dressed up metrics of PIP."

Imagine a pie chart that represents total pages indexed, and the percentage of pages in the primary vs supplemental categories. I sent one to Jim over at Real Estate Tomato to get his take on supplemental and I encouraged him to share it to get more feedback from experts. I think he shared it with Mary, hence the origin of this post. I'd publish one here but with due respect to Mary, I'd rather keep vendor-specific content out of Mary's blog.
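
The arithmetic behind a chart like that is simple enough to sketch. This is only an illustration in Python, not the reseller's actual report: the function name `index_split` is made up, and the sample counts (313 and 29) are borrowed from the post's own example purely as input, assuming you already have a count of primary and supplemental pages from a site: query.

```python
def index_split(primary_pages: int, supplemental_pages: int) -> dict:
    """Return each category's share of all indexed pages, in percent."""
    total = primary_pages + supplemental_pages
    if total == 0:
        raise ValueError("no indexed pages to compare")
    return {
        "total": total,
        "primary_pct": round(100 * primary_pages / total, 1),
        "supplemental_pct": round(100 * supplemental_pages / total, 1),
    }


# Sample counts for illustration only (hypothetical input, not measured data)
split = index_split(313, 29)
print(split)
```

The two percentages are the slices of the pie; the higher the primary slice, the better the site is doing at staying out of the supplemental results.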

12:35am • #77

"When the legend becomes fact, print the legend." -Maxwell Scott

There is a seemingly infinite number of theories about search engines and the way they behave, which takes me back to an earlier comment where I said that search engines ought to be nearly invisible.

6:46am • #78

Since I mentioned that Google returns results with missing search terms I thought I would include a link where someone else describes it.

http://blog.pietrosperoni.it/2005/01/12/google-uses-synonymes-when-searching/ 

 

It seems these comments were tailor-made for Mary's blog war post. 

7:04am • #79
8 Featured Posts

Kevin -

"...where I said that search engines ought to be nearly invisible."

I'm hoping for that day soon too. But here at MyST we believe it's now possible to create that experience without search engines changing anything. Altering the user experience so they feel like search is magical and invisible can be achieved through existing APIs. It requires a level of arbitration between specific applications and the engines - not the ideal (or most scalable) approach, but for specific verticals like word processing applications, or real estate blogs, it seems to be possible.

Smart Tags in Microsoft Office is a pretty good step in the right direction and integrating that capability with Office Research Services begins to blend search into the background where it ought to be. Still, these are baby steps - I'd like to see some bigger ones.

9:37am • #80
8 Featured Posts

Kevin -

Thanks for calling this idea to my attention. It's logical and smart just as a spell checker in the query field would be smart - they're kind'a there already with the correct search link at the top.

"Instead Google used the words how to, as a synonym for tutorial." - P.S. Blog

I'm not ruling out the possibility that Google is being instrumented with synonym capability - it makes a great deal of sense. But this example isn't working - the page is not found. Also, it's possible that a page is indexed with a term in it, the term is subsequently removed, and for a time the page will continue to generate recommendations on that term.

I'd love to see a working example.

bf

9:41am • #81
The next time I see one in a live query, I will post it here.  It never fails that I see them during the day and then forget what I searched for that caused it.
9:58am • #82
35 Featured Posts
Good grief guys.  I just got back to Cape Coral and saw my inbox filled to the brim with the richness of SEO and to my honor, the ever so distinguished, Mr. Kevin Fontenot, himself commenting on my blog.  For those of you that do know the ever so distinguished "crazy cajun," we are in the presence of search engine algorithm greatness! If you have questions, he has answers. Thanks for dropping by, Font.
12:00pm • #83
Nov 06, 2006

Mary McKnight

Orlando, FL

Helping Realtors learn to successfully write and promote their real estate blog. Online success is not magic, it's knowledge and most of the time, it's free. My focus is to give Realtors the tools and knowledge to affordably succeed online through search engine optimization, search engine marketing, blogging and proper RSS implementation.

