Wednesday, September 24, 2008

How Search Engines Work ...

Google, Yahoo, MSN, AOL, Dogpile and a whole gang of other search engine sites all work on the same basic principles: find Web pages, index the content, and return results when a user enters a search.

Simple enough, right? At first blush that really is all there is to it. Look a little deeper, however, and there is a bit more at work than meets the eye ...

When a search engine receives a Website submission (usually from the Webmaster who created it), it adds that Website to a queue of Websites waiting to be indexed. Once the search engine is ready to index the Website, it runs a program commonly known as a "Spider".

The search engine spider reads the code on the site and removes everything but the text and links. Then it counts how many words exist on the page. Next, it removes common words such as the, as, if, that, he, she, I, to and so on, and keeps the rest.
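
To make that concrete, here is a minimal sketch in Python of what that stripping step looks like. The stop-word list and the sample page are my own stand-ins; a real spider of the era handled far messier HTML than this.

```python
from html.parser import HTMLParser

# A tiny stand-in list of "common words" to discard; real engines used longer lists.
STOP_WORDS = {"the", "as", "if", "that", "he", "she", "i", "to", "a", "and", "of"}

class SpiderParser(HTMLParser):
    """Keeps only the visible text and the outgoing links of a page."""
    def __init__(self):
        super().__init__()
        self.words = []   # visible text, split into words
        self.links = []   # href targets found in <a> tags
        self._skip = 0    # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(w.lower() for w in data.split())

def spider_page(html):
    """Strip a page down to its kept words and its links."""
    parser = SpiderParser()
    parser.feed(html)
    kept = [w for w in parser.words if w not in STOP_WORDS]
    return kept, parser.links

words, links = spider_page('<p>The <a href="/homes">new homes</a> are here</p>')
print(words)  # ['new', 'homes', 'are', 'here']
print(links)  # ['/homes']
```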

Then it counts how many times each word and phrase appears within the text. This is known as the keyword density ratio. For example, a real estate Website may have the phrase "New Homes" appearing 15 times out of a total of 1,423 words. This makes the keyword density of the phrase "New Homes" 15:1,423, or roughly one percent.
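
In code, that count is just a sliding window over the kept words. The sample word list below is hypothetical; run the same computation on the example above and 15 hits in 1,423 words works out to about 1.05%.

```python
def keyword_density(words, phrase):
    """How often `phrase` occurs in `words`, against the total word count."""
    terms = phrase.lower().split()
    n = len(terms)
    # Slide an n-word window over the page and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1)
               if words[i:i + n] == terms)
    return hits, len(words)

# A tiny stand-in for a real page's word list:
words = ["new", "homes", "for", "sale", "new", "homes", "today"]
hits, total = keyword_density(words, "new homes")
print(f"{hits}:{total}")              # 2:7
print(f"{hits / total:.2%} density")  # 28.57% density
```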

Still with me? Hang in there, it will make more sense in a moment.

The spider adds this information to the search engine database (this is known as indexing). When a user enters a search for New Homes, the search engine looks in its database and returns the results with the highest keyword density ratio.
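
A bare-bones version of that index-and-lookup cycle might look like the sketch below. The in-memory dict stands in for the search engine's database, and the domain names are made up.

```python
# index maps a keyword to {url: density}; built up as each page is spidered
index = {}

def add_to_index(url, words):
    """Record each kept word's density for this page (the 'indexing' step)."""
    total = len(words)
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    for w, c in counts.items():
        index.setdefault(w, {})[url] = c / total

def search(term):
    """Return URLs containing `term`, highest keyword density first."""
    matches = index.get(term.lower(), {})
    return sorted(matches, key=matches.get, reverse=True)

add_to_index("realty.example", ["new", "homes", "new", "homes", "sale"])
add_to_index("news.example", ["new", "story", "today"])
print(search("new"))  # ['realty.example', 'news.example']
```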

Cool.

Finished right?

Well, no. That worked in the early days of the Internet. Unfortunately, over time, some unscrupulous Webmasters figured out ways to beat the system. They began to add keywords to their Websites in tiny fonts that matched the background color. This rendered the text invisible to users, but since the search engines read the code, they saw it anyway.

This technique is known as using "Spider Lines". Ultimately it got pretty obnoxious, and porn and other undesirable sites began to appear even with the most benign search terms (this is now known as search engine spamming). Therefore, most of the search engines began to filter on keyword density, and if it was too high the offending Website would be dropped from the database.
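
A density filter of that sort is easy to sketch. The 20% cutoff below is purely an assumption on my part; the engines never published their actual thresholds.

```python
MAX_DENSITY = 0.20  # assumed cutoff; real engines kept theirs secret

def passes_spam_filter(words):
    """Reject pages where any single word dominates the text ('spider lines')."""
    total = len(words) or 1
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    worst = max(counts.values(), default=0) / total
    return worst <= MAX_DENSITY

# A page that repeats one keyword over and over gets dropped:
print(passes_spam_filter(["casino"] * 40 + ["welcome"]))                     # False
print(passes_spam_filter(["new", "homes", "sale", "town", "tour", "price"])) # True
```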

The search engine spammers then adapted to this and began to enter entire dictionaries in their spider lines rather than the same terms over and over. This, of course, resulted in very inaccurate searches. Search engines responded by allowing the use of "Meta Tags" to give them a clue as to which terms were actually keywords. This, in turn, led to more abuse by search engine spammers. The cycle of one-upmanship continued.
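
The Meta Tag in question was the keywords tag, which sat in a page's head and listed the terms the author wanted to rank for. Here is a rough sketch of how a spider might pull those hints out; the sample page is invented.

```python
from html.parser import HTMLParser

class MetaKeywords(HTMLParser):
    """Pull the keyword hints out of a page's <meta name="keywords"> tag."""
    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "keywords":
            self.keywords = [k.strip() for k in a.get("content", "").split(",")]

page = '<head><meta name="keywords" content="new homes, real estate"></head>'
p = MetaKeywords()
p.feed(page)
print(p.keywords)  # ['new homes', 'real estate']
```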

This eventually led to "Link Popularity". Link popularity is a method of determining site ranking based upon how many other sites link to a particular site (and the ranking of those linking sites as well). In other words, it is a count of incoming links. For example, if you had a Website (we'll call it mysite.com) and 50 other Websites contained links to mysite.com, then your link popularity would be 50. (Again, remember that the quality and ranking of each linking site has a bearing also, but let's keep it simple.) A Website such as Amazon.com, for instance, can have millions of incoming links. By the search engine's reasoning, Amazon.com is a more important site and therefore gets a higher page ranking.
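
Ignoring the quality weighting for a moment, the simple-count version is a one-pass tally over a link graph. The graph below is hypothetical; weighting each vote by the linker's own rank is what turns this basic count into something like Google's PageRank.

```python
# Hypothetical link graph: each site maps to the sites it links out to.
links_out = {
    "siteA.example": ["mysite.example", "siteB.example"],
    "siteB.example": ["mysite.example"],
    "siteC.example": ["mysite.example", "siteA.example"],
}

def link_popularity(graph):
    """Count incoming links per site -- the simple version described above."""
    scores = {}
    for source, targets in graph.items():
        for target in targets:
            scores[target] = scores.get(target, 0) + 1
    return scores

print(link_popularity(links_out))
# {'mysite.example': 3, 'siteB.example': 1, 'siteA.example': 1}
```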

Link popularity is currently the de facto 800 lb. gorilla in determining site ranking. It is difficult to manipulate falsely and is a fairly accurate indication of a site's importance.

4 comments:

  1. This is great info to know.

  2. I am glad it helped. I will post more along these lines as time goes by...

  3. I just want you to know this really helped me also. Candi

  4. Awesome! I am glad to be of service. It is good to know that people actually read my blog! For a moment I was wondering...
