Date: Wed, 4 Mar 2009 5:04 PM
Search Engines Deconstructed Part I
Search engines are the single most used method for finding information on the Internet. Comscore.com's October 2008 research study reports that 'more than 750 million people age 15 and older or 95 percent of the worldwide Internet audience conducted 61 billion searches worldwide in August, an average of more than 80 searches per searcher.'
Ecommerce entrepreneurs have a strong incentive to win the search engine sweepstakes. The higher your listing appears on search engine results pages, the greater the number of searchers who will read your listing and visit your site.
Although the specific algorithms used by the major search engines (Google, Yahoo, MSN, etc.) are proprietary, and subject to intense investigation by Web watchers, the underlying principles of search engines can be studied. These principles are 'spidering', assessment and storage, retrieval, and ranking. In this first of several articles on search engine optimization, we will look at the spidering and assessment/storage processes.
Web spiders or crawlers
A web spider is an automated program that crawls the Web, gathering URLs and sending them back to a repository, where they are analysed and sorted. Web spiders make it much simpler and more efficient to search the Web because a lot of the work of gathering and sorting has been done days, weeks or even months before you search for that content.
A search engine uses many Web spiders to crawl the pages on the Web, return their contents, and index those contents according to the usefulness of the information.
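As a rough illustration of the "gathering URLs" step, the sketch below pulls the links out of one page that has already been downloaded. It uses only Python's standard library. The names LinkExtractor and extract_links and the example.com addresses are assumptions made for this example, not part of any real search engine.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the target of every <a href="..."> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the address of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs linked from one already-downloaded page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<a href="shoes.html">Shoes</a> <a href="/about">About us</a>'
links = extract_links("http://example.com/shop/", page)
```

A real spider would add each returned URL to its list of pages still to visit and repeat the process on every page it downloads.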
Spiders operate according to a set of rules, e.g.:
* A selection policy that states which pages to download;
* A revisit policy that states when to check for changes to the pages;
* A politeness policy that states how to avoid overloading websites by accessing URLs too frequently;
* A parallelisation policy that states how to coordinate distributed web crawlers, that is, how to avoid several crawlers downloading the same pages or accessing the same site at the same time.
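Two of these rules, selection and politeness, can be sketched in a few lines. The example below is a toy illustration only: the Frontier class, the list of allowed hosts, and the two-second delay are assumptions made for this sketch, and the revisit and parallelisation policies are left out. Time is passed in as a plain number so the logic can be followed without any waiting or network access.

```python
from collections import deque
from urllib.parse import urlparse


class Frontier:
    """Toy list of URLs to visit, with a selection rule and a per-host delay."""

    def __init__(self, allowed_hosts, delay_seconds=2.0):
        self.allowed_hosts = set(allowed_hosts)  # selection policy (assumed rule)
        self.delay_seconds = delay_seconds       # politeness policy (assumed value)
        self.queue = deque()
        self.seen = set()
        self.next_allowed = {}                   # host -> earliest next fetch time

    def add(self, url):
        """Queue a URL once, and only if its host passes the selection policy."""
        host = urlparse(url).netloc
        if host in self.allowed_hosts and url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def next_url(self, now):
        """Return the first queued URL whose host may be fetched at time `now`."""
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            host = urlparse(url).netloc
            if now >= self.next_allowed.get(host, 0.0):
                self.next_allowed[host] = now + self.delay_seconds
                return url
            self.queue.append(url)  # host is still cooling down, try it later
        return None
```

Calling next_url repeatedly with the current time hands out at most one URL per host in any two-second window, which is the essence of a politeness policy.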
Once the spider has retrieved URLs and sent them back to its repository, the pages must be assessed for value.
Storage and Assessment
During page assessment, a second search engine program scans each page sent by the spider, analysing the content of the page, i.e., studying 'on page' factors. This program indexes which words are used, how often they are used and whether or not there is special emphasis (bold, italicised, used in heading, part of a link). The results of this analysis are stored in the search engine's document index.
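The kind of record such a program might build can be sketched as follows. This is a simplified illustration, not how any particular search engine works: the PageIndexer class, the list of tags treated as emphasis, and the sample page are all assumptions. A real indexer would also skip scripts and stylesheets and store far more detail.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags treated as "special emphasis" in this sketch (an assumed, partial list).
EMPHASIS_TAGS = {"b", "strong", "i", "em", "h1", "h2", "h3", "a"}


class PageIndexer(HTMLParser):
    """Counts each word on a page and how often it appears with emphasis."""

    def __init__(self):
        super().__init__()
        self.depth = 0               # number of emphasis tags currently open
        self.counts = Counter()      # word -> total occurrences
        self.emphasised = Counter()  # word -> occurrences inside emphasis tags

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.counts[word] += 1
            if self.depth:
                self.emphasised[word] += 1


def index_page(html):
    """Return (word counts, emphasised word counts) for one page."""
    parser = PageIndexer()
    parser.feed(html)
    parser.close()
    return parser.counts, parser.emphasised


counts, emphasised = index_page(
    "<h1>Wholesale Shoes</h1><p>Buy <b>shoes</b> online. Shoes ship fast.</p>"
)
```

Here counts records that "shoes" appears three times on the page, and emphasised records that two of those appearances are in a heading or bold text.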
Some of the typical positive on page factors include:
* Keywords located in headings and meta tags;
* Keywords in URL and domain name;
* Keyword density (5 to 20%);
* Keyword proximity (for 2+ keywords).
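Keyword density and proximity can both be measured with simple arithmetic. The sketch below assumes one common definition of each: density as the keyword's share of all words on the page, and proximity as the smallest distance, in word positions, between two keywords. Search engines do not publish their exact formulas, so the function names and definitions here are illustrative assumptions.

```python
import re


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_density(text, keyword):
    """Percentage of all words in the text that are the keyword."""
    words = tokenize(text)
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


def keyword_proximity(text, first, second):
    """Smallest gap in word positions between the two keywords.

    Adjacent words have a proximity of 1. Returns None if either
    keyword is missing from the text.
    """
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


text = "Cheap wholesale shoes for sale. Wholesale prices on all shoes."
density = keyword_density(text, "shoes")                 # 2 of 10 words
proximity = keyword_proximity(text, "wholesale", "shoes")
```

In this ten-word sample "shoes" appears twice, giving a density of 20%, and "wholesale" sits directly next to "shoes", giving a proximity of 1.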
Negative on page factors include:
* Mostly graphics, little text;
* Bad language;
* Stolen material;
* Keyword over-density.
The program later analyses 'off page' factors, i.e., the links from the page to other pages and the links from other pages to it.
Positive link strategies include:
* Incoming links from high ranking sites;
* Number of incoming links;
* Age of link;
* Keyword presence in link.
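Counting incoming links is the simplest of these factors to illustrate. The sketch below assumes the spider's results have already been reduced to a mapping from each page to the pages it links to, then inverts that mapping. The function name and the three-page sample are hypothetical, and the sketch says nothing about how a search engine weighs the ranking of the linking site or the age of a link.

```python
from collections import defaultdict


def incoming_links(link_graph):
    """Invert {page: [pages it links to]} into {page: set of pages linking to it}."""
    inbound = defaultdict(set)
    for source, targets in link_graph.items():
        for target in targets:
            if target != source:  # a page linking to itself is ignored
                inbound[target].add(source)
    return dict(inbound)


# Hypothetical three-page site.
graph = {
    "home": ["shoes", "about"],
    "shoes": ["about"],
    "about": ["about", "home"],
}
inbound = incoming_links(graph)
link_counts = {page: len(sources) for page, sources in inbound.items()}
```

In this sample the "about" page has two incoming links while "home" and "shoes" have one each.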
Negative link strategies include:
* Link buying;
* Cloaking: showing one version of a page to the spider and another to users;
* Links to or from bad sites.
Once these analyses have been completed, the search engine can match a user query with web pages that have been dissected into 'component values' based on the search engine's particular algorithm.
About the Author
eSources is the Internet's largest database of verified wholesale suppliers, wholesalers, dropshippers, wholesale distributors, importers and manufacturers from the UK and worldwide. In addition to being a wholesale trade resource, the site helps startups and experienced traders in the development and growth of their online and brick and mortar retail businesses.
This article is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

