Hossein Zahed

Web Developer, Entrepreneur, Software Educator

Anatomy of a Search Engine

Search engines such as Lycos and AltaVista were introduced at about the same time as portals. Although there is some variation, all search engines follow the same basic approach. On its host computers, the search engine runs automated Web-searching programs (sometimes called “spiders” or “Web crawlers”). These programs systematically visit Web sites, follow the links they find to other sites, and repeat the process through many layers. Usually several such programs run simultaneously, from different starting points or using different strategies, in an attempt to cover as much of the Web as possible. When a Web crawler reaches a page, it records the address (URL) and compiles a list of significant words. The crawlers pass their results to the search engine’s indexing program, which files each URL under its associated keywords, building a very large word index of the Web.
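The crawl-and-index cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular search engine is built: the `PAGES` dictionary stands in for the Web so that no network access is needed, and the URLs, the stop-word list, and the function names are all made up for the example.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "http://a.example/": ("Internet statistics for everyone",
                          ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("Statistics about the weather", ["http://a.example/"]),
    "http://c.example/": ("History of the Internet", []),
    "http://d.example/": ("Unlinked page", []),  # no page links here
}

# Assumed list of words too common to be "significant".
STOP_WORDS = {"a", "about", "for", "of", "the"}


def significant_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def crawl(start_url, pages):
    """Breadth-first crawl from start_url; return a word -> set-of-URLs index."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url not in pages:  # dead link: nothing to record
            continue
        text, links = pages[url]
        # Indexing step: file this URL under each significant word.
        for word in significant_words(text):
            index[word].add(url)
        # Crawling step: follow links not visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://a.example/", PAGES)
print(sorted(index["internet"]))
print(sorted(index["statistics"]))
```

Note that `http://d.example/` never enters the index because no crawled page links to it. That is one reason real crawlers are started from several different points.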

Search engines can also receive information directly from Web sites. Page designers can add a special HTML “meta tag” that lists keywords for use by search engines. However, some commercial sites misuse this facility by adding popular words that are not actually relevant to the site, in the hope of attracting more hits.

To use a search engine, the user simply navigates to the search engine’s home page with a Web browser. (Many browsers can also add selected search engines to a special “search pane” or menu item for easier access.) The user then types in a word or phrase. Most search engines accept logical operators such as AND, OR, and NOT. Thus, a search for “internet AND statistics” will find only pages that contain both words. Some engines also allow a phrase to be enclosed in quotation marks so that it is searched for as a whole. A search for the quoted phrase “internet statistics” will match only pages that have these two words next to each other.
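As a rough sketch of how AND, OR, NOT, and quoted-phrase queries might be evaluated against a word index, consider the Python below. The four sample documents and all function names are hypothetical, and the phrase matcher assumes the words must appear in the same order as typed. Real engines parse full query expressions and typically store word positions in the index instead of rescanning page text.

```python
import re

# Hypothetical documents standing in for indexed Web pages.
DOCS = {
    "page1": "Internet statistics for 2005",
    "page2": "Statistics on Internet usage",
    "page3": "Internet history",
    "page4": "Baseball statistics",
}


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in tokens(text):
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def search_and(a, b):
    """Pages containing both words (set intersection)."""
    return INDEX.get(a, set()) & INDEX.get(b, set())


def search_or(a, b):
    """Pages containing either word (set union)."""
    return INDEX.get(a, set()) | INDEX.get(b, set())


def search_not(a, b):
    """Pages containing a but not b (set difference)."""
    return INDEX.get(a, set()) - INDEX.get(b, set())


def search_phrase(phrase):
    """Pages where the words of the phrase appear adjacent and in order."""
    words = tokens(phrase)
    if not words:
        return set()
    # Only pages containing every word can contain the phrase.
    candidates = set.intersection(*(INDEX.get(w, set()) for w in words))
    n = len(words)
    hits = set()
    for doc_id in candidates:
        doc_words = tokens(DOCS[doc_id])
        if any(doc_words[i:i + n] == words
               for i in range(len(doc_words) - n + 1)):
            hits.add(doc_id)
    return hits


print(sorted(search_and("internet", "statistics")))
print(sorted(search_phrase("internet statistics")))
```

Here `page2` contains both words and so matches the AND query, but it does not match the phrase query because the words are not adjacent.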

Because of the huge size of the Web, even seemingly esoteric search words can yield thousands of “hits” (results). Therefore, most search engines rank the results by estimating how relevant each one is likely to be. A simple way to do this is to compare how frequently the search terms appear on each page. More sophisticated search engines such as Google also weigh where a word or phrase appears (for example, in a heading) and how often a site is referred to from other sites. Some search engines also offer the ability to “refine” a search by adding further words and performing a new match against the existing set of results.
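To make the ranking idea concrete, here is a toy scoring function in Python that combines the three signals mentioned above: term frequency, presence in a heading, and the number of links from other sites. It is a sketch under loud assumptions, not Google's or any other engine's actual formula. The sample pages, the field names, and the weights `HEADING_WEIGHT` and `LINK_WEIGHT` are all invented for illustration.

```python
import re

# Hypothetical crawled pages with a heading, body text, and inbound-link count.
PAGES = [
    {"url": "http://a.example/", "heading": "Internet statistics",
     "body": "Internet statistics and more statistics", "inbound_links": 3},
    {"url": "http://b.example/", "heading": "Weather",
     "body": "Some statistics about rain", "inbound_links": 10},
    {"url": "http://c.example/", "heading": "Cooking",
     "body": "Recipes for soup", "inbound_links": 50},
]

# Arbitrary weights chosen for the example only.
HEADING_WEIGHT = 3.0  # a match in the heading counts more than one in the body
LINK_WEIGHT = 0.1     # small bonus per link from another site


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query_terms):
    """Relevance score: body frequency + heading bonus + link bonus."""
    body = tokens(page["body"])
    heading = tokens(page["heading"])
    body_hits = sum(body.count(t) for t in query_terms)
    heading_hits = sum(heading.count(t) for t in query_terms)
    if body_hits + heading_hits == 0:
        return 0.0  # popularity alone does not make a page a match
    return (body_hits
            + HEADING_WEIGHT * heading_hits
            + LINK_WEIGHT * page["inbound_links"])


def rank(pages, query):
    """Return URLs of matching pages, most relevant first."""
    terms = tokens(query)
    scored = [(score(p, terms), p["url"]) for p in pages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for s, url in scored if s > 0]


print(rank(PAGES, "statistics"))
```

In this toy data, `http://c.example/` has by far the most inbound links but is left out of a search for “statistics” because it never mentions the word. The link count only reorders pages that already match.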