How to Search the Web -- A Guide To Search Tools




Using the various search tools on the web is easier when you know how they were designed, and especially when you know the specific rules--all too often quite different--for each tool. We have tried to address both needs. We have arranged the search engines and catalogs in order of usefulness, provided a link to each in the title of its section, and then spelled out, in short form, the rules for using them. We have provided some short examples, but for detailed examples consult the help documentation available at each site. The goal of this article is to help the new user get the most useful "hits" when using the various tools. At the end we have placed a table which summarizes common characteristics of the search engines, some general search tips, and cross-references to other useful articles.


AltaVista

AltaVista is the premier search engine on the web. It has the largest, most inclusive indices. That does not mean it is the only one you need, or that it is the best one to use in every situation. Different robot and indexing strategies produce different results from the various search engines. AltaVista returns consistently useful information, but since no editorial decisions have been made regarding content, it also has the highest "noise to signal" ratio.

AltaVista allows searching of both the web and many Usenet newsgroups. It lets you display result lists in standard, compact, or detailed format. It provides both simple and advanced searches. Advanced searches include all the features of simple ones, and also allow the use of boolean and proximity operators, grouping of terms by parentheses, and results ranking by keyword.

Simple Searches

For an effective search, enter as many search terms or phrases as you can that exactly describe the subject in which you are interested. The more precise your terms, the better the results.

Case sensitivity: Search terms entered in lower case letters are case insensitive. The use of capitalized terms (or accented letters) makes the term case sensitive. HotDog finds only the terms spelled exactly with that capitalization; hotdog finds all occurrences of the term, regardless of capitalization. López only finds a word spelled exactly that way.
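The capitalization rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not AltaVista's actual code; the function name term_matches is made up for the example.

```python
def term_matches(term: str, word: str) -> bool:
    """Return True if an indexed word satisfies a search term (illustrative only)."""
    plain_lowercase = term.isascii() and term == term.lower()
    if plain_lowercase:
        # All-lowercase, unaccented term: capitalization is ignored.
        return word.lower() == term
    # A capital or accented letter anywhere in the term makes the match exact.
    return word == term


# hotdog matches any capitalization; HotDog matches only itself.
print(term_matches("hotdog", "HotDog"), term_matches("HotDog", "hotdog"))
```

The sketch treats a term as "plain" only when it is entirely lower case and unaccented, which is one reading of the rule; how an unaccented term should treat accented words in the index is not covered here.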

Phrases: To group search terms into phrases, enclose them in double quotes. "Abraham Lincoln" finds occurrences of the name Abraham Lincoln, capitalized in just that way. Another way to link words into phrases is to insert punctuation between them: Abraham;Lincoln;Gettysburg;Address.

Required Terms: To require that one of your terms be included in the document being indexed, preface (the formal term is prepend) it with a + symbol: +HotDog. There must not be a space between the + and the term.

Prohibited Terms: To prohibit the inclusion of a term from a document for which you are searching, prepend it with a - symbol: -mustard. To find a reference to F. Scott Fitzgerald without reference to Gatsby: +"F. Scott Fitzgerald" -Gatsby.
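As a rough sketch of how the + and - prefixes divide a query, the Python below splits a simple query into required, prohibited, and ordinary terms, then tests a piece of text against it. The function names and the case-insensitive substring test are assumptions made for illustration; the real engine works against its index, not raw text.

```python
import re

# An optional + or - sign, then either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_simple_query(query):
    """Split a query into (required, prohibited, optional) term lists."""
    required, prohibited, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if sign == "+":
            required.append(term)
        elif sign == "-":
            prohibited.append(term)
        else:
            optional.append(term)
    return required, prohibited, optional


def document_qualifies(text, query):
    """True if text has every required term and no prohibited term."""
    required, prohibited, _ = parse_simple_query(query)
    lowered = text.lower()
    return (all(t.lower() in lowered for t in required)
            and not any(t.lower() in lowered for t in prohibited))


print(parse_simple_query('+"F. Scott Fitzgerald" -Gatsby'))
```

Because the sign must touch the term, a + or - followed by a space is treated here as an ordinary token rather than an operator.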

Wildcards: In simple queries you may put a wildcard character at the end of a word, where it substitutes for any combination of letters. The asterisk (*) is AltaVista's wildcard character. For example, butt* will get all occurrences of butt, butts, butter, button, etc. The asterisk cannot be used at the beginning or in the middle of words. It will substitute for up to 5 additional lower case letters.
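One way to picture the wildcard rule is as a regular expression: the stem, followed by zero to five lower case letters. The Python sketch below assumes exactly that reading; the helper names are invented for the example.

```python
import re


def wildcard_pattern(term):
    """Turn a term such as butt* into a compiled regular expression."""
    if term.count("*") != 1 or not term.endswith("*"):
        raise ValueError("the asterisk is allowed only once, at the end of the term")
    stem = term[:-1]
    # The asterisk stands for up to 5 additional lower case letters.
    return re.compile(re.escape(stem) + r"[a-z]{0,5}")


def wildcard_matches(term, word):
    """True if the whole word fits the wildcard term."""
    return wildcard_pattern(term).fullmatch(word) is not None


for word in ["butt", "butts", "butter", "button", "buttermilk"]:
    print(word, wildcard_matches("butt*", word))
```

Under this reading buttermilk is not found, since it adds six letters to the stem.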

Rankings: AltaVista will assign a confidence ranking to the hits it returns based on the following:

These factors are weighted, and the document with the highest confidence rating is given a score of 1.000. All others are given decimal scores less than 1.000, in order of confidence. This does not mean that the document rated 1.000 is the best source. It only best meets the ranking algorithm. Only rarely is the "best" source ranked first, unless you know the specific title of the document for which you are searching. For example, to find the document "Mr. William Shakespeare and the Internet" a search for that phrase, in double quotes, will find the exact web page, but entering the search terms separately, or just searching for "shakespeare" will result in too many non-specific hits.

Another way to search for a document with a known title is to enter the keyword title: in the search window and follow it with the title in double quotes: title:"Mr. William Shakespeare and the Internet". AltaVista allows searching within specific html tags like this for anchors, applets, hosts, images, links, text, and urls also. The usage is: host:palomar.edu, etc. See the site help pages for more details.

The most useful advice for searching with AltaVista, since its indices are built from the whole words of the text, is to be as precise as possible in describing what you are looking for, while excluding things in which you are not interested: "Viet Nam" +Saigon -conflict -war will find information on Viet Nam, and in particular about Saigon, without finding information on the conflict.

Advanced Searches

The same rules for capitalization, phrases, wildcards, and required/prohibited terms apply to advanced queries. In addition, boolean searching, proximity operators, and logical grouping with parentheses are allowed. These are only available if you select an advanced search from the AltaVista main page.

Boolean and Proximity Searching: AltaVista supports the use of the binary operators AND, OR, NEAR and the unary operator NOT. You may use the following symbols in place of the words: & (AND), | (OR), ~ (NEAR), ! (NOT). It is a very good idea to use the words rather than the symbols, since the words are easier to remember and common to other search engines. You may enter the operators in lower or upper case letters, but it is probably best to use uppercase to make them stand out from ordinary search terms and make the logic of the search more apparent. If these words are part of the terms for which you are searching, they must be enclosed in quotes. It is best to group your terms within parentheses to avoid confusion, but this is not required.

Examples:
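Since the words are recommended over the symbols, a small Python sketch can show the correspondence by spelling out the symbols in a query. The function name is made up, and the handling of quoted phrases (symbols inside double quotes are left alone) is an assumption based on the rule that operator words inside a search term must be quoted.

```python
SYMBOL_TO_WORD = {"&": "AND", "|": "OR", "~": "NEAR", "!": "NOT"}


def spell_out_operators(query):
    """Replace operator symbols with their word forms, outside quoted phrases."""
    pieces = []
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
            pieces.append(ch)
        elif ch in SYMBOL_TO_WORD and not in_quotes:
            pieces.append(" " + SYMBOL_TO_WORD[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra spaces introduced above (this also collapses
    # repeated spaces inside quoted phrases).
    return " ".join("".join(pieces).split())


print(spell_out_operators("(cat | dog) & !fish"))
print(spell_out_operators("gold ~ silver"))
```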

Results Ranking: With advanced searches you may also specify keywords you wish AltaVista to use in order to confidence rank your results. This is a very powerful feature which will let you control which items are ranked at the top of the hit list. Type the terms you wish AltaVista to weight more heavily in the Results Ranking Criteria box on the advanced search screen before submitting the search. Then, even though the search results will not be affected, the listing of the hits will contain those in which you will probably be most interested at the top.


Excite

Excite uses a combination of text and subject indices to search either by keyword or by concept. Concept searches, according to the Excite authors, find documents related to the idea of your search, and not just documents explicitly containing the search terms you enter. From the initial screen you choose which way you would like to search by clicking the keyword or concept radio button. Concept is the default. You may search web documents, reviews, Usenet newsgroups or classifieds. Simple and more advanced features are entered in the same search box. There are no separate entry screens for the two types of search, but advanced features like boolean searching and logical grouping are supported. You cannot choose among standard, summary, and detailed formats for the hit list, as you can with some other search engines.

As with all search engines, the more descriptive the search terms entered in the search box, the more relevant the hits will be. Case sensitivity and words grouped into phrases are not observed in the same way AltaVista observes them. Because of the way the ranking algorithm works, the more times a word is entered in a search window, the higher documents containing that word will be ranked: dog dog dog cat will rank dog pages higher than cat pages, but find both.
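The effect of repeating a term can be imitated with a simple weighting scheme: count how often each word appears in the query and use that count as its weight. The Python below is a sketch under that assumption only; Excite's real ranking algorithm is not described here, and the page names are invented.

```python
from collections import Counter


def rank_by_repetition(query, documents):
    """Rank document names, weighting each query word by how often it was typed."""
    weights = Counter(query.lower().split())
    scores = {}
    for name, text in documents.items():
        words = set(text.lower().split())
        score = sum(weight for term, weight in weights.items() if term in words)
        if score:
            scores[name] = score
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scores, key=lambda name: (-scores[name], name))


pages = {
    "cat-page": "my cat purrs",
    "dog-page": "my dog barks",
    "fish-page": "a goldfish bowl",
}
print(rank_by_repetition("dog dog dog cat", pages))
```

Both the dog page and the cat page are found, but the dog page is listed first because dog carries three times the weight.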

The use of required terms and prohibited terms is the same with Excite and AltaVista. Precede a required term with a + symbol and a prohibited term with a - symbol: +football -rugby -soccer.

Boolean Searching: Excite supports the use of the binary operators AND, OR, and AND NOT and the unary operator NOT. It also supports grouping of terms within parentheses to create complex logic. The default Excite keyword search uses an implicit OR; that is, it searches for documents containing ANY of the search terms specified. The Excite authors describe this as a "fuzzy AND", meaning documents containing both terms are weighted higher, but either term qualifies. Booleans and grouping allow for more specific results.

Examples:

The use of multiple spellings in the same search window can increase the chances of hits:
Dostoyevski Dostoevski Dostoevsky.

Rankings: Excite ranks its hit lists in order of confidence, with a percentage factor for what it feels is the best fit between the document returned and the search terms entered. The document at the top of the list will not necessarily be 100%. As you scan the hit list, look for a document that is very close to the one you want, then click the little button next to the confidence rating. The search will be re-performed using search criteria based on the indexing of that particular document, and a new list will be produced with the one you chose rated 100% and other hits ranked based on their similarity to that one.


Webcrawler

Webcrawler, now sponsored by America On-Line, is an outstanding search engine very much in the mold of AltaVista. In fact, it has more power than AltaVista in implementing advanced features such as the proximity operators NEAR and ADJ. It also includes a catalog of pre-classified subjects (directory services) by editors at GNN. It implements a feature of further searching based on pre-set search terms from the subject catalog, very much like Excite. (This feature hides behind the Spidey button. [Sometimes I feel silly writing this stuff.]). Finally, like AltaVista, it is so good in its own right, and associated with such a large company, that it can afford to be less gaudily commercial than Excite or Lycos.

Webcrawler touts "natural language searching," so you can enter a search like "highest mountain in the world." It throws out the noise words and does a fuzzy AND search on the others, weighting pages with occurrences of all search terms highest, but including pages that contain only one of the search terms. This is the common strategy among the best search engines. Webcrawler is different in that its definition of "noise" words is rather broad. The term "web," for example, is not indexed.
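The strategy just described, dropping noise words and then doing a fuzzy AND, can be sketched in Python as follows. The noise-word list is hypothetical (only "web" is named above as a noise word), and counting distinct matching terms is an assumed stand-in for whatever weighting the engine really applies.

```python
# Hypothetical noise-word list; the engine's actual list is not given here.
NOISE_WORDS = {"a", "an", "and", "in", "of", "the", "web"}


def fuzzy_and(query, documents):
    """Rank pages by how many distinct non-noise query terms they contain."""
    terms = {w for w in query.lower().split() if w not in NOISE_WORDS}
    scored = []
    for name, text in documents.items():
        matched = terms & set(text.lower().split())
        if matched:  # a single matching term is enough to qualify
            scored.append((len(matched), name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


pages = {
    "everest": "everest is the highest mountain in the world",
    "alps": "mountain huts of the alps",
    "recipes": "bread and soup",
}
print(fuzzy_and("highest mountain in the world", pages))
```

The page containing all three remaining terms comes first, the page with only one term still appears, and the page with none is left out.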

Display Control: On the initial search screen, above the search box, you may select whether you want to see web titles only, or titles and summaries for each hit. You may also select the number of hits per page: 10, 25 or 100. Summary mode will display a brief abstract of the page, its URL, and a numeric version of the confidence ranking.

Confidence Rankings: Next to each hit a little icon which looks something like a June bug larva is displayed. The fuller the larva, the higher the confidence match between the page and the search term. You may see a numeric version of the confidence ranking, for what it is worth, when summary display is chosen. The confidence rankings seem to be nothing more than a count of the occurrences of the search term within a particular document.

Phrases: Like AltaVista, you may enter terms you wish considered as a phrase in double quotes. This means the words must appear next to each other in the resulting document. Combined with single, precise search terms this will yield the best results on the first try: Lincoln "Civil War" "Gettysburg Address" Gettysburg.

Boolean and Proximity Searching: Webcrawler allows entry of the operators AND, OR and NOT in the standard search window. Items may also be grouped within parentheses to create complex logic: Simpson NOT (Homer OR Marge OR Lisa OR Bart OR Maggie).

The real strength of Webcrawler's advanced features is in the implementation of its proximity operators. You may use NEAR/n, where n is the number of words apart the two search terms may be: Shakespeare NEAR/5 Internet. If a range is not entered, NEAR will return hits on documents where the words are next to each other, in either order. To require that two adjacent words appear in a specific order, use the ADJ operator: reverse ADJ osmosis. In this example, reverse must precede osmosis.
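The difference between NEAR and ADJ is easy to see in code. The Python sketch below works on a list of words and assumes that "n words apart" means the two word positions differ by at most n; that interpretation, and the function names, are assumptions for illustration.

```python
def positions(word, tokens):
    """All positions at which word occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word]


def near(a, b, tokens, n=1):
    """True if a and b occur within n positions of each other, in either order."""
    return any(0 < abs(i - j) <= n
               for i in positions(a, tokens)
               for j in positions(b, tokens))


def adj(a, b, tokens):
    """True if a is immediately followed by b."""
    return any(j - i == 1
               for i in positions(a, tokens)
               for j in positions(b, tokens))


tokens = "shakespeare on the internet".split()
print(near("shakespeare", "internet", tokens, n=5))
print(adj("reverse", "osmosis", "osmosis reverse".split()))
```

With no range given, near() defaults to n=1, which matches the rule that plain NEAR means adjacent words in either order.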

Webcrawler does not support the use of required/prohibited terms, or wildcard expanders or limiters.

Subject Categories: Another strength of Webcrawler is its implementation of a subject catalog which you may browse. The catalog (and related reviews of web sites) is created by the editors of Global Network Navigator, and is quite good. A feature, similar to Excite's confidence buttons, is the Spidey button which accompanies subject browse mode. By clicking Spidey, Webcrawler will perform a topical search based on search terms for the area of interest pre-entered by the GNN editors. These are called "similarity queries," and are supposed to create optimal results.

On the whole, Webcrawler excels in ease of use and implements some very nice proximity search features, but its indices do not seem to be as extensive as those of AltaVista or Lycos. It offers some unique special features, such as 'search the web backwards,' to see who is linked to your page, and net statistics.


Lycos

Many of us who have used the Internet for a while have a fond spot for Lycos from its Carnegie Mellon days when it was truly a Godsend. Since the explosion of the web, better search engines have appeared, but Lycos is still good and fast, if not as sophisticated as some of the others. It offers both keyword and subject searching (the subject searches are called directory services), as well as a Point rating system which rates web pages. Its strong points are its speed, ease of use, and the large size of its indices, which often produce usable results by sheer brute force. Its weakest point is that it does not support boolean searching or any of the more sophisticated searches that can be made with AltaVista, Webcrawler or Excite.

Display Control: To gain any sort of control over your searches in Lycos, you need to click on the "Enhance your search" link on the Lycos front page. You will be taken to a screen which will allow you to:

Changing the type of search from OR to AND will result in far fewer hits, of course. The option to match 2, 3, 4, 5, 6, or 7 terms allows for a degree of fuzzy matching with variant spellings. An example of matching 2 terms, as distinct from AND, would be: Fyodor Dostoevski Dostoyevski. Documents containing any two of the terms will be returned; they need not contain all three.
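The "match N terms" option sits between OR (match 1) and AND (match all). A minimal Python sketch, assuming "match 2 terms" means at least two distinct terms must be present, looks like this; the function name is invented.

```python
def matches_at_least(terms, text, minimum):
    """True if the text contains at least `minimum` distinct search terms."""
    words = set(text.lower().split())
    found = {term.lower() for term in terms} & words
    return len(found) >= minimum


terms = ["Fyodor", "Dostoevski", "Dostoyevski"]
print(matches_at_least(terms, "novels by fyodor dostoyevski", 2))
print(matches_at_least(terms, "novels by dostoevski", 2))
```

A page using either spelling together with the first name qualifies, while a page with only one of the three terms does not.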

Focusing Your Search: You can change (fine tune) the results of your searches by changing the type of matches Lycos considers a success: loose, fair, good, close, and strong. The stronger the match, the fewer sites returned by Lycos.

Inclusion/Exclusion and Rankings: Lycos does not support the required/prohibited term syntax, as do AltaVista and Excite. You may, however, prepend a search term with a - symbol, meaning that the term will not be weighted in determining the ranking of the results: dogs -doberman will still get pages with the term doberman, but those pages will not appear at the top of the list. Lycos ranks each search, rating the best fit as 1.000 and all other hits as less than 1.000. As with the other search engines, it is rare for the site rated 1.000 to be the most useful.

Wildcards: To expand a word with a wildcard, add the $ symbol to the end of the word. For example, gen$ to get genetic, genesis, general, and so on. Lycos provides the use of the period character (.) after a word to prohibit its expansion: gene. will get just gene, and not genetics or general.


Infoseek

Infoseek was once the only Netscape default search engine. It is not the best available. Its virtues are speed and ease of use. Its defects are a lack of sophistication (booleans are not supported) and a 'teaser' approach to showing the first 100 hits and offering to show more for pay. It is both a search engine, and a searchable subject catalog, with options to search Usenet newsgroups, email addresses and web FAQs.

Searches are quasi-case sensitive. Capitalized words are taken as proper nouns, which narrows the search. Searching for Babe will find the famous hitter and the famous pig; searching for babe will also find the Sonny and Cher lyrics. Adjacent capitalized words are linked into a phrase. Capitalized phrases must be separated with commas: The Great Bambino, Baseball Hall Of Fame. Phrases may also be formed by enclosing the words in double quotes: "i've got you babe". A third way to link words into phrases is to place hyphens between them: wonderful-life.

Required/Prohibited Operators: Prepending a + symbol to a word requires that the term be in the documents found by the search. Prepending a - symbol excludes documents containing that term from the search results: +Lincoln -automobile. There cannot be a space between the + or - sign and the affected word.

Proximity operator: Placing words in square brackets causes a hit if they are found within 100 words of each other: [immune disease].

To search Infoseek's 'select sites' (their subject catalog) change the search option from World Wide Web to 'Infoseek Select Sites' on the form provided next to the search term window. There are several other options available, including Reuters news stories.


Yahoo!

Yahoo is not a search engine, but strictly a hierarchically arranged subject index. It has developed over a long time, with lots of editorial care, so the quality is very high. Browsing Yahoo is the best way to surf for good sites when you don't know (or perhaps care) where exactly you are going. It is also the best way to find good 'starter' sites, from which you can branch out to more specialized ones.

Using Yahoo is simple. Just enter your search term(s) in the search window and click SEARCH. Yahoo will return three types of information: 1) Yahoo categories that match the search term (so you can explore them for cross referencing); 2) Actual matching end-sites; and 3) The Yahoo categories from which the various pages are indexed--sort of a 'much broader term' cross reference. Though you cannot create very sophisticated searches as with the search engines, you can control:

You may access these controls by clicking the small 'options' link next to the main search window.

Yahoo has a couple of other unique features: At the bottom of each results page links to search engines are provided. By clicking on Yahoo Remote you can invoke a secondary Netscape window which you can minimize and then maximize whenever you need to do a quick search.

If the essential search engine is AltaVista, the essential subject catalog is Yahoo! Don't surf without it.


Magellan

Magellan is not actually a search engine, but rather an on-line guide to the Internet that contains a directory of rated and reviewed sites, along with an index to lots of unreviewed sites. It is like Yahoo, only less inclusive and with a more thorough rating system (one to four stars, rather than Yahoo's shades to indicate a cool site). Magellan's strength is its system of reviews. It is not a good starting place to do a search, but is rather more useful when looking for sites which are tried and true. The emphasis at Magellan is on pop sites (UFOs are one of the main categories on the front page), but if that is what you are looking for, the site is great. The only drawback is the inevitable advertising.


Other Search Engines

Alltheweb

Google

Meta search engines: the following engines feature a multi search using the results of the most common search engines.

Dogpile

NlightN


Summary of Search Engine Features

The following table summarizes some of the common features of the search engines discussed above.

| Category                       | AltaVista | Excite | WebCrawler | Lycos | InfoSeek | Yahoo! | Alltheweb |
| Case Sensitive?                | Y         | N      | N          | N     | Y        | N      | N         |
| Considers Phrases?             | Y         | N      | Y          | N     | Y        | N      | Y         |
| Required Term Operator         | +         | +      | N          | N     | +        | N      | N         |
| Prohibited Term Operator       | -         | -      | N          | N     | -        | N      | N         |
| Wildcard Expander              | *         | N      | N          | $     | N        | N      | N         |
| Limiting Character             | N         | N      | N          | .     | N        | N      | N         |
| Results Ranking?               | Y         | Y      | Y          | Y     | Y        | N      | N         |
| Controllable Results Ranking?  | Y         | N      | N          | Y     | N        | N      | N         |
| Booleans Allowed?              | Y         | Y      | Y          | N     | N        | N      | Y         |
| Proximity Operators Allowed?   | Y(10)     | N      | Y(range)   | N     | Y(100)   | N      | N         |
| Subject (Directory) Searching? | N         | Y      | Y          | Y     | Y        | Y      | N         |
| Refine Based On First Search?  | N         | Y      | N          | N     | N        | N      | N         |
| Controllable Display Format?   | Y         | N      | Y          | Y     | N        | N      | N         |


General Search Tips

What is the best search tool? It depends on what you are looking for and why you need the information. If you are just browsing, start at Yahoo, or use the directory services of Webcrawler (GNN) or one of the other subject catalogs. If you are looking for the best of the web, and your interests are "pop," use Magellan. If you need a fast and reliable search engine, try Alltheweb. If you are doing "serious" research, start with AltaVista, but be prepared to use the other good search engines too, and follow these general rules of thumb:


Other Useful Articles

Beyond Surfing: Tools and Techniques for Searching the Web by Kathleen Webster and Kathryn Paul.

The Webmaster's Guide to Search Engines and Directories by Danny Sullivan.

Yahoo's index to searching the web.

