Finding Information on the Web: Strategies and Techniques

A Workshop presented by
Janette R. Hill
Georgia State University






Overview of the Web

Where it’s been...

While the Internet has a rather long history, dating back to the 1960s and the U.S. military’s establishment of ARPANET, the Web, the multimedia-touting sibling of the text-based Internet, is still at the toddler stage. The Web was created in 1990 by a group of scientists at the CERN research lab in Switzerland. Led by Tim Berners-Lee, the scientists hoped to share information with each other more easily as they conducted research and prepared reports.

In 1991, the Web was unleashed to the world and the metamorphosis began. Web browsers, essential tools for viewing Web pages, were created. In 1993, Mosaic, the first full-color graphical browser, was released by Marc Andreessen and his colleagues at the University of Illinois to help make displaying Web pages easier for the everyday user. Other browsers, such as Netscape's Navigator and Microsoft's Internet Explorer, soon followed. Standards for displaying Web information (i.e., HTML) were, and continue to be, refined, enhanced, and expanded.

The system that started off with a few hundred pages has grown exponentially. Recent studies estimate that over 320 million pages of information are now a part of the World Wide Web (Lawrence & Giles, 1998). And the pages -- and the sites that package the pages -- cut across social and cultural institutions, ranging from more traditional information sources (e.g., magazines, journals, newspapers), to businesses (e.g., Apple Computer, Coca-Cola, Volkswagen), to government agencies (e.g., international, national, state and local governments, the United Nations, NASA), to the arts (e.g., the Louvre, the Metropolitan Museum of Art, the Smithsonian), to specific pieces of information posted by individuals (e.g., research papers, how-to sites). The Web continues to grow by leaps and bounds with no end in sight.

Where it’s going...

Given the rapid development of the Web, it is difficult to predict where it might be headed. Instead, it may be more realistic to discuss where it is not going -- at least from experts' best guesses -- and to offer a few preliminary precautions about its use.

When the Web first arrived on the information source scene, many adopted it as the ultimate way to share personal information with the world. Today, the Web is much more than a collection of personal pages published by a few computer nerds. The Web has become one of the hottest applications around for electronic commerce, marketing, public relations, and education.

Another major function of the Web in its infancy was to serve as what many termed the "ultimate digital library." While it did, and from many perspectives still does, serve in this role, trends indicate that in its continued evolution, the Web is moving beyond the library metaphor. In a recent article, Cliff Lynch, the director of library automation at the University of California at Berkeley, argues rather pointedly that the Internet, and its companion the Web, is not the world’s ultimate library for the information age. Rather, the Internet and Web have "...evolved into what might be thought of as a chaotic repository for the collective output of the world’s digital ‘printing presses’" (Lynch, 1997). According to Lynch, we need to make decisions about what we want from the Web and help guide its evolution to meet these goals. Additionally, Lynch expresses the need to produce the tools we will use in order to make information retrieval on the Web more reliable and precise.

Lynch’s points are well articulated and difficult to ignore. Others (see, for example, Kratzert & Richey, 1997) have also discussed the precarious nature of the Web, offering cautions about its use and issues to keep in mind while using the Web as an information source. These include:

This is not to say that the Internet and Web do not have a lot to offer. Several experts have touted the wonders of these information resources. Further, the acceptance of these tools, especially the Web, for sharing information is evident throughout society. Print advertisements for companies and products display URLs (Uniform Resource Locators -- the addresses used to access Web sites); commercials for Web-based products (e.g., WebTV) are running on network TV; radio announcers often include URLs in their information announcements. The Web is quickly permeating our world, changing how we learn, work, and live.

The benefits offered by the Web for sharing and accessing information from around the globe are substantial. However, if we are to take advantage of the promise the Web holds, we need to get smarter about the tools, strategies and techniques we use to seek information in this open-ended system (Hill & Hannafin, 1997). By honing our Web information retrieval processes, we enable more rapid and reliable retrieval of information from this often unwieldy information tool.



Web Information Retrieval (IR) Tools

Note: This information was adapted from a guide established by librarians at the University of Northern Colorado, Greeley.

Web information retrieval tools are vital to finding information on the Web. Selecting the right tool for the type or goal of your search can substantially enhance the information retrieval process. Web IR tools can be broken out into three main categories: directories, search engines, and review sites. Category descriptions and examples of the tools that fall within the categories are listed below.

Directories

Directories are useful for finding information on a general topic. They contain hierarchical lists of subject categories.
 
Yahoo is probably the most popular directory. Its hierarchical listings can be browsed or searched. http://www.yahoo.com
The Virtual Library is a WWW cataloguing project where field experts maintain each topic or division. http://www.w3.org/vl
The Infomine catalog contains resources of relevance to faculty, students, and research staff at the university level. http://lib-www.ucr.edu

 

Search Engines

Search engines provide forms into which you can enter terms. They will return a list of Internet sites that match your search criteria.
 
AltaVista, likely the fastest and most sophisticated engine available, has forms for both simple and advanced searching. A Refine feature is available for narrowing results. http://altavista.digital.com
HotBot is one of the most comprehensive Internet search engines, indexing a larger number of World Wide Web sites and a greater variety of formats than most other search engines. http://www.hotbot.com
Excite offers a large database of Internet sources to search or browse. Excite can be personalized to match an individual's specific information needs. http://www.excite.com
InfoSeek Guide offers options for both searching and browsing, providing information for specific topic areas in an easy-to-find, interactive format. http://guide.infoseek.com
Lycos is one of the most advanced search engines on the World Wide Web. An extensive range of proximity search features is available in the Lycos Pro section. http://www.lycos.com 
Northern Light offers two databases, one for Web pages and the other for citations to over 3,400 publications. Full-text of the articles is available for a fee. http://www.northernlight.com

 

Review Sites

Review sites provide evaluations of World Wide Web resources. Review criteria and rating systems vary from site to site.
 
Magellan offers searching and browsing of a broad mix of reviewed Internet sites. http://www.mckinley.com
Top 5% rates and reviews the best sites on the World Wide Web. Searches cover full text of a site and the review. http://point.lycos.com/categories

 

More specific types of IR tools are also beginning to populate the Web, including:

Image Search Tools

Image search tools are designed to identify images by their colors, shapes, and subject content rather than the names of their files.
 
Yahoo Image Surfer is a directory of images located on popular Web sites. Searching and subject browsing of thumbnail images is possible. http://ipix.yahoo.com
WebSEEk offers a catalog of over 650,000 images and videos across the Web, supporting browsing and searching by subject. http://www.ctr.columbia.edu/webseek

People Search Tools

People search tools are designed to identify people around the world. Various kinds of information can be located, including e-mail addresses, postal addresses, and telephone numbers.
 
Yahoo People Search. http://people.yahoo.com
Bigfoot. http://bigfoot.com/

Other specialized databases can be used to find specific types of information, including:
 
Company Information: SmartScape, Open Market's Commercial Sites Index
News: The New York Times on the Web, USA Today
Jobs: Yahoo's Employment Ads
Education: Education World

 



Selecting a Tool for Searching

While the varied search tool categories can be helpful in narrowing the possibilities, multiple tools are available within each category. This is where rankings or ratings of the IR tools can prove quite useful in guiding decisions about which tool to use. Several leading computer magazines rank the various search tools according to their accuracy, relevancy, and speed. Three of the most recent reviews are summarized below.

PC Magazine, September 1998

Number 1 on the list for search engines: Excite; for subject directories, Yahoo.

Cnet, January 1998

Number 1 for search engines: HotBot and Infoseek

Iworld, December 1997

Number 1 for search engines: AltaVista; in second place, HotBot; in third place, Infoseek

For more extensive reviews and up-to-date ratings, check out the Search Engine Watch Web site.



Strategies for Searching

Strategies for information retrieval on the Web have been offered by a variety of people investigating best practices for Web IR. While no hard and fast rules have been established, a few general tips seem to be consistent across the varied sources.

1. Select the right tool.

The Web offers a variety of search tools (e.g., Yahoo, AltaVista). However, these tools, as well as the databases they search, vary greatly. You can narrow your selection by having an idea of the kinds of searching the tools perform (see the Web IR Tools section) or by selecting a tool based on its ranking by varied experts (see the Selecting a Tool for Searching section).

2. Know how to use the search tool.

If you have not previously used the search tool, look at the instructions for how to search (often reached through a link labeled "Help" near the search entry box). The instructions will provide you with information on the scope and source of the data, how the search engine works, and additional features available to help refine your search. Most search tools change frequently, so recheck the instructions on a regular basis.

Web IR tools perform two types of text searches: keyword and concept. Keyword searching is the most common type of search performed, and almost all Web IR tools use this technique. Keywords may be specified by the author of a Web site/page, or may be determined by the search engine retrieving the information. Keywords are those words considered to be "significant" to the document content, often determined by location on the Web page (e.g., top of the document) or the number of times the term is repeated on the page.
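To make the idea of "significant" words concrete, here is a minimal sketch in Python. It assumes a very simple scoring rule: every occurrence of a word counts once, and occurrences near the top of the page count double. The function name, the parameters (lead_words, lead_weight), and the sample page are all hypothetical and chosen for illustration; real search tools use their own, generally unpublished, rules.

```python
import re
from collections import Counter


def keyword_scores(text, lead_words=20, lead_weight=2.0):
    """Score words by frequency, weighting words near the top of the page.

    Assumption for illustration only: the first `lead_words` words of the
    page count `lead_weight` times as much as words that appear later.
    """
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter()
    for position, word in enumerate(words):
        scores[word] += lead_weight if position < lead_words else 1.0
    return scores


# A made-up page: a short heading followed by repeated body text.
page = "Cats make good pets. " + "Many owners say their cats sleep all day. " * 3
scores = keyword_scores(page, lead_words=4)
top_word, top_score = scores.most_common(1)[0]
print(top_word, top_score)
```

In this sample, "cats" scores highest because it appears both in the opening words and repeatedly in the body, which mirrors the two signals described above (location and repetition).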

Concept searching focuses on returning Web documents about the topic you are searching. The Excite search engine relies on concept-based searching. Rather than focusing specifically on the search terms you have selected, Excite attempts to determine what you mean, and then pulls relevant Web sites. For example, if you enter the term "cat," and the term appears with other terms such as pet or veterinarian, the IR tool might return documents on the topic of cat as an animal. If "cat" appears with terms such as building or tools, the IR tool might return documents on the Caterpillar company.
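The "cat" example can be sketched as a small Python function that guesses a topic from the words that appear alongside the search term. This is only a toy illustration of the idea, not a description of how Excite or any other tool actually works; the cue-word table and the function name are assumptions made up for this sketch.

```python
import re

# Hypothetical cue words for each sense of "cat", taken from the example above.
CONCEPT_CUES = {
    "cat (animal)": {"pet", "veterinarian"},
    "Caterpillar (company)": {"building", "tools"},
}


def guess_concept(document_text, cues=CONCEPT_CUES):
    """Return the concept whose cue words overlap most with the document.

    Returns None when no cue word appears at all.
    """
    words = set(re.findall(r"[a-z]+", document_text.lower()))
    best_concept, best_overlap = None, 0
    for concept, cue_words in cues.items():
        overlap = len(words & cue_words)
        if overlap > best_overlap:
            best_concept, best_overlap = concept, overlap
    return best_concept


print(guess_concept("Take your cat to the veterinarian; every pet needs checkups."))
print(guess_concept("The Cat dealer sells building tools."))
```

The first call should report the animal sense and the second the company sense, because each document shares cue words with only one of the two concepts.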

3. Construct your search.

Keep your search as simple as possible. Use specific terms. Provide alternatives for words with varied spellings (e.g., color, colour). Avoid commonly used terms such as the, of, a. Use the additional features offered by the search tool. Features often found with search tools include:

Several charts are available on the Web outlining the specific features of various IR tools (see, for example, Terry Gray's site, How to search the Web, or the Search Engine Comparison Chart maintained by the Kansas City Public Library). Help features associated with each Web IR tool also detail how to perform advanced searches with the tool.
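Two of the query-construction tips in this step (dropping commonly used terms and providing alternative spellings) can be sketched in Python as follows. The stop-word list and spelling table are small, hypothetical samples built from the examples in this handout; an actual search tool may handle both differently, so check its Help page.

```python
# Hypothetical sample lists, based on the examples given in this handout.
STOP_WORDS = {"the", "of", "a"}
SPELLING_VARIANTS = {
    "color": ["colour"],
    "colour": ["color"],
    "organization": ["organisation"],
    "organisation": ["organization"],
}


def build_queries(phrase):
    """Drop common words, then list the query plus one-word spelling variants."""
    terms = [word for word in phrase.lower().split() if word not in STOP_WORDS]
    queries = [terms]
    for i, term in enumerate(terms):
        for alternative in SPELLING_VARIANTS.get(term, []):
            # Swap in one alternative spelling at a time.
            queries.append(terms[:i] + [alternative] + terms[i + 1:])
    return [" ".join(query) for query in queries]


print(build_queries("the color of a sunset"))
```

Running each of the resulting queries, or combining them with an OR operator where the tool supports one, helps avoid missing pages that use the other spelling.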

4. Check the search results.

Often, a search will not be perfect the first time. Evaluate the results list to help you refine your search (e.g., broaden the topic if it was too narrow).

5. Repeat your search using other search tools.

A single search is usually not sufficient to find a broad representation of information. Even if you have found what you are looking for, it may not be the most accurate or the most current information available. It is always a good idea to search using several tools to get a better representation of what is available.
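One simple way to see what a second tool adds is to compare the two result lists directly. The sketch below is in Python and assumes you have copied the result URLs from each tool into a list by hand; the function name and the example.com addresses are placeholders, not real search results.

```python
def compare_results(results_a, results_b):
    """Group result URLs by whether one tool or both tools returned them."""
    a, b = set(results_a), set(results_b)
    return {
        "both": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Placeholder result lists standing in for two different search tools.
tool_a = ["http://example.com/cats", "http://example.com/pets"]
tool_b = ["http://example.com/pets", "http://example.com/vets"]

summary = compare_results(tool_a, tool_b)
print(summary)
```

A small "both" group and large "only" groups suggest that the two tools cover different parts of the Web, which is the reason for repeating the search.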

Additional Search Tips

1. Use Frequently Asked Question (FAQ) files to find information from experts on a topic.

2. Check Internet newsgroups (special interest discussions on millions of topics) to locate postings by others talking about your topic (see, for example, Deja News Research Service).

3. Be aware of spelling variations (e.g., organisation, organization).

4. Good sites will often lead you to other sites relevant to your topic. Make use of "more like this" options found in some Web IR tools (e.g., Excite).

5. Find a Web page created and maintained by an expert in a particular field or domain.

6. Ask information professionals for assistance.

7. Use your Web browser’s FIND command to locate terms within a document.



Trying Out a Few Techniques

The best way to discover the power and limitations of a Web IR tool is to use it! So, pick a topic and start playing! Here are a few ideas to help you get started...

1. After you have selected your topic, use at least two Web IR tools within a specific category (e.g., directories, search engines) to locate information. Compare the results you retrieve from each tool. In evaluating the IR tools, ask yourself the following questions:

2. Compare your search results across different types of IR tools (e.g., search engines vs. directories).

3. Play around with performing simple searches (e.g., keyword) and advanced searches (e.g., making use of Boolean operators). Compare the results from both types of searches.
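To get a feel for how a simple keyword search differs from a search using Boolean operators, the following Python sketch runs both against three made-up pages. It assumes the common meanings of the operators (AND narrows, OR broadens, NOT excludes) and models them with set operations; the page texts, names, and index structure are hypothetical, and the exact operator syntax varies from tool to tool.

```python
import re

# Three made-up pages standing in for a search tool's database.
DOCS = {
    "page1": "Caring for your cat: pet health tips from a veterinarian",
    "page2": "Cat excavators and building tools for sale",
    "page3": "Dog and pet grooming services",
}


def build_index(docs):
    """Map each word to the set of pages that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def matches(term):
    """Return the pages containing the term (empty set if none)."""
    return INDEX.get(term.lower(), set())


simple = matches("cat")                        # simple keyword search
both = matches("cat") & matches("pet")         # cat AND pet
either = matches("cat") | matches("dog")       # cat OR dog
excluding = matches("cat") - matches("tools")  # cat NOT tools

print(sorted(simple), sorted(both), sorted(either), sorted(excluding))
```

Here the simple search returns both "cat" pages, while adding AND or NOT narrows the results to the pet-related page and OR broadens them to all three pages.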



A Few References

General information on the Web

Hill, J. R., & Hannafin, M. J. (1997). Cognitive strategies and learning from the World Wide Web. Educational Technology Research & Development, 45(4), 37-64.

Kratzert, M., & Richey, D. (1997). Ten Internet myths. College and Undergraduate Libraries, 4(2), 1-8.

Lawrence, S., & Giles, C. L. (1998, April). Searching the World Wide Web. Science, 280, 100.

Lemay, L. (1996). Teach yourself Web publishing with HTML 3.2 in a week (3rd ed.). Indianapolis, IN: SamsNet.

Lynch, C. (1997, March). Searching the Internet. Scientific American. Available on-line: http://www.sciam.com/0397issue/0397lynch.html

Techniques for searching the Web

Barlow, L. (1998). The spider's apprentice -- Tips on searching the Web. http://www.monash.com/spidap.html

Gray, T. A. (date unknown). How to search the Web: A guide to search tools. http://daphne.palomar.edu/TGSEARCH

Kansas City Public Library. Introduction to search engines. http://www.kcpl.lib.mo.us/search/srchengines.htm

Sullivan, D. (1998). Search engine watch. http://searchenginewatch.com/

Tyner, R. (1998). Sink or swim: Internet search tools and techniques (v. 3.5). http://www.ouc.bc.ca/libr/connect96/search.htm

UNC Library (1998). Web search engines. http://www.unco.edu/library/guides/searchengines.htm

Webster, K., & Paul, K. (1996). Beyond surfing: Tools and techniques for searching the Web. http://magi.com/~mmelick/it96jan.html
