- Searching the Web
- The Internet is a vast, constantly evolving sea of information. Search Engines have been designed for one purpose: to allow you to find information on the Internet, including Web sites, Web pages, and Internet files, that matches one or more keywords you enter. Knowledge of Search Engines is essential to using your time productively. Reviewing the descriptions of Archie, Gopher, and Veronica will give you some important background on the origins of Search Engines.
- Although the rules for search terms vary from engine to engine, there are some similarities. Most engines allow you to type in a word or series of words as your search terms. When you press <Enter> or click a button, the engine finds Web sites with information related to your request. You can then click on the resulting hyperlinks to view these sites.
- Some sites allow only simple searches. Others allow complex Boolean searches that use logical operators such as AND, OR, and NOT. To use each site most effectively, read the help file or FAQ (Frequently Asked Questions) available at most sites.
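As a rough illustration of how Boolean operators combine results, here is a minimal Python sketch. It assumes a tiny in-memory index; the page addresses, page text, and function names are invented for the example and do not describe how any particular engine is built.

```python
import re

# Hypothetical sample pages (URL -> text), invented for illustration.
PAGES = {
    "http://example.com/a": "Gopher servers offer menus of Internet files",
    "http://example.com/b": "Search engines index Web pages and Internet files",
    "http://example.com/c": "Web directories list select Web sites by category",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index

def lookup(index, word):
    """Return the URLs containing the word (an empty set if there are none)."""
    return index.get(word.lower(), set())

index = build_index(PAGES)

# AND: pages containing both words (set intersection).
both = lookup(index, "internet") & lookup(index, "files")
# OR: pages containing either word (set union).
either = lookup(index, "gopher") | lookup(index, "directories")
# NOT: pages containing the first word but not the second (set difference).
web_not_search = lookup(index, "web") - lookup(index, "search")
```

In this sketch each operator is simply a set operation on the lists of matching pages, which is why AND narrows a search while OR widens it.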
Search Engines have evolved quickly. They now fall into four distinct categories:
- Search Directories
- Search Engines
- Super Engines
- Meta Search Engines or Multiple Engine Search
- Directories:
- A search directory is a search tool that allows users to navigate to areas of interest without having to enter keywords. Surfing search directories is simply a matter of following the links to the specific topic you are looking for. Search directories are usually arranged in categories such as technology, education, sports, entertainment, or business. Search tools covering a specific discipline, such as education, are also available on the Web.
- Yahoo, Magellan and Look Smart are Web directories. When you search on Yahoo you are searching a database of select Web sites, NOT the entire World Wide Web. Every site is assigned a category (subject) in the directory. Yahoo's search function allows a keyword search, or you can click on the appropriate category or subject and navigate through the directory's hierarchy.
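The two ways of using a directory, following category links or running a keyword search over the directory itself, might be pictured with the short Python sketch below. The categories, addresses, and function names are hypothetical; a real directory is far larger and is maintained by editors.

```python
# Hypothetical directory: categories nest, and leaves are lists of site URLs.
DIRECTORY = {
    "Education": {
        "K-12": ["http://example.org/lesson-plans"],
        "Universities": ["http://example.edu/"],
    },
    "Sports": {
        "Baseball": ["http://example.com/baseball"],
    },
}

def browse(directory, path):
    """Follow a list of category names down the hierarchy, one link at a time."""
    node = directory
    for category in path:
        node = node[category]
    return node

def keyword_search(directory, keyword, trail=()):
    """Return the category paths whose names contain the keyword."""
    matches = []
    for name, child in directory.items():
        here = trail + (name,)
        if keyword.lower() in name.lower():
            matches.append(here)
        if isinstance(child, dict):
            matches.extend(keyword_search(child, keyword, here))
    return matches
```

For example, `browse(DIRECTORY, ["Education", "K-12"])` returns the sites listed under that category, while `keyword_search(DIRECTORY, "base")` finds the Baseball category. Note that the keyword search only looks inside the directory, not across the whole Web.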
- Search Engines:
Some search engines look for simple word matches, and others allow for more specific searches on a series of words or an entire phrase. Search engines do not search the entire Internet. Instead, they search an index or database of Internet sites and documents. Search tool companies are continuously updating their databases. Because the Internet is growing exponentially, and because search engines scan different parts of the Internet in different ways, performing the same search on different search engines will often yield different results.
- Infoseek, Webcrawler and Lycos build their indexes through the use of software "robots" or "spiders" that crawl around the Web indexing and cataloging Web site content. Each robot is designed differently and behaves differently. Robots look for words in the titles, descriptions and "Meta tags" (keywords) that a Web site producer assigns to a Web page. The number of times a keyword appears and where it appears determine the relevancy score that you see (the percentage that appears in the search results). One robot may give greater weight to keywords in the description, whereas another may give greater weight to the number of times a word appears. This would explain why the same page may show a relevancy of 100% on Lycos but 90% on Infoseek.
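To see why two robots can rank the same page differently, consider the following minimal Python sketch. It assumes a simple weighted count of keyword occurrences in each field; the weights, the page record, and the robot names are made up for illustration, and real engines do not publish their formulas.

```python
def relevancy(page, keyword, weights):
    """Score a page as a weighted count of keyword hits in each field."""
    keyword = keyword.lower()
    score = 0
    for field, weight in weights.items():
        words = page.get(field, "").lower().split()
        score += weight * words.count(keyword)
    return score

# Hypothetical page record with the fields a robot might read.
PAGE = {
    "title": "Gopher guide",
    "description": "A guide to gopher menus and gopher servers",
    "meta": "gopher internet menus",
}

# Two made-up robots: one favours the title, the other the description.
ROBOT_A = {"title": 5, "description": 1, "meta": 2}
ROBOT_B = {"title": 1, "description": 3, "meta": 1}

score_a = relevancy(PAGE, "gopher", ROBOT_A)
score_b = relevancy(PAGE, "gopher", ROBOT_B)
```

The same page and the same keyword produce two different scores because each robot weighs the fields differently. An engine would then convert such raw scores into the percentage shown in its results.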
- Super Engines
- Hot Bot, Altavista, Excite and Open Text also use robots and spiders, but in addition to indexing keywords from the titles, descriptions and Meta tags, these robots index keywords from the text of the pages themselves. They give you "hits" on keywords that lie much deeper in the content of a Web page. Relevancy scoring differs for each Search Engine.
Commercial producers of Web pages are extremely savvy in their positioning tactics. Their job is to get their pages seen, and they use their knowledge of robots to make their Web sites appear in the top 5 or 10 results of a keyword search.
- Meta Search Engines or Multiple Engine Search
- Dogpile, Cyber411, Savvy Search, and MetaCrawler allow you to perform a keyword search in many Search Engines at the same time. You can use these to search Infoseek, Altavista and many others simultaneously. The speed, number, and presentation of results vary. Most will give you the top ten hits from each of the Search Engines (or allow you to set the number and presentation of your search results).
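The idea of sending one query to several engines and keeping the top hits from each can be sketched in Python as follows. The two "engines" here are hypothetical stand-in functions that return fixed lists, and the sketch queries them one after another rather than simultaneously.

```python
def meta_search(query, engines, top_n=10):
    """Send one query to every engine and keep the top_n hits from each."""
    results = {}
    for name, search in engines.items():
        results[name] = search(query)[:top_n]
    return results

def merge(results):
    """Flatten per-engine results into one list, dropping duplicate URLs."""
    seen, merged = set(), []
    for hits in results.values():
        for url in hits:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical stand-ins for real engines; each returns a ranked URL list.
def engine_one(query):
    return ["http://example.com/1", "http://example.com/2", "http://example.com/3"]

def engine_two(query):
    return ["http://example.com/2", "http://example.com/4"]

ENGINES = {"EngineOne": engine_one, "EngineTwo": engine_two}
per_engine = meta_search("gopher", ENGINES, top_n=2)
combined = merge(per_engine)
```

The `top_n` parameter plays the role of the setting that lets you choose how many results to take from each engine. Whether duplicates are merged or shown separately is a presentation choice that varies between services.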
- Special Search Engines
- DejaNews is a Search Engine that indexes the content of Usenet newsgroups. Another, Infospace, has an index that includes addresses and phone numbers for anyone who is listed in a U.S. phone book. FTP Search searches the contents of FTP archive sites. The number of specialty Search Engines is growing rapidly, and you'll notice that some of the Meta Search Engines (like Dogpile) have responded by including them in their list of engines that are searched.
Internet Archive crawls the Web to preserve Web sites. As of April 1997, Altavista's chief technical officer said his Search Engine had crawled at least 100 million pages and thought it reasonable to assume that there were at least 150 million pages out there (http://searchenginewatch.com/facts/major.html).
- Alta Vista
- Archie
- DejaNews
- The Electronic Library
- Excite
- Gophers
- FTP
- Infoseek
- Lycos
- Magellan
- MetaCrawler
- Net Search
- Northern Light
- Open Text Index
- The Webcrawler
- WhoWhere?
- Yahoo!
- Veronica
- FTP
- FTP stands for File Transfer Protocol. FTP is a special way to log in to another Internet site to retrieve and/or send files. It is both a method of transferring files over the Internet and a protocol for encoding and decoding these files. FTP makes it possible to send and receive files between different kinds of computers. It is important to remember that in the early days of computing, this was an impossible task. The development of FTP was therefore a huge step in the history of the Internet, because it allowed computers to "talk" to and "understand" each other. Access to a computer to transfer files may or may not require a password. Many Internet sites have publicly accessible material that can be obtained using FTP by logging in with the name "anonymous." These sites are aptly called "anonymous FTP servers".
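For readers who want to see an anonymous FTP session in code, here is a minimal sketch using Python's standard ftplib module. The function names are invented for this example, and any host name you pass in is your own choice; the sketch assumes the server accepts anonymous logins.

```python
from ftplib import FTP

def list_anonymous(host, directory="/"):
    """Log in as "anonymous" and return the names in a directory."""
    with FTP(host) as ftp:
        ftp.login()          # no arguments: ftplib logs in as "anonymous"
        ftp.cwd(directory)
        return ftp.nlst()

def download(host, remote_path, local_path):
    """Retrieve one file in binary mode from an anonymous FTP server."""
    with FTP(host) as ftp, open(local_path, "wb") as out:
        ftp.login()
        ftp.retrbinary("RETR " + remote_path, out.write)
```

A call such as `list_anonymous("ftp.example.org")` (a placeholder host) would return the names of the files in the server's top-level directory, and `download` would copy one of them to your own computer.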
- Archie
- As the Internet grew, Search Engines were needed because it became increasingly difficult to find a site that had the file you were searching for. Archie was a program that created an archive of FTP sites (resources that are stored on Internet-based FTP servers). Archie is short for Archive because it performs an archive search for resources. The drawback was that Archie was purely text-based and not easy to use.
- Gopher
- Gopher is a format and a resource for providing information on the Internet. It was created at the University of Minnesota and was widely successful in providing menus of the material available on the Internet. Although there are still thousands of Gopher servers on the Internet, the World Wide Web, which is built on hypertext, has pretty much taken over. Yet there are still some excellent resources for classrooms, as well as opportunities for collaborative projects, available in Gopherspace. A Gopher allows you to browse the Internet for information and retrieve files through a multi-level menu system. The following address will take you to a directory of Gopher sites: gopher://gopher.tc.umn.edu
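To make the menu system more concrete, the Python sketch below parses the text of a Gopher menu into items. It assumes the tab-separated line layout described in RFC 1436 (item type and label, selector, host, port); the sample menu, host name, and class name are invented for illustration.

```python
from typing import NamedTuple

class MenuItem(NamedTuple):
    kind: str       # one-character item type, e.g. "0" file, "1" submenu
    label: str      # text shown to the user
    selector: str   # string sent back to the server to fetch the item
    host: str
    port: int

def parse_menu(raw):
    """Parse a Gopher menu (tab-separated fields, one item per line)."""
    items = []
    for line in raw.splitlines():
        if line == ".":          # a lone period ends the listing
            break
        fields = line.split("\t")
        if len(fields) < 4:
            continue             # skip malformed lines
        items.append(MenuItem(fields[0][:1], fields[0][1:],
                              fields[1], fields[2], int(fields[3])))
    return items

# Hypothetical menu text in the layout described by RFC 1436.
SAMPLE = (
    "1Classroom Projects\t/projects\tgopher.example.edu\t70\r\n"
    "0About This Server\t/about.txt\tgopher.example.edu\t70\r\n"
    ".\r\n"
)
menu = parse_menu(SAMPLE)
submenus = [item.label for item in menu if item.kind == "1"]
```

Each item of type "1" leads to another menu, which is what gives Gopher its multi-level structure. Choosing an item means sending its selector to the listed host and port.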
- Veronica
- Later, when the menu interface to the Internet Gopher was developed, there were Gophers everywhere, but finding the one you were looking for was a problem. Veronica (Very Easy Rodent Oriented Net-wide Index to Computer Archives) was developed to solve this problem. Both Archie and Veronica were simple indexes of registered sites. The Web has no such central registration process, and this, in addition to the vast proliferation of Web sites, poses monumental search problems far beyond those faced in the Internet's early days. As a result, today's Search Engines are much more sophisticated.
Net Search is the Search Engine of the Netscape Corporation. This engine works best if you know what topic your search falls under. Topics cover a very broad range from Science & Technology to Arts and Entertainment. This site uses simple syntax for its searches.
- Northern Light
- Northern Light presorts search results and groups them into folders based on where they come from and what they are. It also has a special collection of information from over 4,500 business magazines, trade journals, news wires, and academic journals.
Open Text Index is one of the deepest Search Engines on the Web. The Open Text Index searches every word of every Web page indexed! You can type queries of almost any length and narrow your search by limiting it to titles or links only. You can search for a single word, a phrase of any length or any combination. There is full support for simple Boolean terms like AND, OR and NOT and also more advanced terms like BUT NOT, NEAR, and FOLLOWED BY. If your search comes up with too many hits, you can limit the search by asking for URLs only, or searching only titles and headings of pages.
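Operators such as NEAR and FOLLOWED BY depend on knowing where each word sits on a page, not just whether it appears. The Python sketch below shows one way this could work. The five-word distance is an assumption chosen for the example, not a documented Open Text setting, and the function names are hypothetical.

```python
def positions(text):
    """Map each lowercase word to the list of positions where it occurs."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        index.setdefault(word, []).append(pos)
    return index

def near(text, a, b, distance=5):
    """True if a and b occur within `distance` words of each other."""
    index = positions(text)
    return any(abs(i - j) <= distance
               for i in index.get(a, []) for j in index.get(b, []))

def followed_by(text, a, b, distance=5):
    """True if b occurs after a, at most `distance` words later."""
    index = positions(text)
    return any(0 < j - i <= distance
               for i in index.get(a, []) for j in index.get(b, []))

def but_not(text, a, b):
    """True if a occurs and b does not."""
    index = positions(text)
    return a in index and b not in index

TEXT = "gopher menus made files easy to find before the web"
```

With this sample text, `near(TEXT, "gopher", "files")` is true because the words are three positions apart, while `followed_by(TEXT, "files", "gopher")` is false because the order is reversed.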
The Lycos Search Engine claims to be the only complete guide to the Internet, boasting the largest catalog of URLs, a directory of the most popular sites, links to real-time news, and reviews of the Web's top sites. It is also organized by topics, similar to Net Search. There are three parts to Lycos technology. First, Lycos uses computer programs called spiders to constantly scan the Internet and keep track of new documents as they appear. These spiders also scan for changes to and deletions of existing documents. The second part of Lycos is the database. Lycos adds, deletes, and updates about 50,000 documents every day! The third part of Lycos is the Search Engine, which provides an efficient method of finding and retrieving information from the database.
- Magellan, McKinley's Internet Guide, provides reviews and ratings for a vast collection of Web, FTP, and Gopher sites, and Usenet newsgroups.
- Teachers concerned about the availability of adult-oriented material will appreciate the Green Light feature of this site. All reviewed sites that contain no objectionable material are marked with Magellan's Green Light.
- You can browse Magellan topics or search for specific keywords or phrases. Keep your searches simple and avoid using Boolean operators like AND. Magellan assumes that if you list several words, you want Web sites that contain all of them.
MetaCrawler is part of the Go2Net network. MetaCrawler takes your search request and runs it against Altavista, Excite, Infoseek, Lycos, Thunderstone, Webcrawler, and Yahoo! to return a single list of results that come from every corner of the Web. You can use MetaCrawler to make a very fast sweep of the Web to make sure you're on the right track.
Excite claims to track down information by searching for concepts rather than keywords alone. Its database is updated once a week. The database also contains over 50,000 Web page reviews written by journalists, as well as the latest two weeks of Usenet news and classifieds.
Alta Vista claims, as does Lycos, to be the largest Web index on the Net. Its database consists of all 10 billion words found in over 21 million Web pages. It also provides a full-text index of over 13,000 newsgroups, updated in real time.
Yahoo! is a pioneer of Internet guides. Yahoo! has organized a directory of Web pages into a hierarchy. If you know your topic and which of the Yahoo! categories it falls under, this engine is a good choice. It is fast, but you still have to find your way through many menus. It does not do an extensive Web search, and it does not allow you to view summaries.
WhoWhere? is an electronic White Pages for locating people and organizations on the Internet. WhoWhere? is fast and easy to use. It even handles misspelled or incomplete names, and lets you search by just initials.
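Tolerating misspelled names and searching by initials can both be approximated with a few lines of Python. The sketch below uses the standard difflib module for approximate matching; the list of people, the function names, and the 0.6 similarity cutoff are assumptions made for the example and say nothing about how WhoWhere? itself works.

```python
from difflib import get_close_matches

# Hypothetical directory entries, invented for illustration.
PEOPLE = ["John Smith", "Jane Doe", "Maria Garcia"]

def find_person(name, people, cutoff=0.6):
    """Return listed names that closely resemble the (possibly misspelled) name."""
    lowered = {p.lower(): p for p in people}
    close = get_close_matches(name.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[c] for c in close]

def find_by_initials(initials, people):
    """Return names whose word initials match, e.g. "JS" for John Smith."""
    wanted = initials.replace(".", "").replace(" ", "").upper()
    return [p for p in people
            if "".join(w[0] for w in p.split()).upper() == wanted]
```

Here `find_person("Jon Smiht", PEOPLE)` still finds John Smith despite two typing errors, and `find_by_initials("J.D.", PEOPLE)` finds Jane Doe.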
shareware.com is the place to be if you're looking for shareware. It contains over 190,000 software files. Every day a PC or Macintosh title is reviewed.
The Electronic Library is a comprehensive digital archive that can be explored for a small monthly fee. However, the site is currently offering a free 30-day trial period. Users ask questions in plain English, and the Electronic Library launches a comprehensive search through more than 150 full-text newspapers, over 900 full-text magazines, two international news wires, two thousand classic books, hundreds of maps, thousands of photographs, and major works of literature and art! What a site!
DejaNews is the source for Internet newsgroups. The company claims that this is the world's largest publicly searchable Usenet news archive. Search options allow you to find articles by date, author, subject, and newsgroup. Usenet is a very powerful Internet resource, and DejaNews helps you use it.
InfoSeek is a useful directory of reviews of popular Internet resources. These include Usenet newsgroups, Web sites, and FTP and Gopher sites. The reviews are cross-referenced across multiple topics. InfoSeek performs case-sensitive searches, which are useful when you are looking for specific titles, names, or phrases. InfoSeek allows you to search either the entire World Wide Web or only particular categorized sites.
Accufind lets you type in a search phrase just once and then takes you straight to the result pages from many Search Engines. Accufind leads you to the best places to search and saves you considerable time in doing so.
The Webcrawler is a search tool operated by America Online, Inc. However, you don't need to belong to AOL to use it. The Webcrawler finds resources on the Web that match your search inquiries.