9780136072249

Summary

KEY BENEFIT: Written by a leader in the field of information retrieval, this text provides the background and tools needed to evaluate, compare, and modify search engines. KEY TOPICS: Coverage of the underlying IR and mathematical models reinforces key concepts. Numerous programming exercises make extensive use of Galago, a Java-based open-source search engine. MARKET: A valuable tool for search engine and information retrieval professionals.

Author Biography

W. Bruce Croft is a Distinguished Professor in the Department of Computer Science at the University of Massachusetts, Amherst, which he joined in 1979. In 1992, he became the Director of the Center for Intelligent Information Retrieval (CIIR), which combines basic research with technology transfer to a variety of government and industry partners. He has published more than 180 articles related to information retrieval. Dr. Croft was elected a Fellow of ACM in 1997, received the Research Award from the American Society for Information Science and Technology in 2000, and received the Gerard Salton Award from the ACM Special Interest Group on Information Retrieval (SIGIR) in 2003.

Donald Metzler is a Research Scientist in the Search and Computational Advertising group at Yahoo! Research in Santa Clara, CA. He obtained his Ph.D. from the University of Massachusetts in 2007. During his graduate studies he was awarded a Microsoft Live Labs Graduate Fellowship. His research interests include formal information retrieval models, web search, advertising, and machine learning.

Trevor Strohman is a software engineer in the Google search quality division. His Ph.D., from the University of Massachusetts Amherst, focused on high-performance text retrieval systems that are easily adaptable to fit specific retrieval applications. He has published papers and presented a tutorial at the top information retrieval conference, SIGIR. He is the creator of the Galago search engine, and the primary developer of the Indri search engine (www.lemurproject.org/indri). He has ten years of professional software development experience, including desktop, server, and web applications.

Search Engines and Information Retrieval
What is Information Retrieval?
Search Engines
Search Engineers
Book Overview
Architecture of a Search Engine
What is an Architecture?
Basic Building Blocks
Breaking It Down
Text Acquisition
Text Transformation
Index Creation
User Interaction
Ranking
Evaluation
How Does It Really Work?
Crawls and Feeds
Deciding What to Search
Crawling the Web
Directory Crawling
Document Feeds
The Conversion Problem
Storing the Documents
Detecting Duplicates
Removing Noise
Processing Text
Table of Contents provided by Publisher. All Rights Reserved.

Supplemental Materials

What is included with this book?

A new copy of this book will include any supplemental materials advertised. Please check the title of the book to determine whether it should include access cards, study guides, lab manuals, CDs, etc.

Used, rental, and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states that it includes access cards, study guides, lab manuals, CDs, etc.

Search Engines Information Retrieval in Practice

0136072240
