W. Bruce Croft is a Distinguished Professor in the Department of Computer Science at the University of Massachusetts, Amherst, which he joined in 1979. In 1992, he became the Director of the Center for Intelligent Information Retrieval (CIIR), which combines basic research with technology transfer to a variety of government and industry partners. He has published more than 180 articles related to information retrieval. Dr. Croft was elected a Fellow of ACM in 1997, received the Research Award from the American Society for Information Science and Technology in 2000, and received the Gerard Salton Award from the ACM Special Interest Group on Information Retrieval (SIGIR) in 2003.
Donald Metzler is a Research Scientist in the Search and Computational Advertising group at Yahoo! Research in Santa Clara, CA. He obtained his Ph.D. from the University of Massachusetts in 2007. During his graduate studies he was awarded a Microsoft Live Labs Graduate Fellowship. His research interests include formal information retrieval models, web search, advertising, and machine learning.
Trevor Strohman is a software engineer in the Google search quality division. His Ph.D., from the University of Massachusetts Amherst, focused on high-performance text retrieval systems that are easily adaptable to fit specific retrieval applications. He has published papers and presented a tutorial at the top information retrieval conference, SIGIR. He is the creator of the Galago search engine, and the primary developer of the Indri search engine (www.lemurproject.org/indri). He has ten years of professional software development experience, including desktop, server, and web applications.
Search Engines and Information Retrieval
What is Information Retrieval?
Search Engines
Search Engineers
Book Overview
Architecture of a Search Engine
What is an Architecture?
Basic Building Blocks
Breaking It Down
Text Acquisition
Text Transformation
Index Creation
User Interaction
Ranking
Evaluation
How Does It Really Work?
Crawls and Feeds
Deciding What to Search
Crawling the Web
Directory Crawling
Document Feeds
The Conversion Problem
Storing the Documents
Detecting Duplicates
Removing Noise
Processing Text
Table of Contents provided by Publisher. All Rights Reserved.