Information Retrieval

Information retrieval is the foundation for modern search engines. This text offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus, a multiuser open-source information-retrieval system developed by one of the authors and available online, provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. After an introduction to the basics of information retrieval, the text covers three major topic areas (indexing, retrieval, and evaluation) in self-contained parts. The final part of the book draws on and extends the general material in the earlier parts, treating such specific applications as parallel search engines, Web search, and XML retrieval. End-of-chapter references point to further reading; exercises range from pencil-and-paper problems to substantial programming projects. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.

Complete Table of Contents
Foreword	p. xix
Preface	p. xxi
Notation	p. xxv
Foundations
Introduction	p. 2
What Is Information Retrieval?	p. 2
Information Retrieval Systems	p. 5
Working with Electronic Text	p. 9
Test Collections	p. 23
Open-Source IR Systems	p. 27
Further Reading	p. 28
Exercises	p. 30
Bibliography	p. 32
Basic Techniques
Inverted Indices	p. 33
Retrieval and Ranking	p. 51
Evaluation	p. 66
Summary	p. 76
Further Reading	p. 77
Exercises	p. 79
Bibliography	p. 82
Tokens and Terms
English	p. 85
Characters	p. 91
Character N-Grams	p. 92
European Languages	p. 94
CJK Languages	p. 95
Further Reading	p. 97
Exercises	p. 99
Bibliography	p. 100
Indexing
Static Inverted Indices	p. 104
Index Components and Index Life Cycle	p. 104
The Dictionary	p. 106
Postings Lists	p. 110
Interleaving Dictionary and Postings Lists	p. 114
Index Construction	p. 118
Other Types of Indices	p. 131
Summary	p. 132
Further Reading	p. 132
Exercises	p. 133
Bibliography	p. 135
Query Processing
Query Processing for Ranked Retrieval	p. 137
Lightweight Structure	p. 160
Further Reading	p. 169
Exercises	p. 170
Bibliography	p. 171
Index Compression
General-Purpose Data Compression	p. 175
Symbolwise Data Compression	p. 176
Compressing Postings Lists	p. 191
Compressing the Dictionary	p. 216
Summary	p. 222
Further Reading	p. 223
Exercises	p. 224
Bibliography	p. 225
Dynamic Inverted Indices	p. 228
Batch Updates	p. 229
Incremental Index Updates	p. 231
Document Deletions	p. 243
Document Modifications	p. 250
Discussion and Further Reading	p. 251
Exercises	p. 253
Bibliography	p. 254
Retrieval and Ranking
Probabilistic Retrieval	p. 257
Modeling Relevance	p. 259
The Binary Independence Model	p. 261
The Robertson/Sparck Jones Weighting Formula	p. 264
Term Frequency	p. 266
Document Length: BM25	p. 271
Relevance Feedback	p. 273
Field Weights: BM25F	p. 277
Experimental Comparison	p. 279
Further Reading	p. 280
Exercises	p. 281
Bibliography	p. 282
Language Modeling and Related Methods	p. 286
Generating Queries from Documents	p. 287
Language Models and Smoothing	p. 289
Ranking with Language Models	p. 292
Kullback-Leibler Divergence	p. 296
Divergence from Randomness	p. 298
Passage Retrieval and Ranking	p. 302
Experimental Comparison
Further Reading	p. 306
Exercises	p. 307
Bibliography	p. 307
Categorization and Filtering	p. 310
Detailed Examples	p. 313
Classification	p. 331
Probabilistic Classifiers	p. 339
Linear Classifiers	p. 349
Similarity-Based Classifiers	p. 354
Generalized Linear Models	p. 355
Information-Theoretic Models	p. 359
Experimental Comparison	p. 366
Further Reading	p. 371
Exercises	p. 372
Bibliography	p. 373
Fusion and Metalearning	p. 376
Search-Result Fusion	p. 377
Stacking Adaptive Filters	p. 381
Stacking Batch Classifiers	p. 383
Bagging	p. 385
Boosting	p. 387
Learning to Rank	p. 394
Further Reading	p. 400
Exercises	p. 401
Bibliography	p. 401
Evaluation
Measuring Effectiveness	p. 406
Traditional Effectiveness Measures	p. 407
The Text REtrieval Conference (TREC)	p. 410
Using Statistics in Evaluation	p. 412
Minimizing Adjudication Effort	p. 441
Nontraditional Effectiveness Measures	p. 451
Further Reading	p. 460
Exercises	p. 462
Bibliography	p. 463
Measuring Efficiency	p. 468
Efficiency Criteria	p. 468
Queueing Theory	p. 472
Query Scheduling	p. 478
Caching	p. 479
Further Reading	p. 484
Exercises	p. 484
Bibliography	p. 485
Applications and Extensions
Parallel Information Retrieval	p. 488
Parallel Query Processing	p. 488
MapReduce	p. 498
Further Reading	p. 503
Exercises	p. 504
Bibliography	p. 505
Web Search	p. 507
The Structure of the Web	p. 508
Queries and Users	p. 513
Static Ranking	p. 517
Dynamic Ranking	p. 535
Evaluating Web Search	p. 538
Web Crawlers	p. 541
Summary	p. 541
Further Reading	p. 553
Exercises	p. 556
Bibliography	p. 558
XML Retrieval	p. 564
The Essence of XML	p. 565
Paths, Trees, and FLWORs	p. 571
Indexing and Query Processing	p. 576
Ranked Retrieval	p. 579
Evaluation	p. 583
Further Reading	p. 585
Exercises	p. 587
Bibliography	p. 587
Appendix
Computer Performance	p. 592
Sequential Versus Random Access on Disk	p. 592
Sequential Versus Random Access in RAM	p. 593
Pipelined Execution and Branch Prediction	p. 594
Index	p. 597
Table of Contents provided by Publisher. All Rights Reserved.

ISBN-13: 9780262026512
ISBN-10: 0262026511