Introduction to Information Retrieval

  • Format: Hardcover
  • Copyright: 2008-07-07
  • Publisher: Cambridge University Press
List Price: $74.99


Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering, starting from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, which makes the book well suited to introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.

Author Biography

Christopher D. Manning is Associate Professor of Computer Science and Linguistics at Stanford University.

Prabhakar Raghavan is Head of Yahoo! Research and a Consulting Professor of Computer Science at Stanford University.

Hinrich Schütze is Chair of Theoretical Computational Linguistics at the Institute for Natural Language Processing, University of Stuttgart.

Table of Contents

Table of Notation
Preface
Boolean retrieval
  An example information retrieval problem
  A first take at building an inverted index
  Processing Boolean queries
  The extended Boolean model versus ranked retrieval
  References and further reading
The term vocabulary and postings lists
  Document delineation and character sequence decoding
  Determining the vocabulary of terms
  Faster postings list intersection via skip pointers
  Positional postings and phrase queries
  References and further reading
Dictionaries and tolerant retrieval
  Search structures for dictionaries
  Wildcard queries
  Spelling correction
  Phonetic correction
  References and further reading
Index construction
  Hardware basics
  Blocked sort-based indexing
  Single-pass in-memory indexing
  Distributed indexing
  Dynamic indexing
  Other types of indexes
  References and further reading
Index compression
  Statistical properties of terms in information retrieval
  Dictionary compression
  Postings file compression
  References and further reading
Scoring, term weighting, and the vector space model
  Parametric and zone indexes
  Term frequency and weighting
  The vector space model for scoring
  Variant tf-idf functions
  References and further reading
Computing scores in a complete search system
  Efficient scoring and ranking
  Components of an information retrieval system
  Vector space scoring and query operator interaction
  References and further reading
Evaluation in information retrieval
  Information retrieval system evaluation
  Standard test collections
  Evaluation of unranked retrieval sets
  Evaluation of ranked retrieval results
  Assessing relevance
  A broader perspective: System quality and user utility
  Results snippets
  References and further reading
Relevance feedback and query expansion
  Relevance feedback and pseudo relevance feedback
  Global methods for query reformulation
  References and further reading
XML retrieval
  Basic XML concepts
  Challenges in XML retrieval
  A vector space model for XML retrieval
  Evaluation of XML retrieval
  Text-centric versus data-centric XML retrieval
  References and further reading
Probabilistic information retrieval
  Review of basic probability theory
  The probability ranking principle
  The binary independence model
  An appraisal and some extensions
  References and further reading
Language models for information retrieval
  Language models
  The query likelihood model
  Language modeling versus other approaches in information retrieval
  Extended language modeling approaches
  References and further reading
Text classification and Naive Bayes
  The text classification problem
  Naive Bayes text classification
  The Bernoulli model
  Properties of Naive Bayes
  Feature selection
  Evaluation of text classification
  References and further reading
Vector space classification
  Document representations and measures of relatedness in vector spaces
  Rocchio classification
  k nearest neighbor
  Linear versus nonlinear classifiers
  Classification with more than two classes
  The bias-variance tradeoff
  References and further reading
Support vector machines and machine learning on documents
  Support vector machines: The linearly separable case
  Extensions to the support vector machine model
  Issues in the classification of text documents
  Machine-learning methods in ad hoc information retrieval
  References and further reading
Flat clustering
  Clustering in information retrieval
  Problem statement
  Evaluation of clustering
  K-means
  Model-based clustering
  References and further reading
Hierarchical clustering
  Hierarchical agglomerative clustering
  Single-link and complete-link clustering
  Group-average agglomerative clustering
  Centroid clustering
  Optimality of hierarchical agglomerative clustering
  Divisive clustering
  Cluster labeling
  Implementation notes
  References and further reading
Matrix decompositions and latent semantic indexing
  Linear algebra review
  Term-document matrices and singular value decompositions
  Low-rank approximations
  Latent semantic indexing
  References and further reading
Web search basics
  Background and history
  Web characteristics
  Advertising as the economic model
  The search user experience
  Index size and estimation
  Near-duplicates and shingling
  References and further reading
Web crawling and indexes
  Overview
  Crawling
  Distributing indexes
  Connectivity servers
  References and further reading
Link analysis
  The Web as a graph
  PageRank
  Hubs and authorities
  References and further reading
Bibliography
Index
Table of Contents provided by Ingram. All Rights Reserved.
