Draft of April 1, 2009
Online edition (c) 2009 Cambridge UP
An Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
Cambridge University Press
Cambridge, England
DRAFT! DO NOT DISTRIBUTE WITHOUT PRIOR PERMISSION
© 2009 Cambridge University Press
By Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze
Printed on April 1, 2009
Website: http://www.informationretrieval.org/
Comments, corrections, and other feedback most welcome at:
informationretrieval@yahoogroups.com
DRAFT! © April 1, 2009 Cambridge University Press. Feedback welcome.
Brief Contents
1 Boolean retrieval
2 The term vocabulary and postings lists
3 Dictionaries and tolerant retrieval
4 Index construction
5 Index compression
6 Scoring, term weighting and the vector space model
7 Computing scores in a complete search system
8 Evaluation in information retrieval
9 Relevance feedback and query expansion
10 XML retrieval
11 Probabilistic information retrieval
12 Language models for information retrieval
13 Text classification and Naive Bayes
14 Vector space classification
15 Support vector machines and machine learning on documents
16 Flat clustering
17 Hierarchical clustering
18 Matrix decompositions and latent semantic indexing
19 Web search basics
20 Web crawling and indexes
21 Link analysis
Contents
List of Tables
List of Figures
Table of Notation
Preface
1 Boolean retrieval
1.1 An example information retrieval problem
1.2 A first take at building an inverted index