An Introduction to Information Retrieval


Draft of April 1, 2009

Online edition (c) 2009 Cambridge UP

An Introduction to Information Retrieval

Christopher D. Manning
Prabhakar Raghavan
Hinrich Schütze

Cambridge University Press
Cambridge, England

DRAFT! DO NOT DISTRIBUTE WITHOUT PRIOR PERMISSION

© 2009 Cambridge University Press
By Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze
Printed on April 1, 2009

Website: http://www.informationretrieval.org/
Comments, corrections, and other feedback most welcome at: informationretrieval@yahoogroups.com

DRAFT! © April 1, 2009 Cambridge University Press. Feedback welcome.

Brief Contents

1 Boolean retrieval
2 The term vocabulary and postings lists
3 Dictionaries and tolerant retrieval
4 Index construction
5 Index compression
6 Scoring, term weighting and the vector space model
7 Computing scores in a complete search system
8 Evaluation in information retrieval
9 Relevance feedback and query expansion
10 XML retrieval
11 Probabilistic information retrieval
12 Language models for information retrieval
13 Text classification and Naive Bayes
14 Vector space classification
15 Support vector machines and machine learning on documents
16 Flat clustering
17 Hierarchical clustering
18 Matrix decompositions and latent semantic indexing
19 Web search basics
20 Web crawling and indexes
21 Link analysis


Contents

List of Tables
List of Figures
Table of Notation
Preface

1 Boolean retrieval
1.1 An example information retrieval problem
1.2 A first take at building an inverted index


