The Apostolate

Evaluating Ranked Queries in Limited Time and Memory for Information Retrieval for Distributed Digital Libraries
June Boltzis
LILAC Centre, School of Library Studies, Clyde College, Elgin, Australia
Abstract
Ranking techniques are used to evaluate natural-language queries on text databases. Text databases are an important component of digital libraries. Effective ranking can be costly in memory and time: the database may contain millions of documents and queries can contain large numbers of terms. These information retrieval systems must access large volumes of text, often divided into several collections that may be held on separate machines. In many environments, such as current desktop computers, standard CPU speeds and volumes of memory are more than adequate to rapidly resolve queries, even on databases of many gigabytes of text. Techniques for locating answers to queries must therefore consider identification of probable collections as well as identification of documents that are probable answers, to avoid the situation in which all queries must be answered in full by all servers. In other environments, however, both memory and time are limited: examples include Internet search engines, corporate data servers, online product databases, and, at the other extreme, handheld computers with PCMCIA-slot disk drives. In this paper we show that use of centralised blocked indexes, expressly designed for a multi-collection environment, can meet these objectives and simultaneously reduce overall query processing costs.
1 Introduction
The use of information retrieval systems for management of text data is widespread, and their use is likely to accelerate with the advent of the digital library. Newspaper archives, library catalogues, and legislation repositories all require access by record content if they are to be useful and effective. All of these techniques reduce the time or memory required to resolve a query. However, they do not necessarily bound it.
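The difference between reducing query memory and bounding it can be illustrated with a small sketch. The code below is not the method of this paper: it is a generic, assumed illustration of ranking over an inverted index in which the number of score accumulators is capped. The names build_index, ranked_query, and max_accumulators, and the simple TF-IDF weighting, are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def ranked_query(index, num_docs, query, max_accumulators, top_k=3):
    """Rank documents by a simple TF-IDF sum, holding at most
    max_accumulators partial scores in memory (illustrative weighting only)."""
    terms = [t for t in set(query.lower().split()) if t in index]
    # Process rare terms first so the most selective terms claim accumulators.
    terms.sort(key=lambda t: (len(index[t]), t))
    accumulators = {}
    for term in terms:
        idf = math.log(1 + num_docs / len(index[term]))
        for doc_id, tf in index[term]:
            weight = (1 + math.log(tf)) * idf
            if doc_id in accumulators:
                accumulators[doc_id] += weight
            elif len(accumulators) < max_accumulators:
                accumulators[doc_id] = weight
            # Otherwise the cap is reached: documents not yet tracked are skipped.
    ranked = sorted(accumulators.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        "digital library text retrieval",
        "text compression methods",
        "distributed digital library index",
        "cooking recipes",
    ]
    index = build_index(docs)
    print(ranked_query(index, len(docs), "digital library index", max_accumulators=1))
    print(ranked_query(index, len(docs), "digital library index", max_accumulators=10))
```

With the cap set to 1, only the document matched by the rarest query term is scored; with a generous cap, every matching document is scored. The cap gives a hard upper limit on accumulator memory at the cost of possibly missing some answers, which is the trade-off a bounded scheme has to manage.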



