The Apostolate

Evaluating Ranked Queries in Limited Time and Memory for Information Retrieval for Distributed Digital Libraries
June Boltzis
LILAC Centre, School of Library Studies, Clyde College, Elgin, Australia
Abstract
Ranking techniques are used to evaluate natural-language queries on text databases, which are an important component of digital libraries. Effective ranking can be costly in memory and time: the database may contain millions of documents, and queries can contain large numbers of terms. These information retrieval systems must access large volumes of text, often divided into several collections that may be held on separate machines. Techniques for locating answers to queries must therefore consider identification of probable collections as well as identification of documents that are probable answers, to avoid the situation in which all queries must be answered in full by all servers. In many environments, such as current desktop computers, standard CPU speeds and volumes of memory are more than adequate to rapidly resolve queries, even on databases of many gigabytes of text. In other environments, however, both memory and time are limited: examples include Internet search engines, corporate data servers, online product databases, and, at the other extreme, handheld computers with PCMCIA-slot disk drives. In this paper we show that use of centralised blocked indexes, expressly designed for a multi-collection environment, can meet these objectives and simultaneously reduce overall query processing costs.
1 Introduction
The use of information retrieval systems for management of text data is widespread, and their use is likely to accelerate with the advent of the digital library. Newspaper archives, library catalogues, and legislation repositories all require access by record content if they are to be useful and effective. All of these techniques reduce the time or memory required to resolve a query. However, they do not necessarily bound it.
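The difference between reducing the memory needed for a query and bounding it can be illustrated with a small sketch. The code below is not taken from this paper: it assumes a simple TF-IDF weighting and a hard cap on the number of score accumulators, and the names build_index, rank, and max_accumulators are hypothetical, chosen only for illustration. Once the cap is reached, documents that have not yet been seen are ignored, so memory use during ranking cannot exceed the cap regardless of query length or collection size.

```python
import heapq
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def rank(query, index, num_docs, max_accumulators, top_k=3):
    """Term-at-a-time ranking with a hard cap on accumulators (an assumption)."""
    # Process the rarest terms first, so the limited accumulators are
    # given to documents matched by the most discriminating terms.
    terms = sorted({t for t in query.lower().split() if t in index},
                   key=lambda t: (len(index[t]), t))
    acc = {}
    for term in terms:
        idf = math.log(1 + num_docs / len(index[term]))
        for doc_id, tf in index[term]:
            weight = (1 + math.log(tf)) * idf
            if doc_id in acc:
                acc[doc_id] += weight
            elif len(acc) < max_accumulators:
                acc[doc_id] = weight
            # Otherwise the cap is reached and new documents are ignored.
    # Highest score first; ties are broken in favour of the lower doc_id.
    return heapq.nlargest(top_k, acc.items(), key=lambda kv: (kv[1], -kv[0]))


docs = [
    "digital library text retrieval",
    "text compression",
    "ranking queries on text databases",
    "digital library ranking",
]
index = build_index(docs)
bounded = rank("digital ranking text", index, len(docs), max_accumulators=2)
print(bounded)
```

With the cap set to 2, only the two documents matched by the first (rarest) term ever receive accumulators; later terms can raise their scores but cannot admit new candidates. The cost of this design choice is that a relevant document matched only by common terms may be missed, which is the usual trade-off when memory is bounded rather than merely reduced.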

