Self-indexing inverted files for fast text retrieval

Article Properties

Language

English
DOI (url)

10.1145/237496.237497
Publication Date

1996/10/01
Journal

ACM Transactions on Information Systems
Indian UGC (Journal)
References

35
Citations

91
Alistair Moffat, Univ. of Melbourne, Victoria, Australia
Justin Zobel, RMIT, Victoria, Australia

Cite

Moffat, Alistair, and Justin Zobel. “Self-Indexing Inverted Files for Fast Text Retrieval”. ACM Transactions on Information Systems, vol. 14, no. 4, 1996, pp. 349-79, https://doi.org/10.1145/237496.237497.

Moffat, A., & Zobel, J. (1996). Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 14(4), 349-379. https://doi.org/10.1145/237496.237497

Moffat A, Zobel J. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems. 1996;14(4):349-79.

Journal Categories

Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Science: Science (General): Cybernetics: Information theory
Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Technology: Technology (General): Industrial engineering. Management engineering: Information technology

Description

This research introduces a self-indexing strategy that speeds up query processing on large text databases. An internal index is incorporated into each compressed inverted list, so a query can jump to the relevant part of a list instead of decoding the list from its start. Experimental results on a collection of nearly two million short documents show that self-indexing significantly reduces processing time for both conjunctive Boolean queries and ranked queries, while adding only a small overhead to the size of the compressed inverted file. The strategy offers a practical way to improve the performance of text retrieval systems.
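The idea of an index inside each compressed inverted list can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes variable-byte coding of document-number gaps as a simple stand-in for whatever code the paper uses, it keeps the skip entries in a separate Python list rather than interleaved in the compressed list, and all names (`SkippedList`, `conjunctive_and`, `block_size`, `decoded_blocks`) are hypothetical.

```python
from bisect import bisect_left


def vbyte_encode(n):
    """Variable-byte encode a non-negative integer (high bit marks the last byte)."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string of concatenated variable-byte integers."""
    values, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:  # final byte of the current integer
            values.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return values


class SkippedList:
    """Gap-compressed inverted list split into blocks, with one skip entry per block.

    Assumes doc_ids is sorted, strictly increasing, and non-negative.
    """

    def __init__(self, doc_ids, block_size=4):
        self.last_ids = []       # skip index: largest document number in each block
        self.blocks = []         # compressed gaps, one byte string per block
        self.decoded_blocks = 0  # instrumentation only, to show how little is decoded
        prev = 0
        for i in range(0, len(doc_ids), block_size):
            chunk = doc_ids[i:i + block_size]
            buf = bytearray()
            for d in chunk:
                buf += vbyte_encode(d - prev)  # store the gap, not the document number
                prev = d
            self.blocks.append(bytes(buf))
            self.last_ids.append(chunk[-1])

    def contains(self, doc_id):
        # Use the skip index to find the only block that could hold doc_id.
        b = bisect_left(self.last_ids, doc_id)
        if b == len(self.blocks):
            return False
        current = self.last_ids[b - 1] if b > 0 else 0
        self.decoded_blocks += 1
        for gap in vbyte_decode(self.blocks[b]):
            current += gap
            if current >= doc_id:
                return current == doc_id
        return False


def conjunctive_and(candidates, long_list):
    """Keep the candidates (e.g. from a rarer term) that also occur in long_list."""
    return [d for d in candidates if long_list.contains(d)]
```

In this sketch a conjunctive query decodes at most one block of the long list per candidate document, which is the kind of saving the description refers to. The skip entries are the "small overhead" side of the trade-off: a smaller `block_size` means less decoding per lookup but more skip entries to store.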

This paper, published in ACM Transactions on Information Systems, is well-suited for the journal’s focus on information retrieval, database systems, and related areas of computer science. The proposed self-indexing strategy directly addresses the challenge of efficient query processing in large text collections, which is a key topic for the journal's readership. The emphasis on practical implementation and experimental evaluation further enhances the paper's value to the information systems community.

Citations Analysis

Category	Number of citations
Science: Science (General): Cybernetics: Information theory	54
Science: Mathematics: Instruments and machines: Electronic computers. Computer science	51
Technology: Technology (General): Industrial engineering. Management engineering: Information technology	35
Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication	35
Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software	21

The first research to cite this article was titled “Memory efficient ranking” and was published in 1994. The most recent citation comes from a 2023 study titled “Memory efficient ranking”. This article reached its peak citation count in 2012, with 10 citations. It has been cited in 44 different journals, 2% of which are open access. Among related journals, Information Processing & Management cited this research the most, with 14 citations. The chart below illustrates the annual citation trends for this article.