Want to speed up text retrieval? This research introduces a self-indexing strategy that makes query processing on large text databases more efficient. The method embeds an internal index in each compressed inverted list, so that query evaluation can locate the relevant part of a list without decoding the whole list. Experiments on a collection of nearly two million short documents show that self-indexing significantly reduces processing time for both conjunctive Boolean queries and ranked queries, while adding only a small overhead to the size of the compressed inverted file. The strategy offers a practical way to improve the performance of text retrieval systems.
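The idea of an internal index inside an inverted list can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `SkippedList`, the helper `conjunctive_and`, the square-root block size, and the use of a separate array searched with `bisect` are all assumptions made for illustration, and plain integer gaps stand in for a real compression code.

```python
from bisect import bisect_right
from math import isqrt


class SkippedList:
    """Gap-encoded posting list with a small internal index (hypothetical sketch)."""

    def __init__(self, doc_ids, block_size=None):
        doc_ids = sorted(set(doc_ids))
        # Assumption: blocks of roughly sqrt(n) postings; the source gives no block size.
        self.block_size = block_size or max(1, isqrt(len(doc_ids)))
        self.block_starts = []  # first document number of each block: the internal index
        self.blocks = []        # gaps between successive document numbers in each block
        for i in range(0, len(doc_ids), self.block_size):
            chunk = doc_ids[i:i + self.block_size]
            self.block_starts.append(chunk[0])
            self.blocks.append([b - a for a, b in zip(chunk, chunk[1:])])
        self.decoded = 0  # number of gaps decoded so far, to show the work saved

    def contains(self, doc_id):
        # Use the internal index to find the only block that could hold doc_id.
        k = bisect_right(self.block_starts, doc_id) - 1
        if k < 0:
            return False
        current = self.block_starts[k]
        if current == doc_id:
            return True
        # Decode gaps in that one block only, stopping once doc_id is reached or passed.
        for gap in self.blocks[k]:
            self.decoded += 1
            current += gap
            if current >= doc_id:
                return current == doc_id
        return False

    def __iter__(self):
        # Full sequential decode, as a list without an internal index would require.
        for start, gaps in zip(self.block_starts, self.blocks):
            current = start
            yield current
            for gap in gaps:
                current += gap
                yield current


def conjunctive_and(short_list, long_list):
    """Boolean AND: test each candidate from the short list against the long list."""
    return [d for d in short_list if long_list.contains(d)]


rare = SkippedList([5, 40, 77, 900])
common = SkippedList(range(0, 1000, 5))  # 200 postings
matches = conjunctive_and(rare, common)
print(matches, common.decoded)
```

In this toy run the long list holds 200 postings, yet the conjunctive query decodes only a small fraction of its gaps, because each lookup touches a single block. The extra `block_starts` array plays the role of the small size overhead mentioned above.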
This paper, published in ACM Transactions on Information Systems, fits the journal's focus on information retrieval, database systems, and related areas of computer science. The proposed self-indexing strategy directly addresses efficient query processing in large text collections, a key topic for the journal's readership. Its emphasis on practical implementation and experimental evaluation adds to the paper's value for the information systems community.