Fast and flexible word searching on compressed text

Article Properties
  • Language
    English
  • Publication Date
    2000/04/01
  • Indian UGC (Journal)
  • Refrences
    42
  • Citations
    79
  • Edleno Silva de Moura Univ. Federal de Minas Gerais, Belo Horizonte, Brazil
  • Gonzalo Navarro Univ. de Chile, Santiago, Chile
  • Nivio Ziviani Univ. Federal de Minas Gerais, Belo Horizonte, Brazil
  • Ricardo Baeza-Yates Univ. de Chile, Santiago, Chile
Abstract
Cite
Silva de Moura, Edleno, et al. “Fast and Flexible Word Searching on Compressed Text”. ACM Transactions on Information Systems, vol. 18, no. 2, 2000, pp. 113-39, https://doi.org/10.1145/348751.348754.
Silva de Moura, E., Navarro, G., Ziviani, N., & Baeza-Yates, R. (2000). Fast and flexible word searching on compressed text. ACM Transactions on Information Systems, 18(2), 113-139. https://doi.org/10.1145/348751.348754
Silva de Moura E, Navarro G, Ziviani N, Baeza-Yates R. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems. 2000;18(2):113-39.
Journal Categories
Science
Mathematics
Instruments and machines
Electronic computers
Computer science
Science
Science (General)
Cybernetics
Information theory
Technology
Electrical engineering
Electronics
Nuclear engineering
Telecommunication
Technology
Technology (General)
Industrial engineering
Management engineering
Information technology
Description

Need faster text searches on compressed data? This paper introduces a fast compression technique for natural language texts, in which decompression of arbitrary portions of the text can be done very efficiently, direct search for words and phrases is enabled and approximate search can also be done efficiently without any decoding. The experiments show that running our algorithms on a compressed text is twice as fast as running the best existing software on the uncompressed version of the same text. The main content shows that the compression scheme uses a semistatic word-based model and a Huffman code where the coding alphabet is byte-oriented rather than bit-oriented. We compress typical English texts to about 30% of their original size, against 40% and 35% for *Compress* and *Gzip*, respectively. When searching for complex or approximate patterns, our algorithms are up to 8 times faster than the search on uncompressed text. We present three algorithms to search the compressed text. This can be used to keep the text compressed all the time, decompressing only for displaying purposes.

Published in ACM Transactions on Information Systems, this paper fits within the journal's focus on information retrieval and data management techniques. The research on fast and flexible word searching in compressed text enhances efficiency in information processing, aligning with the journal's core interests.

Refrences
Citations
Citations Analysis
The first research to cite this article was titled Compression: a key for next-generation text retrieval systems and was published in 2000. The most recent citation comes from a 2024 study titled Compression: a key for next-generation text retrieval systems . This article reached its peak citation in 2016 , with 9 citations.It has been cited in 41 different journals, 7% of which are open access. Among related journals, the Information Processing & Management cited this research the most, with 10 citations. The chart below illustrates the annual citation trends for this article.
Citations used this article by year