Fast and flexible word searching on compressed text

Article Properties

Language

English
DOI (url)

10.1145/348751.348754
Publication Date

2000/04/01
Journal

ACM Transactions on Information Systems
Indian UGC (Journal)
Refrences

42
Citations

79
Edleno Silva de Moura Univ. Federal de Minas Gerais, Belo Horizonte, Brazil
Gonzalo Navarro Univ. de Chile, Santiago, Chile
Nivio Ziviani Univ. Federal de Minas Gerais, Belo Horizonte, Brazil
Ricardo Baeza-Yates Univ. de Chile, Santiago, Chile

Abstract

Cite

Silva de Moura, Edleno, et al. “Fast and Flexible Word Searching on Compressed Text”. ACM Transactions on Information Systems, vol. 18, no. 2, 2000, pp. 113-39, https://doi.org/10.1145/348751.348754.

Silva de Moura, E., Navarro, G., Ziviani, N., & Baeza-Yates, R. (2000). Fast and flexible word searching on compressed text. ACM Transactions on Information Systems, 18(2), 113-139. https://doi.org/10.1145/348751.348754

Silva de Moura E, Navarro G, Ziviani N, Baeza-Yates R. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems. 2000;18(2):113-39.

Journal Categories

Science

Mathematics

Instruments and machines

Electronic computers

Computer science

Science

Science (General)

Cybernetics

Information theory

Technology

Electrical engineering

Electronics

Nuclear engineering

Telecommunication

Technology

Technology (General)

Industrial engineering

Management engineering

Information technology

Description

Need faster text searches on compressed data? This paper introduces a fast compression technique for natural language texts, in which decompression of arbitrary portions of the text can be done very efficiently, direct search for words and phrases is enabled and approximate search can also be done efficiently without any decoding. The experiments show that running our algorithms on a compressed text is twice as fast as running the best existing software on the uncompressed version of the same text. The main content shows that the compression scheme uses a semistatic word-based model and a Huffman code where the coding alphabet is byte-oriented rather than bit-oriented. We compress typical English texts to about 30% of their original size, against 40% and 35% for *Compress* and *Gzip*, respectively. When searching for complex or approximate patterns, our algorithms are up to 8 times faster than the search on uncompressed text. We present three algorithms to search the compressed text. This can be used to keep the text compressed all the time, decompressing only for displaying purposes.

Published in ACM Transactions on Information Systems, this paper fits within the journal's focus on information retrieval and data management techniques. The research on fast and flexible word searching in compressed text enhances efficiency in information processing, aligning with the journal's core interests.

Category	Category Repetition
Science: Mathematics: Instruments and machines: Electronic computers. Computer science	45
Science: Science (General): Cybernetics: Information theory	37
Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication	33
Technology: Technology (General): Industrial engineering. Management engineering: Information technology	30
Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware	18

Fast and flexible word searching on compressed text

Article Properties

Abstract

Cite

Journal Categories

You May Also Like

Description

Refrences

Citations

Citations Analysis

Citations used this article by year

Database	Last update
UGC	December 2024
DOAJ	December 2024
Crossref	May 2024