Can space density computations optimize document indexing? This research proposes a new approach to automatic indexing for document retrieval and pattern matching, using space density computations to choose the indexing vocabulary. The model posits that the best indexing space is one in which entities lie as far apart from each other as possible, so the value of an indexing system varies inversely with the density of the object space. An algorithm based on space density computations is then used to select an optimum indexing vocabulary for a collection of documents. Results demonstrating the model's usefulness are presented. These results have implications for information retrieval, search engine design, and text mining applications. The study shows how indexing systems can be tuned by considering the spatial distribution of documents and search requests within the indexing space.
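The idea of selecting a vocabulary by space density can be illustrated with a minimal sketch. The abstract does not state how density is measured or how terms are chosen, so the following are assumptions made only for illustration: density is taken as the average pairwise cosine similarity between document vectors, and a term is kept when removing it makes the space denser. The function names (`space_density`, `density_without_term`, `select_vocabulary`) and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Assumed density measure: mean cosine similarity over all document pairs."""
    n = len(docs)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += cosine(docs[i], docs[j])
    return total / (n * (n - 1) / 2)


def density_without_term(docs, k):
    """Density of the space after dropping the term at column k."""
    reduced = [d[:k] + d[k + 1:] for d in docs]
    return space_density(reduced)


def select_vocabulary(docs, terms):
    """Keep terms whose removal would make the space denser (assumed rule)."""
    base = space_density(docs)
    keep = []
    for k, term in enumerate(terms):
        # A term that spreads documents apart is useful: without it,
        # the documents look more alike and the density rises.
        if density_without_term(docs, k) > base:
            keep.append(term)
    return keep


# Toy collection: rows are documents, columns are term weights.
terms = ["the", "vector", "retrieval"]
docs = [
    [3, 2, 0],
    [3, 0, 2],
]
print(select_vocabulary(docs, terms))  # prints ['vector', 'retrieval']
```

In this toy collection the term shared equally by both documents pulls them together, so dropping it lowers the density and it is left out of the vocabulary, while the two terms that distinguish the documents are kept.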
This paper was published in Communications of the ACM, a journal that covers computer science, information systems, and communication technologies and that emphasizes communicating this work effectively. By addressing automatic indexing and vocabulary selection, the article fits the journal's themes of efficient information management and retrieval. Analyzing its citations and references could reveal its connections to related work published in the ACM community.