A vector space model for automatic indexing

Article Properties
  • Language
    English
  • Publication Date
    1975/11/01
  • Indian UGC (Journal)
  • References
    7
  • Citations
    1,990
  • G. Salton Cornell Univ., Ithaca, NY
  • A. Wong Cornell Univ., Ithaca, NY
  • C. S. Yang Cornell Univ., Ithaca, NY
Cite
Salton, G., et al. “A Vector Space Model for Automatic Indexing”. Communications of the ACM, vol. 18, no. 11, 1975, pp. 613-20, https://doi.org/10.1145/361219.361220.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620. https://doi.org/10.1145/361219.361220
Salton G, Wong A, Yang CS. A vector space model for automatic indexing. Communications of the ACM. 1975;18(11):613-20.
Journal Categories
Science
Mathematics
Instruments and machines
Electronic computers
Computer science
Science
Mathematics
Instruments and machines
Electronic computers
Computer science
Computer software
Technology
Electrical engineering
Electronics
Nuclear engineering
Electronics
Computer engineering
Computer hardware
Description

Can space density computations optimize document indexing? This paper proposes an approach to automatic indexing for document retrieval and other pattern-matching settings, in which stored documents are compared with one another or with incoming search requests. The model posits that the best indexing space is one in which documents lie as far apart from one another as possible, so the value of an indexing system may be expressed as a function of the density of the object space, and retrieval performance may correlate inversely with that density. An approach based on space density computations is then used to select an optimum indexing vocabulary for a collection of documents. Evaluation results demonstrating the model's usefulness are presented. The results bear on information retrieval, search engine design, and text mining, and suggest that indexing systems can be tuned by considering how documents and search requests are distributed within the indexing space.
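
The idea of scoring index terms by their effect on space density can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the description above gives no formula, so the sketch assumes density is the average pairwise cosine similarity between document vectors, and it scores each term by how much the density rises when that term is removed. The function names (space_density, discrimination_values, select_vocabulary) and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def space_density(docs):
    """Assumed density measure: average pairwise cosine similarity.

    A higher value means the documents sit closer together in the
    indexing space.
    """
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(cosine(docs[i], docs[j]) for i, j in pairs) / len(pairs)


def discrimination_values(docs):
    """Score each term as (density without the term) - (density with all terms).

    A positive score means removing the term packs the documents closer
    together, so the term helps keep them apart.
    """
    base = space_density(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        reduced = [[w for t, w in enumerate(doc) if t != k] for doc in docs]
        values.append(space_density(reduced) - base)
    return values


def select_vocabulary(docs, terms):
    """Keep only the terms whose removal would increase space density."""
    values = discrimination_values(docs)
    return [term for term, value in zip(terms, values) if value > 0]


# Toy collection (hypothetical): "the" occurs in every document,
# each of the other terms occurs in exactly one document.
terms = ["the", "index", "vector", "density"]
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
vocabulary = select_vocabulary(docs, terms)
```

In this toy collection the term shared by every document pulls the vectors together and is dropped, while the terms that occur in a single document are kept. Computing all pairwise similarities is quadratic in the number of documents, so a practical system would likely need a cheaper density estimate.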

This paper was published in Communications of the ACM, a journal covering computer science, information systems, and communication technologies. By addressing automatic indexing and vocabulary selection, the article fits the journal's themes of efficient information management and retrieval. Analyzing its citations and references could reveal its connections to related work in the ACM community.

Citations Analysis
The first work to cite this article, titled A distance measure for automatic document classification by sequential analysis, was published in 1978. The most recent citation comes from a 2024 study titled A distance measure for automatic document classification by sequential analysis. Citations of this article peaked in 2019, at 150. It has been cited in 685 different journals, 14% of which are open access. Among related journals, Expert Systems with Applications cited this research the most, with 64 citations. The chart below illustrates the annual citation trends for this article.
[Chart: citations of this article by year]