Automated learning of decision rules for text categorization

Article Properties

Language: English
DOI: 10.1145/183422.183423
Publication Date: 1994/07/01
Journal: ACM Transactions on Information Systems
References: 24
Citations: 211
Chidanand Apté, IBM T. J. Watson Research Center, Yorktown Heights, NY
Fred Damerau, IBM T. J. Watson Research Center, Yorktown Heights, NY
Sholom M. Weiss, Rutgers Univ., New Brunswick, NJ

Cite

Apté, Chidanand, et al. “Automated Learning of Decision Rules for Text Categorization”. ACM Transactions on Information Systems, vol. 12, no. 3, 1994, pp. 233-51, https://doi.org/10.1145/183422.183423.

Apté, C., Damerau, F., & Weiss, S. M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 233-251. https://doi.org/10.1145/183422.183423

Apté C, Damerau F, Weiss SM. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems. 1994;12(3):233-51.

Journal Categories

Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Science: Science (General): Cybernetics: Information theory
Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Technology: Technology (General): Industrial engineering. Management engineering: Information technology

Description

Can machines learn to categorize text as well as humans? This study presents extensive experiments on automated rule-based induction methods for large document collections, aiming to discover classification patterns for document categorization and personalized filtering. The research demonstrates that machine-generated decision rules can achieve performance comparable to human-engineered systems, while using the same rule-based representation. Results on the Reuters collection benchmark reveal a significant performance gain compared to other machine-learning techniques, achieving an 80.5% recall/precision breakeven point, a substantial improvement over the previously reported 67%. The study also explores methodological alternatives, including universal versus local dictionaries and binary versus frequency-related features, in the context of high-dimensional feature spaces. This work highlights the potential of machine learning to automate text categorization tasks, reducing the need for extensive human involvement. These findings have implications for information retrieval, document management, and the development of intelligent systems.
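To make the headline metric concrete, the following is a minimal Python sketch of a category represented as a set of decision rules and of a recall/precision breakeven computation. It is an illustration under stated assumptions, not the authors' induction method: the rules are written by hand rather than learned, the helper names (`rule_score`, `precision_recall`, `breakeven_point`) and the toy documents are hypothetical, and the breakeven point is taken as the precision at the cutoff where the number of retrieved documents equals the number of relevant ones, which is the cutoff at which precision and recall coincide.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

# Assumption: a rule is a conjunction of words that must all appear in the
# document; a category is a disjunction of such rules. Features are binary
# (word present or absent), one of the alternatives the study compares.
Rule = FrozenSet[str]


def rule_score(rules: Sequence[Rule], doc_words: Set[str]) -> int:
    """Count how many rules fire on a document (0 means 'not in category')."""
    return sum(1 for rule in rules if rule <= doc_words)


def precision_recall(predicted: Sequence[bool],
                     actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of boolean category assignments."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    n_predicted = sum(predicted)
    n_relevant = sum(actual)
    precision = tp / n_predicted if n_predicted else 0.0
    recall = tp / n_relevant if n_relevant else 0.0
    return precision, recall


def breakeven_point(scores: Sequence[float], actual: Sequence[bool]) -> float:
    """Precision (equal to recall) when exactly as many documents are
    retrieved as are relevant. Ties in score keep their original order."""
    n_relevant = sum(actual)
    if n_relevant == 0:
        raise ValueError("breakeven is undefined without relevant documents")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = sum(1 for i in ranked[:n_relevant] if actual[i])
    return tp / n_relevant


# Hypothetical toy data: five documents as word sets, and hand-written rules
# for a made-up "wheat" category.
docs: List[Set[str]] = [
    {"wheat", "export", "tonnes"},
    {"wheat", "farm"},
    {"bank", "rate"},
    {"export", "oil"},
    {"grain", "harvest"},
]
actual = [True, True, False, False, True]
rules: List[Rule] = [frozenset({"wheat"}), frozenset({"export"})]

scores = [rule_score(rules, d) for d in docs]
predicted = [s > 0 for s in scores]
print(precision_recall(predicted, actual))
print(breakeven_point(scores, actual))
```

A ranking is only needed here to move along the recall/precision trade-off; with a pure rule set, the same trade-off would have to come from making the rules more or less permissive.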

Published in ACM Transactions on Information Systems, this research aligns with the journal's focus on information retrieval, text processing, and intelligent systems. By presenting an automated approach to text categorization, the study contributes to the advancement of information systems technologies and their applications, which is central to the journal's scope.

Citations Analysis

Category	Category Repetition
Science: Mathematics: Instruments and machines: Electronic computers. Computer science	126
Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics	79
Technology: Mechanical engineering and machinery	74
Technology: Engineering (General). Civil engineering (General)	49
Science: Science (General): Cybernetics: Information theory	44

The first research to cite this article was titled "Optimized rule induction" and was published in 1993. The most recent citation comes from a 2024 study titled "Optimized rule induction". This article reached its peak citation count in 2022, with 13 citations. It has been cited in 127 different journals, 7% of which are open access. Among related journals, Expert Systems with Applications cited this research the most, with 12 citations. The chart below illustrates the annual citation trends for this article.