Can computers truly understand language? This study addresses lexical ambiguity, where a word can have multiple meanings, in the context of information retrieval (IR). The extent of lexical ambiguity in IR test collections was analyzed, and its impact on IR systems was investigated through experiments that measured the utility of word meanings for separating relevant from nonrelevant documents. The results show that there is considerable ambiguity even in a specialized database. Word senses provide a significant separation between relevant and nonrelevant documents, but resolving lexical ambiguity has little impact on retrieval effectiveness for documents that share many words with the query. Other uses of word sense disambiguation in an information retrieval context are discussed. The research contributes to ongoing efforts to improve the relevance and accuracy of information retrieval systems by addressing a fundamental challenge in natural language processing.
This paper aligns with the aims of ACM Transactions on Information Systems, as it tackles the challenge of lexical ambiguity in information retrieval. By evaluating the impact of word meanings on the performance of IR systems, the study contributes to the journal's focus on the design, development, and evaluation of information systems, particularly those intended to improve information access and retrieval.