Can sacrificing some accuracy improve hash coding efficiency? This paper analyzes space/time trade-offs in hash coding for the problem of testing messages for membership in a given set. It introduces two new hash-coding methods and compares them with a conventional method on three factors: hash area size (space), the time required to identify a message as a non-member (reject time), and allowable error frequency. The new methods reduce the space needed for hash-coded information by tolerating a small fraction of errors, in which a test message is falsely identified as a member of the given set. This approach is particularly relevant in applications with large data sets, where a core-resident hash area is not feasible with conventional methods. In such applications, performance can be improved by using a smaller core-resident hash area combined with a secondary test that catches the errors. The analysis shows that allowing a small fraction of test messages to be falsely identified as members permits a much smaller hash area without increasing reject time, which can significantly improve overall performance in space-constrained applications.
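The trade-off described above can be illustrated with a minimal Python sketch. It is not the paper's exact formulation: the class name `ApproxMembershipSet`, the helper `is_member`, the use of SHA-256 with double hashing to derive bit positions, and the parameters `num_bits` and `num_hashes` are all assumptions made for illustration. The sketch keeps a small bit array as the hash area, rejects most non-members immediately, and passes accepted messages to a secondary exact test that catches the occasional false identification.

```python
import hashlib
from typing import Callable, Iterator


class ApproxMembershipSet:
    """Small bit-array hash area that may report false members (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per hash-area position, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, message: str) -> Iterator[int]:
        # Assumption: derive several bit positions from one SHA-256 digest
        # via double hashing (h1 + i * h2). Any family of hashes would do.
        digest = hashlib.sha256(message.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, message: str) -> None:
        """Record a member of the given set by setting its bits."""
        for pos in self._positions(message):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, message: str) -> bool:
        """False means definitely not a member; True may be a false accept."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(message)
        )


def is_member(
    message: str,
    hash_area: ApproxMembershipSet,
    exact_lookup: Callable[[str], bool],
) -> bool:
    """Two-stage test: fast reject first, slow secondary test only on accept."""
    if not hash_area.might_contain(message):
        return False  # rejected using only the small hash area
    return exact_lookup(message)  # secondary test catches false accepts
```

In use, every member of the set is passed to `add` once, and `exact_lookup` stands in for whatever slower, authoritative test the application has (for example, a lookup on secondary storage). A larger `num_bits` lowers the fraction of false accepts at the cost of space, which is the trade-off the paper examines.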
Published in Communications of the ACM, this paper aligns with the journal's focus on innovative approaches to computer science problems. By exploring space/time trade-offs in hash coding and introducing new methods to improve efficiency, the study contributes to the advancement of data management techniques relevant to the journal's readership.