How can computers efficiently locate specific patterns within vast amounts of text? This classic paper addresses that pattern-matching problem: it describes a method for locating character strings embedded in text using regular expressions and discusses an implementation of the method in the form of a compiler. The compiler accepts a regular expression as its source language and produces an IBM 7094 program as its object language. The object program then takes the text to be searched as input and generates a signal every time an embedded string in the text matches the regular expression. The paper presents examples, problems, and solutions, making it a foundational resource for understanding regular expression search algorithms. The work has had a lasting influence on tools and techniques for text processing and data analysis.
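The two-stage arrangement described above (compile a regular expression once, then run the resulting program over text and signal on each match) can be illustrated with a minimal Python sketch. This is not the paper's algorithm and it does not generate IBM 7094 code: Python's standard `re` module stands in for the compilation step, the function name `compile_search` is hypothetical, and reporting a "signal" as the position just past the end of a matching string is an assumption made for illustration.

```python
import re
from typing import Callable, Iterator


def compile_search(pattern: str) -> Callable[[str], Iterator[int]]:
    """Stage 1: turn a regular expression into a reusable search program.

    Assumption: Python's `re` engine stands in for the compiler; the
    returned closure plays the role of the object program.
    """
    compiled = re.compile(pattern)

    def search(text: str) -> Iterator[int]:
        """Stage 2: scan the text and signal at each match end.

        Yields every index `end` such that some non-empty substring
        text[start:end] matches the whole pattern.
        """
        for end in range(1, len(text) + 1):
            # Brute-force check of every start position: quadratic in the
            # text length, chosen for clarity rather than efficiency.
            if any(compiled.fullmatch(text, start, end) for start in range(end)):
                yield end

    return search


# Compile once, then reuse the search program on any text.
search = compile_search(r"a(b|c)*d")
print(list(search("xabcbdyad")))  # [6, 9]
```

In the sample text, `abcbd` ends at index 6 and `ad` ends at index 9, so the search program signals twice. The brute-force inner loop is only there to keep the sketch short; an efficient matcher would avoid re-examining the text for every end position.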
This article, published in Communications of the ACM, fits the journal's mission of presenting innovative techniques and algorithms for computer programming. Its description of a regular expression search algorithm, implemented as a compiler, is relevant to the journal's audience of computer scientists and software developers.