Needle in a haystack? This paper presents an algorithm for fast character string searching. Given a pattern string, pat, and a text string, string, the algorithm finds the position i of the first occurrence of pat in string. It compares the characters of pat against the text starting from the pattern's last character rather than its first. When a mismatch occurs, it can often slide the pattern forward by several positions at once, so in most cases not all of the first i characters of string are inspected. The average number of characters actually inspected decreases as the pattern length, patlen, grows: for a random English pattern of length 5, the algorithm typically inspects about i/4 characters before finding a match at position i. The implementation executes, on average, fewer than i + patlen machine instructions. Empirical evidence and a theoretical analysis of the algorithm's average behavior support these conclusions, making the method a practical way to speed up text searching.
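The following Python sketch illustrates the general idea: matching right to left from the pattern's last character, then skipping ahead on a mismatch. It is an assumption-laden illustration, not the paper's own implementation. It combines a bad-character rule and a strong good-suffix rule, which play roughly the roles of the paper's delta1 and delta2 tables. The good-suffix preprocessing is a common textbook routine, not the paper's formulation. The function names (`boyer_moore_search`, `bad_character_table`, `good_suffix_table`) are hypothetical, and the `inspected` counter exists only to show that many text characters are never examined.

```python
def bad_character_table(pat: str) -> dict[str, int]:
    """Map each character to the index of its rightmost occurrence in pat."""
    return {ch: idx for idx, ch in enumerate(pat)}


def good_suffix_table(pat: str) -> list[int]:
    """Strong good-suffix shifts; shift[j + 1] is used after a mismatch at pat[j]."""
    m = len(pat)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of the widest border of pat[i:]
    i, j = m, m + 1
    border[i] = j
    # Case 1: the matched suffix reoccurs in pat, preceded by a different character.
    while i > 0:
        while j <= m and pat[i - 1] != pat[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 2: only part of the matched suffix appears as a prefix of pat.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore_search(text: str, pat: str) -> tuple[int, int]:
    """Return (index of first occurrence of pat in text, or -1; characters inspected)."""
    m, n = len(pat), len(text)
    if m == 0:
        return 0, 0
    last = bad_character_table(pat)
    shift = good_suffix_table(pat)
    inspected = 0
    s = 0  # current alignment: pat[0] sits under text[s]
    while s <= n - m:
        j = m - 1
        # Compare right to left, starting with the pattern's last character.
        while j >= 0:
            inspected += 1
            if pat[j] != text[s + j]:
                break
            j -= 1
        if j < 0:
            return s, inspected
        # Bad-character shift: line up the mismatched text character with its
        # rightmost occurrence in pat. It may be <= 0, so the good-suffix shift
        # (always >= 1) guarantees progress.
        bad_char = j - last.get(text[s + j], -1)
        s += max(shift[j + 1], bad_char)
    return -1, inspected


index, inspected = boyer_moore_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE")
print(index, inspected)
```

Here the call finds the match at index 17. Because each shift rule on its own never skips over a valid occurrence, taking the larger of the two shifts is safe. The counter shows that the search examines fewer characters than a left-to-right scan would.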
Published in Communications of the ACM, the paper addresses a core problem in computer software, searching text efficiently. Its fast string searching algorithm contributes to the development of fundamental computing techniques.