
Beginning Algorithms (2006)
.pdf

16
String Searching
The problem of finding one string within another comes up quite often: Searching through files on disk, DNA searches, and even Google rely on strategies for efficiently searching through text. If you’ve ever used a word processor or text editor or even the editor used for writing code, you have at some stage or another performed a string search. You may know it as the Find function.
There are many string searching algorithms — and no doubt many more will be discovered over time — each with its own optimizations for handling specific types of data. Some algorithms work better for plain text, while others work better for text and/or patterns containing a lot of repetition, such as DNA fragments.
This chapter covers two algorithms for plain-text searching. We start with an obvious brute-force algorithm and move on to the more sophisticated Boyer-Moore. Each is described in detail, and then you will see how a relatively simple twist on the brute-force approach enables the BoyerMoore algorithm to perform significantly faster.
After reading this chapter you should be able to do the following:
Describe and implement a brute-force string searching algorithm
Describe and implement the Boyer-Moore string searching algorithm
Understand the performance characteristics of each algorithm
Describe and implement a generic string match iterator
Describe and implement a simple file searching application
A Generic String Searcher Interface
Because we want to be able to implement various types of string search algorithms and implement our own variations as the need arises, it will be useful to conceive an interface that remains the same no matter what type of underlying mechanism is used. Additionally, because all of the string searches will conform to a single API, we will be able to write a single suite of tests that can be applied to all of them in order to assert their correctness.







