Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Algorithms (2006)

.pdf
Скачиваний:
256
Добавлен:
17.08.2013
Размер:
9.67 Mб
Скачать

Chapter 14

One rather novel way in which ternary search trees can be used is for solving just such a problem: finding all the words that match a given pattern. A pattern is made up of a combination of regular word letters — a to z — and a special wildcard that matches anything. In the example just shown, you have used the hyphen (-) as the wildcard, but you could just as easily use a period (.) or even a question mark (?).

The important thing is to choose something that won’t appear as part of a normal word.

You should already be familiar with how basic searching in a ternary search tree works. Possibly the simplest way to perform a pattern match would be to use brute force: Take the pattern and construct a search word by substituting the wildcards with every possible combination of letters. Therefore, given our previous example, you might start with “aaraaat” followed by “aaraabt” and then “aaraact,” and so on, all the way up to “azrzzzt.” While this would work, it would be extremely slow, with a large proportion of obviously fruitless searches. Instead, you can take a more sophisticated and efficient approach by using the structure of the ternary search tree to your advantage.

Pattern matching is similar to a straight word search except that anytime you come across a wildcard, rather than look for a node with a matching letter (you won’t find one), you instead visit each node as if it was a match.

Imagine you are looking for the pattern “-a-” in the tree shown in Figure 14-1. Just like a regular word search, you start at the root node. In this case, though, the first character is a wildcard, so you visit each node at the current level in sorted order. Figure 14-13 shows the search beginning at the smallest node, a.

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-13: A wildcard forces a visit to each node at the current level, starting with the smallest.

Because you are searching for a wildcard, you “pretend” that each node at the current level is a match, so you proceed to match the next letter in the pattern, starting at the child of the current node.

Figure 14-14 shows that, in this instance, the next pattern character — a — fails to match the child node — p — so this branch of the tree is ignored completely.

354

Ternary Search Trees

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-14: A nonmatching character terminates the search for the current branch.

Having decided there can be no possible matches down this path, the search goes back to the previous level to continue searching at the next largest node (see Figure 14-15).

 

 

c

 

ax

 

u

m

px

b

p

a

ex

a

 

p

 

t

 

n

Figure 14-15: The wildcard search continues at the higher level, visiting the next largest node.

Again, because you are looking for a wildcard, you assume a match has been found and proceed to the next letter in the pattern, starting at the child node as shown in Figure 14-16.

This time, the “a” from the pattern matches the “a” in the tree, but there is still more pattern to match, so you continue searching at the child node for the next letter (see Figure 14-17).

355

Chapter 14

 

 

c

 

ax

 

u

m

px

b

p

a

ex

a

 

p

 

t

 

n

Figure 14-16: This time, you find a match with the next pattern character.

 

 

c

 

ax

 

u

m

px

b

p

a

ex

a

 

p

 

t

 

n

Figure 14-17: You find the first matching word.

Once again, you are looking for a wildcard, so all of the nodes will match, but you’ve now run out of pattern and have reached a word at the same time (see Figure 14-18) — you’ve found your first complete match: “bat.”

This process continues until all matching words have been found. Figure 14-19 shows all matching and nonmatching words in the tree.

Here you can see that three words matched the pattern “-a-”: “bat,” “man,” and “map.” Better still, you will find them all with only 11 character comparisons. Compare that with the brute-force approach that would have attempted to find all the words from “aaa” through “zaz.” In the latter case, there are 26 * 26 = 676 possible combinations, meaning you would need to perform at least that many character comparisons!

356

Ternary Search Trees

 

 

c

 

ax

 

u

m

px

b

p

a

ex

a

 

p

 

t

 

n

Figure 14-18: The search continues at the higher level with the next largest node.

 

 

cx

 

ax

 

ux

m

px

b

px

a

ex

a

 

p

 

t

 

n

Figure 14-19: The completed search showing the matching and nonmatching words.

Putting Ternar y Search Trees into Practice

Now that you understand the various ways in which ternary search trees can be used, it’s time to try your hand at creating a real one. As always, you will start by creating some test cases to ensure that your implementation works correctly. Then you’ll move on to creating the actual ternary search tree itself, and finally develop a simple application for helping you solve crossword puzzles.

357

Chapter 14

Try It Out

Testing a Ternary Search Tree

Create the aptly named TernarySearchTreeTest class as follows:

package com.wrox.algorithms.tstrees;

import com.wrox.algorithms.lists.LinkedList; import com.wrox.algorithms.lists.List; import junit.framework.TestCase;

public class TernarySearchTreeTest extends TestCase { private TernarySearchTree _tree;

protected void setUp() throws Exception { super.setUp();

_tree = new TernarySearchTree();

_tree.add(“prefabricate”); _tree.add(“presume”); _tree.add(“prejudice”); _tree.add(“preliminary”); _tree.add(“apple”); _tree.add(“ape”); _tree.add(“appeal”); _tree.add(“car”); _tree.add(“dog”); _tree.add(“cat”); _tree.add(“mouse”); _tree.add(“mince”); _tree.add(“minty”);

}

public void testContains() { assertTrue(_tree.contains(“prefabricate”)); assertTrue(_tree.contains(“presume”)); assertTrue(_tree.contains(“prejudice”)); assertTrue(_tree.contains(“preliminary”)); assertTrue(_tree.contains(“apple”)); assertTrue(_tree.contains(“ape”)); assertTrue(_tree.contains(“appeal”)); assertTrue(_tree.contains(“car”)); assertTrue(_tree.contains(“dog”)); assertTrue(_tree.contains(“cat”)); assertTrue(_tree.contains(“mouse”)); assertTrue(_tree.contains(“mince”)); assertTrue(_tree.contains(“minty”));

assertFalse(_tree.contains(“pre”)); assertFalse(_tree.contains(“dogs”)); assertFalse(_tree.contains(“UNKNOWN”));

}

public void testPrefixSearch() {

358

Ternary Search Trees

assertPrefixEquals(new String[] {“prefabricate”, “prejudice”, “preliminary”, “presume”}, “pre”);

assertPrefixEquals(new String[] {“ape”, “appeal”, “apple”}, “ap”);

}

public void testPatternMatch() {

assertPatternEquals(new String[] {“mince”, “mouse”}, “m???e”); assertPatternEquals(new String[] {“car”, “cat”}, “?a?”);

}

private void assertPrefixEquals(String[] expected, String prefix) { List words = new LinkedList();

_tree.prefixSearch(prefix, words);

assertEquals(expected, words);

}

private void assertPatternEquals(String[] expected, String pattern) { List words = new LinkedList();

_tree.patternMatch(pattern, words);

assertEquals(expected, words);

}

private void assertEquals(String[] expected, List actual) { assertEquals(expected.length, actual.size());

for (int i = 0; i < expected.length; ++i) { assertEquals(expected[i], actual.get(i));

}

}

}

How It Works

The TernarySearchTreeTest class holds an instance of a ternary search tree for use by each of the individual test cases and is initialized in setUp() by adding a number of words:

package com.wrox.algorithms.tstrees;

import com.wrox.algorithms.lists.LinkedList; import com.wrox.algorithms.lists.List; import junit.framework.TestCase;

public class TernarySearchTreeTest extends TestCase { private TernarySearchTree _tree;

protected void setUp() throws Exception { super.setUp();

_tree = new TernarySearchTree();

_tree.add(“prefabricate”);

359

Chapter 14

_tree.add(“presume”); _tree.add(“prejudice”); _tree.add(“preliminary”); _tree.add(“apple”); _tree.add(“ape”); _tree.add(“appeal”); _tree.add(“car”); _tree.add(“dog”); _tree.add(“cat”); _tree.add(“mouse”); _tree.add(“mince”); _tree.add(“minty”);

}

...

}

The method testContains() verifies that each word added in setUp() exists in the tree. In addition, you’ve checked for a few words that shouldn’t exist. Notice that the words have been chosen very carefully: The sequence of letters “pre” and “dog” will actually be found in the tree but only as prefixes to other words, so contains() should return false; “UNKNOWN” shouldn’t exist at all:

public void testContains() { assertTrue(_tree.contains(“prefabricate”)); assertTrue(_tree.contains(“presume”)); assertTrue(_tree.contains(“prejudice”)); assertTrue(_tree.contains(“preliminary”)); assertTrue(_tree.contains(“apple”)); assertTrue(_tree.contains(“ape”)); assertTrue(_tree.contains(“appeal”)); assertTrue(_tree.contains(“car”)); assertTrue(_tree.contains(“dog”)); assertTrue(_tree.contains(“cat”)); assertTrue(_tree.contains(“mouse”)); assertTrue(_tree.contains(“mince”)); assertTrue(_tree.contains(“minty”));

assertFalse(_tree.contains(“pre”)); assertFalse(_tree.contains(“dogs”)); assertFalse(_tree.contains(“UNKNOWN”));

}

There are only two other publicly accessible methods in your ternary search tree implementation: one for finding words with a common prefix, and another for finding words matching a pattern. Both of these return a list of search results, so you created a simple method to help you verify whether the search results match those expected.

The custom assertEquals() compares an array of expected words to a list of words actually returned from a search, element by element. If the size and contents of the list match the array, then you can be confident that the search was successful:

private void assertEquals(String[] expected, List actual) { assertEquals(expected.length, actual.size());

360

Ternary Search Trees

for (int i = 0; i < expected.length; ++i) { assertEquals(expected[i], actual.get(i));

}

}

To test prefix searching, you created the method testPrefixSearch(). This method assembles a

list of expected values and a prefix and then delegates most of the work to yet another helper method,

assertPrefixEquals():

public void testPrefixSearch() { assertPrefixEquals(

new String[] {“prefabricate”, “prejudice”, “preliminary”, “presume”}, “pre”);

assertPrefixEquals(

new String[] {“ape”, “appeal”, “apple”}, “ap”);

}

The method assertPrefixEquals() then creates a list to hold the results and calls the tree’s prefixSearch() method to populate the list. The expected and actual results are then passed to your custom assertEquals() method for validation:

private void assertPrefixEquals(String[] expected, String prefix) { List words = new LinkedList();

_tree.prefixSearch(prefix, words);

assertEquals(expected, words);

}

The method testPatternMatch() assembles an array of expected results along with a pattern, and delegates to another helper method, assertPatternEquals():

public void testPatternMatch() {

assertPatternEquals(new String[] {“mince”, “mouse”}, “m???e”);

assertPatternEquals(new String[] {“car”, “cat”}, “?a?”);

}

The method assertPatternEquals() calls patternMatch() on the tree and validates the results. Notice the use of the question mark (?) as the wildcard character. The choice of character is largely arbitrary, but a question mark can’t possibly appear in a word by mistake, and it looks very obvious that it means “something, anything, goes here”:

private void assertPatternEquals(String[] expected, String pattern) { List words = new LinkedList();

_tree.patternMatch(pattern, words);

assertEquals(expected, words);

}

361

Chapter 14

Tests in place, in the next Try It Out section, you create the actual ternary search tree class.

Try It Out

Implementing a Ternary Search Tree

Create the TernarySearchTree class as follows:

package com.wrox.algorithms.tstrees;

import com.wrox.algorithms.lists.List;

public class TernarySearchTree {

public static final char WILDCARD = ‘?’; private Node _root;

public void add(CharSequence word) {

 

assert

word != null : “word can’t be null”;

assert

word.length() > 0 : “word

can’t be empty”;

Node node = insert(_root, word, 0); if (_root == null) {

_root = node;

}

}

public boolean contains(CharSequence word) { assert word != null : “word can’t be null”;

assert word.length() > 0 : “word can’t be empty”;

Node node = search(_root, word, 0); return node != null && node.isEndOfWord();

}

public void patternMatch(CharSequence pattern, List results) { assert pattern != null : “pattern can’t be null”;

assert pattern.length() > 0 : “pattern can’t be empty”; assert results != null : “results can’t be null”;

patternMatch(_root, pattern, 0, results);

}

public void prefixSearch(CharSequence prefix, List results) { assert prefix != null : “prefix can’t be null”;

assert prefix.length() > 0 : “prefix can’t be empty”;

inOrderTraversal(search(_root, prefix, 0), results);

}

private Node search(Node node, CharSequence word, int index) { assert word != null : “word can’t be null”;

if (node == null) { return null;

362

Ternary Search Trees

}

char c = word.charAt(index);

if (c == node.getChar()) {

if (index + 1 < word.length()) {

node = search(node.getChild(), word, index + 1);

}

} else if (c < node.getChar()) {

node = search(node.getSmaller(), word, index); } else {

node = search(node.getLarger(), word, index);

}

return node;

}

private Node insert(Node node, CharSequence word, int index) { assert word != null : “word can’t be null”;

char c = word.charAt(index);

if (node == null) { node = new Node(c);

}

if (c == node.getChar()) {

if (index + 1 < word.length()) { node.setChild(insert(node.getChild(), word, index + 1));

} else { node.setWord(word.toString());

}

}else if (c < node.getChar()) { node.setSmaller(insert(node.getSmaller(), word, index));

}else {

node.setLarger(insert(node.getLarger(), word, index));

}

return node;

}

private void patternMatch(Node node, CharSequence pattern, int index, List results) {

assert pattern != null : “pattern can’t be null”; assert results != null : “results can’t be null”;

if (node == null) { return;

}

char c = pattern.charAt(index);

if (c == WILDCARD || c < node.getChar()) { patternMatch(node.getSmaller(), pattern, index, results);

363