Beginning Regular Expressions 2005.pdf

Character Classes

Try It Out Matching HTML Headers

1. Open PowerGrep, and enter the regular expression pattern <h[1-6]> in the Search: text box.

2. Enter C:\BRegExp\Ch05 in the Folder text box, assuming that you have saved the Chapter 5 files from the download in that directory.

3. Enter HTMLHeaders.txt in the File Mask text box.

4. Click the Search button, and inspect the results, as shown in Figure 5-17.

Figure 5-17
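The same pattern can be tried outside PowerGrep. The following is a minimal sketch using Python's re module, which is not one of the tools used in this chapter. The sample string is a hypothetical stand-in, because the contents of HTMLHeaders.txt are not reproduced here.

```python
import re

# Hypothetical stand-in for HTMLHeaders.txt (the real file's contents
# are not shown in the text).
sample = (
    "<h1>Title</h1>\n"
    "<h2>Section</h2>\n"
    "<h7>Not a valid header level</h7>\n"
    "<h0>Nor is this</h0>\n"
    "<H3>Uppercase tag</H3>\n"
)

# [1-6] is a character class range: exactly one digit from 1 through 6.
header_open = re.compile(r"<h[1-6]>")

matches = header_open.findall(sample)
print(matches)
```

Under these assumptions, only the opening tags <h1> and <h2> are found. The closing tags contain a slash after the <, the digits 0 and 7 fall outside the range, and <H3> is skipped because matching is case sensitive by default.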

Metacharacter Meaning within Character Classes

Most, but not all, single characters have the same meaning inside a character class as they do outside.

The ^ metacharacter

The ^ metacharacter (also called a caret), when it is the first character after the left square bracket, indicates that the characters specified inside the square brackets are not to be matched. This use of the ^ metacharacter is discussed in the section on negated character classes a little later.

If the ^ metacharacter occurs in any position inside square brackets other than immediately after the left square bracket, it has its literal meaning: it matches the ^ character.


Chapter 5

A test file, Carets.txt, is shown here:

14^2 expresses the idea of 14 to the power 2.

The ^ character is called a caret.

The _ character is called an underscore or underline character.

3^2 = 9

Eating ^s helps you see in the dark. At least that’s what I think he said.

The problem definition can be expressed as follows:

Match any occurrence of the following characters: the underscore, the caret, or the numeric digit 3.

The character class to satisfy that problem definition is as follows:

[_^3]
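As a rough check of this character class, the sketch below applies it to the lines of Carets.txt using Python's re module. Python is an assumption here, not a tool used in this chapter, and the variable names are purely illustrative.

```python
import re

# Contents of Carets.txt as listed above.
carets_lines = [
    "14^2 expresses the idea of 14 to the power 2.",
    "The ^ character is called a caret.",
    "The _ character is called an underscore or underline character.",
    "3^2 = 9",
    "Eating ^s helps you see in the dark. At least that’s what I think he said.",
]

# The caret is not the first character in the class, so it is a literal caret.
pattern = re.compile(r"[_^3]")

matches_per_line = [pattern.findall(line) for line in carets_lines]
for line, found in zip(carets_lines, matches_per_line):
    print(found, "<-", line)
```

Each line yields only the characters named in the problem definition: a caret in the first, second, and last lines, an underscore in the third, and both 3 and ^ in the fourth.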

Try It Out Using the ^ Inside a Character Class

This example matches the three characters mentioned in the preceding problem definition:

1. Open OpenOffice.org Writer, and open the test file Carets.txt.

2. Use the Ctrl+F keyboard shortcut to open the Find & Replace dialog box.

3. Check the Regular Expressions and Match Case check boxes, and enter the pattern [_^3] in the Search For text box.

4. Click the Find All button, and inspect the results, as shown in Figure 5-18.

5. Modify the regular expression pattern so that it reads [^_3].

6. Click the Find All button, and compare the results shown in Figure 5-19 with the previous results.

How It Works

When the pattern is [_^3], the meaning is simply a character class that matches three characters: the underscore, the caret, and the numeric digit 3.

When the pattern is [^_3], the ^ immediately follows the left square bracket, [, which creates a negated character class. In this case it has the meaning “Match any character except an underscore or the numeric digit 3.”
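The effect of moving the caret can also be seen in a short sketch. It uses Python's re module rather than OpenOffice.org Writer, so it is an illustration under that assumption and not a reproduction of Figures 5-18 and 5-19.

```python
import re

line = "3^2 = 9"  # one line taken from Carets.txt

# Caret in the middle of the class: a literal caret.
literal_caret = re.findall(r"[_^3]", line)

# Caret first in the class: negation. Matches every character that is
# neither an underscore nor the digit 3.
negated = re.findall(r"[^_3]", line)

print(literal_caret)
print(negated)
```

Note that the negated class matches the ^ character itself, along with the spaces, because neither is among the excluded characters.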


Figure 5-18

How to Use the - Metacharacter

You have already seen how the hyphen can be used to indicate a range inside a character class. The question therefore arises as to how you can specify a literal hyphen inside a character class.

The safest way is to use the hyphen as the first character after the left square bracket. In some tools, such as the Komodo Regular Expressions Toolkit, you can also use the hyphen as the character immediately before the right square bracket to match a hyphen. In OpenOffice.org Writer, however, that doesn’t work.
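Hyphen placement can be explored with a small sketch. It uses Python's re module, which is an assumption and not a tool covered in this chapter. Python happens to accept a literal hyphen in either the first or the last position, and also when escaped with a backslash. Other tools may behave differently, as noted above.

```python
import re

text = "a-z"

# Hyphen first in the class: treated as a literal hyphen.
hyphen_first = re.findall(r"[-az]", text)

# Hyphen last in the class: also a literal hyphen in Python's re module.
hyphen_last = re.findall(r"[az-]", text)

# Escaped hyphen: literal regardless of position.
hyphen_escaped = re.findall(r"[a\-z]", text)

# Hyphen between two characters: a range, so the hyphen itself is not matched.
hyphen_range = re.findall(r"[a-z]", text)

print(hyphen_first, hyphen_last, hyphen_escaped, hyphen_range)
```

The first three classes match all three characters of the sample text, while the range form matches only the two letters.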
