Beginning Regular Expressions 2005.pdf

String, Line, and Word Boundaries

The $ Metacharacter

The ^ metacharacter lets you specify that a matching sequence of characters must occur at the beginning of a file or the beginning of a line. The $ metacharacter provides complementary functionality: it specifies that the matching sequence of characters must immediately precede the end of a line or a file.

First, look at a simple example that uses a test text containing a single line, Lathe.txt:

The tool to create round wooden or metal objects is the lathe

As you can see, the sequence of characters the occurs more than once in the sample text. The period that might naturally come at the end of the sample sentence has been omitted to illustrate the effect of the $ metacharacter. The following pattern should match only when the sequence of characters the occurs immediately before the end of the test string:

the$

Try It Out The $ Metacharacter

This example demonstrates the use of the pattern the$:

1. Open PowerGrep, and check the Regular Expressions check box.

2. Enter the pattern the$ in the Search text box.

3. Enter C:\BRegExp\Ch06 in the Folder text box.

4. Enter Lathe.txt in the File Mask text box.

5. Click the Search button, and inspect the results displayed in the Results area, as shown in Figure 6-6.

Figure 6-6

Notice that there is only one match and that the sequence of characters The at the beginning of the line does not match, nor does the word the, which precedes the word lathe.

6. Delete the $ metacharacter in the Search text box.

7. Click the Search button, and inspect the revised results in the Results area.


Chapter 6

Notice that with the $ metacharacter deleted the pattern now has three matches (not illustrated). The first is the The at the beginning of the test text. That matches because the default behavior in PowerGrep is a case-insensitive match. The second is the word the before the word lathe. The third is the character sequence the, which is contained in the word lathe.
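If you do not have PowerGrep at hand, the two searches above can be approximated with Python's re module. This is only a sketch: the re.IGNORECASE flag is an assumption standing in for the case-insensitive default described for PowerGrep, and the test text is typed in directly rather than read from Lathe.txt.

```python
import re

# The single line of Lathe.txt, as given in the text (no trailing period).
text = "The tool to create round wooden or metal objects is the lathe"

# re.IGNORECASE stands in for the case-insensitive default described for PowerGrep.
anchored = [m.start() for m in re.finditer(r"the$", text, re.IGNORECASE)]
unanchored = [m.group() for m in re.finditer(r"the", text, re.IGNORECASE)]

print(anchored)    # start offset of the single match, inside "lathe"
print(unanchored)  # the three matches found once $ is deleted
```

With the $ metacharacter present, the only match starts three characters before the end of the string. Without it, the pattern also matches the initial The and the word the.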

How It Works

The default behavior of PowerGrep is case-insensitive matching. When the regular expression engine starts to match the pattern the$ (the search in Step 5), it starts at the position before the initial The. The regular expression engine attempts to match the three literal characters of the pattern against The and succeeds. Finally, the regular expression engine attempts to match the $ metacharacter against the position that follows the lowercase e of The. That position is not the end of the test string; therefore, the match fails. Because one component of the pattern fails to match, the whole pattern fails to match.

Attempted matching progresses through the test text. The first three characters of the pattern match when the regular expression engine is at the position immediately before the word the. However, as described earlier, the $ metacharacter fails to match; therefore, there is no match for the whole pattern.

However, when the regular expression engine reaches the position after the a of lathe and attempts to match, there is a match. The first character of the pattern, lowercase t, matches the next character, the lowercase t of lathe. The second character of the pattern, lowercase h, matches the h of lathe. The third character of the pattern, lowercase e, matches the lowercase e of lathe. The $ metacharacter of the pattern does match, because the e of lathe is the final character of the test string. Because all components of the pattern match, the whole pattern matches, and the character sequence the of lathe is highlighted as a match in Figure 6-6.
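The walk-through above can be mimicked with a few lines of Python. The function below is a deliberately simplified model written for illustration, not a description of how PowerGrep or any real regular expression engine is implemented. The name trace_the_dollar is hypothetical. At each start position it first tests the three literal characters (ignoring case) and then tests whether the position that follows is the end of the test string, which is the role the $ metacharacter plays here.

```python
def trace_the_dollar(text, literal="the"):
    """Return (start, matched_text, dollar_matches) for every position
    where the literal part of the pattern matches, ignoring case."""
    attempts = []
    for start in range(len(text) - len(literal) + 1):
        candidate = text[start:start + len(literal)]
        if candidate.lower() != literal:
            continue  # the literal characters fail, so move to the next position
        # $ succeeds only if nothing follows the matched characters.
        dollar_matches = (start + len(literal) == len(text))
        attempts.append((start, candidate, dollar_matches))
    return attempts


text = "The tool to create round wooden or metal objects is the lathe"
for start, candidate, dollar_matches in trace_the_dollar(text):
    outcome = "whole pattern matches" if dollar_matches else "$ fails"
    print(start, repr(candidate), outcome)
```

The literal characters match at three positions, but the $ test succeeds only at the last of them, the the of lathe.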

The $ Metacharacter in Multiline Mode

Like the ^ metacharacter, the $ metacharacter can have its behavior modified when it is used in multiline mode. However, not all tools or languages support multiline mode for the $ metacharacter.

Tools or languages that support the $ metacharacter in multiline mode use the $ metacharacter to match the position immediately before a Unicode newline character. Some also match the position immediately before the end of the test string, but not all do, as you will see later.
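As one concrete illustration, Python's re module exposes multiline mode through the re.MULTILINE flag. The three-line test string below is made up for this sketch. Note that in Python's re module the $ metacharacter treats only the line feed character (\n) as a line ending, so it should be taken as an example of one tool's behavior rather than a general rule.

```python
import re

# A made-up three-line test string; each line ends in "art".
text = "part\nstart\nart"

# Default mode: $ matches only at the end of the string
# (or just before a newline that is the very last character).
default_hits = [m.start() for m in re.finditer(r"art$", text)]

# Multiline mode: $ also matches immediately before each newline.
multiline_hits = [m.start() for m in re.finditer(r"art$", text, re.MULTILINE)]

print(default_hits)    # one match, on the last line
print(multiline_hits)  # one match per line
```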

The sample file, ArtMultiple.txt, is shown here:

A part for his car

Wisdom which he wants to impart

Leonardo da Vinci was a star of medieval art

At the start of the race there was a false start

Notice that to make the example a test of the $ metacharacter, the period that might be expected at the end of each sentence has been omitted.


Try It Out The $ Metacharacter in Multiline Mode

This example demonstrates the use of the $ metacharacter with multiline mode:

1. Open PowerGrep, and check the Regular Expressions check box.

2. Enter the pattern art in the Search text box.

3. Enter the text C:\BRegExp\Ch06 in the Folder text box.

4. Enter the text ArtMultiple.txt in the File Mask text box.

5. Click the Search button, and inspect the results in the Results area, as shown in Figure 6-7. Notice that occurrences of the sequence of characters art are matched when they occur at the end of a line and at other positions (in this example, part in Line 1 and the first occurrence of start in Line 7).

Figure 6-7

6. Edit the regular expression pattern to add the $ metacharacter at the end, giving art$.

7. Click the Search button, and inspect the results in the Results area, as shown in Figure 6-8. Notice that the matches for the pattern art that were previously present in the words part in Line 1 and the first occurrence of start in Line 7 are no longer present, because they do not occur at the end of a line. The $ metacharacter means that matches must occur at the end of a line.


Figure 6-8

How It Works

When the regular expression pattern is simply the three literal characters art, any occurrence of those three literal characters is matched.

However, when the $ metacharacter is added to the pattern, the regular expression engine must match the sequence of three literal characters art and must also match the position either immediately before a Unicode newline character or immediately before the end of the test string.

When an attempt is made to match art in part in the first line, the first three characters of the regular expression pattern match; however, the final $ metacharacter of the pattern art$ fails to match. Because a component of the pattern has failed to match, the entire pattern fails to match.

When the regular expression engine has reached a position immediately before the a of impart, it can match the first three characters of the pattern art$ successfully against, respectively, the a, r, and t of impart. Finally, an attempt is made to match the $ metacharacter against the position immediately following the t of impart. Because that position immediately precedes a Unicode newline character (that is, it is the final position on that line), there is a match. Because all the components of the pattern match, the entire pattern matches.

When the regular expression engine has reached a position immediately before the a of the second start on the final line, it can match the first three characters of the pattern art$ successfully against, respectively, the a, r, and t of start. Finally, an attempt is made to match the $ metacharacter against the position immediately following the t of start. Because that position immediately precedes the end of the test string (that is, it is the final position of the test file), there is a match. Because all the components of the pattern match, the entire pattern matches.
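The same comparison can be sketched in Python, again using re.MULTILINE as a stand-in for the line-based behavior PowerGrep shows here. The layout of the test text is an assumption: because the steps refer to Line 1 and Line 7, the sketch places a blank line between each pair of sentences, so the four sentences fall on Lines 1, 3, 5, and 7. The helper line_numbers is a hypothetical name introduced for this example.

```python
import re

# Assumed layout of ArtMultiple.txt: sentences on Lines 1, 3, 5, and 7,
# separated by blank lines, with no trailing periods.
text = "\n\n".join([
    "A part for his car",
    "Wisdom which he wants to impart",
    "Leonardo da Vinci was a star of medieval art",
    "At the start of the race there was a false start",
])


def line_numbers(pattern, text):
    """Return the 1-based line number of each match of pattern in text."""
    flags = re.MULTILINE | re.IGNORECASE
    return [text.count("\n", 0, m.start()) + 1
            for m in re.finditer(pattern, text, flags)]


print(line_numbers(r"art", text))   # matches anywhere in a line
print(line_numbers(r"art$", text))  # matches only at the end of a line
```

Under these assumptions, the pattern art matches on Lines 1, 3, 5, and twice on Line 7, whereas art$ matches only once each on Lines 3, 5, and 7.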
