Beginning Regular Expressions 2005.pdf

Parentheses in Regular Expressions

Similarly, on Line 5, the final s of This is repeated in the initial s of sentence.

Because the search was conducted in a case-sensitive way (remember that the Match Case check box was checked), the word Fear, which is doubled as fear on Line 6, is not detected.

When you modify the pattern to include word-boundary metacharacters, the undesired matches on parts of words, such as the and theoretical, no longer occur.

You could have made the beginning and the end of the first word and the beginning and the end of the second word explicit by using the following pattern:

\<([A-Za-z]+)\> +\<\1\>

However, the requirement for at least one space character between the two sequences of characters already marks the end of the first word and the beginning of the second, so either pattern produces the same results.
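The effect of the word boundaries can be checked with a short sketch. It rests on several assumptions: Python's re module is used rather than the search dialog of the chapter's examples, \b stands in for \< and \> because re does not support those metacharacters, the shorter pattern is a reconstruction of the one the text alludes to, and the sample text is invented for illustration.

```python
import re

# Hypothetical sample text, not the chapter's test file.
text = "Paris in the the spring. Check the theoretical case. This sentence is fine."

# No word boundaries: the group may match only part of a word.
loose = re.compile(r"([A-Za-z]+) +\1")
# Boundaries at the outer edges only; the required space marks the inner edges.
outer = re.compile(r"\b([A-Za-z]+) +\1\b")
# Boundaries made explicit around both words (\b replaces \< and \>).
explicit = re.compile(r"\b([A-Za-z]+)\b +\b\1\b")

loose_matches = [m.group(0) for m in loose.finditer(text)]
outer_matches = [m.group(0) for m in outer.finditer(text)]
explicit_matches = [m.group(0) for m in explicit.finditer(text)]

print(loose_matches)     # ['the the', 'the the', 's s']
print(outer_matches)     # ['the the']
print(explicit_matches)  # ['the the']
```

Without boundaries, the pattern also reports the the inside "the theoretical" and the s s that spans "This sentence". Both boundary-aware patterns report only the genuine doubled word.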

Exercises

Test your understanding of the material in this chapter using the following exercises:

1. Specify three patterns that will match the sequences of characters license and licence.

2. Find a solution that will identify the repeating of the word fear, irrespective of whether fear occurs at the beginning of a sentence. Assume that exactly one space character separates the two words.

Chapter 8: Lookahead and Lookbehind

Chapter 7 looked briefly at back references, which allow a very specific form of coordinated testing or examination of related parts of test text, as you saw in the doubled words example in that chapter. A back reference allows you to test whether a sequence of characters has already occurred in the test text and use that previously occurring sequence of characters for some specified purpose. That is very helpful for a narrow range of uses, such as finding doubled words, but a more general form of awareness of preceding or following text allows the developer to express ideas such as “Match a word if it is preceded by a specified sequence of characters” or “Match a sequence of characters if it is followed by a specified sequence of characters.”

Matching a character sequence when you know what does or doesn’t follow or precede it allows you to get rid of many potentially undesired matches. This can be particularly useful when you have to process large amounts of data and the risk of undesired matches in a search-and-replace operation is significant.
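As a rough illustration of how context can protect a search-and-replace operation, the following sketch compares a plain replacement with one guarded by a negative lookahead. Python's re module, the sample sentence, and the replacement word are all assumptions chosen for the example, not taken from the chapter.

```python
import re

# Hypothetical sample text and replacement, invented for this illustration.
text = "Java and JavaScript differ. Java is compiled."

# Plain replacement: every occurrence of "Java" is changed,
# including the one inside "JavaScript".
naive = re.sub(r"Java", "Kotlin", text)

# Guarded replacement: (?!Script) rejects any "Java" that is
# immediately followed by "Script", so "JavaScript" is left alone.
guarded = re.sub(r"Java(?!Script)", "Kotlin", text)

print(naive)    # Kotlin and KotlinScript differ. Kotlin is compiled.
print(guarded)  # Kotlin and JavaScript differ. Kotlin is compiled.
```

The lookahead tests the text that follows without consuming it, so only the four characters of Java are replaced.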

Patterns that implement such problem definitions make a match depend on the context of the words or sequences of characters that are of interest.

The term lookaround is sometimes used to refer to both lookahead and lookbehind.

Lookahead and lookbehind are each split into positive and negative types. So, lookaround consists of positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind.
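The four types can be sketched side by side. The syntax shown is the one used by Python's re module, which is an assumption here; other tools may differ, and some do not support lookbehind at all. The sample text is invented for illustration.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "price $30, weight 30kg, height 180cm, tax $5"

# Positive lookahead (?=...): digits that ARE followed by "kg".
pos_ahead = re.findall(r"\d+(?=kg)", text)

# Negative lookahead (?!...): digits NOT followed by "kg".
# The \d alternative stops the engine from backtracking to a
# partial number such as the "3" of "30kg".
neg_ahead = re.findall(r"\d+(?!\d|kg)", text)

# Positive lookbehind (?<=...): digits that ARE preceded by "$".
pos_behind = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind (?<!...): digits NOT preceded by "$".
# Excluding a preceding digit as well prevents a match that
# starts at the "0" inside "$30".
neg_behind = re.findall(r"(?<![$\d])\d+", text)

print(pos_ahead)   # ['30']
print(neg_ahead)   # ['30', '180', '5']
print(pos_behind)  # ['30', '5']
print(neg_behind)  # ['30', '180']
```

In each case only the digits are returned, because the text examined by a lookahead or lookbehind is tested but is not part of the match. The extra guards in the two negative forms reflect a common pitfall: without them the engine can satisfy the negative condition by matching only part of a number.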

In this chapter, you will learn the following:

What types of situations might benefit from lookahead and lookbehind

How to use positive and negative lookahead

How to use positive and negative lookbehind