
Regular Expressions in Java

Then the value of the foundOrNot variable is tested as the condition controlling an if statement. If it is not true, the message No match found. is displayed:

if (!foundOrNot) {
    System.out.println("No match found.");
}

Finally, the tidyUp() method tidies up.

The pattern used is defined in the file Pattern.txt:

\d\w

The pattern matches a numeric digit followed by a word character (meaning an alphabetic character of either case, an underscore character, or a numeric digit).

The test string is located in the file TestText.txt:

3D 2A 5R

There are three matches for the pattern \d\w: 3D, 2A, and 5R.
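The matching described above can be reproduced in isolation. The following is a minimal sketch, not the chapter's own program: it assumes the pattern and test string are supplied inline rather than read from Pattern.txt and TestText.txt, and the class name DigitWordDemo and the method findAll() are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitWordDemo {

    // Collects every match of \d\w in the input, scanning left to right.
    // (Hypothetical helper; the chapter's program reads its inputs from files.)
    static List<String> findAll(String input) {
        Pattern pattern = Pattern.compile("\\d\\w");
        Matcher matcher = pattern.matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("3D 2A 5R")); // prints [3D, 2A, 5R]
    }
}
```

Each call to find() resumes scanning where the previous match ended, which is why the three matches are reported in order.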

The Properties (Fields) of the Pattern Class

The following table summarizes information about the properties (fields) of the Pattern class.

Property (Field)	Description

CANON_EQ	Enables canonical equivalence when matching.
CASE_INSENSITIVE	Enables case-insensitive matching.
COMMENTS	Enables whitespace and comments to be included in the pattern.
DOTALL	With this flag set, the . (period) metacharacter matches all characters.
MULTILINE	Alters the behavior of the ^ (caret) and $ (dollar) positional metacharacters.
UNICODE_CASE	In this mode, case-insensitive matching is applied to all Unicode alphabetic characters (as appropriate).
UNIX_LINES	In this mode, only the \n line terminator affects the behavior of the . (period), ^ (caret), and $ (dollar) metacharacters.
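
The fields in the table are int constants, so more than one can be passed to Pattern.compile() by combining them with the bitwise OR operator. The following is a small sketch under that assumption; the class name CombineFlagsDemo and the method compileBoth() are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

class CombineFlagsDemo {

    // Compiles a pattern with two flags combined by bitwise OR.
    static Pattern compileBoth(String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    }

    public static void main(String[] args) {
        Pattern pattern = compileBoth("^java$");

        // The second line matches only because both flags are active.
        System.out.println(pattern.matcher("Perl\nJAVA\nRuby").find()); // prints true

        // flags() returns the combined bit mask, which can be tested per flag.
        System.out.println((pattern.flags() & Pattern.MULTILINE) != 0); // prints true
    }
}
```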

The CASE_INSENSITIVE Flag

The CASE_INSENSITIVE flag applies only to U.S. ASCII characters. If you need case-insensitive matching to apply to other characters, set the UNICODE_CASE flag in combination with CASE_INSENSITIVE.

The CASE_INSENSITIVE flag can also be specified using the embedded flag expression (?i).
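
The two ways of requesting case-insensitive matching can be compared side by side. This is an illustrative sketch only; the class name CaseInsensitiveDemo and its two helper methods are hypothetical.

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {

    // Case-insensitive matching requested through the compile() flag.
    static boolean matchesWithFlag(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // The same request made inside the pattern with the embedded flag (?i).
    static boolean matchesEmbedded(String regex, String input) {
        return Pattern.compile("(?i)" + regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("regex", "REGEX")); // prints false
        System.out.println(matchesWithFlag("regex", "REGEX")); // prints true
        System.out.println(matchesEmbedded("regex", "REGEX")); // prints true
    }
}
```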

Chapter 25

Using the COMMENTS Flag

When the COMMENTS flag is set, it is possible to include whitespace in a regular expression pattern that is not matched against the test character sequence. In other words, whitespace included in a pattern is ignored, enabling the pattern (and the comments describing the meaning of the pattern’s components) to be displayed in a way that assists a human reader in reading and understanding it.

The # character marks the beginning of a comment. All characters from the # character to the end of the line are ignored (as far as matching is concerned) by the regular expression engine.

Comments mode can also be enabled using the embedded flag expression (?x).
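
As a brief illustration of the embedded form, the sketch below places (?x) at the start of the pattern instead of passing a flag to compile(). The seven-digit number format, the class name EmbeddedCommentsDemo, and the method isMatch() are assumptions made for this example and do not come from the chapter.

```java
import java.util.regex.Pattern;

class EmbeddedCommentsDemo {

    // (?x) turns on comments mode for the rest of the pattern.
    // Each \n ends the comment on that line.
    static final String NUMBER =
        "(?x)\n" +
        "\\d{3}   # three numeric digits\n" +
        "-        # a literal hyphen\n" +
        "\\d{4}   # four numeric digits";

    static boolean isMatch(String input) {
        return Pattern.matches(NUMBER, input);
    }

    public static void main(String[] args) {
        System.out.println(isMatch("555-1234")); // prints true
        System.out.println(isMatch("555 1234")); // prints false
    }
}
```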

The following example shows how comments can be used when attempting to match a U.S. Zip code when the Pattern.COMMENTS flag is set.

Try It Out

Using the COMMENTS Flag

1. Type the following code into a text editor:

import java.util.regex.*;

public class MatchZipComments {

    public static void main(String[] args) throws Exception {
        String myTestString = "12345-1234 23456 45678 01234-1234";

        // Attempt to match US Zip codes.
        // The pattern matches five numeric digits followed by a hyphen followed by four numeric digits.
        String myRegex =
            "\\d{5} " +
            "# Matches five numeric digits" +
            "\n(-\\d{4})* " +
            "# Matches four numeric digits and a hyphen, all of which are optional";

        Pattern myPattern = Pattern.compile(myRegex, Pattern.COMMENTS);
        Matcher myMatcher = myPattern.matcher(myTestString);
        String myMatch = "";

        System.out.println("The test string was '" + myTestString + "'.");
        // Report the pattern that is actually applied once comments and whitespace are ignored.
        System.out.println("The pattern was '\\d{5}(-\\d{4})*'.");

        while (myMatcher.find()) {
            myMatch = myMatcher.group();
            System.out.println("A match '" + myMatch + "' was found.");
        } // end while

        // myMatch is still empty only if find() never succeeded.
        if (myMatch.isEmpty()) {
            System.out.println("There were no matches.");
        } // end if
    } // end main()
}

2. Save the code as MatchZipComments.java. To compile it at the command line, type javac MatchZipComments.java.

3. Run the code. At the command line, type java MatchZipComments, and inspect the results, as shown in Figure 25-4.

Figure 25-4

How It Works

The variable myTestString is assigned a string that contains four character sequences that could be U.S. Zip codes:

String myTestString = "12345-1234 23456 45678 01234-1234";

Conventional Java comments can be used to indicate the purpose of the regular expression:

// Attempt to match US Zip codes.

Similarly, conventional Java comments can be used to specify how the pattern is constructed:

// The pattern matches five numeric digits followed by a hyphen followed by four numeric digits.

Because the Pattern.COMMENTS flag is set when the pattern is compiled, the value of the myRegex variable can be written across several lines, with comments interwoven between the components of the regular expression pattern. Notice that each comment follows a # character and that the \n in the string ends the first comment:

String myRegex =
    "\\d{5} " +
    "# Matches five numeric digits" +
    "\n(-\\d{4})* " +
    "# Matches four numeric digits and a hyphen, all of which are optional";

The second argument of the Pattern class’s compile() method, Pattern.COMMENTS, sets the COMMENTS flag on the pattern assigned to the myPattern variable. When the COMMENTS flag is set, whitespace inside the pattern is ignored, and characters from a # character to the next line terminator are treated as comments:

Pattern myPattern = Pattern.compile(myRegex, Pattern.COMMENTS);

Matching takes place against the myTestString variable using the myPattern object’s matcher() method:

Matcher myMatcher = myPattern.matcher(myTestString);

There are four matches in the myTestString variable. Character sequences 12345-1234 and 01234-1234 match when the optional part of the pattern, (-\d{4})*, matches once; and 23456 and 45678 match when (-\d{4})* matches zero occurrences of the pattern.

The DOTALL Flag

By default, the . (period) metacharacter matches any character except a line terminator. When the DOTALL flag is set, the . (period) metacharacter matches all characters, including line terminators. In Java regular expressions, the term line terminator refers to the following characters (or combinations of characters):

\n — A newline (linefeed) character

\r\n — A carriage-return character followed immediately by a newline character

\r — A carriage return not followed by a newline character

\u0085 — A next-line character

\u2028 — A line-separator character

\u2029 — A paragraph-separator character

The DOTALL mode can also be specified using the embedded flag expression (?s).
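
The effect of DOTALL can be seen with a pattern in which the period has to match a newline. The sketch below is illustrative only; the class name DotAllDemo and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

class DotAllDemo {

    // Default mode: the period does not match a line terminator.
    static boolean matchesDefault(String input) {
        return Pattern.compile("a.b").matcher(input).matches();
    }

    // DOTALL mode: the period matches any character, including \n.
    static boolean matchesDotAll(String input) {
        return Pattern.compile("a.b", Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesDefault("a\nb"));            // prints false
        System.out.println(matchesDotAll("a\nb"));             // prints true
        System.out.println(Pattern.matches("(?s)a.b", "a\nb")); // prints true
    }
}
```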

The MULTILINE Flag

By default, the positional metacharacters ^ and $ match, respectively, the position just before the first character in the test character sequence and the position just after the last character in that sequence. When MULTILINE mode is specified, the ^ metacharacter also matches the position just before the first character on each line, and the $ metacharacter also matches the position just after the final character (ignoring line terminators) on each line.

The MULTILINE flag can also be specified using the embedded flag expression (?m).
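
One way to observe the difference is to count how many whole lines a pattern anchored with ^ and $ can find in a three-line string. This is a minimal sketch; the class name MultilineDemo and the method countLines() are hypothetical, and the flags argument of 0 simply means that no flags are set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {

    // Counts matches of ^\w+$ in the input using the given flags.
    static int countLines(String input, int flags) {
        Matcher matcher = Pattern.compile("^\\w+$", flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        System.out.println(countLines(text, 0));                 // prints 0
        System.out.println(countLines(text, Pattern.MULTILINE)); // prints 3
    }
}
```

Without the flag, the anchors refer to the whole string, and \w+ cannot span the newlines, so nothing matches.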

The UNICODE_CASE Flag

The CASE_INSENSITIVE flag causes matching of U.S. ASCII characters to be carried out in a case-insensitive way. To use case-insensitive matching with other characters, set the UNICODE_CASE flag together with the CASE_INSENSITIVE flag. Using the UNICODE_CASE flag may impose a performance penalty, so you should use it only when it is essential to the purpose of the regular expression.

The UNICODE_CASE flag can also be specified using the embedded flag expression (?u).
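
The difference between the two flags shows up with a non-ASCII letter. The sketch below uses the lowercase and uppercase forms of e with an acute accent, written as Unicode escapes so that the source file encoding does not matter. The class name UnicodeCaseDemo and its methods are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {

    static final String LOWER = "\u00e9"; // lowercase e with acute accent
    static final String UPPER = "\u00c9"; // uppercase E with acute accent

    // CASE_INSENSITIVE alone folds case only for U.S. ASCII characters.
    static boolean asciiOnly() {
        return Pattern.compile(LOWER, Pattern.CASE_INSENSITIVE)
                      .matcher(UPPER).matches();
    }

    // Adding UNICODE_CASE extends case folding to other Unicode letters.
    static boolean unicodeAware() {
        return Pattern.compile(LOWER, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                      .matcher(UPPER).matches();
    }

    // The same combination written with embedded flags.
    static boolean embedded() {
        return Pattern.compile("(?iu)" + LOWER).matcher(UPPER).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiOnly());    // prints false
        System.out.println(unicodeAware()); // prints true
        System.out.println(embedded());     // prints true
    }
}
```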

The UNIX_LINES Flag

The UNIX_LINES flag is set when you are dealing with multiline text originating from a Unix or related operating system where only the \n line terminator is used. Only \n is recognized as affecting the behavior of the . (period), ^ (caret), and $ (dollar) metacharacters.

The UNIX_LINES flag can also be specified using the embedded flag expression (?d).
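
A short way to see the effect is to test whether the period matches a lone carriage return, which is a line terminator by default but not in UNIX_LINES mode. This is a sketch under that reading of the flag; the class name UnixLinesDemo and its methods are hypothetical, and a flags value of 0 means no flags are set.

```java
import java.util.regex.Pattern;

class UnixLinesDemo {

    // Does a.b match when the middle character is a carriage return?
    static boolean dotMatchesCarriageReturn(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\rb").matches();
    }

    // Does a.b match when the middle character is a newline?
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatchesCarriageReturn(0));                  // prints false
        System.out.println(dotMatchesCarriageReturn(Pattern.UNIX_LINES)); // prints true
        System.out.println(dotMatchesNewline(Pattern.UNIX_LINES));        // prints false
        System.out.println(Pattern.matches("(?d)a.b", "a\rb"));           // prints true
    }
}
```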
