
AhmadLang / Java, How To Program, 2004


Original String 1: This sentence ends in 5 stars *****
^ substituted for *: This sentence ends in 5 stars ^^^^^
"carets" substituted for "stars": This sentence ends in 5 carets ^^^^^
Every word replaced by "word": word word word word word word ^^^^^
Original String 2: 1, 2, 3, 4, 5, 6, 7, 8
First 3 digits replaced by "digit" : digit, digit, digit, 4, 5, 6, 7, 8
String split at commas: ["digit", "digit", "digit", "4", "5", "6", "7", "8"]


Method replaceAll replaces text in a string with new text (the second argument) wherever the original string matches a regular expression (the first argument). Line 14 replaces every instance of "*" in firstString with "^". Note that the regular expression ("\\*") precedes character * with two backslashes, \. Normally, * is a quantifier indicating that a regular expression should match any number of occurrences of a preceding pattern. However, in line 14, we want to find all occurrences of the literal character *. To do this, we must escape character * with character \. By escaping a special regular-expression character with a \, we instruct the regular-expression matching engine to find the actual character, as opposed to what it represents in a regular expression. Since the expression is stored in a Java string and \ is a special character in Java strings, we must include an additional \. So the Java string "\\*" represents the regular-expression pattern \*, which matches a single * character in the search string. In line 19, every match for the regular expression "stars" in firstString is replaced with "carets".
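The escaping rule described above can be tried in isolation. The following is a minimal sketch, not the figure's original listing: the class name ReplaceAllSketch and its two helper methods are hypothetical, and the sample sentence is taken from the program output shown earlier.

```java
// Hypothetical demo class; the names are illustrative, not from the original figure.
class ReplaceAllSketch {
    // The Java string "\\*" is the regular expression \*, which matches a literal '*'.
    static String starsToCarets(String s) {
        return s.replaceAll("\\*", "^");
    }

    // "stars" contains no special characters, so it matches itself literally.
    static String renameStars(String s) {
        return s.replaceAll("stars", "carets");
    }

    public static void main(String[] args) {
        String firstString = "This sentence ends in 5 stars *****";

        firstString = starsToCarets(firstString);
        System.out.println(firstString); // This sentence ends in 5 stars ^^^^^

        firstString = renameStars(firstString);
        System.out.println(firstString); // This sentence ends in 5 carets ^^^^^
    }
}
```

Writing "*" without the backslashes would instead throw a PatternSyntaxException, because the quantifier would have nothing to quantify.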


Method replaceFirst (line 32) replaces the first occurrence of a pattern match. Java Strings are immutable; therefore, method replaceFirst returns a new string in which the appropriate characters have been replaced. This line takes the original string and replaces it with the string returned by replaceFirst. By iterating three times, we replace the first three instances of a digit (\d) in secondString with the text "digit".

Method split divides a string into several substrings. The original string is broken in any location that matches a specified regular expression. Method split returns an array of strings containing the substrings between matches for the regular expression. In line 38, we use method split to tokenize a string of comma-separated integers. The argument is the regular expression that locates the delimiter. In this case, we use the regular expression ",\\s*" to separate the substrings wherever a comma occurs. By matching any whitespace characters, we eliminate extra spaces from the resulting substrings. Note that the commas and whitespace are not returned as part of the substrings. Again, note that the Java string ",\\s*" represents the regular expression ,\s*.
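The replaceFirst loop and the split call discussed above might look roughly like the sketch below. The class DigitTokensSketch and its method names are assumptions made for illustration; only the regular expressions and the sample data come from the text.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the original figure.
class DigitTokensSketch {
    // Strings are immutable, so each replaceFirst result is assigned back to s.
    static String replaceFirstDigits(String s, int count) {
        for (int i = 0; i < count; i++) {
            s = s.replaceFirst("\\d", "digit");
        }
        return s;
    }

    // Split at each comma plus any whitespace that follows it (regex ,\s*).
    static String[] tokenize(String s) {
        return s.split(",\\s*");
    }

    public static void main(String[] args) {
        String secondString = "1, 2, 3, 4, 5, 6, 7, 8";

        secondString = replaceFirstDigits(secondString, 3);
        System.out.println(secondString); // digit, digit, digit, 4, 5, 6, 7, 8

        String[] tokens = tokenize(secondString);
        System.out.println(Arrays.toString(tokens)); // [digit, digit, digit, 4, 5, 6, 7, 8]
    }
}
```

Because the delimiter pattern consumes the whitespace after each comma, the tokens need no trimming afterward.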

Classes Pattern and Matcher

In addition to the regular-expression capabilities of class String, Java provides other classes in package java.util.regex that help developers manipulate regular expressions. Class Pattern represents a regular expression. Class Matcher contains both a regular-expression pattern and a CharSequence in which to search for the pattern.

CharSequence is an interface that allows read access to a sequence of characters. The interface requires that the methods charAt, length, subSequence and toString be declared. Both String and StringBuffer implement interface CharSequence, so an instance of either of these classes can be used with class Matcher.

Common Programming Error 29.4

A regular expression can be tested against an object of any class that implements interface CharSequence, but the regular expression must be a String. Attempting to create a regular expression as a StringBuffer is an error.
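As a small illustration of this point, the sketch below passes both a StringBuffer and a String as the search object, while the regular expression itself is always supplied as a String. The class CharSequenceSketch and the five-digit pattern are assumptions chosen for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the pattern are illustrative only.
class CharSequenceSketch {
    // The search object may be any CharSequence; the regex must be a String.
    static boolean isFiveDigits(CharSequence input) {
        return Pattern.matches("\\d{5}", input);
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("12345");
        System.out.println(isFiveDigits(buffer)); // true
        System.out.println(isFiveDigits("1234")); // false

        // A regex assembled in a StringBuffer must be converted with toString()
        // before it can be passed where a String regex is expected.
        StringBuffer regex = new StringBuffer("\\d").append("{5}");
        System.out.println(Pattern.matches(regex.toString(), buffer)); // true
    }
}
```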


If a regular expression will be used only once, static Pattern method matches can be used. This method takes a string that specifies the regular expression and a CharSequence on which to perform the match. This method returns a boolean indicating whether the search object (the second argument) matches the regular expression.

If a regular expression will be used more than once, it is more efficient to use static Pattern method compile to create a specific Pattern object for that regular expression. This method receives a string representing the pattern and returns a new Pattern object, which can then be used to call method matcher. This method receives a CharSequence to search and returns a Matcher object.

Matcher provides method matches, which performs the same task as Pattern method matches, but receives no arguments; the search pattern and search object are encapsulated in the Matcher object. Class Matcher provides other methods, including find, lookingAt, replaceFirst and replaceAll.
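The two approaches can be compared side by side. This is a sketch under assumed names (PatternReuseSketch, FIVE_DIGITS and the helper methods are hypothetical); it uses only the Pattern and Matcher methods named above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the pattern are illustrative only.
class PatternReuseSketch {
    // Compiled once and reused for every call to matchesCompiled.
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\d{5}");

    // One-off check: the regex is compiled again on every call.
    static boolean matchesOnce(CharSequence input) {
        return Pattern.matches("\\d{5}", input);
    }

    // Reusable check: method matcher binds the compiled pattern to the input,
    // and Matcher method matches then takes no arguments.
    static boolean matchesCompiled(CharSequence input) {
        Matcher matcher = FIVE_DIGITS.matcher(input);
        return matcher.matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesOnce("12345"));      // true
        System.out.println(matchesCompiled("12345"));  // true
        System.out.println(matchesCompiled("123456")); // false
    }
}
```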

Figure 29.24 presents a simple example that employs regular expressions. This program matches birthdays against a regular expression. The expression only matches birthdays that do not occur in April and that belong to people whose names begin with "J".

Lines 11-12 create a Pattern by invoking static Pattern method compile. The dot character "." in the regular expression (line 12) matches any single character except a new-line character.

Figure 29.24. Regular expressions checking birthdays.

 1  // Fig. 29.24: RegexMatches.java
 2  // Demonstrating Classes Pattern and Matcher.
 3  import java.util.regex.Matcher;
 4  import java.util.regex.Pattern;
 5
 6  public class RegexMatches
 7  {
 8     public static void main( String args[] )
 9     {
10        // create regular expression
11        Pattern expression =
12           Pattern.compile( "J.*\\d[0-35-9]-\\d\\d-\\d\\d" );
13
14        String string1 = "Jane's Birthday is 05-12-75\n" +
15           "Dave's Birthday is 11-04-68\n" +
16           "John's Birthday is 04-28-73\n" +
17           "Joe's Birthday is 12-17-77";
18
19        // match regular expression to string and print matches
20        Matcher matcher = expression.matcher( string1 );
21
22        while ( matcher.find() )
23           System.out.println( matcher.group() );
24     } // end main
25  } // end class RegexMatches

Jane's Birthday is 05-12-75
Joe's Birthday is 12-17-77


Line 20 creates the Matcher object for the compiled regular expression and the matching sequence (string1). Lines 22-23 use a while loop to iterate through the string. Line 22 uses Matcher method find to attempt to match a piece of the search object to the search pattern. Each call to this method starts at the point where the last call ended, so multiple matches can be found. Matcher method lookingAt also attempts to match a piece of the search object, but it always starts at the beginning of the search object and succeeds only if the pattern matches there; unlike matches, it does not require the entire search object to match.

Common Programming Error 29.5

Method matches (from class String, Pattern or Matcher) will return true only if the entire search object matches the regular expression. Methods find and lookingAt (from class Matcher) will return true if a portion of the search object matches the regular expression.

Line 23 uses Matcher method group, which returns the string from the search object that matches the search pattern. The string that is returned is the one that was last matched by a call to find or lookingAt. The output in Fig. 29.24 shows the two matches that were found in string1.
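The difference between matches, lookingAt and find is easiest to see on one input. The sketch below is illustrative only: the class MatcherMethodsSketch, the DATE pattern and the helper names are assumptions, not part of Fig. 29.24. Note that, per the java.util.regex documentation, lookingAt requires the match to begin at the start of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the pattern are illustrative only.
class MatcherMethodsSketch {
    private static final Pattern DATE = Pattern.compile("\\d\\d-\\d\\d-\\d\\d");

    // true only if the entire input is a date
    static boolean isExactlyDate(CharSequence input) {
        return DATE.matcher(input).matches();
    }

    // true if a date begins at the first character of the input
    static boolean startsWithDate(CharSequence input) {
        return DATE.matcher(input).lookingAt();
    }

    // Collects every match; each call to find resumes where the last match ended,
    // and group returns the text of the most recent match.
    static String findAll(CharSequence input) {
        StringBuilder found = new StringBuilder();
        Matcher matcher = DATE.matcher(input);
        while (matcher.find()) {
            if (found.length() > 0) {
                found.append(", ");
            }
            found.append(matcher.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String text = "05-12-75 and 12-17-77";
        System.out.println(isExactlyDate(text));         // false
        System.out.println(startsWithDate(text));        // true
        System.out.println(startsWithDate("on " + text)); // false
        System.out.println(findAll(text));               // 05-12-75, 12-17-77
    }
}
```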

Regular Expression Web Resources

This section presents several of Java's regular-expression capabilities. The following Web sites provide more information on regular expressions.

developer.java.sun.com/developer/technicalArticles/releases/1.4regex

Thoroughly describes Java's regular-expression capabilities.

java.sun.com/docs/books/tutorial/extra/regex/index.html

This tutorial explains how to use Java's regular-expression API.

java.sun.com/j2se/5.0/docs/api/java/util/regex/package-summary.html

This page is the javadoc overview of package java.util.regex.


29.8. Wrap-Up

In this chapter, you learned about more String methods for selecting portions of Strings and manipulating Strings. You also learned about the Character class and some of the methods it declares to handle chars. The chapter also discussed the capabilities of the StringBuffer class for creating Strings. The end of the chapter discussed regular expressions, which provide a powerful capability to search and match portions of Strings that fit a particular pattern.

Summary

A character literal's value is its integer value in the Unicode character set. Strings can include letters, digits and special characters such as +, -, *, / and $. A string in Java is an object of class String. String literals are often referred to as String objects and are written in double quotes in a program.

String objects are immutable; their character contents cannot be changed after they are created.

String method length returns the number of characters in a String.

String method charAt returns the character at a specific position.

String method equals tests any two objects for equality. The method returns true if the contents of the Strings are equal, false otherwise. Method equals uses a lexicographical comparison for Strings.

When primitive-type values are compared with ==, the result is true if both values are identical.

When references are compared with ==, the result is true if both references refer to the same object in memory.

Java treats all string literals with the same contents as a single String object.

String method equalsIgnoreCase performs a case-insensitive string comparison.

String method compareTo uses a lexicographical comparison and returns 0 if the strings it is comparing are equal, a negative number if the string that compareTo is invoked on is less than the String that is passed as an argument and a positive number if the string that compareTo is invoked on is greater than the string that is passed as an argument.

String method regionMatches compares portions of two strings for equality.

String method startsWith determines whether a string starts with the characters specified as an argument. String method endsWith determines whether a string ends with the characters specified as an argument.

String method indexOf locates the first occurrence of a character or a substring in a string. String method lastIndexOf locates the last occurrence of a character or a substring in a string.

String method substring copies and returns part of an existing String object.

String method concat concatenates two String objects and returns a new String object containing the characters from both original strings.

String method replace returns a new String object in which every occurrence of its first character argument is replaced with its second character argument.

String method toUpperCase returns a new string with uppercase letters in the positions where the original string had lowercase letters. String method toLowerCase returns a new string with lowercase letters in the positions where the original string had uppercase letters.

String method trim returns a new String object in which all whitespace characters (e.g., spaces, newlines and tabs) have been removed from the beginning and end of a string.

String method toCharArray returns a char array containing a copy of the string's characters.

String class static method valueOf returns its argument converted to a String.

Class StringBuffer provides constructors that enable StringBuffers to be initialized with no characters and an initial capacity of 16 characters, with no characters and an initial capacity specified in the integer argument, or with a copy of the characters of the String argument and an initial capacity that is the number of characters in the String argument plus 16.

StringBuffer method length returns the number of characters currently stored in a StringBuffer. StringBuffer method capacity returns the number of characters that can be stored in a StringBuffer without allocating more memory.

StringBuffer method ensureCapacity ensures that a StringBuffer has at least the specified capacity. StringBuffer method setLength increases or decreases the length of a StringBuffer.

StringBuffer method charAt returns the character at the specified index. StringBuffer method setCharAt sets the character at the specified position. StringBuffer method getChars copies characters in the StringBuffer into the character array passed as an argument.

Class StringBuffer provides overloaded append methods to add primitive-type, character array, String, Object and CharSequence values to the end of a StringBuffer. StringBuffers and the append methods are used by the Java compiler to implement the + and += concatenation operators.

Class StringBuffer provides overloaded insert methods to insert primitive-type, character array, String, Object and CharSequence values at any position in a StringBuffer.

Class Character provides a constructor that takes a char argument.


Character method isDefined determines whether a character is defined in the Unicode character set. If so, the method returns true; otherwise, it returns false.

Character method isDigit determines whether a character is a defined Unicode digit. If so, the method returns true; otherwise, it returns false.

Character method isJavaIdentifierStart determines whether a character can be used as the first character of an identifier in Java [i.e., a letter, an underscore (_) or a dollar sign ($)]. If so, the method returns true; otherwise, it returns false.

Character method isJavaIdentifierPart determines whether a character can be used in an identifier in Java [i.e., a digit, a letter, an underscore (_) or a dollar sign ($)]. Character method isLetter determines whether a character is a letter. Character method isLetterOrDigit determines whether a character is a letter or a digit. In each case, if so, the method returns true; otherwise, it returns false.

Character method isLowerCase determines whether a character is a lowercase letter. Character method isUpperCase determines whether a character is an uppercase letter. In both cases, if so, the method returns true; otherwise, it returns false.

Character method toUpperCase converts a character to its uppercase equivalent. Character method toLowerCase converts a character to its lowercase equivalent.

Character method digit converts its character argument into an integer in the number system specified by its integer argument radix. Character method forDigit converts its integer argument digit into a character in the number system specified by its integer argument radix.

Character method charValue returns the char stored in a Character object. Character method toString returns a String representation of a Character.

StringTokenizer's single-argument constructor creates a StringTokenizer for its String argument that will use the default delimiter string " \t\n\r\f", consisting of a space, a tab, a newline, a carriage return and a form feed, for tokenization.

StringTokenizer method countTokens returns the number of tokens in a string to be tokenized.

StringTokenizer method hasMoreTokens determines whether there are more tokens in the string being tokenized.

StringTokenizer method nextToken returns a String with the next token.

Regular expressions are sequences of characters and symbols that define a set of strings. They are useful for validating input and ensuring that data is in a particular format.

String method matches receives a string that specifies the regular expression and matches the contents of the String object on which it is called to the regular expression. The method returns a boolean indicating whether the match succeeded.

A character class is an escape sequence that represents a group of characters. Each character class matches a single character in the string we are attempting to match with the regular expression.

A word character (\w) is any letter (uppercase or lowercase), any digit or the underscore character.

A whitespace character (\s) is a space, a tab, a carriage return, a newline or a form feed.

A digit (\d) is any numeric character.

To match a set of characters that does not have a predefined character class, use square brackets, []. Ranges of characters can be represented by placing a dash (-) between two characters. If the first character in the brackets is "^", the expression accepts any character other than those indicated.

When the regular expression operator "*" appears in a regular expression, the program attempts to match zero or more occurrences of the subexpression immediately preceding the "*".


Operator "+" attempts to match one or more occurrences of the subexpression preceding it.

The character "|" allows a match of the expression to its left or to its right.

The parentheses () are used to group parts of the regular expression.

The asterisk (*) and plus (+) are formally called quantifiers.

All quantifiers affect only the subexpression immediately preceding the quantifier.

Quantifier question mark (?) matches zero or one occurrences of the expression that it quantifies.

A set of braces containing one number ({n}) matches exactly n occurrences of the expression it quantifies.

Including a comma after the number enclosed in braces ({n,}) matches at least n occurrences of the quantified expression.

A set of braces containing two numbers ({n,m}) matches between n and m occurrences of the expression that it quantifies.

All of the quantifiers are greedy, which means that they will match as many occurrences as they can as long as the match is successful.

If any of these quantifiers is followed by a question mark (?), the quantifier becomes reluctant, matching as few occurrences as possible as long as the match is successful.

String method replaceAll replaces text in a string with new text (the second argument) wherever the original string matches a regular expression (the first argument).

Escaping a special regular-expression character with a \ instructs the regular-expression matching engine to find the actual character, as opposed to what it represents in a regular expression.

String method replaceFirst replaces the first occurrence of a pattern match. Java Strings are immutable; therefore, method replaceFirst returns a new string in which the appropriate characters have been replaced.

String method split divides a string into several substrings. The original string is broken in any location that matches a specified regular expression. Method split returns an array of strings containing the substrings between matches for the regular expression.

Class Pattern represents a regular expression.

Class Matcher contains both a regular-expression pattern and a CharSequence in which to search for the pattern.

CharSequence is an interface that allows read access to a sequence of characters. Both String and StringBuffer implement interface CharSequence, so an instance of either of these classes can be used with class Matcher.

If a regular expression will be used only once, static Pattern method matches takes a string that specifies the regular expression and a CharSequence on which to perform the match. This method returns a boolean indicating whether the search object matches the regular expression.

If a regular expression will be used more than once, it is more efficient to use static Pattern method compile to create a specific Pattern object for that regular expression. This method receives a string representing the pattern and returns a new Pattern object.

Pattern method matcher receives a CharSequence to search and returns a Matcher object.

Matcher method matches performs the same task as Pattern method matches, but receives no arguments.

Matcher method find attempts to match a piece of the search object to the search pattern. Each call to this method starts at the point where the last call ended, so multiple matches can be found.

Matcher method lookingAt also attempts to match a piece of the search object, but it always starts at the beginning of the search object and succeeds only if the pattern matches there; unlike matches, it does not require the entire search object to match.

Matcher method group returns the string from the search object that matches the search pattern. The string that is returned is the one that was last matched by a call to find or lookingAt.
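The greedy and reluctant quantifiers summarized above can be contrasted with a short sketch. The class QuantifierSketch and its firstMatch helper are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuantifierSketch {
    // Returns the first piece of input that matches regex, or null if none does.
    static String firstMatch(String regex, CharSequence input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "12345";

        // Greedy: \d+ takes as many digits as it can.
        System.out.println(firstMatch("\\d+", input));  // 12345

        // Reluctant: \d+? stops as soon as the match can succeed.
        System.out.println(firstMatch("\\d+?", input)); // 1

        // Exact and bounded counts: {3} and {2,4}.
        System.out.println(firstMatch("\\d{3}", input));   // 123
        System.out.println(firstMatch("\\d{2,4}", input)); // 1234
    }
}
```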


Terminology

append method of class StringBuffer
capacity method of class StringBuffer
character literal
charAt method of class StringBuffer
CharSequence interface
charValue method of class Character
concat method of class String
delete method of class StringBuffer
deleteCharAt method of class StringBuffer
delimiter for tokens
digit method of class Character
empty string
endsWith method of class String
ensureCapacity method of class StringBuffer
find method of class Matcher
forDigit method of class Character
getChars method of class String
getChars method of class StringBuffer
greedy quantifier
hasMoreTokens method of class StringTokenizer
immutable
indexOf method of class String
isDefined method of class Character
isDigit method of class Character
isJavaIdentifierPart method of class Character
isJavaIdentifierStart method of class Character
isLetter method of class Character
isLetterOrDigit method of class Character
isLowerCase method of class Character
isUpperCase method of class Character
lastIndexOf method of class String
lazy quantifier
length method of class String
length method of class StringBuffer
lexicographical comparison
lookingAt method of class Matcher
Matcher class
matcher method of class Pattern
matches method of class Matcher
matches method of class Pattern
matches method of class String
nextToken method of class StringTokenizer
Pattern class
predefined character class
quantifier for regular expression
radix
regionMatches method of class String
regular expressions
reluctant quantifier
replaceAll method of class String
replaceFirst method of class String
reverse method of class StringBuffer
setCharAt method of class StringBuffer
special character
split method of class String