AhmadLang / Java, How To Program, 2004


Line 21 uses Character method isLetter to determine whether character c is a letter. If so, the method returns true, and otherwise, it returns false. Line 23 uses Character method isLetterOrDigit to determine whether character c is a letter or a digit. If so, the method returns true, and otherwise, it returns false.

Line 25 uses Character method isLowerCase to determine whether character c is a lowercase letter. If so, the method returns true, and otherwise, it returns false. Line 27 uses Character method isUpperCase to determine whether character c is an uppercase letter. If so, the method returns true, and otherwise, it returns false.


Line 29 uses Character method toUpperCase to convert the character c to its uppercase equivalent. The method returns the converted character if the character has an uppercase equivalent, and otherwise, the method returns its original argument. Line 31 uses Character method toLowerCase to convert the character c to its lowercase equivalent. The method returns the converted character if the character has a lowercase equivalent, and otherwise, the method returns its original argument.
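The tests and conversions described above can be tried on a handful of characters. The following is a minimal sketch rather than one of the book's figures; the class name CharacterChecks and the helper method describe are names assumed here for illustration, and only the Character methods themselves come from the text.

```java
// Illustrative sketch: static Character test and case-conversion methods.
class CharacterChecks
{
   // Summarize what the Character methods report for c.
   // (Hypothetical helper; the output format is an assumption of this sketch.)
   static String describe( char c )
   {
      return String.format(
         "letter=%b letterOrDigit=%b lower=%b upper=%b toUpper=%c toLower=%c",
         Character.isLetter( c ), Character.isLetterOrDigit( c ),
         Character.isLowerCase( c ), Character.isUpperCase( c ),
         Character.toUpperCase( c ), Character.toLowerCase( c ) );
   }

   public static void main( String[] args )
   {
      // a lowercase letter, an uppercase letter, a digit and a symbol
      for ( char c : new char[] { 'a', 'Q', '7', '$' } )
         System.out.printf( "'%c': %s%n", c, describe( c ) );
   }
}
```

Note that '7' and '$' come back unchanged from toUpperCase and toLowerCase, because neither has a case equivalent.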

Figure 29.16 demonstrates static Character methods digit and forDigit, which convert characters to digits and digits to characters, respectively, in different number systems. Common number systems include decimal (base 10), octal (base 8), hexadecimal (base 16) and binary (base 2). The base of a number is also known as its radix. For more information on conversions between number systems, see Appendix E.

Figure 29.16. Character class static conversion methods.


 1  // Fig. 29.16: StaticCharMethods2.java
 2  // Static Character conversion methods.
 3  import java.util.Scanner;
 4
 5  public class StaticCharMethods2
 6  {
 7     // execute application
 8     public static void main( String args[] )
 9     {
10        Scanner scanner = new Scanner( System.in );
11
12        // get radix
13        System.out.println( "Please enter a radix:" );
14        int radix = scanner.nextInt();
15
16        // get user choice
17        System.out.printf( "Please choose one:\n1 -- %s\n2 -- %s\n",
18           "Convert digit to character", "Convert character to digit" );
19        int choice = scanner.nextInt();
20
21        // process request
22        switch ( choice )
23        {
24           case 1: // convert digit to character
25              System.out.println( "Enter a digit:" );
26              int digit = scanner.nextInt();
27              System.out.printf( "Convert digit to character: %s\n",
28                 Character.forDigit( digit, radix ) );
29              break;
30
31           case 2: // convert character to digit
32              System.out.println( "Enter a character:" );
33              char character = scanner.next().charAt( 0 );
34              System.out.printf( "Convert character to digit: %s\n",
35                 Character.digit( character, radix ) );
36              break;
37        } // end switch
38     } // end main
39  } // end class StaticCharMethods2

Please enter a radix:
16
Please choose one:
1 -- Convert digit to character
2 -- Convert character to digit
2
Enter a character:
A
Convert character to digit: 10

Please enter a radix:
16
Please choose one:
1 -- Convert digit to character
2 -- Convert character to digit
1
Enter a digit:
13
Convert digit to character: d

Line 28 uses method forDigit to convert the integer digit into a character in the number system specified by the integer radix (the base of the number). For example, the decimal integer 13 in base 16 (the radix) has the character value 'd'. Note that lowercase letters represent the same value as uppercase letters in number systems. Line 35 uses method digit to convert the char variable character into an integer in the number system specified by the integer radix (the base of the number). For example, the character 'A' is the base 16 (the radix) representation of the base 10 value 10. The radix must be between 2 and 36, inclusive.
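The two conversions can also be exercised without keyboard input. The sketch below is not part of the book's figure; the class name RadixConversions and the wrapper methods toChar and toDigit are assumed names. It also shows what the methods return when a value does not fit the radix: Character.digit returns -1, and Character.forDigit returns the null character.

```java
// Illustrative sketch: Character.forDigit and Character.digit in several radixes.
class RadixConversions
{
   // digit value -> character (hypothetical wrapper around Character.forDigit)
   static char toChar( int digit, int radix )
   {
      return Character.forDigit( digit, radix );
   }

   // character -> digit value (hypothetical wrapper around Character.digit)
   static int toDigit( char character, int radix )
   {
      return Character.digit( character, radix );
   }

   public static void main( String[] args )
   {
      System.out.println( toChar( 13, 16 ) );  // d
      System.out.println( toDigit( 'A', 16 ) ); // 10
      System.out.println( toDigit( 'a', 16 ) ); // 10 (case does not matter)
      System.out.println( toDigit( '9', 8 ) );  // -1 ('9' is not an octal digit)
      System.out.println( toChar( 16, 16 ) == '\0' ); // true (16 is not a hex digit)
      System.out.println( toDigit( 'z', Character.MAX_RADIX ) ); // 35
   }
}
```

Character.MIN_RADIX and Character.MAX_RADIX hold the limits 2 and 36 mentioned above.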


The application in Fig. 29.17 demonstrates the creation of Character objects and several non-static methods of class Character: charValue, toString and equals. Lines 8-9 create two Character objects by assigning character literals to Character variables; each char value is autoboxed into a Character object. Line 12 uses Character method charValue to return the char value stored in Character object c1. Line 12 also obtains a string representation of Character object c2 using method toString. The condition in the if...else statement at lines 14-17 uses method equals to determine whether the object c1 has the same contents as the object c2 (i.e., the characters inside each object are equal).

Figure 29.17. Character class non-static methods.


 1  // Fig. 29.17: OtherCharMethods.java
 2  // Non-static Character methods.
 3
 4  public class OtherCharMethods
 5  {
 6     public static void main( String args[] )
 7     {
 8        Character c1 = 'A';
 9        Character c2 = 'a';
10
11        System.out.printf(
12           "c1 = %s\nc2 = %s\n\n", c1.charValue(), c2.toString() );
13
14        if ( c1.equals( c2 ) )
15           System.out.println( "c1 and c2 are equal\n" );
16        else
17           System.out.println( "c1 and c2 are not equal\n" );
18     } // end main
19  } // end class OtherCharMethods

c1 = A

c2 = a

c1 and c2 are not equal
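The output above shows that equals compares the stored characters exactly, so 'A' and 'a' are different. If a case-insensitive comparison is wanted, one option is to normalize both characters first. This is a sketch under that assumption; the class name CharacterObjects and the helper equalIgnoringCase are hypothetical and do not appear in the book.

```java
// Illustrative sketch: comparing Character objects with and without case.
class CharacterObjects
{
   // Hypothetical helper: true if the two characters differ only by case.
   static boolean equalIgnoringCase( Character first, Character second )
   {
      return Character.toLowerCase( first.charValue() ) ==
         Character.toLowerCase( second.charValue() );
   }

   public static void main( String[] args )
   {
      Character c1 = 'A'; // autoboxed from a char literal
      Character c2 = 'a';
      Character c3 = Character.valueOf( 'A' ); // explicit boxing

      System.out.println( c1.equals( c2 ) );            // false
      System.out.println( c1.equals( c3 ) );            // true
      System.out.println( equalIgnoringCase( c1, c2 ) ); // true
   }
}
```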


29.6. Class StringTokenizer

When you read a sentence, your mind breaks the sentence into tokens: individual words and punctuation marks, each of which conveys meaning to you. Compilers also perform tokenization. They break up statements into individual pieces like keywords, identifiers, operators and other elements of a programming language. In this section, we study Java's StringTokenizer class (from package java.util), which breaks a string into its component tokens. Tokens are separated from one another by delimiters, typically whitespace characters such as space, tab, newline and carriage return. Other characters can also be used as delimiters to separate tokens. The application in Fig. 29.18 demonstrates class StringTokenizer.


Figure 29.18. StringTokenizer object used to tokenize strings.


 1  // Fig. 29.18: TokenTest.java
 2  // StringTokenizer class.
 3  import java.util.Scanner;
 4  import java.util.StringTokenizer;
 5
 6  public class TokenTest
 7  {
 8     // execute application
 9     public static void main( String args[] )
10     {
11        // get sentence
12        Scanner scanner = new Scanner( System.in );
13        System.out.println( "Enter a sentence and press Enter" );
14        String sentence = scanner.nextLine();
15
16        // process user sentence
17        StringTokenizer tokens = new StringTokenizer( sentence );
18        System.out.printf( "Number of elements: %d\nThe tokens are:\n",
19           tokens.countTokens() );
20
21        while ( tokens.hasMoreTokens() )
22           System.out.println( tokens.nextToken() );
23     } // end main
24  } // end class TokenTest

Enter a sentence and press Enter
This is a sentence with seven tokens
Number of elements: 7
The tokens are:
This
is
a
sentence
with
seven
tokens

When the user presses the Enter key, the input sentence is stored in String variable sentence. Line 17 creates an instance of class StringTokenizer using String sentence. This StringTokenizer constructor takes a string argument and creates a StringTokenizer for it, using the default delimiter string " \t\n\r\f", which consists of a space, a tab, a newline, a carriage return and a form feed. There are two other constructors for class StringTokenizer. In the version that takes two String arguments, the second String is the delimiter string. In the version that takes three arguments, the second String is the delimiter string and the third argument (a boolean) determines whether the delimiters are also returned as tokens (only if the argument is true). This is useful if you need to know what the delimiters are.
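The three-argument constructor is easiest to understand by comparing its two settings on the same input. The following is a minimal sketch, not a figure from the book; the class name TokenizerConstructors, the helper tokenize and the sample string are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch: custom delimiters, with and without returned delimiters.
class TokenizerConstructors
{
   // Hypothetical helper: collect all tokens produced for the given settings.
   static List<String> tokenize(
      String text, String delimiters, boolean returnDelimiters )
   {
      StringTokenizer tokenizer =
         new StringTokenizer( text, delimiters, returnDelimiters );
      List<String> tokens = new ArrayList<String>();

      while ( tokenizer.hasMoreTokens() )
         tokens.add( tokenizer.nextToken() );

      return tokens;
   }

   public static void main( String[] args )
   {
      // '+' and '-' separate the tokens and are discarded
      System.out.println( tokenize( "10+20-3", "+-", false ) ); // [10, 20, 3]

      // the same delimiters are now returned as one-character tokens
      System.out.println( tokenize( "10+20-3", "+-", true ) ); // [10, +, 20, -, 3]
   }
}
```

Passing false as the third argument behaves like the two-argument constructor.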

Line 19 uses StringTokenizer method countTokens to determine the number of tokens in the string to be tokenized. The condition in the while statement at lines 21-22 uses StringTokenizer method hasMoreTokens to determine whether there are more tokens in the string being tokenized. If so, line 22 prints the next token in the String. The next token is obtained with a call to StringTokenizer method nextToken, which returns a String. The token is output using println, so subsequent tokens appear on separate lines.


If you would like to change the delimiter string while tokenizing a string, you may do so by specifying a new delimiter string in a nextToken call as follows:

tokens.nextToken( newDelimiterString );

This feature is not demonstrated in Fig. 29.18.
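A small sketch of the idea follows. It is not from the book: the class name ChangingDelimiters, the helper labelAndValues and the input format (a label, a space, then comma-separated values) are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch: switching the delimiter string in a nextToken call.
class ChangingDelimiters
{
   // Hypothetical helper; assumes text holds a label and at least one value,
   // for example "scores 90,85,77".
   static List<String> labelAndValues( String text )
   {
      StringTokenizer tokens = new StringTokenizer( text ); // whitespace
      List<String> parts = new ArrayList<String>();

      parts.add( tokens.nextToken() );       // label, delimited by whitespace
      parts.add( tokens.nextToken( " ," ) ); // switch to space and comma

      // the new delimiter string remains in effect for later calls
      while ( tokens.hasMoreTokens() )
         parts.add( tokens.nextToken() );

      return parts;
   }

   public static void main( String[] args )
   {
      System.out.println( labelAndValues( "scores 90,85,77" ) );
      // [scores, 90, 85, 77]
   }
}
```

The space is kept in the new delimiter string on purpose: the space that ended the label has not been consumed yet, so a delimiter string of "," alone would make the next token " 90", with a leading space.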


29.7. Regular Expressions, Class Pattern and Class Matcher

Regular expressions are sequences of characters and symbols that define a set of strings. They are useful for validating input and ensuring that data is in a particular format. For example, a ZIP code must consist of five digits, and a last name must contain only letters, spaces, apostrophes and hyphens. One application of regular expressions is to facilitate the construction of a compiler. Often, a large and complex regular expression is used to validate the syntax of a program. If the program code does not match the regular expression, the compiler knows that there is a syntax error within the code.


Class String provides several methods for performing regular-expression operations, the simplest of which is the matching operation. String method matches receives a string that specifies the regular expression and matches the contents of the String object on which it is called to the regular expression. The method returns a boolean indicating whether the match succeeded.

A regular expression consists of literal characters and special symbols. Figure 29.19 specifies some predefined character classes that can be used with regular expressions. A character class is an escape sequence that represents a group of characters. A digit is any numeric character. A word character is any letter (uppercase or lowercase), any digit or the underscore character. A whitespace character is a space, a tab, a carriage return, a newline or a form feed. Each character class matches a single character in the string we are attempting to match with the regular expression.

Figure 29.19. Predefined character classes.

Character   Matches                Character   Matches
\d          any digit              \D          any non-digit
\w          any word character     \W          any non-word character
\s          any whitespace         \S          any non-whitespace
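To see the predefined character classes in action, each one can be passed to String method matches. The sketch below is an illustration only; the class name PredefinedClasses and the helper matchingClasses are assumed names. Because each class matches exactly one character, a two-character string matches none of them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: which predefined character classes match a string.
class PredefinedClasses
{
   // In Java source, each backslash of a regular expression is doubled.
   private static final String[] CLASSES =
      { "\\d", "\\D", "\\w", "\\W", "\\s", "\\S" };

   // Hypothetical helper: list the classes that match the entire text.
   static List<String> matchingClasses( String text )
   {
      List<String> matches = new ArrayList<String>();

      for ( String characterClass : CLASSES )
         if ( text.matches( characterClass ) )
            matches.add( characterClass );

      return matches;
   }

   public static void main( String[] args )
   {
      System.out.println( matchingClasses( "7" ) );  // [\d, \w, \S]
      System.out.println( matchingClasses( "_" ) );  // [\D, \w, \S]
      System.out.println( matchingClasses( "\t" ) ); // [\D, \W, \s]
      System.out.println( matchingClasses( "ab" ) ); // []
   }
}
```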

 

 

 

 

Regular expressions are not limited to these predefined character classes. The expressions employ various operators and other forms of notation to match complex patterns. We examine several of these techniques in the application in Figs. 29.20 and 29.21, which validates user input via regular expressions. [Note: This application is not designed to match all possible valid user input.]

Figure 29.20. Validating user information using regular expressions.


 1  // Fig. 29.20: ValidateInput.java
 2  // Validate user information using regular expressions.
 3
 4  public class ValidateInput
 5  {
 6     // validate first name
 7     public static boolean validateFirstName( String firstName )
 8     {
 9        return firstName.matches( "[A-Z][a-zA-Z]*" );
10     } // end method validateFirstName
11
12     // validate last name
13     public static boolean validateLastName( String lastName )
14     {
15        return lastName.matches( "[a-zA-Z]+([ '-][a-zA-Z]+)*" );
16     } // end method validateLastName
17
18     // validate address
19     public static boolean validateAddress( String address )
20     {
21        return address.matches(
22           "\\d+\\s+([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)" );
23     } // end method validateAddress
24
25     // validate city
26     public static boolean validateCity( String city )
27     {
28        return city.matches( "([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)" );
29     } // end method validateCity
30
31     // validate state
32     public static boolean validateState( String state )
33     {
34        return state.matches( "([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)" );
35     } // end method validateState
36
37     // validate zip
38     public static boolean validateZip( String zip )
39     {
40        return zip.matches( "\\d{5}" );
41     } // end method validateZip
42
43     // validate phone
44     public static boolean validatePhone( String phone )
45     {
46        return phone.matches( "[1-9]\\d{2}-[1-9]\\d{2}-\\d{4}" );
47     } // end method validatePhone
48  } // end class ValidateInput

Figure 29.21. Inputs and validates data from user using the ValidateInput class.


 1  // Fig. 29.21: Validate.java
 2  // Validate user information using regular expressions.
 3  import java.util.Scanner;
 4
 5  public class Validate
 6  {
 7     public static void main( String[] args )
 8     {
 9        // get user input
10        Scanner scanner = new Scanner( System.in );
11        System.out.println( "Please enter first name:" );
12        String firstName = scanner.nextLine();
13        System.out.println( "Please enter last name:" );
14        String lastName = scanner.nextLine();
15        System.out.println( "Please enter address:" );
16        String address = scanner.nextLine();
17        System.out.println( "Please enter city:" );
18        String city = scanner.nextLine();
19        System.out.println( "Please enter state:" );
20        String state = scanner.nextLine();
21        System.out.println( "Please enter zip:" );
22        String zip = scanner.nextLine();
23        System.out.println( "Please enter phone:" );
24        String phone = scanner.nextLine();
25
26        // validate user input and display error message
27        System.out.println( "\nValidate Result:" );
28
29        if ( !ValidateInput.validateFirstName( firstName ) )
30           System.out.println( "Invalid first name" );
31        else if ( !ValidateInput.validateLastName( lastName ) )
32           System.out.println( "Invalid last name" );
33        else if ( !ValidateInput.validateAddress( address ) )
34           System.out.println( "Invalid address" );
35        else if ( !ValidateInput.validateCity( city ) )
36           System.out.println( "Invalid city" );
37        else if ( !ValidateInput.validateState( state ) )
38           System.out.println( "Invalid state" );
39        else if ( !ValidateInput.validateZip( zip ) )
40           System.out.println( "Invalid zip code" );
41        else if ( !ValidateInput.validatePhone( phone ) )
42           System.out.println( "Invalid phone number" );
43        else
44           System.out.println( "Valid input. Thank you." );
45     } // end main
46  } // end class Validate

Please enter first name:
Jane
Please enter last name:
Doe
Please enter address:
123 Some Street
Please enter city:
Some City
Please enter state:
SS
Please enter zip:
123
Please enter phone:
123-456-7890

Validate Result:
Invalid zip code

Please enter first name:
Jane
Please enter last name:
Doe
Please enter address:
123 Some Street
Please enter city:
Some City
Please enter state:
SS
Please enter zip:
12345
Please enter phone:
123-456-7890

Validate Result:
Valid input. Thank you.

Figure 29.20 validates user input. Line 9 validates the first name. To match a set of characters that does not have a predefined character class, use square brackets, []. For example, the pattern "[aeiou]" matches a single character that is a vowel. Ranges of characters can be represented by placing a dash (-) between two characters. In the example, "[A-Z]" matches a single uppercase letter. If the first character in the brackets is "^", the expression accepts any character other than those indicated. However, it is important to note that "[^Z]" is not the same as "[A-Y]", which matches uppercase letters A-Y; "[^Z]" matches any character other than capital Z, including lowercase letters and non-letters such as the newline character. Ranges in character classes are determined by the letters' integer values. In this example, "[A-Za-z]" matches all uppercase and lowercase letters. The range "[A-z]" matches all letters and also matches those characters (such as [ and \) with an integer value between uppercase Z and lowercase a (for more information on integer values of characters see Appendix B, ASCII Character Set). Like predefined character classes, character classes delimited by square brackets match a single character in the search object.
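The differences among these bracketed character classes can be checked one character at a time. This is a minimal sketch rather than part of the book's application; the class name BracketClasses and the helper matchesOne are assumed names.

```java
// Illustrative sketch: bracketed character classes, ranges and negation.
class BracketClasses
{
   // Hypothetical helper: does the character class match the single character c?
   static boolean matchesOne( String characterClass, char c )
   {
      return String.valueOf( c ).matches( characterClass );
   }

   public static void main( String[] args )
   {
      System.out.println( matchesOne( "[aeiou]", 'e' ) );  // true
      System.out.println( matchesOne( "[A-Y]", 'a' ) );    // false
      System.out.println( matchesOne( "[^Z]", 'a' ) );     // true
      System.out.println( matchesOne( "[^Z]", '\n' ) );    // true
      System.out.println( matchesOne( "[^Z]", 'Z' ) );     // false
      System.out.println( matchesOne( "[A-Za-z]", '[' ) ); // false
      System.out.println( matchesOne( "[A-z]", '[' ) );    // true
   }
}
```

The last two lines show why "[A-z]" is usually a mistake when only letters are intended: '[' has the integer value 91, which lies between 'Z' (90) and 'a' (97).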


In line 9, the asterisk after the second character class indicates that any number of letters can be matched. In general, when the regular-expression operator "*" appears in a regular expression, the application attempts to match zero or more occurrences of the subexpression immediately preceding the "*". Operator "+" attempts to match one or more occurrences of the subexpression immediately preceding "+". So both "A*" and "A+" will match "AAA", but only "A*" will match an empty string.
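These claims are easy to confirm with String method matches. The sketch below is illustrative only; the class name StarAndPlus and the helper isFirstName are assumed names, and isFirstName simply reuses the first-name pattern from line 9 of Fig. 29.20.

```java
// Illustrative sketch: the * and + operators.
class StarAndPlus
{
   // Hypothetical helper reusing the first-name pattern of Fig. 29.20:
   // one uppercase letter followed by zero or more letters.
   static boolean isFirstName( String text )
   {
      return text.matches( "[A-Z][a-zA-Z]*" );
   }

   public static void main( String[] args )
   {
      System.out.println( "AAA".matches( "A*" ) ); // true
      System.out.println( "AAA".matches( "A+" ) ); // true
      System.out.println( "".matches( "A*" ) );    // true  (zero occurrences)
      System.out.println( "".matches( "A+" ) );    // false (needs at least one)

      System.out.println( isFirstName( "J" ) );    // true  (zero letters after J)
      System.out.println( isFirstName( "Jane" ) ); // true
      System.out.println( isFirstName( "jane" ) ); // false (must start uppercase)
   }
}
```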


If method validateFirstName returns true (line 29), the application attempts to validate the last name (line 31) by calling validateLastName (lines 13-16 of Fig. 29.20). The regular expression to validate the last name matches any number of letters split by spaces, apostrophes or hyphens.

Line 33 validates the address by calling method validateAddress (lines 19-23 of Fig. 29.20). The first character class matches any digit one or more times (\\d+). Note that two \ characters are used because \ normally starts an escape sequence in a string. So \\d in a Java string represents the regular expression pattern \d. Then we match one or more whitespace characters (\\s+). The character "|" allows a match of the expression to its left or to its right. For example, "Hi (John|Jane)" matches both "Hi John" and "Hi Jane". The parentheses are used to group parts of the regular expression. In this example, the left side of | matches a single word, and the right side matches two words separated by a single whitespace character. So the address must contain a number followed by one or two words. Therefore, "10 Broadway" and "10 Main Street" are both valid addresses in this example. The city (lines 26-29 of Fig. 29.20) and state (lines 32-35 of Fig. 29.20) methods also match any word of at least one character or, alternatively, any two words of at least one character if the words are separated by a single whitespace character. This means both Waltham and West Newton would match.
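Grouping matters when "|" is used, because "|" applies to everything on either side of it unless parentheses limit its reach. The following sketch is not part of the book's application; the class name Alternation and its constant names are assumptions, and ADDRESS is copied from lines 21-22 of Fig. 29.20.

```java
// Illustrative sketch: alternation with and without grouping.
class Alternation
{
   static final String GROUPED = "Hi (John|Jane)"; // "Hi " then John or Jane
   static final String UNGROUPED = "Hi John|Jane"; // "Hi John", or just "Jane"
   static final String ADDRESS = "\\d+\\s+([a-zA-Z]+|[a-zA-Z]+\\s[a-zA-Z]+)";

   public static void main( String[] args )
   {
      System.out.println( "Hi John".matches( GROUPED ) );   // true
      System.out.println( "Hi Jane".matches( GROUPED ) );   // true
      System.out.println( "Hi Jane".matches( UNGROUPED ) ); // false
      System.out.println( "Jane".matches( UNGROUPED ) );    // true

      System.out.println( "10 Broadway".matches( ADDRESS ) );      // true
      System.out.println( "10 Main Street".matches( ADDRESS ) );   // true
      System.out.println( "10   Broadway".matches( ADDRESS ) );    // true  (\s+)
      System.out.println( "10 Main  Street".matches( ADDRESS ) );  // false (\s)
      System.out.println( "10 Main Street N".matches( ADDRESS ) ); // false
   }
}
```

Several spaces are accepted after the number because that position uses \s+, while the two words of a street name must be separated by exactly one whitespace character because that position uses \s without a quantifier.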


The asterisk (*) and plus (+) are formally called quantifiers. Figure 29.22 lists all the quantifiers. We have already discussed how the asterisk (*) and plus (+) quantifiers work. All quantifiers affect only the subexpression immediately preceding the quantifier. Quantifier question mark (?) matches zero or one occurrences of the expression that it quantifies. A set of braces containing one number ({n}) matches exactly n occurrences of the expression it quantifies. We demonstrate this quantifier to validate the zip code in Fig. 29.20 at line 40. Including a comma after the number enclosed in braces ({n,}) matches at least n occurrences of the quantified expression. A set of braces containing two numbers ({n,m}) matches between n and m occurrences, inclusive, of the expression that it quantifies. Quantifiers may be applied to patterns enclosed in parentheses to create more complex regular expressions.

Figure 29.22. Quantifiers used in regular expressions.

Quantifier   Matches
*            Matches zero or more occurrences of the pattern.
+            Matches one or more occurrences of the pattern.
?            Matches zero or one occurrences of the pattern.
{n}          Matches exactly n occurrences.
{n,}         Matches at least n occurrences.
{n,m}        Matches between n and m (inclusive) occurrences.
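The question-mark and brace quantifiers can be tried in the same way as * and +. The sketch below is an illustration only; the class name BraceQuantifiers and the helper isZip are assumed names, isZip reuses the pattern from line 40 of Fig. 29.20, and the sample strings are made up.

```java
// Illustrative sketch: the ?, {n}, {n,} and {n,m} quantifiers.
class BraceQuantifiers
{
   // Hypothetical helper reusing the zip-code pattern of Fig. 29.20.
   static boolean isZip( String text )
   {
      return text.matches( "\\d{5}" );
   }

   public static void main( String[] args )
   {
      System.out.println( isZip( "123" ) );    // false (too few digits)
      System.out.println( isZip( "12345" ) );  // true
      System.out.println( isZip( "123456" ) ); // false (too many digits)

      System.out.println( "1".matches( "\\d{2,}" ) );      // false
      System.out.println( "123456".matches( "\\d{2,}" ) ); // true

      System.out.println( "123".matches( "\\d{2,4}" ) );   // true
      System.out.println( "12345".matches( "\\d{2,4}" ) ); // false

      System.out.println( "color".matches( "colou?r" ) );  // true (zero u's)
      System.out.println( "colour".matches( "colou?r" ) ); // true (one u)
   }
}
```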

 

 

All of the quantifiers are greedy. This means that they will match as many occurrences as they can as long as the match is still successful. However, if any of these quantifiers is followed by a question mark (?), the quantifier becomes reluctant (sometimes called lazy). It then will match as few occurrences as possible as long as the match is still successful.
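The difference only shows when a pattern can succeed in more than one way, so the sketch below replaces the first match with "#" to reveal how much text each quantifier consumed. It is not from the book; the class name GreedyVersusReluctant, the helper markFirstMatch and the sample strings are assumptions.

```java
// Illustrative sketch: greedy versus reluctant quantifiers.
class GreedyVersusReluctant
{
   // Hypothetical helper: replace the first match of regex with "#".
   static String markFirstMatch( String text, String regex )
   {
      return text.replaceFirst( regex, "#" );
   }

   public static void main( String[] args )
   {
      // greedy: \d+ takes all five digits
      System.out.println( markFirstMatch( "12345", "\\d+" ) );  // #

      // reluctant: \d+? is satisfied with a single digit
      System.out.println( markFirstMatch( "12345", "\\d+?" ) ); // #2345

      // greedy: .+ runs to the last '>' in the string
      System.out.println( markFirstMatch( "<b>bold</b>", "<.+>" ) );  // #

      // reluctant: .+? stops at the first '>'
      System.out.println( markFirstMatch( "<b>bold</b>", "<.+?>" ) ); // #bold</b>
   }
}
```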

The zip code (line 40 in Fig. 29.20) matches a digit five times. This regular expression uses the digit character class and a quantifier with the digit 5 between braces. The phone number (line 46 in Fig. 29.20) matches three digits (the first one cannot be zero) followed by a dash followed by three more digits (again the first one cannot be zero) followed by a dash followed by four more digits.

String method matches checks whether an entire string conforms to a regular expression. For example, we want to accept "Smith" as a last name, but not "9@Smith#". If only a substring matches the regular expression, method matches returns false.
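To contrast a whole-string match with a substring match, the sketch below pairs String method matches with method find of class Matcher (package java.util.regex), which looks for the pattern anywhere in the input. The class name WholeVersusPartial, its method names and the simplified letters-only pattern are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Illustrative sketch: matching the entire string versus finding a substring.
class WholeVersusPartial
{
   // simplified pattern assumed for this sketch: one or more letters
   private static final String LETTERS = "[a-zA-Z]+";

   // true only if the entire text consists of letters
   static boolean matchesEntirely( String text )
   {
      return text.matches( LETTERS );
   }

   // true if any substring of text consists of letters
   static boolean containsMatch( String text )
   {
      return Pattern.compile( LETTERS ).matcher( text ).find();
   }

   public static void main( String[] args )
   {
      System.out.println( matchesEntirely( "Smith" ) );    // true
      System.out.println( matchesEntirely( "9@Smith#" ) ); // false
      System.out.println( containsMatch( "9@Smith#" ) );   // true
   }
}
```

For validation, the whole-string behavior of matches is what is wanted, since stray characters before or after the name cause the check to fail.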

Replacing Substrings and Splitting Strings

Sometimes it is useful to replace parts of a string or to split a string into pieces. For this purpose, class String provides methods replaceAll, replaceFirst and split. These methods are demonstrated in Fig. 29.23.

Figure 29.23. Methods replaceFirst, replaceAll and split.


 1  // Fig. 29.23: RegexSubstitution.java
 2  // Using methods replaceFirst, replaceAll and split.
 3
 4  public class RegexSubstitution
 5  {
 6     public static void main( String args[] )
 7     {
 8        String firstString = "This sentence ends in 5 stars *****";
 9        String secondString = "1, 2, 3, 4, 5, 6, 7, 8";
10
11        System.out.printf( "Original String 1: %s\n", firstString );
12
13        // replace '*' with '^'
14        firstString = firstString.replaceAll( "\\*", "^" );
15
16        System.out.printf( "^ substituted for *: %s\n", firstString );
17
18        // replace 'stars' with 'carets'
19        firstString = firstString.replaceAll( "stars", "carets" );
20
21        System.out.printf(
22           "\"carets\" substituted for \"stars\": %s\n", firstString );
23
24        // replace words with 'word'
25        System.out.printf( "Every word replaced by \"word\": %s\n\n",
26           firstString.replaceAll( "\\w+", "word" ) );
27
28        System.out.printf( "Original String 2: %s\n", secondString );
29
30        // replace first three digits with 'digit'
31        for ( int i = 0; i < 3; i++ )
32           secondString = secondString.replaceFirst( "\\d", "digit" );
33
34        System.out.printf(
35           "First 3 digits replaced by \"digit\" : %s\n", secondString );
36        String output = "String split at commas: [";
37
38        String[] results = secondString.split( ",\\s*" ); // split on commas
39
40        for ( String string : results )
41           output += "\"" + string + "\", "; // output results
42
43        // remove the extra comma and add a bracket
44        output = output.substring( 0, output.length() - 2 ) + "]";
45        System.out.println( output );
46     } // end main
47  } // end class RegexSubstitution