Ganesh_JavaSE7_Programming_1z0-804_study_guide.pdf

Chapter 7 String Processing

Table 7-4. (continued)

Symbol    Description
\W        Matches non-word characters.
\s        Matches whitespace characters (in Java, equivalent to [ \t\n\x0B\f\r]).
\S        Matches non-whitespace characters.
\b        Matches a word boundary when outside brackets. Inside brackets, many regex flavors treat it as a backspace character (not a backslash).
\B        Matches a non-word boundary.
\A        Matches the beginning of the string.
\Z        Matches the end of the string (before a final line terminator, if any).
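To make the boundary symbols a little more concrete, here is a minimal sketch that counts matches of a few regexes in a short sample string. The class name BoundaryDemo, the helper count(), and the sample input are assumptions made for illustration; they do not come from the book's listings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class BoundaryDemo {

    // Counts how many times the regex matches anywhere in the input.
    static int count(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "cat concat cats";
        System.out.println(count("cat", input));        // 3: every occurrence of "cat"
        System.out.println(count("\\bcat\\b", input));  // 1: only the standalone word
        System.out.println(count("\\Bcat", input));     // 1: only the "cat" inside "concat"
        System.out.println(count("\\Acat", input));     // 1: only at the start of the input
    }
}
```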

 

 

Okay, so far so good. But what if you want to specify a regex when the match involves an occurrence count of characters? Well, for such situations you can use quantifier symbols, provided in Table 7-5.

Table 7-5.  Commonly Used Quantifier Symbols

Symbol       Description
expr?        Matches 0 or 1 occurrence of expr (equivalent to expr{0,1}).
expr*        Matches 0 or more occurrences of expr (equivalent to expr{0,}).
expr+        Matches 1 or more occurrences of expr (equivalent to expr{1,}).
expr{x}      Matches exactly x occurrences of expr.
expr{x,y}    Matches between x and y occurrences of expr.
expr{x,}     Matches x or more occurrences of expr.
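The quantifiers can be tried out with a small sketch such as the following. It uses Pattern.matches(), which succeeds only if the whole input matches the regex. The class name QuantifierDemo, the helper whole(), and the sample inputs are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class QuantifierDemo {

    // Returns true only if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("colou?r", "color"));   // true: 'u' occurs 0 times
        System.out.println(whole("colou?r", "colour"));  // true: 'u' occurs 1 time
        System.out.println(whole("a*", ""));             // true: 0 occurrences allowed
        System.out.println(whole("a+", ""));             // false: at least 1 required
        System.out.println(whole("a{3}", "aaa"));        // true: exactly 3
        System.out.println(whole("a{2,3}", "aaaa"));     // false: 4 is outside 2..3
        System.out.println(whole("a{2,}", "aaaa"));      // true: 2 or more
    }
}
```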

 

 

Regex Support in Java

Java 1.4 SDK introduced regex support in Java. The package java.util.regex supports regex. It consists of two important classes, Pattern and Matcher. Pattern represents a regex in a compiled representation, and Matcher interprets a regex and matches the corresponding substring in a given string.

At this point, you may ask why regex is supported by dedicated classes such as Pattern and Matcher when methods such as split() in the String class already accept a regex. The single-word answer is: performance. A Pattern is compiled once and can then be reused for any number of operations, whereas methods like split() in String generally have to compile the regex again on every call.

Use the Pattern and Matcher classes whenever you perform heavy search or replace operations on strings; reusing a compiled Pattern is more efficient than repeatedly calling split() or other regex-accepting methods of String.
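As a rough illustration of the reuse idea, the sketch below compiles a separator regex once and applies it to several inputs through Pattern.split(). The class name ReusePatternDemo, the field COMMA, the helper fields(), and the sample lines are assumptions; no timing is measured here, and any actual gain depends on the workload.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class ReusePatternDemo {

    // Compiled once and shared by every call to fields().
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits a line on commas, ignoring whitespace around each comma.
    static String[] fields(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        String[] lines = {
            "Danny Doo , Flat no 502,Big Apartment",
            "Maggi Myer, Post bag no 52"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(fields(line)));
        }
    }
}
```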

Now, let’s see how you can use the Pattern and Matcher classes. You first need to call the static method compile() in the Pattern class to get an instance of a pattern. The first argument of this method is a regex. Then, you need to call the matcher() method on that Pattern instance (it is an instance method, not a static one), passing the input string. The matcher() method returns a Matcher object. This object is then used to execute the required operation on the input string.
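The two steps can be sketched as follows. The sketch also shows that a single compiled Pattern can produce a separate Matcher for each input string. The class name PatternMatcherSteps, the helper firstMatch(), and the sample inputs are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class PatternMatcherSteps {

    // Returns the first substring of input that matches the pattern, or null if none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);  // step 2: instance method on Pattern
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");  // step 1: static factory method
        System.out.println(firstMatch(digits, "Flat no 502"));     // 502
        System.out.println(firstMatch(digits, "no digits here"));  // null
    }
}
```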


Let’s look at how to use the Pattern and Matcher classes. Let’s assume that you have a string consisting of personal details (such as name, address, and phone number) of a set of people. You will use the following string for the examples covered in the rest of this section:

String str = "Danny Doo, Flat no 502, Big Apartment, Wide Road, Near Huge Milestone, Hugo-city 56010, Ph: 9876543210, Email: danny@myworld.com. Maggi Myer, Post bag no 52, Big bank post office, Big bank city 56000, ph: 9876501234, Email: maggi07@myuniverse.com.";

You need to write regex metasymbols using the backslash (\); do not use the forward slash (/) instead. The compiler will not report an error if you use the forward slash, because the result is still a valid regex; however, it matches different text, so you won’t get the desired output.
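The difference is easy to see in a small sketch. With a forward slash, the regex "/w+" simply looks for a literal '/' followed by one or more 'w' characters. The class name SlashDemo, the helper countMatches(), and the sample input are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class SlashDemo {

    // Counts how many times the regex matches anywhere in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "Danny Doo";
        System.out.println(countMatches("\\w+", input));  // 2: "Danny" and "Doo"
        System.out.println(countMatches("/w+", input));   // 0: the input has no '/' character
    }
}
```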

Searching and Parsing with regex

Let’s start with a simple example. You need to write code to print all words of the string str. How can you do this? Well, do you remember metacharacter "\w", which matches with all symbols forming a word? You are going to use "\w" along with the quantifier "+" to make "\w+", which means you want to search all words of length one or more; see Listing 7-7.

Listing 7-7.  Regex1.java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex1 {
    public static void main(String[] args) {
        String str = "Danny Doo, Flat no 502, Big Apartment, Wide Road, Near Huge Milestone, Hugo-city 56010, Ph: 9876543210, Email: danny@myworld.com. Maggi Myer, Post bag no 52, Big bank post office, Big bank city 56000, ph: 9876501234, Email: maggi07@myuniverse.com.";

        Pattern pattern = Pattern.compile("\\w+");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

It prints the following:

Danny
Doo
Flat
no
502
...
maggi07
myuniverse
com


(Note that we have truncated the results with ... to save space.) As you can see, the regular expression found all words consisting of at least one character. What happened here is that you invoked the compile() method of the Pattern class with a regex to get an instance of Pattern. After that, you got an instance of the Matcher class by calling the matcher() method on the pattern instance. Finally, you got the results using the find() and group() methods of the Matcher class. The find() method looks for the next match in the input and returns true if it finds one. The group() method returns the most recently matched text as a string.
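The find()/group() loop can also report where each match was found, using the start() and end() methods of Matcher. The following is a minimal sketch; the class name FindLoopDemo, the helper describeMatches(), the "text@start-end" output format, and the sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class FindLoopDemo {

    // Collects every match as "text@start-end" (end index is exclusive).
    static List<String> describeMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {  // each call advances to the next match
            result.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Ph@0-2, 9876543210@4-14]
        System.out.println(describeMatches("\\w+", "Ph: 9876543210"));
    }
}
```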

Note that we used two backslashes in the regex ("\\w+") specified in Listing 7-7. The regex itself is \w+, but the backslash is also an escape character in Java string literals, so each backslash meant for the regex must be written as "\\" in Java source code. This leads to an interesting outcome:

the backslash is an escape character in regex too, so a regex that matches a single literal backslash is \\, which must be written as "\\\\" in a Java program.
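A short sketch may help keep the two levels of escaping apart. The class name BackslashDemo, the helper splitOnBackslash(), and the sample path are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; not one of the book's listings.
class BackslashDemo {

    // The Java literal "\\\\" is the two-character regex \\,
    // which matches one literal backslash in the input.
    static String[] splitOnBackslash(String input) {
        return input.split("\\\\");
    }

    public static void main(String[] args) {
        String path = "C:\\temp";           // the 7-character string C:\temp
        System.out.println(path.length());  // 7
        System.out.println(Arrays.toString(splitOnBackslash(path)));  // [C:, temp]
    }
}
```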

In the same way, you can search for all the numbers using the "\d+" regex. Now, let’s say you want to search for and print all ZIP codes (postal codes) that appear in the string. Assume that the ZIP code length is always 5. Can you achieve this using regex? Will the program in Listing 7-8 work?

Listing 7-8.  Regex2.java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// This program demonstrates how we can search numbers of a specified length
public class Regex2 {
    public static void main(String[] args) {
        String str = "Danny Doo, Flat no 502, Big Apartment, Wide Road, Near Huge Milestone, Hugo-city 56010, Ph: 9876543210, Email: danny@myworld.com. Maggi Myer, Post bag no 52, Big bank post office, Big bank city 56000, ph: 9876501234, Email: maggi07@myuniverse.com.";

        Pattern pattern = Pattern.compile("\\d{5}");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

You used "\d{5}" as the regex string. Let’s see what this program prints:

56010

98765

43210

56000

98765

01234

Oops! It has printed the two ZIP codes but also four fragments of the two phone numbers, which was unexpected. To get only the ZIP codes, you must specify the regex more precisely. Try again with Listing 7-9.


Listing 7-9.  Regex3.java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// This program demonstrates how we can search numbers of a specified length
public class Regex3 {
    public static void main(String[] args) {
        String str = "Danny Doo, Flat no 502, Big Apartment, Wide Road, Near Huge Milestone, Hugo-city 56010, Ph: 9876543210, Email: danny@myworld.com. Maggi Myer, Post bag no 52, Big bank post office, Big bank city 56000, ph: 9876501234, Email: maggi07@myuniverse.com.";

        Pattern pattern = Pattern.compile("\\D\\d{5}\\D");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

It prints the following:

56010,

56000,

This time you used "\D\d{5}\D" and it seemed to work. What you essentially did was specify a five-digit number that is preceded and followed by a non-digit character. Easy, right? Well, there is a problem with this solution.

The program prints one whitespace character just before the five-digit number and a comma just after it (both matched by "\D"). Can you get rid of these unwanted characters? Yes, there is an elegant solution: you can use "\b" (used to detect word boundaries) here. See if this works by trying the code in Listing 7-10.

Listing 7-10.  Regex4.java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// This program demonstrates how we can search numbers of a specified length
public class Regex4 {
    public static void main(String[] args) {
        String str = "Danny Doo, Flat no 502, Big Apartment, Wide Road, Near Huge Milestone, Hugo-city 56010, Ph: 9876543210, Email: danny@myworld.com. Maggi Myer, Post bag no 52, Big bank post office, Big bank city 56000, ph: 9876501234, Email: maggi07@myuniverse.com.";

        Pattern pattern = Pattern.compile("\\b\\d{5}\\b");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

It prints the following:

56010

56000
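For comparison, the "\D\d{5}\D" regex from Listing 7-9 could also be kept and the unwanted characters dropped with a capturing group, a technique this section does not use. The sketch below is only an alternative under that assumption; the class name ZipCaptureDemo, the helper zipCodes(), and the shortened sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not one of the book's listings.
class ZipCaptureDemo {

    // Returns only the five digits captured by the parentheses (group 1),
    // leaving out the non-digit characters matched on either side.
    static List<String> zipCodes(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\D(\\d{5})\\D").matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Hugo-city 56010, Ph: 9876543210, Big bank city 56000, ph: 9876501234,";
        System.out.println(zipCodes(input));  // [56010, 56000]
    }
}
```

Because "\D" must consume a character, this variant would miss a ZIP code at the very start or end of the input, which the "\b" version in Listing 7-10 handles.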
