PHP Programming with MySQL, Second Edition

CHAPTER 3

Manipulating Strings

$Identifier = "http://www.dongosselin,com";

echo preg_match("/.com$/", $Identifier); // returns 1

To correct the problem, you must escape the period in the pattern as follows:

$Identifier = "http://www.dongosselin,com";

echo preg_match("/\.com$/", $Identifier); // returns 0
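As a quick check, the following sketch runs the unescaped and escaped patterns against both a correct address and the mistyped one. It also shows PHP's built-in preg_quote() function, which adds the backslashes for you when the literal text comes from a variable. The $Domains array is a hypothetical set of test values.

```php
<?php
// Hypothetical test values: one correct address, one with a comma typo
$Domains = ["http://www.dongosselin.com", "http://www.dongosselin,com"];

foreach ($Domains as $Identifier) {
    $Unescaped = preg_match("/.com$/", $Identifier);   // . matches any character
    $Escaped = preg_match("/\.com$/", $Identifier);    // \. matches only a period
    echo "$Identifier: unescaped=$Unescaped, escaped=$Escaped\n";
}

// preg_quote() escapes regex metacharacters, including the period
$Suffix = preg_quote(".com", "/");   // produces \.com
echo preg_match("/" . $Suffix . "$/", "http://www.dongosselin,com"), "\n"; // 0
```

Only the escaped pattern tells the two addresses apart; the unescaped one returns 1 for both.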

Escaping a dollar sign requires a little more work. In a pattern, the dollar sign is the metacharacter that anchors the end of a string, so preg_match() must receive \$ to match a literal $ character. However, because the dollar sign is also used to indicate a variable name in PHP, it needs to be preceded by a backslash for PHP to interpret it as a literal $ character inside a double-quoted string. Therefore, when using double quotation marks around the pattern string, you need to enter two backslashes (\\) to insert the literal backslash, followed by a backslash and a dollar sign (\$) to include the literal dollar sign. Altogether, this becomes three backslashes followed by a dollar sign (\\\$). Another option is to use single quotes around the pattern string, which PHP does not scan for variables, and to use a single backslash before the dollar sign (\$). The following code demonstrates how to use both techniques:

$Currency = '$123.45'; // single quotes: PHP does not look for a variable

echo preg_match('/^\$/', $Currency); // returns 1

echo preg_match("/^\\\$/", $Currency); // returns 1
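The sketch below checks the two quoting styles side by side. Printing the pattern strings shows that both reach preg_match() as the same characters, /^\$/, which is why they behave identically. The sample amounts are hypothetical.

```php
<?php
$SingleQuoted = '/^\$/';     // single quotes: \$ is passed through unchanged
$DoubleQuoted = "/^\\\$/";   // double quotes: \\ becomes \ and \$ becomes $

echo $SingleQuoted, "\n";    // /^\$/
echo $DoubleQuoted, "\n";    // /^\$/

// Hypothetical sample values
foreach (['$123.45', '123.45'] as $Currency) {
    echo $Currency, ": ", preg_match($SingleQuoted, $Currency), "\n";
}
```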

Specifying Quantity

Metacharacters that specify the quantity of a match are called quantifiers. Table 3-4 lists the quantifiers that you can use with PCRE.

Table 3-4 PCRE quantifiers

Quantifier   Description
?            Specifies that the preceding character is optional
+            Specifies that one or more of the preceding characters must match
*            Specifies that zero or more of the preceding characters can match
{n}          Specifies that the preceding character repeat exactly n times
{n,}         Specifies that the preceding character repeat at least n times
{0,n}        Specifies that the preceding character repeat up to n times
{n1,n2}      Specifies that the preceding character repeat at least n1 times but no more than n2 times

Write the lower bound of an "up to n" quantifier explicitly as {0,n}, and do not put spaces inside the braces: many PCRE versions treat {,n}, or a quantifier with a space inside the braces, as literal text rather than as a quantifier.

The question mark quantifier specifies that the preceding character in the pattern is optional. The following code demonstrates how to use the question mark quantifier to specify that the protocol assigned to the beginning of the $URL variable can be either http or https.

$URL = "http://www.dongosselin.com";

preg_match("/^https?/", $URL); // returns 1
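A short sketch of the ? quantifier at work: the s in https? may appear once or not at all, so both protocols match, while a different protocol does not. The URLs are hypothetical test values, and the pattern here also requires the :// that follows the protocol (the slashes are escaped because / is the pattern delimiter).

```php
<?php
// Hypothetical URLs to test against the ^https?:// pattern
$URLs = ["http://www.dongosselin.com", "https://www.dongosselin.com", "ftp://www.dongosselin.com"];

foreach ($URLs as $URL) {
    // ? makes the preceding "s" optional; ^ anchors the match at the start
    echo $URL, " => ", preg_match("/^https?:\/\//", $URL), "\n";
}
```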

The addition quantifier (+) specifies that one or more sequential occurrences of the preceding characters match, whereas the asterisk quantifier (*) specifies that zero or more sequential occurrences of the preceding characters match. As a simple example, the following code demonstrates how to ensure that data has been entered in a required field:

$Name = "Don";

preg_match("/.+/", $Name); // returns 1
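The difference between + and * shows up with an empty string: .+ needs at least one character, while .* is satisfied by zero characters and therefore cannot catch a blank field. The helper name isRequiredFieldFilled() is hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: true when the field contains at least one character
function isRequiredFieldFilled(string $Value): bool
{
    return preg_match("/.+/", $Value) === 1;
}

var_dump(isRequiredFieldFilled("Don"));   // bool(true)
var_dump(isRequiredFieldFilled(""));      // bool(false)
var_dump(preg_match("/.*/", ""));         // int(1): * also accepts zero characters
```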


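Similarly, because a numeric string might contain leading zeroes, the following code demonstrates how to check whether the $NumberString variable contains zero or more leading zeroes:

$NumberString = "00125";

preg_match("/^0*/", $NumberString); // returns 1

Because the * quantifier also accepts zero occurrences, this pattern returns 1 for any string, including one with no leading zeroes at all; it locates the leading zeroes rather than requiring them.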
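A sketch that puts this pattern to work: the optional third argument of preg_match() captures the leading zeroes that ^0* found, and preg_replace() with the stricter ^0+ removes them. The sample values are hypothetical.

```php
<?php
// Hypothetical sample values: with and without leading zeroes
foreach (["00125", "125"] as $NumberString) {
    preg_match("/^0*/", $NumberString, $Matches);          // always matches
    $Stripped = preg_replace("/^0+/", "", $NumberString);  // drop leading zeroes
    echo "$NumberString: zeroes='", $Matches[0], "', stripped=$Stripped\n";
}
```

Note that an all-zero value such as "000" would be stripped to an empty string, so a real cleanup routine might need to keep one zero.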

You can validate a ZIP code much more efficiently with character classes, which are covered later in this chapter.

The { } quantifiers allow you to specify more precisely the number of times that a character must repeat sequentially. The following code shows a simple example of how to use the { } quantifiers to ensure that a ZIP code consists of exactly five characters:

preg_match("/ZIP: .{5}$/", "ZIP: 01562"); // returns 1

The preceding code uses the period metacharacter and the { } quantifiers to ensure that exactly five characters follow “ZIP: ” at the end of the string. The following code specifies that the ZIP code must consist of at least five characters but a maximum of 10 characters, in case the ZIP code contains the dash and four additional numbers that are found in a ZIP+4 number:

preg_match("/(ZIP: .{5,10})$/", "ZIP: 01562-2607"); // returns 1
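The sketch below runs both patterns against a few hypothetical inputs, including one with six characters after the label, which shows that {5} requires exactly five characters while {5,10} accepts anything from five to ten.

```php
<?php
// Hypothetical inputs: five, six, and ten characters after the label
$Samples = ["ZIP: 01562", "ZIP: 015623", "ZIP: 01562-2607"];

foreach ($Samples as $Sample) {
    $Exact = preg_match("/ZIP: .{5}$/", $Sample);        // exactly 5 characters
    $Ranged = preg_match("/ZIP: .{5,10}$/", $Sample);    // 5 to 10 characters
    echo "$Sample => {5}: $Exact, {5,10}: $Ranged\n";
}
```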

Specifying Subexpressions

As you learned earlier, regular expression patterns can include literal values; any strings you validate against a regular expression must contain exact matches for the literal values contained in the pattern. You can also use parentheses metacharacters (( and )) to specify the characters required in a pattern match. Characters contained in a set of parentheses within a regular expression are referred to as a subexpression or subpattern. Subexpressions allow you to determine the format and quantities of the enclosed characters as a group. As an example, consider the following pattern, which defines a regular expression for a telephone number:

"/^(1 )?(\(.{3}\) )?(.{3})(\-.{4})$/"

Notice that the telephone number regular expression pattern includes the ^ and $ metacharacters to anchor both the beginning and end of the pattern. This ensures that a string exactly matches the pattern in a regular expression.

The first and second groups in the preceding pattern include the ? quantifier. This allows a string to optionally include a 1 and the area code. If the string does include these groups, they must be in the exact format of “1 ” for the first pattern and “(nnn) ” for the second pattern, including the space following the area code. Similarly, the telephone number itself includes two groups that require the number to be in the format of “nnn” and “-nnnn.” Because the “1 ” and the area code pattern are optional, all of the following statements return a value of 1:

preg_match("/^(1 )?(\(.{3}\) )?(.{3})(\-.{4})$/", "555-1234");
preg_match("/^(1 )?(\(.{3}\) )?(.{3})(\-.{4})$/", "(707) 555-1234");
preg_match("/^(1 )?(\(.{3}\) )?(.{3})(\-.{4})$/", "1 (707) 555-1234");
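Subexpressions also capture the text they match. Passing a third argument to preg_match() fills an array with the whole match at index 0 and each subexpression at indexes 1 through 4; optional groups that did not take part in the match come back as empty strings. The $Matches variable name and the sample numbers are our own choices.

```php
<?php
$Pattern = "/^(1 )?(\(.{3}\) )?(.{3})(\-.{4})$/";

// Hypothetical sample numbers: without and with the optional groups
foreach (["555-1234", "1 (707) 555-1234"] as $Phone) {
    if (preg_match($Pattern, $Phone, $Matches) === 1) {
        // $Matches[1] = "1 ", [2] = "(nnn) ", [3] = "nnn", [4] = "-nnnn"
        echo $Phone, " => area code: '", $Matches[2], "', number: ",
            $Matches[3], $Matches[4], "\n";
    }
}
```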

As with the string comparisons earlier, the ranges are based on the ASCII values of the characters. Ranges must be specified from smallest to largest value.

Defining Character Classes

You use character classes in regular expressions to treat multiple characters as a single item. You create a character class by enclosing the characters that make up the class with bracket ([]) metacharacters. Any characters included in a character class represent alternate characters that are allowed in a pattern match. As an example of a simple character class, consider the word “analyze,” which the British spell as “analyse.” Both of the following statements return 1 because the character class allows either spelling of the word:

preg_match("/analy[sz]e/", "analyse"); // returns 1

preg_match("/analy[sz]e/", "analyze"); // returns 1

You cannot use the range [A-z] or the range [a-Z] to match all letters. The range [A-z] contains all of the characters with ASCII values of 65 (‘A’) through 122 (‘z’), which includes nonalphabetic characters such as ‘[’ and ‘^’. The range [a-Z] means a range from 97 to 90, which is not in order from smallest to largest value.

In comparison, the following regular expression returns 0 because “analyce” is not an accepted spelling of the word:

preg_match("/analy[sz]e/", "analyce"); // returns 0

You use a hyphen metacharacter (-) to specify a range of values in a character class. You can include alphabetical or numerical ranges. You specify all lowercase letters as [a-z], all uppercase letters as [A-Z], and all letters as [A-Za-z]. You specify all numeric characters as [0-9]. The following statements demonstrate how to ensure that only the values A, B, C, D, or F are assigned to the $LetterGrade variable. The character class in the regular expression specifies a range of A-D or the character “F” as valid values in the variable. Because the variable is assigned a value of "B", the preg_match() function returns 1.

$LetterGrade = "B";

echo preg_match("/[A-DF]/", $LetterGrade); // returns 1



In comparison, the following preg_match() function returns 0 because E is not a valid value in the character class:

$LetterGrade = "E";

echo preg_match("/[A-DF]/", $LetterGrade); // returns 0
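Because neither pattern above is anchored, /[A-DF]/ returns 1 for any string that merely contains one of the allowed letters, such as "BE". A sketch of a stricter check adds ^ and $ so that the whole value must be a single allowed letter; the sample grades are hypothetical.

```php
<?php
// Hypothetical sample grades, including a two-letter value
foreach (["B", "E", "BE"] as $LetterGrade) {
    $Loose = preg_match("/[A-DF]/", $LetterGrade);     // any allowed letter anywhere
    $Strict = preg_match("/^[A-DF]$/", $LetterGrade);  // exactly one allowed letter
    echo "$LetterGrade: loose=$Loose, strict=$Strict\n";
}
```

If a trailing newline should also be rejected, note that $ matches before a final newline; adding the D pattern modifier (as in /^[A-DF]$/D) makes $ match only at the very end of the string.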

To specify characters to exclude from a pattern match, include the ^ metacharacter immediately after the opening bracket of a character class. The following examples demonstrate how to exclude the letters E and G-Z from an acceptable pattern in the $LetterGrade variable. Any ASCII character not listed as being excluded will match the pattern. The first preg_match() function returns a value of 1 because the letter A is not excluded from the pattern match, whereas the second preg_match() function returns a value of 0 because the letter E is excluded from the pattern match.

$LetterGrade = "A";

echo preg_match("/[^EG-Z]/", $LetterGrade); // returns 1

$LetterGrade = "E";

echo preg_match("/[^EG-Z]/", $LetterGrade); // returns 0


The following statements demonstrate how to include or exclude numeric characters from a pattern match. The first statement returns 1 because it allows any numeric character, whereas the second statement returns 0 because it excludes any numeric character.

echo preg_match("/[0-9]/", "5"); // returns 1

echo preg_match("/[^0-9]/", "5"); // returns 0
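The same include/exclude idea gives a simple all-digits check: anchor a repeated class as ^[0-9]+$, or search for any single character in [^0-9]. The function name isAllDigits() is a hypothetical helper, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: true only when every character is 0-9
function isAllDigits(string $Value): bool
{
    return preg_match("/^[0-9]+$/", $Value) === 1;
}

var_dump(isAllDigits("2607"));                // bool(true)
var_dump(isAllDigits("26O7"));                // bool(false): letter O, not zero
var_dump(preg_match("/[^0-9]/", "26O7"));     // int(1): found a non-digit
```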

Note that you can combine ranges in a character class. The first statement demonstrates how to include all alphanumeric characters and the second statement demonstrates how to exclude all lowercase and uppercase letters:

echo preg_match("/[0-9a-zA-Z]/", "7"); // returns 1

echo preg_match("/[^a-zA-Z]/", "Q"); // returns 0
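Combining a-z and A-Z in one class, rather than writing a single range such as [A-z], is what keeps punctuation out: ASCII values 91 through 96 ([, \, ], ^, _, and `) fall inside [A-z]. The sketch below compares the two on a few hypothetical characters.

```php
<?php
// [A-z] spans ASCII 65-122, so it also covers [ \ ] ^ _ and `
foreach (["Q", "q", "_", "^"] as $Char) {
    $WideRange = preg_match("/[A-z]/", $Char);     // single wide range
    $Combined = preg_match("/[a-zA-Z]/", $Char);   // two combined letter ranges
    echo "$Char: [A-z]=$WideRange, [a-zA-Z]=$Combined\n";
}
```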

The following statement demonstrates how to use character classes to create a phone number regular expression pattern:

preg_match("/^(1 )?(\([0-9]{3}\) )?([0-9]{3})(\-[0-9]{4})$/",

"1 (707) 555-1234"); // returns 1

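Swapping the period for [0-9] is what makes the phone pattern reject letters. The sketch below compares the period-based pattern from the subexpression discussion with the character-class version, using a hypothetical input that has the right shape but contains letters.

```php
<?php
$AnyChar = "/^(1 )?(\(.{3}\) )?(.{3})(\-.{4})$/";                 // . accepts anything
$DigitsOnly = "/^(1 )?(\([0-9]{3}\) )?([0-9]{3})(\-[0-9]{4})$/";   // digits only

// Hypothetical inputs: a real number and a same-shaped string of letters
foreach (["1 (707) 555-1234", "1 (ABC) DEF-GHIJ"] as $Phone) {
    echo $Phone, ": any=", preg_match($AnyChar, $Phone),
        ", digits=", preg_match($DigitsOnly, $Phone), "\n";
}
```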
As a more complex example of a character class, examine the following e-mail validation regular expression that you saw earlier in this chapter. At this point, you should recognize how the regular expression pattern is constructed. The statement uses a case-insensitive pattern modifier, so letter case is ignored. The anchor at the beginning of the pattern specifies that the first part of the e-mail address must include one or more of the characters A-Z (uppercase or lowercase), 0-9, an underscore (_), or a hyphen (-). The second portion of the pattern specifies that the e-mail address can include a dot separator, as in “don.gosselin.” The pattern also requires the @ character. Following the literal @ character, the regular expression uses patterns like those in the name portion of the e-mail address to specify the required structure of the domain name. The last portion of the pattern specifies that the top-level domain must consist of at least two, but not more than three, alphabetic characters.

preg_match("/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[_a-z0-9-]+(\.[_a-z0-9-]+)*(\.[a-z]{2,3})$/i", $Email);

Inside a character class, most metacharacters lose their special meaning, but in PCRE the backslash is still an escape character: \-, \^, and \] always stand for a literal hyphen, circumflex, and closing bracket. Without a backslash, a literal hyphen (-) must be the first or last character in the class, as in [_a-z0-9-] above; otherwise, it is interpreted as a range indicator. A literal circumflex (^) must not be the first character after the opening bracket, where it means negation. A literal closing bracket (]) must be the first character after the opening bracket or negation symbol.

If you place an unescaped -, ^, or ] in any other position in a character class, you will not get the desired results.
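The positioning rules can be checked directly. Each pattern below is meant to match one literal special character; the escaped forms rely on PCRE treating the backslash as an escape inside a class. The patterns and subjects are hypothetical test cases.

```php
<?php
// Each pattern should match the single literal character paired with it
$Checks = [
    ["/[a-z-]/", "-"],   // hyphen last: literal
    ["/[\-a-z]/", "-"],  // escaped hyphen: literal anywhere
    ["/[a^]/", "^"],     // circumflex not first: literal
    ["/[]a]/", "]"],     // closing bracket first: literal
    ["/[a\]]/", "]"],    // escaped closing bracket: literal
];

foreach ($Checks as [$Pattern, $Subject]) {
    echo $Pattern, " matches '", $Subject, "': ", preg_match($Pattern, $Subject), "\n";
}
```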

PCRE includes special character types that you can use to represent different types of data. For example, the \w expression can be used instead of the “_0-9a-zA-Z” pattern to allow any alphanumeric characters and the underscore character. Table 3-5 lists the PCRE character types.

Table 3-5 PCRE character types

Escape Sequence   Description
\a                alarm (hex 07)
\cx               “control-x”, where x is any character
\d                any decimal digit
\D                any character not in \d
\e                escape (hex 1B)
\f                formfeed (hex 0C)
\h                any horizontal whitespace character
\H                any character not in \h
\n                newline (hex 0A)
\r                carriage return (hex 0D)
\s                any whitespace character
\S                any character not in \s
\t                tab (hex 09)
\v                any vertical whitespace character
\V                any character not in \v
\w                any letter, number, or underscore character
\W                any character not in \w
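The character types make the ZIP code checks from earlier in this chapter stricter, because \d accepts only digits where the period accepts anything. A sketch, assuming a five-digit code with an optional hyphen and four-digit extension; the helper name isValidZip() and the sample values are hypothetical.

```php
<?php
// Hypothetical helper: five digits, optionally followed by a hyphen and four digits
function isValidZip(string $Zip): bool
{
    // Single quotes keep the pattern's backslashes exactly as written
    return preg_match('/^\d{5}(-\d{4})?$/', $Zip) === 1;
}

foreach (["01562", "01562-2607", "0156", "ABCDE"] as $Zip) {
    echo $Zip, ": ", isValidZip($Zip) ? "valid" : "invalid", "\n";
}
```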

