Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Московский государственный индустриальный университет

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

Oracle Database 11g.pdf

Скачиваний:

Добавлен:

10.06.2015

Размер:

12.69 Mб

Скачать

☆

<<< < Предыдущая 51 52 53 54 55 56 57 58 59 60 61 6263 / 8563 64 65 66 67 68 69 70 71 72 73 74 75 > Следующая >>>

Using Linguistic Sorts and Searches

657

Different languages follow different rules when it comes to sorting text. Unfortunately, that means that there is no “one-size-fits-all” algorithm that can be used. Instead, Oracle’s global support functionality allows not only binary sorting methods but also linguistic sorting and searching methodologies to provide the flexibility to support the needs of many languages.

In the following sections, you will learn the methods that Oracle uses when it performs text-sorting operations. You will also learn about the different NLS parameters that impact linguistic sorting and searching. Next, you will learn about the different types of linguistic sorts supported by Oracle. Last, you will learn about linguistic text searches.

An Overview of Text Sorting

There are many ways in which text sorting can be accomplished. Sort order can be case sensitive or case can be ignored. Diacritics (accent marks) can be considered or ignored. Sorting can be done phonetically or based on the appearance of the character.

Some languages even consider groupings of characters to have a specific sort order. For example, traditional Spanish treats ch as a character that sorts after the letter C. Therefore, when sorting the words cat, dog, cow, and chinchilla, the correct sort sequence would be cat, cow, chinchilla, and dog.

To support these different sorting methods, Oracle offers two basic categories of sorting: binary and linguistic.

Binary Sorts

Binary sorts are the fastest and most efficient sorting method offered by Oracle. However, they are also the most limited. Binary sorts perform a numeric sort based on the encoded value (in the character set) for each character. As long as the character set encodes all of the characters in the proper sort order, this method works very well. The performance is also exceptional.

For example, in the US7ASCII character set, alphabetical characters are encoded as shown in Table 13.4. Note that this is just a subset of the character set.

Ta b l e 13 . 4 US7ASCII Alphabetical Characters

Char	Value	Char	Value	Char	Value	Char	Value

A	65	N	78	a	97	n	110
B	66	O	79	b	98	o	111
C	67	P	80	c	99	p	112

658	Chapter 13 n Implementing Globalization Support
Ta b l e 13 . 4 US7ASCII			Alphabetical Characters (continued)

Char	Value	Char		Value	Char	Value	Char	Value

D	68	Q		81	d	100	q	113
E	69	R		82	e	101	r	114
F	70	S		83	f	102	s	115
G	71	T		84	g	103	t	116
H	72	U		85	h	104	u	117
I	73	V		86	i	105	v	118
J	74	W		87	j	106	w	119
K	75	X		88	k	107	x	120
L	76	Y		89	l	108	y	121
M	77	Z		90	m	109	z	122

Because the encoded values ascend in correlation with the characters, a binary sort will always perform a proper alphabetical sort. In addition, uppercase characters will always sort higher than lowercase characters.

However, different languages may share the same alphabet (and therefore, the same character set) yet utilize a sort order that deviates from the encoding order of the character set. In this situation, binary sorting will fail to produce an acceptable result.

Binary sorting is Oracle’s default sorting method.

Linguistic Sorts

Linguistic sorts, unlike binary sorts, operate independently of the underlying encoded values. Instead, they allow character data to be sorted based on the rules of specific languages. Linguistic sorts offer the flexibility to deal with the caveats imposed by different languages. Oracle provides a rich set of linguistic sort definitions that cover most of the languages

of the world. However, it also provides the ability to define new definitions or to modify existing definitions. The Oracle Locale Builder, a graphical tool that ships with Oracle 11g, can be used to view, create, and modify sort definitions and other locale definitions. However, as mentioned earlier, Locale Builder is not covered in this book.

Using Linguistic Sorts and Searches

659

Linguistic sorts are defined using a variety of rules that govern the sorting process. The following elements are available and can be used to create a comprehensive linguistic sort:

Base letters Base letters are the individual letters upon which other characters are based. The derived characters would map back to the base letter to determine the sorting value. For example, the character A is a base letter, while À, Á, Ã, Ä, a, à, á, and ä would all map to A as a base letter.

Ignorable characters Ignorable characters, just as the name implies, can be defined as having no effect on sort order. Diacritics, such as the umlaut, can be classified as ignorable, as can certain punctuation characters, such as the hyphen. Therefore, a word such as e mail would be sorted the same as email.

Contracting characters Contracting characters represent two or more characters that are treated linguistically as a single character. An example—the traditional Spanish ch string— was explained earlier in this section. Contracting characters require flexibility in the sorting algorithm to read ahead to the next character to determine if a contracting character has been found.

Expanding characters With some locales, repeating or commonly occurring strings of characters are compressed into a single character for brevity’s sake. However, the character needs to sort as if all characters are present. These are referred to as expanding characters.

For example, the ö character should be treated as the string oe for sorting purposes.

Context-sensitive characters With certain languages, characters are sorted differently based upon their relationship to other characters. These are generally characters that modify the preceding character. For example, a Japanese length mark is sorted according to the vowel that precedes it.

Canonical equivalence When a Unicode character set is used, the character ö and the string o¨ can be considered equal from a sorting perspective. This is because the code points of the two-character string match the code point of the individual character. The two are said to have canonical equivalence.

In situations of canonical equivalence, the value of the CANONICAL_EQUIVALENCE linguistic flag (with a value of TRUE or FALSE) determines whether the rules of canonical equivalence should be followed.

Reverse secondary sorting In some languages, strings containing diacritics will be sorted from left to right on the base characters and then from right to left on the diacritics. This is referred to as reverse secondary sorting. For example, resumé would sort before résume because of the position of the diacritic from left to right.

The REVERSE_SECONDARY=TRUE linguistic flag enables this functionality.

Character rearrangement In certain languages (notably Thai and Laotian dialects), sorting rules declare that certain characters should switch places with the following character before sorting. This generally happens when a consonant is preceded by a vowel sound. In this case, the consonant is given priority, forcing the characters to be switched before the sort begins.

660 Chapter 13 n Implementing Globalization Support

The SWAP_WITH_NEXT linguistic flag can determine whether character rearrangement will occur within a sort definition.

Don’t confuse linguistic flags with Oracle parameter settings. Linguistic flags are defined for specific sort-order definitions when they are created.

These different sorting methods represent the toolset available to sort and search text. Different languages may use one or more of the linguistic methods listed here. But as long as the rules of a language can be described using these methods, Oracle is able to perform linguistic sorts, either through an existing sort definition or through a custom sort definition.

Using Linguistic Sort Parameters

Linguistic sorts are generally applicable to a specific language or to a specific character set. And, as mentioned previously, there are many predefined linguistic sort definitions that may be utilized. Therefore, it is unlikely that you would ever need to define your own.

You can instruct Oracle to utilize specific linguistic sorts by setting the appropriate NLS sort parameters. In this section, you will learn about the NLS_SORT and NLS_COMP parameters and how they affect linguistic sorting operations.

NLS_SORT

The NLS_SORT parameter defines which type of sorting—binary or linguistic—should be performed for SQL sort operations. By default, the value for NLS_SORT is the default sort method defined for the language identified in the NLS_LANGUAGE parameter. For example, if the NLS_ LANGUAGE parameter is set to AMERICAN, the default value for the NLS_SORT parameter will be

BINARY.

To instruct Oracle to use linguistic sorting, this parameter can be set to the name of any valid linguistic sort definition, as shown here:

SQL> alter session set NLS_SORT = German;

Session altered.

A list of valid sort definition names is shown here. You could also query the V$NLS_ VALID_VALUES view (as shown earlier in this chapter).

ARABIC	GERMAN	SWISS
ARABIC_ABJ_MATCH	GERMAN_DIN	TCHINESE_RADICAL_M
ARABIC_ABJ_SORT	GREEK	TCHINESE_STROKE_M
ARABIC_MATCH	HEBREW	THAI_DICTIONARY
ASCII7	HKSCS	THAI_M

	Using Linguistic Sorts and Searches		661
AZERBAIJANI	HUNGARIAN	THAI_TELEPHONE
BENGALI	ICELANDIC	TURKISH
BIG5	INDONESIAN	UKRAINIAN
BINARY	ITALIAN	UNICODE_BINARY
BULGARIAN	JAPANESE	VIETNAMESE
CANADIAN FRENCH	JAPANESE_M	WEST_EUROPEAN
CANADIAN_M	KOREAN_M	XAZERBAIJANI
CATALAN	LATIN	XCATALAN
CROATIAN	LATVIAN	XCROATIAN
CZECH	LITHUANIAN	XCZECH
CZECH_PUNCTUATION	MALAY	XCZECH_PUNCTUATION
DANISH	NORWEGIAN	XDANISH
DANISH_M	POLISH	XDUTCH
DUTCH	PUNCTUATION	XFRENCH
EBCDIC	ROMANIAN	XGERMAN
EEC_EURO	RUSSIAN	XGERMAN_DIN
EEC_EUROPA3	SCHINESE_PINYIN_M	XHUNGARIAN
ESTONIAN	SCHINESE_RADICAL_M	XPUNCTUATION
FINNISH	SCHINESE_STROKE_M	XSLOVAK
FRENCH	SLOVAK	XSLOVENIAN
FRENCH_M	SLOVENIAN	XSPANISH
GBK	SPANISH	XSWISS
GENERIC_BASELETTER	SPANISH_M	XTURKISH
GENERIC_M	SWEDISH	XWEST_EUROPEAN

By using the NLS_SORT parameter, you can make the following changes to Oracle’s default functionality:

NN	Set the default sort method for all ORDER BY operations.
NN	Set the default sort value for the NLSSORT function.

662 Chapter 13 n Implementing Globalization Support

It is important to note that not all SQL functionality supports linguistic sorting. Certain functions support only binary sorts. However, most of the commonly used methods are supported. Also, all NLS-specific SQL functions (for example, NLSSORT) will support linguistic sorts.

The methods listed here support linguistic sorting:

NN ORDER BY

NN BETWEEN

NN CASE WHEN

NN HAVING

NN IN/OUT

NN START WITH

NN WHERE

By default, all of these operations will perform binary sorts. When you set the NLS_SORT parameter, SQL statements using the WHERE operation will perform linguistic sorts by default. The following example demonstrates the use of the NLS_SORT parameter. Initially, you can see that your session has no value set for NLS_SORT. Therefore, it will inherit the default

setting from the NLS_LANGUAGE parameter (BINARY).

SQL> show parameters NLS_LANGUAGE

NAME	TYPE	VALUE
------------------------------------	-----------	----------
nls_language	string	AMERICAN
SQL> show parameters NLS_SORT
NAME	TYPE	VALUE
------------------------------------	-----------	----------
nls_sort	string

As you learned earlier, the default sort for the language AMERICAN is BINARY. The default setting for NLS_COMP is BINARY as well. Therefore, you can expect that, by default, Oracle will perform a binary sort unless otherwise specified, as shown here:

SQL> select * from sort_test order by name;

NAME

--------------------------------

Finsteraarhornhutte Grünhornlücke einschließlich

Using Linguistic Sorts and Searches

663

finsteraarhornhütte

grünhornlücke

5 rows selected.

As expected, Oracle sorted the rows based on the encoded value of the characters in the US7ASCII character set. Therefore, all uppercase characters sort before lowercase characters.

Because the words in this table are of German origin, it is logical that you might want to sort them using a German sorting method. To change the sorting method for a GROUP BY clause, the NLS_SORT parameter can be set as shown here:

SQL> alter session set NLS_SORT=German_din;

Session altered.

SQL> select * from sort_test order by name;

NAME

--------------------------------------------

einschließlich Finsteraarhornhutte finsteraarhornhütte Grünhornlücke grünhornlücke

5 rows selected.

As you can see, setting the NLS_SORT parameter changed the default sorting method to a linguistic sort instead of a binary sort.

In the next step, another query is executed, this time using a WHERE condition rather than an ORDER BY clause:

SQL> select * from sort_test

2where name > ‘einschließlich’;

NAME

---------------------------------------

finsteraarhornhütte

grünhornlücke

2 rows selected.

664 Chapter 13 n Implementing Globalization Support

The result of this query might not be what you expect. Instead of the expected four rows (which a linguistic sort would have returned), only two rows are returned, indicating that a binary sort took place instead.

Remember, the NLS_SORT parameter overrides the default sorting method for ORDER BY operations and for the NLSSORT function, but it has no effect on other sort operations, such as WHERE conditions. Therefore, this query ignored the parameter entirely.

To perform a linguistic sort, you call the NLSSORT function. Normally, the function would be called like this:

SQL> select * from sort_test

where nlssort(name, ‘NLS_SORT=German_din’) > nlssort(‘einschließlich’,’NLS_SORT=German_din’);

NAME

--------------------------------------------------

Finsteraarhornhutte finsteraarhornhütte Grünhornlücke grünhornlücke

4 rows selected.

However, because the NLS_SORT parameter defines the default sort for the NLSSORT function, specifying the sort inside the function is unnecessary. The following method works as well:

SQL> select * from sort_test

where nlssort(name) > nlssort(‘einschließlich’);

NAME

--------------------------------------------------

Finsteraarhornhutte finsteraarhornhütte Grünhornlücke grünhornlücke

4 rows selected.

As you can see, in both queries the sort was performed linguistically and returned the expected rows.

The NLS_SORT parameter is very limited in relation to linguistic sorting. The NLS_COMP parameter, on the other hand, makes linguistic sorting much easier.

Using Linguistic Sorts and Searches

665

NLS_COMP

The NLS_COMP parameter works in conjunction with the NLS_SORT parameter to make linguistic sorts easier to use. When the NLS_COMP parameter is set to a value of ANSI, all of the following SQL operations will default to linguistic sorting (using the language specified in NLS_SORT parameter):

NN ORDER BY

NN BETWEEN

NN CASE WHEN

NN HAVING

NN IN/OUT

NN START WITH

NN WHERE

Setting the NLS_COMP parameter makes it unnecessary to call the NLSSORT function when using these sort operations. As you can guess, this makes linguistic sorting much easier to perform.

The NLS_COMP parameter can be set to either BINARY (the default) or ANSI. The following example shows the usage of the NLS_COMP parameter:

SQL> alter session set NLS_SORT=German_din;

Session altered.

SQL> alter session set NLS_COMP=ANSI;

Session altered.

SQL> select * from sort_test

2where name > ‘einschließlich’;

NAME

------------------------------------------

Finsteraarhornhutte finsteraarhornhütte Grünhornlücke grünhornlücke

4 rows selected.

As you can see, the query performed the linguistic sort and returned the expected results, even without using the NLSSORT function.

666 Chapter 13 n Implementing Globalization Support

If the NLS_COMP parameter is set back to BINARY, binary sorting occurs once again:

SQL> alter session set NLS_COMP=BINARY;

Session altered.

SQL> select * from sort_test

2where name > ‘einschließlich’;

NAME

--------------------------------------------

finsteraarhornhütte

grünhornlücke

2 rows selected.

Linguistic Sort Types

When performing linguistic sorts, Oracle uses different methodologies, depending upon the number of languages involved in the sort. If character data in only one language is being sorted, this is classified as a monolingual linguistic sort. If more than one language is involved in the sort, it is classified as a multilingual linguistic sort.

In the following sections, you will learn the methodology that Oracle implements in performing both monolingual and multilingual linguistic sorts. You will also learn about accent-insensitive and case-insensitive linguistic sorts.

Monolingual Linguistic Sorts

When dealing with only a single language inside a sort, Oracle performs a two-step process to compare character strings. First, the major value of the strings is compared. Next, if necessary, the minor value of the strings is compared.

Major and minor values are sort values assigned to letters in the character set. A base letter and those derived from it will generally share a common major value, but they will have different minor values based on the desired sort order.

Here’s an example:

Letter	Major Value	Minor Value
A	30	10
A	30	20
Ä	30	30
Ä	30	40
B	40	10

Using Linguistic Sorts and Searches

667

The example shows that all four variations of the letter A have identical major values, but differing minor values. When two letters share a major value, the minor value will determine the sort order.

The major-value numbers are assigned to a Unicode code point (a 16-bit binary value that defines a character in a Unicode character set).

Multilingual Linguistic Sorts

Multilingual sorts offer the ability to sort mixed languges within the same sort. For example, if you have a table that stores names in both English and Spanish, a multilingual sort should be used.

Multilingual sorts perform three levels of evaluation: primary, secondary, and tertiary. Primary sorts assign a primary sort value based on the base letter of each character (dia-

critics and case are ignored). If a character is defined as ignorable, it is assigned a primary value of zero.

Secondary-level sorts consider diacritics to differentiate accented letters from base letters in assigning a secondary sort level. For example, A and ä share the same base letter, so they have the same primary sort level but they will have different secondary levels.

Tertiary-level sorts consider character case to differentiate uppercase and lowercase letters. Tertiary sorts also handle special characters such as *, +, and -.

Consider the following words: Fahrvergnügen Fahrvergnugen farhrvergnugen fahrvergnügen

Because all of these words share the same base letters in the same order, all of them would be considered equivalent at the primary sort level. After a secondary-level sort, they would be ordered similarly to the following list:

Fahrvergnugen farhrvergnugen Fahrvergnügen fahrvergnügen

The secondary-level sort is concerned only with diacritics, so it will sort characters without diacritics above those with diacritics. After that, the words are displayed in their primary sort order. Because the words in this example have identical primary sort orders, there is no guarantee which word will appear before the other. The only guarantee is that those with diacritics will sort after those without.

After the tertiary-level sort is performed, the list should look exactly like this: farhrvergnugen

Fahrvergnugen fahrvergnügen Fahrvergnügen

668 Chapter 13 n Implementing Globalization Support

The tertiary-level sort applies the case rule to the data, forcing lowercase letters to sort before uppercase letters. This is the final result of the sort after applying all three multilingual sorting levels.

In keeping with the ISO 14651 standard for multilingual sorting, Oracle appends an _M to the sort name to identify it as multilingual. For example, FRENCH_M identifies a French multilingual sort, whereas FRENCH identifies a French monolingual sort. Table 13.5 shows the multilingual sorts predefined in Oracle 11g.

Ta b l e 13 . 5 Multilingual Sorts Available in Oracle 11g

Multilingual Sort Name	Description

CANADIAN_M	Canadian French
DANISH_M	Danish
FRENCH_M	French
GENERIC_M	Generic based on ISO 14651
JAPANESE_M	Japanese
KOREAN_M	Korean
SPANISH_M	Traditional Spanish
THAI_M	Thai
SCHINESE_RADICAL_M	Simplified Chinese
SCHINESE_STROKE_M	Simplified Chinese
SCHINESE_PINYIN_M	Simplified Chinese
TCHINESE_RADICAL_M	Traditional Chinese
TCHINESE_STROKE_M	Traditional Chinese

Case-Insensitive and Accent-Insensitive Linguistic Sorts

Oracle, by default, will always consider both the case of the characters and any diacritics when performing sort operations. As you’ve seen in previous examples, linguistic sorts have rules to govern precedence between uppercase and lowercase words, as well as those words containing diacritics.

Using Linguistic Sorts and Searches

669

However, you may wish to override this functionality from time to time and choose to ignore case and/or diacritics. Oracle 11g offers case-insensitive and accent-insensitive sorting options to allow for these cases.

To specify case-insensitive or accent-insensitive sorts, the NLS_SORT parameter is used, but with the following changes:

NN	For a case-insensitive linguistic sort, append the string _CI to the sort name.
NN
NN	For an accent-insensitive linguistic sort, append the string _AI to the sort name.

Accent-insensitive sorts are also case insensitive by default.

For example, to specify a French, multilingual, accent-insensitive sort, use the following:

NLS_SORT = French_M_AI

Here is how to specify a German, monolingual, case-insensitive sort:

NLS_SORT = German_CI

Case-Insensitive and Accent-Insensitive Binary Sorts

Binary sorts can also be designated as case insensitive or accent insensitive. The NLS_SORT parameter can be set to BINARY_CI or BINARY_AI. Table 13.6 shows how the sort will be affected by these settings.

Ta b l e 13 .6 Binary Case and Accent-Insensitive Sort Options

Sort Name	Sort Type	Case Insensitive?	Accent Insensitive?

BINARY_CI	Binary	Yes	No
BINARY_AI	Binary	Yes	Yes

As you can see, using the BINARY_AI sort will result in both an accent-insensitive and case-insensitive sort.

Searching Linguistic Strings

Linguistic searches are closely related to linguistic sorts and are directly affected by the NLS_ SORT setting. To accomplish linguistically meaningful searches, Oracle must apply the same rules it applies for linguistic sorts.

670 Chapter 13 n Implementing Globalization Support

Earlier in this section, you saw several examples of linguistic string searching, including the following:

SQL> select * from sort_test

2where name > ‘einschließlich’;

NAME

------------------------------------------

Finsteraarhornhutte finsteraarhornhütte Grünhornlücke grünhornlücke

4 rows selected.

When you set the NLS_COMP parameter to ANSI and the NLS_SORT parameter to the desired sort language, the WHERE operator (as well as several others) will perform linguistic searching by default.

And, just as you did with sorts, if the NLS_SORT is set to ignore case or accents, linguistic searches will follow suit, as in this example:

SQL> alter session set NLS_COMP=ANSI;

Session altered.

SQL> alter session set NLS_SORT=German_din_ci;

Session altered.

SQL> select * from sort_test

2where name = ‘Grünhornlücke’;

NAME

-----------------------------------------

Grünhornlücke

grünhornlücke

As you can see, the search ignored the case and returned both rows that matched. This is the expected functionality of the case-insensitive search.

In the next example, the NLS_SORT parameter will be set to define an accent-insensitive search:

SQL> alter session set NLS_SORT=German_din_ai;

Session altered.

<<< < Предыдущая 51 52 53 54 55 56 57 58 59 60 61 6263 / 8563 64 65 66 67 68 69 70 71 72 73 74 75 > Следующая >>>

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]

#
11.11.20181.02 Mб9OBDZ metod.doc
#
10.06.2015215.55 Кб11oformlenie_title_list.doc
#
01.04.20252.25 Mб1ogl_peredelannoe.doc
#
05.05.201982.43 Кб8Opt2.doc
#
10.06.20151.35 Mб18Optsionnye_strategii.pdf
#
10.06.201512.69 Mб90Oracle Database 11g.pdf
#
20.09.2019264.19 Кб15organizacia_servisa.doc
#
09.11.201968.61 Кб4Otchet(vit).doc
#
10.06.2015114.69 Кб19Otchet_po_proizvodstvennoy_praktike.doc
#
10.06.20151.05 Mб23otchet_sokraschenny [2].doc
#
10.06.2015569.34 Кб12otchet_sokraschenny.doc