
Do we really need linguistic models?

Let us now consider briefly whether computer scientists really need a generalizing (complete) model of language.

In modern theoretical linguistics, some researchers study phonology, others morphology, others syntax, and still others semantics and pragmatics. Within phonology, some specialize in accentuation; within semantics, in speech acts; and so on. There is no limit to the subdivision of the great linguistic science, and there seems to be no need to take up once more, after the ancient Greeks, Ferdinand de Saussure, and Noam Chomsky, the philosophical question “What is natural language and what should its general model be?”

The main criteria of truth in theoretical linguistic research are its logical character, its consistency, and the agreement between the author's intuitive conceptions of the linguistic phenomena in question and those of other members of the linguistic community.

In this sense, the works of modern specialists in theoretical linguistics seem to be just stages in the internal development of this science. It often seems unnecessary to classify them according to whether they support or correspond to any complete model.

The situation in computational linguistics is somewhat different. Here the criterion of truth is how closely the output of a program that processes language utterances approaches the ideal performance determined by the mental abilities of an average speaker of the language. Since the processing procedure, because of its complexity, must be split into several stages, a complete model is necessary to specify which formal features and structures are to be assigned to utterances and to the language as a whole at each stage, and how these features should interact at each stage of linguistic transformation within the computer. Thus, all theoretical premises and results should be stated explicitly here, and their structures and interfaces should be mutually consistent.

Theoreticians speak of the rise of experimental linguistics on this basis. It seems that in the future, experimental testing of the deepest results of all “partial” linguistic theories will be an inevitable element of the evolution of this science as a whole. As for computational linguistics, computerized experimentation is crucial right now, and it is directly influenced by which structures are selected for language description and which processing steps the theory recommends.

Therefore, the seemingly philosophical problem of linguistic modeling turns out to be of primary importance for computational linguistics. Two linguistic models, selected from the vast variety available, will be studied in more detail in this book.

Analogy in natural languages

Analogy is the prevalence of a pattern (i.e., one rule or a small set of rules) in the formal description of some linguistic phenomena. In the simplest case, the pattern can be represented by a partially filled table like the one on page 20:

revolución        revolution
investigación     ?
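
The partially filled table can be read as a proportion, revolución : revolution = investigación : ?. The following is a minimal sketch of one way to fill the empty cell, assuming the pattern is a pure suffix substitution learned from the known pair. The function name `solve_analogy` is hypothetical and is not taken from the book.

```python
import os
from typing import Optional


def solve_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve the proportion a : b = c : ? by suffix substitution.

    Assumption: a and b share a prefix and differ only in their endings,
    and the same change of ending applies to c.
    """
    # Length of the longest common prefix of the known pair.
    prefix_len = len(os.path.commonprefix([a, b]))
    old_suffix = a[prefix_len:]  # ending to be replaced
    new_suffix = b[prefix_len:]  # ending that replaces it
    if not c.endswith(old_suffix):
        return None  # the pattern does not apply to this word
    return c[: len(c) - len(old_suffix)] + new_suffix


if __name__ == "__main__":
    print(solve_analogy("revolución", "revolution", "investigación"))
```

For the pair in the table, the learned rule is "replace ‑ción with ‑tion", which fills the empty cell with investigation. A word that does not end in ‑ción is left unanswered rather than guessed.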

The history of any natural language contains numerous cases in which a phonological or morphological pattern became prevalent and, by analogy, attracted words with similar properties.

An example of analogy in Spanish phonology is the presence of e before the consonant combinations sp‑, st‑, sn‑, or sf‑ at the beginning of words. In Latin, the combinations sp‑ and st‑ in initial position were quite common: specialis, spectaculum, spiritus, statua, statura, etc.

When Spanish developed from Vulgar Latin, all such words were felt to be difficult to pronounce and were supplied with an initial e‑: especial, espectáculo, espíritu, estatua, estatura, etc. Thus, a law of “hispanicizing by analogy” was formed, according to which every word with this phonetic peculiarity, when borrowed from a foreign language, acquires e as its initial letter.

We can compile the following table of analogy, where the right column gives the Spanish words after their borrowing from various languages:

statura (Lat.)         estatura
sphaira (Gr.)          esfera
slogan (Eng.)          eslogan
smoking (Eng.)         esmoquin
standardize (Eng.)     estandarizar
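
The law of “hispanicizing by analogy” can be sketched as a small rule. This is an illustration under stated assumptions, not a full model of loanword adaptation: the cluster list combines the combinations named in the text (sp‑, st‑, sn‑, sf‑) with the two further ones seen in the table (sl‑, sm‑), and the function name `add_initial_e` is hypothetical. Only the initial e is modeled; other spelling changes in the table (sphaira to esfera, smoking to esmoquin) are outside its scope.

```python
# Initial clusters that trigger the rule. Assumption: this list is taken
# from the text and the table above and is not meant to be complete.
S_CLUSTERS = ("sp", "st", "sn", "sf", "sl", "sm")


def add_initial_e(word: str) -> str:
    """Prefix 'e' to a borrowed word that starts with s + consonant."""
    if word.lower().startswith(S_CLUSTERS):
        return "e" + word
    return word  # the rule does not apply; leave the word unchanged


if __name__ == "__main__":
    for source in ("statura", "slogan", "sala"):
        print(source, "->", add_initial_e(source))
```

Applied to statura and slogan, the rule yields estatura and eslogan, as in the table, while a word such as sala is left unchanged.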

As another example, one can observe a multitude of nouns ending in ‑ción in Spanish, though other suffixes exist with the same meaning of an action and/or its result: ‑miento, ‑aje, ‑azgo, ‑anza, etc. The development of Spanish in recent centuries has produced a great number of ‑ción words derived by analogy, so that a special effort is sometimes needed to avoid clustering them in one sentence, for the sake of style. This stylistic problem has even been called cacophony.

Nevertheless, an important feature of language restricts the law of analogy. If analogy generates too many homonyms, easy understanding of speech is hampered. In such situations, analogy is usually not applied.

A more general tendency can also be observed. The lexicon and the levels of natural language are conceptual systems of intricately interrelated subsystems. If a feature of some subsystem tends to change and this hinders the correct functioning of another subsystem, two ways of avoiding the trouble can be observed. First, the innovation in the initiating subsystem may be rejected. Second, the affected subsystem may change its own rules, introducing innovations of its own in turn.

For example, if a metonymic change of meaning produces a new word, and the new word frequently occurs in the same contexts as the original one, this can hinder comprehension. Hence, either the new or the original word should be eliminated from the language.

In modern languages, one can see the immediate impact of analogy in the fact that a great number of scientific, technical, and political terms are created according to just a few morphological rules. For example, the Spanish verbs automatizar, pasteurizar, globalizar, etc., are formed from a noun (possibly a proper name) expressing a concept (autómata, Pasteur, globo, etc.) and the suffix ‑izar/‑alizar, which expresses the idea of subjection to that concept or of functioning according to it.

Computational linguistics directly uses the laws of analogy in the processing of unknown words. Any online dictionary is limited in size, so many words already known in the language are absent from it (say, because these words appeared in the language after the dictionary was compiled). To “understand” such words in some way, the program can assume that they have the most common and frequent properties.

Let us imagine, for instance, a Spanish-speaking reader who meets the word internetizarán in a text. Based on the morphological rules, he or she readily reconstructs the infinitive of the hypothetical verb internetizar. However, this verb is not familiar either, whereas the word Internet may already be included in his or her mental dictionary. According to the analogy implied by ‑izar, the reader can thus conclude that internetizar means ‘to make something function on the principles of the Internet.’

A natural language processor can reason in just the same way. Moreover, when such a program meets a word like linuxizar, it can suppose that there exists a concept linux even if it is absent from the machine dictionary. Such a supposition can suggest a very rough “comprehension” of the unknown word: ‘to make something function on the principles of linux,’ even if the word linux itself remains incomprehensible.
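
The reasoning described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the method of any actual system: it covers only infinitives and future-tense forms, it handles the plain suffix ‑izar (so it reduces automatizar to automat rather than autómata), and the names `Guess`, `guess_izar_verb`, and the toy lexicon are hypothetical.

```python
from typing import NamedTuple, Optional, Set

# Spanish future endings, attached directly to the infinitive.
# Assumption: other tenses are ignored in this sketch.
FUTURE_ENDINGS = ("emos", "éis", "ás", "án", "é", "á")


class Guess(NamedTuple):
    infinitive: str   # reconstructed infinitive, e.g. "internetizar"
    base: str         # supposed underlying concept, e.g. "internet"
    base_known: bool  # whether the base is in the machine dictionary
    gloss: str        # very rough "comprehension" of the unknown word


def guess_izar_verb(word: str, lexicon: Set[str]) -> Optional[Guess]:
    """Guess the meaning of an unknown verb by analogy with -izar verbs."""
    word = word.lower()
    candidates = [word]  # the word may already be an infinitive
    for ending in FUTURE_ENDINGS:
        if word.endswith(ending):
            candidates.append(word[: -len(ending)])
    for infinitive in candidates:
        if infinitive.endswith("izar") and len(infinitive) > len("izar"):
            base = infinitive[: -len("izar")]
            gloss = f"to make something function on the principles of {base}"
            return Guess(infinitive, base, base in lexicon, gloss)
    return None  # no -izar pattern found


if __name__ == "__main__":
    toy_lexicon = {"internet"}  # hypothetical machine dictionary
    print(guess_izar_verb("internetizarán", toy_lexicon))
    print(guess_izar_verb("linuxizar", toy_lexicon))
```

For internetizarán the sketch reconstructs internetizar and finds the base internet in the dictionary. For linuxizar it still produces a gloss but marks the base linux as unknown, which mirrors the rough “comprehension” described above.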