
Do we really need linguistic models?
Let us now consider whether computer scientists really need a generalizing (complete) model of language.
In modern theoretical linguistics, some researchers study phonology, others morphology, others syntax, and still others semantics and pragmatics. Within phonology, one researcher may become absorbed in accentuation; within semantics, another in speech acts; and so on. There is no limit to the subdivision of the great linguistic science, and there is seemingly no need to take up once more, after the ancient Greeks, Ferdinand de Saussure, and Noam Chomsky, the philosophical question “What is natural language and what should its general model be?”
The main criteria of truth in theoretical linguistic research are logical soundness, consistency, and agreement between the intuitions about the given linguistic phenomena held by the theory’s author and those held by other members of the linguistic community.
In this sense, the works of modern specialists in theoretical linguistics seem to be just stages in the internal development of this science. It often seems unnecessary to classify them according to whether they support or correspond to any complete model.
The situation in computational linguistics is somewhat different. Here the criterion of truth is how closely the results of a program that processes language utterances approach the ideal performance determined by the mental abilities of an average speaker of the language. Since the processing procedure, because of its complexity, must be split into several stages, a complete model is necessary to recommend what formal features and structures are to be assigned to the utterances and to the language as a whole at each stage, and how these features should interact and participate at each stage of linguistic transformations within the computer. Thus, all theoretical premises and results should be stated here quite explicitly and should correspond to each other in their structures and interfaces.
Theoreticians tell us about the rise of experimental linguistics on this basis. It seems that in the future, experimental tests of the deepest results in all “partial” linguistic theories will be an inevitable element of the evolution of this science as a whole. As for computational linguistics, computerized experimentation is crucial right now, and it is directly influenced by what structures are selected for language description and what processing steps are recommended by the theory.
Therefore, the seemingly philosophical problem of linguistic modeling turns out to be of primary importance for computational linguistics. Two linguistic models selected from their vast variety will be studied in this book in more detail.
Analogy in natural languages
Analogy is the prevalence of a pattern (i.e., one rule or a small set of rules) in the formal description of some linguistic phenomena. In the simplest case, the pattern can be represented with a partially filled table like the one on page 20:
revolución | revolution
investigación | ?
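Filling the empty cell of the table above can be sketched as a suffix substitution: find where the known pair diverges, then apply the same replacement to the new word. This is only an illustrative sketch, not a method given in the text; the function names `common_prefix_len` and `solve_analogy` are hypothetical, and the approach assumes the pattern is purely suffixal.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def solve_analogy(src: str, tgt: str, new_src: str):
    """Solve src : tgt :: new_src : ? by suffix substitution.

    Returns None when new_src does not end with the suffix
    that distinguishes src from tgt.
    """
    k = common_prefix_len(src, tgt)
    old_suffix, new_suffix = src[k:], tgt[k:]
    if not new_src.endswith(old_suffix):
        return None
    return new_src[: len(new_src) - len(old_suffix)] + new_suffix


print(solve_analogy("revolución", "revolution", "investigación"))
```

With the pair from the table, the learned rule is ‑ción → ‑tion, so the call proposes investigation for the empty cell. A word that does not end in ‑ción is left unanswered rather than forced into the pattern.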
The history of any natural language contains numerous cases in which a phonological or morphological pattern became prevalent and, by analogy, drew in words with similar properties.
An example of analogy in Spanish phonology is the presence of the e before the consonant combinations sp‑, st‑, sn‑, or sf‑ at the beginning of words. In Latin, the combinations sp‑ and st‑ in initial position were quite common: specialis, spectaculum, spiritus, statua, statura, etc.
As Spanish developed from Vulgar Latin, all such words were felt to be difficult to pronounce and were supplied with an initial e-: especial, espectáculo, espíritu, estatua, estatura, etc. Thus, a law of “hispanicizing by analogy” was formed, according to which all words with this phonetic peculiarity, when borrowed from any foreign language, acquire e as the initial letter.
We can compile the following table of analogy, where the right column gives the Spanish words after their borrowing from various languages:
statura (Lat.) | estatura
sphaira (Gr.) | esfera
slogan (Eng.) | eslogan
smoking (Eng.) | esmoquin
standardize (Eng.) | estandarizar
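The initial-e rule illustrated by this table can be sketched in a few lines. The sketch covers only the added e; it does not model the further spelling adaptation visible in the table (smoking → esmoquin, sphaira → esfera). The function name and the set of consonants are assumptions made for illustration, combining the combinations named in the text (sp‑, st‑, sn‑, sf‑) with those seen in the table (sl‑, sm‑).

```python
# Consonants that may follow an initial s- and trigger the rule.
# This set is an assumption drawn from the examples, not an
# exhaustive statement of Spanish phonotactics.
ONSET_CONSONANTS = set("ptnflm")


def add_initial_e(word: str) -> str:
    """Prefix e- to a word beginning with s + listed consonant."""
    w = word.lower()
    if len(w) >= 2 and w[0] == "s" and w[1] in ONSET_CONSONANTS:
        return "e" + w
    return w


for loan in ["statura", "slogan", "sala"]:
    print(loan, "->", add_initial_e(loan))
```

Words such as sala, where s is followed by a vowel, are returned unchanged.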
As another example, one can observe a multiplicity of nouns ending in ‑ción in Spanish, though there exist other suffixes with the same meaning of action and/or its result: ‑miento, ‑aje, ‑azgo, ‑anza, etc. The development of Spanish in recent centuries has produced a great number of ‑ción-words derived by analogy, so that sometimes a special effort is necessary to avoid clustering them in one sentence for the sake of better style. Such a stylistic problem has even been called cacophony.
Nevertheless, an important feature of language restricts the law of analogy. If the analogy generates too many homonyms, easy understanding of speech is hampered. In such situations, analogy is not usually applied.
A more general tendency can also be observed. The lexicon and the levels of natural language are conceptual systems of intricately interrelated subsystems. If a feature of some subsystem tends to change and this hinders the correct functioning of another subsystem, then two possible ways of bypassing the trouble can be observed. First, the innovation of the initiating subsystem may be rejected. Second, the influenced subsystem may also change its rules, introducing in turn its own innovations.
For example, if a metonymic change of meaning gives a new word, and the new word frequently occurs in the same contexts as the original one, then this can hinder comprehension. Hence, either the novel or the original word should be eliminated from the language.
In modern languages, one can see the immediate impact of analogy in the fact that a great number of scientific, technical, and political terms are created according to only a few morphological rules. For example, the Spanish verbs automatizar, pasteurizar, globalizar, etc., are constructed from a noun (possibly a proper name) expressing a concept (autómata, Pasteur, globo, etc.) and the suffix -izar/-alizar expressing the idea of subjection to that concept or functioning according to it.
Computational linguistics directly uses the laws of analogy in the processing of unknown words. Any online dictionary is limited in size, so that many words already known in the language are absent from it (say, because these words appeared in the language after the dictionary was compiled). To “understand” such words in some way, the program can assume the most common and frequent properties.
Let us imagine, for instance, a Spanish-speaking reader who meets the word internetizarán in a text. Based on morphological rules, he or she readily reconstructs the infinitive of the hypothetical verb internetizar. However, this verb is not familiar either, whereas the word Internet may already be included in his or her mental dictionary. By the analogy implied by ‑izar, the reader can thus conclude that internetizar means ‘to make something function on the principles of the Internet.’
A natural language processor can reason in just the same way. Moreover, when such a program meets a word like linuxizar, it can suppose that there exists a concept linux even if it is absent from the machine dictionary. Such a supposition can suggest a very rough “comprehension” of the unknown word: ‘to make something function on the principles of linux,’ even if the word linux itself remains incomprehensible.
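The reasoning described for internetizarán and linuxizar can be sketched as a small guesser for unknown ‑izar verbs. This is a minimal illustration under stated assumptions, not the procedure of any particular system: the toy `LEXICON`, the function name `guess_izar_verb`, and the restriction to infinitives and future-tense forms are all hypothetical simplifications, and the ‑alizar variant is not handled.

```python
# Spanish future-tense endings, which attach to the infinitive.
FUTURE_ENDINGS = ("emos", "éis", "ás", "án", "é", "á")

# Hypothetical toy machine dictionary of known base words.
LEXICON = {"internet": "Internet"}


def guess_izar_verb(word: str, lexicon=LEXICON):
    """Guess the infinitive, base, and rough gloss of an -izar verb.

    Returns None if the word does not fit the -izar pattern.
    """
    w = word.lower()
    infinitive = None
    if w.endswith("izar"):
        infinitive = w
    else:
        for ending in FUTURE_ENDINGS:
            stem = w[: -len(ending)]
            if w.endswith(ending) and stem.endswith("izar"):
                infinitive = stem
                break
    if infinitive is None:
        return None
    base = infinitive[: -len("izar")]
    if not base:
        return None
    return {
        "infinitive": infinitive,
        "base": base,
        "base_known": base in lexicon,
        "gloss": "to make something function on the principles of "
                 + lexicon.get(base, base),
    }


print(guess_izar_verb("internetizarán"))
print(guess_izar_verb("linuxizar"))
```

For linuxizar the base linux is reported as unknown, yet a rough gloss is still produced, mirroring the very approximate “comprehension” described above.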