
Natural language interface
The task performed by a natural language interface to a database is to understand questions entered by a user in natural language and to provide answers, usually in natural language but sometimes as formatted output. Typically, the entered queries, or questions, concern some facts about data contained in a database.
Since each database is to some degree specialized, the language of the queries and the set of words used in them are usually very limited. Hence, the linguistic task of grammatical and semantic analysis is much simpler than for other tasks related to natural language, such as translation.
There are some quite successful systems with natural language interfaces that are able to understand a very specialized sublanguage quite well. Other systems, with other, usually less specialized sublanguages, are much less successful. Therefore, this problem has no universal solution, at least thus far; most solutions are constructed ad hoc for each specific system.
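To make the notion of a very specialized sublanguage concrete, the following is a minimal sketch of an ad hoc interface that understands exactly one question pattern about printers in a store and answers it from a database. The table schema, the word list, and the function names (`question_to_sql`, `answer`, `make_demo_database`) are assumptions made for illustration only; they do not describe any existing system.

```python
import re
import sqlite3

# Hypothetical lexicon of the sublanguage: word -> (column, value).
ATTRIBUTE_WORDS = {
    "wide": ("width", "wide"),
    "narrow": ("width", "narrow"),
    "high-resolution": ("resolution", "high"),
    "low-resolution": ("resolution", "low"),
    "matrix": ("kind", "matrix"),
    "laser": ("kind", "laser"),
}

# The only question pattern this toy interface understands.
QUESTION = re.compile(
    r"^are there (?P<attrs>.*?)\s*printers in the store\?$", re.IGNORECASE
)


def question_to_sql(question):
    """Translate a question into (sql, params); return None if not understood."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    conditions, params = ["in_stock > 0"], []
    for word in match.group("attrs").lower().split():
        if word not in ATTRIBUTE_WORDS:
            return None  # the word lies outside the sublanguage
        column, value = ATTRIBUTE_WORDS[word]
        conditions.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT COUNT(*) FROM printers WHERE " + " AND ".join(conditions)
    return sql, params


def answer(connection, question):
    """Answer a question in natural language using the database."""
    translated = question_to_sql(question)
    if translated is None:
        return "Sorry, I do not understand the question."
    sql, params = translated
    (count,) = connection.execute(sql, params).fetchone()
    if count:
        return "Yes, there are such printers in the store."
    return "No, there are no such printers in the store."


def make_demo_database():
    """Build an in-memory database with invented sample rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE printers "
        "(width TEXT, resolution TEXT, kind TEXT, in_stock INTEGER)"
    )
    connection.executemany(
        "INSERT INTO printers VALUES (?, ?, ?, ?)",
        [
            ("narrow", "high", "matrix", 3),
            ("wide", "low", "matrix", 1),
            ("wide", "high", "laser", 0),
        ],
    )
    return connection
```

Any question that departs from the single pattern, or uses a word missing from the lexicon, is rejected. This is precisely why such solutions do not generalize beyond the system they were built for.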
The developers of the most popular database management systems usually supply their products with a formal query language, such as SQL. Learning such a language is not too difficult, and this diminishes the need for a natural language interface. We are not aware of any existing commercial interface system that works with a truly unlimited natural language.
Nevertheless, the task of creating such an interface seems very attractive for many research teams all over the world. Especially useful could be natural language interfaces with speech recognition capabilities, which also would allow the user to make queries or give commands over a telephone line.
The development of natural language interfaces, though less demanding of such branches of linguistics as morphology or syntax, is very demanding of such “deeper” branches as semantics, pragmatics, and the theory of discourse.
The specific problem of the interface systems is that they work not with a narrative, a monologue, but with a dialogue, a set of short, incomplete, interleaving remarks. For example, in the following dialogue:
User: Are there wide high-resolution matrix printers in the store?
System: No, there are no such printers in the store.
User: And narrow?
it is difficult for the computer to understand the meaning of the last remark.
A rather detailed linguistic analysis is necessary to reformulate the user’s last question as “Are there narrow high-resolution matrix printers in the store?” In many cases, the only way for the computer to understand such elliptical questions is to build a model of the user’s current goals, knowledge, and interests, and then to guess what the computer itself would ask at this point of the dialogue if it were the user, and in what words it would formulate such a question. This idea can be called analysis through synthesis.
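For the printer dialogue above, the reformulation step can be approximated by a much simpler heuristic than a full user model: replace, in the previous complete question, the word that belongs to the same attribute class as the word in the elliptical remark. The sketch below assumes a hypothetical lexicon `ATTRIBUTE_CLASS` and a hypothetical function `resolve_ellipsis`; it is a substitution heuristic, not an implementation of analysis through synthesis.

```python
import re

# Hypothetical lexicon: each adjective belongs to one attribute class.
ATTRIBUTE_CLASS = {
    "wide": "width",
    "narrow": "width",
    "high-resolution": "resolution",
    "low-resolution": "resolution",
    "matrix": "kind",
    "laser": "kind",
}

# Elliptical remarks of the form "And <word>?".
ELLIPSIS = re.compile(r"^and\s+(?P<word>[\w-]+)\?$", re.IGNORECASE)


def resolve_ellipsis(previous_question, remark):
    """Expand a remark such as 'And narrow?' using the previous question.

    Returns the remark unchanged if it is not elliptical, and None if
    the remark cannot be resolved against the previous question.
    """
    match = ELLIPSIS.match(remark.strip())
    if match is None:
        return remark  # already a complete question
    new_word = match.group("word").lower()
    target_class = ATTRIBUTE_CLASS.get(new_word)
    if target_class is None:
        return None  # unknown word: nothing to substitute
    words = previous_question.split()
    for index, word in enumerate(words):
        # Replace the first word of the same class (e.g. wide -> narrow).
        if ATTRIBUTE_CLASS.get(word.lower()) == target_class:
            words[index] = new_word
            return " ".join(words)
    return None  # the previous question has no word of that class


previous = "Are there wide high-resolution matrix printers in the store?"
print(resolve_ellipsis(previous, "And narrow?"))
```

The heuristic fails as soon as the remark does not name a word from a known class, for instance “And in the other store?”, which is where a model of the user’s goals becomes necessary.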
Extraction of factual data from texts
Extraction of factual data from texts is the task of automatically generating elements of a factographic database, such as fields, or parameters, from online texts. Often the flows of current news from the Internet or from an information agency are used as the source of information for such systems, and the parameters of interest can be the demand for a specific type of product in various regions, the prices of specific types of products, events involving a particular person or company, opinions about a specific issue or a political party, etc.
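As a rough illustration of what filling database fields from text means, the sketch below extracts price records with a single hand-written pattern. The record type `PriceFact`, the pattern, and the function `extract_price_facts` are assumptions made for this example, and the sentences used to exercise it are invented. Practical systems need far deeper linguistic analysis than one regular expression.

```python
import re
from dataclasses import dataclass


@dataclass
class PriceFact:
    """One record of a hypothetical factographic database."""
    product: str
    price: float
    region: str


# Hand-written pattern for sentences such as
# "The price of wheat in Argentina reached 210.5 dollars."
PRICE_PATTERN = re.compile(
    r"price of (?P<product>[a-z ]+?) in (?P<region>[A-Z][A-Za-z ]*?) "
    r"(?:reached|rose to|fell to|was) (?P<price>\d+(?:\.\d+)?) dollars"
)


def extract_price_facts(text):
    """Return a PriceFact for every sentence fragment matching the pattern."""
    return [
        PriceFact(
            product=match.group("product"),
            price=float(match.group("price")),
            region=match.group("region"),
        )
        for match in PRICE_PATTERN.finditer(text)
    ]


# Invented sample text, for illustration only.
sample = (
    "The price of wheat in Argentina reached 210.5 dollars per ton. "
    "Analysts expect rain. "
    "Yesterday the price of crude oil in Mexico fell to 68 dollars."
)
for fact in extract_price_facts(sample):
    print(fact)
```

Sentences that express the same fact in other words, such as “Wheat now costs 210.5 dollars in Argentina”, are silently missed, which shows why the general task calls for real syntactic and semantic analysis.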
Decision-making officials in business and politics are usually too busy to read and comprehend all the relevant news in the time available, so they often have to hire many news summarizers and readers or even turn to a specialized information agency. This is very expensive, and even then important relationships between the facts may be lost, since each news summarizer typically has very limited knowledge of the subject matter. A fully effective automatic system could not only extract the relevant facts much faster, but also combine them, classify them, and investigate their interrelationships.
There are several laboratory systems of this type for business applications, e.g., a system that helps to explore news on the Dow Jones index, investments, and company merger and acquisition projects. Because the task is so difficult, only very large commercial corporations can nowadays afford research on the factual data extraction problem, or simply buy the results of such research.
This kind of problem is also interesting from the scientific and technical point of view. It remains very topical, and its solution is yet to be found. We are not aware of any such research in the world targeting the Spanish language so far.