
Rivero L.Encyclopedia of database technologies and applications.2006
Noirhomme, M. (2002). Visualization of large data sets: The zoom star solution. The Electronic Journal of Symbolic Data Analysis, 0, 000-000. Retrieved December, 2002, from http://www.jsda.unina2.it/volumes/Vol0/noirho.pdf
Noirhomme, M. (2004). Visualisation of symbolic data. Proceedings of the Workshop on Applications of Symbolic Data Analysis, Lisboa, Portugal. Retrieved May, 2004, from http://www.info.fundp.ac.be/asso/dissem/W-ASSO-Lisbon-Visu.pdf
Pak, K. (2003). Interprétation des pyramides. Thèse de DEA. Université Paris 9 Dauphine, France.
Périnel, E. (1996). Segmentation et Analyse des Données Symboliques: Application à des données probabilistes imprécises. Thèse de Doctorat, Université Paris 9 Dauphine, France.
Polaillon, G. (1998). Organisation et interprétation par les treillis de Galois de données de type multivalué, intervalle ou histogramme. Thèse de Docteur en Informatique. Université Paris IX-Dauphine. Retrieved August, 2005, from http://wwwsi.supelec.fr/gp/these.ps.gz
Touati, M., & Diday, E. Sodas home page. Retrieved July, 2004, from http://www.ceremade.dauphine.fr/~touati/sodas-pagegarde.htm
KEY TERMS
Artificial Intelligence: The field of science that studies how to make computers “intelligent”. It consists mainly of the fields of machine learning (neural networks and decision trees) and expert systems. The principal problem is how to represent knowledge.
Decision Trees: A method of finding rules (rule induction) that divides the data into subgroups that are as similar as possible with regard to a target variable (the variable that we want to explain).
Exploratory Analysis: Part of the French school of data analysis, developed between 1960 and 1980. The principal authors are Tukey and Benzécri. The analysis aims to discover new relations among the sets of information analyzed.
Extension: “I call the extension of an idea the subjects to which it applies, which are also called the inferiors of a universal term, that being called superior to them” (Arnault & Nicole, 1662).
Symbolic Objects and Symbolic Data Analysis
Formal Concept Analysis (FCA): A theory of data analysis that identifies conceptual structures among data sets; Rudolf Wille introduced it in 1982. It structures data into units that are formal abstractions of concepts of human thought, allowing meaningful and comprehensible interpretation. FCA models the world as composed of objects and attributes. A strong feature of formal concept analysis is its capability of producing graphical visualizations of the inherent structures among data. In the field of information science, there is a further application: the mathematical lattices that are used in formal concept analysis can be interpreted as classification systems. Formalized classification systems can be analyzed according to the consistency of their relations. (FCA Home Page http://www.upriss.org.uk/fca/fca.html)
Fuzzy Sets: Let U be a set of objects, called the universe of discourse. A fuzzy set F in U is characterized by a membership function µF taking values in the interval [0,1], i.e., µF : U → [0,1], where µF(u) represents the degree to which u ∈ U belongs to the fuzzy set F.
Galois Lattice: A Galois lattice provides a means to analyze and represent data. It refers to two ordered sets. An ordered set (I, ≤) is the set I together with a partial ordering ≤ on I.
Intension: This is the comprehension of an idea. “I call the comprehension of an idea the attributes which it contains and which cannot be taken away from it without destroying it” (Arnault & Nicole, 1662).
ENDNOTES
1 The cases are absence of value, null value, or default value.
2 Here, we name between {} the principal authors; Diday referenced those.
3 This software is the result of a project involving 17 institutions from 9 European countries, including three official statistical institutions: EUSTAT (Spain), INE (Portugal), and ONS (England). The ASSO project continues investigating and developing new functionality for the software.
4 This software was recently presented at the International Workshop on Symbolic Data Analysis on 6-7 May 2004 at Paris IX-Dauphine University, France.

Syntactical and Semantical Correctness of Pictorial Queries for GIS
Fernando Ferri
Instituto di Ricerche sulla Popolazione e le Politiche Sociali, Italy
Maurizio Rafanelli
Instituto di Analisi dei Sistemi ed Informatica “A. Ruberti”, Italy
INTRODUCTION
One of the main topics in geographical information systems (GIS) research concerns the definition of high level visual query languages (Chrisman, 2002; Laurini & Thompson, 1992). This arises from the need to provide the user with a visual interactive tool for data manipulation and retrieval that is independent of the data’s physical organization. The use of standard query languages for spatial data handling (Rigaux, Scholl, & Voisard, 2001; Shekhar et al. 1999) has been hindered by the lack of appropriate language support. In fact, in visual query languages for GIS, a query can lead to multiple interpretations (Favetta & Aufaure-Portier, 2000).
For example, suppose the user wishes to formulate the following query: “Find all the regions that are passed through by a river and overlap a forest.” In this query, the user does not express interest in the relationship between the river and the forest. However, when he or she draws a shape representing a region, and another shape representing a river, he or she cannot avoid representing a spatial relationship between them, and so every representation considering a specific relationship between the two features can be considered a valid representation of the query. Different visual queries can thus represent the previous query in natural language. In Figure 1.a, the river passes through the forest; in Figure 1.b, the river touches the forest; and in Figure 1.c, the forest and the river are “disjointed.”
Thus, the three representations should be interpreted as three different queries and, moreover, each query has a different meaning from the original query in natural language. Owing to semantic ambiguity problems, some configurations could be semantically invalid. For example, a lake cannot include a region. However, a visual query language with clear syntax and semantics can a priori overcome many of these ambiguities, minimizing multiple interpretations of a query for both the system and the user. This article discusses the syntactic and semantic correctness of spatial configurations in the context of nonprocedural geographic pictorial query languages. Thus, this article considers possible ambiguities related to visual representations of a query, and it does not consider ambiguities related to interactions between system and user.
BACKGROUND
Various proposals of visual query languages for geographical data have recently been made. It is possible to classify these different languages into two main approaches.
In the first approach, the user draws his or her query freehand, directly on the screen, using the blackboard metaphor. Examples of this are Sketch (Meyer, 1993) and Spatial-Query-by-Sketch (Blaser & Egenhofer, 2000; Egenhofer, 1997). With this approach, the parser considers both the exact solution of the query and other approximate solutions obtained by removing or relaxing some constraints.
Figure 1. Three visual queries, (a), (b), and (c), representing the same query expressed in natural language
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
In the second approach, the user is free to draw iconic symbols on the screen, to express an object or an operator. Important examples of this approach are the Cigales language (Calcinelli & Mainguenaud, 1994; Mainguenaud & Portier, 1990) and the Lvis language (Aufaures-Portier, 1995; Aufaures-Portier & Bonhomme, 1999). In particular, Cigales is based on the idea of expressing a query by drawing the pattern corresponding to the result the user desires. The graphical forms and icons that conceptualize the operators are predefined.
Lvis is an extension of Cigales. The most relevant difference consists of the definition of new operators, because both spatial and temporal properties of the objects forming the query are considered. For this language, too, the main limitation is that there are different interpretations of the same query, that is, with the same visual representation the system is not able to give a unique interpretation. Ambiguity is increased with an increasing number of query objects.
Favetta and Aufaure-Portier (2000) confronted this problem and proposed a taxonomy based on user actions and system materialization (i.e., the different images the system can materialize), distinguishing ambiguities in visual query languages for GIS and ways to resolve them. They also proposed a model to solve a particular case of ambiguity. The proposed system, an enlargement of Lvis, established a dialogue with the user. Whenever an ambiguity occurred, it showed all the available configurations and requested a choice. The authors concluded that the strategy for avoiding ambiguities in most visual geographic query languages was to define not fully visual, but hybrid languages, including a textual part, and to offer a grammar with low expressive power.
The geographical pictorial query language (GeoPQL; Ferri & Rafanelli, 2004) is an evolution of the pictorial query language (PQL; Ferri, Massari, & Rafanelli, 1999), which resolved some previous limitations on ambiguities. It is possible to specify queries using symbolic graphical objects (SGOs) that have the appearance of the three classic shapes: point, polyline, and polygon. It is possible to assign to each SGO a semantic linked to the different kinds of information (i.e., layer) in the geographical database and impose constraints on both the SGO’s attributes and its spatial position. In addition, GeoPQL uses a limited set of graphical symbols to represent some operators that do not have a representation expressing the involved relations. Consequently, a generic pictorial sentence is represented by a set of symbolic graphical objects, a set of possible properties of each SGO, a set of possible symbolic operators, and the target of the query.
The GeoPQL algebra consists of two geometric operators: geo-union (Uni) and geo-difference (Dif); nine topological operators: geo-disjunction (Dsj), geo-touching (Tch), geo-inclusion (Inc), geo-crossing (Crs), geo-pass-through (Pth), geo-overlapping (Ovl), geo-equality (Eql), geo-alias (Als) and geo-any (Any); and one metric operator: geo-distance (Dst).
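The operator inventory above can be written down as a small lookup table. The following is only a sketch of one possible encoding: the names `Kind`, `GEOPQL_OPERATORS`, and `operators_of_kind` are hypothetical and are not part of GeoPQL; only the abbreviations and their grouping come from the text.

```python
from enum import Enum


class Kind(Enum):
    """Operator families named in the text (the enum itself is illustrative)."""
    GEOMETRIC = "geometric"
    TOPOLOGICAL = "topological"
    METRIC = "metric"


# Abbreviation -> (full name, family), as listed in the GeoPQL algebra.
GEOPQL_OPERATORS = {
    "Uni": ("geo-union", Kind.GEOMETRIC),
    "Dif": ("geo-difference", Kind.GEOMETRIC),
    "Dsj": ("geo-disjunction", Kind.TOPOLOGICAL),
    "Tch": ("geo-touching", Kind.TOPOLOGICAL),
    "Inc": ("geo-inclusion", Kind.TOPOLOGICAL),
    "Crs": ("geo-crossing", Kind.TOPOLOGICAL),
    "Pth": ("geo-pass-through", Kind.TOPOLOGICAL),
    "Ovl": ("geo-overlapping", Kind.TOPOLOGICAL),
    "Eql": ("geo-equality", Kind.TOPOLOGICAL),
    "Als": ("geo-alias", Kind.TOPOLOGICAL),
    "Any": ("geo-any", Kind.TOPOLOGICAL),
    "Dst": ("geo-distance", Kind.METRIC),
}


def operators_of_kind(kind):
    """Return the sorted abbreviations of all operators in one family."""
    return sorted(abbr for abbr, (_, k) in GEOPQL_OPERATORS.items() if k is kind)
```

For example, `operators_of_kind(Kind.GEOMETRIC)` yields the two geometric operators, and the topological family contains nine entries, matching the count given in the text.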
The symbols used for the symbolic operators are ←→ for the geo-distance operator and ← “relation name” → for the geo-any and geo-alias operators (where “relation name” is any or alias).
Geo-any allows elimination of undesired constraints: if a geo-any relationship is defined between a pair of SGOs, this means that no constraint exists between them. In contrast, geo-alias allows representation of more than one relationship (in OR) between a pair of SGOs. In fact, a graphical representation does not allow, for example, a pair of polygons that are both disjoint and overlapping.
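The effect of geo-any and geo-alias on a pair of SGOs can be illustrated with a short sketch. It assumes that the relation actually holding between two database objects is available as one of the operator abbreviations; the class `PairConstraint` and the function `accepts` are hypothetical names used only for illustration, not GeoPQL's implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PairConstraint:
    """Constraint between two SGOs (hypothetical structure for illustration).

    kind is "any" (geo-any edge), "alias" (geo-alias edge), or "drawn"
    (the single relation implied by the drawing itself).
    """
    kind: str
    relations: FrozenSet[str] = frozenset()


def accepts(constraint, observed):
    """Check whether an observed relation (e.g. "Ovl") satisfies the constraint."""
    if constraint.kind == "any":
        # Geo-any: no constraint exists between the two SGOs.
        return True
    # Geo-alias: the listed relations are combined in OR.
    # A drawn relation is the special case of a single listed relation.
    return observed in constraint.relations


# River and forest from the introductory example: the user does not care
# how they relate, so a geo-any edge accepts every relation.
river_forest = PairConstraint("any")
# Two polygons that may be either disjoint or overlapping.
disjoint_or_overlap = PairConstraint("alias", frozenset({"Dsj", "Ovl"}))
```

With these definitions, `accepts(river_forest, "Tch")` holds for any relation, whereas `accepts(disjoint_or_overlap, "Tch")` does not, because touching is not among the aliased relations.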
One important issue is the definition of a sound method for analyzing the syntactic and semantic correctness of queries that may lead to multiple system and user interpretations. Thus, we illustrate an approach able to determine the exact syntactic and semantic interpretations of geographic configurations involved in queries expressed by a pictorial query language.
A visual query language with clear syntax and semantics can prevent a priori many ambiguities, minimizing multiple interpretations. The goal of this article is to present the configurations between geographical objects that can be considered syntactically correct in a geographical context, identify the set of GeoPQL operators referred to each configuration, and give each configuration a nonambiguous semantic.
SYNTACTIC CORRECTNESS OF PICTORIAL CONFIGURATIONS
In this section, a generic SGO pair, part of the set of all query SGOs, is considered. The set of operators syntactically admissible for this pair is defined, starting from possible spatial configurations (see Figure 2; Shekhar & Chawla, 2002). Thus the system considers for each configuration drawn by the user only the syntactically correct predicates, ensuring the syntactical correctness of the pictorial query. The proposed operators represent the constraints on the spatial properties of the objects (or classes) of the database that the user must specify in his or her query in order to find the geographic objects of interest.
Each possible spatial configuration is given a code of three alphabetic characters followed by a number. The first two characters indicate the type of SGO, the third indicates the type of spatial configuration, and the number distinguishes the configuration. In this manner, the first configuration between two polygons in Figure 2 is referred to as “aaA1,” the last as “aaE1.”
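The coding scheme can be made concrete with a small parser. This is a minimal sketch: the mapping of the letters p, l, and a to point, polyline, and polygon is an assumption inferred from the descriptions in Table 1, and the function name `parse_configuration_code` is hypothetical.

```python
import re

# Assumed meaning of the first two letters, inferred from Table 1.
SGO_TYPES = {"p": "point", "l": "polyline", "a": "polygon"}

# Two SGO-type letters, one configuration-type letter, then a number.
_CODE = re.compile(r"([pla])([pla])([A-Z])(\d+)")


def parse_configuration_code(code):
    """Split a code such as "aaA1" into its three components."""
    match = _CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid configuration code: {code!r}")
    first, second, config_type, number = match.groups()
    return {
        "sgo_types": (SGO_TYPES[first], SGO_TYPES[second]),
        "configuration_type": config_type,
        "number": int(number),
    }
```

For instance, `parse_configuration_code("aaA1")` reports two polygons, configuration type A, number 1, which corresponds to the first polygon-polygon configuration mentioned above.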
Figure 2. The different configurations considered
Let ψi and ψj be two SGOs that form a given configuration and are associated respectively with the database’s geographical classes gcα and gcβ, possibly coincident.
For each configuration in Figure 2, a set of predicates <ψi Operator ψj> that are syntactically correct on the basis of the geometric and topological properties of ψi and ψj can be considered.
Table 1 summarizes syntactically correct predicates for all the configurations in Figure 2.
The operators geo-alias, geo-any, and geo-distance, in contrast to the other operators, must be expressed in the query using a suitable symbol (i.e., a labeled edge). Consequently, these operators are not linked to the particular configuration between the SGO pair involved, but to the symbols between them. For this reason, these kinds of operators require verification of the applicability requisites only, without considering the syntactical correctness of the configurations.
The geometric operators geo-union and geo-difference are applied only in relation to the specification of the properties (e.g., the value of the area) of their result (i.e., the object union or difference of the involved SGO). The property specification is sometimes related to the application of topological operators (e.g., the value of the intersection area between two SGOs).
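A syntactic check of this kind amounts to a table lookup. The sketch below encodes a few rows of Table 1 as a dictionary; it ignores operand order, leaves out the symbolic operators (geo-alias, geo-any, geo-distance) because they depend on labeled edges rather than on the drawn configuration, and uses the hypothetical names `SYNTACTIC_OPERATORS` and `admissible_operators`.

```python
# A few rows of Table 1: configuration code -> operators syntactically correct.
# Operand order (e.g. "ψj Inc ψi" vs "ψi Inc ψj") is deliberately not modelled.
SYNTACTIC_OPERATORS = {
    "ppA1": frozenset({"Uni", "Dsj"}),
    "ppB1": frozenset({"Eql"}),
    "plA1": frozenset({"Dsj"}),
    "llD1": frozenset({"Uni", "Crs"}),
    "llD2": frozenset({"Uni", "Ovl"}),
    "aaA1": frozenset({"Dsj", "Uni"}),
    "aaE1": frozenset({"Eql"}),
}


def admissible_operators(config):
    """Return the operators the parser may consider for a drawn configuration."""
    try:
        return SYNTACTIC_OPERATORS[config]
    except KeyError:
        raise KeyError(f"configuration {config!r} is not in this excerpt of Table 1")


def is_syntactically_correct(config, operator):
    """True if <ψi operator ψj> is admissible for the given configuration."""
    return operator in admissible_operators(config)
```

Here `is_syntactically_correct("llD1", "Crs")` holds, while `is_syntactically_correct("aaA1", "Crs")` does not, since two separate polygons admit only geo-disjunction and geo-union.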
SEMANTIC CORRECTNESS OF PICTORIAL CONFIGURATIONS
In Table 1, all possible configurations of pairs of SGOs (operands) were considered. Their syntactic correctness was verified and the set of applicable operators identified. These properties are independent of the geographic database. In contrast with syntactic correctness, semantic correctness is related to the meaning of the geographical information, so some syntactically correct configurations could be semantically incorrect, and some operators syntactically applicable to a configuration may be inapplicable semantically, due to the geographical information involved.
For example, the crossing of two polylines is always syntactically correct, and it is semantically correct, too, if the polylines represent two streets or a street and a river (the result could be a bridge), but it is semantically incorrect if they represent two rivers. Clearly, the semantically correct predicates are a subset of the syntactically correct predicates obtained by applying the GeoPQL algebra operators to all possible pairs of SGOs.
Let ψi and ψj be two SGOs that form a given configuration and are associated respectively with the database’s geographical classes gcα and gcβ, possibly coincident.
The predicate <ψi Operator ψj> is semantically correct if it is syntactically correct (for the configuration) and results in a non-null set of geographical objects. A configuration between the SGOs ψi and ψj is semantically correct if it is syntactically correct and at least one of its associated operators is semantically correct. Obviously, semantic correctness depends on the geographical classes gcα and gcβ associated with ψi and ψj.
For all the operators for which the symmetric property is not valid from the syntactical point of view, the semantical correctness depends on the order of the two operands. For example, a region can include a lake, but a lake cannot include a region.
If the result is a null set of objects, however, the user can autonomously define the predicate as “semantically correct, but absent in the database” for that configuration and manage “lists of semantic correctness for configurations, operators, and pair of geographical classes,” or he or she can define the predicate as “semantically not correct” and manage “lists of semantic incorrectness for configurations, operators, and pair of geographical classes.”
This procedure can be applied every time a new geographic class is defined. However, it can be very onerous because it is necessary to specify, for all configurations and operators, between the new class and previously defined classes giving a null result, whether they are correct or not.
Thus the system considers configurations and operators giving a non-null result as semantically correct. It considers all remaining configurations as undetermined until the user formulates a query that involves a <ψi Operator ψj > predicate for the pair of geographical classes. It is only then that the user decides if this construct is semantically correct, absent, or incorrect.
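The decision procedure described above can be sketched as a small registry. This is an illustration under stated assumptions, not GeoPQL's implementation: the names `Status` and `SemanticRegistry` are hypothetical, and the number of database results for a predicate is assumed to be supplied by the caller.

```python
from enum import Enum


class Status(Enum):
    CORRECT = "semantically correct"
    ABSENT = "semantically correct, but absent in the database"
    INCORRECT = "semantically not correct"
    UNDETERMINED = "undetermined"


class SemanticRegistry:
    """Keeps the user's decisions for predicates that return a null result."""

    def __init__(self):
        # Key: (configuration, operator, class of ψi, class of ψj).
        # The order of the two classes matters for non-symmetric operators.
        self._decisions = {}

    def record_user_decision(self, key, status):
        if status is Status.UNDETERMINED:
            raise ValueError("a user decision must be correct, absent, or incorrect")
        self._decisions[key] = status

    def status(self, key, result_count):
        # A non-null result makes the predicate semantically correct.
        if result_count > 0:
            return Status.CORRECT
        # Otherwise it stays undetermined until the user has decided.
        return self._decisions.get(key, Status.UNDETERMINED)


registry = SemanticRegistry()
lake_in_region = ("aaD1", "Inc", "Region", "Lake")   # a region includes a lake
region_in_lake = ("aaD1", "Inc", "Lake", "Region")   # a lake includes a region
registry.record_user_decision(region_in_lake, Status.INCORRECT)
```

Keying the registry on the ordered pair of classes reflects the remark that, for non-symmetric operators, semantic correctness depends on the order of the two operands.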
Table 1. Summary of syntactically correct operators for the different configurations considered
Configurations | Descriptions | Operators syntactically correct
ppA1 | The configuration can represent two points ψi and ψj which are separate, or specify a property of their union. | ψi Uni ψj (or ψj Uni ψi); ψi Dsj ψj (or ψj Dsj ψi)
ppB1 | The points ψi and ψj are elements of different thematisms (classes) and their common property is their identical spatial coordinates. | ψi Eql ψj (or ψj Eql ψi)
plA1 | The configuration represents a point ψi and a polyline ψj which are separate. | ψi Dsj ψj (or ψj Dsj ψi)
plB1 | In this configuration a point ψi is located over a polyline ψj. | ψj Tch ψi; ψj Inc ψi
plC1 | As in the previous configuration, a point ψi is located over a polyline ψj, here on its boundary. | ψj Tch ψi; ψj Inc ψi
paA1 | The configuration represents a point ψi and a polygon ψj, which are separate. | ψi Dsj ψj (or ψj Dsj ψi)
paB1 | The configuration represents a point ψi located in a polygon ψj. | ψj Inc ψi
paC1 | The configuration represents a point ψi located on the boundary of the polygon ψj. | ψj Tch ψi; ψj Inc ψi
llA1 | The configuration represents two polylines ψi and ψj, which are separate, or allows specification of a property of their union. | ψi Dsj ψj (or ψj Dsj ψi); ψi Uni ψj (or ψj Uni ψi)
llB1–llB2 | Two configurations can be considered: (a) ψj is completely within ψi; (b) ψj is completely within ψi and they have one common boundary point. | ψi Uni ψj (or ψj Uni ψi); ψj Inc ψi; ψj Dif ψi; ψi Ovl ψj (or ψj Ovl ψi)
llC1–llC2–llC3–llC4–llC5–llC6 | These six configurations represent the different cases in which the two polylines ψi and ψj can have some, but not all, of their points in common, without crossing. | ψi Tch ψj (or ψj Tch ψi) for llC2, llC3, and llC6; ψi Uni ψj (or ψj Uni ψi) always; ψj Dif ψi for llC1, llC4, and llC5; ψi Ovl ψj (or ψj Ovl ψi) for llC1, llC4, and llC5
llD1–llD2 | These two configurations represent the different cases in which the two polylines ψi and ψj can have some, but not all, of their points in common, while crossing. | ψi Uni ψj (or ψj Uni ψi) always; ψi Ovl ψj (or ψj Ovl ψi) for llD2; ψi Crs ψj (or ψj Crs ψi) for llD1
llE1 | In this configuration the polylines ψi and ψj are elements of different thematisms (classes) and are spatially coincident. | ψi Eql ψj (or ψj Eql ψi)

It is also important to specify the different role played, from the semantic point of view, by the topological operators with respect to the other operators. For example, if the query consists of two polylines that cross each other, the syntactically correct predicates for this configuration are:
ψi Crs ψj
ψi Uni ψj
The first predicate derives from a topological property of the configuration of the two SGOs involved in the query. Thus, it is semantically correct if the user explicitly specifies that the operator is semantically correct for that pair of operands, or if there is a set of geographical object pairs that satisfies this predicate in the database.
The second predicate is taken into consideration in the query only if the user specifies in it properties concerning the union of the SGOs. For this reason it is semantically correct if there are common properties between the two geographical classes gcα and gcβ represented by ψi and ψj.
Table 1. Summary of syntactically correct operators for the different configurations considered (cont.)
Configurations | Descriptions | Operators syntactically correct
laA1 | The configuration represents a polyline ψi and a polygon ψj, which are separate. | ψi Dsj ψj (or ψj Dsj ψi)
laB1–laB2–laB3–laB4–laB5–laB6–laB7–laB8–laB9–laB10 | These 10 configurations represent the various cases in which a polygon ψj contains a polyline ψi. | ψj Inc ψi always; ψi Tch ψj (or ψj Tch ψi) for laB2–laB3–laB4–laB5–laB6–laB7–laB8–laB9–laB10
laC1–laC2–laC3–laC4–laC5 | These five configurations represent the various cases in which a polyline ψi touches a polygon ψj. | ψi Tch ψj (or ψj Tch ψi) always; ψi Dif ψj for laC3 and laC4; ψj Inc ψi for laC5
laD1–laD2–laD3–laD4–laD5–laD6 | These six configurations represent the cases in which a polyline ψi intersects a polygon ψj. | ψi Pth ψj always; ψi Dif ψj always; ψi Tch ψj (or ψj Tch ψi) for laD4–laD5–laD6
aaA1 | The configuration represents two polygons ψi and ψj, which are separate, or allows a property of their union to be specified. | ψi Dsj ψj (or ψj Dsj ψi); ψi Uni ψj (or ψj Uni ψi)
aaB1–aaB2 | These configurations represent the cases in which two polygons ψi and ψj are touching. | ψi Uni ψj (or ψj Uni ψi); ψi Tch ψj (or ψj Tch ψi)
aaC1–aaC2 | These configurations represent the cases in which two polygons ψi and ψj overlap. | ψi Uni ψj (or ψj Uni ψi); ψi Dif ψj (or ψj Dif ψi); ψi Ovl ψj (or ψj Ovl ψi)
aaD1–aaD2–aaD3 | These configurations represent the cases in which the polygon ψi encloses the polygon ψj. | ψi Uni ψj (or ψj Uni ψi) always; ψi Dif ψj always; ψi Inc ψj always; ψi Ovl ψj (or ψj Ovl ψi) always; ψi Tch ψj (or ψj Tch ψi) for aaD1 and aaD2
aaE1 | In this configuration, the polygons ψi and ψj are elements of different thematisms (classes) and are spatially coincident. | ψi Eql ψj (or ψj Eql ψi)
FUTURE TRENDS
A query on a geographical database involves the following two types of information properties:
• spatial properties (e.g., the topological relationships or the distance). These properties are verified by the geometric attributes of the different objects that form the database; and
• descriptive properties (e.g., the population of a given region, or the length of a given river). These properties are verified by the alphanumeric attributes of the aforementioned objects.
Future trends in query languages for GIS concern the representation of spatial properties using visual approaches. However, although visual approaches allow the user to specify graphically the topological and geometric properties among all the objects involved in the query, some problems related to their use could arise. Other works study the expressive power, completeness, and unambiguousness of visual query languages.
CONCLUSION
The use of nonprocedural visual query languages allows the user to query geographical databases almost without knowing the procedures and syntax of these languages. In particular, with pictorial query languages, user friendliness no longer has to come at the price of ambiguous queries. Several problems remain open and are the subject of ongoing study, as technological advances bring new proposals and solutions for these kinds of languages.
REFERENCES
Aufaures-Portier, M. A. (1995). A high level interface language for GIS. Journal of Visual Languages and Computing, 6(2), 167-182.
Aufaures-Portier, M. A., & Bonhomme, C. (1999). A high level language for spatial data management. Proceedings on Visual Information Systems (VISUAL ’99), 1614 (pp. 325-332). Amsterdam, The Netherlands: Springer-Verlag.
Blaser, A. D., & Egenhofer, M. J. (2000). A visual tool for querying geographic databases. Proceedings on Advances in Visual Interfaces (pp. 211-216). Palermo, Italy: ACM.
Calcinelli, D., & Mainguenaud, M. (1994). Cigales, a visual language for geographic information system: The user interface. Journal of Visual Languages and Computing, 5(2), 113-132.
Chrisman, N. (2002). Exploring geographic information systems. Wiley.
Egenhofer, M. J. (1997). Query processing in spatial-query-by-sketch. Journal of Visual Languages and Computing, 8(4), 403-424.
Favetta, F., & Aufaure-Portier, M. A. (2000). About ambiguities in visual GIS query languages: A taxonomy and solutions. Proceedings on Visual Information, 1929 (pp. 154-165). Lyon, France: Springer-Verlag.
Ferri, F., Massari, F., & Rafanelli, M. (1999). A pictorial query language for geographic features in an object-oriented environment. Journal of Visual Languages and Computing, 10(6), 641-671. Academic Press.
Ferri, F., & Rafanelli, M. (2004). GeoPQL: A geographical pictorial query language (Tech. Rep). IASI-CNR.
Laurini, R., & Thompson, D. (1992). Fundamentals of spatial information systems. Academic Press.
Mainguenaud, M., & Portier, M. A. (1990). Definition of Cigales: A GIS query language. Proceedings on Databases and Expert Systems Applications (pp. 275-280). Vienna, Austria: Springer-Verlag.
Meyer, B. (1993). Beyond icons: Towards new metaphors for visual query languages for spatial information systems. Proceedings of the I° International Workshop on Interfaces in Database Systems (pp. 113-135). Glasgow, UK: Springer-Verlag.
Rigaux, P., Scholl, M. O., & Voisard, A. (2001). Spatial databases: With application to GIS. Morgan Kaufmann.
Shekhar, S., Chawla, S., Ravada, S., Fetterer, A., Liu, X., & Lu, C. (1999). Spatial databases: Accomplishments and research needs. IEEE Transactions on Knowledge and Data Engineering, 10(1), 45-55.
Shekhar, S., & Chawla, S. (2002). Spatial databases: A tour. Prentice Hall.
KEY TERMS
Blackboard Metaphor: A metaphor used in query languages whose philosophy is to let users draw a sketch of their query.
Declarative (Nonprocedural) Query Language: A general term for a query language, as opposed to an imperative query language. Imperative (or procedural) languages specify explicit sequences of steps to follow to produce a result, while declarative (nonprocedural) languages describe relationships between variables in terms of functions or inference rules, and the language executor (interpreter or compiler) applies some fixed algorithm to these relations to produce a result.
Geographical Database: A database in which geographical information is stored as x-y coordinates of single points, or of points that identify the boundaries of lines (or polylines, which sometimes represent the boundaries of polygons). Different attributes characterize the objects stored in these databases. In general, the storing structure consists of “classes” of objects, each of them implemented by a layer. Often a geographical database includes raster, topological vector, image processing, and graphics production functionality.
Geographical Information System (GIS): A computerized database system used for the capture, conversion, storage, retrieval, analysis, and display of spatial objects.
Metaphor: Figurative language that creates an analogy between two unlike things. Metaphor does not make a comparison but creates its analogy by representing one thing as something else.
Pictorial Query Language: A general term for a query language, as opposed to a textual query language. Pictorial languages describe the result to produce characterized by, or composed of, pictures.
Visual Query Language: A language that allows users to specify their goals in a two- (or more) dimensional way with visual expressions, that is, spatial arrangements of textual and graphical symbols.

Temporal Databases
Mahesh S. Raisinghani
Texas Woman’s University, USA
Chris Klassen
University of Dallas, USA
INTRODUCTION
Information has emerged as an agent of integration and the enabler of new competitiveness for today’s enterprise in the global marketplace. The degree of change in the paradigm for storage of data in databases is examined to determine whether it can support the accelerated response time required for information systems and technology. This paper discusses the key concepts for understanding temporal databases, including their major data types and principal purpose. Moreover, once the temporal extension is added to existing ANSI and ISO SQL standards, it will enable users to take advantage of new temporal features in the major database products.
Time has always been of great interest to mankind. In recent years, the computer industry has greatly influenced human lifestyles and, like other industries, it is time sensitive. For instance, the year 2000 problem (Y2K) was time-related. Not surprisingly, researchers working in the field of databases are expressing increasing interest in the dimension of time. Classical database design is two-dimensional and contains only current data, which can be termed the snapshot data type. Today’s businesses must adapt constantly to an ever-changing business environment, and databases must support an evolving business framework. More and more time-management seminars and devices are introduced every day. As Snodgrass (1998) notes, “Time varying data is becoming pervasive. It has been estimated that one of every 50 lines of database application code involves a date or time value.”
BACKGROUND: THE TIME DIMENSION IN DATABASES
The classical database is two-dimensional with columns and rows that intersect each other at cells, which contain particular values. Now extend this flat two-dimensional (2D) database into a three-dimensional (3D) figure, such as a cube. Apply this 3D concept to a database, instead of a flat 2D construct, and now you have an extended 3D figure with the third dimension being represented as various time intervals (Tansel et al., 1993). Date, Darwen, and Lorentzos (2003) provide a detailed investigation into the application of interval and relational theory to the problem of temporal database management.
Jensen, Mark, Roussopoulos, and Sellis (1993) presented an architecture for query processing in the relational model extended with transaction time that integrates standard query optimization and computation techniques with new differential computation techniques. The use of differential computation techniques is essential in order to provide efficient processing of queries that access very large temporal relations.
Temporal database systems are systems that provide special support for storing, querying, and updating historical and future data (Date et al., 2003). A temporal database records time-variant information. Date (2004) states that the relational model needs no extension or corruption in order to support the time dimension. Snodgrass (1998) defines the temporal database as “a database that supports some aspect of time”. Another definition that may be better structured to fit a Temporal Relational Model states that a temporal database is defined as “an union of two sets of relations Rs and R1, where Rs is the set of all static relations and R1 is the set of all time-varying relations” (Navathe & Ahmed, 1993). This article considers only the relational model of the temporal database, to the exclusion of other well-known types of databases such as object-oriented, network, and hierarchical. While there is currently little temporal database research on the latter two types, an increasing amount of research is being done in the area of temporal object-oriented databases. Temporal databases have also been referred to as time-oriented databases, time-varying databases, or historical databases. While time-oriented database and time-varying database are equivalent in meaning to temporal database, historical database is not. As discussed later in this article, a historical database is actually one kind of temporal database.
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
The first thing to note about the Snodgrass (1998) definition of a temporal database is that the time dimension required for a database to be temporal does not include user-defined time. User-defined time is some aspect of time that the database management system does not recognize as a special data type. A classical database essentially treats data such as birthdates as text strings, which does not allow for much manipulation. One way of attempting to avoid this problem is to store the date as a number rather than as text. This approach, however, has its own set of problems. One such problem, found in Oracle version 7.2, is that in some instances dates in the year 2000 are treated as earlier than dates in the 1900s. Even so, converting dates to numbers internally may be the best way of making possible a wide range of ad hoc queries on temporal data. The fact that support for user-defined time does not qualify a database as temporal does not mean that a temporal database cannot or will not include user-defined time.
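The difference between a date held as text and a date held as a number can be illustrated with a minimal Python sketch. The birthdates, the day/month/year text layout, and the helper name `to_ordinal` are assumptions chosen for illustration; they do not come from any particular DBMS.

```python
from datetime import date

# Hypothetical birthdates stored as text in day/month/year order.
as_text = ["15/01/2000", "31/12/1999", "02/06/1985"]

# Text is compared character by character, so this order is not chronological.
text_sorted = sorted(as_text)


def to_ordinal(text: str) -> int:
    """Convert 'DD/MM/YYYY' text to a day count (proleptic Gregorian ordinal)."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day).toordinal()


# Sorting on the numeric form gives chronological order.
number_sorted = sorted(as_text, key=to_ordinal)

# Numbers also support arithmetic, such as the days elapsed between two dates.
days_between = to_ordinal("15/01/2000") - to_ordinal("31/12/1999")

print(text_sorted)
print(number_sorted)
print(days_between)
```

With the text form, the date in 2000 sorts ahead of the date in 1999, which is the kind of misordering described above. The numeric form orders the dates correctly and allows interval arithmetic.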
TEMPORAL DATABASE CHARACTERISTICS
There are several types of time in a temporal database. Before describing these different types in detail, it is important to clarify the concept of precision relating to the time stored in a temporal database. This precision is called the granularity of time (Howe, 1997). Examples of progressively finer time granularity are a day, an hour, a second, and a nanosecond.
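Granularity can be illustrated by truncating one time stamp to coarser units. The sketch below is an illustration only: the function name `truncate` and the granularity labels are assumptions, and Python's `datetime` resolves to microseconds, so nanosecond granularity is not shown.

```python
from datetime import datetime

# Hypothetical granularity labels mapped to the fields that are reset to zero.
_RESET = {
    "day": {"hour": 0, "minute": 0, "second": 0, "microsecond": 0},
    "hour": {"minute": 0, "second": 0, "microsecond": 0},
    "second": {"microsecond": 0},
}


def truncate(ts: datetime, granularity: str) -> datetime:
    """Return ts with every field finer than the chosen granularity set to zero."""
    return ts.replace(**_RESET[granularity])


morning = datetime(2006, 2, 1, 9, 15, 0)
afternoon = datetime(2006, 2, 1, 14, 35, 12, 250000)

# The two events are indistinguishable at day granularity, distinct at hour granularity.
same_day = truncate(morning, "day") == truncate(afternoon, "day")
same_hour = truncate(morning, "hour") == truncate(afternoon, "hour")
print(same_day, same_hour)
```

The coarser the granularity, the more events collapse onto the same time stamp, which is why the granularity has to be settled before time stamps are compared.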
It is important to understand three major concepts to grasp the precise nature of a temporal database. The first is the concept of valid time. The second concept is transaction time, the actual time recorded in the database at the time the data is entered. Time stamps can include either the date alone or the date and clock time. An object is an entity that has a well-defined role in the application domain, and its features include state, behavior, and identity. An employee is a good example of an object. In a classical database, once a change is made to an employee’s record, the original data is discarded and replaced by new data. In a temporal database that supports transaction time, however, a time stamp can be attached to both the old data and the new data for that employee. In so doing, the database can store both the old data and the new data for the same object, for example, the employee’s salary before and after an increase made on a certain date. It is important to note here that transaction time values, or time stamps, cannot be later than the current point in time, nor can they be changed, just as the past cannot be changed.
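The behaviour of transaction time can be sketched as an append-only log in which every version of an employee's salary keeps its own time stamp. This is a minimal sketch under stated assumptions: the names `SalaryVersion`, `SalaryLog`, `record`, and `as_of` are hypothetical, and the time stamp is passed in by the caller only to keep the example deterministic, whereas a DBMS would take it from its own clock.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a recorded version cannot be altered afterwards
class SalaryVersion:
    employee_id: int
    salary: int
    transaction_time: datetime  # when the row was entered into the database


class SalaryLog:
    """Append-only store: old versions are kept rather than overwritten."""

    def __init__(self) -> None:
        self._rows: List[SalaryVersion] = []

    def record(self, employee_id: int, salary: int, now: datetime) -> None:
        # Assumption: time stamps never move backwards in this sketch.
        if self._rows and now < self._rows[-1].transaction_time:
            raise ValueError("transaction time cannot move backwards")
        self._rows.append(SalaryVersion(employee_id, salary, now))

    def as_of(self, employee_id: int, when: datetime) -> Optional[int]:
        """Return the salary the database held for the employee at time `when`."""
        result = None
        for row in self._rows:
            if row.employee_id == employee_id and row.transaction_time <= when:
                result = row.salary
        return result


log = SalaryLog()
log.record(1, 50000, datetime(2005, 1, 10, 9, 0))
log.record(1, 55000, datetime(2006, 1, 20, 9, 0))
print(log.as_of(1, datetime(2005, 6, 1)), log.as_of(1, datetime(2006, 6, 1)))
```

Because nothing is overwritten, the state of the database at any earlier moment can be reconstructed by ignoring rows stamped after that moment.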
Another major type of time in a temporal database is termed valid time. Valid time is the actual or real-world clock time at which the data is valid. Continuing with the employee example, while transaction time is the point in time at which the data is entered into the database, valid time is the point at which the entered data becomes true or takes effect. For instance, on January 3, an employee is notified that an increase in salary will be effective February 1. The Human Resources department, after being notified of the employee’s raise, must enter the new salary into the database. Presumably, they will enter the data before the raise goes into effect. The actual time at which they enter the raise into the database is prior to February 1, and that time will be the time stamp for the transaction time. The data, however, is not yet valid, and for the rest of January the employee will continue to receive the current salary. On February 1, the raise data becomes valid. Thus, February 1 is the valid time. Conversely, if an employee receives a retroactive raise, the transaction time may be later than the valid time.
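The raise example can be sketched with rows that carry both kinds of time. The year 2006, the salary figures, the entry date of January 20, and the names `SalaryFact` and `salary_on` are all assumptions made for illustration; the source gives only the January 3 notification and the February 1 effective date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SalaryFact:
    salary: int
    valid_from: date   # valid time: when the salary takes effect in the real world
    recorded_on: date  # transaction time: when the row was entered


def salary_on(facts: List[SalaryFact], valid_day: date, known_by: date) -> Optional[int]:
    """Salary in effect on valid_day, using only rows recorded on or before known_by."""
    visible = [
        f for f in facts
        if f.recorded_on <= known_by and f.valid_from <= valid_day
    ]
    if not visible:
        return None
    # The latest effective date wins; ties go to the most recently recorded row.
    return max(visible, key=lambda f: (f.valid_from, f.recorded_on)).salary


facts = [
    SalaryFact(50000, date(2005, 1, 1), date(2004, 12, 20)),
    # Raise entered in January (transaction time), effective February 1 (valid time).
    SalaryFact(55000, date(2006, 2, 1), date(2006, 1, 20)),
    # Retroactive raise: effective March 1 but not entered until April 15.
    SalaryFact(60000, date(2006, 3, 1), date(2006, 4, 15)),
]

print(salary_on(facts, date(2006, 1, 25), date(2006, 1, 25)))  # raise entered, not yet valid
print(salary_on(facts, date(2006, 2, 1), date(2006, 2, 1)))    # raise now valid
```

Asking about the same valid day with two different `known_by` dates shows the effect of a retroactive entry: the answer for March 15 changes once the April 15 row has been recorded.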
The two major types of time unique to the temporal database, valid time and transaction time, allow for the possibility of three forms of temporal databases: historical, rollback, and bi-temporal (Steiner, 2003). A historical database supports valid time, but not transaction time. To reiterate, historical database is a poor choice as an alternative term for the temporal database, because a historical database is but one type of temporal database. A historical database would also be a poor choice for anyone wishing to deploy a temporal database, for the reason given below. The second form of a temporal database is the rollback database. This database is the opposite of the historical database: it supports only transaction time and not valid time. Unlike the historical database, the rollback database is quite useful for data recovery after database failure. The reason that a historical database would rarely be desirable, then, is that it could not support rollback after DBMS failure. Rollback is also necessary if the database does not use the locking technique to ensure data security. Most databases on the market today do support at least some rollback features.
In practice, a full temporal database is a bi-temporal database. This database supports both types of time that are necessary for storing and querying time-varying data. The bi-temporal database could aid significantly in knowledge discovery, since it is able to fully support the time dimension on three levels: the DBMS level with transaction time, the data level with valid time, and the user level with user-defined time.
DATATYPES IN A TEMPORAL DATABASE
A temporal database supports three major data types: temporal, static, and snapshot data. These data types should be transparent, since the user need not know whether the data used is temporal, static, or snapshot. One goal of a temporal database should be the seamless handling of temporal, static, and snapshot data (Gadia & Nair, 1993).
The temporal datatype is the most important datatype for the temporal database, for the simple fact that it provides the foundation for building the temporal database. The temporal datatype has been defined as “a finite union of intervals” (Gadia & Nair, 1993). This datatype is termed the temporal element. The temporal element is necessary for using time-varying criteria to perform ad hoc queries. Later in this article, an example of a temporal query language is presented, which uses the temporal element in a new temporal clause that is added to the existing SQL syntax.
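A "finite union of intervals" can be sketched as a sorted list of disjoint intervals. The representation below is an assumption made for illustration (integer time points, half-open intervals, and the function names `normalize`, `union`, `intersection`, and `contains`); it is not the notation of Gadia and Nair.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) over integer time points


def normalize(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(i for i in intervals if i[0] < i[1]):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Time points that belong to either temporal element."""
    return normalize(a + b)


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Time points that belong to both temporal elements."""
    result: List[Interval] = []
    for s1, e1 in normalize(a):
        for s2, e2 in normalize(b):
            start, end = max(s1, s2), min(e1, e2)
            if start < end:
                result.append((start, end))
    return normalize(result)


def contains(element: List[Interval], t: int) -> bool:
    """True if time point t falls inside the temporal element."""
    return any(start <= t < end for start, end in element)


employed = normalize([(5, 8), (1, 3), (2, 4)])
print(employed, intersection(employed, [(3, 6)]))
```

In this sketch both `union` and `intersection` return another normalized list of intervals, so the result of one operation can be fed directly into the next when time-varying criteria are combined.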
The static datatype is defined as “a constant defined over the whole universe of time” (Gadia & Nair, 1993). In other words, a static value remains valid at any future time. In contrast, a temporal datatype value is valid for a specified time period or interval, and a snapshot datatype value is valid only for the current instant. Another way of expressing the validity of static data is to say that its domain is any finite point in the future. Because its validity is not limited to a finite period, the static datatype does not need a time stamp.
The final major datatype is termed snapshot data. In a conventional or classical database, all data is of the snapshot datatype (Gadia & Nair, 1993; Navathe & Ahmed, 1993; Ozsoyoglu & Snodgrass, 1995). When tables are updated, the new values replace old values, which are discarded, and a new snapshot is created. Unfortunately, with the snapshot datatype, the history of the changes to the data is lost. Because a business needs the history of its data, a database that supports only the snapshot datatype is inadequate, which demonstrates the need for a temporal database. One reason to retain the snapshot datatype nonetheless is to provide a smooth transition from a classical database to a temporal database. Since all the data in a classical database is of the snapshot datatype, migration from a classical database to a temporal database should be seamless (Gadia & Nair, 1993).
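The different validity rules of the three datatypes can be summarized in a small Python sketch. The class `Value`, the field names, the use of integer time points, and the function `is_valid_at` are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Value:
    data: object
    kind: str  # "static", "temporal", or "snapshot"
    interval: Optional[Tuple[int, int]] = None  # [start, end), used by temporal values only


def is_valid_at(value: Value, t: int, now: int) -> bool:
    """Apply the validity rule that matches the value's datatype."""
    if value.kind == "static":
        return True  # a constant over the whole universe of time, so no time stamp
    if value.kind == "temporal":
        if value.interval is None:
            raise ValueError("a temporal value needs an interval")
        start, end = value.interval
        return start <= t < end
    if value.kind == "snapshot":
        return t == now  # valid only for the current instant
    raise ValueError(f"unknown datatype: {value.kind}")


birthplace = Value("Lisbon", "static")
salary = Value(55000, "temporal", (10, 20))
balance = Value(1200, "snapshot")

print(is_valid_at(birthplace, 3, now=15), is_valid_at(salary, 3, now=15), is_valid_at(balance, 3, now=15))
```

A caller uses the single function `is_valid_at` for all three kinds of value, which is one way to picture the transparency goal: the user asks about a time point without needing to know which datatype is stored.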
QUERY CAPABILITIES FOR THE TEMPORAL DATABASE
One of the most important reasons for having a database that supports the temporal dimension is the ability to perform ad hoc queries about the data. The current standard for conventional (relational) databases is Structured Query Language (SQL). SQL has become the industry standard for Relational Database Management Systems (RDBMSs) because of its ease of use due to its English-like syntax. The addition of the temporal dimension, however, greatly increases the complexity of queries that use temporal data. With the additional element of time, SQL in its current form is no longer able to process ad hoc queries as was formerly done using a classical (relational) database. A new query language or an extension to SQL is necessary. Understandably, one of the biggest areas of research today in the field of temporal databases is focused on this topic. A few remarks on this topic are provided here, and one example of a query language extension is given in the following discussion.
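The added complexity can be seen by emulating valid time in plain SQL, without any temporal extension. The sketch below runs ordinary SQL through Python's built-in `sqlite3` module; the table `salary`, its period columns `valid_from` and `valid_to`, the sample rows, and the helper `salaries_on` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE salary (
           employee   TEXT    NOT NULL,
           amount     INTEGER NOT NULL,
           valid_from TEXT    NOT NULL,  -- ISO 8601 date, inclusive
           valid_to   TEXT    NOT NULL   -- ISO 8601 date, exclusive
       )"""
)
conn.executemany(
    "INSERT INTO salary VALUES (?, ?, ?, ?)",
    [
        ("Lee", 50000, "2005-01-01", "2006-02-01"),
        ("Lee", 55000, "2006-02-01", "9999-12-31"),  # far-future date stands in for "until changed"
        ("Kim", 48000, "2005-06-01", "9999-12-31"),
    ],
)


def salaries_on(day: str):
    """Salaries valid on the given ISO date; the time predicate is written by hand."""
    rows = conn.execute(
        "SELECT employee, amount FROM salary "
        "WHERE valid_from <= ? AND ? < valid_to "
        "ORDER BY employee",
        (day, day),
    )
    return rows.fetchall()


print(salaries_on("2006-01-15"))
print(salaries_on("2006-02-01"))
```

Every query against this table has to repeat the period predicate, and nothing prevents overlapping periods from being inserted for the same employee. A temporal extension to SQL is meant to take this bookkeeping away from the user.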
Any new temporal query language should support current SQL capabilities seamlessly. The most promising proposals for a query language which supports the time dimension and also continues to allow users to perform queries without specifying any time criteria (thus avoiding high retraining costs, among other things) are not really new query languages, per se, but are simply extensions of SQL. One of the most promising examples of temporal-capable SQL is called, appropriately, TSQL for Temporal Structured Query Language. With TSQL, ad-hoc queries can be performed without specifying any time-varying criteria. Thus, the only essential clauses that are necessary to perform a query in TSQL remain the same as in SQL. These are the SELECT and FROM clauses. If desired, a new criteria clause can be added to current SQL criteria clauses, which are WHERE, GROUP BY, HAVING, and ORDER BY. The new criteria clause would be either WHEN or WHILE. Either name captures the idea of a time-varying condition. Böhlen, Jensen, and Snodgrass (2000) advocate a different approach to articulating a set of requirements that directly implies the syntactic structure and core semantics of a temporal extension of an (arbitrary) nontemporal query language. This extended language, termed ATSQL, is formally defined via a denotational-semantics-style mapping of tempo-