
Rivero L.Encyclopedia of database technologies and applications.2006


Noirhomme, M. (2002). Visualization of large data sets: The zoom star solution. The Electronic Journal of Symbolic Data Analysis, 0, 000-000. Retrieved December, 2002, from http://www.jsda.unina2.it/volumes/Vol0/noirho.pdf

Noirhomme, M. (2004). Visualisation of symbolic data. Proceedings of the Workshop on Applications of Symbolic Data Analysis, Lisboa, Portugal. Retrieved May, 2004, from http://www.info.fundp.ac.be/asso/dissem/W-ASSO-Lisbon-Visu.pdf

Pak, K. (2003). Interprétation des pyramides. Thèse de DEA. Université Paris 9 Dauphine, France.

Périnel, E. (1996). Segmentation et Analyse des Données Symboliques: Application à des données probabilistes imprécises. Thèse de Doctorat, Université Paris 9 Dauphine, France.

Polaillon, G. (1998). Organisation et interprétation par les treillis de Galois de données de type multivalué, intervalle ou histogramme. Thèse de Doctorat en Informatique. Université Paris IX-Dauphine. Retrieved August, 2005, from http://wwwsi.supelec.fr/gp/these.ps.gz

Touati, M., & Diday, E. Sodas home page. Retrieved July, 2004, from http://www.ceremade.dauphine.fr/~touati/sodas-pagegarde.htm

KEY TERMS

Artificial Intelligence: The field of science that studies how to make computers “intelligent”. It consists mainly of the fields of machine learning (neural networks and decision trees) and expert systems. The principal problem is how to represent knowledge.

Decision Trees: A rule-induction method that divides the data into subgroups that are as similar as possible with regard to a target variable (the variable that we want to explain).

Exploratory Analysis: Part of the French school of data analysis, developed between 1960 and 1980. The principal authors are Tukey and Benzécri. The aim of the analysis is to discover new relations among the sets of analyzed information.

Extension: “I call the extension of an idea the subjects to which it applies, which are also called the inferiors of a universal term, that being called superior to them” (Arnault & Nicole, 1662).

Symbolic Objects and Symbolic Data Analysis

Formal Concept Analysis (FCA): A theory of data analysis that identifies conceptual structures among data sets; Rudolf Wille introduced it in 1982. It structures data into units that are formal abstractions of concepts of human thought, allowing meaningful and comprehensible interpretation. FCA models the world as composed of objects and attributes. A strong feature of formal concept analysis is its capability of producing graphical visualizations of the inherent structures among data. In the field of information science, there is a further application: the mathematical lattices that are used in formal concept analysis can be interpreted as classification systems. Formalized classification systems can be analyzed according to the consistency of their relations. (FCA Home Page http://www.upriss.org.uk/fca/fca.html)

Fuzzy Sets: Let U be a set of objects, the so-called universe of discourse. A fuzzy set F in U is characterized by an inclusion function µF taking values in the interval [0,1], i.e., µF : U → [0,1], where µF(u) represents the degree to which u ∈ U belongs to the fuzzy set F.
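As a minimal sketch of this definition, the inclusion function can be written as an ordinary function that maps each element of the universe to a degree in [0,1]. The triangular shape, the fuzzy set "warm", and its temperature bounds are illustrative assumptions and are not taken from the source.

```python
def triangular_membership(u, low, peak, high):
    """Degree in [0, 1] to which u belongs to a triangular fuzzy set.

    Assumes low < peak < high (hypothetical parameters for illustration).
    """
    if u <= low or u >= high:
        return 0.0
    if u <= peak:
        return (u - low) / (peak - low)
    return (high - u) / (high - peak)


def warm(u):
    """Hypothetical fuzzy set "warm" over temperatures in degrees Celsius."""
    return triangular_membership(u, 15.0, 25.0, 35.0)
```

A value such as `warm(20.0)` lies strictly between 0 and 1, which is what distinguishes a fuzzy set from an ordinary set with a yes/no membership test.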

Galois Lattice: A Galois lattice provides a means to analyze and represent data. It refers to two ordered sets. An ordered set (I, ≤) is the set I together with a partial ordering ≤ on I.

Intension: This is the comprehension of an idea. “I call the comprehension of an idea the attributes which it contains and which cannot be taken away from it without destroying it” (Arnault & Nicole, 1662).

ENDNOTES

1 The cases are by absence value, null value, or default value.

2 Here, we name between {} the principal authors; Diday referenced those.

3 This software is the result of a research project in which 17 institutions from 9 European countries took part; three official statistical institutions were involved in this project: EUSTAT (Spain), INE (Portugal), and ONS (England). The ASSO project continues to investigate and develop new functionality for the software.

4 This software was recently presented at the International Workshop on Symbolic Data Analysis, held on 6-7 May 2004 at Paris IX-Dauphine University, France.


Syntactical and Semantical Correctness of Pictorial Queries for GIS

Fernando Ferri

Istituto di Ricerche sulla Popolazione e le Politiche Sociali, Italy

Maurizio Rafanelli

Istituto di Analisi dei Sistemi ed Informatica “A. Ruberti”, Italy


INTRODUCTION

One of the main topics in geographical information systems (GIS) research concerns the definition of high level visual query languages (Chrisman, 2002; Laurini & Thompson, 1992). This arises from the need to provide the user with a visual interactive tool for data manipulation and retrieval that is independent of the data’s physical organization. The use of standard query languages for spatial data handling (Rigaux, Scholl, & Voisard, 2001; Shekhar et al., 1999) has been hindered by the lack of appropriate language support. Moreover, in visual query languages for GIS, a query can lead to multiple interpretations (Favetta & Aufaure-Portier, 2000).

For example, suppose the user wishes to formulate the following query: “Find all the regions that are passed through by a river and overlap a forest.” In this query, the user does not express interest in the relationship between the river and the forest. However, when he or she draws a shape representing a forest, and another shape representing a river, he or she cannot avoid representing a spatial relationship between them, and so every representation considering a specific relationship between the two features can be considered a valid representation of the query. Different visual queries can thus represent the previous query in natural language. In Figure 1.a, the river passes through the forest; in Figure 1.b, the river touches the forest; and in Figure 1.c, the forest and the river are disjoint.

Thus, the three representations should be interpreted as three different queries and, moreover, each query has a different meaning from the original query in natural language. Owing to semantic ambiguity problems, some configurations could be semantically invalid. For example, a lake cannot include a region. However, a visual query language with clear syntax and semantics can overcome many ambiguities a priori, minimizing multiple interpretations of a query for both the system and the user. This article discusses the syntactic and semantic correctness of spatial configurations in the context of nonprocedural geographic pictorial query languages. Thus, this article considers possible ambiguities related to visual representations of a query, and it does not consider ambiguities related to interactions between system and user.

BACKGROUND

Various proposals of visual query languages for geographical data have recently been made. It is possible to classify these different languages into two main approaches.

In the first approach, the user draws his or her query freehand, directly on the screen, using the blackboard metaphor. Examples of this are Sketch (Meyer, 1993) and Spatial-Query-by-Sketch (Blaser & Egenhofer, 2000; Egenhofer, 1997). With this approach, the parser considers both the exact solution of the query and other approximate solutions obtained by removing or relaxing some constraints.

In the second approach, the user is free to draw iconic symbols on the screen, to express an object or an operator. Important examples of this approach are the Cigales language (Calcinelli & Mainguenaud, 1994; Mainguenaud & Portier, 1990) and the Lvis language (Aufaures-Portier, 1995; Aufaures-Portier & Bonhomme, 1999). In particular, Cigales is based on the idea of expressing a query by drawing the pattern corresponding to the result the user desires. The graphical forms and icons that conceptualize the operators are predefined.

Figure 1. Three visual queries representing the same query expressed in natural language: (a) the river passes through the forest; (b) the river touches the forest; (c) the river and the forest are disjoint

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.

Lvis is an extension of Cigales. The most relevant difference consists of the definition of new operators, because both spatial and temporal properties of the objects forming the query are considered. For this language, too, the main limitation is that the same query admits different interpretations; that is, given a single visual representation, the system is not able to assign it a unique interpretation. Ambiguity grows with the number of query objects.

Favetta and Aufaure-Portier (2000) confronted this problem and proposed a taxonomy based on user actions and system materialization (i.e., the different images the system can materialize), distinguishing ambiguities in visual query languages for GIS and ways to resolve them. They also proposed a model to solve a particular case of ambiguity. The proposed system, an extension of Lvis, established a dialogue with the user. Whenever an ambiguity occurred, it showed all the available configurations and requested a choice. The authors concluded that the strategy for avoiding ambiguities in most visual geographic query languages was to define not fully visual, but hybrid languages, including a textual part, and to offer a grammar with low expressive power.

The geographical pictorial query language (GeoPQL; Ferri & Rafanelli, 2004) is an evolution of the pictorial query language (PQL; Ferri, Massari, & Rafanelli, 1999), which resolved some previous limitations on ambiguities. It is possible to specify queries using symbolic graphical objects (SGOs) that have the appearance of the three classic shapes: point, polyline, and polygon. It is possible to assign to each SGO a semantic linked to the different kinds of information (i.e., layer) in the geographical database and impose constraints on both the SGO’s attributes and its spatial position. In addition, GeoPQL uses a limited set of graphical symbols to represent some operators that do not have a representation expressing the involved relations. Consequently, a generic pictorial sentence is represented by a set of symbolic graphical objects, a set of possible properties of each SGO, a set of possible symbolic operators, and the target of the query.

The GeoPQL algebra consists of two geometric operators: geo-union (Uni) and geo-difference (Dif); nine topological operators: geo-disjunction (Dsj), geo-touching (Tch), geo-inclusion (Inc), geo-crossing (Crs), geo-pass-through (Pth), geo-overlapping (Ovl), geo-equality (Eql), geo-alias (Als) and geo-any (Any); and one metric operator: geo-distance (Dst).
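The operator inventory above can be captured in a small lookup structure. The following is only an illustrative sketch: the names `Category`, `GEOPQL_OPERATORS`, and `operators_in` are hypothetical and are not part of GeoPQL; only the abbreviations, full names, and grouping come from the text.

```python
from enum import Enum


class Category(Enum):
    GEOMETRIC = "geometric"
    TOPOLOGICAL = "topological"
    METRIC = "metric"


# Abbreviation -> (full operator name, category), as listed in the text.
GEOPQL_OPERATORS = {
    "Uni": ("geo-union", Category.GEOMETRIC),
    "Dif": ("geo-difference", Category.GEOMETRIC),
    "Dsj": ("geo-disjunction", Category.TOPOLOGICAL),
    "Tch": ("geo-touching", Category.TOPOLOGICAL),
    "Inc": ("geo-inclusion", Category.TOPOLOGICAL),
    "Crs": ("geo-crossing", Category.TOPOLOGICAL),
    "Pth": ("geo-pass-through", Category.TOPOLOGICAL),
    "Ovl": ("geo-overlapping", Category.TOPOLOGICAL),
    "Eql": ("geo-equality", Category.TOPOLOGICAL),
    "Als": ("geo-alias", Category.TOPOLOGICAL),
    "Any": ("geo-any", Category.TOPOLOGICAL),
    "Dst": ("geo-distance", Category.METRIC),
}


def operators_in(category):
    """Return the sorted abbreviations of all operators in one category."""
    return sorted(
        abbr for abbr, (_, cat) in GEOPQL_OPERATORS.items() if cat is category
    )
```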

The symbols used for the symbolic operators are ←→ for the geo-distance operator and “relation name” for the geo-any and geo-alias operators, (where “relation name” is any or alias).

Geo-any allows elimination of undesired constraints: if a geo-any relationship is defined between a pair of SGOs, no constraint exists between them. In contrast, geo-alias allows representation of more than one relationship (in OR) between a pair of SGOs. In fact, a single graphical representation cannot show, for example, a pair of polygons that are both disjoint and overlapping.
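One way to read these two operators is as modifiers of the constraint attached to an SGO pair. The sketch below is an assumption about how such a constraint could be modeled, not GeoPQL's implementation: a constraint is either `None` (geo-any, no constraint) or a set of admissible operator abbreviations (a single element for a drawn relation, several for a geo-alias, read as OR). The function name `satisfies` is hypothetical.

```python
def satisfies(actual_relation, constraint):
    """Check one topological relation against the constraint on an SGO pair.

    constraint is None for geo-any (no constraint at all), or a set of
    admissible operator abbreviations; a geo-alias is a set with several
    members, interpreted as a logical OR.
    """
    if constraint is None:
        return True
    return actual_relation in constraint


# The sketch of Figure 1.a literally shows the river passing through the forest.
river_forest_as_drawn = {"Pth"}
# With geo-any, the unintended river/forest relation is dropped from the query.
river_forest_any = None
# Geo-alias: two polygons that may be either disjoint or overlapping.
polygons_alias = {"Dsj", "Ovl"}
```

Under this reading, a river that merely touches a forest fails the constraint as drawn but passes once geo-any is applied, which is the behavior the introductory river and forest query needs.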

One important issue is the definition of a sound method for analyzing the syntactic and semantic correctness of queries that may lead to multiple system and user interpretations. Thus, we illustrate an approach able to determine the exact syntactic and semantic interpretations of geographic configurations involved in queries expressed by a pictorial query language.

A visual query language with clear syntax and semantics can prevent a priori many ambiguities, minimizing multiple interpretations. The goal of this article is to present the configurations between geographical objects that can be considered syntactically correct in a geographical context, identify the set of GeoPQL operators referred to each configuration, and give each configuration a nonambiguous semantic.

SYNTACTIC CORRECTNESS OF PICTORIAL CONFIGURATIONS

In this section, a generic SGO pair, part of the set of all query SGOs, is considered. The set of operators syntactically admissible for this pair is defined, starting from possible spatial configurations (see Figure 2; Shekhar & Chawla, 2002). Thus the system considers for each configuration drawn by the user only the syntactically correct predicates, ensuring the syntactical correctness of the pictorial query. The proposed operators represent the constraints on the spatial properties of the objects (or classes) of the database that the user must specify in his or her query in order to find the geographic objects of interest.

Each possible spatial configuration is given a code of three alphabetic characters followed by a number. The first two characters indicate the type of SGO, the third indicates the type of spatial configuration, and the number distinguishes the configuration. In this manner, the first configuration between two polygons in Figure 2 is referred to as “aaA1,” the last as “aaE1.”
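The coding scheme can be decoded mechanically. The sketch below assumes, based on the rows of Table 1 (pp for two points, pl for a point and a polyline, aa for two polygons, and so on), that the letters p, l, and a stand for point, polyline, and polygon; the function name `parse_configuration_code` and the returned dictionary layout are hypothetical.

```python
import re

# Assumed meaning of the SGO type letters, inferred from Table 1.
SGO_TYPES = {"p": "point", "l": "polyline", "a": "polygon"}

# Two SGO type letters, one configuration-type letter, then a number.
CODE_PATTERN = re.compile(r"([pla])([pla])([A-Z])(\d+)")


def parse_configuration_code(code):
    """Split a code such as 'aaA1' into its three components."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid configuration code: {code!r}")
    first, second, kind, number = match.groups()
    return {
        "sgo_types": (SGO_TYPES[first], SGO_TYPES[second]),
        "configuration_type": kind,
        "number": int(number),
    }
```

For instance, `parse_configuration_code("aaE1")` describes the last configuration between two polygons mentioned above.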


Figure 2. The different configurations considered

Let ψi and ψj be two SGOs that form a given configuration and are associated respectively with the database’s geographical classes gcα and gcβ, possibly coincident.

For each configuration in Figure 2, a set of predicates <ψi Operator ψj> that are syntactically correct on the basis of the geometric and topological properties of ψi and ψj can be considered.

Table 1 summarizes syntactically correct predicates for all the configurations in Figure 2.

The operators geo-alias, geo-any, and geo-distance, in contrast to the other operators, must be expressed in the query using a suitable symbol (i.e., a labeled edge). Consequently, these operators are not linked to the particular configuration between the SGO pair involved, but to the symbols between them. For this reason, these kinds of operators require verification of the applicability requisites only, without considering the syntactical correctness of the configurations.
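A syntactic check of this kind amounts to a table lookup. The sketch below copies a small excerpt of Table 1 into a dictionary and treats the three symbol-based operators as independent of the drawn configuration, as the paragraph above describes. It is a simplification under stated assumptions: operand order is ignored, the applicability requisites of the symbol-based operators are not checked, and the names `SYNTACTIC_OPERATORS`, `SYMBOL_OPERATORS`, and `admitted_by_configuration` are hypothetical.

```python
# Excerpt of Table 1: configuration code -> operators admitted for it.
SYNTACTIC_OPERATORS = {
    "ppA1": {"Uni", "Dsj"},
    "ppB1": {"Eql"},
    "paB1": {"Inc"},
    "llD1": {"Uni", "Crs"},
    "aaB1": {"Uni", "Tch"},
}

# Operators expressed by a labeled edge rather than by the drawing itself.
SYMBOL_OPERATORS = {"Als", "Any", "Dst"}


def admitted_by_configuration(configuration, operator):
    """Return True if the operator is not ruled out by the configuration.

    Symbol-based operators are never ruled out here, because they are not
    linked to the drawn configuration; their own applicability requisites
    would have to be verified separately.
    """
    if operator in SYMBOL_OPERATORS:
        return True
    return operator in SYNTACTIC_OPERATORS.get(configuration, set())
```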

The geometric operators geo-union and geo-difference are applied only in relation to the specification of the properties (e.g., the value of the area) of their result (i.e., the object union or difference of the involved SGO). The property specification is sometimes related to the application of topological operators (e.g., the value of the intersection area between two SGOs).

SEMANTIC CORRECTNESS OF PICTORIAL CONFIGURATIONS

In Table 1, all possible configurations of pairs of SGOs (operands) were considered. Their syntactic correctness was verified and the set of applicable operators identified. These properties are independent of the geographic database. In contrast with syntactic correctness, semantic correctness is related to the meaning of the geographical information, so some syntactically correct configurations could be semantically incorrect, and some operators syntactically applicable to a configuration may be inapplicable semantically, due to the geographical information involved.

For example, the crossing of two polylines is always syntactically correct, and it is semantically correct, too, if the polylines represent two streets or a street and a river (the result could be a bridge), but it is semantically incorrect if they represent two rivers. Clearly, the semantically correct predicates form a subset of the syntactically correct predicates obtained by applying the GeoPQL algebra operators to all possible pairs of SGOs.

Let ψi and ψj be two SGOs that form a given configuration and are associated respectively with the database’s geographical classes gcα and gcβ, possibly coincident.

The predicate <ψi Operator ψj> is semantically correct if it is syntactically correct (for the configuration) and results in a non-null set of geographical objects. A configuration between the SGOs ψi and ψj is semantically correct if it is syntactically correct and at least one of its associated operators is semantically correct. Obviously, semantic correctness depends on the geographical classes gcα and gcβ associated with ψi and ψj.

For all the operators for which the symmetric property is not valid from the syntactical point of view, the semantical correctness depends on the order of the two operands. For example, a region can include a lake, but a lake cannot include a region.

If the result is a null set of objects, however, the user can autonomously define the predicate as “semantically correct, but absent in the database” for that configuration and manage “lists of semantic correctness for configurations, operators, and pair of geographical classes,” or he or she can define the predicate as “semantically not correct” and manage “lists of semantic incorrectness for configurations, operators, and pair of geographical classes.”

This procedure can be applied every time a new geographic class is defined. However, it can be very onerous, because for every configuration and operator that gives a null result between the new class and a previously defined class, the user must specify whether it is correct or not.

Thus the system considers configurations and operators giving a non-null result as semantically correct. It considers all remaining configurations as undetermined until the user formulates a query that involves a <ψi Operator ψj > predicate for the pair of geographical classes. It is only then that the user decides if this construct is semantically correct, absent, or incorrect.
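The decision procedure described in this section can be sketched as follows. The code assumes that the predicate has already passed the syntactic check and that the number of matching object pairs in the database is known; the names `Status` and `semantic_status`, and the dictionary of recorded user decisions, are hypothetical and only illustrate the four outcomes named in the text.

```python
from enum import Enum


class Status(Enum):
    CORRECT = "semantically correct"
    ABSENT = "semantically correct, but absent in the database"
    INCORRECT = "semantically not correct"
    UNDETERMINED = "undetermined"


def semantic_status(predicate, result_count, user_decisions):
    """Classify a syntactically correct predicate.

    predicate is a (class_i, operator, class_j) tuple; the order of the two
    classes matters for operators that are not symmetric.
    result_count is the number of object pairs in the database that satisfy it.
    user_decisions maps predicates to a Status the user has already chosen.
    """
    if result_count > 0:
        return Status.CORRECT
    # A null result stays undetermined until the user decides.
    return user_decisions.get(predicate, Status.UNDETERMINED)
```

Because the predicate tuple is ordered, `("region", "Inc", "lake")` and `("lake", "Inc", "region")` are tracked separately, which matches the remark that a region can include a lake but not the reverse.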

It is also important to specify the different role played, from the semantical point of view, by the topological operators with respect to the other operators. For example, if the query consists of two polylines that cross each other, the syntactically correct predicates for this configuration are:

ψi Crs ψj

ψi Uni ψj

The first predicate derives from a topological property of the configuration of the two SGOs involved in the query. Thus, it is semantically correct if the user explicitly specifies that the operator is semantically correct for that pair of operands, or if there is a set of geographical object pairs that satisfies this predicate in the database.

The second predicate is taken into consideration in the query only if the user specifies in it properties concerning the union of the SGOs. For this reason it is semantically correct if there are common properties between the two geographical classes gcα and gcβ represented by ψi and ψj.

Table 1. Summary of syntactically correct operators for the different configurations considered

| Configurations | Descriptions | Operators syntactically correct |
|---|---|---|
| ppA1 | The configuration can represent two points ψi and ψj which are separate, or specify a property of their union. | ψi Uni ψj (or ψj Uni ψi); ψi Dsj ψj (or ψj Dsj ψi) |
| ppB1 | The points ψi and ψj are elements of different thematisms (classes) and their common property is their identical spatial coordinates. | ψi Eql ψj (or ψj Eql ψi) |
| plA1 | The configuration represents a point ψi and a polyline ψj which are separate. | ψi Dsj ψj (or ψj Dsj ψi) |
| plB1 | In this configuration a point ψi is located over a polyline ψj. | ψj Tch ψi; ψj Inc ψi |
| plC1 | This is equal to the previous configuration and represents a point ψi located over a polyline ψj on the boundary. | ψj Tch ψi; ψj Inc ψi |
| paA1 | The configuration represents a point ψi and a polygon ψj, which are separate. | ψi Dsj ψj (or ψj Dsj ψi) |
| paB1 | The configuration represents a point ψi located in a polygon ψj. | ψj Inc ψi |
| paC1 | The configuration represents a point ψi located on the boundary of the polygon ψj. | ψj Tch ψi; ψj Inc ψi |
| llA1 | The configuration represents two polylines ψi and ψj, which are separate, or allow specification of a property of their union. | ψi Dsj ψj (or ψj Dsj ψi); ψi Uni ψj (or ψj Uni ψi) |
| llB1–llB2 | Two configurations can be considered: a) ψj is completely within ψi; b) ψj is completely within ψi and they have one common boundary point. | ψi Uni ψj (or ψj Uni ψi); ψj Inc ψi; ψj Dif ψi; ψi Ovl ψj (or ψj Ovl ψi) |
| llC1–llC2–llC3–llC4–llC5–llC6 | These six configurations represent the different cases in which the two polylines ψi and ψj can have some, but not all, of their points in common, without crossing. | ψi Tch ψj (or ψj Tch ψi) for llC2, llC3, and llC6; ψi Uni ψj (or ψj Uni ψi) always; ψj Dif ψi for llC1, llC4, and llC5; ψi Ovl ψj (or ψj Ovl ψi) for llC1, llC4, and llC5 |
| llD1–llD2 | These two configurations represent the different cases in which the two polylines ψi and ψj can have some, but not all, of their points in common, while crossing. | ψi Uni ψj (or ψj Uni ψi) always; ψi Ovl ψj (or ψj Ovl ψi) for llD2; ψi Crs ψj (or ψj Crs ψi) for llD1 |
| llE1 | In this configuration the polylines ψi and ψj are elements of different thematisms (classes) and are spatially coincident. | ψi Eql ψj (or ψj Eql ψi) |

Table 1. Summary of syntactically correct operators for the different configurations considered (cont.)

| Configurations | Descriptions | Operators syntactically correct |
|---|---|---|
| laA1 | The configuration represents a polyline ψi and a polygon ψj, which are separate. | ψi Dsj ψj (or ψj Dsj ψi) |
| laB1–laB2–laB3–laB4–laB5–laB6–laB7–laB8–laB9–laB10 | These 10 configurations represent the various cases in which a polygon ψj contains a polyline ψi. | ψj Inc ψi always; ψi Tch ψj (or ψj Tch ψi) for laB2–laB3–laB4–laB5–laB6–laB7–laB8–laB9–laB10 |
| laC1–laC2–laC3–laC4–laC5 | These five configurations represent the various cases in which a polyline ψi touches a polygon ψj. | ψi Tch ψj (or ψj Tch ψi) always; ψi Dif ψj for laC3 and laC4; ψj Inc ψi for laC5 |
| laD1–laD2–laD3–laD4–laD5–laD6 | These six configurations represent the cases in which a polyline ψi intersects a polygon ψj. | ψi Pth ψj always; ψi Dif ψj always; ψi Tch ψj (or ψj Tch ψi) for laD4–laD5–laD6 |
| aaA1 | The configuration represents two polygons ψi and ψj, which are separate, or allow a property of their union to be specified. | ψi Dsj ψj (or ψj Dsj ψi); ψi Uni ψj (or ψj Uni ψi) |
| aaB1–aaB2 | These configurations represent the cases in which two polygons ψi and ψj are touching. | ψi Uni ψj (or ψj Uni ψi); ψi Tch ψj (or ψj Tch ψi) |
| aaC1–aaC2 | These configurations represent the cases in which two polygons ψi and ψj overlap. | ψi Uni ψj (or ψj Uni ψi); ψi Dif ψj (or ψj Dif ψi); ψi Ovl ψj (or ψj Ovl ψi) |
| aaD1–aaD2–aaD3 | These configurations represent the cases in which the polygon ψi encloses the polygon ψj. | ψi Uni ψj (or ψj Uni ψi) always; ψi Dif ψj always; ψi Inc ψj always; ψi Ovl ψj (or ψj Ovl ψi) always; ψi Tch ψj (or ψj Tch ψi) for aaD1 and aaD2 |
| aaE1 | In this configuration, the polygons ψi and ψj are elements of different thematisms (classes) and are spatially coincident. | ψi Eql ψj (or ψj Eql ψi) |

FUTURE TRENDS

A query on a geographical database involves the following two types of information properties:

spatial properties (e.g., the topological relationships or the distance). These properties are verified by the geometric attributes of different objects which form the database; and

descriptive properties (e.g., the population of a given region, or the length of a given river). These properties are verified by the alphanumeric attributes of the aforementioned objects.

Future trends in query languages for GIS refer to the representation of spatial properties using visual approaches. However, although visual approaches allow the user to specify, in a graphical way, topological and geometric properties among all the objects involved in the query, some problems related to their use could arise. Other works refer to studies regarding the expressive power, completeness, and unambiguousness of visual query languages.

CONCLUSION

The use of nonprocedural visual query languages allows the user to query geographical databases almost without knowing the procedures and syntax of these languages. In particular, with pictorial query languages, user friendliness and unambiguous queries are no longer mutually exclusive. Different problems are still open and are subjects of study, as technological advances bring new proposals and solutions for these kinds of languages.


REFERENCES

Aufaures-Portier, M. A. (1995). A high level interface language for GIS. Journal of Visual Languages and Computing, 6(2), 167-182.

Aufaures-Portier, M. A., & Bonhomme, C. (1999). A high level language for spatial data management. Proceedings on Visual Information Systems (VISUAL ’99), 1614 (pp. 325-332). Amsterdam, The Netherlands: Springer-Verlag.

Blaser, A. D., & Egenhofer, M. J. (2000). A visual tool for querying geographic databases. Proceedings on Advances in Visual Interfaces (pp. 211-216). Palermo, Italy: ACM.

Calcinelli, D., & Mainguenaud, M. (1994). Cigales, a visual language for geographic information system: The user interface. Journal of Visual Languages and Computing, 5(2), 113-132.

Chrisman, N. (2002). Exploring geographic information systems. Wiley.

Egenhofer, M. J. (1997). Query processing in spatial-query-by-sketch. Journal of Visual Languages and Computing, 8(4), 403-424.

Favetta, F., & Aufaure-Portier, M. A. (2000). About ambiguities in visual GIS query languages: A taxonomy and solutions. Proceedings on Visual Information, 1929 (pp. 154-165). Lyon, France: Springer-Verlag.

Ferri, F., Massari, F., & Rafanelli, M. (1999). A pictorial query language for geographic features in an object-oriented environment. Journal of Visual Languages and Computing, 10(6), 641-671. Academic Press.

Ferri, F., & Rafanelli, M. (2004). GeoPQL: A geographical pictorial query language (Tech. Rep). IASI-CNR.

Laurini, R., & Thompson, D. (1992). Fundamentals of spatial information systems. Academic Press.

Mainguenaud, M., & Portier, M. A. (1990). Definition of Cigales: A GIS query language. Proceedings on Databases and Expert Systems Applications (pp. 275-280). Vienna, Austria: Springer-Verlag.

Meyer, B. (1993). Beyond icons: Towards new metaphors for visual query languages for spatial information systems. Proceedings of the I° International Workshop on Interfaces in Database Systems (pp. 113-135). Glasgow, UK: Springer-Verlag.

Rigaux, P., Scholl, M. O., & Voisard, A. (2001). Spatial databases: With application to GIS. Morgan Kaufmann.

Shekhar, S., Chawla, S., Ravada, S., Fetterer, A., Liu, X., & Lu, C. (1999). Spatial databases: Accomplishments and research needs. IEEE Transactions on Knowledge and Data Engineering, 10(1), 45-55.

Shekhar, S., & Chawla, S. (2002). Spatial databases: A tour. Prentice Hall.

KEY TERMS

Blackboard Metaphor: A metaphor used in query languages; its philosophy is to let users draw a sketch of their query.

Declarative (Nonprocedural) Query Language: A general term for a query language, as opposed to an imperative query language. Imperative (or procedural) languages specify explicit sequences of steps to follow to produce a result, while declarative (nonprocedural) languages describe relationships between variables in terms of functions or inference rules, and the language executor (interpreter or compiler) applies some fixed algorithm to these relations to produce a result.

Geographical Database: A database in which geographical information is stored as x-y coordinates of single points, or of points that identify the boundaries of lines (or polylines, which sometimes represent the boundaries of polygons). Different attributes characterize the objects stored in these databases. In general, the storing structure consists of “classes” of objects, each of them implemented by a layer. Often a geographical database includes raster, topological vector, image processing, and graphics production functionality.

Geographical Information System (GIS): A computerized database system used for the capture, conversion, storage, retrieval, analysis, and display of spatial objects.

Metaphor: Figurative language that creates an analogy between two unlike things. Metaphor does not make a comparison but creates its analogy by representing one thing as something else.

Pictorial Query Language: A general term for a query language, as opposed to a textual query language. Pictorial languages describe the desired result by means of pictures.

Visual Query Language: A language that allows the user to specify his or her goals in a two- (or more) dimensional way with visual expressions, that is, spatial arrangements of textual and graphical symbols.


Temporal Databases

Mahesh S. Raisinghani

Texas Woman’s University, USA

Chris Klassen

University of Dallas, USA

INTRODUCTION

Information has emerged as an agent of integration and the enabler of new competitiveness for today’s enterprise in the global marketplace. The degree of change in the paradigm for storage of data in databases is examined to determine whether it can support the accelerated response time required for information systems and technology. This paper discusses the key concepts for understanding temporal databases, including the major data types and their principal purpose. Moreover, once the temporal extension is added to existing ANSI and ISO SQL standards, it will enable users to take advantage of new temporal features in the major database products.

Time has always been of great interest to mankind. In recent years, the computer industry has greatly influenced human lifestyles and, like other industries, it is time dependent. For instance, the year 2000 problem (Y2K) was time-related. Not surprisingly, researchers working in the field of databases are expressing increasing interest in the dimension of time. Classical database design is two-dimensional and contains only current data, which can be termed the snapshot data type. Today’s businesses must adapt constantly to an ever-changing business environment, and databases must support an evolving business framework. More and more time-management seminars and devices are introduced every day. As Snodgrass (1998) notes, “Time varying data is becoming pervasive. It has been estimated that one of every 50 lines of database application code involves a date or time value.”

BACKGROUND: THE TIME DIMENSION IN DATABASES

The classical database is two-dimensional with columns and rows that intersect each other at cells, which contain particular values. Now extend this flat two-dimensional (2D) database into a three-dimensional (3D) figure, such as a cube. Apply this 3D concept to a database, instead of a flat 2D construct, and now you have an extended 3D figure with the third dimension being represented as various time intervals (Tansel et al., 1993). Date, Darwen, and Lorentzos (2003) provide a detailed investigation into the application of interval and relational theory to the problem of temporal database management.

Jensen, Mark, Roussopoulos, and Sellis (1993) presented an architecture for query processing in the relational model extended with transaction time that integrates standard query optimization and computation techniques with new differential computation techniques. The use of differential computation techniques is essential in order to provide efficient processing of queries that access very large temporal relations.

Temporal database systems are systems that provide special support for storing, querying, and updating historical and future data (Date et al., 2003). A temporal database records time-variant information. Date (2004) states that the relational model needs no extension or corruption in order to support the time dimension. Snodgrass (1998) defines the temporal database as “a database that supports some aspect of time”. Another definition, which may be better suited to a Temporal Relational Model, states that a temporal database is “an union of two sets of relations Rs and R1, where Rs is the set of all static relations and R1 is the set of all time-varying relations” (Navathe & Ahmed, 1993). This article is limited to the relational model of the temporal database, to the exclusion of the other well-known types of databases such as object-oriented, network, and hierarchical. While there is currently little temporal database research on the latter two types, an increasing amount of research is being done in the area of temporal object-oriented databases. Temporal databases have also been referred to as time-oriented databases, time-varying databases, or historical databases. While time-oriented database and time-varying database are equivalent in meaning to temporal database, historical database is not. As discussed later in this article, a historical database is actually one particular form of temporal database.

Copyright © 2006s, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.

The first thing to note about the Snodgrass (1998) definition of a temporal database is that the time dimension required for a database to be temporal does not include user-defined time. User-defined time is any aspect of time that the database management system does not recognize as a special data type. A classical database essentially treats data such as birthdates as text strings, which does not allow for much manipulation. One way of attempting to avoid this problem is to store the date as a number rather than as text. This approach, however, has its own set of problems. One such problem, found in Oracle version 7.2, is that in some instances dates in the year 2000 are treated as earlier than dates in the 1900s. Converting dates to numbers internally may nevertheless be the best way of making possible a wide range of ad hoc queries on temporal data. The fact that support for user-defined time does not by itself make a database temporal does not mean that a temporal database cannot or will not include user-defined time.
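The difference between dates held as text and dates held as numbers can be illustrated with a minimal Python sketch. The two-digit-year text format and the sample dates are assumptions chosen for illustration; the sketch does not reproduce the Oracle behavior mentioned above.

```python
from datetime import date

# Hypothetical birthdates stored as text with two-digit years (MM/DD/YY).
as_text = ["12/31/99", "01/15/00"]

# Text is compared character by character, so the year-2000 date
# sorts before the 1999 date.
print(sorted(as_text))  # ['01/15/00', '12/31/99']

# Storing each date as a day number restores chronological order.
as_numbers = [date(1999, 12, 31).toordinal(), date(2000, 1, 15).toordinal()]
print(sorted(as_numbers) == as_numbers)  # True

# Numbers also support arithmetic, such as the days between two dates.
print(as_numbers[1] - as_numbers[0])  # 15
```

Once dates are numbers, range and difference queries reduce to ordinary comparisons and subtraction.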

TEMPORAL DATABASE CHARACTERISTICS

There are several types of time in a temporal database. Before describing these types in detail, it is important to clarify the precision of the time stored in a temporal database. This precision is called the granularity of time (Howe, 1997). Examples of progressively finer granularity are a day, an hour, a second, and a nanosecond.
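Granularity can be sketched as truncation of a time stamp to a chosen unit. The function name `truncate` and the sample values are hypothetical; Python's `datetime` resolves only to microseconds, so the nanosecond level is not shown.

```python
from datetime import datetime

def truncate(ts: datetime, granularity: str) -> datetime:
    """Drop every component of ts finer than the chosen granularity."""
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "second":
        return ts.replace(microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")

morning = datetime(2005, 8, 23, 9, 0)
afternoon = datetime(2005, 8, 23, 14, 37, 52, 123456)

# The two instants coincide at day granularity but differ at hour granularity.
same_day = truncate(morning, "day") == truncate(afternoon, "day")
same_hour = truncate(morning, "hour") == truncate(afternoon, "hour")
print(same_day, same_hour)  # True False
```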

It is important to understand three major concepts to grasp the precise nature of a temporal database. The first is the concept of valid time. The second concept is transaction time, the actual time recorded in the database when the data is entered. Time stamps can include either the date alone or the date and clock time. An object is an entity that has a well-defined role in the application domain, and its features include state, behavior, and identity. An employee is a good example of an object. In a classical database, once a change is made to an employee’s record, the original data is discarded and replaced by new data. In a temporal database that supports transaction time, however, a time stamp can be attached to both the old data and the new data for that employee. In so doing, the database can store both the old data and the new data for the same object. When an employee’s salary is increased on a certain date, for example, both the old and the new salary are kept, each with its own time stamp. It is important to note here that transaction time values, or time stamps, cannot be later than the current point in time, nor can they be changed, just as the past cannot be changed.
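The behavior of transaction time described above can be sketched as an append-only store. The class name `TransactionTimeTable`, its methods, and the salary figures are hypothetical; the `now` parameter exists only so that the example is repeatable.

```python
from datetime import datetime

class TransactionTimeTable:
    """Append-only store: an update adds a stamped version and overwrites nothing."""

    def __init__(self):
        self._versions = {}  # key -> list of (transaction_time, value)

    def record(self, key, value, now=None):
        stamp = now if now is not None else datetime.now()
        # A transaction time may not lie in the future.
        if stamp > datetime.now():
            raise ValueError("transaction time cannot be later than the present")
        history = self._versions.setdefault(key, [])
        # Stamps only move forward; earlier entries are never edited.
        if history and stamp < history[-1][0]:
            raise ValueError("transaction time cannot precede an earlier stamp")
        history.append((stamp, value))

    def as_of(self, key, when):
        """Return the value the database held for key at time `when`."""
        current = None
        for stamp, value in self._versions.get(key, []):
            if stamp <= when:
                current = value
        return current

salaries = TransactionTimeTable()
salaries.record("emp-1", 50000, now=datetime(2005, 1, 3, 9, 0))
salaries.record("emp-1", 55000, now=datetime(2005, 6, 1, 9, 0))
print(salaries.as_of("emp-1", datetime(2005, 3, 1)))  # 50000
print(salaries.as_of("emp-1", datetime(2005, 7, 1)))  # 55000
```

Because old versions are retained, the state of the database at any past moment can be reconstructed.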

Another major type of time in a temporal database is termed valid time. Valid time is the actual or real-world clock time at which the data is valid. Continuing with the employee example, while transaction time is the point in time at which the data is entered into the database, valid time is the point at which the entered data becomes true or takes effect. For instance, on January 3, an employee is notified that an increase in salary will be effective February 1. The Human Resources department, after being notified of the employee’s raise, must enter the new salary into the database. Presumably, they will enter the data before the raise goes into effect. The actual time at which they enter the raise into the database is prior to February 1, and that time will be the time stamp for the transaction time. The data, however, is not yet valid, and for the rest of January the employee will continue to receive the current salary. On February 1, the raise data becomes valid. Thus, February 1 is the valid time. Conversely, if an employee receives a raise that is retroactive, the transaction time may be later than the valid time.
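The salary example can be sketched as rows that carry a valid-time start alongside the date of entry. The year, the amounts, and the entry date of January 10 are assumptions added for illustration.

```python
from datetime import date

# Each row: (salary, valid_from, entered_on). Year and amounts are assumed.
rows = [
    (50000, date(2005, 1, 1), date(2005, 1, 1)),
    (55000, date(2005, 2, 1), date(2005, 1, 10)),  # entered before it takes effect
]

def salary_valid_on(rows, day):
    """Return the salary whose valid time most recently began on or before day."""
    effective = [r for r in rows if r[1] <= day]
    return max(effective, key=lambda r: r[1])[0] if effective else None

print(salary_valid_on(rows, date(2005, 1, 20)))  # 50000
print(salary_valid_on(rows, date(2005, 2, 1)))   # 55000

# The raise was recorded ahead of time: transaction time precedes valid time.
entered_early = rows[1][2] < rows[1][1]
print(entered_early)  # True
```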

The two major types of time unique to the temporal database, valid time and transaction time, allow for the possibility of three forms of temporal databases: historical, rollback, and bi-temporal (Steiner, 2003). A historical database supports valid time, but not transaction time. To reiterate, historical database is a poor choice as an alternative term for the temporal database, because a historical database is but one type of temporal database. A historical database would also be a poor choice for anyone wishing to deploy a temporal database, as explained below. The second form of temporal database is the rollback database, which is the opposite of the historical database: it supports only transaction time and not valid time. Unlike the historical database, the rollback database is quite useful for data recovery after database failure. The reason, then, that a historical database would rarely be desirable is that it could not support rollback after DBMS failure. Rollback is also necessary if the database does not use the locking technique to ensure data security. Most databases on the market today support at least some rollback features.

In reality, a temporal database is a bi-temporal database. This database supports both types of time that are necessary for storing and querying time-varying data. The bi-temporal database could aid significantly in knowledge discovery, since it is able to fully support the time dimension on three levels: the DBMS level with transaction time, the data level with valid time, and the user level with user-defined time.
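A bi-temporal lookup can be sketched by filtering on both kinds of time at once. The function name `salary`, the tuple layout, and the retroactive correction row are hypothetical.

```python
from datetime import date

# Each fact: (salary, valid_from, recorded_on), i.e. valid time plus transaction time.
facts = [
    (50000, date(2005, 1, 1), date(2005, 1, 1)),
    (55000, date(2005, 2, 1), date(2005, 1, 10)),
    (56000, date(2005, 2, 1), date(2005, 3, 15)),  # retroactive correction
]

def salary(facts, valid_on, known_on):
    """Salary valid on `valid_on`, according to what had been recorded by `known_on`."""
    visible = [f for f in facts if f[2] <= known_on and f[1] <= valid_on]
    if not visible:
        return None
    # Prefer the latest valid_from; among ties, the most recently recorded fact.
    return max(visible, key=lambda f: (f[1], f[2]))[0]

print(salary(facts, date(2005, 2, 15), date(2005, 2, 15)))  # 55000
print(salary(facts, date(2005, 2, 15), date(2005, 4, 1)))   # 56000
print(salary(facts, date(2005, 1, 20), date(2005, 4, 1)))   # 50000
```

The first two calls ask about the same real-world day but give different answers, because the database's knowledge changed between the two transaction times.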

Temporal Databases

DATATYPES IN A TEMPORAL DATABASE

A temporal database supports three major data types: temporal, static, and snapshot data. The distinction should be transparent, since the user need not know whether the data used is temporal, static, or snapshot. One goal of a temporal database should be the seamless handling of temporal, static, and snapshot data (Gadia & Nair, 1993).

The temporal datatype is the most important datatype for the temporal database, for the simple reason that it provides the foundation for building the temporal database. The temporal datatype has been defined as “a finite union of intervals” (Gadia & Nair, 1993). This datatype is termed a temporal element. The temporal element is necessary for using time-varying criteria to perform ad hoc queries. Later in this article, an example of a temporal query language is presented, which uses the temporal element in a new temporal clause that is added to the existing SQL syntax.
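A temporal element, understood as a finite union of intervals, can be sketched as a list of half-open intervals kept in merged form. The function names and the use of plain integers as time points are assumptions made for brevity.

```python
def normalize(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def union(a, b):
    """Union of two temporal elements, returned in normalized form."""
    return normalize(a + b)

def contains(element, instant):
    """True if the instant falls inside any interval of the element."""
    return any(start <= instant < end for start, end in element)

# Time points are integers here (for example, day numbers).
employed = [(1, 10), (20, 30)]
on_project = [(8, 22)]
print(union(employed, on_project))  # [(1, 30)]
print(contains(employed, 15))       # False
```

Keeping elements normalized means that two elements covering the same instants always have the same representation.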

The static datatype is defined as “a constant defined over the whole universe of time” (Gadia & Nair, 1993). In other words, a static value is valid at any future time. In contrast, a temporal datatype value is valid for a specified time period or interval, and a snapshot datatype value is valid only for the current instant. Another way of expressing the validity of static data is to say that its domain is any finite point in the future. Without a finite period of validity, the static datatype does not need a time stamp.
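The three validity rules can be contrasted in a small sketch. The `Value` class, the `kind` labels, and the integer time points are hypothetical and serve only to restate the definitions above in executable form.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Value:
    data: object
    kind: str                    # "static", "temporal", or "snapshot"
    start: Optional[int] = None  # only temporal values carry a period
    end: Optional[int] = None

def is_valid(value: Value, instant: int, now: int) -> bool:
    """Apply the validity rule that matches the value's datatype."""
    if value.kind == "static":
        return True                                  # no time stamp needed
    if value.kind == "temporal":
        return value.start <= instant < value.end    # valid within its interval
    if value.kind == "snapshot":
        return instant == now                        # valid only at the current instant
    raise ValueError(f"unknown datatype: {value.kind}")

birthplace = Value("Lisboa", "static")
salary = Value(55000, "temporal", start=10, end=20)
balance = Value(1200, "snapshot")

print(is_valid(birthplace, 5, now=100))  # True
print(is_valid(salary, 25, now=100))     # False
print(is_valid(balance, 99, now=100))    # False
```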

The final major datatype is termed snapshot data. In a conventional or classical database, all data is of the snapshot datatype (Gadia & Nair, 1993; Navathe & Ahmed, 1993; Ozsoyoglu & Snodgrass, 1995). When tables are updated, the new values replace the old values, which are discarded, and a new snapshot is created. Unfortunately, with the snapshot datatype, the history of the changes to the data is lost. Because a business needs the history of its data, a database that supports only the snapshot datatype is unacceptable, which demonstrates the need for the temporal database. A reason to retain the snapshot datatype nonetheless is to provide a smooth transition from a classical database to a temporal database. Since all the data in a classical database is of the snapshot datatype, migration from a classical database to a temporal database should be seamless (Gadia & Nair, 1993).

QUERY CAPABILITIES FOR THE TEMPORAL DATABASE

One of the most important reasons for having a database that supports the temporal dimension is the ability to perform ad hoc queries about the data. The current standard for conventional (relational) databases is Structured Query Language (SQL). SQL has become the industry standard for Relational Database Management Systems (RDBMSs) because of its ease of use, which stems from its English-like syntax. The addition of the temporal dimension, however, greatly increases the complexity of queries on temporal data. With the additional element of time, SQL in its current form is no longer able to process ad hoc queries as was formerly done using a classical (relational) database. A new query language or an extension to SQL is necessary. Understandably, one of the biggest areas of research today in the field of temporal databases is focused on this topic. A few remarks on the topic are provided here, and one example of a query language extension is given in the following discussion.
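The added complexity can be seen in a minimal sketch that emulates a time-varying condition in plain SQL, run here through Python's built-in `sqlite3` module. The table, its column names, and the sentinel end date `9999-12-31` are assumptions; this is ordinary SQL with hand-written period columns, not a temporal extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary (emp TEXT, amount INTEGER, valid_from TEXT, valid_to TEXT)"
)
# ISO 8601 date strings sort chronologically, so text comparison is safe here.
conn.executemany(
    "INSERT INTO salary VALUES (?, ?, ?, ?)",
    [
        ("emp-1", 50000, "2005-01-01", "2005-02-01"),
        ("emp-1", 55000, "2005-02-01", "9999-12-31"),
    ],
)

# A query with no time criteria needs only SELECT and FROM.
all_rows = conn.execute("SELECT amount FROM salary ORDER BY valid_from").fetchall()

# A time-varying condition must be spelled out as period predicates in WHERE.
on_date = "2005-01-20"
row = conn.execute(
    "SELECT amount FROM salary WHERE emp = ? AND valid_from <= ? AND ? < valid_to",
    ("emp-1", on_date, on_date),
).fetchone()
print(all_rows, row)  # [(50000,), (55000,)] (50000,)
```

Every temporal query has to repeat the period predicates, and nothing in the schema prevents overlapping periods for the same employee; a temporal extension would take over both burdens.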

Any new temporal query language should support current SQL capabilities seamlessly. The most promising proposals for a query language which supports the time dimension and also continues to allow users to perform queries without specifying any time criteria (thus avoiding high retraining costs, among other things) are not really new query languages, per se, but are simply extensions of SQL. One of the most promising examples of temporal-capable SQL is called, appropriately, TSQL for Temporal Structured Query Language. With TSQL, ad-hoc queries can be performed without specifying any time-varying criteria. Thus, the only essential clauses that are necessary to perform a query in TSQL remain the same as in SQL. These are the SELECT and FROM clauses. If desired, a new criteria clause can be added to current SQL criteria clauses, which are WHERE, GROUP BY, HAVING, and ORDER BY. The new criteria clause would be either WHEN or WHILE. Either name captures the idea of a time-varying condition. Böhlen, Jensen, and Snodgrass (2000) advocate a different approach to articulating a set of requirements that directly implies the syntactic structure and core semantics of a temporal extension of an (arbitrary) nontemporal query language. This extended language, termed ATSQL, is formally defined via a denotational-semantics-style mapping of tempo-
