
Rivero L.Encyclopedia of database technologies and applications.2006
RDF Model Theory: The RDF Model theory formally defines the interpretation of an RDF model using the notion of logical implication. The specification also defines a set of inference rules for computing implied statements.
RDF Schema: An RDF schema is an explicit representation of the conceptual model underlying an RDF model. The schema is represented in RDF using a set of properties with a standardized interpretation and effect on the interpretation of the model.
Query Processing for RDF Data
Resource Description Framework (RDF): The Resource Description Framework (RDF) is an XML-based language for defining metadata for Web resources and the relations between them. It is a W3C recommendation and serves as a basis for representing information and knowledge on the Semantic Web.
Statement: A statement is the basic structure found in an RDF specification. It consists of a subject, which is a Web resource represented by a unique identifier, and a predicate, which is a property that links the subject to an object which is either also a Web resource or an atomic value.

Query Processing in Spatial Databases¹
Antonio Corral
University of Almeria, Spain
Michael Vassilakopoulos
TEI of Thessaloniki, Greece
INTRODUCTION
Spatial data management has been an active area of intensive research for more than two decades. In order to support spatial objects in a database system, several important issues must be taken into account such as spatial data models, indexing mechanisms, and efficient query processing. A spatial database system (SDBS) is a database system that offers spatial data types in its data model and query language and supports spatial data types in its implementation, providing at least spatial indexing and efficient spatial query processing (Güting, 1994).
The active study of spatial database management systems (SDBMS) has been driven mainly by the needs of existing applications, such as geographical information systems (GIS), computer-aided design (CAD), very large-scale integration (VLSI) design, multimedia information systems (MIS), data warehousing, and so forth.
Some of the most important companies in the commercial database industry (Oracle, Informix, Autodesk, etc.) have products specifically designed to manage spatial data. Moreover, research prototypes such as Postgres and Paradise offer the possibility to handle spatial data. The main functionality provided by these products includes a set of spatial data types such as the point, line, polygon, and region, and a set of spatial operations, including intersection, enclosure, and distance. Performance enhancements for these operations include spatial access methods and query algorithms over such indexes (e.g., spatial range queries, spatial joins, etc.). We must also cite the Open Geographic Information Systems (OGIS) consortium (http://www.opengis.org/), which has developed a standard set of spatial data types and operations, and SQL3/SQL99, which is an object-relational query language that supports the use of spatial types and operations.
In a spatial database system, the queries are usually expressed in a high-level declarative language such as SQL; therefore, specialized database software has to map the query into a sequence of spatial operations supported by spatial access methods (Shekhar & Chawla, 2003).
Spatial query processing refers to the sequence of steps that an SDBMS will initiate to execute a given spatial query. The main target of query processing in the database field is to process the query accurately and quickly (consuming the minimum amount of time and resources on the computer), by using both efficient representations and efficient search algorithms (Graefe, 1993). Query processing in a spatial environment focuses on the design of efficient algorithms for spatial operators (e.g., selection operations, spatial joins, distance-based queries, etc.). These algorithms are both CPU and I/O intensive, in contrast to the common assumption in traditional databases that I/O cost dominates CPU cost (expensive distance-based queries being an exception) and that an efficient algorithm is therefore one that minimizes the number of disk accesses.
BACKGROUND IN SPATIAL QUERIES AND PROCESSING
From the query processing point of view, the following three properties characterize the differences between spatial and relational databases (Brinkhoff, Kriegel & Seeger, 1993): (1) unlike relational databases (Elmasri & Navathe, 2000), spatial databases do not have a fixed set of operators that serve as building blocks for query evaluation; (2) spatial databases deal with extremely large volumes of complex objects, which have spatial extensions and cannot be sorted in a one-dimensional array; (3) computationally expensive algorithms are required to test the spatial operators, and the assumption that I/O costs dominate CPU costs is no longer valid.
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
We generally assume that the given spatial objects are embedded in d-dimensional Euclidean space (E^d). An object obj in a spatial database is usually defined by several non-spatial attributes and one attribute of some spatial data type (point, line, polygon, region, etc.). This spatial attribute describes the geometry of the object, obj.G ⊆ E^d, that is, the location, shape, orientation, and size of the object. The most representative spatial operations, which are the basis for the query processing in spatial databases, are (1) update operations; (2) selection operations (point and range queries); (3) spatial join; and (4) spatial aggregate queries (Gaede & Günther, 1998; Shekhar & Chawla, 2003).
• Update Operations: Standard database operations such as modify, create, and so forth.
• Point Query (PQ): Given a query point p ∈ E^d, find all spatial objects O that contain it.
• Range Query (RQ): Given a query polygon P, find all spatial objects O that intersect P. When the query polygon is a rectangle, this is called a window query.
• Spatial Join Query (SJQ): Given two collections R and S of spatial objects and a spatial predicate θ, find all pairs of objects (O, O’) ∈ R × S (O ∈ R and O’ ∈ S), where θ(O.G, O’.G) evaluates to true. Some examples of the spatial predicate θ are intersects, contains, is_enclosed_by, distance, northwest, adjacent, meets, and so on. For spatial predicates such as contains, encloses, or adjacent, for example, the intersection join is an efficient filter that yields a set of candidate solutions typically much smaller than the Cartesian product R × S. An extension of the intersection join is the multiway spatial join, which involves an arbitrary number of spatial inputs (Mamoulis & Papadias, 2001). Several interesting distance join queries are currently being studied, for example, the closest pairs query (Corral et al., 2000), buffer query (Chan, 2003), nearest neighbors join (Böhm & Krebs, 2002), iceberg queries (Shou et al., 2003), distance join queries of multiple inputs (Corral et al., 2003), and so on.
• Spatial Aggregate Queries (SAQ): This kind of spatial query involves specifying a region of space and asking for the value of some aggregate function (e.g., count, sum, min, max, average) for which we have measurements for this given region (Papadias et al., 2001). Spatial aggregates are usually variants of the nearest neighbor problem (Shekhar & Chawla, 2003). The Nearest Neighbor Query (NNQ) has the form: given a spatial object O’, find all spatial objects O having a minimum distance from O’. The distance between extended spatial data objects is usually defined as the distance between their closest points (common distance functions for points include the Euclidean and the Manhattan distance). An interesting variant of NNQ is the reverse nearest neighbor query (RNNQ), which reports the points that have the query point as their nearest neighbor (Korn & Muthukrishnan, 2000).
The spatial queries are often processed using filter and refine techniques to minimize both the CPU and I/O cost (Brinkhoff et al., 1994). Approximate geometry such as the minimal orthogonal bounding rectangle (MBR) of an extended spatial object is first used to filter out many irrelevant objects quickly. An MBR is characterized by the min and max points of a hyper-rectangle with faces parallel to the coordinate axes. Using the MBR instead of the exact geometrical representation of the spatial object, its representational complexity is reduced to two points, where the most important object features (position and extension) are maintained. The R-tree (Guttman, 1984) is a height-balanced spatial access method that represents the spatial objects by their MBRs. Therefore, in the filter step, many candidates are eliminated using the spatial predicate and the MBRs of the spatial objects. In the refinement step, the exact geometry of each spatial object from the candidate set (the result of the filter step) and the exact spatial predicate are examined. This step usually requires the use of CPU-intensive algorithms, such as computational geometry algorithms for spatial operations (Rigaux, Scholl & Voisard, 2001). Strategies for range queries include a scan and an index search in conjunction with the plane-sweep algorithm (Brinkhoff et al., 1993). Strategies for the spatial join include the nested loop; tree matching (Brinkhoff et al., 1993; Huang, Jing & Rundensteiner, 1997), when indices are present on all participating inputs; and space partitioning (Arge et al., 1998; Lo & Ravishankar, 1996; Patel & DeWitt, 1996), in the absence of indexes. For the case when one spatial input is indexed, the most representative join strategies have been proposed by Lo & Ravishankar (1994) and Mamoulis & Papadias (2003).
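The two steps of the filter-refine paradigm can be illustrated with a minimal sketch of a window query. Everything named here is an assumption made for illustration: the Rect and Disc classes, the choice of discs as the exact geometry, and the sample data. A sequential scan stands in for the spatial access method, so the sketch shows only how MBRs produce candidates and how the exact test removes false hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-parallel rectangle given by its min and max corner points."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass(frozen=True)
class Disc:
    """Hypothetical spatial object whose exact geometry is a closed disc."""
    name: str
    cx: float
    cy: float
    r: float

    def mbr(self) -> Rect:
        return Rect(self.cx - self.r, self.cy - self.r,
                    self.cx + self.r, self.cy + self.r)

    def intersects_rect(self, w: Rect) -> bool:
        # Exact test: squared distance from the centre to the nearest
        # point of the window must not exceed the squared radius.
        nx = min(max(self.cx, w.xmin), w.xmax)
        ny = min(max(self.cy, w.ymin), w.ymax)
        return (self.cx - nx) ** 2 + (self.cy - ny) ** 2 <= self.r ** 2


def window_query(objects, window):
    # Filter step: cheap MBR test yields a superset of the answers.
    candidates = [o for o in objects if o.mbr().intersects(window)]
    # Refinement step: exact geometry eliminates the false hits.
    answers = [o for o in candidates if o.intersects_rect(window)]
    return candidates, answers


objects = [
    Disc("a", 2.0, 2.0, 1.0),   # inside the window
    Disc("b", 5.8, 5.8, 1.0),   # MBR touches the window corner, disc does not
    Disc("c", 9.0, 9.0, 0.5),   # far away
]
window = Rect(0.0, 0.0, 5.0, 5.0)
candidates, answers = window_query(objects, window)
```

In this data set, object b survives the filter step but is discarded during refinement, which is exactly the kind of false hit the second step exists to remove.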
Nearest neighbor queries (NNQ) are common in many applications, for example, GIS, pattern recognition, document retrieval, and learning theory. Like the previous spatial queries, NNQ algorithms are also two-step algorithms (filter-refine paradigm), and they follow branch-and-bound techniques, using distance functions and pruning heuristics in order to reduce the search space. The most representative algorithms to perform NNQ over spatial data have been proposed by Roussopoulos, Kelley, and Vincent (1995) and Hjaltason and Samet (1999) on R-trees. The first query algorithm follows a depth-first traversal, whereas the second one is an incremental algorithm following a best-first search on the R-tree. These algorithms can be extended to find K-nearest neighbors by slight modification of the pruning rules to retain the K best candidates.
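A best-first search of this kind can be sketched as follows. This is a simplified illustration, not the published algorithm: the tree is a small hand-built hierarchy of MBRs standing in for an R-tree, the dictionary-based node layout and the helper names (mindist, leaf, inner, k_nearest) are assumptions, and data objects are plain points. A priority queue ordered by the minimum possible distance to the query point decides what is visited next.

```python
import heapq
import itertools
import math


def mindist(q, mbr):
    """Smallest Euclidean distance from point q to (xmin, ymin, xmax, ymax)."""
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def leaf(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {"mbr": (min(xs), min(ys), max(xs), max(ys)), "points": points}


def inner(children):
    boxes = [c["mbr"] for c in children]
    mbr = (min(b[0] for b in boxes), min(b[1] for b in boxes),
           max(b[2] for b in boxes), max(b[3] for b in boxes))
    return {"mbr": mbr, "children": children}


def k_nearest(root, q, k):
    """Report the k points closest to q in increasing order of distance."""
    tie = itertools.count()          # tie-breaker so heap never compares items
    heap = [(mindist(q, root["mbr"]), next(tie), root)]
    result = []
    while heap and len(result) < k:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, tuple):              # a data point: next neighbor
            result.append((item, dist))
        elif "points" in item:                   # leaf node: enqueue its points
            for p in item["points"]:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:                                    # inner node: enqueue children
            for child in item["children"]:
                heapq.heappush(heap, (mindist(q, child["mbr"]), next(tie), child))
    return result


root = inner([
    leaf([(1, 1), (2, 3)]),
    leaf([(8, 8), (9, 6)]),
    leaf([(4, 7), (5, 5)]),
])
neighbors = k_nearest(root, (4, 4), 2)
```

Because a node is expanded only when its MBR is the closest remaining entry in the queue, the leaf holding (8, 8) and (9, 6) is never opened for this query; that is the pruning effect the branch-and-bound description refers to.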
PERSPECTIVE AND NEW IMPORTANT ISSUES
We have reviewed the most representative spatial queries, using mainly the overlap predicate for range queries and spatial join queries. However, there is a need to develop and evaluate query strategies for many other frequent spatial queries that can be demanded by the users in a spatial database system. Table 1 summarizes some of these new spatial queries, which include queries on objects using predicates other than overlap and queries on fields such as slope analysis as well as queries on networks, and so forth (Shekhar et al., 1999).
Table 1. A list of new spatial queries
Buffer | Find the areas 500 meters from power lines
Voronoize | Classify households as to which supermarket they are closest to
Neighborhood | Determine slope based on elevation
Network | Find the shortest path from the warehouse to all delivery stops
Allocation | Where is the best place to build a new restaurant?
Transformation | Triangulate a layer based on elevation
Ranking | Find the top-k hotels with the largest number of nearby restaurants
Chromatic | Find the type of monument nearest to the Eiffel Tower
Aggregate | Find the total number of (restaurant, hotel) pairs that are within 1 km from each other
Multi-way | Find all cities within 100 km of Madrid crossed by a river which intersects an industrial area
The ever-increasing demand and easy availability of the Internet have prompted the development of Web-based Geographic Information Systems (WGIS) for easy sharing of spatial data and executing spatial queries over the Internet using Web environments like Web servers, Web clients, common data formats (HTML, XML), common communication protocols (HTTP), uniform resource locators (URL), and so forth. In a WGIS architecture (Shekhar & Chawla, 2003), the GeoSpatial Database Access Layer (GSDAL) allows access to the spatial data using an SDBMS, where efficient query processing is required. In order to improve Web-Based Spatial Database Systems (WSDBS), new important issues for SDBMSs and Web technology need to be addressed. Examples include offering SDBMS services on the Web, evaluating and improving the Web for SDBMS clients and servers, using Web data formats for spatial data (e.g., VMS and GML), employing safe communication protocols, allocating adequate search facilities on the Web, developing compatibility between several WSDBS, improving maintenance and integrity of data, and so on.
FUTURE TRENDS
Most of the spatial query algorithms have been studied over point data sets in the 2-dimensional space, and a natural extension is to study their application on data sets containing objects with spatial extent (lines, polygons, regions, 3D objects, etc.). In order to carry out this work, geometric algorithms (based on computational geometry) have to be designed and analyzed for spatial operators using these complex spatial objects (Rigaux et al., 2001).
Many open research areas exist at the logical level of query processing, including query-cost modeling and queries related to spatial networks. Cost models are used to rank and select the promising processing strategies, given a spatial query and a spatial data set. Traditional cost models may not be accurate in estimating the cost of strategies for spatial operations, mainly due to the distance metric. Cost models are needed to estimate the selectivity of spatial search and join operations, so that the execution costs of alternative processing strategies for spatial operations can be compared during query optimization. Preliminary work has been done in the context of the R-tree, using fractal models for NNQ (Belussi & Faloutsos, 1998), the concept of the Minkowski sum for RQ and NNQ (Böhm, 2000), and the concept of density of a rectangle set for RQ and SJQ (Theodoridis, Stefanakis & Sellis, 2000), but more work is needed.
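To give a feel for what a selectivity estimate looks like, the following sketch applies the Minkowski-sum idea in its simplest textbook form. It is not the model of any of the works cited above; it assumes a unit-square workspace, uniformly distributed data rectangles of a given average extent, and no boundary effects. Under those assumptions, a data rectangle intersects the query window exactly when its centre falls inside the window enlarged by the rectangle's extents. The function names and the sample numbers are illustrative only.

```python
def window_selectivity(avg_w, avg_h, win_w, win_h):
    """Estimated fraction of data rectangles intersecting a query window.

    Assumes a unit-square workspace, uniformly distributed rectangle
    centres, and no boundary effects (all simplifying assumptions).
    """
    # The Minkowski sum of window and data rectangle has these side lengths.
    return min(1.0, avg_w + win_w) * min(1.0, avg_h + win_h)


def expected_answers(n, avg_w, avg_h, win_w, win_h):
    """Expected number of qualifying rectangles among n data rectangles."""
    return n * window_selectivity(avg_w, avg_h, win_w, win_h)


# 10,000 data rectangles of average size 0.01 x 0.01, window of 0.1 x 0.1.
estimate = expected_answers(10_000, 0.01, 0.01, 0.1, 0.1)
```

With these numbers the estimate is about 121 qualifying objects. An optimizer could compare such an estimate against the cost of a full scan when choosing between an index search and a sequential strategy.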
Spatial network databases are an important component of SDBS, since they are the kernel of many important real-life applications such as transportation planning, air traffic control, urban management, electric, telephone and gas networks, river transportation, and so forth. Previous efforts in this research area have focused on disk-based graph representations such as the connectivity clustered access method (CCAM) (Shekhar & Chawla, 2003), nearest neighbor queries in road networks by transforming the problem to a high-dimensional space (Shahabi, Kolahdouzan & Sharifzadeh, 2002), and a flexible architecture that integrates network representation (preserving connectivity and locations) and Euclidean restriction for processing the most common spatial queries (range search, nearest neighbors, spatial joins, and closest pairs) (Papadias et al., 2003). Interesting research has been carried out in this field, but much more work is needed, mainly in measuring the I/O cost for network operations in new queries (e.g., find the two nearest gas stations, in terms of network distance, along the route from Madrid to Barcelona), or in moving-object environments (e.g., find the closest taxi to our present location), and so on.
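The difference between Euclidean and network distance can be made concrete with a small sketch of a nearest neighbor query over a road graph. It uses plain Dijkstra expansion from the query node and stops as soon as K sites have been reached; this is a generic in-memory technique, not the disk-based or embedding-based methods cited above. The graph, the node names, and the function network_knn are hypothetical.

```python
import heapq


def network_knn(graph, source, sites, k):
    """Return the k sites closest to source by network (shortest-path) distance.

    graph maps a node to a list of (neighbor, edge_length) pairs.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    found = []
    while heap and len(found) < k:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue                      # stale queue entry
        settled.add(node)
        if node in sites:
            found.append((node, d))       # sites are settled in distance order
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < best.get(nbr, float("inf")):
                best[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return found


# Hypothetical undirected road network; g1, g2, g3 are gas stations.
roads = {
    "q":  [("a", 2), ("b", 5)],
    "a":  [("q", 2), ("b", 1), ("g1", 6)],
    "b":  [("q", 5), ("a", 1), ("g2", 2)],
    "g1": [("a", 6)],
    "g2": [("b", 2), ("g3", 4)],
    "g3": [("g2", 4)],
}
nearest_stations = network_knn(roads, "q", {"g1", "g2", "g3"}, 2)
```

The search expands the network outward from the query location, so the amount of graph touched grows with the distance to the K-th site; on disk-resident networks this expansion is what drives the I/O cost mentioned in the text.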
CONCLUSION
Spatial query processing refers to the sequence of steps that an SDBMS will initiate to execute a given spatial query. The main target of query processing in the database field is to process the query accurately and quickly by using both efficient representations and efficient search algorithms. Query processing in a spatial environment focuses on the design of efficient algorithms for spatial operators (e.g., selection operations, spatial joins, etc.). Spatial query operations can be classified into four groups: point, range, spatial join, and spatial aggregate. For spatial query processing, the filter-refine paradigm is used over spatial access methods to minimize both the CPU and I/O cost. Future research trends include the study of new spatial queries (especially on spatial networks), the study of issues related to Web-Based Spatial Database Systems, and work on cost models for estimating the selectivity of spatial queries.
REFERENCES
Arge, L., Procopiuc, O., Ramaswamy, S., Suel, T., & Vitter, J.S. (1998). Scalable sweeping-based spatial join. Proceedings of the 24th International Conference on Very Large Data Bases (VLDB 1998), New York, August 24-27.
Belussi, A., & Faloutsos, C. (1998). Self-spacial join selectivity estimation using fractal concepts. ACM Transactions on Information Systems, 16(2), 161-201.
Böhm, C. (2000). A cost model for query processing in high dimensional data spaces. ACM Transactions on Database Systems, 25(2), 129-178.
Böhm, C., & Krebs, F. (2002). High performance data mining using the nearest neighbor join. Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, December 9-12.
Brinkhoff, T., Kriegel, H.P., Schneider, R., & Seeger, B. (1994). Multi-step processing of spatial joins. Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD 1994), Minneapolis, Minnesota, May 24-27.
Brinkhoff, T., Kriegel, H.P., & Seeger, B. (1993). Efficient processing of spatial joins using R-Trees. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD 1993), Washington, DC, May 26-28.
Chan, E.P.F. (2003). Buffer queries. IEEE Transactions on Knowledge and Data Engineering, 15(4), 895-910.
Corral, A., Manolopoulos, Y., Theodoridis, Y., & Vassilakopoulos, M. (2000). Closest pair queries in spatial databases. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), Dallas, Texas, May 16-18.
Corral, A., Manolopoulos, Y., Theodoridis, Y., & Vassilakopoulos, M. (2003). Distance join queries of multiple inputs in spatial databases. Proceedings of Advances in Databases and Information Systems (ADBIS 2003), Dresden, Germany, September 3-6.
Elmasri, R., & Navathe, S. (2000). Fundamentals of database systems. Addison-Wesley/Benjamin Cummings.
Gaede, V., & Günther, O. (1998). Multidimensional access methods. ACM Computing Surveys, 30(2), 170-231.
Graefe, G. (1993). Query evaluation techniques for large databases. ACM Computing Surveys, 25(2), 73-170.
Güting, R. (1994). An introduction to spatial database systems. VLDB Journal, 3(4), 357-399.
Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD 1984), Boston, June 18-21.
Hjaltason, G.R., & Samet, H. (1999). Distance browsing in spatial databases. ACM Transactions on Database Systems, 24(2), 265-318.
Huang, Y.W., Jing, N., & Rundensteiner, E.A. (1997). Spatial joins using R-trees: Breadth-first traversal with global optimizations. Proceedings of 23rd International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece, August 25-29.
Korn, F., & Muthukrishnan, S. (2000). Influence sets based on reverse nearest neighbor queries. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), Dallas, Texas, May 16-18.
Lo, M.L., & Ravishankar, C.V. (1994). Spatial joins using seeded trees. Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD 1994), Minneapolis, Minnesota, May 24-27.
Lo, M.L., & Ravishankar, C.V. (1996). Spatial hash-joins. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD 1996), Montreal, Quebec, Canada, June 4-6.
Mamoulis, N., & Papadias, D. (2001). Multiway spatial joins. ACM Transactions on Database Systems, 26(4), 424-475.
Mamoulis, N., & Papadias, D. (2003). Slot index spatial join. IEEE Transactions on Knowledge and Data Engineering, 15(1), 211-231.
Papadias, D., Kalnis, P., Zhang, J., & Tao, Y. (2001). Efficient OLAP operations in spatial data warehouses. Proceedings of the Symposium on Spatial and Temporal Databases (SSTD 2001), Redondo Beach, California, July 12-17.
Papadias, D., Zhang, J., Mamoulis, N., & Tao, Y. (2003). Query processing in spatial network databases. Proceedings of the Very Large Data Bases Conference (VLDB 2003), Berlin, Germany, September 9-12.
Patel, J.M., & DeWitt, D.J. (1996). Partition based spatial-merge join. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD 1996), Montreal, Quebec, Canada, June 4-6.
Rigaux, P., Scholl, M.O., & Voisard, A. (2001). Spatial databases: With application to GIS. Morgan Kaufmann.
Roussopoulos, N., Kelley, S., & Vincent, F. (1995). Nearest neighbor queries. Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD 1995), San Jose, California, May 22-25.
Shahabi, C., Kolahdouzan, M., & Sharifzadeh, M. (2002). A road network embedding technique for k-nearest neighbor search in moving object databases. Proceedings of the 2002 ACM Symposium on Advances in Geographic Information Systems (ACM GIS 2002), McLean, Virginia, November 8-9.
Shekhar, S., & Chawla, S. (2003). Spatial databases: A tour. Prentice Hall.
Shekhar, S., Chawla, S., Ravada, S., Fetterer, A., Liu, X., & Lu, C.T. (1999). Spatial databases: Accomplishments and research needs. IEEE Transactions on Knowledge and Data Engineering, 11(1), 45-55.
Shou, Y., Mamoulis, N., Cao, H., Papadias, D., & Cheung, D.W. (2003). Evaluation of iceberg distance joins. Proceedings of the 8th International Symposium on Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, July 24-27.
Theodoridis, Y., Stefanakis, E., & Sellis, T. (2000). Efficient cost models for spatial queries using R-trees. IEEE Transactions on Knowledge and Data Engineering, 12(1), 19-32.
KEY TERMS
Buffer Query: This spatial query involves two spatial datasets and a distance threshold ∆. The answer is a set of pairs of spatial objects from the two input datasets that are within distance ∆ from each other.
Filter-Refine Paradigm: Algorithms that follow this paradigm are two-step algorithms. Filter step: an approximation of each spatial object is used to produce a set of candidates (and, possibly, a set of actual answers), which is a superset of the answer set consisting of actual answers and false hits. Refinement step: each candidate from the filter step is then examined with respect to its exact geometry in order to produce the answer set by eliminating false hits.
Iceberg Distance Join: This spatial query involves two spatial datasets, a distance threshold ∆, and a cardinality threshold K (K≥1). The answer is a set of pairs of objects from the two input datasets that are within distance ∆ from each other, provided that the first object appears at least K times in the join result.
K Closest Pairs Query: This spatial query involves two spatial datasets and a cardinality threshold K (K≥1). It discovers the K distinct pairs of objects from the two input datasets that have the K smallest distances between them.
K Nearest Neighbors Join: This spatial query involves two spatial datasets and a cardinality threshold K (K≥1). The answer is a set of pairs from the two input datasets that includes, for each of the spatial objects of the first dataset, the pairs formed with each of its K nearest neighbors in the second dataset.
Spatial Data Types: Spatial data types provide a fundamental abstraction for modeling the structure of geometric entities in space (geometry) as well as their relationships (topology), for example, points, lines, polygons, regions, and so forth. A spatial object is an object with at least one attribute of a spatial data type.
Spatial Database System (SDBS): A spatial database system is a database system that offers spatial data types in its data model and query language and supports spatial data types in its implementation, providing at least spatial indexing and efficient spatial query processing.
Spatial Operators: Spatial operators represent the spatial relationships between spatial objects. The most representative spatial relationships are: (1) topological relationships, such as adjacent, inside, disjoint, and so forth, which are invariant under topological transformations like translation, scaling, and rotation; (2) direction relationships, for example, above, below, north_of, southwest_of, and so forth; and (3) metric relationships, for example, distance < 100.
Spatial Query: It is a set of spatial conditions characterized by spatial operators that form the basis for the retrieval of spatial information from a spatial database system.
Spatial Query Processing: It focuses on extracting information from a large amount of spatial data without actually changing the spatial database. It is distinct from query optimization, which focuses on finding the query evaluation plan that minimizes the most relevant performance measure (e.g., CPU, I/O, etc.).
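The four distance-join definitions among the key terms (buffer query, iceberg distance join, K closest pairs query, and K nearest neighbors join) can be stated compactly as brute-force reference code. This is a sketch for clarifying the definitions, not an efficient algorithm: it enumerates the Cartesian product, treats objects as points, reads "within distance ∆" as "at most ∆", and breaks ties between equal distances by input order. All function names and the sample data are assumptions.

```python
import heapq
import math
from collections import Counter
from itertools import product


def buffer_query(R, S, delta):
    """All pairs (r, s) whose distance is at most delta."""
    return [(r, s) for r, s in product(R, S) if math.dist(r, s) <= delta]


def iceberg_distance_join(R, S, delta, k):
    """Buffer-query pairs whose first object occurs at least k times."""
    pairs = buffer_query(R, S, delta)
    counts = Counter(r for r, _ in pairs)
    return [(r, s) for r, s in pairs if counts[r] >= k]


def k_closest_pairs(R, S, k):
    """The k pairs of R x S with the smallest distances."""
    return heapq.nsmallest(k, product(R, S), key=lambda p: math.dist(*p))


def knn_join(R, S, k):
    """Each object of R paired with each of its k nearest neighbors in S."""
    return [(r, s)
            for r in R
            for s in heapq.nsmallest(k, S, key=lambda t: math.dist(r, t))]


R = [(0, 0), (10, 0)]
S = [(1, 0), (0, 2), (10, 1), (20, 20)]

buffer_pairs = buffer_query(R, S, 2)
iceberg_pairs = iceberg_distance_join(R, S, 2, 2)
closest_pairs = k_closest_pairs(R, S, 2)
nn_pairs = knn_join(R, S, 1)
```

On this data the buffer query returns three pairs, while the iceberg variant keeps only the two pairs of (0, 0), the sole object with at least two partners within the threshold. Note that the K closest pairs query ranks pairs globally, whereas the K nearest neighbors join ranks neighbors separately for each object of the first dataset.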
ENDNOTE
¹ Supported by the ARCHIMEDES project 2.2.14, «Management of Moving Objects and the WWW» of the Technological Educational Institute of Thessaloniki (EPEAEK II), co-funded by the Greek Ministry of Education and Religious Affairs and the European Union.

Raster Databases
Peter Baumann
International University Bremen, Germany
INTRODUCTION
Spatio-temporal data play an important role in science, both as observed natural phenomena like temperature curves or satellite imagery and as artificially generated data such as simulation results or statistical data derived from spatio-temporal phenomena. As a coarse but common classification, spatio-temporal data can be grouped into conceptually continuous and discretized data (Figure 1). The first category allows points, lines, areas, and bodies to have any coordinate value, while the latter category has all data values sitting at the crosspoints of equidistant grids. When dealing with maps, these categories are called vector and raster data, respectively, while in Computational Fluid Dynamics (CFD), for example, the terms general mesh and regular mesh are in use.
We will use the terms raster data, sampled data, discretized data, and Multidimensional Discrete Data (MDD) interchangeably, with the definition that such an object consists of a set of point/value pairs where the points fill an axis-parallel rectangular area in the Euclidean space Z^d for some dimension d ≥ 1. Obviously, this structure is equivalent to an array in the programming language sense.
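The equivalence between the point/value-pair view and the array view can be shown with a minimal sketch. The dictionary representation, the inclusive lower and upper corner points, and the function names mdd_to_array and trim are assumptions chosen for illustration; the conversion is written for the 2-D case only, while the cutout works for any dimension.

```python
from itertools import product


def mdd_to_array(cells, lo, hi):
    """Convert a 2-D MDD {(x, y): value} over the box [lo, hi] to nested lists."""
    (x0, y0), (x1, y1) = lo, hi
    expected = set(product(range(x0, x1 + 1), range(y0, y1 + 1)))
    if set(cells) != expected:
        raise ValueError("points do not fill the rectangular domain")
    return [[cells[(x, y)] for y in range(y0, y1 + 1)]
            for x in range(x0, x1 + 1)]


def trim(cells, lo, hi):
    """Select the cutout of an MDD that lies inside the box [lo, hi]."""
    return {p: v for p, v in cells.items()
            if all(l <= c <= h for c, l, h in zip(p, lo, hi))}


# A 3 x 3 MDD whose domain starts at (2, 0) rather than at the origin.
cells = {(x, y): 10 * x + y for x in range(2, 5) for y in range(0, 3)}
array = mdd_to_array(cells, (2, 0), (4, 2))

cutout = trim(cells, (3, 1), (4, 2))
cutout_array = mdd_to_array(cutout, (3, 1), (4, 2))
```

The example deliberately uses a domain that does not start at zero: unlike arrays in many programming languages, an MDD carries its own lower and upper bounds, and a cutout is again an MDD over a smaller box.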
MDD appear in a large variety of applications. Examples of multidimensional raster data are 1-D scalar measurements like temperature and radioactivity, 2-D satellite imagery spanning large seamless maps of the Earth’s surface, 3-D image time series (x/y/t) and geophysical data (x/y/z), and 4-D climate models (x/y/z/t). Figure 2 shows some of today’s most important areas; each field in turn usually has many diverse subfields, as the example of mapping/cartography shows.
The kind of services required on MDD can be summarized as fast, flexible selection on huge raster data assets.
Figure 1. Discretized raster vs. continuous spatial objects
Figure 2. Raster application fields
• Life science: pharma, chem, med/bio research, genetics
• Geo: mapping/cartography, climate modeling/oceanography, geophysics, etc.
• Management/controlling: decision support, OLAP, data warehousing, census, statistics in industry and public administration, etc.
• Engineering and science: simulation and experimental data in automotive/shipbuilding/aerospace industry, turbines, process engineering, experimental physics, astronomy, high-energy physics, etc.
• Multimedia: tele-learning, prepress, audio/image/video databases, etc.
Subfields of mapping/cartography: satellite image archives, cadastral maps, land use, disaster mitigation, mining/exploration, environmental monitoring, insurances, energy, facility management, security, forestry, agro, tourism, ...
For example, Figure 3 depicts an aerial image of Bavaria consisting of 950,000 × 1,000,000 RGB (red/green/blue) pixels; this represents a raw data volume of 2.85 Terabyte (TB). A common task on such geo-imagery is to interactively select and display the overall image or a small cutout scaled to the client’s window size (see Figure 3), something which occupies, say, 30 kB as a JPEG image. Further operations involve analysis like deriving the vegetation index from multi-band satellite images.
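The quoted raw volume can be checked with a few lines of arithmetic. The sketch assumes one byte per color channel (three bytes per RGB pixel) and decimal units (1 TB = 10^12 bytes, 1 kB = 1,000 bytes); neither assumption is stated in the text, but together they reproduce the figure of 2.85 TB.

```python
width, height = 950_000, 1_000_000     # pixels
bytes_per_pixel = 3                    # assumption: 1 byte each for R, G, B

raw_bytes = width * height * bytes_per_pixel
raw_terabytes = raw_bytes / 10**12     # assumption: decimal terabytes

cutout_bytes = 30 * 1000               # the 30 kB JPEG shipped to the client
reduction = raw_bytes / cutout_bytes   # raw volume vs. delivered volume
```

Under these assumptions the delivered cutout is smaller than the stored object by a factor of 95 million, which is why selection has to happen inside the server rather than by shipping the whole object to the client.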
Recent hardware developments make it possible to keep such large objects available for online access, and there is an increasing demand for Web services including MDD support, such as aerial/satellite image archives (see www.ceos.org, for example). Consequently, database management systems (DBMSs) will in the future have to provide storage, query, and optimization support to accommodate MDDs side by side with the traditional data types; in database speak, MDDs must become first-class citizens in the database. In the sequel, we discuss how this can be accomplished and what issues are still waiting to be solved.
Figure 3. Aerial image of Bavaria: downscaled (thumbnail) overall view (left) and zoomed cutout (right)
BACKGROUND
Differentiation to Other Fields
A related but different field where databases also handle images is Multimedia Databases. Multimedia database systems rely on content recognition techniques to extract semantic knowledge from the image beforehand and henceforth perform all querying on this semantic net leaving the imagery untouched except for displaying them unchanged. Raster databases, conversely, work on the pixel level and do not attempt to understand the contents; rather, they allow quick navigation on and selection from very large data objects.
Another related field is image processing. While the set of operations known there exceeds raster database functionality by far, data sets in imaging systems traditionally have to fit into main memory; raster databases, on the other hand, focus on data selection on objects which may well exceed main memory capacity by a factor of a thousand or more.
History and State of the Art
Work on raster databases, whether industrial or scientific, falls into two basic categories: statistics and sensor/image databases. The field of statistical databases has received a strong impetus from data warehousing and online analytical processing (OLAP), which model business data as cell values (“facts”) allocated in multidimensional spaces (“data cubes”) described by abstract dimensions (“features”).
Completely separate from this, sensor and image data have been investigated. Traditionally, images have been stored in BLOBs (binary large objects), that is, byte strings without any further semantics, introduced as “long fields” by Lorie (1982). First approaches to add more semantics include Tamura (1980), where a set of imaging functions was added to the programming interface, but not to the query language; there was no conceptual background justifying the operations chosen. A first image query language was proposed with PICDMS (Chock, Cardenas & Klinger, 1984; Joseph & Cardenas, 1988). However, many queries were dependent on the operation sequence, and no architectural support for large objects was indicated. In Vandenberg and DeWitt (1991), a general-purpose conceptual database model was extended with simple array capabilities. The quest for support of non-trivial array operations in a query language was first phrased in Buneman (1993). A conceptual raster model with a declarative, optimizable query language based on an algebraic framework was presented in Baumann (1994, 1999); this approach has been implemented in the rasdaman system (www.rasdaman.com), which is in worldwide commercial use. Other algebras, which have been implemented to a lesser extent, are those of Libkin, Machlin, and Wong (1996) and Marathe and Salem (1997, 1999).
An example of domain-specific DBMS extensions, in this case to accommodate 3-D medical imagery, is described in Arya et al. (1994). Requirements for supercomputing data management have been stated in Kleese (2000) from an application point of view.
With version 10g, Oracle (www.oracle.com) released raster support for large 2-D geographic imagery.
The main difference between statistical and sensor/image databases does not lie in the data structure (both deal with multidimensional grids), nor are the operations substantially different (an OLAP roll-up from days to weeks is mathematically close to scaling an image by a factor of 7). The essential difference lies in the sparsity, that is, the percentage of cells in the data space considered which actually carry a value. Statistical databases are sparsely populated (on average about 2%, at most 5%), while image data are usually densely populated (usually between 60% and 100%). The technology for handling such data is completely different in the two cases. However, given the far-reaching similarity between both, it seems promising to research ways toward an integrated approach.
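The two notions used in this comparison can be illustrated with a minimal Python sketch. It is not taken from any of the systems cited; the function names are hypothetical, a 1-D list stands in for a grid, and None is assumed to mark a cell that carries no value.

```python
def sparsity(cells):
    """Fraction of cells that actually carry a value (None = empty cell)."""
    filled = sum(1 for value in cells if value is not None)
    return filled / len(cells)


def roll_up(values, factor):
    """Sum consecutive groups of `factor` cells, e.g. days to weeks for factor 7."""
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]


def scale_down(values, factor):
    """Average consecutive groups of `factor` cells (1-D image downscaling).

    Assumes len(values) is a multiple of `factor`.
    """
    return [group_sum / factor for group_sum in roll_up(values, factor)]


daily_sales = [1] * 14                     # two weeks of daily facts
weekly_sales = roll_up(daily_sales, 7)     # roll-up: days -> weeks
scaled_line = scale_down(daily_sales, 7)   # same grouping, averaged instead of summed
```

Under these assumptions, the roll-up and the scaling differ only in the final division by the factor, which is the sense in which the two operations are mathematically close.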
Relevant Bodies
•SQL/MM defines database handling of imagery in the context of the SQL standard.
•The OpenGIS Consortium (OGC, www.opengis.org) standardizes interfaces for Web-based services on geographic information, among them, multidimensional raster (“coverage”) data in the Web Coverage Service (WCS) standard.
•CODATA (Committee on Data for Science and Technology, www.codata.org) is a user-driven international organization whose goal is to enhance accessibility of scientific data.
•ERCOFTAC (European Research Community On Flow, Turbulence And Combustion, www.ercoftac.org) coordinates data management research and provides sample data sets in the field of Computational Fluid Dynamics (CFD).
A SAMPLE RASTER SERVER
As an example of multidimensional raster database support, we sketch the rasdaman system, which has been implemented in the course of several European-funded research projects and has been commercialized mainly in the field of geographic image map services. The underlying formal basis, rasdaman array algebra, has been influenced by Image Algebra (Ritter, Wilson & Davidson, 1990).
The conceptual model of rasdaman centers around the notion of typed n-D arrays augmented with an object identifier (OID). Such MDD objects are maintained in collections (sets) similar to relational tables. Arrays are defined through a template marray<b,d> which is instantiated with the array base type b and the array extent (spatial domain) d, specified by the lower and upper bound for each dimension. Thus, a 2-D color image of unbounded domain can be defined as:
typedef marray
< struct{ char red, green, blue; }, [ *:*, *:* ]
> MyImage;
Raster Retrieval
The rasdaman query language, rasql, extends SQL with multidimensional imaging expressions. Like SQL, a rasql query always returns a set of items (in this case, MDD objects). The expressive power has been limited to nonrecursive operations, thereby guaranteeing termination of any well-formed query. Below we provide a brief overview of rasql.
•Subsetting: This includes trimming (rectangular cutouts) and section (extraction of lower-dimensional sub-arrays).
Example 1: The following query retrieves a 2000x3000 cutout from every Landsat image in LandsatCollection:
SELECT c[ 1001:3000, 1001:4000 ]
FROM LandsatCollection AS c
Example 2: Let us assume that ClimateCollection contains one 4-D cube. The following query, then, extracts a 3-D volume at time frame 1020 from this cube:
SELECT c[ 1020, *:*, *:*, *:* ]
FROM ClimateCollection AS c
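The effect of the two subsetting operations can be mimicked in plain Python. The following is only a sketch under stated assumptions, not rasdaman code: arrays are modeled as nested lists, the helper names trim and section are hypothetical, and the origin parameter (the coordinate of the first stored cell) is an illustrative device. As in the rasql examples, interval bounds are inclusive.

```python
def trim(image, rows, cols, origin=(0, 0)):
    """Rectangular cutout with inclusive bounds, like c[ r0:r1, c0:c1 ]."""
    (r0, r1), (c0, c1) = rows, cols
    row_origin, col_origin = origin
    if r0 < row_origin or c0 < col_origin:
        raise ValueError("cutout starts outside the array")
    return [row[c0 - col_origin:c1 - col_origin + 1]
            for row in image[r0 - row_origin:r1 - row_origin + 1]]


def section(cube, index, origin=0):
    """Fix the first coordinate, like c[ i, *:*, ... ]; the result has one dimension less."""
    if index < origin:
        raise ValueError("section lies outside the array")
    return cube[index - origin]


# A 4x5 toy image whose cell value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(5)] for r in range(4)]
cutout = trim(image, (1, 2), (2, 4))   # rows 1..2, columns 2..4

# A toy 3-D cube with three slices along the first axis.
cube = [[[t, t], [t, t]] for t in range(3)]
frame = section(cube, 2)               # 2-D slice at first coordinate 2
```

A cutout of rows 1001 to 3000 contains 3000 - 1001 + 1 = 2000 rows, which is why Example 1 yields a 2000x3000 result.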
•Induced Operations: For each operation available on the MDD cell type, a corresponding so-called induced operation is provided which simultaneously applies the base operation to all cells of an MDD. Both unary (e.g., record access or contrast enhancement) and binary operations (e.g., masking an image) can be induced.
Example 3: Figure 4 shows the three visible bands of a satellite view of the Spanish coast. The query below masks out all pixels from this image which are considered non-vegetation due to their color value.
SELECT c * ( c.green > 130 AND c.red < 110 AND c.blue < 140 )
FROM LandsatCollection AS c
In general, MDD expressions can be used in the SELECT part of a query and, if the outermost expression result type is Boolean, also in the WHERE part.
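To make the idea of an induced operation concrete, the masking query of Example 3 can be imitated in Python. This is a hedged sketch, not the rasdaman implementation: images are modeled as nested lists of (red, green, blue) tuples, and the helpers induce, is_vegetation, and mask are hypothetical names introduced for illustration.

```python
def induce(op, *arrays):
    """Apply a cell-level operation to all cells of equally shaped 2-D arrays."""
    return [[op(*cells) for cells in zip(*rows)] for rows in zip(*arrays)]


def is_vegetation(pixel):
    """Cell-level predicate from Example 3."""
    red, green, blue = pixel
    return green > 130 and red < 110 and blue < 140


def mask(pixel, keep):
    """Multiply every band by the Boolean flag (True counts as 1, False as 0)."""
    return tuple(band * keep for band in pixel)


image = [[(100, 150, 120), (200, 150, 120)],
         [(90, 100, 50), (50, 200, 139)]]

flags = induce(is_vegetation, image)   # unary induced operation: Boolean array
masked = induce(mask, image, flags)    # binary induced operation: image * flags
```

The intermediate Boolean array flags corresponds to the parenthesized comparison in the query, and the final step corresponds to the multiplication c * (...).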
•Deriving Summary Data: Condenser operations, called aggregation in standard SQL, allow summarization over all values from some spatial domain. The most common statistical operators are provided as shorthands; others can be formulated as MDD expressions.
Example 4: “Average green intensity within a Landsat scene.”
SELECT avg_cells( c.green )
FROM LandsatCollection AS c
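A condenser can be pictured as a fold over all cells of an array. The sketch below is an illustration only and makes several assumptions: the green band is given as a nested list of numbers, condense is a hypothetical helper, and the Python function avg_cells merely mirrors the name of the rasql shorthand without claiming to reproduce its implementation.

```python
def condense(op, array, initial):
    """Fold all cells of a 2-D array into a single value with a binary operation."""
    result = initial
    for row in array:
        for cell in row:
            result = op(result, cell)
    return result


def avg_cells(array):
    """Average over all cells, built from two condensers (count and sum)."""
    count = condense(lambda acc, _cell: acc + 1, array, 0)
    total = condense(lambda acc, cell: acc + cell, array, 0)
    return total / count


green_band = [[150, 150],
              [100, 200]]
average_green = avg_cells(green_band)
```

Other summaries, such as a maximum or a count of cells above a threshold, follow the same pattern with a different binary operation.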
Array Storage
The storage management’s task is to efficiently map the conceptual entities (collections of MDD objects) to some appropriate storage structure. Basically, three alternatives for representing an MDD object are known today:
•Sequential storage of cell values following some linearization scheme (e.g., row-major): Coordinates need not be stored since cell locations can be computed following the bijective linearization scheme. This is very efficient for dense MDD; for sparse data, it is less advantageous because many non-existing values have to be materialized, thereby
Figure 4. Landsat scene (left) and vegetation highlighted (right)