
Rivero, L. Encyclopedia of Database Technologies and Applications. 2006.
Proper Placement of Derived Classes in the Class Hierarchy
Figure 5. The class hierarchy in Figure 4 after adding virtual class NRP
the classes participated in the query. The work of Bertino et al. (1992) follows the approach proposed in Kim (1992) in making the result of a query a direct subclass of the root; the result has no user-defined methods and closely resembles a set of tuples in a relation. Views of XSQL (Kifer, Kim & Sagiv, 1992) are seen as sets of objects, which may serve as partial input for further queries. The user is expected to specify whether the result of a query is to be saved and how a new unique OID is to be calculated. The same approach of having the user explicitly classify a query result is also used by O2 (Santos, Abiteboul & Delobel, 1994). There is no restriction on the placement, and the user may easily violate inheritance constraints by specifying the result to be a subclass of some unrelated class. Such an approach places too much burden on the user.
In OQL (Alashqur, Su & Lam, 1989), results obtained from queries are saved as new objects if save is specified by the user. The new objects exist outside the hierarchy and may be referred to only by using the name assigned by the user. Placing a newly derived subdatabase outside the hierarchy would ensure that no inheritance rules are violated. However, reusability is not achieved.
Some approaches employ views to handle schema changes. The work of Tresch and Scholl (1993) presents several examples of schema changes that can be simulated using views. However, implementation solutions and how to push a persistent view into its proper place within the hierarchy are not mentioned. Two other approaches that deal with schema changes are Bertino (1992) and Bratsberg (1992). The former does not consider the effect of schema changes on subclasses. The latter separates type and extent hierarchies and hence violates the full inheritance invariant. The work of Ra and Rundensteiner (1997) is based on the MultiView system (Rundensteiner, 1992) with some extensions. However, it does not handle the automated proper placement of persistent views; rather, the user has to specify the position. Finally, researchers recently realized the importance of views, schema changes, and integration within the context of XML technologies (Kang & Lim, 2002; Pedersen & Pedersen, 2003). We argue that the approach proposed in this article for object-oriented databases may be easily adapted for XML. In fact, it is essential to facilitate dynamic changes to the structure of an XML schema for several reasons, including correcting design errors in the schema, allowing expansion of the application scope over time, and accounting for the merging of several XML schemas into a single common application.
In summary, none of the approaches mentioned above handles the proper placement of persistent views in the hierarchy with the aim of maximizing reusability. This gap in the existing approaches was the major motivation for addressing the problem and developing the solutions presented in this article.
CONCLUSION
For a class to be properly placed inside the hierarchy, it should be defined by a user with complete knowledge of the details of the class hierarchy. This is almost impossible in a dynamic environment where multiple users access a common hierarchy, each performing dynamic schema changes related to a specific domain of interest. Because of this, we automated the process and developed a system that can handle the proper placement of new classes in the hierarchy with minimum user intervention. Naive and professional users alike benefit from our system regardless of their knowledge of the hierarchy. We concentrated on classes that correspond to persistent views; however, our system is still able to deal with other classes added to the hierarchy using class definition constructs. Each of the latter classes belongs to one of the four groups identified for virtual classes, depending on the ability of the user to specify related subclasses or superclasses. The worst case is when the user is unable to specify the direct subclasses and superclasses of a new class. Such a class is placed in the fourth group, and both its superclasses and subclasses are investigated. As a result of this study, we demonstrated once again that it is necessary for an object-oriented model to support multiple inheritance. Finally, researchers recently developed approaches to handle schema changes for XML. For instance, the work of Koeller and Rundensteiner (2002) handles schema-restructuring view maintenance operations as schema changes and vice versa. Kozankiewicz et al. (2002) present a new approach to virtual, updatable views for a query language addressing native XML databases. Su et al. (2001) developed a framework for managing the evolution of DTDs and XML documents. Lerner (2000) developed a system that handles type changes by comparing schemas and then produces a transformer that can update data in a database to correspond to a newer version of the schema. Such approaches may benefit from the work described in this article.
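As a rough illustration of what automated placement involves, the sketch below positions a new class between its most specific superclasses and its most general subclasses by comparing property sets. The property-set representation, the `place` function, and the class names are assumptions made for this example only; it is not the placement algorithm of the system described in this article.

```python
def place(hierarchy, new_props):
    """Return (direct_superclasses, direct_subclasses) for a new class.

    hierarchy maps class name -> frozenset of property names (inherited
    properties included). A class is a superclass candidate when its
    properties are a proper subset of new_props, and a subclass
    candidate when they are a proper superset.
    """
    supers = {c for c, p in hierarchy.items() if p < new_props}
    subs = {c for c, p in hierarchy.items() if p > new_props}
    # Keep only the most specific superclasses: drop any candidate whose
    # properties are contained in another candidate's properties.
    direct_supers = {c for c in supers
                     if not any(hierarchy[c] < hierarchy[d] for d in supers)}
    # Keep only the most general subclasses.
    direct_subs = {c for c in subs
                   if not any(hierarchy[c] > hierarchy[d] for d in subs)}
    return direct_supers, direct_subs


# Hypothetical hierarchy used only for illustration.
hierarchy = {
    "Person": frozenset({"name"}),
    "Student": frozenset({"name", "gpa"}),
    "Employee": frozenset({"name", "salary"}),
    "TeachingAssistant": frozenset({"name", "gpa", "salary", "course"}),
}
# A hypothetical view with the properties of both Student and Employee.
supers, subs = place(hierarchy, frozenset({"name", "gpa", "salary"}))
```

In this toy hierarchy the new class ends up with two direct superclasses, which is one way to see why placement of derived classes calls for multiple inheritance.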
REFERENCES
Alashqur, A., Su, S.Y., & Lam, H. (1989). OQL: A query language for manipulating object-oriented databases. Proceedings of the VLDB, Amsterdam, The Netherlands (pp. 433-442).
Alhajj, R., & Arkun, M.E. (1993). A query model for object-oriented database systems. Proceedings of the IEEE-ICDE, Vienna, Austria.
Alhajj, R., & Polat, F. (1994). Closure maintenance in an object-oriented query model. Proceedings of the ACM-CIKM, Maryland.
Bertino, E. (1992). A view mechanism for object-oriented databases. Proceedings of the EDBT, Vienna, Austria.
Bertino, E., et al. (1992). Object-oriented query languages: The notion and the issues. IEEE TKDE, 4(3), 223-237.
Bratsberg, S.E. (1992). Unified class evolution by object-oriented views. Proceedings of the ER. Lecture Notes in Computer Science, Karlsruhe, Germany (pp. 423-439).
Carey, M.J., DeWitt, D.J., & Vandenberg, S.L. (1988). A data model and a query language for EXODUS. Proceedings of the ACM-SIGMOD (pp. 413-423).
Kang, H., & Lim, J. (2002). Deferred incremental refresh of XML materialized views. Proceedings of the CAiSE (pp. 742-746).
Kaul, M., Drosten, K., & Neuhold, E.J. (1990). Viewsystem: Integrating heterogeneous information bases by object-oriented views. Proceedings of the IEEE-ICDE, Los Angeles, California.
Kifer, M., Kim, W., & Sagiv, Y. (1992). Querying object-oriented databases. Proceedings of the ACM-SIGMOD, San Diego, California.
Kim, W. (1989). A model of queries for object-oriented databases. Proceedings of the VLDB (pp. 423-432).
Koeller, A., & Rundensteiner, E.A. (2002). Incremental maintenance of schema-restructuring views. Proceedings of the EDBT, Prague, Czech Republic (pp. 354-371).
Kozankiewicz, H., et al. (2002). Updateable object views (Tech. Rep. No. 950). Institute of Computer Science of PAS.
Lerner, B.S. (2000). A model for compound type changes encountered in schema evolution. ACM TODS, 25(1), 83-127.
Pedersen, D., & Pedersen, T.B. (2003). Achieving adaptivity for OLAP-XML federations. Proceedings of the ACM DOLAP (pp. 25-32).
Ra, Y.-G., & Rundensteiner, E.A. (1997). A transparent schema-evolution system based on object-oriented view technology. IEEE TKDE, 9(4), 600-624.
Rundensteiner, E.A. (1992). MultiView: A methodology for supporting multiple views in object-oriented databases. Proceedings of the VLDB, Vancouver, Canada (pp. 187-198).
Santos, C.S., Abiteboul, S., & Delobel, C. (1994). Virtual schemas and bases. Proceedings of the EDBT, Cambridge, UK (pp. 81-94).
Shaw, G., & Zdonik, S. (1990). A query algebra for object-oriented databases. Proceedings of the IEEE-ICDE (pp. 154-162).
Su, H., et al. (2001). XEM: Managing the evolution of XML documents. Proceedings of the IEEE RIDE (pp. 103-110).
Tresch, M., & Scholl, M.H. (1993). Schema transformation without database reorganization. SIGMOD Record, 22, 21-27.
KEY TERMS
Base Class: User-defined class; a collection of objects that have the same behavior and state definition.
Class Hierarchy: A directed acyclic graph (DAG) that describes the subclass/superclass relationships among classes. Each node represents a class, the children of a node represent the direct subclasses of a class, and the parents of a node represent the direct superclasses of a class.
Inheritance: The ability of a superclass to pass its characteristics (methods and instance variables) onto its subclasses, allowing subclasses to reuse these characteristics.
Multiple Inheritance: The capability of a class of objects to inherit attributes and behavior from more than one superclass.
Object: A data structure that encapsulates behavior (operations) and data (state).
Subclass: A class that is derived from at least one other class.
Superclass: A class from which a particular class is derived via inheritance.
View or Virtual Class: A class derived using a query expression.
XML: eXtensible Markup Language, a data format widely used for data exchange over the Internet.
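The object-oriented terms above can be made concrete with a small sketch in Python, whose class system forms the same kind of directed acyclic graph. The class names and the two helper functions are hypothetical and serve only to illustrate the definitions of superclass, subclass, and multiple inheritance.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"Person {self.name}"


class Student(Person):
    def study(self):
        return f"{self.name} studies"


class Employee(Person):
    def work(self):
        return f"{self.name} works"


class TeachingAssistant(Student, Employee):
    """Multiple inheritance: two direct superclasses."""


def direct_superclasses(cls):
    # Parents of the node in the class-hierarchy DAG (Python's implicit
    # root class `object` is left out).
    return [b.__name__ for b in cls.__bases__ if b is not object]


def direct_subclasses(cls):
    # Children of the node in the class-hierarchy DAG.
    return sorted(c.__name__ for c in cls.__subclasses__())


ta = TeachingAssistant("Ada")
```

Here `ta` can call `study()`, `work()`, and `describe()`, reusing characteristics inherited from all of its superclasses.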
ENDNOTES
1 In this article, the two terms view and virtual class are both used to refer to a derived class.
2 Operands become union-compatible when they are brother classes. Consequently, the union and difference operations can be transformed into a selection.
3 For every class c, the corresponding artificial class will have the same name prefixed with the letter “a”.
Querical Data Networks
Cyrus Shahabi
University of Southern California, USA
Farnoush Banaei-Kashani
University of Southern California, USA
INTRODUCTION
Recently, a family of massive self-organizing data networks has emerged. These networks mainly serve as large-scale distributed query-processing systems. We term these networks querical data networks (QDNs). A QDN is a federation of a dynamic set of peer, autonomous nodes communicating through a transient-form interconnection. Data is naturally distributed among the QDN nodes at an extra-fine grain, where a few data items are dynamically created, collected, and/or stored at each node. Therefore, the network scales linearly with the size of the data set. With a dynamic data set, a dynamic and large set of nodes, and a transient-form communication infrastructure, QDNs should be considered the new generation of distributed database systems, with significantly less constraining assumptions than their ancestors. Peer-to-peer networks (Daswani, Garcia-Molina, & Yang, 2003) and sensor networks (Akyildiz, Su, Sankarasubramaniam, & Cayirci, 2002; Estrin, Govindan, Heidemann, & Kumar, 1999) are well-known examples of QDNs.
QDNs can be categorized as instances of “complex systems” (Bar-Yam, 1997) and studied using the complex system theory. Complex systems are (mostly natural) systems that are hard to describe information-theoretically and hard to analyze computationally. QDNs share these characteristics and, in particular, bear a significant similarity to a dominant subset of complex systems that are most properly modeled as a large-scale interconnection of functionally similar (or peer) entities. The links in the model represent some kind of system-specific entity-to-entity interaction. Social networks (networks of interacting people) and cellular networks (networks of interacting cells) are two instances of such complex systems. With these systems, complex global system behavior (e.g., a social revolution in a society, or food digestion in a stomach) is an emergent phenomenon, arising from simple local interactions. Various fields of study, such as sociology, physics, biology, and chemistry, were founded to study different types of initially simple systems and have gradually matured to analyze and describe instances of incrementally more complex systems. An interdisciplinary field of study, the complex system theory,1 was recently founded based on the observation that analytical and experimental concepts, tools, techniques, and models developed to study an instance of a complex system in one field can be adopted, often almost unchanged, to study other complex systems in other fields. More importantly, the complex system theory can be considered a unifying metatheory that explains common characteristics of complex systems. One can extend the application of the complex system theory to QDNs by:
1. Adopting models and techniques from a number of impressively similar complex systems to design and analyze QDNs as an instance of engineered complex systems; and
2. Exporting the findings from the study of QDNs (which are engineered, hence, more controllable) to other complex system studies.
This article is organized in two parts. In the first part, we provide an overview, where we (1) define and characterize QDNs as a new family of data networks with common characteristics and applications, and (2) review possible database-like architectures for QDNs as query processing systems and enumerate the most important QDN design principles. In the second part of the article, as the first step toward realizing the vision of QDNs as complex distributed query-processing systems, we focus on a specific problem, namely, the problem of effective data location (or search) for efficient query processing in QDNs. We briefly explain two parallel approaches, both based on techniques/models borrowed from the complex system theory, to address this problem.
BACKGROUND
Here, we enumerate the main componental characteristics and application features of a QDN.
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
Componental Characteristics
A network is an interconnection of nodes via links, usually modeled as a graph. Nodes of a QDN are often massive in number and bear the following characteristics:
• Peer functionality: All nodes are capable of performing a restricted but similar set of tasks in interaction with their peers and the environment, although they might be heterogeneous in terms of their physical resources. For example, joining the network and forwarding search queries are among the essential peer tasks of every node in a peer-to-peer network.
• Autonomy: Aside from the peer tasks mentioned above, QDN nodes are autonomous in their behavior. Nodes are either self-governing or governed by uncertainties beyond anyone's control. Therefore, to be effective and applicable, QDN engineering should avoid imposing requirements on, and making assumptions about, the QDN nodes.2 For example, strict regulation of connectivity (e.g., enforcing the number of connections and/or the targets of connections) might be an undesirable feature for a QDN design.
• Intermittent presence: Nodes may frequently join and leave the network, whether by their own autonomous decision, because of failures, or for other reasons.
On the other hand, links in various QDNs stand for different forms of interaction and communication. Links may be physical or logical, and they are fairly inexpensive to rewire. Therefore, a QDN is a large-scale federation of a dynamic set of autonomous peer nodes building a transient-form interconnection. Conventional approaches developed to model and analyze traditional distributed database systems (and classical networks, as their underlying communication infrastructure) are either too weak (oversimplifying) or too complicated (overcomplicated) to be effective with large-scale and topology-transient QDNs. The complex system theory (Bar-Yam, 1997), on the other hand, provides a set of conceptual, experimental, and analytical tools to contemplate, measure, and analyze systems such as QDNs.
Application Features
A QDN is applied as a distributed source of data (a data network) with nodes that are specialized for cooperative query processing and data retrieval. The node cooperation can be as trivial as forwarding the queries or as complicated as in-network data analysis. In order to enable such an application, a QDN should support the following features:
• Data-centric naming, addressing, routing, and storage: With a QDN, queries are declarative; i.e., a query refers to the names of data items and is independent of the location of the data. The data may be replicated and located anywhere in the data network, the data holders are unknown to the querier and are only intermittently present, and the querier is interested in the data itself rather than the location of the data. Therefore, QDN nodes should naturally be named and addressed by their data content rather than by an identifier in a virtual name space such as the IP address space. Consequently, with data-centric naming and addressing of the QDN nodes (Heidemann et al., 2001), routing (Ratnasamy, Francis, Handley, Karp, & Shenker, 2001) and storage (Ratnasamy et al., 2003) in a QDN are also based on the content. It is interesting to note that non-procedural query languages such as SQL also support declarative queries and are appropriate for querying data-centric QDNs.
• Self-organization for efficient query processing: QDNs should be organized for efficient query processing. A QDN can be considered as a database system with the data network as the database itself (see the next section). QDN nodes cooperate in processing the queries by retrieving, communicating, and preferably processing on the fly the data distributed across the data network. To achieve efficiency in query processing with high resource utilization and good performance (e.g., response time and query throughput), QDNs should be organized appropriately. Examples of organization are intelligent partitioning of a query into a set of sub-queries to enable parallel processing, and collaborative maintenance of the data catalogue across the QDN nodes. However, the peer tasks of the QDN nodes should be defined such that the nodes self-organize into the appropriate organization. In other words, organization must be a collective behavior that emerges from local interactions among nodes; otherwise, the dynamic nature and large scale of a QDN render any centralized micromanagement of a QDN unscalable and impractical.
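To make the idea of data-centric addressing more tangible, the following sketch stores and retrieves a data item purely by its name. The hashing scheme, the function `responsible_node`, and the node and data names are assumptions for illustration; this is a plain modular hash, not the routing or storage scheme of any of the cited systems, and it ignores replication and node churn.

```python
import hashlib


def responsible_node(data_name, node_ids):
    """Pick the node responsible for a data name (hypothetical scheme).

    The name is hashed and mapped onto the sorted list of currently
    present nodes, so a querier needs the data name only, not the
    address of the holder.
    """
    digest = hashlib.sha256(data_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(node_ids)
    return sorted(node_ids)[index]


nodes = ["n1", "n2", "n3", "n4"]
store = {n: {} for n in nodes}

# Storage and lookup both go through the data name.
key = "temperature/office-12"
store[responsible_node(key, nodes)][key] = 23.5
value = store[responsible_node(key, nodes)][key]
```

Because a simple modular hash remaps most names whenever a node joins or leaves, a scheme intended for intermittently present nodes would need something more stable; the sketch only shows that the querier never handles a node address.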
VISION: A DATABASE QUERYING FRAMEWORK FOR QDNS
In the previous section, we defined a querical data network (QDN) as a distributed data source and a query processing system. On the other hand, a database system (DBS) is designed specifically as a general framework for convenient and efficient querying of static or dynamic collections of interrelated data. Thus, querying QDNs can be designed, developed, and executed by adopting a DBS as the general-purpose querying framework, leveraging its rich abstractions, theories, and processing methods (Govindan et al., 2002; Harren et al., 2002). In particular, adopting the DBS framework potentially results in (1) convenient and rapid application development for users at the conceptual level, by providing well-known and transparent abstractions independent of the implementation of the querying, and (2) efficient query processing at the physical level, by providing a general-purpose querying component that adopts and customizes query processing methods from the database literature as well as other related fields such as distributed computing.
Here, we depict and compare potential architectures for the DBS framework, define a taxonomy of approaches to generalize this querying framework for the entire family of QDNs, and enumerate some important design principles for query processing in QDNs.
Architecture
The DBS querying framework for QDNs suggests a two-level architecture consisting of a conceptual level and a physical level. At the conceptual level, queries are defined based on the conceptual schema of the data network, independent of the physical implementation of the query processing. This physical data independence allows rapid development of QDN applications, similar to database application development. For instance, a peer-to-peer application that monitors speed-limit violations on a highway can pose the following query to a (hypothetical) mobile peer-to-peer network of vehicles:
    SELECT vehicle-ID
    FROM   Cars
    WHERE  (speed > 70) AND (location IN “Highway No. 88”)
Similarly, a heat alert application may pose the following query to a heat detection sensor field: “Report the outlier heat data in all offices during night.”
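The speed-limit query above can be mimicked in a few lines to show what physical data independence means for the application: the caller states what is wanted and never sees where the records reside. The `Car` record type, the `speeding_on` function, the sample data, and the choice of miles per hour are all assumptions for this sketch, and the `location IN` predicate is simplified to an equality test.

```python
from dataclasses import dataclass


@dataclass
class Car:
    vehicle_id: str
    speed: float      # assumed to be in miles per hour
    location: str


def speeding_on(cars, highway, limit=70):
    # Declarative in spirit: the caller states *what* is wanted; where the
    # records live is hidden behind the `cars` iterable.
    return [c.vehicle_id for c in cars
            if c.speed > limit and c.location == highway]


# Hypothetical records, as they might be gathered from vehicle nodes.
cars = [
    Car("A1", 82.0, "Highway No. 88"),
    Car("B2", 65.0, "Highway No. 88"),
    Car("C3", 90.0, "Highway No. 5"),
]
result = speeding_on(cars, "Highway No. 88")
```

Whether `cars` is a local list or an iterator that pulls records from remote nodes is a physical-level decision that does not change the application code.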
Queries are executed at the physical level. There are two extreme choices of design for query processing at the physical level: centralized and decentralized; a hybrid design may also be meaningful for particular applications. With a centralized design, a potential querier (i.e., one of the QDN nodes or an outsider) receives the query from the QDN application at the conceptual level. The query is disseminated to all QDN nodes, where the query is interpreted based on the local conceptual schema, and all raw data required to process the query are transmitted back to the querier via the network (see Figure 1a). The querier treats the collected data as a centralized database and processes and analyzes the data to respond to the query. With this scheme, data sourcing and query processing are completely decoupled; the data network maintains and communicates the data, and the querier processes the query individually. The cooperation among QDN nodes is limited to forwarding the query and the raw data, and the network is used only as a point-to-point communication infrastructure to communicate the data between the sources and the querier.
A centralized design is simple to implement. The centralized query processing approach is also resource-efficient for trivial queries such as typical peer-to-peer search queries. However, with complex queries, where, for example, two 10,000-record tables must be joined to retrieve a few records, the overhead of transmitting the entire content of the tables to the querier for central processing is overwhelming and renders the centralized approach impractical. With most QDNs (e.g., sensor networks), this overhead is particularly intolerable due to the several orders of magnitude higher cost of communication as compared with that of computation in typical QDN nodes. Instead, in-network and on-the-fly processing of the query potentially eliminates the redundant communication of the data; hence, it is more efficient and scalable. The decentralized design adopts the latter approach.
With the decentralized design, the QDN is itself both the source of the data and the query-processing unit (see Figure 1b). The data sourcing/communication and data analysis tasks are integrated, and QDN nodes cooperate to perform both tasks within the network. With this approach, queries are processed based on the following general scheme: the query is disseminated to a selected set of QDN nodes; the QDN nodes exploit their computing power to process the query locally, cooperatively, in parallel, and in a distributed fashion, to extract the required information from the raw data; and eventually, the extracted information is merged to form the final query result while traveling toward the querier through the network.
Figure 1. Database system framework for querying querical data networks (QDNs): (a) centralized design; (b) decentralized design
The efficiency of the decentralized design is due to in-network processing. With this approach, communication of the raw data is restricted to short-range (hence, less costly) communications among local cooperative-analysis groups of QDN nodes, which process the voluminous raw data and extract the concise required information to respond to the query. Although the decentralized design potentially promises an efficient and scalable querying framework for QDNs, realizing this design is a challenging endeavor and requires designing efficient distributed and cooperative query-processing mechanisms that comply with the specific characteristics of QDNs.
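The difference in communication cost between the two designs can be sketched for a simple aggregate query. The example below assumes a hypothetical routing tree rooted at the querier, counts one unit of cost per record per hop, and uses a maximum query because partial maxima can be merged at every node; queries such as joins would need considerably more machinery.

```python
def centralized_cost(tree, readings, root):
    """Record-hops needed to ship every raw reading to the root."""
    cost, stack = 0, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        cost += len(readings[node]) * depth
        stack.extend((child, depth + 1) for child in tree.get(node, []))
    return cost


def in_network_max(tree, readings, node):
    """Return (max reading in subtree, record-hops used).

    Each node merges its children's partial results with its own data
    and forwards a single value to its parent.
    """
    best, cost = max(readings[node]), 0
    for child in tree.get(node, []):
        child_best, child_cost = in_network_max(tree, readings, child)
        best = max(best, child_best)
        cost += child_cost + 1   # one aggregated record sent child -> node
    return best, cost


# Hypothetical routing tree rooted at the querier "q".
tree = {"q": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
readings = {
    "q": [20.1], "a": [21.0, 21.4], "b": [19.8, 35.2],
    "c": [22.3, 22.1, 22.0], "d": [20.9], "e": [24.5, 24.7],
}
central = centralized_cost(tree, readings, "q")
answer, decentral = in_network_max(tree, readings, "q")
```

In this toy network, in-network merging sends one record per tree link, whereas the centralized cost grows with both the number of raw readings and their distance from the querier.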
Taxonomy
Based on the two fundamentally distinct design choices for the physical level of the DBS framework (i.e., centralized and decentralized), one can recognize two approaches to implement a DBS-based querying system for QDNs:
1. Database for QDN: This approach corresponds to the querying systems with centralized query processing. These systems are similar to other centralized database applications, where data are collected from some data sources (depending on the host application, the data sources can be text documents, media files, and, in this case, QDN data) to be separately and centrally processed.
2. Database-QDN (DB-QDN): The systems that are designed based on this approach strive to implement query processing in a decentralized fashion within the QDN; hence, in these systems “QDN is the database.”
Design Principles
By definition, QDNs tend to be large-scale systems, and their potential benefits increase as they grow in size. Therefore, between the two types of DBS-based querying systems for QDNs, the database-QDNs (DB-QDNs) are more promising because they are scalable and efficient. Among the most important design principles for distributed query processing in DB-QDNs, one can distinguish the following:
1. In-network query processing: In-network query processing is the main distinction of DB-QDNs. In-network query processing techniques should be implemented in a distributed fashion, ensuring minimal communication overhead and optimal load balance.
2. Transaction processing with relaxed properties: Due to the dynamic nature of QDNs, requiring ACID-like properties for transaction processing in DB-QDNs is too costly to be practical and severely limits the scalability of such processing techniques. Hence, transaction-processing properties should be relaxed for DB-QDNs.
3. Adaptive query optimization: Since QDNs are inherently dynamic structures, optimizing query plans for distributed query execution in DB-QDNs should also be a dynamic/adaptive process. Adaptive query optimization techniques have previously been studied in the context of central query processing systems (Avnur & Hellerstein, 2000).
4. Progressive query processing: Distributed query processing tends to be time-consuming. With real-time queries, the user may prefer receiving a rough estimate of the query result quickly rather than waiting long for the final result. The rough estimate is progressively refined into the accurate, final result. This approach, termed progressive query processing (Schmidt & Shahabi, 2002a), allows users to rapidly obtain a general understanding of the result, to observe the progress of the query, and to control its execution (e.g., by modifying the selection condition of the query) on the fly.
5. Approximate query processing: Approximation techniques such as wavelet-based query processing can effectively decrease the cost of the query while producing highly accurate results (Schmidt & Shahabi, 2002b). The inherent uncertainty of the QDN data, together with the relaxation of the query semantics, justifies the application of approximation techniques to achieve efficiency.
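The progressive query processing principle can be illustrated with a deliberately simple sketch: an average query whose estimate is refined each time another node's partial result arrives. The function name, the (sum, count) message format, and the sample values are assumptions for this example; it is a running aggregate, not the wavelet-based technique cited for progressive and approximate processing.

```python
def progressive_average(partial_results):
    """Yield a refined estimate of the global average after each arrival.

    partial_results is an iterable of (sum, count) pairs, one per
    responding node, in arrival order.
    """
    total, count = 0.0, 0
    for node_sum, node_count in partial_results:
        total += node_sum
        count += node_count
        yield total / count


# Hypothetical partial results arriving from three nodes.
arrivals = [(40.0, 2), (66.0, 3), (25.0, 1)]
estimates = list(progressive_average(arrivals))
```

Because the function is a generator, a caller can display each estimate as it becomes available and stop consuming results once the estimate is good enough.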
FUTURE TRENDS
One of the most fundamental functionalities required to realize a DB-QDN is the search primitive. Efficient location of the data within the QDN, a large-scale and dynamic system with a distributed and dynamic data set, is a challenging task vital to QDN query processing. For the remainder of this section, we briefly explain two parallel approaches one can adopt from the complex system theory to address the QDN search problem. First, we discuss a self-organizing mechanism to structure the topology of the QDN into a search-efficient topology. This topology can be considered as a distributed index structure that organizes the nodes and, therefore, the data content of the nodes for efficient search. For the design of the search-efficient QDN topology as well as the search dynamics, we are inspired by the “small world” models. Small worlds are models proposed to explain efficient communication in a social network, which is a semistructured complex system.
Second, we propose an efficient query flooding mechanism for QDNs. Flooding is required not only for broadcast queries in all QDNs but also for unicast and multicast queries in unstructurable/unindexable QDNs. With these QDNs, the extreme dynamism of the QDN topology and the extreme autonomy of the QDN nodes render any attempt to impose even a semi-structure on the network by an index-like structure inefficient and/or impossible. We use percolation theory, an analytical tool borrowed from the complex system theory, to formalize and analyze such an efficient flooding mechanism.
Probabilistic Indexing of QDNs for Efficient Approximate Query Processing
Considering a QDN as a database (with every node of the QDN as a potential entry point of the query), a QDN, like a traditional database, should be “indexed” for efficient processing of the queries. To process approximate queries,3 we propose self-organizing the interconnection of the QDN based on the data content of the QDN nodes. With this organization, the network distance between every two nodes is, with high probability, negatively correlated with the similarity of their data content (i.e., nodes with similar content are close to each other). This approach results in an indexed network with distinguishable data localities, allowing efficient routing of the queries toward the nodes holding the result set of the query. The similarity measurements performed by each node while joining the QDN to select an appropriate set of neighbors can be thought of as the off-line pre-computations required to create the index for efficient online query processing. Also, the topology of the generated interconnection should be compared with the tree-like topologies of the traditional hierarchical index structures in centralized databases. In addition to allowing efficient navigation/traversal of the data set, this topology should support the dynamism of the data set and the network and, more importantly, should avoid assuming a central entry point (the root node in hierarchical indices) for the query, in order to balance the query load among all the nodes of the QDN.
It turns out that a probabilistic “small world” model, which is a topology proposed to explain efficient communication in social networks, is a perfect candidate topology to index QDNs. With our searchable QDN model (Banaei-Kashani & Shahabi, 2003b), we propose a self-organization mechanism that generates a QDN with small-world topology based on a recently developed small-world model (Watts, Dodds, & Newman, 2002). We complement the generated small-world network topology (i.e., the index) with a query forwarding mechanism (i.e., the index lookup technique) that effectively routes partial-match queries toward the QDN nodes that store the matching data items. Currently, we are focusing on extending this query routing technique to support more challenging queries such as range queries and nearest-neighbor queries.
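The idea of using the topology itself as an index can be sketched with greedy, content-based forwarding: at each hop the query moves to the neighbor whose content is most similar to it. The Jaccard similarity measure, the `greedy_route` function, and the five-node network below are assumptions for illustration; this is not the self-organization or forwarding mechanism of the searchable QDN model, only a minimal instance of routing by content similarity.

```python
def similarity(a, b):
    """Jaccard similarity between two sets of data keywords."""
    return len(a & b) / len(a | b) if a | b else 1.0


def greedy_route(neighbors, content, start, query, max_hops=10):
    """Forward a query to the neighbor most similar to it at each hop.

    Stops at a node that has no neighbor strictly more similar to the
    query than itself. Returns the path taken.
    """
    path, current = [start], start
    for _ in range(max_hops):
        best = max(neighbors[current],
                   key=lambda n: similarity(content[n], query),
                   default=current)
        if similarity(content[best], query) <= similarity(content[current], query):
            break
        current = best
        path.append(current)
    return path


# Hypothetical five-node network: mostly local links plus one long link
# (n1 <-> n4), loosely in the spirit of a small-world topology.
content = {
    "n1": {"jazz"}, "n2": {"jazz", "blues"}, "n3": {"blues", "rock"},
    "n4": {"rock", "metal"}, "n5": {"metal", "punk"},
}
neighbors = {
    "n1": ["n2", "n4"], "n2": ["n1", "n3"], "n3": ["n2", "n4"],
    "n4": ["n3", "n5", "n1"], "n5": ["n4"],
}
path = greedy_route(neighbors, content, "n1", {"metal", "punk"})
```

Any node can serve as the entry point, and the single long link lets the query skip the intermediate nodes; greedy forwarding can, however, stop at a local optimum when no neighbor is more similar than the current node.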
Criticality-Based Probabilistic Flooding
Flooding is a common mechanism used in many networks, including QDNs, to broadcast a piece of information (e.g., an alert or a search query) from a source node to other nodes of the network. With normal flooding, each node always forwards the received information to all its neighbors (i.e., directly connected nodes). In spite of many beneficial features, such as providing broad coverage and guaranteeing minimum delay, normal flooding is not a scalable communication mechanism, mainly because of the communication overhead it imposes on the system. To alleviate this problem, we introduce probabilistic flooding (Banaei-Kashani & Shahabi, 2003a). With probabilistic flooding, unlike normal flooding, a node forwards the information to each neighbor probabilistically, with probability p. By changing the probability value p, we can control the effective connectivity of the network while information is forwarded. The idea is to tune the probability value p to a critical operating point (the phase transition point) such that, statistically, the network remains connected (to preserve full reachability) while redundant paths are eliminated. Percolation theory (Stauffer & Aharony, 1992) is an analytical tool from the complex system theory extensively used to study probabilistic diffusion-like physical phenomena, e.g., the diffusion of oil inside porous rocks in an oil reservoir, a physical complex system. We use percolation theory to formalize the probabilistic flooding approach as a query-diffusion problem and to find its critical (optimal) operating point rigorously. Our formal analysis shows that the critical value of p can be as low as 1%, which translates to a 99% reduction in the communication overhead of flooding and, hence, scalable flooding.
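A small simulation makes the trade-off between reachability and message count easy to explore. The sketch below floods a query from a source node, forwarding over each link with probability p; the `flood` function, the ring-with-chords test topology, and the chosen values of p are assumptions for illustration. The critical value of p depends strongly on the topology, so this toy network is not expected to reproduce the figure reported above.

```python
import random


def flood(neighbors, source, p, rng):
    """Flood from source; each link forwards with probability p.

    Returns (set of reached nodes, number of messages sent).
    """
    reached, frontier, messages = {source}, [source], 0
    while frontier:
        node = frontier.pop()
        for nb in neighbors[node]:
            if rng.random() < p:        # forward over this link?
                messages += 1
                if nb not in reached:
                    reached.add(nb)
                    frontier.append(nb)
    return reached, messages


def ring_with_chords(n, step):
    """Hypothetical test topology: a ring plus chords every `step` nodes."""
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in ((i + 1) % n, (i + step) % n):
            nbrs[i].add(j)
            nbrs[j].add(i)
    return {i: sorted(s) for i, s in nbrs.items()}


net = ring_with_chords(200, 17)
full_reach, full_msgs = flood(net, 0, 1.0, random.Random(1))
part_reach, part_msgs = flood(net, 0, 0.6, random.Random(1))
```

With p = 1.0 the function reduces to normal flooding, in which every reached node sends one message per neighbor. Sweeping p downward and plotting `len(part_reach)` against `part_msgs` is one way to look for the point at which coverage starts to collapse.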
CONCLUSION
In this article, we identified querical data networks (QDNs) as a family of data networks recently emerging as a new generation of distributed database systems with significantly less constraining assumptions. We envision a QDN as a distributed query processing system with a database-like architecture. In search of an effective approach to design and analyze database-QDNs, we find the complex system theory, a theory that explains a family of systems with characteristics that bear significant similarity to those of QDNs, extremely helpful. As an instance application of this approach, we provide two parallel solutions for the QDN search problem inspired by models adopted from the complex system theory.
ACKNOWLEDGMENTS
This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC), IIS-0082826 (ITR), IIS-0238560 (CAREER), IIS-0324955 (ITR), and IIS-0307908 and unrestricted cash gifts from the Okawa Foundation and Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
REFERENCES
Akyildiz, I. F., Su, W., Sankarasubramaniam, Y., & Cayirci, E. (2002). A survey on sensor networks. IEEE Communications Magazine, 40(8), 102-114.
Avnur, R., & Hellerstein, J. (2000). Eddies: Continuously adaptive query processing. Proceedings of ACM International Conference on Management of Data (pp. 261-272).
Banaei-Kashani, F., & Shahabi, C. (2003a). Criticality-based analysis and design of unstructured peer-to-peer networks as complex systems. Proceedings of the Third International Workshop on Global and Peer-to-Peer Computing (GP2PC) in conjunction with CCGrid (pp. 351-359).
Banaei-Kashani, F., & Shahabi, C. (2003b). Searchable querical data networks. In Lecture notes in computer science: Vol. 2944 (pp. 17-32). Berlin, Germany: Springer-Verlag.
Bar-Yam, Y. (1997). Dynamics of complex systems. New York: Westview Press.
Daswani, N., Garcia-Molina, H., & Yang, B. (2003). Open problems in data-sharing peer-to-peer systems. Proceedings of the Ninth International Conference on Database Theory (pp. 1-15).
Estrin, D., Govindan, R., Heidemann, J., & Kumar, S. (1999). Next century challenges: Scalable coordination in sensor networks. Proceedings of International Conference on Mobile Computing and Networks (pp. 256-262).
Govindan, R., Hellerstein, J., Hong, W., Madden, S., Franklin, M., & Shenker, S. (2002). The sensor network as a database (Tech. Rep. No. 02-771). University of Southern California, Los Angeles, CA.
Harren, M., Hellerstein, J., Huebsch, R., Loo, B. T., Shenker, S., & Stoica, I. (2002). Complex queries in DHT-based peer-to-peer networks. Proceedings of the First International Workshop on Peer-to-Peer Systems.
Heidemann, J., Silva, F., Intanagonwiwat, C., Govindan, R., Estrin, D., & Ganesan, D. (2001). Building efficient wireless sensor networks with low-level naming. Proceedings of the Symposium on Operating Systems Principles (pp. 146-159).
Ratnasamy, S., Francis, P., Handley, M., Karp, R., & Shenker, S. (2001). A scalable content addressable network. Proceedings of the ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (pp. 161-172).
Ratnasamy, S., Karp, B., Shenker, S., Estrin, D., Govindan, R., Yin, L., & Yu, F. (2003). Data-centric storage in sensornets with GHT, a geographic hash table. Mobile Networks and Applications, 8(4), 427-442.
Schmidt, R., & Shahabi, C. (2002a). How to evaluate multiple range-sum queries progressively. 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 133-141).
Schmidt, R., & Shahabi, C. (2002b). Propolyne: A fast wavelet-based algorithm for progressive evaluation of polynomial range-sum queries. Eighth Conference on Extending Database Technology (pp. 664-681).
Stauffer, D., & Aharony, A. (1992). Introduction to percolation theory (2nd ed.). London: Taylor and Francis.
Watts, D. J., Dodds, P. S., & Newman, M. E. J. (2002). Identity and search in social networks. Science, 296, 1302-1305.

KEY TERMS
Complex Systems: Complex systems is a new field of science studying how parts of a complex system give rise to the collective behaviors of the system. Complexity (information-theoretical and computational) and emergence of collective behavior are the two main characteristics of such complex systems. Social systems formed (in part) out of people, the brain formed out of neurons, molecules formed out of atoms, and the weather formed out of air flows are all examples of complex systems. The field of complex systems cuts across all traditional disciplines of science, as well as engineering, management, and medicine.
Content-Centric Networks: A content-centric network is a network where various functionalities such as naming, addressing, routing, storage, etc. are designed based on the content. This is in contrast with classical networks that are node-centric.
Distributed Hash Tables (DHTs): A distributed index structure with hash table-like functionality for information location in Internet-scale distributed computing environments. Given a key from a pre-specified flat identifier space, the DHT computes (in a distributed fashion) and returns the location of the node that stores the key.
Peer-to-Peer (P2P) Networks: A peer-to-peer network is a distributed, self-organized federation of peer entities, where the system entities collaborate by sharing resources and performing cooperative tasks for mutual benefit. It is often assumed that such a federation lives, changes, and expands independent of any distinct service facility with global authority.
Percolation Theory: Assume a grid of nodes where each node is occupied with probability p and empty with probability (1-p). Percolation theory is a quantitative (statistical-theoretical) and conceptual model for understanding and analyzing the statistical properties (e.g., size, diameter, shape, etc.) of the clusters of occupied nodes as the value of p changes. Many concepts associated with complex systems such as clustering, fractals, diffusion, and particularly phase transitions are modeled as the percolation problem. The significance of the percolation model is that many different problems can be mapped to the percolation problem; e.g., forest-fire spread, oil field density estimation, diffusion in disordered media, etc.
Sensor Networks: A sensor network is a network of low-power, small form-factor sensing devices that are embedded in a physical environment and coordinate amongst themselves to achieve a larger sensing task.
Small-World Models: It is believed that almost any pair of people in the world can be connected to one another by a short chain of intermediate acquaintances, of typical length about six. This phenomenon is colloquially referred to as the “six degrees of separation,” or, equivalently, the “small world” effect. Sociologists propose a number of topological network models, the small-world models, for the social network to explain this phenomenon.
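As an illustration of the Percolation Theory entry above, the following Python sketch occupies each node of a square grid with probability p and measures the clusters of occupied nodes. It is a minimal sketch under stated assumptions: the square grid, the 4-neighbor notion of adjacency, and the function names (occupied_grid, cluster_sizes) are choices made here for illustration and are not taken from the article.

```python
import random
from collections import deque


def occupied_grid(size, p, rng):
    """Return a size x size grid; each node is occupied with probability p."""
    return [[rng.random() < p for _ in range(size)] for _ in range(size)]


def cluster_sizes(grid):
    """Return the sizes of all clusters of occupied nodes.

    Two occupied nodes belong to the same cluster if they are connected
    through up/down/left/right neighbors (an assumption of this sketch).
    """
    size = len(grid)
    seen = [[False] * size for _ in range(size)]
    sizes = []
    for r in range(size):
        for c in range(size):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one cluster.
            seen[r][c] = True
            queue = deque([(r, c)])
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < size and 0 <= nx < size
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(count)
    return sizes


if __name__ == "__main__":
    rng = random.Random(7)
    for p in (0.3, 0.5, 0.6, 0.7):
        sizes = cluster_sizes(occupied_grid(100, p, rng))
        largest = max(sizes) if sizes else 0
        print(f"p={p:.1f}  clusters={len(sizes)}  largest={largest}")
```

Sweeping p in this way makes the phase transition visible: on a square grid with this adjacency rule, the largest cluster stays small for low p and grows to span most of the grid once p passes a threshold of roughly 0.59.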
ENDNOTES
1 Go to New England Complex Systems Institute (http://necsi.org/) for more information about the complex system theory.
2 One can consider peer tasks as rules of federation, which govern the QDN but do not violate autonomy of individual nodes.
3 Considering the transience of the QDN structure and the dynamism of the data set, exact query processing with zero false dismissal is not a practical option.