
Rivero, L. Encyclopedia of Database Technologies and Applications. 2006.
Repairing and Querying Databases with Integrity Constraints
(each stable model defines a repair, and each repair is derived from a stable model) and more general than techniques previously proposed.
More specifically, the technique is based on the generation of an extended disjunctive program LP derived from the set of integrity constraints. The repairs for the database can be generated from the stable models of LP, whereas the consistent answers of a query (g,P) can be computed by considering the stable models of the program P ∪ LP over the database D.
Let c be a universally quantified constraint of the form:
∀X [ B1 ∧ ... ∧ Bk ∧ not Bk+1 ∧ ... ∧ not Bn ∧ φ ⊃ B0 ]
then dj(c) denotes the extended disjunctive rule:
¬B’1 ∨ ... ∨ ¬B’k ∨ B’k+1 ∨ ... ∨ B’n ∨ B’0 ← (B1 ∨ B’1), …, (Bk ∨ B’k), (not Bk+1 ∨ ¬B’k+1), …, (not Bn ∨ ¬B’n), φ, (not B0 ∨ ¬B’0),
where B’i denotes the atom derived from Bi by replacing the predicate symbol p with the new symbol pd if Bi is a base atom; otherwise B’i is equal to false. Let IC be a set of universally quantified integrity constraints; then DP(IC) = { dj(c) | c ∈ IC }, whereas LP(IC) is the set of standard disjunctive rules derived from DP(IC) by rewriting the body disjunctions.
Given a database D and a set of constraints IC, LP(IC)_D denotes the program derived from the union of the rules LP(IC) with the facts in D, whereas SM(LP(IC)_D) denotes the set of stable models of LP(IC)_D. Every stable model is consistent, since it cannot contain two atoms of the form A and ¬A. The following example shows how constraints are rewritten.
Example 5. Consider the following integrity constraints:
∀X [ p(X) ∧ not s(X) ⊃ q(X) ]
∀X [ q(X) ⊃ r(X) ]
and the database D containing the facts p(a), p(b), s(a), and q(a).
The derived generalized extended disjunctive program is defined as follows:
¬pd(X) ∨ sd(X) ∨ qd(X) ← (p(X) ∨ pd(X)), (not s(X) ∨ ¬sd(X)), (not q(X) ∨ ¬qd(X)).
¬qd(X) ∨ rd(X) ← (q(X) ∨ qd(X)), (not r(X) ∨ ¬rd(X)).
The above rules can now be rewritten in standard form. Let P be the corresponding extended disjunctive Datalog program. The computation of the program P_D gives the following stable models:
M1 = D ∪ { ¬pd(b), ¬qd(a) },
M2 = D ∪ { ¬pd(b), rd(a) },
M3 = D ∪ { ¬qd(a), sd(b) },
M4 = D ∪ { rd(a), sd(b) },
M5 = D ∪ { qd(b), ¬qd(a), rd(b) } and
M6 = D ∪ { qd(b), rd(a), rd(b) }.
A (generalized) extended disjunctive Datalog program can be simplified by eliminating from the rule bodies all literals whose predicate symbols are derived and do not appear in the head of any rule (these literals cannot be true). As mentioned before, the rewriting of constraints into disjunctive rules is useful for both (1) making the database consistent through the insertion and deletion of tuples and (2) computing consistent answers while leaving the database inconsistent.
CONCLUSION
In the integration of knowledge from multiple sources, two main steps are performed: the first, in which the various relations are merged together, and the second, in which some tuples are removed from (or inserted into) the resulting database in order to satisfy integrity constraints.
The database obtained from the merging of different sources could contain inconsistent data. In this article, we investigated the problem of querying and repairing inconsistent databases. In particular, we presented the different techniques for querying and repairing inconsistent databases (Agrawal et al., 1995; Arenas et al., 1999; Greco & Zumpano, 2000; Lin & Mendelzon, 1996).
FUTURE TRENDS
As a future trend, an interesting topic is the specification of preference criteria for selecting, among a set of feasible repairs, the preferable ones, that is, those better conforming to the specified criteria. Preference criteria introduce desiderata on how to update the inconsistent database in order to make it consistent; thus, they can be considered as a set of desiderata that a generic repair satisfies if possible. Informally, a preferred repair is therefore a repair that better satisfies the preferences.
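As a rough illustration of this idea, the following Python sketch enumerates by brute force the repairs of the database of Example 5 that are minimal under set inclusion, and then filters them with one purely hypothetical preference criterion (fewest deletions). It does not use the disjunctive-program rewriting described in this article; the names is_consistent, minimal_repairs, and preferred are assumptions introduced here, and the search is restricted to the constants a and b.

```python
from itertools import combinations

CONSTANTS = ("a", "b")
PREDICATES = ("p", "q", "r", "s")
ATOMS = [(pred, c) for pred in PREDICATES for c in CONSTANTS]
# Database of Example 5: p(a), p(b), s(a), q(a).
D = frozenset({("p", "a"), ("p", "b"), ("s", "a"), ("q", "a")})


def is_consistent(db):
    """Check 'p(X) and not s(X) implies q(X)' and 'q(X) implies r(X)'."""
    for x in CONSTANTS:
        if ("p", x) in db and ("s", x) not in db and ("q", x) not in db:
            return False
        if ("q", x) in db and ("r", x) not in db:
            return False
    return True


def minimal_repairs(db):
    """Return (insertions, deletions) pairs that are minimal under set inclusion."""
    consistent = []
    for size in range(len(ATOMS) + 1):
        for changed in map(frozenset, combinations(ATOMS, size)):
            if is_consistent(db ^ changed):  # flip membership of the changed atoms
                consistent.append(changed)
    minimal = [c for c in consistent if not any(o < c for o in consistent)]
    return [(c - db, c & db) for c in minimal]


def preferred(repairs):
    """Hypothetical criterion (an assumption): keep the repairs with the fewest deletions."""
    fewest = min(len(deleted) for _, deleted in repairs)
    return [r for r in repairs if len(r[1]) == fewest]


repairs = minimal_repairs(D)
best = preferred(repairs)
print(len(repairs), "minimal repairs;", len(best), "preferred")
```

Under these assumptions the enumeration yields six minimal repairs, one for each of the stable models M1 to M6 of Example 5, and the illustrative criterion keeps the two that only insert tuples. Other criteria would simply replace the body of preferred.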
REFERENCES
Agrawal, S., Keller, A.M., Wiederhold, G., & Saraswat, K. (1995). Flexible relation: An approach for integrating data from multiple, possibly inconsistent databases. Proceedings of the IEEE International Conference on Data Engineering (pp. 495-504).
Arenas, M., Bertossi, L., & Chomicki, J. (1999). Consistent query answers in inconsistent databases. Proceedings of the International Conference on Principles of Database Systems (pp. 68-79).
Arenas, M., Bertossi, L., & Chomicki, J. (2000). Specifying and querying database repairs using logic programs with exceptions. Proceedings of the International Conference on Flexible Query Answering (pp. 27-41).
Baral, C., Kraus, S., Minker, J., & Subrahmanian, V.S. (1991). Combining knowledge bases consisting of first order theories. Proceedings of the International Symposium on Methodologies for Intelligent Systems (pp. 92-101).
Bry, F. (1997). Query answering in information systems with integrity constraints. Proceedings of the IFIP WG 11.5 Working Conference on Integrity and Control in Information System.
Dung, P.M. (1996). Integrating data from possibly inconsistent databases. Proceedings of the International Conference on Cooperative Information Systems (pp. 58-65).
Greco, S., & Zumpano, E. (2000). Querying inconsistent databases. Proceedings of the International Conference on Logic Programming and Automated Reasoning (pp. 308-325).
Greco, S., & Zumpano, E. (2000). Computing repairs for inconsistent databases. Proceedings of the International Symposium on Cooperative Database Systems for Advanced Applications (pp. 33-40).
Greco, G., Greco, S., & Zumpano, E. (2001). A logic programming approach to the integration, repairing and querying of inconsistent databases. Proceedings of the International Conference on Logic Programming.
Greco, G., Sirangelo, C., Trubitsyna, I., & Zumpano, E. (2003). Preferred repairs for inconsistent databases. Proceedings of the International Conference on Database Engineering and Applications Symposium.
Lin, J. (1996). A semantics for reasoning consistently in the presence of inconsistency. Artificial Intelligence, 86(1), 75-95.
Lin, J., & Mendelzon, A.O. (1996). Merging databases under constraints. International Journal of Cooperative Information Systems, 7(1), 55-76.
Lin, J., & Mendelzon, A.O. (1999). Knowledge base merging by majority. In R. Pareschi & B. Fronhoefer (Eds.), Dynamic worlds: From the frame problem to knowledge management. Kluwer.
Ullman, J.K. (2000). Information integration using logical views. 239(2), 189-210.
Wiederhold, G. (1992). Mediators in the architecture of future information systems. IEEE Computer, 25(3), 38-49.
KEY TERMS
Consistent Answer: A set of tuples, derived from the database, satisfying all integrity constraints.
Consistent Database: A database satisfying a set of integrity constraints.
Database Repair: Minimal set of insert and delete operations which makes the database consistent.
Data Integration: A process providing a uniform integrated access to multiple heterogeneous information sources.
Disjunctive Datalog Program: A set of rules of the form:
A1 ∨ … ∨ Ak ← B1, ..., Bm, not Bm+1, …, not Bn, k+m+n>0
where A1,…, Ak, B1,…, Bn are atoms of the form p(t1,..., th), p is a predicate symbol of arity h and the terms t1,..., th are constants or variables.
Inconsistent Database: A database violating some integrity constraints.
Integrity Constraints: Set of constraints which must be satisfied by database instances.
Repairing Inconsistent XML Data with Functional Dependencies
Sergio Flesca
DEIS Università della Calabria, Italy
Filippo Furfaro
DEIS Università della Calabria, Italy
Sergio Greco
DEIS Università della Calabria, Italy
Ester Zumpano
DEIS Università della Calabria, Italy
INTRODUCTION
The World Wide Web is of strategic importance as a global repository for information and a means of communicating and sharing knowledge. Its explosive growth has caused deep changes in all the aspects of human life, has been a driving force for the development of modern applications (e.g., Web portals, digital libraries, wrapper generators, etc.), and has greatly simplified the access to existing sources of information, ranging from traditional DBMS to semi-structured Web repositories. The adoption by the WWW consortium (W3C) of XML (eXtensible Markup Language) as the new standard for information exchange among Web applications has led researchers to investigate classical problems in the new environment of repositories containing large amounts of data in XML format.
Great attention has also recently been devoted to the introduction of integrity constraints and the definition of normal forms for XML (Arenas & Libkin, 2003, 2004; Fan & Libkin, 2002; Vincent & Liu, 2003). XML allows a simple form of constraints to describe references obtained through ID/IDREF, but it does not actually provide a general mechanism for expressing semantic constraints like those commonly used in relational databases. The need to enrich the semantics of XML is pressing, as a large amount of XML data originates in object-oriented and relational databases, where different forms of integrity constraints are used to add semantics to the collected information.
This work stems from the need to enrich the semantics of XML documents. This need is attested by several recent works which introduce different forms of constraints for XML documents (Arenas, Fan & Libkin, 2002, 2004; Buneman et al., 2001, 2002; Fan & Libkin, 2002; Fan & Simeon, 2000; Vincent et al., 2004; Yang, Yu & Wang, 2001). Most of them introduce a simple form of constraints such as keys and foreign keys, whereas some others attempt to extend the class of integrity constraints associated with XML documents.
Obviously, reasoning about constraints in the presence of incomplete knowledge of the data structure is rather complex, so some of these attempts are likely to remain purely theoretical exercises. In fact, their practical applicability depends on the solution of non-trivial problems, such as the implication of and interaction among constraints, which are far from being solved. In the presence of constraints on data, an XML document may turn out to be inconsistent; that is, it may not respect some constraint. The following example shows the case of an inconsistent XML document.
Example 1. Consider the following XML document representing a book collection:
<bib>
  <book isbn="0-451-16194-7">
    <title> A First Course in Database Systems </title>
    <author> Ullman </author> <author> Widom </author>
    <publisher> Prentice-Hall </publisher>
  </book>
  <book isbn="0-451-16194-7">
    <title> Principles of Database and Knowledge-Base Systems </title>
    <author> Ullman </author>
    <publisher> Computer Science Press </publisher>
  </book>
</bib>
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
and the functional dependency bib.book.@isbn → bib.book.title.S (see endnote 1), stating that two books with the same isbn must have the same title.
The above document does not satisfy this functional dependency, as the first and the second book have the same isbn attribute but different titles.
The above example shows that, in general, the satisfaction of constraints cannot be guaranteed; thus, when an XML document must satisfy a set of constraints, we have to manage potential inconsistencies in the data. This problem has recently been investigated for relational databases, and several techniques based on the computation of repairs (minimal sets of insert/delete operations) and consistent answers have been proposed in this context (Arenas, Bertossi & Chomicki, 1999; Greco & Zumpano, 2000). However, these techniques cannot easily be extended to XML data because of the different structure of data and the different nature of constraints.
The document of the previous example can be repaired by performing one of the following minimal sets of update operations:
• replace the string “A First Course in Database Systems” with the title “Principles of Database and Knowledge-Base Systems”;
• replace the string “Principles of Database and Knowledge-Base Systems” with the title “A First Course in Database Systems”;
• assign a new, different value to one of the two isbn attributes so that there are no two books with the same isbn.
This means that the violation of a functional dependency could be repaired by applying several sets of possible update operations, each yielding a consistent version of the information. In our framework, we prefer the repairs performing minimal sets of changes to the original document, in the same way as well-known approaches proposed for relational database repairing.
We also address the problem of extracting “reliable” information from inconsistent documents. To this end, we define two different semantics for queries, which are evaluated on the repaired version of the given document, instead of the (inconsistent) original one. A wider discussion of the notions and contributions introduced in this article is provided in Flesca et al. (2003).
BACKGROUND
A functional dependency A→B in a relational database D models the correspondence between A and B values in the tuples of D. However, there is no standard concept of tuple in the XML context. In this section, we recall the notion of functional dependency in the XML setting proposed in Arenas and Libkin (2004) and Arenas et al. (2002) (see endnote 2), which will be used in the following as a basis for our framework. Before introducing functional dependencies for XML, we present the tree-based representation model which will be adopted in the rest of this work, and then we provide the concept of tree tuple, corresponding to the concept of tuple in relational databases.
From now on, XML documents will be represented by means of labeled trees (called “XML trees”) whose nodes correspond to either elements, attributes, or string values. In particular, nodes corresponding to elements have a single label (representing the tag name), whereas nodes corresponding to attributes have two labels, representing the attribute name and value, respectively. The text content of an element will be represented by a distinguished node (marked with the symbol “S”) labeled with a string equal to the text content of the element. An example of XML tree is shown in Figure 1.
The information represented in an XML document can be extracted by means of path expressions identifying nodes of the corresponding XML tree. A path expression is a sequence of symbols (i.e., tag names, attribute names, or the symbol “S”) occurring in the XML tree, identifying traversals over it. In more detail, the result of a path expression ending with an element name is the set of node identifiers which can be reached, starting from the root, walking through a sequence of nodes whose labels satisfy the given expression. Otherwise, if the path expression ends with either an attribute name or the symbol “S”, the result is a set of strings representing the attribute values (or the text content of elements) which can be reached by means of paths satisfying the given expression. For instance, the path expression bib.book.title applied on the XML tree of Figure 1 returns the set {v12, v22}, whereas bib.book.written_by.author.name.S returns the set {“Ullman”, “Widom”}.
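The evaluation of path expressions described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: since Figure 1 is not reproduced here, it runs on the document of Example 1 (whose paths have no written_by level), the function name eval_path is hypothetical, and element nodes are returned as a list of ElementTree objects standing in for node identifiers.

```python
import xml.etree.ElementTree as ET

# Document of Example 1 (used here in place of the tree of Figure 1).
DOC = """
<bib>
  <book isbn="0-451-16194-7">
    <title>A First Course in Database Systems</title>
    <author>Ullman</author><author>Widom</author>
    <publisher>Prentice-Hall</publisher>
  </book>
  <book isbn="0-451-16194-7">
    <title>Principles of Database and Knowledge-Base Systems</title>
    <author>Ullman</author>
    <publisher>Computer Science Press</publisher>
  </book>
</bib>
"""


def eval_path(root, expr):
    """Evaluate a dot-separated path expression starting from the root.

    Returns a list of element nodes if the expression ends with a tag name,
    or a set of strings if it ends with '@attribute' or with 'S'.
    """
    steps = expr.split(".")
    if steps[0] != root.tag:
        return []
    nodes = [root]
    for step in steps[1:]:
        if step == "S":  # text content of the reached elements
            texts = ((n.text or "").strip() for n in nodes)
            return {t for t in texts if t}
        if step.startswith("@"):  # attribute values of the reached elements
            name = step[1:]
            return {n.attrib[name] for n in nodes if name in n.attrib}
        nodes = [child for n in nodes for child in n if child.tag == step]
    return nodes


root = ET.fromstring(DOC.strip())
print(eval_path(root, "bib.book.author.S"))
print(eval_path(root, "bib.book.@isbn"))
print(len(eval_path(root, "bib.book.title")))
```

On this document, bib.book.author.S yields the two author names, bib.book.@isbn yields the single shared isbn value, and bib.book.title yields two distinct title nodes.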
Informally, a tree tuple groups together nodes of the document which are semantically correlated, according to the structure of the tree. For instance, a tree tuple of the XML tree XT of Figure 1 consists of a sub-tree which contains information about a book. Observe that each book is possibly described by more than one tree tuple, as each tree tuple contains the information of only one author.
Definition 1. Given an XML tree XT, a tree tuple t of XT is a maximal sub-tree of XT such that, for every path expression p defined on XT, t.p contains at most one element.
Example 2. Consider the XML tree XT of Figure 1. The sub-tree of XT shown in Figure 2(a) is a tree tuple, whereas the sub-tree in Figure 2(b) is not a tree tuple (it contains two authors of the same book). This means that each book stored in XT can correspond to more than one tree tuple: each tree tuple corresponds to one of the book authors. Note that no proper sub-tree of the tree in Figure 2(a) is a tree tuple, as it would not be maximal.
Figure 1. An XML tree
Definition 2. Given an XML tree XT, a functional dependency on XT is an expression of the form X→ p, where X is a finite nonempty set of path expressions defined on XT, and p is a single path expression on XT.
Given an XML tree XT and a functional dependency F: X → p, we say that XT satisfies F (written XT ⊨ F) if, for each pair of tree tuples t1, t2 of XT, t1.X = t2.X ∧ t1.X ≠ ⊥ ⇒ t1.p = t2.p, where ⊥ denotes a null (missing) value. Given a set of functional dependencies FD = {F1, …, Fn} over XT, we say that XT satisfies FD if it satisfies Fi for every i ∈ 1..n.
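The two definitions above can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the construction used in the cited works: tree tuples are represented as dictionaries from path expressions to values (element nodes for paths ending in a tag name, strings for paths ending in an attribute name or S), the names tree_tuples and satisfies are hypothetical, and the document is the one of Example 1 because Figure 1 is not reproduced here.

```python
import xml.etree.ElementTree as ET
from itertools import combinations, product

DOC = """
<bib>
  <book isbn="0-451-16194-7">
    <title>A First Course in Database Systems</title>
    <author>Ullman</author><author>Widom</author>
    <publisher>Prentice-Hall</publisher>
  </book>
  <book isbn="0-451-16194-7">
    <title>Principles of Database and Knowledge-Base Systems</title>
    <author>Ullman</author>
    <publisher>Computer Science Press</publisher>
  </book>
</bib>
"""


def tree_tuples(elem, prefix=""):
    """Enumerate tree tuples rooted at elem as dicts {path expression: value}."""
    path = f"{prefix}.{elem.tag}" if prefix else elem.tag
    base = {path: elem}  # element nodes are compared by identity
    for name, value in elem.attrib.items():
        base[f"{path}.@{name}"] = value
    text = (elem.text or "").strip()
    if text:
        base[f"{path}.S"] = text
    # Group the tuples of the children by tag: a tree tuple keeps exactly
    # one child per tag, so that every path reaches at most one node.
    by_tag = {}
    for child in elem:
        by_tag.setdefault(child.tag, []).extend(tree_tuples(child, path))
    tuples = []
    for combo in product(*by_tag.values()):
        merged = dict(base)
        for part in combo:
            merged.update(part)
        tuples.append(merged)
    return tuples


def satisfies(tuples, lhs, rhs):
    """Check the FD lhs -> rhs: equal, non-null lhs values force equal rhs values."""
    for t1, t2 in combinations(tuples, 2):
        same_lhs = all(p in t1 and t1.get(p) == t2.get(p) for p in lhs)
        if same_lhs and t1.get(rhs) != t2.get(rhs):
            return False
    return True


tuples = tree_tuples(ET.fromstring(DOC.strip()))
print(len(tuples))
print(satisfies(tuples, ["bib.book.@isbn"], "bib.book.title.S"))
print(satisfies(tuples, ["bib.book"], "bib.book.title.S"))
```

On the document of Example 1 the sketch produces three tree tuples (one per book author), reports that bib.book.@isbn → bib.book.title.S is violated, and reports that bib.book → bib.book.title.S holds.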
Example 3. Consider the XML tree XT of Figure 1. The constraint that the attribute @ano uniquely identifies the (value of the) name of every author can be expressed with the following functional dependency:
Figure 2. Two sub-trees of the XML tree of Figure 1
bib.book.written_by.author.@ano → bib.book.written_by.author.name.S
To say that two distinct authors of the same book cannot have the same value of the attribute ano, we can use the following FD:
{bib.book, bib.book.written_by.author.@ano} → bib.book.written_by.author
REPAIRING INCONSISTENT XML DATABASES
In this section, we present an approach to the problem of repairing XML documents which are inconsistent w.r.t. a given set of functional dependencies. A possibly inconsistent XML document can be repaired by taking two different kinds of actions:
1. by changing the value of an attribute or the content of an element;
2. by marking some of the attributes or elements of the document as “unreliable”.
Figure 3. An XML tree
Example 4. Consider the XML tree XT of Figure 3, and suppose that we are given the following functional dependency:
{bib.book, bib.book.written_by.author.@ano} → bib.book.written_by.author
XT does not satisfy the above FD, as the two author elements contained in the same book have the same value of the attribute ano, whereas the above FD requires that, for each book, there is only one author having a given ano value.
The constraint in the above example may not be satisfied for two possible reasons: (1) one of the two ano values is incorrect; (2) one of the two author elements is incorrect. Therefore, two repairing strategies are possible. If we assume that the former of the two errors occurs, we are led to change the ano value of one of the authors. That is, we can make XT consistent w.r.t. the given FD by assigning a new value (i.e., a skolem constant, denoted as ‘⊥1’) to the attribute ano of any of the author elements, see Figure 4(a). Otherwise, if we assume that the latter error occurs (i.e., one of the two author elements is incorrect), we choose to mark one of the two authors having the same ano as “unreliable”, see Figure 4(b), where unreliable nodes are marked with the symbol “*”. Marking a node as unreliable means that it contains some information whose trustworthiness is not guaranteed. This implies that when checking a functional dependency on a repaired document, nodes marked as unreliable are not considered. That is, marking nodes as unreliable makes the XML tree consistent w.r.t. the given functional dependency in a weaker sense: if an XML tree XT does not satisfy the functional dependency F: {x1,…,xn} → p, we say that XT weakly satisfies F if, for every pair of tree tuples t1, t2 of XT violating F, either t1.xi or t2.xi is marked as unreliable (for some i ∈ [1..n]), or either t1.p or t2.p is marked as unreliable.
However, in the case of the XML tree of Figure 3, the latter strategy (marking an author element as unreliable) changes a larger portion of the document, since all the sub-elements of author are marked as unreliable, whereas the first strategy only changes its ano. Repair strategies performing smaller changes to the original document will be preferred, in the same way as in well-known approaches in the relational database scenario (Arenas et al., 1999; Greco & Zumpano, 2000). Observe that we prefer marking a node as unreliable rather than deleting it, since removing elements from an XML document can lead to undesired side effects: for instance, deleting a node can yield a new document that does not conform to the DTD associated with the original one.
When evaluating a query (expressed by means of a path expression) on a repaired document, nodes marked as unreliable are not considered.
Our repairing strategy can be summarized as follows. Given a document XT and a functional dependency X → y such that two tuples t1, t2 of XT do not satisfy it (i.e., t1.X = t2.X but t1.y ≠ t2.y), we can repair XT in two ways:
1. change the value of some x ∈ X in one of the two tuples in order to make (if possible) t1.X ≠ t2.X. Two cases may occur:
a. if t1.x and t2.x are nodes, we cannot make a node different from itself; then, we mark t1.x as unreliable;
b. if t1.x and t2.x are strings (attribute or element values), we assign a new, different value (a skolem constant, namely ‘⊥’) to either t1.x or t2.x;
2. change the value of y in one of the two tuples, in order to make (if possible) t1.y = t2.y. Two cases may occur:
a. if t1.y and t2.y are nodes, we cannot merge two distinct nodes into a unique one; therefore, we mark either t1.y or t2.y as unreliable;
b. if t1.y and t2.y are strings (attribute or element values), we assign the value of t1.y to t2.y, or vice versa.
Finally, among all possible repairs, we consider minimal ones.
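The case analysis above can be sketched as a small Python function that lists the candidate update operations for one violating pair of tuples. This is only an illustration: the operation names (assign_skolem, mark_unreliable, copy_value) and the function names are hypothetical, a path is treated as string-valued when it ends with an attribute name or with S, and the selection of minimal repairs is not modeled.

```python
def is_value_path(path):
    """A path denotes a string if it ends with '@attribute' or with 'S'."""
    last = path.split(".")[-1]
    return last == "S" or last.startswith("@")


def candidate_operations(x_paths, y_path):
    """List the candidate updates for a pair (t1, t2) violating X -> y."""
    ops = []
    # Strategy 1: make t1.X differ from t2.X.
    for x in x_paths:
        if is_value_path(x):
            ops.append(("assign_skolem", "t1", x))    # case 1b
            ops.append(("assign_skolem", "t2", x))
        else:
            ops.append(("mark_unreliable", "t1", x))  # case 1a
    # Strategy 2: make t1.y equal to t2.y.
    if is_value_path(y_path):
        ops.append(("copy_value", "t1", "t2", y_path))  # case 2b: t2.y := t1.y
        ops.append(("copy_value", "t2", "t1", y_path))  # case 2b: t1.y := t2.y
    else:
        ops.append(("mark_unreliable", "t1", y_path))   # case 2a
        ops.append(("mark_unreliable", "t2", y_path))
    return ops


# The functional dependency of Example 4.
ops = candidate_operations(
    ["bib.book", "bib.book.written_by.author.@ano"],
    "bib.book.written_by.author",
)
for op in ops:
    print(op)
```

For the dependency of Example 4 the sketch lists five candidates: marking the book node as unreliable, assigning a skolem constant to either ano attribute, and marking either author node as unreliable. A complete procedure would then keep only the candidates that lead to minimal repairs.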
Indeed, for a given document, more than one repair could be minimal. Therefore, when evaluating a query q (expressed by means of a path expression) on an inconsistent document, we consider two possible semantics:
1. Possible answer: Set of nodes (or values) which belong to the answer of q returned on at least one repaired document; and
2. Certain answer: Set of nodes (or values) which belong to the answer of q returned on every possible repaired document.
Figure 4. Two possible repairs of the XML tree of Figure 3
Observe that the answer of a query on a repaired document does not comprise attribute and element values which are marked as unreliable.
Example 5. Consider the XML tree of Figure 3 and the functional dependency bib.book.written_by.author.@ano → bib.book.written_by.author.name.S, stating that two authors with the same ano must have the same name. For the path query bib.book.title.S, both the possible and certain answers consist of the set {“Elements of the Theory of Computation”}. In fact, no possible repair of the given XML tree changes the element title.
As regards the path query bib.book.written_by.author.name.S, the possible answer is the set {“Lewis”, “Papadimitriou”}, whereas the certain answer is the empty set. In fact, four minimal repairs can be performed. Two of these repairs consist of replacing one of the two author names with the new value ⊥, whereas the other two repairs consist of replacing the string “Papadimitriou” with “Lewis”, and, respectively, the string “Lewis” with “Papadimitriou”. Therefore, there exists at least one repair yielding a document where the string “Lewis” does not occur and at least one repaired document which does not contain the string “Papadimitriou”.
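The two semantics can be sketched in Python as a union and an intersection over the answers obtained on the individual repaired documents. The sketch below hard-codes the four answer sets described in Example 5 instead of computing the repairs; the marker SKOLEM and the function names are assumptions introduced for illustration, and the fresh value is assumed not to be reported in either answer.

```python
SKOLEM = object()  # hypothetical stand-in for the fresh value assigned by a repair


def possible_answer(answers):
    """Values returned on at least one repaired document."""
    if not answers:
        return set()
    return set().union(*answers) - {SKOLEM}


def certain_answer(answers):
    """Values returned on every repaired document."""
    if not answers:
        return set()
    return set.intersection(*map(set, answers)) - {SKOLEM}


# Answers to the author-name query on the four minimal repairs of Example 5.
answers_per_repair = [
    {SKOLEM, "Papadimitriou"},  # "Lewis" replaced with a new value
    {"Lewis", SKOLEM},          # "Papadimitriou" replaced with a new value
    {"Lewis"},                  # "Papadimitriou" replaced with "Lewis"
    {"Papadimitriou"},          # "Lewis" replaced with "Papadimitriou"
]

print(possible_answer(answers_per_repair))
print(certain_answer(answers_per_repair))
```

With these four answer sets, the possible answer contains both names and the certain answer is empty, in agreement with the example.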
FUTURE TRENDS
We are currently investigating how to extend our repairing strategy when further constraints (such as DTD and inclusion dependencies) are defined.
CONCLUSION
We have defined a framework for repairing XML documents which are inconsistent w.r.t. a given set of functional dependencies. Repairs consist of minimal sets of update operations which either change attribute and element values or mark elements as unreliable. We have also addressed the problem of extracting reliable information from inconsistent documents after applying our repairing strategy.
REFERENCES
Arenas, M., Bertossi, L., & Chomicki, J. (1999). Consistent query answers in inconsistent databases. Proceedings of the Symposium on Principles of Database Systems (PODS), Philadelphia.
Arenas, M., Fan, W., & Libkin, L. (2002). On verifying consistency of XML specifications. Proceedings of the Symposium on Principles of Database Systems (PODS), Madison, Wisconsin.
Arenas, M., Fan, W., & Libkin, L. (2004). Consistency of XML specifications. In L. Bertossi, A. Hunter, & T. Schaub (Eds.), Inconsistency tolerance. Springer-Verlag.
Arenas, M., & Libkin, L. (2003). An information-theoretic approach to normal forms for relational and XML data. Proceedings of the 22nd Symposium on Principles of Database Systems (PODS), New York.
Arenas, M., & Libkin, L. (2004). A normal form for XML documents. ACM Transactions on Database Systems, 29(1).
Buneman, P., Davidson, S.B., Fan, W., Hara, C.S., & Tan, W.C. (2001). Reasoning about keys for XML. Revised Papers from the 8th International Workshop on Database Programming Languages (DBPL), Frascati, Italy.
Buneman, P., Davidson, S.B., Fan, W., Hara, C.S., & Tan, W.C. (2002). Keys for XML. Computer Networks, 39(5).
Fan, W., & Libkin, L. (2002). On XML integrity constraints in the presence of DTDs. Journal of the ACM, 49(3).
Fan, W., & Simeon, J. (2000). Integrity constraints for XML. Proceedings of the 19th ACM Symposium on Principles of Database Systems (PODS), New York.
Flesca, S., Furfaro, F., Greco, S., & Zumpano, E. (2003). Repairs and consistent answers for XML data with functional dependencies. Proceedings of the XML Database Symposium (XSym), Berlin, Germany.
Greco, S., & Zumpano, E. (2000). Querying inconsistent databases. Proceedings of the 7th International Conference on Logic for Programming and Automated Reasoning (LPAR), Reunion Island, France.
Liu, J., Vincent, M.W., & Liu, C. (2003). Functional dependencies, from relational to XML. Proceedings of the 5th Ershov Memorial Conference Perspectives of System Informatics (PSI), Novosibirsk, Russia.
Vincent, M.W., Schrefl, M., Liu, J., Liu, C., & Dogen, S. (2004). Generalized inclusion dependencies in XML. Proceedings of the 6th Asia Pacific Web Conference (APWeb), Hangzhou, China.
Vincent, M.W., & Liu, J. (2003). Functional dependencies for XML. Proceedings of the 5th Asia Pacific Web Conference (APWeb), Xian, China.
Yang, X., Yu, G., & Wang, G. (2001). Efficiently mapping integrity constraints from relational database to XML document. Proceedings of the 5th East European Conference on Advances in Databases and Information Systems (ADBIS), Vilnius, Lithuania.
KEY TERMS
Consistent Answer: Data satisfying both the query and all integrity constraints defined on the given database.
Consistent Database: A database satisfying a set of integrity constraints.
Data Integration: A process providing a uniform integrated access to multiple heterogeneous information sources.
Database Repair: Minimal set of insert and delete operations which makes the database consistent.
Inconsistent Database: A database violating some integrity constraint.
Integrity Constraints: Set of constraints which must be satisfied by database instances.
ENDNOTES
1 The symbol “S” is used to extract the text content from an element.
2 An alternative definition has been proposed in Vincent and Liu (2003) and Liu, Vincent, and Liu (2003).
Replication Mechanisms Over a Set of Distributed UDDI Registries
Zakaria Maamar
Zayed University, UAE
INTRODUCTION
This paper presents a research initiative that aims at developing replication mechanisms for the dynamic management of the content of several Universal Description, Discovery, and Integration (UDDI) registries (Curbera et al., 2003). These replication mechanisms are intended to be deployed in an environment of Web services (Papazoglou & Georgakopoulos, 2003). By content of a UDDI registry, we mean the announcements of Web services that providers post on the UDDI registry. Unlike other research initiatives in the field of Web services that essentially consider a single UDDI registry and assume a wired and stable communication infrastructure, the following aspects constitute the core of this research initiative:
• Several UDDI registries are spread across different regions. A UDDI registry is aware of the presence of other peers but does not perform any direct exchange of information on its content with them. The UDDI registries may belong to different businesses, have different usage policies, and pose various requirements on acceptable announcements and retrieval demands of Web services.
• There is no predefined communication infrastructure between the distributed UDDI registries. A wired or wireless infrastructure for direct interactions can be set up after assessing the importance of the exchange between the UDDI registries. In addition, a UDDI registry may disappear if its owner decides to withdraw it.
• There is no centralized component that manages and coordinates the UDDI registries. It is noted that a central authority has always constituted a bottleneck in the operation of a system (Penserini et al., 2003). On the one hand, each UDDI registry is independent in defining the announcements of providers that it accepts and the retrieval demands of users that it satisfies. The definition of what to accept and what to satisfy is based on a set of UDDI registry-defined policies. On the other hand, each provider is independent in selecting the UDDI registries to which it will post its announcements of Web services. The selection of where to post is based on a set of provider-defined policies.
In a Web services scenario, a UDDI registry participates in two operations. The first operation consists of receiving the announcements of the description of Web services (also called services) from providers. After the announcements are posted, the second operation consists of searching the registry content for the services that satisfy users’ specific needs. Examples of needs are numerous, varying from hotel booking and car rental to weather forecasts. The search consists of identifying the relevant services and indicating who offers them so that the identified services can be triggered after a potential composition (Casati et al., 2003). It is accepted that the advantages of Web services are highlighted by their capacity to be composed into high-level business processes referred to as composite services (Berardi et al., 2003). However, since the announcements of services are submitted to multiple UDDI registries, the content differs across the registries.
Targeting the dynamic management of multiple UDDI registries overlaps with the well-known problem of total replication of information over a set of distributed databases. An immediate solution to the dynamic management of UDDI registries is to flood the communication infrastructure with the new content of any UDDI registry that has been subject to changes. Changes in UDDI registries are expected to become frequent as the number of Web services of providers continues to grow. While flooding seems to be a suitable solution in the context of a wired communication infrastructure, the lack of a reliable and permanent communication infrastructure is a major obstacle to deploying this solution. It was observed that the traditional database approaches for collecting, caching, and indexing data of interest in monolithic contexts become obsolete in global computing contexts (Karakasidis & Pitoura, 2002). In addition, unlike the case of a wired environment, the assumptions of large bandwidth availability, low error rates, and always-on connectivity are invalidated in a wireless environment. Therefore, another alternative is required for the dynamic management of the content of UDDI registries. It will be outlined throughout this article how mobile users constitute the vehicles for supporting the content exchange between the UDDI registries (i.e., content replication). It is important to stress that this support is provided in a way that is transparent to users because of the use of software agents, which act on their behalf (Jennings, Sycara & Wooldridge, 1998).
Each UDDI registry is associated with a structure known as a cluster of Web services. Several clusters exist across the communication infrastructure so that providers can connect to the most appropriate cluster according to criteria such as proximity to a cluster and its workload status. The connection between providers and clusters is wired. For tracking purposes, a provider cannot be connected to more than one cluster, which means that a provider cannot post its announcements of Web services on the UDDI registries of multiple clusters. The cluster in which a provider declares its services for the first time is called the master. Of particular interest is the situation where providers have similar Web services but announce them in separate UDDI registries. In Benatallah, Sheng, and Dumas (2003), service similarity is explained with the concept of service communities, where alliances are formed among a potentially large number of services performing the same operation types. Unless appropriate exchange mechanisms are made available, a UDDI registry would never be aware of the existence of similar services in its peer registries. Besides that, users wishing to satisfy their needs by triggering or composing Web services should be given the opportunity to consider all the existing services regardless of where they are announced. The two aforementioned scenarios (i.e., service similarity and users’ needs) shed light on the importance of supporting a content exchange between the UDDI registries. This content exchange requires deploying appropriate replication mechanisms.
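The one-master rule for providers can be illustrated with a minimal Python sketch. It is not part of the original proposal: the names ClusterDirectory, ClusterBindingError, announce, and master_of are hypothetical, and the sketch assumes that the first cluster a provider announces in is recorded as its master and that any later announcement in a different cluster is rejected.

```python
class ClusterBindingError(Exception):
    """Raised when a provider tries to announce in a second cluster."""


class ClusterDirectory:
    """Hypothetical helper that records the master cluster of each provider."""

    def __init__(self):
        self._master_of = {}  # provider id -> cluster id

    def announce(self, provider, cluster, service):
        # The first cluster a provider announces in becomes its master.
        master = self._master_of.setdefault(provider, cluster)
        if master != cluster:
            # A provider cannot post announcements in more than one cluster.
            raise ClusterBindingError(
                f"{provider} is bound to {master}; cannot announce in {cluster}"
            )
        return (master, service)

    def master_of(self, provider):
        # Returns None for a provider that has not announced anything yet.
        return self._master_of.get(provider)
```

Keeping the binding in a single mapping makes the tracking purpose explicit: the master of any provider can be looked up without scanning the registries.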
Part of the solution to the dynamic management problem of UDDI registries relies on users who, first, are mobile and, second, have mobile devices (e.g., cell phones, personal digital assistants). The other part of the solution relies on software agents. It is accepted that software agents are suitable candidates for performing the composition operations of Web services on behalf of users (Huhns, 2002; Kuno & Sahai, 2002). The solution outlined in this article combines users and software agents to constitute what we refer to as messengers. Briefly, messengers operate as follows: a software agent resides in the mobile device of a user. The agent caches a description of the list of Web services that were involved in the satisfaction of a user’s needs. On behalf of providers, users post services on various UDDI registries that are associated with clusters known as slaves. Because users have mobile devices, mobile support stations manage these devices when it comes to identifying their physical location and handling their incoming and outgoing messages/calls (Maamar, Ben-Younes & Al-Khatib, 2003). A mobile support station communicates with mobile users within its radio coverage area, known as a wireless cell. For the needs of this initiative, each cluster of Web services is attached to a mobile support station. Therefore, when a user enters a new cell (i.e., the user comes under the coverage area of a new mobile support station), an exchange of information between the software agent of the user and the UDDI registry is conducted. This exchange enables updating the content of this registry. Note that users do not have to visit all the clusters. Their association with a mobile support station depends on their route to various places such as work, the gym, and so forth.
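The operation of a messenger can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors’ implementation: UddiRegistry is an in-memory stand-in rather than a real UDDI API, the names MessengerAgent, record_use, on_enter_cell, and merge are hypothetical, and the exchange is assumed to flow only from the agent’s cache to the registry of the cell being entered.

```python
class UddiRegistry:
    """In-memory stand-in for the UDDI registry of one cluster (assumption)."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.services = {}  # service name -> provider

    def merge(self, descriptions):
        """Add descriptions not yet known; return the names that were new."""
        added = []
        for name, provider in descriptions.items():
            if name not in self.services:
                self.services[name] = provider
                added.append(name)
        return added


class MessengerAgent:
    """Hypothetical software agent residing on a user's mobile device."""

    def __init__(self):
        # Descriptions of the services involved in satisfying the user's needs.
        self.cache = {}

    def record_use(self, name, provider):
        self.cache[name] = provider

    def on_enter_cell(self, registry):
        """Called when the device comes under a new mobile support station."""
        return registry.merge(self.cache)
```

In this sketch a registry grows only when an agent that has used an unknown service happens to pass through its cell, which is why the content of the registries depends on the routes that users take.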
Because a UDDI registry receives information on Web services from two independent sources, namely, providers of Web services and agents of users, the services are divided into two types: internal and external. Internal services are announced in the UDDI registry of a master cluster (the providers take care of the announcements). This registry has full control over the internal services by guaranteeing, for example, their QoS arguments. External services are always announced in the UDDI registry of a slave cluster (the agents of users take care of the announcements). This registry cannot, for example, guarantee the QoS arguments of the external services or their availability in their respective provider hosts for triggering purposes. Handling the features of external services constitutes one of the challenges of managing the content of several UDDI registries.
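One possible way to represent the internal/external distinction is sketched below in Python. The names Origin, Entry, qos_guaranteed, and announce are hypothetical, the registry is modelled as a plain dictionary, and the rule that an internal entry is never overwritten by a later agent announcement is an assumption of this sketch rather than something stated in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    INTERNAL = "internal"  # announced by the provider in its master cluster
    EXTERNAL = "external"  # announced by a user's agent in a slave cluster


@dataclass(frozen=True)
class Entry:
    service: str
    provider: str
    origin: Origin

    @property
    def qos_guaranteed(self):
        # Only the master registry controls the service, so only internal
        # entries carry a QoS guarantee in this sketch.
        return self.origin is Origin.INTERNAL


def announce(registry, service, provider, by_provider):
    """Store an entry in the registry dict, typed by who announced it."""
    origin = Origin.INTERNAL if by_provider else Origin.EXTERNAL
    existing = registry.get(service)
    # Assumption: an agent announcement never replaces an existing entry,
    # whereas a provider announcement always takes precedence.
    if existing is None or origin is Origin.INTERNAL:
        registry[service] = Entry(service, provider, origin)
    return registry[service]
```

A search over such a registry could then report, for each match, whether its QoS arguments are backed by the registry or merely relayed by a messenger.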
In this initiative, the exchange of the content of the UDDI registries does not target a total replication. Instead, the aim is a partial replication that evolves over time toward the level of a total replication. A total replication between the UDDI registries might happen subject to the following factors:
• The route of users: users are not forced to visit all the clusters so that the UDDI registries are fed with new content. The involvement of users in the dynamic management of the UDDI registries does not have to be a burden on them. Because the routes of users are diverse, a UDDI registry is updated each time a user is linked to the mobile support station of a new cluster and is thus in the vicinity of a new UDDI registry.
• The different policies that exist, such as provider-defined policies: for instance, each regis-