
Kryszkiewicz, M. (2001). Comparative study of alternative types of knowledge reduction in inconsistent systems. International Journal of Intelligent Systems, 16(1), 105-120.
Levy, A. (2000). Logic-based techniques in data integration. In J. Minker (Ed.), Logic-based artificial intelligence (pp. 575-596). Dordrecht: Kluwer.
Link, S. (2003). Consistency enforcement in databases. In Proceedings of the 2nd International Workshop on Semantics in Databases 2001, Lecture Notes in Computer Science, 2582 (pp. 139-159).
Lloyd, J. W. (1987). Foundations of logic programming.
Berlin: Springer.
MacGregor, R., & Ko, I.-Y. (2003). Representing contextualized data using semantic Web tools. Electronic Proceedings of the ISWC'03 Workshop on Practical and Scalable Semantic Web Systems. Retrieved September 16, 2004, from http://km.aifb.uni-karlsruhe.de/ws/psss03/proceedings/macgregor-et-al.pdf
Marquis, P., & Porquet, N. (2003). Resource-bounded paraconsistent inference. Annals of Mathematics and Artificial Intelligence, 39(4), 349-384.
Mayol, E., & Teniente, E. (2003). Consistency preserving updates in deductive databases. Data and Knowledge Engineering, 47(1), 61-103.
Minker, J. (1999). Logic and databases: A 20 year retrospective. In F. Pirri & H. Levesque (Eds.), Logical foundations for cognitive agents (pp. 234-299). Berlin: Springer.
Pan, Z., & Heflin, J. (2003). DLDB: Extending relational databases to support semantic Web queries. Electronic Proceedings of the ISWC'03 Workshop on Practical and Scalable Semantic Web Systems. Retrieved September 16, 2004, from http://km.aifb.uni-karlsruhe.de/ws/psss03/proceedings/pan-et-al.pdf
Pradhan, S. (2003). Connecting databases with argumentation. In Web Knowledge Management and Decision Support, 14th International Conference on Applications of Prolog (INAP 2001), Lecture Notes in Computer Science, 2543 (pp. 170-185).
Reiter, R. (1984). A logical reconstruction of relational database theory. In J. Mylopoulos & J. W. Schmidt (Eds.), On conceptual modelling (pp. 163-189). New York: Springer.
Logic Databases and Inconsistency Handling
KEY TERMS
Arguments: An argument in a KDB is a pair (∆,F), where ∆ is a subset of the KDB such that ∆ entails F. The basic relation among arguments is rebutting.
Closed World Assumption: A principle stating that every atom not entailed by the KDB is assumed to be false. This principle is sound for KDBs with a simple syntax, such as logic programs.
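As a small illustration (the toy program and all names are invented for the example), the Closed World Assumption over the least model of a definite logic program can be sketched as:

```python
# Minimal sketch of the Closed World Assumption for a Datalog-style KDB.
# The facts, rules, and predicate names below are illustrative only.

def least_model(facts, rules):
    """Least Herbrand model of ground facts and ground Horn rules
    (head, [body atoms]), computed by naive fixpoint iteration."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if all(b in model for b in body) and head not in model:
                model.add(head)
                changed = True
    return model

facts = {("parent", "ann", "bob")}
rules = [(("ancestor", "ann", "bob"), [("parent", "ann", "bob")])]
m = least_model(facts, rules)

def holds(atom, model):
    # CWA: any atom not entailed by the KDB is assumed false.
    return atom in model

assert holds(("ancestor", "ann", "bob"), m)
assert not holds(("parent", "bob", "ann"), m)  # not entailed, so assumed false
```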
Contextualizing Logics: A method to formally represent knowledge together with the particular circumstance in which it has its intended meaning.
Description Logics: Logical formalisms to represent structured concepts and the relationships among them. Formally, a description logic is a subset of FOL dealing with concepts (monadic predicates) and roles (binary predicates), which are used to relate concepts. A KDB in DL is composed of a TBox (the intensional component) and an ABox (box of assertions, the extensional component).
Logical Inconsistency: A logical theory is inconsistent if there is no logical model for it. In the logic database paradigm, the notion of inconsistency is usually restricted to express the violation of integrity constraints. This restriction makes sense when the data are atomic formulas.
Ontology Web Language (OWL): A language (based on description logics) designed to represent ontologies capable of being processed by machines. The World Wide Web Consortium released OWL as a recommendation (see http://www.w3.org/2001/sw/webOnt).
Paraconsistent Logics: Logic systems that limit the power of classical inference relations so that nontrivial information can be retrieved from inconsistent sets of formulas.
Reiter’s Formalization of Database Theory: A set of axioms that, when added to a relational database, formalizes reasoning with it. They are the Unique Names Principle, the Domain Closure Axiom, the Completion Axioms, and the Equality Axioms. This formalization permits query answering to be identified with logical consequence.
Skolem Noise: A kind of anomalous answer obtained by resolution-oriented theorem provers when they work on nonclausal theories. The classical method of skolemization introduces new function symbols with no intended meaning; if these symbols appear in the output, the answer may not be consistently interpreted.
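A small illustration of how skolemization produces such symbols (the predicate and constant names are invented for the example):

```latex
% Skolemizing an existential under a universal introduces an
% uninterpreted function symbol f:
\forall x\,\exists y\; \mathit{supervisor}(x,y)
\;\leadsto\;
\forall x\; \mathit{supervisor}(x, f(x))
% A prover may then return an answer such as y = f(\mathit{john}).
% Since f names no individual of the database domain, this answer
% cannot be consistently interpreted: that is Skolem noise.
```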

Main Memory Databases
Matthias Meixner
Technische Universität Darmstadt, Germany
INTRODUCTION
For a long time, hard disks were the only technology that could store enough information to hold a database and offer random access at the same time. Therefore, conventional database management systems were tuned to exploit this technology to the maximum. But in recent years, main memory has become cheaper and larger, to the point that for some fields of application the whole content of a database can be kept in main memory, speeding up operation considerably. This article focuses on the differences from conventional databases that affect both the performance and the internal structure of a database management system.
BACKGROUND
When storing information in main memory, many design decisions that were based on disk storage are no longer valid, and different steps have to be taken to achieve maximum performance, since main memory and disk storage have very different access performance characteristics. The access time of main memory is orders of magnitude smaller than that of disk storage, but on the other hand main memory is volatile, while disk storage is not. Disk accesses exhibit a high fixed cost per access due to seek time. Therefore, to achieve good performance, accesses should be sequential and transfer large amounts of needed data; that is, data layout on disk is critical. In contrast, the access time of main memory is far less dependent on the location, and therefore data layout is much less critical, although this is changing since the access time of on-processor caches is improving faster than that of main memory. These differences affect almost every aspect of database management. We will discuss them in the following sections.
MAIN THRUST
Main Differences to Disk Resident Databases
Both conventional disk resident database management systems (DRDBMSs) and main memory database management systems (MMDBMSs) process data in main memory, and both keep a (backup) copy on disk: if the cache of a DRDBMS is large enough, sooner or later the whole database will be in cache, and an MMDBMS needs to store a backup copy on disk so that it is able to recover from failures. So what is the main difference between a DRDBMS and an MMDBMS? The key difference is that in an MMDBMS the primary copy of the database lives permanently in main memory (Garcia-Molina & Salem, 1992), and this has important implications for the algorithms and data structures used. Even if the whole database of a DRDBMS is cached in main memory, it will not provide the best performance, since a DRDBMS is not tuned for this case. A DRDBMS cannot rely on data being present in main memory at all times; therefore, each data access has to go through the buffer manager to make sure the data is still in main memory. Index structures in DRDBMSs are designed for disk access and may trade computing power and storage efficiency for a lower number of disk accesses, since disk access is the dominating factor in DRDBMS processing time. This overhead is incurred even if all data is cached in main memory. In MMDBMSs the situation is different: since data is guaranteed to stay present in main memory, index structures and all other parts of the database do not need to consider disk access and can be tuned for low computational cost.
Performance and Data Structures
In DRDBMSs disk accesses have traditionally been the bottleneck since in the time required for one single disk access, a processor can perform up to several million instructions. Therefore, the most important optimizations in disk-based systems are to reduce the number of disk accesses, to prefer sequential access, and to keep the processor busy while waiting for I/O. To reduce the number of disk accesses, caching is used, and special index structures, like the B+ tree, were developed. Data that is used together in a database is grouped on disk to be able to access it using one single sequential read. A high degree of concurrency is employed to keep the processor busy while other transactions are waiting for I/O, and therefore a small locking granularity down to record level locking is used.
In the case of MMDBMSs all these optimizations are not relevant anymore; instead, performance is only determined by the CPU efficiency of the algorithms used.
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
TEAM LinG
MMDBMSs have an edge over disk-based systems in this respect, since many simplifications can be used that reduce the processing power required when data is guaranteed to be in main memory. A DRDBMS cannot rely on data being cached in main memory; therefore, each access has to go through the buffer manager to bring data into main memory or to make sure that data is already there. In an MMDBMS this step does not exist and therefore does not consume processing power. Pointers can be used to address data: instead of storing attribute values in an index, pointers to them can be used, reducing the size of the index and reducing the complexity of dealing with long or variable-length fields.
Traditional index structures are also affected: Since disk accesses do not occur any more, index structures can be tuned for a low consumption of main memory and processing power. One result of this is the T-tree (Lehman & Carey, 1986). It is a binary search tree whose nodes contain more than one item to reduce the number of calls to the memory management and to reduce the amount of memory consumed by pointers.
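The T-tree idea can be sketched as follows. The node capacity and the class and method names below are illustrative choices, not the original design, which also distinguishes several node types and rebalances like an AVL tree:

```python
# Sketch of the T-tree idea (Lehman & Carey, 1986): a binary search tree
# whose nodes each hold a small sorted array of keys, so fewer nodes
# (and fewer pointers) are needed than in a one-key-per-node BST.
import bisect

class TNode:
    CAPACITY = 4          # several items per node

    def __init__(self):
        self.keys = []    # sorted keys stored in this node
        self.left = None
        self.right = None

    def insert(self, key):
        if len(self.keys) < self.CAPACITY or self.keys[0] <= key <= self.keys[-1]:
            bisect.insort(self.keys, key)
            if len(self.keys) > self.CAPACITY:   # overflow: spill the minimum
                self._insert_child("left", self.keys.pop(0))
        elif key < self.keys[0]:
            self._insert_child("left", key)
        else:
            self._insert_child("right", key)

    def _insert_child(self, side, key):
        child = getattr(self, side)
        if child is None:
            child = TNode()
            setattr(self, side, child)
        child.insert(key)

    def search(self, key):
        if self.keys and self.keys[0] <= key <= self.keys[-1]:
            i = bisect.bisect_left(self.keys, key)
            return i < len(self.keys) and self.keys[i] == key
        if not self.keys or key < self.keys[0]:
            return self.left is not None and self.left.search(key)
        return self.right is not None and self.right.search(key)

root = TNode()
for k in [50, 20, 80, 10, 30, 60, 90, 25]:
    root.insert(k)
assert root.search(25) and not root.search(70)
```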
Since a new access-time gap between on-processor caches and main memory has opened up, data layout becomes more critical again. Cache-sensitive search trees (CSS-trees; Rao & Ross, 1999) address this problem and optimize cache performance to achieve an additional speedup.
Pointers can also be passed to the applications, eliminating temporary copies of data. While this may boost performance, there is the danger that an application modifies unauthorized parts of the database. This can be circumvented by using a special compiler for the transactions that enforces checking of proper authorization and logs every object modification (Garcia-Molina & Salem, 1987).
Commit Processing
From a user’s point of view MMDBMSs should not differ from conventional database systems apart from the higher performance. Therefore MMDBMSs must be able to support the ACID properties known from conventional database systems. This has some important implications that will be described in the following sections.
Recovery
A commit must be able to guarantee persistence. Since data in main memory is volatile, some mechanism must exist that protects data in case of a failure. One common mechanism is to use logging to disk alongside backup copies of the data. In this case, before a transaction can commit, its activity records must be written to the log. This can greatly affect response time, since each transaction must wait for at least one stable write before committing. Although this is also the case in DRDBMSs, it is more severe in main memory systems because logging represents the only disk access; therefore, logging may become the bottleneck for the whole system. This problem can be mitigated by using precommitting and group-commit (DeWitt et al., 1984). Precommitting releases locks as soon as the log records are placed in the log but before they have actually been written to disk. This reduces the blocking delay imposed on other transactions, but the response time of the precommitting transaction itself is not improved. Group-commit accumulates the log records of several transactions and flushes them to disk in one write operation. While the number of I/O operations is reduced, relieving the I/O bottleneck, the response time may even increase due to the accumulation of log records.
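The group-commit idea can be sketched as follows. The buffer size, flush policy, and all names are illustrative; a real implementation would also flush on a timeout so that a lone transaction is not delayed indefinitely:

```python
# Sketch of group-commit: log records from several transactions are
# accumulated and flushed with a single stable write, trading a short
# extra wait for far fewer I/O operations.
class GroupCommitLog:
    def __init__(self, group_size, write_to_disk):
        self.group_size = group_size
        self.write_to_disk = write_to_disk   # stable-write callback
        self.buffer = []                     # pending log records
        self.writes = 0                      # stable writes performed

    def commit(self, log_record):
        self.buffer.append(log_record)
        if len(self.buffer) >= self.group_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_to_disk(self.buffer)  # one write for many commits
            self.writes += 1
            self.buffer = []

disk = []
log = GroupCommitLog(group_size=4, write_to_disk=disk.extend)
for t in range(8):
    log.commit(f"txn-{t}")
assert log.writes == 2 and len(disk) == 8   # 8 commits, only 2 stable writes
```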
Since disk access is the bottleneck, the key to a faster response time is to eliminate the disk access and yet be able to guarantee persistence. One very expensive solution is to use solid-state disks that are a drop-in replacement for conventional hard disks but store data in battery- (and disk-) backed DRAM and therefore offer very fast access times. Flash ROM cannot be used as a logging device due to the limited number of writes supported by this memory technology. The write-intensive use would wear out flash ROM within weeks to months, depending on the actual use (assuming 1,000,000 supported write cycles, though current numbers are closer to 100,000, and one write every 10 seconds to the same place in memory, flash ROM would last for about 120 days).
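The endurance estimate in parentheses can be checked by direct arithmetic:

```python
# The flash-endurance estimate from the text, reproduced as arithmetic:
# 1,000,000 write cycles at one write every 10 seconds to the same location.
write_cycles = 1_000_000
seconds_per_write = 10
lifetime_days = write_cycles * seconds_per_write / 86_400  # seconds per day
assert 115 < lifetime_days < 116   # about 115.7 days, i.e., roughly 120 days
# With the more realistic 100,000 cycles mentioned in the text, the same
# workload wears out the device in under two weeks:
assert 100_000 * seconds_per_write / 86_400 < 14
```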
Another solution to guarantee persistence is logging over a network using, for example, a distributed, fault-tolerant write cache (Mao, Zhang, Wang, & Zheng, 2002). Probably the most extreme solution is to guarantee persistence by means of fault tolerance in a distributed system: in case of a failure, data is not lost, because at least one copy still exists in the main memory of one node of the distributed system. This not only improves response time, since network access is much faster than disk access, but at the same time such a system is able to provide high availability (Meixner & Buchmann, 2004).
Backup
Main memory is volatile and is lost in case of a power failure. Unless special care is taken, like in Bressoud, Clark, and Kan (2001), data is not only lost after a failure but even during a simple restart of the system. Therefore, a backup copy of the database must exist. One obvious solution is to back up data to disk(s). In case of a failure this backup can be used together with the logs to recover
the database. To keep the time needed for a recovery short, the backup must be kept up-to-date. One commonly used mechanism is checkpointing (Woo, Kim, & Lee, 1997). Since the backup disk is only accessed by the checkpointer, which backs up data in the background, and no application has to wait for I/O to this disk, the I/O can be tailored to meet the needs of the checkpointer alone. Therefore, a very large block size can be used, since large blocks can be written to disk more efficiently.
Backing up MMDBMSs has some problems not found in DRDBMSs: while in DRDBMSs data already has a well-known location and structure on disk, this is not the case in MMDBMSs. Instead, an MMDBMS has to answer the questions of how and where data should be stored and how pointers should be treated. Solutions to this problem are presented in Lin and Dunham (1996) and Salem and Garcia-Molina (1989).
Concurrency Control
As mentioned above, in DRDBMSs a high degree of concurrency is desirable to keep the processor(s) busy while other transactions are waiting for disk I/O. To achieve maximum parallelism, locking granularity should be as small as possible. In an MMDBMS waiting for disk I/O is not an issue; therefore, the higher overhead of locking outweighs the advantage of small locking granules. It has been suggested that very large lock granules, e.g., relations or even the entire database, are most appropriate. Even a serial execution of transactions, which eliminates the need for locking, can be advantageous in MMDBMSs (Blott & Korth, 2002).
Another advantage of MMDBMSs results from the coarser locking granularity. If tuples are locked on demand, deadlocks may occur: since the tuples to be locked are not known in advance, deadlocks cannot be avoided. This is not acceptable in real-time systems, since deadlocks invalidate any runtime estimation. On the other hand, if whole relations are locked, the relations to be locked can be determined in advance, since the relations involved are listed in the transaction. This allows running in parallel only those transactions that do not have conflicts, thus avoiding deadlocks completely.
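This deadlock-free strategy can be sketched as follows. The transaction names and relation sets are invented, and a real scheduler would admit transactions incrementally rather than in fixed batches:

```python
# Sketch of deadlock-free scheduling with relation-level locks: each
# transaction declares up front the relations it touches, and only
# transactions with pairwise-disjoint relation sets run concurrently.
def conflict_free_batches(transactions):
    """transactions: list of (name, set_of_relations). Returns batches of
    transaction names that may run in parallel without lock conflicts."""
    batches = []                             # list of (names, locked relations)
    for name, rels in transactions:
        for batch, locked in batches:
            if locked.isdisjoint(rels):      # no shared relation: no conflict
                batch.append(name)
                locked |= rels
                break
        else:                                # conflicts with every open batch
            batches.append(([name], set(rels)))
    return [batch for batch, _ in batches]

txns = [("T1", {"orders"}), ("T2", {"stock"}),
        ("T3", {"orders", "stock"}), ("T4", {"customers"})]
print(conflict_free_batches(txns))  # [['T1', 'T2', 'T4'], ['T3']]
```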
FUTURE TRENDS
For some fields of application, MMDBMSs offer significant advantages over DRDBMSs. But some of the mechanisms developed for MMDBMSs can also be integrated into DRDBMSs: for example, the advantage of pointers can be exploited by using the page-fault mechanism to swizzle (convert) pointers to an in-memory representation as soon as an object actually gets used (Wilson & Kakkad, 1992).
As DRDBMSs perform more and more in-memory optimizations, they come closer to MMDBMSs. Depending on the dynamic use of data, good database management systems may detect that some data permanently resides in memory and use main memory database mechanisms to speed up operations for that data, whereas other parts of the database remain on disk and are treated as a conventional disk resident database. This allows having the best from both worlds.
Related Issues: Real-Time Systems
The goal in real-time systems is to guarantee the completion of a task before a deadline. Therefore, databases in real-time systems must estimate the worst-case runtime of all transactions to be able to schedule them so that every task meets its deadline. DRDBMSs have a problem regarding the worst-case runtime of transactions: in the worst case, each access may result in disk I/O, leading to a very high worst-case runtime, although on average the processing time is much shorter. In MMDBMSs the situation is different: at most one single write access to disk is required, i.e., writing the log record. Therefore, the worst-case processing time is much better. To mitigate this problem in DRDBMSs, prefetching can be used: the transactions to be run are preanalyzed, and a dynamic, transaction-oriented main memory sub-database is constructed by preloading all potentially required data into main memory, to profit from the advantages of MMDBMSs (Wedekind & Zoerntlein, 1986).
CONCLUSION
This article has presented the mechanisms of main memory databases that are used to improve performance, and it has presented their main differences to conventional databases. Nearly all parts of database management are affected due to the different properties of the underlying storage technology: Optimization algorithms, recovery, concurrency control, and backup all have to be tuned with the different characteristics in mind. On the other hand, main memory mechanisms are not limited to pure main-memory database management systems, but the optimizations used can be integrated into conventional databases to speed up parts of the operations performed.
REFERENCES
Blott, S., & Korth, H. F. (2002). An almost-serial protocol for transaction execution in main-memory database systems. In Proceedings of the 28th VLDB Conference.
Bressoud, T. C., Clark, T., & Kan, T. (2001). The design and use of persistent memory on the DNCP hardware faulttolerant platform. In International Conference on Dependable Systems and Networks (pp. 487-492). IEEE.
DeWitt, D. J., Katz, R. H., Olken, F., Shapiro, L. D., Stonebraker, M. R., & Wood, D. (1984). Implementation techniques for main memory database systems. In Proc. of the ACM SIGMOD Conf. (pp. 1-8). ACM Press.
Garcia-Molina, H., & Salem, K. (1987). High performance transaction processing with memory resident data. In
Proceedings of the International Workshop on High Performance Transaction Systems.
Garcia-Molina, H., & Salem, K. (1992). Main memory database systems: An overview. IEEE Transactions on Knowledge and Data Engineering, 4, 509-516.
Lehman, T. J., & Carey, M. J. (1986). A study of index structures for main memory database management systems. In Proceedings of the 12th International Conf. on Very Large Data Bases (pp. 294-303). Morgan Kaufmann.
Lin, J.-L., & Dunham, M. H. (1996). Segmented fuzzy checkpointing for main memory databases. In Proceedings of the 1996 ACM Symposium on Applied Computing
(pp. 158-165). ACM Press.
Mao, Y., Zhang, Y., Wang, D., & Zheng, W. (2002). LND: A reliable multi-tier storage device in NOW. SIGOPS Operating Systems Review, 36(1), 70-80.
Meixner, M., & Buchmann, A. (2004). HADES—A highly available distributed main memory reliable storage. In
Proceedings of the 2004 High Performance Computing & Simulation (HPC&S) Conference (pp. 50-56).
Rao, J., & Ross, K. A. (1999). Cache conscious indexing for decision-support in main memory. In Proceedings of the 25th International Conference on Very Large Databases (pp. 78-89). San Francisco: Morgan Kaufmann.
Salem, K., & Garcia-Molina, H. (1989). Checkpointing memory-resident databases. In Proceedings of the Fifth International Conference on Data Engineering (pp. 452-462). IEEE Computer Society.
Wedekind, H., & Zoerntlein, G. (1986). Prefetching in realtime database applications. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data (pp. 215-226). ACM Press.
Wilson, P. R., & Kakkad, S. V. (1992). Pointer swizzling at page fault time: Efficiently and compatibly supporting huge address spaces on standard hardware. In 1992 International Workshop on Object Orientation and Operating Systems (pp. 364-377). IEEE Computer Society Press.
Woo, S.-K., Kim, M.-H., & Lee, Y.-J. (1997). Accommodating logical logging under fuzzy checkpointing in main memory databases. In International Database Engineering and Application Symposium (pp. 53-62). IEEE.
KEY TERMS
Abort: Cancels all modifications of a transaction.
ACID Properties: Properties of transactions: atomicity, an operation is either completely performed or not at all; consistency, an operation transfers the database from one consistent state to another consistent state; isolation, intermediate states of a transaction are not visible to the outside; durability, changes made to a database are persistent.
Cache: Memory that mirrors often-used parts of a slower but larger memory. The term cache refers mainly to the function, not to the memory technology. A cache can be standard random access memory used to speed up disk accesses, but it can also be very specialized high-speed memory used to speed up processor accesses to main memory.
Commit: Activates all modifications performed by a transaction, makes them visible to the outside, and makes all modifications durable.
Concurrency Control: The task of concurrency control is to coordinate the concurrent execution of several transactions so that the chosen consistency properties (e.g., the ACID properties) are not violated.
Main Memory: Memory that is used for storing data and program code and that can be directly accessed by the processor (random access memory).
Recovery: The task of recovery is to return the database to a consistent state after a crash: The effects of committed transactions must be guaranteed to be persistent, and effects of not committed transactions must be undone. Recovery requires that all modifications are written to some stable storage as part of a commit or else these modifications would be lost in a crash.
Transaction: Group of operations that are either all performed or none at all.
Managing Inconsistent Databases Using Active Integrity Constraints
Sergio Flesca
DEIS Università della Calabria, Italy
Sergio Greco
DEIS Università della Calabria, Italy
Ester Zumpano
DEIS Università della Calabria, Italy
INTRODUCTION
Integrity constraints are a fundamental part of a database schema. They are generally used to define constraints on data (functional dependencies, inclusion dependencies, exclusion dependencies, etc.), and their enforcement ensures a semantically correct state of a database. As the presence of data inconsistent with respect to integrity constraints is not unusual, its management plays a key role in all the areas in which duplicate or conflicting information is likely to occur, such as database integration, data warehousing, and federated databases (Bry, 1997; Lin, 1996; Subrahmanian, 1994). It is well known that the presence of inconsistent data can be managed by “repairing” the database, that is, by providing consistent databases, obtained by a minimal set of update operations on the inconsistent original environment, or by consistently answering queries posed over the inconsistent database.
The motivation of this work stems from the observation that in repairing a database it is natural to express among a set of update requirements the preferred ones, that is, those actions which, besides making the database consistent, also maintain the preferred information. The novelty of our approach consists in the formalization of active integrity constraints, a flexible and easy mechanism for specifying the preferred updates, that is, the actions that should be performed if an integrity constraint is not satisfied. In some sense, active integrity constraints represent a restricted form of active rules sufficient to (declaratively) express database repairs but without the typical problems of active databases, such as termination and procedural interpretation of rules. Thus, in the general case, active integrity constraints can be thought of as a means to define an “intended” repairing strategy.
Recently, there have been several proposals considering the computation of repairs and queries over
inconsistent databases (Arenas, Bertossi & Chomicki, 1999, 2000; Greco, Greco & Zumpano, 2001; Wijsen, 2003). Other works have investigated the updating of data and knowledge bases through the use of active rules and nonmonotonic formalisms. The application of the ECA (event-condition-action) paradigm of active databases to policies (collections of general principles specifying the desired behavior of systems) has been investigated in Chomicki, Lobo, and Naqvi (2003). In this work, the authors propose the introduction of active constraints to describe under which circumstances a set of actions cannot be executed simultaneously. In Alferes et al. (2000), the problem of updating knowledge bases represented by logic programs has been investigated. More specifically, the authors introduce the notion of updating a logic program P by means of another logic program U (denoted by P ⊕ U) and a new paradigm, called dynamic logic programming, to model dynamic program updates. The new paradigm has been further investigated in Alferes et al. (2002), where the language LUPS (Language for Dynamic Updates), designed for specifying changes to logic programs, has been proposed.
Nonmonotonic formalisms, such as Revision Programs (Marek, Pivkina & Truszczynski, 1998; Marek & Truszczynski, 1998), are based on the extension of the logic programming paradigm. Unlike earlier approaches in belief revision, where updates were represented in classical theories, revision programs are a collection of rules with nonclassical semantics; they can be interpreted as inference rules and are used to update interpretations. ECA languages based on revision programs have been proposed as well in Baral (1997).
The approach here proposed differs both from ECA languages, as only sets of actions making the input database consistent are allowed, and from revision programs, as actions can be enforced not only by the initial state of the database.
BACKGROUND
We assume that readers are familiar with relational and deductive databases (Abiteboul, Hull & Vianu, 1995; Ullman, 1988).
A (disjunctive Datalog) rule r is a clause of the form

A1 ∨ … ∨ Ak ← B1, ..., Bm, not Bm+1, …, not Bn,    k+m+n > 0

where A1, …, Ak, B1, …, Bn are atoms of the form p(t1, ..., th), p is a predicate symbol of arity h, and the terms t1, ..., th are constants or variables (Eiter et al., 1998). The disjunction A1 ∨ ... ∨ Ak is the head of r, while the conjunction B1, …, Bm, not Bm+1, …, not Bn is the body of r. We also assume the existence of binary built-in predicate symbols (comparison operators), which can only be used in the body of rules.

The Herbrand Universe UP of a program P is the set of all constants appearing in P, and its Herbrand Base BP is the set of all ground atoms constructed from the predicates appearing in P and the constants from UP. A term (resp. an atom, a literal, a rule, or a program) is ground if no variables occur in it. A rule r’ is a ground instance of a rule r if r’ is obtained from r by replacing every variable in r with some constant in UP. We denote by ground(P) the set of all ground instances of the rules in P. An interpretation of P is any subset of BP.
An interpretation M for P is a model of P if M satisfies each rule in ground(P). The (model-theoretic) semantics for a positive program P assigns to P the set of its minimal models MM(P), where a model M for P is minimal if no proper subset of M is a model for P (Minker, 1982). The more general disjunctive stable model semantics also applies to programs with (unstratified) negation (Gelfond & Lifschitz, 1991). For any interpretation I, denote with P^I the ground positive program derived from ground(P) by (1) removing all rules that contain a negative literal not a in the body with a ∈ I, and (2) removing all negative literals from the remaining rules. An interpretation M is a (disjunctive) stable model of P if and only if M ∈ MM(P^M). For a general program P, the stable model semantics assigns to P the set SM(P) of its stable models. It is well known that stable models are minimal models (i.e., SM(P) ⊆ MM(P)) and that for negation-free programs, minimal and stable model semantics coincide (i.e., SM(P) = MM(P)). Observe that stable models are minimal models which are “supported”; that is, their atoms can be derived from the program. An alternative semantics which overcomes some problems of stable model semantics has been proposed in Greco (1999).
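The reduct construction just described can be sketched for ground, non-disjunctive programs in a few lines of Python. The program below, with rules p ← not q and q ← not p, is an invented example; real systems handle disjunction and non-ground rules far more efficiently than this exhaustive search:

```python
# Stable model check via the Gelfond-Lifschitz reduct for a ground,
# non-disjunctive program. A rule is (head, positive_body, negative_body).
from itertools import chain, combinations

def reduct(program, I):
    """P^I: drop rules with a negative literal 'not a' where a is in I,
    then drop all remaining negative literals."""
    return [(h, pos) for h, pos, neg in program if not (neg & I)]

def least_model(positive_program):
    M, changed = set(), True
    while changed:
        changed = False
        for h, pos in positive_program:
            if pos <= M and h not in M:
                M.add(h)
                changed = True
    return M

def stable_models(program):
    atoms = set(chain.from_iterable({h} | pos | neg for h, pos, neg in program))
    return [I for r in range(len(atoms) + 1)
              for I in map(set, combinations(sorted(atoms), r))
              if least_model(reduct(program, I)) == I]   # M = MM(P^M)

# p <- not q.   q <- not p.
P = [("p", set(), {"q"}), ("q", set(), {"p"})]
print(stable_models(P))   # [{'p'}, {'q'}]: the two stable models
```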
INTEGRITY CONSTRAINTS
Integrity constraints express semantic information over data, that is, relationships that must hold among data in the theory. Generally, integrity constraints, denoted as IC, represent the interaction among data and define properties which are supposed to be explicitly satisfied by all instances over a given database schema. Therefore, they are mainly used to validate database transactions.
Definition 1
A full (or universal) integrity constraint is a formula of the first order predicate calculus of the form:
(∀X) [ B1 ∧ ... ∧ Bn ∧ ϕ ⊃ A1 ∨ ... ∨ Am ∨ ψ1 ∨ ... ∨ ψk ]

where A1, ..., Am, B1, ..., Bn are base positive literals; ϕ, ψ1, ..., ψk are built-in literals; X denotes the list of all variables appearing in B1, ..., Bn; and it is supposed that the variables appearing in A1, ..., Am, ϕ, ψ1, ..., ψk also appear in B1, ..., Bn.

In the definition above, the conjunction B1 ∧ ... ∧ Bn ∧ ϕ is called the body, and the disjunction A1 ∨ ... ∨ Am ∨ ψ1 ∨ ... ∨ ψk the head of the integrity constraint. Moreover, an integrity constraint is said to be positive if no negated literals occur in it. Classical definitions of integrity constraints only consider positive nondisjunctive constraints, called embedded dependencies (Kanellakis, 1991).
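As a small illustration (the relation, attributes, and data are invented for the example), a functional dependency is a universal constraint of this form, and checking it over a database instance amounts to evaluating the formula directly:

```python
# Checking a universal integrity constraint: the functional dependency
# "same employee name implies same department", i.e.,
#   (forall X) [ emp(N,D1) and emp(N,D2)  implies  D1 = D2 ].
def satisfies_fd(emp):
    # Evaluate the universally quantified formula over all pairs of tuples.
    return all(d1 == d2
               for n1, d1 in emp
               for n2, d2 in emp
               if n1 == n2)

ok  = {("ann", "sales"), ("bob", "hr")}
bad = ok | {("ann", "hr")}            # ann appears in two departments
assert satisfies_fd(ok)
assert not satisfies_fd(bad)          # the constraint is violated
```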
Often we shall write our constraints in a different format by moving literals from the head to the body and vice versa.
REPAIRING INCONSISTENT
DATABASES
Intuitively, a repair for a (possibly inconsistent) database D is a minimal consistent set of insert and delete operations which makes D consistent, whereas a consistent answer for a query consists of two sets containing, respectively, the maximal set of true and undefined atoms which match the query goal; atoms which are neither true nor undefined can be assumed to be false.
More formally:
Managing Inconsistent Databases Using Active Integrity Constraints
Definition 2
Given a (possibly inconsistent) database D, a repair for D is a pair of sets of atoms (R+, R-) such that (1) R+ ∩ R- = ∅, (2) D ∪ R+ - R- |= IC, and (3) there is no pair (S+, S-) ≠ (R+, R-) such that S+ ⊆ R+, S- ⊆ R- and D ∪ S+ - S- |= IC. The database D ∪ R+ - R- will be called the repaired database.
Thus, repaired databases are consistent databases, derived from the source database by means of a minimal set of update operations. In more detail, for any repair R of a given database D, R+ (resp. R-) denotes the set of tuples which will be added to (resp. deleted from) the database. Observe that any repair R is a consistent set of update operations (R+ ∩ R- = ∅). In the following, for a given repair R and a database D with integrity constraints IC, R(D) = D ∪ R+ - R- denotes the application of R to D, whereas R(D,IC) denotes the set of all possible repairs for D.
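A brute-force sketch of Definition 2 (function and variable names are our own; the constraints are represented abstractly as a Python predicate over a set of facts) enumerates the minimal pairs of insertions and deletions:

```python
# Enumerate the repairs of a database D: minimal pairs (R_add, R_del) of
# insertions/deletions whose application satisfies the constraints.
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

def repairs(db, candidates, satisfies_ic):
    """All minimal (R_add, R_del) such that (db | R_add) - R_del |= IC."""
    fixes = [(set(a), set(d))
             for a in subsets(candidates - db) for d in subsets(db)
             if satisfies_ic((db | set(a)) - set(d))]
    # Keep only the pairs not strictly dominated by a smaller pair.
    return [(a, d) for (a, d) in fixes
            if not any((a2, d2) != (a, d) and a2 <= a and d2 <= d
                       for (a2, d2) in fixes)]

# Example IC: a key constraint -- an employee works in one department only.
def one_dept(db):
    emps = [t for t in db if t[0] == "emp"]
    return len({e for (_, e, _) in emps}) == len(emps)

D = {("emp", "e1", "d1"), ("emp", "e1", "d2")}
print(repairs(D, set(D), one_dept))
# Two repairs: delete one of the two conflicting tuples.
```

For this database there are exactly two repairs, each deleting one of the two conflicting emp tuples; deleting both is consistent but not minimal, so it is filtered out.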
Definition 3
Given a database D and a set of integrity constraints IC, an atom A is true (resp. false) with respect to (D,IC) if A belongs to all repaired databases (resp. there is no repaired database containing A). The atoms which are neither true nor false are undefined.
Thus, true atoms appear in all repaired databases, whereas undefined atoms appear in a nonempty proper subset of the repaired databases. Given a database D and a set of integrity constraints IC, the application of IC to D, denoted by IC(D), defines three distinct sets of atoms: the set of true atoms IC(D)+, the set of undefined atoms IC(D)u, and the set of false atoms IC(D)-.
ACTIVE INTEGRITY CONSTRAINTS

In this section we present, in an informal way, an extension of integrity constraints that allows the specification, for each constraint, of the actions to be performed to satisfy it. The actions are defined by means of insertions and deletions.

Definition 4

An update atom is either of the form +A or of the form -A, where A is a base atom. An active integrity constraint r is a formula of the first order predicate calculus of the form:

r = (∀X) [Φ ⊃ Ψ]

where Φ is a range restricted conjunction of literals, and Ψ is a disjunction of update atoms.

Given an active integrity constraint r = (∀X) [Φ ⊃ Ψ], we denote with St(r) the standard integrity constraint (∀X) [Φ ⊃ ] derived from r by removing the update atoms in the head. Moreover, for a set of active integrity constraints AIC, St(AIC) denotes the corresponding set of standard integrity constraints, that is, St(AIC) = {St(r) | r ∈ AIC}.

We start by defining the truth value of ground atoms and ground update atoms with respect to a database D and a consistent set of update atoms R, that is, a set not containing two update atoms of the form +A and -A. The truth value of built-in atoms and of conjunctions is given in the standard way.

Definition 5

Let D be a database and R a consistent set of update atoms for D. Then a positive ground literal A is true in (D,R) if A ∈ R(D); a negative ground literal not A is true in (D,R) if A ∉ R(D); and a ground update atom +A (resp. -A) is true in (D,R) if A ∈ R+ (resp. A ∈ R-).

In the following, given a database D and a consistent set of update atoms R, a ground active constraint φ ⊃ ϕ is said to be (1) applicable w.r.t. (D,R) if φ is true in (D,R), and (2) applied w.r.t. (D,R) if both φ and ϕ are true in (D,R).

A consistent set of update atoms R is a repair for a database D and a set of active integrity constraints AIC only if R is a repair for D and St(AIC). Every minimal set of update atoms R such that R(D) |= St(AIC) is a repair for D; moreover, not all repairs contain only update atoms which can be derived from the active integrity constraints.
Definition 6

Let D be a database, AIC a set of active integrity constraints, and R a repair for D and AIC. Then R is said to be founded if for every ground update atom u = +A (resp. u = -A) in R, there is a ground active integrity constraint r: φ ⊃ ϕ with u occurring in ϕ such that φ is true in (D,R).

Given a database D and a set of active integrity constraints AIC, FR(D,AIC) denotes the set of founded repairs for D.
Example 3
Consider the integrity constraints:
(∀E,P,D) [mgr(E,P) ∧ prj(P,D) ∧ not emp(E,D) ⊃ +emp(E,D)]
(∀E,D1,D2) [emp(E,D1) ∧ emp(E,D2) ∧ D1 ≠ D2 ⊃ -emp(E,D1) ∨ -emp(E,D2)]
The first constraint states that every manager E of a project P carried out by a department D must be an employee of D, whereas the second one says that every employee must be in only one department.
Consider now the database D = { mgr(e1,p1), prj(p1,d1), emp(e1,d2) }. There are three repairs for D:
• R1 = { -mgr(e1,p1) },
• R2 = { -prj(p1,d1) }, and
• R3 = { +emp(e1,d1), -emp(e1,d2) }.
R3 is the only founded repair as only the update atoms +emp(e1,d1) and -emp(e1,d2) are derivable from the active constraints.
From the previous considerations we have that FR(D,AIC) ⊆ R(D,AIC) = R(D,St(AIC)).
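The foundedness test of Definition 6 can be sketched on Example 3 as follows (names are our own; as a simplifying assumption, the body of a supporting constraint is evaluated in the state obtained by withholding the supported update itself, which reproduces the result above that R3 is the only founded repair):

```python
# Foundedness check on Example 3: each update of a repair must appear in
# the head of some ground active constraint whose body supports it.
def apply(db, r_add, r_del):
    return (db | r_add) - r_del

def founded(db, r_add, r_del, ground_aics):
    """ground_aics: list of (body, head), where body is a predicate over a
    fact set and head is a set of update atoms ('+', fact) / ('-', fact)."""
    updates = [("+", f) for f in r_add] + [("-", f) for f in r_del]
    for op, fact in updates:
        # Assumption of this sketch: evaluate the body with the supported
        # update itself withheld from the repair.
        state = apply(db, r_add - ({fact} if op == "+" else set()),
                          r_del - ({fact} if op == "-" else set()))
        if not any((op, fact) in head and body(state)
                   for (body, head) in ground_aics):
            return False
    return True

mgr, prj, e_d1, e_d2 = (("mgr", "e1", "p1"), ("prj", "p1", "d1"),
                        ("emp", "e1", "d1"), ("emp", "e1", "d2"))
D = {mgr, prj, e_d2}
# Relevant ground instances of the two active constraints of Example 3:
aics = [
    (lambda s: mgr in s and prj in s and e_d1 not in s, {("+", e_d1)}),
    (lambda s: e_d1 in s and e_d2 in s, {("-", e_d1), ("-", e_d2)}),
]
for r_add, r_del in [(set(), {mgr}), (set(), {prj}), ({e_d1}, {e_d2})]:
    print(r_add, r_del, founded(D, r_add, r_del, aics))
# Only the third repair (R3) is founded.
```

R1 and R2 fail immediately because -mgr(e1,p1) and -prj(p1,d1) occur in no constraint head, while both updates of R3 are supported by a ground active constraint.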
EXAMPLES
The use of active integrity constraints allows the specification of preference criteria; moreover, if a founded repair exists, it is guaranteed to have been obtained by performing only the actions specified by the update atoms in the heads of the active integrity constraints. In this section, we show that the expression of preferences by means of active integrity constraints makes it possible to formulate powerful queries expressing hard problems.
Example 4
•Map coloring: The following set of constraints AIC checks if the coloring of a (possibly partially colored) map can be completed by using only the two colors red and blue.
(∀X,P) [country(X,P) ∧ not col(X,red) ∧ not col(X,blue) ∧ not col(X,yellow) ⊃ +col(X,red) ∨ +col(X,blue)]
(∀X,Y,C) [border(X,Y) ∧ col(X,C) ∧ col(Y,C) ⊃ -col(X,C) ∨ -col(Y,C)]
For each country we know the name and the population (expressed in millions of inhabitants), while the relation border says whether two countries are neighbors. The first constraint states that every uncolored country is colored with one of the two available colors, whereas the second states that two neighboring countries with the same color must be (re-)colored.
Observe that, in the above example, if we delete the update atoms from the head of the second constraint, so that colored nodes cannot be recolored, the problem expressed consists in completing the coloring of the map.
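This completion variant without recoloring can be checked directly by brute force; the following sketch (function and relation names are our own, independent of the repair machinery) decides whether a partial two-coloring extends to the whole map:

```python
# Can a partially colored map be completed with red/blue so that no two
# bordering countries share a color?
from itertools import product

def completable(countries, borders, partial, colors=("red", "blue")):
    free = [c for c in countries if c not in partial]
    for choice in product(colors, repeat=len(free)):
        col = dict(partial, **dict(zip(free, choice)))
        if all(col[x] != col[y] for (x, y) in borders):
            return True
    return False

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(completable(["a", "b", "c"], triangle, {}))                 # False
print(completable(["a", "b", "c"], [("a", "b"), ("b", "c")],
                  {"a": "red"}))                                  # True
```

A triangle admits no two-coloring at all, while the path a-b-c with a fixed to red completes as b=blue, c=red.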
In the following, we consider a graph G=<V,E> defined by means of a unary predicate node and a binary predicate edge.
Example 5
•Clique: A clique of a given graph G is a set of nodes such that every pair of nodes in it is connected.
(∀X) [node(X) ∧ not c(X) ∧ not nc(X) ⊃ +c(X) ∨ +nc(X)]
(∀X,Y) [c(X) ∧ c(Y) ∧ not edge(X,Y) ∧ X ≠ Y ⊃ -c(X) ∨ -c(Y)]
where c(x) means that the node x belongs to the clique, and nc(x) means that x does not belong to the clique. Initially, the relations c and nc are empty and the input database consists of nodes and edges.
Example 6
•Max Clique: A clique of a given graph G is a set of nodes such that every pair of nodes in it is connected. A clique with maximum cardinality is called max-clique.
(∀X) [node(X) ∧ not c(X) ∧ not nc(X) ⊃ +c(X)]
(∀X,Y) [c(X) ∧ c(Y) ∧ not edge(X,Y) ∧ X ≠ Y ⊃ -c(X) ∨ -c(Y)]
where c(x) means that the node x belongs to the max-clique and nc(x) means that x does not belong to the max-clique. Initially, the relations c and nc are empty and the input database consists of nodes and edges.
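The property that the clique constraints enforce can be checked by brute force; the following sketch (names are our own) enumerates all cliques of a small graph and returns those of maximum cardinality:

```python
# A clique is a set of pairwise adjacent nodes; a max-clique is one of
# maximum cardinality.
from itertools import combinations

def is_clique(c, edges):
    adj = set(edges) | {(y, x) for (x, y) in edges}
    return all((x, y) in adj for x, y in combinations(sorted(c), 2))

def max_cliques(nodes, edges):
    cliques = [set(c) for k in range(len(nodes) + 1)
               for c in combinations(nodes, k) if is_clique(set(c), edges)]
    best = max(len(c) for c in cliques)
    return [c for c in cliques if len(c) == best]

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(max_cliques([1, 2, 3, 4], edges))  # prints [{1, 2, 3}]
```

On this graph the unique max-clique is {1, 2, 3}; node 4 is adjacent only to node 3, so no larger clique exists.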
CONCLUSION AND FUTURE TRENDS
In this article, we have introduced active integrity constraints, a simple and powerful form of active rules with declarative semantics, well suited for computing database repairs and consistent answers. The problem with active integrity constraints is that the existence of founded repairs, in the general case, is not guaranteed.
Under the proposed semantics, called prescriptive, the allowed actions are exactly those specified by the constraints. Under such a semantics, the existence of founded repairs, in the general case, is not guaranteed. We are currently investigating a different type of semantics, called preferable, where actions are interpreted as preference conditions on the set of possible repairs, so that repairs and consistent answers always exist.
A general approach for the computation of repairs and consistent answers over databases with universal integrity constraints has been proposed in Greco et al. (2001); the technique can also be extended to deal with active integrity constraints.
REFERENCES
Abiteboul, S., Hull, R., & Vianu, V. (1995). Foundations of databases. Addison-Wesley.
Alferes, J.J., Leite, J.A., Pereira, L.M., Przymusinska, H.C., & Przymusinski, T.C. (2000). Dynamic updates of nonmonotonic knowledge bases. Journal of Logic Programming, 45(1-3), 43-70.
Alferes, J.J., Pereira, L.M., Przymusinska, H.C., & Przymusinski, T.C. (2002). LUPS: A language for updating logic programs. Artificial Intelligence, 138(1-2), 87-116.
Arenas, M., Bertossi, L., & Chomicki, J. (1999). Consistent query answers in inconsistent databases. Proceedings of the International Conference on Principles of Database Systems (pp. 68-79).
Arenas, M., Bertossi, L., & Chomicki, J. (2000). Specifying and querying database repairs using logic programs with exceptions. Proceedings of the International Conference on Flexible Query Answering (pp. 27-41).
Baral, C. (1997). Embedding revision programs in logic programming situation calculus. Journal of Logic Programming, 30(1), 83-97.
Baral, C., & Zhang, Y. (2001). On the semantics of knowledge update. IJCAI Conference (pp. 97-102).
Brewka, G., & Eiter, T. (1999). Preferred answer sets for extended logic programs. Artificial Intelligence, 109(1-2), 297-356.
Bry, F. (1997). Query answering in information system with integrity constraints. IFIP WG 11.5 Working Conference on Integrity and Control in Information Systems.
Chomicki, J., Lobo, J., & Naqvi, S.A. (2003). Conflict resolution using logic programming. IEEE Transactions on Knowledge and Data Engineering, 15(1), 24.
Flesca, S., & Greco, S. (2001). Declarative semantics for active rules. TPLP, 1(1), 43-69.
Gelfond, M., & Lifschitz, V. (1993). Representing action and change by logic programs. Journal of Logic Programming, 17(2-4), 301-321.
Grant, J., & Subrahmanian, V.S. (1995). Reasoning in inconsistent knowledge bases. IEEE Transactions on Knowledge and Data Engineering, 7(1), 177-189.
Greco, G., Greco, S., & Zumpano, E. (2001). A logic programming approach to the integration, repairing and querying of inconsistent databases. Proceedings of the International Conference on Logic Programming.
Kanellakis, P.C. (1991). Elements of relational database theory. In J. van Leewen (Ed.), Handbook of theoretical computer science, volume 2. North-Holland.
Lin, J. (1996). A semantics for reasoning consistently in the presence of inconsistency. Artificial Intelligence, 86(1), 75-95.
Marek, V.W., Pivkina, I., & Truszczynski, M. (1998). Revision programming = logic programming + integrity constraints. Computer Science Logic, (pp. 73-89).
Marek, V.W., & Truszczynski, M. (1998). Revision programming. Theoretical Computer Science, 190(2), 241-277.
Sakama, C., & Inoue, K. (2000). Prioritized logic programming and its application to commonsense reasoning. Artificial Intelligence, 123(1-2), 185-222.
Subrahmanian, V.S. (1994). Amalgamating knowledge bases. ACM Transactions on Database Systems, 19(2), 291-331.
Ullman, J.D. (1988). Principles of database and knowledge-base systems (Vol. 1). Computer Science Press.
Wijsen, J. (2003). Condensed representation of database repairs for consistent query answering. Proceedings of the International Conference on Database Theory (ICDT) (pp. 378-393).
KEY TERMS
Active Integrity Constraint: A formula of the first order predicate calculus of the form r = (∀X) [Φ ⊃ Ψ], where Φ is a range restricted conjunction of literals and Ψ is a disjunction of update atoms.
Consistent Answer: A set of tuples, derived from the database, satisfying all integrity constraints.
Consistent Database: A database satisfying a set of integrity constraints.