
tion. Both approaches have pros and cons. The first one is easy to implement and allows users to be in charge of what information is stored in their profiles. However, users’ reluctance to provide personal information results in sparse or out-of-date user profiles. For this reason, techniques for (semi-)automatic construction of profiles should be employed as well. Nevertheless, users should be able to inspect and modify their personal information.

Holland, Ester, and Kießling (2003) describe preference mining techniques based on the model presented in Kießling and Köstler (2002). However, existing work has primarily focused on the population of simple, keyword profiles used in IR systems. Given the wealth of preference types, development of preference elicitation techniques remains an open and challenging research problem. For example, capturing preferences such as “I don’t like director W. Allen” or “I like films without violence” is one of the most difficult and challenging issues.

Query Personalization Logic

The purpose of query personalization is to focus a search by integrating relevant information stored in a user profile into the initial query. Which information is relevant and how it is integrated into a query are issues that depend on the query, the user profile, and various aspects that comprise the query context, such as the search goal, the time of the request, the user location, the device of access, and so forth. Consideration of the query context is called contextualization (Pitkow et al., 2002). All of the above are captured by the personalization logic adopted by the system, which may be represented as a set of criteria and rules. For instance, a rule may specify that the system should return short answers based on a few top user preferences whenever users access information through a cellular phone. Accurate and effective personalization greatly depends on the personalization logic adopted; thus, the development of an appropriate and extensible set of rules and criteria is both crucial and challenging. Existing approaches primarily deal with the construction of personalized answers based on the query issued and a user profile (Koutrika, 2003). Consideration of the query context remains an open issue. Detection of a query's context and of context switches, as well as the formulation of system answers based on these, present great research challenges.
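To make the rule-based flavor of this personalization logic concrete, the following is a minimal sketch of how contextualization rules might be encoded and applied; the Context fields, the rule predicates, and the answer-shaping directives are illustrative assumptions, not part of any published framework.

```python
from dataclasses import dataclass

@dataclass
class Context:
    device: str       # e.g., "cellular" or "desktop"
    location: str
    search_goal: str

# Each rule pairs an applicability test on the query context with
# personalization directives used when building the answer.
RULES = [
    (lambda ctx: ctx.device == "cellular",
     {"answer_size": 5, "top_preferences": 3}),    # short answers on phones
    (lambda ctx: ctx.device == "desktop",
     {"answer_size": 50, "top_preferences": 10}),
]

def personalization_directives(ctx: Context) -> dict:
    """Return the directives of the first rule whose condition matches."""
    for condition, directives in RULES:
        if condition(ctx):
            return directives
    return {"answer_size": 20, "top_preferences": 5}  # fallback defaults

print(personalization_directives(Context("cellular", "Athens", "browse")))
# -> {'answer_size': 5, 'top_preferences': 3}
```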

Query Personalization Implementation

Efficient algorithms need to be designed and implemented for each step of the query personalization process. Naturally, these depend on the user model and query personalization logic used by a personalized system. Such algorithms have been proposed in the literature for determining which preferences are syntactically relevant to a query (based on the database structure) and for generating personalized answers (Koutrika & Ioannidis, 2004b). A challenging issue is to determine which preferences are semantically relevant to a query in a given context. For example, a preference for director W. Allen is semantically related to a query about comedies; on the other hand, a preference for director M. Tarkowski semantically conflicts with the same query. For this purpose, additional knowledge needs to be captured.
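As a rough illustration of the overall mechanism (a simplified sketch, not the algorithm of Koutrika and Ioannidis, 2004b), the following code selects the top preferences that are syntactically relevant to a query and folds them into it as extra conditions; the profile entries and helper names are invented for the example.

```python
# Profile entries: (table, attribute, preferred value, degree of interest).
PROFILE = [
    ("movies", "genre", "comedy", 0.9),
    ("movies", "language", "English", 0.5),
    ("theaters", "area", "downtown", 0.8),
]

def syntactically_relevant(preferences, query_tables, k):
    """Top-k preferences whose table already appears in the query.

    A fuller treatment would also follow the schema's join paths and add
    the joins needed to reach preferences on related tables."""
    hits = [p for p in preferences if p[0] in query_tables]
    return sorted(hits, key=lambda p: -p[3])[:k]

def personalize(sql, query_tables, k=2):
    prefs = syntactically_relevant(PROFILE, query_tables, k)
    extra = " AND ".join(f"{t}.{a} = '{v}'" for (t, a, v, _) in prefs)
    return f"{sql} AND {extra}" if extra else sql

print(personalize("SELECT title FROM movies WHERE year > 2000", ["movies"]))
# -> SELECT title FROM movies WHERE year > 2000
#    AND movies.genre = 'comedy' AND movies.language = 'English'
```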

FUTURE TRENDS

Personalized database information access opens the door to a new set of challenges and opportunities for the future.

Preference is a fundamental notion in areas of applied mathematics, philosophy, and computer science that deal with decisions and choice. In mathematical decision theory, preferences (or utilities) are used to model people's economic behavior. In philosophy, they are used to reason about values, desires, and duties. In artificial intelligence, they capture agents' goals. In databases, they are used for the expression of user query criteria and for the formulation of user profiles. So far, there has been little interaction between those areas. Differences in focus and variations in terminology make the results obtained in one area difficult to use in another. In my opinion, specialized research and collaboration between these areas will improve both the understanding and the modelling of preferences.

Another open research area is the manipulation of multiple user profiles belonging to the same person. Up to now, research efforts have primarily focused on representing a user by a single profile. However, a person may have several profiles; for example, one may belong to different groups. Combining and reconciling information stored in diverse profiles, as well as organizing profile hierarchies, are just a few of the challenging topics that need to be addressed in this direction.

Combining personal preferences with other aspects of a query's context that call for query customization, such as the time of day, the user location, and so forth, is certainly an outstanding research challenge for the near future.

Furthermore, because query personalization alters the search experience, the user interface needs to provide a way to explain what the system is doing to personalize the experience, as well as a way to undo the personalization. Therefore, an interesting research direction is the design of user interfaces that allow users to control the extent of the personalization and can help alleviate inaccurate personalization.


Finally, current efforts propose systems built on top of existing database management systems. It is interesting to explore how database technology can be extended, possibly with new operators and methods, in order to support personalized information access from within.

CONCLUSION

Traditionally, information access has followed a query-based paradigm. The advent of the World Wide Web and hand-held electronic devices generated the need for a new, personalized information access paradigm. Different approaches aim at personalizing the overall user experience at different levels: content selection, content presentation, and user interaction. This article has focused on the level of personalized content selection and, in particular, on query personalization in databases. It has addressed the main issues and research problems in the area and presented state-of-the-art research efforts. This is an emerging hot area, and there is a plethora of open challenges. Personalization of search is the next frontier toward significantly increasing search efficiency.

REFERENCES

Agrawal, R., & Wimmers, E. (2000). A framework for expressing and combining preferences. Proceedings of the ACM International Conference on Management of Data (SIGMOD), Dallas, Texas (pp. 297-306).

André, E., & Rist, T. (2002). From adaptive hypertext to personalized Web companions. Communications of the ACM, 45(5), 43-46.

Belkin, N., & Croft, W. B. (1992). Information filtering and information retrieval: Two sides of the same coin? Communications of the ACM, 35(12), 29-38.

Bruno, N., Chaudhuri, S., & Gravano, L. (2002). Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Transactions on Database Systems, 27(2), 153-187.

Chaudhuri, S., & Gravano, L. (1999). Evaluating top-k selection queries. Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), Edinburgh, Scotland (pp. 397-410).

Chen, J., DeWitt, D., Tian, F., & Wang, Y. (2000). NiagaraCQ: A scalable continuous query system for internet databases. Proceedings of the ACM International Conference on Management of Data (SIGMOD), Dallas, Texas (pp. 379-390).

Chomicki, J. (2003). Querying with intrinsic preferences. ACM Transactions on Database Systems, 28(4), 1-39.

Foltz, P., & Dumais, S. (1992). Personalized information delivery: An analysis of information filtering methods. Communications of the ACM, 35(12), 51-60.

Glover, E., Lawrence, S., Birmingham, W., & Lee Giles, C. (1999). Architecture of a metasearch engine that supports user information needs. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Kansas City, Missouri (pp. 210-216).

Hristidis, V., Koudas, N., & Papakonstantinou, Y. (2001). PREFER: A system for the efficient execution of multi-parametric ranked queries. Proceedings of the ACM International Conference on Management of Data (SIGMOD), Santa Barbara, California (pp. 259-270).

Holland, S., Ester, M., & Kießling, W. (2003). Preference mining: A novel approach on mining user preferences for personalized applications. PKDD, LNAI 2838, Cavtat-Dubrovnik, Croatia, 204-216.

Karypis, G. (2001). Evaluation of item-based top-n recommendation algorithms. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Atlanta, Georgia (pp. 247-254).

Kießling, W., & Köstler, G. (2002a). Foundations of preferences in database systems. Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), Hong Kong, China (pp. 311-322).

Kießling, W., & Köstler, G. (2002b). Preference SQL: Design, implementation, experiences. Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), Hong Kong, China (pp. 990-1001).

Koutrika, G. (2003). A personalization framework for database queries. Proceedings of the 2nd Hellenic Data Management Symposium (HDMS), Athens, Greece.

Koutrika, G., & Ioannidis, Y. (2004a). Personalization of queries in database systems. Proceedings of the International Conference on Data Engineering (ICDE), Boston (pp. 597-608).

Koutrika, G., & Ioannidis, Y. (2004b). Personalized queries using a generalized preference model. Proceedings of the 3rd Hellenic Data Management Symposium (HDMS), Athens, Greece.

Liu, L., Pu, C., & Tang, W. (1999). Continual queries for internet scale event-driven information delivery. IEEE Transactions on Knowledge and Data Engineering, 11(4), 610-628.

Liu, F., Yu, C., & Meng, W. (2002). Personalized Web search by mapping user queries to categories. Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), McLean, Virginia (pp. 558-565).

Pitkow, J., Schütze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., et al. (2002). Personalized search. Communications of the ACM, 45(9), 50-55.

Sakagami, H., Kamba, T., Sugiura, A., & Koseki, Y. (1997). Learning personal preferences on online newspaper articles from user behaviours. Proceedings of the 6th International World Wide Web Conference, Santa Clara, CA (pp. 291-300).

Semeraro, G., Degemmis, M., Lops, P., Thiel, U., & L'Abbate, M. (2003). A personalized information search process based on dialoguing agents and user profiling. ECIR, Lecture Notes in Computer Science 2633, 613-621.

Shahabi, C., Banaei-Kashani, F., Chen, Y., & McLeod, D. (2001). Yoda: An accurate and scalable Web-based recommendation system. Cooperative Information Systems, Lecture Notes in Computer Science 2172, 418-432.

Smyth, B., Bradley, K., & Rafter, R. (2002). Personalization techniques for online recruitment services. Communications of the ACM, 45(5), 39-40.


KEY TERMS

Filter-Based Information Access Approaches: System responses are filtered on the basis of a rudimentary user profile storing long-term user interests.

Personalization: The approach of providing an overall customized, individualized user experience by taking into account the needs, preferences and particular characteristics of a user (or group of users).

Personalized Database System: A database system that provides personalized answers in response to a user request by dynamically considering relevant user information stored in user profiles. Its basic modules include a query personalization module, and a profile creation module.

Personalized Information Access Approaches: Information is returned to the user, taking into account the query issued and particular characteristics of a user.

Query Personalization: The process of dynamically enhancing a query with relevant preferences stored in a user profile with the intention of focusing a search and providing individualized answers.

Query-Based Information Access Approaches: Information is returned to the user, taking into account only the query issued.

User Profile: System-level representation (model) of a user, used for customizing system responses.


Database Replication Protocols

 

 

 


 

 

 

 

 

Francesc D. Muñoz-Escoí

Instituto Tecnológico de Informática, Spain

Luis Irún-Briz

Instituto Tecnológico de Informática, Spain

Hendrik Decker

Instituto Tecnológico de Informática, Spain

INTRODUCTION

Databases are replicated in order to obtain two complementary features: performance improvement and high availability. Performance can be improved when a database is replicated, since each replica can serve read-only accesses without requiring any coordination with the rest of the replicas. Thus, when most of the application accesses to the data are read-only, they can be served locally without preventing other processes from accessing the same or other replicas. Moreover, careful coordination management can ensure that the failure of one or more replicas does not compromise the availability of the database as long as at least one of the replicas is alive.

BACKGROUND

Initially, database replication management had been decomposed into two tasks: concurrency control and replica control, usually tackled by different protocols. In the first case, solutions known from the non-replicated domain evolved into distributed concurrency control protocols (Bernstein & Goodman, 1981), based either on two-phase locking (2PL) or on some timestamp-ordering protocol. In the second case, replica control management was based on voting techniques (Gifford, 1979). These voting techniques assign a given number of votes to each replica, usually one, requiring that each read access collects a read quorum ("r" votes) and each write access a write quorum ("w" votes). The database must assign version numbers to the replicated records, and the quorums must be chosen so that the sum of "r" and "w" is greater than the total number of votes and "w" is greater than half the votes. Thus, it can be guaranteed that each access to the data reaches at least one copy carrying the latest version number of each record. This approach ensured consistency, but the resulting communication costs could be prohibitively high.
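As a concrete illustration of the quorum rule, here is a minimal sketch of weighted voting in the style of Gifford (1979), with one vote per replica; the data layout and helper names are invented for the example, and a real protocol would contact any reachable quorum rather than a fixed prefix of replicas.

```python
# Each replica stores (version, value) for one record and holds one vote.
# Quorum sizes must satisfy: r + w > n (reads see the latest write) and
# 2 * w > n (any two write quorums overlap).

def check_quorums(n: int, r: int, w: int) -> None:
    assert r + w > n, "every read quorum must overlap every write quorum"
    assert 2 * w > n, "any two write quorums must overlap"

def quorum_read(replicas, r):
    """Read r copies and return the one with the highest version number."""
    contacted = replicas[:r]          # sketch: a real system picks any r replicas
    return max(contacted, key=lambda rec: rec[0])

def quorum_write(replicas, r, w, value):
    """Learn the latest version from a read quorum, then write w copies."""
    version, _ = quorum_read(replicas, r)
    for i in range(w):                # sketch: a real system picks any w replicas
        replicas[i] = (version + 1, value)

replicas = [(0, None)] * 5            # n = 5 replicas
check_quorums(n=5, r=2, w=4)          # 2 + 4 > 5 and 2 * 4 > 5
quorum_write(replicas, r=2, w=4, value="x")
print(quorum_read(replicas, r=2))     # (1, 'x'): reaches a latest copy
```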

Voting replica-control protocols with improved features were still in use in the following decade, including routines for system partition management based on dynamic voting approaches (Jajodia & Mutchler, 1990).

However, replication management is not easily achieved when concurrency and replica control are merged, since the actions a replica control protocol takes to ensure consistency must be compatible with the concurrency control protocol in use. Unfortunately, deadlocks and transaction abortions are quite common when both protocols are merged. Thus, it seems adequate to find better solutions for this global management task, that is, replication protocols that cater for both concurrency and replica control. A first example of this combined technique is Thomas (1979), where a special kind of voting algorithm is combined with timestamp-based concurrency control. However, his solution still relies on simple communication primitives and is not efficient enough, both in terms of response time and abortion rate. In fact, efficient reliable or atomic broadcast protocols were not produced until the mid-1980s (Birman & Joseph, 1987) and could not yet be used in these first stages.

Thus, new replication techniques were introduced in the database arena, having evolved from the process replication approaches known from distributed systems. Depending on the criteria used, several classifications are possible. However, it is useful to distinguish, in a first step, between lazy and eager techniques (Gray, Helland, O'Neil & Shasha, 1996), limiting attention to the update propagation strategy. Both of these approaches are described hereafter, where additional criteria are considered in order to refine this taxonomy.


EAGER REPLICATION

Eager replication propagates the updates of a transaction before the transaction commits. This ensures the consistency of all replicas, but all of the computing time needed for communication then needs to be added to the transaction lifetime, thus causing response times that are longer than those of lazy replication techniques. However, if an atomic broadcast protocol (Hadzilacos & Toueg, 1993) is used, concurrency control can be dealt with locally, so the overall communication costs can be kept low.

Several eager replication techniques exist. According to Wiesmann et al. (2000), the following three characteristics can be used to classify them.

Server Architecture (Gray et al., 1996)

Considers where updates of a given data item are initially processed. There are two options:

Update Everywhere (UE): The updates can be initially done in any of the replicas of the data item.

Primary Copy (PC): For each data item, a distinguished replica exists, its primary copy. Only the primary copy may initially process an update of such a data item; that is, there is only one active replica, and it is always the same one.

Server Interaction

Analyzes how many messages are exchanged by the servers during transaction processing. There are two alternatives:

Linear Interaction (LI): Servers exchange messages for each operation involved in the transaction. As a result, the number of messages depends on the transaction length.

Constant Interaction (CI): Servers exchange a fixed number of messages. The typical case is one update message at the end of the transaction.

Transaction Termination

Considers the actions needed by the replicas to decide the result of a transaction. This depends on the determinism of the algorithm. Again, two options exist:

Voting Termination (VT): An additional round of messages is needed to decide whether the transaction has to be committed or aborted. The traditional two-phase commit protocol is a typical example of this kind of termination.

Non-Voting Termination (NT): This option is applicable when all replicas may decide locally on the completion of a transaction. To this end, a deterministic and symmetrical distributed algorithm is needed, in general. However, determinism is not needed when transactions do not conflict; that is, two or more transactions accessing disjoint sets of records may be executed or terminated in any order in different replicas.

Combining the three characteristics above yields eight classes of replication protocols:

1. UE-CI-VT: "Update everywhere" propagation, with "constant interaction" and "voting termination".

2. UE-CI-NT: "Update everywhere" propagation, with "constant interaction" and "non-voting termination".

3. UE-LI-VT: "Update everywhere" propagation, with "linear interaction" and "voting termination".

4. UE-LI-NT: "Update everywhere" propagation, with "linear interaction" and "non-voting termination".

5. PC-CI-VT: "Primary copy" update propagation, with "constant interaction" and "voting termination".

6. PC-CI-NT: "Primary copy" update propagation, with "constant interaction" and "non-voting termination".

7. PC-LI-VT: "Primary copy" update propagation, with "linear interaction" and "voting termination".

8. PC-LI-NT: "Primary copy" update propagation, with "linear interaction" and "non-voting termination".

For reducing the communication costs of an eager protocol, CI is better than LI. Traditionally, eager protocols mostly used the LI-VT combination, either with UE or with PC. The authors of this classification proposed UE-CI-VT and UE-CI-NT as the best possible alternatives for eager protocols; some examples of them have been presented in Kemme and Alonso (2000). Each of these protocols needs atomic broadcast. The protocol described in Irún, Muñoz, Decker, and Bernabéu (2003) belongs to the UE-CI-VT class and only requires uniform reliable broadcast, which is faster than atomic broadcast. However, since reliable broadcast does not guarantee that all updates are delivered in the same order in all replicas, a voting termination procedure is needed: no replica can determine locally whether a transaction has to be committed or aborted. Thus, the UE-CI-NT technique cannot be implemented with reliable broadcast alone.

The UE propagation technique is less scalable than the PC approach, since a coordination phase is needed to find out if different transactions collide, while in the PC alternative, each "primary copy" replica is able to find and manage the transaction conflicts (Gray et al., 1996). In Patiño, Jiménez, Kemme, and Alonso (2000), two UE-CI-NT protocols are described. These protocols use a concurrency control scheme based on "conflict classes", similar to locks but easier to compute, and requiring only local processing for these control tasks. Moreover, both protocols use an atomic broadcast protocol with optimistic delivery that is able to reduce the communication costs, needing the same number of message rounds as a reliable broadcast. Consequently, these algorithms are easily scalable, thus eliminating one of the main disadvantages of UE eager protocols.

LAZY REPLICATION

Lazy replication delays the propagation of updates until the transaction has committed. Once committed, updates are propagated. This approach enables fast transaction completion but does not always ensure replica consistency and may therefore lead to a high abortion rate. Despite its disadvantages, this technique has been used in several commercial DBMSs. Moreover, it is the only possible option for replicating mobile or disconnected databases.

Depending on the server architecture, as with eager protocols, two classes of lazy protocols can be distinguished: update everywhere (UE) and primary copy (PC).

Update everywhere protocols allow any one of the replicas to update its local data directly, transmitting the updates to the other replicas thereafter. If the concurrency control tasks are checked before commit time, this may lead to a high abortion rate. Otherwise, a reconciliation procedure is needed to merge the updates of conflicting transactions. For concurrency control purposes, a timestamp-based solution is commonly used. However, it is extremely difficult, if not impossible, to provide one-copy serializability guarantees for these kinds of protocols. Several algorithms providing one-copy serializability have been published, but they either use an additional precommit phase that may later lead to the abortion of the transaction (Agrawal, El Abbadi & Steinke, 1997), thus violating a strict definition of lazy replication; or they use a specific broadcasting topology that prevents some of the replicas from initiating update broadcasts, resulting in something similar to the primary copy approach of eager systems and limiting the general usefulness of such solutions (Anderson, Breitbart, Korth & Wool, 1998).

Since in primary copy protocols the first access is always managed by the primary replica, the latter may use locks or timestamps to avoid or detect conflicts between transactions. If locks are used, deadlocks may arise, and

additional deadlock-managing protocols are then needed. When timestamps are used, the resulting abortion rate will be high, at least when compared to eager solutions. Primary copy solutions have been used in some commercial database systems; for instance, they are still supported in Sybase Replication Server (Sybase, 2003). In these systems, two opposed trends can be identified. The first one uses replication to ensure only availability, but not to improve performance. In that case, the replicas behave as standby copies of the primary replica. In the second one, replication is mainly used to enhance performance, and serializable consistency is not maintained. However, in this context, it is worth noting that most applications can run perfectly well with relaxed consistency modes or isolation levels. Indeed, the default isolation level of most relational DBMSs is not serializable, but read committed (for instance, in PostgreSQL) or repeatable read.

As observed above, lazy protocols are the only option for mobile or disconnected databases. One of the first studies in this area is Gray et al. (1996). It describes a two-tier protocol that can manage mobile replicas. To this end, Gray et al. classify the replicas into two distinct groups:

Base nodes: Those that are always interconnected. They use an eager replication protocol to propagate the updates among them.

Mobile nodes: Those that are usually disconnected. They propose tentative update transactions to data items owned by base nodes.

This protocol requires that mobile nodes maintain two versions of each data item: a local one and a so-called best-known master version. When a mobile node connects to a base node, it proposes tentative update transactions to a "primary copy" base node. These transactions are re-executed in the base node and may succeed or not. Moreover, tentative transactions are designed to be commutable with other transactions, thus improving their probability of successful termination. If a tentative transaction is rejected, its mobile node has to reconcile its updates. Additionally, the connection procedure updates the mobile replica, applying the updates it has missed during the disconnected period.

Regular updates can be transformed into commutable updates if they do not overwrite data items with new values, but only increment or decrement the data item's previous value. This is the principle used in the mobile protocol described above. Similar solutions were used in the second protocol of Patiño et al. (2000), as mentioned in the section on eager protocols above. The aim of both protocols is the same: to reduce the abortion rate. This problem is particularly important in all mobile environments.
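To see why increment-style updates commute while overwrites do not, consider the following small sketch; the operation encoding is invented for the illustration and is not taken from either protocol.

```python
# Apply a sequence of tentative update operations to a data item.
def apply(ops, value):
    for op, arg in ops:
        value = value + arg if op == "incr" else arg  # "set" overwrites
    return value

incr_a, incr_b = ("incr", +5), ("incr", -2)
set_a, set_b = ("set", 5), ("set", 3)

# Increments commute: replicas applying them in different orders converge.
assert apply([incr_a, incr_b], 10) == apply([incr_b, incr_a], 10) == 13

# Overwrites do not: the two orders diverge (3 vs. 5), so one of the
# tentative transactions would have to be rejected and reconciled.
assert apply([set_a, set_b], 10) != apply([set_b, set_a], 10)
```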

FUTURE TRENDS

Future trends in database replication are, among others, the improvement of mobile databases support, the development of hybrid replication protocols, and the minimization of the blocking periods during replica recoveries. They are outlined in the following paragraphs.

Currently, only a few replication protocols for mobile databases exist; where lazy replication protocols will be applied remains an open research problem. Protocols designed for replication and consistency maintenance in mobile databases are easily portable to mere file systems, thus serving as a basis for constructing tools that ensure the consistency of file system replicas in portable computers and PDAs, for instance.

Hybrid protocols are a third kind of update propagation solution for database replication. The COLU protocol (Irún, Muñoz & Bernabéu, 2003) is an example of such a hybrid solution. By default, it is a UE-CI-VT lazy protocol, but it allows configuring the number of replicas that receive updates at commit time (i.e., the number of synchronous replicas) before the transaction is terminated. So, it may behave as a pure lazy protocol when no synchronous replicas are used, or as a pure eager protocol when all replicas are configured as synchronous. Moreover, intermediate configurations are also possible, guaranteeing that transactions are not lost when failures arise. This hybrid protocol is able to use a lazy recovery strategy, minimizing the recovery time.

Database replication protocols always need a recovery protocol, but traditionally, research in this area did not focus on such recovery algorithms. This situation has recently changed. Kemme, Bartoli, and Babaoglu (2001) proposed a multiple-stage lazy recovery that minimizes the blocking time, both in the recovering and in the source replicas. Jiménez, Patiño, and Alonso (2002) improved this solution, distributing the source role among several active replicas and reducing the amount of logged data to be transmitted by using periodic checkpoints. Further work remains to be done in this domain in order to reduce the amount of transmitted data or the blocking time, thus minimizing the recovery delays.

CONCLUSION

Database replication has commonly used lazy protocols in commercial DBMSs, thus ensuring good performance but without providing one-copy serializability. To overcome the resulting problems, eager replication protocols with group communication support have been proposed. The use of atomic broadcast protocols has enabled the development of "update everywhere" propagation with "constant interaction" and "non-voting termination" eager protocols. Such solutions, combined with commutable transactions, ensure one-copy serializability with a performance similar to lazy protocols, plus easy scalability and very low abortion rates. Although these solutions are not available in commercial database management systems yet, we expect them to have a marketable impact soon.

New areas of research in the database replication arena are those of mobile databases and hybrid replication. Mobile databases require a lazy replication solution. This ensures that lazy protocols will again be of interest for database researchers and developers. The main problem to be solved in this area relates to transaction reconciliation procedures. No fully automated procedure yet exists.

Hybrid replication protocols will be able to provide the appealing properties of the new-generation eager protocols when one-copy serializability is needed, and the very good performance of traditional lazy protocols when a more relaxed consistency model is feasible.

REFERENCES

Agrawal, D., El Abbadi, A., & Steinke, R. (1997). Epidemic algorithms in replicated databases. Proceedings of the 16th ACM Symposium on Principles of Database Systems (pp. 161-172).

Anderson, T., Breitbart, Y., Korth, H. F., & Wool, A. (1998). Replication, consistency, and practicality: Are these mutually exclusive? Proceedings of the ACM SIGMOD International Conference on the Management of Data (pp. 173-182).

Bernstein, P., & Goodman, N. (1981). Concurrency control for distributed database systems. ACM Computing Surveys, 13(2), 185-221.

Birman, K.P., & Joseph, T.A. (1987). Reliable communication in the presence of failures. ACM Transactions on Computer Systems, 5(1), 47-76.

Gifford, D. K. (1979). Weighted voting for replicated data. Proceedings of the 7th ACM Symposium on Operating System Principles (pp. 150-162).

Gray, J., Helland, P., O’Neil, P., & Shasha, D. (1996). The dangers of replication and a solution. Proceedings of the ACM SIGMOD Conference (pp. 173-182).


Hadzilacos, V., & Toueg, S. (1993). Fault-tolerant broadcasts and related problems. In S. Mullender (Ed.), Distributed systems (pp. 97-145). Reading, MA: Addison-Wesley.

Irún, L., Muñoz, F.D., & Bernabéu, J. (2003). An improved optimistic and fault-tolerant replication protocol. Lecture Notes in Computer Science, 2822, 188-200.

Irún, L., Muñoz, F. D., Decker, H., & Bernabéu, J. (2003). COPLA: A platform for eager and lazy replication in networked databases. Proceedings of the 5th International Conference on Enterprise Information Systems (pp. 273-278).

Jajodia, S., & Mutchler, D. (1990). Dynamic voting algorithms for maintaining the consistency of a replicated database. ACM Transactions on Database Systems, 15(2), 230-280.

Jiménez, R., Patiño, M., & Alonso, G. (2002). Non intrusive, parallel recovery of replicated data. Proceedings of the 21st IEEE Symposium on Reliable Distributed Systems (pp. 150-159).

Kemme, B., Bartoli, A., & Babaoglu, O. (2001). Online reconfiguration in replicated databases based on group communication. Proceedings of the International Conference on Dependable Systems and Networks (pp. 117-130).

Kemme, B., & Alonso, G. (2000). A new approach to developing and implementing eager database replication protocols. ACM Transactions on Database Systems, 25(3), 333-379.

Patiño, M., Jiménez, R., Kemme, B., & Alonso, G. (2000). Scalable replication in database clusters. Lecture Notes in Computer Science, 1914, 315-329.

Sybase, Inc. (2003). Replication strategies: Data migration, distribution and synchronization (Technical White Paper). Retrieved January 23, 2005, from http://www.sybase.com/detail/1,6904,1028711,00.html

Thomas, R.H. (1979). A majority consensus approach to concurrency control for multiple copy databases. ACM Transactions on Database Systems, 4(2), 180-209.

Wiesmann, M., Pedone, F., Schiper, A., Kemme, B., & Alonso, G. (2000). Database replication techniques: A three-parameter classification. Proceedings of the 19th IEEE Symposium on Reliable Distributed Systems (pp. 206-215).

KEY TERMS


Active Replica: The replica that directly processes a given transaction, generating the updates that later will be transmitted to the other, passive replicas. The active status of a replica functionally depends on the given transaction.

Asynchronous Replication: Lazy replication; that is, all updates of a transaction (if any) are transmitted to passive replicas once the transaction is committed but never ahead of commit time.

Atomic Broadcast: Requires that each correct process delivers all messages in the same order, that is, a reliable broadcast with total order.

Eager Replication: See Synchronous Replication

Hybrid Replication: A replication technique using protocols that may either behave as eager or as lazy, depending on the given system configuration.

Lazy Replication: See Asynchronous Replication

Passive Replica: A replica that does not directly process the transaction. Instead, it only applies updates received from active replicas.

Reconciliation Procedure: When, in lazy protocols, two conflicting transactions have been committed before conflict detection, this procedure is needed to adequately reconcile and merge the respective updates.

Reliable Broadcast: Requires that each correct process delivers the same set of messages and that the set includes each message broadcast by correct processes, but no spurious messages.

Synchronous Replication: Eager replication; that is, transaction updates are propagated before the transaction is committed.


Database Support for Workflow Management Systems

Francisco A. C. Pinheiro

Universidade de Brasília, Brazil

INTRODUCTION: WORKFLOW SYSTEMS

A workflow is a series of work processes performed under rules that reflect the formal structure of the organization in which they are carried out and the relationships between their various parts. Workflow applications are software applications used to automate part of workflow processes. They run under the control of a workflow management system (WfMS). The WfMS usually comprises an organizational model, describing the process structure, and a process model, describing the process logic. The Workflow Management Coalition (WfMC, 2004) publishes a set of workflow definitions and related material, including a reference model.

Databases are commonly used as a WfMS supporting technology. Not only are workflow data maintained in databases, but the rules governing processes can also be stored in database schemas. Database functionality can be used both for defining and managing process models and for environment notification and process enactment. This article shows how particular database-related technologies can be used to support WfMSs.
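As a toy illustration of keeping process logic in data rather than in code (so that the same machinery that manages data also defines and drives the process), the following sketch holds a process model in relation-like structures and derives the next activities by lookup; the activity names, fields, and transition rules are invented for the example.

```python
# Process model held as data, as it might be stored in database relations.
ACTIVITIES = {
    "submit":  {"role": "employee"},
    "approve": {"role": "manager"},
    "archive": {"role": "clerk"},
}
TRANSITIONS = [
    ("submit", "approve", lambda case: True),
    ("approve", "archive", lambda case: case["approved"]),
    ("approve", "submit", lambda case: not case["approved"]),  # rework loop
]

def next_activities(current, case):
    """Process enactment step: follow every transition whose rule holds."""
    return [dst for (src, dst, rule) in TRANSITIONS
            if src == current and rule(case)]

case = {"approved": False}
for activity in next_activities("approve", case):
    print(f"enact '{activity}', performed by role {ACTIVITIES[activity]['role']}")
# -> enact 'submit', performed by role employee
```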

Table 1 relates workflow issues and the database technologies that can be used to deal with them. It summarizes the content of this article, presenting the relationships discussed in the text. The next two sections, discussing workflow management issues and database support, are related to the columns and rows of the table and provide an explanation for these relationships. Only the general relationships stressed in the text are shown. For example, data replication and partitioning have an influence on scalability and privacy, but the table does not show the same influence with respect to distributed database technology, although data replication and partitioning are basic strategies of distributed databases.

Table 1. General relationships between workflow issues and database technologies

[Table: columns group the workflow issues under ENVIRONMENT (diversity, interoperability, scalability), PEOPLE (collaborative work, flexibility, evolution), PROCESS (assignment of tasks, consistency, changes), and DATA (privacy, location, semantic heterogeneity); rows list the database types (distributed, parallel, multi, active) and database technologies (data replication/partitioning, schema evolution, synchronisation techniques, transaction models, metadata); the check marks indicating the individual relationships are not recoverable from the source.]

BACKGROUND: WORKFLOW MANAGEMENT ISSUES

Workflow applications are usually complex, distributed applications consisting of activities performed by people playing different roles, sometimes involving multiple departments and organizations. It is not unusual for different workflows to be governed by different WfMSs running under a variety of platforms, application servers, and communications middleware. This situation requires ways of dealing with diversity and complexity and imposes a need for interoperability and scalability. Other issues like heterogeneity, consistency, privacy, and flexibility naturally arise when we think about the configuration of such an environment and the development of workflow applications. Stohr and Zhao (2001) present a detailed discussion of workflow issues.

Environment Issues

Workflow diversity ranges from organization to infrastructure. Organizational diversity involves different processes, rules, and ways of organizing work. Infrastructure diversity involves different platforms, communication protocols, programming languages, and data formats. Interoperability is needed for applications running in such a diverse environment. The changing environment also makes scalability crucial to cope with the addition of new organizational units and redistribution of work to other (sometimes geographically distant) players. Support to these issues may be found in technologies incorporated into distributed, parallel, and multidatabases.

People Issues

Collaborative work should not be hindered by overly rigid procedures. There should be room to modify processes as necessary, making it possible to follow different paths by changing the ordering of activities or suppressing some of them. Flexibility and evolution are necessary to cope with different views about the organization of work and the use of resources. The need for flexible and evolving environments is supported by metadata descriptions and schema evolution. Advanced transaction models should be in place to reconcile flexibility with consistency.

Process Issues

A number of workflow patterns for relevant process issues are described by van der Aalst et al. (2003). In particular, the assignment of tasks is a highly dynamic activity that should be performed promptly and correctly. Dynamic team formation (Georgakopoulos, 2004), with members being added and released as a process goes on, requires the system to be aware of its environment. Active databases provide some sort of environment awareness, and synchronization techniques may be used to keep processes and their data in a consistent state.

Data Issues


Data change in different ways, including location, as a result of people performing their activities. To locate the relevant data in a changing environment, with autonomous collaborating agents, is a difficult task. A complicating factor in distributed environments is that of semantic heterogeneity, in which people assign different meanings to the same or closely related data. We also have the issue of transparency in which some data should be shared while others should be kept private.

The use of metadata descriptions and schema evolution helps to maintain data availability and consistency and to deal with semantic heterogeneity. Techniques of data replication and partitioning have an impact on privacy, and active database technology may be employed to notify agents and deliver the needed data to them.

DATABASE SUPPORT

This section relates particular database technologies to the workflow issues discussed above.

Distributed, Active, and Multi-Databases

Distributed databases are used to manage data stored in several places, while maintaining uniform control over the operations that use and change them. Parallel databases allow the parallel execution of actions, and multidatabases, also referred to as federated or heterogeneous databases, make possible the coordinated use of several databases. These types of databases are already being used by workflow management systems, which apply their internal mechanisms to deal with diversity, scalability, interoperability, and heterogeneity.

Active database management systems make it possible for the system to be aware of what is happening around it and to react properly and spontaneously, for example, by firing triggers and alerters. They are useful for the assignment of tasks and the location of data, and they provide appropriate support for building adaptation mechanisms into workflow systems (Bernstein et al., 1998).
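As a rough illustration of how an active database's event-condition-action (ECA) behavior could drive task assignment, here is a minimal sketch; the event names, rule format, and assignment step are invented for the example and do not follow any specific WfMS or trigger dialect.

```python
# Minimal event-condition-action (ECA) dispatcher in the style of an
# active database: registered rules fire spontaneously on matching events.

RULES = []

def rule(event_type, condition):
    """Register an ECA rule: on event_type, if condition holds, run the action."""
    def register(action):
        RULES.append((event_type, condition, action))
        return action
    return register

def signal(event):
    for event_type, condition, action in RULES:
        if event["type"] == event_type and condition(event):
            action(event)

@rule("activity_completed", condition=lambda e: e.get("next_role") is not None)
def assign_next_task(event):
    # Hypothetical assignment step: notify whoever plays the required role.
    print(f"assigning follow-up task to role '{event['next_role']}'")

signal({"type": "activity_completed", "next_role": "reviewer"})
# -> assigning follow-up task to role 'reviewer'
```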

Data Replication and Partitioning

Data may be partitioned or replicated in different locations to improve performance and scalability and to assure privacy. Data replication and partitioning are basic strategies used in distributed and parallel databases. An optimal strategy to set the right combination of replication
