
Knowledge Mining

Mahesh S. Raisinghani

Texas Woman’s University, USA

INTRODUCTION

Numerous conferences and several articles in scholarly and business journals have tried to get a handle on knowledge. The growth of knowledge consulting organizations signals a growing conviction that knowing about knowledge is critical to business success. Multiple factors have led to the current knowledge boom. The perception and the reality of a new global competitiveness are one driving force. Rapid change and increasing competition for the dollars, marks, and yen of increasingly sophisticated consumers have led firms to seek a sustainable advantage that distinguishes them in their business environments.

Knowledge is neither data nor information, though it is related to both. Most people have an intuitive sense that knowledge is broader, deeper, and richer than data or information. More and more, business leaders and consultants talk about knowledge as the key to a sustainable competitive advantage. Knowledge workers, knowledge-creating company, knowledge capital, and leveraging knowledge have become familiar phrases (Davenport & Prusak, 1998; Turban, McLean, & Wetherbe, 2003).

During his keynote speech at the Information Resources Management Association's annual meeting in Boston, Massachusetts, Venkatraman (1998) discussed how companies manage their knowledge assets and how organizations have moved from the industrial economy to the knowledge economy (see Figure 1).

The purpose of this article is to discuss the knowledge concept of knowledge mining and address the following questions:

How are the concepts of data life cycle and knowledge discovery related?

What is the taxonomy of knowledge mining and its benefits?

What is the role of knowledge in software development?

BACKGROUND: DATA LIFE CYCLE AND KNOWLEDGE DISCOVERY

To better understand how to manage data and knowledge, it is necessary to trace how and where data flow in organizations. Businesses do not run on data; they run on information and their knowledge of how to put that information to use successfully. Everything from innovative product designs to brilliant competitive moves relies on knowledge. However, knowledge is not readily available: in many cases it must be continuously derived from data, and that derivation is not always simple.

The transformation of data into knowledge may be accomplished in several ways. The process starts with data collection from various sources. These data are stored in a database followed by storage in a data warehouse. To discover knowledge, the processed data may go through a transformation that makes them ready for analysis. The analysis is done with data mining tools, which look for patterns, to support data interpretation. The result of all these activities is generated knowledge. Both the data, at various times during the process, and the knowledge, derived at the end of the process, may need to be presented to users by using different presentation tools. As illustrated in Figure 2, the created knowledge is stored in a knowledge base (Turban et al., 2003).
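The flow just described can be pictured as a small pipeline. The sketch below is only an illustration of that flow under assumed, hypothetical helper names (collect_sources, transform, mine_patterns, interpret); it is not drawn from the cited sources and merely stands in for real warehousing and mining tools.

```python
# Minimal sketch of the data life cycle: collection -> storage/warehousing ->
# transformation -> data mining -> interpretation -> knowledge base.
from collections import Counter

def collect_sources():
    # Raw records gathered from several operational sources.
    return [
        {"store": "north", "item": "paper"},
        {"store": "north", "item": "printer"},
        {"store": "south", "item": "paper"},
        {"store": "south", "item": "paper"},
    ]

def transform(records):
    # Prepare the warehoused data for analysis (here: keep item names only).
    return [r["item"] for r in records]

def mine_patterns(items):
    # Stand-in for a data mining tool: frequency counts play the role of "patterns".
    return Counter(items)

def interpret(patterns):
    # Turn a pattern into a statement a decision maker can act on.
    item, freq = patterns.most_common(1)[0]
    return f"'{item}' is the most frequently ordered item ({freq} orders)"

knowledge_base = []                          # knowledge store at the end of the cycle
raw = collect_sources()                      # collection
prepared = transform(raw)                    # storage, warehousing, transformation
patterns = mine_patterns(prepared)           # data mining
knowledge_base.append(interpret(patterns))   # interpretation and storage of knowledge
print(knowledge_base)
```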

Figure 1. Transition from the industrial economy to the knowledge economy

[Figure: contrasts three stages, production (economy of scale), capital (economy of scope), and knowledge (economy of expertise), and traces the progression Data, Information, Knowledge, Action, Result through design, expertise, leverage, and execution.]


Figure 2. Data life cycle and knowledge discovery

[Figure: data flows from data sources through collection and storage into a data warehouse; data analysis then proceeds through selection (target data), preprocessing (preprocessed data), transformation (transformed data), data mining (patterns), and interpretation (knowledge); the resulting knowledge is presented to users and stored in a knowledge base for use.]

KNOWLEDGE MINING:

TAXONOMY AND BENEFITS

All decision support systems use data, information, or knowledge. These three terms are sometimes used interchangeably. Data items are elementary descriptions of things, events, activities, and transactions that are recorded, classified, and stored but are not organized to convey any specific meaning. Data items can be numeric, alphanumeric, figures, sounds, or images. Knowledge consists of data items that are organized and processed to convey understanding, experience, accumulated learning, and expertise as they apply to a current problem or activity (Grinstein, 2003; Grinstein, Kobsa, Plaisant, Shneiderman, & Stasko, 2003; Last, Friedman, & Kandel, 2003; Tan, Kumar, & Srivastava, 2004; Turban & Aronson, 1998).

The mental processing and representation of knowledge are complex activities, and our understanding of them is still rudimentary and subject to debate. A general concept for describing knowledge remains as elusive as ever, though various key concepts have been developed from specific viewpoints in the cognitive sciences. Another way to define knowledge is to consider the way it is stored in human memory. Here, knowledge refers to a permanent structure of information stored in memory.

Han, Fu, Koperski, Melli, Wang, and Zaane (1995) used the term knowledge mining as a practical synonym of knowledge discovery, not as an extension of it. At present, the term is strongly associated with, and used as a synonym of, knowledge discovery and data mining. In contrast, the term software mining denotes a special kind of knowledge discovery wherein the source data is already in the form of rules or program code.

Knowledge mining consists of the following four integrated components, designed to seamlessly guide the extraction process and contribute to providing corporations with a concise understanding of their business rules (Aiken, Muntz, & Richards, 1994; Chiang, 1995; Weiss, Buckley, Kapoor, & Damgaard, 2003; Yang, Hongji, & Chu, 2001; Yang, Hongji, Chu, Cheng, & Zhan, 2001):

System-wide knowledge recovery provides an overall view of the business processes supported by an application. The system-wide knowledge recovery facility enables analysts to identify the programs in which particular business rules exist and then extract those rules from applications.

Program-level analysis enables the structure and interrelationships within programs to be revealed. The analysis focuses on variable usage, paragraph call diagrams, GO TO diagrams, execution paths, and complex queries.

Business rule extraction enables concise business rules to be extracted from within legacy programs and across entire legacy systems. Support is provided by

variable-based techniques, which enable a business rule to be extracted based upon a specific variable within a program (a minimal sketch of this idea appears after this list).

value-based specialization techniques, by which programs containing embedded data can be greatly simplified and specific rules can be uncovered.


system-wide techniques, in which some legacy applications pass information from one program to another while processing a specific business rule. The system-wide extraction technique enables a business rule to be analyzed and extracted across multiple programs or across the system as a whole.

Automatic documentation allows for a structured view of the application and generates books of HTML documentation. Knowledge mining’s automatic documentation focuses on the ability to save diagrams and reports in various formats.
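The following sketch illustrates only the variable-based extraction idea in miniature: given a target variable, gather the legacy statements that read or write it, which roughly approximates the slice of code carrying the embedded rule. The toy COBOL-like source and all identifiers are invented for illustration; this is not RescueWare or any specific product.

```python
import re

# Toy legacy fragment; in practice this would be COBOL or another legacy language.
LEGACY_SOURCE = """
MOVE 0 TO DISCOUNT-RATE.
IF CUSTOMER-TYPE = 'GOLD' MOVE 10 TO DISCOUNT-RATE.
IF ORDER-TOTAL > 5000 ADD 5 TO DISCOUNT-RATE.
COMPUTE FINAL-PRICE = ORDER-TOTAL * (1 - DISCOUNT-RATE / 100).
DISPLAY FINAL-PRICE.
"""

def extract_rule_slice(source, variable):
    """Return the statements that mention `variable` - a crude, variable-based
    approximation of the business rule embedded in the program."""
    pattern = re.compile(r"\b" + re.escape(variable) + r"\b")
    return [line.strip() for line in source.splitlines()
            if line.strip() and pattern.search(line)]

# Statements that together make up the discount rule.
for statement in extract_rule_slice(LEGACY_SOURCE, "DISCOUNT-RATE"):
    print(statement)
```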

Knowledge mining satisfies the needs of corporations by helping to identify and understand the application assets that currently exist within their legacy applications. Knowledge mining:

increases understanding of internal processes. Knowledge mining helps corporations to unlock the business knowledge contained in their embedded rules, providing a better understanding of the internal processes that run their external business.

increases reuse of existing assets. Increasingly, large portions of an organization’s business rules can be reused in new platforms once identified and understood by the organization.

decreases risk and cost of alternatives. Once a solid understanding of a corporation’s business knowledge is obtained, organizations are well positioned to provide input into the planning processes for legacy modernization initiatives, thus decreasing the risk of choosing incorrect alternatives for business needs and increasing the value of their business processes.

For instance, RescueWare, a knowledge mining tool, enables organizations to identify and extract critical business rules embedded in legacy applications (Relativity Technologies, n.d.). The U.S. Air Force felt grounded because its core business processes (a key retail supply management system) were buried in a Unisys 2200 Clearpath mainframe computer, and its data was trapped in a proprietary DMS-100 database. The Air Force wanted to move to a flexible Web-based infrastructure to reduce information technology costs and to streamline its systems, all major priorities that could not be achieved with the inflexible structure of its legacy systems. Specifically, the Air Force wanted to Web-enable its standard base supply system (SBSS), a series of inventory, accounting, and order management systems that control the flow of supplies from the warehouse to deployment, and to integrate the SBSS system into the larger Air Force Integrated Logistics System, a broad set of supply maintenance and accounting systems. It also wanted to provide direct support to all active Air Force units, the National Guard, and the Reserve.

RescueWare’s automated functionality enabled the Air Force to reduce the expected project time to 30 months and reduce the cost to $12 million—much lower than the investment that would have been required to complete this task manually. In the inventory and analysis phase of the Air Force project, RescueWare solutions were used to analyze and identify the key areas of value within the large, monolithic system containing approximately 1.6 million lines of code that had been built and refined by the Air Force over the course of three decades.

Then RescueWare’s knowledge mining capabilities isolated the Air Force’s key business processes and extracted them from the larger application in which they had been contained. RescueWare’s business rule extraction tools were used to create defined, stand-alone pieces called e-components that function as independent programs and can be easily integrated with other parts of the military’s large technology framework. RescueWare was also used to extract complexity information on these components, which proved important for managing the project and determining the best path in performing the tasks associated with it (Relativity Technologies, n.d.). Knowledge mining would, in principle, be able to analyze business rules and software code and extract more general rules or models from them. However, the machine learning techniques that would be necessary to deal with such complex input data are still in their infancy.

FUTURE TRENDS

Software development is knowledge intensive. Many concepts have been developed to ease or guide the processing of knowledge in software development, including information hiding, modularity, objects, functions and procedures, patterns, and more. Methods and approaches in software engineering are often based on the results of empirical observations on individual success stories. Table 1 lists the viewpoint corresponding to each key knowledge concept in the cognitive sciences (Robillard, 1999).

Related studies have identified two types of knowledge, procedural and declarative, and their corresponding memory contents. Procedural knowledge, including psychomotor ability, is dynamic. Procedural memory stores all the information related to the skills developed to interact with our environment, such as walking, typing, and so forth.


Table 1. Viewpoints corresponding to each key knowledge concept in the cognitive sciences

Procedural / Declarative: knowledge, nature of content
Schema: knowledge, integral structure of
Proposition: formal knowledge representation
Chunking: representing units of knowledge
Planning: managing knowledge structures

Procedural knowledge encompasses the know-how and, once learned, is rarely forgotten.

Declarative knowledge, based on facts, is static and concerned with the properties of objects, persons, and events and their relationships. Declarative memory consists of two types of knowledge: topic (or semantic) and episodic. Topic knowledge refers to the meaning of a word, such as its definitions in a dictionary or textbook. Episodic knowledge consists of one’s experience with knowledge; it is learned through experience once the topic knowledge is obtained from textbooks, formal training, and education (Robillard, 1999).

Software development requires topic and episodic knowledge. The notion of “schema” was first proposed for artificial intelligence. The schema concept assumes that knowledge is stored in a human’s memory in a preorganized way. A schema is a generic structure built up from an undefined variety of topics and from episodic knowledge. The topic part of the schema represents objects or events; the episodic part represents temporal or causal links between objects or events. For instance, our schema of the operating system represents our memory organization of the related items of topical knowledge, including icons, setup, layout, and menu structure. It also comprises episodic knowledge built up from the user’s experience with the operating system, including how to run a program, open a file, use a spreadsheet, listen to music, and send e-mail.

Knowledge formulation is based on atomic components described in terms of propositions and predicates. A proposition is the smallest unit of knowledge constituting an affirmation as well as the smallest unit that can be true or false. The theoretical hypothesis concerning the cognitive structure of the human information system states that, at a certain level, information is organized in propositional form. Another component of the mental process is the amount of knowledge available for immediate processing. Psychologists use the concept of chunks (7 ± 2) to account for the limited amount of knowledge that can be handled by the human mind at any given time. Software methodologies based on encapsulation, information hiding, modularization, abstraction, and even the divide-and-conquer approach all deal with the chunking phenomenon. Successful methodologies based on icons, graphic symbols, and reserved words are naturally limited to the chunk number for the simultaneous use of elements in working memory.

Planning is one of the human brain’s most powerful natural activities. The limited capacity of the human mind’s working memory cannot keep track of all the information from all the knowledge domains visited, so the mind relies on plans. These plans have three main characteristics (Robillard, 1999):

heuristic nature

optimal use of memory

higher control level

Experience plays a major role in any knowledge activity. Psychologists recognize a distinct structure (i.e., episodic structure) in human memory that accounts for experience. Software development can be improved by recognizing the related knowledge structure or representation, including building schemas, validating schema default values, acquiring topic knowledge, performing planning activities, applying formal specifications to define problems, and having the appropriate tools to manage the chunking phenomenon.

CONCLUSION

This article summarizes the elements necessary for the comprehension of knowledge mining. The transformation of data into knowledge to support decision making is a multiple step process and can be accomplished using different tools. Data mining and data warehousing play a major role in knowledge discovery. Software development is knowledge intensive and is based on the five key knowledge concepts in the cognitive sciences (i.e., procedural/declarative, schema, proposition, chunking, and planning). Mining knowledge at multiple concept levels may help end-users such as software analysts, managers/executives, or other decision-making personnel to find some interesting rules that are difficult to discover otherwise and view database contents at different abstraction levels and from different perspectives.


REFERENCES

Aiken, P., Muntz, A., & Richards, R. (1994). DoD legacy systems: Reverse engineering data requirements. Communications of the ACM, 37(5), 26-41.

Chiang, R. H. L. (1995). A knowledge-based system for performing reverse engineering of relational database. Decision Support Systems, 13, 295-312.

Davenport, T. H., & Prusak, L. (1998). Working knowledge. Cambridge, MA: Harvard Business School Press.

Grinstein, G. (2003, October). Integrating visualization with data mining and knowledge discovery for high dimensional data exploration and discovery. Proceedings of the IEEE Visualization Conference, Seattle, WA.

Grinstein, G. , Kobsa, A., Plaisant, C., Shneiderman, B., & Stasko, J. (2003, October). Which comes first, usability or utility? IEEE Visualization Conference Proceedings, Seattle, WA.

Han, J., Fu, Y., Koperski, K., Melli, G., Wang, W., & Zaane, O. (1995). Knowledge mining in databases: An integration of machine learning methodologies with database technologies. Available online from http:// citeseer.ist.psu.edu/han95knowledge.html

Last, M., Friedman, M., & Kandel, A. (2003, August 24-27). The data mining approach to automated software testing. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC (pp. 388-396).

Relativity Technologies. (n.d.). USAF saves millions with supply management software. Retrieved June 29, 2004, from http://64.233.161.104/search?q=cache:xdJpf6azZEJ:www.relativity.com/News/coverage/eCompany-May-01.htm+RescueWare+software&hl=en

Robillard, P. N. (1999). The role of knowledge in software development. Communications of the ACM, 42(1), 87-92.

Tan, P.-N., Kumar, V., & Srivastava, J. (2004). Selecting the right objective measure for association analysis. Information Systems, 29(4), 293-313.

Turban, E., McLean, J., & Wetherbe, J. (2003). Information technology for management, making connections for strategic advantage (3rd Ed.). New York: Wiley.

Turban, E., & Aronson, R. (1998). Decision support systems and intelligent systems (5th Ed.). Upper Saddle River, NJ: Prentice Hall.


Venkatraman, N. (1998). Keynote speech. Annual Meeting of the Information Resources Management Association, Boston.

Weiss, S. M., Buckley, S. J., Kapoor, S., & Damgaard, S. (2003). Knowledge-based data mining. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, August 24-27 (pp. 456-461).

Yang, L., Hongji, Y., & Chu, W. (2001, January/February). A concept-oriented belief revision approach to domain knowledge recovery from source code. Journal of Software Maintenance and Evolution: Research and Practice, 13(1), 31-52.

Yang, L., Hongji, Y., Chu, W., Xiaochun, C., & Zhan, C. (2001). Improving the reliability of knowledge mining in legacy code by utilising cooperative information [Special issue]. International Journal of Fuzzy Systems, 3(2).

KEY TERMS

Automatic Documentation: Allows for a structured view of the application and generates books of HTML documentation. Knowledge mining’s automatic documentation focuses on the ability to save diagrams and reports in various formats.

Business Rule Extraction: Enables concise business rules to be extracted from within legacy programs and across entire legacy systems.

Declarative Knowledge: Knowledge based on facts; it is static and concerned with the properties of objects, persons, and events and their relationships.

Episodic Knowledge: Declarative memory consists of two types of knowledge, topic (or semantic) and episodic. Episodic knowledge consists of one’s experience with knowledge; it is learned through experience once the topic knowledge is obtained from textbooks, formal training, and education.

Knowledge Mining: A practical synonym of knowledge discovery, not an extension of it. At present, the use of the term is strongly associated as a synonym of knowledge discovery and data mining. Knowledge mining consists of the following four integrated components designed to seamlessly guide the extraction process and contribute to providing corporations with a concise understanding of their business rules: Systemwide knowledge recovery, program-level analysis, business rule extraction, and automatic documentation.

334

TEAM LinG


Program-Level Analysis: Enables the structure and interrelationships within programs to be revealed. The analysis focuses on variable usage, paragraph call diagrams, GO TO diagrams, execution paths, and complex queries.

Software Mining: A special kind of knowledge discovery in which the source data is already in the form of rules or program code.

System-Wide Knowledge Recovery: Provides an overall view of the business processes supported by an application. The system-wide knowledge recovery facility enables analysts to identify the programs in which particular business rules exist and then extract those rules from across applications.

Topic Knowledge: Declarative memory consists of two types of knowledge-topic or semantic and episodic. Topic knowledge refers to the meaning of a word, such as its definitions in a dictionary or textbook.


Logic Databases and Inconsistency Handling

José A. Alonso-Jiménez

Universidad de Sevilla, Spain

Joaquín Borrego-Díaz

Universidad de Sevilla, Spain

Antonia M. Chávez-González

Universidad de Sevilla, Spain

INTRODUCTION

Nowadays, data management on the World Wide Web needs to consider very large knowledge databases (KDB). The larger a KDB is, the smaller the possibility that it is consistent. Consistency checking algorithms and systems fail to analyse very large KDBs, and so many users have to work every day with inconsistent information.

Database revision, the transformation of the KDB into another, consistent database, is a solution to this inconsistency, but the task is computationally intractable. Paraconsistent logics are also a useful option for working with inconsistent databases. These logics work on inconsistent KDBs but prohibit nondesired inferences. From a philosophical (logical) point of view, paraconsistent reasoning is something that human discourse itself practises. From a computational, logical point of view, we need to design logical formalisms that allow us to extract useful information from an inconsistent database, taking into account diverse aspects of the semantics that are “attached” to deductive database reasoning (see Table 1). The arrival of the semantic web (SW) will force database users to work with KDBs that are expressed by logic formulas of higher syntactic complexity than classic logic databases.
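A minimal sketch of the paraconsistent flavour, under the simplifying assumption of a propositional fact base with explicit negation: a query is judged true only when it is asserted and its complement is not, so a local contradiction stays local instead of making everything derivable. This illustrates the general idea only, not any particular logic cited in this article.

```python
# Fact base that is already inconsistent; a leading "-" marks explicit negation.
facts = {"employee(ann)", "-employee(ann)", "manager(bob)"}

def complement(literal):
    # Complement of a literal: add or remove the explicit negation sign.
    return literal[1:] if literal.startswith("-") else "-" + literal

def answer(query):
    """Four-valued answers in the spirit of paraconsistent (Belnap-style) reasoning."""
    supported = query in facts
    refuted = complement(query) in facts
    if supported and refuted:
        return "contradictory"   # known, but inconsistently
    if supported:
        return "true"
    if refuted:
        return "false"
    return "unknown"

print(answer("employee(ann)"))   # contradictory - the clash is reported, not propagated
print(answer("manager(bob)"))    # true - unaffected by the unrelated contradiction
print(answer("manager(carl)"))   # unknown
```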

BACKGROUND

Logic databases are based on the formalisms of first order logic (FOL); thus, they inherit a classical semantics that is based on models. They can also be interpreted within a proof-theoretic approach to logical consequence from the logic programming paradigm (Lloyd, 1987). The extended database semantics paradigm was developed to lay the foundations of query-answering tasks and related questions (see Minker, 1999), but its aim is not to deal with inconsistencies.

Table 1. Semantic aspects to consider in logic databases

Classical semantics for FOL

Extended semantics for databases

Reiter's formalization of databases (Reiter, 1984). Closed world assumption

Relations among a KDB, queries and integrity constraints

Expressive power of recursive definitions

Consistency checking versus intentional part of the KDB

Multivalued semantics

Contextualized semantics for ontologies or data

The data cleaning task may involve, in the framework of repairing logic databases, logical reasoning and automated theorem proving (Boskovitz, Goré, & Hegland, 2003).

On the other hand, new paradigms, such as the SW, need new formalisms to reason about data. Description logics (DL) provide logic systems based on objects, concepts, and relationships, with which we can construct new concepts and relations for reasoning (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003). Formally, DL are a subset of FOL, and the classical problems of consistency remain, but several sublogics of DL provide nice algorithms for reasoning services. The Web Ontology Language (OWL; more precisely, its DL sublanguage) is a description logic designed for automated reasoning, not only for the classical ask-tell paradigm. With languages such as OWL, ontologies exceed their traditional aspects (e.g., taxonomies and dictionaries) to become essential in frameworks such as data integration.
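As a toy illustration of one such reasoning service, the sketch below checks subsumption in a deliberately tiny fragment where a concept is nothing more than a conjunction of atomic names, so subsumption reduces to set inclusion of the conjuncts. Real DL reasoners for OWL handle far richer constructors (roles, quantifiers, negation) with tableau or other calculi; the concept names here are invented for the example.

```python
# A concept in this toy fragment is a frozenset of atomic concept names,
# read as their conjunction (e.g. Mother = Person AND Female AND Parent).
Woman = frozenset({"Person", "Female"})
Mother = frozenset({"Person", "Female", "Parent"})

def subsumes(general, specific):
    """In this conjunctive fragment, D subsumes C exactly when every
    conjunct of D already appears among the conjuncts of C."""
    return general <= specific

def classify(individual_types, ontology):
    """Reasoning service: list the named concepts the individual falls under."""
    return [name for name, concept in ontology.items()
            if subsumes(concept, individual_types)]

ontology = {"Woman": Woman, "Mother": Mother}
mary = frozenset({"Person", "Female", "Parent"})
print(subsumes(Woman, Mother))   # True: every Mother is a Woman
print(classify(mary, ontology))  # ['Woman', 'Mother']
```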

The classical notion of inconsistency in databases mainly deals with the violation of integrity constraints. This notion must be expanded because of the new notion of logic databases in the SW, in which ontologies and data both play the same role in knowledge management. Therefore, there are several sources of inconsistency (see Table 2). This role is not limited to the database itself but also includes the verification and validation of knowledge-based systems (Bench-Capon, 2001). Inconsistency arises in the initial steps of ontology building for several reasons, not only because of the updating of data.


Table 2. A list of sources of inconsistencies from practical knowledge management

Dirty data: Some kinds of data dirtiness give rise to failure of integrity constraints (Kim, Choi, Hong, Kim, & Lee, 2003).

Wrong data mining: The output of data mining systems does not satisfy integrity constraints or ontology requirements. In multiagent data mining, the different outputs lead to a problem of data integration.

Expressiveness clashes with model theory: The logical syntax/semantics (from the deductive database paradigm) does not allow new features to be used in the usual knowledge representation in the SW. This absence implies messy definitions that may be incorrect.

Bad design of the common knowledge shared by different users: The intentional component part does not describe the users' intended requirements. The logically consistent KDB does not fit with the users' beliefs; thus, new updates may produce inconsistencies.

Deficient ontology learning: Ontology acquisition is a tedious task that the user tends to finish before he or she thinks is advisable. A poor ontology associated with consistent data may produce inconsistency.

Skolem noise: A kind of dirty data produced by the use of an automated theorem prover in data cleaning of logic databases (Alonso-Jiménez, Borrego-Díaz, Chávez-González, Gutiérrez-Naranjo, & Navarro, 2003).

Deficient maintenance of the KDB: The selected method for preserving consistency is not robust under every sort of update.

Neglected development of the intentional database: The development of the intentional component part of the database produces an inconsistent theory (the ontology, in the SW paradigm; see, e.g., Baclawski, Kokar, Waldinger, & Kogut, 2002).

Logical interpretation in data integration: The design of a data integration system, which provides uniform access to multiple and heterogeneous information sources, needs query reformulation, ontology mapping or integration and, in general, logical interpretation.

Procedural incompleteness: Incomplete query-answering algorithms do not produce any witness for some integrity constraints of existential character.

Conflicting information in data integration: A special case in data integration: the information received from different, individually consistent sources is inconsistent.

Deficient specification of the ontology language: The specification of the language for ontology representation is inconsistent (Fikes, McGuinness, & Waldinger, 2002).

Inadequate data cleaning: Some criteria used to take decisions in data cleaning make the KDB inconsistent.

In general, the repair of a logic database involves the study of the soundness and perhaps the completeness of the repair method (i.e., whether the method outputs only correct solutions and whether it outputs all the relevant solutions). Semantics would support reasoning services such as checking self-consistency, checking the relations between concepts (such as subsumption), and classifying objects according to the ontology.

Systems exist in which both paradigms, classical and SW logic databases, are reconciled by extending the former (see, e.g., Pan & Heflin, 2003). However, the relation between DL and database models may not be fruitfully formalized because of the limited expressiveness of the DL system selected to make the reasoning feasible (see chapter 4 in Baader et al., 2003).

INCONSISTENCY HANDLING

Solutions that are suggested to work in the presence of inconsistencies can be classified according to different views (see Table 3, where several references appear). The first aspect, and maybe the most important, is the compatibility between the original semantics of the KDB source and the logical formalism selected to handle inconsistency. From this point of view, there exist paraconsistent logics that limit the inference power of FOL to avoid nondesired answers, and also modal logics for representing different aspects of the information sources. These approaches manage semantics that are essentially different from the semantics of KDBs. On the other side, we can find methods that classify or order interesting subsets according to the original semantics of the KDB, such as the argumentative approach or the integration of data by fusion rules, but they do not repair the KDB. Other methods propose how the KDB should be revised (e.g., the integrity constraints of the extensional database). However, it is necessary to point out that automated knowledge revision is an essentially different task in the case of ontologies, because the ontology source represents a key organisation of the knowledge of the owner and, as in every logical theory, minor changes may produce unexpected and dangerous anomalies.

Another point of view concerns the part of the KDB that is repaired when an anomaly is found. According to this view, the argument-based methods mentioned earlier can be used to repair only the anomalous argument. Due to the high complexity of consistency checking algorithms, preserving consistency under updates is a better option than repairing. In the case of evolving ontologies, new systems such as the KAON infrastructure are needed (for more information, see http://kaon.semanticweb.org/kaon).
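A minimal sketch of the preserve-rather-than-repair idea for a single, assumed constraint (the functional dependency name determines department): each update is checked before it is committed, so the stored relation never becomes inconsistent. The relation and names are invented for illustration and are not tied to KAON or any system cited here.

```python
# Consistency-preserving update guard for the functional dependency name -> department.
def violates_fd(rows):
    # True if some name is mapped to two different departments.
    seen = {}
    for name, dept in rows:
        if name in seen and seen[name] != dept:
            return True
        seen[name] = dept
    return False

def apply_update(db, new_row):
    """Commit the update only if the constraint still holds afterwards."""
    candidate = db + [new_row]
    if violates_fd(candidate):
        raise ValueError(f"update {new_row} rejected: it would violate name -> department")
    return candidate

db = [("ann", "sales")]
db = apply_update(db, ("bob", "hr"))         # accepted
try:
    db = apply_update(db, ("ann", "hr"))     # rejected: ann is already in sales
except ValueError as err:
    print(err)
print(db)                                    # the stored relation stays consistent
```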

There are methods dealing with the enforcement of consistent answers (i.e., answers that satisfy the integrity constraints) from inconsistent databases; this is done by transforming the query itself or by limiting the inference power of the system.
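The sketch below is a brute-force illustration of the consistent-answer idea: an answer counts only if it holds in every repair of the inconsistent relation. It assumes a single key constraint on a tiny, invented relation; the techniques cited in Table 3 compute such answers without enumerating all repairs.

```python
from itertools import product

# salary(name, amount); name is the key, but ann appears with two conflicting amounts.
salary = [("ann", 50000), ("ann", 55000), ("bob", 40000)]

def repairs(rows):
    """All repairs w.r.t. the key constraint: keep exactly one tuple per key value."""
    by_key = {}
    for row in rows:
        by_key.setdefault(row[0], []).append(row)
    return [set(choice) for choice in product(*by_key.values())]

def consistent_answers(rows, condition):
    """Names returned in every repair - the consistent answers to the query."""
    answer_sets = [{name for name, amount in repair if condition(amount)}
                   for repair in repairs(rows)]
    return set.intersection(*answer_sets)

print(consistent_answers(salary, lambda amount: amount >= 45000))  # {'ann'}: true in both repairs
print(consistent_answers(salary, lambda amount: amount >= 52000))  # set(): true in only one repair
```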


Table 3. List of recent solutions for inconsistency handling

Paraconsistent logics (Grant & Subrahmanian, 2000; Hunter, 1998)

Nonrepairing and merging-oriented techniques:

Preorders on information sets (Cantwell, 1998; Marquis & Porquet, 2003 in the paraconsistent framework)

Argumentative hierarchy (Elvang-Goransson & Hunter, 1995), argumentative frameworks (Dung, 1995) and databases (Pradhan, 2003)

Fusion rules (Bloch et al., 2001)

Merging databases (Cholvy & Moral, 2001)

Contextualizing ontologies (Bouquet, Giunchiglia, van Harmelen, Serafini, & Stuckenschmidt, 2003) and data (MacGregor & Ko, 2003)

Measuring the anomalies:

Evaluating by means of a paraconsistent logic (Hunter, 2003)

Measuring inconsistent information (Knight, 2003)

Consistent interpretation of Skolem noise (Alonso et al., 2003)

Repairing techniques:

To apply knowledge reductions in inconsistent systems (Kryszkiewiccz, 2001)

Fellegi-Holt method (Boskovitz et al., 2003)

Database repairs by tableaux method (Bertossi & Schwind, 2004)

Consistent querying to repair databases (Greco & Zumpano, 2000)

Consistent enforcement of the database by means of greatest consistent specializations (Link, 2003)

Consistent answering techniques without reparation:

Transformation of the query to obtain consistent answers (Celle & Bertossi, 1994)

Consistent query answer in the presence of inconsistent databases (Greco & Zumpano, 2000)

To use bounded paraconsistent inference (see, e.g., Marquis & Porquet, 2003)

Detecting the cause of the inconsistency and retrieving a subset of the original KB (Ariel & Avron, 1999)

Consistency preserving methods:

Consistency preserving updates in deductive databases (Mayol & Teniente, 2003)

Another method is to work in the context of data integration and merging (Levy, 2000). Fusion rules are the most direct treatment of simple information sets. The complex case, in which several ontologies come into play, can be solved by contextualizing the knowledge. The contextualization of ontologies is an extension of the classical method introduced by McCarthy, and it has been used in important ontology projects such as CyC (for more information, see http://www.cyc.com). The use of contexts prevents inconsistencies and allows coherent subsets of the target ontology to be built.

Finally, there exist measures to estimate inconsistency. Although these measures may be infeasible to compute because of their semantics-oriented definitions, this obstacle may be partially overcome by weaker metrics that estimate the cognitive difference between the ontology source and the ontology target using only syntactic features (see, e.g., Gutiérrez-Naranjo, Alonso-Jiménez, & Borrego-Díaz, 2002; Hunter, 2003).

FUTURE TRENDS

To handle inconsistencies in the semantic web, future work must study verification techniques based on sound, limited testing and aided by a powerful automated theorem prover (see Alonso-Jiménez et al., 2003; Boskovitz et al., 2003). These techniques need a deep analysis of the behaviour of automated theorem provers operating with great autonomy, because a biased behaviour may produce deficient reports about inconsistencies in the KDB.

CONCLUSION

Inconsistency handling has become a prevailing task in important fields such as the semantic web, data integration, and data cleaning. Several techniques have been proposed, but the need to work with very large databases makes some of them infeasible, especially those that are applied to the full KDB.


REFERENCES

Alonso-Jiménez, J. A., Borrego-Díaz, J., Chávez-González A., Gutiérrez-Naranjo M. A., & Navarro-Marín, J. D. (2003). Towards a practical argumentative reasoning in qualitative spatial databases. Proceedings of the 16th International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems (IEA/ AIE 2003), Lecture Notes in Computer Science, 2850 (pp. 789-798).

Ariel, O., & Avron, A. (1999). A model-theoretic approach for recovering consistent data from inconsistent knowledge bases. Journal of Automated Reasoning, 22(2), 263-309.

Baader, F., Calvanese, D., McGuinness D., Nardi, D., & Patel-Schneider, P. (2003). The description logic handbook. Cambridge, UK: Cambridge University Press.

Baclawski, K., Kokar, M., Waldinger, R., & Kogut, P. (2002). Consistency checking of semantic web ontologies. Proceedings of the First International Semantic Web Conference 2002 (ISWC’02), Lecture Notes in Computer Science, 2342 (pp. 454-459).

Bench-Capon, T. (2001). The role of ontologies in the verification & validation of knowledge-based systems. International Journal of Artificial Intelligence, 16, 377-390.

Bertossi, L. E., & Schwind, C. (2004). Database repairs and analytic tableaux. Annals of Mathematics and Artificial Intelligence, 40(1/2), 5-35.

Bloch, I., Hunter, A., Appriou, A., Ayoun, A., Benferhat, S., Besnard, P., Cholvy, L., Cooke, R., Cuppens, F., Dubois, D., Fargier, H., Grabisch, M., Kruse, R., Lang, J., Moral, S., Prade, H., Saffiotti, A., Smets, P., Sossai, C. (2001). Fusion: General concepts and characteristics, International Journal of Intelligent Systems, 16(10), 1107-1134.

Boskovitz, A., Goré, R., & Hegland, M. (2003). A logical formalisation of the Fellegi-Holt method of data cleaning. International Conference on Intelligent Data Analysis (IDA 2003), Lecture Notes in Computer Science, 2810 (pp. 554-565).

Bouquet, P., Giunchiglia F, van Harmelen F., Serafini, L., & Stuckenschmidt, H. (2003). C-OWL: Contextualizing ontologies. Proceedings of the 2nd International Semantic Web Conference 2003 (ISWC’03), Lecture Notes in Computer Science , 2870 (pp. 164-179).

Cantwell, J. (1998). Resolving conflicting information. Journal of Logic, Language and Information, 7(2), 191-220.

Cholvy, L., & Moral, S. (2001). Merging databases: Problems and examples. International Journal of Intelligent Systems, Special Issue on Data and Knowledge Fusion, 16(10).

Celle, A., & Bertossi, L. (1994). Consistent data retrieval. Information Systems, 19(4), 33-54.

Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2), 321-358.

Elvang-Goransson, M., & Hunter, A. (1995). Argumentative logics: Reasoning from classically inconsistent information. Data and Knowledge Engineering, 16(1), 125-145.

Fikes, R., McGuinness, D. L., & Waldinger, R. (2002, January). A first-order logic semantics for semantic Web markup languages [Knowledge Systems Laboratory Tech. Rep. No. 02-01]. Retrieved September 16, 2004, from http://www.ksl.stanford.edu/KSL_ Abstracts/ KSL-02-01.html

Grant, J., & Subrahmanian, V. S. (2000). Applications of paraconsistency in data and knowledge bases. Synthese, 125, 121-132.

Greco S., & Zumpano, E. (2000) Querying inconsistent databases. Proceedings of the 7th International Conference of Logic for Programming and Automated Reasoning (LPAR 2000), Lecture Notes in Computer Science, 1955 (pp.308-325).

Gutiérrez-Naranjo, M. A., Alonso-Jiménez, J. A., & Borrego-Díaz, J. (2003). A quasimetric for machine learning. In F. J. Garijo, J. C. Riquelme, & M. Toro (Eds.), Advances in Artificial Intelligence (IBERAMIA 2002), Lecture Notes in Computer Science, 2527 (pp. 193-203).

Hunter, A. (1998). Paraconsistent logics. In D. Gabbay & Ph. Smets (Eds.), Handbook of defeasible reasoning and uncertain information (pp. 13-44). Dordrecht: Kluwer.

Hunter, A. (2003). Evaluating the significance of inconsistencies. Proceedings of the International Joint Conference on AI (IJCAI’03) (pp.468-473). San Francisco: Morgan Kaufmann.

Kim, W. Y., Choi, B.-J., Hong, E. K., Kim, S.-K., & Lee, D. (2003). A taxonomy of dirty data. Data Mining Knowledge Discovery, 7(1), 81-99.

Knight, K. M. (2003). Two information measures for inconsistent sets. Journal of Logic, Language and Information, 12(2), 227-248.
