
Rivero, L. Encyclopedia of database technologies and applications. 2006


data cubes because data mining can be performed at multidimensional and multilevel abstraction space in a data cube. Cubing and mining functions can be interleaved and integrated to make data mining a highly interactive and interesting process. Here, we first examine what the desired OLAP mining functions (Han, 1997, 1998c) are:

1. Cubing then mining: With the availability of data cubes and cubing operations, mining can be performed on any layer and any portion of a data cube. This means that one can first perform cubing operations to select the portion of data and set the granularity level before a data mining process starts.

2. Mining then cubing: Data mining is first performed on a data cube, and particular mining results are then analyzed further by cubing operations. For example, after classification, cubing operations can be performed for each obtained class, such as the high profit class (e.g., drilling down to detailed levels and examining its characteristics).

3. Cubing while mining: A flexible way to integrate mining and cubing operations is to perform similar mining operations at multiple granularities by initiating cubing operations during mining. By doing so, the same data mining operations can be performed on different portions of a cube or at different abstraction levels.

4. Backtracking: To facilitate interactive mining, a mining process should allow the user to backtrack one or a few steps, or up to a preset mark, and then explore alternative mining paths.

5. Comparative mining: A flexible data miner should allow comparative data mining, that is, the comparison of alternative data mining processes.

Other combinations are possible in OLAP mining; for example, “mining then mining” could first perform classification on a set of data and then find association patterns for each class (Han, 1997).
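As a rough illustration of the “cubing then mining” pattern in item 1, the sketch below first slices and rolls up a small fact table and only then applies a mining step. The fact table, the helper names `slice_rows`, `roll_up`, and `mine_high_profit`, and the threshold rule standing in for a real classifier are all assumptions made for this example; they are not taken from Han's systems.

```python
from collections import defaultdict

# Hypothetical fact table: (region, city, product, profit).
SALES = [
    ("West", "Vancouver", "laptop", 120),
    ("West", "Vancouver", "phone", 40),
    ("West", "Victoria", "laptop", 90),
    ("East", "Toronto", "laptop", 30),
    ("East", "Toronto", "phone", 20),
]


def slice_rows(rows, region):
    """Cubing step 1: select the portion of the data for one region."""
    return [r for r in rows if r[0] == region]


def roll_up(rows):
    """Cubing step 2: set the granularity to (region, product) by summing over cities."""
    totals = defaultdict(int)
    for region, _city, product, profit in rows:
        totals[(region, product)] += profit
    return dict(totals)


def mine_high_profit(cells, threshold):
    """Mining step: a stand-in 'classifier' that keeps cells at or above a threshold."""
    return sorted(key for key, profit in cells.items() if profit >= threshold)


west_cells = roll_up(slice_rows(SALES, "West"))
high_profit = mine_high_profit(west_cells, threshold=100)
print(high_profit)  # [('West', 'laptop')]
```

Reversing the order of the calls (mine first, then slice or drill into each resulting class) would correspond to “mining then cubing.”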

Online Data Mining

FUTURE TRENDS

The future for research and development is based on Aggarwal (2001); Aref, Elfeky, and Elmagarmid (2004); Boulicaut, Marcel, and Rigotti (2001); Dong et al. (2003); Han (1998b, 1999); Radivojevic, Cvetanovic, Milutinovic, and Sievert (2003); and Schwarz (2002):

Table 3. A summary of questions for research and development

Complete integration with data warehouse, OLAP, and relational technology

Scalability: efficient algorithms, parallel/distributed and incremental mining

Ad-hoc mining query language and its optimization

Multiple, integrated data mining functions and methods

Mining on new kinds of data: time-series data, text, multimedia, spatial, and Web

Visual data mining and knowledge visualization

Application exploration

Interactive, exploratory data mining environment

Aggarwal (2001) shows the benefits of interactive visual approaches. These can be applied to online data mining to enable the user to evaluate and understand the quality of his or her discoveries.

CONCLUSION

Most data mining tools must work on integrated, consistent, and clean data. This requires costly preprocessing for data cleaning, data transformation, and data integration. Therefore, a data warehouse constructed by such preprocessing is a valuable source of high-quality data for OLAP and data mining. Data mining may also serve as a valuable tool for data cleaning and data integration.

Effective data mining requires exploratory data analysis. OLAM provides facilities for data mining on different subsets of data and at different levels of abstraction. It does this by drilling, pivoting, filtering, dicing, and slicing on a data cube and on intermediate data mining results. This, together with data and knowledge visualization tools, can greatly enhance the power and flexibility of exploratory data mining.

By integrating OLAP with multiple data mining functions, OLAM provides users with the flexibility to select desired data mining functions and dynamically swap data mining tasks.

From our point of view, scientific databases constitute one important application area of online data mining, especially those concerning socioeconomic aspects, such as census and demographic databases and life forms.

REFERENCES

Aggarwal, C. (2001). Towards effective and interpretable data mining by visual interaction. ACM SIGKDD Explorations, 3(2), 11-22.

Aref, W. G., Elfeky, M. G., & Elmagarmid, A. K. (2004). Incremental, online, and merge mining of partial periodic patterns in time-series databases. IEEE Transactions on Knowledge and Data Engineering, 16(3), 332-342.


Boulicaut, J., Marcel, P., & Rigotti, C. (2001). Query driven knowledge discovery via OLAP manipulations. Retrieved March 2004 from http://lisi.insa-lyon.fr/~jfboulic/bda01.ps

Dong, G., Han, J., Lakshmanan, L., Pei, J., Wang, H., & Yu, P. (2003). Online mining of changes from data streams: Research problems and preliminary results. Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams. In cooperation with the 2003 ACM-SIGMOD International Conference on Management of Data (SIGMOD’03), San Diego, California, June 8. Retrieved March 2004 from http://www.cse.buffalo.edu/faculty/jianpei/publications/minechange_mdps03.pdf

Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in knowledge discovery and data mining. Menlo Park, CA: AAAI Press.

Han, J. (1997, October). OLAP mining: An integration of OLAP with data mining. Proceedings of the 1997 IFIP Conference on Data Semantics (DS-7), Leysin, Switzerland. Retrieved March 2004 from ftp://ftp.fas.sfu.ca/pub/cs/han/kdd/olapm.ps

Han, J. (1998a, August). Data mining and data warehousing research in Simon Fraser University. Presentation at the IFIP W2.6 Workshop, New York. Retrieved from ftp://ftp.fas.sfu.ca/pub/cs/han/slides/ifip98.ppt

Han, J. (1998b, December). Towards on-line analytical mining: An overview of data warehousing and data mining. Invited talk at the BC Hydro Conference, Vancouver, British Columbia, Canada. Retrieved November 2003 from ftp://ftp.fas.sfu.ca/pub/cs/han/slides/bchydro98.ppt

Han, J. (1998c, December). Towards on-line analytical processing and data mining for electronic commerce. Workshop on Technological Challenges on Electronic Commerce, CASCON’98 (CASCON-98), Toronto, Canada. Retrieved March 2003 from ftp://ftp.fas.sfu.ca/pub/cs/han/slides/ibmec98.ppt

Han, J. (1999, November). Why is data mining the next frontier of high performance computing? Panel discussion at Super Computing ’99, Portland, Oregon. Retrieved November 2003 from ftp://ftp.fas.sfu.ca/pub/cs/han/slides/sc99.ppt

Han, J. (2000, January). From DBMiner to WebMiner: What is the future of data mining? Talk at Commerce, UBC, BC. Retrieved June 2003 from ftp://ftp.fas.sfu.ca/pub/cs/han/slides/ubc00.ppt

Han, J., Chee, S. H. S., & Chiang, J. Y. (1998). Towards on-line analytical mining in large databases. Retrieved December 2000 from http://citeseer.ist.psu.edu/cache/papers/cs/27046/http:zSzzSzwww-faculty.cs.uiuc.eduzSz~hanjzSzpdfzSzsum98.pdf/towards-on-line-analytical.pdf/

Han, J., Chee, S. H. S., & Chiang, J. Y. (1999). Issues for on-line analytical mining of data warehouses. Retrieved March 2004 from http://www.acm.org/sigs/sigmod/disc/disc99/disc/dmkd/han.pdf

Han, J., Fu, Y., Wang, W., Chiang, J., Gong, W., Koperski, K., et al. (1996, August). DBMiner: A system for mining knowledge in large relational databases. Proceedings of the 1996 International Conference on Data Mining and Knowledge Discovery (KDD ’96), Portland, Oregon. Retrieved March 2004 from ftp://ftp.fas.sfu.ca/pub/cs/han/slides/kdd96_slides.ppt

Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.

Lenz, H., & Shoshani, A. (1997). Summarizability in OLAP and statistical data bases. Retrieved August 2002 from http://www.lbl.gov/~arie/papers/summarizability.SSDBM97.ps

Pei, J. (2003). A general model for online analytical processing of complex data. Proceedings of the 22nd International Conference on Conceptual Modeling (ER ’03), Chicago, IL, October 13-16. Retrieved April 2004 from http://www.cse.buffalo.edu/faculty/jianpei/publications/golap_er03.pdf

Pei, J., & Han, J. (2002). Constrained frequent pattern mining. ACM SIGKDD Explorations, 4(1), 32-39.

Radivojevic, Z., Cvetanovic, M., Milutinovic, V., & Sievert, J. (2003). Data mining: A brief overview and recent IPSI research. Annals of Mathematics, Computing & Teleinformatics, 1(1), 84-90.

Schwarz, H. (2002). Integration von data mining und online analytical processing: Eine analyse von datenschemata, systemarchitekturen und optimierungsstrategien. Doctoral dissertation, Institut für Parallele und Verteilte Systeme der Universität Stuttgart, Stuttgart, Germany. Retrieved December 2003 from http://elib.uni-stuttgart.de/opus/volltexte/2003/1447/pdf/schwarz.pdf

KEY TERMS

Bookmarking and Backtracking Techniques: The OLAM paradigm offers the user complete freedom to explore and discover knowledge by applying any sequence of data mining algorithms with data cube navigation. Often, a user has a choice of many alternatives when traversing from one data mining state to another. It would be useful if she or he could set bookmarks: If a discovery path proves uninteresting, she or he can return to a previous state and explore other alternatives. Efficient support of such marking and backtracking mechanisms will protect users from being “lost in the OLAM space” (Han, 1997; Han et al., 1999, p. 4).
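One way to picture bookmarks and backtracking is as a path of applied steps plus named positions on that path. The following is a minimal sketch under that assumption; the `MiningSession` class and its method names are hypothetical and do not describe any particular OLAM system.

```python
class MiningSession:
    """Tracks the sequence of cubing/mining steps so a user can backtrack."""

    def __init__(self):
        self.path = []        # steps applied so far, oldest first
        self.bookmarks = {}   # bookmark name -> length of path when it was set

    def apply(self, step):
        """Record one cubing or mining step."""
        self.path.append(step)

    def bookmark(self, name):
        """Mark the current state so it can be returned to later."""
        self.bookmarks[name] = len(self.path)

    def backtrack(self, steps=1):
        """Undo the last `steps` steps (or all of them if the path is shorter)."""
        self._truncate(max(0, len(self.path) - steps))

    def return_to(self, name):
        """Return to the state recorded under a bookmark."""
        self._truncate(self.bookmarks[name])

    def _truncate(self, length):
        del self.path[length:]
        # Drop bookmarks that pointed past the new end of the path.
        self.bookmarks = {n: p for n, p in self.bookmarks.items() if p <= length}


session = MiningSession()
session.apply("slice region=West")
session.bookmark("west")
session.apply("classify by profit")
session.apply("drill down to city")
session.return_to("west")            # the classification path proved uninteresting
session.apply("find associations")   # explore an alternative from the same state
print(session.path)  # ['slice region=West', 'find associations']
```

A real system would also need to store or recompute the data state at each bookmark; this sketch only records the step labels.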

Constraint-Based Online Analytical Mining: Online analytical mining requires fast response to data mining requests, and most data mining requests are query based or constraint based. This requires that mining not only be performed within a limited scope of data, confined by queries and constraints, but also adopt efficient, constraint-based data mining algorithms. For example, many constraints involving set containments or aggregation functions can be pushed deeply into the association rule mining process. Such constraint-based mining should be explored in many other data mining tasks (Han et al., 1999). A good proposal for constrained frequent pattern mining can be found in Pei and Han (2002).
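To make “pushing a constraint into the mining process” concrete, the sketch below runs a simple level-wise (Apriori-style) frequent itemset search and checks an aggregate constraint, total price within a budget, before an itemset is counted or extended. The transactions, prices, and function names are invented for illustration, and this is a generic textbook-style technique rather than the specific algorithm of Pei and Han (2002).

```python
from itertools import combinations

# Hypothetical transactions and item prices, for illustration only.
TRANSACTIONS = [
    {"pen", "ink", "paper"},
    {"pen", "ink"},
    {"pen", "paper"},
    {"pen", "ink", "laptop"},
]
PRICE = {"pen": 2, "ink": 5, "paper": 4, "laptop": 900}


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def constrained_frequent_itemsets(transactions, price, min_support, max_total_price):
    """Level-wise search that checks the price constraint before counting support."""

    def admissible(itemset):
        # Anti-monotone constraint (prices are non-negative): once an itemset
        # exceeds the budget, every superset does too, so it is rejected here
        # and never extended.
        if sum(price[i] for i in itemset) > max_total_price:
            return False
        return support(itemset, transactions) >= min_support

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if admissible(frozenset([i]))]
    found = list(level)
    while level:
        survivors = set(level)
        # Join size-k itemsets that differ in one item to get size k + 1.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        level = [
            c for c in candidates
            if all(c - {i} in survivors for i in c) and admissible(c)
        ]
        found.extend(level)
    return sorted(tuple(sorted(s)) for s in found)


result = constrained_frequent_itemsets(TRANSACTIONS, PRICE, min_support=2, max_total_price=10)
print(result)
```

Because the over-budget item is rejected at the first level, no itemset containing it is ever generated or counted, which is the saving that constraint pushing aims for.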

Cube: A data structure of aggregated values summarized for a combination of preselected categorical variables (e.g., number of items sold and their total cost for each time period, region, and product). This structure is required for high-speed analysis of the summaries that is done in online analytical processing (OLAP). Also called a multidimensional database, or MDDB.
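The definition above can be illustrated with a tiny in-memory cube that stores the number of items sold and their total cost for every combination of time period, region, and product. This is only a sketch: the fact rows, the `build_cube` helper, and the use of the string "ALL" to mark an aggregated-away dimension are assumptions of the example, and production OLAP engines use far more compact storage.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("period", "region", "product")

# Hypothetical detail records.
FACTS = [
    {"period": "Q1", "region": "West", "product": "pen", "units": 10, "cost": 20.0},
    {"period": "Q1", "region": "East", "product": "pen", "units": 5, "cost": 10.0},
    {"period": "Q2", "region": "West", "product": "ink", "units": 2, "cost": 10.0},
]


def build_cube(facts, dimensions):
    """Aggregate units and cost for every subset of the dimensions."""
    cube = defaultdict(lambda: {"units": 0, "cost": 0.0})
    for r in range(len(dimensions) + 1):
        for kept in combinations(dimensions, r):
            for fact in facts:
                # Dimensions that are not kept are collapsed into "ALL".
                key = tuple(fact[d] if d in kept else "ALL" for d in dimensions)
                cube[key]["units"] += fact["units"]
                cube[key]["cost"] += fact["cost"]
    return dict(cube)


cube = build_cube(FACTS, DIMENSIONS)
print(cube[("Q1", "ALL", "pen")])    # {'units': 15, 'cost': 30.0}
print(cube[("ALL", "ALL", "ALL")])   # {'units': 17, 'cost': 40.0}
```

Once the cube is built, each summary is a dictionary lookup, which is the “high-speed analysis” the definition refers to.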

Drill Down: The process of navigating from a top-level view of overall sales down through the sales territories to the individual salesperson level. This is a more intuitive way to obtain information at the detail level. Drill-down levels depend on the granularity of the data in the data warehouse. Roll up is the opposite function.
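The sales example in this definition can be sketched as movement along a fixed hierarchy of levels. The three-level hierarchy, the sample rows, and the function names below are assumptions for illustration only.

```python
# Hypothetical detail rows: (territory, salesperson, sales amount).
SALES = [
    ("North", "Alice", 100),
    ("North", "Bob", 50),
    ("South", "Carol", 70),
]

# Levels ordered from the top-level view down to the finest granularity stored.
LEVELS = ("company", "territory", "salesperson")


def view(rows, level):
    """Aggregate sales at the requested level of the hierarchy."""
    totals = {}
    for territory, person, amount in rows:
        key = {
            "company": "ALL",
            "territory": territory,
            "salesperson": (territory, person),
        }[level]
        totals[key] = totals.get(key, 0) + amount
    return totals


def drill_down(level):
    """Move one level toward detail; stops at the granularity of the data."""
    return LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]


def roll_up(level):
    """Move one level toward the overall summary."""
    return LEVELS[max(LEVELS.index(level) - 1, 0)]


level = "company"
print(view(SALES, level))   # {'ALL': 220}
level = drill_down(level)
print(view(SALES, level))   # {'North': 150, 'South': 70}
```

Drilling down from the salesperson level returns the same level, which mirrors the point that drill-down depth is bounded by the granularity of the stored data.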

Filters: Saved sets of chosen criteria that specify a subset of information in a data warehouse.

Granularity: The level of detail of the facts stored in a data warehouse, or a concept in the database.

Layer-Shared Mining with Data Cubes: Because each dimension in a data cube represents an organized layer of concepts, data mining can be performed by first examining the high levels of abstraction and then progressively deepening the mining process towards lower levels of abstraction. This saves the effort of indiscriminately examining all the concepts at the low levels. It is important to explore this optimization in mining other kinds of knowledge (Han et al., 1999).
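The progressive-deepening idea can be sketched with a two-level concept hierarchy (item and category): frequent categories are found first, and only items inside those categories are counted at the lower level. The data, the hierarchy, and the use of one shared support threshold for both levels are assumptions of this example.

```python
from collections import Counter

# Hypothetical concept hierarchy: item -> category.
CATEGORY = {"espresso": "coffee", "latte": "coffee", "green tea": "tea", "cola": "soda"}

TRANSACTIONS = [
    ["espresso", "cola"],
    ["latte"],
    ["espresso", "green tea"],
    ["espresso"],
]


def frequent(counts, min_support):
    """Keys whose count reaches the support threshold."""
    return {k for k, n in counts.items() if n >= min_support}


def layered_frequent_items(transactions, category, min_support):
    """Mine the category level first, then descend only into frequent categories."""
    # High level: count the transactions that contain each category.
    cat_counts = Counter(c for t in transactions for c in {category[i] for i in t})
    frequent_cats = frequent(cat_counts, min_support)
    # Low level: items in infrequent categories are never counted, because an
    # item cannot be more frequent than the category it belongs to.
    item_counts = Counter(
        i for t in transactions for i in set(t) if category[i] in frequent_cats
    )
    return frequent_cats, frequent(item_counts, min_support)


cats, items = layered_frequent_items(TRANSACTIONS, CATEGORY, min_support=2)
print(cats, items)  # {'coffee'} {'espresso'}
```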

Support of Online Analytical Mining by High-Performance Data Cube Technology: High-performance data cube technology is critical to online analytical mining in data warehouses. Many efficient data cube computation techniques have been developed in recent years that help the efficient construction of large data cubes. However, a mining system may need to compute the relationships among a great number of dimensions or examine fine details, and such data may not always be materialized beforehand, so it will be necessary to compute portions of data cubes dynamically, on the fly. Moreover, besides on-the-fly computation of query-based data cubes, the efficient computation of multifeatured data cubes and the support of nontraditional data cubes with complex dimensions and measures are crucial to effective data mining. Therefore, further development of data cube technology will provide enhanced support to OLAM (Han, 1997; Han et al., 1999).


Ontological Assumptions in Information Modeling

John M. Artz

The George Washington University, USA

INTRODUCTION

Information modeling is a technique by which a database designer develops a conceptual model of a database depicting the entity classes that will be represented in the database. There are three competing ontological assumptions that guide the modeling process. The broadest characterization of these assumptions is realism vs. conceptualism, with social realism occupying a middle ground. The realist believes that object classes exist in the real world, waiting to be discovered. The conceptualist believes that object classes are constructed in the mind of the modeler, based on observations about the application domain and the objectives of the information model. The social realist believes that classes exist as shared meanings among stakeholders in an application domain. This article explores these assumptions and then reviews selected literature in information modeling to determine which assumptions are held by key authors. It concludes that most authors hold inconsistent views, and this inconsistency provides some important insights into information modeling while presenting serious problems for practitioners and students of information modeling.

BACKGROUND

Perhaps one of the most perplexing problems in information modeling is the ontological status of entity classes. This raises the question, Do entity classes exist in the world, or are they constructed in the mind of the modeler? This seemingly esoteric question is important because the way in which one answers it has a significant impact on how one approaches the process of information modeling.

If entity classes exist in the world, independent of the mind of the observer, then the job of the information modeler is to discover those classes and record them in an information model. Hence, information modeling is a discovery process rather than a constructive process. If two modelers examine a domain and come up with different models, then one is wrong. One or the other (or possibly both) must bring their models into conformance with the real world. An information model can be validated by ensuring its conformance with the real world. This is the realist position. To the realist, the challenges in information modeling are how to discover the existing entity classes and accurately represent those classes in an information model. Validation is not a problem for the realist because a model is valid if it correctly represents the real world. Realism can be detected when writers use the phrases “real world” or “as they exist in the real world” or, more subtly, when they refer to “natural classes” or “natural data relationships.”

If classes do not exist in the world, then it is the job of the information modeler to construct them. Thus, information modeling becomes a process of construction rather than discovery. The conceptualist position holds that classes exist only in the mind of the observer and are constructed according to objectives (probably not explicit) that guide the process of abstraction. The modeler selects certain facts from the application domain and constructs classes based on individual objects with similar attributes. Conceptualists believe that different models of an application domain cannot be determined to be correct or incorrect. They can only be more or less useful for meeting the model objectives. This leads to problems in both construction and validation of the information model. Construction is difficult because most of the literature on information modeling focuses on description of entities rather than construction of them. The literature is strangely silent on how to construct a set of entity classes to meet a set of modeling objectives. Validation is also a problem because the model cannot be compared with entities existing in the real world; it can be evaluated only with respect to the objectives of the model, again an area in which the literature is strangely silent. The conceptualist position creates serious problems for information modelers because it requires that modeling objectives be defined before model construction, and it requires some method of evaluating a model with respect to a set of objectives. Conceptualism can be detected in the literature when authors talk about “abstraction,” “the problem to be solved,” “objectives,” or the possibility of “multiple representations” or “multiple models.”

An intermediate position is social realism, which assumes that entity classes exist as shared meanings within a social context. This is a realist position in that the classes exist independent of the mind of the modeler. Presumably if several modelers were to examine the same application domain they would eventually discover the same shared meanings and hence would produce the same information model. Validation is less problematic under the social realist position in that the model can be compared with the social reality, that is, the model agrees with what people in the application domain believe to be the entity classes or it does not. The social realist discovers the entity classes by talking with users and recording their usage of key words. Consensus is an important factor in the social realist approach because social reality is a shared understanding. If people do not agree on meanings, then the realism assumption breaks down because different modelers may very well come away with different understandings, depending on whom they spoke to. Domain experts are important to the social realist position because the domain expert is the gatekeeper to the social reality. Social realism can be detected in the literature when authors talk about “language,” “shared meanings,” “domain or subject matter experts,” or modeling as a process of “consensus.”

Copyright © 2006, Idea Group Inc. Distributing in print or electronic forms without written permission of IGI is prohibited.

Realism is a shaky assumption from a philosophical perspective, but desirable from a pragmatic perspective. If entity classes exist in the world, where do they reside? Although there are ample instances of an entity class, nobody has ever seen the class itself, nor will they. Entity classes exist only in the mind of the observer and have no real existence in the application domain. The modeler examines the application domain and, through a cognitive process of abstraction, derives a set of entity classes. Yet this process of abstraction is poorly understood and difficult to explain, so modelers act as though the classes actually exist in the world and are being discovered. From a pragmatic perspective, realism is a desirable assumption because it reduces the class construction process to one of simple discovery and provides an easy means of validation by requiring that the model simply conform to the real world.

Conceptualism is a much more justifiable position from a philosophical perspective, yet a nightmare in practice. Conceptualism recognizes the role of the modeler and his or her cognition in the class construction process. Yet in practice it presents some severe problems. Since classes are constructed, how does the process of construction work? What criteria are used in class construction? Once classes are constructed, how do we know that the right classes have been constructed? One answer is that classes are constructed by grouping objects with similar attributes. But that position raises the question of whether or not attributes exist in the world and opens up, once again, the three positions just described with regard to attributes. Another answer might be to say that the classes are right if they meet the objectives of the model, but that answers the question by raising two more: How do we define modeling objectives and how do we determine if a set of classes meets those modeling objectives?

Most authors nod towards conceptualism, using terms such as problem solving or multiple models but back off when it comes to the actual process of modeling, where they will often fall back to a realist position by talking about modeling “the real world.” Recognizing the faultiness of the realist position, several authors have adopted an intermediate position of social realism. The more rigorous ones adopt social realism with respect to attributes, but the best that can be said is the literature is confusing and few authors have taken and articulated a consistent philosophical position.

ANALYSIS OF THE LITERATURE

The ontological assumptions made by practitioners are rarely articulated. They are more often manifest in their behavior. Practitioners may even claim to hold one belief while acting as though they held a conflicting view. Hence, in order to gain a sense of the variety of assumptions that are held in the field of information modeling, it is necessary to look at the recorded literature (widely read texts and papers) to see what assumptions are being put forth.

Peter Chen’s (1976) original article on the entity-relationship model begins by establishing a clearly realist perspective: “The entity-relationship model adopts the more natural view that the real world consists of entities and relationships. It incorporates some of the important semantic information about the real world” (pp. 9-10). This perspective is picked up by later authors. Andleigh and Gretzinger (1992) claimed “the Entity Model describes the real-world relations for the information system” (p. 383), while Teorey (1990) referred to the model as including “the natural data relationships,” again a strong indication of realism. Yet only a couple of paragraphs later in Chen’s paper, he refers to a conceptual data model as “information concerning entities and relationships which exist in our minds” (p. 10) and “conceptual objects in our minds” (p. 14), showing a clearly conceptualist perspective. In discussing whether a given object should be an entity or a relationship, Chen defers to the enterprise administrator, who should decide so that “the distinction is suitable for his environment” (p. 10), suggesting an objectives-driven conceptualist view or a social-realist view, depending on the meaning of the word suitable. He goes on a bit later to say, “If we know an entity is in the entity set EMPLOYEE, then we know that it has the properties common to other entities in the entity set” (p. 11), suggesting extreme class realism. Since nothing is said in the paper regarding how to construct entities, the reader is forced to fall back on the realist position and use the real world to guide the discovery and validation processes.

Only two years after Chen’s (1976) original article, Kent (1978) thoroughly devastated the realism assumption in Data and Reality by raising question after question that could not be answered from the realist’s perspective. “There is no natural set of categories,” he said. “The set of categories to be maintained in an information system must be specified for that system” (p. 13). He goes on to say, “If we really did want to define what a data base modeled, we’d have to start thinking in terms of mental reality rather than physical reality. Most things are in the data base because they ‘exist’ in people’s minds, without having any ‘objective’ existence.” This is about as clear a statement of conceptualism as one can find. It is unfortunate that this work raised many more questions than it answered. This, coupled with the fact that it is difficult to follow, reduced its impact on the practice of modeling. Later, Kent (1986) adopted a more conservative, fact-based approach that trades entity realism for attribute realism and allows class construction based on entities with similar attributes.

Nijssen and Halpin (1989) also adopted a fact-based approach, but they give the reader mixed signals with regard to their ontological assumptions. They recognize that classes are constructed, but see this construction as guided by facts gathered from the Universe of Discourse (UoD). The ontological status of the Universe of Discourse is unclear. They say, “Recall that the UoD is the portion of the (typically) real world relevant to our application” (p. 35), which suggests realism. Yet, they go on to say, “Recall that entities are the basic objects or things that we want to talk about” (p. 37), which suggests that the Universe of Discourse is a social construct based on language usage. This view is reinforced by the statement that “The UoD expert or domain expert is familiar with the application area, and can clarify any doubtful aspects of the UoD” (Nijssen & Halpin, 1989, p. 14). These claims of social realism are further supported by the validation process: “Our conceptual schema design procedure facilitates early detection of errors by various checking arrangements including ongoing feedback to the user by way of examples” (p. 199). Here the model is being validated against user opinions, which suggests social realism.

Conflicting ontological assumptions are not uncommon. Shlaer and Mellor (1988) reflect the same confusion. At one point they assert, “What is needed is a way to capture information so that it can be checked against the reality, rather than the different, and possibly inconsistent, ‘user views’ of reality” (p. 3), which reflects a realist view of information modeling and a criticism of social realism. Then, a few paragraphs later, they say, “What we need is a method by which we can lay out candidate definitions of the conceptual entities and examine the implications of those definitions” (p. 4), which reflects a conceptualist view.

Flavin (1981) clearly stated that “the decomposition of the system of interest into its component objects is a function of the system, the observer, and their mutual interaction” (p. 38), which again reflects a conceptualist perspective. Though Flavin does provide implicit modeling objectives in terms of abstraction and classification, he does not discuss how to construct classes to meet a given set of information objectives. Instead, he describes functional, transaction, and scenario analysis as a means of discovering entities. Hence, the ontology is unclear.

But it was not until Klein and Hirschheim (1987) that the problem was grounded philosophically. Klein and Hirschheim examined both ontological and epistemological assumptions in information modeling. They identified two ontological poles that they referred to as realism and nominalism. Nominalism and conceptualism are slightly different antirealist views. Nominalism asserts that things are what they are because we have named them that way. The collection of things that share a name need not have anything in common other than the fact that they share the same name. Conceptualism asserts that there are abstract mental entities, called concepts, which are abstracted from the particulars that we experience. The name that we apply to a thing is really the name of a concept of the thing.

The social realist view is problematic because, whether the social reality is stable or in a process of change, the entity classes that exist in usage may or may not be appropriately constructed to meet the objectives of the information model. Yet, as Hirschheim, Klein and Lyytinen say, “The potential of objects to model imaginary, ‘ideal’ or socially constructed worlds has not, unfortunately, been widely recognized in the data modeling literature” (1995, p. 62). Indeed, since conceptualism requires the construction of imaginary, possible worlds, it is scrupulously avoided by researchers in information modeling. Yet the question must be asked: Is it better to use faulty philosophical assumptions that are easier to work with, or sound philosophical assumptions that may create problems in practice?

Rumbaugh et al. (1991) come the closest of any of the nonphilosophers to correctly articulating a consistent set of assumptions. They claim that “in the real world an object simply exists,” which is a safe assumption for modeling purposes, since it refers to the independent existence of the instance rather than the class. They go on to say “Classification means that objects with the same data structure (attributes) and behavior (operations) are grouped together in a class” (p. 2). This is clearly a conceptualist perspective since the grouping is a result of the process of classification. To emphasize the point further, they state that “a class is an abstraction that describes properties important to an application and ignores the rest. Any choice of classes is arbitrary and depends on the application.” This is a clear statement of the conceptualist position. In defense of the other authors, conceptualism is more obvious in pure object-oriented systems because the objectives of hierarchy and reusability are much clearer.
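The claim that the choice of classes depends on the application can be illustrated with a small sketch in which the same objects are grouped by whichever attributes a given application treats as important. The objects, the two hypothetical applications (payroll and registrar), and the `classify` helper are invented for this example and are not drawn from Rumbaugh et al.

```python
from collections import defaultdict

# Hypothetical objects, each described by the attributes recorded for it.
OBJECTS = {
    "alice": {"name", "salary"},
    "bob": {"name", "salary", "student_id"},
    "carol": {"name", "student_id"},
}


def classify(objects, relevant):
    """Group objects that share the same attributes among those the application cares about."""
    classes = defaultdict(list)
    for object_id, attributes in objects.items():
        signature = frozenset(attributes & relevant)  # ignore everything else
        classes[signature].append(object_id)
    return dict(classes)


payroll_classes = classify(OBJECTS, {"name", "salary"})
registrar_classes = classify(OBJECTS, {"name", "student_id"})

print(payroll_classes[frozenset({"name", "salary"})])        # ['alice', 'bob']
print(registrar_classes[frozenset({"name", "student_id"})])  # ['bob', 'carol']
```

The same three objects fall into different classes under the two sets of objectives, and neither grouping can be called the correct one independent of those objectives.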

On a closing note, it is interesting that practitioners are sometimes aware of deeper philosophical issues, but their concern is often brief. Coad and Yourdon (1990), who published one of the first books on object-oriented analysis, sum up the practitioner’s scant concern:

As authors, it would be intellectually satisfying if we could report that we studied the philosophical ideas behind methods of organization, from Aristotle and Socrates to Descartes and Kant. Then, based on the underlying methods human beings use, we could propose the basic constructs essential to a requirements analysis method, and in particular to OOA [object-oriented analysis]. But in truth, we cannot say that, nor did we do it. (p. 16)

FUTURE TRENDS

It is unlikely that this confusion in ontological assumptions will just go away by itself. Until information modeling is put on a firm philosophical foundation, we will continue to see confusion among authors and inconsistencies within specific treatments of the topic.

CONCLUSION

The best that can be said regarding the ontological assumptions underlying information modeling is that there is a great deal of confusion. Few authors really hold the realist position, as can be seen in their constant references to conceptualist ideas. Yet abandoning the realist position creates serious methodological problems in discovery and validation. Some have adopted social realism in a retreat from realism. This position handles the discovery and validation problems but is only slightly more tenable than realism. Social realism assumes that the classes that exist in the shared meanings of the stakeholders of the application domain are exactly the classes needed to meet the objectives of the information model. With luck this may be true, but it is unlikely. First, socially constructed classes are more likely to meet the objective of intellectual economy than any processing or information objectives. Second, social realism assumes that the application domain is static and that classes which have been useful in discourse in the past will be useful for information derivation in the future. It seems that conceptualism is the only justifiable ontological position. Unfortunately, conceptualism requires that modeling objectives be explicitly articulated and that models be constructed to meet objectives and evaluated with respect to those objectives. There is much work to be done here, but since it appears that conceptualism is the only sound foundation for information modeling, it is probably time to get started.

REFERENCES

Andleigh, P., & Gertzinger, M. (1992). Distributed object-oriented data-systems design. Englewood Cliffs, NJ: Prentice Hall.

Artz, J. (1997). A crash course in metaphysics for the database designer. Journal of Database Management, 8(4).

Chen, P. (1976). The entity relationship model: Towards a unified view of data. Association of Computing Machinery Transactions on Database Systems, 1(1).

Coad, P., & Yourdon, E. (1990). Object-oriented analysis. Englewood Cliffs, NJ: Prentice Hall.

Flavin, M. (1981). Fundamental concepts of information modeling. New York: Yourdon Press.

Hirschheim, R., & Klein, H. K. (1989). Four paradigms of information systems development. Communications of the ACM, 32(10), 1199-1216.

Hirschheim, R., Klein, H. K., & Lyytinen, K. (1995). Information systems development and data modeling: Conceptual and philosophical foundations. Cambridge, UK: Cambridge University Press.

Kent, W. (1978). Data and reality. New York: North Holland.

Kent, W. (1979). Limitations of record-based information models. ACM Transactions on Database Systems, 4(1), 107-131.

Kent, W. (1986). The realities of data: Basic properties of data reconsidered. In T.B. Steele, Jr., & R. Meersman (Eds.), Database semantics (DS-1). Elsevier Science.

Klein, H. K., & Hirschheim, R. A. (1987). A comparative framework of data modelling paradigms and approaches. The Computer Journal, 30(1), 8-15.

Nijssen, G. M., & Halpin, T. A. (1989). Conceptual schema and relational database design: A fact oriented approach. New York: Prentice Hall.


Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., & Lorensen, W. (1991). Object-oriented modeling and design. Englewood Cliffs, NJ: Prentice Hall.

Shlaer, S., & Mellor, S. J. (1988). Object-oriented systems analysis: Modeling the world in data. Englewood Cliffs, NJ: Yourdon Press.

Teorey, T. J. (1990). Database modeling and design: The entity-relationship approach. San Mateo, CA: Morgan Kaufmann.

Veryard, R. (1984). Pragmatic data analysis. Oxford: Blackwell Scientific Publications.

Veryard, R. (1992). Information modeling. New York: Prentice-Hall.

Woozley, A. D. (1967). Universals. In P. Edwards (Ed.), Encyclopedia of Philosophy (Vol. 8, pp. 194-206).

KEY TERMS

Attribute Realism: An ontological position that the properties of entities exist in the world independent of their being perceived by the modeler.

Conceptualism: An ontological position that entity classes exist only in the mind of the modeler.

Entity Realism: An ontological position that entity classes exist in the world independent of their being perceived by the modeler.

Nominalism: An ontological position that things are what they are because we have named them that way, and classes are formed of objects with similar names.

Ontology: A branch of philosophy that attempts to determine the structure of reality. It can also refer to a classification system that organizes particular things or phenomena.

Social Realism: An ontological position that entity classes exist in the shared meanings of a social group.

Universe of Discourse: The application domain of an information model viewed from a semantic perspective.


Ontologies and Their Practical Implementation

Gian Piero Zarri

University of Paris IV/Sorbonne, France

INTRODUCTION: “ONTOLOGIES” AND “TAXONOMIES”

Since the 1990s, ontologies have emerged as an important research topic investigated by several research communities (including the database community) and used especially in defining standards for data exchange, information integration, and interoperability. The word “ontology” comes from medieval philosophy, where it was used to talk about the existence of beings in the world (Guarino & Giaretta, 1995). According to its modern computer-science technical meaning (Gruber, 1993), a consensus definition says that “Ontologies represent a formal and explicit specification of a shared conceptualization” (p. 199), where:

Conceptualization refers to an abstract model of some phenomenon/situation in the world, where the model results from the identification of the relevant concepts that characterize this particular phenomenon/situation. To avoid any hype, “concepts” can be simply understood here as the discrete, important notions that must necessarily be used to describe the phenomenon/situation under consideration.

Explicit means that the type of concepts used and the constraints on their use are explicitly defined.

Formal refers to the fact that the ontology should be machine-usable.

Shared reflects the notion that an ontology captures consensual knowledge, that is, this knowledge is not private to some individual but must be accepted by a group.

A definition like this needs, however, some further discussion. Apart from the requirement of being usable on a computer, there is nothing in the previous characterization of an ontology that could not also be applied, for example, to a “taxonomy” in the classical Linnaean meaning: It is obvious, in fact, that Linnaeus’ classifications for biology were intended to give an exhaustive definition of some phenomena/situations and that they were intended to be explicit and shared. And surely a “taxon” for a notion like “mammal” is not so different from a “concept” for the same notion. Moreover, both the concepts and the taxa are organized into a hierarchy that takes the form of a tree, or of a DAG (directed acyclic graph) if multiple inheritance (see the next section) is admitted. For a (pragmatic) distinction between “taxonomies” and “ontologies,” we must then rely on the “scope” of the definitions associated with the taxa/concepts.

From now on, for simplicity’s sake, we will speak only of concepts. In a taxonomy (and in the simplest types of ontologies), the implicit definition of a concept derives simply from its being inserted in a network of specific/generic relationships with the other concepts of the taxonomy/hierarchy. This means that a concept like company_ is defined by the fact of being, at the same time, a specific term of a higher order concept like social_body (company_ is subsumed by social_body) and a generic term with respect to a specialized concept like computer_company (company_ subsumes computer_company).

To obtain a “real” ontology, we must also supply some explicit definitions for the concepts, or at least for most of them. This can be done, e.g., by associating a “frame” (a set of properties/attributes with associated classes of admitted values; see the next sections) with these concepts. For example, if we consider that properties useful to better specify the concept company_ could be, among other things, DateOfCreation and DomainOfActivity, we will associate such properties (slots) with the concept. We will also impose, at the same time, that when specific examples (“instances”) of the concept company_ are created, the slot DateOfCreation will only be filled by instances of another concept of the hierarchy, like date_, and the DomainOfActivity slot by instances of a second concept, like market_sector. A definition mechanism like this is totally extraneous to a classical Linnaean taxonomy.
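The frame mechanism just described can be illustrated with a minimal Python sketch. The class names `Concept` and `Instance`, the `fill` method, and the instances `acme_` and `year_1990` are hypothetical and serve only this illustration; the concept and slot names come from the text. As a simplifying assumption, a slot here accepts only direct instances of the admitted concept, not instances of its specializations.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node of the hierarchy, with an optional frame attached."""
    name: str
    parent: "Concept | None" = None
    # Frame: slot name -> concept whose instances may fill the slot.
    slots: dict = field(default_factory=dict)


@dataclass
class Instance:
    """A concrete example of a concept."""
    name: str
    concept: Concept
    fillers: dict = field(default_factory=dict)

    def fill(self, slot: str, value: "Instance") -> None:
        """Fill a slot, enforcing the class of admitted values."""
        admitted = self.concept.slots.get(slot)
        if admitted is None:
            raise KeyError(f"{self.concept.name} has no slot {slot}")
        # Simplification (assumption): only direct instances are accepted.
        if value.concept is not admitted:
            raise TypeError(f"{slot} admits only instances of {admitted.name}")
        self.fillers[slot] = value


# Concepts named in the text.
social_body = Concept("social_body")
date_ = Concept("date_")
market_sector = Concept("market_sector")
company_ = Concept(
    "company_",
    parent=social_body,
    slots={"DateOfCreation": date_, "DomainOfActivity": market_sector},
)

# Hypothetical instances, for illustration only.
year_1990 = Instance("year_1990", date_)
acme_ = Instance("acme_", company_)
acme_.fill("DateOfCreation", year_1990)
```

Trying to fill DomainOfActivity with `year_1990` would raise a `TypeError` in this sketch, which is the kind of constraint a plain taxonomy cannot express.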

In the next sections, we will outline the main principles of the “classic” theory of ontologies as it has been developed in the artificial intelligence domain. A companion article of this encyclopedia, “Using Semantic Web Tools for Ontologies Construction,” deals instead with the new developments arising from the use of ontologies as the basic knowledge representation tool in a Semantic Web context.

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.


BACKGROUND

Inheritance Hierarchies

In this subsection, we will deal with the general “architectural” issues related to the construction of well-formed hierarchies of concepts, both ontologies and taxonomies. Ontologies/taxonomies are structured as “inheritance” hierarchies, making use of the well-known IsA link, also called AKindOf (Ako), SuperC, etc.; see Figure 1. A relatively unchallenged semantic interpretation of IsA (see, however, Brachman, 1983) states that this relationship among concepts, when noted as (IsA B A), means that concept B is a specialization of the more general concept A. In other terms, A subsumes B. This assertion can be expressed in logical form as:

∀x (B(x) → A(x))    (1)

Formula (1) says that if any elephant_ (B) IsA mammal_ (A), and if clyde_ is an elephant_, then clyde_ is also a mammal_. In this section, we will adopt the convention of writing concepts_ in italics and their instances_ (e.g., clyde_, an “individual”) in roman characters. When (1) is interpreted strictly, it also implies that a given concept B and all its instances must inherit all the features (properties), with their values, of all the concepts Ci in the hierarchy that have B as a specialization; we speak in this case of “strict inheritance.” Note that, under the strict inheritance hypothesis, totally new properties can still be added to B to differentiate it (specialize it) with respect to its parents.

Relation IsA is transitive: This means that, e.g., having both ∀x (C(x) → B(x)) and ∀x (B(x) → A(x)), we can deduce that ∀x (C(x) → A(x)). This property is particularly important because it allows one, in an inheritance hierarchy like that of Figure 1, to represent explicitly only the IsA relationships that directly associate two nodes (i.e., without the presence of intermediary nodes). All the residual IsA relationships are then derived explicitly only when needed: E.g., from Figure 1 and from the transitive property of IsA, we can explicitly assert that (IsA chow_ mammal_).
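As a rough illustration of deriving residual IsA relationships on demand, the following Python sketch stores only direct links and walks them upward. Figure 1 is not reproduced in this text, so the intermediate node `dog_` between chow_ and mammal_ is an assumption, as are the names `DIRECT_ISA` and `is_a`. The sketch also assumes a tree (one parent per concept), so it does not cover multiple inheritance.

```python
# Direct IsA links only: specific concept -> its immediate generic concept.
# The node dog_ is assumed for illustration; Figure 1 may differ.
DIRECT_ISA = {
    "chow_": "dog_",
    "dog_": "mammal_",
    "elephant_": "mammal_",
}


def is_a(specific: str, generic: str) -> bool:
    """Return True if (IsA specific generic) follows by transitivity."""
    node = DIRECT_ISA.get(specific)
    while node is not None:
        if node == generic:
            return True
        node = DIRECT_ISA.get(node)  # climb one level in the hierarchy
    return False


# (IsA chow_ mammal_) is never stored, only derived when asked for.
chow_is_mammal = is_a("chow_", "mammal_")
```

Only three links are stored, yet the derived relationship between chow_ and mammal_ is available whenever it is needed.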

The necessary complement of IsA for the construction of well-formed hierarchies concerns some form of InstanceOf link, used to introduce the “instances” (concrete examples) of the general notions represented by the concepts. The difference between (IsA B A) and (InstanceOf C B) is normally explained in terms of the difference between the two options of (i) considering B as a subclass of A in the first case, operator “⊂,” and (ii) considering C as a member of the class B in the second, operator “∈.” Unfortunately, this is not sufficient to eliminate any ambiguity about the notion of instance, which is much more controversial than that of concept. Problems about the definition of instances concern, e.g., (i) the possibility of accepting that concepts (to the exclusion of the root) could also be considered as “instances” of their generic concepts and (ii) the possibility of admitting several levels of instances, i.e., instances of an instance. For a discussion about these problems and the possible solutions, see Bertino, Catania, and Zarri (2001, p. 138).
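The subclass/member distinction can be made concrete, purely by analogy, with Python's own class model: `issubclass` plays the role of IsA and `isinstance` the role of InstanceOf. The class names below are illustrative stand-ins for the concepts in the text. Python is only one possible design; it settles problem (i) by not treating a class as an instance of its superclass.

```python
class Mammal:
    """Stands for the concept mammal_."""


class Elephant(Mammal):
    """Stands for elephant_; the class statement mirrors (IsA elephant_ mammal_)."""


clyde = Elephant()  # mirrors (InstanceOf clyde_ elephant_)

# IsA: Elephant is a subclass of Mammal.
subclass_link = issubclass(Elephant, Mammal)

# InstanceOf: clyde is a member of the class Elephant ...
member_link = isinstance(clyde, Elephant)

# ... and therefore, by formula (1), also a member of Mammal.
inherited_membership = isinstance(clyde, Mammal)

# Problem (i): in Python, a concept is NOT an instance of its generic concept.
concept_as_instance = isinstance(Elephant, Mammal)

# Instead, every class is an instance of the metaclass `type`.
concept_is_object = isinstance(Elephant, type)
```

Here the first three values and the last are true, while `concept_as_instance` is false. Other knowledge representation systems make different choices on this point.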

The precise definition of the meaning of InstanceOf is not the only problem that affects the construction and use of inheritance hierarchies, especially when the inheritance considered is more a “behavioral” than a “structural” one, i.e., is more interested in the actual behavior and meaning of the properties inherited than in the pure mechanical aspects of the propagation. From this point of view, we have to face two main problems: “overriding” (or “defeasible inheritance,” or “inheritance with exceptions”) and “multiple inheritance”; we will discuss overriding in some depth.

Overriding consists in the possibility of admitting exceptions to the “strict inheritance” interpretation of (1). Let us consider this group of assertions:

a. Elephants are grey, except for royal elephants.

b. Royal elephants are white.

c. All the royal elephants are elephants.
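One common way to implement such exceptions, sketched here under stated assumptions, is to let the most specific concept that defines a property win during lookup. The concept name royal_elephant, the slot ColourOf, and the fillers gray_ and white_ follow the names used in this discussion. The table names, the `lookup` function, and the instance `jumbo_` are hypothetical.

```python
# Direct IsA links: specific concept -> generic concept.
ISA = {"royal_elephant": "elephant_", "elephant_": "mammal_"}

# Property values stated locally on each concept.
LOCAL = {
    "elephant_": {"ColourOf": "gray_"},        # assertion (a), default value
    "royal_elephant": {"ColourOf": "white_"},  # assertion (b), the exception
}

# InstanceOf links; jumbo_ is a hypothetical ordinary elephant.
INSTANCE_OF = {"clyde_": "royal_elephant", "jumbo_": "elephant_"}


def lookup(instance: str, slot: str):
    """Defeasible lookup: the nearest concept defining the slot wins."""
    concept = INSTANCE_OF[instance]
    while concept is not None:
        if slot in LOCAL.get(concept, {}):
            return LOCAL[concept][slot]
        concept = ISA.get(concept)  # otherwise inherit from the parent
    return None  # no concept on the path defines the slot


clyde_colour = lookup("clyde_", "ColourOf")  # found on royal_elephant
jumbo_colour = lookup("jumbo_", "ColourOf")  # inherited from elephant_
```

Under strict inheritance the value on elephant_ would apply to every instance. Stopping at the first concept that defines the slot is what lets royal_elephant override it.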

Assertion (c) introduces a new concept, royal_elephant, as a specialization of elephant_ of Figure 1. If now (InstanceOf clyde_ royal_elephant), the strict inheritance law would lead us to conclude that the property (slot; see the next subsection) ColourOf of clyde_ is “filled” with the value gray_, but from (a) and (b) we know that the correct filler is instead white_. This means that royal_elephant has an “overriding property,”

Figure 1. A simple inheritance hierarchy
