Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Rivero L.Encyclopedia of database technologies and applications.2006

.pdf
Скачиваний:
11
Добавлен:
23.08.2013
Размер:
23.5 Mб
Скачать

Musto, A., Polese, G., Costagliola, G., & Tortora, G. (2000). Automatic generation of RDBMS based applications from object oriented design schemes. Proceedings of SAC 2000, (pp. 398-402).

OMG (Object Management Group). (2003). UML specification version 1.5. Retrieved February 3, 2005, from from http://www.omg.org/technology/documents/formal/ uml.htm

Persistence Software. (1993). Persistence technical overview. San Marco, CA: Author.

Poet Software. (1993). POET programmer’s and reference guide. Hamburg, Germany: Author.

Polese, G., Tortora, G., & Pannella, A. (2000). An extended relational data model for multimedia databases. Journal of Visual Languages and Computing, 11(6), 663-685.

KEY TERMS

Access Table: A table listing the transactions to be implemented with an application. For each transaction, it shows the classes from the Cclass Ddiagram it needs to visit, including the number and types of accesses (Read, Write) in order to collect the data necessary for composing the final result.

Object Modeling of RDBMS Based Applications

Extended RDBMS Schema: A database schema specified according to the relational data model extended with new data types. RDBMSs allowing the extension of the basic Type System are those implementing the concept of Universal Server. Examples are Oracle, Informix, DB2, and Illustra.

OID: An object’’s identifier (OID) is an unchangeable value that uniquely identifies that object and distinguishes it from all other objects. It is separate from the state and invisible to the user. OIDs can be used to represent associations between objects.

OMAR: Object Modeling of Relational Applications. A software engineering methodology for the object modeling of applications to be implemented over an RDBMS.

PC++: Persistent C++ extends C++ language by providing persistence facilities.

Visual Language: Language characterized by a set of visual sentences, each composed of icons spatially arranged over a twodimensional space.

Visual Language Compiler: A software tool capable of performing a twodimensional parsing of an input visual sentence, deciding whether the sentence belongs to a given visual language. After parsing, it translates successfully parsed visual sentences into sentences of a target language, which can be visual or textual.

VLCC: Visual Language Compiler Compiler.

420

TEAM LinG

 

421

 

Object-Relational Modeling in the UML

 

 

 

O

 

 

 

 

 

Jaroslav Zendulka

Brno University of Technology, Czech Republic

INTRODUCTION

Modeling techniques play an important role in the development of database applications. One of the trends in current database management systems is that they become object-relational (Stonebraker & Brown, 1999). The most recent version of the SQL standard, SQL:1999, includes object-relational features, and a number of leading companies have already released packages that incorporate them.

Well-known modeling techniques for relational databases, such as entity-relational diagrams, do not support important features of object-relational databases. In addition, the development of a database application involves a close working relationship between the software and database developers. Software developers deal with object-oriented software development and use ob- ject-oriented modeling techniques, such as a logical class model, to represent the main view of the application, whereas database developers model, design, build, and optimize the database. The most successful projects are marked by a shared vision and clear communication of project details (IBM, 2001). A common modeling language and supporting development tools can provide good conditions for it.

The Unified Modeling Language (UML) was adopted as an Object Management Group (OMG) standard for object modeling in 1997. Since that time, it has become popular and widely used and provides several types of diagrams that visualize a system from different perspectives. From database-design point of view, a class diagram is the most important diagram. It shows a set of structural elements and their static relationships. This model can be used not only as documentation, but also for data definition language (DDL) statements generation. If we want to employ the UML as a modeling language for development of a database application where persistent data is stored in an object-relational database, it is necessary to add the ability to model features of these kinds of databases in an effective and intelligible way, and the UML provides a proper extension mechanism for it. Modeling of an object-relational database schema will be called object-relational modeling in this article.

If we accept three perspectives for drawing class diagrams (Fowler, 2003)—conceptual, specification, and implementation—the need for object-relational

modeling is only for the implementation, and possibly specification, levels. A conceptual model should be developed with little or no regard to implementation and a target database environment.

BACKGROUND

The UML provides three extensibility mechanisms that make it possible to extend the language in controlled ways (Booch, Rumbaugh, & Jacobson, 1998; Object Management Group [OMG], 2003):

stereotypes,

tagged values, and

constraints.

A stereotype extends a vocabulary of the UML. It allows the introduction of a new model element derived from one existing in the UML metamodel. A tagged value extends the properties of the UML’s model elements. It is a keyword-value pair element that may be attached to any kind of a model element. The keyword is called a tag. A constraint extends the semantics of a model block by means of specifying conditions and propositions that must be maintained as true, otherwise the system described by the model is invalid. There are some standard stereotypes, tagged values, and constraints predefined in the UML. One of them is a stereotype <<Table>>, which is a stereotype of the UML class element.

The main purpose of the extendability mechanisms is to tailor the UML to the specific needs of a given application domain or target environment. It makes it possible to develop a predefined set of stereotypes, tagged values, and constraints, and notation icons that collectively specialize and tailor the UML for specific domain or process. Such a set is called a profile. Several profiles has already been accepted by OMG as standard profiles, but none of them is for data or object-rela- tional modeling.

Several works that propose extensions of the UML for data and object-relational modeling has been presented. Most of the extensions have been proposed not for SQL:1999 but for Oracle8, because this database management system (DBMS) had provided object extensions

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.

TEAM LinG

before SQL:1999 was published. Probably the most important proposal has been developed and implemented by Rational Software Corporation in their Rational Rose product, which is one of the best-known UML-oriented modeling tools. It provides support not only for objectoriented database modeling but also for relational database modeling (Rational Software Corp,. 2001) and Oracle8 object-relational modeling. The Rose Oracle8 tool permits both forward and backward engineering of Oracle8 ob- ject-relational schemas.

Marcos, Vela, and Cavero (2001, 2003) proposed several stereotypes, tagged values, and constraints for the modeling of structured types, typed tables, ARRAY type, REF type, two types of methods in the SQL:1999, and for modeling of similar elements in Oracle8i.

The approach presented in this article is based on extensions for Oracle8 by Rational Software, but it can be used for SQL:1999, too. Examples are drawn in Rational Rose, with the Rose Oracle8 tool.

OBJECT-RELATIONAL MODEL IN ORACLE8 AND SQL:1999

Both the SQL:1999 and the SQL dialect of Oracle8 (and more recent releases) extend the relational model in several important directions. Only those modeling features that are presented in this article are summarized. More on new features of the SQL:1999 can be found in Melton & Simon (2001) or in the standard specification (Database Language SQL, 1999). More information on the Oracle object-relational model is available in Oracle documentation (Oracle, 2003a, 2003b).

First, both Oracle8 and SQL:1999 relinquished the basic demand on the relation in the relational model— to be in the 1NF. The user can define user-defined data types. In Oracle, there are two types of user-defined data types: object types and collection types.

An object data type is an abstraction of a real-word entity, the representation of which will be stored in a database. An object type is a schema object with a name, set of attributes and methods. Each attribute is of either a built-in scalar data type or a user-defined data type. This allows for the defining object types with a complex data structure.

Methods of an object data type implement operations with the data type. Every object type has a systemdefined constructor method.

An object data type is a template for objects. Objects can be instantiated by the constructor method of a given object type and stored in object tables. An object table can be viewed either as a single column table of row objects or as a multicolumn table. Each row object has

Object-Relational Modeling in the UML

assigned a unique object identifier (OID). It can be sys- tem-generated or primary-key based. Oracle provides a built-in data type called REF to encapsulate references to row objects. This type can be used to implement links between objects.

There are two collection types available: array types and table types.

Both collection types are sets of data elements of the same type, but there are important differences between them. An array type (called VARRAY) is an ordered and bounded set, whereas a table type (called a nested table) is unordered and unbounded. In addition, a VARRAY data value is stored and retrieved as one data unit, whereas a nested table value is stored in a storage table with every element of the collection mapped into a row of the storage table.

Another object extension concerns views. Just as a relational view is a virtual table, an object view is a virtual object table. Using object views, it is possible to create virtual object tables with columns of both builtin and user-defined data types mapped to columns of relational or object base tables. Object views provide the ability to offer specialized or restricted access to data stored in relational and object tables. In addition, they provide the ability to view relational data as objects. The object view definition contains information about the object type of the view objects, the way of constructing OID, and the mapping SELECT statement.

The Oracle8 object-relational model can be described as a metamodel in the UML (Zendulka, 2001). Although most of Oracle8 extensions are included in SQL:1999, there are some differences. First, the terminology of SQL:1999 differs from that of Oracle8 (e.g., in SQL:1999, object data types are called structured types and object tables/views are referred to as referenceable tables/views). In addition, only array as a collection type is available in SQL:1999. On the other hand, Oracle8 does not support inheritance (more recent versions do) whereas SQL:1999 does.

EXTENSION OF THE UML

The mapping used in Rational Rose Oracle8 can be perceived as a profile definition for object-relational modeling with Oracle8 as a target DBMS. The profile introduces several stereotypes, some constraints in a form of conventions, and tagged values. The tagged values have a form of schema-generation properties attached to a project, class, operation, and attribute. They contain such values as, for example, a WHERE clause of a view definition, Stereotypes of the profile are listed in Table 1.

.

422

TEAM LinG

Object-Relational Modeling in the UML

Table 1. Stereotypes for the Oracle8 profile

Stereotype

UML model element

<<ObjectType>>

Class

<<ObjectView>>

Class

<<ObjectTable>>

Class

<<NestedTable>>

Class

<<RelationalTable>>

Class

<<RelationalView>>

Class

<<VARRAY>>

Class

<<ObjectView>>

Dependency

Figure 2. Modeling of a nested object type and an

array

 

 

 

 

 

O

 

 

 

 

 

 

 

 

 

 

 

 

 

 

<<ObjectType>>

 

 

 

 

 

 

 

<<ObjectType>>

 

 

 

 

ADDRESS_T

 

 

 

 

 

 

ADDRESS

CUSTOMER_T

PHONE

<<VARRAY>>

 

STREET : VARCHAR2

 

CUSTNO : NUMBER

 

PHONE_V

 

 

CITY : VARCHAR2

 

 

 

ZIP : VARCHAR2

 

NAME : VARCHAR2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Relationships between schema elements are modeled as associations, aggregations, and dependencies. Dependencies are used primarily to express relationships between data types and structures that store values of these types.

A simple purchase order example shows how the basic object extensions of Oracle8 are modeled in the profile. A conceptual class diagram of the example is displayed in Figure 1. Only one operation (sumLineItems() of the Order class) is shown in the diagram. The Order and Customer classes contain a complex attribute of an Address type consisting of street, city, and zip attributes.

Object Types

Object types are modeled by a stereotype <<ObjectType>>, which is a stereotype of a UML class element. Attributes of a given object type are modeled as attributes, methods as operations.

sponding attribute. Association in the profile implies a relationship by value.

VARRAYs

A VARRAY collection type is modeled in the UML by a stereotype VARRAY, which is a stereotype of a UML class element. All elements of the array must be of the same type, either scalar or object one. If the array is based on an object type, the relationship is modeled as dependency of the VARRAY type on a given object type. A VARRAY can be used as an attribute of an object type. Association, as for nested object types, models this.

Assume that the Customer class from the example will be implemented by means of an object type

CUSTOMER_T with a nested type ADDRESS_T and a

VARRAY type attribute for a fixed-size array of phone numbers. The corresponding model in the UML is in Figure 2.

Nested Object Types

An object type can contain an attribute of another object type. This is modeled as a unidirectional association from the outer object type to the nested object type. A role name of the nested type is the name of the corre-

Figure 1. A conceptual class diagram of a purchase order example

 

Order

 

 

 

 

 

 

Customer

orderNo : Integer

 

 

 

places

 

custNo : Integer

orderDate : Date

 

 

name : String

 

 

shipDate : Date

 

 

1..*

1

address : Address

shipAddress : Address

 

 

phone[3] : Integer

 

 

 

 

 

 

sumLineItems()

 

 

 

 

 

 

 

1..*

 

 

 

 

 

 

 

 

 

 

StockItem

 

LineItem

 

 

 

 

 

 

 

stockNo : Integer

 

itemNo : Integer

 

 

 

 

 

 

 

price : Double

 

quantity : Integer

0..*

1

 

taxRate : Double

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Nested Tables

A nested table is modeled as a class with a stereotype <<NestedTable>>. If the nested table is based on an object type, the relationship is modeled as dependency of the nested table on a given object type, similarly as in VARRAY modeling. A nested table can be used as an attribute of an object type. Association, as for nested object types and VARRAYs, models this.

Let us assume that the Order class from our example will be implemented by means of an object type ORDER_T with a nested type ADDRESS_T and a nested table containing all items of a given order (of a

Figure 3. Modeling of a nested table

<<ObjectType>>

 

<<ObjectType>>

 

 

<<NestedTable>>

 

 

ORDER_T

ITEMS

LINEITEMS_N

 

ADDRESS_T

 

 

STREET : VARCHAR2

SHIPADDRESS

ORDERNO : NUMBER

 

 

 

 

CITY : VARCHAR2

 

ORDERDATE : DATE

 

 

 

 

 

SHIPDATE : DATE

 

 

 

 

ZIP : VARCHAR2

 

 

 

 

 

 

 

 

 

<<ObjectType>>

 

 

SUMLINEITEMS()

 

 

LINEITEM_T

 

 

 

 

ITEMNO: NUMBER

 

 

 

 

QUANTITY : NUMBER

 

 

 

 

 

 

 

423

TEAM LinG

Figure 4. Modeling of a REF type

 

 

<<ObjectType>>

<<ObjectType>>

 

ORDER_T

CUSTOMER_T

CUSTOMER

ORDERNO : NUMBER

CUSTNO : NUMBER

 

ORDERDATE : DATE

NAME : VARCHAR2

 

SHIPDATE : DATE

 

 

 

 

 

SUMLINEITEMS()

 

 

 

 

 

LINEITEM_T type). The corresponding model in the UML is in Figure 3.

REFs

A built-in Oracle8 REF type implements a reference to an object. The reference is modeled by aggregation. In the profile, aggregation implies “by reference” containment. Figure 4 shows that the ORDER_T type contains an attribute CUSTOMER of the REF type, which references an object of the CUSTOMER_T type.

Object Tables

An object table is modeled as a class with a stereotype <<ObjectTable>>. Because object tables are built from underlying object types, this relationship is modeled as dependency. Figure 5 introduces an object table, CUSTOMER.

Relational Tables

A relational table is modeled as a class with a stereotype <<RelationalTable>>. Attributes of the table that are of an object type, VARRAY, nested table, and REF are modeled in the same way as for object types.

Object Views

An object view is modeled as a class with a stereotype <<ObjectView>>. Links between source tables (object or relational) and the object view are modeled as dependencies. The relationship with the underlying object type is modeled as a dependency with a stereotype <<ObjectView>>.

Figure 5. Modeling of an object table

 

 

<<ObjectType>>

<<ObjectTable>>

 

CUSTOMER_T

CUSTOMER

 

CUSTNO : NUMBER

 

 

NAME : VARCHAR2

 

 

 

 

 

 

 

 

Object-Relational Modeling in the UML

Figure 6. Modeling of an object view

<<ObjectType>> SIMPLE_CUSTOMER_T

CUSTNO : NUMBER NAME : VARCHAR2 STREET : VARCHAR2

<<ObjectView>>

<<RelationalTable>> RCUSTOMER

CUSTNO : NUMBER NAME : VARCHAR2 STREET : VARCHAR2 CITY : VARCHAR2 ZIP : VARCHAR2 PHONE1 : NUMBER PHONE2 : NUMBER PHONE3 : NUMBER

<<ObjectView>> CUSTOMERS_FROM_BRNO

CUSTNO : NUMBER = RCUSTOMER.CUSTNO

NAME : VARCHAR2 = RCUSTOMER.NAME

STREET : VARCHAR2 = RCUSTOMER.STREET

Figure 6 shows an example of an object view

CUSTOMERS_FROM_BRNO with objects of a

SIMPLE_CUSTOMER_T type. The view is defined on a relational table RCUSTOMER. The figure also shows how the columns of the source relational table are mapped on columns of the object type. It is one of conventions (constraints) defined in the profile. In addition, the condition from the WHERE clause of the view definition SELECT statement is stored in the WhereClause property (a tagged value) of the corresponding <<ObjectView>> class.

Methods and Triggers

Methods of object types and database triggers are modeled as operations of classes with some conventions.

FUTURE TRENDS

We can expect that object-relational modeling will become more common in the near future, because recent releases of DBMSs are more object-relational and ob- ject-oriented, and comply more with SQL:1999. This trend should also encourage the development of methods and tools for object-relational database design. Unfortunately, only a few such methods have been published (Mok & Paper, 2001). On the contrary, methods for a relational schema model generation exist and are implemented in computer-aided software engineering (CASE) tools (Rational Software Corp., 2001).

CONCLUSION

The objective of this short paper was to introduce a UML profile for object-relational modeling. We focused on only fundamental features of the profile. Many

424

TEAM LinG

Object-Relational Modeling in the UML

details, which are important for real projects, were omitted here (e.g., constraints and other model properties associated with the introduced elements). The profile can be used for modeling during both forward and reverse engineering.

REFERENCES

Booch, G., Rumbaugh, J., & Jacobson, I. (1998). The unified modelling language user guide. Addison-Wesley Longman.

Database Language SQL–Part 2: Foundation. (1999). ISO/ IEC 9075-2:1999 standard. International Committee for Information Technology Standards, Washington, DC.

Fowler, M. (2003). UML distilled (3rd Ed.). AddisonWesley Longman.

IBM. (2001). Mapping objects to data models with the UML [White paper]. Somers, NY: IBM Corporation Software Group. Retrieved December 10, 2003, from http:// www3.software.ibm.com/ibmdl/pub/software/rational/ web/whitepapers/2003/tp185.pdf

Marcos, E., Vela, B., & Cavero J. M (2001). Extending UML for object-relational database design, UML 2001. In M. Gogolla & C. Kobryn (Eds.), LCNS 2185 (pp. 225-239). Heidelberg, Germany: Springer-Verlag.

Marcos, E., Vela, B., & Cavero, J. M. (2003). Amethological approach for object-relational database design using UML. Software and Systems Modeling, 1(2), 59-72.

Melton, J., & Simon, A. (2001). SQL: 1999. Understanding relational language components. Morgan Kaufmann.

Mok, W. Y., & Paper, D. P. (2001). On transformations from UML models to object-relational databases. Proceedings of the 34th Havaii International Conference on System Sciences. IEEE. Retrieved December 10, 2003, from http:/ /csdl.computer.org/comp/proceedings/hicss/2001/ 0981/03/09813046.pdf

Object Management Group. (2003). Unified Modeling Language Specification (Version 1.5) [Language specification].Retrieved from http://www.omg.org/technology/documents/formal/uml.htm

Oracle Corporation. (2003a). Oracle database application developer’s guide: Object-relational features. 10g release 1 [Computer software manual]. Retrieved from http:// o t n . o r a c l e . c o m / p l s / d b 1 0 g / p o r t a l . p o r tal_demo3?selected=3

Oracle Corporation. (2003b). Oracle database. Concepts. 10g release 1 [Computer software manual]. Retrieved from

h t t p : / / o t n . o r a c l e . c o m / p l s / d b 1 0 g /

 

O

portal.portal_demo3?selected=3

Rational Software Corp. (2000). Using Rose Oracle8.

 

 

Santa Clara, CA: Rational Software Corp.

 

Rational Software Corp. (2001). Using Rose data modeler.

 

Santa Clara, CA: Rational Software Corp. Retrieved De-

 

cember 10, 2003, from ftp://ftp.software.ibm.com/software/

 

rational/docs/documentation/manuals/rose.jsp

 

Stonebraker, M., & Brown, P. (1999). Object-relational

 

DBMSs: Tracking the next great wave. Morgan Kauffman.

 

Zendulka, J. (2001). Object-relational modeling in UML.

 

Proceedings of the Conference Information Systems

 

Modelling (pp. 17-24).

 

KEY TERMS

 

Collection Type: A composite value comprising

 

elements of some data types. SQL:1999 supports ar-

 

rays, which are ordered and unbounded sets of elements.

 

Another collection type, not supported by the standard,

 

is a nested table.

 

Object-Relational Data Model: Extends the rela-

 

tional data model by providing a richer type system,

 

including complex data types and object orientation.

 

Object-Relational Modeling: Modeling of an ob-

 

ject-relational database schema. It requires using model

 

elements that are available in neither classic data mod-

 

els nor object models.

 

REF: A data type value which references a row in a

 

referenceable table (or object, in Oracle).

 

Referenceable Table: A table whose row can be ref-

 

erenced by REF type values. The table is based on a

 

structured user-defined type and comprises one extra

 

column containing row (or object) identifiers generated

 

automatically when a new row is inserted into the table. In

 

Oracle, such a table is called an object table.

 

SQL:1999: The most recent major release of the SQL

 

standard, published in 1999, which is based on an object-

 

relational model. Thus, it is a standard database language

 

for object-relational databases.

 

UML: Unified Modeling Language (UML) is the in-

 

dustry standard modeling language (mainly graphical) for

 

visualizing, specifying, constructing, and documenting

 

the artifacts of software systems.

 

UML Profile: A predefined set of stereotypes, tagged

 

values, constraints, and notation icons that collectively

 

425

TEAM LinG

specialize and tailor the UML for a specific domain or process.

UML Stereotype: One of UML extensibility mechanisms. It is an extension of the vocabulary of the UML that allows creating new kinds of building blocks that are derived from existing ones.

Object-Relational Modeling in the UML

User-Defined Type: A named data type defined by a user. It can contain a list of attributes, in which case it is said to be a structured type (or object type, in Oracle). It is an abstraction of a real-world entity. It can also provide explicitly defined methods that implement operations with the entity.

426

TEAM LinG

 

427

 

Online Data Mining

 

 

 

O

 

 

 

 

 

Héctor Oscar Nigro

Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina

Sandra Elizabeth González Císaro

Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina

INTRODUCTION

Several approaches for intelligent data analysis are not only available but also tried and tested. Online analytical processing (OLAP) and data mining represent two of the most important approaches. They mainly emphasize different aspects of the data and allow deriving of different kinds of information. So far, these approaches have mainly been used in isolation (Schwarz, 2002).

OLAP is a powerful data analysis method for multidimensional analysis of data warehouses. Data mining is the extraction of interesting (i.e., nontrivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases (Fayad, PiatetskyShiapiro,Smyth,&Uthurusamy,1996;Han&Kamber,2001).

OLAP is more frequently used for verification of the existing hypothesis. Data mining tries to generate such a hypothesis by uncovering hidden patterns. It is essentially an inductive process. It means that OLAP and data mining should be used together to complement each other (Schwarz, 2002).

Motivated by the popularity of OLAP technology, Han developed an online analytical mining (OLAM) mechanism for multidimensional data mining.

This paper is organized in the following way: First, the background section, in which OLAP and data mining are introduced. Second, the main section is divided into the following subsections: Definition, Online Mining—Ex- pected Characteristics, Olam Architecture, Olam and Complex Data Types, and OLAM System—Essential Functionality. Then, Future Trends in the area. Finally the Conclusions, References, Key Terms.

BACKGROUND

Basically, an OLAP system must show multidimensional data with dimensions, measures, and hierarchies through the cube’s paradigm. The multidimensional model simplifies for users the process of making complex queries, modifying information in a report, swapping between aggregated and detailed data, selecting part of the data, and so forth (Han & Kamber, 2001; Lenz & Shoshani, 1997).

Selected functionalities include the following:

drilling down

rolling up

dice & slice

pivoting

cubing

There are three types of OLAP systems:

Rolap: Relational OLAP works on normalized data; the classic models are star and snowflake. Its characteristic is having few storing spaces but with the disadvantage of presenting the information more slowly.

Molap: Multidimensional OLAP works with precalculated data for the optimization of response time. It is very fast but it uses more space than Rolap.

Hybrid: Hybrid OLAP combines the benefits of both.

Data mining is an especially interesting and very complex task, as defined in the previous section, and it is also a confluence of multiple disciplines (see Table 1).

A good data mining query language will support ad hoc and interactive data mining. Ad hoc query-based data mining is better when users want to examine various data portions with different constraints. Database queries can be answered intelligently using concept hierarchies, data mining results, or data mining techniques.

The major functions of data mining follow (Han, Chee Sonny, & Chiang Jenny, 1999; Han & Kamber, 2001):

1.Characterization: Generalizes a set of task-relevant data into a generalized data cube, which can then be used to extract different kinds of rules or be viewed at multiple levels of abstraction from different angles. In particular, it derives a set of characteristic rules, which summarize the general characteristics of a set of user’s specified data (called the target class).

2.Comparison: Mines a set of discriminant rules that summarize the features, which distinguish the

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.

TEAM LinG

Online Data Mining

Table 1. A list of topics related to data mining

Database systems, data warehouse, and OLAP

Statistics

Machine learning

Visualization

Information science

High performance computing

Business and application domain knowledge expertise

Other disciplines: neuronal networks, mathematical modeling, information retrieval, pattern recognition, and others (Han, 2000)

class being examined (the target class) from other classes (called contrasting classes).

3.Classification: Analyzes a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data. A set of classification rules is generated by such a classification process, which can be used to classify future data and develop a better understanding of each class in the database.

4.Association: Discovers a set of association rules at multiple levels of abstraction from the relevant set(s) of data in a database.

5.Prediction: Predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects. This involves finding the set of attributes relevant to the attribute of interest (by some statistical analysis) and predicting the value distribution based on the set of data similar to the selected object(s).

6.Cluster analysis: Groups a selected set of data in the database or data warehouse into a set of clusters to ensure the interclass similarity is low and the intra-class similarity is high.

7.Time series analysis: Performs data analyses for time-related data in databases or data warehouses, including similarity analysis, periodicity analysis, sequential pattern analysis, and trend and deviation analysis.

ONLINE DATA MINING:

MAIN FEATURES

Definition

Online analytical mining (OLAM; also called OLAP mining) is among the many different paradigms and architectures for data mining systems. It integrates OLAP with data mining and mining knowledge in multidimensional databases. (Han, 1997)

Online Mining:

Expected Characteristics

The following features are important for successful OLAM (Han, 1998a):

Ability to mine anywhere

Carve any portion of a database by drilling, dicing, filtering, and so forth, for mining

Drill through to raw data and mining

Support of multifeature cubes and cubes with complex dimensions and measures

Discovery-drivenexplorationofmultifeaturecubes

Spatial and multimedia dimensions/measures

Cube-based mining method

Multimining tasks and multilevel mining

Selection and addition of data mining algorithms

Different algorithms may generate different results

Standard APIs allow users to develop their own algorithms

Integration of data mining functions

First dicing, then classification, then association, and so forth

Ad-hoc query-based (constraint-based) mining

Fast response and high performance online

Visualization tools

Data and knowledge visualization tools

Extensibility

428

TEAM LinG

Online Data Mining

Integration with statistical package, extend to spatial, multimedia, Web

It is a promising direction due to the following (Han, 1998b):

High quality of data in data warehouses

Available information processing infrastr ucture surrounding data warehouses

OLAP-based exploratory data analysis

Online selection of data mining functions

OLAM Architecture

In a similar way to how OLAP engine performs online analytical processing, an OLAM engine performs analytical mining in data cubes. Work with the data cube in the data analysis is performed through an API cube, and a metadata directory guides the data cube access. Furthermore, a data mining process might disclose that the dimensions or measures of a constructed cube are not appropriated for data analysis. Here, a refined data cube design could improve the quality of data warehouse construction. Moreover, in some data mining processes, at least some of the data may require exploration in great detail. The interaction of multiple data mining modules with an OLAP engine can assure mining is easily performed anywhere in a data warehouse (Han, 1997; Han et al., 1998).

Traditional data cube queries compute simple aggregations at multiple granularities. Moreover, traditional data cubes support only dimensions of categorical data and measures of numerical data. Support of such nontraditional data cubes will enhance the power of data mining. Cube-based data mining methods should be the foundation of the OLAM mechanism. One particular OLAM strength is the interaction of multiple data mining and OLAP functions; another one is selecting a set of data mining functions. For example, an OLAM system might be integrated with a statistical data analysis package, or it might be extended for spatial data mining, text mining, financial data analysis, multimedia data mining, or Web mining (Han et al.,1996; Han, 1998a, 1998b, 1998c; Han et al., 1998). Special attention should be paid to efficient

implementation of the OLAM mechanism (see Table 2;

Han et al., 1999). O An OLAM system might well integrate a variety of

data mining modules through different kinds of data cubes and visualization tools. High-performance data cube technology is critical to OLAM in data warehouses. Moreover, effective data mining requires the support of nontraditional data cubes with complex dimensions and measures, in addition to the on-the-fly computation of query-based data cubes and the efficient computation of multifeatured data cubes. These have the need of further development of data cube technology (Han, 1998b, 1998c).

While most data mining needs are query or constraint based, OLAM requires fast response to data mining requests. The organization should also explore such constraint-based mining in other data mining tasks. There is a wide range of data mining algorithms. The OLAM paradigm offers the user freedom to explore and discover knowledge by applying any sequence of data mining algorithms with data cube navigation (Han et al., 1999).

OLAM and Complex Data Types

A generalization based data mining method can generalize complex objects, construct a multidimensional object cube, and perform analytical mining in such an object cube. Spatial data mining can be performed in a spatial database as well as in a spatial data cube (Han, 1998c).

By the method of OLAM of text and multimedia data, text/multimedia data cubes are built, whereupon the cubebased relational and spatial mining techniques are extended towards mining text and multimedia data. The process includes construction of a Web Log, and the performance of time-related, multi-dimensional data analysis and data mining (Han, 1999, 2000).

The work of Pei (2003) exhibits a general model for OLAP with complex types data, called Golap. Some ideas can be appropriate to online data mining architecture.

OLAM System:

Essential Functionality

By integration of OLAP and data mining, OLAP mining facilitates flexible mining of interesting knowledge in

Table 2. Review of OLAM principle implementation issues

Modularized design and standard APIs

Support of online analytical mining by high-performance data cube technology

Constraint-based online analytical mining

Progressive refinement of data mining quality

Layer-shared mining with data cubes

Bookmaking and backtracking techniques

429

TEAM LinG

Соседние файлы в предмете Электротехника