
Rivero, L. (2006). Encyclopedia of database technologies and applications.
Integration of Data Semantics in Heterogeneous Database Federations
KEY TERMS
Data Exchange: The situation in which the local source schemas, as well as the global schema, are given beforehand; the data integration problem then consists in establishing a suitable mapping between the given global schema and the given set of local schemas.
Data Extraction: The process of creation of uniform representations of data in a database federation.
Data Reconciliation: The process of resolving data inconsistencies in database federations (such as constraint conflicts).
Database Federation: A database federation provides for tight coupling of a collection of heterogeneous legacy databases into a global integrated system. The main problem is achieving and maintaining consistency and a uniform representation of the data on the global level of the federation.
GAV: Global-As-View, in which the global schema is defined directly in terms of the source schemas. GAV systems typically arise in the context where the source schemas are given, and the global schema is to be derived from the local schemas.
Interoperability Problem: Getting a collection of autonomous legacy systems to cooperate in a single federated system.
LAV: Local-As-View, in which the relation between the global schema and the sources is established by defining every source as a view over the global schema. LAV systems typically arise in the context where the global schema is given beforehand and the local schemas are to be derived in terms of the global schema.
Mediation: A global service that links local data sources and local application programs, thus providing integrated information on the global level while letting the component systems of the federation remain intact.
Ontology: Ontology deals with the connection between syntax and semantics, and with how to classify and resolve difficulties between syntactical representations on the one hand and the semantics that provide their interpretations on the other.
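The GAV and LAV definitions above can be made concrete with a small sketch. The two source relations, the global relation computed by `global_employee`, and all field names are hypothetical and chosen only for illustration. The code shows a GAV-style mapping, in which the global relation is defined as a view over the sources.

```python
# Hypothetical local sources (names and fields are illustrative assumptions).
source_hr = [
    {"emp_id": 1, "name": "Ada", "dept": "R&D"},
    {"emp_id": 2, "name": "Bo", "dept": "Sales"},
]
source_payroll = [
    {"id": 1, "salary": 5200},
    {"id": 2, "salary": 4100},
]


def global_employee(hr, payroll):
    """GAV-style mapping: the global relation employee(id, name, salary)
    is defined directly as a view (here, a join) over the source relations."""
    salary_by_id = {row["id"]: row["salary"] for row in payroll}
    return [
        {"id": row["emp_id"], "name": row["name"],
         "salary": salary_by_id[row["emp_id"]]}
        for row in hr
        if row["emp_id"] in salary_by_id
    ]


def query_high_earners(threshold):
    """A query against the global schema is answered by evaluating the view."""
    employees = global_employee(source_hr, source_payroll)
    return [e["name"] for e in employees if e["salary"] > threshold]
```

Under LAV the direction is reversed: each source would be described as a view over the global schema, so answering a global query requires rewriting it in terms of those source views.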
Integrative Document and Content Management Systems’ Architecture
Len Asprey
Practical Information Management Solutions Pty Ltd., Australia
Rolf Green
OneView Pty Ltd., Australia
Michael Middleton
Queensland University of Technology, Australia
INTRODUCTION
Purpose
This paper discusses the benefits of managing digital content (business documents and Web content) within the context of an integrative information systems architecture. This architecture incorporates database management, document and Web content management, integrated scanning/imaging, workflow, and data warehousing technologies.
Business Context
The ubiquitous use of digital content (such as office documents, e-mail, and Web content) for business decision-making makes it imperative that adequate systems are in place to implement management controls over digital content repositories. The traditional approach to managing digital content has been for enterprises to store it in folder structures on file or Web servers. The content files stored within folders are relatively unmanaged, as there are often inadequate classification and indexing structures (taxonomies and metadata), no adequate version control capabilities, and no mechanisms for managing the complex relationships between digital content. These types of relationships include embedded or linked content, content renditions, or control over authored digital documents and published Web content.
In some cases enterprises have achieved a form of management control over hard-copy documents that are records of business transactions by using database applications to register, track, and manage the disposal of physical files and documents. These types of file or document “registers” do not provide adequate controls over the capture, retrieval, and accessibility of digital content.
This deficiency has led many organizations to seek solutions, such as document management systems, to manage digital business content. Document management systems have generally been implemented to meet regulatory compliance within the context of document record-keeping requirements or management of digital archive collections. Otherwise, they have been implemented as solutions for managing specific types of content objects, such as ISO9001 quality management system documentation, engineering drawings, safety documents, and similar materials.
More recently, organizations have sought to acquire Web content management systems with the view to providing controls over digital content that is published to Web sites. The imperative for such a solution may be a commercial one, motivated by product-to-market visibility, customer service, and profitability. There may also be a response to compliance needs, motivated by managing Web content in the context of “record keeping” to satisfy regulatory or governance requirements.
The methodology of implementing document or Web content management systems has often been based on a silo approach, with more emphasis on tactical business imperatives than support for strategic enterprise information architecture initiatives. For example, organizations may attempt a Web content management solution without taking into full account digital documents that may be used to create content outside the constraints of Web-compatible formats (such as XML-defined formats), but which are subsequently required for publication. Thus, document and Web content management may be viewed as discrete solutions, and business applications may be implemented without an integrative approach using workflow and systems for managing both business documentation and Web content.
Another example of a silo approach is the deployment of database solutions without cognizance of document or Web content management requirements. For example, organizations may deploy a solution for managing contracts, including database application capabilities for establishing the contract, recording payments and variations, and managing contract closure. However, the management of contract documents may not be viewed as an integral part of the application design, or the workflow review and approval, or managing the published contract materials on Web sites. The result is that users often miss vital information rather than manually relate data retrieved through a number of separate applications.
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
There are compelling reasons for organizations, as they address the constructs of enterprise information architecture, to consider the management of digital content within the context of an integrative approach to managing business documents and Web content. The strategic rationale for such an approach encompasses the following types of business imperatives:
• Customer satisfaction is a key commercial driver for both business and government: in the case of the commercial sector, the need to attract and retain customers, and in the public sector, the need to support government initiatives directed at taxpayer benefits. Organizations are adopting strategic approaches such as single view of customer and one-source solution for customer information, invoking the use of information knowledge management tools.
• Speed and quality of product to market is another major business driver. The rapid adaptation of the WWW and e-commerce systems to support online business transactions opens markets to global competition. Commercial enterprises are not only required to deliver product to market rapidly, but also within quality management constraints, to attract and retain customers.
• Regulatory imperatives, such as Sarbanes-Oxley in the United States (U.S. Congress, 2002), have introduced new measures for creating greater transparency within organizations, which impact corporate governance and require disclosure with real-time reporting requirements.
The enterprise information architecture would include information policy, standards and governance for the management of information within an organization, and provide supporting tools in the form of an integrative information systems architecture as the platform for managing information. An integrative systems architecture would provide a platform that enables businesses to meet the challenges of both commercial and regulatory imperatives, benefit from reusable information, and provide a coherent view of relevant information enterprise-wide to authorized users.
With respect to document and Web content management, an integrative document and content management (IDCM) model (Asprey & Middleton, 2003) offers a framework for unification of these components into an enterprise information architecture. The model features the management of both documents and Web content within an integrative business and technology framework that manages designated documents and their content throughout the document/content continuum and supports record-keeping requirements.
SCOPE
The core IDCM elements that address document and Web content management requirements (capturing content, reviewing/authorizing content, publishing content, and archival/disposal) comprise:
• Integrated document and Web publishing/content management capabilities.
• Integration of document-imaging capabilities.
• Recognition technologies, such as bar codes, to assist with capturing document information or conversion of image data to text as a by-product of scanning/imaging.
• Enterprise data management capabilities.
• Workflow.
However, when determining requirements within the context of process improvement initiatives that help to address business imperatives (such as customer satisfaction, product to market, and regulatory compliance), these capabilities might be supported by other technologies. This technology support may help businesses to achieve an integrative systems architecture for deployment of innovative and integrated solutions.
• Universal access/portal, which allows users to invoke functions and view information (including digital content) via a Web-based interface.
• Integration with business systems, such as enterprise resource planning (ERP) systems, human resource systems, financial systems, and vertical line of business systems.
These types of capabilities, when combined, augment an integrative systems architecture to support the development of solutions that take advantage of digital content in managed repositories. Users that access business information then have the confidence that they are accessing, retrieving, and printing the most current digital content. This confidence can aid decision making, as end users may not then be required to access physical copies of documents, which can be both cumbersome and time-consuming due to customer expectations on speed of service in a modern technology environment.
Figure 1. Information systems architecture: document/content management
[Figure placeholder. Labels recoverable from the figure: Universal Access/Portal Tier; Presentation Layer; Imaging and Recognition Systems; Application/Business Rules Tier; IDCM; Workflow; OCR/OMR/ICR; Document/Content Management System; Business Application Systems; Distributed IDCM System Replication; Database Management Tier; Index Data; Task Data; Annotation Data; Application Data; Business Data; Data Warehouse; IDCM Index Database; Content Storage Tier; Document Content; Web Content; Full Text Content; Report Output.]
The following section discusses those features of an integrative information systems architecture that would support a requirement for business users to gain rapid access to the most up-to-date digital content via an interface that is simple and intuitive.
SYSTEM FEATURES
Schematic
Figure 1 provides a schematic of database management within the context of IDCM and supporting technologies, such as universal interface (e.g., portal), and interfaces to business applications.
Universal Access/Portal
Universal access/portal applications are used within enterprises to allow the viewing and interpretation of information from a number of sources, providing a single view of information. These sources of information may include data stored within application databases, data stored in warehouses, as well as digital documents and other content stored within document and content management systems. The evolution of universal access/portal applications is not only to encapsulate multisystem data but also to provide a single application interface for all enterprise systems.
Corporate portals may be differentiated by application (Collins, 2003) or level (Terra & Gordon, 2003), but they typically aim to enhance basic Intranet capability. This may be achieved by provision of facilities such as personalization, metadata construction within enterprise taxonomy, collaborative tools, survey tools, and integration of applications. The integration may extend to external applications for business-business or business-customer transactions or for derivation of business intelligence by associating external and internal information.
Universal access/portal applications include the capability of understanding how the information in each system is related to the business process and the rules by which the data can be combined. They provide different views of information depending on the user’s needs and allow information to be utilized to greater effect. Just as importantly, they provide the controls by which the information capture of the enterprise can be managed to allow the support of both industry and compliance requirements.
These applications employ business intelligence to provide the integrative framework to support utilization of information across the enterprise. This framework allows enterprises to provide on-demand reporting of information and to use the information actively in business applications rather than the more passive approach of printed reporting.
DOCUMENT/CONTENT MANAGEMENT
Document management applications are used within enterprises to implement management controls for digital and physical document objects. These objects may include office documents, e-mail and attachments, images and multimedia files, engineering drawings, and technical specifications. The systems manage complex relationships between documents and provide capabilities to manage integrity, security, authority, and audit. Business classification and metadata capabilities support access, presentation, and disposal (Bielawski & Boyle, 1997; Wilkinson et al., 1998), thus supporting organizational knowledge management initiatives.
Web content management applications are implemented by organizations to manage digital content that is published to the Web, including Internet, intranet, and extranet sites. The functionality of Web content management applications can be summarized as content creation, presentation, and management (Arnold, 2003; Boiko, 2002; Robertson, 2003). Organizations are discerning the requirement to have integrated architectures for managing documents and Web content and to consider the implications of record-keeping requirements when seeking unified document and content solutions.
Document Imaging
Document imaging systems are used to scan and convert hard-copy documents to either analog (film) or digital format (Muller, 1993). Film-based images can be registered and tracked within a document management system, similar to the way in which a paper document can be registered and tracked. Digital images can be captured into either a document or Web content management system via integration with the scanning system.
Digital imaging may involve black and white, grayscale, or color imaging techniques, depending on the business need, as the requirement may involve scanning a range of document types, e.g., physical copies of incoming correspondence, forms processing, or capture of color brochures.
Some systems allow users to search and view film images on their local computer by scanning the film on demand. Viewing film images in this way is slow, because the specified film roll must be loaded in the scanning device and wound to the correct position before the image can be scanned. Digital images may be stored on magnetic storage, allowing rapid recall by users and so supporting not only the records archive process but also the demands of workflow and document management.
Recognition Systems
Recognition systems, such as bar code recognition, optical mark recognition (OMR), optical character recognition (OCR), or intelligent character recognition (ICR), are often used to facilitate data capture when documents are scanned to digital image.
Systems can take advantage of encoding within bar codes, which can be intelligently encoded to assist with improving business processes. For example, a bar code may contain specific characters that identify a geographical region, product type, or details of a specific subject/topic of interest.
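As a minimal sketch of such intelligent encoding, the following assumes a purely hypothetical fixed-width layout: two characters for a region, three for a document type, and the remainder for a subject identifier. The lookup tables, the `BarCodeInfo` type, and the `route` helper are illustrative assumptions, not part of any bar code standard or product.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; a real deployment would define its own codes.
REGIONS = {"QL": "Queensland", "NS": "New South Wales"}
DOCUMENT_TYPES = {"INV": "invoice", "CLM": "claim form"}


@dataclass
class BarCodeInfo:
    region: str
    document_type: str
    subject_id: str


def parse_bar_code(value: str) -> BarCodeInfo:
    """Split an assumed layout: 2-char region, 3-char type, then subject id."""
    if len(value) < 6:
        raise ValueError(f"bar code too short: {value!r}")
    region_code, type_code, subject_id = value[:2], value[2:5], value[5:]
    if region_code not in REGIONS or type_code not in DOCUMENT_TYPES:
        raise ValueError(f"unknown code segment in {value!r}")
    return BarCodeInfo(REGIONS[region_code], DOCUMENT_TYPES[type_code], subject_id)


def route(info: BarCodeInfo) -> str:
    """Choose a (hypothetical) work queue from the decoded values."""
    return f"{info.region}/{info.document_type}"
```

Decoding the value at scan time lets the capture process index a document and direct it to the appropriate queue without manual data entry.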
OMR systems are used to capture recognition marks on physical documents, such as bank checks, during digital scanning.
OCR and ICR technologies can allow text to be extracted from a document during digital scanning and conversion processes. The text might be an alphabetical string, or unique number, or words within a zoned area of a form. Depending on the quality of the hard-copy original document, much of the printed information might be convertible to text within acceptable error constraints.
OCR and ICR may also be used to convert existing digital documents, such as Tagged Image File Format (TIFF) or Portable Document Format (PDF) documents, to a full-text rendition that can be searched for specific words or phrases, thereby improving the ability to retrieve the primary file (for example, the PDF document).
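The link between a full-text rendition and its primary file can be sketched with a small inverted index. The example assumes the text has already been produced by an OCR or ICR step (no recognition library is called here), and the `FullTextIndex` class and file names are hypothetical.

```python
import re
from collections import defaultdict


class FullTextIndex:
    """Maps each word in a text rendition back to the primary file it came from."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_rendition(self, primary_file: str, text: str) -> None:
        """Index the words of a rendition under the name of its primary file."""
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[word].add(primary_file)

    def search(self, query: str) -> set:
        """Return primary files whose rendition contains every word of the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return set()
        results = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            results &= self._postings.get(word, set())
        return results
```

A search therefore returns the scanned image or PDF itself, while the text rendition serves only as the searchable surrogate.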
Enterprise Data Management
Each of the components within an IDCM application is highly dependent on database technology for the storage of data, including the metadata relating to the various information objects, and often storage of the objects themselves. Software for managing the repositories must accommodate versioning, routing, security, and review and approval processes. It should also link to tailored interfaces and be able to deal with different renditions of documents, maintain associations within complex documents, and provide for complex queries. Information retrieval development in this area focuses on facilitating indexing and integrating taxonomies (Becker, 2003).
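The repository functions named above (versioning, renditions, and metadata-based queries) can be illustrated with a minimal in-memory sketch. A production system would hold this information in database tables; the class names, fields, and methods below are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    author: str
    content: bytes
    renditions: dict = field(default_factory=dict)  # e.g. {"pdf": b"..."}


@dataclass
class Document:
    doc_id: str
    metadata: dict = field(default_factory=dict)
    versions: list = field(default_factory=list)


class Repository:
    def __init__(self):
        self._docs = {}

    def check_in(self, doc_id, author, content, metadata=None):
        """Store content as a new version; earlier versions are retained."""
        doc = self._docs.setdefault(doc_id, Document(doc_id))
        doc.metadata.update(metadata or {})
        doc.versions.append(Version(len(doc.versions) + 1, author, content))
        return doc.versions[-1].number

    def add_rendition(self, doc_id, fmt, data):
        """Attach an alternative rendition to the latest version."""
        self.latest(doc_id).renditions[fmt] = data

    def latest(self, doc_id):
        return self._docs[doc_id].versions[-1]

    def find(self, **criteria):
        """Return ids of documents whose metadata matches all criteria."""
        return sorted(d.doc_id for d in self._docs.values()
                      if all(d.metadata.get(k) == v for k, v in criteria.items()))
```

Keeping metadata at the document level and content at the version level is one possible design; it lets a query locate a document once and then select whichever version or rendition is required.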
Many organizations use data warehouse applications to store archived records from various business systems. In such cases data may be accessible only through a separate user interface, therefore requiring manual steps to reference data to records within the originating and other systems. The implementation of data repositories within an integrated architecture allows cross-referencing and searching of corporate knowledge and information across the enterprise.
The improvement of data usage increases the number of data calls to the supporting databases and the need for data to be accessible across a wider geographic area. Data supporting document/content management, workflow, and other business systems may be required to be managed across a distributed architecture of replicated and cached data stores.
Integration of data warehouse and document/content management systems allows for various types of data, such as business reports, system logs, etc., to be stored as data or text files rather than within the data warehouse database. This integration allows storage planning based on information life cycles, supporting better data management.
Workflow
Workflow systems allow more direct support for managing enterprise business processes. These systems are able to automate and implement management controls over tasks from initiation of a process to its closure. Though many application systems can be said to have some level of workflow, they lack the flexibility that enables visual creation and alteration of processes by an enterprise workflow system. More importantly the enterprise workflow allows processes to flow across a number of systems rather than be isolated within a single application.
The drivers for implementing workflow may be to improve both efficiency (by automating and controlling work through a business process) and effectiveness (by monitoring the events work may pass through). Ideally, the design of such systems takes into account a unified analysis of enterprise information content (Rockley, Kostur, & Manning, 2003).
The implementation of workflow is often driven from a technology perspective rather than by business processes. Questions of flexibility and interoperability become the drivers for the project rather than enhancements to the business processes. The most marked effect of this is the amplification of bottlenecks within the business process when it is run through a workflow system.
Workflow systems provide the integrative framework to support business processes across the enterprise. This is done by providing a capability for the business to represent its business processes easily within the workflow system and by allowing the workflow to call the business functions specific to each system with the required data.
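The idea of a workflow calling business functions in several systems can be sketched as follows. The two step functions, the system names, and the approval threshold are hypothetical stand-ins for calls into real document management and finance systems.

```python
from typing import Callable, List, Tuple

# Each step names the system it belongs to and the business function to call.
Step = Tuple[str, Callable[[dict], dict]]


def register_document(case: dict) -> dict:
    """Hypothetical call into a document management system."""
    return {**case, "doc_registered": True}


def approve_payment(case: dict) -> dict:
    """Hypothetical call into a finance system (threshold is an assumption)."""
    return {**case, "approved": case["amount"] <= 10_000}


def run_workflow(steps: List[Step], case: dict) -> Tuple[dict, List[str]]:
    """Run the steps in order, passing case data along and keeping an audit trail."""
    audit = []
    for system, function in steps:
        case = function(case)
        audit.append(f"{system}:{function.__name__}")
    return case, audit


# The process definition is data, so it can be altered without changing the engine.
INVOICE_PROCESS: List[Step] = [
    ("document_management", register_document),
    ("finance", approve_payment),
]
```

Because the process is expressed as a list of steps rather than being coded inside one application, it can span systems and be rearranged as the business process changes.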
System Integration
Given the business impetus towards improving customer service, such as optimization of product to market, in what are mostly highly volatile environments, there is a need for management to obtain rapid access to strategic, tactical, and operational information. Management information is typically stored in database applications, such as ERP systems, human resource systems, financial systems, and vertical line of business systems. However, information stored in databases often needs to be supplemented by copies of supporting documentation.
Integration between document/content repositories and operational/administrative systems enables end users to access all information that is relevant to a particular matter. For example, copies of a contract may be required to support a contract variation, or a supporting invoice may be required to support payment processing.
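A minimal sketch of this kind of integration joins a business record with the documents indexed against it. The contract data, the document index entries, and the `contract_view` function are hypothetical; in practice the two sides would be queried from a business application database and a document/content repository.

```python
# Hypothetical business data (e.g., from a contract management database).
contracts = {"C-100": {"supplier": "Acme", "value": 25000}}

# Hypothetical index entries held by the document/content repository.
document_index = [
    {"doc_id": "D-1", "contract_id": "C-100", "kind": "signed contract"},
    {"doc_id": "D-2", "contract_id": "C-100", "kind": "variation"},
    {"doc_id": "D-3", "contract_id": "C-200", "kind": "invoice"},
]


def contract_view(contract_id: str) -> dict:
    """Combine the database record with its supporting documents in one result."""
    record = contracts[contract_id]
    documents = [d["doc_id"] for d in document_index
                 if d["contract_id"] == contract_id]
    return {"contract_id": contract_id, **record, "documents": documents}
```

The shared contract identifier is the integration point: it is the only value both systems need to agree on.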
REASONS FOR UTILIZATION
The implementation of systems for managing digital and Web content in silo systems may only solve a tactical business requirement. Essentially, controls over content are implemented. However, silo approaches to the implementation of systems, while they perhaps solve tactical requirements, might not be the most strategic and innovative approach to help solve the key imperatives facing enterprises. Solutions that embrace integrated document and Web content management, combined with universal interface/portal applications and integration with business systems, may support better opportunities for business process improvement.
With respect to knowledge management, document and Web content management solutions support enterprise knowledge management initiatives, where the knowledge has been made explicit. The strategic approach to managing both data in databases and digital content using an integrative systems architecture may provide a more cohesive and coherent management of information, with the potential to add more value to knowledge management initiatives (Laugero & Globe, 2002). The use of the universal interface or portal will help to address usability issues with document and content management applications. While these systems contain a wide range of functionality, the end user may not need to invoke much of the functionality, and an enterprise portal–style interface approach allows better presentation and management of end-user functions and viewing capabilities.
Organizations may find that it is difficult to cost-justify digital document management or Web content management applications based purely on information management notions of “better managing enterprise document collections.” The investment in the technology may outweigh the perceived benefits. However, the approach of solving strategic management challenges using an integrative systems architecture to deliver end-to-end business applications may make it easier to support business justification. At the same time, the enterprise is able to secure a managed repository for documents and Web content to support information policy and planning.
Table 1. A summary of critical issues
Business Issues | Technology Issues
Cost: The cost of an information systems architecture is likely to be higher than a tactical or silo solution. | Infrastructure: The infrastructure within the organization may not be adequate for the deployment of the information systems architecture.
Planning: The extent of planning required for the acquisition and implementation of a strategic enterprise platform is likely to be longer than that required for a tactical solution. | Systems Integration: The integration between the components of the solutions may be jeopardized if technical specifications are not correctly and thoroughly defined or if the package selection process is not well managed.
Specifications: The extent of specifications required for a strategic enterprise platform is likely to be more extensive than that required for a tactical solution. | Evolving Technology: The evolving nature of technology, including technology convergence, may impact the long-term rollout of the information systems architecture.
Benefits Realization: Benefits of a strategic solution may not be realized as early as those for a tactical solution. However, the benefits may be more enduring. | Security: The nature of the architecture, including integrated components, is likely to involve the development of detailed specifications to ensure seamless access to authorized information.
Lack of Business Unit Buy-In: Autonomous business units may not see the benefit of a strategic enterprise solution, focusing instead on specific tactical and operational requirements. | Disaster Recovery: Storage of large amounts of data made available on demand introduces the need for complex backup procedures and longer recovery times.
CRITICAL ISSUES
Table 1 summarizes some of the critical issues that enterprises may need to consider when reviewing requirements for an information systems architecture that features integrative document and content management capabilities.
CONCLUSION
The acquisition of enterprise document and Web content management systems might not be justified based only on notions of good information management or contribution to knowledge management. The likelihood of a document and content management project targeted at the enterprise being successful may be enhanced significantly by incorporating the capabilities of these systems into an integrative information systems architecture. This platform may then allow the business to initiate projects that deliver a wide range of business process improvement initiatives, all relying on a supporting and consistent enterprise platform that gives authorized users easy access to information. Thus, the platform becomes strategically positioned to support business imperatives in the strategic, tactical, and operational hierarchy of an organization and to allow the deployment of innovative and stable end-to-end business solutions.
REFERENCES
Addey, D., Ellis, J., Suh, P., & Thiemecke, D. (2002). Content management systems. Birmingham, UK: Glasshaus.
Arnold, S.E. (2003). Content management’s new realities. Online, 27(1), 36-40.
Asprey, L., & Middleton, M. (2003). Integrative document and content management: Strategies for exploiting enterprise knowledge. Hershey, PA: Idea Group Publishing.
Becker, S.A. (2003). Effective databases for text & document management. Hershey, PA: IRM Press.
Bielawski, L., & Boyle, J. (1997). Electronic document management systems: A user centered approach for creating, distributing and managing online publications. Upper Saddle River, NJ: Prentice Hall.
Boiko, B. (2002). Content management bible. New York: Hungry Minds.
Collins, H. (2003). Enterprise knowledge portals. New York: American Management Association.
Laugero, G., & Globe, A. (2002). Enterprise content services: A practical approach to connecting content management to business strategy. Boston: Addison-Wesley.
Muller, N.J. (1993). Computerized document imaging systems: Technology and applications. Boston: Artech House.
Robertson, J. (2003). So, what is a content management system? Retrieved March 18, 2004, from http://www.steptwo.com.au/papers/kmc_what/index.html
Rockley, A., Kostur, P., & Manning, S. (2003). Managing enterprise content: A unified content strategy. Indianapolis, IN: New Riders.
Terra, J., & Gordon, C. (2003). Realizing the promise of corporate portals. Boston: Butterworth-Heinemann.
United States Congress. (2002, July 30). Public Law 107-204: Sarbanes-Oxley Act of 2002. Retrieved July 16, 2004, from http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=107_cong_public_laws&docid=f:publ204.107.pdf
Wilkinson, R., Arnold-Moore, T., Fuller, M., Sacks-Davis, R., Thom, J., & Zobel, J. (1998). Document computing: Technologies for managing electronic document collections. Boston: Kluwer.
KEY TERMS
Content Management: Implementation of a managed repository for digital assets such as documents, fragments of documents, images, and multimedia that are published to intranet and Internet WWW sites.
Document Capture: Registration of an object into a document, image, or content repository.
Document Imaging: Scanning and conversion of hardcopy documents to either analog (film) or digital image format.
Document Management: Implements management controls over digital documents via integration with standard desktop authoring tools (word processing, spreadsheets, and other tools) and document library functionality. Registers and tracks physical documents.
IDCM: Integrative document and content management.
Portal: User interface to a number of process and information sources.
Recognition Technologies: Technologies such as bar code recognition, optical character recognition (OCR), intelligent character recognition (ICR), and optical mark recognition (OMR) that facilitate document registration and retrieval.
Workflow Software: Tools that deal with the automation of business processes in a managed environment.
Intension Mining
Héctor Oscar Nigro
INCA/INTIA, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina
Sandra Elizabeth González Císaro
Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina
INTRODUCTION
Knowledge discovery is defined as “the non trivial extraction of implicit, unknown, and potentially useful knowledge of the data” (Fayyad, Piatetsky-Shapiro, Smyth, & Uthurusamy, 1996, p. 6). According to these principles, the knowledge discovery process (KDP) takes the results just as they come from the data (i.e., the process of extracting tendencies or models of the data), and it carefully and accurately transforms them into useful and understandable information. For discovered knowledge to be considered useful, it has to be interesting (i.e., it should have a potential value for the user; Han & Kamber, 2001).
Current data mining solutions are based on decoupled architectures. Data mining tools assume the data to be already selected, cleaned, and transformed. Large quantities of data are required to provide enough information to derive additional knowledge (Goel, 1999). Because large quantities of data are required, an efficient process becomes essential.
Intension mining was born out of this need for efficiency. Gupta, Bhatnagar, and Wasan proposed its architecture and framework.
Intension mining arises as a framework that focuses on the user of the current KDP. The basic idea behind the concept of intension mining is to separate the user from the intricacies of the KDP and give him or her a single database management system (DBMS)-like interface to interactively mine for the required kind of knowledge. The user can plan the data mining needs beforehand and input them in the form of a knowledge discovery schema (KDS). The system mines knowledge and presents the results in the required format. Intension mining leads to efficiency and makes the whole process more realistic, user-friendly, and, hence, popular (Goel, 1999). As a result, intension mining is a logical extension of incremental mining, with a user-oriented paradigm in which the user establishes and conceives the requirements of the mining before the mining begins (Gupta, Bhatnagar, & Wasan, 2000a, 2000d).
BACKGROUND
Many contributions have sought to improve and understand the KDP. The concept of a second-generation data mining system (Imielinski & Mannila, 1996; Virmani, 1998) involves rule generation, data-rule management, and rule postprocessing. Another extension includes providing users with the ability to remember past mining sessions. Virmani (1998) developed a design called discovery board, which provides a framework for a DBMS-like environment supporting a query language and APIs to build data mining applications.
Imielinski and Mannila (1996) proposed an evolution of KDP with an SQL-like interface for ad hoc mining. Meo, Psaila, and Ceri (1998) suggested an architecture strongly coupled with an SQL server. Ganti, Gehrke, and Ramakrishnan (2000) developed DEMON, a system based on an incremental mining paradigm; with DEMON, it is possible to mine the entire data repository or some selected subset.
The work being done in the field of structured data mining and upcoming database ideas, such as Hippocratic databases, share various levels of commonality with the I-MIN model in their core ideas (Gupta et al., 2000c). An extensive, explanatory comparison with other research can be found in Gupta et al. (2000a) and Gupta, Bhatnagar, and Wasan (2001).
INTENSION MINING: MAIN FEATURES
General Characteristics
The basic proposal of intension mining is to specify the mining requirements separately and in advance, so that the data can be prepared a priori and, as a result, the mining task can be carried out in a more efficient and structured way.
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
Once planned, the objectives of data mining are kept in the form of a KDS. Besides capturing the requirements, this schema provides the user with a friendly interface, improving the user’s productivity because it is easy to understand (Gupta et al., 2000a).
Just as a database schema contains the specifications of the relations, the KDS contains the specification of the mining requirements. The knowledge discovery schema guides the selection, cleaning, transformation, and aggregation of the data before mining and, because the requirements are readily available in the KDS, the system is able to execute off-line pre-mining operations periodically. The resulting information should be preserved in secondary storage in an appropriate form so that it can be used to satisfy the mining queries performed by the user (Gupta et al., 2000a). Thus, mining of the database can be carried out on demand, using this information.
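The role of the KDS as a declaration of mining requirements can be illustrated with a minimal Python sketch. The source does not define a concrete KDS syntax, so the class `KnowledgeDiscoverySchema`, its field names, the `prepare` function, and the sample sales records are all hypothetical and serve only to show how selection, cleaning, and transformation might be declared once and then applied before mining.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass(frozen=True)
class KnowledgeDiscoverySchema:
    """Hypothetical container for mining requirements declared in advance."""
    source: str                             # database or table to be mined
    knowledge_type: str                     # kind of knowledge the user wants
    select: Callable[[Record], bool]        # selection: which records to keep
    clean: Callable[[Record], Record]       # cleaning: repair a single record
    transform: Callable[[Record], Record]   # transformation: reshape for mining
    presentation: str = "table"             # how results should be presented


def prepare(kds: KnowledgeDiscoverySchema, records: Iterable[Record]) -> List[Record]:
    """Apply the schema's selection, cleaning, and transformation, in that order."""
    return [kds.transform(kds.clean(r)) for r in records if kds.select(r)]


# Illustrative schema: keep records with an amount, normalize the city name,
# and reduce each record to the attributes of interest.
sales_kds = KnowledgeDiscoverySchema(
    source="sales",
    knowledge_type="frequent_values",
    select=lambda r: r.get("amount") is not None,
    clean=lambda r: {**r, "city": str(r["city"]).strip().title()},
    transform=lambda r: {"city": r["city"], "large": r["amount"] >= 100},
)

raw = [
    {"city": " tandil ", "amount": 120},
    {"city": "Azul", "amount": None},
    {"city": "azul", "amount": 40},
]
prepared = prepare(sales_kds, raw)
```

Because the requirements live in one declared object, the same preparation could be rerun periodically on new data without further input from the user, which is the property the text attributes to the KDS.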
An important characteristic of intension mining is that it perceives the KDP as a continuous process. Because the temporal aspect is captured by the periodic pre-mining operations, experimentation as well as monitoring is possible (Gupta et al., 2000c).
Intension mining, like a DBMS architecture (for more detail on DBMS, see Elmasri & Navathe, 2000), is built up in three phases:
•Phase 1-Planning: The aim of phase 1 is to evaluate the objectives of data mining and to formalize them as a KDS. As was previously mentioned, the user anticipates the mining requirements and specifies them in this schema during the planning phase.
A clear understanding of the domain and the requirements of the mining helps in designing a KDS, which directs the KDP. The type of knowledge to be mined, the database to be mined, the transformation, the selection, the cleaning, the specific mining algorithm to be executed, and finally the presentation tools to be used are specified by the user in the KDS (Goel, 1999). The metadata is stored to be used in the accumulation and mining phases. Just as a good database schema design can efficiently satisfy the users’ queries on most occasions, a well-thought-out KDS would be able to efficiently discover different types of knowledge in most instances (Gupta, 2000). At this level, security, backup, and recovery issues also arise in the database (Gupta et al., 2000a).
•Phase 2-Accumulation: Phase 2 starts after compilation of the KDS and continues until the user decides to drop the mining requirements (schema) altogether. During this accumulation phase, the incremental database is pre-mined and aggregated in consultation with the metadata to yield the knowledge concentrate (KC), which stores the intermediate form of intended knowledge (Gupta et al., 2001).
The crucial parts of this level are the interaction between the database and the KDP to access the data records and the maintenance of the KC in secondary storage. The extraction of the KCs from the incremental database is the I/O-intensive task in intension mining, and it may require several scans of the data. Significantly, all these tasks are carried out off-line (Gupta et al., 2000a).
The KCs therefore imply a trade-off. They allow new functionality to be added and speed up the mining that works on them, but they come at the cost of extra storage space. In the ideal case, all the tuples of the database are the same, and so the structures are small; in the worst case, all the tuples are different, and the structures can occupy a great deal of space. One should then evaluate what is convenient for the user in each case.
•Phase 3-Mining: Mining is the final phase of the system. In general, during this step the KDP is intensive. The mining phase begins when a user invokes a mining query through the user interface (Goel, 1999). The user specifies the parameters when carrying out the query. This offers the user the freedom to explore the database and to experiment with the KDP. The query is processed, and the mining is done on the specific data structures that are kept in the KC (Gupta, 2000).
The mining process consults the KDS and executes the specified mining algorithm on the KCs built up during the accumulation phase. Finally, the results are presented by means of the presentation tool (Gupta et al., 2000a).
Response times are also better because I/O on the database is avoided, which makes it possible to carry out exploratory analysis by choosing a subset of the data or varying the parameters. As the mining is carried out on the KCs, it does not interfere with the operations of the database.
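The division of work between the accumulation and mining phases can be sketched in Python as follows. The source does not specify the internal structure of a KC; this sketch assumes, purely for illustration, that the KC stores counts of distinct tuples. The class `KnowledgeConcentrate`, its methods, the `min_support` parameter, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Row = Tuple[str, ...]


class KnowledgeConcentrate:
    """Hypothetical KC: counts of the distinct tuples seen so far (an assumption)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def accumulate(self, batch: Iterable[Row]) -> None:
        """Off-line step: fold one increment of the database into the KC."""
        for row in batch:
            self.counts[row] += 1
            self.total += 1

    def mine(self, min_support: float) -> Dict[Row, float]:
        """On-line step: answer a query from the KC alone, without a database scan."""
        if self.total == 0:
            return {}
        return {
            row: n / self.total
            for row, n in self.counts.items()
            if n / self.total >= min_support
        }


kc = KnowledgeConcentrate()
kc.accumulate([("bread", "milk"), ("bread", "milk"), ("beer",)])  # first increment
kc.accumulate([("bread", "milk"), ("beer",), ("eggs",)])          # later increment

# The user varies the query parameter freely; only the KC is consulted.
frequent = kc.mine(min_support=0.5)
looser = kc.mine(min_support=0.3)

# Space trade-off: identical tuples collapse to one entry, distinct tuples do not.
same = KnowledgeConcentrate()
same.accumulate([("a",)] * 1000)
distinct = KnowledgeConcentrate()
distinct.accumulate([(str(i),) for i in range(1000)])
```

Under this assumption, repeated queries with different parameters cost no database I/O, while the last two objects show the best and worst cases for KC size: a single entry when all tuples are identical and one entry per tuple when all are different.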
I-MIN MODEL: AN INSTANTIATION OF INTENSION MINING
The I-MIN model has been designed to support intension mining. It is therefore developed in three layers, following the basic ideas presented in Gupta et al. (2000a).
The architectural proposal emulates a DBMS-like environment for the managers, administrators, and end users in the organization. Knowledge management functions, such as sharing and reusing of the discovered