
KEY TERMS

E-Government: The delivery of local government services through electronic means; the use of technology to improve government operations and create public value.

Encryption: A modification process that transforms data into a non-readable format to protect against unauthorized viewing. Encryption can be handled by special applications, but it is often included as a feature of a database system or as a utility that is part of the operating system. Depending on the encryption/decryption method, information transmitted in encrypted format may be decrypted routinely and without user intervention by e-mail software or commodity Web viewers such as Microsoft Internet Explorer, Netscape Navigator, or Mozilla-compliant browsers.

Geographic Information System (GIS): A database of a region and software interfaces to view and manage the data. GIS implementation often begins with a digitized map of an area derived from original parcel maps or aerial photography. Multiple “layers” are created for the map to include different infrastructure systems such as roads, sewers, and telecommunications.

E-Government Databases

Interface: The point at which two systems connect, and the method by which communication is accomplished. The computer keyboard, mouse, printer, and video display exemplify interfaces between the machine’s internal operations and the human user.

Legacy Data: Contents of databases that precede the installation and implementation of new systems. Optimally, legacy data is migrated into new data systems; following this process, the older application and data structure may be archived or deleted. Frequently, in an effort to reduce the cost of implementation, legacy data remains outside a new data store and accessed as foreign data records from the new application.

Referential Integrity: A concept developed as part of relational database management systems. A connecting construct, or “key”, allows a database designer to optimally develop a set of tables while retaining links between related data. With referential integrity, records cannot be updated in isolation in an inconsistent manner.
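As a minimal, hedged sketch of the idea, a foreign key can be declared in SQLite through Python's built-in sqlite3 module; the department/employee tables and names below are purely illustrative, and note that SQLite enforces foreign keys only once the pragma is switched on:

```python
import sqlite3

# A toy schema: employees reference a department through a foreign
# key. Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces keys only when asked

conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employee (
                    id INTEGER PRIMARY KEY,
                    name TEXT,
                    dept_id INTEGER REFERENCES department(id))""")

conn.execute("INSERT INTO department VALUES (1, 'Records')")
conn.execute("INSERT INTO employee VALUES (10, 'Ada', 1)")

# The key blocks an inconsistent update: no employee row can point at
# a department that does not exist.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Grace', 99)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

A production DBMS would typically also declare ON DELETE behavior for the key, which is what prevents records from being updated "in isolation in an inconsistent manner."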

Web-Enabling: A modification process whereby information formerly requiring a locally constrained interface for access becomes available to commodity Web viewers, products such as Microsoft Internet Explorer, Netscape Navigator, or Mozilla-compliant browsers.


E-Mail Data Stores

Catherine Horiuchi

Seattle University, USA

INTRODUCTION

For many people, e-mail has become a running record of their business and personal lives. Somewhere in that big clot of e-mail messages that have accumulated over the years is a wealth of information about people they’ve met, work they’ve done, meetings they’ve held. There are tough calls and tender moments, great debates and funny episodes. When did you first meet a certain person? Just what was the initial offer in a business deal? What was that joke somebody sent you in 1998? The answers lie in your old e-mail. Unfortunately, it’s often easier to search the vast reaches of the World Wide Web than to quickly and accurately search your own stored e-mail. (Mossberg, 2004)

Electronic mail, or e-mail, has evolved from its beginnings as one of the earliest Internet applications. The network originally connected computers to computers, but in 1977, RFC 733 updated the messaging protocol to “focus on people and not mailboxes as recipients” (Crocker, Vittal, Pogran, & Henderson, 1977, p. 1). Once considered a simple method to send text messages between two machines, e-mail has become a complex system of hardware and software interfaces between individuals and institutions. Messaging technologies manage the flow of major lines of business and significant public-sector policy processes. As a corollary, databases associated with e-mail now rank among the most mission-critical data stores in many organizations.
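The search problem Mossberg describes can be made concrete with Python's standard-library mailbox module; the messages, addresses, and file path below are invented for illustration. Absent an index, even a small store must be scanned linearly:

```python
import mailbox
import os
import tempfile

# Build a tiny mbox store (contents are illustrative).
path = os.path.join(tempfile.mkdtemp(), "inbox.mbox")
box = mailbox.mbox(path)
for sender, subject, body in [
    ("alice@example.com", "Initial offer", "We propose $50,000."),
    ("bob@example.com", "Lunch", "Sandwiches again?"),
]:
    msg = mailbox.mboxMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_payload(body)
    box.add(msg)
box.flush()

def search(store, term):
    """Linear scan of subjects and bodies -- the slow, exhaustive
    search a user falls back on when no index exists."""
    term = term.lower()
    return [m["Subject"] for m in store
            if term in (m["Subject"] or "").lower()
            or term in m.get_payload().lower()]

print(search(box, "offer"))
```

The point of the sketch is the cost model, not the API: every query touches every message, which is why stored e-mail is harder to search than the indexed Web.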

BACKGROUND

User-oriented client software interfaces create flexibility to map messages in patterns strongly congruent with the way individuals think and organize information. This usability has made e-mail the catch basin of an organization’s intellectual capital and institutional history, particularly knowledge and activities not captured by software systems focused on basic business processes, such as inventory and accounts receivable. The message store grows organically over time. The central message store is linear in nature, with messages stored chronologically, but copies of those messages are also managed by each message originator and recipient. Multiple instances of a message are organized in individualistic fashion in multiple locations, based on each user’s business and personal priorities. Options also exist to create, send, and store messages in encrypted format. Firms face critical decisions regarding e-mail administration, including policies on retention, software package management, and mitigation of risks from worms, viruses, and e-mail bombs. Administering e-mail is further complicated by the multiple parties who have the option to discard, retain, or forward an e-mail message, creating yet more copies: the sender, the recipient, the administrator of the originating mail system, and the administrator of the receiving mail system. Table 1 describes the basic locations of e-mail messages.

The highly congruent, highly personal aspects of e-mail have contributed to efforts to capitalize on these attributes in an organized fashion. These efforts have varied in approach, and each faces specific challenges, discussed in the following section.

DATA MANAGEMENT CHALLENGES

Strategies to capture the knowledge held in e-mails have ranged from benign neglect (e.g., limited to backing up and archiving a central e-mail message store), to direct integration with a primary business application (e.g., using mail message protocols within a supply-chain software package, such as SAP), to sophisticated programming to capitalize on a particular e-mail platform (e.g., business application programming on Lotus Notes). Each approach is complicated by authentication, platform dependence, data corruptibility, and referential integrity issues.

Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.

Table 1. Where e-mail data is stored

For each e-mail account, a message and any attachments must be stored somewhere until deleted. P. Hoffman (2004) offers a succinct list of the common configurations:

- On individual, end-user systems: a common approach for POP3 users. Messages are copied from a mail server message store onto an individual’s computer and then deleted from the server.
- On servers: a common approach for IMAP users and Web-based mail clients. Messages are stored only on the mail server.
- In both places: most mail systems are configured to store messages both on users’ machines and in a central database repository. Large message stores are usually included in system backups, resulting in further replication of messages and attachments.

The simplest strategy, benign neglect, is also the most common: The message store is backed up with the rest of the data on the server. If a user inadvertently deletes a message considered important, a request to the system administrator can result in its restoration from backup. This strategy can also meet legal requirements to retain e-mail, provided the system administrator has been notified of the requirement and has adequate storage capacity and processes in place. However, it depends on the original sender or recipient to reestablish the connection between a particular e-mail message and its context among many issues and correspondents. And if the message was encrypted, loss of the original key will force a time-consuming brute-force decryption. This simplest strategy also fails to address referential integrity, the principle that assures all copies of data are congruent and no part of the data loses its association with other elements. For instance, if an attachment is deleted, the message is incomplete; if an Internet site linked from a message is altered or expires, the message no longer retains the same meaning.
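One hedged sketch of guarding the message-attachment association is to fingerprint every MIME part at archive time and compare digests on later retrieval; the function name and message contents below are invented, not part of any mail product:

```python
import hashlib
from email.message import EmailMessage

# Archive-time fingerprinting: record a digest per MIME part so a
# later audit can detect a lost or altered attachment.
msg = EmailMessage()
msg["Subject"] = "Contract"
msg.set_content("Signed copy attached.")
msg.add_attachment(b"%PDF-1.4 fake bytes", maintype="application",
                   subtype="pdf", filename="contract.pdf")

def fingerprints(message):
    """Map each non-container part (body or attachment) to a SHA-256 digest."""
    return {part.get_filename() or "<body>":
            hashlib.sha256(part.get_payload(decode=True) or b"").hexdigest()
            for part in message.walk() if not part.is_multipart()}

stored = fingerprints(msg)  # kept alongside the archived message

# Later: recompute and compare. A mismatch or a missing key means the
# message no longer carries its original context.
assert fingerprints(msg) == stored
```

This addresses only internal integrity; an expired external link, as noted above, cannot be detected from the message bytes alone.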

Organizations with multiple hardware and software systems struggle to collect and analyze data at the organizational level (i.e., metadata analysis). To improve connections and reduce the time required for metadata analysis, firms may replace numerous free-standing applications with an enterprise resource planning (ERP) package, or combine data sources into a more integrated data warehouse (Zeng, Chang, & Yen, 2003). These ERP and data warehouse solutions include hooks for messaging technologies to establish links between departments and even external companies in supply-chain automation. This linking technology is compatible with several major e-mail engines, so firms can leverage their existing e-mail systems’ knowledge in service to these specialized applications.

Combining a corporate-level e-mail system with a corporate-level business software package automates many manual processes and creates a strong audit trail. However, it also creates a high degree of dependence on the technology companies that license the software products, as well as on any consultants hired to write specialized software routines and database extensions targeting particular business process automation. These types of technology enhancement projects easily run into the tens of millions of dollars for initial implementation and millions more for annual maintenance. ERP packages and similar integrated software strategies address the referential integrity problem inherent in having multiple systems that describe various aspects of a single transaction. Instead of separate systems that catalog the purchase of a piece of equipment, its installation at a location, its maintenance schedule, depreciation, ultimate removal, and salvage, a single system tags all these events to the equipment, resulting in a more comprehensive picture. Although this operational cohesion is of high value to management, a firm’s dependence on particular vendors results in a loss of competitive pressure. Transitioning to an alternate vendor involves major expense, as does changing the integrated system to meet new business requirements. Historically, firms used software/hardware packages for decades, but that was before software tightly programmed employee behaviors that must shift with changing economic cycles and market challenges.

Authentication between major applications and external data stores can be handled in more than one way, with differing security profiles. The least satisfactory method, from a database administrator’s point of view, assigns administrative privileges to an application: the database administrator does not control the rights of users within the application and cannot match data requests to users. It also exposes the data store to hacking exploits designed to enter by other paths, such as the query engine inherent to the database management system. An alternative method, with named users and permissions at both the database and application levels, may require users to authenticate multiple times. This basic dilemma underlies efforts toward “single sign-on.” In most instances, an end user has several separate identities established with the operating systems of multiple servers as well as with applications, and a portion of their authentication is handled in a pass-through or permissions-table fashion. The degree to which a firm actively maintains its user directory determines its exposure to security compromises.

Rather than buying a software package and using the messaging software merely to route transactions, other firms have centered on the messaging itself and written extensive software enhancements to the basic message function. This is the Lotus Notes strategy. It is best suited for firms with substantial intellectual property, as opposed to firms with extensive inventory to track or manufacturing processes, around which the concept of supply-chain management was originally developed. This strategy requires a technical staff with a deep understanding of both the technology and the firm’s business. Its principal weakness is that each integrated element requires special programming, so attention goes primarily to high-value integration. Lower-priority data remain outside the data store, and any messaging regarding those materials has no connection to the other systems that manage them.

Customer relationship management (CRM) software, while originating in call-center management, can be designed around multimodal communication with customers. The customer becomes the primary focus, whether contacted by phone, fax, e-mail, standard mail, or direct face-to-face meetings. Information about existing and potential customers is compounded into the data store (Bose & Sugumaran, 2003). This strategy combines the referential integrity of major software systems with the intellectual-property management of a Lotus Notes model. Its primary weakness is its origin in telephone technologies and freestanding call-center business models. To the degree that firms contract out noncore operations, they risk fragmenting this knowledge or opening access to internal systems to partner firms through weak authentication protocols.

McCray and Gallagher (2001) catalog the following principles for designing, implementing, and maintaining a digital library: Expect change, know your content, involve the right people, be aware of proprietary data rights, automate whenever possible, adhere to standards, and be concerned about persistence. These library principles are equally relevant for designing, implementing, and maintaining an e-mail data store, regardless of the knowledge-management strategy adopted. Applications that interact with messaging systems anticipate stability and persistence in the data store. Messages have mixed data types as attachments, users anticipate that messages will be retrievable to the limits of the organization’s persistence policy, and messages must remain readable despite likely hardware and software changes.
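Persistence and readability across software changes are easier when messages are archived in their standard RFC 822/MIME byte form, which any compliant parser can read back. A small sketch with Python's standard email package (all message contents invented):

```python
from email import message_from_bytes
from email.message import EmailMessage

# Build a message with a mixed-type attachment, serialize it to its
# standard byte form, and parse it back with a generic parser.
msg = EmailMessage()
msg["From"] = "counsel@example.com"
msg["Subject"] = "Meeting notes"
msg.set_content("Draft attached.")
msg.add_attachment("col1,col2\n1,2\n", subtype="csv", filename="notes.csv")

raw = bytes(msg)                    # what an archive would store
restored = message_from_bytes(raw)  # any RFC-compliant reader can do this

names = [p.get_filename() for p in restored.walk() if p.get_filename()]
print(names)
```

Storing the wire format rather than a vendor's proprietary database layout is one way to "adhere to standards" and "be concerned about persistence" in McCray and Gallagher's sense.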

An organization’s managers and staff can consider technical options and decide upon a workable information management strategy. This does not end their e-mail data-management considerations. The comprehensive information contained in metadata stores, and the persistence of that information, create legal and ethical questions that must be considered. Legal matters, new privacy concerns, and historic fiduciary requirements result in a body of intergovernmental regulations to consider as well.

LEGAL AND ETHICAL QUESTIONS


In the United States, existing e-mail messages are subject to discovery, a legal process to compel the exchange of information. Because of this risk, and to limit the amount of server storage space that must be managed, corporations may establish e-mail policies to routinely purge data stores. To the degree that end users transfer messages to their local machines, these efforts can be confounded. Furthermore, e-mail stores are constrained legally to be simultaneously available and unavailable. For public agencies, the Freedom of Information Act requires accessibility, and the Privacy Act of 1974 requires that all personal information be protected (Enneking, 1998).
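A routine retention-policy purge can be sketched against an mbox store with Python's standard library; the 90-day window, file path, and messages below are all invented for illustration:

```python
import mailbox
import os
import tempfile
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Seed a store with one message past retention and one within it.
path = os.path.join(tempfile.mkdtemp(), "server.mbox")
box = mailbox.mbox(path)
now = datetime.now(timezone.utc)
for subject, age_days in [("old memo", 400), ("recent memo", 5)]:
    m = mailbox.mboxMessage()
    m["Subject"] = subject
    m["Date"] = format_datetime(now - timedelta(days=age_days))
    m.set_payload("...")
    box.add(m)
box.flush()

def purge(store, max_age_days):
    """Delete every message whose Date header is past the cutoff."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    store.lock()
    try:
        for key in list(store.keys()):
            if parsedate_to_datetime(store[key]["Date"]) < cutoff:
                store.remove(key)
        store.flush()
    finally:
        store.unlock()

purge(box, 90)
print([m["Subject"] for m in box])
```

As the article notes, such a server-side purge is confounded by copies users have already pulled to local machines, which this script never sees.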

Business trends and government rulemaking efforts have resulted in a technologically complex environment for e-mail data stores. Globalization, the economic and social changes related to the penetration of linking technologies such as computer networking and the accompanying rise of multinational firms, creates exposure to the rules of multiple nation-states, though no international body enforces constraints on global firms; enforcement remains state-specific. A plethora of international regulations on the use of encryption has affected vendors of e-mail technologies and created variations in the allowable management of message stores. Case law and federal legislation have tried to reduce the quantity of unsolicited e-mail, or “spam,” that makes up a sizable percentage of all Internet messages. Regarding data replication, one ruling determined that the state’s privacy act was not violated when copies of e-mail messages were printed or stored as evidence (Washington State v. Townsend, 2001).

Lawyers have special interests in e-mail, both in managing legal matters related to technology used by their clients and within their own firms. Hopkins and Reynolds (2003) argued that ethical representation of a client should require informing the client of the value of an encrypted message, and the use of encryption for all communications, despite a 1999 American Bar Association position stating that lawyers did not violate rules of professional conduct by sending unencrypted e-mail. The capacity to encrypt easily (now built into most commercial e-mail products), combined with increases in cybercrime and the ease with which data packets can be intercepted, suggests a preference for encrypted e-mail as more secure than the traditional mail or faxes that are customary for lawyer-client communication.

Inadvertent mistakes in using e-mail applications can result in erroneous transmission of what in other media might be considered confidential and privileged information. For instance, pressing “reply all” instead of “reply” might send information to parties outside a privileged contact (Hall & Estrella, 2000). Judicial and ethical opinion is divided on whether the act of sending the e-mail, even inadvertently, waives attorney-client privilege, so each situation is evaluated individually according to the jurisdiction involved. For the most part, the receiving party is able to review and retain these materials for use. Table 2 cites instances of these mistakes.

Emerging communication technologies, such as instant messaging and third-party e-mail servers, create new legal challenges for firms trying to manage their information (Juhnke, 2003).

FUTURE TRENDS

A degree of stability is created by implementing standard records-management processes and using application- or platform-specific archiving methods for e-mail data stores. But messaging data management is complicated by the adoption of new communication media, such as instant messaging (IM) and Webcasting. In IM or chat mode, a user is online in real time with another user at another computer, both connected to the network. Originally, chat sessions were not captured and stored but were transient, similar to most telephone calls. And similar to a modern telephone conversation with a customer service representative, IM sessions are now considered part of normal business communications, to be captured, stored, indexed, and analyzed.

The U.S. government requires financial firms to be able to turn over IM logs within 24 hours of a request, just as with e-mail. Failing to meet this requirement can be costly: The Securities and Exchange Commission fined five firms $8.25 million in 2002 for not preserving e-mail and making it available as required (T. Hoffman, 2004).
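Meeting a 24-hour production deadline is largely an indexing problem: responsive messages must be locatable without a full archive scan. A sketch of a simple account-and-day index over an archived log (records and field names invented):

```python
from collections import defaultdict
from datetime import date

# A toy message archive; real systems would index millions of records.
archive = [
    {"id": 1, "sender": "trader1", "day": date(2002, 3, 1), "text": "buy"},
    {"id": 2, "sender": "trader1", "day": date(2002, 3, 2), "text": "sell"},
    {"id": 3, "sender": "trader2", "day": date(2002, 3, 1), "text": "hold"},
]

# Build the index once, at archive time.
index = defaultdict(list)
for rec in archive:
    index[(rec["sender"], rec["day"])].append(rec["id"])

def produce(sender, day):
    """Return message ids responsive to a request for one account-day."""
    return index.get((sender, day), [])

print(produce("trader1", date(2002, 3, 1)))
```

E-mail headers make such indexing straightforward; as the next section notes, session-oriented IM transcripts are much harder to decompose into indexable units.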

No straightforward method exists to manage multiple data stores created on separate devices and managed through multiple proprietary networks. Basic individual e-mail stores on computers can generally be sorted into folders, clustering related ideas on a topic. Store-and-forward mail servers catalog details on the recipients of e-mail. Instant messaging is session oriented and transitory. Although some large, technology-oriented firms store these sessions for replay, they are not so easily indexed and searched, much less integrated into a company’s metadata repository. Rapid adoption of new communication technologies results in periodic stranding of data and loss of business intelligence.

Table 2. Most infamous e-mail faux pas

- Learning the hard way that “DELETE” doesn’t necessarily get rid of e-mail messages. By 1986, the entire White House was using IBM’s PROFS (“professional office system”) e-mail software. In late November of that year, Oliver North and John Poindexter deleted thousands of e-mail messages as the Iran-Contra scandal broke. The system backups were not destroyed, however, and the Tower Commission was able to obtain this material in its hearings on unauthorized arms-for-hostages deals (Blanton, 1995).
- The Morris worm. The first Internet worm (the Morris worm) was launched November 2, 1988. It exploited a hole in UNIX’s sendmail routine. The spread of the worm was an unintended result of a flaw in Morris’s code (Hafner & Markoff, 1995).

CONCLUSION

Over a quarter century has passed since the adoption of the Internet messaging protocols defining e-mail message transfers, expanding communication options. E-mail and its enhancement, IM, are simple technologies, easily mastered and congruent with human psychology. Organizations can capture these interactions, resulting in massive data stores with a rich mix of data types in attachments. Their importance is evident in efforts to incorporate the knowledge collected in these communication media within ERP and CRM installations.

Legal and ethical constraints may exceed the technical and sociological challenges. To create a framework that can succeed, organizations must include in their business strategies routine assessments of success factors in their adoption and adaptation of proprietary messaging and application software. High failure rates exist for many of these projects. Emerging case law regarding e-mail increases the complexity of making maximal use of message stores. Proliferation of messages and limited manageability are side effects of these policies.

These attributes of metadata creation and analysis favor the largest firms with the most sophisticated staff and data-management systems. The smallest, nimblest firms can adopt any emerging trend in information management, and must do so to create a strategic niche. Once a trend persists, these small firms can be acquired and their new data media incrementally integrated into the larger firm. So it appears that form and function are indeed merging between technology and process.


REFERENCES

Blanton, T. S. (Ed.). (1995). White House e-mail: The top-secret messages the Reagan/Bush White House tried to destroy (book and disk). New York: New Press.

Bose, R., & Sugumaran, V. (2003). Application of knowledge management technology in customer relationship management. Knowledge and Process Management, 10(1), 3-17.

Crocker, D. H., Vittal, J. J., Pogran, K. T., & Henderson, D. A., Jr. (1977). Standard for the formation of ARPA network text messages (RFC 733). Department of Defense, Defense Advanced Research Projects Agency, Washington, DC.

Enneking, N. E. (1998). Managing e-mail: Working toward an effective solution. Records Management Quarterly, 32(3), 24-38.

Hafner, K., & Markoff, J. (1995). Cyberpunk: Outlaws and Hackers on the Computer Frontier (Revised). New York: Simon & Schuster.

Hall, T. J., & Estrella, R. V. (2000). Privilege and the errant email. The Journal of Proprietary Rights, 12(4), 2-7.

Hoffman, P. (2004). Terms used in Internet mail. Internet Mail Consortium. Retrieved March 30, 2004, from http://www.imc.org/terms.html

Hoffman, T. (2004). Banks, brokerages dogged by e-mail regulations. Retrieved June 29, 2004, from http://www.computerworld.com

Hopkins, R. S., & Reynolds, P. R. (2003). Redefining privacy and security in the electronic communication age: A lawyer’s ethical duty in the virtual world of the Internet. The Georgetown Journal of Legal Ethics, 16(4), 675-692.

Juhnke, D. H. (2003). Electronic discovery in 2010. Information Management Journal, 37(6), 35-42.

McCray, A. T., & Gallagher, M. E. (2001). Principles for digital library development. Communications of the ACM, 44(5), 48-54.

Mossberg, W. S. (2004, March 25). New program searches hard disks really well, but has rough edges. Wall Street Journal, p. B1.

Washington State v. Townsend, 105 Wn. App. 622, 20 P.3d 1027, 2001 Wash. App. LEXIS 567 (2001).

Zeng, Y., Chang, R. H. L., & Yen, D. C. (2003). Enterprise integration with advanced information technologies: ERP and data warehousing. Information Management and Computer Security, 11(2/3), 115-122.

KEY TERMS

Attachment: An extra file, in any file format, that is linked to an e-mail message. The e-mail message itself must structurally conform to messaging protocols.

E-Mail Bomb: An attempt to overwhelm a mail server by sending large numbers of e-mails to a particular account, consuming system resources and initiating a denial of legitimate access.

Header: The beginning portion of a message. By design, it should contain the source and target address for the message.

IM: Instant messaging. A method for real-time communication over a wired or wireless network. An evolution from IRC (Internet Relay Chat), an early Internet real-time communication protocol.

IMAP: Internet message access protocol, defined in RFC 2060.

Mail Client: A software process that moves mail from a message store and presents it to a user.

Mail Server: A software process that receives mail from other mail systems and manages the message store.

Message Store: The physical location where messages are held until a mail client retrieves them. The type of file or files varies by software package. Some are monolithic database structures; some may be encrypted. Others are plain text files.

POP, POP3: Post office protocol, defined in RFC 1939. A process that authorizes a transfer of mail messages to a user’s computer, then updates the data store source. This update may optionally include deleting the stored message.

RFCs: Requests for comments. The process and rules by which distributed computing standards are designed. The RFC process and the Internet’s protocols are developed and managed by the Internet Engineering Task Force (IETF; comprehensive information on RFCs can be found online at http://www.ietf.org/rfc.html).

Worm: A self-replicating, self-propagating program that uses basic computer operating system code to transfer instances of itself from computer to computer.


Engineering Information Modeling in Databases

Z. M. Ma

Northeastern University, China

INTRODUCTION

Computer-based information technologies have been used extensively to help industries manage their processes, and information systems have become their nerve center. More specifically, databases are designed to support the data storage, processing, and retrieval activities related to data management in information systems. Database management systems provide efficient task support, and a tremendous gain in productivity is thereby accomplished using these technologies. Database systems are thus the key to implementing industrial data management, which requires database technique support. Industrial applications, however, are typically data- and knowledge-intensive and have some unique characteristics (e.g., large volumes of data with complex structures) that make their management difficult. Product data management supporting various life-cycle aspects in the manufacturing industry, for example, must not only describe complex product structures but also manage data from various life-cycle phases, including design, development, manufacturing, and product support. In addition, some new techniques, such as Web-based design and artificial intelligence, have been introduced into industrial applications. The unique characteristics and usage of these new technologies have created many potential requirements for industrial data management, which challenge today’s database systems and promote their evolution.

BACKGROUND

From a database-technology standpoint, information modeling in databases can be identified at two levels: (conceptual) data modeling and database modeling, which result in conceptual (semantic) data models and logical database models. Generally, a conceptual data model is designed, and then the designed conceptual data model is transformed into a chosen logical database schema. Database systems based on logical database models are used to build information systems for data management. Much attention has been directed at conceptual data modeling of industrial information systems. Product data models, for example, can be viewed as a class of semantic data models (i.e., conceptual data models) that take into account the needs of engineering data. Recently, conceptual data modeling of enterprises has received increased attention.

Generally speaking, traditional ER/EER (Entity-Relationship/Extended Entity-Relationship) or UML models in the database area can be used for industrial data modeling at the conceptual level. But, limited by the power of these data models in industrial data modeling, some new conceptual data models, such as IDEF1X and STEP/EXPRESS, have been developed. In particular, to implement the sharing and exchange of industrial data, the Standard for the Exchange of Product Model Data (STEP) is being developed by the International Organization for Standardization (ISO). EXPRESS, the description method of STEP and a conceptual schema language, can model product design, manufacturing, and production data; the EXPRESS model has thereby become a major conceptual data model for industrial data modeling. Much research has been reported on database implementation of the EXPRESS model in the context of STEP, and some software packages and tools are available on the market.
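EXPRESS has its own schema syntax (ENTITY declarations, attributes, aggregates). As a loose analogy only, the kind of nested product structure such a schema captures can be sketched with Python dataclasses; every name below is invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Union

# Rough analogy to a product-structure schema: an assembly is an
# entity whose attribute is an aggregate of component usages.
@dataclass
class Part:
    part_number: str
    description: str

@dataclass
class Usage:
    component: Union["Assembly", Part]
    quantity: int

@dataclass
class Assembly:
    part_number: str
    components: List[Usage] = field(default_factory=list)

def count_leaves(node):
    """Total leaf parts in an assembly tree, weighted by quantity."""
    if isinstance(node, Part):
        return 1
    return sum(u.quantity * count_leaves(u.component) for u in node.components)

bolt = Part("B-01", "bolt")
bracket = Part("K-07", "bracket")
frame = Assembly("F-99", [Usage(bracket, 2), Usage(bolt, 8)])
print(count_leaves(frame))
```

An actual EXPRESS schema would additionally carry constraints (WHERE rules, inverse attributes) that plain dataclasses do not express; the sketch mirrors only the nesting.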

For industrial data modeling in database systems, the generic logical database models, such as relational, nested relational, and object-oriented databases, have been used. However, these generic logical database models do not always satisfy the requirements of industrial data management. In nontransaction processing, such as CAD/CAM (computer-aided design/computer-aided manufacturing), knowledge-based systems, multimedia, and Internet systems, most data-intensive application systems suffer from the same limitations of relational databases. Some nontraditional database models, based on special, hybrid, or extended database models, have been proposed accordingly.

MAJOR ISSUES AND SOLUTIONS

Conceptual Data Models

Much attention has been directed at conceptual data modeling of engineering information (Mannisto, Peltonen, Soininen, & Sulonen, 2001; McKay, Bloor, & de Pennington, 1996). Product data models, for example, can be viewed as a class of semantic data models (i.e., conceptual data models) that take into account the needs of engineering data (Shaw, Bloor, & de Pennington, 1989). Recently, conceptual information modeling of enterprises such as virtual enterprises has received increasing attention (Zhang & Li, 1999). Generally speaking, traditional ER (P. P. Chen, 1976) and EER can be used for engineering information modeling at the conceptual level. But, limited by their power in engineering modeling, some new conceptual data models have been developed.

IDEF1X is a method for designing relational databases with a syntax designed to support the semantic constructs necessary in developing a conceptual schema. Some researchers have focused on the IDEF1X methodology (a thorough treatment of the IDEF1X method can be found in Wizdom Systems, 1985). The use of the IDEF1X methodology to build a database for multiple applications was addressed in Kusiak, Letsche, and Zakarian (1997).

As mentioned earlier, STEP provides a means to describe a product model throughout its life cycle and to exchange data between different units. STEP consists of four major categories, namely, description methods, implementation methods, conformance testing methodology and framework, and standardized application data models/schemata. EXPRESS (Schenck & Wilson, 1994), the description method of STEP and a conceptual schema language, can model product design, manufacturing, and production data; the EXPRESS model has thereby become a major conceptual data model for engineering information modeling.

In a review of CAD/CAM development for product modeling, Eastman and Fereshetian (1994) studied five information models used in product modeling, namely, ER, NIAM, IDEF1X, EXPRESS, and EDM. Compared with IDEF1X, EXPRESS can model complex semantics in engineering applications, including engineering objects and their relationships. Based on the EXPRESS model, it is easy to implement the sharing and exchange of engineering information.

It should be noted that ER/EER, IDEF1X, and EXPRESS can model neither knowledge nor fuzzy information. The first effort to represent fuzziness was undertaken by Zvieli and Chen (1986), who extended ER models to three levels of fuzziness. The first level refers to the set of semantic objects, resulting in fuzzy entity sets, fuzzy relationship sets, and fuzzy attribute sets. The second level concerns the occurrences of entities and relationships. The third level relates to the fuzziness in attribute values of entities and relationships. Consequently, the ER algebra was fuzzily extended to manipulate fuzzy data. In G. Q. Chen and Kerre (1998), several major notions in the EER model were extended, including fuzzy extension to generalization/specialization and shared subclass/category, as well as fuzzy multiple inheritance, fuzzy selective inheritance, and fuzzy inheritance for derived attributes. More recently, using fuzzy sets and possibility distributions (Zadeh, 1978), fuzzy extensions to IDEF1X (Ma, Zhang, & Ma, 2002) and EXPRESS (Ma, Zhang, & Ma, 2001; Ma, Zhang, Ma, & Chen, 2000) were proposed, respectively.
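The three levels of fuzziness in the Zvieli and Chen extension can be sketched concretely. The following Python fragment is purely illustrative; the entity names, degrees, and the `possibility` helper are our own invention, not part of any published fuzzy ER implementation:

```python
# Level 1: the model level -- each entity set belongs to the schema
# with a membership degree (how relevant "Prototype" is to the model).
model = {"Part": 1.0, "Supplier": 1.0, "Prototype": 0.7}

# Level 2: occurrences -- each instance belongs to its entity set
# with a degree.
part_occurrences = {"bolt-M6": 1.0, "experimental-hinge": 0.6}

# Level 3: attribute values -- a value is a possibility distribution
# rather than a crisp value (e.g., an imprecisely measured diameter).
diameter = {5.9: 0.8, 6.0: 1.0, 6.1: 0.8}

def possibility(dist, value):
    """Possibility degree that the attribute takes the given value."""
    return dist.get(value, 0.0)

print(possibility(diameter, 6.0))  # 1.0
```

A fuzzy ER algebra then manipulates such degrees (e.g., taking minima across joined occurrences) instead of crisp set membership.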

Unified modeling language (UML; Booch, Rumbaugh, & Jacobson, 1998; OMG, 2003), standardized by the Object Management Group (OMG), is a set of OO (object-oriented) modeling notations. UML provides a collection of models to capture the many aspects of a software system. From an information modeling point of view, the most relevant is the class model, whose building blocks are classes and relationships. The class model of UML encompasses the concepts used in ER as well as other OO concepts. It also presents the advantage of being open and extensible, allowing its adaptation to the specific needs of the application, such as workflow modeling of e-commerce (Chang, Chen, Chen, & Chen, 2000) and product structure mapping (Oh, Han, & Suh, 2001). In particular, the class model of UML has been extended for the representation of class constraints and the introduction of stereotype associations (Mili et al., 2001).

With the popularity of Web-based design, manufacturing, and business activities, requirements have arisen for the exchange and sharing of engineering information over the Web. Because of the limitations of HTML in content-based information processing, the World Wide Web Consortium created the eXtensible Markup Language (XML), a language similar in format to HTML but more extensible. This new language lets information publishers invent their own tags for particular applications or work with other organizations to define shared sets of tags that promote interoperability and that clearly separate content and presentation. XML thus provides a Web-friendly and well-understood syntax for the exchange of data. Because XML affects the way data is defined and shared on the Web (Seligman & Rosenthal, 2001), XML technology has been increasingly studied, and more and more Web tools and Web servers support XML. Bourret (2001) developed the product data markup language, an XML for product data exchange and integration. As to XML modeling at the conceptual level, Conrad, Scheffner, and Freytag (2000) used UML for designing XML DTDs. Xiao, Dillon, Chang, and Feng (2001) developed an object-oriented conceptual model to design XML schemas. Lee, Lee, Ling, Dobbie, and Kalinichenko (2001) used the ER model for the conceptual design of semistructured databases. Note, however, that XML supports neither imprecise and uncertain information modeling nor knowledge modeling. Introducing imprecision and uncertainty into XML has received increased attention (Abiteboul, Segoufin, & Vianu, 2001; Damiani, Oliboni, & Tanca, 2001).
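The idea of application-specific tags for product data exchange can be illustrated with a small, hypothetical document. The tag names (`part`, `name`, `material`, `mass`) are invented here for exposition; Python's standard `xml.etree.ElementTree` is used only as a convenient parser:

```python
import xml.etree.ElementTree as ET

# A minimal, made-up product-data document: the publisher has invented
# tags for its own application, cleanly separating content from
# presentation.
doc = """<part id="p-101">
  <name>bracket</name>
  <material>aluminium</material>
  <mass unit="kg">0.25</mass>
</part>"""

root = ET.fromstring(doc)
print(root.get("id"))                 # p-101
print(root.findtext("material"))      # aluminium
print(root.find("mass").get("unit"))  # kg
```

Two organizations that agree on such a tag set can exchange engineering data without sharing any application code.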

LOGICAL DATABASE MODELS

Classical Logical Database Models

For engineering information modeling in database systems, the generic logical database models (e.g., relational databases, nested relational databases, object-oriented databases) can be used. Also, some hybrid logical database models, such as object-relational databases, are very useful for this purpose. On top of a relational DBMS, Arnalte and Scala (1997) built an EXPRESS-oriented information system for supporting information integration in a computer-integrated manufacturing environment. In this case, the conceptual model of the information was built in EXPRESS and then parsed and translated to the corresponding relational constructs. Relational databases for STEP/EXPRESS were also discussed in Krebs and Lührsen (1995). In addition, Barsalou and Wiederhold (1990) developed an object-oriented layer to model complex entities on top of a relational database. This domain-independent architecture permits object-oriented access to information stored in relational format, information that can be shared among applications.
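The parse-and-translate step described above can be sketched in miniature. This is not Arnalte and Scala's actual translator: the dict standing in for a parsed EXPRESS `ENTITY`, the attribute types, and the generated DDL are all our own simplifications, with SQLite as a stand-in relational DBMS:

```python
import sqlite3

# A parsed conceptual entity (imagine this came from an EXPRESS
# schema); names and types are invented for illustration.
entity = {
    "name": "part",
    "attributes": [("part_id", "INTEGER"), ("description", "TEXT"),
                   ("mass", "REAL")],
}

# Translate the entity to the corresponding relational construct.
cols = ", ".join(f"{n} {t}" for n, t in entity["attributes"])
ddl = f"CREATE TABLE {entity['name']} ({cols})"

conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute("INSERT INTO part VALUES (1, 'bracket', 0.25)")
row = conn.execute("SELECT description FROM part WHERE part_id = 1").fetchone()
print(row[0])  # bracket
```

A real translator must additionally map EXPRESS aggregates, inheritance, and constraints, which is where the relational model becomes awkward.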

Object-oriented databases provide an approach for expressing and manipulating complex objects. A prototype object-oriented database system called ORION was thus designed and implemented to support CAD (Kim, Banerjee, Chou, & Garza, 1990). Goh, Hui, Song, and Wang (1994, 1997) studied object-oriented databases for STEP/EXPRESS. In addition, Dong, Chee, He, and Goh (1997) designed an object-oriented active database for STEP/EXPRESS models. According to the characteristics of engineering design, Samaras, Spooner, and Hardwick (1994) provided a framework for the classification of queries in object-oriented engineering databases, in which the strategy for query evaluation differs from that in traditional relational databases.

XML Databases

It is crucial for Web-based applications to model, store, manipulate, and manage XML documents. XML documents can be classified into data-centric documents and document-centric documents (Bourret, 2001). Data-centric documents have a fairly regular structure, fine-grained data (i.e., the smallest independent unit of data is at the level of a PCDATA-only element or an attribute), and little or no mixed content. The order in which sibling elements and PCDATA occur is generally not significant, except when validating the document. Data-centric documents use XML as a data transport. They are designed for machine consumption, and the fact that XML is used at all is usually incidental; that is, it is not important to the application or the database that the data is, for some length of time, stored in an XML document.

As a general rule, the data in data-centric documents is stored in a traditional database, such as a relational, object-oriented, or hierarchical database. The data can also be transferred from a database to an XML document. For the transfers between XML documents and databases, the mapping relationships between their architectures and their data should be created (Lee & Chu, 2000; Surjanto, Ritter, & Loeser, 2000). Note that it is possible to discard information such as the document and its physical structure when transferring data between them. It must be pointed out, however, that data in data-centric documents, such as semistructured data, can also be stored in a native XML database, in which a document-centric document is usually stored. Document-centric documents are characterized by less regular or irregular structure, larger grained data (i.e., the smallest independent unit of data might be at the level of an element with mixed content or the entire document itself), and a large amount of mixed content. The order in which sibling elements and PCDATA occur is almost always significant. Document-centric documents are usually designed for human consumption. As a general rule, document-centric documents are stored in a native XML database or a content management system (i.e., an application designed to manage documents and built on top of a native XML database). Native XML databases are databases designed especially for storing XML documents. The only difference between native XML databases and other databases is that their internal model is based on XML and not something else, such as the relational model.
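The transfer of a data-centric document into a traditional database can be sketched as a simple "shredding" step. The document, its tags, and the target table below are invented for illustration, with SQLite standing in for the relational database; a real mapping layer would also handle the reverse transfer and schema mapping:

```python
import sqlite3
import xml.etree.ElementTree as ET

# A data-centric document: regular structure, fine-grained values,
# no mixed content, order of siblings irrelevant.
doc = """<orders>
  <order><id>1</id><item>bolt</item><qty>100</qty></order>
  <order><id>2</id><item>nut</item><qty>250</qty></order>
</orders>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER)")

# Shred each element into a relational tuple; the document's physical
# structure (whitespace, element order) is deliberately discarded.
for o in ET.fromstring(doc).findall("order"):
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                 (int(o.findtext("id")), o.findtext("item"),
                  int(o.findtext("qty"))))

print(conn.execute("SELECT qty FROM orders WHERE item = 'nut'").fetchone()[0])
# 250
```

Note how this matches the text above: information such as the document itself and its physical structure is discarded in the transfer, which is acceptable for data-centric documents but not for document-centric ones.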

In practice, however, the distinction between datacentric and document-centric documents is not always clear. So the aforementioned rules are not absolute. Data, especially semistructured data, can be stored in native XML databases, and documents can be stored in traditional databases when few XML-specific features are needed. Furthermore, the boundaries between traditional databases and native XML databases are beginning to blur, as traditional databases add native XML capabilities and native XML databases support the storage of document fragments in external databases.


Special, Hybrid, and Extended Logical Database Models

It should be pointed out that the generic logical database models, such as relational databases, nested relational databases, and object-oriented databases, do not always satisfy the requirements of engineering modeling. As pointed out in Liu (1999), relational databases do not naturally describe the complex structural relationships of data, and separate relations may result in data inconsistencies when updating the data. In addition, the problem of inconsistent data still exists in nested relational databases, and the mechanism for sharing and reusing CAD objects is not fully effective in object-oriented databases. In particular, these database models cannot handle engineering knowledge. Special databases, typically built on the relational or object-oriented models, have hereby been introduced. Dong and Goh (1998) developed an object-oriented active database to support intelligent activities in engineering applications. Deductive databases were considered preferable for CAD databases, and deductive object-relational databases for CAD were introduced (Liu, 1999). Constraint databases based on the generic logical database models can represent large or even infinite sets in a compact way and are hereby suitable for modeling spatial and temporal data (Belussi, Bertino, & Catania, 1998; Kuper, Libkin, & Paredaens, 2000). Also, it is well established that engineering design is a constraint-based activity (Dzbor, 1999; Guiffrida & Nagi, 1998; Young, Giachetti, & Ress, 1996). So constraint databases are promising as a technology for modeling engineering information characterized by large data volumes, complex relationships (i.e., structural, spatial, or temporal semantics), and intensive knowledge. Posselt and Hillebrand (2002) investigated constraint database support for evolving data in product design.
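The core idea of constraint databases, representing an infinite set compactly, can be shown in a few lines. The "generalized tuple" below is our own toy representation (a conjunction of predicates), not the finite-representation machinery of an actual constraint database system:

```python
# A constraint database stores an infinite point set as a finite
# constraint rather than enumerating tuples. Here, the triangular
# region x + y <= 10, x >= 0, y >= 0 -- infinitely many points,
# finitely described as a conjunction of linear constraints.
region = [lambda x, y: x + y <= 10,
          lambda x, y: x >= 0,
          lambda x, y: y >= 0]

def contains(constraints, x, y):
    """Membership test: evaluate the conjunction of constraints."""
    return all(c(x, y) for c in constraints)

print(contains(region, 3, 4))   # True
print(contains(region, 8, 5))   # False
```

Real constraint database systems go further: they evaluate relational queries (selection, join, projection) symbolically over such constraint representations, which is what makes them attractive for spatial and temporal engineering data.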

It should be noticed that fuzzy databases have been proposed to capture fuzzy information in engineering (Sebastian & Antonsson, 1996; Zimmermann, 1999). Fuzzy databases may be based on the generic logical database models, such as relational databases (Buckles & Petry, 1982; Prade & Testemale, 1984), nested relational databases (Yazici, Soysal, Buckles, & Petry, 1999), and object-oriented databases (Bordogna, Pasi, & Lucarella, 1999; George, Srikanth, Petry, & Buckles, 1996; Van Gyseghem & de Caluwe, 1998). Also, some special databases have been extended for fuzzy information handling. Medina, Pons, Cubero, and Vila (1997) presented an architecture for deductive fuzzy relational databases, and Bostan and Yazici (1998) proposed a fuzzy deductive object-oriented data model. More recently, Saygin and Ulusoy (2001) investigated how to construct fuzzy event sets automatically and apply them to active databases.

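A fuzzy relational table in the spirit of the models cited above can be sketched as follows. The membership function, the relation, and the min-combination rule are illustrative choices on our part (min is one common conjunction operator, but the cited models differ in detail):

```python
def tall_degree(height_cm):
    """A simple trapezoidal membership function for the fuzzy term
    'tall': 0 below 170 cm, 1 above 190 cm, linear in between."""
    if height_cm >= 190:
        return 1.0
    if height_cm <= 170:
        return 0.0
    return (height_cm - 170) / 20

# Each tuple carries its own membership degree mu in the relation.
relation = [("ann", 185, 1.0), ("bob", 172, 0.9)]  # (name, height, mu)

# Fuzzy selection "WHERE height IS tall": combine tuple membership
# with the degree to which the tuple satisfies the fuzzy predicate.
result = {name: min(mu, tall_degree(h)) for name, h, mu in relation}
print(result["ann"])  # 0.75
```

The answer to a fuzzy query is thus itself a fuzzy set of tuples, graded by how well each tuple satisfies the query.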

FUTURE TRENDS

Database modeling for engineering serves engineering applications, and the evolving data-management needs of those applications challenge database technologies and drive their development. Database modeling must provide effective support for current Web-based, distributed, knowledge-based, and intelligent engineering data management.

CONCLUSION

Computer, network, and information technologies have become the foundation of today's industrial enterprises. By using these technologies, enterprises deliver an increasing variety of products with lower prices, higher quality, and shorter lead times. On the other hand, because manufacturing engineering is a data- and knowledge-intensive application area, it imposes new requirements that challenge current technologies and promote their evolution. Database systems, as the repositories of data, are the cores of information systems and provide the facilities for data modeling and data manipulation.

This article has reviewed database technologies for engineering information modeling. Its contribution is to identify directions for database research from the viewpoint of engineering applications and to survey modeling tools for engineering design, manufacturing, and production management. More powerful database models will likely be developed to meet the needs of engineering information modeling.

REFERENCES

Abiteboul, S., Segoufin, L., & Vianu, V. (2001). Representing and querying XML with incomplete information. Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 150-161).

Arnalte, S., & Scala, R. M. (1997). An information system for computer-integrated manufacturing systems. Robotics and Computer-Integrated Manufacturing, 13(3), 217-228.

Barsalou, T., & Wiederhold, G. (1990). Complex objects for relational databases. Computer-Aided Design, 22(8), 458-468.
