
Rivero L.Encyclopedia of database technologies and applications.2006
.pdf
720
Using Semantic Web Tools for Ontologies Construction
Gian Piero Zarri
University of Paris IV/Sorbonne, France
INTRODUCTION: ONTOLOGIES AND SEMANTIC WEB
The current state of Web technology—the “first generation” or “syntactic” Web—gives rise to well-known serious problems when trying to accomplish, in a nontrivial way, essential tasks like indexing, searching, extracting, maintaining, and generating information. These tasks would, in fact, require some sort of “deep understanding” of the information dealt with. In a “syntactic” Web context, on the contrary, computers are only used as tools for posting and rendering information by brute force. Faced with this situation, Tim BernersLee first proposed a sort of “Semantic Web,” where the access to information is based mainly on the processing of the semantic properties of this information: “The Semantic Web is an extension of the current Web in which information is given well-defined meaning [italics added], better enabling computers and people to work in cooperation” (Berners-Lee, Hendler, & Lassila, 2001, p. 35). The Semantic Web’s challenge consists then in being able to access and retrieve information on the Web by “understanding” its proper semantic content (its meaning) and not simply by matching some keywords.
From a technical point of view, the Semantic Web vision is deeply rooted into an “ontological” approach, with some proper characteristics that differentiate it from the “classical” approach to the construction of ontologies that has been described in companion article of this encyclopedia title “Ontologies and Their Practical Implementation.” We will describe these characteristics in the following sections.
BACKGROUND INFORMATION
Berners-Lee’s Architectural Proposal for the Semantic Web
To support his vision, Berners-Lee has proposed in several talks an “architecture” for the Semantic Web like that reproduced in Figure 1. Making abstraction now from all the discussions and criticisms that this
Figure 1. Semantic Web architecture according to Tim Berners-Lee
proposal has brought up, what is relevant for the topic of this article is the central position that “ontologies” occupy in the architecture—and that nobody wishes to challenge. A first important difference with respect to what is expounded in the article “Ontologies and Their Practical Implementation” is, however, that ontologies are no more considered “in isolation”: They are now supported by lower-level tools like XML and RDF and must also implement an additional logic level.
In the embedded architecture of Figure 1, Unicode and URI make up the basis of the hierarchy. The Unicode standard provides a unique numerical code for every character that can be found in documents produced according to any possible language, no matter what are the hardware and software used to deal with such documents. It is supported in many operating systems and all the modern browsers, and it enables a single software product or a single Web site to be targeted across multiple platforms, languages, and countries without reengineering. URI (Uniform Resource Identifier) represents a generalization of the well-known URL (Uniform Resource Locator), which is used to identify a “Web resource” (e.g., a particular page) by denoting its primary access mechanism (essentially, its “location” on the network). URI has been created to allow recording information about all those “notions” that, unlike Web pages, do not have network locations or URLs but that need to be referred to in an RDF statement. These notions include network-accessible things, such as an electronic document or an image, and things that are not
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
TEAM LinG

Using Semantic Web Tools for Ontologies Construction
network-accessible, such as human beings, corporations, and bound books in a library, or abstract concepts like the concept of a “creator.”
XML (Extensible Markup Language; see Bray, Paoli, Sperberg-McQueen, Maler, & Yergeau, 2004), has been created to overcome some difficulties proper to HTML (Hypertext Markup Language), developed in 1989 by Tim Berners-Lee as a means for sharing information from any location. An HTML file is a text file characterized by the presence of a small set of “tags”—like <Head>, <Body>, <Input>, <Applet>, <Font>, etc.— that instruct the Web browsers how to display a given Web page. HTML is, then, a “presentation-oriented” markup tool. In spite of its evident utility, HTML suffers from a number of limitations, from its lack of efficiency in handling the complex client/server communication of today’s applications to (mainly) the impossibility of defining new tags to customize exactly the user’s needs. XML is called “extensible” because, at the difference of HTML, it is not characterized by a fixed format but lets the user design his own customized markup languages (a specific DTD, Document Type Description, see below) for limitless different types of documents; XML is a “content-oriented” markup tool. Basically, the syntactic structure of XML is very simple. Its markup elements are normally identified by an opening and a closing tag, like <employees> and </employees> and may contain other elements or text. The elements must be properly nested, and every XML document must have exactly one root element. Markup elements can be characterized by adding attribute/value pairs inside the opening tag of the element, like <person name=”Mary”>. Taking into account the nesting constraint, a very simple fragment of XML document could then be represented as: <employees> <person name=”Mary”> <id>99276</id> </person> </employees>. To allow a computer to interpret correctly a fragment like this, it is necessary, however, to specify the semantics of the markup elements and tags used to make up it; a simple way of doing this is to make use of a DTD. A DTD is a formal description in XML declaration syntax of a particular type of document. It begins with a <!DOCTYPE keyword and sets out what names are to be used for the different types of markup elements, where they may occur, the elements’ possible attributes, and how they all fit together. For example, a DTD may specify that every person markup element must have a name attribute, and that it can have an offspring element called id whose content must be text. Before reading an XML document, the validating parsers and the application programs (editors, search engines, navigators, databases) read the corresponding DTD so that they can identify where every element type ought to come and how each relates to the other. There are many
sorts of DTDs ready to be used in all kinds of areas (see,
e.g., http://www.w3.org/QA/2002/04/valid-dtd- 7 list.html#full) that can be downloaded and used freely.
Some of them are MathML, for mathematical expressions; SMIL, Sync Multimedia Integration Language; CML, Chemical Markup Language; OSD, Open Software Description; EDI, Electronic Data Interchange; PICS, Platform for Internet Content Selection; etc. A more complete way of specifying the semantics of a set of XML markup elements is to make use of XML Schema (as mentioned in Figure 1). XML Schema (see Biron & Malhotra, 2001; Thompson, Beech, Maloney, & Mendelsohn, 2001) supplies a more complete grammar for specifying the structure of the elements allowing one, e.g., to define the cardinality of the offspring elements, default values, etc.
RDF, Resource Description Framework
Moving up in the structure of Figure 1, we find now RDF (Resource Description Framework), an example of “metadata” language (metadata = data about data) used to describe generic “things” (“resources,” according to the RDF jargon) on the Web. An RDF document is, basically, a list of statements under the form of triples having the classical format: <object, property, value>, where the elements of the triples can be URIs (Universal Resource Identifiers, see above), literals (mainly, free text), and variables. To follow a well-known RDF ex- ample—reproduced, e.g., in the 2004 edition of the “official” W3C RDF Primer (Manola & Miller)—let us suppose we want to represent a situation where someone named John Smith has created a particular Web page. We will then make use of the RDF triple: <http:// www.example.org/index.html(object), creator(property), john_smith (value)>. Adding additional information about the situation, by stating, e.g., that the Web page was created May 15, 2004, and that the language in which the page is written is English, amounts to adding two statements:<http://www.example.org/index.html(object), creation_date (property), May 15, 2004 (value)> and <http://www.example.org/index.html(object), language
(property), English (value)>. Note that, unfortunately, RDF uses a particular terminology for denoting the three elements of the triples, calling then subject, predicate, and object, respectively, the object, property, and value elements of the triples. This decision is unfortunate because it introduces an undue confusion with well-defined and totally different linguistic categories.
RDF triples are very easily represented as directed labeled graphs, by denoting resources as ovals, properties (predicates) as arrows, and literal values like May 15, 2004 or English within boxes. Figure 2a represents then in graph form the original statement: “John Smith has
721
TEAM LinG

Using Semantic Web Tools for Ontologies Construction
Figure 2. RDF statements represented into graph format
created a Web page.” The addition of information about date and language gives rise to the graph of Figure 2b, given that groups of statements are represented by corresponding groups of nodes and arcs. Note that, to simulate the actual conditions of utilization of RDF, the properties creator, creation_date, and language in Figure 2 have been replaced, respectively, by http://pur1.org/dc/elements/1.1/creator, http://www.example.org/terms/cre- ation-date, and http://pur1.org/dc/elements/1.1/language; analogously, john_smith has been replaced by http:// www.example.org/staffid/85740. All these “http://…” terms are URIs that identify in an unambiguous way specific RDF entities; more exactly, they refer to the ontologies/metadata repositories/lists of reserved domain names where these entities are defined. For example, “http:// pur1.org/dc/…” refers to the collection of metadata terms maintained by the Dublin Core Metadata Initiative (see, e.g., Dekkers & Weibel, 2003). In this collection, e.g., http:/ /pur1.org/dc/elements/1.1/creator is defined as: “An entity primarily responsible for making the content of the
resource.” The literal en (Unicode characters) is an international standard two-letter code for English, see http:/
/purl.org/dc/elements/1.1/language; the example.org
Internet domain name is reserved for documentation purposes.
From what has been expounded until now, RDF seems to be nothing more than a downgraded form, Internetoriented, of semantic networks as they were used in the artificial intelligence domain at the beginning of the 1970s. Its significance in a Semantic Web context becomes more evident when we examine the way of writing RDF statements into XML format—the so-called RDF/ XML syntax (see Beckett, 2004)—i.e., when RDF is seen as a sort of additional DTD of XML. Table 1 reproduces then the simple example of Figure 2b, making use of the RDF/XML syntax.
The first line of the code, <?xml version=”1.0"?> is the “XML declaration,” which states that what follows consists of XML and which specifies the version used. In the second line we find an XML markup element that
Table 1. The RDF/XML syntax
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:exterms="http://www.example.org/terms/">
<rdf:Description rdf:about="http://www.example.org/index.html"> <exterms:creation-date>May 15, 2004</exterms:creation-date> <dc:language>en</dc:language>
<dc:creator rdf:resource="http://www.example.org/staffid/85740"/> </rdf:Description>
</rdf:RDF>
722
TEAM LinG

Using Semantic Web Tools for Ontologies Construction
starts with the tag <rdf:RDF—this tag specifies that all the following XML code, until the </rdf:RDF> tag of the last line, is intended to represent RDF statements—and ends with the “>” symbol at the right limit of line 4. Within this markup element we find three “XML attributes,” see above, of the opening <rdf:RDF tag; all these attributes (xmlns attributes) have as values the declarations of the namespaces to be used within the RDF/XML code. An attribute like xmlns:rdf means that, according to “value” associated with this attribute (after the “=” symbol), all the terms/tags included in this RDF/XML content and prefixed with rdf: are part of the namespace identified with the URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#; analogously for the xmlns:dc (Dublin Core terms) and xmlns:exterms (example terms) attributes.
After these preliminary “housekeeping” declarations, lines 5 through 9 represent the core of the RDF/XML coding of the example. The rdf:Description start tag of line 5 indicates that we are now introducing the “description” of a resource; this resource, http:// www.example.org/index.html, is identified as the value of the rdf:about attribute of the start tag. The three following lines, 6 through 8, are examples of use of “property element” constructions. In these lines, the tags are built up according to the XML Qname (Qname = qualified name) convention, which allows shortening the writing of full RDF triples by introducing abbreviations for the URI references. A Qname tag contains, in fact, a “prefix” that denotes a given namespace (e.g., exterms in line 6) followed, after a colon, by a “local name” (creation-date, i.e., the name of the property). A full URI reference is then created by appending the local name to the URI of the namespace identified by the first part of the Qname. For lines 6 through 8, the full URIs become then http://www.example.org/terms/creation-date, http:// pur1.org/dc/elements/1.1/ Language, and http:// pur1.org/dc/elements/1.1/creator. Note that, for the properties corresponding to literals (lines 6 and 7), the values of the properties are directly included within opening and closing Qname tags. For the property of line 8, which corresponds to a resource, the value corresponds to the value of the rdf:resource attribute of the dc:creator Qname tag. The description of the resource introduced in line 5 ends with the closing tag of line 9.
To conclude about RDF, we will note that RDF Schema (or RDFS; see Brickley & Guha, 2004) provides a mechanism for constructing specialized RDF vocabularies through the description of domain-specific properties. This is obtained mainly by describing the properties in terms of the classes of resources to which they apply. For example, we could define the creator property, saying that it has the resource document as “domain” (document is the value or “object” of this property) and the resource person as “range” (this property must always be associ-
ated with a resource person, its “subject”). Other basic modeling primitives of RDFS allow setting up hierarchies 7 (taxonomies), both hierarchies of concepts, thanks to the
use of “class” and “subclass-of” statements, and hierarchies of properties, thanks to the use of “property” and “subproperty-of” statements. Instances of a specific class (concept) can be declared, making use of the “type” statement. Note eventually that, notwithstanding the introduction of these primitives, RDFS is still quite simple compared to “true” knowledge representation languages.
OWL, The Web Ontology Language
Passing to the next stage of the structure of Figure 1, we can make three general remarks about the notion of “ontology” in a Semantic Web context:
•The basic “nature” of ontologies as described in the article “Ontologies and Their Practical Implmentation” does not change fundamentally in this new context: They are still formed by hierarchies (DAGs) of concepts defined through properties and values.
•For their practical implementation, however, these Semantic Web ontologies make large use of the RDF/XML syntactic/semantic constructs.
•Taking also into account that the level that follows “ontology vocabulary” in the pyramid of Figure 1 is “logic,” the Semantic Web ontologies evidence a very strong logic influence.
On February 10, 2004, the W3C published an official recommendation concerning OWL, the Web Ontology Language (see Bechhofer et al., 2004). W3C—the World Wide Web Consortium, coordinated by MIT (USA), ERCIM, the European Research Consortium for Informatics and Mathematics, and the Keio University (Japan)—includes all the main bodies on Earth interested in the developments of Internet and the Web. At the beginning of this document it is stated that: “The Web Ontology Language (OWL) is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (the Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language. … An OWL ontology is an RDF graph, which is in turn a set of RDF triples.” The mention of DAML+OIL (McGuinness, Fikes, Hendler, & Stein, 2002) explains the strong logic orientation of OWL— given that OIL (Ontology Inference Layer), the “European” component of DAML+OIL (DAML is the Darpa Agent Markup Language), was implemented in description Logics (DL) terms—DL (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2002) have
723
TEAM LinG
Using Semantic Web Tools for Ontologies Construction
been created to offer a formal foundation for frame-based systems.
OWL consists of three subsets (three specific sublanguages) characterized by an increasing level of complexity and expressiveness: OWL Lite, OWL DL (DL stands for description logics), and OWL Full.
OWL Lite includes only a reduced subset of the OWL language constructors and has a lower formal complexity than the other OWL versions. It is meant mainly to allow (1) the implementation of simple classification hierarchies and (2) the familiarization with the OWL approach. It employs all the features already introduced by RDFS, making use of the same tags—like rdfs:subclassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range—with the same semantics. Note that rdfs:subclassOf is the fundamental constructor that is used to set up taxonomies/ontologies in OWL. It relates, in fact, a more specific class (concept) to a more general one: If X is a subclass of Y, then every instance of X is also an instance of Y. The relation rdfs:subclassOf is transitive: If X is a subclass of Y and Y is a subclass of Z, then X is a subclass of Z. With respect to RDFS, OWL Lite includes several new features:
•Constructors for equality and inequality, i.e., owl:equivalentClass, owl:equivalentProperty, owl:sameAs (two individuals may be stated to be the same), owl:differentFrom, owl: AllDif ferent.
•Constructors used to provide specific information about properties and their values, like owl:inverseOf—e.g., stating that the property hasChild is the inverse of the property hasParent, and stating that Mary is endowed with the property (hasParent Lucy) allows then an OWL reasoner to deduce that Lucy is endowed with the property (hasChild Mary)—owl:TransitiveProperty, and owl:SymmetricProperty.
•Constructors used to impose constraints on the way properties can be used by the instances of a class (concept). They are owl:allValuesFrom and owl:someValuesFrom. For example, owl:allValuesFrom introduces a range restriction, imposing, e.g., that the property hasDaughter of the class Person is restricted to obtaining all its values (allValuesFrom) from the class Woman. This allows a reasoner to deduce that, if an individual Lucy is related by the property hasDaughter with the individual Mary, Mary must be an instance of the class Woman.
•Constructors, e.g., owl:minCardinality and owl:maxCardinality, used to introduce a limited form of cardinality restrictions, stated on the properties of a particular class and to be intended as
constraints on the cardinality of that property when used in the instances of that class. Note that, for algorithmic efficiency reasons, OWL Lite allows using only the integers 0 and 1 to express the cardinality constraints; this restriction is removed in OWL DL.
•OWL Lite includes a (restricted form) of intersection constructor, owl:intersectionOf, allowing one, e.g., to state that the class EmployedPerson is the intersectionOf the classes Person and EmployedThings.
OWL DL makes use of the full set of the OWL constructors, but it introduces also some constraints on their use to give rise to systems that are “complete” (all the possible deductions are computable) and “decidable” (all the computations will be executed in finite time). Mainly, OWL DL implements what is called “type separation,” which means that a class (concept) cannot be also an individual or a property, and that a property cannot be also an individual or a class. This restriction is removed in OWL Full. OWL DL adds to the OWL Lite list of constructors some new constructors like owl:oneOf (classes may be described by enumeration of the individuals that make up the class), owl:hasValue (a property is required to have a given individual as value), owl:disjointWith (classes can be described as disjoint from each other, see the classes Man and Woman), owl:unionOf, owl:complementOf, owl: intersactionOf
(Boolean combinations of classes), etc.
OWL Full is similar to OWL DL, but for this sublanguage all the constraints have been suppressed— e.g., a class (concept) can be simultaneously treated as a collection of instances (individuals) and as an individual in itself. This can lead to the implementation of systems that are, at least partly, “incomplete” and/or “undecidable.” Currently, no complete implementation of OWL Full exists.
To give now an at least a partial picture of the representation of an ontology in OWL format, we reproduce in Table 2 a small fragment of the OWL version of the “wine” ontology, an ontology often used for exemplification’s purposes in the Semantic Web milieus (see McGuinness et al., 2002; Smith, Welty, & McGuinness, 2004). The code in this table can be considered indifferently as OWL Lite, OWL DL, or OWL Full; in Table 3, we reproduce, on the contrary, another fragment of the wine ontology that makes use of constructors proper to the DL version of the OWL language. Note that, for simplicity’s sake, we have not reproduced in Tables 2 and 3 the “housekeeping” declarations (see Table 1) that are necessary to identify all the XML namespaces associated with the wine ontology: E.g., xmlns:owl=“http:// www.w3.org/2002/07/owl#” is the conventional OWL
724
TEAM LinG

Using Semantic Web Tools for Ontologies Construction
declaration that is used to introduce the OWL vocabulary; xmlns:rdf=“http://www.w3.org/1999/02/22-rdf-syn- tax-ns#” identifies the elements prefixed as rdf: as referring to the RDF namespace, etc.
In the first line of Table 2, the class Wine is introduced, making use of an rdf:ID attribute. At the difference of the rdf:about attribute used in Table 1, rdf:ID introduces as its value only a “fragment identifier” (here wine) that represents an abbreviation of the complete reference to the URI of the resource being described. The full URI reference is formed by taking the base URI of the wine ontology, e.g., http://www.w3.org/TR/2004/REC-owl-guide-20040210/ wine, and appending the character # (to indicate that what follows is a fragment identifier) and then wine to it, giving then the absolute URI reference: http://www.w3.org/TR/ 2004/REC-owl-guide-20040210/wine#wine. Note that the Wine class can now be referred to by using #wine; e.g., rdf:resource=“#wine” is a well-formed OWL statement. As already stated, the fundamental taxonomic constructor is rdfs:subClassOf; the second line of the code of Table 2 allows then the insertion of the class Wine into the global ontology by asserting that it is a specialization of the class (concept) PotableLiquid (liquid suitable for drinking), which can be defined, in turn, as a specialization of the class ConsumableThing.
The third line of the code warns that the class Wine is also a specialization of a second class: This last is an
“anonymous” class, whose definition is included within
the opening owl:Restriction markup element in line 4 and 7 ends with the closing /owl:Restriction markup element in
line 7. In OWL, in fact, a property restriction on a class is a special kind of class description, that of the anonymous class including all the individuals that satisfy the given restriction. In line 5, the owl:onProperty constructor introduces then the name of the property, madeFromGrape, to associate with the class Wine; line 6 specifies that the cardinality of this property is 1. The insertion of this restriction in the definition of the class Wine states, globally, that every specific wine must also be characterized by at least one madeFromGrape relation. Note that (1) the xsd:nonNegativeInteger data type used to introduce the literal 1 in the owl:minCardinality restriction of line 6 is part of the built-in XML Schema data types (see Biron & Malhotra, 2001) and their use is strongly recommended in an OWL context; and (2) the value 1 conforms to the OWL Lite restrictions.
The code fragment of Table 3 defines the class RedWine as the precise intersection (logical conjunction “and”) of the class Wine and the set of things that are red in color (anonymous class). The presence of the attribute rdf:parseType=”Collection” is mandatory for this type of construction. Note the use of the DL constructor owl:hasValue to impose the value “Red” on the property hasColor of the anonymous class.
Table 2. A fragment of the OWL wine ontology
<owl:Class rdf:ID="Wine">
<rdfs:subClassOf rdf:resource="#PotableLiquid"/> <rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="#madeFromGrape"/>
<owl:minCardinality rdf:datatype="xsd:nonNegativeInteger">1</owl:minCardinality>
</owl:Restriction>
</rdfs:subClassOf>
...
</owl:Class>
Table 3. Use of OWL DL constructors in the context of the wine ontology
<owl:Class rdf:about="#RedWine"> <owl:intersectionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Wine" /> <owl:Restriction>
<owl:onProperty rdf:resource="#hasColor" /> <owl:hasValue rdf:resource="#Red" />
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
725
TEAM LinG
Using Semantic Web Tools for Ontologies Construction
CONCLUSION
In spite of heavy W3C support, the Semantic Web vision outlined in this article has not fully reached the status of an “inescapable” standard.
Berners-Lee’s architecture has been very criticized, in particular because it ignores some fundamental components of computer science today, from database technology (the whole world economy runs on SQL) to UML (Unified Modeling Language; UML is the standard modeling language in software engineering and, at the difference of RDF, OWL, etc., has received wide attention not only in academia but also in the professional milieus). Note, however, that some researchers are actually investigating the possibility of defining a mapping between UML and OWL-like languages. UML has, in fact, a type hierarchy comparable with OWL and a class diagrams facility that can be compared to a frame-based language; see, in this context, the comparison between UML and DAML in Baclawski et al. (2001). A general discussion about the proposals for defining transformations between UML and the Semantic Web ontology languages can be found in Falkovych, Sabou, and Stuckenschmidt (2003).
The choice of OWL as a paradigmatic language to be used for ontological work in a Web context has also raised some criticisms, and several knowledge representation specialists have challenged as hastily the endorsement of OWL by the W3C. Criticisms range from the use of a particularly cumbersome syntax, inherited from RDF/XML, to the availability of an expressive power that, from a strict knowledge representation point of view, does not seem to improve so much with respect to “traditional” frame systems like Protégé; see again the article “Ontologies and Their Practical Implementation.” We can however remark in this context that an “OWL plug-in” for Protégé has been recently implemented; see Horridge (2004) and http://protege.stanford.edu/plugins/owl/. It allows loading and saving OWL and RDF ontologies, editing and visualizing OWL classes and their properties, and, mainly, supporting reasoners such as the description logics classifiers.
Note that, according to OWL’s supporters, it is precisely these last characteristics that “make all the difference” between a simple frame system, which utilizes pragmatically based inference procedures, and an OWLbased reasoning tool—see, e.g., RACER (Haarslev & Möller, 2003)—that employs sound and complete inferencing algorithms supported by the description logics theory. Unfortunately, description logics have been, in turn, criticized in spite (or because) of their (too) rigorous formal framework, associated, inter alia, with a reduced expressiveness of their main reasoning component, the automatic classification mechanism.
To give only an example, nearly a printed page is needed in McGuinness et al. (2002) to demonstrate that, using the DAML+OIL definitions (DAML+OIL is the ancestor of OWL), we can infer that “Red” can be considered as a sort of “WineColor.” A plea for the use, in a Semantic Web context, of knowledge representation languages more “meaningful” than those based on a description logics approach can be found in Zarri (2002).
REFERENCES
Baader, F., Calvanese, D., McGuinness, D., Nardi, D., & Patel-Schneider, P. F. (Eds.). (2002). The description logic handbook: Theory, implementation and applications.
Cambridge, UK: Cambridge University Press.
Baclawski, K., Kokar, M. K., Kogut, P. A., Hart, L., Smith, J., Holmes, W. S., et al. (2001). Extending UML to support ontology engineering for the Semantic Web. In Lecture notes in computer science: Vol. 2185. Proceedings of the fourth International Conference on the Unified Modeling Language, UML 2001 (pp. 342-360). Heidelberg, Germany: Springer-Verlag.
Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel-Schneider, P. F., et al. (Eds.). (2004). OWL Web Ontology Language reference (W3C Recommendation 10 February 2004). Retrieved from http:/ /www.w3.org/TR/owl-ref/
Beckett, D. (Ed.). (2004). RDF/XML syntax specification
(Revised; W3C Recommendation 10 February 2004). Retrieved from http://www.w3.org/TR/rdf-syntax-grammar/
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 34-43.
Biron, P. V., & Malhotra, A. (Eds.). (2001). XML Schema part 2: Datatypes (W3C Recommendation 02 May 2001). Retrieved from http://www.w3.org/TR/xmlschema-2/
Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., & Yergeau, F. (Eds.). (2004). Extensible Markup Language (XML) 1.0 (3rd ed.; W3C Recommendation 04 February 2004). Retrieved from http://www.w3.org/TR/REC-xml/
Brickley, D., & Guha, R. V. (Eds.). (2004). RDF Vocabulary Description Language 1.0: RDF Schema (W3C Recommendation 10 February 2004). Retrieved from http:// www.w3.org/TR/rdf-schema/
Dekkers, M., & Weibel, S. (2003). State of the Dublin Core Initiative, April 2003. D-Lib Magazine, 9(4). Retrieved from http://www.dlib.org/dlib/april03/weibel/ 04weibel.html
726
TEAM LinG

Using Semantic Web Tools for Ontologies Construction
Falkovych, K., Sabou, M., & Stuckenschmidt, H. (2003). UML for the Semantic Web: Transformation-based approaches. In B. Omelayenko & M. Klein (Eds.), Knowledge transformation for the Semantic Web (pp. 93-106). Amsterdam: IOS Press.
Haarslev, V., & Möller, R. (2003). Racer: A core inference engine for the Semantic Web. In Proceedings of the second International Workshop on Evaluation of Ontology Tools (EON2003). Aachen, Germany: CEUR-WS.
Horridge, M. (2004). A practical guide to building OWL ontologies with the Protégé-OWL plugin (ed. 1.0, pp. 2737). Manchester, UK: University of Manchester.
Manola, F., & Miller, E. (2004). RDF primer (W3C Recommendation 10 February 2004). Retrieved from http:// www.w3.org/TR/rdf-primer/
McGuinness, D. L., Fikes, R., Hendler, J., & Stein, L. A. (2002). DAML+OIL: An ontology language for the Semantic Web. IEEE Intelligent Systems, 17(5), 72-80.
Smith, M. K., Welty, C., & McGuinness, D. L. (Eds.). (2004). OWL Web Ontology Language guide (W3C Recommendation 10 February 2004). Retrieved from http://www.w3.org/TR/owl-guide/
Thompson, H. S., Beech, D., Maloney, M., & Mendelsohn, N. (Eds.). (2001). XML Schema part 1: Structures (W3C Recommendation 02 May 2001). Retrieved from http://www.w3.org/TR/xmlschema-1/
Zarri, G. P. (2002). Semantic Web and knowledge representation. In Database and expert systems applications: Proceedings of the 13th International Conference, DEXA’02 (75-79). Los Alamitos, CA: IEEE Computer Society Press.
KEY TERMS
DTD (Document Type Description): A DTD is a formal description in XML declaration syntax of a particular type of document It begins with a <!DOCTYPE keyword and sets out what names are to be used for the different types of markup elements, where they may occur, the elements’ possible attributes, and how they all fit together. For example, a DTD may specify that every person markup element must have a name attribute, and that it can have an offspring element called id whose content must be text. There are many sorts of DTDs ready to be used in all kinds of areas that can be downloaded and used freely; see, e.g., MathML, for mathematical expressions; CML, Chemical Markup
Language; EDI, Electronic Data Interchange; PICS, Plat- |
7 |
form for Internet Content Selection; etc. |
OWL: The Web Ontology Language (OWL) is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF and is derived from the DAML+OIL Web ontology language. An OWL ontology is an RDF graph, which is in turn a set of RDF triples. OWL includes three specific sublanguages characterized by an increasing level of complexity and expressiveness: OWL Lite, OWL DL—DL stands for description logics, a particular logic-oriented knowledge representation language introduced to supply a formal foundation for frame-based systems—and OWL Full.
RDF (Resource Description Framework): An example of “metadata” language (metadata = data about data) used to describe generic “things” (“resources,” according to the RDF jargon) on the Web. An RDF document is a list of statements under the form of triples having the classical format: <object, property, value>, where the elements of the triples can be URIs (Universal Resource Identifiers), literals (mainly, free text), and variables. RDF statements are normally written into XML format (the so-called “RDF/XML syntax”).
RDF Schema (RDFS): Provides a mechanism for constructing specialized RDF vocabularies through the description of domain-specific properties. This is obtained mainly by describing the properties in terms of the classes of resource to which they apply: For example, we could define the creator property saying that it has the resource document as “domain” (document is the value or “object” of this property) and the resource person as “range” (this property must always be associated with a resource person, its “subject”). Other basic modeling primitives of RDFS allow setting up hierarchies (taxonomies), both hierarchies of concepts, thanks to the use of “class” and “subclass-of” statements, and hierarchies of properties, thanks to the use of “property” and “subproperty-of” statements. Instances of a specific class (concept) can be declared making use of the “type” statement.
Semantic Web Architecture: A layered architecture proposed by Berners-Lee for Semantic Web applications. In this architecture, ontologies occupy a central place: They are built on the top of the RDF (Resource Description Framework) layer, which is in turn built on the top of the XML layer (see below). The XML/ RDF base constrains the particular format ontologies assume in a Semantic Web context, inheriting, e.g., all the well-known XML “verbosity.”
727
TEAM LinG
Using Semantic Web Tools for Ontologies Construction
XML (Extensible Markup Language): Has been created to overcome some difficulties proper to HTML (Hypertext Markup Language), which—developed as a means for instructing the Web browsers how to display a given Web page—is a “presentation-oriented” markup tool. XML is called “extensible” because, at the difference of HTML, it is not characterized by a fixed format but lets the user design his own customized markup languages (a specific DTD, Document Type Description, see below) for
limitless different types of documents. XML is then a “content-oriented” markup tool.
XML Schema: A more complete way of specifying the semantics of a set of XML markup elements. XML Schema supplies a complete grammar for specifying the structure of the elements, allowing one, e.g., to define the cardinality of the offspring elements, default values, etc.
728
TEAM LinG

|
729 |
|
Using Views to Query XML Documents |
|
|
|
7 |
|
|
|
|
|
|
|
MarioCannataro
Magna Graecia University of Catanzaro, Italy
Sophie Cluet
INRIA Rocquencourt-Xyleme S.A., France
GiuseppeTradigo
Magna Graecia University of Catanzaro, Italy
Pierangelo Veltri
Magna Graecia University of Catanzaro, Italy
DanVodislav
CNAM Cedric-Lab, Paris, France
INTRODUCTION
We consider using views in databases and in particular their applications to access heterogeneous and large volumes of digital documents, focusing on (1) views in databases and (2) views for data integration.
We start our paper first answering the questions what is a view and what are their applications. We give a short representation of this concept and reference Ullman (1988) and Abiteboul, Hull, and Vianu (1995) for more detailed and formal definitions in the database literature.
Views are used for several reasons to manipulate data coming from different sources through a unique and more convenient structure; e.g., to hide data (e.g., for security reasons) or to restructure data, giving a simplified vision of the database.
Information may be grouped in one or more collections of data depending on the database model (e.g., as a set of tables or classes in a structured model or in a graph in a semi-structured model). View facilities allow one to create an imaginary source collecting interesting data. For instance, in the case of a relational database, a view might be an imaginary table containing data coming from several tables of the database. Once views are created they can be queried by users as if they were part of the database.
Views can be defined, queried, and modified using view definition, view query, and view manipulation languages. These three languages depend on the data model; e.g., for a relational data model, the view definition and query languages coincide with SQL, the standard query language to access to databases.
A view is created specifying a schema, a domain, and a view definition.
•The view domain is a set of sources containing information that the view represents.
•The view schema is the structure of the view and describes how data are presented and are queried by the users.
•The view definition expresses the links existing between data in the sources (actual data) and data in the view. It is expressed using the view definition language (see Figure 1).
In a simple case, the view domain is a set of data sources in the same database (e.g., a set of tables in a relational database), and the view definition language is that of the database. In more complex cases, the view
Figure 1. A view on a relational database
View Schema
View Definition
SQL query
Books |
Users |
… … … |
… … … |
… … … |
… … … |
… … … |
… … … |
Loans |
… … … |
… … … |
… … … |
View Domain
… … … |
… … … |
… … … |
… … … |
… … … |
… … … |
… |
Database
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
TEAM LinG