
Evaluating the Construction of Domain Ontologies for Recommender Systems Based on Texts

occurrences divided by the total number of terms in the text). And the weight in the concept vector represents the probability of the term being present in a text of that theme. The next section (on the ontology) describes how concept weights are defined.

The text mining method compares the vector representing the text of a message against the vectors representing concepts in the ontology. The method multiplies the weights of common terms (those present in both vectors). The overall sum of these products is the degree of relation between the text and the concept, meaning the relative probability of the concept being present in the text, or of the text holding the concept with a specific degree of importance. The decision concerning whether a concept is present then depends on the threshold used to cut off undesirable degrees. This threshold is dependent on the domain ontology used in the system and is set beforehand by experts after some initial evaluations.
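As a concrete illustration, this matching step can be sketched in a few lines of Python, assuming the text and each concept are represented as dicts mapping terms to weights (the function and variable names are hypothetical, not the system's actual code):

```python
def relation_degree(text_vector, concept_vector):
    """Sum the products of the weights of terms present in both vectors."""
    common_terms = set(text_vector) & set(concept_vector)
    return sum(text_vector[t] * concept_vector[t] for t in common_terms)

def identify_concepts(text_vector, ontology, threshold):
    """Keep only the concepts whose degree of relation passes the cut-off."""
    return {name: degree
            for name, concept_vector in ontology.items()
            if (degree := relation_degree(text_vector, concept_vector)) >= threshold}
```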

After that, the text mining module passes the identified concepts to the recommender module, which looks in the digital library for items to suggest. The concepts identified in the messages represent users' interests and will be used to generate the recommendations.

The digital library is a repository of information sources, like electronic documents, links to Web pages, and bibliographic references, especially created for the recommendation system. The inclusion (upload) of items in the digital library is the responsibility of authorized people and can be made offline in a specific module. The classification of the electronic documents is made automatically by software tools, using the same text mining method used in the text mining module and the same domain ontology. A difference is that a document may be related to more than one concept. A threshold is used to determine which concepts can be related to one document. Thus, the relation between concepts and documents in the digital library is many-to-many. The relationship degree is also stored.

The profile base contains the identification of authorized users. Besides administrative data like name, institution, department, e-mail, and so forth, the profile base also stores the interest areas of each person, as well as an associated degree indicating the user's knowledge level or competence in each area. The interest areas must exist in the ontology. The profile base also records the items accessed (uploaded, added, read, or downloaded) by the user or recommended to him/her. This information is useful to avoid recommending known items.

The recommender module suggests items from the digital library to the users participating in the chat, according to the theme being discussed. Recommendations are particular to each user; thus, each user receives a different list of suggestions on the screen.

The Engineering Process for Constructing a Domain Ontology

Noy and McGuinness (2003) propose a methodology (a set of ordered steps) for constructing ontologies:

Step 1—Determine the domain and the scope of the ontology: Ontology engineers must elicit the goal of the ontology, the intended users, and the information that will be stored in the ontology.

Step 2—Consider the reuse of existing ontologies: Engineers must verify whether there are other ontologies for the same domain. There are many ontologies available in electronic formats that can be adapted. For example, there are libraries of ontologies such as Ontolingua (www.ksl.stanford.edu/software/ontolingua) and the DAML ontology library (www.daml.org/ontologies). Reuse minimizes the time and effort of the construction process, and furthermore it lends quality to the final result, since the existing ontologies must already have been tested.

Step 3—Identify the important terms of the ontology: And identify the concepts that these terms represent. The elements that compose the ontology must be specified as concepts and will be the base for the construction of the ontology.

Step 4—Define the classes and the class hierarchy: This step may be performed following one of these approaches (Uschold & Gruninger, 1996):

Top-down: Starts with the definition of the more general concepts and proceeds by dividing these concepts into more specific ones.

Bottom-up: Starts with the definition of the more specific classes and then groups classes to compose more general concepts.

Hybrid: A combination of the two approaches above; it starts with the definition of the more important concepts; after that, general concepts are composed by grouping some of those, and more specific concepts are derived from the initial ones.

Step 5—Define the characteristics of the classes: Classes alone are not very useful. It is necessary to find the characteristics of the classes. Characteristics may be terms identified in Step 3 and not yet classified under one concept, or they may be associations between classes.

Step 6—Define the attributes of the characteristics: This step must define attributes such as cardinality, type, domain, and so forth, for each characteristic defined earlier.

Step 7—Create instances: The last step is to populate the ontology with instances for each class, including their attributes.

Challenges for Constructing a Domain Ontology

The process of constructing an ontology consumes a lot of time and effort. One of the challenges for researchers is to develop tools to help in this process, especially for minimizing the need for human work. Some works have investigated ways to construct ontologies semiautomatically.

The majority of them analyze texts to discover concepts and relations for an ontology (as, for example, Kietz et al., 2000). Furthermore, the terms (words and expressions) are used as characteristics of the concepts.

For example, Saias (2003) enriches an initial ontology, developed by experts, with information extracted from texts selected about the domain. Syntactic information is used to discover classes (subjects, verbs, and complements in a sentence) and to infer relations among classes (syntactic relations).

Alani et al. (2002) complement an existing ontology with instances and new classes extracted from texts by syntactic analysis. New terms are added as characteristics of the classes by analyzing synonyms in WordNet.

Lame (2003) also uses syntactic analysis to discover classes of the ontology. However, all terms are considered as potential classes, including adjectives and adverbs. Different terms are associated to the same class by analyzing the similarity between them, using statistical methods.

This chapter presents an engineering process for constructing domain ontologies. The focus of this work is on Steps 5 and 6 of the general methodology proposed by Noy and McGuinness (2003). The goal is to find terms that represent the classes/concepts of the ontology and the degree of importance of each term.

The process is based on statistics and probabilities; it does not use natural language processing, such as syntactic or morphological analysis. The idea is to have a simple process for constructing ontologies, based on the analysis of textual documents. The work compares a manual process (with the help of software tools based on word statistics) against a semiautomatic process (also based on word statistics). In both cases, we consider that part of the ontology is previously created, that is, the concepts are defined and the hierarchy of concepts is established.

The comparison concerns defining the terms that represent each concept and the weights of each term. In the manual process, humans considered experts in the domain define the terms and weights with the help of software tools. In the semiautomatic process, terms and weights are defined under a supervised learning process, where experts select textual documents and a software tool selects terms and weights using the TFIDF method.

Experiments and Evaluations

Some experiments were carried out to investigate different approaches to constructing domain ontologies. The idea is to compare these approaches to determine good practices in this process.

Figure 2. Part of the hierarchy of concepts for all ontologies

All the ontologies constructed for the experiments have the same internal structure: a hierarchy of concepts. Each concept is represented by a node in the hierarchy. The root (main) node is the domain ontology itself. Nodes can have one or many child nodes and only one parent node. Each concept has associated with it a list of terms and their respective weights. Weights are used to state the relative importance or the probability of the term for identifying the concept in a text. The relation between concepts and terms is many-to-many, that is, a term may be present in more than one concept and a concept may be described by many terms.
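This structure can be pictured as a small data class; the sketch below only illustrates the description above and is not the authors' implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Concept:
    """A node in the hierarchy: one parent, any number of children,
    and a list of terms with weights describing the concept."""
    name: str
    terms: Dict[str, float] = field(default_factory=dict)  # term -> weight
    parent: Optional["Concept"] = None
    children: List["Concept"] = field(default_factory=list)

    def add_child(self, child: "Concept") -> None:
        child.parent = self  # a node has exactly one parent
        self.children.append(child)

# The root node is the domain itself; the same term may appear,
# with different weights, in more than one concept.
root = Concept("Computer Science")
root.add_child(Concept("Artificial Intelligence",
                       terms={"agent": 0.004, "inference": 0.002}))
```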

The hierarchy of concepts was constructed manually by experts for all approaches, that is, all ontologies constructed for the experiments have the same hierarchy of concepts. The difference lies in the way the lists of terms and weights were created in each approach.

The ontology was created for the domain of computer science, following the classification of computer science areas proposed by the ACM (Association for Computing Machinery). However, experts refined the hierarchy, adding new child/leaf nodes for some concepts. The generated ontology has a total of 57 concepts and is different from the original ACM classification. Figure 2 shows part of the hierarchy of concepts constructed for the experiments.

The main investigation was to compare the intervention of humans in the elaboration of the lists of terms and weights for each concept against the use of software tools for automatically extracting terms and weights. Two ontologies were constructed under different conditions:

a. The first ontology (manually constructed) was elaborated by humans considered experts in subareas of computer science (subareas correspond to concepts in the ontology).

b. The second ontology (semiautomatic) was constructed under a supervised learning process.


The Engineering Process for the Manually Constructed Ontology

For each concept present in the ontology, a list of terms and corresponding weights was created by experts in the concept. The elaboration of each list was performed independently of other concepts or lists.

The first step was to select documents about the considered concept (done by the experts themselves). For each concept, a set of 100 textual documents was selected from the Citeseer digital library (www.researchindex.org). Then a software tool was used to extract a centroid of the documents. A centroid is a list of the most frequent terms in the documents, disregarding stopwords. The weight associated with each term in the list was calculated as the average of the relative frequencies of the term in each document. The relative frequency of a term inside a document is the number of times the term appears in the document divided by the total number of terms in that document.
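A possible reading of this centroid computation in Python (hypothetical names; the text does not specify whether the average is taken over all documents or only those containing the term, so the sketch assumes all documents):

```python
from collections import Counter

def centroid(documents, stopwords):
    """Average relative term frequency over a set of training documents."""
    totals = Counter()
    for doc in documents:
        tokens = [t for t in doc.lower().split() if t not in stopwords]
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            totals[term] += count / len(tokens)  # relative frequency in this doc
    # average over all documents in the set
    return {term: total / len(documents) for term, total in totals.items()}
```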

After this step, experts refined the lists of terms and weights, analyzing each list individually and using a software tool that identifies subjects in texts through a probabilistic method comparing documents against the ontology. Analyzing these results, experts could include new terms (for example, synonyms), exclude terms (for example, polysemic words or general terms that do not help in identifying a concept), or adjust the weights of the terms. Experts reduced the weight of terms that appear in more than one concept or increased the weight of important terms.

Experts also analyzed terms that appeared in both parent and child concepts: the idea was to verify whether the term was better suited to one of the concepts or whether the weights should be adjusted according to some human criteria. For example, if the term was specific to the child concept, that is, it was a narrow term, the weight in the child concept should be greater than in the parent concept. Otherwise, if the term is more general but can also be used to determine a specific concept, the weight should be greater in the parent concept than in the child one.

The Engineering Process for the Semiautomatically Constructed Ontology

For each concept present in the ontology, a list of terms and corresponding weights was created by a software tool using the supervised learning approach (a machine learning method). The TFIDF method—term frequency and inverse document frequency (Salton & McGill, 1983)—was used to elaborate each list of terms and weights, analyzing documents selected by experts.

The method used as learning examples the same sets of documents per concept used in the first ontology, that is, a set of 100 textual documents selected for each concept, extracted by experts from the Citeseer digital library (www.researchindex.org).

The TFIDF method increases the weight of terms that frequently appear in documents of the set and decreases the weight of terms that appear in more than one concept set. There was no human intervention after this step.
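The chapter does not spell out the exact TFIDF variant used, but the classic tf × log(N/df) form from Salton and McGill (1983) behaves as described when each concept's training set is treated as one unit; a sketch under that assumption:

```python
import math
from collections import Counter

def tfidf_per_concept(concept_docs, stopwords):
    """concept_docs maps a concept name to its list of training documents.
    TF is computed inside the concept's set; DF counts in how many
    concept sets a term appears, so shared terms are penalized."""
    tokens = {c: [t for d in docs for t in d.lower().split()
                  if t not in stopwords]
              for c, docs in concept_docs.items()}
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))  # one count per concept set the term appears in
    n = len(concept_docs)
    return {c: {t: (count / len(toks)) * math.log(n / df[t])
                for t, count in Counter(toks).items()}
            for c, toks in tokens.items()}
```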

As we can see in Table 1, the TFIDF method resulted in eight times more terms than the manual process. The reason is that humans do not have the patience to define and include a great volume of terms, which limits the number of terms used for each concept.

The Normalization Step

A special approach was investigated in the experiments. Since term weights in the different concepts could range over different scales, a normalization step was applied to the two ontologies (manual and semiautomatic).

The reason is that some concepts could be identified more often than others if the weights of their terms were greater than the values in other concepts. As we can see in Table 1, the greatest weight in the manually constructed ontology is quite different from the greatest weight in the semiautomatic ontology, and likewise for the smallest weights. In the normalization process, the limits were defined arbitrarily as: superior limit = 0.01; inferior limit = 0.000001. These values were chosen because the text mining method should not generate a degree greater than one. The inferior limit was defined to create a large range between the limits. All the weights in the original ontologies were transposed to the new scale, preserving the proportion and the relations between a given value and both the superior and inferior limits.
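A linear transposition consistent with this description (the exact formula is not given in the chapter) could look like:

```python
UPPER, LOWER = 0.01, 0.000001  # limits chosen for the experiments

def normalize(ontology_weights):
    """Rescale every weight to [LOWER, UPPER], preserving proportions.
    ontology_weights maps concept -> {term: weight}; assumes at least
    two distinct weight values exist in the ontology."""
    all_weights = [w for terms in ontology_weights.values()
                   for w in terms.values()]
    w_min, w_max = min(all_weights), max(all_weights)
    scale = (UPPER - LOWER) / (w_max - w_min)
    return {c: {t: LOWER + (w - w_min) * scale for t, w in terms.items()}
            for c, terms in ontology_weights.items()}
```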

Evaluation of the Ontologies for Indexing Documents

Four ontologies were evaluated to compare the approaches for constructing ontologies. These four ontologies were described earlier, and their characteristics are presented in Table 1.

Ontologies were evaluated for finding concepts inside textual documents, using a Bayesian classification method. Thirty documents were selected from Citeseer (www.researchindex.org).

In a first round, thresholds were used to filter the concepts to be considered and to eliminate weak concepts. The precision and recall of each ontology for identifying concepts were calculated. Experts judged each concept identified for each document and listed the concepts that should have been identified but were not. The precision (Pr) for each document was calculated as the number of concepts correctly identified divided by the total number of concepts identified. Recall (Rc) was calculated as the number of concepts correctly identified divided by the number of concepts that should have been identified. Precision and recall for each ontology were calculated as the average over all documents. The F-measure (F-m) was used to determine the best performance considering both precision (Pr) and recall (Rc), and it was calculated as F-m = (2 * Pr * Rc) / (Pr + Rc), according to Lewis (1991).
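These measures reduce to a few lines of code; a sketch of the per-document computation (hypothetical names):

```python
def evaluate(identified, expected):
    """Precision, recall, and F-measure for one document.
    identified: concepts found using the ontology;
    expected: concepts the experts judged should have been found."""
    correct = len(set(identified) & set(expected))
    pr = correct / len(identified) if identified else 0.0
    rc = correct / len(expected) if expected else 0.0
    fm = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
    return pr, rc, fm
```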

 

Table 1. Some characteristics of each ontology

Ontology                                   Total of terms   Greatest weight   Smallest weight
Manually constructed                       3,689            0.033987          0.000001
Manually constructed with normalization    3,689            0.010000          0.000001
Semiautomatic                              27,154           0.314645          0.000140
Semiautomatic with normalization           27,154           0.010000          0.000001

Table 2. Evaluation of each ontology over textual documents (first round)

                                           Threshold 0.0001          Threshold 0.00005         Threshold 0.00001
Ontologies                                 Pr       Rc      F-m      Pr      Rc      F-m       Pr      Rc      F-m
Manually constructed                       100.0%   20.0%   33.3%    60.1%   40.7%   48.5%     33.4%   73.2%   45.9%
Manually constructed with normalization    47.4%    38.5%   42.5%    43.2%   49.8%   46.3%     30.2%   79.8%   43.8%
Semiautomatic                              12.4%    25.1%   16.6%    10.1%   43.9%   16.5%     21.2%   77.3%   33.3%
Semiautomatic with normalization           47.5%    28.2%   35.4%    43.2%   65.3%   52.0%     20.2%   94.7%   33.3%


Different thresholds were tested, assuming that a certain threshold could favor one ontology. Table 2 shows the results of the first round.

In a second round, the precision was calculated using the top N concepts identified in each document. In this case, we are assuming that there is no best threshold, or that it is not easy to determine the best one. In addition, it is possible that one ontology has one best threshold and another ontology has a different best threshold. Thus, we evaluated the performance of the ontologies using the top N concepts in the ranking of concepts identified in the documents.

The ranking is the ordered presentation of the concepts, from top to bottom, starting with the concept with the highest degree. We considered three kinds of rankings: top 3, top 5, and top 10. Table 3 shows the results of the second round.
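Selecting the top N concepts is straightforward once the degrees of relation are known; a minimal sketch:

```python
def top_n_concepts(degrees, n):
    """Rank concepts by degree of relation (highest first) and keep the top N."""
    ranking = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    return [concept for concept, _ in ranking[:n]]
```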

Analyzing the results of the first round (Table 2), we can see that the best performance is that of the semiautomatic with normalization ontology (at threshold 0.00005, with an F-measure of 52%).

Considering the performances at each threshold, we found the following: (a) the manually constructed ontology achieves the best performance at the lowest threshold (0.00001), with an F-measure of 45.9%; (b) the semiautomatic with normalization ontology wins at the intermediate threshold (0.00005), with an F-measure of 52%; and (c) the manually constructed with normalization ontology obtains the best F-measure (42.5%) at the highest threshold (0.0001). One conclusion is that we cannot determine the best way to construct ontologies when using thresholds. In the same sense, we cannot determine whether there is a single best threshold, maybe because each ontology has its own best threshold.

One interesting finding is that the best threshold for all ontologies was the intermediate one (0.00005), maybe because this threshold balances precision and recall; the highest threshold favors precision but loses recall, and vice versa for the smallest one.

Table 3. Evaluation of each ontology over textual documents (second round)

                                           Top 3                     Top 5                     Top 10
Ontologies                                 Pr       Rc      F-m      Pr      Rc      F-m       Pr      Rc      F-m
Manually constructed                       59.5%    47.8%   53.0%    47.1%   61.6%   53.4%     33.1%   81.1%   47.0%
Manually constructed with normalization    61.9%    37.9%   47.0%    43.4%   47.8%   45.5%     29.3%   80.5%   42.9%
Semiautomatic                              33.3%    4.3%    7.6%     21.9%   8.9%    12.7%     13.9%   17.8%   15.6%
Semiautomatic with normalization           50.6%    22.3%   30.9%    40.0%   27.8%   32.8%     25.1%   47.3%   32.8%

Table 4. Average number of concepts identified in the first round

 

Ontologies                                 0.0001   0.00005   0.00001
Manually constructed                       1        1.5       8.3
Manually constructed with normalization    1.7      4.6       21.7
Semiautomatic                              20.4     29.7      40.9
Semiautomatic with normalization           11.4     24.3      39.9


Table 4 presents the average number of concepts identified in the texts of the first round, by threshold. This table makes clear that thresholds may cause problems in evaluations. We can see that the same threshold may bring a small number of concepts in some cases (the manually constructed ontologies) and an excessive number of concepts in other cases (the semiautomatic ontologies). Future work will investigate the use of different thresholds for different ontologies.

Analyzing the results of the second round (Table 3), we can see that the best performance for F-measure in all top N rankings is achieved by the manually constructed ontology.

An interesting finding is that normalization improved the performance of the semiautomatic ontology in the three top N evaluations but reduced the F-measure of the manually constructed ontology in all three.

Analyzing the performance of the normalization step, we have the following win-loss scores:

Manual (thresholds): Without normalization 2 × 1 with normalization.

Manual (top N): Without normalization 3 × 0 with normalization.

Semiautomatic (thresholds): Without normalization 0 × 2 with normalization (1 tie).

Semiautomatic (top N): Without normalization 0 × 3 with normalization.

The final conclusion about the normalization step is that normalization should not be used in the manually constructed ontologies but it is useful in ontologies created by semiautomatic processes.

Comparing manually constructed ontologies to semiautomatic ones, we can see the following:

Without normalization (thresholds): Manual 3 × 0 semiautomatic.

Without normalization (top N): Manual 3 × 0 semiautomatic.

With normalization (thresholds): Manual 2 × 1 semiautomatic.

With normalization (top N): Manual 3 × 0 semiautomatic.

These results lead us to conclude that manually constructed ontologies are better than semiautomatic ones, at least when using the methods and steps described in this chapter (and used in the experiments).

The best combination (the manually constructed ontology without normalization) wins in four of the six evaluations (three with thresholds and three with top N), and its best performance with thresholds (48.5%) is only 3.5 percentage points below the best one (52%), while with top N this ontology achieves the best performance.

One interesting observation is that the manually constructed ontology with normalization does not achieve the best performance in any of the evaluations in either round. One reason may be that experts, when creating the term list for each concept, already adjust the weights in a way that normalizes the scales across concepts.

Analyzing the overall performances, we can say that the best one (an F-measure of 53.4%) is still far from the desired level. Future work must evaluate a better combination of human intervention and automated tools.

Evaluations of the Ontologies on Chat Messages

We carried out a final evaluation of the ontologies, using them to identify concepts in chat messages. A discussion session, which occurred in a private chat of the recommender system, was used as the sample for this evaluation. Messages with more than two words were individually extracted from the session, resulting in 165 messages to be used in this experiment. Experts identified the correct concepts that should be identified. Again, precision, recall, and F-measure were used in the comparison. The evaluations were done using only the top N concepts identified by each ontology. Table 5 presents the results of this final round.

We can see that the manually constructed ontology was again the best one in all top N evaluations. Its best performance here (67%) was greater than the best performance among all ontologies on textual documents, both using thresholds (52%) and using top N concepts (53.4%).

Again, normalization was not useful for the manually constructed ontology but improved the performance of the semiautomatic one. And the manually constructed ontology beats the semiautomatic one both with and without normalization.

Concluding Remarks

This chapter presented an investigation into the construction of domain ontologies for content-based recommender systems. The work is focused on the step of defining characteristics for the concepts present in a previously created ontology. The goal was to compare manually constructed ontologies to semiautomatically constructed ontologies in a specific task of classifying texts (identifying themes in texts).

Experiments were carried out on textual documents and on textual messages from a chat. A normalization step was also investigated to minimize scale problems in term weights; the goal was to determine whether normalization improves the quality of the ontology for identifying themes in texts. Precision, recall, and F-measure were used to compare the performances in all experiments.

The conclusion is that manually constructed ontologies achieve better results for identifying concepts in texts than semiautomatic ones, at least when using the methods and steps described in this chapter (and used in the experiments). Software tools are useful to help in identifying terms that can represent concepts in an ontology, but the final decision must be the responsibility of humans.

Furthermore, even with a smaller set of terms, the manually constructed ontologies achieved better performance than the semiautomatic ones. Therefore, the recommendation is to use human intervention in the process to get a more concise set of properties (minimizing processing effort and resources) with a guarantee of superior quality.

Regarding the normalization step, we noted that normalization should not be used in the manually constructed ontologies, but it is useful in ontologies created by semiautomatic processes.

The reason is that humans tend to normalize term weights (even intuitively) when creating the ontology.

The chapter showed that it is feasible to use ontologies to mine textual messages in a chat in order to identify the content of the discussion. The result of this discovery process can be used by recommender systems to suggest more precise items to users.

Table 5. Evaluation of each ontology on chat messages

                                           Top 3                     Top 5                     Top 10
Ontologies                                 Pr       Rc      F-m      Pr      Rc      F-m       Pr      Rc      F-m
Manually constructed                       56.6%    81.9%   67.0%    46.6%   92.0%   61.9%     35.1%   97.1%   51.6%
Manually constructed with normalization    53.3%    76.4%   62.8%    41.1%   80.1%   54.4%     33.6%   84.0%   48.0%
Semiautomatic                              18.5%    15.4%   16.8%    13.3%   19.0%   15.6%     7.2%    31.5%   11.8%
Semiautomatic with normalization           38.6%    64.2%   48.2%    22.1%   65.4%   33.0%     14.2%   79.1%   24.1%


Future Trends

The chapter showed that semiautomatic construction of domain ontologies is feasible. However, the part that can be automated is the construction of the term list for each concept, under a supervised learning process, where experts select good documents to be used by the software tool.

The challenge is to create methods able to automatically identify the concepts and the relations between them (for example, the hierarchy). One research direction is the use of clustering methods that automatically group textual documents. The goal of the clustering process is to allocate documents with similar content to the same group (cluster) and to allocate documents that are not similar to different clusters. For more details about clustering, see Jain et al. (1999).

We can assume that each cluster represents a different concept, since the documents inside the cluster are grouped for having similar content or because they are about the same subject. In this sense, after the clustering process, we can identify the term list by analyzing the content of the documents inside the cluster (as in a supervised learning process; a sketch follows below). The problem is to determine the name of the concept.
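As a rough sketch of this direction (not something evaluated in the chapter), k-means over TFIDF vectors can group the documents, and a weighted term list can be read off each cluster centroid; scikit-learn is used here purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def concepts_from_clusters(documents, n_concepts, terms_per_concept=20):
    """Group similar documents and derive one weighted term list per
    cluster; naming each resulting concept is still left to humans."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(documents)
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(matrix)
    terms = vectorizer.get_feature_names_out()
    concepts = []
    for center in km.cluster_centers_:
        top = center.argsort()[::-1][:terms_per_concept]
        concepts.append({terms[i]: float(center[i]) for i in top})
    return concepts
```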

Furthermore, it is possible to use a divisive clustering algorithm that results in a hierarchy of clusters, from one cluster at the top (containing all elements) to the bottom level, where there are as many clusters as elements (each cluster with only one element). This hierarchical schema may be used as the hierarchy of concepts.

Another approach using clustering is to group not documents but words, in a way similar to latent semantic indexing (Deerwester et al., 1990). In this case, the clustering algorithm is applied to data about similarities between words. Each resulting cluster may be considered a concept in the ontology, and the respective set of words inside the cluster composes the characteristics of the concept (the term list and the weights).

Acknowledgment

This research is partially supported by CNPq, an entity of the Brazilian government for scientific and technological development (Project DIGITEX - Editoração, Indexação e Busca em Bibliotecas, grant number 550845/2005-4), and by FAPERGS, Foundation for Supporting Research in Rio Grande do Sul state (Project Rec-Semântica - Plataforma de Recomendação e Consulta na Web Semântica Evolutiva, grant number 0408933).

Future Research Directions

Constructing ontologies is a challenging task, and much is yet to be done. One interesting direction is the use of computational intelligence to minimize human effort in this process. Techniques and tools from the expert systems and knowledge acquisition areas can accelerate some parts of the process. Machine learning can help engineers to find initial structures and/or to complete the structures.

In particular, the use of machine learning techniques over textual documents is a promising field. Organizations have lots of texts, but this kind of document holds unstructured information. The acquisition of information from texts is still a challenge, even with text mining advances and the evolution of information retrieval techniques.

Ontologies can support the acquisition of information from texts, but they can also benefit from this process. Words carry meaning and represent the real world; by analyzing words and their semantics, we can observe how people describe the world and thus how the world is composed or structured. For this reason, we must bring our research closer to that concerning computational linguistics and knowledge representation.


References

Alani, H., Kim, S., Millard, D.E., Weal, M.J., Hall, W., Lewis, P.H., et al. (2002). Automatic ontology-based knowledge extraction and tailored biography generation from the Web. Technical Report 02-049.

Burke, R. (2002). Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4), 331-370.

Deerwester, S., et al. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6).

Gauch, S., Chaffee, J., & Pretschner, A. (2003). Ontology-based personalized search and browsing. Web Intelligence and Agent System, 1(3-4), 219-234.

Guarino, N. (1998). Formal ontology and information systems. In International Conference on Formal Ontologies in Information Systems (FOIS'98), Trento, Italy (pp. 3-15).

Jain, A.K., Murty, M.N., & Flynn, P.J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264-323.

Kietz, J.U., Maedche, A., & Volz, R. (2000). A method for semiautomatic ontology acquisition from a corporate intranet. In Proceedings of the EKAW-2000 Workshop on Ontologies and Text. Lecture Notes in Artificial Intelligence (LNAI). France: Springer-Verlag.

Labrou, Y., & Finin, T. (1999). Yahoo! as an ontology: Using Yahoo! categories to describe documents. In 8th International Conference on Knowledge and Information Management (CIKM'99) (pp. 180-187). Kansas City, MO.

Lame, G. (2003). Using text analysis techniques to identify legal ontologies components. In Workshop on Legal Ontologies of the International Conference on Artificial Intelligence and Law.

Lewis, D.D. (1991). Evaluating text categorization. In Proceedings of the Speech and Natural Language Workshop (pp. 312-318).

Loh, S., Wives, L.K., & Oliveira, J.P.M. (2000). Concept-based knowledge discovery in texts extracted from the Web. ACM SIGKDD Explorations, 2(1), 29-39.

Middleton, S.E., Shadbolt, N.R., & Roure, D.C.D. (2003). Capturing interest through inference and visualization: Ontological user profiling in recommender systems. In International Conference on Knowledge Capture (K-CAP'03) (pp. 62-69). New York: ACM Press.

Noy, F.N., & McGuinness, D.L. (2003). Ontology development 101: A guide to creating your first ontology. Retrieved 2003 from http://ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinnes.doc

Saias, J. (2003). Uma metodologia para a construção automática de ontologias e a sua aplicação em sistemas de recuperação de informação (A methodology for the automatic construction of ontologies and its application in information retrieval systems). PhD thesis, University of Évora, Portugal (in Portuguese).

Salton, G., & McGill, M. (1983). Introduction to modern information retrieval. McGraw-Hill.

Sowa, J.F. (2000). Knowledge representation: Logical, philosophical, and computational foundations. Pacific Grove, CA: Brooks/Cole Publishing.

Terveen, L., & Hill, W. (2001). Beyond recommender systems: helping people help each other. In J. Carroll (Ed.), Human computer interaction in the new millennium. Addison-Wesley.

Uschold, M., & Gruninger, M. (1996). Ontologies: Principles, methods and applications. Knowledge Engineering Review, 11(2).