
Table 3. Aspect classification results with different features (P/R/F1 for the PMI-snt and PMI-doc settings)

| Feature | PMI-snt P | PMI-snt R | PMI-snt F1 | PMI-doc P | PMI-doc R | PMI-doc F1 |
| --- | --- | --- | --- | --- | --- | --- |
| TF-IDF-LIB | 0.643 | 0.172 | 0.271 | 0.610 | 0.226 | 0.330 |
| Weirdness-LIB | 0.778 | 0.038 | 0.073 | 0.789 | 0.041 | 0.078 |
| Weirdness-RNC | 0.321 | 0.025 | 0.046 | 0.529 | 0.025 | 0.047 |
| TF-IDF-RNC | 0.344 | 0.030 | 0.055 | 0.481 | 0.071 | 0.124 |
| PMI-YM-docs | 0.383 | 0.049 | 0.087 | 0.154 | 0.005 | 0.011 |
| PMI-YM | 0.242 | 0.022 | 0.040 | 0.278 | 0.014 | 0.026 |
| OpinionNEAR3 | 0.231 | 0.016 | 0.031 | 0.512 | 0.060 | 0.107 |
One can see that “TF-IDF-LIB” and “Weirdness-LIB” are the most useful features. Interestingly, the same features computed on a different corpus yield worse results. This is explained by the nature of the corpora: the RNC corpus consists of newspaper texts, while the LIB corpus is mostly fiction, and newspaper texts are more likely than fiction to mention product features. Another interesting observation is that PMI computed with the “same document” query performs slightly better than PMI computed with the “same sentence” query.
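To make the two corpus-based scores concrete, the following is a minimal sketch of how Weirdness (Ahmad et al., 1999) and document-level PMI can be computed from raw counts. All counts, corpus names, and the anchor word are illustrative assumptions, not values from the paper:

```python
import math
from collections import Counter

def weirdness(term, domain_counts, domain_total, general_counts, general_total):
    """Weirdness: the term's relative frequency in the domain corpus
    divided by its relative frequency in a general/contrastive corpus
    (e.g. LIB fiction or the RNC newspaper corpus)."""
    domain_rel = domain_counts[term] / domain_total
    general_rel = general_counts[term] / general_total
    if general_rel == 0:
        return float("inf")  # term never seen in the general corpus
    return domain_rel / general_rel

def pmi_doc(term, n_term, n_anchor, n_both, n_docs):
    """Document-level PMI between a candidate term and an anchor word
    (e.g. a product name): log of the joint document probability over
    the product of the individual document probabilities."""
    p_term = n_term / n_docs
    p_anchor = n_anchor / n_docs
    p_both = n_both / n_docs
    if p_both == 0:
        return float("-inf")
    return math.log2(p_both / (p_term * p_anchor))

# Toy illustration with made-up counts.
domain = Counter({"battery": 50, "screen": 40, "the": 500})
general = Counter({"battery": 5, "screen": 4, "the": 5000})
print(weirdness("battery", domain, 1000, general, 100000))  # 1000.0
print(pmi_doc("battery", n_term=200, n_anchor=400, n_both=150, n_docs=10000))
```

The “same sentence” variant (PMI-snt) differs only in what counts as a co-occurrence unit: sentences instead of documents.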
Conclusion
We addressed the task of product feature extraction from Russian reviews, framing it as a classification problem. We created and labeled a product feature dataset, computed a number of different classification features, and applied several classification algorithms. The experiments demonstrated the efficiency of our approach.
In future work we plan to use additional linguistic and statistical attributes for classification. A spelling corrector will be employed to fix the spelling of candidate features. We also plan to apply sequence labeling classifiers, and to cluster product features into meaningful groups, which may help filter features as well.
References
Ahmad, K.; Gillam, L.; and Tostevin, L. 1999. University of Surrey participation in TREC-8: Weirdness indexing for logical document extrapolation and retrieval (WILDER). In The Eighth Text Retrieval Conference (TREC-8).
Ana M. Popescu, Oren Etzioni. Extracting Product Features and Opinions from Reviews. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (2005)
Brody, S. and N. Elhadad. An Unsupervised Aspect-Sentiment Model for Online Reviews. In Proceedings of The 2010 Annual Conference of the North American Chapter of the ACL, 2010.
Chetviorkin Ilya, Lukashevich Natalia. Automatic Extraction of Domain-Specific Opinion Words. Proceedings of the International Conference Dialog, 2010.
Chetviorkin I. I., Braslavski P. I. Sentiment analysis track at ROMIP 2011. Dialog 2011.
Chugreev A., Marchuk A., Makeev I., Mokaev T., Skudarnov Y., Bat’kovich D., Ulanov A. GoodsReview – a Service for Automatic Sentiment Analysis [GoodsReview – servis avtomatichestoko analiza portrebitel’skogo mneniya]. Proceedings of KESW, 2012.
Ghani, R., K. Probst, Y. Liu, M. Krema, and A. Fano. Text mining for product attribute extraction. ACM SIGKDD Explorations Newsletter, 2006, 8(1): p. 41-48.
Hall M., Frank E., Holmes G., Pfahringer B., Reutemann P., Witten I. H. (2009); The WEKA Data Mining Software: An Update; SIGKDD Explorations, Volume 11, Issue 1.
Hu, M. and B. Liu. Mining and summarizing customer reviews. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), 2004.
Jakob, N., & Gurevych, I. (2010, October). Extracting opinion targets in a single-and cross-domain setting with conditional random fields. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 1035-1045). Association for Computational Linguistics.
Jin, W. and H. Ho. A novel lexicalized HMM-based learning framework for web opinion mining. In Proceedings of International Conference on Machine Learning (ICML-2009), 2009.
Liu, Bing. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. 2nd ed. 2011, XX, 622 p.
Nitesh V. Chawla et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16:321-357.
Pak A., Paroubek P. Language independent approach to sentiment analysis (LIMSI Participation in ROMIP’11), Dialog 2011.
Riloff E. Automatically constructing a dictionary for information extraction tasks. Proceedings of the National Conference on Artificial Intelligence, 811-811, 1993
Titov, I. and R. McDonald. Modeling online reviews with multi-grain topic models. In Proceedings of International Conference on World Wide Web (WWW-2008), 2008.
Wu, Y., Q. Zhang, X. Huang, and L. Wu. Phrase dependency parsing for opinion mining. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), 2009.
Zhang Z., Iria J., Brewster C., Ciravegna F. A Comparative Evaluation of Term Recognition Algorithms. In The Sixth International Conference on Language Resources and Evaluation (LREC 2008).
1 http://www.citilink.ru/
2 http://company.yandex.ru/technologies/mystem/
3 http://www.yandex.ru/
4 http://xml.yandex.ru/
5 http://market.yandex.ru/
6 http://lib.rus.ec/
7 http://www.ruscorpora.ru/en/index.html
8 http://www.cs.waikato.ac.nz/ml/weka/