R in Action, Second Edition


R code executed during the data-mining session can be viewed in the Log tab and exported to a text file for reuse.

To learn more, visit the Rattle homepage (http://rattle.togaware.com/), and see Graham J. Williams’ overview article in The R Journal (http://mng.bz/D16Q). Data Mining with Rattle and R, also by Williams (2011), is the definitive book on Rattle.

17.8 Summary

This chapter presented a number of machine-learning techniques for classifying observations into one of two groups. First, the use of logistic regression as a classification tool was described. Next, traditional decision trees were described, followed by conditional inference trees. The ensemble random forest approach was considered next. Finally, the increasingly popular support vector machine approach was described. The last section introduced Rattle, a graphical user interface for data mining that gives the user point-and-click access to these functions. Rattle can be particularly useful for comparing the results of various classification techniques. Because it generates reusable R code in a log file, it can also be a useful tool for learning the syntax of many of R’s predictive analytics functions.

The techniques described in this chapter vary in complexity. Data miners typically try both simpler approaches (logistic regression, decision trees) and more complex, black-box approaches (random forests, support vector machines). If the black-box approaches don’t provide a significant improvement over the simpler methods, the simpler methods are usually selected for deployment.
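The selection strategy just described can be sketched in R. This is a minimal illustration, not code from the chapter: it assumes the MASS and randomForest packages are installed, uses the `biopsy` data frame from MASS (a copy of the Wisconsin breast cancer data) in place of the chapter's own data preparation, compares one simpler model (logistic regression) with one black-box model (random forest) on a 70/30 holdout split, and treats a validation-accuracy gain larger than `min_gain = 0.02` as "significant". That threshold is a hypothetical value chosen only for illustration.

```r
# A minimal sketch (not the chapter's code), assuming the MASS and
# randomForest packages are installed.
library(MASS)           # biopsy: Wisconsin breast cancer data
library(randomForest)   # black-box model: random forest

data(biopsy, package = "MASS")
df <- na.omit(biopsy[, -1])          # drop the ID column and incomplete rows

# 70/30 split into training and validation sets
set.seed(1234)
train <- sample(nrow(df), 0.7 * nrow(df))
df.train <- df[train, ]
df.validate <- df[-train, ]

# Simpler model: logistic regression (models P(class == "malignant"))
fit.logit <- glm(class ~ ., data = df.train, family = binomial())
prob <- predict(fit.logit, df.validate, type = "response")
pred.logit <- factor(prob > 0.5, levels = c(FALSE, TRUE),
                     labels = c("benign", "malignant"))

# Black-box model: random forest
fit.forest <- randomForest(class ~ ., data = df.train)
pred.forest <- predict(fit.forest, df.validate)

# Validation accuracy of each model
accuracy <- function(pred, actual) mean(pred == actual)
acc.logit  <- accuracy(pred.logit,  df.validate$class)
acc.forest <- accuracy(pred.forest, df.validate$class)

# Keep the simpler model unless the black-box gain exceeds a threshold.
# min_gain is a hypothetical cutoff for a "significant" improvement.
min_gain <- 0.02
chosen <- if (acc.forest - acc.logit > min_gain) {
  "random forest"
} else {
  "logistic regression"
}
cat(sprintf("logistic: %.3f  forest: %.3f  chosen: %s\n",
            acc.logit, acc.forest, chosen))
```

A fixed cutoff on a single split is a crude stand-in for "significant"; repeating the comparison with cross-validation, or applying a paired test such as `mcnemar.test()` to whether each model classified each validation case correctly, would give a more defensible decision.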

The examples in this chapter (cancer and diabetes diagnosis) both came from the field of medicine, but classification techniques are used widely in other disciplines, including computer science, marketing, finance, economics, and the behavioral sciences. Although the examples involved a binary classification (malignant/benign, diabetic/non-diabetic), modifications are available that allow these techniques to be used with multigroup classification problems.

To learn more about the functions in R that support classification, look in the CRAN Task View for Machine Learning and Statistical Learning (http://mng.bz/I1Lm). Other good resources include books by Kuhn & Johnson (2013) and Torgo (2010).
