

Finally, the predict() function is used to classify each observation in the validation sample. A cross-tabulation of the actual status against the predicted status is provided. The overall accuracy was 96% in the validation sample. Unlike the logistic regression example, all 210 cases in the validation sample could be classified by the final tree. Note that decision trees can be biased toward selecting predictors that have many levels or many missing values.
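For reference, that classification step follows the same pattern used below for conditional inference trees. A minimal sketch, assuming the pruned tree dtree.pruned and the df.validate data frame from listing 17.3 are available (the dtree.pred and dtree.perf names are illustrative):

> dtree.pred <- predict(dtree.pruned, df.validate, type="class")   # predicted class labels
> dtree.perf <- table(df.validate$class, dtree.pred,
                      dnn=c("Actual", "Predicted"))                # actual vs. predicted cross-tabulation
> sum(diag(dtree.perf)) / sum(dtree.perf)                          # overall accuracy (about 0.96 here)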

17.3.2 Conditional inference trees

Before moving on to random forests, let’s look at an important variant of the traditional decision tree called a conditional inference tree. Conditional inference trees are similar to traditional trees, but variables and splits are selected based on significance tests rather than purity/homogeneity measures. The significance tests are permutation tests (discussed in chapter 12).

In this case, the algorithm is as follows:

1. Calculate p-values for the relationship between each predictor and the outcome variable.

2. Select the predictor with the lowest p-value.

3. Explore all possible binary splits on the chosen predictor and dependent variable (using permutation tests), and pick the most significant split.

4. Separate the data into these two groups, and continue the process for each subgroup.

5. Continue until splits are no longer significant or the minimum node size is reached (the sketch after this list shows the corresponding control parameters).
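These stopping rules map onto arguments of ctree_control() in the party package. A minimal sketch, using the df.train data frame from listing 17.4 below; the values shown are the common defaults and are included only to make the mapping explicit:

> library(party)
> ctl <- ctree_control(mincriterion = 0.95,   # 1 - p-value a split must exceed (steps 1-3, 5)
                       minsplit = 20,         # smallest node considered for splitting (step 5)
                       minbucket = 7)         # smallest allowable terminal node (step 5)
> fit.ctree2 <- ctree(class ~ ., data = df.train, controls = ctl)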

Conditional inference trees are provided by the ctree() function in the party package. In the next listing, a conditional inference tree is grown for the breast cancer data.

Listing 17.4 Creating a conditional inference tree with ctree()

> library(party)
> fit.ctree <- ctree(class~., data=df.train)
> plot(fit.ctree, main="Conditional Inference Tree")
> ctree.pred <- predict(fit.ctree, df.validate, type="response")
> ctree.perf <- table(df.validate$class, ctree.pred,
                      dnn=c("Actual", "Predicted"))
> ctree.perf
           Predicted
Actual      benign malignant
  benign       122         7
  malignant      3        78
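Overall accuracy can be read directly from this cross-tabulation; a quick check:

> sum(diag(ctree.perf)) / sum(ctree.perf)    # (122 + 78) / 210, roughly 0.95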

Note that pruning isn't required for conditional inference trees, and the process is somewhat more automated. Additionally, the party package has attractive plotting options.


[Figure 17.3 Conditional inference tree for the breast cancer data. The tree splits on sizeUniformity, bareNuclei, normalNucleoli, and bareNuclei again (each split with p < 0.001), ending in five terminal nodes of n = 297, 20, 13, 17, and 142 cases; each terminal node's bar shows the proportions of benign and malignant cases.]

The conditional inference tree is plotted in figure 17.3. The shaded area of each node represents the proportion of malignant cases in that node.

Displaying an rpart() tree with a ctree()-like graph

If you create a classical decision tree using rpart(), but you’d like to display the resulting tree using a plot like the one in figure 17.3, the partykit package can help. After installing and loading the package, you can use the statement plot(as.party(an.rpart.tree)) to create the desired graph. For example, try creating a graph like figure 17.3 using the dtree.pruned object created in listing 17.3, and compare the results to the plot presented in figure 17.2.
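A minimal sketch of that suggestion, assuming the dtree.pruned object from listing 17.3 is still in the workspace (the plot title is arbitrary):

> install.packages("partykit")               # one-time installation
> library(partykit)
> plot(as.party(dtree.pruned),               # convert the rpart tree to a party object
       main="Traditional (rpart) Tree")      # and draw it with the ctree()-style display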

The decision trees grown by the traditional and conditional methods can differ substantially. In the current example, the accuracy of each is similar. In the next section, a large number of decision trees are grown and combined in order to classify cases into groups.
