File: R in Action, Second Edition.pdf


CHAPTER 16 Cluster analysis

[Figure: bar chart titled "Number of Clusters Chosen by 26 Criteria"; x-axis: Number of Clusters; y-axis: Number of Criteria]

Figure 16.5 Recommended number of clusters using 26 criteria provided by the NbClust package

standardized data, the aggregate() function is used along with the cluster memberships to determine variable means for each cluster in the original metric.

How well did k-means clustering uncover the actual structure of the data contained in the Type variable? A cross-tabulation of Type (wine varietal) and cluster membership is given by

> ct.km <- table(wine$Type, fit.km$cluster)
> ct.km

     1  2  3
  1 59  0  0
  2  3 65  3
  3  0  0 48

You can quantify the agreement between type and cluster using an adjusted Rand index, provided by the flexclust package:

> library(flexclust)
> randIndex(ct.km)
[1] 0.897

The adjusted Rand index measures the agreement between two partitions, adjusted for chance. It equals 1 for perfect agreement, has an expected value of 0 when the partitions agree only by chance, and can be negative when agreement is worse than chance. Agreement between the wine varietal type and the cluster solution is 0.9. Not bad. Shall we have some wine?
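To make the index less of a black box, here is a minimal base-R sketch that recomputes it from the cross-tabulation of Type and cluster shown earlier. The helper name adjusted_rand() is hypothetical (it is not part of flexclust), and the code uses the standard pair-counting form of the adjusted Rand index rather than whatever randIndex() does internally.

```r
# Contingency table copied from the Type-by-cluster cross-tabulation
# (rows: wine Type, columns: k-means cluster).
ct <- matrix(c(59,  0,  0,
                3, 65,  3,
                0,  0, 48),
             nrow = 3, byrow = TRUE)

# Hypothetical helper: adjusted Rand index from a contingency table,
# based on counts of observation pairs.
adjusted_rand <- function(tab) {
  pairs_cells <- sum(choose(tab, 2))           # pairs together in both partitions
  pairs_rows  <- sum(choose(rowSums(tab), 2))  # pairs together by Type
  pairs_cols  <- sum(choose(colSums(tab), 2))  # pairs together by cluster
  # Agreement expected by chance, given the row and column totals
  expected <- pairs_rows * pairs_cols / choose(sum(tab), 2)
  maximum  <- (pairs_rows + pairs_cols) / 2
  (pairs_cells - expected) / (maximum - expected)
}

ari <- adjusted_rand(ct)
round(ari, 3)
```

Under these assumptions the result should agree with the value of about 0.897 reported by randIndex().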

16.4.2 Partitioning around medoids

Because it’s based on means, the k-means clustering approach can be sensitive to outliers. A more robust solution is provided by partitioning around medoids (PAM). Rather than representing each cluster using a centroid (a vector of variable means), each cluster is identified by its most representative observation (called a medoid). Whereas k-means uses Euclidean distances, PAM can be based on any distance measure. It can therefore accommodate mixed data types and isn’t limited to continuous variables.

The PAM algorithm is as follows:

1. Randomly select K observations (call each a medoid).

2. Calculate the distance/dissimilarity of every observation to each medoid.

3. Assign each observation to its closest medoid.

4. Calculate the sum of the distances of each observation from its medoid (total cost).

5. Select a point that isn’t a medoid, and swap it with its medoid.

6. Reassign every point to its closest medoid.

7. Calculate the total cost.

8. If this total cost is smaller, keep the new point as a medoid.

9. Repeat steps 5–8 until the medoids don’t change.
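The steps above can be sketched in a few lines of base R. This is a naive illustration, not the implementation used by the cluster package: pam_sketch() and total_cost() are hypothetical names, Euclidean distance is assumed, and for simplicity each non-medoid point is tried as a replacement for every current medoid rather than only its own.

```r
# Naive PAM sketch (illustration only; hypothetical helper names).
pam_sketch <- function(x, k, seed = 1) {
  d <- as.matrix(dist(x))   # step 2: pairwise Euclidean distances
  n <- nrow(d)

  # Steps 3-4 and 6-7: assign each point to its closest medoid
  # and add up those distances.
  total_cost <- function(medoids) {
    sum(apply(d[, medoids, drop = FALSE], 1, min))
  }

  set.seed(seed)
  medoids <- sample(n, k)   # step 1: random initial medoids
  cost <- total_cost(medoids)

  repeat {                  # step 9: stop when no swap lowers the cost
    improved <- FALSE
    for (i in seq_len(k)) {
      for (h in setdiff(seq_len(n), medoids)) {  # step 5: candidate swap
        trial <- medoids
        trial[i] <- h
        trial_cost <- total_cost(trial)
        if (trial_cost < cost) {                 # step 8: keep improvements
          medoids <- trial
          cost <- trial_cost
          improved <- TRUE
        }
      }
    }
    if (!improved) break
  }

  list(medoids    = medoids,
       clustering = apply(d[, medoids, drop = FALSE], 1, which.min),
       cost       = cost)
}

# Usage on made-up data: two well-separated groups of 20 points each.
set.seed(42)
toy <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
fit <- pam_sketch(toy, k = 2)
table(fit$clustering)
```

Recomputing the total cost from scratch for every trial swap is slow on large data sets; the sketch favors readability over speed.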

A good worked example of the underlying math in the PAM approach can be found at http://en.wikipedia.org/wiki/k-medoids (I don’t usually cite Wikipedia, but this is a great example).

You can use the pam() function in the cluster package to partition around medoids. The format is pam(x, k, metric="euclidean", stand=FALSE), where x is a data matrix or data frame, k is the number of clusters, metric is the type of distance/dissimilarity measure to use, and stand is a logical value indicating whether the variables should be standardized before calculating this metric. PAM is applied to the wine data in the following listing; see figure 16.6.
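As a quick illustration of the call format only, the sketch below runs pam() on the built-in iris measurements. It is not the book's wine listing: iris is a stand-in chosen because it ships with R, and k = 3 and stand = TRUE are assumptions made for the example.

```r
library(cluster)   # provides pam() and clusplot()

# Stand-in data: the four numeric iris measurements (not the wine data).
x <- iris[, 1:4]

# Three clusters, Euclidean distance, variables standardized first.
fit.pam <- pam(x, k = 3, metric = "euclidean", stand = TRUE)

fit.pam$medoids             # one row per medoid
fit.pam$id.med              # row indices of the medoid observations
table(fit.pam$clustering)   # cluster sizes

# Plot the clusters on the first two principal components.
clusplot(fit.pam, main = "Bivariate Cluster Plot")
```

Because each medoid is an actual observation, fit.pam$id.med can be used to look up the representative rows in the original data frame.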

[Figure: scatter plot titled "Bivariate Cluster Plot"; x-axis: Component 1; y-axis: Component 2. These two components explain 55.41% of the point variability.]

Figure 16.6 Cluster plot for the three-group PAM clustering of the Italian wine data
