CHAPTER 18 Advanced methods for missing data

Figure 18.4 Scatter plot between amount of dream sleep (Dream, vertical axis) and length of gestation (Gest, horizontal axis), with information about missing data in the margins

reversed. You can see that a negative relationship exists between length of gestation and dream sleep and that dream sleep tends to be higher for mammals that are missing a gestation score. The number of observations missing values on both variables at the same time is printed in blue at the intersection of both margins (bottom left).

The VIM package has many graphs that can help you understand the role of missing data in a dataset and is well worth exploring. There are functions to produce scatter plots, box plots, histograms, scatter plot matrices, parallel plots, rug plots, and bubble plots that incorporate information about missing values.

18.3.3 Using correlations to explore missing values

Before moving on, there’s one more approach worth noting. You can replace the data in a dataset with indicator variables, coded 1 for missing and 0 for present. The resulting matrix is sometimes called a shadow matrix. Correlating these indicator variables with each other and with the original (observed) variables can help you to see which variables tend to be missing together, as well as relationships between a variable’s “missingness” and the values of the other variables.

Consider the following code:

x <- as.data.frame(abs(is.na(sleep)))

The elements of data frame x are 1 if the corresponding element of sleep is missing and 0 otherwise. You can see this by viewing the first few rows of each:

> head(sleep, n=5)
   BodyWgt BrainWgt NonD Dream Sleep Span Gest Pred Exp Danger
1 6654.000   5712.0   NA    NA   3.3 38.6  645    3   5      3
2    1.000      6.6  6.3   2.0   8.3  4.5   42    3   1      3
3    3.385     44.5   NA    NA  12.5 14.0   60    1   1      1
4    0.920      5.7   NA    NA  16.5   NA   25    5   2      3
5 2547.000   4603.0  2.1   1.8   3.9 69.0  624    3   5      4

> head(x, n=5)
  BodyWgt BrainWgt NonD Dream Sleep Span Gest Pred Exp Danger
1       0        0    1     1     0    0    0    0   0      0
2       0        0    0     0     0    0    0    0   0      0
3       0        0    1     1     0    0    0    0   0      0
4       0        0    1     1     0    1    0    0   0      0
5       0        0    0     0     0    0    0    0   0      0
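
The same transformation can be tried without loading the sleep data. The following is a minimal sketch on a small made-up data frame; the name toy and all of its values are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data (not taken from the sleep dataset)
toy <- data.frame(
  a = c(1.5, NA, 3.2, NA),
  b = c(NA, 2, 3, NA),
  c = c(10, 20, 30, 40)
)

# is.na() returns a logical matrix; abs() coerces TRUE/FALSE to 1/0
shadow <- as.data.frame(abs(is.na(toy)))
print(shadow)

# Column sums of the shadow matrix count the missing values per variable
n_missing <- colSums(shadow)
print(n_missing)
```

Because the indicators are plain 0/1 numbers, ordinary summary functions such as colSums() and cor() can be applied to them directly.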

The statement

y <- x[which(apply(x,2,sum)>0)]

extracts the variables that have at least one missing value (in sleep, no variable is entirely missing), and

cor(y)

gives you the correlations among these indicator variables:

        NonD  Dream  Sleep   Span   Gest
NonD   1.000  0.907  0.486  0.015 -0.142
Dream  0.907  1.000  0.204  0.038 -0.129
Sleep  0.486  0.204  1.000 -0.069 -0.069
Span   0.015  0.038 -0.069  1.000  0.198
Gest  -0.142 -0.129 -0.069  0.198  1.000

Here, you can see that Dream and NonD tend to be missing together (r = 0.91). To a lesser extent, Sleep and NonD tend to be missing together (r = 0.49) and Sleep and Dream tend to be missing together (r = 0.20).
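
Correlations among indicators can be complemented by raw counts of how often two variables are missing in the same row. The following is a minimal sketch, assuming a hypothetical data frame toy with made-up values (not the sleep data). It relies on the fact that crossprod() applied to a 0/1 indicator matrix returns these joint counts.

```r
# Hypothetical toy data (not taken from the sleep dataset)
toy <- data.frame(
  a = c(NA, NA, 3, 4, NA, 6),
  b = c(NA, NA, 3, 4, 5, NA),
  c = c(1, NA, 3, 4, 5, 6),
  d = c(1, 2, 3, 4, 5, 6)
)

ind <- abs(is.na(toy))                        # 0/1 indicator matrix
ind <- ind[, colSums(ind) > 0, drop = FALSE]  # keep variables with missing values

# Entry [i, j] counts the rows where variables i and j are both missing;
# the diagonal holds each variable's total number of missing values
joint <- crossprod(ind)
print(joint)

# Correlations among the same indicators, analogous to cor(y)
r <- cor(ind)
print(round(r, 3))
```

The counts help put a correlation in context: a high r between two indicators means more when it rests on many jointly missing cases than when it rests on one or two.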

Finally, you can look at the relationship between missing values in a variable and the observed values on other variables:

> cor(sleep, y, use="pairwise.complete.obs")
            NonD  Dream   Sleep   Span   Gest
BodyWgt    0.227  0.223  0.0017 -0.058 -0.054
BrainWgt   0.179  0.163  0.0079 -0.079 -0.073
NonD          NA     NA      NA -0.043 -0.046
Dream     -0.189     NA -0.1890  0.117  0.228
Sleep     -0.080 -0.080      NA  0.096  0.040
Span       0.083  0.060  0.0052     NA -0.065
Gest       0.202  0.051  0.1597 -0.175     NA
Pred       0.048 -0.068  0.2025  0.023 -0.201
Exp        0.245  0.127  0.2608 -0.193 -0.193
Danger     0.065 -0.067  0.2089 -0.067 -0.204
Warning message:
In cor(sleep, y, use = "pairwise.complete.obs") :
  the standard deviation is zero

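In this correlation matrix, the rows are observed variables, and the columns are indicator variables representing missingness. You can ignore the warning message and the NA values in the correlation matrix; they're artifacts of the approach. An NA appears wherever an indicator is constant across the cases used for that pair (for example, a variable's own indicator is always 0 wherever the variable is observed), so its standard deviation is zero.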
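
The NA entries can be reproduced on a small example. The sketch below uses a hypothetical data frame toy whose values are made up for illustration; it is not the sleep data, and the numbers it prints carry no substantive meaning.

```r
# Hypothetical toy data (not taken from the sleep dataset)
toy <- data.frame(
  u = c(1, 2, NA, 4, NA, 6, 7, 8),
  v = c(2, NA, 1, 5, 3, NA, 9, 4),
  w = c(5, 3, 8, 1, 9, 2, 7, 6)
)

ind <- as.data.frame(abs(is.na(toy)))
ind <- ind[which(colSums(ind) > 0)]   # indicators for u and v only

# Rows: observed variables; columns: missingness indicators.
# suppressWarnings() hides the "standard deviation is zero" warning.
m <- suppressWarnings(cor(toy, ind, use = "pairwise.complete.obs"))
print(round(m, 3))
```

In the printed matrix, the cells pairing u with its own indicator and v with its own indicator are NA, while the row for w, which has no missing values, contains ordinary correlations.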
