
CHAPTER 18 Advanced methods for missing data

approaches. If you’re interested in the multiple imputation approach to missing data, I recommend the following resources:

The multiple imputation FAQ page (www.stat.psu.edu/~jls/mifaq.html)

Articles by Van Buuren and Groothuis-Oudshoorn (2010) and Yu-Sung, Gelman, Hill, and Yajima (2010)

Amelia II: A Program for Missing Data (http://gking.harvard.edu/amelia)

Each can help to reinforce and extend your understanding of this important, but underutilized, methodology.

18.8 Other approaches to missing data

R supports several other approaches for dealing with missing data. Although not as broadly applicable as the methods described thus far, the packages described in table 18.2 offer functions that can be useful in specialized circumstances.

Table 18.2 Specialized methods for dealing with missing data

Package                                     Description
mvnmle                                      Maximum-likelihood estimation for multivariate normal data with missing values
cat                                         Analysis of categorical-variable datasets with missing values
arrayImpute, arrayMissPattern, and SeqKnn   Useful functions for dealing with missing microarray data
longitudinalData                            Utility functions, including interpolation routines for imputing missing time-series values
kmi                                         Kaplan-Meier multiple imputation for survival analysis with missing data
mix                                         Multiple imputation for mixed categorical and continuous data
pan                                         Multiple imputation for multivariate panel or clustered data

Finally, there are two methods for dealing with missing data that are still in use but should be considered obsolete: pairwise deletion and simple imputation.
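Before turning to those two methods, here's a rough sketch of the interpolation idea listed for time-series data in table 18.2. It uses base R's approx() function rather than the longitudinalData package itself, and the time and value vectors are made-up illustration data, so treat it as one possible way to fill interior gaps, not as that package's method.

```r
# Hypothetical series with missing values in the interior (illustration only)
time  <- 1:8
value <- c(10, 12, NA, NA, 20, 19, NA, 25)

# Flag the time points that were actually observed
observed <- !is.na(value)

# Linear interpolation: observed points serve as knots, and the
# series is evaluated at every time point, including the gaps
filled <- approx(x = time[observed], y = value[observed], xout = time)$y

print(filled)
```

Observed values pass through unchanged, and each gap is filled from its nearest observed neighbors. Because the first and last values are present here, no extrapolation is needed; with approx()'s default settings, gaps at the ends of a series would stay NA.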

18.8.1 Pairwise deletion

Pairwise deletion is often considered an alternative to listwise deletion when working with datasets that are missing values. In pairwise deletion, observations are deleted only if they’re missing data for the variables involved in a specific analysis. Consider the following code:

> cor(sleep, use="pairwise.complete.obs")
         BodyWgt BrainWgt NonD Dream Sleep Span Gest Pred Exp Danger
BodyWgt     1.00     0.93 -0.4  -0.1  -0.3 0.30  0.7 0.06 0.3   0.13
BrainWgt    0.93     1.00 -0.4  -0.1  -0.4 0.51  0.7 0.03 0.4   0.15
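To see what pairwise deletion does to the sample behind each correlation, here's a minimal sketch on a small made-up data frame (df and its values are hypothetical; it isn't the sleep dataset). It counts how many observations are available for each pair of variables and compares that with the number of complete cases that listwise deletion would use.

```r
# Hypothetical toy data with scattered missing values (illustration only)
df <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.1, NA, 5.8, 8.3, 9.9, 12.2),
  z = c(6, 5, 4, NA, NA, 1)
)

# Pairwise deletion: each correlation uses the rows complete for that pair only
r_pairwise <- cor(df, use = "pairwise.complete.obs")

# Listwise deletion: every correlation uses the rows complete on all variables
r_listwise <- cor(df, use = "complete.obs")

# Count the observations behind each pairwise correlation:
# a 1/0 "present" matrix crossed with itself gives joint counts
present <- !is.na(as.matrix(df))
storage.mode(present) <- "numeric"
n_pairwise <- crossprod(present)

print(n_pairwise)               # observations available per pair of variables
print(sum(complete.cases(df)))  # observations available under listwise deletion
```

In this toy example the x-y correlation rests on four observations, the x-z and y-z correlations on three each, and listwise deletion would keep only two rows. Each entry of the pairwise matrix can therefore describe a different subset of the data.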
