R in Action, Second Edition.pdf


12.3 Permutation tests with the lmPerm package

The lmPerm package provides support for a permutation approach to linear models. In particular, the lmp() and aovp() functions are the lm() and aov() functions modified to perform permutation tests rather than normal theory tests.

The parameters in the lmp() and aovp() functions are similar to those in the lm() and aov() functions, with the addition of a perm= parameter. The perm= option can take the value "Exact", "Prob", or "SPR". "Exact" produces an exact test based on all possible permutations. "Prob" samples from all possible permutations; sampling continues until the estimated standard deviation falls below 0.1 of the estimated p-value, a stopping rule controlled by the optional Ca parameter. Finally, "SPR" uses a sequential probability ratio test to decide when to stop sampling. Note that if the number of observations is greater than 10, perm="Exact" will automatically default to perm="Prob"; exact tests are only available for small problems.
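The sampling idea behind the "Prob" option can be sketched in base R. The function below is a hand-rolled illustration, not lmPerm's actual implementation: the name perm_slope_test, the min_iter and max_iter arguments, and the use of the absolute correlation as the test statistic are all assumptions made for this example. It stops once the estimated standard error of the p-value drops below Ca times the estimated p-value, or when the iteration cap is reached.

```r
# Hypothetical helper (not part of lmPerm): sampling permutation test for
# the association between x and y, with a Ca-style stopping rule.
perm_slope_test <- function(x, y, Ca = 0.1, max_iter = 5000, min_iter = 50) {
  observed <- abs(cor(x, y))   # test statistic on the original data
  hits <- 0                    # permutations at least as extreme as observed
  iter <- 0
  while (iter < max_iter) {
    iter <- iter + 1
    # Shuffle the response to break any real link between x and y
    if (abs(cor(x, sample(y))) >= observed) hits <- hits + 1
    if (iter >= min_iter && hits > 0) {
      p_hat <- hits / iter
      se <- sqrt(p_hat * (1 - p_hat) / iter)  # binomial standard error
      if (se < Ca * p_hat) break              # precise enough: stop sampling
    }
  }
  list(p_value = hits / iter, iterations = iter)
}

set.seed(1234)
strong <- perm_slope_test(women$height, women$weight)  # strong relationship
noise  <- perm_slope_test(women$height, rnorm(15))     # unrelated response
print(strong)
print(noise)
```

Under this rule, a clearly nonsignificant effect tends to stop after few iterations, whereas a very small p-value tends to run to the iteration cap, because almost no permuted statistic reaches the observed one.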

To see how this works, you’ll apply a permutation approach to simple regression, polynomial regression, multiple regression, one-way analysis of variance, one-way analysis of covariance, and a two-way factorial design.

12.3.1 Simple and polynomial regression

In chapter 8, you used linear regression to study the relationship between weight and height for a group of 15 women. Using lmp() instead of lm() generates the permutation test results shown in the following listing.

Listing 12.2 Permutation tests for simple linear regression

> library(lmPerm)
> set.seed(1234)
> fit <- lmp(weight~height, data=women, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> summary(fit)

Call:
lmp(formula = weight ~ height, data = women, perm = "Prob")

Residuals:
   Min     1Q Median     3Q    Max
-1.733 -1.133 -0.383  0.742  3.117

Coefficients:
       Estimate Iter Pr(Prob)
height     3.45 5000   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.5 on 13 degrees of freedom
Multiple R-Squared: 0.991, Adjusted R-squared: 0.99
F-statistic: 1.43e+03 on 1 and 13 DF, p-value: 1.09e-14

To fit a quadratic equation, you could use the code in this next listing.


CHAPTER 12 Resampling statistics and bootstrapping

Listing 12.3 Permutation tests for polynomial regression

> library(lmPerm)
> set.seed(1234)
> fit <- lmp(weight~height + I(height^2), data=women, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> summary(fit)

Call:
lmp(formula = weight ~ height + I(height^2), data = women, perm = "Prob")

Residuals:
    Min      1Q  Median      3Q     Max
-0.5094 -0.2961 -0.0094  0.2862  0.5971

Coefficients:
            Estimate Iter Pr(Prob)
height       -7.3483 5000   <2e-16 ***
I(height^2)   0.0831 5000   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.38 on 12 degrees of freedom
Multiple R-Squared: 0.999, Adjusted R-squared: 0.999
F-statistic: 1.14e+04 on 2 and 12 DF, p-value: <2e-16

As you can see, it’s a simple matter to test these regressions using permutation tests and requires little change in the underlying code. The output is also similar to that produced by the lm() function. Note that an Iter column is added, indicating how many iterations were required to reach the stopping rule.
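To make the "little change in the underlying code" point concrete, the following sketch fits the same model with lm() and, if the package is available, with lmp(). Guarding the lmPerm call with require() is an assumption added here so that the snippet still runs on a machine where the package isn't installed; the lm() part uses only base R.

```r
# Normal-theory fit with base R
fit_lm <- lm(weight ~ height, data = women)
print(summary(fit_lm)$coefficients)

# Permutation fit: same formula and data, only the function name and the
# perm= argument change. Runs only if lmPerm is installed.
if (suppressWarnings(require("lmPerm", quietly = TRUE, character.only = TRUE))) {
  set.seed(1234)
  fit_perm <- lmp(weight ~ height, data = women, perm = "Prob")
  print(summary(fit_perm))
}
```

The slope estimate is the same in both fits; what differs is how the p-value is obtained.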

12.3.2 Multiple regression

In chapter 8, multiple regression was used to predict the murder rate based on population, illiteracy, income, and frost for 50 US states. Applying the lmp() function to this problem results in the following output.

Listing 12.4 Permutation tests for multiple regression

> library(lmPerm)
> set.seed(1234)
> states <- as.data.frame(state.x77)
> fit <- lmp(Murder~Population + Illiteracy+Income+Frost,
             data=states, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> summary(fit)

Call:
lmp(formula = Murder ~ Population + Illiteracy + Income + Frost, data = states, perm = "Prob")

Residuals:
     Min       1Q   Median       3Q      Max
-4.79597 -1.64946 -0.08112  1.48150  7.62104

Coefficients:
            Estimate Iter Pr(Prob)
Population 2.237e-04   51   1.0000
Illiteracy 4.143e+00 5000   0.0004 ***
Income     6.442e-05   51   1.0000
Frost      5.813e-04   51   0.8627
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-Squared: 0.567, Adjusted R-squared: 0.5285
F-statistic: 14.73 on 4 and 45 DF, p-value: 9.133e-08

Looking back to chapter 8, both Population and Illiteracy are significant (p < 0.05) when normal theory is used. Based on the permutation tests, the Population variable is no longer significant. When the two approaches don’t agree, you should look at your data more carefully. It may be that the assumption of normality is untenable or that outliers are present.
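One way to "look at your data more carefully" is to check the residuals of the normal-theory fit. The sketch below uses only base R; the Shapiro-Wilk test and the cutoff of 2 for studentized residuals are illustrative choices made for this example, not recommendations from the text.

```r
states <- as.data.frame(state.x77)
fit_lm <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

# Rough check of the normality assumption on the residuals
print(shapiro.test(residuals(fit_lm)))

# Studentized residuals: unusually large values flag potential outliers
rs <- rstudent(fit_lm)
print(rs[abs(rs) > 2])
print(names(which.max(abs(rs))))  # state with the most extreme residual
```

Refitting without a flagged observation, or inspecting a normal Q-Q plot of the residuals, are natural follow-ups when the two approaches disagree.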

12.3.3 One-way ANOVA and ANCOVA

Each of the analysis of variance designs discussed in chapter 9 can be performed via permutation tests. First, let’s look at the one-way ANOVA problem considered in section 9.1 on the impact of treatment regimens on cholesterol reduction. The code and results are given in the next listing.

Listing 12.5 Permutation test for one-way ANOVA

> library(lmPerm)
> library(multcomp)
> set.seed(1234)
> fit <- aovp(response~trt, data=cholesterol, perm="Prob")
[1] "Settings: unique SS "
> anova(fit)
Component 1 :
          Df R Sum Sq R Mean Sq Iter  Pr(Prob)
trt        4  1351.37    337.84 5000 < 2.2e-16 ***
Residuals 45   468.75     10.42
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results suggest that the treatment effects are not all equal.
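The logic of a permutation test for a one-way design can be written out by hand. The sketch below is an illustration in base R, not what aovp() does internally: the helper name perm_anova and the fixed count of 2000 permutations are assumptions, and the built-in PlantGrowth dataset is substituted for cholesterol so that the example doesn't depend on the multcomp package.

```r
# Hypothetical helper: permutation p-value for a one-way ANOVA F statistic
perm_anova <- function(y, g, n_perm = 2000) {
  f_stat <- function(resp) anova(lm(resp ~ g))[["F value"]][1]
  f_obs <- f_stat(y)
  # Shuffling the response across groups mimics "no treatment effect"
  f_perm <- replicate(n_perm, f_stat(sample(y)))
  mean(f_perm >= f_obs)  # share of shuffles at least as extreme as observed
}

set.seed(1234)
p_perm <- perm_anova(PlantGrowth$weight, PlantGrowth$group)
p_norm <- anova(lm(weight ~ group, data = PlantGrowth))[["Pr(>F)"]][1]
print(c(permutation = p_perm, normal_theory = p_norm))
```

When the normal-theory assumptions are reasonable, the two p-values should be close.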

The second example in this section applies a permutation test to a one-way analysis of covariance. The problem is from chapter 9, where you investigated the impact of four drug doses on the litter weights of rats, controlling for gestation times. The next listing shows the permutation test and results.

Listing 12.6 Permutation test for one-way ANCOVA

> library(lmPerm)
> set.seed(1234)
> fit <- aovp(weight ~ gesttime + dose, data=litter, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> anova(fit)
Component 1 :
          Df R Sum Sq R Mean Sq Iter Pr(Prob)
gesttime   1   161.49   161.493 5000   0.0006 ***
dose       3   137.12    45.708 5000   0.0392 *
Residuals 69  1151.27    16.685
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the p-values, the four drug doses don’t equally impact litter weights, controlling for gestation time.

12.3.4 Two-way ANOVA

You’ll end this section by applying permutation tests to a factorial design. In chapter 9, you examined the impact of vitamin C on tooth growth in guinea pigs. The two manipulated factors were dose (three levels) and delivery method (two levels). Ten guinea pigs were placed in each treatment combination, resulting in a balanced 3 × 2 factorial design. The permutation tests are provided in the next listing.

Listing 12.7 Permutation test for two-way ANOVA

> library(lmPerm)
> set.seed(1234)
> fit <- aovp(len~supp*dose, data=ToothGrowth, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> anova(fit)
Component 1 :
          Df R Sum Sq R Mean Sq Iter Pr(Prob)
supp       1   205.35    205.35 5000  < 2e-16 ***
dose       1  2224.30   2224.30 5000  < 2e-16 ***
supp:dose  1    88.92     88.92 2032  0.04724 *
Residuals 56   933.63     16.67
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

At the .05 level of significance, all three effects are statistically different from zero. At the .01 level, only the main effects are significant.

It’s important to note that when aovp() is applied to ANOVA designs, it defaults to unique sums of squares (also called SAS Type III sums of squares). Each effect is adjusted for every other effect. The default for parametric ANOVA designs in R is sequential sums of squares (SAS Type I sums of squares). Each effect is adjusted for those that appear earlier in the model. For balanced designs, the two approaches will agree, but for unbalanced designs with unequal numbers of observations per cell, they won’t. The greater the imbalance, the greater the disagreement. If desired, specifying seqs=TRUE in the aovp() function will produce sequential sums of squares. For more on Type I and Type III sums of squares, see section 9.2.
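The order dependence of sequential sums of squares can be seen with base R alone. This sketch is an illustration under stated assumptions: it uses a main-effects model with dose treated as a factor, and it makes the ToothGrowth design unbalanced by dropping the first seven rows (all from one cell), which is an arbitrary choice for demonstration. The helper name supp_ss is hypothetical.

```r
tg <- transform(ToothGrowth, dose = factor(dose))

# Sequential (Type I) sum of squares for supp under a given term order
supp_ss <- function(formula, data) {
  anova(lm(formula, data = data))["supp", "Sum Sq"]
}

# Balanced design: the order of terms doesn't matter
bal_first  <- supp_ss(len ~ supp + dose, tg)
bal_second <- supp_ss(len ~ dose + supp, tg)

# Unbalanced design: remove 7 of the 10 animals in one cell
tg_unbal <- tg[-(1:7), ]
unb_first  <- supp_ss(len ~ supp + dose, tg_unbal)
unb_second <- supp_ss(len ~ dose + supp, tg_unbal)

print(c(balanced_first = bal_first, balanced_second = bal_second))
print(c(unbalanced_first = unb_first, unbalanced_second = unb_second))
```

In the unbalanced case, the value obtained with supp entered last is adjusted for dose, which is what a unique (Type III) sum of squares reports for a main-effects model.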
