
R in Action, Second Edition.pdf



12.3 Permutation tests with the lmPerm package

The lmPerm package provides support for a permutation approach to linear models. In particular, the lmp() and aovp() functions are the lm() and aov() functions modified to perform permutation tests rather than normal theory tests.

The parameters in the lmp() and aovp() functions are similar to those in the lm() and aov() functions, with the addition of a perm= parameter. The perm= option can take the value Exact, Prob, or SPR. Exact produces an exact test, based on all possible permutations. Prob samples from all possible permutations. Sampling continues until the estimated standard deviation falls below 0.1 of the estimated p-value. The stopping rule is controlled by an optional Ca parameter. Finally, SPR uses a sequential probability ratio test to decide when to stop sampling. Note that if the number of observations is greater than 10, perm="Exact" will automatically default to perm="Prob"; exact tests are only available for small problems.
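The sampling idea behind perm="Prob" can be sketched in base R. This is a simplified illustration, not lmPerm's actual algorithm: the function perm_slope_test() and its min_iter and max_iter arguments are hypothetical names, and the stopping rule (stop once the estimated standard error of the p-value falls below Ca times the p-value) is an assumed reading of the description above.

```r
# Hypothetical helper: sampled permutation test for a simple-regression slope.
# A simplified sketch of the idea behind perm="Prob", not lmPerm's own code.
perm_slope_test <- function(x, y, Ca = 0.1, min_iter = 50, max_iter = 5000) {
  slope <- function(x, y) cov(x, y) / var(x)   # least-squares slope
  observed <- abs(slope(x, y))
  hits <- 0
  for (iter in seq_len(max_iter)) {
    y_perm <- sample(y)                        # shuffle y to break the x-y pairing
    if (abs(slope(x, y_perm)) >= observed) hits <- hits + 1
    p_hat <- hits / iter                       # running p-value estimate
    se <- sqrt(p_hat * (1 - p_hat) / iter)     # binomial standard error of p_hat
    # Assumed stopping rule: the estimate is precise enough relative to its size
    if (iter >= min_iter && hits > 0 && se < Ca * p_hat) break
  }
  list(p_value = p_hat, iterations = iter)
}

set.seed(1234)
strong <- perm_slope_test(women$height, women$weight)   # strong relationship
noise_x <- rnorm(30)
noise_y <- rnorm(30)
weak <- perm_slope_test(noise_x, noise_y)                # unrelated variables
print(strong)
print(weak)
```

In this sketch, a very small p-value rarely satisfies the stopping rule, so sampling tends to run up to max_iter, whereas a large p-value is estimated precisely after relatively few iterations.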

To see how this works, you’ll apply a permutation approach to simple regression, polynomial regression, multiple regression, one-way analysis of variance, one-way analysis of covariance, and a two-way factorial design.

12.3.1 Simple and polynomial regression

In chapter 8, you used linear regression to study the relationship between weight and height for a group of 15 women. Using lmp() instead of lm() generates the permutation test results shown in the following listing.

Listing 12.2 Permutation tests for simple linear regression

> library(lmPerm)
> set.seed(1234)
> fit <- lmp(weight~height, data=women, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> summary(fit)

Call:
lmp(formula = weight ~ height, data = women, perm = "Prob")

Residuals:
   Min     1Q Median     3Q    Max
-1.733 -1.133 -0.383  0.742  3.117

Coefficients:
       Estimate Iter Pr(Prob)
height     3.45 5000   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.5 on 13 degrees of freedom
Multiple R-Squared: 0.991, Adjusted R-squared: 0.99
F-statistic: 1.43e+03 on 1 and 13 DF, p-value: 1.09e-14

To fit a quadratic equation, you could use the code in this next listing.

288	CHAPTER 12 Resampling statistics and bootstrapping

Listing 12.3 Permutation tests for polynomial regression

> library(lmPerm)
> set.seed(1234)
> fit <- lmp(weight~height + I(height^2), data=women, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> summary(fit)

Call:
lmp(formula = weight ~ height + I(height^2), data = women, perm = "Prob")

Residuals:
    Min      1Q  Median      3Q     Max
-0.5094 -0.2961 -0.0094  0.2862  0.5971

Coefficients:
            Estimate Iter Pr(Prob)
height       -7.3483 5000   <2e-16 ***
I(height^2)   0.0831 5000   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.38 on 12 degrees of freedom
Multiple R-Squared: 0.999, Adjusted R-squared: 0.999
F-statistic: 1.14e+04 on 2 and 12 DF, p-value: <2e-16

As you can see, it’s a simple matter to test these regressions using permutation tests and requires little change in the underlying code. The output is also similar to that produced by the lm() function. Note that an Iter column is added, indicating how many iterations were required to reach the stopping rule.
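For comparison, the same two models can be fit with the normal-theory lm() function. The following is a minimal base R sketch (the object names are illustrative); the coefficient estimates should match those in listings 12.2 and 12.3, because the permutation approach changes how p-values are obtained, not how the coefficients are estimated.

```r
# Normal-theory fits of the same two models, for comparison with the lmp() output
fit_linear <- lm(weight ~ height, data = women)
fit_quadratic <- lm(weight ~ height + I(height^2), data = women)

# Coefficient tables with columns Estimate, Std. Error, t value, Pr(>|t|)
linear_table <- coef(summary(fit_linear))
quadratic_table <- coef(summary(fit_quadratic))
print(round(linear_table, 4))
print(round(quadratic_table, 4))
```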

12.3.2 Multiple regression

In chapter 8, multiple regression was used to predict the murder rate based on population, illiteracy, income, and frost for 50 US states. Applying the lmp() function to this problem results in the following output.

Listing 12.4 Permutation tests for multiple regression

> library(lmPerm)
> set.seed(1234)
> states <- as.data.frame(state.x77)
> fit <- lmp(Murder~Population + Illiteracy+Income+Frost,
             data=states, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> summary(fit)

Call:
lmp(formula = Murder ~ Population + Illiteracy + Income + Frost,
    data = states, perm = "Prob")

Residuals:
     Min       1Q   Median       3Q      Max
-4.79597 -1.64946 -0.08112  1.48150  7.62104

Coefficients:
            Estimate Iter Pr(Prob)
Population 2.237e-04   51   1.0000
Illiteracy 4.143e+00 5000   0.0004 ***
Income     6.442e-05   51   1.0000
Frost      5.813e-04   51   0.8627
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.535 on 45 degrees of freedom
Multiple R-Squared: 0.567, Adjusted R-squared: 0.5285
F-statistic: 14.73 on 4 and 45 DF, p-value: 9.133e-08

Looking back to chapter 8, both Population and Illiteracy are significant (p < 0.05) when normal theory is used. Based on the permutation tests, the Population variable is no longer significant. When the two approaches don’t agree, you should look at your data more carefully. It may be that the assumption of normality is untenable or that outliers are present.
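One way to start looking at the data more carefully is to refit the model with lm() and inspect its residuals. The sketch below uses only base R; the choice of the Shapiro-Wilk test and of standardized residuals is an illustrative assumption, not a procedure prescribed by the text.

```r
states <- as.data.frame(state.x77)
fit_lm <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

# Normal-theory p-values for the four predictors (drop the intercept row)
normal_p <- coef(summary(fit_lm))[-1, "Pr(>|t|)"]
print(round(normal_p, 4))

# Shapiro-Wilk test of the residuals: a small p-value casts doubt on normality
normality_check <- shapiro.test(residuals(fit_lm))
print(normality_check)

# The observation with the largest absolute standardized residual
# is a candidate outlier worth examining
std_resid <- rstandard(fit_lm)
most_extreme <- names(which.max(abs(std_resid)))
print(most_extreme)
```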

12.3.3 One-way ANOVA and ANCOVA

Each of the analysis of variance designs discussed in chapter 9 can be performed via permutation tests. First, let’s look at the one-way ANOVA problem considered in section 9.1 on the impact of treatment regimens on cholesterol reduction. The code and results are given in the next listing.

Listing 12.5 Permutation test for one-way ANOVA

> library(lmPerm)
> library(multcomp)
> set.seed(1234)
> fit <- aovp(response~trt, data=cholesterol, perm="Prob")
[1] "Settings: unique SS "
> anova(fit)
Component 1 :
            Df R Sum Sq R Mean Sq Iter  Pr(Prob)
trt          4  1351.37    337.84 5000 < 2.2e-16 ***
Residuals   45   468.75     10.42
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results suggest that the treatment effects are not all equal.
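The logic of a permutation test for a one-way design can be sketched by hand in base R. This is an illustration of the general idea rather than a reproduction of aovp(): it uses the built-in PlantGrowth data as a stand-in (the cholesterol data require the multcomp package), a fixed number of random shuffles, and a hypothetical helper named f_statistic().

```r
# Hypothetical helper: one-way F statistic computed directly from group means
f_statistic <- function(y, g) {
  k <- nlevels(g)
  fitted_means <- ave(y, g)                        # each observation's group mean
  ss_between <- sum((fitted_means - mean(y))^2)
  ss_within <- sum((y - fitted_means)^2)
  (ss_between / (k - 1)) / (ss_within / (length(y) - k))
}

set.seed(1234)
y <- PlantGrowth$weight
g <- PlantGrowth$group
observed_f <- f_statistic(y, g)

# Shuffle the responses across groups and recompute F each time
n_perm <- 2000
perm_f <- replicate(n_perm, f_statistic(sample(y), g))

# Share of shuffled data sets with an F at least as large as the observed one
perm_p <- mean(perm_f >= observed_f)
print(c(F = observed_f, p = perm_p))
```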

The second example in this section applies a permutation test to a one-way analysis of covariance. The problem is from chapter 9, where you investigated the impact of four drug doses on the litter weights of rats, controlling for gestation times. The next listing shows the permutation test and results.

Listing 12.6 Permutation test for one-way ANCOVA

> library(lmPerm)
> library(multcomp)   # provides the litter dataset
> set.seed(1234)
> fit <- aovp(weight ~ gesttime + dose, data=litter, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> anova(fit)
Component 1 :
            Df R Sum Sq R Mean Sq Iter Pr(Prob)
gesttime     1   161.49   161.493 5000   0.0006 ***
dose         3   137.12    45.708 5000   0.0392 *
Residuals   69  1151.27    16.685
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the p-values, the four drug doses don’t equally impact litter weights, controlling for gestation time.

12.3.4 Two-way ANOVA

You’ll end this section by applying permutation tests to a factorial design. In chapter 9, you examined the impact of vitamin C on the tooth growth in guinea pigs. The two manipulated factors were dose (three levels) and delivery method (two levels). Ten guinea pigs were placed in each treatment combination, resulting in a balanced 3 × 2 factorial design. The permutation tests are provided in the next listing.

Listing 12.7 Permutation test for two-way ANOVA

> library(lmPerm)
> set.seed(1234)
> fit <- aovp(len~supp*dose, data=ToothGrowth, perm="Prob")
[1] "Settings: unique SS : numeric variables centered"
> anova(fit)
Component 1 :
            Df R Sum Sq R Mean Sq Iter Pr(Prob)
supp         1   205.35    205.35 5000  < 2e-16 ***
dose         1  2224.30   2224.30 5000  < 2e-16 ***
supp:dose    1    88.92     88.92 2032  0.04724 *
Residuals   56   933.63     16.67
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

At the .05 level of significance, all three effects are statistically different from zero. At the .01 level, only the main effects are significant.
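The single degree of freedom for dose in listing 12.7 suggests that dose entered the model as a numeric variable rather than as a three-level factor, since ToothGrowth stores dose as a number. The following base R sketch uses the parametric aov() function (an assumption made so that lmPerm is not required) to show how the degrees of freedom change under the two codings.

```r
# dose is stored as a number in ToothGrowth, so it enters the model as one slope
fit_numeric <- aov(len ~ supp * dose, data = ToothGrowth)
df_numeric <- summary(fit_numeric)[[1]][["Df"]]

# Converting dose to a factor treats it as three unordered levels instead
tooth <- transform(ToothGrowth, dose = factor(dose))
fit_factor <- aov(len ~ supp * dose, data = tooth)
df_factor <- summary(fit_factor)[[1]][["Df"]]

# Order of entries: supp, dose, supp:dose, Residuals
print(df_numeric)
print(df_factor)
```

If a three-level treatment of dose is wanted, the same factor() conversion could presumably be applied before calling aovp().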

It’s important to note that when aovp() is applied to ANOVA designs, it defaults to unique sums of squares (also called SAS Type III sums of squares). Each effect is adjusted for every other effect. The default for parametric ANOVA designs in R is sequential sums of squares (SAS Type I sums of squares). Each effect is adjusted for those that appear earlier in the model. For balanced designs, the two approaches will agree, but for unbalanced designs with unequal numbers of observations per cell, they won’t. The greater the imbalance, the greater the disagreement. If desired, specifying seqs=TRUE in the aovp() function will produce sequential sums of squares. For more on Type I and Type III sums of squares, see section 9.2.
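The difference between sequential and unique sums of squares can be seen with base R alone. The sketch below is a simplified illustration under stated assumptions: it uses a main-effects model only, makes ToothGrowth unbalanced by dropping seven rows, and relies on anova() for sequential sums of squares and drop1() for sums of squares adjusted for all other terms. The helper seq_ss() is a hypothetical name.

```r
tg <- transform(ToothGrowth, dose = factor(dose))

# Hypothetical helper: sequential (Type I) sums of squares as a named vector
seq_ss <- function(formula, data) {
  tab <- anova(lm(formula, data = data))
  setNames(tab[["Sum Sq"]], rownames(tab))
}

# Balanced design: the sum of squares for supp does not depend on term order
balanced_first <- seq_ss(len ~ supp + dose, tg)["supp"]
balanced_last <- seq_ss(len ~ dose + supp, tg)["supp"]

# Unbalanced design: drop seven observations from one cell
tg_unbal <- tg[-(1:7), ]
unbal_first <- seq_ss(len ~ supp + dose, tg_unbal)["supp"]
unbal_last <- seq_ss(len ~ dose + supp, tg_unbal)["supp"]

# Unique sum of squares for supp: adjusted for every other term in the model
unique_supp <- drop1(lm(len ~ supp + dose, data = tg_unbal),
                     test = "F")["supp", "Sum of Sq"]

print(c(balanced_first, balanced_last))
print(c(unbal_first, unbal_last, unique = unique_supp))
```

In the unbalanced case, the sequential sum of squares for supp matches the unique one only when supp is entered last, which is the sense in which each effect is adjusted for those that appear earlier in the model.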
