Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
R in Action, Second Edition.pdf
Скачиваний:
540
Добавлен:
26.03.2016
Размер:
20.33 Mб
Скачать

Fitting ANOVA models

215

anxiety changed from week five to the six-month follow-up. A significant Therapy × Time interaction indicates that the two treatments for anxiety had a differential impact over time (that is, the change in anxiety from five weeks to six months was different for the two treatments).

Now let’s extend the design a bit. It’s known that depression can have an impact on therapy, and that depression and anxiety often co-occur. Even though subjects were randomly assigned to treatment conditions, it’s possible that the two therapy groups differed in patient depression levels at the initiation of the study. Any posttherapy differences might then be due to the preexisting depression differences and not to your experimental manipulation. Because depression could also explain the group differences on the dependent variable, it’s a confounding factor. And because you’re not interested in depression, it’s called a nuisance variable.

If you recorded depression levels using a self-report depression measure such as the Beck Depression Inventory (BDI) when patients were recruited, you could statistically adjust for any treatment group differences in depression before assessing the impact of therapy type. In this case, BDI would be called a covariate, and the design would be called an analysis of covariance (ANCOVA).

Finally, you’ve recorded a single dependent variable in this study (the STAI). You could increase the validity of this study by including additional measures of anxiety (such as family ratings, therapist ratings, and a measure assessing the impact of anxiety on their daily functioning). When there’s more than one dependent variable, the design is called a multivariate analysis of variance (MANOVA). If there are covariates present, it’s called a multivariate analysis of covariance (MANCOVA).

Now that you have the basic terminology under your belt, you’re ready to amaze your friends, dazzle new acquaintances, and learn how to fit ANOVA/ANCOVA/ MANOVA models with R.

9.2Fitting ANOVA models

Although ANOVA and regression methodologies developed separately, functionally they’re both special cases of the general linear model. You could analyze ANOVA models using the same lm() function used for regression in chapter 7. But you’ll primarily use the aov() function in this chapter. The results of lm() and aov() are equivalent, but the aov() function presents these results in a format that’s more familiar to ANOVA methodologists. For completeness, I’ll provide an example using lm() at the end of this chapter.

9.2.1The aov() function

The syntax of the aov() function is aov(formula, data=dataframe). Table 9.4 describes special symbols that can be used in the formulas. In this table, y is the dependent variable and the letters A, B, and C represent factors.

216

 

CHAPTER 9 Analysis of variance

 

Table 9.4 Special symbols used in R formulas

 

 

 

 

Symbol

Usage

 

 

 

~Separates response variables on the left from the explanatory variables on the right. For example, a prediction of y from A, B, and C would be coded

y ~ A + B + C

:Denotes an interaction between variables. A prediction of y from A, B, and the interaction between A and B would be coded

y ~ A + B + A:B

*

Denotes the complete crossing variables. The code y ~ A*B*C expands to

 

y ~ A + B + C + A:B + A:C + B:C + A:B:C

^Denotes crossing to a specified degree. The code y ~ (A+B+C)^2 expands to y ~ A + B + C + A:B + A:C + A:B

. Denotes all remaining variables. The code y ~ . expands to

y ~ A + B + C

Table 9.5 provides formulas for several common research designs. In this table, lowercase letters are quantitative variables, uppercase letters are grouping factors, and Subject is a unique identifier variable for subjects.

Table 9.5 Formulas for common research designs

Design

Formula

 

 

One-way ANOVA

y ~ A

One-way ANCOVA with 1 covariate

y ~ x + A

Two-way factorial ANOVA

y ~ A * B

Two-way factorial ANCOVA with 2 covariates

y ~ x1 + x2 + A * B

Randomized block

y ~ B + A (where B is a blocking factor)

One-way within-groups ANOVA

y ~ A + Error(Subject/A)

Repeated measures ANOVA with 1 within-groups

y ~ B * W + Error(Subject/W)

factor (W) and 1 between-groups factor (B)

 

 

 

We’ll explore in-depth examples of several of these designs later in this chapter.

9.2.2The order of formula terms

The order in which the effects appear in a formula matters when (a) there’s more than one factor and the design is unbalanced, or (b) covariates are present. When either of these two conditions is present, the variables on the right side of the equation will be correlated with each other. In this case, there’s no unambiguous way to divide up their impact on the dependent variable. For example, in a two-way ANOVA

Fitting ANOVA models

217

with unequal numbers of observations in the treatment combinations, the model y ~ A*B will not produce the same results as the model y ~ B*A.

By default, R employs the Type I (sequential) approach to calculating ANOVA effects (see the sidebar “Order counts!”). The first model can be written as y ~ A + B

+A:B. The resulting R ANOVA table will assess

The impact of A on y

The impact of B on y, controlling for A

The interaction of A and B, controlling for the A and B main effects

Order counts!

When independent variables are correlated with each other or with covariates, there’s no unambiguous method for assessing the independent contributions of these variables to the dependent variable. Consider an unbalanced two-way factorial design with factors A and B and dependent variable y. There are three effects in this design: the A and B main effects and the A × B interaction. Assuming that you’re modeling the data using the formula

Y ~ A + B + A:B

there are three typical approaches for partitioning the variance in y among the effects on the right side of this equation.

TYPE I (SEQUENTIAL)

Effects are adjusted for those that appear earlier in the formula. A is unadjusted. B is adjusted for the A. The A:B interaction is adjusted for A and B.

TYPE II (HIERARCHICAL)

Effects are adjusted for other effects at the same or lower level. A is adjusted for B. B is adjusted for A. The A:B interaction is adjusted for both A and B.

TYPE III (MARGINAL)

Each effect is adjusted for every other effect in the model. A is adjusted for B and A:B. B is adjusted for A and A:B. The A:B interaction is adjusted for A and B.

R employs the Type I approach by default. Other programs such as SAS and SPSS employ the Type III approach by default.

The greater the imbalance in sample sizes, the greater the impact that the order of the terms will have on the results. In general, more fundamental effects should be listed earlier in the formula. In particular, covariates should be listed first, followed by main effects, followed by two-way interactions, followed by three-way interactions, and so on. For main effects, more fundamental variables should be listed first. Thus gender would be listed before treatment. Here’s the bottom line: when the research design isn’t orthogonal (that is, when the factors and/or covariates are correlated), be careful when specifying the order of effects.

218

CHAPTER 9 Analysis of variance

Before moving on to specific examples, note that the Anova() function in the car package (not to be confused with the standard anova() function) provides the option of using the Type II or Type III approach, rather than the Type I approach used by the aov() function. You may want to use the Anova() function if you’re concerned about matching your results to those provided by other packages such as SAS and SPSS. See help(Anova, package="car") for details.

9.3One-way ANOVA

In a one-way ANOVA, you’re interested in comparing the dependent variable means of two or more groups defined by a categorical grouping factor. This example comes from the cholesterol dataset in the multcomp package, taken from Westfall, Tobia, Rom, & Hochberg (1999). Fifty patients received one of five cholesterol-reducing drug regimens (trt). Three of the treatment conditions involved the same drug administered as 20 mg once per day (1time), 10mg twice per day (2times), or 5 mg four times per day (4times). The two remaining conditions (drugD and drugE) represented competing drugs. Which drug regimen produced the greatest cholesterol reduction (response)? The analysis is provided in the following listing.

Listing 9.1 One-way ANOVA

> library(multcomp)

 

 

b Group sample sizes

> attach(cholesterol)

 

 

> table(trt)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

trt

 

 

 

 

 

 

 

 

 

1time 2times 4times

drugD

drugE

 

10

10

10

10

10

 

c Group means

 

 

 

 

 

 

 

 

> aggregate(response, by=list(trt), FUN=mean)

 

 

 

 

Group.1

x

 

 

 

 

 

 

 

1

1time

5.78

 

 

 

 

 

 

 

2

2times

9.22

 

 

 

 

 

 

 

3

4times 12.37

 

 

 

 

 

 

 

4

drugD 15.36

 

 

 

 

 

 

 

5

drugE 20.95

 

 

 

 

 

 

 

d Group standard deviations

> aggregate(response, by=list(trt), FUN=sd) Group.1 x

11time 2.88

22times 3.48

34times 2.92

4drugD 3.45

5drugE 3.35

> fit <- aov(response ~ trt)

 

 

e Tests for group

> summary(fit)

 

 

 

 

differences (ANOVA)

 

Df Sum Sq

Mean Sq

F value

Pr(>F)

trt

4

1351

338

32.4

9.8e-13 ***

Residuals

45

469

10

 

 

 

---

 

 

 

 

 

 

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]