The GLM Procedure

Class Level Information

Class      Levels    Values
origin          2    A N
sex             2    F M
grade           4    F0 F1 F2 F3
type            2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                Sum of
Source             DF          Squares    Mean Square   F Value   Pr > F
Model               6       4953.56458      825.59410      3.60   0.0023
Error             147      33752.57179      229.60933
Corrected Total   153      38706.13636

R-Square     Coeff Var      Root MSE     days Mean
0.127979      93.90508      15.15287      16.13636

Source             DF        Type I SS    Mean Square   F Value   Pr > F
origin              1      2645.652580    2645.652580     11.52   0.0009
sex                 1       338.877090     338.877090      1.48   0.2264
grade               3      1837.020006     612.340002      2.67   0.0500
type                1       132.014900     132.014900      0.57   0.4495

Source             DF      Type III SS    Mean Square   F Value   Pr > F
origin              1      2403.606653    2403.606653     10.47   0.0015
sex                 1       185.647389     185.647389      0.81   0.3700
grade               3      1917.449682     639.149894      2.78   0.0430
type                1       132.014900     132.014900      0.57   0.4495

©2002 CRC Press LLC

The GLM Procedure

Class Level Information

Class      Levels    Values
origin          2    A N
sex             2    F M
grade           4    F0 F1 F2 F3
type            2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                Sum of
Source             DF          Squares    Mean Square   F Value   Pr > F
Model               6       4953.56458      825.59410      3.60   0.0023
Error             147      33752.57179      229.60933
Corrected Total   153      38706.13636

R-Square     Coeff Var      Root MSE     days Mean
0.127979      93.90508      15.15287      16.13636

Source             DF        Type I SS    Mean Square   F Value   Pr > F
grade               3      2277.172541     759.057514      3.31   0.0220
sex                 1       124.896018     124.896018      0.54   0.4620
type                1       147.889364     147.889364      0.64   0.4235
origin              1      2403.606653    2403.606653     10.47   0.0015


The GLM Procedure

Class Level Information

Class      Levels    Values
origin          2    A N
sex             2    F M
grade           4    F0 F1 F2 F3
type            2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                Sum of
Source             DF          Squares    Mean Square   F Value   Pr > F
Model               6       4953.56458      825.59410      3.60   0.0023
Error             147      33752.57179      229.60933
Corrected Total   153      38706.13636

R-Square     Coeff Var      Root MSE     days Mean
0.127979      93.90508      15.15287      16.13636

Source             DF        Type I SS    Mean Square   F Value   Pr > F
type                1        19.502391      19.502391      0.08   0.7711
sex                 1       336.215409     336.215409      1.46   0.2282
origin              1      2680.397094    2680.397094     11.67   0.0008
grade               3      1917.449682     639.149894      2.78   0.0430


The GLM Procedure

Class Level Information

Class      Levels    Values
origin          2    A N
sex             2    F M
grade           4    F0 F1 F2 F3
type            2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                Sum of
Source             DF          Squares    Mean Square   F Value   Pr > F
Model               6       4953.56458      825.59410      3.60   0.0023
Error             147      33752.57179      229.60933
Corrected Total   153      38706.13636

R-Square     Coeff Var      Root MSE     days Mean
0.127979      93.90508      15.15287      16.13636

Source             DF        Type I SS    Mean Square   F Value   Pr > F
sex                 1       308.062554     308.062554      1.34   0.2486
origin              1      2676.467116    2676.467116     11.66   0.0008
type                1        51.585224      51.585224      0.22   0.6362
grade               3      1917.449682     639.149894      2.78   0.0430

Display 6.2
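The four runs in Display 6.2 differ only in the order of the terms in the model statement. The short Python sketch below uses figures copied from the display to check two properties of Type I (sequential) sums of squares:

- In every order they add up to the same model sum of squares.
- The term entered last has a Type I sum of squares equal to its Type III value from the first run. In this main-effects-only model, entering a term last adjusts it for all the others.

It also recomputes each F value as the term's mean square divided by the error mean square. The variable and function names are hypothetical and chosen only for illustration.

```python
# Figures copied from the four main-effects runs in Display 6.2
MODEL_SS = 4953.56458
ERROR_MS = 229.60933  # error mean square on 147 df, the same in every run

# Each run: (term, DF, Type I SS) in the order the terms were entered
RUNS = [
    [("origin", 1, 2645.652580), ("sex", 1, 338.877090),
     ("grade", 3, 1837.020006), ("type", 1, 132.014900)],
    [("grade", 3, 2277.172541), ("sex", 1, 124.896018),
     ("type", 1, 147.889364), ("origin", 1, 2403.606653)],
    [("type", 1, 19.502391), ("sex", 1, 336.215409),
     ("origin", 1, 2680.397094), ("grade", 3, 1917.449682)],
    [("sex", 1, 308.062554), ("origin", 1, 2676.467116),
     ("type", 1, 51.585224), ("grade", 3, 1917.449682)],
]

# Type III SS printed with the first run
TYPE_III = {"origin": 2403.606653, "sex": 185.647389,
            "grade": 1917.449682, "type": 132.014900}


def summarise(run):
    """Return the summed Type I SS, the last term entered, its SS, and each F value."""
    total = sum(ss for _, _, ss in run)
    last_term, _, last_ss = run[-1]
    f_values = {term: (ss / df) / ERROR_MS for term, df, ss in run}
    return total, last_term, last_ss, f_values


for run in RUNS:
    total, last_term, last_ss, f_values = summarise(run)
    # The sequential SS always add up to the model SS, whatever the order
    assert abs(total - MODEL_SS) < 1e-4
    # The term entered last is adjusted for all the others, so here Type I = Type III
    assert abs(last_ss - TYPE_III[last_term]) < 1e-6
    print([term for term, _, _ in run],
          {term: round(f, 2) for term, f in f_values.items()})
```

The printed F values can be compared with the F Value column of each run.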


Next we fit a full factorial model to the data as follows:

proc glm data=ozkids;
  class origin sex grade type;
  model days=origin sex grade type origin|sex|grade|type /ss1 ss3;
run;

Joining variable names with a bar is a shorthand way of specifying an interaction together with all the lower-order interactions and main effects implied by it. This not only saves typing but also ensures that relevant terms are not inadvertently omitted from the model. Here we have explicitly specified the main effects so that they are entered before any interaction terms when calculating Type I sums of squares.
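The expansion rule can be sketched in Python. Under our reading of the bar operator, each new factor is added, followed by its crossing with every effect already generated, so `A|B|C` becomes `A B A*B C A*C B*C A*B*C`. The model statement above lists the main effects first and then the bar expression. Keeping only the first occurrence of each term reproduces the order of the fifteen effects in the Type I table of Display 6.3. The function names are hypothetical; this illustrates the ordering and is not SAS's own parser.

```python
def expand_bar(factors):
    """Sketch of how A|B|C... expands: after the effects built so far, add the new
    factor and then its crossing with each earlier effect, in that order."""
    effects = []
    for factor in factors:
        # The right-hand side is built from the current effects before extending
        effects += [factor] + [effect + "*" + factor for effect in effects]
    return effects


def model_terms(explicit, bar_factors):
    """Explicit terms first, then the bar expansion, keeping the first occurrence of each."""
    terms = []
    for term in explicit + expand_bar(bar_factors):
        if term not in terms:
            terms.append(term)
    return terms


factors = ["origin", "sex", "grade", "type"]
for term in model_terms(factors, factors):
    print(term)
```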

The output is shown in Display 6.3. Note first that the only Type I and Type III sums of squares that agree are those for the origin*sex*grade*type interaction. Now consider the origin main effect. The Type I sum of squares for origin is “corrected” only for the mean because origin is the first term in the model statement, and the effect is highly significant. But using Type III sums of squares, in which the origin effect is corrected for all other main effects and interactions, the corresponding F value has an associated P-value of 0.2736. Now origin is judged nonsignificant, but this may simply reflect the loss of power after “adjusting” for a lot of relatively unimportant interaction terms.
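The reason the order matters can be seen on a small scale. The Python sketch below uses a hypothetical unbalanced two-factor layout of seven made-up observations, not the ozkids data. It computes Type I sums of squares as the drop in residual sum of squares when each term is added to those already in the model. The least-squares fits are obtained by solving the normal equations with Gauss-Jordan elimination.

Because the cell counts are unequal, the two dummy-coded factors are correlated. The sum of squares credited to A therefore depends on whether B is already present, although both orders give the same total model sum of squares. In this additive two-factor model, the value for A entered after B is also what a Type III analysis would report for A.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


def rss(columns, y):
    """Residual SS after a least-squares fit of y on an intercept plus the given columns."""
    x = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(x[0])
    xtx = [[sum(row[p] * row[q] for row in x) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(x, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in x]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical unbalanced 2 x 2 layout (not the ozkids data); a and b are 0/1 dummy codes
a = [0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 0, 1, 1, 1]
y = [3, 5, 8, 6, 10, 12, 11]

rss_mean = rss([], y)  # intercept only
rss_a, rss_b = rss([a], y), rss([b], y)
rss_ab = rss([a, b], y)

# Type I SS: reduction in residual SS when a term joins the terms already in the model
a_first, b_after_a = rss_mean - rss_a, rss_a - rss_ab
b_first, a_after_b = rss_mean - rss_b, rss_b - rss_ab

print(f"A entered first: {a_first:.3f}   A entered after B: {a_after_b:.3f}")
print(f"B entered first: {b_first:.3f}   B entered after A: {b_after_a:.3f}")
print(f"Model SS either way: {a_first + b_after_a:.3f} = {b_first + a_after_b:.3f}")
```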

Arriving at a final model for these data is not straightforward (see Aitkin [1978] for some suggestions), and the issue is not pursued here because the data set will be the subject of further analyses in Chapter 9. However, some of the exercises encourage readers to try some alternative analyses of variance.

The GLM Procedure

Class Level Information

Class      Levels    Values
origin          2    A N
sex             2    F M
grade           4    F0 F1 F2 F3
type            2    AL SL

Number of observations 154


The GLM Procedure

Dependent Variable: days

                                      Sum of
Source                   DF          Squares    Mean Square   F Value   Pr > F
Model                    31      15179.41930      489.65869      2.54   0.0002
Error                   122      23526.71706      192.84194
Corrected Total         153      38706.13636

R-Square     Coeff Var      Root MSE     days Mean
0.392171      86.05876      13.88675      16.13636

Source                   DF        Type I SS    Mean Square   F Value   Pr > F
origin                    1      2645.652580    2645.652580     13.72   0.0003
sex                       1       338.877090     338.877090      1.76   0.1874
grade                     3      1837.020006     612.340002      3.18   0.0266
type                      1       132.014900     132.014900      0.68   0.4096
origin*sex                1       142.454554     142.454554      0.74   0.3918
origin*grade              3      3154.799178    1051.599726      5.45   0.0015
sex*grade                 3      2009.479644     669.826548      3.47   0.0182
origin*sex*grade          3       226.309848      75.436616      0.39   0.7596
origin*type               1        38.572890      38.572890      0.20   0.6555
sex*type                  1        69.671759      69.671759      0.36   0.5489
origin*sex*type           1       601.464327     601.464327      3.12   0.0799
grade*type                3      2367.497717     789.165906      4.09   0.0083
origin*grade*type         3       887.938926     295.979642      1.53   0.2089
sex*grade*type            3       375.828965     125.276322      0.65   0.5847
origi*sex*grade*type      3       351.836918     117.278973      0.61   0.6109


Source                   DF      Type III SS    Mean Square   F Value   Pr > F
origin                    1       233.201138     233.201138      1.21   0.2736
sex                       1       344.037143     344.037143      1.78   0.1841
grade                     3      1036.595762     345.531921      1.79   0.1523
type                      1       181.049753     181.049753      0.94   0.3345
origin*sex                1         3.261543       3.261543      0.02   0.8967
origin*grade              3      1366.765758     455.588586      2.36   0.0746
sex*grade                 3      1629.158563     543.052854      2.82   0.0420
origin*sex*grade          3        32.650971      10.883657      0.06   0.9823
origin*type               1        55.378055      55.378055      0.29   0.5930
sex*type                  1         1.158990       1.158990      0.01   0.9383
origin*sex*type           1       337.789437     337.789437      1.75   0.1881
grade*type                3      2037.872725     679.290908      3.52   0.0171
origin*grade*type         3       973.305369     324.435123      1.68   0.1743
sex*grade*type            3       410.577832     136.859277      0.71   0.5480
origi*sex*grade*type      3       351.836918     117.278973      0.61   0.6109

Display 6.3
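Several of the statements made about Display 6.3 can be checked directly against the printed figures. The Python sketch below copies them from the display; the names in the code are hypothetical. It checks three things:

- The fifteen Type I sums of squares add up to the model sum of squares on 31 degrees of freedom.
- Only the four-way interaction has equal Type I and Type III values.
- Each F value is the term's mean square divided by the error mean square.

The Type III sums of squares are not expected to add up to the model sum of squares. No P-values are computed here, since that would need an F-distribution routine outside the Python standard library.

```python
# Figures copied from Display 6.3 (full factorial model)
SSE, DF_ERROR = 23526.71706, 122
MODEL_SS, MODEL_DF = 15179.41930, 31

# term: (DF, Type I SS, Type III SS); the last label is truncated exactly as printed
EFFECTS = {
    "origin": (1, 2645.652580, 233.201138),
    "sex": (1, 338.877090, 344.037143),
    "grade": (3, 1837.020006, 1036.595762),
    "type": (1, 132.014900, 181.049753),
    "origin*sex": (1, 142.454554, 3.261543),
    "origin*grade": (3, 3154.799178, 1366.765758),
    "sex*grade": (3, 2009.479644, 1629.158563),
    "origin*sex*grade": (3, 226.309848, 32.650971),
    "origin*type": (1, 38.572890, 55.378055),
    "sex*type": (1, 69.671759, 1.158990),
    "origin*sex*type": (1, 601.464327, 337.789437),
    "grade*type": (3, 2367.497717, 2037.872725),
    "origin*grade*type": (3, 887.938926, 973.305369),
    "sex*grade*type": (3, 375.828965, 410.577832),
    "origi*sex*grade*type": (3, 351.836918, 351.836918),
}

MSE = SSE / DF_ERROR  # error mean square, printed as 192.84194


def f_value(ss, df):
    """F statistic for a term: its mean square divided by the error mean square."""
    return (ss / df) / MSE


type1_total = sum(ss1 for _, ss1, _ in EFFECTS.values())
agree = [term for term, (_, ss1, ss3) in EFFECTS.items() if abs(ss1 - ss3) < 1e-6]

print(f"sum of Type I SS = {type1_total:.5f} (Model SS {MODEL_SS})")
print("Type I = Type III only for:", agree)
for term in ("origin", "grade*type"):
    df, ss1, ss3 = EFFECTS[term]
    print(f"{term}: F(Type I) = {f_value(ss1, df):.2f}, F(Type III) = {f_value(ss3, df):.2f}")
```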

Exercises

6.1 Investigate simpler models for the data used in this chapter by dropping interactions or sets of interactions from the full factorial model fitted in the text. Try several different orders of effects.

6.2 The outcome for the data in this chapter, the number of days absent, is a count variable. Consequently, assuming normally distributed errors may not be entirely appropriate, as we will see in Chapter 9. Here, however, we might deal with this potential problem by way of a transformation. One possibility is a log transformation. Investigate this possibility.

6.3 Find a table of cell means and standard deviations for the data used in this chapter.

6.4 Construct a normal probability plot of the residuals from fitting a main-effects-only model to the data used in this chapter. Comment on the results.


Chapter 7

Analysis of Variance of Repeated Measures: Visual Acuity

7.1 Description of Data

The data used in this chapter are taken from Table 397 of SDS and are reproduced in Display 7.1. Seven subjects had their response times measured when a light was flashed into each eye through lenses of powers 6/6, 6/18, 6/36, and 6/60. Measurements are in milliseconds, and the question of interest was whether or not the response time varied with lens strength. (A lens of power a/b means that an object actually positioned at “b” feet is perceived by the eye as being at “a” feet.)

7.2 Repeated Measures Data

The observations in Display 7.1 involve repeated measures. Such data arise often, particularly in the behavioural sciences and related disciplines, and involve recording the value of a response variable for each subject under more than one condition and/or on more than one occasion.


Visual Acuity and Lens Strength

               Left Eye                  Right Eye
Subject   6/6  6/18  6/36  6/60      6/6  6/18  6/36  6/60
      1   116   119   116   124      120   117   114   122
      2   110   110   114   115      106   112   110   110
      3   117   118   120   120      120   120   120   124
      4   112   116   115   113      115   116   116   119
      5   113   114   114   118      114   117   116   112
      6   119   115    94   116      100    99    94    97
      7   110   110   105   118      105   105   115   115

Display 7.1

Researchers typically adopt the repeated measures paradigm as a means of reducing error variability and/or as the natural way of measuring certain phenomena (e.g., developmental changes over time, learning and memory tasks). In this type of design, the effects of the experimental factors giving rise to the repeated measures are assessed relative to the average response made by a subject across all conditions or occasions. In essence, each subject serves as his or her own control, so variability due to differences in the average responsiveness of the subjects is eliminated from the extraneous error variance. As a consequence, the power to detect the effects of within-subjects experimental factors is greater than it would be in a between-subjects design.
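The idea that each subject serves as his or her own control can be made concrete with the left-eye readings in Display 7.1. The Python sketch below splits the total sum of squares into two parts:

- a between-subjects part, the variation among the seven subject means;
- a within-subjects part, the variation of each subject's readings about his or her own mean.

In a repeated measures analysis the lens effect is judged against within-subjects variation only, so the between-subjects part no longer contributes to the error term. Using the left eye alone is our simplification to keep the example small, and the function name is hypothetical.

```python
# Left-eye response times (ms) from Display 7.1: rows = subjects 1-7,
# columns = lens strengths 6/6, 6/18, 6/36, 6/60
LEFT = [
    [116, 119, 116, 124],
    [110, 110, 114, 115],
    [117, 118, 120, 120],
    [112, 116, 115, 113],
    [113, 114, 114, 118],
    [119, 115, 94, 116],
    [110, 110, 105, 118],
]


def partition_ss(rows):
    """Split the total SS into between-subject and within-subject components."""
    values = [v for row in rows for v in row]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    # Between subjects: spread of subject means, weighted by the number of conditions
    ss_between = sum(len(row) * (sum(row) / len(row) - grand) ** 2 for row in rows)
    # Within subjects: spread of each subject's readings about that subject's own mean
    ss_within = sum((v - sum(row) / len(row)) ** 2 for row in rows for v in row)
    return ss_total, ss_between, ss_within


ss_total, ss_between, ss_within = partition_ss(LEFT)
print(f"total {ss_total:.2f} = between subjects {ss_between:.2f} "
      f"+ within subjects {ss_within:.2f}")
```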

Unfortunately, the advantages of a repeated measures design come at a cost: the probable lack of independence of the repeated measurements. Observations made under different conditions on the same subjects are very likely to be correlated rather than independent. This violates one of the assumptions of the analysis of variance procedures described in Chapters 5 and 6, and accounting for the dependence between observations in a repeated measures design requires some thought. (In the visual acuity example, only within-subject factors occur, and it is possible, indeed likely, that the lens strengths under which a subject was observed were given in random order. However, in examples where time is the single within-subject factor, randomisation is not, of course, an option. This makes studies in which subjects are simply observed over time rather different from other repeated measures designs, and they are often given a different label: longitudinal designs. Owing to their different nature, we consider them specifically in Chapters 10 and 11.)


7.3 Analysis of Variance for Repeated Measures Designs

Despite the lack of independence of the observations made within subjects in a repeated measures design, it remains possible to use relatively straightforward analysis of variance procedures to analyse the data if three particular assumptions about the observations are valid:

1. Normality: the data arise from populations with normal distributions.

2. Homogeneity of variance: the variances of the assumed normal distributions are equal.

3. Sphericity: the variances of the differences between all pairs of the repeated measurements are equal. This condition holds, for example, when all pairs of repeated measures have equal correlations (together with the equal variances of assumption 2), the so-called compound symmetry pattern. Compound symmetry is sufficient for sphericity but not strictly necessary.
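The sphericity condition can be inspected informally by computing the variance of the differences for each pair of lens strengths. The Python sketch below does this for the left-eye readings of Display 7.1; restricting attention to the left eye is our choice, to keep the example small. Under sphericity the six variances would all estimate a common value. With only seven subjects they are noisy estimates, so this is a descriptive look rather than a formal test such as Mauchly's.

```python
from itertools import combinations
from statistics import variance

LENSES = ["6/6", "6/18", "6/36", "6/60"]
# Left-eye response times (ms) from Display 7.1; one row per subject
LEFT = [
    [116, 119, 116, 124],
    [110, 110, 114, 115],
    [117, 118, 120, 120],
    [112, 116, 115, 113],
    [113, 114, 114, 118],
    [119, 115, 94, 116],
    [110, 110, 105, 118],
]


def difference_variances(rows, labels):
    """Sample variance (n - 1 divisor) of within-subject differences for every pair of conditions."""
    result = {}
    for j, k in combinations(range(len(labels)), 2):
        diffs = [row[j] - row[k] for row in rows]
        result[(labels[j], labels[k])] = variance(diffs)
    return result


for (first, second), var in difference_variances(LEFT, LENSES).items():
    print(f"{first:>5} - {second:<5} variance of differences = {var:.2f}")
```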

It is the third assumption that is most critical for the validity of the analysis of variance F-tests. When the sphericity assumption is not regarded as likely, there are two alternatives to a simple analysis of variance: the use of correction factors and multivariate analysis of variance. All three possibilities will be considered in this chapter.

We begin by considering a simple model for the visual acuity observations, $y_{ijk}$, where $y_{ijk}$ represents the reaction time of the $i$th subject for eye $j$ and lens strength $k$. The model assumed is

$$y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \gamma_i + (\gamma\alpha)_{ij} + (\gamma\beta)_{ik} + (\gamma\alpha\beta)_{ijk} + \epsilon_{ijk} \qquad (7.1)$$

where $\alpha_j$ represents the effect of eye $j$, $\beta_k$ is the effect of the $k$th lens strength, and $(\alpha\beta)_{jk}$ is the eye × lens strength interaction. The term $\gamma_i$ is the effect associated with subject $i$, and $(\gamma\alpha)_{ij}$, $(\gamma\beta)_{ik}$, and $(\gamma\alpha\beta)_{ijk}$ represent the interaction effects of subject $i$ with each factor and with their interaction. The terms $\alpha_j$, $\beta_k$, and $(\alpha\beta)_{jk}$ are assumed to be fixed effects, but the subject and error terms are assumed to be random variables from normal distributions with zero means and variances specific to each term. This is an example of a mixed model.
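To see why a subject term produces equal correlations, here is a short derivation for a deliberately simplified version of (7.1). The simplification is ours: it keeps a single within-subject factor (lens strength, indexed by $k$) and a random subject effect $\gamma_i$ as the only random term besides the error $\epsilon_{ik}$. All random terms are assumed independent.

```latex
\begin{aligned}
y_{ik} &= \mu + \beta_k + \gamma_i + \epsilon_{ik},
  \qquad \gamma_i \sim N(0, \sigma_\gamma^2), \quad \epsilon_{ik} \sim N(0, \sigma^2) \\
\operatorname{Var}(y_{ik}) &= \sigma_\gamma^2 + \sigma^2 \\
\operatorname{Cov}(y_{ik}, y_{ik'}) &= \operatorname{Cov}(\gamma_i + \epsilon_{ik},\, \gamma_i + \epsilon_{ik'}) = \sigma_\gamma^2, \qquad k \neq k' \\
\operatorname{Corr}(y_{ik}, y_{ik'}) &= \frac{\sigma_\gamma^2}{\sigma_\gamma^2 + \sigma^2} \\
\operatorname{Var}(y_{ik} - y_{ik'}) &= \operatorname{Var}(\epsilon_{ik} - \epsilon_{ik'}) = 2\sigma^2
\end{aligned}
```

Neither the correlation nor the variance of a difference depends on which pair $k, k'$ is chosen. This simplified model therefore has the compound symmetry pattern and satisfies sphericity.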

Equal correlations between the repeated measures arise as a consequence of the subject effects in this model; and if this structure is valid, a relatively straightforward analysis of variance of the data can be used. However, when the investigator thinks the assumption of equal correlations is too strong, there are two alternatives that can be used:

