The GLM Procedure

Class Level Information

Class     Levels    Values
origin         2    A N
sex            2    F M
grade          4    F0 F1 F2 F3
type           2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                  Sum of
Source              DF           Squares    Mean Square    F Value    Pr > F
Model                6        4953.56458      825.59410       3.60    0.0023
Error              147       33752.57179      229.60933
Corrected Total    153       38706.13636

R-Square    Coeff Var    Root MSE    days Mean
0.127979     93.90508    15.15287     16.13636

Source    DF      Type I SS    Mean Square    F Value    Pr > F
origin     1    2645.652580    2645.652580      11.52    0.0009
sex        1     338.877090     338.877090       1.48    0.2264
grade      3    1837.020006     612.340002       2.67    0.0500
type       1     132.014900     132.014900       0.57    0.4495

Source    DF    Type III SS    Mean Square    F Value    Pr > F
origin     1    2403.606653    2403.606653      10.47    0.0015
sex        1     185.647389     185.647389       0.81    0.3700
grade      3    1917.449682     639.149894       2.78    0.0430
type       1     132.014900     132.014900       0.57    0.4495
©2002 CRC Press LLC
The GLM Procedure

Class Level Information

Class     Levels    Values
origin         2    A N
sex            2    F M
grade          4    F0 F1 F2 F3
type           2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                  Sum of
Source              DF           Squares    Mean Square    F Value    Pr > F
Model                6        4953.56458      825.59410       3.60    0.0023
Error              147       33752.57179      229.60933
Corrected Total    153       38706.13636

R-Square    Coeff Var    Root MSE    days Mean
0.127979     93.90508    15.15287     16.13636

Source    DF      Type I SS    Mean Square    F Value    Pr > F
grade      3    2277.172541     759.057514       3.31    0.0220
sex        1     124.896018     124.896018       0.54    0.4620
type       1     147.889364     147.889364       0.64    0.4235
origin     1    2403.606653    2403.606653      10.47    0.0015
The GLM Procedure

Class Level Information

Class     Levels    Values
origin         2    A N
sex            2    F M
grade          4    F0 F1 F2 F3
type           2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                  Sum of
Source              DF           Squares    Mean Square    F Value    Pr > F
Model                6        4953.56458      825.59410       3.60    0.0023
Error              147       33752.57179      229.60933
Corrected Total    153       38706.13636

R-Square    Coeff Var    Root MSE    days Mean
0.127979     93.90508    15.15287     16.13636

Source    DF      Type I SS    Mean Square    F Value    Pr > F
type       1      19.502391      19.502391       0.08    0.7711
sex        1     336.215409     336.215409       1.46    0.2282
origin     1    2680.397094    2680.397094      11.67    0.0008
grade      3    1917.449682     639.149894       2.78    0.0430
The GLM Procedure

Class Level Information

Class     Levels    Values
origin         2    A N
sex            2    F M
grade          4    F0 F1 F2 F3
type           2    AL SL

Number of observations 154

The GLM Procedure

Dependent Variable: days

                                  Sum of
Source              DF           Squares    Mean Square    F Value    Pr > F
Model                6        4953.56458      825.59410       3.60    0.0023
Error              147       33752.57179      229.60933
Corrected Total    153       38706.13636

R-Square    Coeff Var    Root MSE    days Mean
0.127979     93.90508    15.15287     16.13636

Source    DF      Type I SS    Mean Square    F Value    Pr > F
sex        1     308.062554     308.062554       1.34    0.2486
origin     1    2676.467116    2676.467116      11.66    0.0008
type       1      51.585224      51.585224       0.22    0.6362
grade      3    1917.449682     639.149894       2.78    0.0430
Display 6.2
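The four fits in Display 6.2 differ only in the order of the main effects in the model statement. As a quick check on the display itself (nothing is refitted), the Python sketch below re-keys the printed Type I sums of squares and confirms two properties: in every ordering the sequential sums of squares add up to the same model sum of squares, 4953.56458, and whichever term is entered last has a Type I sum of squares equal to its Type III value. The dictionary layout and function names are illustrative assumptions.

```python
# Type I (sequential) sums of squares copied from Display 6.2.
# Each dict lists the terms in the order they appear in that fit's model statement.
TYPE1_RUNS = [
    {"origin": 2645.652580, "sex": 338.877090, "grade": 1837.020006, "type": 132.014900},
    {"grade": 2277.172541, "sex": 124.896018, "type": 147.889364, "origin": 2403.606653},
    {"type": 19.502391, "sex": 336.215409, "origin": 2680.397094, "grade": 1917.449682},
    {"sex": 308.062554, "origin": 2676.467116, "type": 51.585224, "grade": 1917.449682},
]
# Type III sums of squares from the first fit (these do not depend on order).
TYPE3 = {"origin": 2403.606653, "sex": 185.647389, "grade": 1917.449682, "type": 132.014900}
MODEL_SS = 4953.56458  # Model sum of squares, identical in all four fits


def run_totals(runs):
    """Sum the sequential SS within each fit."""
    return [sum(run.values()) for run in runs]


def last_term_matches(runs, type3, tol=1e-6):
    """For each fit, report whether the last-entered term's Type I SS equals its Type III SS."""
    results = []
    for run in runs:
        last = list(run)[-1]
        results.append((last, abs(run[last] - type3[last]) < tol))
    return results


if __name__ == "__main__":
    for i, total in enumerate(run_totals(TYPE1_RUNS), start=1):
        print(f"fit {i}: sum of Type I SS = {total:.5f} (model SS = {MODEL_SS})")
    for run in TYPE1_RUNS:
        position = list(run).index("origin") + 1
        print(f"origin entered in position {position}: Type I SS = {run['origin']:.6f}")
    print(last_term_matches(TYPE1_RUNS, TYPE3))
```

The origin sum of squares takes four different values depending on what precedes it, while the total is unchanged, which is exactly why the order of terms matters when interpreting Type I sums of squares.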
Next we fit a full factorial model to the data as follows:
proc glm data=ozkids;
class origin sex grade type;
model days=origin sex grade type origin|sex|grade|type /ss1 ss3;
run;
Joining variable names with a bar is a shorthand way of specifying an interaction and all the lower-order interactions and main effects implied by it. This is useful not only to save typing but to ensure that relevant terms in the model are not inadvertently omitted. Here we have explicitly specified the main effects so that they are entered before any interaction terms when calculating Type I sums of squares.
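To make the bar shorthand concrete, the Python sketch below mimics the expansion and reproduces the order in which the terms appear in Display 6.3. The helper names `expand_bar` and `model_terms` are hypothetical, and the rule they implement (each new factor is added on its own and then crossed with every term generated so far) is inferred from, and consistent with, the order of effects in the display rather than copied from SAS internals.

```python
def expand_bar(factors):
    """Expand A|B|C... into all main effects and interactions.

    Each new factor is appended on its own and then crossed with every
    term generated so far, e.g. A|B|C -> A B A*B C A*C B*C A*B*C.
    """
    terms = []
    for factor in factors:
        # Build the new terms from the current list before extending it.
        new_terms = [factor] + [f"{term}*{factor}" for term in terms]
        terms.extend(new_terms)
    return terms


def model_terms(explicit, bar_factors):
    """Combine explicitly listed effects with a bar expansion.

    Repeats are dropped while first-appearance order is kept, which is what
    places the explicitly listed main effects ahead of every interaction.
    """
    seen, ordered = set(), []
    for term in explicit + expand_bar(bar_factors):
        if term not in seen:
            seen.add(term)
            ordered.append(term)
    return ordered


if __name__ == "__main__":
    factors = ["origin", "sex", "grade", "type"]
    for term in model_terms(factors, factors):
        print(term)
```

Without the explicit main effects, `expand_bar(factors)` alone would interleave them with interactions (for example, origin*sex before grade), and grade would then be adjusted for origin*sex in the Type I sums of squares.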
The output is shown in Display 6.3. Note first that the only Type I and Type III sums of squares that agree are those for the origin * sex * grade * type interaction. Now consider the origin main effect. The Type I sum of squares for origin is “corrected” only for the mean because it appears first in the model statement. The effect is highly significant (F = 13.72, P = 0.0003). But using Type III sums of squares, in which the origin effect is corrected for all other main effects and interactions, the corresponding F value of 1.21 has an associated P-value of 0.2736. Now origin is judged nonsignificant, but this may simply reflect the loss of power after “adjusting” for a lot of relatively unimportant interaction terms.
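The comparisons above can be checked by hand from Display 6.3. The Python sketch below copies the printed sums of squares (it does not refit the model) and shows that each F value is the effect mean square divided by the error mean square, 192.84194 on 122 DF; that the Type I sums of squares add up to the model sum of squares while the Type III sums of squares do not; and that only the four-way interaction has matching Type I and Type III values. The variable and function names are illustrative.

```python
# (effect, DF, Type I SS, Type III SS) copied from Display 6.3
EFFECTS = [
    ("origin", 1, 2645.652580, 233.201138),
    ("sex", 1, 338.877090, 344.037143),
    ("grade", 3, 1837.020006, 1036.595762),
    ("type", 1, 132.014900, 181.049753),
    ("origin*sex", 1, 142.454554, 3.261543),
    ("origin*grade", 3, 3154.799178, 1366.765758),
    ("sex*grade", 3, 2009.479644, 1629.158563),
    ("origin*sex*grade", 3, 226.309848, 32.650971),
    ("origin*type", 1, 38.572890, 55.378055),
    ("sex*type", 1, 69.671759, 1.158990),
    ("origin*sex*type", 1, 601.464327, 337.789437),
    ("grade*type", 3, 2367.497717, 2037.872725),
    ("origin*grade*type", 3, 887.938926, 973.305369),
    ("sex*grade*type", 3, 375.828965, 410.577832),
    ("origin*sex*grade*type", 3, 351.836918, 351.836918),
]
MODEL_SS = 15179.41930
MSE = 192.84194  # error mean square, 23526.71706 / 122


def f_value(ss, df, mse=MSE):
    """F statistic: (SS / DF) divided by the error mean square."""
    return (ss / df) / mse


def agreeing_effects(effects, tol=1e-6):
    """Names of effects whose Type I and Type III sums of squares coincide."""
    return [name for name, _, ss1, ss3 in effects if abs(ss1 - ss3) < tol]


if __name__ == "__main__":
    name, df, ss1, ss3 = EFFECTS[0]
    print(f"{name}: Type I F = {f_value(ss1, df):.2f}, Type III F = {f_value(ss3, df):.2f}")
    print("sum of Type I SS  :", round(sum(e[2] for e in EFFECTS), 5))
    print("sum of Type III SS:", round(sum(e[3] for e in EFFECTS), 5))
    print("Type I == Type III for:", agreeing_effects(EFFECTS))
```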
Arriving at a final model for these data is not straightforward (see Aitkin [1978] for some suggestions), and the issue is not pursued here because the data set will be the subject of further analyses in Chapter 9. However, some of the exercises encourage readers to try some alternative analyses of variance.
The GLM Procedure

Class Level Information

Class     Levels    Values
origin         2    A N
sex            2    F M
grade          4    F0 F1 F2 F3
type           2    AL SL

Number of observations 154
The GLM Procedure

Dependent Variable: days

                                  Sum of
Source              DF           Squares    Mean Square    F Value    Pr > F
Model               31       15179.41930      489.65869       2.54    0.0002
Error              122       23526.71706      192.84194
Corrected Total    153       38706.13636

R-Square    Coeff Var    Root MSE    days Mean
0.392171     86.05876    13.88675     16.13636

Source                  DF      Type I SS    Mean Square    F Value    Pr > F
origin                   1    2645.652580    2645.652580      13.72    0.0003
sex                      1     338.877090     338.877090       1.76    0.1874
grade                    3    1837.020006     612.340002       3.18    0.0266
type                     1     132.014900     132.014900       0.68    0.4096
origin*sex               1     142.454554     142.454554       0.74    0.3918
origin*grade             3    3154.799178    1051.599726       5.45    0.0015
sex*grade                3    2009.479644     669.826548       3.47    0.0182
origin*sex*grade         3     226.309848      75.436616       0.39    0.7596
origin*type              1      38.572890      38.572890       0.20    0.6555
sex*type                 1      69.671759      69.671759       0.36    0.5489
origin*sex*type          1     601.464327     601.464327       3.12    0.0799
grade*type               3    2367.497717     789.165906       4.09    0.0083
origin*grade*type        3     887.938926     295.979642       1.53    0.2089
sex*grade*type           3     375.828965     125.276322       0.65    0.5847
origi*sex*grade*type     3     351.836918     117.278973       0.61    0.6109
Source                  DF    Type III SS    Mean Square    F Value    Pr > F
origin                   1     233.201138     233.201138       1.21    0.2736
sex                      1     344.037143     344.037143       1.78    0.1841
grade                    3    1036.595762     345.531921       1.79    0.1523
type                     1     181.049753     181.049753       0.94    0.3345
origin*sex               1       3.261543       3.261543       0.02    0.8967
origin*grade             3    1366.765758     455.588586       2.36    0.0746
sex*grade                3    1629.158563     543.052854       2.82    0.0420
origin*sex*grade         3      32.650971      10.883657       0.06    0.9823
origin*type              1      55.378055      55.378055       0.29    0.5930
sex*type                 1       1.158990       1.158990       0.01    0.9383
origin*sex*type          1     337.789437     337.789437       1.75    0.1881
grade*type               3    2037.872725     679.290908       3.52    0.0171
origin*grade*type        3     973.305369     324.435123       1.68    0.1743
sex*grade*type           3     410.577832     136.859277       0.71    0.5480
origi*sex*grade*type     3     351.836918     117.278973       0.61    0.6109
Display 6.3
Exercises
6.1 Investigate simpler models for the data used in this chapter by dropping interactions or sets of interactions from the full factorial model fitted in the text. Try several different orders of effects.
6.2 The outcome for the data in this chapter (number of days absent) is a count variable. Consequently, assuming normally distributed errors may not be entirely appropriate, as we will see in Chapter 9. Here, however, we might deal with this potential problem by way of a transformation. One possibility is a log transformation. Investigate this possibility.
6.3 Find a table of cell means and standard deviations for the data used in this chapter.
6.4 Construct a normal probability plot of the residuals from fitting a main-effects-only model to the data used in this chapter. Comment on the results.
Chapter 7
Analysis of Variance of Repeated Measures: Visual Acuity
7.1 Description of Data
The data used in this chapter are taken from Table 397 of SDS. They are reproduced in Display 7.1. Seven subjects had their response times measured when a light was flashed into each eye through lenses of powers 6/6, 6/18, 6/36, and 6/60. Measurements are in milliseconds, and the question of interest was whether or not the response time varied with lens strength. (A lens of power a/b means that the eye will perceive as being at “a” feet an object that is actually positioned at “b” feet.)
7.2 Repeated Measures Data
The observations in Display 7.1 involve repeated measures. Such data arise often, particularly in the behavioural sciences and related disciplines, and involve recording the value of a response variable for each subject under more than one condition and/or on more than one occasion.
Visual Acuity and Lens Strength

                   Left Eye                      Right Eye
Subject    6/6   6/18   6/36   6/60      6/6   6/18   6/36   6/60
1          116    119    116    124      120    117    114    122
2          110    110    114    115      106    112    110    110
3          117    118    120    120      120    120    120    124
4          112    116    115    113      115    116    116    119
5          113    114    114    118      114    117    116    112
6          119    115     94    116      100     99     94     97
7          110    110    105    118      105    105    115    115
Display 7.1
Researchers typically adopt the repeated measures paradigm as a means of reducing error variability and/or as the natural way of measuring certain phenomena (e.g., developmental changes over time, or learning and memory tasks). In this type of design, the effects of experimental factors giving rise to the repeated measures are assessed relative to the average response made by a subject on all conditions or occasions. In essence, each subject serves as his or her own control and, accordingly, variability due to differences in average responsiveness of the subjects is eliminated from the extraneous error variance. A consequence of this is that the power to detect the effects of within-subjects experimental factors is increased compared to testing in a between-subjects design.
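The idea that each subject serves as his or her own control can be made concrete with the data in Display 7.1. The Python sketch below splits the total sum of squares of all 56 reaction times into a between-subjects part (variation of the subject means around the grand mean) and a within-subjects part; within-subject effects such as eye and lens strength are judged against the within-subjects variation, so the subject-to-subject component no longer inflates the error. It illustrates only this partition, not the full repeated measures analysis, and the function name `partition_ss` is hypothetical.

```python
# Reaction times from Display 7.1, one row per subject:
# left eye (6/6, 6/18, 6/36, 6/60), then right eye (6/6, 6/18, 6/36, 6/60).
DATA = [
    [116, 119, 116, 124, 120, 117, 114, 122],
    [110, 110, 114, 115, 106, 112, 110, 110],
    [117, 118, 120, 120, 120, 120, 120, 124],
    [112, 116, 115, 113, 115, 116, 116, 119],
    [113, 114, 114, 118, 114, 117, 116, 112],
    [119, 115, 94, 116, 100, 99, 94, 97],
    [110, 110, 105, 118, 105, 105, 115, 115],
]


def partition_ss(rows):
    """Return (total SS, between-subjects SS, within-subjects SS)."""
    values = [y for row in rows for y in row]
    grand_mean = sum(values) / len(values)
    total = sum((y - grand_mean) ** 2 for y in values)
    # Between subjects: each subject mean counted once per observation.
    between = sum(len(row) * (sum(row) / len(row) - grand_mean) ** 2 for row in rows)
    # Within subjects: deviations of each observation from its own subject mean.
    within = sum((y - sum(row) / len(row)) ** 2 for row in rows for y in row)
    return total, between, within


if __name__ == "__main__":
    total, between, within = partition_ss(DATA)
    print(f"total SS   = {total:.3f}")
    print(f"between SS = {between:.3f}  (average differences between subjects)")
    print(f"within SS  = {within:.3f}  (variation left for within-subject effects and error)")
```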
Unfortunately, the advantages of a repeated measures design come at a cost, and that cost is the probable lack of independence of the repeated measurements. Observations made under different conditions involving the same subjects will very likely be correlated rather than independent. This violates one of the assumptions of the analysis of variance procedures described in Chapters 5 and 6, and accounting for the dependence between observations in a repeated measures design requires some thought. (In the visual acuity example, only within-subject factors occur, and it is possible, indeed likely, that the lens strengths under which a subject was observed were given in random order. However, in examples where time is the single within-subject factor, randomisation is not, of course, an option. This makes the type of study in which subjects are simply observed over time rather different from other repeated measures designs, and such studies are often given a different label: longitudinal designs. Owing to their different nature, we consider them separately in Chapters 10 and 11.)
7.3 Analysis of Variance for Repeated Measures Designs
Despite the lack of independence of the observations made within subjects in a repeated measures design, it remains possible to use relatively straightforward analysis of variance procedures to analyse the data if three particular assumptions about the observations are valid:
1. Normality: the data arise from populations with normal distributions.
2. Homogeneity of variance: the variances of the assumed normal distributions are equal.
3. Sphericity: the variances of the differences between all pairs of the repeated measurements are equal. Together with assumption 2, this condition implies that the correlations between pairs of repeated measures are also equal, the so-called compound symmetry pattern.
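Why sphericity leads to compound symmetry only in combination with equal variances can be seen from the variance of a difference. Writing $\sigma_k^2$ for the variance of the $k$th repeated measure and $\sigma_{kl}$ for the covariance between measures $k$ and $l$ (notation introduced here for illustration only):

$$\operatorname{Var}(y_k - y_l) = \sigma_k^2 + \sigma_l^2 - 2\sigma_{kl}.$$

If every variance equals a common $\sigma^2$ (assumption 2), this becomes $2(\sigma^2 - \sigma_{kl})$, so requiring it to be the same for every pair $k \neq l$ forces all covariances $\sigma_{kl}$, and therefore all correlations $\sigma_{kl}/\sigma^2$, to be equal. If the variances are allowed to differ, the differences can have equal variances while the correlations are unequal, so compound symmetry is sufficient, but not necessary, for sphericity.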
It is the third assumption that is most critical for the validity of the analysis of variance F-tests. When the sphericity assumption is not regarded as likely, there are two alternatives to a simple analysis of variance: the use of correction factors and multivariate analysis of variance. All three possibilities will be considered in this chapter.
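The text has not yet said which correction factors it has in mind. One widely used example is the Greenhouse-Geisser epsilon, which measures the departure from sphericity and is used to scale down both degrees of freedom of the within-subject F tests. The Python sketch below estimates it from the sample covariance matrix of the four left-eye measurements in Display 7.1, using the eigenvalue form $\hat\varepsilon = (\sum\lambda)^2 / ((k-1)\sum\lambda^2)$ computed through traces of the double-centred covariance matrix. It is a standard-library illustration under those assumptions, not the code SAS runs, and the function names are hypothetical.

```python
# Left-eye reaction times from Display 7.1 (columns: lens 6/6, 6/18, 6/36, 6/60).
LEFT_EYE = [
    [116, 119, 116, 124],
    [110, 110, 114, 115],
    [117, 118, 120, 120],
    [112, 116, 115, 113],
    [113, 114, 114, 118],
    [119, 115, 94, 116],
    [110, 110, 105, 118],
]


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1) for b in range(k)]
        for a in range(k)
    ]


def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from a symmetric k x k covariance matrix.

    The matrix is double-centred (row and column means removed, grand mean
    added back). Its nonzero eigenvalues are those of the contrast covariance
    matrix, so trace = sum of eigenvalues and the sum of squared entries =
    sum of squared eigenvalues. The result lies between 1/(k - 1) and 1,
    with 1 corresponding to sphericity.
    """
    k = len(cov)
    row_means = [sum(r) / k for r in cov]  # equal to column means by symmetry
    grand = sum(row_means) / k
    centred = [[cov[i][j] - row_means[i] - row_means[j] + grand for j in range(k)]
               for i in range(k)]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(x * x for r in centred for x in r)
    return trace ** 2 / ((k - 1) * sum_sq)


if __name__ == "__main__":
    eps = gg_epsilon(covariance_matrix(LEFT_EYE))
    print(f"estimated Greenhouse-Geisser epsilon (left eye): {eps:.3f}")
```

A compound-symmetric covariance matrix gives an epsilon of exactly 1, so no correction would be applied; values well below 1 signal that the unadjusted F tests are likely to be too liberal.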
We begin by considering a simple model for the visual acuity observations, $y_{ijk}$, where $y_{ijk}$ represents the reaction time of the $i$th subject for eye $j$ and lens strength $k$. The model assumed is

$$y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \gamma_i + (\gamma\alpha)_{ij} + (\gamma\beta)_{ik} + (\gamma\alpha\beta)_{ijk} + \epsilon_{ijk} \tag{7.1}$$

where $\alpha_j$ represents the effect of eye $j$, $\beta_k$ is the effect of the $k$th lens strength, and $(\alpha\beta)_{jk}$ is the eye × lens strength interaction. The term $\gamma_i$ is an effect associated with subject $i$, and $(\gamma\alpha)_{ij}$, $(\gamma\beta)_{ik}$, and $(\gamma\alpha\beta)_{ijk}$ represent interaction effects of subject $i$ with each factor and with their interaction. The terms $\alpha_j$, $\beta_k$, and $(\alpha\beta)_{jk}$ are assumed to be fixed effects, but the subject terms and the error term $\epsilon_{ijk}$ are assumed to be random variables from normal distributions with zero means and variances specific to each term. This is an example of a mixed model.
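A stripped-down version of model (7.1) shows where equal correlations come from. Purely for illustration, drop the eye factor and the subject-by-factor interactions, leaving $y_{ik} = \mu + \beta_k + \gamma_i + \epsilon_{ik}$, with $\gamma_i \sim N(0, \sigma_\gamma^2)$ and $\epsilon_{ik} \sim N(0, \sigma^2)$ all mutually independent. For two different lens strengths $k \neq k'$ measured on the same subject:

$$\operatorname{Cov}(y_{ik}, y_{ik'}) = \operatorname{Var}(\gamma_i) = \sigma_\gamma^2, \qquad \operatorname{Var}(y_{ik}) = \sigma_\gamma^2 + \sigma^2,$$

$$\operatorname{Corr}(y_{ik}, y_{ik'}) = \frac{\sigma_\gamma^2}{\sigma_\gamma^2 + \sigma^2} \quad \text{for every pair } k \neq k'.$$

The shared subject effect $\gamma_i$ is the only source of covariance, and it contributes equally to every pair of conditions, which gives the compound symmetry pattern of assumption 3.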
Equal correlations between the repeated measures arise as a consequence of the subject effects in this model; and if this structure is valid, a relatively straightforward analysis of variance of the data can be used. However, when the investigator thinks the assumption of equal correlations is too strong, there are two alternatives that can be used: