Principles of Medical Statistics (Feinstein, 2002)
28.6 Psychometric Attractions
Latent-variable analytic methods, such as factor and principal component analyses, have had substantial appeal in the world of psychometrics, where new indexes (or rating scales) are regularly prepared to “measure” such “constructs” as intelligence, social opinions, or health status, that cannot readily be cited in ordinary dimensions.
In the customary arrangement, a psychometric “instrument” contains a set of multiple items, each of which can be regarded as a variable. Each item often asks a question or makes an assertion for which the response is cited on a five-point (or other) scale that can refer to degrees of frequency (never, …, always), agreement (strongly disagree, …, strongly agree), volume (none, …, a lot), or other pertinent expressions. The psychometrician usually begins with a large number of items (often more than 100) and tries to reduce them to a smaller group (perhaps 30 or fewer) that will provide a “unidimensional” representation of the selected construct.
To demonstrate that all the items refer to the same construct, i.e., the “latent variable,” they are required to have the “homogeneity” shown by a high intercorrelation with one another. The index of multivariable intercorrelation is called Cronbach’s alpha, but also receives other names (e.g., Kuder–Richardson formula 20) when the variables are expressed in binary categories rather than ordinal (or dimensional) scales.
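In its commonest form, Cronbach’s alpha for k items is α = [k/(k − 1)][1 − (Σ s²item / s²total)], where s²item is the variance of each item and s²total is the variance of the summed scale. A minimal Python sketch of that formula, with invented five-point ratings (the function name and the data are illustrative, not from the text):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects x k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each separate item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented five-point ratings from 6 respondents on 4 items
ratings = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
])
print(round(cronbach_alpha(ratings), 3))  # ≈ 0.957 for these highly intercorrelated items
```

Because the invented items move up and down together across respondents, alpha is high, which is exactly the “homogeneity” the psychometrician seeks.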
The factor and principal component analyses usually produce several “constructs,” rather than just one; but the individual factors are sometimes checked for the “internal reliability” demonstrated by a high value in Cronbach’s alpha. (It was used in the previously cited study12 of PANSS to support the construct “validity” of the “positive and general psychopathology scales.”)
Despite the psychometric enthusiasm, the demonstration of “homogeneity” among multiple items is often contrary to clinimetric goals36,37 in forming new indexes and rating scales. The clinician usually wants to combine different attributes into a single composite index (such as the Apgar Score or TNM staging system), not to join multiple items that all presumably measure the same attribute. If all the items express the same thing, the clinician may want to eliminate the ones that seem redundant. For example, in routinely assessing a patient’s red blood status, an efficient clinician might examine the value of hematocrit or hemoglobin or red blood count, but not all three. A psychometrician, however, might combine all three into an index of erythematosity that would have a particularly high value for Cronbach’s alpha. Furthermore, because Cronbach’s alpha inevitably gets much higher when more items are included, an impressively high value may merely represent multiplicity of items rather than maximization of homogeneity.38
The psychometrician’s methods may therefore be quite suitable for the goal of getting a “unidimensional construct,” but ineffective (or misleading) when applied for the clinimetric goal of forming a multi-dimensional composite of different entities.
28.7 Scientific Problems and Challenges
Lacking an established criterion or a target variable to validate the results, the diverse forms of nontargeted analysis produce accomplishments that are wholly arbitrary. The approaches were at first rejected by most statisticians, but then began to gain respectability as the challenges became mathematically intriguing. For example, factor analysis was originally advocated and used exclusively in the psychosocial sciences, but is now often discussed as a statistical method. About 3 decades ago, Cormack39 denounced the “irrelevantly and unjustifiably … large quantities of multivariate data [summarized] by clusters, undefined a priori … [as a] waste of more valuable time than any other statistical innovation.” More recently, however, Hansen and Tukey40 benevolently stated that clustering can “help with either graphical or verbal description … [particularly] if we can avoid asking too much of clustering techniques.” Hartigan18 suggested that clustering can “be used routinely in the early stages of a data analysis in a similar way to drawing graphs or histograms,” but his belief that “the classification of disease … [is]
© 2002 by Chapman & Hall/CRC
an important … area of application” has not been confirmed when cluster methods were used for subsequent nosographic explorations.
Because every scientific taxonomy must have a specific purpose and function,36 the main reason scientists do not like the various forms of non-targeted analysis is that the substantive goal of the classification may be neither identified nor overtly checked. The results depend completely on the mathematical methods used for the statistical arrangement and on the particular collection of data being analyzed. For example, insulin-dependent diabetes mellitus, although an important and well-recognized clinical category, might not emerge as a significant entity if the analytic data came from a clinic having very few such patients.
A separate scientific problem, particularly in nonclinical activities, is the “reification” that occurs when the factors or clusters, although constructed as acts of pure mathematics, become regarded as specific real entities. For example, a still raging controversy in the psychosocial sciences was provoked when intelligence, originally called a g factor in analyses by Spearman,41 became advocated as a specific attribute of human biology. The initial dispute was whether the attribute was appropriately examined with “intelligence tests” and catalogued with factor-analytic gambits.2 The more recent dispute arises from attempts42 to separate genetic, ethnic, and environmental contributions to personal intelligence.
In a different application, the principal component method has been proposed for solving problems of multicollinearity in regression analysis. When multiple “independent” variables are regressed against a single target variable, efforts are often made to eliminate the independent variables that are highly correlated with one another. To avoid the sometimes invidious choice of deciding which variables are redundant, the principal component method might allow all of them to be retained when reformulated as suitably constructed new principal-component variables. Despite the theoretical attractiveness, the approach was recently shown to have major flaws and “very serious potential pitfalls” when checked in several “well-known data sets.”43
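The mechanics of that proposal can be sketched in a few lines: the correlated predictors are replaced by mutually uncorrelated principal-component scores, and the regression is run on a component rather than on either redundant original variable. Everything below, including the data, is invented for illustration; it shows the general idea, not the specific formulations examined in the cited study.43

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: two nearly collinear "independent" variables
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)
y = 2.0 * x1 + 1.9 * x2 + rng.normal(size=50)

# Center the predictors and extract principal components via SVD
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # new, mutually uncorrelated predictor variables

# Regress the outcome on the first component only, instead of choosing
# which of the two redundant original variables to discard
b, *_ = np.linalg.lstsq(scores[:, :1], y - y.mean(), rcond=None)
print(f"coefficient on first principal component: {b[0]:.2f}")
```

The component scores are exactly uncorrelated by construction, which removes the multicollinearity; the flaws noted in the text arise in how the retained components relate to the target variable, not in this arithmetic.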
In an intriguing investigation, Juniper et al.44 compared “two philosophically different methods for selecting items for a disease-specific quality of life questionnaire.” One method was a psychometric factor analysis. The other was an “impact approach” in which patients with asthma chose what they regarded as important. The investigators found substantial agreement in many of the selections, but also noted that the psychometric approach produced items that did not always “make clinical sense” while omitting “three items of the greatest importance to patients.”
In medical applications, non-targeted analyses seem to attain face validity only when clinicians agree that the results are “sensible.” For example, to avoid idiosyncratic effects from 30 individual variables that could be used in prognostic prediction for a cohort of 4226 patients receiving coronary angiography, Harrell et al.45 tried the “parsimonious approach” of grouping the variables into more “simple indices.” The approach depended on clusters formed after a principal component analysis, but the actual clinical index for each group was created only after the variables were “further grouped” by “two cardiologists.” The 5 indexes that emerged referred to pain, myocardial damage, vascular disease, risk factors, and conduction defects; and 5 of the original variables were left “standing alone.” These 10 entities then gave a better prognostic performance than the original 30 variables alone and were just as effective (but more comprehensible) than the results of the original principal component analysis. The authors recommended use of the “clustering method for clinical prediction problems when the number of potential predictor variables is large” but also stated that “the more clinical insight one injects into the analysis at any point, the better is the end result.”
If clinical “insight” is indeed valuable, however, clinicians might be urged more often to take the incentive in creating suitable classifications by working from “inside-out” in the manner used by Virginia Apgar and other clinical taxonomists. The need for mathematical approaches, working from “outside-in,” seems to arise only when clinicians have been delinquent in meeting their own intellectual responsibilities.36,46,47 If connoisseurs of the substantive phenomena have not made suitable taxonomic efforts, the mathematical procedures will obviously seem more attractive than nothing. On the other hand, if non-targeted analytic methods are not guided by a substantive direction or orientation, the approach can perhaps be best summarized with the waggish remark once made to me by the late Donald Mainland: “If you don’t know what you’re doing, factor analysis is a great way to do it.”
References
1. Apgar, 1953; 2. Gould, 1981; 3. Kendall, 1968; 4. Chandra Sekhar, 1991; 5. Mason, 1988; 6. Goold, 1994; 7. Ries, 1991; 8. Bailey, 1992; 9. Lauer, 1993; 10. Henderson, 1990; 11. Oden, 1992; 12. von Knorring, 1995; 13. Aubert, 1990; 14. Cowie, 1985; 15. Flameng, 1984; 16. Everitt, 1993; 17. Wastell, 1987; 18. Hartigan, 1973; 19. Ellman, 1985; 20. Heinrich, 1985; 21. Persico, 1993; 22. Furukawa, 1992; 23. Ciampi, 1990; 24. Schlundt, 1991; 25. Thielemans, 1988; 26. Wolleswinkel-van den Bosch, 1997; 27. Krim, 1987; 28. Harris, 1978; 29. Lapointe, 1994; 30. Peacock, 1995; 31. Waller, 1995; 32. Feinstein, 1988a; 33. Hill, 1974; 34. Coste, 1991; 35. Crichton, 1989; 36. Feinstein, 1987a; 37. Wright, 1992; 38. Steiner, 1995; 39. Cormack, 1971; 40. Hansen, 1992; 41. Spearman, 1904; 42. Herrnstein, 1994; 43. Hadi, 1998; 44. Juniper, 1997; 45. Harrell, 1984; 46. Feinstein, 1967a; 47. Feinstein, 1994.
29
Analysis of Variance
CONTENTS
29.1 Conceptual Background
29.1.1 Clinical Illustration
29.1.2 Analytic Principles
29.2 Fisher’s F Ratio
29.3 Analysis-of-Variance Table
29.4 Problems in Performance
29.5 Problems of Interpretation
29.5.1 Quantitative Distinctions
29.5.2 Stochastic “Nonsignificance”
29.5.3 Stochastic “Significance”
29.5.4 Substantive Decisions
29.6 Additional Applications of ANOVA
29.6.1 Multi-Factor Arrangements
29.6.2 Nested Analyses
29.6.3 Analysis of Covariance
29.6.4 Repeated-Measures Arrangements
29.7 Non-Parametric Methods of Analysis
29.8 Problems in Analysis of Trends
29.9 Use of ANOVA in Published Literature
References
The targeted analytic method called analysis of variance, sometimes cited acronymically as ANOVA, was devised (like so many other procedures in statistics) by Sir Ronald A. Fisher. Although often marking the conceptual boundary between elementary and advanced statistics, or between amateur “fan” and professional connoisseur, ANOVA is sometimes regarded and taught as “elementary” enough to be used for deriving subsequent simple procedures, such as the t test. Nevertheless, ANOVA is used much less often today than formerly, for reasons to be noted in the discussions that follow.
29.1 Conceptual Background
The main distinguishing feature of ANOVA is that the independent variable contains polytomous categories, which are analyzed simultaneously in relation to a dimensional or ordinal dependent (outcome) variable.
Suppose treatments A, B, and C are tested for effects on blood pressure in a randomized trial. When the results are examined, we want to determine whether one of the treatments differs significantly from the others. With the statistical methods available thus far, the only way to answer this question would be to do multiple comparisons for pairs of groups, contrasting results in group A vs. B, A vs. C, and B vs. C. If more ambitious, we could compare A vs. the combined results of B and C, or group B vs. the combined results of A and C, and so on. We could work out various other arrangements, but in each
instance, the comparison would rely on contrasting two collected groups, because we currently know no other strategy.
The analysis of variance allows a single simultaneous “comparison” for three or more groups. The result becomes a type of screening test that indicates whether at least one group differs significantly from the others, but further examination is needed to find the distinctive group(s). Despite this disadvantage, ANOVA has been a widely used procedure, particularly by professional statisticians, who often like to apply it even when simpler tactics are available. For example, when data are compared for only two groups, a t test or Z test is simpler, and, as noted later, produces exactly the same results as ANOVA. Nevertheless, many persons will do the two-group comparison (and report the results) with an analysis of variance.
29.1.1 Clinical Illustration
Although applicable in experimental trials, ANOVA has been most often used for observational studies. A real-world example, shown in Figure 29.1, contains data for the survival times, in months, of a random sample of 60 patients with lung cancer,1,2 having one of the four histologic categories of WELL (well-differentiated), SMALL (small cell), ANAP (anaplastic), and CYTOL (cytology only). The other variable (the five categories of TNM stage) listed in Figure 29.1 will be considered later. The main analytic question now is whether histology in any of these groups has significantly different effects on survival.
29.1.1.1 Direct Examination — The best thing to do with these data, before any formal statistical analyses begin, is to examine the results directly. In this instance, we can readily determine the group sizes, means, and standard deviations for each of the four histologic categories and for the total. The results, shown in Table 29.1, immediately suggest that the data do not have Gaussian distributions, because the standard deviations are almost all larger than the means. Nevertheless, to allow the illustration to proceed, the results can be further appraised. They show that the well-differentiated and small-cell groups, as expected clinically, have the highest and lowest mean survival times, respectively. Because of relatively small group sizes and non-Gaussian distributions, however, the distinctions may not be stochastically significant.
TABLE 29.1
Summary of Survival Times in Four Histologic Groups of Patients with Lung Cancer in Figure 29.1

Histologic Category    Group Size    Mean Survival    Standard Deviation
WELL                   22            24.43            26.56
SMALL                  11             4.45             3.77
ANAP                   18            10.87            23.39
CYTOL                   9            11.54            13.47
Total                  60            14.77            22.29
Again before applying any advanced statistics, we can check these results stochastically by using simple t tests. For the most obvious comparison of WELL vs. SMALL, we can use the components of Formula [13.7] to calculate sp = √{[21(26.56)² + 10(3.77)²]/(21 + 10)} = 21.96; √[(1/nA) + (1/nB)] = √[(1/22) + (1/11)] = .369; and X̄A − X̄B = 24.43 − 4.45 = 19.98. These data could then be entered into Formula [13.7] to produce t = 19.98/[(21.96)(.369)] = 2.47. At 31 d.f., the associated 2P value is about .02. From this distinction, we might also expect that all the other paired comparisons will not be stochastically significant. (If you check the calculations, you will find that the appropriate 2P values are all >.05.)
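The foregoing arithmetic can be checked with a short script using only the summary statistics in Table 29.1 (the variable names are illustrative). Full-precision arithmetic gives t ≈ 2.46; the text’s 2.47 arises from rounding sp and the standard-error factor before dividing.

```python
import math

# Summary statistics for the WELL and SMALL groups (Table 29.1)
n_a, mean_a, sd_a = 22, 24.43, 26.56
n_b, mean_b, sd_b = 11, 4.45, 3.77

# Pooled standard deviation and standard-error factor, as in Formula [13.7]
sp = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
se_factor = math.sqrt(1 / n_a + 1 / n_b)
t = (mean_a - mean_b) / (sp * se_factor)

print(f"sp = {sp:.2f}, factor = {se_factor:.3f}, t = {t:.2f}")
```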
29.1.1.2 “Holistic” and Multiple-Comparison Problems — The foregoing comparison indicates a “significant” difference in mean survival between the WELL and SMALL groups, but does not answer the “holistically” phrased analytic question, which asked whether histology has significant effects in any of the four groups in the entire collection. Besides, an argument could be made, using
distinctions discussed in Section 25.2.1.1, that the contrast of WELL vs. SMALL was only one of the six (4 × 3/2) possible paired comparisons for the four histologic categories. With the Bonferroni correction, the working level of α′ for each of the six comparisons would be .05/6 = .008. With the latter criterion, the 2P value of about .02 for WELL vs. SMALL would no longer be stochastically significant.
We therefore need a new method to answer the original question. Instead of examining six pairs of contrasted means, we can use a holistic approach by finding the grand mean of the data, determining the deviations of each group of data from that mean, and analyzing those deviations appropriately.
FIGURE 29.1
[Data listing of the 60 patients omitted; the printout contains one row per patient with the columns OBS, ID, HISTOL, TNMSTAGE, and SURVIVE.]
Printout of data on histologic type, TNM stage, and months of survival in a random sample of 60 patients with primary cancer of the lung. [OBS = observation number in sample; ID = original identification number; HISTOL = histologic type; TNMSTAGE = one of five ordinal anatomic TNM stages for lung cancer; SURVIVE = survival time (mos.); WELL = well-differentiated; SMALL = small cell; ANAP = anaplastic; CYTOL = cytology only.]
Many different symbols have been used to indicate the entities that are involved. In the illustration here, Yij will represent the target variable (survival time) for person i in group j. For example, if WELL is the first group in Figure 29.1, the eighth person in the group has Y8,1 = 4.0. The mean of the values in group j will be Ȳj = ΣYij/nj, where nj is the number of members in the group. Thus, for the last group (cytology) in Table 29.1, n4 = 9, ΣYi,4 = 103.9, and Ȳ4 = 103.9/9 = 11.54. The grand mean, ȲG, will be Σ(njȲj)/N, where N = Σnj = size of the total group under analysis. From the data in Table 29.1, ȲG = [(22 × 24.43) + (11 × 4.45) + (18 × 10.87) + (9 × 11.54)]/60 = 885.93/60 = 14.77.
We can now determine the distance, Ȳj − ȲG, between each group’s mean and the grand mean. For the ANAP group, the distance is 10.87 − 14.77 = −3.90. For the other three groups, the distances are −3.23 for CYTOL, −10.32 for SMALL, and +9.66 for WELL. This inspection confirms that the means of the SMALL and WELL groups are most different from the grand mean, but the results contain no attention to stochastic variation in the data.
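The grand mean and the group distances can be verified directly from the summary values in Table 29.1; a brief sketch (the variable names are illustrative):

```python
# Group sizes and mean survival times from Table 29.1
groups = {"WELL": (22, 24.43), "SMALL": (11, 4.45),
          "ANAP": (18, 10.87), "CYTOL": (9, 11.54)}

# Grand mean = weighted average of the group means
n_total = sum(n for n, _ in groups.values())                   # 60 patients
grand_mean = sum(n * m for n, m in groups.values()) / n_total  # ≈ 14.77

for name, (n, m) in groups.items():
    print(f"{name}: distance from grand mean = {m - grand_mean:+.2f}")
```

The printed distances reproduce the values cited in the text: +9.66 for WELL, −10.32 for SMALL, −3.90 for ANAP, and −3.23 for CYTOL.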
29.1.2 Analytic Principles
To solve the stochastic challenge, we can use ANOVA, which, like many other classical statistical strategies, expresses real-world phenomena with mathematical models. We have already used such models both implicitly and explicitly. In univariate statistics, the mean, Ȳ, was an implicit “model” for fitting a group of data from only the values in the single set of data. The measured deviations from that model, Yi − Ȳ, were then converted to the group’s basic variance, Σ(Yi − Ȳ)².
In bivariate statistics for the associations in Chapters 18 and 19, we used an explicit model based on an additional variable, expressed algebraically as Ŷi = a + bXi. We then compared variances for three sets of deviations: Yi − Ŷi, between the items of data and the explicit model; Yi − Ȳ, between the items of data and the implicit model; and Ŷi − Ȳ, between the explicit and implicit models. The group variances, or sums of squares, associated with these deviations were called residual (or “error”) for Σ(Yi − Ŷi)², basic for Σ(Yi − Ȳ)², and model for Σ(Ŷi − Ȳ)².
29.1.2.1 Distinctions in Nomenclature — The foregoing symbols and nomenclature have been simplified for the sake of clarity. In strict statistical reasoning, any set of observed data is regarded as a sample from an unobserved population whose parameters are being estimated from the data. If “modeled” with a straight line, the parametric population would be cited as Y = α + βX. When the results for the observed data are expressed as Ŷi = a + bXi, the coefficients a and b are estimates of the corresponding α and β parameters.
Also in strict reasoning, variance is an attribute of the parametric population. Terms such as Σ(Yi − Ȳ)² or Σ(Ŷi − Ȳ)², which are used to estimate the parametric variances, should be called sums of squares, not group variances. The linguistic propriety has been violated here for two reasons: (1) the distinctions are more easily understood when called variance, and (2) the violations constantly appear in both published literature and computer print-outs. The usage here, although a departure from strict formalism, is probably better than in many discussions elsewhere, where the sums of squares are called variances instead of group variances.
Another issue in nomenclature is syntactical rather than mathematical. In most English prose, between is used for a distinction of two objects, and among for more than two. Nevertheless, in the original description of the analysis of variance, R. A. Fisher used the preposition between rather than among when more than two groups or classes were involved. The term between groups has been perpetuated by subsequent writers, much to the delight of English-prose pedants who may denounce the absence of literacy in mathematical technocracy. Nevertheless, Fisher and his successors have been quite correct in maintaining between. Its use for the cited purpose is approved by diverse high-echelon authorities, including the Oxford English Dictionary, which states that “between has been, from its earliest appearance, extended to more than two.”3 [As one of the potential pedants, I was ready to use among in this text until I checked the dictionary and became enlightened.]
29.1.2.2 Partition of Group Variance — The same type of partitioning that was used for group variance in linear regression is also applied in ANOVA. Conceptually, however, the models are
expressed differently. Symbolically, each observation can be labelled Yij, with j representing the group and i, the person (or other observed entity) within the group. The grand mean, ȲG, is used for the “implicit model” when the basic group or system variance, Σ(Yi − ȲG)², is summed for the individual values of Yi in all of the groups. The individual group means, Ȳj, become the explicit models when the total system is partitioned into groups. The residual group variance is the sum of the values of Σ(Yi − Ȳj)² within each of the groups. [In more accurate symbolism, the two cited group variances would be written with double subscripts and summations as ΣΣ(Yij − ȲG)² and ΣΣ(Yij − Ȳj)².] The model group variance, summed for each group of nj members with group mean Ȳj, is Σnj(Ȳj − ȲG)². These results for the data in the four groups of Figure 29.1 and Table 29.1 are shown in Table 29.2.
TABLE 29.2
Group-Variance Partitions of Sums of Squares for the Four Histologic Groups in Figure 29.1 and Table 29.1

Group     Basic (Total System)    Model (Between Groups)          Residual (Within Groups)
WELL      16866.67                22(24.43 − 14.77)² = 2052.94    14813.73
SMALL      1313.52                11(4.45 − 14.77)² = 1171.53       141.99
ANAP       9576.88                18(10.87 − 14.77)² = 273.78      9303.10
CYTOL      1546.32                9(11.54 − 14.77)² = 93.90        1452.42
Total     29304.61*               3593.38*                        25711.24

*These are the correct totals. They differ slightly from the sum of the collection of individual values, calculated with rounding, in each column.
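The model column and the residual column of Table 29.2 can be reproduced from the summary values alone; the small discrepancies from the starred totals reflect the rounding of the group means and the grand mean. A sketch (names illustrative):

```python
# Group size, mean survival, and basic (total) sum of squares per group,
# taken from Tables 29.1 and 29.2
groups = {
    "WELL":  (22, 24.43, 16866.67),
    "SMALL": (11,  4.45,  1313.52),
    "ANAP":  (18, 10.87,  9576.88),
    "CYTOL": ( 9, 11.54,  1546.32),
}
grand_mean = 14.77  # grand mean of all 60 survival times

# Model (between-groups) sum of squares: n_j * (group mean - grand mean)^2
model_ss = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())

# Basic sum of squares, with the residual obtained by subtraction
basic_ss = sum(ss for _, _, ss in groups.values())
residual_ss = basic_ss - model_ss

print(f"model = {model_ss:.2f}, basic = {basic_ss:.2f}, residual = {residual_ss:.2f}")
```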
Except for minor differences due to rounding, the components of Table 29.2 have the same structure noted earlier for simple linear regression in Section 19.2.2. The structure is as follows:
{Basic Group Variance} = {Model Variance between Groups} + { Residual Variance within Groups}
or Syy = SM + SR.
The structure is similar to that of the deviations
Total Deviation = Model Deviation + Residual Deviation
which arises when each individual deviation is expressed in the algebraic identity
Yij − ȲG = (Ȳj − ȲG) + (Yij − Ȳj)
If ȲG is moved to the first part of the right side, the equation becomes
Yij = ȲG + (Ȳj − ȲG) + (Yij − Ȳj)
and is consistent with a parametric algebraic model that has the form
Yij = μ + γj + εij
In this model, each person’s value of Yij consists of three contributions: (1) from the grand parametric mean, μ (which is estimated by ȲG); (2) from the parametric increment, γj (estimated by Ȳj − ȲG), between the grand mean and the group mean; and (3) from an error term, εij (estimated by Yij − Ȳj), for the increment between the observed value of Yij and the group mean.
For stochastic appraisal of results, the null hypothesis assumption is that the m groups have the same parametric mean, i.e., γ1 = γ2 = … = γj = … = γm.
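The algebraic identity behind this model holds exactly for any data set, which a few lines of code can confirm; the three groups below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: three groups drawn from Gaussians with different means
groups = [rng.normal(10, 2, size=5),
          rng.normal(14, 2, size=7),
          rng.normal(8, 2, size=6)]
grand = np.concatenate(groups).mean()  # grand mean of all observations

# For every observation, the total deviation from the grand mean splits
# exactly into a model part (group mean minus grand mean) plus a residual
# part (observation minus its own group mean)
for g in groups:
    total_dev = g - grand
    model_dev = g.mean() - grand
    residual_dev = g - g.mean()
    assert np.allclose(total_dev, model_dev + residual_dev)
print("total deviation = model deviation + residual deviation")
```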
29.1.2.3 Mean Variances and Degrees of Freedom — When divided by the associated degrees of freedom, each of the foregoing group variances is converted to a mean value. For the basic group variance, the total system contains N = Σ nj members, and d.f. = N − 1. For the model variance, the m groups have m − 1 degrees of freedom. For the residual variance, each group has nj − 1 degrees of freedom, and the total d.f. for m groups is Σ (nj − 1) = N − m.
The degrees of freedom are thus partitioned, like the group variances, into an expression that indicates their sum as
N − 1 = (m − 1) + (N − m)
The mean variances, however, no longer form an equal partition. Their symbols, and the associated values in the example here, are as follows:
Mean Group Variance = Syy /(N − 1) = 29304.61/59 = 496.69
Mean Model Variance = SM /(m − 1) = 3593.38/3 = 1197.79 (between groups)
Mean Residual Variance = SR/(N − m) = 25711.24/56 = 459.13 (within groups)
29.2 Fisher’s F Ratio
Under the null hypothesis of no real difference between the groups—i.e., the assumption that they have the same parametric mean—each of the foregoing three mean variances can be regarded as a separate estimate of the true parametric variance. Within the limits of stochastic variation in random sampling, the three mean variances should equal one another.
To test stochastic significance, R. A. Fisher constructed a variance ratio, later designated as F, that is expressed as
F = (Mean variance between groups) / (Mean variance within groups)
It can be cited symbolically as

    F = [SM/(m − 1)] / [SR/(N − m)]          [29.1]
If only two groups are being compared, some simple algebra will show that Formula [29.1] becomes the square of the earlier Formula [13.7] for the calculation of t (or Z). This distinction is the reason why the F ratio is sometimes used, instead of t (or Z), for contrasting two groups, as noted earlier in Section 13.3.6.
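The two-group equivalence is easy to verify numerically. A sketch with invented data, assuming SciPy is available (stats.ttest_ind pools the variances by default, matching Formula [13.7]):

```python
from scipy import stats

# Invented measurements for two groups
a = [5.1, 4.8, 6.0, 5.5, 4.9]
b = [6.2, 6.8, 5.9, 7.1, 6.5]

t, p_t = stats.ttest_ind(a, b)   # pooled-variance t test
f, p_f = stats.f_oneway(a, b)    # one-way ANOVA on the same two groups

# For two groups, F = t**2 and the two-tailed P values coincide
print(abs(f - t**2) < 1e-9, abs(p_f - p_t) < 1e-9)
```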
The Fisher ratio has a sampling distribution in which the associated 2P value is found from the value of F at the two sets of degrees of freedom, m − 1 and N − m. The three components make the distribution difficult to tabulate completely; it is therefore usually cited as the values of F, for each pair of degrees of freedom, at fixed values of 2P such as .1, .05, and .01.
In the example under discussion here, the F ratio is 1197.79/459.13 = 2.61. In the Geigy tables4 available for the combination of 3 and 56 degrees of freedom, the required F values are 2.184 for 2P = .1, 2.769 for 2P = .05, and 3.359 for 2P = .025. If only the Geigy values were available, the result would be written as .05 < 2P < .1. In an appropriate computer program, however, the actual 2P value is usually calculated and displayed directly. In this instance, it was .0605.
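When a table is unavailable, the exact 2P value comes from the upper tail of the F distribution. A sketch assuming SciPy, using the mean variances calculated in Section 29.1.2.3:

```python
from scipy.stats import f

# Mean variances from the histology example
mean_model = 3593.38 / 3        # between groups, m - 1 = 3 d.f.
mean_residual = 25711.24 / 56   # within groups, N - m = 56 d.f.

F = mean_model / mean_residual
p = f.sf(F, 3, 56)  # survival function = upper-tail probability

print(f"F = {F:.2f}, 2P = {p:.4f}")  # reproduces F = 2.61, 2P ≈ .0605
```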
If 2P is small enough to lead to rejection of the null hypothesis, the stochastic conclusion is that at least one of the groups has a mean significantly different from the others. Because the counter-hypothesis for the F test is always that the mean variance is larger between groups than within them, the null hypothesis can promptly be conceded if the F ratio is < 1. In this instance, because the null hypothesis cannot be rejected at α = .05, we cannot conclude that a significant difference in survival has been
stochastically confirmed for the histologic categories. The observed quantitative distinctions seem impressive, however, and would probably attain stochastic significance if the group sizes were larger.
29.3 Analysis-of-Variance Table
The results of an analysis of variance are commonly presented, in both published literature and computer printouts, with a tabular arrangement that warrants special attention because it is used not only for ANOVA but also for multivariable regression procedures that involve partitioning the sums of squared deviations (SS) that form group variances.
In each situation, the results show the partition for the sums of squares of three entities: (1) the total SS before imposition of an explicit model, (2) the SS between the explicit model and the original implicit grand mean, and (3) the residual SS for the explicit model. The last of these entities is often called the “unexplained” or “error” variance. Both of these terms are unfortunate, because the mathematical “explanation” is a statistical phenomenon that may have nothing to do with biologic mechanisms of explanation, and the “error” represents deviations between observed and estimated values, not mistakes or inaccuracies in the basic data. In certain special arrangements, to be discussed shortly, the deviations receive an additionally improved “explanation” when the model is enhanced with subdivisions of the main variable or with the incorporation of additional variables.
Figure 29.2 shows the conventional headings for the ANOVA table of the histology example in Figure 29.1. For this “one-way” analysis, the total results are divided into two rows of components. The number of rows is appropriately expanded when more subgroups are formed (as discussed later) via such mechanisms as subdivisions or inclusion of additional variables.
Dependent Variable: SURVIVE

Source             DF    Sum of Squares    Mean Square     F Value    Pr > F
Model               3     3593.3800000    1197.7933333       2.61     0.0605
Error              56    25711.2333333     459.1291667
Corrected Total    59    29304.6133333

R-Square    C.V.        Root MSE     SURVIVE Mean
0.122622    145.1059    21.427300    14.766667
FIGURE 29.2
Printout of analysis-of-variance table for survival time in the four histologic groups of Figure 29.1.
29.4 Problems in Performance
The mathematical reasoning used in many ANOVA arrangements was developed for an ideal experimental world in which all the compared groups or subgroups had the same size. If four groups were being compared, each group had the same number of members, so that n1 = n2 = n3 = n4. If the groups were further divided into subgroups—such as men and women or young, middle-aged, and old—the subgroups had the same sizes within each group.
These equi-sized arrangements were easily attained for experiments in the world of agriculture, where R. A. Fisher worked and developed his ideas about ANOVA. Equally sized groups and subgroups are seldom achieved, however, in the realities of clinical and epidemiologic research. The absence of equal sizes may then create a major problem in the operation of computer programs that rely on equal sizes, and that may be unable to manage data for other circumstances. For the latter situations, the computer programs may divert ANOVA into the format of a “general linear model,” which is essentially a method of multiple regression. One main reason, therefore, why regression methods are replacing ANOVA
