
methods today is that the automated regression methods can more easily process data for unequal-sized groups and subgroups.

29.5 Problems of Interpretation

The results of an analysis of variance are often difficult to interpret for both quantitative and stochastic reasons, as well as for substantive decisions.

29.5.1 Quantitative Distinctions

The results of ANOVA are almost always cited with F ratios and P values that indicate stochastic accomplishments but not quantitative descriptive distinctions. The reader is thus left without a mechanism to decide what has been accomplished quantitatively, while worrying that “significant” P values may arise mainly from large group sizes.

Although not commonly used, a simple statistical index can provide a quantitative description of the results. The index, called eta squared, was previously discussed in Section 27.2.2 as a counterpart of $r^2$ for proportionate reduction of group variance in linear regression. Labeled “R-square” in the printout of Figure 29.2, the expression is

$$\eta^2 = \frac{\text{Model (between-group) variance}}{\text{Total system (basic) variance}} = \frac{S_M}{S_{yy}}$$

For the histologic data in Figure 29.2, this index is 3593.38/29304.61 = 0.12, representing a modest achievement, which barely exceeds the 10% noted earlier (see Section 19.3.3) as a minimum level for “quantitative significance” in variance reduction.
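As a minimal sketch of this arithmetic, the index can be recomputed from the two sums of squares quoted in the text. The variable names are illustrative only and do not come from the printout.

```python
# Eta squared = model (between-group) sum of squares / total sum of squares.
# Both values are the ones quoted in the text for Figure 29.2.
ss_model = 3593.38    # between-group (model) sum of squares
ss_total = 29304.61   # total (basic) sum of squares

eta_squared = ss_model / ss_total
print(round(eta_squared, 2))  # 0.12
```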

29.5.2 Stochastic “Nonsignificance”

Another important issue is what to do when a result is not stochastically significant, i.e., P > α. In previous analytic methods, a confidence interval could be calculated around the “nonsignificant” increment, ratio, or coefficient that described the observed $d_O$ distinction in the results. If the upper end of this confidence interval excluded a quantitatively significant value (such as δ), the result could be called stochastically nonsignificant. If the confidence interval included δ, the investigator might be reluctant to concede the null hypothesis of “no difference.”

This type of reasoning would be equally pertinent for ANOVA, but is rarely used because the results seldom receive a descriptive citation. Confidence intervals, although sometimes calculated for the mean of each group, are almost never determined to give the value of eta the same type of upper and lower confidence boundaries that can be calculated around a correlation coefficient in simple linear regression.

In the absence of a confidence interval for eta, the main available descriptive approach is to examine results in individual groups or in paired comparisons. If any of the results seem quantitatively significant, the investigator, although still conceding the null hypothesis (because P > α ), can remain suspicious that a “significant” difference exists, but has not been confirmed stochastically. For example, in Figure 29.2, the P value of 0.06 would not allow rejection of the null hypothesis that all group means are equal. Nevertheless, the modestly impressive value of 0.12 for eta squared and the large increment noted earlier between the WELL and SMALL group means suggest that the group sizes were too small for stochastic confirmation of what is probably a quantitatively “significant” distinction.

29.5.3 Stochastic “Significance”

If P < α, the analysis has identified something that is stochastically significant, and the next step is to find where it is located. As noted earlier, the search involves a series of paired comparisons. A system containing m groups will allow m(m − 1)/2 paired comparisons when each group’s mean is contrasted against the mean of every other group. With m additional paired comparisons between each group and the total of the others, the total number of paired comparisons will be m(m + 1)/2. For example, the small-cell histologic group in Table 29.1 could be compared against each of the three other groups and also against their total. A particularly ingenious (or desperate) investigator might compare a single group or paired groups against pairs (or yet other combinations) of the others.

© 2002 by Chapman & Hall/CRC

This plethora of activities produces the multiple comparison problem discussed in Chapter 25, as well as the multiple eponymous and striking titles (such as Tukey’s honestly significant difference⁵) that have been given to the procedures proposed for examining and solving the problem.
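The count of paired comparisons can be checked with a short sketch. The function name is hypothetical; it enumerates the group-versus-group pairs and then adds one group-versus-rest comparison per group, as described in this section.

```python
from itertools import combinations

def count_paired_comparisons(m):
    """Return (group-vs-group, group-vs-rest, total) counts for m groups."""
    between_groups = len(list(combinations(range(m), 2)))  # m(m - 1)/2
    versus_rest = m                                         # each group vs. total of the others
    return between_groups, versus_rest, between_groups + versus_rest

# Four histologic groups, as in the example in the text
pairs, vs_rest, total = count_paired_comparisons(4)
print(pairs, vs_rest, total)  # 6 4 10
```

Even with only four groups, ten comparisons are available before any of the more elaborate combinations are considered.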

29.5.4 Substantive Decisions

Because the foregoing solutions all depend on arbitrary mathematical mechanisms, investigators who are familiar with the substantive content of the data usually prefer to avoid the polytomous structure of the analysis of variance. For example, a knowledgeable investigator might want to compare only the SMALL vs. WELL groups with a direct 2-group contrast (such as a t test) in the histologic data, avoiding the entire ANOVA process. An even more knowledgeable investigator, recognizing that survival can be affected by many factors (such as TNM stage and age) other than histologic category, might not want to do any type of histologic appraisal unless the other cogent variables have been suitably accounted for. For all these reasons, ANOVA is a magnificent method of analyzing data if you are unfamiliar with what the data really mean or represent. If you know the substantive content of the research, however, and if you have specific ideas to be examined, you may want to use a simpler and more direct way of examining them.

29.6 Additional Applications of ANOVA

From a series of mathematical models and diverse arrangements, the analysis of variance has a versatility, analogous to that discussed earlier for chi square, that for many years made ANOVA the most commonly used statistical procedure for analyzing complex data. In recent years, however, the ubiquitous availability of computers has led to the frequent replacement of ANOVA by multiple regression procedures, whose results are often easier to understand. Besides, ANOVA can mathematically be regarded as a subdivision of the general-linear-model strategies used in multivariable regression analysis.

Accordingly, four of the many other applications of ANOVA are outlined here only briefly, mainly so that you will have heard of them in case you meet them (particularly in older literature). Details can be found in many statistical textbooks. The four procedures to be discussed are multi-factor arrangements, nested analyses, the analysis of covariance (ANCOVA), and repeated-measures arrangements (including the intraclass correlation coefficient).

29.6.1 Multi-Factor Arrangements

The procedures discussed so far are called one-way analyses of variance, because only a single independent variable (i.e., histologic category) was examined in relation to survival time. In many circumstances, however, two or more independent variables can be regarded as “factors” affecting the dependent variable. When these additional factors are included, the analysis is called two-way (or two-factor), three-way (or three-factor), etc.

For example, if the two factors of histologic category and TNM stage are considered simultaneously, the data for the 60 patients in Figure 29.1 would be arranged as shown in Table 29.3. The identification of individual survival times would require triple subscripts: i for the person, j for the row, and k for the column.


TABLE 29.3

Two-Way Arrangement of Individual Data for Survival Time (in Months) of Patients with Lung Cancer

| Histologic Category | TNM Stage I | TNM Stage II | TNM Stage IIIA | TNM Stage IIIB | TNM Stage IV | Mean for Total Row Category |
|---|---|---|---|---|---|---|
| Well | 82.3, 20.3, 54.9, 28.0, 12.2, 39.9, 79.7 | 5.3, 4.0, 1.6, 55.9, 23.9, 2.6 | 29.6, 13.3 | 1.6, 14.1, 4.5, 62.0, 0.2, 0.6 | 1.0 | 24.43 |
| Small | 10.3 | 6.8 | 0.2 | (empty) | 4.4, 5.5, 0.3, 0.6, 11.2, 3.7, 3.4, 2.5 | 4.45 |
| Anap | 0.1, 10.9, 0.2 | 19.3, 27.9, 99.9 | 7.6, 1.3 | 1.4, 6.5, 2.9, 0.8 | 1.8, 6.0, 1.6, 0.9, 4.7, 1.9 | 10.87 |
| Cytol | 12.8, 8.6 | (empty) | (empty) | 8.1, 8.8, 1.8 | 1.0, 6.2, 10.6, 46.0 | 11.54 |
| Mean for Total Column Category | 27.7 | 24.72 | 10.40 | 8.72 | 5.96 | 14.77 |

29.6.1.1 “Main Effects”: In the mathematical model of the two-way arrangement, the categorical mean for each factor (Histology and TNM Stage) makes a separate contribution, called the “main effect,” beyond the grand mean. The remainder (or unexplained) deviation for each person is called the residual error. Thus, a two-factor model for the two independent variables would express the observed results as

$$Y_{ijk} = \bar{G} + (\bar{Y}_j - \bar{G}) + (\bar{Y}_k - \bar{G}) + (Y_{ijk} - \bar{Y}_j - \bar{Y}_k + \bar{G}) \qquad [29.2]$$

The $\bar{G}$ term here represents the grand mean. The next two terms represent the respective deviations of each row mean ($\bar{Y}_j$) and each column mean ($\bar{Y}_k$) from the grand mean. The four components in the last term for the residual deviation of each person are constructed as “residuals” that maintain the algebraic identity. The total sum of squares in the system will be $\Sigma (Y_{ijk} - \bar{G})^2$, with N − 1 degrees of freedom. There will be two sums of squares for the model, cited as $\Sigma n_j(\bar{Y}_j - \bar{G})^2$ for the row factor, and as $\Sigma n_k(\bar{Y}_k - \bar{G})^2$ for the column factor. The residual sum of squares will be the sum of all the values of $(Y_{ijk} - \bar{Y}_j - \bar{Y}_k + \bar{G})^2$.
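The following sketch recomputes the grand mean, the row and column means, the total sum of squares, and the row-factor sum of squares from the individual survival times. The assignment of values to cells is read from the layout of Table 29.3 and should be treated as an assumption; all names are illustrative. The sketch stops at the row factor and the total, and does not attempt the adjusted calculations of the printed ANOVA.

```python
# Survival times (months) keyed by (histologic category, TNM stage),
# as read from Table 29.3; empty cells are simply omitted.
cells = {
    ("Well", "I"): [82.3, 20.3, 54.9, 28.0, 12.2, 39.9, 79.7],
    ("Well", "II"): [5.3, 4.0, 1.6, 55.9, 23.9, 2.6],
    ("Well", "IIIA"): [29.6, 13.3],
    ("Well", "IIIB"): [1.6, 14.1, 4.5, 62.0, 0.2, 0.6],
    ("Well", "IV"): [1.0],
    ("Small", "I"): [10.3],
    ("Small", "II"): [6.8],
    ("Small", "IIIA"): [0.2],
    ("Small", "IV"): [4.4, 5.5, 0.3, 0.6, 11.2, 3.7, 3.4, 2.5],
    ("Anap", "I"): [0.1, 10.9, 0.2],
    ("Anap", "II"): [19.3, 27.9, 99.9],
    ("Anap", "IIIA"): [7.6, 1.3],
    ("Anap", "IIIB"): [1.4, 6.5, 2.9, 0.8],
    ("Anap", "IV"): [1.8, 6.0, 1.6, 0.9, 4.7, 1.9],
    ("Cytol", "I"): [12.8, 8.6],
    ("Cytol", "IIIB"): [8.1, 8.8, 1.8],
    ("Cytol", "IV"): [1.0, 6.2, 10.6, 46.0],
}

def mean(values):
    return sum(values) / len(values)

def group_by(index):
    """Pool the cell values by row (index 0) or by column (index 1)."""
    groups = {}
    for key, values in cells.items():
        groups.setdefault(key[index], []).extend(values)
    return groups

all_values = [y for values in cells.values() for y in values]
grand_mean = mean(all_values)
rows = group_by(0)
columns = group_by(1)

# Total sum of squares and the row-factor (histology) sum of squares
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
ss_rows = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in rows.values())

print(len(all_values), round(grand_mean, 2))      # 60 14.77
print({k: round(mean(v), 2) for k, v in rows.items()})
print({k: round(mean(v), 2) for k, v in columns.items()})
print(round(ss_total, 2), round(ss_rows, 2))      # 29304.61 3593.38
```

The two sums of squares agree with the values of 29304.61 and 3593.38 cited in the text, which supports the reading of the table used here.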

Figure 29.3 shows the printout of pertinent calculations for the data in Table 29.3. In the lower half of Figure 29.3, the 4-category histologic variable has 3 degrees of freedom and its “Type I SS” (sum of squares) and mean square, respectively, are the same 3593.38 and 1197.79 shown earlier. The 5-category TNM-stage variable has 4 degrees of freedom and corresponding values of 3116.39 and 779.10. The residual error group variance in the upper part of the table is now calculated differently, as the “corrected total” sum of squares minus the sum of Type I squares, which is a total of 6709.77 for the two factors in the model. Since those two factors have 7 (= 3 + 4) degrees of freedom, the mean square for the model is 6709.77/7 = 958.54, and the d.f. in the error variance is 59 − 7 = 52. The mean square for the error variance becomes 22594.84/52 = 434.52. When calculated for this two-factor model, the F ratio of mean squares is 2.21, which now achieves a P value (marked “Pr > F”) just below .05. If the α level is set at .05, this result is “significant,” whereas it was not so in the previous analysis for histology alone.

Dependent Variable: SURVIVE

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 7 | 6709.7729638 | 958.5389948 | 2.21 | 0.0486 |
| Error | 52 | 22594.8403695 | 434.5161610 | | |
| Corrected Total | 59 | 29304.6133333 | | | |

| R-Square | C.V. | Root MSE | SURVIVE Mean |
|---|---|---|---|
| 0.228966 | 141.1629 | 20.845051 | 14.766667 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| HISTOL | 3 | 3593.3800000 | 1197.7933333 | 2.76 | 0.0515 |
| TNMSTAGE | 4 | 3116.3929638 | 779.0982410 | 1.79 | 0.1443 |

FIGURE 29.3

Printout for 2-way ANOVA of data in Figure 29.1 and Table 29.3.

The label “Type I SS” is used because ANOVA calculations can also produce three other types of sums of squares (marked II, III, and IV when presented) that vary with the order in which factors are entered or removed in a model, and with consideration of the interactions discussed in the next section. As shown in the lower section of Figure 29.3, an F-ratio value can be calculated for each factor when its mean square is divided by the “error” mean square. For histology, this ratio is 1197.79/434.52 = 2.76. For TNM stage, the corresponding value in the printout is 1.79. The corresponding 2P values are just above .05 for histology and .14 for TNM stage.
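The mean squares and F ratios quoted for the two-factor model can be retraced from the sums of squares and degrees of freedom in Figure 29.3. This is only a sketch of the arithmetic; the variable names are illustrative, and the P values are not computed because they would require an F-distribution routine.

```python
# Sums of squares and degrees of freedom as printed in Figure 29.3
ss_histol, df_histol = 3593.3800000, 3
ss_tnm, df_tnm = 3116.3929638, 4
ss_total, df_total = 29304.6133333, 59

# Model = sum of the two factors; error = corrected total minus model
ss_model = ss_histol + ss_tnm
df_model = df_histol + df_tnm
ss_error = ss_total - ss_model
df_error = df_total - df_model

ms_model = ss_model / df_model
ms_error = ss_error / df_error

# Each F ratio divides a mean square by the "error" mean square
f_model = ms_model / ms_error
f_histol = (ss_histol / df_histol) / ms_error
f_tnm = (ss_tnm / df_tnm) / ms_error

print(df_model, df_error)                                      # 7 52
print(round(ms_model, 2), round(ms_error, 2))                  # 958.54 434.52
print(round(f_model, 2), round(f_histol, 2), round(f_tnm, 2))  # 2.21 2.76 1.79
```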

29.6.1.2 Interactions — In linear models, each factor is assumed to have its own separate additive effect. In biologic reality, however, the conjunction of two factors may have an antagonistic or synergistic effect beyond their individual actions, so that the whole differs from the sum of the parts. For example, increasing weight and increasing blood pressure may each lead to increasing mortality, but their combined effect may be particularly pronounced in persons who are at the extremes of obesity and hypertension. Statisticians use the term interactions for these conjunctive effects; and the potential for interactions is often considered whenever an analysis contains two or more factors.

To examine these effects in a two-factor analysis, the model for $Y_{ijk}$ is expanded to contain an interaction term. It is calculated, for the mean of each cell of the conjoined categories, as the deviation from the product of mean values of the pertinent row and column variables for each cell. In the expression of the equation for $Y_{ijk}$, the first three terms of Equation [29.2] are the same: $\bar{G}$, for the grand mean; $\bar{Y}_j - \bar{G}$ for each row; and $\bar{Y}_k - \bar{G}$ for each column. Because the observed mean in each cell will be $\bar{Y}_{jk}$, the interaction effect will be the deviation estimated as $\bar{Y}_{jk} - \bar{Y}_j - \bar{Y}_k + \bar{G}$. The remaining residual effect, used for calculating the residual sum of squares, is $Y_{ijk} - \bar{Y}_{jk}$. For each sum of squares, the degrees of freedom are determined appropriately for the calculations of mean squares and F ratios.

The calculation of interaction effects can be illustrated with an example from the data of Table 29.3 for the 7-member cell in the first row, first column. The grand mean is 14.77; the entire WELL histologic category has a mean of 24.43; and TNM stage I has a mean of 27.71. The mean of the seven values in the cited cell is (82.3 + 20.3 + … + 79.7)/7 = 45.33. According to the algebraic equation, $\bar{G}$ = 14.77; in the first row, ($\bar{Y}_j - \bar{G}$) = 24.43 − 14.77 = 9.66; and in the first column, ($\bar{Y}_k - \bar{G}$) = 27.71 − 14.77 = 12.94. The interaction effect in the cited cell will be estimated as 45.33 − 24.43 − 27.71 + 14.77 = 7.96. Because these four components add up to the cell mean (14.77 + 9.66 + 12.94 + 7.96 = 45.33), the estimated value of the residual for each of the seven $Y_{ijk}$ values in the cited cell will be $Y_{ijk} - 45.33$.
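The worked example for the first cell can be retraced with a few lines of code. The seven survival times and the three rounded means are those quoted in the text; the variable names are illustrative. The residuals are taken as deviations from the cell mean, following the definition of the residual effect given above.

```python
# Seven survival times in the cell for WELL histology and TNM stage I
well_stage_i = [82.3, 20.3, 54.9, 28.0, 12.2, 39.9, 79.7]
cell_mean = sum(well_stage_i) / len(well_stage_i)

# Rounded means quoted in the text
grand_mean, row_mean, column_mean = 14.77, 24.43, 27.71

row_effect = row_mean - grand_mean
column_effect = column_mean - grand_mean
interaction = cell_mean - row_mean - column_mean + grand_mean

# Residual for each person: deviation from the mean of that person's cell
residuals = [y - cell_mean for y in well_stage_i]

print(round(cell_mean, 2), round(row_effect, 2),
      round(column_effect, 2), round(interaction, 2))  # 45.33 9.66 12.94 7.96
```

By construction, the grand mean plus the two main effects plus the interaction effect returns the cell mean, so the residuals within the cell sum to zero.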


Figure 29.4 shows the printout of the ANOVA table when an interaction model is used for the two-factor data in Table 29.3. In Figure 29.4, the sums of squares (marked Type I SS) and mean squares for histology and TNM stage are the same as in Figure 29.3, and they also have the same degrees of freedom. The degrees of freedom for the interaction are tricky to calculate, however. In this instance, because some of the cells of Table 29.3 are empty or have only 1 member, we first calculate degrees of freedom for the residual sum of squares, $\Sigma (Y_{ijk} - \bar{Y}_{jk})^2$. In each pertinent cell, located at (j, k) coordinates in the table, the degrees of freedom will be $n_{jk} - 1$. Working across and then downward through the cells in Table 29.3, the sum of the $n_{jk} - 1$ values will be 6 + 5 + 1 + 5 + 7 + 2 + 2 + 1 + 3 + 5 + 1 + 2 + 3 = 43. (The values are 0 for the four cells with one member each and also for the 3 cells with no members.)

This calculation shows that the model accounts for 59 − 43 = 16 d.f.; and as the two main factors have a total of 7 d.f., the interaction factor contributes 9 d.f. to the model, as shown in the last row of Figure 29.4.
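The bookkeeping for degrees of freedom can be sketched as follows. The cell sizes are counted from Table 29.3 (an assumption about how the table is read), and the names are illustrative.

```python
# Number of patients per cell of Table 29.3
# (rows: histology; columns: TNM stages I, II, IIIA, IIIB, IV)
cell_sizes = {
    "Well":  [7, 6, 2, 6, 1],
    "Small": [1, 1, 1, 0, 8],
    "Anap":  [3, 3, 2, 4, 6],
    "Cytol": [2, 0, 0, 3, 4],
}

n_total = sum(sum(row) for row in cell_sizes.values())
df_total = n_total - 1

# Each non-empty cell contributes (n_jk - 1) to the residual d.f.
df_error = sum(n - 1 for row in cell_sizes.values() for n in row if n > 0)

df_model = df_total - df_error
df_main_effects = (4 - 1) + (5 - 1)   # histology + TNM stage
df_interaction = df_model - df_main_effects

print(n_total, df_error, df_model, df_interaction)  # 60 43 16 9
```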

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 16 | 13835.482381 | 864.717649 | 2.40 | 0.0114 |
| Error | 43 | 15469.130952 | 359.747231 | | |
| Corrected Total | 59 | 29304.613333 | | | |

| R-Square | C.V. | Root MSE | SURVIVE Mean |
|---|---|---|---|
| 0.472126 | 128.4447 | 18.967004 | 14.766667 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| HISTOL | 3 | 3593.3800000 | 1197.7933333 | 3.33 | 0.0282 |
| TNMSTAGE | 4 | 3116.3929638 | 779.0982410 | 2.17 | 0.0890 |
| HISTOL*TNMSTAGE | 9 | 7125.7094171 | 791.7454908 | 2.20 | 0.0408 |

FIGURE 29.4

Two-way ANOVA, with interaction component, for results in Table 29.3 and Figure 29.3. [Printout from SAS PROC GLM computer program.]

Calculated with the new mean square error term in Figure 29.4, the F values produce 2P values below .05 for the model, for the histology factor, and for the histology-TNM-stage interaction. The 2P value is about .09 for the TNM-stage main effect.

The difficult challenge of interpreting three-way and more complex interactions is considered elsewhere² in discussions of multivariable analysis.

29.6.2 Nested Analyses

The groups of a single factor in ANOVA can sometimes be divided into pertinent subgroups. For example, the three treatments A, B, and C might each have been given in two sets of doses, low and high, so that six subgroups could be analyzed, two for each treatment. The results can then be evaluated with a procedure called a hierarchical or nested analysis. The variations in the total sum of squares would arise for the six subgroups and the three main groups, and the analysis is planned accordingly.

29.6.3 Analysis of Covariance

An analysis of covariance (acronymically designated as ANCOVA) can be done for at least two reasons. The first is to adjust for the action of a second factor suspected of being a confounder in affecting both the dependent variable and the other factor under analysis.

The second reason is to allow appropriate analyses of a ranked independent variable that is expressed in either a dimensional or ordinal scale. This ranking is ignored when the ordinary ANOVA procedure relies on nominal categories for the independent variable. Thus, in the analyses shown in Figs. 29.3 and 29.4, the polytomous categories of TNM stage were managed as though they were nominal. To allow maintenance of the ranks, TNM stage could be declared a covariate, which would then be analyzed as though it had a dimensional scale.

The results of the covariance analysis are shown in Figure 29.5. Note that TNM stage now has only 1 degree of freedom, thus giving the model a total of 4 D.F., an F value of 3.61 and a P value of 0.0111, despite a decline of R-square from .229 in Figure 29.3 to .208 in Figure 29.5. The histology variable, which had P = .052 in Figure 29.3 now has P = .428; and TNM stage, with P = .144 in Figure 29.3, has now become highly significant at P = .0012. These dramatic changes indicate what can happen when the rank sequence is either ignored or appropriately analyzed for polytomous variables.

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 4 | 6087.9081999 | 1521.9770500 | 3.61 | 0.0111 |
| Error | 55 | 23216.7051334 | 422.1219115 | | |
| Corrected Total | 59 | 29304.6133333 | | | |

| R-Square | C.V. | Root MSE | SURVIVE Mean |
|---|---|---|---|
| 0.207746 | 139.1350 | 20.545606 | 14.766667 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| TNMSTAGE | 1 | 4897.3897453 | 4897.3897453 | 11.60 | 0.0012 |
| HISTOL | 3 | 1190.5184546 | 396.8394849 | 0.94 | 0.4276 |

FIGURE 29.5

Printout of Analysis of Covariance for data in Figure 29.3, with TNM stage used as ranked variable.
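The single-degree-of-freedom sum of squares for TNM stage can be illustrated with a simple linear regression of survival time on stage. The sketch assumes that stages I, II, IIIA, IIIB, and IV were coded 1 through 5; the text does not state the coding, so this is an assumption, although it reproduces the printed Type I SS. The survival times are pooled by stage from Table 29.3, and all names are illustrative.

```python
# Survival times (months) pooled by TNM stage, with the stages coded 1..5
# (the 1..5 coding is an assumption, not stated in the text)
survival_by_stage = {
    1: [82.3, 20.3, 54.9, 28.0, 12.2, 39.9, 79.7, 10.3, 0.1, 10.9, 0.2, 12.8, 8.6],
    2: [5.3, 4.0, 1.6, 55.9, 23.9, 2.6, 6.8, 19.3, 27.9, 99.9],
    3: [29.6, 13.3, 0.2, 7.6, 1.3],
    4: [1.6, 14.1, 4.5, 62.0, 0.2, 0.6, 1.4, 6.5, 2.9, 0.8, 8.1, 8.8, 1.8],
    5: [1.0, 4.4, 5.5, 0.3, 0.6, 11.2, 3.7, 3.4, 2.5,
        1.8, 6.0, 1.6, 0.9, 4.7, 1.9, 1.0, 6.2, 10.6, 46.0],
}

x = [code for code, ys in survival_by_stage.items() for _ in ys]
y = [value for ys in survival_by_stage.values() for value in ys]
n = len(y)
mean_x = sum(x) / n
mean_y = sum(y) / n

s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = s_xy / s_xx                 # change in survival per one-step rise in stage
ss_regression = s_xy ** 2 / s_xx    # sum of squares explained by the linear trend

print(n, round(slope, 2), round(ss_regression, 2))  # 60 -5.77 4897.39
```

Under this coding, the regression sum of squares matches the value of 4897.39 shown for TNMSTAGE in Figure 29.5, with only 1 degree of freedom spent on the ranked variable.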

In past years, the effect of confounding or ranked covariates was often formally “adjusted” in an analysis of covariance, using a complex set of computations and symbols. Today, however, the same adjustment is almost always done with a multiple regression procedure. The adjustment process in ANCOVA is actually a form of regression analysis in which the related effects of the covariate are determined by regression and then removed from the error variance. The group means of the main factor are also “adjusted” to correspond to a common value of the covariate. The subsequent analysis is presumably more “powerful” in detecting the effects of the main factor, because the confounding effects have presumably been “removed.” The process and results are usually much easier to understand, however, when done with multiple linear regression.²

29.6.4 Repeated-Measures Arrangements

Repeated measures is the name given to analyses in which the same entity has been observed repeatedly. The repetitions can occur with changes over time, perhaps after interventions such as treatment, or with examinations of the same (unchanged) entity by different observers or systems of measurement.

29.6.4.1 Temporal Changes — The most common repeated-measures situation is an ordinary crossover study, where the same patients receive treatments A and B. The effects of treatment A vs. treatment B in each person can be subtracted and thereby reduced to a single group of increments, which can be analyzed with a paired t test, as discussed in Section 7.8.2.2. The same analysis of increments can be used for the before-and-after measurements of the effect in patients receiving a particular treatment, such as the results shown earlier for blood glucose in Table 7.4.

Because the situations just described can easily be managed with paired t tests, the repeated-measures form of ANOVA is usually reserved for situations in which the same entity has been measured at three or more time points. The variables that become the main factors in the analysis are the times and the groups (such as treatment). Interaction terms can be added for the effects of groups × times.


Four major problems, for which consensus solutions do not yet exist, arise when the same entity is measured repeatedly over time:

1. Independence. The first problem is violation of the assumption that the measurements are independent. The paired t test manages this problem by reducing the pair of measurements to their increment, which becomes a simple “new” variable. This distinction may not always be suitably employed with more than two sets of repeated measurements.

2. Incremental components. A second problem is the choice of components for calculating incremental changes for each person. Suppose $t_0$ is an individual baseline value, and the subsequent values are $t_1$, $t_2$, and $t_3$. Do we always measure increments from the baseline value, i.e., $t_1 - t_0$, $t_2 - t_0$, and $t_3 - t_0$, or should the increments be listed successively as $t_1 - t_0$, $t_2 - t_1$, $t_3 - t_2$?

3. Summary index of response. If a treatment is imposed after the baseline value at $t_0$, what is the best single index for summarizing the post-therapeutic response? Should it be the mean of the post-treatment values, the increment between $t_0$ and the last measurement, or a regression line for the set of values?

4. Neglect of trend. This problem is discussed further in Section 29.8. As noted earlier, an ordinary analysis of variance does not distinguish between unranked nominal and ranked ordinal categories in the independent polytomous variable. If the variable represents serial points in time, their ranking may produce a trend, but it will be neglected unless special arrangements are used in the calculations.
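The competing choices in the second and third problems can be made concrete with a small sketch. The four serial values are hypothetical and chosen only for illustration; none of them comes from the text.

```python
# Hypothetical serial values for one person: t0 (baseline), t1, t2, t3
values = [100.0, 92.0, 85.0, 88.0]

# Problem 2: increments from baseline vs. successive increments
from_baseline = [v - values[0] for v in values[1:]]
successive = [later - earlier for earlier, later in zip(values, values[1:])]

# Problem 3: two candidate single indexes of post-treatment response
mean_post_treatment = sum(values[1:]) / len(values[1:])
last_minus_baseline = values[-1] - values[0]

print(from_baseline)        # [-8.0, -15.0, -12.0]
print(successive)           # [-8.0, -7.0, 3.0]
print(last_minus_baseline)  # -12.0
```

The same four measurements yield quite different sets of increments, which is why the choice has to be made on substantive rather than purely mathematical grounds.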

29.6.4.2 Intraclass Correlations — Studies of observer or instrument variability can also be regarded as a type of repeated measures, for which the results are commonly cited with an intraclass correlation coefficient (ICC).

As noted in Section 20.7.3, the basic concept was developed as a way of assessing agreement for measurements of a dimensional variable, such as height or weight, between members of the same class, such as brothers in a family. To avoid the inadequacy of a correlation coefficient, the data were appraised with a repeated-measures analysis-of-variance. To avoid decisions about which member of a pair should be listed as the first or second measurements, all possible pairs were listed twice, with each member as the first measurement and then as the second. The total sums of squares could be partitioned into one sum for variability between the individuals being rated, i.e., the subjects (SSS), and another sum of squares due to residual error (SSE). The intraclass correlation was then calculated as

$$R_I = \frac{\mathrm{SSS} - \mathrm{SSE}}{\mathrm{SSS} + \mathrm{SSE}}$$

The approach was later adapted for psychometric definitions of reliability. The appropriate means for the sums of squares were symbolized as $s_c^2$ for variance in the subjects and $s_e^2$ for the corresponding residual errors. Reliability was then defined as

$$R_I = \frac{s_c^2}{s_c^2 + s_e^2}$$

Using the foregoing symbols, when each of a set of n persons is measured by each of a set of r raters, the variance of a single observation, $s^2$, can be partitioned as

$$s^2 = s_c^2 + s_r^2 + s_e^2$$

where $s_r^2$ is the mean of the appropriate sums of squares for the raters.

These variances can be arranged into several formulas for calculating $R_I$. The different arrangements depend on the models used for the “sampling” and the interpretation.⁶ In a worked example cited by Everitt,⁷ vital capacity was measured by four raters for each of 20 patients. The total sum of squares for the 80 observations, with d.f. = 79, was divided into three sets of sums of squares: (1) for the four observers with d.f. = 3; (2) for the 20 patients with d.f. = 19; and (3) for the residual “error” with d.f. = 3 × 19 = 57. The formula used by Everitt for calculating the intraclass correlation coefficient was

$$R_I = \frac{n(s_c^2 - s_e^2)}{n s_c^2 + r s_r^2 + (nr - n - r) s_e^2}$$

A counterpart formula, using SSR to represent sums of squares for raters, is

$$R_I = \frac{\mathrm{SSS} - \mathrm{SSE}}{\mathrm{SSS} + \mathrm{SSE} + 2(\mathrm{SSR})}$$
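The formulas in this section can be transcribed directly into small functions, which makes the differences among them easier to see. This is a sketch that follows the expressions as printed here; the function names and the numerical inputs are hypothetical and are not taken from Everitt's example.

```python
def icc_pairs(ss_subjects, ss_error):
    """R_I = (SSS - SSE) / (SSS + SSE)."""
    return (ss_subjects - ss_error) / (ss_subjects + ss_error)

def reliability(var_subjects, var_error):
    """R_I = s_c^2 / (s_c^2 + s_e^2)."""
    return var_subjects / (var_subjects + var_error)

def icc_n_by_r(n, r, var_subjects, var_raters, var_error):
    """R_I for n persons each measured by r raters, as printed in the text."""
    numerator = n * (var_subjects - var_error)
    denominator = n * var_subjects + r * var_raters + (n * r - n - r) * var_error
    return numerator / denominator

def icc_with_raters(ss_subjects, ss_raters, ss_error):
    """Counterpart formula: (SSS - SSE) / (SSS + SSE + 2 SSR)."""
    return (ss_subjects - ss_error) / (ss_subjects + ss_error + 2 * ss_raters)

# Hypothetical inputs, for illustration only
print(icc_pairs(30.0, 10.0))                # 0.5
print(reliability(3.0, 1.0))                # 0.75
print(icc_n_by_r(20, 4, 5.0, 1.0, 1.0))     # 0.5
print(icc_with_raters(30.0, 5.0, 10.0))     # 0.4
```

With the subject and error terms held fixed, adding rater variability to the denominator lowers the coefficient, as the last call shows in comparison with the first.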

The intraclass correlation coefficient (ICC) can be used when laboratory measurements of “instrument” variability are expressed in dimensional data. Nevertheless, as discussed in Chapter 20, most laboratories prefer to use simpler pair-wise and other straightforward statistical approaches that are easier to understand and interpret than the ICC.

The simpler approaches may also have mathematical advantages that have been cited by Bland and Altman,⁸ who contend that the ICC, although appropriate for repetitions of the same measurement, is unsatisfactory “when dealing with measurements by two different methods” where “there is no ordering of the repeated measures and hence no obvious choice of X or Y.” Other disadvantages ascribed to the ICC are that it depends “on the range of measurement and … is not related to the actual scale of measurement or to the size of error which might be clinically allowable.” Instead, Bland and Altman recommend their “limits of agreement” method, which was discussed throughout Section 20.7.1. The method relies on examining the increments in measurement for each subject. The mean difference then indicates bias, and the standard deviation is used to calculate a 95% descriptive zone for the “limits of agreement.” A plot of the differences against the mean value of each pair will indicate whether the discrepancies in measurement diverge as the measured values increase.
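A minimal sketch of the limits-of-agreement calculation is shown below. The paired measurements are hypothetical, and the multiplier of 1.96 for the 95% zone is an assumption based on the usual Gaussian convention rather than a value stated in this section.

```python
from statistics import mean, stdev

# Hypothetical paired measurements of the same five subjects by two methods
method_a = [10.0, 12.0, 14.0, 16.0, 18.0]
method_b = [11.0, 11.0, 15.0, 15.0, 19.0]

differences = [a - b for a, b in zip(method_a, method_b)]
pair_means = [(a + b) / 2 for a, b in zip(method_a, method_b)]  # x-axis of the plot

bias = mean(differences)      # mean difference indicates bias
sd = stdev(differences)       # sample standard deviation of the differences
lower_limit = bias - 1.96 * sd
upper_limit = bias + 1.96 * sd

print(round(bias, 2), round(lower_limit, 2), round(upper_limit, 2))  # -0.2 -2.35 1.95
```

Plotting `differences` against `pair_means` would give the graph described in the text for checking whether the discrepancies grow with the size of the measurement.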

For categorical data, concordance is usually expressed (see Chapter 20) with other indexes of variability, such as kappa, which yields the same results as the intraclass coefficient in pertinent situations.

29.7 Non-Parametric Methods of Analysis

The mathematical models of ANOVA require diverse assumptions about Gaussian distributions and homoscedastic (i.e., similar) variances. These assumptions can be avoided by converting the dimensional data to ranks and analyzing the values of the ranks. The Kruskal-Wallis procedure, which is the eponym for a one-way ANOVA using ranked data, corresponds to a Wilcoxon–Mann–Whitney U test for 3 or more groups. The Friedman procedure, which refers to a two-way analysis of ranked data, was proposed almost 60 years ago by Milton Friedman, who later became more famous in economics than in statistics.
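To show what "analyzing the values of the ranks" involves, the sketch below ranks the pooled data and computes the Kruskal-Wallis H statistic with the standard textbook formula, H = 12/(N(N + 1)) Σ(R_j²/n_j) − 3(N + 1). The formula is not given in this section, the data are hypothetical, and the correction factor for tied values is omitted for brevity.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic from rank sums (no tie correction)."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data: three completely separated groups of three values
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(round(kruskal_wallis_h(groups), 2))  # 7.2
```

The statistic is usually referred to a chi-square distribution with (number of groups − 1) degrees of freedom; that step is left out here.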

29.8 Problems in Analysis of Trends

If a variable has ordinal grades, the customary ANOVA procedure will regard the ranked categories merely as nominal, and will not make provision for the possible or anticipated trend associated with different ranks. The problem occurs with an ordinal variable, such as TNM stage in Figure 29.1, because the effect of an increasing stage is ignored. The neglect of a ranked effect can be particularly important when the independent variable (or “factor”) is time, for which the effects might be expected to occur in a distinct temporal sequence. This problem in repeated-measures ANOVA evoked a denunciation by Sheiner,⁹ who contended that the customary ANOVA methods were wholly inappropriate for many studies of the time effects of pharmacologic agents.


The appropriate form of analysis can be carried out, somewhat in the manner of the chi-square test for linear trend in an array of proportions (see Chapter 27), by assigning arbitrary coding values (such as 1, 2, 3, 4) to the ordinal categories. The process is usually done more easily and simply, however, as a linear regression analysis.

29.9 Use of ANOVA in Published Literature

To find examples of ANOVA in published medical literature, the automated Colleague Medical Database was searched for papers, in English, of human-subject research that appeared in medical journals during 1991–95, and in which analysis of variance was mentioned in the abstract-summary. From the list of possibilities, 15 were selected to cover a wide array of journals and topics. The discussion that follows is a summary of results in those 15 articles.

A one-way analysis of variance was used to check the rate of disappearance of ethanol from venous blood in 12 subjects who drank the same dose of alcohol in orange juice on four occasions.¹⁰ The authors concluded that the variation between subjects exceeded the variations within subjects. Another classical one-way ANOVA was done to examine values of intestinal calcium absorption and serum parathyroid hormone levels in three groups of people: normal controls and asthmatic patients receiving either oral or inhaled steroid therapy.¹¹ A one-way ANOVA compared diverse aspects of functional status in two groups of patients receiving either fluorouracil or saline infusions for head and neck cancer.¹² In a complex but essentially one-way ANOVA, several dependent variables (intervention points, days of monitoring, final cardiovascular function) were related to subgroups defined by APACHE II severity scores in a surgical intensive care unit.¹³ (The results were also examined in a regression analysis.) In another one-way analysis of variance, preference ratings for six different modes of teaching and learning were evaluated¹⁴ among three groups, comprising first-year, second-year, and fourth-year medical students in the United Arab Emirates. The results were also examined for the preferences of male vs. female students.

In a two-way ANOVA, neurologic dysfunction at age four years was related¹⁵ to two main factors: birth weight and location of birth in newborn intensive care units of either Copenhagen or Dublin. Multifactor ANOVAs were applied,¹⁶ in 20 patients with conjunctival malignant melanoma, to the relationship between 5-year survival and the counts of cells positive for proliferating cell nuclear antigen, predominant cell type, maximum tumor depth, and site of tumor. The result, showing that patients with low counts had better prognoses, was then “confirmed” with a Cox proportional hazards regression analysis. (The latter approach would probably have been best used directly.)

Repeated measures ANOVA was used in the following studies: to check the effect of oat bran consumption on serum cholesterol levels at four time points;¹⁷ to compare various effects (including blood pressure levels and markers of alcohol consumption) in hypertensive men randomized to either a control group or to receive special “advice” about methods of reducing alcohol consumption;¹⁸ to assess the time trend of blood pressure during a 24-hour monitoring period in patients receiving placebo or an active antihypertensive agent;¹⁹ and to monitor changes at three time points over 6 months in four indexes (body weight, serum osmolality, serum sodium, and blood urea nitrogen/creatinine ratios) for residents of a nursing home.²⁰

The intraclass correlation coefficient was used in three other studies concerned with reliability (or reproducibility) of the measurements performed in neuropathic tests,21 a brief psychiatric rating scale,22 and a method of grading photoageing in skin casts.23
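As a rough illustration of how an intraclass correlation coefficient can be obtained from the mean squares of a one-way ANOVA, the sketch below uses the one-way random-effects form, (MSB − MSW)/[MSB + (k − 1)MSW], where k is the number of measurements per subject. Whether the cited studies used this particular form is an assumption; the function name `icc_one_way` and the scores are hypothetical.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC; each inner list holds one subject's k measurements."""
    n = len(ratings)
    k = len(ratings[0])  # assumes every subject has the same number of measurements
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    means = [sum(r) / k for r in ratings]
    # Mean square between subjects (n - 1 degrees of freedom).
    ms_between = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    # Mean square within subjects (n * (k - 1) degrees of freedom).
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: three subjects, each measured twice.
scores = [[1, 2], [3, 4], [5, 6]]
icc = icc_one_way(scores)
print(f"ICC = {icc:.3f}")
```

A value near 1 indicates that most of the variation lies between subjects rather than between repeated measurements of the same subject.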

References

1. Feinstein, 1990d; 2. Feinstein, 1996; 3. Oxford English Dictionary, 1971; 4. Lentner, 1982; 5. Tukey, 1968; 6. Shrout, 1979; 7. Everitt, 1989; 8. Bland, 1990; 9. Sheiner, 1992; 10. Jones, 1994; 11. Luengo, 1991; 12. Browman, 1993; 13. Civetta, 1992; 14. Paul, 1994; 15. Ellison, 1992; 16. Seregard, 1993; 17. Saudia, 1992; 18. Maheswaran, 1992; 19. Tomei, 1992; 20. Weinberg, 1994; 21. Dyck, 1991; 22. Hafkenscheid, 1993; 23. Fritschi, 1995.

© 2002 by Chapman & Hall/CRC

References

[Numbers in brackets indicate chapter(s) where reference was cited]

Abbe, E. Gesammelte Abhandlungen. Vol. II. Jena, Germany: Gustav Fischer Verlag, 1906. [14]

Abramson, J.H. Age-standardization in epidemiological data (Letter to editor). Int. J. Epidemiol. 1995; 24:238–239. [26]

Adams, W.J. The Life and Times of the Central Limit Theorem. New York: Kaedmon Publishing Co., 1974. [7]

Agocs, M.M., White, M.C., Ursica, G., Olson, D.R., and Vamon, A. A longitudinal study of ambient air pollutants and the lung peak expiratory flow rates among asthmatic children in Hungary. Int. J. Epidemiol. 1997; 26:1272–1280. [22]

Aickin, M. and Gensler, H. Adjusting for multiple testing when reporting research results: the Bonferroni vs. Holm methods. Am. J. Public Health 1996; 86:726–728. [25]

Aitken, R.C.B. Measurement of feelings using visual analogue scales. Proc. Roy. Soc. Med. 1969; 62:989–993. [15]

Albert, J. Exploring baseball hitting data: What about those breakdown statistics? J. Am. Stat. Assn. 1994; 89:1066–1074. [8]

Allman, R.M., Goode, P.S., Patrick, M.M., Burst, N., and Bartolucci, A.A. Pressure ulcer risk factors among hospitalized patients with activity limitation. JAMA 1995; 273:865–870. [15]

Allred, E.N., Bleecker, E.R., Chaitman, B.R. et al. Short-term effects of carbon monoxide exposure on the exercise performance of subjects with coronary artery disease. N. Engl. J. Med. 1989; 321:1426–1432. [19]

Altman, D.G. Statistics and ethics in medical research. III. How large a sample? Br. Med. J. 1980; 281:1336–1338. [23]

Altman, D.G. and Gardner, M.J. Presentation of variability (Letter to editor). Lancet 1986; 2:639. [9]

Altman, D.G. and Gardner, M.J. Calculating confidence intervals for regression and correlation. Br. Med. J. 1988; 296:1238–1242. [19]

American Journal of Epidemiology. Special issue on National Conference on Clustering of Health Events, Atlanta. Am. J. Epidemiol. 1990; 132:S1–S202. [25]

Anderson, R.N. and Rosenberg, H.M. Age Standardization of Death Rates: Implementation of the Year 2000 Standard. National Vital Statistics Reports, v. 47, n. 3. Hyattsville, MD: National Center for Health Statistics, 1998. [26]

Anderson, S. Individual bioequivalence: A problem of switchability. Biopharm. Rep. 1993; 2:1–5. [24]

Anscombe, F.J. Graphs in statistical analysis. Am. Statist. 1973; 27:17–21. [19]

Apgar, V. A proposal for a new method of evaluation of the newborn infant. Anesth. Analg. 1953; 32:260–267. [28]

Armitage, P. Statistical Methods in Medical Research. New York: John Wiley & Sons, 1971, 239–250 [24]; 389. [26]

Armitage, P. Sequential Medical Trials. 2nd ed. New York: John Wiley and Sons, 1975. [25]

Armitage, P. and Berry, G. Statistical Methods in Medical Research. 2nd ed. Oxford: Blackwell, 1987. [27]

Armitage, P. and Hills, M. The two-period crossover trial. Statistician 1982; 31:119–131. [15]

Armstrong, B., Stevens, N., and Doll, R. Retrospective study of the association between use of rauwolfia derivatives and breast cancer in English women. Lancet 1974; 2:672–675. [25]

Ashby, D. (Guest Ed.). Conference on methodological and ethical issues in clinical trials. Stat. in Med. 1993; 12:1373–1534. [11]
