22.7 Dynamic (Age-Period) Cohort Analysis
In the methods just discussed for estimating life expectancy, all of the mortality rates for each age group were cross-sectional. They came from demographic mortality data at a single period in secular time, which would usually be the latest available data for the general population. For example, the estimate of life expectancy for someone in 1993 would rely on the death rates noted for each age group in 1992, or in the closest calendar date (such as 1990) for which trustworthy information is available for census counts, numbers of deaths, and the corresponding age-specific mortality rates. This approach seems quite reasonable for an insurance company, which would want to estimate life expectancy for someone today according to the death rates that pertain today.
In various other circumstances, however, the investigators may take a more “historical” approach, aimed at determining the social or secular effects of changes in calendar time. For example, because of reductions in infant and other annual mortality rates, a cohort of persons born in 1905 might have life expectancies, at each successive age in each successive calendar interval, that differ substantially from a cohort of persons born in 1950. These two cohorts would be unfairly compared, however, if cross-sectional mortality rates were chosen from 1905 for all members of the first cohort, and from 1950 for the second. When persons born in 1905 became 20 years old, they were affected by the annual mortality rate for age 20 that existed in 1925, not in 1905. At age 40, they would be subject to the corresponding mortality rate in 1945. Similarly, the members of the 1950 birth cohort would be expected to die at age 20 according to the corresponding rate for 1970, and at age 40 according to the corresponding rate for 1990.
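The bookkeeping behind this distinction can be sketched with a small grid of purely hypothetical age-specific rates: a cross-sectional reading fixes one calendar year, whereas a cohort reading walks the diagonal at (age a, year of birth + a).

```python
# Hypothetical age-specific death rates per 1,000, keyed as rates[age][year].
# A cross-section fixes the calendar year; a cohort born in year B is
# observed at (age a, year B + a), i.e., along a diagonal of the grid.
rates = {
    0:  {1905: 140, 1925: 70, 1945: 35, 1965: 20},
    20: {1905: 6,   1925: 5,  1945: 4,  1965: 3},
    40: {1905: 11,  1925: 9,  1945: 8,  1965: 6},
    60: {1905: 35,  1925: 32, 1945: 28, 1965: 24},
}

def cross_section(year):
    """Rates for every age group in a single calendar year."""
    return {age: by_year[year] for age, by_year in rates.items()}

def cohort(birth_year):
    """Rates actually experienced by persons born in birth_year."""
    return {age: rates[age][birth_year + age]
            for age in rates if birth_year + age in rates[age]}

print(cross_section(1905))  # one column of the grid
print(cohort(1905))         # the diagonal: age 20 uses 1925, age 40 uses 1945
```

With these hypothetical numbers, the 1905 cohort's rate at age 20 is the 1925 value (5), not the 1905 value (6), which is exactly the point made in the text.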
Analyzing the changing age-specific rates for each calendar period, rather than the “cross-sectional” rates existing at a single calendar year, produces a dynamic approach that has received various names. Age-period cohort analysis is the most popular term today, but the approach has also been called a generation or fluent analysis.
22.7.1 Methods of Construction and Display
Although applicable to general demographic explorations, age-period-cohort analyses have appeared in medicine most often in epidemiologic research on changes in mortality rates for a particular disease. In an early demonstration in 1947, Margaret Merrell35 constructed the appropriate analytic grid, shown in Figure 22.6, using age-specific death rates for tuberculosis in Massachusetts men at 10-year intervals from 1880 to 1940. The columns show the “cross-sectional” secular calendar rates and the rows show the pertinent age groups. The demarcated diagonal segment shows the “cohort experience” as persons born in 1880 went through the changing age and secular rates at each period.
The data are usually plotted with age on the abscissa and age-specific death rates on the ordinate for each “birth cohort.” The graphs can become quite complex because the investigators often show both the cross-sectional and the diagonal rates for each cohort. Such an arrangement appears in Figure 22.7, which presents both sets of results for lung cancer in cohorts of U.S. men.36 The solid lines show the cross-sectional secular rates of age-specific deaths from lung cancer during calendar intervals ranging from 1931–35 to 1971–75. The dashed lines show the “diagonal” rates for the life experience of different birth cohorts, born in approximately 1876, 1881, 1886, 1891, 1896, and 1901.
22.7.2 Uncertainties and Problems
Because age-period-cohort analysis seems to be an attractive method of examining the “natural history” of general and disease-specific mortality, various statistical models have been proposed for the procedure. The proposals have been disputed, however, with authorities such as Holford37,38 favoring, and Kupper et al.36 doubting, the value of “currently available” models. The mathematical disputes refer to the choice of an appropriate statistical model for estimating parameters, and to the “identifiability” problem that arises because the age, period, and cohort effects are interrelated rather than independent.
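The “identifiability” problem can be made concrete: in every cell of the grid, cohort = period − age exactly, so a design matrix containing linear terms for all three effects loses one rank and the three linear trends cannot be separated without an external constraint. A minimal numerical sketch (hypothetical grid, numpy used only for the rank check):

```python
import numpy as np

# Build one design row per observed cell: intercept, age, period, and
# cohort = period - age. The last column is an exact linear combination
# of the first three, so the matrix cannot reach full column rank.
ages = [0, 10, 20, 30]
periods = [1900, 1910, 1920, 1930]
rows = [(1.0, a, p, p - a) for a in ages for p in periods]
X = np.array(rows)

print(X.shape)                    # 16 cells, 4 columns
print(np.linalg.matrix_rank(X))   # rank 3, not 4: linear effects unidentifiable
```

This is why every age-period-cohort model must impose some additional assumption (a dropped term, an equality constraint, or the like), and why the choice of assumption has been disputed.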
© 2002 by Chapman & Hall/CRC
Age      1880   1890   1900   1910   1920   1930   1940

0–4       760    578    309    209    109     41     11
5–9        43     49     31     21     24     11      2
10–19     126    115     90     63     49     21      4
20–29     444    361    288    207    149     81     35
30–39     378    368    396    353    164    115     51
40–49     364    336    253    253    175    118     86
50–59     366    325    267    252    171    127     92
60–69     475    346    304    246    172     95    109
70+       672    396    343    163    127     95     79
FIGURE 22.6
Age-specific death rates per 100,000 from tuberculosis for Massachusetts males, 1880 to 1940. Rates for the birth cohort of 1880 are indicated in the outlined diagonal strip. [Taken from Chapter Reference 35]
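Using the rates transcribed from Figure 22.6, the cohort diagonal can be read off programmatically. The sketch below keys the grid by the starting age of each band; because the 0–4 and 5–9 bands do not align with the 10-year period steps, age 0 is used to stand for the 0–4 band and the 5–9 row is omitted, so this illustrates the diagonal logic rather than reproducing Merrell's outlined strip exactly.

```python
# Tuberculosis death rates per 100,000 for Massachusetts males, keyed as
# grid[starting_age][period]; values transcribed from Figure 22.6.
grid = {
    0:  {1880: 760, 1890: 578, 1900: 309, 1910: 209, 1920: 109, 1930: 41,  1940: 11},
    10: {1880: 126, 1890: 115, 1900: 90,  1910: 63,  1920: 49,  1930: 21,  1940: 4},
    20: {1880: 444, 1890: 361, 1900: 288, 1910: 207, 1920: 149, 1930: 81,  1940: 35},
    30: {1880: 378, 1890: 368, 1900: 396, 1910: 353, 1920: 164, 1930: 115, 1940: 51},
    40: {1880: 364, 1890: 336, 1900: 253, 1910: 253, 1920: 175, 1930: 118, 1940: 86},
    50: {1880: 366, 1890: 325, 1900: 267, 1910: 252, 1920: 171, 1930: 127, 1940: 92},
    60: {1880: 475, 1890: 346, 1900: 304, 1910: 246, 1920: 172, 1930: 95,  1940: 109},
}

# Persons born around 1880 are observed at (age a, period 1880 + a).
cohort_1880 = {a: grid[a][1880 + a] for a in grid if 1880 + a in grid[a]}
print(cohort_1880)
```

The diagonal declines far more steeply with age than any single cross-sectional column, which is the central message of the cohort display.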
Regardless of the way this mathematical controversy is resolved, however, the age-period-cohort analytic method has the profound scientific problem of using disease-specific rates of death. The analyses might be much more persuasive if aimed at the credible data of total mortality. As noted in Chapter 17, cause-specific rates of disease can seldom be scientifically accepted as trustworthy data,4,39,40 regardless of whether the rates come from the idiosyncratic inconsistencies of coded death certificates or from the technologic detection bias that affects results assembled at tumor registries.
22.8 Longitudinal Analysis
In longitudinal analysis, the members of a cohort are followed to delineate the results of repeated measurements over time.
22.8.1 Confusion with Longitudinal Cross-Sections
The age-period-cohort procedure is seldom mistaken for a truly longitudinal analysis because the information comes from general mortality data for a population; i.e., individual members of a cohort are not examined or followed. A procedure that is sometimes confused with longitudinal analysis, however, is constantly used by practicing pediatricians to track the location of a child’s annual height and weight on a graph showing appropriate percentiles in a reference “growing” population. The data for apparent growth in the reference population, however, come from cross-sections of children at the different ages, not from a longitudinally followed cohort. These analytic structures are often called longitudinal cross-sections.
The use of longitudinal cross-sections rather than cohort studies can create complex problems and controversies. The tactic seems to work well for the growth of normal children (as shown in Exercise 17.2), but can produce major distortions in studies of the course of older adults, where various noxious influences, eliminating vulnerable persons at relatively early ages, can lead to deceptive results in the cross-sections of “survivors.” The topic is further discussed in Section 22.8.2.5.
[Figure: age-specific death rates per 100,000 (ordinate, 0 to 420) plotted against age at death, 30 to 80 (abscissa, marked by the first year of each 5-year age group). Solid lines trace the calendar periods 1931–35 through 1971–75; dashed lines trace the birth cohorts of 1876, 1881, 1886, 1891, 1896, and 1901.]
FIGURE 22.7
U.S. white male lung cancer mortality rates per 100,000 by age at death, period of death, and birth cohort. Solid lines show rates of death for the age groups cross-sectionally at each calendar period. Dashed lines show the corresponding rates of death as each birth cohort (marked 1901, 1896, 1891, etc.) goes through its aging. To avoid too much overlapping, birth cohorts are labeled only for 5-year increments in the midpoints of 9-year intervals, such as “1901” for the interval 1897–1905. [Taken from Chapter Reference 36]
22.8.2 Applications of Longitudinal Analysis
Longitudinal analyses for the individual members of a followed cohort have become a prominent statistical challenge only in recent years,41-43 as pertinent data were made available from long-term clinical trials and epidemiologic cohort studies that were seldom conducted until the past few decades. The
analyses have been done to appraise recurrent events, to explore dynamic impacts, to develop longitudinal correlations, to “track” changes, and to validate longitudinal cross-sections.
22.8.2.1 Recurrent Events — Longitudinal analyses become needed when a “failure event” — such as a streptococcal or urinary tract infection, episode of diarrhea, asthmatic attack, or epileptic seizure — can appear recurrently, thus precluding the conventional life-table procedure. Formal longitudinal analyses can be attempted, and proposals have been made44 to extend the life table to repeated and changing events, but the most commonly used approach for recurrent events is to convert the denominator to person-durations of observation, rather than individual persons.
The results for the group are then cited, for example, as the number of events per patient-year. If certain persons are particularly susceptible to recurrent events, the results can also be cited as number of “eventful” persons per patient-year. Thus, if the recurrent events are streptococcal infections, the “attack rates” can be cited per patient-year either for number of infections or for number of infected persons. The process was discussed earlier in Section 17.3.3.
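The person-duration bookkeeping can be sketched with hypothetical follow-up records, citing both attack rates described above:

```python
# Hypothetical follow-up records: (years observed, number of infections).
patients = [(2.0, 3), (1.5, 0), (3.0, 1), (0.5, 2), (4.0, 0)]

total_years = sum(years for years, _ in patients)      # person-years of observation
total_events = sum(n for _, n in patients)             # every infection counted
eventful = sum(1 for _, n in patients if n > 0)        # persons with >= 1 infection

events_per_patient_year = total_events / total_years
eventful_per_patient_year = eventful / total_years

print(round(events_per_patient_year, 3))
print(round(eventful_per_patient_year, 3))
```

With these made-up records, 6 infections in 11 patient-years gives about 0.545 infections per patient-year, but only about 0.273 infected persons per patient-year, showing how the two citations can diverge when a few persons are particularly susceptible.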
22.8.2.2 Dynamic Impacts — Longitudinal studies can also be used to investigate two types of dynamic impact. In one situation, the investigators determine the way that subsequent changes in baseline factors can affect (or predict) a single outcome event. In the other situation, individual events or factors are checked for their impact on a dynamic outcome — such as weight, blood pressure, or psychic status — that can change over time.
The first situation is illustrated by the famous Framingham study,45 in which persons without known heart disease were followed for more than 20 years to study the original values and changes in baseline variables regarded as “risk factors” for the subsequent development of coronary heart disease. The members of the cohort were re-examined in sequential “panels” every few years to check the pertinent variables and to determine whether coronary disease had occurred.
In other studies of dynamic factors, a predictive impact has been noted from declines in pulmonary function46 and sometimes from a patient’s immediate or short-term early response to treatment.47 Although “time-dependent covariate” and “time series” strategies have been developed for these analyses, an easier and more effective approach may be to record the successive values of pertinent clinical variables and then repeat the prognostic estimation at each new “zero-time” status, after a pertinent short-term interval.
The second type of dynamic longitudinal study, with moving values for the outcomes, has been done to check the impact of naturally occurring events in cohorts of women. For example, pregnancy was appraised for subsequent effects on adiposity,48 and menopause for impact on depression.49
22.8.2.3 Longitudinal Correlations — In another pattern of longitudinal analysis, the changes that occur over time can be correlated with one another. For example, the results of repeated pulmonary function tests may be correlated with concomitant respiratory symptoms, smoking status, and age.50 Changes in endogenous testosterone over time may be correlated with age and with changes in triglycerides and high density lipoprotein cholesterol.51 Serial values of ambient air pollutants, adjusted for temperature, humidity, and time of week or season, did not affect serial peak expiratory flow rates in asthmatic children.52 The longitudinal correlations may also be used prognostically if data are obtained for a specific target outcome.53,54
22.8.2.4 Tracking — The term tracking is commonly used for the process of following a patient’s course (or “trajectory”) in successive values of certain factors, such as blood pressure or serum cholesterol. The analysis will often check whether a person’s initial rating or rank persists as time progresses. For example, do children who are in the lower (or upper) percentiles of height and weight at age 2 remain in those percentiles at age 9 or age 19? Does asthma or acute rheumatic fever tend to have the same kinds of manifestations in recurrent episodes as in the initial attack? In a longitudinal follow-up study, when compared with children of parents with coronary artery disease (CAD), the offspring of those with CAD were found to have progressively higher development of cardiovascular “risk factors.” 55
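A tracking check of this kind can be sketched with hypothetical percentile ranks, asking how many children remain in their original tertile between the two ages:

```python
# Hypothetical height percentile ranks for nine children at ages 2 and 9.
rank_age2 = [5, 12, 20, 35, 48, 55, 70, 82, 95]
rank_age9 = [8, 15, 30, 28, 50, 61, 66, 88, 91]

def tertile(pct):
    """Classify a percentile rank into lower (0), middle (1), or upper (2) tertile."""
    return 0 if pct < 33.3 else (1 if pct < 66.7 else 2)

stayed = sum(tertile(a) == tertile(b) for a, b in zip(rank_age2, rank_age9))
print(f"{stayed}/{len(rank_age2)} children stayed in their original tertile")
```

A high proportion of persisting ranks (here 7 of 9, by construction of the made-up data) would be reported as strong “tracking” of the attribute.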
22.8.2.5 Validation of Longitudinal Cross-Sections — A major advantage of longitudinal analysis is the opportunity to resolve disputes about the value of research with longitudinal cross-sections. The effort and expense of long-term follow-up could be avoided if cross-sections of appropriately aged persons yielded the same results that might emerge from following a large single group of persons as they grew older. For example, Margolis et al.56 contended that short-term follow-up of a suitably analyzed cross-section gave essentially the same results as those obtained in the long-term Framingham cohort. In four contradictory examples, however, longitudinal analyses were used to show the “noncomparability of longitudinally and cross-sectionally determined annual change in spirometry,”57 to demonstrate the error of previous cross-sectional studies claiming an increase of stillbirth in women who postpone childbearing until their “late twenties,” 58 to refute the wrong results of cross-sectional data for assessing “generational changes in the lifetime risk of depression or other psychiatric disorders,”59 and to note that cross-sectional estimates of change in urinary symptom severity may underestimate the true effect of aging on prostatism.60
22.8.3 Statistical and Scientific Problems
Longitudinal analytic studies have all the difficulties cited earlier for follow-up in any type of cohort research, but the problems are much more complex, arising from persons, timing, and data.
An obvious scientific difficulty is what to do about persons who are “intermediate drop-outs,” with subsequent longitudinal measurements that are unknown. Should they be excluded from the analysis, or conversely, with the intention-to-treat principle, should they be listed as though they had been followed to the end, receiving the originally assigned regimen? If the latter, should their most recent set of results be “carried over” and listed as the final data? In timing, the problems involve what to do with repeated measurements that are obtained irregularly when members of the cohort do not comply with the scheduled arrangements. (The problem of irregular long-term measurements seldom occurs when “acute” conditions — such as congestive heart failure, gastrointestinal bleeding, acute myocardial infarction, extremely low birth weight, or non-traumatic coma — have a short-term outcome.) An additional challenge refers to the timing of asymptomatic events (such as major changes that can be shown only in laboratory tests). Chappell and Branch61 have described the “Goldilocks dilemma” of arranging for follow-up intervals that are neither too short nor too long, but just right.
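The “carried over” option, usually called last observation carried forward (LOCF), is mechanically simple, which is part of why it is so widely used despite the concerns raised here. A sketch on hypothetical series, with None marking visits missed after drop-out:

```python
# "Last observation carried forward": a drop-out's most recent value is
# repeated for every remaining scheduled visit.
def locf(series):
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical blood-pressure series for two intermediate drop-outs.
print(locf([150, 144, None, None]))   # [150, 144, 144, 144]
print(locf([138, None, 132, None]))   # [138, 138, 132, 132]
```

The sketch shows only the mechanics; whether freezing a drop-out's last value is scientifically defensible is exactly the question under dispute in the text.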
In data, a major question is what to do about missing subsequent values. Should they be imputed from other members of the cohort or “guesstimated” from a statistical regression of previous values for the individual person? Another set of problems involves the management of both missing data for intermediate tests and wrong timing of tests.
As might be expected, many mathematical models have been proposed 62–65 for analyzing the scientific and statistical complexity. In a particularly clear account, Matthews et al.66 classified two types of challenges presented by the patterns (peaks or growth) of longitudinal data, indicated the specific questions associated with each challenge, and proposed methods of answering each question. The authors strongly urged that the analyses be constructed from pertinent summaries of each patient’s data, rather than with the more customary approach, which uses summaries of the group at different points in time. For example, instead of fitting a regression line through the group’s mean values as serial time progresses, we might fit a suitable regression line through each person’s data and then try to summarize the set of regression lines.
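The person-first strategy urged by Matthews et al. can be sketched as follows: fit a least-squares slope through each person's serial values, then summarize the set of individual slopes. The data here are hypothetical.

```python
# Least-squares slope of y on x, computed from the definitional formula.
def slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

times = [0, 1, 2, 3]          # years of follow-up, same schedule for everyone
cohort_values = [             # one hypothetical series per person
    [100, 103, 107, 109],
    [90, 91, 94, 96],
    [110, 108, 113, 115],
]

# Summarize each person's trajectory by an individual slope, then
# summarize the set of slopes -- rather than regressing group means on time.
slopes = [slope(times, ys) for ys in cohort_values]
mean_slope = sum(slopes) / len(slopes)
print([round(s, 2) for s in slopes], round(mean_slope, 2))
```

With a common schedule and no missing data the mean of the individual slopes equals the slope through the group means, but the person-level slopes additionally reveal how variable the individual trajectories are.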
The choice of optimum analytic methods will doubtlessly be expanded and clarified as more experience is acquired with longitudinal studies.
References
1. Berkson, 1950; 2. Kaplan, 1983; 3. Kaplan, 1958; 4. Chiang, 1961; 5. Kernan, 1991; 6. Feinstein, 1985; 7. Kurtzke, 1989; 8. Dorn, 1950; 9. Ederer, 1961; 10. Peto, 1977; 11. Gentleman, 1991; 12. Waller, 1992; 13. Machin, 1988; 14. Pocock, 1982; 15. Kadel, 1992; 16. Greenwood, 1926; 17. Gehan, 1965; 18. Prentice, 1979; 19. Peto, 1972; 20. Matthews, 1988; 21. Savage, 1956; 22. Breslow, 1984; 23. Gehan, 1972; 24. Cox, 1972; 25. Tibshirani, 1982; 26. Coldman, 1979; 27. Haybittle, 1979; 28. Mantel, 1966; 29. Forsythe, 1970; 30. Freedman, 1982; 31. Borenstein, 1994; 32. Manton, 1980; 33. Tsai, 1982; 34. Kochanek, 1994; 35. Merrell, 1947; 36. Kupper, 1985; 37. Holford, 1983; 38. Holford, 1985; 39. Gittlesohn, 1982; 40. Feinstein, 1987c; 41. Plewis, 1985; 42. Dwyer, 1992; 43. Diggle, 1994; 44. Hoover, 1996; 45. Kahn, 1966; 46. Rodriguez, 1994; 47. Esdaile, 1992; 48. Smith, 1994; 49. Avis, 1994; 50. Sherrill, 1993; 51. Zmuda, 1997; 52. Agocs, 1997; 53. Lieberman, 1983; 54. Capdevila, 1994; 55. Bao, 1997; 56. Margolis, 1974; 57. Glindmeyer, 1982; 58. Resseguie, 1976; 59. Simon, 1992; 60. Jacobsen, 1995; 61. Chappell, 1993; 62. Zeger, 1992; 63. Stukel, 1993; 64. Twisk, 1994; 65. Zerbe, 1994; 66. Matthews, 1990.
Exercises
22.1. Here is the “skeleton table” of a set of data for subsequent survival of a group of patients with a particular disease:
(1)         (2)         (3)        (4)         (5)         (6)           (7)         (8)        (9)
Years       Alive at    Died       Lost to     Withdrawn   Adjusted      Mortality   Survival   Cumulative
after       Beginning   during     Follow-up   Alive       Denominator   Rate        Rate       Survival
Diagnosis               Interval                                                                Rate

0–1         126         47         4           15
1–2                     5          6           11
2–3                     2          0           15
3–4                     2          2           7
4–5                     0          0           6
22.1.1. Using a fixed-interval method of analysis and calculating all rates to at least three decimal places (to avoid errors due to rounding), complete the table.
22.1.2. Using the direct method of decrementing denominators, calculate the cumulative (or “total”) survival rates at each of the cited intervals in the table.
22.1.3. Comment on the different results obtained for 5-year cumulative survival rates with the two methods of calculation. Which method would you prefer to use as a clinician evaluating treatment and making decisions in patient care?
22.2. For the data shown in Exercise 22.1, assume that the investigators checked each patient’s status
every 6 months. The deaths and censored events listed in the data were noted at the following times:
During Interval   Lost to Follow-Up   “Withdrawn Alive”   Deaths at End of Interval

0–6 mos.                  3                   5                      32
6–12 mos.                 1                  10                      15
12–18 mos.                3                   7                       3
18–24 mos.                3                   4                       2
24–30 mos.                0                   8                       2
30–36 mos.                0                   7                       0
36–42 mos.                1                   3                       0
42–48 mos.                1                   4                       2
48–54 mos.                0                   5                       0
54–60 mos.                0                   1                       0
Prepare a Kaplan-Meier analysis for these data.
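As a reminder of the product-limit mechanics (applied to hypothetical data, not to the exercise's table), a Kaplan-Meier estimate multiplies the running survival by (1 − deaths/at-risk) at each distinct death time, while censored subjects simply leave the risk set:

```python
# Product-limit (Kaplan-Meier) survival estimate. Censored observations
# tied with a death are conventionally kept at risk for that death.
def kaplan_meier(times, died):
    events = sorted(zip(times, died))
    at_risk, surv, curve = len(events), 1.0, []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = sum(1 for tt, d in events if tt == t and d)
        removed = sum(1 for tt, _ in events if tt == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
        i += removed
    return curve

# Hypothetical months of follow-up; died=True is a death, False a censoring.
times = [2, 3, 3, 5, 8, 8, 9, 12]
died = [True, True, False, True, False, True, False, False]
print(kaplan_meier(times, died))
```

Each step of the resulting curve occurs only at a death; the censored times influence the estimate solely by shrinking the denominator for later deaths.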
22.3. Prepare a graph showing the results of the fixed-interval and Kaplan-Meier analyses of these data. (You need not recalculate the previous data of Exercise 22.1.)
[Figure: percent survival (ordinate, 0 to 100) plotted against years of follow-up (abscissa), with curves for overall and disease-free survival.]
FIGURE E.22.5
Overall and disease-free survival for entire group of 16 study patients.
22.4. Like the direct method, the variable-interval method does not augment the interval denominators to include contributions from people censored during an interval. In Exercises 22.1 and 22.2, are the results of the variable-interval method more like those of the direct method or those of the fixed-interval method? What is your explanation for the observed distinction?
22.5. The graph in Figure E.22.5 is an exact reproduction from a recent report in a respectable medical journal. If you were reviewing this manuscript for publication, what suggestions would you offer about how to improve the graph?
Part IV
Additional Activities
This last part of the text is devoted to two sets of additional topics in making stochastic decisions and in describing categorical contrasts and associations.
The new stochastic decisions arise when the goals differ from those of previously discussed stochastic tests, almost all of which were aimed at confirming something “big.” The new descriptive strategies are used to “adjust” for “confounding variables” and to provide indexes of association for arrangements that cannot be suitably managed with the methods discussed earlier in Part III.
In the stochastic tests discussed thus far, the main goal was to show “stability” and proclaim “significance” for an observed distinction, d_o, that was regarded as quantitatively impressive. Regardless of the index (increment, ratio, etc.) used to describe the impressive quantitative magnitude of d_o, its stochastic confirmation was sought from a P value or confidence interval calculated under the null hypothesis of “no difference.” After this calculation, the investigator would conclude that the results were stochastically either “significant” if the null hypothesis was rejected or “nonsignificant” if the hypothesis was conceded.
Beyond confirming a single quantitatively impressive value of d_o, however, stochastic tests can also have many other purposes. Sometimes, if d_o is smaller than expected, the goal is to see how large it really might be. This type of exploration can be done with confidence intervals, but can also involve the ideas of capacity, alternative hypothesis, type II errors, power, and doubly significant sample sizes that are discussed in Chapter 23. In other situations, the investigator may want to find and confirm a small rather than large distinction. With this goal, discussed in Chapter 24, the stochastic procedures become re-oriented for testing “equivalence.” Another important issue in that chapter is the interpretation of disparities between the investigator’s goals and the statistical results. In still other situations, discussed in Chapter 25, the stochastic testing is done on multiple occasions rather than just once. The multiple testing may occur for different hypotheses (called “multiple comparisons”) within the same set of data, for repeated sequential tests of the same hypothesis in an accumulating set of data, or for a meta-analytic aggregation of data pooled from a series of individual studies, each of which has already been stochastically tested.
The additional approaches to association occupy the last four chapters. With new descriptive indexes, discussed in Chapter 26, the two-group contrasts discussed in Part II can receive special approaches to deal with the problems of “confounding.” The approaches involve such tactics as “adjustment,” “standardization,” and “matching.” A different descriptive challenge requires new indexes of association, discussed in Chapter 27, for categorical relationships that are not suitably managed with the strategies previously considered in Part III.
The last two chapters, which contain no Exercises at the end, are intended mainly to familiarize you with two sets of procedures that you may meet in medical literature. For reasons noted in those chapters, you may not do (or want to do) any of these procedures yourself, but they regularly appear in published papers. The non-targeted analyses in Chapter 28 contain methods for reducing multiple variables (or categories) into a smaller set of variables (or categories) without aiming at a specific target variable. The analysis of variance, in Chapter 29, aims at a target variable that is associated with more than two “independent” categories.
The additional stochastic and descriptive topics are not part of the common daily events in medical statistics, but they occur often enough to warrant inclusion among the principles needed by thoughtful readers or users.
23
Alternative Hypotheses and Statistical “Power”
CONTENTS
23.1 Sources of Wrong Conclusions
23.1.1 Scientific Problems
23.1.2 Statistical Problems
23.2 Calculation of Capacity
23.3 Disparities in Desired and Observed Results
23.3.1 General Conclusions
23.3.2 Group Size for Exclusion of δ
23.4 Formation of Alternative Stochastic Hypothesis
23.4.1 Statement of Alternative Hypothesis
23.4.2 Alternative Standard Error
23.4.3 Determining Z_H and P_H Values
23.4.4 Role of β
23.4.5 Analogy to Diagnostic Marker Tests
23.4.6 Choice of β
23.5 The Concept of “Power”
23.5.1 Statistical Connotation of “Power”
23.5.2 Comparison of “Capacity” and “Power”
23.5.3 Example of Complaints and Confusion
23.5.4 Reciprocity of Z_o and Z_H
23.6 Neyman-Pearson Strategy
23.6.1 Calculation of “Doubly-Significant” Sample Size
23.6.2 Example of Calculations
23.6.3 Use of Tables
23.7 Problems in Neyman-Pearson Strategy
23.7.1 Mathematical Problems
23.7.2 Scientific Problems
23.7.3 Additional Scientific Problems
23.8 Pragmatic Problems
23.8.1 “Lasagna’s Law” for Clinical Trials
23.8.2 Clinical Scenario
23.9 Additional Topics
23.9.1 The “Power” of “Single Significance”
23.9.2 Premature Cessation of Trials with d_o < δ
23.9.3 Choice of Relative Values for α and β
23.9.4 Gamma-Error Strategy
23.9.5 Prominent Statistical Dissenters
23.9.6 Scientific Goals for δ
23.9.7 Choosing an “Honest” δ
References
Exercises
Type II errors are a common event in stochastic testing. They occur when the investigator concedes the null hypothesis and concludes that the observed results are “not significant,” although the true distinction is really impressive. These errors are the reverse counterpart of the Type I errors, emphasized in all the stochastic discussions so far, that occur when the null hypothesis is rejected although it is actually true.
In conventional statistical discussions, the stochastic conclusion is often regarded as the fundamental decision in the research. The stochastic strategy is then aimed mainly at either confirming the “correct” decisions or preventing the errors of “false-positive” and “false-negative” conclusions. The research may reach a wrong conclusion, however, for scientific rather than stochastic reasons. In the first part of this chapter, devoted to the diverse sources of wrong conclusions, the scientific problems are discussed first. A knowledge of those problems can often help either to avoid the use of an unsuitable stochastic test or to keep the investigator (and reader) from being deceived by results of an appropriate test.
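Both kinds of stochastic error can be demonstrated by simulation. The sketch below (stdlib Python only, with an arbitrary seed) repeatedly runs a simple two-sample z-test, first under a true null hypothesis and then under a genuine difference of half a standard deviation; the rejection rate in the first case estimates the Type I error, and the concession rate in the second estimates the Type II error.

```python
import random
from statistics import NormalDist

random.seed(7)
Z_CRIT = NormalDist().inv_cdf(0.975)   # two-sided alpha = 0.05

def z_test_rejects(a, b, sigma=1.0):
    """Two-sample z-test with known sigma; True if the null is rejected."""
    n = len(a)
    se = sigma * (2 / n) ** 0.5
    z = (sum(a) / n - sum(b) / n) / se
    return abs(z) > Z_CRIT

def rejection_rate(true_diff, trials=2000, n=25):
    rejections = 0
    for _ in range(trials):
        a = [random.gauss(true_diff, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        rejections += z_test_rejects(a, b)
    return rejections / trials

type1 = rejection_rate(0.0)        # rejecting a true null: should sit near 0.05
type2 = 1 - rejection_rate(0.5)    # conceding the null when a real difference exists
print(f"Type I rate ~ {type1:.3f}; Type II rate ~ {type2:.3f}")
```

With 25 persons per group, the Type I rate hovers near the nominal 0.05, but the Type II rate for this moderate difference exceeds 0.5, illustrating how readily an underpowered comparison can be labeled "not significant."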
23.1 Sources of Wrong Conclusions
Assuming that the raw data are themselves correct, a wrong conclusion can be produced by several types of scientific and statistical problems.
23.1.1 Scientific Problems
The main scientific sources of error are biased groups, erroneous hypotheses, and erroneous interpretations.
23.1.1.1 Biased Groups — Whether “significant” or “nonsignificant,” a statistical result can deviate from truth because the particular group(s) under study did not suitably represent the pertinent population or events. This problem was the source of a particularly famous error in political poll-taking. Before the U.S. presidential election of 1936, a sample of more than 1,000,000 people was assembled by a respectable magazine, Literary Digest, from the mailed-in “ballots” of readers and from randomly placed telephone calls. The opinions expressed in this huge sample led to a highly confident prediction that the Republican candidate, Alfred Landon, would win the election overwhelmingly. On Election Day, however, the incumbent Democrat, Franklin Roosevelt, won by a “landslide.” A “post-mortem examination” showed that the sampling process had a major bias, which you are invited to diagnose in Exercise 23.1.
In the example just cited, the composition of a single group (potential voters) was biased. When two groups are contrasted for conclusions about cause–effect etiologic or therapeutic relationships, biased comparisons can arise from diverse problems in susceptibility, detection, and other nonstatistical sources of distortion. If the problems are recognized and if appropriate data are available, statistical efforts can be made to adjust for the bias. The best approach, however, is scientific, not statistical. The research should be designed in ways that can avoid or substantially reduce the bias.
23.1.1.2 Erroneous Scientific Hypotheses — As discussed in Chapter 11, most research comparisons involve two types of hypotheses. One of them, seldom emphasized in statistical discussions, is the scientific hypothesis. It denotes what the investigator expects (or wants) to find in the research. Always cited descriptively, the scientific hypothesis may be stated in phrases such as: “Treatment A is better than Treatment B,” or “Agent E causes or promotes development of Disease D.” The scientific hypothesis will be wrong if the proposed maneuvers do not, in fact, have the action anticipated by the investigator.
Medical history contains many accounts of erroneous etiologic and therapeutic hypotheses that were scientifically popular in different eras until truth (or suitable corrections) eventually emerged. The scientific sources of these errors were usually wrong qualitative concepts. For many centuries, disease was thought to be caused by imbalances in the four “humors” of the body. The imbalances were then “cured” by such treatments as blood-letting, blistering, purging, and puking. During the 19th century,