[Figure: cumulative survival rate (ordinate, 0.80 to 1.00) plotted against serial time (abscissa, 1, 2, and 3 years) for Group B, showing a fixed-interval curve and a variable-interval curve, with arrows marking deaths 1–17 on the abscissa.]
FIGURE 22.1
Survival curves for the data in Tables 22.3 and 22.4. The points and dashed lines show the results obtained with the fixed-interval technique of actuarial analysis. The solid line “staircase” shows the results obtained with the Kaplan-Meier method. The vertical arrows on the abscissa show the timing of the 17 deaths used to demarcate the Kaplan-Meier intervals. In this instance, the fixed-interval and variable-interval methods give similar results at 1, 2, and 3 years.
Regardless of which method is used, a problem occurs at the right end of the curve, when the cited proportions of survival may depend on small numbers of observed patients. Although the constituent numbers should always be shown, they are frequently omitted when artists (or computers) prepare the displayed curves. Consequently, a reader may mistakenly believe that a five-year survival rate of 30% refers to all 219 members of the initial cohort, when in fact only 10 persons in the entire group may have been followed as long as five years. This deception can be avoided only if editors and reviewers insist on showing the pertinent constituent denominators for each interval of follow-up. In an alternative approach, standard errors and confidence intervals can be calculated (see Section 22.5) and cited for each cumulative rate, but the simplest and best strategy is to let the reader see directly what is happening.
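To make the point concrete, here is a minimal sketch (in Python, with illustrative data and a hypothetical function name, not the book’s method or numbers) of a Kaplan-Meier computation that prints the constituent denominator beside each cumulative estimate, so the shrinking tail denominators stay visible:

```python
# Minimal Kaplan-Meier sketch that prints the constituent denominator
# (number still at risk) beside each cumulative survival estimate.
# Times and event flags are illustrative, not the book's data.

def kaplan_meier_with_denominators(times, events):
    """times: follow-up durations; events: 1 = death, 0 = censored."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    estimates = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = sum(e for u, e in pairs if u == t)   # deaths at this time
        removed = sum(1 for u, _ in pairs if u == t)  # deaths + censorings
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            estimates.append((t, survival, n_at_risk))
        n_at_risk -= removed
        i += removed
    return estimates

times = [1, 2, 2, 3, 4, 5, 5, 6, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
for t, s, n in kaplan_meier_with_denominators(times, events):
    print(f"t = {t}: cumulative survival {s:.3f} (n at risk = {n})")
```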
22.3 Scientific Problems and Adjustments
The main scientific problems in constructing survival curves arise from “informative censoring,” decisions about competing risks and competing outcomes, “left censoring,” and the adjustments that occur with sensitivity and relative survival analyses.
22.3.1 Informative Censoring
In any of the three cited methods for constructing survival curves, the censored patients are regarded as essentially similar to those who continue under observation. The particular reasons that may have led to the censoring are ignored and are not used to modify the post-censoring results. This laissez-faire approach can lead to substantial distortions if the censorings were biased, i.e., caused by the effects of treatment or by anything other than random events. The term informative is a delicate name for this type of potentially biased censoring.
For example, in a clinical trial of treatment for chronic pain, suppose the failure event is the development (or persistence) of pain severe enough to warrant vigorous supplemental therapy. If Treatment A is particularly effective, many of its recipients may drop out of the study early, without further treatment, because their pain has been “cured.” The remaining cohort for Treatment A will then contain only those patients who did not respond promptly. An analysis that does not acknowledge the successful censored patients will misrepresent the true effects of Treatment A. Conversely, many censored patients for Treatment B may have dropped out because of an adverse side effect or other untoward event that is not counted as a “failure.” The remaining uncensored cohort will then misrepresent the true effects of Treatment B.
For these reasons, in a scientifically well-conducted study, vigorous efforts are made to determine the reasons why patients dropped out so that they are not listed merely as “lost-to-follow-up.” The data might then be analyzed separately according to the reasons why the “informative censoring” took place. In the foregoing example, a separate success (rather than failure) rate might also be calculated (to show the advantages of Treatment A), and a separate adverse-event rate (to show the flaws of Treatment B).
22.3.2 Competing Risks
The competing-risk problem is a variation on the general theme of informative censoring. Special mathematical efforts4 have been made to adjust life tables for deaths ascribed to “competing” non-relevant causes, and even to calculate separate rates of occurrence for those causes, but relatively little attention has been given to the scientific difficulty of deciding when a cause is truly “non-relevant.” For example, suicide would appear to be an unrelated cause of death in a patient with cancer, but sometimes the suicide is brought on by cancer-induced depression. In a patient with a particularly difficult-to-maintain therapy, suicide may have been evoked by the discomforts of the therapy. The problem of competing risks is heightened if the untrustworthy information on death certificates is used to determine the “cause of death.”
Because of these difficulties, an unequivocal decision is not always possible about the unrelatedness of death attributed to a competing risk. Many cautious analysts therefore reject any disease-specific attributions and insist on using death itself, i.e., total deaths in the cohort, as the only unequivocal outcome event.
22.3.3 Competing Outcomes
The problem of competing outcomes arises when several “hierarchical” entities can be regarded as the failure event, and when occurrence of one of the entities precludes the occurrence of another. For example, suppose the failure event is occurrence of either a stroke or death. If the patient survives a stroke, death can always occur later, but someone who dies cannot have a subsequent stroke. The same type of competitive problem arises if the failure event can be the development of either myocardial infarction or of angina pectoris, which is often ended by myocardial infarction.
Kernan et al.5 have discussed the misleading results that can occur if the competing outcomes are examined in only a single life-table analysis. The proposed solution for the problem is to do a series of analyses with appropriately chosen categories and combinations of categories for the outcome phenomena.
22.3.4 Left Censoring
In any life-table analysis, an appropriate zero time must be chosen for the start of each person’s serial follow-up. In randomized clinical trials, the date of randomization is an easy, obvious choice, and so the zero-time challenge is generally ignored when discussions of survival analysis are limited to randomized trials. In other forms of clinical epidemiologic research, however, a zero-time problem commonly arises when nonrandomized cohorts are assembled to evaluate therapy or to identify either prognostic factors for outcome of disease or risk factors for etiology of disease.
In many situations, the problem is easily resolved by assembling an inception cohort, which fulfills all three requirements. The first requirement is that the cohort consist of persons whose zero time
was the date of the first therapy aimed at the disease (or the date of the decision not to treat). The second and third requirements refer to the zero-time event: It should have occurred both during a bounded secular (calendar) interval, such as 1991–1996, and at the institution(s) whose results are under study. The rationale for these criteria is discussed elsewhere.6
If a presumably effective therapy is not available, however, as in certain chronic diseases such as multiple sclerosis, zero time may be chosen as the date of first diagnosis. Consequently, the assembled group of patients at a particular specialty center may contain many who have had the disease for 5, 10, 20 years or longer before they appeared for follow-up at that center. Such patients are the residue of cohorts elsewhere whose follow-up began at least 5, 10, or 20 years earlier, but the numbers and fates of the other cohort members are unknown. If patients in the specialty-center cohort are counted as though their zero time was the date of diagnosis, the follow-up results for the total group can be substantially distorted.
This problem can be avoided with a tactic called left censoring. (The term refers to the period before the cohort observations began, rather than to the customary right censoring, which occurs afterward.) With left censoring, patients who have already had 10 years of post-diagnostic observation are “admitted” to the cohort at the 10-year interval point of serial time; they are then followed from the 10-year point onward. The process of left censoring thus leads to increases in the post-zero serial denominator, rather than the usual decrements produced by right censoring. The two processes have been engagingly compared by John Kurtzke7 in “a tale of two censors.”
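As a rough sketch of this delayed-entry bookkeeping (illustrative Python; the entry and exit times and helper name are hypothetical), a patient with 10 prior years of post-diagnostic observation joins the risk set only at serial year 10:

```python
# Delayed-entry ("left censoring") sketch: a patient enters the risk
# set at the serial time matching their prior duration of observation,
# inflating later denominators instead of depleting them.

def at_risk(entry, exit, t):
    """Number of patients under observation at serial time t."""
    return sum(1 for e, x in zip(entry, exit) if e <= t < x)

entry = [0, 0, 0, 10, 10, 20]    # serial years at which each patient enters
exit  = [8, 15, 25, 18, 30, 26]  # serial years at death or right censoring
for t in (5, 12, 22):
    print(f"at-risk denominator at year {t}: {at_risk(entry, exit, t)}")
```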
22.3.5 Sensitivity Analysis for Lost-to-Follow-Up Problems
No bias occurs when persons who were still alive at the end of the observation period are terminally censored because of insufficient duration. A possible bias can arise, however, from imbalances in the unknown reasons for “informative censoring” among persons who have become intermediate losses-to-follow-up.
The best way of managing the problem of “unknown reasons” for intermediate losses is to avoid it. Instead of relying on mathematical adjustments for the lost patients, an investigator can improve the scientific quality of the work by getting suitable follow-up information for all patients, even those who may have moved away from the research site. In an excellent early discussion of this problem, Harold Dorn8 said, “The only correct method of handling persons lost to follow-up is not to have any.” Demonstrating the flaws of each proposed method of “adjustment,” Dorn concluded that “even a small percentage lost to follow-up, less than five percent of the total number under observation … may seriously bias [the results] … if this group has a relatively large proportion of persons withdrawing due to the condition being studied.”
A sensitivity analysis is one mathematical method of approaching the problem of unknown-reason intermediate losses. For this analysis, a final cumulative survival rate is determined by the direct method for everyone whose final status was known as dead or alive for the full period of observation. This rate is then recalculated twice: once with the assumption that all of the intermediate withdrawals were dead in their exit status, and again with the assumption that they are all alive at the end of the full observation period.
The range between the high and low “sensitivity-adjusted” survival rates will usually give a better scientific idea of potential variability in the results than the purely statistical calculation (see Section 22.5) of standard errors or confidence intervals.
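A minimal sketch of this bounding calculation, with hypothetical counts:

```python
# Sensitivity-analysis sketch for intermediate losses-to-follow-up:
# recompute the direct-method survival rate under the two extreme
# assumptions that every intermediate withdrawal died, or that every
# one survived the full observation period. Counts are illustrative.

known_alive = 140   # alive at end of the full observation period
known_dead  = 45    # died during the period
withdrawn   = 15    # intermediate losses with unknown outcome

observed = known_alive / (known_alive + known_dead)
worst = known_alive / (known_alive + known_dead + withdrawn)                 # all withdrawals dead
best = (known_alive + withdrawn) / (known_alive + known_dead + withdrawn)    # all withdrawals alive

print(f"known-outcome survival rate: {observed:.3f}")
print(f"sensitivity-adjusted range: {worst:.3f} to {best:.3f}")
```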
22.3.6 Relative Survival Rates
To avoid invidious diagnostic decisions about the cause of competing-risk deaths, any type of death can be used as the failure event. Many investigators prefer this total-death approach because it avoids possibly biased or wrong decisions about what is a “relevant” death, and because (in clinical trials) a treatment is not really successful if it reduces relevant but not total deaths. The approach also allows a straightforward analysis of results without further adjustment for “competing risks.”
If uncomfortable about the absence of such adjustments, however, the data analyst can calculate a relative survival rate.9 In the first step of this calculation, a general demographic survival rate is determined from census and general mortality data for a cohort having the same age-sex composition as the group under observation. In this general-census cohort, the expected survival rate, SE, would presumably reflect all the deaths — relevant and non-relevant — that would occur in ordinary circumstances. The actual survival rate, SG, for the observed group under study, is then expressed as a relative proportion of the demographically expected general survival rate, SE. For example, suppose the observed life-table survival rate is 0.62 at the end of five years in a cohort for whom the demographically expected general 5-year survival rate would have been 0.83. The relative survival rate at that time would be 0.62/0.83 = 0.75.
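The worked example reduces to a single division; as a trivial sketch (hypothetical function name):

```python
# Relative survival: observed life-table rate divided by the
# demographically expected rate for a comparable general-census cohort.
def relative_survival(s_g, s_e):
    return s_g / s_e

print(round(relative_survival(0.62, 0.83), 2))  # -> 0.75, as in the text
```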
Despite the advantage of avoiding decisions about deaths due to competing risks, the relative survival rate has two major disadvantages: (1) it can sometimes be paradoxically higher at a later follow-up date than at an earlier date; and (2) if an unusually healthy cohort is being observed or treated, the general population may not be a pertinent comparison group and the relative survival rates may exceed 1.
Relative survival rates are now used much less often than in the past, probably because newer multivariable statistical methods are readily applied to “adjust” the survival results for baseline features, such as age and co-morbidity, that can affect competing risks.
22.4 Quantitative Descriptions
An unfortunate omission in contemporary statistics is the absence of a standard quantitative index to describe either a single survival curve or the contrast of two curves.
The slope or standardized slope of a regression line through the data might offer such an index, but it has seldom been applied to survival curves, probably because the denominators vary at different times and the time relationship may not be rectilinear. The area under the survival curve could also be used as a quantitative index, but no routine methods have been developed for calculating area. Besides, the calculated areas might be misleading if two survival curves with relatively similar areas have striking differences in shape (and in the corresponding clinical implications). Figure 22.2 shows two curves with a crossing that can produce many complexities in the analysis and interpretation.
In the absence of a quantitative descriptive index for each curve, survival curves are almost never compared directly. Instead, the usual comparisons are stochastic rather than descriptive, using the tests discussed in Section 22.5. Because the stochastic results depend so strongly on size of the groups, the survival curves themselves should always be inspected and evaluated judgmentally to decide whether the distinctions are quantitatively (rather than just stochastically) impressive.
Beyond the “physical examination” of the curve itself, however, the median and several other indexes can be used for quantitative summaries of a single curve, and a “hazard ratio” can be used to contrast two curves.
22.4.1 Median Survival Time
The most obvious and readily available statistical index for quantitative description is the median survival time. Although it does not reflect a dynamic pattern, it has the major advantage of being simple and easy to interpret.
The median can always be calculated if all of the censorings are terminal and do not occupy more than half the cohort. If intermediate censorings, with an unknown survival duration, do not have an obvious chronologic pattern and are not common, they might be ignored. If worst comes to worst, the median survival can be determined from the survival curve itself, using the time at which the cumulative survival rate reaches 50%. If the 50% survival mark is a flat “step” line, the median can be estimated at half the duration occupied by the step.
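A small sketch of reading the median off a step curve, assuming illustrative (time, survival) steps and the midpoint convention just described:

```python
# Median survival from a survival "staircase": the earliest time at
# which the cumulative estimate drops to 0.50 or below; if the curve
# sits flat exactly at 0.50, take the midpoint of that step.
# Steps are illustrative.

def median_from_curve(steps):
    """steps: list of (time, cumulative survival), in increasing time."""
    below = [(t, s) for t, s in steps if s <= 0.50]
    if not below:
        return None  # curve never reaches 50%; median not estimable
    t_first, s_first = below[0]
    if s_first == 0.50:
        later = [t for t, s in steps if s < 0.50]
        if later:
            return (t_first + later[0]) / 2  # midpoint of the 0.50 step
    return t_first

steps = [(3, 0.90), (7, 0.75), (12, 0.50), (20, 0.35)]
print(median_from_curve(steps))  # midpoint of the 0.50 step: 16.0
```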
The examination of median survivals was denounced, however, in a frequently cited treatise on the analysis of clinical trials. According to Peto et al.,10 the median survival is one of the thirteen “bad methods of analysis” that are “either [sic] inefficient, misleading, or actually wrong.” The main complaint is that the median can be “very unreliable unless the death rate around the time of median survival is still high.” Peto et al. therefore urged that median survival times be “treated with great caution, except for diseases in which nearly everyone dies, the data are extensive, and the life table falls rapidly through the whole region between 70% and 30% alive (the region in which the life table is used to estimate the median).”
After delivering this indictment, however, Peto et al. later in the same paper used the “bad method” of median survival to denote quantitative clinical importance for a prognostic distinction in comparing two groups. The authors wrote, “The difference between the relative death rates of 1.3 and 0.4 … represents a difference between about 18 months and 5 years in median survival time, and is thus of considerable medical significance” (p. 27).
22.4.2 Comparative Box Plots
The absence of satisfactory descriptions for survival curves was lamented by Gentleman and Crowley,11 who said, “In more standard settings, no analysis would be considered complete without a detailed graphical examination of the data.” After pointing out the difficulty of trying “to visually decode” the differences in plots of survival curves, the authors recommended several alternative graphic methods. Perhaps the simplest and potentially most useful method is a comparison of “censored-data box plots.” Each box is formed by the median, upper quartile, and lower quartile of the successive survival durations in the group. If data have been censored, the pertinent quantile values can be determined from the survival curve itself.
Figure 22.3 shows comparative censored box plots of survival distributions for two treatments. The box plots are called truncated because some of the data are missing. (With “severe censoring,” the upper quartile point may not be observed, and the truncated box may not have a top.) The median survival, at close to 600 days in Figure 22.3, was distinctly higher for “Treatment One” than the 200-day median for “Treatment Two.” The quantitatively significant difference was also stochastically significant, according to the “half-H-spread overlap” rule discussed in Section 16.1.2.2. (A personal communication from Dr. Gentleman has explained a potentially confusing feature of Figure 22.3. The numbers “0.19” and “0.06” represent the survival rate just before the last observed failure in each group.)
22.4.3 Quantile-Quantile Plots
A quantile-quantile plot, another visual approach, allows the two curves to be compared in a dynamic manner. If not readily available from the raw data, the appropriate quantiles of survival for each group can be taken from the life-table graphs. If the quantile-quantile plot is a straight line, its location above or below the “identity” line will promptly indicate the ratio by which the quantiles of one curve are larger or smaller than those of the other. Although proposed by Waller and Turnbull12 for checking goodness-of-fit between a model and the censored data, quantile-quantile plots should be readily applicable for comparing two survival curves.
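A minimal sketch of the pairing involved (illustrative Python with hypothetical times; plotting is replaced by printed pairs):

```python
# Quantile-quantile comparison sketch: for matching survival levels,
# read off the time each group takes to fall to that level and pair
# the times. Paired points on a straight line above or below the
# identity line indicate the ratio between the groups' quantiles.
# Times are illustrative.

quantile_levels = [0.90, 0.75, 0.50]
times_a = {0.90: 6, 0.75: 14, 0.50: 30}  # months at which curve A reaches each level
times_b = {0.90: 4, 0.75: 9, 0.50: 21}   # months for curve B

for q in quantile_levels:
    print(f"S = {q:.2f}: group A at {times_a[q]} mo, group B at {times_b[q]} mo, "
          f"ratio = {times_a[q] / times_b[q]:.2f}")
```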
22.4.4 Linear Trend in Direct-Method Survival Rates
If the survival proportions at each interval are calculated with the direct method, rather than with the cumulative products of an interval method, a counted numerator and denominator will be available at each time point. Although the array of binary proportions will not be “independent” (because the same patients may appear repetitively), the points can be fit with a suitably selected regression line.
The simplest model would be an ordinary rectilinear expression, Y = 1 − bt, where Y is the survival proportion at each time duration, t. The slope, −b, would then be a descriptive quantitative index. If the survival curve seems to have an exponential-decay pattern, however, the slope might best be expressed with the model ln Y = −ct (see Section 22.2). The value of −c would then be an alternative quantitative index.
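A short sketch of both fits, using least-squares slopes constrained through S(0) = 1 (one plausible way to implement the idea, not the book’s prescription; the survival proportions are illustrative):

```python
# Least-squares slopes for the two models named above: the rectilinear
# form Y = 1 - b*t and the exponential-decay form ln Y = -c*t, each
# fitted as a regression through the origin implied by S(0) = 1.
import math

t = [1, 2, 3, 4, 5]
y = [0.94, 0.89, 0.83, 0.79, 0.74]

stt = sum(ti * ti for ti in t)
b = sum(ti * (1 - yi) for ti, yi in zip(t, y)) / stt       # slope of (1 - Y) on t
c = -sum(ti * math.log(yi) for ti, yi in zip(t, y)) / stt  # slope of -ln Y on t

print(f"rectilinear slope b = {b:.4f}")          # proportion lost per time unit
print(f"exponential decay constant c = {c:.4f}") # constant-hazard estimate
```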
22.4.5 Hazard Ratio
The hazard ratio is a descriptive index that contrasts the survival rates in two groups at a single selected point in time. Machin and Gardner13 have described the calculation of this index, together with a formula for its confidence interval.
At any time point in the life table, the hazard ratio for two groups, A and B, compares the total numbers of observed deaths, symbolized as OA and OB, with the numbers of deaths that might have been expected for the persons at risk in each group during that interval. If the numbers of persons at risk in a particular interval are NA and NB, the deaths would be expected to divide in the same ratio as NA to NB. After all the algebra is worked out, the hazard ratio is calculated essentially as
R = (OA/NA)/(OB/NB)
In an ordinary 2 × 2 table for the alive and dead persons of two groups, with qA (= 1 – pA) and qB (= 1 – pB) representing rates of mortality, the hazard ratio is simply the ordinary risk ratio, qA/qB. Its main advantage is applicability to the decremented denominators of a life-table arrangement. Unfortunately, even when calculated at the last temporal interval after all observed deaths have occurred, the hazard ratio pertains to only one time point, not to the dynamic pattern in two compared curves.
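A one-line sketch of the calculation (hypothetical counts):

```python
# Hazard-ratio sketch: observed deaths per person at risk in each
# group at one chosen life-table interval. Counts are illustrative.

def hazard_ratio(o_a, n_a, o_b, n_b):
    """R = (OA/NA) / (OB/NB)."""
    return (o_a / n_a) / (o_b / n_b)

print(hazard_ratio(o_a=12, n_a=80, o_b=20, n_b=75))  # -> 0.5625
```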
22.4.6 Hazard Plots
The cumulative survival rates are useful for making predictions and giving an overall view of what is happening to a group, but do not indicate whether the interval “force” or “hazard” of mortality is changing over time. For example, the progressive declines in the cumulative survival curves in Figure 22.1 do not indicate that the annual interval mortality rates (in Table 22.3) are relatively constant around values of
.06. Because the latter rates correspond to the annual hazard (see Section 22.2.2.4), data analysts14 have suggested that important descriptive information can be obtained by inspecting the hazard function, which is essentially a plot of the interval mortality rates over time. The hazard pattern is easier to visualize when
the graph has distinctive fixed intervals rather than the erratic oscillations that can occur with particularly small intervals and numbers in the Kaplan-Meier table.
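A minimal sketch of the interval-hazard inspection (illustrative Python; the counts are hypothetical but chosen to hover near .06, in the spirit of the Table 22.3 rates mentioned above):

```python
# Hazard-function sketch over fixed intervals: the interval mortality
# rate d_t/n_t in each period, reported against time rather than
# multiplied into a cumulative product. Counts are illustrative.

deaths_per_interval  = [12, 11, 10, 9, 9]         # d_t in years 1..5
at_risk_per_interval = [200, 180, 162, 148, 133]  # n_t entering each year

for year, (d, n) in enumerate(zip(deaths_per_interval, at_risk_per_interval), 1):
    print(f"year {year}: interval hazard = {d / n:.3f}")
```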
The left side of Figure 22.4 shows the customary cumulative survival rates and the right side shows the annual percentage dying in a study of mortality and possible “curability” for a cohort of patients with breast cancer. Because the hazard function levels off at similar values for the long-term survivors in Stages I–III, the analysts concluded that stage becomes relatively unimportant after 5-year survival. This distinction could be inferred from the relatively parallel cumulative survival curves on the left side of Figure 22.4, but is shown more clearly and distinctly on the right side.
[Figure 22.4: left panel, percentage surviving (ordinate, 0–100%) against years since initial treatment (abscissa, 0–20) for Stages I–IV, with group sizes shown in parentheses (1299, 637, 1390, and 467); right panel, annual percentage dying (ordinate) against year of follow-up (abscissa, 0–20) for the same stages.]
FIGURE 22.4
Cumulative survival (left side) and hazard function (right side) plots for a cohort with breast cancer. In Stage IV, the hazard plot is discontinued because too few patients were alive after 5 years. [Figure taken from Figures 1 and 2 in Chapter Reference 14.]
Plots of hazard functions are also used to check the assumption of “proportional hazards” when survival is examined in multivariable analysis with the Cox regression procedure.
22.4.7 Customary Visual Displays
Most life tables are drawn, rather than tabulated, and are depicted in one of the two types of curves in Figure 22.1, emphasizing rates for avoiding the “failure event.” The graphs can also go in an inverted direction, however, to show cumulative rates for the “failure” event itself. With suitable artistry, as shown in Figure 22.5, both types of plots for several types of events can be shown in the same graph.
22.5 Stochastic Evaluations
Standard errors and confidence intervals can be calculated at each time point of an individual survival curve; and two curves can be stochastically compared with various adaptations of conventional inferential tests.
22.5.1 Standard Errors
If the cumulative direct-method survival rate is ST at time T, the standard error is determined with the customary calculation for any proportion. The formula will be

SET = √[ST(1 – ST)/nT]

where nT is the number of survivors at time T.
With the intervals of the actuarial methods, the cumulative survival rate is a product of the individual proportions, pt = 1 − (dt/nt), where dt is the number of persons who died during the preceding interval
and nt is the number of persons at risk in that interval. After the last interval, T, the value of ST = p1 × p2 × p3 × … × pt × … × pT. The standard error of ST is calculated by Major Greenwood’s formula16 as

SET = ST √{ Σ [(1 – pt)/(nt pt)] }

The summation in this formula includes the individual values, from the first to the last interval, of [(1 – p1)/(n1p1)] + [(1 – p2)/(n2p2)] + … + [(1 – pT)/(nTpT)]. As time progresses, the values of nt will get smaller, and the magnitude of the standard error will usually be dominated by the relatively large quantities produced by small numbers in the last few terms.
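A minimal sketch of the product and the Greenwood standard error (illustrative Python with hypothetical interval counts):

```python
# Greenwood's formula sketch: the cumulative product of interval
# survival proportions, with a standard error built from the running
# sum of (1 - p_t) / (n_t * p_t). Counts are illustrative.
import math

def greenwood(deaths, at_risk):
    s = 1.0
    var_sum = 0.0
    for d, n in zip(deaths, at_risk):
        p = 1 - d / n        # interval survival proportion
        s *= p               # cumulative product S_T
        var_sum += (1 - p) / (n * p)
    return s, s * math.sqrt(var_sum)

deaths  = [12, 11, 10]
at_risk = [200, 180, 162]
s, se = greenwood(deaths, at_risk)
print(f"S_T = {s:.3f}, SE = {se:.4f}")  # late terms with small n dominate SE
```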
[Figure: percentage (ordinate, 0–100%) against years of follow-up (abscissa, 0–10) showing curves for cardiac survival, event-free survival, re-PTCA, surgery, and non-lethal MI; the numbers 648, 604, and 124 mark patients remaining under observation.]
FIGURE 22.5
Long-term prognosis after immediately successful PTCA (percutaneous transluminal coronary angioplasty) in 648 patients. Event-free survival is defined as absence of repeat PTCA, bypass surgery, myocardial infarction, and death. MI, Myocardial infarction. [Figure and legend taken from Chapter Reference 15]
22.5.2 Confidence Intervals
With the usual Gaussian assumptions and choices of Zα , confidence intervals can be calculated for any point on a single curve using Zα times the standard error. If two curves are being compared, however, standard errors are seldom determined for the differences in successive survival rates. Because the differences are seldom cited with confidence intervals, the stochastic comparison is usually done with methods discussed in the next section.
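A two-line sketch of the single-curve interval (using the illustrative values from the Greenwood sketch above and Zα = 1.96):

```python
# Gaussian confidence-interval sketch for one point on a curve:
# S_T plus or minus Z_alpha times its (Greenwood) standard error.
s_t, se_t, z = 0.828, 0.0273, 1.96  # illustrative values from the sketch above
print(f"95% CI: {s_t - z * se_t:.3f} to {s_t + z * se_t:.3f}")
```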
22.5.3 Comparative Tests
The stochastic comparison of two (or more) survival curves has evoked many statistical tests, most of which can be catalogued as variations of the Wilcoxon rank or chi-square procedures. The most important thing to remember (or beware) about all these tests is that, like all stochastic procedures, they depend on group sizes. No matter how impressive the P value may be, the curves themselves should always be inspected to determine whether their difference is really quantitatively significant.
22.5.3.1 Wilcoxon-Based Tests — The Wilcoxon test for comparing ranks in two groups was generalized and applied by Gehan17 to contrast two or more survival curves. After the Gehan tactic came under criticism,18 Peto and Peto19 developed an alternative log rank test that is currently preferred.
According to Matthews and Farewell,20 the Wilcoxon tests attach “more importance” to early deaths, whereas “the log-rank test gives equal weight to all others.”
22.5.3.2 Log-Rank and Chi-Square Procedures — Several chi-square procedures have been used to form a stochastic index from stratified arrangements of the ranked survival data. Almost all the procedures are derived from a rank test originally devised in 1956 by Richard Savage.21 The subsequent contributions, from many prominent statisticians, have been succinctly summarized by Breslow.22
The most commonly used type of chi-square procedure today is the log rank test, christened by Peto and Peto,19 who offered that name although the test overtly employs neither logs nor ranks. (According to Gehan,23 “the log rank statistic is the same as the statistic U(O)” previously proposed by Cox.24) The log rank test relies on calculating the observed and expected deaths for each group, using the principle discussed for the hazard ratio in Section 22.4.5. At a particular time interval, if OT = OA + OB for the total and component deaths in groups A and B, and if N = NA + NB is the corresponding number of persons at risk of death, the expected number of deaths in each group is EA = (NA/N)OT and EB = (NB/N)OT. These entities then receive a conventional chi-square expression for each interval, and the sum of the interval values for each group is calculated as

X² = Σ {[(OA – EA)²/EA] + [(OB – EB)²/EB]}
This result is then interpreted with 1 degree of freedom (when two groups are compared) in a chi-square distribution. Clear worked examples of the procedure have been shown by Peto et al.10 and also by Tibshirani.25 Coldman and Elwood26 offer a worked example that shows calculation of both the Wilcoxon and the log-rank statistics.
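A minimal sketch of the commonly used summed form of this computation, in which observed and expected deaths are accumulated over the death times before the (O – E)²/E terms are formed (illustrative Python; the event counts are hypothetical, and a real analysis would use a vetted routine):

```python
# Log rank sketch: at each death time, split the total deaths between
# the groups in proportion to their numbers at risk, accumulate the
# observed and expected totals, then form the chi-square statistic.

def log_rank_chi_square(intervals):
    """intervals: list of (o_a, n_a, o_b, n_b) at each death time."""
    oa = ea = ob = eb = 0.0
    for o_a, n_a, o_b, n_b in intervals:
        o_t = o_a + o_b          # total deaths at this time
        n = n_a + n_b            # total persons at risk
        oa += o_a; ea += (n_a / n) * o_t
        ob += o_b; eb += (n_b / n) * o_t
    return (oa - ea) ** 2 / ea + (ob - eb) ** 2 / eb

intervals = [(1, 20, 0, 20), (0, 18, 1, 19), (2, 17, 1, 18), (1, 14, 2, 16)]
print(f"X^2 = {log_rank_chi_square(intervals):.3f}")  # 1 df for two groups
```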
To help preserve ambiguity, the log rank procedure is also sometimes called the Mantel–Haenszel test, or the Cox–Mantel test. According to Haybittle and Freedman,27 however, the procedures have different operational mechanisms, and a test attributed to Mantel alone28 is recommended “if a substantial proportion of deaths in one group occur after the other group has been entirely removed from risk” or if the comparison is done with prognostically adjusted groups.
22.5.3.3 Permutation Resampling — Of various other proposals for stochastically contrasting two survival curves, perhaps the most interesting was Forsythe and Frey’s suggestion,29 almost 25 years ago, that the tests be done with a “permutation technique” of rearrangement. The suggestion appears to have been ignored during the many subsequent proposals of diverse stochastic methods. Now that permutation and diverse resampling methods are becoming well known and regularly employed, the Forsythe-Frey proposal may warrant appropriate “resurrection” and reconsideration.
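A minimal sketch of the general permutation idea (illustrative Python; the statistic here is a crude difference in death proportions, standing in for whatever curve statistic an analyst prefers):

```python
# Permutation-test sketch: shuffle the group labels many times,
# recompute the chosen statistic, and locate the observed value in the
# shuffled distribution. Outcome data are illustrative.
import random

def perm_p_value(outcomes_a, outcomes_b, n_perm=10_000, seed=1):
    rng = random.Random(seed)
    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))
    observed = stat(outcomes_a, outcomes_b)
    pooled = outcomes_a + outcomes_b
    k = len(outcomes_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return hits / n_perm

a = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # 1 = death within follow-up
b = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
print(perm_p_value(a, b))
```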
22.5.4 Sample-Size and Other Calculations
For additional stochastic activities, Freedman30 has prepared a set of tables showing required sample sizes with the log rank test; and Borenstein31 has described the challenges (and an appropriate computer program) of planning for “precision” in hazard ratios, attrition rates, and confidence intervals, as well as the “power” discussed in Chapter 23.
22.6 Estimating Life Expectancy
One of the main reasons life insurance companies developed actuarial methods was to estimate life expectancy, not to plot survival curves. The estimates are derived from what is called an age-cohort analysis, using the “cross-sectional” or “current” demographic mortality rates available for the age and sex groups of a regional population.
22.6.1 Customary Actuarial Technique
Suppose the annual mortality rates for a general population at a particular calendar period are .00960 at age 55–56, .01054 at age 56–57, and .01156 at age 57–58. If a cohort contains 89,000 persons who are alive at age 55, we would expect that .00960 × 89000 = 854 of them will die in the next year, leaving 88146 (= 89000 − 854) still alive at age 56. If the 854 deaths are evenly distributed throughout the year, the dead persons will have each lived about a half year. The total number of years lived by the cohort during that year will have been 88146 + (0.5)(854) = 88573. During the next year, beginning at age 56, deaths would occur in 88146 × .01054 = 929 persons, leaving 87217 alive at age 57. The total years lived by the cohort in the 56–57 year interval will be 87217 + (0.5)(929) = 87682. The number who die in the interval from age 57 to 58 will be 87217 × .01156 = 1008, leaving 86209 alive, and a total of 86209 + (0.5)(1008) = 86713 years lived.
The cumulative total of years lived by the cohort for the three-year period will be 88573 + 87682 + 86713 = 262,968. The foregoing calculations can be iterated, using the appropriate mortality rates for each successive year of age, to obtain the annual deaths, survivors, and annual years lived at each year of age until almost everyone in the cohort has died.
The annual depletions and additions of the cumulative years-lived data can then be used to determine life expectancy. In the most common procedure, the analysis begins with a “stationary” cohort of 100,000 newborn persons who are then successively depleted by deaths at annual intervals until the cohort size becomes negligible at perhaps age 110. After the deaths and years lived are determined for each annual interval of the cohort, a “future cumulative” total of years lived is also calculated for each year. The future cumulative total consists of the total number of years lived by the cohort at that age, plus the annual total of years lived in all subsequent years. For example, in a stationary cohort of 100,000 persons who start at birth in the U.S., the future cumulative total might be 7,247,519 years. The average life expectancy at birth would then be 7,247,519/100,000 = 72.48 years.
With the stationary-cohort tactic, life expectancy can be determined for anyone at any age. Suppose 87,217 persons in the stationary cohort are alive at age 57. The future cumulative total number of years lived in that year and in all subsequent years might be 1,895,992. If so, the average remaining life expectancy at age 57 would be 1,895,992/87,217 = 21.74 years.
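The arithmetic above can be reproduced in a few lines (Python; the mortality rates, cohort size, and the future cumulative total are the ones quoted in the text):

```python
# Stationary-cohort sketch: deplete the survivors by each age's annual
# mortality rate, credit the deaths with half a year each, and sum the
# years lived. A full table would continue to roughly age 110.
rates = [0.00960, 0.01054, 0.01156]  # annual mortality at ages 55, 56, 57
alive = 89_000
total_years = 0.0
for q in rates:
    deaths = round(alive * q)
    years = (alive - deaths) + 0.5 * deaths  # survivors plus half-years for deaths
    print(f"deaths {deaths}, years lived {round(years)}")
    total_years += years
    alive -= deaths

print(f"three-year total: {round(total_years)}")           # 262,968 as in the text
print(f"life expectancy at 57: {1_895_992 / 87_217:.2f}")  # 21.74 years, as above
```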
22.6.2 Additional Procedures
In the strategy just discussed, the death rates and life expectancies were determined for a general population. The results can be made more demographically specific if the calculations use death rates for annual ages of different sex or ethnic groups.
In addition, extra refinements are sometimes used to determine what would happen to general life expectancy if diseases such as cancer were eliminated. Assuming accuracy of information, the annual death rates for cancer are subtracted from the corresponding total death rate to produce a cancer-free death rate at each year of age. The annual life-table events are then appropriately recalculated, and the new value of life expectancy denotes the arithmetical consequences of eliminating cancer. The increment between this result and ordinary life expectancy (without the conquest of cancer) is usually disappointingly small32,33 — only about a 2 to 3 year increase in average longevity.
A new life-table “disaggregation technique” was recently used34 to explore reasons why African-American life expectancy declined, while Caucasian life expectancy rose, during 1984–89 in the United States. The authors concluded that for African-Americans the prime contributions came from HIV infection in both sexes, as well as from homicide in men and cancer in women. The diverse calculations were done, however, without attention to the massive problems caused by census underenumeration of African-Americans, as noted in Chapter 17.
