Table 5.3. The results of a study to determine the diagnostic consistency between two pathologists: (a) initial tabulation; (b) revised presentation
(a) Initial tabulation

| | Malignant | Benign | Total |
| Pathologist A | 18 | 82 | 100 |
| Pathologist B | 10 | 90 | 100 |
| Total | 28 | 172 | 200 |

(b) Revised presentation

| Pathologist B | Pathologist A: malignant | Pathologist A: benign | Total |
| Malignant | 9 | 1 | 10 |
| Benign | 9 | 81 | 90 |
| Total | 18 | 82 | 100 |
$\Pr(\chi^2_1 \ge t_o)$. In the case of the fetal mortality study, the approximate significance level is $\Pr(\chi^2_1 \ge 0.09) > 0.25$. Therefore, after adjusting the analysis for the confounding effect of clinic location, we conclude that there is no evidence in the data to suggest that the rate of fetal mortality is associated with the amount of prenatal care received.
Much more has been written about the hazards of combining 2 × 2 tables. For example, an exact test of the null hypothesis we have just discussed can be performed. However, the details of that version of the test are beyond the intent and scope of this brief discussion. In our view, it suffices to alert the reader to the hazards involved. For situations more complicated than the one which we have outlined, we suggest consulting a statistician.
5.3. Matched Pairs Binary Data
A second situation which, at first glance, seems tailored to the straightforward use of Fisher’s test or the $\chi^2$ test of significance in a 2 × 2 table is that of binary data which incorporate matching. The following example illustrates more precisely the situation we have in mind.
Two pathologists each examine coded material from the same 100 tumors and classify the material as malignant or benign. The investigator conducting the study is interested in determining the extent to which the pathologists differ in their assessments of the study material. The results could be recorded in the 2 × 2 table shown in table 5.3a.

Table 5.4. A 2 × 2 table indicating the conclusion of each pathologist concerning tumor sample i

| | Malignant | Benign | Total |
| Pathologist A | $a_i$ | $A_i - a_i$ | $A_i = 1$ |
| Pathologist B | $b_i$ | $B_i - b_i$ | $B_i = 1$ |
| Total | $r_i$ | $N_i - r_i$ | $N_i = 2$ |

Although the data are presented in the form of a 2 × 2 table, certain facets of the study have been obscured. The total number of tumors involved appears to be 200, but in fact there were only 100. Also, some tumors will be more clearly malignant than the rest; therefore, the assumption that there is a constant probability of malignancy being coded for each tumor is unreasonable. Finally, the table omits important information about the tumors which A and B classified in the same way and those on which they differed.
A better presentation of the study results might be the 2 × 2 table shown in table 5.3b, but this is still not a 2 × 2 table to which we could properly apply either Fisher’s test or the $\chi^2$ test of significance discussed in chapters 3 and 4. Though the observations are independent (there are exactly 100 tumors, each classified by A and by B), it is still unreasonable to suppose that for each of B’s malignant tumors, and separately for each of B’s benign tumors, there is a constant probability that A will identify the same material as malignant. Nevertheless, this is one of the two principal assumptions of both the $\chi^2$ test and Fisher’s test. (Recall that, within each group, the probability of success must be constant. In this example, the two groups are B’s malignant and benign tumors.)
An appropriate way to present and analyze these data is as a series of 2 × 2 tables, with each table recording the experimental results for one tumor. A sample 2 × 2 table is shown in table 5.4. It should be immediately apparent that the method for analyzing this series of 2 × 2 tables is the procedure described in the preceding section. Because the study material is so variable, each sample item represents a distinct case of the confounding factor ‘tumor material’. Therefore, we need to analyze the study by adjusting the analysis for the confounding effect of tumor material. If we do this, the ninety 2 × 2 tables which correspond to tumors on which the pathologists were agreed contribute nothing to the numerator, $(|O - E| - \tfrac{1}{2})^2$, and denominator, $V$, of the test statistic
$$T = \frac{(|O - E| - \tfrac{1}{2})^2}{V}.$$
This occurs because whenever the pathologists agree, either $a_i = 1$, $e_i = 1$, $r_i = 2$ and $N_i - r_i = 2 - 2 = 0$, or $a_i = 0$, $e_i = 0$, $r_i = 0$ and $N_i - r_i = 2 - 0 = 2$; i.e., the net contribution to $O - E$ is either $1 - 1 = 0$ or $0 - 0 = 0$, and $V_i = 0$ in both cases since one of $r_i$, $N_i - r_i$ is always zero. Thus, only information from ‘discordant pairs’ contributes to a test of the null hypothesis that, for each tumor, the probability that the specimen is labelled malignant is the same for the two pathologists. We will refer to this null hypothesis as diagnostic consistency between the pathologists. A moment’s reflection will verify that if the pathologists’ diagnoses were always the same, there would be no statistical evidence to contradict the null hypothesis that they are equally inclined to diagnose a tumor as malignant. Therefore, it makes sense that only tumors on which their diagnoses are different should provide possible evidence to the contrary.
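The claim that concordant tumors drop out can be checked numerically. The sketch below is only illustrative: it assumes the usual hypergeometric mean and variance for a single 2 × 2 table, $e_i = r_i A_i / N_i$ and $V_i = r_i (N_i - r_i) A_i B_i / \{N_i^2 (N_i - 1)\}$, as the quantities from the preceding section, and the function name `table_contribution` is hypothetical.

```python
from fractions import Fraction

def table_contribution(a, b):
    """Return (a_i - e_i, V_i) for one tumor's 2 x 2 table.

    a and b are 1 if pathologist A (respectively B) labelled the tumor
    malignant, and 0 otherwise. Row totals are A_i = B_i = 1, so N_i = 2
    and r_i = a + b.
    """
    N, A, B = 2, 1, 1
    r = a + b
    # Assumed formulas: hypergeometric mean and variance of a_i
    e = Fraction(r * A, N)
    V = Fraction(r * (N - r) * A * B, N * N * (N - 1))
    return a - e, V

# The four possible tables: two concordant, two discordant
for a, b in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    diff, V = table_contribution(a, b)
    print(f"A={a}, B={b}: a - e = {diff}, V = {V}")
```

Under these assumptions the two concordant tables give zero for both quantities, while each discordant table contributes plus or minus 1/2 to O – E and 1/4 to V.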
As a final, cautionary note we add that care should be exercised in using the test statistic, T, with matched pairs binary data. A rough rule-of-thumb specifies that there should be at least ten disagreements or discordant pairs. For situations involving fewer than ten, there is a fairly simple calculation which yields the exact significance level of the test, and we suggest consulting a statistician in these circumstances. For the example we have been discussing, the exact significance level of a test of the null hypothesis that there is diagnostic consistency between the pathologists is 0.0215. On the other hand, if we use the test statistic, T, it turns out that O = 9, E = 5 and V = 2.5; therefore, the observed value of T is
$$t_o = \frac{(|9 - 5| - \tfrac{1}{2})^2}{2.5} = 4.90.$$
According to table 4.10, the 0.05 and 0.025 critical values for the $\chi^2_1$ probability distribution are 3.84 and 5.02, respectively. Therefore, we know that the approximate significance level of the test is between 0.025 and 0.05. This compares favorably with the exact value of 0.0215 which we quoted previously, and points to the conclusion that the data represent moderate evidence against the null hypothesis of diagnostic consistency. When the diagnoses of the two pathologists disagree, pathologist A is much more inclined to classify the tumor material as malignant than is pathologist B.
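The figures quoted for this example can be reproduced from the discordant counts in table 5.3b. This is a rough sketch rather than the authors' own calculation: it takes each discordant table to contribute 1/2 to E and 1/4 to V, and it assumes that the exact test is a two-sided binomial test on the ten discordant pairs with success probability 1/2, which happens to reproduce the quoted value. The variable names are hypothetical.

```python
from math import comb

# Discordant pairs from table 5.3b
n_a_only = 9  # A says malignant, B says benign
n_b_only = 1  # B says malignant, A says benign
n = n_a_only + n_b_only

O = n_a_only   # observed malignant calls by A among discordant pairs
E = n * 0.5    # assumed: each discordant table has e_i = 1/2
V = n * 0.25   # assumed: each discordant table has V_i = 1/4
T = (abs(O - E) - 0.5) ** 2 / V

# Assumed exact test: two-sided binomial tail probability with p = 1/2
k = max(n_a_only, n_b_only)
upper_tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
p_exact = min(1.0, 2 * upper_tail)

print(f"T = {T:.2f}, exact significance level = {p_exact:.4f}")
```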
Though we have not, by any means, exhausted the subject of analyzing binary data, at the same time not all data are binary in nature. While interested readers may wish to divert their attention to more advanced treatments of this subject, we continue our exposition of statistical methods used in medical research by discussing, in chapter 6, the presentation and analysis of survival data.
6. Kaplan-Meier or ‘Actuarial’ Survival Curves
6.1. Introduction
In medical research, it is often useful to display a summary of the survival experience of a group of patients. We can do this conceptually by considering the specified group of patients as a random sample from a much larger population of similar patients. Then the survival experience of the available patients describes, in general terms, what we might expect for any patient in the larger population.
In chapter 1, we briefly introduced the cumulative probability function. With survival data, it is convenient to use a related function called the survival function, Pr(T > t). If T is a random variable representing survival time, then the survival function, Pr(T > t), is the probability that T exceeds t units. Since the cumulative probability function is Pr(T ≤ t), these two functions are related via the equation
Pr(T > t) = 1 – Pr(T ≤ t).
If Pr(T > t) is the survival function for a specified population of patients, then, by using a random sample of survival times from that population, we would like to estimate the survival function. The concept of estimation, based on a random sample, is central to statistics, and other examples of estimation will be discussed in later chapters. Here we proceed with a very specific discussion of the estimation of survival functions.
A graphical presentation of a survival function is frequently the most convenient. In this form it is sometimes referred to as a survival curve. Figure 6.1 presents the estimated survival curve for 31 individuals diagnosed with lymphoma and presenting with clinical symptoms. The horizontal axis represents time since diagnosis and the vertical axis represents the probability or chance of survival. For example, based on this group of 31 patients, we would estimate that 60% of similar patients should survive at least one year, but less than 40% should survive for three years or more following diagnosis.

Fig. 6.1. The estimated survival curve for 31 patients diagnosed with lymphoma and presenting with clinical symptoms. [Plot not reproduced. Vertical axis: Probability, 0 to 1.0; horizontal axis: Time (years), 0 to 6.]

The estimation of survival curves like the one presented in figure 6.1 is one of the oldest methods of analyzing survival data. The early methodology is due to Berkson and Gage [7], and is also discussed by Cutler and Ederer [8]. Their method is appropriate when survival times are grouped into intervals and the number of individuals dying in each interval is recorded. This approach also allows for the possibility that individuals may be lost to follow-up in an interval. Such events give rise to censored survival times, which are different from observed survival times. Survival curves based on the methodology of Berkson and Gage are frequently referred to as ‘actuarial’ curves because the techniques used parallel those employed by actuaries.
The grouping of survival times may be useful for illustrative and computational purposes. However, with the increased access to computers and good statistical software which has emerged in recent years, it is now common practice to base an analysis on precise survival times rather than grouped data.
Figure 6.1 actually displays a ‘Kaplan-Meier’ (K-M) estimate of a survival curve. This estimate was first proposed in 1958 by Kaplan and Meier [9]. The K-M estimate is also frequently called an actuarial estimate, because it is closely related to the earlier methods. In this chapter, we will restrict ourselves to a discussion of the Kaplan-Meier estimate in order to illustrate the most important concepts. We will typically use survival time as the variable of interest, although the methodology can be used to describe time to any well-defined endpoint, for example, relapse.
6.2. General Features of the Kaplan-Meier Estimate
If we have recorded the survival times for n individuals and r of these times exceed a specified time t, then a natural estimate of the probability of surviving more than t units would be r/n. This is the estimate which would be derived from a Kaplan-Meier estimated survival curve. However, the Kaplan-Meier methodology extends this natural estimate to the situation when not all the survival times are known exactly. If an individual has only been observed for t units and death has not occurred, then we say that this individual has a censored survival time; all we know is that the individual’s survival time must exceed t units. In order to illustrate the general features of the Kaplan-Meier estimate, including the methodology appropriate for censored survival times, we consider the following simple example.
Figure 6.2a presents data from a hypothetical study in which ten patients were enrolled. The observations represent the time, in days, from treatment to death. Five patients were observed to die and the remaining five have censored survival times. From these data we intend to construct an estimate of the survival curve for the study population.
Although one observation is censored at Day 1, no patients are recorded as dying prior to Day 3 following treatment. Therefore, we estimate that no deaths are likely to occur prior to Day 3 and say that the probability of surviving for at least three days is 1. As before, we use the symbol Pr(T > t) to represent the probability that T, the survival time from treatment to death, exceeds t units. Based on the study data, we would estimate that Pr(T > t) = 1 for all values of t less than three days.
Nine individuals have been observed for at least three days, with one death recorded at Day 3. Therefore, the natural estimate of Pr(T > 3), the probability of surviving more than three days, is 8/9. Since no deaths were recorded between Days 3 and 5, this estimate of 8/9 will apply to Pr(T > t) for all values of t between Day 3 and Day 5 as well.
Fig. 6.2. A hypothetical study involving ten patients. a Survival times, in days, from treatment to death (plotted symbols distinguish deaths from censored observations). b The Kaplan-Meier estimated probability of survival function. [Plots not reproduced. Panel a, horizontal axis: Time (days), 0 to 10. Panel b, vertical axis: Probability, 0 to 1.0; horizontal axis: Time (days), 0 to 8.]
At Day 5 following treatment, two deaths are recorded among the seven patients who have been observed for at least five days. Therefore, among patients who survive until Day 5, the natural estimate of the probability of surviving for more than five days is 5/7. However, this is not an estimate of Pr(T > 5) for all patients, but only for those who have already survived until Day 5. The probability of survival beyond Day 5 is equal to the probability of survival until Day 5 multiplied by the probability of survival beyond Day 5 for patients who survive until Day 5. Based on the natural estimates from our hypothetical study, this product is 8/9 × 5/7 = 40/63. This multiplication of probabilities characterizes the calculation of a Kaplan-Meier estimated survival curve.
No further deaths are recorded in our example until Day 7, so that the estimate 40/63 corresponds to Pr(T > t) for all values of t between Day 5 and Day 7.
Four individuals in the study were followed until Day 7; two of these died at Day 7, one is censored at Day 7 and one is observed until Day 8. It is customary to assume that when an observed survival time and a censored survival time have the same recorded value, the censored survival time is larger than the observed survival time. Therefore, the estimate of survival beyond Day 7 for those patients alive until Day 7 would be 2/4. Since the estimate of survival until Day 7 is 40/63, the overall estimate of survival beyond Day 7 is 40/63 × 2/4 = 20/63, and so the estimate of Pr(T > t) is 20/63 for all values of t exceeding 7 and during which at least one patient has been observed. The largest observation in the study is eight days; therefore Pr(T > t) = 20/63 for all values of t between Day 7 and Day 8. Since we have no information concerning survival after Day 8, we cannot estimate the survival curve beyond that point. However, if the last patient had been observed to die at Day 8, then the natural estimate of the probability of survival beyond Day 8 for individuals surviving until Day 8 would be zero (0/1). In this case, the estimated survival curve would drop to zero at Day 8 and equal zero for all values of t exceeding 8.
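The step-by-step multiplication described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for statistical software. The text gives the death times and the censoring at Days 1, 7 and 8; the other two censoring times (taken here as Days 4 and 6) are assumptions chosen only to match the stated numbers at risk, and the function name `kaplan_meier` is hypothetical.

```python
from fractions import Fraction

# (time in days, died?) for the ten hypothetical patients.
# Censoring at Days 4 and 6 is an assumption; the text only implies that one
# patient was censored between Days 3 and 5 and another between Days 5 and 7.
data = [(1, False), (3, True), (4, False), (5, True), (5, True),
        (6, False), (7, True), (7, True), (7, False), (8, False)]

def kaplan_meier(data):
    """Return [(death time, estimated Pr(T > time)), ...] as exact fractions."""
    surv = Fraction(1)
    curve = []
    for t in sorted({time for time, died in data if died}):
        # A time censored at t is treated as larger than a death at t,
        # so it still counts among those at risk at t.
        at_risk = sum(1 for time, _ in data if time >= t)
        deaths = sum(1 for time, died in data if died and time == t)
        surv *= Fraction(at_risk - deaths, at_risk)
        curve.append((t, surv))
    return curve

for t, s in kaplan_meier(data):
    print(f"Day {t}: Pr(T > {t}) estimated as {s} = {float(s):.3f}")
```

With these inputs the sketch reproduces the estimates 8/9, 40/63 and 20/63 at Days 3, 5 and 7.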
Figure 6.2b presents the Kaplan-Meier estimated probability of survival function for our hypothetical example. The graph of the function has horizontal sections, with vertical steps at the observed survival times. This staircase appearance may not seem very realistic, since the probability of survival function for a population is generally thought to decrease smoothly with time. Nevertheless, we have not observed any deaths in the intervals between the changes in the estimated function, so that the staircase appearance is the form most consistent with our data. If we had been able to observe more survival times, the steps in our estimated function would become smaller and smaller, and the graph would more closely resemble a smooth curve.
The staircase appearance, the drop to zero if the largest observation corresponds to a death, and the undefined nature of the estimated probability if the largest observation time is censored, may appear to be undesirable characteristics of the Kaplan-Meier estimate of the probability of survival function. All of these features arise because the methodology attempts to estimate the survival function for a population without assuming anything regarding its expected nature, and using only a finite number of observations to provide information for estimation purposes. In some sense, therefore, these undesirable characteristics are artefacts of the statistical procedure. In practical terms, these features present no serious problems since their effects are most pronounced at points in time when very few individuals have been observed. For this reason, it would be unwise to derive any important medical conclusions from the behavior of the estimated survival function at these time points. Overall, the Kaplan-Meier estimate provides a very useful summary of survival experience and deserves its pre-eminent position as a method of displaying survival data.
Comments:
(a) One rather common summary of survival experience is the sample median survival time. This statistic is probably used more widely than is warranted; nevertheless, it is a useful benchmark. Censored data can complicate the calculation of the median survival time and, as a result, a variety of estimates can be defined. A simple indication of the median survival time can be read from a Kaplan-Meier estimated survival curve as the specific time t at which Pr(T > t) = 0.5. In figure 6.2b, this value may be identified as the time at which the estimated curve changes from more than 0.5 to less than 0.5. However, the estimated curve may be horizontal at the 0.5 level, in which case no unique number can be identified as the estimated median. The midpoint of the time interval over which the curve equals 0.5 is probably as reasonable an estimated median as any other choice in this situation. Use of the Kaplan-Meier estimated survival curve to estimate the median survival time ensures that correct use is made of censored observations in the calculation, and this is important.
As we noted earlier, if the largest observation has been censored, the K-M estimate can never equal zero and will be undefined when t exceeds this largest observation. In this case, if the K-M estimate always exceeds 0.5, then there can be no estimated median survival time. All that can be stated is that the median exceeds the largest observation.
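The rule for reading a median from the estimated curve can be sketched as follows. This is one possible reading of the description in comment (a), not a standard library routine: the function name `km_median` and its arguments are hypothetical, and exact fractions are used so that the comparison with 0.5 is reliable.

```python
from fractions import Fraction

def km_median(curve, last_obs_time):
    """Read an estimated median from a Kaplan-Meier step function.

    curve: [(death time, estimated Pr(T > time)), ...] in increasing time order.
    last_obs_time: the largest observation time in the study.
    Returns None when the curve never reaches 0.5.
    """
    for i, (t, s) in enumerate(curve):
        if s < Fraction(1, 2):
            # Curve changes from more than 0.5 to less than 0.5 at t
            return t
        if s == Fraction(1, 2):
            # Curve is flat at 0.5 until the next step (or the last
            # observation); take the midpoint of that interval.
            end = curve[i + 1][0] if i + 1 < len(curve) else last_obs_time
            return Fraction(t + end, 2)
    return None  # all that can be said: the median exceeds last_obs_time

# Curve from the hypothetical ten-patient study
example = [(3, Fraction(8, 9)), (5, Fraction(40, 63)), (7, Fraction(20, 63))]
print(km_median(example, last_obs_time=8))

# A made-up curve that is flat at exactly 0.5 between times 4 and 9
flat = [(2, Fraction(3, 4)), (4, Fraction(1, 2)), (9, Fraction(1, 4))]
print(km_median(flat, last_obs_time=10))
```

For the ten-patient example the estimate falls from 40/63 to 20/63 at Day 7, so the sketch reports Day 7 as the estimated median.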
(b) Another peculiar feature of the K-M estimated survival curve, especially at more distant times on the horizontal axis, is the presence of long horizontal lines, indicating no change in the estimated survival probability over a long period of time. It is very tempting to regard these flat portions as evidence of a ‘cured fraction’ of patients or a special group characterized in a similar way. Usually, these horizontal sections arise because only a few individuals were still under observation, and no particular importance should be ascribed to these ‘long tails’. If the existence of such a special group of patients, such as a cured fraction, is thought to be likely, then it would be wise to consult a statistician concerning specialized methods for examining this hypothesis.
(c) Part of the problem discussed in (b) is due to the fact that most K-M estimated survival curves are presented without any indication of the uncertainty in the estimate that is due to sampling variability. This imprecision is usually quantified via the standard error – the standard deviation of the sampling distribution associated with the method of estimation. However, a standard error for the estimated survival probability at a particular time t can be calculated, and often appears in a computer listing of the calculations relating to a Kaplan-Meier curve. A range of plausible values for the estimated probability at t units is the estimate plus or minus twice the standard error (see chapter 8). It is essential to indicate an interval such as this one if any important conclusions are to be deduced from the K-M estimate.
A very rough estimate of the standard error is given by Peto et al. [10]. If the estimated survival probability at t units is p and n individuals are still under observation, then the estimated standard error is $p\sqrt{(1 - p)/n}$. Since this is an approximate formula, it is possible that the range of plausible values $p \pm 2p\sqrt{(1 - p)/n}$ may not lie entirely between 0 and 1; recall that all probabilities fall between these limits. If this overlap represents a serious problem, then it would be wise to consult a statistician.
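As a rough illustration of this approximate standard error, the sketch below evaluates $p\sqrt{(1 - p)/n}$ and the corresponding range of plausible values. The function name `peto_interval` is hypothetical, and the inputs are taken from the ten-patient example under the assumption that eight patients remain under observation just after Day 3 and five just after Day 5.

```python
from math import sqrt

def peto_interval(p, n):
    """Return (standard error, lower limit, upper limit) for a survival
    estimate p with n individuals still under observation, using the rough
    formula p * sqrt((1 - p) / n) and the range p plus or minus twice that."""
    se = p * sqrt((1 - p) / n)
    return se, p - 2 * se, p + 2 * se

# Just after Day 5: estimate 40/63, assumed 5 patients still observed
se5, lo5, hi5 = peto_interval(40 / 63, 5)
print(f"p = {40 / 63:.3f}: SE = {se5:.3f}, range = ({lo5:.3f}, {hi5:.3f})")

# Just after Day 3: estimate 8/9, assumed 8 patients still observed
se3, lo3, hi3 = peto_interval(8 / 9, 8)
print(f"p = {8 / 9:.3f}: SE = {se3:.3f}, range = ({lo3:.3f}, {hi3:.3f})")
```

In the second case the upper limit exceeds 1, which is the kind of overlap the text warns about; the sketch deliberately leaves the limits untruncated so that the problem stays visible.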
(d) A critical factor in the calculation of the K-M estimated survival curve is the assumption that the reason an observation has been censored is independent of or unrelated to the cause of death. This assumption is true, for example, if censoring occurs because an individual has only been included in a trial for a specified period of observation and is still being followed. If individuals who responded poorly to a treatment were dropped from a study before death and identified as censored observations, then the K-M estimated survival curve would not be appropriate because the independent censoring assumption has been violated.
There is no good way to adjust for inappropriate censoring so it should, if possible, be avoided. Perhaps the most frequent example of this problem is censoring due to causes of death other than the particular cause which is under study. Unless the different causes of death act independently (and this assumption cannot be tested in most cases), the production of a K-M estimated survival curve corresponding to a specific cause is unwise. Instead, cause-specific estimation techniques that handle the situation appropriately should be used. Although these cause-specific methods are closely related to
