Principles of Medical Statistics (Feinstein, 2002)

TABLE 21.2

Performance of Diagnostic Indexes in a Setting of “High” Prevalence

Marker-Test Result    Diseased Cases    Nondiseased Controls    TOTAL
Positive                    46                  2                 48
Negative                     4                 48                 52
TOTAL                       50                 50                100

Note: “Sensitivity” = 46/50 = .92; “Specificity” = 48/50 = .96; “Positive predictive accuracy” = 46/48 = .958; “Negative predictive accuracy” = 48/52 = .923.

TABLE 21.3

Performance of Same Test Shown in Table 21.2 in a Setting of “Low” Prevalence

Marker-Test Result    Diseased Cases    Nondiseased Controls    TOTAL
Positive                    46                 38                 84
Negative                     4                912                916
TOTAL                       50                950               1000

Note: “Sensitivity” = 46/50 = .92; “Specificity” = 912/950 = .96; “Positive predictive accuracy” = 46/84 = .548; “Negative predictive accuracy” = 912/916 = .996.

value of the latter result would completely hide the predictive “lesion” in that table. The lesion will also be missed by Youden’s J, which is independent of prevalence, relying only on nosologic sensitivity and specificity. In Tables 21.2 and 21.3, Youden’s J will have identical results of .92 + .96 − 1 = .88.

21.2.4 Mathematical Conversions for Clinical Usage

Because the indexes of nosologic sensitivity and specificity are unsatisfactory for clinical usage, some other approach was needed. If the original diagnostic marker research had been done in cohort populations and expressed directly in the clinically desired “predictive” indexes, the results (and main problems) of clinical accuracy would have been immediately apparent. Case-control studies, however, were much easier to do than cohort research, and besides, sensitivity and specificity seemed to be inherent, invariant properties of the diagnostic marker test. They presumably reflected the test’s performance for the selected disease, regardless of what its prevalence might be.

With this assumption, investigators sought a mathematical method that could easily, without any further work, convert the two nosologic indexes into the desired diagnostic indexes.

21.2.4.1 Ordinary Algebra — The conversion can easily be done with simple algebraic symbols, letting v = a/n1 represent sensitivity, f = d/n2 represent specificity, and P = n1/N represent prevalence. These three symbols can then be algebraically transformed into predictive diagnostic indexes for any group containing N people.

For positive predictive accuracy, we first determine appropriate substitutes for a/(a + b) in Table 21.1. Because a = vn1, and n1 = PN, we can express a as vPN. Because b = n2 − d, with n2 = (1 − P)N and d = fn2, the value of b becomes (1 − f)(1 − P)N. Thus, positive predictive accuracy, abbreviated as ppa, becomes a/(a + b) = vPN/[vPN + (1 − f)(1 − P)N], which is

ppa = vP/[vP + (1 − f)(1 − P)]

[21.1]

When applied to Table 21.3, this formula produces ppa = (.92)(.05)/[(.92)(.05) + (.04)(.95)] = .046/[.046 + .038] = .046/.084 = .55, which is the same result obtained previously.

For negative predictive accuracy, we substitute for d/(c + d). The value of d becomes fn2 = f(1 − P)N; and c = (1 − v)n1 = (1 − v)PN. Consequently, with suitable algebraic arrangements, the negative predictive accuracy (npa) for d/(c + d) is calculated as

npa = [f(1 − P)]/[f(1 − P) + (1 − v)P]

[21.2]

© 2002 by Chapman & Hall/CRC

For Table 21.3, the calculation would be (.96)(.95)/[(.96)(.95) + (.08)(.05)] = .912/(.912 + .004) = .912/.916 = .996, which is also the same result obtained previously.

Formulas [21.1] and [21.2] can easily show why misleading clinical results are produced by case-control studies in which P is about .5. When P = .5, 1 − P = .5, and so ppa becomes v/[v + (1 − f)], i.e., sensitivity/[sensitivity + (1 − specificity)], and npa becomes f/[f + (1 − v)], i.e., specificity/[specificity + (1 − sensitivity)]. In this situation of high prevalence, if nosologic sensitivity and specificity have essentially similar high values, their counterparts in diagnostic sensitivity and specificity will also have similar high values.

In Table 21.3, however, where prevalence is .05, ppa will be .05v/[.05v + .95(1 − f)]. The result will be strongly affected (and reduced) by the relatively large value of .95(1 − f) in the denominator. Conversely, npa will be .95f/[.95f + .05(1 − v)]. The result will usually be close to 1, because the .05(1 − v) term in the denominator, being relatively small, will have little effect.
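Formulas [21.1] and [21.2] can be sketched in a few lines of code; the function and variable names here are illustrative, not from the text, and the printed values reproduce the results cited for Tables 21.2 and 21.3.

```python
# Sketch of Formulas [21.1] and [21.2]: converting nosologic sensitivity (v),
# specificity (f), and prevalence (P) into the clinically desired
# predictive accuracies. Names are illustrative, not from the text.

def predictive_accuracies(v, f, P):
    """Return (ppa, npa) for sensitivity v, specificity f, prevalence P."""
    ppa = (v * P) / (v * P + (1 - f) * (1 - P))        # Formula [21.1]
    npa = (f * (1 - P)) / (f * (1 - P) + (1 - v) * P)  # Formula [21.2]
    return ppa, npa

# "High" prevalence setting of Table 21.2 (P = .5): ppa ≈ .958, npa ≈ .923
print(predictive_accuracies(0.92, 0.96, 0.50))
# "Low" prevalence setting of Table 21.3 (P = .05): ppa ≈ .548, npa ≈ .996
print(predictive_accuracies(0.92, 0.96, 0.05))
```

Changing only P, with v and f held fixed, reproduces the collapse of positive predictive accuracy from .958 to .548 discussed above.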

21.2.4.2 Bayes Theorem — A mathematically “elegant” way of achieving the simple algebraic conversions just described is to use the complex ideas and symbols of Bayes Theorem. Its basis is illustrated with the Venn diagrams of Figure 21.1, which shows the Boolean relationship in a population where one group, D, has the focal disease, and the other group, T, has a positive marker test for that disease. The complementary symbol, D̄, represents persons who do not have the disease, and T̄, those who have a negative (i.e., nonpositive) result in the marker test. In the combined diagram, the doubly cross-hatched group in the upper left corner represents D ∩ T, i.e., the “true positive” persons who have both a positive result and the disease. The group of people shown without shading in the lower right corner of Figure 21.1 represents D̄ ∩ T̄, i.e., the true negative group.

FIGURE 21.1

Rectangular Venn diagrams showing groups involved in Bayes Theorem. The left side of the upper section shows the tested “population” divided into D, persons who have the disease under study, and not-D. The division on the right side shows T, persons with a positive marker test for that disease, and those without. The overlap in the lower section shows the four groups in the customary 2 × 2 table. Persons with D̄ ∩ T̄, who have neither the disease nor a positive test, are represented in the lower right corner.
If the symbol “N( )” is used to represent the number of persons in a particular group, the number of persons with disease D is N(D), and the prevalence of disease D in the total group of N persons is N(D)/N, which is also called the probability of the disease, P(D). Analogously, the prevalence (or probability) of persons with a positive test is P(T) = N(T)/N. The prevalence (or probability) of persons with a true positive result is P(D ∩ T) = N(D ∩ T)/N. All of these probabilities represent proportionate occurrence of the cited entities in the total population under investigation.

The other pertinent entities in Bayes Theorem are called conditional probabilities. They represent the proportional occurrence of a subgroup within a particular group. Thus, the symbol P(T|D) represents the proportion of positive test results in diseased persons. The expression is pronounced “probability of T, given D”; and the symbol is a vertical “|” mark, not the diagonal “/” mark often used for division.


The proportion of diseased persons among those with a positive test is P(D|T). Figure 21.1 demonstrates the relationships that can be expressed or defined as

P(T|D) = P(D ∩ T)/P(D)

[21.3]

and

P(D|T) = P(D ∩ T)/P(T)

[21.4]

Solving these two equations for P(D ∩ T) and then setting the results equal to each other, we get

P(D|T) [P(T)] = P(T|D) [P(D)]

[21.5]

Some additional thought about these symbols will reveal that P(D|T) represents positive predictive accuracy; P(T) represents the prevalence of a positive test result; P(T|D) represents the nosologic sensitivity of the marker test; and P(D) represents the prevalence of the disease. In the jargon developed for statistical communication, P(D) is also called the prior or pretest probability of the disease, and P(D|T) is called its posterior or posttest probability.

Because we want to know P(D|T), we can solve Equation [21.5] to get

P(D|T) = P(T|D) P(D)/P(T)

[21.6]

which is one of the simpler ways of expressing Bayes Theorem for diagnostic marker tests. The more complex expressions are derived by suitable substitutions of “known” values for those that are not known. To get the value of P(T) in Formula [21.6], we note from Figure 21.1 that P(T) = [N(T ∩ D) + N(T ∩ D̄)]/N, which becomes

P(T) = [P(T|D) P(D)] + [P(T|D̄) P(D̄)].

The cited value for specificity is P(D̄ ∩ T̄)/P(D̄), which is P(T̄|D̄). The reciprocal value, 1 − P(T̄|D̄), will be P(T|D̄). The reciprocal value of P(D) is P(D̄) = 1 − P(D). When the cited value of P(T) is substituted into Equation [21.6], we can use the expressions of conditional probability, and the values for sensitivity, specificity, and prevalence of disease, to write

P(D|T) = [P(T|D) P(D)]/{[P(T|D) P(D)] + [1 − P(T̄|D̄)][1 − P(D)]}

[21.7]

This complex expression for positive predictive accuracy, cited in terms of conditional probabilities, says exactly the same thing as the much simpler algebraic expression in Formula [21.1].

The application of Bayes Theorem can be illustrated with exactly the same data used earlier to show the change in positive predictive accuracy from Table 21.2 to Table 21.3. To apply Formula [21.7], we know that sensitivity, i.e., P(T D), is .92 and that specificity, i.e., P( T D), is .96. In Table 21.3, prevalence, i.e., P(D), is 50/1000 = .05. To determine positive predictive accuracy, i.e., P(D T), we now substitute into Formula [21.7] to get (.92)(.05)/[(.92)(.05) + (1 .96)(l .05)] = .55, which is the same result obtained previously.
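Formula [21.7] can be checked directly in code; this is a minimal sketch, with an illustrative function name, using the conditional-probability quantities named in the text.

```python
# Formula [21.7]: Bayes Theorem for positive predictive accuracy, written
# with the conditional probabilities used in the text. The function name
# is illustrative, not from the text.

def bayes_ppa(p_t_given_d, p_not_t_given_not_d, p_d):
    """P(D|T) from sensitivity P(T|D), specificity P(T̄|D̄), prevalence P(D)."""
    numer = p_t_given_d * p_d
    denom = numer + (1 - p_not_t_given_not_d) * (1 - p_d)
    return numer / denom

# Table 21.3: sensitivity .92, specificity .96, prevalence .05
print(round(bayes_ppa(0.92, 0.96, 0.05), 2))  # 0.55, as obtained previously
```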

An analogous expression can be developed for the Bayesian citation of P(D̄|T̄), which indicates negative predictive accuracy. You can work out the details for yourself or find them cited in pertinent publications elsewhere.

An important (and perhaps comforting) feature of Bayes Theorem is that it is simply a manipulation of the algebraic truism recorded in Expressions [21.3], [21.4], and [21.5]. For most purposes of analyzing diagnostic marker tests, nothing more need be known about Bayes Theorem itself.

21.2.4.3 Likelihood Ratio — A more sophisticated manipulation of the Bayesian algebra produces an entity called the likelihood ratio. It is used later for other types of diagnostic calculations, but it has become an important intellectual contributor to a new form of stochastic reasoning, called Bayesian inference.


To demonstrate the use of the likelihood ratio, we first note that results in Figure 21.1 for the “false positive” group, P(D̄ ∩ T), can be included in two expressions as

P(T|D̄) = P(D̄ ∩ T)/P(D̄)

[21.8]

and

P(D̄|T) = P(D̄ ∩ T)/P(T)

[21.9]

Solving [21.8] and [21.9] for P(D̄ ∩ T) and equating the two results, we get

P(T|D̄) P(D̄) = P(D̄|T) P(T)

[21.10]

If Equation [21.10] is solved for P(T), and if the result is substituted for P(T) in Equation [21.6], we get

P(D|T)/P(D̄|T) = [P(T|D)/P(T|D̄)] × [P(D)/P(D̄)]

[21.11]

In Equation [21.11], the far right entity, P(D)/P(D̄), is simply the odds for prevalence. It is (n1/N)/(n2/N) = n1/n2, and is also called the prior odds of the disease. The value of P(D|T)/P(D̄|T) on the left side of the equation is the odds for a true positive among positive results. It is (a/m1)/(b/m1) = a/b, and is also called the posterior odds for a positive marker test.

The value of P(T|D)/P(T|D̄) is called the likelihood ratio. It converts the prior odds into the posterior odds according to the formula

posterior odds = likelihood ratio × prior odds

[21.12]

or

likelihood ratio = posterior odds/prior odds

[21.13]

 

Because P(T|D) = sensitivity = a/n1, and P(T|D̄) = 1 − specificity = 1 − (d/n2), we can express the test’s accomplishment for a positive result in the 2 × 2 decision matrix as

positive likelihood ratio = sensitivity/(1 − specificity)

[21.14]

For a negative diagnostic marker result, the prior odds are n2/n1 and the posterior odds are d/c. When appropriately arranged, the negative likelihood ratio for a 2 × 2 table becomes (d/n2)/(c/n1), which is

negative likelihood ratio = specificity/(1 − sensitivity)

[21.15]

This reasoning becomes pertinent later when likelihood ratios are used to express levels of diagnostic marker results.

21.2.4.4 Illustration of Likelihood-Ratio Calculations — To illustrate the numerical activities, the positive likelihood ratio in Table 21.2 is [46/50]/[2/50] = 23. The negative likelihood ratio is [48/50]/[4/50] = 12. The same results are obtained respectively in Table 21.3, where [46/50]/[38/950] = 23, and [912/950]/[4/50] = 12. This similarity would be expected because sensitivity and specificity are the same in both tables.

The main difference in the two tables is the prior odds, which is 50/50 = 1 in Table 21.2, and 50/950 = .0526 in Table 21.3. Consequently, the two tables have different values for posterior odds, calculated as likelihood ratio × prior odds. The posterior odds for a positive result are 23 × 1 = 23 in Table 21.2, and 23 × .0526 = 1.21 in Table 21.3. An even simpler way of getting these results for the first row in each table is to note that 46/2 = 23 in Table 21.2 and 46/38 = 1.21 in Table 21.3.


After being calculated, the posterior odds must be converted to the probabilities that express diagnostic accuracy. The conversion uses the mathematical “construction” that probability = odds/(odds + 1). Thus, the positive predictive accuracy is 23/(23 + 1) = .958 in Table 21.2 and 1.21/(1.21 + 1) = .548 in Table 21.3. Various nomograms5–7 have been proposed to produce these transformations directly from values for the likelihood ratios and prior probability.
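The likelihood-ratio arithmetic just described can be sketched as follows; the function name is illustrative, not from the text.

```python
# Formula [21.12] (posterior odds = likelihood ratio × prior odds) followed
# by the odds-to-probability conversion probability = odds/(odds + 1).
# The function name is illustrative, not from the text.

def posterior_probability(likelihood_ratio, prior_odds):
    """Convert prior odds to a posterior probability via the likelihood ratio."""
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (posterior_odds + 1)

lr_pos = (46 / 50) / (2 / 50)  # positive likelihood ratio = 23
print(round(posterior_probability(lr_pos, 50 / 50), 3))   # Table 21.2: 0.958
print(round(posterior_probability(lr_pos, 50 / 950), 3))  # Table 21.3: 0.548
```

The same likelihood ratio of 23, applied to the two different prior odds, yields the two different predictive accuracies cited above.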

21.2.5 Bayesian Inference

The logic of Formula [21.12] is especially cogent for the statistical reasoning called Bayesian inference, which differs from the frequentist methods that underlie all the stochastic strategies discussed in this text. Bayesian inference relies on the likelihood-ratio relationship between prior and posterior odds.

The Bayesian inferential methods are becoming fashionable today, especially for advanced graduate courses and doctoral dissertations in academic departments of statistics. Nevertheless, the value of Bayesian inference is highly controversial,8,9 and its ultimate role is currently uncertain. A fundamental problem in the dispute is that subjective choices are often used for the values of prior odds. Clinicians who have received decades of exhortation to avoid anecdotal evidence and to get precise documentation for any quantitative statements are often surprised and ruefully chagrined to discover a new brand of mathematical reasoning that allows completely subjective guesses to be made about prior odds.

The main point to be noted now, however, is that Bayes Theorem, in contrast to the complexities of Bayesian inference, is a relatively simple mathematical mechanism for converting the case-control (“nosologic”) values of sensitivity and specificity, and the anticipated or observed prevalence of disease, into the desired “predictive” indexes of diagnostic accuracy. This mechanism is what has made Bayes Theorem so famous (or infamous) in the statistical analysis of diagnostic marker tests.

21.2.6 Direct Clinical Reasoning

If all the foregoing mathematical methods had not been developed and subsequently advocated by academic investigators, a simple direct procedure might have been used. When diagnostic-marker tests are applied in cohorts of patients, the results can be promptly expressed in rates of diagnostic accuracy. For the symbols in Table 21.1, these rates would be the “horizontal” values of a/m1 and d/m2 that are respectively called positive and negative predictive accuracy.

In a recent survey of practicing clinicians in diverse specialties, Reid et al.10 found that the formal mathematical strategies were almost never used. Instead, the clinicians — although often believing that they evaluated the vertical indexes of “sensitivity” and “specificity” — usually examined the horizontal results in their own groups of patients. The results were commonly expressed in the reciprocal values of false positive and false negative rates, which would be b/m1 and c/m2. With these direct expressions, clinicians could promptly appraise the accuracy of the tests without having to rely on special mathematical calculations, transformations, or nomograms.

The direct, “sensible” clinical cohort approach may eventually replace the indirect case-control mathematical transformations. Because the indirect methods remain popular, however, they create the additional challenges (and problems) discussed in the next few sections.

21.3 Demarcations for Ranked Results

Regardless of whether Bayesian, likelihood-ratio, or direct expressions were used, all of the components discussed so far were dichotomous: the disease was either present or absent, and the diagnostic marker test was either positive or negative. The splendid mathematical tactics applied to these double dichotomies require binary citations for both the disease and the marker test.

Many diagnostic marker (or even gold-standard) tests, however, are expressed in ranked values. They might be dimensional scales for blood glucose, serum calcium, or enzyme tests, or ordinal scales, such as the none, trace, 1+, 2+, … ratings for various urinary tests. Furthermore, many “gold standard”

© 2002 by Chapman & Hall/CRC

diagnoses are expressed in a three-category scale, containing an equivocal maybe or uncertain, in addition to the 2-category unequivocal yes or no.

These problems can be managed with three approaches: (1) converting all gold standards into binary categories, (2) choosing a binary demarcation for ranked results of the marker test, and (3) establishing ordinal zones for the marker test and using a likelihood ratio for efficacy in each zone.

21.3.1 Binary “Gold Standards”

The ranked results of marker tests can easily be converted into binary (or ordinal) arrangements that will be discussed shortly. These arrangements, however, will not take care of a gold-standard criterion that is not binary. Accordingly, diagnostic marker evaluations are almost always confined to situations in which the gold-standard disease criterion is cited as an unequivocal binary yes or no. For example, the dimensional scale of a glucose tolerance test for diabetes mellitus or a urinary culture for bacteriuria might be given a yes/no binary demarcation such as ≥ 500 for the sum of four glucose levels or ≥ 5 × 10³ for a colony count of bacteria.

The custom of using a binary gold standard can easily be justified when the demarcation comes from dimensional scales such as those used for bacterial counts or the sum of glucose values. If the boundary is disputed, the accuracy of the marker test can be re-calculated at different boundaries. For example, if 10⁵ colony forming units (CFU) is chosen as the gold-standard threshold for bacterial infection in a urinary culture, and if a different threshold is preferred, the efficacy of the marker test can be determined for alternative boundaries, such as 10³, 10⁴, or 10⁶.

Insistence on a binary gold standard, however, creates major difficulties when the group with an uncertain diagnosis is omitted, because clinical reality contains diseases that are usually diagnosed in the trichotomous categories11 of yes, uncertain, or no. For example, a biopsy specimen may be inadequate for making a definitive diagnostic decision; or the collection of data for a patient with chest pain may lead to the conclusion of possible, but not definite, myocardial infarction.

If the gold-standard results contain only the two groups of unequivocally diseased cases and nondiseased controls, the results for efficacy, although mathematically attractive, may seriously distort what really happens in clinical practice. This potential distortion is constantly ignored, however, for the statistical activities of both published literature and the discussion that follows. Consequently, the pertinent statistical evaluations may be deceptive when published results of marker-test accuracy for binary gold standards are applied in the nonbinary scientific realities of clinical practice.

21.3.2 Receiver-Operating-Characteristic (ROC) Curves

To get the double dichotomy needed for Bayes Theorem, the gold-standard nosologic groups are divided into diseased cases and nondiseased controls, and the ranked results of marker tests are also given a binary demarcation. Without this double binary split, the marker test cannot be indexed for sensitivity and specificity, and cannot be applied thereafter in Bayesian diagnostic analyses. Choosing the best binary split for ranked marker-test values thus became a new statistical challenge. It was approached with a method, developed in engineering, that analyzes a receiver-operating-characteristic curve.12 The popularity of these curves in published reports soon made ROC become a fashionable abbreviation in the mathematics of diagnostic analysis.3

21.3.2.1 Inverse Relationship of Sensitivity and Specificity — Table 21.4 shows the relationship of results for a ranked marker (S-T depression in an exercise stress test) and the definitive nosologic state of 150 cases of coronary disease and 150 controls. The 7 ordinal categories used for the marker can be split dichotomously at six locations, marked A, B, …, E, F in Table 21.4. Each split would produce a different fourfold table, yielding different indexes for sensitivity and specificity. For example, with split C, the fourfold table becomes

 73    7
 77  143

having a sensitivity of 73/150 = .49 and specificity of 143/150 = .95. With split D, which produces

103   15
 47  135

the corresponding indexes are 103/150 = .69 and 135/150 = .90.


TABLE 21.4

Results in Diagnostic Marker Study of Coronary Artery Disease and Level of S-T Depression in Exercise Stress Test

                                   Definitive State of Disease
Patients with S-T Segment      Cases of            Controls Without
Depression of                  Coronary Disease    Coronary Disease
≥ 3.0 mm.                            31                   0
   A ______
≥ 2.5 mm. but < 3.0 mm.              15                   0
   B ______
≥ 2.0 mm. but < 2.5 mm.              27                   7
   C ______
≥ 1.5 mm. but < 2.0 mm.              30                   8
   D ______
≥ 1.0 mm. but < 1.5 mm.              32                  39
   E ______
≥ 0.5 mm. but < 1.0 mm.              12                  43
   F ______
< 0.5 mm.                             3                  53
TOTAL                               150                 150

The results of the different demarcations for Table 21.4 are summarized in Table 21.5, which shows that sensitivity and specificity have an inverse relationship: as sensitivity increases, specificity decreases, and vice versa. This relationship is easy to prove algebraically, but can be understood intuitively if you recognize that the denominators for calculating sensitivity and specificity are the same, regardless of the level of demarcation. As the level goes downward, however, the numerator values increase for the sensitivity index and decrease for specificity.

TABLE 21.5

Summary of Nosologic Sensitivity and Specificity Calculated for Demarcations of Table 21.4

Location of    Boundary for    Number of Cases                 Number of Controls
Demarcation    Abnormal        Included         Sensitivity    Included             Specificity    1 − Specificity
A              ≥ 3.0 mm.         31             0.21             0                  1              0
B              ≥ 2.5 mm.         46             0.31             0                  1              0
C              ≥ 2.0 mm.         73             0.49             7                  0.95           0.05
D              ≥ 1.5 mm.        103             0.69            15                  0.90           0.10
E              ≥ 1.0 mm.        135             0.90            54                  0.64           0.36
F              ≥ 0.5 mm.        147             0.98            97                  0.35           0.65
TOTAL                           150                            150
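The cumulative indexes in Table 21.5 can be reconstructed from the row counts of Table 21.4; this is a sketch, with illustrative variable names, showing sensitivity and 1 − specificity as cumulative proportions at each split.

```python
# Reconstructing the sensitivity and (1 − specificity) columns of Table 21.5
# from the row counts of Table 21.4. Each split A..F cumulates all rows at
# or above its boundary. Variable names are illustrative, not from the text.

rows = [  # (cases, controls) per S-T depression zone, highest zone first
    (31, 0), (15, 0), (27, 7), (30, 8), (32, 39), (12, 43), (3, 53),
]
total_cases = sum(c for c, _ in rows)     # 150
total_controls = sum(s for _, s in rows)  # 150

cases_so_far = controls_so_far = 0
for label, (cases, controls) in zip("ABCDEF", rows):  # six splits A..F
    cases_so_far += cases
    controls_so_far += controls
    sens = cases_so_far / total_cases
    one_minus_spec = controls_so_far / total_controls
    print(label, round(sens, 2), round(one_minus_spec, 2))
# Split D, for example, gives sensitivity .69 and 1 − specificity .10.
```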

21.3.2.2 Construction of ROC Curves — If sensitivity is plotted against specificity at each level of the possible split, the shape of the curve will go downward to the right. To make the curve go upward, it is constructed as a plot of sensitivity vs. 1 − specificity. The two pairs of values then rise monotonically.

The ROC curve for the data of Tables 21.4 and 21.5 is shown in Figure 21.2. The curve here has .21 and .31 as the lowest values of sensitivity when 1 − specificity = 0, and .98 as the highest value when 1 − specificity = .65.

In a useless marker test, the cases and controls will have similar distributions, and the corresponding values of sensitivity and 1 − specificity will be essentially equal in each row. The ROC curve will be a straight line at a 45° angle.

In a perfect marker test, all of the cases will be included in rows that have no controls, and all the controls will appear in rows that begin below the level of the last case. The values of sensitivity will ascend toward a maximum of 1 while 1 − specificity = 0, and will maintain the value of 1 while 1 − specificity gradually rises. The curve will resemble a capital Greek gamma, Γ.

The useless, perfect, and ordinary possibilities for the ROC curves are shown in Figure 21.3. An inverted wedge is formed as the upper and left outer lines of the Γ-shaped perfect test join the diagonal line of the useless test. The results for most ordinary tests fit into that wedge. The closer they approach the Γ shape, the better is the test’s performance.


FIGURE 21.2

Receiver-operating-characteristic (ROC) curve for data in Tables 21.4 and 21.5.


FIGURE 21.3

Perfect, useless, and ordinary possibilities for ROC curves of diagnostic marker tests.

21.3.2.3 Choice of Optimal Dichotomous Boundary — The main reason for constructing ROC curves is to choose an optimal dichotomous boundary for the diagnostic marker results. For a perfect test, the choice is easy: it will be the level at which sensitivity = 1, and 1 − specificity = 0. Because perfect tests almost never occur in clinical reality, however, a practical strategy is needed for the decisions. The strategy can be purely mathematical or can involve additional ideas about costs and benefits.

21.3.2.3.1 Mathematical Strategy. A simple method of choosing an optimal cut-point is to minimize the sum of false positive and false negative test results.13 Mathematical calculus will show that this point occurs when the slope of the ROC curve is at (N − T)/T, where T is the total number of cases and N − T is the number of controls. Because N ≥ T, the formula becomes (N/T) − 1, and this slope is always positive.

To avoid calculations of slope, the cut-point can be found as the location where the number of accruing false positive values begins to exceed the number of accruing false negatives. Thus, for the data in Table 21.5, the accruing totals for each cut-point are as follows:

 

Accruing Totals for:

Cut-Point    Number of False-Negatives    Number of False-Positives    Total Number of False Results
A                 119                           0                          119
B                 104                           0                          104
C                  77                           7                           84
D                  47                          15                           62
E                  15                          54                           69
F                   3                          97                          100


In downward descent of these boundaries, the false negatives exceed the false positives at cut-point D, but not at point E. The total number of false results is also minimized at cut-point D. Therefore D would be the best choice of demarcation for these data. This point occurs in Figure 21.2 just before the curve begins to flatten its sharp upward slope.
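The minimization tactic can be sketched in a few lines; the accruing totals are hardcoded from the table above, and the variable names are illustrative.

```python
# Choosing the cut-point that minimizes false positives + false negatives,
# a sketch of the simple minimization tactic described in the text.
# Variable names are illustrative, not from the text.

accruing = {  # cut-point: (false negatives, false positives)
    "A": (119, 0), "B": (104, 0), "C": (77, 7),
    "D": (47, 15), "E": (15, 54), "F": (3, 97),
}
best = min(accruing, key=lambda k: sum(accruing[k]))
print(best, sum(accruing[best]))  # D 62
```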

With another mathematical strategy,13 the chosen cut-point will maximize the predictive accuracy of the test for its diagnostic sensitivity and specificity. Yet another mathematical tactic relies on the “information content”3,12,14 of the test, determined from logarithmic calculations of sensitivity, specificity, and prevalence.

21.3.2.3.2 Cost-Benefit Strategies. Each false-positive, true-positive, false-negative, and true-negative diagnostic result can be multiplied by an arbitrary monetary (or other quantitative) value that gives a “weight” to costs and benefits of that result. The score that is calculated for the different possibilities can then be used to find an optimal cut-off boundary in the ROC curve.3,13

Despite the mathematical ingenuity, this strategy has had little value beyond its contributions to academic scholarship. The basic problem is making pragmatically realistic choices of quantitative values for the costs and benefits.

21.3.2.3.3 Area under the ROC Curve. Because the ROC curve gets “better” as it approaches the Γ shape, the area under the curve can be used as an index of accomplishment. In the square formed by the coordinates of the graph, a perfect curve will have an area of 1, and a useless curve will cover an area of .5. The areas under most curves will thus range from .5 to 1.

This distinction may not help choose the best cut-point for an individual curve, but can be useful in evaluating the accomplishments of different marker tests. For example, the areas under the corresponding ROC curves could be compared to decide whether S-T segment depressions or CPK enzyme results are the better diagnostic markers for coronary disease.

One problem in using areas under the curve is that a useless test will have a value of .5, which may seem impressive in other contexts, such as correlation coefficients. If the kappa coefficient in Chapter 20 is intended to adjust indexes of agreement for results that might occur by chance, the area-under-the-curve index should be similarly adjusted to reflect its superiority over a useless result. Thus, with an area of only 1 − .5 = .5 available for showing “superiority,” a .8 area under the curve has produced an improvement of only .8 − .5 = .3, which is a proportion of .3/.5 = 60% of what could be accomplished.
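The kappa-like adjustment just described is a one-line calculation; the function name is illustrative, not from the text.

```python
# Chance-adjusting an area under the ROC curve, in the spirit of kappa:
# only the span from .5 (useless) to 1 (perfect) counts as achievement.
# The function name is illustrative, not from the text.

def adjusted_auc(auc):
    return (auc - 0.5) / (1 - 0.5)

print(round(adjusted_auc(0.8), 2))  # 0.6, i.e., 60% of what could be accomplished
```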

The main mathematical flaw in the area-under-the-curve strategy, however, is that the points on the curve may often come from small proportions that are numerically unstable. The problem of stability is discussed in Section 21.4.

21.3.2.4 ROC Curves as Indexes of Prediction — The area under the ROC curve has been proposed as an index for evaluating staging systems or other multivariable mechanisms that produce prognostic predictions rather than diagnostic separations.15 Aside from the problem of instability in the constituent numbers, the ROC method ignores the principle that staging systems are used for much more than individual forecasts. In the design or evaluation of therapeutic research, a prognostic staging system is most valuable for the way it distributes the patients and provides significant gradients between the stages.16 Neither of these desiderata is considered in ROC curves that are aimed only at the individual accuracy of each estimate.

21.3.3 Likelihood-Ratio Strategy

To avoid choosing a single binary cut-point, the marker test results can be appraised with likelihood ratios for ordinal zones.5 The reasoning is as follows: Suppose the numerical results in any row (or zone) of the table are ti for the cases and si for the controls, with mi = ti + si as the total in that row. The grand total will be T cases and S controls, with T + S = N. In any marker test, the prior odds of a positive result from the total of cases and controls will be T/S. In any row (or zone), such as those in Table 21.4, the posterior odds for a positive result will be ti/si. As shown earlier in Formula [21.13],

likelihood ratio = posterior odds / prior odds

© 2002 by Chapman & Hall/CRC

Accordingly, the positive likelihood ratio in any zone will be

LRpos = (ti/si)/(T/S)

Analogously, the likelihood ratio for a negative result will be

LRneg = (si/ti)/(S/T)

The major advantage of likelihood ratios is the removal of the grand total denominator, N, so that prevalence does not affect the results. The values of LRpos and LRneg can therefore be calculated directly from the counts in each zone and from the individual columnar totals for cases and controls. In contrast to the separate binary values of sensitivity for cases and specificity for controls, the likelihood ratios offer “stratified” indexes of efficacy for each selected zone of ordinal categories in the cases and controls of the diagnostic marker research.

21.3.3.1 Example of Calculations — In Table 21.4, the value for prior odds is T/S = 150/150 = 1. Consequently, the posterior-odds values in each zone will provide the positive and negative likelihood ratios. For positive results, the posterior odds will be 31/0 = ∞ in the first zone, 30/8 = 3.75 in the fourth zone, and 12/43 = .28 in the sixth zone. The posterior odds for the corresponding negative results will be 0, .27, and 3.58. If the value of T/S were .5 rather than 1, however, each of these positive likelihood ratios would be doubled, and each negative ratio would be halved.
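The zone-by-zone arithmetic can be verified with a short sketch. The zone counts below are the three rows of Table 21.4 cited in the example; everything else (variable names, the dictionary layout) is our own scaffolding:

```python
# Zone counts (cases t_i, controls s_i) for three rows of Table 21.4;
# the column totals are T = 150 cases and S = 150 controls,
# so the prior odds T/S = 1.
T, S = 150, 150
zones = {"first": (31, 0), "fourth": (30, 8), "sixth": (12, 43)}

ratios = {}
for name, (t_i, s_i) in zones.items():
    # LRpos = (t_i/s_i)/(T/S); LRneg = (s_i/t_i)/(S/T)
    lr_pos = (t_i / s_i) / (T / S) if s_i else float("inf")
    lr_neg = (s_i / t_i) / (S / T) if t_i else float("inf")
    ratios[name] = (lr_pos, lr_neg)
    print(f"{name} zone: LRpos = {lr_pos:.2f}, LRneg = {lr_neg:.2f}")
```

Running this reproduces the text's values: ∞, 3.75, and .28 for the positive ratios, and 0, .27, and 3.58 for the negative ones.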

21.3.3.2 Disadvantage of Likelihood Ratios — Although the conversion to odds gives the likelihood ratio the advantage of avoiding the effects of prevalence, the tactic becomes a disadvantage when the result is actually applied clinically. As noted earlier, the probability value needed for a clinical decision requires special calculations or nomograms to convert the odds values and likelihood ratio, together with the estimated prevalence of disease, into a probability for the clinical situation under scrutiny.
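The conversion that the text says requires "special calculations or nomograms" can be sketched with the odds form of Bayes' theorem (posterior odds = prior odds × LR). The function name posttest_probability and the example prevalences are our own illustration, not from the text:

```python
def posttest_probability(pretest_prob, lr):
    """Combine a pretest probability (estimated prevalence) and a
    likelihood ratio into a post-test probability, using
    posterior odds = prior odds * LR."""
    prior_odds = pretest_prob / (1 - pretest_prob)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

# With LRpos = 3.75, the same test result means very different things
# at different prevalences: about .79 at 50% prevalence, but only
# about .16 at 5% prevalence.
print(round(posttest_probability(0.50, 3.75), 2))  # 0.79
print(round(posttest_probability(0.05, 3.75), 2))  # 0.16
```

This is the arithmetic a likelihood-ratio nomogram performs graphically, and it shows why a prevalence estimate cannot be avoided at the point of clinical application.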

The likelihood ratios also have all of the problems of stability (see Section 21.4) that occur for any collection of binary proportions, and the inevitable difficulties (see Section 21.8.3.1) of any index that erroneously relies on having constant values in the varied spectrum of a disease.

21.3.4 Additional Expressions of Efficacy

Eisenberg et al.17 have proposed two additional ways of expressing diagnostic efficacy. In one method, which can be used only for binary markers, the prior and posterior probability values are subtracted as P(D|T) − P(D), and the incremental change in probability becomes the index of accomplishment. In the second method, the indexes of accomplishment are cited as logarithms (rather than actual values) of the odds in the likelihood ratio. The logarithms are preferred, for reasons noted earlier (see Section 17.5.5.2), because of asymmetrical constraints in the range of likelihood ratios below and above 1. If LR > 1, the values can extend up to infinity; but if LR < 1, the values can range only between 0 and 1. Besides, if LR = 1, i.e., a useless result, log LR will be zero.
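Both expressions can be sketched briefly; the function names are our own labels, and the numerical inputs are illustrative, not taken from Eisenberg et al.:

```python
import math

def incremental_probability(p_d_given_t, p_d):
    """First method: the increment P(D|T) - P(D)."""
    return p_d_given_t - p_d

def log_lr(lr):
    """Second method: the logarithm of the likelihood ratio.
    A useless result (LR = 1) gives log LR = 0, and reciprocal
    ratios give equal and opposite values."""
    return math.log(lr)

print(log_lr(1.0))                               # 0.0 for a useless test
print(round(log_lr(4.0), 3), round(log_lr(0.25), 3))
```

The symmetry of the logarithms (e.g., LR = 4 and LR = 1/4 give values of equal magnitude but opposite sign) is what removes the asymmetrical constraint described in the text.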

21.3.5 Trichotomous Clinical Strategy

From an esthetic mathematical viewpoint, the Bayesian, ROC, and likelihood-ratio approaches offer appealing solutions for the challenges of either dichotomously demarcating the ranked marker-test categories or examining their ordinal zones. Nevertheless, a different approach, which uses no specific mathematical strategy, may often best represent the way in which many clinicians would interpret the data in Table 21.4.

Clinicians usually want to separate three diagnostic zones for a marker test.11 In one extreme zone, the disease should almost always be present and, at the other extreme, the disease should almost always be absent. In the middle zone, the marker-test result will be too uncertain, and additional data (or tests) will be needed for diagnostic confidence. For this type of trichotomous clinical partition, the data of Table 21.4 would be divided as shown in Table 21.6. Coronary disease is particularly likely to be present for S-T segment depressions in the upper zone of Table 21.6, and absent in the lower zone. In the middle
