19.7.
19.7.1. The graph has too much scatter for the relationship to be biologically meaningful; also, r is too low.
19.7.2. Each regression line seems to intersect the Y-axis at the stated intercepts of ~130, ~48, and ~120.
Chapter 20
20.1.
TABLE AE.20.1
Answer to Exercise 20.1.1

| Subject | Wright (1st − 2nd) | Mini-Wright (1st − 2nd) | Mean of Two First Readings | Increment, First Wright − First Mini-Wright |
|---------|--------------------|-------------------------|----------------------------|---------------------------------------------|
| 1  | 4   | −13 | 503   | −18 |
| 2  | −2  | 15  | 412.5 | −35 |
| 3  | 4   | 12  | 518   | −4  |
| 4  | 33  | −16 | 431   | 6   |
| 5  | 6   | 0   | 488   | −24 |
| 6  | −54 | −25 | 578.5 | −43 |
| 7  | −2  | −96 | 388.5 | 49  |
| 8  | 11  | −10 | 411   | 62  |
| 9  | 12  | −16 | 654   | −8  |
| 10 | 4   | 13  | 439   | −12 |
| 11 | −3  | 12  | 424.5 | −15 |
| 12 | 23  | 21  | 641   | 30  |
| 13 | −8  | 33  | 263.5 | 7   |
| 14 | −14 | 10  | 477.5 | 1   |
| 15 | 13  | −9  | 218.5 | −81 |
| 16 | 51  | −20 | 386.5 | 73  |
| 17 | 6   | 8   | 439   | −24 |
| Σd²       | 7966   | 13479  |  | 24120 |
| Σd²/17    | 468.59 | 792.88 |  | 24120/17 = 1418.82 |
| √(Σd²/17) | 21.64  | 28.16  |  | 37.67 |
| d̄        |        |        |  | −1.88 |

Note: the first two data columns are the incremental differences between the 1st and 2nd readings on each instrument.
20.1.1. (See Table AE.20.1.) The Mini-Wright seems more inconsistent. Its root mean square increment between the two readings is 28.2, vs. 21.6 for the Wright.
20.1.2. (a) No. r shows good correlation, but not necessarily good agreement, and some points seem substantially discrepant from the line of “unity.”
(b) Check the root mean square increment between the first readings of the two instruments. Table AE.20.1 shows it is 37.67. Thus, each instrument agrees better with its own repeat readings than with the other instrument.
(c) Check the mean increment. It is −1.88 (see Table AE.20.1) and thus not very biased. Check the plot of increments vs. average magnitudes. A suitable graph will show that no major pattern is evident.
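The summary indexes in Table AE.20.1 can be verified with a short script. This is a sketch, not part of the original exercise; the increment values are transcribed from the table above.

```python
import math

# Incremental differences (1st - 2nd reading) transcribed from Table AE.20.1
wright = [4, -2, 4, 33, 6, -54, -2, 11, 12, 4, -3, 23, -8, -14, 13, 51, 6]
mini_wright = [-13, 15, 12, -16, 0, -25, -96, -10, -16, 13, 12, 21, 33, 10, -9, -20, 8]
# Increments between the first Wright and first Mini-Wright readings
cross = [-18, -35, -4, 6, -24, -43, 49, 62, -8, -12, -15, 30, 7, 1, -81, 73, -24]

def rms(d):
    """Root mean square increment: sqrt(sum(d^2) / n)."""
    return math.sqrt(sum(x * x for x in d) / len(d))

print(round(rms(wright), 2))       # within-instrument inconsistency, Wright (~21.6)
print(round(rms(mini_wright), 2))  # within-instrument inconsistency, Mini-Wright (~28.2)
print(round(rms(cross), 2))        # between-instrument disagreement (~37.7)
```

The larger between-instrument value (≈37.7) is the basis for the conclusion in 20.1.2(b).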
20.3.
20.3.1.(NYHA IV) + (CCS IV) − BOTH = 51 + 50 − 11 = 90. Seems correct.
20.3.2.Wide disagreements in Groups I–IV and IV–I are disconcerting. They suggest that many patients without angina are disabled, and many patients with severe angina had no functional problems. This is either an odd clinical situation or a major disagreement between the two scales.
20.3.3.Not a useful table prognostically. It shows current status of patients after operation, but does not indicate pre-op status of those who lived or died. For predictions, we need to know outcome in relation to pre-op status.
©2002 by Chapman & Hall/CRC
20.5.
20.5.1.
(a) The value of X² for “agreement” seems to be [(19 × 12) − (5 × 1)]²(37)/[(24)(13)(20)(17)] = 17.345, which is an ordinary chi-square calculation. This is an inappropriate approach to these data.
(b) The value of X² for “change” seems to be (5 − 1)²/(5 + 1) = 16/6 = 2.67. This is the McNemar formula and would be appropriate for indexing disagreement, but these small numbers probably need a correction factor, so that X²M = (|5 − 1| − 1)²/6 = 9/6 = 1.5, not 2.67.
(c) Better citations would be po = (19 + 12)/37 = 31/37 = .84, or kappa. Because pe = [(20 × 24) + (13 × 17)]/37² = .512, kappa = (.84 − .512)/(1 − .512) = .672.
20.5.2. The McNemar index of bias here is (5 − 1)/(5 + 1) = 4/6 = .67, suggesting that the therapists are more likely to make “satisfactory” ratings than the patients. Because of the small numbers, this index would have to be checked stochastically. The McNemar chi-square values in 20.5.1(b), however, are too small (even without the correction factor) to exceed the boundary of 3.84 (= 1.96²) needed for stochastic significance of X² at 1 d.f.
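The kappa and McNemar calculations can be sketched as follows. The 2 × 2 cell layout (19 and 12 concordant ratings; 5 and 1 discordant) is reconstructed from the arithmetic in the text, not taken from the original table.

```python
# Agreement indexes for the therapist-vs-patient 2x2 table of Exercise 20.5
a, b, c, d = 19, 5, 1, 12          # a, d concordant; b, c discordant (assumed layout)
n = a + b + c + d                  # 37

po = (a + d) / n                                         # observed agreement, ~.84
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2    # chance-expected agreement, ~.512
kappa = (po - pe) / (1 - pe)                             # ~.67

# McNemar chi-square on the discordant cells, with continuity correction
x2_mcnemar = (abs(b - c) - 1) ** 2 / (b + c)             # 1.5
print(round(kappa, 2), x2_mcnemar)
```

The chi-square of 1.5 falls well below 3.84, matching the conclusion that the bias index is not stochastically significant.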
Chapter 21
21.1. If we start with 10,000 school-aged children, of whom 4% are physically abused, the fourfold table will show
| Physical Exam | Confirmed Abused | Confirmed Not Abused | Total |
|---------------|------------------|----------------------|-------|
| Positive      | 384              | 768                  | 1152  |
| Negative      | 16               | 8832                 | 8848  |
| TOTAL         | 400              | 9600                 | 10000 |
Of the 400 abused children, 96% (384) are detected by the physical exam; of the 9600 nonabused, 8% (768) have false-positive exams. Consequently, nosologic sensitivity = 96% (384/400), nosologic specificity = 92% (8832/9600), and diagnostic sensitivity (positive predictive accuracy) = 33% (384/1152).
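The table and indexes can be regenerated directly from the exercise’s assumptions; this sketch uses the same quantities as the text.

```python
# Rebuilding the fourfold table of Exercise 21.1 from its stated assumptions:
# N = 10,000 children, 4% abused, sensitivity 96%, false-positive rate 8%
N = 10_000
abused = round(0.04 * N)          # 400
not_abused = N - abused           # 9600

tp = round(0.96 * abused)         # 384 true positives
fn = abused - tp                  # 16 false negatives
fp = round(0.08 * not_abused)     # 768 false positives
tn = not_abused - fp              # 8832 true negatives

sensitivity = tp / abused         # nosologic sensitivity, 0.96
specificity = tn / not_abused     # nosologic specificity, 0.92
ppv = tp / (tp + fp)              # diagnostic sensitivity (positive predictive accuracy), ~0.33
print(sensitivity, specificity, round(ppv, 2))
```

The low positive predictive accuracy, despite excellent sensitivity and specificity, reflects the low 4% prevalence.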
21.3. Results from nomogram table show:
| Value of +LR | Value of P(D) | Value of P(D|T) |
|--------------|---------------|-----------------|
| 10           | 2%            | 17%             |
| 10           | 20%           | 75%             |
| 5            | 2%            | 8%              |
| 5            | 20%           | 53%             |
| 20           | 2%            | 35%             |
| 20           | 20%           | 83%             |
Usefulness of the test cannot be determined without knowing its negative predictive accuracy. The nomogram technique, however, does not seem easy to use, and many clinicians might prefer a direct calculation if a suitable program were available to ease the work. For example, recall that posterior odds = prior odds × LR. For P(D) = .02, prior odds = .02/.98 = .0196. For LR = 10, posterior odds = 10 × .0196 = .196. Because probability = odds/(1 + odds), posterior probability = .196/(1 + .196) = .16. This is close to the 17% estimated from the nomogram. For a working formula, convert P(D) to prior odds of P(D)/[1 − P(D)]; multiply by LR to yield the posterior odds; and convert the latter to a posterior probability as LR[P(D)/{1 − P(D)}]/(1 + LR[P(D)/{1 − P(D)}]). This algebra simplifies to P(D|T) = LR[P(D)]/{1 − P(D) + LR[P(D)]}. Thus, for LR = 20 and P(D) = 20%, P(D|T) = [20(.20)]/[1 − .2 + (20)(.20)] = .83, as shown in the foregoing table.
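The working formula can be packaged as a small helper (the function name is ours). The exact results differ slightly from the values read off the nomogram, which reinforces the point that a direct calculation is preferable.

```python
def posterior_probability(lr, p_d):
    """P(D|T) = LR*P(D) / (1 - P(D) + LR*P(D)), from posterior odds = prior odds * LR."""
    prior_odds = p_d / (1 - p_d)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Exact values for the six rows of the Exercise 21.3 table
for lr, p in [(10, .02), (10, .20), (5, .02), (5, .20), (20, .02), (20, .20)]:
    print(lr, p, round(posterior_probability(lr, p), 2))
```

For LR = 10 and P(D) = .02 this prints ≈ .17, agreeing with the direct calculation in the text.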
21.5. Prevalence depends on total group size, N, which is not used when likelihood ratios are calculated from the components T and S in N = T + S.
21.7. Individual answers. 21.9. Individual answers.
Chapter 22
22.1.
22.1.1. Calculations for fixed-interval method:
| Interval | Alive at Beginning | Died during Interval | Lost to Follow-Up | Withdrawn Alive | Adjusted Denominator | Proportion Dying | Proportion Surviving | Cumulative Survival Rate |
|----------|--------------------|----------------------|-------------------|-----------------|----------------------|------------------|----------------------|--------------------------|
| 0–1      | 126                | 47                   | 4                 | 15              | 116.5                | 0.403            | 0.597                | 0.597                    |
| 1–2      | 60                 | 5                    | 6                 | 11              | 51.5                 | 0.097            | 0.903                | 0.539                    |
| 2–3      | 38                 | 2                    | 0                 | 15              | 30.5                 | 0.066            | 0.934                | 0.503                    |
| 3–4      | 21                 | 2                    | 2                 | 7               | 16.5                 | 0.121            | 0.879                | 0.443                    |
| 4–5      | 10                 | 0                    | 0                 | 6               | 7.0                  | 0.000            | 1.000                | 0.443                    |
22.1.2. Calculations for direct method:
| Interval | Censored People Removed | Cumulative Mortality Rate | Cumulative Survival Rate |
|----------|-------------------------|---------------------------|--------------------------|
| 0–1      | 19                      | 47/107 = 0.439            | 0.561                    |
| 1–2      | 17                      | 52/90 = 0.578             | 0.422                    |
| 2–3      | 15                      | 54/75 = 0.720             | 0.280                    |
| 3–4      | 9                       | 56/66 = 0.848             | 0.151                    |
| 4–5      | 6                       | 56/60 = 0.933             | 0.067                    |
22.1.3. The distinctions clearly illustrate the differences between analyses in which the censored people do or do not contribute to the intervening denominators. At the end of the fifth year of follow-up in this cohort of 126 people, we clinically know only two things for sure: 56 people have died and 60 people have actually been followed for at least 5 years. Everyone else in the cohort is either lost or in the stage of censored suspended animation called withdrawn alive. Assuming that the 12 (= 4 + 6 + 0 + 2) lost people are irretrievably lost, we still have 54 people (= 15 + 11 + 15 + 7 + 6) who are percolating through the follow-up process at various stages of continuing observation before 5 years. The cumulative survival rate of 0.067 in the direct method is based on the two clearly known items of clinical information at 5 years. The cumulative survival rate of 0.443 in the fixed-interval method is based on the interval contributions made by all the censored people. In fact, if you take 33 as half the number of the 66 (= 12 + 54) censored people and add 33 to the direct numerator and denominator at 5 years, the cumulative survival becomes 0.398; and the two methods agree more closely.
Rather than dispute the merits of the two methods, we might focus our intellectual energy on the basic scientific policy here. Why are 5-year survival rates being reported at all when the 5-year status is yet to be determined for almost as many people (54) as the 60 for whom the status is known? Why not restrict the results to what was known at a 1-year or at most a 2-year period, for which the fate of the unknown group has much less of an impact? A clinician predicting a patient’s 5-year chances of survival from these data might be excessively cautious in giving the direct estimate of 0.067. On the other hand, the clinician would have a hard time pragmatically defending an actuarial estimate of 0.443, when the only real facts are that four people of a potential 60 have actually survived for 5 years.
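Both calculations above can be reproduced from the same interval counts; this sketch uses the figures from the two tables for Exercise 22.1.

```python
# Each tuple: (alive at start, died, lost to follow-up, withdrawn alive)
intervals = [
    (126, 47, 4, 15),
    (60, 5, 6, 11),
    (38, 2, 0, 15),
    (21, 2, 2, 7),
    (10, 0, 0, 6),
]

# Fixed-interval (actuarial) method: each censored person counts as half an interval
cum_survival = 1.0
for alive, died, lost, withdrawn in intervals:
    adjusted = alive - (lost + withdrawn) / 2
    cum_survival *= 1 - died / adjusted
print(round(cum_survival, 3))     # ~0.44 at 5 years (the table shows 0.443)

# Direct method: censored people are removed entirely from the denominator
deaths, denom = 0, 126
for alive, died, lost, withdrawn in intervals:
    deaths += died
    denom -= lost + withdrawn
print(deaths, denom, round(1 - deaths / denom, 3))   # 56 60 0.067
```

The gap between 0.44 and 0.067 is exactly the contrast discussed in 22.1.3: it comes entirely from how the 66 censored people enter the denominators.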
22.3. See Figure EA.22.3. 22.5. Individual answers.
FIGURE EA.22.3
Graph showing the answer to Exercise 22.3. The Kaplan-Meier “curve” is shown as a step-function, without vertical connections, alongside the Berkson-Gage curve. [Figure not reproduced: Y-axis, cumulative percent survival from 0 to 100; X-axis, years of follow-up from 0.5 to 5.0.]
Chapter 23
23.1. In 1936, telephones were not yet relatively inexpensive and ubiquitous in the U.S. Owners of telephones were particularly likely to be in higher socioeconomic classes and, therefore, Republicans. In addition, readers of the “upscale” literary magazine were likely to be college graduates, a status which, in 1936, was also commonly held by Republicans. The use of these two “sampling frames,” despite the random selection of telephone owners, produced an overwhelmingly Republican response.
23.3.
23.3.1. The authors did not indicate why they made no recommendations. Perhaps they were worried about “β error,” because the confidence interval around the observed result goes from −.120 to +.072. [Observed mortality rates were 15/100 = .150 for propranolol and 12/95 = .126 for placebo, so the result was lower in placebo by .126 − .150 = −.024. The standard error under H0 is √{(27)(168)/[195(100)(95)]} = .049, and the two-tailed 95% C.I. would be −.024 ± (1.96)(.049) = −.024 ± .096, which extends from −.120 to +.072.] Thus, the result might be regarded as a stochastic variation for the possibility that propranolol was 7.2% better. The authors also state that their population was unusually “mild” in prognosis. [Perhaps propranolol works better in patients with worse prognoses.]
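The bracketed confidence-interval arithmetic can be checked with a few lines; the exact limits differ from the text’s by a trivial rounding of the standard error.

```python
import math

# 95% CI for the placebo-minus-propranolol mortality difference (Exercise 23.3.1),
# using the pooled standard error under H0 as in the text
deaths_a, n_a = 15, 100    # propranolol
deaths_b, n_b = 12, 95     # placebo

do = deaths_b / n_b - deaths_a / n_a                 # observed difference, ~ -.024
p = (deaths_a + deaths_b) / (n_a + n_b)              # pooled rate, 27/195
se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))    # ~ .049
lo, hi = do - 1.96 * se, do + 1.96 * se
print(round(lo, 3), round(hi, 3))                    # -0.121 0.073
```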
23.3.2. The most direct, simple approach is as follows. If mortality rate with propranolol is 10% below that of placebo, the true propranolol mortality rate is 12.6% − 10% = 2.6%. In the 100 patients treated with propranolol, we would expect 2.6 deaths and 97.4 survivors. Applying a chi-square goodness-of-fit test to this “model,” we would have
X² = (15 − 2.6)²/2.6 + (85 − 97.4)²/97.4 = 59.14 + 1.58 = 60.72, and P < .001.
The chance is therefore tiny that the observed results arose by chance from such a population. In a more “conventional” way, we can use the formula
ZH = (δ − do)/√{[pA qA/nA] + [pB qB/nB]},
where δ − do would be .10 − (−.024) = .124, and
√{[(.15)(.85)/100] + [(12/95)(83/95)/95]} = .04936,
so that ZH = .124/.04936 = 2.51, P < .025. We can reject the alternative hypothesis that propranolol mortality was 10% below that of placebo.
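Both tests in this answer are sketched below, with the same quantities as the text.

```python
import math

# Exercise 23.3.2: testing the alternative hypothesis that propranolol
# mortality is 10% below placebo (delta = .10)
p_a, n_a = 15 / 100, 100   # propranolol
p_b, n_b = 12 / 95, 95     # placebo

# Goodness-of-fit test against the 2.6 deaths expected under the alternative
x2 = (15 - 2.6) ** 2 / 2.6 + (85 - 97.4) ** 2 / 97.4
print(round(x2, 2))        # 60.72

# Z test using each group's own variance
delta, do = 0.10, p_b - p_a
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z_h = (delta - do) / se
print(round(z_h, 2))       # 2.51: the alternative hypothesis can be rejected
```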
23.5. Figure E.23.5 shows that the values of do (labeled PC − PT) ranged from −.20 to +.20, with negative values favoring treatment. Without knowing the actual values of PC and PT, we cannot decide how large δ should be for quantitative significance. Most investigators would probably accept do = .10 as quantitatively significant, regardless of the magnitudes of PC and PT. Using the principle that δ = θPC and letting θ = .5, we would have δ ≥ .1 for PC ≥ .2, δ ≥ .05 for PC ≥ .1, and δ ≥ .025 for PC ≥ .05. With
these boundaries, most of the “positive” trials that favored treatment in Figure E.23.5 seemed to have do > δ and certainly had upper confidence limits that exceeded δ, although the lower confidence limit included 0. Accordingly, the main defect in these trials was capacity, not power. A recommendation to use the Neyman-Pearson approach would needlessly inflate the sample sizes. Conversely, the top eight negative trials, favoring the control group, would seem to have their quantitative “nonsignificance” confirmed by upper confidence limits that were < δ. The other “negative” trials in Figure E.23.5 seemed to have upper confidence limits that included δ and lower limits that included 0. The investigator would thus have to concede both the original null and the alternative hypothesis. An appropriate sample size for the latter trials would depend on which hypothesis the investigator wanted to reject, i.e., on whether the aim was to confirm a big or a small difference.
Thus, although a Neyman-Pearson calculation would probably have been unnecessary for any of the cited trials, Freiman et al. recommended that the “determination of sample size” should be done as cited in two references (9, 15) that presented the Neyman-Pearson approach.
Chapter 24
24.1. For the big difference, set Zα at 1.645 for a one-tailed P < .05. With πA = .08 and πB = .10, π = .09, and so
n ≥ [1/(.02)²][1.645]²[2(.09)(.91)] = 1068,
and 2n would be 2136.
For the tiny difference, Zα can also be set at 1.645 for a one-tailed P < .05. The calculation will be
n ≥ [1/(.02 − .005)²][1.645]²[(.08)(.92) + (.10)(.90)] = 1968,
and 2n would be 3935, which is larger than the 2958 calculated previously in Exercise 23.2.
The main reason why the sample size for the big difference did not become much smaller than before is the shift from δ = .03 to δ = .02. These values are quite small for a “big” difference, and the “gain” of having only one Zα term in the numerator of the calculation is offset by the loss of having to divide by the square of a smaller value of δ.
24.3. Using the data employed in Exercise 14.4, we can assume that the rate of total myocardial infarction is .02 per person in the placebo group and will be reduced by 25% to a value of .015 in the aspirin group, so that δ = .005. The anticipated value for π will be (.02 + .015)/2 = .0175. If we choose a two-tailed α = .05, so that Zα = 1.96, and a one-tailed β = .2, so that Zβ = 0.84, the Neyman-Pearson calculation becomes
n ≥ [1/(.005)²][1.96 + 0.84]²[2(.0175)(.9825)] = 10783.9.
This result is close to the actual sample size of about 11,000 in each group.
According to the original grant request, which was kindly sent to me by Dr. Charles Hennekens, the investigators expected that aspirin would produce “a 20% decrease in cardiovascular mortality” and a “10% decrease in total mortality” during “the proposed study period of four and one-half years.” The expected 4½-yr CVD mortality among male physicians was estimated (from population data and the anticipated “age distribution of respondents”) as .048746. With a 20% relative reduction, δ = .009749 and the aspirin CV mortality would be .038997. The estimate for π would be (.048746 + .038997)/2 = .043871. The α level was a one-tailed .05, so that Zα = 1.645, and β was also a one-tailed .05, so that Zβ = 1.645. With these assumptions, n ≥ [1/(.009749)²][1.645 + 1.645]²[2(.043871)(.956129)] = 9554.2. The increase to about 11,000 per group was presumably done for “insurance.”
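Both Neyman-Pearson calculations in this answer can be reproduced with one helper function (the function name is ours; π is taken as the mean of the two anticipated rates, as in the text).

```python
def sample_size_per_group(delta, pi_a, pi_b, z_alpha, z_beta):
    """Neyman-Pearson estimate: n >= (Za + Zb)^2 * [2 * pi * (1 - pi)] / delta^2."""
    pi = (pi_a + pi_b) / 2
    return (z_alpha + z_beta) ** 2 * 2 * pi * (1 - pi) / delta ** 2

# Exercise 24.3 as calculated above: two-tailed alpha = .05, one-tailed beta = .2
n_mi = sample_size_per_group(0.005, 0.02, 0.015, 1.96, 0.84)               # ~10784

# Grant-request version: alpha and beta both one-tailed .05 (Z = 1.645)
n_cvd = sample_size_per_group(0.009749, 0.048746, 0.038997, 1.645, 1.645)  # ~9554
print(round(n_mi, 1), round(n_cvd, 1))
```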
An important thing to note in the published “final report” of the U.S. study (see Exercise 10.1) is that neither of the planned reductions was obtained. The total cardiovascular death rate and the total death rate in the two groups were almost the same (with relative risks of .96 and .96, respectively). The significant reduction in myocardial infarction (which had not been previously designated as a principal endpoint) was the main finding that led to premature termination of the trial.
Chapter 25
25.1. Individual answers.
25.3. The stochastic calculations rely on the idea that each comparison is “independent.” Thus, if treatments A, B, and C are each given independently and if the effects of the agents are not interrelated, the three comparisons are independent for A vs. B, A vs. C, and B vs. C. In a person’s body, however, the values of different chemical constituents are usually interrelated by underlying “homeostatic” mechanisms. Consequently, the different chemical values are not independent and cannot be evaluated with the [1 − (1 − α)^k] formula used for independent stochastic comparisons. Another possible (but less likely) explanation is that the originally designated normal zones were expanded, after a plethora of “false positives” were found, to encompass a much larger range that would include more values in healthy people.
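For independent tests, the formula grows quickly with k, which is why a multi-test chemistry panel yields so many “abnormal” values even in healthy people. A quick illustration:

```python
# Probability of at least one false-positive result among k independent tests,
# each with a false-positive rate of alpha: 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 6, 12, 20):
    print(k, round(1 - (1 - alpha) ** k, 2))   # e.g. a 12-test panel gives ~0.46
```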
25.5. Individual answers.
Chapter 26
26.1. Stratum proportions in the total population are W1 = 101/252, W2 = 134/252, and W3 = 17/252. Direct adjusted rate for OPEN would be (W1)(6/64) + (W2)(10/59) + (W3)(1/3) = .1502. Analogous calculation for TURP yields .1438. [To avoid problems in rounding, the original fractions should be used for each calculation.]
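The bracketed advice about rounding can be followed literally by doing the adjustment in exact fractions. This sketch covers only the OPEN calculation; the TURP stratum rates are not reproduced in the text.

```python
from fractions import Fraction as F

# Direct adjustment for the OPEN group of Exercise 26.1, with exact fractions
weights = [F(101, 252), F(134, 252), F(17, 252)]   # stratum shares of total population
open_rates = [F(6, 64), F(10, 59), F(1, 3)]        # stratum-specific rates for OPEN

adjusted_open = sum(w * r for w, r in zip(weights, open_rates))
print(float(adjusted_open))    # ~0.1502
```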
26.3. Concato et al. used two different taxonomic classifications (Kaplan–Feinstein and Charlson et al.) for severity of co-morbidity, and three different statistical methods (standardization, composite-staging, and Cox regression) for adjusting the “crude” results. With each of the six approaches, the adjusted risk ratio was about 1. Nevertheless, despite the agreement found with these diverse methods, the relatively small groups in the Concato study led to 95% confidence intervals that reached substantially higher (and lower) values around the adjusted ratios. For example, with the Cox regression method in the Concato study, the 95% confidence limits for the adjusted risk ratio of 0.91 extended from 0.47 to 1.75. The original investigators contended that this wide spread made the Concato result unreliable, and besides, it included the higher RR values, ranging from 1.27 to 1.45, that had been found in the previous claims-data study.
26.5.
26.5.1.No. The crude death rate is “standardized” only by dividing the numerator deaths by the denominator population.
26.5.2.(1) Guyana rates may be truly lower in each age-specific stratum; (2) many of the deaths in Guyana may not be officially reported; (3) Guyana rates may be the same as or higher than U.S. rates in each age-specific stratum, but Guyana may have a much younger population.
26.5.3.The third conjecture in 26.5.2 is supported by the additional data. Despite the high infant mortality, if we subtract infant deaths from births, we find that Guyana increments its youth at a much higher rate than the U.S. The annual birth increment per thousand population is 38.1(1 − .0383) = 36.6 for Guyana and 18.2(1 − .0198) = 17.8 for the U.S.
26.7. Individual answers.
Chapter 27
27.1.
27.1.1. By ordinary linear regression for the two variables in Figure 27.2. The tactic seems inappropriate. If a Pearson correlation coefficient is not justified, the results do not warrant use of an ordinary regression model.
27.1.2. No. Duration of survival, which is the outcome (dependent) variable, has been put on the X rather than Y axis. Furthermore, the log scale makes values of interleukin < .1 look as though they were zero. If the X-axis data are divided essentially at tertiles and the Y data dichotomously at an interleukin level of 1, the tabular data that match the graph become
| Duration of Survival | Proportion of Interleukin Levels ≥ 1 |
|----------------------|--------------------------------------|
| < 4                  | 10/11 (91%)                          |
| 4–13                 | 4/11 (36%)                           |
| ≥ 14                 | 5/8 (63%)                            |
The declining trend is not at all as constant as implied by “r = −.51.” In fact, the trend is reversed in the high survival group.
A better way to show the results would be to orient the data properly and to divide interleukin roughly at the tertiles to get
| Interleukin Level | Proportion Who Survived < 10 Days |
|-------------------|-----------------------------------|
| < 1               | 6/11 (55%)                        |
| < 10              | 5/10 (50%)                        |
| ≥ 10              | 7/9 (78%)                         |
This result is also inconsistent with the constantly declining trend implied by the value of r = –.51 for interleukin vs. survival in the graph.
27.1.3. The graph says “N = 33,” but only 30 points are shown — as discovered in 27.1.2. The legend does not mention any hidden or overlapping points. What happened to the missing 3 people?
27.3.
27.3.1.System B seems preferable. It has a larger overall gradient (72 − 10 = 62% vs. 57 − 6 = 51%) and patients seem more equally distributed in the three stages.
27.3.2.Coding the three categories as −1, 0, +1, we can calculate the slope with Formula [27.19]. For System A, the numerator is 221(3 − 70) − 85(53 − 122) = −14807 + 5865 = −8942. For System B, the numerator is 221(9 − 36) − 85(86 − 50) = −5967 − 3060 = −9027. The denominators are 221(122 + 53) − (122 − 53)² = 33914 for System A and 221(50 + 86) − (86 − 50)² = 28760 for System B. The slope is −8942/33914 = −.264 for A and −9027/28760 = −.314 for B. Note that these results are consistent with the “judgmental” evaluation (in AE27.3.1) that the average gradient = .51/2 = .255 in A and .62/2 = .31 in B. X²L can be calculated, as discussed in Section 27.5.5.2 (just after Equation [27.19]), from (numerator of slope)(slope)/NPQ, where NPQ = (221)(85)(221 − 85)/(221)² = (85)(136)/221 = 52.307. In System A, X²L = (−8942)(−.264)/52.307 = 45.13. In System B, X²L = (−9027)(−.314)/52.307 = 54.44. The result is highly stochastically significant in both systems, but is bigger in System B.
Note also that the overall X² = 45.51 in System A, so that the residual X²R = X² − X²L = 45.51 − 45.13 = 0.38. In System B, the overall X² is 54.89, and the residual is 54.89 − 54.44 = 0.45.
27.3.3.The overall gradient (51% vs. 62%) has already been cited. Another index here could be lambda. Total errors would be 85. With System A, errors are (122 − 70) + 12 + 3 = 67. With System B, errors are (50 − 36) + 40 + 9 = 63. Thus, lambda is (85 − 67)/85 = .21 for System A and (85 − 63)/85 = .26 for System B. The overall chi-square is 45.5 for System A and 54.9 for System B. Finally, to quantify the hunch about “better” distribution, we can use Shannon’s H (recall Section 5.10.2). The proportions of data in each category of the denominator are .552, .208, and .240 in System A, and .226, .385, and .389 in System B. Shannon’s H is .433 for System A and .465 for System B. Thus, System B gets better scores with all four methods.
27.3.4.In each system, each stratum has “stable” numbers, the gradient is monotonic, and the gradient is relatively “evenly” distributed (31% and 20% in System A; 25% and 37% in System B). Therefore, the overall gradients of 51% vs. 62% are probably the best simple comparison. If the strata had unstable numbers and/or dramatically uneven gradients, the best summary would probably be the linear slope.
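The slope and X² for linear trend can be sketched as below. The stratum death counts and sizes are inferred from the arithmetic in 27.3.2 (System A: 70/122, 12/46, 3/53; System B: 36/50, 40/85, 9/86), and the results match the text’s values to within rounding of the slope.

```python
# Chi-square test for linear trend, in the style of Formula [27.19],
# with ordinal codes -1, 0, +1 and binary outcome counts per stratum
def linear_trend(deaths, sizes, codes=(-1, 0, 1)):
    N, T = sum(sizes), sum(deaths)
    sxn = sum(x * n for x, n in zip(codes, sizes))
    numer = N * sum(x * t for x, t in zip(codes, deaths)) - T * sxn
    denom = N * sum(x * x * n for x, n in zip(codes, sizes)) - sxn ** 2
    slope = numer / denom
    npq = T * (N - T) / N
    return slope, numer * slope / npq    # (slope, X2 for linear trend)

print(linear_trend([70, 12, 3], [122, 46, 53]))   # System A: slope ~ -.264, X2 ~ 45
print(linear_trend([36, 40, 9], [50, 85, 86]))    # System B: slope ~ -.314, X2 ~ 54
```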