Principles of Medical Statistics (Feinstein, 2002)

19.7.

19.7.1.The graph has too much scatter for the relationship to be biologically meaningful; also r is too low.

19.7.2.Each regression line seems to intersect the Y-axis at the stated intercepts of ~ 130, ~ 48, and ~ 120.

Chapter 20

20.1.

TABLE AE.20.1
Answer to Exercise 20.1.1

                Increment between                  Mean of Two      Incremental Difference between
                1st and 2nd Reading                First Readings   First Wright and First Mini-Wright
Subject         Wright          Mini-Wright
1               4               13                 503              18
2               2               15                 412.5            35
3               4               12                 518              4
4               33              16                 431              6
5               6               0                  488              24
6               54              25                 578.5            43
7               2               96                 388.5            49
8               11              10                 411              62
9               12              16                 654              8
10              4               13                 439              12
11              3               12                 424.5            15
12              23              21                 641              30
13              8               33                 263.5            7
14              14              10                 477.5            1
15              13              9                  218.5            81
16              51              20                 386.5            73
17              6               8                  439              24

Σ X²            7966            13479                               Σ d² = 24120
Σ X²/17         468.59          792.88                              d̄ = 1.88
√(Σ X²/17)      21.64           28.16                               √(Σ d²/17) = √(24120/17) = 37.67

(d̄ is the mean of the signed incremental differences.)

20.1.1. (See Table AE.20.1.) The Mini-Wright seems more inconsistent. The root mean square of its increment between the two readings is 28.2, vs. 21.6 for the Wright.

20.1.2.(a) No, r shows good correlation, but not necessarily good agreement, and some points seem substantially discrepant from the line of “unity.”

(b) Check the root mean square of the incremental differences between the two instruments. Table AE.20.1 shows it is 37.67, larger than either within-instrument value. Thus, the two sets of readings agree better with themselves than with each other.

(c) Check the mean increment. It is 1.88 (see Table AE.20.1) and thus not very biased. Check a plot of increments vs. average magnitudes. A suitable graph will show that no major pattern is evident.
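The root mean square values in the table can be rechecked with a few lines of Python. This is a quick sketch, with the increments transcribed from Table AE.20.1 (as magnitudes only, since the signs of the incremental differences are not recoverable from this copy):

```python
from math import sqrt

# Within-instrument increments between 1st and 2nd readings (Table AE.20.1)
wright = [4, 2, 4, 33, 6, 54, 2, 11, 12, 4, 3, 23, 8, 14, 13, 51, 6]
mini_wright = [13, 15, 12, 16, 0, 25, 96, 10, 16, 13, 12, 21, 33, 10, 9, 20, 8]
# Incremental differences between first Wright and first Mini-Wright readings
# (magnitudes only; the signs are lost in this copy of the table)
diffs = [18, 35, 4, 6, 24, 43, 49, 62, 8, 12, 15, 30, 7, 1, 81, 73, 24]

def rms(values):
    """Root mean square: sqrt(sum of squares / n)."""
    return sqrt(sum(v * v for v in values) / len(values))

print(sum(v * v for v in wright))      # sum of squares, Wright
print(round(rms(wright), 2))           # ~21.6
print(round(rms(mini_wright), 2))      # ~28.2
print(round(rms(diffs), 2))            # ~37.7
```

The larger root mean square for the Mini-Wright (28.2 vs. 21.6) is what marks it as the more inconsistent instrument.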

20.3.

20.3.1. (NYHA IV) + (CCS IV) − BOTH = 51 + 50 − 11 = 90. Seems correct.

20.3.2.Wide disagreements in Groups I–IV and IV–I are disconcerting. They suggest that many patients without angina are disabled, and many patients with severe angina had no functional problems. This is either an odd clinical situation or a major disagreement in the two scales.

20.3.3.Not a useful table prognostically. It shows current status of patients after operation, but does not indicate pre-op status of those who lived or died. For predictions, we need to know outcome in relation to pre-op status.

©2002 by Chapman & Hall/CRC

20.5.

20.5.1.

(a) Value of X² for “agreement” seems to be [(19 × 12) − (5 × 1)]²(37)/(24 × 13 × 20 × 17) = 17.345, which is an ordinary chi-square calculation. This is an inappropriate approach to these data.

(b) Value of X² for “change” seems to be (5 − 1)²/(5 + 1) = 16/6 = 2.67. This is the McNemar formula and would be appropriate for indexing disagreement, but these small numbers probably need a correction factor, so that X²M = (|5 − 1| − 1)²/6 = 9/6 = 1.5, not 2.67.

(c) Better citations would be po = (19 + 12)/37 = 31/37 = .84, or kappa. Because pe = [(20 × 24) + (13 × 17)]/37² = .512, kappa = (.84 − .512)/(1 − .512) = .672.

20.5.2. The McNemar index of bias here is (5 − 1)/(5 + 1) = 4/6 = .67, suggesting that the therapists are more likely to make “satisfactory” ratings than the patients. Because of the small numbers, this index would have to be checked stochastically. The McNemar chi-square values in 20.5.1.(b), however, are too small (even without the correction factor) to exceed the boundary of 3.84 (= 1.96²) needed for stochastic significance of X² at 1 d.f.
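These indexes can be verified from the four cells of the agreement table. The cell values below (a = 19, b = 5, c = 1, d = 12) are inferred from the marginal totals quoted in the answer, so treat them as an assumption; note that kappa computed exactly is about .67, and the .672 in the text comes from rounding po and pe first.

```python
# 2x2 agreement table (cells inferred from the marginals 24, 13, 20, 17 and N = 37)
a, b, c, d = 19, 5, 1, 12          # a, d = agreements; b, c = disagreements
n = a + b + c + d                  # 37

p_o = (a + d) / n                                          # observed agreement
p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2     # chance-expected agreement
kappa = (p_o - p_e) / (1 - p_e)

mcnemar = (b - c) ** 2 / (b + c)                     # uncorrected: 16/6 = 2.67
mcnemar_corrected = (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected: 9/6 = 1.5
bias = (b - c) / (b + c)                             # McNemar index of bias: 4/6 = .67
```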

Chapter 21

21.1. If we start with 10,000 school-aged children, of whom 4% are physically abused, the fourfold table will show

 

 

                     Confirmed Condition
Physical Exam        Abused      Not Abused      Total
Positive             384         768             1152
Negative             16          8832            8848
TOTAL                400         9600            10000

Of the 400 abused children, 96% (384) are detected by the physical exam; of the 9600 nonabused, 8% (768) have false-positive exams. Consequently, nosologic sensitivity = 96% (384/400), nosologic specificity = 92% (8832/9600), and diagnostic sensitivity (positive predictive accuracy) = 33% (384/1152).
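A minimal sketch of the three indexes, taken directly from the fourfold table:

```python
# Fourfold table for 10,000 children with 4% prevalence of abuse:
# 96% detection among the abused, 8% false positives among the non-abused
tp, fp = 384, 768        # positive exam: abused / not abused
fn, tn = 16, 8832        # negative exam: abused / not abused

nosologic_sensitivity = tp / (tp + fn)          # 384/400  = .96
nosologic_specificity = tn / (tn + fp)          # 8832/9600 = .92
positive_predictive_accuracy = tp / (tp + fp)   # 384/1152 = .33

print(nosologic_sensitivity, nosologic_specificity, positive_predictive_accuracy)
```

The drop from 96% sensitivity to 33% positive predictive accuracy is the usual consequence of a low (4%) prevalence.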

21.3. Results from nomogram table show:

Value of +LR     Value of P(D)     Value of P(D|T)
10               2%                17%
10               20%               75%
5                2%                8%
5                20%               53%
20               2%                35%
20               20%               83%

Usefulness of the test cannot be determined without knowing its negative predictive accuracy. The nomogram technique, however, does not seem easy to use, and many clinicians might prefer to do a direct calculation, if a suitable program were available to ease the work. For example, recall that posterior odds = prior odds × LR. For P(D) = .02, prior odds = .02/.98 = .0196. For LR = 10, posterior odds = 10 × .0196 = .196. Because probability = odds/(1 + odds), posterior probability = .196/(1 + .196) = .16. This is close to the 17% estimated from the nomogram. For a working formula, convert P(D) to prior odds of P(D)/[1 − P(D)]; multiply by LR to yield the posterior odds; and convert the latter to a posterior probability as LR[P(D)/{1 − P(D)}]/(1 + LR[P(D)/{1 − P(D)}]). The algebra simplifies to P(D|T) = LR[P(D)]/{1 − P(D) + LR[P(D)]}. Thus, for LR = 20 and P(D) = 20%, P(D|T) = [20(.20)]/[1 − .2 + (20)(.20)] = .83, as shown in the foregoing table.


21.5. Prevalence depends on total group size, N, which is not used when likelihood ratios are calculated from the components T and S in N = T + S.

21.7. Individual answers. 21.9. Individual answers.

Chapter 22

22.1.

22.1.1. Calculations for fixed-interval method:

 

 

           Alive at    Died during  Lost to    Withdrawn  Adjusted     Proportion  Proportion  Cumulative
Interval   Beginning   Interval     Follow-Up  Alive      Denominator  Dying       Surviving   Survival Rate
0–1        126         47           4          15         116.5        0.403       0.597       0.597
1–2        60          5            6          11         51.5         0.097       0.903       0.539
2–3        38          2            0          15         30.5         0.066       0.934       0.503
3–4        21          2            2          7          16.5         0.121       0.879       0.443
4–5        10          0            0          6          7.0          0.000       1.000       0.443

22.1.2. Calculations for direct method:

 

            Censored People     Cumulative          Cumulative
Interval    Removed             Mortality Rate      Survival Rate
0–1         19                  47/107 = 0.439      0.561
1–2         17                  52/90 = 0.578       0.422
2–3         15                  54/75 = 0.720       0.280
3–4         9                   56/66 = 0.848       0.151
4–5         6                   56/60 = 0.933       0.067

22.1.3. The distinctions clearly illustrate the differences between analyses in which the censored people do or do not contribute to the intervening denominators. At the end of the fifth year of follow-up in this cohort of 126 people, we clinically know only two things for sure: 56 people have died and 60 people have actually been followed for at least 5 years. Everyone else in the cohort is either lost or in the stage of censored suspended animation called withdrawn alive. Assuming that the 12 (= 4 + 6 + 0 + 2) lost people are irretrievably lost, we still have 54 people (= 15 + 11 + 15 + 7 + 6) who are percolating through the follow-up process at various stages of continuing observation before 5 years. The cumulative survival rate of 0.067 in the direct method is based on the two clearly known items of clinical information at 5 years. The cumulative survival rate of 0.443 in the fixed-interval method is based on the interval contributions made by all the censored people. In fact, if you take 33 as half the number of the 66 (= 12 + 54) censored people and add 33 to the direct numerator and denominator at 5 years, the cumulative survival becomes 0.398; and the two methods agree more closely.
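As a check on both tables in 22.1.1 and 22.1.2, the two cumulative 5-year survival rates can be recomputed from the interval counts; a minimal sketch in Python:

```python
# Interval data for the cohort of 126 people (Exercise 22.1)
alive     = [126, 60, 38, 21, 10]   # alive at beginning of each interval
died      = [47, 5, 2, 2, 0]        # deaths during the interval
lost      = [4, 6, 0, 2, 0]         # lost to follow-up
withdrawn = [15, 11, 15, 7, 6]      # withdrawn alive (censored)

# Fixed-interval (actuarial) method: each censored person counts
# for half an interval in the adjusted denominator
cum = 1.0
for a, d, l, w in zip(alive, died, lost, withdrawn):
    adjusted = a - (l + w) / 2      # adjusted denominator, e.g. 126 - 19/2 = 116.5
    cum *= 1 - d / adjusted         # multiply by proportion surviving the interval
actuarial_5yr = cum                 # ~0.44

# Direct method: censored people are simply removed from the denominator
denominator = alive[0]
deaths = 0
for d, l, w in zip(died, lost, withdrawn):
    denominator -= l + w
    deaths += d
direct_5yr = 1 - deaths / denominator   # 1 - 56/60, ~0.07
```

The large gap between 0.44 and 0.07 is exactly the point made in 22.1.3: the two methods treat the 66 censored people very differently.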

Rather than dispute the merits of the two methods, we might focus our intellectual energy on the basic scientific policy here. Why are 5-year survival rates being reported at all when the 5-year status is yet to be determined for almost as many people (54) as the 60 for whom the status is known? Why not restrict the results to what was known at a 1-year or at most a 2-year period, for which the fate of the unknown group has much less of an impact? A clinician predicting a patient’s 5-year chances of survival from these data might be excessively cautious in giving the direct estimate of 0.067. On the other hand, the clinician would have a hard time pragmatically defending an actuarial estimate of 0.443, when the only real facts are that four people of a potential 60 have actually survived for 5 years.


22.3. See Figure EA.22.3. 22.5. Individual answers.

FIGURE EA.22.3

Graph showing answer to Exercise 22.3. The Kaplan-Meier “curve” is shown as a step-function, without vertical connections.

[Figure: Kaplan-Meier step-function and Berkson-Gage curve of cumulative survival (percent, 0–100) plotted against follow-up time (0.5–5.0 years).]

Chapter 23

23.1. In 1936, telephones were not yet relatively inexpensive and ubiquitous in the U.S. Owners of telephones were particularly likely to be in higher socioeconomic classes and, therefore, Republicans. In addition, readers of the “upscale” literary magazine were likely to be college graduates, a status which, in 1936, was also commonly held by Republicans. The use of these two “sampling frames,” despite the random selection of telephone owners, produced an overwhelmingly Republican response.

23.3.

23.3.1. The authors did not indicate why they made no recommendations. Perhaps they were worried about “β error,” because the confidence interval around the observed result goes from −0.120 to +.072. [Observed mortality rates were 15/100 = .150 for propranolol and 12/95 = .126 for placebo, so the result was lower in placebo by .126 − .150 = −.024. The standard error under H0 is √{(27)(168)/[195(100)(95)]} = .049, and the two-tailed 95% C.I. would be −.024 ± (1.96)(.049) = −.024 ± .096, which extends from −0.120 to +.072.] Thus, the result might be regarded as a stochastic variation for the possibility that propranolol was 7.2% better. The authors also state that their population was unusually “mild” in prognosis. [Perhaps propranolol works better in patients with worse prognoses.]

23.3.2. The most direct, simple approach is as follows. If the mortality rate with propranolol is 10% below that of placebo, the true propranolol mortality rate is 12.6% − 10% = 2.6%. In the 100 patients treated with propranolol, we would expect 2.6 deaths and 97.4 survivors. Applying a chi-square goodness-of-fit test to this “model,” we would have

X² = (15 − 2.6)²/2.6 + (85 − 97.4)²/97.4 = 59.14 + 1.58 = 60.72, and P < .001

The chance is therefore tiny that the observed results arose by chance from such a population. In a more “conventional” way, we can use the formula

ZH = (δ − do)/√{[pAqA/nA] + [pBqB/nB]}

where δ − do would be .10 − (−.024) = .124, and

√{[(.15)(.85)/100] + [(12/95)(83/95)/95]} = .04936

so that ZH = .124/.04936 = 2.51, P < .025. We can reject the alternative hypothesis that propranolol mortality was 10% below that of placebo.
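Both stochastic calculations in 23.3.2 are easy to verify numerically; a sketch:

```python
from math import sqrt

# Goodness-of-fit test: expect 2.6 deaths and 97.4 survivors among 100
# propranolol patients if its mortality were really 10% below placebo's 12.6%
x2 = (15 - 2.6) ** 2 / 2.6 + (85 - 97.4) ** 2 / 97.4      # ~60.7

# Z test against the alternative hypothesis delta = .10
d_o = 12 / 95 - 15 / 100                                  # observed difference, ~ -.024
se = sqrt(.15 * .85 / 100 + (12 / 95) * (83 / 95) / 95)   # ~.049
z = (.10 - d_o) / se                                      # ~2.51
```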

23.5. Figure E.23.5 shows that the values of do (labeled PC − PT) ranged from −.20 to +.20, with negative values favoring treatment. Without knowing the actual values of PC and PT, we cannot decide how large δ should be for quantitative significance. Most investigators would probably accept do = .10 as quantitatively significant, regardless of the magnitudes of PC and PT. Using the principle that δ = θPC and letting θ = .5, we would have δ ≥ .1 for PC = .2, δ ≥ .05 for PC = .1, and δ ≥ .025 for PC = .05. With these boundaries, most of the “positive” trials that favored treatment in Figure E.23.5 seemed to have do > δ and certainly had upper confidence limits that exceeded δ, although the lower confidence interval included 0. Accordingly, the main defect in these trials was capacity, not power. A recommendation to use the Neyman-Pearson approach would needlessly inflate the sample sizes. Conversely, the top eight negative trials, favoring the control group, would seem to have their quantitative “nonsignificance” confirmed by upper confidence limits that were < δ. The other “negative” trials in Figure E.23.5 seemed to have upper confidence limits that included δ and lower limits that included 0. The investigator would thus have to concede both the original null and the alternative hypothesis. An appropriate sample size for the latter trials would depend on which hypothesis the investigator wanted to reject, i.e., to confirm a big or a small difference.

Thus, although a Neyman-Pearson calculation would probably have been unnecessary for any of the cited trials, Freiman et al. recommended that the “determination of sample size” should be done as cited in two references (9, 15) that presented the Neyman-Pearson approach.

Chapter 24

24.1. For the big difference, set Zα at 1.645 for a one-tailed Po < .05. With πA = .08 and πB = .10, π = .09, and so

n ≥ [1/(.02)²][1.645]²[2(.09)(.91)] = 1108

and 2n would be 2216.

For the tiny difference, Zα can also be set at 1.645 for a one-tailed PE < .05. The calculation will be

n ≥ [1/(.02 − .005)²][1.645]²[(.08)(.92) + (.10)(.90)] = 1968

and 2n would be 3935, which is larger than the 2958 calculated previously in Exercise 23.2.

The main reason why the sample size for the big difference did not become much smaller than before is the shift from δ = .03 to δ = .02. These values are quite small for a “big” difference, and the “gain” of having only one Zα term in the numerator of the calculation is offset by the loss of having to divide by the square of a smaller value of δ.
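A quick numeric check of the two formulas (note that the arithmetic for the “big” difference comes out at about 1108 per group):

```python
# Sample size per group for detecting a difference delta, one-tailed alpha = .05
z_alpha = 1.645

# "Big" difference: delta = .02, with pi_A = .08, pi_B = .10, mean pi = .09;
# a single Z term and a pooled variance 2*pi*q
n_big = (z_alpha ** 2) * 2 * .09 * .91 / .02 ** 2                     # ~1108

# "Tiny" difference: delta shrinks to .02 - .005 = .015, and each group
# keeps its own variance term
n_tiny = (z_alpha ** 2) * (.08 * .92 + .10 * .90) / (.02 - .005) ** 2  # ~1968

print(round(n_big), round(n_tiny))
```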

24.3. Using the data employed in Exercise 14.4, we can assume that the rate of total myocardial infarction is .02 per person in the placebo group, and will be reduced by 25% to a value of .015 in the aspirin group, so that δ = .005. The anticipated value for π will be (.02 + .015)/2 = .0175. If we choose a two-tailed α = .05, so that Zα = 1.96, and a one-tailed β = .2, so that Zβ = 0.84, the Neyman-Pearson calculation becomes

n ≥ [1/(.005)²][1.96 + 0.84]²[2(.0175)(.9825)] = 10783.9

This result is close to the actual sample size of about 11,000 in each group.

According to the original grant request, which was kindly sent to me by Dr. Charles Hennekens, the investigators expected that aspirin would produce “a 20% decrease in cardiovascular mortality” and a “10% decrease in total mortality” during “the proposed study period of four and one-half years.” The expected 4½-yr CVD mortality among male physicians was estimated (from population data and the anticipated “age distribution of respondents”) as .048746. With a 20% relative reduction, δ = .009749 and the aspirin CV mortality would be .038997. The estimate for π would be (.048746 + .038997)/2 = .043871. The α level was a one-tailed .05, so that Zα = 1.645, and β was also a one-tailed .05, so that Zβ = 1.645. With these assumptions, n ≥ [1/(.009749)²][1.645 + 1.645]²[2(.043871)(.956129)] = 9554.2. The increase to about 11,000 per group was presumably done for “insurance.”
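Both Neyman-Pearson calculations for the aspirin trial can be checked the same way; a sketch:

```python
# Exercise 24.3 calculation: two-tailed alpha = .05, one-tailed beta = .20
z_sum_sq = (1.96 + 0.84) ** 2
n_exercise = z_sum_sq * 2 * .0175 * .9825 / .005 ** 2            # ~10784 per group

# Grant-request calculation: one-tailed alpha = beta = .05
z_sum_sq_grant = (1.645 + 1.645) ** 2
n_grant = z_sum_sq_grant * 2 * .043871 * .956129 / .009749 ** 2  # ~9554 per group

print(round(n_exercise, 1), round(n_grant, 1))
```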

An important thing to note in the published “final report” of the U.S. study (see Exercise 10.1) is that neither of the planned reductions was obtained. The total cardiovascular death rate and the total death rate in the two groups were almost the same (with relative risks of .96 and .96, respectively). The significant reduction in myocardial infarction (which had not been previously designated as a principal endpoint) was the main finding that led to premature termination of the trial.


Chapter 25

25.1. Individual answers.

25.3. The stochastic calculations rely on the idea that each comparison is “independent.” Thus, if treatments A, B, and C are each given independently and if the effects of the agents are not interrelated, the three comparisons are independent for A vs. B, A vs. C, and B vs. C. In a person’s body, however, the values of different chemical constituents are usually interrelated by underlying “homeostatic” mechanisms. Consequently, the different chemical values are not independent and cannot be evaluated with the [1 − (1 − α)^k] formula used for independent stochastic comparisons. Another possible (but less likely) explanation is that the originally designated normal zones were expanded, after a plethora of “false positives” were found, to encompass a much larger range that would include more values in healthy people.
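For truly independent comparisons, the formula is easy to illustrate: the chance of at least one false-positive result grows quickly with the number of tests.

```python
alpha = 0.05
for k in (1, 5, 12, 20):
    # probability that at least one of k independent comparisons,
    # each tested at level alpha, is "positive" by chance alone
    p_any = 1 - (1 - alpha) ** k
    print(k, round(p_any, 3))
```

With k = 20 tests at α = .05, the family-wise false-positive probability is already about .64, which is why a multichannel chemical panel so often flags at least one “abnormal” value in a healthy person.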

25.5. Individual answers.

Chapter 26

26.1. Stratum proportions in the total population are W1 = 101/252, W2 = 134/252, and W3 = 17/252. Direct adjusted rate for OPEN would be (W1)(6/64) + (W2)(10/59) + (W3)(1/3) = .1502. Analogous calculation for TURP yields .1438. [To avoid problems in rounding, the original fractions should be used for each calculation.]
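The direct adjustment for the OPEN group can be reproduced exactly; the TURP stratum rates are not quoted in the answer, so only OPEN is computed here:

```python
# Stratum weights from the total population of 252
weights = [101 / 252, 134 / 252, 17 / 252]
# Stratum-specific rates for the OPEN group
open_rates = [6 / 64, 10 / 59, 1 / 3]

# Direct adjusted rate: weighted sum of stratum-specific rates,
# using the exact fractions to avoid rounding problems
adjusted_open = sum(w * r for w, r in zip(weights, open_rates))
print(round(adjusted_open, 4))   # 0.1502
```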

26.3. Concato et al. used two different taxonomic classifications (Kaplan–Feinstein and Charlson et al.) for severity of co-morbidity, and three different statistical methods (standardization, composite-staging, and Cox regression) for adjusting the “crude” results. With each of the six approaches, the adjusted risk ratio was about 1. Nevertheless, despite the agreement found with these diverse methods, the relatively small groups in the Concato study led to 95% confidence intervals that reached substantially higher (and lower) values around the adjusted ratios. For example, with the Cox regression method in the Concato study, the 95% confidence limits for the adjusted risk ratio of 0.91 extended from 0.47 to 1.75. The original investigators contended that this wide spread made the Concato result unreliable, and besides, it included the higher RR values, ranging from 1.27 to 1.45, that had been found in the previous claims-data study.

26.5.

26.5.1. No. The crude death rate is “standardized” only in the sense that the numerator of deaths is divided by the denominator population.

26.5.2. (1) Guyana rates may be truly lower in each age-specific stratum; (2) many of the deaths in Guyana may not be officially reported; (3) Guyana rates may be the same as or higher than U.S. rates in each age-specific stratum, but Guyana may have a much younger population.

26.5.3. The third conjecture in 26.5.2 is supported by the additional data. Despite the high infant mortality, if we subtract infant deaths from births, we find that Guyana increments its youth at a much higher rate than the U.S. The annual birth increment per thousand population is 38.1(1 − .0383) = 36.6 for Guyana and 18.2(1 − .0198) = 17.8 for the U.S.

26.7. Individual answers.

 

Chapter 27

27.1.

27.1.1. By ordinary linear regression for the two variables in Figure 27.2. The tactic seems inappropriate: if a Pearson correlation coefficient is not justified, the results do not warrant use of an ordinary regression model.


27.1.2. No. Duration of survival, which is the outcome (dependent) variable, has been put on the X rather than the Y axis. Furthermore, the log scale makes values of interleukin < .1 look as though they were zero. If the X-axis data are divided essentially at tertiles and the Y data dichotomously at an interleukin level of 1, the tabular data that match the graph become

Duration of     Proportion of
Survival        Interleukin Levels ≥ 1
< 4             10/11 (91%)
4–13            4/11 (36%)
≥ 14            5/8 (63%)

The declining trend is not at all as constant as implied by “r = −.51.” In fact, the trend is reversed in the high survival group.

A better way to show the results would be to orient the data properly and to divide interleukin roughly at the tertiles to get

Interleukin     Proportion Who
Level           Survived < 10 Days
< 1             6/11 (55%)
< 10            5/10 (50%)
≥ 10            7/9 (78%)

This result is also inconsistent with the constantly declining trend implied by the value of r = –.51 for interleukin vs. survival in the graph.

27.1.3. The graph says “N = 33,” but only 30 points are shown — as discovered in 27.1.2. The legend does not mention any hidden or overlapping points. What happened to the missing 3 people?

27.3.

27.3.1. System B seems preferable. It has a larger overall gradient (72 − 10 = 62% vs. 57 − 6 = 51%) and patients seem more equally distributed in the three stages.

27.3.2. Coding the three categories as −1, 0, +1, we can calculate the slope with Formula [27.19]. For System A, the numerator is 221(3 − 70) − 85(53 − 122) = −14807 + 5865 = −8942. For System B, the numerator is 221(9 − 36) − 85(86 − 50) = −5967 − 3060 = −9027. Denominators are 221(122 + 53) − (122 − 53)² = 33914 for System A and 221(50 + 86) − (86 − 50)² = 28760 for System B. Slope is −8942/33914 = −.264 for A, and −9027/28760 = −.314 for B. Note that the results are consistent with the “judgmental” evaluation (in AE27.3.1) that the average gradient = .51/2 = .255 in A and .62/2 = .31 in B. X²L can be calculated, as discussed in Section 27.5.5.2 (just after Equation [27.19]), from (numerator of slope)(slope)/NPQ, where NPQ = (221)(85)(221 − 85)/(221)² = (85)(136)/221 = 52.307. In System A, X²L = (−8942)(−.264)/52.307 = 45.13. In System B, X²L = (−9027)(−.314)/52.307 = 54.44. The result is highly stochastically significant in both, but is bigger in System B.
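The slope and X²L computations generalize to any ordinal coding; a sketch in Python, with the stage sizes and event counts inferred from the numbers in the answer (treat them as an assumption). The hand calculation rounds the slope to three digits before the final step, so its X²L values differ slightly from the exact ones here.

```python
def linear_trend(codes, totals, events):
    """Slope and linear chi-square (X2L) for a binary outcome across ordinal
    categories, following the (numerator of slope)(slope)/NPQ recipe above."""
    n = sum(totals)
    n1 = sum(events)
    sum_tx = sum(t * x for t, x in zip(totals, codes))
    sum_tx2 = sum(t * x * x for t, x in zip(totals, codes))
    sum_ex = sum(e * x for e, x in zip(events, codes))
    numerator = n * sum_ex - n1 * sum_tx
    slope = numerator / (n * sum_tx2 - sum_tx ** 2)
    npq = n1 * (n - n1) / n
    return slope, numerator * slope / npq

codes = [-1, 0, 1]
# Stage sizes and event counts inferred from the answer's arithmetic
slope_a, x2l_a = linear_trend(codes, [122, 46, 53], [70, 12, 3])   # System A
slope_b, x2l_b = linear_trend(codes, [50, 85, 86], [36, 40, 9])    # System B
```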

27.3.3. Note that overall X² = 45.51 in System A, so that the residual X²R is X² − X²L = 45.51 − 45.13 = 0.38. In System B, the overall X² is 54.89. The residual is 54.89 − 54.44 = 0.45.

27.3.4. The overall gradient (51% vs. 62%) has already been cited. Another index here could be lambda. Total errors would be 85. With System A, errors are (122 − 70) + 12 + 3 = 67. With System B, errors are (50 − 36) + 40 + 9 = 63. Thus, lambda is (85 − 67)/85 = .21 for System A, and (85 − 63)/85 = .26 for System B. Overall chi-square is 45.5 for System A and 54.9 for System B. Finally, to quantify the hunch about “better” distribution, we can use Shannon’s H (recall Section 5.10.2). The proportions of data in each category of the denominator are .552, .208, and .240 in System A, and .226, .385, and .389 in System B. Shannon’s H is .433 for System A and .465 for System B. Thus, System B gets better scores with all four methods.

27.3.5. In each system, each stratum has “stable” numbers, the gradient is monotonic, and the gradient is relatively “evenly” distributed (31% and 20% in System A; 25% and 37% in System B). Therefore, the overall gradients of 51% vs. 62% are probably the best simple comparison. If the strata had unstable numbers and/or dramatically uneven gradients, the best summary would probably be the linear slope.
