investigator has observed that a particular treatment failed in only 1 (4%) of 23 patients. An evaluator who thinks that the true rate of failure should be 15% then suspects the investigator of either not telling the truth or having a peculiar group of patients. To determine whether the observed result is within the limits of chance, we can find the relative frequencies with which 1 failure would be expected in 23 treatments if the true rate of failure is 15%, with the success rate being .85 (= 85%).
If .85 is the success rate, the chance of observing no failures (i.e., all successes) in 23 patients would be (.85)²³ = .0238. The chance of observing 1 failure in 23 patients would be 23(.85)²²(.15) = .0966. Thus, if the true failure rate is 15%, there is a chance of .0966 + .0238 = .1204, or 12%, of observing as few as 1 or no failures in 23 people. The decision about whether the investigator is telling the truth depends on how you interpret the chance of 12%.
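These two binomial terms are easy to verify with a few lines of code. The sketch below (a Python illustration of mine, not part of the original text) recomputes the probabilities directly from the binomial formula:

```python
from math import comb

def binom_pmf(k, n, p):
    # probability of exactly k events in n trials when the event rate is p
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_fail = 0.15          # the evaluator's assumed "true" failure rate
n_patients = 23

p_none = binom_pmf(0, n_patients, p_fail)   # (.85)^23
p_one = binom_pmf(1, n_patients, p_fail)    # 23(.85)^22(.15)
print(round(p_none, 4), round(p_one, 4), round(p_none + p_one, 4))
# 0.0238 0.0966 0.1204
```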
8.4.2.7 Use of Reference Tables — Because the Clopper-Pearson calculations are cumbersome (particularly for more than one or two events), they are seldom used routinely for determining Bernoulli confidence intervals. The main reason for discussing them is to let you know of their existence as the “gold standard” approach for small samples and for proportions that are remote from the meridian value of .5.
To save time and effort, special tables have been constructed to show “exact confidence limits” for events or proportions calculated with binomial distributions. For example, consider the observed value of 4/20 = .20. In an “exact-confidence-limits” table for the binomial distribution (cf. page 89 of the Geigy Scientific Tables2), the 95% confidence interval for 4/20 is listed as extending from .0573 to .4366. Note that this result differs slightly from the exact boundary of .05 to .35 that was noted earlier for the Clopper-Pearson calculations shown here in Section 8.4.2.5. The reason for the disagreement is that the results in the Geigy tables are calculated with a complex iterative method for working with Formula [8.4]. The subsequent solutions can then produce values, such as .0573 or .4366, that cannot be precisely achieved with a sample of exactly 20 members.
As another example, to answer the concern posed in Section 8.4.2.6 about a failure rate of 15% in 23 patients, page 103 of the Geigy tables shows that the 95% “Confidence limits for Np” extend from 0 to 8 persons if the true proportion is .15 in a group of 23 persons. Consequently, the observed failure of only 1 in 23 persons is within the 95% confidence interval for a .15 rate of failure.
The main point of this subsection is simply to let you know that appropriate tables exist for diverse attributes of binomial distributions. If you later find yourself in situations where the tables can be helpful, you can then learn the pertinent operational details and interpretations.
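If tables are not at hand, the exact limits can also be found numerically. The sketch below (my own illustration, not from the text) applies the Clopper-Pearson definition directly: the lower limit is the value of π at which observing r or more events has probability α/2, and the upper limit is the value at which observing r or fewer events has probability α/2, each located by bisection.

```python
from math import comb

def binom_cdf(r, n, p):
    # P(X <= r) for a binomial(n, p) variable
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

def clopper_pearson(r, n, alpha=0.05, tol=1e-9):
    # exact ("gold standard") confidence limits for the proportion r/n
    if r == 0:
        lower = 0.0
    else:
        a, b = 0.0, 1.0
        while b - a > tol:                       # solve P(X >= r) = alpha/2
            mid = (a + b) / 2
            if 1 - binom_cdf(r - 1, n, mid) < alpha / 2:
                a = mid
            else:
                b = mid
        lower = (a + b) / 2
    if r == n:
        upper = 1.0
    else:
        a, b = 0.0, 1.0
        while b - a > tol:                       # solve P(X <= r) = alpha/2
            mid = (a + b) / 2
            if binom_cdf(r, n, mid) > alpha / 2:
                a = mid
            else:
                b = mid
        upper = (a + b) / 2
    return lower, upper

print(clopper_pearson(4, 20))   # roughly (.0573, .4366), matching the Geigy entry
```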
8.5 Mental and “No Events” Short-Cut Approximations
Two simple techniques are available as “screening” or short-cut methods for approximating the confidence interval of certain proportions. One technique can often be done mentally, without a calculator; the other offers a simple rule for the confidence interval around a proportion of 0, in circumstances where no events have been observed.
8.5.1 Mental Approximation
The standard deviation, √(pq), is close to .5 for a wide span of proportions ranging from .3 to .7. For example, if p = .4, pq = (.4)(.6) = .24 and √(pq) = .49. If p = .7, pq = .21 and √(pq) = .46. For a 95% confidence interval, Zα = 1.96 can be approximated as 2. Consequently, in the span of proportions from .3 to .7, Zα√(pq) can be approximated as ~(2)(.5) = 1. Therefore, the confidence interval component Zα√(pq/n) can be approximated as 1/√n.
By estimating the square root of 1/n mentally, you can promptly get a good idea of the scope of the confidence interval component. If you feel insecure about this mental feat, bear in mind that when n = 25, 1/√n = 1/5 = .2. When n = 100, 1/√100 = .1. Thus, for group sizes between 25 and 100, the confidence interval component will lie between .2 and .1. With this tactic, you can promptly get a reasonably good estimate of the confidence interval for any proportions ranging between .3 and .7. [For p or q = .2 or .8, √(pq) = .4; and for p or q = .1 or .9, √(pq) = .3. The corresponding estimates for the latter proportions will be 80% and 60%, respectively, of the foregoing results for √(pq) ~ .5.]
© 2002 by Chapman & Hall/CRC
Examples: Suppose the observed proportion is 42/76 = .55. As 76 is close to 81, let 1/√n = 1/9 = .11. The approximate 95% confidence interval, calculated mentally, becomes .55 ± .11 and goes from .44 to .66. The actual calculation for the standard error here is √((42)(34)/76³) = .057 and the actual Zα√(pq/n) value is (1.96)(.057) = .11. If the observed proportion is 8/65 = .12, use √64 = 8 and 1/8 = .125, then multiply by .60 to get .075 as the approximate value. The calculated result will be 1.96√((8)(57)/65³) = .080.
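The mental short-cut and the exact computation can be compared directly. In this sketch (an illustration of mine; the exact cut-offs used for switching to the .8 and .6 scaling factors near p = .2/.8 and .1/.9 are my own reading of the bracketed rule above):

```python
from math import sqrt

def mental_component(n, p):
    # short-cut: ~1/sqrt(n) for .3 <= p <= .7, scaled down for extreme proportions
    approx = 1 / sqrt(n)
    if p <= 0.15 or p >= 0.85:
        return 0.6 * approx       # sqrt(pq) ~ .3
    if p < 0.3 or p > 0.7:
        return 0.8 * approx       # sqrt(pq) ~ .4
    return approx                 # sqrt(pq) ~ .5

def exact_component(r, n, z=1.96):
    # z * sqrt(pq/n), written as z * sqrt(r(n - r)/n^3)
    return z * sqrt(r * (n - r) / n**3)

print(round(mental_component(76, 42/76), 2), round(exact_component(42, 76), 2))   # 0.11 0.11
print(round(mental_component(65, 8/65), 2), round(exact_component(8, 65), 2))     # 0.07 0.08
```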
8.5.2 “Rumke’s Rule” for “No Events”
If no adverse events occur in a clinical trial, the investigators may want assurance about the likelihood that no such events will occur in the future. At a scientific level, the assurance will depend on the types of future patients, dosage, and additional treatments. At a purely statistical level, however, numerical calculations are possible.
For example, suppose no side effects were observed in 40 patients. If the true rate of side effects is .01, the observed event has a chance probability of (.99)⁴⁰ = .67. If the true rate is .04, the probability is (.96)⁴⁰ = .20. For a rate of .07, the probability is (.93)⁴⁰ = .055; and for a rate of .08, the probability is (.92)⁴⁰ = .036. Thus, because .05 lies between the latter two probabilities, a one-sided 95% confidence interval could be estimated that the true rate is as high as something between 7% and 8%.
The calculations can be greatly simplified by a “rule of thumb” proposed by C. L. Rumke.3 In Rumke’s rule, if “nothing” has been observed in n patients, the one-sided upper 95% confidence limit is 3/n. Thus, for 40 patients, 3/40 = .075 = 7.5%. In a trial with 120 patients, the corresponding result is 3/120 = .025 = 2.5%. [To check this, note that (.975)¹²⁰ = .048, consistent with a 95% confidence interval.] The formulas in Rumke’s rule are changed to 4.61/n for a 99% confidence interval and 5.30/n for a 99.5% confidence interval.
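Rumke’s numerators can be checked by putting them back into the zero-event probability: (1 − limit)ⁿ should be close to the tail probability α. A short sketch (mine, not the book’s):

```python
def rumke_upper_limit(n, level=0.95):
    # rule-of-thumb numerators: 3 for 95%, 4.61 for 99%, 5.30 for 99.5%
    numerators = {0.95: 3.0, 0.99: 4.61, 0.995: 5.30}
    return numerators[level] / n

for n in (40, 120):
    limit = rumke_upper_limit(n)
    # probability of zero events if the true rate sat exactly at the limit
    print(n, round(limit, 3), round((1 - limit) ** n, 3))
# 40  -> limit .075, zero-event probability ~.044
# 120 -> limit .025, zero-event probability ~.048
```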
8.6 “Continuity” and Other Corrections for Mathematical Formulas
For a binary proportion expressed as p = r/n, the commonly used Gaussian confidence interval is called a Wald-type interval, calculated with Formula [8.2] as

p ± Zα√(pq/n)

where Zα is chosen from a Gaussian distribution to provide a 100(1 − α) percent level for the interval.
8.6.1 Customary “Continuity Correction”
Aside from the problems already noted in Section 8.4.2.4, this formula is inappropriate because a Gaussian curve is continuous, whereas a distribution of binomial proportions has the discrete “noncontinuous” values shown in Tables 8.1 through 8.4. The customary solution recommended for this problem is an incremental “continuity correction,” 1/(2n), that reduces the size of Z and expands the width of the confidence interval. Thus, the critical ratio noted in Formula [8.3] would be corrected to

Z = [|p − π| − (1/2n)] / √(pq/n)    [8.5]
The confidence interval around p would extend from

p − Zα√(pq/n) − (1/2n)  to  p + Zα√(pq/n) + (1/2n)    [8.6]
and the net effect is to widen the confidence interval by twice the value of 1/2n.
If we apply this correction factor to the proportion observed earlier as .6 in a group of 5 members, the 95% confidence interval becomes

.60 ± [(1.96)√((.60)(.40)/5) + 1/10] = .60 ± [.429 + .100] = .60 ± .529
and would go from .071 to the impossible value of 1.129. [Without the correction, the interval would have been .60 ± .429, and would go from .171 to 1.029.]
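For the record, the uncorrected and corrected intervals above can be reproduced with a small function (a sketch under the Wald formula, not a recommendation to use it):

```python
from math import sqrt

def wald_interval(r, n, z=1.96, continuity=False):
    # Wald-type interval p +/- [z*sqrt(pq/n) + 1/(2n)], the last term optional
    p = r / n
    half = z * sqrt(p * (1 - p) / n)
    if continuity:
        half += 1 / (2 * n)
    return p - half, p + half

print([round(x, 3) for x in wald_interval(3, 5)])                   # [0.171, 1.029]
print([round(x, 3) for x in wald_interval(3, 5, continuity=True)])  # [0.071, 1.129]
```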
Like many parametric “corrections” and “estimations,” this one works better “on average” than in isolated individual instances. For example, in Section 8.4.2.5, the Bernoulli 95% confidence interval around the proportion 4/20 = .20 went from .05 to .35. The Gaussian interval was larger, going from
.025 to .375. With the continuity correction, the Gaussian interval would become even larger and more disparate from the true result. On the other hand, an argument can be offered that the “true” result cited here is not strictly accurate. If you add the relative frequencies for the zone of proportions from .05 to
.35 in Table 8.4, they occupy .9561 of the data, so that they slightly exceed a 95% confidence interval. If the Gaussian 95% interval of .025 to .375 is enlarged, things will be even less accurate.
Like many other “corrections,” the customary continuity correction has relatively little effect at large sample sizes, where 1/2n becomes very small; but at small sample sizes, if accuracy is really important, the exact Bernoulli method is the “gold standard.” Thus, the continuity correction is mentioned here mainly so you will have heard of it. You need not use it in any of the “homework” exercises, but you may sometimes meet it in the medical literature.
8.6.2 Other Corrections and Appraisals
In a modified continuity correction, proposed by Blyth and Still,4 the n factor in the denominator of Formula [8.6] is replaced by a smaller term, which enlarges the size of the interval. Another modification, lauded in a thorough review by Vollset,5 is a continuity-corrected “score interval,” which involves “an approximation using the limit estimate instead of the point estimate in the standard error.”
After carefully considering 13 mathematical methods for calculating the confidence interval of a binary (or “binomial”) proportion, Vollset “strongly discouraged” both the usually recommended Gaussian formula and the customary continuity-corrected version. He concluded that the upper and lower limits are best obtained with an exact Clopper-Pearson computation, as demonstrated earlier in Section 8.4.2.5 and Table 8.4. Among mathematical formulas, Vollset preferred the continuity-corrected score interval and said that a formula for “Pratt’s approximation” comes quite close to the Clopper-Pearson results and is easier to calculate.
8.7 Interpretation of Stability
In Chapter 7, the stability of a central index was evaluated in two ways. In one way, the coefficient of stability was checked as a ratio of the standard error divided by the central index. The other way relied on avoiding or including an extrinsic boundary value such as >50% in an election poll.
8.7.1 Impact of Directional Variations
The coefficient of stability is easy to determine for dimensional data as (s/√n)/X̄ because the central index has a unique, single value. With binary proportions, however, the central index is ambiguous: it can be either p or q; and the coefficient of stability can be dramatically different according to which value, p or q, is used as the reference.
With the standard error cited as √(pq/n), the coefficient of stability will depend on whether it is referred to p or q. It will be √(pq/n)/p = √(q/(np)) = √(q/r) if referred to p, and √(pq/n)/q = √(p/(nq)) = √(p/(n − r)) if referred to q.
For example, the Gaussian standard error of 2/7 = .2857 is √((2/7)(5/7)/7) = .1707. The coefficient of stability will be .1707/.2857 = .598 if referred to p and .1707/.7143 = .239 if referred to q. An alternative way of getting these same results is with the √(q/r) formula as √(.7143/2) = .598 and with √(p/(n − r)) as √(.2857/5) = .239. With either coefficient of stability, the proportion 2/7 is highly unstable, but the existence of two possible values can be disconcerting.
[One way of trying to get around the problem is to use the geometric mean, √(pq), of the two values for p and q. With √(pq) as the denominator, the ratio for the coefficient of stability becomes √(pq/n)/√(pq) = 1/√n, which is the same value used in Section 8.5.1 for mentally approximating the scope of a 95% confidence interval for proportions ranging from .3 to .7.]
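The three versions of the coefficient — referred to p, referred to q, and the geometric-mean compromise — can be computed side by side. A sketch of mine:

```python
from math import sqrt

def stability_coefficients(r, n):
    # coefficient of stability for the proportion r/n,
    # referred to p, referred to q, and with the geometric mean sqrt(pq)
    p = r / n
    q = 1 - p
    se = sqrt(p * q / n)
    return se / p, se / q, 1 / sqrt(n)

cs_p, cs_q, cs_gm = stability_coefficients(2, 7)
print(round(cs_p, 3), round(cs_q, 3), round(cs_gm, 3))   # 0.598 0.239 0.378
```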
8.7.2 Impact on Small Proportions
Ambiguous results for the coefficient of stability can create big problems for small proportions. A small proportion can be defined, according to many beliefs, as having a value below .1. You may want to set its upper boundary, however, at a smaller level such as .05. Most people would agree that .01 is a small proportion, and agreement would probably be unanimous if the level were set at .001.
To illustrate the problems of small proportions, suppose we evaluate the stability of the fatality proportion 4/100 = .04. The Gaussian standard error for this proportion is √((.04)(.96)/100) = .0196. When referred to q = .96, the coefficient of stability is .0196/.96 = .0204 — a relatively small value. When referred to p = .04, however, the analogous result is .0196/.04 = .49, which is 24 times larger.
Furthermore, if one of the four fatalities is removed by the jackknife procedure, the reduced proportion will be 3/99 = .0303, which is a proportionate drop of (.04 − .0303)/.04 = .24, or almost 25% of the original value. This drastic change would occur, however, in only 4 of the 100 jackknife-reduced proportions. In the other 96 instances, the reduced proportion will be 4/99 = .0404, which is only slightly altered from the original value of .04. Nevertheless, the potential for getting a drastically altered result might be too distressing to make us want to rely on the stability of 4/100. If the observed proportion has a value such as 2/1000, the potential instability of the reduced proportion would be even greater, although its probability of occurrence is even smaller.
8.7.3 Effect of Small Numerators
The impact of potentially drastic changes is responsible for the statistical doctrine that the trustworthiness of data for proportions (or rates) depends on the size of the numerators. If the numerators are small, as in 4/100, the results will be relatively unstable. The same proportion of .04, however, would be more stable if obtained from 40/1000 or from 400/10,000.
To illustrate the problem, suppose we arrange so that the smaller value of r or n − r is used as r. The cited proportion (as is common in death rates) will be p = r/n. The coefficient of stability for p will be the formula demonstrated earlier (Section 8.7.1) as √(q/r) = √([(n − r)/n]/r). If n − r is substantially larger than r, the factor (n − r)/n will be close to 1. The coefficient will then be essentially 1/√r. Thus, if r = 5, the coefficient will be approximately 1/√5 = 0.45, which greatly exceeds the boundary of .05 for stability. A coefficient less than the .05 boundary will be achieved only if r exceeds 400, because 1/√400 = .05.
With the jackknife procedure, the main threat of instability occurs when the reduced value is (r − 1)/(n − 1). In this instance, the proportionate change from the original value will be [(r/n) − (r − 1)/(n − 1)]/[r/n] = [q/(n − 1)]/[r/n]. If n is large, (n − 1)/n approximates 1, and so the proportionate change will be essentially q/r. The q/r formula immediately shows the potential for a very large change if a small proportion, i.e., p < .1, also has a small numerator. For example, the potential for an almost 25% change in the 4/100 proportion can be determined rapidly as .96/4 = .24. If the proportion .04 came from 40/1000, however, the extreme jackknife instability would be only .96/40 = .024. On the
other hand, with Gaussian calculations the standard error for 40/1000 would be √((.04)(.96)/1000) = .006197, and its coefficient of stability will be the “unstable” value of .006197/.04 = .155. If the .04 came from 400/10,000, the standard error would be .00196 and the coefficient would be “stable,” at a value of .049.
For this reason, calculations of sample size and subsequent determinations of stability will be strongly affected by the number of “events” that become the numerator value of r in the smaller complement of binary proportions.
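The q/r short-cut and the exact jackknife change agree closely at large n; this small check (my own sketch) reproduces the 4/100 and 40/1000 figures:

```python
def jackknife_drop(r, n):
    # exact proportionate drop when one of the r events is removed
    p0 = r / n
    p1 = (r - 1) / (n - 1)
    return (p0 - p1) / p0

def qr_shortcut(r, n):
    # approximate drop: q/r
    return (1 - r / n) / r

print(round(jackknife_drop(4, 100), 2), round(qr_shortcut(4, 100), 2))      # 0.24 0.24
print(round(jackknife_drop(40, 1000), 3), round(qr_shortcut(40, 1000), 3))  # 0.024 0.024
```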
8.7.4 Events-per-Variable Ratio
The instability produced by small numerators is worth remembering in statistical methods that use multivariable analysis. You will confront these problems when medical journals publish results for binary data (such as survival rates) that were analyzed with such multivariable methods as logistic regression or proportional hazards (Cox) regression. Although you may not yet know how the latter methods work, you can already use the “small numerator” principle to evaluate the stability of the claimed results. If the multivariable procedure contains k variables and e events in the outcome variable, such as deaths, the ratio e/k is called the events-per-variable (epv) ratio. Harrell et al.6 have proposed, and Peduzzi et al.7 have empirically demonstrated, that the analytic results are unstable if this ratio is ≤ 10.
Many published multivariable results contain an elaborate array of statistical statements, but give no attention to the importance of the epv ratio.8 Thus, without knowing anything formal about multivariable procedures, you can easily look at the data, determine the values of k and e, and mentally calculate epv as e/k. If the result is too small, be skeptical of the authors’ claims. (When you determine e, remember that it is the smaller of the two types of binary events. Thus, in a study of 4 predictor variables for 180 survivors in a total of 210 patients, e = 210 − 180 = 30 deaths, and epv = 30/4 = 7.5.)
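The screening calculation is simple enough to automate; this sketch (mine) takes care of the one easy mistake — forgetting that e is the smaller of the two outcome counts:

```python
def events_per_variable(n_total, n_in_one_outcome, k_variables):
    # e must be the smaller of the two binary outcome counts
    e = min(n_in_one_outcome, n_total - n_in_one_outcome)
    return e / k_variables

# 4 predictor variables, 180 survivors among 210 patients: e = 30 deaths
epv = events_per_variable(210, 180, 4)
print(epv, "-> suspect" if epv <= 10 else "-> acceptable")   # 7.5 -> suspect
```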
8.8 Calculations of Sample Size
When sample size is calculated for a single group, the goal is to get a result having a specified zone of error, or tolerance. The easiest (and conventional) approach is to use the Gaussian technique cited here in Formula [8.2]. The calculated size of the sample should allow suitable confidence about the estimated value of π.
8.8.1 Political Polls
In political polls, the candidate’s manager wants to spend the least amount of money that will yield a confident result. The required steps are to estimate the apparent value of p, choose a magnitude of tolerance, select a level of confidence, and apply the formula for Zα√(pq/n). For example, suppose the manager, estimating that her candidate will receive 60% of the popular vote, wants 95% confidence (or assurance) that the actual vote is not below 51%. In this instance, the value of

.60 ± 1.96√((.60)(.40)/n)

must not go below .51. The calculation would be

.60 − 1.96√((.60)(.40)/n) ≥ .51

which becomes

(.60 − .51)/1.96 ≥ √((.60)(.40)/n)
After all the algebra is developed, and both sides are squared, this becomes
.0021085 ≥ .24/n
and the result is
n ≥ 114
The random sample for the poll should contain at least 114 persons.
8.8.2 General Formula
The general formula for sample size of a single group can be developed as follows: If p̂ is the estimated proportion, with b as the lowest acceptable boundary and 1 − α as the desired level of confidence, the formula will be

(p̂ − b)/Zα ≥ √(p̂q̂/n)

The desired sample size is

n ≥ p̂q̂[Zα/(p̂ − b)]²    [8.7]

Formula [8.7] shows that n will increase for larger levels of confidence (where Zα increases), and for smaller increments between the estimated p̂ and the boundary it must exceed.
For example, suppose the candidate’s apparent proportion of support was 53% rather than 60%. If the manager were content with a 90% confidence interval and 50.5% as the lower boundary of tolerance, the required sample size would be

n ≥ (.53)(.47)[1.645/(.53 − .505)]² ~ 1079.
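Both polling examples can be reproduced from Formula [8.7]. In this sketch (mine; the small rounding guard before the ceiling is a defensive touch, not part of the formula):

```python
from math import ceil

def sample_size_for_boundary(p, b, z):
    # Formula [8.7]: n >= pq * [z / (p - b)]^2
    n = p * (1 - p) * (z / (p - b)) ** 2
    return ceil(round(n, 6))   # guard against floating-point noise, then round up

print(sample_size_for_boundary(0.60, 0.51, 1.96))    # 114
print(sample_size_for_boundary(0.53, 0.505, 1.645))  # 1079
```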
The calculations are sometimes simplified by assuming that p = q = .5, choosing a level of Zα, and letting Zα√(pq/n) be designated as a “margin of error.” With this approach, if e is the margin of error, we want

Zα√(pq/n) ≤ e

and so

n ≥ pqZα²/e²    [8.8]

For a 95% confidence interval and a 2% margin of error, the sample size will be

n ≥ (.5)(.5)(1.96)²/(.02)² ~ 2401.
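Formula [8.8] is the one behind the familiar “margin of error” in poll reporting. A sketch (mine), with p = q = .5 as the conservative default:

```python
from math import ceil

def sample_size_for_margin(e, z=1.96, p=0.5):
    # Formula [8.8]: n >= pq * z^2 / e^2
    n = p * (1 - p) * z**2 / e**2
    return ceil(round(n, 6))   # guard against floating-point noise, then round up

print(sample_size_for_margin(0.02))   # 2401 for a 2% margin at 95% confidence
print(sample_size_for_margin(0.03))   # a 3% margin needs far fewer: 1068
```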
These results help show why getting accurate political polls can be a costly activity. The formulas also show why a “tight race,” with small values of p̂ − b, can be particularly expensive for accuracy in polling.
8.8.3 Sample Sizes in Medical Research
Medical investigators frequently ask statistical consultants for help in determining a suitable sample size. The calculations for comparing two groups will be discussed later. If only one group is being considered, however, the consultant trying to apply the foregoing formulas will have to know the estimated value of p, the desired boundary of b, and the level of α. A value of e could be used for margin of error instead of b.
When probing to get these values, the consultant is often frustrated by the investigator’s inability (or unwillingness) to state them. The discourse sometimes resembles history taking from a patient who does not state a chief complaint or account of the present illness.
“What do you anticipate as the expected proportion?” asks the consultant.
“If I knew what it was, I wouldn’t be doing the research,” is the reply.
“What tolerance or margin of error do you want?”
“Whatever seems right.”
“What level of confidence would you like?”
“That’s your job.”
“I can’t determine a sample size unless you give me the necessary information.”
“But I was told to consult a statistician. You’re supposed to let me know what the sample size should be.”
Eventually, if suitably pressed, the statistician will conjure up some guesses, plug them into the formula, and calculate a result. The investigator, having abandoned a primary role in stipulating the basic quantities, then happily departs after receiving all the guesswork as an exact number for the desired sample size.
8.8.4 Sample Size for Dimensional Data
If the data are dimensional rather than binary, the sample-size job is even harder. An estimated value of p for binary data can immediately be used to get the variance as pq, but with an estimated value of X̄ for the mean of dimensional data, an additional separate estimate is needed for the variance, s². The calculation may require four rather than three previous decisions: for estimates of X̄ and s, and for levels of α and b. The formula will be

n ≥ s²[Zα/(X̄ − b)]²    [8.9]

If e is used as the margin of error for X̄ − b, only three decisions are needed in the formula

n ≥ (sZα/e)²
For example, suppose we expect a mean of 30 with a standard deviation of 6. We want a 95% confidence interval within a margin of 2. The sample size will be

n ≥ [(6)(1.96)/2]² = 34.6
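The dimensional formula, and the reverse check described in the next subsection, can both be run in a few lines (a sketch of mine):

```python
from math import sqrt

def sample_size_dimensional(s, e, z=1.96):
    # n >= (s * z / e)^2, with e as the margin of error around the mean
    return (s * z / e) ** 2

n = sample_size_dimensional(6, 2)
print(round(n, 1))   # 34.6, so take 35 patients

# reverse check: with 35 patients the 95% margin just fits inside 2
print(round(1.96 * 6 / sqrt(35), 2))   # 1.99
```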
8.8.5 Checks on Calculations
To be sure the calculations are correct, it is helpful to try them in reverse and see what happens. In the research just described, suppose the investigator finds a mean of 30 and a standard deviation of 6 in 35 patients. The standard error of the mean will be 6/√35 = 1.01. A 95% confidence interval will depend on Zα times this standard error, which will be 1.96 × 1.01 = 1.99. The margin around the observed mean of 30 will be 30 ± 1.99 and will just be within the desired boundary of 2.
Note that the foregoing calculation was done with Zα rather than tν,α. Trying to use the t statistic would have created a “double bind.” We need to know the value of ν (for degrees of freedom) to choose the right value of t, but we cannot know the value of ν until we find the value of n for which the calculation is intended. To avoid this problem, the calculation can be done with Zα. If n turns out to be small, it can be used for choosing an appropriate value of tν,α and then recalculating. The calculations can then be adjusted back and forth until the level of tν,α produces the desired confidence level for the calculated level of n.
8.8.6 Enlargement for “Insurance”
Because any of these sample-size calculations will yield results that “just make it” if everything turns out perfectly, a certain amount of imperfection should be expected. The calculated sample size is enlarged as “insurance” against this imperfection. The amount of the enlargement is “dealer’s choice.” Some investigators or consultants will arbitrarily double the calculated size; others will raise it by 50%; still others will multiply it by a factor of 1.2 (for a 20% increase).
The “bottom line” of the discussion in this section is that a precisely quantitative sample size is regularly calculated from highly imprecise estimates. To avoid disappointment afterwards, whatever emerges from the advance calculations should be suitably enlarged.
8.9 Caveats and Precautions
For reasons discussed later in Chapter 13, confidence intervals are now being increasingly demanded by medical editors; the results regularly appear in medical publications; and an entire book9 has been devoted to the available formulas for calculating the intervals for many statistical summary indexes.
For a single group, with p = r/n, the formula usually recommended for a 95% confidence interval is the Wald-type Gaussian expression, p ± 1.96√(pq/n), although authors are sometimes urged to use the customary continuity-corrected modification. These “conventional” formulas will usually allow a manuscript to pass scrutiny from most statistical reviewers in biomedical literature, but as noted in Section 8.6.2, “high-powered” statisticians may disagree about the best way to calculate a binomial confidence interval.
The main point of this comment is simply to warn you that you can usually “get away with” the Gaussian formula, but sometimes a statistical reviewer may object to it. If the boundaries of the confidence interval are really important (rather than merely serving as publicational “window dressing”), you may want to resort to a more exact method.
A more substantive (and substantial) problem in any confidence interval, regardless of the method used for its calculation, is its reliance on the single value of p (or q) as the central index for a homogeneous group of data. Because biomedical groups are seldom homogeneous, a cautious evaluator will want to know the proportions found in pertinent subgroups, not a single “average” value of p that is both scientifically and “politically” incorrect because it ignores diversity. Good clinicians are not content with a rate that cites 5-year survival for a particular cancer as 60%; the clinicians want to know the rates in pertinent subgroups such as stages for the spread of the cancer. Political managers want to know the proportion of voting preferences in different demographic subgroups (according to age, sex, socio-economic status, and likelihood of actually voting), not a single monolithic average proportion for the entire sample.
Even in sports such as baseball, the team managers will make decisions based on “situational variables,” rather than a player’s overall average performance. In an engaging paper in a prominent statistical journal, Albert10 discussed about 20 of the situational variables (handedness of batter and pitcher, home or away games, day or night, grass or turf, time of season, etc.) that can affect a player’s batting average. Thus, for Wade Boggs, the overall 1992 batting average was .259 (= 133/514). The 95% confidence interval component for this “average” would be 1.96√((133)(381)/514³) = .038, and so the statistically expected 95% range would be from .221 to .297. Nevertheless, in different batting situations, Boggs’s batting average ranged from .197 to .322.
The moral of these stories is that you can calculate confidence intervals to pay appropriate homage to the statistical ritual. If you really want to know what’s happening, however, check the results not for an overall “average,” but for substantively pertinent subgroups.
References
1. Clopper, 1934; 2. Lentner, 1982; 3. Rumke, 1975; 4. Blyth, 1983; 5. Vollset, 1993; 6. Harrell, 1985; 7. Peduzzi, 1995; 8. Concato, 1993; 9. Gardner, 1989; 10. Albert, 1994.
Exercises
8.1. For many years in the United States, the proportion of women medical students was about 10%. At Almamammy medical school, however, 30 persons have already been accepted for admission to the next entering class. Of those 30 persons, 12 are women. What would you do to determine whether this 40% proportion of women is likely to continue in the long run, and whether it is compatible with the old “10%” policy or a substantial departure?
8.2. According to a statement near the middle of Section 6.2, if a political “candidate is favored by 40 of 60 voters, the probability is about one in a hundred … that the … voting population(’s) ... preference for that candidate might actually be as low as .51.” Verify that statement and show your calculation.
8.3. Another statement in Section 6.2 referred to our willingness to believe “a clinician who claims three consecutive successes” for “a particular treatment (that) usually has 80% failure.” What would you do to decide whether this claim is credible? What is your conclusion?
8.4. In a clinical trial of a new pharmaceutical agent, no adverse reactions were found in 50 patients. Can the investigators confidently assume that the rate of adverse reaction is below 1/50? What evidence would you offer for your conclusion?
8.5. The state health commissioner, having a list of inhabitants of all nursing homes in your state, wants to examine their medical records to determine what proportion of the inhabitants are demented. Because the examination of records is costly, the commissioner wants to check the smallest sample that will give an accurate result within an error tolerance of 1%. When you ask about a level of confidence, the commissioner says, “Use whatever is needed to make the report acceptable.”
8.5.1. What sample size would you choose if the commissioner also does not know the probable proportion of demented patients?
8.5.2. What size would you choose if the estimated proportion of dementia is 20%?
8.5.3. If the commissioner says funds are available to examine no more than 300 records, what would you propose?
8.6. A clinical epidemiologist wants to do a study to determine how often endometrial cancer is found at necropsy after having been undetected during life. In examining the diagnoses cited at 26,731 consecutive necropsies, he finds about 40 such undetected endometrial cancers, all of them occurring in women above age 50 with intact uteri.
He can use 40 as the numerator for a “rate of undetected cancer,” but he needs an appropriate denominator for the eligible patient population. To determine the number of post-menopausal, uterus-possessing women who constitute the appropriate eligible denominator population in the necropsy group, he would prefer not to have to check these characteristics in the entire group of 26,731 people who received necropsy. He therefore decides to take a random sample that can be counted to give an accurate approximation of the number of women, above age 50, with intact uteri.
He estimates that the autopsy population contains 50% women, that 80% of the women have intact uteri, and that 70% of the women are above 50 years of age. Thus, the proportion of eligible women is estimated at (.50)(.80)(.70) = 28%. The investigator wants to be able to confirm this estimated value with 95% confidence that the true value lies within 1% of the estimated result in either direction, i.e., between 27% and 29%. He calculates that 7745 people will be needed for the cited 95% confidence interval.
After checking with a statistician, however, the clinical epidemiologist learns something that reduces the sample size to 6005. What do you think was learned, and what was the appropriate calculation?
8.7. (Optional exercise if you have time) Using any literature at your disposal, find a published example of either logistic regression or proportional hazards (Cox) regression. [Note: Such papers appear about once a week on average in the NEJM or Lancet.] Determine the events-per-variable ratio and show the source of your calculation. Did the authors appear to pay any attention to this ratio?
