Principles of Medical Statistics (Feinstein, 2002)
2.7. There are no real advantages. The trivial saving of effort in citing one rather than two digits when the data are extracted will be followed by massive disadvantages when the data are later analyzed. The investigator will be unable to determine any averages (such as means or medians), unable to find any trends within the same decade, and unable to find distinctions that cross decades (such as the age group 35–54). The moral of the story is: Always enter dimensional data in their original dimensions or in direct transformations (such as kg ↔ lb) that preserve the original dimensions. Never compress dimensional data for their original citation; the compression can always be done later, during the analyses.
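The information loss described in Answer 2.7 can be shown with a minimal Python sketch. The two groups of ages below are hypothetical values invented for illustration, and the helper names `to_decade` and `count_in_range` are likewise hypothetical. Once the ages are compressed to decade codes, the two groups become indistinguishable. The original values, however, still yield different means, medians, and counts for a cross-decade group such as ages 35–54.

```python
from statistics import mean, median


def to_decade(age: int) -> int:
    """Compress an age in years to its decade code (e.g., 47 -> 4)."""
    return age // 10


def count_in_range(ages, low, high):
    """Count ages falling in the closed interval [low, high]."""
    return sum(1 for age in ages if low <= age <= high)


# Two hypothetical groups of ages in years (illustrative values only).
group_a = [36, 38, 44, 47, 52, 53, 58, 61]
group_b = [31, 33, 40, 49, 55, 56, 59, 69]

# After compression to decades, the two groups look identical ...
codes_a = [to_decade(a) for a in group_a]
codes_b = [to_decade(a) for a in group_b]
print(codes_a == codes_b)  # True

# ... but the original dimensions still give different summaries,
# including a count for a group that crosses decades (ages 35-54).
print(mean(group_a), median(group_a), count_in_range(group_a, 35, 54))
print(mean(group_b), median(group_b), count_in_range(group_b, 35, 54))
```

Any compression (decades, 35–54, or another grouping) can still be derived from `group_a` and `group_b` during analysis. The reverse is impossible once only the decade codes are stored.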
2.9. Individual answers.
Chapter 3
3.1. The median is probably the best choice for this right-skewed distribution.
3.3. Mean = $\bar{X}$ = 120.10. For these 20 numbers, the median lies between ranks 10 and 11. The actual values at these ranks are 91 and 96, and so median = (91 + 96)/2 = 93.5. The mode is 97. The geometric mean is $(3.198952738 \times 10^{40})^{1/20} = (3.198952738 \times 10^{40})^{0.05} = 105.99$.
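The rules used in Answer 3.3 can be checked with a short Python sketch. The product 3.198952738 × 10^40 and n = 20 come from the answer. The 20 raw values are not reproduced here, so the small data set at the end is hypothetical and serves only to show the functions together. The sketch computes the geometric mean through logarithms, which avoids forming an enormous intermediate product. `geometric_mean` is a hypothetical helper defined here.

```python
import math
from statistics import mean, median, mode


def geometric_mean(values):
    """nth root of the product, computed as exp(mean of natural logs)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Geometric mean from the product quoted in Answer 3.3 (n = 20).
product, n = 3.198952738e40, 20
print(round(product ** (1 / n), 2))  # 105.99

# Median of an even-sized set: average of the two middle-ranked values.
print(median([80, 91, 96, 120]))  # (91 + 96)/2 = 93.5

# Hypothetical data set, used only to show the functions side by side.
data = [60, 97, 97, 110, 150]
print(mean(data), median(data), mode(data), round(geometric_mean(data), 2))
```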
Chapter 4
4.1. The data set contains 56 members.
4.1.1. For the lower quartile, (.25)(56) = 14, and the rank is between 14 and 15. The value 17 appears at both the 14th and 15th ranks, and so $Q_L = 17$. For the upper quartile, (.75)(56) = 42, and the rank is between 42 and 43. $Q_U = 28$, which occupies ranks 41–44. [With the r = P(n + 1) formula, (.25)(57) = 14.25, and so $Q_L$ is again 17, between the 14th and 15th ranks. For $Q_U$, (.75)(57) = 42.75, which is again between ranks 42 and 43.]
4.1.2. (.025)(56) = 1.4, which will become rank 2, at which the value is 12 for $P_{.025}$. For $P_{.975}$, (.975)(56) = 54.6, which will become rank 55, at which the value is 41. [With the r = P(n + 1) formula, (.025)(57) = 1.425, which would put the 2.5 percentile value at 11.5, between ranks 1 and 2. Since (.975)(57) = 55.575, the 97.5 percentile value is 42, between the 55th and 56th ranks.]
4.1.3. The value of 30 in Table 3.1 is at the 48th rank. At rank 47, the cumulative proportion is 47/56 = .839; at rank 48, it is 48/56 = .857. Therefore 30 occupies both the 84th and 85th percentiles. [With the formula P = r/(n + 1), we get 48/57 = .842, which would be the 84th percentile.]
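The two rank conventions in Answer 4.1 can be traced with a minimal Python sketch. The function names are hypothetical. For r = Pn, the sketch follows the handling shown in the answer: a whole-number Pn lies between ranks Pn and Pn + 1, and a fractional Pn is rounded up to the next rank. The sketch returns only rank positions and cumulative proportions. Looking up the values at those ranks still requires the ordered 56-member data set, which is not reproduced here.

```python
import math


def ranks_pn(p, n):
    """Ranks used with the r = Pn rule, handled as in Answer 4.1:
    a whole-number Pn means 'between ranks Pn and Pn + 1';
    a fractional Pn is rounded up to the next rank."""
    r = p * n
    if math.isclose(r, round(r)):
        return (round(r), round(r) + 1)
    return (math.ceil(r),)


def rank_pn1(p, n):
    """Fractional rank position under the r = P(n + 1) rule."""
    return p * (n + 1)


def cumulative_proportion(rank, n, plus_one=False):
    """Proportion at a given rank: r/n, or r/(n + 1) with the alternative rule."""
    return rank / (n + 1) if plus_one else rank / n


n = 56  # members in the data set of Exercise 4.1
for p in (0.025, 0.25, 0.75, 0.975):
    print(p, ranks_pn(p, n), round(rank_pn1(p, n), 3))

# The value 30 sits at rank 48 (Answer 4.1.3).
print(round(cumulative_proportion(47, n), 3))                 # 0.839
print(round(cumulative_proportion(48, n), 3))                 # 0.857
print(round(cumulative_proportion(48, n, plus_one=True), 3))  # 0.842
```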
4.3. If the data were Gaussian, the positive and negative Z-scores would be symmetrically distributed around the mean. They are not.
4.5. Let each candidate’s raw score be $X_i$. From the array of candidate raw scores, calculate $\bar{X}$ and s, and then calculate $Z_i = (X_i - \bar{X})/s$ for each candidate. These results will have a mean of zero and an s.d. of 1. To make the s.d. 100, multiply each $Z_i$ by 100 to get $100Z_i$. The results will have a mean of zero and an s.d. of 100. Then add 500 to each $100Z_i$. The results will have a mean of 500 and an s.d. of 100. Thus, the formula is: Final Score = $[100(X_i - \bar{X})/s] + 500$.
Practical Demonstration of the Formula:
Three candidates get raw scores of 75, 83, and 92. For these data, $\bar{X}$ = 83.33 and s = 8.50. The original $Z_i$ scores for the three candidates will be (75 − 83.33)/8.50 = −0.98, (83 − 83.33)/8.50 = −0.04, and (92 − 83.33)/8.50 = 1.02. Multiplied by 100, these scores become −98, −4, and 102. When 500 is added, the scores become 402, 496, and 602. For these three values, a check on your calculator will confirm that $\bar{X}$ = 500 and s ≈ 100.
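The rescaling formula in Answer 4.5 can be written as a minimal Python sketch (`rescale` is a hypothetical name). It uses the sample standard deviation with divisor n − 1, which reproduces s = 8.50 for the three raw scores. Because the Z-scores are not rounded here, the unrounded results are about 402.02, 496.08, and 601.90. These round to the same whole scores as the hand calculation, and their mean and s.d. come out as exactly 500 and 100, up to floating-point error.

```python
from statistics import mean, stdev


def rescale(raw_scores, target_mean=500.0, target_sd=100.0):
    """Convert raw scores to Z-scores, then to the target mean and s.d."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample s.d. (divisor n - 1)
    return [target_sd * (x - m) / s + target_mean for x in raw_scores]


scores = rescale([75, 83, 92])
print([round(v) for v in scores])  # [402, 496, 602]
print(round(mean(scores), 6), round(stdev(scores), 6))
```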
4.7.
4.7.1. False. The percentile reflects a ranking among candidates, not the actual score on the test.
4.7.2. False. If the actual results have a Gaussian or near-Gaussian distribution, only a small change in score is needed to go from the 50th to the 59th percentile. A much larger change is needed to go from the 90th to the 99th percentile. Thus, Mary made comparatively more progress than John.
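The argument in Answer 4.7.2 can be quantified with a short Python sketch. Following the answer, the sketch assumes a Gaussian distribution of scores and uses the standard-library `statistics.NormalDist` (Python 3.8 or later). It measures, in standard deviations, how far a score must move to climb nine percentile points at the middle of the distribution versus near its top.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # quantile function of the standard Gaussian

low_gap = z(0.59) - z(0.50)   # 50th -> 59th percentile
high_gap = z(0.99) - z(0.90)  # 90th -> 99th percentile
print(round(low_gap, 2), round(high_gap, 2))  # 0.23 1.04
print(round(high_gap / low_gap, 1))
```

Under this assumption, the climb from the 90th to the 99th percentile requires more than four times the score change needed to climb from the 50th to the 59th percentile.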
4.7.3. False. Same problem as 4.7.2. Percentiles give ranks, not scores, and may distort the
©2002 by Chapman & Hall/CRC
6.1.4. The 5 possible ways of tossing a 6 with two dice are 1–5, 2–4, 3–3, 4–2, and 5–1. Thus, the probability of getting a 6 on a single roll is 5/36. The fact that a 6 was just tossed is irrelevant, since each toss is a new or “independent” event. (If you were considering, in advance, whether two consecutive sixes might be tossed, the probability would be (5/36)(5/36) = .019.)
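The counting in Answer 6.1.4 can be checked by brute-force enumeration in a minimal Python sketch. The sketch assumes that a roll means a throw of two fair dice, as the 36 combinations imply, and uses exact fractions to avoid rounding until the final step.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
ways_to_six = [pair for pair in outcomes if sum(pair) == 6]

p_six = Fraction(len(ways_to_six), len(outcomes))
print(ways_to_six)  # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
print(p_six)        # 5/36

# Independent tosses: the probabilities multiply.
print(round(float(p_six * p_six), 3))  # 0.019
```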
6.1.5. Although a roulette wheel may be effectively balanced on its pivot, no wheel is ever perfectly balanced. Accordingly, the roulette ball is more likely to fall in the slots for certain numbers than for others. If you prepare a frequency count of the consecutive outcomes of each rotation, you can begin to see the pattern of the higher-probability outcomes for each wheel on its particular pivot. This “histogram” can then guide you into successful betting. Changing the wheels at regular intervals will destroy the characteristics of the histograms.
6.3. If $\bar{X}$ is the mean, a zone of 1.96 standard deviations around the mean should include 95% of the data. Conversely, a zone of 1.96 standard deviations around the observed value of 40 should have a 95% chance of including the mean. Thus, if we calculate 40 ± (1.96)(12.3) = 40 ± 24.1, the mean can be expected, with 95% chance or confidence, to lie within the zone of 15.9 to 64.1. (This principle will be used more formally in Chapter 7 for determining confidence intervals.)
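The arithmetic in Answer 6.3 fits in a tiny Python sketch. The function name is hypothetical, and z = 1.96 is the value used in the answer.

```python
def interval_around_observation(value, sd, z=1.96):
    """Zone of z standard deviations on either side of an observed value."""
    half_width = z * sd
    return value - half_width, value + half_width


low, high = interval_around_observation(40, 12.3)
print(round(low, 1), round(high, 1))  # 15.9 64.1
```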
6.5.1. As noted in Section 4.9.3, the standard deviation of a proportion p is $\sqrt{pq}$, where q = 1 − p. For the cited data, the result will be $\sqrt{(6/9)(3/9)} = \sqrt{18}/9$. The standard error, estimated as $s/\sqrt{n}$ (here n = 9, so $\sqrt{n} = 3$), will be $(\sqrt{18}/9)/3 = \sqrt{18}/27$. The coefficient of stability will be $(s/\sqrt{n})/p$. Substituting 6/9 for p, we get $(\sqrt{18}/27)/(6/9)$, which becomes $(\sqrt{18}/27)(9/6) = (\sqrt{18}/3)(1/6) = \sqrt{18}/18 = 1/\sqrt{18}$. Since $\sqrt{18}$ lies between 4 and 5, this result lies between 1/5 = .20 and 1/4 = .25, and is obviously much larger than the smaller value (e.g., .1 or .05) needed for a stable central index. (If you actually did the calculation, c.s. = .24.)
6.5.2. The result is now stable because $\sqrt{900} = 30$, so the standard error and c.s. will be 1/10 of their previous values. The main “non-statistical” question is whether the poll was taken from a random sample of all potential voters. If the 900 people were a “convenience sample” (casual passers-by in a single neighborhood, all members of the same club, or respondents to a mailed questionnaire), the sample may be highly biased and unrepresentative, regardless of the stable result. Also, are the sampled people actually likely to vote?
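Answers 6.5.1 and 6.5.2 can be reproduced with a minimal Python sketch. The function name is hypothetical. For the larger poll, the sketch assumes the same proportion (600 of 900), since the answer says only that the standard error and c.s. shrink to 1/10. The sketch estimates the standard error of a proportion as $\sqrt{pq}/\sqrt{n}$ and divides it by p.

```python
import math


def coefficient_of_stability(successes, n):
    """c.s. = (standard error of p) / p, with s.e. estimated as sqrt(pq)/sqrt(n)."""
    p = successes / n
    q = 1 - p
    standard_error = math.sqrt(p * q) / math.sqrt(n)
    return standard_error / p


small = coefficient_of_stability(6, 9)      # the original poll
large = coefficient_of_stability(600, 900)  # assumed: same proportion, n = 900
print(round(small, 2), round(large, 3))        # 0.24 0.024
print(math.isclose(small, 1 / math.sqrt(18)))  # True
```

Because n enters only through $\sqrt{n}$, multiplying the sample size by 100 divides the c.s. by 10. Doing so does nothing to remove the sampling biases described in Answer 6.5.2.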
6.7. Individual answers.
Chapter 7
7.1.
7.1.1. The two-tailed value for $t_{7,.05}$ is 2.365. The 95% confidence interval is $34 \pm (2.365)(5.18/\sqrt{8}) = 34 \pm 4.33$, and extends from 29.67 to 38.33.
7.1.2. For a one-tailed confidence interval of 90%, we want to use $t_{7,.20}$, which is 1.415. The lower border for the interval would be $34 - (1.415)(5.18/\sqrt{8})$, which is 34 − 2.59 = 31.41.
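The intervals in Answers 7.1.1 and 7.1.2 can be recomputed from the summary values with a short Python sketch (n = 8, mean = 34, s = 5.18). To stay within the standard library, the sketch does not compute the critical t values. Instead it hard-codes the table values quoted in the answers, so any other confidence level or sample size would require looking up a new value.

```python
import math

# Summary values from Exercise 7.1.
n, mean, s = 8, 34.0, 5.18
standard_error = s / math.sqrt(n)  # about 1.83

# Critical t values for 7 degrees of freedom, as quoted in the answers.
T_TWO_TAILED_05 = 2.365  # two-tailed alpha = .05
T_TWO_TAILED_20 = 1.415  # two-tailed alpha = .20, i.e., one-tailed .10

# Two-sided 95% interval.
half_width = T_TWO_TAILED_05 * standard_error
print(round(mean - half_width, 2), round(mean + half_width, 2))  # 29.67 38.33

# One-tailed 90% interval: only a lower border.
lower = mean - T_TWO_TAILED_20 * standard_error
print(round(lower, 2))  # 31.41
```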
7.1.3. The extreme values in the data are 28 and 42. Removal of these values changes the mean to 34.86 and 32.86, respectively. The maximum proportional variation is (34 − 32.86)/34 = .03, which does not seem excessive.
7.1.4. The median becomes 36 with removal of any item valued from 28 to 31, and 31 with removal of any item valued from 36 to 40. For the original median of 33.5, the maximum proportional variation is 2.5/33.5 = .07.
7.1.5. According to the jackknife procedure, the coefficient of potential variation for the mean is at most .03 (as noted in Answer 7.1.3). The analogous coefficient for the median (as noted in Answer 7.1.4) is .07. According to the parametric procedure, the standard error of the mean is $5.18/\sqrt{8} = 1.83$, and so its coefficient of potential variation is 1.83/34 = .05. With either the empirical or the parametric procedure, the mean seems more stable than the median, perhaps because the median in this data set comes from the two middlemost values, 31 and 36, which are more widely separated than any other two adjacent members of the data. Without more knowledge of the source of the data or of what one intends to do with the mean, its stability is difficult to evaluate.
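The three coefficients compared in Answers 7.1.3 to 7.1.5 can be gathered in a minimal Python sketch. The eight raw values are not reproduced in this answer key. The sketch therefore derives each leave-one-out mean from the total n × mean = 272 and takes the median shifts (to 36 or 31) directly from Answer 7.1.4. All function names are hypothetical.

```python
import math

n, mean, s = 8, 34.0, 5.18
total = n * mean  # 272


def mean_without(value):
    """Leave-one-out (jackknife) mean after removing one item with this value."""
    return (total - value) / (n - 1)


def proportional_variation(original, changed):
    """Absolute change relative to the original value."""
    return abs(original - changed) / original


# Mean: remove the extreme values 28 and 42 (Answer 7.1.3).
loo_means = [mean_without(28), mean_without(42)]
mean_cpv = max(proportional_variation(mean, m) for m in loo_means)

# Median: 33.5 shifts to 36 or 31 after one removal (Answer 7.1.4).
median_cpv = max(proportional_variation(33.5, m) for m in (36, 31))

# Parametric: standard error divided by the mean (Answer 7.1.5).
parametric_cpv = (s / math.sqrt(n)) / mean

print([round(m, 2) for m in loo_means])  # [34.86, 32.86]
print(round(mean_cpv, 2), round(median_cpv, 2), round(parametric_cpv, 2))  # 0.03 0.07 0.05
```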
Another major decision here is whether to express the risks as events per “subject,” i.e., patient, or events per subject year. The investigators went to great effort to calculate and list subject years, so these are presumably the preferred units of analysis. They will be used in the first set of analyses here. The second set of analyses will use patients as the denominators of “risk.”
The figures in the following tables will provide results used in answers to 10.1.1, 10.1.2, and 10.1.3, as well as 10.1.4 and 10.1.5.
Rates per Subject Year, U.S. Study

Rate                  | Aspirin | Placebo | Increment in Rates | Ratio of Higher to Lower Rate
----------------------|---------|---------|--------------------|------------------------------
Total deaths          | .00398  | .00418  | .000196            | 1.050
Fatal MI              | .000183 | .000478 | .000295            | 2.612
Total rate for MI     | .00255  | .00440  | .00185             | 1.725
Fatal stroke          | .000165 | .000110 | .0000546           | 1.500
Total rate for stroke | .00218  | .00180  | .000377            | 1.211

Rates per Subject Year, U.K. Study

Rate                  | Aspirin | No Aspirin | Increment in Rates | Ratio of Higher to Lower Rate
----------------------|---------|------------|--------------------|------------------------------
Total deaths          | .0143   | .0159      | .0016              | 1.112
Fatal MI              | .00473  | .00496     | .00023             | 1.049
Total rate for MI     | .00898  | .00929     | .00031             | 1.035
Fatal stroke          | .00159  | .000106    | .00043             | 1.500
Total rate for stroke | .00484  | .00412     | .00072             | 1.175
Rates per Subject, U.S. Study

Rate                  | Aspirin | Placebo | Increment in Rates | Ratio of Higher to Lower Rate
----------------------|---------|---------|--------------------|------------------------------
Total deaths          | .0197   | .0206   | .000873            | 1.046
Fatal MI              | .000906 | .002356 | .00145             | 2.600
Total rate for MI     | .0126   | .0217   | .0091              | 1.722
Fatal stroke          | .000816 | .000544 | .000272            | 1.500
Total rate for stroke | .0108   | .00888  | .00192             | 1.216

Rates per Subject, U.K. Study

Rate                  | Aspirin | No Aspirin | Increment in Rates | Ratio of Higher to Lower Rate
----------------------|---------|------------|--------------------|------------------------------
Total deaths          | .0787   | .0883      | .0096              | 1.122
Fatal MI              | .02596  | .02749     | .00153             | 1.059
Total rate for MI     | .0493   | .0515      | .00216             | 1.045
Fatal stroke          | .00875  | .00702     | .00173             | 1.246
Total rate for stroke | .0265   | .0228      | .0037              | 1.162
Note: The ratios of rates are essentially identical, whether the denominators are subject years or subjects. The increments in rates are higher with subjects than with subject years, and they will be used here to calculate the number needed to treat (NNT).
10.1.1. In the U.S. trial, the total death rate ratios, at values of about 1.05, were not impressively higher in the placebo group. The NNT would be 1/.000873 = 1145 persons to prevent one death. The MI rate ratios were impressive, at 2.6 for fatal MI and 1.7 for total MI. The corresponding values of NNT, however, were less impressive, at 1/.00145 = 690 and 1/.0091 = 110. The rate ratios for stroke were elevated for aspirin, but less impressively (at 1.5 for fatal and 1.2 for total stroke). The corresponding NNT values were 1/.000272 = 3676 and 1/.00192 = 521.
10.1.2. The most impressive rate ratios in the U.K. trial were the elevated values for the risk of stroke with aspirin. The total death rate ratios in favor of aspirin were more impressive in the U.K. than in the U.S. trial, but the U.K. rate ratio effects for MI were unimpressive. Nevertheless, the NNT values in the U.K. trial were more impressive than in the U.S. trial in most respects, the exception being total MI, for which the U.S. NNT of 110 was smaller. The U.K. NNT results were 1/.0096 = 104 for total deaths, 1/.00153 = 654 for fatal MI, 1/.00216 = 463 for total MI, 1/.00173 = 578 for fatal stroke, and 1/.0037 = 270 for total stroke.
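The NNT values in Answers 10.1.1 and 10.1.2 follow one rule, which a short Python sketch can apply to every row. The sketch takes the reciprocal of the increment in rates per subject and rounds it to the nearest person. The increments are copied from the per-subject tables, and the dictionary layout and function name are hypothetical.

```python
# Increments in rates per subject, copied from the "Rates per Subject" tables.
increments = {
    "U.S.": {"total deaths": 0.000873, "fatal MI": 0.00145, "total MI": 0.0091,
             "fatal stroke": 0.000272, "total stroke": 0.00192},
    "U.K.": {"total deaths": 0.0096, "fatal MI": 0.00153, "total MI": 0.00216,
             "fatal stroke": 0.00173, "total stroke": 0.0037},
}


def number_needed_to_treat(increment):
    """NNT = 1 / (increment in rates), rounded to the nearest person."""
    return round(1 / increment)


for trial, rows in increments.items():
    for event, inc in rows.items():
        print(trial, event, number_needed_to_treat(inc))
```

Because the NNT is a reciprocal, a small absolute increment yields a large NNT even when the rate ratio is impressive. The fatal MI row of the U.S. trial shows this contrast between ratios and increments.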
10.1.3. The risk of myocardial infarction was the sum of fatal and nonfatal MIs. Per subject year, the proportionate increment for aspirin vs. placebo was −.00185/.00440 = −42%. Per subject, the corresponding result was −.0091/.0217 = −42%. Both of these values are close to, but not exactly, 44%, which may have been calculated with a statistical adjustment for age. The corresponding relative risks are .00255/.00440 = .58 per subject year and .0126/.0217 = .58 per subject.
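The two measures in Answer 10.1.3 can be computed with a minimal Python sketch. The function names are hypothetical, and the rates are the U.S. total-MI values from the tables. The proportionate increment (treated − control)/control always equals the relative risk minus 1, which is why both denominators give −42% together with a relative risk of .58.

```python
def proportional_increment(treated, control):
    """(treated - control) / control, e.g., -.00185/.00440 per subject year."""
    return (treated - control) / control


def relative_risk(treated, control):
    """Ratio of the treated rate to the control rate."""
    return treated / control


# Total MI rates in the U.S. trial: (aspirin, placebo).
per_subject_year = (0.00255, 0.00440)
per_subject = (0.0126, 0.0217)

for label, (aspirin, placebo) in (("per subject year", per_subject_year),
                                  ("per subject", per_subject)):
    print(label,
          f"{proportional_increment(aspirin, placebo):.0%}",
          round(relative_risk(aspirin, placebo), 2))
```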


