Principles of Medical Statistics (Feinstein, 2002)

Simel, D.L., Samsa, G.P., and Matchar, D.B. Likelihood ratios for continuous test results— Making the clinicians’ job easier or harder? J. Clin. Epidemiol. 1993; 46:85–93. [21]

Simes, R.J. An improved Bonferroni procedure for multiple tests of significance. Biometrika 1986; 73:751–754. [25]

Simon, G.A. Efficacies of measures of association for ordinal contingency tables. J. Am. Statist. Assn. 1978; 73:545–551. [27]

Simon, G.E. and VonKorff, M. Reevaluation of secular trends in depression rates. Am. J. Epidemiol. 1992; 135:1411–1422. [22]

Simon, J.L. Basic Research Methods in Social Science. New York: Random House, 1969. [6]

Sinclair, J.C. and Bracken, M.B. Clinically useful measures of effect in binary analyses of randomized trials. J. Clin. Epidemiol. 1994; 47:881–889. [10]

Singer, P.A. and Feinstein, A.R. Graphical display of categorical data. J. Clin. Epidemiol. 1993; 46:231–236. [9,16]

Slavin, R.E. Best evidence synthesis: An intelligent alternative to meta-analysis. J. Clin. Epidemiol. 1995; 48:9–18. [25]

Smirnov, N.V. Tables for estimating the goodness of fit of empirical distributions. Ann. Math. Statist. 1948; 19:279–281. [14]

Smith, A.H. and Bates, M.N. Confidence limit analyses should replace power calculations in the interpretation of epidemiologic studies. Epidemiology 1992; 3:449–452. [24]

Smith, D.E., Lewis, C.E., Caveny, J.L., Perkins, L.L., Burke, G.L., and Bild, D.E. Longitudinal changes in adiposity associated with pregnancy. J. Am. Med. Assn. 1994; 271:1747–1751. [22]

Snedecor, G.W. and Cochran, W.G. Statistical Methods. 7th ed. Ames, Iowa: Iowa State University Press, 1980 (5th ed., 1956). [5,19]

Sorlie, P.D., Thom, T.J., Manolio, T., Rosenberg, H.M., Anderson, R.N., and Burke, G.L. Age-adjusted death rates: Consequences of the year 2000 standard. Ann. Epidemiol. 1999; 9:93–100. [26]

Souhami, R.L. and Whitehead, J. (Eds.). Workshop on early stopping rules in cancer clinical trials. Statist. Med. 1994; 13:1289–1499. [25]

Sox, H. (Ed.). Common Diagnostic Tests: Use and Interpretation. Philadelphia: American College of Physicians, 1987. [21]

Spear, M. Charting Statistics. New York: McGraw-Hill Book Co., Inc., 1952. [5]

Spearman, C. General intelligence objectively determined and measured. Am. J. Psychol. 1904; 15:201–293. [28]

Special Writing Group of the Committee on Rheumatic Fever, Endocarditis, and Kawasaki Disease of the Council on Cardiovascular Disease in the Young of the American Heart Association. Guidelines for the diagnosis of rheumatic fever. Jones Criteria, 1992 update. JAMA 1993; 269:476. [21]

Spitzer, R.L., Fleiss, J.L., Kernohan, W., Lee, J., and Baldwin, I.T. The Mental Status Schedule: Comparing Kentucky and New York schizophrenics. Arch. Gen. Psychiatry 1965; 12:448–455. [15]

Spitzer, W.O. (Ed.). Potsdam International Consultation on Meta-Analysis. (Special issue) J. Clin. Epidemiol. 1995; 48:1–171. [25]

Sprent, P. Applied Nonparametric Statistical Methods. 2nd ed. London: Chapman and Hall, 1993. [15,20,27]

SPSSX User’s Guide. 2nd ed. Chapter 28, General linear models. Chicago, IL: SPSS Inc., 1986, 477–552. [5]

Stacpoole, P.W., Wright, E.C., Baumgartner, T.G. et al. A controlled clinical trial of dichloroacetate for treatment of lactic acidosis in adults. N. Engl. J. Med. 1992; 327:1564–1569. [10]

Staniloff, H.M., Diamond, G.A., Forrester, J.S., Pollock, B.H., Berman, D.S., and Swan, H.J.C. The incremental information boondoggle: When a test result seems powerful but is not. Circulation 1982; 66:184 (Abstract). [21]

Stead, E.A., Jr. Response to Letter to editor. Circulation 1978; 57:1232. [19]

Steen, P.M., Brewster, A.C., Bradbury, R.C., Estabrook, E., and Young, J.A. Predicted probabilities of hospital death as admission severity of illness. Inquiry 1993; 30:128–141. [21]

© 2002 by Chapman & Hall/CRC

Steering Committee of the Physicians’ Health Study Research Group. Final report on the aspirin component of the ongoing Physicians’ Health Study. N. Engl. J. Med. 1989; 321:129–135. [10,14,24]

Streiner, D.L. and Norman, G.R. Health Measurement Scales. A Practical Guide to Their Development and Use. 2nd ed. Oxford: Oxford University Press, 1995. [28]

Stephen, S.A. et al. Propranolol in acute myocardial infarction. Lancet 1966; 2:1435–1438. [23]

Stevens, S.S. On the theory of scales of measurement. Science 1946; 103:677–680. [2]

Student. The probable error of a mean. Biometrika 1908; 6:1. [6,7]

Stukel, T.A. Comparison of methods for the analysis of longitudinal interval count data. Stat. Med. 1993; 12:1339–1351. [22]

Sulmasy, D.P., Haller, K., and Terry, P.B. More talk, less paper: Predicting the accuracy of substituted judgments. Am. J. Med. 1994; 96:432–438. [20]

Surgeon General’s Advisory Committee On Smoking and Health. Smoking and Health 1964. United States Department of Health, Education and Welfare, Public Health Service Publication No. 1103. [13]

Tasaki, T., Ohto, H., Hashimoto, C., Abe, R., Saitoh, A., and Kikuchi, S. Recombinant human erythropoietin for autologous blood donation: Effects on perioperative red-blood-cell and serum erythropoietin production. Lancet 1992; 339:773–775. [19]

Tate, M.W. and Clelland, R.C. Non-Parametric and Shortcut Statistics. Danville, IL: Interstate Printers and Publishers, 1957. [27]

Teigen, K.H. Studies in subjective probability. III: The unimportance of alternatives. Scand. J. Psychol. 1983; 24:97–105. [6]

Thielemans, A., Hopke, P.K., De Quint, P., Depoorter, A.M., Thiers, G., and Massart, D.L. Investigation of the geographical distribution of female cancer patterns in Belgium using pattern recognition techniques. Int. J. Epidemiol. 1988; 17:724–731. [28]

Thomas, D.C., Siemiatycki, J., Dewar, R., Robins, J., Goldberg, M., and Armstrong, B.G. The problem of multiple inference in studies designed to generate hypotheses. Am. J. Epidemiol. 1985; 122:1080–1095. [25]

Thompson, J.D., Fetter, R.B., and Mross, C.D. Case mix and resource use. Inquiry 1975; 12:300–312. [21]

Thompson, J.R. Invited Commentary: Re: Multiple comparisons and related issues in the interpretation of epidemiologic data. Am. J. Epidemiol. 1998; 147:801–806. [25]

Thompson, S.G. and Pocock, S.J. Can meta-analyses be trusted? Lancet 1991; 338:1127–1130. [25]

Tibshirani, R. A plain man’s guide to the proportional hazards model. Clin. Invest. Med. 1982; 5:63–68. [22]

Tomei, R., Rossi, L., Carbonieri, E., Franceschini, L., Molon, G., and Zardini, P. Antihypertensive effect of lisinopril assessed by 24-hour ambulatory monitoring: A double-blind, placebo-controlled, cross-over study. J. Cardiovasc. Pharmacol. 1992; 19:911–914. [29]

Trentham, D.E., Dynesius-Trentham, R.A., Orav, E.J. et al. Effects of oral administration of Type II collagen on rheumatoid arthritis. Science 1993; 261:1727–1730. [15]

Tsai, S.P., Lee, E.S., and Kautz, J.A. Changes in life expectancy in the United States due to declines in mortality, 1968–1975. Am. J. Epidemiol. 1982; 116:376–384. [22]

Tufte, E.R. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983. [16]

Tufte, E.R. Envisioning Information. Cheshire, CT: Graphics Press, 1990. [16]

Tukey, J.W. Comparing individual means in the analysis of variance. Biometrics 1949; 5:99–114. [25]

Tukey, J.W. Bias and confidence in not-quite large samples. Ann. Math. Statist. 1958; 29:614. [6]

Tukey, J.W. The problem of multiple comparisons. Unpublished notes. Princeton University, 1953. [Discussed in Bancroft, T.A. Topics in Intermediate Statistical Methods. Vol. 1. Ames: Iowa State University Press, 1968, 100–112.] [29]

Tukey, J.W. Some graphic and semigraphic displays. Chapter 18, pgs. 295–296. In Bancroft, T.A. (Ed.). Statistical Papers in Honor of George W. Snedecor. Ames, Iowa: Iowa State University Press, 1972. [3]

Tukey, J.W. Exploratory Data Analysis. Reading, MA: Addison-Wesley, 1977. [2,3,5]

Twisk, J.W.R., Kemper, H.C.G., and Mellenbergh, G.J. Mathematical and analytical aspects of tracking. Epidemiol. Rev. 1994; 16:165–183. [22]


Ulm, K. A simple method to calculate the confidence interval of a standardized mortality ratio (SMR). Am. J. Epidemiol. 1990; 131:373–375. [26]

Uretsky, B.F., Jessup, M., Konstam, M.A. et al. Multicenter trial of oral enoximone in patients with moderate to moderately severe congestive heart failure. Circulation 1990; 82:774–780. [23]

Vandenbroucke, J.P. A shortcut method for calculating the 95 per cent confidence interval of the standardized mortality ratio (Letter to editor). Am. J. Epidemiol. 1982; 115:303–304. [26]

Vandenbroucke, J.P. and Pardoel, V.P.A.M. An autopsy of epidemiologic methods: the case of “poppers” in the early epidemic of the acquired immunodeficiency syndrome (AIDS). Am. J. Epidemiol. 1989; 129:455–457. [19]

Vessey, M., Doll, R., Peto, R., Johnson, B., and Wiggins, P. A long-term follow-up study of women using different methods of contraception—An interim report. J. Biosoc. Sci. 1976; 8:373–427. [25]

Viberti, G. et al. Early closure of European Pimagedine trial (Letter to editor). Lancet 1997; 350:214–215. [25]

Vollset, S.E. Confidence intervals for a binomial proportion. Stat. Med. 1993; 12:809–824. [8]

von Knorring, L. and Lindstrom, E. Principal components and further possibilities with the PANSS. Acta Psychiatr. Scand. 1995; 91(Suppl 388): 5–10. [28]

Wagner, G.S., Cebe, B., and Rozen, M.P. (Eds.). E.A. Stead, Jr.: What This Patient Needs Is a Doctor. Durham, NC: Academic Press, 1978. [19]

Wald, A. Sequential Analysis. New York: John Wiley and Sons, 1947. [25]

Wald, N. Use of MoMs (Letter to the editor). Lancet 1993; 341:440. [5]

Waller, L.A. and Turnbull, B.W. Probability plotting with censored data. Am. Statist. 1992; 46:5–12. [22]

Waller, L.A., Turnbull, B.W., Gustafsson, G., Hjalmars, U., and Andersson, B. Detection and assessment of clusters of disease: An application to nuclear power plant facilities and childhood leukemia in Sweden. Stat. Med. 1995; 14:3–16. [28]

Wallis, W.A. and Roberts, H.V. Statistics: A New Approach. New York: The Free Press, 1956. [5,14]

Wallsten, T.S. and Budescu, D.V. Comment on “Quantifying probabilistic expressions.” Statist. Sci. 1990; 5:23–26. [6]

Walravens, P.A., Chakar, A., Mokni, R., Denise, J., and Lemonnier, D. Zinc supplements in breastfed infants. Lancet 1992; 340:683–685. [4]

Walter, S.D. Statistical significance and fragility criteria for assessing a difference of two proportions. J. Clin. Epidemiol. 1991; 44:1373–1378. [11]

Walter, S.D. Visual and statistical assessment of spatial clustering in mapped data. Statist. Med. 1993; 12:1275–1291. [25]

Wastell, D.G. and Gray, R. The numerical approach to classification: A medical application to develop a typology for facial pain. Statist. Med. 1987; 6:137–146. [28]

Weinberg, A.D., Pals, J.K., McGlinchey-Berroth, R., and Minaker, K.L. Indices of dehydration among frail nursing home patients: Highly variable but stable over time. J. Am. Geriat. Soc. 1994; 42:1070–1073. [29]

Weinstein, M.C. and Fineberg, H.V. Clinical Decision Analysis. Philadelphia: Saunders, 1980. [21]

Weiss, J.S., Ellis, C.N., Headington, J.T., Tincoff, T., Hamilton, T.A., and Voorhees, J.J. Topical tretinoin improves photoaged skin. A double-blind vehicle-controlled study. JAMA 1988; 259:527–532. [15]

Welch, B.L. On the Z test in randomized blocks and Latin squares. Biometrika 1937; 29:21–52. [12]

Westfall, P.H. and Young, S.S. Resampling-Based Multiple Testing: Examples and Methods for p-Values Adjustment. New York: John Wiley and Sons, 1993. [25]

Westlake, W.J. Use of confidence intervals in analysis of comparative bioavailability trials. J. Pharm. Sci. 1972; 61:1340–1341. [24]

Westlake, W.J. Statistical aspects of comparative bioavailability trials. Biometrics 1979; 35:273–280. [24]

Whitehead, J. The case for frequentism in clinical trials. Statist. Med. 1993; 12:1405–1413. [11]

WHO Working Group. Use and interpretation of anthropometric indicators of nutritional status. Bull. WHO 1986; 64:929–941. [4]


Wiggs, J., Nordenskjold, M., Yandell, D. et al. Prediction of the risk of hereditary retinoblastoma, using DNA polymorphisms within the retinoblastoma gene. N. Engl. J. Med. 1988; 318:151–157. [21]

Wilcoxon, F. Individual comparisons by ranking methods. Biometrics Bull. 1945; 1:80–82. [15]

Wilk, M.B. and Gnanadesikan, R. Probability plotting methods for the analysis of data. Biometrika 1968; 55:1–17. [16]

Williams, K. The failure of Pearson’s goodness of fit statistic. Statistician 1976; 25:49. [14]

Wintemute, G.J. Handgun availability and firearm mortality (Letter to editor). Lancet 1988; 335:1136–1137. [19]

Wolleswinkel-van den Bosch, J.H., Looman, C.W., van Poppel, F.W., and Mackenbach, J.P. Cause-specific mortality trends in the Netherlands, 1875–1992: A formal analysis of the epidemiologic transition. Int. J. Epidemiol. 1997; 26:772–779. [28]

Wolthuis, R.A., Froelicher, V.F., Jr., Fischer, J., and Triebwasser, J.H. The response of healthy men to treadmill exercise. Circulation 1977; 55:153–157. [16]

Woolf, B. On estimating the relation between blood group and disease. Ann. Hum. Genet. 1955; 19:251–253. [17]

Woolf, B. The log likelihood ratio test (the G-test). Methods and tables for tests of heterogeneity in contingency tables. Ann. Hum. Genetics 1957; 21:397–409. [14]

Wright, J.G. and Feinstein, A.R. A comparative contrast of clinimetric and psychometric methods for constructing indexes and rating scales. J. Clin. Epidemiol. 1992; 45:1201–1218. [28]

Wright, J.G., McCauley, T.R., Bell, S.M., and McCarthy, S. The reliability of radiologists’ quality assessment of MR pelvic scans. J. Comput. Assist. Tomogr. 1992; 16:592–596. [27]

Wulff, H.R. Rational Diagnosis and Treatment. 2nd ed. Oxford: Blackwell Scientific, 1981. [4,21]

Wynder, E.L. Workshop on guidelines to the epidemiology of weak associations. Prev. Med. 1987; 16:139–141. [17,24]

Wynder, E.L., Bross, I.D.J., and Hirayama, T. A study of the epidemiology of cancer of the breast. Cancer 1960; 13:559–601. [15]

Yates, F. Contingency tables involving small numbers and the χ² test. J. Roy. Statist. Soc. Suppl. 1934; 1:217–235. [14]

Yates, F. Tests of significance for 2 × 2 contingency tables. J. Roy. Statist. Soc. Series A 1984; 147:426–463. [14]

Yerushalmy, J. Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques. Public Health Rep. 1947; 62:1432–1449. [21]

Yerushalmy, J. A mortality index for use in place of the age-adjusted death rate. Am. J. Public Health 1951; 41:907–922. [26]

Yerushalmy, J. The statistical assessment of the variability in observer perception and description of roentgenographic pulmonary shadows. Radiol. Clin. North Am. 1969; 7:381–392. [20]

Youden, W.J. Quoted in Tufte, E.R. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983. [9]

Zeger, S.L. and Liang, K-Y. An overview of methods for the analysis of longitudinal data. Stat. Med. 1992; 11:1825–1839. [22]

Zelen, M. The education of biometricians. Am. Statist. 1969; 23:14–15. [24]

Zerbe, G.O., Wu, M.C., and Zucker, D.M. Studying the relationship between change and initial value in longitudinal studies. Stat. Med. 1994; 13:759–768. [22]

Zmuda, J.M., Cauley, J.A., Kriska, A., Glynn, N.W., Gutai, J.P., and Kuller, L.H. Longitudinal relation between endogenous testosterone and cardiovascular disease risk factors in middle-aged men. Am. J. Epidemiol. 1997; 146:609–617. [22]


Answers to Exercises

The following “official” answers are for Exercises that have an odd second number. Answers to the even-numbered Exercises are contained in an Instructor’s Manual.

Chapter 1

1.1. 4. (Architecture)

1.3. 3. (Stochastic)

1.5. 1. (A descriptive summary, expressed in nonquantitative terms.)

1.7. 5. (A decision about classifying the raw data. If you cited 6 as the answer, you can also be justified, although data processing usually refers to the conversion of “sick” into a coded category, such as “1 in column 37 of card 6,” rather than the data acquisition process that converts raw data, such as “Apgar 6,” into a designated category, such as “sick.” Because the distinctions are somewhat blurred, either answer is acceptable.)

1.9. 5, if you think the research was done to demonstrate the quality of the raw data; 4, if you think the research has an architectural role with respect to biased observations. Either answer is acceptable.

1.11. 4. (Architecture)

Chapter 2

2.1.

2.1.1. Age, height, and each blood pressure measurement are dimensional. Sex is binary. Diagnosis is nominal. Treatment is nominal, if different agents are used; dimensional, if different dosages of the same agent are given in the same schedule; ordinal, if the dosages of the same agent can be ranked but cannot be arranged in exact dimensions because of the different schedules.

2.1.2. To analyze blood pressure responses, some of the following changes might be examined:

Ultimate Effect: Increment of before and after values.

Immediate Effect: Increment of before and during values.

Therapeutic Trend: Rating of rise, stable, fall, up-and-down, down-and-up, etc., for the trend observed in the three consecutive BP values.

Ultimate Success: Deciding if the blood pressure after treatment is normal or abnormal.

2.3. Examples:

“How do you feel today compared with yesterday?”

“What has happened to your pain since you received the medication?”

“What sort of change has occurred in your child’s temperature?”

“How much heavier have you gotten since your last birthday?”

2.5. The TNM index expressions are composite “profiles” that contain three single-state variables. They are nominal and cannot be readily ranked because they do not have an aggregated expression. In the TNM staging system, each TNM profile index is assigned to an ordinal stage such as I, II, III, … . If desired, the TNM indexes could be ordinally ranked according to their locations in the staging system.


2.7. There are no real advantages. The trivial saving of effort in citing one rather than two digits when the data are extracted will be followed by massive disadvantages when the data are later analyzed. The investigator will be unable to determine any averages (such as means or medians), unable to find any trends within the same decade, and unable to find distinctions that cross decades (such as the age group 35–54). The moral of the story is: Always enter dimensional data in their original dimensions or in direct transformations (such as kg to lb) that preserve the original dimensions. Never compress dimensional data when they are first recorded; the compression can always be done later, during the analyses.

2.9. Individual answers.

Chapter 3

3.1. The median is probably the best choice for this right-skewed distribution.

3.3. Mean = $\bar{X} = 120.10$. For these 20 numbers, the median is between ranks 10 and 11. The actual values at these ranks are 91 and 96, and so median = (91 + 96)/2 = 93.5. Mode is 97. Geometric mean is $(3.198952738 \times 10^{40})^{1/20} = (3.198952738 \times 10^{40})^{.05} = 105.99$.
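The geometric mean in Answer 3.3 is the 20th root of the product of the 20 values. The Python sketch below reproduces that step from the product quoted above (the 20 raw values are not listed here) and also shows the equivalent log-averaging form, which avoids forming a huge product. The function names are illustrative, not from the text.

```python
import math
from typing import Sequence


def geometric_mean_from_product(product: float, n: int) -> float:
    """n-th root of the product of n positive values."""
    return product ** (1.0 / n)


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean via the average of natural logarithms (no huge product)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Product of the 20 values and the count, as quoted in Answer 3.3.
print(round(geometric_mean_from_product(3.198952738e40, 20), 2))  # 105.99
print(round(geometric_mean([2.0, 8.0]), 6))  # 4.0
```

Both routes give the same answer; the log form is the safer choice when the data set is large.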

Chapter 4

4.1. The data set contains 56 members.

4.1.1. For lower quartile, (.25)(56) = 14, and the rank is between 14 and 15. The values of 17 appear at both the 14th and 15th rank, and so $Q_L = 17$. For upper quartile, (.75)(56) = 42, and rank is between 42 and 43. $Q_U = 28$, which occupies ranks 41–44. [With the r = P(n + 1) formula, (.25)(57) = 14.25 and so $Q_L$ is at 17 between 14th and 15th rank. For $Q_U$, (.75)(57) = 42.75, which is again between rank 42 and 43.]

4.1.2. (.025)(56) = 1.4, which will become rank 2, at which the value is 12 for $P_{.025}$. For $P_{.975}$, (.975)(56) = 54.6, which will become rank 55, at which the value is 41. [With the r = P(n + 1) formula, (.025)(57) = 1.4, which would put the 2.5 percentile value at 11.5, between ranks 1 and 2. Since (.975)(57) = 55.6, the 97.5 percentile value is at 42 between the 55th and 56th ranks.]

4.1.3. The value of 30 in Table 3.1 is at the 48th rank. At rank 47, 47/56 = .839; and at rank 48, the cumulative proportion is 48/56 = .857. Therefore 30 occupies both the 84th and 85th percentiles. [With the formula P = r/(n + 1), we get 48/57 = .842, which would be the 84th percentile.]
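The two rank conventions used in Answers 4.1.1 to 4.1.3 can be sketched in Python as below. This is a minimal illustration under stated assumptions: `percentile_pn` follows the r = Pn rule as applied above (an integer r means "between ranks r and r + 1"; a fractional r moves up to the next rank), and `percentile_pn1` follows the r = P(n + 1) rule, taking the midpoint of the two bracketing ranks as the answers appear to do. All three functions are hypothetical helpers. Because Table 3.1 is not reproduced here, the usage example uses the ranks 1 to 56 themselves as stand-in data, so each result shows a rank position directly.

```python
import math
from typing import Sequence


def _value_at(sorted_data: Sequence[float], rank: int) -> float:
    """Value at a 1-based rank in ascending data."""
    return sorted_data[rank - 1]


def percentile_pn(sorted_data: Sequence[float], p: float) -> float:
    """Percentile by the r = P*n rule, as applied in Answers 4.1.1 and 4.1.2."""
    r = p * len(sorted_data)
    if math.isclose(r, round(r)):
        k = round(r)
        # Integer r: the percentile lies between ranks r and r + 1.
        return (_value_at(sorted_data, k) + _value_at(sorted_data, k + 1)) / 2
    # Fractional r: move up to the next whole rank (e.g., 1.4 -> rank 2).
    return _value_at(sorted_data, math.ceil(r))


def percentile_pn1(sorted_data: Sequence[float], p: float) -> float:
    """Percentile by the r = P*(n + 1) rule, using the midpoint of bracketing ranks."""
    r = p * (len(sorted_data) + 1)
    if math.isclose(r, round(r)):
        return _value_at(sorted_data, round(r))
    lo, hi = math.floor(r), math.ceil(r)
    return (_value_at(sorted_data, lo) + _value_at(sorted_data, hi)) / 2


def cumulative_proportion(rank: int, n: int, plus_one: bool = False) -> float:
    """Percentile rank of the value at a given rank: r/n, or r/(n + 1)."""
    return rank / (n + 1 if plus_one else n)


ranks = list(range(1, 57))  # stand-in data: each value equals its rank, n = 56
print(percentile_pn(ranks, 0.25), percentile_pn(ranks, 0.025))    # 14.5 2
print(percentile_pn1(ranks, 0.025), percentile_pn1(ranks, 0.975))  # 1.5 55.5
print(round(cumulative_proportion(48, 56), 3),
      round(cumulative_proportion(48, 56, plus_one=True), 3))     # 0.857 0.842
```

The helpers do not guard the extreme cases (p near 0 or 1), where the bracketing rank would fall outside the data.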

4.3. If the data were Gaussian, the positive and negative Z-scores would be symmetrically distributed around the mean. They are not.

4.5. Let each candidate’s raw score be $X_i$. From the array of candidate raw scores, calculate $\bar{X}$ and $s$ and then calculate $Z_i = (X_i - \bar{X})/s$ for each candidate. These results will have a mean of zero and s.d. of 1. To make the s.d. 100, multiply each $Z_i$ by 100, to get $100Z_i$. The results will have a mean of zero and s.d. of 100. Then add 500 to each $100Z_i$. The results will have a mean of 500 and s.d. of 100. Thus, the formula is: Final Score = $[100(X_i - \bar{X})/s] + 500$.

Practical Demonstration of the Formula:

Three candidates get 75, 83, and 92 in the raw scores. For these data, $\bar{X} = 83.33$ and $s = 8.50$. The original $Z_i$ scores for the three candidates will be (75 − 83.33)/8.50 = −0.98, (83 − 83.33)/8.50 = −0.04, and (92 − 83.33)/8.50 = 1.02. Multiplied by 100, these scores become −98, −4, and 102. When 500 is added, the scores become 402, 496, and 602. For these three values, a check on your calculator will confirm that $\bar{X} = 500$ and $s = 100$.
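The calculator check above can also be done with a short Python sketch. It assumes, as the demonstration does, that s is the sample standard deviation with divisor n − 1 (8.50 for these three scores); `rescale_scores` is an illustrative name.

```python
import statistics
from typing import List, Sequence


def rescale_scores(raw: Sequence[float], new_mean: float = 500.0,
                   new_sd: float = 100.0) -> List[float]:
    """Z-score each raw value, then map it to a scale with the chosen mean and s.d."""
    mean = statistics.mean(raw)
    sd = statistics.stdev(raw)  # sample s.d. (divisor n - 1)
    return [new_sd * (x - mean) / sd + new_mean for x in raw]


final = rescale_scores([75, 83, 92])
print([round(score) for score in final])  # [402, 496, 602]
print(round(statistics.mean(final), 6), round(statistics.stdev(final), 6))  # 500.0 100.0
```

Because the transformation is linear, the new mean and s.d. are exactly 500 and 100 (up to floating-point rounding) for any set of raw scores.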

4.7.

4.7.1. False. The percentile reflects a ranking among candidates, not the actual score on the test.

4.7.2. False. If the actual results have a Gaussian or near-Gaussian distribution, only a small change is needed to go from the 50th to 59th percentile. A much larger change is needed to go from the 90th to 99th percentile. Thus, Mary made comparatively more progress than John.

4.7.3. False. Same problem as 4.7.2. Percentiles give ranks, not scores, and may distort the magnitudes of differences in actual scores. The three cited percentiles (84, 98, and 99.9) are roughly 1, 2, and 3 standard deviations above the mean in a Gaussian distribution, so that the actual scores are about equally spaced.

4.7.4. False. In this “middle” zone of the distribution, where data are usually most abundant, a small change in actual score can produce large changes in percentiles.

4.7.5. Uncertain. The overall percentile scores for each student last year represent a comparative ranking among all students who took the test last year. The percentile rating averaged for different students will not be a meaningful number. On the other hand, if percentiles (rather than actual test scores) are the only information available to the dean, he has no other option if he wants to compare this year’s results.
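Answers 4.7.2 and 4.7.3 rest on the nonlinear link between Gaussian percentiles and standard-deviation units. A minimal Python sketch using the standard library's `statistics.NormalDist`, assuming an exactly Gaussian distribution of scores, makes the contrast concrete; `sd_units_between` is an illustrative name.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # Gaussian with mean 0 and s.d. 1


def sd_units_between(p_low: float, p_high: float) -> float:
    """Distance, in s.d. units, between two percentiles of a Gaussian distribution."""
    return STANDARD_NORMAL.inv_cdf(p_high) - STANDARD_NORMAL.inv_cdf(p_low)


print(round(sd_units_between(0.50, 0.59), 2))  # 0.23 (middle of the curve)
print(round(sd_units_between(0.90, 0.99), 2))  # 1.04 (upper tail)
# Percentiles reached at 1, 2, and 3 s.d. above the mean
print([round(100 * STANDARD_NORMAL.cdf(z), 1) for z in (1, 2, 3)])  # [84.1, 97.7, 99.9]
```

The same nine-percentile gain costs about 0.23 s.d. near the middle but about 1.04 s.d. in the upper tail.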

4.9. As Joe’s lawyer, you argue that “grading on a curve” is an abominable way to assess competence. If you give the exam exclusively to an assembled collection of the most superb omphalologists in the country, some of them will nevertheless fail if “graded on a curve.” Why should Joe, whose actual score of 72% would usually be accepted as passing, be failed because of the performance of other people?

As lawyer for the Board, you argue that it is impossible to determine an absolute passing score for the complex examination. Accordingly, the passing score is determined from a point on the curve of scores obtained by a reference group of Joe’s peers. He failed because his performance was worse than that of other people with presumably similar backgrounds and training. (This approach has been used, and successfully defended despite legal attacks, by almost all certification Boards in the U.S.)

Chapter 5

5.1. It seems peculiar to assemble a group of people who have been deliberately chosen for all being “healthy,” and then to exclude 5% of them arbitrarily as being outside the “range of normal.” Why not include the full range of data in the healthy, i.e. “normal,” people?

5.3. Individual answers.

5.5.

5.5.1. Let $s = \sqrt{v/(n-1)}$ and $s' = \sqrt{v/n}$. Since $s^2 = v/(n-1)$, we can substitute $s^2(n-1)$ for $v$ to get $s' = \sqrt{s^2(n-1)/n}$. Accordingly, in the 56-item data set, $s' = \sqrt{(7.693)^2(55)/56} = 7.624$.

5.5.2. $cv = s/\bar{X} = 7.693/22.732 = .338$.

5.5.3. The lower quartile is at rank (.25)(56) = 14, between 14 and 15. The upper quartile is at rank (.75)(56) = 42, between 42 and 43. In Table 3.1, the value of 17 occupies ranks 12–16, and the value 28 occupies ranks 41–44. The quartile coefficient of variation will be (28 − 17)/(28 + 17) = 11/45 = .24.
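The three calculations in Answer 5.5 can be reproduced with the Python sketch below. The function names are illustrative, and the inputs (s = 7.693, n = 56, mean 22.732, quartiles 17 and 28) are the values quoted above.

```python
import math


def sd_with_divisor_n(s: float, n: int) -> float:
    """Convert s (divisor n - 1) into the s.d. computed with divisor n."""
    return math.sqrt(s ** 2 * (n - 1) / n)


def coefficient_of_variation(s: float, mean: float) -> float:
    """cv = s / mean."""
    return s / mean


def quartile_coefficient_of_variation(q_lower: float, q_upper: float) -> float:
    """(QU - QL) / (QU + QL)."""
    return (q_upper - q_lower) / (q_upper + q_lower)


print(round(sd_with_divisor_n(7.693, 56), 3))              # 7.624
print(round(coefficient_of_variation(7.693, 22.732), 3))    # 0.338
print(round(quartile_coefficient_of_variation(17, 28), 2))  # 0.24
```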

5.7. Individual answers.

Chapter 6

6.1.

6.1.1. Since each tossed coin can emerge as a head or tail, there are four possibilities: HH, HT, TH, and TT. The chance is 1/4 each for getting two heads or two tails. The chance is 2/4 = 1/2 for getting a head and a tail.

6.1.2. Each of the two dice has six sides marked 1, 2, ..., 6. When tossed, the pair can have 36 possible outcomes, ranging from 1-1 to 6-6. A value of 7 occurs on six occasions: 1–6, 2–5, 3–4, 4–3, 5–2, and 6–1. Thus, the probability of a 7 is 6/36 = 1/6.

6.1.3. Common things occur commonly, but uncommon things can also occur if given enough opportunity. A pair of consecutive 7’s can be expected about once in 36 pairs of tosses. If you stay at the dice table long enough to observe 72 or more tosses, two of them could yield consecutive 7’s under ordinary circumstances. If the consecutive pairs of 7’s appear with substantially greater frequency than .03, you might suspect that the dice are “loaded”.


6.1.4. The 5 possible ways of tossing a 6 are 1–5, 2–4, 3–3, 4–2, and 5–1. Thus, the probability of getting a 6 on a single roll is 5/36. The fact that a 6 was just tossed is irrelevant, since each toss is a new or “independent” event. (If you were considering, in advance, whether two consecutive sixes might be tossed, the probability would be (5/36)(5/36) = .019.)

6.1.5. Although effectively balanced on its pivot, no roulette wheel is ever perfectly balanced. Accordingly, the roulette ball is more likely to fall in slots for certain numbers than for others. If you prepare a frequency count of the consecutive outcomes of each rotation, you can begin to see the pattern of the higher-probability outcomes for each wheel on its particular pivot. This “histogram” can then guide you into successful betting. Changing the wheels at regular intervals will destroy the characteristics of the histograms.
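The counting arguments in Answers 6.1.1 to 6.1.4 can be verified by brute-force enumeration. This Python sketch assumes fair coins and fair six-sided dice, uses exact fractions so the results match the hand calculations, and treats consecutive tosses as independent (so a pair of outcomes has the product of their probabilities). The helper `probability` is an illustrative name.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Iterable, Tuple


def probability(outcomes: Iterable[Tuple], event: Callable[[Tuple], bool]) -> Fraction:
    """Share of equally likely outcomes for which event(outcome) is true."""
    outcomes = list(outcomes)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


coins = list(product("HT", repeat=2))        # HH, HT, TH, TT
dice = list(product(range(1, 7), repeat=2))  # 36 ordered outcomes, 1-1 to 6-6

p_head_and_tail = probability(coins, lambda c: set(c) == {"H", "T"})
p_seven = probability(dice, lambda d: sum(d) == 7)
p_six = probability(dice, lambda d: sum(d) == 6)

print(p_head_and_tail)                     # 1/2
print(p_seven, p_seven ** 2)               # 1/6 1/36  (a 7; two consecutive 7's)
print(p_six, round(float(p_six ** 2), 3))  # 5/36 0.019
```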

6.3. If X is the mean, a zone of 1.96 standard deviations around the mean should include 95% of the data. Conversely, a zone of 1.96 standard deviations around the observed value of 40 should have a 95% chance of including the mean. Thus, if we calculate 40 ± (1.96)(12.3) = 40 ± 24.1, the mean can be expected, with 95% chance or confidence, to lie within the zone of 15.9 to 64.1. (This principle will be used more formally in Chapter 7 for determining confidence intervals.)

6.5.1. As noted in Section 4.9.3, the standard deviation of a proportion $p$ is $\sqrt{pq}$. For the cited data, the result will be $\sqrt{(6/9)(3/9)} = \sqrt{18}/9$. The standard error, estimated as $s/\sqrt{n}$, will be $(\sqrt{18}/9)/3 = \sqrt{18}/27$. The coefficient of stability will be $(s/\sqrt{n})/p$. Substituting 6/9 for $p$, we get $(\sqrt{18}/27)/(6/9)$, which becomes $(\sqrt{18}/27)(9/6) = (\sqrt{18}/3)(1/6) = \sqrt{18}/18 = 1/\sqrt{18}$. Since $\sqrt{18}$ lies between 4 and 5, this result lies between 1/5 = .20 and 1/4 = .25, and is obviously much larger than the smaller value (e.g., .1 or .05) needed for a stable central index. (If you actually did the calculation, c.s. = .24.)

6.5.2. The result is now stable because $\sqrt{900} = 30$, ten times $\sqrt{9} = 3$, and so the standard error and c.s. will be 1/10 of their previous values. The main “non-statistical” question is whether the poll was taken from a random sample of all potential voters. If the 900 people were a “convenience sample” (casual passers-by in a single neighborhood, all members of the same club, or respondents to a mailed questionnaire), the sample may be highly biased and unrepresentative, regardless of the stable result. Also, are the sampled people actually likely to vote?
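The coefficient of stability in Answers 6.5.1 and 6.5.2 can be computed directly. A minimal Python sketch, assuming the same definitions as above (s.d. = √(pq), standard error = s.d./√n, c.s. = standard error/p) and assuming, as Answer 6.5.2 does, that the proportion stays at 6/9 when n grows to 900; the function name is illustrative.

```python
import math


def coefficient_of_stability(p: float, n: int) -> float:
    """Standard error of a proportion divided by the proportion itself."""
    sd = math.sqrt(p * (1 - p))         # s.d. of a proportion, sqrt(pq)
    standard_error = sd / math.sqrt(n)
    return standard_error / p


print(round(coefficient_of_stability(6 / 9, 9), 3))    # 0.236, i.e. 1/sqrt(18)
print(round(coefficient_of_stability(6 / 9, 900), 4))  # 0.0236, one-tenth as large
```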

6.7. Individual answers.

Chapter 7

7.1.

7.1.1. The two-tailed value for $t_{7,.05}$ is 2.365. The 95% confidence interval is $34 \pm (2.365)(5.18/\sqrt{8}) = 34 \pm 4.33$, and extends from 29.67 to 38.33.

7.1.2. For a one-tailed confidence interval of 90%, we want to use $t_{7,.20}$, which is 1.415. The lower border for the interval would be $34 - (1.415)(5.18/\sqrt{8})$, which is 34 − 2.59 = 31.41.

7.1.3. The extreme values in the data are 28 and 42. Removal of these values changes the respective means to 34.86 and 32.86. The maximum proportional variation is (34 − 32.86)/34 = .03, which does not seem excessive.

7.1.4. The median is 36 with removal of any item from 28–31, and 31 with removal of any item from 36–40. For the original median of 33.5, the maximum proportional variation is 2.5/33.5 = .07.

7.1.5. According to the jackknife procedure, the coefficients of potential variation for the mean are a maximum of .03 (as noted in Answer 7.1.3). The analogous coefficient (as noted in Answer 7.1.4) for the median is .07. According to the parametric procedure, the standard error of the mean is $5.18/\sqrt{8} = 1.83$, and so its coefficient of potential variation is 1.83/34 = .05. With either the empirical or parametric procedure, the mean seems more stable than the median, perhaps because the median in this data set comes from the two middlemost values, 31 and 36, which are more widely separated than any two other adjacent members of the data. Without knowing more about the source of the data or what one intends to do with the mean, its stability is difficult to evaluate.

Using the parametric result (.05) for the coefficient of potential variation would in general produce a more cautious decision than the jackknife results. In this instance, if we accept the parametric coefficient of .05 as indicating stability, the mean might be regarded as stable.
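The calculations in Answers 7.1.1 to 7.1.5 can be reproduced from the summary values quoted there (n = 8, mean 34, s = 5.18, extreme values 28 and 42, median 33.5 from middle values 31 and 36). The Python sketch below takes the t values from the answers rather than computing them, and obtains each jackknife mean from the total, (n·X̄ − removed value)/(n − 1), since the eight raw values are not listed here. Function names are illustrative.

```python
import math

# Summary values quoted in Answer 7.1 (the raw data are not listed here).
N, MEAN, SD = 8, 34.0, 5.18
T_7_05 = 2.365  # t for 7 d.f., two-tailed .05 (two-sided 95% interval)
T_7_20 = 1.415  # t for 7 d.f., two-tailed .20 (one-sided 90% bound)


def standard_error(sd: float, n: int) -> float:
    return sd / math.sqrt(n)


def two_sided_interval(mean: float, sd: float, n: int, t: float):
    half_width = t * standard_error(sd, n)
    return mean - half_width, mean + half_width


def jackknife_mean(n: int, mean: float, removed: float) -> float:
    """Mean of the n - 1 values left after one value is removed."""
    return (n * mean - removed) / (n - 1)


low, high = two_sided_interval(MEAN, SD, N, T_7_05)
lower_bound = MEAN - T_7_20 * standard_error(SD, N)
print(round(low, 2), round(high, 2), round(lower_bound, 2))  # 29.67 38.33 31.41

# Jackknife of the mean: remove each extreme value (28 and 42).
reduced_means = [jackknife_mean(N, MEAN, x) for x in (28, 42)]
mean_variation = max(abs(m - MEAN) for m in reduced_means) / MEAN
print([round(m, 2) for m in reduced_means], round(mean_variation, 2))  # [34.86, 32.86] 0.03

# Jackknife of the median: 33.5 becomes 36 or 31 (Answer 7.1.4).
median_variation = max(abs(36 - 33.5), abs(31 - 33.5)) / 33.5
print(round(median_variation, 2))  # 0.07

# Parametric coefficient of potential variation: (s / sqrt(n)) / mean.
print(round(standard_error(SD, N) / MEAN, 2))  # 0.05
```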

7.3. The laboratory is trying to indicate a 95% interval for the range of the observed data, but the phrase “95% confidence interval” refers to dispersion around an estimate of the mean, not to dispersion of the data. The correct phrase should be simply “range of normal” or “customary inner 95% range.” The word confidence should be used only in reference to the location of a mean (or proportion or other central index of a group).

Chapter 8

8.1. The value of .40 is relatively unstable, since the standard error is $\sqrt{(.40)(.60)/30} = .089$ and .089/.40 = .22. To answer the policy question, one approach is to put a 95% Gaussian confidence interval around the observed proportion. The interval would be .40 ± 1.96 × .089 = .40 ± .175; it extends from .225 to .575. Because the confidence interval does not include the hypothesized value of .10, the result is probably not compatible with the 10% policy. A second approach is to do a one-sample Z test for the hypothesized parameter of .10. The calculation would be $Z = (.40 - .10)/\sqrt{(.40)(.60)/30} = .30/.089 = 3.37$, for which 2P < .001.

An entirely different approach is to jackknife 12/30 into 12/29 = .41 or 11/29 = .38. This range of maximum difference is .41 − .38 = .03, which is proportionately .03/.40 = .075. Because the ratio exceeds .05, it immediately shows that the original proportion (.40) is unstable. The jackknife approach, however, does not answer the policy question.
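The three approaches in Answer 8.1 can be sketched in Python as follows, using the standard library's `statistics.NormalDist` for the two-tailed P value. With the unrounded standard error (.0894), Z comes out near 3.35 rather than the 3.37 obtained from the rounded .089; the conclusion, 2P < .001, is the same.

```python
import math
from statistics import NormalDist

successes, n = 12, 30
p_hat = successes / n                                    # observed proportion, .40
se = math.sqrt(p_hat * (1 - p_hat) / n)                  # standard error of the proportion
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se   # Gaussian 95% interval

# One-sample Z test against the hypothesized parameter .10
z = (p_hat - 0.10) / se
two_tailed_p = 2 * (1 - NormalDist().cdf(z))

# Jackknife: drop one failure (12/29) or one success (11/29)
jackknife = (successes / (n - 1), (successes - 1) / (n - 1))

print(round(se, 3), round(se / p_hat, 2))     # 0.089 0.22
print(round(ci_low, 3), round(ci_high, 3))    # 0.225 0.575
print(round(z, 2), two_tailed_p < 0.001)      # 3.35 True
print([round(j, 2) for j in jackknife])       # [0.41, 0.38]
```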

8.3. With an 80% failure rate, the usual rate of success is 20% or .20. The chance of getting three consecutive successes is .2 × .2 × .2 = .008. Unless you think that the clinician is a wizard or that his patients have unusually good prognoses, this low probability seems hard to believe.

8.5.

8.5.1. Since 95% confidence intervals are regularly accepted, use $Z_{\alpha} = Z_{.05} = 1.96$. If the proportion of dementia is unknown, use p = q = .5. With a 1% tolerance, e = .01. Then solve $n \ge (.5)(.5)(1.96)^2/(.01)^2$ to get $n \ge 9604$.

8.5.2. With .20 as the estimated proportion, solve $n \ge (.20)(.80)(1.96)^2/(.01)^2$ to get $n \ge 6147$.

8.5.3. When the commissioner is distressed by the large sample sizes, offer to raise the tolerance level to 5%. The sample size will then be $n \ge (.20)(.80)(1.96)^2/(.05)^2$, which reduces the sample to $n \ge 246$, within the specified limit of 300.
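The three sample-size answers follow from one formula, $n \ge pqZ_{\alpha}^2/e^2$, rounded up to a whole person. The minimal Python sketch below uses an illustrative function name. It rounds away floating-point noise before taking the ceiling, so that an exact result such as 9604 is not pushed up to 9605.

```python
from math import ceil

def sample_size_for_proportion(p, e, z=1.96):
    """Smallest whole n satisfying n >= p * q * z**2 / e**2.

    p: estimated proportion (use .5 when unknown), e: tolerance,
    z: Gaussian multiplier (1.96 for 95% confidence).
    """
    raw = p * (1 - p) * z ** 2 / e ** 2
    # Round away floating-point noise (e.g. 9603.9999999) before taking the ceiling
    return ceil(round(raw, 6))

print(sample_size_for_proportion(0.5, 0.01))   # 8.5.1
print(sample_size_for_proportion(0.20, 0.01))  # 8.5.2
print(sample_size_for_proportion(0.20, 0.05))  # 8.5.3
```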

8.7. Individual answers.

Chapter 9

9.1. Individual answers.

9.3.

Chapter 10

10.1. Like beauty, the concept of what is cogent is in the eye of the beholder. For the investigators, the most important issue seems to have been myocardial infarction. If this is your choice, do you also want to focus on all myocardial infarctions, or just the associated deaths? Would you prefer to focus, instead, on total deaths? And how about strokes, which seemed more common in the aspirin group?

© 2002 by Chapman & Hall/CRC

Another major decision here is whether to express the risks as events per “subject,” i.e., patient, or events per subject year. The investigators went to great effort to calculate and list subject years, so these are presumably the preferred units of analysis. They will be used in the first set of analyses here. The second set of analyses will use patients as the denominators of “risk.”

The figures in the following tables will provide results used in answers to 10.1.1, 10.1.2, and 10.1.3, as well as 10.1.4 and 10.1.5.

 

Rates per Subject Year, U.S. Study

| Rate | Aspirin | Placebo | Increment in Rates | Ratio of Higher to Lower Rate |
|---|---|---|---|---|
| Total deaths | .00398 | .00418 | .000196 | 1.050 |
| Fatal MI | .000183 | .000478 | .000295 | 2.612 |
| Total rate for MI | .00255 | .00440 | .00185 | 1.725 |
| Fatal stroke | .000165 | .000110 | .0000546 | 1.500 |
| Total rate for stroke | .00218 | .00180 | .000377 | 1.211 |

Rates per Subject Year, U.K. Study

| Rate | Aspirin | No Aspirin | Increment in Rates | Ratio of Higher to Lower Rate |
|---|---|---|---|---|
| Total deaths | .0143 | .0159 | .0016 | 1.112 |
| Fatal MI | .00473 | .00496 | .00023 | 1.049 |
| Total rate for MI | .00898 | .00929 | .00031 | 1.035 |
| Fatal stroke | .00159 | .000106 | .00043 | 1.500 |
| Total rate for stroke | .00484 | .00412 | .00072 | 1.175 |

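Rates per Subject, U.S. Study

| Rate | Aspirin | Placebo | Increment in Rates | Ratio of Higher to Lower Rate |
|---|---|---|---|---|
| Total deaths | .0197 | .0206 | .000873 | 1.046 |
| Fatal MI | .000906 | .002356 | .00145 | 2.600 |
| Total rate for MI | .0126 | .0217 | .0091 | 1.722 |
| Fatal stroke | .000816 | .000544 | .000272 | 1.500 |
| Total rate for stroke | .0108 | .00888 | .00192 | 1.216 |

Rates per Subject, U.K. Study

| Rate | Aspirin | No Aspirin | Increment in Rates | Ratio of Higher to Lower Rate |
|---|---|---|---|---|
| Total deaths | .0787 | .0883 | .0096 | 1.122 |
| Fatal MI | .02596 | .02749 | .00153 | 1.059 |
| Total rate for MI | .0493 | .0515 | .00216 | 1.045 |
| Fatal stroke | .00875 | .00702 | .00173 | 1.246 |
| Total rate for stroke | .0265 | .0228 | .0037 | 1.162 |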

Note: The ratios of rates are essentially identical, whether the denominators are subject years or subjects. The increments in rates are higher with subjects than with subject years and will be used here for calculations of the number needed to treat.
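Each table row reduces to three quantities: the increment, the ratio of the higher to the lower rate, and the NNT, which is the reciprocal of the increment. The Python sketch below derives all three from a pair of rates and applies them to the U.S. fatal MI rates, once per subject and once per subject year. The function name `compare_rates` is hypothetical.

```python
def compare_rates(rate_a, rate_b):
    """Return (increment, ratio of higher to lower rate, NNT) for two rates.

    NNT is the reciprocal of the increment (absolute difference) in rates.
    """
    higher, lower = max(rate_a, rate_b), min(rate_a, rate_b)
    increment = higher - lower
    return increment, higher / lower, 1 / increment

# U.S. fatal MI per subject, from the table: aspirin .000906, placebo .002356
increment, ratio, nnt = compare_rates(0.000906, 0.002356)
print(f"per subject:      increment = {increment:.5f}, ratio = {ratio:.3f}, NNT = {nnt:.0f}")

# Same outcome per subject year: aspirin .000183, placebo .000478
inc_sy, ratio_sy, _ = compare_rates(0.000183, 0.000478)
print(f"per subject year: increment = {inc_sy:.6f}, ratio = {ratio_sy:.3f}")
```

The two ratios are nearly the same, but the increments differ by a factor of about five. This is why the choice of denominator matters for the NNT but hardly at all for the rate ratio.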

10.1.1. In the U.S. trial, the total death rate ratios, at values of 1.05, were not impressively higher in the placebo group. The NNT would be 1/.000873 = 1145 persons to prevent one death. The MI rate ratios were impressive, at 2.6 for fatal MI and 1.7 for total MI. The corresponding values of NNT, however, were less impressive, at 1/.00145 = 690 and 1/.0091 = 110. The rate ratios for stroke were elevated for aspirin, but less impressively (at 1.5 for fatal and 1.2 for total stroke). The corresponding NNT values were 1/.000272 = 3676 and 1/.00192 = 521.

10.1.2. The most impressive rate ratios in the U.K. trial were the elevated values for the risk of stroke with aspirin. The total death rate ratios in favor of aspirin, however, were more impressive in the U.K. than in the U.S. trial, but the U.K. rate ratios for MI were unimpressive.

Nevertheless, the NNT values in the U.K. trial were smaller, and thus more impressive, than in the U.S. trial for every outcome except total MI (463 vs. 110). The NNT results were 1/.0096 = 104 for total deaths, 1/.00153 = 654 for fatal MI, 1/.00216 = 463 for total MI, 1/.00173 = 578 for fatal stroke, and 1/.0037 = 270 for total stroke.
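The NNT values for both trials can be reproduced from the per-subject increments in the tables with a short Python sketch. For the stroke outcomes the increment favors the comparison group, so there the reciprocal is a number needed to harm. The text rounds NNT to the nearest whole number, and the sketch does the same, although rounding up is also a common convention. The dictionary names are illustrative.

```python
# Increments in rates per subject, copied from the per-subject tables
us_increments = {
    "total deaths": 0.000873, "fatal MI": 0.00145, "total MI": 0.0091,
    "fatal stroke": 0.000272, "total stroke": 0.00192,
}
uk_increments = {
    "total deaths": 0.0096, "fatal MI": 0.00153, "total MI": 0.00216,
    "fatal stroke": 0.00173, "total stroke": 0.0037,
}

def nnt(increment):
    """Number needed to treat (or to harm): reciprocal of the increment in rates."""
    return 1 / increment

us_nnt = {k: round(nnt(v)) for k, v in us_increments.items()}
uk_nnt = {k: round(nnt(v)) for k, v in uk_increments.items()}

for outcome in us_nnt:
    smaller = "U.K." if uk_nnt[outcome] < us_nnt[outcome] else "U.S."
    print(f"{outcome:13s} U.S. {us_nnt[outcome]:5d}  U.K. {uk_nnt[outcome]:5d}  smaller: {smaller}")
```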

10.1.3. Risk of myocardial infarction was the sum of fatal and nonfatal MIs. Per subject year, the proportionate increment for aspirin vs. placebo was –.00185/.00440 = –42%. Per subject, the corresponding result was –.0091/.0217 = –42%. Both of these values are close to, but not exactly, 44%, which may have been calculated with a statistical adjustment for age. The corresponding relative risks are .00255/.00440 = .58 per subject year and .0126/.0217 = .58 per subject.
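The proportionate increment and the relative risk are two views of the same pair of rates, since the proportionate increment equals the relative risk minus 1. The Python sketch below checks both values for the U.S. total MI rates. The function names are illustrative.

```python
def relative_risk(rate_treated, rate_control):
    """Ratio of the treated rate to the control rate."""
    return rate_treated / rate_control

def proportionate_increment(rate_treated, rate_control):
    """(treated - control) / control; equals relative_risk - 1."""
    return (rate_treated - rate_control) / rate_control

# U.S. total MI rates, aspirin vs. placebo, from the tables
rr_sy = relative_risk(0.00255, 0.00440)            # per subject year
change_sy = proportionate_increment(0.00255, 0.00440)
rr_s = relative_risk(0.0126, 0.0217)               # per subject
change_s = proportionate_increment(0.0126, 0.0217)

print(f"per subject year: RR = {rr_sy:.2f}, proportionate increment = {change_sy:.0%}")
print(f"per subject:      RR = {rr_s:.2f}, proportionate increment = {change_s:.0%}")
```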
