Principles of Medical Statistics (Feinstein, 2002)

2. Because PRSV can be determined for both dimensional and binary data and has a direct mathematical relationship to the standardized increment (or “effect size”), a single standard of evaluation can be used, if desired, for subsequent decisions about “quantitative significance” in contrasts of both types of data.

3. As will be noted later in Chapter 19, when we compare associations in two variables rather than contrasts in two groups, the value of $\sqrt{\mathrm{PRSV}}$ is called the correlation coefficient. Therefore, the decisions discussed later for quantitative significance of a correlation coefficient can also be applied for evaluating SI and $\sqrt{\mathrm{PRSV}}$ in a contrast of two groups.

10.9 Standards for Quantitative Significance

Although stochastic (or “statistical”) significance is commonly proclaimed when a calculated P value is smaller than .05, no analogous boundary has been established for demarcating quantitative significance.

10.9.1 Reasons for Absence of Standards

Many excuses can be offered for the failure of biomedical scientists to set such a boundary.

10.9.1.1 Substantive Context — As noted in Section 10.2, quantitative contrasts always occur in a substantive context. The increment of .03 = .09 – .06 for two proportions of occurrence of an adverse drug reaction may be unimpressive if it refers to a minor skin rash, but important if it refers to sterility or death. Aside from the particular event being considered, the context also includes the size of the population “at risk” for the event. Suppose a skin cream that offers striking improvement in acne also raises the rate of fetal deformity from .010 to .011 if the cream is used by pregnant women. The increment of .001 seems tiny, but could lead to 2,000 unnecessarily deformed babies in a nation with 2,000,000 births. [One reason that public-health people often seem more “alarmed” than clinicians about “risks” is that the public-health denominator is so much larger than the clinical denominator. An obstetrician who delivers 200 women per year may not notice an adverse event that occurs at a rate of .005. In a community with 100,000 deliveries, however, the event will be much more apparent.]
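The population arithmetic in this passage can be checked with a short Python sketch. The function name `expected_extra_events` is a hypothetical helper introduced only for illustration; it multiplies an incremental rate by the number of people at risk, and it assumes that everyone in the denominator is exposed.

```python
def expected_extra_events(increment: float, people_at_risk: int) -> float:
    """Expected number of additional events: incremental rate times people at risk."""
    return increment * people_at_risk


# Skin-cream example: the deformity rate rises from .010 to .011.
national_births = expected_extra_events(0.011 - 0.010, 2_000_000)

# The same event rate of .005, seen from two different denominators.
one_obstetrician = expected_extra_events(0.005, 200)
whole_community = expected_extra_events(0.005, 100_000)

print(f"nation: about {national_births:.0f} extra cases")
print(f"one obstetrician: about {one_obstetrician:.0f} case per year")
print(f"community: about {whole_community:.0f} cases")
```

The same rate yields about one event a year in a single practice but about 500 in the community, which is the contrast between clinical and public-health denominators drawn in the text.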

Because the substantive context cannot be easily separated from the associated quantitative magnitude, biomedical scientists have often been reluctant to draw boundaries based on magnitude alone.

10.9.1.2 Quantitative Complexity — For the simple indexes of contrast, both a ratio and an increment must be considered. The demarcation of values for two indexes simultaneously is much more complicated than choosing a single boundary, such as .05, for “significance” of a stochastic probability.

For the other indexes discussed throughout Sections 10.5 and 10.6, neither the standardized increment nor the common proportionate increment is well known or frequently used for dimensional data. Consequently, neither index is accompanied by a “commonsense” background of analytic experience to aid in the interpretation. For binary data, the odds ratio is probably easy to interpret intuitively as a “risk ratio,” but the odds ratio will falsely enlarge the “risk” ratio¹⁵ if the compared proportions exceed values of .1. For example, if p1 = .04 and p2 = .01, the risk ratio is 4, and the odds ratio is (.04/.01)(.99/.96) = 4.12. If p1 = .84 and p2 = .21, however, the risk ratio is still 4, but the odds ratio is (.84/.21)(.79/.16) = 19.75.
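The divergence between the two ratios can be reproduced with a minimal Python sketch. The function names `risk_ratio` and `odds_ratio` are illustrative choices, not names from the text; the two pairs of proportions are the ones used in the example above.

```python
def risk_ratio(p1: float, p2: float) -> float:
    """Ratio of the two proportions."""
    return p1 / p2


def odds_ratio(p1: float, p2: float) -> float:
    """Ratio of the two odds, p/(1 - p), for the same proportions."""
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))


# Same risk ratio of 4 at low and at high proportions.
for p1, p2 in [(0.04, 0.01), (0.84, 0.21)]:
    rr = risk_ratio(p1, p2)
    orr = odds_ratio(p1, p2)
    print(f"p1={p1}, p2={p2}: risk ratio={rr:.3f}, odds ratio={orr:.3f}")
```

At the low proportions the odds ratio stays close to the risk ratio; at the high proportions it is nearly five times larger, although the risk ratio has not changed.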

10.9.1.3 Intellectual Distraction — In the fallacious belief that decisions about importance depend only on the P values (or confidence intervals) of stochastic significance, investigators may not feel that any boundaries are needed to demarcate quantitative significance. This intellectual distraction has become untenable in recent years, however, because scientists have discovered two major circumstances in which quantitative boundaries must be demarcated.

©2002 by Chapman & Hall/CRC

10.9.2 Demand for Boundaries

Despite the inattention or efforts at avoidance, boundaries of quantitative significance are demanded and must be established in two common circumstances that occur before or after the research is done.

10.9.2.1 Planning Sample Size — In planning a research study, investigators will want to have a sample size that assures stochastic significance for the results. As noted later, the calculation requires establishment of a boundary for the contrasted magnitude that will be regarded as quantitatively significant.

10.9.2.2 Checking for False Negative Results — When a study has been completed and has yielded results that do not show an impressive quantitative distinction, the investigator may want to conclude that the compared groups do not have a major difference. To prevent this decision from being a “false negative” conclusion, certain stochastic calculations can be done to determine whether the group sizes were large enough to detect and confirm a “major difference.” For the stochastic calculations to be interpreted, however, a boundary must be set for the magnitude of the “major difference.”

10.9.3 Development of Standards

For both reasons just cited, modern investigators have been increasingly unable to escape the challenge of demarcating quantitative significance and have had to develop some methods for doing so.

10.9.3.1 Sample Size in Clinical Trials — One frequent challenge is to choose a magnitude of quantitative significance when sample size is calculated for a controlled clinical trial. The calculated formula (as discussed later) requires a demarcation for δ, the amount of a significant quantitative increment in the contrasted means or proportions. Thus, if $\hat{p}_A$ is the expected proportion of success in Group A, and if $\hat{p}_B$ is the corresponding proportion in Group B, the result is quantitatively significant in favor of Group A if

$$\hat{p}_A - \hat{p}_B \geq \delta$$

(The “^” or “hat” symbol indicates a result that is estimated or expected but not directly observed.)

In making decisions about δ, the investigators have seldom assigned a direct value for it. Instead, they usually choose a value for the proportionate increment, θ, that is to be regarded as quantitatively significant. Thus, $\theta = (\hat{p}_A - \hat{p}_B)/\hat{p}_B$. The value of δ is then found by anticipating a value of $\hat{p}_B$ for the reference group, and by applying the formula $\delta = \theta\,\hat{p}_B$. The customary choices of θ have been values of 25% or 50%. With these decisions, if the control group has had a success rate of 16%, a proportionate increment of 25% would require an actual increment of 4%, so that the treated group would need a success rate of 20%. If θ is 50% and the success rate is 16% in the control group, the treated group’s success rate would have to rise to 24% to be quantitatively significant.
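The conversion from a chosen θ to the implied δ can be sketched in Python as follows. The function names are hypothetical; the only relation used is the one stated above, δ = θ multiplied by the anticipated control-group proportion.

```python
def required_increment(theta: float, p_control: float) -> float:
    """delta implied by a proportionate increment theta and a control rate p_B."""
    return theta * p_control


def required_treated_rate(theta: float, p_control: float) -> float:
    """Success rate the treated group must reach: p_B + delta."""
    return p_control + required_increment(theta, p_control)


# Control success rate of 16% with the two customary choices of theta.
for theta in (0.25, 0.50):
    delta = required_increment(theta, 0.16)
    target = required_treated_rate(theta, 0.16)
    print(f"theta={theta:.0%}: delta={delta:.0%}, treated group needs {target:.0%}")
```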

The proportionate increment θ has often been used for approaching decisions about quantitative significance, but it creates two problems. The first is that the choice of δ is left to depend entirely on the magnitude of $\hat{p}_B$ in the control group. Thus, if θ is 50% and the control group has a success rate of 2%, the rate of success required for a quantitatively significant difference in the treated group would be 3%. Despite the impressive value of 50% for θ, few commonsense evaluators would ordinarily be impressed either with a δ of 1% or with a “substantial improvement” that produces a success rate of only 3%.

The second problem is deciding which of the two complementary rates should be multiplied by θ. Thus, if θ = 50% is applied to the control group’s failure rate of $\hat{q}_B$ = 98%, the failure rate demanded for the new treatment will be $\hat{q}_A$ = 49%. This δ of 49% (= 98% – 49%) is substantially greater than the δ of 1% calculated when θ = 50% was applied to the success rate of 2%.
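The ambiguity can be made concrete with a small Python sketch that applies the same θ to each of the two complementary rates. The function names are illustrative assumptions, and the input values are the ones used in the text (θ = 50%, control success rate 2%).

```python
def delta_on_success(theta: float, success_rate: float) -> float:
    """Increment demanded when theta multiplies the control success rate."""
    return theta * success_rate


def delta_on_failure(theta: float, success_rate: float) -> float:
    """Decrement demanded when theta multiplies the control failure rate."""
    failure_rate = 1.0 - success_rate
    return theta * failure_rate


theta, control_success = 0.50, 0.02
d_success = delta_on_success(theta, control_success)
d_failure = delta_on_failure(theta, control_success)

print(f"theta on success rate: delta={d_success:.0%}, "
      f"treated success={control_success + d_success:.0%}")
print(f"theta on failure rate: delta={d_failure:.0%}, "
      f"treated success={control_success + d_failure:.0%}")
```

The same θ therefore demands a treated success rate of either 3% or 51%, depending only on which complementary rate is chosen as the base.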

For these reasons, the current concept of establishing quantitative significance with a θ value can be applauded because any clinical attempt to demarcate this type of significance is better than nothing. Nevertheless, because a thorough judgment requires a demarcation of two values — δ and θ — decisions based only on a θ value will inevitably be inadequate. Furthermore, such decisions perpetuate the custom of reporting results according to the somewhat misleading and occasionally deceptive proportionate increment.

10.9.3.2 Importance of Epidemiologic Distinctions — In many epidemiologic studies of risk factors for disease, the proportion of people who become diseased is quite small. For example, in Doll and Hill’s famous study¹⁶ of smoking in British physicians, the occurrence rates of deaths ascribed to lung cancer were about 166 per hundred thousand in cigarette smokers and 7 per hundred thousand in non-smokers. These two values have an impressive ratio of 23.7 (= 166/7) but a highly unimpressive increment of .00159. Expressed in the NNE formula, an extra lung cancer would occur in one of 629 (= 1/.00159) smokers.

For an individual smoker, the increased incremental risk of developing lung cancer may seem too small to warrant cessation of the habit. On the other hand, in a society that contains 200 million people, a small increment can have a large total impact. If half of those people are adults and if half of the adults are smokers, there would be 50,000,000 smokers. With an incremental risk of .00159, this group of smokers would develop 79,500 cases of lung cancer that presumably would not have otherwise occurred. The increment in the two rates may seem unimpressive at an individual personal level, but be important at a societal level.
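The contrast between the ratio and the increment can be recomputed directly from the stated rates. The Python sketch below takes the rates exactly as given, 166 and 7 per hundred thousand (proportions of .00166 and .00007), and treats the NNE index as the reciprocal of the increment; the function name `contrast_indexes` is hypothetical.

```python
def contrast_indexes(rate_exposed: float, rate_unexposed: float):
    """Return the ratio, the increment, and the reciprocal of the increment (NNE)."""
    ratio = rate_exposed / rate_unexposed
    increment = rate_exposed - rate_unexposed
    nne = 1.0 / increment
    return ratio, increment, nne


smokers = 166 / 100_000      # lung-cancer death rate in cigarette smokers
non_smokers = 7 / 100_000    # lung-cancer death rate in non-smokers

ratio, increment, nne = contrast_indexes(smokers, non_smokers)
extra_cases = increment * 50_000_000  # 50,000,000 smokers in the societal example

print(f"ratio={ratio:.1f}, increment={increment:.5f}, NNE={nne:.0f}")
print(f"extra cases among 50,000,000 smokers: {extra_cases:.0f}")
```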

The importance of certain quantitative distinctions may sometimes depend on such external features as the frequency with which the cited issue occurs as a general or medical problem. The kinds of distinctions that might seem trivial in a clinical comparison of two treatments might thus become substantial in an epidemiologic contrast of risk factors for disease.

10.9.3.3 Importance of Additional Clinical Distinctions — Finally, certain issues in the quantitative magnitude of significance will regularly be affected by associated clinical factors such as the costs of treatment and the type and risk of adverse reactions. If a particular drug is very expensive and has a high rate of untoward adverse reactions, we might demand that its efficacy exceed that of placebo by a greater amount than we might ask of a cheaper drug that is relatively risk-free.

10.10 Pragmatic Criteria

Because no formal criteria have been promulgated for quantitative significance, Burnand, Kernan, and Feinstein¹⁷ investigated the boundaries that seemed to be used pragmatically for decisions about quantitative contrasts in a series of papers published in three general medical journals.

For comparisons of two means, the published reports did not always list the group sizes or standard deviations. Consequently, neither a standardized nor a common proportionate increment could regularly be determined. Accordingly, Burnand et al. examined the simple ratio, $\bar{X}_A/\bar{X}_B$, with $\bar{X}_A$ routinely chosen to be the larger mean, so that the ratio always exceeds 1. The investigators concluded that 1.2 was commonly used as a lower boundary for quantitative significance of the ratio in a contrast of two means. Thus, if $\bar{X}_A \geq 1.2\,\bar{X}_B$, the value of $\bar{X}_A - \bar{X}_B$ will be $\geq .2\,\bar{X}_B$. In other words, one mean must be proportionately at least 20% larger than the other. If $\bar{X}_A = 1.2\,\bar{X}_B$ and if the two groups have equal size, the common proportionate increment will be (2)(.2)/(1 + 1.2) = .18. Thus, .18 might be regarded as a lower boundary for quantitative significance of the common proportionate increment.

To illustrate this process, suppose two compared groups have means of 18.9 and 10.1. The simple ratio is 18.9/10.1 = 1.87, which clearly exceeds the criterion boundary of 1.2. For this simple ratio, the common proportionate increment — for equal-sized groups — will be (2)(.87)/2.87 = .61.
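Both worked values can be reproduced with a short Python sketch. It assumes, as the arithmetic above implies, that for equal-sized groups the common proportionate increment is the difference of the two means divided by their average; the function names are illustrative.

```python
def simple_ratio(mean_a: float, mean_b: float) -> float:
    """Larger mean divided by smaller mean."""
    return max(mean_a, mean_b) / min(mean_a, mean_b)


def common_proportionate_increment(mean_a: float, mean_b: float) -> float:
    """Increment divided by the average of the two means (equal group sizes assumed)."""
    return abs(mean_a - mean_b) / ((mean_a + mean_b) / 2.0)


# Boundary case: one mean exactly 1.2 times the other.
boundary = common_proportionate_increment(1.2, 1.0)

# Illustration from the text: means of 18.9 and 10.1.
ratio = simple_ratio(18.9, 10.1)
cpi = common_proportionate_increment(18.9, 10.1)

print(f"boundary CPI at ratio 1.2: {boundary:.2f}")
print(f"ratio={ratio:.2f}, CPI={cpi:.2f}, exceeds 1.2 boundary: {ratio >= 1.2}")
```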


For a contrast of two proportions, Burnand et al. found that 2.2 was a commonly used lower boundary for quantitative significance of the odds ratio. The odds ratio was chosen as a single index of contrast to avoid decisions based on increments alone in proportions that could range from very small values, such as .001 in public health rates, to much higher values, >.1, in clinical rates of “success.” The odds ratio, which multiplies p1/p2 by q2/q1, also eliminated the ambiguity of deciding whether the compared ratio should be p1/p2 or q2/q1.

The boundaries of an impressive odds ratio, however, sometimes varied with the size of the groups under study. With the smaller groups and higher proportions (i.e., usually ≥ .01) in most clinical studies, the published reports often used somewhat lower boundaries for “quantitative significance” than what appeared in the larger groups and lower proportions (i.e., < .01) of public-health research.

A value of .28 seemed to be a reasonable lower boundary for quantitative significance in a standardized increment of two proportions. This magnitude is slightly more than 1/4 of the common standard deviation between the two proportions. Thus, if $\sqrt{PQ} = .5$, the absolute increment between two proportions would have to exceed (.28)(.5) = .14. For comparisons of much smaller proportions, such as .01 vs. .07, where P = .04 and $\sqrt{PQ} = \sqrt{(.04)(.96)} = .196$, the absolute increment would have to exceed (.28)(.196) = .055.
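The minimum increment implied by the .28 boundary can be computed with the sketch below. It assumes that the common standard deviation is the square root of PQ, with P taken as the average of the two proportions for equal-sized groups and Q = 1 – P; the function name `minimum_increment` is hypothetical.

```python
import math


def minimum_increment(common_p: float, si_boundary: float = 0.28) -> float:
    """Smallest absolute increment that reaches the boundary: boundary * sqrt(P * Q)."""
    common_sd = math.sqrt(common_p * (1.0 - common_p))
    return si_boundary * common_sd


# Proportions near the middle of the range: P = .5, so sqrt(PQ) = .5.
mid_range = minimum_increment(0.5)

# Small proportions of .01 vs. .07: P = .04.
small_p = (0.01 + 0.07) / 2.0
small_range = minimum_increment(small_p)

print(f"P=.50: increment must exceed {mid_range:.3f}")
print(f"P=.04: increment must exceed {small_range:.3f}")
```

Under these assumptions, the observed increment of .06 between .01 and .07 would just exceed the required value.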

Studying the “treatment effect size” in 21 trials of surfactant therapy for neonatal respiratory distress syndrome, Raju et al.¹⁸ found that a median value of θ = 50% had been used for the proportionate reduction expected in adverse outcomes after intervention. In most of the trials, however, the “observed treatment effect sizes were lower than the investigator-anticipated treatment effect sizes.” Nevertheless, “all except 1 of 21 reports concluded that the therapy was useful, mostly based on subgroup analyses.” [The phenomenon of clinical-trial distinctions that are unexpectedly small but nevertheless “significant” will be discussed later in several chapters.]

As noted later in Chapter 19, the variance of a system is not regarded as impressively altered unless the proportionate reduction is at least 10%. With this criterion for quantitative significance, $\sqrt{\mathrm{PRSV}}$ must be $\geq \sqrt{.1}$, which is about .3. If this criterion is extended to the “effect size,” as discussed in Section 10.8, “quantitative significance” would require that SI ≥ .6. This boundary is much higher than the value of .28 noted earlier for SI in two proportions, and some further thought will show why epidemiologists often avoid using the SI to present their results. For example, consider the “impressive” risk ratio of 5 for a disease having occurrence rates of .005 in exposed and .001 in nonexposed groups. When these rates are suitably expressed as $(.005 - .001)/\sqrt{(.003)(.997)}$, the SI has the unimpressive value of .07. If the occurrence rates are ten times higher, at .05 and .01, the SI is raised to the still unimpressive value of $(.05 - .01)/\sqrt{(.03)(.97)} = .23$. At the “clinical” rates of .5 and .1, however, the SI finally becomes impressive, reaching a value of $(.5 - .1)/\sqrt{(.3)(.7)} = .87$.
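The three standardized increments can be recomputed with the following Python sketch, which holds the risk ratio at 5 while the rates rise tenfold at each step. It assumes equal-sized groups, so that the common proportion P is the average of the two rates; the function name `standardized_increment` is an illustrative choice.

```python
import math


def standardized_increment(p1: float, p2: float) -> float:
    """(p1 - p2) / sqrt(P * Q), with P the average of p1 and p2 and Q = 1 - P."""
    common_p = (p1 + p2) / 2.0
    return (p1 - p2) / math.sqrt(common_p * (1.0 - common_p))


# The risk ratio is 5 in every row; only the level of the rates changes.
for p1, p2 in [(0.005, 0.001), (0.05, 0.01), (0.5, 0.1)]:
    si = standardized_increment(p1, p2)
    print(f"p1={p1}, p2={p2}: risk ratio={p1 / p2:.0f}, SI={si:.2f}")
```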

10.11 Contrasts of Ordinal and Nominal Data

In all of the two-group comparisons discussed so far, the data were either dimensional or binary. Ordinal and nominal data seldom receive descriptive comparisons because of the problem of choosing a single central index to compare, although the total distributions of data in the two groups can be contrasted stochastically with methods discussed for ordinal data in Chapter 15 and for nominal data in Chapter 27.

One common approach for descriptive comparisons in ordinal data is to assign arbitrary dimensional values to the categories, e.g., 0 = none; 1 = mild; 2 = moderate; 3 = severe. The results are then compared with means, medians, standard deviations, etc. as though the data were dimensional. The tactic is mathematically “shady,” but has been repeatedly used and frequently accepted in studies where data for pain, anxiety, satisfaction, or other subjective feelings are expressed in ordinal scales. A mathematically proper descriptive index of contrast can be developed (see Chapter 15) for ordinal data, but the index is unfamiliar and seldom used.

For nominal data, which cannot be ranked, the most common descriptive approach is to summarize each group with a single binary proportion derived from the modal category of the total or from a compression of several categories. The selected binary proportions are then compared as though they were ordinary binary proportions.


10.12 Individual Transitions

All the discussion thus far has been concerned with comparing central indexes for two groups. A different type of quantitative contrast is appraised when an individual person’s condition changes from one state to another. These transitions are easy to evaluate if graded (by patient or clinician) on a simple “transition scale” such as better, same, or worse.

A distinctive clinical change has also occurred if a single-state ordinal rating of 4, on a pain scale of 0, 1, 2, 3, 4, later declines to 2. Changes of one category, from 4 to 3 or from 3 to 2, are more difficult to evaluate unless accompanied by a separate transition rating such as somewhat better. A one-category single-state change from 1 (for slight pain) to 0 (for no pain), however, almost always represents a distinct clinical improvement.

The tricky problems in individual transitions arise for dimensional variables, where vicissitudes of the measuring system itself must also be considered. For example, a change of 0.2 units may represent measurement variations in hematocrit rather than a hematologic alteration; and a rise or fall in magnitude of at least two tube-dilutions is usually demanded to represent a change in Group A streptococcal antibody titers. Another problem in dimensional data is deciding whether to calculate direct or proportional increments from a person’s baseline value for changes in such entities as blood pressure or weight.

In the absence of a clinically symptomatic observation to confirm the result of a laboratory test or physical measurement, the standard deviation of the group has been proposed as a basis for determining individual changes. Thus, a person may be rated as having a “significant” change if the direct increment exceeds one-fourth or one-half of the group’s standard deviation in that variable.

The concepts and standards used for measuring individual transitions are now in a state of ferment, particularly as clinical investigators (and regulatory agencies) have begun giving increased attention to quantitative rather than merely stochastic accomplishments in therapeutic trials and other research.

10.13 Challenges for the Future

As a basic part of statistical appraisals, decisions about quantitative significance require intricate judgments for which a well-developed set of standards has not yet been established.

In the absence of standards, you will have to use your own judgment to determine whether a claim of “statistical significance” represents a truly important quantitative result or a stochastically low P-value obtained (as noted later) from a trivial difference with a large group size. If investigative judgments by you and your colleagues can lead to better general agreement about the strategies and boundaries, the results may produce desperately needed standards for quantitative significance. The new standards would allow the part of statistics that depends on substantive observation and analysis of quantitative significance to become at least as important as the stochastic significance that relies on mathematical theories of probability.

References

1. Rossen, 1974; 2. Cohen, 1977; 3. Glass, 1981; 4. Laupacis, 1988; 5. Naylor, 1992; 6. Sheps, 1958; 7. Forrow, 1992; 8. Bobbio, 1994; 9. Bucher, 1994; 10. Feinstein, 1992; 11. Sinclair, 1994; 12. Sackett, 1994; 13. Cornfield, 1951; 14. Feinstein, 1985; 15. Feinstein, 1986; 16. Doll, 1964; 17. Burnand, 1990; 18. Raju, 1993; 19. Steering Committee of the Physicians’ Health Study Research Group, 1989; 20. Peto, 1988; 21. Boston Collaborative Drug Surveillance Program, 1974; 22. Stacpoole, 1992.


Appendix for Chapter 10

A.10.1 Algebraic Demonstration of Similarity for Cohort Risk Ratio and Case-Control Odds Ratio

In a cohort study, let e be the proportion of exposed persons in the total population, N. The size of the exposed group will be n1 = eN; the unexposed group will have n2 = (1 – e)N and N = n1 + n2. The occurrence rate of events in the exposed cohort will be p1 and the number of events will be a = p1n1 = p1eN. The number of persons without events will be b = (1 – p1)n1 = (1 – p1)eN. The corresponding values in the nonexposed cohort will be p2 for the occurrence rate, and c = p2n2 = p2(1 – e)N and d = (1 – p2)(1 – e)N for persons with and without events. The fourfold table for the cohort study will be as follows.

              Diseased Cases    Nondiseased Controls    Total
Exposed       p1eN              (1 – p1)eN              eN
Nonexposed    p2(1 – e)N        (1 – p2)(1 – e)N        (1 – e)N

When the odds ratio is calculated as ad/bc for this table, the factors of e, 1 – e, and N all cancel. The odds ratio becomes

$$\frac{p_1}{p_2} \times \frac{1 - p_2}{1 - p_1}$$

As noted in Section 10.7.4.2, this odds ratio is the risk ratio, p1/p2, multiplied by (1 – p2)/(1 – p1), and the latter factor will approximate 1 if both p2 and p1 are relatively small.

In a case-control study, the “sampling” is done from the total of diseased cases and nondiseased controls. Suppose the sampling fractions are k for cases and k′ for controls. In other words, if we choose 50 of 1000 possible cases for the research, the sampling fraction is k = 50/1000 = .05. If we correspondingly choose 50 of 99,000 possible controls, the sampling fraction is k′ = 50/99,000 = .000505.

The fourfold table for the results of the case-control study will still appear as $\begin{matrix} a & b \\ c & d \end{matrix}$, but the data will actually represent the following:

              Diseased Cases    Nondiseased Controls
Exposed       kp1eN             k′(1 – p1)eN
Nonexposed    kp2(1 – e)N       k′(1 – p2)(1 – e)N

When the odds ratio is calculated in the form of ad/bc for this table, the values of k and k′, as well as the values of e, 1 – e, and N, will cancel. The odds ratio will then represent (p1/p2)[(1 – p2)/(1 – p1)]. With the “uncommon” disease assumption that 1 – p1 and 1 – p2 are each approximately 1, this result will approximate p1/p2.
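The cancellation can also be checked numerically. The Python sketch below builds both fourfold tables from the cell formulas of this appendix and confirms that ad/bc is the same in each. The values chosen for p1, p2, e, and N are hypothetical and serve only as an illustration; the sampling fractions are the ones used in the example above.

```python
def cohort_table(p1, p2, e, n):
    """Cells a, b, c, d of the cohort fourfold table."""
    a = p1 * e * n                  # exposed, diseased
    b = (1 - p1) * e * n            # exposed, nondiseased
    c = p2 * (1 - e) * n            # nonexposed, diseased
    d = (1 - p2) * (1 - e) * n      # nonexposed, nondiseased
    return a, b, c, d


def case_control_table(p1, p2, e, n, k, k_prime):
    """Same cells after sampling cases with fraction k and controls with k'."""
    a, b, c, d = cohort_table(p1, p2, e, n)
    return k * a, k_prime * b, k * c, k_prime * d


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)


# Hypothetical inputs: an uncommon disease with a true risk ratio of 4.
p1, p2, e, n = 0.004, 0.001, 0.3, 100_000
k, k_prime = 50 / 1000, 50 / 99_000

or_cohort = odds_ratio(*cohort_table(p1, p2, e, n))
or_case_control = odds_ratio(*case_control_table(p1, p2, e, n, k, k_prime))
expected = (p1 / p2) * ((1 - p2) / (1 - p1))

print(f"cohort OR={or_cohort:.4f}, case-control OR={or_case_control:.4f}")
print(f"formula={expected:.4f}, risk ratio={p1 / p2:.1f}")
```

Changing e, N, k, or k′ leaves both odds ratios unchanged, and with rates this small both stay close to the risk ratio of 4.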


Exercises

10.1. Millions of persons in the U.K. and U.S. now take aspirin as daily (or every other day) “prophylaxis” against myocardial infarction. The therapy depends on results of a randomized trial of 22,071 male U.S. physicians, receiving aspirin 325 mgm every other day or a double-blind placebo, for an average of 60.2 months.¹⁹ In summarizing the results, the investigators reported a “statistically significant” finding of a “44 percent reduction in the risk of myocardial infarction (relative risk 0.56…)” in men aged ≥ 50 years. In an analogous previous trial in the U.K.,²⁰ results for 5139 “healthy male doctors” receiving either no treatment or 500 mgm (or 300 mgm) of daily aspirin for an average of 6 years were somewhat “positive” but not “statistically significant.” A summary of the pertinent data is in the following table.

                                                  U.S. Study               U.K. Study
                                              Aspirin     Placebo      Aspirin    No Aspirin
No. of participants                            11,037      11,034         3429          1710
No. of subject years                         54,560.0    54,355.7       18,820          9470
No. of:
  Total deaths (all causes)                       217         227          270           151
  Deaths from myocardial infarction                10          26           89            47
  Nonfatal confirmed myocardial infarction        129         213           80            41
  Fatal stroke                                      9           6           30            12
  Nonfatal stroke                                 110          92           61            27

10.1.1. What results and indexes of descriptive comparison would you use to express the most cogent findings in the American trial?

10.1.2. What are the corresponding results in the U.K. trial?

10.1.3. What information was used by the U.S. investigators to calculate a “44 percent reduction” and “relative risk 0.56”?

10.1.4. Why do you think the total death rates were so much higher in the U.K. trial than in the U.S.?

10.1.5. How would you evaluate and compare the benefits of aspirin for MI versus its risk for stroke in the two trials?

10.1.6. Do the foregoing evaluations change your beliefs about the merits of daily or every-other-day aspirin prophylaxis? In other words, what did you think before you did this exercise, and what do you think now?

10.2. The following results were reported for two carefully conducted randomized trials of cholesterol-lowering treatment.

Trial A:

When 1,900 men receiving active treatment X were compared with 1,906 men given a placebo, the death rate from coronary heart disease after seven years was found to be 2.0% in the group given the placebo and 1.6% in the group given the active treatment, a reduction in the death rate from coronary heart disease of 0.4% over those seven years. (This difference was statistically significant.)

Trial B:

When active treatment Y was compared with placebo among almost 4000 middle-aged hypercholesterolemic men, a statistically significant 20% relative reduction was achieved in the 7-year rate of death from coronary heart disease.

Which of these two treatments would you prefer to offer your patients? Why?


10.3. In the first case-control study²¹ that described an alleged relationship between reserpine and breast cancer, the data were as follows:

                         Breast Cancer    “Control” Patients
                         Cases            Without Breast Cancer    Total
Users of Reserpine          11                  26                   37
Nonusers of Reserpine      139                1174                 1313
Total                      150                1200                 1350

10.3.1. What indexes would you use to contrast the risk of breast cancer in reserpine users vs. non-reserpine users?

10.3.2. What is the incremental risk of breast cancer in users of reserpine?

10.4. Here is a direct copy of the Abstract of a randomized trial in treatment of lactic acidosis.²²

Abstract Background. Mortality is very high in lactic acidosis, and there is no satisfactory treatment other than treatment of the underlying cause. Uncontrolled studies have suggested that dichloroacetate, which stimulates the oxidation of lactate to acetylcoenzyme A and carbon dioxide, might reduce morbidity and improve survival among patients with this condition.

Methods. We conducted a placebo-controlled, randomized trial of intravenous sodium dichloroacetate therapy in 252 patients with lactic acidosis; 126 were assigned to receive dichloroacetate and 126 to receive placebo. The entry criteria included an arterial-blood lactate concentration of ≥5.0 mmol per liter and either an arterial-blood pH of ≤7.35 or a base deficit of ≥6 mmol per liter. The mean (± SD) arterial-blood lactate concentrations before treatment were 11.6 ± 7.0 mmol per liter in the dichloroacetate-treated patients and 10.4 ± 5.5 mmol per liter in the placebo group, and the mean initial arterial-blood pH values were 7.24 ± 0.12 and 7.24 ± 0.13, respectively. Eighty-six percent of the patients required mechanical ventilation, and 74 percent required pressor agents, inotropic drugs, or both because of hypotension.

Results. The arterial-blood lactate concentration decreased 20 percent or more in 83 (66 percent) of the 126 patients who received dichloroacetate and 45 (36 percent) of the 126 patients who received placebo (P = 0.001). The arterial-blood pH also increased more in the dichloroacetate-treated patients (P = 0.005). The absolute magnitude of the differences was small, however, and they were not associated with improvement in hemodynamics or survival. Only 12 percent of the dichloroacetate-treated patients and 17 percent of the placebo patients survived to be discharged from the hospital.

What conclusions would you draw about the value of dichloroacetate treatment?

10.5. Here is an opportunity for you to indicate what you mean by “quantitative significance” (and for the class convener to note observer variability in the decisions). You have been asked to choose a specific boundary point that will allow the following conclusions in quantitative contrasts. What boundaries would you choose for each decision? If you used specific increments and ratios in making the decisions, indicate what they were.

10.5.1. What value of the success rate for active treatment will make you decide that it is substantially better than a placebo success rate of 45%?

10.5.2. The mortality rate with placebo is 8%. What should this rate become to decide that the active treatment is worthwhile?

10.5.3. Short-term pain relief has been rated on a scale of 0 to 3, with 0 = no relief and 3 = complete relief. The mean rating for placebo is 1.3. What should the mean rating be for an effective active treatment?

10.5.4. The rate of endometrial carcinoma in postmenopausal women who do not use estrogens is .001. What rate, if correctly associated with estrogens, will alarm you enough to make you want to stop estrogen treatment in a patient whose distressing menopausal syndrome is under excellent control?


10.6. By now you should appreciate the point that decisions about quantitative and stochastic significance are not always concordant. A quantitatively significant difference, such as 75% vs. 25%, may not achieve stochastic significance if the sample sizes are too small, such as 3/4 vs. 1/4. Conversely, a quantitatively trivial difference, such as baseball batting averages of .333 vs. .332, can become stochastically significant if the group sizes are large enough. (The latter statement will be proved in Exercise 14.5.) Without going through an exhaustive search, can you find and cite an example of this discordance in published literature? In other words, you want to look for one of two situations. In one situation, the investigators claimed “statistical significance” for a contrast that you believe was not quantitatively significant. In the other situation, the investigators dismissed, as “not statistically significant,” a contrast that you believe was important quantitatively and that should have received more serious attention.

One example — it can be either type — will suffice; but outline briefly what you found and why you disagree with the conclusion.

10.7. Find a published report of a randomized trial. It should be on a topic “important” enough to warrant advance calculation of sample size. Check to see what value of δ or θ (or any other demarcation of quantitative significance) was used for the sample-size calculation. Regardless of whether you can find this demarcation, note the results reported for the main outcome of the trial. Were they “statistically significant”? Do you regard them as quantitatively significant? (Give reasons and/or calculations for your answer.) Did the “significant” difference found in the actual results agree with the previously stated value of δ or θ?


11

Testing Stochastic Hypotheses

CONTENTS

11.1 Principles of Statistical Hypotheses
11.2 Basic Strategies in Rearranging Data
    11.2.1 Parametric Sampling
    11.2.2 Empirical Procedures
    11.2.3 Relocation Procedures
11.3 Formation of Stochastic Hypotheses
11.4 Statement of Hypothesis
11.5 Direction of the Counter-Hypothesis
11.6 Focus of Stochastic Decision
    11.6.1 P Values
    11.6.2 Confidence Intervals
    11.6.3 Relocation Procedures
11.7 Focus of Rejection
    11.7.1 P Values
    11.7.2 Confidence Intervals
    11.7.3 Relocation Procedures
11.8 Effect of One- or Two-Tailed Directions
    11.8.1 Construction of One-Tailed Confidence Intervals
    11.8.2 Origin of Two-Tailed Demands
    11.8.3 Controversy about Directional Choices
    11.8.4 Compromise Guidelines
11.9 Alternative Hypotheses
11.10 Multiple Hypotheses
    11.10.1 Previous Illustration
    11.10.2 Mechanism of Adjustment
    11.10.3 Controversy about Guidelines
11.11 Rejection of Hypothesis Testing
    11.11.1 Complaints about Basic Concepts
    11.11.2 Bayesian Approaches
11.12 Boundaries for Rejection Decisions
    11.12.1 P Values and α
    11.12.2 Confidence Intervals
    11.12.3 Descriptive Boundaries

References

Exercises

Statistical inference was originally developed to estimate the parameters of a parent population by using the results found in a single random sample. As discussed in Chapters 7 and 8, the estimated parameters were the location of a point, such as the parametric mean or proportion, and the magnitude of a confidence interval surrounding the point. (In statistical parlance, these inferences are often called point estimation
