many contagious diseases were ascribed to miasmal vapors. In the mid-20th century, thousands of premature babies were blinded during an iatrogenic epidemic of retrolental fibroplasia,1 caused by excessive oxygen therapy given as prophylaxis in the belief that its potential pulmonary benefits could not be accompanied by adverse effects elsewhere in the body.
23.1.1.3 Erroneous Interpretations — When the hypotheses emerge from the data, a correct set of information may be erroneously interpreted.
A major blunder in confusing association with causation was committed by William Farr (the respected “founder” of Vital Statistics) who concluded, from statistical correlations, that cholera was caused by high atmospheric pressure.2 In the early 20th century, pellagra was erroneously regarded as infectious after it was commonly found in members of a family and in their neighbors.3
23.1.2 Statistical Problems
Common statistical sources of error have been the failure to set quantitative boundaries, to appraise stochastic variation, and to understand dissident results.
23.1.2.1 Boundaries for Quantitative Distinctions — As the use of statistical data became popular in the 20th century, a subtle source of error was incorrect beliefs about magnitude. If Treatment A was believed “better” than B, but turned out “worse,” the scientific hypothesis itself was wrong. If A was only slightly or trivially better, however, the scientific hypothesis was still correct, but the anticipated magnitude of difference was wrong.
To be examined statistically, such concepts as better, worse, or trivially better must be converted to quantitative expressions. The scientific difficulty of choosing these expressions was discussed in Chapter 10. Qualitatively, a particular phenomenon (death, “success,” relief of symptoms, etc.) must be chosen as the focus of attention; and a particular statistical index (increment, direct ratio, proportionate increment, etc.) must be chosen to cite the quantitative distinction, do, observed in the compared treatments. The next statistical step is to demarcate a magnitude that will make this distinction be regarded as “big” or “small.”
The choice of these boundaries is often difficult, but they must nevertheless be established to allow statistical procedures to be used, before the research, for calculating a suitable sample size, and afterward, to decide whether the observed distinctions are impressive or unimpressive enough to warrant stochastic “tests of significance.”
23.1.2.1.1 Demarcations of “Big” and “Small”. Suppose the symbol δ is used for the lower boundary of a big or quantitatively impressive distinction. If Treatment A is expected to be substantially better than Treatment B, the scientific hypothesis might be cited symbolically as
A − B ≥ δ
If Treatments A and B are expected to be essentially equivalent, the difference in results will seldom be exactly zero. Consequently, an upper limit, expressed with the symbol ζ, can be established as the maximum magnitude of a small or quantitatively insignificant distinction. The scientific hypothesis of equivalence, or a tiny difference, could then be quantitatively cited as
A − B ≤ ζ
23.1.2.1.2 Effects of Quantitative Boundaries. The quantitative boundaries for δ and ζ are arbitrary, but no more arbitrary than the level of .05 usually set for the boundary of α in stochastic tests. With two sets of boundaries available for quantitative and stochastic decisions, however, a gallery of statistical errors becomes possible. Most of this chapter is concerned with possible errors in stochastic decisions when the investigator wanted to find a “big” distinction, i.e., do ≥ δ. The stochastic examination of “small” distinctions, i.e., the confirmation of “equivalence,” is discussed in Chapter 24.
23.1.2.2 Stochastic Variations — Stochastic variation refers to phenomena that can occur during the action of random chance. Wrong conclusions can occur if these variations are not recognized and suitably accounted for. For example, someone who wins the main prize in a lottery might become rich, but would not immediately be regarded as a talented selector of random numbers. In fact, if the same person wins again and particularly a third time, we might believe something is wrong with the lottery process. On the other hand, if a 12 is tossed with two dice, the chance probability of the occurrence is (1/6)(1/6) = 1/36 = .028, but we would not immediately reject the idea that the dice are “fair.” The event would regularly happen among a large series of consecutive tosses at a gambling casino’s dice table.
23.1.2.2.1 “False Positive” Conclusions. The customary “tests of statistical significance” are done to avoid erroneous “false positive” conclusions that might arise merely from stochastic variation; and α is set at the level of acceptable errors. Thus, an impressively large incremental success rate could readily arise by stochastic variation alone if two treatments, A and B, are actually equivalent, but produced pA = 8/13 = .615 and pB = 4/12 = .333, with do = .282, in a study done with small groups.
For these small numbers, the stochastic test of the null hypothesis is best done with the Fisher exact procedure, which would produce 2P = .238. The mathematical principles, however, are easier to illustrate with the Z test. The standard error of the difference is first calculated with Formula [14.11] as
SED = √{[(8 + 4)(5 + 8)] ⁄ [(13 + 12)(13)(12)]} = 0.2
The observed value of Z, designated as Zo, is then calculated as .282/.2 = 1.41, for which 2Po = 0.159. Although this P value is smaller than the result of the Fisher test, neither procedure would lead to rejection of the null hypothesis with α set at .05.
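These calculations are easy to reproduce. The following minimal Python sketch (assuming SciPy is available; the variable names are illustrative, not the book's) applies both the Fisher exact procedure and the Z test with the shortcut SED of Formula [14.11] to the 8/13 vs. 4/12 data:

```python
# A minimal sketch, assuming scipy is installed, of the small-group
# comparison in Section 23.1.2.2.1: pA = 8/13 = .615 vs. pB = 4/12 = .333.
from math import sqrt
from scipy import stats

a_succ, a_fail = 8, 5     # Treatment A: 8/13
b_succ, b_fail = 4, 8     # Treatment B: 4/12

# Fisher exact test on the 2x2 table
_, p_fisher = stats.fisher_exact([[a_succ, a_fail], [b_succ, b_fail]])

# Z test with the pooled "shortcut" SED of Formula [14.11]
nA, nB = a_succ + a_fail, b_succ + b_fail
N = nA + nB
sed = sqrt((a_succ + b_succ) * (a_fail + b_fail) / (N * nA * nB))
z = (a_succ / nA - b_succ / nB) / sed
p_z = 2 * stats.norm.sf(z)

print(f"Fisher 2P = {p_fisher:.3f}")                       # text reports .238
print(f"SED = {sed:.3f}, Zo = {z:.2f}, 2Po = {p_z:.3f}")   # .200, 1.41, .159
```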
23.1.2.2.2 “False Negative” Conclusions. Stochastic variation, however, might also lead to an erroneous “false negative” conclusion. For example, suppose two treatments in a clinical trial really differ substantially (i.e., by at least δ = .15), and suppose at least 25 patients had been entered into each group. If the subsequent results then show that pA = 14/29 = .483 and pB = 12/28 = .429, the value of do = .483 − .429 = .054 is less than δ = .15. This result is not quantitatively significant and is also not stochastically significant with a test in which we first calculate
SED = √{[(14 + 12)(15 + 16)] ⁄ [(57)(29)(28)]} = .132
The value of Zo is then .054/.132 = .409, for which 2Po = .68.
To avoid a false negative conclusion, however, we can first check to see whether the observed small value of do is a stochastic variation, which differs by chance from the true value of δ ≥ .15. An obvious way to check for the latter possibility was discussed in Sections 11.7.2 and 11.9. We determine whether δ = .15 is included in the upper boundary of an appropriate confidence interval around do = .054. With
.132 as the value of SED, the 95% confidence interval is .054 ± (1.96)(.132) = .054 ± .258. It extends from −.205 to .313. The result is not stochastically significant because the value of 0 is included, but the confidence interval also includes the value of .15. The result might therefore be a stochastic variation from a true difference, between treatments A and B, that is actually as large as δ = .15.
Because the confidence interval component, Zα × SED, is .258 here, the value of δ = .15 would be included in the upper part of the confidence interval for all positive values of do. The interval would fail to reach δ only if do < δ − Zα(SED), i.e., only if do were below .150 − .258 = −.108.
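The confidence-interval check can be sketched the same way; the Python fragment below (again assuming SciPy, with illustrative names) reproduces the interval from −.205 to .313 and shows that both 0 and δ = .15 are included:

```python
# A minimal sketch, assuming scipy, of the false-negative check in
# Section 23.1.2.2.2: does the 95% CI around do = .054 still include delta?
from math import sqrt
from scipy import stats

pA, nA = 14 / 29, 29
pB, nB = 12 / 28, 28
do = pA - pB                    # .054
delta = 0.15                    # boundary of a "big" difference

# shortcut SED of Formula [14.11]
sed = sqrt((14 + 12) * (15 + 16) / ((nA + nB) * nA * nB))   # .132

z_alpha = stats.norm.isf(0.025)                 # 1.96 for two-tailed alpha = .05
lo, hi = do - z_alpha * sed, do + z_alpha * sed
print(f"95% CI: ({lo:.3f}, {hi:.3f})")          # (-.205, .313)
print("includes 0:", lo <= 0 <= hi)             # True: not stochastically significant
print("includes delta:", lo <= delta <= hi)     # True: a big difference cannot be ruled out
```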
23.1.2.2.3 Clinical Claims of “No Difference”. The situation just described constantly occurs in medical literature when the investigator gets the observed data, finds that the customary P value exceeds the α level of “significance,” and then concludes that the study had “nonsignificant” results.
If a confidence interval was not published to show how large the “nonsignificant” difference might have been (or sometimes even if the confidence interval was shown), irate readers will regularly send letters to the editor complaining about the omission. The readers usually contend that the group sizes
were too small to prove the claim. The original authors may then respond by citing the confidence intervals (which may often include a “big” result), but offering various justifications for the claim of nonsignificance.
Such arguments have occurred after publication of clinical trials claiming that early discharge from hospital was relatively safe after acute myocardial infarction,4 that glucagon injections did not improve accuracy of a double-contrast barium enema,5 that exchanging unsaturated fats did not affect plasma lipoproteins,6 that thoracic radiotherapy did not prolong survival in patients with carcinoma of the lung,7 and that ritodrine (a beta-adrenergic agonist) was not effective in treating preterm labor.8 If a big difference is included, the upper boundary of the confidence interval can readily be used to justify the contention that such a difference might exist.
23.1.2.3 Statistical Dissidence — The quantitative and stochastic decisions agree if the observed distinction seems quantitatively significant, and if the stochastic test confirms the significance. Statistical dissidence occurs when the two sets of results do not agree, so that significance is found quantitatively but not stochastically, or stochastically but not quantitatively. The conclusion will be wrong if a correct quantitative distinction is ignored in favor of the contradictory stochastic result.
The quantitative-yes–stochastic-no type of dissidence was frequently noted in scientific literature as stochastic tests became increasingly used to prevent erroneous conclusions from small groups. The dissidence occurs when the group size is too small to allow stochastic confirmation for a big quantitative distinction. Without the stochastic test, an investigator might claim significance when a quantitatively impressive increment of 15% in success rates of 25% vs. 40% came from numbers as small as 1/4 vs. 2/5. “Tests of significance” were introduced and intended to prevent this problem. The opposite type of quantitative-no–stochastic-yes dissidence, which has become increasingly common when stochastic tests are used as the main or only basis for scientific conclusions, is the stochastic proclamation of significance for a small, unimpressive quantitative distinction.
23.1.2.3.1 “Boundless Significance” and Oversized Groups. The enormous impact of size in tested groups was shown in earlier discussions of the Z, t, and chi-square tests. If the groups are too small, an impressive quantitative distinction may not be stochastically confirmed; but if the groups are too big, an unimpressive distinction may become stochastically significant.
The latter type of statistical dissidence can occur because the customary calculation of stochastic significance is “boundless.” The quantitative boundary of δ is neither used nor needed for determining a conventional P value from Zo = do/(SED) or a confidence interval constructed as do ± Zα (SED). If the calculated Zo exceeds Zα, or if the confidence interval excludes 0, the stochastic result can be proclaimed
“significant,” regardless of the actual magnitude of do. For example, in a study that contains more than 2200 persons in each group, the rates of “success” may be 750/2207 = .34 in Group A and 819/2213 =
.37 in Group B. The increment of .03 in the two groups may seem small and unimpressive, but it is stochastically significant at P < .05, because the big groups lead to a suitably large value of 2.1 for Zo.
There is no statistical method to prevent erroneous conclusions in this situation. They can be avoided (or “cured”) only if investigators (and readers) preserve their scientific judgment and examine the actual magnitude of the observed distinction. If it is not big enough to be impressive, it is not “significant” even if the P value is infinitesimal.
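The arithmetic of the large-group example can be verified with a brief sketch (assuming SciPy; the names are illustrative):

```python
# A hedged sketch, assuming scipy, of the "boundless significance" example:
# a trivial increment of .03 becomes stochastically significant in huge groups.
from math import sqrt
from scipy import stats

sA, nA = 750, 2207      # Group A: 750/2207 = .34
sB, nB = 819, 2213      # Group B: 819/2213 = .37

do = sB / nB - sA / nA                         # about .03
sed = sqrt((sA + sB) * ((nA - sA) + (nB - sB))
           / ((nA + nB) * nA * nB))            # pooled shortcut SED
z = do / sed
print(f"do = {do:.3f}, Zo = {z:.1f}, 2P = {2 * stats.norm.sf(z):.3f}")
# do = .030, Zo = 2.1, 2P < .05 -- "significant" despite being quantitatively trivial
```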
23.1.2.3.2 Problems of Undersized Groups. The stochastic dissidence caused when undersized
groups are too small to allow rejection of the null hypothesis was illustrated in Section 23.1.2.2.1. In a
well-conducted clinical trial that produced pA = 8/13 = .615 vs. pB = 4/12 = .333, the investigator could not get stochastic confirmation for the impressively large quantitative increment of do = .282.
This problem, although regularly regarded as a defect in “power” of the trial, is actually due to a simpler defect in what might be called capacity. As noted later, power refers to the ability to reject an alternative hypothesis that the quantitative distinction is large although the observed result may be small. Capacity, however, refers to the ability to reject the original null hypothesis when the observed distinction is large.
23.2 Calculation of Capacity
The statistical dissidence just described occurred because the group sizes were too small. If the investigator had really expected to find an increment as large as .282 between the two treatments, the necessary sample size for stochastic significance at a two-tailed P < .05 could have been calculated with the earlier Formula [14.20], using πB = .333, to get
n ≥ (2)(.333)(.667)(1.96)²/(.282)² = 21.5
At least 22 patients would have been required in each group. With Formula [14.19], for which π would be estimated as (.615 + .333)/2 = .474, the sample size needed for each group would have been
n ≥ (2)(.474)(.526)(1.96)²/(.282)² = 24.1
or at least 25. With either calculation, the actual group sizes of 13 and 12 would lack the capacity to achieve a stochastic 2P < .05.
If δ = .15 had been originally chosen as a boundary for quantitative significance, the sample size required by Formula [14.19] would have been
n ≥ 2(.4065)(.5935)(1.96)²/(.15)² = 82.4
With at least 83 persons in each group, the quantitatively impressive do = .282 would easily have yielded 2P < .05.
The cited stochastic defect in capacity can easily be quantified numerically. The foregoing calculations (where do = .282 and π was estimated as .474) showed that about 25 persons were required for each group, making a total required size of NR = 50. Because the actual group, No, contained 13 + 12 = 25 people, the capacity was approximately No/NR = 25/50 = 50% of what was needed.
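A minimal Python sketch of these sample-size and capacity calculations, using Formula [14.19] with the text's numbers (the variable names are illustrative):

```python
# A minimal sketch of the capacity calculation in Section 23.2, using
# Formula [14.19] with the averaged pi-hat; numbers follow the text.
from math import ceil

z_alpha = 1.96
do = 0.282                      # observed increment: .615 - .333
pi_hat = (0.615 + 0.333) / 2    # .474

n_each = 2 * pi_hat * (1 - pi_hat) * z_alpha**2 / do**2
print(f"required n per group: {n_each:.1f} -> {ceil(n_each)}")   # 24.1 -> 25

N_required = 2 * ceil(n_each)   # 50 in total
N_observed = 13 + 12            # 25 in total
print(f"capacity = {N_observed / N_required:.0%}")               # 50%
```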
The main point to be noted, however, is that the trial under discussion was defective in its basic capacity, not in its power to reject an alternative hypothesis, as discussed shortly.
23.3 Disparities in Desired and Observed Results
In a study that begins with the goal of finding a big difference, do ≥ δ, three possible outcomes can occur. In the first two, the result is “positive,” with do ≥ δ. This “positive” result is then either confirmed stochastically, with Po ≤ α, or not confirmed, with Po > α. In the third situation, the result of the trial is “negative,” with do < δ. In this situation, the investigator hopes that the small do is stochastically consistent with a big δ.
23.3.1 General Conclusions
An observed “positive” big distinction, i.e., do ≥ δ, would be stochastically confirmed if the group size had full capacity, but would not be confirmed, with P > α, if the group size was too small. If the result was “negative,” i.e., do < δ, the small do might be rendered stochastically significant if the group size was huge; but in most reasonable situations, the associated Po would exceed α, and the distinction would be nonsignificant, both quantitatively and stochastically.
In the last situation, however, an investigator who wanted to find do > δ would be delighted if δ were included in the upper end of the confidence interval for do. After savoring the delight, however, a cautious investigator might have a nagging doubt. Suppose the scientific hypothesis is wrong, so that do is really small, rather than being merely a stochastic variation from δ. Worried about this possibility, the investigator may now want some further stochastic reassurance that can prove or confirm “no difference.” Thus, the investigator would ask, “How can I be sure I have not been deluded by random fate? What would be convincing evidence that the treatments have a really small difference?”
23.3.2 Group Size for Exclusion of δ
Scientifically, the immediate answer to the latter questions is “Repeat the trial.” Statistically, however, a numerical solution can be offered. If a suitable confidence interval excludes the “big” value of δ, the demonstration that

do + Zα(SED) < δ          [23.1]

could be reasonably assuring. It would indicate that the “small” value of do is probably not merely a stochastic variation from a true large value of δ.
To determine the group size required for this assurance, we first convert Formula [23.1] to Zα(SED) < (δ − do). Assuming equal group sizes, SED can then be calculated as √[2π̂(1 − π̂) ⁄ n]. After the algebra is developed, the size of n needed in each group will be

n > [2 Zα² π̂(1 − π̂)] ⁄ (δ − do)²          [23.2]
To illustrate the calculation, suppose we assume that do will be .054, as in Section 23.1.2.2.2. We determine π̂ as the average of the estimated pA and pB, which is (.483 + .429)/2 = .456, so that 1 − π̂ = .544. We set Zα = 1.96 and then substitute in [23.2] to get
n > [(1.96)²(2)(.456)(.544)]/(.150 − .054)²

which turns out to be 1.906/(.096)² = 206.8.
Thus, if a trial with 207 patients in each group yields the expected pA = .483, pB = .429, and do = .054, the value of δ = .15 would be excluded from the confidence interval. The investigator would then be able, with 95% confidence, to conclude that Treatment A does not exceed Treatment B by a difference of δ = .15 or more.
If things turn out almost exactly as anticipated in the preceding paragraph, the results with the larger sample size will be pA = 100/207 = .483 and pB = 89/207 = .430. The SED will be
√{[(89 + 100)(107 + 118)] ⁄ [(414)(207)(207)]} = .049
The 95% confidence interval will be (.483 − .430) ± (1.96)(.049) = .053 ± .096; and it will extend from −.043 to .149. With 0 included, the result is not stochastically significant; and with .150 excluded, the result offers stochastic assurance that do is unlikely to be as large as .150. The investigator could now conclude that the original scientific hypothesis was probably wrong. Despite the desired hope, Treatment A is not substantially better than Treatment B.
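The group-size requirement of Formula [23.2], and the verification with 207 patients per group, can be sketched as follows (the numbers are the text's; the code and names are illustrative):

```python
# A minimal sketch of Formula [23.2]: the group size needed for the upper
# 95% confidence boundary around a small do to exclude the "big" delta.
from math import sqrt, ceil

z_alpha = 1.96
delta, do = 0.15, 0.054
pi_hat = (0.483 + 0.429) / 2                   # .456

n = 2 * z_alpha**2 * pi_hat * (1 - pi_hat) / (delta - do)**2
print(f"n per group > {n:.1f} -> {ceil(n)}")   # 206.8 -> 207 per group

# Verification with 207 per group, as in the text
pA, pB, n_grp = 100 / 207, 89 / 207, 207
sed = sqrt(2 * pi_hat * (1 - pi_hat) / n_grp)  # about .049
hi = (pA - pB) + z_alpha * sed
print(f"upper boundary = {hi:.3f}")            # .149, so delta = .15 is excluded
```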
23.4 Formation of Alternative Stochastic Hypothesis
The foregoing tactics brought us into a new type of stochastic reasoning. In everything done until now, we found a “big” difference, do, that was scientifically expected and welcome, so we tried to confirm it stochastically. The stochastic hypothesis, symbolized as ∆, was assumed to be the opposite of what we wanted to prove. We made ∆ as small as possible, i.e., ∆ = 0.
For the new situation, however, we want stochastic confirmation for a “small” difference, i.e., do < δ. We therefore want to reject a different hypothesis, i.e., that ∆ ≥ δ. If δ is excluded from the corresponding confidence interval, we could conclude stochastically that the observed result, do, is indeed smaller than δ.
This conclusion, however, would reverse the original goal of the trial, which was done with the hope of finding do ≥ δ. The reversal is not important for the statistical procedures that follow in the next few
sections, but becomes a crucial feature of the reasoning when we reach the Neyman-Pearson strategy considered in Section 23.6.
23.4.1 Statement of Alternative Hypothesis
The conventional null hypothesis places the value of ∆ at 0 for increments, correlation coefficients, or slopes, and at 1 for a ratio. To avoid frequently repeating the comment about “1 for a ratio,” all null hypotheses about “equivalence” or “no distinction” will hereafter be cited as 0. The same ideas and approaches will also pertain, if expressed for a ratio, but the null hypothesis will be 1.
With ∆ representing the value of the stochastic hypothesis, the conventional “null” assumption, ∆ = 0, was stated (for two proportions) as Ho : πA − πB = 0. In the new procedure, however, the hypothesis to be rejected is the alternative statement that ∆ ≥ δ. The symbols would be HH : πA − πB ≥ δ.
23.4.1.1 Imprecise Counter-Hypothesis — In the logic of stochastic testing, a primary hypothesis can be rejected or conceded, but never accepted. The primary hypothesis is therefore set to
be the opposite of what we would like to conclude; and when the hypothesis is rejected, we concede the counter-hypothesis. To be the direct opposite of the null hypothesis, ∆ = 0, the counter-hypothesis must be imprecise, without a stipulated focal point. If ∆ = 0, the counter-hypothesis can be either a two-tailed ∆ ≠ 0, or in a one-tailed direction, ∆ > 0 or ∆ < 0; but it cannot be ∆ = δ.
This logic is responsible for the “boundless significance” discussed in Section 23.1.2.3.1. Suppose δ = .15 is set as the level of quantitative significance for an increment in two proportions, and suppose
the results show that pA = 27/42 = .64 and pB = 11/39 = .28. For the quantitatively significant distinction of do = pA − pB = .36, the value of Zo turns out to be 3.25. Although the observed value of do can now be deemed stochastically significant at a two-tailed P < .05, the actual stochastic conclusion is only that ∆ ≠ 0. The observed do = .36 acquires its label of “stochastic significance” merely by being compatible with the stochastic conclusion. This same conclusion could have been obtained with adequately large
group sizes if do were smaller, at .19. The observed do could even be stochastically significant when smaller than δ, at values of .10, .07, or .03. For example, suppose pA = 238/2166 = .11 and pB = 80/2184 =
.04, so that do = .07. This result is substantially smaller than δ = .15, but it would produce Zo ≈ 9, for
which 2P < .05. Despite the relatively small do, we can still reach the same stochastic conclusion, i.e., ∆ ≠ 0, as with the previous big do.
23.4.1.2 Precise Location for Alternative Hypothesis — Unlike a counter-hypothesis, the alternative hypothesis has its own specific dignity and focal location. Like any other stochastic hypothesis, the alternative hypothesis can be rejected or conceded but not accepted. When considered for the
possibility of being false, i.e., rejected, the alternative stochastic hypothesis must have a precisely specified location, analogous to the precision of ∆ = 0.
If the goal is to get stochastic confirmation that do < δ, the precise value of the alternative hypothesis is usually set at δ. With HH as the symbol, the alternative hypothesis becomes expressed as HH : ∆ = δ; and its counter-hypothesis becomes ∆ ≠ δ. For reasons to be cited shortly, the alternative hypothesis is
almost always checked in a one-tailed direction. The appropriate statement would then be HH : ∆ ≥ δ; and the counter-hypothesis would be ∆ < δ. For simplicity of expression and calculation, however, the
usual statement is simply HH : ∆ = δ. The directional issues are implicit when the results are interpreted. In a contrast of two proportions, pA and pB, the parametric alternative hypothesis is HH : πA − πB = δ, or (if specifically two-tailed) HH : |πA − πB| = δ. In a contrast of two means, X̄A and X̄B, the same
principles are used, but the parameters in the hypothesis are µA and µB.
With this operating principle, we can explore what has been called “the other side of statistical
significance,”9 by considering what happens if the original null hypothesis is false and should have been rejected. Its falsity is explored with the alternative stochastic hypothesis, for which δ replaces the null-
hypothesis value of 0. Under the alternative hypothesis, the observed value of do is examined as the increment of δ − do.
FIGURE 23.1
Location of do in reference to distributions for original null hypothesis (upper drawing) and for alternative hypothesis (lower drawing).
23.4.2 Alternative Standard Error
Under the alternative hypothesis (as under the null hypothesis), the increment in two means or in two proportions continues to have a theoretical Gaussian (or Gossetian) sampling distribution for values of Z (or t). Figure 23.1 shows the location of the observed do and the potential Gaussian distributions of increments under each of the two stochastic hypotheses, ∆ = 0 and ∆ = δ.

As noted earlier, the standard error of a difference in two central indexes, symbolized as SED, is calculated differently when ∆ = δ rather than ∆ = 0. For contrasting two observed proportions with ∆ = 0, Formula [15.9] for SED is

SEDo = √[NPQ ⁄ (nA nB)]

but for ∆ ≥ δ, the analogous calculation in Formula [15.12] is

SEDH = √[(pA qA ⁄ nA) + (pB qB ⁄ nB)]
Both calculations can be eased with the “shortcut” formulas shown earlier in Expressions [14.11] and [14.13].
The actual difference between the two SEDs, however, is usually small and inconsequential (see Section 14.4.2). For example, although the SED was calculated as SED0 in Section 23.1.2.2.2, the procedure was aimed at rejecting the alternative hypothesis that ∆ ≥ δ. Consequently, the calculation should have used SEDH, which would have been √{[(14)(15) ⁄ 29³] + [(12)(16) ⁄ 28³]} = .132. The result (at three decimal places), however, is the same as the previously calculated SED0 = .132.
In many ensuing discussions in this text, the SED symbol will be used in a general way, regardless of which formula it comes from. For illustrative calculations, SEDH will be used when it is particularly pertinent, but SED0 will often be preferred because of its greater general applicability. A single calculation for SED0 has the advantage of letting the same confidence interval sometimes be used, as in Section 23.1.2.2.2, for checking both the null hypothesis (in the lower boundary) and the alternative hypothesis (in the upper boundary).
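A brief sketch comparing the two SEDs for the 14/29 vs. 12/28 data confirms that the distinction is inconsequential here (illustrative code, with the text's numbers):

```python
# A brief sketch comparing SED0 (null hypothesis) with SEDH (alternative
# hypothesis) for the data of Section 23.1.2.2.2.
from math import sqrt

sA, nA = 14, 29        # pA = .483
sB, nB = 12, 28        # pB = .429

# SED0: pooled shortcut form of Formula [15.9] / [14.11]
sed0 = sqrt((sA + sB) * ((nA - sA) + (nB - sB)) / ((nA + nB) * nA * nB))

# SEDH: unpooled form of Formula [15.12] / [14.13]
sedH = sqrt(sA * (nA - sA) / nA**3 + sB * (nB - sB) / nB**3)

print(f"SED0 = {sed0:.3f}, SEDH = {sedH:.3f}")   # both round to .132
```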
23.4.3 Determining ZH and PH Values
Using the alternative hypothesis, the symbols ZH and PH will correspond to the Zo and Po obtained with the ordinary null hypothesis. For comparing two groups, the alternative Z values will come from the formula
ZH = (δ − do) ⁄ SED          [23.3]

With the alternative SEDH calculated for two proportions, the formula will be

ZH = (δ − do) ⁄ √[(pA qA ⁄ nA) + (pB qB ⁄ nB)]          [23.4]
For example, in a clinical trial where pA = 9/18 and pB = 8/17, so that do = .500 − .471 = .029, the value of SEDH under the alternative hypothesis will be

√{[(9)(9) ⁄ 18³] + [(8)(9) ⁄ 17³]} = .169

If δ is designated as .15,

ZH = (.15 − .029) ⁄ .169 = .716
The values of ZH are interpreted as P values in exactly the same way as under the conventional null hypothesis. At the Gaussian value of ZH = .716, the two-tailed PH is .47. Thus, there is a two-tailed chance of .47, and a one-tailed chance of .235, that the observed result of do = .029 came from a population in which the true difference was as large as .15.
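A minimal sketch of this calculation (assuming SciPy for the Gaussian tail probability; the names are illustrative):

```python
# A minimal sketch of Formula [23.4] for the trial in Section 23.4.3.
from math import sqrt
from scipy import stats

sA, nA = 9, 18         # pA = .500
sB, nB = 8, 17         # pB = .471
delta = 0.15

do = sA / nA - sB / nB                                         # .029
sedH = sqrt(sA * (nA - sA) / nA**3 + sB * (nB - sB) / nB**3)   # .169
zH = (delta - do) / sedH                                       # .716
pH2 = 2 * stats.norm.sf(zH)                                    # two-tailed .47
print(f"ZH = {zH:.3f}, 2PH = {pH2:.2f}, one-tailed PH = {pH2 / 2:.3f}")
```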
23.4.4 Role of β
For the original null hypothesis, the α level establishes the boundary of α-error or Type I error for false positive conclusions if a correct hypothesis is rejected. For the alternative hypothesis, a corresponding level, called β, establishes the boundary of β-error or Type II error for the relative frequency of wrong decisions if a correct alternative hypothesis is rejected. If HH is true, its rejection would lead to the false negative conclusion that the two groups are not substantially different, when in fact they are.
Table 23.1 shows the use of α and β levels in reasoning for the original null-hypothesis decision. If the null hypothesis that ∆ = 0 is correct, there is an α chance that rejection is wrong, and a 1 − α chance that concession is right. If the true state of affairs is ∆ = δ, however, concession of the original null hypothesis has a β chance of being wrong, and rejection has a 1 − β chance of being correct.

TABLE 23.1
α, β, and Accuracy of Stochastic Decisions for Null Hypothesis

Conclusion RE Stochastic              True State of Reality
Hypothesis That ∆ = 0          ∆ = δ                        ∆ = 0

Reject                         True Positive                False Positive
                               Conclusion (1 − β)           Conclusion (α)

Concede                        False Negative               True Negative
                               Conclusion (β)               Conclusion (1 − α)

23.4.5 Analogy to Diagnostic Marker Tests
The statistical parlance does not use the language of diagnostic marker decisions,
but the concepts are almost identical. Suppose a pap smear is done as a diagnostic marker test for a cancer. If the pap smear result agrees with the definitive tissue biopsy, the pap smear conclusion is either a true positive or true negative. If the pap smear and biopsy disagree, the original conclusion is either falsely positive or falsely negative.
23.4.5.1 False Positive Conclusions — If the null hypothesis is rejected with Po < α, there is still a probability of Po that the rejection is wrong. The selected value of α is the upper boundary of risk for the false positive conclusion. Thus, if α is set at .05 and stochastic significance is proclaimed when Po = .049, the two groups may still be truly similar, and the probability is .049 that the decision is wrong. With α set at a higher level of .1, the quantitative range of false positive conclusions is expanded. The two groups might really be similar and the decision that they are different might be wrong in .03, .06,
.08, .09, or .099 of the occasions when the null hypothesis is rejected at the corresponding values of P < .1. When α is set as the boundary of “risk” for a false positive decision, the level of 1 − α is analogous to the specificity of a diagnostic test. In previous usage, 1 − α helped establish the boundaries of a
confidence interval. In the application here, 1 − α helps denote the relative “confidence” attached to a stochastic decision to concede the null hypothesis.
23.4.5.2 False Negative Conclusions — If the original null hypothesis is rejected as false, we
infer that the parent universe has a big distinction (at least as big as the observed do), rather than none. If the null hypothesis is conceded, however, and if the parent universe really does have a big distinction, the concession will be a false-negative conclusion. Because β is set as the permissible frequency of false-negative conclusions, the value of 1 − β is analogous to the sensitivity of a diagnostic test.
23.4.5.3 Role of Horizontal “Gap” — When “vertical” indexes of sensitivity and specificity are used in diagnostic decisions (see Chapter 21), we cannot immediately make “horizontal” appraisals of accuracy, because the prevalence of diseased cases will vary in different clinical situations. An analogous problem prevents horizontal conclusions in stochastic decisions, but the problem does not
arise from prevalence. The stochastic problem is caused by the numerical gap that separates ∆ = 0 from ∆ = δ in Table 23.1. If the observed value of do lies in the intermediate zone where 0 < do < δ, we might have to concede (or reject) both the original null and the alternative hypotheses.
23.4.6 Choice of β
Stated as ∆ ≥ δ, the alternative hypothesis obviously has a clear direction and could therefore be tested with a one-tailed choice of β. Accordingly, for a .05 level of rejection, Zβ could be set at Z.1 = 1.645.
The concept becomes important if confidence intervals are used to examine both the null and the alternative hypotheses. In previous examples, this examination was done with a “single” arrangement, constructed as
do ± Zα(SEDo )
A more accurate approach, however, would require two arrangements:
do – Zα(SEDo )
would be used to locate the lower border, and
do + Zβ(SEDH )
would indicate the upper border.
If the alternative hypothesis is ∆ ≥ δ, a one-tailed 95% confidence interval can be used to check the upper border, which will be enlarged if calculated with a two-tailed Zα, rather than with a one-tailed Zβ. Unless the original null hypothesis was clearly expressed in advance as ∆ > 0 or ∆ < 0, however, a one-tailed calculation is not appropriate for the lower border of the confidence interval.
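As an illustration with the Section 23.1.2.2.2 numbers (where SED0 and SEDH happened to be identical), the two arrangements could be sketched as follows (illustrative code):

```python
# An illustrative sketch of the two-arrangement interval: a two-tailed
# Z_alpha for the lower border, a one-tailed Z_beta for the upper border.
do = 0.054
sed0 = 0.132           # SED under the null hypothesis
sedH = 0.132           # SED under the alternative hypothesis (here identical)
z_alpha = 1.96         # two-tailed .05
z_beta = 1.645         # one-tailed .05

lower = do - z_alpha * sed0    # -.205: the value 0 is included
upper = do + z_beta * sedH     #  .271: delta = .15 is still included
print(f"lower = {lower:.3f}, upper = {upper:.3f}")
```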
[This distinction led to a major legal battle between the U.S. tobacco industry and the Environmental Protection Agency (EPA), which had done a meta-analysis of results for lung cancer attributed to environmental tobacco exposure (i.e., “passive smoking”). Certain crucial odds ratios that were not stochastically significant in two-tailed 95% confidence intervals, calculated with Zα = 1.96, became “significant” when the EPA’s 90% confidence intervals, calculated with Zα = 1.645, excluded the null value of 1 from the lower border. The tobacco industry contended that substituting 90% for the customary 95% criterion was a political rather than scientific decision. (The argument included other scientific disputes beyond the accusation of “rigged” confidence intervals.)]
Both the lower and upper margins of confidence intervals should be examined when investigators either claim stochastic significance in rejecting ∆ = 0, or argue that the upper level of “risk” might be much higher than what was found in the observed do. Rejection of ∆ = 0 is easier if the interval is calculated with a one-tailed Zα; the converse claims of a potentially larger do are facilitated with a two-tailed Zβ.
23.5 The Concept of “Power”
The unfamiliar term capacity was used in Section 23.2 to refer to group sizes that were too small to do the desired job of rejecting the original null hypothesis when do was “big.” The word is unfamiliar because statisticians regularly use the term power in reference to the adequacy of group (or sample) sizes. The idea of power, however, refers to the ability to reject the alternative, rather than the null, stochastic hypothesis.
23.5.1 Statistical Connotation of “Power”
In the customary “test of significance,” a big distinction has been observed, and the stochastic question is “How small might it have been?” If the Po value exceeds α, or if the lower end of a 1 − α confidence interval includes the null hypothesis value of 0, the distinction is not stable enough for its “quantitative significance” to be confirmed stochastically.
The “other side of statistical significance” is examined when the observed distinction is small, or obviously not big, i.e., do < δ. The stochastic question is then “How large might it have been?” This question can be answered with a direct counterpart of the former reasoning. If the PH value exceeds β, or if the upper end of a 1 − β confidence interval includes the alternative hypothesis value of δ, the quantitative “nonsignificance” is not confirmed. Although small, the observed distinction might really be big. Thus, rejection of the alternative stochastic hypothesis is intended to confirm that the observed small distinction is really small.
Although the ability of a group size to reject a stochastic hypothesis is often called “power,” the statistical definition of power is much more constrained. When δ and β are set in advance, 1 − β is called the statistical power to reject the alternative hypothesis that ∆ ≥ δ. This prospective concept of power is also sometimes applied in retrospect, after a study is completed. When ZH is determined from Formula [23.4] and converted to PH, the value of 1 − PH may be called “power.”
The latter usage of “power” has been vigorously disputed,10 however, because the strict definition requires a single boundary value of δ that was designated before the research began. This “prospective” boundary is often not established, however; and in its absence, the investigator or data analyst can
retrospectively make various choices of δ. Each choice would yield different results for ZH and for the 1 − PH value of “power.” For example, consider the clinical trial in Section 23.4.3, where pA = 9/18, pB = 8/17, do = .029, and SEDH = .169. When δ was chosen to be .15, ZH = .716, and the one-tailed value of 1 − PH was 1 − .235 = .765. If δ is set at .10, ZH = .420, 2PH = .674, and 1 − PH = .663. If δ is set at .20, ZH = 1.01, 2PH = .312, and 1 − PH = .844. To get an impressively high value of power, we could set δ at .32. ZH would then be 1.72, 2PH = .085, and 1 − PH = .9575.

With these arbitrary retrospective choices of δ, however, “power” would become a type of “variable,”
rather than a distinctive, fixed attribute of the study. The opponents of this retrospective manipulation of “power” argue that the best way to answer the retrospective question, “How big might it have been?” is with an appropriate confidence interval (or perhaps a form of Bayesian strategy). Thus, in the foregoing example, Zβ would be 1.645 for the upper end of a one-tailed 95% confidence interval, constructed around do as
.029 + (1.645)(.169) = .307.
With this result, we could “rule out” the possibility that the true difference was as large as .32, but not that it might be as large as .30.
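The instability of retrospective “power” is easy to demonstrate; the sketch below (assuming SciPy; illustrative names) recomputes 1 − PH for each arbitrary choice of δ:

```python
# A hedged sketch of how retrospective "power" (1 - PH) shifts with each
# arbitrary choice of delta, for the trial with do = .029 and SEDH = .169.
from scipy import stats

do, sedH = 0.029, 0.169
for delta in (0.10, 0.15, 0.20, 0.32):
    zH = (delta - do) / sedH
    pH = stats.norm.sf(zH)              # one-tailed PH
    print(f"delta = {delta:.2f}: ZH = {zH:.2f}, 1 - PH = {1 - pH:.3f}")
# approximately .66, .76, .84, .96, matching the text within rounding
```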
23.5.2 Comparison of “Capacity” and “Power”
Because power refers to the alternative hypothesis, the term capacity was introduced here for the ability of a group (or sample) size to achieve “single significance” by rejecting the original stochastic hypothesis. If the scientific goal of the research is to find something big, the original stochastic hypothesis is ∆ = 0;