
Principles of Medical Statistics (Feinstein, 2002)

Nevertheless, if X̄B turns out unexpectedly to exceed X̄A and if B is a generic drug, its greater availability may impair the established dosing schedule of the standard agent. [This problem arose a few decades ago when digitalis toxicity developed in patients who transferred, at apparently similar constant dosage, from a “brand name” to a generic product.17]

To deal with the possibility that X̄A < X̄B, a separate boundary can be established for the largest permissible negative increment in X̄A − X̄B. The data analysts may then demarcate two boundaries of tolerance, one for X̄B − X̄A < ζ1 and the other for X̄A − X̄B < ζ2. The extra complexity of using different positive and negative values for tolerance can be avoided, however, if we use the same value for both. (If desired, the discussions that follow can be readily adapted to encompass separate positive and negative boundaries.)

24.3 Stochastic Reasoning and Tests of Equivalence

As recently noted by Greene et al.,18 many studies claiming that results showed equivalence did not set a boundary for its magnitude, and made the claim only after the null hypothesis was not rejected when stochastic significance was tested for the observed distinction. The appropriate stochastic methods for examining and confirming equivalence create a challenging and currently unresolved problem. Because stochastic conclusions are based on rejecting hypotheses, equivalence obviously cannot be tested with the goal of rejecting the conventional null hypothesis, ∆ = 0, which is actually the basic idea we would like to confirm. Because the customary procedure cannot be used, some other approach is needed for stochastic tests of equivalence. The discussion that follows will first describe the approach that is currently used and will then introduce a new strategy, based on conventional stochastic logic, which can be reversed for testing equivalence.

24.4 Customary Single-Boundary Approach

The customary stochastic approaches for testing equivalence rely on confidence intervals, not on tests of a specific hypothesis. The calculations and consequences will differ according to whether a big δ or small ζ is used in stating the permissible boundaries of the confidence intervals.

24.4.1 Procedure for “Big” δ

The earliest stochastic proposals for testing equivalence applied a confidence-interval strategy19,20 in which δ for equivalence was set as the maximum “big” boundary. The confidence interval needed to stochastically confirm an observed difference, do, as small was

$$d_o + Z_\alpha(\mathrm{SED}) < \delta \qquad [24.1]$$

Regardless of whether the selected Zα is designated as Zα or Zβ and regardless of whether the probabilities are one- or two-tailed, this confidence-interval strategy uses the same statement that appeared in Chapter 23 for rejecting the alternative stochastic hypothesis that HH: ∆ ≥ δ. Furthermore, the confidence-interval statement can readily be converted to an equivalence hypothesis symbolized as HE: ∆ ≥ δ. The confirmatory stochastic decision would thus require either a confidence interval that excludes δ, or an appropriately small P value, determined when ZE (for equivalence) is calculated as

$$Z_E = (\delta - d_o)/\mathrm{SED} \qquad [24.2]$$

This formula is identical to the previous Formula [23.3] for a two-boundary arrangement.

© 2002 by Chapman & Hall/CRC

24.4.1.1 Conventional Calculations under HE — Because HE is a nonzero hypothesis, the standard error of the difference, SED, is determined with the SEDH formula discussed previously. The designated value of Z can be marked as Zα, because HE becomes the counterpart of the primary hypothesis to be rejected. After an appropriate value of Zα is chosen, a 1 − α confidence interval can be constructed as in Formula [24.1]. If the hypothesis boundary of δ is excluded from the upper part of this interval, the observed “equivalence” is stochastically confirmed. For a two-tailed decision, the interval should lie between −δ and +δ.

24.4.1.2 Direction of Evaluation — The decision about a one-tailed or two-tailed direction will affect the magnitude of Zα or Zβ used for calculating confidence intervals or advance sample sizes, as well as the interpretation of P values as being 2P or half of the 2P value. Because a firm consensus has not been reached on this matter, you might use the compromise guidelines, suggested earlier in Section 11.8.4, which propose that the decisions always be two-tailed except when a one-tailed direction has been specified in advance and is suitably justified.

24.4.1.3 Examples of Calculations — The calculations for advance sample size and for stochastic confirmation of results are similar to those done in Sections 23.4.2 and 23.4.3. An illustration for comparing two proportions, pA = 63/70 and pB = 56/70, with δ set at .18, would be as follows: To test stochastic significance for the “small” value of do = .90 − .80 = .10, we first find

$$\mathrm{SED}_H = \sqrt{[(.90)(.10)/70] + [(.80)(.20)/70]} = .060$$

The 95% confidence interval is then .10 ± (1.96)(.060), and extends from −.018 to .218. Because the value of .18 is contained in this interval, we cannot stochastically confirm that do = .10 is small. Alternatively, we could have calculated ZE = (.18 − .10)/.060 = 1.33, which produces a “nonsignificant” 2PE = .184.

If an argument is offered that the stochastic hypothesis should be unidirectional because a generic product was being tested, the foregoing 2P value would be halved to .092 with a one-tailed interpretation; and the upper end of the 95% confidence interval, calculated with a one-tailed Z.1 = 1.645, would be .10 + (1.645)(.060) = .199, which would still include .18. Thus, the “small” increment of .10 would be stochastically “nonsignificant” at α = .05 for both a one-tailed and two-tailed decision.

To illustrate the procedure for dimensional data, suppose δ is set at a direct increment of 10 units in a study for which the observed means, standard deviations, and group sizes respectively are X̄A = 100, sA = 10, and nA = 50 in the standard group, with X̄B = 95, sB = 9, and nB = 49 in the test group. The value of do = X̄A − X̄B = 100 − 95 = 5 seems small, and is well below the boundary of δ = 10. To test the “insignificant” difference stochastically, we first find

$$\mathrm{SED}_H = \sqrt{[(10)^2/50] + [(9)^2/49]} = 1.91$$

The value of ZE will be (10 − 5)/1.91 = 2.62, for which 2P = .009. With Zα = 1.96, the two-tailed 95% confidence interval will extend from 1.26 to 8.74, thus excluding the value of δ = 10. [The upper boundary of a one-tailed 95% confidence interval, with Zα = 1.645, will be 5 + (1.645)(1.91) and will extend only to 8.14.] The observed “small” difference will therefore be stochastically confirmed as “insignificant.”
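The two worked examples can be reproduced with a short script. The following Python sketch is illustrative only: the function names are assumptions introduced here, and the P values come from the standard library's normal distribution rather than from printed tables. Because the script keeps the unrounded SED, its results can differ slightly from the printed figures (for the two proportions, ZE comes out near 1.34 rather than 1.33).

```python
from math import sqrt
from statistics import NormalDist


def sed_two_proportions(p_a, n_a, p_b, n_b):
    """Standard error of a difference in two proportions (unpooled, SED_H)."""
    return sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)


def sed_two_means(s_a, n_a, s_b, n_b):
    """Standard error of a difference in two means (SED_H)."""
    return sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)


def z_equivalence(delta, d_obs, sed):
    """Z_E = (delta - d_o) / SED, as in Formula [24.2]."""
    return (delta - d_obs) / sed


def two_tailed_p(z):
    """Two-tailed P value (2P) for a standard normal deviate."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def upper_limit(d_obs, z_alpha, sed):
    """Upper end of the interval, d_o + Z_alpha * SED, as in Formula [24.1]."""
    return d_obs + z_alpha * sed


# Two proportions: 63/70 vs. 56/70, with delta = .18
sed_p = sed_two_proportions(0.90, 70, 0.80, 70)
z_p = z_equivalence(0.18, 0.10, sed_p)
print(round(sed_p, 3), round(z_p, 2), round(two_tailed_p(z_p), 3))
print(upper_limit(0.10, 1.96, sed_p) < 0.18)   # False: delta lies inside the interval

# Two means: 100 vs. 95, with delta = 10
sed_m = sed_two_means(10, 50, 9, 49)
z_m = z_equivalence(10, 5, sed_m)
print(round(sed_m, 2), round(z_m, 2), round(two_tailed_p(z_m), 3))
print(upper_limit(5, 1.96, sed_m) < 10)        # True: delta is excluded
```

Keeping the SED, ZE, and interval limit as separate functions mirrors the order of the hand calculation, so each intermediate value can be compared with the text.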

24.4.1.4 Paradoxical Results — The 95% two-tailed confidence interval in the immediately preceding example (for two means) shows the kind of paradox that can sometimes occur with stochastic tests of “insignificance.” The observed do of 5 seemed small and is only 5% of the common mean, which is 97.5 for the two groups. [The latter value was calculated as {(100 × 50) + (95 × 49)}/99.] Nevertheless, because 0 is excluded from the lower end of the 2-tailed 95% confidence interval, the observed result would also be stochastically significant for the original null hypothesis that ∆ = 0. This paradox — which allows a small difference to be stochastically confirmed as both “big” and “small” — arose from the relatively large sample size, and from do, at a value of 5, being equidistant from the locations of ∆ = 10 and ∆ = 0 for the two hypotheses.
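The paradox can be checked numerically. In this small Python sketch (the function name is an assumption made here for illustration), the same two-tailed 95% interval is compared against both hypothesized locations, ∆ = 0 and ∆ = δ.

```python
def confidence_interval(d_obs, sed, z_alpha=1.96):
    """Two-tailed interval, d_o minus and plus Z_alpha * SED."""
    half_width = z_alpha * sed
    return d_obs - half_width, d_obs + half_width


# Values from the two-means example: d_o = 5, SED_H = 1.91, delta = 10
lower, upper = confidence_interval(5, 1.91)
excludes_zero = lower > 0      # rejects the null hypothesis, Delta = 0
excludes_delta = upper < 10    # rejects the equivalence hypothesis, Delta >= delta
print(round(lower, 2), round(upper, 2), excludes_zero, excludes_delta)
```

Both flags come out true for these inputs, which is the double confirmation described in the paragraph above.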

24.4.2 Example of a “Classical” Study

A classical clinical study of equivalence was presented many years ago by Kramer, Rooks, and Pearson.21 Wanting to show that childhood growth and development are not impaired by the sickle-cell trait, the investigators measured several indexes of physical growth and cognitive development in 50 matched pairs of black infants with either normal (AA) blood or sickle cell (AS) trait. The pairs were matched according to date of birth, sex, birth weight, gestational age, and parental socioeconomic status at the time of delivery. The follow-up measurements were done, when the children were between 39 and 63 months of age, by examiners who were “blind” to the child’s genotype, and for the cognitive tests, by a black psychologist.

For each measured variable, the investigators determined the do mean increments and corresponding Po values for members of the matched-pair groups. A value of δ was also established for each variable as a substantively “larger potential difference” in the incremental means for each group. Values of PE were then determined for the values of δ − do. (The symbols used by the investigators have been converted to those of the text here.) The values of do were small and quantitatively unimpressive in 6 variables indicating physical growth and in 6 variables indicating cognitive development. In all of these comparisons PE was ≤ .05. In one variable, the McCarthy Perceptual-Performance subtest, do for the AA–AS increment was −2.4, for which Po was 0.036 under the original null hypothesis. Nevertheless, for the corresponding established δ = +2.0, PE was “< 0.001.” The reversed direction of do and the “significant” Po were attributed to a chance event occurring “because of the many outcome variables under comparison.”

The investigators concluded that sickle-cell-trait children have “no deficits in standard measurements of growth and development,” and that previous beliefs that such children were impaired were due to methodologic flaws in the research.

24.4.3 Procedure for “Small” δ

Recognizing that the big value of δ may be too large for satisfactory appraisals of equivalence, Makuch et al.10 and Jones et al.16 have proposed that a smaller boundary be used in the hypothesis for HE. If ζ symbolizes this boundary, the demand made in Formula [24.1] becomes expressed as

$$d_o + Z_\alpha(\mathrm{SED}) < \zeta \qquad [24.3]$$

and the primary stochastic hypothesis for equivalence of two means would be stated as

HE: (µA − µB) > ζ

If the hypothesis is symmetrically two-tailed, Formula [24.3] becomes stated as

$$|d_o \pm Z_\alpha(\mathrm{SED})| < \zeta \qquad [24.4]$$

This approach produces the draconian demand that the entire confidence interval be contained between the boundaries of −ζ and +ζ. The demand seems scientifically peculiar because an observed value of do < ζ, although small enough to be regarded as “equivalent,” cannot be stochastically confirmed unless do is “super-small” enough for the entire interval to be smaller than ζ.


24.4.3.1 Consequences of “Small” Value for HE — When the increments of δ − do are replaced by ζ − do, much larger sample sizes will be needed for stochastic confirmation of the small differences. For example, consider the small increment in the two means, X̄A = 100 and X̄B = 95, that was stochastically confirmed in the second part of Section 24.4.1.3. Suppose the observed do was still 5, with SEDH = 1.91, but the “small” boundary for ζ was set at 6, rather than at the previous δ = 10. ZE would then be calculated as (6 − 5)/1.91 = .523 and would not provide stochastic confirmation, because 2PE is > .5.

Because the observed do was < ζ, the failure to confirm would be caused by inadequate capacity in a too-small sample. The sample size needed to confirm the “smallness” of do = 5 could be calculated from the formula

$$n \ge Z_\alpha^2(s_A^2 + s_B^2)/(\zeta - d_o)^2 \qquad [24.5]$$

For a one-tailed Zα = 1.645, with sA = 10 and sB = 9, the requirement would be n ≥ (1.645)²(100 + 81)/(6 − 5)² = 489.8

At least 490 members would be needed in each group. Thus, the total group size of 99 members (= 50 + 49) had ample capacity to reject HE when it was set at the “big” δ = 10, but the capacity would be only 99/980 = .10, if HE were located at the “small” ζ = 6. If the latter HE were tested with Zα = 1.96 instead of Zβ = 1.645, the required sample size would be increased to 695.3 per group.

As an additional example, consider the comparison in the first part of Section 24.4.1.3, where “success” was achieved by 90% and by 80% of two groups that each contained 70 persons. When HE was set at the large value of δ = .18, the incremental difference of 10% was not stochastically confirmed as small. For Zα = 1.96, a sample size of 150 persons would have been required. If HE were set at the smaller value of ζ = .12, however, the observed increment of .10 would still have satisfied the quantitative requirement for equivalence, but the sample size needed for stochastic confirmation would have risen to (1.96)²(.25)/(.12 − .10)² = 2401 for each group. With Zα = 1.645, the required sample size per group would have been 1691.3.
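The sample sizes quoted in this section all follow from Formula [24.5], with the boundary taken as either the big δ or the small ζ. The Python sketch below is an illustration under that reading; the function name and its error handling are assumptions added here, not part of the text.

```python
from math import inf


def n_to_confirm_small(z_alpha, variance_sum, boundary, d_obs):
    """Per-group n from Formula [24.5]: Z^2 * (sum of variances) / (boundary - d_o)^2."""
    gap = boundary - d_obs
    if gap < 0:
        raise ValueError("d_obs exceeds the boundary; the formula does not apply")
    if gap == 0:
        return inf  # a zero gap calls for an infinite sample
    return z_alpha ** 2 * variance_sum / gap ** 2


# Two means: sA = 10, sB = 9, d_o = 5, zeta = 6
var_means = 10 ** 2 + 9 ** 2
print(round(n_to_confirm_small(1.645, var_means, 6, 5), 1))
print(round(n_to_confirm_small(1.96, var_means, 6, 5), 1))

# Two proportions: pA = .90, pB = .80, d_o = .10
var_props = 0.90 * 0.10 + 0.80 * 0.20
print(round(n_to_confirm_small(1.96, var_props, 0.18, 0.10), 1))   # big delta
print(round(n_to_confirm_small(1.96, var_props, 0.12, 0.10)))      # small zeta
print(round(n_to_confirm_small(1.645, var_props, 0.12, 0.10), 1))  # small zeta, one-tailed
```

Because the gap between boundary and do is squared in the denominator, halving the gap quadruples the required sample, which is why the move from δ = .18 to ζ = .12 is so costly.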

24.4.3.2 Advantages of “Big” Value for HE — As shown in the foregoing calculations, when the equivalence hypothesis, HE, is set at a relatively large value of δ, stochastic confirmation can be attained with much smaller group sizes than if HE is placed at the small ζ. In fact, the calculations can also show the impossibility of trying to prove the null hypothesis that ∆ = 0. If the demanded ζ = 0 is attained by an observed do = 0, the value of 0 would be entered in the denominator of Formula [24.5], and an infinite sample size would be required.

24.4.4 Alternative Hypothesis for Single Boundaries

As discussed earlier in Sections 11.5 and 11.9, when a stochastic hypothesis is rejected, its counter-hypothesis is conceded. In customary two-tailed procedures with the null hypothesis that ∆ = 0, the counter-hypothesis is ∆ ≠ 0 and does not have a location. The alternative hypothesis, however, has a distinct location, which is set (for tests of efficacy) at the “large” value of δ. As shown in Chapter 23, the alternative hypothesis can offer “consolation” when the desired big do turns out to be disappointingly small, i.e., do < δ.

In stochastic testing for equivalence, however, if HE is set with only a single boundary for either δ or ζ, a specific location is not cited for an alternative hypothesis. The term “alternative hypothesis” is sometimes mentioned16,22 in stochastic discussions of equivalence tests, but the discussed entity is really a counter-hypothesis, expressed in terms such as µA − µB < ζ. A specifically located alternative hypothesis is not cited. Consequently, in the customary one-boundary strategy, a differently located alternative hypothesis is not available either for additional testing or for “confidence-interval consolation” if a desired small do turns out to be disappointingly large.


24.4.5 Double-Significance (Neyman-Pearson) Approach

As discussed in Chapter 23, the Neyman-Pearson strategy deals with two hypotheses, but has only one distinct quantitative boundary. The strategy is intended to produce stochastic significance no matter which way the results turn out. Using the big δ as a single two-zone boundary, the doubly significant sample would let the results be stochastically significant if the observed do is ≥ δ and also if do < δ. The analogous events would happen if the single boundary is set at a small ζ, rather than a big δ.

Among the various problems associated with the Neyman-Pearson approach, perhaps the most scientifically peculiar idea is that the investigator is passively disinterested in the outcome of the research and does not care what emerges as long as it is stochastically significant. This approach inevitably leads to excessively large sample sizes, which can often stochastically confirm, as “big,” a disappointingly small quantitative value of do that is < δ. A counterpart of this dissidence would occur if the double-hypothesis strategy were used in a trial aimed at showing equivalence. The trial might produce a value of do that is bigger than the boundary of a smaller ζ, but the large sample size might allow the result to be confirmed stochastically as small.

When the Neyman-Pearson two-hypothesis strategy has been proposed16,23 with a single-δ boundary for the challenge of testing equivalence, δ has been set at smaller boundaries than the customary “big” δ. Nevertheless, after estimates of π1, π2, and the common π, the advance sample size formula becomes

$$n = 2\pi(1 - \pi)(Z_\alpha + Z_\beta)^2/\delta^2 \qquad [24.6]$$

which is the conventional “double significance” Formula [23.10] cited in Chapter 23.

One of the problems produced by the Neyman-Pearson strategy can be illustrated in an example offered by Jones et al.16 for equivalence of two bronchodilator inhalers. The upper boundary for equivalence was set at 15 l/min in mean values of morning peak expiratory flow rate. The “between subject variance,” i.e., s², was estimated as 1600 (l/min)². Using Zα = 1.96 and Zβ = 1.28, the authors then applied the Neyman-Pearson formula of 2s²(Zα + Zβ)²/ζ² to calculate a sample size of 150 patients. Suppose this sample size is used in a trial that produces do = 12 between the two groups, with s = 40 in each group. The value of SED will be √[2(40)²/150] = 4.62, and the upper boundary of the confidence interval will be 12 + (1.96)(4.62) = 21, thus exceeding the boundary of 15. Consequently, although the observed do = 12 was less than ζ = 15, the sample size would not allow stochastic confirmation of the small difference.
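The inhaler example can be retraced step by step. The Python sketch below is a minimal illustration using the values quoted in the paragraph; the function name is an assumption introduced here.

```python
from math import ceil, sqrt


def n_double_significance(variance, z_alpha, z_beta, boundary):
    """Per-group n from 2 * s^2 * (Z_alpha + Z_beta)^2 / boundary^2."""
    return 2 * variance * (z_alpha + z_beta) ** 2 / boundary ** 2


zeta = 15          # equivalence boundary, l/min
variance = 1600    # estimated between-subject variance, (l/min)^2
n = ceil(n_double_significance(variance, 1.96, 1.28, zeta))

# Trial result supposed in the text: d_o = 12, s = 40 in each group
sed = sqrt(2 * 40 ** 2 / n)
upper = 12 + 1.96 * sed
print(n, round(sed, 2), round(upper, 1), upper < zeta)
```

The last printed value is False: the upper limit lies above 15, so the planned sample cannot confirm the observed difference as small.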

24.5 Principles of Conventional Stochastic Logic

A new strategy, which can avoid the problems and dilemmas of the currently used one-boundary approach, is to identify the principles of conventional stochastic logic, and then reverse them.

24.5.1 Identification of Four Principles

When we wanted to prove that something was big, the principles of customary stochastic logic placed the primary “null” hypothesis at the opposite extreme, setting Ho at the smallest possible distinction, i.e., ∆ = 0. The hypothesis was rejected if the calculated value of P was below a selected value α or if 0 was excluded from the zone of an appropriately calculated 1 − α confidence interval.

As noted earlier in Section 23.1.2.3.1, the primary hypothesis was “boundless” because it did not contain the boundary of δ (for big) either in its main statement or in demarcation of a confidence interval. The value of δ was used only for appraising the quantitative distinction of do. Later on, however, δ appeared as the location of the alternative secondary hypothesis, which was set at HH: ∆ ≥ δ.

This reasoning, used for stochastically confirming a “big” difference, contained four crucial principles:

1. The primary stochastic hypothesis, ∆, is located at a value opposite to what we want to confirm.

2. The observed distinction, do, is stochastically confirmed, i.e., the primary hypothesis is rejected, if ∆ is excluded from a 1 − α confidence interval around do, or if the pertinent P value is ≤ α.

3. The critical quantitative boundary (for δ) is not contained in the statement of the primary hypothesis itself and is not used to demarcate limits for confidence intervals.

4. The critical quantitative boundary is used to mark the location of the alternative hypothesis. For a “big” difference, this boundary is set at δ.

These principles are illustrated in Figure 24.3, which shows four ways in which the confidence intervals can stochastically confirm “efficacy.” In the top two instances, do was > δ and in the lower two, do was < δ; but in all four, the value of ∆ = 0 was excluded from the lower boundary of the confidence interval. Note that in Situations B and D, the value of δ was not included in the confidence interval around do. (Situations C and D are excellent examples of statistical dissidence, where the observed distinction is stochastically but not quantitatively significant.)

24.5.2 Reversed Symmetry for Logic of Equivalence

If applied with reverse symmetry for stochastic testing of equivalence, the four principles just cited would require the following:

1. The primary hypothesis of equivalence, HE, should be set at a value of ∆ that is large, i.e., the opposite of the small quantitative distinction that we want to confirm.

2. The observed small distinction, do, is stochastically confirmed if the “large” hypothesis is rejected by being excluded from a 1 − α confidence interval around do, or if the pertinent P value is ≤ α.

3. The critical quantitative boundary, ζ, for the small distinction does not appear in the statement of the primary hypothesis, and is not used to demarcate limits for confidence intervals.

4. The critical boundary, ζ, is used to mark the location of the alternative hypothesis.

These principles are illustrated in Figure 24.4, which is a counterpart of Figure 24.3. In Situations A and B, the value of do is ≤ ζ, and in C and D, do is > ζ, but in all four, the value of ∆ = δ is excluded from the upper boundary of the confidence interval. Note that in Situations B and D, the value of ζ is not included in the confidence interval around do. Situation D here is almost identical to Situation D in Figure 24.3 and shows the same statistical dissidence. The observed do, although too large to be called quantitatively “small,” is nevertheless stochastically confirmed as small. (Because both 0 and ζ are excluded from the confidence intervals, the observed do in Situation D in both Figures 24.3 and 24.4 is stochastically confirmed as being both large and small.)

The more common kind of statistical dissidence occurs in Situation C of both figures where a difference that fails to satisfy the quantitative requirement is stochastically confirmed for that requirement, but is not confirmed for the alternative hypothesis.
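The reversed principles can be expressed as a small decision routine. The Python sketch below is only an illustration: the function name, the returned keys, and the numerical inputs are hypothetical and do not come from the text. It tests the interval against δ (principles 1 and 2) and judges the size of do against ζ (principles 3 and 4).

```python
def assess_equivalence(d_obs, sed, delta, zeta, z_alpha=1.96):
    """Test against delta stochastically; judge size against zeta descriptively."""
    lower = d_obs - z_alpha * sed
    upper = d_obs + z_alpha * sed
    quantitatively_small = d_obs <= zeta   # descriptive criterion
    stochastically_small = upper < delta   # large hypothesis excluded from interval
    return {
        "interval": (lower, upper),
        "quantitatively_small": quantitatively_small,
        "stochastically_small": stochastically_small,
        "dissident": stochastically_small and not quantitatively_small,
    }


# Hypothetical inputs (not from the text): delta = 10, zeta = 5
print(assess_equivalence(d_obs=3, sed=2.0, delta=10, zeta=5))
print(assess_equivalence(d_obs=7, sed=1.0, delta=10, zeta=5))
```

The second call is flagged as dissident: do exceeds ζ, yet the interval still excludes δ, which corresponds to Situations C and D.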

24.5.3 Applicability of Previous Logic

For testing equivalence, the four cited principles of logic cannot be applied with a single-boundary two-zone demarcation. The first principle requires that one boundary, such as δ, be set for a large value of ∆; and the third principle requires another boundary, such as ζ, to demarcate something small. If δ and ζ are used for these boundaries, they produce three zones, rather than the two zones formed when a single boundary is placed at either a big δ or a small ζ. A three-zone system has not hitherto been applied, however, for stochastic evaluations of equivalence. The analysts have regularly used only a single boundary, placed at a big δ or a small ζ, as discussed earlier throughout Section 24.4. The next section (24.6) describes the two-boundary–three-zone approach that produces symmetrical stochastic logic for testing equivalence.


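[Figure 24.3: four panels, A to D, each showing a confidence interval around do on an axis marked with 0 and δ. In panels A and B, do lies to the right of δ; in panels C and D, do lies between 0 and δ.]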

FIGURE 24.3

Dotted lines and arrows show extent of confidence intervals for the observed value of do and the designated boundary of δ . In all four situations, “efficacy” is confirmed because the lower end of the confidence interval exceeds 0. In situations A and B, do exceeds δ , but in situations C and D, do is < δ .

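[Figure 24.4: four panels, A to D, each showing a confidence interval around do on an axis marked with 0, ζ, and δ. In panels A and B, do lies between 0 and ζ; in panels C and D, do lies between ζ and δ.]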

FIGURE 24.4

Dotted lines and arrows show extent of confidence intervals for the observed value of do and the designated values of δ and ζ. In all four situations, equivalence is “confirmed” because the upper boundary of the confidence interval does not exceed δ. In parts A and B, do is ≤ ζ; and in parts C and D, do is > ζ. In part B, the entire confidence interval is contained between 0 and ζ, thus confirming do as both “big” and “small.” In part D, the entire confidence interval excludes both ζ and δ.

24.6 Logical (Three-Zone–Two-Boundary) Approach

To evaluate equivalence with a mirror image (or reverse symmetry) of the basic stochastic logic used for evaluating efficacy, the first step is to set the primary hypothesis of equivalence at a large value, i.e.,

HE : ∆ ≥ δ

The alternative hypothesis is set at the small value, i.e.,

HS: ∆ ≤ ζ

The two sets of arrangements for efficacy and equivalence are shown in Table 24.1.

TABLE 24.1

Changing Location of Hypotheses for Testing Efficacy or Equivalence

Desired Goal    Quantitative Criterion    Location of Primary Hypothesis    Location of Alternative Hypothesis
Efficacy        do ≥ δ                    Ho: ∆ = 0                         HH: ∆ ≥ δ
Equivalence     do ≤ ζ                    HE: ∆ ≥ δ                         HS: ∆ ≤ ζ


24.6.1 Calculations for Advance Sample Size

With the two-boundary–three-zone arrangement, the advance calculation of a “singly significant” sample size for equivalence can be done with a confidence interval or a P value. With a confidence interval, using Zα for the primary hypothesis, the observed maximum difference of ζ should have the attribute that ζ + Zα(SED) ≤ δ.

24.6.1.1 Calculation for Two Proportions — With equal sizes in each group and with πA and πB as the estimated values of the two proportions, the sample size will be

$$n \ge \frac{Z_\alpha^2[\pi_A(1 - \pi_A) + \pi_B(1 - \pi_B)]}{(\delta - \zeta)^2} \qquad [24.7]$$

This same result emerges with the P value approach, if we determine n by using the formula

$$Z_E = \frac{\delta - d_o}{\mathrm{SED}} \qquad [24.8]$$

after substituting Zα for ZE, ζ for do, and √{[πA(1 − πA) + πB(1 − πB)]/n} for SED.

To demonstrate the calculations, suppose we expect that a suitable measurement of bioequivalence will be achieved by 90% of persons receiving Drug A and by 80% of persons receiving a generic product, B. If we use the FDA two-zone guideline that allows proportionate increments within 20% to be regarded as small, the boundary value of δ for this comparison will be δ = (.20)(.90) = .18. We can then set the anticipated increment of .10 as the value of ζ.

Using Formula [24.7], the required sample size for each group would be

n ≥ (1.96)²[(.90)(.10) + (.80)(.20)]/(.18 − .10)² = 150.1

if Zα = 1.96, and 105.7 if Zα = 1.645.
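As a check on the arithmetic, Formula [24.7] can be written as a small function. This Python sketch is illustrative; the function and parameter names are assumptions made here.

```python
def n_equivalence_two_proportions(pi_a, pi_b, delta, zeta, z_alpha=1.96):
    """Per-group n from Formula [24.7]."""
    variance_sum = pi_a * (1 - pi_a) + pi_b * (1 - pi_b)
    return z_alpha ** 2 * variance_sum / (delta - zeta) ** 2


delta = 0.20 * 0.90   # 20% of the reference proportion, giving .18
zeta = 0.10           # anticipated increment
print(round(n_equivalence_two_proportions(0.90, 0.80, delta, zeta), 1))
print(round(n_equivalence_two_proportions(0.90, 0.80, delta, zeta, z_alpha=1.645), 1))
```

In practice the result would be rounded up to the next whole person in each group.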

24.6.1.2 Calculation for Two Means — For dimensional data, the appropriate modification of Formula [24.7] produces

$$n \ge \frac{Z_\alpha^2[\sigma_A^2 + \sigma_B^2]}{(\delta - \zeta)^2} \qquad [24.9]$$

To illustrate the calculations, suppose the large value of δ is set at 12 for a study in which the two groups are expected to differ by no more than ζ = 5. We also expect that the two groups will have standard deviations of σA = 9 and σB = 10. With these expectations, and with α set at a two-tailed .05, the required sample size for each group will be n ≥ (1.96)²(181)/(12 − 5)² = 14.2.

Note that the expected values of the two means, µA and µB, do not appear in Formula [24.9], and the sample size will depend on only the increments and variances.
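The same calculation for dimensional data can be sketched in Python. The function name below is an assumption introduced for illustration; its signature takes no means, which reflects the point that only the boundaries and variances enter Formula [24.9].

```python
from math import ceil


def n_equivalence_two_means(sigma_a, sigma_b, delta, zeta, z_alpha=1.96):
    """Per-group n from Formula [24.9]."""
    return z_alpha ** 2 * (sigma_a ** 2 + sigma_b ** 2) / (delta - zeta) ** 2


n = n_equivalence_two_means(9, 10, delta=12, zeta=5)
print(round(n, 1), ceil(n))   # computed value, then whole persons per group
```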

24.6.2 Effect of Different Boundaries for δ and ζ

An important feature of Formula [24.7] is that the sample-size requirement enlarges if the boundaries are “relaxed” so that δ is made smaller and ζ is larger. Thus, if δ is reduced to .16 and ζ is raised to .12 in the foregoing trial, the required per-group size for Zα = 1.96 will be

n ≥ (1.96)²(.25)/(.16 − .12)² = 600.2

Conversely, the required sample size will decrease if either δ is made larger or ζ is smaller (or both). For example, suppose the clinical trial in Section 24.6.1.1, done with 151 patients in each group, turns out to show pA = 136/151 = .901 and pB = 125/151 = .828. The observed do will be .901 − .828 = .073, and SED will be

$$\sqrt{(136)(15)/(151)^3 + (125)(26)/(151)^3} = .0392$$

The value of ZE will be (.18 − .073)/.0392 = 2.73, for which 2PE = .006. Thus, a stochastically significant result (with 2PE < .05) could have been achieved with smaller groups.

On the other hand, suppose the study comparing two means in Section 24.6.1.2 were done with 15 persons in each group, and that the standard deviations produced the expected results of sA = 9 and sB = 10, but the observed difference in means turned out to be do = 7. With Formula [24.8], SED will be √[(9² + 10²)/15] = 3.47 and ZE will be (12 − 7)/3.47 = 1.44, which is too small to yield PE < .05 with either a two-tailed or one-tailed evaluation.
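Both post-trial checks in this section apply Formula [24.8] to observed data. The Python sketch below retraces them; the helper names are assumptions made here, and the P values are taken from the standard library's normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_equivalence(delta, d_obs, sed):
    """Z_E = (delta - d_o) / SED, as in Formula [24.8]."""
    return (delta - d_obs) / sed


def one_tailed_p(z):
    """Upper-tail probability for a standard normal deviate."""
    return 1 - NormalDist().cdf(z)


# Two proportions: 136/151 vs. 125/151, with delta = .18
p_a, p_b, n = 136 / 151, 125 / 151, 151
sed_p = sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
z_p = z_equivalence(0.18, p_a - p_b, sed_p)
print(round(sed_p, 4), round(z_p, 2), round(2 * one_tailed_p(z_p), 3))

# Two means: 15 per group, s = 9 and 10, d_o = 7, delta = 12
sed_m = sqrt((9 ** 2 + 10 ** 2) / 15)
z_m = z_equivalence(12, 7, sed_m)
print(round(sed_m, 2), round(z_m, 2), one_tailed_p(z_m) < 0.05)
```

The final value printed is False, matching the statement that ZE = 1.44 is too small for confirmation even with a one-tailed evaluation.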

24.6.3 Exploration of Alternative Hypothesis

If do is < ζ but the result is not stochastically confirmed, the main problem will be low capacity. If do exceeds ζ , however, symmetrical logic will allow the investigator to explore the alternative hypothesis, which will be

HS: ∆ ≤ ζ

The investigator can then be consoled if ζ is included within the lower boundary of a confidence interval around the too big do.

For example, in the foregoing comparison of two means, with SED = 3.47, Zβ = 1.645 can be used to calculate the lower boundary of a one-tailed 95% confidence interval, which will be 7 − (1.645)(3.47) = 1.29. Because the interval includes the desired ζ = 5, the result can be regarded as a compatible stochastic variation. Similarly, if ζ had been set at .04 rather than .10 in the foregoing clinical trial, and if the results still showed do = .073 with SED = .0392, the lower boundary of the one-tailed confidence interval would be .073 − (1.645)(.0392) = .0085, which would be stochastically compatible with the desired ζ = .04.
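The “consolation” check amounts to asking whether ζ lies at or above the lower one-tailed limit around do. A minimal Python sketch, with function names that are assumptions introduced here:

```python
def lower_one_tailed_limit(d_obs, sed, z=1.645):
    """Lower end of a one-tailed interval, d_o - Z * SED."""
    return d_obs - z * sed


def compatible_with_zeta(d_obs, sed, zeta, z=1.645):
    """True when zeta lies at or above the lower limit of the interval around d_o."""
    return lower_one_tailed_limit(d_obs, sed, z) <= zeta


# Two means: d_o = 7, SED = 3.47, zeta = 5
print(round(lower_one_tailed_limit(7, 3.47), 2), compatible_with_zeta(7, 3.47, 5))

# Two proportions: d_o = .073, SED = .0392, zeta = .04
print(round(lower_one_tailed_limit(0.073, 0.0392), 4),
      compatible_with_zeta(0.073, 0.0392, 0.04))
```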

24.6.4 Symmetry of Logic and Boundaries

The logic in Table 24.1 is “symmetrical,” because each of the scientific goals — efficacy and equivalence — has a separate location for its own primary hypothesis and its alternative hypothesis. The boundary values at those locations, however, are not symmetrically reversed. The “big” value of δ does double duty in locating the alternative hypothesis for a big distinction and the primary hypothesis for a small one. The “small” value of ζ , however, does not have a mirror image when the primary hypothesis is set at 0 for a big distinction.

To achieve the attractive scientific appeal of a perfect mirror image would require changing the primary null hypothesis from 0 to ζ in tests of efficacy. This change would probably be anathema to most statistical theorists, because it would alter a century of statistical reasoning. All of the ordinary parametric tests (t, Z, chi-square, etc.) based on ∆ = 0 would have to be recalled and reconfigured, and statisticians and investigators would have to collaborate closely in choosing an appropriate value of ζ for each situation. Nevertheless, this idea may have considerable merit when parametric reasoning is eventually replaced in the new era of computer-intensive statistics; and besides, in Bayesian inference, a close clinico-statistical collaboration is needed to choose the appropriate values for prior probabilities. Furthermore, a clearly demarcated pair of boundaries, with ζ for small and δ for big, would allow tests of fragility (as discussed earlier) to be used for making stochastic decisions according to descriptive rather than probabilistic zones of reasoning.


24.7 Ramifications of Two-Boundary–Three-Zone Decision Space

The most important ramifications of a three-zone decision space extend far beyond the testing of “equivalence.” The use of two boundaries and three zones can promptly help eliminate some of the prime difficulties produced by the Neyman-Pearson “double significance” method for calculating sample size. One main source of these difficulties is the choice of an unrealistically high value of δ to lower the “inflated” sample size to an attainable value. The subsequently observed do , although substantially smaller than the originally selected δ , may then be proclaimed “significant” after receiving stochastic confirmation for its large value.

24.7.1 Realistic Modifications for δ and ζ

With a three-zone decision space, the Neyman-Pearson calculations could be abandoned, and the investigators could set goals for values of δ and ζ that are closer to realistically acceptable boundaries for clinical decisions about quantitative magnitudes. For example, although δ could be set at .15 for an increment in two proportions, the investigator might still be willing to claim quantitative significance for smaller observed values, such as do = .14 or even do = .10, but not for do = .09. If so, a new boundary for “big” can be set at δ = .10. Similarly, if ζ were originally set as an increment of .02, the investigator might still be willing to claim “equivalence” if do = .03 or .04, but not if do = .05. If so, a new boundary for “small” can be set at ζ = .04. With δ made smaller and ζ larger than the unrealistic previous boundaries of δ and ζ, the magnitude of the intermediate zone would be substantially reduced.

The sample-size calculation would then involve two computations: one for “single significance” under the primary null hypothesis, with Ho: ∆ = 0 and δ as the boundary of big; and the other for “single significance” under the primary equivalence hypothesis, with HE: ∆ ≥ δ and with ζ as the boundary of small. The larger of these two sample-size values would then be used in the trial. Because the denominators for these calculations will have δ in the first and δ − ζ in the second, the second calculation should always produce the larger result.

24.7.2 Effects on Sample Size

Single-significance calculations with these modified boundaries for δ and ζ would be scientifically realistic and “honest,” and will usually be smaller than the sample sizes that emerge with the “double significance” approach.

For example, suppose δ is set at .15 for a desired pA = .35 and pB = .20 in a randomized trial, so that π̂ = (.35 + .20)/2 = .275. If Zα is set at a two-tailed 1.96 and Zβ is set at a one-tailed 0.84, the “double-significance” calculation with Formula [23.10] would produce

$$n \ge \frac{\left[1.96\sqrt{2(.275)(.725)} + 0.84\sqrt{(.35)(.65) + (.20)(.80)}\right]^2}{(.15)^2}$$

which is n ≥ 137.8.

For a singly significant result if do > δ, the sample size calculated with Formula [14.19] would be n ≥ (1.96)²[2(.275)(.725)]/(.15)² = 68.1. For a small value of do with ζ set at .02, a single-significance calculation to confirm the small result would use Formula [24.7] to produce n ≥ (1.96)²[(.35)(.65) + (.20)(.80)]/(.15 − .02)² = 88.1. Thus, the trial could be done with 89 persons in each group rather than with the 138 persons needed for “double-significance.”
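The three sample sizes compared here can be computed side by side. The Python sketch below transcribes the arithmetic shown in the text for Formulas [23.10], [14.19], and [24.7]; the function names are assumptions introduced for illustration, and the general forms of Formulas [23.10] and [14.19] are inferred from the numbers displayed rather than quoted from their original chapters.

```python
from math import ceil, sqrt


def n_double_significance(p_a, p_b, delta, z_alpha=1.96, z_beta=0.84):
    """Double-significance n, following the arithmetic shown for Formula [23.10]."""
    pi = (p_a + p_b) / 2
    term_alpha = z_alpha * sqrt(2 * pi * (1 - pi))
    term_beta = z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    return (term_alpha + term_beta) ** 2 / delta ** 2


def n_single_big(p_a, p_b, delta, z_alpha=1.96):
    """Single-significance n for a big difference (arithmetic of Formula [14.19])."""
    pi = (p_a + p_b) / 2
    return z_alpha ** 2 * 2 * pi * (1 - pi) / delta ** 2


def n_single_small(p_a, p_b, delta, zeta, z_alpha=1.96):
    """Single-significance n for a small difference (Formula [24.7])."""
    variance_sum = p_a * (1 - p_a) + p_b * (1 - p_b)
    return z_alpha ** 2 * variance_sum / (delta - zeta) ** 2


double = n_double_significance(0.35, 0.20, delta=0.15)
big = n_single_big(0.35, 0.20, delta=0.15)
small = n_single_small(0.35, 0.20, delta=0.15, zeta=0.02)
print(round(double, 1), round(big, 1), round(small, 1))
print(ceil(max(big, small)), ceil(double))   # per-group sizes: single vs. double
```

Taking the larger of the two single-significance values follows the rule stated in Section 24.7.1 that the trial should be sized to satisfy both goals.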

On the other hand, if the investigators more realistically set δ = .10 and ζ = .04 as the “bottom-line” quantitative boundaries for respective claims of “big” or “small,” the sample size needed to confirm single significance of the large difference (with do ≥ .10) would be n ≥ (1.96)²[2(.275)(.725)]/(.10)² = 153.2; and confirmation of the small difference (with do ≤ .04) would use Formula [24.7] to require n ≥ (1.96)²[(.35)(.65) + (.20)(.80)]/(.10 − .04)² = 413.5. Thus, the new approach might occasionally seem
