
TABLE 14.2

Correspondence of Two-Tailed External P Values for X2 Values at Different Degrees of Freedom (marked ν )

ν      0.90    0.80    0.70    0.60    0.50    0.40    0.30    0.20    0.10   0.050  0.0250   0.010  0.0050  0.0010

 1   0.0158  0.0642   0.148   0.275   0.455   0.708   1.074   1.642   2.706   3.841   5.024   6.635   7.879  10.828
 2    0.211   0.446   0.713   1.022   1.386   1.833   2.408   3.219   4.605   5.991   7.378   9.210  10.597  13.816
 3    0.584   1.005   1.424   1.869   2.366   2.946   3.665   4.642   6.251   7.815   9.348  11.345  12.838  16.266
 4    1.064   1.649   2.195   2.753   3.357   4.045   4.878   5.989   7.779   9.488  11.143  13.277  14.860  18.467
 5    1.610   2.343   3.000   3.655   4.351   5.132   6.064   7.289   9.236  11.070  12.833  15.086  16.750  20.515
 6    2.204   3.070   3.828   4.570   5.348   6.211   7.231   8.558  10.645  12.592  14.449  16.812  18.548  22.458
 7    2.833   3.822   4.671   5.493   6.346   7.283   8.383   9.803  12.017  14.067  16.013  18.475  20.278  24.322
 8    3.490   4.594   5.527   6.423   7.344   8.351   9.524  11.030  13.362  15.507  17.535  20.090  21.955  26.124
 9    4.168   5.380   6.393   7.357   8.343   9.414  10.656  12.242  14.684  16.919  19.023  21.666  23.589  27.877
10    4.865   6.179   7.267   8.295   9.342  10.473  11.781  13.442  15.987  18.307  20.483  23.209  25.188  29.588
11    5.578   6.989   8.148   9.237  10.341  11.530  12.899  14.631  17.275  19.675  21.920  24.725  26.757  31.264
12    6.304   7.807   9.034  10.182  11.340  12.584  14.011  15.812  18.549  21.026  23.337  26.217  28.300  32.909
13    7.042   8.634   9.926  11.129  12.340  13.636  15.119  16.985  19.812  22.362  24.736  27.688  29.819  34.528
14    7.790   9.467  10.821  12.078  13.339  14.685  16.222  18.151  21.064  23.685  26.119  29.141  31.319  36.123
15    8.547  10.307  11.721  13.030  14.339  15.733  17.322  19.311  22.307  24.996  27.488  30.578  32.801  37.697
16    9.312  11.152  12.624  13.983  15.338  16.780  18.418  20.465  23.542  26.296  28.845  32.000  34.267  39.252
17   10.085  12.002  13.531  14.937  16.338  17.824  19.511  21.615  24.769  27.587  30.191  33.409  35.718  40.790
18   10.865  12.857  14.440  15.893  17.338  18.868  20.601  22.760  25.989  28.869  31.526  34.805  37.156  42.312
19   11.651  13.716  15.352  16.850  18.338  19.910  21.689  23.900  27.204  30.144  32.852  36.191  38.582  43.820

Source: This table is derived from Geigy Scientific Tables, Vol. 2, ed. by C. Lentner, 1982 Ciba-Geigy Limited, Basle, Switzerland.

Accordingly, at one degree of freedom in Table 14.2, the top row shows that the calculated X2 of 2.73 is just higher than the 2.706 value required for 2P = .10. Therefore, the stochastic conclusion is

.05 < 2P < .10.
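For ν = 1, the tabulated correspondences can be checked directly, because X2 at one degree of freedom is the square of a standard Gaussian Z, so the external probability is erfc(√(X2/2)). A short Python sketch (standard library only) confirms the bracketing of the calculated 2.73:

```python
import math

def chi_square_p_1df(x2: float) -> float:
    """Upper-tail probability of the chi-square distribution at 1 d.f.

    For nu = 1, X^2 is the square of a standard Gaussian Z, so
    P(X^2 > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2))."""
    return math.erfc(math.sqrt(x2 / 2.0))

# The calculated X^2 of 2.73 lies between the tabulated 2.706 (P = .10)
# and 3.841 (P = .05), so .05 < 2P < .10.
p = chi_square_p_1df(2.73)
```

The same function reproduces the first row of Table 14.2: it returns approximately .10 at 2.706 and .05 at 3.841.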

To confirm the relationship of Z2 and X2, note that in Formula [13.16] for Z, the increment in the numerator is (14/105) – (6/95) = .070, and √(nAnB/N) is √[(95)(105)/200] = 7.06. The denominator term √PQ is calculated as √[(20)(180)/2002] = .3. Therefore, Z = (.070)(7.06)/.3 = 1.65 and Z2 is 2.72, which is identical, except for minor rounding, to the value of X2 calculated with the observed/expected tactic in Section 14.1.2.

Although this approach is satisfactory to show that a 2 × 2 table has one degree of freedom, a more general strategy is needed for larger two-way tables.

14.1.5 Controversy about d.f. in 2-Way Tables

X2 tests are most commonly applied to a 2 × 2 table having four interior cells, but can also be used for two-way tables with larger numbers of cells. With r categories in the rows and c categories in the columns, the table would have r × c cells.

The decision about degrees of freedom in an r × c table gave rise, early in the 20th century, to a heated, bitter battle between two statistical titans: Ronald A. Fisher and Karl Pearson. Fisher emerged the victor, but the personal feud between the two men was never resolved, with Fisher thereafter avoiding (or being denied) publication in the then-leading journal of statistics (Biometrika), which was edited by Pearson. [Fisher, of course, found other places to publish, but the feud was a juicy scandal for many years. Consider the analogous situation in medicine today if William Osler were alive in the U.S. and writing vigorously, but never publishing anything in the New England Journal of Medicine.]

In the argument over degrees of freedom, Pearson contended that a two-way table with r rows and c columns would have r × c cells, and that the degrees of freedom should therefore be rc – 1. Fisher demonstrated that the correct value for degrees of freedom was (r – 1)(c – 1).

He began his argument by “dis-entabling” the data and converting the cells into a linear array. It would have r × c cells, which could be filled with rc degrees of freedom, choosing any desired numbers. Because the cellular totals must add up to N, one degree of freedom is immediately lost, so that rc – 1 choices become “freely” available. Reconstructing the r × c cells into a 2-way table, however, produces some new constraints. The marginal totals must add up to the grand total (N) in the rows and also in the columns. Therefore, only r – 1 marginal totals can be freely chosen for the rows, and each of those choices creates an additional constraint in the cellular freedom, so that r – 1 degrees of freedom are lost in the basic cellular choices. Analogously, the column totals create another c – 1 constraints. Therefore, the degrees of freedom are rc – 1 – (r – 1) – (c – 1), which becomes (r – 1)(c – 1). Thus, a 4 × 5 table has (4 – 1)(5 – 1) = (3)(4) = 12 degrees of freedom, and a 2 × 2 table has (2 – 1)(2 – 1) = 1 degree of freedom.

© 2002 by Chapman & Hall/CRC
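Fisher's counting argument can be expressed as a one-line Python sketch, together with a check that subtracting the marginal constraints from rc – 1 really does give (r – 1)(c – 1):

```python
def degrees_of_freedom(r: int, c: int) -> int:
    """Fisher's result: an r x c two-way table has (r - 1)(c - 1) d.f."""
    return (r - 1) * (c - 1)

# Counting argument: start from rc - 1 free choices, then remove
# the r - 1 row constraints and the c - 1 column constraints.
def df_by_counting(r: int, c: int) -> int:
    return r * c - 1 - (r - 1) - (c - 1)
```

For the examples in the text, a 4 × 5 table gives 12 and a 2 × 2 table gives 1, by either route.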

The idea can be quickly grasped for a 2 × 2 table if you consider the marginal totals fixed as follows:

 

 

 

                Column
Row          1       2     Total
1                           n1
2                           n2
Total       f1      f2       N

As soon as you enter a number in any one of these four cells, the values of all the remaining cells are determined. Suppose a is chosen to be the value of cell 1,1. Then cell 1,2 is n1 – a; cell 2,1 is f1 – a; and cell 2,2 is either n2 – (f1 – a) or f2 – (n1 – a). Consequently, the only available freedom in this table is to choose the value for any one of the cells.
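The single remaining freedom can be demonstrated in a short Python sketch: given the four fixed margins and any one cell, the other three cells are forced.

```python
def complete_2x2(a, n1, n2, f1, f2):
    """Given fixed margins (n1, n2 rows; f1, f2 columns) and one cell a
    in position 1,1, the remaining cells of a 2 x 2 table are determined."""
    assert n1 + n2 == f1 + f2  # both sets of margins must sum to N
    b = n1 - a                 # cell 1,2
    c = f1 - a                 # cell 2,1
    d = n2 - c                 # cell 2,2; equivalently f2 - b
    return b, c, d
```

With the anemia data of Table 14.1 (margins 95, 105, 20, 180), choosing a = 6 forces the other cells to be 89, 14, and 91.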

14.1.6 “Large” Degrees of Freedom

The formula (r – 1)(c – 1) should be kept in mind when the value of ν is determined for r × c tables that are larger than 2 × 2. If ν becomes very large, however, you may want to reappraise the planned analysis. For example, a table with 4 categories in the rows and 5 categories in the columns will have 12 degrees of freedom, but the results of such a table are usually difficult to understand and interpret. In fact, many experienced, enlightened data analysts say they regularly try to reduce (i.e., compress) everything into a 2 × 2 table because they understand it best.

Chapter 27 contains a discussion of tables where r and c are each ≥ 3, and also of special tables constructed in a 2 × c (or r × 2) format where only c (or r) is ≥ 3. [For example, we might consider the sequence of survival proportions for four ordinal stages (I, II, III, and IV) in a table that would have (2 – 1)(4 – 1) = 3 degrees of freedom.] In general, however, if the degrees of freedom start getting “large” (i.e., ≥ 6), the table is probably unwieldy and should be compressed into smaller numbers of categories. Therefore, opportunities should seldom arise to use the many lines of Table 14.2 for which the degrees of freedom exceed 6.

14.1.7 Yates Continuity Correction

Another major conceptual battle about χ 2 is still being fought today and has not yet been settled. The source of the controversy is the “continuity correction” introduced by Frank Yates3 in 1934.

The basic argument was discussed in Section 13.8.2. The distribution of a set of categorical frequencies, or of the X2 value calculated from them, is discontinuous, but χ 2 has a continuous distribution.

Consequently, χ 2 is the “limit” toward which X2 approaches as the group sizes grow very large. For a suitable “correction” of X2, Yates proposed that if we take “the half units of deviation from expectation as the group boundaries, we may expect to obtain a much closer approximation to the true distribution.”

The net algebraic effect of the Yates correction was to make the basic Formula [14.1] become

X2c = Σ [( |O – E| – 1/2 )2 / E]

where O = observed values, E = expected values, and the c subscript indicates use of the Yates correction. For practical purposes, the Yates correction usually alters the individual observed–expected deviations to make them less extreme. The corrected value of X2c is then used for entering the standard chi-square tables to find the associated value of P.

The Yates correction soon became established as a chic item of statistical fashion in the days before easy computation was available for the Fisher Exact Test. A manuscript reviewer (or editor) who could find nothing else to say about a paper would regularly ask whether the investigators had calculated their X2 values ignorantly (without the correction) or sagaciously (with it). To know about and use the Yates correction became one of the “in-group” distinctions — somewhat like knowing the difference between incidence and prevalence — that separated the pro from the neophyte in the analysis of clinical and epidemiologic data.

In recent years, however, a major assault has been launched against the continuity correction and it seems to be going out of fashion. The anti-Yates-correction argument is that the uncorrected X2 value gives better agreement than the corrected one when an exact probability (rather than an approximate one, via χ2 tables) is calculated for the data of a 2 × 2 table. If the “gold standard” Fisher exact probability test, discussed in Chapter 12, eventually replaces the X2 test altogether, the dispute about the Yates correction will become moot.

14.2 Formulas for Calculation

Because the chi-square procedure is so popular, many tactics have been developed to simplify the arithmetic with methods that incorporate the “expected” values into the calculations, thereby avoiding the cumbersome Σ [(observed – expected)2/expected] formula. The alternative formulas are listed here for readers who may want or need to do the calculations themselves without a computer program. [If you like interesting algebraic challenges, you can work out proofs that the cited formulas are correct.]

14.2.1 2 × 2 Tables

For a 2 × 2 table constructed as

                       Total
         a       b      n1
         c       d      n2
Total   f1      f2       N

the simplified formula is

X2 = [(ad – bc)2 N]/(n1 n2 f1 f2)     [14.3]

In words, X2 is the squared difference in the cross-product terms, multiplied by N, divided by the product of the four marginal totals. Thus, for the data in Table 14.1,

X2 = [(6 × 91) – (89 × 14)]2 × 200 / (95 × 105 × 20 × 180)
   = (546 – 1246)2 × 200 / 35910000
   = (–700)2 × 200 / 35910000
   = 98000000 / 35910000
   = 2.73

which is the same result obtained earlier.
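Formula [14.3] translates directly into a few lines of Python, shown here as a sketch that recomputes the Table 14.1 result:

```python
def x2_2x2(a, b, c, d):
    """Simplified X^2 for a 2 x 2 table, Formula [14.3]:
    (ad - bc)^2 N / (n1 n2 f1 f2)."""
    n1, n2 = a + b, c + d          # row totals
    f1, f2 = a + c, b + d          # column totals
    n = n1 + n2                    # grand total
    return (a * d - b * c) ** 2 * n / (n1 * n2 * f1 * f2)
```

For the anemia data (cells 6, 89, 14, 91) the function returns 2.73, matching the observed/expected calculation.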

14.2.2 Yates Correction in 2 × 2 Table

To employ the Yates correction, Formula [14.3] is changed to

X2c = ( |ad – bc| – [N/2] )2 N / [(a + b)(c + d)(a + c)(b + d)]     [14.4]

If the Yates correction were applied, the previous calculation for Table 14.1 would become X2c = [700 – (200/2)]2[200]/35910000 = (600)2(200)/35910000 = 2.005.
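The corrected formula can be sketched the same way; note how the N/2 subtraction pulls the Table 14.1 value down from 2.73 to about 2.005:

```python
def x2_yates(a, b, c, d):
    """Yates-corrected X^2 for a 2 x 2 table, Formula [14.4]:
    (|ad - bc| - N/2)^2 N / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2) ** 2 * n
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))
```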


14.2.3 Formula for “Rate” Calculations

The results of two proportions (or two rates) are commonly expressed in the form p1 = t1/n1 and p2 = t2/n2, with totals of P = T/N, where T = t1 + t2 and N = n1 + n2. The “unpacking” of such data can be avoided with the formula

X2 = [ (t12/n1) + (t22/n2) – (T2/N) ] × [ N2/{(T)(N – T)} ]     [14.5]

This formula may look forbidding, but it can be rapidly carried out on a hand calculator, after only one manual maneuver—the subtraction of T from N.

The formula can be illustrated if the occurrence of anemia in Table 14.1 were presented as 6/95 = .063 for men and 14/105 = .133 for women. Instead of unpacking these proportions into a table, we can promptly determine that T = 6 + 14 = 20, N = 95 + 105 = 200, and N – T = 200 – 20 = 180. With Formula [14.5], we then calculate

X2 = [(62/95) + (142/105) – (202/200)][2002/{(20)(180)}]
   = [.2456][11.11] = 2.73

which is the same result as previously.

Another formula that determines X2 without fully “unpacking” the two proportions is

X2 = (n2t1 – n1t2)2 N / [n1 n2 T(N – T)]     [14.6]

For the data in Table 14.1, this calculation is [(105)(6) – (95)(14)]2(200)/[(95)(105)(20)(180)] = 2.73.
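Both “rate” formulas can be sketched in Python and checked against each other; they work directly from t1/n1 and t2/n2 without rebuilding the table:

```python
def x2_from_rates_145(t1, n1, t2, n2):
    """Formula [14.5]: X^2 from two rates t1/n1 and t2/n2."""
    big_t, big_n = t1 + t2, n1 + n2
    bracket = t1 ** 2 / n1 + t2 ** 2 / n2 - big_t ** 2 / big_n
    return bracket * big_n ** 2 / (big_t * (big_n - big_t))

def x2_from_rates_146(t1, n1, t2, n2):
    """Formula [14.6]: (n2 t1 - n1 t2)^2 N / [n1 n2 T(N - T)]."""
    big_t, big_n = t1 + t2, n1 + n2
    return (n2 * t1 - n1 * t2) ** 2 * big_n / (n1 * n2 * big_t * (big_n - big_t))
```

For the anemia data (6/95 vs. 14/105), both return 2.73, agreeing with Formula [14.3].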

14.2.4 Expected Values

If you, rather than a computer program, are going to calculate X2 for a two-way table that has more than 2 rows or 2 columns, none of the simplified formulas can be used. The expected values will have to be determined, and a strategy is needed for finding them quickly and accurately.

The process is easiest to show for a 2 × 2 table, structured as follows:

 

           Men    Women    Total
Old         a       b       f1
Young       c       d       f2
Total      n1      n2        N

In this table, the proportion of old people is f1/N. Under the null hypothesis, we would expect the number of old men to be (f1/N) × n1 and the number of old women to be (f1/N) × n2. Similarly, the expected values for young men and young women would be, respectively, (f2n1)/N and (f2n2)/N. Consequently, the expected value in any cell is the product of the marginal totals for the corresponding row and column, divided by N. Thus, the expected values here are

 

           Men        Women
Old       f1n1/N     f1n2/N
Young     f2n1/N     f2n2/N

© 2002 by Chapman & Hall/CRC

For a table of larger dimensions, the process can be illustrated as shown in Figure 14.2. In the cell for row 2, column 1, the expected value is f2n1/N—which is the product of the marginal totals for row 2 and column 1, divided by N. In row 1, column 3, the expected value is f1n3/N. In row 3, column 2, the expected value is f3n2/N.

 

 

 

 

 

 

 

 

FIGURE 14.2
Diagram showing marginal totals used for calculating expected values in a cell. [The figure displays a 3 × 3 table with rows 1–3 (totals f1, f2, f3), columns 1–3 (totals n1, n2, n3), and grand total N.]
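The row-total × column-total rule generalizes to any r × c table, as in this Python sketch:

```python
def expected_values(observed):
    """Expected cell values for a two-way table of observed counts:
    each cell is (row total x column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]
```

Applied to Table 14.1 (men 6/89, women 14/91), the expected counts are 9.5, 85.5, 10.5, and 94.5, reproducing the values used in Section 14.1.2.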

14.2.5 Similar Values of |O – E| in Cells of 2 × 2 Table

An important attribute of the observed-minus-expected values in a 2 × 2 table is that their absolute magnitudes are identical in each cell, but the signs oscillate around the table. For example, for Table 14.1, the results in Section 14.1.2 showed that the observed-minus-expected values are

 

          Anemic    Non-Anemic
Men        –3.5        +3.5
Women      +3.5        –3.5

A simple formula for the absolute magnitude of the observed-minus-expected value for any cell in a 2 × 2 table is e = |ad – bc|/N. For the anemia example, (ad – bc)/N = [(6 × 91) – (89 × 14)]/200 = –700/200 = –3.5, so that e = 3.5.

With e = |ad – bc|/N, the pattern of observed–expected values in the four cells will be:

–e   +e
+e   –e

The rotating pattern of +e and –e values is useful for checking calculations and for indicating that the four marginal totals (n1, n2, f1, f2) should be exactly the same in the expected values as in the observed values. Because the (observed – expected)2 values of e2 will be identical in each cell, the main distinctions in X2 arise from divisions by the expected values. If they are too small, the results may be untrustworthy, as discussed in Section 14.3.1.

14.2.6 Equivalence of Z2 and X2

Near the end of Section 14.1.4, we found that Z2 and X2 gave identical results for the data of Table 14.1. The equality of Z2 and X2 in any 2 × 2 table can be proved by substituting into the previous Formula [13.16] for Z and developing the algebra. The increment of the two binary proportions, p1 – p2, becomes (ad – bc)/n1n2. The common value of P in a 2 × 2 table is (n1p1 + n2p2)/N = f1/N, and Q = 1 – P will be f2/N. When entered into Formula [13.16], these results lead to

Z = (ad – bc)√N / √(n1 n2 f1 f2)

which is the square root of Formula [14.3] for X2.
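The algebraic identity can be confirmed numerically with a Python sketch of the compact Z form, squared and compared with the X2 of Table 14.1:

```python
import math

def z_two_proportions(a, b, c, d):
    """Z for comparing p1 = a/n1 with p2 = c/n2 under the null hypothesis,
    in the compact form Z = (ad - bc) sqrt(N) / sqrt(n1 n2 f1 f2)."""
    n1, n2, f1, f2 = a + b, c + d, a + c, b + d
    n = n1 + n2
    return (a * d - b * c) * math.sqrt(n) / math.sqrt(n1 * n2 * f1 * f2)

z = z_two_proportions(6, 89, 14, 91)   # about -1.65 for Table 14.1
```

Squaring z recovers the X2 of about 2.73, confirming Z2 = X2 for a 2 × 2 table.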

The similarity between Z2 and X2 is so striking that in looking for P values you can square Z values if you do not have a table of χ2 (at one d.f.) available. For example, in Table 14.2, the X2 values for P = .1, .05, and .01, respectively, are 2.706, 3.841, and 6.635. These are the squares of the corresponding Z values in Table 6.3: (1.645)2, (1.96)2, and (2.58)2.

The equivalence of Z2 = X2 for a 2 × 2 table, with 1 degree of freedom, was also shown in the construction of the chi-square distribution in Formula [14.2]. The equivalence helps indicate that for two-group contrasts, the Z procedure is probably the most useful of the three parametric tests: Z, t, and chi-square. Except when group sizes are small enough to be troublesome, the Z test can be used to contrast two means and also two proportions. Perhaps the main reason for noting the similarity of Z and chi-square, however, is that Z is used for calculating confidence intervals and sample sizes, as described in Sections 14.4 and 14.5.

14.2.7 Mental Approximation of Z

A simple mental approximation becomes possible when the Z formula for comparing two proportions is rearranged as

 

Z = [ (p1 – p2)/√PQ ] × √[(k)(1 – k) × N]     [14.7]

where k = n1/N and 1 – k = n2/N.

If √[(k)(1 – k)] and √PQ are each about .5 (as is often true), they essentially cancel in the foregoing formula, which becomes

Z ≈ (p1 – p2)√N     [14.8]

To use this formula mentally, without a calculator, begin with the values presented for p1 and p2 and for their increment, p1 – p2. Because Z is “statistically significant” when it is ≥ 2, this status can be achieved if (p1 – p2)√N is strikingly larger than 2. The main job is to see whether √N is large enough to achieve this goal when it multiplies p1 – p2. If denominators are shown for p1 and p2, you can quickly add n1 + n2 to form N and then estimate its square root.

For example, suppose an investigator compares the two proportions 70/159 = .44 and 30/162 = .19 as success rates in a trial of Treatment A vs. B. The increment p1 – p2 is .44 – .19 = .25. To exceed 2, this value must be multiplied by something that exceeds 8. Because the total group is bigger than 300, its square root will surely exceed 8. Therefore, your mental screening test can lead to the conclusion that Z > 2, and that the result is stochastically significant. The actual calculation would show that √PQ = .463, √[k(1 – k)] = .500, and Z = (.440 – .185)(√321)(.500)/(.463) = 4.93, for which 2P < 1 × 10^–6.

Conversely, suppose a speaker compares the proportions 19/40 = .48 and 12/42 = .29. The increment p1 – p2 seems impressive at .48 – .29 = .19, or about .2. For the appropriate product to exceed 2, however, the .2 value must be multiplied by 10; but the total group size here is below 100, being 40 + 42 = 82. Because the square root of the group size will not exceed 10, the result cannot be stochastically significant at 2P < .05 for Z > 2. If you want to awe (or offend) the speaker, you can immediately volunteer the warning that the result is not stochastically significant. [The actual calculation would show √PQ = √[(.378)(.622)] = .48 and √[(k)(1 – k)] = √[(40/82)(42/82)] = .500. The value of Z would be (.475 – .286)(√82)(.500)/(.48) = 1.78. It is close to 2 and will have a one-tailed but not a two-tailed P < .05.]
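Both the exact Formula [14.7] and the mental screening value of [14.8] can be sketched in Python to check the two worked examples:

```python
import math

def z_exact(t1, n1, t2, n2):
    """Formula [14.7]: Z = [(p1 - p2)/sqrt(PQ)] x sqrt(k(1 - k)N)."""
    big_n = n1 + n2
    p1, p2 = t1 / n1, t2 / n2
    big_p = (t1 + t2) / big_n
    big_q = 1 - big_p
    k = n1 / big_n
    return (p1 - p2) / math.sqrt(big_p * big_q) * math.sqrt(k * (1 - k) * big_n)

def z_mental(t1, n1, t2, n2):
    """Rough screening value of Formula [14.8]: (p1 - p2) sqrt(N)."""
    return (t1 / n1 - t2 / n2) * math.sqrt(n1 + n2)
```

For 70/159 vs. 30/162 the exact Z is about 4.93 and the mental screen clearly exceeds 2; for 19/40 vs. 12/42 the mental screen stays below 2, and the exact Z is close to, but below, 1.96.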

 

The “mental method” is best for two-tailed rather than one-tailed tests. In the latter, Z need exceed only 1.645 rather than the easy-to-remember “2” of 1.96. In the foregoing example, the closeness of 82 to 100 might have acted as a cautionary warning that 1.645 could be exceeded in a one-tailed test.

14.3 Problems and Precautions in Use of X2

The chi-square test became popular because it offered a stochastic contrast for the proportions found in two groups. For a clinical investigator who works with enumeration (counted) data, chi-square was the counterpart of the t test (or Z test) used for contrasting the means in two groups of dimensional laboratory data.


Aside from disputes about how well the X2 computation fits the χ2 distribution, many other problems require attention.

14.3.1 Size of Groups

No consistent agreement has been developed for the minimum size of N, or of n1 and n2, that will preserve validity in the assumptions about a χ 2 distribution. A compromise solution offered by Cochran4 is to avoid X2 and to use the Fisher Exact Probability Test if N < 20, or if 20 < N < 40 and the smallest expected value in any cell is less than 5.

14.3.2 Size of Cells

Consistent agreement is also absent about the minimum size of the expected values in the cells. Some authors have said that the X2 test is not valid if the expected value in any cell is below 5. Others put this minimum expected cell value at 10. Yet others recommend 20.

If you have a suitable computer program, the conflicting recommendations about sizes of groups and cells can be avoided by using the Fisher Exact Test for all 2 × 2 tables.

14.3.3 Fractional Expected Values

The expected values often seem scientifically odd. Given success rates of 8/23 = .35 and 11/16 = .69 for groups of untreated and treated patients, respectively, the observed and expected values for these data are shown in Table 14.3. Just as a “mean of 2.7 children” is one of the reasons for avoiding means and using median values in dealing with discrete integer data, an expected value of “11.8” or “8.2” people seems equally strange in a 2 × 2 frequency table. Nevertheless, these fractions of people are essential for the calculations.

TABLE 14.3
Observed and Expected Values for Data in a 2 × 2 Table

              Observed Values              Expected Values
              Failure  Success  Total      Failure  Success
Untreated       15        8       23         11.8     11.2
Treated          5       11       16          8.2      7.8
Total           20       19       39         20.0     19.0

14.3.4 Is X2 a Parametric Test?

The X2 test is sometimes called “non-parametric” because (unlike the usual Z or t test) it does not require dimensional data. Nevertheless, the interpretation of X2 requires the conventional parametric-type reasoning about sampling from a hypothetical distribution. Besides, as noted earlier, the parametric Z test and X2 give identical results in the comparison of two proportions.

14.3.5 Controversy about Marginal Totals

As discussed earlier in Section 12.7.3, an academic controversy has developed about the idea of fixing the marginal totals at f1, f2, n1, and n2 when either X2 or the Fisher Exact Test is determined for a 2 × 2 table. Because the numbers in the table can be obtained from at least three different methods of “sampling,” each method (according to the theorists) can lead to different estimates of parametric values for the observed proportions, and to different strategies for either fixing the margins or letting them vary.

The arguments offer interesting displays of creative statistical reasoning, particularly for estimating sample sizes before the research is done. Nevertheless, when the research is completed, the pragmatic reality, acknowledged by many prominent statisticians,5–7 is that the four marginal totals are f1, f2, n1, and n2. They are therefore kept intact, i.e., fixed, for the Fisher exact test and for the conventional X2 test, as well as for the test with Yates correction.

Unless you are planning to write a mathematical thesis on inferential probabilities, the useful pragmatic approach is to follow custom and fix both sets of margins. Besides, as the Fisher test gradually replaces X2 (or Z) for comparing two proportions, the arguments about estimating the parameters will become historical footnotes.

14.3.6 Reasons for “Success” of Chi-Square

Despite all the cited infelicities, the X2 test has grown and flourished because it is easy to calculate, reasonably easy to understand, versatile, and robust. Its versatility will be discussed in Section 14.7. Its robustness arises because the results, when converted into probability values, are generally similar to those produced by the “gold standard” Fisher exact probability procedure.

Like the t test, the X2 test has major intellectual defects and major numerical advantages. The object here is neither to praise nor to condemn the test, but to familiarize you with its usage. Since the X2 test will probably not be replaced for a long time, it will regularly appear in published literature, and the process of learning about it becomes an act of enlightened self-defense.

Nevertheless, to calculate confidence intervals or to estimate sample size for a contrast of two proportions, the preferred tactic is the Z procedure, not chi-square.

14.4 Confidence Intervals for Contrast of Two Proportions

To use the Z procedure for calculating the confidence interval of a contrast of two proportions, pA = tA/nA and pB = tB/nB, the first step is to determine the standard error of the difference.

14.4.1 Standard Error of Difference

An increment of two proportions, like an increment of two means, evokes the same question about “Which standard error?”

14.4.1.1 Customary SED0 for Null Hypothesis — With the conventional null hypothesis, when we assume that the outcome is unrelated to treatment, the common proportion for the two groups is estimated parametrically as π = (nApA + nBpB)/N. With the observed P used to estimate π , the standard error of the difference, subscripted with 0 to indicate a null hypothesis, will then be

SED0 = √(NPQ/nAnB)     [14.9]

With Zα appropriately selected for a 1 − α level of confidence, the corresponding confidence interval will be

pA – pB ± Zα √(NPQ/nAnB)     [14.10]

14.4.1.2 Simplified Calculation of SED0 — The calculation of SED0 can be simplified by noting that nA + nB = N and tA + tB = T. Then P = (nApA + nBpB)/N = (tA + tB)/N = T/N. The value of Q = 1 – P will be [N – T]/N. The product NPQ will then be T(N – T)/N, and the calculational formula for [14.9] becomes

SED0 = √{[T(N – T)]/[(N)(nA)(nB)]}     [14.11]

© 2002 by Chapman & Hall/CRC

For example, suppose we want to find the standard error for the difference in the two proportions 18/24 = .75 and 10/15 = .67. The value of P can be calculated as (18 + 10)/(24 + 15) = 28/39 = .718, and Q will be 1 – .718 = .282. The standard error would become √[39(.718)(.282)/(15 × 24)] = .148. With the simplified formula, which avoids rounding for values of P and Q, the calculation would be

√[(18 + 10)(6 + 5)/{(24 + 15)(24)(15)}] = √[(28)(11)/{(39)(24)(15)}] = .148

14.4.1.3 Rapid Mental Approximation — An additional feature of Formula [14.11] allows a particularly rapid “mental” approximation of the standard error of the difference.

This approach is possible because over a reasonable range of values, the ratio [(T)(N – T)]/[(nA)(nB)] will not be too far from 1. (In the cited example, it is (28)(11)/{(24)(15)} = .856.) If the value of [T(N – T)]/[(nA)(nB)] is crudely approximated as 1, the standard error of the difference will be approximated as SED ≈ √(1/N).

The √(1/N) approximation for SED in comparing two proportions is the same √(1/N) calculation discussed earlier (Section 8.5.1) for the 95% confidence-interval component of a single proportion. In using this “shortcut,” remember that the √(1/N) formula is used for two different approximations: SED for two proportions and 2SE for one proportion. In the example here, √(1/39) = .160 — a value not far from the actual SED of .148.

For the “mental” part of the calculation, you can estimate 1/39 as roughly 1/36, for which the square root is 1/6 = .167, which is reasonably close to the actual value of .148. The “mental” feat here is even more impressive because it just happened that way: the numbers in the example were not deliberately “rigged” to produce a close result.

Unlike the results of two means, the results for two proportions are seldom listed with standard errors, which would be √(pAqA/nA) and √(pBqB/nB). Therefore, a crude confidence interval component cannot be readily obtained by adding the two standard errors and doubling the result. Applying the “crude” formula of √(1/N), however, and using the particularly crude value of √(1/36) = .167 in the example here, we could double the approximated SED to get .167 × 2 = .334. Since this value substantially exceeds the observed pA – pB = .083, we can feel almost sure that the result is not stochastically significant at a two-tailed α = .05. Essentially the same computational tactic was used for the mental approximation in Section 14.2.7, when pA – pB was multiplied by √N, and the result compared against a value of 2.

14.4.1.4 SEDH for Alternative Hypothesis — In additional types of stochastic reasoning discussed later in Chapter 23, the assumed alternative hypothesis is that the two “treatments” are different, i.e., the observed values of pA and pB are not parametrically similar. With this assumption, the appropriate variance of the increment in the two central indexes is calculated, as in Chapter 13, by adding the two observed variances as (pAqA/nA) + (pBqB/nB). Under the alternative hypothesis, the standard error of the difference will be

SEDH = √[(pAqA/nA) + (pBqB/nB)]     [14.12]

Thus, if we assumed that one treatment was really different from the other in the foregoing example, the correct standard error would be calculated as

√{[(.75)(.25)/24] + [(.67)(.33)/15]} = .150

14.4.1.5 Simplified Calculation of SEDH — To avoid problems in rounding, the alternative SEDH is best calculated with the integer values of

√[{(tA/nA)[(nA – tA)/nA]/nA} + {(tB/nB)[(nB – tB)/nB]/nB}]


With hand calculators that can easily do “cubing,” a simple computational formula is

SEDH = √{[(tA)(nA – tA)/nA3] + [(tB)(nB – tB)/nB3]}     [14.13]

In the foregoing example, the result would be

√{[(18)(6)/243] + [(10)(5)/153]} = .150

14.4.2 Similarity of SEDs

As in a contrast of two means, the two methods of calculation usually produce quite similar results for the standard error of the difference in two proportions. Thus, although Formula [14.9] (or [14.11]) vs. Formula [14.12] (or [14.13]) may produce different results in mathematical theory, the pragmatic values are reasonably close. In the foregoing example that compared 18 /24 = .75 with 10/15 = .67, the disparity between the SEDs of .148 and .150 is only about 2 parts in 150.
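The closeness of the two SEDs can be checked with a Python sketch of Formulas [14.11] and [14.13]:

```python
import math

def sed_null(t_a, n_a, t_b, n_b):
    """SED0 under the null hypothesis, Formula [14.11]:
    sqrt(T(N - T) / (N nA nB))."""
    big_t, big_n = t_a + t_b, n_a + n_b
    return math.sqrt(big_t * (big_n - big_t) / (big_n * n_a * n_b))

def sed_alt(t_a, n_a, t_b, n_b):
    """SEDH under the alternative hypothesis, Formula [14.13]:
    sqrt(tA(nA - tA)/nA^3 + tB(nB - tB)/nB^3)."""
    return math.sqrt(t_a * (n_a - t_a) / n_a ** 3 + t_b * (n_b - t_b) / n_b ** 3)
```

For 18/24 vs. 10/15, the two sketches return .148 and .150, illustrating how small the disparity usually is.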

For practical purposes, therefore, the standard error of the difference in proportions can be calculated with either formula, regardless of what assumption is made about the parametric hypothesis. Because the intermediate calculation of P is avoided, the formula √[(pAqA/nA) + (pBqB/nB)] seems somewhat easier to use on a hand calculator, but √(NPQ/nAnB) is usually preferred here because it is also more appropriate for the conventional null hypothesis.

If the result is so borderline that stochastic significance will be lost or gained by choice of the method for calculating SED, the data probably do not warrant any firm conclusions. Besides, both formulas rely on the assumption that increments in the two proportions have a Gaussian distribution. As shown in Chapter 8, however, this assumption does not hold for relatively small group sizes. In such circumstances, the data analyst might want to use the Fisher-test intervals or some other procedure to achieve reality amid the allure of the Gaussian wonderland.

14.4.3 Choice of Zα Values

After an SED value is determined, the next step in constructing a confidence interval is to choose a value for Zα. This choice depends on the same questions that have been previously discussed for how much “confidence” we want, and whether it goes in a one- or two-tailed direction.

The questions can be answered with the same reasoning and arguments used previously for dimensional data in Chapter 13, but two distinctions need further consideration for comparisons of binary proportions.

14.4.3.1 Magnitude of α — In most ordinary situations, α is set at a two-tailed level of .05, for which Zα = 1.96. Thus, for the difference in the previously cited comparison of 18/24 = .75 vs. 10/15 = .67, with SED = .148, the 95% confidence interval will be

(.75 – .67) ± (1.96)(.148) = .08 ± .29

and will extend from –.21 to +.37. Because the interval includes 0, the result is not stochastically significant at 2P < .05. On the other hand, we might be reluctant to conclude that the increment of .08 is truly “insignificant” because it might be as large as .21 in favor of “Treatment B” or .37 in favor of “Treatment A” within the extent of the 95% confidence interval.
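The whole interval calculation of Formula [14.10] fits in a short Python sketch, shown here for the 18/24 vs. 10/15 comparison:

```python
import math

def ci_two_proportions(t_a, n_a, t_b, n_b, z_alpha=1.96):
    """Confidence interval for pA - pB, Formula [14.10],
    using SED0 = sqrt(NPQ/(nA nB))."""
    big_n = n_a + n_b
    p_a, p_b = t_a / n_a, t_b / n_b
    big_p = (t_a + t_b) / big_n
    big_q = 1 - big_p
    sed = math.sqrt(big_n * big_p * big_q / (n_a * n_b))
    d = p_a - p_b
    return d - z_alpha * sed, d + z_alpha * sed
```

For these data the sketch returns an interval of roughly –.21 to +.37, which includes 0.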

The magnitude of α becomes particularly important when the stochastic result is “nonsignificant,” and the confidence interval is then explored for the possibility that its upper end is impressively high. With a small enough choice of α, e.g., α = .0001, Zα can become large enough to “drag” the interval for any two-proportion increment across whatever descriptive border is set for “high.” For example, if the observed increment is .04 and the SED is .03, the customary Zα will make the confidence interval be .04 ± (1.96)(.03). It will be “nonsignificant,” extending from –.019 to +.099.

On the other hand, if we set α = .001, the Zα of 3.29 will make the interval become .04 ± (3.29)(.03). Its upper end will now have the “impressive” value of .14.
