TABLE 14.2
Correspondence of Two-Tailed External P Values for X2 Values at Different Degrees of Freedom (marked ν )
ν      0.90    0.80    0.70    0.60    0.50    0.40    0.30    0.20    0.10    0.050   0.0250  0.010   0.0050  0.0010

1      0.0158  0.0642  0.148   0.275   0.455   0.708   1.074   1.642   2.706   3.841   5.024   6.635   7.879   10.828
2      0.211   0.446   0.713   1.022   1.386   1.833   2.408   3.219   4.605   5.991   7.378   9.210   10.597  13.816
3      0.584   1.005   1.424   1.869   2.366   2.946   3.665   4.642   6.251   7.815   9.348   11.345  12.838  16.266
4      1.064   1.649   2.195   2.753   3.357   4.045   4.878   5.989   7.779   9.488   11.143  13.277  14.860  18.467
5      1.610   2.343   3.000   3.655   4.351   5.132   6.064   7.289   9.236   11.070  12.833  15.086  16.750  20.515
6      2.204   3.070   3.828   4.570   5.348   6.211   7.231   8.558   10.645  12.592  14.449  16.812  18.548  22.458
7      2.833   3.822   4.671   5.493   6.346   7.283   8.383   9.803   12.017  14.067  16.013  18.475  20.278  24.322
8      3.490   4.594   5.527   6.423   7.344   8.351   9.524   11.030  13.362  15.507  17.535  20.090  21.955  26.124
9      4.168   5.380   6.393   7.357   8.343   9.414   10.656  12.242  14.684  16.919  19.023  21.666  23.589  27.877
10     4.865   6.179   7.267   8.295   9.342   10.473  11.781  13.442  15.987  18.307  20.483  23.209  25.188  29.588
11     5.578   6.989   8.148   9.237   10.341  11.530  12.899  14.631  17.275  19.675  21.920  24.725  26.757  31.264
12     6.304   7.807   9.034   10.182  11.340  12.584  14.011  15.812  18.549  21.026  23.337  26.217  28.300  32.909
13     7.042   8.634   9.926   11.129  12.340  13.636  15.119  16.985  19.812  22.362  24.736  27.688  29.819  34.528
14     7.790   9.467   10.821  12.078  13.339  14.685  16.222  18.151  21.064  23.685  26.119  29.141  31.319  36.123
15     8.547   10.307  11.721  13.030  14.339  15.733  17.322  19.311  22.307  24.996  27.488  30.578  32.801  37.697
16     9.312   11.152  12.624  13.983  15.338  16.780  18.418  20.465  23.542  26.296  28.845  32.000  34.267  39.252
17     10.085  12.002  13.531  14.937  16.338  17.824  19.511  21.615  24.769  27.587  30.191  33.409  35.718  40.790
18     10.865  12.857  14.440  15.893  17.338  18.868  20.601  22.760  25.989  28.869  31.526  34.805  37.156  42.312
19     11.651  13.716  15.352  16.850  18.338  19.910  21.689  23.900  27.204  30.144  32.852  36.191  38.582  43.820
Source: This table is derived from Geigy Scientific Tables, Vol. 2, ed. by C. Lentner, 1982 Ciba-Geigy Limited, Basle, Switzerland.
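For ν = 1 and ν = 2, the tabulated critical values can be reproduced from closed-form expressions for the chi-square tail area: at ν = 2 the survival function is e^(−x/2), and at ν = 1 it is erfc(√(x/2)). A Python sketch using only the standard library (function names are illustrative):

```python
import math

def chi2_crit_df2(p):
    """Critical X2 value for upper-tail probability p at 2 degrees of freedom.
    For nu = 2 the chi-square survival function is exp(-x/2), so x = -2 ln p."""
    return -2.0 * math.log(p)

def chi2_tail_df1(x):
    """Upper-tail probability of chi-square with 1 d.f. at value x,
    using P(X2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Reproduce the nu = 2 row: 5.991 for P = .05 and 9.210 for P = .01
print(round(chi2_crit_df2(0.05), 3))  # 5.991
print(round(chi2_crit_df2(0.01), 3))  # 9.21

# Confirm that the nu = 1 entry 3.841 corresponds to a tail area of .05
print(round(chi2_tail_df1(3.841), 3))  # 0.05
```

For larger ν there is no elementary closed form, and a published table (or a scientific library's chi-square survival function) is the practical source.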
Accordingly, at one degree of freedom in Table 14.2, the top row shows that the calculated X2 of 2.73 is just higher than the 2.706 value required for 2P = .10. Therefore, the stochastic conclusion is
.05 < 2P < .10.
To confirm the relationship of Z2 and X2, note that the increment in the numerator of Formula [13.16] for Z is (14/105) − (6/95) = .070, and √(nAnB/N) is √[(95)(105)/200] = 7.06. The denominator term √(PQ) is calculated as √[(20/200)(180/200)] = .3. Therefore, Z = (.070)(7.06)/.3 = 1.65 and Z2 is 2.72, which is identical, except for minor rounding, to the value of X2 calculated with the observed/expected tactic in Section 14.1.2.
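The identity is easy to verify numerically. A minimal Python sketch (variable names are illustrative) computes X2 by the observed/expected tactic and Z from Formula [13.16], using the Table 14.1 frequencies:

```python
import math

# Table 14.1 frequencies: 6 of 95 men and 14 of 105 women are anemic
a, b = 6, 89    # men: anemic, non-anemic
c, d = 14, 91   # women: anemic, non-anemic
n1, n2 = a + b, c + d          # 95, 105
f1, f2 = a + c, b + d          # 20, 180
N = n1 + n2                    # 200

# X2 from the observed/expected tactic: expected cell = (row total)(col total)/N
observed = [a, b, c, d]
expected = [n1 * f1 / N, n1 * f2 / N, n2 * f1 / N, n2 * f2 / N]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Z from Formula [13.16]: increment of proportions over the null standard error
p1, p2 = a / n1, c / n2
P = f1 / N
Q = 1 - P
z = (p2 - p1) / math.sqrt(P * Q * N / (n1 * n2))

print(round(x2, 2), round(z, 2))   # 2.73 1.65
print(abs(x2 - z ** 2) < 1e-9)     # True: Z2 and X2 agree to floating precision
```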
Although this approach is satisfactory to show that a 2 × 2 table has one degree of freedom, a more general strategy is needed for larger two-way tables.
14.1.5 Controversy about d.f. in 2-Way Tables
X2 tests are most commonly applied to a 2 × 2 table having four interior cells, but can also be used for two-way tables with larger numbers of cells. With r categories in the rows and c categories in the columns, the table would have r × c cells.
The decision about degrees of freedom in an r × c table gave rise, early in the 20th century, to a heated, bitter battle between two statistical titans: Ronald A. Fisher and Karl Pearson. Fisher emerged the victor, but the personal feud between the two men was never resolved, with Fisher thereafter avoiding (or being denied) publication in the then leading journal of statistics ( Biometrika), which was edited by Pearson. [Fisher, of course, found other places to publish, but the feud was a juicy scandal for many years. Consider the analogous situation in medicine today if William Osler were alive in the U.S. and writing vigorously, but never publishing anything in the New England Journal of Medicine.]
In the argument over degrees of freedom, Pearson contended that a two-way table with r rows and c columns would have r × c cells, and that the degrees of freedom should therefore be rc − 1. Fisher demonstrated that the correct value for degrees of freedom was (r − 1)(c − 1).
He began his argument by “dis-entabling” the data and converting the cells into a linear array. It would have r × c cells, which could be filled, with rc degrees of freedom, choosing any desired numbers. Because the cellular totals must add up to N, one degree of freedom is immediately lost, so that rc − 1 choices remain “freely” available. Reconstructing the r × c cells into a 2-way table, however, produces some new constraints. The marginal totals must add up to the grand total (N) in the rows and also in the columns. Therefore, only r − 1 marginal totals can be freely chosen for the rows, and each of those choices creates an additional constraint on the cellular freedom, so that r − 1 degrees of freedom are lost in the basic cellular choices. Analogously, the column totals create another c − 1 constraints. Therefore, the degrees of freedom are rc − 1 − (r − 1) − (c − 1), which becomes (r − 1)(c − 1). Thus, a 4 × 5 table has (4 − 1)(5 − 1) = (3)(4) = 12 degrees of freedom, and a 2 × 2 table has (2 − 1)(2 − 1) = 1 degree of freedom.

© 2002 by Chapman & Hall/CRC

The idea can be quickly grasped for a 2 × 2 table if you consider the marginal totals fixed as follows:
                 Column
Row         1       2       Total

1                           n1
2                           n2

Total       f1      f2      N
As soon as you enter a number in any one of these four cells, the values of all the remaining cells are determined. Suppose a is chosen to be the value of cell 1,1. Then cell 1,2 is n1 − a; cell 2,1 is f1 − a; and cell 2,2 is either n2 − (f1 − a) or f2 − (n1 − a). Consequently, the only available freedom in this table is to choose the value for any one of the cells.
14.1.6 “Large” Degrees of Freedom
The formula (r − 1)(c − 1) should be kept in mind when the value of ν is determined for r × c tables that are larger than 2 × 2. If ν becomes very large, however, you may want to reappraise the planned
analysis. For example, a table with 4 categories in the rows and 5 categories in the columns will have 12 degrees of freedom, but the results of such a table are usually difficult to understand and interpret. In fact, many experienced, enlightened data analysts say they regularly try to reduce (i.e., compress) everything into a 2 × 2 table because they understand it best.
Chapter 27 contains a discussion of tables where r and c are each ≥ 3, and also of special tables constructed in a 2 × c (or r × 2) format where only c (or r) is ≥ 3. [For example, we might consider the sequence of survival proportions for four ordinal stages (I, II, III, and IV) in a table that would have (2 − 1)(4 − 1) = 3 degrees of freedom.] In general, however, if the degrees of freedom start getting “large” (i.e., ≥ 6), the table is probably unwieldy and should be compressed into smaller numbers of categories. Therefore, opportunities should seldom arise to use the many lines of Table 14.2 for which the degrees of freedom exceed 6.
14.1.7 Yates Continuity Correction
Another major conceptual battle about χ 2 is still being fought today and has not yet been settled. The source of the controversy is the “continuity correction” introduced by Frank Yates3 in 1934.
The basic argument was discussed in Section 13.8.2. The distribution of a set of categorical frequencies, or of the X2 value calculated from them, is discontinuous, but χ 2 has a continuous distribution.
Consequently, χ 2 is the “limit” toward which X2 approaches as the group sizes grow very large. For a suitable “correction” of X2, Yates proposed that if we take “the half units of deviation from expectation as the group boundaries, we may expect to obtain a much closer approximation to the true distribution.”
The net algebraic effect of the Yates correction was to make the basic Formula [14.1] become

Xc2 = Σ [ ( |O − E| − 1/2 )² / E ]
where O = observed values, E = expected values, and the c subscript indicates use of the Yates correction. For practical purposes, the Yates correction usually alters the individual observed–expected deviations to make them less extreme. The corrected value of X2c is then used for entering the standard chi-square tables to find the associated value of P.
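The effect of the correction can be sketched numerically with the Table 14.1 cell frequencies (Python; names illustrative):

```python
# Yates-corrected X2 for Table 14.1: each cell contributes (|O - E| - 1/2)^2 / E
a, b, c, d = 6, 89, 14, 91
n1, n2 = a + b, c + d
f1, f2 = a + c, b + d
N = n1 + n2

observed = [a, b, c, d]
expected = [n1 * f1 / N, n1 * f2 / N, n2 * f1 / N, n2 * f2 / N]

x2_plain = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
x2_yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

print(round(x2_plain, 2))  # 2.73
print(round(x2_yates, 3))  # 2.005 -- the correction pulls X2 toward the null
```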
The Yates correction soon became established as a chic item of statistical fashion in the days before easy computation was available for the Fisher Exact Test. A manuscript reviewer (or editor) who could find nothing else to say about a paper would regularly ask whether the investigators had calculated their X2 values ignorantly (without the correction) or sagaciously (with it). To know about and use the Yates
correction became one of the “in-group” distinctions — somewhat like knowing the difference between incidence and prevalence — that separated the pro from the neophyte in the analysis of clinical and epidemiologic data.
In recent years, however, a major assault has been launched against the continuity correction and it seems to be going out of fashion. The anti-Yates-correction argument is that the uncorrected X2 value gives better agreement than the corrected one when an exact probability (rather than an approximate one, via χ2 tables) is calculated for the data of a 2 × 2 table. If the “gold standard” Fisher exact probability test, discussed in Chapter 12, eventually replaces the X2 test altogether, the dispute about the Yates correction will become moot.
14.2 Formulas for Calculation
Because the chi-square procedure is so popular, many tactics have been developed to simplify the arithmetic with methods that incorporate the “expected” values into the calculations, thereby avoiding the cumbersome Σ [(observed – expected)2/expected] formula. The alternative formulas are listed here for readers who may want or need to do the calculations themselves without a computer program. [If you like interesting algebraic challenges, you can work out proofs that the cited formulas are correct.]
14.2.1 2 × 2 Tables
For a 2 × 2 table constructed as
                        Total
        a       b       n1
        c       d       n2
Total   f1      f2      N
the simplified formula is
X2 = [(ad − bc)² N]/(n1 n2 f1 f2)    [14.3]
In words, X2 is the squared difference in the cross product terms, multiplied by N, divided by the product of the four marginal totals. Thus, for the data in Table 14.1,
X2 = [(6 × 91) − (89 × 14)]² × 200 / (95 × 105 × 20 × 180)
   = (546 − 1246)² × 200 / 35910000
   = (−700)² × 200 / 35910000
   = 98000000/35910000 = 2.73
which is the same result obtained earlier.
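The shortcut can be confirmed in a few lines (a Python sketch; names illustrative):

```python
# Formula [14.3] for a 2x2 table: X2 = (ad - bc)^2 N / (n1 n2 f1 f2),
# applied to the Table 14.1 frequencies
a, b, c, d = 6, 89, 14, 91
n1, n2 = a + b, c + d
f1, f2 = a + c, b + d
N = n1 + n2

x2 = (a * d - b * c) ** 2 * N / (n1 * n2 * f1 * f2)
print(round(x2, 2))  # 2.73 -- same as the observed/expected calculation
```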
14.2.2 Yates Correction in 2 × 2 Table
To employ the Yates correction, Formula [14.3] is changed to

Xc2 = ( |ad − bc| − [N/2] )² N / [(a + b)(c + d)(a + c)(b + d)]    [14.4]

If the Yates correction were applied, the previous calculation for Table 14.1 would become Xc2 = [700 − (200/2)]²[200]/35910000 = (600)²(200)/35910000 = 2.005.
14.2.3 Formula for “Rate” Calculations
The results of two proportions (or two rates) are commonly expressed in the form p1 = t1/n1 and p2 = t2/n2, with totals of P = T/N, where T = t1 + t2 and N = n1 + n2. The “unpacking” of such data can be avoided with the formula
X2 = [ (t1²/n1) + (t2²/n2) − (T²/N) ] × [ N²/{(T)(N − T)} ]    [14.5]
This formula may look forbidding, but it can be rapidly carried out on a hand calculator, after only one manual maneuver—the subtraction of T from N.
The formula can be illustrated if the occurrence of anemia in Table 14.1 were presented as 6/95 = .063 for men and 14/105 = .133 for women. Instead of unpacking these proportions into a table, we can promptly determine that T = 6 + 14 = 20, N = 95 + 105 = 200, and N − T = 200 − 20 = 180. With Formula [14.5], we then calculate
X2 = [(6²/95) + (14²/105) − (20²/200)][200²/{(20)(180)}]
   = [.2456][11.11] = 2.73
which is the same result as previously.
Another formula that determines X2 without fully “unpacking” the two proportions is
X2 = (n2t1 − n1t2)² N / [n1 n2 T(N − T)]    [14.6]
For the data in Table 14.1, this calculation is [(105)(6) − (95)(14)]²(200)/[(95)(105)(20)(180)] = 2.73.
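Both rate formulas can be checked against the Table 14.1 proportions, 6/95 and 14/105 (a Python sketch; names illustrative):

```python
t1, n1 = 6, 95     # anemic men / total men
t2, n2 = 14, 105   # anemic women / total women
T, N = t1 + t2, n1 + n2

# Formula [14.5]: works directly from the "packed" rates
x2_a = (t1**2 / n1 + t2**2 / n2 - T**2 / N) * N**2 / (T * (N - T))
# Formula [14.6]: cross-product form of the same quantity
x2_b = (n2 * t1 - n1 * t2) ** 2 * N / (n1 * n2 * T * (N - T))

print(round(x2_a, 2), round(x2_b, 2))  # 2.73 2.73
```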
14.2.4 Expected Values
If you, rather than a computer program, are going to calculate X2 for a two-way table that has more than 2 rows or 2 columns, none of the simplified formulas can be used. The expected values will have to be determined, and a strategy is needed for finding them quickly and accurately.
The process is easiest to show for a 2 × 2 table, structured as follows:
        Men     Women   Total
Old     a       b       f1
Young   c       d       f2
Total   n1      n2      N
In this table, the proportion of old people is f1/N. Under the null hypothesis, we would expect the number of old men to be (f1/N) × n1 and the number of old women to be (f1/N) × n2. Similarly, the expected values for young men and young women would be, respectively, (f2n1)/N and (f2n2)/N. Consequently, the expected value in any cell is the product of the marginal totals for the corresponding row and column, divided by N. Thus, the expected values here are
        Men       Women
Old     f1n1/N    f1n2/N
Young   f2n1/N    f2n2/N
For a table of larger dimensions, the process can be illustrated as shown in Figure 14.2. In the cell for row 2, column 1, the expected value is f2n1/N—which is the product of the marginal totals for row 2 and column 1, divided by N. In row 1, column 3, the expected value is f1n3/N. In row 3, column 2, the expected value is f3n2/N.
FIGURE 14.2
Diagram showing marginal totals used for calculating expected values in a cell. [The figure displays a 3 × 3 two-way table with row totals f1, f2, f3, column totals n1, n2, n3, and grand total N; the expected value in each cell is the product of its row and column totals, divided by N.]
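The row-total-times-column-total rule generalizes to any r × c table and can be sketched as a small function (Python; the helper name `expected_values` is illustrative):

```python
def expected_values(table):
    """Expected cell counts for a two-way table: (row total)(column total)/N.
    `table` is a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    return [[r * c / N for c in col_totals] for r in row_totals]

# Table 14.1 as a 2x2 example: rows = men, women; columns = anemic, non-anemic
exp = expected_values([[6, 89], [14, 91]])
print([[round(v, 1) for v in row] for row in exp])  # [[9.5, 85.5], [10.5, 94.5]]
```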
14.2.5 Similar Values of |O − E| in Cells of 2 × 2 Table
An important attribute of the observed-minus-expected values in a 2 × 2 table is that their absolute magnitudes are identical in each cell, but the signs oscillate around the table. For example, for Table 14.1, the results in Section 14.1.2. showed that the observed-minus-expected values are
        Anemic    Non-Anemic
Men     −3.5      3.5
Women   3.5       −3.5
A simple formula for the absolute magnitude of the observed-minus-expected value for any cell in a 2 × 2 table is e = |ad − bc|/N. For the anemia example, |ad − bc|/N = |(6 × 91) − (89 × 14)|/200 = 700/200 = 3.5.

With e = |ad − bc|/N, the pattern of observed − expected values in the four cells will be:
−e     e
 e    −e
The rotating pattern of +e and −e values is useful for checking calculations and for indicating that the four marginal totals (n1, n2, f1, f2) should be exactly the same in the expected values as in the observed values. Because the (observed − expected)² values of e² will be identical in each cell, the main distinctions in X2 arise from divisions by the expected values. If they are too small, the results may be untrustworthy, as discussed in Section 14.3.1.
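A quick numerical check of the alternating ±e pattern for Table 14.1 (Python sketch; names illustrative):

```python
# In a 2x2 table, every |O - E| equals e = |ad - bc|/N, with alternating signs
a, b, c, d = 6, 89, 14, 91
n1, n2 = a + b, c + d
f1, f2 = a + c, b + d
N = n1 + n2

e = abs(a * d - b * c) / N
deviations = [a - n1 * f1 / N, b - n1 * f2 / N,
              c - n2 * f1 / N, d - n2 * f2 / N]

print(e)                                  # 3.5
print([round(v, 1) for v in deviations])  # [-3.5, 3.5, 3.5, -3.5]
```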
14.2.6 Equivalence of Z2 and X2
Near the end of Section 14.1.4, we found that Z2 and X2 gave identical results for the data of Table 14.1. The equality of Z2 and X2 in any 2 × 2 table can be proved by substituting into the previous Formula
[13.16] for Z and developing the algebra. The increment of the two binary proportions, p1 − p2, becomes (ad − bc)/n1n2. The common value of P in a 2 × 2 table is (n1p1 + n2p2)/N = f1/N, and Q = 1 − P will be f2/N. When entered into Formula [13.16], these results lead to

Z = (ad − bc)√N / √(n1 n2 f1 f2)
which is the square root of Formula [14.3] for X2.
The similarity between Z2 and X2 is so striking that in looking for P values you can square Z values if you do not have a Table of χ 2 (at one d.f.) available. For example, in Table 14.2, the X2 values for
P = .1, .05, and .01, respectively, are 2.706, 3.841, and 6.635. These are the squares of the corresponding Z values in Table 6.3: (1.645)², (1.96)², and (2.58)².
The equivalence of Z2 = X2 for a 2 × 2 table, with 1 degree of freedom, was also shown in the construction of the chi-square distribution in Formula [14.2]. The equivalence helps indicate that for two-group contrasts, the Z procedure is probably the most useful of the three parametric tests: Z, t, and chi-square. Except when group sizes are small enough to be troublesome, the Z test can be used to contrast two means and also two proportions. Perhaps the main reason for noting the similarity of Z and chi-square, however, is that Z is used for calculating confidence intervals and sample sizes, as described in Sections 14.4 and 14.5.
14.2.7 Mental Approximation of Z
A simple mental approximation becomes possible when the Z formula for comparing two proportions is rearranged as

Z = [(p1 − p2)/√(PQ)] × √[(k)(1 − k)N]    [14.7]

where k = n1/N and 1 − k = n2/N.

If √[(k)(1 − k)] and √(PQ) are each about .5 (as is often true), they essentially cancel in the foregoing formula, which becomes

Z ≅ (p1 − p2)√N    [14.8]
To use this formula mentally, without a calculator, begin with the values presented for p1 and p2 and for their increment, p1 − p2. Because Z is “statistically significant” when ≥ 2, this status can be achieved if |p1 − p2|√N is strikingly larger than 2. The main job is to see whether √N is large enough to achieve this goal when it multiplies |p1 − p2|. If denominators are shown for p1 and p2, you can quickly add n1 + n2 to form N and then estimate its square root.
For example, suppose an investigator compares the two proportions 70/159 = .44 and 30/162 = .19 as success rates in a trial of Treatment A vs. B. The increment p1 − p2 is .44 − .19 = .25. To exceed 2, this value must be multiplied by something that exceeds 8. Because the total group is bigger than 300, its square root will surely exceed 8. Therefore, your mental screening test can lead to the conclusion that Z > 2, and that the result is stochastically significant. The actual calculation would show that √(PQ) = .463, √[k(1 − k)] = .500, and Z = (.440 − .185)(√321)(.500)/(.463) = 4.93, for which 2P < 1 × 10⁻⁶.
Conversely, suppose a speaker compares the proportions 19/40 = .48 and 12/42 = .29. The increment
of p1 − p2 seems impressive at .48 − .29 = .19, or about .2. For the appropriate product to exceed 2, however, the .2 value must be multiplied by 10; but the total group size here is below 100, being 40 + 42 = 82. Because the square root of the group size will not exceed 10, the result cannot be stochastically significant at 2P < .05 for Z > 2. If you want to awe (or offend) the speaker, you can immediately volunteer the warning that the result is not stochastically significant. [The actual calculation would show √(PQ) = √[(.378)(.622)] = .48, and √[(k)(1 − k)] = √[(40/82)(42/82)] = .500. The value of Z would be [|.475 − .286|√82](.500)/(.48) = 1.78. It is close to 2 and will have a one-tailed but not a two-tailed P < .05.]
The “mental method” is best for two-tailed rather than one-tailed tests. In the latter, Z need exceed only 1.645 rather than the easy-to-remember “2” of 1.96. In the foregoing example, the closeness of 82 to 100 might have acted as a cautionary warning that 1.645 could be exceeded in a one-tailed test.
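The mental screen of Formula [14.8] can be sketched as follows (Python; the helper `mental_z` is an illustrative name, and the first example's second proportion is read as 30/162):

```python
import math

def mental_z(p1, p2, n_total):
    """Rough screen from Formula [14.8]: Z is approximated by |p1 - p2| sqrt(N)."""
    return abs(p1 - p2) * math.sqrt(n_total)

# Example 1: 70/159 vs. 30/162 -- the approximation clearly exceeds 2
z1 = mental_z(70 / 159, 30 / 162, 159 + 162)
# Example 2: 19/40 vs. 12/42 -- the approximation falls short of 2
z2 = mental_z(19 / 40, 12 / 42, 40 + 42)

print(round(z1, 2), round(z2, 2))  # 4.57 1.71
```

The approximation understates the actual Z of 4.93 in the first example because √(PQ) = .463 is a bit larger than the assumed .5, but the screening verdict (Z well above 2, or clearly below it) is unchanged in both cases.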
14.3 Problems and Precautions in Use of X2
The chi-square test became popular because it offered a stochastic contrast for the proportions found in two groups. For a clinical investigator who works with enumeration (counted) data, chi-square was the counterpart of the t test (or Z test) used for contrasting the means in two groups of dimensional laboratory data.
Aside from disputes about how well the X2 computation fits the χ 2 distribution, many other problems require attention.
14.3.1 Size of Groups
No consistent agreement has been developed for the minimum size of N, or of n1 and n2, that will preserve validity in the assumptions about a χ 2 distribution. A compromise solution offered by Cochran4 is to avoid X2 and to use the Fisher Exact Probability Test if N < 20, or if 20 < N < 40 and the smallest expected value in any cell is less than 5.
14.3.2 Size of Cells
Consistent agreement is also absent about the minimum size of the expected values in the cells. Some authors have said that the X2 test is not valid if the expected value in any cell is below 5. Others put this minimum expected cell value at 10. Yet others recommend 20.
If you have a suitable computer program, the conflicting recommendations about sizes of groups and cells can be avoided by using the Fisher Exact Test for all 2 × 2 tables.
14.3.3 Fractional Expected Values
The expected values often seem scientifically odd. Given success rates of 8/23 = .35 and 11/16 = .69 for groups of treated and untreated patients, respectively, the observed and expected values for these data are shown in Table 14.3. Just as a “mean of 2.7 children” is one of the reasons for avoiding means and using median values in dealing with discrete integer data, an expected value of “11.8” or “8.2” people seems equally strange in a 2 × 2 frequency table. Nevertheless, these fractions of people are essential for the calculations.
TABLE 14.3
Observed and Expected Values for Data in a 2 × 2 Table
             Observed Values                 Expected Values
             Failure   Success   Total       Failure   Success

Untreated      15         8       23          11.8      11.2
Treated         5        11       16           8.2       7.8
Total          20        19       39          20.0      19.0
14.3.4 Is X2 a Parametric Test?
The X2 test is sometimes called “non-parametric” because (unlike the usual Z or t test) it does not require dimensional data. Nevertheless, the interpretation of X2 requires the conventional parametric-type reasoning about sampling from a hypothetical distribution. Besides, as noted earlier, the parametric Z test and X2 give identical results in the comparison of two proportions.
14.3.5 Controversy about Marginal Totals
As discussed earlier in Section 12.7.3, an academic controversy has developed about the idea of fixing the marginal totals at f1, f2, n1, and n2 when either X2 or the Fisher Exact Test is determined for a 2 × 2 table. Because the numbers in the table can be obtained from at least three different methods of “sampling,” each method (according to the theorists) can lead to different estimates of parametric values for the observed proportions, and to different strategies for either fixing the margins or letting them vary.
The arguments offer interesting displays of creative statistical reasoning, particularly for estimating sample sizes before the research is done. Nevertheless, when the research is completed, the pragmatic
reality, acknowledged by many prominent statisticians,5–7 is that the four marginal totals are f1, f2, n1, and n2. They are therefore kept intact, i.e., fixed, for the Fisher exact test and for the conventional X2 test, as well as for the test with the Yates correction.
Unless you are planning to write a mathematical thesis on inferential probabilities, the useful pragmatic approach is to follow custom and fix both sets of margins. Besides, as the Fisher test gradually replaces X2 (or Z) for comparing two proportions, the arguments about estimating the parameters will become historical footnotes.
14.3.6 Reasons for “Success” of Chi-Square
Despite all the cited infelicities, the X2 test has grown and flourished because it is easy to calculate, reasonably easy to understand, versatile, and robust. Its versatility will be discussed in Section 14.7. Its robustness arises because the results, when converted into probability values, are generally similar to those produced by the “gold standard” Fisher exact probability procedure.
Like the t test, the X2 test has major intellectual defects and major numerical advantages. The object here is neither to praise nor to condemn the test, but to familiarize you with its usage. Since the X2 test will probably not be replaced for a long time, it will regularly appear in published literature, and the process of learning about it becomes an act of enlightened self-defense.
Nevertheless, to calculate confidence intervals or to estimate sample size for a contrast of two proportions, the preferred tactic is the Z procedure, not chi-square.
14.4 Confidence Intervals for Contrast of Two Proportions
To use the Z procedure for calculating the confidence interval of a contrast of two proportions, pA = tA/nA and pB = tB/nB, the first step is to determine the standard error of the difference.
14.4.1 Standard Error of Difference
An increment of two proportions, like an increment of two means, evokes the same question about “Which standard error?”
14.4.1.1 Customary SED0 for Null Hypothesis — With the conventional null hypothesis, when we assume that the outcome is unrelated to treatment, the common proportion for the two groups is estimated parametrically as π = (nApA + nBpB)/N. With the observed P used to estimate π , the standard error of the difference, subscripted with 0 to indicate a null hypothesis, will then be
SED0 = √(NPQ/nAnB)    [14.9]
With Zα appropriately selected for a 1 − α level of confidence, the corresponding confidence interval will be
pA − pB ± Zα√(NPQ/nAnB)    [14.10]
14.4.1.2 Simplified Calculation of SED0 — The calculation of SED0 can be simplified by noting that nA + nB = N and tA + tB = T. Then P = (nApA + nBpB)/N = (tA + tB)/N = T/N. The value of Q = 1 − P will be [N − T]/N. The product NPQ will then be T(N − T)/N, and the calculational formula for [14.9] becomes
SED0 = √{[T(N − T)]/[(N)(nA)(nB)]}    [14.11]
For example, suppose we want to find the standard error for the difference in two proportions 18/24 = .75 and 10/15 = .67. The value of P can be calculated as (18 + 10)/(24 + 15) = 28/39 = .718, and Q will be 1 − .718 = .282. The standard error would become √[39(.718)(.282)/(15 × 24)] = .148. With the simplified formula, which avoids rounding for values of P and Q, the calculation would be

√[(18 + 10)(6 + 5)/{(24 + 15)(24)(15)}] = √[(28)(11)/{(39)(24)(15)}] = .148
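Both versions of SED0 can be confirmed on the same data (a Python sketch; names illustrative):

```python
import math

# SED under the null hypothesis, both ways, for 18/24 = .75 vs. 10/15 = .67
tA, nA = 18, 24
tB, nB = 10, 15
T, N = tA + tB, nA + nB

P = T / N
Q = 1 - P
sed_pq = math.sqrt(N * P * Q / (nA * nB))            # Formula [14.9]
sed_simple = math.sqrt(T * (N - T) / (N * nA * nB))  # Formula [14.11]

print(round(sed_pq, 3), round(sed_simple, 3))  # 0.148 0.148
```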
14.4.1.3 Rapid Mental Approximation — An additional feature of Formula [14.11] allows a particularly rapid “mental” approximation of the standard error of the difference.
This approach is possible because over a reasonable range of values, the ratio [(T)(N − T)]/[(nA)(nB)] will not be too far from 1. (In the cited example, it is (28)(11)/[(24)(15)] = .856.) If the value of [T(N − T)]/[(nA)(nB)] is crudely approximated as 1, the standard error of the difference will be approximated as SED ≅ √(1/N).

The √(1/N) approximation for SED in comparing two proportions is the same √(1/N) calculation discussed earlier (Section 8.5.1) for the 95% confidence-interval component of a single proportion. In using this “shortcut,” remember that the √(1/N) formula is used for two different approximations: SED for two proportions and 2SE for one proportion. In the example here, √(1/39) = .160, a value not far from the actual SED of .148.

For the “mental” part of the calculation, you can estimate 1/39 as roughly 1/36, for which the square root is 1/6 = .167, reasonably close to the actual value of .148. The “mental” feat here is even more impressive because it just happened that way: the numbers in the example were not deliberately “rigged” to produce a close result.
Unlike the results of two means, the results for two proportions are seldom listed with standard errors, which would be √(pAqA/nA) and √(pBqB/nB). Therefore, a crude confidence interval component cannot be readily obtained by adding the two standard errors and doubling the result. Applying the “crude” formula of √(1/N), however, and using the particularly crude value of √(1/36) = .167 in the example here, we could double the approximated SED to get .167 × 2 = .334. Since this value substantially exceeds the observed pA − pB = .083, we can feel almost sure that the result is not stochastically significant at a two-tailed α = .05. Essentially the same computational tactic was used for the mental approximation in Section 14.2.7, when pA − pB was multiplied by √N and the result compared against a value of 2.
14.4.1.4 SEDH for Alternative Hypothesis — In additional types of stochastic reasoning discussed later in Chapter 23, the assumed alternative hypothesis is that the two “treatments” are different, i.e., the observed values of pA and pB are not parametrically similar. With this assumption, the appropriate variance of the increment in the two central indexes is calculated, as in Chapter 13, by adding the two observed variances as (pAqA/nA) + (pBqB/nB). Under the alternative hypothesis, the standard error of the difference will be
SEDH = √[(pAqA/nA) + (pBqB/nB)]    [14.12]
Thus, if we assumed that one treatment was really different from the other in the foregoing example, the correct standard error would be calculated as
√{[(.75)(.25)/24] + [(.67)(.33)/15]} = .150
14.4.1.5 Simplified Calculation of SEDH — To avoid problems in rounding, the alternative SEDH is best calculated with the integer values of

√({(tA/nA)[(nA − tA)/nA]/nA} + {(tB/nB)[(nB − tB)/nB]/nB})
With hand calculators that can easily do “cubing,” a simple computational formula is
SEDH = √{[(tA)(nA − tA)/nA³] + [(tB)(nB − tB)/nB³]}    [14.13]
In the foregoing example, the result would be
√{[(18)(6)/24³] + [(10)(5)/15³]} = .150
14.4.2 Similarity of SEDs
As in a contrast of two means, the two methods of calculation usually produce quite similar results for the standard error of the difference in two proportions. Thus, although Formula [14.9] (or [14.11]) vs. Formula [14.12] (or [14.13]) may produce different results in mathematical theory, the pragmatic values are reasonably close. In the foregoing example that compared 18 /24 = .75 with 10/15 = .67, the disparity between the SEDs of .148 and .150 is only about 2 parts in 150.
For practical purposes, therefore, the standard error of the difference in proportions can be calculated with either formula, regardless of what assumption is made about the parametric hypothesis. Because the intermediate calculation of P is avoided, the formula √[(pAqA/nA) + (pBqB/nB)] seems somewhat easier to use on a hand calculator, but √(NPQ/nAnB) is usually preferred here because it is also more appropriate for the conventional null hypothesis.
If the result is so borderline that stochastic significance will be lost or gained by choice of the method for calculating SED, the data probably do not warrant any firm conclusions. Besides, both formulas rely on the assumption that increments in the two proportions have a Gaussian distribution. As shown in Chapter 8, however, this assumption does not hold for relatively small group sizes. In such circumstances, the data analyst might want to use the Fisher-test intervals or some other procedure to achieve reality amid the allure of the Gaussian wonderland.
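The near-identity of the two SEDs for the 18/24 vs. 10/15 example can be checked directly (a Python sketch; names illustrative):

```python
import math

tA, nA = 18, 24
tB, nB = 10, 15
T, N = tA + tB, nA + nB

# Null-hypothesis SED (Formula [14.11])
sed0 = math.sqrt(T * (N - T) / (N * nA * nB))
# Alternative-hypothesis SED (Formula [14.13]), using integer counts
sedh = math.sqrt(tA * (nA - tA) / nA**3 + tB * (nB - tB) / nB**3)

print(round(sed0, 3), round(sedh, 3))  # 0.148 0.15
```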
14.4.3 Choice of Zα Values
After an SED value is determined, the next step in constructing a confidence interval is to choose a value for Zα. This choice depends on the same questions that have been previously discussed: how much “confidence” we want, and whether it goes in a one- or two-tailed direction.
The questions can be answered with the same reasoning and arguments used previously for dimensional data in Chapter 13, but two distinctions need further consideration for comparisons of binary proportions.
14.4.3.1 Magnitude of α — In most ordinary situations, α is set at a two-tailed level of .05, for which Zα = 1.96. Thus, for the difference in the previously cited comparison of 18/24 = .75 vs. 10/15 = .67, with SED = .148, the 95% confidence interval will be
(.75 − .67) ± (1.96)(.148) = .08 ± .29

and will extend from −.21 to +.37. Because the interval includes 0, the result is not stochastically significant at 2P ≤ .05. On the other hand, we might be reluctant to conclude that the increment of .08 is truly “insignificant” because it might be as large as .21 in favor of “Treatment B” or .37 in favor of “Treatment A” within the extent of the 95% confidence interval.
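The interval arithmetic can be sketched as follows (Python; a minimal illustration with the same 18/24 vs. 10/15 data, names illustrative):

```python
import math

# 95% confidence interval for the difference of 18/24 = .75 and 10/15 = .67
tA, nA = 18, 24
tB, nB = 10, 15
T, N = tA + tB, nA + nB

diff = tA / nA - tB / nB
sed = math.sqrt(T * (N - T) / (N * nA * nB))   # null-hypothesis SED
lo, hi = diff - 1.96 * sed, diff + 1.96 * sed

print(round(lo, 2), round(hi, 2))  # -0.21 0.37
# The interval includes 0, so the result is not stochastically significant
```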
The magnitude of α becomes particularly important when the stochastic result is “nonsignificant,” and the confidence interval is then explored for the possibility that its upper end is impressively high. With a small enough choice of α, e.g., α = .0001, Zα can become large enough to “drag” the interval for any two-proportion increment across whatever descriptive border is set for “high.” For example, if the observed increment is .04 and the SED is .03, the customary Zα will make the confidence interval be .04 ± (1.96)(.03). It will be “nonsignificant,” extending from −.019 to +.099.
On the other hand, if we set α = .001, the Zα of 3.29 will make the interval become .04 ± (3.29)(.03). Its upper end will now have the “impressive” value of .14.