13.8.2 Continuity Correction
You may remember that a continuity correction was recommended back in Section 8.6.1 when the Z procedure is applied to a proportion. Although not all authorities are agreed, this correction can also be applied when two proportions are compared. For the correction, 1/2 is subtracted from the numerator of the larger proportion, and 1/2 is added to the numerator of the smaller proportion. Thus, if pA = rA/nA and pB = rB/nB, and if pA > pB, the correction is
ZC = {[rA – (1/2)]/nA – [rB + (1/2)]/nB}/√(NPQ/nAnB)
The algebra can be simplified if you recognize that the numerator of this expression becomes pA − pB − [(1/2)(N/nAnB)]. If we let G = N/nAnB, the expression becomes
ZC = [pA – pB – (G/2)]/√(PQG)
In the foregoing example, G = 39/(15 × 24) = .108, and G/2 = .054. The corrected value would be
ZC = |.75 – .67 – .054|/√[(.718)(.282)(.108)] = .0293/√.02187 = .198
The continuity-corrected result is identical to the square root of chi-square when the latter is calculated with an analogous continuity correction in Chapter 14. The desirability of the correction will be discussed in that chapter.
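For readers who like to verify such arithmetic by machine, the corrected formula is easy to script. The sketch below is ours, not from the text; the function name `z_corrected` and its argument order are invented for illustration.

```python
from math import sqrt

def z_corrected(rA, nA, rB, nB):
    """Continuity-corrected Z for two proportions, assuming pA = rA/nA
    is the larger proportion: Zc = (pA - pB - G/2) / sqrt(P*Q*G),
    where G = N/(nA*nB) and P is the pooled proportion."""
    pA, pB = rA / nA, rB / nB
    N = nA + nB
    P = (rA + rB) / N      # pooled proportion under the null hypothesis
    Q = 1 - P
    G = N / (nA * nB)
    return (pA - pB - G / 2) / sqrt(P * Q * G)

# The example in the text: 18/24 vs. 10/15
print(round(z_corrected(18, 24, 10, 15), 3))
```

At full precision this yields about .197; the text's .198 reflects rounding of the intermediate values.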
13.8.3 Standard Error for Alternative Hypothesis
For the alternative hypothesis, when we assume that π A ≠ π B, the common value for π is not estimated as P = (nApA + nBpB)/N. In this situation, the standard error is determined as
√[(pAqA/nA) + (pBqB/nB)], not with √(NPQ/nAnB) as in Formula [13.16].
If you work out the algebra, the difference in the two squared standard errors turns out to be minor.
Thus, if we let

k = NPQ/nAnB – [(pAqA/nA) + (pBqB/nB)]

the value of k becomes

k = [(nA – nB)(pAqA – pBqB)/nAnB] + [(pA – pB)²/N]    [13.17]
For example, for the comparison of 18/24 vs. 10/15, the conventional standard error is √[(39)(.718)(.282)/(24)(15)] = .148. The standard error for the alternative hypothesis is √{[(18)(6)/(24)³] + [(10)(5)/(15)³]} = .150. According to Formula [13.17], k turns out to be –.00069, which is essentially (.148)² – (.150)².
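Formula [13.17] can be verified numerically; the sketch below, with variable names of our own choosing, recomputes k both directly and from the two squared standard errors for the 18/24 vs. 10/15 example.

```python
# Example from the text: 18/24 vs. 10/15
nA, rA, nB, rB = 24, 18, 15, 10
pA, qA = rA / nA, 1 - rA / nA
pB, qB = rB / nB, 1 - rB / nB
N = nA + nB
P = (rA + rB) / N
Q = 1 - P

se_null_sq = N * P * Q / (nA * nB)        # squared SE under the null hypothesis
se_alt_sq = pA * qA / nA + pB * qB / nB   # squared SE under the alternative

# Formula [13.17] for the difference k
k = (nA - nB) * (pA * qA - pB * qB) / (nA * nB) + (pA - pB) ** 2 / N

print(round(k, 5), round(se_null_sq - se_alt_sq, 5))   # both about -0.00069
```

The two printed values agree, confirming that the algebraic rearrangement in [13.17] reproduces the direct difference of the squared standard errors.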
13.9 Sample Size for a Contrast of Two Means
The last topic discussed in this chapter is the calculation of sample size for a contrast of two means. The calculational challenge occurs most commonly in clinical trials, where the contrast is usually between two proportions, rather than two means; and the clinical-trial calculations often involve the simultaneous consideration of both a null and an alternative hypothesis, as discussed later in Chapter 23. Nevertheless, particularly in laboratory experiments, an investigator may want to know the sample size required to show stochastically that the mean is “significantly” larger for Group A than for Group B.
© 2002 by Chapman & Hall/CRC
The process once again involves the type of anticipatory estimates that were discussed in Section 8.8.3, and the conversation between investigator and statistician often evokes the same kind of frustrating guesswork. The decisions require six choices: two estimates, a quantitative distinction, a stochastic boundary, a direction, and a sample partition. The estimates, distinctions, and boundaries are commonly shown with Greek letters to denote that they have been guessed or assigned, rather than actually observed.
One of the estimates is µ̂B, the mean expected in Group B. The other estimate is σ̂p, the common (or pooled) standard deviation of data in the two groups. For example, we might estimate that Group B will have a mean of 32.0, and that the common standard deviation is 12.3. The quantitative distinction is labeled δ, the anticipated magnitude for the impressive or “quantitatively significant” increment in means between Group A and Group B. After δ is chosen, the anticipated mean in Group A is estimated as µ̂A = µ̂B + δ. If we want the mean in Group A to be at least 25% larger than in Group B, we set δ at 8.0, so that the mean expected for Group A should exceed 40.0.
The next step is to choose a stochastic boundary for α, which can be set at the conventional .05. Because we expect Group A to exceed Group B, a direction has been stated, and the Zα corresponding to a one-tailed α = .05 would be Z.1 = 1.645. If we wanted to comply with the customary preference for two-tailed tests, however, the anticipated difference would be |µA − µB| = δ and the α = .05 would be bidirectional, so that Z.05 = 1.96. Finally, for sample partition, we can make the customary choice of equal-size samples, so that nA = nB = n, and N = 2n.
With all of these decisions completed, the rest is easy. The sample size can be calculated by putting all the Greek and other values into the appropriate formula for either a Z test or a confidence interval.
13.9.1 Z-test Formula
For the Z-test sample-size calculation, Formula [13.13] receives appropriate substitutions to become
Zα = [(µ̂A – µ̂B)/σ̂p]√(n²/2n)
Squaring both sides and rearranging terms produces

(n/2) = Zα²σ̂p²/(µ̂A – µ̂B)²    [13.18]

and because µ̂A – µ̂B = δ and σ̂p² = sp², the sample size for one group is

n = 2Zα²sp²/δ²    [13.19]
Substituting the foregoing data and using a two-tailed test, we get

n = (2)(1.96)²(12.3)²/(8.0)² = 18.2
The required sample size will be 19 members in each group. The total sample size will be 38.
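Formula [13.19] reduces to a one-line calculation; the helper below is a sketch (the name `n_per_group` is ours), rounding up because sample sizes must be whole numbers.

```python
from math import ceil

def n_per_group(delta, sp, z_alpha=1.96):
    """Sample size per group from Formula [13.19]:
    n = 2 * Z_alpha^2 * sp^2 / delta^2, rounded up."""
    return ceil(2 * z_alpha ** 2 * sp ** 2 / delta ** 2)

# The example in the text: delta = 8.0, pooled SD = 12.3, two-tailed alpha = .05
print(n_per_group(8.0, 12.3))   # 19 per group, i.e., a total of 38
```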
13.9.2 Confidence-Interval Formula
With the confidence-interval approach, we want the expected difference, do, to exceed 0 at the lower end, so that
do − Zα (SED) > 0
Because SED is sp√(N/nAnB), we can substitute appropriately and rearrange terms to get

δ > Zα sp√(2n/n²)
which becomes
n > 2Zα²sp²/δ²
yielding the same result as previously in Formula [13.19].
13.9.3 Adjustment for t Distribution
If we worry that the sample size of 38 is not quite large enough to justify using Z rather than t, the value of t36,.05 = 2.028 can be substituted for Z.05 = 1.96 in the foregoing formulas. The result will be
n > 2(2.028)²(12.3)²/(8.0)² = 19.4

With a sample size of 20 members in each group, the degrees of freedom will be 38, and t38,.05 will be 2.0244. Substituting (2.0244)² for (2.028)² in the foregoing calculation produces n > 19.38, and so we can keep the sample size as 20 in each group.
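The same recalculation can be scripted; in the sketch below the t critical values for 36 and 38 degrees of freedom are taken as given from the text rather than computed, and the function name is ours.

```python
def n_required(delta, sp, crit):
    # Formula [13.19] with an arbitrary critical value (Z or t) substituted
    return 2 * crit ** 2 * sp ** 2 / delta ** 2

first_pass = n_required(8.0, 12.3, 2.028)    # t for 36 d.f.; about 19.4
second_pass = n_required(8.0, 12.3, 2.0244)  # t for 38 d.f.; still under 20
print(round(first_pass, 1), round(second_pass, 1))
```

Both passes stay below 20, so 20 members per group suffice.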
13.9.4 One-Tailed Result
Because Zα² (or tα²) appears in the numerator of the calculation, the sample size is always larger for two-tailed than one-tailed decisions. For one-tailed calculations, we would use Z.1 = 1.645 or t38,.1 = 1.686. The sample size in Formula [13.19] would be reduced to

n > 2(1.686)²(12.3)²/(8.0)² = 13.4
and the total sample size would become 2 × 14 = 28.
13.9.5 Augmentation for “Insurance”
After all the precise calculations have been completed, with all their exact mathematical stipulations, the results are usually altered by an act of pure judgment. To provide “insurance” against such phenomena as loss of members, “unusable data,” or any other potential reduction in the effective size of the research group, a cautious investigator will almost always increase the number that emerged from the mathematical calculations. The augmented size usually exceeds the calculated amount by 10%, 20%, or even more, according to a background “judgment” based on the losses encountered in previous instances of the same type of research. The mathematical calculations may have had splendid quantification, but the ultimate decision depends on augmentations that come from the “hunch” of previous experience.
References
1. Boneau, 1960; 2. Feinstein, 1998a; 3. Fleiss, 1986; 4. Feinstein, 1985; 5. Surgeon General’s Advisory Committee on Smoking and Health, 1964; 6. Hammond, 1958; 7. Mohr, 1982.
Exercises
13.1. In Exercise 11.1, you were informed that the t test gave a “nonsignificant” result for the two cited groups of data. Please verify this statement. Why does the t test “fail” to confirm “significance” for these obviously “significant” results?
13.2. In Exercise 12.3, you did a Pitman-Welch test on the mean difference of two groups. For Group A, the data were 2, 8, 11, and 13; for Group B, the data were 1, 3, and 6. Perform a t test on this difference and compare the stochastic conclusions with what you found using the Pitman-Welch test.
13.3. In a randomized clinical trial, one group of babies was fed Formula A from birth to age six months; and the other group was fed Formula B. The two groups can be assumed to have had identical weights at birth. At the end of six months, the weight gains in pounds for members of the two groups were as follows:
Group A: 5, 7, 8, 9, 6, 7, 10, 8, 6
Group B: 9, 10, 8, 6, 7, 9
13.3.1. Considering each group individually, calculate a 95% confidence interval and a 99% confidence interval for the mean. (Show the cogent intermediate steps in your arrangement of calculations.)
13.3.2. Considering each group individually, what is the probability that this group arose from a parent population in which the mean weight gain is 6.7 pounds?
13.3.3. Use two different approaches (t test and confidence interval) to show whether the difference in the means of the two groups is “statistically significant” at the level of 2P < .025.
13.3.4. Forgetting the differences in the formula feeding, assume that the members of Groups A and B constitute the entire population of babies in which you are interested. What procedure would you use (and show the calculations) to determine whether there is something peculiar about the baby who gained only 5 pounds?
13.4. Here is a chance to let current activities be enlightened by a repetition of past history. In 1908, Gosset reported data showing the comparative effects of two drugs in producing sleep in 10 patients. For each patient, he noted the amount of sleep obtained without medication and the additional hours of sleep gained after treatment (1) with hyoscyamine and (2) with hyoscine. The table below reproduces the data from an original publication, except for a replacement by modern symbols for mean and standard deviation.
Additional hours’ sleep gained by the use of two drugs (“Student,” 1908)
Patient    Hyoscyamine    Hyoscine    Difference
1          +0.7           +1.9        +1.2
2          –1.6           +0.8        +2.4
3          –0.2           +1.1        +1.3
4          –1.2           +0.1        +1.3
5          –0.1           –0.1          0
6          +3.4           +4.4        +1.0
7          +3.7           +5.5        +1.8
8           0             +1.6        +0.8
9           0             +4.6        +4.6
10         +2.0           +3.4        +1.4
X̄          +0.75          +2.33       +1.58
s           1.79           2.00        1.23
The results in this table can be stochastically evaluated with a two-group or one-group test. Do both procedures and compare the results. What is the reason for any differences you observed? Can you demonstrate the source of the differences?
13.5. At a medical meeting you are attending, a renowned professor presents “statistically significant” results showing X̄A = 16, SEA = 4, and nA = 81 for Group A; and X̄B = 25, SEB = 6, and nB = 64 for Group B. Everyone seems to be impressed by the large difference. During the discussion period, however, a member of the audience, who has been sitting placidly taking no notes and doing no apparent calculations, arises to dispute the professor’s contention. The commentator says that the difference is not “statistically significant” and says, in addition, that the data should not have been reported with mean values as summaries of the results.
Everyone is now shocked or outraged by the temerity, skill, or chutzpah of the commentator. The challenged professor responds by saying that he does not understand the mathematics and calls for help from his statistician, who happens to be present in the audience, accompanied by a large volume
of computer printout. Intrigued by the dispute, the chairperson declares a brief intermission, while the statistician checks the results. After the intermission, the professor returns to the podium and confesses, somewhat abashedly, that he mistakenly looked at the wrong section of printout in finding the “significant” P value. Furthermore, he says that the statistician has also expressed reservations about using means to summarize the results.
13.5.1. What do you think the commentator did to decide that the results were not “statistically significant”?
13.5.2. What is the correct P value for this comparison?
13.5.3. On what basis do you think the commentator rejected the means as summary indexes?
13.5.4. What would you recommend as a strategy for an improved alternative summary and analysis?
13.6. In a report in The Lancet, the authors did a crossover study in 10 patients with refractory nocturnal angina.7 For the control night the bed was placed with the head in a semi-erect position; and for the test night it was put in a feet-down position. The authors wrote that “Results were compared by means of student’s t test (non-paired, two-tailed). All numerical results were expressed as mean ± standard error of the mean.”
The table below is a copy of what was published as Table II in the report.7
Effects of Feet-Down Position on Number of Pain Episodes and Number of Isosorbide Dinitrate Tablets Taken
                   No. Pain Episodes         No. Tablets
Patient No.        Control     Test          Control     Test
1                  6           0             6           0
2                  2           0             4           0
3                  2           0             2           0
4                  7           0             10          0
5                  3           1             4           1
6                  5           0             5           0
7                  3           0             5           0
8                  2           0             2           0
9                  3           1             6           2
10                 4           0             4           0
Mean ± SEM         3.7 ± 1.8   0.2 ± 0.4*    4.8 ± 2.3   0.3 ± 0.7*

*p < 0.001.
13.6.1. A value of “p < 0.001” is shown for the comparison of “No. pain episodes” in the Control and Test results. Verify this claim, and show how you verified it.
13.6.2. Can you think of a simple procedure, which can be done mentally without any calculation, that would promptly give you essentially the same result for the P value in the table? (This is one of those questions where you immediately see how to get the answer—or you don’t. If you don’t, go on to the next question and don’t spend a lot of time deliberating.)
13.7. If you have time, and want to check your understanding of some of the basic principles involved in this and preceding chapters, try answering the following “top ten” questions:
13.7.1. For a two-group t (or Z) test, why do we subtract the two means? Why not form a ratio of some other index of contrast?
13.7.2. In the two-group tests, the incremental deviation is calculated as Xi − X̄ for the members of each group. Is this the best way to indicate discrepancies between individual values and the central indexes?
13.7.3. Why are standard deviations calculated by first squaring and summing the individual deviations? Why not use them directly?
13.7.4. Why is the sum of squared deviations divided by n − 1 rather than n for inferential calculations? What is the justification for getting an “average” value by using one fewer than the number of available members in a group?
13.7.5. What is a standard error of the mean? What errors were created during the measurement process, and why don’t the investigators find and fix the errors before doing all the statistical analyses?
13.7.6. In the formula for a t (or Z) test, the increment in the numerator is divided by a standard error in the denominator. Why are the two terms divided? Why aren’t they added or multiplied?
13.7.7. What is the meaning of degrees of freedom? What type of incarceration and subsequent liberation occurred before the data received their “freedom”? And why does it have different “degrees”?
13.7.8. What is a P value? It has immense importance in the uriniferous aspects of human physiology, but what is it statistically? Why does it have so dominant a role in deciding whether research results are “significant” or “nonsignificant”?
13.7.9. What are you confident about in a “confidence interval”? When might your confidence be statistically unjustified? When would the confidence be unjustified for other reasons?
13.7.10. What is the meaning of an α level for “false positive conclusions”? Are they an inevitable consequence of the errors in “standard errors”? Why is α regularly set at .05?
14
Chi-Square Test and Evaluation
of Two Proportions
CONTENTS
14.1 Basic Principles of Chi-Square Reasoning
14.1.1 Two Sources of Models
14.1.2 Illustration of Test of Independence
14.1.3 The χ2 Distribution
14.1.4 Degrees of Freedom and Interpretation of χ2
14.1.5 Controversy about d.f. in 2-Way Tables
14.1.6 “Large” Degrees of Freedom
14.1.7 Yates Continuity Correction
14.2 Formulas for Calculation
14.2.1 2 × 2 Tables
14.2.2 Yates Correction in 2 × 2 Table
14.2.3 Formula for “Rate” Calculations
14.2.4 Expected Values
14.2.5 Similar Values of O – E in Cells of 2 × 2 Table
14.2.6 Equivalence of Z2 and X2
14.2.7 Mental Approximation of Z
14.3 Problems and Precautions in Use of X2
14.3.1 Size of Groups
14.3.2 Size of Cells
14.3.3 Fractional Expected Values
14.3.4 Is X2 a Parametric Test?
14.3.5 Controversy about Marginal Totals
14.3.6 Reasons for “Success” of Chi-Square
14.4 Confidence Intervals for Contrast of Two Proportions
14.4.1 Standard Error of Difference
14.4.2 Similarity of SEDs
14.4.3 Choice of Zα Values
14.4.4 Criteria for “Bulk” of Data
14.5 Calculating Sample Size to Contrast Two Proportions
14.5.1 “Single” vs. “Double” Stochastic Significance
14.5.2 Assignments for δ and α
14.5.3 Estimates of πA, πB, and π
14.5.4 Decision about Sample Allocation
14.6 Problems in Choosing π̂
14.6.1 Directional Differences in Sample Size
14.6.2 Compromise Choice of π̂
14.6.3 “Compromise” Calculation for Sample Size
14.6.4 Implications of Noncommittal (Two-Tailed) Direction
14.6.5 Augmentation of Calculated Sample Size
14.6.6 “Fighting City Hall”
14.7 Versatility of X2 in Other Applications
14.7.1 “Goodness of Fit”
14.7.2 McNemar Test for “Correlated Proportions”
14.7.3 Additional Applications
14.8 Problems in Computer Printouts for 2 × 2 Tables
14.8.1 Sources of Confusion
14.8.2 Additional Stochastic Indexes
14.8.3 Additional Descriptive Indexes
References
Exercises
To do a stochastic contrast (for “statistical significance”) of two proportions, the following three methods have already been discussed in Chapters 11–13:
1. Fragility test: The unit fragility test (Section 11.2.3.1) is simple, is easy to understand, and uses a commonsense approach that might be applied if investigators were directly appraising stability, without being affected by previous mathematical instruction. The fragility test, however, is new, generally unfamiliar, and currently unaccompanied either by a long track record or by recommendations from what R. A. Fisher once called “heavyweight authorities.”
2. Fisher Exact Test: Generally regarded as the “gold standard,” the Fisher exact test (Chapter 12) should always be used when the constituent numbers are small. It can also be applied if the numbers are large, but usually needs a computer for the calculations. The only disadvantages of the Fisher test are that most current computer programs have not been automated to produce confidence intervals routinely or to do advance calculations of sample size.
3. Z test: The Z test (Section 13.8) for the increment in two proportions is easy to do with a hand calculator and requires only that the group sizes be adequately large. Unlike the first two methods, the Z procedure can readily be applied to get conventional confidence intervals or to calculate sample size.
Despite the availability of these three methods, however, two proportions are usually stochastically compared with a fourth procedure to which the rest of this chapter is devoted. Called chi-square (rhymes with “eye square”), the procedure is probably the most commonly applied statistical tactic in group-based clinical or epidemiologic research. The chi-square test is popular because it is easy to calculate from the frequency counts of categorical data; and it is versatile enough for many additional applications that will be described later. For two proportions, the chi-square test yields the same result as a Z test; and, when the constituent numbers are large enough, the chi-square (or Z) test produces a stochastic conclusion almost identical to what emerges from a Fisher exact test.
14.1 Basic Principles of Chi-Square Reasoning
The chi-square test relies on a hypothetical “model” for the set of “expected” frequencies in categories of the data. The expected and observed frequencies in each category are then arranged in a ratio as
(observed – expected)²/expected
and the sum of these ratios forms the test statistic,
X² = Σ[(observed – expected)²/expected]    [14.1]
The calculated results for X2 are interpreted with a theoretical sampling distribution from which we find the associated P value and then decide to concede or reject the hypothesis that formed the model being tested.
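Formula [14.1] translates directly into code. The sketch below applies it in goodness-of-fit style to the 9:3:3:1 genetic ratio discussed later in this chapter; the observed counts are invented for illustration, and the function name is ours.

```python
def chi_square_stat(observed, expected):
    """Test statistic X^2 = sum((O - E)^2 / E), Formula [14.1]."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for 160 offspring against a 9:3:3:1 expectation
observed = [95, 30, 28, 7]
expected = [160 * r / 16 for r in (9, 3, 3, 1)]   # 90, 30, 30, 10
print(round(chi_square_stat(observed, expected), 2))   # 1.31
```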
14.1.1 Two Sources of Models
The model that produces the expected values can come from one of two different assumptions for the null hypothesis. One assumption refers to the apportionment of categories in the simple univariate distribution of a one-way table. The proportions may be assumed either to be equal, such as 25% for each of four categories, or to occur in a specific ratio, such as the 9:3:3:1 distribution of AB, Ab, aB, and ab in genetic mixtures. The X2 test for the observed univariate distribution will indicate how well the expectations fit the observed categorical frequencies. Rejection of the null hypothesis leads to the conclusion that the anticipated model fits the data poorly. This type of procedure, illustrated later in Section 14.7.1, is called a goodness-of-fit chi-square test and is usually applied to a one-way table of categories, not to the proportions compared in a 2 × 2 table.
The latter comparison is usually done with a different assumption, which is the null hypothesis that the two variables under examination are independent, i.e., that one of the variables, such as type of treatment, has no effect on the other variable, such as successful outcome. In the chi-square test of independence, which is commonly applied to two-way tables, the expected values emerge from the observed data, not from a previous belief about apportionment. Subsequent rejection of the null hypothesis leads to the conclusion that the two variables are not independent, i.e., that successful outcome is affected by type of treatment.
14.1.2 Illustration of Test of Independence
In its most frequent and world-renowned application, X2 (pronounced ex-square) is used to compare two proportions, such as the occurrence of anemia in 6% (6/95) of a group of men and in 13% (14/105) of a group of women. For the stochastic comparison, the proportions are usually “unpacked” to form a 2 × 2 (or fourfold) contingency table, having a structure such as Table 14.1.
TABLE 14.1
2 × 2 Table Showing Contingency Counts for Presence of Anemia in a Group of 200 Men and Women

            Anemic    Not Anemic    Total
Men          6           89           95
Women       14           91          105
Total       20          180          200
Under the null hypothesis of independence, we assume that the two variables, sex and anemia, in Table 14.1 are independent, i.e., unrelated. We therefore expect men and women to be similar in their proportions of anemia and non-anemia. To determine these proportions, we use the total results in the data as the best source of the parametric estimates. Thus, from the marginal totals in the table, the “true” proportional parameters would be estimated as 20/200 = .10 for anemia and 180/200 = .90 for non-anemia.
With these parametric estimates, the expected frequency counts in the cells of the table would be .1 × 95 = 9.5 for anemia and .9 × 95 = 85.5 for non-anemia in the 95 men, and .1 × 105 = 10.5 and .9 × 105 = 94.5 correspondingly for the 105 women. For calculating the observed/expected ratios in Formula [14.1], the upper left cell in Table 14.1 would produce (6 – 9.5)²/9.5 = (–3.5)²/9.5. After the remaining appropriate ratios are formed and added, the value for X2 would be

X² = (–3.5)²/9.5 + (3.5)²/85.5 + (3.5)²/10.5 + (–3.5)²/94.5 = 12.25/9.5 + 12.25/85.5 + 12.25/10.5 + 12.25/94.5 = 2.73
When checked as shown shortly, the associated 2P value for this X2 exceeds .05. Thus, there is a reasonable stochastic chance (i.e., greater than 1 in 20) that men and women — at least in the group under study here — do indeed have similar occurrence rates of anemia.
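The whole procedure for a 2 × 2 table can be sketched in code as follows; the function name is ours, and no continuity correction is applied (that refinement appears in Section 14.2.2).

```python
def chi_square_2x2(a, b, c, d):
    """X^2 for a fourfold table [[a, b], [c, d]], with expected values
    computed from the marginal totals (no continuity correction)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    x2 = 0.0
    for obs, i, j in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        exp = rows[i] * cols[j] / n       # expected count for this cell
        x2 += (obs - exp) ** 2 / exp
    return x2

# Table 14.1: anemia in 95 men (6 anemic) and 105 women (14 anemic)
print(round(chi_square_2x2(6, 89, 14, 91), 2))   # 2.73
```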
14.1.3 The χ2 Distribution
The critical ratio that forms the calculated value of X2 is an arbitrary stochastic index or “test statistic,” which has a mathematical sampling distribution called chi-square, written as χ2. (This same phenomenon occurred earlier for the “critical ratio” of test statistics that were referred to a t or Gaussian Z distribution. In the previous activities, however, the same names, t or Z, were used for both the test statistics and the corresponding distributions of “continuous” dimensional data. For the “discontinuities” of categorical data, a distinction can be made between X2, the calculated test statistic, and χ2, the theoretical continuous distribution.)
The “discovery” of the chi-square distribution is usually ascribed to Karl Pearson, but it is now believed to have been introduced in Germany in 1863 by the physicist, E. Abbe,1 and publicized in 1876 by another physicist, F. R. Helmert.2
The mathematical idea behind the χ2 distribution (if you are curious to know) involves the sum of squares for a series of individual Gaussian variables, Xj, each with standard deviation σj. When each variable is cited as a standardized deviate, Zj = (Xj – X̄j)/σj, the sum of squares for ν such independent deviates is
χν² = Z1² + Z2² + … + Zν²    [14.2]
The subscript “ν” denotes that ν independent standard “normal variates” contributed to the sum. The sampling distribution of χν2 would produce a family of curves, analogous to the family we found for the t distribution, with ν being the degrees of freedom for each curve.
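Formula [14.2] can be checked empirically; the simulation below is a sketch confirming that the mean of a χ2 variate is ν, its degrees of freedom.

```python
import random

random.seed(0)                      # reproducible sketch
nu, trials = 3, 20000
total = 0.0
for _ in range(trials):
    # sum of nu squared standard normal deviates: one chi-square draw
    total += sum(random.gauss(0, 1) ** 2 for _ in range(nu))
print(round(total / trials, 1))     # close to 3.0, the value of nu
```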
Figure 14.1 shows the probability density, analogous to that of a Gaussian curve, for several members of the family of χ 2 curves, each marked with n degrees of freedom. Unlike Z and t curves, the χ 2 curves are not symmetrical. Like the Z and t curves, however, the χ 2 curves are interpreted for the external probability area that remains under the curve beyond a particular value of χ 2.
[Figure 14.1: χ2 probability-density curves for n = 1, 2, 3, and 4 degrees of freedom, plotted for χ2 values from 0 to 5.]
FIGURE 14.1
Probability density of chi-square distribution for different degrees of freedom marked n. (For further details, see text.)
14.1.4 Degrees of Freedom and Interpretation of χ2
The χ2 distribution is well suited for interpreting the test statistic, X2, determined as Σ[(observed – expected)²/expected]. The proof of the correspondence between X2 and χ2 lies buried somewhere in the annals of mathematical statistics and will not be exhumed here. (Shouts of “hurrah” are permissible, but not encouraged.)
The sampling distribution of χ2 in its family of curves will have different external P values according to the degrees of freedom in the particular calculated value of X2. Table 14.2 shows the correspondence of values for χ2, ν, and P.
To apply Table 14.2 for the X2 of 2.73 in the data of Table 14.1, we first determine the degrees of freedom in the data. One approach to this decision is to recognize that the increment in two proportions can be converted to a “Z-deviate” using Formula [13.16]. When entered in Formula [14.2], the squared value of Z would be a single member of a χ 2 family and would, therefore, have one degree of freedom.
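That equivalence is easy to confirm numerically: squaring the Z of Formula [13.16] for the anemia data reproduces the X2 of 2.73 found in Section 14.1.2. The sketch below uses variable names of our own choosing.

```python
from math import sqrt

# Anemia example: 6/95 men vs. 14/105 women, pooled P = 20/200
nA, rA, nB, rB = 95, 6, 105, 14
N = nA + nB
pA, pB = rA / nA, rB / nB
P = (rA + rB) / N
Q = 1 - P

z = (pA - pB) / sqrt(N * P * Q / (nA * nB))   # Formula [13.16]
print(round(z ** 2, 2))                       # 2.73, matching X^2
```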