is that the customary parametric approach for the confidence interval of two contrasted proportions may not always yield one of the realistic values shown in the foregoing tabulation of Fisherian rearrangements. [The disparities will be illustrated in Chapter 14.]
Although the Fisher-table method might offer an elegant solution to the problem of getting confidence intervals for comparisons of two proportions, the method has not been automated and does not appear in the computer programs offered by such ‘‘package” systems as BMDP or SAS. If you want to use the method, you will have to work out a suitable approach. The simplest tactic is to start at the ‘‘terminal’’ Fisher tables at each end and then work toward the center, as in Table 12.3.
12.9 Pitman–Welch Permutation Test
A permutation rearrangement can also be applied to dimensional data with results expressed as a difference in two means rather than two proportions. The strategy of stochastically contrasting two means by rearranging the permuted probabilities was described independently by two statisticians3,4 whose names are joined here for the Pitman–Welch commemorative eponym.
Because the data are dimensional rather than binary, the calculations can become much more complex than in a contrast of two proportions, where each individual item of data was either a 0 or a 1. The binary data allow probabilities to be determined with the simplifying calculations that use permutations and combinations for the 0/1 values. Because each item of dimensional data can have different values, the same type of simplification cannot be used. Nevertheless, with the ‘‘short cuts” noted in the following example, the computations can be substantially eased.
12.9.1 Example of a Contrast of Two Means
In four patients who received Treatment A, the measured results were 8, 11, 13, and 21 units, so that the mean was 53/4 = 13.25 units. For Treatment B, the results were 19, 25, 31, and 37 units, with a mean of 112/4 = 28.00 units. Is this difference of 14.75 units stochastically significant?
To answer this question, we could contemplate all the ways of arranging the eight values — 8, 11, 13, 19, 21, 25, 31, and 37 — into two groups, each containing four members. We could calculate the mean for each group in the arrangement, find the difference in means for each pair of groups, and determine how often the difference equals or exceeds 14.75. For the eight values divided into two groups of four each, there would be 8!/(4!)(4!) = 70 arrangements to consider.
Table 12.4 shows 35 of these arrangements. Because the two groups each contain the same number of four members, the second set of 35 arrangements would simply be a mirror image of what is shown in Table 12.4, with all of the Group A results appearing as Group B and vice versa. This type of symmetry occurs in permutation tests whenever the two compared groups have the same number of members. If the numbers are unequal, the entire set of possible arrangements must be considered.
12.9.2 Summary of Distribution
The 35 arrangements in Table 12.4 can be summarized as follows:
Observed Difference in Means     Frequency
15.75                                1
14.75                                1
10 to <14                            3
7 to <10                             5
4 to <7                              6
0 to <4                              8
−4 to <0                             7
−7 to <−4                            2
−10 to <−7                           2
TOTAL                               35
© 2002 by Chapman & Hall/CRC
TABLE 12.4
Permutation Arrangements in Comparing Means of Two Groups

Data for Group A    Mean     Data for Group B    Mean     Difference (B − A)
8,11,13,19          12.75    21,25,31,37         28.50         15.75
8,11,13,21          13.25    19,25,31,37         28.00         14.75*
8,11,13,25          14.25    19,21,31,37         27.00         12.75
8,11,13,31          15.75    19,21,25,37         25.50          9.75
8,11,13,37          17.25    19,21,25,31         24.00          6.75
8,11,19,21          14.75    13,25,31,37         26.50         11.75
8,11,19,25          15.75    13,21,31,37         25.50          9.75
8,11,19,31          17.25    13,21,25,37         24.00          6.75
8,11,19,37          18.75    13,21,25,31         22.50          3.75
8,11,21,25          16.25    13,19,31,37         25.00          8.75
8,11,21,31          17.75    13,19,25,37         23.50          5.75
8,11,21,37          19.25    13,19,25,31         22.00          2.75
8,11,25,31          18.75    13,19,21,37         22.50          3.75
8,11,25,37          20.25    13,19,21,31         21.00          0.75
8,11,31,37          21.75    13,19,21,25         19.50         −2.25
8,13,19,21          15.25    11,25,31,37         26.00         10.75
8,13,19,25          16.25    11,21,31,37         25.00          8.75
8,13,19,31          17.75    11,21,25,37         23.50          5.75
8,13,19,37          19.25    11,21,25,31         22.00          2.75
8,13,21,25          16.75    11,19,31,37         24.50          7.75
8,13,21,31          18.25    11,19,25,37         23.00          4.75
8,13,21,37          19.75    11,19,25,31         21.50          1.75
8,13,25,31          19.25    11,19,21,37         22.00          2.75
8,13,25,37          20.75    11,19,21,31         20.50         −0.25
8,13,31,37          22.25    11,19,21,25         19.00         −3.25
8,19,21,25          18.25    11,13,31,37         23.00          4.75
8,19,21,31          19.75    11,13,25,37         21.50          1.75
8,19,21,37          21.25    11,13,25,31         20.00         −1.25
8,19,25,31          20.75    11,13,21,37         20.50         −0.25
8,19,25,37          22.25    11,13,21,31         19.00         −3.25
8,19,31,37          23.75    11,13,21,25         17.50         −6.25
8,21,25,31          21.25    11,13,19,37         20.00         −1.25
8,21,25,37          22.75    11,13,19,31         18.50         −4.25
8,21,31,37          24.25    11,13,19,25         17.00         −7.25
8,25,31,37          25.25    11,13,19,21         16.00         −9.25

* = observed result.
We now have the answer to the basic question. An absolute difference of 14.75 units or more occurs twice in these 35 arrangements and would appear four times in the full total of 70 arrangements. The one-tailed P value is 2/70 = .03, and the two-tailed P value is 4/70 = .06.
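The tabulation can be verified by brute-force enumeration. A minimal Python sketch (the variable names are ours, not the text's) lists all 70 arrangements and counts the extreme ones:

```python
from itertools import combinations

# Observed data from Section 12.9.1
group_a = [8, 11, 13, 21]
group_b = [19, 25, 31, 37]
pool = group_a + group_b
observed = sum(group_b) / 4 - sum(group_a) / 4       # 14.75

# All 8!/(4!4!) = 70 ways to split the eight values into two groups of four
diffs = []
for a in combinations(pool, 4):
    rest = [x for x in pool if x not in a]           # works because all values differ
    diffs.append(sum(rest) / 4 - sum(a) / 4)         # mean B minus mean A

n_total = len(diffs)                                 # 70
one_tailed = sum(d >= observed for d in diffs)       # 2 arrangements
two_tailed = sum(abs(d) >= observed for d in diffs)  # 4 arrangements
print(one_tailed / n_total, two_tailed / n_total)    # 2/70 ≈ .03 and 4/70 ≈ .06
```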
12.9.3 Simplified Arrangement
To shorten (or avoid) all these calculations, we can note that the observed results for the group receiving Treatment A contain the three members with the lowest values (8, 11, 13) of the total array of values: 8, 11, 13, 19, 21, 25, 31, and 37. The fourth observed value in Treatment A is 21, yielding the mean difference of 14.75 in the two groups. If we look for a difference in mean values that is more extreme, i.e., even larger, than 14.75, the only way to get it is if the fourth member of Group A were 19 rather than 21. For the group of values 8, 11, 13, and 19, the mean would be 12.75. The mean for 21, 25, 31, and 37 would be 28.50; and the difference in means would be 15.75.
Because there are only two ways of getting a mean difference that is at least as large as 14.75, the one-tailed P value will be 2/70 or .03. Because there are 4 members in each treatment group, 35 (or half) of the possible 70 arrangements will be symmetrical. Thus, there will be one arrangement in which
the values 19, 25, 31, and 37 appear in Group A, while 8, 11, 13, and 21 appear in B; and another arrangement in which 21, 25, 31, and 37 appear in A, with 8, 11, 13, and 19 in B. Consequently, the probabilities on the ‘‘other side’’ will be distributed in a manner similar to those of the ‘‘first side.” Thus, the two-tailed P value will be .03 + .03 = .06.
If α is set at .05, the one-tailed but not the two-tailed probability will be ‘‘statistically significant.”
12.9.4 “Confidence Intervals”
Like the Fisher test, the Pitman–Welch procedure offers P values for the null hypothesis, but does not produce confidence intervals around the observed distinction. Nevertheless, from the probability values associated with each arrangement of the data, a counterpart of confidence intervals can be formed to show extreme limits on either side. For example, the results in Table 12.4 show that 2 of the 35 one-sided arrangements have incremental means that are >13. Among the 35 arrangements on the other side, 2 would have incremental means that are <−13. Consequently, a zone from −13 to +13 would cover 66/70 = .94 of the data—an ipr94 for the spread of possible increments in means. Because these intervals are not arranged, however, around the observed difference (which is 14.75 in this instance), the Pitman–Welch procedure is seldom used for the stochastic goal of forming a confidence interval around the observed distinction in means. The conventional procedure for the latter goal will be discussed in Chapter 13.
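The ipr94 zone can be checked the same way; this sketch simply counts how many of the 70 incremental means lie between −13 and +13:

```python
from itertools import combinations

pool = [8, 11, 13, 19, 21, 25, 31, 37]
diffs = []
for a in combinations(pool, 4):
    rest = [x for x in pool if x not in a]     # works because all values differ
    diffs.append(sum(rest) / 4 - sum(a) / 4)

inside = sum(-13 <= d <= 13 for d in diffs)    # 66 of the 70 increments
print(inside / len(diffs))                     # 66/70 ≈ .94, the ipr94 zone
```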
12.9.5 Additional Application
The Pitman–Welch procedure is an excellent way to manage the problem previously stated in Exercise 11.1, where the difference seemed obviously ‘‘significant,’’ but where high variability in the data prevented the t-test from giving stochastic confirmation. In Exercise 11.1, a group of 12 items was divided into two groups of 6 each, so that the number of possible arrangements is 12!/[(6!)(6!)] = 924. Because all of the ‘‘low” values were in Group A and all of the ‘‘high” ones were in Group B, the observed data are at the extreme ends of the two-group distribution. Any other arrangement will lead to a smaller increment. Because there is a ‘‘mirror” extreme table at the other side, the 2-tailed P value is 2/924 = .002 for the observed result.
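Because the actual data of Exercise 11.1 are not reproduced in this section, the following sketch uses invented values with the same extreme structure (six "low" values in Group A, six "high" ones in Group B) to show why the two-tailed P becomes 2/924:

```python
from itertools import combinations
from math import comb

# Hypothetical data in the spirit of Exercise 11.1 (not the actual values):
# every "low" value is in Group A, every "high" value in Group B
group_a = [1, 2, 3, 5, 8, 9]
group_b = [40, 55, 61, 70, 82, 95]
pool = group_a + group_b
observed = sum(group_b) / 6 - sum(group_a) / 6

# Count arrangements whose absolute mean difference matches or exceeds
# the observed one; only the observed split and its mirror can qualify
count = 0
for a in combinations(pool, 6):
    rest = [x for x in pool if x not in a]
    if abs(sum(rest) / 6 - sum(a) / 6) >= observed:
        count += 1

print(comb(12, 6), count, count / comb(12, 6))   # 924 arrangements, 2 extremes
```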
References

1. Fisher, 1934.
2. Irwin, 1935.
3. Pitman, 1937.
4. Welch, 1937.
Exercises
The assignment is relatively light to allow you to digest all the reading material.
12.1. In Section 12.6.1, the two-tailed P value was determined from only four possible tables of the observed data. How many more such tables are there? Calculate the P value for each such table, and show that the sum of the P values for all the tables is unity.
12.2. In Section 12.1, we compared the proportions 9/12 (75%) vs. 5/15 (33%). Perform a Fisher Exact Probability Test for both tails of this stochastic contrast. (Be sure to show your constituent results.)
12.3. In a randomized clinical trial, an investigator finds the following dimensional results: for Treatment X: 2, 8, 11, and 13 units; and for Treatment Y: 1, 3, and 6 units. Perform a Pitman–Welch probability test on the difference in means, in both possible directions.
13
Parametric Sampling: Z and t Tests
CONTENTS
13.1 Differences in Permutation and Parametric Principles
13.2 Parametric Comparison of Two Means
13.2.1 Basic Principles
13.2.2 Variance in Increments of Two Means
13.2.3 Estimation of Parametric Variance
13.2.4 Construction of Standard Error
13.2.5 Similarity of the Two Estimates
13.2.6 Estimating the Parametric Mean
13.3 Z or t Tests for Means in Two Groups
13.3.1 Example of Calculations
13.3.2 Crude Mental Approximation
13.3.3 Summary of Procedure
13.3.4 Electronic Computation
13.3.5 Robustness of Z/t Tests
13.3.6 Similarity of F Test
13.4 Components of “Stochastic Significance”
13.4.1 Role of Standardized Increment
13.4.2 Role of Group Sizes
13.4.3 Effects of Total Size
13.4.4 Simplified Formulas for Calculation or Interpretation
13.5 Dissidence and Concordance in Significance
13.6 Controversy about P Values vs. Confidence Intervals
13.6.1 Fundamental Problems in Both Methods
13.6.2 Coefficient of Stability
13.7 Z or t Tests for Paired Groups
13.7.1 Effect of Correlated Values
13.7.2 Paired vs. Two-Group Arrangements
13.7.3 Arbitrary Analytical Pairings
13.8 Z Test for Two Proportions
13.8.1 Similarity to Chi-Square Test
13.8.2 Continuity Correction
13.8.3 Standard Error for Alternative Hypothesis
13.9 Sample Size for a Contrast of Two Means
13.9.1 Z-test Formula
13.9.2 Confidence-Interval Formula
13.9.3 Adjustment for t Distribution
13.9.4 One-Tailed Result
13.9.5 Augmentation for “Insurance”
References
Exercises
The procedures described in Chapter 12 are simple, direct, relatively easy to understand, and empirical. They use the evidence that was actually observed, and they require no excursions into a hypothetical world of parent populations and parameters. The tests can be used for stochastic contrasts of any two binary proportions, two means, or even, if desired, two medians. (In the last instance, the permuted samples would show the distribution of differences in medians, rather than means.)
Permutation tests, however, are not the customary methods used today for evaluating stochastic hypotheses. Instead, the conventional, traditional approaches rely on parametric procedures that are well known and long established. You may have never heard of the Fisher and Pitman-Welch tests before encountering them in Chapter 12, but your previous adventures in medical literature have surely brought a meeting somewhere with the two-group (or “two-sample”) Z and t tests.
They are based on the same principles described in Chapter 7 when the Z and t procedures were used in parametric inference for a single group. In this chapter, the principles for one group are expanded to their most common application: a contrast of two groups.
13.1 Differences in Permutation and Parametric Principles
For the Fisher and Pitman-Welch permutation procedures described in Chapter 12, the central indexes of the two observed groups were converted to a single group of increments. The distribution of the rearranged group of increments showed all the potential differences that could occur, for the Fisher test, in the two proportions, and for the Pitman-Welch test, in the two means. Decisions about P values or confidence intervals were then made from what was found in that distribution.
This same type of activity occurs when two groups are contrasted with parametric methods. A single group is formed from the observed results in the two groups; a set of samples is constructed; and decisions are made from the distribution of the samples. The permutation and parametric procedures differ mainly in that the parametrically constructed samples are theoretical, rather than empiric, and that the distribution is examined for a test statistic, rather than for increments of the central indexes. The main advantage of the parametric procedure is that the test statistic theoretically has a specific mathematical distribution, which presumably will always be the same, regardless of what actually occurred in the two observed groups. The mathematical distribution has previously known probabilities for each value of the parametric test statistic, which can thus be promptly used for decisions about P values and confidence intervals.
This advantage is what made the parametric procedures so popular in the days before ubiquitous digital computation. With permutation techniques, the single group of re-arranged increments forms a unique structure that must be calculated ad hoc for each pair of contrasted groups. Although easy to understand, the calculations were usually difficult to do. With parametric techniques, the single theoretical distribution of the test statistic can be applied for a contrast of almost any two groups; and the test statistic itself has a formula that makes calculation relatively easy. The hard part, however, is to understand the mathematical reasoning that underlies the theoretical construction.
13.2 Parametric Comparison of Two Means
In common forms of statistical inference, the summary index for contrasting two groups is the direct increment of the two central indexes. For two observed means, X̄A and X̄B, the increment is X̄A − X̄B; and for two proportions, pA and pB, the increment is pA − pB. For parametric procedures, these increments are converted to the standardized values of the test statistics Z or t. In the discussion that follows, the conversion process and reasoning will first be described for contrasting two means. The contrast of two proportions will be discussed later in Section 13.8.
As you might expect from Chapter 7, the Z and t procedures for comparing two means are quite similar. With both procedures, we form the same “critical ratio,” which is the test statistic called Z or t. With both procedures, the critical ratio consists of the observed increment in means divided by its standard error. With both procedures, the critical ratio can be interpreted with a P value or used
to construct a confidence interval. With both procedures, the critical ratio is interpreted in reference to a parametric mathematical model of the “sampling distribution.” The only difference between the two procedures is in the mathematical model. For large groups, the model is the Gaussian Z distribution; for small groups, the model is the “Student” t distribution, selected for the appropriate degrees of freedom.
Although the basic operations are similar for the t and Z procedures, and for contrasting two groups rather than examining only one, a new strategy is needed to estimate the parametric standard error for an increment in means. For this strategy, the observed two groups of data are converted into a theoretical sample consisting of increments in means, and the standard error is found as the standard deviation in that sample.
The basic principles of that process, discussed in the rest of this section, involve a lot of mathematics, but should not be too difficult to understand. It is worth reading if you want to know about a prime doctrine in the “ethos” of parametric inference. If not, skip to Section 13.3.
13.2.1 Basic Principles
For two observed means, X̄A and X̄B, in groups having the respective sizes nA and nB, with total size N = nA + nB, the parametric reasoning assumes that Group A is a random sample from a population having μA as mean and σA as standard deviation, and that Group B analogously has the populational parameters μB and σB.
We now do a repetitive theoretical sampling process as in Chapter 7, but on each occasion we draw a pair of samples, not just one. One sample, with nA members, is taken from population A and has X̄Aj as its mean. The other sample, of size nB, is taken from population B and has X̄Bj as its mean. We then subtract those two sampled means to form a single item of “new” data, which is the increment in means, X̄Aj − X̄Bj.
As the theoretical sampling process is repeated over and over, each time yielding different values of X̄Aj and X̄Bj, the increments in those two means form a distribution having its own parametric mean, and its own parametric variance and standard deviation. The next step is to determine what they are.
13.2.2 Variance in Increments of Two Means
To determine the parametric variance of the increments {X̄Aj − X̄Bj}, we resort to a concept that appeared in Appendix 7.1, with an illustration in Table 7.4. The concept is that the variance of either a sum or difference in two variables is equal to the sum of their variances. For the series of means sampled from population A, the parametric variance will be σA²/nA, which is estimated as sA²/nA. The corresponding parametric variance for the means in population B will be σB²/nB, estimated as sB²/nB. The variance of the distribution of increments in means could therefore be estimated as

    σ̂C² = sC² = (sA²/nA) + (sB²/nB)    [13.1]

Although this sum offers a satisfactory estimate for σC, a problem will arise for the stochastic null hypothesis.
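The repetitive paired-sampling argument can be illustrated by simulation. In this sketch the population parameters and group sizes are arbitrary choices for illustration, not values from the text; the empirical variance of the sampled increments should approach σA²/nA + σB²/nB:

```python
import random
import statistics

random.seed(7)

# Hypothetical parametric populations (values chosen only for illustration)
mu_a, sigma_a, n_a = 50.0, 2.0, 5
mu_b, sigma_b, n_b = 60.0, 3.0, 10

# Repeated paired sampling: each trial yields one increment in means
increments = []
for _ in range(20000):
    mean_a = statistics.fmean(random.gauss(mu_a, sigma_a) for _ in range(n_a))
    mean_b = statistics.fmean(random.gauss(mu_b, sigma_b) for _ in range(n_b))
    increments.append(mean_a - mean_b)

# The variance of a difference equals the sum of the variances
theoretical = sigma_a**2 / n_a + sigma_b**2 / n_b   # 0.8 + 0.9 = 1.7
empirical = statistics.variance(increments)         # close to 1.7
print(round(empirical, 2), theoretical)
```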
13.2.3 Estimation of Parametric Variance
The appropriate estimate for the parametric variance depends on what we have in mind for the null hypothesis. If the null hypothesis assumes that the two means are equal, i.e., μA = μB, we imply that the data forming the increments in the two sample means, X̄A and X̄B, came from the same single population, having 0 as its parametric mean, i.e., μC = μA − μB = 0, with σ as the standard deviation of the data. After making the assumption that μA = μB, however, do we also assume that σA = σB = σ? With the latter assumption about variance, we cannot estimate σ̂C directly from sA² and sB², because sA and sB may not be equal. If we do not assume that σA = σB, however, σ̂C can readily be estimated from the sum sA²/nA + sB²/nB in Formula [13.1].
In most circumstances, the null hypothesis assumes that the two groups are parametrically equal for both the means and the variances, i.e., that μA = μB and σA = σB = σ. In other situations discussed in Section 11.9 and later in Chapter 23, however, the stochastic assumption is an alternative hypothesis that the two means are different, i.e., μA ≠ μB. In the latter situations, we do not require that σA = σB.
If σA ≠ σB, the parametric variance of the increments of means can be correctly estimated with Formula [13.1]; but for most ordinary uses of the two-group Z or t test, the null hypothesis assumes equivalence of both means and variances. Consequently, because the observed sA² will seldom equal sB², we need a new way to estimate σ with the idea that σA = σB. For this tactic, we estimate σA and σB as the “average” value of sA and sB, found from their pooled variance, called sp². The pooling process was shown earlier in Section 10.5.1.1. The group variances, SxxA in Group A and SxxB in Group B, are added to form SxxA + SxxB as the pooled group variance. This sum is then divided by the combined degrees of freedom to form the average pooled variance of the total data as

    sp² = (SxxA + SxxB)/(νA + νB) = (SxxA + SxxB)/[(nA − 1) + (nB − 1)]    [13.2]

The degrees of freedom are calculated as

    νC = νA + νB = (nA − 1) + (nB − 1) = N − 2

because, as noted in Section 7.5.2, the estimate of μC as μA − μB requires estimates for μA and μB. Therefore, two degrees of freedom are lost from the total group size, N. Because SxxA = (nA − 1)sA² and SxxB = (nB − 1)sB², a convenient working formula is

    sp² = [(nA − 1)sA² + (nB − 1)sB²]/(nA − 1 + nB − 1)    [13.3]
For example, suppose a group of 8 people have a mean chloride value of 101.25, with s = 4.30. Another group of 15 people have a mean chloride value of 96.15 and s = 4.63. To find the pooled variance, Formula [13.3] would produce
    sp² = [(7)(4.30)² + (14)(4.63)²]/(7 + 14) = 429.55/21 = 20.45

and sp = √20.45 = 4.52. This result is what would be anticipated as a “common standard deviation” for the two groups: the value of sp lies between the two observed values of sA = 4.30 and sB = 4.63; and the pooled value in this instance is closer to sB, because Group B had the larger size, nB.
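The pooled-variance arithmetic is easy to script. A small sketch of Formula [13.3], checked against the chloride example (the function name is ours):

```python
from math import sqrt

def pooled_variance(n_a, s_a, n_b, s_b):
    """Average pooled variance of two groups, as in Formula [13.3]."""
    return ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a - 1 + n_b - 1)

# Chloride example: n_A = 8, s_A = 4.30 and n_B = 15, s_B = 4.63
sp2 = pooled_variance(8, 4.30, 15, 4.63)
sp = sqrt(sp2)
print(round(sp2, 2), round(sp, 2))   # 20.45 and 4.52
```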
13.2.4 Construction of Standard Error
The value of the parametric variance, σC², in the theoretical samples of increments of means is the square of the standard deviation, σC, in those samples. This standard deviation is the standard error needed for the denominator of the critical ratio.
If we were not using a pooled standard deviation, sp, the standard error could promptly be calculated for the cited two groups of chloride values by using Formula [13.1] and inserting

    sC² = (4.30)²/8 + (4.63)²/15 = 2.31 + 1.43 = 3.74

so that sC = 1.93.
When a pooled standard deviation is used, however, sp² is substituted for both sA² and sB² in Formula [13.1]. The result is a slightly different estimate, designated as σ̂C′, and calculated as

    sC′² = σ̂C′² = (sp²/nA) + (sp²/nB) = sp²[(1/nA) + (1/nB)] = sp²[N/(nAnB)]    [13.4]
For the example under discussion,

    sC′² = (20.45)[(1/8) + (1/15)] = (20.45)(.1917) = 3.92

so that sC′ = 1.98.
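The two competing standard errors can be computed side by side for the chloride example; a brief sketch:

```python
from math import sqrt

n_a, s_a = 8, 4.30
n_b, s_b = 15, 4.63

# Unpooled standard error, from Formula [13.1]
s_c = sqrt(s_a**2 / n_a + s_b**2 / n_b)

# Pooled standard error, from Formulas [13.3] and [13.4]
sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
s_c_prime = sqrt(sp2 * (1 / n_a + 1 / n_b))

print(round(s_c, 2), round(s_c_prime, 2))   # 1.93 and 1.98
```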
13.2.5 Similarity of the Two Estimates
According to the type of null hypothesis being tested, either sC or sC′ will represent SED, the standard error of the difference in two means.
As in various other mathematical niceties — such as when to use t or Z — the distinction between σ̂C and σ̂C′ (or between sC and sC′) is theoretically important. Pragmatically, however, the distinction often makes little difference, because sC and sC′ are usually quite similar, as in the values of 1.93 and 1.98 for the example here. In fact, with development of the algebra, it can be shown that
    sC² − sC′² = σ̂C² − σ̂C′² = [(N − 1)(nB − nA)(sA² − sB²)]/[(N − 2)(nAnB)]    [13.5]

The result shown in Formula [13.5] will be slightly positive or negative according to the relative magnitudes of nB vs. nA and sA vs. sB. If the original sample sizes or variances are similar, so that nB = nA or sA = sB, the values of sC² and sC′² will be identical.
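The algebraic identity can be confirmed numerically with the chloride example; the two sides agree to floating-point precision:

```python
# Numerical check of Formula [13.5] with the chloride example
n_a, s_a = 8, 4.30
n_b, s_b = 15, 4.63
N = n_a + n_b

sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (N - 2)
sc2 = s_a**2 / n_a + s_b**2 / n_b          # unpooled variance of the increment
sc2_prime = sp2 * (1 / n_a + 1 / n_b)      # pooled variance of the increment

lhs = sc2 - sc2_prime
rhs = (N - 1) * (n_b - n_a) * (s_a**2 - s_b**2) / ((N - 2) * n_a * n_b)
print(round(lhs, 4), round(rhs, 4))        # both ≈ −0.1801
```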
Nevertheless, it is useful to know the rules and to “play the game” properly. You would probably do just as well in a “surgical scrub” if you used an ordinary good soap, rather than the special chemical solution that is usually provided. You keep your operating-room colleagues happy, however, by using the special chemical. Analogously, you will keep your statistical colleagues happy by using the pooled standard deviation, sp, and the pooled standard error, sC′, to test a zero null hypothesis, and the unpooled sC to test the nonzero alternative hypotheses discussed later.
13.2.6 Estimating the Parametric Mean
The last step needed to form the critical ratio for the test statistic is to estimate the parametric mean of the distribution of increments in means, formed as

    μC = μA − μB

In Section 7.3.7.2, we learned that the parametric mean is best estimated, on average, from the value of the observed mean. Therefore, μ̂A is best estimated by the observed X̄A, and μ̂B by the observed X̄B. Accordingly, the value of μC is best estimated as

    μ̂C = μ̂A − μ̂B = X̄A − X̄B
13.3 Z or t Tests for Means in Two Groups
With the principles cited throughout Section 13.2, we are now ready to use t or Z for stochastic tests on a difference in two means. The critical ratio will be

    t (or Z) = (Observed Difference in Means − Parametric Difference in Means)/(Standard Error of Difference)
Under the parametric null hypothesis that µA = µB, the second term in the numerator of this ratio vanishes. The standard error of the difference in means can then be estimated in two ways. For the conventional
null hypothesis, the standard error is estimated as sC′ from the pooled variance, sp². The critical ratio becomes

    t (or Z) = (X̄A − X̄B)/sC′    [13.6]
When sC′ is calculated with Formula [13.4] and sp² with Formula [13.3], the “working” formula becomes

    t (or Z) = (X̄A − X̄B) / √{[((nA − 1)sA² + (nB − 1)sB²)/(nA − 1 + nB − 1)] × [(1/nA) + (1/nB)]}    [13.7]
Formula [13.7] is used for an ordinary “two-sample” Z or t test.
The alternative formula, used for the “alternative hypotheses” discussed later, places sC rather than sC′ in the denominator. With Formula [13.1] for sC, the critical ratio is

    t (or Z) = (X̄A − X̄B) / √[(sA²/nA) + (sB²/nB)]    [13.8]
The reference distribution for interpreting the test statistic formed in either Formula [13.7] or [13.8] depends on sample size. For larger groups, we use the standard Gaussian Z distribution. For smaller groups, the t distribution is applied, with ν = nA − 1 + nB − 1 = N − 2.
The interpretation can produce either P values or confidence intervals or both. P is found as the appropriate value associated with the calculated critical ratio of Z or tν. A confidence interval is constructed by first choosing Zα or tν,α for the selected 1 − α level of confidence. The standard error of the difference is then multiplied by Zα (or tν,α), and the interval is calculated as

    (X̄A − X̄B) ± Zα sC′

or as

    (X̄A − X̄B) ± tν,α sC′
13.3.1 Example of Calculations
For the two groups of chloride values that were cited in Section 13.2.3, X̄A − X̄B = 101.25 − 96.15 = 5.1. The two possible values for SED (calculated in Section 13.2.4) were 1.93 for sC and 1.98 for sC′. The critical ratio (calculated with sC′) would be

    (101.25 − 96.15)/1.98 = 5.1/1.98 = 2.576
For groups having 8 and 15 members, ν = 7 + 14 = 21. The critical ratio (or “test statistic”) would therefore be interpreted from t21 in Table 7.3 as a two-tailed P value for which .01 < 2P < .025. At the α rejection level of .05, this result is “significant.”
To determine a 95% confidence interval around the observed increment, we first find t21,.05 = 2.080. After calculation of (2.080)(1.98) = 4.12, the interval will be (X̄A − X̄B) ± 4.12 = 5.1 ± 4.12, which extends from .98 to 9.22. Because the null-hypothesis value of 0 is excluded, the result is “significant.”
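The whole calculation can be scripted in a few lines; in this sketch the critical value t21,.05 = 2.080 is taken from Table 7.3 rather than computed, since the Python standard library has no t distribution:

```python
from math import sqrt

# Chloride example: means, standard deviations, and group sizes
mean_a, s_a, n_a = 101.25, 4.30, 8
mean_b, s_b, n_b = 96.15, 4.63, 15

sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
sed = sqrt(sp2 * (1 / n_a + 1 / n_b))   # pooled standard error, s_C'

t = (mean_a - mean_b) / sed             # critical ratio ≈ 2.576
t_crit = 2.080                          # t_{21,.05}, from Table 7.3
half_width = t_crit * sed               # ≈ 4.12
ci = (mean_a - mean_b - half_width, mean_a - mean_b + half_width)
print(round(t, 2), round(ci[0], 2), round(ci[1], 2))
```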
More examples and opportunities for calculation are available in the Exercises at the end of the chapter.
13.3.2 Crude Mental Approximation
A crude mental approximation can sometimes be useful as a “screening” test for a “significant” confidence interval. For the approximation, the two individual standard errors, sX̄A and sX̄B, are simply added as

    sC″ = (sA/√nA) + (sB/√nB)    [13.9]

If the results for each group have been summarized with standard errors (rather than standard deviations), the two standard errors can be added immediately. For the example at the end of Section 13.2.3, however, standard errors were not listed for each group of chloride values. We therefore use Formula [13.9] to calculate the crude combined standard error as sC″ = (4.30/√8) + (4.63/√15) = 1.520 + 1.195 = 2.715. It is larger than either sC = 1.93 or sC′ = 1.98.
The value of sC″ will always exceed the value of sC obtained from Formula [13.1]. (You can prove this statement by squaring Formula [13.9] and comparing sC″² with sC² in Formula [13.1].) Consequently, any confidence intervals calculated with sC″ will be larger than those calculated with sC, and almost always larger than those calculated with sC′.
The enlarged crude sC″ can then be used for a quick, mental answer to questions about stochastic significance. The first step is to approximate Zα or tν,α as 2 for a 95% confidence interval and then double the sC″ value of 2.715 to produce 5.430. The confidence interval will not include 0 whenever the doubled crude value of sC″ is less than X̄A − X̄B. In this instance, X̄A − X̄B is 5.1 and the doubled sC″ exceeds it. We therefore cannot promptly conclude that “significance” exists. If 2sC″ were smaller than 5.1, however, we could reach the conclusion of stochastic significance without needing the formal test.
Thus, a working rule for the “in-the-head” approach is:
1. Add the two standard errors.
2. Double the added value.
3. If the doubled value is distinctly smaller than the difference in the two means, the result is stochastically significant at 2P < .05.
4. Otherwise, do the formal test. (In this instance, as shown in Section 13.3.1, the formal value of 2P turned out to be <.05.)
When the two means and their standard errors are cited on a “slide” during an oral presentation, the use of this mental tactic, followed by an appropriate comment, can sometimes make you seem like a wizard. Beware of “false negative” conclusions, however. You will seldom commit a “false positive” error in claiming “significance” when 2sC″ < X̄A − X̄B; but if not, the formal test should be done.
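The working rule reduces to a one-line screen; a sketch (quick_screen is our illustrative name, not a standard routine):

```python
from math import sqrt

def quick_screen(mean_a, s_a, n_a, mean_b, s_b, n_b):
    """In-the-head screen: double the crude combined standard error of
    Formula [13.9] and compare it with the difference in means."""
    s_c2 = s_a / sqrt(n_a) + s_b / sqrt(n_b)   # crude s_C''
    return abs(mean_a - mean_b) > 2 * s_c2     # True -> stochastically significant

# Chloride example: 2 x 2.715 = 5.43 exceeds the increment of 5.1,
# so the screen is inconclusive and the formal test is still needed
print(quick_screen(101.25, 4.30, 8, 96.15, 4.63, 15))   # False
```

A True result would have settled the question without the formal test; a False result, as here, only means the screen is inconclusive, not that the contrast is non-significant.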
13.3.3 Summary of Procedure
Although permuted non-parametric methods (such as the Pitman-Welch test) will probably eventually replace the t test and Z test, the two parametric procedures will remain popular for many years to come. For readers who want a synopsis of “cookbook” directions, two recipes are cited below for the t test. The same recipes are used for Z, when degrees of freedom can be ignored with larger samples (e.g., N > 40).
13.3.3.1 Getting a P Value
1. For each group, A and B, calculate a mean, group variance (or “sum of squares”), and a standard deviation. The respective formulas are X̄ = ΣXi/n; Sxx = ΣXi² − [(ΣXi)²/n]; and s = √[Sxx/(n − 1)].
2. Add SxxA + SxxB or, alternatively, (nA − 1)sA² + (nB − 1)sB². Divide the sum by nA + nB − 2. The result is the pooled variance, sp², in the data.
3. Calculate the estimated standard error of the difference in means as sp√[(1/nA) + (1/nB)]. [It is calculated directly as √[(sA²/nA) + (sB²/nB)] for the “alternative” hypothesis.]
4. Choose a value of α and decide whether the result will be interpreted in a two-tailed or one-tailed direction. For a one-tailed test, t is associated with half the probability value of a