Principles of Medical Statistics — Feinstein (2002)

bootstrap resampling of a group with 20 members would require 20²⁰ ≅ 1.05 × 10²⁶ resampled arrangements. Consequently, a smaller resampling, consisting of perhaps 1000 randomly selected bootstrapped samples, is usually prepared. After the values of the means in those 1000 samples are ranked, an ipr95 interval would depend on the boundaries of the inner 950 values. This ipr95 is called a 95% confidence interval for the range of possible variation of the observed mean.
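This resampling recipe can be sketched in a few lines of Python. The 20 data values below are invented for illustration (they are not from the text), and the percentile boundaries follow the ipr95 idea described above:

```python
import random
import statistics

def bootstrap_ci(data, n_resamples=1000, level=0.95, seed=1):
    """Percentile ('ipr') interval for the mean, from bootstrapped samples."""
    rng = random.Random(seed)
    # Each resample draws len(data) items WITH replacement from the group.
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    # Rank the means and keep the inner `level` proportion of them.
    drop = int(n_resamples * (1 - level) / 2)
    return means[drop], means[-drop - 1]

data = [23, 25, 28, 31, 34, 36, 40, 41, 45, 47,
        50, 52, 55, 58, 60, 63, 67, 70, 74, 78]   # hypothetical 20-member group
low, high = bootstrap_ci(data)
print(low, high)   # boundaries of the inner 950 of the 1000 ranked means
```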

7.2.2.1 Illustration of Bootstrap Process — A complete bootstrap rearrangement was prepared earlier in Section 6.4.1.1, and in Tables 6.1 and 6.2. Another example of the process is shown now for all the samples of two items that can be drawn from the set of data {1, 2, 3, 4}. The number of samples is small enough to show everything easily, whereas 256 entries (= 4⁴) would be needed for a complete bootstrap of all four items.

When considered as a parent population, this 4-item set of data has 2.5 as its mean, µ, and its group variance is ΣXᵢ² − nX̄² = 30 − 25 = 5. Because the original data set is a parent population for the bootstrap procedure, its standard deviation is calculated with n, as σ = √(5/4) = 1.118.

With m members in the original data set, and n members in each sample, the number of possible bootstrap samples is mⁿ. In this instance, with m = 4 and n = 2, we can expect 4² = 16 samples of two members each. Those 16 samples are shown in Table 7.2.

TABLE 7.2

Bootstrap Samples of Two Items Each from the Set of Data {1,2,3,4}

Sample of     Mean of        Deviation   Squared Deviation
Two Items     This Sample    from µ      from µ
1,1           1              −1.5        2.25
1,2           1.5            −1.0        1.00
1,3           2              −0.5        0.25
1,4           2.5            0           0
2,1           1.5            −1.0        1.00
2,2           2              −0.5        0.25
2,3           2.5            0           0
2,4           3              +0.5        0.25
3,1           2              −0.5        0.25
3,2           2.5            0           0
3,3           3              +0.5        0.25
3,4           3.5            +1.0        1.00
4,1           2.5            0           0
4,2           3              +0.5        0.25
4,3           3.5            +1.0        1.00
4,4           4              +1.5        2.25

Total of Foregoing Values:    40     0    10
Mean of Foregoing Values:     2.5    0    10/16 = 0.625 = variance;
                                          √0.625 = 0.791 = standard deviation
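The complete bootstrap in Table 7.2 takes only a few lines to reproduce; this sketch enumerates all 16 two-item samples and recovers the totals shown in the last rows of the table:

```python
from itertools import product
from statistics import mean

data = [1, 2, 3, 4]
mu = mean(data)                             # parent mean, 2.5
samples = list(product(data, repeat=2))     # all 4**2 = 16 samples
sample_means = [mean(s) for s in samples]

grand_mean = mean(sample_means)             # equals the parent mean
variance = sum((m - mu) ** 2 for m in sample_means) / len(samples)
print(grand_mean)        # 2.5
print(variance)          # 0.625
print(variance ** 0.5)   # ≈ 0.791
```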

The distribution of means in those 16 samples is shown in Figure 7.1.

The next few sections contain a general discussion of the bootstrap results, with illustrations supplied from Table 7.2 and Figure 7.1.

7.2.2.2 Mean of the Resampled Means — The mean of a collection of bootstrapped sample means is the same as the mean in the original set of data. In Table 7.2, the bootstrapped means extend, as expected, from 1 to 4, but their mean value is 2.5. (The same principle occurred for the complete resampling of means in Tables 6.1 and 6.2.)

7.2.2.3 Shape of the Distribution of Means — Regardless of the shape of the original data set that becomes the parent population, the collection of means in a bootstrapped sample will have a Gaussian distribution, provided that n is large enough. This assertion about a Gaussian distribution for resampled means is particularly crucial for parametric reasoning and will be discussed later in Section 7.3.6.

©2002 by Chapman & Hall/CRC

In Table 7.2, the array of means for samples where n = 2 obviously does not have enough items to produce a smooth curve, but the shape in Figure 7.1 clearly has the symmetry and convex central crest that are consistent with a Gaussian distribution. (The shape seen earlier in Table 6.2 is also “Gaussianoid”).

 

[Figure 7.1 appears here: a histogram showing the number of samples with each mean (vertical axis) against the values of the sample means from 1 to 4 (horizontal axis).]

FIGURE 7.1

Distribution of means in bootstrapped samples shown in Table 7.2.

7.2.2.4 Standard Deviation of the Sample Means — In bootstrapped samples containing n members each, the standard deviation of the means is σ/√n. This formula, which is another basic concept discussed later in parametric reasoning, was illustrated earlier in Section 6.4.2. The formula is again confirmed in Table 7.2, where the variance of the 16 bootstrapped sample means is .625 and their standard deviation is .791. With σ = 1.118 in the parent population, the formula σ/√n produces 1.118/√2 = .791.

7.2.2.5 Inner Percentile Ranges for the Samples of Means — We can determine certain ipr zones for the resampled means directly from the results shown in Table 7.2. Each of the 16 items occupies 1/16 or .0625 of the distribution. Consequently, the ipr95 and ipr90, which respectively go from the 2.5 to 97.5 percentiles and from the 5 to 95 percentiles, will be bounded by the extreme values of 1 and 4. The interquartile range, from the 25 to 75 percentiles, will cover the inner 50% of the data. The lower quartile occurs at 2, and the upper quartile at 3, so that the ipr50 would be from 2 to 3. Actually, because 10 of the 16 sample means lie between the values of 2 and 3, this zone occupies 62.5% of the data. Because 14 of the 16 means lie between the values of 1.5 and 3.5, the inner zone from 1.5 to 3.5 will cover 14/16 (= 87.5%) of the data.

Thus, examining the data in Table 7.2, we can choose the size of the desired zone of confidence — 50%, 90%, 95%, etc. — and demarcate the corresponding boundaries of the confidence interval for the resampled means.
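Those zones can be verified by counting the enumerated means directly; the fractions below match the 62.5% and 87.5% figures in the text:

```python
from itertools import product
from statistics import mean

data = [1, 2, 3, 4]
means = sorted(mean(s) for s in product(data, repeat=2))   # 16 bootstrap means

# Each of the 16 means occupies 1/16 = .0625 of the distribution.
inner_50 = [m for m in means if 2 <= m <= 3]       # quartile-to-quartile zone
inner_87 = [m for m in means if 1.5 <= m <= 3.5]
print(len(inner_50) / len(means))   # 10/16 = 0.625
print(len(inner_87) / len(means))   # 14/16 = 0.875
```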

7.2.2.6 Application of Bootstrap — The foregoing illustration was intended merely to show how the bootstrap works. The method is seldom used or needed in determining a confidence interval for the mean of a single group of dimensional data. In fact, perhaps the main value of the bootstrap for such data is to offer empiric confirmation for the parametric theory, discussed later, that the standard error of a mean is σ/√n. Nevertheless, a bootstrap confidence interval can be valuable for small data sets that may not always comply with theoretical principles.

For example, for the 3-item data set {1, 6, 9} in Section 6.4.1.1, the mean was 5.33 and the standard error of the mean was 1.905. Using the parametric theory to be discussed shortly, a 70.4% confidence interval for the mean would be constructed with α = .296 and Zα = 1.04. The interval would be placed symmetrically around the mean as 5.33 ± (1.04)(1.905) = 5.33 ± 1.98, and would extend from 3.35 to 7.31. As shown in the bootstrap resampling in Table 6.2 and Section 6.4.2, however, the 70.4% confidence interval actually extends from 3.67 to 7.00.
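That bootstrap interval for {1, 6, 9} can be confirmed by complete enumeration: with 3³ = 27 resamples, the inner 19 ranked means (19/27 = 70.4%) run from 3.67 to 7.00, just as quoted from Table 6.2:

```python
from itertools import product
from statistics import mean

data = [1, 6, 9]
means = sorted(mean(s) for s in product(data, repeat=3))   # all 27 resampled means

# Keep the inner 19 of the 27 ranked means: drop 4 at each extreme.
inner = means[4:-4]
print(round(inner[0], 2), round(inner[-1], 2))   # 3.67 7.0
```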


The bootstrap method becomes particularly valuable when we turn to binary data in Chapter 8, because each item of data is coded as either 0 or 1. The bootstrap is therefore relatively easy to use and display; and in fact, a mathematical model devised several centuries ago by Bernoulli can indicate exactly what would be produced by a complete bootstrap.

7.3 Theoretical Principles of Parametric Sampling

Although the jackknife and bootstrap methods seem easy to understand and were easy to illustrate in the deliberately simple foregoing examples, the “computer-intensive” calculations can become formidable for most of the groups of data collected in medical research. Consequently, in the era before ubiquitous availability of digital computers, the alternative approach, using parametric theories, became popular because the associated calculations were easy. The harder part, for nonmathematicians, was to understand the parametric reasoning.

The parametric sampling method begins with the idea that a group of n items has been randomly sampled from a parent population, which has its own correct values for such parameters as the mean and standard deviation. We do not know what those parameters are, and the goal is to estimate their values from the sample. To distinguish them from the corresponding results in the observed sample, the population parameters are designated with Greek symbols: µ for the mean and σ for the standard deviation in dimensional data, and π for the proportion in binary data. The corresponding symbols for the sampled group are X̄, s, and p. The aim is to observe the results for X̄, s, or p in the sample, and to use them for inferences about the unobserved populational parameters: µ, σ, or π.

The parametric process uses a sequence of reasoning that involves the following activities:

1. Repetitive sampling from the parent population.

2. Calculation of the central index (or other anticipated parameters) in each sample.

3. Distribution of the central indexes of those samples.

4. Standard deviation, i.e., “standard error,” of the central indexes.

5. Calculation of a “test statistic” for each sample.

6. Distribution of the test statistics.

7. Confidence interval for the population parameters.

8. Use of test statistics for an individual group.

These activities are described in the sections that follow. Note that the first four steps are exact counterparts of what occurred earlier in bootstrap resampling.

Figure 7.2 illustrates the difference between the theoretical and empirical methods of rearranging a group of data. In the theoretical method, a parametric population is inferred, from which theoretical repeated samplings are then taken. In the empirical method, specific groups or resamples are reconstructed from the observed data.

7.3.1 Repetitive Sampling

In the theoretical parametric process, we repeatedly take random samples of n members from a parent population. For this process, we assume that the parent population is infinitely (or at least very) large. In smaller or “finite” populations, the theory requires each sample member to be returned immediately afterward, so that the next sample member is obtained after the replacement. In the parametric process, however, we do not actually “take” each sample. The entire activity is theoretical.

7.3.2 Calculation of Anticipated Parameters

In each of the repeated (although theoretical) samples, we determine the observed values of the central indexes or other parameters that are being estimated. They would be the mean, X̄, and standard deviation, s, for samples of dimensional data, and the binary proportion, p, in samples of binary data.


[Figure 7.2 appears here: a schematic contrasting the two approaches. Theoretical: an inferred parametric population (µ, σ) gives rise to theoretical repeated samples. Empirical: the observed group gives rise to reconstructed groups or resamples.]

FIGURE 7.2

Theoretical and empirical methods of arrangement.

7.3.3 Distribution of Central Indexes

Each sample in the repetitive array can be denoted with the subscript j, and each has its own central index: a mean X̄j or a binary proportion pj. Thus, the first sample of n items has mean X̄1; the second sample of n items has mean X̄2; the third sample has mean X̄3, and so on. Each of those central indexes will have its own deviation from the parent population’s parametric value. The deviations will be X̄j − µ for each mean or pj − π for each proportion. The parametric reasoning depends on the average results found in the deviations of those central indexes.

The pattern of deviations will turn out to have a mathematical distribution that allows decisions about stability, confidence intervals, and even a test of statistical hypothesis about zones of probability.

7.3.4 Standard Error of the Central Indexes

[The discussion that follows is confined to samples of dimensional data for which the central index is a mean. The process for binary data, summarized with proportions, will be considered in Chapter 8].

The series of means, {X̄1, X̄2, …, X̄j, …}, found in the repeated samples becomes a sample of its own, with its own mean, its own standard deviation, and its own pattern of distribution. The standard deviation of that series of means is the crucial entity called the standard error of the mean. A prime step in the parametric reasoning, therefore, is to estimate the standard deviation, i.e., standard error, of the hypothetical sample of sample means.

7.3.4.1 Standard Deviation of the Means — The standard error of the mean, which is the standard deviation in the theoretical sample of repetitive means, {X̄1, X̄2, …, X̄j, …}, is often abbreviated as s.e., s.e.m., or SEM, and is symbolized as σx̄. Intuitively, you would expect the standard deviation of a set of means to be smaller than σ, the standard deviation in the parent population of data. The actual value is

standard error of the mean = σx̄ = σ/√n

The accuracy of this formula was confirmed with bootstrap sampling for a 3-item data set in Section 6.4.2 and for the 2-item samples in Section 7.2.2.4. A more formal and general proof of the formula


requires a good deal of mathematics that has been relegated to the Appendix, which can be examined or omitted as you wish. The mathematics involves determining the variance in a sum or difference of two variables (Appendixes A.7.1 and A.7.2), the variance of a parametric distribution (Appendix A.7.3), and finally, the variance of a sample mean (Appendix A.7.4).

7.3.4.2 Estimation of σ with s — Because we do not know the parent population’s value of σ, it must be estimated. Diverse candidates might be used from the available single sample of observed data, but the best “unbiased” estimate on average comes from the value of s produced when the observed sample’s group variance is divided by n − 1. Thus, using the “^” (or “hat”) symbol to indicate an estimated value, we decide that the best estimate (for dimensional data) is

σ̂ = s = √(Sxx/(n − 1))

This assertion is proved in Appendixes A.7.4 and A.7.5.

7.3.4.3 Estimation of Standard Error — Because σ/√n is the standard error of the parametric data and σ is estimated with s for the sample, we can estimate the standard error of the mean as

sx̄ = s/√n

For the set of dimensional data in Exercise 4.3, the standard deviation, s, was calculated with n − 1 to be 80.48. With n = 20, the standard error of the mean would be estimated as 80.48/√20 = 18.0.
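Recomputing that arithmetic:

```python
import math

s, n = 80.48, 20
sem = s / math.sqrt(n)   # estimated standard error of the mean, s/√n
print(round(sem, 1))     # 18.0
```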

With the standard error, we can eventually construct a zone of probable location, called a confidence interval, for the parametric mean, µ. We can also develop an index of stability for the observed mean, X̄.

7.3.5 Calculation of Test Statistic

In Chapter 4, we constructed descriptive Z-scores as standard deviates formed in a set of data by the “critical ratio” of (observed value − mean value) divided by the standard deviation of the data. These ratios, or other analogous entities, are regularly constructed for inferential activities as indexes that are called test statistics, because they are often used for testing the statistical hypotheses that are discussed later.

For descriptive Z-scores, we learned earlier that each score can be associated with a particular probability (or percentile) if the data come from a mathematical distribution that allows a correspondence between Z scores and probability values. This correspondence occurs with a Gaussian distribution, but other correspondences occur for the associated distributions of other test statistics. For example, probability values can be readily determined between values of a different test statistic (to be discussed later) and a distribution called chi-square.

For the sample of means we have been theoretically assembling, a Zj test statistic can be determined for each mean, X̄j. A particular observed value in the sample of means will be X̄j; the mean value of the observed means will be the parametric mean, µ; and the standard deviation in the sample of means will be the standard error, σ/√n. Thus, as repetitive samples of means are drawn from the population with parameters µ and σ, each of the sampled means, X̄j, can be expressed in the Z-score format of a standardized deviate,

Zj = (X̄j − µ)/(σ/√n)     [7.1]

If we had a mathematical model for the distribution of these Zj values, each Zj could be associated with a P value that indicates its percentile location.

7.3.6 Distribution of the Test Statistics

The distribution of the test statistic in Formula [7.1] is demonstrated by the Central Limit Theorem, which is one of the most majestic discoveries in statistics and in all of mathematics.1 The proof of this


theorem, which requires too many pages of higher mathematics to be shown here, received contributions from many stars in the firmament of statistics: DeMoivre, Gauss, Laplace, Legendre, Tchebyshev, Markov, and Lyapunov.

According to this theorem, repeated samples of means, each having sample size n, will have a Gaussian distribution, regardless of the configuration of shape in the parent population. Thus, if you take repeated samples from data having a Gaussian distribution, an exponential distribution, a uniform distribution, or any of the other shapes shown in Figure 3.4, the means of those repeated samples will have a Gaussian distribution. Consequently, the values of Zj calculated with Formula [7.1] will also have a Gaussian distribution.

The crucial contribution of the Central Limit Theorem, therefore, is that it allows the use of Gaussian mathematical models for inferences about means, even when Gaussian models are unsatisfactory for describing an observed set of data.
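A small simulation can make the theorem concrete. The parent population below is deliberately non-Gaussian (a flat, uniform set of digits, invented for illustration), yet the means of repeated samples cluster symmetrically around µ with a spread near σ/√n:

```python
import random
import statistics

rng = random.Random(0)
parent = list(range(10))            # uniform parent population: µ = 4.5
sigma = statistics.pstdev(parent)   # σ ≈ 2.872

# Draw 5000 repeated samples of n = 30 and collect their means.
means = [statistics.mean(rng.choices(parent, k=30)) for _ in range(5000)]

print(round(statistics.mean(means), 2))    # close to µ = 4.5
print(round(statistics.stdev(means), 2))   # close to σ/√30 ≈ 0.52
```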

7.3.7 Estimation of Confidence Interval

Once we know that Zj has a Gaussian distribution for the samples of means, we can take care of the original challenge of finding a zone to locate the unobserved value of the parametric mean, µ. As the theoretical sampling process continues over and over, positive and negative values of Zj will occur when calculated for each X̄j found as the sampled mean. Because of the Gaussian distribution, Zj will be a standard Gaussian deviate and can be used or interpreted with Gaussian P values. We can therefore solve Formula [7.1] for µ, substituting s for σ and bearing in mind that the Zj values might be positive or negative. The result is

µ = X̄j ± Zj(s/√n)     [7.2]

If we want to denote an inner probability zone for the location of µ, we replace Zj by an assigned value of Zα , using Gaussian levels of α for the size of the desired zone.

7.3.7.1 Choice of X̄j — Formula [7.2] offers an excellent method for locating µ, but has one major problem. It is based on the array of theoretical means denoted as X̄j and on each corresponding Zj value. In reality, however, we shall obtain a single random sample and have to work with its results. The next section demonstrates that we use the observed sample’s value of X̄ to substitute for X̄j in Formula [7.2].

7.3.7.2 Estimation of µ — In Chapter 6 and also in the current chapter, the bootstrap or jackknife resamples had a mean that was the same as the mean of the original group. An analogous event happens in the repetitive sampling now under discussion. The mean of the series of means will turn out to be µ, the parametric mean. In the bootstrap and jackknife procedures, however, the original observed group was the parent population used for the resampling procedures, but in the current activities, the parent population was not observed. Although the parametric reasoning allows us to contemplate a theoretical series of samples, the reality is that we obtain a single random sample from the parent population. We therefore have to estimate the value of µ from something noted in that single observed sample. The “something” might be the sample’s mean or median or some other selected index. The best estimate of µ, according to a mathematical proof not shown here, is produced, on average, by the observed mean, X̄. For example, if the twenty blood sugar measurements in Exercise 4.3 were a random sample from a larger population, its mean, µ, would be estimated from the sample mean as µ̂ = X̄ = 120.10.

7.3.7.3 Choice of Zα — If the observed X̄ is used for X̄j, we can substitute a designated value of Zα for Zj, and Formula [7.2] will become

µ = X̄ ± Zα(s/√n)     [7.3]


After choosing Zα, we can promptly calculate a confidence interval for µ. For example, suppose we want an inner probability zone of 90% for an observed value of X̄. In Table 6.3, the corresponding value of Zα is 1.645. The corresponding zone for µ would be X̄ ± 1.645(s/√n). If we wanted an inner probability zone of 95%, the value of Zα in Table 6.3 is 1.96, and the zone would be X̄ ± 1.96(s/√n).

The choice of α will affect the amount of confidence we can have that µ is truly located in the cited interval. The level of confidence is 1 − α. Thus, if α = .05, 1 − α = .95, and the result is called a 95% confidence interval. If α = .1, 1 − α = .90, and the result is a 90% confidence interval. Note that confidence increases as α gets smaller; but as α gets smaller, Zα gets larger, so that the size of the confidence interval enlarges.

For example, suppose a particular random sample of 25 members produces X̄ = 10 and s = 2, so that s/√n = 2/√25 = .4. With a 90% confidence interval, we can estimate µ̂ = X̄ ± 1.645(s/√n) = 10 ± (1.645)(.4) = 10 ± .658. The interval extends from 9.342 to 10.658. With a 95% confidence interval, the estimate for µ̂ is 10 ± (1.96)(.4) = 10 ± .784; and the interval extends from 9.216 to 10.784.

We can be more confident that the true parametric mean, µ, will be contained in the larger interval (for 95%) than in the smaller interval (for 90%).
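The two intervals in this example can be reproduced directly from the X̄ ± Zα(s/√n) arithmetic:

```python
import math

x_bar, s, n = 10, 2, 25
sem = s / math.sqrt(n)                      # s/√n = 2/5 = .4

for level, z in [("90%", 1.645), ("95%", 1.96)]:
    half = z * sem
    print(level, round(x_bar - half, 3), "to", round(x_bar + half, 3))
# 90% 9.342 to 10.658
# 95% 9.216 to 10.784
```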

7.3.7.4 Contents of Confidence Interval — The usual way of expressing a confidence-interval estimate is to say that we are 95% (or whatever was chosen as 1 − α) confident of finding the correct value of µ in the calculated interval. Strictly speaking, however, the actual confidence is that µ will be included in 95% of the zones formed by calculations from all the potential values of X̄j in the repeated samples. The reason for this distinction is that the confidence intervals are calculated around the observed values of X̄j, not the unobserved value of µ.

Suppose we took ten samples, each having 25 members, from a parent population having µ = 36 and σ = 10. The standard error of the mean will be σ/√n = 10/√25 = 2. The 95% confidence interval around each sampled mean will be X̄j ± (1.96)(2) = X̄j ± 3.92. When each sampled mean is checked, we find the following:

Value of X̄j    Confidence Interval    Does This Interval Include 36?
40.2           36.28 to 44.12         No
39.7           35.78 to 43.62         Yes
38.2           34.28 to 42.12         Yes
37.1           33.18 to 41.02         Yes
36.3           32.38 to 40.22         Yes
35.4           31.48 to 39.32         Yes
34.6           30.68 to 38.52         Yes
33.5           29.58 to 37.42         Yes
32.7           28.78 to 36.62         Yes
31.8           27.88 to 35.72         No

In this instance, the true parametric value of the mean, 36, was included in 8 of the 10 confidence intervals calculated with Z.05 = 1.96, but in repetitive sampling, the parametric value would (or should) be included in 95% of those intervals.
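The long-run behavior can be checked by simulation. This sketch (using Gaussian sampling and many more repetitions than the ten-sample illustration) counts how often the 95% interval captures µ = 36:

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n = 36, 10, 25
sem = sigma / n ** 0.5              # σ/√n = 10/√25 = 2

trials, covered = 2000, 0
for _ in range(trials):
    x_bar = statistics.mean(rng.gauss(mu, sigma) for _ in range(n))
    if x_bar - 1.96 * sem <= mu <= x_bar + 1.96 * sem:
        covered += 1

print(covered / trials)   # close to .95
```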

7.3.8 Application of Statistics and Confidence Intervals

As a conventional method of denoting “stability,” confidence intervals have two main applications: to estimate the zone of location for a parametric index and to indicate the extremes of potential variation for an observed central index.

7.3.8.1 Parametric Zones — The most obvious application of confidence intervals is in the role for which they were originally devised. In political polls, market research, or industrial testing, the goal is to infer the parametric attributes of a larger population by using what is found in a smaller random


sample. The confidence interval indicates the zone in which the desired parameter (such as the mean) is likely to occur; and the selected value of Zα increases or decreases the confidence by affecting the size of the zone and thus the probability that µ is located in that zone. The process will be further illustrated in Chapter 8, for political polls or election-night forecasting, when the sampled central index is a proportion, p, rather than a mean, X̄.

7.3.8.2 Potential Extremes of Variation — In contrast to the process of parametric sampling, the observed group in most medical research is not chosen as a random sample, and a parameter is not being estimated. Nevertheless, because the group is a sample of something, the central index might have numerical variations arising merely from the few or many items that happen to be included. For example, a particular treatment might have a success rate of 50% in the “long run” of 200 patients, but might be successful in 75% of a “short run” of 4 patients. By making provision for numerical caprices that might occur in a group of size n, the confidence interval serves to warn or anticipate what could happen in the longer run.

For observed groups, the confidence interval is calculated from the ± contribution of Zα(s/√n) and is placed symmetrically around the mean, X̄. The two extreme ends of the interval will indicate how large or small X̄ might really be. For example, if the mean weight of a group of adults is 110 pounds, we might conclude that they are relatively thin until we discover that the mean itself might vary, in an 80% confidence interval, from 50 to 170 pounds.

This role of confidence intervals becomes particularly important later for comparison of two groups in Chapter 13. Whether the observed difference between the two means is tiny or huge, the extremes of the confidence interval for the difference in means can be inspected, before any firm conclusions are drawn, to show how big that difference might really be.

7.3.9 Alternative Approaches

Despite the popularity and customary use of the foregoing procedures, they can often produce problems, particularly with the small data sets discussed in Section 7.2.2.6 and throughout Section 7.5. Many statisticians today may therefore advocate that parametric procedures be replaced and that confidence intervals be obtained with bootstrap methods.2 They have the advantage of being applicable to any type of distribution and can also be used3 for the ranked data discussed in Chapter 15.

7.4 Criteria for Interpreting Stability

Regardless of whether we use parametric theory or empirical resampling (with bootstraps or jackknives), the decision about stability of a central index can be made intrinsically, from its own potential variation, or extrinsically, in reference to a selected outer boundary of “tolerance.”

7.4.1 Intrinsic Stability: Coefficient of Stability

In Section 5.5.1, the coefficient of variation (c.v.) was calculated as s/X̄ to indicate the “effectiveness” with which the mean represents the individual items of data. If the coefficient was too large, the mean was not effective. This same approach, which was introduced in Section 6.4.2, can be used to denote the effectiveness or stability with which the observed mean, X̄, represents the potential means X̄1, X̄2, …, X̄j, … that might have been obtained in the repetitive sampling.

Because s/n is the standard deviation, i.e., standard error, of the samples of means, the coefficient of stability for the mean can be calculated as

s/ n

-----------

X


We would expect this coefficient to be much smaller than the ordinary c.v., which is multiplied by the factor 1/√n to form the c.s. (coefficient of stability); but specific standards have not been established for “stability.” If .25 is used as a liberal upper boundary for s/X̄ to be “effective” and if 20 is a typical sample size, then √20 = 4.47 and 1/4.47 = .22. A crude upper boundary for “stability” would therefore be .25 × .22 = .056. Because the value of .05 has many other mnemonic stimuli in statistics, we might use it as a rough guide for intrinsic stability in considering the potential variation for a mean.

In the data set of Exercise 4.3, the mean was 120.10, and the standard error was 18.0. The value of 18.0/120.10 = .15 would suggest that the mean in that set of data is not stable.
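Reproducing that check:

```python
import math

x_bar, s, n = 120.10, 80.48, 20
cs = (s / math.sqrt(n)) / x_bar   # coefficient of stability, (s/√n)/X̄
print(round(cs, 2))               # .15, well above the rough .05 guide
```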

7.4.2 Extrinsic Boundary of “Tolerance”

A different approach to stability is to establish an extrinsic critical boundary of “tolerance” that should be avoided or exceeded by the observed results. For example, a political candidate might want to be sure that she is getting more than 50% of the votes; or we might want the mean value of blood lead in a group of children to be below 10 units. If the observed central index for the group is suitably above (or below) the selected boundary of tolerance, the confidence interval can be used to provide assurance that the boundary level is suitably avoided.

The multiplication by Zα will make the size of the interval vary according to the “confidence” produced with the choice of α . We can make the interval small by choosing a relatively large α , such as .5, for a 50% confidence interval in which Zα will be only .674. For greater confidence, with a 95% interval, α will be much smaller at .05, and Zα will enlarge about three times to 1.96. If the confidence interval is small with a large Zα , we can feel particularly confident about its stability in relation to the chosen boundary.

For example, suppose the mean blood level of lead in a group of children is 7.1 with an SEM of 2.2. With a 50% confidence interval, the potential spread will be 7.1 ± (.674)(2.2), and will extend from 5.6 to 8.6. With a 95% confidence interval, however, the factor (1.96)(2.2) = 4.3 will make the interval extend from 2.8 to 11.4. Thus, we could be 50% confident that the mean lead level is below the tolerance boundary of 10, but not 95% confident. Similarly, a political candidate who is favored by 60% of a random sample of 30 voters might have different degrees of confidence that the percentage exceeds the tolerance level of >50% needed for winning the election. (The use of confidence intervals with binary data and the mechanisms used for “election night predictions” are discussed in Chapter 8.)
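The blood-lead example can be checked with the same arithmetic:

```python
mean_lead, sem, boundary = 7.1, 2.2, 10

for level, z in [("50%", 0.674), ("95%", 1.96)]:
    low, high = mean_lead - z * sem, mean_lead + z * sem
    print(level, round(low, 1), "to", round(high, 1),
          "| entirely below 10:", high < boundary)
# 50% 5.6 to 8.6 | entirely below 10: True
# 95% 2.8 to 11.4 | entirely below 10: False
```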

7.5 Adjustment for Small Groups

All of the foregoing ideas about the Gaussian distribution of sampled means and values of Zj are splendid, but not strictly true. They pertain only if the group sizes, n, are suitably large (usually above 30). For smaller groups, the critical ratio calculated for Zj as

(X̄ − µ)/(σ/√n)

has a distribution that is not strictly Gaussian. By pointing out this distinction in a 1908 paper,4 published under the pseudonym of “Student,” W. S. Gosset, a statistician working at the Guinness brewery, became a member of the statistical hall of fame.

7.5.1 The t Distribution

Gosset showed that for small samples of n, the ratio (X̄ − µ)/(σ/√n) has a sampling distribution that resembles the basic Gaussian curve, while differing at the peaks and tails. The sampling distribution is often called t, but you may prefer Gossetian if you like eponyms. Furthermore, the patterns of a


t distribution have slight variations in shape according to the degrees of freedom (d.f.) in each sample. For a sample having n members, the value for degrees of freedom is

ν = n − 1.

Figure 7.3 shows the basic resemblances and variations for a standard Gaussian distribution and for standard t distributions having 4 and 14 degrees of freedom. The same critical ratio, (X̄ − µ)/(σ/√n), is used to calculate either Z or t; and the three curves in Figure 7.3 were plotted for critical ratio values at intervals of 0.5 from 0 to ±3.0. The t curves are slightly shorter in the center and slightly taller in the tails than the Gaussian curve. The two curves become quite close, however, as group sizes (or degrees of freedom) increase.

[Figure 7.3 appears here: relative frequency (vertical axis, from .050 to .400) plotted against values of Z or t (horizontal axis, from −3.0 to +3.0), showing the Gaussian Z curve and the t curves for ν = 14 and ν = 4.]

FIGURE 7.3

Relative frequencies associated with Z and with t distribution at 14 and 4 degrees of freedom.
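The “shorter in the center” pattern can be computed from the t density’s height at zero (the standard formula with the gamma function; the formula itself is not derived in the text):

```python
import math

def t_height_at_zero(df):
    # Height of the standard t density at its center: Γ((ν+1)/2) / (√(νπ) Γ(ν/2))
    return math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

gauss_height = 1 / math.sqrt(2 * math.pi)   # Gaussian peak, ≈ .399
print(round(gauss_height, 3))
print(round(t_height_at_zero(14), 3))       # slightly shorter than the Gaussian
print(round(t_height_at_zero(4), 3))        # shorter still
```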

7.5.2 Explanation of Degrees of Freedom

The term degrees of freedom, abbreviated as d.f., appears so often in inferential statistics that it warrants a brief explanation.

In the actual data, we find X̄ as the mean of a sample of n members. For inferential purposes thereafter, we use X̄ to estimate µ, the population mean. When the repetitive theoretical sampling takes place, we assume that each of those samples also offers an estimate of µ. Consequently, when each sample is drawn from the parent population, the first n − 1 members can vary in any possible way, but the last member is constrained. For each sample to be an estimate of µ, with µ̂ = X̄, the sum of its values must be ΣXi = nµ̂ = nX̄.

For example, suppose a group of 6 people have a mean of 73 units, so that ΣXi = 6 × 73 = 438. In repeated theoretical sampling thereafter from a parent population with µ̂ = 73, the first five values of a sample might be 132, 63, 72, 52, and 91, with a sum of 410. To estimate µ̂ as 73, however, the total sum of the sample values should be 438. Therefore, the sixth member of the group must have a value of 438 − 410 = 28.

This constraint removes one degree of freedom from the possible variations in Xi. Before the observed group was formed, n values could be chosen freely for any sample. Once that group has been formed, however, and after the parametric value of µ̂ has been estimated from X̄, only n − 1 values can vary freely in each sample thereafter.
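The constraint is easy to see numerically, using the 6-person example above:

```python
mu_hat, n = 73, 6                     # estimated mean and group size
first_five = [132, 63, 72, 52, 91]    # these n − 1 values vary freely
# The sample must sum to n × µ̂ = 438, so the last value is forced:
last = n * mu_hat - sum(first_five)
print(last)   # 28
```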
