An Introduction To Statistical Inference And Data Analysis

CHAPTER 9. 1-SAMPLE LOCATION PROBLEMS

• An experimental unit is a hypertensive patient.

• Two measurements (blood pressure before and after treatment) are taken on each experimental unit.

• Let Bi and Ai denote the blood pressures of patient i before and after treatment.

• Let Xi = Bi − Ai, the decrease in blood pressure for patient i.

Example 3: A graduate student investigated the effect of Parkinson's disease (PD) on speech breathing. She recruited 15 PD patients to participate in her study. She also recruited 15 normal control (NC) subjects. Each NC subject was carefully matched to one PD patient with respect to sex, age, height, and weight. The lung volume of each study participant was measured. For this experiment:

• An experimental unit was a matched PD-NC pair.

• Two measurements (PD and NC lung volume) were taken on each experimental unit.

• Let Di and Ci denote the PD and NC lung volumes of pair i.

• Let Xi = log(Di/Ci) = log Di − log Ci, the logarithm of the PD proportion of NC lung volume.

This chapter is subdivided into sections according to distributional assumptions about the Xi:

9.1 If the data are assumed to be normally distributed, then we will be interested in inferences about the population's center of symmetry, which we will identify as the population mean.

9.2 If the data are only assumed to be continuously distributed, then we will be interested in inferences about the population median.

9.3 If the data are only assumed to be symmetrically distributed, then we will also be interested in inferences about the population's center of symmetry, but we will identify it as the population median.

Each section is subdivided into subsections, according to the type of inference (point estimation, hypothesis testing, set estimation) at issue.


9.1 The Normal 1-Sample Location Problem

In this section we assume that X1, …, Xn ∼ Normal(μ, σ²). As necessary, we will distinguish between cases in which σ is known and cases in which σ is unknown.

9.1.1 Point Estimation

Because normal distributions are symmetric, the location parameter μ is the center of symmetry and therefore both the population mean and the population median. Hence, there are (at least) two natural estimators of μ, the sample mean X̄n and the sample median q̂2(P̂n). Both are consistent, unbiased estimators of μ. We will compare them by considering their asymptotic relative efficiency (ARE). A rigorous definition of ARE is beyond the scope of this book, but the concept is easily interpreted.

If the true distribution is P = Normal(μ, σ²), then the ARE of the sample median to the sample mean for estimating μ is

e(P) = 2/π ≈ 0.64.

This statement has the following interpretation: for large samples, using the sample median to estimate a normal population mean is equivalent to randomly discarding approximately 36% of the observations and calculating the sample mean of the remaining 64%. Thus, the sample mean is substantially more efficient than the sample median at extracting location information from a normal sample.
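This efficiency claim is easy to check by simulation. The sketch below (sample size, replication count, and seed are my illustrative choices, not from the text) estimates the variance ratio of the two estimators for normal samples; it should land near 2/π ≈ 0.64:

```python
import numpy as np

# Estimate the relative efficiency of the sample median to the sample mean
# for normal data: Var(sample mean) / Var(sample median) should be near
# 2/pi ~ 0.64. All parameter choices are illustrative.
rng = np.random.default_rng(0)
n, reps = 100, 20000
samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))

var_mean = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()
ratio = var_mean / var_median
print(round(ratio, 2))  # near 0.64
```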

In fact, if P = Normal(μ, σ²), then the ARE of any estimator of μ to the sample mean is ≤ 1. This is sometimes expressed by saying that the sample mean is asymptotically efficient for estimating a normal mean. The sample mean also enjoys a number of other optimal properties in this case. The sample mean is unquestionably the preferred estimator for the normal 1-sample location problem.

9.1.2 Hypothesis Testing

If σ is known, then the possible distributions of Xi are

{Normal(μ, σ²) : −∞ < μ < ∞}.


If σ is unknown, then the possible distributions of Xi are

{Normal(μ, σ²) : −∞ < μ < ∞, σ > 0}.

We partition the possible distributions into two subsets, the null and alternative hypotheses. For example, if σ is known then we might specify

H0 = {Normal(0, σ²)} and H1 = {Normal(μ, σ²) : μ ≠ 0},

which we would typically abbreviate as H0 : μ = 0 and H1 : μ ≠ 0. Analogously, if σ is unknown then we might specify

H0 = {Normal(0, σ²) : σ > 0} and H1 = {Normal(μ, σ²) : μ ≠ 0, σ > 0},

which we would also abbreviate as H0 : μ = 0 and H1 : μ ≠ 0. More generally, for any real number μ0 we might specify

H0 = {Normal(μ0, σ²)} and H1 = {Normal(μ, σ²) : μ ≠ μ0}

if σ is known, or

H0 = {Normal(μ0, σ²) : σ > 0} and H1 = {Normal(μ, σ²) : μ ≠ μ0, σ > 0}

if σ is unknown. In both cases, we would typically abbreviate these hypotheses as H0 : μ = μ0 and H1 : μ ≠ μ0.

The preceding examples involve two-sided alternative hypotheses. Of course, as in Section 8.4, we might also specify one-sided hypotheses. However, the material in the present section is so similar to the material in Section 8.4 that we will only discuss two-sided hypotheses.

The intuition that underlies testing H0 : μ = μ0 versus H1 : μ ≠ μ0 was discussed in Section 8.4:

• If H0 is true, then we would expect the sample mean to be close to the population mean μ0.

• Hence, if X̄n = x̄n is observed far from μ0, then we are inclined to reject H0.


To make this reasoning precise, we reject H0 if and only if the significance probability

P = Pμ0(|X̄n − μ0| ≥ |x̄n − μ0|) ≤ α.  (9.1)

The first equation in (9.1) is a formula for a significance probability. Notice that this formula is identical to equation (8.2). The one difference between the material in Section 8.4 and the present material lies in how one computes P. For emphasis, we recall the following:

1. The hypothesized mean μ0 is a fixed number specified by the null hypothesis.

2. The estimated mean, x̄n, is a fixed number computed from the sample. Therefore, so is |x̄n − μ0|, the difference between the estimated mean and the hypothesized mean.

3. The estimator, X̄n, is a random variable.

4. The subscript in Pμ0 reminds us to compute the probability under H0 : μ = μ0.

5. The significance level α is a fixed number specified by the researcher, preferably before the experiment was performed.

To apply (9.1), we must compute P. In Section 8.4, we overcame that technical difficulty by appealing to the Central Limit Theorem. This allowed us to approximate P even when we did not know the distribution of the Xi, but only for reasonably large sample sizes. However, if we know that X1, …, Xn are normally distributed, then it turns out that we can calculate P exactly, even when n is small.

Case 1: The Population Variance is Known

Under the null hypothesis that μ = μ0, X1, …, Xn ∼ Normal(μ0, σ²) and

X̄n ∼ Normal(μ0, σ²/n).

This is the exact distribution of X̄n, not an asymptotic approximation. We convert X̄n to standard units, obtaining

Z = (X̄n − μ0)/(σ/√n) ∼ Normal(0, 1).  (9.2)


The observed value of Z is

z = (x̄n − μ0)/(σ/√n).

The significance probability is

P = Pμ0(|X̄n − μ0| ≥ |x̄n − μ0|)
  = Pμ0(|X̄n − μ0|/(σ/√n) ≥ |x̄n − μ0|/(σ/√n))
  = P(|Z| ≥ |z|)
  = 2P(Z ≥ |z|).

In this case, the test that rejects H0 if and only if P ≤ α is sometimes called the 1-sample z-test. The random variable Z is the test statistic.
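For readers working in a general-purpose language, the two-sided 1-sample z-test can be sketched as below (the function name and the scipy usage are my additions, not part of the text):

```python
from math import sqrt
from scipy.stats import norm

def z_test_p(xbar, mu0, sigma, n):
    """Two-sided 1-sample z-test significance probability, sigma known:
    P = 2 P(Z >= |z|) with z = (xbar - mu0) / (sigma / sqrt(n))."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * norm.sf(abs(z))  # norm.sf(z) = P(Z >= z)

# Illustrative numbers: n = 25, observed mean 1, known sigma = 3, mu0 = 0.
p = z_test_p(xbar=1.0, mu0=0.0, sigma=3.0, n=25)
print(round(p, 3))  # about 0.096
```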

Before considering the case of an unknown population variance, we remark that it is possible to derive point estimators from hypothesis tests. For testing H0 : μ = μ0 vs. H1 : μ ≠ μ0, the test statistics are

Z(μ0) = (X̄n − μ0)/(σ/√n).

If we observe X̄n = x̄n, then what value of μ0 minimizes |z(μ0)|? Clearly, the answer is μ0 = x̄n. Thus, our preferred point estimate of μ is the μ0 for which it is most difficult to reject H0 : μ = μ0. This type of reasoning will be extremely useful for analyzing situations in which we know how to test but don't know how to estimate.

Case 2: The Population Variance is Unknown

Statement (9.2) remains true if σ is unknown, but it is no longer possible to compute z. Therefore, we require a different test statistic for this case. A natural approach is to modify Z by replacing the unknown σ with an estimator of it. Toward that end, we introduce the test statistic

Tn = (X̄n − μ0)/(Sn/√n),

where Sn² is the unbiased estimator of the population variance defined by equation (8.1). Because Tn and Z are different random variables, they have different probability distributions and our first order of business is to determine the distribution of Tn.

We begin by stating a useful fact:


Theorem 9.1 If X1, …, Xn ∼ Normal(μ, σ²), then

(n − 1)Sn²/σ² = Σ_{i=1}^{n} (Xi − X̄n)²/σ² ∼ χ²(n − 1).

The χ² (chi-squared) distribution was described in Section 4.5, and Theorem 9.1 is closely related to Theorem 4.3. Next we write

Tn = (X̄n − μ0)/(Sn/√n) = [(X̄n − μ0)/(σ/√n)] · (σ/Sn) = Z/√(Sn²/σ²) = Z/√{[(n − 1)Sn²/σ²]/(n − 1)}.

Using Theorem 9.1, we see that Tn can be written in the form

Tn = Z/√(Y/ν),

where Z ∼ Normal(0, 1) and Y ∼ χ²(ν) with ν = n − 1. If Z and Y are independent random variables, then it follows from Definition 4.7 that Tn ∼ t(n − 1).
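This conclusion can be checked numerically: simulate many normal samples under H0, compute Tn for each, and compare the empirical quantiles with those of t(n − 1). A minimal sketch (all parameter choices are mine):

```python
import numpy as np
from scipy.stats import t

# Simulate Tn = (Xbar - mu0) / (Sn / sqrt(n)) under H0: mu = mu0 and
# compare empirical quantiles of Tn with the t(n-1) quantiles.
rng = np.random.default_rng(1)
n, reps, mu0, sigma = 10, 100_000, 0.0, 2.0
x = rng.normal(mu0, sigma, size=(reps, n))
tn = (x.mean(axis=1) - mu0) / (x.std(axis=1, ddof=1) / np.sqrt(n))

for q in (0.90, 0.95, 0.99):
    # empirical quantile vs. theoretical t(n-1) quantile
    print(round(np.quantile(tn, q), 2), round(t.ppf(q, df=n - 1), 2))
```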

Both Z and Y = (n − 1)Sn²/σ² depend on X1, …, Xn, so one would be inclined to think that Z and Y are dependent. This is usually the case, but it turns out that they are independent if X1, …, Xn ∼ Normal(μ, σ²). This is another remarkable property of normal distributions, usually stated as follows:

Theorem 9.2 If X1, …, Xn ∼ Normal(μ, σ²), then X̄n and Sn² are independent random variables.
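Theorem 9.2 can be illustrated by simulation: across many normal samples, the sample mean and sample variance are (empirically) uncorrelated, whereas for a skewed distribution they are not. A sketch, with arbitrary parameter choices of my own:

```python
import numpy as np

# Correlation of sample mean and sample variance across many samples.
rng = np.random.default_rng(2)
reps, n = 50_000, 10

# Normal samples: mean and variance are independent (Theorem 9.2).
norm_x = rng.normal(0.0, 1.0, size=(reps, n))
corr_norm = np.corrcoef(norm_x.mean(axis=1), norm_x.var(axis=1, ddof=1))[0, 1]

# Skewed (exponential) samples: mean and variance are clearly dependent.
exp_x = rng.exponential(1.0, size=(reps, n))
corr_exp = np.corrcoef(exp_x.mean(axis=1), exp_x.var(axis=1, ddof=1))[0, 1]

print(round(corr_norm, 2))  # near 0, consistent with independence
print(round(corr_exp, 2))   # clearly positive
```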

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The result that interests us can then be summarized as follows:

 

Corollary 9.1 If X1, …, Xn ∼ Normal(μ0, σ²), then

Tn = (X̄n − μ0)/(Sn/√n) ∼ t(n − 1).

Now let

tn = (x̄n − μ0)/(sn/√n),


the observed value of the test statistic Tn. The significance probability is

P = P(|Tn| ≥ |tn|) = 2P(Tn ≥ |tn|).

In this case, the test that rejects H0 if and only if P ≤ α is called Student's 1-sample t-test. Because it is rarely the case that the population variance is known when the population mean is not, Student's 1-sample t-test is used much more frequently than the 1-sample z-test. We will use the S-Plus function pt to compute significance probabilities for Student's 1-sample t-test, as illustrated in the following examples.

Example 1 Test H0 : μ = 0 vs. H1 : μ ≠ 0, a 2-sided alternative.

• Suppose that n = 25 and that we observe x̄ = 1 and s = 3.

• Then t = (1 − 0)/(3/√25) ≈ 1.67 and the 2-tailed significance probability is computed using both tails of the t(24) distribution, i.e. P = 2 · pt(−1.67, df = 24) ≈ 0.11.

Example 2 Test H0 : μ ≤ 0 vs. H1 : μ > 0, a 1-sided alternative.

• Suppose that n = 25 and that we observe x̄ = 2 and s = 5.

• Then t = (2 − 0)/(5/√25) = 2.00 and the 1-tailed significance probability is computed using one tail of the t(24) distribution, i.e. P = 1 − pt(2.00, df = 24) ≈ 0.028.
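The same computations can be done in Python (my translation: scipy.stats.t.cdf plays the role of the S-Plus function pt):

```python
from math import sqrt
from scipy.stats import t

# Example 1: two-sided test, n = 25, xbar = 1, s = 3, mu0 = 0.
t1 = (1 - 0) / (3 / sqrt(25))       # observed t, about 1.67
p1 = 2 * t.cdf(-abs(t1), df=24)     # both tails of t(24)

# Example 2: one-sided test, n = 25, xbar = 2, s = 5, mu0 = 0.
t2 = (2 - 0) / (5 / sqrt(25))       # observed t, 2.00
p2 = t.sf(t2, df=24)                # one tail: 1 - pt(2.00, df = 24)

print(round(p1, 3), round(p2, 3))
```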

9.1.3 Interval Estimation

As in Section 8.5, we will derive confidence intervals from tests. We imagine testing H0 : μ = μ0 versus H1 : μ ≠ μ0 for every μ0 ∈ (−∞, ∞). The μ0 for which H0 : μ = μ0 is rejected are implausible values of μ; the μ0 for which H0 : μ = μ0 is accepted constitute the confidence interval. To accomplish this, we will have to derive the critical values of our tests. A significance level of α will result in a confidence coefficient of 1 − α.

Case 1: The Population Variance is Known

If σ is known, then we reject H0 : μ = μ0 if and only if

P = Pμ0(|X̄n − μ0| ≥ |x̄n − μ0|) = 2Φ(−|zn|) ≤ α,


where zn = (x̄n − μ0)/(σ/√n). By the symmetry of the normal distribution, this condition is equivalent to the condition

1 − Φ(−|zn|) = P(Z > −|zn|) = P(Z < |zn|) = Φ(|zn|) ≥ 1 − α/2,

where Z ∼ Normal(0, 1), and therefore to the condition |zn| ≥ qz, where qz denotes the 1 − α/2 quantile of Normal(0, 1). The quantile qz is the critical value of the two-sided 1-sample z-test. Thus, given a significance level α and

a corresponding critical value qz, we reject H0 : μ = μ0 if and only if (iff)

|x̄n − μ0|/(σ/√n) = |zn| ≥ qz,

which is equivalent to

μ0 ∉ (x̄n − qz σ/√n, x̄n + qz σ/√n),

and we conclude that the desired set of plausible values is the interval

(x̄n − qz σ/√n, x̄n + qz σ/√n).

Notice that both the preceding derivation and the resulting confidence interval are identical to the derivation and confidence interval in Section 8.5. The only difference is that, because we are now assuming that X1, …, Xn ∼ Normal(μ, σ²) instead of relying on the Central Limit Theorem, no approximation is required.

Example 3 Suppose that we desire 90% confidence about μ and σ = 3 is known. Then α = 0.10 and qz ≈ 1.645. Suppose that we draw n = 25 observations and observe x̄n = 1. Then

1 ± 1.645 · 3/√25 = 1 ± 0.987 = (0.013, 1.987)

is a 0.90-level confidence interval for μ.
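The same interval in Python (my translation; scipy's norm.ppf supplies the quantile qz):

```python
from math import sqrt
from scipy.stats import norm

# 90% z-interval: sigma = 3 known, n = 25, observed mean 1.
alpha = 0.10
qz = norm.ppf(1 - alpha / 2)        # 1 - alpha/2 quantile, about 1.645
xbar, sigma, n = 1.0, 3.0, 25
half = qz * sigma / sqrt(n)
print(round(xbar - half, 3), round(xbar + half, 3))  # about (0.013, 1.987)
```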

Case 2: The Population Variance is Unknown

If σ is unknown, then it must be estimated from the sample. The reasoning in this case is the same, except that we rely on Student's 1-sample t-test.

As before, we use Sn² to estimate σ². The critical value of the 2-sided 1-sample t-test is qt, the 1 − α/2 quantile of a t distribution with n − 1 degrees of freedom, and the confidence interval is

(x̄n − qt sn/√n, x̄n + qt sn/√n).

 


Example 4 Suppose that we desire 90% confidence about μ and σ is unknown. Suppose that we draw n = 25 observations and observe x̄n = 1 and s = 3. Then qt = qt(0.95, df = 24) ≈ 1.711 and

1 ± 1.711 × 3/√25 = 1 ± 1.027 = (−0.027, 2.027)

is a 90% confidence interval for μ. Notice that the confidence interval is wider when we use s = 3 instead of σ = 3.
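And the corresponding t-interval in Python (again my translation, using scipy's t.ppf for the quantile):

```python
from math import sqrt
from scipy.stats import t

# 90% t-interval: sigma unknown, n = 25, observed mean 1, s = 3.
alpha = 0.10
q = t.ppf(1 - alpha / 2, df=24)     # 1 - alpha/2 quantile of t(24), ~1.711
xbar, s, n = 1.0, 3.0, 25
half = q * s / sqrt(n)
print(round(xbar - half, 3), round(xbar + half, 3))  # about (-0.027, 2.027)
```

Comparing with the z-interval above shows directly how replacing σ with s widens the interval.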


9.2 The General 1-Sample Location Problem

• Assume that X1, …, Xn ∼ P.

• Since P is not assumed to be symmetric, we must decide which location parameter is of interest. Because the population mean may not exist, we usually are interested in inferences about the population median M.

• We assume only that the Xi are continuous random variables.

9.2.1 Point Estimation

• The (only) natural estimator of the population median M is the sample median X̃n.

9.2.2 Hypothesis Testing

• As before, we initially consider testing a 2-sided alternative, H0 : M = M0 vs. H1 : M ≠ M0.

• Under H0, we would expect to observe X̃n = x̃n near M0, i.e. approximately half the data above M0 and half the data below M0.

• Let p+ = PH0(Xi > M0) and p− = PH0(Xi < M0). Because the Xi are continuous, p+ = p− = 0.5. Thus, observing whether Xi is greater or less than M0 is equivalent to tossing a fair coin, i.e. performing a Bernoulli trial.

• The Sign Test is the following procedure:
