
An Introduction To Statistical Inference And Data Analysis


CHAPTER 9. 1-SAMPLE LOCATION PROBLEMS

9.2. THE GENERAL 1-SAMPLE LOCATION PROBLEM

- Let $x_1, \ldots, x_n$ denote the observed sample. Because the $X_i$ are continuous, $P(X_i = M_0) = 0$ and we ought not to observe any $x_i = M_0$. In practice, of course, this may happen. For the moment, we assume that it has not.

- Let

  $$S_+ = \#\{X_i > M_0\} = \#\{X_i - M_0 > 0\}$$

  be the test statistic. Under $H_0$, $S_+ \sim \mathrm{Binomial}(n, .5)$.

- Let

  $$p = P_{H_0}\left( \left| S_+ - \frac{n}{2} \right| \ge \left| s_+ - \frac{n}{2} \right| \right)$$

  be the significance probability.

- The sign test rejects $H_0: M = M_0$ if and only if $p \le \alpha$.

- For small $n$, we compute $p$ using a table of binomial probabilities; for larger $n$, we use the normal approximation. Both techniques will be explained in the examples that follow.
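The significance probability defined above can also be computed directly instead of read from a table. The following is a minimal sketch, assuming no observations equal $M_0$; the function name `sign_test_two_sided` and the counts in the usage line are illustrative and not part of the notes.

```python
from math import comb

def sign_test_two_sided(s_plus: int, n: int) -> float:
    """Exact two-sided sign test p-value when S+ ~ Binomial(n, 0.5)."""
    observed_dev = abs(s_plus - n / 2)
    # Count the outcomes s whose distance from n/2 is at least the observed one.
    favorable = sum(comb(n, s) for s in range(n + 1)
                    if abs(s - n / 2) >= observed_dev)
    return favorable / 2 ** n

# Hypothetical counts for illustration: n = 8 observations, s+ = 1 above M0.
p_example = sign_test_two_sided(1, 8)   # 18/256, about .0703
```

Because the null distribution is symmetric about $n/2$, summing over both tails in this way agrees with doubling the smaller tail.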

• We now consider three strategies for dealing with the possibility that several $x_i = M_0$. We assume that these observations represent only a small fraction of the sample; otherwise, the assumption that the $X_i$ are continuous was not warranted.

- The most common practice is simply to discard the $x_i = M_0$ before performing the analysis. Notice, however, that this discards evidence that supports $H_0$, thereby increasing the probability of a Type I error, so this is a somewhat risky course of action.

- Therefore, it may be better to count half of the $x_i = M_0$ as larger than $M_0$ and half as smaller. If the number of these observations is odd, then this will result in a non-integer value of the test statistic $S_+$. To compute the significance probability in this case, we can either rely on the normal approximation, or compute two p-values, one corresponding to $S_+ + .5$ and one corresponding to $S_+ - .5$.

- Perhaps the most satisfying solution is to compute all of the significance probabilities that correspond to different ways of counting the $x_i = M_0$ as larger and smaller than $M_0$. Actually, it will suffice to compute the p-value that corresponds to counting all of the $x_i = M_0$ as larger than $M_0$ and the p-value that corresponds to counting all of the $x_i = M_0$ as smaller than $M_0$. If both of these p-values are less than (or equal to) the significance level $\alpha$, then clearly we will reject $H_0$. If neither is, then clearly we will not. If one is and one is not, then we will declare the evidence to be equivocal.

• Example 2.3 from Gibbons

- Suppose that we want to test $H_0: M = 10$ vs. $H_1: M \ne 10$ at significance level $\alpha = .05$.

- Suppose that we observe the following sample:

  9.8  10.1  9.7  9.9  10.0  10.0  9.8  9.7  9.8  9.9

- Note the presence of ties in the data, suggesting that the measurements should have been made (or recorded) more precisely. In particular, there are two instances in which $x_i = M_0$.

- If we discard the two $x_i = 10$, then $n = 8$, $s_+ = 1$, and

  $$\begin{aligned} p &= P(|S_+ - 4| \ge |1 - 4| = 3) \\ &= P(S_+ \le 1 \text{ or } S_+ \ge 7) \\ &= 2P(S_+ \le 1) \\ &= 2 \times .0352 = .0704, \end{aligned}$$

  from Table F in Gibbons (see handout).

- Since $p = .0704 > .05 = \alpha$, we decline to reject $H_0$.
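The three strategies for handling observations equal to $M_0$ can be compared on this sample. The sketch below is illustrative only; the helper `two_sided_p` and the variable names are assumptions, not notation from the notes or from Gibbons.

```python
from math import comb

def two_sided_p(s_plus: int, n: int) -> float:
    """Exact two-sided sign test p-value when S+ ~ Binomial(n, 0.5)."""
    dev = abs(s_plus - n / 2)
    return sum(comb(n, s) for s in range(n + 1)
               if abs(s - n / 2) >= dev) / 2 ** n

sample = [9.8, 10.1, 9.7, 9.9, 10.0, 10.0, 9.8, 9.7, 9.8, 9.9]
m0 = 10.0
n = len(sample)
above = sum(x > m0 for x in sample)   # 1 observation exceeds M0
ties = sum(x == m0 for x in sample)   # 2 observations equal M0

p_discard = two_sided_p(above, n - ties)      # ties dropped: 18/256, about .0703
p_split = two_sided_p(above + ties // 2, n)   # half up, half down: 112/1024, about .1094
p_ties_up = two_sided_p(above + ties, n)      # all ties counted above: 352/1024, about .3438
p_ties_down = two_sided_p(above, n)           # all ties counted below: 22/1024, about .0215
```

Under the third strategy, one bounding p-value falls below $\alpha = .05$ and the other does not, so by the rule stated earlier the evidence in this sample would be declared equivocal.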

• Example 2.4 from Gibbons

- Suppose that we want to test $H_0: M \le 625$ vs. $H_1: M > 625$ at significance level $\alpha = .05$.

- Suppose that we observe the following sample:

  612  619  628  631  640  643  649  655  663  670

- Here, $n = 10$, $s_+ = 8$, and

  $$p = P(S_+ \ge 8) = P(S_+ \le 2) = .0547,$$

  from Table F in Gibbons (see handout).

- Since $p = .0547 > .05 = \alpha$, we decline to reject $H_0$.
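The one-sided table lookup in this example can be reproduced by summing binomial probabilities. This is a minimal sketch; the function name `upper_tail` is an illustrative choice.

```python
from math import comb

def upper_tail(k: int, n: int) -> float:
    """P(S+ >= k) when S+ ~ Binomial(n, 0.5)."""
    return sum(comb(n, s) for s in range(k, n + 1)) / 2 ** n

sample = [612, 619, 628, 631, 640, 643, 649, 655, 663, 670]
s_plus = sum(x > 625 for x in sample)        # 8 observations exceed 625
p_value = upper_tail(s_plus, len(sample))    # 56/1024, about .0547
```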


• If $n > 20$, then we use the normal approximation to the binomial distribution. Since $S_+ \sim \mathrm{Binomial}(n, .5)$, $S_+$ has expected value $.5n$ and standard deviation $.5\sqrt{n}$. The normal approximation is

  $$P(S_+ \ge k) \doteq P\left( Z \ge \frac{k - .5 - .5n}{.5\sqrt{n}} \right),$$

  where $Z \sim N(0, 1)$.

• Example 2.4 (continued):

  $$P(S_+ \ge 8) \doteq P\left( Z \ge \frac{8 - .5 - 5}{.5\sqrt{10}} \doteq 1.58 \right) \doteq .0571.$$
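The continuity-corrected approximation can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist` for the normal cdf; the function name `sign_upper_tail_normal` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def sign_upper_tail_normal(k: int, n: int) -> float:
    """Normal approximation to P(S+ >= k), with the .5 continuity correction."""
    z = (k - 0.5 - 0.5 * n) / (0.5 * sqrt(n))
    return 1.0 - NormalDist().cdf(z)

# Example 2.4: n = 10, k = 8, so z is about 1.58.
approx = sign_upper_tail_normal(8, 10)   # about .057, versus the exact .0547
```

Small differences from the .0571 quoted in the text are to be expected, since the table lookup rounds $z$ to two decimals first.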

 

 

 

 

 

 

 

 

 

 

 

• Notice that the sign test will produce a maximal significance probability of $p = 1$ when $S_+ = S_- = .5n$. This means that the sign test is least likely to reject $H_0: M = M_0$ when $M_0$ is a median of the sample. Thus, using the sign test for testing hypotheses about population medians corresponds to using the sample median for estimating population medians, just as using Student's t-test for testing hypotheses about population means corresponds to using the sample mean for estimating population means.

• One consequence of the previous remark is that, when the population mean and median are identical, the "Pitman efficiency" of the sign test to Student's t-test equals the asymptotic relative efficiency of the sample median to the sample mean. For normal data this efficiency is $2/\pi \doteq .64$, so using the sign test on normal data is asymptotically equivalent to randomly discarding 36% of the observations, then using Student's t-test on the remaining 64%.

9.2.3 Interval Estimation

• We want to construct a $(1 - \alpha)$-level confidence interval for the population median $M$. We will do so by determining for which $M_0$ the level-$\alpha$ sign test of $H_0: M = M_0$ vs. $H_1: M \ne M_0$ will accept $H_0$.

• Suppose that we have ordered the data:

  $$x_{(1)} < x_{(2)} < \cdots < x_{(n-1)} < x_{(n)}$$


• The sign test rejects $H_0: M = M_0$ if $|S_+ - .5n|$ is large, i.e. $H_0$ will be accepted if $M_0$ is such that the numbers of observations above and below $M_0$ are roughly equal.

• Suppose that

  $$P(S_+ \le k) = P(S_+ \ge n - k) = \alpha/2.$$

  For $n \le 20$, we can use Table F to determine pairs of $(\alpha, k)$ that satisfy this equation. Notice that only certain $\alpha$ are possible, so that we may not be able to exactly achieve the desired level of confidence.

• Having determined an acceptable $(\alpha, k)$, the sign test would accept $H_0: M = M_0$ at level $\alpha$ if and only if

  $$x_{(k+1)} < M_0 < x_{(n-k)};$$

  hence, a $(1 - \alpha)$-level confidence interval for $M$ is

  $$(x_{(k+1)}, x_{(n-k)}).$$

• Remark: Since there is no fixed $M_0$ when constructing a confidence interval, we always use all of the data.

• Example 2.4 in Gibbons (continued): From Table F,

  $$P(S_+ \le 2) = P(S_+ \ge 8) = .0547;$$

  hence, a $(1 - 2 \times .0547) = .8906$-level confidence interval for $M$ is $(628, 655)$.
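The order-statistic interval and its achieved confidence level can be computed together. This is a sketch under the assumption of distinct observations; `sign_interval` is an illustrative name, not a function from any library.

```python
from math import comb

def sign_interval(sample, k: int):
    """Return (x_(k+1), x_(n-k), level) with level = 1 - 2 P(S+ <= k)."""
    xs = sorted(sample)
    n = len(xs)
    tail = sum(comb(n, s) for s in range(k + 1)) / 2 ** n   # P(S+ <= k)
    # Python lists are 0-based, so x_(k+1) is xs[k] and x_(n-k) is xs[n-k-1].
    return xs[k], xs[n - k - 1], 1 - 2 * tail

sample = [612, 619, 628, 631, 640, 643, 649, 655, 663, 670]
lower, upper, level = sign_interval(sample, k=2)   # (628, 655), level about .8906
```

Trying neighbouring values of `k` shows which confidence levels are actually attainable for a given $n$.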

• For $n > 20$, we can use the normal approximation to the binomial to determine $k$.

- If we specify $\alpha$ in

  $$P(S_+ \ge n - k) \doteq P\left( Z \ge \frac{n - k - .5 - .5n}{.5\sqrt{n}} = z \right) = \frac{\alpha}{2},$$

  then

  $$k = .5(n - 1 - z\sqrt{n}).$$

- For example, $\alpha = .05$ entails $z = 1.96$. If $n = 100$, then

  $$k = .5(100 - 1 - 1.96\sqrt{100}) = 39.7$$

  and the desired confidence interval is approximately

  $$(x_{(41)}, x_{(60)}),$$

  which is slightly liberal, or

  $$(x_{(40)}, x_{(61)}),$$

  which is slightly conservative.
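The choice between the liberal and conservative intervals amounts to rounding $k$ up or down. The sketch below reproduces the $n = 100$ calculation; `sign_interval_ranks` is a hypothetical helper name.

```python
from math import sqrt, floor, ceil

def sign_interval_ranks(n: int, z: float):
    """Return k = .5(n - 1 - z sqrt(n)) and the order-statistic ranks
    (k + 1, n - k) obtained by rounding k down and up."""
    k = 0.5 * (n - 1 - z * sqrt(n))
    conservative = (floor(k) + 1, n - floor(k))   # smaller k, wider interval
    liberal = (ceil(k) + 1, n - ceil(k))          # larger k, narrower interval
    return k, conservative, liberal

k, conservative, liberal = sign_interval_ranks(100, 1.96)
# k is about 39.7; conservative ranks (40, 61); liberal ranks (41, 60)
```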

9.3 The Symmetric 1-Sample Location Problem

• Assume that $X_1, \ldots, X_n \sim P$.

• We assume that the $X_i$ are continuous random variables with symmetric pdf $f$. Let $\theta$ denote the center of symmetry. Note, in particular, that $\theta = M$, the population median.

9.3.1 Hypothesis Testing

• As before, we initially consider testing a 2-sided alternative, $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$.

• Let $D_i = X_i - \theta_0$. Because the $X_i$ are continuous, $P(D_i = 0) = 0$ and $P(|D_i| = |D_j|) = 0$ for $i \ne j$. Therefore, we can rank the absolute differences as follows:

  $$|D_{i_1}| < |D_{i_2}| < \cdots < |D_{i_n}|.$$

  Let $R_i$ denote the rank of $|D_i|$.


• The Wilcoxon Signed Rank Test is the following procedure:

- Let $x_1, \ldots, x_n$ denote the observed sample and let $d_i = x_i - \theta_0$. Initially, we assume that no $d_i = 0$ or $|d_i| = |d_j|$ were observed.

- We define two test statistics,

  $$T_+ = \sum_{D_{i_k} > 0} k,$$

  the sum of the "positive ranks," and

  $$T_- = \sum_{D_{i_k} < 0} k,$$

  the sum of the "negative ranks."

- Notice that

  $$T_+ + T_- = \sum_{k=1}^{n} k = n(n+1)/2,$$

  so that it suffices to consider only $T_+$ (or $T_-$, whichever is more convenient).

- By symmetry, under $H_0: \theta = \theta_0$ we have

  $$ET_+ = ET_- = n(n+1)/4.$$

- The Wilcoxon signed rank test rejects $H_0$ if and only if we observe $T_+$ sufficiently different from $ET_+$, i.e. if and only if

  $$p = P_{H_0}\left( |T_+ - n(n+1)/4| \ge |t_+ - n(n+1)/4| \right) \le \alpha.$$

• For $n \le 15$, we can compute the significance probability $p$ from Table G in Gibbons.

• Example 3.1 from Gibbons

- Suppose that we want to test $H_0: M = 10$ vs. $H_1: M \ne 10$ at significance level $\alpha = .05$.

- Suppose that we observe the following sample:

    x_i      d_i     r_i
    9.83    -.17      6
   10.09     .09      3
    9.72    -.28     10
    9.87    -.13      5
   10.04     .04      1
    9.95    -.05      2
    9.82    -.18      7
    9.73    -.27      9
    9.79    -.21      8
    9.90    -.10      4

- Then $n = 10$, $ET_+ = 10(11)/4 = 27.5$, $t_+ = 3 + 1 = 4$, and

  $$\begin{aligned} p &= P(|T_+ - 27.5| \ge |4 - 27.5| = 23.5) \\ &= P(T_+ \le 4 \text{ or } T_+ \ge 51) \\ &= 2P(T_+ \le 4) \\ &= 2 \times .007 = .014, \end{aligned}$$

  from Table G in Gibbons (see handout).

- Since $p = .014 < .05 = \alpha$, we reject $H_0$.
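The statistic $t_+$ and the exact significance probability in this example can be reproduced without a table. The sketch below assumes no zero differences and no tied absolute differences; the function names are illustrative. Under $H_0$ every subset of the ranks $1, \ldots, n$ is equally likely to be the set of positive ranks, so the null distribution is obtained by counting subsets with each possible sum.

```python
def signed_rank_statistic(sample, theta0):
    """Sum of the ranks of |x_i - theta0| over the positive differences."""
    diffs = [x - theta0 for x in sample]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

def null_counts(n):
    """counts[t] = number of subsets of {1, ..., n} whose elements sum to t."""
    counts = [1] + [0] * (n * (n + 1) // 2)
    for r in range(1, n + 1):
        for t in range(len(counts) - 1, r - 1, -1):
            counts[t] += counts[t - r]
    return counts

def wilcoxon_two_sided(t_plus, n):
    """Exact two-sided p-value P(|T+ - n(n+1)/4| >= |t+ - n(n+1)/4|)."""
    center = n * (n + 1) / 4
    dev = abs(t_plus - center)
    favorable = sum(c for t, c in enumerate(null_counts(n))
                    if abs(t - center) >= dev)
    return favorable / 2 ** n

sample = [9.83, 10.09, 9.72, 9.87, 10.04, 9.95, 9.82, 9.73, 9.79, 9.90]
t_plus = signed_rank_statistic(sample, 10)       # 3 + 1 = 4
p_value = wilcoxon_two_sided(t_plus, len(sample))  # 14/1024, about .014
```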

• For $n \ge 16$, we convert $T_+$ to standard units and use the normal approximation:

- Under $H_0: \theta = \theta_0$, $ET_+ = n(n+1)/4$, and

  $$\mathrm{Var}\,T_+ = n(n+1)(2n+1)/24.$$

- For $n$ sufficiently large,

  $$Z = \frac{T_+ - ET_+}{\sqrt{\mathrm{Var}\,T_+}} \overset{\cdot}{\sim} N(0, 1).$$

- In the above example,

  $$\mathrm{Var}\,T_+ = 10(11)(21)/24 = 96.25$$

  and

  $$z = \frac{t_+ - ET_+}{\sqrt{\mathrm{Var}\,T_+}} = \frac{4 - 27.5}{\sqrt{96.25}} \doteq -2.40,$$

  which gives an approximate significance probability of

  $$\begin{aligned} p &= 2P(Z \le z = -2.40) \\ &= 2\,[.5 - P(0 \le Z < 2.40)] \\ &= 2(.5 - .4918) \\ &= .0164. \end{aligned}$$
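The standardization above is easy to check numerically. The sketch below follows the text in applying no continuity correction and assumes no ties; the function name `signed_rank_normal_p` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_normal_p(t_plus: float, n: int):
    """Return (z, two-sided p) from the normal approximation to T+ under H0."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus - mean) / sqrt(var)
    return z, 2 * NormalDist().cdf(-abs(z))

z, p_approx = signed_rank_normal_p(4, 10)
# z is about -2.395 and p is about .0166; the .0164 in the text comes from
# rounding z to -2.40 before the table lookup.
```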

• Ties. Now suppose that the $|d_i| > 0$, but are not necessarily distinct. If the number of ties is small, then one can perform the test using each possible ordering of the $|d_i|$. Otherwise:

- If several $|D_i|$ are tied, then each is assigned the average of the ranks to be assigned to that set of $|D_i|$. These ranks are called midranks. For example, if we observe $|d_i| = 8, 9, 10, 10, 12$, then the midranks are $r_i = 1, 2, 3.5, 3.5, 5$.

- We then proceed as above using the midranks. Since Table G was calculated on the assumption of no ties, we must use the normal approximation. The formula for $ET_+$ is identical, but the formula for $\mathrm{Var}\,T_+$ becomes more complicated.

- Suppose that there are $J$ distinct values of $|D_i|$. Let $u_j$ denote the number of $|D_i|$ equalling the $j$th distinct value. Then

  $$\mathrm{Var}\,T_+ = \frac{n(n+1)(2n+1)}{24} - \frac{1}{48} \sum_{j=1}^{J} (u_j^3 - u_j).$$

- Notice that, if $u_j = 1$ (as typically will be the case for most of the values), then $u_j^3 - u_j = 0$.

• If any $d_i = 0$, i.e. if any $x_i = \theta_0$, then we can adopt any of the strategies that we used with the sign test when we observed $x_i = M_0$.
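The midrank assignment and the tie-corrected variance described under "Ties" can be sketched as follows. The function names are illustrative, and ties are detected by exact equality, which is an assumption that suits integer or consistently rounded data rather than arbitrary floating-point values.

```python
from collections import Counter

def midranks(values):
    """Assign each value the average of the ranks its tied group occupies."""
    counts = Counter(values)
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        u = counts[value]
        # The group occupies ranks next_rank, ..., next_rank + u - 1.
        rank_of[value] = next_rank + (u - 1) / 2
        next_rank += u
    return [rank_of[v] for v in values]

def tie_corrected_variance(abs_diffs):
    """Var T+ = n(n+1)(2n+1)/24 - (1/48) * sum over groups of (u^3 - u)."""
    n = len(abs_diffs)
    correction = sum(u ** 3 - u for u in Counter(abs_diffs).values())
    return n * (n + 1) * (2 * n + 1) / 24 - correction / 48

abs_d = [8, 9, 10, 10, 12]
ranks = midranks(abs_d)                    # [1.0, 2.0, 3.5, 3.5, 5.0]
variance = tie_corrected_variance(abs_d)   # 13.75 - 6/48 = 13.625
```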

9.3.2 Point Estimation

• We derive an estimator $\hat{\theta}$ of $\theta$ by determining the value of $\theta_0$ for which the Wilcoxon signed rank test is least inclined to reject $H_0: \theta = \theta_0$ in favor of $H_1: \theta \ne \theta_0$. Our derivation relies on a clever trick.

• Suppose that

  $$x_{(1)} < \cdots < x_{(k)} < \theta_0 < x_{(k+1)} < \cdots < x_{(n)}.$$


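• Notice that, if $i \le j \le k$, then

  $$(x_{(i)} - \theta_0) + (x_{(j)} - \theta_0) < 0.$$

• For $j = k + 1, \ldots, n$,

  $$\begin{aligned} r_j &= \text{rank of } d_j = |x_{(j)} - \theta_0| \\ &= \#\{i : i \le j,\ |x_{(i)} - \theta_0| \le |x_{(j)} - \theta_0|\} \\ &= \#\{i : i \le j,\ -(x_{(i)} - \theta_0) \le x_{(j)} - \theta_0\} \\ &= \#\{i : i \le j,\ (x_{(i)} - \theta_0) + (x_{(j)} - \theta_0) \ge 0\}. \end{aligned}$$

• Therefore,

  $$\begin{aligned} t_+ &= r_{k+1} + \cdots + r_n \\ &= \#\{i \le j : (x_{(i)} - \theta_0) + (x_{(j)} - \theta_0) \ge 0\} \\ &= \#\{i \le j : (x_i - \theta_0) + (x_j - \theta_0) \ge 0\}. \end{aligned}$$

• We know that $H_0: \theta = \theta_0$ is most difficult to reject if $t_+ = ET_+ = n(n+1)/4$. Since there are $n(n+1)/2$ pairs with $i \le j$, our new representation of $t_+$ shows that this occurs when half of the $(x_i - \theta_0) + (x_j - \theta_0)$ are positive and half are negative; i.e. when $2\theta_0$ is the median of the pairwise sums $(x_i + x_j)$; i.e. when $\theta_0$ is the median of the pairwise averages $(x_i + x_j)/2$.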

• The pairwise averages $(x_i + x_j)/2$, for $1 \le i \le j \le n$, are sometimes called the Walsh averages. The estimator $\hat{\theta}$ of $\theta$ that corresponds to the Wilcoxon signed rank test is the median of the Walsh averages.
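Both the Walsh-average representation of $t_+$ and the resulting estimator can be illustrated on the Example 3.1 sample. This is a minimal sketch; `walsh_averages` is an illustrative name, and the numerical value of the estimate is left for the reader to inspect rather than quoted here.

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(sample):
    """All pairwise averages (x_i + x_j)/2 with i <= j, including i == j."""
    return [(a + b) / 2 for a, b in combinations_with_replacement(sample, 2)]

sample = [9.83, 10.09, 9.72, 9.87, 10.04, 9.95, 9.82, 9.73, 9.79, 9.90]
walsh = walsh_averages(sample)            # n(n+1)/2 = 55 averages

# t+ equals the number of Walsh averages exceeding theta0 = 10.
t_plus = sum(w > 10 for w in walsh)       # 4, matching the rank-based value

theta_hat = median(walsh)                 # median of the Walsh averages
```

Enumerating all pairs costs on the order of $n^2$ operations, which is harmless at these sample sizes.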

• The following table reports the asymptotic relative efficiency of $\hat{\theta}$ to $\bar{X}$ for estimating the center of symmetry of several symmetric distributions.

    Family                  ARE
    Normal                  $3/\pi \doteq .955$
    Logistic                $\pi^2/9 \doteq 1.097$
    Double Exponential      $1.5$
    Uniform                 $1.0$
    $\sigma^2 < \infty$     $\ge .864$

9.3.3 Interval Estimation

• We construct a $(1 - \alpha)$-level confidence interval for $\theta$ by including $\theta_0$ in the interval if and only if the Wilcoxon signed rank test accepts $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$ at significance level $\alpha$. As we found when deriving confidence intervals from the sign test, not all levels are possible.

• From the preceding section, we know that we can represent the test statistic $T_+$ as the number of Walsh averages that exceed $\theta_0$. Because we reject if this number is either too large or too small, we accept if there are sufficient numbers of Walsh averages below and above $\theta_0$. Hence, the desired confidence interval must consist of those $\theta_0$ for which at least $k - 1$ Walsh averages are $\le \theta_0$ and at least $k - 1$ Walsh averages are $\ge \theta_0$. The number $k$ is determined by the level of confidence that is desired.

• For example, suppose that we desire the level of confidence to be $1 - \alpha = .90$, so that $\alpha/2 = .05$.

- Suppose that we observe $n = 8$ values: -1  2  3  4  5  6  9  13

- The $n(n+1)/2 = 36$ Walsh averages are:

     -1      2      3      4      5      6      9     13
     .5    2.5    3.5    4.5    5.5    7.5     11
      1      3      4      5      7    9.5
    1.5    3.5    4.5    6.5      9
      2      4      6    8.5
   (2.5)   5.5    (8)
      4    7.5
      6

- For $n = 8$ in Table G, $p = P(T_+ \le 6) = P(T_+ \ge 30) = .055$. Hence, we would reject $H_0: \theta = \theta_0$ at $\alpha = .11$ if and only if $\le 6$ Walsh averages are $\le \theta_0$ or $\ge 30$ Walsh averages are $\le \theta_0$.

- Hence, the $.89$-level confidence interval for $\theta$ should have a lower endpoint equal to the $(k = 7)$th smallest Walsh average and an upper endpoint equal to the $(n(n+1)/2 + 1 - k = 30)$th smallest Walsh average. By inspection, the confidence interval is $[2.5, 8.0]$. Notice that the endpoints are included.
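The interval found by inspection can be reproduced by sorting the Walsh averages and reading off the 7th and 30th. The sketch below also recomputes the tail probability that Table G supplies; the function name `lower_tail` is an illustrative choice, and the null distribution assumes no ties among the ranks.

```python
from itertools import combinations_with_replacement

def lower_tail(t: int, n: int) -> float:
    """P(T+ <= t) under H0, counting subsets of {1, ..., n} by their sum."""
    counts = [1] + [0] * (n * (n + 1) // 2)
    for r in range(1, n + 1):
        for s in range(len(counts) - 1, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[: t + 1]) / 2 ** n

sample = [-1, 2, 3, 4, 5, 6, 9, 13]
n = len(sample)
walsh = sorted((a + b) / 2 for a, b in combinations_with_replacement(sample, 2))

tail = lower_tail(6, n)            # 14/256, about .055
k = 7
lower = walsh[k - 1]               # 7th smallest Walsh average: 2.5
upper = walsh[len(walsh) - k]      # 30th smallest Walsh average: 8.0
level = 1 - 2 * tail               # about .89
```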

• For $n \ge 16$, we can use the normal approximation to determine $k$. The formula is

  $$k = 0.5 + ET_+ - z_{\alpha/2} \sqrt{\mathrm{Var}\,T_+}.$$
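The formula for $k$ can be evaluated as follows. This is a sketch assuming no ties, so that $\mathrm{Var}\,T_+ = n(n+1)(2n+1)/24$; the name `walsh_rank` and the sample size $n = 16$ in the usage line are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def walsh_rank(n: int, alpha: float) -> float:
    """k = 0.5 + E[T+] - z_{alpha/2} * sqrt(Var T+), assuming no ties."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile of N(0, 1)
    return 0.5 + mean - z * sqrt(var)

k_16 = walsh_rank(16, 0.05)   # roughly 30.6 for a nominal .95-level interval

# Sanity check outside the recommended range: n = 8 with alpha/2 = .05 gives
# a value near 6.75, close to the exact k = 7 used in the example above.
k_8 = walsh_rank(8, 0.10)
```

Since $k$ is generally not an integer, it has to be rounded; rounding down yields the wider interval, by analogy with the conservative choice made for the sign test interval.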
