Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

An Introduction To Statistical Inference And Data Analysis

.pdf
Скачиваний:
2
Добавлен:
10.07.2022
Размер:
943.46 Кб
Скачать

200

CHAPTER 9. 1-SAMPLE LOCATION PROBLEMS

9.4A Case Study from Neuropsychology

McGlynn and Kaszniak (1991) investigated awareness of cognitive de¯cit in patients su®ering from Alzheimer's disease (AD). They recruited 8 pairs of AD patients and their spousal caregivers (CG). An examiner described a neuropsychological task to each subject, asked the subject to predict how both he/she and his/her spouse would perform on it, and then administered the task. For this experiment:

²A unit of observation was a matched AD-CG pair.

²Six measurements were taken on each unit of observation:

ppppatient prediction of patient pscor patient score

ppc patient prediction of caregiver cscor caregiver score

cpc caregiver prediction of caregiver cpp caregiver prediction of patient

²Trosset & Kaszniak (1996) proposed Comparative Predictive Accuracy (CPA) as a measure of de¯cit unawareness:

CPA =

(ppp=pscor) ¥ (ppc=cscor)

(cpc=cscor) ¥ (cpp=pscor)

²Let Xi = log(CPAi), the logarithm of the AD de¯cit unawareness observed for pair i.

9.5. EXERCISES

201

9.5Exercises

Problem Set A The following data are from Darwin (1876), The E®ect of Crossand Self-Fertilization in the Vegetable Kingdom, Second Edition, London: John Murray. Pairs of seedlings of the same age (one produced by cross-fertilization, the other by self-fertilization) were grown together so that the members of each pair were reared under nearly identical conditions. The aim was to demonstrate the greater vigour of the cross-fertilized plants. The data are the ¯nal heights (in inches) of each plant after a ¯xed period of time. Darwin consulted Francis Galton about the analysis of these data, and they were discussed further in Ronald Fisher's Design of Experiments.

Pair

Fertilized

 

Cross

Self

 

 

 

1

23.5

17.4

2

12.0

20.4

 

21.0

20.0

3

4

22.0

20.0

 

 

18.4

5

19.1

6

21.5

18.6

 

 

18.6

7

22.1

8

20.4

15.3

9

18.3

16.5

1021.6 18.0

1123.3 16.3

1221.0 18.0

1322.1 12.8

1423.0 15.5

1512.0 18.0

1.Show that this problem can be formulated as a 1-sample location problem. To do so, you should:

(a)Identify the experimental units and the measurement(s) taken on each unit.

(b)De¯ne appropriate random variables X1; : : : ; Xn » P . Remember that the statistical procedures that we will employ assume that these random variables are independent and identically distributed.

202

CHAPTER 9. 1-SAMPLE LOCATION PROBLEMS

(c)Let µ denote the location parameter (measure of centrality) of interest. Depending on which statistical procedure we decide to use, either µ = EXi = ¹ or µ = q2(Xi). State appropriate null and alternative hypotheses about µ.

2.Does it seem reasonable to assume that the sample ~x = (x1; : : : ; xn), the observed values of X1; : : : ; Xn, were drawn from:

(a)a normal distribution? Why or why not?

(b)a symmetric distribution? Why or why not?

3.Assume that X1; : : : ; Xn are normally distributed and let µ = EXi = ¹.

(a)Test the null hypothesis derived above using Student's 1-sample t-test. What is the signi¯cance probability? If we adopt a significance level of ® = 0:05, should we reject the null hypothesis?

(b)Construct a (2-sided) con¯dence interval for µ with a con¯dence coe±cient of approximately 0:90.

4.Now we drop the assumption of normality. Assume that X1; : : : ; Xn are symmetric (but not necessarily normal), continuous random variables and let µ = q2(Xi).

(a)Test the null hypothesis derived above using the Wilcoxon signed rank test. What is the signi¯cance probability? If we adopt a signi¯cance level of ® = 0:05, should we reject the null hypothesis?

(b)Estimate µ by computing the median of the Walsh averages.

(c)Construct a (2-sided) con¯dence interval for µ with a con¯dence coe±cient of approximately 0:90.

5.Finally we drop the assumption of symmetry, assuming only that X1; : : : ; Xn are continuous random variables, and let µ = q2(Xi).

(a)Test the null hypothesis derived above using the sign test. What is the signi¯cance probability? If we adopt a signi¯cance level of ® = 0:05, should we reject the null hypothesis?

(b)Estimate µ by computing the sample median.

(c)Construct a (2-sided) con¯dence interval for µ with a con¯dence coe±cient of approximately 0:90.

Chapter 10

2-Sample Location Problems

²The title of this chapter indicates an interest in comparing the location parameters of two populations. That is, we assume that:

{X1; : : : ; Xn1 » P1 and Y1; : : : ; Yn2 » P2. Notice that we do not assume that n1 = n2.

{The Xi and the Yj are continuous random variables.

{The Xi and the Yj are mutually independent. In particular, there is no natural pairing of X1 with Y1, etc.

²We observe random samples x1; : : : ; xn1 and y1; : : : ; yn2 . From the samples, we attempt to draw an inference about the di®erence in location

of P1 and P2. This di®erence, which we will denote by ¢, is called the shift parameter. For example, we might de¯ne ¢ = ¹1 ¡¹2, where

¹1 = EXi and ¹2 = EYj.

²Each of the structures that we encountered in 1-sample problems may also be encountered in 2-sample problems. What distinguishes the two are that the units of observations are drawn from one population in the former and from two populations in the latter. The prototypical case of the latter is that of a treatment population and a control population. We now consider some examples.

²Example 1: A researcher investigated the e®ect of Alzheimer's disease (AD) on the ability to perform a confrontation naming task. She recruited 60 mildly demented AD patients and 60 normal elderly control subjects. The control subjects resembled the AD patients in that the

203

204

CHAPTER 10. 2-SAMPLE LOCATION PROBLEMS

two groups had comparable mean ages, years of education, and (estimated) IQ scores; however, the control subjects were not individually matched to the AD patients. Each person was administered the Boston Naming Test. For this experiment,

{An experimental unit is a person.

{The experimental units belong to one of two populations: AD patients or normal elderly persons.

{One measurement (BNT) is taken on each unit of observation.

{Let Xi denote the BNT score for AD patient i. Let Yj denote the BNT score for control subject j.

{Let ¹1 = EXi, ¹2 = EYj, and ¢ = ¹1 ¡ ¹2.

²Example 2: A drug is supposed to lower blood pressure. To determine if it does, n1 +n2 hypertensive patients are recruited to participate in a double-blind study. The patients are randomly assigned to a treatment group of n1 patients and a control group of n2 patients. Each patient in the treatment group receives the drug for two months; each patient in the control group receives a placebo for the same period. Each patient's blood pressure is measured before and after the two month period, and neither the patient nor the technician know to which group the patient was assigned. For this experiment,

{An experimental unit is a patient.

{The experimental units belong to one of two populations: patients receiving the drug or patients receiving the placebo.

205

{Two measurements (blood pressure before & after) are taken on each unit of observation.

{Let B1i and A1i denote the before & after blood pressures of patient i in the treatment group. Similarly, let B2j and A2j denote the before & after blood pressures of patient j in the control group.

{Let Xi = B1i ¡A1i, the decrease in blood pressure for patient i in the treatment group. Let Yj = B2j ¡ A2j, the decrease in blood pressure for patient j in the control group.

{Let ¹1 = EXi, ¹2 = EYj, and ¢ = ¹1 ¡ ¹2.

²Example 3: A graduate student decides to compare the e®ects of Parkinson's disease (PD) and multiple sclerosis (MS) on speech breathing. She recruits n1 PD patients and n2 MS patients to participate in her study. She also recruits n1 + n2 normal control (NC) subjects. Each NC subject is carefully matched to one PD or MS patient with respect to sex, age, height, and weight. The lung volume of each study participant is measured. For this experiment,

{An experimental unit is a matched pair of subjects.

{The experimental units belong to one of two populations: PD-NC pairs or MS-NC pairs.

{Two measurements (lung volume of each subject) are taken on each unit of observation.

{Let D1i and C1i denote the PD & NC lung volumes of pair i in the PD-NC group. Similarly, let D2j and C2j denote the MS & NC lung volumes of pair j in the MS-NC group.

206

CHAPTER 10. 2-SAMPLE LOCATION PROBLEMS

{Let Xi = log(D1i=C1i) = log D1i¡log C1i, the logarithm of the PD proportion of NC lung volume for pair i. Let Yj = log(D2j=C2j) = log D2j ¡log C2j, the logarithm of the MS proportion of NC lung volume for pair j.

{Let ¹1 = EXi, ¹2 = EYj, and ¢ = ¹1 ¡ ¹2.

²This chapter is subdivided into three sections:

{If the data are assumed to be normally distributed, then we will be interested in inferences about the di®erence in the population means. We will distinguish three cases, corresponding to what is known about the population variances.

{If the data are only assumed to be continuously distributed, then we will be interested in inferences about the di®erence in the population medians. We will assume a shift model, i.e. we will assume that P1 & P2 di®er only with respect to location.

{If the data are also assumed to be symmetrically distributed, then we will be interested in inferences about the di®erence in the population centers of symmetry. If we assume symmetry, then we need not assume a shift model.

10.1The Normal 2-Sample Location Problem

²Assume that X1; : : : ; Xn1 » N(¹1; ¾12) and Y1; : : : ; Yn2 » N(¹2; ¾22) are mutually independent.

²Point Estimation

10.1. THE NORMAL 2-SAMPLE LOCATION PROBLEM

207

^ ¹

¹

{ The natural estimator of ¢ = ¹1 ¡ ¹2 is ¢ = X ¡ Y .

^

 

 

 

 

 

 

{ ¢ is an unbiased, consistent, asymptotically e±cient estimator of

¢.

 

 

 

 

 

 

 

 

 

¹

»

² For Interval Estimation and Hypothesis Testing, we note that X

2

¹

 

2

 

 

 

Normal(¹1; ¾1

=n1) and Y » Normal(¹2; ¾2 =n2). It follows that

 

 

¾2

¾2

 

 

 

1

2

 

 

 

¢^ » Normal â;

 

+

 

! :

 

 

n1

n2

 

We now distinguish three cases:

1.Both ¾i are known (and possibly unequal);

2.The ¾i are unknown, but they are necessarily equal (¾1 = ¾2 = ¾); and

3.The ¾i are unknown and possibly unequal|the Behrens-Fisher problem.

10.1.1Known Variances

² Let

^ ¡

Z = r¢ ¢ » N(0; 1):

¾2 ¾2

n+ n2

1 2

²Interval Estimation. We construct a (1 ¡ ®)-level con¯dence interval for ¢ by writing

1 ¡ ® = P (jZj < z®=2)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1

 

 

 

 

 

 

 

 

 

=

P

¢^

 

 

¢

< z®=2

 

 

 

¾12

+

¾22

 

 

 

 

 

 

 

 

 

 

¡

sn1

 

 

 

 

 

 

 

 

 

 

 

 

@j

 

 

j

 

 

 

 

 

n2 A

 

 

 

 

 

 

 

 

 

=

P

0¢^

¡

z®=2

 

¾12

+

 

¾22

 

 

< ¢ < ¢^

¡

z®=2

 

 

¾12

+

¾22

 

1

 

n2

 

 

 

 

@

 

 

 

sn1

 

 

 

 

 

 

 

sn1

n2 A

²Example. Suppose that n1 = 57, x¹ = :0167, and ¾1 = :0042; suppose that n2 = 12, y¹ = :0144, and ¾2 = :0024. Then a :95-level con¯dence interval for ¢ is

208

CHAPTER 10. 2-SAMPLE LOCATION PROBLEMS

 

(:0167 ¡ :0144) § 1:96q

 

=:

:0023 § :0017

 

:00422=57 + :00242=12

 

=

(:0006; :0040):

²Hypothesis Testing. To test H0 : ¢ = ¢0 vs. H1 : ¢ 6= ¢0, we consider the test statistic Z under the null hypothesis that ¢ = ¢0. Let z denote the observed value of Z. Then a level-® test is to reject H0 if and only if

P = P (jZj ¸ jzj) · ®;

which is equivalent to rejecting H0 if and only if

jzj ¸ z®=2:

This test is sometimes called the 2-sample z-test.

² Example (continued). To test H0 : ¢ = 0 vs. H1 : ¢ 6= 0, we compute

z =

 

(:0167 ¡ :0144) ¡ 0

=: 2:59:

p:00422=57 + :00242=12

 

 

Since j2:59j > 1:96, we reject: H0 at signi¯cance level ® = :05. (The signi¯cance probability is P = :010.)

10.1.2Equal Variances

²Let ¾ = ¾1 = ¾2. Since the common variance ¾2 is unknown, we must estimate it.

²Let

 

 

 

 

¡

 

n1

 

 

 

 

 

 

Xi

 

2

 

 

 

1

 

¹

2

S1

=

n1

 

1

=1(Xi ¡ X)

 

denote the sample variance for the Xi; let

 

 

¡

 

n2

 

 

 

 

 

jX

 

 

2

 

1

 

 

¹

2

S2 =

n2

 

1

=1(Yj ¡ Y )

 

denote the sample variance for the Yj; let

10.1. THE NORMAL 2-SAMPLE LOCATION PROBLEM

209

S2 =

(n1 ¡ 1)S12 + (n2 ¡ 1)S22

 

 

 

 

 

 

 

(n1 ¡ 1) + (n2 ¡ 1)

 

 

 

 

 

 

 

=

1

 

 

2 n1 (Xi

 

X¹ )2 +

n2

(Yj

 

Y¹ )2

3

 

¡

 

 

X

 

 

n1 + n2

2

4Xi

 

¡

 

 

 

¡

 

5

 

 

 

 

=1

 

 

 

 

 

j=1

 

 

 

 

denote the pooled sample variance; and let

 

 

 

 

 

 

 

 

 

^

 

 

 

 

 

 

 

 

 

 

T =

 

¢ ¡ ¢

 

:

 

 

 

 

 

 

 

 

 

r³n1

+ n2

´ S2

 

 

 

 

 

 

 

 

 

1

 

 

1

 

 

 

 

 

 

 

² Theorem 10.1 ES2 = ¾2 and T » t(n1 + n2 ¡ 2).

² Interval Estimation. A (1 ¡ ®)-level con¯dence interval for ¢ is

¢^ § t®=2(n1

+ n2

¡ 2)s

 

 

 

 

n1

+ n2 S:

 

 

1

1

 

 

²Example. Suppose that n1 = 57, x¹ = :0167; that n2 = 12, y¹ = :0144; and that s = :0040. Then a :95-level con¯dence interval for ¢ is approximately

(:0167 ¡ :0144) § 2:00q

 

 

:0023 § :0025

1=57 + 1=12(:0040) =:

=

(:0002; :0048):

²Hypothesis Testing. To test H0 : ¢ = ¢0 vs. H1 : ¢ 6= ¢0, we consider the test statistic T under the null hypothesis that ¢ = ¢0. Let t denote the observed value of T . Then a level-® test is to reject H0 if and only if

P = P (jT j ¸ jtj) · ®; which is equivalent to rejecting H0 if and only if

jtj ¸ t®=2(n1 + n2 ¡ 2):

This test is called Student's 2-sample t-test.

Соседние файлы в предмете Социология