
Using and Understanding Medical Statistics, Matthews and Farewell, 2007


The null hypothesis for Fisher’s test may be expressed concisely as the statement H0: p1 = p2, i.e., the probability of success is the same for Groups 1 and 2.

The letter T in the upper left corner of the contingency table shown in table 3.1 represents the total number of successes observed in Group 1. The variable T is the test statistic for the observed data which we referred to in §2.3. Notice that since the numbers of individuals from Groups 1 and 2 are known (R1 and R2, respectively), as are the numbers of successes and failures (C1 and C2, respectively), the remaining three entries in the table can be obtained by subtraction from row or column totals once the value of T has been determined. In effect, when the row and column totals are known, T determines the split of the successes between Groups 1 and 2; this is the principal reason that T is chosen as the test statistic for the significance test.

To determine the set of comparable events, we must consider all the possible 2 × 2 tables with row totals R1, R2 and column totals C1, C2 which might have been obtained. These can be identified by allowing the value of T to vary, beginning with the smallest possible value it can assume; this will usually be 0. Figure 3.1 displays four of these tables, provided both R1 and C1 are at least three and C1 is at most R2. As we saw in the example which was discussed in §2.2, these tables can be conveniently labelled using the value which the test statistic, T, assumes in the table. Clearly, the last table in the list will be one having a value of T which is equal to the smaller of R1 and C1.
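The enumeration just described can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `possible_tables` and the margins 5, 7 and 4 are hypothetical and do not come from the text. The lower limit max(0, C1 – R2) covers the less usual case in which T cannot be 0.

```python
def possible_tables(r1, r2, c1):
    """List every 2 x 2 table with row totals r1, r2 and first column total c1.

    Each table is returned as ((t, r1 - t), (c1 - t, r2 - c1 + t)),
    where t is the number of successes in Group 1.
    (Hypothetical helper, written for illustration only.)
    """
    t_min = max(0, c1 - r2)  # Group 2 can hold at most r2 of the c1 successes
    t_max = min(r1, c1)      # T cannot exceed the size of Group 1 or the number of successes
    return [((t, r1 - t), (c1 - t, r2 - c1 + t)) for t in range(t_min, t_max + 1)]


# Illustrative margins (not taken from the text): R1 = 5, R2 = 7, C1 = 4
tables = possible_tables(5, 7, 4)
print(len(tables))  # 5
print(tables[0])    # ((0, 5), (4, 3))
```

Each entry is labelled by its value of T, so the list runs from the smallest possible value of T to the smaller of R1 and C1.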

To obtain the significance level of the test, we need the probability that, for known values of R1, R2 and C1, the split of successes between Groups 1 and 2 is T and C1 – T, respectively. To calculate this probability, we must assume that the null hypothesis is true, i.e., that p1 = p2, and also that any particular set of T successes in Group 1 and C1 – T in Group 2 is equally likely to have occurred. Certain fairly simple mathematical arguments lead to the formula

$$
\frac{\binom{R_1}{t}\binom{N-R_1}{C_1-t}}{\binom{N}{C_1}}
$$

for the probability that table t, i.e., the 2 × 2 table with t successes in Group 1 and C1 – t in Group 2, would be observed if these assumptions are true. The symbols $\binom{R_1}{t}$, $\binom{N-R_1}{C_1-t}$ and $\binom{N}{C_1}$ used in this expression are called binomial coefficients. The binomial coefficient $\binom{n}{j}$ can be evaluated using the formula

$$
\binom{n}{j} = \frac{n(n-1)(n-2)\cdots(n-j+2)(n-j+1)}{j(j-1)(j-2)\cdots(2)(1)}.
$$

However, statistical tables or software are normally used to determine the significance level of Fisher’s test. In any case, the most important aspect to remember is that the probability corresponding to each possible table, i.e., each value of the test statistic T, can be calculated.

Details of the Test

Table 0

| | Success | Failure | Total |
|---|---|---|---|
| Group 1 | 0 | R1 | R1 |
| Group 2 | C1 | R2 – C1 | R2 |
| Total | C1 | C2 | N |

Table 1

| | Success | Failure | Total |
|---|---|---|---|
| Group 1 | 1 | R1 – 1 | R1 |
| Group 2 | C1 – 1 | R2 – C1 + 1 | R2 |
| Total | C1 | C2 | N |

Table 2

| | Success | Failure | Total |
|---|---|---|---|
| Group 1 | 2 | R1 – 2 | R1 |
| Group 2 | C1 – 2 | R2 – C1 + 2 | R2 |
| Total | C1 | C2 | N |

Table 3

| | Success | Failure | Total |
|---|---|---|---|
| Group 1 | 3 | R1 – 3 | R1 |
| Group 2 | C1 – 3 | R2 – C1 + 3 | R2 |
| Total | C1 | C2 | N |

Fig. 3.1. Four possible 2 × 2 tables having row totals R1, R2 and column totals C1, C2.
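For readers who want to see how the probability of table t might be computed in practice, here is a minimal Python sketch of the formula given above. The function name `table_probability` and the margins in the usage lines are our own illustrative assumptions; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def table_probability(t, r1, r2, c1):
    """Probability of the table with t successes in Group 1, for fixed margins.

    Implements C(R1, t) * C(N - R1, C1 - t) / C(N, C1) with N = R1 + R2.
    The caller is assumed to supply a value of t that is possible for these margins.
    (Hypothetical helper, written for illustration only.)
    """
    n = r1 + r2
    return comb(r1, t) * comb(n - r1, c1 - t) / comb(n, c1)


# Illustrative margins (not taken from the text): R1 = 5, R2 = 7, C1 = 4
probabilities = [table_probability(t, 5, 7, 4) for t in range(0, 5)]
print(probabilities)
print(sum(probabilities))  # the probabilities of all possible tables add up to 1, apart from rounding
```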

Once the numerical values of the probability distribution of T have been determined, there is a simple ranking for all the possible tables. This is based on the value of the probability corresponding to each table. If t1 and t2 are two possible values of T and the probability corresponding to table t1 is greater than the probability for table t2, then we say that table t1 is more consistent with the null hypothesis that p1 = p2 than table t2. On this basis we can quickly rank all the tables, i.e., all possible values of the test statistic T.
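The ranking can be sketched in Python as follows. The function name `rank_tables` and the margins used are hypothetical choices made for illustration, and the margins were picked so that no two tables have equal probability.

```python
from math import comb


def rank_tables(r1, r2, c1):
    """Return the possible values of T, ordered from most to least consistent with H0.

    Consistency is judged solely by the probability of each table under H0.
    (Hypothetical helper, written for illustration only.)
    """
    n = r1 + r2
    t_values = range(max(0, c1 - r2), min(r1, c1) + 1)
    probs = {t: comb(r1, t) * comb(r2, c1 - t) / comb(n, c1) for t in t_values}
    return sorted(probs, key=probs.get, reverse=True)


# Illustrative margins (not taken from the text): R1 = 5, R2 = 7, C1 = 4
print(rank_tables(5, 7, 4))  # [2, 1, 3, 0, 4]
```

Here table 2 is the most consistent with the null hypothesis and table 4 the least.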

3 Fisher’s Test for 2 × 2 Contingency Tables

Table 3.2. The results of an experiment comparing the antitumor activity of two drugs in leukemic mice

| | Complete remission: yes | Complete remission: no | Total |
|---|---|---|---|
| Methyl GAG | 7 | 3 | 10 |
| 6-MP | 2 | 7 | 9 |
| Total | 9 | 10 | 19 |

To complete the test, we must calculate the significance level of the data with respect to the null hypothesis. From the original 2 × 2 table which we observed and the corresponding value of T, we can determine the position of the observed 2 × 2 table in the ranking. Then, by summing the probabilities for possible tables whose rankings are no more consistent than the observed 2 × 2 table, we obtain the significance level of the data with respect to the null hypothesis that p1 = p2.
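The summation step can be sketched in Python. This is a sketch under our own assumptions rather than the procedure of any particular package: the name `fisher_significance_level` is hypothetical, the margins are illustrative, and the small relative tolerance is a device we add so that tables tied with the observed one are not lost to rounding.

```python
from math import comb


def fisher_significance_level(t_obs, r1, r2, c1):
    """Sum the probabilities of all tables no more consistent with H0 than the observed one.

    (Hypothetical helper, written for illustration only.)
    """
    n = r1 + r2
    t_values = range(max(0, c1 - r2), min(r1, c1) + 1)
    probs = {t: comb(r1, t) * comb(r2, c1 - t) / comb(n, c1) for t in t_values}
    p_obs = probs[t_obs]
    # Keep tables whose probability does not exceed that of the observed table;
    # the factor (1 + 1e-9) guards against floating-point ties being missed.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


# Illustrative margins (not taken from the text): R1 = 5, R2 = 7, C1 = 4, observed T = 3
print(fisher_significance_level(3, 5, 7, 4))  # equals 110/495, roughly 0.22
```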

To illustrate each aspect of Fisher’s test in a specific case, we consider the following example. Two drugs, methyl GAG and 6-MP, were screened in a small experiment to determine which, if either, demonstrated greater antitumor activity in leukemic mice. Ten mice received methyl GAG and nine were treated with 6-MP. When the experiment was ended, seven of the nine mice which had achieved complete remission belonged to the methyl GAG group. Table 3.2 summarizes the results of the experiment as a 2 × 2 contingency table.

In this particular case, mice in complete remission represent observed successes and the null hypothesis for this set of data states that p1 = p2, i.e., the probability of complete remission is the same for both drugs. The observed value of the test statistic, T, is seven, the number of complete remissions observed in mice treated with methyl GAG. Since nine, the total number of complete remissions observed, is the largest number that might have been obtained in the methyl GAG group, there are ten possible 2 × 2 tables, corresponding to the values of T from zero through nine. These tables are displayed in figure 3.2, and the probability distribution for T is given in table 3.3.

From the probability distribution for T, we can determine the ranking of the ten possible tables. In this particular case it turns out that table 5 is most consistent with the null hypothesis and table 0 is the least consistent of all 10. Notice that the observed result, table 7, is fifth in the ranking, followed, in order, by tables 2, 8, 1, 9 and 0. Therefore, since tables 7, 2, 8, 1, 9 and 0 are each no more consistent with the null hypothesis than the observed result (table 7), the significance level of the test is obtained by summing the probabilities for these six tables. The sum, 0.0698, is therefore the significance level of the data with respect to the null hypothesis that the probability of complete remission is the same for both drugs. And since the significance level is roughly 0.07, we conclude that there is no substantial evidence in the data to contradict the null hypothesis.

| Table (T) | Methyl GAG: yes | Methyl GAG: no | 6-MP: yes | 6-MP: no |
|---|---|---|---|---|
| 0 | 0 | 10 | 9 | 0 |
| 1 | 1 | 9 | 8 | 1 |
| 2 | 2 | 8 | 7 | 2 |
| 3 | 3 | 7 | 6 | 3 |
| 4 | 4 | 6 | 5 | 4 |
| 5 | 5 | 5 | 4 | 5 |
| 6 | 6 | 4 | 3 | 6 |
| 7 | 7 | 3 | 2 | 7 |
| 8 | 8 | 2 | 1 | 8 |
| 9 | 9 | 1 | 0 | 9 |

Fig. 3.2. The ten possible 2 × 2 tables having row totals 10, 9 and column totals 9, 10. The columns "yes" and "no" refer to complete remission; in every table the row totals are 10 (methyl GAG) and 9 (6-MP), the column totals are 9 (yes) and 10 (no), and N = 19.

Table 3.3. The probability of observing each of the tables shown in figure 3.2 if the null hypothesis H0: p1 = p2 is true

| T (table #) | Probability | T (table #) | Probability |
|---|---|---|---|
| 0 | 0.00001 | 5 | 0.3437 |
| 1 | 0.0009 | 6 | 0.1910 |
| 2 | 0.0175 | 7 | 0.0468 |
| 3 | 0.1091 | 8 | 0.0044 |
| 4 | 0.2864 | 9 | 0.00011 |
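As a check on the arithmetic, the probability of the observed table and the significance level can be reproduced from the formula for the probability of table t. The integer counts below are our own evaluation of the binomial coefficients, not values quoted in the text; the numerators in the second line correspond, in order, to tables 7, 2, 8, 1, 9 and 0.

$$
\Pr(T = 7) = \frac{\binom{10}{7}\binom{9}{2}}{\binom{19}{9}} = \frac{120 \times 36}{92378} = \frac{4320}{92378} \approx 0.0468
$$

$$
\text{significance level} = \frac{4320 + 1620 + 405 + 90 + 10 + 1}{92378} = \frac{6446}{92378} \approx 0.0698
$$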

On the basis of this small experiment, we would conclude that the antitumor activity of methyl GAG and 6-MP in leukemic mice is apparently comparable, although further investigation might be justified. In chapter 11 we consider the related problem of estimating rates for events of interest such as the occurrence of complete remission in this study.

3.3. Additional Examples of Fisher’s Test

In previous editions, this section introduced specialized statistical tables that one could use to carry out Fisher’s test. Although these tables are still appropriate, the widespread availability of modern statistical software packages that routinely calculate exact or approximate significance levels for various statistical tests selected by the package user has largely eliminated the routine use of statistical tables. Consequently, we have chosen to replace the specialized statistical tables for Fisher’s test by additional examples to reinforce the key aspects that characterize the use and interpretation of this commonly occurring method of assessing whether two binary classification schemes are associated in a population of interest.

As part of an experiment to investigate the value of infusing stored, autologous bone marrow as a means of restoring marrow function, a researcher administered Myleran to 15 dogs. Nine were then randomized to the treatment group and received an infusion of bone marrow, while the remaining six dogs served as a control group. The experiment was ended after 30 days and the results are presented in table 3.4.

Table 3.4. Treatment and survival status data for 15 dogs insulted with Myleran

| Bone marrow infusion | 30-day survival: yes | 30-day survival: no | Total |
|---|---|---|---|
| No (control) | 1 | 5 | 6 |
| Yes (treatment) | 9 | 0 | 9 |
| Total | 10 | 5 | 15 |

Table 3.5. Treatment and follow-up examination status data for 73 patients with simple urinary tract infection

| Treatment regime | Urine culture at follow-up: negative | Urine culture at follow-up: positive | Total |
|---|---|---|---|
| Single dose | 25 | 8 | 33 |
| Multiple dose | 35 | 5 | 40 |
| Total | 60 | 13 | 73 |

In this context, a natural question to ask is whether the probability of 30-day survival is the same in both groups of dogs, i.e., no treatment effect. By appropriately specifying the use of Fisher’s exact test (which is how it is usually identified in most statistical packages) in connection with this 2 × 2 contingency table, we learn that the corresponding significance level for these data is 0.002. The obvious conclusion is that the probability of 30-day survival is significantly higher in dogs that receive an infusion of stored, autologous bone marrow.
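The quoted significance level can be checked by hand from the counts in table 3.4. The short Python sketch below is our own illustration, not the output of any statistical package; it takes T to be the number of 30-day survivors in the control group and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from math import comb

# Margins read from table 3.4: 6 control dogs, 9 treated dogs, 10 survivors in all
r1, r2, c1 = 6, 9, 10
t_obs = 1  # observed number of survivors in the control group
n = r1 + r2

# With only 9 treated dogs, at least one of the 10 survivors must be a control dog
t_values = range(max(0, c1 - r2), min(r1, c1) + 1)
probs = {t: Fraction(comb(r1, t) * comb(r2, c1 - t), comb(n, c1)) for t in t_values}

# Sum the probabilities of the tables no more consistent with H0 than the observed one
level = sum(p for p in probs.values() if p <= probs[t_obs])
print(level)                   # 2/1001
print(round(float(level), 3))  # 0.002
```

The observed table turns out to be the least probable of the six possible tables, so the significance level is simply its own probability, 6/3003.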

Backhouse and Matthews [2] describe an open randomized study concerning the efficacy of treating simple urinary tract infections with a single dose (600 mg) of enoxacin, a new antibacterial agent, compared with 200 mg of the same drug twice a day for three consecutive days. Of the 73 patients who had confirmed pretreatment bacterial urine cultures, 33 received a single dose of enoxacin and 40 were randomized to the multiple-dose treatment. The observed results at a follow-up examination are summarized in table 3.5.


The obvious hypothesis of interest is whether the probability of a negative urine culture ten days following the initiation of treatment with enoxacin is the same for both groups of patients. The significance level of Fisher’s exact test for these data is 0.727, indicating that, from the statistical point of view, there is no evidence to contradict the view that both a single dose of enoxacin and the multiple-dose regime are equally effective treatments for simple urinary tract infections.

Available software may well be able to cope with the extensive calculations involved in evaluating exact significance levels in Fisher’s test where the sample sizes in the two groups exceed 50. However, in larger samples it is more likely that the software package defaults to an approximate version of Fisher’s test which is usually adequate. Therefore, in chapter 4 we intend to describe in detail the approximate test for contingency tables.
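One common way in which software copes with large samples is to work with logarithms of the binomial coefficients, so that very large intermediate numbers never appear. The sketch below illustrates that general idea only; it is our own assumption about a reasonable approach, not a description of any particular package, and the function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural logarithm of the binomial coefficient C(n, k), via the log-gamma function."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def table_probability_log(t, r1, r2, c1):
    """Probability of the table with t successes in Group 1, computed on the log scale.

    (Hypothetical helper, written for illustration only.)
    """
    n = r1 + r2
    log_p = log_comb(r1, t) + log_comb(r2, c1 - t) - log_comb(n, c1)
    return exp(log_p)


# Illustrative large margins (not taken from the text)
print(table_probability_log(400, 1000, 1000, 800))
```

The result is accurate only to floating-point precision, which is one reason carefully written software is needed when exact significance levels are required.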


4 Approximate Significance Tests for Contingency Tables

4.1. Introduction

Fisher’s test, which we discussed in chapter 3, evaluates the exact significance level of the null hypothesis that the probability of success is the same in two distinct groups. Ideally, the exact significance level of this test is what we would prefer to know in every situation. However, unless the associated calculations have been very carefully programmed, they may be erroneous if the sample sizes are large. In such situations, accurate approximations for calculating the significance level of the test that have been used for decades are frequently the default action in modern software packages. The approximate version of Fisher’s test which we discuss in the following section is known as the χ² (chi-squared) test for 2 × 2 tables. Another merit of this approximate version is that it helps to elucidate the nature of the comparisons that Fisher’s test involves. The same approximation also applies to generalizations of Fisher’s test involving classification schemes with more than two categories and more than two outcomes. Thus, after discussing the simplest version of the approximation to Fisher’s test in §4.2, we intend to introduce approximate significance tests for rectangular contingency tables in §4.3.

4.2. The χ² Test for 2 × 2 Tables

Suppose that a 2 × 2 table, such as the one shown in table 4.1, has row and column totals which are too large to be used easily in determining the significance level of Fisher’s test. For reasons of convenience in describing the χ² test, we have chosen to label the entries in the 2 × 2 table O11, O12, O21 and O22 (see table 4.1). The symbol O11 represents the observed number of successes in Group 1. If we call ‘success’ category I and ‘failure’ category II, then O11 is the number of category I observations in Group 1. Similarly, the symbol O12 is the number of category II observations (failures) in Group 1; the symbols O21 and O22 represent the corresponding category I (success) and category II (failure) totals for Group 2.

Table 4.1. A 2 × 2 table summarizing binary data collected from two groups

| | Success (I) | Failure (II) | Total |
|---|---|---|---|
| Group 1 | O11 | O12 | R1 |
| Group 2 | O21 | O22 | R2 |
| Total | C1 | C2 | N |

The assumptions on which the χ² test is based are the same as those for Fisher’s test. If p1 and p2 represent the probabilities of category I for Groups 1 and 2, respectively, then we are assuming that:

(a) within each group the probability of category I (success) does not vary from individual to individual,

(b) for any member of the population, the outcome that occurs (I or II) does not influence the outcome for any other individual.

Likewise, the purpose of the χ² test is the same as that of Fisher’s test, namely, to determine the degree to which the observed data are consistent with the null hypothesis H0: p1 = p2, i.e., the probability of category I in the two groups is the same.

The basis for the χ² test is essentially this: assume that the null hypothesis, H0, is true and calculate the 2 × 2 table which would be expected to occur based on this assumption and the row and column totals R1, R2, C1 and C2. If the observed 2 × 2 table is similar to the expected 2 × 2 table, the significance level of the data with respect to H0 will be fairly large, say 0.5. However, if the observed 2 × 2 table is very different from the expected 2 × 2 table, the significance level of the data with respect to H0 will be rather small, say 0.05 or less. In both cases, the approximation to Fisher’s test occurs in the calculation of the significance level. And this is precisely the point at which the calculations for Fisher’s test become so formidable when R1, R2, C1 and C2 are quite large.

In order to illustrate the calculations which the χ² test involves, we will consider the sample 2 × 2 table shown in table 4.2. The data are taken from Storb et al. [3] and summarize the outcomes of 68 bone marrow transplants for patients with aplastic anemia. Each patient was classified according to the outcome of the graft (Rejection, Yes or No) and also according to the size of the marrow cell dose which was used in the transplant procedure. The principal question which the data are intended to answer is whether the size of the marrow cell dose is associated with the marrow graft rejection rate.

Table 4.2. Graft rejection status and marrow cell dose data for 68 aplastic anemia patients

| Graft rejection | Marrow cell dose <3.0 (10^8 cells/kg) | Marrow cell dose ≥3.0 (10^8 cells/kg) | Total |
|---|---|---|---|
| Yes | 17 | 4 | 21 |
| No | 19 | 28 | 47 |
| Total | 36 | 32 | 68 |

Table 4.3. The 2 × 2 table of expected values corresponding to the observed data summarized in table 4.1

| | Success (I) | Failure (II) | Total |
|---|---|---|---|
| Group 1 | e11 | e12 | R1 |
| Group 2 | e21 | e22 | R2 |
| Total | C1 | C2 | N |

To carry out the approximate test of significance, we need to calculate the values which would be expected in this particular sample if the null hypothesis, H0, is true. This table of expected values will have the same row and column totals as the observed 2 × 2 table. In the 2 × 2 table shown in table 4.3, the four entries are represented by the symbols e11, e12, e21 and e22 to distinguish them from the values in the observed 2 × 2 table (cf. table 4.1). The meaning of the subscripts on these symbols should be fairly obvious. The symbol e11 represents the expected number of category I outcomes in Group 1, while e21 is the corresponding expected value for Group 2; likewise, e12 and e22 are the category II expected numbers for Groups 1 and 2, respectively.

The overall success rate in the observed 2 × 2 table is C1/N. If the null hypothesis is true, this rate is a natural estimate of the common success rate for both Group 1 and Group 2. There are R1 individuals in Group 1; therefore, if the null hypothesis is true, the expected number of category I outcomes (success) in Group 1 would be

$$
e_{11} = R_1 \times \frac{C_1}{N} = \frac{R_1 C_1}{N}.
$$
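To make the calculation of e11 concrete, the following Python sketch applies it to the counts in table 4.2. It is an illustration under our own assumptions: the function name `expected_table` is hypothetical, and filling the other three entries with the same rule, row total times column total divided by N, is an extension that the passage above has not yet spelled out.

```python
def expected_table(observed):
    """Expected counts under H0 for a table of observed counts, keeping the same margins.

    Each entry is (row total) * (column total) / N.
    (Hypothetical helper, written for illustration only.)
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from table 4.2: rows are graft rejection Yes/No,
# columns are marrow cell dose below 3.0 and at least 3.0 (10^8 cells/kg)
observed = [[17, 4], [19, 28]]
expected = expected_table(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

The first entry is 21 × 36/68, roughly 11.12, and each row and column of the expected table adds up to the corresponding observed total.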

 

 

 

 

 

 

 
