15
Non-Parametric Rank Tests
CONTENTS
15.1 Background Concepts
15.2 Basic Principles
15.3 Wilcoxon Two-Group Rank-Sum Test
15.3.1 Illustration of Basic Strategy
15.3.2 Arrangement of Permutations
15.3.3 Choice of Test Statistic
15.3.4 Illustration of Procedure
15.4 Mann-Whitney U Test
15.4.1 Illustration of Procedure
15.4.2 Simplified Procedure
15.4.3 Use of the U/P Table
15.4.4 Z/U Relationship for Large Groups
15.5 Management of Ties
15.5.1 Illustrative Example
15.5.2 Assignment of Ranks
15.5.3 Determination of U and Z Values
15.5.4 Adjustment for Ties
15.6 Wilcoxon Signed-Ranks Test
15.6.1 Basic Principle
15.6.2 Finding the Ranks of Increments
15.6.3 Finding the Sum of Signed Ranks
15.6.4 Sampling Distribution of T
15.6.5 Use of T/P Table
15.7 Challenges of Descriptive Interpretation
15.7.1 Disadvantages of Customary Approaches
15.7.2 Additional Approaches
15.7.3 Comparison of Mean Ranks
15.7.4 Ridit Analysis
15.7.5 U Score for Pairs of Comparisons
15.7.6 Traditional Dichotomous Index
15.7.7 Indexes of Association
15.8 Role in Testing Non-Gaussian Distributions
15.9 Additional Comments
15.9.1 “Power-Efficiency” of Rank Tests
15.9.2 Additional Rank Tests
15.9.3 Confidence Intervals
15.10 Applications in Medical Literature
15.11 Simple Crude Tests
15.11.1 Sign Test
15.11.2 Median Test
References
Exercises
© 2002 by Chapman & Hall/CRC
Empirical procedures, such as the permutation tests discussed in Chapter 12, can be regarded as non-parametric because they do not use the parametric estimates that are required for the t, Z, and chi-square tests. In customary statistical usage, however, the name non-parametric is often reserved for a different set of procedures, discussed in this chapter, that use the ranks of the observed data rather than the original values. The procedures are often called non-parametric rank tests or simply rank tests.
15.1 Background Concepts
Rank tests were introduced as a pragmatic analytic procedure because Frank Wilcoxon, a statistician working at Lederle Laboratories, was “fed up with the drudgery of computing one t statistic after another and was looking for something simpler.” 1 To get a simpler procedure for two groups, Wilcoxon converted all the dimensional values to ranks, formed an easy-to-calculate sum of ranks for each group, constructed a test statistic from the increment in the two sums, and determined P values from the sampling distribution of the test statistic.
When introduced in 1945, however, Wilcoxon’s approach2 was generally disdained, because it “wasted” information by converting precise dimensional values into ordinal ranks. The approach was labeled “rough-and-ready” or “quick-and-dirty” and was regarded as both inefficient and inferior to the parametric procedures that used Gaussian or other mathematical theories.
The main virtues of the non-parametric procedures, however, came from two advantages that Wilcoxon had not anticipated when he tried to avoid the calculational “drudgery” of t tests for dimensional data. One advantage was that rank tests could be applied directly to ordinal data that were expressed in ranks. Ordinal rating scales, although also sometimes deprecated by classical statisticians, were becoming the best and often the only mechanism for expressing medical phenomena, such as pain, distress, and dysfunction, that could not be measured dimensionally. Ordinal grades became increasingly necessary (and respectable) for such clinical ratings as stage of disease (I, II, III), baseline condition (excellent, good, fair, poor), and responses to therapy (much worse, worse, same, better, much better). Other data for ordinal analysis came from visual analog scales,3–5 which were commonly applied for rating diverse forms of severity, using dimensional-looking numbers that did not have the equi-interval attributes of measured dimensions.
The ordinal and visual analog ratings could readily be analyzed stochastically today with permutation procedures such as the Pitman-Welch test, but in the precomputer era, the non-parametric rank tests seemed an ideal way to do the job. No one could complain that information was being “lost,” because the ordinal data were not converted; they were maintained in their original ratings or quasi-dimensional ranks. The rank tests could also avoid alternative complaints that might arise if the ordinal ratings were regarded as dimensions and managed with t or Z tests.
A second major advantage of rank tests was that they could be applied, without fear of violating mathematical assumptions, to the many sets of dimensional data that had eccentric, non-Gaussian distributions. Furthermore, as noted later, the rank tests could sometimes demonstrate stochastic significance that cannot be achieved in situations where the t and Z tests become “handicapped” by large variance in the eccentric distributions.
15.2 Basic Principles
A rank test uses a hybrid of the empirical and parametric strategies discussed for two-group stochastic tests in Chapters 12 through 14. In both of those strategies, we begin with an index of contrast, which is usually an increment in two means or in two binary proportions, but can also be expressed as a ratio. We next choose a “test statistic” that will be examined stochastically. The test statistic can be either the focal index itself (in empirical tests) or a special stochastic index, such as Z, t, or X² (in parametric tests). We then examine the way in which the test statistic would be distributed in a sampling arrangement formed under the null hypothesis. From the possibilities that would occur in that distribution, we determine a P value or a confidence interval for the focal index.
The main difference between the empirical and parametric methods is in forming the rearranged “sampling distribution” of the test statistic. Empirically, the sampling distribution of the index of contrast is constructed from appropriate resamplings or permutations of the observed data, as in the Fisher exact probability or Pitman-Welch test. Parametrically, the test statistic is a special stochastic index (such as Z, t, or X²) whose distribution is derived from known mathematical characteristics of the parametric sampling procedure. The non-parametric rank tests are “hybrids,” because they also prepare a special stochastic index, which is then rearranged according to empirical permutations.
The purpose of this chapter is to introduce you to the basic operational methods of the “classical” non-parametric rank tests. You need not remember any of the formulas, because you can either look them up when you need them or get a suitable computer program to do all the work.
15.3 Wilcoxon Two-Group Rank-Sum Test
Perhaps the best way to demonstrate the rank-test strategy is with a simple example of the type that led Wilcoxon to devise the procedure.
15.3.1 Illustration of Basic Strategy
Suppose we wanted to compare the dimensional values for {76, 81, 93} in Group A vs. {87, 95, 97} in Group B. To do a t-test for this contrast we would have to calculate values such as Sxx for both groups, get the pooled variance, etc. To avoid these calculations, Wilcoxon decided to do a permutation test, somewhat like the Fisher or Pitman-Welch procedure. Instead of permuting the dimensional values of the data, however, he permuted their ranks.
After a null hypothesis is formed, the six items under examination can be combined into a single group and placed in order of magnitude as 76, 81, 87, 93, 95, and 97. The corresponding ranks are 1, 2, 3, 4, 5, 6. (The process yields exactly the same results if the ranks are reversed as 6, 5, 4, 3, 2, 1.) In the observed data, ranks 1, 2, and 4 occurred in Group A and 3, 5, and 6 in Group B.
Because the sum of N consecutive integers is (N)(N + 1)/2, the sum of the ranks in these two groups will always be fixed at 21 = (6 × 7) /2. Consequently, as the distinctions between the groups increase, the increment in the sum of ranks for each group will enlarge. For example, the widest possible separation for these six items would occur if Group A contains the smallest values 76, 81, and 87, with Group B containing 93, 95, and 97. For this arrangement, the sum of ranks would be 1 + 2 + 3 = 6 in Group A, and 4 + 5 + 6 = 15 in Group B. The difference, 15 − 6 = 9, is the largest obtainable increment in the rank sums for two groups that each contain three members. In our observed data, the sums of ranks are 1 + 2 + 4 = 7 in Group A and 3 + 5 + 6 = 14 in Group B. The difference in sums is 14 − 7 = 7.
15.3.2 Arrangement of Permutations
By examining permutations of the summed ranks, Wilcoxon developed the distribution of the expected increments and could establish probabilities for their occurrence. For the six ranks — 1, 2, 3, 4, 5, 6 — divided into two groups of three members each, a total of 6!/[3! × 3!] = 20 combinations are possible.
The arrangement of these combinations for the first ten patterns is shown in Table 15.1. The remaining ten patterns are identical to those shown in Table 15.1, except that the contents are reversed for Groups A and B. The absolute differences in sums of ranks for the 20 possibilities in the total distribution are shown in Table 15.2.
Under the null hypothesis that the two groups are taken from the same underlying population, the observed difference of 7 (or larger) in the sum of ranks would randomly occur with a relative-frequency chance (i.e., P value) of .2. The maximum difference of 9 would have a two-tailed P = .1.
TABLE 15.1
Permutations and Incremental Sum of Ranks for Two Groups with Three Members Each*

| Contents, Group A | Sum of Ranks, Group A | Contents, Group B | Sum of Ranks, Group B | Absolute Difference in Sum of Ranks |
|---|---|---|---|---|
| 1,2,3 | 6 | 4,5,6 | 15 | 9 |
| 1,2,4 | 7 | 3,5,6 | 14 | 7 |
| 1,2,5 | 8 | 3,4,6 | 13 | 5 |
| 1,2,6 | 9 | 3,4,5 | 12 | 3 |
| 1,3,4 | 8 | 2,5,6 | 13 | 5 |
| 1,3,5 | 9 | 2,4,6 | 12 | 3 |
| 1,3,6 | 10 | 2,4,5 | 11 | 1 |
| 1,4,5 | 10 | 2,3,6 | 11 | 1 |
| 1,4,6 | 11 | 2,3,5 | 10 | 1 |
| 1,5,6 | 12 | 2,3,4 | 9 | 3 |

* An additional 10 possibilities are produced by reversing the contents of Groups A and B.

TABLE 15.2
Summary of Distribution Illustrated in Table 15.1*

| Absolute Difference in Sum of Ranks | Frequency | Relative Frequency | Descending Cumulative Frequency |
|---|---|---|---|
| 9 | 2 | .1 | .1 |
| 7 | 2 | .1 | .2 |
| 5 | 4 | .2 | .4 |
| 3 | 6 | .3 | .7 |
| 1 | 6 | .3 | 1.0 |
| Total | 20 | 1.0 | |

* Note that Table 15.1 shows half of the total distribution.
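The enumeration summarized in Tables 15.1 and 15.2 can be checked with a short script. The following is only a minimal sketch in Python (the chapter itself uses no programming language); the function name `two_tailed_p` is a hypothetical helper introduced for illustration. It lists every way of assigning three of the six ranks to Group A and tallies the absolute difference in rank sums.

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

ranks = [1, 2, 3, 4, 5, 6]
total = sum(ranks)  # N(N + 1)/2 = 21

# Absolute difference in rank sums for every way of choosing Group A.
diffs = []
for group_a in combinations(ranks, 3):
    sum_a = sum(group_a)
    sum_b = total - sum_a
    diffs.append(abs(sum_b - sum_a))

counts = Counter(diffs)      # frequency of each absolute difference
n_arrangements = len(diffs)  # 6!/(3! x 3!) arrangements


def two_tailed_p(observed_diff):
    """Proportion of arrangements whose difference is at least as large."""
    extreme = sum(1 for d in diffs if d >= observed_diff)
    return Fraction(extreme, n_arrangements)


p_observed = two_tailed_p(7)  # ranks {1, 2, 4} vs. {3, 5, 6}
p_maximum = two_tailed_p(9)   # ranks {1, 2, 3} vs. {4, 5, 6}
print(sorted(counts.items(), reverse=True), p_observed, p_maximum)
```

Because the six ranks total 21, an odd number, the two rank sums can never differ by an even amount, which is a quick check on any hand-built version of the table.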
15.3.3 Choice of Test Statistic
The strategy just described is used in essentially all of the diverse forms of non-parametric rank tests, which differ mainly in the construction of the index used as a test statistic.
In Wilcoxon’s original formulation, he chose the smaller of the two sums of ranks to be the test statistic. Because the total sum of N ranked integers is (N)(N + 1)/2, each of two equal-sized groups should have half this value, i.e., (N)(N + 1)/4, under the null hypothesis of equivalence. Extensive tables of P values can then be constructed for the smaller of the two rank sums under the null hypothesis. The tables are usually condensed to show critical values of the test statistic for conventional α levels such as .05 and .01. Table 15.3 shows such an arrangement for sizes ranging from 2 to 20 in the two groups. (For larger groups, the non-parametric test statistic can receive the parametric conversion discussed in Section 15.4.4.)
15.3.4 Illustration of Procedure
Suppose a written examination for medical licensure has been given to five people from School A and to four from School B. Their scores in the exam are as follows:
School A: 78, 64, 75, 45, 82
School B: 93, 70, 53, 51
TABLE 15.3
Required Value of Smaller of Two Rank Sums to Attain P Values of .05 or .01 in Wilcoxon Two-Group Rank Sum Test, Arranged According to Size of Groups*

Column headings show N1 (Smaller Sample); row headings show N2 (Larger Sample).

| N2 | α | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | .05 | | | 10 | | | | | | | | | | | | | | | | |
| | .01 | | | | | | | | | | | | | | | | | | | |
| 5 | .05 | | 6 | 11 | 17 | | | | | | | | | | | | | | | |
| | .01 | | | | 15 | | | | | | | | | | | | | | | |
| 6 | .05 | | 7 | 12 | 18 | 26 | | | | | | | | | | | | | | |
| | .01 | | | 10 | 16 | 23 | | | | | | | | | | | | | | |
| 7 | .05 | | 7 | 13 | 20 | 27 | 36 | | | | | | | | | | | | | |
| | .01 | | | 10 | 16 | 24 | 32 | | | | | | | | | | | | | |
| 8 | .05 | 3 | 8 | 14 | 21 | 29 | 38 | 49 | | | | | | | | | | | | |
| | .01 | | | 11 | 17 | 25 | 34 | 43 | | | | | | | | | | | | |
| 9 | .05 | 3 | 8 | 14 | 22 | 31 | 40 | 51 | 62 | | | | | | | | | | | |
| | .01 | | 6 | 11 | 18 | 26 | 35 | 45 | 56 | | | | | | | | | | | |
| 10 | .05 | 3 | 9 | 15 | 23 | 32 | 42 | 53 | 65 | 78 | | | | | | | | | | |
| | .01 | | 6 | 12 | 19 | 27 | 37 | 47 | 58 | 71 | | | | | | | | | | |
| 11 | .05 | 3 | 9 | 16 | 24 | 34 | 44 | 55 | 68 | 81 | 96 | | | | | | | | | |
| | .01 | | 6 | 12 | 20 | 28 | 38 | 49 | 61 | 73 | 87 | | | | | | | | | |
| 12 | .05 | 4 | 10 | 17 | 26 | 35 | 46 | 58 | 71 | 84 | 99 | 115 | | | | | | | | |
| | .01 | | 7 | 13 | 21 | 30 | 40 | 51 | 63 | 76 | 90 | 105 | | | | | | | | |
| 13 | .05 | 4 | 10 | 18 | 27 | 37 | 48 | 60 | 73 | 88 | 103 | 119 | 136 | | | | | | | |
| | .01 | | 7 | 14 | 22 | 31 | 41 | 53 | 65 | 79 | 93 | 109 | 125 | | | | | | | |
| 14 | .05 | 4 | 11 | 19 | 28 | 38 | 50 | 62 | 76 | 91 | 106 | 123 | 141 | 160 | | | | | | |
| | .01 | | 7 | 14 | 22 | 32 | 43 | 54 | 67 | 81 | 96 | 112 | 129 | 147 | | | | | | |
| 15 | .05 | 4 | 11 | 20 | 29 | 40 | 52 | 65 | 79 | 94 | 110 | 127 | 145 | 164 | 184 | | | | | |
| | .01 | | 8 | 15 | 23 | 33 | 44 | 56 | 69 | 84 | 99 | 115 | 133 | 151 | 171 | | | | | |
| 16 | .05 | 4 | 12 | 21 | 30 | 42 | 54 | 67 | 82 | 97 | 113 | 131 | 150 | 169 | 190 | 211 | | | | |
| | .01 | | 8 | 15 | 24 | 34 | 46 | 58 | 72 | 86 | 102 | 119 | 136 | 155 | 175 | 196 | | | | |
| 17 | .05 | 5 | 12 | 21 | 32 | 43 | 56 | 70 | 84 | 100 | 117 | 135 | 154 | 174 | 195 | 217 | 240 | | | |
| | .01 | | 8 | 16 | 25 | 36 | 47 | 60 | 74 | 89 | 105 | 122 | 140 | 159 | 180 | 201 | 223 | | | |
| 18 | .05 | 5 | 13 | 22 | 33 | 45 | 58 | 72 | 87 | 103 | 121 | 139 | 158 | 179 | 200 | 222 | 246 | 270 | | |
| | .01 | | 8 | 16 | 26 | 37 | 49 | 62 | 76 | 92 | 108 | 125 | 144 | 163 | 184 | 206 | 228 | 252 | | |
| 19 | .05 | 5 | 13 | 23 | 34 | 46 | 60 | 74 | 90 | 107 | 124 | 143 | 163 | 182 | 205 | 228 | 252 | 277 | 303 | |
| | .01 | 3 | 9 | 17 | 27 | 38 | 50 | 64 | 78 | 94 | 111 | 129 | 147 | 168 | 189 | 210 | 234 | 258 | 283 | |
| 20 | .05 | 5 | 14 | 24 | 35 | 48 | 62 | 77 | 93 | 110 | 128 | 147 | 167 | 188 | 210 | 234 | 258 | 283 | 309 | 337 |
| | .01 | 3 | 9 | 18 | 28 | 39 | 52 | 66 | 81 | 97 | 114 | 132 | 151 | 172 | 193 | 215 | 239 | 263 | 289 | 315 |

* This table is derived from Schor, S.S. Fundamentals of Biostatistics. New York: Putnam’s Sons, 1968.
A data analyst determines that $\bar{X}_A$ = 68.8, $s_A$ = 14.89, $\bar{X}_B$ = 66.75, $s_B$ = 19.47; and then claims (after doing a t test) that the results are not stochastically significant. Another analyst might then argue, however, that the t test is improper and that a rank test is more appropriate, because the examination scores are arbitrary ratings rather than truly dimensional data and, therefore, should not be analyzed with dimensional tactics.
For the Wilcoxon Rank Sum test, we rank the total set of observations, irrespective of group, and then find the sum of ranks in each group. The results are as follows:
| Group A: Observed Value | Group A: Rank | Group B: Observed Value | Group B: Rank |
|---|---|---|---|
| 78 | 7 | 93 | 9 |
| 64 | 4 | 70 | 5 |
| 75 | 6 | 53 | 3 |
| 45 | 1 | 51 | 2 |
| 82 | 8 | | |
| Sum of Ranks | RA = 26 | | RB = 19 |
[If you do a rank test yourself, rather than with a computer program, always check your calculations by seeing that the total of the two rank sums is equal to the value of (N)(N + 1)/2. In this instance 26 + 19 = 45 = (9 × 10)/2.]
The smaller of the two rank sums here is RB = 19. In Table 15.3, with a smaller group of 4 members and a larger group of 5, the value of RB must be ≤ 11 to attain a 2P value of ≤ .05. Consequently, the observed result for RB is too large to be stochastically significant. [The only way stochastic significance could be obtained in the situation here is to get RB ≤ 11 with ranks 1, 2, 3, and 4 or ranks 1, 2, 3, and 5 in Group B.]
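The ranking and rank-sum steps for the examination scores can be sketched as follows. This is an illustrative Python sketch, not part of the original procedure; the helper `rank_sums` is a hypothetical name, it assumes there are no tied values, and the critical value of 11 is simply the entry read from Table 15.3.

```python
school_a = [78, 64, 75, 45, 82]
school_b = [93, 70, 53, 51]


def rank_sums(group_1, group_2):
    """Rank the pooled values (1 = smallest) and sum the ranks per group.

    Assumes no tied values; ties need the mid-rank handling of Section 15.5.
    """
    pooled = sorted(group_1 + group_2)
    rank_of = {value: pos for pos, value in enumerate(pooled, start=1)}
    r1 = sum(rank_of[v] for v in group_1)
    r2 = sum(rank_of[v] for v in group_2)
    return r1, r2


r_a, r_b = rank_sums(school_a, school_b)
n_total = len(school_a) + len(school_b)

# Arithmetic check: the two rank sums must add up to N(N + 1)/2.
assert r_a + r_b == n_total * (n_total + 1) // 2

smaller_sum = min(r_a, r_b)
critical_value = 11  # read from Table 15.3 for group sizes 4 and 5, 2P <= .05
significant = smaller_sum <= critical_value
print(r_a, r_b, significant)
```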
15.4 Mann-Whitney U Test
To avoid using the somewhat crude sum of ranks, Mann and Whitney6 developed a different, more subtle indexing strategy, which relies on the sequential placement of ranks. The index reflects the number of times, in the ranked total arrangement, in which a value (or “mark”) in one group precedes that of the other group. The sequential placement index, called “U,” will have two sets of values: one for the number of times that Group A marks precede those of Group B, and the other, for the number of times that Group B marks precede those of Group A.
Under the null hypothesis, the two U indexes should be about equal. If the two compared groups are different, one of the U indexes should be substantially smaller than the other. The values of U will have a sampling distribution for groups of sizes nA and nB; and P values can be found for magnitudes of the smaller value of U in that distribution.
The stochastic results for P turn out to be essentially identical for the Wilcoxon rank-sum approach and for the Mann-Whitney sequential-placement U approach. Nevertheless, the Mann-Whitney approach is usually preferred, perhaps because it has better academic credentials (the paper was published in the Annals of Mathematical Statistics), but mainly because its stochastic and descriptive properties have several advantages to be discussed later. In view of the similarity of results, however, the Mann-Whitney U procedure now often receives the eponym of Wilcoxon-Mann-Whitney.
15.4.1 Illustration of Procedure
For a Mann-Whitney analysis, the nine examination grades in the previous example from Schools A and B would be ranked, from lowest to highest, as follows:

| Score | 45 | 51 | 53 | 64 | 70 | 75 | 78 | 82 | 93 |
|---|---|---|---|---|---|---|---|---|---|
| School | A | B | B | A | B | A | A | A | B |
The test statistic, U, can be determined as the number of times a B mark precedes an A mark. Thus, the first A mark, 45, is preceded by no B marks. The next A mark, 64, is preceded by two B marks. The next A mark, 75, is preceded by three B marks (51, 53 and 70); and the last two A marks, 78 and 82, are each preceded by the same three B marks. Thus, the value of U is:
U = 0 + 2 + 3 + 3 + 3 = 11
In the extreme instance, if School A had the five highest scores and School B had the four lowest scores, the results would be:
| Score | 45 | 51 | 53 | 64 | 70 | 75 | 78 | 82 | 93 |
|---|---|---|---|---|---|---|---|---|---|
| School | B | B | B | B | A | A | A | A | A |
In this situation, each of the A marks would be preceded by four B marks and the value of U would be:
U = 4 + 4 + 4 + 4 + 4 = 20
If we had reversed our enumeration technique and counted the occasions on which a B mark is preceded by an A mark, rather than vice versa, we would have obtained, for the first instance:
U′ = 1 + 1 + 2 + 5 = 9
In the second instance, for the extreme example, U′ = 0.
It can be shown mathematically that U + U′ = nAnB, where nA and nB are the respective sizes of the two groups under consideration. Thus, for the examples being discussed, nA = 5 and nB = 4, so that nAnB = 20. In the first example U = 11 and U′ = 9. In the second example, U = 20 and U′ = 0. In both examples, U + U′ = 20.
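The sequential-placement counts for the first example can be reproduced by direct pairwise comparison. The sketch below is a minimal Python illustration; `count_precedences` is a hypothetical helper, and it assumes that no value in one group ties with a value in the other.

```python
def count_precedences(first, second):
    """Count pairs (x, y) with x from `first`, y from `second`, and x < y.

    In the ranked arrangement this is the number of times a mark from
    `first` precedes a mark from `second`. Assumes no ties between groups.
    """
    return sum(1 for x in first for y in second if x < y)


school_a = [78, 64, 75, 45, 82]
school_b = [93, 70, 53, 51]

u = count_precedences(school_b, school_a)        # B marks preceding A marks
u_prime = count_precedences(school_a, school_b)  # A marks preceding B marks
print(u, u_prime, u + u_prime)
```

Every one of the nA × nB cross-group pairs is counted in exactly one of the two tallies, which is why the two values add up to nAnB.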
15.4.2 Simplified Procedure
Because the process of determining U (or U′) can become a nuisance if the group sizes are large, a simpler method is used. By some algebra that will not be shown here, it can be demonstrated that, if we use the subscripts 1 and 2 for Groups A and B, and if the sums of ranks are R1 for Group A and R2 for Group B, then

$$U_1 = R_1 - \frac{n_1(n_1 + 1)}{2} \quad \text{and} \quad U_2 = R_2 - \frac{n_2(n_2 + 1)}{2} \qquad [15.1]$$
One of these values will be U and the other will be U′. In the example here, U1 = 26 − [5(6)/2] = 11; and U2 = 19 − [4(5)/2] = 9. These are the same values found earlier with the A-before-B or B-before-A sequential placement procedure.
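For readers who want to see where Formula [15.1] comes from, one possible line of reasoning is sketched below. It is an outline under the assumption of no ties, not necessarily the algebra the text has in mind; the symbols $r_i$, $a_i$, and $b_i$ are introduced here only for illustration.

```latex
% r_i : rank of the i-th member of Group 1 in the pooled ordering
% a_i : number of Group 1 members ranked below it
% b_i : number of Group 2 members ranked below it
\begin{align*}
r_i &= 1 + a_i + b_i \\
\sum_{i=1}^{n_1} (1 + a_i) &= 1 + 2 + \cdots + n_1 = \frac{n_1(n_1 + 1)}{2} \\
\sum_{i=1}^{n_1} b_i &= U_1
  \quad \text{(times a Group 2 mark precedes a Group 1 mark)} \\
R_1 = \sum_{i=1}^{n_1} r_i &= \frac{n_1(n_1 + 1)}{2} + U_1
  \;\Longrightarrow\; U_1 = R_1 - \frac{n_1(n_1 + 1)}{2} \\
U_1 + U_2 &= \frac{N(N + 1)}{2} - \frac{n_1(n_1 + 1)}{2} - \frac{n_2(n_2 + 1)}{2}
  = n_1 n_2, \qquad N = n_1 + n_2
\end{align*}
```

In the example, U1 = 11 is the number of times a School B mark precedes a School A mark, which matches the direct count.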
15.4.3 Use of the U/P Table
Table 15.4 gives the values of P for .025 (one-tail) or .05 (two-tail) decisions, associated with the smaller of the two values of U or U′. For the data under discussion, Table 15.4 shows that with n2 = 5 and n1 = 4, U must equal 1 or less for a P value of .05 or lower. Therefore, since U = 9, we again conclude that P > .05.
15.4.4 Z/U Relationship for Large Groups
When n1 and n2 become large (e.g., > 20), the sampling distribution of U approaches that of a Gaussian distribution, having as its mean

$$\hat{\mu}_U = \frac{n_1 n_2}{2} \qquad [15.2]$$

and standard deviation

$$\hat{\sigma}_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} \qquad [15.3]$$
TABLE 15.4
U/P Table for Wilcoxon-Mann-Whitney U Test

Smallest Permissible Value of U for 2-Tailed P = .05. Column headings show (Larger) n2; row headings show (Smaller) n1.

| n1 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | | 0 | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6 | 7 | 7 | 8 |
| 4 | 0 | 1 | 2 | 3 | 4 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 11 | 12 | 13 | 13 |
| 5 | | 2 | 3 | 5 | 6 | 7 | 8 | 9 | 11 | 12 | 13 | 14 | 15 | 17 | 18 | 19 | 20 |
| 6 | | | 5 | 6 | 8 | 10 | 11 | 13 | 14 | 16 | 17 | 19 | 21 | 22 | 24 | 25 | 27 |
| 7 | | | | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 | 26 | 28 | 30 | 32 | 34 |
| 8 | | | | | 13 | 15 | 17 | 19 | 22 | 24 | 26 | 29 | 31 | 34 | 36 | 38 | 41 |
| 9 | | | | | | 17 | 20 | 23 | 26 | 28 | 31 | 34 | 37 | 39 | 42 | 45 | 48 |
| 10 | | | | | | | 23 | 26 | 29 | 33 | 36 | 39 | 42 | 45 | 48 | 52 | 55 |
| 11 | | | | | | | | 30 | 33 | 37 | 40 | 44 | 47 | 51 | 55 | 58 | 62 |
| 12 | | | | | | | | | 37 | 41 | 45 | 49 | 53 | 57 | 61 | 65 | 69 |
| 13 | | | | | | | | | | 45 | 50 | 54 | 59 | 63 | 67 | 72 | 76 |
| 14 | | | | | | | | | | | 55 | 59 | 64 | 67 | 74 | 78 | 83 |
| 15 | | | | | | | | | | | | 64 | 70 | 75 | 80 | 85 | 90 |
| 16 | | | | | | | | | | | | | 75 | 81 | 86 | 92 | 98 |
| 17 | | | | | | | | | | | | | | 87 | 93 | 99 | 105 |
| 18 | | | | | | | | | | | | | | | 99 | 106 | 112 |
| 19 | | | | | | | | | | | | | | | | 113 | 119 |
| 20 | | | | | | | | | | | | | | | | | 127 |

This table is derived from Smart, J.V., Elements of Medical Statistics. Springfield, IL: Charles C Thomas, 1965.
For large group sizes, therefore, Table 15.4 can be avoided. The observed value of U can be converted into a Z test, using the formula:

$$Z = \frac{U - \hat{\mu}_U}{\hat{\sigma}_U} = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \qquad [15.4]$$
This result can then be interpreted using either a table of probabilities associated with Z, or the last row (with ν = ∞ ) in a t/P arrangement such as Table 7.3.
15.5 Management of Ties
All of these attributes make the Wilcoxon-Mann-Whitney U Test quite easy to use if the data come from dimensions that have been converted to ranks or if the data are expressed in an unlimited-ranks scale. In most medical situations, however, the ordinal data are expressed in a limited number of grades that will produce many ties when the results are ranked. The ties then create problems in choosing scores for the ranking process and also in the analysis of the test statistic.
15.5.1 Illustrative Example
Table 15.5 shows the results of a randomized double-blind trial of active agent A vs. placebo in the treatment of congestive heart failure. Because the outcome was expressed in a 4-category ordinal scale of improvement, the Wilcoxon-Mann-Whitney U procedure is an appropriate stochastic test for the contention that the active agent was better than placebo.
TABLE 15.5
Results of Clinical Trial for Treatment of Congestive Heart Failure

Ordinal Scale for Improvement of Patients

| Treatment | Worse | No Change | Improved | Much Improved | TOTAL |
|---|---|---|---|---|---|
| Placebo | 8 | 9 | 19 | 10 | 46 |
| Active Agent | 2 | 8 | 29 | 19 | 58 |
| TOTAL | 10 | 17 | 48 | 29 | 104 |
15.5.2 Assignment of Ranks
To establish the ranks here, we begin at the “low” end of Table 15.5, with 10 patients who were worse. Their mean rank will be (1 + 2 + … + 10)/10 = 5.5. The next 17 patients had no change. They share the ranks 11 to 27, and their mean rank will be (11 + 12 + 13 + … + 27)/17 = 323/17 = 19.
An easy method can be used to get a cumbersome sum such as (11 + 12 + … + 27)/17. The hard way is to enter all the numbers into a calculator, step by step, pushing the buttons 17 times to get all the numbers entered and added, and then dividing their sum by 17 to get the average value. The easy method relies on remembering that the average value of a sequence of consecutive integers (nA, nA + 1, nA + 2, nA + 3, …, nB) is simply (nA + nB)/2. Thus, (11 + 27)/2 = 38/2 = 19, which is the same as 323/17.
The next 48 patients, sharing the ranks 28 through 75, were improved. They tie at the average rank of (28 + 75)/2 = 51.5. The remaining 29 patients, who were much improved, have the average rank of (76 + 104)/2 = 90.
To be sure that the assignment of ranks is correct, recall that the sum of ranks for the 104 patients in the trial should be (104)(105)/2 = 5460. The sum of ranks for the placebo group is (8 × 5.5) + (9 × 19) + (19 × 51.5) + (10 × 90) = 44 + 171 + 978.5 + 900 = 2093.5. The sum of ranks for the actively treated group is (2 × 5.5) + (8 × 19) + (29 × 51.5) + (19 × 90) = 3366.5. The total is 2093.5 + 3366.5 = 5460.
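The assignment of shared ranks and the check on their total can be sketched in a few lines. This is an illustrative Python sketch using the counts in Table 15.5; the helper name `mid_ranks` is an assumption made for this example.

```python
# Counts per ordinal grade, ordered from "worse" to "much improved".
placebo = [8, 9, 19, 10]
active = [2, 8, 29, 19]


def mid_ranks(counts_per_grade):
    """Average rank shared by the tied observations in each ordered grade."""
    ranks = []
    start = 1
    for count in counts_per_grade:
        end = start + count - 1
        ranks.append((start + end) / 2)  # mean of consecutive integers
        start = end + 1
    return ranks


totals = [p + a for p, a in zip(placebo, active)]
shared_ranks = mid_ranks(totals)

sum_placebo = sum(n * r for n, r in zip(placebo, shared_ranks))
sum_active = sum(n * r for n, r in zip(active, shared_ranks))

# Check: the two rank sums together must equal N(N + 1)/2.
n_total = sum(totals)
assert sum_placebo + sum_active == n_total * (n_total + 1) / 2
print(shared_ranks, sum_placebo, sum_active)
```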
15.5.3 Determination of U and Z Values
With the two sums of ranks available, the corresponding U values can be calculated with Formula [15.1] as
Uplacebo = 2093.5 − [(46)(47)/2] = 1012.5
and
Uactive = 3366.5 − [(58)(59)/2] = 1655.5
[To check accuracy of the calculation, note that 1012.5 + 1655.5 = 2668 = (46)(58).] Because Uplacebo is the smaller value, it will serve as U for the subsequent evaluation.
To do a Z test here, we would use Formulas [15.2] and [15.3] to get $\hat{\mu}_U = (46)(58)/2 = 1334$ and $\hat{\sigma}_U = \sqrt{(46)(58)(105)/12} = \sqrt{23345} = 152.79$. Substituting into Formula [15.4], we then get:

$$Z = \frac{1012.5 - 1334}{152.79} = -2.10$$
The result is stochastically significant, but is not strictly accurate because we have not taken account of ties.
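The U and Z calculations for the trial data can be retraced with a brief script. This is a minimal Python sketch of Formulas [15.1] through [15.4] without any adjustment for ties; the function names `u_statistic` and `z_from_u` are hypothetical.

```python
import math


def u_statistic(rank_sum, n):
    """Formula [15.1]: U = R - n(n + 1)/2."""
    return rank_sum - n * (n + 1) / 2


def z_from_u(u, n1, n2):
    """Formulas [15.2] to [15.4], with no adjustment for ties."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


n_placebo, n_active = 46, 58
u_placebo = u_statistic(2093.5, n_placebo)
u_active = u_statistic(3366.5, n_active)

# Check: the two U values must add up to n1 * n2.
assert u_placebo + u_active == n_placebo * n_active

u = min(u_placebo, u_active)
z = z_from_u(u, n_placebo, n_active)
print(u_placebo, u_active, round(z, 2))
```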
15.5.4 Adjustment for Ties
If $t_i$ is the number of observations tied for a given rank, we can determine

$$T_i = \frac{t_i^3 - t_i}{12} \qquad [15.5]$$

for each rank, and then calculate $\Sigma T_i$. In the cited example, for ratings of worse, t1 = 10; for no change, t2 = 17; for improved, t3 = 48; and for much improved, t4 = 29. We would have

$$\Sigma T_i = \frac{(10^3 - 10) + (17^3 - 17) + (48^3 - 48) + (29^3 - 29)}{12} = \frac{140790}{12} = 11732.5$$
The value of $\Sigma T_i$ is used to modify the estimate of the standard deviation of U in Formula [15.3]. The modified formula is:

$$s_U = \sqrt{\frac{n_1 n_2}{N(N - 1)} \left[ \frac{N^3 - N}{12} - \Sigma T_i \right]} \qquad [15.6]$$

where N = n1 + n2.
For our observed data,

$$s_U = \sqrt{\frac{(46)(58)}{(104)(103)} \left[ \frac{104^3 - 104}{12} - 11732.5 \right]} = \sqrt{[0.24907][81997.5]} = \sqrt{20423.12} = 142.9$$
This value of sU can then be used to calculate Z as:
$$Z = \frac{U - \hat{\mu}_U}{s_U} = \frac{1012.5 - 1334}{142.9} = -2.25$$
With some simple algebra, not shown here, it can be demonstrated that the subtraction of $\Sigma T_i$ always reduces the variance, so that $s_U \le \hat{\sigma}_U$. Consequently, the Z value calculated with $s_U$ will always be at least as large, in absolute magnitude, as the Z calculated with $\hat{\sigma}_U$. Thus, the effect of adjusting for ties is to raise the magnitude of Z, thereby increasing the chance of getting stochastic significance.
In the example just cited, the result was stochastically significant without the adjustment, which becomes unnecessary if the directly calculated value of Z yields a P value that is < α . If the uncorrected Z yields a P value substantially > α , the adjustment probably will not help. Thus, the adjustment is best used when the uncorrected Z is somewhere near but below the border of stochastic significance.
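The tie adjustment of Formulas [15.5] and [15.6] can be sketched as follows. This is an illustrative Python sketch applied to the trial data; `tie_corrected_sd` is a hypothetical function name. With no ties (every $t_i$ = 1), the sum of the $T_i$ terms is zero and the result reduces to Formula [15.3].

```python
import math


def tie_corrected_sd(n1, n2, tie_counts):
    """Formula [15.6]: standard deviation of U, adjusted for tied ranks."""
    n = n1 + n2
    sum_t = sum((t**3 - t) / 12 for t in tie_counts)  # Formula [15.5], summed
    variance = (n1 * n2) / (n * (n - 1)) * ((n**3 - n) / 12 - sum_t)
    return math.sqrt(variance)


n1, n2 = 46, 58
tie_counts = [10, 17, 48, 29]  # worse, no change, improved, much improved

s_u = tie_corrected_sd(n1, n2, tie_counts)
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # unadjusted, Formula [15.3]

u = 1012.5
z_unadjusted = (u - n1 * n2 / 2) / sigma_u
z_adjusted = (u - n1 * n2 / 2) / s_u
print(round(s_u, 1), round(z_unadjusted, 2), round(z_adjusted, 2))
```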
15.6 Wilcoxon Signed-Ranks Test
Wilcoxon also created a one-group or paired rank test, analogous to the corresponding one-group or paired Z and t tests. To show how the Wilcoxon procedure works in this situation, suppose 6 patients have received treatment intended to lower their levels of serum cholesterol. The results of the before (B) and after (A) differences in Table 15.6 show that four of the six cholesterol levels were reduced. The stochastic question to be answered is whether the reduction might be a chance phenomenon if there had been no real effect.
15.6.1 Basic Principle
To do a parametric one-group t test for these data, we would examine all of the B-A differences, find their mean, and contrast its value stochastically against a hypothesized mean difference ( d ) of 0. The process is equivalent to adding all the positive increments, subtracting all the negative increments, and noting their mean difference.
In a rank test, we do something quite analogous, but instead of dealing with the direct values, we work with their ranks. After ranking all of the observed increments according to their absolute values,
