
2.4. The Mann-Whitney test
Suppose two independent random samples are to be used to compare two populations. We may be unwilling to make assumptions about the form of the underlying population probability distributions, or we may be unable to obtain exact values of the sample measurements. If the data can be ranked in order of magnitude in either of these situations, the Mann-Whitney test (sometimes called the Mann-Whitney U test) can be used to test the hypothesis that the probability distributions associated with the two populations are equivalent.
Assume that, apart from any possible differences in central location, the two population distributions are identical. Suppose that \(n_1\) observations are available from the first population and \(n_2\) observations from the second population. The two samples are pooled and the observations are ranked in ascending order, with ties assigned the average of the next available ranks. Let \(R_1\) denote the sum of the ranks from the first population. The Mann-Whitney statistic is
\[ U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 \]
In testing the null hypothesis that the central locations of the two population distributions are the same, we assume that the two population distributions are identical. It can be shown that if the null hypothesis is true, the random variable \(U\) has mean
\[ E(U) = \mu_U = \frac{n_1 n_2}{2} \]
and variance
\[ \mathrm{Var}(U) = \sigma_U^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} \]
Then for large sample sizes (both at least 10), the distribution of the random variable
\[ Z = \frac{U - \mu_U}{\sigma_U} \]
is well approximated by the standard normal distribution.
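As a minimal sketch of the quantities just described, the following Python function computes the statistic, its mean and variance under the null hypothesis, and the standardized value from the two sample sizes and the rank sum of the first sample. The function and argument names are illustrative assumptions, and the statistic is assumed to take the form U = n1 n2 + n1(n1 + 1)/2 - R1.

```python
import math


def mann_whitney_summary(n1, n2, r1):
    """Return (U, mean of U, variance of U, Z) for sample sizes n1, n2
    and rank sum r1 of the first sample (assumed form of U, see lead-in)."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u - mean_u) / math.sqrt(var_u)
    return u, mean_u, var_u, z


# Hypothetical input: two samples of 10 whose first rank sum is 105.
summary = mann_whitney_summary(10, 10, 105)
print(summary)  # (50.0, 50.0, 175.0, 0.0)
```

In this hypothetical case the rank sum equals its expected value under the null hypothesis, n1(n1 + n2 + 1)/2 = 105, so U equals its mean and the standardized value is zero.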
Decision rules for the Mann-Whitney test
Suppose that the two population distributions are identical, apart from any possible differences in central location. In testing the null hypothesis that the two population distributions have the same central location, the following tests have significance level \(\alpha\):
\(H_0\): The two population distributions have the same central location.
Here \(Z = (U - \mu_U)/\sigma_U\) is the standardized Mann-Whitney statistic, and \(z_{\alpha}\) denotes the value that a standard normal random variable exceeds with probability \(\alpha\).
1. If the alternative hypothesis is the one-sided hypothesis that the location of population 1 is higher than the location of population 2, the decision rule is
Reject \(H_0\) if \(Z < -z_{\alpha}\)
2. If the alternative hypothesis is the one-sided hypothesis that the location of population 1 is lower than the location of population 2, the decision rule is
Reject \(H_0\) if \(Z > z_{\alpha}\)
3. If the alternative hypothesis is the two-sided hypothesis that the two population distributions differ, the decision rule is
Reject \(H_0\) if \(Z < -z_{\alpha/2}\) or \(Z > z_{\alpha/2}\)
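The three decision rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name and the labels "higher", "lower" and "two-sided" are hypothetical, and the statistic is assumed to be defined so that a high location for population 1 (a large rank sum for sample 1) gives a small U and hence a negative standardized value. Critical values come from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def mann_whitney_reject(z, alpha, alternative):
    """Return True if the null hypothesis is rejected at level alpha.

    alternative is one of (hypothetical labels):
      "higher"    - population 1 located higher than population 2
      "lower"     - population 1 located lower than population 2
      "two-sided" - the two populations differ
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "higher":
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "lower":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("unknown alternative: " + repr(alternative))


# Hypothetical standardized value, tested three ways at the 5% level.
for alt in ("higher", "lower", "two-sided"):
    print(alt, mann_whitney_reject(-1.80, 0.05, alt))
```

With the hypothetical value -1.80, the "higher" alternative rejects at the 5% level (the critical value is about -1.645), while the "lower" and two-sided alternatives do not.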
Example:
Let us demonstrate the methodology of the Mann-Whitney test by using it to conduct a test on the populations of account balances at two branches of a bank. Data collected from two independent simple random samples, one from each branch, are shown in Table 2.2.
Table 2.2
           Branch 1                          Branch 2
Sampled    Account               Sampled    Account
account    balance ($)           account    balance ($)
 1         1 095                  1           885
 2           955                  2           850
 3         1 200                  3           915
 4         1 195                  4           950
 5           925                  5           800
 6           950                  6           750
 7           805                  7           865
 8           945                  8         1 000
 9           875                  9         1 050
10         1 055                 10           935
11         1 025
12           975
The first step in the Mann-Whitney test is to rank the combined (pooled) data from the two samples from low to high. Using the combined set of 22 observations shown in Table 2.2, the lowest value, $750 (item 6 of sample 2), is ranked number 1. Continuing the ranking, we have
Account balance ($)    Item               Rank
  750                  6 of sample 2       1
  800                  5 of sample 2       2
  805                  7 of sample 1       3
  ...                  ...                 ...
1 195                  4 of sample 1      21
1 200                  3 of sample 1      22
Item 6 of sample 1 and item 4 of sample 2 both have the same account balance, $950. We could give one of these items rank 12 and the other rank 13, but this could lead to an erroneous conclusion. To avoid this difficulty, the usual treatment for tied data values is to assign each value the rank equal to the average of the ranks associated with the tied items. Thus the tied observations of $950 are both assigned rank 12.5. Table 2.3 shows the entire data set with the rank of each observation.
Table 2.3
           Branch 1                                   Branch 2
Sampled    Account                       Sampled    Account
account    balance ($)    Rank           account    balance ($)    Rank
 1         1 095           20             1           885            7
 2           955           14             2           850            4
 3         1 200           22             3           915            8
 4         1 195           21             4           950           12.5
 5           925            9             5           800            2
 6           950           12.5           6           750            1
 7           805            3             7           865            5
 8           945           11             8         1 000           16
 9           875            6             9         1 050           18
10         1 055           19            10           935           10
11         1 025           17
12           975           15
Sum of ranks              169.5                                     83.5
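The pooled ranking with averaged ties can be reproduced with a short Python sketch. The helper name `average_ranks` is an illustrative assumption. The balances are those of Table 2.2, and the rank sums printed at the end match the totals in Table 2.3.

```python
def average_ranks(values):
    """Rank values from low to high (ranks start at 1); tied values
    each receive the average of the ranks they jointly occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of values tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


branch1 = [1095, 955, 1200, 1195, 925, 950, 805, 945, 875, 1055, 1025, 975]
branch2 = [885, 850, 915, 950, 800, 750, 865, 1000, 1050, 935]

ranks = average_ranks(branch1 + branch2)
rank_sum_1 = sum(ranks[:len(branch1)])
rank_sum_2 = sum(ranks[len(branch1):])
print(rank_sum_1, rank_sum_2)  # 169.5 83.5
```

As a check, the two rank sums add to 253, which is the sum 1 + 2 + ... + 22 of all ranks for the 22 pooled observations.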
The next step in the Mann-Whitney test is to sum the ranks for each sample. These sums are shown in Table 2.3. The test procedure can be based upon the sum of the ranks for either sample. In the following discussion we use the sum of the ranks for the sample from branch 1. We will denote this sum by \(R_1\). Thus, in our example \(R_1 = 169.5\).
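The value observed for the Mann-Whitney statistic is
\[ U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 = (12)(10) + \frac{(12)(13)}{2} - 169.5 = 28.5 \]
Since, under the null hypothesis, the two samples are selected from identical populations, and \(n_1 = 12\) and \(n_2 = 10\) are each 10 or greater, the sampling distribution of \(U\) can be approximated by a normal distribution with mean
\[ E(U) = \frac{n_1 n_2}{2} = \frac{(12)(10)}{2} = 60 \]
and variance
\[ \mathrm{Var}(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} = \frac{(12)(10)(23)}{12} = 230 \]
Suppose that we want to test the null hypothesis that the central locations of the distributions of account balances are identical against the two-sided alternative for \(\alpha = 0.05\). The decision rule is to reject the null hypothesis if
\[ Z = \frac{U - E(U)}{\sqrt{\mathrm{Var}(U)}} < -z_{\alpha/2} \quad \text{or} \quad Z > z_{\alpha/2} \]
Here \(z_{\alpha/2} = z_{0.025} = 1.96\) and
\[ Z = \frac{28.5 - 60}{\sqrt{230}} = -2.08 \]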
Since -2.08 is less than -1.96, we reject the null hypothesis that the two populations of account balances are identical. We conclude that the probability distribution of account balances at branch 1 is not the same as that at branch 2.
Now, from Table 1 of the Appendix, the lower-tail probability of the standard normal distribution corresponding to the value \(z = -2.08\) is 0.0188, so the corresponding p-value for the two-sided test is \(2(0.0188) = 0.0376\).
The null hypothesis will be rejected at any significance level higher than 3.76%. Thus, these data contain moderately strong evidence against the hypothesis that the central locations of account balances at the two branches are the same: the hypothesis is rejected at the 5% level, although it would not be rejected at the 1% level.
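For readers without a normal table at hand, the p-value can be approximated with Python's standard library. This is a sketch only: the variable names are illustrative, and the value -2.08 is the rounded standardized statistic quoted in the example.

```python
from statistics import NormalDist

z = -2.08  # rounded standardized statistic from the example

# Lower-tail probability of the standard normal distribution at z.
lower_tail = NormalDist().cdf(z)

# Two-sided p-value: twice the smaller of the two tail probabilities.
p_two_sided = 2 * min(lower_tail, 1 - lower_tail)

print(lower_tail, p_two_sided)
```

The computed values agree with the tabulated 0.0188 and 0.0376 to about three decimal places. Any small difference arises because the table value is rounded before it is doubled.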
Exercises
1. Starting salaries were recorded for ten recent business administration graduates at each of two well-known universities. At significance level \(\alpha\), test the null hypothesis that the difference in the starting salaries from the two universities is zero against the alternative that starting salaries are higher for university A.
           University A                        University B
Student    Monthly salary ($)       Student    Monthly salary ($)
 1           890                     1         1 000
 2           950                     2         1 020
 3         1 200                     3         1 140
 4         1 150                     4         1 000
 5         1 300                     5           975
 6         1 350                     6           925
 7           990                     7           900
 8         1 050                     8         1 025
 9         1 400                     9         1 075
10         1 450                    10           930
2. The following data show product weights for items produced on two production lines:
Line 1: 13.6; 13.8; 14.0; 13.9; 13.4; 13.2; 13.3; 13.6; 12.9; 14.4
Line 2: 13.7; 14.1; 14.2; 14.0; 14.6; 13.5; 14.4; 14.8; 14.5; 14.3; 15.0; 14.9
At significance level \(\alpha\), test the null hypothesis that the difference between the product weights for the two lines is zero against the alternative that product weights for the second line are higher. Also find the p-value.
3. A random sample of 14 male students and an independent random sample of 16 female students were asked to write essays at the conclusion of a writing course. Their grades are recorded below:
Male: 75; 80; 60; 80; 95; 100; 65; 70; 75; 60; 50; 55; 90; 95
Female: 85; 70; 90; 100; 95; 67; 50; 50; 67; 83; 78; 62; 43; 97; 89; 73
Test at the 5% significance level the null hypothesis that, in the aggregate, the male and female students are equally ranked, against a two-sided alternative. Also find the p-value.
4. Random samples of 12 management department graduates and 14 economics department graduates were asked their starting salaries. Those salaries were then ranked from 1 to 26. The following rankings resulted:
Management: 2; 6; 7; 1; 11; 20; 8; 14; 21; 12; 4; 26
Economics: 13; 3; 17; 25; 5; 9; 10; 24; 15; 23; 16; 22; 18; 19
Analyze the data using the Mann-Whitney test, and comment on the results.
5. Starting salaries of graduates from two leading universities were compared. Independent random samples of 40 from each university were taken, and the 80 starting salaries were pooled and ranked. The sum of the ranks for students from one of these universities was 1450. Test the null hypothesis that the central locations of the population distributions are identical against a two-sided alternative.
6. A stock market analyst produced at the beginning of the year a list of stocks to buy and another list of stocks to sell. For a random sample of ten stocks from the “buy list”, percentage returns over the year were as follows:
10.6; 5.2; 12.8; 16.2; 10.6; 4.3; 3.1; 11.7; 13.9; 11.3
For an independent random sample of ten stocks from the “sell list”, percentage returns over the year were as follows:
-2.6; 6.1; 9.9; 11.3; 2.3; 3.9; -2.3; 1.3; 7.9; 10.8
At significance level \(\alpha\), use the Mann-Whitney test to interpret these data. Also find and interpret the p-value.
Answers
1. Reject \(H_0\).
2. Reject \(H_0\); p-value = 0.3%.
3. Accept \(H_0\).
4. p-value = 12.36%; \(H_0\) will be rejected at all levels higher than 12.36%.
5. p-value = 0.101; \(H_0\) will be rejected at any level higher than 10.1%.
6. Reject \(H_0\) at 5%; p-value = 2.58%.