Encyclopedia of Sociology, Vol. 3
NONPARAMETRIC STATISTICS
distribution theory was developed during this time period.
A number of texts have been published recently (see, e.g., Conover 1999; Daniel 1990; Hollander and Wolfe 1999; Krauth 1988; Neave and Worthington 1988; Siegel and Castellan 1988; Sprent 1989). Some of these texts can be used without an extensive statistical background; they have excellent bibliographies and provide adequate examples of assumptions, applications, scope, and limitations of the field of nonparametric statistics. In addition, the Encyclopedia of Statistical Sciences (Kotz and Johnson 1982–1989) and the International Encyclopedia of Statistics (Kruskal and Tanur 1978) should serve as excellent sources of reference material pertaining to nonparametric statistics.
The literature on nonparametric statistics is extensive. The bibliography published in 1962 by Savage had approximately 3,000 entries. More recent bibliographies have made substantial additions to that list.
TESTS AND TECHNIQUES
Nonparametric statistics may be divided into three major categories: (1) noninferential statistical measures; (2) inferential estimation techniques for point and interval estimation of parametric values of the population; and (3) hypothesis testing, which is considered the primary purpose of nonparametric statistics. (Estimation techniques included in the second category are often used as a first step in hypothesis testing.) These three categories include different types of problems dealing with location, dispersion, goodness-of-fit, association, runs and randomness, regression, trends, and proportions. They are presented in Table 1 and illustrated briefly in the text.
Table 1, which includes a short list of some commonly used nonparametric statistical methods and techniques, is illustrative in nature. It is not intended to be an exhaustive list. The literature literally consists of scores of nonparametric tests. More exhaustive tables are available in the literature (e.g., Hollander and Wolfe 1999). The six columns in the table describe the nature of the sample, and the eight categories of rows identify the major types of problems addressed in
nonparametric statistics. Types of data used in nonparametric tests are not included in the table, though references to levels of data are made in the text. Tables that relate tests to different types of data levels are presented in some texts (e.g., Conover 1999). A different type of table provided by Bradley (1968) identifies the family to which the nonparametric derivations belong.
The first column in Table 1 consists of tests involving a single sample. The statistics in this category include both inferential and descriptive measurements. They would be used to decide whether a particular sample could have been drawn from a presumed population, or to calculate estimates, or to test the null hypothesis. The next column is for two independent samples. The independent samples may be randomly drawn from two populations, or randomly assigned to two treatments. In the case of two related samples, the statistical tests are intended to examine whether both samples are drawn from the same (or identical) populations. The cases of k (three or more) independent samples and k related samples are extensions of the two-sample cases.
The eight categories in the table identify the main focus of problems in nonparametric statistics and are briefly described later. Only selected tests and techniques are listed in Table 1. Log-linear analyses are not included in this table, although they deal with proportions and meet some criteria for nonparametric tests. The argument against their inclusion is that they are highly developed, specialized techniques with some very specific properties.
It may be noted that: (1) many tests cross over into different types of problems (e.g., the chi-square test is included in three types of problems); (2) the same probability distribution may be used for a variety of tests (e.g., in addition to association, proportion, and goodness-of-fit, the chi-square approximation may also be used in Friedman’s two-way analysis of variance and the Kruskal-Wallis test); (3) many of the tests listed in the table are extensions or modifications of other tests (e.g., the original median test was later extended to three or more independent samples; e.g., the Jonckheere test); (4) the general assumptions and procedures that underlie some of these tests have been extended beyond their original scope (e.g., Hájek’s extension of the Kolmogorov-Smirnov test to regression
Selected Nonparametric Tests and Techniques
(Columns give the type of data; within a cell, tests are separated by semicolons.)
Type of Problem | One Sample | Two Independent Samples | Two Related, Paired, or Matched Samples | k Independent Samples | k Related Samples
Location | Sign test; Wilcoxon signed ranks test | Mann-Whitney-Wilcoxon rank-sum test; Permutation test; Fisher tests; Fisher-Pitman test; Terry-Hoeffding and van der Waerden/normal scores tests; Tukey’s confidence interval | Sign test; Wilcoxon matched-pairs signed rank test; Confidence interval based on sign test; Confidence interval based on the Wilcoxon matched-pairs signed-ranks test | Extension of Brown-Mood median test; Kruskal-Wallis one-way analysis of variance test; Jonckheere test for ordered alternatives; Multiple comparisons | Friedman two-way analysis of variance
Dispersion (Scale Problems) | | Siegel-Tukey test; Moses’s ranklike tests; Normal scores tests; Test of the Freund, Ansari-Bradley, David, or Barton type | | |
Goodness-of-fit | Chi-square goodness-of-fit; Kolmogorov-Smirnov test; Lilliefors test | Chi-square test; Kolmogorov-Smirnov test | | Chi-square test; Kolmogorov-Smirnov test |
Association | Spearman’s rank correlation; Kendall’s tau-a, tau-b, tau-c; Olmstead-Tukey test; Phi coefficient; Yule coefficient; Goodman-Kruskal coefficients; Cramer’s statistic; Point biserial coefficient | Chi-square test of independence | Spearman rank correlation coefficient; Kendall’s tau-a, tau-b, tau-c; Olmstead-Tukey corner test | Chi-square test of independence; Kendall’s partial rank correlations; Kendall’s coefficient of agreement; Kendall’s coefficient of concordance | Kendall’s coefficient of concordance
Runs and Randomness | Runs test; Runs above and below the median; Runs up-and-down test | Wald-Wolfowitz runs test | | |
Regression | | Hollander and Wolfe test for parallelism; Confidence interval for difference between two slopes | | Brown-Mood test |
Trends and Changes | Cox-Stuart test; Kendall’s tau; Spearman’s rank correlation coefficient; McNemar change test; Runs up-and-down test | | McNemar change test | |
Proportion and Ratios | Binomial test | Fisher’s exact test; Chi-square test of homogeneity | | Chi-square test of homogeneity | Cochran’s Q test
Table 1
analysis and extension of the two-sample Wilcoxon test for testing the parallelism between two linear regression slopes); (5) many of these tests have corresponding techniques of confidence interval estimates, only a few of which are listed in Table 1; (6) many tests have other equivalent or alternative tests (e.g., when only two samples are used, the Kruskal-Wallis test is equivalent to the Mann-Whitney test); (7) sometimes similar tests are lumped together in spite of differences, as in the case of the Mann-Whitney-Wilcoxon test or the Ansari-Bradley type tests or multiple comparison tests; (8) some tests can be used with one or more samples, in which case the tests are listed in one or more categories, depending on common usage; (9) most of these tests have analogous parametric tests; and (10) a very large majority of nonparametric tests and techniques are not included in the table.
Only a few of the commonly used tests and techniques are selected from Table 1 for illustrative purposes in the sections below. The assumptions listed for the tests are not meant to be exhaustive, and hypothetical data are used in order to simplify the computational examples. Discussions of the strengths and weaknesses of these tests are also omitted. Most of the illustrations are two-tailed or two-sided hypotheses at the 0.05 level. Tables of critical values for the tests illustrated here are included in most statistical texts. Modified formulas for ties are not emphasized, nor are measures of estimates illustrated. Generally, only simplified formulas are presented. A very brief description of the eight major categories of problems follows.
Location. Making inferences about location parameters has been a major concern in the field of statistics. In addition to the mean, which is a parameter of great importance in the field of inferential statistics, the median is a parameter of great importance in nonparametric statistics because of its robustness. The robust quality of the median can be easily ascertained. If the values in a sample of five observations are 5, 7, 9, 11, 13, both the mean and the median are 9. If two observations are added to the sample, 1 and 94 (an outlier), the median is still 9, but the mean is changed to 20. Typical location problems include estimating the median, determining confidence intervals for the median, and testing whether two samples have equal medians.
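The robustness claim above can be checked directly. The following is a minimal sketch using Python's standard library; the variable names are illustrative and are not part of the original article.

```python
from statistics import mean, median

# The five observations from the text.
sample = [5, 7, 9, 11, 13]
print(mean(sample), median(sample))        # mean and median agree

# Add the two extra observations from the text, including the outlier 94.
extended = sample + [1, 94]
print(mean(extended), median(extended))    # the mean moves, the median does not
```

The single outlier shifts the mean from 9 to 20 while the median stays at 9, which is the sense in which the median is called robust.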
Sign Test This is the earliest known nonparametric test used. It is also one of the easiest to understand intuitively because the test statistic is based on the number of positive or negative differences or signs from the hypothesized median. A binomial probability test can be applied to a sign test because of the dichotomous nature of outcomes that are specified by a plus (+) which indicates a difference in one direction or a minus (−) sign which indicates a difference in another direction. Observations with no change or no difference are eliminated from the analysis. The sign test may be a one-tailed or a two-tailed test. A sign test may be used whenever a t-test is inappropriate because the actual values may be missing or not known, but the direction of change can be determined, as in the case of a therapist who believes that her client is improving. The sign test only uses the direction of change and not the magnitude of differences in the data.
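Because the sign test reduces to a binomial probability, its exact p-value is easy to compute. The sketch below is illustrative only: the counts of 8 plus signs and 2 minus signs are hypothetical (the article gives no data for the sign test), and the function name is an assumption.

```python
from math import comb

def sign_test_two_sided(n_plus: int, n_minus: int) -> float:
    """Exact two-sided sign test p-value under H0: P(+) = P(-) = 0.5.

    Observations with no difference are assumed to have been dropped already.
    """
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # Probability of a split at least as lopsided as the observed one, in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical example: 8 clients improved (+), 2 got worse (-).
p_value = sign_test_two_sided(8, 2)
print(round(p_value, 4))
```

Only the direction of each change enters the calculation, so the magnitudes of the differences never need to be known.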
Wilcoxon Matched-Pairs Signed-Rank Test The sign test analysis includes only the positive or negative direction of difference between two measures; the Wilcoxon matched-pairs signed-rank test will also take into account the magnitude of differences in ordering the data.
Example: A matched sample of students in a school were enrolled in diving classes with different training techniques. Is there a difference? The scores are listed in Table 2.
Illustrative Assumptions: (1) The random sample data consist of pairs; (2) the differences in pair values have an ordered metric or interval scale, are continuous, and independent of one another; and (3) the distribution of differences is symmetric.
Hypotheses: A two-sided test is used in this example.
H0: Sum of positive ranks = sum of negative ranks in population
H1: Sum of positive ranks ≠ sum of negative ranks in population (1)
Test statistic or procedures: The differences between the pairs of observations are obtained and ranked by magnitude. T is the smaller of the sum of ranks with positive or negative signs. Ties may be either eliminated or the average value of the ranks assigned to them. The decision is based on the value of T for a specified N. Z can be used as an approximation even with a small N except in cases with a relatively large number of ties. The formula for Z may be substituted when N > 25.
$$Z = \frac{T - N(N+1)/4}{\sqrt{N(N+1)(2N+1)/24}} \tag{2}$$
This formula is not applied to the data in Table 2 because N is less than 25; instead, the rank sums calculated in Table 2 are used in deciding whether to reject or fail to reject ("accept") the null hypothesis. In this example, N is 7 and the value of the smaller rank sum, T, is 9.5.
Decision: The researchers fail to reject the null hypothesis (or ‘‘accept’’ the null hypothesis) of no difference between the two groups, with an N of 7 at the 0.05 level, for a two-sided test, concluding that there is no statistically significant difference in the two types of training at the 0.05 level.
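The rank sums used in this decision can be reproduced from the paired scores in Table 2. The following is a minimal sketch in plain Python; the helper `average_ranks` is a hypothetical name introduced here, and ties are handled by assigning average ranks, one of the two options mentioned above.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Paired scores from Table 2.
team_a = [37, 39, 32, 21, 20, 9, 14]
team_b = [35, 46, 24, 34, 28, 12, 9]

# Differences Y - X; zero differences (none here) would be dropped.
diffs = [y - x for x, y in zip(team_a, team_b) if y != x]
ranks = average_ranks([abs(d) for d in diffs])

t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
t_stat = min(t_plus, t_minus)
print(t_plus, t_minus, t_stat)
```

The smaller rank sum is the statistic T that is compared with the tabled critical value for N = 7.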
Efficiency: The asymptotic relative efficiency of the test varies around 95 percent, based on the sample sizes.
Total Scores for Five Diving Trials
Pairs | X (Team A) | Y (Team B) | Y - X (Differences) | Signed Rank of Differences | Negative Ranks
1 | 37 | 35 | -2 | -1 | 1
2 | 39 | 46 | 7 | +4 |
3 | 32 | 24 | -8 | -5.5 | 5.5
4 | 21 | 34 | 13 | +7 |
5 | 20 | 28 | 8 | +5.5 |
6 | 9 | 12 | 3 | +2 |
7 | 14 | 9 | -5 | -3 | 3
Sums | | | | T+ = 18.5 | T– = 9.5
Table 2
Related parametric test: The t-test for matched pairs.
Analogous nonparametric tests: Sign test; randomization test for matched pairs; Walsh test for pairs.
Kruskal-Wallis One-Way Analysis of Variance Test This is a location measure with three or more independent samples. It is a one-way analysis of variance that utilizes ranking procedures.
Example: The weight loss in kilograms for 13 patients randomly assigned to one of three diet programs is listed in Table 3 along with the rankings. Is there a significant difference in the sample medians?
Illustrative Assumptions: (1) Ordinal data; (2) three or more random samples; and (3) independent observations.
Hypotheses: A two-sided test without ties is used in this example.
H0: Md1 = Md2 = Md3. The populations have the same median value.
H1: Md1, Md2, and Md3 are not all equal. The populations do not all have the same median value. (3)
Test statistic or procedures: The procedure is to rank the values, compute the sum of the ranks for each group, and calculate the H statistic. The formula for H is as follows:
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{N_i} - 3(N+1) \tag{4}$$
where N_i = the number of cases in the ith sample, R_i = the sum of ranks in the ith sample, N = the total number of cases, and k = the number of samples.
$$H = \frac{12}{13(13+1)} \left[ \frac{(46)^2}{5} + \frac{(16)^2}{4} + \frac{(29)^2}{4} \right] - 3(13+1) = 45.9857 - 42 = 3.99 \tag{5}$$
Decision: Do not reject the null hypothesis, as the chi-square value for 2 df at the 0.05 level is 5.99 and the H value of 3.99 is less than the critical value.
Efficiency: The asymptotic relative efficiency of the Kruskal-Wallis test relative to the F test is 0.955 if the population is normally distributed.
Related parametric test: F test.
Analogous nonparametric test(s): Jonckheere test for ordered alternatives.
Friedman Two-Way Analysis of Variance This is a nonparametric two-way analysis of variance based on ranks and is a good substitute for the parametric F test when the assumptions for the F test cannot be met.
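As a check on the Kruskal-Wallis example (the diet data in Table 3), the rank sums and the H statistic can be recomputed in a few lines. This is a minimal sketch in plain Python with illustrative variable names; it relies on the fact that the example contains no tied values.

```python
# Weight loss (kg) for the three diet groups in Table 3.
groups = {
    "Group 1": [2.8, 3.5, 4.0, 4.1, 4.9],
    "Group 2": [2.2, 2.7, 3.0, 3.6],
    "Group 3": [2.9, 3.1, 3.7, 3.8],
}

# Rank all 13 observations together (valid as written only because there are no ties).
pooled = sorted(v for values in groups.values() for v in values)
rank_of = {v: i + 1 for i, v in enumerate(pooled)}
n_total = len(pooled)

# Sum of ranks within each group.
rank_sums = {name: sum(rank_of[v] for v in values) for name, values in groups.items()}

# H = 12 / (N(N+1)) * sum(R_i^2 / N_i) - 3(N+1)
h_stat = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[name] ** 2 / len(values) for name, values in groups.items()
) - 3 * (n_total + 1)

print(rank_sums, round(h_stat, 2))
```

The resulting H is then compared with the chi-square critical value for k - 1 = 2 degrees of freedom.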
Diet Programs and Weight-Loss Rankings
Group 1 | Rank | Group 2 | Rank | Group 3 | Rank
2.8 | 3 | 2.2 | 1 | 2.9 | 4
3.5 | 7 | 2.7 | 2 | 3.1 | 6
4.0 | 11 | 3.0 | 5 | 3.7 | 9
4.1 | 12 | 3.6 | 8 | 3.8 | 10
4.9 | 13 | | | |
R1 = 46 | | R2 = 16 | | R3 = 29 |
Table 3
Example: Three groups of telephone employees from each of the work shifts were tested for their ability to recall fifteen-digit random numbers, under four conditions or treatments of sleep deprivation. The observations and rankings are listed in Tables 4 and 5. Is there a difference in the population medians?
$$F_r = \frac{12}{Nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1) \tag{6}$$
where N = number of rows (subjects), k = number of columns (variables or conditions or treatments), and R_j = sum of ranks in the jth column.
$$F_r = \frac{12}{3(4)(4+1)} \left[ (11)^2 + (6)^2 + (3)^2 + (10)^2 \right] - 3(3)(4+1) \tag{7}$$
$$F_r = (0.20)(266) - 45 = 8.2 \tag{8}$$
Illustrative Assumptions: (1) There is no interaction between blocks and treatment; and (2) ordinal data with observable magnitude or interval data are needed.
Hypotheses:
H0: Md1 = Md2 = Md3 = Md4. The different levels of sleep deprivation do not have differential effects.
H1: One or more equality is violated. The different levels of sleep deprivation have differential effects. (9)
Test statistic or procedures: The formula and computations are listed above.
Decision: The critical value at the 0.05 level of significance in this case for N = 3 and k = 4 is 7.4. Reject the null hypothesis because the calculated F_r value of 8.2 is higher than the critical value. Conclude that the ability to recall is affected.
Efficiency: The asymptotic relative efficiency of this test depends on the nature of the underlying population distribution. With k = 2 (number of samples), the asymptotic relative efficiency is reported to be 0.637 relative to the t test and is higher in cases of larger number of samples. In the case of three samples, for example, the asymptotic relative efficiency increases to 0.648 relative to the F test, and in the case of nine samples it is at least 0.777.
Related parametric test: F test.
Analogous nonparametric tests: Page test for ordered alternatives.
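The Friedman computation for the sleep-deprivation example can be retraced from the raw scores in Table 4. The sketch below is illustrative, written in plain Python with assumed variable names; the within-row ranking shortcut works only because no row contains tied scores.

```python
# Recall scores from Table 4: one row per group, one column per condition I-IV.
scores = [
    [7, 4, 2, 6],    # Group 1
    [6, 4, 2, 9],    # Group 2
    [10, 3, 2, 7],   # Group 3
]

def rank_row(row):
    """Rank the values within one row (1 = smallest); assumes no ties in the row."""
    ordered = sorted(row)
    return [ordered.index(v) + 1 for v in row]

ranks = [rank_row(row) for row in scores]          # reproduces Table 5
n_rows, k = len(scores), len(scores[0])

# Column rank sums R_j.
col_sums = [sum(row[j] for row in ranks) for j in range(k)]

# F_r = 12 / (N k (k+1)) * sum(R_j^2) - 3 N (k+1)
f_r = 12 / (n_rows * k * (k + 1)) * sum(s ** 2 for s in col_sums) - 3 * n_rows * (k + 1)

print(col_sums, round(f_r, 2))
```

The statistic is compared with the tabled critical value of 7.4 for N = 3 and k = 4 quoted above.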
Mann-Whitney-Wilcoxon Test A combination of different procedures is used to calculate the probability of two independent samples being drawn from the same population or from two populations with equal means. This group of tests is analogous to the t-test; it uses rank sums and can be used with fewer assumptions.
Example: Table 6 lists the verbal ability scores for a group of boys and a group of girls who are less than 1 year old. (The scores are arranged in ascending order for each of the groups.) Do the data provide evidence for significant differences in verbal ability of boys and girls?
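The rank sums and U statistics for this example can be computed directly from the scores in Table 6. The following is a minimal sketch in plain Python. It assumes the usual definitions U1 = N1N2 + N1(N1 + 1)/2 - R1 and U2 = N1N2 + N2(N2 + 1)/2 - R2, where R1 and R2 are the rank sums of the two samples; the helper `average_ranks` is a hypothetical name, and tied scores receive average ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Verbal ability scores from Table 6.
boys = [10, 15, 18, 28]
girls = [12, 14, 20, 22, 25, 30, 30, 31, 32]

ranks = average_ranks(boys + girls)    # ranks in the combined sample
n1, n2 = len(boys), len(girls)
r1 = sum(ranks[:n1])                   # rank sum for boys
r2 = sum(ranks[n1:])                   # rank sum for girls

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
print(r1, r2, u1, u2)
```

A useful arithmetic check is that U1 + U2 always equals N1N2; the smaller of the two values is the one compared with the tabled critical value.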
Scores of Three Groups by Four Levels of Sleep Deprivation
Conditions | I | II | III | IV
Group 1 | 7 | 4 | 2 | 6
Group 2 | 6 | 4 | 2 | 9
Group 3 | 10 | 3 | 2 | 7
Table 4
Rank of Three Groups by Four Levels of Sleep Deprivation
Ranks | I | II | III | IV
Group 1 | 4 | 2 | 1 | 3
Group 2 | 3 | 2 | 1 | 4
Group 3 | 4 | 2 | 1 | 3
Rj | 11 | 6 | 3 | 10
Table 5
Verbal Scores for Boys and Girls Less than 1 Year Old
Boys N1 (sample A): | 10 | 15 | 18 | 28
Girls N2 (sample B): | 12 | 14 | 20 | 22 | 25 | 30 | 30 | 31 | 32
Table 6
Illustrative Assumptions: (1) Samples are independent and (2) ordinal data.
Hypotheses: A two-sided test is used in this example.
H0: Md1 = Md2. There are no significant differences in the verbal ability of boys and girls.
H1: Md1 ≠ Md2. There is a significant difference in the verbal ability of boys and girls. (10)
Test statistic or procedures: Rearrange all the scores in an ascending or descending order (see Table 7). The test statistics are U1 and U2 and the calculations are illustrated below.
Mann-Whitney Wilcoxon U Test The following formulas may be used to calculate U.
$$U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - R_1 \tag{11}$$
$$U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - R_2 \tag{12}$$
R1 = 1 + 4 + 5 + 9 = 19
R2 = 2 + 3 + 6 + 7 + 8 + 10.5 + 10.5 + 12 + 13 = 72 (13)
R1 and R2 refer to the sum of ranks for group 1 and group 2, respectively.
U2 = (4)(9) + [9(9 + 1) / 2] − 72 = 9 and U1 = 27, for U1 + U2 = N1N2 = 36
Decision: Retain null hypothesis. At the 0.05 level, we fail to reject the null hypothesis of no differences in verbal ability. The rejection region for U in this case is 4 or smaller, for sample sizes of 4 and 9 respectively.
Efficiency: For large samples, the asymptotic relative efficiency approaches 95 percent.
Related Parametric Test: t test.
Analogous Nonparametric Tests: Behrens-Fisher problem test, robust rank-order test.
Z can be used as a normal approximation if N > 12, or if N1 or N2 > 10, and the formula is given below.
$$Z = \frac{R_1 - R_2 - (N_1 - N_2)(N + 1)/2}{\sqrt{N_1 N_2 (N + 1)/3}} \tag{14}$$
Dispersion. Dispersion refers to spread or variability. Dispersion measures are intended to test for equality of dispersion in two populations. The two-tailed null hypothesis in the Ansari-Bradley-type tests and Moses-type tests assumes that there are no differences in the dispersion of the populations. The Ansari-Bradley test assumes equal medians in the population. The Moses test has wider applicability because it does not make that assumption.
Dispersion tests are not widely used because of the limitations on the tests imposed by the assumptions, the low asymptotic relative efficiency of the tests, or both.
Goodness-of-Fit. A goodness-of-fit test is used to test different types of problems: for example, the likelihood of observed sample data being drawn from a prespecified population distribution, or comparisons of two independent samples being drawn from populations with a similar distribution. The first problem mentioned above is illustrated here using the chi-square goodness-of-fit procedures.
χ2, or the chi-square test, is among the most widely used nonparametric tests in the social sciences. The four major types of analyses conducted through the use of chi-square are: (1) goodness-of-fit tests, (2) tests of homogeneity, (3) tests for differences in probability, and (4) tests of independence. Of the four types of tests, the last one is the most widely used. The goodness-of-fit test and the test of independence will be illustrated in this article because the assumptions, formulas, and testing procedures are very similar to one another. The χ2 test for independence is presented in the section on measures of association.
Goodness-of-fit tests would be used in making decisions based on the prior knowledge of the population; for example, sentence length in a new manuscript could be compared with other works of an author to decide whether the manuscript is by the same author; or a manager’s observation of a greater number of accidents in the factory on some days of the week as compared to the average figures could be tested for significant differences. The expected frequency of accidents given in Table 8 is based on the assumption of no differences in the number of accidents by days of the week.
Illustrative Assumptions: (1) The data are nominal or of a higher order such as ordinal, categorical, interval or ratio data. (2) The data are collected from a random sample.
Hypothesis:
H0: The distribution of accidents during the week is uniform.
H1: The distribution of accidents during the week is not uniform. (15)
Test Statistic or Procedures: The formula for calculating this is the same as for the chi-square test of independence. A short-cut formula is also provided and is used in this illustration:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \qquad \text{or} \qquad \chi^2 = \sum \frac{f_o^2}{f_e} - N \tag{16}$$
The notation f_o refers to the frequency of actual observations and f_e is the frequency of expected observations.
$$\chi^2 = \frac{225}{30} + \frac{900}{30} + \cdots + \frac{1600}{30} + \frac{2025}{30} - N = 230 - 210 = 20 \tag{17}$$
Decision: With seven categories (days), there are six degrees of freedom. The critical value of χ2 with 6 df is 12.59 at the .05 level of significance. Because the calculated value of 20 exceeds the critical value, the null hypothesis of equal distribution of accidents over the 7 days is rejected at the .05 level of significance.
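The goodness-of-fit arithmetic can be verified with both the defining formula and the short-cut formula. This is a minimal sketch in plain Python using the accident counts from Table 8; the variable names are illustrative, and the critical value 12.59 is simply the tabled figure quoted in the text.

```python
# Observed traffic accidents by day (S, M, T, W, T, F, S) from Table 8.
observed = [15, 30, 30, 25, 25, 40, 45]
n = sum(observed)

# Under H0 (uniform distribution) every day has the same expected frequency.
expected = [n / len(observed)] * len(observed)

# Defining formula: sum of (f_o - f_e)^2 / f_e.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Short-cut formula: sum of f_o^2 / f_e, minus N.
chi2_shortcut = sum(o ** 2 / e for o, e in zip(observed, expected)) - n

df = len(observed) - 1
critical_05 = 12.59                    # tabled chi-square value for 6 df at .05
print(round(chi2, 2), round(chi2_shortcut, 2), df, chi2 > critical_05)
```

Both formulas give the same value, which is why the short-cut version can be used when the expected frequencies are known in advance.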
Asymptotic Relative Efficiency: There is no discussion in the literature about this because nominal data can be used in this analysis and the test is often used when there are no alternatives available. Asymptotic relative efficiency is meaningless with nominal data.
Related Parametric Test. t test.
Analogous Nonparametric Tests. The Kolmogorov-Smirnov one-sample test, and the binomial test for dichotomous variables.
The Kolmogorov-Smirnov test is another major goodness-of-fit test. It has two versions, the one-sample and the two-sample tests. It differs from the chi-square goodness-of-fit test in that it is based on observed and expected differences in cumulative distribution functions and can be used with individual values instead of having to group them.
Ranked Verbal Scores for Boys and Girls Less than 1 Year Old
Scores: | 10 | 12 | 14 | 15 | 18 | 20 | 22 | 25 | 28 | 30 | 30 | 31 | 32
Rank: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10.5 | 10.5 | 12 | 13
Sample (A = boys, B = girls): | A | B | B | A | A | B | B | B | A | B | B | B | B
Table 7
Association. There are two major types of measures of association. They consist of: (1) measures to test the existence (relationship) or nonexistence (independence) of association among the variables, and (2) measures of the degree or strength of association among the variables. Different tests of association are utilized in the analysis of nominal and nominal data, nominal and ordinal data, nominal and interval data, ordinal and ordinal data, and ordinal and interval data.
Chi-Square Test of Independence In addition to goodness-of-fit, χ2 can also be used as a test of independence between two variables. The test can be used with nominal data and may consist of one or more samples.
Example: A large firm employs both married and single women. The manager suspects that there is a difference in the absenteeism rates between the two groups. How would you test for it? Data are included in Table 9.
Illustrative Assumptions: (1) The data are nominal or of a higher order such as ordinal, categorical, interval, or ratio data. (2) The data are collected from a random sample.
Hypothesis:
H0: The two variables are independent, or there is no difference between married and single women with respect to absenteeism.
H1: The two variables are not independent (i.e., they are related), or there is a difference between married women and single women with respect to absenteeism. (18)
Test statistic or procedures: The formula for χ2 is given below. Differences between observed and expected frequencies are calculated, and the resultant value is indicated below.
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \tag{19}$$
where f_o = observed frequency and f_e = expected frequency.
The expected frequency for each cell is obtained by multiplying the corresponding column marginal total by the row marginal total and dividing by the total number of observations. For example, the expected frequency for the cell with an observed frequency of 40 is (100 × 100)/400 = 25. Similarly, the expected frequency for the cell with an observed frequency of 170 is (200 × 300)/400 = 150.
$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(70-75)^2}{75} + \frac{(40-25)^2}{25} + \frac{(60-75)^2}{75} + \frac{(30-50)^2}{50} + \frac{(170-150)^2}{150} = 1 + .33 + 9 + 3 + 8 + 2.67 = 24 \tag{20}$$
$$df = (\text{number of rows} - 1) \times (\text{number of columns} - 1) = (3 - 1)(2 - 1) = 2 \tag{21}$$
Decision: As the critical χ2 value with two df is 5.99, we reject the null hypothesis, at the 0.05 level. We accept the alternate hypothesis of the existence of a statistically significant difference in the ratio of absenteeism per year between the two groups of married and single women.
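The expected frequencies, the χ2 value, and the degrees of freedom in this example can be rebuilt from the observed cell counts alone. The sketch below is illustrative and uses plain Python; the observed counts are taken from equation (20), arranged as three rows by two columns, and the variable names are assumptions introduced here.

```python
# Observed frequencies as they appear in equation (20): 3 rows x 2 columns.
observed = [
    [30, 70],
    [40, 60],
    [30, 170],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell = row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (f_o - f_e)^2 / f_e.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(chi2, 2), df)
```

The computed statistic is then compared with the critical value of 5.99 for 2 df quoted in the decision above.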
Efficiency: The asymptotic relative efficiency of a χ2 test is hard to assess because it is affected by the number of cells in the contingency table and by the sample size as well. The asymptotic relative efficiency of a 2 × 2 contingency table is very low, but the power of the χ2 test approaches 1 as the sample size gets larger. However, a large number of cells in a χ2 table, especially in combination with large sample sizes, tends to yield large χ2 values that are statistically significant because of the size of the sample. In the past, Yates’s correction for continuity was often used in a 2 × 2 contingency table if the cell frequencies were small. Because of the criticism of this procedure, this correction is no longer widely used. Other tests such as Fisher’s Exact Test can be used in cases of small cell frequencies.
Frequency of Traffic Accidents for One Week during May
Day | S | M | T | W | T | F | S | Total
Traffic Accidents | 15 | 30 | 30 | 25 | 25 | 40 | 45 | 210
Expected Frequencies | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 210
Table 8
Related Parametric Test: There are no clear-cut related parametric tests because the χ2 test can be used with nominal data.
Analogous Nonparametric Tests: The Fisher Exact Test (limited to 2 × 2 tables and small tables) and the median test (limited to central tendencies) can be used as alternatives. In addition, a large number of tests such as phi, gamma, and Cramer’s V statistic, can be used as alternatives, provided the data characteristics meet the assumptions of these tests. The χ2 distribution is used in many other nonparametric tests.
The chi-square tests of contingency tables allow partitioning of tables, combining tables, and using more than two-way tables with control variables.
The second type of association tests measure the actual strength of association. Some of these tests also indicate the direction of the relationship, and the test values in most cases extend from −1.00 to +1.00, indicating a negative or a positive relationship. The values of some other nondirectional tests fall between 0.00 and 1.00. Contingency table formats are commonly used to measure this type of association. Among the more widely used tests are the following, arranged by the types of data used:
Nominal by Nominal Data:
Phi coefficient: limited to a 2 × 2 contingency table. The square of the coefficient is interpreted as a proportional reduction in error.
Contingency coefficient: based on the chi-square values. The lowest limit for this test is 0.00, but the upper limit does not attain unity (value of 1.00).
Cramer’s V statistic: not affected by an increase in the size of cells as long as it is related to similar changes in the other cells.
Lambda: the range of lambda is from 0.00 to 1.00, and thus it has only positive values.
Ordinal by Ordinal Data:
Gamma: uses ordinal data for two or more variables. Test values are between −1.00 and +1.00.
Somers’ D: used for predicting a dependent variable from the independent variable.
Kendall’s tau: described in more detail below.
Spearman’s rho: described in more detail below.
Categorical by Interval Data:
Kappa: The table for this test needs to have the same categories in the columns and the rows. Kappa is a measure of agreement, for example, between two judges.
The tests described above are intended for two-dimensional contingency tables. Tests for three-dimensional tables have been developed recently in both parametric and nonparametric statistics.
Two other major measures of association referenced above are presented below. They are Kendall’s τ (the forerunner of this test is also one