
Chapter 2
Some nonparametric tests
2.1. Introduction
When the methods of statistical inference are based upon the assumption that the population has a certain probability distribution, such as the normal, the resulting collection of statistical tests and procedures is referred to as parametric methods. In this chapter we will consider several statistical procedures that do not require knowledge of the form of the probability distribution from which the measurements come. The methods of statistical inference we will study here are called nonparametric methods. Since nonparametric methods do not require assumptions about the form of the population distribution, they are often referred to as distribution-free methods.
From this discussion we see that one reason for using nonparametric methods is that in some situations there is insufficient knowledge about the form of the population distribution. Thus the assumptions necessary for the use of parametric tests cannot be made.
A second reason for using nonparametric methods concerns data measurement. Nonparametric methods are often applied to rank order or preference data. Preference data are the type of data generated when people express a preference for one product over another, one service over another, etc. Parametric procedures cannot be applied to these data, but nonparametric ones can.
This chapter presents an introduction to some of the commonly used statistical procedures that can be classified as nonparametric or distribution free methods. The emphasis will be on the type of problems that can be solved, how the statistical calculations are made, and how appropriate conclusions can be developed to assist management in the decision-making process.
2.2.1. The Sign test for paired or matched samples
In Chapter 1 we considered z and t statistics for testing hypotheses about a population mean. For both of them, the sample was selected at random from a normal distribution. The question is: how can we conduct a test of hypothesis when we have a small sample from a nonnormal distribution?
The Sign test is a relatively simple and frequently employed nonparametric procedure for testing hypotheses about the central tendency of a nonnormal probability distribution. It is used, for example, in studies to identify whether consumer preference exists for one of two products.
Suppose that paired or matched samples are taken from a population, and the differences equal to 0 are discarded, leaving n observations. The Sign test can be used to test the null hypothesis that the population median of the differences is 0. Let “+” indicate a positive difference and “-” indicate a negative difference. If the null hypothesis were true, our sequence of “+” and “-” differences could be regarded as a random sample from a population in which the probabilities for “+” and “-” were each 0.5. In that case, the observations would constitute a random sample from a binomial population in which the probability of “+” was 0.5. Thus, if p denotes the true proportion of “+”s in the population (that is, the true proportion of positive differences), the null hypothesis is simply
$H_0: p = 0.5$
The Sign test is then based on the fact that the number of positive observations, S, in the sample has a binomial distribution (with $p = 0.5$ under the null hypothesis).
Sign test for paired samples
Suppose that paired random samples are taken from a population and the differences equal to 0 are ignored. Calculate the difference for each pair and record the sign of this difference. The Sign test is used to test:
$H_0: p = 0.5$
where p is the proportion of nonzero observations in the population that are positive. The test statistic S for the Sign test for paired samples is simply
S = the number of pairs with positive difference
and S has a binomial distribution with $p = 0.5$ and $n$ = the number of nonzero differences.
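For convenience, the binomial distribution of S under the null hypothesis can be written out explicitly. The following is the standard binomial probability formula specialized to $p = 0.5$; the notation $P(x)$ for the probability of exactly x positive signs among the n nonzero differences is introduced here only as shorthand:

$$P(x) = \binom{n}{x} (0.5)^{x} (0.5)^{n-x} = \binom{n}{x} (0.5)^{n}, \qquad x = 0, 1, \dots, n.$$

Because $p = 0.5$, every sequence of n signs is equally likely under the null hypothesis, so the probability depends only on how many sequences contain exactly x “+” signs.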
After determining the null and alternative hypotheses and finding a test statistic, the next step is to determine the p-value and to draw conclusions based on a decision rule.
Determining the p-value for a Sign test
The p-value for a Sign test is found using the binomial distribution with $n$ = the number of nonzero differences, S = the number of pairs with positive differences, and $p = 0.5$.
1. For a right-tailed test, $H_1: p > 0.5$, p-value = $P(x \geq S)$
2. For a left-tailed test, $H_1: p < 0.5$, p-value = $P(x \leq S)$
3. For a two-tailed test, $H_1: p \neq 0.5$
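The three cases can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `binomial_pmf` and `sign_test_p_value` and the `alternative` labels are hypothetical, and the two-tailed branch assumes one common convention (doubling the smaller one-tailed probability and capping the result at 1), which the text above does not specify.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def sign_test_p_value(s: int, n: int, alternative: str = "less") -> float:
    """p-value of the Sign test for s positive signs among n nonzero differences.

    alternative: "greater" for H1: p > 0.5,
                 "less" for H1: p < 0.5,
                 "two-sided" for H1: p != 0.5.
    """
    if not 0 <= s <= n:
        raise ValueError("s must lie between 0 and n")

    upper = sum(binomial_pmf(x, n) for x in range(s, n + 1))  # P(x >= S)
    lower = sum(binomial_pmf(x, n) for x in range(0, s + 1))  # P(x <= S)

    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Assumed convention: twice the smaller tail, capped at 1.
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical usage: 2 positive signs among 10 nonzero differences.
print(sign_test_p_value(2, 10, alternative="less"))
```

Summing the probabilities directly keeps the sketch dependent only on the standard library; for large n a statistics package would be the more practical choice.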
Example:
In a study, 8 individuals were asked to rate, on a scale from 1 to 10, the taste of products of two brands: Brand A and Brand B. The scores of the taste comparison are shown in the following table.
N    Brand A    Brand B
1    5          7
2    3          10
3    4          8
4    9          6
5    8          8
6    5          7
7    6          5
8    9          6
Do the data indicate an overall tendency to prefer Brand B to Brand A?
Solution:
First of all, let us calculate the differences:
N    Brand A    Brand B    Difference (A-B)    Sign of difference
1    5          7          -2                  -
2    3          10         -7                  -
3    4          8          -4                  -
4    9          6          3                   +
5    8          8          0                   0
6    5          7          -2                  -
7    6          5          1                   +
8    9          6          3                   +
We discard those who rated the brands equally. In this example the values for the fifth person are omitted from further analysis, and the effective sample size is reduced to $n = 7$. The only sample information on which our test is based is that three of the seven tasters preferred Brand A. Hence, the value of the Sign test statistic is $S = 3$.
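The differences, their signs, and the resulting values of n and S can be reproduced with a short Python sketch. The ratings are copied from the table in this example; the variable names and the helper `sign_of` are illustrative only.

```python
# Ratings from the example: (individual, Brand A score, Brand B score).
ratings = [
    (1, 5, 7),
    (2, 3, 10),
    (3, 4, 8),
    (4, 9, 6),
    (5, 8, 8),
    (6, 5, 7),
    (7, 6, 5),
    (8, 9, 6),
]


def sign_of(difference: int) -> str:
    """Return "+", "-" or "0" according to the sign of the difference."""
    if difference > 0:
        return "+"
    if difference < 0:
        return "-"
    return "0"


# Differences are taken as A - B, as in the table.
differences = [a - b for _, a, b in ratings]
signs = [sign_of(d) for d in differences]

# Zero differences are discarded; S counts the positive ones.
n = sum(1 for d in differences if d != 0)
s = sum(1 for d in differences if d > 0)

print(differences)  # [-2, -7, -4, 3, 0, -2, 1, 3]
print(signs)        # ['-', '-', '-', '+', '0', '-', '+', '+']
print(n, s)         # 7 3
```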
Let p denote the true proportion of “+”s in the population. Then the null hypothesis is
$H_0: p = 0.5$ (There is no overall tendency to prefer one brand to the other)
A one-tailed test is used to determine whether there is an overall tendency to prefer Brand B to Brand A. The alternative hypothesis is that, in the population, the majority of preferences are for Brand B. Because the differences are computed as A - B, a preference for Brand B produces a “-” sign, so the alternative hypothesis is expressed as
$H_1: p < 0.5$ (Majority prefer Brand B)
The next step is finding the p-value. If we denote by $P(x)$ the probability of observing x “successes” (“+”s) in $n = 7$ binomial trials, each with probability of success 0.5, then the cumulative binomial probability of observing three or fewer “+”s can be obtained using the binomial formula:
$$\text{p-value} = P(x \leq 3) = P(0) + P(1) + P(2) + P(3) = \frac{1 + 7 + 21 + 35}{2^{7}} = \frac{64}{128} = 0.5$$
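The cumulative probability for this example can be checked numerically. The sketch below assumes only the values stated in the example (7 nonzero differences, 3 positive signs, success probability 0.5) and uses the standard library function `math.comb`; the variable names are illustrative.

```python
from math import comb

n = 7  # number of nonzero differences
s = 3  # number of positive differences

# With p = 0.5, the probability of exactly x "+" signs is C(n, x) / 2**n.
terms = [comb(n, x) / 2**n for x in range(s + 1)]
p_value = sum(terms)

for x, term in enumerate(terms):
    print(f"P({x}) = {term:.4f}")
print(f"p-value = P(x <= {s}) = {p_value:.4f}")
```

The four terms are 1/128, 7/128, 21/128 and 35/128, which add up to 64/128 = 0.5.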
For this example the p-value is 0.5 (50%). We are unable to reject the null hypothesis and conclude that the data are not sufficient to suggest that the population has a preference for Brand B. Since the p-value is the smallest significance level at which the null hypothesis can be rejected, in this example the null hypothesis can be rejected only at a significance level of 50% or higher. It is unlikely that one would be willing to accept such a high significance level. Again, we conclude that the data do not provide statistically significant evidence that Brand B is preferred by the majority.