Non-Parametric Statistics - Rob McMullen (2006)

Non-Parametric Statistics
A Presentation by
Rob McMullen
for
AP Statistics
What are Non-Parametric
Statistics?
Non-parametric statistics are a special
form of statistics that help
statisticians with a problem occurring in
parametric statistics. In order to
understand what non-parametric
statistics are, it is first necessary to
know what parametric statistics are.
What are Parametric
Statistics?
In AP statistics, when we refer to a distribution we
often make certain assumptions about it that enable
us to work with it. One thing that helps us with
this is the Central Limit Theorem (CLT), which allows us
to assume that many sampling distributions are
approximately normal.
The Central Limit Theorem tells us that for any
distribution with a finite mean and variance, the sampling
distribution of the sample mean, for samples of a given and
sufficiently large size, is approximately normally distributed.
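As a rough illustration of the theorem, the following Python sketch draws many samples from a strongly skewed population and looks at how their means behave. The exponential population, the sample size of 40, the number of repetitions, and the seed are all arbitrary choices made for this example, not values from the presentation.

```python
import random
import statistics

random.seed(2006)  # fixed seed so the run is repeatable (arbitrary choice)

SAMPLE_SIZE = 40      # assumed sample size for the illustration
NUM_SAMPLES = 5000    # assumed number of repeated samples

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

# The CLT suggests the sample means cluster near 1 with spread near 1 / sqrt(40).
print(f"mean of sample means: {center:.3f}")
print(f"std dev of sample means: {spread:.3f} (theory: {1 / SAMPLE_SIZE ** 0.5:.3f})")
```

Even though the population is far from normal, the sample means should land close to the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.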
When are Parametric
Statistics not useful?
When we do significance tests, we rely on the
assumption that the sampling distribution of the
statistic we compute follows the t-distribution or the
z-distribution, depending on the situation. When this
assumption is not true, none of our tests, which are
called “parametric statistical inference tests,” are
reliable. Everything we have done in AP stats has
been in the field of “parametric statistics.”
Why does lack of normality
cause problems?
When we calculate the p-value for an inference test, we
find the probability of getting a sample at least this different
from what the null hypothesis predicts through sampling
variability alone. Basically, we are trying to see if a
recorded value could have occurred by chance and chance alone.
When we look for a p-value, we are assuming that the means of all
samples of the given sample size are normally distributed
around the population mean. This is why the test statistic, which is the
number of standard deviations (of that sampling distribution) the
sample mean lies from the population mean, can be used. Therefore,
without normality, a p-value found this way cannot be trusted.
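For reference, the test statistic described above can be written out as a formula. The symbols are standard notation rather than anything taken from the slides: x-bar is the sample mean, mu_0 the hypothesized population mean, sigma the population standard deviation, s the sample standard deviation, and n the sample size.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
\qquad\text{or, when } \sigma \text{ is unknown,}\qquad
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
```

Both forms measure how far the sample mean lies from the hypothesized mean in units of the standard deviation of the sampling distribution, which is why they depend on that distribution being approximately normal.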
What are Non-Parametric
Statistics?
The way in which statisticians deal with this
problem of parametric statistics is the field of
non-parametric statistics. These are tests that can be
done without the assumption of normality,
approximate normality, or symmetry. These tests
do not require a mean and standard deviation.
Since a standard deviation describes spread well only for
roughly symmetric distributions, it is
not useful for many distributions anyway.
What is different about Non-Parametric Statistics?
• Non-parametric tests often use what is called “ordinal” data.
This data is obtained by taking the raw data and giving each
data value a rank. These ranks are then used to create test
statistics.
• In non-parametric statistics, one deals with the median rather
than the mean. Since a mean can be easily influenced by
outliers or skewness, and we are not assuming normality, a
mean no longer makes sense. The median is another measure of
location, which makes more sense in a non-parametric test. The
median is considered the center of a distribution.
Each non-parametric test is similar to one of the tests
covered in AP stats, but slightly different.
The following table shows how some of the tests match up.
Parametric Test | Goal for Parametric Test | Non-Parametric Test | Goal for Non-Parametric Test
Two Sample T-Test | To see if two samples have identical population means | Wilcoxon Rank-Sum Test | To see if two samples have identical population medians
One Sample T-Test | To test a hypothesis about the mean of the population a sample was taken from | Wilcoxon Signed Ranks Test | To test a hypothesis about the median of the population a sample was taken from
Chi-Squared Test for Goodness of Fit | To see if a sample fits a theoretical distribution, such as the normal curve | Kolmogorov-Smirnov Test | To see if a sample could have come from a certain distribution
ANOVA | To see if two or more sample means are significantly different | Kruskal-Wallis Test | To test if two or more sample medians are significantly different
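The point made before the table, that a mean is easily pulled by outliers while a median is not, can be seen in a small Python sketch. The data values below are made up purely for illustration.

```python
import statistics

# Hypothetical data: nine typical values, then the same values plus one extreme outlier.
typical = [4, 5, 5, 6, 6, 7, 7, 8, 9]
with_outlier = typical + [90]

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```

A single outlier more than doubles the mean, while the median moves only from 6 to 6.5.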
ANOVA
What is an ANOVA?
When are ANOVAs useful?
How does one carry out an ANOVA?
ANOVA
What is an ANOVA?
Since ANOVAs were not covered in AP stats, I
will now explain them. An ANOVA is a way to
compare multiple sample means to see if they
are significantly different. The name comes from
a phrase that describes what the test does:
ANalysis Of VAriance = ANOVA.
An ANOVA looks at the variance between the
sample means, and decides whether the differences
among them are significant. This can be done to compare
two or more samples.
ANOVA
When are ANOVAs useful?
An ANOVA can be used when one wants to compare
any number of samples. This test can be done to see if
many samples could have come from the same
population. This test can also tell you about the
differences between two or more areas. For example,
if a survey is conducted in many different towns, you
can see if their average responses differ significantly.
Similarly, you can take samples of plant growth in
different climates, soil, or with different treatments.
In all cases, an ANOVA can be used to see if the
means vary significantly.
ANOVA
How does one carry out an ANOVA?
An ANOVA is conducted by first putting all the samples into one
large sample. The standard deviation of this sample is then found,
and called σ. Next, the value for the range of variation in sample
means is found. If the variation between the means is greater than
the range of variation, the null hypothesis is rejected. The range of
variation is found by computing σ / √N,
where N is the number of data values in each sample. The difference
between each pair of sample means is then found, which is the
variation of the means. If any one of these differences is greater than the range
of variation, then those two means are significantly different from
each other. Depending on your goal, this may cause you to reject
your null hypothesis.
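The procedure on this slide can be sketched in Python as follows. The sketch follows the simplified description given here, not the F-test usually taught under the name ANOVA, and it assumes every sample has the same size. The three samples and the function name `pairwise_mean_check` are hypothetical, and using the sample (rather than population) standard deviation is an assumption.

```python
import itertools
import math
import statistics

def pairwise_mean_check(samples):
    """Apply the slide's simplified comparison to equal-sized samples.

    Returns the range of variation and a list of
    (i, j, difference, flagged) tuples, one per pair of samples.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes every sample has the same size")

    pooled = [x for s in samples for x in s]   # one large combined sample
    sigma = statistics.stdev(pooled)           # assumption: sample standard deviation
    range_of_variation = sigma / math.sqrt(n)

    means = [statistics.mean(s) for s in samples]
    results = []
    for i, j in itertools.combinations(range(len(samples)), 2):
        diff = abs(means[i] - means[j])
        results.append((i, j, diff, diff > range_of_variation))
    return range_of_variation, results

# Hypothetical plant-growth measurements under three treatments.
samples = [
    [10, 12, 11, 13, 14],
    [11, 13, 12, 12, 12],
    [18, 20, 19, 21, 22],
]
threshold, comparisons = pairwise_mean_check(samples)
print(f"range of variation: {threshold:.2f}")
for i, j, diff, flagged in comparisons:
    print(f"samples {i} and {j}: difference {diff:.2f}, flagged: {flagged}")
```

With this made-up data, the first two samples have equal means and are not flagged, while the third differs from both by more than the range of variation.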
EXAMPLE
Now that I have explained the background
principles of Non-Parametric Statistics, I will
carry out an example of one of the tests. I
have chosen the Wilcoxon Rank-Sum Test
(also called the Wilcoxon-Mann-Whitney Test)
because it is one of the most commonly used tests.
The Wilcoxon Rank-Sum Test
The Wilcoxon Rank-Sum Test is used in
place of the two-sample t-test when the
sampling distributions of the variables
being compared are not normal. This
test requires two samples of sample size
n1 and n2. The test is carried out as
follows.
Items in green are the steps to the test.
Items in white are an example of a real test.
The Wilcoxon Rank-Sum Test
1: The first step in this procedure
is to collect two samples.
Sample 1: {3,2,12,9,13,7,9,11,4,5,6}
n1=11
Sample 2: {1,8,4,15,12,6,10,14,3,3}
n2=10
The Wilcoxon Rank-Sum Test
2: The second step is to combine the two samples into
one large sample. Simply take all the data values from
each sample and make one large group. Make sure to
know the original samples, as the data will have to be
separated back into its original state later.
Combined sample size: n1+n2 = 11+10 = 21
{3,2,12,9,13,7,9,11,4,5,6} and {1,8,4,15,12,6,10,14,3,3}
becomes:
{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}
The Wilcoxon Rank-Sum Test
3: Once all the data is in one sample, the data
must be put into order by size. The data
should go from smallest to largest.
{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}
In order is:
{1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}
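Steps 2 and 3 can be sketched in Python using the two samples above. Pairing each value with a label for its sample of origin is one way (an assumption of this sketch, not something the slides prescribe) to make sure the data can be separated again later.

```python
sample_1 = [3, 2, 12, 9, 13, 7, 9, 11, 4, 5, 6]
sample_2 = [1, 8, 4, 15, 12, 6, 10, 14, 3, 3]

# Step 2: combine, tagging each value with the sample it came from.
combined = [(value, 1) for value in sample_1] + [(value, 2) for value in sample_2]

# Step 3: order from smallest to largest.
# Tied values are ordered by label, which does not affect the values themselves.
ordered = sorted(combined)
ordered_values = [value for value, _ in ordered]

print(len(ordered))
print(ordered_values)
```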
The Wilcoxon Rank-Sum Test
4: Each data value is given a rank based on size. If
two or more data have the same value, their rank is
the average of the ranks. This step is when the raw
data becomes ordinal data, or ranked data.
Combined sample in order is: (sample size 21)
{1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}
Each data value is ranked 1-21:
RANK:       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
RAW DATA:   1   2   3   3   3   4   4   5   6   6   7   8   9   9  10  11  12  12  13  14  15
When two or more data have the same value, their ranks are averaged.
Therefore, the data becomes:
RANK:      1  2  4      6.5  8  9.5  11  12  13.5  15  16  17.5   19  20  21
RAW DATA:  1  2  3,3,3  4,4  5  6,6  7   8   9,9   10  11  12,12  13  14  15
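The tie-averaging rule in step 4 can be sketched as a small Python function. The function name `average_ranks` is made up for this example, and it expects the data to be sorted already.

```python
def average_ranks(sorted_values):
    """Return one rank per value, averaging the positions of tied values."""
    ranks = []
    i = 0
    while i < len(sorted_values):
        j = i
        # Advance j to the last position holding the same value as position i.
        while j + 1 < len(sorted_values) and sorted_values[j + 1] == sorted_values[i]:
            j += 1
        # Positions are 1-based, so the tied block covers ranks i+1 through j+1.
        shared_rank = ((i + 1) + (j + 1)) / 2
        ranks.extend([shared_rank] * (j - i + 1))
        i = j + 1
    return ranks

ordered = [1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12, 13, 14, 15]
ranks = average_ranks(ordered)
for value, rank in zip(ordered, ranks):
    print(value, rank)
```

For example, the three 3s occupy positions 3, 4, and 5, so each receives the rank 4.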
The Wilcoxon Rank-Sum Test
5: The data are then put back into their original
sampling groups as ranked data.
Original Sample 1: {3,2,12,9,13,7,9,11,4,5,6}
Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}
Original Sample 2: {1,8,4,15,12,6,10,14,3,3}
Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}
The Wilcoxon Rank-Sum Test
6: The sum of the ranks is taken for each sample.
This is the test statistic.
Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}
Sum of sample 1: 120.5
Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}
Sum of sample 2: 110.5
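Steps 5 and 6 can be checked with a short Python sketch. Ranks are looked up from a value-to-rank table built from the combined data; the helper name `rank_table` is hypothetical. A useful sanity check is that the two rank sums add up to N(N+1)/2, where N = n1 + n2.

```python
sample_1 = [3, 2, 12, 9, 13, 7, 9, 11, 4, 5, 6]
sample_2 = [1, 8, 4, 15, 12, 6, 10, 14, 3, 3]

def rank_table(values):
    """Map each distinct value to the average of the 1-based positions it occupies."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}

table = rank_table(sample_1 + sample_2)

# Step 5: return each value to its original sample, now as a rank.
ranked_1 = [table[v] for v in sample_1]
ranked_2 = [table[v] for v in sample_2]

# Step 6: sum the ranks within each sample.
sum_1 = sum(ranked_1)
sum_2 = sum(ranked_2)

n_total = len(sample_1) + len(sample_2)
print(sum_1, sum_2, n_total * (n_total + 1) / 2)
```

Here 120.5 + 110.5 = 231, which equals 21 × 22 / 2, so the ranks were assigned consistently.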
The Wilcoxon Rank-Sum Test
SUMMARY:
1: Two samples are taken.
2: The samples are combined to make one distribution of
sample size (n1+n2).
3: The data are put into order, based on size.
4: Each data value is given a rank based on size. If two or
more data have the same value, their rank is the average of
the ranks.
5: The data are then put back into their original sampling
groups as ranked data.
6: The sum of the ranks is taken for each sample. This is
the test statistic.
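The steps above stop at the test statistic. One common way to judge it, which goes beyond what this presentation covers, is a normal approximation: if both samples came from the same population, the rank sum W of sample 1 has mean n1(n1+n2+1)/2 and standard deviation sqrt(n1·n2·(n1+n2+1)/12). The Python sketch below applies this to the example and ignores the usual correction for ties, so the result should be treated as approximate.

```python
import math
from statistics import NormalDist

n1, n2 = 11, 10
w = 120.5  # rank sum of sample 1 from the worked example

# Mean and standard deviation of W if both samples share one population.
expected_w = n1 * (n1 + n2 + 1) / 2
sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

z = (w - expected_w) / sd_w
# Two-sided p-value from the standard normal distribution (no tie correction).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"expected W = {expected_w}, sd = {sd_w:.2f}, z = {z:.3f}, p = {p_value:.3f}")
```

In this example the observed rank sum of 120.5 sits almost exactly at its expected value of 121, so the approximation gives no evidence that the two samples come from different populations.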
Non-Parametric Statistics
This concludes my presentation. Are there any topics which
have been covered that are not clear, which you would like
to see again?
Wilcoxon Rank-Sum Test explanation/example
Explanation of an ANOVA
Introduction to Non-Parametric Statistics
Chart comparing Significance Tests
THANK YOU
I would like to thank you for taking the
time to view this presentation. If you
have any questions regarding this topic,
you may email me at
[email protected].
I hope that this has been informative
and that you now clearly understand what
non-parametric statistics are.