Introduction to Statistics

Nyack High School Science Research
Researchers often must determine whether their data is
statistically significant, or merely the result of a "fluke" or
measurement uncertainties. A researcher will often test
a hypothesis on a sample of the population.
sample: a group of people who participate in a study
population: all the people to whom the study is meant to
generalize.
A frequency distribution is a table in which all scores are
listed, along with the frequency with which each occurs.
Here's an example of scores from an AP Physics test:
Frequency Distribution

Score    frequency    rf (relative frequency)
56       1            0.077
71       3            0.231
79       1            0.077
80       2            0.154
82       2            0.154
93       2            0.154
95       1            0.077
96       1            0.077
N = 13                1.000
Often, data is presented as a frequency distribution of
intervals:
AP Test Scores

Score Interval    frequency    rf
56-64             1            0.077
65-73             3            0.231
74-82             5            0.385
83-91             0            0.000
92-100            4            0.308
N = 13                         1.000

[Bar graph: Frequency (0-6) vs. Test Score interval]
The bars in the graph are touching, indicating that the data is
continuous.
Here's an example of a frequency distribution for
discrete data:
Pet Preference

Type of pet    frequency
dog            6
cat            5
neither        3
N = 14

[Bar graph: Frequency (0-7) vs. Type of pet]
Population mean (µ) = the average of all the scores of the
population:

  Population mean = (sum of scores) / (number of scores), or  µ = ΣX / N

Sample mean (X̄):

  X̄ = ΣX / N
Median = the middle score in a distribution organized from
highest-to-lowest, or lowest-to-highest. Referring to the example
above, the median score would be 80.
Mode = the score with the highest frequency. In the example
above, the mode is 71.
Measures of Variation
Range = highest score - lowest score
Standard deviation for a population (σ) is the average
distance of all the scores in the distribution from the
mean, or central point, of the distribution:

  σ = √( Σ(X − µ)² / N )

Where: X = individual score value
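These measures of center and spread can be computed directly. Here is a short Python sketch using the scores from the AP Physics frequency distribution above (the `statistics` module is part of the standard library):

```python
from statistics import mean, median, mode
from math import sqrt

# Scores from the AP Physics frequency distribution above
scores = [56, 71, 71, 71, 79, 80, 80, 82, 82, 93, 93, 95, 96]

print(median(scores))             # middle score: 80
print(mode(scores))               # most frequent score: 71
print(max(scores) - min(scores))  # range: 96 - 56 = 40

mu = mean(scores)                                               # mean of all 13 scores
sigma = sqrt(sum((x - mu) ** 2 for x in scores) / len(scores))  # population SD
```

Note that `sigma` here divides by N, matching the population formula above.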
What statistical tool do I use?

• Determining the relationship between 2 variables → Correlation/Regression analysis
• Comparing 2 sample means → t-test
• Comparing a sample mean to a population mean → z-test
• Comparing more than 2 samples → ANOVA (Analysis of Variance)
• Comparing observed categorical results to expected → Chi-squared
Standard Scores
z-scores are a measure of how many standard deviation units the
individual raw score falls from the mean.
For an individual score, in comparison to a sample:

  z = (X − X̄) / S

For an individual score, in comparison to a population:

  z = (X − µ) / σ
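In code, a z-score is a one-line calculation. A quick sketch, using the class mean and standard deviation from the exam-score table below:

```python
def z_score(x, mean, sd):
    # z = (score - mean) / standard deviation
    return (x - mean) / sd

# Score of 96 against the class mean 80.54 and SD 11.68
print(round(z_score(96, 80.54, 11.68), 2))  # 1.32
```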
Standard Scores

AP Physics Exam Score    (Score − mean)    z-score
96                       15.46             1.32
95                       14.46             1.24
56                       -24.54            -2.10
71                       -9.54             -0.82
93                       12.46             1.07
71                       -9.54             -0.82
93                       12.46             1.07
71                       -9.54             -0.82
80                       -0.54             -0.05
81                       0.46              0.04
80                       -0.54             -0.05
81                       0.46              0.04
79                       -1.54             -0.13
Mean = 80.54    Std. Dev = 11.68
Null and Alternative Hypotheses
Null hypothesis (Ho): Whatever the research topic, the null
hypothesis predicts that there is no difference between the groups
being compared. (This is typically what the researcher does not
expect to find.)
Ex: Say I want to find out if students who attend a review session
score higher than those who do not. The null hypothesis would be
that the mean score of the group who attended the review session
is the same as the mean score of the group who did not
attend a review session:

  µ(review session) = µ(general population)

The alternative hypothesis (Ha or H1) would be:

  µ(review session) > µ(general population)
Null and Alternative Hypotheses
A one-tailed hypothesis predicts the direction of the difference.
Ex: I predict that students who attend the review session will
score higher.
A two-tailed hypothesis expects a difference, but the researcher
is unsure of its direction. Ex: I predict that attending a review
session will affect scores, but don't know whether the scores
will be higher or lower.
Null and Alternative Hypotheses
We must determine if our data is "statistically significant." In
other words, we must determine whether the data actually supports
our hypothesis, or whether it just looks that way due to uncontrollable
conditions. There are two types of errors:
Type I error: rejecting the null hypothesis when it is actually
true. The data appears to show a difference between the sample
and the population, but the difference is due to a 'fluke'
(experimental errors, good guesses, etc.)
Type II error: accepting the null hypothesis when it is actually
false. A real difference exists, but the data fails to show it.
Determining Statistical Significance
We can use either a z-test or a t-test to determine statistical
significance. The test we use depends on our data.
z-test: The z-test is used when the population variance is
known. It allows the user to compare a sample to a population.
The z-test uses the sample mean and the population standard
deviation to determine whether the sample mean is significantly
different from the population mean.
t-test: The t-test is used when the population variance is not
known. Use the t-test when you have a small sample and you
do not know σ.
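A minimal sketch of the two test statistics (which one to use follows the rules above; the example numbers here are hypothetical):

```python
from math import sqrt

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    # z-test: population standard deviation (sigma) is known
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

def one_sample_t(sample, pop_mean):
    # t-test: sigma unknown, so estimate it from the sample (N - 1 denominator)
    n = len(sample)
    m = sum(sample) / n
    s = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return (m - pop_mean) / (s / sqrt(n))

# Hypothetical: a sample of 25 students averages 82 on a test where
# the population mean is 80 and sigma is 10
print(one_sample_z(82, 80, 10, 25))  # 1.0
```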
Determining Statistical Significance
Once you have performed a z-test or a t-test, you can plug that
value into a program to get your p-value. P-values are discussed
later.
But wait! There are other types of t-tests! What if you are
comparing two different samples, instead of comparing one
sample to one population? Then you must use a different
algorithm. We will look at the two possibilities:
Determining Statistical Significance
t-test for Independent Groups/Samples
Use this test when you are comparing two samples, representing
two populations.
You can compare the two groups in one of two ways:
1. One group is the control group, and one is the experimental
group, or
2. Both groups are experimental, and there is no control.
Determining Statistical Significance
t-test for Correlated Groups/Samples
Use this test when you are comparing the performance of
participants in two groups, but the same people are used in each
group, or different participants are matched between groups (i.e.,
you are working with pairs of scores for each participant.)
This test is based on the difference score (D), which is the
difference between the pairs of scores for each participant.
Determining Statistical Significance
t-test for Correlated Groups/Samples
Ex: Eight participants are asked to listen to a single genre of music (hip-hop), and then rate the severity of their nightmares on a 1-5
Likert scale (1 = mild, 5 = severe). They are then asked to repeat the process, this time listening to classical music.
Nightmare Severity

Participant    Hip-Hop    Classical    D (difference score)
1              5          1            4
2              4          3            1
3              4          4            0
4              5          4            1
5              3          2            1
6              3          3            0
7              4          3            1
8              3          1            2
Total                                  10
Mean (D̄)                              1.25
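The correlated-groups t statistic can be computed from the difference scores above as t = D̄ / (s_D / √n), where s_D is the standard deviation of the differences; a sketch:

```python
from math import sqrt

# Difference scores (D) from the nightmare-severity table above
d = [4, 1, 0, 1, 1, 0, 1, 2]

n = len(d)
d_bar = sum(d) / n                                      # mean difference: 1.25
s_d = sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # SD of the differences
t = d_bar / (s_d / sqrt(n))                             # df = n - 1 = 7

print(round(t, 2))  # 2.76
```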
Determining Statistical Significance
Chi-Square tests are nonparametric tests: among other things,
they do not involve the mean or standard deviation of the
population.
Determining Statistical Significance
1. Chi-Square (χ²) Goodness-of-Fit Test
Used for comparing categorical information (observed
frequencies) against what we would expect based on previous
knowledge (expected frequencies).
For example, say a study of students at Nyack High School
samples 54 students, and finds 8 of the students (15% of the
sample) are overweight or obese. Assume that nationwide, 30%
of high school students have been found to be overweight or
obese. Observed and expected frequencies are shown:

Frequencies     Overweight/obese    Not overweight/obese
Observed (O)    8                   46
Expected (E)    16                  38
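The goodness-of-fit statistic sums (O − E)²/E over the categories; a sketch using the frequencies above:

```python
observed = [8, 46]   # overweight/obese, not overweight/obese
expected = [16, 38]  # based on the nationwide 30% rate (rounded, as in the table)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # 5.68, with df = (categories - 1) = 1
```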
Determining Statistical Significance
2. Chi-Square (χ²) Test of Independence
Whereas a χ² goodness-of-fit test compares how well an
observed frequency distribution of one nominal variable fits
some expected pattern of frequencies, a χ² test of
independence tests whether two nominal variables are
independent of each other.
Determining Statistical Significance
For example, say a study of Nyack High School students looked at
whether students who have already taken a health class exercise
more than those who have not. We have two variables (taking a
health class, and exercising). We find that of the 100 students
who have taken a health class, 75 exercise regularly. In the group
of students who have not yet taken health, 35 out of 80 exercise
regularly. Data is shown in Table 8 below, where the numbers in
parentheses are the expected frequencies, based on the total
students polled (180):

                      Taken Health Class
                      Yes        No         Row Totals (RT)
Exercisers            75 (61)    35 (49)    110
Non-exercisers        25 (39)    45 (31)    70
Column Totals (CT)    100        80         180
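For the test of independence, each cell's expected frequency is (row total × column total) / grand total, and the statistic again sums (O − E)²/E; a sketch using the table above:

```python
# Observed counts: rows = exercisers / non-exercisers,
# columns = taken health class yes / no
observed = [[75, 35],
            [25, 45]]

row_totals = [sum(row) for row in observed]        # [110, 70]
col_totals = [sum(col) for col in zip(*observed)]  # [100, 80]
grand = sum(row_totals)                            # 180

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # expected frequency
        chi2 += (o - e) ** 2 / e

print(round(chi2, 2))  # df = (2 - 1) * (2 - 1) = 1
```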
P-values and Statistical Significance
You can now use an on-line p-value calculator to find the
p-value. If you're doing a z-test, simply insert the z-value. If
you're doing a t-test, you need the t-value and the degrees of
freedom. For a χ² test, you need the χ² value and the degrees
of freedom.
An on-line p-value calculator for two-tailed tests is available
at:
http://graphpad.com/quickcalcs/pvalue1.cfm
The p-value for a one-tailed test would be half the p-value
for a two-tailed test.
P-values and Statistical Significance
So what is a p-value?
A p-value is a probability, with a value ranging from zero to
one. A value of zero would mean that, if a random sample of
a population were taken, there would be no chance that it
would differ from the population by as much as what you
observed. If the p-value is 0.03, there is a 3% chance of
observing a difference as large as yours purely by chance
(i.e., if the null hypothesis is true).
P-values and Statistical Significance
In general, the smaller the p-value, the more "statistically
significant" your data is. It's up to you to set a threshold
p-value. Once this is done, every result is either statistically
significant or not.
Many scientists refer to data as being "very significant"
if the p-value is below a threshold (usually less than 0.05), and
"extremely significant" if it is below a lower threshold (often less
than 0.01).
Sometimes values are flagged with one asterisk for "very
significant," and two asterisks for "extremely significant."
Confidence Intervals
If we don't know the population mean (µ), we can calculate a
confidence interval.
A confidence interval is a range of values which we feel
"confident" will contain the population mean, µ.
The confidence level describes the uncertainty involved with a
sampling method. A 90% confidence level means that we are
90% confident that the population mean falls within this interval.
Confidence intervals can be calculated from z-scores or t-scores.
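A sketch of a t-based confidence interval: mean ± t_crit × (s / √n). The data below are hypothetical, and t_crit must be looked up in a t-table for the chosen confidence level and df = n − 1:

```python
from math import sqrt

def confidence_interval(sample, t_crit):
    # mean +/- t_crit * (s / sqrt(n)), with s the (N - 1) sample SD
    n = len(sample)
    m = sum(sample) / n
    s = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    margin = t_crit * s / sqrt(n)
    return (m - margin, m + margin)

# Hypothetical sample; 3.182 is the two-tailed 95% critical t for df = 3
low, high = confidence_interval([2, 3, 4, 5], 3.182)
```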
Confidence Intervals
Referring back to the nightmare study, where the severity of
nightmares was rated on a scale of 1-5 (1 = mild, 5 = severe), the
95% confidence interval for the difference score was calculated to
be 0.11-2.39.
Thus, we can say that we are 95% confident that nightmare
severity after listening to classical music is between 0.11 and
2.39 points lower than nightmare severity after listening to
hip-hop.
Correlation Coefficients
When you are looking at a cause-and-effect type of relationship
between two variables, a correlation coefficient (r), can be used
to measure the strength of the relationship. The value of a
correlation coefficient is between 0.00 and 1.00, as follows:
Correlation coefficient (r)
±0.70-1.00
±0.30-0.69
±0.00-0.29
Strength of Relationship
Strong
Moderate
None(0.00) to Weak
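Pearson's r can be computed directly from paired scores; a sketch (the example data are hypothetical):

```python
from math import sqrt

def pearson_r(x, y):
    # r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear: 1.0
```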
Correlation Coefficients

[Figure (source: sofia.usgs.gov)]
Summary
When analyzing data:
• Decide what type of experiment you are conducting,
and what you are comparing: sample(s) vs. population.
• If applicable, plot the frequency distribution.
• Look for trends.
• Choose a method to test for statistical significance:
  ✓ z-test
  ✓ t-test
  ✓ Chi-squared test
  ✓ ANOVA
  ✓ Regression
Claim 1: Money can't buy you love, but it can buy you a good ball team
• Specifically, the claim is that baseball teams with
bigger salaries win more games than those
with smaller salaries
• Data are average (mean) salaries and winning
percentages for the 2012 baseball season
The data

TEAM                     AVG SALARY    winning percentage
Arizona Diamondbacks     $2,653,029    0.500
Atlanta Braves           $2,776,998    0.580
Baltimore Orioles        $2,807,896    0.574
Boston Red Sox           $5,093,724    0.426
Chicago Cubs             $3,392,193    0.377
Chicago White Sox        $3,876,780    0.525
Cincinnati Reds          $2,935,843    0.599
Cleveland Indians        $2,704,493    0.420
Colorado Rockies         $2,692,054    0.395
Detroit Tigers           $4,562,068    0.543
Houston Astros           $2,332,730    0.340
Kansas City Royals       $2,030,540    0.444
Los Angeles Angels       $5,327,074    0.549
Los Angeles Dodgers      $3,171,452    0.531
Miami Marlins            $4,373,259    0.426
Milwaukee Brewers        $3,755,920    0.512
Minnesota Twins          $3,484,629    0.407
New York Mets            $3,457,554    0.457
New York Yankees         $6,186,321    0.586
Oakland Athletics        $1,845,750    0.580
Philadelphia Phillies    $5,817,964    0.500
Pittsburgh Pirates       $2,187,310    0.488
San Diego Padres         $1,973,025    0.469
San Francisco Giants     $3,920,689    0.580
Seattle Mariners         $2,927,789    0.463
St. Louis Cardinals      $3,939,316    0.543
Tampa Bay Rays           $2,291,910    0.556
Texas Rangers            $4,635,037    0.574
Toronto Blue Jays        $2,696,042    0.451
Washington Nationals     $2,623,746    0.605
How is this claim best evaluated?
-graph and statistical analysis

[Scatter plot: proportion of games won (0.30-0.65) vs. mean salary (1-7 million $/yr)]
How is this claim best evaluated?
-graph and statistical analysis

[Scatter plot with linear regression: proportion of games won vs. mean salary (million $/yr); r² = 0.03, p = 0.37; Nationals and Red Sox labeled]
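The regression line in the plot is an ordinary least-squares fit; a minimal sketch (the salary/winning-proportion pairs here are hypothetical stand-ins for the full table):

```python
def least_squares(x, y):
    # Fit y = intercept + slope * x by ordinary least squares
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical (mean salary in million $, winning proportion) pairs
intercept, slope = least_squares([1.8, 2.7, 3.5, 4.6, 6.2],
                                 [0.58, 0.49, 0.51, 0.54, 0.59])
```

A slope near zero, as in the plot's nearly flat line, is what a weak relationship looks like.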
Conclusion
• Money can't buy you a winning ball team, either
Claim 2: Eels control crayfish populations
• Specifically, the claim is that crayfish population
densities are lower in streams where eels are
present
• Background: dietary studies show that eels eat
a lot of crayfish, and old Swedish stories
suggest that eels eliminate crayfish
• Data are crayfish densities (count along
transects, snorkelling) in local streams with
and without eels
The data

River         Site              Crayfish (no./m^2)    eels
Croton        Green Chimneys    3.225                 0
Croton        PEP               0.119                 0
Delaware      Buckingham        0.25                  1
Delaware      Callicoon         0                     1
Delaware      Hankins           0.109                 1
Delaware      Mongaup           0                     1
Delaware      Pond Eddy         0.067                 1
Neversink     Bridgeville       0.233                 0
Neversink     TNC               0                     1
Shawangunk    Mount Hope        4.53                  0
Shawangunk    Ulsterville       1.1                   0
Webatuck      Levin             0.812                 0
Webatuck      Shope             1.719                 0
Webatuck      Still Point       1.4                   0

(eels: 1 = present, 0 = absent)
How is this claim best evaluated?
-graph and statistical analysis

[Bar graph: crayfish density (number/m²), no eels vs. eels; error bars show 95% confidence limits]
How is this claim best evaluated?
-graph and statistical analysis

[Bar graph with t-test: crayfish density (number/m²), no eels vs. eels; error bars show 95% confidence limits; p = 0.02]
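The comparison behind the p = 0.02 can be sketched as a pooled-variance, independent-groups t-test on the densities from the data table (this reproduces the comparison, not necessarily the exact method used for the slide):

```python
from math import sqrt

# Crayfish densities (no./m^2) from the data table, split by eel presence
no_eels = [3.225, 0.119, 0.233, 4.53, 1.1, 0.812, 1.719, 1.4]
eels = [0.25, 0, 0.109, 0, 0.067, 0]

def pooled_t(a, b):
    # Independent-groups t with a pooled variance estimate
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    sp2 = (ss_a + ss_b) / (na + nb - 2)  # pooled variance, df = na + nb - 2
    return (ma - mb) / sqrt(sp2 * (1 / na + 1 / nb))

print(round(pooled_t(no_eels, eels), 2))  # positive: more crayfish where eels are absent
```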
Conclusion
• Looks like streams containing eels have fewer
crayfish
Claim 3: Human life expectancy varies among continents
• Data are mean life expectancy for women in
different countries
The data

Africa                Asia                 Americas           Europe
algeria        75     bangladesh    70.2   argentina   79.9   austria      83.6
cameroon       53.6   china         75.6   brazil      77.4   belgium      82.8
cote d'ivoire  57.7   india         67.6   canada      85.3   bulgaria     77.1
egypt          75.5   indonesia     71.8   chile       82.4   czech rep    81
kenya          59.2   iran          75.3   colombia    77.7   denmark      87.4
morocco        74.9   japan         87.1   mexico      79.6   estonia      80
nigeria        53.4   malaysia      76.9   peru        76.9   finland      83.3
south africa   54.1   pakistan      66.9   usa         81.3   france       84.9
zimbabwe       52.7   philippines   72.6   venezuela   77.7   germany      83
                      singapore     83.7                      greece       82.6
How is this claim best evaluated?
-graph and statistical analysis

[Bar graph: life expectancy for women (years, 50-90) by continent (Africa, Asia, Americas, Europe); error bars show 95% confidence limits. Note that the y-axis doesn't start at 0.]
How is this claim best evaluated?
-graph and statistical analysis

[Bar graph with 1-way ANOVA: life expectancy for women (years) by continent; error bars show 95% confidence limits; p = 0.0000001]
Anova: Single Factor

SUMMARY
Groups      Count    Sum      Average     Variance
Africa      9        556.1    61.78889    104.6261
Asia        10       747.7    74.77       42.78233
Americas    9        718.2    79.8        7.7875
Europe      10       825.7    82.57       7.731222

ANOVA
Source of Variation    SS          df    MS         F           P-value     F crit
Between Groups         2351.6      3     783.8666   19.68451    1.42E-07    2.882604
Within Groups          1353.931    34    39.8215
Total                  3705.531    37
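The F statistic can be reproduced from the summary rows alone (counts, sums, and variances), since SS-between uses the group means and SS-within uses the group variances:

```python
# (count, sum, variance) for each group, from the SUMMARY table above
groups = {
    "Africa": (9, 556.1, 104.6261),
    "Asia": (10, 747.7, 42.78233),
    "Americas": (9, 718.2, 7.7875),
    "Europe": (10, 825.7, 7.731222),
}

n_total = sum(n for n, _, _ in groups.values())              # 38
grand_mean = sum(s for _, s, _ in groups.values()) / n_total

ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups.values())
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1       # 3
df_within = n_total - len(groups)  # 34
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(round(f_stat, 2))  # ~19.68, matching the F in the ANOVA table
```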
Conclusion
• Life expectancy of women appears to differ
among continents
• (The ANOVA doesn't tell us which continents
are different; further tests would be necessary
to test claims about specific continents)
Measures of Variation
The standard deviation for a sample (S) formula is similar:

  S = √( Σ(X − X̄)² / N )

And, when using sample data to estimate the standard deviation of
a population (this is called the unbiased estimator of the true
population standard deviation, s), use the following formula:

  s = √( Σ(X − X̄)² / (N − 1) )
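The only difference between these standard-deviation formulas is the denominator; a sketch comparing them on a small hypothetical data set:

```python
from math import sqrt

def sd(scores, denominator):
    # sqrt of the summed squared deviations over the given denominator
    m = sum(scores) / len(scores)
    return sqrt(sum((x - m) ** 2 for x in scores) / denominator)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(data)

print(sd(data, n))      # S: divide by N -> 2.0
print(sd(data, n - 1))  # s: divide by (N - 1), slightly larger
```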
Statistical Distribution Shapes

[Figure (source: fao.org)]
A: Normal Distribution
B: Positively Skewed Distribution
C: Negatively Skewed Distribution