Chi Square Tutorial - Central Bucks School District


Standard Deviation and
Standard Error Tutorial
This is significantly important.
Get your AP Equations and Formulas sheet
The Basics
• Let’s start with a review of the basics of statistics.
• Mean: What most people consider “average.”
– The sum of all scores divided by the number of scores.
• The mean is good for the average of normally distributed data.
• Median: The middle number when data is ordered.
– If you have an even number, it’s the mean of the two
middle points.
• The median is good for the average of data that is not normally
distributed.
• Mode: The most frequently-seen value in the data.
– There is no mode if no data points repeat.
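As a quick check on these three definitions, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is invented for illustration and does not come from the tutorial.

```python
import statistics

# Hypothetical test scores, invented for this example.
scores = [70, 85, 85, 90, 100, 62]

mean = statistics.mean(scores)        # sum of all scores / number of scores
median = statistics.median(scores)    # even count: mean of the two middle values
modes = statistics.multimode(scores)  # list of the most frequently seen value(s)

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```

Note that `multimode` returns every value when nothing repeats, so a list as long as the data itself signals that there is no meaningful mode.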
Data Distribution
• Feast your eyes on this data and try to get a rough
sense of how a histogram (frequency chart) would
look. Where would the peak be?
Distribution Chart of Heights of 100 Control Plants
Height of plants (cm)    # of Plants
0.0-0.9                  3
1.0-1.9                  10
2.0-2.9                  21
3.0-3.9                  30
4.0-4.9                  20
5.0-5.9                  14
6.0-6.9                  2
Data Distribution
• This is a normal distribution, also known as a
bell curve.
– The majority of individuals are “medium.”
[Figure: histogram “Number of Plants in Each Class.” x-axis: height classes from 0.0-0.9 to 6.0-6.9 cm; y-axis: number of plants, 0 to 35.]
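To get a feel for the shape without a plotting tool, the counts for the 100 control plants can be drawn as a rough text histogram. This is only a sketch; the variable names are illustrative.

```python
# Counts copied from the height data for the 100 control plants.
classes = ["0.0-0.9", "1.0-1.9", "2.0-2.9", "3.0-3.9",
           "4.0-4.9", "5.0-5.9", "6.0-6.9"]
counts = [3, 10, 21, 30, 20, 14, 2]

# One '#' per plant gives a sideways bell curve.
for label, count in zip(classes, counts):
    print(f"{label} cm | {'#' * count} {count}")

# The peak is the class with the most plants.
peak_class = classes[counts.index(max(counts))]
print("Peak class:", peak_class)
```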
Abnormal Distribution?
• Human height is a fairly normal distribution.
– Average U.S. woman (age 20+) is 5’ 4”.
– Average U.S. man (age 20+) is 5’ 9.5”.
– About 50% of people are at or above average and 50% are
at or below average.
• What, then, is not a normal distribution?
• Imagine if most women are 5’ 4”, but no one is taller.
– That’s not a normal distribution, and it won’t be a bell
curve.
Abnormal Distribution
• The same goes for test scores.
• If we get an average of 80% on a test, we
don’t necessarily have a normal distribution.
– That’s why the median is better than the mean for
test scores.
• Imagine if the average were a 100% –
definitely not a normal distribution.
Back to Standard Deviation/Error
• Suppose two students take a test.
– One gets a 100%, one gets a 0%.
– What’s the mean?
• 50%.
• Suppose two students take a test.
– One gets a 50%, one gets a 50%.
– What’s the mean?
• 50%.
• So it’s the same mean, but we got there very
differently. This could mean a lot about the test.
• Variance measures the average squared “difference” from
the mean in a set of data.
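The two-student scenario can be checked directly. The sketch below uses `statistics.variance`, which computes the sample variance (dividing by n − 1); the list names are illustrative.

```python
import statistics

test_a = [100, 0]  # one student gets 100%, one gets 0%
test_b = [50, 50]  # both students get 50%

for name, scores in (("Test A", test_a), ("Test B", test_b)):
    print(name,
          "mean:", statistics.mean(scores),
          "variance:", statistics.variance(scores))
```

Both tests share a mean of 50, but only the first has any spread, which is exactly what the variance picks up.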
Variance
• Variance is given by the symbol s².
• A high variance is indicative of a lot of
deviation from the mean.
• A low variance is indicative of relatively stable
values.
Calculating Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
• Σ is “sum of” – you need to perform the numerator
operation for each number in the data set.
• xi is an individual number in your data set.
• x̄ (read: “x bar”) is the mean for your data.
• n is your sample size.
Sample Samples
• Let’s try calculating the variance:
Plant    Height (cm)    Deviation from mean    Square of deviation from mean
         (xi)           (xi − x̄)               (xi − x̄)²
A        10             2                      4
B        7              -1                     1
C        6              -2                     4
D        8              0                      0
E        9              1                      1
Mean = 8
Σ (xi − x̄)² = 10
Divided by n − 1: 10 / (5 − 1) = 2.5
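The same steps (mean, deviations, squared deviations, sum, divide by n − 1) can be sketched in a few lines of Python. The five heights are the ones from the example; the variable names are illustrative.

```python
# Plant heights in cm for plants A through E.
heights = {"A": 10, "B": 7, "C": 6, "D": 8, "E": 9}

n = len(heights)
mean = sum(heights.values()) / n

# Deviation of each plant from the mean, then the square of each deviation.
deviations = {plant: h - mean for plant, h in heights.items()}
squares = {plant: d ** 2 for plant, d in deviations.items()}

sum_of_squares = sum(squares.values())
variance = sum_of_squares / (n - 1)

print("mean:", mean)
print("sum of squared deviations:", sum_of_squares)
print("variance:", variance)
```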
Whoo, variance! Now what?
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
• The standard deviation is simply the square root of the
variance.
– So its symbol is s.
• In our example, s² (variance) is 2.5, so s (standard
deviation) is about 1.58.
• Now, you may be asking why we bother taking this
statistic, if variance seems to do the same thing.
– The reason is that we can make some inferences and
statements about the data in the same way we used
chi-squared tables to make inferences about the role of chance.
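A short sketch confirming that the standard deviation is the square root of the variance for the five plant heights. `statistics.stdev` is the standard library's sample standard deviation; it is used here only as a cross-check.

```python
import math
import statistics

heights = [10, 7, 6, 8, 9]  # cm, plants A through E

variance = statistics.variance(heights)  # sample variance (n - 1 denominator)
sd = math.sqrt(variance)                 # standard deviation = sqrt(variance)

print("variance:", variance)
print("standard deviation:", round(sd, 2))
print("matches statistics.stdev:", math.isclose(sd, statistics.stdev(heights)))
```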
Standard Deviation (SD) Inferences
• If you assume a normal
distribution of data,
68.27% of data is within 1
SD of the mean.
– No real difference.
• 95.45% of the data is
within 2 SD.
– Anything outside is
probably an outlier.
• 99.73% of the data is
within 3 SD.
– Anything outside is almost
definitely an outlier.
Standard Deviation (SD) Inferences
• Suppose the average height of a population is
6 feet (SD = 0.5 feet).
• If the population is normally distributed:
• 68.27% of the population is between 5.5’ and 6.5’.
• 95.45% of the population is between 5’ and 7’.
• 99.73% of the population is between 4.5’ and 7.5’.
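The percentages and ranges above can be reproduced with a small sketch. It assumes a perfectly normal distribution, for which the share of data within k standard deviations is erf(k / √2); the helper name `coverage` is made up for this example.

```python
import math

def coverage(k):
    """Fraction of a normal distribution within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

mean, sd = 6.0, 0.5  # feet, from the height example

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"within {k} SD: {low:.1f}' to {high:.1f}' ({coverage(k):.2%})")
```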
Standard Deviation
• The standard deviation (and mean/variance)
allow us to learn something about an entire
population from just a sample.
– Assuming a normal distribution.
– For example, if we took a sample of pro basketball
players’ heights, we could generalize the raw data of
our sample to the entire NBA.
• Key: The more samples we take, and therefore
the more “means” we determine, the closer
we’ll get to the actual mean of the entire league.
Standard Error
• The standard error of the means (SEM) (or just plain
standard error) is a way to estimate how far off from reality
our data is likely to be due to chance.
– Oddly a little like χ².
• Example: Consider the NBA player height survey.
• We could sample 10 players and get the average height,
and get the standard deviation from that.
• However, if we continued to sample 10 players over
and over and over again, the mean of our calculated
means would start to become more like the true mean.
– Standard error of the means helps us figure out how close
our calculated mean is to the true mean, even without
knowing it.
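The "sample 10 players over and over" idea can be simulated. The sketch below uses a synthetic league of 450 invented heights (not real NBA data; the mean of 79 in and spread of 3.5 in are arbitrary assumptions) and a fixed random seed so the run is repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic "league": 450 invented heights in inches (not real NBA data).
league = [random.gauss(79, 3.5) for _ in range(450)]
true_mean = statistics.mean(league)

# Sample 10 players over and over, recording the mean of each sample.
sample_means = [statistics.mean(random.sample(league, 10))
                for _ in range(1000)]

first_sample_error = abs(sample_means[0] - true_mean)
mean_of_means_error = abs(statistics.mean(sample_means) - true_mean)

print(f"first sample of 10 is off by {first_sample_error:.2f} in")
print(f"mean of 1000 sample means is off by {mean_of_means_error:.2f} in")
```

In a real study we only get one sample, which is why the standard error is useful: it estimates how far a single sample mean is likely to be from the true mean.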
Standard Error
• Put it another way:
– If we survey 10 players, that’s a low number.
– Is it likely that those 10 players perfectly represent
the league?
• Probably not.
– If we survey 300 players, that’s a high number.
– Is it likely that those 300 players perfectly
represent the league?
• Probably.
Standard Error
Yet Another Way…
• In hockey, one statistic is SOG (shots on goal).
– It’s the number of shots a team makes that would
have gone in, if there were no goalie.
• Now, for those of you that don’t watch/play
hockey, suppose I asked you to determine the
average number of SOG a team gets in a game.
• Here’s the data…
Standard Error
Yet Another Way…
• SOG per game:
– Game 1: 22
– Game 2: 20
– Game 3: 21
– Game 4: 21
• What’s the average?
– You’d say it’s probably around 21, right? It may be off,
but probably only by a little bit?
• So you’ll have a relatively small standard error because the
data are consistent.
Standard Error
Yet Another Way…
• What if I gave you these data?
• SOG per game:
– Game 1: 41
– Game 2: 19
– Game 3: 29
– Game 4: 56
• What’s the average?
– You might say it’s in the 30s, but you’re probably not
as confident.
• So you’ll have a relatively large standard error because the
data are not very consistent.
Standard Error
• The formula for standard error should now make
sense:
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
• s = standard deviation
• n = sample size
• The standard error is best when it is closest to 0.
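Applying this formula to the two shots-on-goal data sets from the hockey example shows the difference in numbers. This is a minimal sketch; `standard_error` is a helper written for this example, not a library function.

```python
import math
import statistics

def standard_error(data):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(data) / math.sqrt(len(data))

consistent = [22, 20, 21, 21]  # SOG per game, first data set
scattered = [41, 19, 29, 56]   # SOG per game, second data set

for name, games in (("consistent", consistent), ("scattered", scattered)):
    print(name,
          "mean:", statistics.mean(games),
          "SE:", round(standard_error(games), 2))
```

The consistent games give a standard error well under one shot, while the scattered games give one of nearly eight shots.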
Standard Error vs. Standard Deviation
• Key: Standard deviation is the deviation of
the raw data from the sample’s mean.
– Think the deviation of an NBA player’s height from
the average of a surveyed population.
• Key: Standard error is the deviation of the
sample’s mean from the actual population’s mean.
– Think the deviation of our surveyed population’s
mean height from the true mean height of an NBA
player from the entire league.
One last way to understand this…
• Remember the potato cores?
• You can calculate the average potato core
mass, but that doesn’t tell us how consistent
the mass was.
– That’s why we have standard deviation.
• Once you get a mean for your samples, it also
doesn’t tell us if your set of potato cores was
representative of all the cores I was slicing.
– That’s why we have standard error.
Standard Error vs. Standard Deviation
• Interpreting data:
– Generally you want standard deviation low.
– This means your underlying data set is more consistent.
• Why is that important?
– You definitely want standard error low.
– How can we minimize standard error?
• Have a low standard deviation (out of our control).
• Have a large sample size (in our control).
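To see how much sample size matters, the sketch below holds the standard deviation fixed (6 inches, an assumed value chosen for illustration) and varies only n. The helper name `se` is made up for this example.

```python
import math

def se(s, n):
    """Standard error: standard deviation over the square root of sample size."""
    return s / math.sqrt(n)

s = 6.0  # inches; an assumed standard deviation, held fixed

for n in (10, 40, 300):
    print(f"n = {n:>3}: SE = {se(s, n):.2f} in")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.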
Confidence Intervals & Error Bars
• In addition to the inferences about data from
before (68% within one SD, et cetera), we also can
make inferences using SEM.
– These are more important for biology.
• Traditionally, 95% is the confidence we need in our
data (just like in chi-squared analyses).
• For SEM, 95% confidence is a confidence interval
represented on a graph as error bars.
– Let’s take a closer look.
Confidence Intervals & Error Bars
• Suppose you want to see if Central Bucks HS students are
significantly taller than Council Rock HS students.
– You can’t do a χ² analysis because there’s no “expected.”
• So, you take the mean of some of the students from each
district.
– You can’t measure all of them – that’d take forever.
• You get the SD and SEM as shown:
Team             Mean      Standard Deviation    Standard Error
Council Rock     72 in.    6 in.                 1.90 in.
Central Bucks    80 in.    4 in.                 1.26 in.
• Let’s graph the means.
[Figure: bar graph “Mean Height of High School Students.” x-axis: District (Council Rock, Central Bucks); y-axis: Height (in), 66 to 86.]
Confidence Intervals & Error Bars
Team             Mean      Standard Deviation    Standard Error
Council Rock     72 in.    6 in.                 1.90 in.
Central Bucks    80 in.    4 in.                 1.26 in.
• Okay, now let’s figure out a 95% confidence
interval.
• The 95% confidence interval is traditionally ± 2
SEM about the mean.
• In this case:
– C. Rock = 72 in ± 3.80 in (since 1.90 in * 2 = 3.80 in)
– C. Bucks = 80 in ± 2.52 in (since 1.26 in * 2 = 2.52 in)
• Now let’s draw the intervals on the graph.
[Figure: bar graph “Mean Height of High School Students” with 95% confidence intervals drawn on each bar. x-axis: District (Council Rock, Central Bucks); y-axis: Height (in), 66 to 86.]
The shapes are the 95% confidence intervals. Since they don’t overlap between the districts, there is probably a significant difference between the heights of the two.
Confidence Interval “Frame of Mind”
• When you construct a graph with confidence
intervals and find they do overlap, it suggests
insignificant (null) results.
– It’s possible that the real average height of ALL Council
Rock students is actually equal to that of ALL
Central Bucks students.
• This is also known as sampling error.
– In other words, there is some average height, within
both confidence intervals, that could make the two
teams equal.
• If there is no overlap, it suggests significance.
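The overlap check can be sketched in code using the Council Rock and Central Bucks numbers and the ± 2 SEM rule. Both helper functions are written for this example and are not from any library.

```python
def confidence_interval(mean, sem, multiplier=2):
    """Return (low, high) for mean ± multiplier * SEM."""
    margin = multiplier * sem
    return mean - margin, mean + margin

def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

council_rock = confidence_interval(72, 1.90)   # mean and SEM in inches
central_bucks = confidence_interval(80, 1.26)

print("Council Rock:  %.2f to %.2f in" % council_rock)
print("Central Bucks: %.2f to %.2f in" % central_bucks)
print("Overlap?", intervals_overlap(council_rock, central_bucks))
```

Here the Council Rock interval tops out below where the Central Bucks interval begins, so the check reports no overlap, which suggests a significant difference.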
Practice
• Standard Deviation and Standard Error
Procedural Practice
Practice
• How else are we going to practice standard
deviation and standard error?
– With your data!
• Find in your lab notebooks the measurements
you took on potato core size.
• Calculate the standard deviation and standard
error for your data set with your lab group.
– See why I had you take their masses individually?
Practice
• Calculate standard deviation:
– What is the SD of your set of three cores before the
study and the SD of your three cores afterward?
• Calculate the standard error:
– For each set of data, how likely is it that our average potato
mass was close to the actual average potato mass of
all the slices I cut for our lab?
• No error bars needed.
• Last Key Note: Your units for SD and SE match
the units of the mean (here it’s grams).