Transcript 2-5

Variation
Measures of variation quantify how spread out the
data is.
Variation is one of the core ideas in Statistics
Super-simple measure of variation
Range = highest value – lowest value
It uses only the highest and lowest values, so it is not good for much,
but it gives us some idea how spread out the data is.
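As a quick illustration, the range takes one line of Python. This is only a sketch: the function name data_range is made up for this example, and the data are the six values used in the worked examples later in these notes.

```python
def data_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

data = [7, 8, 10, 11, 13, 25]
print(data_range(data))  # 25 - 7 = 18
```

Only the two extreme values matter here: replacing 25 with 14 would drop the range from 18 to 7 even though the other five values are unchanged.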
Standard Deviation
Standard Deviation is a measure of variation based on the
mean
Because of this, it can be strongly influenced by outliers, just
like the mean.
Standard Deviation is always positive or 0 (zero only if all the
data are the same)
The standard deviation has the same units as the data
Calculating Standard Deviation
Definitional formula
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Notice we are measuring variation of the data from the mean.
This formula is for the sample standard deviation, and is based
on the sample mean and sample size
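The definitional formula can be sketched step by step in Python. The function name sample_sd and the data below are hypothetical, chosen only so the mean comes out to a whole number; the sketch assumes at least two data values.

```python
import math

def sample_sd(values):
    """Sample standard deviation from the definitional formula."""
    n = len(values)
    mean = sum(values) / n                             # sample mean
    squared_devs = [(x - mean) ** 2 for x in values]   # squared distance of each value from the mean
    return math.sqrt(sum(squared_devs) / (n - 1))      # divide by n - 1, then take the square root

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(data), 3))  # sqrt(32 / 7), about 2.138
```

Each line of the function matches one piece of the formula, which is why this form is the easier one to interpret.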
Calculating Standard Deviation
Shortcut Formula
$s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
The advantage: No need to calculate the mean first
The disadvantage: Doesn't make as much sense, because the formula no longer shows deviations from the mean
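A minimal sketch of the shortcut formula, checked against the standard deviation from Python's built-in statistics module. The function name sample_sd_shortcut and the data are hypothetical and used only for illustration.

```python
import math
import statistics

def sample_sd_shortcut(values):
    """Sample standard deviation from the shortcut formula (no mean needed)."""
    n = len(values)
    sum_x = sum(values)                    # sum of the data
    sum_x2 = sum(x * x for x in values)    # sum of the squared data
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

# Hypothetical data.
data = [3, 6, 7, 8, 11]
print(math.isclose(sample_sd_shortcut(data), statistics.stdev(data)))  # True
```

Only two running totals are needed, the sum of x and the sum of x squared, so the data can be processed in a single pass.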
Example: Definitional Form
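Data: 7, 8, 10, 11, 13, 25 (n = 6)
$\bar{x} \approx 12.3$
$x$     $x - \bar{x}$          $(x - \bar{x})^2$
7       7 - 12.3 = -5.3        28.09
8       8 - 12.3 = -4.3        18.49
10      10 - 12.3 = -2.3       5.29
11      11 - 12.3 = -1.3       1.69
13      13 - 12.3 = 0.7        0.49
25      25 - 12.3 = 12.7       161.29
Sum                            215.34
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{215.34}{5}} \approx 6.6$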
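The worked example above can be reproduced row by row. This sketch deliberately rounds the mean to one decimal place, as the example does, so that the squared deviations match; the variable names are made up for illustration.

```python
import math

data = [7, 8, 10, 11, 13, 25]
n = len(data)
mean_rounded = round(sum(data) / n, 1)   # 74 / 6 = 12.33..., rounded to 12.3

total = 0.0
for x in data:
    dev = x - mean_rounded               # deviation from the (rounded) mean
    total += dev ** 2
    print(f"{x:>3}  {dev:>6.1f}  {dev ** 2:>8.2f}")

s = math.sqrt(total / (n - 1))
print(f"sum of squares = {total:.2f}, s = {s:.1f}")
```

With the unrounded mean the sum of squares is about 215.33 rather than 215.34, and s still rounds to 6.6.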
Example: Shortcut Form
Data $x$    $x^2$
7           49
8           64
10          100
11          121
13          169
25          625
Sums: 74    1128
$s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
$s = \sqrt{\frac{6(1128) - 74^2}{6(6 - 1)}}$
$s = \sqrt{\frac{1292}{30}}$
$s \approx 6.6$
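The arithmetic in this example can be checked with a few lines of Python. This is only a sketch with illustrative variable names; note that everything stays in whole numbers until the final division and square root.

```python
import math

data = [7, 8, 10, 11, 13, 25]
n = len(data)
sum_x = sum(data)                      # 74
sum_x2 = sum(x * x for x in data)      # 1128

numerator = n * sum_x2 - sum_x ** 2    # 6(1128) - 74^2 = 1292
denominator = n * (n - 1)              # 6(5) = 30
s = math.sqrt(numerator / denominator)
print(numerator, denominator, round(s, 1))  # 1292 30 6.6
```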
Population Standard Deviation
If we have the population data, we can
calculate the population standard deviation.
To distinguish it, we use a different symbol.
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
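Python's statistics module has both versions, which makes the difference easy to see. In this sketch the six values from the earlier examples are treated as an entire population purely for illustration; that is an assumption, not something the examples state.

```python
import statistics

data = [7, 8, 10, 11, 13, 25]
sigma = statistics.pstdev(data)   # population formula: divides by N
s = statistics.stdev(data)        # sample formula: divides by n - 1
print(round(sigma, 2), round(s, 2))  # 5.99 6.56
```

Dividing by N instead of n - 1 always gives the smaller of the two values for the same data (unless both are zero).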
Variance
Sample Variance: $s^2$
Population Variance: $\sigma^2$
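A short sketch showing that the variance is just the standard deviation squared, using the six values from the earlier examples and the built-in statistics module.

```python
import math
import statistics

data = [7, 8, 10, 11, 13, 25]
sample_var = statistics.variance(data)   # s^2: divides by n - 1
pop_var = statistics.pvariance(data)     # sigma^2: divides by N
print(round(sample_var, 2), round(pop_var, 2))  # 43.07 35.89

# Taking the square root of the sample variance recovers s.
print(math.isclose(math.sqrt(sample_var), statistics.stdev(data)))  # True
```

Because the deviations are squared, the variance is in squared units (for example, minutes squared), unlike the standard deviation.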
Understanding Standard Deviation
Main idea:
Bigger value, data is more spread out.
Smaller value, data is closer together.
Rule of Thumb
To very roughly approximate s, $s \approx \frac{\text{range}}{4}$
Rough interpretation:
“Most” data will be within two standard
deviations of the mean. In other words,
Approximate highest value $= \bar{x} + 2s$
Approximate lowest value $= \bar{x} - 2s$
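To get a feel for how rough these rules are, here is a sketch that applies them to the six values from the earlier examples. With so few values and one outlier (25), the agreement should be expected to be loose.

```python
import statistics

data = [7, 8, 10, 11, 13, 25]
rough_s = (max(data) - min(data)) / 4    # range / 4 = 18 / 4
s = statistics.stdev(data)
mean = statistics.mean(data)

print(rough_s, round(s, 1))                            # 4.5 6.6
print(round(mean - 2 * s, 1), round(mean + 2 * s, 1))  # -0.8 25.5
```

The rough estimate (4.5) is in the same ballpark as s (6.6) but not close, and the approximate lowest value (-0.8) is well below the actual lowest value of 7, so these rules are best used as sanity checks rather than calculations.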
Empirical Rule
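For data sets with a bell-shaped distribution,
about 68% of the data fall within one standard deviation of the mean,
about 95% fall within two standard deviations of the mean, and
about 99.7% fall within three standard deviations of the mean.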
Example
For a particular fast-food store, the time people have
to wait at the drive-through has a bell-shaped
distribution with
$\bar{x} = 3.5$ min
$s = 0.7$ min
Then about 68% of people wait between $\bar{x} - s = 2.8$ min and $\bar{x} + s = 4.2$ min.
About 95% of people wait between $\bar{x} - 2s = 2.1$ min and $\bar{x} + 2s = 4.9$ min.
Almost everyone (about 99.7% of people) waits between $\bar{x} - 3s = 1.4$ min and $\bar{x} + 3s = 5.6$ min.
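The three intervals in this example can be generated from the mean and standard deviation alone. This is a sketch only: the function name empirical_intervals is hypothetical, and the percentages apply only when the data are roughly bell-shaped.

```python
def empirical_intervals(mean, s):
    """Return {percent: (low, high)} for 1, 2, and 3 standard deviations."""
    percents = {1: 68, 2: 95, 3: 99.7}
    return {pct: (round(mean - k * s, 2), round(mean + k * s, 2))
            for k, pct in percents.items()}

# Drive-through example: mean 3.5 min, standard deviation 0.7 min.
for pct, (low, high) in empirical_intervals(3.5, 0.7).items():
    print(f"about {pct}%: {low} to {high} min")
```

The rounding is there only to tidy up floating-point arithmetic (3 times 0.7 is not stored exactly as 2.1).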
Homework
2.5: 3, 9, 21, 23, 25, 33