Transcript Lecture5

Review
• Sections 2.5-2.7
• Quantitative Numerical Statistics
– Outliers
– Measures of variability
• Range
• Variance
• Standard Deviation
Review
R = Max - Min
$\bar{x} = \frac{\sum x_i}{n}$
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
$s = \sqrt{s^2}$
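As a quick check on the review formulas, here is a minimal Python sketch that computes the range, mean, sample variance and standard deviation directly from their definitions. It uses the eight test scores that appear in this lecture's examples; the function names are illustrative choices, not part of the course material.

```python
import math

# Eight test scores used in this lecture's examples.
scores = [35, 59, 70, 73, 75, 81, 84, 86]

def sample_range(data):
    """R = Max - Min."""
    return max(data) - min(data)

def sample_mean(data):
    """x-bar = (sum of x_i) / n."""
    return sum(data) / len(data)

def sample_variance(data):
    """s^2 = sum of (x_i - x-bar)^2, divided by n - 1."""
    xbar = sample_mean(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

def sample_sd(data):
    """s = square root of s^2."""
    return math.sqrt(sample_variance(data))

print(sample_range(scores))           # 51
print(round(sample_mean(scores), 1))  # 70.4
print(round(sample_sd(scores), 1))    # 16.7
```

The rounded mean and standard deviation match the 70.4 and 16.7 quoted in the examples.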
Example – Using Standard
Deviation
Here are eight test scores from a previous Stats 201
class:
35, 59, 70, 73, 75, 81, 84, 86.
The mean and standard deviation are 70.4 and 16.7,
respectively. Work out which data points are within
a) one standard deviation of the mean, i.e. in
$(\bar{x} - s, \bar{x} + s)$ = (70.4 - 16.7, 70.4 + 16.7) = (53.7, 87.1)
Answer: 59, 70, 73, 75, 81, 84, 86
b) two standard deviations of the mean, i.e. in
$(\bar{x} - 2s, \bar{x} + 2s)$ = (70.4 - 2(16.7), 70.4 + 2(16.7)) = (37.0, 103.8)
Answer: 59, 70, 73, 75, 81, 84, 86
c) three standard deviations of the mean, i.e. in
$(\bar{x} - 3s, \bar{x} + 3s)$ = (70.4 - 3(16.7), 70.4 + 3(16.7)) = (20.3, 120.5)
Answer: 35, 59, 70, 73, 75, 81, 84, 86
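The three parts of this example can be checked with a short Python sketch. It uses the rounded mean and standard deviation quoted on the slide; the helper name `within_k_sd` is an illustrative choice, and treating the interval as open at both ends is an assumption (no score lies exactly on a boundary here).

```python
scores = [35, 59, 70, 73, 75, 81, 84, 86]
xbar, s = 70.4, 16.7  # rounded values quoted on the slide

def within_k_sd(data, mean, sd, k):
    """Return the data points strictly inside (mean - k*sd, mean + k*sd)."""
    lower, upper = mean - k * sd, mean + k * sd
    return [x for x in data if lower < x < upper]

for k in (1, 2, 3):
    inside = within_k_sd(scores, xbar, s, k)
    print(k, len(inside), inside)
```

Only the score 35 falls outside the one and two standard deviation intervals, and every score lies within three standard deviations.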
Interpreting the Standard
Deviation
Chebyshev’s Theorem
The proportion (or fraction) of any data set lying
within K standard deviations of the mean is always
at least $1 - 1/K^2$, where K is any number
greater than 1.
For K=2 we obtain: at least 3/4 (75%) of all scores
will fall within 2 standard deviations of the mean,
i.e. at least 75% of the data will fall between
$\bar{x} - 2s$ and $\bar{x} + 2s$.
For K=3 we obtain: at least 8/9 (89%) of all scores
will fall within 3 standard deviations of the mean,
i.e. at least 89% of the data will fall between
$\bar{x} - 3s$ and $\bar{x} + 3s$.
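The bound in Chebyshev's Theorem is easy to tabulate. The sketch below evaluates 1 - 1/K^2 for a few values of K; the function name and the choice to raise an error for K at or below 1 are illustrative assumptions.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem applies only for k greater than 1")
    return 1 - 1 / k**2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 3))
```

For K = 2 and K = 3 this reproduces the 3/4 and 8/9 quoted above. These are guaranteed minimums for any data set, so the actual proportion is often much larger.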
This Data is Symmetric, Bell
Shaped (or Normal Data)
[Figure: relative frequency histogram of symmetric, bell-shaped data, with the mean $\bar{x}$ and the median M marked. Vertical axis: Relative Frequency, from 0 to 0.5.]
The Empirical Rule
The Empirical Rule states that for bell shaped
(normal) data, approximately:
68% of all data points are within 1 standard deviation of the mean
95% of all data points are within 2 standard deviations of the mean
99.7% of all data points are within 3 standard deviations of the mean
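The three percentages in the Empirical Rule are rounded values for an exactly normal distribution. As a sketch, the fraction of a normal distribution within k standard deviations of its mean is erf(k / sqrt(2)), which Python's standard library can evaluate; the function name is an illustrative choice.

```python
import math

def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_fraction_within(k), 1), "%")
```

Real data are only approximately bell shaped, which is why the rule is stated with "approximately".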
Z-Score
To calculate the number of standard
deviations a particular point is away from the
mean we use the following
formula.
$z = \frac{x - \mu}{\sigma}$ or $z = \frac{x - \bar{x}}{s}$
The number we calculate is called the z-score
of the measurement x.
Example – Z-score
Here are eight test scores from a previous
Stats 201 class:
35, 59, 70, 73, 75, 81, 84, 86.
The mean and standard deviation are 70.4 and
16.7, respectively.
a) Find the z-score of the data point 35.
z = (35 - 70.4)/16.7 = -2.12
b) Find the z-score of the data point 73.
z = (73 - 70.4)/16.7 = 0.16
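The two z-scores can be reproduced with a few lines of Python. This is a minimal sketch using the rounded mean and standard deviation from the slide; `z_score` is an illustrative name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

xbar, s = 70.4, 16.7  # rounded values quoted on the slide

print(round(z_score(35, xbar, s), 2))  # -2.12
print(round(z_score(73, xbar, s), 2))  # 0.16
```

A negative z-score means the point lies below the mean, a positive one above it.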
Interpreting Z-scores
The further away the z-score is from zero the
more exceptional the original score.
Values of z less than -2 or greater than +2 can
be considered exceptional or unusual (“a
suspected outlier”).
Values of z less than -3 or greater than +3 are
often exceptional or unusual (“a highly
suspected outlier”).
Percentiles
Another method for detecting outliers involves
percentiles.
The pth percentile is a number such that
p% of the measurements fall below it
and (100 – p)% fall above it.
How to Find Percentiles
1) Rank the n points of data from lowest to
highest.
2) Pick the percentile ranking you want to find,
say p.
3) Compute
$L = \left(\frac{p}{100}\right) n$
– If L is a whole number, then the percentile is halfway between
the Lth and (L+1)st numbers.
– If L is not a whole number, then round L up; the percentile is
the number in that position.
Important Percentiles
Memorize:
The 25th percentile is called the lower
quartile (QL)
The 75th percentile is called the upper
quartile (QU)
The 50th percentile is called the median (M)
Important Percentiles
The interquartile range (IQR) is defined to
be:
IQR = QU - QL
Example - Fax
Here are the numbers of pages in each
fax sent from our Math and Stats department
since April 24th, in the order that they
occurred.
5, 1, 2, 6, 10, 3, 6, 2, 2, 2, 2, 2, 2, 4, 5, 1, 13,
2, 5, 5, 1, 3, 6, 37, 2, 8, 2, 25
Find the 40th percentile, QU, QL, M and IQR.
Example - Fax
Find the 40th percentile.
1) Rank the n points of data from lowest to
highest:
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 5, 5,
5, 5, 6, 6, 6, 8, 10, 13, 25, 37
2) Pick the percentile ranking you want to find:
p = 40.
3) Compute
$L = \left(\frac{p}{100}\right) n = \left(\frac{40}{100}\right) 28 = 11.2$
L is not a whole number, so round up to 12: the
40th percentile is the 12th number in the ranked list
(the 11th and 12th numbers are both 2).
Answer: 2
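The three steps can be written as a short Python function. This is a sketch of the rule as stated on the "How to Find Percentiles" slide (average the Lth and (L+1)st values when L is whole, otherwise round L up); the names `percentile` and `faxes`, and the restriction to 0 < p < 100, are illustrative assumptions. Statistical software often uses other interpolation rules, so its answers can differ slightly.

```python
import math

faxes = [5, 1, 2, 6, 10, 3, 6, 2, 2, 2, 2, 2, 2, 4, 5, 1, 13,
         2, 5, 5, 1, 3, 6, 37, 2, 8, 2, 25]

def percentile(data, p):
    """pth percentile using the location rule L = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ranked = sorted(data)               # step 1: rank the data
    location = p * len(ranked) / 100    # step 3: compute L
    if location == int(location):
        k = int(location)
        # L is whole: halfway between the Lth and (L+1)st values (1-based).
        return (ranked[k - 1] + ranked[k]) / 2
    # L is not whole: round up and take the value in that position.
    return ranked[math.ceil(location) - 1]

print(percentile(faxes, 40))  # 2
```

Applying the same function with p = 25, 50 and 75 gives 2, 3 and 6 for this data set.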
Example - Fax
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
3, 4, 5, 5, 5, 5, 6, 6, 6, 8, 10, 13, 25, 37
To compute QU, QL and M:
Find the median, divide the ranked data into two
equal halves, and then find the median of each half.
An example and more specific instructions
will be done on the board.
M = 3
QU = 6
QL = 2
IQR = 6 - 2 = 4.
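The median-of-halves approach can be sketched in Python as follows. The detailed instructions were given on the board, so one choice here is an assumption: when n is odd, the middle value is left out of both halves. The function name is illustrative, and the data set needs at least two points.

```python
from statistics import median

faxes = [5, 1, 2, 6, 10, 3, 6, 2, 2, 2, 2, 2, 2, 4, 5, 1, 13,
         2, 5, 5, 1, 3, 6, 37, 2, 8, 2, 25]

def quartiles_by_halves(data):
    """Return (QL, M, QU) as medians of the lower half, all data, and upper half."""
    ranked = sorted(data)
    half = len(ranked) // 2   # size of each half; middle value dropped if n is odd
    lower = ranked[:half]
    upper = ranked[-half:]
    return median(lower), median(ranked), median(upper)

q_lower, m, q_upper = quartiles_by_halves(faxes)
print(q_lower, m, q_upper)        # 2.0 3.0 6.0
print("IQR =", q_upper - q_lower) # IQR = 4.0
```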
Percentiles
Sometimes the IQR is a better measure of
variability than the standard deviation, since it
depends only on the center 50% of the data.
That is, it is not affected by outliers.
To use the IQR as a measure of variability we
need to find the Five Number Summary of
the data and then construct a Box Plot.
Five Number Summary and
Outliers
The Five Number Summary of a data set
consists of five numbers:
– Min, QL, M, QU, Max
Suspected Outliers lie
– more than 1.5 IQRs but less than 3 IQRs above the
Upper Quartile
– more than 1.5 IQRs but less than 3 IQRs below the
Lower Quartile
Highly Suspected Outliers lie
– more than 3 IQRs above the Upper Quartile
– more than 3 IQRs below the Lower Quartile.
Five Number Summary and
Outliers
The Inner Fences are:
– data between the Upper Quartile and 1.5 IQRs
above the Upper Quartile and
– data between the Lower Quartile and 1.5 IQRs
below the Lower Quartile
The Outer Fences are:
– data between 1.5 IQRs above the Upper
Quartile and 3 IQRs above the Upper Quartile
and
– data between 1.5 IQRs below the Lower Quartile and 3
IQRs below the Lower Quartile
Example - Fax
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
3, 4, 5, 5, 5, 5, 6, 6, 6, 8, 10, 13, 25, 37
Min = 1, QL = 2, M = 3, QU = 6, Max = 37.
IQR = 6 - 2 = 4.
Inner Fence extremes: 2 - 1.5(4) = -4 and 6 + 1.5(4) = 12
Outer Fence extremes: 2 - 3(4) = -10 and 6 + 3(4) = 18
Suspected Outliers: 13
Highly Suspected Outliers: 25, 37
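The fence calculation and the outlier classification can be sketched in Python. The function name is illustrative, and the treatment of a point lying exactly on a fence is an assumption (a point exactly on an inner fence is not flagged, and a point exactly on an outer fence counts as suspected rather than highly suspected); no fax count falls on a fence here.

```python
ranked_faxes = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
                3, 4, 5, 5, 5, 5, 6, 6, 6, 8, 10, 13, 25, 37]

def classify_outliers(data, q_lower, q_upper):
    """Return inner fence extremes, outer fence extremes, suspected and highly suspected outliers."""
    iqr = q_upper - q_lower
    inner = (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr)
    outer = (q_lower - 3 * iqr, q_upper + 3 * iqr)
    # Suspected: beyond an inner fence extreme but not beyond the outer one.
    suspected = [x for x in data
                 if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]]
    # Highly suspected: beyond an outer fence extreme.
    highly = [x for x in data if x < outer[0] or x > outer[1]]
    return inner, outer, suspected, highly

inner, outer, suspected, highly = classify_outliers(ranked_faxes, q_lower=2, q_upper=6)
print(inner)      # (-4.0, 12.0)
print(outer)      # (-10, 18)
print(suspected)  # [13]
print(highly)     # [25, 37]
```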
Definition: Boxplot
A boxplot is a graph of lines (from the lowest point
inside the lower inner fence to the highest point inside
the upper inner fence) and a box (from the Lower
Quartile to the Upper Quartile) indicating the
position of the median.
[Figure: labeled boxplot. Labels: outliers (marked *); lowest data point more than the lower inner fence; Lower Quartile; Median; Upper Quartile; highest data point less than the upper inner fence.]
Homework
• Read Chapter 3
• Assignment 1 due Next Week
• Problems…
Problems
• Problems (z-score) 2.90, 2.102, 2.128
• Problems (percentiles) 2.125, 2.126
Example: Aptitude tests
Before being accepted into a manufacturing job, one
must complete two aptitude tests. Your score on the
tests will decide whether you will be in management
or whether you will work on the factory floor. One
test is a manual dexterity test, the other is a statistics
test. The manual dexterity test (out of 10) has a mean
of 6 and a standard deviation of 1. The statistics test
(out of 50) has a mean of 25 with a standard deviation
of 3. Your score is 7/10 on the manual dexterity test,
and 34/50 on the statistics test. In which test were
you exceptional?
Example: Aptitude tests
The problem with comparing the two test scores
stems from the fact that the tests are on two
different scales.
If we are going to do meaningful comparisons,
then we must somehow standardize the scores.
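Answer
Calculate the z-score for the two tests.
– Z-score of Man. Dex. = (7-6)/1 = 1
– Z-score of Stats. = (34-25)/3 = 3
The statistics score is 3 standard deviations above the
mean, so the statistics test is the one in which you
were exceptional.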
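The comparison can be sketched in Python by standardizing each score and then applying the thresholds from the "Interpreting Z-scores" slide. The function names and label strings are illustrative, and how to treat a z-score of exactly 2 or exactly 3 is an assumption (the slide only covers values strictly beyond those cutoffs).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def describe(z):
    """Label a z-score using the cutoffs of 2 and 3 standard deviations."""
    if abs(z) > 3:
        return "highly suspected outlier"
    if abs(z) >= 2:
        return "suspected outlier"
    return "not unusual"

tests = {
    "manual dexterity": (7, 6, 1),   # (your score, mean, standard deviation)
    "statistics": (34, 25, 3),
}

for name, (score, mean, sd) in tests.items():
    z = z_score(score, mean, sd)
    print(name, z, describe(z))
```

Because both scores are now measured in standard deviations from their own means, they can be compared directly even though the tests are out of 10 and 50.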
2.90
a. $\bar{x} = 8.24$
$s^2 = 3.36$
$s = 1.83$
b. $\bar{x} \pm s = (6.41, 10.07)$. This contains 18 points.
$\bar{x} \pm 2s = (4.58, 11.90)$. This contains 24 points.
$\bar{x} \pm 3s = (2.75, 13.73)$. This contains 25 points.
c. The percentages are 72%, 96% and 100%.
These are relatively close to the percentages
of the Empirical Rule and better than those
given by Chebyshev’s Rule. They agree with
both rules.
d. R=12-5=7. This is not close to the value
of s.
2.102
By Chebyshev’s Rule at least 8/9 or 89% of
the trees fall within 3 standard deviations of
the mean.
Here that means at least 89% of the trees are between
21 and 39 feet (30 ± 3(3) feet). Hence there
are at most (1/9)*(5000 trees) ≈ 555 trees that are
over 40 feet. Therefore the buyer should not
buy.
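The arithmetic in this answer can be checked with a few lines of Python. The mean of 30 feet and standard deviation of 3 feet are read off the "30 ± 3(3)" expression above; the function name is an illustrative choice.

```python
def max_count_outside(n_items, k):
    """Chebyshev: at most n / k^2 items lie more than k standard deviations from the mean."""
    return n_items / k**2

trees, mean, sd, k = 5000, 30, 3, 3

print(mean - k * sd, mean + k * sd)       # 21 39
print(int(max_count_outside(trees, k)))   # 555
```

The bound counts trees on both sides of the interval, so it is a generous upper limit on the number of very tall trees.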
2.128
a. $z = \frac{x - \mu}{\sigma} = \frac{175 - 79}{23} = 4.17$
b. Yes, this is an outlier since z > 3.
c. - A rare event
- An unusual event happened that week:
  - Running a free clinic
  - Training students
- A new family that requires LA moved
nearby
2.125
a. 4
b. 6; 3
c. 3
d. Skewed to the right
e. 50%, 75%
f. 12, 13, 16
2.126
a. 85 100 121 142 145 157 158 159 161
163 165 166 170 171 171 172 172
173 184 187 196
Min = 85
QL = 151
M = 165
QU = 172
Max = 196
IQR = 172 - 151 = 21
Inner Fence Limits:
172 + 1.5(21) = 203.5
151 - 1.5(21) = 119.5
Outer Fence Limits:
172 + 3(21) = 235
151 - 3(21) = 88
[Figure: boxplot of the data on an axis from 80 to 200, with a suspected outlier (marked *, at 100) and a highly suspected outlier (at 85).]