Chapter 5 powerpoints only

Chapter 5
The Standard Deviation and the Normal Model
68-95-99.7 rule
[Figure: the 68-95-99.7 rule links the mean and standard deviation (numerical) with the histogram (graphical).]
The 68-95-99.7 rule (applies only to mound-shaped data)

Approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \( (\bar{y} - s, \bar{y} + s) \).

Approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \( (\bar{y} - 2s, \bar{y} + 2s) \).

Almost all (approximately 99.7%) of the measurements are within 3 standard deviations of the mean, that is, in \( (\bar{y} - 3s, \bar{y} + 3s) \).
68-95-99.7 rule: 68% within 1 standard deviation of the mean
[Figure: mound-shaped curve with the region from \( \bar{y} - s \) to \( \bar{y} + s \) shaded; 68% of the area in total, 34% on each side of \( \bar{y} \).]
68-95-99.7 rule: 95% within 2 standard deviations of the mean
[Figure: mound-shaped curve with the region from \( \bar{y} - 2s \) to \( \bar{y} + 2s \) shaded; 95% of the area in total, 47.5% on each side of \( \bar{y} \).]
Example: textbook costs
286  328  349  367  382  398  425  480
291  340  354  369  385  409  426
307  342  355  371  385  409  428
308  346  355  373  387  410  433
315  347  360  377  390  418  434
316  348  361  380  390  422  437
327  348  364  381  397  424  440
n = 50, \( \bar{y} = 375.48 \), \( s = 42.72 \)
Example: textbook costs (cont.)
1 standard deviation interval about the mean
\( \bar{y} = 375.48 \), \( s = 42.72 \)
\( (\bar{y} - s, \bar{y} + s) = (332.76, 418.20) \)
Percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: about 68%
Example: textbook costs (cont.)
2 standard deviation interval about the mean
\( \bar{y} = 375.48 \), \( s = 42.72 \)
\( (\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92) \)
Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: about 95%
Example: textbook costs (cont.)
3 standard deviation interval about the mean
\( \bar{y} = 375.48 \), \( s = 42.72 \)
\( (\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64) \)
Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: about 99.7%
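The three interval checks above can be reproduced in a few lines. A minimal sketch, assuming Python 3.8+ and only the standard `statistics` module (the helper name `percent_within` is illustrative, not from the slides), recomputes \( \bar{y} \) and s from the 50 textbook costs and counts how many values fall in each interval:

```python
from statistics import mean, stdev

# The 50 textbook costs (dollars) from the example, in increasing order
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = mean(costs)  # sample mean
s = stdev(costs)     # sample standard deviation (divides by n - 1)


def percent_within(data, center, spread, k):
    """Percentage of values in the interval [center - k*spread, center + k*spread]."""
    lo, hi = center - k * spread, center + k * spread
    inside = sum(lo <= v <= hi for v in data)
    return 100 * inside / len(data)


print(f"n = {len(costs)}, mean = {y_bar:.2f}, s = {s:.2f}")
for k, rule in [(1, 68), (2, 95), (3, 99.7)]:
    pct = percent_within(costs, y_bar, s, k)
    print(f"within {k} SD: {pct:.0f}% of the data (rule: about {rule}%)")
```

Because `stdev` divides by n − 1, it matches the slides' s = 42.72 up to rounding, and the counts come out to 64%, 96% and 100%.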
The best estimate of the standard deviation of the men’s weights displayed in this dotplot is
1. 10
2. 15
3. 20
4. 40
[Figure: dotplot of men’s weights. Class response percentages shown on the slide: 71%, 16%, 9%, 4%.]
Changing Units of Measurement
Shifting data and rescaling data, and how shifting and rescaling affect graphical and numerical summaries of data.
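Shifting and rescaling: linear transformations
Original data: \( x_1, x_2, \ldots, x_n \)
Linear transformation: \( x^* = a + bx \) (intercept a, slope b)
Adding a shifts the data by a; multiplying by b changes the scale.
[Figure: the line \( x^* = a + bx \) plotted against x, crossing the vertical axis at a.]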
Linear Transformations
Examples: changing units
1. From feet (x) to inches (x*): x* = 12x
2. From dollars (x) to cents (x*): x* = 100x
3. From degrees Celsius (x) to degrees Fahrenheit (x*): x* = 32 + (9/5)x
4. From ACT (x) to SAT (x*): x* = 150 + 40x
5. From inches (x) to centimeters (x*): x* = 2.54x
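Each unit change above is just a choice of intercept a and slope b. A small illustrative sketch in Python (the names `linear_transform` and `conversions` are hypothetical; the ACT-to-SAT pair is the slide's formula) stores the (a, b) pairs from the list and applies \( x^* = a + bx \):

```python
def linear_transform(x, a, b):
    """Return x* = a + b*x: the intercept a shifts, the slope b rescales."""
    return a + b * x


# (a, b) for each unit change in the list above
conversions = {
    "feet -> inches": (0, 12),
    "dollars -> cents": (0, 100),
    "Celsius -> Fahrenheit": (32, 9 / 5),
    "ACT -> SAT": (150, 40),
    "inches -> centimeters": (0, 2.54),
}

for name, (a, b) in conversions.items():
    print(f"{name}: 10 -> {linear_transform(10, a, b):g}")
```

For example, 100 degrees Celsius maps to 32 + (9/5)(100) = 212 degrees Fahrenheit.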
Shifting data only: b = 1
x* = a + x
Adding the same value a to each value in the data set:
• changes the mean, median, Q1 and Q3 by a;
• does NOT change the standard deviation, IQR, or variance.
Everything shifts together; the spread of the items does not change.
Shifting data only: b = 1
x* = a + x (cont.)
• Weights of 80 men age 19 to 24 of average height (5'8" to 5'10"): \( \bar{x} \) = 82.36 kg
• The NIH recommends a maximum healthy weight of 74 kg. To compare their weights to the recommended maximum, subtract 74 kg from each weight: x* = x − 74 (a = −74, b = 1)
• \( \bar{x}^* = \bar{x} - 74 = 8.36 \) kg
1. No change in shape
2. No change in spread
3. Shift to the left by 74 kg
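Shifting and Rescaling data: \( x^* = a + bx \), \( b > 0 \)
Original x data: \( x_1, x_2, x_3, \ldots, x_n \)
Summary statistics:
mean \( \bar{x} \)
median m
1st quartile \( Q_1 \)
3rd quartile \( Q_3 \)
standard deviation s
variance \( s^2 \)
IQR
x* data: \( x^* = a + bx \), giving \( x_1^*, x_2^*, x_3^*, \ldots, x_n^* \)
Summary statistics:
new mean \( \bar{x}^* = a + b\bar{x} \)
new median \( m^* = a + bm \)
new 1st quartile \( Q_1^* = a + bQ_1 \)
new 3rd quartile \( Q_3^* = a + bQ_3 \)
new standard deviation \( s^* = b \cdot s \)
new variance \( s^{*2} = b^2 \cdot s^2 \)
new \( \text{IQR}^* = b \cdot \text{IQR} \)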
Rescaling data: \( x^* = a + bx \), \( b > 0 \) (cont.)
• Weights of 80 men age 19 to 24, of average height (5'8" to 5'10")
• \( \bar{x} \) = 82.36 kg
• min = 54.30 kg
• max = 161.50 kg
• range = 107.20 kg
• s = 18.35 kg
• Change from kilograms to pounds: x* = 2.2x (a = 0, b = 2.2)
• \( \bar{x}^* \) = 2.2(82.36) = 181.19 pounds
• min* = 2.2(54.30) = 119.46 pounds
• max* = 2.2(161.50) = 355.30 pounds
• range* = 2.2(107.20) = 235.84 pounds
• s* = 2.2(18.35) = 40.37 pounds
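The rescaling rules can be applied to the summary statistics alone, without the 80 raw weights. A minimal sketch, assuming b > 0 and using a hypothetical helper `rescale_summary`: location statistics become a + b × value, spread statistics are multiplied by b, and the variance by b².

```python
def rescale_summary(summary, a, b):
    """Apply x* = a + b*x (b > 0) to summary statistics without the raw data."""
    if b <= 0:
        raise ValueError("these rules assume b > 0")
    location = {"mean", "median", "Q1", "Q3", "min", "max"}
    spread = {"s", "IQR", "range"}
    out = {}
    for name, value in summary.items():
        if name in location:
            out[name] = a + b * value   # shifts and rescales
        elif name in spread:
            out[name] = b * value       # only rescales
        elif name == "variance":
            out[name] = b ** 2 * value  # rescales by b squared
        else:
            raise KeyError(f"unknown statistic: {name}")
    return out


# Men's weights in kg (summary values from the slide); 2.2 is the slide's factor
kg = {"mean": 82.36, "min": 54.30, "max": 161.50, "range": 107.20, "s": 18.35}
lb = rescale_summary(kg, a=0, b=2.2)
for name, value in lb.items():
    print(f"{name}: {value:.2f} pounds")
```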
Example of x* = a + bx
4 student heights in inches (x data): 62, 64, 74, 72
\( \bar{x} \) = 68 inches, s = 5.89 inches
Suppose we want centimeters instead: x* = 2.54x (a = 0, b = 2.54)
"UNC method" (not necessary!): convert each height to centimeters and recompute.
4 student heights in centimeters:
157.48 = 2.54(62)
162.56 = 2.54(64)
187.96 = 2.54(74)
182.88 = 2.54(72)
\( \bar{x}^* \) = 172.72 centimeters
s* ≈ 14.96 centimeters
"NCSU method" (go directly to this): transform the summary statistics.
\( \bar{x}^* = 2.54\bar{x} = 2.54(68) = 172.72 \)
\( s^* = 2.54s = 2.54(5.89) = 14.9606 \) (computing from the unrounded s gives about 14.955; both round to 14.96)
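A quick Python check (standard library only; variable names are illustrative) that converting every height and recomputing gives the same mean and standard deviation as transforming the summaries directly:

```python
from statistics import mean, stdev

heights_in = [62, 64, 74, 72]  # four student heights, inches
a, b = 0, 2.54                 # inches -> centimeters

# Method 1: convert every value, then recompute the summaries
heights_cm = [a + b * x for x in heights_in]
mean_cm, s_cm = mean(heights_cm), stdev(heights_cm)

# Method 2: transform the summaries directly
mean_direct = a + b * mean(heights_in)
s_direct = b * stdev(heights_in)

print(round(mean_cm, 2), round(mean_direct, 2))  # 172.72 172.72
print(round(s_cm, 3), round(s_direct, 3))
```

Both methods give an unrounded s* of about 14.955 cm; the slide's 14.9606 comes from multiplying the rounded s = 5.89 by 2.54.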
Example of x* = a + bx
x data: percent returns from 4 investments during 2003: 5%, 4%, 3%, 6%
\( \bar{x} \) = 4.5%, s = 1.29%
Inflation during 2003: 2%
x* data: inflation-adjusted returns, x* = x − 2% (a = −2, b = 1)
Not necessary: adjust each return and recompute.
3% = 5% − 2%
2% = 4% − 2%
1% = 3% − 2%
4% = 6% − 2%
\( \bar{x}^* \) = 10%/4 = 2.5%
s* = s = 1.29%
Go directly to this:
\( \bar{x}^* = \bar{x} - 2\% = 4.5\% - 2\% = 2.5\% \)
\( s^* = s = 1.29\% \) (note that \( s^* \neq s - 2\% \)!)
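The same check for a pure shift (b = 1), as a minimal Python sketch: subtracting 2% moves the mean by 2 but leaves the standard deviation unchanged.

```python
from statistics import mean, stdev

returns = [5, 4, 3, 6]  # percent returns during 2003
inflation = 2           # percent inflation during 2003

# x* = x - 2 (a = -2, b = 1): inflation-adjusted returns
adjusted = [x - inflation for x in returns]

print(mean(returns), round(stdev(returns), 2))    # 4.5 1.29
print(mean(adjusted), round(stdev(adjusted), 2))  # 2.5 1.29
```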
Example
• Original data x: Jim Bob’s jumbo watermelons from his garden have the following weights (lbs):
23, 34, 38, 44, 48, 55, 55, 68, 72, 75
s = 17.12; Q1 = 37, Q3 = 69; IQR = 69 − 37 = 32
• Melons over 50 lbs are priced differently; the amount each melon is over (or under) 50 lbs is x* = x − 50 (x* = a + bx with a = −50, b = 1):
−27, −16, −12, −6, −2, 5, 5, 18, 22, 25
s* = 17.12; Q1* = 37 − 50 = −13, Q3* = 69 − 50 = 19
IQR* = 19 − (−13) = 32
NOTE: s* = s, IQR* = IQR
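A sketch of the melon example in Python. Quartile conventions vary between textbooks and software; here we assume `statistics.quantiles` with `method="exclusive"` (its default), which happens to reproduce Q1 = 37 and Q3 = 69 for these weights. The helper `quartiles_and_iqr` is illustrative.

```python
from statistics import quantiles, stdev

weights = [23, 34, 38, 44, 48, 55, 55, 68, 72, 75]  # lbs
shifted = [x - 50 for x in weights]                  # x* = x - 50 (a = -50, b = 1)


def quartiles_and_iqr(data):
    """Q1, Q3 and IQR using the 'exclusive' quartile method."""
    q1, _median, q3 = quantiles(data, n=4, method="exclusive")
    return q1, q3, q3 - q1


print(round(stdev(weights), 2), quartiles_and_iqr(weights))  # 17.12 (37.0, 69.0, 32.0)
print(round(stdev(shifted), 2), quartiles_and_iqr(shifted))  # 17.12 (-13.0, 19.0, 32.0)
```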
SUMMARY: Linear Transformations x* = a + bx
[Figure: histograms of assembly time in seconds and in minutes (vertical axis: Frequency).]
Linear transformations with b > 0 do not affect the shape of the distribution of the data. For example, if the original data are right-skewed, the transformed data are right-skewed.
SUMMARY: Shifting and Rescaling data, \( x^* = a + bx \), \( b > 0 \)
original data \( x_1, x_2, x_3, \ldots \) → transformed data \( x_1^*, x_2^*, x_3^*, \ldots \)
mean \( \bar{x} \) → new mean \( \bar{x}^* = a + b\bar{x} \)
median m → new median \( m^* = a + bm \)
1st quartile \( Q_1 \) → new \( Q_1^* = a + bQ_1 \)
3rd quartile \( Q_3 \) → new \( Q_3^* = a + bQ_3 \)
st dev s → new st dev \( s^* = bs \)
var. \( s^2 \) → new var. \( s^{*2} = b^2 s^2 \)
IQR → new \( \text{IQR}^* = b \cdot \text{IQR} \)
Z-scores: Standardized Data Values
A z-score measures the distance of a number from the mean in units of the standard deviation.
The z-score corresponding to y is
\[ z = \frac{y - \bar{y}}{s} \]
where
y = original data value
\( \bar{y} \) = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y
If data have mean \( \bar{y} \) and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean \( \bar{y} \).

Exam 1: \( \bar{y}_1 = 88 \), \( s_1 = 6 \); exam 1 score: 91
Exam 2: \( \bar{y}_2 = 88 \), \( s_2 = 10 \); exam 2 score: 92
Which score is better?
\( z_1 = \frac{91 - 88}{6} = \frac{3}{6} = .5 \)
\( z_2 = \frac{92 - 88}{10} = \frac{4}{10} = .4 \)
91 on exam 1 is better than 92 on exam 2.
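Standardizing is a one-line formula. A minimal Python sketch (the function name `z_score` is illustrative) applied to the two exam scores:

```python
def z_score(y, mean, sd):
    """How many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


z1 = z_score(91, mean=88, sd=6)   # exam 1
z2 = z_score(92, mean=88, sd=10)  # exam 2
print(z1, z2)  # 0.5 0.4
print("exam 1 score is better" if z1 > z2 else "exam 2 score is better")
```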
Comparing SAT and ACT Scores
• SAT Math: Eleanor’s score 680; SAT mean = 500, sd = 100
• ACT Math: Gerald’s score 27; ACT mean = 18, sd = 6
• Eleanor’s z-score: z = (680 − 500)/100 = 1.8
• Gerald’s z-score: z = (27 − 18)/6 = 1.5
• Eleanor’s score is better: it is more standard deviations above the mean of its test.
Z-scores: a special linear transformation a + bx
\[ z = \frac{x - \bar{x}}{s} = -\frac{\bar{x}}{s} + \frac{1}{s}x = a + bx, \quad \text{where } a = -\frac{\bar{x}}{s},\; b = \frac{1}{s} \]
Example. At a community college, if a student takes x credit hours the tuition is x* = $250 + $35x. The credit hours taken by students in an Intro Stats class have mean \( \bar{x} \) = 15.7 hrs and standard deviation s = 2.7 hrs.
Question 1. A student’s tuition charge is $941.25. What is the z-score of this tuition?
\( \bar{x}^* \) = $250 + $35(15.7) = $799.50; s* = $35(2.7) = $94.50
\[ z = \frac{941.25 - 799.50}{94.50} = \frac{141.75}{94.50} = 1.5 \]
Z-scores: a special linear
transformation a + bx (cont.)
Question 2. Roger is a student in the Intro Stats class who has a course load of x = 13 credit hours. The z-score is z = (13 − 15.7)/2.7 = −2.7/2.7 = −1.
What is the z-score of Roger’s tuition?
Roger’s tuition is x* = $250 + $35(13) = $705.
Since \( \bar{x}^* \) = $250 + $35(15.7) = $799.50 and s* = $35(2.7) = $94.50,
\[ z = \frac{705 - 799.50}{94.50} = \frac{-94.50}{94.50} = -1 \]
The linear transformation did not change the z-score! This is why z-scores are so useful.
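The claim that a linear transformation with b > 0 leaves z-scores unchanged can be checked numerically. A minimal sketch using the tuition formula x* = 250 + 35x (the helper `z_score` is illustrative); values are rounded before printing to hide floating-point noise.

```python
def z_score(value, mean, sd):
    return (value - mean) / sd


mean_hours, sd_hours = 15.7, 2.7  # credit hours in the Intro Stats class

# Tuition is a linear transformation of credit hours: x* = 250 + 35x (b = 35 > 0)
a, b = 250, 35
mean_tuition = a + b * mean_hours  # about 799.50
sd_tuition = b * sd_hours          # about 94.50

roger_hours = 13
roger_tuition = a + b * roger_hours  # 705

print(round(z_score(roger_hours, mean_hours, sd_hours), 10))        # -1.0
print(round(z_score(roger_tuition, mean_tuition, sd_tuition), 10))  # -1.0
print(round(z_score(941.25, mean_tuition, sd_tuition), 10))         # 1.5
```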
Z-scores add to zero
Student/institutional support to athletic departments for the 8 public ACC schools, 2008 ($ millions)
School | Support | \( y - \bar{y} \) | z-score
Clemson | 4.5 | −3.7125 | −0.8806
FSU | 7.5 | −0.7125 | −0.1690
GaTech | 6.0 | −2.2125 | −0.5248
Maryland | 17.1 | 8.8875 | 2.1082
NCSU | 5.5 | −2.7125 | −0.6434
UNC | 6.4 | −1.8125 | −0.4299
UVA | 11.9 | 3.6875 | 0.8747
VaTech | 6.8 | −1.4125 | −0.3351
Mean = 8.2125, s = 4.216; both the deviations \( y - \bar{y} \) and the z-scores sum to 0.
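Why the sums are zero: the deviations \( y - \bar{y} \) always sum to 0, and dividing each by the same s keeps the sum at 0. A short Python sketch reproducing the table (standard library only):

```python
from statistics import mean, stdev

# Support to athletic departments, 2008 ($ millions)
support = {
    "Clemson": 4.5, "FSU": 7.5, "GaTech": 6.0, "Maryland": 17.1,
    "NCSU": 5.5, "UNC": 6.4, "UVA": 11.9, "VaTech": 6.8,
}

values = list(support.values())
y_bar, s = mean(values), stdev(values)
z = {school: (y - y_bar) / s for school, y in support.items()}

for school, score in z.items():
    print(f"{school:9s} {score:+.4f}")

# Both sums are zero up to floating-point rounding
print(abs(sum(y - y_bar for y in values)) < 1e-9, abs(sum(z.values())) < 1e-9)  # True True
```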
Nationally: mean IQ = 100, sd = 15
Average IQ by Browser
\( z = \frac{81 - 100}{15} = \frac{-19}{15} = -1.27 \)
\( z = \frac{127 - 100}{15} = \frac{27}{15} = 1.80 \)
Story was exposed as a hoax.
NORMAL PROBABILITY MODELS
The Most Important Model for Data in Statistics
[Figure: normal curve with µ = 3 and σ = 1; horizontal axis X from 0 to 12.]
A family of bell-shaped curves that differ only in their means and standard deviations.
µ = the mean
σ = the standard deviation
Normal Probability Models
• The mean, denoted µ, can be any number.
• The standard deviation σ can be any positive number.
• The total area under every normal model curve is 1.
• There are infinitely many normal distributions.
Total area = 1; symmetric around µ
The effects of µ and σ
How does the standard deviation affect the shape of f(x)?
[Figure: normal curves with σ = 2, σ = 3, and σ = 4.]
How does the expected value affect the location of f(x)?
[Figure: normal curves with µ = 10, µ = 11, and µ = 12.]
[Figure: normal curves with µ = 3, σ = 1 and µ = 6, σ = 1; horizontal axis X from 0 to 12.]
[Figure: normal curves with µ = 6, σ = 2 and µ = 6, σ = 1; horizontal axis X from 0 to 12.]
[Figure: normal curve with µ = 6 and σ = 2; horizontal axis X from 0 to 12.]
Area under the density curve between 6 and 8.
Standardizing
• Suppose \( X \sim N(\mu, \sigma) \).
• Form a new random variable by subtracting the mean µ from X and dividing by the standard deviation σ: \( (X - \mu)/\sigma \).
• This process is called standardizing the random variable X.

Standardizing (cont.)
• \( (X - \mu)/\sigma \) is also a normal random variable; we will denote it by Z: \( Z = (X - \mu)/\sigma \).
• Z has mean 0 and standard deviation 1: \( E(Z) = \mu_Z = 0 \); \( SD(Z) = \sigma_Z = 1 \).
• The probability distribution of Z is called the standard normal distribution.
Standardizing (cont.)
If X has mean µ and standard deviation σ, standardizing a particular value x tells how many standard deviations x is above or below the mean µ.
• Exam 1: µ = 80, σ = 10; exam 1 score: 92
• Exam 2: µ = 80, σ = 8; exam 2 score: 90
Which score is better?
\( z_1 = \frac{92 - 80}{10} = \frac{12}{10} = 1.2 \)
\( z_2 = \frac{90 - 80}{8} = \frac{10}{8} = 1.25 \)
90 on exam 2 is better than 92 on exam 1.
[Figure: the N(6, 2) curve for X is transformed by (X − 6)/2 into the standard normal curve for Z, with µ = 0 and σ = 1; area .5 on each side of 0, Z axis from −3 to 3.]
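Standardizing with (X − 6)/2 maps N(6, 2) onto N(0, 1), so any N(6, 2) area equals a standard normal area. As a sketch using Python's `statistics.NormalDist` (Python 3.8+), the area between 6 and 8 under N(6, 2) equals the area between z = 0 and z = 1:

```python
from statistics import NormalDist

X = NormalDist(mu=6, sigma=2)
Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Area between 6 and 8 computed on the original scale
direct = X.cdf(8) - X.cdf(6)

# Same area after standardizing: (6 - 6)/2 = 0 and (8 - 6)/2 = 1
standardized = Z.cdf((8 - 6) / 2) - Z.cdf((6 - 6) / 2)

print(round(direct, 4), round(standardized, 4))  # 0.3413 0.3413
```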
Standard Normal Model
[Figure: standard normal curve, Z axis from −3 to 3, area .5 on each side of 0.]
Z = standard normal random variable
µ = 0 and σ = 1
Important Properties of Z
#1. The standard normal curve is
symmetric around the mean 0
#2. The total area under the curve is 1;
so (from #1) the area to the left of 0 is
1/2, and the area to the right of 0 is 1/2
Finding Normal Percentiles by Hand (cont.)
• Table Z is the standard Normal table. We have to convert our data to z-scores before using the table.
• The figure shows us how to find the area to the left when we have a z-score of 1.80:
Areas Under the Z Curve: Using the Table
Proportion of area above the interval from 0 to 1 = .8413 − .5 = .3413
[Figure: standard normal curve; area .50 to the left of 0, .3413 between 0 and 1, and .1587 to the right of 1.]
Standard normal areas have been calculated and are provided in Table Z.
The tabulated area corresponds to the area between \( Z = -\infty \) and some \( z_0 \), that is, the area to the left of \( Z = z_0 \).

z   | 0.00   | 0.01   | 0.02   | 0.03   | 0.04   | 0.05   | 0.06   | 0.07   | 0.08   | 0.09
0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359
0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753
0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141
…   | …
1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621
…   | …
1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015
…   | …
Example: begin with a normal model with mean 60 and standard deviation 8.
The proportion of the area to the left of 70 under the original curve is the proportion of the area to the left of
\( \frac{70 - 60}{8} = 1.25 \)
under the standard normal Z curve.
In this example \( z_0 = 1.25 \); Table Z (row 1.2, column 0.05) gives 0.8944.
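Software can replace the table lookup. A minimal sketch with `statistics.NormalDist` (Python 3.8+), whose `cdf` method returns the area to the left of a value:

```python
from statistics import NormalDist

model = NormalDist(mu=60, sigma=8)
z0 = (70 - model.mean) / model.stdev  # standardize 70: (70 - 60)/8 = 1.25

area_original = model.cdf(70)         # area left of 70 under N(60, 8)
area_standard = NormalDist().cdf(z0)  # area left of z0 under N(0, 1)

print(z0, round(area_original, 4), round(area_standard, 4))  # 1.25 0.8944 0.8944
```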
Example
Area between 0 and 1.27 = .8980 − .5 = .3980
[Figure: standard normal curve with the area between 0 and 1.27 shaded.]
Example
[Figure: A1 is the area to the right of .55; A2 is the area to the left of .55.]
Area to the right of .55 = A1 = 1 − A2 = 1 − .7088 = .2912
Example
Area between −2.24 and 0 = .5 − .0125 = .4875
[Figure: area .0125 to the left of −2.24 and area .4875 between −2.24 and 0.]
Example
Area to the left of −1.85 = .0322
Example
Area between −1.18 and 2.73 = A − A1 = .9968 − .1190 = .8778
where A = .9968 is the area to the left of 2.73 and A1 = .1190 is the area to the left of −1.18.
Example
Area between −1 and +1 = .8413 − .1587 = .6826
Example
Is k positive or negative? Consider the direction of the inequality and the magnitude of the probability.
Look up .2514 in the body of the table; the corresponding z value is −.67, so k = −.67.
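The table-based areas in these examples can be cross-checked with `statistics.NormalDist`, where `cdf(z)` is the area to the left of z and `inv_cdf(p)` performs the reverse lookup. A sketch:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mu = 0, sigma = 1
Phi = Z.cdf       # area to the left of z

areas = {
    "between 0 and 1.27": Phi(1.27) - 0.5,
    "right of .55": 1 - Phi(0.55),
    "between -2.24 and 0": 0.5 - Phi(-2.24),
    "left of -1.85": Phi(-1.85),
    "between -1.18 and 2.73": Phi(2.73) - Phi(-1.18),
    "between -1 and 1": Phi(1) - Phi(-1),
}
for label, area in areas.items():
    print(f"{label}: {area:.4f}")

# Reverse lookup: the z value with area .2514 to its left
print(round(Z.inv_cdf(0.2514), 2))  # -0.67
```

The area between −1 and +1 prints as 0.6827 rather than .6826 because the slide subtracts two rounded table entries.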
Example
Area to the right of 250 under the original curve = area to the right of
\( Z = \frac{250 - 275}{43} = \frac{-25}{43} = -.58 \)
under the standard normal curve = 1 − .2810 = .7190
Example
Area between 225 and 375 = area under the standard normal curve between \( z = (225 - 275)/43 = -1.16 \) and \( z = (375 - 275)/43 = 2.33 \); the area is .9901 − .1230 = .8671.
N(275, 43); find k so that the area to the left is .9846
.9846 = area to the left of k under the N(275, 43) curve = area to the left of \( z = (k - 275)/43 \) under the N(0, 1) curve, so
\( \frac{k - 275}{43} = 2.16 \) (from the standard normal table)
\( k = 2.16(43) + 275 = 367.88 \)
Area to the left of z = 2.16 = .9846
[Figure: standard normal curve; area .5 to the left of 0 plus area .4846 between 0 and 2.16 gives .9846.]
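The same N(275, 43) questions can be answered without standardizing by hand. A sketch using `NormalDist(mu=275, sigma=43)`:

```python
from statistics import NormalDist

X = NormalDist(mu=275, sigma=43)

right_of_250 = 1 - X.cdf(250)              # area to the right of 250
between_225_375 = X.cdf(375) - X.cdf(225)  # area between 225 and 375
k = X.inv_cdf(0.9846)                      # value with area .9846 to its left

print(f"{right_of_250:.4f}  {between_225_375:.4f}  {k:.2f}")
```

The printed values differ slightly from the slides in the last decimal places because the table method rounds z to two decimals before the lookup.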
Example
Regulate blue dye for mixing paint; the machine can be set to discharge an average of µ ml per can of paint.
• Amount discharged: N(µ, .4 ml). If more than 6 ml is discharged into a paint can, the shade of blue is unacceptable.
• Determine the setting µ so that only 1% of the cans of paint will be unacceptable.
Solution
X = amount of dye discharged into a can
X ~ N(µ, .4); determine µ so that the area to the right of 6 is .01
Solution (cont.)
.01 = area to the right of x = 6 = area to the right of \( z = (6 - \mu)/.4 \), so
\( \frac{6 - \mu}{.4} = 2.33 \) (from the standard normal table)
\( \mu = 6 - 2.33(.4) = 5.068 \)
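Solving for the setting µ is an inverse lookup followed by one line of algebra. A sketch with `statistics.NormalDist`; it uses the unrounded z ≈ 2.326 instead of the table's 2.33, so it returns µ ≈ 5.069 rather than 5.068:

```python
from statistics import NormalDist

sigma = 0.4  # SD of the amount discharged (ml)
limit = 6.0  # more than 6 ml makes the shade unacceptable
p_bad = 0.01 # target proportion of unacceptable cans

z = NormalDist().inv_cdf(1 - p_bad)  # z with area .01 to its right
mu = limit - z * sigma               # solve (limit - mu)/sigma = z for mu

print(round(z, 3), round(mu, 3))
# Check: proportion of cans over 6 ml at this setting
print(round(1 - NormalDist(mu, sigma).cdf(limit), 4))  # 0.01
```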
Normal Distributions
A random variable X with mean µ and standard deviation σ is normally distributed if its probability density function is given by
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty \]
where \( \pi = 3.14159\ldots \) and \( e = 2.71828\ldots \)
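The density formula translates directly into code. A minimal sketch (the name `normal_pdf` is illustrative), cross-checked against the standard library's `NormalDist.pdf`, plus a check of the symmetry f(µ + d) = f(µ − d); σ = 15 is an arbitrary choice for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(1/2)((x - mu)/sigma)^2) / (sigma * sqrt(2*pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Cross-check against the standard library at one point of N(6, 2)
print(normal_pdf(8, 6, 2), NormalDist(6, 2).pdf(8))

# Symmetry: with mu = 100, points 10 above and 10 below get the same density
print(normal_pdf(110, 100, 15) == normal_pdf(90, 100, 15))  # True
```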
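The Shape of Normal Distributions
Normal distributions are bell shaped, and symmetrical around µ.
[Figure: normal curve centered at µ with 90 and 110 marked.]
Why symmetrical? Let µ = 100. Suppose x = 110.
\[ f(110) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)\left(\frac{110 - 100}{\sigma}\right)^2} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)\left(\frac{10}{\sigma}\right)^2} \]
Now suppose x = 90.
\[ f(90) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)\left(\frac{90 - 100}{\sigma}\right)^2} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(1/2)\left(\frac{-10}{\sigma}\right)^2} \]
Because \( (-10/\sigma)^2 = (10/\sigma)^2 \), f(90) = f(110): points equally far above and below µ have the same density.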
Are You Normal? Normal Probability Plots
Checking your data to determine if a normal model is appropriate
Are You Normal? Normal Probability Plots
When you actually have your own data, you must check to see whether a Normal model is reasonable.
• Looking at a histogram of the data is a good way to check that the underlying distribution is roughly unimodal and symmetric.
Are You Normal? Normal Probability Plots (cont)
A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot.
• If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal.
Are You Normal? Normal Probability Plots (cont)
Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example:
[Figure: histogram and Normal probability plot of nearly Normal data.]
Are You Normal? Normal Probability Plots (cont)
A skewed distribution might have a histogram and Normal probability plot like this:
[Figure: histogram and Normal probability plot of a skewed distribution.]
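A Normal probability plot pairs each sorted data value with the z-score it would be expected to have if the data were Normal. The sketch below assumes the plotting positions (i − 0.5)/n, one common convention among several (software packages differ slightly), and adds the correlation of the pairs as a rough numeric summary of straightness; that summary is an illustrative addition, not something the slides use. It reuses Jim Bob’s melon weights from the earlier example; the helper names are hypothetical.

```python
import math
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair sorted data with standard normal quantiles at positions (i - 0.5)/n."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return z, ordered


def pearson_r(xs, ys):
    """Correlation of the plotted points; values near 1 mean a nearly straight line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


weights = [23, 34, 38, 44, 48, 55, 55, 68, 72, 75]  # melon weights (lbs)
z, y = normal_plot_points(weights)
for zi, yi in zip(z, y):
    print(f"{zi:+.3f}  {yi}")
print("r =", round(pearson_r(z, y), 3))
```

Plotting y against z (for example with a plotting library of your choice) gives the Normal probability plot itself; a clearly curved pattern would suggest skewness.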