Lecture 7 - Alex Braunstein's Blog

Statistics 111 - Lecture 7
Probability
Normal Distribution
and Standardization
June 5, 2008
Stat 111 - Lecture 7 - Normal
Distribution
Administrative Notes
• Homework 2 due on Monday
Outline
• Law of Large Numbers
• Normal Distribution
• Standardization and Normal Table
Data versus Random Variables
• Data variables are variables for which we
actually observe values
• E.g., heights of students in the Stat 111 class
• For these data variables, we can directly calculate the statistics
x̄ (sample mean) and s² (sample variance)
• Random variables are quantities that we don't
directly observe, but for which we still have a probability
distribution over all possible values
• E.g., heights of the entire Penn student population
Law of Large Numbers
• The rest of the course will be about using data
statistics (x̄ and s²) to estimate parameters of
random variables (μ and σ²)
• Law of Large Numbers: as the size of our
data sample increases, the mean x̄ of the
observed data variable approaches the mean μ
of the population
• If our sample is large enough, we can be
confident that our sample mean is a good
estimate of the population mean!
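The Law of Large Numbers can be watched in a small simulation. This is only a sketch: the population values (mean 170, SD 10, loosely "heights in cm") and the helper name `sample_mean` are made up for illustration and are not from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 170.0, 10.0  # hypothetical population mean and SD (assumed values)

def sample_mean(n):
    """Sample mean of n independent draws from N(MU, SIGMA^2)."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))

# As n grows, the sample mean should settle closer and closer to MU.
for n in (10, 100, 10_000, 100_000):
    print(n, round(sample_mean(n), 2))
```

The small samples can land a few units away from 170, while the largest ones should sit very close to it.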
The Normal Distribution
• The Normal distribution has the shape of a “bell
curve” with parameters μ and σ² that determine
the center and spread
Different Normal Distributions
• Each different value of μ and σ² gives a
different Normal distribution, denoted N(μ, σ²)
[Figure: density curves of N(0,1), N(2,1), N(-1,2), and N(0,2)]
• We can adjust values of μ and σ² to provide
the best approximation to observed data
• If μ = 0 and σ² = 1, we have the Standard
Normal distribution
Property of Normal Distributions
• Normal distribution follows the 68-95-99.7 rule:
• 68% of observations are between μ - σ and μ + σ
• 95% of observations are between μ - 2σ and μ + 2σ
• 99.7% of observations are between μ - 3σ and μ + 3σ
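The 68-95-99.7 rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+); the helper name `prob_within` is just an illustrative choice. Because standardization removes μ and σ, the check only needs the standard Normal.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, N(0, 1)

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any Normal X."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rule's 68, 95, and 99.7 are these values rounded.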
Calculating Probabilities
• For more general probability calculations, we
have to do integration
• For the standard normal distribution, we have tables of
probabilities already made for us!
If Z follows N(0,1):
P(Z < -1.00) = 0.1587
Standard Normal Table
If Z has N(0,1):
P(Z > 1.46)
= 1 - P(Z < 1.46)
= 1 - 0.9279
= 0.0721
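The two table lookups so far can be reproduced in software instead of a printed table. This is a sketch using `statistics.NormalDist` from the Python standard library, whose `cdf` method plays the role of the table; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # standard Normal

p_below = Z.cdf(-1.00)     # P(Z < -1.00)
p_above = 1 - Z.cdf(1.46)  # P(Z > 1.46), via the complement rule

print(round(p_below, 4), round(p_above, 4))
# prints: 0.1587 0.0721
```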
• What if we need to do a probability calculation for
a non-standard Normal distribution?
Standardization
• If we only have a standard normal table, then we
need to transform our non-standard normal
distribution into a standard one
• This process is called standardization
Standardization Formula
• We convert a non-standard normal distribution
into a standard normal distribution using a linear
transformation
• If X has a N(μ, σ²) distribution, then we can
convert to Z which follows a N(0,1) distribution
Z = (X - μ)/σ
• First, subtract the mean μ from X
• Then, divide by the standard deviation σ of X
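A quick way to see why standardization works: the probability computed directly from the non-standard Normal matches the standard Normal probability at the converted value. The numbers below (mean 50, SD 10, cutoff 65) and the function name `standardize` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert a value x from N(mu, sigma^2) to the standard Normal scale."""
    return (x - mu) / sigma

mu, sigma = 50.0, 10.0  # hypothetical parameters (assumed, not from the lecture)
x = 65.0

z = standardize(x, mu, sigma)            # (65 - 50) / 10 = 1.5
p_direct = NormalDist(mu, sigma).cdf(x)  # P(X < 65) on the original scale
p_table = NormalDist().cdf(z)            # P(Z < 1.5) on the standard scale

print(z, p_direct, p_table)  # the two probabilities agree
```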
Linear Transformations of Variables
• Sometimes we need to do simple mathematical
operations on our variables, such as adding and/or
multiplying with constants
Y = a·X + b
• Example: changing temperature scales
Fahrenheit = 9/5 × Celsius + 32
• How are means and variances affected?
Mean/Variances of Linear Transforms
• For transformed variable Y = a·X + b
mean(Y) = a·mean(X) + b
Var(Y) = a²·Var(X)
SD(Y) = |a|·SD(X)
• Note that adding a constant b does not affect measures
of spread (variance and sd)
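These rules can be checked on the temperature example. The Celsius readings below are made-up data for illustration; the sketch compares statistics computed directly on the Fahrenheit values with what the formulas predict.

```python
import math
import statistics

celsius = [12.0, 15.5, 19.0, 22.5, 30.0]  # made-up readings (assumed data)
a, b = 9 / 5, 32                          # Fahrenheit = a * Celsius + b
fahrenheit = [a * c + b for c in celsius]

mean_c = statistics.mean(celsius)
var_c = statistics.variance(celsius)
sd_c = statistics.stdev(celsius)

# Each line prints True when the direct calculation matches the formula.
print(math.isclose(statistics.mean(fahrenheit), a * mean_c + b))
print(math.isclose(statistics.variance(fahrenheit), a**2 * var_c))
print(math.isclose(statistics.stdev(fahrenheit), abs(a) * sd_c))
```

Only the multiplier a shows up in the variance and SD checks; the shift b = 32 moves the mean but leaves the spread alone.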
More complicated linear functions
• We can also do linear transformations involving
more than one variable:
Z = a·X + b·Y + c
• The mean formula is similar:
mean(Z) = a·mean(X) + b·mean(Y) + c
• If X and Y are also independent, then
var(Z) = a²·var(X) + b²·var(Y)
• A more complicated variance formula (in the book) is needed if
the variables are not independent
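The two-variable formulas can be sanity-checked by simulation. Everything specific here is an assumption made for the sketch: the constants a, b, c, and the choice of X ~ N(10, 2²) and Y ~ N(-1, 1²) drawn independently.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
N = 100_000

a, b, c = 2.0, -3.0, 5.0  # hypothetical constants (assumed)
xs = [random.gauss(10, 2) for _ in range(N)]  # mean 10, variance 4
ys = [random.gauss(-1, 1) for _ in range(N)]  # mean -1, variance 1; independent of xs
zs = [a * x + b * y + c for x, y in zip(xs, ys)]

predicted_mean = a * 10 + b * (-1) + c  # 20 + 3 + 5 = 28
predicted_var = a**2 * 4 + b**2 * 1     # 16 + 9 = 25

print(statistics.fmean(zs), predicted_mean)
print(statistics.variance(zs), predicted_var)
```

The simulated mean and variance should land close to 28 and 25. Note that the constant c contributes to the mean but not to the variance.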
Standardization Example
Dear Abby,
You wrote in your column that a woman is pregnant for
266 days. Who said so? I carried my baby for 10
months and 5 days. My husband is in the Navy and it
could not have been conceived any other time because I
only saw him once for an hour, and I didn’t see him
again until the day after the baby was born. I don’t drink
or run around, and there is no way the baby isn’t his, so
please print a retraction about the 266-day carrying time
because I am in a lot of trouble!
-San Diego Reader
Standardization Example
• According to well-documented data, gestation
time follows a normal distribution with mean μ
of 266 days and SD σ of 16 days
• Let X = gestation time. What percent of
babies have gestation time greater than 310
days (10 months & 5 days)?
• Need to convert X = 310 into standard Z
Z = (X - μ)/σ = (310 - 266)/16 = 44/16 = 2.75
Standardization Example
P(X > 310)
= P(Z > 2.75)
= 1 - P(Z < 2.75)
= 1 - 0.9970
= 0.0030
So, only a 0.3%
chance of a
pregnancy lasting
as long as 310 days!
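The gestation calculation can be reproduced without a table. This sketch uses `statistics.NormalDist` from the Python standard library with the mean of 266 days and SD of 16 days given in the example; the variable names are illustrative.

```python
from statistics import NormalDist

gestation = NormalDist(mu=266, sigma=16)  # X = gestation time in days

z = (310 - gestation.mean) / gestation.stdev  # standardize X = 310
p = 1 - NormalDist().cdf(z)                   # P(Z > z)

print(z, round(p, 4))
# prints: 2.75 0.003
```

Calling `1 - gestation.cdf(310)` gives the same probability directly, which is a handy cross-check on the hand standardization.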
Reverse Standardization
• Sometimes, we need to convert a standard
normal Z into a non-standard normal X
• Example: what is the length of pregnancy
below which we have 10% of the population?
• From the table, we see P(Z < -1.28) = 0.10
• Reverse Standardization formula:
X = σ⋅Z + μ
• For Z = -1.28, we calculate
X = -1.28·16 + 266 ≈ 246 days (about 8.2 months)
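Reverse standardization can also be done in software. In this sketch, `NormalDist().inv_cdf` from the Python standard library stands in for reading the table backwards. It returns a slightly more precise z than the two-decimal table value, so the answer differs a little from the hand calculation.

```python
from statistics import NormalDist

mu, sigma = 266, 16  # gestation mean and SD from the example

z = NormalDist().inv_cdf(0.10)  # z with P(Z < z) = 0.10, about -1.28
x = sigma * z + mu              # reverse standardization: X = sigma*Z + mu

print(round(z, 4), round(x, 1))  # roughly -1.28 and about 245.5 days
```

Using the rounded table value z = -1.28 gives 245.52, which the slide rounds to 246 days.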
Another Example
• NCAA Division 1 SAT Requirements: athletes
are required to score at least 820 on combined
math and verbal SAT
• In 2000, SAT scores were normally distributed
with mean μ of 1019 and SD σ of 209
• What percentage of students have scores
greater than 820?
Z = (X - μ)/σ = (820 - 1019)/209 = -199/209 = -0.95
Another Example
• P(X > 820) = P(Z > -0.95) = 1 - P(Z < -0.95)
• P(Z < -0.95) = 0.17 so P(X > 820) = 0.83
• 83% of students meet NCAA requirements
SAT Verbal Scores
• Now, just look at X = Verbal SAT score, which
is normally distributed with mean μ of 505 and
SD σ of 110
• What Verbal SAT score will place a student in
the top 10% of the population?
SAT Verbal Scores
• From the table, P(Z > 1.28) = 0.10
• Need to reverse standardize to get X:
X = σ⋅Z + μ = 110⋅1.28 + 505 ≈ 646
• So, a student needs a Verbal SAT score
of 646 in order to be in the top 10% of all
students
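Both SAT questions can be checked in a few lines. This sketch uses `statistics.NormalDist` from the Python standard library with the means and SDs quoted in the examples; the variable names are illustrative.

```python
from statistics import NormalDist

# Combined math + verbal scores: mean 1019, SD 209
combined = NormalDist(mu=1019, sigma=209)
p_meet = 1 - combined.cdf(820)  # P(X > 820), share meeting the NCAA cutoff

# Verbal scores only: mean 505, SD 110
verbal = NormalDist(mu=505, sigma=110)
cutoff = verbal.inv_cdf(0.90)   # score with 90% below it and 10% above

print(round(p_meet, 2), round(cutoff))
# prints: 0.83 646
```

The first question is a forward standardization (score to probability) and the second is a reverse standardization (probability to score), matching the two hand calculations.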
Next Class - Lecture 8
• Chapter 5: Sampling Distributions