Lecture 6: Normal distribution, central limit theorem

Transcript Lecture 6: Normal distribution, central limit theorem

Stats for Engineers Lecture 6
Answers for Question sheet 1 are now online
http://cosmologist.info/teaching/STAT/
Answers for Question sheet 2 should be available Friday evening
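Summary From Last Time
Continuous Random Variables
Probability Density Function (PDF) f(x):
$$P(a \le X \le b) = \int_a^b f(x')\,dx', \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$
[Figure: a PDF f(x) with the area P(−1.5 < X < −0.7) shaded]
Exponential distribution
$$f(y) = \begin{cases} \nu e^{-\nu y}, & y > 0 \\ 0, & y < 0 \end{cases}$$
Probability density for the separation of random independent events with constant rate ν
Normal/Gaussian distribution
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (-\infty < x < \infty)$$
μ: mean
σ: standard deviation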
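The summary formulas translate directly into code. As a quick numerical check, the sketch below evaluates the exponential and Normal densities and confirms that each integrates to 1, using a simple trapezoidal rule (an illustrative choice; any quadrature would do). The helper names, the rate ν = 2, and the use of N(0, 1) for the shaded-area example are assumptions for illustration, not values from the lecture.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal/Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def exponential_pdf(y: float, nu: float) -> float:
    """Exponential density with rate nu, zero for negative y.
    (Its value at the single point y = 0 does not affect any integral.)"""
    return nu * math.exp(-nu * y) if y >= 0 else 0.0

def trapezoid(f, a: float, b: float, n: int = 100_000) -> float:
    """Trapezoidal-rule approximation to the integral of f from a to b using n strips."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Normalisation: each PDF should integrate to 1 (the tails beyond these limits are negligible).
area_normal = trapezoid(lambda x: normal_pdf(x, 0.0, 1.0), -10.0, 10.0)
area_exponential = trapezoid(lambda y: exponential_pdf(y, 2.0), 0.0, 20.0)

# P(-1.5 < X < -0.7), assuming X ~ N(0, 1) purely for illustration.
p_shaded = trapezoid(lambda x: normal_pdf(x, 0.0, 1.0), -1.5, -0.7)

print(f"normal area = {area_normal:.6f}, exponential area = {area_exponential:.6f}")
print(f"P(-1.5 < X < -0.7) = {p_shaded:.4f}")
```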
Normal distribution
Question from Derek Bruff
Consider the continuous random variable X = the weight in pounds of a randomly selected newborn baby. Suppose that X can be modelled with a normal distribution with mean μ = 7.57 and standard deviation σ = 1.06.
If the standard deviation were σ = 1.26 instead, how would that change the graph of the pdf of X?
1. The graph would be narrower and have a greater maximum value.
2. The graph would be narrower and have a lesser maximum value.
3. The graph would be narrower and have the same maximum value.
4. The graph would be wider and have a greater maximum value.
5. The graph would be wider and have a lesser maximum value.
6. The graph would be wider and have the same maximum value.
Reminder: the total area under the PDF is fixed,
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
[Chart: class responses; bar values 82%, 12%, 3%, 0%, 0%, 3%]
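A way to check the answer without sketching: the Normal PDF peaks at x = μ with height f(μ) = 1/√(2πσ²), so the peak height falls as σ grows while the total area stays 1. A minimal sketch comparing the two values of σ from the question (the function name is illustrative):

```python
import math

def normal_peak_height(sigma: float) -> float:
    """Maximum of the Normal PDF, reached at x = mu: f(mu) = 1/sqrt(2*pi*sigma^2)."""
    return 1.0 / math.sqrt(2 * math.pi * sigma ** 2)

peak_original = normal_peak_height(1.06)  # sigma = 1.06 lb
peak_changed = normal_peak_height(1.26)   # sigma = 1.26 lb

print(f"peak with sigma = 1.06: {peak_original:.4f} per lb")
print(f"peak with sigma = 1.26: {peak_changed:.4f} per lb")
```

The larger σ gives a lower peak, and because the area is fixed the curve must spread out: wider, with a lesser maximum value.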
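$$P(a < X < b) = \int_a^b f(x)\,dx$$
BUT: for the Normal distribution this integral cannot be done analytically (it has no closed form in terms of elementary functions).
Instead use tables for the standard Normal distribution:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
If X ∼ N(μ, σ²), then
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1).$$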
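Why does this work?
Change of variable
The probability for X to lie in a range dx around x, for a distribution f(x), is given by f(x) dx. The probability should be the same if it is written in terms of another variable y = y(x). Hence
$$f(x)\,dx = f(y)\,dy \;\Rightarrow\; f(y) = f(x)\left|\frac{dx}{dy}\right|$$
(the absolute value keeps the density positive if y decreases as x increases).
i.e. change x to z = (x − μ)/σ:
$$x = \mu + \sigma z \;\Rightarrow\; \frac{dx}{dz} = \sigma$$
$$\Rightarrow\; f(z) = f(x)\frac{dx}{dz} = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \times \sigma = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
N(0, 1): the standard Normal distribution.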
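The change of variable can be checked numerically: multiplying the N(μ, σ²) density at x by dx/dz = σ should reproduce the standard Normal density at z = (x − μ)/σ. In this sketch μ = 7.57 and σ = 1.06 are borrowed from the newborn-weight question purely for illustration, and the test points are arbitrary.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) for X ~ N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def standard_normal_pdf(z: float) -> float:
    """f(z) for Z ~ N(0, 1)."""
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

mu, sigma = 7.57, 1.06  # illustrative values from the newborn-weight question

max_difference = 0.0
for x in (5.0, 6.5, 7.57, 8.0, 10.2):
    z = (x - mu) / sigma
    transformed = normal_pdf(x, mu, sigma) * sigma  # f(z) = f(x) |dx/dz|, with dx/dz = sigma
    max_difference = max(max_difference, abs(transformed - standard_normal_pdf(z)))

print(f"largest mismatch: {max_difference:.2e}")
```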
Z = (X − μ)/σ ∼ N(0, 1)
Use Normal tables for Q = P(Z < z) [also called Φ(z)]:
$$Q(z) = P(Z \le z) = \int_{-\infty}^{z} f(z')\,dz'$$
[Figure: the area Q under the standard Normal PDF to the left of z]
Outside of exams this is probably best evaluated using a computer package (e.g. Maple, Mathematica, Matlab, Excel); for historical reasons, in exams you still have to use tables.
[Table: Q(z) for 0 ≤ z ≤ 3.59]
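Outside exams, Q(z) is easy to evaluate in code: it can be written in terms of the error function as Q(z) = [1 + erf(z/√2)]/2, and Python's standard library provides `math.erf`. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`, used only as a cross-check); the name `Q` simply follows the lecture's notation.

```python
import math
from statistics import NormalDist

def Q(z: float) -> float:
    """Standard Normal CDF Q(z) = P(Z <= z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few rows of a Q(z) table, rounded to 4 decimal places like the printed tables.
for z in (0.0, 0.5, 1.0, 1.22, 1.5, 1.96, 3.59):
    print(f"Q({z:.2f}) = {Q(z):.4f}")

# Cross-check against the standard library's NormalDist.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
max_gap = max(abs(Q(z) - standard_normal.cdf(z)) for z in (-2.0, -0.3, 0.0, 0.7, 2.5))
```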
Example:
If Z ∼ N(0, 1):
(a) P(Z ≤ 1.22) = Q(1.22) = 0.8888
(b) P(Z > −0.5) = P(Z ≤ 0.5) = Q(0.5) = 0.6915
(c) P(Z ≤ −1.0) = P(Z ≥ 1.0) = 1 − P(Z < 1.0) = 1 − Q(1.0) = 1 − 0.8413 = 0.1587
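The same three examples, evaluated with a `Q` helper (a sketch, defined via `math.erf` as an assumption of this example rather than read from tables). Evaluating (b) and (c) directly, without the symmetry step, gives the same numbers, which is a useful sanity check on the symmetry arguments.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_a = Q(1.22)          # (a) P(Z <= 1.22)
p_b = Q(0.5)           # (b) P(Z > -0.5) = P(Z <= 0.5) by symmetry
p_c = 1.0 - Q(1.0)     # (c) P(Z <= -1.0) = P(Z >= 1.0) = 1 - Q(1.0)

# The same probabilities without using the symmetry step, as a consistency check.
p_b_direct = 1.0 - Q(-0.5)
p_c_direct = Q(-1.0)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```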
Symmetries
If Z ∼ N(0, 1), which of the following is NOT the same as P(Z < 0.7)?
1. 1 − P(Z > 0.7)
2. P(Z > −0.7)
3. 1 − P(Z > −0.7)
4. 1 − P(Z < −0.7)
[Chart: class responses; bar values 60%, 10%, 17%, 13%]
Symmetries: answer
1. 1 − P(Z > 0.7) = P(Z < 0.7): the same (complement)
2. P(Z > −0.7) = P(Z < 0.7): the same (symmetry about zero)
3. 1 − P(Z > −0.7) = P(Z < −0.7): NOT the same
4. 1 − P(Z < −0.7) = P(Z > −0.7) = P(Z < 0.7): the same
(d) P(0.5 < Z < 1.5) = P(Z < 1.5) − P(Z < 0.5) = Q(1.5) − Q(0.5) = 0.9332 − 0.6915 = 0.2417
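Both the symmetry relations in the clicker question and example (d) reduce to evaluating Q at a few points. A small sketch, again defining `Q` via `math.erf` (an illustrative helper, not part of the lecture), that tests each option numerically and evaluates (d):

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z); for continuous Z, P(Z < z) is the same."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper(z: float) -> float:
    """P(Z > z) = 1 - Q(z)."""
    return 1.0 - Q(z)

# Symmetries question: which option differs from P(Z < 0.7)?
target = Q(0.7)
options = {
    1: 1.0 - upper(0.7),   # 1 - P(Z > 0.7)
    2: upper(-0.7),        # P(Z > -0.7)
    3: 1.0 - upper(-0.7),  # 1 - P(Z > -0.7)
    4: 1.0 - Q(-0.7),      # 1 - P(Z < -0.7)
}
not_the_same = [k for k, value in options.items() if abs(value - target) > 1e-9]

# Example (d): P(0.5 < Z < 1.5) = Q(1.5) - Q(0.5)
p_d = Q(1.5) - Q(0.5)

print(f"option(s) not equal to P(Z < 0.7): {not_the_same}")
print(f"(d) P(0.5 < Z < 1.5) = {p_d:.4f}")
```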
(e) P(Z < 1.356)
Q(1.356) lies between Q(1.35) = 0.9115 and Q(1.36) = 0.9131.
Using linear interpolation:
Q(1.356) ≈ A Q(1.35) + B Q(1.36)
where B is the fraction of the distance between 1.35 and 1.36:
$$B = \frac{1.356 - 1.35}{1.36 - 1.35} = 0.6, \qquad A = 1 - B = 0.4$$
Q(1.356) ≈ 0.4 Q(1.35) + 0.6 Q(1.36) = 0.9125
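The interpolation step is a one-line formula; the sketch below wraps it in a hypothetical helper `interpolate` and compares the result with Q(1.356) computed directly via `math.erf`. The table values 0.9115 and 0.9131 are those quoted above.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interpolate(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation between table entries (x0, y0) and (x1, y1)."""
    B = (x - x0) / (x1 - x0)  # fraction of the distance from x0 to x1
    A = 1.0 - B
    return A * y0 + B * y1

q_interp = interpolate(1.356, 1.35, 0.9115, 1.36, 0.9131)
print(f"interpolated Q(1.356) = {q_interp:.4f}, direct Q(1.356) = {Q(1.356):.4f}")
```

Linear interpolation works well here because Q changes very little across one table step of 0.01.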
(f) 0.8 = P(Z ≤ c) = Q(c). What is c?
Use the table in reverse: Q(0.84) = 0.7995 and Q(0.85) = 0.8023, so c lies between 0.84 and 0.85.
Interpolating as before:
c = A × 0.84 + B × 0.85
$$B = \frac{0.8 - 0.7995}{0.8023 - 0.7995} = 0.18, \qquad A = 1 - B = 0.82$$
⇒ c = 0.82 × 0.84 + 0.18 × 0.85 ≈ 0.842.
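Reading the table in reverse amounts to solving Q(c) = 0.8. One simple numerical approach, assumed here for illustration, is bisection, which works because Q increases monotonically. The sketch cross-checks the result against `statistics.NormalDist().inv_cdf` from the Python 3.8+ standard library; `inverse_Q` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_Q(p: float, lo: float = -10.0, hi: float = 10.0, tol: float = 1e-12) -> float:
    """Find c with Q(c) = p by bisection (the 'table in reverse', done numerically).
    Relies on Q being increasing, so that Q(lo) < p < Q(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Q(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = inverse_Q(0.8)
c_library = NormalDist().inv_cdf(0.8)  # standard-library inverse CDF, for comparison
print(f"bisection: c = {c:.4f}; NormalDist.inv_cdf: {c_library:.4f}")
```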
Using Normal tables
The error Z (in Celsius) on a cheap digital thermometer has a normal distribution, with Z ∼ N(0, 1). What is the probability that a given temperature measurement is too cold by more than 1.54 °C?
1. 0.0618
2. 0.9382
3. 0.1236
4. 0.0735
[Chart: class responses; bar values 43%, 36%, 19%, 2%]
Answer:
Want P(Z < −1.54) = P(Z > 1.54) = 1 − P(Z < 1.54) = 1 − Q(1.54) = 1 − 0.9382 = 0.0618
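The answer can be checked two ways in a short sketch: directly as 1 − Q(1.54), with `Q` defined via `math.erf`, and by simulating many N(0, 1) thermometer errors with `random.gauss` and counting the fraction below −1.54. The seed and sample size are arbitrary choices; the simulated value should agree with 0.0618 to within a few thousandths.

```python
import math
import random

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_exact = 1.0 - Q(1.54)  # P(Z < -1.54) = P(Z > 1.54) = 1 - Q(1.54)

# Monte Carlo cross-check: simulate many thermometer errors Z ~ N(0, 1).
random.seed(42)  # fixed seed so the run is reproducible
n_samples = 200_000
too_cold = sum(1 for _ in range(n_samples) if random.gauss(0.0, 1.0) < -1.54)
p_simulated = too_cold / n_samples

print(f"exact {p_exact:.4f}, simulated {p_simulated:.4f}")
```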
(g) Finding a range of values within which Z lies with probability 0.95:
The answer is not unique; but suppose we want an interval which is symmetric about zero, i.e. between −d and d.
The probability outside the interval is 0.05, split equally between the two tails: 0.05/2 = 0.025 in each.
So d is where Q(d) = 0.025 + 0.95 = 0.975.
[Figure: standard Normal PDF with probability 0.95 between −d and d, and P = 0.025 in each tail]
Use the table in reverse:
Q(d) = 0.975 ⇒ d = 1.96
95% of the probability is in the range −1.96 < Z < 1.96.
In general 95% of the probability lies within 1.96σ of the mean μ.
The range μ ± 1.96σ is called a 95% confidence interval.
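The value 1.96 can be recovered by solving Q(d) = 0.975 numerically. The sketch below uses bisection, an illustrative choice, with hypothetical helper names `Q` and `inverse_Q`, and then confirms that the symmetric interval (−d, d) holds probability 0.95.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_Q(p: float, lo: float = -10.0, hi: float = 10.0, tol: float = 1e-12) -> float:
    """Solve Q(d) = p by bisection; valid because Q is increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Q(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d = inverse_Q(0.975)      # 0.025 in the upper tail, so Q(d) = 0.95 + 0.025
coverage = Q(d) - Q(-d)   # probability inside (-d, d)
print(f"d = {d:.4f}, P(-d < Z < d) = {coverage:.4f}")
```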
Question from Derek Bruff
Normal distribution
If X has a Normal distribution with mean μ = 20 and standard deviation σ = 4, which of the following could be a graph of the pdf of X?
[Figure: four candidate PDF graphs, options 1-4]
[Chart: class responses; bar values 45%, 32%, 20%, 2%]
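[Figure: the four candidate graphs, annotated "Too wide", "Correct", "Wrong mean" and "Too narrow"]
i.e. the correct graph has its mean at μ = 20, with about 95% of the probability inside μ ± 2σ = 20 ± 8 (about 5% outside).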
Example: Manufacturing variability
The outside diameter, X mm, of a copper pipe is N(15.00, 0.02²) and the fittings for joining the pipe have inside diameter Y mm, where Y ∼ N(15.07, 0.022²).
(i) Find the probability that X exceeds 14.99 mm.
(ii) Within what range will X lie with probability 0.95?
(iii) Find the probability that a randomly chosen pipe fits into a randomly chosen fitting (i.e. X < Y).
[Figure: pipe X inside fitting Y]
Example: Manufacturing variability
(i) Find the probability that X exceeds 14.99 mm.
Answer:
Reminder: if X ∼ N(μ, σ²) then Z = (X − μ)/σ ∼ N(0, 1).
Here X ∼ N(15.0, 0.02²), so
$$P(X > 14.99) = P\!\left(Z > \frac{14.99 - 15.0}{0.02}\right) = P(Z > -0.5) = P(Z < 0.5) = Q(0.5) \approx 0.6915$$
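Part (i) as a short calculation, a sketch assuming a `Q` helper defined via `math.erf` (an illustrative stand-in for the tables):

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu_x, sigma_x = 15.00, 0.02    # pipe outside diameter X ~ N(15.00, 0.02^2), in mm
z = (14.99 - mu_x) / sigma_x   # standardise: z = -0.5
p_exceeds = 1.0 - Q(z)         # P(X > 14.99) = P(Z > z)
print(f"z = {z:.2f}, P(X > 14.99 mm) = {p_exceeds:.4f}")
```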
Example: Manufacturing variability
(ii) Within what range will X lie with probability 0.95?
Answer
From example (g), P(−1.96 < Z < 1.96) = 0.95,
i.e. X lies in μ ± 1.96σ with probability 0.95
⇒ X = 15 ± 1.96 × 0.02
⇒ 14.96 mm < X < 15.04 mm
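Part (ii) in code: the sketch builds the interval μ ± 1.96σ and then checks, with a `Q` helper defined via `math.erf` (an assumption of this example), that it really contains probability about 0.95.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu_x, sigma_x = 15.00, 0.02  # mm
z95 = 1.96                   # from Q(1.96) = 0.975
lower = mu_x - z95 * sigma_x
upper = mu_x + z95 * sigma_x
coverage = Q((upper - mu_x) / sigma_x) - Q((lower - mu_x) / sigma_x)
print(f"{lower:.4f} mm < X < {upper:.4f} mm with probability {coverage:.4f}")
```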
Where is the probability?
We found 95% of the probability lies within 14.96 mm < X < 15.04 mm.
What is the probability that X > 15.04 mm?
[Figure: PDF of X with P = 0.025 in each tail outside 14.96 mm < X < 15.04 mm]
1. 0.025
2. 0.05
3. 0.95
4. 0.975
[Chart: class responses; bar values 71%, 14%, 11%, 4%]
Example: Manufacturing variability
(iii) Find the probability that a randomly chosen pipe fits into a randomly chosen fitting (i.e. X < Y).
Answer
For X < Y we want P(Y − X > 0).
To answer this we need to know the distribution of Y − X, where Y and X both have (different) Normal distributions.
Distribution of the sum of Normal variates
Means and variances of independent random variables just add.
If X_1, X_2, …, X_n are independent and each has a Normal distribution X_i ∼ N(μ_i, σ_i²), then
$$\mu_{X_1+X_2} = \mu_{X_1} + \mu_{X_2}, \qquad \sigma^2_{X_1+X_2} = \sigma^2_{X_1} + \sigma^2_{X_2}$$
etc.
A special property of the Normal distribution is that the distribution of the sum of Normal variates is also a Normal distribution. [stated without proof]
If c_1, c_2, …, c_n are constants then:
$$c_1 X_1 + c_2 X_2 + \cdots + c_n X_n \sim N\!\left(c_1\mu_1 + \cdots + c_n\mu_n,\; c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2\right)$$
E.g.
$$X_1 + X_2 \sim N(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2)$$
$$X_1 - X_2 \sim N(\mu_1 - \mu_2,\; \sigma_1^2 + \sigma_2^2)$$
(the variances still add for a difference, since c_2 = −1 and c_2² = 1).
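The sum rules can be illustrated by simulation. This sketch draws independent Normal samples with `random.gauss`, using the pipe-and-fitting parameters as illustrative values, and compares the sample mean and variance of Y − X with μ_Y − μ_X and σ_Y² + σ_X². The fraction of samples within 1.96 predicted standard deviations is a rough check that the difference is itself Normal; it illustrates, but does not prove, the property stated above. Seed and sample size are arbitrary.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible
n = 100_000

# Illustrative parameters (those of the pipe example): Y ~ N(15.07, 0.022^2), X ~ N(15.00, 0.02^2)
mu_y, sigma_y = 15.07, 0.022
mu_x, sigma_x = 15.00, 0.02

differences = [random.gauss(mu_y, sigma_y) - random.gauss(mu_x, sigma_x) for _ in range(n)]

sample_mean = sum(differences) / n
sample_var = sum((d - sample_mean) ** 2 for d in differences) / (n - 1)

predicted_mean = mu_y - mu_x                 # means subtract
predicted_var = sigma_y ** 2 + sigma_x ** 2  # variances still add (c = -1 gives c^2 = 1)

# If Y - X is Normal, about 95% of samples should lie within 1.96 standard deviations of the mean.
predicted_sd = math.sqrt(predicted_var)
inside = sum(1 for d in differences if abs(d - predicted_mean) < 1.96 * predicted_sd) / n

print(f"mean {sample_mean:.4f} (predicted {predicted_mean:.4f})")
print(f"variance {sample_var:.6f} (predicted {predicted_var:.6f})")
print(f"fraction within 1.96 sd: {inside:.3f}")
```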
Example: Manufacturing variability
(iii) Answer (continued): for X < Y we want P(Y − X > 0).
$$Y - X \sim N(\mu_Y - \mu_X,\; \sigma_Y^2 + \sigma_X^2) = N(15.07 - 15,\; 0.022^2 + 0.02^2) = N(0.07,\; 0.000884)$$
Hence
$$P(Y - X > 0) = P\!\left(Z > \frac{0 - 0.07}{\sqrt{0.000884}}\right) = P(Z > -2.35) = P(Z < 2.35) = Q(2.35) \approx 0.991$$
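The same calculation as a sketch, with `Q` defined via `math.erf` (an illustrative helper rather than a table lookup). Without rounding z to 2.35 the result is about 0.9907, consistent with ≈ 0.991.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu_x, sigma_x = 15.00, 0.02    # pipe outside diameter (mm)
mu_y, sigma_y = 15.07, 0.022   # fitting inside diameter (mm)

mu_d = mu_y - mu_x                   # mean of d = Y - X
var_d = sigma_y ** 2 + sigma_x ** 2  # variances add
sigma_d = math.sqrt(var_d)

z = (0.0 - mu_d) / sigma_d           # standardise the boundary d = 0
p_fit = 1.0 - Q(z)                   # P(Y - X > 0)
print(f"Var(Y - X) = {var_d:.6f}, z = {z:.2f}, P(fit) = {p_fit:.3f}")
```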
Which of the following would make a random pipe more likely to fit into a random fitting?
The outside diameter, X mm, of a copper pipe is N(15.00, 0.02²) and the fittings for joining the pipe have inside diameter Y mm, where Y ∼ N(15.07, 0.022²).
[Figure: pipe X inside fitting Y]
1. Decreasing the mean of Y
2. Increasing the variance of X
3. Decreasing the variance of X
4. Increasing the variance of Y
[Chart: class responses; bar values 55%, 16%, 16%, 14%]
Answer
Common sense.
Or use d = Y − X ∼ N(μ_Y − μ_X, σ_Y² + σ_X²):
$$P(X < Y) = P(d > 0) = P\!\left(Z > \frac{0 - \mu_d}{\sigma_d}\right) = P\!\left(Z < \frac{\mu_d}{\sigma_d}\right)$$
Larger probability if
- μ_d larger (bigger average gap between pipe and fitting)
- σ_d smaller (less fluctuation in gap size)
σ_d² = σ_X² + σ_Y², so σ_d is smaller if the variance of X is decreased (option 3).
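The conclusion can be checked by perturbing each parameter in turn. In this sketch the size of each change (for example σ_X from 0.02 to 0.01 mm) is hypothetical, chosen only for illustration; `p_fit` is an illustrative helper computing P(Z < μ_d/σ_d) as in the answer above.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_fit(mu_x: float, sigma_x: float, mu_y: float, sigma_y: float) -> float:
    """P(X < Y) = P(Z < mu_d / sigma_d) for d = Y - X with independent Normal X and Y."""
    mu_d = mu_y - mu_x
    sigma_d = math.sqrt(sigma_x ** 2 + sigma_y ** 2)
    return Q(mu_d / sigma_d)

baseline = p_fit(15.00, 0.02, 15.07, 0.022)
# Hypothetical changes, one per option; the sizes of the changes are illustrative only.
changed = {
    1: p_fit(15.00, 0.02, 15.06, 0.022),  # decrease the mean of Y
    2: p_fit(15.00, 0.03, 15.07, 0.022),  # increase the variance of X
    3: p_fit(15.00, 0.01, 15.07, 0.022),  # decrease the variance of X
    4: p_fit(15.00, 0.02, 15.07, 0.03),   # increase the variance of Y
}
improves = [k for k, p in changed.items() if p > baseline]
print(f"baseline {baseline:.4f}; options that help: {improves}")
```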
Normal approximations
Central Limit Theorem: If X_1, X_2, … are independent random variables with the same distribution, which has mean μ and variance σ² (both finite), then the sum $\sum_{i=1}^{n} X_i$ tends to the distribution N(nμ, nσ²) as n → ∞ (more precisely, the standardised sum $(\sum_{i=1}^{n} X_i - n\mu)/(\sigma\sqrt{n})$ tends to N(0, 1)).
Hence: the sample mean
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is distributed approximately as N(μ, σ²/n).
For the approximation to be good, n may need to be 30 or more for skewed distributions, but can be quite small for simple symmetric distributions.
The approximation tends to have much better fractional accuracy near the peak than in the tails: don’t rely on the approximation to estimate the probability of very rare events.
It often also works for the sum of non-independent random variables, i.e. the sum tends to a Normal distribution (but the variance is harder to calculate).
Example: average of n samples from a uniform distribution [figure]
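A small simulation of this example, under stated assumptions: each sample is Uniform(0, 1) (mean 1/2, variance 1/12), averages are taken over n = 30 samples, and the number of repeats and the seed are arbitrary. The CLT predicts the averages are approximately N(1/2, 1/(12n)), so their variance and the fraction within 1.96 standard deviations can be compared with that prediction.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible
n = 30           # samples per average (illustrative)
repeats = 20_000

# Uniform(0, 1) has mean 1/2 and variance 1/12.
mu, var = 0.5, 1.0 / 12.0
averages = [sum(random.random() for _ in range(n)) / n for _ in range(repeats)]

mean_of_averages = sum(averages) / repeats
var_of_averages = sum((a - mean_of_averages) ** 2 for a in averages) / (repeats - 1)

# CLT prediction: the average is approximately N(mu, var / n).
predicted_var = var / n
sd = predicted_var ** 0.5
inside = sum(1 for a in averages if abs(a - mu) < 1.96 * sd) / repeats

print(f"mean {mean_of_averages:.4f} (CLT: {mu})")
print(f"variance {var_of_averages:.6f} (CLT: {predicted_var:.6f})")
print(f"fraction within 1.96 sd: {inside:.3f} (Normal: 0.950)")
```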
Example:
The mean weight of people in England is μ = 72.4 kg, with standard deviation σ = 15 kg.
The London Eye at capacity holds 800 people at once.
What is the distribution of the weight of the passengers at any random time when the Eye is full?
Answer:
The total weight W of passengers is the sum of n = 800 individual weights.
Assuming these are independent, by the central limit theorem W ∼ N(nμ, nσ²).
With μ = 72.4 kg, σ = 15 kg, n = 800:
W ∼ N(800 × 72.4 kg, 800 × 15² kg²) = N(57920 kg, 180000 kg²) ≈ N(58000 kg, 180000 kg²)
i.e. Normal with μ_W ≈ 58000 kg, σ_W = √(180000 kg²) ≈ 424 kg
[usual caveat: people visiting the Eye unlikely to actually have independent weights, e.g. families, school trips, etc.]
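The arithmetic as a short sketch, under the same independence assumption (variable names are illustrative):

```python
import math

n = 800                  # passengers at capacity
mu, sigma = 72.4, 15.0   # kg: mean and standard deviation of one person's weight

mean_total = n * mu            # n * mu
var_total = n * sigma ** 2     # n * sigma^2 (independence assumed)
sd_total = math.sqrt(var_total)
print(f"W ~ N({mean_total:.0f} kg, {var_total:.0f} kg^2), sd = {sd_total:.0f} kg")
```

The unrounded mean is 57920 kg; the lecture rounds this to 58000 kg.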
Course Feedback
Which best describes your experience of the lectures so far?
(Poll stopped prematurely; not many answers.)
1. Too slow
2. Speed OK, but struggling to understand many things
3. Speed OK, I can understand most things
4. A bit fast, I can only just keep up
5. Too fast, I don’t have time to take notes though I still follow most of it
6. Too fast, I feel completely lost most of the time
7. I switch off and learn on my own from the notes and doing the questions
8. I can’t hear the lectures well enough (e.g. speech too fast to understand or other people talking)
[Chart: class responses; bar values 38%, 31%, 13%, 13%, 6%, 0%, 0%, 0%]
Course Feedback
What do you think of clickers?
1. I think they are a good thing, help me learn and make lectures more interesting
2. I enjoy the questions, but don’t think they help me learn
3. I think they are a waste of time
4. I think they are a good idea, but better questions would make them more useful
5. I think they are a good idea, but need longer to answer questions
[Chart: class responses; bar values 65%, 14%, 12%, 7%, 2%]
Course Feedback
How did you find the question sheets so far?
1. Challenging but I managed most of it OK
2. Mostly fairly easy
3. Had difficulty, but workshops helped me to understand
4. Had difficulty and workshops were very little help
5. I’ve not tried them
[Chart: class responses; bar values 47%, 20%, 14%, 16%, 4%]
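Normal approximation to the Binomial
If X ∼ B(n, p), n is large and p is not too near 0 or 1, then X is approximately N(np, np(1 − p)).
[Figure: Binomial distributions with p = 0.5 for n = 10 and n = 50]
Approximating a range of possible results from a Binomial distribution
e.g. P(6 or fewer heads tossing a coin 10 times) = P(X ≤ 6) if X ∼ B(10, 0.5):
P(X ≤ 6) = P(k = 0) + P(k = 1) + ⋯ + P(k = 6) = 0.8281
Here μ = np = 5 and σ² = np(1 − p) = 2.5. Each integer k corresponds to the interval from k − 0.5 to k + 0.5, so the Normal integral runs up to 6.5:
$$P(X \le 6) \approx \int_{-\infty}^{6.5} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = Q\!\left(\frac{6.5 - \mu}{\sigma}\right) = Q\!\left(\frac{1.5}{\sqrt{2.5}}\right) = Q(0.9487) = 0.8286$$
[not always so accurate at such low n!]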
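The exact Binomial sum and the Normal approximation are easy to compare in code. This sketch uses `math.comb` (Python 3.8+) for the exact value and a `Q` helper defined via `math.erf`; it also shows what happens if the Normal integral stops at 6 rather than 6.5, which illustrates why the half-integer cut-off is used.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 10, 0.5
mu = n * p                          # 5
sigma = math.sqrt(n * p * (1 - p))  # sqrt(2.5)

# Exact Binomial: P(X <= 6) = sum over k = 0..6 of nCk p^k (1-p)^(n-k)
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(0, 7))
with_correction = Q((6.5 - mu) / sigma)     # integrate the Normal up to 6.5
without_correction = Q((6.0 - mu) / sigma)  # stopping at 6 instead, for comparison

print(f"exact {exact:.4f}, Normal (to 6.5) {with_correction:.4f}, Normal (to 6) {without_correction:.4f}")
```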
If Y ∼ N(μ, σ²), what is the best approximation for P(3 or more heads when tossing a coin 10 times)?
i.e. if X ∼ B(10, 0.5), with μ = 5 and σ² = np(1 − p) = 2.5, what is the best approximation for P(X ≥ 3)?
1. P(Y > 2.5)
2. P(Y > 3)
3. P(Y > 3.5)
[Chart: class responses; bar values 60%, 27%, 13%]
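A numerical check of the three options, as a sketch: compute the exact P(X ≥ 3) for B(10, 0.5) and each candidate Normal approximation, then pick the closest. The helper and variable names are illustrative.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 10, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = sum(math.comb(n, k) for k in range(3, n + 1)) / 2 ** n  # P(X >= 3) for a fair coin
candidates = {1: 2.5, 2: 3.0, 3: 3.5}  # option -> cut-off c in P(Y > c)
approx = {option: 1.0 - Q((c - mu) / sigma) for option, c in candidates.items()}
best = min(approx, key=lambda option: abs(approx[option] - exact))

print(f"exact {exact:.4f}; closest approximation: option {best} ({approx[best]:.4f})")
```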
Quality control example:
The manufacturing of computer chips produces 10% defective chips. 200 chips are
randomly selected from a large production batch. What is the probability that fewer
than 15 are defective?
Answer:
mean 𝑛𝑝 = 200 × 0.1 = 20
variance np(1 − p) = 200 × 0.1 × 0.9 = 18.
So if X is the number of defective chips, approximately X ∼ N(20, 18).
Hence, since "fewer than 15" means X ≤ 14, use 14.5 as the cut-off:
$$P(X < 15) \approx P\!\left(Z < \frac{14.5 - 20}{\sqrt{18}}\right) = P(Z < -1.296) = 1 - P(Z < 1.296)$$
$$= 1 - [0.9015 + 0.6 \times (0.9032 - 0.9015)] \approx 0.097$$
This compares to the exact Binomial answer
$$\sum_{k=0}^{14} {}^{n}C_{k}\, p^{k} (1-p)^{n-k} \approx 0.093.$$
The Binomial answer is easy to calculate on a computer, but the Normal approximation is much easier if you have to do it by hand. The Normal approximation is roughly right, but not very accurate.
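Both numbers can be reproduced in a few lines. This sketch (using `math.comb` and a `Q` helper defined via `math.erf`, both illustrative choices) computes the exact Binomial sum and the Normal approximation with the 14.5 cut-off.

```python
import math

def Q(z: float) -> float:
    """Standard Normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 200, 0.1
mu = n * p               # 20
var = n * p * (1 - p)    # 18

# Exact Binomial: P(X <= 14) = sum over k = 0..14 of nCk p^k (1-p)^(n-k)
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(15))

# Normal approximation, using 14.5 as the cut-off for "fewer than 15"
approx = Q((14.5 - mu) / math.sqrt(var))

print(f"exact {exact:.3f}, Normal approximation {approx:.3f}")
```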
