Stat 281 Chapter 6
Chapter 6
Continuous Distributions
The Gaussian (Normal) Distribution
When Discrete Distributions Aren’t Enough
• Discrete distributions are used in situations involving
counts. (Other uses are possible, but counts are the vast majority.)
• What happens when you want to measure things?
– Height
– Weight
– Miles per Gallon
• These aren’t counts. (Why not?)
• Measurements involve rounding and precision.
• When any level of precision is theoretically possible, we
call this a “continuous” variable.
• The values come from the set of Real Numbers, i.e., the
number line.
Real Numbers
• Number line: -∞ … -3 -2 -1 0 1 2 3 … ∞
• The Real Numbers include all possible values
between the pictured integers.
• That includes rational numbers like ½, 1/3,
237/573, etc.
• It also includes irrational numbers like π and √2.
• Real numbers can be written with an infinite string of decimal
places.
• There are “uncountably many” real numbers
between any two specified real numbers.
Intervals
• An interval is a “piece” of the number line, or a
subset of the Real numbers.
• There are no “gaps.” For any two numbers in it,
all real numbers between them are included.
• Therefore an interval is described by its
endpoints—with a few special considerations.
• The endpoints may or may not be included.
Round brackets are used to exclude the
endpoints, square brackets to include them.
The smaller endpoint is written first. Ex.: [0,1], (9, 100), [3,6), (0,7].
• If an interval goes on to infinity, the ∞ or -∞
symbol is used with a round bracket, since
infinity is not a number. Ex.: [0,∞), (-∞,-10).
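To make the bracket rules concrete, here is a minimal Python sketch of an interval as a data structure. The class `Interval` and its field names are hypothetical, chosen only for illustration: a round bracket becomes a strict comparison, a square bracket a non-strict one, and an infinite end is only allowed with a round bracket.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A piece of the real number line, such as [0, 1) or (-inf, -10)."""

    lo: float
    hi: float
    lo_closed: bool  # True means a square bracket: the left endpoint is included
    hi_closed: bool  # True means a square bracket: the right endpoint is included

    def __post_init__(self) -> None:
        # Endpoints are written in order.
        if self.lo > self.hi:
            raise ValueError("write the smaller endpoint first")
        # Infinity is not a number, so an infinite end always takes a round bracket.
        if (math.isinf(self.lo) and self.lo_closed) or (math.isinf(self.hi) and self.hi_closed):
            raise ValueError("use a round bracket next to infinity")

    def __contains__(self, x: float) -> bool:
        above_lo = x >= self.lo if self.lo_closed else x > self.lo
        below_hi = x <= self.hi if self.hi_closed else x < self.hi
        return above_lo and below_hi

    def __str__(self) -> str:
        left = "[" if self.lo_closed else "("
        right = "]" if self.hi_closed else ")"
        return f"{left}{self.lo}, {self.hi}{right}"


half_open = Interval(3, 6, lo_closed=True, hi_closed=False)  # [3,6)
print(3 in half_open, 6 in half_open)                        # True False
print(Interval(-math.inf, -10, False, False))                 # (-inf, -10)
```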
Definition of Continuous R.V.
• A continuous random variable takes on values in
some Real Interval.
• Number line: -∞ … -3 -2 -1 0 1 2 3 … ∞
• Suppose a r.v. X takes values in [0,1].
How many different values are there?
• Suppose you assign some tiny probability to
each Real Number in [0,1]. What is the total
probability?
• Suppose you divide [0,1] up into 10 subintervals.
• Can you assign probabilities to these so the total
is 1?
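The 10-subinterval question can be pushed further with a short Python sketch (Python 3.9+ for the type hint). It assumes, purely for illustration, that every subinterval of [0,1] gets the same share of probability; the helper `equal_pieces` is hypothetical. The total stays exactly 1 however finely [0,1] is cut, while each piece's share shrinks toward 0.

```python
from fractions import Fraction


def equal_pieces(n: int) -> list[Fraction]:
    """Cut [0,1] into n equal subintervals and give each the same share of probability 1."""
    return [Fraction(1, n)] * n


for n in (10, 100, 1000):
    pieces = equal_pieces(n)
    # The total is exactly 1 every time, but each piece's share shrinks toward 0.
    print(f"n={n}: total={sum(pieces)}, each piece={float(pieces[0])}")
```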
Definition of Continuous R.V.
• This illustrates the problem with assigning
probabilities to individual numbers, and the
contrasting ease of assigning probability to
intervals.
• Summary:
– Any continuous random variable has infinitely many possible values.
– No single point has a positive probability.
– Said another way: every individual value of a
continuous random variable has probability zero, and
so it seems to be an impossible event.
– Intervals can be assigned positive probability.
The Paradox
• Obviously, a r.v., X, must take on some value, and if it
does, that value was not impossible, so it seems it should have P>0.
• We never actually “mean” a single value. Measurements
are given with a certain precision.
• Example: temperature is continuous, but when it is measured to
the nearest degree, “70” really means the interval
[69.5,70.5).
• Intervals can have positive probability, and we can make
them as small as we like.
• The fact that no single value of a continuous r.v. has positive
probability agrees nicely with the fact that it is impossible to
measure anything to its exact real-number value.
• Instead, we divide up our scale using equal-width
subintervals based on the precision of the measuring
device. These subintervals have positive probability.
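The link between a rounded reading and an interval can be written as a one-line rule. This Python sketch (Python 3.9+ for the type hint) assumes readings are rounded to the nearest multiple of `precision`, with halves rounded up so the interval is half-open as in the [69.5,70.5) example; `reading_to_interval` is a hypothetical helper.

```python
def reading_to_interval(reading: float, precision: float = 1.0) -> tuple[float, float]:
    """Return (lo, hi) for the half-open interval [lo, hi) a rounded reading stands for.

    Assumes the true value was rounded to the nearest multiple of `precision`,
    with halves rounded up, so it lies within half a unit of the reading.
    """
    half = precision / 2
    return (reading - half, reading + half)


print(reading_to_interval(70))  # (69.5, 70.5), the slide's "70"
print(reading_to_interval(71))  # starts exactly where the interval for 70 ends
```

Consecutive readings give intervals that meet end to end without overlapping, so together they cover the whole scale.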
Continuous Probabilities
• Probabilities for a continuous random variable,
X, are given by a probability function, P.
• P(X=k)=0 for any k.
• We might find positive probabilities for
expressions like
– P(X>k), where the interval is (k,∞),
– P(X<k), where the interval is (-∞,k), or
– P(a<X<b), where the interval is (a,b).
• A formula that gives probabilities for X would
need to give probabilities for intervals, rather
than single values.
Has anything prepared us for this?
• Tables of probability for discrete r.v.’s?
Not if only individual values were given.
• Ungrouped histograms? No, same.
• Grouped histograms? Let’s see….
• Each bar represents a frequency for an interval, even though
this is a discrete example.
What about relative histograms?
• Look at the histogram for the number of threes
showing in a two-dice toss.
• Notice it shows the probabilities for 3 discrete values.
• Replace the discrete values with intervals,
[0,1), [1,2), and [2,3).
• Then this histogram looks like it belongs to a continuous
distribution with values in [0,3).
[Figure: relative frequency histogram. Horizontal axis: “Three's Face Up” (0, 1, 2); vertical axis: “Relative Frequency” (0.0 to 0.7).]
Making the Leap
• Change the horizontal axis to show that the bars
belong to each interval.
• Each bar is 1 unit wide and its height represents the
probability for that interval.
• Each bar is a rectangle, whose area is 1 x height.
• Since the heights add up to 1, the total area of the
shaded region is 1.
• Make the transition to the continuous case: instead
of representing probability by height, use area.
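The histogram-to-area step can be checked directly for the two-dice example. This Python sketch counts threes over the 36 equally likely outcomes, treats each count k as the bar over [k, k+1), and adds up width times height with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Count threes face up for each of the 36 equally likely outcomes of two dice.
counts = {0: 0, 1: 0, 2: 0}
for a, b in product(range(1, 7), repeat=2):
    counts[(a == 3) + (b == 3)] += 1

# Bar k covers [k, k+1): width 1, height = P(k threes), so area = 1 * height.
bars = {k: (k, k + 1, Fraction(c, 36)) for k, c in counts.items()}
total_area = sum((hi - lo) * height for lo, hi, height in bars.values())

for k, (lo, hi, height) in bars.items():
    print(f"{k} threes -> bar over [{lo},{hi}), height {height}")
print("total area:", total_area)
```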
What did we leap over?
• This has been more of an analogy than an explanation.
Many details that require calculus are glossed over.
• The problem: can’t represent probabilities by height at a
point, because points all have probability zero.
• Solution: switch to areas, where the bottom boundary
(on the x axis) represents an interval for which we want
to determine probability. The area of the graph above
that interval represents its probability.
• In calculus, these areas are called “definite integrals.”
You don’t really need to know that, but you may come
across the symbol $\int_a^b$, which means “the integral
from a to b.”
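Definite integrals are usually found with calculus, but the idea of area as a sum of thin rectangles can be sketched numerically. The Python below uses the midpoint rule, a standard numerical approximation swapped in here for illustration; it is not a method the course requires, and `area` is a hypothetical helper name.

```python
from typing import Callable


def area(f: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f from a to b with the midpoint rule.

    [a, b] is cut into n thin rectangles of equal width; each rectangle's height
    is f at its midpoint, and the approximate area is the sum of width * height.
    """
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


# The line y = x over [0, 2] is a triangle with area (1/2)(2)(2) = 2.
print(area(lambda x: x, 0, 2))
# A flat graph of height 1/3 over [0, 3] encloses area 3 * 1/3 = 1.
print(area(lambda x: 1 / 3, 0, 3))
```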
Uniform Distribution
• A uniform distribution is defined for an interval outside of
which there is no positive probability. (This is to prevent
the area from being infinite.)
• Inside that interval, every sub-interval of a given width has the
same probability (“uniform” means “always the same”).
• A uniform distribution on the interval [0,3] is shown here.
Note that the height is 1/3, because 3 x 1/3 = 1.
• However, we should not say that 1/3 is the probability of anything
in particular: it is a height, not an area.
Uniform Examples
• Let X be a uniform r.v. on the interval [1,5].
• Find P(X>3), P(X<5), P(2<X<3), and P(0<X<3).
• Solution: The width of the distribution is 4, so
the height of the graph is ¼ between 1 and 5.
The area for any interval will be ¼ x the width of
the interval.
– P(X>3)=(5-3)/4=1/2
– P(X<5)=(5-1)/4=1
– P(2<X<3)=(3-2)/4=1/4
– Careful! P(0<X<3)=P(1<X<3)=(3-1)/4=1/2, because there is no probability below 1.
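The rule “area equals height times width, but only over the part of the interval where X lives” can be sketched in Python. `uniform_prob` is a hypothetical helper; it clips the requested interval to [a, b] before measuring, which is the “Careful!” step in the last example.

```python
def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo < X < hi) for X uniform on [a, b].

    Outside [a, b] there is no probability, so the requested interval is
    clipped to [a, b] and only the overlapping width contributes area.
    """
    left, right = max(lo, a), min(hi, b)
    if right <= left:
        return 0.0
    return (right - left) / (b - a)  # width times height 1/(b - a)


inf = float("inf")
print(uniform_prob(3, inf, 1, 5))   # P(X > 3) = 0.5
print(uniform_prob(-inf, 5, 1, 5))  # P(X < 5) = 1.0
print(uniform_prob(2, 3, 1, 5))     # P(2 < X < 3) = 0.25
print(uniform_prob(0, 3, 1, 5))     # P(0 < X < 3), clipped to P(1 < X < 3) = 0.5
```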
More Uniform Examples
• Let X be a uniform r.v. on the interval [0,8].
– Find P(X>3), P(X<5), and P(2<X<3 or 7<X<8).
– Find the median and the 90th percentile.
• Solution: The width of the distribution is 8, so
the height of the graph is 1/8.
– P(X>3)=(8-3)/8=5/8.
– P(X<5)=(5-0)/8=5/8.
– P(2<X<3 or 7<X<8)=P(2<X<3)+P(7<X<8)=1/8+1/8=1/4.
– The median must have half the probability above it
and half below. Therefore the median is 4.
– P90 is a number such that 90% of the probability is
below it, so (P90-0)/8=.9, giving P90=7.2.
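Percentiles of a uniform distribution come from solving (x - a)/(b - a) = p for x. A minimal Python sketch, with the hypothetical helper name `uniform_percentile`:

```python
def uniform_percentile(p: float, a: float, b: float) -> float:
    """Value with proportion p of the probability below it, for X uniform on [a, b].

    Solves (x - a) / (b - a) = p for x.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return a + p * (b - a)


print(uniform_percentile(0.5, 0, 8))  # median
print(uniform_percentile(0.9, 0, 8))  # 90th percentile, P90
```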
Probability Density Function
• We have been dealing with the uniform distribution in
terms of graphs. Before moving on, we need to put
these ideas into the form of mathematical notation.
• We were focusing on the areas of portions of a graph
like the one below. But how do we define the region we
want the area for?
– The bottom boundary is the x axis
– The sides are vertical lines going through the x values we want
– The top of the region is a special “curve” (straight lines are
curves too).
• This curve is defined by a function, called the probability
density function, or pdf. For our graph, it is:
$$f(x) = \begin{cases} 1/4 & \text{if } 1 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
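A pdf is an ordinary function, so it can be written directly in code. This Python sketch encodes the f(x) above for the uniform distribution on [1,5]:

```python
def f(x: float) -> float:
    """pdf of the uniform distribution on [1, 5]: height 1/4 inside, 0 outside."""
    return 0.25 if 1 <= x <= 5 else 0.0


print([f(x) for x in (0, 1, 3, 5, 6)])  # [0.0, 0.25, 0.25, 0.25, 0.0]
```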
Normal Probability Distributions
• The normal probability distribution
(Gaussian Distribution) is the most
important distribution in all of statistics.
• Many continuous random variables have
normal or approximately normal
distributions.
• A normal distribution is defined by its pdf.
The Normal pdf
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
• The parameters are μ and σ.
• The mean of the distribution is μ.
• The standard deviation is σ.
• The median and mode are also μ.
• There is a normal distribution for every combination of values of μ and σ, with σ > 0.
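The normal pdf can be evaluated directly with Python's `math` module. `normal_pdf` is a hypothetical helper written from the formula above; Python 3.8+ also provides `statistics.NormalDist(mu, sigma).pdf(x)`, which should agree.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


print(normal_pdf(0, 0, 1))                        # peak height when mu=0, sigma=1
print(normal_pdf(-1, 0, 1), normal_pdf(1, 0, 1))  # equal: symmetric about mu
print(normal_pdf(0, 0, 2))                        # larger sigma, lower peak
```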
Basic Shape
• Here we see the basic shape of a normal distribution.
• The blue band is an example of an “area under the curve” that
we might want to calculate.
• This particular distribution has μ=110 and σ=10.
• The “x” axis represents values of the r.v. X.
Effect of Changing μ
Changing μ just causes a horizontal shift,
centering the graph in a different place.
Effect of Changing σ
• Changing σ causes the graph to stretch out or squeeze
together around the mean.
What does this mean?
• The normal pdf is a complicated formula. It is not easy
to calculate probabilities from it, even if you know
calculus. So, we use tables (or computers).
• We can’t have a table for every possible normal
distribution.
• We have one table for the “standard” normal distribution, which has
μ=0 and σ=1. This r.v. is called Z.
• It is easy to convert probability statements from
other normal distributions to Z.
Table 3, Appendix B entries:
The table contains the area under the standard normal
curve between 0 and a specific value of z.
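“Computers” can stand in for Table 3. This sketch assumes Python 3.8+, whose standard library `statistics.NormalDist` has a `cdf` method giving the area to the left of a value. Since Table 3 lists the area between 0 and z, the 0.5000 that lies left of 0 is subtracted; `table3_area` is a hypothetical helper.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal distribution


def table3_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z (z >= 0), like a Table 3 entry."""
    return Z.cdf(z) - 0.5


print(round(table3_area(1.45), 4))  # 0.4265
```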
Example: Find the area under the standard normal
curve between z = 0 and z = 1.45.
A portion of Table 3 (row 1.4, column 0.05 gives z = 1.45):
z      0.00   0.01   0.02   0.03   0.04   0.05     0.06
1.4    …      …      …      …      …      0.4265   …
$P(0 < Z < 1.45) = 0.4265$
Example: Find the area under the normal curve to the
right of Z = 1.45; P(Z > 1.45).
The right half of the curve has area 0.5000, and the table gives 0.4265 between 0 and 1.45, so
$P(Z > 1.45) = 0.5000 - 0.4265 = 0.0735$
Example: Find the area to the left of Z = 1.45; P(Z < 1.45).
The left half of the curve has area 0.5000; add the table area 0.4265 between 0 and 1.45:
$P(Z < 1.45) = 0.5000 + 0.4265 = 0.9265$
Example:
Find the area between Z = -1.26 and the mean (Z = 0).
By symmetry, this area equals the table area between 0 and 1.26:
$P(-1.26 < Z < 0) = 0.3962$
Example: Find the area to the left of -.98; P(Z < -.98).
The table gives 0.3365 between 0 and .98. By symmetry, the area to the left of -.98 is the same as the area to the right of .98:
$P(Z < -.98) = 0.5000 - 0.3365 = 0.1635$
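The worked table examples can be cross-checked with `statistics.NormalDist` (Python 3.8+). Here every area is computed as a difference of left-tail areas from `cdf`, a different route from the table lookups, so agreement is a genuine check.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu=0, sigma=1

# Each line mirrors one worked example, rounded to 4 places like the table.
print(round(1 - Z.cdf(1.45), 4))          # P(Z > 1.45): 0.0735
print(round(Z.cdf(1.45), 4))              # P(Z < 1.45): 0.9265
print(round(Z.cdf(0) - Z.cdf(-1.26), 4))  # P(-1.26 < Z < 0): 0.3962
print(round(Z.cdf(-0.98), 4))             # P(Z < -0.98): 0.1635
```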
Applications of Normal Distributions
• Apply the techniques learned for the Z distribution to
all normal distributions.
• Start with a probability question in terms of x-values.
• Convert, or transform, the question into an equivalent
probability statement involving z-values.
Standardization
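Suppose X is a normal r.v. with mean μ and standard deviation σ. The r.v.
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal distribution. Each value x of X corresponds to $z = (x - \mu)/\sigma$; in particular, x = μ corresponds to z = 0.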
Example: A bottling machine is adjusted to fill bottles with a
mean of 32.0 oz of soda and standard deviation of 0.02 oz.
Assume the amount of fill is normally distributed and a bottle
is selected at random.
1. Find the probability the bottle contains between 32 oz and
32.025 oz.
2. Find the probability the bottle contains more than 31.97 oz.
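When x = 32: $z = \frac{32 - 32}{0.02} = 0$
When x = 32.025: $z = \frac{32.025 - 32}{0.02} = 1.25$
$$P(32 < X < 32.025) = P\left(\frac{32 - 32}{0.02} < \frac{X - 32}{0.02} < \frac{32.025 - 32}{0.02}\right) = P(0 < Z < 1.25) = 0.3944$$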
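The bottling example in Python, assuming Python 3.8+ for `statistics.NormalDist`. `z_score` is a hypothetical helper mirroring the standardization formula; the last line also evaluates question 2, which the worked solution leaves unanswered (31.97 oz standardizes to z = -1.5).

```python
from statistics import NormalDist

fill = NormalDist(mu=32.0, sigma=0.02)  # ounces of soda per bottle


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


print(round(z_score(32.025, 32.0, 0.02), 4))        # 1.25
print(round(fill.cdf(32.025) - fill.cdf(32.0), 4))  # P(32 < X < 32.025): 0.3944
print(round(1 - fill.cdf(31.97), 4))                # P(X > 31.97): 0.9332
```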
Other Normal Applications
Find a cutoff point: a value of X such that there is a
certain probability in a specified interval defined by x.
Example:
The waiting time X at a certain bank is approximately
normally distributed with a mean of 3.7 minutes and a
standard deviation of 1.4 minutes. The bank would
like to claim that 95% of all customers are waited on
by a teller within c minutes. Find the value of c that
makes this statement true.
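Solution:
The probability 0.95 below c splits into 0.5000 below the mean (3.7, where z = 0) and 0.4500 between the mean and c, leaving 0.0500 above c.
$$P(X < c) = 0.95$$
$$P\left(\frac{X - 3.7}{1.4} < \frac{c - 3.7}{1.4}\right) = 0.95$$
$$P\left(Z < \frac{c - 3.7}{1.4}\right) = 0.95$$
From Table 3, an area of 0.4500 between 0 and z corresponds to z = 1.645, so
$$\frac{c - 3.7}{1.4} = 1.645$$
$$c = (1.645)(1.4) + 3.7 = 6.003$$
c ≈ 6 minutes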
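Finding a cutoff is the table lookup run backwards, which `NormalDist.inv_cdf` (Python 3.8+) does directly: it returns the value with a given area to its left. This sketch computes c both ways; the tiny difference from 6.003 comes only from rounding z to 1.645.

```python
from statistics import NormalDist

wait = NormalDist(mu=3.7, sigma=1.4)  # waiting time in minutes

# inv_cdf(p) returns the value with area p to its left, replacing the table lookup.
c = wait.inv_cdf(0.95)
print(round(c, 3))  # 6.003

# The z route from the solution: c = mu + z * sigma, with z the 0.95 point of Z.
z95 = NormalDist().inv_cdf(0.95)
print(round(z95, 3), round(3.7 + z95 * 1.4, 3))  # 1.645 6.003
```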
Notation:
If X is a normal random variable with mean μ and
standard deviation σ, this is often denoted: X ~ N(μ, σ²).
Example: Suppose X is a normal random variable with
μ = 35 and σ = 6. A convenient notation to identify this
random variable is: X ~ N(35, 36).
z(α) and $z_\alpha$ are commonly used notations for the z-score (point on the z axis) such that there is α of the
area (probability) to the right of z(α) or $z_\alpha$.
Illustrations:
• z(0.10) represents the value of Z such that the area to the right under the standard normal curve is 0.10.
• z(0.80) represents the value of Z such that the area to the right under the standard normal curve is 0.80. Since more than half of the area lies to its right, z(0.80) < 0.
Example: Find the numerical value of z(0.10).
The notation says the area to the right of z(0.10) is 0.10. Table 3 gives areas between 0 and z, so the area to look up is 0.5000 - 0.1000 = 0.4000.
Use Table 3: look for an area as close as possible to
0.4000.
z(0.10) = 1.28
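Because z(α) has area α to its right, it is the value with area 1 - α to its left. A minimal Python sketch (Python 3.8+ for `statistics.NormalDist`); `z_upper` is a hypothetical helper name:

```python
from statistics import NormalDist


def z_upper(alpha: float) -> float:
    """z(alpha): the z-score with area alpha to its RIGHT under the standard normal curve."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    # Area alpha to the right means area 1 - alpha to the left.
    return NormalDist().inv_cdf(1 - alpha)


print(round(z_upper(0.10), 2))  # 1.28
print(round(z_upper(0.80), 2))  # -0.84: most of the area is to its right
```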
Note:
The values of Z that will be used regularly come from
one of the following situations:
1. The z-score such that there is a specified area in one
tail of the normal distribution.
2. The z-scores that bound a specified middle
proportion of the normal distribution.
Example: Find the z-scores that bound the middle 0.99
of the normal distribution.
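The middle 0.99 leaves 0.005 in each tail, so the area between 0 and each bound is 0.495. The upper bound is z(0.005); by symmetry, the lower bound is z(0.995), or equivalently -z(0.005).
Use Table 3 (look for an area of 0.495):
z(0.005) = 2.575 and z(0.995) = -z(0.005) = -2.575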
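The middle-proportion recipe can be sketched in Python (3.9+ for the type hint); `middle_bounds` is a hypothetical helper. It prints 2.576 rather than 2.575: the exact value is about 2.5758, while 2.575 is the usual figure obtained by interpolating halfway between the Table 3 entries for 2.57 and 2.58.

```python
from statistics import NormalDist


def middle_bounds(proportion: float) -> tuple[float, float]:
    """z-scores bounding the middle `proportion` of the standard normal curve."""
    tail = (1 - proportion) / 2             # area left over in EACH tail
    upper = NormalDist().inv_cdf(1 - tail)  # z(tail): area `tail` to its right
    return -upper, upper                    # by symmetry, the lower bound is -z(tail)


lo, hi = middle_bounds(0.99)
print(round(lo, 3), round(hi, 3))  # -2.576 2.576
```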