Transcript Topic 05
Topic 5 - Joint distributions and the CLT
• Joint distributions - pages 145 - 156
• Central Limit Theorem - pages 183 - 185
Joint distributions
• Often, we are interested in more than
one random variable at a time.
• For example, what is the probability that a
car will have at least one engine problem
and at least one blowout during the same
week?
• X = # of engine problems in a week
• Y = # of blowouts in a week
• P(X ≥ 1, Y ≥ 1) is what we are looking for
• To understand these sorts of probabilities,
we need to develop joint distributions.
Discrete distributions
• A discrete joint probability mass function
is given by
f(x,y) = P(X = x, Y = y)
where
1. $f(x,y) \ge 0$ for all $x, y$
2. $\sum_{\text{all }(x,y)} f(x,y) = 1$
3. $P((X,Y) \in A) = \sum_{(x,y) \in A} f(x,y)$
4. $E(h(X,Y)) = \sum_{\text{all }(x,y)} h(x,y)\, f(x,y)$
Return to the car example
• Consider the following joint pmf for X and Y
X\Y   0      1      2      3      4
0     1/2    1/16   1/32   1/32   1/32
1     1/16   1/32   1/32   1/32   1/32
2     1/32   1/32   1/32   1/32   1/32
• P(X ≥ 1, Y ≥ 1) =
• P(X ≥ 1) =
• E(X + Y) =
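To cross-check the three blanks above, the pmf rules can be applied mechanically. The sketch below is a minimal Python illustration (not part of the course materials) that stores the table as exact fractions; the names `pmf`, `prob`, and `expect` are hypothetical helpers chosen for this example.

```python
from fractions import Fraction as F

# Joint pmf of the car example: row index x = 0, 1, 2 (engine problems),
# column index y = 0, ..., 4 (blowouts).
table = [
    [F(1, 2), F(1, 16), F(1, 32), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 32), F(1, 32), F(1, 32), F(1, 32)],
    [F(1, 32), F(1, 32), F(1, 32), F(1, 32), F(1, 32)],
]
# Key the probabilities by (x, y) so the sums mirror rules 2-4.
pmf = {(x, y): p for x, row in enumerate(table) for y, p in enumerate(row)}


def prob(event):
    """P((X, Y) in A), with the set A given as a predicate on (x, y)."""
    return sum(p for (x, y), p in pmf.items() if event(x, y))


def expect(h):
    """E(h(X, Y)): sum of h(x, y) * f(x, y) over all (x, y)."""
    return sum(h(x, y) * p for (x, y), p in pmf.items())


total = sum(pmf.values())                      # rule 2: must equal 1
p_both = prob(lambda x, y: x >= 1 and y >= 1)  # P(X >= 1, Y >= 1)
p_x = prob(lambda x, y: x >= 1)                # P(X >= 1)
e_sum = expect(lambda x, y: x + y)             # E(X + Y)
print(total, p_both, p_x, e_sum)               # prints: 1 1/4 11/32 47/32
```

Exact fractions are used so the results match hand calculations digit for digit instead of carrying floating-point noise.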
Joint to marginals
• The probability mass functions for X and Y
individually (called marginals) are given by
$f_X(x) = \sum_{\text{all } y} f(x,y), \qquad f_Y(y) = \sum_{\text{all } x} f(x,y)$
• Returning to the car example:
fX(x) =
fY(y) =
E(X) =
E(Y) =
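The marginals are just the row and column sums of the joint table. A minimal sketch with exact fractions (the names `f_X`, `f_Y`, and `e_sum` are hypothetical) that also computes E(X + Y) directly from the joint pmf so it can be compared with E(X) + E(Y):

```python
from fractions import Fraction as F

# Same joint pmf: rows are x = 0, 1, 2; columns are y = 0, ..., 4.
table = [
    [F(1, 2), F(1, 16), F(1, 32), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 32), F(1, 32), F(1, 32), F(1, 32)],
    [F(1, 32), F(1, 32), F(1, 32), F(1, 32), F(1, 32)],
]

f_X = [sum(row) for row in table]         # f_X(x): add across each row (all y)
f_Y = [sum(col) for col in zip(*table)]   # f_Y(y): add down each column (all x)

E_X = sum(x * p for x, p in enumerate(f_X))
E_Y = sum(y * p for y, p in enumerate(f_Y))

# E(X + Y) straight from the joint pmf, to compare with E(X) + E(Y).
e_sum = sum((x + y) * p for x, row in enumerate(table) for y, p in enumerate(row))
print([str(p) for p in f_X], [str(p) for p in f_Y], E_X, E_Y, e_sum)
```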
Continuous distributions
• A joint probability density function for two
continuous random variables, (X,Y), has the
following four properties:
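1. $f(x,y) \ge 0$ for all $x, y$
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$
3. $P((X,Y) \in A) = \iint_A f(x,y)\,dx\,dy$
4. $E(h(X,Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x,y)\, f(x,y)\,dx\,dy$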
Continuous example
• Consider the following joint pdf:
$f(x,y) = \dfrac{x(1 + 3y^2)}{4}, \qquad 0 < x < 2,\ 0 < y < 1$
• Show condition 2 holds on your own.
• Show P(0 < X < 1, ¼ < Y < ½) = 23/512
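Both integrals can also be checked numerically. The sketch below swaps in a simple two-dimensional midpoint rule for exact integration; the helper `double_integral` and the grid size n = 400 are illustrative choices, so the results are approximations that should land very close to 1 and 23/512 ≈ 0.0449.

```python
def f(x, y):
    """Joint pdf x(1 + 3y^2)/4 on 0 < x < 2, 0 < y < 1 (zero elsewhere)."""
    if 0 < x < 2 and 0 < y < 1:
        return x * (1 + 3 * y**2) / 4
    return 0.0


def double_integral(g, x0, x1, y0, y1, n=400):
    """Approximate the integral of g over [x0, x1] x [y0, y1] with an n-by-n midpoint rule."""
    hx = (x1 - x0) / n
    hy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx          # midpoint of the i-th x cell
        for j in range(n):
            y = y0 + (j + 0.5) * hy      # midpoint of the j-th y cell
            total += g(x, y)
    return total * hx * hy


total_mass = double_integral(f, 0, 2, 0, 1)    # condition 2: should be close to 1
p_rect = double_integral(f, 0, 1, 0.25, 0.5)   # P(0 < X < 1, 1/4 < Y < 1/2)
print(total_mass, p_rect, 23 / 512)
```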
Joint to marginals
• The marginal pdfs for X and Y can be found
by
$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$
• For the previous example, find fX(x) and fY(y).
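Integrating out one variable by hand gives f_X(x) = x/2 for 0 < x < 2 and f_Y(y) = (1 + 3y²)/2 for 0 < y < 1. The sketch below checks those closed forms numerically at a few points, using a one-dimensional midpoint rule (`midpoint` is a hypothetical helper chosen for illustration) in place of exact integration.

```python
def f(x, y):
    """Joint pdf x(1 + 3y^2)/4 on 0 < x < 2, 0 < y < 1 (zero elsewhere)."""
    if 0 < x < 2 and 0 < y < 1:
        return x * (1 + 3 * y**2) / 4
    return 0.0


def midpoint(g, a, b, n=2000):
    """Composite midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def f_X(x):
    # Integrate y out over its support (0, 1); outside it the pdf is zero.
    return midpoint(lambda y: f(x, y), 0, 1)


def f_Y(y):
    # Integrate x out over its support (0, 2).
    return midpoint(lambda x: f(x, y), 0, 2)


for x in (0.5, 1.0, 1.5):
    print(f"x = {x}: numeric {f_X(x):.6f}, closed form x/2 = {x / 2:.6f}")
for y in (0.2, 0.5, 0.8):
    print(f"y = {y}: numeric {f_Y(y):.6f}, closed form (1+3y^2)/2 = {(1 + 3 * y**2) / 2:.6f}")
```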
Independence of X and Y
• The random variables X and Y are
independent if f(x,y) = fX(x) fY(y) for all pairs
(x,y).
• For the discrete clunker car example, are X
and Y independent?
• For the continuous example, are X and Y
independent?
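A sketch of both independence checks. For the discrete table it compares every cell with the product of its marginals using exact fractions; a single mismatch is enough to rule out independence. For the continuous example it only spot-checks f(x, y) = f_X(x) f_Y(y) on a grid, using the marginals x/2 and (1 + 3y²)/2, so it is a sanity check rather than a proof; the identity x(1 + 3y²)/4 = (x/2)·((1 + 3y²)/2) on the rectangle 0 < x < 2, 0 < y < 1 is what settles it.

```python
from fractions import Fraction as F

# Discrete car example: rows are x = 0, 1, 2; columns are y = 0, ..., 4.
table = [
    [F(1, 2), F(1, 16), F(1, 32), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 32), F(1, 32), F(1, 32), F(1, 32)],
    [F(1, 32), F(1, 32), F(1, 32), F(1, 32), F(1, 32)],
]
f_X = [sum(row) for row in table]
f_Y = [sum(col) for col in zip(*table)]

# Independence needs f(x, y) = f_X(x) f_Y(y) in EVERY cell.
violations = [
    (x, y)
    for x, row in enumerate(table)
    for y, p in enumerate(row)
    if p != f_X[x] * f_Y[y]
]
discrete_independent = not violations


# Continuous example: compare the pdf with the product of its marginals
# at grid points strictly inside the support.
def f(x, y):
    return x * (1 + 3 * y**2) / 4


grid = [(0.1 * i, 0.1 * j) for i in range(1, 20) for j in range(1, 10)]
continuous_independent = all(
    abs(f(x, y) - (x / 2) * ((1 + 3 * y**2) / 2)) < 1e-12 for x, y in grid
)
print(discrete_independent, violations[:1], continuous_independent)
```

For instance, cell (0, 0) has f(0, 0) = 1/2, while f_X(0) f_Y(0) = (21/32)(19/32) = 399/1024.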
Sampling distributions
• We assume that each data value we collect
represents a random selection from a common
population distribution.
• The collection of these independent random
variables is called a random sample from the
distribution.
• A statistic is a function of these random
variables that is used to estimate some
characteristic of the population distribution.
• The distribution of a statistic is called a
sampling distribution.
• The sampling distribution is a key component
in making inferences about the population.
StatCrunch example
• StatCrunch subscriptions are sold for 6 months
($5) or 12 months ($8).
• From past data, I can tell you that roughly 80%
of subscriptions are $5 and 20% are $8.
• Let X represent the amount in $ of a purchase.
• E(X) =
• Var(X) =
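E(X) and Var(X) follow from the definition E(X) = Σ x f(x) and the shortcut Var(X) = E(X²) - [E(X)]². A minimal Python sketch with exact fractions, treating the 80%/20% split from the slide as exact:

```python
from fractions import Fraction as F

# Distribution of a single purchase amount X (in $), from the slide:
# 80% are $5 (6 months) and 20% are $8 (12 months).
dist = {5: F(4, 5), 8: F(1, 5)}

E_X = sum(x * p for x, p in dist.items())
E_X2 = sum(x**2 * p for x, p in dist.items())
Var_X = E_X2 - E_X**2             # shortcut formula Var(X) = E(X^2) - [E(X)]^2
print(float(E_X), float(Var_X))   # prints: 5.6 1.44
```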
StatCrunch example continued
• Now consider the amounts of a random
sample of two purchases, X1, X2.
• A natural statistic of interest is X1 + X2, the
total amount of the purchases.
Outcome (X1, X2) | X1 + X2 | Probability
X1 + X2 | Probability
StatCrunch example continued
• E(X1 + X2) =
• E([X1 + X2]²) =
• Var(X1 + X2) =
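Because the two purchases are independent, each outcome (x1, x2) has probability f(x1)·f(x2), and the distribution of X1 + X2 comes from grouping outcomes with the same total. This sketch uses `itertools.product` to enumerate the outcomes (`sum_dist` is a hypothetical name) and then computes the three quantities above with the same shortcut formula for the variance.

```python
from fractions import Fraction as F
from itertools import product

# Single-purchase distribution from the slide: $5 w.p. 0.8, $8 w.p. 0.2.
dist = {5: F(4, 5), 8: F(1, 5)}

# Enumerate (x1, x2); independence makes each outcome's probability a product.
sum_dist = {}
for x1, x2 in product(dist, repeat=2):
    s = x1 + x2
    sum_dist[s] = sum_dist.get(s, 0) + dist[x1] * dist[x2]

E_S = sum(s * p for s, p in sum_dist.items())        # E(X1 + X2)
E_S2 = sum(s**2 * p for s, p in sum_dist.items())    # E([X1 + X2]^2)
Var_S = E_S2 - E_S**2                                # Var(X1 + X2)
print({s: float(p) for s, p in sum_dist.items()})
print(float(E_S), float(Var_S))                      # prints: 11.2 2.88
```

Note that Var(X1 + X2) = 2.88 is exactly twice Var(X) = 1.44, as expected for the sum of two independent purchases with the same distribution.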
StatCrunch example continued
• If I have n purchases in a day, what is
– my expected earnings?
– the variance of my earnings?
– the shape of my earnings distribution for large n?
• Let’s experiment by simulating 1000 days
with 100 purchases per day.
• StatCrunch
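The slide runs this experiment in StatCrunch. As a stand-in, here is a Python sketch using the standard `random` module; the seed is arbitrary and the 80%/20% purchase split is treated as exact. Theory predicts daily totals with mean n·E(X) = 100 × 5.6 = 560 and variance n·Var(X) = 100 × 1.44 = 144; the crude text histogram is only a quick way to eyeball the bell shape.

```python
import random
import statistics

DAYS, PURCHASES = 1000, 100      # 1000 simulated days, 100 purchases per day
rng = random.Random(2024)        # arbitrary fixed seed so reruns match


def one_purchase():
    # $5 with probability 0.8, $8 with probability 0.2.
    return 5 if rng.random() < 0.8 else 8


daily_totals = [sum(one_purchase() for _ in range(PURCHASES)) for _ in range(DAYS)]

sim_mean = statistics.mean(daily_totals)
sim_var = statistics.variance(daily_totals)
print(sim_mean, sim_var)         # compare with 560 and 144

# Totals are always 500 + 3k (k = number of $8 purchases), so bins of
# width 12 each hold exactly four possible totals.
counts = {}
for t in daily_totals:
    left = 500 + 12 * ((t - 500) // 12)
    counts[left] = counts.get(left, 0) + 1
for left in sorted(counts):
    print(f"{left}-{left + 11}: {'#' * (counts[left] // 5)}")
```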
Central Limit Theorem
• We have just illustrated one of the most
important theorems in statistics.
• As the sample size, n, becomes large, the
distribution of the sum of a random sample
from a distribution with mean $\mu$ and
variance $\sigma^2$ is approximately Normal
with mean $n\mu$ and variance $n\sigma^2$.
• A common rule of thumb is that n ≥ 30 is
large enough to use the CLT; strongly skewed
distributions may need a larger n.
• The amazing part of this theorem is that it is
true regardless of the form of the underlying
distribution, as long as its variance is finite.
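In symbols, with $S_n$ denoting the sum of the random sample, the statement is usually made precise through the standardized sum; the last line is the approximation used in practice, where $\Phi$ is the standard normal CDF.

```latex
\begin{aligned}
S_n &= X_1 + X_2 + \cdots + X_n, \qquad E(S_n) = n\mu, \qquad \operatorname{Var}(S_n) = n\sigma^2,\\
\frac{S_n - n\mu}{\sigma\sqrt{n}} &\xrightarrow{\ d\ } N(0,1) \quad \text{as } n \to \infty,\\
P(S_n \le s) &\approx \Phi\!\left(\frac{s - n\mu}{\sigma\sqrt{n}}\right) \quad \text{for large } n.
\end{aligned}
```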
Airplane example
• Suppose the weight of an airline passenger
has a mean of 150 lbs. and a standard
deviation of 25 lbs. What is the probability
the combined weight of 100 passengers will
exceed the maximum allowable weight of
15,500 lbs?
• How many passengers should be allowed on
the plane if we want this probability to be at
most 0.01?
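Under the CLT, the total weight of n passengers is approximately Normal with mean 150n and standard deviation 25√n, so for n = 100 the question becomes P(Z > (15,500 - 15,000)/250) = P(Z > 2). The sketch below uses Python's `statistics.NormalDist` (Python 3.8+) for the normal CDF and assumes passenger weights are independent; the search for the largest allowed n simply steps n upward until the probability exceeds 0.01.

```python
from statistics import NormalDist

MU, SIGMA = 150, 25    # per-passenger mean and SD (lbs), from the slide
LIMIT = 15_500         # maximum allowable total weight (lbs)


def p_exceed(n):
    """CLT approximation to P(total weight of n passengers > LIMIT)."""
    total = NormalDist(mu=n * MU, sigma=SIGMA * n ** 0.5)
    return 1 - total.cdf(LIMIT)


p100 = p_exceed(100)   # mean 15,000 and SD 250, so this is P(Z > 2)

# Largest n whose exceedance probability is still at most 0.01
# (p_exceed increases with n, so stepping upward finds it).
n_max = 0
while p_exceed(n_max + 1) <= 0.01:
    n_max += 1
print(round(p100, 4), n_max)   # prints: 0.0228 99
```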
The sample mean
• For constant c, $E(cY) = cE(Y)$ and $\operatorname{Var}(cY) = c^2\operatorname{Var}(Y)$
• $E(\bar{X}) =$
• $\operatorname{Var}(\bar{X}) =$
• The CLT says that for large samples, $\bar{X}$ is
approximately normal with a mean of $\mu$ and a
variance of $\sigma^2/n$.
• So, the variance of the sample mean decreases
with n.
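A quick simulation illustrates both results: the sample means center on μ = 5.6 and their variance tracks σ²/n = 1.44/n. This sketch reuses the StatCrunch purchase model as the population; the seed, the number of replications, and the sample sizes n = 1, 4, 16, 64 are arbitrary illustrative choices.

```python
import random
import statistics

rng = random.Random(7)   # arbitrary fixed seed
REPS = 10_000            # simulated samples for each sample size n


def purchase():
    # StatCrunch purchase model from earlier: $5 w.p. 0.8, $8 w.p. 0.2.
    return 5 if rng.random() < 0.8 else 8


results = {}
for n in (1, 4, 16, 64):
    # Draw REPS samples of size n and record each sample mean.
    means = [statistics.fmean(purchase() for _ in range(n)) for _ in range(REPS)]
    results[n] = (statistics.fmean(means), statistics.variance(means))
    print(n, results[n], 1.44 / n)   # simulated (mean, variance) vs sigma^2 / n
```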
Sampling applet