Transcript Slide 1
Chapter 16
Random Variables
Copyright © 2009 Pearson Education, Inc.
NOTE on slides / What we can and cannot do
The following notice accompanies these slides, which have been downloaded
from the publisher’s Web site:
“This work is protected by United States copyright laws and is provided solely
for the use of instructors in teaching their courses and assessing student
learning. Dissemination or sale of any part of this work (including on the
World Wide Web) will destroy the integrity of the work and is not permitted.
The work and materials from this site should never be made available to
students except by instructors using the accompanying text in their classes.
All recipients of this work are expected to abide by these restrictions and to
honor the intended pedagogical purposes and the needs of other instructors
who rely on these materials.”
We can use these slides because we are using the text for this course.
Please help us stay legal. Do not distribute these slides any further.
The original slides are done in orange / brown and black. My additions are in
red and blue. Topics in green are optional.
Topics in this chapter
Random Variables
Probability Models
Expected value
Standard Deviation
Working with means and variances
The “Pythagorean Theorem of Statistics”
Division of Mathematics, HCC
Course Objectives for Chapter 16
After studying this chapter, the student will be able
to:
Define random variable.
Find the probability model for a discrete random
variable.
Find the mean (expected value) and the standard
deviation of a random variable, and
Interpret the meaning of the expected value and
standard deviation of a random variable in the
proper context.
A little more probability
Source: “Understandable Statistics” (9th Edition),
Brase & Brase, Houghton-Mifflin, Ex. 5.1.9
Movie stars and presidents have fished Pyramid
Lake, Nevada.
It is one of the best places in the lower 48 to
catch trophy trout.
Let’s fish Pyramid Lake for six hours.
Pyramid Lake, Nevada
Source: Wikipedia
• NW Nevada
• 20 km east of the CA line
• 60 km north of Reno
Fishing at Pyramid Lake
We can catch any
number of fish up to
the limit of four.
The chance of
catching a given
number of fish is given
at the right:
Number of fish    Chance of catching
0                 44%
1                 36%
2                 15%
3                 4%
4                 1%
Fishing at Pyramid Lake
What is the probability that
an arbitrary fisherman will
catch at least two fish?
Convert to probabilities.
Are these assignments
plausible?
P(2) + P(3) + P(4)
= 0.15 + 0.04 + 0.01
= 0.20
Number of fish    Probability
0                 0.44
1                 0.36
2                 0.15
3                 0.04
4                 0.01
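The plausibility check and the "at least two fish" calculation can be sketched in a few lines of Python. The function names `is_valid_model` and `prob_at_least` are illustrative choices, not part of the slides; the probabilities are the ones in the table.

```python
# Probability model from the slide: number of fish -> probability.
model = {0: 0.44, 1: 0.36, 2: 0.15, 3: 0.04, 4: 0.01}

def is_valid_model(model, tol=1e-9):
    """True if every probability lies in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in model.values())
    return in_range and abs(sum(model.values()) - 1.0) < tol

def prob_at_least(model, k):
    """P(X >= k): add the probabilities of all values that are at least k."""
    return sum(p for x, p in model.items() if x >= k)

print(is_valid_model(model))              # plausibility check
print(round(prob_at_least(model, 2), 2))  # P(2) + P(3) + P(4)
```

The tolerance `tol` is there only because decimal probabilities are not stored exactly in floating point.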
Fishing at Pyramid Lake
How many fish can a fisherman expect to
catch on the average?
We need some more terminology to answer
this question.
The number of fish caught in a six-hour period
can vary randomly from 0 to 4.
Each number has a probability associated
with it.
Expected Value: Center
A random variable assumes a value based on the
outcome of a random event.
We use a capital letter, like X, to denote a
random variable.
A particular value of a random variable will be
denoted with a lower case letter, in this case x.
Expected Value: Center (cont.)
There are two types of random variables:
Discrete random variables can take one of a
finite number of distinct outcomes.
Example: Number of credit hours
Continuous random variables can take any
numeric value within a range of values.
Example: Cost of books this term (?) This is
actually discrete since it takes on a finite number of
values (perhaps a lot - $0 to maybe $1000 or more.)
Example: Throw a dart at a board. Where the dart
hits in relation to the bulls eye is continuous.
Expected Value: Center (cont.)
A probability model for a random variable
consists of:
The collection of all possible values of a
random variable, and
the probabilities that the values occur.
Of particular interest is the value we expect a
random variable to take on, notated μ (for
population mean) or E(X) for expected value.
Expected Value: Center (cont.)
The expected value of a (discrete) random
variable can be found by summing the products
of each possible value and the probability that it
occurs:
\[ \mu = E(X) = \sum x \cdot P(x) \]
Note: Be sure that every possible outcome is
included in the sum and verify that you have a
valid probability model to start with.
Fishing at Pyramid Lake
We have a random
variable.
We have a probability
model.
Adding the column on
the right, µ = 0.82.
An arbitrary fisherman
can expect 0.82 fish
per trip over the long
haul.
Number of fish (x)    Probability P(x)    x * P(x)
0                     0.44                0
1                     0.36                0.36
2                     0.15                0.30
3                     0.04                0.12
4                     0.01                0.04
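As a cross-check on the column sum, here is a minimal Python sketch of the expected-value calculation. The helper name `expected_value` is an assumption made for illustration; the data are the fishing probabilities above.

```python
# Fishing model: number of fish -> probability.
fish = {0: 0.44, 1: 0.36, 2: 0.15, 3: 0.04, 4: 0.01}

def expected_value(model):
    """Sum of (value * probability) over every possible value."""
    return sum(x * p for x, p in model.items())

mu = expected_value(fish)
print(round(mu, 2))
```

Each term `x * p` corresponds to one entry in the rightmost column of the table.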
Some clarification and another question
We all know that the fisherman is not going to
come home with 82/100’s of a fish!
Over the long haul (say if he goes to Pyramid
Lake every weekend), he can expect to get 0.82
fish.
How does that estimate vary?
We need a measure of variability.
First Center, Now Spread…
For data, we calculated the standard deviation by
first computing the deviation from the mean and
squaring it. We do that with discrete random
variables as well.
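The variance for a random variable is:
\[ \sigma^2 = \operatorname{Var}(X) = \sum (x - \mu)^2 \, P(x) \]
The standard deviation for a random variable is:
\[ \sigma = \operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)} \]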
Fishing at Pyramid Lake
Number of fish (x)    Probability P(x)    (x - µ)    (x - µ)²    (x - µ)² P(x)
0                     0.44
1                     0.36
2                     0.15
3                     0.04
4                     0.01
Fishing at Pyramid Lake
Number of fish (x)    Probability P(x)    (x - µ)    (x - µ)²    (x - µ)² P(x)
0                     0.44                -0.82
1                     0.36                0.18
2                     0.15                1.18
3                     0.04                2.18
4                     0.01                3.18
Fishing at Pyramid Lake
Number of fish (x)    Probability P(x)    (x - µ)    (x - µ)²    (x - µ)² P(x)
0                     0.44                -0.82      0.6724
1                     0.36                0.18       0.0324
2                     0.15                1.18       1.3924
3                     0.04                2.18       4.7524
4                     0.01                3.18       10.1124
Fishing at Pyramid Lake
Number of fish (x)    Probability P(x)    (x - µ)    (x - µ)²    (x - µ)² P(x)
0                     0.44                -0.82      0.6724      0.29586
1                     0.36                0.18       0.0324      0.011664
2                     0.15                1.18       1.3924      0.20886
3                     0.04                2.18       4.7524      0.190096
4                     0.01                3.18       10.1124     0.101124
Fishing at Pyramid Lake
Add up the rightmost column
0.29586 + 0.01166 + 0.20886 + 0.1901 + 0.10112
Answer is the variance.
σ² = 0.8076.
We need to take the square root to get the standard
deviation.
Therefore σ = 0.8987.
Explanation in context: The fisherman can expect
to catch 0.82 fish with a standard deviation of
about 0.9 fish.
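The column-by-column calculation can be reproduced with a short Python sketch. The helper name `variance` is illustrative; the numbers come from the fishing model.

```python
import math

# Fishing model: number of fish -> probability.
fish = {0: 0.44, 1: 0.36, 2: 0.15, 3: 0.04, 4: 0.01}

def variance(model):
    """Sum of (x - mu)^2 * P(x), where mu is the expected value."""
    mu = sum(x * p for x, p in model.items())
    return sum((x - mu) ** 2 * p for x, p in model.items())

var = variance(fish)   # the sum of the rightmost column
sd = math.sqrt(var)    # standard deviation
print(round(var, 4), round(sd, 4))
```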
Fishing at Pyramid Lake with the TI
(Long Way)
First compute the mean
Then “1-var Stats L3”
Then add to get 0.8076
Take the square root.
Answer: 0.89867.
Fishing at Pyramid Lake with the TI
(Short Way)
We’ve got it!
Fishing at Pyramid Lake with the TI
Summary
First, put x in L1 and P(x) in L2. (It will not work the other
way around).
Long way:
L1 * L2 [STO] L3
Do 1-Var Stats L3 and get 0.82.
(L1 – 0.82)² * L2 [STO] L4
Do 1-Var Stats L4 and read Σx = 0.8076
Take the square root.
Short way:
“1-Var-Stats L1,L2” You have both!
Nothing else!
Notice ∑x and “x-bar.”
They are the same.
Also, n = 1.
This is always the case when
using a probability model.
If there is a discrepancy with
either of these, you probably
do not have a valid model.
Exception: Possible small
discrepancies due to
rounding: Example from last
chapter: 5.69114597 * 10^-9
Do 1-VarStats L2 to check.
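Why ∑x equals x-bar and n = 1 can be seen from a small Python sketch of a weighted summary. This is an illustration of the arithmetic only, under the assumption that the second list acts as a set of weights; it is not the calculator's actual implementation, and the name `one_var_stats` is made up for this example.

```python
import math

def one_var_stats(values, freqs):
    """Weighted summary: the second list is treated as weights."""
    n = sum(freqs)                                      # total weight
    sum_x = sum(v * f for v, f in zip(values, freqs))   # weighted sum
    mean = sum_x / n
    sigma = math.sqrt(
        sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / n
    )
    return {"n": n, "sum_x": sum_x, "mean": mean, "sigma": sigma}

L1 = [0, 1, 2, 3, 4]                   # number of fish
L2 = [0.44, 0.36, 0.15, 0.04, 0.01]    # probabilities
stats = one_var_stats(L1, L2)
print(stats)
```

Because the probabilities add to 1, dividing the weighted sum by n changes nothing, so the sum and the mean agree. If n is not 1 (beyond rounding), the probabilities do not form a valid model.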
Note: If you have System 2.55 on your TI-84
1-varStats L1,L2 will
not work with
StatWizard on.
You may input the
data in the manner
shown, however.
StatCrunch – Expected Values
Put the number of fish in var1 and the probability in var2.
Now do Stat →Calculators→Custom
Put “Fish” in Values, P(Fish) in Weights
Click OK.
Same answer as the TI.
A more complex example
We all know how insurance works.
We pay a premium to insure our cars.
If nothing happens, we lose (the insurance
company wins)
If our car is wrecked, we win (the insurance
company loses).
Questions:
How much can the insurance company expect
to lose over the long haul?
How much of a premium should they charge?
Insurance example
Consider an AD&D company. Past experience has shown that the
company should expect the following to happen per thousand
policyholders:
Outcome       Payout     Probability
Death         $10,000    1 in 1000
Disability    $5,000     2 in 1000
Uninjured     $0         997 in 1000
Questions:
How much can the insurance company expect to lose over the
long haul (per thousand policyholders)?
How much of a premium should they charge this year based on
their loss experience (and need to make a profit)?
Insurance example
Let’s compute what the company expects to pay in each category:
Outcome       Payout (L1)    Probability (L2)
Death         $10,000        0.001
Disability    $5,000         0.002
Uninjured     $0             0.997
Should we do it the long way?
Let’s not and say we did !! Use the TI.
Insurance example
The company can expect to pay out $20 per
policyholder with an expected standard deviation of
about $387.00.
This is reasonable considering that we are dealing
with relatively large payouts and low probabilities.
The company needs to charge more than $20.
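For readers without a TI at hand, the same two numbers can be obtained with a short Python sketch. The helper name `mean_and_sd` is an illustrative assumption; the payouts and probabilities are the ones in the table.

```python
import math

# Payout per policyholder -> probability.
payout = {10000: 0.001, 5000: 0.002, 0: 0.997}

def mean_and_sd(model):
    """Expected value and standard deviation of a discrete model."""
    mu = sum(x * p for x, p in model.items())
    var = sum((x - mu) ** 2 * p for x, p in model.items())
    return mu, math.sqrt(var)

mu, sd = mean_and_sd(payout)
print(round(mu, 2), round(sd, 2))
```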
Insurance Example by hand
We know that the company’s expected payout is
$20. What is the standard deviation?
First compute the variance.
\[ \operatorname{Var}(X) = (10000 - 20)^2 \cdot \frac{1}{1000} + (5000 - 20)^2 \cdot \frac{2}{1000} + (0 - 20)^2 \cdot \frac{997}{1000} \]
or 149,600 (“dollars squared”) (??)
The standard deviation is the square root, or
$386.78 (about $387)
Or use the TI! $386.78159 ≈ $386.78.
Insurance Example
Question: If the company charges $50 per policy,
what profit can it expect to make?
Profit = Premium – Expected Payout.
Intuitively, since the expected payout is $20, the
expected profit is $30.
We can generalize this to an Addition Rule for
Expected Values.
More About Means and Variances
Adding or subtracting a constant from data shifts
the mean but doesn’t change the variance or
standard deviation:
E(X ± c) = E(X) ± c
Var(X ± c) = Var(X)
Example: Consider everyone in a company
receiving a $5000 increase in salary.
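The shift rule can be checked on the insurance model. The sketch below takes the policyholder's side: the net result is the payout minus a $50 premium, i.e. X − 50. The helper name `mean_var` is an assumption for illustration.

```python
# Payout per policyholder -> probability.
payout = {10000: 0.001, 5000: 0.002, 0: 0.997}

def mean_var(model):
    """Expected value and variance of a discrete model."""
    mu = sum(x * p for x, p in model.items())
    var = sum((x - mu) ** 2 * p for x, p in model.items())
    return mu, var

premium = 50
# Subtract the same constant from every value; probabilities are unchanged.
net = {x - premium: p for x, p in payout.items()}

mu_x, var_x = mean_var(payout)
mu_net, var_net = mean_var(net)
print(round(mu_x, 2), round(mu_net, 2))    # the mean shifts by the constant
print(round(var_x, 2), round(var_net, 2))  # the variance does not move
```

An expected net result of −$30 for the policyholder is the same $30 expected profit seen from the company's side.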
More About Means and Variances (cont.)
In general, multiplying each value of a random
variable by a constant multiplies the mean by that
constant and the variance by the square of the
constant:
E(aX) = aE(X)
Var(aX) = a²Var(X)
Therefore, SD(aX) = |a|SD(X)
Example: Consider everyone in a company
receiving a 10% increase in salary.
Note: Dividing by a constant c is the same as
multiplying by (1 / c).
Insurance Example
Suppose the company doubles the payout, i.e.
$20000 for death, $10000 for disability.
Probabilities remain the same.
We can go through the math again, but let’s not!
Let’s use our formulas. The constant is 2.
E(2X) = 2E(X) = $40.
SD(2X) = 2*SD(X) = $773.56
If we want, we could also compute
Var(2X) = 4 * Var(X) = 598,400 dollars squared!
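The scaling formulas can be confirmed by actually going "through the math again" in Python. This is a sketch with an illustrative helper name, `mean_var`; the doubled payouts are built directly from the original table.

```python
import math

# Original payout per policyholder -> probability.
payout = {10000: 0.001, 5000: 0.002, 0: 0.997}

def mean_var(model):
    """Expected value and variance of a discrete model."""
    mu = sum(x * p for x, p in model.items())
    var = sum((x - mu) ** 2 * p for x, p in model.items())
    return mu, var

a = 2
doubled = {a * x: p for x, p in payout.items()}  # $20,000 / $10,000 / $0

mu, var = mean_var(payout)
mu2, var2 = mean_var(doubled)
print(round(mu2, 2))              # compare with a * mu
print(round(var2, 2))             # compare with a**2 * var
print(round(math.sqrt(var2), 2))  # compare with a * SD(X)
```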
Insurance Example
Now suppose two people from the same family
buy policies.
This is not the same situation as doubling the
premium or payout.
Both people are unlikely to die or become
disabled in the same year.
We could work with a chart with nine rows (since
there are nine possible outcomes).
Let’s do something easier!
Insurance example
Outcome                   Statistics
Husband     Wife          Payout      Probability
Death       Death         $20,000     0.000001
Death       Disabled      $15,000     0.000002
Death       OK            $10,000     0.000997
Disabled    Death         $15,000     0.000002
Disabled    Disabled      $10,000     0.000004
Disabled    OK            $5,000      0.001994
OK          Death         $10,000     0.000997
OK          Disabled      $5,000      0.001994
OK          OK            $0          0.994009
Standard deviation comes out $546.99.
Should we make our columns (or put it in the TI?)
I didn’t think so. There’s an easier way.
More About Means and Variances (cont.)
In general, with respect to addition and subtraction of
random variables,
The mean of the sum of two random variables is the
sum of the means.
The mean of the difference of two random variables is
the difference of the means.
E(X ± Y) = E(X) ± E(Y)
If the random variables are independent, the variance
of their sum or difference is always the sum of the
variances.
Var(X ± Y) = Var(X) + Var(Y)
Insurance Example
So, all we need do is compute
Var (H + W) = Var (H) + Var (W)
= 149,600 + 149,600 = 299,200
So that SD (H + W) = $546.99 (say $547.)
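To see that the shortcut agrees with the nine-row chart, the sketch below builds all nine husband/wife outcomes in Python, assuming the two policies are independent (so the joint probabilities multiply). The helper name `mean_var` is illustrative.

```python
import math
from itertools import product

# One policy: payout -> probability.
single = {10000: 0.001, 5000: 0.002, 0: 0.997}

def mean_var(model):
    """Expected value and variance of a discrete model."""
    mu = sum(x * p for x, p in model.items())
    var = sum((x - mu) ** 2 * p for x, p in model.items())
    return mu, var

# Enumerate the 3 x 3 = 9 outcomes and pool equal total payouts.
total = {}
for (h, ph), (w, pw) in product(single.items(), repeat=2):
    total[h + w] = total.get(h + w, 0.0) + ph * pw  # independence assumed

mu_one, var_one = mean_var(single)
mu_sum, var_sum = mean_var(total)
print(round(mu_sum, 2))              # compare with 2 * mu_one
print(round(var_sum, 2))             # compare with var_one + var_one
print(round(math.sqrt(var_sum), 2))  # standard deviation of the total payout
print(sorted(total))                 # distinct total payouts
```

The total payout takes five distinct values, while each single policy takes only three, so the probability model for the sum is not the model we started with.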
Notice something else
The variance formula is
Var(X ± Y) = Var(X) + Var(Y)
So we can write the standard deviation of the
sum (or difference) of two random variables as
\[ \operatorname{SD}(X \pm Y) = \sqrt{\operatorname{Var}(X) + \operatorname{Var}(Y)} \]
This is the Pythagorean Theorem of Statistics
This theorem will be very important as we apply it in
Chapters 18 – 25. Put an asterisk on it!
A Diversion! (High school student’s
answer on a math test)
Combining Random Variables (The Bad News)
It would be nice if we could go directly from
models of each random variable to a model for
their sum.
But, the probability model for the sum of two
random variables is not necessarily the same as
the model we started with even when the
variables are independent.
Thus, even though expected values may add, the
probability model itself is different.
Continuous Random Variables
Random variables that can take on any value in a
range of values are called continuous random
variables.
Continuous random variables have means
(expected values) and variances.
We won’t worry about how to calculate these
means and variances in this course, but we can
still work with models for continuous random
variables when we’re given these parameters.
Combining Random Variables (The Good News)
Nearly everything we’ve said about how discrete
random variables behave is true of continuous
random variables, as well.
When two independent continuous random
variables have Normal models, so does their sum
or difference.
This fact will let us apply our knowledge of
Normal probabilities to questions about the sum
or difference of independent random variables.
Another example – Baseball
In 2008, Pablo Sandoval of the San Francisco
Giants had a batting average of .345.
He had 145 “At-bats” and 50 hits.
Out of the 50 hits,
36 were singles,
10 were doubles,
1 was a triple and
3 were home runs.
Can we give Pablo “extra credit” for his doubles, triple
and homers?
Another example – Pablo Sandoval
Let’s compute the expected value of the number of bases gained in
an “at bat.”
We’ll compute the standard deviation as well and see if it is useful.
• Source: http://espn.go.com/mlb/player/stats/_/id/29212/pablo-sandoval
Bases gained    How many?
0               95
1               36
2               10
3               1
4               3
Pablo Sandoval – slugging ratio
The Expected Value is 0.489655, or 0.490.
This is the “slugging ratio.” It is a recognized
statistic that is tabulated in pro baseball.
It is the number of bases that Pablo is expected to gain per
“at bat” over the long run.
He cannot gain 0.49 bases in an “at bat” – it is an
average.
We can look at the standard deviation as well.
σ = 0.823.
Pablo Sandoval– slugging ratio
Over the long run, based on 2008 data, Pablo is
expected to hit with
Expected number of bases earned: 0.49
Standard deviation: 0.823
Batting average was 0.345
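These three figures can be reproduced from the at-bat counts with a short Python sketch. The counts are the ones given above (95 at-bats with no hit, 36 singles, 10 doubles, 1 triple, 3 home runs); the variable names are illustrative.

```python
import math

# Bases gained in an at-bat -> how many times it happened in 2008.
counts = {0: 95, 1: 36, 2: 10, 3: 1, 4: 3}
at_bats = sum(counts.values())

# Turn counts into a probability model by dividing by the number of at-bats.
model = {bases: c / at_bats for bases, c in counts.items()}

mu = sum(x * p for x, p in model.items())                         # expected bases
sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in model.items()))  # spread
batting_avg = (at_bats - counts[0]) / at_bats                     # hits / at-bats

print(at_bats, round(mu, 3), round(sd, 3), round(batting_avg, 3))
```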
MegaMillions Lottery
Players may pick six numbers from two separate
pools of numbers - five different numbers from 1
to 56 and one number from 1 to 46 - or select
Easy Pick. You win the jackpot by matching all six
winning numbers in a drawing.
There are lesser prizes for other matches.
Source: http://www.megamillions.com/
Possible prizes
Expected value of winnings
This week, the web site
http://www.megamillions.com/ carried the banner
“Estimated Jackpot $41 Million.”
This is after the March 8 drawing.
Knowing this, we can compute the expected
return on your dollar.
In the TI put the prize value in L1 and 1 divided
by the chances of winning in L2 (one small
hangup – see next slide)
Then do 1-varStats L1,L2
You must include a win of 0.
The overall chances of winning a prize are
approximately 1 in 40.
Therefore, the probability of not winning anything
is approximately 39/40. This is not exact.
Enter the prizes (except for 0) in L1 and 1 divided
by the odds in L2.
Then do 1-varStats L2 to get ∑x=0.0250179206.
Then add 0 as a prize and the probability of
winning 0 as 1 - 0.0250179206 = 0.9749820794.
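The "include a win of 0" step can be sketched in Python. The prize table itself is not reproduced in this transcript, so the two prizes and probabilities below are made-up toy numbers chosen only to show the mechanics; they are not Mega Millions figures, and the function name `add_zero_outcome` is hypothetical.

```python
def add_zero_outcome(prizes, probs):
    """Append a prize of 0 whose probability is whatever is left over."""
    p_win = sum(probs)
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("winning probabilities must total between 0 and 1")
    return prizes + [0], probs + [1.0 - p_win]

# Toy example (NOT real lottery data): two prizes and their probabilities.
toy_prizes = [100, 10]
toy_probs = [0.01, 0.10]

prizes, probs = add_zero_outcome(toy_prizes, toy_probs)
expected_prize = sum(x * p for x, p in zip(prizes, probs))
print(prizes, [round(p, 4) for p in probs], round(expected_prize, 2))
```

The zero prize adds nothing to the expected value, but without it the probabilities do not sum to 1 and the standard deviation comes out wrong.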
Computing expected value - TI
Mean = $0.415
StDev = $3095.64
(with rounding)
Expected value of winnings
The TI gives Σx = 0.415
σx = 3095.64
Big standard deviation – remember the big numbers and
low probabilities (like the insurance example.)
Each ticket costs $1.00.
Over the long run, for this drawing, you can expect a
payout (on the average) of $0.42 per ticket with a
standard deviation of $3095.64!
Since the player pays $1 for the ticket, the expected gain
is $0.42 - $1.00 = -$0.58, that is, an expected loss of $0.58 per ticket.
Play responsibly!
Biggest ever - $656M in March 2012
We need only put 656000000 into our calculator.
We get (for a $1.00 ticket) an expected prize
(rounded to two places) of $3.92 and a standard
deviation of $49488.11
How can the lottery afford to give away an
expected prize of $3.92 for a $1.00 ticket?
Answer: They had not given away the top prize in
months.
See also
“THE LOTTERY: A DREAM COME TRUE OR
A TAX ON PEOPLE WHO ARE BAD AT
MATH?”
George Ashline and Joanna Ellis-Monaghan
Department of Mathematics, St. Michael’s
College; Colchester, Vermont
http://academics.smcvt.edu/.../PRIMUS%20v%20
XIV,%20n%204%20The%20Lottery.doc -
*Correlation and Covariance
If X is a random variable with expected value
E(X)=µ and Y is a random variable with expected
value E(Y)=ν, then the covariance of X and Y is
defined as
\[ \operatorname{Cov}(X, Y) = E\big((X - \mu)(Y - \nu)\big) \]
The covariance measures how X and Y vary
together.
*Correlation and Covariance (cont.)
Covariance, unlike correlation, doesn’t have to be
between -1 and 1. If X and Y have large values,
the covariance will be large as well.
To fix the “problem” we can divide the covariance
by each of the standard deviations to get the
correlation:
\[ \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} \]
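Both definitions can be tried out on a joint probability model with a short Python sketch. The function name `cov_corr` and the two joint models are illustrative assumptions: the first treats two insurance policies as independent, the second is a hypothetical extreme case in which both policies always have the same outcome.

```python
import math

def cov_corr(joint):
    """joint maps (x, y) -> probability; returns (covariance, correlation)."""
    mu = sum(x * p for (x, y), p in joint.items())   # E(X)
    nu = sum(y * p for (x, y), p in joint.items())   # E(Y)
    cov = sum((x - mu) * (y - nu) * p for (x, y), p in joint.items())
    sd_x = math.sqrt(sum((x - mu) ** 2 * p for (x, y), p in joint.items()))
    sd_y = math.sqrt(sum((y - nu) ** 2 * p for (x, y), p in joint.items()))
    return cov, cov / (sd_x * sd_y)

single = {10000: 0.001, 5000: 0.002, 0: 0.997}

# Independent policies: joint probability is the product of the two.
independent = {(h, w): ph * pw
               for h, ph in single.items() for w, pw in single.items()}

# Hypothetical extreme: both policies always pay the same amount.
linked = {(x, x): p for x, p in single.items()}

cov_ind, corr_ind = cov_corr(independent)
cov_link, corr_link = cov_corr(linked)
print(round(cov_ind, 6), round(corr_ind, 6))
print(round(cov_link, 2), round(corr_link, 6))
```

In the linked case the covariance equals Var(X), a large number in dollars squared, while the correlation is rescaled to lie between -1 and 1.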
What Can Go Wrong?
Probability models are still just models.
Models can be useful, but they are not reality.
Question probabilities as you would data, and
think about the assumptions behind your
models.
If the model is wrong, so is everything else.
What Can Go Wrong? (cont.)
Don’t assume everything’s Normal.
You must Think about whether the Normality
Assumption is justified.
Watch out for variables that aren’t independent:
You can add expected values of any two
random variables, but
you can only add variances of independent
random variables.
What Can Go Wrong? (cont.)
Don’t forget: Variances of independent random
variables add. Standard deviations don’t.
Don’t forget: Variances of independent random
variables add, even when you’re looking at the
difference between them.
Don’t write independent instances of a random
variable with notation that looks like they are the
same variables.
What have we learned?
We know how to work with random variables.
We can use a probability model for a discrete
random variable to find its expected value and
standard deviation.
The mean of the sum or difference of two random
variables, discrete or continuous, is just the sum
or difference of their means.
And, for independent random variables, the
variance of their sum or difference is always the
sum of their variances.
What have we learned? (cont.)
Normal models are once again special.
Sums or differences of Normally distributed
random variables also follow Normal models.
Topics in this chapter
Random Variables
Probability Models
Expected value
Standard Deviation
Working with means and variances
The “Pythagorean Theorem of Statistics”
Division of Mathematics, HCC
Course Objectives for Chapter 16
After studying this chapter, the student will be able
to:
Define random variable.
Find the probability model for a discrete random
variable.
Find the mean (expected value) and the standard
deviation of a random variable, and
Interpret the meaning of the expected value and
standard deviation of a random variable in the
proper context.