Transcript Chapter 7
Chapter 7
Random Variables
Warm-up
A couple plans to have three children. There are 8
possible arrangements of girls and boys. For
example, GGB means the first two children are girls
and the third child is a boy. All 8 arrangements are
(approximately) equally likely.
1. Write down all 8 arrangements of the sexes of three children. What is the probability of any one of these arrangements?
Answer: BBB, BBG, BGB, GBB, GGB, GBG, BGG, GGG. Each has probability 1/8.
2. Let X be the number of girls the couple has. What is the probability that X = 2?
Answer: Three of the eight arrangements have exactly two girls, so P(X = 2) = 3/8 = 0.375.
3. Starting from your work in question 1, find the distribution of X. That is, what values can X take, and what are the probabilities for each value?
Answer:
Value of X     0     1     2     3
Probability   1/8   3/8   3/8   1/8
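The warm-up distribution can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original slides; the names `arrangements` and `distribution` are chosen here for illustration, and it assumes (as the warm-up does) that all 8 arrangements are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 arrangements of boys (B) and girls (G), e.g. "GGB".
arrangements = ["".join(kids) for kids in product("BG", repeat=3)]

# Assumption from the warm-up: every arrangement has probability 1/8.
p_each = Fraction(1, len(arrangements))

# X = number of girls; count how many arrangements give each value of X.
counts = Counter(a.count("G") for a in arrangements)
distribution = {x: n * p_each for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Using `Fraction` keeps the probabilities as exact eighths instead of rounded decimals.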
Random Variables
Sample spaces are not always numeric (for example, tossing 4 coins gives outcomes such as HTTH, TTTH, etc.).
If we let X = the number of heads, then X = 2 for HTTH and X = 1 for TTTH.
We call X a “random variable” because its values vary when the coin tossing is repeated.
Random variables are usually denoted by capital letters such as X or Y.
As we progress from general probability to inference, the sample space S just lists the possible values of the random variable.
7.1 Discrete and Continuous Random Variables
Example
A university posts the grade distributions for its courses online. Students in stats received 21% A’s, 43% B’s, 30% C’s, 5% D’s, and 1% F’s. Choose a student at random. The student’s grade on a four-point scale (A = 4) is a random variable X. The value of X changes when we repeatedly choose students at random, but it is always 0, 1, 2, 3, or 4.
Here is the distribution:
Value of X     0     1     2     3     4
Probability   .01   .05   .30   .43   .21
The probability that the student got a B or better is the sum of the probabilities of an A and a B (the two outcomes are disjoint, so their probabilities add):
P(X ≥ 3) = P(X = 3) + P(X = 4)
= .43 + .21 = .64
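The same sum can be written as a small Python sketch. The helper name `prob_at_least` is hypothetical (introduced here only for illustration); the probabilities are the ones from the grade example, and the calculation relies on the values of X being mutually exclusive so that their probabilities add.

```python
# Grade distribution from the example (X = grade points, A = 4).
grade_dist = {0: 0.01, 1: 0.05, 2: 0.30, 3: 0.43, 4: 0.21}

def prob_at_least(dist, threshold):
    """P(X >= threshold) for a discrete distribution given as {value: probability}."""
    return sum(p for x, p in dist.items() if x >= threshold)

# Sanity check: each probability lies in [0, 1] and together they sum to 1.
assert all(0 <= p <= 1 for p in grade_dist.values())
assert abs(sum(grade_dist.values()) - 1) < 1e-9

p_b_or_better = prob_at_least(grade_dist, 3)
print(round(p_b_or_better, 2))
```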
Probability Histograms
Idealized pictures of the results over many trials
Continuous Random Variables
When we choose a digit between 0 and 9 at random, each digit has probability .1. What if we wanted to allow any number between 0 and 1 as our outcome?
There are infinitely many possible outcomes! How can we assign probabilities?
X in this case is a continuous random variable because its values are not isolated numbers but an entire interval of numbers.
The random number generator will spread its output uniformly across the entire interval from 0 to 1, creating the density curve of a UNIFORM DISTRIBUTION.
The area under the curve is 1, and the probability of any event is the area under the curve and above the event in question.
The probability model for a continuous random variable assigns probabilities to INTERVALS of outcomes, not individual outcomes! An individual outcome actually gets assigned a probability of 0!
Ex: The random number generator produces a number between .79 and .81 with probability .02. An outcome between .799 and .801 has probability .002. (We can ignore the distinction between > and ≥ for continuous random variables, but not for discrete ones.)
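For the uniform density on 0 to 1, "area under the curve" is just interval length times a height of 1. A minimal Python sketch of that idea follows; the function name `uniform_prob` and its parameters are illustrative assumptions, not something from the slides.

```python
def uniform_prob(lo, hi, a=0.0, b=1.0):
    """P(lo <= X <= hi) when X is uniform on [a, b].

    The density has height 1 / (b - a), so the probability is the length of
    the part of [lo, hi] that lies inside [a, b], divided by (b - a).
    """
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

print(round(uniform_prob(0.79, 0.81), 4))    # interval of width .02
print(round(uniform_prob(0.799, 0.801), 4))  # interval of width .002
print(uniform_prob(0.8, 0.8))                # a single point has zero width
```

The last call shows why an individual outcome gets probability 0: its interval has no width, so there is no area above it.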
Normal Distributions
The density curve we are most familiar with is a Normal curve.
Remember N(μ, σ) is our notation for a Normal distribution with mean μ and standard deviation σ.
If X has the N(μ, σ) distribution, the z-score Z = (X − μ)/σ is a standard Normal random variable having the distribution N(0, 1).
Example: cheating in school
The true population probability that a student will report another student for cheating is 12%.
If we survey a random sample of 400 students and repeat this many times, we expect the average of the sample proportions to get close to this .12 value (with a standard deviation of .016).
The sample proportion is a random variable because each sample of 400 would likely give a different proportion. Its distribution is approximately Normal, so we treat it as continuous.
What is the probability that my survey result will differ from the true probability by 2 percentage points or more?
That happens if the result is less than .10 or greater than .14. Standardizing, (.10 − .12)/.016 = −1.25 and (.14 − .12)/.016 = 1.25.
From Table A or a calculator, P(−1.25 ≤ Z ≤ 1.25) = .8944 − .1056 = .7888.
So the probability we seek is 1 − .7888 = .2112, which is about 21%.
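The Table A lookup can be reproduced with Python's standard library. This is a sketch under the example's own assumptions: the survey result is treated as Normal with mean .12 and standard deviation .016. The variable names are illustrative only.

```python
from statistics import NormalDist

# Values from the example; treating the survey result as Normal is the
# working assumption of this sketch.
p_hat = NormalDist(mu=0.12, sigma=0.016)

# Probability the result lands within 2 percentage points of the truth.
p_within = p_hat.cdf(0.14) - p_hat.cdf(0.10)

# Probability it misses by 2 percentage points or more.
p_differs = 1 - p_within

print(round(p_within, 4), round(p_differs, 4))
```

The output should agree with the table values to about three decimal places; small differences come from Table A rounding to four digits.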
Homework
# 2, 3, 6-8, 13-17
7.2 Means and Variances of Random Variables (weighted average)
The mean of a sample is X bar; the mean of a probability distribution is μ.
Example
Lottery: You choose a 3-digit number. If the lottery draws your number, you win $500. Since there are 1000 possible 3-digit numbers, you have a 1/1000 chance of winning.
Payoff X       $0     $500
Probability   .999    .001
What is your average payoff from many tickets?
$500(.001) + $0(.999) = $.50
How would this differ if you paid $1 for your ticket?
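The weighted average above is easy to express in code. A minimal Python sketch, assuming the payoff table from the example and a $1 ticket price; the helper `mean` and the variable names are introduced here for illustration.

```python
from fractions import Fraction

# Payoff distribution from the example: {payoff in dollars: probability}.
payoff_dist = {0: Fraction(999, 1000), 500: Fraction(1, 1000)}

def mean(dist):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(x * p for x, p in dist.items())

mean_payoff = mean(payoff_dist)

# If the ticket costs $1, every payoff shifts down by 1, and so does the mean.
ticket_price = 1
mean_net = mean_payoff - ticket_price

print(float(mean_payoff), float(mean_net))
```

A negative `mean_net` means that, averaged over many tickets, the player loses money on each one.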
Mean of a Continuous Random Variable
The point at which the area under the density curve would balance if it were made out of solid material (the center if the curve is symmetric).
Finding the mean of a skewed density curve requires math outside this course!
Variance of a Random Variable
The variance of a random variable X is written σ²x (different from the variance of a sample, which is written s²).
Example: Linda sells cars
Cars sold      0     1     2     3
Probability   .3    .4    .2    .1
We can find the mean and variance of X with a table (or with our calculator: enter X in L1, P in L2, then do 1-Var Stats L1, L2).
xi    pi    xi·pi    (xi − μx)²·pi
0     .3    .0       (0 − 1.1)²(.3) = .363
1     .4    .4       (1 − 1.1)²(.4) = .004
2     .2    .4       (2 − 1.1)²(.2) = .162
3     .1    .3       (3 − 1.1)²(.1) = .361
μx = 1.1    σ²x = .890
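The table calculation can be sketched in Python as a check on the arithmetic. The helpers `mean` and `variance` are hypothetical names written for this illustration; the distribution is Linda's car-sales table.

```python
from math import sqrt

# Linda's car sales: {cars sold: probability}.
cars_dist = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}

def mean(dist):
    """Mean: sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: sum of (x_i - mean)^2 * p_i."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu_x = mean(cars_dist)
var_x = variance(cars_dist)
sd_x = sqrt(var_x)  # the standard deviation is the square root of the variance

print(round(mu_x, 3), round(var_x, 3), round(sd_x, 3))
```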
Law of Large Numbers
Suppose we want to estimate the mean height μ of all American women aged 18-24. To estimate μ, we take an SRS of women aged 18-24 and use the sample mean X bar to estimate the unknown population mean.
If we repeat this and choose another sample, the mean height will likely differ, but the more times we repeat this (drawing a sample and recording the mean), the closer we expect the average of the mean heights of all our samples to get to the true μ.
The behavior of X bar is just like the behavior of expected probabilities!
Example: Suppose the true μ for women’s heights is 64.5 inches with a standard deviation of 2.5 inches.
Suppose I repeatedly draw a sample of women from this population and record their average height. After each recording, I write down the average of all my sample mean heights so far. The more times I do this, the closer my overall average gets to 64.5.
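A short simulation can make this repeated-sampling experiment concrete. The sketch below is illustrative only: it assumes heights are Normal with the example's μ = 64.5 and σ = 2.5, and it picks a sample size of 25 and a fixed random seed, neither of which comes from the notes.

```python
import random
from statistics import mean

MU, SIGMA = 64.5, 2.5    # population values from the example
SAMPLE_SIZE = 25         # assumed for illustration; not given in the notes
rng = random.Random(0)   # fixed seed so the run is repeatable

def sample_mean():
    """Mean height of one simulated sample (heights assumed Normal)."""
    return mean(rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE))

def running_averages(n_samples):
    """After each new sample, record the average of all sample means so far."""
    total = 0.0
    averages = []
    for i in range(1, n_samples + 1):
        total += sample_mean()
        averages.append(total / i)
    return averages

averages = running_averages(5000)
for k in (1, 10, 100, 1000, 5000):
    print(k, round(averages[k - 1], 3))
```

The printed values should wander at first and then settle near 64.5 as the number of samples grows.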
Casinos, Insurance Companies, and the Law of Large Numbers
Gamblers may win or lose, but the casino will win in the long run because the law of large numbers says what the average outcome of many thousands of bets will be.
This is the same concept insurance companies use when deciding what to charge, or McDonald’s uses when deciding how many beef patties to make per day.
Law of Small numbers
Rules of probability and law of large numbers describe the
regular behavior of chance phenomena in the LONG run,
but Psychologists have discovered that our intuitive
understanding of randomness is quite different from the true
laws of chance.
We expect that even short sequences of random events will
show the kind of average behavior that in fact only appears
in the long run.
Ex: Write down a sequence of heads and tails that you think
imitates 10 tosses of a balanced coin. What was the longest
run of consecutive heads or tails in your tosses?
Most people don’t write a run of more than 2 consecutive
heads or tails. Longer runs don’t seem “random” to us.
In fact, the probability of a run of 3 or more consecutive heads in 10 tosses is about .5078, and the probability of a run of 3 or more consecutive heads or tails is greater still (about .83)!
Seeing a run of 3 or more may cause us to incorrectly
conclude that we have a biased coin.
Some gamblers follow “hot-hand” theory…silly!
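Run probabilities for 10 tosses can be computed exactly, because there are only 2^10 = 1024 equally likely sequences of a balanced coin. The following Python sketch enumerates them all; the function names `longest_run` and `prob_run` are hypothetical helpers written for this illustration. It reports the chance of a run of 3 or more heads, and the chance of a run of 3 or more of either face.

```python
from itertools import groupby, product

def longest_run(seq, symbols):
    """Length of the longest run in seq made of one symbol drawn from symbols."""
    return max((len(list(g)) for s, g in groupby(seq) if s in symbols), default=0)

def prob_run(n_tosses, min_run, symbols):
    """Exact P(longest run >= min_run) for a fair coin, by full enumeration."""
    outcomes = list(product("HT", repeat=n_tosses))
    hits = sum(1 for seq in outcomes if longest_run(seq, symbols) >= min_run)
    return hits / len(outcomes)

p_heads_run = prob_run(10, 3, "H")   # run of 3+ heads
p_any_run = prob_run(10, 3, "HT")    # run of 3+ heads or 3+ tails

print(round(p_heads_run, 4), round(p_any_run, 4))
```

Enumeration is feasible here only because the number of tosses is small; the count of sequences doubles with every extra toss.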
How large is large?
The law doesn’t say how many trials are needed to guarantee a mean outcome close to μ. That depends on the variability of the outcomes.
The more variable the outcomes, the more trials are needed to ensure that the mean outcome X bar is close to the distribution mean μ.
Casinos understand this: the outcomes of games of chance are variable enough to hold the interest of gamblers. Only the casino plays often enough to rely on the law of large numbers. Gamblers get entertainment; the casino has a business!
Rules for means
Review: How can we tell if something is a
legitimate probability distribution?
Example (Linda’s cars)
Cars sold      0     1     2     3
Probability   .3    .4    .2    .1
Trucks/SUVs    0     1     2
Probability   .4    .5    .1
Let X be the number of cars Linda sells and Y the number of trucks
and SUV’s.
μx = (0)(.3) + (1)(.4) + (2)(.2) + (3)(.1) = 1.1 cars
μy = (0)(.4) + (1)(.5) + (2)(.1) = .7 trucks and SUV’s
At her commission rate of 25% of gross profit on each vehicle she sells, Linda expects to earn $350 on each car and $400 on each truck/SUV sold.
Her earnings are Z = 350X + 400Y.
What are her average (expected) earnings?
Rule 1: μa+bx = a + bμx. Rule 2: μx+y = μx + μy.
Combining rules 1 and 2, her mean earnings are
μz = 350 μx + 400 μy
= 350 × 1.1 + 400 × .7 = $665
That’s her best estimate of her earnings for the day.
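The mean-earnings calculation, together with the review question about legitimate distributions, can be sketched in Python. The helper names `is_legitimate` and `mean` are assumptions made for this illustration; the two distributions and the $350 and $400 figures come from the example.

```python
# Linda's sales distributions from the example: {count: probability}.
cars_dist = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}
trucks_dist = {0: 0.4, 1: 0.5, 2: 0.1}

def is_legitimate(dist, tol=1e-9):
    """A legitimate distribution has probabilities in [0, 1] that sum to 1."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    return in_range and abs(sum(dist.values()) - 1) < tol

def mean(dist):
    """Mean of a discrete random variable: sum of value * probability."""
    return sum(x * p for x, p in dist.items())

mu_x = mean(cars_dist)    # mean number of cars
mu_y = mean(trucks_dist)  # mean number of trucks and SUVs

# Z = 350X + 400Y, so the mean of Z is 350 * mu_x + 400 * mu_y.
mu_z = 350 * mu_x + 400 * mu_y

print(is_legitimate(cars_dist), is_legitimate(trucks_dist))
print(round(mu_x, 2), round(mu_y, 2), round(mu_z, 2))
```

Note that the rule for the mean of a sum does not require X and Y to be independent; independence only matters for variances.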
Rules for Variances
For this course we only need to deal with variances
of 2 variables that are independent. This is an
ASSUMPTION when we do these problems (always
ask yourself if the assumption of independence
seems reasonable).
RULE 1: Multiplying X by a constant multiplies the SD by the absolute value of that constant (and thus the variance by the square of that constant).
RULE 2: If X and Y are independent,
σ²x+y = σ²x + σ²y and σ²x-y = σ²x + σ²y
(they’re the same because Y enters the variance through the square of its multiplier, and (−1)² = 1, so it doesn’t matter whether Y is added or subtracted).
The difference X − Y is more variable than either X or Y alone because variations in both X and Y contribute to variation in their difference.
Example: Winning lottery (review)
The payoff X of a $1 ticket (do on calc)
xi     pi      xi·pi    (xi − μx)²·pi
0      .999    0        (0 − .5)²(.999) = .24975
500    .001    .5       (500 − .5)²(.001) = 249.50025
μx = .5    σ²x = 249.75
The standard deviation is σx = √249.75 = $15.80 (*It is usual for games of chance to have large variances; it keeps them exciting.)
If you buy a ticket, your winnings are W = X − 1 (because you paid $1).
By the rules for means, the mean amount you win is μw = μx − 1 = −$.50, an average loss of 50 cents per ticket. (The standard deviation and variance of W = X − 1 are the same as those of X because adding or subtracting a constant shifts every value without changing the spread.)
Suppose now that you buy a ticket on each of 2 different days. The payoffs X and Y on the two tickets are independent because separate drawings are held each day. Your total payoff X + Y has mean:
μx+y = μx + μy = $.50 + $.50 = $1.00
Because X and Y are independent, the variance of X + Y is
σ²x+y = σ²x + σ²y = 249.75 + 249.75 = 499.50
The standard deviation of the total payoff is
σx+y = √499.5 = $22.35 **not the same as the sum of the individual standard deviations ($15.80 + $15.80 = $31.60)!**
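The two-ticket calculation can be double-checked by building the distribution of X + Y directly. The Python sketch below is illustrative only: it assumes the two payoffs are independent (so joint probabilities multiply), and the helper names are hypothetical.

```python
from math import sqrt

# Payoff of one ticket: {payoff in dollars: probability}.
payoff_dist = {0: 0.999, 500: 0.001}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Variance rule for independent X and Y: variances add.
var_x = variance(payoff_dist)
var_total = var_x + var_x
sd_x = sqrt(var_x)
sd_total = sqrt(var_total)

# Cross-check: distribution of X + Y from every pair of outcomes,
# multiplying probabilities because the drawings are assumed independent.
total_dist = {}
for x, px in payoff_dist.items():
    for y, py in payoff_dist.items():
        total_dist[x + y] = total_dist.get(x + y, 0.0) + px * py

print(round(var_total, 2), round(variance(total_dist), 2))
print(round(sd_total, 2), round(2 * sd_x, 2))  # SD of the sum vs. sum of the SDs
```

The first line of output should show the same variance twice; the second shows that standard deviations, unlike variances, do not add.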
If you buy a ticket every day (365 tickets a year), your mean total payoff is the sum of the 365 daily mean payoffs. That’s 365 × $.50 = $182.50. Of course it cost you $365 to play, so on average you actually lose $182.50!
Combining Normal Random Variables
Any linear combination of independent Normal random variables is also Normally distributed.
Example- golf
Tom and George are playing in a tournament. Their
scores vary as they play the course repeatedly. Tom’s
score X has the N(110, 10) distribution, and George’s
score Y varies from round to round according to the
N(100,8) distribution. If they play independently, what is
the probability that Tom will score lower than George
and thus do better in the tournament?
The difference X − Y between their scores is Normally distributed with mean and variance:
μx-y = μx − μy = 110 − 100 = 10
σ²x-y = σ²x + σ²y = 10² + 8² = 164
Since √164 = 12.8, X − Y has the N(10, 12.8) distribution.
The probability that Tom wins is:
P(X < Y) = P(X − Y < 0)
= P(Z < (0 − 10)/12.8)
= P(Z < −.78) = .2177
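The golf calculation can be reproduced with Python's `statistics.NormalDist`. This sketch assumes, as the example does, that the two scores are independent Normal random variables; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

tom = NormalDist(mu=110, sigma=10)    # Tom's score X
george = NormalDist(mu=100, sigma=8)  # George's score Y

# Difference of independent Normals: means subtract, variances add.
diff = NormalDist(
    mu=tom.mean - george.mean,
    sigma=sqrt(tom.variance + george.variance),
)

# Tom wins when his score is lower, i.e. when X - Y < 0.
p_tom_wins = diff.cdf(0)

print(round(diff.mean, 1), round(diff.stdev, 1), round(p_tom_wins, 4))
```

The result may differ from the table value in the third or fourth decimal place because the table lookup rounds the z-score to two decimals.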