Random Variables
Probability Distributions
Chapter 7
Random Variables
A random variable assigns a number
to each outcome of a random
circumstance, or equivalently, a
random variable assigns a number to
each unit in a population.
Random Variables
Discrete random variable
can only take one of a
countable number of
distinct values.
Examples: Number of
siblings, sum of two dice
rolled,…
Separate Points on a
number line.
Continuous random
variable can take any value
in an interval, so it cannot be
displayed in a table: there are
uncountably many values.
Examples: Time, height,
weight,…
An entire interval of the
number line.
Discrete Random Variable
Create a probability distribution table assigning distinct
values of the discrete random variable the
corresponding relative frequency (probability).
Probability Distribution of # of Tails in 2 Flips

Number of Tails in 2 Flips   0      1      2
Prob                         0.25   0.5    0.25

[Bar chart: Probability (axis 0 to 0.6) vs. Number of Tails in 2 Flips]
Cumulative Probability Distribution

Number of Tails in 2 Flips   0      1      2
Cumul Pr                     0.25   0.75   1

[Bar chart: Cumulative Probability (axis 0 to 1.2) vs. Number of Tails in 2 Flips]
Notation of Discrete RV
Capital letter represents the random variable; T stands
for number of tails in 2 flips of a coin
Lower case letter represents a value the
discrete rv could assume; t could equal 0, 1, or 2
P(T = t) is the probability distribution function:
the probability that T equals t;
P(T = 1) = .5
P(T ≤ t) is the cumulative probability function:
the probability that T is less than or
equal to t;
P(T ≤ 1) = P(T = 0) + P(T = 1) = .25 + .5 = .75
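The notation above can be checked by brute force. The sketch below assumes a fair coin, lists the four equally likely outcomes of 2 flips, and builds P(T = t) and P(T ≤ t) from them; the names `pmf` and `cdf` are illustrative choices, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of 2 flips of a fair coin: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# pmf[t] = P(T = t), where T counts the tails in an outcome.
pmf = {}
for outcome in outcomes:
    t = outcome.count("T")
    pmf[t] = pmf.get(t, 0) + Fraction(1, len(outcomes))

def cdf(t):
    """P(T <= t): add up P(T = value) for every value at or below t."""
    return sum(p for value, p in pmf.items() if value <= t)

for t in sorted(pmf):
    print(t, pmf[t], cdf(t))
```

A value that T cannot take, such as -5, has probability 0: `pmf.get(-5, 0)` returns 0.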
Practice
Probability
P(T = t) = ….
P(T = 0) =
P(T = 1) =
P(T = 2) =
P(T = -5) =
Cumulative Probability
P(T ≤ t) = ….
P(T ≤ 0) =
P(T ≤ 1) =
P(T ≤ 2) =
P(T < 1) =
P(T > 1) =
P(T < 2) =
How Many Girls Are Likely?
Outcome   BBB   BBG   BGB   GBB   BGG   GBG   GGB   GGG
Prob      1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8
G = g     0     1     1     1     2     2     2     3
List outcomes, assigned probabilities, and cumulative
probabilities.
g          0     1     2     3
P(G = g)   1/8   3/8   3/8   1/8
P(G ≤ g)   1/8   4/8   7/8   1
Expected Value of Discrete RV
The expected value of a discrete random variable X
is the MEAN value of X over the
sample space of possible outcomes.
The formula says to calculate the sum of “outcomes
times probability”: E(X) = µ = Σ xi pi
Example of Expected Value
g          0        1        2        3
P(G = g)   1/8      3/8      3/8      1/8
g·p        0(1/8)   1(3/8)   2(3/8)   3(1/8)
           = 0      = 3/8    = 6/8    = 3/8

E(G) = µG = Σ g·p = 0 + 3/8 + 6/8 + 3/8
= 12/8 or 1.5
Calculating Variance and Standard
Deviation of Discrete RV
Variance: V(X) = σ² = Σ (xi - µ)² pi
Standard deviation: σ = square root of the variance
First calculate the variance, then take its square root
to find the standard deviation.
Calculating V(X) and Std Dev (X)
g             0         1         2         3
g - µG        0 – 1.5   1 – 1.5   2 – 1.5   3 – 1.5
(g - µG)²     (-1.5)²   (-.5)²    (.5)²     (1.5)²
p             1/8       3/8       3/8       1/8
(g - µG)² p   9/32      3/32      3/32      9/32

V(G) = 24/32 = .75
Std dev (G) = √.75 = .8660
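The table arithmetic can be reproduced in a few lines. This is a minimal sketch using exact fractions for the number-of-girls distribution; the helper names `mean` and `variance` are illustrative, not from the slides.

```python
from fractions import Fraction
from math import sqrt

# P(G = g) for the number of girls among three children.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean(pmf):
    """Expected value: sum of outcome times probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Sum of squared deviation from the mean times probability."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu_g = mean(pmf)            # 3/2
var_g = variance(pmf)       # 3/4
sd_g = sqrt(float(var_g))   # about 0.8660

print(mu_g, var_g, round(sd_g, 4))
```

Keeping the probabilities as fractions avoids rounding until the final square root.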
Using the TI-83+ to Calculate µ
and σ
• Put the values of X into L1
• Put the P(X=x) into L2
• Choose 1-Var Stats with inputs L1 and L2
– 1-Var Stats L1,L2
• The output lists the expected value as x̄ and
the standard deviation as σx.
Example Problem
Apgar Scores
At 1 min after birth and again at 5 min, each newborn child is given a
numerical rating called an Apgar score. Possible values of this score are
0 – 10. A child’s score is determined by five factors: muscle tone, skin
color, respiratory effort, strength of heartbeat, and reflex, with a high
score indicating a healthy infant. Let the random variable X denote the
Apgar score (at 1 min) of a randomly selected newborn infant at a
particular hospital, and suppose that X has the following probability
distribution. Find the average (or mean value or expected value) Apgar
score for all babies born at this hospital.
Example Problem
Apgar Scores
x          0      1      2      3      4     5     6     7     8     9     10
P(X = x)   .002   .001   .002   .005   .02   .04   .17   .38   .25   .12   .01
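One way to carry out the "outcomes times probability" sum for the Apgar table is sketched below. The variable names are illustrative; the probabilities are copied from the table.

```python
# Apgar scores 0 through 10 and their probabilities from the table.
scores = list(range(11))
probs = [.002, .001, .002, .005, .02, .04, .17, .38, .25, .12, .01]

# A valid probability distribution must sum to 1.
total = sum(probs)

# E(X) = sum of x * P(X = x).
expected_apgar = sum(x * p for x, p in zip(scores, probs))

print(round(total, 3), round(expected_apgar, 2))
```

Under these probabilities the mean Apgar score works out to 7.16.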
Continuous Random Variable
Probabilities of continuous random variables equal the
proportion of area shaded under the curve.
The total area under the curve is equal to 1.
[Function plot: normalDensity(x, 0, 1), y axis from 0.00 to 0.40]
Uniform Distribution-Wait Time
P(1 ≤ X ≤ 2) = (height)(width) = (.25)(2 - 1) = .25
Using a random variable as part
of a linear function
Sometimes we want to use the values of a random variable as
part of a function.
For example: Which copier should we buy if we plan to keep
it for two years?
Copier #1
Cost: $10,000
Repair contract: $50/month with unlimited service calls.

Copier #2
Cost: $10,500
Repair contract: $200/service call

Number of Repairs   0      1      2      3
Probability         0.50   0.25   0.15   0.10
Copier #2
Cost: $10,500
Repair contract: $200/service call

Number of Repairs   0      1      2      3
Probability         0.50   0.25   0.15   0.10

y = 10,500 + 200x
µy = 10,500 + 200µx
µy = 10,500 + 200*.85
µy = 10,670

σy = |200|σx (because adding a constant to
every element of the distribution
doesn’t affect σ)
σy = 200*1.014
σy = 202.8
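The copier comparison can be sketched as follows. Two assumptions are made here that the slides do not state outright: the repair distribution describes the whole two-year period, and Copier #1 costs its purchase price plus 24 monthly contract payments. All variable names are illustrative.

```python
from math import sqrt

# Number of repairs for Copier #2 and their probabilities.
repairs = {0: 0.50, 1: 0.25, 2: 0.15, 3: 0.10}

mu_x = sum(x * p for x, p in repairs.items())                 # 0.85
var_x = sum((x - mu_x) ** 2 * p for x, p in repairs.items())
sigma_x = sqrt(var_x)                                         # about 1.014

# Linear function y = a + b*x: total cost of Copier #2.
a, b = 10_500, 200
mu_y = a + b * mu_x          # the mean shifts and scales
sigma_y = abs(b) * sigma_x   # the standard deviation only scales

# Assumed two-year cost of Copier #1: price plus 24 months at $50.
copier1_cost = 10_000 + 50 * 24

print(round(mu_y, 2), round(sigma_y, 1), copier1_cost)
```

Carrying full precision gives a standard deviation near 202.7; the slide's 202.8 comes from rounding σx to 1.014 before multiplying.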
Linear Combinations of Random
Variables
Many times a random variable results from adding together
several other random variables.
Let X = the random variable of the
outcome of rolling one fair die.
Let Y = the random variable of the
outcome of rolling another fair die.
Let Z = the random variable of the sum when rolling
the dice together.
Z = X + Y

x          1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6

y          1     2     3     4     5     6
P(Y = y)   1/6   1/6   1/6   1/6   1/6   1/6

µX = 3.5         µY = 3.5
σX² = 2.9167     σY² = 2.9167

For Z…
µZ = µ(X+Y) = µX + µY = 7
σZ² = σ²(X+Y) = σX² + σY² = 5.8334 (They are
independent)
σZ = √(σZ²)
Note: σZ ≠ σX + σY
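The rules for the two dice can be verified by enumerating all 36 equally likely rolls. This is a sketch with illustrative names (`mean_var`, `pmf_z`); it assumes fair, independent dice as the slide does.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)

def mean_var(pmf):
    """Return (mean, variance) of a discrete distribution given as {value: prob}."""
    mu = sum(x * q for x, q in pmf.items())
    var = sum((x - mu) ** 2 * q for x, q in pmf.items())
    return mu, var

# One die, and the sum Z = X + Y built from every (x, y) pair.
pmf_x = {x: p for x in faces}
pmf_z = {}
for x, y in product(faces, repeat=2):
    pmf_z[x + y] = pmf_z.get(x + y, 0) + p * p

mu_x, var_x = mean_var(pmf_x)   # 7/2 and 35/12
mu_z, var_z = mean_var(pmf_z)   # 7 and 35/6

sd_x = sqrt(float(var_x))
sd_z = sqrt(float(var_z))
print(mu_z, var_z, round(sd_z, 4), round(2 * sd_x, 4))
```

The variances add (35/12 + 35/12 = 35/6, about 5.8333), but the standard deviations do not: sd_z is smaller than sd_x + sd_x.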
2010 AP Exam (Form B)
A test consisting of 25 multiple-choice questions with 5
answer choices for each question is administered.
For each question, there is only 1 correct answer.
Let X be the number of correct answers if a student guesses
randomly from the 5 choices for each of the 25 questions.
What is the probability distribution of X ?
2011 AP Exam (Form B)
An airline claims that there is a 0.10 probability that a coach-class ticket
holder who flies frequently will be upgraded to first class on any flight.
This outcome is independent from flight to flight. Sam is a frequent flier
who always purchases coach-class tickets.
What is the probability that Sam’s first upgrade will occur after the third
flight?
What is the probability that Sam will be upgraded exactly 2 times in his
next 20 flights?
Sam will take 104 flights next year. Would you be surprised if Sam
receives more than 20 upgrades to first class during the year? Justify
your answer.
Binomial and Geometric
Distributions
• Binomial Distribution
– Only two possible
outcomes per trial
– Fixed number of trials
– Each trial is
independent of the
others
– The probability of
success remains
constant
• Geometric
Distribution
– Only two possible
outcomes per trial
– Each trial is
independent of the
others
– The probability of
success remains
constant
What about sampling without
replacement?
When you sample without replacement from a population
the trials are not independent. The probability of success
does not remain constant from trial to trial. So, it isn’t a
binomial experiment.
However…
If the sample size, n, is less than 5% of the population size,
N, then you can treat it as though the trials are independent
even when sampling without replacement. The change in
probability from one trial to the next is negligible. In this
case a binomial distribution is still a good model.
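To see how small the difference is, the sketch below compares exact sampling without replacement (a hypergeometric calculation) with the binomial model. The population of 1000 with 300 successes and the sample of 20 are made-up numbers chosen only so that n is under 5% of N.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical numbers: sample is 2% of the population.
N, K, n = 1000, 300, 20
p = K / N

# Largest gap between the two models over every possible count k.
max_gap = max(
    abs(binom_pmf(k, n, p) - hypergeom_pmf(k, N, K, n)) for k in range(n + 1)
)
print(n / N, max_gap)
```

With these numbers no single probability differs by as much as 0.01 between the two models.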
Binomial or Geometric?
• The number of LCD TVs sold out of the
next 8 TVs sold at an electronics store.
• The number of girls that must be asked to
get a date to prom.
• The number of plays that must be run to
score a touchdown.
• The number of face cards drawn in a 5 card
poker hand.
Remember to check for independence!
Formulas for Binomial
Distributions
Binomial: P(X = k) = (n choose k) p^k (1 - p)^(n - k)
µx = np
σx = √(np(1 - p))
These formulas are on your
formula sheet. Always show
them with the correct values
substituted for n, k, and p, even
though most of the calculation
is done on the calculator.
Geometric: P(X = x) = (1 - p)^(x - 1) p
Note: Binomial and Geometric distributions can be done
with the built-in TI-83 programming.
On the SAT, there are five answer choices
(A, B, C, D, and E). The probability of randomly guessing
the correct answer is .2.
What is the probability that on a 25-question section of the SAT by
complete random guessing that exactly 8 questions will be
answered correctly?
What is the probability that on a 25-question section of the SAT by
complete random guessing that 6 or fewer questions will be
answered correctly?
What is the probability that on a 25-question section of the SAT by
complete random guessing that the first correctly answered
question is the fourth?
What is the probability that on a 25-question section of the SAT by
complete random guessing that the first correct answer will be
within the first 6 guesses?
What is the expected number of correct guesses on a 25-question
section of the SAT exam?
On the SAT, there are five answer choices (A, B, C, D, and E). The
probability of randomly guessing the correct answer is .2.
1. There are only two possible outcomes per trial: A
correct guess or an incorrect guess.
2. The probability of guessing right stays 1/5 for every
trial.
3. Since the selections are random the outcome of one
trial doesn’t influence the outcome of any other trial.
4. Some of the problems represent a fixed number of
trials (answering all 25 questions and examining the
results), and some do not (seeing how many questions
must be answered to achieve a desired outcome).
On the SAT, there are five answer choices (A, B, C, D, and E). The
probability of randomly guessing the correct answer is .2.
Let X be the random variable, number of questions answered
correctly.
Binomial: P(X = 8) = .062
Binomial: P(X ≤ 6) = .780
Geometric: P(The first success on the fourth guess) = .102
Geometric Cumulative: P(The first success within the first
6 guesses) = .738
Expected number of correct guesses: µ(X) = np = 25*1/5
= 5 correct guesses
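The five SAT answers can be reproduced directly from the binomial and geometric formulas. This is a sketch in plain Python rather than the calculator's built-in functions; the helper names are illustrative.

```python
from math import comb

n, p = 25, 0.2

def binom_pmf(k, n, p):
    """P(X = k): exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): k or fewer successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def geom_pmf(k, p):
    """P(first success occurs on trial k)."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    """P(first success occurs on or before trial k)."""
    return 1 - (1 - p) ** k

p_exactly_8 = binom_pmf(8, n, p)        # about .062
p_at_most_6 = binom_cdf(6, n, p)        # about .780
p_first_on_4th = geom_pmf(4, p)         # about .102
p_first_within_6 = geom_cdf(6, p)       # about .738
expected_correct = n * p                # 5

print(round(p_exactly_8, 3), round(p_at_most_6, 3),
      round(p_first_on_4th, 3), round(p_first_within_6, 3), expected_correct)
```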
Major universities claim that 72% of their senior athletes graduate
that year. Fifty senior athletic students attending major universities
are randomly selected and recorded in order of selection.
What is the probability that exactly 40 senior athletic students graduate that
year?
What is the probability that 40 or 41 or 42 senior athletic students graduated
that year?
What is the probability that 40 or fewer senior athletic students graduated
that year?
What is the probability that 41 or more senior athletic students graduated
that year?
What is the probability that 40 or more senior athletic students graduated
that year?
What is the probability that the first senior athletic student to graduate in the
group of 50 that year is the 5th selected?
Major universities claim that 72% of their senior athletes graduate
that year. Fifty senior athletic students attending major universities
are randomly selected and recorded in order of selection.
1. There are only two possible outcomes for each senior athlete,
graduating or not graduating.
2. The trials are independent: Since the athletes are selected at
random, whether one senior graduates or not does not affect the
graduation status of the next selected athlete.
3. The probability does not remain exactly constant b/c the athletes
are selected without replacement. However, it is reasonable to
conclude that 50 senior athletes is less than 5% of
the entire population of senior athletes from all major
universities, so the change in probability is negligible.
4. The situations in which we select all fifty athletes and then
examine the results are binomial (fixed number of trials). Those
in which we select senior athletes until a desired result is
achieved are geometric (no fixed number of trials).
Major universities claim that 72% of their senior athletes graduate that year.
Fifty senior athletic students attending major universities are randomly
selected and recorded in order of selection.
Let G be the random variable, number of randomly selected senior athletic
students to graduate that year.
Binomial: P(G = 40) = .060
Binomial: P(40 ≤ G ≤ 42) = P(G ≤ 42) – P( G ≤ 39) = .118
Binomial: P(G ≤ 40) = .926
Binomial: P(G ≥ 41) = 1 – P(G ≤ 40) = .074
Binomial: P(G ≥ 40) = P(G = 40) + P(G ≥ 41) = .060 +.074 = .134
Geometric: P(The first selected athlete that graduated is the fifth
selection) = .004
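These answers follow from the same two formulas with n = 50 and p = .72. A minimal sketch, with illustrative helper names:

```python
from math import comb

n, p = 50, 0.72

def binom_pmf(k, n, p):
    """P(G = k): exactly k graduates among n selected athletes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(G <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

p_exactly_40 = binom_pmf(40, n, p)
p_40_to_42 = binom_cdf(42, n, p) - binom_cdf(39, n, p)
p_at_most_40 = binom_cdf(40, n, p)
p_at_least_41 = 1 - p_at_most_40
p_at_least_40 = 1 - binom_cdf(39, n, p)

# Geometric: four non-graduates, then the first graduate on selection 5.
p_first_grad_is_5th = (1 - p) ** 4 * p

for value in (p_exactly_40, p_40_to_42, p_at_most_40,
              p_at_least_41, p_at_least_40, p_first_grad_is_5th):
    print(round(value, 3))
```

Note that "40 or more" is the complement of "39 or fewer", which is why `binom_cdf(39, ...)` appears twice.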
Major universities claim that 72%...
What is the probability that the first senior athletic student to
graduate in the group of 50 that year is the 30th selected?
What is the probability that the first senior athletic student to
graduate in the group of 50 that year is within the first 10
selected?
What is the expected number of senior athletic students to
graduate that year?
What is the standard deviation of senior athletic students
graduating that year?
Major universities claim that 72%...
Geometric: P(the first senior athletic student to graduate is the
30th selected) ≈ 0 (Wow, what does that mean? Does it
make sense?)
Geometric Cumulative: P(the first senior athletic student to
graduate is within the first 10 selected) ≈ 1 (Was that
expected? What does it mean?)
µ(G) = np = 50*.72 = 36 senior athletes are expected to
graduate that year
σ(G) = sqrt[np(1-p)] = sqrt[50*.72*.28] = 3.175 senior athletes
is the typical amount of difference between the actual
number of selected senior athletic students graduating
that year and the expected number.
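The calculator rounds the first two answers to 0 and 1, but neither is exactly 0 or 1. The sketch below shows the unrounded values; the variable names are illustrative.

```python
from math import sqrt

n, p = 50, 0.72

# Geometric: 29 non-graduates in a row, then a graduate on selection 30.
p_first_on_30th = (1 - p) ** 29 * p

# Geometric cumulative: at least one graduate among the first 10 selected.
p_within_10 = 1 - (1 - p) ** 10

# Binomial mean and standard deviation for the count of graduates.
mean_g = n * p
sd_g = sqrt(n * p * (1 - p))

print(p_first_on_30th, p_within_10, mean_g, round(sd_g, 3))
```

The first probability is tiny but positive, and the second falls just short of 1, which is what one would expect when 72% of athletes graduate.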
Will Fumble is the only receiver for MHS football team
with the likelihood of catching a pass of .15.
What is the probability that 2 passes are caught out of 6 passes?
What is the probability that no passes are caught out of 6 passes?
What is the probability that only 0 or 1 pass is caught out of 6 passes?
What is the probability that 2 or fewer passes are caught out of 6
passes?
What is the probability that more than 2 passes are caught out of 6
passes?
What is the probability that the first pass caught is on the 1st pass?
Will Fumble is the only receiver for MHS football team with the
likelihood of catching a pass of .15.
1. There are only two possible outcomes per trial: Either Will
catches the pass or he doesn’t.
2. We must assume that we are randomly selecting the pass
attempts and that whether or not Will catches one pass has no
bearing on the result of the next pass attempt selected.
3. We must assume that Will never improves or gets worse at
catching passes so that the probability of catching a pass
remains constant.
4. Some of the problems are binomial b/c we are examining the
results of all 6 passes (fixed number of trials), others are
geometric because we are examining results as soon as certain
conditions are met (no fixed number of trials).
Will Fumble is the only receiver for MHS football team with the likelihood of
catching a pass of .15.
Let C be the random variable, number of randomly selected pass attempts to
Will that were completions.
Binomial: P(C = 2) = .176
Binomial: P(C = 0) = .377 (Does this number seem too high?)
Binomial: P(C = 0 or 1) = P(C = 0) + P(C = 1) = .776
Binomial: P(C = 2 or fewer) = P(C = 0) + P(C = 1) + P(C = 2) = .953
(adding the rounded values .776 + .176 gives .952)
Binomial: P(C = more than 2 caught) = 1 – .953 = .047
Geometric: P(The first pass caught is the first pass) = .15
(This should have been obvious! Why?)
Will Fumble is the only receiver for MHS football team
with the likelihood of catching a pass of
.15.
Geometric: P(The first pass caught is on the 4th pass) = .092
Geometric Cumulative: P(The first pass is caught within the first 3
attempts) = .386
Geometric Cumulative: P(The first pass is caught after the first 3
attempts) = 1 - P(The first pass is caught within the first 3
attempts) = 1 - .386 = .614
µ(C) = np = 6*.15 = .9 catches is the expected number of catches with 6
attempts
µ(Number of attempts required for first catch) = 1/p = 1/.15 = 6.667
attempts is the expected number of attempts required for the first
pass caught
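The Will Fumble answers can be recomputed without intermediate rounding. This is a sketch with illustrative names, under the slide's assumptions of independent attempts with a constant catch probability of .15.

```python
from math import comb

n, p = 6, 0.15

def binom_pmf(k, n, p):
    """P(C = k): exactly k catches in n attempts."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p0 = binom_pmf(0, n, p)
p1 = binom_pmf(1, n, p)
p2 = binom_pmf(2, n, p)
p_at_most_1 = p0 + p1
p_at_most_2 = p0 + p1 + p2
p_more_than_2 = 1 - p_at_most_2

# Geometric: first catch on attempt 4, and first catch within 3 attempts.
p_first_catch_on_4th = (1 - p) ** 3 * p
p_first_within_3 = 1 - (1 - p) ** 3
p_first_after_3 = 1 - p_first_within_3

expected_catches = n * p    # binomial mean
expected_attempts = 1 / p   # geometric mean

print(round(p2, 3), round(p0, 3), round(p_at_most_1, 3),
      round(p_at_most_2, 3), round(p_more_than_2, 3))
```

Carrying full precision, P(C ≤ 2) comes out near .953 and P(C > 2) near .047; summing values already rounded to three places shifts the last digit by one.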
Normal Distributions
“Z easiest kind of all!”
The Standard normal distribution:
A “bell curve” with µ = 0 and σ = 1.
It is essentially 6σ wide and the area under the
curve = 1 !
[Figure: standard normal curve, axis marked -3, -2, -1, 0, +1, +2, +3]
The Standard normal distribution:
When finding probabilities for a variable that has
a standard normal distribution, find the
corresponding area under the curve!
If x has a standard normal distribution,
find P(x<2).
[Figure: standard normal curve with the area to the left of 2 shaded, labeled P(x<2)]
Draw the Standard Normal Curve
and shade the probabilities.
P(z < 1)
P(z < -.34)
P(z > 1)
P(z > -2)
P(z 1.5)
P(-1.5 ≤ z ≤ 2.6)
P(-.75 <z < .25)
P(z > 0)
[Figure: blank standard normal curve, axis marked -3 to +3]
Using Z-Table
Find the two tables representing z-scores from
–3.49 to +3.49.
Notice which direction the area under the
curve is shaded (to the left).
The assigned probabilities represent the area
shaded under the curve to the left of the
z-score.
Reading a Z Table
P(z < -1.91) = .0281
P(z ≤ -1.74) = .0409

z      .00     .01     .02     .03     .04
-2.0   .0228   .0222   .0217   .0212   .0207
-1.9   .0287   .0281   .0274   .0268   .0262
-1.8   .0359   .0351   .0344   .0336   .0329
-1.7   .0446   .0436   .0427   .0418   .0409
Reading a Z Table
P(z < 1.81) = .9649
P(z ≤ 1.63) = .9484

z     .00     .01     .02     .03     .04
1.5   .9332   .9345   .9357   .9370   .9382
1.6   .9452   .9463   .9474   .9484   .9495
1.7   .9554   .9564   .9573   .9582   .9591
1.8   .9641   .9649   .9656   .9664   .9671
Other Probability statements
for continuous r.v.
P(X > b) = 1 – P(X < b)
same as
P(X ≥ b) = 1 – P(X < b)
P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a)
Really boring slide…
P(X = b) = 0
Find P(z < 1.85)
From Table: .9678
[Figure: standard normal curve with the area to the left of 1.85 shaded]
Find P(z > 1.85)
From Table:
= 1 – P(z < 1.85)
= 1 - .9678
= .0322
[Figure: standard normal curve with the area to the right of 1.85 shaded]
Find P(z < - .79)
From Table: .2148
[Figure: standard normal curve with the area to the left of -.79 shaded]
Find P( -.79 < z < 1.85)
= P(z < 1.85) - P(z < -.79)
= .9678 - .2148 = .7530
[Figure: standard normal curve with the area between -.79 and 1.85 shaded]
P(-1 < Z < 0)
Find the P(Z < 0).
Find the P(Z < -1).
Subtract.
P(Z < 0) - P(Z < -1) = .5000 - .1587 = .3413
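The table lookups can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later) in place of the printed Z table; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist()

p_below_185 = z.cdf(1.85)                  # area to the left of 1.85
p_above_185 = 1 - p_below_185              # area to the right of 1.85
p_below_neg079 = z.cdf(-0.79)              # area to the left of -.79
p_between = p_below_185 - p_below_neg079   # area between -.79 and 1.85
p_neg1_to_0 = z.cdf(0) - z.cdf(-1)         # area between -1 and 0

print(round(p_below_185, 4), round(p_above_185, 4),
      round(p_below_neg079, 4), round(p_neg1_to_0, 4))
```

Subtracting the unrounded areas gives about .7531 for the middle region; the table answer of .7530 differs only because each table entry is already rounded to four places.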
Non-standard Normal
Distributions
• Draw Normal Curve
– Label mean
– Label standard deviation
• Plot the value(s) of x.
• Shade the appropriate area.
• Find the z-score(s) corresponding to the
value(s) of x.
• Find the probability. From the table or the
calculator.
Calculating Z-scores
z = (x - µ) / σ
z is a standardized score (the distance of x from the
mean, measured in standard deviations; the standardized mean is 0).
x is the observed value of the random variable.
µ is the population mean of a normally distributed
model.
σ is the population standard deviation of a
normally distributed model
What is the probability that a
randomly selected student scored
below 73% on the Calculus exam
if the grades were normally
distributed with a mean of 78%
and a standard deviation of 4%?
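X has a normal distribution with
a mean of 78 and a standard
deviation of 4; P(x < 73) = ?
x = 73, µ = 78, σ = 4
1. Find the z-score.
z* = (x - µ)/σ = (73 - 78)/4 = -1.25
2. Find the probability.
P(x < 73) = P(z < -1.25) = .106
[Figure: normal curve with axis 66, 70, 74, 78, 82, 86, 90 and the area to the left of 73 shaded]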
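The two steps (standardize, then look up the area) can be sketched with `statistics.NormalDist`, which also allows the same probability to be computed directly on the original scale. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 78, 4   # exam mean and standard deviation
x = 73

# Step 1: z-score, the number of standard deviations x lies from the mean.
z_score = (x - mu) / sigma

# Step 2: area to the left of that z-score under the standard normal curve.
p_via_z = NormalDist().cdf(z_score)

# Same probability without standardizing first.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z_score, round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same area, about .1056, which rounds to the .106 shown above.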
Using TI-83 To Find Probability
of Normal Z Distribution
ShadeNorm(lower bd, upper bd, mean, std dev)
P(z < -1): ShadeNorm(-4, -1, 0, 1) gives .1586
P(-2 < z < -1): ShadeNorm(-2, -1, 0, 1) gives .1359
P(z > -1): ShadeNorm(-1, 4, 0, 1) gives .8413
Finding z when given P?
What does it mean to score at the 80th
percentile on the SAT exam?
How many standard deviations
from the mean was that SAT score?
Would the number of standard
deviations (z-score) be positive or negative?
P(Z < z*) = .80
What is the value of z*?
z     .03     .04     .05     .06
0.7   .7673   .7704   .7734   .7764
0.8   .7967   .7995   .8023   .8051
0.9   .8238   .8264   .8289   .8315

P(Z < .84) = .7995 and P(Z < .85) = .8023,
therefore P(Z < z*) = .80 has the value of
z* approximately .84
P(Z < z*) = .77
What is the value of z*?
z     .03     .04     .05     .06
0.7   .7673   .7704   .7734   .7764
0.8   .7967   .7995   .8023   .8051
0.9   .8238   .8264   .8289   .8315
What is the approximate value of z*?
93% of the students scored worse
than Devin Ned Integral on the
Calculus test mentioned earlier.
What was Devin’s score?
P(z < z*) = .93, z* ≈ 1.475
z* = (x – 78)/4, x = 83.9,
Devin’s score was
approximately 83.9% on
the Calculus test.
[Figure: normal curve with axis 66, 70, 74, 78, 82, 86, 90 and 83.9 marked]
Using TI-83 to find Z-Score
Command invNorm calculates the z-score when you
enter the area shaded under the curve to the far left.
Ex. invNorm(.5) yields 0, since P(z < 0) = .5
Ex. invNorm(.05) = -1.645, as P(z < -1.645) = .05
Ex. invNorm(.95) = 1.645, as P(z < 1.645) = .95
[Figure: normal curves for invNorm( ) with areas labeled .95, .05, .10]
Ex. invNorm(.95) = 1.645, since P(z > 1.645) = .05
Ex. invNorm(.05) = -1.645, as P(z > -1.645) = .95
Ex. invNorm(.90) = 1.282, as P(z > 1.282) = .10
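The same "area in, z-score out" lookup can be sketched with `NormalDist.inv_cdf` from the Python standard library, which plays the role of invNorm here. The Devin example reuses the exam mean of 78 and standard deviation of 4; the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal, like invNorm with the defaults

# inv_cdf takes the area to the LEFT and returns the matching z-score.
z_50 = std.inv_cdf(0.50)
z_05 = std.inv_cdf(0.05)
z_95 = std.inv_cdf(0.95)
z_90 = std.inv_cdf(0.90)

# Devin scored better than 93% of students: find z*, then undo the z-score.
z_93 = std.inv_cdf(0.93)
devin_score = 78 + 4 * z_93

print(round(z_05, 3), round(z_95, 3), round(z_90, 3),
      round(z_93, 3), round(devin_score, 1))
```

Because the input is always a left-tail area, a right-tail area such as .10 has to be converted first (1 - .10 = .90) before the lookup.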
Establishing Normality of Data
• Normality must be given or verified before a
normal distribution may be used.
• This can be done with a graph.
– Histogram is unimodal and symmetric, approximating a
bell curve.
– Box and Whisker plot is symmetrical without any
major outliers
• This can also be done with a Normal Probability
Plot.
[Cartoon: “That’s not normal.” “What’s your name?” “Major Outlier!”]
Normal Probability Plot
[Figure: normal probability plot of 2-DAY; Probability axis from .001 to .999, data axis from 150 to 350]
Average: 253.929
Std Dev: 47.7105
N of data: 28
Anderson-Darling Normality Test
A-Squared: 0.448
p-value: 0.259
Transformations to Achieve
Normality
• If X is skewed left,
– Square the data
– Cube the data
• See if the distribution of X² or X³ is more normal.
• If X is skewed right,
– Square root the data
– Cube root the data
– Log or ln the data
• See if the distribution of √X, ∛X, log X or ln X is more
normal.
Approximating a Binomial
Distribution
• A Binomial Distribution may be
approximated using a normal distribution
under certain conditions.
• If x has a binomial distribution (fixed number
of trials, each trial has only two outcomes,
trials are independent, probability of
success stays the same for each trial) and…
n·p ≥ 10
n(1-p) ≥ 10
• Then x has an approximately normal
distribution.
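As an illustration of the conditions, the sketch below borrows the numbers from the 2011 AP question earlier in the deck (104 flights, upgrade probability 0.10) and compares the exact binomial tail with the normal approximation. Choosing that example, and skipping any continuity correction, are simplifications made here rather than steps the slides prescribe.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 104, 0.10

# Check both conditions before using the normal model.
conditions_ok = n * p >= 10 and n * (1 - p) >= 10

mu = n * p
sigma = sqrt(n * p * (1 - p))

# Exact binomial: P(X > 20) = P(X = 21) + ... + P(X = 104).
exact_tail = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(21, n + 1)
)

# Normal approximation with the same mean and standard deviation.
approx_tail = 1 - NormalDist(mu, sigma).cdf(20)

print(conditions_ok, round(mu, 1), round(sigma, 3), exact_tail, approx_tail)
```

Both calculations put the chance of more than 20 upgrades well below 1%, so either model supports being surprised by that outcome.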