Transcript Unit 5

CHAPTER 5
Probability
Created by Kathy Fritz
Can ultrasound accurately predict the gender of a baby?
The paper “The Use of Three-Dimensional Ultrasound for
Fetal Gender Determination in the First Trimester” (The
British Journal of Radiology [2003]: 448-451) describes a
study of ultrasound gender prediction. An experienced
radiologist looked at 159 first trimester ultrasound images
and made a gender prediction for each one.
When each baby was born, the ultrasound gender
prediction was compared to the baby’s actual gender.
This table summarizes the resulting data:
Radiologist 1     Predicted Male   Predicted Female
Baby is Male      74               12
Baby is Female    14               59
Notice that the gender prediction based on the ultrasound
image is NOT always correct.
The paper also included gender predictions by a second
radiologist, who looked at 154 first trimester ultrasound
images.
Radiologist 2     Predicted Male   Predicted Female
Baby is Male      81               8
Baby is Female    7                58
INTERPRETING
PROBABILITIES
Probability
Relative Frequency
Law of Large Numbers
Basic Properties
PROBABILITY
We often find ourselves in situations where the outcome
is uncertain:
When a ticketed passenger shows up at the airport, she
faces two possible outcomes: (1) she is able to take the
flight, or (2) she is denied a seat as a result of
overbooking by the airline and must take a later flight.
Based on her past experience, the passenger believes that
the chance of being denied a seat is small or unlikely.
SUBJECTIVE APPROACH TO
PROBABILITY
The subjective interpretation of probability is
A probability of 1
A probability of 0
Because different people may have different
subjective beliefs, they may assign different
probabilities to the same outcome.
RELATIVE FREQUENCY
APPROACH
In the relative frequency interpretation of probability,
Relative frequency can be computed by:
A package delivery service promises 2-day delivery
between 2 cities in California but is often able to deliver
the packages in just 1 day. The company reports that the
probability of next-day delivery is 0.3.
Suppose that you track the delivery of packages shipped
with this company. With each new package shipped, you
could compute the relative frequency of packages shipped
so far that have arrived in 1 day:
relative frequency = (number of packages that arrived in 1 day) / (total number of packages shipped)
Here is a graph displaying the relative frequencies for
each of the first 15 packages shipped.
Here is a graph displaying the relative frequencies for
each of the first 50 packages shipped.
Here is a graph displaying the relative frequencies for
each of the first 1000 packages shipped.
LAW OF LARGE NUMBERS
Chance behavior is unpredictable in the short
run, but has a regular and predictable pattern in
the long run.
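The long-run behavior described here can be illustrated with a short simulation. This is a minimal sketch in Python, not part of the original slides. It assumes each package independently arrives in 1 day with probability 0.3; the function name and the fixed seed are chosen only for illustration.

```python
import random

def running_relative_frequency(p, n, seed=0):
    """Simulate n shipments; each arrives in 1 day with probability p.

    Returns the relative frequency of 1-day arrivals after each shipment.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    arrived = 0
    freqs = []
    for shipped in range(1, n + 1):
        if rng.random() < p:
            arrived += 1
        freqs.append(arrived / shipped)
    return freqs

freqs = running_relative_frequency(p=0.3, n=10000)
for n in (15, 50, 1000, 10000):
    print(n, round(freqs[n - 1], 3))
```

Early relative frequencies can wander far from 0.3, while the values for large n should settle close to it.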
PROBABILITY MODELS
Descriptions of chance behavior contain two parts:
Example: Roll the Dice
Give a probability model for the chance process of rolling two
fair, six-sided dice – one that’s red and one that’s blue.
PROBABILITY MODELS
Probability models allow us to find the
probability of any collection of outcomes.
Example: Roll the Dice
If A is any event, we write its probability as P(A).
In the dice-rolling example, suppose we define event A as “sum is 5.”
There are 4 outcomes that result in a sum of 5.
Since each outcome has probability 1/36, P(A) = 4/36.
Suppose event B is defined as “sum is not 5.” What is P(B)?
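The dice model can be checked by listing the sample space directly. The sketch below is an illustration in Python, not part of the original slides. It assumes both dice are fair, so all 36 (red, blue) pairs are equally likely; the helper name prob is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (red, blue) pair, 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))

p_a = prob(lambda o: o[0] + o[1] == 5)   # event A: sum is 5
p_b = prob(lambda o: o[0] + o[1] != 5)   # event B: sum is not 5
print(p_a, p_b)  # 1/9 8/9
```

Because B is the complement of A, the two probabilities add to 1.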
SOME BASIC PROPERTIES OF
PROBABILITY
All probability models must obey the following:
A large auto center sells cars made by many
different manufacturers. Two of these are Honda
and Toyota.
Suppose: P(Honda) = 0.25 and P(Toyota) = 0.14
Consider the make of the next car sold.
What is the probability that the next car sold is
either a Honda or a Toyota?
SOME BASIC PROPERTIES OF
PROBABILITY
4.
Recall the car dealership (P(Honda) = 0.25):
What is the probability that the next car sold is
not a Honda?
SOME BASIC PROPERTIES OF
PROBABILITY
Basic Rules of Probability
COMPUTING
PROBABILITIES
Chance Experiment
Sample Space
Event
Classical Approach to Probability
Chance Experiment
A chance experiment is
Suppose two six-sided dice are rolled and they
both land on sixes.
Or a coin is flipped and it lands on heads.
Or record the color of the next 20 cars to pass
an intersection.
Sample Space
Consider a chance experiment to investigate whether
men or women are more likely to choose a hybrid engine
over a traditional internal combustion engine when
purchasing a Honda Civic at a particular dealership. The
type of vehicle purchased (hybrid or traditional) will be
determined and the customer’s gender will be recorded.
A list of all possible outcomes is:
Chance Experiment
Recall the situation in which a person purchases a Honda
Civic:
Sample space = {MH, FH, MT, FT}
Identify the following events:
traditional =
female =
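One way to identify these events is to filter the sample space. This is a small sketch in Python, assuming the first letter of each outcome is the customer's gender (M or F) and the second is the engine type (H or T), as the sample space suggests.

```python
# Sample space from the slide: gender (M/F) followed by engine type (H/T).
sample_space = {"MH", "FH", "MT", "FT"}

# An event is a subset of the sample space.
traditional = {o for o in sample_space if o[1] == "T"}
female = {o for o in sample_space if o[0] == "F"}

print(sorted(traditional))  # ['FT', 'MT']
print(sorted(female))       # ['FH', 'FT']
```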
Classical Approach to Probability
When the outcomes in the sample space of a
chance experiment are equally likely, the
probability of an event E, denoted by P(E), is the
ratio of the number of outcomes favorable to E to
the total number of outcomes in the sample space:
Four students (Adam (A), Bettina (B), Carlos (C), and
Debra (D)) submitted correct solutions to a math contest
that had two prizes. The contest rules specify that if more
than two correct responses are submitted, the winners will
be selected at random from those submitting correct
responses.
What is the sample space for selecting the two winners
from the four correct responses?
Because the winners are selected at random,
the six possible outcomes are equally likely.
Let E be the event that both selected winners are the same
sex.
What is the probability of E?
Let F be the event that at least one of the selected winners
is female.
What is the probability of F?
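The contest example can be worked by enumerating the six equally likely pairs of winners. The sketch below is an illustration in Python. It assumes Adam and Carlos are male and Bettina and Debra are female, which the slides imply but do not state; the helper name prob is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: A (Adam) and C (Carlos) are male; B (Bettina) and D (Debra) are female.
sex = {"A": "M", "B": "F", "C": "M", "D": "F"}

# Sample space: the 6 unordered pairs of winners.
outcomes = list(combinations("ABCD", 2))

def prob(event):
    """Classical probability for equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_e = prob(lambda o: sex[o[0]] == sex[o[1]])          # E: both winners same sex
p_f = prob(lambda o: "F" in (sex[o[0]], sex[o[1]]))   # F: at least one female
print(p_e, p_f)  # 1/3 5/6
```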
RELATIVE FREQUENCY APPROACH TO
PROBABILITY
The probability of an event E, denoted by P(E), is
defined to be the value approached by the
relative frequency of occurrence of E in a very
long series of observations from a chance
experiment. If the number of observations is large,
P(E) ≈ (number of times E occurs) / (number of repetitions)
Suppose that you perform a chance experiment
that consists of flipping a cap from a 20-ounce
bottle of soda and noting whether the cap lands
with the top up or down.
You carry out this chance experiment by flipping
the cap 1000 times and record if it lands top up or
top down. The cap lands top up 694 times.
PROBABILITIES OF
MORE COMPLEX
EVENTS
Union
Intersection
Complement
Mutually Exclusive Events
Independent Events
Consider the chance experiment that consists of selecting
a student at random from those enrolled at a particular
college.
There are 9000 students enrolled at the college.
Here are some possible events:
F = event that the selected student is female
O = event that the selected student is older than 30
A = event that the selected student favors the expansion of the athletic
program
S = event that the selected student is majoring in one of the lab
sciences
Complement
The probability of E^C can be computed from
the probability of E as follows:
P(E^C) = 1 - P(E)
Suppose that 4300 of the 9000 students favor the expansion
of the athletic program.
What is the probability of event A not occurring?
Intersection
Consider the events:
O = event that the selected student is older than 30
S = event that the selected student is majoring in one of the
lab sciences
This table summarizes the occurrence of these events:
Intersection
                    S (Majoring in Lab Science)   S^C (Not Majoring in Lab Science)   Total
O (Over 30)         400                           1700                                2100
O^C (Not over 30)   1100                          5800                                6900
Total               1500                          7500                                9000
What is the probability that a randomly selected student is
older than 30 AND is majoring in a lab science?
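The intersection probability can be read straight from the table counts. This is a minimal sketch in Python, using the counts from the table; the dictionary layout is an assumption made for illustration.

```python
from fractions import Fraction

# Counts from the table: rows are O / not O, columns are S / not S.
counts = {
    ("O", "S"): 400, ("O", "not S"): 1700,
    ("not O", "S"): 1100, ("not O", "not S"): 5800,
}
total = sum(counts.values())

# P(O and S): students who are both over 30 and lab science majors.
p_o_and_s = Fraction(counts[("O", "S")], total)
print(total, p_o_and_s, round(float(p_o_and_s), 3))
```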
Union
Consider the events:
O = event that the selected student is older than 30
A = event that the selected student favors the expansion of
the athletic program
This table summarizes the occurrence of these events:
Union
                    A (Favors Expansion)   A^C (Does Not Favor Expansion)   Total
O (Over 30)         1600                   500                              2100
O^C (Not over 30)   2700                   4200                             6900
Total               4300                   4700                             9000
What is the probability that a randomly selected student is
older than 30 OR favors the expansion of the athletic
program?
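The union can be computed two ways from the table counts: by adding the three cells that belong to O or A, or by the general addition rule. The sketch below is an illustration in Python; the variable names are assumptions.

```python
from fractions import Fraction

# Counts from the table: rows are O / not O, columns are A / not A.
o_a, o_not_a = 1600, 500
not_o_a, not_o_not_a = 2700, 4200
total = o_a + o_not_a + not_o_a + not_o_not_a

# Direct count: every cell that lies in O or in A.
p_union_direct = Fraction(o_a + o_not_a + not_o_a, total)

# General addition rule: P(O or A) = P(O) + P(A) - P(O and A).
p_o = Fraction(o_a + o_not_a, total)
p_a = Fraction(o_a + not_o_a, total)
p_o_and_a = Fraction(o_a, total)
p_union_rule = p_o + p_a - p_o_and_a

print(p_union_direct, p_union_rule, round(float(p_union_rule), 3))
```

Both routes give the same value. Adding P(O) and P(A) alone would count the 1600 students in both events twice.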
Two-Way Tables and Probability
Note, the previous example illustrates the fact that we can’t use
the addition rule for mutually exclusive events unless the
events have no outcomes in common.
The Venn diagram below illustrates why.
General Addition Rule for Two Events
If A and B are any two events resulting from some chance process, then
P(A or B) = P(A) + P(B) – P(A and B)
Venn Diagrams and Probability
Because Venn diagrams have uses in other
branches of mathematics, some standard
vocabulary and notation have been developed.
Venn Diagrams and Probability
Hint: To keep the symbols straight, remember ∪ for union and ∩ for
intersection.
Venn Diagrams and Probability
Recall the example on gender and pierced ears. We can use a Venn
diagram to display the information and determine probabilities.
Define events A: is male and B: has pierced ears.
Hypothetical 1000
You can use tables to compute the probability of
an intersection of two events and the probability
of a union of two events.
In many situations, you may ONLY know the
probabilities of some events. In this case, it is
often possible to create a “hypothetical 1000”
table and then use the table to compute
probabilities.
The report “TV Drama/Comedy Viewers and Health Information”
(www.cdc.gov/healthmarketing) describes a large survey that
was conducted by the Centers for Disease Control (CDC). The
CDC believed that the sample was representative of adult
Americans.
Let’s investigate these events (taken from questions on the
survey):
L = event that a randomly selected adult American reports
learning something new about a health issue or disease
from a TV show in the previous 6 months.
F = event that a randomly selected adult American is
female.
Data from the survey were used to estimate the following
probabilities:
P(L) = 0.58, P(F) = 0.5, P(L ∩ F) = 0.31
CDC study continued
P(L) = 0.58, P(F) = 0.5, P(L ∩ F) = 0.31
                      F (female)   Not F   Total
L (learned from TV)
Not L
Total
What is the probability that a randomly selected adult
American has learned something new about a health
issue or disease from a TV show in the previous 6 months
or is female?
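A "hypothetical 1000" table can be filled in mechanically from the three given probabilities. The sketch below is one possible way to do it in Python, not the slides' own worked solution; the function name and dictionary keys are hypothetical.

```python
def hypothetical_table(p_a, p_b, p_a_and_b, n=1000):
    """Fill in a 2x2 table of counts for n hypothetical individuals."""
    both = round(p_a_and_b * n)
    a_only = round(p_a * n) - both   # in A but not in B
    b_only = round(p_b * n) - both   # in B but not in A
    neither = n - both - a_only - b_only
    return {"both": both, "a_only": a_only, "b_only": b_only, "neither": neither}

# Here A = L (learned from TV) and B = F (female).
table = hypothetical_table(p_a=0.58, p_b=0.5, p_a_and_b=0.31)
p_l_or_f = (table["both"] + table["a_only"] + table["b_only"]) / 1000
print(table)     # {'both': 310, 'a_only': 270, 'b_only': 190, 'neither': 230}
print(p_l_or_f)  # 0.77
```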
Let’s look at the hypothetical table once more.
Suppose: P(A) = 0.6, P(B^C) = 0.7, and P(A ∩ B) = 0.2
        A    A^C   Total
B
B^C
Total
What is the probability of A or B happening?
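This exercise gives P(B^C) rather than P(B), so the complement has to be converted first. A short sketch in Python using counts out of a hypothetical 1000 follows; the variable names are assumptions.

```python
n = 1000                 # hypothetical 1000 individuals
a_total = 600            # P(A) = 0.6
b_total = n - 700        # P(B^C) = 0.7, so 300 individuals are in B
a_and_b = 200            # P(A and B) = 0.2

# General addition rule applied to counts.
a_or_b = a_total + b_total - a_and_b
p_a_or_b = a_or_b / n
print(p_a_or_b)  # 0.7
```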
Mutually Exclusive Events
Sometimes people call the emergency 9-1-1 number to
report situations that are not considered emergencies
(such as to report a lost dog). Let two events be:
M = event that the next call to 9-1-1 is for a medical
emergency
N = event that the next call to 9-1-1 is not considered
an emergency
Suppose that you know P(M) = 0.30 and P(N) = 0.20.
Events M and N are mutually exclusive because the
next call can’t be both a medical emergency and a call
that is not considered an emergency.
Mutually Exclusive Events
P(M) = 0.30 and P(N) = 0.20
P(M ∩ N) = 0
P(M ∪ N) = (300 + 200) / 1000 = 0.50
                        N (Non-emergency)   Not N   Total
M (Medical Emergency)   0                   300     300
Not M                   200                 500     700
Total                   200                 800     1000
Addition Rule for Mutually Exclusive
Events
If E and F are mutually exclusive events, then
P(E ∩ F) = 0
and
P(E ∪ F) = P(E) + P(F)
INDEPENDENT EVENTS
Suppose that you purchase a desktop computer
system with a separate monitor and keyboard. Two
possible events are:
Event 1: The monitor needs service while under
warranty.
Event 2: The keyboard needs service while
under warranty.
DEPENDENT EVENTS
Consider a university’s course registration process, which divides
students into 12 priority groups. Overall, only 10% of all
students receive all requested classes, but 75% of those in the
first priority group receive all requested classes.
You would say that the probability that a randomly selected
student at this university receives all requested classes is 0.10.
However, if you know that the selected student is in the first
priority group, you revise the probability that the student
receives all requested classes to 0.75.
These two events are said to be dependent events.
MULTIPLICATION RULE FOR TWO
INDEPENDENT EVENTS
If E and F are independent events, then P(E ∩ F) = P(E) P(F).
More generally, if there are k independent events, the
probability that all the events occur is the product of all
individual event probabilities.
The Diablo Canyon nuclear power plant in California has a
warning system that includes a network of sirens. When the
system is tested, individual sirens sometimes fail. The sirens
operate independently of one another.
Imagine that you live near Diablo Canyon and that there are two
sirens that can be heard from your home. You might be
concerned about the probability that both Siren 1 and Siren 2
fail. (When the siren system is activated, about 5% of the
individual sirens fail.)
Using the multiplication rule for independent events:
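The calculation this slide sets up can be sketched as follows. This is an illustration in Python, assuming each siren fails with probability 0.05 independently of the other; the function name is hypothetical.

```python
import math

def prob_all_occur(probabilities):
    """Multiplication rule for independent events: multiply the probabilities."""
    return math.prod(probabilities)

# Siren 1 fails AND Siren 2 fails, each with probability 0.05.
p_both_fail = prob_all_occur([0.05, 0.05])
print(round(p_both_fail, 4))  # 0.0025
```

The same function handles k independent events by passing k probabilities.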
CONDITIONAL
PROBABILITY
Sometimes the knowledge that one event has occurred
changes our assessment of the likelihood that another
event occurs.
Consider a population in which 0.1% of all the
individuals have a certain disease. The presence of the
disease cannot be discerned from appearances, but
there is a diagnostic test available. Unfortunately, the
test is not always correct.
Suppose that 80% of those with positive test results
actually have the disease and the other 20% of those
with positive test results actually do NOT have the
disease (false positive).
Disease example continued . . .
Consider the chance experiment in which an
individual is randomly selected from the population.
Let:
E = event that the individual has the disease
F = event that the individual's diagnostic test is
positive
P(E|F) denotes the probability that
event E (has disease) GIVEN that event
F (tested positive) occurs.
CONDITIONAL PROBABILITY
Recall the example in the Chapter Preview section about
gender predictions based on ultrasounds performed
during the first trimester of pregnancy. The table below
summarizes the data for Radiologist 1.
Radiologist 1     Predicted Male   Predicted Female   Total
Baby is Male      74               12                 86
Baby is Female    14               59                 73
Total             88               71                 159
How likely is it that a predicted gender is correct?
Gender prediction example continued.
Is a predicted gender more likely to be correct when
the baby is male than when the baby is female?
Gender prediction example continued.
If the predicted gender is female, should you paint the
nursery pink?
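The three questions about Radiologist 1 each restrict attention to a different row, column, or cell set of the table. The sketch below computes them in Python from the table counts; the variable names are assumptions, and the slides' own worked answers are not shown in this transcript.

```python
from fractions import Fraction

# Counts for Radiologist 1: (actual gender, predicted gender).
male_pm, male_pf = 74, 12
female_pm, female_pf = 14, 59
total = male_pm + male_pf + female_pm + female_pf

# P(prediction is correct): correct cells over all babies.
p_correct = Fraction(male_pm + female_pf, total)

# Conditional probabilities restrict the denominator to the given group.
p_correct_given_male = Fraction(male_pm, male_pm + male_pf)
p_correct_given_female = Fraction(female_pf, female_pm + female_pf)
p_female_given_pred_female = Fraction(female_pf, male_pf + female_pf)

for name, p in [
    ("P(correct)", p_correct),
    ("P(correct | male)", p_correct_given_male),
    ("P(correct | female)", p_correct_given_female),
    ("P(female | predicted female)", p_female_given_pred_female),
]:
    print(name, "=", p, "≈", round(float(p), 3))
```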
Let’s take the gender prediction example a little further.
Suppose that two radiologists both work in the same
clinic; Radiologist 1 works part-time while Radiologist 2
(from the Chapter Preview section) works full-time.
P(prediction is made by Radiologist 1) = 0.30
P(prediction is made by Radiologist 2) = 0.70
Let’s answer these questions:
1. What is the probability that a gender prediction based
on a first-trimester ultrasound at this clinic is correct?
2. If the first-trimester ultrasound gender prediction is
incorrect, what is the probability that the prediction
was made by Radiologist 2?
Gender prediction example continued.
From the data we know:
P(prediction is correct | prediction made by Radiologist 1) = 0.836
P(prediction is correct | prediction made by Radiologist 2) = 0.903
P(prediction is made by Radiologist 1) = 0.30
P(prediction is made by Radiologist 2) = 0.70
                Prediction Correct   Prediction Incorrect   Total
Radiologist 1                                               300
Radiologist 2                                               700
Total                                                       1000
What is the probability that a gender prediction based on
a first-trimester ultrasound at this clinic is correct?
If the first-trimester ultrasound gender prediction is
incorrect, what is the probability that the prediction was
made by Radiologist 2?
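Both questions can be answered from the four given probabilities. The sketch below works with the probabilities directly instead of rounded table counts, so its results may differ slightly from a hand-filled hypothetical 1000 table. It is an illustration in Python with assumed variable names.

```python
# Given probabilities from the slide.
p_r1, p_r2 = 0.30, 0.70
p_correct_given_r1, p_correct_given_r2 = 0.836, 0.903

# Question 1: weight each radiologist's accuracy by their share of predictions.
p_correct = p_r1 * p_correct_given_r1 + p_r2 * p_correct_given_r2

# Question 2: of all incorrect predictions, the share made by Radiologist 2.
p_incorrect = 1 - p_correct
p_r2_and_incorrect = p_r2 * (1 - p_correct_given_r2)
p_r2_given_incorrect = p_r2_and_incorrect / p_incorrect

print(round(p_correct, 4))             # 0.8829
print(round(p_r2_given_incorrect, 3))  # 0.58
```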
CALCULATING
PROBABILITIES –
A MORE FORMAL APPROACH
PROBABILITY FORMULAS
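The Complement Rule
P(E^C) = 1 - P(E)
The Addition Rule
P(E ∪ F) = P(E) + P(F) - P(E ∩ F)
For mutually exclusive events, this simplifies to
P(E ∪ F) = P(E) + P(F)
PROBABILITY FORMULAS
CONTINUED
The Multiplication Rule
For any two events E and F,
P(E ∩ F) = P(E | F) P(F)
For independent events, this simplifies to
P(E ∩ F) = P(E) P(F)
Conditional Probabilities
For any two events E and F with P(F) ≠ 0,
P(E | F) = P(E ∩ F) / P(F)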
Revisit CDC’s study . . .
Recall:
L = event that a randomly selected adult American reports learning
something new about a health issue or disease from a TV show in
the previous 6 months.
F = event that a randomly selected adult American is female.
Data from the survey were used to estimate the following
probabilities:
P(L) = 0.58, P(F) = 0.5, P(L ∩ F) = 0.31
What is the probability that a randomly selected adult
American reports learning something new about a health
issue or disease from a TV show in the previous 6 months or
that a randomly selected adult American is female?
The article “Chances Are You Know Someone with a Tattoo, and
He’s Not a Sailor” (Associated Press, June 11, 2006) summarized
data from a representative sample of adults ages 18 to 50.
T = the event that a randomly selected person has a tattoo
A = the event that a randomly selected person is between 18 and
29 years old
The following probabilities were estimated based on data
from the sample:
P(T) = 0.24, P(A) = 0.50, P(T ∩ A) = 0.18
ANOTHER APPROACH TO
PROBABILITY
A large electronics store sells two different portable DVD players,
Brand 1 and Brand 2. Based on past records, the store manager
reports that 70% of the DVD players sold are Brand 1 and 30% are
Brand 2.
The manager also reports that 20% of the people who buy
Brand 1 also purchase an extended warranty, and 40% of the
people who buy Brand 2 purchase an extended warranty.
Consider selecting a person at random from those who
purchased a DVD player from this store. What is the
probability that the person purchased an extended warranty?
DVD Players Continued
P(Brand 1) = 0.7
P(Brand 2) = 0.3
                               Brand 1   Brand 2   Total
Bought Extended Warranty
Not Bought Extended Warranty
Total
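Tree diagram: Brand 1 (B1 = 0.7) branches to E = 0.2 and E^C = 0.8;
Brand 2 (B2 = 0.3) branches to E = 0.4 and E^C = 0.6.
Multiply along a path ("and"); add across paths ("or").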
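The tree-diagram calculation multiplies along each branch and adds the branches that end in a warranty purchase. This is a minimal sketch in Python with assumed names, using the probabilities reported by the manager.

```python
# For each brand: (P(brand), P(extended warranty | brand)).
brands = {"Brand 1": (0.7, 0.2), "Brand 2": (0.3, 0.4)}

# "And" along each path, then "or" across the two paths.
p_warranty = sum(p_brand * p_w for p_brand, p_w in brands.values())
print(round(p_warranty, 2))  # 0.26
```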
PROBABILITY AS A
BASIS FOR MAKING
DECISIONS
PROBABILITY PLAYS AN IMPORTANT ROLE
IN DRAWING CONCLUSIONS FROM DATA.
A professor planning to give a quiz that consists of 20 true-false
questions is interested in knowing how someone who
answers by guessing would do on such a quiz.
To investigate, he asks the 500 students in his introductory
psychology course to write the numbers from 1 to 20 on a
piece of paper and then to arbitrarily write T or F next to each
number.
The students are forced to guess at the answer to each
question, because they are not even told what the questions
are!
These answers are then collected and graded using the key for
the quiz.
Quiz example continued.
Number of   Number of   Proportion of     Number of   Number of   Proportion of
Correct     Students    Students          Correct     Students    Students
Responses                                 Responses
0           0           0.000             11          79          0.158
1           0           0.000             12          61          0.122
2           1           0.002             13          39          0.078
3           1           0.002             14          18          0.036
4           2           0.004             15          7           0.014
5           8           0.016             16          1           0.002
6           18          0.036             17          1           0.002
7           37          0.074             18          0           0.000
8           58          0.116             19          0           0.000
9           81          0.162             20          0           0.000
10          88          0.176
Would you be surprised if someone guessing on a 20-question
true-false quiz got only 3 correct?
Quiz example continued.
If a score of 15 or more correct is a passing grade on the quiz,
is it likely that someone who is guessing will pass?
It would be unlikely that a student who is guessing would be
able to pass.
Quiz example continued.
The professor actually gives the quiz, and a student scores 16
correct. Do you think that the student was just guessing?
Quiz example continued.
What score on the quiz would it take to convince you that a
student was not just guessing?
Score          Approximate Probability
20             0.000
19 or better   0.000 + 0.000 = 0.000
18 or better   0.000 + 0.000 + 0.000 = 0.000
17 or better   0.002 + 0.000 + 0.000 + 0.000 = 0.002
16 or better   0.002 + 0.002 + 0.000 + 0.000 + 0.000 = 0.004
15 or better   0.014 + 0.002 + 0.002 + 0.000 + 0.000 + 0.000 = 0.018
14 or better   0.036 + 0.014 + 0.002 + 0.002 + 0.000 + 0.000 + 0.000 = 0.054
13 or better   0.078 + 0.036 + 0.014 + 0.002 + 0.002 + 0.000 + 0.000 + 0.000 = 0.132
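The "or better" probabilities are running sums of the proportions in the quiz table. The sketch below reproduces them in Python from the student counts; the function names are hypothetical.

```python
# Number of students (out of 500) with 0, 1, ..., 20 correct responses.
students = [0, 0, 1, 1, 2, 8, 18, 37, 58, 81, 88,
            79, 61, 39, 18, 7, 1, 1, 0, 0, 0]
total = sum(students)

def prob_at_least(score):
    """Estimated probability that a guesser gets `score` or more correct."""
    return sum(students[score:]) / total

def prob_at_most(score):
    """Estimated probability that a guesser gets `score` or fewer correct."""
    return sum(students[:score + 1]) / total

for score in (13, 14, 15, 16, 17):
    print(score, "or better:", round(prob_at_least(score), 3))
print("3 or fewer:", round(prob_at_most(3), 3))
```

These are empirical estimates from 500 guessers, so very small tail probabilities are only approximate.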
ESTIMATING PROBABILITIES
EMPIRICALLY AND USING
SIMULATION
ESTIMATING PROBABILITIES
EMPIRICALLY
It is fairly common practice to use observed long-run
proportions to estimate probabilities.
The process used to estimate probabilities is simple:
1. Observe a large number of chance outcomes under
controlled circumstances.
2. Interpreting probability as a long-run relative
frequency, estimate the probability of an event by
using the observed proportion of occurrence.
To recruit a new faculty member, a university biology department
intends to advertise for someone with a Ph.D. in biology and at least 10
years of college-level teaching experience. A member of the
department expresses the belief that requiring at least 10 years of
teaching experience will exclude most potential applicants and will
exclude more female applicants than male applicants.
The biology department would like to determine the probability an
applicant would be excluded because of the experience requirement.
A similar university just completed a search in which there was no
requirement for prior teaching experience. However, prior teaching
experience was recorded. The resulting data is summarized in the
following table.
Number of Applicants   Less than 10 years experience   10 or more years experience   Total
Male                   178                             112                           290
Female                 99                              21                            120
Total                  277                             133                           410
New faculty member example continued.
Now let’s determine if more females than males are
excluded due to the experience requirement.
Number of Applicants   Less than 10 years experience   10 or more years experience   Total
Male                   178                             112                           290
Female                 99                              21                            120
Total                  277                             133                           410
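The empirical estimates for this example come from dividing counts in the table. The sketch below is an illustration in Python. It uses only the four interior cell counts, and the dictionary layout is an assumption.

```python
from fractions import Fraction

# Applicant counts by sex and years of teaching experience.
applicants = {
    "Male": {"under_10": 178, "10_plus": 112},
    "Female": {"under_10": 99, "10_plus": 21},
}

total = sum(sum(row.values()) for row in applicants.values())
excluded = sum(row["under_10"] for row in applicants.values())

# Estimated probability that an applicant is excluded by the requirement.
p_excluded = Fraction(excluded, total)

# Estimated probability of exclusion within each group.
p_excluded_given = {
    sex: Fraction(row["under_10"], sum(row.values()))
    for sex, row in applicants.items()
}

print("P(excluded) ≈", round(float(p_excluded), 3))
for sex, p in p_excluded_given.items():
    print(f"P(excluded | {sex}) ≈", round(float(p), 3))
```

Comparing the two conditional estimates addresses whether the requirement excludes a larger share of female applicants than male applicants.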
ESTIMATING PROBABILITIES BY USING
SIMULATION
Simulation provides a way to estimate probabilities
when:
• You are unable to determine probabilities
analytically
• You do not have the time or resources to
determine probabilities
• It is impractical to estimate probabilities
empirically by observation
Simulation involves generating “observations” in a
situation that is similar to the real situation of interest.
USING SIMULATION TO APPROXIMATE A
PROBABILITY
1. Design a method that uses a random mechanism (such as
a random number generator or table, the selection of a ball
from a box, or the toss of a coin) to represent an observation.
Be sure that the important characteristics of the actual
process are preserved.
2. Generate an observation using the method in Step 1, and
determine if the event of interest has occurred.
3. Repeat Step 2 a large number of times.
4. Calculate the estimated probability by dividing the number
of observations for which the event of interest occurred by
the total number of observations generated.
Suppose that couples who wanted children were to
continue having children until a boy was born. Would
this change the proportion of boys in the population?
We will use simulation to estimate the proportion of boys
in the population if couples were to continue having
children until a boy was born.
1. You can use a single random digit to represent a
child, where odd digits represent a male birth and
even digits represent a female birth.
2. An observation is constructed by selecting a
sequence of random digits. If the first random
number obtained is odd (a boy), the observation is
complete. If the first random number obtained is
even (a girl), another digit is chosen. You would
continue in this way until an odd digit is obtained.
Baby Boy Simulation Continued . . .
Below are four rows from the random digit table.
Row   Digits
6     0 9 3 8 7 6 7 9 9 5 6 2 5 6 5 8 4 2 6 4
7     4 1 0 1 0 2 2 0 4 7 5 1 1 9 4 7 9 7 5 1
8     6 4 7 3 6 3 4 5 1 2 3 1 1 8 0 0 4 8 2 0
9     8 0 2 8 7 9 3 8 4 0 4 2 0 8 9 1 2 3 3 2
Trial 1:
Trial 2:
Trial 3:
Trial 4:
Trial 5:
Trial 6:
Trial 7:
Trial 8:
Trial 9:
Trial 10:
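The same simulation can be run with a random number generator in place of the random digit table. This is a sketch in Python, not the slides' own procedure. It assumes boys and girls are equally likely at each birth, and the function names and seed are hypothetical.

```python
import random

def simulate_family(rng):
    """Return the number of children born until the first boy."""
    children = 0
    while True:
        children += 1
        digit = rng.randrange(10)   # one random digit, 0 through 9
        if digit % 2 == 1:          # odd digit represents a boy
            return children

def proportion_of_boys(n_families, seed=0):
    """Estimate the proportion of boys among all children born."""
    rng = random.Random(seed)
    total_children = sum(simulate_family(rng) for _ in range(n_families))
    # Each family stops at its first boy, so it has exactly one boy.
    return n_families / total_children

print(round(proportion_of_boys(100_000), 3))
```

With many simulated families the estimate should stay close to 0.5, which suggests the stopping rule does not change the proportion of boys.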
• Example: Golden Ticket Parking Lottery
Read the example on the handout.
What is the probability that a fair lottery would result in two
winners from the AP Statistics class?
Students              Labels
AP Statistics Class   01-28
Other                 29-95
Reading across row 139 in Table D,
look at pairs of digits until you see
two different labels from 01-95.
Record whether or not both
winners are members of the AP
Statistics Class.
Skip numbers from 96-00
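Marks: ✓ = AP Statistics class, X = other, Sk = skipped
Trial   Digits          Marks          Both from AP Statistics?
1       55 | 58         X | X          No
2       89 | 94         X | X          No
3       04 | 70         ✓ | X          No
4       70 | 84         X | X          No
5       10 | 98 | 43    ✓ | Sk | X     No
6       56 | 35         X | X          No
7       69 | 34         X | X          No
8       48 | 39         X | X          No
9       45 | 17         X | ✓          No
10      19 | 12         ✓ | ✓          Yes
11      97 | 51 | 32    Sk | X | X     No
12      58 | 13         X | ✓          No
13      04 | 84         ✓ | X          No
14      51 | 44         X | X          No
15      72 | 32         X | X          No
16      18 | 19         ✓ | ✓          Yes
17      40 | 00 | 36    X | Sk | X     No
18      00 | 24 | 28    Sk | ✓ | ✓     Yes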
Based on 18 repetitions of our simulation, both winners came from the AP
Statistics class 3 times, so the probability is estimated as 16.67%.
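The lottery simulation can be repeated many more times in software. The sketch below is an illustration in Python. It assumes 95 students with labels 01-95, of whom 28 are in the AP Statistics class, and two different winners drawn at random, as the label table suggests; the handout itself is not reproduced here.

```python
import random
from fractions import Fraction

CLASS_SIZE = 28   # labels 01-28: AP Statistics class
TOTAL = 95        # labels 01-95: everyone in the lottery

def both_from_class(rng):
    """Draw two different labels; report whether both are 28 or lower."""
    winners = rng.sample(range(1, TOTAL + 1), 2)
    return all(label <= CLASS_SIZE for label in winners)

def estimate_probability(n_trials, seed=0):
    """Proportion of simulated lotteries with both winners from the class."""
    rng = random.Random(seed)
    hits = sum(both_from_class(rng) for _ in range(n_trials))
    return hits / n_trials

estimate = estimate_probability(100_000)
# Exact value under the same assumptions, for comparison.
exact = Fraction(CLASS_SIZE, TOTAL) * Fraction(CLASS_SIZE - 1, TOTAL - 1)
print(round(estimate, 3), round(float(exact), 3))
```

With only 18 repetitions the estimate is noisy. Under these assumptions the exact probability is about 0.085, and a long simulation should land near that value.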
• Example: NASCAR Cards and Cereal Boxes
Read the example on the handout.
What is the probability that it will take 23 or more boxes to get a
full set of 5 NASCAR collectible cards?
Driver               Label
Jeff Gordon          1
Dale Earnhardt, Jr.  2
Tony Stewart         3
Danica Patrick       4
Jimmie Johnson       5
Use randInt(1,5) to simulate buying one
box of cereal and looking at which card is
inside. Keep pressing Enter until we get all
five of the labels from 1 to 5. Record the
number of boxes we had to open.
3 5 2 1 5 2 3 5 4: 9 boxes
4 3 5 3 5 1 1 1 5 3 1 5 4 5 2: 15 boxes
5 5 5 2 4 1 2 1 5 3: 10 boxes
We never had to buy more than 22 boxes to get the full set of cards in 50
repetitions of our simulation. Our estimate of the probability that it takes 23
or more boxes to get a full set is roughly 0.
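The calculator procedure can be mirrored in code and repeated far more than 50 times. This is a sketch in Python, assuming each box is equally likely to contain any of the 5 cards, which is what randInt(1,5) models; the function names and seed are hypothetical.

```python
import random

def boxes_needed(rng, n_cards=5):
    """Open boxes until every card label from 1 to n_cards has been seen."""
    seen = set()
    boxes = 0
    while len(seen) < n_cards:
        seen.add(rng.randint(1, n_cards))  # same role as randInt(1,5)
        boxes += 1
    return boxes

def estimate_prob_at_least(threshold, n_trials, seed=0):
    """Proportion of simulated collectors who need `threshold` or more boxes."""
    rng = random.Random(seed)
    hits = sum(boxes_needed(rng) >= threshold for _ in range(n_trials))
    return hits / n_trials

estimate = estimate_prob_at_least(23, 20_000)
print(round(estimate, 3))
```

With 50 repetitions an event this rare can easily fail to appear at all. A longer run should give a small positive estimate, on the order of a few percent, not exactly 0.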