BASIC COUNTING - Mathematical sciences

USES OF CONDITIONAL
PROBABILITY
The Product Rule, Bayes’ Rule, and
Extended Independence
Probability and Statistics for
Teachers, Math 507, Lecture 5
The Product Rule
• Notation: In advanced probability courses it is common
to denote the intersection of events by concatenating
them rather than writing an intersection symbol
between them. That is, if A and B are events we write
AB instead of A ∩ B. Our book does not use this
notation, but I will use it in this lecture to simplify what
I have to type (i.e., writing the letters adjacent does not
make me use the equation editor!).
The Product Rule
• Example: Suppose a bag contains four green beads and
seven red ones. If I pull two out (without replacement),
what is the probability that both are red?
– Intuitively my probability of getting red on the first draw is
7/11, my probability of then getting another red on the
second draw is 6/10=3/5, and my total probability of getting
both red is (7/11)*(3/5)=21/55, which is about 0.38.
– Formally we can solve the problem by counting. There are
C(11,2)=55 2-subsets of the 11 beads, and C(7,2)=21 of them
contain 2 red beads. Assuming all outcomes are equally
likely, the probability of 2 reds is 21/55. We see that our
intuitive procedure yields the correct answer, but can we
justify it formally?
The Product Rule
• Theorem (The Product Rule for Probabilities): Suppose
A and B are events in a sample space S with probability
measure P. We already know that P(B|A)=P(AB)/P(A).
Clearing denominators we see P(AB)=P(A)P(B|A). In
other words, the probability that A and B both happen
equals the probability that A happens times the
probability that B happens if A does.
The Product Rule
• Example Revisited: Again suppose we have four green
and seven red beads in a bag and we choose two of
them without replacement. Let F be the event that the
first bead chosen is red and S be the event that the
second bead chosen is red. Then the event that both
beads are red is FS. We can find its probability from
the product rule as follows:
P(FS)=P(F)P(S|F)=(7/11)*(6/10)=21/55. Our intuition
is now justified.
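As a quick check, the two routes to 21/55 can be compared numerically. The following is a minimal Python sketch (the variable names are illustrative and not from the book) that computes the counting answer with binomial coefficients and the product-rule answer with exact fractions.

```python
from fractions import Fraction
from math import comb

red, green = 7, 4
total = red + green

# Counting argument: 2-subsets made of red beads over all 2-subsets of the 11 beads.
by_counting = Fraction(comb(red, 2), comb(total, 2))

# Product rule: P(FS) = P(F) * P(S|F).
p_first_red = Fraction(red, total)                        # P(F) = 7/11
p_second_red_given_first = Fraction(red - 1, total - 1)   # P(S|F) = 6/10
by_product_rule = p_first_red * p_second_red_given_first

print(by_counting, by_product_rule)  # both print 21/55
```

Exact fractions are used here so that the two answers can be compared for equality without rounding.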
The Product Rule
• The product rule generalizes to more events. For instance,
suppose A, B, C, and D are events in a sample space S with
probability measure P. Then
P(ABCD)=P(A)P(B|A)P(C|AB)P(D|ABC). This same pattern
works with any number of events.
• Example Re-revisited: Now suppose we have four green and
seven red beads and we want the probability that four beads
chosen (without replacement) from the bag are all red. Let A, B,
C, and D, be the events that the first, second, third, and fourth
beads are red, respectively. Then the probability that all four
beads are red is P(ABCD)=(7/11)*(6/10)*(5/9)*(4/8)
=(7/11)*(3/5)*(5/9)*(1/2)=7/66, which is about 0.11. Note that,
for instance, the fourth factor 4/8=1/2 is the probability that the
fourth bead is red after we have already removed three red
beads.
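The same pattern can be written as a loop, one factor per draw. This is a small Python sketch under the assumptions of the example (beads drawn without replacement, all draws equally likely); the function name prob_all_red is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_all_red(red, green, draws):
    """Multiply P(1st red) * P(2nd red | 1st red) * ... for `draws` draws."""
    p = Fraction(1)
    for k in range(draws):
        # After k red beads are removed, red - k reds remain among red + green - k beads.
        p *= Fraction(red - k, red + green - k)
    return p

p4 = prob_all_red(7, 4, 4)
print(p4)  # 7/66
```

The result agrees with the counting answer C(7,4)/C(11,4) = 35/330 = 7/66.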
Independence of three events
• If A, B, and C are events in some sample space S with
a probability measure P, then we say the events are
independent if each pair is independent (i.e.,
P(AB)=P(A)P(B), P(AC)=P(A)P(C), and
P(BC)=P(B)P(C)) and in addition
P(ABC)=P(A)P(B)P(C).
• Intuitively this is what is needed to guarantee that all
the events and their complements are proportionally
represented in each other in all relevant ways. It is
possible to satisfy some of these equations while failing
to satisfy others.
Independence of three events
• Contrary Example: Roll a red die and a clear die. Let A
be the event “the red die is 1,” B be the event “the clear
die is 3,” and C be the event “both dice have the same
number.” Then P(A)=P(B)=P(C)=6/36=1/6. Clearly
P(AB)=1/36=P(A)P(B), P(AC)=1/36=P(A)P(C), and
P(BC)=1/36=P(B)P(C). The event ABC, however, is
empty (since you cannot have the red die 1, the clear
die 3, and both dice the same), so P(ABC)=0. But
P(A)P(B)P(C)=(1/6)(1/6)(1/6)=1/216. Thus A, B, and
C are not independent.
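The contrary example is small enough to verify by listing all 36 outcomes. The sketch below assumes the uniform model on ordered pairs (red die, clear die); the helper names prob and both are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (red die, clear die), all 36 equally likely.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a true/false test on outcomes."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

A = lambda w: w[0] == 1        # the red die is 1
B = lambda w: w[1] == 3        # the clear die is 3
C = lambda w: w[0] == w[1]     # both dice show the same number

both = lambda X, Y: (lambda w: X(w) and Y(w))

pairwise_ok = (
    prob(both(A, B)) == prob(A) * prob(B)
    and prob(both(A, C)) == prob(A) * prob(C)
    and prob(both(B, C)) == prob(B) * prob(C)
)
p_abc = prob(lambda w: A(w) and B(w) and C(w))
triple_ok = p_abc == prob(A) * prob(B) * prob(C)

print(pairwise_ok, triple_ok, p_abc)  # True False 0
```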
Independence of three events
• Example: Roll red, green, and clear dice. Let A be the
event “the red die is 1,” B be the event “the green die
is 1,” and C be the event “the clear die is 1.” It is easy
to test that events A, B, and C are independent under
the uniform model. In particular
P(ABC)=1/216=(1/6)(1/6)(1/6)=P(A)P(B)P(C).
Independence of more events.
• A collection of events is independent if, for every
subcollection of two or more of them, the probability of the
intersection equals the product of the probabilities of the
events in that subcollection. For example,
events A, B, C, and D are independent if every pair is
independent, every triple is independent, and
P(ABCD)=P(A)P(B)P(C)P(D). (Consider rolling four
dice and getting all 1’s, flipping four coins and getting
HTTH).
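The definition can be tested mechanically by running through every subcollection. The following Python sketch assumes a finite sample space with equally likely outcomes and represents each event as a set of outcomes; the function name is_independent is hypothetical. It is applied to the four-coin example, where event i is "coin i shows the i-th letter of HTTH."

```python
from fractions import Fraction
from itertools import combinations, product

def is_independent(events, space):
    """Check the product condition for every subcollection of size >= 2.

    `events` is a list of sets of outcomes; `space` is a list of equally
    likely outcomes (the uniform model is an assumption of this sketch).
    """
    n = len(space)
    for size in range(2, len(events) + 1):
        for group in combinations(events, size):
            intersection = set(space)
            product_of_probs = Fraction(1)
            for ev in group:
                intersection &= ev
                product_of_probs *= Fraction(len(ev), n)
            if Fraction(len(intersection), n) != product_of_probs:
                return False
    return True

# Four coin flips; event i says "coin i shows the i-th letter of HTTH".
coins = list(product("HT", repeat=4))
target = "HTTH"
coin_events = [{w for w in coins if w[i] == target[i]} for i in range(4)]
print(is_independent(coin_events, coins))  # True
```

Checking every subcollection takes time that grows quickly with the number of events, so this is only practical for small examples like the ones in this lecture.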
Bayes’ Rule
• Bayes’ Rule is a simple formula relating the values of
P(A|B) and P(B|A). It has several forms and interesting
consequences.
Bayes’ Rule
• Theorem 2.16 (Bayes’ Rule)
– Given events H and E in a sample space S with probability
measure P it holds that P(H|E)=P(H)P(E|H)/P(E)
– Proof: By definition of conditional probability,
P(H|E)=P(HE)/P(E). By the product rule P(HE)=P(H)P(E|H).
Therefore P(H|E)=P(H)P(E|H)/P(E).
– Here we use H and E to stand for Hypothesis and Evidence.
We sometimes conceive of Bayes’ Rule as telling us how to
revise the probability of a hypothesis based on the
observation of some particular piece of evidence.
Bayes’ Rule
• Example: You are living in a dorm. One night the fire alarm goes
off. How likely is it that there is a fire? Here H is the event
“there is a fire” and E is the event “the fire alarm goes off.” You
want to know P(H|E). You estimate that all things being equal a
fire is unlikely on a given night, setting P(H)=0.001 (roughly
one fire in three years). You know that in a typical semester of
about 100 days there are about 3 fire alarms (typically false
alarms), so you estimate P(E)=0.03. Finally you guess that it is
nearly certain someone would set off the alarm if there really
were a fire, so you estimate P(E|H)=0.98. By Bayes’ Rule,
P(H|E)=P(H)P(E|H)/P(E)=(0.001)(0.98)/(0.03)=0.033.
Bayes’ Rule
• Notes to the example: From one point of view the alarm is
almost meaningless. There is only a 3.3% chance of a fire. Why is
it so low? Your probabilities say that in 1000 days you should
expect 30 alarms but only one fire. Thus your chance of having a
fire with the alarm is about 1/30. From another point of view the alarm
carries a lot of weight: The alarm raises the likelihood of a fire
thirtyfold, from 0.1% to 3.3% (that is 1/1000 to 1/30). This is
how the evidence (alarm) causes you to revise your estimate of
the hypothesis (fire). In any case the difference between P(H|E)
and P(E|H) is large: 0.033 to 0.98, a clear example of how these
quantities need not be equal.
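The fire alarm numbers can be reproduced in a few lines. This Python sketch uses the three estimates from the example; the function name bayes is illustrative.

```python
def bayes(prior_h, likelihood_e_given_h, prob_e):
    """Simple form of Bayes' Rule: P(H|E) = P(H) P(E|H) / P(E)."""
    return prior_h * likelihood_e_given_h / prob_e

p_fire = 0.001              # P(H): estimated chance of a fire on a given night
p_alarm = 0.03              # P(E): about 3 alarms per 100 days
p_alarm_given_fire = 0.98   # P(E|H)

p_fire_given_alarm = bayes(p_fire, p_alarm_given_fire, p_alarm)
print(round(p_fire_given_alarm, 3))           # 0.033
print(round(p_fire_given_alarm / p_fire, 1))  # 32.7, the factor by which the alarm raises P(fire)
```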
Bayes’ Rule
• Theorem 2.17 (Bayes’ Rule, extended form)
– Under the same circumstances as before, with $\bar{H}$ denoting the complement of H,
$$P(H \mid E) = \frac{P(H)\,P(E \mid H)}{P(H)\,P(E \mid H) + P(\bar{H})\,P(E \mid \bar{H})}$$
Bayes’ Rule
• Proof: This is the same equation as in
the simpler statement except that in
that case the denominator was simply
P(E). It is easy to see that the sets EH
and $E\bar{H}$ partition E (here $\bar{H}$ is
the complement of H). The Venn
Diagram makes it clear: The event E
is partitioned into the yellow section
EH and the orange section $E\bar{H}$.
Since the sets are disjoint and have
union E, we have $P(E) = P(EH) + P(E\bar{H})$.
By the product rule the right-hand
side becomes $P(H)P(E \mid H) + P(\bar{H})P(E \mid \bar{H})$.
So we have the same equation as
before, but with a fancy expansion of
P(E) in the denominator.
[Figure: Venn diagram of the events H and E in the sample space S, with EH in yellow and the rest of E in orange]
Bayes’ Rule
• Example (medical testing)
– A drug company has designed a test for a disease. Through
extensive testing, the company reports that the test produces
only 1% false positive results (i.e., a healthy person tests
positive) and only 2% false negative results (i.e., a person
with the disease tests negative). Let P be the event “someone
tests positive,” N be the event “someone tests negative,” H be
the event “someone is healthy,” and D be the event “someone
has the disease.” Then the company is reporting P(P|H)=0.01
(or equivalently P(N|H)=0.99) and P(N|D)=0.02 (or
equivalently P(P|D)=0.98).
Bayes’ Rule
• Example (medical testing)
– Suppose you test positive for the disease. How likely is it that
you in fact have the disease? It is tempting but incorrect to
say 98% since P(P|D)=0.98. But you want to know P(D|P),
which may be quite different. It turns out you do not have
enough information yet. Oddly enough you must also know
P(D), the prevalence of the disease in your population. Why?
As in the case of the fires and fire alarms, if the disease is
rare, then false positives will dominate true ones. If the
disease is common, true positives will dominate false ones.
Bayes’ Rule
• Example (medical testing)
– Suppose the disease is rare, occurring in only 0.05% of the
population. Then applying the extended form of Bayes’ Rule we get
$$P(D \mid P) = \frac{P(D)\,P(P \mid D)}{P(D)\,P(P \mid D) + P(H)\,P(P \mid H)} = \frac{0.0005 \times 0.98}{0.0005 \times 0.98 + 0.9995 \times 0.01} \approx 0.047 = 4.7\%$$
Bayes’ Rule
• Thus with a positive test your chance of having the
disease is still just below 5%. Why? Roughly speaking
among 2000 randomly chosen people you expect to
have 20 positive tests but only 1 person with the
disease. Thus about 95% of your positives are false.
Still this is a dramatic increase in the probability of
having the disease, from 0.05% to 4.7%, almost a
hundredfold increase.
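The rare-disease calculation and the "among 2000 people" count can be checked together. This is a minimal Python sketch using the company's reported error rates; the function name posterior_disease and its parameter names are illustrative.

```python
def posterior_disease(prevalence, false_positive_rate, false_negative_rate):
    """Extended Bayes' Rule for P(D|P) with hypotheses D (disease) and H (healthy)."""
    true_positive = prevalence * (1 - false_negative_rate)      # P(D) P(P|D)
    false_positive = (1 - prevalence) * false_positive_rate     # P(H) P(P|H)
    return true_positive / (true_positive + false_positive)

rare = posterior_disease(0.0005, 0.01, 0.02)
print(round(rare, 3))  # 0.047

# Expected counts among 2000 randomly chosen people.
people = 2000
sick = people * 0.0005                         # about 1 person with the disease
expected_true_pos = sick * 0.98                # about 1 true positive
expected_false_pos = (people - sick) * 0.01    # about 20 false positives
print(round(expected_true_pos + expected_false_pos, 1))
```

The count view makes the result less surprising: the roughly 20 false positives come from the very large healthy group, while the single true positive comes from the tiny diseased group.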
Bayes’ Rule
• On the other hand, suppose the disease is common.
Suppose 10% of people in your “population” have the
disease. Then
$$P(D \mid P) = \frac{P(D)\,P(P \mid D)}{P(D)\,P(P \mid D) + P(H)\,P(P \mid H)} = \frac{0.1 \times 0.98}{0.1 \times 0.98 + 0.9 \times 0.01} \approx 0.916 = 91.6\%$$
Bayes’ Rule
• Now the positive test gives you over a 90% chance of
having the disease. How can this be? Now among 1000
people, 100 will have the disease and 98 of them will
test positive. Similarly 900 will not have the disease
and 9 of them will test positive. Now fewer than 10%
of your positives are false.
Revision of Probabilities by Bayes
• Is this all nonsense? Do not the test results speak for
themselves? Do they not give the same information regardless of
who takes them and how many people have the disease? No!
The subtle but crucial point is that Bayes’ Rule lets us revise a
probability based on new evidence. Revision implies the
existence of a prior probability to be revised. The new
probability depends not only on the evidence but also on our
prior estimate of the probability. If we already know an event is
likely, then evidence in its favor may make it nearly certain. The
same evidence will be less compelling, however, if we know the
event to be inherently unlikely. The less likely the event, the
stronger the evidence must be in order to make it probable.
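To see the dependence on the prior directly, one can hold the evidence fixed and vary only the prior. The sketch below reuses the medical test figures, P(P|D)=0.98 and P(P|H)=0.01; the priors 0.01 and 0.5 are extra illustrative values not taken from the lecture, and the function name revise is hypothetical.

```python
def revise(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H|E) from the prior P(H) via the extended form of Bayes' Rule."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Same evidence (a positive test) applied to different prior probabilities.
priors = [0.0005, 0.01, 0.1, 0.5]
posteriors = [revise(p, 0.98, 0.01) for p in priors]
for p, q in zip(priors, posteriors):
    print(f"prior {p:.4f} -> posterior {q:.3f}")
```

The first and third rows reproduce the 4.7% and 91.6% figures from the rare and common disease cases.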
Revision of Probabilities by Bayes
• For example, suppose your neighbor goes to play in the
U.S. Open chess tournament (a big tournament drawing
hundreds or thousands of players). Later you overhear
on the radio that someone from your town won the
tournament. Do you excitedly call up your neighbor to
find out if he won? It depends on your original estimate
of his likelihood of winning. If he is a weak player with
no realistic chance of winning, then you probably call
him up to find out who did win. If he is one of the
strongest players in the country, one with serious
chances of winning the tournament, you call up
excitedly to see if he is in fact the winner.
Revision of Probabilities by Bayes
• More simply, a positive test for AIDS is more
worrisome for a promiscuous drug addict than for a chaste
person who avoids drugs. The evidence is more
compelling when revising a probability you know to be
likely than when revising one you know to be unlikely.
Bayes Rule Extended
• Theorem 2.19 (Bayes’ Rule for multiple hypotheses)
Suppose you have n events (hypotheses) $H_1, \ldots, H_n$
that partition the sample space S and you have an event
(evidence) E in S. Then for i between 1 and n,
$$P(H_i \mid E) = \frac{P(H_i)\,P(E \mid H_i)}{P(H_1)\,P(E \mid H_1) + \cdots + P(H_n)\,P(E \mid H_n)}$$
Bayes Rule Extended
• Proof: This is identical to the proof of the extended
form of Bayes’ Rule except that you partition E into n
blocks by dividing it among the H’s. Then you use this
partition to expand the denominator P(E) in the simple
form of Bayes’ Rule.
Bayes Rule Extended
• Example
– At a college, 40% of the students are freshmen, 25%
sophomores, 20% juniors, and 15% seniors (these are your
H’s, partitioning the whole population of students). Let the
evidence E be “the student is on the honor roll.” Of the
freshmen, 5% are on the honor roll (the percentage of E
among freshmen), as are 10% of the sophomores, 18% of the
juniors, and 22% of the seniors. What
percentage of honor roll students are sophomores? By Bayes’
Rule the probability is
$$\frac{0.25 \times 0.1}{0.4 \times 0.05 + 0.25 \times 0.1 + 0.2 \times 0.18 + 0.15 \times 0.22} = \frac{0.025}{0.114} \approx 0.219 = 21.9\%$$
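The multiple-hypothesis form translates directly into a short function. In this Python sketch the priors are the class shares and the likelihoods are the honor-roll rates within each class; the function name bayes_partition is hypothetical.

```python
def bayes_partition(priors, likelihoods):
    """Return P(H_i | E) for each i, given P(H_i) and P(E | H_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(H_i) P(E|H_i)
    p_e = sum(joint)                                       # P(E), the denominator
    return [j / p_e for j in joint]

classes = ["freshmen", "sophomores", "juniors", "seniors"]
priors = [0.40, 0.25, 0.20, 0.15]        # P(H_i): share of each class
likelihoods = [0.05, 0.10, 0.18, 0.22]   # P(E|H_i): honor-roll rate in each class

posterior = dict(zip(classes, bayes_partition(priors, likelihoods)))
print(round(posterior["sophomores"], 3))  # 0.219
```

Because the H's partition the population, the four posterior values add up to 1.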
Bayes Rule Conclusion
• Note: Theorem 2.18 and remark 2 on p. 50 are
interesting but not essential to our work. In particular
you may want to look at remark 3 before Theorem 2.18
if you have ever wondered what “odds” are and how
they relate to probability.
Statistical Interlude
• Recall that one of the crucial jobs of statistics is to help
people summarize data to look for the real information
and patterns to be found in it. A couple lectures ago we
talked about using a single number to summarize a
collection of numbers. In particular we looked at
various numbers that might do this job (mean, median,
mode), and we looked at data resistant to summary
(e.g., phone books). The key is to make sure the
summary is clear and accurate, just as you would in
writing a summary of a book.
Statistical Interlude
• God, in His wisdom, has designed the human eye to see
certain relationships easily. Statisticians take advantage
of this by drawing pictures of data that the eye can
easily scan.
• One of the most powerful such tools is the histogram.
– To make a histogram one must have a collection of “interval”
or “ratio” data. Typically this means that the data represent
counts or measurements.
Histograms
– One takes the range of the data and partitions it into a
convenient number of subranges, usually of equal width.
Then one counts the number of data values falling into each
of these ranges (frequency classes or bins). Next one draws a
horizontal axis containing the range of data values and a
vertical axis labeled frequency with values going as high as
that of the most frequent (modal) class. Finally one draws
over every subrange of values a bar whose height (or area)
represents the frequency with which data falls into that
subrange. Usually the bars touch without overlap to indicate
that they partition the data values (i.e., nothing falls in the
cracks).
Histograms
– The histogram quickly yields much information about the
data, indicating what values are typical, what values are
uncommon, and whether any values are extremely different
from the main body of values. This information is obvious to
the eye which easily compares heights and areas of
rectangles.
– To convert a frequency histogram into a relative frequency
histogram, simply divide every frequency by the total
number of data values. Now the heights of the bars represent
the proportion of data falling into each subrange. This is
useful in making histograms from populations of different
sizes visually comparable.
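The counting step behind a histogram, and the conversion to relative frequencies, can be sketched in Python as follows. The sample values are made up purely for illustration, and the function name histogram and its parameters are hypothetical; each bin is taken to include its left endpoint and exclude its right one, which is one common convention.

```python
import math

def histogram(data, start, width, bins):
    """Count values in `bins` equal-width subranges [start + k*width, start + (k+1)*width)."""
    counts = [0] * bins
    for x in data:
        k = math.floor((x - start) / width)
        if 0 <= k < bins:
            counts[k] += 1
    return counts

# Made-up sample values, for illustration only.
sample = [43.5, 52.1, 58.3, 61.0, 66.7, 68.2, 71.4, 73.9, 75.0, 75.3, 77.8, 79.1]

freq = histogram(sample, start=40, width=10, bins=5)
rel_freq = [c / len(sample) for c in freq]   # divide each frequency by the total count
print(freq)      # [1, 2, 3, 6, 0]
print(rel_freq)
```

The relative frequencies sum to 1 whenever every data value falls in some bin, which is what makes histograms of different-sized populations comparable.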
Histograms
• Example: The website
http://www.amstat.org/publications/jse/jse_data_archive.html
has a file poverty.dat (described in poverty.txt) with
information from 1990 on birth, death, and infant
mortality rates, and life expectancies of men and
women in 97 countries around the world. Let us begin
by looking at the data on women’s life expectancies.
Histograms
• One challenge in constructing a histogram is to decide
how many bars to have or, equivalently, to decide how
wide to make the subranges. Too few bars make the
histogram coarse and blocky, obscuring patterns and
crucial details. Too many bars produce a
snaggletoothed histogram with lots of little gaps and
bars of height one. The ideal normally lies somewhere
in between. Here are attempts that divide women’s life
expectancy into 1-year, 10-year, and 2.5-year
subranges. The last seems to work best.
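The effect of bin width can be seen even on a tiny data set. The Python sketch below bins the same made-up values (not the actual poverty.dat data) with widths of 1, 2.5, and 10 and reports how many bins are nonempty and how tall the tallest bar is; the function name bin_counts is hypothetical.

```python
import math
from collections import Counter

def bin_counts(data, width):
    """Map each bin's left edge to the number of values in [edge, edge + width)."""
    return Counter(math.floor(x / width) * width for x in data)

# Made-up life-expectancy-like values, for illustration only.
sample = [43.5, 47.2, 52.1, 55.8, 58.3, 61.0, 64.4, 66.7, 68.2, 70.5,
          71.4, 72.9, 73.9, 75.0, 75.3, 76.6, 77.8, 78.4, 79.1, 81.2]

for width in (1, 2.5, 10):
    counts = bin_counts(sample, width)
    print(f"width {width}: {len(counts)} nonempty bins, tallest bar {max(counts.values())}")
```

With width 1 nearly every bar has height one, and with width 10 half the data lands in a single bar; the intermediate width shows more of the shape.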
Women’s Life Expectancies
[Figure: histogram titled “Life Expectancy of Women in 97 Countries,” 1-year subranges. Horizontal axis: Age (40 to 82); vertical axis: Frequency (0 to 8); series: Female Life Expectancy.]
Women’s Life Expectancies
[Figure: histogram titled “Life Expectancy of Women in 97 Countries,” 10-year subranges. Horizontal axis: Age (40 to 90); vertical axis: Frequency (0 to 45); series: Female Life Expectancy.]
Women’s Life Expectancies
[Figure: histogram titled “Life Expectancy of Women in 97 Countries,” 2.5-year subranges. Horizontal axis: Age (37.5 to 82.5); vertical axis: Frequency (0 to 14); series: Female Life Expectancy.]
Women’s Life Expectancies
• All three histograms are valid and somewhat useful,
but the last seems to give us the clearest sense of the
“shape” of the data. What do we see? All the values fall
between about 42 and 83. Values in the upper 70’s and
low 80’s are most common. Higher values drop off
very quickly. Perhaps this suggests a large number of
nations share about equally in the benefits of modern
medicine and public health policy. On the other hand
the frequencies decrease rather uniformly from 75
down to 42, with perhaps a small gap in the 60’s. It
would be interesting to look for anything the nations
with life expectancies below 60 have in common.
Comparing Two Sets of Data
• It is also interesting to compare the life expectancy data
for men and women. This involves trying to show two
histograms at once in such a way as to make their data
visually comparable. One of the most straightforward
approaches is simply to make separate histograms for
men and women using the same horizontal and vertical
scales and then display the two histograms, one above
the other. Here are some other approaches:
Men & Women’s Life Expectancies
[Figure: chart titled “Life Expectancy in 97 Countries by Sex.” Horizontal axis: Age (37.5 to 82.5 in 2.5-year steps); vertical axis: Frequency (0 to 14); series: Male Life Expectancy and Female Life Expectancy.]
Men & Women’s Life Expectancies
[Figure: chart titled “Life Expectancy in 97 Countries by Sex.” Horizontal axis: Age (37.5 to 82.5 in 2.5-year steps); vertical axis: Frequency (0 to 25); series: Male Life Expectancy and Female Life Expectancy.]
Men & Women’s Life Expectancies
[Figure: chart titled “Life Expectancy in 97 Countries by Sex.” Horizontal axis: Age (37.5 to 82.5, labeled every 5 years); vertical axis: Frequency (0 to 14); series: Male Life Expectancy and Female Life Expectancy.]
Men & Women’s Life Expectancies
[Figure: chart titled “Life Expectancy in 97 Countries by Sex.” Horizontal axis: Age (37.5 to 82.5 in 2.5-year steps); vertical axis: Frequency (0 to 14); series: Male Life Expectancy and Female Life Expectancy.]
Men & Women’s Life Expectancies
• The point of all these displays of the same data is to
show that one can take many creative approaches to
studying the same data and communicating what is
there.
Men & Women’s Life Expectancies
• One interesting apparent pattern is a tendency of
women to live longer than men. How can we
investigate this further?
– A natural approach is to construct a histogram of the
difference between women’s and men’s life expectancies in
each country. That produces the following histogram.
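The construction can be sketched in Python: subtract within each country, then tally the differences in one-year bins. The five rows below are hypothetical values invented for illustration (they are not taken from poverty.dat), and the text-bar display is only a stand-in for a real chart.

```python
import math
from collections import Counter

# Hypothetical (country, female, male) life expectancies, for illustration only.
rows = [
    ("A", 79.4, 73.3),
    ("B", 66.1, 62.0),
    ("C", 52.5, 54.0),
    ("D", 75.9, 69.2),
    ("E", 58.8, 55.0),
]

# Difference for each country: female minus male.
diffs = [female - male for _, female, male in rows]

# One-year bins [k, k+1), labeled by their left edge k.
hist = Counter(math.floor(d) for d in diffs)
for year in sorted(hist):
    print(f"{year:>3} | {'#' * hist[year]}")
```

A negative bin label marks a country where the male life expectancy is the larger of the two.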
Men & Women’s Life Expectancies
[Figure: histogram titled “Female Minus Male Life Expectancy in 97 Countries.” Horizontal axis: Years (-3 to 10); vertical axis: Frequency (0 to 30).]
Men & Women’s Life Expectancies
• From this histogram we see one nation standing out in
having men live about two years longer than women. It
would be interesting to find out what the story is there.
Otherwise women live longer than men in almost every
nation, with the bulk of the differences being between
three and seven years. This has some interesting
implications in terms of how societies will deal with
large numbers of widows. It also suggests (not proves)
that death from childbirth is far less common
worldwide than it once was.
Men & Women’s Life Expectancies
• A different approach to this matter calls for a
scatterplot, a graph in which the average life
expectancies for men and women in each country are
treated as an ordered pair and graphed. This is a
powerful tool for spotting relationships between sets of
data, as the following graph shows.
Men & Women’s Life Expectancies
[Figure: scatterplot titled “Male vs. Female Life Expectancy in 97 Countries, 1990.” Horizontal axis: Male Life Expectancy (30 to 90); vertical axis: Female Life Expectancy (30 to 90).]
Men & Women’s Life Expectancies
• Here we see a strong, positive “linear relationship.”
The data roughly fit along a line (not the line shown).
Generally if men live longer in a country, then so do
women. The line is not a regression line. It is simply
the line y=x, showing those countries in which men
and women have equal life expectancies. Dots above
the line indicate nations where women live longer than
men. Only a very few dots lie below the line,
confirming the pattern we saw in the previous graph.
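Both readings of the scatterplot can be imitated numerically. The Python sketch below counts the points above and below the line y=x and, as an addition not made in this lecture, computes the sample correlation coefficient as one common way to put a number on "strong, positive linear relationship." The five (male, female) pairs are hypothetical values for illustration only, and the function name pearson_r is illustrative.

```python
import math

# Hypothetical (male, female) life expectancy pairs, for illustration only.
pairs = [(73.3, 79.4), (62.0, 66.1), (54.0, 52.5), (69.2, 75.9), (55.0, 58.8)]

above = sum(1 for m, f in pairs if f > m)   # dots above the line y = x
below = sum(1 for m, f in pairs if f < m)   # dots below the line y = x

def pearson_r(points):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(pairs)
print(above, below, round(r, 3))
```

A value of r near +1 corresponds to points lying close to some rising line; that line need not be y=x.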