Probability: The Study of
Randomness
Randomness and Probability Models
IPS Chapters 4.1 and 4.2
© 2009 W.H. Freeman and Company
Learning Objectives
Randomness and Probability models

Describe the difference between events that are independent, and
those that are not.

Be able to take a sample space and create a probability model.

Be able to interpret a probability model.

Describe the difference between events that are DISJOINT and
those that are not.

Be able to apply the addition rule for disjoint events.

Describe the difference between an empirical probability and a
theoretical probability.
“Yeah, yeah”

Often we look at topics that seem like “common sense” and say to
ourselves, ‘yeah, yeah, I get that’. BE CAREFUL! Often there are
many ways in which we think we understand something, but there
still remain (many!!) gaps in our knowledge and understanding.



This is not only true of statistics; it happens in all kinds of places and fields of study.
However, it is a particularly common pitfall in stats.
Be sure to review the concepts and complement them with lots of
problems.

There are no shortcuts. If there were, I would tell you!
Review and lots of problems are the only way to learn this material!
Randomness and probability
A phenomenon is random if individual
outcomes are uncertain, yet given a large
number of repetitions, you would see a
regular distribution of outcomes.
For example, a single individual flip of a
coin is random. However a large number
of coin-flips will result in about 50%
heads and 50% tails.
The probability of any outcome of a random phenomenon can be defined as
the proportion of times the outcome would occur in a very long series of
repetitions. For example, the probability of flipping a heads is 50% given a
large number of repetitions (flips).
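To see this long-run regularity for yourself, here is a minimal simulation sketch. It assumes a fair coin and uses Python's standard pseudo-random generator; the function name and the seed are illustrative choices, not part of the course material.

```python
import random

def proportion_of_heads(num_flips, seed=0):
    """Simulate num_flips fair coin flips and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips

# The proportion wanders for small n and settles near 0.5 for large n.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n))
```

A single flip stays unpredictable; only the proportion over many flips becomes regular.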
Coin toss
The result of any single coin toss is entirely random.
But the result over many tosses IS predictable.
The probability of
heads is 0.5 =
the proportion of
times you get
heads in many
repeated trials.
[Figure: proportion of heads over many tosses, shown for a first series of tosses and a second series]
* Two events are independent if the probability that one event
occurs on any given trial of an experiment (e.g. a coin toss) is not
affected or changed by the occurrence of the other event.
When are trials not independent?
Imagine that these coins were spread out so that half the coins showed heads and
half showed tails. Close your eyes and pick one. The probability of it being heads
is 0.5. However, if it is heads and you don’t put it back in the pile, the probability of picking up
another coin that is heads up is now less than 0.5.
The trials could only be considered
independent if you put the coin back
each time.
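A small sketch of the coin-pile idea, using exact fractions. The pile size (5 heads up, 5 tails up) is an assumption made only for illustration; the slide does not say how many coins there are.

```python
from fractions import Fraction

def p_second_heads_given_first_heads(num_heads, num_tails, replace):
    """P(second pick is heads), given that the first pick was heads."""
    total = num_heads + num_tails
    if replace:
        # The coin goes back in the pile, so nothing changes.
        return Fraction(num_heads, total)
    # One heads-up coin has left the pile.
    return Fraction(num_heads - 1, total - 1)

print(p_second_heads_given_first_heads(5, 5, replace=True))   # 1/2
print(p_second_heads_given_first_heads(5, 5, replace=False))  # 4/9
```

With replacement the second pick is unaffected by the first (independent); without replacement it is not.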
Probability Models
Probability models describe, mathematically, the outcome of random
processes. A probability model consists of two parts:
1) Sample Space (‘S’): This is the set of all possible outcomes of
a random process.
2) A probability for each possible outcome in the sample space.
Example: Probability Model for a Coin Toss:
S = {Head, Tail}
Probability of heads = 0.5
Probability of tails = 0.5
Example of a Probability Model
Example: A couple wants three children. What are the numbers of girls they could
end up with? What are the probabilities for each outcome? Create a probability
model.
Also: Be sure to note and be comfortable with the way we use variables and symbols such as P(X=2)
Sample space: Let X = possible number of girls: {0, 1, 2, 3}
P(X = 0) = P(BBB) = 1/8
P(X = 1) = P(BBG or BGB or GBB) = P(BBG) + P(BGB) + P(GBB) = 3/8
P(X = 2) = P(BGG or GBG or GGB) = P(BGG) + P(GBG) + P(GGB) = 3/8
P(X = 3) = P(GGG) = 1/8
Probability Model:
Value of X    0     1     2     3
Probability   1/8   3/8   3/8   1/8
Terminology Example: A couple wants three children.


NB: It is vitally important to clearly define what X represents. In this case, X
is a count of the number of girls that the couple will end up with out of their 3
children.
So if your statistics prof gives you the situation in the previous slide and
then writes P(X=2), what exactly are you being asked?

Answer: “What is the probability of the couple ending up with exactly two girls?”

If you were asked P(X>2)?

Answer: What is the probability of the couple ending up with exactly 3 girls?

What if you were asked P(X>=2)?

Answer: What is the probability of the couple ending up with either 2 girls or 3
girls?
You must be very clear with this concept. Note that on this slide, I do not
spend the time working out the mathematical answer. This is intended as a
reminder to you that the most important thing to do first is to clearly
understand what the question is asking and to understand precisely what
the variables represent.
Terms / Models – Learn ‘em!!!

Sometimes we can intuitively figure things out.

What is the probability of drawing an Ace of Spades from a deck of
cards? Answer: 1/52

Sometimes, however, things become much more subtle and/or
complicated.

What is the probability of drawing an Ace of Spades from the deck if the
card just before was the 5 of Diamonds? Answer: 1/51 if the 5 of Diamonds was not put back
(only 51 cards remain), but still 1/52 if it was returned to the deck.

This is why it is important to become very familiar with
terms such as:

probability models,
independent vs non-independent events
disjoint vs non-disjoint
conditional probability
etc.
Always focus on applying the proper model and you will have a
much better chance at understanding the concepts and coming up
with the correct answers. Remember, it’s the concept that matters. If
you don’t have the concept, you risk incorrect statistics.
** Sample spaces
It’s the question being asked that determines the sample space.
A. A basketball player shoots three free throws. What are the possible sequences of
hits (H) and misses (M)?
[Tree diagram: each shot branches into H or M, giving the sequences HHH, HHM, HMH, HMM, …]
S = { HHH, HHM, HMH, HMM, MHH, MHM, MMH, MMM }
Note: 8 elements, 2^3
B. A basketball player shoots three free throws. What is the
number of baskets made?
S = { 0, 1, 2, 3 }
C. A nutrition researcher feeds a new diet to a young male white rat. What
are the possible outcomes of weight gain (in grams)?
S = [0, ∞[
(the final ‘[‘ means that infinity is excluded)
Sample Space continued:

What is the sample space for a single coin flip?

S(Single Coin Flip) = {H, T}

What is the sample space for two coin flips?

S(Two coin flips) = {HH, HT, TH, TT}

What is the sample space for a single baseball at-bat?

S(At bat result) = {Hit, Out}
Note that in order to define the sample space, you must read the question
carefully!
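The sample spaces on the last two slides can be listed mechanically. This is only a sketch; the helper name `sequences` is made up for illustration.

```python
from itertools import product

def sequences(symbols, repetitions):
    """All ordered sequences of the given length, e.g. 'HT' twice -> HH, HT, TH, TT."""
    return ["".join(seq) for seq in product(symbols, repeat=repetitions)]

two_flips = sequences("HT", 2)
free_throws = sequences("HM", 3)

# Same experiment, different question: count the baskets instead of listing sequences.
basket_counts = sorted({seq.count("H") for seq in free_throws})

print(two_flips)         # ['HH', 'HT', 'TH', 'TT']
print(len(free_throws))  # 8
print(basket_counts)     # [0, 1, 2, 3]
```

Notice that the three free throws give two different sample spaces depending on which question is asked.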
Probability of Outcomes in the Sample Space

Each outcome in the sample space has a probability of occurring.

We denote this probability as P(…)
Eg: Looking at a single coin flip. Let X = the flip result. What is P(X=heads)?

Answer: 0.5
Eg: Looking at two coin flips. Let X = the number of heads. What is P(X=2)? The answer is
0.25; we will explain how we got this number later.
The most important point here is NOT the numerical answer. It is to note the
terminology and notation being used.
Key Point: Each of the possible outcomes of a sample space may have different
probabilities. That is, they are not necessarily identical.

Coin flip outcomes {H,T} do have identical probabilities

P(Heads) = 0.5, P(Tails) = 0.5

A good professional basketball player may hit 70% of free throws (well, those
whose names aren’t Shaq).

P(Basket) = 0.7, P(Miss) = 0.3

An at-bat baseball player’s outcomes {hit, out} have different probabilities

P(Hit) = 0.294, P(Out) = 0.706
* Outcomes / Events

Proper and specific identification of the event and sample space can sometimes fool you! Be
sure to read the question with some care!

Without proper identification of the sample space, event, and the probability you are looking for, you’re gonna
end up with the wrong answer!

An event (also called an outcome) is some subset of the sample space that we are
interested in.

For the following two examples, define the sample space, the event, and the probability
(in terms of symbols):

If I roll a single die 50 times, how often will I roll a 1?

Sample space = {1, 2, 3, 4, 5, 6}

Event = {1}

Let X = the result of a single roll. What is P(X=1)?

If I roll two dice 50 times, how often will I get a 7 or an 11?

Sample space = {1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/1, 2/2, 2/3, etc, etc}

Event = {the two dice sum to 7 or 11} = {1/6, 2/5, 3/4, 4/3, 5/2, 6/1, 5/6, 6/5}

Let X = the sum of the two dice on a single roll. What is P(X=7 or X=11)?

An event is always some subset of the sample space. Notice that the event is NOT the same thing as the probability.
The event is simply the result you are interested in calculating a probability for. The probability refers to the likelihood
of that event occurring.

When doing probability work, we are most often interested in determining the likelihood
(probability) that a particular event inside the sample space will occur.

Classical probability formula: look at the sample space and count the outcomes in which
your event occurs. Divide that number by the total number of outcomes in the sample space.

P(A) = # of possible outcomes in which event A occurs / total # of outcomes in the sample space

EXAMPLE: What is the probability of rolling a double in a single roll of two dice?

The sample space has 36 possible values: {1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/1, 2/2, 2/3 ….. 6/5, 6/6}
The event you are looking for is a double: {1/1, 2/2, 3/3, 4/4, 5/5, 6/6}
If you look through your sample space, you will find 6 occurrences of your event. The total number of
outcomes in the sample space is 36. Therefore, P(A double is rolled) = 6/36 (0.167).
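The counting in this example can be checked by listing the sample space. A minimal sketch, assuming two fair six-sided dice:

```python
from fractions import Fraction
from itertools import product

# Every (first die, second die) pair: the 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

# The event: both dice show the same face.
doubles = [(a, b) for a, b in sample_space if a == b]

p_double = Fraction(len(doubles), len(sample_space))
print(len(sample_space), len(doubles), p_double)  # 36 6 1/6
```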
EXAMPLE: What is the probability of rolling at least one double in 2 rolls of the two dice?

Each roll has 36 possible values: {1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/1, 2/2, 2/3 ….. 6/5, 6/6}
The event you are looking for is a double on the first roll {1/1, 2/2, 3/3, 4/4, 5/5, 6/6} OR a double on
the second roll (i.e. you get two tries).
Be careful: these two events are not disjoint, because you can roll a double both times. Simply adding
6/36 + 6/36 = 12/36 counts that case twice.
P(no double on a single roll) = 30/36, so P(no double on either roll) = (30/36)*(30/36) = 25/36
P(at least one double in 2 rolls) = 1 – 25/36 = 11/36 (0.306)

We will have a slightly more formal discussion of this shortly.
Probability rules
1) Probabilities range from 0 (no chance of the event) to 1 (the event will always happen).
For any event A, 0 ≤ P(A) ≤ 1
Coin Toss Example:
S = {Head, Tail}
Probability of heads = 0.5
Probability of tails = 0.5
We write this as: P(Head) = 0.5
P(neither Head nor Tail) = 0
P(getting either a Head or a Tail) = 1
2) Because some outcome must occur on every trial, the sum of the probabilities
of all the possible outcomes (the sample space) must be exactly 1.
P(sample space) = 1
Coin toss: S = {Head, Tail}
P(head) + P(tail) = 0.5 + 0.5 = 1

If a basketball free-thrower makes the basket 72% of the time:

S = {Hit, Miss}

P(Hit) = 0.72
P(Miss) = 0.28
P(Hit or Miss) = P(Hit) + P(Miss) = 1.0

Recall, all probabilities in the sample space must add up to 1.
P(Neither hit nor miss) = 0



New Term: Disjoint




Important for our next probability rule…
Disjoint essentially means ‘Mutually Exclusive’
If two events are disjoint, then if one event is true, the other can NOT be
true.
The two examples here are disjoint since if one of them is true, then the
other can not be.


Eg: P(born in Canada or born in Mexico)
Eg: P(draw Ace of Spades or draw Queen of Hearts)
Probability rules (cont’d )
3) Two events A and B are disjoint if they have no outcomes in common and can never happen
together. If two events A and B are disjoint, then the probability that A or B occurs is the sum of
their individual probabilities.
P(A or B) aka P(A ∪ B) = P(A) + P(B)
This rule is called the addition rule for disjoint events.
(Later we will learn a different formula to use for NON-disjoint events)
[Venn diagrams: A and B disjoint; A and B not disjoint]
When you find yourself using the word “OR” in a probability question,
you are probably going to need to use some form of the addition rule.
You will need to decide whether to use addition rule for disjoint events, or the addition rule
for non-disjoint events.
Example: Addition Rule for Disjoint Events
Example: If you flip two coins, what is the probability of getting ONLY heads or
ONLY tails?
S = {HH, HT, TH, TT}.
Here is the probability table (model):
Event   HH     HT     TH     TT
Prob    0.25   0.25   0.25   0.25
Note that the event {HH} and the event {TT} are disjoint. Therefore, we can use our
addition rule for disjoint events. Recall that this rule says that: If two events are disjoint,
the probability of one event happening OR the other event happening is simply the sum
of the probabilities of each event individually.
Answer: So, the probability that you obtain “only heads or only tails” is:
P(HH or TT)
= P(HH) + P(TT)
= 0.25 + 0.25
= 0.50
The “Addition Rule for Disjoint Events”





This probability rule for disjoint events says that the probability of A
happening OR B happening equals the sum of
the two individual probabilities.
Put as a formula:
P(A or B) = P(A) + P(B)
HOWEVER! It must be noted that this rule only applies for
“disjoint” events. If the events are not disjoint, you cannot use this
formula to determine P(A or B).
If you want to find P(A or B) when the events are
not disjoint, you must use a different formula.
You will encounter rules such as these throughout the course.
The key point of this slide: It is important to pay attention to
the details! (e.g. “disjoint”)
Example of Non-Disjoint events

Example: P(draw 6 of Spades or draw an even numbered card). Yes, we
see the word ‘OR’ here, which makes it tempting to use the addition rule for
disjoint events. However, note that it IS possible for both of these events to
be true (the 6 of Spades is itself an even numbered card). Therefore, the rule cannot be used.

Example: Suppose you have a group of 100 athletes. Of those athletes,
30% are basketball players, and 25% are over 6 feet tall.
What is the probability that an athlete is a basketball player OR is over 6 feet tall?

As with the previous example, you may be tempted to use our addition rule for
disjoint events: P(Basketball Player) + P(over 6 feet tall)

However, because these two events are not disjoint, doing so would give an
incorrect result!

It is possible for any member of the population to be BOTH a basketball player
and also over 6 feet tall. The addition rule that we have discussed will NOT give a
valid answer in this case. We will soon modify our addition formula to deal with
non-disjoint events.
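To see how the disjoint rule goes wrong for the card example, we can count cards directly instead of using any formula. This sketch assumes a standard 52-card deck and takes "even numbered card" to mean the 2, 4, 6, 8 and 10 of each suit; both are assumptions for illustration.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
deck = {(rank, suit) for rank in ranks for suit in suits}

six_of_spades = {("6", "Spades")}
even_cards = {(rank, suit) for rank, suit in deck if rank in {"2", "4", "6", "8", "10"}}

# Naive (wrong here): add the two probabilities as if the events were disjoint.
p_naive = Fraction(len(six_of_spades), len(deck)) + Fraction(len(even_cards), len(deck))

# Direct count: how many cards satisfy at least one of the two events?
p_actual = Fraction(len(six_of_spades | even_cards), len(deck))

print(p_naive, p_actual)  # 21/52 5/13
```

The naive sum is too large by exactly the one card that belongs to both events.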
Example:
On a single roll of a die, what is the probability of rolling a 1 or a 3 or a 5?
You see the word ‘OR’, so you should start thinking about the addition rule.
In that case, the first thing you need to ask yourself is, are the events
disjoint?
Answer: Yes, they cannot overlap. That is, on a single roll, you can’t end
up with more than one of the possible events at the same time.
So, using our addition rule:
S = {1,2,3,4,5,6}
E = {1, 3, 5}
P(1 or 3 or 5)
= P(1) + P(3) + P(5)
= (1/6) + (1/6) + (1/6)
= 3/6 i.e. P(1 or 3 or 5) = 0.5
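The habit of asking "are the events disjoint?" before adding can be built into a small helper. This is a sketch only; the function name `p_or_disjoint` and the dictionary representation of a probability model are illustrative assumptions.

```python
from fractions import Fraction

def p_or_disjoint(model, *events):
    """Addition rule for disjoint events.

    model maps each outcome to its probability; each event is a set of outcomes.
    Raises ValueError if any two events share an outcome.
    """
    seen = set()
    total = Fraction(0)
    for event in events:
        if seen & event:
            raise ValueError("events are not disjoint; the rule does not apply")
        seen |= event
        total += sum(model[outcome] for outcome in event)
    return total

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(p_or_disjoint(die, {1}, {3}, {5}))  # 1/2
```

Calling it with overlapping events such as {1, 2} and {2, 3} raises an error instead of quietly returning a wrong answer.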
Another example of a Probability Model:
Example: A couple wants three children. What are the numbers of girls they could
end up with? What are the probabilities for each outcome? Create a probability table.
Did you try to use the multiplication rule?! To calculate the probabilities for each
possible value, we must in this case use the addition rule.
Sample space: Let X = possible number of girls: {0, 1, 2, 3}
P(X = 0) = P(BBB) = 1/8
P(X = 1) = P(BBG or BGB or GBB) = P(BBG) + P(BGB) + P(GBB) = 3/8
P(X = 2) = P(BGG or GBG or GGB) = P(BGG) + P(GBG) + P(GGB) = 3/8
P(X = 3) = P(GGG) = 1/8
Important: Note that the sample space in this question is NOT {BBB, BBG, BGB, GBB, GGB,
GBG, BGG, GGG}
Remember that it is vitally important to properly identify your sample space!!!
Examples using a Probability Table
X = # of girls in a household with 3 children
Example: What percentage of households has fewer than 2 girls?
Answer: P(X<2) = 3/8 + 1/8 = 0.5
Example: What is the probability that a randomly selected household has at least
one girl?
Answer: P(X>=1) = 3/8 + 3/8 + 1/8 = 7/8
A better idea would be to use the complement rule: P(X>=1) = 1 – P(X=0) = 1 – 1/8 = 7/8
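The probability table and both answers can be rebuilt by listing the eight birth sequences. A sketch, assuming boys and girls are equally likely and births are independent (the same assumptions the slides use); the helper `p` is a made-up name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

births = list(product("BG", repeat=3))               # 8 equally likely sequences
girl_counts = Counter(seq.count("G") for seq in births)

# Probability model for X = number of girls.
model = {x: Fraction(n, len(births)) for x, n in sorted(girl_counts.items())}

def p(condition):
    """Sum the probabilities of the values of X that satisfy the condition."""
    return sum(prob for x, prob in model.items() if condition(x))

print(model[0], model[1], model[2], model[3])  # 1/8 3/8 3/8 1/8
print(p(lambda x: x < 2))                      # 1/2
print(p(lambda x: x >= 1))                     # 7/8
print(1 - model[0])                            # 7/8, via the complement
```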
Probability rules (cont’d)
4) The complement of any event A refers to an event in which A does not occur. It is
written as A^c.
Put as a formula: The complement rule states that the probability of an event not
occurring is 1 minus the probability that it does occur.
P(not A) = P(A^c) = 1 − P(A)
Coin Toss Example:
S = {Head, Tail}
Probability of heads = 0.5
Probability of tails = 0.5
Tail^c = not Tail = Head
P(Tail^c) = 1 − P(Tail) = 0.5
[Venn diagram: sample space made up of an event A and its complement A^c, i.e., everything that is not A.]
Complement:



Can be very useful. Sometimes it’s just a shortcut. Other times, it is a
tool without which it would be essentially impossible to solve the
problem.
As a simple example, suppose you want to calculate the probability
of rolling a 1 or a 2 or a 3 or a 4 or a 5. This could be determined by
calculating P(1) + P(2) + P(3) + P(4) + P(5)
However, an easier way would be to calculate the complement of
rolling a 6:

P(6^c) = 1 – P(6) = 1 – 1/6 = 5/6
Probabilities: finite number of outcomes
Finite sample spaces deal with discrete data — data that can only
take on a limited number of values. For example:
Throwing a die:
S = {1, 2, 3, 4, 5, 6}
The sample space of a roll of a die is finite: {1, 2, 3, 4, 5, 6}
The sample space of people’s heights is not finite: any value in the interval [0”, 110”]
As with disjoint/non-disjoint, we will see that there are different probability formulas
that are used depending on whether the sample space is discrete vs. non-discrete.
* M&M candies
If you draw an M&M candy at random from a bag, the candy will have one
of six colors. The probability of drawing each color depends on the proportions
manufactured, as described here:
Color         Brown   Red   Yellow   Green   Orange   Blue
Probability   0.3     0.2   0.2      0.1     0.1      ?
What is the probability that an M&M chosen at random is blue?
S = {brown, red, yellow, green, orange, blue}
P(S) = P(brown) + P(red) + P(yellow) + P(green) + P(orange) + P(blue) = 1
P(blue) = 1 – [P(brown) + P(red) + P(yellow) + P(green) + P(orange)]
= 1 – [0.3 + 0.2 + 0.2 + 0.1 + 0.1] = 0.1
What is the probability that a random M&M is either red, yellow, or orange?
P(red or yellow or orange) = P(red) + P(yellow) + P(orange)
= 0.2 + 0.2 + 0.1 = 0.5
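Both M&M answers follow from rule 2 and the addition rule. The sketch below uses exact fractions to avoid floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

# Known probabilities from the table; blue is the missing entry.
known = {
    "brown": Fraction("0.3"),
    "red": Fraction("0.2"),
    "yellow": Fraction("0.2"),
    "green": Fraction("0.1"),
    "orange": Fraction("0.1"),
}

# Rule 2: all probabilities in the sample space add up to 1.
p_blue = 1 - sum(known.values())

# Addition rule: a single candy cannot be two colors, so the events are disjoint.
p_red_yellow_orange = known["red"] + known["yellow"] + known["orange"]

print(p_blue)               # 1/10
print(p_red_yellow_orange)  # 1/2
```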
M&M candies
Color         Brown   Red   Yellow   Green   Orange   Blue   Fuchsia
Probability   0.3     0.2   0.2      0.1     0.1      ?      ?
Suppose I told you there was a SEVENTH color, ‘fuchsia’.
Now what is the probability that an M&M chosen at random is blue?
Answer: You can’t tell. The probabilities must add up to 1. All colors except
blue and fuchsia add up to 0.9. Blue and fuchsia must make up the
remaining 0.1. We have no way of knowing the exact amounts.
eg: Blue = 0.06, Fuchsia = 0.04?
Blue = 0.05, Fuchsia = 0.05?
Blue = 0.03, Fuchsia = 0.07?
etc
Methods of determining the probability of an event:
There are only 2 ways of assigning probabilities with some degree of accuracy:
Theoretically: from our understanding of the phenomenon and symmetries in
the problem

For a 6-sided fair die, P(2) = 1/6

For a deck of cards, P(Ace of Hearts) = 1/52
Empirically: We look at data. In other words, we are deriving our knowledge
based on numerous similar past events. We determine a probability empirically
when there isn’t a theoretical method of determining the probability.

EXAMPLE: What is the probability that a random student in IT-223 will receive an
A? This probability cannot be determined theoretically the way, say, a coin flip can
be. Instead, we’d have to look at past data from IT-223 courses and see how many
As were recorded. If in a sample of 600 students, 54 A’s were recorded, then
P(grade of A) = 54 / 600 = 0.09.

EXAMPLE: What is the probability of a baseball player getting a hit on a given at
bat? Again, we would have to look at past data to see the player’s batting average. If
over the last year, the player hit 122 times out of 411 at bats, P(Hit) = 122/411 (about 0.297).
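The two methods can be put side by side in a short sketch. The first call uses the grade counts from the example above; the die simulation is an added illustration (assuming a fair die and a seeded pseudo-random generator) of how an empirical estimate approaches the theoretical value 1/6.

```python
import random

def empirical_probability(successes, trials):
    """Empirical probability: observed successes divided by observed trials."""
    return successes / trials

def simulate_die_rolls(num_rolls, face=2, seed=0):
    """Estimate P(face) empirically from simulated rolls of a fair die."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == face)
    return empirical_probability(hits, num_rolls)

print(empirical_probability(54, 600))   # grade of A, from past data
print(simulate_die_rolls(60_000))       # close to the theoretical 1/6
print(1 / 6)
```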
Shortcut: Determining a probability when all outcomes are equally likely
When all outcomes are equally likely (have the same probability), there is a nice
quick way of calculating probabilities:
If a random phenomenon has a group of equally likely possible outcomes (e.g.
die roll, coin flip) then each individual outcome has probability 1/(# of possible
outcomes).
Eg: For a coin flip, all possible outcomes have an equal probability (0.5)
Eg: For a die roll, all outcomes have an equal probability (1/6)
Formula: For any event A where all outcomes have the same probability:
P(A) = (count of outcomes in A) / (count of outcomes in S)
So when looking at situations where all outcomes have the same likelihood,
this is a very convenient way of calculating a probability.
See example on next slide.
Dice
You toss two dice. What is the probability of the outcomes summing to 5?
S:
{(1,1), (1,2), (1,3),
……etc.}
There are 36 possible outcomes in S, all equally likely (given fair dice).
Thus, the probability of any one of them is 1/36.
P(sum of 5) = 4 outcomes that sum to 5 / 36 outcomes in sample space
= 4/36
(The four outcomes are (1,4), (2,3), (3,2) and (4,1).)
EXAMPLE: What is the probability of the ball landing in an even number slot on a roulette wheel with 38 slots?
Answer: All slots have an equal probability. So:
P(Even)
= count of Even outcomes / count of All outcomes
= 18 / 38
= 0.47
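Both equally-likely examples come down to counting. In this sketch the wheel layout (slots 0, 00 and 1 to 36, with 0 and 00 not counted as even) is an assumption chosen to match the 18 out of 38 used above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Two dice: tally how many of the 36 outcomes give each sum.
rolls = list(product(range(1, 7), repeat=2))
sum_counts = Counter(a + b for a, b in rolls)
p_sum_5 = Fraction(sum_counts[5], len(rolls))

# Roulette (hypothetical layout): "0", "00" and the numbers 1..36.
slots = ["0", "00"] + [str(n) for n in range(1, 37)]
even_slots = [s for s in slots if s not in ("0", "00") and int(s) % 2 == 0]
p_even = Fraction(len(even_slots), len(slots))

print(sum_counts[5], p_sum_5)           # 4 1/9
print(len(even_slots), float(p_even))   # 18, then roughly 0.47
```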
Probability rules (cont’d)
Multiplication Rule for Independent Events
5) Two events A and B are independent if knowing that one event has occurred
does not change the probability that the other will occur.
The multiplication rule for independent events:
P(A and B) = P(A) * P(B)
Again, this rule only applies to independent events. Later we will show a second rule that applies to
non-independent events
Independence

Flip a coin once and get ‘heads’. Does this result
change the probability of flipping heads again on
the second flip?

No, they are independent results
In a game of Poker, you draw a card from the deck
and get an Ace. Does this result change the
probability of getting an Ace on a second draw
from that same deck?

Yes, there are now only 3 aces left in the deck. These events are
not independent. (Initially, 4/52 chance, after removing 1 ace,
3/51 chance)
Example
Multiplication Rule for Independent Events
Example: What is the probability of getting a tails on two consecutive coin
tosses?
When we encounter the term ‘AND’ we typically think of the multiplication rule.
At that point we must ask ourselves if the events are independent. Because
two consecutive coin tosses are indeed independent:
P(first = Tail and second = Tail)
= P(first Tail) * P(second Tail)
= 0.5 * 0.5
= 0.25
Example:
A couple intends to have three children. What is the likelihood that they will have
only boys?


Each birth is independent of the next, so we can use the multiplication rule.
Example: P(BBB) = P(B)* P(B)* P(B) = (1/2)*(1/2)*(1/2) = 1/8
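As a sanity check, the multiplication rule can be compared against a direct count of the sample space. A sketch, assuming each birth (or toss) has two equally likely results; `p_all` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product
from math import prod

def p_all(*probabilities):
    """Multiplication rule: P(A and B and ...) for INDEPENDENT events."""
    return prod(probabilities, start=Fraction(1))

half = Fraction(1, 2)
p_bbb_rule = p_all(half, half, half)

# Cross-check by counting: BBB is 1 of the 8 equally likely sequences.
sequences = ["".join(s) for s in product("BG", repeat=3)]
p_bbb_count = Fraction(sequences.count("BBB"), len(sequences))

print(p_bbb_rule, p_bbb_count)  # 1/8 1/8
print(p_all(half, half))        # 1/4, two tails in a row
```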
Example:

You are dealt two cards out of a deck. Calculate the probability that
the cards are the Ace of Spades and the Ace of Hearts.

Solution: You will probably be tempted to use the multiplication rule.
However, the rule you’ve been given only applies to independent events.
Unfortunately, these events are not independent. Once one event has
occurred (e.g. drawing the ace of spades from the deck), the probability
for the second event changes.

Later, we will learn a slight modification to this rule so that it can also
be applied to non-independent events.

If, however, we changed the question to state that instead of pulling two
cards out of the deck, we pull out one card, then put it back and pull out
another card, we could use our multiplication rule, since these are now
independent events.
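Until the rule for non-independent events is introduced, the two versions of this question can still be compared by brute-force counting. This sketch assumes a standard 52-card deck and that the two aces may arrive in either order; it counts ordered pairs of cards rather than applying any formula.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]
target = {("A", "Spades"), ("A", "Hearts")}

def p_target(pairs):
    """Fraction of equally likely ordered pairs that are exactly the two aces."""
    pairs = list(pairs)
    favorable = sum(1 for pair in pairs if set(pair) == target)
    return Fraction(favorable, len(pairs))

p_without_replacement = p_target(permutations(deck, 2))  # dealt two cards
p_with_replacement = p_target(product(deck, repeat=2))   # card put back each time

print(p_without_replacement)  # 1/1326
print(p_with_replacement)     # 1/1352
```

The two answers differ, which is exactly why it matters whether the draws are independent.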
Disjoint vs Independent


The two are often confused. Be sure that you are clear on the
meaning of each.
One common question has to do with the relationship between
them. Is there one? For example, can two events be both disjoint and
independent?
1. Disjoint = If two events A and B are disjoint, we are saying that if A
occurs, then B cannot occur. So let’s say that our events are indeed
disjoint.
2. If our events are disjoint, this says that P(B) is affected by the occurrence of A:
once A has occurred, the probability of B drops to 0. If P(B) is
affected by the occurrence of A, then the two events are NOT
independent.
3. So the answer to our question is NO! When you think about it, if two
events (each with a non-zero probability) are disjoint, then those events can NOT be independent.
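A quick numeric check on a single roll of a fair die makes the distinction concrete. The events A, B and C below are made-up examples; independence is tested with the multiplication rule P(A and B) = P(A) * P(B).

```python
from fractions import Fraction

def p(event):
    """Probability of an event (a set of faces) on one roll of a fair die."""
    return Fraction(len(event), 6)

def is_disjoint(e1, e2):
    return not (e1 & e2)

def is_independent(e1, e2):
    return p(e1 & e2) == p(e1) * p(e2)

A = {1, 2}       # roll a 1 or a 2
B = {5, 6}       # roll a 5 or a 6
C = {2, 4, 6}    # roll an even number

print(is_disjoint(A, B), is_independent(A, B))  # True False
print(is_disjoint(A, C), is_independent(A, C))  # False True
```

A and B are disjoint and therefore not independent; A and C overlap and happen to be independent.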
Example: (4.33 p.247)

A state lottery’s Pick 3 game asks players to choose a 3-digit number, 000
to 999. The state chooses the winning number at random, so each number
has probability 1/1000. You win if the winning number contains the digits in
your number, in any order.



Your number is 456. What is your probability of winning? (Write on a piece of
paper)
Your number is 212. What is your probability of winning?
Solution: As always, you have to read the question carefully, and
always be sure to watch the fine print! In this case, the three tiny
words at the end, “in any order” change the entire nature of the
problem.


So, in addition to winning with 456 itself, where your probability would be 1/1000,
you can also win with 465, or 546, or 564 etc. If you look at all possible
permutations, you will see that there are 6 distinct possibilities: {456,
465, 546, 564, 645, 654}. Since each one has a 1/1000 chance of
winning, and the six possibilities are disjoint, the probability of winning is
6/1000 or 0.006.
For the second part, you only have three distinct arrangements: {212,
221, 122}. So your probability of winning is 3 * 1/1000 = 0.003.
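Counting the distinct arrangements by hand is error-prone when digits repeat, so here is a sketch that does it mechanically. The function name is illustrative; the 1/1000 per number comes from the problem statement.

```python
from fractions import Fraction
from itertools import permutations

def p_win_any_order(ticket):
    """Probability of winning Pick 3 when the digits may appear in any order."""
    # A set removes duplicate arrangements (e.g. the two 2s in "212").
    winning_numbers = {"".join(p) for p in permutations(ticket)}
    return Fraction(len(winning_numbers), 1000)

print(p_win_any_order("456"))  # 3/500, i.e. 6/1000
print(p_win_any_order("212"))  # 3/1000
```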
Example - 4.34 (p.247)

The PINs for ATMs usually consist of 4 digits. You notice that most PINs have
at least one 0 and you wonder if the issuers use lots of 0s to make the
numbers easy to remember. Suppose that PINs are assigned at random so
that all 4-digit numbers are equally likely.


How many possible PINs are there? – Write down your answer
What is the probability that a PIN assigned at random has at least one 0? – Write.
SOLUTION:
Possible PINs: 10^4 = 10,000
The operative phrase for the second question is “at least”. In other words, what
is the probability that 1 or 2 or 3 or all 4 digits are zero? This turns out to
be a surprisingly involved calculation. However, if you remember your trusty
complement rule, it becomes surprisingly easy! You realize that you could
phrase things another way: If I can determine the probability that NONE of the digits
are 0, then the complement of that equals the probability that AT LEAST one is 0.
P(one digit = 0) = 0.1, P(not 0) = 0.9
P(all 4 digits are not 0) = 0.9*0.9*0.9*0.9 = 0.6561. Don’t forget that this is
the probability that NONE are 0. So the probability that 1 or more are zero = 1 – 0.6561 = 0.3439.
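Because there are only 10,000 PINs, the complement-rule answer can be verified by checking every one of them. A sketch, assuming PINs run from 0000 to 9999 and are all equally likely:

```python
from fractions import Fraction

# Complement rule: 1 - P(no digit is 0), with independent digits.
p_complement = 1 - Fraction(9, 10) ** 4

# Brute force: look at every PIN and count those containing a 0.
pins = [f"{n:04d}" for n in range(10_000)]
with_zero = sum(1 for pin in pins if "0" in pin)
p_brute_force = Fraction(with_zero, len(pins))

print(with_zero)             # 3439
print(float(p_complement))   # 0.3439
print(p_complement == p_brute_force)  # True
```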
Topics we’ve just covered:
Randomness and Probability models

Probability and Randomness

Sample spaces

Probability rules

Assigning probabilities: finite number of outcomes

Assigning probabilities: equally likely outcomes

Independence and multiplication rule
Write a short summary of each of these on a sheet
Keep that sheet in front of you as you are studying, and refer to it frequently.
It is very useful both for review, and to keep from confusing all these different concepts.
Probability: The Study of
Randomness
Random Variables
IPS Chapters 4.3 and 4.4
© 2009 W.H. Freeman and Company
Objectives (IPS Chapters 4.3 and 4.4)
Random variables

Discrete random variables

Continuous random variables

Normal probability distributions

Mean of a random variable

Law of large numbers

Variance of a random variable

Rules for means and variances
Discrete random variables
A random variable is a variable whose value is a numerical outcome of
a random phenomenon.
A basketball player shoots three free throws. We define the random variable
X as the number of baskets successfully made.
A discrete random variable X has a finite number of possible values.
A basketball player shoots three free throws. The number of baskets
successfully made is a discrete random variable (X). X can only take the
values 0, 1, 2, or 3.
* Probability Distribution
The probability distribution of a random variable X lists the values
and their probabilities.
The probabilities p_i must add up to 1. (Recall Probability Rule #2)
A basketball player with a 50% free throw accuracy shoots three free throws.
The random variable X is the number of baskets successfully made. Here is a
probability table:
Value of X   Probability   Outcomes
0            1/8           MMM
1            3/8           HMM, MHM, MMH
2            3/8           HHM, HMH, MHH
3            1/8           HHH
[Tree diagram: each shot branches into H or M, giving the sequences HHH, HHM, HMH, HMM, …]
The probability of any event is the sum of the probabilities p_i of the
various values of X that make up the event.
A basketball player shoots three free throws. He typically makes a basket exactly 50% of
the time. The random variable X is the number of baskets successfully made.
Value of X   Probability   Outcomes
0            1/8           MMM
1            3/8           HMM, MHM, MMH
2            3/8           HHM, HMH, MHH
3            1/8           HHH
What is the probability that the player successfully makes at least* two
baskets (“at least two” means “two or more”)?
P(X≥2) = P(X=2) + P(X=3) = 3/8 + 1/8 = 1/2
In this case, the event is “2 or more free throws”. The possible values of
X making up this event are {2, 3}
* “at least”: An important concept that has a way of showing up very often in real life (and on exams)
The probability of any event is the sum of the probabilities p_i of the
[various] values of X that make up the event.
A basketball player shoots three free throws. The random variable X is the
number of baskets successfully made.
Value of X   Probability   Outcomes
0            1/8           MMM
1            3/8           HMM, MHM, MMH
2            3/8           HHM, HMH, MHH
3            1/8           HHH
What is the probability that the player successfully makes fewer than three
baskets?
What is the event? What are the values of X that make up this event?
Event: <3 baskets
Values of X: {0, 1, 2}
P(X<3) = P(X=0) + P(X=1) + P(X=2) = 1/8 + 3/8 + 3/8 = 7/8
We can also use the complement:
P(X<3) = 1 – P(X=3) = 1 – 1/8 = 7/8
Continuous random variables
A continuous random variable X takes all values in an interval (range).
Example: There is an infinity of numbers between 0 and 1 (e.g., 0.001, 0.4, 0.0063876).
How do we assign probabilities to events in a sample space that is infinite?!?
Answer: We use density curves and compute the probabilities for intervals.
The probability of any event is the area under the density curve for the
values of X that make up the event.
Shown here is a uniform density curve for the
variable X.
The probability that X falls between 0.3 and 0.7 is
the area under the density curve for that interval:
P(0.3 ≤ X ≤ 0.7) = (0.7 – 0.3)*1 = 0.4
[Figure: uniform density curve for X, height 1]
“Uniform” Density Curve





“Uniform” means that all probabilities are identical
A uniform density curve is a straight horizontal line (red line in this diagram)
The area under the curve represents the total probability and sums
to 1
Recall that density curves can come in all kinds of shapes
The bell curve is only one kind of distribution. It is so familiar to us
because this kind of distribution shows up very often in the real
world:

people’s heights
exam scores
intelligence scores
etc
In probabilities, we frequently encounter situations where all
possible outcomes have the same probabilities (e.g. on a die roll,
all numbers have an equal probability of being rolled). In this case,
there will not be any curve to the density curve. Instead, it will be a
straight horizontal line. This is called a “uniform density curve”.
Intervals
The probability of a single value is meaningless for a continuous
random variable. Only intervals can have a non-zero probability.
Again, probability is represented by the area under the density curve for that interval.
[Figure: uniform density curve for X, height = 1]
The probability of a single value is zero:
P(X=1) = (1 – 1)*1 = 0
The probability of an interval is the same whether
boundary values are included or excluded:
P(0 ≤ X ≤ 0.5) = (0.5 – 0)*1 = 0.5
P(0 < X < 0.5) = (0.5 – 0)*1 = 0.5
P(0 ≤ X < 0.5) = (0.5 – 0)*1 = 0.5
P(X < 0.5 or X > 0.8) = P(X < 0.5) + P(X > 0.8) = 1 – P(0.5 ≤ X ≤ 0.8) = 0.7
Example: Random number between 0-2


In this case, you are generating a single random number between 0
and 2. The density curve is shown below.
What is the height?
Answer: 0.5. Recall that the total area under a density curve represents the
complete range of probabilities. Also recall that the total probability must equal 1.
So if your range is from 0 to 2, then the height must be 0.5 (base 2 * height 0.5 = 1).

What is the probability that a random number generated will be < 1.5?
Answer: Find the area between 0 and 1.5, i.e. (1.5 – 0)*0.5 = 0.75

Find the probability of a random number between 0.6 and 1.7.
Answer: (1.7 – 0.6)*0.5 = 0.55
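The uniform examples above all use the same rule: area = width * height. A minimal sketch of that rule follows. The function name uniform_prob and its parameters are illustrative choices, not part of the slides.

```python
def uniform_prob(a, b, low=0.0, high=1.0):
    """P(a <= X <= b) for X uniform on [low, high], computed as width * height."""
    height = 1.0 / (high - low)  # the total area under the curve must equal 1
    # Only the part of [a, b] that overlaps [low, high] carries any probability.
    left, right = max(a, low), min(b, high)
    width = max(0.0, right - left)
    return width * height


print(uniform_prob(0.3, 0.7))            # uniform on [0, 1]
print(uniform_prob(0.0, 1.5, 0.0, 2.0))  # uniform on [0, 2]
print(uniform_prob(0.6, 1.7, 0.0, 2.0))
print(uniform_prob(1.0, 1.0))            # a single value has zero width
```

Because a single value has zero width, including or excluding the boundary values does not change the result.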
Example: Another example of a density curve
We generate two random numbers between 0 and 1 and take Y to be their sum.
Therefore, Y can take any value between 0 and 2. The density curve for Y is:
Height = 1. We know this because the
base = 2, and the area under the
curve has to equal 1 by definition.
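(Figure: triangular density curve for Y, rising from 0 at Y = 0 to a peak of height 1 at Y = 1, then falling back to 0 at Y = 2.)
The area of a triangle is
½ (base*height).
What is the probability that Y is < 1? Answer: ½ (1*1) = 0.5
What is the probability that Y < 0.5? Answer: ½ (0.5*0.5) = 0.125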
Question: Is this a uniform density curve?
Answer: No. Recall that a uniform density curve is one in which the curve has
the same height at every possible value, so that intervals of equal length have equal probability.
Here the height of the curve at Y = 1 is different from the height at Y = 1.3, which is different from the height at Y = 0.14, etc.
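The triangle areas can be checked with a short sketch. It assumes the two random numbers are independent and uniform on 0 to 1, which is what gives the triangular curve. The name triangle_cdf is illustrative.

```python
def triangle_cdf(y):
    """P(Y <= y), where Y is the sum of two independent uniform(0, 1) numbers."""
    if y <= 0:
        return 0.0
    if y <= 1:
        # Left triangle with base y and height y.
        return 0.5 * y * y
    if y < 2:
        # Total area 1 minus the right-hand triangle with base (2 - y) and height (2 - y).
        return 1.0 - 0.5 * (2 - y) ** 2
    return 1.0


print(triangle_cdf(1.0))                      # P(Y < 1)
print(triangle_cdf(0.5))                      # P(Y < 0.5)
print(triangle_cdf(1.5) - triangle_cdf(0.5))  # P(0.5 < Y < 1.5)
```

Intervals of the same length get different probabilities here (compare 0 to 0.5 with 0.5 to 1), which is why the curve is not uniform.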
Continuous random variable and population distribution
There are two ways of looking at the
information under a density curve.
The shaded area under a density
curve shows the proportion, or %,
of individuals in a population with
values of X between x1 and x2.
Because the probability of drawing
one individual at random
depends on the frequency of this
type of individual in the population,
the probability is also the shaded
area under the curve.
% individuals with X
such that x1 < X < x2
What is the probability, if we pick one woman at random, that her height will be
some value X? For instance, between 68 and 70 inches P(68 < X < 70)?
Because the woman is selected at random, X is a random variable.
N(µ, σ) = N(64.5, 2.5)
z = (x – µ)/σ

As before, we calculate the z-scores for 68 and 70.
For x = 68", z = (68 – 64.5)/2.5 = 1.4; area to the left = 0.9192
For x = 70", z = (70 – 64.5)/2.5 = 2.2; area to the left = 0.9861

The area under the curve for the interval [68" to 70"] is 0.9861 − 0.9192 = 0.0669.
Thus, the probability that a randomly chosen woman falls into this range is 6.69%.
P(68 < X < 70) = 6.69%
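The table lookup can be cross-checked in code. This sketch computes the Normal area from the error function in Python's standard library instead of a z-table. The helper names normal_cdf and normal_interval are illustrative.

```python
import math


def normal_cdf(x, mu, sigma):
    """Area under the N(mu, sigma) curve to the left of x."""
    z = (x - mu) / sigma  # standardize, as on the slide
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval(a, b, mu, sigma):
    """P(a < X < b) as the difference of two left-hand areas."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


p = normal_interval(68, 70, mu=64.5, sigma=2.5)
print(round(p, 4))
```

The result should agree with the table-based answer up to rounding of the table entries.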
What is the probability, if we pick one woman at random, that her height
will be 70 inches?
Answer: Zero. Recall that with continuous variables, the probability of any
single exact value is 0. We can only find a non-zero probability for
a range of values.
* Mean of a random variable
The mean (x-bar) of a set of actual observations is their arithmetic average.
E.g. The mean exam score of all students in a stats class.
The mean µ of a random variable X is a weighted average of the possible
values of X, reflecting the fact that all outcomes might not be equally likely.
A “50% free throwing” basketball player shoots three free throws. The random
variable X is the number of baskets successfully made (“H”).
Outcomes        Value of X   Probability
MMM             0            1/8
HMM MHM MMH     1            3/8
HHM HMH MHH     2            3/8
HHH             3            1/8
The mean of a random variable X is also called expected value of X.
Mean of a discrete random variable
For a discrete random variable X with
the following probability distribution
Value of X    x1  x2  x3  …  xk
Probability   p1  p2  p3  …  pk
the mean µ of X is found by multiplying each possible value of X by its
probability, and then adding the products:
µ = x1*p1 + x2*p2 + … + xk*pk
A basketball player shoots three free throws. The random variable X is the
number of baskets successfully made.
Value of X    0    1    2    3
Probability   1/8  3/8  3/8  1/8
The mean µ of X is
µ = (0*1/8) + (1*3/8) + (2*3/8) + (3*1/8)
= 12/8 = 3/2 = 1.5
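The same weighted average can be written as a short sketch. Exact fractions are used so the result matches the hand calculation. The names mean and free_throws are illustrative.

```python
from fractions import Fraction


def mean(dist):
    """Mean of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


free_throws = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

# A valid probability distribution must sum to 1.
assert sum(free_throws.values()) == 1

mu = mean(free_throws)
print(mu)  # 3/2
```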
Mean of a continuous random variable
The probability distribution of continuous random variables is
described by a density curve.
With symmetric curves, such as
the Normal curve, the mean lies at
the center.
Determining the exact mean of a
distribution with a skewed density
curve is more complex.
** Law of large numbers
As the number of randomly drawn
observations (n) in a sample
increases, the mean of the sample
(x-bar) gets closer and closer to
the population mean µ.
This is the law of large numbers.
It is valid for any population.
Note: Humans typically expect predictability over a few random observations, but this is wrong! (e.g.
flipping a coin 10 times will often not yield a proportion of heads of 0.5; flip a million times and you’ll get extremely close to
0.5.) The law of large numbers only applies to really large numbers.
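A small simulation illustrates the point. This is a sketch that assumes a fair coin. The seed and the sample sizes are arbitrary choices, and the exact proportions printed depend on the seed.

```python
import random


def proportion_heads(n_flips, rng):
    """Simulate n_flips fair coin flips and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    print(n, proportion_heads(n, rng))
```

With only 10 flips the proportion can land far from 0.5. With 100,000 flips it is typically within a fraction of a percentage point.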
Standard Deviation (variance) of a random variable
The standard deviation / variance are the measures of spread that
accompany the choice of the mean to measure center. Recall that SD
is simply the square root of the variance.
The variance of a random variable is a weighted average of
the squared deviations of the variable X from its mean; the standard deviation
is its square root. Each outcome is
weighted by its probability in order to take into account outcomes that
are not equally likely.


Yes, this is the same variance and sd from before. However, when
calculating these values in a probability situation, we need to factor in the
different probabilities of each possible outcome.
Key Point: We need to take each possible outcome and assign a weight to it
according to its probability. Unlike dice rolls / coin flips, many cases do NOT
have outcomes of identical probability.
Standard Deviation of a discrete random variable
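For a discrete random variable X
with the probability distribution
Value of X    x1  x2  x3  …  xk
Probability   p1  p2  p3  …  pk
and mean µX, the variance of X is found by multiplying each squared deviation from the mean by its
probability and then adding all the products. The sd is the square root of the variance:
σ² = (x1 – µX)²*p1 + (x2 – µX)²*p2 + … + (xk – µX)²*pk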
A basketball player shoots three free throws. The random variable X is the
number of baskets successfully made.
µX = 1.5.
Value of X    0    1    2    3
Probability   1/8  3/8  3/8  1/8
The variance σ² of X is
σ² = 1/8*(0−1.5)² + 3/8*(1−1.5)² + 3/8*(2−1.5)² + 1/8*(3−1.5)²
= 2*(1/8*9/4) + 2*(3/8*1/4) = 24/32 = 3/4 = 0.75
so the sd is σ = √0.75 ≈ 0.87
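The variance calculation can be sketched the same way: weight each squared deviation by its probability, then add. The names variance and free_throws are illustrative.

```python
import math
from fractions import Fraction


def variance(dist):
    """Variance of a discrete random variable given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    # Weighted average of the squared deviations from the mean.
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


free_throws = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

var = variance(free_throws)
sd = math.sqrt(var)  # the sd is the square root of the variance
print(var, round(sd, 3))
```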
Practice on your own: 4.32/4.33
p. 267 and 269
(p.278 & 280 in 6th ed)

Linda is a sales associate at a large auto dealership. At her commission rate of 25% of
gross profit on each vehicle she sells, Linda expects to earn $350 for each car sold and
$400 for each truck or SUV sold. Linda motivates herself by using probability estimates
of her sales. For a sunny Saturday in April, she estimates her car and truck sales as follows:
Cars Sold      0    1    2    3
Prob           0.3  0.4  0.2  0.1

Trucks Sold    0    1    2    3
Prob           0.4  0.5  0.1  0
What is Linda’s mean expected income?
What is the spread of her expected income?
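One way to check your own work on this practice problem is the sketch below. It relies on two assumptions that the slide does not state: that means add across the two kinds of sales, and, for the spread, that car sales and truck sales are independent so that their variances add. All names here are illustrative.

```python
import math


def mean(dist):
    """Mean of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


cars = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}
trucks = {0: 0.4, 1: 0.5, 2: 0.1, 3: 0.0}
CAR_PAY, TRUCK_PAY = 350, 400  # dollars earned per vehicle sold

# Mean income: each mean count scaled by earnings per vehicle, then added.
mean_income = CAR_PAY * mean(cars) + TRUCK_PAY * mean(trucks)

# ASSUMPTION: car and truck sales are independent, so variances add.
var_income = CAR_PAY**2 * variance(cars) + TRUCK_PAY**2 * variance(trucks)
sd_income = math.sqrt(var_income)

print(round(mean_income, 2), round(sd_income, 2))
```

If the two kinds of sales are not independent, the mean is unchanged but the spread would need the relationship between them.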
Probability: The Study of
Randomness
General Probability Rules
IPS Chapter 4.5
Objectives (IPS Chapter 4.5)
Generalized probability rules

General addition rules

Conditional probability

General multiplication rules

Tree diagrams

Bayes’s rule
General addition rule
General addition rule for any two events A and B:
Including nondisjoint events!
The probability that A occurs,
B occurs, or both events occur is:
P(A or B) = P(A) + P(B) – P(A and B)
What is the probability of randomly drawing either an ace or a heart from a deck of
52 playing cards? There are 4 aces in the pack and 13 hearts. However, 1 card is
both an ace and a heart. If you’re not careful, you’ll count it twice. So you need to
subtract it once. Thus:
P(ace or heart) = P(ace) + P(heart) – P(ace and heart)
= 4/52 + 13/52 – 1/52 = 16/52 ≈ 0.31
Eg: A die roll: P(even or >2)?





P(Even) = 3/6
P(>2) = 4/6
P(Even OR >2) = 7/6???
No, because we count 4 and 6 two times. (Draw a Venn
diagram of the die outcomes.) We must subtract the outcomes that are
counted in both events: 4 (even and >2) and also 6 (even and >2). In
other words, P(Even AND >2) = 2/6. So:
P(Even OR >2) = 3/6 + 4/6 – 2/6 = 5/6
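The Venn diagram can be mimicked with sets. This sketch counts the outcomes directly and compares the count with the general addition rule, assuming a fair six-sided die. The names are illustrative.

```python
from fractions import Fraction

die = set(range(1, 7))  # sample space for one fair die
even = {x for x in die if x % 2 == 0}
over_two = {x for x in die if x > 2}


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(die))


# General addition rule: subtract the overlap that was counted twice.
by_rule = prob(even) + prob(over_two) - prob(even & over_two)
# Direct count of the union, where each outcome is counted once.
by_count = prob(even | over_two)

print(sorted(even & over_two), by_rule, by_count)
```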
Another example:

In a group of 100 athletes, 30% are basketball players. In that same group,
25% are over 6-feet tall.




P(Basketball Player) = 0.3
P(>6 feet tall) = 0.25
What is P(BB or >6 feet)? It is NOT P(Basketball Player) + P(>6 feet tall).
If we add 0.3 + 0.25, we will get an artificially
high number. This is because, of the 25 people who are over 6 feet tall, many are
basketball players as well. So we will have counted those people TWICE.
The reason is that these events are NOT disjoint. That is, it is possible for any
member of the population to be BOTH a basketball player and also over 6 feet tall.
The addition rule for disjoint events will NOT give a valid answer in this
case.
To get an accurate result, we have to factor in the group that we counted
twice, namely, those people who are BB and over 6 feet tall:
P(BB or >6’) = P(BB) + P(>6’) – P(BB and >6’)
* More than one non-disjoint event

EXAMPLE: What is the probability that a card from a deck is either a
King or a Queen or a Diamond?
= P(King) + P(Queen) + P(Diamond) – the overlaps between non-disjoint events
= P(King) + P(Queen) + P(Diamond) – P(King and Diamond) – P(Queen and Diamond)
= 4/52 + 4/52 + 13/52 – 1/52 – 1/52
= 19/52
= 0.365
(King and Queen are disjoint, so there is no King-and-Queen overlap to subtract.)
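The three-event calculation can be verified by listing the deck. This sketch assumes a standard 52-card deck. The rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of drawing a card in the event from a well-shuffled deck."""
    return Fraction(len(event), len(deck))


kings = {card for card in deck if card[0] == "K"}
queens = {card for card in deck if card[0] == "Q"}
diamonds = {card for card in deck if card[1] == "diamonds"}

# Direct count: the set union counts every card exactly once.
direct = prob(kings | queens | diamonds)
# Addition with the two overlaps removed; kings and queens never overlap.
by_rule = (prob(kings) + prob(queens) + prob(diamonds)
           - prob(kings & diamonds) - prob(queens & diamonds))

print(direct, by_rule, round(float(direct), 3))
```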
Conditional probability
Conditional probabilities reflect how the probability of an event can
change if we know that some other event has occurred.

Example: The probability that a cloudy day will result in rain is different if
you live in Los Angeles than if you live in Seattle.

Every single day, our brains calculate conditional probabilities, updating
our “degree of belief” with each new piece of evidence.
Notation: The conditional probability
of event B “given” event A is:
P(B | A) = P(A and B) / P(A)
Note: We assume that P(A) ≠ 0
Spoken as: “Probability of B given A”
P(A|B) is not the same as P(B|A)

P(Rain | Seattle) is not the same thing as P(Seattle | Rain)
General multiplication rule (“And”)

Probability that any two events, A and B, both occur:
P(A and B) = P(A) * P(B|A)
This is the general multiplication rule.
(i.e. NOT limited to independent events)

Recall that if A and B are independent, then P(A and B) = P(A) * P(B)
(A and B are independent when they have no influence on each other’s occurrence.)
Example: What is the probability of randomly drawing either an ace or heart from a deck
of 52 playing cards?
Answer: Use the addition rule
What is the probability of randomly drawing a card that is both an ace and a heart from a deck of 52
playing cards?
P(ace and heart) = P(ace)* P(heart | ace) = (4/52)*(1/4) = 1/52
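The conditional probability P(heart | ace) means "restrict attention to the aces, then ask how many are hearts". A sketch of that idea follows, assuming a standard deck. The card encoding (rank 1 for the ace, one letter per suit) is an illustrative choice.

```python
from fractions import Fraction

# Each card is (rank, suit); rank 1 stands for the ace.
deck = [(rank, suit) for rank in range(1, 14) for suit in "HDCS"]


def prob(predicate, cards):
    """Fraction of the given cards that satisfy the predicate."""
    return Fraction(sum(1 for card in cards if predicate(card)), len(cards))


def is_ace(card):
    return card[0] == 1


def is_heart(card):
    return card[1] == "H"


aces = [card for card in deck if is_ace(card)]

p_ace = prob(is_ace, deck)
p_heart_given_ace = prob(is_heart, aces)  # condition by shrinking the sample space
p_both = prob(lambda card: is_ace(card) and is_heart(card), deck)

print(p_ace, p_heart_given_ace, p_both)
```

The product p_ace * p_heart_given_ace equals p_both, which is the general multiplication rule.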
Sometimes it may help to switch ‘em around






Recall: P(A and B) = P(A) * P(B | A)
Sometimes when trying to figure out P(A and B), you can’t figure out P(B |
A).
Sometimes though, you do know the other ‘side’: P(A | B).
Now observe that P(A and B) is the same thing as saying P(B and A).
Therefore, you can say: P(B and A) = P(B) * P(A | B)
EXAMPLE: Suppose you visit Seattle 8 weekends out of the year.
P(Seattle) = 8/52. You also know that it rains in Seattle 38 weekends out of
the year: P(rain | Seattle) = 38/52. What is the probability that you will be in
Seattle this weekend and that it will be raining?


P(Rain and Seattle) = P(Rain) * P(Seattle | rain) → awkward; in fact, not
possible, since we don’t know some of the probabilities required here, such as
P(Rain).
P(Seattle and Rain) = P(Seattle) * P(Rain | Seattle) → much easier: (8/52)*(38/52) ≈ 0.11
Example – 4.43, p.283

Slim is playing poker and holds 3 diamonds. He hopes to draw two
diamonds in a row in order to get a flush (all cards of the same suit).
Of all the cards showing (those in Slim’s hand and those upturned
on the table), he sees 11 cards, 4 of which are diamonds. So 9 of
the 41 remaining unseen cards must be diamonds. What is the
probability that Slim will draw his two diamonds?



P(first card is a diamond) = 9/41
P(second card is a diamond | first card is a diamond) = 8/40
P(both cards are diamonds) = 9/41 * 8/40 = 0.044
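Slim's calculation is the general multiplication rule applied to two draws without replacement. The sketch below repeats it with exact fractions and cross-checks it by counting pairs of cards. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

unseen = 41        # cards Slim has not seen
diamonds_left = 9  # diamonds among the unseen cards

p_first = Fraction(diamonds_left, unseen)
# After one diamond is drawn, one fewer diamond and one fewer card remain.
p_second_given_first = Fraction(diamonds_left - 1, unseen - 1)
p_two = p_first * p_second_given_first

# Cross-check: diamond pairs divided by all possible pairs of unseen cards.
p_check = Fraction(comb(diamonds_left, 2), comb(unseen, 2))

print(p_two, round(float(p_two), 3))
```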
Conditional probability can be applied to more than two
events

What is the probability that a person chosen at random off the street
is: Italian, Male and plays the lute?





A = is Italian
B = is male
C = plays the lute
P(A and B and C) = P(A) * P(B | A) * P(C | A and B)
Key Point: It can get a bit unwieldy, but if you have the data, it is
possible to do these calculations.
Probability trees (skip)
Conditional probabilities can get complex, and it is often a good strategy
to build a probability tree that represents all possible outcomes
graphically and assigns conditional probabilities to subsets of events.
(Figure: tree diagram for chat room habits of adult Internet users in three age
groups: 18-29, 30-49, >=50.)
P(chatting) = 0.136 + 0.099 + 0.017
= 0.252
About 25% of all adult Internet users visit chat rooms.
Breast cancer screening
If a woman in her 20s gets screened for breast cancer and receives a positive
test result, what is the probability that she does have breast cancer?
(Figure: tree diagram of mammography performance.)
Disease incidence (breast cancer among women ages 20–30): P(cancer) = 0.0004, P(no cancer) = 0.9996
Given cancer: P(positive) = 0.8 (diagnosis sensitivity), P(negative) = 0.2 (false negative)
Given no cancer: P(positive) = 0.1 (false positive), P(negative) = 0.9 (diagnosis specificity)
She could either have a positive test and have breast cancer or have a positive
test but not have cancer (false positive).
Possible outcomes given the positive diagnosis: positive test and breast cancer
or positive test but no cancer (false positive).
P(cancer | pos) = P(cancer and pos) / [P(cancer and pos) + P(nocancer and pos)]
= 0.0004*0.8 / (0.0004*0.8 + 0.9996*0.1)
≈ 0.3%
This value is called the positive predictive value, or PV+. It is an important piece
of information but, unfortunately, is rarely communicated to patients.
Bayes’s rule
An important application of conditional probabilities is Bayes’s rule. It is
the foundation of many modern statistical applications beyond the
scope of this course.
* If a sample space is decomposed in k disjoint events, A1, A2, … , Ak,
none with a null probability and with P(A1) + P(A2) + … + P(Ak) = 1,
* And if C is any other event such that P(C) is not 0 or 1, then:
P(Ai | C) = P(C | Ai) P(Ai) / [P(C | A1) P(A1) + P(C | A2) P(A2) + … + P(C | Ak) P(Ak)]
However, it is often intuitively much easier to work out answers with a
probability tree than with these lengthy formulas.
If a woman in her 20s gets screened for breast cancer and receives a positive test
result, what is the probability that she does have breast cancer?
(Figure: the mammography tree diagram, with disease incidence 0.0004, diagnosis sensitivity 0.8, and diagnosis specificity 0.9.)
This time, we use Bayes’s rule:
A1 is cancer, A2 is no cancer, C is a positive test result.
P(cancer | pos) = P(pos | cancer) P(cancer) / [P(pos | cancer) P(cancer) + P(pos | nocancer) P(nocancer)]
= 0.8*0.0004 / (0.8*0.0004 + 0.1*0.9996)
≈ 0.3%
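Bayes's rule for the screening example can be sketched as a small function. The function name bayes_posterior and the dictionary layout are illustrative. The numbers are the incidence, sensitivity, and false positive rate from the tree diagram.

```python
def bayes_posterior(priors, likelihoods, target):
    """P(A_target | C) from the priors P(A_i) and the likelihoods P(C | A_i)."""
    # Denominator: total probability of C over all of the disjoint events A_i.
    total = sum(priors[event] * likelihoods[event] for event in priors)
    return priors[target] * likelihoods[target] / total


priors = {"cancer": 0.0004, "no cancer": 0.9996}    # disease incidence
positive_given = {"cancer": 0.8, "no cancer": 0.1}  # sensitivity, false positive rate

ppv = bayes_posterior(priors, positive_given, "cancer")
print(f"{ppv:.2%}")
```

The result is small because the disease is so rare in this age group: false positives from the large healthy group far outnumber true positives.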