Transcript Week 6

161.120 Introductory Statistics
Week 6 Lecture slides
• Probability
– CAST chapter 8
– Text sections 7.1 to 7.3, 7.6 and 7.7
• Random Variables
– Text sections 8.1, 8.2 and 8.5
– CAST section 9.1
7.1 Random Circumstances
A random circumstance is one in which
the outcome is unpredictable.
Case Study 1.1
Alicia Has a Bad Day
Doctor Visit:
Diagnostic test comes back positive for a disease (D).
Test is 95% accurate.
About 1 out of 1000 women actually have D.
Statistics Class:
Professor randomly selects 3 separate students at
the beginning of each class to answer questions.
Alicia is picked to answer the third question.
Random Circumstances in Alicia’s Day
Random Circumstance 1: Disease status
Alicia has D.
Alicia does not have D.
Random Circumstance 2: Test result
Test is positive.
Test is negative.
Random Circumstances in Alicia’s Day
Random Circumstance 3: 1st student’s name is drawn
Alicia is selected.
Alicia is not selected.
Random Circumstance 4: 2nd student’s name is drawn
Alicia is selected.
Alicia is not selected.
Random Circumstance 5: 3rd student’s name is drawn
Alicia is selected.
Alicia is not selected.
Assigning Probabilities
• A probability is a value between 0 and 1 and is
written either as a fraction or as a decimal fraction.
• A probability simply is a number between 0 and 1
that is assigned to a possible outcome of a random
circumstance.
• For the complete set of distinct possible outcomes
of a random circumstance, the total of the assigned
probabilities must equal 1.
7.2 Interpretations
of Probability
The Relative Frequency
Interpretation of Probability
In situations that we can imagine repeating
many times, we define the probability of a specific
outcome as the proportion of times it would occur
over the long run -- called the relative frequency
of that particular outcome.
Example 7.1 Probability of Male
versus Female Births
Long-run relative frequency of males
born in the United States is about .512.
Information Please Almanac (1991, p. 815).
Table provides results of simulation: the proportion is far from .512
over the first few weeks but in the long run settles down around .512.
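The settling-down behaviour can be reproduced with a small simulation. This is only a sketch: the function name, the seed and the number of simulated births are illustrative assumptions, and only the value .512 comes from the example.

```python
import random

def running_proportion_male(n_births, p_male=0.512, seed=1):
    """Simulate births one at a time and record the running proportion of males."""
    rng = random.Random(seed)  # seeded so the run is repeatable (assumed seed)
    males = 0
    proportions = []
    for i in range(1, n_births + 1):
        if rng.random() < p_male:  # a male birth occurs with probability p_male
            males += 1
        proportions.append(males / i)
    return proportions

props = running_proportion_male(200_000)
# Early proportions can be far from .512; later ones settle close to it.
for n in (10, 100, 1_000, 10_000, 200_000):
    print(n, round(props[n - 1], 4))
```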
Determining the Relative Frequency
Probability of an Outcome
Method 1: Make an Assumption about the Physical World
Example 7.2 A Simple Lottery
Choose a three-digit number between 000 and 999.
Player wins if his or her three-digit number is chosen.
Suppose the 1000 possible 3-digit numbers
(000, 001, 002, . . . , 999) are equally likely.
In long run, a player should win about 1 out of 1000
times.
This does not mean a player will win exactly once
in every thousand plays.
Determining the Relative Frequency
Probability of an Outcome
Method 1: Make an Assumption about the Physical World
Example 7.3 Probability Alicia has to Answer a Question
There are 50 student names in a bag.
If the names are mixed well, we can assume each
student is equally likely to be selected.
Probability Alicia will be selected to
answer the first question is 1/50.
Determining the Relative Frequency
Probability of an Outcome
Method 2: Observe the Relative Frequency
Example 7.4 The Probability of Lost Luggage
“1 in 176 passengers on U.S. airline carriers
will temporarily lose their luggage.”
This number is based on data collected over the long
run. So
the probability that a randomly selected passenger on a
U.S. carrier will temporarily lose luggage is 1/176 or
about 0.006.
Proportions and Percentages
as Probabilities
Ways to express the relative frequency of lost luggage:
• The proportion of passengers who lose their
luggage is 1/176 or about 0.006.
• About 0.6% of passengers lose their luggage.
• The probability that a randomly selected
passenger will lose his/her luggage is about 0.006.
• The probability that you will lose your luggage
is about 0.006.
Last statement is not exactly correct – your probability depends
on other factors (how late you arrive at the airport, etc.).
Estimating Probabilities
from Observed Categorical Data
Assuming data are representative, the
probability of a particular outcome is
estimated to be the relative frequency
(proportion) with which that outcome
was observed.
Approximate margin of error
for the estimated probability is 1/√n.
Example 7.5 Nightlights and Myopia
Revisited
Assuming these data are representative of a larger population,
what is the approximate probability that someone from that
population who sleeps with a nightlight in early childhood
will develop some degree of myopia?
Note: 72 + 7 = 79 of the 232 nightlight users developed some
degree of myopia. So we estimate the probability to be
79/232 = 0.34. This estimate is based on a sample of 232 people
with a margin of error of about 1/√232 ≈ 0.066.
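The arithmetic in this example can be checked with a few lines of Python. The helper name is an assumption made for illustration. The counts (72 + 7 out of 232) come from the example, and the margin of error is taken as one over the square root of the sample size.

```python
import math

def estimate_with_margin(count, n):
    """Return the estimated probability count/n and the approximate margin 1/sqrt(n)."""
    p_hat = count / n
    margin = 1 / math.sqrt(n)
    return p_hat, margin

p_hat, margin = estimate_with_margin(72 + 7, 232)
print(round(p_hat, 2), round(margin, 3))
```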
The Personal Probability Interpretation
Personal probability of an event = the degree
to which a given individual believes the event
will happen.
Sometimes called subjective probability because the
degree of belief may be different for each individual.
• Must fall between 0 and 1 (or between 0 and 100%).
7.3 Probability Definitions
and Relationships
• Sample space: the collection of unique,
nonoverlapping possible outcomes of a random
circumstance.
• Simple event: one outcome in the sample space;
a possible outcome of a random circumstance.
• Event: a collection of one or more simple
events in the sample space; often written as
A, B, C, and so on.
Example 7.6 Days per Week of Drinking
Random sample of college students.
Q: How many days do you drink alcohol
in a typical week?
Simple Events in the Sample Space are:
0 days, 1 day, 2 days, …, 7 days
Event “4 or more” consists of the
simple events {4 days, 5 days, 6 days, 7 days}
Assigning Probabilities to Simple Events
P(A) = probability of the event A
Conditions for Valid Probabilities
1. Each probability is between 0 and 1.
2. The sum of the probabilities over all
possible simple events is 1.
Equally Likely Simple Events
If there are k simple events in the sample space
and they are all equally likely, then the
probability of the occurrence of each one is 1/k.
Example 7.2 A Simple Lottery (cont)
Random Circumstance:
A three-digit winning lottery number is selected.
Sample Space: {000,001,002,003, . . . ,997,998,999}.
There are 1000 simple events.
Probabilities for Simple Event: Probability any specific
three-digit number is a winner is 1/1000.
Assume all three-digit numbers are equally likely.
Event A = last digit is a 9 = {009,019, . . . ,999}.
Since one out of every ten numbers is in the set, P(A) = 1/10.
Event B = three digits are all the same
= {000, 111, 222, 333, 444, 555, 666, 777, 888, 999}.
Since event B contains 10 events, P(B) = 10/1000 = 1/100.
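Because the simple events are equally likely, both probabilities can be checked by listing the sample space and counting. The sketch below assumes each ticket is stored as a three-character string; the helper name prob is illustrative.

```python
from fractions import Fraction

# Sample space: the 1000 three-digit numbers 000, 001, ..., 999.
sample_space = [f"{n:03d}" for n in range(1000)]

def prob(event):
    """P(event) when every simple event in the sample space is equally likely."""
    favourable = [s for s in sample_space if event(s)]
    return Fraction(len(favourable), len(sample_space))

p_a = prob(lambda s: s[-1] == "9")          # Event A: last digit is a 9
p_b = prob(lambda s: s[0] == s[1] == s[2])  # Event B: all three digits the same
print(p_a, p_b)
```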
Complementary Events
One event is the complement of another event
if the two events do not contain any of the same
simple events and together they cover the entire
sample space.
Notation: A^C represents the complement of A.
Note: P(A) + P(A^C) = 1
Example 7.2 A Simple Lottery (cont)
A = player buying single ticket wins
A^C = player does not win
P(A) = 1/1000 so P(A^C) = 999/1000
Mutually Exclusive Events
Two events are mutually exclusive,
or equivalently disjoint, if they do not contain
any of the same simple events (outcomes).
Example 7.2 A Simple Lottery (cont)
A = all three digits are the same.
B = the first and last digits are different
The events A and B are mutually exclusive
(disjoint), but they are not complementary.
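One way to check this claim is to build both events as sets of tickets. The sketch assumes the same string representation of tickets as the lottery sample space; the variable names are illustrative.

```python
# Sample space: the 1000 three-digit numbers as strings.
sample_space = {f"{n:03d}" for n in range(1000)}

A = {s for s in sample_space if s[0] == s[1] == s[2]}  # all three digits the same
B = {s for s in sample_space if s[0] != s[2]}          # first and last digits differ

# Mutually exclusive: the events share no simple events.
mutually_exclusive = A.isdisjoint(B)
# Complementary: disjoint AND together they cover the whole sample space.
complementary = mutually_exclusive and (A | B == sample_space)

print(mutually_exclusive, complementary, len(A), len(B))
```

A ticket such as 121 is in neither event, which is why A and B cannot be complements.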
Independent and Dependent Events
• Two events are independent of each other
if knowing that one will occur (or has
occurred) does not change the probability
that the other occurs.
• Two events are dependent if knowing that
one will occur (or has occurred) changes
the probability that the other occurs.
The definitions can apply either …
to events within the same random circumstance or
to events from two separate random circumstances.
Example 7.7 Winning a Free Lunch
Customers put business card in restaurant glass bowl.
Drawing held once a week for free lunch.
You and Vanessa put a card in two consecutive weeks.
Event A = You win in week 1.
Event B = Vanessa wins in week 1.
Event C = Vanessa wins in week 2.
• Events A and B refer to the same random
circumstance and are not independent.
• Events A and C refer to different random
circumstances and are independent.
Example 7.3 Alicia Answering (cont)
Event A = Alicia is selected to answer Question 1.
Event B = Alicia is selected to answer Question 2.
Events A and B refer to different random circumstances,
but are A and B independent events?
• P(A) = 1/50.
• If event A occurs, her name is no longer in the bag,
so P(B) = 0.
• If event A does not occur, there are 49 names in the
bag (including Alicia’s name), so P(B) = 1/49.
Knowing whether A occurred changes P(B).
Thus, the events A and B are not independent.
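The two values of P(B) above are probabilities of B given what happened on the first draw. A sketch of the comparison follows. The last step, weighting the two cases by P(A) and 1 − P(A) to get the overall P(B), is an added step that is not on the slide, and the variable names are illustrative.

```python
from fractions import Fraction

n_students = 50

p_a = Fraction(1, n_students)                  # Alicia drawn for Question 1
p_b_given_a = Fraction(0)                      # her name is no longer in the bag
p_b_given_not_a = Fraction(1, n_students - 1)  # 49 names left, hers included

# Overall chance of B, weighting the two cases (added step, not from the slide).
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Independence would require P(B | A) to equal P(B).
independent = (p_b_given_a == p_b)
print(p_b, independent)
```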
Conditional Probabilities
Conditional probability of the event B,
given that the event A occurs,
is the long-run relative frequency with which
event B occurs when circumstances are such
that A also occurs; written as P(B|A).
P(B) = unconditional probability event B occurs.
P(B|A) = “probability of B given A”
= conditional probability event B occurs given
that we know A has occurred or will occur.
Example 7.8 Probability That a Teenager
Gambles Depends upon Gender
Survey: 78,564 students (9th and 12th graders)
The proportions of males and females admitting
they gambled at least once a week during the
previous year were reported. Results for 9th grade:
P(student is weekly gambler | teen is boy) = 0.20
P(student is weekly gambler | teen is girl) = 0.05
Notice dependence between “weekly gambling habit”
and “gender.” Knowledge of a 9th grader’s gender
changes probability that s/he is a weekly gambler.
Two-Way Table:
“Hypothetical Hundred Thousand”
Example 7.8
Teens and Gambling (cont)
Sample of 9th grade teens: 49.1% boys, 50.9% girls.
Results: 22.9% of boys and 4.5% of girls admitted
they gambled at least once a week during previous year.
Start with hypothetical 100,000 teens …
(.491)(100,000) = 49,100 boys and thus 50,900 girls
Of the 49,100 boys, (.229)(49,100) = 11,244
would be weekly gamblers.
Of the 50,900 girls, (.045)(50,900) = 2,291
would be weekly gamblers.
Example 7.8 Teens and Gambling (cont)
        Weekly Gambler   Not Weekly Gambler     Total
Boy             11,244               37,856    49,100
Girl             2,291               48,609    50,900
Total           13,535               86,465   100,000
P(boy and gambler) = 11,244/100,000 = 0.1124
P(boy | gambler) = 11,244/13,535 = 0.8307
P(gambler) = 13,535/100,000 = 0.13535
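The "hypothetical hundred thousand" calculation can be sketched as follows. The percentages are those quoted in the example. The helper, which works in parts per thousand and rounds halves up so that 2,290.5 becomes 2,291 as on the slide, is an assumption made for illustration.

```python
def share_of(rate_permille, total):
    """Return rate_permille/1000 of total, rounded half up to a whole count."""
    return (rate_permille * total + 500) // 1000

total = 100_000
boys = share_of(491, total)             # 49.1% boys
girls = total - boys                    # the rest are girls
boy_gamblers = share_of(229, boys)      # 22.9% of boys gamble weekly
girl_gamblers = share_of(45, girls)     # 4.5% of girls gamble weekly
gamblers = boy_gamblers + girl_gamblers

p_boy_and_gambler = boy_gamblers / total       # joint probability
p_boy_given_gambler = boy_gamblers / gamblers  # conditional probability
p_gambler = gamblers / total                   # unconditional probability

print(boys, girls, boy_gamblers, girl_gamblers, gamblers)
print(round(p_boy_and_gambler, 4), round(p_boy_given_gambler, 4), p_gambler)
```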
Simulations
Probability can be used to model complex situations.
A simulation of the model involves using the model's
probabilities to generate an instance of the situation.
Repeating the simulation can give insight into the behaviour of
the system.
CAST examples:
Tennis game, shows how randomly generating points can
simulate a tennis match.
Soccer league, shows how simulations can explore some
properties of the league
In a sample, any individual's value is
highly variable
• Unexplained variability in data is usually modelled as random
sampling from some underlying population.
– A consequence of this model is that standard graphical and
numerical summaries of data cannot be regarded as fixed
– different samples from the same population would result in
different summaries.
• The variability of random samples is most noticeable when the
sample values belong to named individuals.
– The value for a single individual will vary greatly from sample to
sample and the rankings of the individuals are similarly variable.
Distribution of the sample as a whole
• Although the value associated with any individual is highly
variable, there is more stability in the overall distribution of
random samples.
• Features such as centre, spread or skewness in the distribution
are more stable from sample to sample.
Parameters and statistics
• We usually model data sets as random samples from some
population.
– The sampling process is random, so if sampling is repeated, a
different sample is usually obtained.
• Summary statistics will vary from sample to sample.
– The underlying population remains unchanged.
• Summaries of the population will remain constant.
• To distinguish, we call numerical summaries of a population
population parameters whereas the corresponding summaries
of a sample are called sample statistics.
Population parameters are constants
Sample statistics vary from sample to sample
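A small sketch of this distinction, using a made-up population (the whole numbers 1 to 100) and an assumed sample size of 10. Neither is from the lecture; they are chosen only to show a fixed parameter alongside varying statistics.

```python
import random
import statistics

# Hypothetical population: the whole numbers 1 to 100.
population = list(range(1, 101))
population_mean = statistics.mean(population)  # parameter: fixed at 50.5

rng = random.Random(42)  # seeded so the run is repeatable (assumed seed)
# Draw five random samples of size 10 and compute each sample mean (a statistic).
sample_means = [statistics.mean(rng.sample(population, 10)) for _ in range(5)]

print("population mean:", population_mean)
print("sample means:", sample_means)
```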
8.1 What is a Random Variable?
Random Variable: assigns a number to
each outcome of a random circumstance, or,
equivalently, to each unit in a population.
Two different broad classes of random variables:
1. A continuous random variable can take any
value in an interval or collection of intervals.
2. A discrete random variable can take one of a
countable list of distinct values.
Example 8.1 Random Variables at an
Outdoor Graduation or
Wedding
Some Random factors that will determine
how enjoyable the event is:
Temperature: continuous random variable
Number of airplanes that fly overhead:
discrete random variable
Example 8.2 Probability an Event
Occurs Three Times in Three Tries
• What is the probability that three tosses of a fair coin
will result in three heads?
• Assuming boys and girls are equally likely, what is the
probability that 3 births will result in 3 girls?
• Assuming probability is 1/2 that a randomly selected
individual will be taller than median height of a
population, what is the probability that 3 randomly
selected individuals will all be taller than the median?
• Answer to all three questions = 1/8.
• Discrete Random Variable X = number of
times the “outcome of interest” occurs in
three independent tries.
8.2 Discrete Random Variables
X = the random variable.
k = a number the discrete r.v. could assume.
P(X = k) is the probability that X equals k.
Discrete random variable: can only result in a countable set of
possibilities – often a finite number of outcomes, but can be infinite.
Example 8.3 It’s Possible to Toss Forever
Repeatedly toss a fair coin, and define:
X = number of tosses until the first head occurs
Any number of flips is a possible outcome.
P(X = k) = (1/2)^k
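Although X has infinitely many possible values, the probabilities still add up towards 1. The sketch below uses exact fractions. The function name and the cutoff of 10 tosses are illustrative assumptions.

```python
from fractions import Fraction

def p_first_head_on(k):
    """P(X = k): k - 1 tails followed by a head, each with probability 1/2."""
    return Fraction(1, 2) ** k

# Partial sum P(X <= 10); it falls short of 1 by exactly (1/2)^10.
partial = sum(p_first_head_on(k) for k in range(1, 11))
print(p_first_head_on(3), partial)
```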
Probability Distribution of a Discrete R.V.
Using the sample space to find probabilities:
Step 1: List all simple events in sample space.
Step 2: Find probability for each simple event.
Step 3: List possible values for random variable X
and identify the value for each simple event.
Step 4: Find all simple events for which X = k, for
each possible value k.
Step 5: P(X = k) is the sum of the probabilities for
all simple events for which X = k.
The probability function (pf) of X is a table or rule that assigns
probabilities to the possible values of X.
Example 8.4
How Many Girls are
Likely?
Family has 3 children. Probability of a girl is ½.
What are the probabilities of having 0, 1, 2, or 3 girls?
Sample Space: For each birth, write either B or G. There are
eight possible arrangements of B and G for three births.
These are the simple events.
Sample Space and Probabilities: The eight simple events are
equally likely.
Random Variable X: number of girls in three births. For each
simple event, the value of X is the number of G’s listed.
Example 8.4
How Many Girls? (cont)
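Value of X for each simple event:
Simple event   BBB  BBG  BGB  GBB  BGG  GBG  GGB  GGG
Value of X       0    1    1    1    2    2    2    3
Probability function for Number of Girls X:
k            0     1     2     3
P(X = k)    1/8   3/8   3/8   1/8
Graph of the pf of X: heights 1/8, 3/8, 3/8 and 1/8 at X = 0, 1, 2 and 3.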
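The five steps for finding a probability function from the sample space can be followed in code for this example. This is a sketch assuming each simple event is written as a string of B's and G's; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Step 1: list all simple events (BBB, BBG, ..., GGG).
simple_events = ["".join(e) for e in product("BG", repeat=3)]

# Step 2: the eight simple events are equally likely.
p_each = Fraction(1, len(simple_events))

# Step 3: X = number of girls, i.e. the number of G's in the simple event.
x_value = {e: e.count("G") for e in simple_events}

# Steps 4 and 5: for each k, add up the probabilities of the events with X = k.
pf = defaultdict(Fraction)
for e in simple_events:
    pf[x_value[e]] += p_each
pf = dict(pf)

for k in sorted(pf):
    print(k, pf[k])
```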
Conditions for Probabilities
for Discrete Random Variables
Condition 1
The sum of the probabilities over all
possible values of a discrete random
variable must equal 1.
Condition 2
The probability of any specific outcome
for a discrete random variable must be
between 0 and 1.
Cumulative Distribution Function
of a Discrete Random Variable
Cumulative distribution function (cdf) for a
random variable X is a rule or table that provides the
probabilities P(X ≤ k) for any real number k.
Cumulative probability = probability that X is less
than or equal to a particular value.
Example 8.4 Distribution Function
for the Number of Girls (cont)
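The cumulative probabilities for the number of girls are running totals of the pf. In this sketch the pf values are entered by hand, on the assumption that the eight simple events are equally likely as in the example; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

# pf for X = number of girls in three births (eight equally likely simple events).
pf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# cdf: P(X <= k) is the running total of the pf over values up to k.
values = sorted(pf)
cdf = dict(zip(values, accumulate(pf[k] for k in values)))

for k in values:
    print(k, cdf[k])
```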
Finding Probabilities for Complex Events
Example 8.4 A Mixture of Children
What is the probability that a family with 3 children
will have at least one child of each sex?
If X = Number of Girls then either family has one girl and
two boys (X = 1) or two girls and one boy (X = 2).
P(X = 1 or X = 2) = P(X = 1) + P(X = 2) = 3/8 + 3/8 = 6/8 = 3/4
pf for Number of Girls X: P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8
8.5 Continuous
Random Variables
Continuous random variable: the outcome can
be any value in an interval or collection of intervals.
Probability density function for a continuous random
variable X is a curve such that the area under the curve over
an interval equals the probability that X is in that interval.
P(a ≤ X ≤ b) = area under density curve over the
interval between the values a and b.
Example 8.13 Time Spent Waiting for Bus
Bus arrives at stop every 10 minutes. If a person arrives at the stop
at a random time, how long will s/he have to wait?
X = waiting time until next bus arrives.
X is a continuous random variable over 0 to 10 minutes.
Note: Height is 0.10, so total area under the
curve is (0.10)(10) = 1.
This is an example of a
Uniform random variable.
Example 8.13 Waiting for Bus (cont)
What is the probability the waiting time X was in
the interval from 5 to 7 minutes?
Probability = area under curve between 5 and 7
= (base)(height) = (2)(.1) = .2
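For a uniform density the area is a rectangle, so the calculation generalises to any interval. The sketch below assumes a waiting time that is uniform on 0 to 10 minutes. The function name and the choice to clip the interval to that range are illustrative assumptions.

```python
def uniform_prob(a, b, low=0.0, high=10.0):
    """P(a <= X <= b) for X uniform on [low, high], computed as base times height."""
    height = 1.0 / (high - low)  # the density is flat, so total area is 1
    left = max(a, low)           # clip the interval to where the density is non-zero
    right = min(b, high)
    base = max(0.0, right - left)
    return base * height

print(uniform_prob(5, 7))   # wait between 5 and 7 minutes
print(uniform_prob(0, 10))  # the whole range
```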