Slides for Session #10

Midterm 1
• Well done!
• Mean: 80.23%
• Median: 84.6%
• Standard deviation: 16.24 percentage points
• 5th percentile: 53
Statistics for Social
and Behavioral Sciences
Session #9:
Probabilities
(Agresti and Finlay, Chapter 9)
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN
Week 1
PART II. DESCRIBING DATA
Weeks 2-4
PART III. DRAWING CONCLUSIONS FROM DATA:
INFERENTIAL STATISTICS
Weeks 5-9
Firenze or Lebanese Express?
PART IV. : CORRELATION AND CAUSATION:
REGRESSION ANALYSIS
This is where we talk
about Zmapp and Ebola!
Weeks 10-14
Wrap-up on First Part
1. We are curious…. we ask empirical questions.
2. We design the study:
– Data collection, e.g. by simple random sampling.
– Nonresponse bias, response bias, sampling bias.
3. We describe the data:
– Using statistics:
• Univariate: mean, standard deviation, variance.
• Bivariate: correlation, slope, R squared, TSS, ESS, SSE.
– We measure statistics but are interested in parameters…
Statistics suffer from sampling error.
4. How can we make inferences??
e.g. concluding that the coin is balanced.
➥ This is the focus of the next 3-4 weeks.
Outline
1. Probabilities
The Three Rules
2. Random Variable
Expectation of a random variable
Next time:
Probability Distributions (continued)
Chapter 4 of A&F
Probability and Luck
• We play a game together…
– Heads you win 10 dirham.
– Tails I win 10 dirham.
• We play the game a very large
number of times.
• Should you play this game?
• P(heads) = 0.5, P(tails) = 0.5
Probability and Luck
• P(heads) = 1 – P(not heads)
• P(heads) is read as “probability of heads”.
• Game sequence:
– In the long run, with a balanced coin, 0.5 of the trials will lead to
heads, 0.5 of the trials will lead to tails.
– The probability of heads is the ratio of the number of heads to the
number of trials, with an infinite number of draws…
P(heads) = (Number of heads) / (Number of draws)
Play the game for a very large number of draws…
… the longer the game, the closer the ratio tends to be to 0.5.
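A minimal simulation sketch of the long-run idea, assuming a balanced coin. The function name `share_of_heads` and the fixed seed are illustrative choices, not part of the slides. The share of heads typically gets closer to 0.5 as the number of draws grows, although it does not have to do so at every step.

```python
import random

def share_of_heads(n_draws, seed=0):
    """Simulate n_draws tosses of a balanced coin; return the share of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are reproducible
    heads = sum(1 for _ in range(n_draws) if rng.random() < 0.5)
    return heads / n_draws

# Share of heads for games of increasing length.
for n in (10, 100, 10_000, 100_000):
    print(n, share_of_heads(n))
```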
Sometimes we can’t
repeat our choices
Life is full of random events… but
• We only draw one job at the end of university.
– Hard to know what other incomes/jobs we would
have gotten.
• We only draw one opponent in a football
game.
– Subsequent games are not identical to this one.
– What is the probability of winning?
• We only die once at a particular age.
– What is the probability of death at age 50?
Sometimes we can’t
repeat our choices
• In such a case we define the probability of an
event as the ratio of the number of such
events over the number of individuals in
identical circumstances.
– … for a very large number of such individuals.
• Example: number of individuals with the same
degree, same age as me:
• What is the probability of earning more than
$45,000 in my first job?
P(earning ≥ $45,000)
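As a sketch of this definition, the probability can be approximated by the share of comparable individuals who earn at least $45,000. The earnings figures below are invented for illustration, and `share_at_least` is a hypothetical helper name.

```python
# Hypothetical first-job earnings (USD) of graduates with the same degree and age.
# These numbers are invented for illustration only.
earnings = [38_000, 52_000, 45_000, 41_000, 60_000, 47_500, 30_000, 44_000]

def share_at_least(values, threshold):
    """Share of observations greater than or equal to the threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)

# Approximation of P(earning >= $45,000) from this small group.
p_hat = share_at_least(earnings, 45_000)
print(p_hat)
```

With only eight individuals this is a rough estimate; the definition on the slide asks for a very large number of such individuals.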
Sometimes we can’t repeat
our choices? Edge of Tomorrow
Lt. Col. Bill Cage is an officer who has never
seen a day of combat when he is
unceremoniously dropped into what amounts
to a suicide mission. Killed within minutes,
Cage now finds himself inexplicably thrown
into a time loop - forcing him to live out the
same brutal combat over and over, fighting
and dying again - and again.
Groundhog day
A weatherman finds himself living the same
day over and over again.
« a blizzard develops that Connors had
predicted would miss them, closing the roads
and shutting down long-distance phone
service, forcing the team to return to
Punxsutawney. Connors awakens the next
morning, however, to find it is again February
2, and his day unfolds in exactly the same way.
He is aware of the repetition, but everyone
else seems to be living February 2 exactly the
same way and for the first time. »
Are these really
independent events??
Probability and Luck
• What is the probability that you win twice in a
row?
– P(heads in the first round)
* P(heads in the second round) = 0.5 * 0.5 = 0.25
– Because the draws in the first and the second
round are independent events.
• What is the probability that you win k times in
a row?
– P(heads in the first round)
* P(heads in the second round)
* …. * P(heads in the kth round) = 0.5^k
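A short sketch of the multiplication for independent rounds, assuming a balanced coin with P(heads) = 0.5 as stated earlier. The function name `prob_k_heads_in_a_row` is illustrative.

```python
from fractions import Fraction

def prob_k_heads_in_a_row(k, p_heads=Fraction(1, 2)):
    """P(heads in each of k independent rounds) = p_heads multiplied k times."""
    return p_heads ** k

for k in (1, 2, 3, 10):
    print(k, prob_k_heads_in_a_row(k))
```

Exact fractions are used so the results can be read as 1/2, 1/4, 1/8 and so on, without rounding.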
Rules for probability distributions
In general, we talk about the probability of an event.
– What is the probability that « it rains tomorrow »?
For an event A…
1. P(not A) = 1 – P(A).
If A and B are distinct possible events (with no overlap), i.e. P(A and B) = 0, then
2. P(A or B) = P(A) + P(B).
If A and B are two (possibly related) events,
3. P(A and B) = P(A) x P(B given that A has occurred).
Special case: If A and B are independent, i.e. P(B given A) = P(B),
3’. P(A and B) = P(A) x P(B)
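The rules can be checked by counting outcomes. The sketch below assumes one throw of a balanced six-sided die; the events A, B, C and D are arbitrary examples chosen for illustration, and `prob` and `prob_given` are hypothetical helper names.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one throw of a balanced die

def prob(event):
    """P(event) as the share of equally likely outcomes in the event."""
    return Fraction(len(event & outcomes), len(outcomes))

def prob_given(b, a):
    """P(b given a) = P(a and b) / P(a), defined when P(a) > 0."""
    return prob(a & b) / prob(a)

A = {2, 4, 6}     # "even number"
B = {1, 3}        # no overlap with A
C = {4, 5, 6}     # overlaps with A, related to A
D = {1, 2, 3, 4}  # independent of A

rule1 = prob(outcomes - A) == 1 - prob(A)          # P(not A) = 1 - P(A)
rule2 = prob(A | B) == prob(A) + prob(B)           # no overlap between A and B
rule3 = prob(A & C) == prob(A) * prob_given(C, A)  # general multiplication rule
rule3_prime = prob(A & D) == prob(A) * prob(D)     # independence special case
not_independent = prob(A & C) != prob(A) * prob(C)

print(rule1, rule2, rule3, rule3_prime, not_independent)
```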
Applications: Coins
1. P(getting tails) = P(not getting heads)
= 1 – P(getting heads)
2. P(tails) + P(heads) = P(tails or heads) = 1
3. P(tails in 1st throw and tails in 2nd throw)
= P(tails in 1st throw)
x P(tails in 2nd throw given tails in 1st throw).
with independence
P(tails in 1st throw and tails in 2nd throw)
= P(tails in 1st throw)
x P(tails in 2nd throw)
Applications: Dice
1. P(not throwing 4)
= 1 – P(throwing 4)
2. P(throwing 4) + P(throwing 6) = P(throwing 4 or 6) = 2/6
3. P(throwing 4 in 1st throw and throwing 6 in 2nd throw)
= P(throwing 4 in 1st throw)
x P(throwing 6 in 2nd throw given throwing 4 in 1st throw).
with independence
P(throwing 4 in 1st throw and throwing 6 in 2nd throw)
= P(throwing 4 in 1st throw) x P(throwing 6 in 2nd throw)
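A sketch that lists all 36 equally likely results of two throws of a balanced six-sided die and checks the product rule under independence. The second face used here (6) is an arbitrary choice for illustration; any face between 1 and 6 gives the same result.

```python
from fractions import Fraction
from itertools import product

# All (first throw, second throw) pairs: 36 equally likely results.
throws = list(product(range(1, 7), repeat=2))

def prob(condition):
    """Share of the 36 pairs for which condition(first, second) is true."""
    favourable = sum(1 for first, second in throws if condition(first, second))
    return Fraction(favourable, len(throws))

p_first_4 = prob(lambda first, second: first == 4)
p_second_6 = prob(lambda first, second: second == 6)
p_both = prob(lambda first, second: first == 4 and second == 6)

print(p_first_4, p_second_6, p_both)
print(p_both == p_first_4 * p_second_6)  # product rule for independent throws
```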
Inverse Probability Fallacy
• When asked about the probability of the disease
given the symptom P(disease | symptom) clinicians
tend to answer with the probability of the
symptom given disease P(symptom | disease).
There is an equal number of blue and green cabs in
the capital of Happinistan. The color of a cab is
independent of whether it has an accident.
• What is the probability that a taxi has been
involved in an accident given that it is green?
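A sketch of the cab example with invented counts. Only two facts come from the slide: there are as many green cabs as blue cabs, and color is independent of accidents. The 10% accident rate and the fleet size are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical fleet: 500 green and 500 blue cabs, same accident rate for both colors.
cabs = {
    ("green", "accident"): 50,
    ("green", "no accident"): 450,
    ("blue", "accident"): 50,
    ("blue", "no accident"): 450,
}

total_green = cabs[("green", "accident")] + cabs[("green", "no accident")]
total_accident = cabs[("green", "accident")] + cabs[("blue", "accident")]

# The question on the slide: P(accident given green).
p_accident_given_green = Fraction(cabs[("green", "accident")], total_green)
# The inverse probability, often given by mistake: P(green given accident).
p_green_given_accident = Fraction(cabs[("green", "accident")], total_accident)

print(p_accident_given_green, p_green_given_accident)
```

Under these assumptions P(accident given green) is just the overall accident rate, while P(green given accident) is 1/2 because half of the cabs are green. The two numbers answer different questions.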
Gloms and Fizos
Outline
1. Probabilities
The Three Rules
2. Random Variable
Expectation of a random variable
Next time:
Probability Distributions (continued)
Chapter 4 of A&F
Random variable
A random variable is a variable whose value is not known ex-ante; it can take one of several possible values, and its value is revealed only ex-post.
• Example:
– X is a random variable that, before the coin is tossed (ex-ante),
can take values « Heads » or « Tails ». Once the coin is tossed
(ex-post), the value of X is known, it is either « Heads » or
« Tails ».
– Y is a random variable that can take values 1,2,3,4,5, or 6
depending on the throw of a die. Before the die is thrown,
the value is not known. After the die is thrown, we know the
value of Y.
Probability distribution
of a random variable
• Take all possible values of a random variable Y:
– Example: 1,2,3,4,5,6
– In general: y1, y2, y3, …, yK.
• The probability of the event that the random variable Y equals
yk is noted P(Y=yk) or simply P(yk).
• The probability distribution of random variable Y is the list of
all values of P(Y=yk).
• Example: for a balanced die, the
probability distribution of Y is the
list of values P(Y=1), P(Y=2), P(Y=3), …
which is {1/6,1/6,1/6,1/6,1/6,1/6}
All throughout the course
we consider either discrete
quantitative random
variables or categorical
random variables.
Expected value of a random variable
What are your expected gains when playing the coin game?
• Gain is a random variable, equal to +10 AED when getting
heads, and -10 AED when getting tails.
E(gain) = Gain when getting heads x Probability of heads
+ Gain when getting tails x Probability of tails.
In general, for a random variable Y, the expected value of Y is:
• E(Y) = Σ yk P(Y=yk), where the sum runs over k = 1, …, K.
Also note that probabilities sum to one.
Σ P(Y=yk) = 1
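A sketch of the expected value formula applied to the coin game (+10 AED on heads, -10 AED on tails, balanced coin) and to a balanced die. The helper name `expected_value` and the dictionary layout are illustrative choices.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(Y) = sum over k of yk * P(Y=yk); distribution maps values to probabilities."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to one")
    return sum(value * p for value, p in distribution.items())

coin_gain = {10: Fraction(1, 2), -10: Fraction(1, 2)}  # gain in AED
die = {face: Fraction(1, 6) for face in range(1, 7)}

expected_gain = expected_value(coin_gain)
expected_face = expected_value(die)
print(expected_gain, expected_face)
```

The expected gain of the coin game is zero: on average, over many rounds, neither player comes out ahead.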
Expected Earnings?
• « Your annual earnings right after NYU Abu
Dhabi » is a random variable…
– The variable has not been realized yet.
Let’s give it a name
Y = « Your annual earnings right after NYU Abu
Dhabi ».
• E(earnings) = E(Y) = Σ yk P(Y=yk)
Takes potentially K values.
• Problemo: We don’t observe earnings in the
future!!!
Expected Earnings?
An approximation is to use the distribution of current
graduates …
To substitute for our lack of knowledge
of P(Y=yk) for each k.
• Earnings take K distinct values, no two graduates earn
exactly the same annual wage…
• Hence an approximation of expected earnings is
E(Y) ≈ Σ yk x (1/K)
• The average earnings of current graduates…
• But that’s only an approximation !! What could be
wrong?
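A sketch of this approximation, assuming each of the K current graduates is given the same weight 1/K. The earnings figures are invented for illustration.

```python
from fractions import Fraction

# Hypothetical annual earnings (USD) of K current graduates; invented numbers.
graduate_earnings = [38_000, 52_000, 45_000, 41_000, 60_000]

K = len(graduate_earnings)

# Each observed value yk receives probability 1/K.
approx_expected_earnings = sum(y * Fraction(1, K) for y in graduate_earnings)

# This is the same number as the usual average.
sample_mean = Fraction(sum(graduate_earnings), K)

print(approx_expected_earnings, sample_mean)
```

The weights 1/K describe current graduates only. If future graduates face different conditions, or if the sample is not representative, the average can be far from the true E(Y).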
Properties of the Expectation
The expectation of the sum is the sum of the expectations:
• E(earnings – debt) = E(earnings) – E(debt)
The expectation of a constant x the random variable is the
constant x the expectation:
• E( Constant x Earnings ) = Constant x E(Earnings)
E.g. E(Earnings in AED) = 3.6 x E(Earnings in USD)
Beware !!!
• E( X Y ) is not E(X) E(Y) in general.
• When X and Y are independent, E( X Y ) = E(X) E(Y).
• Law of conditional expectation E(X)=E(E(X|Z))
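These properties can be checked by exact enumeration. The sketch below assumes two independent balanced dice, X and Z, which are an illustrative choice and not part of the slides; the helper names `E` and `E_given_Z` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two independent balanced dice X and Z: 36 equally likely pairs.
pairs = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(pairs))

def E(f):
    """Expectation of f(x, z) under the joint distribution."""
    return sum(f(x, z) * p for x, z in pairs)

def E_given_Z(f, z0):
    """Conditional expectation of f(x, z) given Z = z0."""
    rows = [(x, z) for x, z in pairs if z == z0]
    return sum(Fraction(f(x, z), len(rows)) for x, z in rows)

def total(x, z):
    return x + z

E_X = E(lambda x, z: x)
E_Z = E(lambda x, z: z)

sum_rule = E(total) == E_X + E_Z                     # E(X+Z) = E(X) + E(Z)
constant_rule = E(lambda x, z: 3 * x) == 3 * E_X     # E(cX) = c E(X)
indep_product = E(lambda x, z: x * z) == E_X * E_Z   # true: X and Z are independent
dep_product = E(lambda x, z: x * x) == E_X * E_X     # false: X is not independent of itself

# Law of conditional expectation: average the conditional means over the values of Z.
iterated = sum(E_given_Z(total, z0) * Fraction(1, 6) for z0 in range(1, 7))

print(sum_rule, constant_rule, indep_product, dep_product, iterated == E(total))
```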
Wrap Up
• Four rules of probability distributions
1. P(not A) = 1 – P(A)
2. P(A or B) = P(A) + P(B) when P(A and B)=0
3. P(A and B)=P(A) P(B given A)
Beware of the inverse probability fallacy,
P(B given A) is not P(A given B)
3’. P(A and B)=P(A) P(B) when A and B are independent
• Random variable
– Variable whose value has not been realized.
• Probability distribution of a random variable
– List of the probabilities of the values of the random variable.
• Expected value of a random variable E(Y)=Σ yk P(Y=yk)
– E(X+Y)=E(X)+E(Y), E(cX)= c E(X), E(X) = E(E(X|Z))
Coming up:
Readings:
• Chapter 4 entirely – full of interesting examples and super relevant.
• Online quiz on Thursday night.
• No slide due on Thursday.
For help:
• Amine Ouazad
Office 1135, Social Science building
Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene Paneda
Sunday recitations.
At the Academic Resource Center, Monday from 2 to 4pm.