Basic Probability & Contingency Tables
Basic Probability
With an Emphasis on
Contingency Tables
Students in PSYC 2101
• Skip to Slide # 7.
Random Variable
• A random variable is a real-valued function
defined on a sample space.
– The sample space is the set of all distinct
outcomes possible for an experiment.
– Function: the members of two sets (well-defined
collections of objects) are paired so that each
member of the one set (domain) is paired
with one and only one member of the other
set (range).
• The domain is the sample space, the
range is a set of real numbers.
• A random variable is the set of pairs
created by pairing each possible
experimental outcome with one and only
one real number.
Examples
• The outcome of rolling a die: one spot = 1, two spots =
2, three spots = 3, etc. (Each outcome has only
one number, and vice versa.)
• Odd versus even: one spot = 1, two spots = 2, three spots = 1, etc.
(Each outcome has only one number,
but not vice versa.)
• The weight of each student in my
statistics class.
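A minimal sketch of the two die examples, treating a random variable as a mapping from outcomes to numbers. The dictionary names are illustrative assumptions, not part of the original slides.

```python
# Each die face (the outcome) is paired with exactly one real number.
faces = range(1, 7)

# One-to-one: every face gets its own number.
spots = {face: face for face in faces}

# Many-to-one: odd faces map to 1, even faces map to 2.
odd_even = {face: 1 if face % 2 else 2 for face in faces}

# Each outcome has only one number in both mappings,
# but only the first mapping can be reversed.
print(len(set(spots.values())))     # 6 distinct values
print(len(set(odd_even.values())))  # 2 distinct values
```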
Probability Distribution
• Each value of the random variable is
paired with one and only one probability.
• More on this later.
Probability Experiments
• A probability experiment is a well-defined act or process that leads to a
single well-defined outcome.
– Flip a coin, heads or tails.
– Roll a die, how many spots up.
– Stand on a digital scale, what number is
displayed.
Probability
• The probability of an event, P(A), is the
fraction of times that event will occur in an
indefinitely long series of trials of the
experiment.
• Cannot be known, can be estimated.
Estimating Probability
• Empirically – perform experiment many
times, compute relative frequencies.
• Rationally – make assumptions and then
apply logic.
• Subjectively – strength of individual’s
belief regarding whether an event will or
will not happen – often expressed in terms
of odds.
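The empirical approach can be sketched as a simulation: repeat the experiment many times and compute the relative frequency. The function names, the number of trials, and the fixed seed are assumptions chosen for illustration.

```python
import random

def roll_die(rng):
    """One trial of the experiment: roll a fair six-sided die."""
    return rng.randint(1, 6)

def estimate_probability(event, trials=100_000, seed=0):
    """Relative frequency of `event` over many simulated die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(roll_die(rng)))
    return hits / trials

# Estimate P(six); the rational answer under a fair-die assumption is 1/6.
p_six = estimate_probability(lambda outcome: outcome == 6)
print(p_six)
```

The estimate gets closer to 1/6 as the number of trials grows, but it remains an estimate.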
Odds of Occurrence of Event A
• If the experiment were performed (a + b)
times, we would expect A to occur a times
and B (not A) to occur b times. The odds of A are then a to b.
• There are 20 students in a class, 14 of
whom are women. If we randomly select one,
what are the odds it will be a woman?
• 14 to 6 = 7 to 3.
Convert Odds to Probability
• Probability = a/(a + b).
• 14 women, 6 men.
• Odds = 7 to 3.
• Probability = 7 out of 10.
Convert Probability to Odds
• Odds = P(A)/P(not A)
• Probability = .70
• Odds = .70/(1 - .70) = 7 to 3
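The two conversions can be sketched with exact fractions. The function names are hypothetical; the numbers are the 14 women and 6 men from the class example.

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Odds of a to b in favor of A give P(A) = a / (a + b)."""
    return Fraction(a, a + b)

def probability_to_odds(p):
    """P(A) gives odds in favor of A as the ratio P(A) / P(not A)."""
    p = Fraction(p)
    return p / (1 - p)

p_woman = odds_to_probability(14, 6)        # 14 women, 6 men
odds_woman = probability_to_odds(p_woman)
print(p_woman)     # 7/10
print(odds_woman)  # 7/3, that is, 7 to 3
```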
Independence
• Two events are independent iff (if and
only if) the occurrence or non-occurrence
of the one has no effect on the occurrence
or non-occurrence of the other.
– I roll a die twice. The outcome on the first roll
has no influence on the outcome on the
second roll.
Mutual Exclusion
• Two events are mutually exclusive iff the
occurrence of the one precludes
occurrence of the other (both cannot occur
simultaneously on any one trial).
– You could earn a final grade of A in this class.
– You could earn a B.
– You can’t earn both.
Mutual Exhaustion
• Two (or more) events are mutually
exhaustive iff they include all possible
outcomes.
– You could earn a final grade of A, B, C, D,
or F.
– These are mutually exhaustive since there are
no other possibilities.
Marginal Probability
• The marginal probability of event A,
P(A), is the probability of A ignoring
whether or not any other event has also
occurred.
– P(randomly selected student is female) = .70
Conditional Probability of A
• the probability that A will occur given that
B has occurred
• P(A|B), the probability of A given B.
– Given that the selected student is wearing a
skirt, the probability that the student is female
is .9999
– Unless you are in Scotland
• If P(A|B) = P(A), then A and B are
independent of each other.
Joint Probability
• The probability that both A and B will
occur.
• P(A ∩ B) = P(A) x P(B|A) = P(B) x P(A|B)
• If A and B are independent, this simplifies
to P(A ∩ B) = P(A) x P(B)
• This is known as the Multiplication Rule
The Addition Rule
• If A and B are mutually exclusive, the
probability that one or the other will occur
is the sum of their separate probabilities.
Grade         A     B     C     D     F
Probability   .2    .3    .3    .15   .05

P(A ∪ B) = P(A) + P(B) = .2 + .3 = .5
• If A and B are not mutually exclusive,
things get a little more complicated.
• P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Two-Way Contingency Table
• A matrix where rows represent values of
one categorical variable and columns
represent values of a second categorical
variable.
• Can be used to illustrate the relationship
between two categorical variables.
Survey Questions
• We have asked each of 150 female
college students two questions:
1. Do you smoke (yes/no)?
2. Do you have sleep disturbances
(yes/no)?
• Suppose that we obtain the following
data (these are totally contrived, not
real):
Marginal Probabilities
P(Smoke) = 100/150 = 10/15 = 2/3 = .667
P(Sleep) = 90/150 = 9/15 = 3/5 = .60

            Sleep?
Smoke?      No    Yes   Total
No          20     30      50
Yes         40     60     100
Total       60     90     150
Conditional Probabilities Show
Absolute Independence
P(Sleep | Smoke) = 60/100 = 3/5 = .60
P(Sleep | Nosmoke) = 30/50 = 3/5 = .60
Multiplication Rule Given
Independence
• Sixty of 150 have sleep disturbance and
smoke, so P(Sleep ∩ Smoke) = 60/150 =
.40
• P(A ∩ B) = P(A) x P(B)
P(Sleep ∩ Smoke) = P(Sleep) x P(Smoke) = 3/5 x 2/3 = 6/15 = .40
“Sleep” = Sexually Active
• Preacher claims those who smoke will go
to Hell.
• And those who fornicate will go to Hell.
• What is the probability that a randomly
selected coed from this sample will go to
Hell?
Addition Rule
P(Sleep) = 90/150 = 9/15 = .60
P(Smoke) = 100/150 = 10/15 = .667
P(Sleep) + P(Smoke) = 9/15 + 10/15 = 19/15 = 1.27
A probability cannot exceed one.
Something is wrong here!
Welcome to Hell
• The events (sleeping and smoking) are
not mutually exclusive.
• We have counted the overlap between
sleeping and smoking (the 60 women who
do both) twice.
• 30 + 40 + 60 = 130 of the women sleep
and/or smoke.
• The probability we seek = 130/150 = 13/15
= .87
Addition Rule For Events That
Are NOT Mutually Exclusive
P(Sleep ∪ Smoke) = P(Sleep) + P(Smoke) - P(Sleep ∩ Smoke)
= 9/15 + 10/15 - 6/15 = 13/15 = .87.
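The marginal, conditional, joint, and "either" probabilities for the first contrived table (cell counts 20, 30, 40, 60) can be checked with a short sketch. The dictionary layout and the helper name `prob` are assumptions made for illustration.

```python
from fractions import Fraction

# counts[(smoke, sleep)] for the 150 students in the first table
counts = {("no", "no"): 20, ("no", "yes"): 30,
          ("yes", "no"): 40, ("yes", "yes"): 60}
n = sum(counts.values())

def prob(smoke=None, sleep=None):
    """Probability of the cells matching the given values (None = ignore)."""
    total = sum(c for (sm, sl), c in counts.items()
                if smoke in (None, sm) and sleep in (None, sl))
    return Fraction(total, n)

p_smoke = prob(smoke="yes")                    # marginal: 2/3
p_sleep = prob(sleep="yes")                    # marginal: 3/5
p_both = prob(smoke="yes", sleep="yes")        # joint: 2/5
p_sleep_given_smoke = p_both / p_smoke         # conditional: 3/5

# Addition rule for events that are not mutually exclusive
p_either = p_sleep + p_smoke - p_both          # 13/15

# Independence: the joint equals the product of the marginals
print(p_both == p_sleep * p_smoke)             # True
print(p_either)                                # 13/15
```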
Sleep = Sexually Active,
Smoke = Use Cannabis
            Sleep?
Smoke?      No    Yes   Total
No          30     20      50
Yes         40     60     100
Total       70     80     150
Marginal Probabilities
P(Smoke) = 100/150 = 2/3 = .667
P(Sleep) = 80/150 = 8/15 = .533
Conditional Probabilities Indicate
Nonindependence
P(Sleep | Smoke) = 60/100 = .60
P(Sleep | Nosmoke) = 20/50 = .40
Joint Probability
• What is the probability that a randomly
selected coed is both sexually active and a
cannabis user?
• There are 60 such coeds, so the
probability is 60/150 = .40.
• Now let us see if the multiplication rule
works with these data.
Multiplication Rule
P(Sleep ∩ Smoke) = P(Sleep) x P(Smoke) = 8/15 x 2/3 = 16/45 = .356
• Oops, this is wrong. The joint probability
is .40. We need to use the more general
form of the multiplication rule.
Multiplication Rule NOT
Assuming Independence
P(Smoke ∩ Sleep) = P(Smoke) x P(Sleep | Smoke) = 2/3 x 3/5 = 6/15 = .40
• Now that looks much better.
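A small sketch comparing the two forms of the multiplication rule on the second table (cell counts 30, 20, 40, 60). The variable names are illustrative assumptions.

```python
from fractions import Fraction

# Second table: rows are Smoke (no/yes), columns are Sleep (no/yes)
no_smoke_no_sleep, no_smoke_sleep = 30, 20
smoke_no_sleep, smoke_sleep = 40, 60
n = no_smoke_no_sleep + no_smoke_sleep + smoke_no_sleep + smoke_sleep

p_smoke = Fraction(smoke_no_sleep + smoke_sleep, n)              # 2/3
p_sleep = Fraction(no_smoke_sleep + smoke_sleep, n)              # 8/15
p_joint = Fraction(smoke_sleep, n)                               # 2/5, from the counts
p_sleep_given_smoke = Fraction(smoke_sleep, smoke_no_sleep + smoke_sleep)  # 3/5

# Rule that assumes independence: wrong for these data
naive = p_sleep * p_smoke                                        # 16/45
# General rule: always matches the observed joint probability
general = p_smoke * p_sleep_given_smoke                          # 2/5

print(naive == p_joint)    # False
print(general == p_joint)  # True
```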
Actual Data From Jury Research
• Castellow, Wuensch, and Moore (1990),
Journal of Social Behavior and
Personality, 5, 547-562.
• Male employer sued for sexual
harassment by female employee.
• Experimentally manipulated physical
attractiveness of both litigants
Effect of Plaintiff Attractiveness
• P(Guilty | Attractive) = 56/73 = 77%.
• P(Guilty | Not Attractive) = 39/72 = 54%.
• Defendant found guilty more often if
plaintiff was attractive.
                Guilty?
Plaintiff
Attractive?     No    Yes   Total
No              33     39      72
Yes             17     56      73
Total           50     95     145
Odds and Odds Ratios
• Odds(Guilty | Attractive) = 56/17
• Odds(Guilty | Not Attractive) = 39/33
• Odds Ratio = (56/17) / (39/33) = 2.79.
• Odds of guilty verdict 2.79 times higher
when plaintiff is attractive.
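The odds ratio for the plaintiff data can be reproduced with a short sketch. The function name and argument order are assumptions for illustration; the counts are those from the plaintiff attractiveness table.

```python
def odds_ratio(yes_a, no_a, yes_b, no_b):
    """Odds of the outcome in group A divided by the odds in group B."""
    return (yes_a / no_a) / (yes_b / no_b)

# Group A: attractive plaintiff (56 guilty, 17 not guilty)
# Group B: unattractive plaintiff (39 guilty, 33 not guilty)
plaintiff_or = odds_ratio(56, 17, 39, 33)
print(round(plaintiff_or, 2))  # 2.79
```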
Effect of Defendant Attractiveness
• P(Guilty | Not Attractive) = 53/70 = 76%.
• P(Guilty | Attractive) = 42/75 = 56%.
• The defendant was more likely to be found
guilty when he was unattractive.
                Guilty?
Defendant
Attractive?     No    Yes   Total
No              17     53      70
Yes             33     42      75
Total           50     95     145
Odds and Odds Ratio
• Odds(Guilty | Not Attractive) = 53/17.
• Odds(Guilty | Attractive) = 42/33.
• Odds Ratio = (53/17) / (42/33) = 2.45.
• Odds of guilty verdict about 2.45 times higher
when defendant is unattractive.
Combined Effects of Plaintiff and
Defendant Attractiveness
• Plaintiff attractive, Defendant not = 83%
guilty.
• Defendant attractive, Plaintiff not = 41%
guilty.
• Odds ratio = (83/17) / (41/59) = 7.03.
• When attorney tells you to wear Sunday
best to trial, listen.
Odds Ratios and Probability Ratios
• Odds of Success
– 90/10 = 9 for Antibiotic Group
– 40/60 = 2/3 for Homeopathy Group
– Odds Ratio = 9/(2/3) = 13.5
Odds Ratios and Probability Ratios
• Odds of Failure
– 10/90 = 1/9 for Antibiotic Group
– 60/40 = 1.5 for Homeopathy Group
– Odds Ratio = 1.5/(1/9) = 13.5
Notice that the odds ratio comes out the
same with both perspectives.
Odds Ratios and Probability Ratios
• Probability of Success
– 90/100 = .9 for Antibiotic Group
– 40/100 = .4 for Homeopathy Group
– Probability Ratio = .9/(.4) = 2.25
Odds Ratios and Probability Ratios
• Probability of Failure
– 10/100 = .1 for Antibiotic Group
– 60/100 = .6 for Homeopathy Group
– Probability Ratio = .6/(.1) = 6
Notice that the probability ratio
differs across perspectives.
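The contrast between the two kinds of ratio can be sketched with exact fractions, using the antibiotic and homeopathy figures above. The names below are illustrative assumptions.

```python
from fractions import Fraction

def odds(p):
    """Convert a probability to odds, P / (1 - P)."""
    return p / (1 - p)

p_success = {"antibiotic": Fraction(90, 100), "homeopathy": Fraction(40, 100)}
p_failure = {group: 1 - p for group, p in p_success.items()}

# Odds ratios: larger odds in the numerator, from each perspective
or_success = odds(p_success["antibiotic"]) / odds(p_success["homeopathy"])
or_failure = odds(p_failure["homeopathy"]) / odds(p_failure["antibiotic"])

# Probability ratios from the same two perspectives
pr_success = p_success["antibiotic"] / p_success["homeopathy"]
pr_failure = p_failure["homeopathy"] / p_failure["antibiotic"]

print(float(or_success), float(or_failure))  # 13.5 13.5
print(float(pr_success), float(pr_failure))  # 2.25 6.0
```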
Another Example
• According to Medscape, 0.5% of the
general population has narcissistic
personality disorder (NPD)
• The rate is 20% among members of the
US Military.
Odds Ratios
• Odds of NPD
– Military: .2/.8 = .25
– General: .005/.995 = .005025
– Ratio: .25/.005025 = 49.75
• Odds of NOT NPD
– Military: .8/.2 = 4
– General: .995/.005 = 199
– Ratio: 199/4 = 49.75
Probability Ratios
• Probability of NPD
– Military: 20%
– General: 0.5%
– Ratio: 20/0.5 = 40.
• Probability of NOT NPD
– Military: 80%
– General: 99.5%
– Ratio: .995/.8 = 1.24
Probability Distributions
• For a discrete variable, pair each value
with the probability of obtaining that value.
• For example, I flip a fair coin five times.
What is the probability for each of the six
possible outcomes?
• May be a table, a chart, or a formula.
Probability Table
Number of Heads    Percent
0                   3.1
1                  15.6
2                  31.2
3                  31.2
4                  15.6
5                   3.1
Probability Chart
Probability Formula
• y is number of heads, n is number of
tosses, p is probability of heads, q is
probability of tails
P(Y = y) = [n! / (y! (n - y)!)] x p^y x q^(n - y)
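A minimal sketch of the binomial formula, applied to five flips of a fair coin. The function name `binomial_pmf` is an assumption; `math.comb` supplies the n!/(y!(n - y)!) term.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y): probability of y heads in n tosses with P(heads) = p."""
    q = 1 - p  # probability of tails
    return comb(n, y) * p**y * q**(n - y)

# Five tosses of a fair coin: one probability per possible number of heads
table = [binomial_pmf(y, 5, 0.5) for y in range(6)]
for heads, probability in enumerate(table):
    print(heads, f"{100 * probability:.3f}%")
```

The unrounded percentages are 3.125, 15.625, and 31.25, which the probability table shows to one decimal place.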
Continuous Variable
• There is an infinite number of values, so a
table relating each value to a probability
would be infinitely large.
• The probability of any exact value is
vanishingly small.
• We can find the probability that a randomly
selected case has a value between a and
b.
Evolution of a Continuous Variable
• I’ll start with a histogram for a discrete
variable.
• In each step I’ll double the number of
values (and number of bars).
• All the way up to an infinite number of
values with each bar infinitely narrow.
• Now one final step, to an uncountably
large number of bars, each infinitely
narrow, yielding a continuous, uniform
distribution ranging from A to B.
• Now I do the same but I start with a
binomial distribution with p = .5 and three
bars.
• Note that the bars are not all of equal
height.
• Each time I split one, I lower the height of
the tail-wards one more than the center-wards one.
• Now one final leap to a continuous
(normal) distribution with an uncountably
large number of infinitely narrow bars.
Random Sampling
• Sampling N data points from a population
is random if every possible different
sample of size N was equally likely to be
selected.
• Random samples most often will be
representative of the population.
• Our stats assume random sampling.
Y Random, X Not
            Probability
Sample      X       Y
AB          1/2     1/6
AC          0       1/6
AD          0       1/6
BC          0       1/6
BD          0       1/6
CD          1/2     1/6
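The X and Y sampling procedures can be compared with a short sketch that lists every possible sample of size 2 from a population of four cases. The helper `is_random_sampling` is a hypothetical name for the definition given earlier: every possible sample must be equally likely.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D"]
samples = ["".join(pair) for pair in combinations(population, 2)]

# Procedure Y: every possible sample of size 2 is equally likely.
prob_y = {s: Fraction(1, len(samples)) for s in samples}

# Procedure X: only AB or CD can ever be drawn.
prob_x = {s: Fraction(1, 2) if s in ("AB", "CD") else Fraction(0)
          for s in samples}

def is_random_sampling(probs):
    """True when all possible samples share the same selection probability."""
    return len(set(probs.values())) == 1

print(samples)                     # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
print(is_random_sampling(prob_y))  # True
print(is_random_sampling(prob_x))  # False
```

Under both procedures each individual case has a 1/2 chance of being selected, so equal chances for individuals are not enough to make the sampling random.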
Counting Rules
• PSYC 2101 students can skip the material
in the rest of this slide show.
Arranging Y Things
• There are Y! ways to arrange Y different
things.
• I am getting a four scoop ice cream cone.
• Chocolate, Vanilla, Coconut, and Mint.
• How many different ways can I arrange
these four flavors?
• 4! = 4(3)(2)(1) = 24.
Permutations
• If I have 10 different flavors, how many
different ways can I select and arrange 4
different flavors from these 10?
N!/(N - Y)! = 10!/(10 - 4)! = (10 x 9 x 8 x 7 x 6!)/6! = 5040
Combinations
• Same problem, but order of the flavors does
not count.
• There are Y! ways to arrange Y things, so just
divide the number of permutations by Y!
N!/((N - Y)! Y!) = 10!/(6! 4!) = (10 x 9 x 8 x 7 x 6!)/(6! x 4 x 3 x 2) = 210
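The three counting rules (arrangements, permutations, combinations) can be checked against Python's standard library. This is only a sketch of the ice cream examples; `math.perm` and `math.comb` require Python 3.8 or later.

```python
from math import comb, factorial, perm

flavors_available = 10   # N
scoops = 4               # Y

arrangements = factorial(scoops)               # Y! ways to arrange Y things
permutations = perm(flavors_available, scoops) # N! / (N - Y)!
combinations = comb(flavors_available, scoops) # N! / ((N - Y)! Y!)

print(arrangements)  # 24
print(permutations)  # 5040
print(combinations)  # 210

# Combinations are permutations with the Y! orderings divided out
print(permutations // arrangements == combinations)  # True
```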
Number of Different Strings
• C^L = number of different strings
• C is the number of different characters
available
• L is the length of the string.
• Ten different characters (0 – 9) and two
character strings
• 10^2 = 100 different strings
• Use letters instead (A through Z)
• 26^2 = 676 different strings
• Use letters and numbers
• 36^2 = 1,296 different strings
• Use strings of length 1 or 2.
• 36 + 1,296 = 1,332 different strings
• Use strings of length up to 3.
• 36^3 = 46,656 three character strings
• + 1,332 one and two character strings
• = 47,988 different strings.
• Use lengths up to 4
• 1,679,616 + 47,988 = 1,727,604
• Use lengths up to 5
• 60,466,176 + 1,727,604 = 62,193,780
• Use strings of length up to 6
• 2,176,782,336 + 62,193,780 =
2,238,976,116 different strings
• That is over 2 BILLION different strings.
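The running totals can be reproduced by summing C^L over every allowed length. The function name `strings_up_to` is an assumption for illustration.

```python
def strings_up_to(characters, max_length):
    """Number of different strings of length 1 through max_length."""
    return sum(characters**length for length in range(1, max_length + 1))

# 36 characters: the letters A through Z plus the digits 0 through 9
for max_length in range(1, 7):
    print(max_length, f"{strings_up_to(36, max_length):,}")
```

The last line printed is the total for lengths up to 6, matching the 2,238,976,116 strings counted above.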