Statistics 400 - Lecture 23
 Last Day: Regression
 Today: Finish Regression, Test for Independence (Section 13.4)
 Suggested problems: 13.21, 13.23
Computer Output
 Will not normally compute regression line, standard errors, … by
hand
 Key will be identifying what computer is giving you
SPSS Example
[SPSS regression output: Model Summary, ANOVA, and Coefficients tables]
 What is the Coefficients Table?
 What is the Model Summary?
 What is the ANOVA Table?
Back to Probability
 The probability of an event, A , occurring can often be modified
after observing whether or not another event, B , has taken place
 Example: An urn contains 2 green balls and 3 red balls. Suppose
2 balls are selected at random one after another without
replacement from the urn.
 Find P(Green ball appears on the first draw)
 Find P(Green ball appears on the second draw)
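One way to check both answers is to list every equally likely ordered pair of draws and count. The sketch below is only an illustration; the helper name `prob` and the ball labels are assumptions, not part of the lecture.

```python
from fractions import Fraction
from itertools import permutations

# Urn from the example: 2 green balls and 3 red balls.
urn = ["G", "G", "R", "R", "R"]

# Drawing two balls without replacement: every ordered pair of distinct
# balls is equally likely (5 * 4 = 20 ordered pairs).
draws = list(permutations(range(len(urn)), 2))

def prob(event):
    """Probability of an event given as a predicate on (first, second) colours."""
    favourable = sum(1 for i, j in draws if event(urn[i], urn[j]))
    return Fraction(favourable, len(draws))

p_green_first = prob(lambda first, second: first == "G")
p_green_second = prob(lambda first, second: second == "G")

print(p_green_first)   # 2/5
print(p_green_second)  # 2/5
```

Before anything is observed, the second draw is just as likely to be green as the first. The probability changes only once the first draw is known.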
Conditional Probability
 The Conditional Probability of A given B :
P(A | B) = P(A and B) / P(B)
 Example: An urn contains 2 green balls and 3 red balls. Suppose
2 balls are selected at random one after another without
replacement from the urn.
 A={Green ball appears on the second draw}
 B= {Green ball appears on the first draw}
 Find P(A|B) and P(A^c|B)
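The two conditional probabilities can be worked out from the definition. The intermediate values follow from the urn's composition (2 green, 3 red) and from drawing without replacement: once a green ball is removed, 1 of the remaining 4 balls is green.

```latex
\begin{align*}
P(B) &= \tfrac{2}{5}, \qquad
P(A \text{ and } B) = \tfrac{2}{5} \cdot \tfrac{1}{4} = \tfrac{1}{10} \\
P(A \mid B) &= \frac{P(A \text{ and } B)}{P(B)} = \frac{1/10}{2/5} = \frac{1}{4} \\
P(A^{c} \mid B) &= 1 - P(A \mid B) = \frac{3}{4}
\end{align*}
```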
Example:
 Records of student patients at a dentist’s office concerning fear of
visiting the dentist suggest the following proportions

School Level          Elementary   Middle   High
Fear Dentist          0.12         0.08     0.05
Do Not Fear Dentist   0.28         0.25     0.22

 Let A={Fears Dentist}; B={Middle School}
 Find P(A|B)
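A minimal sketch of the calculation follows. It assumes the flattened table reads row by row, so that the Middle School column holds 0.08 (fear) and 0.25 (do not fear); the dictionary layout and variable names are illustrative.

```python
# Joint proportions (fear status, school level), assuming a row-by-row reading
# of the table: Fear = 0.12, 0.08, 0.05 and Do Not Fear = 0.28, 0.25, 0.22.
joint = {
    ("Fear", "Elementary"): 0.12,
    ("Fear", "Middle"): 0.08,
    ("Fear", "High"): 0.05,
    ("No fear", "Elementary"): 0.28,
    ("No fear", "Middle"): 0.25,
    ("No fear", "High"): 0.22,
}

# P(B): total proportion of middle school students.
p_b = sum(p for (fear, school), p in joint.items() if school == "Middle")

# P(A and B): proportion who fear the dentist and are in middle school.
p_a_and_b = joint[("Fear", "Middle")]

# Definition of conditional probability.
p_a_given_b = p_a_and_b / p_b

print(round(p_a_given_b, 3))  # 0.242
```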
Conditional Probability and Independence
 If fearing the dentist does not depend on age or school level, what
would we expect the probability distribution in the previous example
to look like?
 What does this imply about P(A|B)?
 If A and B are independent, what form should the conditional
probability take?
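The last question can be sketched from the definition of conditional probability. The sketch assumes the usual product-rule definition of independence, which the slides do not state explicitly, and P(B) > 0.

```latex
\begin{align*}
P(A \text{ and } B) &= P(A)\,P(B) \quad \text{(independence)} \\
P(A \mid B) &= \frac{P(A \text{ and } B)}{P(B)}
             = \frac{P(A)\,P(B)}{P(B)} = P(A)
\end{align*}
```

In words: if A and B are independent, knowing that B occurred does not change the probability of A.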
Summarizing Bivariate Categorical Data
 Have studied bivariate continuous data (regression)
 Often have two (or more) categorical measurements taken on the
same sampling unit
 Data usually summarized in 2-way tables
 Often called contingency tables
Test for Independence
 Situation: We draw ONE random sample of predetermined size
and record 2 categorical measurements
 Because we do not know in advance how many sampled units will
fall into each category, neither the column totals nor the row totals
are fixed
Example:
 Survey conducted by sampling 400 people who were questioned
regarding union membership and attitude towards decreased
spending on social programs
              Union   Non-Union   Total
Support       112     84          196
Indifferent   36      68          104
Opposed       28      72          100
Total         176     224         400
 Would like to see if union membership is independent of attitude
towards decreased spending on social programs
 If the two variables are independent, what does that say about
the probability of a randomly selected individual falling into a
particular category?
 What would the expected count be for each cell?
 What test statistic could we use?
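A common answer to both questions is the Pearson chi-square approach, sketched here on the assumption that this is the method intended. The symbols O_ij (observed count), E_ij (expected count), r (number of rows) and c (number of columns) are notation introduced for this sketch.

```latex
\begin{align*}
E_{ij} &= \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n},
\qquad \text{e.g. } E_{\text{Support, Union}} = \frac{196 \times 176}{400} = 86.24 \\
\chi^2 &= \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad \text{df} = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
\end{align*}
```

The expected count comes from independence: the probability of a cell is the product of its row and column proportions, and multiplying by n gives the count.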
Formal Test
 Hypotheses:
 Test Statistic:
 P-Value:
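The slide leaves the three items blank. One plausible way to fill them in for the union survey is the Pearson chi-square test of independence: H0 says membership and attitude are independent, Ha says they are not. This is a sketch under that assumption; the variable names are illustrative, and the closed-form p-value used here holds only because this table has 2 degrees of freedom.

```python
import math

# Observed counts from the survey of 400 people
# (rows: attitude, columns: union membership).
observed = {
    "Support":     {"Union": 112, "Non-Union": 84},
    "Indifferent": {"Union": 36,  "Non-Union": 68},
    "Opposed":     {"Union": 28,  "Non-Union": 72},
}

rows = list(observed)
cols = ["Union", "Non-Union"]
row_totals = {r: sum(observed[r].values()) for r in rows}
col_totals = {c: sum(observed[r][c] for r in rows) for c in cols}
n = sum(row_totals.values())

# Expected count under independence: (row total * column total) / n.
expected = {
    r: {c: row_totals[r] * col_totals[c] / n for c in cols} for r in rows
}

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi_square = sum(
    (observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in rows
    for c in cols
)

df = (len(rows) - 1) * (len(cols) - 1)  # (3 - 1) * (2 - 1) = 2

# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
# This shortcut is NOT valid for other degrees of freedom.
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 2), df)  # 27.85 2
print(p_value < 0.001)           # True
```

With a p-value this small, the data give strong evidence against independence of union membership and attitude.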
Spurious Dependence
 Consider admissions from a fictional university by gender
Counts:
         Male   Female   Total
Admit    490    280      770
Deny     210    220      430

Proportions:
         Male   Female
Admit    0.70   0.56
Deny     0.30   0.44
 Is there evidence of discrimination?
 Consider same data, separated by schools applied to:
 Business School:
Counts:
         Male   Female
Admit    480    180
Deny     120    20

Proportions:
         Male   Female
Admit    0.80   0.90
Deny     0.20   0.10

 Law School:
Counts:
         Male   Female
Admit    10     100
Deny     90     200

Proportions:
         Male   Female
Admit    0.10   0.33
Deny     0.90   0.67
 Simpson’s Paradox: Reversal of comparison due to aggregation
 Contradiction of initial finding because of presence of a lurking
variable
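The reversal can be reproduced from the counts in the example. The sketch below reads the Law School counts as Admit 10 / 100 and Deny 90 / 200, which agrees with the combined totals; the data layout and the helper `admit_rate` are illustrative.

```python
# Admission counts by school and gender: (admitted, denied).
counts = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law":      {"Male": (10, 90),   "Female": (100, 200)},
}

def admit_rate(admitted, denied):
    """Proportion of applicants admitted."""
    return admitted / (admitted + denied)

# Admission rate within each school.
by_school = {
    school: {gender: admit_rate(*c) for gender, c in genders.items()}
    for school, genders in counts.items()
}

# Admission rate after aggregating over schools (the lurking variable).
combined = {}
for gender in ("Male", "Female"):
    admitted = sum(counts[s][gender][0] for s in counts)
    denied = sum(counts[s][gender][1] for s in counts)
    combined[gender] = admit_rate(admitted, denied)

for school, rates in by_school.items():
    print(school, {g: round(r, 2) for g, r in rates.items()})
print("Combined", {g: round(r, 2) for g, r in combined.items()})
# Business {'Male': 0.8, 'Female': 0.9}
# Law {'Male': 0.1, 'Female': 0.33}
# Combined {'Male': 0.7, 'Female': 0.56}
```

Women have the higher admission rate in each school, yet the lower rate overall, because most women applied to the law school, where admission rates are low for everyone.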