Statistics 400 - Lecture 23
Last Day: Regression
Today: Finish Regression, Test for Independence (Section 13.4)
Suggested problems: 13.21, 13.23
Computer Output
We will not normally compute the regression line, standard errors, … by hand
The key is identifying what the computer output is giving you
SPSS Example
[SPSS regression output: Model Summary, ANOVA, and Coefficients tables; values not recoverable from the transcript]
What is the Coefficients Table?
What is the Model Summary?
What is the ANOVA Table?
Back to Probability
The probability of an event, A , occurring can often be modified
after observing whether or not another event, B , has taken place
Example: An urn contains 2 green balls and 3 red balls. Suppose
2 balls are selected at random one after another without
replacement from the urn.
Find P(Green ball appears on the first draw)
Find P(Green ball appears on the second draw)
Conditional Probability
The Conditional Probability of A given B :
P(A | B) = P(A and B) / P(B)
Example: An urn contains 2 green balls and 3 red balls. Suppose
2 balls are selected at random one after another without
replacement from the urn.
A={Green ball appears on the second draw}
B= {Green ball appears on the first draw}
Find P(A|B) and P(A^c|B), where A^c is the complement of A
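The urn example can be checked by brute-force enumeration. The sketch below is only an illustration, not part of the lecture: it labels the five balls (the labels `G1`, `R1`, ... are hypothetical), lists all equally likely ordered draws of two balls, and applies the definition P(A | B) = P(A and B) / P(B).

```python
from fractions import Fraction
from itertools import permutations

# Urn from the example: 2 green and 3 red balls. The balls are labelled
# (hypothetical labels) so that every ordered pair is equally likely.
urn = ["G1", "G2", "R1", "R2", "R3"]
draws = list(permutations(urn, 2))  # 20 ordered draws without replacement


def is_green(ball):
    return ball.startswith("G")


def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    favourable = sum(1 for d in draws if event(d))
    return Fraction(favourable, len(draws))


p_b = prob(lambda d: is_green(d[0]))                           # B: green on first draw
p_a = prob(lambda d: is_green(d[1]))                           # A: green on second draw
p_a_and_b = prob(lambda d: is_green(d[0]) and is_green(d[1]))  # both green

p_a_given_b = p_a_and_b / p_b   # definition of conditional probability
p_ac_given_b = 1 - p_a_given_b  # complement rule, still conditional on B

print(p_b, p_a, p_a_given_b, p_ac_given_b)  # 2/5 2/5 1/4 3/4
```

Note that P(A) = 2/5 but P(A | B) = 1/4: knowing the first ball was green changes the probability for the second draw.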
Example:
Records of student patients at a dentist’s office concerning fear of
visiting the dentist suggest the following proportions
School Level            Elementary    Middle    High
Fear Dentist                  0.12      0.08    0.05
Do Not Fear Dentist           0.28      0.25    0.22
Let A={Fears Dentist}; B={Middle School}
Find P(A|B)
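As a rough check of the dentist example, the sketch below computes P(A | B) from the joint proportions. It assumes the table reads Fear Dentist = 0.12, 0.08, 0.05 and Do Not Fear Dentist = 0.28, 0.25, 0.22 for Elementary, Middle and High; the dictionary keys are illustrative names only.

```python
# Joint proportions from the dentist example: (fear status, school level).
# Assumed reading of the table; the six values sum to 1.
joint = {
    ("fear", "elementary"): 0.12,
    ("fear", "middle"): 0.08,
    ("fear", "high"): 0.05,
    ("no_fear", "elementary"): 0.28,
    ("no_fear", "middle"): 0.25,
    ("no_fear", "high"): 0.22,
}

# P(B): marginal probability of middle school (sum over fear status).
p_b = sum(p for (status, level), p in joint.items() if level == "middle")

# P(A and B): fears the dentist and is in middle school.
p_a_and_b = joint[("fear", "middle")]

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

print(round(p_b, 2), round(p_a_given_b, 4))  # 0.33 0.2424
```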
Conditional Probability and Independence
If fearing the dentist does not depend on age or school level, what
would we expect the probability distribution in the previous example
to look like?
What does this imply about P(A|B)?
If A and B are independent, what form should the conditional
probability take?
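If A and B are independent, then P(A and B) = P(A) P(B), so P(A | B) = P(A) P(B) / P(B) = P(A). The sketch below illustrates this with the dentist proportions (assumed reading: Fear Dentist = 0.12, 0.08, 0.05 for Elementary, Middle, High). It compares each conditional probability with the marginal P(A) and shows what the "Fear Dentist" row would look like under independence.

```python
levels = ["elementary", "middle", "high"]

# Joint proportions, assumed reading of the dentist table.
fear = {"elementary": 0.12, "middle": 0.08, "high": 0.05}
no_fear = {"elementary": 0.28, "middle": 0.25, "high": 0.22}

p_fear = sum(fear.values())                              # marginal P(A)
p_level = {lv: fear[lv] + no_fear[lv] for lv in levels}  # marginal P(level)

# Observed conditional probabilities P(A | level).
conditional = {lv: fear[lv] / p_level[lv] for lv in levels}

# Joint probabilities the "Fear Dentist" row would have under independence.
independent_joint = {lv: p_fear * p_level[lv] for lv in levels}

print(f"P(A) = {p_fear:.2f}")
for lv in levels:
    print(
        f"{lv:<10} P(A|level) = {conditional[lv]:.3f}  "
        f"observed joint = {fear[lv]:.4f}  "
        f"joint if independent = {independent_joint[lv]:.4f}"
    )
```

Under independence every P(A | level) would equal P(A) = 0.25; here the conditional probabilities are roughly 0.30, 0.24 and 0.19, so the proportions do not look independent.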
Summarizing Bivariate Categorical Data
Have studied bivariate continuous data (regression)
Often have two (or more) categorical measurements taken on the
same sampling unit
Data usually summarized in 2-way tables
Often called contingency tables
Test for Independence
Situation: We draw ONE random sample of predetermined size
and record 2 categorical measurements
Because we do not know in advance how many sampled units will
fall into each category, neither the column totals nor the row totals
are fixed
Example:
Survey conducted by sampling 400 people who were questioned
regarding union membership and attitude towards decreased
spending on social programs
               Union    Non-Union    Total
Support          112           84      196
Indifferent       36           68      104
Opposed           28           72      100
Total            176          224      400
We would like to see whether union membership is independent of
attitude towards decreased spending on social programs
If the two variables are independent, what does that say about
the probability of a randomly selected individual falling into a
particular cell?
What would the expected count be for each cell?
What test statistic could we use?
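One common answer, assumed here to be the one the lecture has in mind, is the chi-square statistic. Under independence the expected count in a cell is (row total × column total) / n, and the statistic sums (observed − expected)² / expected over all cells. A minimal sketch for the union survey data:

```python
# Observed counts from the union survey (rows: attitude, columns: membership).
observed = {
    "Support": {"Union": 112, "Non-Union": 84},
    "Indifferent": {"Union": 36, "Non-Union": 68},
    "Opposed": {"Union": 28, "Non-Union": 72},
}
columns = ["Union", "Non-Union"]

row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
col_totals = {c: sum(observed[r][c] for r in observed) for c in columns}
n = sum(row_totals.values())

# Expected count under independence: (row total * column total) / n.
expected = {
    r: {c: row_totals[r] * col_totals[c] / n for c in columns}
    for r in observed
}

# Chi-square statistic: sum over cells of (O - E)^2 / E.
chi_square = sum(
    (observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in observed
    for c in columns
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(columns) - 1)

for r in observed:
    print(r, [round(expected[r][c], 2) for c in columns])
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

For these data the expected counts are 86.24 and 109.76 (Support), 45.76 and 58.24 (Indifferent), 44 and 56 (Opposed), giving a statistic of about 27.85 on 2 degrees of freedom.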
Formal Test
Hypotheses:
Test Statistic:
P-Value:
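The slide leaves these three items to be filled in during the lecture. As a hedged sketch, the standard chi-square test of independence for an r × c contingency table takes the form below; the symbols O_ij (observed count), R_i (row total), C_j (column total) and n (sample size) are notation assumed here, not taken from the slides.

```latex
% Chi-square test of independence for an r x c contingency table.
% O_{ij}: observed count, R_i: row total, C_j: column total, n: sample size.
\begin{align*}
H_0 &: \text{the row and column classifications are independent} \\
H_1 &: \text{the row and column classifications are not independent} \\[4pt]
E_{ij} &= \frac{R_i \, C_j}{n} \\
\chi^2 &= \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad \text{df} = (r-1)(c-1) \\
\text{P-value} &= P\left(\chi^2_{(r-1)(c-1)} \ge \chi^2_{\text{obs}}\right)
\end{align*}
```

For the union survey table r = 3 and c = 2, so the reference distribution has 2 degrees of freedom.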
Spurious Dependence
Consider admissions from a fictional university by gender
Counts:
          Male    Female    Total
Admit      490       280      770
Deny       210       220      430
Proportions (within gender):
          Male    Female
Admit     0.70      0.56
Deny      0.30      0.44
Is there evidence of discrimination?
Consider same data, separated by schools applied to:
Business School:
Counts:
          Male    Female
Admit      480       180
Deny       120        20
Proportions (within gender):
          Male    Female
Admit     0.80      0.90
Deny      0.20      0.10
Law School:
Counts:
          Male    Female
Admit       10       100
Deny        90       200
Proportions (within gender):
          Male    Female
Admit     0.10      0.33
Deny      0.90      0.67
Simpson’s Paradox: Reversal of comparison due to aggregation
Contradiction of initial finding because of presence of a lurking
variable
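The reversal can be reproduced directly from the counts. The sketch below assumes the per-school counts read Business: Male 480 admit / 120 deny, Female 180 / 20; Law: Male 10 / 90, Female 100 / 200. It compares admission rates within each school with the rates after aggregating over schools.

```python
# (admit, deny) counts by school and gender, assumed reading of the tables.
admissions = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law": {"Male": (10, 90), "Female": (100, 200)},
}
genders = ("Male", "Female")


def admit_rate(admit, deny):
    return admit / (admit + deny)


# Admission rate within each school.
by_school = {
    school: {g: admit_rate(*groups[g]) for g in genders}
    for school, groups in admissions.items()
}

# Admission rate after aggregating the counts over schools.
overall = {}
for g in genders:
    admit = sum(admissions[s][g][0] for s in admissions)
    deny = sum(admissions[s][g][1] for s in admissions)
    overall[g] = admit_rate(admit, deny)

for school, rates in by_school.items():
    print(school, {g: round(r, 2) for g, r in rates.items()})
print("Overall", {g: round(r, 2) for g, r in overall.items()})
```

Females have the higher admission rate in both schools (0.90 vs 0.80 and 0.33 vs 0.10), yet the lower rate overall (0.56 vs 0.70), because most female applicants applied to the law school, which admits far fewer applicants. School applied to is the lurking variable.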