Slides for Session #20

Statistics for Social
and Behavioral Sciences
Part IV: Causality
Comparison of two groups
Chapter 7
Prof. Amine Ouazad
Statistics Course Outline
PART I. INTRODUCTION AND RESEARCH DESIGN
Week 1
Four Steps of “Thinking Like a Statistician”
Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling
Biases: Nonresponse bias, Response bias, Sampling bias
PART II. DESCRIBING DATA
Weeks 2-4
Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule
Bivariate sample statistics: Correlation, Slope
PART III. DRAWING CONCLUSIONS FROM DATA:
INFERENTIAL STATISTICS
Weeks 5-9
Estimating a parameter using sample statistics. Confidence Interval at 90%, 95%, 99%
Testing a hypothesis using the CI method and the t method.
PART IV. CORRELATION AND CAUSATION:
TWO GROUPS, REGRESSION ANALYSIS
Weeks 10-14
This is where we talk
about Zmapp and Ebola!
Coming up
• “Comparison of Two Groups”
This Session.
• “Univariate Regression Analysis”
Next Session Saturday.
• “Association and Causality”
Tuesday, Thursday and Extra Session.
• “Randomized Experiments (Continued), ANOVA”
Last Tuesday and Extra Session.
• “Robustness Checks and Wrap Up”
Last Thursday.
Outline
1. Randomized controlled trials
2. t test for equality of means
Next time: Inference in Univariate Regressions
Do U.S. Employers Discriminate?
• Employers post job ads.
• Sometimes mentioning they are an “Equal Opportunity
Employer.”
• Some Employers are federal contractors.
• Lots of anecdotal evidence…
– “In hiring, racial bias is still a problem.” Forbes.
– “Protesters allege hiring discrimination by Ferrara Candy”,
Chicago Tribune, October 28, 2014.
• But we can’t trust stories…
• A very tough question: we should be extra careful.
• What about causal evidence from statistical data?
Outline
1. Randomized controlled trials
2. t test for equality of means
Next time: Inference in Univariate Regressions
Difference of means
• Two groups: White and African American.
– m1: sample mean in first group.
X1i: observation of individual i in group 1.
– m2: sample mean in the second group.
X2i: observation of individual i in group 2.
• The expected value of the difference m1-m2?
• Sampling distribution of the difference m1-m2?
• Standard error of m1-m2:
Standard deviation of the sampling distribution of m1-m2.
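The slide's formulas did not survive in this transcript. As a sketch, the standard answers for two independently drawn samples are given below. The symbols \mu_1, \mu_2 (population means), s_1, s_2 (sample standard deviations) and n_1, n_2 (group sizes) are notation assumed here, not taken from the slides.

```latex
\begin{align*}
E[m_1 - m_2] &= \mu_1 - \mu_2, \\
\mathrm{SE}(m_1 - m_2) &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.
\end{align*}
```

With large samples, the sampling distribution of m1 - m2 is approximately normal and centered at \mu_1 - \mu_2.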
Confidence Interval for
the Difference of Means
• Very similar to the one-group case.
• t is also chosen from Table 5.1.
• Degrees of freedom df given either by the Welch approximation or
the Satterthwaite approximation (see end of handout). In general:
• Using the t distribution assumes that X is normally distributed.
• The method is robust to violations of the normality assumption, esp. for
proportions.
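A minimal Python sketch of this interval, assuming the usual form (m1 - m2) ± t × SE with an unpooled standard error. The function name `diff_means_ci` and all numbers are hypothetical, and `t_crit` stands in for the value read from Table 5.1.

```python
import math

def diff_means_ci(m1, s1, n1, m2, s2, n2, t_crit):
    """Confidence interval for m1 - m2 using the unpooled standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    diff = m1 - m2
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical summary statistics, for illustration only (not course data).
low, high = diff_means_ci(m1=3.9, s1=1.0, n1=100,
                          m2=3.5, s2=1.2, n2=144, t_crit=1.96)
print(round(low, 3), round(high, 3))
```

In this made-up example the interval lies entirely above 0, so the two means would be judged different at the 95% level.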
t statistic for the difference
of means
• Built similarly as in the one group case.
• Can also subtract v from the numerator when testing
whether the difference of means equals a number v.
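The construction can be sketched in Python from raw observations, assuming the statistic is (m1 - m2 - v) divided by the unpooled standard error of the difference. The function name `t_statistic` and the data are hypothetical.

```python
import math
from statistics import mean, variance

def t_statistic(x1, x2, v=0.0):
    """t statistic for H0: (mean of group 1) - (mean of group 2) = v."""
    # variance() is the sample variance (divides by n - 1).
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2) - v) / se

# Hypothetical observations, for illustration only.
group1 = [4.0, 5.0, 3.0, 4.0, 4.0]
group2 = [2.0, 3.0, 3.0, 2.0, 5.0]
t = t_statistic(group1, group2)
print(round(t, 3))
```

Setting `v=1.0` here would give a t statistic of 0, because the sample means differ by exactly 1.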
Methods for hypothesis testing
H0 : m1 = m2
Ha : m1 different from m2.
Reject the null hypothesis if either:
The confidence interval does not include 0
The t statistic exceeds the critical t score in absolute value
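For a two-sided test, the two rules give the same decision as long as they use the same standard error and the same critical t score: the interval excludes 0 exactly when |m1 - m2| exceeds t × SE. A small sketch, with a hypothetical function name and made-up inputs:

```python
def reject_null(diff, se, t_crit):
    """Return the decision under the CI method and under the t method."""
    ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
    by_ci = not (ci_low <= 0 <= ci_high)  # confidence interval excludes 0
    by_t = abs(diff / se) > t_crit        # |t| exceeds the critical t score
    return by_ci, by_t

# Hypothetical differences and standard errors, for illustration only.
strong = reject_null(diff=0.4, se=0.15, t_crit=1.96)
weak = reject_null(diff=0.2, se=0.15, t_crit=1.96)
print(strong, weak)
```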
Application
• Compute the t statistics of the difference for
each city.
• Can you recover the p values using Stata?
• Can you reject the null hypothesis that the callback
rates for White and African American
names are equal?
Fill in the t statistics here. Can we reject the null hypothesis?
t statistic
Can you recover the p values?
• Using display 2*ttail(df, abs(t)) in Stata.
• And df approximately n1+n2 – 2.
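For readers without Stata, the same two-sided p value can be approximated with the Python standard library. This sketch integrates the t density numerically with Simpson's rule; the function names are hypothetical, and the result is only as accurate as the integration, so very small p values should not be trusted to many digits.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value, 1 - P(-|t| < T < |t|), via Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability mass between -|t| and |t|
    return max(0.0, 1.0 - central)

print(round(two_sided_p(2.228, 10), 3))
```

If SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` is the more usual way to get the same quantity.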
Reject?
Resume Quality And Callback Rates
A refinement for sample proportions
• When X1i and X2i are variables that take only
two values 0 or 1.
• m1 and m2 are sample proportions p1 and p2.
• Group 1 size : n1. Group 2 size: n2.
• H0: “p1 = p2”
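• Under the null, the two groups share a common proportion p, so their
standard deviations are equal, and thus:
• df = n1 + n2 – 2 and SE(p1 – p2) = sqrt( p (1 – p) (1/n1 + 1/n2) ),
where p is the pooled sample proportion (n1 p1 + n2 p2) / (n1 + n2).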
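A minimal sketch of this refinement, assuming the standard pooled-proportion standard error sqrt(p(1 - p)(1/n1 + 1/n2)) with p = (n1 p1 + n2 p2)/(n1 + n2). The slide's own formula is missing from the transcript, so this form is an assumption; the function name and the callback rates below are hypothetical.

```python
import math

def pooled_proportion_t(p1, n1, p2, n2):
    """t statistic and df for H0: p1 = p2, using the pooled proportion."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return (p1 - p2) / se, df

# Hypothetical callback rates, for illustration only (not the study's data).
t, df = pooled_proportion_t(p1=0.10, n1=1000, p2=0.06, n2=1000)
print(round(t, 2), df)
```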
Back to Café Firenze?
• What we did before:
– Confidence interval around the mean m1 of Café
Firenze and the mean m2 of Lebanese Express.
– We showed that:
• Exercise at home:
– Can we reject the null hypothesis that the true
mean rating m1 of Café Firenze is equal to
the true mean rating m2 of Lebanese Express?
Another application of t tests for the
equality of means
• “In many studies, one group of volunteers will be given
an experimental or "test" drug or treatment, while the
control group is given either a standard treatment for
the illness or an inactive pill, liquid, or powder that has
no treatment value (placebo). This control group
provides a basis for comparison for assessing effects of
the test treatment. In some studies, the control group
will receive a placebo instead of an active drug or
treatment. In other cases, it is considered unethical to
use placebos, particularly if an effective treatment is
available. Withholding treatment (even for a short
time) would subject research participants to
unreasonable risks.”
Coming up:
• Reading: Chapter on “Comparing Two Groups”.
• Next: Chapter 9, with t tests for slope coefficients.
• Online quiz this weekend on this material.
• Session on Saturday at 12.45 in the same room (catch-up for National Day).
• Make sure you come to sessions and recitations.
For help:
• Amine Ouazad
Office 1135, Social Science building
[email protected]
Office hour: Tuesday from 5 to 6.30pm.
• GAF: Irene Paneda
[email protected]
Sunday recitations.
At the Academic Resource Center, Monday from 2 to 4pm.
Read only if interested:
Degrees of freedom for two groups x and y
• Satterthwaite’s approximate formula:
• Welch’s approximate formula:
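The two formulas are missing from this transcript. The sketch below uses the forms documented for Stata's `ttest` command as best recalled, so treat them as an assumption to check against the handout. Here `sx`, `sy` are the sample standard deviations and `nx`, `ny` the sizes of groups x and y; the function names are hypothetical.

```python
def satterthwaite_df(sx, nx, sy, ny):
    """Satterthwaite's approximate degrees of freedom (assumed form)."""
    vx, vy = sx**2 / nx, sy**2 / ny  # variance of each sample mean
    return (vx + vy) ** 2 / (vx**2 / (nx - 1) + vy**2 / (ny - 1))

def welch_df(sx, nx, sy, ny):
    """Welch's approximate degrees of freedom (assumed form)."""
    vx, vy = sx**2 / nx, sy**2 / ny
    return -2 + (vx + vy) ** 2 / (vx**2 / (nx + 1) + vy**2 / (ny + 1))

# Equal spreads and equal group sizes, for illustration only.
print(satterthwaite_df(1.0, 10, 1.0, 10), welch_df(1.0, 10, 1.0, 10))
```

With equal standard deviations and n observations per group, the first formula reduces to 2(n - 1) = n1 + n2 - 2 and the second to 2n, which is why the slides treat n1 + n2 - 2 as a reasonable approximation.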
