Are Women Paid Less than Men?

Are Women Paid Less than Men?
Intro and revision of basic statistics
Learning Objectives
1. Review of basic statistics
2. Some basic Stata commands
3. Nature of randomness
4. Distributions
5. The use and abuse of the Normal Distribution
6. Basic Hypothesis testing
7. The role of prediction & causation
Wages and Gender
• Some Stata commands
• We can use data in wages.dta to examine this
issue
– .dta is a Stata-format data file
• Start Stata via UCD Connect
• Review the dataset using basic Stata commands (a short sketch follows below)
– Editor
– summ
– describe
– sort
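A minimal sketch of how these commands fit together, assuming wages.dta sits in Stata's working directory:

    use wages.dta, clear    // load the data
    describe                // variable names, types and labels
    summarize               // mean, sd, min and max of every variable
    sort wage               // re-order the observations by wage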
Wages and gender
• Basic answer to the question is to compare
average wages.
• Stata:
– summ wage if gender==1
– summ wage if gender==2
• So we can say wages are different in this sample
• Can we say this difference would be true
generally?
– Note the implicit “out of sample” prediction in this
question
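The same comparison can be run in one line by grouping on gender (a sketch, assuming the variable names wage and gender used above):

    bysort gender: summarize wage    // average wage for each gender in a single command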
Statistical Inference
• Key point is that there would be no problem if we
observed the wages of all men and women.
• Sampling is the source of the problem
– Result could be dumb luck
– Choice of sample could be dumb luck
– Isolated example?
• What is the chance that a different sample would lead
to a radically different result?
Statistical Inference
• Key Point: Statistical inference is the process of
deciding when we can use a sample result to
make statements about the world
• Also referred to as Hypothesis testing
– Using this data can we reject the hypothesis that both
groups are paid the same?
• Rough answer: we can if the difference between the
average wages is large (how large?)
• To answer precisely we need to review the nature
of randomness – the role of “dumb luck”
Random Variables
• The outcome of an “experiment”: its value is unknown
until it is observed
– Dice, coin, roulette wheel
– Getting a job, the wage
– In our example the observed differences between
wages may be random and have nothing to do with
gender i.e. dumb luck
• Continuous vs Discrete Random Variables
– Discrete: Finite no of possible outcomes
• Dice, coin, lottery
• Useful for intuition
– Continuous: Outcome can be any value over a range
• Most real world data is continuous
Distribution of Random Variables
• Discrete random variable
– Establish the frequency of outcomes in repeated
experiments
• Continuous Random Variable
– Outcome can be any value over a range
– Can establish the probability of the r.v. being within
a particular interval,
– not of taking a particular value (zero by definition)
– e.g. Household Income, interest rates, price of
bread
Probability Density Function f(x)
• Mathematical representation of the
distribution of a random variable
• f(x) = Pr(X=x)
– prob. of a die showing 3, i.e. f(3) = Pr(X=3) = 1/6
(see the quick check after this slide)
• Σ f(x) = 1; sum of probabilities = 1, and 0 ≤ f(x) ≤ 1
• For a continuous rv
– can take on any value within an interval: infinite
no. of values.
– NB: the probability of one exact value occurring =
0:
• Pr(a ≤ X ≤ b) = ?
• Probability is not measured by height now –
measured by area
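For a fair die these statements can be checked by simple arithmetic in Stata's display calculator (no data needed):

    display 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6    // sum of probabilities = 1
    display 3*(1/6)                              // Pr(2 <= X <= 4): three equally likely faces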
An example of a continuous RV
• Pr(Y ≤ 1200) = shaded area = the integral of f(y) from 0 to 1200; the integral defines the probability
• An example of the bell curve
[Figure: density f(y) with the area to the left of 1200 shaded]
Empirical vs Theoretical Distribution
• The examples so far are theoretical
distributions
• Real-world data won't necessarily match these
exactly or even approximately
• We can show the empirical distribution of
data on a histogram
• File dice.dta contains 10 dice each rolled 3449
times
– hist dice1
[Figure: “Empirical Dist of Dice” — histogram of dice1 with outcomes 1–6 on the horizontal axis and percent on the vertical axis]
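Because the die outcomes are discrete, the histogram is cleaner with one bar per face; a sketch of the command (discrete and percent are standard histogram options), assuming dice.dta is loaded:

    histogram dice1, discrete percent    // one bar per face, vertical axis in percent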
Example using wage data
• Histogram will show the distribution
– Stata: hist wages, bin(50)
• Can also do it separately for the two genders
– hist wages if gender==1, bin(50) norm
– hist wages if gender==2, bin(50) norm
• This shows that the distribution of wages
looks a little different for both groups
– Not just the average
• Note that the bell curve is a bad approximation
(a combined, single-command version of these graphs is sketched after the figure below)
[Figure: histograms of hwage by gender — “Male” and “Female” panels, density with a normal overlay; graphs by ==1 for a man, 2 for a woman]
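The two per-gender histograms can also be produced on a common scale in one command; a sketch, assuming the hourly wage variable is called hwage as in the figure:

    histogram hwage, bin(50) normal by(gender)    // one panel per gender, normal curve overlaid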
Characteristics of a Distribution
• We can characterise the difference between
distributions in many ways
– The two main ones are the Expected Value and the
Variance
• Expected Value is the average
– Weighted by probability
Sample and Theoretical Mean
• Sample and theoretical mean will be different
– Dice: E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5
– For dice1: summ dice1
• Continuous:
– We already did for gender using summ
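A quick check of the theoretical mean against the sample mean of the rolled dice (a sketch, assuming dice.dta is loaded):

    display (1 + 2 + 3 + 4 + 5 + 6)/6    // theoretical E(X) = 3.5
    summarize dice1                      // the sample mean should be close to 3.5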
Rules of Expectations
1. E(X+Y) = E(X) + E(Y)
2. E(X-Y) = E(X) - E(Y)
3. E(aX) = aE(X)
4. E(X+a) = E(X) + a
5. E(aX+bY+cZ) = aE(X) + bE(Y) + cE(Z)
See dice.dta for examples of this
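For instance, rule 1 can be seen in the dice data; a sketch (the second die is assumed here to be stored as dice2, and sum12 is just an illustrative name):

    use dice.dta, clear
    generate sum12 = dice1 + dice2    // hypothetical variable names
    summarize dice1 dice2 sum12       // mean of sum12 ≈ mean of dice1 + mean of dice2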
Variance of a random variable
• Distribution has an average but it also varies
around that average
• Need a concept to measure that dispersion

Var(X) = E[(X − E(X))²]
       = E[X² − 2X·E(X) + (E(X))²]
       = E(X²) − 2E(X)·E(X) + (E(X))²
       = E(X²) − (E(X))²
In Stata the standard deviation (the square root of the variance) is part of the output of “summ”
Dice Example
X    X²    (X − E(X))²
1    1     (1 − 3.5)² = 6.25
2    4     (2 − 3.5)² = 2.25
3    9     (3 − 3.5)² = 0.25
4    16    (4 − 3.5)² = 0.25
5    25    (5 − 3.5)² = 2.25
6    36    (6 − 3.5)² = 6.25
Sums: ΣX = 21, ΣX² = 91, Σ(X − E(X))² = 17.5
Dice Example cont.
E(X) = ΣX/6 = 21/6 = 3.5
E(X²) = ΣX²/6 = 91/6 = 15.166
Var(X) = E(X²) − (E(X))² = 15.166 − (3.5)² = 2.916
Var(X) = (1/n) Σ(X − E(X))² = 17.5/6 = 2.916
Note that the theoretical variance may differ from the variance in the sample
Stata: summ dice1
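The arithmetic can be verified in Stata's display calculator:

    display 91/6 - 3.5^2    // E(X²) − (E(X))² = 2.916...
    display 17.5/6          // the same answer via the squared deviations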
Rules of Variance
1. Var(aX) = a²Var(X)
2. Var(X+a) = Var(X)
3. Var(a+bX) = b²Var(X)
4. Var(aX+bY) = a²Var(X) + b²Var(Y) if X and Y
are independent (deal with dependence later)
• Standard Deviation = square root of Variance: σ = √(σ²)
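Rule 1 can be illustrated with the dice data; a sketch, assuming dice.dta is loaded (dice1x2 is just an illustrative name):

    generate dice1x2 = 2*dice1    // scale the outcomes by 2
    summarize dice1 dice1x2       // the sd of dice1x2 should be twice the sd of dice1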
Normal Distribution
• A special continuous distribution that can be
very useful
• AKA “Gaussian Distribution”, “Bell Curve”,
“Law of Errors”
• Mean and variance completely define it
– X ~ N(μ, σ²)
Bell Curve
[Figure: normal density f(y) with mean 1000; Pr(Y ≤ 1200) = shaded area = the integral of f(y) up to 1200]
Calculating Probabilities from Bell
Curve
• Integral solved for you by computer
• in Stata the “normal” function gives the area under
the curve of a standard normal rv
– Mean=0, variance=1
– display normal(0.5)
• For other normals, make use of a trick:
– If Y ~ N(μ, σ²) then Z = (Y − μ)/σ is N(0,1)
– e.g. mean 1000, std dev 100
– Pr(X < 1200) = Pr(Z < 2)
– di normal(2)
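Putting the standardisation and the normal() function together for the example above:

    display normal((1200 - 1000)/100)    // standardise, then read off the area
    display normal(2)                    // the same number: Pr(Z < 2) ≈ 0.977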
Properties of the Normal
• Symmetric around the mean
– Positive and negative deviations are equally likely
• The probability of a deviation declines with
the size of a deviation
• approx. 68% of the area under the curve lies
between [μ − σ, μ + σ]
• and 95% is in the interval [μ − 2σ, μ + 2σ]
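These two figures can be checked with the normal() function:

    display normal(1) - normal(-1)    // ≈ 0.68
    display normal(2) - normal(-2)    // ≈ 0.95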
Using the Bell Curve
• We often assume that data has normal
distribution
– Easy to use
– Often intuitive
• But remember nothing is actually normal
• We choose to model things as normal
– it is only a convenient approximation
– Could be very wrong
• Always have to ask yourself if it is reasonable to
treat the data as normal
• Check histogram
Using the Bell Curve
• Nassim Taleb made a career out of complaining
that we assume data is normal when it is not
– See Fooled by Randomness
• Already seen that it is a bad approx. to the wage data
• Bad approximation to stock market data as it
grossly understates the tails
– i.e. too low a probability of large changes
– See sandp.dta
[Figure: “Empirical dist of % Stock Returns” — density histogram of the % change in daily closing price, roughly −4 to 4]
Answer the question!
• We now have some tools which we can use to
answer the question of gender bias in wages
• Recall that women are paid less on average in
the sample
• Recall that the issue is whether we can use
this fact about the sample to make statements
about the world (“population”)
An Answer
• We will develop a precise method of answering these
kinds of questions as we go through the course
• For the moment we adopt a very simple heuristic
approach
• Most people would suggest that the difference in the
means is the key
– i.e. 6.701875 -5.451302= 1.250573
• So men are paid more than women
• We know the problem with this is that it doesn’t take
account of the possibility that the sample may be unusual
– The statistical inference problem
An Answer
• Suppose we think that by dumb luck all our
women have come from the low end of the
distribution of wages and all our men have
come from the upper end
– We would find “discrimination” but it is just due to our
weird sample
• So we need to account for the dispersion of
the data
• Recall we measure dispersion by the variance
An Answer
• Think of how this works
– If the variance is large then by dumb luck we could
have chosen a sample where men came from one
end of the distribution and women from the other
– If the variance of wages is small then the distance
between the two ends of the distribution isn't that big
– So any difference in the wage is less likely due to
chance
• So what matters is the difference in the (mean)
wage relative to the dispersion or variance of
wages
• Sample size will also matter
Answer
• So we calculate
t = (m1 − m2) / (σ/√n)
– This is known as the “test statistic”
• If this is large then we will say there is a
statistically significant difference between the
two
• How large? Typically around 2
– We will see why later
Here it is!
• Stata:
– di (6.701875 -5.451302)/(3.157414/345^0.5)
• Gives 7.35
• This is more than 2 so we reject the idea of equal
wages
• Note we are not saying discrimination is certain
– “statistically significant” difference in wages
– The observed difference is large enough that it is
unlikely to have occurred by chance
– We can reject the null hypothesis of equal wages
– Note the phrasing “can reject’’
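Stata will also do this whole calculation, with an exact p-value, using its built-in two-sample t-test; a sketch, assuming the wage and gender variable names used earlier (the result will differ slightly from the rough calculation above):

    ttest wage, by(gender)    // test of equal mean wages for the two gender groups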
A Comment on Prediction
• We have taken a fact about the sample and
inferred a result about the world
– Statistical inference
– May be wrong
• Implicit prediction: women (outside the
sample) will be paid less than men
• Over the next few weeks will learn what is
missing from this technique
A Comment on Causation
• Implicit causation:
– gender “causes” the pay difference
• Only really shown correlation
• Need theory to justify causation
• Think of effect of schooling vs effect of
intelligence
• This is a problem even if we solve the sampling
issue
What Have We Learned?
1. Most fundamental issue: sampling causes
randomness
2. Theoretical distributions are only approx to
the true distribution
3. Distributions can be characterised by means
and variances
4. Sampling “error”
– makes statistical inference non-trivial
– Prediction is implicit in everything
5. Causation is not the same as correlation
What is missing?
• Only one of the variables was continuous
• We were very lax and vague in the discussion
of statistical inference
• Both of these to be corrected in the next
section