LIR 832 - Lecture 4


Multivariate Methods
LIR 832
Multivariate Methods:
Topics of the Day
A. Isolating Interventions in a Multi-Causal World
B. Multivariate Probability Distributions
C. The Building Block: Covariance
D. The Next Step: Correlation
A Multivariate World
Isolating Interventions in a Multi-Causal World

A. Example of problem:
 Evaluate a program to reduce absences from a plant?
 Is there age discrimination?

B. Types of data
 Experimental
 Quasi-experimental
 Non-experimental

C. Need multivariate analysis to sort out causal
relationships.
Bi-Variate Relations: A First
Run at Multivariate Methods
A. Many of the issues we are interested in are
essentially about the relationship between two
variables.
B. Bi-variate can be generalized to multivariate
relationships
C. We learn bi-variate formally and make more
intuitive reference to multivariate.
D. What do we mean by bi-variate relationship?
Bi-Variate Example
Our firm has formed teams of engineers, accountants, and
general managers at all plants to work on several issues
that are considered important in the firm. The firm has long
been committed to gender diversity, and we are interested
in the distribution of gender among our managerial
classifications. We are particularly concerned about the
distribution of gender on these teams, especially
among engineers. Consider the distribution of two
statistics about these three-person teams.


a. gender of the team members (X: x = number of men)
b. is the engineer a woman (Y: 0 = man, 1 = woman)
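The joint distribution of X and Y can be built by listing every possible team. The sketch below is illustrative only: it assumes each of the three roles is equally likely to be filled by a man or a woman, independently of the others (an assumption, though one consistent with the eighths used in the calculations in this example). The names `teams` and `joint` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumption: each role (engineer, accountant, manager) is filled by a man "M"
# or a woman "W" with probability 1/2, independently of the other roles.
teams = list(product("MW", repeat=3))  # 8 equally likely teams

joint = defaultdict(Fraction)  # maps (x, y) to P(X = x, Y = y)
for team in teams:
    x = team.count("M")           # X: number of men on the team
    y = int(team[0] == "W")       # Y: 1 if the engineer is a woman, else 0
    joint[(x, y)] += Fraction(1, len(teams))

for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p}")
```

Summing the entries over y gives the marginal distribution of X, and summing over x gives the marginal distribution of Y.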
Bi-Variate Example (cont.)
We can also use this information to build
conditional probabilities: What is the
likelihood that the engineer is a woman,
given that there is exactly one man on the team?
Bi-Variate Example (cont.)
What is the likelihood that the engineer is a
woman, given that there is exactly one man on the team?

P(Y = 1 | X = 1)
= P(Y = 1 & X = 1) / P(X = 1)
= (2/8) / (3/8) = 2/3
Note: P(Y = 1 | X = 2) is read as
"the probability that Y is equal to 1 given that X = 2"
or
"the probability that Y = 1 conditional on X = 2"
Bi-Variate Example (cont.)
What is the likelihood that there is only one
man, given the engineer is a woman?

P(X = 1 | Y = 1)
= P(Y = 1 & X = 1) / P(Y = 1)
= (2/8) / (4/8) = 2/4 = 1/2
Bi-Variate Example (cont.)
What is the likelihood that the engineer is a
woman?

P(Y= 1) = 1/2
But if we know that there are two men, we can
improve our estimate:




P(Y = 1 | X = 2)
= P(Y = 1 & X = 2) / P(X = 2)
= (1/8) / (3/8) = 1/3
What about calculating the likelihood of two men
given the engineer is a woman?
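The conditional probabilities in this example can be checked by counting outcomes. This is a minimal sketch, again assuming eight equally likely teams (each role filled by a man or a woman with probability 1/2); the helper names `prob` and `conditional` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (x, y): x = number of men, y = 1 if the engineer is a woman.
# Assumption: all 8 teams (engineer, accountant, manager) are equally likely.
outcomes = [(t.count("M"), int(t[0] == "W")) for t in product("MW", repeat=3)]

def prob(event):
    """P(event) over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def conditional(event, given):
    """P(event | given) = P(event & given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

p_y1_given_x1 = conditional(lambda o: o[1] == 1, lambda o: o[0] == 1)
p_x1_given_y1 = conditional(lambda o: o[0] == 1, lambda o: o[1] == 1)
p_y1_given_x2 = conditional(lambda o: o[1] == 1, lambda o: o[0] == 2)
p_x2_given_y1 = conditional(lambda o: o[0] == 2, lambda o: o[1] == 1)

print(p_y1_given_x1, p_x1_given_y1, p_y1_given_x2, p_x2_given_y1)
```

Under these assumptions the last value, the likelihood of two men given that the engineer is a woman, works out to (1/8) / (4/8) = 1/4.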
Example: Gender Distribution
Working with Conditional Probability:

P(Female) = 50.91%

P(Female | LRHR) = P(Female & LRHR) / P(LRHR) =
0.36% / 0.55% ≈ 65%

P(LRHR) = 0.55%

P(LRHR | Female) = P(LRHR & Female) / P(Female) =
0.36% / 50.91% ≈ 0.7%
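The two divisions can be reproduced directly. The sketch assumes that all three figures are percentages of the same sample (so 0.36 is the share that is both female and LRHR); the variable names are hypothetical.

```python
# Assumption: all values are percentages of the whole sample.
p_female = 50.91            # P(Female), in percent
p_lrhr = 0.55               # P(LRHR), in percent
p_female_and_lrhr = 0.36    # P(Female & LRHR), in percent

# The percent units cancel, so each ratio is a proportion.
p_female_given_lrhr = p_female_and_lrhr / p_lrhr
p_lrhr_given_female = p_female_and_lrhr / p_female

print(f"P(Female | LRHR) = {p_female_given_lrhr:.0%}")
print(f"P(LRHR | Female) = {p_lrhr_given_female:.2%}")
```

The same joint probability gives very different conditional probabilities depending on which event is conditioned on.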
Independence Defined
Now that we know a bit about bi-variate
relationships, we can define what it means,
in a statistical sense, for two events to be
independent.
If events X and Y are independent, then:

 Their conditional probability is equal to their
unconditional probability: P(X | Y) = P(X).
 The probability of both events
occurring is the product: P(X, Y) = P(X) * P(Y).
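As an illustration, the product rule can be tested on the three-person team example. This sketch assumes eight equally likely teams, and the event names are hypothetical: "two men on the team" and "the engineer is a woman" fail the test, while "the engineer is a woman" and "the accountant is a woman" pass it.

```python
from fractions import Fraction
from itertools import product

# Assumption: teams are (engineer, accountant, manager), all 8 equally likely.
teams = list(product("MW", repeat=3))

def prob(event):
    """P(event) over the equally likely teams."""
    return Fraction(sum(1 for t in teams if event(t)), len(teams))

def is_independent(a, b):
    """True when P(A & B) equals P(A) * P(B)."""
    return prob(lambda t: a(t) and b(t)) == prob(a) * prob(b)

def two_men(t):
    return t.count("M") == 2

def engineer_woman(t):
    return t[0] == "W"

def accountant_woman(t):
    return t[1] == "W"

men_vs_engineer = is_independent(two_men, engineer_woman)
engineer_vs_accountant = is_independent(engineer_woman, accountant_woman)
print(men_vs_engineer, engineer_vs_accountant)
```

For the first pair, P(X = 2 & Y = 1) = 1/8 while P(X = 2) * P(Y = 1) = 3/16, so knowing the number of men changes the estimate for the engineer.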
Importance of Independence
Why is independence important?


If events are independent, then we are getting
unique information from each data point. If
events are not independent, then
A practical example on running a survey on
employee satisfaction within an establishment.
Example:
Employee Satisfaction
Covariance
Covariance: Building Block of Multivariate Analysis

All very nice, but what we are looking for is a
means of expressing and measuring the
strength of association of two variables.
 How closely do they move together?
 Is variable A a good predictor of variable B?

Next we move to a slightly more complex world: no
more two- and three-category variables.
Example:
Age and Income Data
__________________________________________________________________
Descriptive Statistics: age, annual income

Variable   N    Mean     Median   StDev   SE Mean
age        23   24.565   23.000   4.251   0.886
annual I   23   17174    10000    15712   3276

Variable   Minimum   Maximum   Q1       Q3
age        22.000    42.000    22.000   26.000
annual I   0         65000     7000     25000
_________________________________________________________________
Example:
Age and Income Data
•Adding some info to the graph…
Covariance and Correlation
Defined
Define Covariance and Correlation for a
random sample of data:

Let our data be composed of pairs of data
(Xi,Yi) where X has mean mx and Y has mean
my. Then the covariance, the co-movement
around their means, is defined as:
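The formula itself did not survive in this transcript. A standard form for a sample of n pairs is sketched below, assuming the usual n - 1 divisor (the original slide may have divided by n instead); s_x and s_y denote the sample standard deviations, and r is the sample correlation.

```latex
\[
\operatorname{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - m_x)(Y_i - m_y),
\qquad
r_{XY} = \frac{\operatorname{Cov}(X, Y)}{s_x \, s_y}
\]
```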
Example: Covariance
We observe the relationship between the number
of employees at work at a plant and the output for
five days in a row:
Attendance   Output
8            40
3            28
2            20
6            39
4            28
What is the covariance of attendance and output?
Example: Covariance (cont.)
The covariance is positive. This suggests that when
attendance is above its mean, output is also above its mean.
Similarly, when attendance is below its mean, output is
below its mean.
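The calculation can be sketched as follows, assuming the sample covariance with an n - 1 divisor (dividing by n instead gives a smaller value with the same sign). The function name `sample_covariance` is hypothetical.

```python
attendance = [8, 3, 2, 6, 4]
output = [40, 28, 20, 39, 28]

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n   # 4.6 for attendance
    mean_y = sum(ys) / n   # 31.0 for output
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)

cov = sample_covariance(attendance, output)
print(f"Covariance of attendance and output: {cov:.2f}")
```

The products of deviations sum to 77, so the covariance is 77 / 4 = 19.25 under the n - 1 convention (77 / 5 = 15.4 if dividing by n). All five products are positive, which is why the covariance is positive.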
Example: Overtime Hours
and Productivity
Covariances: prod-avg, week

            prod-avg    week
prod-avg    113.7292
week        -49.5667    22.6667
Example: Overtime Hours
and Productivity (cont.)
Covariances: prod-avg, week, week-hours

              prod-avg    week       week-hours
prod-avg      233.3345
week          -51.8706    21.3986
week-hours    -89.0777    0.0000     99.3069
Example: Overtime Hours
and Productivity (cont.)
Correlation vs. Covariance
A limitation of covariance is that it is difficult to
interpret. Its units are the product of the units of the two
variables, and its size changes with the scale of measurement.
Thus, we need a measure which is more readily
interpreted and tells about the strength of
association.
Correlation:

Population Correlation is Defined as:
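The formula is missing from this transcript. The standard definition, stated here with assumed notation (sigma_X and sigma_Y for the population standard deviations), divides the covariance by the product of the standard deviations, which removes the units and bounds the result between -1 and 1.

```latex
\[
\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho_{XY} \le 1
\]
```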
[Figures: example plots with Correlation = 1.00, 0.94, 0.604, and 0.198]
Correlation:
Previous Examples
Overtime-Productivity:
Limit to 5 days, 10 hours:
Correlations: prod-avg, week, week-hours

              prod-avg    week
week          -0.734
               0.000
week-hours    -0.585      0.000
               0.000      1.000

Cell contents: correlation, with its p-value beneath
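The correlations for the overtime example follow from the covariance table for the same three variables. A minimal sketch, with the covariance values copied from that table and a hypothetical helper `correlation`:

```python
import math

# Lower triangle of the covariance table for prod-avg, week, week-hours.
cov = {
    ("prod-avg", "prod-avg"): 233.3345,
    ("week", "prod-avg"): -51.8706,
    ("week-hours", "prod-avg"): -89.0777,
    ("week", "week"): 21.3986,
    ("week-hours", "week"): 0.0,
    ("week-hours", "week-hours"): 99.3069,
}

def correlation(a, b):
    """Covariance divided by the product of the two standard deviations."""
    c = cov.get((a, b), cov.get((b, a)))
    return c / math.sqrt(cov[(a, a)] * cov[(b, b)])

for a, b in [("prod-avg", "week"), ("prod-avg", "week-hours"),
             ("week", "week-hours")]:
    print(f"corr({a}, {b}) = {correlation(a, b):.3f}")
```

Rounded to three decimals this gives -0.734, -0.585, and 0.000, matching the reported correlations.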
Correlations:
Previous Examples
Example: Correlation
What about some real data? Consider the relationship
between age, gender, and weekly earnings
among human resource managers (admin
associated occupations).
Example: Correlation
Descriptive Statistics: Female, age, weekearn

Variable    N       N*     Mean      Median    TrMean    StDev
Female      55158   0      0.50471   1.00000   0.50524   0.49998
age         55158   0      42.357    42.000    42.103    11.662
weekearn    47576   7582   894.53    769.23    846.16    562.22

Variable    SE Mean   Minimum   Maximum   Q1        Q3
Female      0.00213   0.00000   1.00000   0.00000   1.00000
age         0.050     15.000    90.000    33.000    51.000
weekearn    2.58      0.01      2884.61   519.00    1153.00
Example: Correlation
Tabulated Statistics: Female

Rows: Female

          weekearn   weekearn
          Mean       StDev
Male      1085.4     622.1
Female    727.2      440.5
All       894.5      562.2
Example: Correlation
Tabulated Statistics: Female

Rows: Female

          weekearn   weekearn   age       age
          Mean       StDev      Mean      StDev
male      1085.4     622.1      43.256    11.856
female    727.2      440.5      41.475    11.399
all       894.5      562.2      42.357    11.662
Example: Correlation
Covariances: age, weekearn, Female

            age        weekearn     Female
age         135.99
weekearn    1119.24    316094.42
Female      -0.45      -89.17       0.25
Example: Correlation
Correlations: age, Female, weekearn

            age       Female
Female      -0.076
             0.000
weekearn     0.174    -0.318
Example: Correlation
Example: Non-Linearity
Correlation and Covariance
So covariance and correlation are measures
of linear association, but not measures of
association in general (or of non-linear
association).
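A small made-up illustration of this point: in the sketch below Y is completely determined by X (Y = X squared), yet the correlation is zero because the relationship is symmetric rather than linear. The data and the helper `pearson` are hypothetical, not taken from the lecture.

```python
import math

# Hypothetical data: y depends perfectly on x, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

def pearson(xs, ys):
    """Correlation: sum of cross-deviations over the root of the squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

r = pearson(xs, ys)
print(f"Correlation between x and x squared: {r:.3f}")
```

The positive and negative cross-deviations cancel exactly, so a correlation near zero should not be read as "no relationship" without first looking at a plot.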
Correlation and Covariance
What if we do not have data on individuals
but data on distributions? For example, we
have plant-level data, but plants vary widely
in employment. We want to give greater
weight to plants with more employees.
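One common way to do this is to weight each plant's deviations by its employment. The sketch below is an assumption about the approach, not the formula from the original slides (which did not survive in this transcript): it uses weighted means and divides by the sum of the weights. The plant figures and function names are hypothetical.

```python
import math

def weighted_covariance(xs, ys, ws):
    """Weighted co-movement around weighted means, divided by total weight."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cross = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    return cross / total

def weighted_correlation(xs, ys, ws):
    """Weighted covariance scaled by the weighted standard deviations."""
    var_x = weighted_covariance(xs, xs, ws)
    var_y = weighted_covariance(ys, ys, ws)
    return weighted_covariance(xs, ys, ws) / math.sqrt(var_x * var_y)

# Hypothetical plant-level data: absence rate (%), output per worker, employment.
absence_rate = [2.0, 3.5, 5.0, 4.0]
output_per_worker = [40.0, 36.0, 30.0, 33.0]
employment = [1200, 300, 80, 450]

print(weighted_covariance(absence_rate, output_per_worker, employment))
print(weighted_correlation(absence_rate, output_per_worker, employment))
```

With this definition, a plant with weight 3 counts exactly as three identical plants with weight 1, so large plants dominate the result in proportion to their employment.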
Correlation and Covariance