Covariation and correlation


Association
Correlation, covariance
Which of the following shows a scatterplot of a positive correlation?
[Scatterplot panels A, B, C, D, E]
Which of the following is an example in which the correlation coefficient does not represent the relationship?
A. A&B
B. B
C. C
D. D&E
E. E
What is the correlation of the
following scatterplot?
A. 0
B. .1
C. .9
D. -.9
E. -.1
What is the correlation of the
following scatterplot?
A. 0
B. .1
C. .9
D. -.9
E. -.1
What is the correlation of the
following scatterplot?
A. 0
B. .1
C. .9
D. -.9
E. -.1
What is the correlation of the
following scatterplot?
A. 0
B. .1
C. .9
D. -.9
E. -.1
Expected values, covariance, and correlation
Introduction to Bivariate Regression
Does talking about politics cause
people to be more interested in
politics? Or does an interest in
politics cause people to talk about
politics?
Is this a causal relationship?
Talking about politics → Interest in politics
Interest in politics → Talking about politics
How often do you talk about politics?
Times/Week   Freq.   Percent   Cum.
0            162     11.58     11.58
1            231     16.51     28.09
2            233     16.65     44.75
3            226     16.15     60.9
4            175     12.51     73.41
5            148     10.58     83.99
6            47      3.36      87.35
7            177     12.65     100
Total        1399    100
Data=ANES, Stata code: tab talkpol
How interested are you in politics?
                           Frequency   Percent   Cumulative
1. Not interested at all   87          6.3       6.3
2. Slightly interested     199         14.42     20.72
3. Moderately interested   521         37.75     58.48
4. Very interested         409         29.64     88.12
5. Extremely interested    164         11.88     100
Total                      1380        100
data=ANES, Stata code: tab intpol
Review standard deviation and variance
Variance: for each unit or observation, take its distance from the mean and square it; then add up the squared distances and divide by the number of units.
• Standard deviation: the square root of the variance
• Since variance is in squared units, it is hard to interpret directly. The standard deviation can be understood in terms of the original measurement unit.
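A minimal sketch of the two definitions above, dividing by the number of units as the slide does (many statistics packages divide by n - 1 instead). The `scores` data and the function names are made up for illustration.

```python
import math

def variance(values):
    """Mean of the squared distances from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def std_dev(values):
    """Square root of the variance, back in the original units."""
    return math.sqrt(variance(values))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean = 5

print(variance(scores))  # 4.0 (in squared units)
print(std_dev(scores))   # 2.0 (in the original units)
```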

Calculating variance and standard deviations
Obs.   Prestige   Deviation^2
1      82         1177.18
2      83         1246.8
3      90         1790.14
4      76         801.46
5      90         1790.14
6      87         1545.28
7      93         2053
8      90         1790.14
9      52         18.58
10     88         1624.9
11     57         86.68
12     89         1706.52
13     97         2431.48
14     59         127.92
15     73         640.6
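The squared deviations listed here are not measured from the mean of these 15 rows (about 80.4). They appear consistent with a mean of roughly 47.69, which is an inference from the numbers and not something the slide states; presumably it is the mean of a larger sample. The sketch below checks that assumption against the listed values.

```python
prestige = [82, 83, 90, 76, 90, 87, 93, 90, 52, 88, 57, 89, 97, 59, 73]
listed_sq_dev = [1177.18, 1246.8, 1790.14, 801.46, 1790.14, 1545.28, 2053,
                 1790.14, 18.58, 1624.9, 86.68, 1706.52, 2431.48, 127.92, 640.6]

# ASSUMPTION: mean inferred from the listed deviations, not given in the slides.
assumed_mean = 47.69

# Squared distance of each observation from the assumed mean.
computed_sq_dev = [(p - assumed_mean) ** 2 for p in prestige]

for obs, (p, listed, computed) in enumerate(
        zip(prestige, listed_sq_dev, computed_sq_dev), start=1):
    print(obs, p, listed, round(computed, 2))

# Mean of just these 15 rows, for comparison.
rows_mean = sum(prestige) / len(prestige)
print(rows_mean)  # 80.4
```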
Review: Units, mean, variance and standard deviation
Variable            Obs.   Mean   Variance   Std. Dev.
Talking politics    1399   3.1    4.7        2.18
Interest politics   1380   3.26   1.1        1.05
Expected value v. probability
If our population set of numbers is 1, 1, 3, 3, 17, then the expected value is 5, even though P(5) = 0.
Suppose we know that E(X) = 5 and that y = 5 + 7x.
What is E(Y)?
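One way to work through both points is sketched below. It reuses the population 1, 1, 3, 3, 17 for the second question as well, which is an assumption made only for illustration; the result relies on E(a + bX) = a + b * E(X), which holds for any X.

```python
population = [1, 1, 3, 3, 17]

def expected_value(values):
    """Expected value of a value drawn uniformly from a finite population."""
    return sum(values) / len(values)

e_x = expected_value(population)
p_5 = population.count(5) / len(population)  # share of units equal to 5

# Apply y = 5 + 7x to every unit, then take the expected value of y.
y_values = [5 + 7 * x for x in population]
e_y = expected_value(y_values)

print(e_x)          # 5.0
print(p_5)          # 0.0
print(e_y)          # 40.0
print(5 + 7 * e_x)  # 40.0, same answer via E(5 + 7X) = 5 + 7 * E(X)
```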
Expected values
Interest in politics
Missing   Obs    Mean   Std. Dev.   Var
26        1380   3.26   1.048       1.099
What is the expected value?
What is the range?
Mode?
Why are there 26 missing?
Talk politics
Missing   Obs    Mean    Std. Dev.   Variance
7         1399   3.099   2.177       4.737
What is the expected value?
Why are the standard deviation and variance so high?
Causation
• Time ordering
• Covariation
Co-variation from variation?
• Σ (xi - xmean)^2 / n
the average squared distance between each x value and the mean of x
• aka Σ (xi - xmean) * (xi - xmean) / n
Covariation?
Σ (xi - xmean) * (yi - ymean) / (n - 1)
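As a rough sketch, the covariance formula can be computed step by step: take each observation's deviation from the mean of x and from the mean of y, multiply the pair, add the products up, and divide by n - 1. The data and the function name are made up for illustration.

```python
def covariance(xs, ys):
    """Sum of (xi - xmean) * (yi - ymean), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of observations")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    products = [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

x = [2, 4, 6, 8]  # made-up data, mean = 5
y = [1, 3, 2, 6]  # made-up data, mean = 3

# Products of deviations: 6, 0, -1, 9, which sum to 14; 14 / 3 is about 4.67.
print(covariance(x, y))
```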
Covariation
• covariance can take any value
• negative infinity to positive infinity
Intuitive explanation
Σ (xi - xmean) * (yi - ymean) / (n - 1)
• When x and y are high at the same time and x and y are low at the same time, then the covariance is positive
• They are both higher than their means, or both lower, so the products being added together are positive
Plot showing positive covariance
Intuitive explanation
Σ (xi - xmean) * (yi - ymean) / (n - 1)
• When x is low when y is high and vice versa, then the covariance is negative
• One is higher than its mean while the other is lower, so the products being added together are negative
Plot showing negative covariance
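To make the sign argument concrete, the sketch below lists the individual products of deviations for one pattern where y falls as x rises and one where y rises with x. All data and names are made up for illustration.

```python
def deviation_products(xs, ys):
    """Return (xi - xmean) * (yi - ymean) for every observation."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    return [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]

def covariance(xs, ys):
    """Sum of the deviation products divided by n - 1."""
    return sum(deviation_products(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]
y_falling = [10, 8, 6, 4, 2]  # y is high when x is low
y_rising = [2, 4, 6, 8, 10]   # y is high when x is high

# One deviation is above its mean and the other below: no positive products.
print(deviation_products(x, y_falling))  # [-8.0, -2.0, 0.0, -2.0, -8.0]
print(covariance(x, y_falling))          # -5.0

# Both deviations share a sign: no negative products.
print(deviation_products(x, y_rising))   # [8.0, 2.0, 0.0, 2.0, 8.0]
print(covariance(x, y_rising))           # 5.0
```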
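Infant Mortality and GDP
[Scatterplot with fitted line: Infant Mortality (y-axis) vs. GDP Per Capita (x-axis)]
Stata Code:
twoway (lfitci wdi_mort gle_gdp) (scatter wdi_mort gle_gdp)
R code:
library(foreign)  # provides read.dta() for Stata files
library(car)      # provides scatterplot()
# Choose the file `class_qog.dta'
myFile <- file.choose()
dat <- read.dta(myFile)
# Make scatterplot with a fitted regression line and a smoother
# (argument names follow the original slide; newer versions of car may name them differently)
scatterplot(wdi_mort ~ gle_gdp, reg.line=lm,
  smooth=TRUE, spread=TRUE,
  boxplots='xy', span=0.5, data=dat)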
Intuitive explanation
Σ (xi - xmean) * (yi - ymean) / (n - 1)
• When about half of the time x and y are high at the same time and low at the same time,
• and the other half of the time x is low when y is high and vice versa,
• then the covariance is about 0
• High positive products are added to high negative products, so they roughly cancel
Plot showing no covariance
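A small made-up example of the cancelling case: half of the observations move together and half move in opposite directions, so the positive and negative products offset each other. The data and function name are assumptions for illustration only.

```python
def deviation_products(xs, ys):
    """Return (xi - xmean) * (yi - ymean) for every observation."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    return [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]

# Observations 1 and 4: x and y are low together or high together.
# Observations 2 and 3: one is low while the other is high.
x = [1, 1, 5, 5]
y = [1, 5, 1, 5]

products = deviation_products(x, y)
cov = sum(products) / (len(x) - 1)

print(products)  # [4.0, -4.0, -4.0, 4.0]
print(cov)       # 0.0
```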
An Insignificant Relationship
[Scatterplot: Life Expectancy (y-axis) vs. Oil Rents (x-axis)]
Covariance is a function of…
• Variance (standard deviation) of x
• Variance (standard deviation) of y
• Relationship between x and y
How can you compare a covariance of 132 and 134,847?
• 134,847 could reflect a high variance of x, a high variance of y, high variance of both variables, or a strong relationship between x and y
• Not that helpful?
How can you change the covariance to a number that tells you only the magnitude of the relationship between x and y?
• Divide by the standard deviation of x * the standard deviation of y
• Correlation = [1/(n - 1)] * Σ (xi - xmean) * (yi - ymean) / (sd(x) * sd(y))
• Pearson r ranges from -1 to +1
Weak correlation = .1
moderate correlation = .4
strong correlation = .7
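The sketch below illustrates why dividing by the two standard deviations helps: rescaling x (for example, changing its measurement unit) inflates the covariance but leaves the correlation unchanged. The data and function names are made up for illustration, and the standard deviation here divides by n - 1 to match the covariance formula.

```python
import math

def covariance(xs, ys):
    """Sum of (xi - xmean) * (yi - ymean), divided by n - 1."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    total = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

def std_dev(values):
    """Sample standard deviation: square root of the covariance of a variable with itself."""
    return math.sqrt(covariance(values, values))

def correlation(xs, ys):
    """Pearson r: covariance divided by sd(x) * sd(y)."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [1000 * v for v in x]  # same variable in a smaller unit

print(covariance(x, y))           # 1.5
print(covariance(x_rescaled, y))  # 1500.0, driven by the unit of x alone

# The correlation is the same for both versions of x (about 0.77).
print(round(correlation(x, y), 4))
print(round(correlation(x_rescaled, y), 4))
```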


