Standardization and regression

Maarten Buis
7-12-2005
1
Outline
• Recap
• Table A
• D-Standardization
• Regression: Thinking about effects
2
Recap Standardization
• Standardization makes hard-to-interpret
variables more interpretable
• It allows you to add variables together, e.g.:
socioeconomic class is income + education
• P-standardization: % of observations that
score lower
• Z-standardization: deviation from the mean in
standard deviations
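
The two recap definitions can be sketched in a few lines of Python. This is only an illustration: the incomes are made up, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the slides do not say which version is meant.

```python
from statistics import mean, pstdev

def p_standardize(values):
    """Percentage of observations that score lower than each value."""
    n = len(values)
    return [100 * sum(other < v for other in values) / n for v in values]

def z_standardize(values):
    """Deviation from the mean, expressed in standard deviations."""
    m = mean(values)
    s = pstdev(values)  # population SD: an assumption, not stated on the slide
    return [(v - m) / s for v in values]

incomes = [1500, 2000, 2500, 3000, 3500]  # made-up incomes, not from the dataset
p_scores = p_standardize(incomes)  # share of cases scoring lower, in percent
z_scores = z_standardize(incomes)  # 0 for the person with exactly the mean income
```

After either transformation the unit no longer depends on the original measurement scale, which is what makes it possible to add variables such as income and education.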
3
Appendix 2, Table A
4
D-standardization
• A third way to make the unit comparable between
variables
• One assigns the number 0 or 1 to each case. The range is
thus fixed, and the mean is the proportion (P) of ones. The
variance is P*(1-P), so the standard deviation is the
square root of P*(1-P)
• Can be done to every variable
• Easy to communicate results (high versus low
incomes)
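
A minimal sketch of D-standardization, assuming made-up incomes and a hypothetical cutoff of 2500 for "high" income (neither comes from the slides). It also checks that the mean of a 0/1 variable is the proportion of ones, P, and that its standard deviation equals the square root of P*(1-P).

```python
from math import sqrt
from statistics import mean, pstdev

def d_standardize(values, cutoff):
    """Assign 1 to cases above the cutoff and 0 to all other cases."""
    return [1 if v > cutoff else 0 for v in values]

incomes = [1500, 2000, 2500, 3000, 3500]           # made-up incomes
high_income = d_standardize(incomes, cutoff=2500)  # hypothetical cutoff

p = mean(high_income)          # proportion of ones (P)
sd = pstdev(high_income)       # population standard deviation of the 0/1 variable
sd_formula = sqrt(p * (1 - p)) # the same quantity computed from P alone
```

Where to put the cutoff is a substantive choice; the sketch only shows the mechanics.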
5
D-standardization
• However: you lose a lot of information
• Often chosen when constructing a scale
based on a lot of variables
• Loss of information is less severe in this
case
• The new variable ranges from 0 to 1 and can
be interpreted as the proportion of ‘correct’
answers
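
As an illustration of such a scale, the sketch below averages a handful of dichotomized items per respondent. The answers and the function name `scale_score` are invented for the example.

```python
# Rows are respondents, columns are dichotomized items (1 = 'correct', 0 = not).
answers = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]

def scale_score(item_scores):
    """Proportion of 'correct' answers for one respondent (between 0 and 1)."""
    return sum(item_scores) / len(item_scores)

scores = [scale_score(row) for row in answers]
```

With many items the scale can take many distinct values between 0 and 1, which is one way to see why the loss of information is less severe here than for a single dichotomized variable.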
6
Effect of gender on income
• good old loneliness dataset
• mean income males: f 2.819,86
• mean income females: f 2.223,27
• effect of gender on income:
• f 2.819,86 - f 2.223,27 = f 596,59
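
The same arithmetic in Python, using the two means from the slide (written with a decimal point instead of the Dutch decimal comma). The variable names are chosen for the example only.

```python
# Group means taken from the slide, in guilders.
mean_income_males = 2819.86
mean_income_females = 2223.27

# The effect of gender is the difference between the two group means.
gender_effect = round(mean_income_males - mean_income_females, 2)
```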
7
Effect of age on income
age   mean income      age   mean income
55    fl 3.373,68      73    fl 2.466,79
56    fl 3.282,48      74    fl 2.422,93
57    fl 3.311,79      75    fl 2.151,48
58    fl 2.853,17      76    fl 2.400,00
59    fl 2.973,18      77    fl 2.343,75
60    fl 2.782,87      78    fl 2.294,87
61    fl 2.630,97      79    fl 2.281,47
62    fl 2.928,21      80    fl 2.109,48
63    fl 2.832,16      81    fl 2.141,30
64    fl 2.739,62      82    fl 2.304,85
65    fl 2.550,00      83    fl 2.239,26
66    fl 2.509,28      84    fl 2.158,70
67    fl 2.487,13      85    fl 2.058,42
68    fl 2.704,26      86    fl 2.206,12
69    fl 2.359,08      87    fl 2.050,37
70    fl 2.474,34      88    fl 2.076,39
71    fl 2.230,67      89    fl 2.380,00
72    fl 2.299,85
8
Effect of age on income
comparison   difference in mean income
55 - 56      -fl  91,21
56 - 57       fl  29,32
57 - 58      -fl 458,62
58 - 59       fl 120,01
59 - 60      -fl 190,31
60 - 61      -fl 151,90
...          ...
82 - 83      -fl  65,59
83 - 84      -fl  80,56
84 - 85      -fl 100,27
85 - 86       fl 147,69
86 - 87      -fl 155,74
87 - 88       fl  26,02
88 - 89       fl 303,61
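
Each difference is the mean income at one age minus the mean income at the age before it. A small sketch for the first few ages, using the rounded means from the previous slide; the function name is hypothetical. Results can differ by a cent from the slide (for example 91,20 rather than 91,21), presumably because the slide was computed from unrounded means.

```python
# Mean income by age for the first few ages, in guilders (rounded values from the slide).
mean_income = {
    55: 3373.68, 56: 3282.48, 57: 3311.79, 58: 2853.17,
    59: 2973.18, 60: 2782.87, 61: 2630.97,
}

def adjacent_effects(means):
    """Difference in mean income between each age and the next age in the table."""
    ages = sorted(means)
    return {(a, b): round(means[b] - means[a], 2) for a, b in zip(ages, ages[1:])}

effects = adjacent_effects(mean_income)  # keys are (younger age, older age) pairs
```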
9
Too much of a good thing
• Too many effects to be meaningful.
• Problem: How to summarize a large number
of effects?
• Effects are just numbers.
• So the problem becomes: How to summarize
a large number of numbers?
• We have seen this problem before!!!
• Solution: Compute the average effect.
• This is the effect in regression: On average
one loses f 32,70 if one gets a year older.
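
A rough sketch of "averaging the effects", using the age-level means from the earlier slide. Two caveats, both assumptions of this sketch rather than statements from the slides: the plain average of adjacent differences collapses to (last mean - first mean) / (number of steps), about f 29,23 lost per year with these rounded means, and the slide's f 32,70 presumably comes from a regression on the individual respondents, whose data are not in this transcript. The least-squares slope below is fitted to the 35 means only, so it does not have to reproduce f 32,70 either.

```python
ages = list(range(55, 90))  # ages 55 through 89
means = [
    3373.68, 3282.48, 3311.79, 2853.17, 2973.18, 2782.87, 2630.97, 2928.21,
    2832.16, 2739.62, 2550.00, 2509.28, 2487.13, 2704.26, 2359.08, 2474.34,
    2230.67, 2299.85, 2466.79, 2422.93, 2151.48, 2400.00, 2343.75, 2294.87,
    2281.47, 2109.48, 2141.30, 2304.85, 2239.26, 2158.70, 2058.42, 2206.12,
    2050.37, 2076.39, 2380.00,
]

# Plain average of the 34 year-to-year differences.
diffs = [later - earlier for earlier, later in zip(means, means[1:])]
mean_diff = sum(diffs) / len(diffs)

def ols_slope(x, y):
    """Least-squares slope of y on x: sum of cross-products over sum of squares."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

slope_on_means = ols_slope(ages, means)  # unweighted: every age counts equally
```

The regression slope can be read as a weighted average of effects, which is why it summarizes all the pairwise comparisons in a single number.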
10
effect and straight line
• An average person has average age and income;
i.e. is 71 years old and earns f 2.525
• Someone a year older will earn f 32,70 less
• Someone two years older will earn
2 * f 32,70 = f 65,40 less
• Someone three years older will earn
3 * f 32,70 = f 98,10 less
• etc.
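
The bullets above describe a straight line through the average person. A minimal sketch, taking the mean age (71), the mean income (f 2.525) and the effect (f 32,70 per year) from the slides; the constant and function names are made up for the example.

```python
MEAN_AGE = 71             # age of the average person
MEAN_INCOME = 2525.0      # income of the average person, in guilders
EFFECT_PER_YEAR = -32.70  # change in income for each extra year of age

def predicted_income(age):
    """Income on the straight line through the average person."""
    return MEAN_INCOME + EFFECT_PER_YEAR * (age - MEAN_AGE)

one_year_older = predicted_income(72)     # f 32,70 below the mean income
three_years_older = predicted_income(74)  # 3 * f 32,70 below the mean income
```

The same function also works for people younger than 71, who are predicted to earn more than the mean.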
11
[Figure: income (vertical axis, 2000 to 3000) plotted against age (horizontal axis, 58 to 78)]
12
effect and straight line
• The regression line is a straight line
• the equation of a straight line is:
• y = ...
• x = ...
• the slope is ...
13