Overview of Empirical Analysis


Tutorial on Local Polynomial Regression (LPR):
An Alternative to Ordinary Least Squares
by John M. Clapp
March 10, 2000
I. Motivation: What LPR does.
II. How LPR works with OLS: An Overview.
III. Application to Housing Transactions in
Contra Costa County, California.
IV. Technical aspects of LPR, with Equations.
V. Full color outline, Figures and Equations are
available at:
http://www.sba.uconn.edu/users/johnc/index.htm
I. Motivation: What LPR does.



Non-normal data: E.g., Estimating
an empirical probability density
function or CDF. LPR makes
efficient use of scarce data in the
tails of the distribution.
Estimate non-linear functions such
as price indices, and logistic
functions.
See Next 2 slides.
Figure 1: Repeat and Hedonic PE's
a. Probability Distributions
[Chart: probability distributions of the Repeat and Hedonic prediction errors. Vertical axis: 0.00000 to 0.07000. Horizontal axis: Prediction Errors (mult. by 100 to get approx. % error), from -1.060 to 0.819.]
IIa. ORDINARY LEAST SQUARES (OLS)
A. The standard approach on Wall
Street.
B. Regress log of sales price on log of
square footage and quarterly time
dummies.
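$$\ln SP_i = \beta \ln SF_i + Q_{1i}\gamma_1 + Q_{2i}\gamma_2 + \dots + Q_{Ti}\gamma_T + v_i$$
where SP is the sales price, SF is square footage, and Q_t is the dummy for time period t.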
C. In effect, just shift the constant term
for each quarter over the time frame
where you want to track house value.
D. Location is controlled by using sales
within a zip code or group of
neighboring zip codes.
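
A minimal sketch of the regression described in B and C, assuming a specification with no constant and one dummy per time period. The function name `fit_time_dummy_ols` and the toy data are hypothetical. The sketch uses deviations from period means, which gives the same estimates as running OLS on log square footage plus a full set of time dummies.

```python
from collections import defaultdict


def fit_time_dummy_ols(ln_sf, ln_sp, period):
    """OLS of ln(price) on ln(square feet) plus one dummy per period (no constant).

    Returns (beta, gammas), where gammas maps each period to its dummy coefficient.
    """
    groups = defaultdict(list)
    for x, y, t in zip(ln_sf, ln_sp, period):
        groups[t].append((x, y))

    # Period means of x and y.
    means = {t: (sum(x for x, _ in obs) / len(obs),
                 sum(y for _, y in obs) / len(obs))
             for t, obs in groups.items()}

    # Slope estimated from deviations around the period means.
    sxy = sxx = 0.0
    for t, obs in groups.items():
        mx, my = means[t]
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    beta = sxy / sxx

    # Each dummy coefficient is that period's shifted constant term.
    gammas = {t: my - beta * mx for t, (mx, my) in means.items()}
    return beta, gammas


# Toy data generated exactly from beta = 0.8 and period levels 6.00 and 6.02.
ln_sf = [7.0, 7.2, 7.4, 7.1, 7.3, 7.5]
period = ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"]
level = {"Q1": 6.00, "Q2": 6.02}
ln_sp = [0.8 * x + level[t] for x, t in zip(ln_sf, period)]

beta, gammas = fit_time_dummy_ols(ln_sf, ln_sp, period)
```

The difference between two entries of `gammas` is the estimated log price change between those periods for a house of constant size.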
IIb. Overview of the Local
Polynomial Regression Model



LPR is weighted OLS; LPR uses a polynomial eq.
A weight is applied to each observation in the
sample. The “kernel” weighting function is a
probability density function.
Let X = Latitude, Longitude and linear time; x is a
particular point in space and time. The LPR Eq’s:
$$Y_i = \beta_0 + \beta_1 (X_i - x) + \beta_2 (X_i - x)^2 + \dots + \beta_p (X_i - x)^p$$
$$\min_{\hat{\beta}} \sum_{i=1}^{n} \{ Y_i - \beta_0 - \dots - \beta_p (X_i - x)^p \}^2 K_h(X_i - x)$$
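
The weighted least squares problem above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's implementation: X is one-dimensional (the slides use latitude, longitude and time), the kernel is Gaussian with bandwidth h as its standard deviation, and the names `gaussian_kernel`, `solve` and `local_poly_fit` are hypothetical.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian density with standard deviation h (the bandwidth)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def local_poly_fit(xs, ys, x0, h, p=1):
    """Weighted least squares of Y on powers of (X_i - x0), weights K_h(X_i - x0).

    Returns [beta_0, ..., beta_p]; beta_0 is the fitted value at x0.
    """
    w = [gaussian_kernel(xi - x0, h) for xi in xs]
    # Normal equations (Z'WZ) beta = Z'Wy, with Z_ij = (X_i - x0)^j.
    a = [[sum(wi * (xi - x0) ** (j + k) for wi, xi in zip(w, xs))
          for k in range(p + 1)] for j in range(p + 1)]
    b = [sum(wi * (xi - x0) ** j * yi for wi, xi, yi in zip(w, xs, ys))
         for j in range(p + 1)]
    return solve(a, b)


xs = [i / 10 for i in range(21)]                  # 0.0, 0.1, ..., 2.0
ys = [1.0 + 2.0 * x + 0.5 * x ** 2 for x in xs]   # exact quadratic, no noise
beta = local_poly_fit(xs, ys, x0=1.0, h=0.3, p=2)
```

Because the toy data are an exact quadratic and p = 2, the fit recovers the expansion of the curve around x0 = 1: a level of 3.5, a slope of 3.0 and a curvature term of 0.5. Repeating the fit at each x of interest traces out the whole estimated function.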
III. The Data from Contra
Costa County





San Francisco, Eastern Suburbs
Rapidly growing area that contains a lot of open
space.
All single family housing transactions, 1994
through 1997, 48 months of data.
Data include sales price, date of sale, zip code,
and housing characteristics (square footage,
etc.)
I added latitude and longitude of the zip code.
[Chart: Number of House Sales by Zip Code. Horizontal axis: Zip Codes. Vertical axis: Total transactions, 0 to 4000.]
The ten zips chosen for analysis
zip           # trans
94805         56
94528         57
94525         103
94572         337
94595         436
94547         670
94507         683
94513         687
94517         697
94509         3555
Total Trans   7281
OLS Model fitted to ZIP # 94509

Valid cases:      3461       Dependent variable:   Y
Missing cases:    0          Deletion method:      None
Total SS:         263.378    Degrees of freedom:   3412
R-squared:        0.792      Rbar-squared:         0.789
Residual SS:      54.888     Std error of est:     0.127
F(49,3412):       264.498    Probability of F:     0.000

Variable   Label   Coefficient   t-value    Time    Index
X01        lnSF    0.797109      111.8447
X02        94.01   6.040032      109.2812   94.01   100
X03        94.02   6.052379      110.9381   94.02   101.2347
X04        94.03   6.051791      110.4077   94.03   101.1759
X05        94.04   6.068057      112.076    94.04   102.8025
X06        94.05   6.062479      112.0915   94.05   102.2447
X07        94.06   6.046367      112.3771   94.06   100.6335
X08        94.07   6.055872      112.2011   94.07   101.584
X09        94.08   6.050507      112.2107   94.08   101.0475
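
The Index values in the output above appear to equal 100 plus 100 times the difference between each month's coefficient and the 94.01 coefficient. That rule is inferred from the numbers rather than stated on the slide. The short sketch below reproduces it from the listed coefficients and also computes the exponentiated version for comparison; the variable names are hypothetical.

```python
import math

# Time-dummy coefficients copied from the regression output above.
coefs = {
    "94.01": 6.040032, "94.02": 6.052379, "94.03": 6.051791,
    "94.04": 6.068057, "94.05": 6.062479, "94.06": 6.046367,
    "94.07": 6.055872, "94.08": 6.050507,
}
base = coefs["94.01"]

# Log-difference approximation: matches the Index values in the output.
index = {t: 100.0 * (1.0 + g - base) for t, g in coefs.items()}

# Exact conversion from a log difference to a ratio, for comparison.
index_exp = {t: 100.0 * math.exp(g - base) for t, g in coefs.items()}
```

For changes of a few percent the two versions differ only slightly, since exp(d) is close to 1 + d when d is small.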
[Chart: Price Index for Single Zip (Zip 94509, 3,461 transactions). Vertical axis: Cumulative Index (94.01 = 100), 80 to 110. Horizontal axis: YY.MM, 94.01 to 97.11.]
PROBLEMS WITH THIS
PARAMETRIC APPROACH
A. Few transactions in many zip codes. What if your
loan is in one of these sparse areas?
B. We would like to estimate monthly rather than
quarterly price indices.
C. A linear relationship is assumed between sales
price and square footage or other characteristics.
THE CONCEPT BEHIND KERNEL
SMOOTHING and LOCAL REGRESSIONS
A. Suppose we took a moving average
of the monthly OLS price indices.
B. Problem # 1: We lose one or more
time periods at the end.
C. Problem # 2: What are the optimal
weights for the moving average?
D. Illustration: Control for square
footage: Smooth the partial residuals.
E. Average the residuals within “bins”
where each bin is a month, quarter, or
other time period.
F. Notice the bias-variance trade-off.
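
Points E and F can be illustrated with a small sketch that averages residuals within equal-width time bins. The data here are simulated and hypothetical (a gentle trend plus noise standing in for partial residuals), and `bin_means` is an illustrative name, not something from the slides.

```python
import random


def bin_means(times, resid, n_bins, t_min, t_max):
    """Average residuals within equal-width time bins; empty bins give None."""
    width = (t_max - t_min) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, r in zip(times, resid):
        # Observations exactly at t_max go in the last bin.
        k = min(int((t - t_min) / width), n_bins - 1)
        sums[k] += r
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


random.seed(0)
times = [random.uniform(0, 48) for _ in range(600)]   # months since start
# Hypothetical partial residuals: smooth trend plus noise.
resid = [0.001 * t + random.gauss(0, 0.1) for t in times]

many = bin_means(times, resid, 24, 0, 48)   # narrow bins: few sales each, noisy
few = bin_means(times, resid, 4, 0, 48)     # wide bins: smoother, but blur the trend

# Small deterministic check of the binning itself.
check = bin_means([0.5, 1.5, 2.5, 3.5], [1, 3, 5, 7], 2, 0, 4)
```

With many bins each average rests on few sales, so the index jumps around (high variance). With few bins each average is stable but flattens any movement inside the bin (bias).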
[Chart: Constant Quality House Price Histogram: Many bins, Zip94547. Vertical axis: Price Index, 98.0 to 101.0. Horizontal axis: Time (Rank order = # bins), 1 to 23.]
[Chart: Constant Quality House Price Histogram: Zip94547. Vertical axis: Price Index, 98.6 to 100.6. Horizontal axis: Time (rank order = # bins), 1 to 13.]
A NON-PARAMETRIC APPROACH: KERNEL
SMOOTHING AND LPR
A. Instead of “binning” we choose a point in
time and average transactions near that time.
B. Analogy: Estimate an OLS at each location
at each point in time.
1. Down-weight sales that are more distant
(control for location).
2. Down-weight sales that are more distant
in time (control for date of sale).
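
The down-weighting in points 1 and 2 can be sketched as a product of two kernels, one in space and one in time. This is a simplified illustration under assumptions not in the slides: both kernels are Gaussian, distance is measured in raw degrees of latitude and longitude, and the function names, bandwidths and sample sales are hypothetical.

```python
import math


def gauss(u):
    """Unnormalized Gaussian kernel; the constant cancels in a weighted mean."""
    return math.exp(-0.5 * u * u)


def kernel_average(sales, lat0, lon0, t0, h_space, h_time):
    """Weighted mean of values around the point (lat0, lon0, t0).

    sales: iterable of (lat, lon, month, value) tuples.
    """
    num = den = 0.0
    for lat, lon, t, v in sales:
        dist = math.hypot(lat - lat0, lon - lon0)   # distance in degrees
        # Weight shrinks with distance in space and with distance in time.
        w = gauss(dist / h_space) * gauss((t - t0) / h_time)
        num += w * v
        den += w
    return num / den


sales = [
    (37.9, -122.0, 10, 1.0),   # at the target location and month
    (38.5, -121.0, 40, 3.0),   # far away in both space and time
]
local = kernel_average(sales, 37.9, -122.0, 10, h_space=0.1, h_time=3.0)
pooled = kernel_average(sales, 37.9, -122.0, 10, h_space=100.0, h_time=1000.0)
```

With small bandwidths the estimate is driven almost entirely by the nearby sale. With very large bandwidths every sale gets nearly equal weight and the estimate approaches the plain average.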
[Chart: Price Indices by Zip Code. Series: Zip94528-Lg. BW, Zip94547-Lg. BW, Zip94528-Sm. BW, Zip94547-Sm. BW. Vertical axis: House Price Index, 99 to 107. Horizontal axis: Month (1=Jan. 94), 1 to 47. Annotation: Indices with larger bandwidths are smoother.]
Problems with LPR:
• How to choose the bandwidths:
This is analogous to the bin widths
and to the standard deviation in
parametric statistics.
• Curse of dimensionality: Difficult
to smooth on more than 3 or 4
explanatory variables at the same
time.
• But, you can combine OLS (e.g.,
to control for bathrooms,
fireplaces, view and the like) with
local regressions: Semiparametric
approach.
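
The slides do not say how the bandwidths were chosen. One common option is leave-one-out cross-validation, sketched here purely as an assumption: for each candidate bandwidth, predict every observation from all the others and keep the bandwidth with the smallest average squared error. The sketch uses a one-dimensional Gaussian kernel average on simulated data, and all names are hypothetical.

```python
import math
import random


def nw_estimate(xs, ys, x0, h, skip=None):
    """Gaussian-kernel weighted average of ys at x0, optionally leaving one point out."""
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x - x0) / h) ** 2)
        num += w * y
        den += w
    return num / den


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errs = [(ys[i] - nw_estimate(xs, ys, xs[i], h, skip=i)) ** 2
            for i in range(len(xs))]
    return sum(errs) / len(errs)


random.seed(1)
xs = [i * 0.5 for i in range(96)]                            # half-month steps
ys = [math.sin(x / 8.0) + random.gauss(0, 0.2) for x in xs]  # simulated series

candidates = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
scores = {h: loo_cv_score(xs, ys, h) for h in candidates}
best_h = min(scores, key=scores.get)
```

A very small bandwidth chases the noise and a very large one flattens the curve, so the cross-validation score is typically lowest somewhere in between. This is the same bias-variance trade-off as the choice of bin width.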