Bootstrapping – the neglected approach to uncertainty
Paul Kershaw
University of South Australia
European Real Estate Society Conference
Eindhoven, Netherlands, 15-18 June 2011
Overview
• The history of confidence intervals
• Pedagogical predilection to a parametric view
• Real estate research is NOT normal
• Do not provide a measure of probability
• Enter the Jackknife
• Monte Carlo simulation & Bootstrapping
• The basic algorithms
• Real World Applications
• A better mousetrap
Introduction
• The origins of hypothesis testing date back to 1279
• Confidence intervals were derived in 1937
• A confidence interval estimates the uncertainty about the true value of some population parameter
• There was a 50-year lag before medical journals, for example, advocated their use
• The lazy approach is to assume a normal distribution
Not Normal
• Very little about real estate can be considered to follow a Normal distribution, including: prices, land area, building area, age, number of bedrooms, location, physical condition, construction, tenant’s covenant, heating, etc.
• Linear regression techniques are regularly applied, and averages, standard errors and parametric confidence intervals proffered. Why?
• Is it because we are taught to do it that way, or because we teach it that way: gloss over the ignored assumptions and “just give me a number from the printout”?
Not a measure of Probability
• This begs the question “what is the confidence interval of a correlation coefficient?” and leads to the second question “why is it so rarely reported?”
• What is a realistic confidence interval for a computer-generated valuation using a linear regression model?
• Most proprietary automated valuation models (AVMs) provide their own assessment of accuracy, which is often ill-defined and somewhat nebulous.
Enter the Jackknife
• Early efforts in the 1950s revolved around the
Jackknife (Quenouille, M).
• The Jackknife provides a technique for
estimating the bias and standard error of an
estimate irrespective of the shape of the
underlying distribution.
• The following example is based upon the work of Efron, B; 1993. The data points are LSAT, the average score for the class on a national law test, and GPA, the average undergraduate grade-point average for the class.
Sample Data
LSAT   GPA
576    3.39
635    3.30
558    2.81
578    3.03
666    3.44
580    3.07
555    3.00
661    3.43
651    3.36
605    3.13
653    3.12
575    2.74
545    2.76
572    2.88
594    2.96
[Figure: scatter plot of GPA (vertical axis, 2.7 to 3.5) against LSAT (horizontal axis, 540 to 680)]
Correlation = 0.7764
Jackknife 95% Confidence Interval = 0.73 to 0.89
The basic algorithms
Compute sample statistics on n separate samples
of size n-1. Each sample is the original data with
a single observation omitted.
Jackknife Heuristic:
• Remove one data point only and calculate the
statistic of interest to give estimate 1
• Restore it, then repeat for each remaining data point in turn to give estimates 2, 3, 4 … n
• Calculate the percentiles of interest to obtain the
confidence interval
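The heuristic above can be sketched in Python using the LSAT and GPA pairs from the Sample Data slide. This is only an illustrative sketch: the function names `pearson` and `jackknife` are assumptions made here, and the standard error and bias lines use the textbook jackknife formulas, which the slides mention but do not spell out.

```python
import math

# LSAT and GPA pairs from the Sample Data slide
LSAT = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
GPA = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def jackknife(x, y, stat):
    """Return the n leave-one-out estimates of stat(x, y)."""
    return [stat(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]

full = pearson(LSAT, GPA)            # statistic on the complete sample
loo = jackknife(LSAT, GPA, pearson)  # one estimate per omitted observation
n = len(loo)
loo_mean = sum(loo) / n

# Textbook jackknife standard error and bias (an addition, not from the slides)
se_jack = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
bias_jack = (n - 1) * (loo_mean - full)

print(f"full-sample correlation = {full:.4f}")
print(f"leave-one-out estimates range from {min(loo):.2f} to {max(loo):.2f}")
print(f"jackknife SE = {se_jack:.3f}, bias = {bias_jack:.4f}")
```

With only n = 15 leave-one-out estimates, extreme percentiles sit very close to the smallest and largest estimates, which is why the sketch prints the range.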
Jackknife Calculations
Monte Carlo & Bootstrapping
• Monte Carlo simulation caught the imagination of
practitioners and researchers following Hertz, David;
1964, Harvard Business Review
• Monte Carlo simulation uses repeated sampling to
determine the properties of some result of interest
• The re-sampling is carried out with replacement
• If we apply this technique to the previous Jackknife data we would be Bootstrapping (a name that alludes to the Adventures of Baron Munchausen)
• Bootstrapping is repeatedly re-sampling with
replacement, calculating the statistic of interest and
recording its distribution.
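As a rough sketch of the last bullet, the Python below bootstraps the correlation coefficient of the LSAT and GPA pairs from the Sample Data slide. The function names, the fixed seed, the 1000 iterations and the simple index-based percentile rule are all assumptions chosen for illustration, not the author's implementation.

```python
import math
import random

# LSAT and GPA pairs from the Sample Data slide
LSAT = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
GPA = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap(x, y, stat, iterations=1000, seed=0):
    """Resample (x, y) pairs with replacement and return the sorted statistics."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(x)
    results = []
    while len(results) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip the (very unlikely) resample where one variable has no variation
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        results.append(stat(xs, ys))
    return sorted(results)

boot = bootstrap(LSAT, GPA, pearson)
# Simple percentile interval: the 2.5% and 97.5% points of the sorted results
lower = boot[int(0.025 * len(boot))]
upper = boot[int(0.975 * len(boot)) - 1]
print(f"95% percentile interval for the correlation: {lower:.2f} to {upper:.2f}")
```

Pairs are resampled together so that each LSAT score stays attached to its own GPA.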
Bootstrap Algorithm
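' Bootstrap: calculate the dispersion of the mean
' DataArray(1 To n) holds the n data points
' RandomBetween(1, n) is assumed to return a uniform random integer from 1 to n inclusive
Dim MeanResults(1 To 1000)
For i = 1 To 1000
    Sum = 0
    For j = 1 To n
        ' Draw one observation at random, with replacement
        Sum = Sum + DataArray(RandomBetween(1, n))
    Next j
    MeanResults(i) = Sum / n
Next i
' The spread of MeanResults approximates the dispersion of the mean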
Real World Application 1
What annoys me most – residential price change reporting and hot
spotting
Below are sale prices for Q4 2010 and Q1 2011 for Detached houses
in Aberfoyle Park, South Australia
[Figure: two charts of individual sale prices, one per quarter; vertical axis $0 to $700,000, horizontal axis sale number]

           Q4 2010    Q1 2011    Change
Median     382,500    385,000    0.65%
Average    409,932    391,946    -4.39%
Bootstrap Results 1000 iterations

           Average     Std Deviation   Confidence Interval
                                       2.50%       97.50%
Q4 2010    $409,138    $12,037         $384,635    $431,877
Q1 2011    $392,358    $10,549         $372,403    $413,303

           Median      Std Deviation   Confidence Interval
                                       2.50%       97.50%
Q4 2010    $384,431    $15,276         $361,000    $420,150
Q1 2011    $387,346    $12,359         $350,000    $410,000

Change from Q4 2010 to Q1 2011 implied by the interval limits
           2.50%                   97.50%                  Change
           Q4 2010     Q1 2011     Q4 2010     Q1 2011     Lowest     Highest
Average    $384,635    $372,403    $431,877    $413,303    -13.77%    7.45%
Median     $361,000    $350,000    $420,150    $410,000    -16.70%    13.57%
The degree of uncertainty is clearly illustrated.
The median has a 95% “confidence interval” of ….
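The percentage changes on the results slide appear to come from comparing opposite ends of the two quarterly intervals: the lowest Q1 2011 limit against the highest Q4 2010 limit, and vice versa. The short Python check below reproduces the reported figures under that reading; the function name `change_range` is an assumption made for illustration.

```python
# 2.5% and 97.5% bootstrap limits taken from the results slide
intervals = {
    "Average": {"Q4 2010": (384_635, 431_877), "Q1 2011": (372_403, 413_303)},
    "Median": {"Q4 2010": (361_000, 420_150), "Q1 2011": (350_000, 410_000)},
}

def change_range(before, after):
    """Lowest and highest percentage change consistent with two intervals."""
    low_before, high_before = before
    low_after, high_after = after
    lowest = (low_after / high_before - 1) * 100   # biggest fall
    highest = (high_after / low_before - 1) * 100  # biggest rise
    return lowest, highest

ranges = {
    name: change_range(q["Q4 2010"], q["Q1 2011"])
    for name, q in intervals.items()
}
for name, (lowest, highest) in ranges.items():
    print(f"{name}: {lowest:.2f}% to {highest:.2f}%")
# prints:
# Average: -13.77% to 7.45%
# Median: -16.70% to 13.57%
```

On this reading, the reported 0.65% rise in the median and 4.39% fall in the average both sit well inside ranges that span zero.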
A better mousetrap
• The traditional approach is to select n from n with
replacement and calculate statistic of interest and repeat
m times
• This is inefficient for most statistics of interest including
the mean, median, standard deviation or correlation
coefficient
• For example the mean is sum/n
• If for each iteration we remove just one random element
and replace it with another random element we can
adjust the sum by subtracting the value of the removed
element and adding the value of the ingoing element
• If n is, say, 50 we save 48 mathematical operations per iteration (one subtraction and one addition instead of 50 additions)
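One possible reading of this scheme, sketched in Python for the mean: keep a current resample and its running sum, and at each iteration swap a single randomly chosen element for a fresh draw from the original data. The function name, the seed and the use of the LSAT scores from the Sample Data slide as input are assumptions for illustration only.

```python
import random

# LSAT scores from the Sample Data slide, used here only as example input
LSAT = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]

def incremental_bootstrap_means(data, iterations=1000, seed=0):
    """Bootstrap means where each iteration replaces one element of the resample."""
    rng = random.Random(seed)
    n = len(data)
    # Start from one conventional resample: n draws with replacement
    resample = [rng.choice(data) for _ in range(n)]
    total = sum(resample)
    means = [total / n]
    for _ in range(iterations - 1):
        pos = rng.randrange(n)       # position of the element leaving the resample
        incoming = rng.choice(data)  # element entering, drawn from the original data
        total += incoming - resample[pos]  # adjust the sum: one subtraction, one addition
        resample[pos] = incoming
        means.append(total / n)
    return means, resample, total

means, resample, total = incremental_bootstrap_means(LSAT)
print(f"{len(means)} means, from {min(means):.1f} to {max(means):.1f}")
```

Successive resamples differ by only one element, so consecutive means are strongly correlated; whether that matters for a given application is left open here.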
Summary
• The bootstrap is simple to implement
• The results are meaningful and easy to
interpret
• No specious assumptions regarding
underlying distributions are required
• Widely accepted
• It should be embraced by all researchers
and practitioners
Yesteryear’s Joys
Bootstrap Methods: Another Look at the Jackknife
B. Efron
Source: Annals of Statistics Volume 7, Number 1 (1979), 1-26.