Kernel Density Estimation
Introduction to Nonparametric Statistics
Kernel Density Estimation
Nonparametric Statistics
Makes fewer restrictive assumptions about the data and the underlying probability distributions.
Population distributions may be skewed and multi-modal.
Kernel Density Estimation (KDE)
Kernel Density Estimation (KDE) is a non-parametric technique
for density estimation in which a known density function (the
kernel) is averaged across the observed data points to create a
smooth approximation.
Density Estimation and Histograms
Let b denote the bin width. The histogram estimate at a point x, from a random sample of size n, is given by
$$\hat{f}_H(x; b) = \frac{\text{number of observations in bin containing } x}{n b}$$
Two choices have to be made when constructing a histogram:
Positioning of the bin edges
Bin-width
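The histogram estimator can be sketched in a few lines of R. This is a minimal illustration, assuming left-closed bins of width b that start at a chosen origin; the function name hist_est and its arguments are hypothetical and do not come from the slides. The data are the five observations used later in the deck.

```r
# Histogram density estimate at x_eval:
# (number of observations in the bin containing x_eval) / (n * b)
hist_est <- function(x_eval, sample, b, origin) {
  n <- length(sample)
  # index k of the bin [origin + k*b, origin + (k+1)*b) that holds a value
  bin_of <- function(v) floor((v - origin) / b)
  count <- sum(bin_of(sample) == bin_of(x_eval))
  count / (n * b)
}

x <- c(3, 4.5, 5.0, 8, 9)
f1 <- hist_est(4.2, x, b = 2, origin = 2)    # bin [4, 6) holds 4.5 and 5.0
f2 <- hist_est(4.2, x, b = 2, origin = 2.5)  # bin [2.5, 4.5) holds only 3
c(f1, f2)
```

With the same bin width, moving the bin edges changes the estimate at the same point, which is why both choices listed above matter.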
KDE – Smoothing the Histogram
Let $X_1, \ldots, X_n$ be a random sample taken from a continuous, univariate density f. The kernel density estimator is given by
$$\hat{f}(x; h) = \frac{1}{nh} \sum_{i=1}^{n} K\{(x - X_i)/h\}$$
K is a function satisfying $\int K(x)\,dx = 1$.
The function K is referred to as the kernel.
h is a positive number, usually called the bandwidth or window width.
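As a rough sketch, the estimator can be evaluated directly from its definition. The function name fhat is illustrative, the Gaussian density dnorm is assumed as the kernel, and the bandwidth h = 1 is an arbitrary value chosen only for the example.

```r
# Direct evaluation of fhat(x; h) = 1/(n h) * sum_i K((x - X_i) / h)
fhat <- function(x, X, h, K = dnorm) {
  sapply(x, function(x0) sum(K((x0 - X) / h)) / (length(X) * h))
}

X <- c(3, 4.5, 5.0, 8, 9)  # five observations used later in the slides
h <- 1                     # arbitrary bandwidth, for illustration only

fhat(c(4, 6.5), X, h)      # estimate at two points

# Because K integrates to 1, the estimate should integrate to 1 as well
total <- integrate(fhat, -Inf, Inf, X = X, h = h)$value
total
```

The integral check relies only on the kernel condition stated above, so it should hold for any positive h.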
Kernels
Gaussian
Epanechnikov
Rectangular
Triangular
Biweight
Uniform
Cosine
Refer to Table 2.1, Wand and Jones (1995), page 31.
Typically K is chosen to be a unimodal PDF.
… most unimodal densities perform about the same as each other when used as a kernel.
Use the Gaussian kernel.
Wand, M.P. and Jones, M.C. (1995), Kernel Smoothing, Monographs on Statistics and Applied Probability 60, Chapman and Hall/CRC, 212 pp.
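For illustration, several of the kernels named above can be written as R functions and checked against the condition that K integrates to 1. The formulas below are common textbook forms on the support [-1, 1] and are an assumption here, since the slides do not print them; in particular, more than one "cosine" kernel is in use, and this is only one of them.

```r
# Common forms of the kernels, each written as a function of u = (x - X_i) / h
kernels <- list(
  gaussian     = function(u) dnorm(u),
  epanechnikov = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0),
  uniform      = function(u) ifelse(abs(u) <= 1, 0.5, 0),  # also called rectangular
  triangular   = function(u) ifelse(abs(u) <= 1, 1 - abs(u), 0),
  biweight     = function(u) ifelse(abs(u) <= 1, (15 / 16) * (1 - u^2)^2, 0),
  cosine       = function(u) ifelse(abs(u) <= 1, (pi / 4) * cos(pi * u / 2), 0)
)

# Integrate each kernel over its support: the whole line for the Gaussian,
# [-1, 1] for the compactly supported kernels
areas <- sapply(names(kernels), function(nm) {
  lim <- if (nm == "gaussian") Inf else 1
  integrate(kernels[[nm]], -lim, lim)$value
})
round(areas, 6)
```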
KDE – Based on Five Observations
Kernel density estimate constructed using five observations, with the kernel chosen to be the N(0,1) density.
x=c(3, 4.5, 5.0, 8, 9)
[Figure: "Density of X", the kernel density estimate of x; vertical axis: Density; N = 5, Bandwidth = 1.195]
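A figure of this kind can be approximated with R's built-in density() function. In this sketch the bandwidth is fixed at the value printed under the figure, because the slides do not say which selector produced 1.195; that choice is an assumption.

```r
x <- c(3, 4.5, 5.0, 8, 9)

# Gaussian kernel; in density(), bw is the standard deviation of the kernel,
# which equals h when K is the N(0,1) density
d <- density(x, bw = 1.195, kernel = "gaussian")

plot(d, main = "Density of X")
rug(x)  # mark the five observations on the horizontal axis
```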
Histogram - Positioning of Bin Edges
x=c(3, 4.5, 5.0, 8, 9)
[Figure: two panels, each titled "Histogram of x"; horizontal axis: x; vertical axis: Density; Area=1]
hist(x,right=T,freq=F), R-default: bins are (a,b], right closed (left-open)
hist(x,right=F,freq=F): bins are [a,b), left closed (right-open)
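The effect of the right argument can be checked numerically without drawing anything. In this sketch the bin edges are fixed explicitly at 2, 4, 6, 8, 10 (an assumption, made so that only the open or closed side of each bin changes).

```r
x <- c(3, 4.5, 5.0, 8, 9)
breaks <- seq(2, 10, by = 2)  # assumed bin edges, held fixed

h_right <- hist(x, breaks = breaks, right = TRUE,  plot = FALSE)  # (a, b] bins
h_left  <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)  # [a, b) bins

h_right$counts  # the observation 8 falls in (6, 8]
h_left$counts   # the observation 8 falls in [8, 10]

# Both are density histograms: the bar areas sum to 1
area_right <- sum(h_right$density * diff(breaks))
area_left  <- sum(h_left$density * diff(breaks))
```

Only the observation that sits exactly on a bin edge moves, yet the shape of the histogram changes noticeably with so few points.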
Histogram - Bin Width
[Figure: two panels, each titled "Histogram of x"; horizontal axis: x; vertical axis: Density; Area=1]
hist(x,breaks=5,right=F,prob=T)
hist(x,breaks=2,right=F,prob=T)
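A small sketch of how bin width changes the estimate. When a single number is passed to breaks, hist() treats it only as a suggestion, so the break vectors are written out explicitly here; the widths 1 and 5 are assumptions chosen for illustration.

```r
x <- c(3, 4.5, 5.0, 8, 9)

# Narrow bins (b = 1) and wide bins (b = 5), both left-closed
narrow <- hist(x, breaks = seq(3, 9, by = 1), right = FALSE, plot = FALSE)
wide   <- hist(x, breaks = c(0, 5, 10),       right = FALSE, plot = FALSE)

narrow$density  # spiky: several empty bins between the two clusters
wide$density    # smooth: the two clusters are merged into two bars
```

Narrow bins follow the individual observations, while wide bins hide the gap between the two groups of points.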
KDE – Numerical Implementation
"kde" <- function(x,h)
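{
  # Gaussian kernel density estimate of the sample x with bandwidth h.
  # The curve is added to an existing plot, so a plot must already be open.
  stopifnot(h > 0)
  npt <- 100                        # number of grid points
  r <- max(x) - min(x)              # sample range
  xmin <- min(x) - 0.1*r            # extend the grid 10% beyond the data
  xmax <- max(x) + 0.1*r
  n <- length(x)
  xgrid <- seq(from = xmin, to = xmax, length.out = npt)
  f <- numeric(npt)
  for (i in 1:npt) {
    z <- (xgrid[i] - x)/h           # scaled distances from xgrid[i] to every observation
    f[i] <- sum(dnorm(z))           # sum of the Gaussian kernel values
  }
  f <- f/(n*h)                      # scale by 1/(n h)
  lines(xgrid, f, col = "grey")
  invisible(list(x = xgrid, y = f)) # also return the curve, without printing it
} # end function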
$$\hat{f}(x; h) = \frac{1}{nh} \sum_{i=1}^{n} K\{(x - X_i)/h\}$$
Variable description
x in the formula = xgrid in the code
X_i in the formula = x in the code
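The same computation can be written without explicit loops. This is a sketch of one possible vectorised variant, not the slide author's code; the name kde_grid is hypothetical, and the bandwidth 1.195 is borrowed from the five-observation figure.

```r
# Vectorised Gaussian KDE on a grid; returns the curve instead of drawing it
kde_grid <- function(x, h, npt = 100) {
  r <- max(x) - min(x)
  xgrid <- seq(min(x) - 0.1 * r, max(x) + 0.1 * r, length.out = npt)
  # outer() builds the npt-by-n matrix with entries (xgrid[i] - x[j]) / h
  z <- outer(xgrid, x, "-") / h
  list(x = xgrid, y = rowSums(dnorm(z)) / (length(x) * h))
}

x <- c(3, 4.5, 5.0, 8, 9)
est <- kde_grid(x, h = 1.195)

# Overlay the estimate on a density histogram of the same data
hist(x, freq = FALSE, ylim = c(0, 0.25), main = "Histogram of x with KDE")
lines(est$x, est$y, col = "grey")
```

Returning the grid and the estimate as a list keeps the computation separate from the plotting, so the curve can be reused or tested.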
Bandwidth Estimators
Optimal Smoothing
Normal Optimal Smoothing
Cross-validation
Plug-in bandwidths
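As a hedged illustration, base R's stats package provides selectors that roughly correspond to the approaches listed above. The mapping is an assumption: bw.ucv stands in for cross-validation and bw.SJ (Sheather-Jones) for plug-in bandwidths, and the slides do not say which variants they intend. Normal optimal smoothing with a Gaussian kernel uses h = (4 / (3n))^(1/5) * sigma, with sigma estimated here by the sample standard deviation. The sample is simulated purely for the example.

```r
set.seed(1)
x <- rnorm(200)  # simulated sample, for illustration only
n <- length(x)

h_normal <- (4 / (3 * n))^(1 / 5) * sd(x)  # normal optimal smoothing
h_nrd    <- bw.nrd(x)  # R's variant: 1.06 * min(sd, IQR / 1.34) * n^(-1/5)
h_cv     <- bw.ucv(x)  # unbiased (least-squares) cross-validation
h_plugin <- bw.SJ(x)   # Sheather-Jones plug-in

c(normal = h_normal, nrd = h_nrd, cv = h_cv, plugin = h_plugin)
```

Any of these values can be passed as the bw argument of density() when the Gaussian kernel is used.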