The method of least squares


CIS 2033 based on
Dekking et al. A Modern Introduction to Probability and Statistics. 2007
Instructor Longin Jan Latecki
C22: The Method of Least Squares
22.1 – Least Squares
Given is a bivariate dataset (x1, y1), …, (xn, yn), where x1, …, xn are
nonrandom and Yi = α + βxi + Ui are random variables for i = 1, 2, …,
n. The random variables U1, U2, …, Un have zero expectation and
variance σ².
Method of Least Squares: Choose a value for α and β such that
$$S(\alpha, \beta) = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$
is minimal.
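To make the criterion concrete, S(α, β) can be evaluated directly for any candidate line. The following is a minimal sketch; the function name `sum_of_squares` and the toy data are assumptions for illustration and do not come from the slides.

```python
def sum_of_squares(xs, ys, alpha, beta):
    """Return S(alpha, beta) = sum over i of (y_i - alpha - beta * x_i)^2."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

print(sum_of_squares(xs, ys, 0.0, 2.0))  # 0.0: the line y = 2x fits exactly
print(sum_of_squares(xs, ys, 0.0, 1.0))  # 30.0: the line y = x fits worse
```

The method of least squares picks the pair (α, β) for which this number is smallest.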
22.1 – Regression
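Figure: the observed value yi corresponding to xi and the value α + βxi
on the regression line y = α + βx. The quantity being minimized is the
sum of the squared vertical distances between the two:
$$\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$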
22.1 – Estimation
Method of Least Squares: Choose a value for α and β such that
$$S(\alpha, \beta) = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$
is minimal.
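To find the least squares estimates, we differentiate S(α, β) with
respect to α and β, and we set the derivatives equal to 0:
$$\frac{\partial S}{\partial \alpha} = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0$$
$$\frac{\partial S}{\partial \beta} = -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)\, x_i = 0$$
After some calculus, we get two equations to estimate α and β:
$$n\alpha + \beta \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$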
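22.1 – Estimation
After some simple algebraic rearranging, we obtain:
$$\hat\beta = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$ (slope)
$$\hat\alpha = \bar{y}_n - \hat\beta\, \bar{x}_n$$ (intercept)
Here x̄n and ȳn denote the sample means of the xi and the yi.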
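The slope and intercept formulas translate directly into code. Below is a minimal sketch in plain Python; the function name `least_squares_fit` and the four data points are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
def least_squares_fit(xs, ys):
    """Return (alpha_hat, beta_hat) from the closed-form least squares formulas."""
    n = len(xs)
    sx = sum(xs)                               # sum of x_i
    sy = sum(ys)                               # sum of y_i
    sxx = sum(x * x for x in xs)               # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    beta_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    alpha_hat = sy / n - beta_hat * sx / n                # intercept
    return alpha_hat, beta_hat

# Hypothetical data: sums are 10, 11, 30 and 33, so the slope is 22/20 = 1.1
# and the intercept is 2.75 - 1.1 * 2.5 = 0 (up to floating-point rounding).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]
alpha_hat, beta_hat = least_squares_fit(xs, ys)
print(alpha_hat, beta_hat)
```

The denominator is zero when all xi are equal, in which case no unique regression line exists; the sketch does not guard against that.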
Figure: regression line y = 0.25x − 2.35 for the plotted points.
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2$$
22.1 – Least Squares Estimators are Unbiased
The estimators for α and β are unbiased.
For the simple linear regression model, the random variable
$$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta x_i)^2$$
is an unbiased estimator for σ².
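Unbiasedness means that the estimator equals σ² on average over repeated samples. A small simulation can make this plausible. The sketch below is an illustration under assumed settings: the true parameters, the normal distribution of the Ui, the seed, and the helper names `fit_line` and `sigma2_hat` are all choices made here, not part of the slides.

```python
import random

def fit_line(xs, ys):
    """Least squares estimates (alpha_hat, beta_hat) from the closed-form formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    beta_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    alpha_hat = (sy - beta_hat * sx) / n
    return alpha_hat, beta_hat

def sigma2_hat(xs, ys):
    """Sum of squared residuals divided by n - 2."""
    alpha_hat, beta_hat = fit_line(xs, ys)
    rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    return rss / (len(xs) - 2)

# Assumed simulation settings (illustrative only).
rng = random.Random(0)
alpha, beta, sigma = 1.0, 2.0, 2.0
xs = [float(i) for i in range(1, 11)]

estimates = []
for _ in range(5000):
    # Draw a fresh sample Y_i = alpha + beta * x_i + U_i with U_i ~ N(0, sigma^2).
    ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
    estimates.append(sigma2_hat(xs, ys))

mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate)  # should be close to sigma**2 = 4.0
```

The average only approximates σ² because the number of simulated samples is finite.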
22.2 – Residuals
A way to explore whether the linear regression model is appropriate
to model a given bivariate dataset is to inspect a scatter plot of the
so-called residuals ri against the xi.
The ith residual ri is defined as the vertical distance between the
ith point and the estimated regression line:
$$r_i = y_i - \hat\alpha - \hat\beta x_i$$
We always have
$$\sum_{i=1}^{n} r_i = 0$$
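That the residuals of a least squares fit sum to zero can be checked numerically. The sketch below uses the centered form of the slope formula, which is algebraically equivalent to the one given earlier; the function name `residuals` and the data are hypothetical.

```python
def residuals(xs, ys):
    """Return the residuals r_i = y_i - alpha_hat - beta_hat * x_i."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered form of the least squares slope.
    beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
                / sum((x - x_bar) ** 2 for x in xs))
    alpha_hat = y_bar - beta_hat * x_bar
    return [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]

# Hypothetical data; the fitted line is y = 1.1x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]
r = residuals(xs, ys)
print(r)       # approximately [-0.1, 0.8, -1.3, 0.6]
print(sum(r))  # zero up to floating-point rounding
```

The zero sum is a consequence of setting the derivative of S with respect to α equal to 0.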
22.2 – Heteroscedasticity
Homoscedasticity: The assumption of equal variance of the Ui (and
therefore of the Yi).
In case the variance of Yi depends on the value of xi, we
speak of heteroscedasticity. For instance, heteroscedasticity occurs
when Yi with a large expected value have a larger variance than those
with small expected values. This produces a “fanning out” effect, which
can be observed in the figure:
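The fanning-out effect can also be reproduced numerically. The sketch below simulates data whose noise standard deviation grows with xi and compares the spread of the residuals for small and large xi. All parameter values, the seed, and the choice of a standard deviation of 0.5·xi are assumptions made for this illustration.

```python
import random

# Assumed true parameters (illustrative, not from the slides).
alpha, beta = 1.0, 2.0
rng = random.Random(1)

xs = [float(i) for i in range(1, 201)]
# The standard deviation of U_i is 0.5 * x_i, so Var(Y_i) grows with x_i.
ys = [alpha + beta * x + rng.gauss(0.0, 0.5 * x) for x in xs]

# Least squares fit via the centered form of the slope formula.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
alpha_hat = y_bar - beta_hat * x_bar
residuals = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]

# Mean squared residual for the smaller x_i versus the larger x_i.
half = n // 2
spread_low = sum(r * r for r in residuals[:half]) / half
spread_high = sum(r * r for r in residuals[half:]) / (n - half)
print(spread_low, spread_high)  # the second value should be clearly larger
```

Under homoscedasticity the two spreads would be of comparable size; a marked difference is the numerical counterpart of the fan shape in a residual plot.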
22.3 – Relation with Maximum Likelihood
What are the maximum likelihood estimates for α and β?
To apply the method of least squares, no assumption is needed about
the type of distribution of the Ui. In case the type of distribution of the Ui
is known, the maximum likelihood principle can be applied. Consider, in
particular, the case where the Ui are independent with an N(0, σ²) distribution.
Then Yi has an N(α + βxi, σ²) distribution, with probability
density function
$$f_i(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{y - \alpha - \beta x_i}{\sigma}\right)^2\right)$$
When the Yi are independent and each Yi has an N(α + βxi, σ²)
distribution, and assuming that the linear model is appropriate to
model a given bivariate dataset, the residuals ri should look like the
realization of a random sample from a normal distribution.
An example is shown in the figure below:
22.3 – Maximum Likelihood
For fixed σ > 0, the loglikelihood ℓ(α, β, σ) attains its maximum when
$$\sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$
is minimal. Hence, when the random variables Ui are independent with an
N(0, σ²) distribution, the maximum likelihood principle and the least
squares method return the same estimators.
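The equivalence can be seen by writing out the loglikelihood. The following is a sketch of the derivation, assuming independent Yi that each have an N(α + βxi, σ²) distribution; the symbols L and ℓ for the likelihood and loglikelihood are notation chosen here.

```latex
\begin{align*}
L(\alpha, \beta, \sigma)
  &= \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\left(-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right) \\
\ell(\alpha, \beta, \sigma)
  &= \ln L(\alpha, \beta, \sigma) \\
  &= -n \ln \sigma - n \ln \sqrt{2\pi}
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2
\end{align*}
```

For fixed σ, only the last term depends on α and β, and it enters with a negative sign, so maximizing ℓ over α and β amounts to minimizing the sum of squares.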
The maximum likelihood estimator for σ² is:
$$\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta x_i)^2$$
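The maximum likelihood estimator divides the sum of squared residuals by n, whereas the unbiased estimator from 22.1 divides by n − 2, so the maximum likelihood estimate is always the smaller of the two by a factor (n − 2)/n. A minimal sketch of the comparison follows; the function name `variance_estimates` and the residual values are hypothetical.

```python
def variance_estimates(residuals):
    """Return (maximum likelihood estimate, unbiased estimate) of sigma^2."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)  # sum of squared residuals
    return rss / n, rss / (n - 2)

# Hypothetical residuals from a fitted line; their squares sum to 2.7.
residuals = [-0.1, 0.8, -1.3, 0.6]
sigma2_ml, sigma2_unbiased = variance_estimates(residuals)
print(sigma2_ml, sigma2_unbiased)  # approximately 0.675 and 1.35
```

For large n the two estimates are nearly identical; for small n, as in this example with n = 4, the difference is substantial.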