IPPTChap004x
Chapter 4
Basic Estimation Techniques
McGraw-Hill/Irwin
Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
Learning Objectives
Set up and interpret simple linear regression equations
Estimate intercept and slope parameters of a regression
line using the method of least‐squares
Determine statistical significance using either t‐tests or
p values associated with parameter estimates
Evaluate the “fit” of a regression equation to the data
using the R2 statistic and test for statistical significance
of the whole regression equation using an F‐test
Set up and interpret multiple regression models
Use linear regression techniques to estimate the
parameters of two common nonlinear models: quadratic
and log‐linear regression models
4-2
Basic Estimation
Parameters
~ The coefficients in an equation that determine
the exact mathematical relation among the
variables
Parameter estimation
~ The process of finding estimates of the
numerical values of the parameters of an
equation
4-3
Regression Analysis
Regression analysis
~ A statistical technique for estimating the
parameters of an equation and testing for
statistical significance
Dependent variable
~ Variable whose variation is to be explained
Explanatory variables
~ Variables that are thought to cause the
dependent variable to take on different values
4-4
Simple Linear Regression
True regression line relates dependent
variable Y to one explanatory (or
independent) variable X
Y = a + bX
~ Intercept parameter (a) gives value of Y where
regression line crosses Y-axis (value of Y when X
is zero)
~ Slope parameter (b) gives the change in Y
associated with a one-unit change in X:
b = ΔY/ΔX
4-5
Simple Linear Regression
Regression line shows the average or
expected value of Y for each level of X
True (or actual) underlying relation
between Y and X is unknown to the
researcher but is to be discovered by
analyzing the sample data
Random error term
~ Unobservable term added to a regression model to
capture the effects of all the minor, unpredictable
factors that affect Y but cannot reasonably be
included as explanatory variables
4-6
Fitting a Regression Line
Time series
~ A data set in which the data for the
dependent and explanatory variables are
collected over time for a single firm
Cross-sectional
~ A data set in which the data for the
dependent and explanatory variables are
collected from many different firms or
industries at a given point in time
4-7
Fitting a Regression Line
Method of least squares
~ A method of estimating the parameters of a
linear regression equation by finding the line
that minimizes the sum of the squared
distances from each sample data point to
the sample regression line
4-8
Fitting a Regression Line
Parameter estimates are obtained by
choosing values of a & b that minimize
the sum of squared residuals
~ The residual is the difference between the
actual and fitted values of Y: Yi – Ŷi
~ Equivalent to fitting a line through a scatter
diagram of the sample data points
4-9
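The least-squares procedure described above can be sketched in a few lines. This is an illustrative example with made-up data, using the standard textbook formulas for the slope and intercept estimates:

```python
import numpy as np

# Minimal sketch of the method of least squares for Y = a + bX.
# Data values here are illustrative, not from the chapter.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = X.mean(), Y.mean()
# Slope: sum of cross-deviations over sum of squared X-deviations
b_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
# Intercept: forces the line through the point of means (x̄, ȳ)
a_hat = y_bar - b_hat * x_bar

Y_fit = a_hat + b_hat * X      # fitted values Ŷi
residuals = Y - Y_fit          # residuals Yi - Ŷi
sse = np.sum(residuals ** 2)   # the quantity least squares minimizes
```

A property worth noting: for any least-squares line the residuals sum to zero, since the line passes through the point of sample means.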
Fitting a Regression Line
The sample regression line is an
estimate of the true (or population)
regression line
Ŷ = â + b̂X
~ Where â and b̂ are least squares estimates
of the true (population) parameters a and b
4-10
Sample Regression Line
(Figure 4.2)
[Figure 4.2: scatter diagram of sample data points with the sample regression line Ŝi = 11,573 + 4.9719A; one fitted value, Ŝi = 46,376, and its residual ei are marked. Horizontal axis: advertising expenditures A (dollars), 0 to 10,000; vertical axis: sales S (dollars), 10,000 to 70,000]
4-11
Unbiased Estimators
The estimates â & b̂ do not generally
equal the true values of a & b
~ â & b̂ are random variables computed using
data from a random sample
The distribution of values the estimates
might take is centered around the true
value of the parameter
An estimator is unbiased if its average
value (or expected value) is equal to the
true value of the parameter
4-12
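The idea that the distribution of estimates is centered on the true parameter can be checked by simulation. The sketch below (with illustrative parameter values, true slope b = 5 as in Figure 4.3) draws many random samples, estimates the slope in each, and averages the estimates:

```python
import numpy as np

# Hedged sketch: simulating the sampling distribution of b̂ when the
# true slope is b = 5. All numbers below are illustrative choices.
rng = np.random.default_rng(0)
a_true, b_true = 2.0, 5.0
n, trials = 30, 5000

slopes = []
for _ in range(trials):
    X = rng.uniform(0, 10, n)
    e = rng.normal(0, 4, n)            # random error term
    Y = a_true + b_true * X + e
    b_hat = np.polyfit(X, Y, 1)[0]     # least-squares slope for this sample
    slopes.append(b_hat)

# Unbiasedness: the average of the b̂ values is close to the true b = 5,
# even though any single estimate generally differs from 5.
mean_b = float(np.mean(slopes))
```

Plotting a histogram of `slopes` would reproduce the shape of Figure 4.3: a distribution spread around, and centered at, the true value.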
Relative Frequency Distribution*
(Figure 4.3)
for b̂ when b = 5
[Figure 4.3: relative frequency distribution of the least-squares estimate b̂, plotted over the range 0 to 10 and centered at the true value b = 5]
*Also called a probability density function (pdf)
4-13
Statistical Significance
Statistical significance
~ There is sufficient evidence from the
sample to indicate that the true value of the
coefficient is not zero
Hypothesis testing
~ A statistical technique for making a
probabilistic statement about the true value
of a parameter
4-14
Statistical Significance
Must determine if there is sufficient
statistical evidence to indicate that Y is
truly related to X (i.e., b ≠ 0)
Even if b = 0, it is possible that the
sample will produce an estimate b̂ that
is different from zero
Test for statistical significance using
t-tests or p-values
4-15
Statistical Significance
First determine the level of significance
~ Probability of finding a parameter estimate to
be statistically different from zero when, in
fact, it is zero
~ Probability of a Type I Error
1 – level of significance = level of
confidence
~ Level of confidence is the probability of
correctly failing to reject the true hypothesis
that b = 0
4-16
Performing a t-Test
t-ratio is computed as t = b̂/Sb̂
where Sb̂ is the standard error of the estimate b̂
Use t-table to choose critical t-value with
n – k degrees of freedom for the chosen
level of significance
~ n = number of observations
~ k = number of parameters estimated
4-17
Performing a t-Test
t-statistic
~ Numerical value of the t-ratio
If the absolute value of t-statistic is
greater than the critical t, the parameter
estimate is statistically significant at the
given level of significance
4-18
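Putting the last two slides together, a t-test can be sketched as follows. The regression output numbers are illustrative, and the critical value is read from a t-table rather than computed:

```python
# Hedged sketch of a t-test on a slope estimate.
# b_hat and s_b below are assumed regression-output values.
n, k = 25, 2          # observations; parameters estimated (a and b)
b_hat = 4.97          # slope estimate (illustrative)
s_b = 1.23            # standard error of b̂ (illustrative)

t_stat = b_hat / s_b  # the t-ratio
df = n - k            # degrees of freedom = n - k = 23

# Critical t for a 5% significance level with 23 df (from a t-table)
t_critical = 2.069

# Significant if |t| exceeds the critical value
significant = abs(t_stat) > t_critical
```

Here t ≈ 4.04 > 2.069, so the estimate would be judged statistically significant at the 5% level.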
Using p-Values
Treat as statistically significant only those
parameter estimates with p-values
smaller than the maximum acceptable
significance level
p-value gives exact level of significance
~ Also the probability of finding significance
when none exists
4-19
Coefficient of Determination
R2 measures the fraction of total variation
in the dependent variable (Y) that is
explained by the variation in X
~ Ranges from 0 to 1
~ High R2 indicates Y and X are highly
correlated, and does not prove that Y and X
are causally related
4-20
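R² follows directly from the fitted line: it is the fraction of total variation in Y not left in the residuals. A short sketch with made-up data:

```python
import numpy as np

# Sketch of computing R² for a simple regression (illustrative data).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([3.2, 4.1, 6.8, 7.5, 9.9, 11.2])

b_hat, a_hat = np.polyfit(X, Y, 1)   # least-squares slope and intercept
Y_fit = a_hat + b_hat * X

sst = np.sum((Y - Y.mean()) ** 2)    # total variation in Y
sse = np.sum((Y - Y_fit) ** 2)       # unexplained (residual) variation
r_squared = 1 - sse / sst            # fraction of variation explained
```

In simple regression, R² also equals the squared sample correlation between X and Y, which is one way to see why a high R² signals correlation, not causation.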
F-Test
Used to test for significance of overall
regression equation
Compare F-statistic to critical F-value
from F-table
~ Two degrees of freedom: k – 1 (numerator) & n – k (denominator)
~ Level of significance
If F-statistic exceeds the critical F, the
regression equation overall is statistically
significant at the specified level of
significance
4-21
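The F-statistic can be computed from R² using the standard relation F = [R²/(k − 1)] / [(1 − R²)/(n − k)]. A sketch with illustrative values, with the critical F read from a table:

```python
# Hedged sketch of the F-test for overall significance.
# n, k, and r_squared are assumed regression-output values.
n, k = 25, 2
r_squared = 0.60

# F = explained variation per numerator df over
#     unexplained variation per denominator df
f_stat = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))

# Critical F at the 5% level with (k-1, n-k) = (1, 23) df (from an F-table)
f_critical = 4.28

significant = f_stat > f_critical
```

Here F = 34.5 > 4.28, so the regression as a whole would be judged statistically significant at the 5% level.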
Multiple Regression
Uses more than one explanatory variable
Coefficient for each explanatory variable
measures the change in the dependent
variable associated with a one-unit
change in that explanatory variable, all
else constant
4-22
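A multiple regression with two explanatory variables can be estimated the same way, minimizing the sum of squared residuals. A hedged sketch using generated data (the variable names and true parameter values are invented for illustration):

```python
import numpy as np

# Sketch: multiple regression Y = a + bX + cW fitted by least squares.
# Data are generated from known parameters so the fit can be checked.
rng = np.random.default_rng(1)
n = 100
X = rng.uniform(0, 10, n)
W = rng.uniform(0, 5, n)
Y = 3.0 + 2.0 * X - 1.5 * W + rng.normal(0, 0.5, n)

# Design matrix: a column of ones for the intercept, then each
# explanatory variable in its own column.
D = np.column_stack([np.ones(n), X, W])
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
a_hat, b_hat, c_hat = coef
# b_hat is the change in Y per one-unit change in X holding W constant,
# and c_hat the change per one-unit change in W holding X constant.
```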
Quadratic Regression Models
Use when curve fitting scatter plot is
U-shaped or ∩-shaped
Y = a + bX + cX²
~ For linear transformation compute new
variable Z = X2
~ Estimate Y = a + bX + cZ
4-23
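The transformation in the two bullets above is mechanical: create the new variable and fit a linear model in X and Z. A sketch using data generated from known parameters (noise-free, so the fit recovers them exactly):

```python
import numpy as np

# Sketch of the linear transformation for a quadratic model
# Y = a + bX + cX². True values a=4, b=3, c=-0.25 are illustrative.
X = np.linspace(0, 10, 50)
Y = 4.0 + 3.0 * X - 0.25 * X ** 2

Z = X ** 2                                   # new variable Z = X²
D = np.column_stack([np.ones_like(X), X, Z])
# Estimate the now-linear model Y = a + bX + cZ by least squares
a_hat, b_hat, c_hat = np.linalg.lstsq(D, Y, rcond=None)[0]
```

The sign of the estimated c tells the shape: c < 0 gives a ∩-shaped curve, c > 0 a U-shaped one.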
Log-Linear Regression Models
Use when relation takes the form: Y = aXᵇZᶜ
b = (percentage change in Y) / (percentage change in X)
c = (percentage change in Y) / (percentage change in Z)
Transform by taking natural logarithms:
ln Y = ln a + b ln X + c ln Z
~ b and c are elasticities
4-24
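The log transformation above makes the model linear in ln X and ln Z, so least squares applies directly. A sketch with data generated from known (illustrative) parameter values:

```python
import numpy as np

# Sketch: estimating the log-linear model Y = aX^b Z^c by fitting
# ln Y = ln a + b ln X + c ln Z. True values a=2, b=1.5, c=-0.8
# are illustrative; a small multiplicative error is added.
rng = np.random.default_rng(2)
n = 200
X = rng.uniform(1, 10, n)
Z = rng.uniform(1, 10, n)
Y = 2.0 * X ** 1.5 * Z ** -0.8 * np.exp(rng.normal(0, 0.05, n))

D = np.column_stack([np.ones(n), np.log(X), np.log(Z)])
ln_a, b_hat, c_hat = np.linalg.lstsq(D, np.log(Y), rcond=None)[0]
a_hat = np.exp(ln_a)   # undo the log to recover a
# b_hat and c_hat are the elasticities of Y with respect to X and Z
```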
Summary
A simple linear regression model relates a dependent
variable Y to a single explanatory variable X
~ The regression equation is correctly interpreted as providing the
average value (expected value) of Y for a given value of X
Parameter estimates are obtained by choosing values of
a and b that create the best-fitting line that passes
through the scatter diagram of the sample data points
If the absolute value of the t-ratio is greater (less) than the
critical t-value, then b̂ is (is not) statistically significant
~ Exact level of significance associated with a t-statistic is its p-value
A high R2 indicates Y and X are highly correlated and the
data tightly fit the sample regression line
4-25
Summary
If the F-statistic exceeds the critical F-value, the
regression equation is statistically significant
In multiple regression, the coefficients measure the
change in Y associated with a one-unit change in that
explanatory variable
Quadratic regression models are appropriate when the
curve fitting the scatter plot is U-shaped or ∩-shaped
(Y = a + bX + cX²)
Log-linear regression models are appropriate when the
relation is in multiplicative exponential form (Y = aXᵇZᶜ)
~ The equation is transformed by taking natural logarithms
4-26