IPPTChap004x


Chapter 4
Basic Estimation Techniques
McGraw-Hill/Irwin
Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
Learning Objectives
 Set up and interpret simple linear regression equations
 Estimate intercept and slope parameters of a regression
line using the method of least‐squares
 Determine statistical significance using either t‐tests or
p values associated with parameter estimates
 Evaluate the “fit” of a regression equation to the data
using the R2 statistic and test for statistical significance
of the whole regression equation using an F‐test
 Set up and interpret multiple regression models
 Use linear regression techniques to estimate the
parameters of two common nonlinear models: quadratic
and log‐linear regression models
4-2
Basic Estimation
 Parameters
~ The coefficients in an equation that determine
the exact mathematical relation among the
variables
 Parameter estimation
~ The process of finding estimates of the
numerical values of the parameters of an
equation
4-3
Regression Analysis
 Regression analysis
~ A statistical technique for estimating the
parameters of an equation and testing for
statistical significance
 Dependent variable
~ Variable whose variation is to be explained
 Explanatory variables
~ Variables that are thought to cause the
dependent variable to take on different values
4-4
Simple Linear Regression
 True regression line relates dependent
variable Y to one explanatory (or
independent) variable X
Y  a  bX
~ Intercept parameter (a) gives value of Y where
regression line crosses Y-axis (value of Y when X
is zero)
~ Slope parameter (b) gives the change in Y
associated with a one-unit change in X:
b  Y X
4-5
Simple Linear Regression
 Regression line shows the average or
expected value of Y for each level of X
 True (or actual) underlying relation
between Y and X is unknown to the
researcher but is to be discovered by
analyzing the sample data
 Random error term
~ Unobservable term added to a regression model to
capture the effects of all the minor, unpredictable
factors that affect Y but cannot reasonably be
included as explanatory variables
4-6
Fitting a Regression Line
 Time series
~ A data set in which the data for the
dependent and explanatory variables are
collected over time for a single firm
 Cross-sectional
~ A data set in which the data for the
dependent and explanatory variables are
collected from many different firms or
industries at a given point in time
4-7
Fitting a Regression Line
 Method of least squares
~ A method of estimating the parameters of a
linear regression equation by finding the line
that minimizes the sum of the squared
distances from each sample data point to
the sample regression line
4-8
Fitting a Regression Line
 Parameter estimates are obtained by
choosing values of a & b that minimize
the sum of squared residuals
~ The residual is the difference between the
actual and fitted values of Y: Yi – Ŷi
~ Equivalent to fitting a line through a scatter
diagram of the sample data points
4-9
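The least-squares computation described above can be sketched in Python (not part of the original slides; the sample data are hypothetical):

```python
import numpy as np

# Hypothetical sample: Y observed at several values of X
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Least-squares estimates of the intercept a and slope b
b_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a_hat = Y.mean() - b_hat * X.mean()

residuals = Y - (a_hat + b_hat * X)     # Yi - Ŷi for each data point
print(round(a_hat, 3), round(b_hat, 3))
```

With an intercept in the model, the residuals sum to zero, which is one way to check the fit.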
Fitting a Regression Line
 The sample regression line is an
estimate of the true (or population)
regression line
Ŷ = â + b̂X
~Where â and b̂ are least squares estimates
of the true (population) parameters a and b
4-10
Sample Regression Line
(Figure 4.2)
[Scatter diagram of sales, S (dollars, vertical axis, 10,000–70,000), against
advertising expenditures, A (dollars, horizontal axis, 0–10,000). The sample
regression line is Ŝi = 11,573 + 4.9719A. One sample point with Si = 60,000
lies above the line; its fitted value is Ŝi = 46,376, and the vertical
distance ei = Si – Ŝi is the residual.]
4-11
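Using the sample regression line from Figure 4.2, a fitted value and residual can be computed directly (a small Python sketch; the advertising level of 7,000 is implied by the reported fitted value of 46,376):

```python
# Sample regression line from Figure 4.2 (sales regressed on advertising)
a_hat, b_hat = 11573.0, 4.9719

def fitted_sales(advertising):
    return a_hat + b_hat * advertising

s_hat = fitted_sales(7000)      # fitted sales, about 46,376
e_i = 60000 - s_hat             # residual for a firm that actually sold 60,000
print(round(s_hat), round(e_i))
```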
Unbiased Estimators
 The estimates â & b̂ do not generally
equal the true values of a & b
~ â & b̂ are random variables computed using
data from a random sample
 The distribution of values the estimates
might take is centered around the true
value of the parameter
 An estimator is unbiased if its average
value (or expected value) is equal to the
true value of the parameter
4-12
Relative Frequency Distribution*
(Figure 4.3)
[Relative frequency distribution* for b̂ when b = 5: the distribution of
least-squares estimates b̂ (plotted over the range 0 to 10) is centered at
the true value b = 5.
*Also called a probability density function (pdf)]
4-13
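The unbiasedness property can be illustrated with a small Monte Carlo sketch in Python (hypothetical setup; the true slope is set to b = 5 as in Figure 4.3):

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 2.0, 5.0           # true parameters; b = 5 as in Figure 4.3
X = np.linspace(0.0, 10.0, 20)

def ols_slope(X, Y):
    return np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

# Draw many random samples and compute the least-squares estimate each time
estimates = [ols_slope(X, a_true + b_true * X + rng.normal(0, 4, X.size))
             for _ in range(2000)]

# The distribution of estimates is centered near the true value b = 5
mean_b_hat = float(np.mean(estimates))
print(round(mean_b_hat, 2))
```

Any single b̂ differs from 5, but the average across many samples settles near the true parameter, which is what unbiasedness means.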
Statistical Significance
 Statistical significance
~ There is sufficient evidence from the
sample to indicate that the true value of the
coefficient is not zero
 Hypothesis testing
~ A statistical technique for making a
probabilistic statement about the true value
of a parameter
4-14
Statistical Significance
 Must determine if there is sufficient
statistical evidence to indicate that Y is
truly related to X (i.e., b  0)
 Even if b = 0, it is possible that the
sample will produce an estimate b̂ that
is different from zero
 Test for statistical significance using
t-tests or p-values
4-15
Statistical Significance
 First determine the level of significance
~ Probability of finding a parameter estimate to
be statistically different from zero when, in
fact, it is zero
~ Probability of a Type I Error
 1 – level of significance = level of
confidence
~ Level of confidence is the probability of
correctly failing to reject the true hypothesis
that b = 0
4-16
Performing a t-Test
 t-ratio is computed as t = b̂ ⁄ Sb̂
where Sb̂ is the standard error of the estimate b̂
 Use t-table to choose critical t-value with
n – k degrees of freedom for the chosen
level of significance
~ n = number of observations
~ k = number of parameters estimated
4-17
Performing a t-Test
 t-statistic
~ Numerical value of the t-ratio
 If the absolute value of t-statistic is
greater than the critical t, the parameter
estimate is statistically significant at the
given level of significance
4-18
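A t-test along these lines might be sketched as follows (hypothetical estimate and standard error; the critical value comes from `scipy.stats.t.ppf`):

```python
from scipy import stats

# Hypothetical regression output: slope estimate and its standard error
b_hat, se_b = 4.9719, 1.23
n, k = 32, 2                        # observations and estimated parameters

t_stat = b_hat / se_b               # the t-ratio
df = n - k                          # degrees of freedom
t_crit = stats.t.ppf(1 - 0.05 / 2, df)   # critical t, 5% level, two-tailed

# Statistically significant if |t| exceeds the critical value
print(round(t_stat, 2), round(t_crit, 3), abs(t_stat) > t_crit)
```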
Using p-Values
 Treat as statistically significant only those
parameter estimates with p-values
smaller than the maximum acceptable
significance level
 p-value gives exact level of significance
~ Also the probability of finding significance
when none exists
4-19
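A p-value can be computed from a t-statistic directly rather than read from a table (a sketch with a hypothetical t-statistic):

```python
from scipy import stats

# Hypothetical t-statistic with n - k = 30 degrees of freedom
t_stat, df = 4.04, 30

# Two-tailed p-value: probability of a t-ratio at least this large
# in absolute value when the true parameter is zero
p_value = 2 * stats.t.sf(abs(t_stat), df)

# The estimate is significant at any significance level above p_value
print(round(p_value, 4))
```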
Coefficient of Determination
 R2 measures the fraction of total variation
in the dependent variable (Y) that is
explained by the variation in X
~ Ranges from 0 to 1
~ High R2 indicates Y and X are highly
correlated, and does not prove that Y and X
are causally related
4-20
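R² can be computed from the residual and total sums of squares (a sketch with hypothetical data):

```python
import numpy as np

# Hypothetical sample data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Least-squares fit
b_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a_hat = Y.mean() - b_hat * X.mean()
Y_fit = a_hat + b_hat * X

sse = np.sum((Y - Y_fit) ** 2)      # unexplained (residual) variation
sst = np.sum((Y - Y.mean()) ** 2)   # total variation in Y
r_squared = 1 - sse / sst           # fraction of variation explained
print(round(r_squared, 4))
```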
F-Test
 Used to test for significance of overall
regression equation
 Compare F-statistic to critical F-value
from F-table
~ Two degrees of freedom, k – 1 & n – k
~ Level of significance
 If F-statistic exceeds the critical F, the
regression equation overall is statistically
significant at the specified level of
significance
4-21
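The F-statistic can be formed from R² and compared with the critical value (hypothetical numbers):

```python
from scipy import stats

# Hypothetical R² from a simple regression with n = 5 observations
# and k = 2 estimated parameters (intercept and slope)
r_squared, n, k = 0.9967, 5, 2

# F-statistic with k - 1 and n - k degrees of freedom
f_stat = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))
f_crit = stats.f.ppf(1 - 0.05, k - 1, n - k)   # critical F at 5% significance

print(round(f_stat, 1), round(f_crit, 2), f_stat > f_crit)
```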
Multiple Regression
 Uses more than one explanatory variable
 Coefficient for each explanatory variable
measures the change in the dependent
variable associated with a one-unit
change in that explanatory variable, all
else constant
4-22
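A multiple regression with two hypothetical explanatory variables can be estimated by least squares as a sketch:

```python
import numpy as np

# Hypothetical data: Y depends on two explanatory variables X1 and X2
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = 3.0 + 2.0 * X1 - 1.5 * X2       # exact relation, for illustration

# Design matrix with a column of ones for the intercept
A = np.column_stack([np.ones_like(X1), X1, X2])
a_hat, b1_hat, b2_hat = np.linalg.lstsq(A, Y, rcond=None)[0]

# b1_hat: change in Y from a one-unit change in X1, holding X2 constant
print(np.round([a_hat, b1_hat, b2_hat], 3))
```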
Quadratic Regression Models
 Use when curve fitting scatter plot is
U-shaped or ∩-shaped
 Y = a + bX + cX²
~ For linear transformation compute new
variable Z = X²
~ Estimate Y = a + bX + cZ
4-23
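The transformation can be sketched as follows (hypothetical, noise-free data chosen so the fitted coefficients recover the true curve):

```python
import numpy as np

# Hypothetical U-shaped data generated from Y = 10 - 4X + 0.5X²
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = 10 - 4 * X + 0.5 * X ** 2

# Linear transformation: compute the new variable Z = X²,
# then estimate Y = a + bX + cZ by least squares
Z = X ** 2
A = np.column_stack([np.ones_like(X), X, Z])
a_hat, b_hat, c_hat = np.linalg.lstsq(A, Y, rcond=None)[0]
print(np.round([a_hat, b_hat, c_hat], 3))
```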
Log-Linear Regression Models
 Use when relation takes the form: Y = aXᵇZᶜ
b = (percentage change in Y) ⁄ (percentage change in X)
c = (percentage change in Y) ⁄ (percentage change in Z)
 Transform by taking natural logarithms:
ln Y = ln a + b ln X + c ln Z
~ b and c are elasticities
4-24
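The log transformation can be sketched in Python (hypothetical, noise-free data; the fitted slopes recover the elasticities):

```python
import numpy as np

# Hypothetical multiplicative relation: Y = 100 * X^0.8 * Z^(-0.3)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Z = np.array([2.0, 5.0, 3.0, 8.0, 4.0])
Y = 100 * X ** 0.8 * Z ** -0.3

# Take natural logs: ln Y = ln a + b ln X + c ln Z, then fit by least squares
A = np.column_stack([np.ones_like(X), np.log(X), np.log(Z)])
ln_a, b_hat, c_hat = np.linalg.lstsq(A, np.log(Y), rcond=None)[0]

a_hat = np.exp(ln_a)       # back out the constant a
# b_hat and c_hat are the elasticities of Y with respect to X and Z
print(round(a_hat, 2), round(b_hat, 3), round(c_hat, 3))
```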
Summary
 A simple linear regression model relates a dependent
variable Y to a single explanatory variable X
~ The regression equation is correctly interpreted as providing the
average value (expected value) of Y for a given value of X
 Parameter estimates are obtained by choosing values of
a and b that create the best-fitting line that passes
through the scatter diagram of the sample data points
 If the absolute value of the t-ratio is greater (less) than the
critical t-value, then b̂ is (is not) statistically significant
~ Exact level of significance associated with a t-statistic is its p-value
 A high R2 indicates Y and X are highly correlated and the
data tightly fit the sample regression line
4-25
Summary
 If the F-statistic exceeds the critical F-value, the
regression equation is statistically significant
 In multiple regression, the coefficients measure the
change in Y associated with a one-unit change in that
explanatory variable
 Quadratic regression models are appropriate when the
curve fitting the scatter plot is U-shaped or ∩-shaped
(Y = a + bX + cX²)
 Log-linear regression models are appropriate when the
relation is in multiplicative exponential form (Y = aXᵇZᶜ)
~ The equation is transformed by taking natural logarithms
4-26