Lesson 5 - WordPress @ VIU Sites

Chapter 7
Qualitative Variables and
Non-Linearities in Multiple Linear
Regression Analysis
Copyright © 2014 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Learning Objectives
• Construct and use qualitative independent variables
• Construct and use interaction effects
• Control for non-linear relationships
• Estimate marginal effects as percent changes and elasticities
• Estimate a more fully-specified model
Construct and Use Qualitative
Independent Variables
• A qualitative explanatory variable (dummy variable) takes two or more levels:
– yes or no, on or off, male or female
– each level is coded as 0 or 1
• If the dummy variable is statistically significant, the regression intercepts differ across its levels
• The model assumes equal slopes for the other variables
• The number of dummy variables needed is (number of levels - 1)
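The "levels minus one" rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `make_dummies`, the sample list of house styles, and the choice of "condo" as the omitted level are all assumptions made for the example.

```python
def make_dummies(values, baseline):
    """Return {level: [0/1, ...]} for every observed level except the baseline."""
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError("baseline must be one of the observed levels")
    # One 0/1 column per non-baseline level: (number of levels - 1) columns.
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }


# Hypothetical sample: three levels, so two dummy columns are produced.
styles = ["ranch", "condo", "split level", "condo", "ranch"]
dummies = make_dummies(styles, baseline="condo")
for level, column in dummies.items():
    print(level, column)
```

The omitted level is the one for which every dummy equals 0, so its effect is absorbed into the regression intercept.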
Dummy-Variable Model Example
(with 2 Levels)
Let:
y = pie sales
x1 = price
x2 = holiday (x2 = 1 if a holiday occurred during the week; x2 = 0 if there was no holiday that week)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
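Dummy-Variable Model Example
(with 2 Levels) Continued
Holiday ($x_2 = 1$): $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2(1) = (\hat\beta_0 + \hat\beta_2) + \hat\beta_1 x_1$
No holiday ($x_2 = 0$): $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2(0) = \hat\beta_0 + \hat\beta_1 x_1$
[Figure: y (sales) against x1 (price), showing two parallel lines. The Holiday line has intercept $\hat\beta_0 + \hat\beta_2$ and the No Holiday line has intercept $\hat\beta_0$ (different intercepts); both lines have slope $\hat\beta_1$ (same slope).]
If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales.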
Interpretation of the Dummy Variable
Coefficient (with 2 Levels)
Example: $\widehat{\text{Sales}} = 300 - 30(\text{Price}) + 15(\text{Holiday})$
Sales: number of pies sold per week
Price: pie price in dollars
Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred
$\hat\beta_2 = 15$: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price
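To make the interpretation concrete, the fitted equation Sales = 300 - 30(Price) + 15(Holiday) can be evaluated at one price with and without a holiday. The sketch below is only an illustration: the function name `predicted_sales` and the price of 5 dollars are assumptions, not values from the slides.

```python
def predicted_sales(price, holiday):
    """Estimated weekly pie sales from the fitted equation on this slide."""
    return 300 - 30 * price + 15 * holiday


price = 5.0  # illustrative price in dollars (an assumption)
no_holiday = predicted_sales(price, holiday=0)
with_holiday = predicted_sales(price, holiday=1)

# The gap between the two predictions is the dummy coefficient, at any price.
print(no_holiday, with_holiday, with_holiday - no_holiday)
```

Because the dummy only shifts the intercept, the difference is 15 pies regardless of the price chosen.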
Dummy-Variable Models
(more than 2 Levels)
• The number of dummy variables is one less than
the number of levels
• Example:
y = house price ; x1 = square feet
• The style of the house is also thought to matter:
Style = ranch, split level, condo
Three levels, so two dummy
variables are needed
Dummy-Variable Models
(more than 2 Levels) Continued
Let the default category be “condo”:
$x_2 = 1$ if ranch, $0$ if not
$x_3 = 1$ if split level, $0$ if not
$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$
$\hat\beta_2$ shows the impact on price if the house is a ranch style, compared to a condo
$\hat\beta_3$ shows the impact on price if the house is a split level style, compared to a condo
Interpreting the Dummy Variable
Coefficients (with 3 Levels)
Suppose the estimated equation is
$\hat{y} = 20.43 + 0.045x_1 + 23.53x_2 + 18.84x_3$
For a condo ($x_2 = x_3 = 0$): $\hat{y} = 20.43 + 0.045x_1$
For a ranch ($x_2 = 1$, $x_3 = 0$): $\hat{y} = 20.43 + 0.045x_1 + 23.53$
For a split level ($x_2 = 0$, $x_3 = 1$): $\hat{y} = 20.43 + 0.045x_1 + 18.84$
All three equations have the same slope.
With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a condo, and the intercept for a ranch is 20.43 + 23.53 = 43.96.
With the same square feet, a split level will have an estimated average price of 18.84 thousand dollars more than a condo, and the intercept for a split level is 20.43 + 18.84 = 39.27.
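The three style-specific equations can be checked numerically. The sketch below plugs the coefficients from the estimated equation (20.43, 0.045, 23.53, 18.84) into a small function; the function name `predicted_price` and the 2,000 square foot house are assumptions chosen for illustration.

```python
B0, B_SQFT, B_RANCH, B_SPLIT = 20.43, 0.045, 23.53, 18.84


def predicted_price(sqft, style):
    """Estimated price in thousands of dollars; condo is the omitted category."""
    x2 = 1 if style == "ranch" else 0
    x3 = 1 if style == "split level" else 0
    return B0 + B_SQFT * sqft + B_RANCH * x2 + B_SPLIT * x3


for style in ("condo", "ranch", "split level"):
    intercept = predicted_price(0, style)        # value at zero square feet
    at_2000 = predicted_price(2000, style)       # hypothetical 2,000 sq ft house
    print(style, round(intercept, 2), round(at_2000, 2))
```

Setting square feet to 0 returns each style's intercept, and the gap between any two styles is the same at every size because the slope on square feet is shared.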
Excel Example
What type of relationship exists between energy use per capita and GDP per capita? The initial regression is as follows:
On average, if GDP per capita increases by $1,000 (US dollars), energy consumption per capita increases by 0.07 tons. This is statistically significant at the 1% level.
Scatter Plots of this Relationship for Europe,
North America, and South America
[Figure: three scatter plots of Energy per Capita (tons) vs. GDP per Capita ($1000 of US dollars), one panel each for Europe, North America, and South America]
Are the intercepts the same for these three locations?
Excel Example
Are the intercepts different between Europe and North America with South America as the omitted group?
On average, if GDP per capita increases by $1,000 (US dollars), energy consumption per capita increases by 0.07 tons. This is statistically significant at the 10% level.
The dummy variables for Europe and North America are not statistically significant at the 10% level, so their intercepts are not statistically different from South America's.
Construct and Use Interaction Effects
Interaction effects are the product of two
different independent variables.
We are first going to consider interaction
effects between a quantitative variable and a
dummy variable.
This type of interaction effect changes the
slope of the quantitative variable for the
various levels of the dummy variable.
Interaction Regression Model
Worksheet
Case, i    yi    x1i    x2i    x1i x2i
1          1     1      1      1
2          4     8      0      0
3          1     3      0      0
4          3     5      1      5
:          :     :      :      :
Multiply x1 by x2 to get x1x2, then run the regression with y, x1, x2, and x1x2.
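The worksheet step can be sketched in Python. Reading the worksheet columns as y = 1, 4, 1, 3; x1 = 1, 8, 3, 5; and x2 = 1, 0, 0, 1, the interaction column is the row-by-row product of x1 and x2. The variable names are assumptions for illustration.

```python
# First four cases from the worksheet.
y = [1, 4, 1, 3]
x1 = [1, 8, 3, 5]
x2 = [1, 0, 0, 1]

# Interaction term: multiply x1 by x2 case by case.
x1x2 = [a * b for a, b in zip(x1, x2)]

# Rows ready to pass to a regression routine as (y, x1, x2, x1x2).
rows = list(zip(y, x1, x2, x1x2))
for case, row in enumerate(rows, start=1):
    print(case, row)
```

When x2 is a 0/1 dummy, the product equals x1 for cases where the dummy is 1 and 0 otherwise, which is what lets the slope differ between the two groups.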
Consider the price of the house with
three levels of the dummy variable
Let the default category be “condo”; x1 is square feet, x2 is 1 if ranch and 0 if not, and x3 is 1 if split level and 0 if not.
$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3 + \hat\beta_4 x_1 x_2 + \hat\beta_5 x_1 x_3$
$\hat\beta_2$ shows the change in the intercept if the house is a ranch style, compared to a condo
$\hat\beta_3$ shows the change in the intercept if the house is a split level style, compared to a condo
$\hat\beta_4$ shows the change in the slope on square feet if the house is a ranch style, compared to a condo
$\hat\beta_5$ shows the change in the slope on square feet if the house is a split level style, compared to a condo
Interaction Term Worksheet
Suppose the estimated equation is
ŷ  18.30  98x1  22.44x 2  16.38x 3  45x1x 2  32x1x 3
Visual Depiction of Interaction Terms
with Dummy Variables
Excel Example
Are the slopes and intercepts different between Europe and North America with South America as the omitted group?
On average, if GDP per capita increases by $1,000 (US dollars), energy consumption per capita increases by 0.07 tons. This is statistically significant at the 10% level.
The dummy variables for Europe and North America are not statistically different from South America at the 10% level.
Control for Nonlinear Relationships
• The relationship between the dependent variable and an independent variable may not be linear
• Useful when the scatter diagram indicates a nonlinear relationship
• Example: Quadratic model
– $y = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \varepsilon$
– The second independent variable is the square of the first variable
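Because the squared term is just another column, a quadratic model can be estimated with ordinary least squares. The sketch below is a minimal pure-Python illustration, not the procedure used for the slides: it builds the columns 1, x, and x squared, then solves the normal equations by Gaussian elimination. The data are synthetic and noise-free, generated from hypothetical coefficients (2, 1.5, -0.5), so the fit should recover them.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Design matrix: intercept, x, and the square of x (the second variable).
    X = [[1.0, xi, xi ** 2] for xi in x]
    k = 3
    # Build X'X and X'y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    M = [A[i] + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (M[i][k] - s) / M[i][i]
    return beta


# Synthetic, noise-free data from hypothetical coefficients (an assumption).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [2 + 1.5 * xi - 0.5 * xi ** 2 for xi in x]
b0, b1, b2 = fit_quadratic(x, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

In practice a statistics package would also report standard errors and t-statistics; this sketch only shows that the quadratic term enters the regression like any other independent variable.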
Polynomial Regression Model
General form:
$y = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \cdots + \beta_p x_j^p + \varepsilon$
• where:
β0 = Population regression constant
βi = Population regression coefficient for variable xj : j = 1, 2, …, k
p = Order of the polynomial
ε = Model error
If p = 2 the model is a quadratic model:
$y = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \varepsilon$
Linear vs. Nonlinear Fit
[Figure: two panels, each showing a fit of y against x above a plot of its residuals against x]
Linear fit does not give random residuals
Nonlinear fit gives random residuals
Quadratic Regression Model
$y = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \varepsilon$
Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: four panels of y against x1, one for each sign combination: β1 < 0, β2 > 0; β1 > 0, β2 > 0; β1 < 0, β2 < 0; β1 > 0, β2 < 0]
β1 = the coefficient of the linear term
β2 = the coefficient of the squared term
Marginal Effect for the Quadratic
Regression Model
$\hat{y} = \hat\beta_0 + \hat\beta_1 x_j + \hat\beta_2 x_j^2$
How does a one unit increase in xj affect the dependent variable y (the marginal effect)? This is just the partial derivative of ŷ with respect to xj:
$\frac{\partial \hat{y}}{\partial x_j} = \hat\beta_1 + 2\hat\beta_2 x_j$
Notice that the effect that xj has on y changes depending on the value of xj, and this should be evaluated at $x_j - 1$ (the value before the one-unit increase).
Illustration of the Marginal Effect that xj
has on y
The marginal effect is the slope of a line tangent to the curve
At x1j the marginal effect is positive
At x2j the marginal effect is negative
Empirical Example of the Quadratic Effect:
Utility Bill vs. Temperature
[Figure: scatter plot titled “Average Bill vs. Average Monthly Temperature”, with Average Bill ($60.00 to $160.00) on the vertical axis and Average Monthly Temperature (35 to 95) on the horizontal axis]
Utility Bill vs. Temperature – Simple
Linear Regression
Even though the scatter plot shows a clear
relationship between utility bill and
temperature, there is no linear
relationship between these two variables.
Utility Bill vs. Temperature – Quadratic
Regression
$\widehat{\text{UtilityBill}} = 484.12 - 12.08\,\text{temp} + 0.09\,\text{temp}^2$
When a quadratic relationship is fit between utility bill and monthly temperature, the linear and quadratic terms are both statistically significant at the 1% level.
Utility Bill vs. Temperature – Quadratic
Regression Interpretation
$\widehat{\text{UtilityBill}} = 484.12 - 12.08\,\text{temp} + 0.09\,\text{temp}^2$
The marginal effect is $\frac{\partial\,\widehat{\text{UtilityBill}}}{\partial\,\text{temp}} = -12.08 + 2(0.09)\,\text{temp}$
The marginal effect at a temperature of 40 (evaluated at 39) is
$-12.08 + 2(0.09)(39) = -12.08 + 7.02 = -5.06$
which means that if temperature increases from 39 to 40 degrees then the utility bill decreases by $5.06.
The marginal effect at a temperature of 80 (evaluated at 79) is
$-12.08 + 2(0.09)(79) = -12.08 + 14.22 = 2.14$
which means that if temperature increases from 79 to 80 degrees then the utility bill increases by $2.14.
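The two marginal effects can be reproduced with a short Python sketch, using the coefficients 484.12, -12.08, and 0.09 from the fitted quadratic. The function names are assumptions for illustration. The sketch also prints the exact one-degree change in the fitted bill for comparison: for a quadratic, that change equals the derivative at the starting value plus the squared-term coefficient, so the derivative is a close approximation rather than an exact figure.

```python
B0, B1, B2 = 484.12, -12.08, 0.09


def bill(temp):
    """Fitted utility bill at a given average monthly temperature."""
    return B0 + B1 * temp + B2 * temp ** 2


def marginal_effect(temp):
    """Derivative of the fitted bill with respect to temperature."""
    return B1 + 2 * B2 * temp


for start in (39, 79):
    slope = marginal_effect(start)
    actual = bill(start + 1) - bill(start)  # exact change over one degree
    print(f"{start} -> {start + 1}: derivative {slope:.2f}, actual change {actual:.2f}")
```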
Finding Where the Quadratic Function
Reaches a Maximum (or Minimum)
Method: Set the first derivative of the regression equal to 0 and solve for xj:
$\frac{\partial \hat{y}}{\partial x_j} = \hat\beta_1 + 2\hat\beta_2 x_j = 0$, or $x_j = \frac{-\hat\beta_1}{2\hat\beta_2}$
Using the utility bill example, the function reaches a minimum at
$\text{temp} = \frac{-(-12.08)}{2(0.09)} = 67.11$
or at a temperature of 67.11 degrees.
The function will reach a minimum if $\hat\beta_2$ is positive and the function will reach a maximum if $\hat\beta_2$ is negative.
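The turning-point rule can be written as a small helper. This is a sketch under the assumption that the fitted model is a simple quadratic in one variable; the function name `turning_point` is hypothetical, and the coefficients -12.08 and 0.09 are the utility bill estimates.

```python
def turning_point(b1, b2):
    """Return (x*, kind) where the fitted quadratic b0 + b1*x + b2*x**2 turns."""
    if b2 == 0:
        raise ValueError("no turning point when the squared-term coefficient is zero")
    x_star = -b1 / (2 * b2)
    # A positive squared-term coefficient opens upward, giving a minimum.
    kind = "minimum" if b2 > 0 else "maximum"
    return x_star, kind


temp_star, kind = turning_point(-12.08, 0.09)
print(round(temp_star, 2), kind)
```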
Testing for Significance: Quadratic
Model
• Test for Overall Relationship between y and xj (test if the two parameters are jointly equal to 0).
– Use an F-test with the Hypothesis
H0: β1 = β2 = 0 (xj does not affect y)
H1: not H0 (xj affects y)
• Testing the Quadratic Effect
– Compare the quadratic model $y = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \varepsilon$ with the linear model $y = \beta_0 + \beta_1 x_j + \varepsilon$
– Use a t-test with the Hypothesis
H0: β2 = 0 (No 2nd order polynomial term)
HA: β2 ≠ 0 (2nd order polynomial term is needed)
Higher Order Models
If p = 3 the model is a cubic form:
$y = \beta_0 + \beta_1 x_j + \beta_2 x_j^2 + \beta_3 x_j^3 + \varepsilon$
Interaction Effects
• Hypothesizes interaction between pairs of x variables
– Response to one x variable varies at different levels of another x variable
• Contains two-way cross product terms
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$
Basic terms: $\beta_1 x_1$, $\beta_2 x_1^2$, $\beta_3 x_3$
Interactive terms: $\beta_4 x_1 x_2$, $\beta_5 x_1^2 x_2$
Effect of Interaction
• Given:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
• Without interaction term, effect of x1 on y is measured by β1
• With interaction term, effect of x1 on y is measured by β1 + β3 x2
• Effect changes as x2 increases
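A short sketch shows how the effect of x1 moves with x2 once the interaction term is included. The coefficient values (β1 = 2.0, β3 = 0.5) and the x2 values are hypothetical, chosen only to illustrate the formula β1 + β3 x2.

```python
def effect_of_x1(b1, b3, x2):
    """Change in y for a one-unit increase in x1, holding x2 fixed."""
    return b1 + b3 * x2


b1, b3 = 2.0, 0.5  # hypothetical coefficients (assumptions, not estimates)
effects = {x2: effect_of_x1(b1, b3, x2) for x2 in (0, 1, 4)}
for x2, effect in effects.items():
    print(f"x2 = {x2}: effect of x1 on y = {effect}")
```

With β3 = 0 the function returns β1 for every x2, which is the no-interaction case.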
Evaluating Presence of Interaction
• Hypothesize interaction between pairs of independent variables
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
• Hypotheses:
– H0: β3 = 0 (no interaction between x1 and x2)
– HA: β3 ≠ 0 (x1 interacts with x2)
Estimate Marginal Effects as Percent
Changes and Elasticities
The models are estimated taking natural
logarithms of the dependent variable, the
independent variable, or both.
- Log-Linear Model
- Log-Log Model
Log – Linear Model
The population regression function is specified as
$\ln y = \beta_0 + \beta_1 x_1 + \varepsilon$
and β1 is interpreted as, “on average, if x1 increases by 1 unit then y increases by β1 × 100%.”
Note that this is only an approximation because the natural log is a nonlinear function.
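How rough the approximation is depends on the size of the coefficient. In a log-linear model a one-unit increase in x1 multiplies y by exp(β1), so the exact percent change is 100 × (exp(β1) - 1). The sketch below compares that with the β1 × 100% rule; the function names and the list of coefficient values are illustrative assumptions.

```python
import math


def approx_pct(b1):
    """Rule-of-thumb percent change in y for a one-unit increase in x1."""
    return 100 * b1


def exact_pct(b1):
    """Exact percent change implied by ln(y) = b0 + b1*x1."""
    return 100 * (math.exp(b1) - 1)


for b1 in (0.01, 0.026, 0.10, 0.50):  # illustrative coefficient values
    print(f"b1 = {b1}: approx {approx_pct(b1):.2f}%, exact {exact_pct(b1):.2f}%")
```

The two figures are close for small coefficients and drift apart as the coefficient grows, which matters most for large dummy-variable coefficients.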
Empirical Example of the Log – Linear
Model
The dependent variable is the natural log of energy per capita
This slope coefficient on gdppc is interpreted as, “on average, if GDP per capita increases by $1,000 then energy consumption per capita goes up by (0.026)(100%) or 2.6%.” This coefficient is statistically significant at the 1% level.
Empirical Example of the Log – Linear Model with
Dummy Variables
The dependent variable is the natural log of energy per capita with South
America as the omitted group
The Europe dummy variable coefficient is
interpreted as “on average energy consumption per
capita is 50.5% higher in Europe than South
America.” The North America dummy variable
coefficient is interpreted as “on average energy
consumption per capita is 56.6% higher in North
America than South America.” Europe is statistically
insignificant while North America is marginally
significant (significant at the 10% level).
Log – Log Model
The population regression function is specified as
$\ln y = \beta_0 + \beta_1 \ln x_1 + \varepsilon$
and β1 is interpreted as “on average, if x1 increases by 1 percent then y increases by β1 percent.”
In the log-log model β1 is an elasticity.
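The elasticity reading can be checked against the exact change implied by the model. Under ln y = β0 + β1 ln x1, scaling x1 by a factor multiplies y by that factor raised to the power β1. The sketch below uses an illustrative elasticity of 0.69; the function name and the percent changes tried are assumptions.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact percent change in y implied by a log-log model."""
    ratio = (1 + pct_change_in_x / 100) ** elasticity
    return 100 * (ratio - 1)


elasticity = 0.69  # illustrative value (an assumption for this sketch)
for pct in (1, 10):
    exact = pct_change_in_y(elasticity, pct)
    print(f"x up {pct}%: y up about {exact:.2f}% (rule of thumb: {elasticity * pct:.2f}%)")
```

For a 1% change the rule of thumb and the exact figure nearly coincide; for larger changes the exact figure falls somewhat below elasticity times the percent change when the elasticity is between 0 and 1.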
Empirical Example of the Log – Log Model
The dependent variable is the natural log of energy per capita
This slope coefficient on lngdppc is interpreted as, “on average, if GDP per capita increases by 1% then energy consumption per capita goes up by 0.69%.” This coefficient is statistically significant at the 1% level.
Empirical Example of the Log – Log Model with
Dummy Variables
The dependent variable is the natural log of energy per capita with South
America as the omitted group
The Europe dummy variable coefficient is
interpreted as “on average energy consumption per
capita is 9.3% lower in Europe than South America.”
The North America dummy variable coefficient is
interpreted as “on average energy consumption per
capita is 41.5% higher in North America than South
America.” Neither of these is statistically
significant at the 10% level.