Transcript Power 16
Power 16
1
Review
Post-Midterm
Cumulative
2
Projects
3
Logistics
Put the PowerPoint slide show on a high-density
floppy disk, or e-mail it as an
attachment, for a WINTEL machine.
E-mail the slide show to [email protected]
as a PowerPoint attachment
4
Assignments
1. Project choice
2. Data Retrieval
3. Statistical Analysis
4. PowerPoint Presentation
5. Executive Summary
6. Technical Appendix
7. Graphics
Power_13
5
PowerPoint Presentations: Member 4
1. Introduction: Members 1, 2, 3
What
Why
How
2. Executive Summary: Member 5
3. Exploratory Data Analysis: Member 3
4. Descriptive Statistics: Member 3
5. Statistical Analysis: Member 3
6. Conclusions: Members 3 & 5
7. Technical Appendix: Table of Contents,
Member 6
6
Executive Summary and Technical
Appendix
7
I.
Your report should have an executive summary of one to one
and a half pages that summarizes your findings in words for a nontechnical reader. It should explain the problem being examined
from an economic perspective, i.e. it should motivate interest in the
issue on the part of the reader. Your report should explain how you
are investigating the issue, in simple language. It should explain
why you are approaching the problem in this particular fashion.
Your executive report should explain the economic importance of
your findings.
The technical details of your findings you can attach as an
appendix.
8
Technical Appendix
Table of Contents
Spreadsheet of data used and sources or
if extensive, a subsample of the data
Descriptive Statistics and Histograms for
the variables in the study
If time series data, a plot of each variable
against time
If relevant, a plot of the dependent variable vs.
each of the explanatory variables
9
Technical Appendix (Cont.)
Statistical results, for example a regression
Plot of the actual, fitted, and error series, and other
diagnostics
Brief summary of the conclusions,
meanings drawn from the exploratory,
descriptive, and statistical analysis.
10
Post-Midterm Review
Project I: Power 16
Contingency Table Analysis: Power 14, Lab 8
ANOVA: Power 15, Lab 9
Survival Analysis: Power 12, Power 11, Lab 7
Multivariate Regression: Power 11, Lab 6
11
Slide Show
Challenger disaster
12
Project I
Number of O-Rings Failing On Launch i:
y_i(#) = a + b*temp_i + e_i
OLS is biased because of the zeros, even if the equation is divided by 6
Two Ways to Proceed
Tobit, non-linear estimation: y_i(#) = a + b*temp_i + e_i
Bernoulli variable: probability models
Probability Models: y_i(0,1) = a + b*temp_i + e_i
13
Project I (Cont.)
Probability Models: y_i(0,1) = a + b*temp_i + e_i
OLS, Linear Probability Model, linear
approximation to the sigmoid
Probit, non-linear estimate of the sigmoid
Logit, non-linear estimate of the sigmoid
Significant Dependence on Temperature
t-test (or z-test) on slope, H0 : b=0
F-test
Wald test
14
Project I (Cont.)
Plots of Number or Probability Vs Temp.
Label the axes
Answer all parts, a-f
The most frequent sins
Did not explicitly address significance
Did not answer b, 66 F: all launches at lower
temperatures had one or more o-ring failures
Did not execute c, estimate the linear probability
model
15
Challenger Disaster
Failure of O-rings that sealed grooves on
the booster rockets
Was there any relationship between o-ring failure and temperature?
Engineers knew that the rubber o-rings
hardened and were less flexible at low
temperatures
But was there launch data that showed a
problem?
16
Challenger Disaster
What: Was there a relationship between
launch temperature and o-ring failure prior
to the Challenger disaster?
Why: Should the launch have proceeded?
How: Analyze the relationship between
launch temperature and o-ring failure
17
Launches Before Challenger
Data
number of o-rings that failed
launch temperature
18
o-rings   temperature
3         53
1         57
1         58
1         63
0         66
0         67
0         67
0         67
0         68
0         69
1         70
19
o-rings   temperature
1         70
0         70
0         70
0         72
0         73
2         75
0         75
0         76
0         76
0         78
0         79
20
o-rings   temperature
0         80
0         81
21
Exploratory Analysis
Launches where there was a problem
22
o-rings   temperature
1         58
1         57
1         70
1         63
1         70
2         75
3         53
23
[Plot: ORINGS versus TEMP]
Exploratory Analysis
All Launches
Plot of failures per observation versus temperature range shows
temperature dependence
Mean temperature for the 7 launches with o-ring failures was
lower, 63.7 F, than for the 17 launches without o-ring failures,
72.6 F
Contingency table analysis
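The comparison of mean temperatures can be checked directly from the launch data on the earlier slides. The sketch below is a minimal illustration; the variable names are assumptions introduced here, and the (temperature, failures) pairs are transcribed from the data slides.

```python
# (temperature in F, number of failed o-rings) for the 24 launches,
# transcribed from the data slides.
launches = [
    (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0), (67, 0),
    (68, 0), (69, 0), (70, 1), (70, 1), (70, 0), (70, 0), (72, 0), (73, 0),
    (75, 2), (75, 0), (76, 0), (76, 0), (78, 0), (79, 0), (80, 0), (81, 0),
]


def mean(values):
    return sum(values) / len(values)


temps_fail = [t for t, k in launches if k > 0]   # launches with a failure
temps_ok = [t for t, k in launches if k == 0]    # launches without a failure

mean_fail = mean(temps_fail)
mean_ok = mean(temps_ok)

print(len(temps_fail), round(mean_fail, 1))  # 7 63.7
print(len(temps_ok), round(mean_ok, 1))      # 17 72.6
```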
25
Launches and O-Ring Failures (Yes/No)
             Fail: Yes   Fail: No   Column Totals
53-62 F      3           0          3
63-71 F      3           8          11
72-81 F      1           9          10
Row Totals   7           17         24
26
Launches and O-Ring Failures (Yes/No) Expected/Observed
             Fail: Yes   Fail: No   Column Totals
53-62 F      0.875/3     2.125/0    3
63-71 F      3.208/3     7.792/8    11
72-81 F      2.917/1     7.083/9    10
Row Totals   7           17         24
27
Launches and O-Ring Failures: Chi-Square, 2 dof = 9.08, crit(α = 0.05) = 6
             Fail: Yes   Fail: No   Column Totals
53-62 F      5.16        2.125      3
63-71 F      0.013       0.005      11
72-81 F      1.26        0.519      10
Row Totals   7           17         24
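The chi-square statistic on this slide follows from the observed counts: each expected count is (row total × column total)/n, and each cell contributes (observed − expected)²/expected. A minimal sketch, with names chosen here for illustration:

```python
# Observed counts by temperature range:
# (launches with a failure, launches without a failure).
observed = {
    "53-62 F": (3, 0),
    "63-71 F": (3, 8),
    "72-81 F": (1, 9),
}

n = sum(sum(row) for row in observed.values())
fail_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

chi_square = 0.0
for label, row in observed.items():
    range_total = sum(row)
    for j, count in enumerate(row):
        expected = range_total * fail_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

dof = (len(observed) - 1) * (2 - 1)
critical_value = 5.991  # chi-square, 2 dof, 5% level (rounded to 6 on the slide)

print(round(chi_square, 2), dof)    # 9.08 2
print(chi_square > critical_value)  # True: reject independence at the 5% level
```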
28
Number of O-ring Failures Vs. Temperature
[Plot: ORINGS versus TEMP]
Probability Models
Logit extrapolated to 31 F:
Probit extrapolated to 31 F:
[Plot: Probability versus Temperature; series: Bernoulli, LPM Fitted, Probit Fitted]
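As an illustration of how a logit extrapolation to 31 F can be produced, the sketch below fits a two-parameter logit to the launch data by Newton's method in plain Python. This is an assumed stand-in for whatever software produced the slide's figures; centering temperature at 70 F and the variable names are choices made here, not part of the original.

```python
import math

# (temperature in F, number of failed o-rings) for the 24 launches.
launches = [
    (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0), (67, 0),
    (68, 0), (69, 0), (70, 1), (70, 1), (70, 0), (70, 0), (72, 0), (73, 0),
    (75, 2), (75, 0), (76, 0), (76, 0), (78, 0), (79, 0), (80, 0), (81, 0),
]
x = [t - 70.0 for t, _ in launches]                # temperature centered at 70 F
y = [1.0 if k > 0 else 0.0 for _, k in launches]   # Bernoulli: any failure


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


a, b = 0.0, 0.0  # intercept (log-odds at 70 F) and slope per degree F
for _ in range(25):
    p = [sigmoid(a + b * xi) for xi in x]
    # Gradient of the log-likelihood.
    g_a = sum(yi - pi for yi, pi in zip(y, p))
    g_b = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
    # Information matrix (negative Hessian), 2 x 2.
    w = [pi * (1.0 - pi) for pi in p]
    h_aa = sum(w)
    h_ab = sum(wi * xi for wi, xi in zip(w, x))
    h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h_aa * h_bb - h_ab * h_ab
    # Newton step: solve the 2 x 2 system for the parameter update.
    a += (h_bb * g_a - h_ab * g_b) / det
    b += (h_aa * g_b - h_ab * g_a) / det

p_31 = sigmoid(a + b * (31.0 - 70.0))
print("slope:", round(b, 3))
print("P(one or more failures at 31 F):", round(p_31, 3))
```

The fitted slope is negative, so the fitted probability of one or more failures rises as temperature falls.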
30
Number of Failed O-Rings
Extrapolating OLS to 31 F:
Extrapolating Tobit to 31 F:
[Plot: Number versus Temperature; series: Number of Failed O-Rings, OLS Fitted, Tobit Fitted]
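The two OLS extrapolations (number of failed o-rings, and the linear probability model for the 0/1 variable) need only the closed-form least-squares line. The sketch below is a minimal illustration in plain Python; the helper `ols_line` is a name introduced here, and Tobit is not attempted.

```python
# (temperature in F, number of failed o-rings) for the 24 launches.
launches = [
    (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0), (67, 0),
    (68, 0), (69, 0), (70, 1), (70, 1), (70, 0), (70, 0), (72, 0), (73, 0),
    (75, 2), (75, 0), (76, 0), (76, 0), (78, 0), (79, 0), (80, 0), (81, 0),
]
temps = [t for t, _ in launches]


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Number of failed o-rings regressed on temperature.
a_num, b_num = ols_line(temps, [k for _, k in launches])
# Linear probability model: 0/1 indicator of any failure on temperature.
a_lpm, b_lpm = ols_line(temps, [1 if k > 0 else 0 for _, k in launches])

number_at_31 = a_num + b_num * 31
prob_at_31 = a_lpm + b_lpm * 31

print(round(number_at_31, 2))  # 2.79
print(round(prob_at_31, 2))    # 1.72
```

The LPM fitted value at 31 F is above 1, which is not a valid probability; this is the usual weakness of the linear approximation when extrapolating far outside the data.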
31
Conclusions
From extrapolating the probability models
to 31 F, Linear Probability, Probit, or
Logit, there was a high probability of one
or more o-rings failing
From extrapolating the number of o-rings failing to 31 F, OLS or Tobit, 3 or
more o-rings would fail.
There had been only one launch out of
24 where as many as 3 o-rings had
failed.
Decision theory argument: expected
cost/benefit ratio:
32
Conclusions
Decision theory argument: expected
cost/benefit ratio:
33
Ways to Analyze Challenger
Difference in mean temperatures for failures and
successes
Difference in probability of one or more o-ring
failures for high and low temperature ranges
Probability models: LPM (OLS), probit, logit
Number of o-ring failures per launch Vs. Temp.
OLS, Tobit
Contingency table analysis
ANOVA
34
Contingency Table Analysis
Challenger example
35
Launches and O-Ring Failures (Yes/No)
             Fail: Yes   Fail: No   Column Totals
53-62 F      3           0          3
63-71 F      3           8          11
72-81 F      1           9          10
Row Totals   7           17         24
36
ANOVA and O-Rings
Probability one or more o-rings fail
Low temp: 53-62 degrees
Medium temp: 63-71 degrees
High temp: 72-81 degrees
Average number of o-rings failing per launch
Low temp: 53-62 degrees
Medium temp: 63-71 degrees
High temp: 72-81 degrees
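Both ANOVA setups can be sketched from the launch data: group the launches into the three temperature ranges, then compare group means with the one-way F statistic (between-group mean square over within-group mean square). The code below is a minimal illustration; the function and variable names are assumptions introduced here.

```python
# (temperature in F, number of failed o-rings) for the 24 launches.
launches = [
    (53, 3), (57, 1), (58, 1), (63, 1), (66, 0), (67, 0), (67, 0), (67, 0),
    (68, 0), (69, 0), (70, 1), (70, 1), (70, 0), (70, 0), (72, 0), (73, 0),
    (75, 2), (75, 0), (76, 0), (76, 0), (78, 0), (79, 0), (80, 0), (81, 0),
]


def temp_group(t):
    if t <= 62:
        return "low"
    if t <= 71:
        return "medium"
    return "high"


def one_way_anova(groups):
    """Return (group means, F statistic) for a dict of label -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2
                     for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2
                    for g, v in groups.items() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return means, (ss_between / df_between) / (ss_within / df_within)


any_fail = {"low": [], "medium": [], "high": []}
num_fail = {"low": [], "medium": [], "high": []}
for t, k in launches:
    any_fail[temp_group(t)].append(1 if k > 0 else 0)
    num_fail[temp_group(t)].append(k)

prob_means, f_prob = one_way_anova(any_fail)
count_means, f_count = one_way_anova(num_fail)

print(prob_means, round(f_prob, 2))
print(count_means, round(f_count, 2))
```

Each F statistic has 2 and 21 degrees of freedom (3 groups, 24 launches).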
37
Probability one or more o-rings fails
38
Number of o-rings failing per launch
39
40
Outline
ANOVA and Regression
(Non-Parametric Statistics)
(Goodman Log-Linear Model)
41
Anova and Regression: One-Way
Salesaj = c(1)*convenience + c(2)*quality + c(3)*price + e
E[Salesaj | convenience=1, quality=0, price=0] = c(1) = mean for city(1)
c(1) = mean for city(1) (convenience)
c(2) = mean for city(2) (quality)
c(3) = mean for city(3) (price)
Test the null hypothesis that the means are equal
using a Wald test: c(1) = c(2) = c(3)
42
One-Way ANOVA and Regression
Regression Coefficients are the City Means; F statistic
43
Anova and Regression: One-Way
Alternative Specification
Salesaj = c(1) + c(2)*convenience + c(3)*quality + e
E[Salesaj | convenience=0, quality=0] = c(1) = mean for city(3) (price, the omitted one)
E[Salesaj | convenience=1, quality=0] = c(1) + c(2) = mean for city(1) (convenience)
c(1) = mean for city(3), the omitted city
c(2) = mean for city(1) minus mean for city(3)
Test that the mean for city(1) = mean for city(3)
using the t-statistic for c(2)
44
Anova and Regression: One-Way
Alternative Specification
Salesaj = c(1) + c(2)*convenience + c(3)*price + e
E[Salesaj | convenience=0, price=0] = c(1) = mean for city(2) (quality, the omitted one)
E[Salesaj | convenience=1, price=0] = c(1) + c(2) = mean for city(1) (convenience)
c(1) = mean for city(2), the omitted city
c(2) = mean for city(1) minus mean for city(2)
Test that the mean for city(1) = mean for city(2)
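The link between dummy-variable regression and group means can be seen on a toy example. The sales figures below are hypothetical, made up for illustration and not the apple juice data; `ols` is a small helper introduced here that solves the normal equations. With three dummies and no constant the coefficients are the city means; with a constant and one dummy omitted, the constant is the omitted city's mean and the other coefficients are differences from it.

```python
def ols(X, y):
    """Solve the normal equations (X'X) c = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for i in range(k):
        pivot = A[i][i]
        A[i] = [v / pivot for v in A[i]]
        for r in range(k):
            if r != i:
                factor = A[r][i]
                A[r] = [vr - factor * vi for vr, vi in zip(A[r], A[i])]
    return [A[i][k] for i in range(k)]


# Hypothetical sales by marketing strategy (one city per strategy).
sales = {
    "convenience": [10.0, 12.0, 14.0],  # mean 12
    "quality": [20.0, 22.0, 24.0],      # mean 22
    "price": [15.0, 17.0, 19.0],        # mean 17
}
strategies = list(sales)
y = [v for s in strategies for v in sales[s]]
labels = [s for s in strategies for _ in sales[s]]

# Specification 1: three dummies, no constant.
X1 = [[1.0 if lab == s else 0.0 for s in strategies] for lab in labels]
c_means = ols(X1, y)

# Specification 2: constant plus convenience and quality (price omitted).
X2 = [[1.0,
       1.0 if lab == "convenience" else 0.0,
       1.0 if lab == "quality" else 0.0] for lab in labels]
c_diff = ols(X2, y)

print([round(c, 6) for c in c_means])  # city means
print([round(c, 6) for c in c_diff])   # price mean, then differences from it
```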
45
ANOVA and Regression: Two-Way
Series of Regressions; Compare to
Table 11, Lecture 15
Salesaj = c(1) + c(2)*convenience + c(3)*quality + c(4)*television
+ c(5)*convenience*television + c(6)*quality*television + e,
SSR = 501,136.7
Salesaj = c(1) + c(2)*convenience + c(3)*quality + c(4)*television + e,
SSR = 502,746.3
Test for interaction effect: F(2, 54) =
[(502746.3 - 501136.7)/2]/(501136.7/54) =
46
Table of Two-Way ANOVA for Apple Juice Sales
Table 11: 2-Way ANOVA of Apple Juice Sales
Source of Variation               Sum of Squares         Degrees of Freedom     Mean Square
Explained (between treatments)    ESS =
  Strategy                        ESS(Strat) = 98838.6   (a-1) = 2              49419.3
  Medium                          ESS(Med) = 13172.0     (b-1) = 1              13172.0
  Interaction                     ESS(I) = 1609.6        (a-1)(b-1) = 2         804.8
Unexplained (within treatments)   USS = 501136.7         (n-ab) = 60 - 6 = 54   9280.3
Total                             TSS = 614756.98        (n-1) = 59
ANOVA and Regression: Two-Way
Series of Regressions
Salesaj = c(1) + c(2)*convenience + c(3)*quality + e, SSR = 515,918.3
Test for media effect: F(1, 54) = [(515918.3 - 502746.3)/1]/(501136.7/54) =
13172/9280.3 = 1.42
Salesaj = c(1) + e, SSR = 614,757
Test for strategy effect: F(2, 54) = [(614757 - 515918.3)/2]/(501136.7/54) =
(98838.7/2)/(9280.3) = 5.32
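Each test in this series of regressions has the same form: the drop in the sum of squared residuals when terms are added, divided by the number of added terms, over the mean squared error of the full model. A minimal sketch using the SSR values quoted on the slides; the function and variable names are introduced here for illustration.

```python
# Sums of squared residuals quoted on the slides.
ssr_full = 501136.7            # strategy, medium, and interaction terms
ssr_no_interaction = 502746.3  # strategy and medium
ssr_strategy_only = 515918.3   # strategy dummies only
ssr_constant_only = 614757.0   # constant only
df_error = 54                  # n - ab = 60 - 6

mse_full = ssr_full / df_error  # unexplained mean square of the full model


def f_stat(ssr_restricted, ssr_unrestricted, num_restrictions):
    """F statistic using the full-model mean squared error as denominator."""
    return ((ssr_restricted - ssr_unrestricted) / num_restrictions) / mse_full


f_interaction = f_stat(ssr_no_interaction, ssr_full, 2)
f_media = f_stat(ssr_strategy_only, ssr_no_interaction, 1)
f_strategy = f_stat(ssr_constant_only, ssr_strategy_only, 2)

print("interaction F(2, 54):", round(f_interaction, 3))
print("media F(1, 54):", round(f_media, 3))
print("strategy F(2, 54):", round(f_strategy, 3))
```

The media and strategy statistics agree with the 1.42 and 5.32 shown on the slide up to rounding.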
48
Survival Analysis
Density, f(t)
Cumulative distribution function, CDF, F(t)
Survivor Function, S(t) = 1-F(t)
Probability you failed up to time t* =F(t*)
Probability you survived longer than t*, S(t*)
Kaplan-Meier estimates:
(# at risk - # ending)/# at risk
Applications
Testing a new drug
49
Chemotherapy Drug Taxol
Current standard for ovarian cancer is
taxol and a platinate such as cisplatin
Previous standard was
cyclophosphamide and cisplatin
Kaplan-Meier Survival curves comparing
the two regimens
Lab 7: (# at risk - # ending)/# at risk
50
Taxol (Bristol-Myers Squibb)
interrupts cell division (mitosis)
It is a cyclical hydrocarbon
51
Top Panel: European, Canadian and Scottish, 342 at risk for Tc, 292 survived 1 year
Bottom Panel: Gynecological Oncology Group, 196 at risk for Tc, 168 survived 1 year
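The Lab 7 formula can be applied directly to the counts read from the two panels. The sketch below assumes, for illustration, that every patient who did not survive the year counts as "ending" in that interval; the function name is introduced here.

```python
def survival_fraction(at_risk, ending):
    """Kaplan-Meier factor for one interval: (# at risk - # ending) / # at risk."""
    return (at_risk - ending) / at_risk


# "Ending" is taken as the number at risk minus the number who survived 1 year.
top_panel = survival_fraction(342, 342 - 292)
bottom_panel = survival_fraction(196, 196 - 168)

print(round(top_panel, 3), round(bottom_panel, 3))  # 0.854 0.857
```

Over several intervals, the Kaplan-Meier survival estimate is the product of these interval factors.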
52
2003 Final
53
Nonparametric Statistics
What to do when the sample of
observations is not distributed normally?
54
3 Nonparametric Techniques
Wilcoxon Rank Sum Test for independent
samples
Signs Test for Matched Pairs: Rated Data
Data Analysis Plus
Eviews, Descriptive Statistics
Wilcoxon Signed Rank Sum Test for
Matched Pairs: Quantitative Data
Eviews
55
Wilcoxon Rank Sum Test for
Independent Samples
Testing the difference between the
means of two populations when they are
non-normal
A New Painkiller Vs. Aspirin, Xm17-02
56
Rating scheme
Score   Legend
5       Extremely Effective
4       Quite Effective
3       Somewhat Effective
2       Slightly Effective
1       Not At All Effective
57
Ratings
New Drug   Aspirin
3          4
5          1
4          3
3          2
2          4
5          1
1          3
4          4
5          2
3          2
3          2
5          4
5          3
5          4
58
Rank the 30 Ratings
30 total ratings for both samples
3 ratings of 1
5 ratings of 2
etc
59
Rating   Raw Rank   Rank/Ties
1        1          2
1        2          2
1        3          2
2        4          6
2        5          6
2        6          6
2        7          6
2        8          6
3        9          12
3        10         12
3        11         12
3        12         12
3        13         12
3        14         12
3        15         12
60
continued
Rating   Raw Rank   Rank/Ties
4        16         19.5
4        17         19.5
4        18         19.5
4        19         19.5
4        20         19.5
4        21         19.5
4        22         19.5
4        23         19.5
5        24         27
5        25         27
5        26         27
5        27         27
5        28         27
5        29         27
5        30         27
61
Drug Rate   Rank    Asp. Rate   Rank
3           12      4           19.5
5           27      1           2
4           19.5    3           12
3           12      2           6
2           6       4           19.5
5           27      1           2
1           2       3           12
4           19.5    4           19.5
5           27      2           6
3           12      2           6
3           12      2           6
5           27      4           19.5
5           27      3           12
5           27      4           19.5
4           19.5    5           27
Rank Sum    276.5   Rank Sum    188.5
62
Rank Sum, T
E(T) = n1*(n1 + n2 + 1)/2 = 15*31/2 = 232.5
VAR(T) = n1*n2*(n1 + n2 + 1)/12
VAR(T) = 15*15*31/12 = 581.25, s_T = 24.1
For sample sizes larger than 10, T is
approximately normal
Z = [T - E(T)]/s_T = (276.5 - 232.5)/24.1 =
1.83
Null hypothesis: the central
tendency for the two drugs is the same
Alternative hypothesis: the central tendency for
the new drug is greater than for aspirin
63
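The rank-sum calculation can be reproduced from the 15 ratings per drug listed in the Drug Rate and Asp. Rate columns. The sketch below is a minimal illustration in plain Python: it assigns tied ratings the average of the raw ranks they occupy, sums the new drug's ranks, and standardizes with the normal approximation. The function and variable names are introduced here.

```python
import math

# Ratings from the "Drug Rate" and "Asp. Rate" columns (15 each).
new_drug = [3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4]
aspirin = [4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5]


def average_ranks(values):
    """Map each distinct value to the average of the raw ranks it occupies."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1         # raw rank of the first tied value
        last = first + ordered.count(v) - 1  # raw rank of the last tied value
        ranks[v] = (first + last) / 2
    return ranks


ranks = average_ranks(new_drug + aspirin)
n1, n2 = len(new_drug), len(aspirin)

t_stat = sum(ranks[v] for v in new_drug)        # rank sum for the new drug
expected_t = n1 * (n1 + n2 + 1) / 2
sd_t = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (t_stat - expected_t) / sd_t

print(t_stat, expected_t, round(sd_t, 1), round(z, 2))  # 276.5 232.5 24.1 1.83
print(z > 1.645)  # True: reject the null at the 5% level, one-tailed
```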
Figure 1: One-Tailed Test, 5% Level, Normal Distribution
[Plot: FREQUENCY versus Z; the upper 5% tail lies beyond Z = 1.645]