Transcript Topic_10

Topic 10: Miscellaneous Topics
Outline
• Joint estimation of β0 and β1
• Multiplicity
• Regression through the origin
• Measurement error
• Inverse predictions
Joint Estimation of β0 and β1
• Confidence intervals are used for a single
parameter
• Confidence regions are used for two or more parameters
• The region for (β0, β1) defines a set of
lines…that form a band about the
estimated regression line (Topic 5)
• Since the estimators b0 and b1 are (jointly) Normal, the natural (i.e., smallest) confidence region for (β0, β1) is an ellipse (STAT 524)
• The text considers rectangles (KNNL 4.1), i.e., the region formed from the Cartesian product of two separate intervals
• Need to adjust the confidence level of each CI so that the region has the proper overall α level
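To make the ellipse-versus-rectangle comparison concrete, here is a minimal sketch, assuming numpy and scipy are available. It fits simulated (hypothetical) data and compares the area of the standard F-based joint confidence ellipse, the set of (β0, β1) with $(\mathbf{b}-\boldsymbol\beta)^\top X^\top X(\mathbf{b}-\boldsymbol\beta) \le 2\,\mathrm{MSE}\,F(1-\alpha; 2, n-2)$, against the Bonferroni rectangle with sides $b_0 \pm t_c\, s(b_0)$ and $b_1 \pm t_c\, s(b_1)$, $t_c = t(1-\alpha/4; n-2)$. The function name region_areas and the simulated values are made up for illustration.

```python
import numpy as np
from scipy import stats

def region_areas(x, y, alpha=0.05):
    """Areas of the F-based joint ellipse and the Bonferroni rectangle for (beta0, beta1)."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])      # design matrix with an intercept column
    xtx = X.T @ X
    b = np.linalg.solve(xtx, X.T @ y)         # least-squares estimates (b0, b1)
    resid = y - X @ b
    mse = resid @ resid / (n - 2)
    cov = mse * np.linalg.inv(xtx)            # estimated covariance matrix of (b0, b1)
    se = np.sqrt(np.diag(cov))                # s(b0), s(b1)

    # Ellipse {d : d' cov^{-1} d <= 2F}; its area is pi * 2F * sqrt(det(cov))
    f_crit = stats.f.ppf(1 - alpha, 2, n - 2)
    ellipse = np.pi * 2 * f_crit * np.sqrt(np.linalg.det(cov))

    # Bonferroni rectangle: side lengths 2 * t_c * s(b0) and 2 * t_c * s(b1)
    t_c = stats.t.ppf(1 - alpha / 4, n - 2)
    rectangle = (2 * t_c * se[0]) * (2 * t_c * se[1])
    return ellipse, rectangle

rng = np.random.default_rng(1)
x = rng.uniform(20, 120, size=25)                 # hypothetical X values
y = 60 + 3.5 * x + rng.normal(0, 50, size=25)     # hypothetical true line plus noise
ell, rect = region_areas(x, y)
print(f"ellipse area {ell:.2f}, rectangle area {rect:.2f}")
```

Both regions use the same covariance estimate. The ellipse comes out smaller, and the gap widens as b0 and b1 become more strongly correlated, since the rectangle ignores that correlation.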
Bonferroni Correction
• We want the probability that both
intervals are correct to be ≥ 0.95
• Basic idea is an error budget
• Spend half on β0 and half on β1
• Since α = 0.05, we use α* = 0.025 for each CI (i.e., 97.5% CIs)
• For the joint region of (β0, β1), use
$b_1 \pm t_c\, s(b_1)$
$b_0 \pm t_c\, s(b_0)$
where $t_c = t(.9875; n-2)$
Note: .9875 = 1 – (.05)/(2*2)
Expanding on the Note
• We start with a 5% error budget.
• We have two intervals so we give
0.05/2=2.5% to each
• Each interval is two-sided so we
again divide by 2
• Thus 0.9875 = 1 – (.05)/(2*2)
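The error-budget arithmetic above generalizes to g intervals as 1 − α/(2g). A minimal sketch, assuming scipy is available; bonferroni_t is a hypothetical helper name and n = 25 is just an example sample size.

```python
from scipy import stats

def bonferroni_t(n, g=2, alpha=0.05):
    """Return (level, t_c) with level = 1 - alpha/(2g) and t_c = t(level; n - 2)."""
    level = 1 - alpha / (2 * g)   # split alpha over g intervals, then over two tails
    return level, stats.t.ppf(level, n - 2)

n = 25                              # hypothetical sample size
level, t_c = bonferroni_t(n)        # two intervals: beta0 and beta1
_, t_single = bonferroni_t(n, g=1)  # one unadjusted 95% interval, for comparison
print(f"level = {level:.4f}, t_c = {t_c:.3f}, unadjusted t = {t_single:.3f}")
```

The adjusted multiplier is larger than the unadjusted one, which is the price of the joint guarantee.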
Bonferroni Concept
• Theory behind this correction
• Let the two intervals be I1 and I2
• We write c if an interval contains the true parameter value and nc if it does not
Bonferroni Inequality
• P(both c)=1-P(at least one nc)
• P(at least one nc)
= P(I1 nc) + P(I2 nc) - P(both nc)
≤ P(I1 nc) + P(I2 nc)
• Thus,
P(both c) ≥ 1-(P(I1 nc) + P(I2 nc))
[Figure: Venn diagrams of the two nc events, with regions labeled .025, .025, <.025, .025. Caption: green area on left is greater than green area on the right.]
• P(both c) ≥ 1-(P(I1 nc) + P(I2 nc))
• So if we use 0.05/2 for each interval,
1 – (P(I1 nc) + P(I2 nc)) = 1 – 0.05 = 0.95
• So P(both c) is at least 0.95
• We will use this same idea when we do
multiple comparisons in ANOVA
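The inequality can be checked by simulation. This minimal sketch, assuming numpy and scipy, repeatedly draws data from a hypothetical true line (β0 = 10, β1 = 2, σ = 5, all made up), builds both CIs at the unadjusted 95% level and at the Bonferroni 97.5% level, and counts how often both are c. The helper joint_coverage is a hypothetical name.

```python
import numpy as np
from scipy import stats

def joint_coverage(n=25, reps=4000, alpha=0.05, seed=0):
    """Monte Carlo estimate of P(both c) for unadjusted and Bonferroni CIs."""
    rng = np.random.default_rng(seed)
    beta0, beta1, sigma = 10.0, 2.0, 5.0           # hypothetical true parameters
    x = np.linspace(0, 20, n)                      # fixed design points
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    t_naive = stats.t.ppf(1 - alpha / 2, n - 2)    # each CI at 95%
    t_bonf = stats.t.ppf(1 - alpha / 4, n - 2)     # each CI at 97.5%
    hits_naive = hits_bonf = 0
    for _ in range(reps):
        y = beta0 + beta1 * x + rng.normal(0, sigma, n)
        b1 = np.sum((x - xbar) * (y - y.mean())) / sxx
        b0 = y.mean() - b1 * xbar
        mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
        s_b1 = np.sqrt(mse / sxx)
        s_b0 = np.sqrt(mse * (1 / n + xbar ** 2 / sxx))
        err0, err1 = abs(b0 - beta0), abs(b1 - beta1)
        # An interval is "c" when |estimate - truth| <= multiplier * standard error
        hits_naive += int(err0 <= t_naive * s_b0 and err1 <= t_naive * s_b1)
        hits_bonf += int(err0 <= t_bonf * s_b0 and err1 <= t_bonf * s_b1)
    return hits_naive / reps, hits_bonf / reps

naive, bonf = joint_coverage()
print(f"P(both c): unadjusted {naive:.3f}, Bonferroni {bonf:.3f}")
```

Expect the unadjusted joint rate to fall below 0.95 and the Bonferroni rate to land at or somewhat above 0.95, since the inequality only gives a lower bound.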
Joint Estimation of β0 and β1
• For the Toluca example, the rectangular region (90% family confidence, so each CI uses t(.975; 23)) is
8.20 ≤ β0 ≤ 116.5
2.85 ≤ β1 ≤ 4.29
• The region (shown on the next page) contains all lines that, for positive X, lie between
8.2 + 2.85X and 116.5 + 4.29X
• This band is definitely not as narrow as the confidence band, nor is it symmetric about the mean of X as that band is
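Why the "for positive X" qualifier: β0 + β1X is linear in (β0, β1), so its extremes over the rectangle occur at the corners, and which corner wins depends on the sign of X. A minimal sketch using only the Toluca limits above; band_limits is a hypothetical helper name.

```python
def band_limits(x, b0_range=(8.20, 116.5), b1_range=(2.85, 4.29)):
    """Lowest and highest value of beta0 + beta1 * x over the rectangular region."""
    # beta0 + beta1 * x is linear in (beta0, beta1), so the extremes occur at corners
    corners = [b0 + b1 * x for b0 in b0_range for b1 in b1_range]
    return min(corners), max(corners)

for x in (0, 30, 80, -10):
    low, high = band_limits(x)
    print(f"X = {x:4}: {low:8.2f} to {high:8.2f}")
```

For X ≥ 0 the limits are 8.2 + 2.85X and 116.5 + 4.29X; for negative X the two slopes swap roles.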
Mean Response CIs
• For simultaneous estimation at all Xh, use Working-Hotelling (KNNL 2.6):
$\hat{Y}_h \pm W\, s(\hat{Y}_h)$ where $W^2 = 2F(1-\alpha; 2, n-2)$
• For simultaneous estimation at a few Xh, use Bonferroni. Let g = the number of Xh values. Then
$\hat{Y}_h \pm B\, s(\hat{Y}_h)$ where $B = t(1-\alpha/(2g); n-2)$
• Use Bonferroni when B < W, since it then gives narrower CIs
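Both methods multiply the same $s(\hat{Y}_h)$, so comparing W and B is enough to pick the narrower one. A minimal sketch, assuming scipy; the helper names and n = 25 are hypothetical.

```python
from scipy import stats

def wh_multiplier(n, alpha=0.05):
    """Working-Hotelling W, where W^2 = 2 F(1 - alpha; 2, n - 2)."""
    return (2 * stats.f.ppf(1 - alpha, 2, n - 2)) ** 0.5

def bonferroni_multiplier(n, g, alpha=0.05):
    """Bonferroni B = t(1 - alpha/(2g); n - 2) for g mean responses."""
    return stats.t.ppf(1 - alpha / (2 * g), n - 2)

n = 25                                   # hypothetical sample size
W = wh_multiplier(n)                     # does not depend on g
for g in (1, 2, 3, 5, 10):
    B = bonferroni_multiplier(n, g)      # grows with g
    better = "Bonferroni" if B < W else "Working-Hotelling"
    print(f"g = {g:2d}: B = {B:.3f}, W = {W:.3f} -> {better} is narrower")
```

Since W is fixed and B grows with g, Bonferroni wins only when g is small.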
Simultaneous PIs
• For simultaneous prediction at a few Xh, use one of
• Bonferroni: $\hat{Y}_h \pm B\, s(\mathrm{pred})$ where $B = t(1-\alpha/(2g); n-2)$
• Scheffé: $\hat{Y}_h \pm S\, s(\mathrm{pred})$ where $S^2 = gF(1-\alpha; g, n-2)$
• Again, choose the one with narrower intervals
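As with mean responses, both prediction methods multiply the same s(pred), so the multipliers decide. A minimal sketch, assuming scipy; helper names and n = 25 are hypothetical.

```python
from scipy import stats

def scheffe_multiplier(n, g, alpha=0.05):
    """Scheffe S, where S^2 = g F(1 - alpha; g, n - 2)."""
    return (g * stats.f.ppf(1 - alpha, g, n - 2)) ** 0.5

def bonferroni_multiplier(n, g, alpha=0.05):
    """Bonferroni B = t(1 - alpha/(2g); n - 2) for g predictions."""
    return stats.t.ppf(1 - alpha / (2 * g), n - 2)

n = 25  # hypothetical sample size
for g in (2, 3, 5, 10):
    S, B = scheffe_multiplier(n, g), bonferroni_multiplier(n, g)
    better = "Bonferroni" if B < S else "Scheffe"
    print(f"g = {g}: B = {B:.3f}, S = {S:.3f} -> {better} is narrower")
```

At g = 1 the two coincide, because F(1 − α; 1, n − 2) = t(1 − α/2; n − 2)².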
Regression through the Origin
• Model: $Y_i = \beta_1 X_i + \varepsilon_i$
• NOINT option in PROC REG
• Generally not a good idea
• Might be forcing the model to behave a certain way in an area with no data
• Problems with R2 and other statistics
• See cautions, KNNL p 164
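One of the R² problems, sketched in Python as an analog of a NOINT fit (not SAS itself), assuming numpy. With no intercept, the slope is $b_1 = \sum X_i Y_i / \sum X_i^2$, and R² is often computed from the uncorrected total sum of squares ΣY² (my understanding is that SAS does this under NOINT; check its documentation). The data below are hypothetical: far from X = 0, with a large true intercept, so the forced model fits badly yet the uncorrected R² looks excellent.

```python
import numpy as np

def fit_through_origin(x, y):
    """Least-squares fit of Y = beta1 * X + error with no intercept."""
    b1 = np.sum(x * y) / np.sum(x * x)                    # b1 = sum(XY) / sum(X^2)
    resid = y - b1 * x
    sse = np.sum(resid ** 2)
    r2_uncorrected = 1 - sse / np.sum(y ** 2)             # total SS not centered at Ybar
    r2_corrected = 1 - sse / np.sum((y - y.mean()) ** 2)  # usual centered total SS
    return b1, resid, r2_uncorrected, r2_corrected

rng = np.random.default_rng(2)
x = rng.uniform(50, 60, size=30)                  # hypothetical data far from X = 0
y = 100 + 0.1 * x + rng.normal(0, 1, size=30)     # true intercept is large
b1, resid, r2_u, r2_c = fit_through_origin(x, y)
print(f"b1 = {b1:.3f}, uncorrected R2 = {r2_u:.4f}, centered R2 = {r2_c:.1f}")
```

The centered version comes out negative, showing the fit is worse than a flat line at Ȳ; note also that the residuals need not sum to zero without an intercept (only ΣXᵢeᵢ = 0 is guaranteed).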
Measurement Error
• For Y, this is usually not a problem…it just adds to the error variance σ²
• For X, we can get biased estimators
of our regression parameters
• See KNNL 4.5, pp 165-168
• Berkson model: special case where
measurement error in X is no
problem
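A small simulation of the three cases, assuming numpy and a hypothetical true model Y = 5 + 2X + ε. It assumes classical additive error (observed X = true X + independent noise), which biases the slope toward zero by the factor Var(X)/(Var(X) + Var(error)), here 4/8. In the Berkson case X is set at fixed target values and the actual X varies around them, so the slope stays unbiased. All names and values are illustrative.

```python
import numpy as np

def slope(x, y):
    """Ordinary least-squares slope of y on x (intercept included)."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)

rng = np.random.default_rng(3)
n, beta0, beta1 = 100_000, 5.0, 2.0           # hypothetical true model
x_true = rng.normal(10, 2, n)                 # true X, variance 4
y = beta0 + beta1 * x_true + rng.normal(0, 1, n)

# Measurement error in Y only: larger residual variance, slope still about 2
slope_y_err = slope(x_true, y + rng.normal(0, 2, n))

# Classical error in X (variance 4): slope shrinks toward 0, about 2 * 4 / (4 + 4) = 1
slope_x_err = slope(x_true + rng.normal(0, 2, n), y)

# Berkson case: X is set at fixed target values, the actual X varies around them
x_target = np.tile(np.array([6.0, 8.0, 10.0, 12.0, 14.0]), n // 5)
x_actual = x_target + rng.normal(0, 2, n)
y_berkson = beta0 + beta1 * x_actual + rng.normal(0, 1, n)
slope_berkson = slope(x_target, y_berkson)    # about 2: no bias

print(f"{slope_y_err:.3f} {slope_x_err:.3f} {slope_berkson:.3f}")
```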
Inverse Predictions
• Sometimes called calibration
• Given Yh, predict the corresponding value of X, $\hat{X}_h$
• Solve the fitted equation for Xh:
$\hat{X}_h = (Y_h - b_0)/b_1$, $b_1 \neq 0$
• An approximate CI can be obtained; see KNNL, p 169
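A minimal calibration sketch, assuming numpy and scipy. The point estimate solves the fitted line for X. The interval uses the approximation $s^2(\mathrm{pred}X) = \frac{\mathrm{MSE}}{b_1^2}\left[1 + \frac{1}{n} + \frac{(\hat{X}_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$ with multiplier t(1 − α/2; n − 2), which is my reading of the KNNL approximation; verify it, and the condition under which it is adequate, against p 169. The function name and calibration data are hypothetical.

```python
import numpy as np
from scipy import stats

def inverse_prediction(x, y, y_h, alpha=0.05):
    """Estimate X for an observed Y_h, with an approximate 1 - alpha CI."""
    n = len(x)
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * xbar
    if b1 == 0:
        raise ValueError("b1 = 0, so the fitted line cannot be solved for X")
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
    x_hat = (y_h - b0) / b1                       # solve Y_h = b0 + b1 * X for X
    # Approximate s(predX) = sqrt((MSE / b1^2) * (1 + 1/n + (x_hat - xbar)^2 / sxx))
    s_pred_x = np.sqrt(mse / b1 ** 2 * (1 + 1 / n + (x_hat - xbar) ** 2 / sxx))
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    return x_hat, (x_hat - t * s_pred_x, x_hat + t * s_pred_x)

rng = np.random.default_rng(4)
x = np.repeat(np.arange(1.0, 11.0), 3)           # hypothetical calibration standards
y = 2.0 + 0.5 * x + rng.normal(0, 0.1, x.size)   # hypothetical instrument readings
x_hat, (lower, upper) = inverse_prediction(x, y, y_h=5.0)
print(f"X_hat = {x_hat:.3f}, approximate 95% CI ({lower:.3f}, {upper:.3f})")
```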
Background Reading
• Next class we will do simple regression
with vectors and matrices so that we
can generalize to multiple regression
• Look at KNNL 5.1 to 5.7 if this is
unfamiliar to you