Transcript: Lecture 6
Confidence intervals and hypothesis testing
Petter Mostad
2005.10.03
Confidence intervals (repetition)
• Assume μ and σ² are some real numbers, and
assume the data X1, X2, ..., Xn are a random sample
from N(μ, σ²).
– Then
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ and $P(-1.96 \le Z \le 1.96) \approx 95\%$
– thus
$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \approx 95\%$
– so we say that
$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ is a
confidence interval for μ with 95% confidence, based
on the statistic $\bar{X}$
Confidence intervals, general idea
• We have a model with an unknown parameter
• We find a ”statistic” (function of the sample) with
a known distribution, depending only on the
unknown parameter
• This distribution is used to construct an interval
with the following property: If you repeat the
sampling and compute the statistic many times, then
about (say) 95% of the time, the resulting
confidence interval will contain the parameter
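This repeated-sampling property can be checked directly by simulation. The sketch below (all numbers invented for illustration) draws many samples from a known normal distribution, computes the 95% interval with known σ each time, and counts how often it covers the true μ:

```python
import random
from statistics import NormalDist, mean

# Simulation sketch: repeatedly draw samples from N(mu, sigma^2) and check
# how often the 95% confidence interval (known sigma) covers the true mu.
random.seed(1)
mu, sigma, n = 5.0, 2.0, 30
z = NormalDist().inv_cdf(0.975)        # the 97.5% quantile, approx. 1.96
half_width = z * sigma / n ** 0.5

covered = 0
trials = 2000
for _ in range(trials):
    xbar = mean(random.gauss(mu, sigma) for _ in range(n))
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1
coverage = covered / trials
print(f"empirical coverage: {coverage:.3f}")   # should be close to 0.95
```

The empirical coverage fluctuates around 95%, exactly as the definition of the confidence interval promises.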
Hypothesis testing
• Selecting the most plausible model for the data,
among those suggested
• Example: Assume X1,X2,…,Xn is a random sample
from N(μ,σ2), where σ2 is known, but μ is not; we
want to select μ fitting the data.
• One possibility is to look at the probability of
observing the data given different values for μ.
(We will return to this)
• Another is to do a hypothesis test
Example
• We select two alternative hypotheses:
– H0: $\mu = \mu_0$
– H1: $\mu \neq \mu_0$
• Use the value of $\bar{X}$ to test H0 versus H1: If $\bar{X}$
is far from $\mu_0$, it will indicate H1.
• Under H0, we know that
$P\left(\mu_0 - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu_0 + 1.96\frac{\sigma}{\sqrt{n}}\right) \approx 95\%$
• Reject H0 if $\bar{X}$ is outside
$\left(\mu_0 - 1.96\frac{\sigma}{\sqrt{n}},\ \mu_0 + 1.96\frac{\sigma}{\sqrt{n}}\right)$
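As a sketch of this test, the helper below (the numbers fed to it are made up for illustration) computes the z-statistic and its two-sided p-value under H0:

```python
from statistics import NormalDist

# Two-sided z-test sketch for H0: mu = mu0 with known sigma.
# The inputs below (sample mean 5.6 from n = 25, sigma = 2, mu0 = 5)
# are invented for illustration.
def z_test(xbar, mu0, sigma, n, alpha=0.05):
    z = (xbar - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # approx. 1.96 for alpha=0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > z_crit

z, p, reject = z_test(xbar=5.6, mu0=5.0, sigma=2.0, n=25)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

Here $\bar{X}$ is within 1.96 standard errors of μ0, so H0 is not rejected at the 5% level.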
General outline for hypothesis
testing
• The possible hypotheses are divided into
H0, the null hypothesis, and H1, the
alternative hypothesis
• A hypothesis can be
– Simple, so that it is possible to compute the
probability of the data (e.g., $\mu = 3.7$)
– Composite, i.e., a collection of simple
hypotheses (e.g., $\mu > 3.7$)
General outline (cont.)
• A test statistic is selected. It must:
– Have a higher probability for ”extreme” values under
H1 than under H0
– Have a known distribution under H0 (when simple)
• If the value of the test statistic is ”too extreme”,
then H0 is rejected.
• The probability, under H0, of observing the given
data or something more extreme is called the
p-value. Thus we reject H0 if the p-value is small.
• The p-value threshold at which we reject H0 is
called the significance level.
Note:
• There is an asymmetry between H0 and H1: In fact,
if the data is inconclusive, we end up not rejecting
H0.
• If H0 is true, the probability of rejecting H0 is (say)
5%. That DOES NOT MEAN we are 95% certain
that H0 is true!
• How much evidence we have for choosing H1 over
H0 depends entirely on how much more probable
rejection is if H1 is true.
Errors of types I and II
• The above can be seen as a decision rule for
H0 or H1.
• For any such rule we can compute (if both
H0 and H1 are simple hypotheses):
                 H0 true                H1 true
Accept H0        P(accept | H0)         P(accept | H1)
                                        = 1 − power (TYPE II error)
Reject H0        P(reject | H0)         P(reject | H1)
                 = significance         = power
                 (TYPE I error)
Significance and power
• If H0 is composite, we compute the
significance from the simple hypothesis that
gives the largest probability of rejecting H0.
• If H1 is composite, we compute a power
value for each simple hypothesis. Thus we
get a power function.
Example 1: Normal distribution with
unknown variance
• Assume $X_1, X_2, ..., X_n \sim N(\mu, \sigma^2)$
• Then
$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
• Thus
$P\left(\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha$
• So a confidence interval for μ, with confidence $1 - \alpha$,
is given by
$\left(\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right)$
Example 1 (Hypothesis testing)
• Hypotheses: H0: $\mu = \mu_0$    H1: $\mu \neq \mu_0$
• Test statistic
$\frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$ under H0
• Reject H0 if
$\bar{X} > \mu_0 + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
or if
$\bar{X} < \mu_0 - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
• Alternatively, the p-value for the test can be
computed (if $\bar{X} > \mu_0$) as the $\alpha$ such that
$\bar{X} = \mu_0 + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
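A sketch of both procedures, using invented data values for illustration: the confidence interval is built by hand from the $t_{n-1,\alpha/2}$ quantile, and the test uses `scipy.stats.ttest_1samp`:

```python
import math
from scipy import stats

# One-sample t procedures sketch; the data values are invented.
data = [4.8, 5.3, 5.1, 4.9, 5.6, 5.0, 5.4, 4.7]
n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample sd

# 95% confidence interval: xbar +/- t_{n-1, alpha/2} * s / sqrt(n)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci = (xbar - t_crit * s / math.sqrt(n), xbar + t_crit * s / math.sqrt(n))

# Two-sided test of H0: mu = 5.0
t_stat, p_value = stats.ttest_1samp(data, popmean=5.0)
print(f"CI: ({ci[0]:.3f}, {ci[1]:.3f}), t = {t_stat:.3f}, p = {p_value:.3f}")
```

Note the duality: μ0 = 5.0 lies inside the 95% interval, and correspondingly the test does not reject H0 at the 5% level.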
Example 1 (cont.)
• Hypotheses: H0: $\mu \le \mu_0$    H1: $\mu > \mu_0$
• Test statistic
$\frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$ assuming $\mu = \mu_0$
• Reject H0 if
$\bar{X} > \mu_0 + t_{n-1,\alpha}\frac{s}{\sqrt{n}}$
• Alternatively, the p-value for the test can be
computed as the $\alpha$ such that
$\bar{X} = \mu_0 + t_{n-1,\alpha}\frac{s}{\sqrt{n}}$
Example 1 (cont.)
• Assume that you want to analyze the data in
some column of an SPSS table as above.
• Use ”Analyze” => ”Compare means” =>
”One-sample T Test”
• You get as output a confidence interval, and
a test as the one described above.
• You may adjust the confidence level using
”Options…”
Example 2: Differences between
means
• Assume $X_1, X_2, ..., X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and
$Y_1, Y_2, ..., Y_{n_y} \sim N(\mu_y, \sigma_y^2)$
• We would like to study the difference $\mu_x - \mu_y$
• Four different cases:
– Matched pairs
– Known population variances
– Unknown but equal population variances
– Unknown and possibly different pop. variances
Known population variances
• We get
$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2/n_x + \sigma_y^2/n_y}} \sim N(0,1)$
• Confidence interval for $\mu_x - \mu_y$:
$\bar{X} - \bar{Y} \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$
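A minimal sketch of this interval, with all numbers (means, variances, sample sizes) invented for illustration:

```python
import math
from statistics import NormalDist

# Sketch: 95% CI for mu_x - mu_y with known population variances.
# All numbers below are invented for illustration.
xbar, ybar = 10.2, 9.5
var_x, var_y = 4.0, 3.0          # known sigma_x^2 and sigma_y^2
nx, ny = 40, 50
z = NormalDist().inv_cdf(0.975)  # approx. 1.96
se = math.sqrt(var_x / nx + var_y / ny)
ci = (xbar - ybar - z * se, xbar - ybar + z * se)
print(f"95% CI for mu_x - mu_y: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Since this interval contains 0, these invented data would not rule out equal means at the 5% level.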
Unknown but equal population
variances
• We get
$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{s_p^2/n_x + s_p^2/n_y}} \sim t_{n_x + n_y - 2}$
where
$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$
• Confidence interval for $\mu_x - \mu_y$:
$\bar{X} - \bar{Y} \pm t_{n_x + n_y - 2,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$
Hypothesis testing: Unknown but
equal population variances
• Hypotheses: H0: $\mu_x = \mu_y$    H1: $\mu_x \neq \mu_y$
• Test statistic:
$\frac{\bar{X} - \bar{Y}}{\sqrt{s_p^2/n_x + s_p^2/n_y}} \sim t_{n_x + n_y - 2}$
”T test with equal variances”
• Reject H0 if
$\frac{\bar{X} - \bar{Y}}{\sqrt{s_p^2/n_x + s_p^2/n_y}} > t_{n_x + n_y - 2,\alpha/2}$
or if
$\frac{\bar{X} - \bar{Y}}{\sqrt{s_p^2/n_x + s_p^2/n_y}} < -t_{n_x + n_y - 2,\alpha/2}$
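The pooled test above is what `scipy.stats.ttest_ind` computes with `equal_var=True`; the two samples below are invented for illustration:

```python
from scipy import stats

# Pooled two-sample t-test sketch (equal variances assumed).
# The two samples are invented for illustration.
x = [5.1, 4.9, 5.6, 5.2, 4.8, 5.3]
y = [4.6, 4.9, 4.4, 4.8, 5.0, 4.5, 4.7]
t_stat, p_value = stats.ttest_ind(x, y, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

For these data the p-value is well below 5%, so H0: $\mu_x = \mu_y$ would be rejected.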
Unknown and possibly unequal
population variances
• We get
$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_{\nu}$
where
$\nu = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}}$
• Confidence interval for $\mu_x - \mu_y$:
$\bar{X} - \bar{Y} \pm t_{\nu,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$
Hypothesis test: Unknown and
possibly unequal pop. variances
• Hypotheses: H0: $\mu_x = \mu_y$    H1: $\mu_x \neq \mu_y$
• Test statistic
$\frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_{\nu}$
”T test with unequal variances”
• Reject H0 if
$\frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}} > t_{\nu,\alpha/2}$
or if
$\frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}} < -t_{\nu,\alpha/2}$
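This unequal-variance version (often called Welch's test) corresponds to `equal_var=False` in `scipy.stats.ttest_ind`; the samples below are invented for illustration:

```python
from scipy import stats

# Welch (unequal-variance) t-test sketch; data invented for illustration.
x = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2]
y = [11.2, 10.5, 11.9, 10.8, 11.5]
t_stat, p_value = stats.ttest_ind(x, y, equal_var=False)
print(f"Welch t = {t_stat:.3f}, p = {p_value:.4f}")
```

scipy computes the degrees of freedom $\nu$ from the formula on the previous slide internally; $\nu$ is generally not an integer.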
Practical examples:
• The lengths of children in a class are measured at
age 8 and at age 10. Use the data to find an
estimate, with confidence limits, of how much
children grow between these ages.
• You want to determine whether a costly operation
is generally done more cheaply in France than in
Norway. Your data is the actual costs of 10 such
operations in Norway and 20 in France.
Example 3: Population proportions
• Assume $X \sim \mathrm{Bin}(\pi, n)$, so that $p = \frac{X}{n}$ is a frequency.
• Then
$\frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}} \sim N(0,1)$ (approximately, for large n)
• Thus
$\frac{p - \pi}{\sqrt{p(1 - p)/n}} \sim N(0,1)$ (approximately, for large n)
• Thus
$P\left(p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1 - \alpha$
• Confidence interval for $\pi$:
$\left(p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right)$
Example 3 (Hypothesis testing)
• Hypotheses: H0: $\pi = \pi_0$    H1: $\pi \neq \pi_0$
• Test statistic
$\frac{p - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}} \sim N(0,1)$ under H0, for large n
• Reject H0 if
$p > \pi_0 + Z_{\alpha/2}\sqrt{\frac{\pi_0(1 - \pi_0)}{n}}$
or if
$p < \pi_0 - Z_{\alpha/2}\sqrt{\frac{\pi_0(1 - \pi_0)}{n}}$
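A sketch of both the interval and the test; note that the interval uses $p(1-p)/n$ while the test uses $\pi_0(1-\pi_0)/n$ in the standard error. The counts are invented for illustration:

```python
import math
from statistics import NormalDist

# Sketch: large-sample CI and z-test for a proportion (counts invented).
x, n = 56, 200            # 56 successes in 200 trials
p = x / n                 # observed frequency 0.28
z = NormalDist().inv_cdf(0.975)

# 95% confidence interval, standard error from p(1-p)/n
se_ci = math.sqrt(p * (1 - p) / n)
ci = (p - z * se_ci, p + z * se_ci)

# Test of H0: pi = 0.35, standard error from pi0(1-pi0)/n
pi0 = 0.35
z_stat = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
print(f"CI: ({ci[0]:.3f}, {ci[1]:.3f}), z = {z_stat:.2f}, p = {p_value:.4f}")
```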
Example 4: Differences between
population proportions
• Assume $X_1 \sim \mathrm{Bin}(\pi_1, n_1)$ and $X_2 \sim \mathrm{Bin}(\pi_2, n_2)$, so
that $p_1 = \frac{X_1}{n_1}$ and $p_2 = \frac{X_2}{n_2}$ are frequencies
• Then
$\frac{p_1 - p_2 - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}} \sim N(0,1)$ (approximately)
• Confidence interval for $\pi_1 - \pi_2$:
$p_1 - p_2 \pm Z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
Example 4 (Hypothesis testing)
• Hypotheses: H0: $\pi_1 = \pi_2$    H1: $\pi_1 \neq \pi_2$
• Test statistic
$\frac{p_1 - p_2}{\sqrt{\frac{p_0(1 - p_0)}{n_1} + \frac{p_0(1 - p_0)}{n_2}}} \sim N(0,1)$
where $p_0 = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
• Reject H0 if
$\left|\frac{p_1 - p_2}{\sqrt{\frac{p_0(1 - p_0)}{n_1} + \frac{p_0(1 - p_0)}{n_2}}}\right| > Z_{\alpha/2}$
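A minimal sketch of this two-proportion test with the pooled estimate $p_0$; all counts are invented for illustration:

```python
import math
from statistics import NormalDist

# Two-proportion z-test sketch with pooled frequency p0 (counts invented).
x1, n1 = 45, 120
x2, n2 = 30, 150
p1, p2 = x1 / n1, x2 / n2
p0 = (x1 + x2) / (n1 + n2)   # pooled frequency, valid under H0: pi1 = pi2
se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z_stat = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

The pooling is what distinguishes the test statistic from the confidence interval, which uses $p_1$ and $p_2$ separately.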
Example 5: The variance of a normal
distribution
• Assume $X_1, X_2, ..., X_n \sim N(\mu, \sigma^2)$
• Then
$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$
• Thus
$P\left(\chi^2_{n-1,1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{n-1,\alpha/2}\right) = 1 - \alpha$
• Confidence interval for $\sigma^2$:
$\left(\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right)$
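A sketch of this chi-square interval; the data values are invented for illustration, and the quantiles come from `scipy.stats.chi2.ppf`:

```python
import math
from scipy import stats

# Sketch: chi-square CI for sigma^2 of a normal sample (data invented).
data = [2.1, 1.8, 2.5, 2.0, 1.6, 2.3, 2.2, 1.9]
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # sample variance

alpha = 0.05
# ppf(1 - alpha/2) is the large quantile, so it gives the LOWER limit
lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(f"s^2 = {s2:.4f}, 95% CI for sigma^2: ({lower:.4f}, {upper:.4f})")
```

Note that, unlike the intervals for means, this interval is not symmetric around $s^2$, because the chi-square distribution is skewed.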
Example 6: Comparing variances for
normal distributions
• Assume $X_1, X_2, ..., X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, Y_2, ..., Y_{n_y} \sim N(\mu_y, \sigma_y^2)$
• We get
$\frac{s_x^2 / \sigma_x^2}{s_y^2 / \sigma_y^2} \sim F_{n_x - 1, n_y - 1}$
• $F_{n_x - 1, n_y - 1}$ is an F distribution with $n_x - 1$ and $n_y - 1$
degrees of freedom
• We can use this exactly as before to obtain a
confidence interval for $\sigma_x^2 / \sigma_y^2$ and for testing, for
example, whether $\sigma_x^2 = \sigma_y^2$
• Note: The assumption of normality is crucial!
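A sketch of the F-test for H0: $\sigma_x^2 = \sigma_y^2$; the samples are invented for illustration, and the two-sided p-value doubles the smaller tail probability of the $F_{n_x-1,n_y-1}$ distribution:

```python
from scipy import stats

# Sketch of an F-test for H0: sigma_x^2 = sigma_y^2 (data invented).
x = [5.1, 4.9, 5.6, 5.2, 4.8, 5.3, 5.0]
y = [4.6, 5.9, 4.1, 5.5, 3.9, 5.2]
nx, ny = len(x), len(y)
mx, my = sum(x) / nx, sum(y) / ny
s2x = sum((v - mx) ** 2 for v in x) / (nx - 1)
s2y = sum((v - my) ** 2 for v in y) / (ny - 1)
f_stat = s2x / s2y          # equals the F ratio under H0, since sigmas cancel

# Two-sided p-value from the F_{nx-1, ny-1} distribution
cdf = stats.f.cdf(f_stat, nx - 1, ny - 1)
p_value = 2 * min(cdf, 1 - cdf)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```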
Sample size computations
• For a sample from a normal population with
known variance, the size of the confidence interval
for the mean depends only on the sample size.
• So we can compute the necessary sample size to
match a required accuracy.
• Note: If the variance is unknown, it must
somehow be estimated beforehand to do the
computation.
• This also works for population proportion
estimation, giving an inequality for the required
sample size.
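A sketch of both computations, with the target half-widths and σ invented for illustration: the CI half-width is $z_{\alpha/2}\sigma/\sqrt{n}$, so solving for n gives $n \ge (z_{\alpha/2}\sigma/E)^2$; for a proportion the worst case $p = 0.5$ gives the largest required n:

```python
import math
from statistics import NormalDist

# Sketch: sample size so the 95% CI half-width is at most E.
# sigma is assumed known (or estimated beforehand); numbers invented.
sigma, E = 2.0, 0.5
z = NormalDist().inv_cdf(0.975)
n_mean = math.ceil((z * sigma / E) ** 2)

# For a proportion, p(1-p) <= 0.25, giving the inequality's worst case.
E_prop = 0.05
n_prop = math.ceil(z ** 2 * 0.25 / E_prop ** 2)
print(n_mean, n_prop)
```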
Power computations
• If you reject H0, you know very little about the
evidence for H1 versus H0 unless you study the
power of the test.
• The power is the probability of rejecting H0 given
that a hypothesis in H1 is true (one minus the
probability of a type II error).
• Thus it is a function of the possible hypotheses in
H1.
• We would like our tests to have as high power as
possible.
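As a sketch, the power function of the two-sided z-test from the earlier example (known σ) can be evaluated directly: under a true mean $\mu_1$ the test statistic is normal with mean $(\mu_1 - \mu_0)/(\sigma/\sqrt{n})$ and variance 1. All numbers are invented for illustration:

```python
from statistics import NormalDist

# Sketch: power of the two-sided z-test of H0: mu = mu0 (known sigma),
# as a function of the true mean mu1. Numbers invented for illustration.
def power(mu0, mu1, sigma, n, alpha=0.05):
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    # P(reject H0 | true mean mu1): the test statistic is N(shift, 1)
    return nd.cdf(-z_crit - shift) + 1 - nd.cdf(z_crit - shift)

print(f"power at mu1 = 0.5: {power(0.0, 0.5, 2.0, 64):.3f}")
```

At $\mu_1 = \mu_0$ the function equals the significance level α, and it increases toward 1 as $\mu_1$ moves away from $\mu_0$ or as n grows.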