Transcript Lecture 6

Econ 140
Inference about a Mean
Part II
Lecture 6
Today’s Plan
• Confidence Intervals
• Hypothesis testing
– Small samples
– Large samples
• Types of errors
• Quick review of what we’ve learned so far
What we’ve seen so far
• We’ve worked with univariate populations
– Recall that we have the standardized normal variate Z,
distributed Z ~ N(0,1):
$$Z = \frac{Y - \mu_Y}{\sigma}$$
• We asked questions such as: what is E(Y)? What is the
probability that someone selected at random will have
earnings between $300 and $400?
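A minimal sketch of the earnings question. The population mean of 350 and standard deviation of 100 are hypothetical values chosen only for illustration (they are not from the lecture). The probability is the area under the standard normal curve between the two standardized values:

```python
from statistics import NormalDist

def prob_between(low, high, mu, sigma):
    """P(low <= Y <= high) for Y ~ N(mu, sigma^2), via Z = (Y - mu) / sigma."""
    z_low = (low - mu) / sigma
    z_high = (high - mu) / sigma
    std_normal = NormalDist()  # Z ~ N(0, 1)
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Hypothetical population values, for illustration only.
p = prob_between(300, 400, mu=350, sigma=100)
print(round(p, 4))
```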
What we’ve seen so far
• Before, when we were considering the distribution around
µy, we were considering the distribution of Y
• Now we are considering Ȳ as a point estimator for µy
• The difference is that the distribution of Ȳ has a variance
of σ²/n, whereas Y has a variance of σ²
• Having obtained an estimate of a parameter (µy), and
considered the properties of the estimator (BLUE), we
need to find out how ‘good’ the estimate is. Estimation is
the first side of statistical inference.
• The other side of statistical inference: hypothesis testing
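The variance of σ²/n for the sample mean follows in a few steps. This sketch assumes the n observations are independent draws, each with variance σ² (an assumption stated here for the derivation, not spelled out on the slide):

```latex
\operatorname{Var}(\bar{Y})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(Y_i)
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
```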
Confidence Intervals
• Recall our picture showing the distributions of Y and Ȳ
[Figure: distributions of Y (variance σ²) and Ȳ (variance σ²/n), both centered at µy]
• You repeatedly take samples from the population and get
different values of Ȳ
– The sampling distribution is the probability distribution of
the values of Ȳ in different samples (of a given size).
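Repeated sampling can be sketched with a small simulation. The population values (mean 350, standard deviation 100), the sample size, and the number of repetitions are all hypothetical choices for illustration. The variance of the simulated sample means should land close to σ²/n:

```python
import random
from statistics import mean, pvariance

def simulate_sample_means(n, reps, mu, sigma, seed=0):
    """Draw `reps` samples of size `n` from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    return [mean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

# Hypothetical population, for illustration only.
n, sigma = 25, 100
means = simulate_sample_means(n=n, reps=2000, mu=350, sigma=sigma)

print("variance of sample means:", round(pvariance(means), 1))
print("sigma^2 / n:             ", sigma**2 / n)
```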
Confidence Intervals (2)
• How do we assign probability bounds on our estimate?
• We don’t know what µy is, but we know the sample size
and the sample estimate Ȳ
• We can estimate µy give or take some amount of error
$$\mu_Y = \bar{Y} \pm \text{allowance for random error}$$
• We know that Ȳ is distributed Ȳ ~ N(µy, s²/n)
– We use s² as an estimate of σ²
– Our distribution of Ȳ:
[Figure: sampling distribution of Ȳ, centered at µy with variance s²/n]
Confidence Intervals (3)
• Remember: (as a rule of thumb - more precision later)
– Large samples: use the Z distribution
– Small samples: use the t distribution
• We’ll use the Z distribution for this example
– Our expression for the Z statistic is
$$Z = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \sim N(0,1)$$
$$Z\,(s/\sqrt{n}) = \bar{Y} - \mu_Y$$
$$\mu_Y = \bar{Y} - Z\,(s/\sqrt{n})$$
$$\mu_Y = \bar{Y} \pm Z\,(s/\sqrt{n})$$
Confidence Intervals (4)
• We have the standard normal distribution around µy
[Figure: normal curve centered at µy, with -Z and +Z marked on either side]
• We want to describe how much area is between -Z and +Z
• We can create a 95% confidence interval around Z
Confidence Intervals (5)
• We can write the confidence interval as
$$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \le 1.96\right) = 0.95$$
• Where did we get the values -1.96 and +1.96?
– Look at the standard normal table
– We see that 47.5% of the area under the curve can be
found between 0 and 1.96. Or 95% between +/- 1.96
[Figure: standard normal curve with 47.5% of the area between -1.96 and 0, and 47.5% between 0 and +1.96]
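The table lookup can be checked numerically. This is a small sketch using Python's standard-library `statistics.NormalDist` in place of the printed table:

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)

# Area between 0 and 1.96, which is what the standard normal table reports.
area_0_to_196 = std_normal.cdf(1.96) - std_normal.cdf(0.0)

# Going the other way: the Z value that leaves 2.5% in the upper tail.
z_critical = std_normal.inv_cdf(0.975)

print(round(area_0_to_196, 4), round(2 * area_0_to_196, 4), round(z_critical, 3))
```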
Confidence Intervals (6)
• So we can rewrite
$$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \le 1.96\right) = 0.95$$
$$-1.96\,(s/\sqrt{n}) \le \bar{Y} - \mu_Y \le 1.96\,(s/\sqrt{n})$$
$$\mu_Y = \bar{Y} \pm 1.96\,(s/\sqrt{n})$$
• This is the confidence interval estimate for µy at a 95%
level of confidence
– You can choose other levels of confidence
– As you increase the confidence level you increase the
range of possible values µy can take
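A minimal sketch of the interval calculation for a large sample. The sample mean, standard deviation, and sample size below are hypothetical, and the function name `z_confidence_interval` is introduced here for illustration. Running it at several confidence levels shows the interval widening as the level rises:

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(y_bar, s, n, level=0.95):
    """Return (lower, upper) for mu_Y = y_bar +/- Z * s / sqrt(n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when level = 0.95
    half_width = z * s / sqrt(n)
    return y_bar - half_width, y_bar + half_width

# Hypothetical sample: mean 317, standard deviation 200, n = 100.
for level in (0.90, 0.95, 0.99):
    lower, upper = z_confidence_interval(317, 200, 100, level)
    print(f"{level:.0%}: ({lower:.1f}, {upper:.1f})")
```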
Using the t distribution
• If we have a small sample, we should use the t distribution
• Our t statistic looks like
$$t = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}}$$
• What will our confidence interval look like?
– We substitute t for Z
$$\mu_Y = \bar{Y} \pm t_{\alpha/2}\,(s/\sqrt{n})$$
• We don’t know the underlying population distribution
– But we can use the central limit theorem to assume that
the sample distribution is approximately normal
– We can use the t distribution to approximate the
distribution of sample means.
Using the t distribution (2)
• We have to choose the confidence level (1 - α), which
requires a choice of α
• The area between the two t values is the confidence interval
[Figure: t distribution centered at µy; the area between -t and +t is the confidence interval]
• The usual accepted confidence level is 95% (α = 0.05)
Using the t distribution (3)
• If (1 - α) is the area between the two t values, then α is
the sum of the area under the two tails
– if (1 - α) = 0.95, α = 0.05
– 0.05/2 = 0.025
– So for a 95% confidence level, 0.025 of the area of the
curve is found in each tail of the distribution
The t Table
• In the first row, there is an upper number and a lower
number
– The upper gives you the area in one tail given a two tail
test
– The lower number gives the area in one tail or in two
tails combined
• At an infinite number of observations, 2.5% of the area
under the curve is found in each of the tails when our t
statistic is 1.96 - it approximates the normal
• With 10 degrees of freedom (a sample size of 11), 95% of the
area under the t distribution is between -2.228 and +2.228
– Note: the t has fatter tails than the standard normal
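To see what the fatter tails do to an interval, here is a small sketch comparing the t and Z half-widths for the same data. The critical values 2.228 and 1.96 are the ones quoted in the lecture; in standard t tables 2.228 is the two-tailed 95% value for 10 degrees of freedom (n = 11). The sample standard deviation is a hypothetical value, and `half_width` is a helper name introduced for illustration:

```python
from math import sqrt

# Two-tailed 95% critical values: 2.228 at 10 degrees of freedom,
# 1.96 at infinite degrees of freedom (the standard normal value).
T_CRITICAL_95 = {10: 2.228, float("inf"): 1.96}

def half_width(critical_value, s, n):
    """Half-width of the interval y_bar +/- critical_value * s / sqrt(n)."""
    return critical_value * s / sqrt(n)

# Hypothetical small sample: s = 100, n = 11 (so n - 1 = 10 degrees of freedom).
s, n = 100, 11
t_half = half_width(T_CRITICAL_95[n - 1], s, n)
z_half = half_width(T_CRITICAL_95[float("inf")], s, n)
print(f"t interval: +/- {t_half:.1f}, Z interval: +/- {z_half:.1f}")
```

For other degrees of freedom the critical value would come from a t table or a statistics library rather than this two-entry lookup.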
The t Table (2)
• For a small sample size, the t values corresponding to a
95% confidence interval are larger in absolute value than
the Z values for the same interval
• Depending on 2 things we get a very different
approximation of the confidence interval
– Sample size
– Whether or not we know the population value for σ
Hypothesis Testing
• We want to ask:
– What is the probability that µy is equal to some value?
• Using hypothesis testing we can determine whether or not
it’s plausible that µy equals a certain value
• We have two types of sample (approximate rules)
– Large: n > 30
– Small: n ≤ 30
Large Samples
• Large samples
– Doesn’t matter if the population distribution is skewed
or normal
– Doesn’t matter if the population variance is known (use
σ) or unknown (calculate s - the sample estimate of the
standard deviation)
– Use the Z table
Small Samples
• Small samples
– If the population is normally distributed and the
population variance is known, use the Z table
– If the population is normally distributed but the
population variance is unknown use the t distribution
with n-1 degrees of freedom (calculate the sample
variance as an estimate of the population variance).
– If the population is non-normally distributed, use
neither the t nor the Z (I will never give you a case like
this)
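The rules of thumb for large and small samples can be collected into one decision function. This is only a sketch of the rules as stated in the lecture; the function name `choose_table` and its arguments are introduced here for illustration:

```python
def choose_table(n, population_normal, variance_known):
    """Apply the lecture's rules of thumb; returns 'Z', 't', or 'neither'."""
    if n > 30:
        # Large sample: use Z whatever the population shape or variance knowledge.
        return "Z"
    if not population_normal:
        # Small sample from a non-normal population: neither table applies.
        return "neither"
    # Small sample from a normal population.
    return "Z" if variance_known else "t"

print(choose_table(100, population_normal=False, variance_known=False))
print(choose_table(11, population_normal=True, variance_known=False))
```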
Setting Up Hypotheses
• In hypothesis testing you set up a null hypothesis H0
• Under the null hypothesis µy will take a particular value
– Example: we can create a null such that
H0 : µy = 300
• Once we have a null hypothesis we can set up an
alternative hypothesis H1
One and Two - Tailed Tests
• We can represent this in the following graph:
[Figure: sampling distribution centered at µy = 300, with standard error s/√n]
• One-tail tests
– We calculate the area in the right-hand tail if
H1 : µy > 300
– We calculate the area in the left-hand tail if
H1 : µy < 300
• Two-tail test:
– Find the area under both tails if H1 : µy ≠ 300
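The tail areas for the three alternatives can be sketched as follows, using the standard normal distribution (the large-sample case). The function name `tail_area` and the labels for the alternatives are assumptions made for this example:

```python
from statistics import NormalDist

def tail_area(z, alternative):
    """Area in the tail(s) beyond an observed Z statistic."""
    std_normal = NormalDist()
    if alternative == "greater":      # H1: mu_Y > 300, right-hand tail
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # H1: mu_Y < 300, left-hand tail
        return std_normal.cdf(z)
    if alternative == "two-sided":    # H1: mu_Y != 300, both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(tail_area(1.96, alt), 3))
```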
Intervals and Regions
• We also need to assign a significance level (or,
equivalently, a confidence level)
• For a two-tailed test we are looking to see if a value of 300
lies within the confidence interval
• With hypothesis tests we are creating an acceptance region
bound by critical values
– Critical values are taken off the Z and t tables
– The regions in the tails are the critical regions
Intervals and Regions (2)
[Figure: distribution with an acceptance region of area 1 - α in the center, bounded by two critical values, and a critical region of area α/2 in each tail]
• α is the significance level
• If you fail to reject the null, the Z or t statistic must fall in
the acceptance region
• If you reject the null, the Z or t must fall in one of the
critical regions
Types of Errors
• Type I errors
– Rejecting a hypothesis when it is in fact true
– Example:
In the confidence interval example we constructed the
confidence interval (254 ≤ µy ≤ 380). If the true pop.
mean is 400 we can make H0 : µy = 400. In this case 400
lies outside the interval, so we’d falsely reject the null
hypothesis!
• Type II errors
– Not rejecting a false hypothesis
– Example: if the true mean is 400 but we do not reject
H0 : µy = 300, we would not be rejecting a false
hypothesis
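The probability of a Type II error can be sketched for a two-tailed Z test. The null value of 300 and the true mean of 400 come from the example above; the standard error of 40 is a hypothetical value, and the function name `type_two_error` is introduced for illustration. Trying several significance levels shows the trade-off: a smaller α makes a Type I error less likely but a Type II error more likely.

```python
from statistics import NormalDist

def type_two_error(mu_0, mu_true, se, alpha=0.05):
    """P(fail to reject H0: mu = mu_0) in a two-tailed Z test when mu = mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu_0) / se  # distance of the truth from H0, in standard errors
    # The Z statistic is N(shift, 1); we fail to reject when it lies in [-z_crit, z_crit].
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

# se = 40 is a hypothetical standard error, for illustration only.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_two_error(300, 400, se=40, alpha=alpha), 3))
```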
Types of Errors (2)
• Statisticians worry about Type I errors
– They choose a significance level α that minimizes Type
I errors
• To minimize Type I errors choose a small α, where α is the
total area in both tails
– Thus the area in each tail is α/2
Types of Errors (3)
• As α decreases, the likelihood of rejecting a true null
hypothesis also decreases
• Most of the time α = 5% is used, and α/2 = 2.5%
• We can say that we do not reject or reject the null, but we
can’t say that we accept the alternative!
• Examples:
Hypothesis Testing in General
• Null (H0): µY = µ0
• Test statistic:
$$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$$
Alternative (H1)          Critical Region
µY > µ0 (right tail)      Z > Z_α
µY < µ0 (left tail)       Z < -Z_α
µY ≠ µ0 (two tail)        |Z| > Z_{α/2}
• If you are using the t instead, replace the Z’s with t’s
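The three critical-region rules can be sketched as one function for the Z case. The sample values are hypothetical, the null value of 300 follows the lecture's running example, and the function name `z_test` and its `tail` labels are introduced here for illustration:

```python
from math import sqrt
from statistics import NormalDist

def z_test(y_bar, mu_0, sigma, n, alpha=0.05, tail="two"):
    """Return (Z, reject) for H0: mu_Y = mu_0 using the critical-region rules."""
    z = (y_bar - mu_0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "right":               # H1: mu_Y > mu_0, reject if Z > Z_alpha
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":              # H1: mu_Y < mu_0, reject if Z < -Z_alpha
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif tail == "two":               # H1: mu_Y != mu_0, reject if |Z| > Z_{alpha/2}
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, reject

# Hypothetical sample: mean 317, sigma 200, n = 100.
z, reject = z_test(317, 300, 200, 100)
print(round(z, 2), "reject H0" if reject else "do not reject H0")
```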
Where are we now?
• So far we have learned about inference and testing
hypotheses using assumptions about distributions
• Distributions
– We had samples and populations and used weights to
make inferences about the population using sample
statistics
– We assumed distributional forms such as the Z or t
distributions
• Sampling distribution of the mean
– You should know the difference between E(Y) and E(Ȳ)
Where are we now? (2)
• BLUE: we’ll return to this in the next lecture
• Estimation and hypothesis testing
• We now look to return to the regression line and consider
the estimators for a and b from:
$$\hat{Y}_i = a + bX_i$$
$$e_i = Y_i - \hat{Y}_i$$
• Have to consider the properties of the OLS estimator
(BLUE), and how do we construct hypothesis tests on the
estimates of the parameters a and b?
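As a preview, here is a minimal sketch of computing a, b, and the residuals. The formulas used for a and b are the standard least-squares ones (not derived in this lecture), the data are hypothetical, and the function name `ols_fit` is introduced for illustration:

```python
def ols_fit(x, y):
    """Least-squares a and b for Y_hat_i = a + b * X_i, plus residuals e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx                   # slope
    a = y_bar - b * x_bar             # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # e_i = Y_i - Y_hat_i
    return a, b, residuals

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, e = ols_fit(x, y)
print(round(a, 3), round(b, 3), round(sum(e), 6))
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding.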