Lecture Notes 13

Lecture 13
Dustin Lueker
Inferential statistical methods provide
predictions about characteristics of a
population, based on information in a sample
from that population
◦ Quantitative variables
▪ Usually estimate the population mean
▪ Mean household income
◦ Qualitative variables
▪ Usually estimate population proportions
▪ Proportion of people voting for candidate A
STA 291 Summer 2008 Lecture 13
Point Estimate
◦ A single number that is the best guess for the parameter
▪ The sample mean is usually a good guess for the population mean
Interval Estimate
◦ Point estimator with error bound
▪ A range of numbers around the point estimate
▪ Gives an idea about the precision of the estimator
▪ Example: the proportion of people voting for A is between 67% and 73%
A point estimator of a parameter is a sample
statistic that predicts the value of that parameter
A good estimator is
◦ Unbiased
▪ Centered around the true parameter
◦ Consistent
▪ Gets closer to the true parameter as the sample size gets larger
◦ Efficient
▪ Has a standard error that is as small as possible (makes use of all available information)
An estimator is unbiased if its sampling distribution is centered around the true parameter
◦ For example, we know that the mean of the sampling distribution of x̄ equals μ, which is the true population mean
▪ So, x̄ is an unbiased estimator of μ
▪ Note: for any particular sample, the sample mean may be smaller or greater than the population mean
▪ Unbiased means that there is no systematic underestimation or overestimation
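A short simulation can make unbiasedness concrete. The sketch below is illustrative only: it assumes a hypothetical normal population with μ = 50 and σ = 10 and samples of size n = 25 (none of these values come from the lecture), and it uses Python's standard `random` and `statistics` modules. Individual sample means land above and below μ, but their average, the center of the sampling distribution of x̄, comes out close to μ.

```python
import random
import statistics

# Illustrative assumptions (not from the lecture): a normal population
# with mean MU = 50 and standard deviation SIGMA = 10, samples of size N = 25.
MU, SIGMA, N = 50.0, 10.0, 25
REPS = 20_000
random.seed(291)

# Draw many samples and record each sample mean x-bar.
sample_means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

# Individual sample means scatter above and below MU ...
print(min(sample_means), max(sample_means))
# ... but their average (the center of the sampling distribution) is close to MU.
center = statistics.mean(sample_means)
print(round(center, 2))
```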
A biased estimator systematically
underestimates or overestimates the
population parameter
◦ The sample variance and sample standard deviation use n-1 instead of n in the denominator, because this makes the sample variance s² an unbiased estimator of the population variance σ²
◦ With n in the denominator, it would systematically
underestimate the variance
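The effect of the divisor can be checked by simulation. This is a sketch under assumed values (a normal population with σ² = 4 and samples of size n = 5, chosen only to make the bias visible). In Python's `statistics` module, `variance` divides by n-1 and `pvariance` divides by n (using the sample's own mean when no mean is supplied). Averaged over many samples, the n-1 version lands near σ² = 4, while the n version lands near σ²(n-1)/n = 3.2.

```python
import random
import statistics

# Illustrative assumptions (not from the lecture): a normal population with
# variance SIGMA**2 = 4 and small samples (N = 5), so the bias is easy to see.
SIGMA, N, REPS = 2.0, 5, 20_000
random.seed(13)

with_n_minus_1 = []  # divisor n-1 (statistics.variance)
with_n = []          # divisor n (statistics.pvariance, using the sample's own mean)
for _ in range(REPS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    with_n_minus_1.append(statistics.variance(sample))
    with_n.append(statistics.pvariance(sample))

avg_n_minus_1 = statistics.mean(with_n_minus_1)  # should be close to 4.0
avg_n = statistics.mean(with_n)                  # should be close to 4.0 * 4/5 = 3.2
print(round(avg_n_minus_1, 2), round(avg_n, 2))
```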
An estimator is efficient if its standard error is
small compared to other estimators
◦ Such an estimator has high precision
A good estimator has small standard error
and small bias (or no bias at all)
◦ The following pictures represent different
estimators with different bias and efficiency
◦ Assume that the true population parameter is the
point (0,0) in the middle of the picture
Note that even an unbiased and efficient estimator does not always hit the population parameter exactly. But in the long run, it is the best estimator.
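Efficiency can also be illustrated by simulation. The sketch below swaps in a standard textbook comparison that the lecture does not itself make: for a normal population, both the sample mean and the sample median are centered on the true center, but the median's standard error is larger (for large n, roughly 1.25 times that of the mean). The population (mean 0, sd 1) and the sample size n = 50 are assumptions chosen for illustration.

```python
import random
import statistics

# Illustrative assumptions: normal population (mean 0, sd 1), samples of size N = 50.
N, REPS = 50, 5_000
random.seed(7)

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Standard error = spread of the estimator across repeated samples.
se_mean = statistics.stdev(means)      # theory: 1 / sqrt(50), about 0.141
se_median = statistics.stdev(medians)  # larger, so the median is less efficient here
print(round(se_mean, 3), round(se_median, 3))
```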
An inferential statement about a parameter should always state the accuracy of the estimate
◦ How close is the estimate likely to fall to the true parameter value?
▪ Within 1 unit? 2 units? 10 units?
◦ This can be determined using the sampling
distribution of the estimator/sample statistic
◦ In particular, we need the standard error to make
a statement about accuracy of the estimator
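For the sample mean, the standard error is σ/√n, estimated by s/√n when σ is unknown. A minimal sketch with a hypothetical s = 12 shows how the accuracy improves with sample size: quadrupling n halves the standard error. The function name `standard_error_of_mean` is illustrative, not from any library.

```python
import math


def standard_error_of_mean(s, n):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


# Hypothetical s = 12; each fourfold increase in n halves the standard error.
for n in (9, 36, 144):
    print(n, standard_error_of_mean(12.0, n))
```

This prints standard errors of 4.0, 2.0 and 1.0 for n = 9, 36 and 144.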
Confidence interval: a range of numbers that is likely to cover (or capture) the true parameter
The probability that the confidence interval captures the true parameter is called the confidence coefficient or, more commonly, the confidence level
◦ Confidence level is a chosen number close to 1,
usually 0.90, 0.95 or 0.99
◦ Level of significance = α = 1 – confidence level
To calculate the confidence interval, we use
the Central Limit Theorem
◦ Substituting the sample standard deviation for the
population standard deviation
Also, we need $z_{\alpha/2}$, which is determined by the confidence level
Formula for the 100(1-α)% confidence interval for μ:
$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
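The formula translates directly into a small helper. This is a sketch, assuming Python 3.8+ for `statistics.NormalDist`; the function name `z_confidence_interval` and the usage values (x̄ = 100, s = 15, n = 25) are hypothetical. Following the lecture, the sample standard deviation s can be passed in place of σ for large samples.

```python
import math
from statistics import NormalDist


def z_confidence_interval(xbar, sd, n, confidence=0.95):
    """Large-sample 100(1-alpha)% interval: xbar +/- z_{alpha/2} * sd / sqrt(n).

    Pass sigma if it is known; otherwise pass the sample standard deviation s.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}: upper-tail area alpha/2
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical usage: xbar = 100, s = 15, n = 25 (standard error 3).
lo, hi = z_confidence_interval(100.0, 15.0, 25)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```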
90% confidence interval
◦ Confidence level of 0.90
▪ α=.10
▪ Zα/2=1.645
95% confidence interval
◦ Confidence level of 0.95
▪ α=.05
▪ Zα/2=1.96
99% confidence interval
◦ Confidence level of 0.99
▪ α=.01
▪ Zα/2=2.576
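The tabulated values of Zα/2 are quantiles of the standard normal distribution: Zα/2 cuts off an upper-tail area of α/2, so it is the (1 - α/2) quantile. A sketch using Python's `statistics.NormalDist` (Python 3.8+) reproduces the table:

```python
from statistics import NormalDist

# z_{alpha/2} has upper-tail area alpha/2 under the standard normal curve,
# so it is the (1 - alpha/2) quantile.
z_values = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1.0 - confidence
    z_values[confidence] = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    print(f"confidence={confidence:.2f}  alpha={alpha:.2f}  z={z_values[confidence]:.3f}")
```

The printed z values are 1.645, 1.960 and 2.576, matching the table.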
$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
This interval will contain μ with 100(1-α)% probability
◦ If we are estimating μ because it is unknown, it is usually unreasonable to assume that we know σ
▪ Thus we replace σ by s (the sample standard deviation)
▪ This formula is used for large sample sizes (n ≥ 30)
▪ If the sample size is less than 30, a different distribution, the t-distribution, is used; we will get to this later
Compute a 95% confidence interval for μ if we know that s = 12 and a sample of size 36 yielded a mean of 7
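Worked by hand: the standard error is s/√n = 12/√36 = 2, so the interval is 7 ± 1.96 · 2 = 7 ± 3.92, that is, from 3.08 to 10.92. The same arithmetic as a short Python sketch (it uses the exact normal quantile rather than the rounded 1.96, which does not change the result at two decimals):

```python
import math
from statistics import NormalDist

# Values from the exercise: s = 12, n = 36, sample mean 7, 95% confidence.
s, n, xbar = 12.0, 36, 7.0
z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05, about 1.96
margin = z * s / math.sqrt(n)     # standard error s/sqrt(n) = 2, margin about 3.92
lower, upper = xbar - margin, xbar + margin
print(round(lower, 2), round(upper, 2))  # 3.08 10.92
```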
“Probability” means that in the long run 100(1-α)% of the intervals will contain the parameter
◦ If repeated samples were taken and confidence
intervals calculated then 100(1-α)% of the intervals
will contain the parameter
For one sample, we do not know whether the
confidence interval contains the parameter
The 100(1-α)% probability only refers to the
method that is being used
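The long-run meaning of the confidence level can be checked by simulation. The sketch below assumes a hypothetical normal population (μ = 7, σ = 12), repeatedly draws samples of size 36, builds the interval x̄ ± 1.96 · s/√n from each, and counts how often the interval captures μ. The fraction comes out close to 0.95; it tends to be slightly below 0.95 because s is substituted for σ, which is the gap the t-distribution addresses.

```python
import math
import random

# Illustrative assumptions: normal population with MU = 7 and SIGMA = 12,
# samples of size N = 36, and z_{alpha/2} = 1.96 for 95% confidence.
MU, SIGMA, N, Z = 7.0, 12.0, 36, 1.96
REPS = 10_000
random.seed(2008)

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Sample standard deviation with the n-1 divisor.
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (N - 1))
    margin = Z * s / math.sqrt(N)
    if xbar - margin <= MU <= xbar + margin:  # did this interval capture MU?
        hits += 1

coverage = hits / REPS
print(f"fraction of intervals containing MU: {coverage:.3f}")
```

Any single interval either contains μ or it does not; the 95% describes the fraction over many repetitions of the method.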
Incorrect statement
◦ With 95% probability, the population mean will fall
in the interval from 3.5 to 5.2
To avoid the misleading word “probability” we
say that we are “confident”
◦ We are 95% confident that the true population mean lies between 3.5 and 5.2
Changing our confidence level will change
our confidence interval
◦ Increasing our confidence level will increase the length of the confidence interval
▪ A confidence level of 100% would require a confidence interval of infinite length
▪ Not informative
There is a tradeoff between length and accuracy
◦ Ideally we would like a short interval with high accuracy (high confidence level)
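The tradeoff can be seen numerically. Using hypothetical values s = 12 and n = 36 (standard error 2), the length of the interval x̄ ± zα/2 · s/√n is 2 · zα/2 · 2, which grows with the confidence level. A sketch assuming Python 3.8+ for `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

# Hypothetical values: s = 12 and n = 36, so the standard error s/sqrt(n) is 2.
s, n = 12.0, 36
se = s / math.sqrt(n)

lengths = {}
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z_{alpha/2}
    lengths[confidence] = 2 * z * se  # full length of xbar +/- z * se
    print(f"{confidence:.0%} interval length: {lengths[confidence]:.2f}")
```

This prints lengths of about 6.58, 7.84 and 10.30 for 90%, 95% and 99% confidence.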