Lecture 3: Review

Review of Point and Interval Estimators
Statistical Significance
Hypothesis Testing
Point Estimates
An estimator (as opposed to an estimate) is a sample statistic used to predict the value of a population parameter. For example, the sample mean and the sample standard deviation are estimators.
A point estimate is the particular value an estimator takes in a given sample, used to predict the population value. For example, the estimate of the population mean from the sample mean may be "12".
Two characteristics of point estimators are their efficiency and bias.
- An efficient estimator is one that has the lowest standard error relative to other estimators.
- An estimator is biased to the extent that its sampling distribution is not centred on the population parameter.
Efficiency and Bias
[Figure: sampling distributions of an efficient estimator, an unbiased but inefficient estimator, and a biased estimator]
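To make efficiency and bias concrete, here is a small simulation sketch (our own illustration, not from the lecture). It assumes a normal population with μ = 0 and σ = 1 and samples of n = 20, and `simulate` is a hypothetical helper name. For a symmetric population the sample mean and the sample median are both centred on μ, but the median's sampling distribution is more spread out, so it is the less efficient estimator. The variance computed with divisor n is centred below σ² = 1, so it is biased; dividing by n - 1 removes that bias.

```python
import random
import statistics

def simulate(n=20, reps=5000, mu=0.0, sigma=1.0, seed=1):
    """Draw `reps` samples of size n from a normal population and
    summarise the sampling distributions of several estimators."""
    rng = random.Random(seed)
    means, medians, var_n, var_n1 = [], [], [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
        var_n.append(statistics.pvariance(sample))   # divisor n
        var_n1.append(statistics.variance(sample))   # divisor n - 1
    return {
        # Standard deviation of each sampling distribution = its standard error
        "se_mean": statistics.stdev(means),
        "se_median": statistics.stdev(medians),
        # Centres of the two variance estimators' sampling distributions
        "avg_var_n": statistics.fmean(var_n),
        "avg_var_n1": statistics.fmean(var_n1),
    }

result = simulate()
print(result)
```

With these defaults the median's standard error comes out noticeably larger than the mean's (about 1/√20 ≈ 0.224), and the divisor-n variance averages about (n - 1)/n = 0.95 of σ².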
Point Estimates
Sample mean: $\bar{y} = \sum y_i / n$
Population mean: $\mu = \sum Y_i / N$
Sample mean as a point estimate of the population mean: $\hat{\mu} = \bar{y} = \sum y_i / n$
Sample standard deviation: $s = \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}}$
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (Y_i - \mu)^2}{N}}$
Sample standard deviation as an estimate of the population standard deviation: $\hat{\sigma} = s = \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}}$
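As a quick check on these formulas, the sketch below computes $\bar{y}$ and s (with the n - 1 divisor) for a small made-up sample of heights; the data and the function names `sample_mean` and `sample_sd` are hypothetical. Python's `statistics.stdev` uses the same n - 1 definition, so the two results should agree.

```python
import math
import statistics

def sample_mean(ys):
    """Point estimate of mu: y-bar = sum(y_i) / n."""
    return sum(ys) / len(ys)

def sample_sd(ys):
    """Point estimate of sigma: s = sqrt(sum((y_i - y-bar)^2) / (n - 1))."""
    n = len(ys)
    ybar = sample_mean(ys)
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))

heights = [160, 172, 165, 158, 170]   # hypothetical sample, in cm
print(sample_mean(heights))           # 165.0
print(sample_sd(heights))             # sqrt(148 / 4) = sqrt(37), about 6.08
print(statistics.stdev(heights))      # same value from the standard library
```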
Interval Estimates
Confidence intervals for a mean
A confidence interval is an interval estimate constructed around a sample mean (or another statistic, such as a proportion). It is a range of values built so that, across repeated samples, a stated percentage of such intervals (for example 95%) would contain the population parameter.
95% Confidence Interval of a Mean
[Figure: sampling distribution of means; 95% of the area lies between z = -1.96 and z = 1.96, leaving 2.5% of the area in each tail, and the 95% confidence interval is drawn around a sample mean]
Confidence Interval of a Mean
$\text{C.I.} = \bar{y} \pm Z\hat{\sigma}_{\bar{y}}$, where $\hat{\sigma}_{\bar{y}} = \dfrac{s}{\sqrt{n}}$
Z is the z-score corresponding to the chosen confidence probability; $\hat{\sigma}_{\bar{y}}$ is an estimate of the standard error of the mean (for large samples); s is the sample standard deviation.
Example: you wish to estimate the average height of all Canadian men aged 18-21 from a sample of 50 with a mean height of 166 cm and a standard deviation of 30 cm. Calculate the 95% confidence interval for the mean.
$\hat{\sigma}_{\bar{y}} = \dfrac{s}{\sqrt{n}} = \dfrac{30}{\sqrt{50}} = 4.24$
$\text{C.I.}_{95} = \bar{y} \pm Z\hat{\sigma}_{\bar{y}} = 166 \pm 1.96(4.24) = 166 \pm 8.31$, i.e. 157.7 to 174.3
We are 95% confident that the true population mean falls in the interval from 157.7 cm to 174.3 cm.
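The same formula as a short Python sketch. The function name `mean_ci` is our own, and Z is passed in from the normal table rather than computed. For the heights example (n = 50, mean 166 cm, s = 30 cm, Z = 1.96) it gives roughly 157.7 cm to 174.3 cm.

```python
import math

def mean_ci(ybar, s, n, z=1.96):
    """Large-sample confidence interval for a mean: ybar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)   # estimated standard error of the mean
    margin = z * se
    return ybar - margin, ybar + margin

low, high = mean_ci(166, 30, 50)                  # heights example, 95% (Z = 1.96)
print(f"95% CI: {low:.1f} cm to {high:.1f} cm")   # 95% CI: 157.7 cm to 174.3 cm
```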
Point Estimates for Proportions
The sample proportion of individuals in category x: $p = \dfrac{X}{n}$
The population proportion of individuals with characteristic x: $\pi = \dfrac{X}{N}$
The sample proportion is an estimator of the population proportion: $\hat{\pi} = p = \dfrac{X}{n}$
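Confidence Intervals for Proportions
The standard deviation of the sampling distribution of the sample proportion: $\sigma_{\hat{\pi}} = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
The estimated standard error of the sample proportion: $\hat{\sigma}_{\hat{\pi}} = \sqrt{\dfrac{\hat{\pi}(1 - \hat{\pi})}{n}}$
Confidence interval for a proportion: $\hat{\pi} \pm Z\hat{\sigma}_{\hat{\pi}} = \hat{\pi} \pm Z\sqrt{\dfrac{\hat{\pi}(1 - \hat{\pi})}{n}}$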
Example:
Consider a poll predicting election results with a sample size of 1500. Of those who responded to a question about their voting intentions in an upcoming election, 840 (56%) answered that they would vote for the current governing party.
If we use the sample proportion as an estimator of the population proportion, then
$\hat{\pi} = 840 / 1500 = 0.56$
and the proportion that would not vote for the governing party is
$1 - \hat{\pi} = 1 - 0.56 = 0.44$
The standard error of the estimated proportion is then
$\hat{\sigma}_{\hat{\pi}} = \sqrt{\dfrac{\hat{\pi}(1 - \hat{\pi})}{n}} = \sqrt{\dfrac{(0.56)(0.44)}{1500}} = \sqrt{0.000164} = 0.0128$
The 99% confidence interval is:
$\text{C.I.}_{99} = \hat{\pi} \pm 2.58\,\hat{\sigma}_{\hat{\pi}} = 0.56 \pm 2.58(0.0128) = 0.56 \pm 0.033$
We are 99% confident that the true population proportion lies between 0.527 and 0.593.
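A minimal Python sketch of the proportion interval, assuming the large-sample formula above; `proportion_ci` is a hypothetical name and Z (here 2.58 for 99%) is supplied from the normal table.

```python
import math

def proportion_ci(successes, n, z):
    """Confidence interval for a proportion: pi_hat +/- z * sqrt(pi_hat(1 - pi_hat) / n)."""
    pi_hat = successes / n
    se = math.sqrt(pi_hat * (1 - pi_hat) / n)   # estimated standard error
    return pi_hat - z * se, pi_hat + z * se

low, high = proportion_ci(840, 1500, z=2.58)    # 99% interval for the poll example
print(f"{low:.3f} to {high:.3f}")               # 0.527 to 0.593
```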
Significance Tests
Elements of significance tests:
(1) Assumptions:
- Type of data
- Distributions
- Sampling
- Sample size
(2) Hypotheses:
- H0: There is no effect, relationship, or difference
- Ha: There is an effect, relationship, or difference
(3) Test Statistic
(4) Significance level, critical value
(5) Decision whether or not to reject H0
Large Sample Z-Test for a Mean
Example:
You sample 400 high school teachers and give them an IQ test, which is known to have a population mean of 100. You wish to know whether the sample mean IQ of 115 is significantly different from the population mean. The sample standard deviation is 18.
Assumptions: A z-test for a mean assumes random sampling and n greater than 30.
Hypotheses:
$H_0: \mu = 100$
$H_a: \mu \neq 100$
Test Statistic:
$z = \dfrac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \dfrac{\bar{y} - \mu_0}{s / \sqrt{n}} = \dfrac{115 - 100}{18 / \sqrt{400}} = \dfrac{15}{0.9}$
$z_{test} = 16.67$
Significance Level and Critical Value:
α = .05, .01, or .001
Find the appropriate critical values (z-scores) from the table of areas under the normal curve. For a two-tailed z-test at α = .05, $z_{critical} = 1.96$.
Decision:
Because $z_{test}$ (16.67) is greater than $z_{critical}$ (1.96), we reject the null hypothesis and conclude that high school teachers have significantly higher IQ scores than the general population.
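The IQ test as a Python sketch, assuming the large-sample formula with s/√n as the standard error; `z_test_mean` is a hypothetical name and the critical value 1.96 is taken from the normal table as in the text.

```python
import math

def z_test_mean(ybar, mu0, s, n):
    """Large-sample z statistic for H0: mu = mu0, using s / sqrt(n) as the standard error."""
    return (ybar - mu0) / (s / math.sqrt(n))

z_test = z_test_mean(115, 100, 18, 400)
z_critical = 1.96                  # two-tailed, alpha = .05 (normal table)
print(round(z_test, 2))            # 16.67
print(abs(z_test) > z_critical)    # True -> reject H0
```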
Critical Values
In 231 we always found a critical value for a test statistic given our chosen α-level and compared it to our observed value. Instead, we can find the probability of observing a test statistic at least as extreme as the observed value if H0 were true (the p-value), and compare that probability to our chosen α-level: we reject H0 if p < α.
Computer output gives us this exact probability, rather than only a test statistic to compare with a critical value.
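The exact normal probability can be computed with Python's standard-library `math.erfc`, so no table is needed. This is a sketch under the assumption that the test statistic follows the standard normal distribution when H0 is true (the large-sample z-tests in this lecture); `p_value_from_z` is a hypothetical helper name.

```python
import math

def p_value_from_z(z, tails=2):
    """Probability, if H0 is true, of a z at least as extreme as the one observed.

    The standard normal upper tail is P(Z >= |z|) = 0.5 * erfc(|z| / sqrt(2)).
    With tails=1 this is the one-tailed p-value, assuming the observed z lies
    in the direction stated by Ha.
    """
    upper_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return tails * upper_tail

alpha = 0.05
print(round(p_value_from_z(1.96), 3))   # 0.05: the critical value is where p equals alpha
p_iq = p_value_from_z(16.67)            # z from the IQ example
print(p_iq < alpha)                     # True -> reject H0
```

Comparing p with α gives the same decision as comparing $z_{test}$ with $z_{critical}$, because 1.96 is exactly the z whose two-tailed p-value is .05.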
Large-Sample Z-tests for proportions
Example:
You randomly sample 500 cattle and dairy farmers in Saskatchewan, asking them whether they have enough stored feed to last through a summer of severe drought. 275 report that they do not. You know that of the total population (all Canadian farmers), 50% do not have adequate stored feed. Is the proportion of Saskatchewan farmers who are unprepared significantly higher?
Assumptions:
Z-tests assume that the sample size is large enough that the sampling distribution will be approximately normal. In practice this means 30 cases or more. Random sampling is also assumed.
Hypotheses:
$H_0: \pi = \pi_0$, here $H_0: \pi = 0.50$
$H_a: \pi > \pi_0$, here $H_a: \pi > 0.50$
Test Statistic:
$z = \dfrac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} = \dfrac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$
With $\hat{\pi} = 275 / 500 = 0.55$:
$z_{test} = \dfrac{0.55 - 0.50}{\sqrt{.50(1 - .50)/500}} = \dfrac{0.05}{\sqrt{.25/500}} = 2.236$
Significance level or critical value:
If we look up $z_{test} = 2.24$ in the table of normal curve probabilities, we find that the one-tail probability is about 0.0125.
This is greater than .01, so at α = .01 we would fail to reject the null. However, it is less than .05, so at α = .05 we reject the null.
Conclusion:
At the .05 level, a significantly higher proportion of Saskatchewan farmers reported that they did not have enough stored feed to withstand a drought (p ≈ .0125, one-tailed).
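A Python sketch of the one-tailed proportion test, assuming $H_a: \pi > \pi_0$ and the standard error computed under H0; `z_test_proportion` is a hypothetical name. Because it keeps the unrounded z = 2.236 rather than the table entry for 2.24, its p-value is slightly larger (about 0.0127), which does not change the decision.

```python
import math

def z_test_proportion(successes, n, pi0):
    """Large-sample z-test for H0: pi = pi0 against Ha: pi > pi0."""
    pi_hat = successes / n
    se0 = math.sqrt(pi0 * (1 - pi0) / n)             # standard error computed under H0
    z = (pi_hat - pi0) / se0
    p_one_tail = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    return z, p_one_tail

z, p = z_test_proportion(275, 500, 0.50)
print(round(z, 3), round(p, 4))   # 2.236 0.0127
```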
Small-Sample Inference for Means
The t-distribution
The t-distribution is another bell-shaped, symmetrical distribution centred on 0. It differs from the normal distribution in that, as sample size decreases, its tails become thicker than those of the normal distribution. However, when n ≥ 30, the t-distribution is practically the same as the normal distribution.
Example:
Suppose we have a sample of 25 young offenders whom we have interviewed. We are interested in the effects of being born to young parents. The average age of the mothers at the birth of the future young offenders was 20.8 years, with a s.d. of 3.5 years. The average age at first birth for women in Canada in 1990 was 23.2 years.
Is the average age of the mothers of young offenders different from that of mothers in the general Canadian population?
Assumptions:
It is assumed that the sample is a simple random sample (SRS) from a population that is normally distributed (but the test is fairly robust to violations of this assumption).
Hypotheses:
$H_0: \mu = \mu_0$, here $H_0: \mu = 23.2$
$H_a: \mu \neq \mu_0$, here $H_a: \mu \neq 23.2$
Test Statistic:
$t = \dfrac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \dfrac{\bar{y} - \mu_0}{s / \sqrt{n}} = \dfrac{20.8 - 23.2}{3.5 / \sqrt{25}} = \dfrac{-2.4}{0.7}$
$t_{actual} = -3.43$
Significance level or critical value:
We look for the probability of finding a t-value of -3.43 with n - 1 = 24 degrees of freedom.
If our chosen α-level is .05 (two-tailed), we look for the $t_{critical}$ value associated with a single-tail probability of .025 (half of the two-tailed α).
$t_{critical}$ at α = .05 (two-tailed), 24 df = 2.064.
Decision:
We reject the null hypothesis, because the absolute value of the actual t (3.43) is greater than the critical value of t (2.064) at α = .05. This means that the difference between the observed sample mean and the hypothesized population mean is large enough that we are willing to conclude that the sample comes from a population whose mean age at first birth differs from that of the Canadian population.
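A sketch of the same one-sample t-test in Python. The critical value 2.064 is taken from a t table (24 df, two-tailed α = .05), as in the text, and `t_statistic` is a hypothetical name. If SciPy happens to be available, `2 * scipy.stats.t.sf(abs(t), df)` would give the exact two-tailed p-value instead; the sketch sticks to the standard library.

```python
import math

def t_statistic(ybar, mu0, s, n):
    """One-sample t statistic: (ybar - mu0) / (s / sqrt(n)), with n - 1 df."""
    return (ybar - mu0) / (s / math.sqrt(n))

n = 25
t_actual = t_statistic(20.8, 23.2, 3.5, n)
df = n - 1
t_critical = 2.064                     # t table: 24 df, two-tailed alpha = .05
reject_h0 = abs(t_actual) > t_critical

print(round(t_actual, 2), df, reject_h0)   # -3.43 24 True
```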
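Significance Tests
Note that the z and t statistics have the same form, (estimate - hypothesized value) / standard error:
Large-sample test for means: $z = \dfrac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \dfrac{\bar{y} - \mu_0}{s / \sqrt{n}}$
Large-sample test for proportions: $z = \dfrac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} = \dfrac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$
Small-sample test for means: $t = \dfrac{\bar{y} - \mu_0}{\hat{\sigma}_{\bar{y}}} = \dfrac{\bar{y} - \mu_0}{s / \sqrt{n}}$, with df = n - 1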
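Decisions and Types of Errors
"Confusion" Matrix

|                   | H0 is true                | H0 is false                          |
|-------------------|---------------------------|--------------------------------------|
| Reject H0         | Type I error (α)          | Correct decision (1 - β), or "Power" |
| Fail to reject H0 | Correct decision (1 - α)  | Type II error (β)                    |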
Type I and Type II Errors
[Figure: distribution under H0 and distribution under Ha, with the P-level (α) region marked]
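A simulation sketch (our own illustration, not from the lecture) connects the matrix to the figure: draw many samples from a population where H0 is true and count how often a two-tailed z-test at α = .05 rejects it (Type I errors), then repeat with a population where H0 is false to estimate power, 1 - β. The population values (μ0 = 100, σ = 15, n = 40, a true mean of 105) and the name `rejection_rate` are arbitrary assumptions, and σ is treated as known so that the rejection rate under H0 should land near .05.

```python
import math
import random

def rejection_rate(true_mu, mu0=100.0, sigma=15.0, n=40, z_critical=1.96,
                   reps=10_000, seed=7):
    """Share of simulated samples in which a two-tailed z-test rejects H0: mu = mu0.

    If true_mu == mu0 (H0 true), this estimates the Type I error rate (alpha).
    If true_mu != mu0 (H0 false), it estimates power, 1 - beta.
    sigma is treated as known, an assumption made to keep the sketch simple.
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        z = (ybar - mu0) / se
        if abs(z) > z_critical:
            rejections += 1
    return rejections / reps

alpha_hat = rejection_rate(true_mu=100.0)   # H0 true: expect roughly .05
power_hat = rejection_rate(true_mu=105.0)   # H0 false: expect roughly .56
beta_hat = 1 - power_hat                    # estimated Type II error rate
print(alpha_hat, power_hat, beta_hat)
```

Raising n or the distance between the true mean and μ0 pulls the two distributions in the figure further apart, which raises power and lowers β while α stays fixed at the chosen level.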