Transcript PPT 4

Presentation 4
First Part
Introduction to Inference:
Confidence Intervals
and Hypothesis Testing
What is inference?
Inference is when we use a sample to make conclusions
about a population.
1. Draw a Representative SAMPLE from the POPULATION
2. Describe the SAMPLE
   [Slide graphic: an example sample data table]
   Var 1   Var 2   Var 3
   459     Brown    28
   657     Red      43
   321     Green    46
   213     Blue     47
   536     Blue     53
3. Use Rules of Probability and Statistics to make Conclusions about the POPULATION from the SAMPLE.
   [Slide graphic: a bar chart of East/West/North values by quarter, illustrating conclusions about the population]
Population Parameters
• p = population proportion
• µ = population mean
• σ = population standard deviation
• β1 = population slope (we will see later)

Sample Statistics
• p̂ = sample proportion
• x̄ = sample mean
• s = sample standard deviation
• b1 = sample slope (we will see later)
Two Types of Inference
1. Confidence Intervals:
– Confidence Intervals give us a range in which the
population parameter is likely to fall.
– We use confidence intervals whenever the research
question calls for an estimation of a population
parameter.
Example: Estimate the proportion of US adult
women who would vote for Hillary Clinton as
president.
Example: What is the mean age of trees in the
forest?
Two Types of Inference, Cont
2. Hypothesis Testing:
– Hypothesis tests are tests of population
parameters.
Example: Is the proportion of US adult women who
would vote for Hillary Clinton greater than 50%?
– We can only prove that a population parameter is
‘different’ than our null value. We cannot prove
that a population parameter is equal to some value.
Example:
Valid Hypothesis: Is the mean age of trees in the
forest greater than 50 years?
Invalid Hypothesis: Is the mean age of trees in the
forest equal to 50 years?
Types of CI’s and Hypothesis Tests
For Hypothesis Tests and C.I.’s:
• 1-proportion (1-categorical variable)
• 1-mean (1-quantitative variable)
• Difference in 2 proportions (2-categorical variables, both with 2 possible outcomes)
• Difference in 2 means (1-quantitative and 1-categorical variable, or 2-quantitative variables, independent samples)
• Regression, Slope (2-quantitative variables)
For Hypothesis Tests only:
• Chi-Square Test (2-categorical variables, at least one with 3 or more levels!)
Some Examples
• Polina wants to estimate the mean high-school GPA of incoming freshmen at FIT.
  Solution: CI for one population mean.
• Pampos wants to know if the proportion of PSU students who engage in underage drinking is greater than 25%.
  Solution: Hypothesis test of one proportion
  Null Hypothesis:        H0: p ≤ .25
  Alternative Hypothesis: Ha: p > .25
• Isaac wants to estimate the difference in the proportion of men and women who smoke.
  Solution: CI for difference in 2-proportions.
Interpreting Confidence Intervals
• Given the confidence level (90%, 95%, 99%, etc.), conclude the following (let L = confidence level):
  “With L% confidence the population parameter is within the confidence interval.”
Example: Suppose the 90% CI for age of trees in the
forest is (32,45) years.
We are 90% confident that the true mean age of trees in
the forest is between 32 and 45 years.
Interpreting Hypothesis Tests
• There are two hypotheses, the null and the alternative.
• The research aim is to prove the alternative hypothesis (a statistically significant result).
• Use the p-value to determine whether we can reject the null hypothesis (H0).
At this point we don’t need to know the exact definition, or
how to calculate the p-value. But generally, the p-value is
a measure of how consistent the data is with the
null hypothesis. A small p-value (<.05) indicates the
data we obtained was UNLIKELY under the null hypothesis.
Decision Rule:
If the p-value is <.05 we REJECT the null hypothesis, and accept the
alternative. We have a statistically significant result!
If the p-value is >.05 then we say that we do NOT have enough
evidence in the data to reject the null hypothesis.
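To make the decision rule concrete, here is a minimal Python sketch of a one-proportion test in the spirit of the underage-drinking example above. The sample counts (118 of 400) and the use of scipy's normal distribution are illustrative assumptions, not values from the slides.

```python
# Minimal sketch of the p-value decision rule for a one-proportion test.
# Hypothetical data: 118 of 400 students report underage drinking.
from math import sqrt
from scipy.stats import norm

n, successes = 400, 118
p0 = 0.25                        # null value: H0: p <= .25 vs Ha: p > .25
p_hat = successes / n

# The test statistic uses the null proportion p0 in the standard error.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - norm.cdf(z)        # one-sided ("greater than") p-value

if p_value < 0.05:
    print(f"p-value = {p_value:.3f} < .05: reject H0 and accept Ha")
else:
    print(f"p-value = {p_value:.3f} >= .05: not enough evidence to reject H0")
```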
Second Part
Confidence Intervals
for 1-Proportion
Sample Proportion
• Mean of p̂:  E(p̂) = p
• Standard deviation of p̂:  s.d.(p̂) = √( p(1 − p) / n )
• Standard error of p̂:  s.e.(p̂) = √( p̂(1 − p̂) / n )
• If np and n(1 − p) are greater than or equal to 10, the sampling distribution of p̂ is approximately normal with mean p and standard deviation √( p(1 − p) / n ), i.e.
  p̂ ~ N( p, √( p(1 − p) / n ) )   (approximately)
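The normal approximation above can be checked by simulation. The sketch below uses illustrative values p = 0.30 and n = 200 (so np = 60 and n(1 − p) = 140, both at least 10) and compares the spread of many simulated p̂ values with √( p(1 − p) / n ).

```python
# Simulation sketch: the sampling distribution of p-hat is approximately
# N(p, sqrt(p(1-p)/n)) when np and n(1-p) are both at least 10.
import numpy as np

rng = np.random.default_rng(1)
p, n = 0.30, 200
p_hats = rng.binomial(n, p, size=10_000) / n        # many sample proportions

print("mean of simulated p-hats:", p_hats.mean())   # close to p = 0.30
print("sd of simulated p-hats:  ", p_hats.std())    # close to the theory value
print("theory sd sqrt(p(1-p)/n):", np.sqrt(p * (1 - p) / n))   # ~0.0324
```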
From Sampling Distributions to Confidence Intervals…
• The sample proportion will fall close to the true (unknown) proportion.
• Thus, the true proportion is likely to be close to the observed sample proportion. How close?
• 95% of the p̂ would be expected to fall within ± 2 standard deviations of the true proportion p.
• SO if we were to construct intervals around the sample proportion with a width of ± 2 standard deviations, these intervals would contain the TRUE population proportion 95% of the time!
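A short simulation can illustrate this coverage claim. In the sketch below the true proportion p = 0.40 and sample size n = 500 are illustrative assumptions, and the interval half-width uses 2 estimated standard errors, as in the confidence intervals constructed later.

```python
# Coverage sketch: intervals p-hat +/- 2 s.e.(p-hat) should capture the
# true proportion roughly 95% of the time.
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.40, 500, 10_000
p_hats = rng.binomial(n, p, size=reps) / n
half_width = 2 * np.sqrt(p_hats * (1 - p_hats) / n)   # 2 standard errors

covered = (p_hats - half_width <= p) & (p <= p_hats + half_width)
print("coverage:", covered.mean())                    # close to 0.95
```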
Margin of Error & C.I.
• p̂ is an estimator of p, but it is not exactly equal to p. But how far is p̂ from p? Or, how far is p from p̂?
• Margin of Error is a measure of accuracy providing a likely upper limit for the difference between p̂ and p.
• In other words, this difference is almost always less than the Margin of Error, i.e.
  | p̂ − p | ≤ Margin of Error   almost always
• The “almost always” is translated as “with large probability”. Usually we are talking about 90%, 95% or 99% probability.
Margin of Error & C.I., Cont
• This probability is the confidence level.
• For example, if the confidence level is 95%, it means that 95% of the time the difference between p̂ and p is less than the Margin of Error. (e.g. we expect 38 out of 40 samples to give a p̂ whose difference from p is less than the Margin of Error.)
• Example: Based on a sample of 1000 voters, the proportion of voters who favor candidate A is 34% with a 3% Margin of Error based on a 95% confidence level. What does this tell us?
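As a quick check of this example, the stated 3% Margin of Error is consistent with n = 1000, p̂ = .34, and a multiplier of about 2 for 95% confidence. A small Python sketch using the slide's numbers:

```python
# Check the 3% margin of error: n = 1000, p-hat = 0.34, multiplier ~2 (95%).
from math import sqrt

n, p_hat, M = 1000, 0.34, 2
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
print(round(M * se, 3))              # about 0.03, i.e. a 3% margin of error
```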
Confidence Interval for 1-Proportion
• Conditions: We need to have np̂ ≥ 10 and n(1 − p̂) ≥ 10.
  Note that we are using p̂ instead of p here!
• CI for p:  p̂ ± M · s.e.(p̂), where M · s.e.(p̂) is the Margin of Error
  – M = multiplier, depends on the level of confidence desired. For a 95% CI the multiplier is ~ 2.
  – s.e.(p̂) is the standard error of the sample proportion.
  – Margin of Error = the multiplier times the s.e.
• Interpretation:
  If M = 2, we are 95% confident that the true population proportion is contained within the confidence interval.
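A small Python sketch of this formula follows. The function name one_prop_ci is my own, and it uses the exact normal multiplier from scipy (about 1.96 for 95% and 2.58 for 99%) rather than the rounded value of 2.

```python
# Sketch of the one-proportion confidence interval: p-hat +/- M * s.e.(p-hat).
from math import sqrt
from scipy.stats import norm

def one_prop_ci(successes, n, confidence=0.95):
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)        # s.e.(p-hat)
    M = norm.ppf(1 - (1 - confidence) / 2)    # ~1.96 for 95%, ~2.58 for 99%
    margin = M * se                           # margin of error
    return p_hat - margin, p_hat + margin
```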
Example 1: A sample of 1200 people is polled to determine
the percentage that are in favor of candidate A. Suppose
580 say they are in favor. Construct a 95% CI for the true
population proportion.
95% C.I. for p:  p̂ ± M · s.e.(p̂)
p̂ = 580/1200 = .483
s.e.(p̂) = √( p̂(1 − p̂) / n ) = √( .483(1 − .483) / 1200 ) = .0144
M ≈ 2 (because we want a 95% C.I.)
So the 95% CI for p is .483 ± 2(.0144) = (.455, .512)
Conclusion: We are 95% confident that the true population
proportion of those who support candidate A is between
45.5% and 51.2%.
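Re-running Example 1 in Python with the exact 95% multiplier of 1.96 (instead of the rounded M = 2) gives essentially the same interval:

```python
# Check of Example 1 with the exact 95% multiplier 1.96.
from math import sqrt

p_hat = 580 / 1200
se = sqrt(p_hat * (1 - p_hat) / 1200)
print(round(p_hat - 1.96 * se, 3), round(p_hat + 1.96 * se, 3))  # ~ (0.455, 0.512)
```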
Example 2:
• 300 high-risk patients received an experimental AIDS vaccine. The patients were followed for a period of 5 years and ultimately 53 came down with the virus. Assuming all patients were exposed to the virus, construct a 99% CI for the proportion of individuals protected.
99% CI:  p̂ ± M · s.e.(p̂)
p̂ = 247/300 = .823
s.e.(p̂) = √( p̂(1 − p̂) / n ) = √( .823(1 − .823) / 300 ) = .0220
M = 2.58
Can you see why M = 2.58 using the Normal table?
So the 99% CI is .823 ± 2.58(.0220) = (.767, .880)
We are 99% confident that the true proportion of those protected by
the vaccine is between 76.7% and 88.0%.
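The same quick check for Example 2, using the slide's 99% multiplier of 2.58:

```python
# Check of Example 2: 247 of 300 protected, 99% confidence, multiplier 2.58.
from math import sqrt

p_hat = 247 / 300
se = sqrt(p_hat * (1 - p_hat) / 300)                             # ~0.0220
print(round(p_hat - 2.58 * se, 3), round(p_hat + 2.58 * se, 3))  # ~ (0.767, 0.880)
```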
Width of a Confidence Interval is affected by:
• n: as the sample size increases, the standard error of p̂ decreases and the confidence interval gets smaller. So a larger sample size gives us a more precise estimate of p.
• M: as the confidence level increases, the multiplier M increases, leading to a wider confidence interval.
So, if we want to control the length of the C.I. we can adjust the confidence level or the sample size...
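The sketch below illustrates both effects. The value p̂ = 0.5 is an illustrative worst-case assumption (it maximizes the standard error): the margin of error shrinks as n grows and widens as the confidence level increases.

```python
# How the margin of error M * s.e.(p-hat) responds to n and to the confidence level.
from math import sqrt
from scipy.stats import norm

p_hat = 0.5                                      # worst case for the standard error
for confidence in (0.90, 0.95, 0.99):
    M = norm.ppf(1 - (1 - confidence) / 2)       # multiplier grows with confidence
    for n in (100, 400, 1600):
        moe = M * sqrt(p_hat * (1 - p_hat) / n)  # shrinks as n grows
        print(f"confidence={confidence:.2f}  n={n:5d}  margin of error={moe:.3f}")
```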