Hypothesis Testing


Chapter 8: Introduction to Statistical Inference
April 16
In Chapter 8:
8.1 Concepts
8.2 Sampling Behavior of a Mean
8.3 Sampling Behavior of a Count and Proportion
§8.1: Concepts
Statistical inference is the act of generalizing from a sample to a population with a calculated degree of certainty.
We are curious about parameters in the population; we calculate statistics in the sample.
Parameters and Statistics
It is essential to draw the distinction between
parameters and statistics.
                      Parameters   Statistics
Source                Population   Sample
Calculated?           No           Yes
Constant?             Yes          No
Notation (examples)   μ, σ, p      x̄, s, p̂
§8.2 Sampling Behavior of a Mean
• How precisely does a given sample mean
reflect the underlying population mean?
• To answer this question, we must
establish the sampling distribution of x-bar
• The sampling distribution of x-bar is the
hypothetical distribution of means from all
possible samples of size n taken from the
same population
Simulation Experiment
• Population: N = 10,000 with lognormal
distribution (positive skew), μ = 173, and σ
= 30 (Figure A, next slide)
• Take repeated SRSs, each of n = 10, from
this population
• Calculate x-bar in each sample
• Plot the x-bars (Figure B, next slide)
[Figure: A. Population (individual values); B. Sampling distribution of x-bars]
Findings
1. Distribution B is Normal
even though Distribution
A is not (Central Limit
Theorem)
2. Both distributions are
centered on µ
(“unbiasedness”)
3. The standard deviation of
Distribution B is much
less than the standard
deviation of Distribution A
(square root law)
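The simulation experiment can be sketched in a few lines of Python. The lognormal parameters (5.14, 0.17) are my own choices, picked so the population mean and SD land near the slides' μ = 173 and σ = 30; everything else follows the slides' recipe (repeated SRSs of n = 10, then summarize the x-bars).

```python
import random
import statistics

random.seed(42)

# Assumed lognormal parameters (5.14, 0.17): chosen so the population
# mean is near 173 and the SD near 30, matching the slides' setup.
population = [random.lognormvariate(5.14, 0.17) for _ in range(10_000)]

# Take repeated SRSs, each of n = 10, and record each sample mean.
n = 10
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(2_000)]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.mean(sample_means)   # near pop_mean (unbiasedness)
sd_of_means = statistics.stdev(sample_means)    # near pop_sd / sqrt(n)

print(round(pop_mean, 1), round(pop_sd, 1))
print(round(mean_of_means, 1), round(sd_of_means, 1))
```

With this seed, the mean of the sample means sits within a fraction of a point of the population mean, and their SD is roughly pop_sd / √10, illustrating Findings 2 and 3.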
Results from Simulation
Experiment
• Finding 1 (central limit theorem) says the sampling distribution of x-bar tends toward Normality even when the population distribution is not Normal. This effect is strong in large samples.
• Finding 2 (unbiasedness) means that the expected value of x-bar is μ.
• Finding 3 is related to the square root law, which says:

σx̄ = σ / √n
Standard Deviation (Error) of the
Mean
• The standard deviation of the sampling
distribution of the mean has a special
name: it is called the “standard error of the
mean” (SE)
• The square root law says the SE is inversely
proportional to the square root of the sample
size:
 x  SE x 

n
Example: the Wechsler Adult Intelligence Scale has σ = 15

For n = 1:  SEx̄ = 15 / √1 = 15
For n = 4:  SEx̄ = 15 / √4 = 7.5
For n = 16: SEx̄ = 15 / √16 = 3.75
Quadrupling the sample size cut the SE in half
Square root law!
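The three calculations above reduce to one line of code; a minimal sketch (the function name `se_mean` is my own):

```python
import math

sigma = 15  # Wechsler population standard deviation

def se_mean(sigma, n):
    """Square root law: the standard error of the mean is sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (1, 4, 16):
    print(n, se_mean(sigma, n))
# Quadrupling n halves the SE: 15.0 -> 7.5 -> 3.75
```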
Putting it together: x̄ ~ N(µ, SE)
• The sampling distribution of x-bar is Normal with
mean µ and standard deviation (SE) = σ / √n
(when population Normal or n is large)
• These facts make inferences about µ possible
• Example: Let X represent Wechsler adult intelligence scores: X ~ N(100, 15).
 Take an SRS of n = 10
 SE = σ / √n = 15/√10 = 4.7
 x-bar ~ N(100, 4.7)
 68% of sample means will be in the range
µ ± SE = 100 ± 4.7 = 95.3 to 104.7
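The 68% claim can be checked by simulation; a quick sketch (the sample counts are arbitrary choices of mine):

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 100, 15, 10
se = sigma / n ** 0.5                 # 15 / sqrt(10), about 4.74

# Draw many SRSs of n Wechsler scores and record each sample mean.
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(10_000)]

# Fraction of sample means falling within mu +/- 1 SE: about 0.68.
within_1se = sum(mu - se <= m <= mu + se for m in means) / len(means)
print(round(within_1se, 2))
```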
Law of Large Numbers
• As a sample gets
larger and larger,
the sample mean
tends to get closer
and closer to μ
• This tendency is
known as the Law
of Large Numbers
This figure shows results from
a sampling experiment in a
population with μ = 173.3
As n increased, the sample
mean became a better
reflection of μ = 173.3
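The Law of Large Numbers is easy to see in code. A sketch assuming a Normal population with μ = 173.3; the σ = 30 echoes the earlier simulation experiment and is my assumption here, since the figure states only μ:

```python
import random
import statistics

random.seed(7)
mu, sigma = 173.3, 30   # sigma = 30 is assumed, as in the earlier experiment

draws = [random.gauss(mu, sigma) for _ in range(100_000)]

# The running sample mean settles toward mu as n grows.
for n in (10, 100, 1_000, 100_000):
    print(n, round(statistics.mean(draws[:n]), 1))
```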
8.3 Sampling Behavior of Counts
and Proportions
• Recall (from Ch 6) that a binomial random variable
represents the random number of successes (X)
in n independent “success/failure” trials; the
probability of success for each trial is p
• Notation X~b(n,p)
• The sampling distribution X~b(10,0.2) is shown
on the next slide: μ = 2 when the outcome is
expressed as a count and μ = 0.2 when the
outcome is expressed as a proportion.
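A quick simulation confirms the two centers; a sketch (the helper `binom_draw` is my own, built from Bernoulli trials):

```python
import random

random.seed(3)
n, p = 10, 0.2

def binom_draw(n, p):
    """One draw from b(n, p): count the successes in n independent trials."""
    return sum(random.random() < p for _ in range(n))

counts = [binom_draw(n, p) for _ in range(50_000)]
mean_count = sum(counts) / len(counts)                 # near np = 2
mean_prop = sum(c / n for c in counts) / len(counts)   # near p = 0.2
print(round(mean_count, 2), round(mean_prop, 2))
```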
Normal Approximation to the
Binomial
• When n is large, the binomial distribution
takes on Normal properties
• How large does the sample have to be to
apply the Normal approximation?
• One rule says that the Normal
approximation applies when npq ≥ 5
Top figure: X~b(10,0.2)
npq = 10 ∙ 0.2 ∙ (1−0.2) = 1.6 (less than 5) → Normal approximation does not apply

Bottom figure: X~b(100,0.2)
npq = 100 ∙ 0.2 ∙ (1−0.2) = 16 (greater than 5) → Normal approximation applies
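The rule of thumb is a one-liner; a sketch (the function name is my own):

```python
def normal_approx_ok(n, p, cutoff=5):
    """Slide rule of thumb: the Normal approximation applies when npq >= cutoff."""
    return n * p * (1 - p) >= cutoff

print(normal_approx_ok(10, 0.2))    # npq = 1.6  -> False
print(normal_approx_ok(100, 0.2))   # npq = 16.0 -> True
```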
Normal Approximation for a
Binomial Count
  np and   npq
When Normal approximation applies:

X ~ N np, npq

Normal Approximation for a
Binomial Proportion
μ = p and σ = √(pq / n)

p̂ ~ N(p, √(pq / n))
“p-hat” is the symbol for the sample proportion
Illustrative Example: Normal
Approximation to the Binomial
• Suppose the
prevalence of a risk
factor in a population is
p = 0.2
• Take an SRS of n =
100 from this
population
• The number of cases in a sample will follow a binomial distribution with n = 100 and p = 0.2
Illustrative Example, cont.
The Normal approximation for the binomial count is:

μ = np = 100 × 0.2 = 20
σ = √(npq) = √(100 × 0.2 × 0.8) = 4
X ~ N(20, 4)

The Normal approximation for the binomial proportion is:

μ = p = 0.2
σ = √(pq / n) = √((0.2 × 0.8) / 100) = 0.04
p̂ ~ N(0.2, 0.04)
Illustrative Example, cont.
1. Statement of the problem: Suppose we see a sample with 30 cases. What is the probability of seeing at least 30 cases under these circumstances, i.e., Pr(X ≥ 30) = ?, assuming X ~ N(20, 4)
2. Standardize: z = (30 – 20) / 4 = 2.5
3. Sketch: next slide
4. Table B: Pr(Z ≥ 2.5) = 0.0062
Illustrative Example, cont.
Binomial and superimposed Normal sampling
distributions for the problem.
Our Normal approximation suggests that only 0.0062 of samples will show at least this many cases.
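The whole example can be reproduced, and the Normal approximation compared against the exact binomial tail, using only the standard library; a sketch:

```python
import math

n, p = 100, 0.2

# Normal approximation: X ~ N(20, 4), so z = (30 - 20) / 4 = 2.5.
mu = n * p
sd = math.sqrt(n * p * (1 - p))
z = (30 - mu) / sd
approx = 0.5 * math.erfc(z / math.sqrt(2))   # Pr(Z >= 2.5), about 0.0062

# Exact binomial tail: Pr(X >= 30) for X ~ b(100, 0.2).
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(30, n + 1))

print(round(approx, 4), round(exact, 4))
```

Comparing the two values shows how rough the Normal approximation can be out in the tail of a skewed binomial.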