sampling distribution


Today in Class
• Last time we discussed statistical reasoning
and Type I and Type II errors
• Today we’ll discuss Type I and Type II
errors in more depth
• We’ll also discuss the necessity of sampling
distributions and how to find the sampling
distribution for a sample proportion
Hypothesis Testing Example
• I know I have 5 eggs,
but I don’t know if
they’re good or bad.
• I’ll make a guess that
3 are good.
• Then I can get all
possible samples of 3
from that scenario.
• I note that for this
hypothetical pop, it is
impossible to get 3
bad eggs out of 3.
• It is also unlikely (but
still possible) to get 3
good eggs out of 3.
• I’ll take a real sample.
If I get either of these
cases, I won’t believe
the hypothesized pop.
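The enumeration described above can be sketched in a few lines of Python. This is only an illustration: the labels "G" and "B" and the variable names are my own choices, not part of the slides. It lists every sample of 3 from the hypothesized population (3 good, 2 bad) and tallies how many good eggs each sample contains.

```python
from collections import Counter
from itertools import combinations

# Hypothesized population: 3 good eggs (G) and 2 bad eggs (B).
population = ["G", "G", "G", "B", "B"]

# Every possible sample of 3 eggs, treating each egg as distinct.
samples = list(combinations(range(len(population)), 3))

# Tally the samples by the number of good eggs they contain.
good_counts = Counter(
    sum(population[i] == "G" for i in sample) for sample in samples
)

print("possible samples:", len(samples))
for k in range(4):
    print(k, "good eggs:", good_counts[k], "of", len(samples), "samples")
```

A count of zero for "0 good eggs" is the "impossible" case (3 bad out of 3), and the single sample with 3 good eggs is the "unlikely but possible" case.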
Type I and Type II Errors
• Recall that a Type I error is rejecting a true
null hypothesis.
• If the null hypothesis (3/5 good eggs) is
true, my decision rule will reject this
hypothesis for 1/10 samples. Therefore, the
probability of a Type I error is 0.10.
• Type II errors depend on what the true
population is.
Type I and Type II Errors
• If there are no bad eggs in the pop of 5, then
every sample of 3 will have all good eggs. I’ll
reject the null hypothesis - correct decision.
In this case, I can’t make a Type II error.
• If there is 1 bad egg in the pop of 5, then of
the 10 possible samples, 6 samples have at
least one bad egg and at least one good egg.
I’ll fail to reject the false null hypothesis, and
make a Type II error. Thus for this case, I
have a 0.6 probability of a Type II error.
Type I and Type II Errors
• If there are really 3 bad eggs in the pop of 5,
then there is one sample (of 10 possible
samples) for which I reject the null
hypothesis. Thus, the probability of a Type
II error is 0.90.
• If there are really 4 bad eggs in the pop of 5,
then there are 4 samples (of 10) for which I
will reject the null hypothesis. Thus, the
probability of a Type II error is 0.60.
Type I and Type II Errors
• If there are 5 bad eggs out of 5 in the pop,
then every sample has 3 bad eggs and I
reject the null hypothesis. Thus, the
probability of a Type II error is 0 for this
case.
• I’ll demonstrate this with the coin-flip
challenge.
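The Type II error probabilities for the egg example can be checked by brute force. The sketch below repeats the enumeration for every possible number of bad eggs in the population of 5, using the same decision rule (reject when the sample of 3 is all good or all bad). The function and variable names are assumptions for this illustration only.

```python
from fractions import Fraction
from itertools import combinations

def reject_probability(n_bad, pop_size=5, sample_size=3):
    """Fraction of all possible samples that lead to rejecting the null."""
    population = "B" * n_bad + "G" * (pop_size - n_bad)
    samples = list(combinations(population, sample_size))
    rejected = sum(len(set(s)) == 1 for s in samples)  # all good or all bad
    return Fraction(rejected, len(samples))

NULL_BAD = 2  # the null hypothesis says 3 good eggs, so 2 bad

# A Type II error is failing to reject when the null is false.
type_ii = {
    n_bad: 1 - reject_probability(n_bad)
    for n_bad in range(6)
    if n_bad != NULL_BAD
}

for n_bad, prob in type_ii.items():
    print(n_bad, "bad eggs: P(Type II error) =", float(prob))
```

The probabilities differ from one true population to the next, which is the sense in which Type II errors depend on what the true population is.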
Coin Flip Challenge
• I make the real flips
my null hypothesis,
because I can
characterize all the
possible sets of 200
flips and their
probabilities for real
flips
• I’ll make a decision
rule to decide whether
a set of 200 flips is
real or not.
[Figure: decision rule with "Fail to Reject" and "Reject" regions. Null: Real. Alt: Fake.]
Statistical Reasoning
• Since we must rely on samples to make
inference about the population, we want to
consider every possible sample from a
hypothetical population.
• The sampling distribution is the
characterization of a sample statistic based
on every possible sample from a
hypothetical population.
• Finding sampling distributions is central to
statistics.
Finding Sampling Distributions
Mathematical
• Use of mathematics
and systematic
reasoning to derive
sampling distribution
• Results in normal, t,
χ², and F distributions
(which we will study
later)
Simulation
• Uses a computer to
mimic the sampling process
• Take thousands of samples
• Relies on a sample of
samples
• Mathematical approach
should be used whenever
possible
An Example of a Simulation
• To determine the distribution of the longest
run in 200 coin flips, I used a simulation
• Program to simulate flipping a fair coin 200
times
• Repeat the 200 flips 1000 times
• Note how often each longest-run length occurs.
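A simulation along these lines might look like the following sketch. It is not the program used in class. Two assumptions are mine: a "run" is taken to be a streak of identical outcomes (heads or tails), and the random seed is fixed only so the result is repeatable.

```python
import random
from collections import Counter

def longest_run(flips):
    """Length of the longest streak of identical consecutive outcomes."""
    best = current = 1
    for prev, nxt in zip(flips, flips[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best

def simulate(n_flips=200, n_reps=1000, seed=0):
    """Tally the longest run over n_reps simulated sets of n_flips fair flips."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    tally = Counter()
    for _ in range(n_reps):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        tally[longest_run(flips)] += 1
    return tally

runs = simulate()
for length in sorted(runs):
    print(length, runs[length] / 1000)  # estimated probability of each length
```

The printed proportions are an estimate of the sampling distribution of the longest run, built from a sample of samples rather than from every possible set of 200 flips.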
Sampling Distribution of a Proportion
[Figure: "Samples of One". Likelihood vs. Sample Proportion.]
• Suppose we’re drawing
from a very large
population and asking
each person if they’re a
Democrat
• Suppose 50% are
Democrats
• If we ask just one person,
then we’ll get either a
“yes” or “no”
• Ask 2 people: (Y,Y),
(Y,N), (N,Y), (N,N)
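Listing the possible answers and counting them can be sketched as below. The sketch assumes the population is large enough that each answer is an independent 50/50 "Y" or "N", so every listed outcome is equally likely. The function name is my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sampling_distribution(n):
    """Exact distribution of the sample proportion of 'Y' answers among n people,
    assuming each answer is independently Y or N with probability 1/2."""
    outcomes = list(product("YN", repeat=n))  # e.g. (Y,Y), (Y,N), (N,Y), (N,N)
    counts = Counter(Fraction(o.count("Y"), n) for o in outcomes)
    return {prop: Fraction(c, len(outcomes)) for prop, c in sorted(counts.items())}

for n in (1, 2):
    dist = sampling_distribution(n)
    print("n =", n, {str(prop): str(prob) for prop, prob in dist.items()})
```

With two people, the proportion 1/2 comes from two of the four outcomes, (Y,N) and (N,Y), which is why the middle value is the most likely one.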
[Figure: "Samples of Two". Likelihood vs. Sample Proportion.]
Sampling Dist. for Proportion
[Figure: "Samples of Three". Likelihood vs. Sample Proportion.]
• Ask 3 people, you get
(YYY), (YYN),
(YNY), (YNN),
(NYY), (NYN),
(NNY), (NNN)
• Ask 4 people, continue
• Keep going and for a
large enough sample
you get a bell-shaped
curve!
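Rather than listing outcomes by hand as the sample grows, the same likelihoods can be computed with the standard binomial formula. That formula is my addition here, not something the slides state. The sketch below assumes 50% Democrats and independent answers, and prints a rough text histogram for a sample of 20 (an arbitrary size chosen for illustration) to show the bell shape emerging.

```python
from math import comb

def proportion_pmf(n, p=0.5):
    """P(sample proportion = k/n) for k = 0..n, using the binomial formula."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Exact likelihoods for samples of three and four.
for n in (3, 4):
    print("n =", n, proportion_pmf(n))

# Crude text histogram for a larger sample: one '#' per 0.005 of probability.
for prop, prob in proportion_pmf(20).items():
    print(f"{prop:4.2f} " + "#" * round(prob * 200))
```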
[Figure: "Samples of Four". Likelihood vs. Sample Proportion.]
The Normal Distribution
• Symmetric and Bell-Shaped
• Total Area = 1 since it
covers all possible samples
• Characterized by two
quantities: the mean μ and
the standard deviation σ
• Represents all possible
samples for hypothetical
population
• The mean μ is the center
• The sd σ is how spread out
the curve is
• Increasing σ makes the
curve shorter and fatter
• Increasing μ moves the
curve to the right
• Areas represent probabilities
of certain samples for the
hypothetical population
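These properties of the normal curve can be checked numerically with Python's standard-library `statistics.NormalDist`. The particular means and standard deviations below are arbitrary values chosen for illustration, not figures from the lecture.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)
wider = NormalDist(mu=0, sigma=2)    # larger standard deviation
shifted = NormalDist(mu=3, sigma=1)  # larger mean

# Increasing the sd lowers the peak: the curve gets shorter and fatter.
print("peak heights:", standard.pdf(0), wider.pdf(0))

# Increasing the mean moves the same curve to the right.
print("shifted peak at 3:", shifted.pdf(3))

# Areas are probabilities: the area within one sd of the mean.
area_within_one_sd = standard.cdf(1) - standard.cdf(-1)
print("area within one sd:", area_within_one_sd)
```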