The Structure of Research - University of Minnesota Duluth


INFERENTIAL STATISTICS

Samples are only estimates of the population

A sample statistic will usually be slightly off from the
true value of its population’s parameter

Sampling error:


The difference between a sample statistic and a
population parameter
Probability theory

Permits us to estimate the accuracy or
representativeness of the sample
The “Catch-22” of Inferential Statistics
When we collect a sample, we know nothing about the population’s distribution of scores
We can calculate the mean (x-bar) & standard deviation (s) of our sample, but μ and σ are unknown
The shape of the population distribution (normal or skewed?) is also unknown

Probability Theory Allows Us To Answer:
What is the likelihood that a given sample statistic
accurately represents a population parameter?
Example: number of serious crimes committed in the year prior to prison, for inmates entering the prison system
Population: μ = ??? (N = thousands)
Sample: N = 150, x-bar = 9.6
Sampling Distribution
(a.k.a. “Distribution of Sample Outcomes”)
“OUTCOMES” = proportions, means, etc.
From repeated random sampling, a mathematical description of all possible sampling event outcomes, and the probability of each one
Permits us to make the link between sample and population…
Answers the question: “What is the probability that a sample finding is due to chance?”
Relationship between Sample, Sampling Distribution & Population
POPULATION: Empirical (exists in reality) but unknown
SAMPLING DISTRIBUTION (distribution of sample means, proportions, or other outcomes): Nonempirical (theoretical or hypothetical). Laws of probability allow us to describe its characteristics (shape, central tendency, dispersion)
SAMPLE: Empirical & known (e.g., distribution shape, mean, standard deviation)
Sampling Distribution: Characteristics

Central tendency
Sample means will cluster around the population mean
 Since samples are random, the sample means should be
distributed equally on either side of the population mean
 The mean of the sampling distribution is always
equal to the population mean


Shape: Normal distribution

Central Limit Theorem:
Regardless of the shape of a raw score distribution
(sample or population) of an interval-ratio variable,
the sampling distribution of sample means will be approximately
normal, as long as the sample size is large (rule of thumb: N ≥ 100)
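The two claims above (sample means cluster around the population mean, and their distribution tightens into a roughly normal shape) can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the skewed population, the seeds, and the helper name `sample_means` are all assumptions made for illustration.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical, strongly skewed population of raw scores (exponential shape).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# 2,000 repeated samples of N = 100 each: an empirical stand-in for the
# theoretical sampling distribution.
means = sample_means(population, sample_size=100, num_samples=2_000)

print("population mean:         ", round(pop_mean, 3))
print("mean of the sample means:", round(statistics.fmean(means), 3))
print("SD of the sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(N):         ", round(pop_sd / 100 ** 0.5, 3))
```

Under these assumptions the mean of the sample means should land very close to the population mean, and their spread should be close to σ / √N, even though the raw scores are far from normal.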
Sampling Distribution: Characteristics
Dispersion: Standard Error (SE)
Measures the spread of sampling error that occurs when a population is sampled repeatedly
Same thing as the standard deviation of the sampling distribution
Tells how much error, on average, should be expected between the sample mean & the population mean
Formula: σ / √N
However, because σ usually isn’t known, s (the sample standard deviation) is used to estimate the population standard deviation

Sampling Distribution

Standard Error

Law of Large Numbers: The larger the
sample size (N), the more probable it is that
the sample mean will be close to the
population mean


In other words: a big sample works better (should
give a more accurate estimate of the pop.) than a
small one
Makes sense if you study the formula for standard
error
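To see why the formula makes this point, the standard error can be computed for several sample sizes. This is a small sketch: the function name `standard_error` and the standard deviation of 15 are illustrative assumptions, and s stands in for the unknown σ.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(N), with s standing in for sigma."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Hypothetical standard deviation of 15; only N changes between rows.
for n in (100, 400, 1600):
    print(f"N = {n:>4}  SE = {standard_error(15.0, n):.3f}")
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.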
1. Estimation
Statistical Methods: Descriptive Statistics and Inferential Statistics
Inferential Statistics: ESTIMATION and Hypothesis Testing
Introduction to Estimation

Estimation procedures
Purpose: To estimate population parameters from sample statistics
Using the sampling distribution to infer from a sample to the population
Most commonly used for polling data
2 components:
Point estimate
Confidence intervals

Estimation


Point Estimate: Value of a sample statistic used to
estimate a population parameter
Confidence Interval: A range of values around the point
estimate
Example:
Confidence Limit (Lower) = .546
Point Estimate = .58
Confidence Limit (Upper) = .614
Confidence Interval = the range from .546 to .614
Example

CNN Poll (CNN.com; Feb 20, 2009): Slight majority thinks
stimulus package will improve economy

“The White House's economic stimulus plan isn't a surefire winner
with the American public, but a majority does think the recovery plan
will help. According to a new poll, fifty-three percent said the plan
will improve economic conditions, while 44 percent said it won't
stimulate the economy.”

“On an individual level, there was less hope for improvement.
According to the poll, 67 percent said it would not help them
personally.”

“The Poll was conducted Wednesday and Thursday (Feb 18-19,
2009), with 1,046 people questioned by telephone. The survey's
sampling error is plus or minus 3 percentage points.”
Estimation
POINT ESTIMATES (another way of saying sample statistics)
CONFIDENCE INTERVAL, a.k.a. “MARGIN OF ERROR”
Indicates that over the long run, 95 percent of the time, the true pop. value will fall within a range of +/- 3 percentage points of the point estimate
Point estimates & confidence interval should be reported together
“…but a majority does think the recovery plan will help, according to a new poll. Fifty-three percent said the plan will improve economic conditions, while 44 percent said it won't stimulate the economy. …. The Poll was conducted Wednesday and Thursday (Feb 18-19, 2009), with 1,046 people questioned by telephone. The survey's sampling error is plus or minus 3 percentage points.”
Estimation 1: Pick Confidence Level
Confidence LEVEL
Probability that the unknown population parameter falls within the interval
Alpha (α)
The probability that the parameter is NOT within the interval
α is the probability of making an error
Confidence level = 1 - α
Conventionally, confidence level values are almost always 95% or 99%
Procedure for Constructing an Interval Estimate
2. Divide the probability of error equally into the upper and lower tails of the distribution (2.5% error in each tail with 95% confidence level)
Find the corresponding Z score
[Figure: normal curve with 0.95 in the center and .025 in each tail; Z scores of -1.96 and +1.96 mark the boundaries]
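Instead of looking the Z score up in a table, it can be computed from the confidence level. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_for_confidence` is an illustrative assumption.

```python
from statistics import NormalDist


def z_for_confidence(confidence_level):
    """Return the positive Z score that leaves alpha / 2 in each tail."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence level must be between 0 and 1")
    alpha = 1.0 - confidence_level  # total probability of error
    # Area to the left of the upper cutoff is 1 - alpha / 2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print(round(z_for_confidence(0.95), 2))  # 95% level: about 1.96
print(round(z_for_confidence(0.99), 2))  # 99% level: about 2.58
```

The lower cutoff is the same value with a negative sign, because the normal curve is symmetric.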
Procedure for Constructing an Interval Estimate
3. Construct the confidence interval
Proportions (like the economic stimulus poll example):
Sample point estimate (convert % to a proportion):
“Fifty-three percent said the plan will improve economic conditions…”
0.53
Sample size (N) = 1,046
Formula 7.3 in Healey
Numerator = (your proportion)(1 - proportion)
95% confidence level (replicating results from article)
99% confidence level: intervals widen as level of confidence increases
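The poll figures can be run through a short calculation. This sketch assumes the interval has the form p ± Z × √(p(1 − p) / N), which matches the numerator described on the slide; the exact notation in Healey may differ, and the function name `proportion_interval` is an illustrative assumption.

```python
import math
from statistics import NormalDist


def proportion_interval(p, n, confidence_level=0.95):
    """Confidence interval for a proportion: p +/- Z * sqrt(p * (1 - p) / N)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * math.sqrt(p * (1.0 - p) / n)
    return p - margin, p + margin


# Stimulus poll: 53% said the plan will help, N = 1,046.
scenarios = {
    "poll at 95%": (0.53, 1046, 0.95),
    "poll at 99%": (0.53, 1046, 0.99),
    "N = 10,000": (0.53, 10_000, 0.95),
    "p = .90": (0.90, 1046, 0.95),
}
for label, (p, n, level) in scenarios.items():
    low, high = proportion_interval(p, n, level)
    print(f"{label:<12} {low:.3f} to {high:.3f} (+/- {(high - low) / 2:.3f})")
```

At the 95% level the margin comes out at roughly 3 percentage points, in line with the sampling error reported in the article. The last two scenarios are extra comparisons: a larger N or a proportion further from .50 narrows the interval, while a higher confidence level widens it.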
Example 1: Estimate for the economic recovery poll
p = .53 (53% think it will help)
Z = 1.96 (95% confidence interval)
N = 1046 (sample size)
What happens when we…
Recalculate for N = 10,000
N back to original, recalculate for p = .90
Back to original, but change confidence level to 99%
Example 2

Houston Chronicle (2008) — A University of Texas poll to be
released today shows Republican presidential candidate John
McCain and GOP Sen. John Cornyn leading by comfortable margins
in Texas, as expected. But the statewide survey of 550 registered
voters has one very surprising finding: 23 percent of Texans are
convinced that Democratic presidential nominee Barack Obama is a
Muslim.
 The Obama-is-a-Muslim confusion is caused by fallacious
Internet rumors and radio talk-show gossip. McCain went so far
at one of his town hall meetings to grab a microphone from a
woman who claimed that Obama was an Arab.
1. GIVEN THIS INFO, IDENTIFY A POINT ESTIMATE & CALCULATE THE CONFIDENCE INTERVAL (ASSUMING A 95% CONFIDENCE LEVEL).
2. CALCULATE THE CONFIDENCE INTERVAL ASSUMING A 99% CONFIDENCE LEVEL.
A Good Estimate is Unbiased

Sample means and proportions (like the .53 [53%] &
.23 [23%]) are UNBIASED estimates of the
population parameters



We know that the mean of the sampling distribution = the
pop. mean
Other sample statistics (such as standard deviation) are
biased
The standard deviation of a sample tends, on average, to be
smaller than the standard deviation of the population
Bottom line: A good estimate is UNBIASED

Trustworthy estimator of the pop. parameter
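The bias in the sample standard deviation can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original slides: a hypothetical normal population with σ = 10, very small samples (N = 5) so the effect is visible, and an arbitrary seed.

```python
import random
import statistics

SIGMA = 10.0  # hypothetical population standard deviation
rng = random.Random(7)

divide_by_n = []          # SD computed with N in the denominator
divide_by_n_minus_1 = []  # SD computed with N - 1 in the denominator

for _ in range(5_000):
    sample = [rng.gauss(50.0, SIGMA) for _ in range(5)]
    divide_by_n.append(statistics.pstdev(sample))
    divide_by_n_minus_1.append(statistics.stdev(sample))

avg_n = statistics.fmean(divide_by_n)
avg_n_minus_1 = statistics.fmean(divide_by_n_minus_1)

print("population SD:                ", SIGMA)
print("average sample SD (N):        ", round(avg_n, 2))
print("average sample SD (N - 1):    ", round(avg_n_minus_1, 2))
print("largest single sample SD (N): ", round(max(divide_by_n), 2))
```

On average the sample values fall short of σ, and dividing by N − 1 reduces the shortfall. Any single sample can still land above σ, so the bias is a long-run tendency rather than a guarantee.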
A Good Estimate is Efficient

Efficiency



Refers to the extent to which the sampling distribution is
clustered about its mean
Efficiency depends largely on sample size: as the sample
size increases, the sampling distribution gets tighter
(narrower)
Remember from earlier: the sampling distribution can only be
assumed to be approximately normal when N ≥ 100
BOTTOM LINE: THE LESS SPREAD (THE SMALLER
THE S.E.), THE BETTER
Estimation of Population Means

EXAMPLE:
A researcher has gathered information from a
random sample of 178 households. Construct a
confidence interval to estimate the population
mean at the 95% level:

An average of 2.3 people reside in each household.
Standard deviation is .35.
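One way to work the household example is sketched below. It assumes the standard error formula from the earlier slide (s / √N, with s standing in for σ); some textbooks divide by √(N − 1) instead, which changes the result only slightly at this sample size. The function name `mean_interval` is an illustrative assumption.

```python
import math
from statistics import NormalDist


def mean_interval(sample_mean, sample_sd, n, confidence_level=0.95):
    """Confidence interval for a mean: x-bar +/- Z * (s / sqrt(N))."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Household example: x-bar = 2.3 people, s = .35, N = 178, 95% level.
low, high = mean_interval(2.3, 0.35, 178)
print(f"{low:.2f} to {high:.2f} people per household")
```

Under these assumptions the interval runs from about 2.25 to about 2.35 people per household.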
PROCEDURE FOR CONSTRUCTING AN
INTERVAL ESTIMATE

A random sample of 429 college students was interviewed



They reported they had spent an average of $178 on
textbooks during the previous semester. If the standard
deviation (s) of these data is $15, construct an estimate of
the population mean at the 95% confidence level.
They reported they had missed 2.8 days of class per
semester because of illness. If the sample standard
deviation is 1.0, construct an estimate of the population
mean at the 99% confidence level.
Two individuals are running for mayor of Duluth. You
conduct an election survey of 100 adult Duluth residents 1
week before the election and find that 45% of the sample
support candidate Long Duck Dong, while 40% plan to vote
for candidate Singalingdon.

Using a 95% confidence level, based on your findings, can you
predict a winner?
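A rough way to approach the election question is to build an interval for each candidate and see whether the two overlap. This sketch assumes the interval form p ± Z × √(p(1 − p) / N) and treats overlap as an informal screen, not a formal test of the difference between two proportions; the function names are illustrative.

```python
import math
from statistics import NormalDist


def proportion_interval(p, n, confidence_level=0.95):
    """Confidence interval for a proportion: p +/- Z * sqrt(p * (1 - p) / N)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * math.sqrt(p * (1.0 - p) / n)
    return p - margin, p + margin


def intervals_overlap(first, second):
    """True when two (low, high) intervals share at least one value."""
    return first[0] <= second[1] and second[0] <= first[1]


n = 100
long_duck_dong = proportion_interval(0.45, n)
singalingdon = proportion_interval(0.40, n)

print("Long Duck Dong: %.3f to %.3f" % long_duck_dong)
print("Singalingdon:   %.3f to %.3f" % singalingdon)
print("Intervals overlap:", intervals_overlap(long_duck_dong, singalingdon))
```

With only 100 respondents each margin is close to 10 percentage points, so the intervals overlap heavily and the 5-point lead is not enough to call the race.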
What influences confidence intervals?

The width of a confidence interval depends on three things
α: The confidence level can be raised (e.g., to 99%) or lowered (e.g., to 90%); a higher confidence level gives a wider interval
N: We have more confidence in larger sample sizes, so as N increases, the interval narrows
Variation: more variation = more error (a wider interval)
Proportions: % agree closer to 50%
Means: higher standard deviations