PPA 415 – Research Methods in Public Administration


PPA 415 – Research Methods
in Public Administration
Lecture 5 – Normal Curve,
Sampling, and Estimation
Normal Curve


The normal curve is central to the theory that
underlies inferential statistics.
The normal curve is a theoretical model.



A frequency polygon that is perfectly symmetrical and
smooth.
Bell shaped, unimodal, with infinite tails.
Crucial point: distances along the horizontal axis,
when measured in standard deviations, always
encompass the same proportion of the area under the curve.
Computing Z-Scores

To find the percentage of the total area (or
number of cases) above, below, or
between scores in an empirical
distribution, the original scores must be
expressed in units of the standard
deviation or converted into Z scores.
$$Z = \frac{X_i - \bar{X}}{s}$$
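As a minimal sketch of the formula above, the conversion can be written in Python. The function name `z_score` is illustrative and not part of the lecture.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# A score one standard deviation above the mean has Z = 1.
print(z_score(15.0, 10.0, 5.0))
```

A negative result means the score lies below the mean; a positive result means it lies above.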
Computing Z-Scores – Fair
Housing Survey 2000
[Histogram of responses to "What was the last grade you completed in school?": Mean = 12.9, Std. Dev. = 2.46, N = 156]
Computing Z-Scores: Examples

What percentage of the cases have between six years of
education and the mean (12.9 years)?
$$Z = \frac{X_i - \bar{X}}{s} = \frac{6 - 12.9}{2.46} = \frac{-6.9}{2.46} = -2.805$$
$$Z = \frac{X_i - \bar{X}}{s} = \frac{12.9 - 12.9}{2.46} = \frac{0}{2.46} = 0$$
From Appendix A, Table A: the area below Z = -2.81 is .0026.
From Appendix A, Table A: the area below Z = 0 is .5.
$P_{6-12.9} = .5 - .0026 = .4974$.
49.74% of the distribution lies between 6 and 12.9 years of
education.
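The table lookup can be checked numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for Appendix A, Table A; because it uses the unrounded Z rather than the table entry, the result may differ from the slide in the fourth decimal place.

```python
from statistics import NormalDist

standard_normal = NormalDist()        # mean 0, standard deviation 1

mean, sd = 12.9, 2.46                 # survey values from the slide
z_six = (6 - mean) / sd               # about -2.80
z_mean = (mean - mean) / sd           # exactly 0

# Area between the two scores = area below the mean minus area below six.
area = standard_normal.cdf(z_mean) - standard_normal.cdf(z_six)
print(round(z_six, 3), round(area, 4))
```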
Computing Z-Scores: Examples


What percentage of the cases have less
than eight years of education?
What percentage have more than 13
years?
$$Z = \frac{X_i - \bar{X}}{s} = \frac{8 - 12.9}{2.46} = \frac{-4.9}{2.46} = -1.992, \quad p = .5 - .4767 = .0233$$
$$Z = \frac{X_i - \bar{X}}{s} = \frac{13 - 12.9}{2.46} = \frac{.1}{2.46} = .04; \quad p = 1 - .5160 = .4840$$
Computing Z-Scores: Examples

What percentage of Birmingham residents
have between 10 and 13 years of
education?
$$Z = \frac{X_i - \bar{X}}{s} = \frac{10 - 12.9}{2.46} = \frac{-2.9}{2.46} = -1.18, \quad p = .1190$$
$$Z = \frac{X_i - \bar{X}}{s} = \frac{13 - 12.9}{2.46} = \frac{.1}{2.46} = .04; \quad p = .5160$$
$$p_{10-13} = .5160 - .1190 = .3970$$
Computing Z-scores: Rules


If you want the area between a score and
the mean, subtract the probability from .5 if the Z
is negative. Subtract .5 from the probability if Z
is positive.
If you want the area beyond a score: for the area
below a score that is lower than the mean, use the
probability in Appendix A, Table A. For the area
above a score that is higher than the mean,
subtract the probability in Appendix A, Table A
from 1.
Computing Z-scores: Rules

If you want the area between two
scores other than the mean:

Calculate Z for each score, identify the
appropriate probability, and subtract the
smaller probability from the larger.
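The rules can be sketched as three small functions. This is an illustration only: the function names are hypothetical, and `statistics.NormalDist` is assumed in place of Appendix A, Table A. Because the code does not round Z before the lookup, results can differ from the slides in the fourth decimal place.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def area_below(x, mean, sd):
    """Proportion of cases expected below score x."""
    return STANDARD_NORMAL.cdf((x - mean) / sd)


def area_above(x, mean, sd):
    """Proportion of cases expected above score x (1 minus the area below)."""
    return 1.0 - area_below(x, mean, sd)


def area_between(low, high, mean, sd):
    """Subtract the smaller cumulative probability from the larger."""
    return area_below(high, mean, sd) - area_below(low, mean, sd)


mean, sd = 12.9, 2.46                       # survey values from the slides
print(round(area_below(8, mean, sd), 4))    # less than eight years
print(round(area_above(13, mean, sd), 4))   # more than 13 years
print(round(area_between(10, 13, mean, sd), 4))
```

The three printed values should land close to the slide results of .0233, .4840, and .3970.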
Probability



One interpretation of the area under the
normal curve is as probabilities.
Probabilities are determined as the
number of successful events divided by
the total possible number of events.
The probability of selecting a king of
hearts from a deck of cards is 1/52 or
.0192 (1.92%).
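The card example can be reproduced by counting events directly. The deck representation below is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))            # all 52 possible events

# Successful events: drawing the king of hearts.
successes = [card for card in deck if card == ("K", "hearts")]

p = Fraction(len(successes), len(deck))
print(p, round(float(p), 4))
```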
Probability


The proportions under the normal curve
can be treated as probabilities that a
randomly selected case will fall within the
prescribed limits.
Thus, in the Birmingham fair housing
survey, the probability of selecting a
resident with between 10 and 13 years of
education is 39.7%.
Sampling


One of the goals of social science
research is to test our theories and
hypotheses using many different types of
people drawn from a broad cross section
of society.
However, the populations we are
interested in are usually too large to test.
Sampling


To deal with this problem, researchers
select samples or subsets of the
population.
The goal is to learn about the populations
using the data from the samples.
Sampling



Basic procedures for selecting probability
samples, the only kind that allow generalization
to the larger population.
Researchers do use nonprobability samples, but
generalizing from them is nearly impossible.
The goal of sampling is to select cases in the
final sample that are representative of the
population from which they are drawn.

A sample is representative if it reproduces the
important characteristics of the population.
Sampling

The fundamental principle of probability
sampling is that a sample is very likely to
be representative if it is selected by the
Equal Probability of Selection Method
(EPSEM).

Every case in the population must have an
equal chance of ending up in the sample.
Sampling

EPSEM and representativeness are not
the same thing.

EPSEM samples can be unrepresentative,
but the probability of such an event can be
calculated, unlike with nonprobability samples.
EPSEM Sampling Techniques




Simple random sample – list of cases and a
system for selection that ensures EPSEM.
Systematic sampling – only the first case is
randomly selected, then a skip interval is used.
Stratified sample – random subsamples on the
basis of some important characteristic.
Cluster sampling – used when no list exists.
Clusters often based on geography.
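The four techniques can be sketched with Python's standard `random` module. Everything here is illustrative: the function names, the population of 1,000 case IDs, the four strata, and the ten equal-sized blocks are assumptions, not survey data. The stratified version draws the same fraction from each stratum, and the cluster version gives every case an equal chance only because the assumed clusters are the same size.

```python
import random


def simple_random_sample(cases, n, rng):
    """Every case has the same chance of selection (EPSEM)."""
    return rng.sample(cases, n)


def systematic_sample(cases, n, rng):
    """Random start inside the first skip interval, then every k-th case."""
    k = len(cases) // n              # skip interval
    start = rng.randrange(k)         # only this choice is random
    return [cases[start + i * k] for i in range(n)]


def stratified_sample(cases, stratum_of, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = {}
    for case in cases:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every case inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [case for name in chosen for case in clusters[name]]


rng = random.Random(415)                       # fixed seed for repeatability
population = list(range(1000))                 # hypothetical case IDs
blocks = {b: population[b * 100:(b + 1) * 100] for b in range(10)}

srs = simple_random_sample(population, 50, rng)
systematic = systematic_sample(population, 50, rng)
stratified = stratified_sample(population, lambda c: c % 4, 0.1, rng)
clustered = cluster_sample(blocks, 2, rng)
print(len(srs), len(systematic), len(stratified), len(clustered))
```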
The Sampling Distribution

Once we have selected a probability
sample according to some EPSEM
procedure, what do we know?



We know a great deal about the sample, but
nothing about the population.
Somehow, we have to get from the sample to
the population.
The instrument used is the sampling
distribution.
The Sampling Distribution


The theoretical, probabilistic distribution of a
descriptive statistic (such as the mean) for all
possible samples of certain sample size (N).
Three distributions are involved in every
application of inferential statistics.



The sample distribution – empirical; shape, central
tendency, and dispersion are known.
The population distribution – empirical, unknown.
The sampling distribution – theoretical, shape, central
tendency, and dispersion can be deduced.
The Sampling Distribution


The sampling distribution allows us to
estimate the probability of any sample
outcome.
Discuss the identification of a sampling
distribution. Generally speaking, a
sampling distribution will be symmetrical,
approximately normal, and have the mean
of the population.
The Sampling Distribution

If repeated random samples of size N are
drawn from a normal population with mean
μ and standard deviation σ, then the
sampling distribution of sample means will
be normal with a mean μ and a standard
deviation of σ/√N (the standard error of the
mean).
The Sampling Distribution

Central Limit Theorem.



If repeated random samples of size N are drawn from
any population, with mean μ and standard deviation
σ, then, as N becomes large, the sampling
distribution of sample means will approach normality,
with mean μ and standard deviation σ/√N.
The theorem removes the requirement that the
population be normally distributed.
Rule of thumb: N ≥ 100.
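A small simulation can illustrate the theorem. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), chosen only for illustration, and draws 2,000 samples of size N = 100. If the theorem holds, the sample means should center near μ = 1 with a standard deviation near σ/√N = 0.1.

```python
import random
import statistics

rng = random.Random(5)   # fixed seed so the run is repeatable


def sample_means(draw, n, repetitions):
    """Mean of each of `repetitions` random samples of size n."""
    return [statistics.fmean(draw() for _ in range(n))
            for _ in range(repetitions)]


# Exponential population with rate 1: skewed, mean 1, standard deviation 1.
means = sample_means(lambda: rng.expovariate(1.0), n=100, repetitions=2000)

print(statistics.fmean(means))   # expected to be near mu = 1
print(statistics.stdev(means))   # expected to be near sigma / sqrt(N) = 0.1
```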
Estimation Procedures


Bias – does the mean of the sampling
distribution equal the mean of the
population?
Efficiency – how closely does the sampling
distribution cluster around the mean? You
can improve efficiency by increasing
sample size.
Estimation Procedures

Point estimate – construct a sample,
calculate a proportion or mean, and
estimate the population will have the same
value as the sample. Always some
probability of error.
Estimation Procedures

Confidence interval – range around the sample
mean.

First step: determine a confidence level by deciding how much
error you are willing to tolerate. The common
standard is 5% or .05: you are willing to be wrong
5% of the time in estimating population values. This figure
is known as alpha or α, and the corresponding confidence
level is 95%. If an infinite number of
confidence intervals are constructed, 95% will contain
the population mean and 5% won’t.
Estimation Procedures



We now work in reverse on the normal curve.
Divide the probability of error between the upper
and lower tails of the curve (so that the 95% is in
the middle), and estimate the Z-score that will
contain 2.5% of the area under the curve on
either end. That Z-score is ±1.96.
Similar Z-scores for 90% (alpha=.10), 99%
(alpha=.01), and 99.9% (alpha=.001) are ±1.65,
±2.58, and ±3.29.
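Working in reverse on the normal curve can be sketched with the inverse cumulative distribution. The function name `critical_z` is hypothetical, and `statistics.NormalDist.inv_cdf` is assumed in place of the printed table. The unrounded values are roughly 1.645, 1.960, 2.576, and 3.291, which the slide rounds to two decimals.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Z-score that leaves alpha / 2 of the area in each tail."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for alpha in (0.10, 0.05, 0.01, 0.001):
    print(alpha, round(critical_z(alpha), 3))
```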
Estimation Procedures
$$\text{c.i.} = \bar{X} \pm Z\left(\frac{\sigma}{\sqrt{N}}\right)$$
where
c.i. = confidence interval
$\bar{X}$ = the sample mean
Z = the Z score as determined by the alpha level
$\sigma/\sqrt{N}$ = the population standard error of the mean
Estimation Procedures – Sample
Mean
$$\text{c.i.} = \bar{X} \pm Z\left(\frac{s}{\sqrt{n-1}}\right)$$
where
c.i. = confidence interval
$\bar{X}$ = the sample mean
Z = the Z score as determined by the alpha level
$s/\sqrt{n-1}$ = the standard error of the mean
Only use if sample is 100 or greater
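The sample-mean formula can be sketched as a short function. The name `mean_confidence_interval` is hypothetical. The inputs are the Fair Housing Survey education values given earlier in the lecture (mean 12.9, s = 2.46, n = 156), with the two-decimal Z-scores from the previous slide.

```python
import math


def mean_confidence_interval(mean, s, n, z):
    """Return (lower, upper) limits using s / sqrt(n - 1) as the standard error."""
    margin = z * s / math.sqrt(n - 1)
    return mean - margin, mean + margin


for level, z in (("95%", 1.96), ("99%", 2.58), ("99.9%", 3.29)):
    low, high = mean_confidence_interval(12.9, 2.46, 156, z)
    print(level, round(low, 2), round(high, 2))
```

Higher confidence levels use a larger Z and therefore produce wider intervals around the same sample mean.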
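Estimation – Proportions, Large
Sample
$$\text{c.i.} = P_s \pm Z\sqrt{\frac{P_u(1 - P_u)}{N}}$$
where $P_s$ is the sample proportion and $P_u$ is the population proportion.
Use $P_u = .5$ for the most conservative estimates:
$$\text{c.i.} = P_s \pm Z\sqrt{\frac{.25}{N}}$$
Use only if sample size is greater than 100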
Estimation Procedures

You can control the width of the
confidence intervals by adjusting the
confidence level or alpha or by adjusting
sample size.
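The effect of both controls can be seen in a short sketch. It assumes the conservative proportion formula with P = .5; the function name `margin_of_error` and the sample sizes are illustrative. Under these assumptions, quadrupling the sample size halves the margin, while raising the confidence level widens it.

```python
import math


def margin_of_error(z, n, p=0.5):
    """Half-width of a large-sample confidence interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Same confidence level (95%), growing sample size: the interval narrows.
for n in (100, 400):
    print(n, round(margin_of_error(1.96, n), 3))

# Same sample size, higher confidence level (99%): the interval widens.
print(round(margin_of_error(2.58, 100), 3))
```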
Confidence Interval Examples
Birmingham Fair Housing Survey: education, with 95%, 99%, and
99.9% confidence intervals.
$$\text{c.i.} = \bar{X} \pm Z\left(\frac{s}{\sqrt{n-1}}\right) = 12.9 \pm 1.96\left(\frac{2.46}{\sqrt{156-1}}\right) = 12.9 \pm .39; \quad 12.51 \le \mu \le 13.29$$
$$\text{c.i.} = \bar{X} \pm Z\left(\frac{s}{\sqrt{n-1}}\right) = 12.9 \pm 2.58\left(\frac{2.46}{\sqrt{156-1}}\right) = 12.9 \pm .51; \quad 12.39 \le \mu \le 13.41$$
$$\text{c.i.} = \bar{X} \pm Z\left(\frac{s}{\sqrt{n-1}}\right) = 12.9 \pm 3.29\left(\frac{2.46}{\sqrt{156-1}}\right) = 12.9 \pm .65; \quad 12.25 \le \mu \le 13.55$$
Confidence Interval Examples

Proportion of sample who believe that
discrimination is a major problem in
Birmingham.
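$$\text{c.i.} = P_s \pm Z\sqrt{\frac{.25}{N}} = .214 \pm 1.96\sqrt{\frac{.25}{192}} = .214 \pm 1.96\sqrt{.0013} = .214 \pm 1.96(.0361) = .214 \pm .071; \quad .143 \le P_u \le .285$$
$$\text{c.i.} = P_s \pm Z\sqrt{\frac{.25}{N}} = .214 \pm 2.58\sqrt{\frac{.25}{192}} = .214 \pm 2.58\sqrt{.0013} = .214 \pm 2.58(.0361) = .214 \pm .093; \quad .121 \le P_u \le .307$$
$$\text{c.i.} = P_s \pm Z\sqrt{\frac{.25}{N}} = .214 \pm 3.29\sqrt{\frac{.25}{192}} = .214 \pm 3.29\sqrt{.0013} = .214 \pm 3.29(.0361) = .214 \pm .119; \quad .095 \le P_u \le .333$$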
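The proportion intervals can be reproduced with a short sketch. The function name `proportion_confidence_interval` is hypothetical. The inputs are the slide's sample proportion (.214) and sample size (192), with the conservative population value of .5.

```python
import math


def proportion_confidence_interval(p_sample, n, z, p_pop=0.5):
    """Return (lower, upper) limits for a large-sample proportion."""
    margin = z * math.sqrt(p_pop * (1.0 - p_pop) / n)
    return p_sample - margin, p_sample + margin


for level, z in (("95%", 1.96), ("99%", 2.58), ("99.9%", 3.29)):
    low, high = proportion_confidence_interval(0.214, 192, z)
    print(level, round(low, 3), round(high, 3))
```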