Transcript Week 6

Review: Two Main Uses of Statistics
1) Descriptive:
• To describe or summarize a collection of data points
• The data set in hand = all the data points of interest
• No “going beyond the data”
2) Inferential:
• To make decisions, draw inferences about a population,
  or draw conclusions under conditions of uncertainty
• The data set in hand = merely a sample from some
  larger population (that is our real interest)
• Use these data to make a calculated guess about what
  we would find if we had full information
• Use probability theory to make calculated guesses (with
  some estimated margin of error)
Statistical inferences involve two
possible tasks:
1) Estimation:
• Use sample data to infer population parameter
  e.g., lifetime risk of being a victim of a violent
  crime according to NCVS data
2) Hypothesis Testing:
• Use sample data to make a decision about the
  correctness of some hypothesis or prediction
  e.g., whether civil orders of protection will lower
  the risk of recurrent violence against spouses
Both tasks rely on using Samples to
make statements about populations:
1) Sample:
• A limited number of cases selected to represent
  the larger population of data points
2) Key Terms/Ideas in Sampling:
• Representativeness: degree to which sample
  is an exact replica in miniature of the population
• Sampling Error: degree to which sample
  statistic deviates from population value
• Sampling Method: procedure used to draw
  cases from the population of data points
Two main types of sampling methods:
1) Probability Sampling
• Selection where each data point has a known
  probability of being selected into the sample
• Simple Random sample: every data point has
  an equal likelihood of being selected
• Other types of probability samples?
─ Systematic
─ Stratified
─ Weighted
─ Cluster
• Doesn’t guarantee representativeness each time
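The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of case IDs 1 to 100, the "urban"/"rural" strata, and the function names are all hypothetical, and the stratified draw shown is the proportionate kind (same fraction from every stratum).

```python
import random

def simple_random_sample(population, n, rng):
    """Every case has the same chance of selection (drawn without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th case after a random start within the first k cases."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum (proportionate)."""
    sample = []
    for cases in strata.values():
        size = round(len(cases) * fraction)
        sample.extend(rng.sample(cases, size))
    return sample

rng = random.Random(6)             # fixed seed so the run is repeatable
population = list(range(1, 101))   # hypothetical case IDs 1..100
strata = {"urban": population[:70], "rural": population[70:]}  # hypothetical strata

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 0.10, rng)

print("simple random:", srs)
print("systematic:   ", systematic)
print("stratified:   ", stratified)
```

In each case the probability of selection is known in advance (here 10 in 100 for every case), which is what makes these probability samples.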
Two main types of sampling methods:
2) Non-probability Sampling:
• Selection procedure in which probability of
selection is unknown
• Specific types of Non-probability samples?
─ Accidental
─ Convenience
─ Purposive
─ Snowball
─ Volunteer
• No guarantee of representativeness
• Inferences become more “iffy”
Why use one sampling method versus
another?
1) Maximize representativeness of data
2) Minimize sampling error and bias in
data
3) Maximize the validity of our statistical
inferences
4) Note: Inferential statistics always
assume simple random sampling as a
basic premise
What exactly is “randomness”?
1) What does it look like?
2) How can we tell if we have it?
3) Randomness is a property of our data
selection procedure, not of the data points
4) Compromises to randomness:
• Any deliberate departure from the sampling procedure
• Refusals, dropouts, nonresponses are (almost)
  never random
• Our data are always a more-or-less imperfect
  approximation to a real random sample
Making inferences from sample
statistics involves 3 distributions:
1) Population distribution: unobserved in
population from which cases drawn
2) Sample distribution: observed in cases
from which data were collected
3) Sampling distribution: unobserved but
calculable distribution of statistics for
samples of same size/type as ours
(drawn from the same population)
→ This distribution is the key to making inferences
“Sampling Distribution”: what is it?
1) A hypothetical population of samples
(and sample statistics) drawn from
the same population
a) Has a describable theoretical distribution
(based on repeatedly drawing a sample an
infinite number of times)
b) Has certain parameters determined by the
population from which the sample is drawn
and the size of the sample
e.g.: If we draw a sample of 25 cases and
compute the sample mean
a) The sample mean has a theoretical sampling
distribution whose characteristics are exactly
determined by the distribution of the
population (μ & σ) and by the sample size
(n=25)
b) The mean of the sampling distribution = the
mean of the population (μs = μpop)
c) In this case: the σ of the sampling
distribution = σ/5 (i.e., one-fifth the σ of the
population)
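The n = 25 example can be checked by simulation. The sketch below is an illustration only: the population values (μ = 100, σ = 15), the normal population shape, and the number of repeated samples are assumptions chosen for the demonstration, and a finite number of draws only approximates the theoretical (infinite) sampling distribution.

```python
import random
import statistics

rng = random.Random(25)          # fixed seed so the run is repeatable
MU, SIGMA, N = 100.0, 15.0, 25   # hypothetical population values and sample size

# Draw many samples of size N and record each sample's mean.
sample_means = []
for _ in range(10000):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {MU})")
print(f"sd of sample means:   {sd_of_means:.2f} (sigma/5 = {SIGMA / 5:.2f})")
```

The mean of the sample means should land close to μ, and their standard deviation close to σ/5, as stated in (b) and (c).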
Important features of Sampling
distributions:
a) If the variable is normally distributed in the
population, then the sampling distribution of
sample means will also be normal
b) The mean of the sampling distribution = the
mean of the population
c) The σ of the sampling distribution = σ/√n
d) Use this information to compute the likelihood
of any sample mean being drawn from the
population (using the standard normal [z]
table)
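Point (d) amounts to a short calculation. The numbers below (μ = 100, σ = 15, n = 25, observed sample mean of 106) are hypothetical, and `NormalDist` from Python's standard library stands in for the printed z table.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100.0, 15.0, 25   # hypothetical population values and sample size
sample_mean = 106.0              # hypothetical observed sample mean

standard_error = sigma / math.sqrt(n)       # sigma / sqrt(n)
z = (sample_mean - mu) / standard_error     # distance from mu in standard errors
p_at_least = 1 - NormalDist().cdf(z)        # upper-tail area under the normal curve

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean >= {sample_mean}) = {p_at_least:.4f}")
```

With these values the sample mean sits 2 standard errors above μ, so a mean this large or larger would be drawn only about 2.3% of the time.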
Important features of Sampling
distributions to remember:
a) The σ of the sampling distribution will always
be smaller than the σ of the population (for any n > 1)
b) The larger the sample size, the smaller the
standard error of the sampling distribution
c) The mean of the sampling distribution will
always be the population mean
d) The sampling distribution of the mean will become
more nearly Normal as the sample size gets larger,
whatever the shape of the population distribution
(provided its variance is finite)
[this is called the Central Limit Theorem]
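A small simulation can illustrate the Central Limit Theorem with a population that is clearly not normal. Everything specific here is an assumption made for the demonstration: the exponential population (mean 1, σ 1), the sample sizes, the number of replications, and the use of skewness as a rough gauge of how far the sample means are from a symmetric, normal-looking shape.

```python
import math
import random
import statistics

def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(163)   # fixed seed so the run is repeatable
REPLICATIONS = 5000
skew_by_n = {}
sd_by_n = {}

for n in (2, 10, 50):
    # Population: exponential with mean 1 and sigma 1 (strongly right-skewed).
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(REPLICATIONS)
    ]
    skew_by_n[n] = skewness(means)
    sd_by_n[n] = statistics.pstdev(means)
    print(f"n={n:2d}  skewness={skew_by_n[n]:.2f}  "
          f"sd of means={sd_by_n[n]:.3f}  sigma/sqrt(n)={1 / math.sqrt(n):.3f}")
```

As n grows, the skewness of the sample means should shrink toward 0 and their standard deviation should track σ/√n, in line with points (b) and (d).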
Using Sample statistics to make inferences
about population parameters:
a) The best estimate of the population mean is
the sample mean
b) The usual sample estimate of σ is slightly too
low; it needs to be adjusted to be unbiased
c) Thus there are two different formulas for the
sample variance/standard deviation:
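(descriptive)
$s^2 = \dfrac{\sum (X - \bar{X})^2}{N}$
(inferential/estimated)
$\hat{\sigma}^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1}$
(in each case the standard deviation is the square root of the variance)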
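The difference between the two formulas is easy to see on a small data set. The scores below are hypothetical; `statistics.pvariance` and `statistics.variance` are the standard library's N and N − 1 versions, used here only to cross-check the hand computation.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, N = 8
n = len(scores)
mean = statistics.fmean(scores)
sum_sq = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations

var_descriptive = sum_sq / n          # descriptive: divide by N
var_inferential = sum_sq / (n - 1)    # inferential/estimated: divide by N - 1

# Cross-check against the standard library's two variance functions.
assert math.isclose(var_descriptive, statistics.pvariance(scores))
assert math.isclose(var_inferential, statistics.variance(scores))

print("descriptive variance, sd:", var_descriptive, math.sqrt(var_descriptive))
print("inferential variance, sd:", var_inferential, math.sqrt(var_inferential))
```

The N − 1 version is always the larger of the two, and the gap narrows as N increases.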
Basic Steps in Estimating Population
Parameters:
1) Select valid estimator (unbiased, consistent)
2) Select valid data sample
• Corresponds to population of interest
• Random sample
• Complete (no censoring or omissions)
3) Compute value of statistical estimate
4) Compute standard error for the estimate
5) Compute confidence interval (i.e., plausible
margin of sampling error)
Two Approaches to estimation:
1) Point Estimation:
• Use sample data to infer exact value of population
  parameter
• Highly likely to be wrong or off-mark to some degree
• e.g., infer that 30% of adults will be victims of violent crime
  in their lifetimes (could actually be 35% or 25%)
2) Interval Estimation:
• Instead use sample data to compute a range of values
  (“confidence intervals”) within which the actual parameter
  is located (with some calculated margin of certainty or
  confidence)
• Yields more approximate but more plausible (or confident)
  estimates.
Confidence Interval Estimation:
1) Compute the sample mean
2) Compute the sample standard error
─ From the population (σ)
─ From the sample (s or $\hat{\sigma}$)
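4) Compute the confidence interval
$\bar{X} \pm z_{ci}\,\sigma_{\bar{X}}$
or
$\bar{X} \pm t_{ci}\,\hat{\sigma}_{\bar{X}}$
Note:
For population value:
$\sigma_{\bar{X}} = \dfrac{\sigma_x}{\sqrt{N}}$
For sample value:
$\hat{\sigma}_{\bar{X}} = \dfrac{\hat{\sigma}_x}{\sqrt{N}} = \dfrac{s_x}{\sqrt{N - 1}}$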
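The confidence interval steps can be sketched as follows. This is an illustration under stated assumptions: the sample is simulated (hypothetical values, N = 100), the confidence level is set at 95%, and the normal (z) critical value is used because the sample is fairly large. With a small sample, the critical value would instead be read from a t table with N − 1 degrees of freedom.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(235)                               # fixed seed, repeatable run
sample = [rng.gauss(50.0, 10.0) for _ in range(100)]   # hypothetical data, N = 100

n = len(sample)
mean = statistics.fmean(sample)          # sample mean
s = statistics.pstdev(sample)            # descriptive s (divides by N)
sigma_hat = statistics.stdev(sample)     # estimated sigma (divides by N - 1)

# Standard error from the sample; the two forms give the same value.
se_from_sigma_hat = sigma_hat / math.sqrt(n)
se_from_s = s / math.sqrt(n - 1)

# Critical z value that leaves 2.5% in each tail (95% confidence).
z = NormalDist().inv_cdf(0.975)

lower = mean - z * se_from_sigma_hat
upper = mean + z * se_from_sigma_hat

print(f"mean = {mean:.2f}, standard error = {se_from_sigma_hat:.3f}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

The agreement between `se_from_sigma_hat` and `se_from_s` reflects the fact that dividing the N − 1 estimate by √N and dividing the descriptive s by √(N − 1) are algebraically the same.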