Transcript 5-1 Day 1
AP STATISTICS
LESSON 5 - 1
DESIGNING DATA
ESSENTIAL QUESTION:
How can data be produced to
answer questions about real-life
situations?
To design experiments and surveys, and to analyze
them, in order to answer questions as accurately
as possible.
To learn how to use Table B to set up simple
random samples.
To discover the difficulties that damage a
sample.
Introduction
Our goal in choosing a sample is a picture
of the population, disturbed as little as
possible by the act of gathering
information.
Sample surveys are one kind of
observational study.
Observation vs. Experiment
An observational study observes individuals and
measures variables of interest but does not
attempt to influence the responses.
An experiment, on the other hand, deliberately
imposes some treatment on individuals in order
to observe their responses.
Both have important roles depending on the
situation and the questions to be answered.
Additional facts
Observational studies of the effect of one
variable on another often fail because the
explanatory variable is confounded with lurking
variables.
In some situations, it may not be possible to
observe individuals directly or to perform an
experiment. In other cases, it may be logistically
difficult or simply inconvenient to do so.
Simulation provides an alternative method to
produce data.
Population and Sample
The entire group of individuals that we
want information about is called the
population.
A sample is a part of the population that
we actually examine in order to gather
information.
Sampling vs. Census
Sampling involves studying a part in order to
gain information about the whole.
A census attempts to contact every individual in
the entire population.
The design of a sample refers to the method
used to choose the sample from the population.
Poor sample design can produce misleading
conclusions.
Voluntary Response Sample
A voluntary response sample consists of people
who choose themselves by responding to a
general appeal. Voluntary response samples
are biased because people with strong opinions,
especially negative opinions, are most likely to
respond.
Another type of bad sampling is convenience
sampling, which chooses the individuals easiest
to reach.
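The bias in a voluntary response sample can be seen in a small simulation. The sketch below is only an illustration: the population, the 30% share of unhappy people, and the two response rates are all made-up assumptions, chosen so that people with a strong negative opinion are far more likely to respond.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000: True = strong negative opinion (30%).
population = [True] * 3000 + [False] * 7000

# Assumed chance of answering a general appeal (invented for illustration).
RESPONSE_RATE = {True: 0.20, False: 0.02}

# Voluntary response: each person decides for themselves whether to respond.
volunteers = [p for p in population if rng.random() < RESPONSE_RATE[p]]

# Comparison: a sample of 500 chosen by chance.
chance_sample = rng.sample(population, 500)

true_share = sum(population) / len(population)
volunteer_share = sum(volunteers) / len(volunteers)
chance_share = sum(chance_sample) / len(chance_sample)

print("population:        ", true_share)
print("voluntary response:", round(volunteer_share, 2))
print("chosen by chance:  ", round(chance_share, 2))
```

Under these assumptions the voluntary response sample badly overstates the share of unhappy people, while the sample chosen by chance lands close to the population value.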
Bias
The design of a study is biased if it
systematically favors certain outcomes.
Choosing a sample by chance attacks bias
by giving all individuals an equal chance to
be chosen.
Simple Random Sample
A simple random sample (SRS) of size n
consists of n individuals from the population
chosen in such a way that every set of n
individuals has an equal chance to be the
sample actually selected.
An SRS not only gives each individual an equal
chance to be chosen (thus avoiding bias in the
choice) but also gives every possible sample an
equal chance to be chosen.
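The definition can be checked with a short Python sketch. It assumes a made-up population of five names and uses the standard library's random.sample, which draws without replacement, as the chance mechanism. Drawing many samples of size 2 and counting how often each possible pair appears shows that every set of 2 individuals turns up about equally often.

```python
import random
from collections import Counter
from itertools import combinations

def simple_random_sample(population, n, rng):
    """Return an SRS of size n drawn without replacement."""
    return rng.sample(population, n)

population = ["Ann", "Bob", "Cal", "Dee", "Eve"]  # hypothetical individuals
rng = random.Random(1)                            # fixed seed for repeatability
draws = 20000

# Count how often each possible pair is the sample actually selected.
counts = Counter()
for _ in range(draws):
    counts[frozenset(simple_random_sample(population, 2, rng))] += 1

# There are 10 possible samples of size 2 from 5 individuals.
possible = [frozenset(c) for c in combinations(population, 2)]
for pair in possible:
    print(sorted(pair), round(counts[pair] / draws, 3))
```

Each of the 10 pairs should appear in roughly one tenth of the draws.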
Random Digits
A table of random digits is a long string of the
digits 0,1,2,3,4,5,6,7,8,9 with the following two
properties:
1. Each entry in the table is equally likely to be
any of the 10 digits 0 through 9.
2. The entries are independent of each other.
That is, knowledge of one part of the table
gives no information about any other part.
*Table B (in the back of your textbook) is a random
digits table.
Choosing an SRS
Choose an SRS in two steps:
1. Label: Assign a numerical label to every
individual in the population.
2. Table: Use Table B to select labels at
random.
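The label-and-table procedure can be sketched in Python. This is a minimal illustration, not the textbook's wording: it assumes labels 01 through 30 for a class of 30, and the string of digits is invented for the example rather than copied from Table B. Groups of digits that match no label, or that repeat a label already chosen, are skipped.

```python
def choose_srs_from_digits(digits, population_size, n):
    """Read a line of random digits in groups and pick n distinct labels.

    Labels run from 1 to population_size, and each group has as many
    digits as the largest label (two digits for a population of 30).
    """
    width = len(str(population_size))
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip groups that are not labels and labels already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Illustrative digits only (not taken from Table B).
line = "40717 21093 58162 05330 97248 11462 30185 02749"

sample = choose_srs_from_digits(line, population_size=30, n=5)
print(sample)  # [10, 16, 20, 30, 24]
```

Reading the line two digits at a time gives 40, 71, 72, 10, 93, 58, 16, 20, 53, 30, 97, 24. Only 10, 16, 20, 30 and 24 are labels in use, so those five individuals form the sample.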
Probability Sample
A probability sample is a sample chosen
by chance. We must know what samples
are possible and what chance, or
probability, each possible sample has.
The use of chance to select the sample is
the essential principle of statistical
sampling.
Stratified Random Sample
To select a stratified random sample, first divide
the population into groups of similar individuals,
called strata. Then choose a separate SRS in
each stratum and combine these SRSs to form
the full sample.
This method is usually used for sampling from
large populations spread out over a wide area.
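A stratified random sample can be sketched as follows. The school, the two strata (juniors and seniors), and the sample sizes per stratum are hypothetical, chosen only to show the two steps: group similar individuals into strata, then take a separate SRS in each stratum and combine them.

```python
import random
from collections import Counter

def stratified_sample(population, stratum_of, sizes, rng):
    """Take a separate SRS in each stratum and combine the results."""
    # Step 1: divide the population into strata.
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Step 2: choose an SRS of the requested size within each stratum.
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], sizes[name]))
    return sample

# Hypothetical school: 120 juniors and 80 seniors, as (id, grade) pairs.
students = [(i, "junior" if i < 120 else "senior") for i in range(200)]

rng = random.Random(3)  # fixed seed for repeatability
sample = stratified_sample(
    students, lambda s: s[1], {"junior": 6, "senior": 4}, rng
)
by_grade = Counter(grade for _, grade in sample)
print(by_grade)
```

Unlike an SRS of 10 students from the whole school, this design guarantees exactly 6 juniors and 4 seniors in every sample.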
Multistage Sampling
A typical example of multistage sampling is the Current Population Survey
sampling design, which is conducted as follows:
Stage 1: Divide the United States into 2007 geographical areas called
Primary Sampling Units (PSUs), and select a sample of PSUs.
Stage 2: Divide each PSU selected into smaller areas called
“neighborhoods,” stratify them using ethnic and other information, and take a
stratified sample of neighborhoods.
Stage 3: Sort the housing units in each neighborhood into clusters of
four nearby units. Interview the households in a random sample of
these clusters.
This method saves time and money.
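The staged structure can be sketched on a toy sampling frame. Everything here is an assumption made for illustration: the frame has 20 areas, 6 neighborhoods per area, and 10 clusters of four housing units per neighborhood, and the second stage uses a plain SRS of neighborhoods instead of the stratified sample described above. It is not the actual Current Population Survey procedure.

```python
import random

# Toy frame: area -> neighborhood -> list of clusters of four housing units.
frame = {
    f"area{a}": {
        f"area{a}-nbhd{b}": [
            [f"area{a}-nbhd{b}-unit{4 * c + u}" for u in range(4)]
            for c in range(10)
        ]
        for b in range(6)
    }
    for a in range(20)
}

def multistage_sample(frame, n_areas, n_nbhds, n_clusters, rng):
    """Sample areas, then neighborhoods, then clusters of housing units."""
    households = []
    for area in rng.sample(sorted(frame), n_areas):               # stage 1
        nbhds = frame[area]
        for nbhd in rng.sample(sorted(nbhds), n_nbhds):           # stage 2
            for cluster in rng.sample(nbhds[nbhd], n_clusters):   # stage 3
                households.extend(cluster)  # interview the whole cluster
    return households

rng = random.Random(5)  # fixed seed for repeatability
sample = multistage_sample(frame, n_areas=4, n_nbhds=2, n_clusters=3, rng=rng)
print(len(sample))  # 4 areas x 2 neighborhoods x 3 clusters x 4 units = 96
```

Interviewers only need to visit 4 of the 20 areas, which is where the savings in time and money come from.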
Cautions about sample surveys
We need a complete and accurate list of the population.
Undercoverage occurs when some groups in the population are left
out of the process of choosing the sample.
Nonresponse occurs when an individual chosen for the sample can’t
be contacted or does not cooperate.
Response bias occurs when respondents give untruthful answers, especially
when asked about illegal or unpopular behaviors.
An interviewer whose attitude suggests that some answers are more
desirable than others will get these answers more often.
The wording of questions is the most important influence on the
answers given to a sample survey.
Inference about the population
Using chance to choose a sample eliminates
bias in the actual selection of the sample.
Because we deliberately use chance, the results
obey the laws of probability that govern chance
behavior.
Larger random samples give more accurate
results than smaller samples.
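A simulation makes the effect of sample size visible. The sketch below assumes a hypothetical population of 10,000 in which 60% would answer "yes", draws many SRSs at two sample sizes (25 and 400, chosen arbitrarily), and measures how much the sample proportion varies from sample to sample.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed for repeatability

# Hypothetical population: 1 = "yes" (60%), 0 = "no" (40%).
population = [1] * 6000 + [0] * 4000

def spread_of_sample_proportions(n, repeats=1000):
    """Standard deviation of the sample proportion over many SRSs of size n."""
    proportions = [sum(rng.sample(population, n)) / n for _ in range(repeats)]
    return statistics.stdev(proportions)

small = spread_of_sample_proportions(25)
large = spread_of_sample_proportions(400)

print("spread with n = 25: ", round(small, 3))
print("spread with n = 400:", round(large, 3))
```

Both sample sizes give proportions centered near 0.6, but the results from the larger samples cluster much more tightly around that value.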