Chapter 1.3 PP

1.3 Experimental Design
• Designing a Statistical Study
1. Identify the variable(s) of interest
(the focus) and the population of
the study.
2. Develop a detailed plan for
collecting data. Make sure the
data are representative of the
population
• Designing a Statistical Study
3. Collect the data.
4. Describe the data with descriptive statistics
and techniques.
5. Make decisions using inferential statistics.
Identify any possible errors.
Data Collection
1. Perform an experiment
– Apply a treatment to part of the population.
– Do nothing to, or give a placebo to the other part
of the population. This is the control group.
– Take data and compare the results between the
two groups.
– Example: an experiment could be used to
evaluate the benefits of a new drug or medical
procedure.
Data Collection
2. Use a simulation
– A simulation uses a mathematical or physical
model to reproduce the conditions of a situation
or process.
– Allows the study of something that is impractical
or dangerous to create in real life.
– Simulations often save time and money.
– Example: the destructive characteristics of a
bomb or fire.
– Example: 50/50 odds game with pins on a board
at a carnival
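The carnival pin board can itself be simulated. The following is a minimal sketch, assuming a board with 8 rows of pins where a ball bounces left or right with 50/50 odds at each pin; the function names, the row count, and the number of balls are illustrative assumptions, not part of the lesson.

```python
import random
from collections import Counter

def drop_ball(rows, rng):
    """Return the landing slot: the number of rightward bounces over `rows` pins."""
    return sum(rng.random() < 0.5 for _ in range(rows))

def simulate_board(rows=8, balls=10_000, seed=1):
    """Drop `balls` balls and count how many land in each slot (0..rows)."""
    rng = random.Random(seed)
    return Counter(drop_ball(rows, rng) for _ in range(balls))

counts = simulate_board()
for slot in range(9):
    print(slot, counts[slot])
```

Running many simulated drops is far cheaper than building the board, which is the point of using a simulation.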
Data Collection
3. Take a census
– A census is a count or measure of the entire
population.
– A census provides complete information, but is
expensive, time consuming, and difficult to
perform.
– In the case of destructive testing (think of testing
bombs), a census may leave nothing of the population when done.
Data Collection
4. Use sampling – this is what you are going to
do.
– A sample is a count or measure of part of the
population.
– Use the sample to predict the behavior of the
population.
– A sample of the bombs can be tested for
potency, and the results can be used to predict
the potency of the un-tested bombs.
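A small sketch of using a sample to predict a population value. The population here is made up (5,000 hypothetical potency scores drawn from a bell curve), and the sample size of 50 is an arbitrary assumption.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: potency scores for 5,000 units (made-up values).
population = [rng.gauss(100, 15) for _ in range(5000)]

# Test only 50 randomly chosen units instead of all 5,000.
sample = rng.sample(population, 50)

sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)
print(f"sample mean:     {sample_mean:.1f}")
print(f"population mean: {population_mean:.1f}")
```

In a real study only the sample mean would be known; the population mean is computed here just to show how close the prediction is.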
Examples from the book
• Try it Yourself – pg 16
1a) Focus: Effect of exercise on senior citizens.
Population: Collection of all senior citizens.
1b) Experiment
2a) Focus: Effect of radiation fallout on senior
citizens.
Population: Collection of all senior citizens
2b) Sampling
Dictionary Word Chase
• What percent of the English words do you
know?
• Randomly open the book and pick a word
• Is this truly random?
• This would be like convenience sampling
Dictionary Word Chase
• Simple Random Sample (SRS): all the words
have to have the same probability of being
selected.
• Use a number generator (math, probability,
randInt(#)) to randomly pick a word from all
words in the dictionary (Webster’s Ninth New
Collegiate Dictionary has almost 160,000 entries)
• Is this feasible?
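The number-generator idea can be sketched in Python, where `random.randint` plays the role of the calculator's randInt. The short `words` list is a hypothetical stand-in for a numbered list of every word in the dictionary.

```python
import random

# Hypothetical stand-in for the full, numbered word list of a dictionary.
words = ["apple", "bias", "census", "datum", "sample", "stratum", "zebra"]

def pick_word(word_list, rng):
    """Pick one word so that every word has the same 1/N chance."""
    index = rng.randint(1, len(word_list))  # like randInt(1, N): both ends included
    return word_list[index - 1]             # the list is 0-indexed, so shift by one

rng = random.Random(2024)
print(pick_word(words, rng))
```

The hard part in practice is not the random number but having the numbered list of all words in the first place.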
Dictionary Word Chase
• Stratification – use the number generator to
pick a letter (stratified on letter) and then
randomly select a word
• Is this feasible?
• Cluster – pick a page (or pages), then a
column, then all the words in that column (or
those columns)
Dictionary Word Chase
• Systematic – randomly select a page,
randomly select a starting word, select words
at a specified interval.
• An advantage to systematic sampling is that it
is easy to use.
Resource:
• Much of the information for the following
slides was taken from: http://stattrek.com/
Data Collection Methods: Pros and
Cons
• Each method of data collection has
advantages and disadvantages.
• Resources. When the population is large, a
sample survey has a big resource advantage
over a census. A well-designed sample survey
can provide very precise estimates of
population parameters - quicker, cheaper, and
with less manpower than a census.
Data Collection Methods: Pros and
Cons
• Generalizability. Generalizability refers to the
appropriateness of applying findings from a study to a
larger population. Generalizability requires random
selection. If participants in a study are randomly
selected from a larger population, it is appropriate to
generalize study results to the larger population; if not,
it is not appropriate to generalize.
Observational studies often do not feature random selection;
when they do not, it is not appropriate to generalize from the
results of the study to a larger population.
Data Collection Methods: Pros and
Cons
• Causal inference. Cause-and-effect
relationships can be teased out when subjects
are randomly assigned to groups. Therefore,
experiments, which allow the researcher to
control assignment of subjects to treatment
groups, are the best method for investigating
causal relationships.
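Random assignment can be sketched as a shuffle followed by a split. This is an illustration only; the function name `assign_groups` and the even split into two groups are assumptions.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)       # chance, not the researcher, decides the order
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

treatment, control = assign_groups(range(1, 21), seed=3)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance decides who gets the treatment, other differences between subjects tend to balance out across the two groups.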
Test Your Understanding of This
Lesson
• Which of the following statements are true?
• I. A sample survey is an example of an experimental
study.
II. An observational study requires fewer resources
than an experiment.
III. The best method for investigating causal
relationships is an observational study.
• (A) I only
(B) II only
(C) III only
(D) All of the above.
(E) None of the above.
Test Your Understanding of This
Lesson
• Solution
• The correct answer is (E). In a sample survey, the
researcher does not assign treatments to survey
respondents. Therefore, a sample survey is not an
experimental study; rather, it is an observational study.
An observational study may or may not require fewer
resources (time, money, manpower) than an
experiment. The best method for investigating causal
relationships is an experiment - not an observational
study - because an experiment features randomized
assignment of subjects to treatment groups.
Survey Sampling Methods
• Probability vs. Non-Probability Samples
• As a group, sampling methods fall into one of two
categories.
• Probability samples. With probability sampling methods,
each population element has a known (non-zero) chance of
being chosen for the sample.
• Non-probability samples. With non-probability sampling
methods, we do not know the probability that each
population element will be chosen, and/or we cannot be
sure that each population element has a non-zero chance
of being chosen.
Survey Sampling Methods
• Non-probability sampling methods offer two
potential advantages - convenience and cost. The
main disadvantage is that non-probability
sampling methods do not allow you to estimate
the extent to which sample statistics are likely to
differ from population parameters. Only
probability sampling methods permit that kind of
analysis.
Non-Probability Sampling Methods
• Two of the main types of non-probability sampling
methods are voluntary samples and convenience
samples.
• Voluntary sample. A voluntary sample is made up of
people who self-select into the survey. Often, these
folks have a strong interest in the main topic of the
survey.
Suppose, for example, that a news show asks viewers
to participate in an online poll. This would be a
voluntary sample. The sample is chosen by the
viewers, not by the survey administrator.
Non-Probability Sampling Methods
• Convenience sample. A convenience sample is
made up of people who are easy to reach.
Consider the following example. A pollster
interviews shoppers at a local mall. If the mall
was chosen because it was a convenient site
from which to solicit survey participants
and/or because it was close to the pollster's
home or business, this would be a
convenience sample.
Probability Sampling Methods
• The main types of probability sampling
methods are simple random sampling,
stratified sampling, cluster sampling,
multistage sampling, and systematic random
sampling. The key benefit of probability
sampling methods is that they tend to produce
samples that are representative of the
population and let you quantify sampling error,
which supports valid statistical conclusions.
Probability Sampling Methods
• Simple random sampling. Simple random sampling refers
to any sampling method that has the following properties.
– The population consists of N objects.
– The sample consists of n objects.
– If all possible samples of n objects are equally likely to occur, the
sampling method is called simple random sampling.
• There are many ways to obtain a simple random sample.
One way would be the lottery method. Each of the N
population members is assigned a unique number. The
numbers are placed in a bowl and thoroughly mixed. Then,
a blind-folded researcher selects n numbers. Population
members having the selected numbers are included in the
sample.
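The lottery method can be sketched as follows. `random.sample` draws without replacement, much like pulling numbers from a well-mixed bowl; the function name `lottery_sample` and the member labels are hypothetical.

```python
import random

def lottery_sample(population, n, seed=None):
    """Draw a simple random sample of n members by the lottery method."""
    rng = random.Random(seed)
    numbers = list(range(len(population)))  # one unique number per member
    drawn = rng.sample(numbers, n)          # draw n numbers without replacement
    return [population[i] for i in drawn]

members = [f"member {i}" for i in range(1, 21)]
print(lottery_sample(members, 5, seed=1))
```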
Probability Sampling Methods
• Stratified sampling. With stratified sampling, the
population is divided into groups, based on some
characteristic. Then, within each group, a probability
sample (often a simple random sample) is selected. In
stratified sampling, the groups are called strata.
As an example, suppose we conduct a national survey.
We might divide the population into groups or strata,
based on geography - north, east, south, and west.
Then, within each stratum, we might randomly select
survey respondents.
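A minimal sketch of stratified sampling, assuming each population member is a tuple of (id, region) and that the same number of respondents is drawn from every stratum. The names `stratified_sample` and `stratum_of` are illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Take a simple random sample of `per_stratum` members from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)   # group members by stratum
    sample = []
    for name in sorted(strata):                     # every stratum contributes
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical population: 100 people spread evenly over four regions.
regions = ["north", "east", "south", "west"]
people = [(i, regions[i % 4]) for i in range(100)]
print(stratified_sample(people, lambda person: person[1], 3, seed=5))
```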
Probability Sampling Methods
• Cluster sampling. With cluster sampling, every
member of the population is assigned to one, and
only one, group. Each group is called a cluster. A
sample of clusters is chosen, using a probability
method (often simple random sampling). Only
individuals within sampled clusters are surveyed.
Note the difference between cluster sampling
and stratified sampling. With stratified sampling,
the sample includes elements from each stratum.
With cluster sampling, in contrast, the sample
includes elements only from sampled clusters.
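Cluster sampling can be sketched in the same style. Here the clusters are given as a dictionary from cluster name to members; the classroom names and rosters are made up.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and survey everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # sample cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters: five classrooms with four students each.
classrooms = {f"room {r}": [f"r{r}-s{s}" for s in range(1, 5)] for r in range(1, 6)}
chosen, members = cluster_sample(classrooms, 2, seed=9)
print(chosen)
print(members)
```

Unlike the stratified case, clusters that are not chosen contribute no one to the sample.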
Probability Sampling Methods
• Systematic random sampling. With systematic
random sampling, we create a list of every
member of the population. From the list, we
randomly select the first sample element from
the first k elements on the population list.
Thereafter, we select every kth element on the
list.
This method is different from simple random
sampling since not every possible sample of n
elements is equally likely.
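A short sketch of systematic random sampling, assuming the population is already available as an ordered list; the function name is illustrative.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then every k-th element."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # random position 0..k-1
    return population[start::k]   # start, start + k, start + 2k, ...

roster = list(range(1, 101))      # hypothetical list of 100 members
print(systematic_sample(roster, 10, seed=0))
```

With k = 10 there are only 10 possible samples (one per starting position), which is why this is not a simple random sample.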
Probability Sampling Methods
• Multistage sampling. With multistage
sampling, we select a sample by using
combinations of different sampling methods.
For example, in Stage 1, we might use cluster
sampling to choose clusters from a
population. Then, in Stage 2, we might use
simple random sampling to select a subset of
elements from each chosen cluster for the
final sample.
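The two-stage example can be sketched as cluster sampling followed by simple random sampling inside each chosen cluster. The cluster names and sizes below are hypothetical.

```python
import random

def multistage_sample(clusters, num_clusters, per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: simple random sample within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)        # stage 1
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], per_cluster))  # stage 2
    return chosen, sample

# Hypothetical clusters: six towns with ten residents each.
towns = {f"town {t}": [f"t{t}-p{p}" for p in range(1, 11)] for t in range(1, 7)}
chosen, sample = multistage_sample(towns, 2, 3, seed=4)
print(chosen)
print(sample)
```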
Test Your Understanding
• An auto analyst is conducting a satisfaction survey,
sampling from a list of 10,000 new car buyers. The list
includes 2,500 Ford buyers, 2,500 GM buyers, 2,500
Honda buyers, and 2,500 Toyota buyers. The analyst
selects a sample of 400 car buyers, by randomly
sampling 100 buyers of each brand.
• Is this an example of a simple random sample?
• (A) Yes, because each buyer in the sample was
randomly sampled.
(B) Yes, because each buyer in the sample had an equal
chance of being sampled.
(C) Yes, because car buyers of every brand were equally
represented in the sample.
(D) No, because every possible 400-buyer sample did
not have an equal chance of being chosen.
(E) No, because the population consisted of purchasers
of four different brands of car.
Test Your Understanding
• Solution
• The correct answer is (D). A simple random sample
requires that every sample of size n (in this problem, n
is equal to 400) have an equal chance of being
selected. In this problem, there was a 100 percent
chance that the sample would include 100 purchasers
of each brand of car. There was zero percent chance
that the sample would include, for example, 99 Ford
buyers, 101 Honda buyers, 100 Toyota buyers, and 100
GM buyers. Thus, all possible samples of size 400 did
not have an equal chance of being selected; so this
cannot be a simple random sample.
Test Your Understanding of This
Lesson
• The fact that each buyer in the sample was randomly
sampled is a necessary condition for a simple random
sample, but it is not sufficient. Similarly, the fact that
each buyer in the sample had an equal chance of being
selected is characteristic of a simple random sample,
but it is not sufficient. The sampling method in this
problem used random sampling and gave each buyer
an equal chance of being selected; but the sampling
method was actually stratified random sampling.
• The fact that car buyers of every brand were equally
represented in the sample is irrelevant to whether the
sampling method was simple random sampling.
Similarly, the fact that the population consisted of buyers
of different car brands is irrelevant.
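The counting behind this answer can be checked directly. The sketch below uses only the numbers from the problem; the variable names are illustrative.

```python
from math import comb

population, n = 10_000, 400
brands, buyers_per_brand, sampled_per_brand = 4, 2_500, 100

# Every buyer has the same chance of being selected: 100 / 2,500.
inclusion_probability = sampled_per_brand / buyers_per_brand

# Samples a simple random sample could produce: any 400 of the 10,000 buyers.
all_samples = comb(population, n)

# Samples this design can produce: exactly 100 buyers from each brand.
stratified_samples = comb(buyers_per_brand, sampled_per_brand) ** brands

print(inclusion_probability)
print(stratified_samples < all_samples)
```

Each buyer's chance matches what a simple random sample of 400 would give (400 / 10,000), yet the design can produce only a subset of the possible 400-buyer samples, so the samples are not all equally likely.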
Bias in Survey Sampling
• In survey sampling, bias refers to the tendency
of a sample statistic to systematically over- or
under-estimate a population parameter.
Bias Due to Unrepresentative
Samples
• A good sample is representative. This means that
each sample point represents the attributes of a
known number of population elements.
• Bias often occurs when the survey sample does
not accurately represent the population. The bias
that results from an unrepresentative sample is
called selection bias. Some common examples of
selection bias are described below.
Bias Due to Unrepresentative
Samples
• Undercoverage. Undercoverage occurs when some
members of the population are inadequately represented
in the sample. A classic example of undercoverage is the
Literary Digest voter survey, which predicted that Alfred
Landon would beat Franklin Roosevelt in the 1936
presidential election. The survey sample suffered from
undercoverage of low-income voters, who tended to be
Democrats.
How did this happen? The survey relied on a convenience
sample, drawn from telephone directories and car
registration lists. In 1936, people who owned cars and
telephones tended to be more affluent. Undercoverage is
often a problem with convenience samples.
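Undercoverage can be illustrated with a small simulation. All numbers below are made up for illustration and are not the 1936 data: 40% of voters are assumed to own a phone, and phone owners are assumed to support candidate A less often than non-owners.

```python
import random

rng = random.Random(7)

# Hypothetical electorate: each voter is (owns_phone, supports_candidate_a).
voters = []
for _ in range(100_000):
    owns_phone = rng.random() < 0.4
    supports_a = rng.random() < (0.35 if owns_phone else 0.70)
    voters.append((owns_phone, supports_a))

def support_rate(group):
    """Fraction of the group that supports candidate A."""
    return sum(supports for _, supports in group) / len(group)

true_rate = support_rate(voters)
phone_frame = [v for v in voters if v[0]]       # list that misses non-owners
frame_sample = rng.sample(phone_frame, 2_000)   # random, but from the wrong list
full_sample = rng.sample(voters, 2_000)         # random from the whole population

print(f"true support:        {true_rate:.2f}")
print(f"phone-list sample:   {support_rate(frame_sample):.2f}")
print(f"full-list sample:    {support_rate(full_sample):.2f}")
```

Sampling randomly from the phone list does not help, because the voters left off the list never have a chance to be chosen.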
Bias Due to Unrepresentative
Samples
• Nonresponse bias. Sometimes, individuals chosen for the
sample are unwilling or unable to participate in the survey.
Nonresponse bias is the bias that results when respondents
differ in meaningful ways from nonrespondents. The
Literary Digest survey illustrates this problem. Respondents
tended to be Landon supporters; and nonrespondents,
Roosevelt supporters. Since only 25% of the sampled voters
actually completed the mail-in survey, survey results
overestimated voter support for Alfred Landon.
The Literary Digest experience illustrates a common
problem with mail surveys. Response rate is often low,
making mail surveys vulnerable to nonresponse bias.
Bias Due to Unrepresentative
Samples
• Voluntary response bias. Voluntary response
bias occurs when sample members are self-selected
volunteers, as in voluntary samples.
An example would be call-in radio shows that
solicit audience participation in surveys on
controversial topics (abortion, affirmative
action, gun control, etc.). The resulting sample
tends to overrepresent individuals who have
strong opinions.
Bias Due to Unrepresentative
Samples
• Random sampling is a procedure for sampling
from a population in which (a) the selection of
a sample unit is based on chance and (b) every
element of the population has a known, non-zero probability of being selected. Random
sampling helps produce representative
samples by eliminating voluntary response
bias and guarding against undercoverage bias.
All probability sampling methods rely on
random sampling.
Bias Due to Measurement Error
• A poor measurement process can also lead to
bias. In survey research, the measurement
process includes the environment in which the
survey is conducted, the way that questions
are asked, and the state of the survey
respondent.
Bias Due to Measurement Error
• Response bias refers to the bias that results from
problems in the measurement process. Some examples
of response bias are given below.
• Leading questions. The wording of the question may
be loaded in some way to unduly favor one response
over another. For example, a satisfaction survey may
ask the respondent to indicate whether she is satisfied,
dissatisfied, or very dissatisfied. By giving the
respondent one response option to express satisfaction
and two response options to express dissatisfaction,
this survey question is biased toward getting a
dissatisfied response.
Bias Due to Measurement Error
• Social desirability. Most people like to present
themselves in a favorable light, so they will be
reluctant to admit to unsavory attitudes or
illegal activities in a survey, particularly if
survey results are not confidential. Instead,
their responses may be biased toward what
they believe is socially desirable.
Test Your Understanding
• Which of the following statements are true?
• I. Random sampling is a good way to reduce response
bias.
II. To guard against bias from undercoverage, use a
convenience sample.
III. Increasing the sample size tends to reduce survey
bias.
IV. To guard against nonresponse bias, use a mail-in
survey.
• (A) I only
(B) II only
(C) III only
(D) IV only
(E) None of the above.
Test Your Understanding
• The correct answer is (E). None of the statements
is true. Random sampling provides strong
protection against undercoverage bias
and voluntary response bias, but it is not effective
against response bias. A convenience sample
does not protect against undercoverage bias; in
fact, it sometimes causes undercoverage bias.
Increasing sample size does not reduce survey
bias; it only reduces random sampling error. And finally, using a mail-in survey does not
prevent nonresponse bias. In fact, mail-in surveys
are quite vulnerable to nonresponse bias.
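The point about sample size can be illustrated with a simulation. The population is made up: half of it can be reached by the survey and answers "yes" about 20% of the time, while the unreachable half answers "yes" about 80% of the time.

```python
import random

rng = random.Random(11)

# Hypothetical yes/no answers (1 = yes) for two halves of a population.
reachable = [1 if rng.random() < 0.20 else 0 for _ in range(50_000)]
unreachable = [1 if rng.random() < 0.80 else 0 for _ in range(50_000)]
everyone = reachable + unreachable
true_rate = sum(everyone) / len(everyone)

def biased_estimate(n):
    """Estimate the yes-rate from n people, all drawn from the reachable half."""
    sample = rng.sample(reachable, n)
    return sum(sample) / n

for n in (100, 1_000, 10_000):
    error = biased_estimate(n) - true_rate
    print(f"n = {n:>6}: error = {error:+.3f}")
```

The error stays far from zero at every sample size; a larger sample only makes the estimate settle more tightly around the wrong value.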