Data for Decisions

Chapter 7: Data for Decisions
For All Practical Purposes: Mathematical Literacy in Today’s World, 7th ed.
© 2006, W.H. Freeman and Company
Lesson Plan
 Sampling
 Bad Sampling Methods
 Simple Random Samples
 Cautions About Sample Surveys
 Experiments
 Thinking About Experiments
 Inference: From Sample to Population
 Confidence Intervals
Sampling
 Statistics
 The science of collecting, organizing, and interpreting data.
 How are the data produced? Sampling and experiments.
 Sampling
 Gather information about a large group of individuals.
 Time, cost, and inconvenience forbid contacting every individual.
 Instead, gather information about only part of the group in order to draw conclusions about the whole.
Population – The entire group of individuals about which we want information.
Sample – Part of the population from which we actually collect information used to draw conclusions about the whole.
Bad Sampling Methods
 Bad Sampling Methods
 If personal choice is involved in selecting the sample, the following
could happen:
 Results could become biased.
 The sample may not be a true representation of the population.
1. Convenience Samples
 Interviewer chooses the sample from
individuals close at hand (easiest to reach).
 Example: Mall surveys
2. Voluntary Response Sample
 People who choose themselves by responding to a general appeal.
 People with strong opinions are most likely to respond; can cause bias.
 Examples: Opinion polls, call-ins.
Bias – The design of a statistical study that systematically favors a certain outcome.
Simple Random Samples
 Simple Random Sample (SRS)
 An SRS of size n consists of n individuals from the population
chosen in such a way that every set of n individuals has an equal
chance to be the sample actually selected.
 Choosing a sample by chance avoids bias by giving all individuals
an equal chance to be chosen (a good sampling method).
 Examples of SRS
 Draw names from a hat: Place all the names of the people in the
population into a hat and draw out a handful (the sample).
 Slow and inconvenient
 Use the table of random digits: A more efficient way of randomly
selecting the sample without bias.
 For smaller samples, tables of random digits are used.
 For larger samples, computers do the random digit sampling.
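As a rough sketch of how a computer might draw an SRS, the Python snippet below uses the standard library's `random.sample`, which draws n distinct items so that every set of n is equally likely. The population of 70 numbered labels, the sample size of 5, and the seed are all assumptions made up for illustration.

```python
import random

# Hypothetical population: 70 individuals labeled 1..70 (assumed for illustration).
population = list(range(1, 71))

# A seeded generator makes the draw repeatable; drop the seed for a fresh sample.
rng = random.Random(2006)

# random.sample draws without replacement, so no individual is chosen twice.
srs = rng.sample(population, 5)
print(srs)
```

Because the draw is left entirely to chance, no personal choice enters the selection.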
Simple Random Sample
 Two Steps in Choosing a Simple Random Sample
1. Give each member of the population a numerical label of the
same length.
Example: 100 items can be labeled with two digits 01, 02, …, 99, 00
2. To choose the random sample, select a line in the digit table. For a sample size of n, start reading off numbers of the length of the labels until n individuals are selected from the population.
 When selecting the n individuals for the sample from the random digits table:
1. Do not use any group of digits not used as a label.
2. Do not use any repeats.
A table of random digits – A list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties:
1. Each entry in the table is equally likely to be any of the 10 digits from 0 through 9.
2. The entries are independent of one another. That is, knowledge of one part of the table gives no information about the other part.
Simple Random Sample
 Using the Random Digit Table
Section Taken From a Random Digits Table
Line  Digits
101   19223 95034 05756 28713 96409 12531 42544 82853
102   73676 47150 99400 01927 27754 42648 82425 36290
103   45467 71709 77558 00095 32863 29485 82226 90056
104   52711 38889 93074 60227 40011 85848 48767 52573
 Example:
 A group of 70 people were labeled 01, 02, 03, …, 69, 70 .
 Line 104 of the random digits table was chosen as the starting point, and two lucky winners were to be selected.
 Reading off two-digit labels from line 104:
 52 was selected first, 71 was skipped (because it is outside the range of labels), and 13 was chosen next. Answer: 52 and 13
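The reading procedure in this example can be sketched in Python as follows. The function name `pick_from_digits` and its parameters are hypothetical, introduced only for illustration; the digits are those of line 104 in the table excerpt, and the labels are assumed to run from 01 to 70 as in the example.

```python
def pick_from_digits(digits, n_wanted, max_label, label_len=2):
    """Read fixed-length labels left to right, skipping unused labels and repeats."""
    chosen = []
    for i in range(0, len(digits) - label_len + 1, label_len):
        label = int(digits[i:i + label_len])
        # Keep the label only if it names someone and has not been used already.
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == n_wanted:
            break
    return chosen

# Line 104 of the table excerpt, with the spaces between groups removed.
line_104 = "52711 38889 93074 60227 40011 85848 48767 52573".replace(" ", "")

winners = pick_from_digits(line_104, n_wanted=2, max_label=70)
print(winners)  # [52, 13]
```

The spaces between five-digit groups in the table are only there for readability, so they are removed before reading off labels.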
Cautions About Sample Surveys
 Sample surveys of large populations require the following:
 A good sampling design (can be done with SRS)
 An accurate and complete list of the population
 Participation of all individuals selected for the sample
 A question posed that is neutral and clear
 Bias can occur due to the following:
 Problems with obtaining accurate and complete population list
 Undercoverage – Occurs when some groups in the population are left
out of the process of choosing the sample.
Example: Homeless, prison inmates, students in dormitories, etc.
 Problems with getting 100% participation of sampled people
 Nonresponse – Occurs when an individual chosen for the sample
cannot be contacted or refuses to participate.
 Problems with posing a misleading or confusing question
Experiments
 Observation versus Experiments
 Observational Study – Example: sample survey
 Observes individuals and measures variables of interest but does not attempt to influence the responses.
 Purpose is to describe some group or situation.
 Experiment
 Deliberately imposes some treatment on individuals in order to
observe their responses.
 Purpose is to study whether the treatment causes a change in
the response.
 Examining Cause and Effect Between Variables
 Experiments are the preferred method for examining the effect of
one variable on another.
 By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect.
Experiments
 Uncontrolled Experiment
 An experiment in which it is not possible to control outside factors that can influence the outcome.
 Confounding – Variables, whether part of a study or not, are said to be confounded when their effects on the outcome cannot be distinguished from each other.
[Diagram labels: Apply a treatment; Observe or measure the response; Influences by outside effects]
 Randomized Comparative Experiment (helps control confounding)
 An experiment to compare two or more treatments in which people, animals, or things are assigned to treatments by chance.
 The outside effects and confounding variables act on all groups alike.
Randomized – The subjects are assigned to treatments by chance.
Comparative – Compares two or more treatments.
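One way to picture assignment by chance is the short Python sketch below, which shuffles a list of subjects and splits it into two treatment groups. The subject IDs, the group sizes, and the seed are assumptions invented for illustration, not part of the lesson.

```python
import random

# Hypothetical subject IDs S01..S20 (assumed for illustration).
subjects = [f"S{i:02d}" for i in range(1, 21)]

rng = random.Random(7)
shuffled = subjects[:]   # copy so the original list stays in order
rng.shuffle(shuffled)    # chance alone decides the order

# The first half receives treatment A and the second half treatment B.
half = len(shuffled) // 2
group_a = shuffled[:half]
group_b = shuffled[half:]
print(group_a)
print(group_b)
```

Since neither the experimenter nor the subjects pick the groups, outside influences should be spread across both groups rather than piling up in one.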
Thinking About Experiments
 Statistical Significance
 An observed effect is statistically significant if it is so large that it is
unlikely to occur just by chance in the absence of a real effect in
the population from which the data were drawn.
 Example: The connection between smoking and lung cancer is statistically
significant.
 Control Group
 A group of experimental subjects who are given a standard
treatment or no treatment at all (such as a placebo).
 Placebo Effect
 The effect of a dummy treatment (such as an inert pill in a medical
experiment) on the response of the subjects.
 The tendency to respond favorably to any treatment.
 Double-Blind Experiments
 An experiment in which neither the experimental subjects nor the
persons who interact with them know which treatment each subject
received. This helps to eliminate possible influences or biases
between the subjects and workers; everyone is kept “blind.”
Inference: From Sample to Population
 Statistical Inference
 When the sample was chosen at random from a population, we
can infer conclusions about the wider population from these data.
 Statistical inference works only if the data come from random
samples or a randomized comparative experiment.
 A parameter is a number that describes the population.
 A parameter is a fixed number (in practice we do not know its value).
 A statistic is a number that describes a sample.
 The value of a statistic is known when we have taken a sample, but it
can change from sample to sample.
 Example: A random sample of 2500 people was chosen from the population
and asked a question: “Do you like getting new clothes but find shopping for
clothes frustrating and time consuming?” 1650 people agreed.
Sample statistic: $\hat{p} = 1650/2500 = 0.66 = 66\%$
Infer that 66% of the population agrees.
Inference: From Sample to Population
 Sampling Distribution
 The distribution of values taken by the statistic in all possible
samples of the same size from the same population.
 For a fixed number of repeated samples, the sampling distribution for a larger sample size has less variation, and its values lie closer to the mean.
[Figure: 1000 SRSs of size 100 from the same population]
[Figure: 1000 SRSs of size 2500 from the same population (less variable than samples of size 100)]
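The comparison between the two sample sizes can be imitated with a small simulation. The sketch below is only an approximation: it assumes a population so large that each sampled individual can be treated as an independent draw, and it assumes a population proportion of 0.6. The function name `simulate_p_hats` and the seed are hypothetical.

```python
import random
import statistics

def simulate_p_hats(p, n, repetitions, rng):
    """Return the sample proportions from repeated samples of size n."""
    p_hats = []
    for _ in range(repetitions):
        # Each individual is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

rng = random.Random(1)
p_true = 0.6  # assumed population proportion, for illustration only

small = simulate_p_hats(p_true, 100, 1000, rng)    # 1000 samples of size 100
large = simulate_p_hats(p_true, 2500, 1000, rng)   # 1000 samples of size 2500

sd_small = statistics.pstdev(small)
sd_large = statistics.pstdev(large)
print(round(sd_small, 4), round(sd_large, 4))
```

In a run like this, the spread for samples of size 2500 should come out several times smaller than the spread for samples of size 100.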
Inference: From Sample to Population
 Sample Proportion
 Choose an SRS of size n from a large population that contains population proportion p of successes:
Sample proportion of successes: $\hat{p} = \dfrac{\text{count of successes in the sample}}{n}$
 Then…
 Shape: For large sample sizes, the sampling distribution of $\hat{p}$ is approximately Normal.
 Center: The mean of the sampling distribution is p.
 Spread: The standard deviation of the sampling distribution is $\sqrt{\dfrac{p(1-p)}{n}}$.
For the shopping example…
With a mean p = 0.6 and n = 2500, the standard deviation is $\sqrt{\dfrac{0.60(1-0.60)}{2500}} = 0.0098$.
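The arithmetic for the shopping example can be checked with a few lines of Python. This is a minimal sketch; the function name `sd_of_sample_proportion` is hypothetical and simply wraps the square root of p(1 - p)/n.

```python
import math

def sd_of_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

sd = sd_of_sample_proportion(0.60, 2500)
print(round(sd, 4))  # 0.0098
```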
Confidence Intervals
 The 68-95-99.7 Rule (for Normal distributions)
 68% of the observations fall within ± 1 standard deviation of the mean.
 95% of the observations fall within ± 2 standard deviations of the mean.
 99.7% of the observations fall within ± 3 standard deviations of the mean.
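These three percentages are rounded values for a Normal distribution. As a quick check, the sketch below uses `statistics.NormalDist` from the Python standard library to compute the share of a standard Normal distribution lying within 1, 2, and 3 standard deviations of the mean; the variable names are chosen only for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.4f}")
```

The results are close to 0.68, 0.95, and 0.997, which is where the rule gets its name.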
 95% Confidence Interval
An interval obtained from the sample data by a method that in 95% of all samples will produce an interval containing the true population parameter.
A 95% confidence interval for p is
$\hat{p} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
 Where $\hat{p}$ is the proportion of successes in the sample
 And the margin of error is $2\sqrt{\hat{p}(1-\hat{p})/n}$
 This recipe is only approximately correct, but it is quite accurate when the sample size n is large.
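To see the recipe at work, the Python sketch below applies it to the shopping survey from earlier in the chapter (1650 of 2500 people agreed). The function name `confidence_interval_95` is hypothetical, and the multiplier 2 is the approximate value used in the recipe above.

```python
import math

def confidence_interval_95(successes, n):
    """Approximate 95% confidence interval: p-hat plus or minus 2*sqrt(p-hat(1-p-hat)/n)."""
    p_hat = successes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

low, high, margin = confidence_interval_95(1650, 2500)
print(f"{low:.3f} to {high:.3f} (margin of error {margin:.3f})")
# 0.641 to 0.679 (margin of error 0.019)
```

Under these assumptions, the survey result would be reported as 66% with a margin of error of about 1.9 percentage points.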