Data - beery
The Practice of Statistics
Third Edition
Chapter P:
What is Statistics?
Warm-Up
• Activity: Water Water Everywhere!
– Read the directions found on page 4 of your
text.
Case Study
Can Magnets Help Reduce Pain?
Read the study on page 3.
• What do you observe? Does
there appear to be a difference
between the reported Active
and Inactive pain scores?
• Can we “condense” the data?
What can we calculate to
simplify things? What do you
observe?
• Is this difference large enough
to argue that magnets are
effective in reducing pain?
• Is this difference due to chance
variation, or is it evidence of a
real difference?
Intro: What is Stats?
• Statistics:
– The science (and art) of learning from data.
• Data:
– Numbers with a contextual meaning.
• We use data and Statistics to draw
conclusions about a population based on
sample information.
Data/Statistics
Population
Sample
Inference/Conclusions
4 Themes of Stats
• Part I: Exploratory Data Analysis
– The tools and strategies for organizing, displaying,
describing, and analyzing data.
• Part II: Producing Data
– Designing surveys, experiments, and observational
studies that will yield the data necessary to answer a
question of interest.
• Part III: Probability
– The study of chance behavior. How likely are certain
outcomes?
• Part IV: Inference
– Draw conclusions about the population based on
samples. Test claims and compute estimates.
II. Data Production
• When answering a question, where the
data come from is important.
– Data beat personal experiences (anecdotal).
• Data Sources
– Available Data
– Surveys
– Observational Studies
– Experiments
Data Analysis
• Organize, Display, Summarize, and Interpret
• Individual: Objects described by data
• Variable: Characteristic of an individual
• Categorical: Places individuals into groups
• Quantitative: Numeric measures
• Distribution: Values taken on by a variable and how often
it takes those values.
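As a small illustration of the "distribution" idea, the sketch below tallies a categorical variable: each value it takes and how often. The car-color values are made up for illustration and do not come from the text.

```python
from collections import Counter

# Hypothetical data (made up): favorite car color for eight individuals.
colors = ["silver", "white", "silver", "black", "red", "white", "silver", "black"]

# Distribution of a categorical variable: each value and how often it occurs.
counts = Counter(colors)
total = len(colors)

for value, count in counts.most_common():
    # Show both the count and the percent of individuals with that value.
    print(f"{value:<8}{count:>3}{count / total:>8.1%}")
```

Here each color is a value of the variable, and each list entry stands for one individual.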
Data Analysis
• When analyzing data, ask the following:
– Who are the individuals being described?
– What are the variables?
– Why were the data gathered?
– When, where, how, and by whom were the
data produced?
Exploratory Data Analysis
• Begin by examining each variable by itself.
– Move on to study relationships among the
variables.
• Begin with a graph or graphs.
– Add numerical summaries or specific aspects of
the data.
Describing Categorical Variables
• Bar graph
– Notice the vertical scale on this graph is
percents. (could be counts)
Describing Categorical Variables
• Side-by-side bar graph
Describing Quantitative Variables
• Dot Plot
Exploring Relationships between Variables
                  On Time   Delayed
Alaska Airlines      3274       501
America West         6438       787
Percent of flights delayed:
AA = 501/3775 = 13.3%
AW = 787/7225 = 10.9%
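The percentages can be checked with a short sketch. It uses the on-time and delayed counts from the slide and assumes the delay rate is delayed flights divided by all flights (on time plus delayed). The function name `delay_rate` is illustrative only.

```python
# Counts from the slide: (on time, delayed) flights for each airline.
flights = {
    "Alaska Airlines": (3274, 501),
    "America West": (6438, 787),
}

def delay_rate(on_time, delayed):
    """Proportion of all flights that were delayed."""
    return delayed / (on_time + delayed)

rates = {airline: delay_rate(*counts) for airline, counts in flights.items()}

for airline, rate in rates.items():
    print(f"{airline}: {rate:.1%}")
```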
Practice
• Page 19 #P7
– Cool Car Colors
Practice
• Page 19 #P9
– U.S. women’s soccer scores
Practice
• Page 20 #P11
– A class survey
Practice
• Page 21 #P12
Probability
• Long-term chances of an event occurring
• Chance behavior is unpredictable in the short
run, but has a regular, predictable pattern in
the long run.
– Consider flipping a coin, rolling dice, etc.
• We use probability to determine how likely certain
sample values/statistics are.
– We want to know, “Is this value likely to be due to
chance?”
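A quick simulation can make the short-run versus long-run idea concrete. This is a minimal sketch, assuming a fair coin modeled with Python's pseudorandom generator. The seed and the flip counts are arbitrary choices.

```python
import random

def proportion_heads(n_flips, rng):
    """Simulate n_flips fair coin flips and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable

# Few flips: the proportion can be far from 0.5. Many flips: it settles near 0.5.
results = {n: proportion_heads(n, rng) for n in (10, 100, 1000, 100000)}

for n, p in results.items():
    print(f"{n:>6} flips: proportion of heads = {p:.3f}")
```

With only 10 flips the proportion can land well away from 0.5, while with 100,000 flips it should sit very close to 0.5.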
Statistical Inference
• Water, Water Everywhere
Statistical Thinking
• Data come from real-world contexts...
• Doing statistics means more than just manipulating
data!
• Form the habit of asking “What do the data tell me?”
• Statistics involves a lot of calculating and graphing.
– We’ll let our calculator/computer do most of this.
However, ideas and judgments cannot be automated!
• You learn statistics by doing statistical problems!
Case Closed!
• Can Magnets help reduce pain?
Page 11 P1-P4 Answers
P.1 Getting data from Jamie and her friends is
convenient, but it does not provide a good
snapshot of the opinions held by all young people.
In short, Jamie and her friends are not a
representative sample from the population of
interest (young people).
P.2 (a) This is an observational study. Patients were
matched by age, sex, and race, but the
investigators did not impose any treatment. They
simply asked about cell phone use and recorded
the responses.
(b) No, the results of this study are encouraging, but
cause-and-effect conclusions, like this one, must
be based on experiments.
Page 11 P1-P4 Answers
P.3
(a) This is an experiment. The company uses animation for one
group and a text for the other group in order to compare the
performance of the two different groups.
(b) The company could conclude that they have solid evidence that
computer animation was more successful than a textbook for
these students. If the company wants to generalize this
conclusion to a broader population, e.g. all juniors at a school or
all juniors in a state or all juniors in the country, then they must
assume that these groups are representative samples of the
population of interest. The information provided does not
indicate whether or not these students were randomly selected
from some larger population. If they were, then inferences to
that population are reasonable. Similarly, the information
provided does not indicate whether or not randomization was
used to partition the students into two groups. If randomization
was used, then cause and effect conclusions can be drawn.
Finally, we must assume that the tests adequately assess biology
concepts.
Page 11 P1-P4 Answers
P.4
(a) This type of question is best addressed with a survey by polling agencies like
the Gallup Organization. They use a variety of methods to get representative
samples from the entire country and have experts who can clarify the wording
of particular questions to avoid bias. Wording which leads the respondent to
think about economic issues would likely get a much different response than
wording which leads the respondent to think about social, political, or global
issues.
(b) Since we are interested in comparing two different teaching methods and
making an inference to all college students, an experiment is best. College
students who are interested in taking accounting must be randomly selected
and then those students must be randomly placed into two groups, one which
will be taught in a classroom and another which will learn the same material
online. Describing ideal experiments is easy, but think about the practical
problems (and costs) associated with the experiment described above.
(c) Since we are specifically interested in how long your teachers wait after
asking a question, it would be easiest to use an observational study that is
not a survey. Different lectures, labs, or discussions should be randomly
selected for each teacher. During the class, simply record the amount of time
the teacher waits to move on after asking each question. Since teachers have
different styles, you will have to think about whether you want the same
number of questions for each teacher, which would require you to observe
some teachers longer than others, or whether you want to observe each teacher
for the same amount of time.
Page 30: P19, P21, P22, P24, P27, P28 Answers
P.19
(a) Available data from family interviews and police
records were used. No treatments were imposed in
order to observe various responses.
(b) Parental involvement, profession of parents,
educational priorities, amount of reading, type of
child care, participation in sports, and participation
in other activities with peers are just a few other
variables that may be related to the amount of TV
watched. The effects of these other variables are
mixed up with and cannot be separated from the
effect of watching TV. This is known as
confounding.
Page 30: P19, P21, P22, P24, P27, P28 Answers
P.21
Who? The individuals are motor vehicles produced in 2004.
What? The categorical variables are: make and model; vehicle
type; transmission type. The quantitative variables are:
number of cylinders (integer count); city MPG (miles per
gallon); highway MPG (miles per gallon).
Why? The data were compiled to compare fuel economy.
When, where, how, and by whom? A statement on
www.fueleconomy.gov reveals that “The data included in
the Department of Energy's Fuel Economy Guide are the
result of vehicle testing done at the Environmental
Protection Agency's National Vehicle and Fuel Emissions
Laboratory in Ann Arbor, Michigan and by vehicle
manufacturers themselves with oversight by EPA.”
Page 30: P19, P21, P22, P24, P27, P28 Answers
P.24 (a) Because they were interested in the opinions of all U.S.
adults and not just MLB fans who support the athletes.
(b) Although there will be variation from sample to sample, we can
assume that the sample obtained by the Gallup organization was
representative of the entire U.S. adult population.
Thus, the population percentages would be
about the same as the sample percentages
obtained by Gallup, 42% for “probably not”
and 33% for “definitely not.”
(c) No, we cannot conclude that Barry Bonds
is lying. These percentages reflect the opinions
of U.S. adults, but public opinion can be much
different from fact. Some day we may find out
if Barry Bonds lied, but right now only Barry
and a few other people know the truth!
Page 30: P19, P21, P22, P24, P27, P28 Answers
P.28
(a) These data were probably obtained from an observational study that
wasn’t a survey. The organizers of the program could easily collect
information from volunteers who participated in the “Mozart for
Minors” program and were willing to share their test scores. These
scores could then be compared with average scores for all students that
are typically reported to school districts and parents.
(b) No, we cannot conclude that the “Mozart for Minors” program caused
an increase in the students’ test scores. Educational background of the
parents, family income, neighborhood, and many other factors may
influence test scores and participation in the music program.
(c) Conduct a simple comparative experiment. A large group of students
completes the exams and then is randomly split into two groups. One
group participates in the “Mozart for Minors” program and the other
group does not. Ideally, the two groups will have exactly the same
experiences, except for participation in the music program. This is
easier said than done! After a reasonable period of time, the students
take exams to measure verbal and math skills again. The averages of
the changes in the scores (second exam – first exam) can be compared
for the two groups.
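The random split and the comparison of average changes described in (c) could be sketched as follows. Everything here is hypothetical: the student numbers, the group size of 10, the seed, and the example scores are made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # arbitrary seed so the split is repeatable

# Hypothetical students, identified by number only.
students = list(range(1, 21))

# Randomly split the students into two groups of equal size.
shuffled = rng.sample(students, k=len(students))
program_group = shuffled[:10]   # would take part in the music program
control_group = shuffled[10:]   # would not take part

def mean_change(first_scores, second_scores):
    """Average of (second exam - first exam) across students."""
    return mean(s2 - s1 for s1, s2 in zip(first_scores, second_scores))

# Made-up scores for three students, only to show the calculation.
print(mean_change([70, 65, 80], [75, 66, 82]))
```

In a real study, `mean_change` would be computed once per group and the two averages compared.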