What is Statistics?

Chapter 0
What is Statistics?
• Statistics is the science (and art) of learning
from data.
• Statistics is the study of variability.
• Statistics is the study of how to collect,
organize, analyze, and interpret numerical
information from data.
4 Areas of Statistics
1. Data Production
(quality data – surveys, observations, experiments)
2. Exploratory Data Analysis
(organizing, describing, and analyzing data)
3. Probability
(the study of chance)
4. Inference
(making statistically sound decisions with confidence)
Data Production
Data VS. Personal Experience
• Read example P.1 on page 7 and be
prepared to discuss.
• Just because one plane crashes does not
mean flying is dangerous.
• “Well, my grandma…” or “My Uncle
John…”
Data Production
Data VS. Personal Experience
Suppose a group of students wanted to find
out if their classmates prefer cheeseburgers
from McDonald’s or Burger King. They decide
to ask 50 people under the age of 20 which
fast-food restaurant they prefer. In order to
save time and energy, they conduct their
survey at the McDonald’s closest to campus.
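The weakness of this plan is that people found inside a McDonald's are unlikely to be typical of all classmates. A minimal Python sketch of the idea follows. All numbers are made up for illustration (an even split among all classmates, and 80% of the McDonald's customers preferring McDonald's); they are assumptions, not data from the example.

```python
import random

def sample_proportion(group, n, rng):
    """Proportion of a simple random sample of size n that prefers McDonald's."""
    sample = rng.sample(group, n)
    return sum(1 for answer in sample if answer == "McDonald's") / n

# Hypothetical population: 1,000 classmates, evenly split.
all_classmates = ["McDonald's"] * 500 + ["Burger King"] * 500

# Hypothetical subgroup: the 200 classmates who eat at the McDonald's
# near campus; 80% of them prefer McDonald's.
mcdonalds_customers = ["McDonald's"] * 160 + ["Burger King"] * 40

rng = random.Random(1)  # fixed seed so the run is repeatable
print("50 sampled from all classmates:     ",
      sample_proportion(all_classmates, 50, rng))
print("50 sampled from McDonald's customers:",
      sample_proportion(mcdonalds_customers, 50, rng))
```

Under these assumptions the second estimate tends to land near 0.8 even though the true figure for all classmates is 0.5. Where the sample is taken drives the answer.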
Available Data
• Available data are data that were produced
in the past for some other purpose but that
may help answer a present question.
• www.fedstats.gov (Federal Statistics)
• www.cdc.gov/nchs (National Center for Health Statistics)
• www.nces.ed.gov (National Center for Education Statistics)
Having Kids or not?
• Read example P.3 on page 9, and be
prepared to discuss
Where the data come from is
important!
A representative sample is a sample that reflects
the relevant characteristics of the population.
“In 1976, Shere Hite published The Hite Report on
Female Sexuality, Seven Stories Press, New
York, NY, 2004. The conclusions reported in her
book were based on 3,000 returned surveys
from some 100,000 surveys distributed by
various women’s groups. The results were that
women were highly critical of men. In what way
might the author’s findings have been biased?”
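One place to start is the response rate, which can be worked out from the two counts quoted in the passage. A quick sketch in Python:

```python
surveys_distributed = 100_000  # "some 100,000" in the passage
surveys_returned = 3_000

response_rate = surveys_returned / surveys_distributed
print(f"Response rate: {response_rate:.0%}")  # prints "Response rate: 3%"
```

Only about 3 of every 100 surveys came back, and the women who chose to respond may differ from those who did not.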
Surveys
Surveys are popular ways to gauge public
opinion.
1. Select a sample of people to represent a
larger population.
2. Ask the individuals in the sample some
questions and record their responses.
3. Use sample results to draw some
conclusions about the population.
* Getting valid survey results is not easy!
Observational Study
VS Experiment
• In an observational study, we observe
individuals and measure variables of
interest but do not attempt to influence the
responses.
• In an experiment, we deliberately do
something to individuals in order to
observe their responses.
Different Studies, Different Reasons
Surveys are usually intended to tell us something
about the population the survey was drawn from.
Experiments/Observational studies are usually
intended to compare two or more groups (e.g.,
does this new pill reduce stress?).
In short, our goal with surveys is to generalize.
Our goal with experiments/observational studies is to
compare.
Estrogen and Heart Attacks
• Read Example P.4 on page 10 and be
prepared to discuss.
Effect of Change
An observational study, even one based on
a statistical sample, is a poor way to gauge
the effect of a change. To see the
response to a change, we must actually
impose the change. When our goal is to
understand cause and effect, experiments
are the best source of convincing data.
Examples
Question: Does drinking at least five carbonated sodas
a week improve a student’s GPA?
Observation: Compare GPAs of a sample of students
who drink at least five sodas a week with those
who drink fewer.
Experiment: From a random group of students, require
some to drink at least five sodas per week and
require the rest to drink fewer. After a couple of years,
compare their GPAs.
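To see why the observation and the experiment can disagree, here is a small simulation sketch in Python. Everything in it is invented for illustration: a hidden habit (studying late) that raises both soda drinking and GPA, the GPA values themselves, the assumption that soda has no effect on GPA, and the choice to assign soda by coin flip in the experiment.

```python
import random

def mean(values):
    return sum(values) / len(values)

def gpa_gap(students):
    """Mean GPA of soda drinkers minus mean GPA of non-drinkers."""
    drinkers = [gpa for drinks_soda, gpa in students if drinks_soda]
    others = [gpa for drinks_soda, gpa in students if not drinks_soda]
    return mean(drinkers) - mean(others)

def simulate(n, randomized, rng):
    students = []
    for _ in range(n):
        studies_late = rng.random() < 0.5       # hidden (lurking) variable
        gpa = 3.4 if studies_late else 2.8      # assumption: soda has NO effect
        if randomized:
            # Experiment: a coin flip decides who drinks soda.
            drinks_soda = rng.random() < 0.5
        else:
            # Observation: late studiers choose soda more often.
            drinks_soda = rng.random() < (0.8 if studies_late else 0.2)
        students.append((drinks_soda, gpa))
    return students

rng = random.Random(2)  # fixed seed so the run is repeatable
observed_gap = gpa_gap(simulate(10_000, randomized=False, rng=rng))
experiment_gap = gpa_gap(simulate(10_000, randomized=True, rng=rng))
print(f"Observational gap: {observed_gap:.2f}")
print(f"Experimental gap:  {experiment_gap:.2f}")
```

Under these assumptions the observational comparison shows soda drinkers ahead by roughly a third of a grade point, while the randomized comparison shows a gap close to zero. The difference comes entirely from the hidden habit, not from the soda.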
Data Analysis
Making Sense of Data
Data Analysis
• Statistical tools and ideas can help you
examine data in order to describe their main
features. This examination is called
exploratory data analysis.
1. Begin by examining each variable by itself.
Then move on to study relationships among
the variables.
2. Begin with graphs. Then add numerical
summaries of specific aspects of the data.
Data Analysis
• Data analysis is the act of transforming
data with the aim of extracting useful
information and facilitating conclusions.
W5HW
• Who
• What
• Why
• When
• Where
• How
• By Whom*
• Studies are often produced by people with an
agenda that creeps into their interpretation of the
data they generate.
Population and Sample
• In statistics, we use the term population to refer to
the entire group of people or objects about which
information is desired. A study that examines data on
the entire population is called a census. However,
conducting a census is rarely feasible. A sample is a
(typically small) part of the population. If the sample is
selected carefully, so that it is representative of the
population, we still gain very useful information about
the population. The number of observational units
studied in a sample is the sample size. The essential
idea of a sample is to learn about the whole by
studying a part.
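The terms census, sample, and sample size can be made concrete with a short Python sketch. The population here is hypothetical (5,000 simulated student heights in centimeters), and the sample size of 100 is an arbitrary choice for illustration.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 5,000 students.
population = [rng.gauss(170, 10) for _ in range(5_000)]

def mean(values):
    return sum(values) / len(values)

# Census: every individual in the population is measured.
census_mean = mean(population)

# Sample: a small, randomly chosen part of the population.
sample_size = 100
sample = rng.sample(population, sample_size)
sample_mean = mean(sample)

print(f"Census mean:            {census_mean:.1f}")
print(f"Sample mean (n = {sample_size}):  {sample_mean:.1f}")
```

With a carefully drawn random sample, the sample mean usually lands close to the census mean while measuring only 100 of the 5,000 individuals.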
Individuals and Variables
• Individuals (observational units) are the
objects described by a set of data.
Individuals may be people, but they may
also be animals or things.
• A variable is any characteristic of an
individual. A variable can take different
values for different individuals. (A
characteristic that changes from individual
to individual.)
Categorical and Quantitative
• A categorical variable (qualitative)
places an individual into one of several
groups or categories.
• A quantitative variable takes numerical
values for which arithmetic operations
such as adding and averaging make
sense.
Distribution
• The distribution of a variable tells us
what values the variable takes and how
often it takes these values.
• The pattern of variation of a variable is its
distribution.
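A distribution can be tabulated by counting how often each value occurs. The sketch below uses Python's collections.Counter on a small hypothetical data set (number of siblings reported by 12 students); the data are invented for illustration.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 12 students.
siblings = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 4, 1]

# The distribution: each value the variable takes and how often it occurs.
distribution = Counter(siblings)

for value in sorted(distribution):
    count = distribution[value]
    print(f"{value}: {'*' * count}  ({count} of {len(siblings)})")
```

Each printed row lists a value, one star per student with that value, and the count, so the output doubles as a rough text dotplot.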
Describing Variables
Categorical (Qualitative)
• Bar Graphs
• Side-by-Side Bar Graphs
• Pie Charts
Quantitative (Numerical)
• Dotplots
• Stemplots
• Histograms
• Ogives
Dotplots
• Example P.7 on page 16
Exploring Relationships
Between Variables
• Variables rarely exist in isolation, and
one of the many uses of data analysis is to
uncover how a change in one or more
variables influences change in another
variable. Ultimately, we want to uncover
cause-and-effect relationships, but that
comes after we gain an understanding of
how quality data are collected and used in
inference. Many relationships between two
variables are influenced by other variables
lurking in the background.
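A lurking variable can even reverse a comparison. The Python sketch below uses invented flight counts for two hypothetical airlines and two hypothetical cities; none of the numbers come from the text.

```python
# Hypothetical counts: (on-time flights, total flights) per airline and city.
flights = {
    "Airline A": {"Sunny City": (90, 100), "Foggy City": (700, 1000)},
    "Airline B": {"Sunny City": (850, 1000), "Foggy City": (60, 100)},
}

def on_time_rate(on_time, total):
    return on_time / total

def overall_rate(by_city):
    """On-time rate with all cities pooled together."""
    on_time = sum(o for o, _ in by_city.values())
    total = sum(t for _, t in by_city.values())
    return on_time / total

for airline, by_city in flights.items():
    per_city = ", ".join(
        f"{city} {on_time_rate(*counts):.0%}" for city, counts in by_city.items()
    )
    print(f"{airline}: {per_city}, overall {overall_rate(by_city):.0%}")
```

With these made-up counts, Airline A is on time more often in each city (90% vs. 85% and 70% vs. 60%), yet Airline B looks better overall (83% vs. 72%). The city is the lurking variable: Airline A flies mostly out of the city where delays are common.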
Exploring Relationships
Between Variables
• Example P.8 – On-time Flights
• Read and prepare to discuss.
Probability
What are the chances?
Probability
• The study of random variables, which
includes the study of probability, provides
the mathematical basis through which we
can use results from data to make
inferences about populations.
• Coin Toss
• Chance behavior is unpredictable in the
short run but has a regular and predictable
pattern in the long run.
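The coin toss idea can be tried out with a short simulation. This Python sketch assumes a fair coin and a fixed random seed (both choices made here for illustration) and tracks the proportion of heads as the number of tosses grows.

```python
import random

rng = random.Random(4)  # fixed seed so the run is repeatable

heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
proportions = {}

for toss in range(1, 100_001):
    heads += rng.random() < 0.5   # a head counts as 1, a tail as 0
    if toss in checkpoints:
        proportions[toss] = heads / toss
        print(f"After {toss:>7,} tosses: proportion of heads = {proportions[toss]:.3f}")
```

Early proportions can stray well away from 0.5, but as the number of tosses grows the proportion settles close to 0.5.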
Statistical Inference
Drawing Conclusions from Data
Inference
• Read Statistical Inference: Drawing
Conclusions from Data on page 23.
• Read example P.11 on page 24.
• Be prepared to discuss both articles.
Homework
Can magnets help reduce pain?