Transcript Data

Statistics Class 2
1/25/2012
Intro to Statistics
Data are collections of observations (such as measurements,
genders, survey responses).
Statistics is the science of planning studies and experiments,
obtaining data, and then organizing, summarizing,
presenting, analyzing, interpreting, and drawing conclusions
based on the data.
A Population is the complete collection of all individuals
(scores, people, measurements, and so on) to be studied.
The collection is complete in the sense that it includes all of
the individuals to be studied.
A Census is the collection of data from every member of the
population.
A Sample is a sub-collection of members selected from a
population.
Statistical Thinking
When conducting statistical
analysis it is important to
consider:
• Context of the data
• Source of the data
• Sampling method
• Conclusions
• Practical implications
Context of Data
When examining data it is important to consider the context of
the data. Consider the statement below by John Allen
Paulos.
The problem isn’t with statistical tests themselves but with what
we do before and after we run them. First, we count if we can, but
counting depends a great deal on previous assumptions about
categorization. Consider, for example, the number of homeless
people in Philadelphia, or the number of battered women in
Atlanta, or the number of suicides in Denver. Is someone
homeless if he’s unemployed and living with his brother’s family
temporarily? Do we require that a woman self-identify as battered
to count her as such? If a person starts drinking day in and day
out after a cancer diagnosis and dies from acute cirrhosis, did he
kill himself?
Source of Data
Consider this news clip from the UK-based paper The Independent:
Tobacco companies are funding research into infertility in a bid to
counter widespread evidence that smoking drastically undermines
the chances of conceiving.
Philip Morris, one of the world's largest cigarette firms, is being
accused by the anti-smoking lobby of attempting to deceive
smokers into believing they can improve their chances of having
children if they take vitamin supplements.
Sampling Method
The way sample data are collected for a study can have a great
influence on the result of the study.
• Literary Digest poll, page 3 of the text. (Demographics)
• Question: Do you think the site Rate my Professor has
accurate data or results? (Self-selection)
Conclusions
It is important to state conclusions carefully, to avoid claiming
more than your results justify.
Practical Implications
Sometimes the statistical significance of a study can differ from
its practical significance.
Example 6 in your book cites the results of a study of the Atkins
weight loss program. The mean weight loss of the program was
2.1 pounds after 1 year. Even if such a loss is statistically
significant, it is small enough that it may not matter in practice.
Statistical Significance
Ex 7: Apparently, using MicroSort, 13 out of 14 couples that
wanted to have girls had girls. By chance alone there is about a
1 in 1000 chance of that happening.
Ex 8: What if instead of 13 out of 14, only 8 out of 14 couples
had a baby girl?
The results from Example 7 would be statistically significant,
while those of Example 8 would not.
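The "1 in 1000" figure in Example 7 can be checked with a short calculation. This is a minimal sketch, assuming each birth is independent and that a girl and a boy are equally likely without any treatment (p = 0.5); the slides do not say how the figure was obtained, and the function name `prob_at_least` is made up for this illustration.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumption: without treatment, each baby is a girl with probability 0.5.
p_13_of_14 = prob_at_least(13, 14)  # Example 7
p_8_of_14 = prob_at_least(8, 14)    # Example 8

print(f"13 or more girls out of 14: {p_13_of_14:.5f}")  # 0.00092
print(f"8 or more girls out of 14:  {p_8_of_14:.3f}")   # 0.395
```

Under these assumptions, 13 or more girls out of 14 happens by chance roughly 1 time in 1,000, while 8 or more girls out of 14 happens by chance about 40% of the time, which is why only Example 7 would count as statistically significant.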
Types of Data
A Parameter is a numerical measurement describing some
characteristic of a population.
A Statistic is a numerical measurement describing some
characteristic of a sample.
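The difference between a parameter and a statistic can be sketched in a few lines. The population below is invented purely for illustration (1,000 simulated heights); nothing about it comes from the text.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (in inches) of 1,000 people.
population = [random.gauss(67, 4) for _ in range(1000)]

# Parameter: computed from every member of the population (a census).
population_mean = mean(population)

# Statistic: computed from a sample of 50 members of the population.
sample = random.sample(population, 50)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two numbers are usually close but not equal. The parameter is fixed for a given population, while the statistic changes from sample to sample.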
Quantitative (or numerical) data consist of numbers
representing counts or measurements.
Categorical (or qualitative or attribute) data consist of
names or labels that are not numbers representing counts or
measurements.
Discrete data result when the number of possible values is
either a finite number or a "countable" number.
Continuous (numerical) data result from infinitely many
possible values that correspond to some continuous scale
that covers a range of values without gaps, interruptions, or
jumps.
The nominal level of measurement is characterized by data
that consist of names, labels, or categories only. The data
cannot be arranged in an ordering scheme (such as low to
high).
Data are at the ordinal level of measurement if they can be
arranged in some order, but differences (obtained by
subtraction) between data values either cannot be
determined or are meaningless.
The interval level of measurement is like the ordinal level,
with the additional property that the difference between any two
data values is meaningful. However, data at this level do not
have a natural zero starting point (where none of the quantity
is present).
The ratio level of measurement is the interval level with the
additional property that there is also a natural zero starting
point (where zero indicates none of the quantity is present).
For values at this level, differences and ratios are both
meaningful.
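One way to see why ratios are meaningless at the interval level is to change units. The sketch below uses made-up temperatures and distances. A ratio of temperatures changes when we switch from Fahrenheit to Celsius, but a ratio of distances stays the same when we switch from miles to kilometers.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Interval level: 0 degrees does not mean "no heat", so ratios depend on the scale.
f_low, f_high = 40.0, 80.0
c_low, c_high = fahrenheit_to_celsius(f_low), fahrenheit_to_celsius(f_high)
ratio_f = f_high / f_low  # 2.0 in Fahrenheit
ratio_c = c_high / c_low  # about 6.0 in Celsius, for the same two temperatures

# Ratio level: 0 miles means "no distance", so ratios survive a change of units.
KM_PER_MILE = 1.609344
d_short, d_long = 5.0, 10.0
ratio_miles = d_long / d_short
ratio_km = (d_long * KM_PER_MILE) / (d_short * KM_PER_MILE)

print(f"temperature ratio: {ratio_f:.1f} (F) vs {ratio_c:.1f} (C)")
print(f"distance ratio:    {ratio_miles:.1f} (miles) vs {ratio_km:.1f} (km)")
```

Saying that 80 degrees is "twice as hot" as 40 degrees is therefore not meaningful, while saying that 10 miles is twice as far as 5 miles is.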
Levels of Measurement
Ratio:
There is a natural zero starting point and ratios are meaningful.
ex: Distances
Interval:
Differences are meaningful, but there is no natural zero starting
point and ratios are meaningless.
ex: Body temperatures in degrees Fahrenheit
Ordinal:
Categories are ordered, but differences can't be found or are
meaningless.
ex: Ranks of colleges in U.S. News and World Report
Nominal:
Categories only. Data cannot be arranged in an ordering scheme.
ex: Eye colors
Quiz 1
1. Explain the difference between a Census and a
Sample.
2. Explain the difference between numerical and
Categorical data.
3. Give an example of data that is at the nominal
level of measurement.
Quiz 1 Solutions
1. Explain the difference between a Census and a
Sample. A census is a collection of data from the
entire population, while a sample is a collection of
data from a subset of the population.
2. Explain the difference between numerical and
Categorical data. Numerical data consist of
numbers representing counts or measurements.
Categorical data consist of names, labels, or categories.
3. Give an example of data that is at the nominal
level of measurement. Political affiliation.
Critical Thinking
We must think carefully about the context of the data, the source
of the data, the method used in data collection, the
conclusions reached, and the practical implications.
Fun Quotes
• "There are three kinds of lies: lies, damned lies,
and statistics" - Benjamin Disraeli
• "Figures don't lie; liars figure." - Mark Twain
• "There are two kinds of statistics, the kind you
look up, and the kind you make up." - Rex Stout
Bad Statistics
Bad statistics happen either by evil intent or by unintentional
errors. How to Lie with Statistics, a book written by Darrell
Huff in 1954, is the classic text on this topic and has many
examples of intentional or unintentional misuses of statistics.
Misuse of graphs is a common way to misrepresent data or
results.
Bad samples may result in incorrect findings as well. Bad
samples occur when the methods used to collect the data
result in a biased sample, so that the sample does not
represent the population from which it was obtained.
A voluntary response sample (or self-selected sample) is
one in which the respondents themselves decide whether to
be included.
Ex: Any poll or survey where the readers or listeners decide
whether to participate.
Correlation and Causality
Another way to misinterpret statistical data is to find a statistical
association between two variables and to conclude that one
of the variables caused the other. The association is called a
correlation. Only when one of the variables actually causes a
change in the other do we have causality.
Reported results
When collecting data from people, it is better to take the
measurements yourself rather than rely on subjects to report
results.
Ex. 3 Voting Behavior: When surveyed about whether they had
voted, about 70% of 1000 eligible voters reported that
they had voted. Voting records showed that only 61% had
indeed voted.
Small Samples
Conclusions should not be drawn from samples that are far too
small.
Ex. 4 The Children's Defense Fund published an article, Children Out
of School in America, in which it was reported that in a certain
school district 67% of the students were suspended at least
three times. The figure is based on a sample of only 3 students.
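A quick way to see the problem with Ex. 4 is to list every percentage that a sample of 3 students can produce. This is only an illustration of the arithmetic.

```python
n = 3  # sample size from Ex. 4

# Every percentage a sample of n students can produce, rounded to a whole percent.
possible_percentages = [round(100 * k / n) for k in range(n + 1)]

print(possible_percentages)  # [0, 33, 67, 100]
```

With only 3 students, "67%" simply means 2 of the 3 students, and one more or one fewer student would move the figure to 100% or 33%.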
Percentages
Percentages can be cited in a manner that is either unclear or
misleading. The fact is that 100% of something is all of it.
So if you see percentages above 100% being cited, they are
probably not justified.
Other Sampling Considerations
• Wording of a question:
97% yes: "Should the President have the line item veto to
eliminate waste?"
57% yes: "Should the President have the line item veto, or not?"
• Order of questions
• Nonresponse
• Missing data: phone surveys miss people without phones
• Self-interest study: Kiwi Shoe Polish, getting a job
• Precise numbers: you cannot assume precise numbers are
accurate
• Deliberate distortions: Avis vs. Hertz
Collecting Sample Data
If sample data are not collected in an appropriate way, the data
may be so completely useless that no amount of statistical
torturing can salvage them.
In an Observational Study, we observe and measure specific
characteristics, but we don't attempt to modify the subjects being
studied.
In an experiment, we apply some treatment and then proceed
to observe its effects on the subjects. (Subjects in
experiments are called experimental units.)
Types of Samples
A Simple random sample of n subjects is selected in such a
way that every possible sample of the same size n has the
same chance of being chosen.
In a random sample, members from the population are
selected in such a way that each individual member in the
population has an equal chance of being selected.
A probability sample involves selecting members from a
population in such a way that each member of the population
has a known (but not necessarily the same) chance of being
selected.
Types of Samples Example
Each of the 50 states sends two senators to Congress, so
there are exactly 100 senators. Suppose that we write the
name of each state on a separate index card, then mix the 50
cards in a bowl, and then select one card. If we consider the
two senators from the selected state to be sampled, is this
result a random sample? Yes, since each individual senator
has an equal chance of being picked.
Simple random sample? No, not all samples of size two have
the same chance of being picked (a sample of senators
from different states cannot be picked at all).
Probability sample? Yes, since each senator has a known
chance of being selected.
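The three answers in the senator example can be checked by counting. This is a minimal sketch in which senators are labeled by a made-up (state number, seat number) pair instead of by name.

```python
from math import comb

# Hypothetical labels: (state, seat) for 50 states with 2 seats each.
senators = [(state, seat) for state in range(50) for seat in (0, 1)]

# Drawing one state card gives exactly one of these 50 equally likely samples.
reachable_samples = [((state, 0), (state, 1)) for state in range(50)]

# Random sample / probability sample: the chance that each senator is selected.
chance_per_senator = {
    s: sum(s in sample for sample in reachable_samples) / len(reachable_samples)
    for s in senators
}
distinct_chances = set(chance_per_senator.values())

# Simple random sample: every pair of senators would need the same chance.
all_possible_pairs = comb(len(senators), 2)

print(distinct_chances)        # {0.02}
print(all_possible_pairs)      # 4950
print(len(reachable_samples))  # 50
```

Every senator has the same known chance (1 in 50), so this is both a random sample and a probability sample. But only 50 of the 4,950 possible pairs can ever be drawn, so it is not a simple random sample.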
Other Sampling Methods
In Systematic sampling, we select some starting point and then
select every kth element in the population.
With Convenience sampling, we simply use the results that are very
easy to get.
With Stratified sampling, we subdivide the population into at least
two different subgroups (or strata) so that subjects within the same
subgroup share the same characteristics (such as gender or age
bracket), then we draw a sample from each subgroup (or stratum).
In Cluster sampling, we first divide the population area into sections
(or clusters), then randomly select some of those clusters, and
then choose all the members from those selected clusters.
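The systematic, stratified, and cluster methods above can be sketched in code. Everything here is hypothetical: the population of 20 people, the "bracket" and "block" fields, and the function names are invented only to show the mechanics.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 people, each with an age bracket and a city block.
population = [
    {"id": i, "bracket": "under 30" if i % 2 == 0 else "30 and over", "block": i // 5}
    for i in range(20)
]

def systematic_sample(items, k):
    """Pick a random starting point among the first k items, then take every kth item."""
    start = random.randrange(k)
    return items[start::k]

def stratified_sample(items, key, per_stratum):
    """Split items into strata by key, then draw a random sample from each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    return [pick for group in strata.values() for pick in random.sample(group, per_stratum)]

def cluster_sample(items, key, n_clusters):
    """Randomly choose whole clusters, then keep every member of the chosen clusters."""
    clusters = sorted({item[key] for item in items})
    chosen = set(random.sample(clusters, n_clusters))
    return [item for item in items if item[key] in chosen]

systematic = systematic_sample(population, k=4)
stratified = stratified_sample(population, key="bracket", per_stratum=3)
clustered = cluster_sample(population, key="block", n_clusters=2)

print([p["id"] for p in systematic])
print([p["id"] for p in stratified])
print([p["id"] for p in clustered])
```

Note the difference between the last two: stratified sampling takes some members from every subgroup, while cluster sampling takes all members from some subgroups.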
Multistage sampling occurs when pollsters collect data using
a combination of the basic sampling methods. In a
multistage sample design, pollsters select a sample in
different stages, and each stage might use different methods
of sampling.
Homework
1-2: 1-14, 15-25 odd.
1-3: 1-4, 5, 7, 11-17 odd, 21-33 odd.
1-4: 1-4, 8-20, 23, 25
1-5: 1-4, 5-26
Say Hello via a comment on my website
Extra Credit Coupons for articles citing devious uses of
statistics.
Remember the odds are in the back :)