Communicating Quantitative Information

Diagrams
Sampling issue
Risk & Communicating Risk
Smoking. Hormone Replacement Therapy
Dimension
Homework: Prepare/Design diagram/chart. Postings
Special Survey
• Will come back to topic of Sampling
• Accuracy (confidence, error) of tests, surveys depends
on
– quality of sample
– size of sample
• My sister asked me: can [young people] identify
Einstein?
– Use my students / sample of my students
– Spring 2008:
• students responding to request to take survey in both my
classes
• Last Spring: quantity was pretty small (14 + 11)
Preview
• Margin of error:
– Claim actual result (for whole population) is within
certain limits (answer plus/minus MoE)
• Confidence
– confidence that this particular sample is not so
unusual as to make the results wrong, where wrong
means the actual result is outside the margin of error.
– Generally, the means (averages) of samples are
distributed normally around the mean of the whole
population and the SD of this distribution is smaller
(tighter) than the SD of the distribution for this quantity
for the whole population.
Thought experiment….
• Want to get average height of people in the
class.
• Claim: impossible to measure everyone, so use
a sample.
• Only time to measure 6 people
62 72 68 66 69 71 for a sample mean of 68
• Statistics say (IF the sample is random) then we
can be 95% confident that the mean of the
whole class is between
61.4 and 74.6
– will spend some more time on this
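A minimal sketch of this calculation for the six heights, assuming a simple random sample and a Student t interval for the mean. The choice of method and the hard-coded critical value 2.571 (two-sided 95%, 5 degrees of freedom, from a standard t table) are assumptions, not taken from the slide. Under these assumptions the interval for the class mean comes out near 64.2 to 71.8. The wider 61.4 to 74.6 quoted above matches the sample mean plus or minus two standard deviations of the individual heights.

```python
import math
import statistics

heights = [62, 72, 68, 66, 69, 71]  # inches, from the slide

n = len(heights)
mean = statistics.mean(heights)        # sample mean
s = statistics.stdev(heights)          # sample standard deviation (divides by n - 1)
standard_error = s / math.sqrt(n)      # standard deviation of the sample mean

# Assumption: two-sided 95% critical value of Student's t with n - 1 = 5
# degrees of freedom, hard-coded from a standard t table.
T_CRIT_95_DF5 = 2.571

margin = T_CRIT_95_DF5 * standard_error
ci_low, ci_high = mean - margin, mean + margin

# For comparison: mean +/- 2 standard deviations of the individual heights
# (pstdev divides by n).
spread = 2 * statistics.pstdev(heights)
wide_low, wide_high = mean - spread, mean + spread

print(f"mean = {mean:.1f}")
print(f"95% interval for the mean: {ci_low:.1f} to {ci_high:.1f}")
print(f"mean +/- 2 SD of heights:  {wide_low:.1f} to {wide_high:.1f}")
```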
Thought experiment
• Want to know favorable rating of the President from
whole USA population.
• Ask a sample
• p (proportion) of sample view President favorably.
• Want to make a statement with a stated level of confidence (95%, 99%)
• Formula for the margin of error, call it E: we can be 95%
sure that the proportion of the whole population
favorable is within p-E and p+E
• If we want to be 99% sure, then formula will give a
bigger margin of error, call this F: we can be 99% sure
that the proportion of the whole population favorable is
within p-F and p+F.
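A hedged sketch of how E and F could be computed, assuming the usual normal-approximation formula for a proportion, margin = z * sqrt(p*(1-p)/N). The poll numbers (45% favorable, 1,000 people asked) and the function name margin_of_error are made up for illustration and are not from the slides.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Normal-approximation margin of error for a sample proportion."""
    # Two-sided: split the leftover probability between the two tails.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll, not from the slides: 45% favorable among 1,000 people.
p, n = 0.45, 1000

E = margin_of_error(p, n, 0.95)   # margin at 95% confidence
F = margin_of_error(p, n, 0.99)   # margin at 99% confidence

print(f"95% sure: between {p - E:.3f} and {p + E:.3f}")
print(f"99% sure: between {p - F:.3f} and {p + F:.3f}")
```

The higher confidence level uses a larger z value, so F is always bigger than E for the same sample.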
Another way
• Random sample of size N means that each
person in the whole population equally likely to
be in the sample
• There are many samples of size N
• The results of a sample of size N vary, but
– Some samples of size N are very different from the
whole population, but most aren’t
• In most cases, the sample result will be close to
the result for the whole population
– What do I mean by close? Within the margin of error
– What do I mean by most? This refers to the
confidence level (19 out of 20, 99 out of 100)
Formally…
• The averages of samples of size N are normally
distributed
• The average (mean) is the mean of the whole
population
• The standard deviation is smaller by a factor of
square root of N
– Think of narrow mountain
– To halve (reduce to ½ what it was) the required margin
of error, you need to quadruple (*4) the sample size
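The square-root rule can be checked with a small simulation. This is a sketch under made-up assumptions: a hypothetical population of 100,000 heights with mean 68 and standard deviation 4, a fixed random seed, and 2,000 random samples at each size. If the rule holds, going from N = 25 to N = 100 should cut the spread of the sample means roughly in half.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population (made-up values): 100,000 heights, mean 68, SD 4.
population = [rng.gauss(68, 4) for _ in range(100_000)]

def sd_of_sample_means(sample_size, trials=2000):
    """Draw many random samples and return the SD of their means."""
    means = [statistics.fmean(rng.sample(population, sample_size))
             for _ in range(trials)]
    return statistics.stdev(means)

sd_25 = sd_of_sample_means(25)     # theory suggests about 4 / sqrt(25)
sd_100 = sd_of_sample_means(100)   # theory suggests about 4 / sqrt(100)
ratio = sd_100 / sd_25

print(f"N = 25:  SD of sample means = {sd_25:.2f}")
print(f"N = 100: SD of sample means = {sd_100:.2f}")
print(f"ratio = {ratio:.2f}")
```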
Note
• The size of the whole population does not
enter into these calculations!
Quality of sample
• Does not mean: how good you are…in any way.
• Does mean: how representative of general population
– For the Einstein question, this means how representative of
'young people today'. Amend that to college students.
• The former class was practically all seniors. That class
and all since tend to be journalism, history, political
science majors…
– Students who don't have specific required math&science
courses
• These factors mean sample is not representative of
college population!
Quality of sample
• Opportunity sample
– subjects available to me in my classes. Are
they/you typical of 'young people'?
(My sister thought yes.)
• Response bias
– students who took up offer. Are they/you
more likely to 'know Einstein' than those who
didn't?
• higher level of general curiosity
• diligence at obtaining extra credit
Tester reliability
• I was generous in categorizing answers as
correct.
• Two questions considered separately
– 23 out of 26
– 24 out of 26
Reporting
Confidence at level alpha that the actual proportion is
within the margin of error of the tested proportion
More confident with a larger interval
• I am 95% confident (chances are only 1/20 that
this is wrong) that the actual proportion is at
least 83% that knows Einstein.
• I am 99.5% confident (chances are 1/200 that
this is wrong) that the actual proportion is at
least 78%
Formulas for margin of error
• Based on the finding that means of
samples are [close to] normally
distributed, with a standard deviation that is a
function of the tested proportion and the size of the sample.
• One-tailed test (just checking one side
because tested proportion close to 1)
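A sketch of a one-tailed calculation, assuming the reported bounds came from the 24 out of 26 result and the normal approximation with standard deviation sqrt(p*(1-p)/N). Both are assumptions; the slides do not spell out the formula. Under them the lower bounds come out near 83.7% and 78.8%, in line with the "at least 83%" and "at least 78%" statements. With a sample this small and a proportion this close to 1, the normal approximation is rough.

```python
import math
from statistics import NormalDist

def lower_bound(successes, n, confidence):
    """One-sided lower confidence bound for a proportion (normal approximation)."""
    p = successes / n
    z = NormalDist().inv_cdf(confidence)   # one tail only, so no halving
    return p - z * math.sqrt(p * (1 - p) / n)

lb_95 = lower_bound(24, 26, 0.95)     # chances of being wrong: 1 in 20
lb_995 = lower_bound(24, 26, 0.995)   # chances of being wrong: 1 in 200

print(f"95% confident:   at least {lb_95:.1%}")
print(f"99.5% confident: at least {lb_995:.1%}")
```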
Correlation (again)
• Two variables
– common examples
• height & weight
• mortality & set of health risks factors (e.g., smoking
history)
• Are the two correlated? Does value of one
predict [some of] value of the other?
Linear model
• Linear = line.
• X and Y (standard names for two
variables—variables, values that vary!)
• Y = a + b*X
– if a = 0, b>0
– if a>0, b>0
Note: negative values of X and/or Y may or may not be
valid…
Linear Model
• a>0, b<0
• (This will be basis of negative correlation.
Still a relationship, but in the negative. As
X gets big, Y gets small.)
Cab fare
• (Numbers are not right, but the idea is)
• $3 to get in
• $2 every ¼ of a mile
• Y is the fare/total cost (not including tip!)
and X is distance, given in miles rounded
up to the nearest quarter mile.
• Fare = 3 + 2*(miles * 4)
• Example: rode ½ a mile. Fare is 3 + 2*2 =
7
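The fare rule can be written as a small function. A minimal sketch: the names cab_fare, BASE_FARE and PER_QUARTER are made up for illustration, and the rounding up to the next quarter mile follows the description of X.

```python
import math

BASE_FARE = 3      # dollars to get in
PER_QUARTER = 2    # dollars for every quarter of a mile

def cab_fare(miles):
    """Fare in dollars, with distance rounded up to the next quarter mile."""
    quarters = math.ceil(miles * 4)   # units matter: miles -> quarter miles
    return BASE_FARE + PER_QUARTER * quarters

print(cab_fare(0.5))   # 3 + 2*2 = 7, the example from the slide
```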
(rough) graph of cab fare
Points
(0,3), (1,11)
Aside
• Units: miles versus quarter miles, miles
versus feet versus kilometers versus …
need to be understood. Some
stories/calculations/experiments succeed
or fail based on getting the units right!
– space flight that failed due to
misunderstanding/lack of agreement on units.
Correlation
• Two variables, X and Y.
• Make a graph (computer program does
not make a graph—you think about a
graph)
• Process: determine line that would be the
best fit
– defined as minimizing sum of the squares of
the vertical distances of the points from the line
('least squares')
Excel example
• List two sets of numbers:
    B     C
    23    40
    12    25
    56    90
    27    54
    18    40
    44    79
    38    80
• Graph using scatter plot
• Use =correl(B2:B8,C2:C8)
  Result: .96927
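The same correlation can be computed outside Excel. This sketch assumes the numbers on the slide pair up in reading order, (23, 40), (12, 25), and so on. With that pairing the result rounds to the .96927 shown. It also computes the least-squares line Y = a + b*X; the variable names are chosen for illustration.

```python
import math

x = [23, 12, 56, 27, 18, 44, 38]   # first column (B2:B8)
y = [40, 25, 90, 54, 40, 79, 80]   # second column (C2:C8)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squared deviations and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation, as =correl(...) reports
b = sxy / sxx                    # least-squares slope
a = mean_y - b * mean_x          # least-squares intercept

print(f"r = {r:.5f}")
print(f"best-fit line: Y = {a:.2f} + {b:.2f}*X")
```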
Scatter plot example
[Scatter plot: vertical axis 0 to 100, horizontal axis 0 to 60]
Other models
• … other relationship: quadratic, log,
exponential, etc.
• Say you know deer population at two points in
time. Is/will the growth be linear or exponential?
[Graph: Pop. versus Time]
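Two points alone cannot settle the question, because both a line and an exponential curve pass through them exactly. A sketch with made-up counts (100 deer in year 0, 150 deer in year 5; these numbers are hypothetical, not from the slides) shows how far apart the two models drift later on.

```python
# Hypothetical counts, not from the slides: 100 deer in year 0, 150 in year 5.
t0, pop0 = 0, 100
t1, pop1 = 5, 150

def linear(t):
    """Straight line through the two points: same number added each year."""
    slope = (pop1 - pop0) / (t1 - t0)
    return pop0 + slope * (t - t0)

def exponential(t):
    """Exponential curve through the two points: same percentage added each year."""
    yearly_factor = (pop1 / pop0) ** (1 / (t1 - t0))
    return pop0 * yearly_factor ** (t - t0)

# Both models agree at years 0 and 5, then separate.
for year in (0, 5, 10, 20):
    print(year, round(linear(year)), round(exponential(year)))
```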
Caution
• Correlation is not cause
– coincidental
– both caused by other factor
• Cause is not….absolute determination.
– other factors
Terminology (reprise)
• False positive: wrongly say
someone/something has condition.
• False negative: wrongly say
someone/something does NOT have
condition when, in fact, he or she or it does
• Control group: group in experiment that
does not have treatment.
• treatment/condition group: group in
experiment that does have treatment
Double-blind study
• Randomly assign subjects to
– treatment
– control (may give placebo)
• Subject does not know which….
• Tester/evaluator does not know which…
• See what happens. Time period may be
long.
• Smoking cannot be studied using a
double-blind study!
Retrospective study
• Of the people who did/have X, ask how
many did Y?
• Not as reliable.
• Also need to study group that do not have
X.
• 85% of people with lung cancer report that
they smoke[d].
• (How many George Burns are there?)
Smoking and Lung Cancer
• Strong correlation
– more smoking increases chances of lung cancer
– smoking comes before the cancer
– many different studies
• Women's incidence of lung cancer went up
when women started smoking
• Incidence going down in groups decreasing
smoking
• Biological evidence
– nicotine experiments with animals
– lab study of lungs, blood, blood pressure, etc.
What's likely to kill you
http://www.reason.com/blog/show/128501.html
Small multiples
• Several (many) graphs/diagrams of the
same format
Homework
• Identify complex topic (such as health risks, sports
records, voting)
– multiple dimensions/factors; multiple categories; timeline?,
geography?
• Find reputable source (more than one source even
better)
• Determine critical findings
– determine audience
• Design/build diagram (chart, graph, picture)
– Bring to class to present AND to turn in. Be professional!
– as appropriate, consider examples shown in class:
• using 'small multiple' idea as done for 31 days
• spreadsheet, but with pictures & words (tax cuts)
– as appropriate, consider charts presented on health risks
– This could be topic for your project I paper + charts
– DUE in 1 week.
• Continue postings.