Why Statistics are Scary


“While nothing is more uncertain than a single
life, nothing is more certain than the average
duration of a thousand lives”
-Elizur Wright
By: Gail Larsen MS4 2011
• Part of the statistical analysis for my MPH thesis project
• Implantable cardioverter-defibrillators (ICDs) decrease mortality in appropriately selected patients
• However, ICD shocks have been associated with an increased risk of death
• Are shocks detrimental per se, or are shocks a marker of a sicker patient population?
• Two examples:
  • Research Question #1: Do patients who receive shocks do worse than patients who do not receive shocks?
  • Research Question #2: Do patients who receive only ATP (antitachycardia pacing) do worse than patients who receive shocks or patients who receive no therapy?
• ATP has not been found to increase mortality
**Note: the following survival curves are unadjusted comparisons and do not control for covariates. Covariates were, of course, controlled for in the adjusted analysis.
• Results seem to indicate that shocks are protective?
• That's fine, but this is the complete opposite of the expected result
• Let's take a second look at the analysis
• Study endpoint = time to death or last follow-up
  • Follow-up time = status date – implant date
• Each patient = one observation
  • Shock yes/no
  • Dead yes/no
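A minimal sketch of this one-row-per-patient setup, with follow-up time measured from implant to death or last follow-up. The class name, field names, and dates are hypothetical and are not taken from the thesis data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    follow_up_days: int  # implant date to death or last follow-up
    shock: bool          # ever shocked during follow-up (yes/no)
    dead: bool           # dead at the status date (yes/no)


def make_observation(implant_date, status_date, shock, dead):
    """Build the single record a patient contributes in the initial analysis."""
    follow_up_days = (status_date - implant_date).days
    if follow_up_days < 0:
        raise ValueError("status date precedes implant date")
    return Observation(follow_up_days, shock, dead)


# Hypothetical patient: implanted in March 2005, alive at last follow-up in March 2007.
obs = make_observation(date(2005, 3, 1), date(2007, 3, 1), shock=True, dead=False)
print(obs)
```

Note that `shock` here is a single yes/no flag for the whole follow-up period, so it carries no information about when the first shock occurred.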
• Shock should be modeled as a time-dependent covariate. In other words, a person is not at risk from shocks until they have had their first shock.
• Risk changes after the occurrence of the first shock
**Note: this is the way shock was modeled in other studies, so prior analyses were appropriate.
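A rough sketch of what treating shock as a time-dependent covariate means for the data layout: each shocked patient's follow-up is split at the first shock, so the time before it counts as unshocked time. The cohort below is entirely made up, and crude deaths per person-year stand in for the survival model actually used in the thesis. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    follow_up: float                     # years from implant to death or last follow-up
    died: bool
    first_shock: Optional[float] = None  # years from implant to first shock, if any


def split_intervals(p):
    """Return (start, stop, shocked, died_at_stop) rows for one patient.

    Assumes 0 < first_shock < follow_up when a shock occurred.
    """
    if p.first_shock is None:
        return [(0.0, p.follow_up, False, p.died)]
    return [
        (0.0, p.first_shock, False, False),          # at risk, not yet shocked
        (p.first_shock, p.follow_up, True, p.died),  # at risk after first shock
    ]


def death_rates(patients, time_dependent):
    """Deaths per person-year in the no-shock (False) and shock (True) groups."""
    time = {False: 0.0, True: 0.0}
    deaths = {False: 0, True: 0}
    for p in patients:
        if time_dependent:
            rows = split_intervals(p)
        else:
            # Naive setup: one row per patient, "ever shocked" applied from implant.
            rows = [(0.0, p.follow_up, p.first_shock is not None, p.died)]
        for start, stop, shocked, died in rows:
            time[shocked] += stop - start
            deaths[shocked] += int(died)
    return {group: deaths[group] / time[group] for group in time}


# Made-up cohort: shocked patients survived for years before their first shock.
cohort = [
    Patient(5.0, False),
    Patient(1.0, True),
    Patient(2.0, True),
    Patient(6.0, True, first_shock=5.0),
    Patient(7.0, True, first_shock=6.0),
    Patient(8.0, False, first_shock=4.0),
]

naive = death_rates(cohort, time_dependent=False)
split = death_rates(cohort, time_dependent=True)
print("naive:", naive)
print("time-dependent:", split)
```

In this toy cohort the naive layout credits the shock group with all the years its patients lived before their first shock, which makes shocks look protective. Splitting the follow-up reverses the comparison.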
• Results now consistent with what has been previously found
• Patients receiving shocks do worse than patients receiving no shocks
**Note…ATP only = 2 = Shock. Sorry I didn’t make nicer graphs.
• Results seem to indicate that ATP is harmful (even more so than shocks), which has never been found before?
• Again, let's take a second look at the analysis
• Patients were stratified as no therapy, ATP only, or shock in the initial analysis
• Many patients in the shock group received ≥1 ATP episode before their first shock episode. They should be counted in the ATP-only group until that first shock, and only then become part of the shock group
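One way to picture the corrected grouping, as a sketch only: a patient moves from no therapy to ATP only to shock as therapies occur, and follow-up time is assigned to whichever group the patient is in at that moment. The function names and the event timeline are hypothetical.

```python
def initial_analysis_group(events):
    """Group used in the initial analysis: one fixed label per patient."""
    kinds = {kind for _, kind in events}
    if "shock" in kinds:
        return "shock"
    return "ATP only" if "ATP" in kinds else "none"


def exposure_intervals(follow_up, events):
    """Split follow-up into (start, stop, group) rows.

    events is a list of (time, kind) pairs with kind "ATP" or "shock".
    The group only ever escalates: none -> ATP only -> shock.
    """
    rank = {"none": 0, "ATP only": 1, "shock": 2}
    kind_to_group = {"ATP": "ATP only", "shock": "shock"}
    rows = []
    group, start = "none", 0.0
    for time, kind in sorted(events):
        new_group = kind_to_group[kind]
        # Ignore events outside follow-up and events that do not escalate the group.
        if time < follow_up and rank[new_group] > rank[group]:
            if time > start:
                rows.append((start, time, group))
            group, start = new_group, time
    rows.append((start, follow_up, group))
    return rows


# Hypothetical patient: ATP at years 1 and 2, first shock at year 3, followed for 5 years.
timeline = [(1.0, "ATP"), (2.0, "ATP"), (3.0, "shock")]
print(initial_analysis_group(timeline))
print(exposure_intervals(5.0, timeline))
```

Under the initial grouping this patient counts as "shock" for all five years. Under the split, years 1 to 3 are credited to the ATP-only group.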
• Results now consistent with what has been previously found
• ATP not associated with an increased risk of death
Time | Contributor | Theory
Ancient Greece | Philosophers | Theoretical, not quantitative
16th Century | Cardano | Attempts to calculate probabilities of dice (game theory)
17th Century | Graunt, Petty, Pascal, Bernoulli, Halley | Vital statistics of populations; studied probability through games of chance; first mortality tables (relating death to age); law of large numbers
18th Century | Laplace, Gauss, Bayes | Normal curve; regression through the study of astronomy; Bayes' theorem
19th Century | Quetelet, Galton | First application of statistical analyses to human biology; studied genetic variation in humans (used regression and correlation); "the average person"
20th Century | Lots of people | Contributing to statistical analysis as we know it: Wilcoxon, Cox, Fisher, ANOVA, logistic/multiple regression, etc.
• Probability deals with predicting the likelihood of future events
  • Probability theory: the variables and the initial state are known
• Statistics involves the analysis of the frequency of past events
  • Statistics: the outcome is known, but the past causes are uncertain
Bayesian | Frequentist
Probability is subjective: it can be applied to single events based on degree of confidence or prior belief | Probability is objective: the relative frequency of an event in a large number of trials (e.g., coin flips)
Parameters are random variables that have a given distribution, and probability statements can be made about them | Parameters are fixed, unknown constants
There is a probability distribution over the parameters, and point estimates are usually made by taking either the mode or the mean of that distribution | The statistical process has an interpretation only in terms of long-run frequencies (e.g., 95% CIs computed from repeated samples will contain the true parameter value 95% of the time)
Frequentist statistics:
• Relies on drawing random samples from a population
• Assigns probability to a repeatable event in which the uncertainty is due to randomness
• Basis for hypothesis testing and confidence intervals
  • The type of statistics we are used to seeing
• Does not condition on the observed data
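A small simulation of the frequentist reading of a 95% CI: draw many random samples from a population with a known mean and count how often the interval built from each sample contains that mean. This sketch assumes normally distributed data and the normal-approximation interval (mean ± 1.96 standard errors). The function names and population values are illustrative only.

```python
import random
import statistics


def ci_covers_true_mean(rng, true_mean, sd, n, z=1.96):
    """Draw one sample and report whether its 95% CI contains true_mean."""
    sample = [rng.gauss(true_mean, sd) for _ in range(n)]
    mean = statistics.fmean(sample)
    # Standard error from the sample standard deviation (n - 1 denominator).
    se = statistics.stdev(sample) / n ** 0.5
    return mean - z * se <= true_mean <= mean + z * se


def coverage(trials=2000, true_mean=10.0, sd=2.0, n=50, seed=0):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = sum(ci_covers_true_mean(rng, true_mean, sd, n) for _ in range(trials))
    return hits / trials


print(coverage())
```

The printed fraction should land close to 0.95. The statement is about the procedure over many repeated samples, not about any single interval.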
Bayesian statistics:
• Chooses a probability distribution as the prior, which represents beliefs about the parameters of interest
• Chooses a probability distribution for the likelihood, which represents beliefs about the data
• Computes the posterior, which represents an update of our beliefs about the parameters after having observed the data
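The prior, likelihood, and posterior steps can be sketched with the simplest textbook case: a Beta prior on a proportion combined with a binomial likelihood, where the posterior is again a Beta distribution and the update is plain addition. The prior and the counts below are made up for illustration, and the function names are hypothetical.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta(a, b) parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Posterior mean, one common point estimate."""
    return a / (a + b)


def beta_mode(a, b):
    """Posterior mode, another common point estimate (needs a > 1 and b > 1)."""
    if a <= 1 or b <= 1:
        raise ValueError("mode is only defined here for a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


# Made-up example: a weak prior centered on 0.5, then 7 events in 10 patients.
prior = (2.0, 2.0)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
print("posterior mode:", beta_mode(*posterior))
```

The observed proportion is 0.7, and both point estimates sit between that and the prior's 0.5, which is the "update of our beliefs" in miniature.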
• This exercise was not meant to illustrate that findings should be the same as the findings that came before
  • In fact, it would have been great if shocks weren't associated with an increased risk of death in our study.
• This exercise was meant to illustrate that drastically different (polar opposite) conclusions can be reached depending on how the data are set up and modeled
• Just like medicine, statistical analysis is an evolving science
• Just like medicine, there is controversy as to what are the best (most appropriate) methods to use

• It is not necessary to completely understand and critique the methods section of every study
  • I only pretend to understand the underlying math and concepts
• It is worth knowing a little bit about this stuff, or at the very least trying to elucidate the underlying assumptions, population included, etc.
• It is worth knowing somebody who does understand this stuff
Education is the path from cocky ignorance to
miserable uncertainty. - Mark Twain
Uncertainty and mystery are energies of life. Don’t let
them scare you unduly, for they keep boredom at bay
and spark creativity. - R.I. Fitzhenry
The Black Swan: The Impact of the HIGHLY
IMPROBABLE.
Nassim Nicholas Taleb