Slides - Minnesota Defense Lawyers Association


health & environmental sciences • failure analysis & prevention
Statistics in the Courtroom
Dr. Nathan Soderborg, Principal Scientist
MDLA Trial Techniques Seminar
Duluth, Minnesota
August 19, 2016
A leading engineering & scientific consulting firm dedicated to helping our clients solve their technical problems.
2
Today’s Presentation
• Review (briefly) a few important statistical concepts
• Identify strengths and weaknesses of statistical arguments
• Point out some improper or dubious applications to be aware of
• Share principles you can apply to
– Bolster your own cases
– Counter the opposition
(Reference: Checklists)
3
Statistical Applications in Litigation
• Example contexts for statistical evidence
– Product liability/risk
– Epidemiology
– Construction defects
– Personal injury
– Employment discrimination
– Consumer class actions
– Environmental law (toxic torts)
– Business disputes (breach of contract, trade dress)
• Questions statistics help answer
– Is there a difference?
– Is there a relationship?
– Is there a risk?
– Is it statistically significant?
– Is it practically significant?
4
Statistical Significance and Sampling
5
Basic Statistical Inference
• Use known data collected in a sample to draw conclusions about unknown characteristics of a population
• Based on the laws of probability
[Diagram: Population → Sample, linked by probability]
• A result is called statistically significant if it is unlikely to have occurred by chance based on an assumed probability model for the data
• The concept of a confidence interval is often used to establish statistical significance
6
Example: Sampling & Confidence Intervals in Polls
Confidence Intervals:
• Basically, if similar polls were taken repeatedly, 95% of the resulting confidence intervals would contain the true percentage favoring a candidate
• A larger sample reduces interval size
• Sampling biases, poll dates can lead to different results

USA TODAY/Suffolk Poll: Clinton’s lead over Trump narrows to 5 points
• “Clinton now leads Trump by 5 percentage points, 45.6% to 40.4%...”
• “The poll of 1,000 likely voters, taken by landline and cell phones from June 26 to 29 has a margin of error of +/- 3%.”
• [39%, 45%] is a “95% confidence interval” for Trump
http://www.usatoday.com/story/news/politics/elections/2016/07/04/usa-today-suffolk-poll-voters-alarmed-trump-clinton/866325267/

White House Watch: Trump 42%, Clinton 40%
• “The latest Rasmussen Reports national telephone and online survey of Likely U.S. Voters finds Donald Trump with 42% of the vote, while Hillary Clinton earns 40%.”
• “The survey of 1,000 Likely Voters was conducted on July 5, 2016… The margin of sampling error is +/- 3% with a 95% level of confidence.”
http://www.rasmussenreports.com/public_content/politics/elections/election_2016/white_house_watch
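The +/- 3% margin of error both polls report follows directly from the sample size of 1,000. A minimal sketch in Python (1.96 is the standard normal multiplier for 95% confidence):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Trump at 42% in a sample of 1,000 likely voters
moe = margin_of_error(0.42, 1000)
print(round(100 * moe, 1))  # ≈ 3.1 points, matching the reported +/- 3%

lo, hi = 0.42 - moe, 0.42 + moe
print(round(100 * lo), round(100 * hi))  # roughly [39, 45], as in the slide
```

Quadrupling the sample size halves the margin of error, which is why interval width shrinks only slowly as polls grow.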
7
Sampling Factors that Affect Poll Results
• Random samples of adequate size contribute to accuracy
• Other sampling factors could also affect poll accuracy, e.g.,
– Are respondents registered voters or likely voters?
– When was the poll taken?
– Is the sample disproportionately drawn from areas with a certain party affiliation?
– Are certain demographic groups over-represented (e.g., male/female, older/younger, richer/poorer)?
– Does question phrasing or manner of questioning by the pollster affect responses?

What are the analogous questions for sampling in other cases?
8
Example: Inadequate Sample
Allegation: Moisture intrusion in housing development
• Total of 1,064 homes in five distinct styles (villages)
• In depth testing* in five homes yielded one positive result, so the plaintiff estimated 1/5 × 1,064 = 213 homes with moisture
• For a random sample, the 95% confidence interval is [5, 762]
⇒ Inadequate sample size
[Figure: point estimate of 213 homes with 95% confidence interval [5, 762], on a scale from 0 to 1,064]
*Using a piezometer, an instrument for measuring fluid [water] pressure
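The [5, 762] interval can be reproduced with an exact (Clopper-Pearson) binomial confidence interval for 1 positive result in 5 tests, scaled to 1,064 homes. A sketch in pure Python, using bisection rather than a statistics library:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a binomial proportion, found by bisection."""
    def boundary(pred):
        lo, hi = 0.0, 1.0
        for _ in range(100):  # bisect: pred is True below the boundary
            mid = (lo + hi) / 2
            if pred(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else boundary(lambda p: binom_cdf(k - 1, n, p) > 1 - alpha / 2)
    upper = 1.0 if k == n else boundary(lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(1, 5)  # one positive result in five homes tested
print(round(1064 * lo), round(1064 * hi))  # ≈ 5 and 762 homes
```

The interval spans nearly three-quarters of the development: five homes simply cannot pin down the true number of affected homes.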
9
Example: Biased Sample
• How were the 5 in depth tested homes selected? (i.e., Was the sample random?)
• Data from a prior round of screening tests came to light over the course of the case
– A combined total of 77 homes in five villages had screening tests
– 3 of the 5 in depth tested homes had the highest screening test result in their respective villages
• The chance of 3 village maximums in a random sample of 5 is < 4 in 1,000 (0.004)
⇒ Indicates bias in sample selection
[Figure: screening test results by village, with village maximums and in depth test selections highlighted]
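The "< 4 in 1,000" figure depends on details the slides don't show (village sizes, how homes were drawn). As a rough illustration only, assume one home is drawn at random per village and the five villages are of roughly equal size (77 homes total), so each drawn home is its village's screening maximum with probability 5/77:

```python
from math import comb

# Illustrative assumptions (not stated in the slides): one in-depth home drawn
# at random per village, five villages of roughly equal size, 77 homes total,
# so each drawn home is its village's screening maximum with probability 5/77
p = 5 / 77
prob_3_or_more = sum(comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(3, 6))
print(round(prob_3_or_more, 4))  # ≈ 0.0025, i.e., < 4 in 1,000
```

Under any reasonable version of these assumptions, hitting three village maximums by chance is very unlikely, which is the point of the slide.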
10
Ensuring Sampling is Representative
• Good sampling is essential for obtaining trustworthy estimates
• Statisticians can help
– Design representative samples, e.g.,
• Prospective studies: random assignment of subjects to groups under study
• Retrospective studies: random selection of people, sites, etc.
– Criticize poor sample designs
• Principles
– Every statistical estimate based on a sample is subject to uncertainty
– Larger random sample sizes reduce estimate uncertainty
– Confidence Intervals help quantify uncertainty
11
Demonstrating Association and Causation
12
Association
• Question: Is disease or injury associated with exposure to a particular substance/product?
• This could be answered by establishing correlation
– E.g., high exposure coincides with high disease/injury rates and low exposure
coincides with low disease/injury rates
• This could be answered using a hypothesis test
– E.g., Disease/injury rates are determined to be “significantly” higher in the
exposed vs. unexposed group
13
Demonstrating Association by Correlation
• Correlation
– Quantifies and illustrates the relationship between variables
– Indicates association
– Is necessary but not sufficient to prove causation
• For spurious correlations, look for “lurking” (i.e., missing) variables
[Figures: Cigarettes Smoked v. Lung Cancer Risk; Spurious Correlation Examples, Jed Spree and Kat Paton: https://tackk.com/0z27aw]
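Correlation is usually quantified with Pearson's r, which runs from -1 to +1. A minimal sketch, using hypothetical dose/response numbers (illustrative only, not data from the slides):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical dose/response numbers, for illustration only
dose = [0, 5, 10, 20, 40]
rate = [1.0, 1.4, 2.1, 3.9, 8.2]
print(round(pearson_r(dose, rate), 3))  # ≈ 0.99: strong association, not proof of causation
```

A high r value shows the variables move together; it says nothing by itself about lurking variables or the direction of any causal link.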
14
Demonstrating Association by Hypothesis Tests
• Guidelines*
– “Hypotheses are generally based on a series of assumptions that can be
challenged
– Investigators check their assumptions by devising tests that could disprove their
theories
– A hypothesis that survives challenge may be true”
• Statistical Hypothesis Testing Pattern
– Make an assumption (i.e., assume a probability model)
– Conduct the study, collect and organize data (typically from a sample)
– Calculate the probability of results found in the data (based on the assumption)
– If probability is low, reject the assumption
*Janet Macher, ed., Bioaerosols: Assessment and Control,
American Conf. of Governmental Industrial Hygienists 2-7, 1999
16
Example: Drinking Water Contamination
• Key Question in the Study:
If disease incidence in Group A (exposed to the chemical) is greater than in Group B (not exposed), is the result due to chance or the chemical?
• Answer Based on Statistics:
1. Assume no effect due to the chemical [the null hypothesis]
2. Then disease incidence for Group A would be the same as for Group B [the probability model]
3. Based on this assumption, calculate probability, p, of a difference in rates >= the difference in the study [the p-value]
4. If p is low, reject the assumption that the chemical has no effect; conclude it increases disease rate [“reject the null”]
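The four-step pattern above can be sketched with a one-sided Fisher's exact test on a 2×2 exposure/disease table. The counts below are hypothetical, chosen only to illustrate the mechanics:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]:
    P(exposed-group disease count >= a) under the null of no association."""
    n = a + b + c + d
    row1 = a + b  # exposed group size
    col1 = a + c  # total diseased
    return sum(
        comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
        for k in range(a, min(row1, col1) + 1)
    )

# Hypothetical counts: 12 of 100 exposed diseased vs. 4 of 100 unexposed
p = fisher_one_sided(12, 88, 4, 96)
print(round(p, 3))  # small p (≈ 0.03): reject "no effect" at the 0.05 level
```

The null hypothesis supplies the probability model (here, the hypergeometric distribution of the table counts), and p is the chance of a split at least this lopsided arising by chance alone.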
17
Meaning of p-values
• p=0.05 means that,
– assuming no association were present,
– the probability is 5% of observing a difference at least as large as the study result
– due to random chance
• Results with p ≤ 0.05 are called statistically significant; this is a convention, established by long tradition
• NOTE
– p is the probability of “extreme” data given the null hypothesis
– p is not the probability of the null hypothesis given extreme data
18
Hypothesis Testing Truth Table
Null Hypothesis: No difference or effect (status quo)

                                       DECISION
TRUTH                     Conclude NO Difference        Conclude a Difference
                          (Fail to Reject Null Hyp.)    (Reject Null Hyp.)
NO Difference Exists      CORRECT DECISION              FALSE POSITIVE
(Null Hyp. True)                                        Type I error (p-value probability)
Difference Exists         FALSE NEGATIVE                CORRECT DECISION
(Null Hyp. Not True)      Type II error

A high p-value results in a conclusion of no difference. It does not prove “no difference”;
rather, the probability that the difference found is due to chance is high, so the evidence
for a difference is not sufficient.
Hence the terminology: “fail to reject” the null instead of “accept” the null
19
Criminal Law Analogy
Null Hypothesis: Innocent (until proven guilty)

                                       DECISION
TRUTH                     NOT GUILTY                    GUILTY
                          (Fail to Reject Null Hyp.)    (Reject Null Hyp.)
INNOCENT                  CORRECT VERDICT               FALSE CONVICTION
(Null Hyp. True)
GUILTY                    FALSE ACQUITTAL               CORRECT VERDICT
(Null Hyp. not True)

A Not Guilty verdict does not mean innocence is proved;
rather, there is insufficient evidence to assign guilt
20
Taking Care with Significance Findings
• Statistical significance does not ensure practical significance
– Is the effect size considered meaningful for health, environment, etc.?
• Recent Example
– Are results practically significant? Opinions may differ
– Risk of anencephaly rose from 2 in 10,000 to 7 in 10,000 in children of women taking a particular antidepressant (relative risk of 3.5)
• Note how headlines differ!
– WSJ: “The increase in absolute risk was small”
– USA Today: “While risks for some conditions doubled or tripled,…overall risks [were] low”
22
Example: Significance False Positives (Bendectin)
• Controversy and Withdrawal
– Morning sickness drug, appeared on the market in 1956
– Removed from the market in 1983 amid birth defect lawsuits
– By 1983, ~33 million women worldwide had used the drug
• Re-emergence
– 2013: FDA approves return of the drug
– The scare had proven to be a false alarm
– “Bendectin was the archetypical case of junk
science scuttling a perfectly safe product. …
It was a sad episode in American
jurisprudence.” -Dr. Michael Greene, Chief of
Obstetrics at Mass. Gen. Hospital
23
Example: Bendectin Story Numbers
• US baby birth defect rate:
– 2.5% overall
– 0.2% for limb defects
• Given 33 million users, it is not surprising that…
– Babies whose mothers had taken Bendectin were born with defects
– Bendectin would be investigated as a cause and lead to lawsuits
– A few out of many studies could report a significant association between Bendectin and birth defects
In Fact: With 95% probability of a correct conclusion, the probability of 10 correct conclusions in 10 independent tests is only ~60%
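The ~60% figure is just the multiplication rule for independent tests: even when each study is 95% reliable, a run of studies will eventually produce a false positive by chance.

```python
# If each of 10 independent studies reaches the right conclusion with
# probability 0.95, the chance that all 10 are correct is well below 95%
p_all_10_correct = 0.95 ** 10
print(round(p_all_10_correct, 2))  # ≈ 0.6
```

With dozens of Bendectin studies, one or two significant associations are exactly what chance alone would predict.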
24
Demonstrating Causation
• Causation typically can’t be “proved” with certainty
• Legal standards include
– Injury “more likely than not,” e.g., due to a certain kind of exposure
– “Substantial contributing cause”
– “But for” exposure, injury would not have occurred
• Scientific standards include
– “With a reasonable degree of scientific certainty”
– “At the X% confidence level”
• General vs. specific causation
– Even if studies show a particular substance/action can cause the injury, it is often
necessary to show it did cause that specific injury in a specific plaintiff
25
Factors that Strengthen Causation Claims (Hill)
• Exposure precedes the disease/injury (a must)
• Reduction in disease/injury incidence following exposure cessation
• A strong measure of association, i.e., large effect size—the larger, the better
• Risk increases as dose increases; consistency across a wide range of values
• Existence of multiple confirming studies (replication)—the more, the better
• Plausibility with respect to biological and other existing knowledge
• A highly specific association—the more specific, the better
• Elimination of alternative explanations
26
Assessing Risk and Exposure
27
Risk
Risk = Number of Harmful Outcomes / Chances to Experience Harm
Severity: counts or extent of harm
Exposure: no. of products, users; time in use

• Statistical approaches
– Compare risks between two substances or activities
– Estimate (absolute) risk in terms of a familiar measure based on historical data
• It is statistically invalid to evaluate risk by
– Counting only the number of injuries associated with a device, substance, activity
– And not considering aspects such as
• Number of items, participants, or amount of the substance in the environment or use
• Behavioral and environmental conditions and factors associated with exposure or use
28
Example: Injury Comparative Risk Analysis
• Background
– Over a 4-year period, an off-road vehicle brand was subject to injury lawsuits for an alleged design defect
• Comparative Risk Analysis
– Injury and vehicle data was collected from US CPSC National Electronic Injury Surveillance System (NEISS) and other public sources
– Injury rate for this class of vehicle was compared to the rate for other classes of off-road and sport vehicles

                               Investigated
                               Vehicle      Comparator 1  Comparator 2  Comparator 3  Comparator 4
Severity: Estimated ER
Treated Injuries               2,965        381,000       664,000       62,000        421,000
Exposure: Vehicle-Years
of Use                         1,871,398    11,028,000    39,200,000    4,796,000     31,230,000
Risk: Rate per 10,000
Vehicle-Years of Use           16*          345           169           129           135

*95% upper confidence limit = 21
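The risk row is simply severity divided by exposure, scaled to a common base of 10,000 vehicle-years:

```python
def rate_per_10k_vehicle_years(injuries, vehicle_years):
    """Risk = severity (injury count) normalized by exposure (vehicle-years)."""
    return 10_000 * injuries / vehicle_years

# Figures from the comparative risk table above
print(round(rate_per_10k_vehicle_years(2_965, 1_871_398)))     # investigated vehicle: 16
print(round(rate_per_10k_vehicle_years(381_000, 11_028_000)))  # comparator 1: 345
```

Counting injuries alone would make the comparators look far worse; normalizing by exposure reverses the picture.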
29
Example: Absolute Risk of Traffic Fatality
• Scenario
– Business is located next to a busy, suburban 4-lane road
– Business customer is standing in business driveway entering a vehicle
– Distracted driver runs off the road, across sidewalk, into driveway and hits
customer, resulting in fatality
• Questions
– Was the event foreseeable?
– Should businesses in that area have anticipated the risk to their customers?
– Should they have installed barriers, or be required to?
30
Example: Quantifying Absolute Risk
Scenario, per NHTSA Fatality Analysis Reporting System (FARS), 2011
Fatality data: Number of Pedestrians Killed in US
National exposure data: US Vehicle Miles Traveled (in Billions)
Local exposure data: Vehicle Miles Traveled South Past the Business Daily

                                    Pedestrians   US VMT      US Rate (Occurrence/   Local VMT  Risk per
Scenario                            Killed in US  (Billions)  Vehicle Mile Traveled) Daily      1 Million Years
(Any) Fatal Collision with a
Pedestrian                          4,432         2,946       0.0000001504%          448        246
In Position of Interest*            170           2,946       0.0000000058%          448        9
In Position of Interest, during
Business Hours                      119           2,946       0.0000000040%          448        7
In Position of Interest, during
Business Hours, “Outside
Traffic way”                        29            2,946       0.0000000010%          448        2

*On sidewalk, in driveway, etc.
31
Ensuring Risks are Represented Properly
• Is exposure considered and quantified correctly?
– Are data sources complete?
• Have risk comparisons been made fairly?
– Based on truly comparable products, drugs, or activities?
– Using data from the same or close periods of time?
• Have the many variables that define an event been considered in
comparing an event to others?
– E.g., age, demographics, environmental conditions?
• Have all lurking or confounding variables been accounted for properly?
32
Ways a Statistical Expert can Help
• Reactive: Challenge opposing party’s
– Inadequate sample sizes or plans (didn’t look hard enough or carefully enough)
– Biased data collection (looked in the wrong or unrepresentative places)
– Unsupported damage theories, estimates, and interpretations (extrapolated based on faulty assumptions)
• Proactive: Develop alternative, scientifically defensible
– Efficient sampling plans
– Estimation methods
– Estimates of uncertainty
– Empirically-based conclusions
33
Additional Uses of Statistical Evidence
• Bolster Quality Defense
– Show that an organization adequately anticipated risks and applied
countermeasures to avoid failures
– Show that specifications and design and manufacturing processes were
reasonable considering known sources of variability in manufacture and usage
– Use data to show that manufacturing systems made products within
specifications
• Analyze and predict performance
– Estimate and predict rates of field failure
– Estimate and project warranty costs
– Make performance comparisons to other manufacturers and products
34
For More Information, Contact
Nathan Soderborg, Ph.D.
Principal Scientist
(248) 324-9139
(734) 276-4494 (mobile)
[email protected]
35
APPENDIX:
Checklists for evaluating the strength of statistical
evidence and planning cross examination
36
Definition
A result is statistically significant if it is unlikely to have occurred by chance based on an assumed probability model for the data

Questions
☐ Are the assumptions known and well understood?
☐ Are assumptions realistic and representative of the facts?
37
Principles
• Every statistical estimate is subject to uncertainty
• Samples should be constructed to be representative of the media or population under study
• Larger sample sizes enable detection of smaller differences as statistically significant
• Smaller sample sizes can be used to detect differences for continuous variables

Questions
☐ Is uncertainty quantified?
☐ Is the method for determining uncertainty valid?
☐ Is the sample large enough to acceptably minimize study error due to chance (e.g., per α- and β-error rates and confidence interval widths)?
☐ Is the sample designed using concepts such as randomization and stratification to minimize selection bias?
☐ Does the sample account for exposure over appropriate times and locations?
☐ Would a different sample size or significance level lead to a different conclusion?
38
Principles
• Studies should be well planned, in advance

Questions
☐ Are the right variables being studied to ensure meaningful results and efficient use of resources?
☐ Does the plan identify and address in advance…
  ☐ Potential sources of bias?
  ☐ Confounding variables?
  ☐ Hypotheses to be tested and subgroups that will be analyzed and compared?
  ☐ Clear, objective measures and cutoff levels for exposure and outcomes?
39
Principles
• Epidemiology studies the incidence, distribution, and cause of disease in human populations
• Epidemiological studies are concerned with general causation; they do not prove specific causation

Questions
☐ Are the right variables being studied to ensure meaningful results and efficient use of resources?
☐ Does an epidemiological study include random assignment of subjects to one of two groups:
  ☐ One group exposed to the agent of interest?
  ☐ The other group not exposed?
  ☐ With both groups evaluated for disease development?
☐ Is the study placebo controlled? I.e., is the group not receiving the active agent given an inactive ingredient that appears similar to the active agent being studied?
☐ Is the study double blind? I.e., do neither participants nor investigators know who receives the active agent and who receives the placebo?
40
Principles
• Statistical significance means the probability is small of observing a difference at least as large as the study result due to random chance
• Statistical significance does not imply practical significance

Questions
☐ Do lab or measurement results…
  ☐ Come from sources that can be trusted (e.g., certified facilities, with strict QC)?
  ☐ Have acceptable measurement system accuracy and precision?
☐ Are statistically significant effects reported with confidence intervals?
☐ Is the size of the reported effect practically significant? I.e., important in relation to rules, specifications, regulations, or effects on health, environment, finances, etc.?
☐ Is a claim of no effect reported with power sufficient for detection of an effect?
☐ Are a reasonable number of comparisons made compared to the number of significant results reported?
☐ Do investigators…
  ☐ Identify the influence of any outliers in the data, with acceptable rationale for excluding or maintaining outliers?
  ☐ Identify and quantify effects of any confounding variables?
  ☐ Provide full transparency regarding exclusion of any study subjects?
  ☐ Provide data reporting for all subgroups and at all study follow-up times?
41
Principles
• Correlation does not imply causation
• Causation typically can’t be proven with certainty; statistics combined with other scientific evidence can bolster claims of causation

Questions
☐ Is correlation strong or weak?
☐ Do scientifically valid explanations support a claim of causation?
☐ Can claims of causation be supported by any of the following criteria (due to Dr. Hill)?
  ☐ Exposure precedes the disease/injury (a must)
  ☐ Reduction in disease/injury incidence following exposure cessation
  ☐ A strong measure of association, i.e., large effect size (the larger, the better)
  ☐ Risk that increases as dose increases; consistency across a wide range of values
  ☐ Existence of multiple confirming studies, i.e., replication (the more, the better)
  ☐ Plausibility with respect to biological and other existing knowledge
  ☐ A highly specific association (the more specific, the better)
  ☐ Elimination of alternative explanations
42
Principles
• In evaluating risk, severity, along with exposure level and conditions, must be taken into account

Questions
☐ Is severity of harm important or relevant?
☐ Do exposure estimates account for item volume, usage time/patterns, and related behavioral and environmental factors?
43
Appendix: Interesting Examples
44
Example: Traffic Fatalities v. Lemon Imports
http://www.grossmont.edu/johnoakes/s110online/Causation%20versus%20Correlation.pdf
45
46