
Graphical Causal Models: Determining
Causes from Observations
William Marsh
Risk Assessment and Decision Analysis
(RADAR)
Computer Science
RADAR Group, Computer Science


Risk Assessment and Decision Analysis
Research areas


Software engineering, safety, finance, legal
A new initiative in medical data analysis: DIADEM
Norman Fenton
Group leader
Martin Neil
http://www.dcs.qmul.ac.uk/researchgp/radar/
Outline

Graphical Causal Models




Bayesian networks: prediction or diagnosis
Causal induction: learning causes from data
Causal effect estimation: strength of causal
relationships from data
DIADEM project
Bayesian Nets
Detecting Asthma Exacerbations

Aim to assist early
detection of asthma
episodes in Paediatric
A&E


Using only data
already available
electronically
Network created by


Experts
Data
Bayes’ Theorem
P(A, B) = P(A | B).P(B) = P(B | A).P(A)
Joint probability
P(A | B) ∝ P(B | A).P(A)
Revised belief
about A, given
evidence B
Prior probability of A
Factor to update belief
about A, given evidence B
Bayes’ Theorem (Made Easy)
Infection (yes, no) → Test (pos, neg)
Infection rate: P(I=yes) = 1%
False positive: P(T=pos | I=no) = 5%
Negligible false negative



A person has a positive test result
How likely is it they are infected?
17%
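The 17% figure follows directly from Bayes' theorem with the numbers on this slide. A minimal sketch of the arithmetic, assuming "negligible false negative" means a false negative rate of exactly zero (the function and parameter names are illustrative, not from the slides):

```python
def posterior_infected(prior, false_positive_rate, false_negative_rate=0.0):
    """Return P(I=yes | T=pos) using Bayes' theorem."""
    sensitivity = 1.0 - false_negative_rate  # P(T=pos | I=yes)
    # Total probability of a positive test, summing over infected and not infected
    p_pos = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_pos


# Numbers from the slide: P(I) = 1%, P(T=pos | I=no) = 5%
p = posterior_infected(prior=0.01, false_positive_rate=0.05)
print(f"P(infected | positive test) = {p:.1%}")  # prints 16.8%, i.e. about 17%
```

Most positive results come from the large uninfected group, which is why the posterior stays well below 50% despite the test rarely missing an infection.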
Medical Uses of BNs

Diagnosis


Prediction


Differential diagnosis from symptoms
Likely outcome
Building a BN


From expert knowledge → expert system
From data → data mining
Beyond Bayesian Networks
Cause versus Association
Infection → Fever, or Fever → Infection?

Joint probability same:
P(I, F) = P(F | I).P(I) = P(I | F).P(F)
Both represent the fever ↔ infection association
‘Causal model’ has arrow from cause to effect
Causal Induction



Discover causal relationships from data
Sometimes distinguishable
[Figure: two different causal graphs over the variables A, B and C]
… different conditional independence
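The transcript does not preserve the two graphs, so the following is only a sketch under an assumption: that they are a chain (A → B → C) and a collider (A → B ← C), a standard pair that data can tell apart. All probabilities are made-up illustrative values. The code enumerates each joint distribution exactly and checks whether A and C are independent, with and without conditioning on B.

```python
from itertools import product


def joint_chain(a, b, c):
    """Assumed graph A -> B -> C (hypothetical probabilities)."""
    p_a = 0.3 if a else 0.7
    p_b1 = 0.9 if a else 0.2          # P(B=1 | A)
    p_c1 = 0.8 if b else 0.1          # P(C=1 | B)
    return p_a * (p_b1 if b else 1 - p_b1) * (p_c1 if c else 1 - p_c1)


def joint_collider(a, b, c):
    """Assumed graph A -> B <- C (hypothetical probabilities)."""
    p_a = 0.3 if a else 0.7
    p_c = 0.6 if c else 0.4
    p_b1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95}[(a, c)]  # P(B=1 | A, C)
    return p_a * p_c * (p_b1 if b else 1 - p_b1)


def a_indep_c(joint, given_b=None, tol=1e-9):
    """Check whether A is independent of C, optionally given B = given_b."""
    b_values = (0, 1) if given_b is None else (given_b,)
    both = (0, 1)

    def prob(a_values, c_values):
        return sum(joint(a, b, c) for a in a_values for b in b_values for c in c_values)

    norm = prob(both, both)  # equals 1 when B is summed out, P(B=b) otherwise
    # Independence holds iff P(a, c, b) * P(b) == P(a, b) * P(c, b) for all a, c
    return all(
        abs(prob((a,), (c,)) * norm - prob((a,), both) * prob(both, (c,))) < tol
        for a, c in product(both, repeat=2)
    )


for name, joint in [("chain A -> B -> C", joint_chain),
                    ("collider A -> B <- C", joint_collider)]:
    marginal = a_indep_c(joint)
    conditional = all(a_indep_c(joint, given_b=b) for b in (0, 1))
    print(f"{name}: A indep C? {marginal}; A indep C given B? {conditional}")
```

In the chain, A and C are dependent but become independent given B; in the collider the pattern is reversed. That difference in conditional independence is what makes the two structures distinguishable from observational data.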
Causal Induction – Application

Discover causal relationships from data


Need lots of data
Applied to gene regulatory networks


Data from micro-array experiments
Recent explanation of limitations
Estimating Causal Effects

Suppose A is a cause of B
A

B
What is the causal effect?

Is it p(B | A) ?
Benefits of Sports?
[Figure: causal graph with nodes intelligence, sport, exam result]
Is there a relationship between sport and exam success?
Data available
‘Intelligence’ correlate
Is this the correct test?
P(exam=pass | sport) > P(exam=pass | no-sport)
Benefits of Sports?
[Figure: graph with nodes intelligence, sport, exam result; ‘sport’ marked as observed; values shown: 73%, 67%]
p(pass | sport) > p(pass | no-sport)
When we condition on ‘sport’
Probability for ‘exam result’
Probability for ‘intelligence’ changes
What if I decide to start sport?
Intervention v Observation
[Figure: graph with nodes intelligence, sport, exam result; ‘sport’ marked as changed by intervention]
Causal effect differs from conditional probability
P(pass|do(sport)) < P(pass| do(no sport))

Mostly interested in consequence of change


Causal effects can be measured by a Randomised Controlled Trial
Causal effect of sport on exam results not identifiable
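The gap between observing and intervening can be illustrated with a small worked example. This is a sketch under assumptions, not the model from the slides: it assumes intelligence influences both sport and exam result, that sport also affects the exam result directly, and it uses made-up probabilities chosen so that the two inequalities above both hold. The interventional quantity is computable here only because the sketch has access to the distribution of intelligence; without it the effect is not identifiable.

```python
# Hypothetical model: intelligence -> sport, intelligence -> exam, sport -> exam
P_I = {"high": 0.5, "low": 0.5}              # P(intelligence)
P_SPORT = {"high": 0.8, "low": 0.2}          # P(sport=yes | intelligence)
P_PASS = {                                   # P(pass | intelligence, sport)
    ("high", True): 0.85, ("high", False): 0.90,
    ("low", True): 0.45, ("low", False): 0.50,
}


def p_sport(s, i):
    """P(sport = s | intelligence = i)."""
    return P_SPORT[i] if s else 1 - P_SPORT[i]


def p_pass_given(s):
    """Observational P(pass | sport = s): intelligence is reweighted by the evidence."""
    num = sum(P_I[i] * p_sport(s, i) * P_PASS[(i, s)] for i in P_I)
    den = sum(P_I[i] * p_sport(s, i) for i in P_I)
    return num / den


def p_pass_do(s):
    """Interventional P(pass | do(sport = s)): intelligence keeps its prior."""
    return sum(P_I[i] * P_PASS[(i, s)] for i in P_I)


print(f"P(pass | sport)     = {p_pass_given(True):.2f}")
print(f"P(pass | no sport)  = {p_pass_given(False):.2f}")
print(f"P(pass | do(sport)) = {p_pass_do(True):.2f}")
print(f"P(pass | do(no sp)) = {p_pass_do(False):.2f}")
```

With these assumed numbers, people who do sport pass more often, yet making someone take up sport slightly lowers their chance of passing. Conditioning changes the belief about intelligence; intervening does not.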
Benefit of Sport
[Figure: causal graph with nodes intelligence, sport (S), attendance (A), exam result (E)]
New observable variable ‘attendance at
lectures’
Causal effect of sport on exam results now
identifiable
P(E | do(S)) = Σ_A P(A | S) Σ_S' P(E | S', A).P(S')
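The adjustment formula above can be checked numerically. The sketch below assumes a specific structure that the transcript does not spell out: unobserved intelligence (I) influences sport (S) and exam result (E), and sport affects the exam only through attendance (A). All probabilities are invented for illustration. It computes P(E=1 | do(S)) twice: once from the full model using the hidden variable, and once from the observable joint over S, A, E alone.

```python
from itertools import product

# Hypothetical model: I -> S, I -> E, S -> A, A -> E (I is unobserved)
P_I = {1: 0.5, 0: 0.5}
P_S1 = {1: 0.8, 0: 0.2}                       # P(S=1 | I)
P_A1 = {1: 0.6, 0: 0.9}                       # P(A=1 | S)
P_E1 = {(1, 1): 0.95, (0, 1): 0.70,           # P(E=1 | A, I), keyed by (a, i)
        (1, 0): 0.60, (0, 0): 0.30}


def bern(p1, x):
    """Probability that a binary variable with P(1) = p1 takes value x."""
    return p1 if x else 1 - p1


def joint(i, s, a, e):
    return P_I[i] * bern(P_S1[i], s) * bern(P_A1[s], a) * bern(P_E1[(a, i)], e)


def obs(**fixed):
    """Observational probability of values for any of s, a, e (I summed out)."""
    total = 0.0
    for i, s, a, e in product((0, 1), repeat=4):
        values = {"s": s, "a": a, "e": e}
        if all(values[k] == v for k, v in fixed.items()):
            total += joint(i, s, a, e)
    return total


def front_door(s):
    """P(E=1 | do(S=s)) from observable quantities only."""
    result = 0.0
    for a in (0, 1):
        p_a_given_s = obs(s=s, a=a) / obs(s=s)
        inner = sum(obs(s=s2, a=a, e=1) / obs(s=s2, a=a) * obs(s=s2) for s2 in (0, 1))
        result += p_a_given_s * inner
    return result


def truth(s):
    """P(E=1 | do(S=s)) from the full model, using the hidden variable I."""
    return sum(P_I[i] * bern(P_A1[s], a) * P_E1[(a, i)]
               for i in (0, 1) for a in (0, 1))


for s in (0, 1):
    print(f"do(S={s}): formula = {front_door(s):.4f}, full model = {truth(s):.4f}")
print(f"P(E=1 | S=1) = {obs(s=1, e=1) / obs(s=1):.4f}")
```

Under these assumptions the two computations agree, while the plain conditional probability P(E=1 | S=1) gives a different answer because it is confounded by intelligence.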
Estimating Causal Effects

Rules to convert causal to statistical questions




Causal model



Generalises e.g. stratification, potential outcomes
Assumptions: a causal model
Some assumptions may be testable
Some variables observed, others not measured
Some causal effects identifiable
Challenges


Causal models for complex applications
Statistical implications
Example Application

Royal London trauma service




Criteria for activation of the trauma team
Aim to prevent unnecessary trauma team calls
Extensive records of trauma patient outcomes
US study of 1495 admissions proposed new
‘triage’ criteria




Significant decrease in overtriage 51% → 29%
Insignificant increase in undertriage 1% → 3%
None of the patients undertriaged by new criteria died
Does this show safety of new criteria?
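One way to see why zero observed deaths is weak evidence of safety is to ask how high the true death rate could be and still plausibly produce no deaths in a small group. The sketch below computes the exact one-sided upper confidence bound for that case. The number of undertriaged patients is not given in the transcript, so the value 45 is purely a hypothetical illustration, and the calculation assumes independent patients with a common risk.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper bound on an event rate after 0 events in n independent cases.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n_undertriaged = 45  # hypothetical count; not stated in the transcript
bound = upper_bound_zero_events(n_undertriaged)
print(f"0 deaths in {n_undertriaged} patients: true death rate could be up to {bound:.1%}")
print(f"'Rule of three' approximation: {3 / n_undertriaged:.1%}")
```

With a group this small, a mortality risk of several percent among undertriaged patients remains compatible with the data.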
DIADEM Project
Digital Economy in Healthcare



Data Information and Analysis for clinical
DEcision Making
EPSRC Digital Economy
Cluster


Partnership between solution providers and
clinical data analysis problem holders
Summarise unsolved data analysis needs, in
relation to the analysis techniques available
Join the DIADEM cluster
Cluster Activities and Outcomes

Engage stakeholders and build a
community:




A road map: data and information


Creation of a community web-site and forum
Meetings with potential ‘problem holders’
Workshops
Follow-up proposal
A self-sustaining website – health data
analytics
Summary

Bayesian networks


Causal induction


Prediction and diagnosis
Identify (some) causal relationships from (lots
of) data
Causal effects



Experimental results from …
… non-experimental data
… assumptions (causal model)