Transcript Document
Graphical Causal Models: Determining Causes from Observations
William Marsh
Risk Assessment and Decision Analysis (RADAR)
Computer Science
RADAR Group, Computer Science
Risk Assessment and Decision Analysis
Research areas
Software engineering, safety, finance, legal
A new initiative in medical data analysis: DIADEM
Norman Fenton
Group leader
Martin Neil
http://www.dcs.qmul.ac.uk/researchgp/radar/
Outline
Graphical Causal Models
Bayesian networks: prediction or diagnosis
Causal induction: learning causes from data
Causal effect estimation: strength of causal relationships from data
DIADEM project
Bayesian Nets
Detecting Asthma Exacerbations
Aim: to assist early detection of asthma episodes in paediatric A&E
Using only data already available electronically
Network created from expert knowledge and data
Bayes’ Theorem
Joint probability:
$P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
Bayes' theorem:
$P(A \mid B) = \dfrac{P(B \mid A)}{P(B)}\,P(A)$
P(A | B): revised belief about A, given evidence B
P(A): prior probability of A
P(B | A) / P(B): factor to update belief about A, given evidence B
Bayes’ Theorem (Made Easy)
Network: Infection (yes, no) → Test (pos, neg)
Infection rate: P(I = yes) = 1%
False positive rate: P(T = pos | I = no) = 5%
Negligible false negative rate
A person has a positive test result. How likely is it that they are infected?
Answer: about 17%
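The 17% follows from Bayes' theorem with P(pos) expanded by the law of total probability. A minimal sketch, assuming the "negligible" false negative rate can be taken as exactly zero (so P(pos | infected) = 1); the function name `posterior` is illustrative only:

```python
def posterior(prior, p_evidence_given_a, p_evidence_given_not_a):
    """P(A | B) by Bayes' theorem, with P(B) expanded by total probability."""
    p_b = p_evidence_given_a * prior + p_evidence_given_not_a * (1 - prior)
    return p_evidence_given_a * prior / p_b

# Slide values: P(I) = 1%, false positive rate 5%,
# false negative rate assumed to be exactly 0 (the slide says "negligible").
p = posterior(prior=0.01, p_evidence_given_a=1.0, p_evidence_given_not_a=0.05)
print(f"P(infected | positive) = {p:.1%}")
```

Most positives come from the 99% who are not infected: 0.05 × 0.99 = 0.0495 false positives against 0.01 true positives, giving 0.01 / 0.0595 ≈ 0.168.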
Medical Uses of BNs
Diagnosis: differential diagnosis from symptoms
Prediction: likely outcome
Building a BN
From expert knowledge → expert system
From data → data mining
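For the "from data" route, the simplest step, once the network structure is fixed, is to estimate each conditional probability table by counting. The sketch below is a maximum-likelihood estimate of P(child | parent) for a single parent; the record format (a list of dicts) and the function name `estimate_cpt` are hypothetical, and real tools typically add smoothing and handle missing values, which this does not:

```python
from collections import Counter

def estimate_cpt(records, parent, child):
    """Maximum-likelihood estimate of P(child | parent) by counting.

    records: list of dicts mapping variable name -> value (hypothetical format).
    Returns {parent_value: {child_value: probability}}; combinations never
    seen in the data get no entry (i.e. probability zero, no smoothing).
    """
    pair_counts = Counter((r[parent], r[child]) for r in records)
    parent_counts = Counter(r[parent] for r in records)
    cpt = {}
    for (pv, cv), n in pair_counts.items():
        cpt.setdefault(pv, {})[cv] = n / parent_counts[pv]
    return cpt

# Toy records, invented purely for illustration.
records = ([{"infection": "yes", "test": "pos"}] * 2
           + [{"infection": "no", "test": "pos"}] * 1
           + [{"infection": "no", "test": "neg"}] * 7)
cpt = estimate_cpt(records, parent="infection", child="test")
print(cpt)
```

Learning the structure itself (which arrows exist) is a harder problem; that is the subject of causal induction later in the talk.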
Beyond Bayesian Networks
Cause versus Association
Infection → Fever, or Fever → Infection?
Joint probability is the same:
$P(I, F) = P(F \mid I)\,P(I) = P(I \mid F)\,P(F)$
Both represent the fever-infection association
A ‘causal model’ has its arrow from cause to effect
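To make the "same joint probability" point concrete, the sketch below builds P(I, F) from the Infection → Fever factorisation, reads off P(F) and P(I | F), and rebuilds the joint from the Fever → Infection factorisation. The probabilities are hypothetical values chosen only for illustration:

```python
def bern(p, x):
    """Probability that a binary variable with P(x = 1) = p takes value x."""
    return p if x == 1 else 1 - p

def joint_cause_first(p_i, p_f_given_i):
    """P(I, F) = P(F | I) P(I): the model Infection -> Fever."""
    return {(i, f): bern(p_f_given_i[i], f) * bern(p_i, i)
            for i in (0, 1) for f in (0, 1)}

def reverse(joint):
    """Read P(F) and P(I | F) off the joint: the model Fever -> Infection."""
    p_f = joint[(0, 1)] + joint[(1, 1)]
    p_i_given_f = {f: joint[(1, f)] / (joint[(0, f)] + joint[(1, f)])
                   for f in (0, 1)}
    return p_f, p_i_given_f

def joint_effect_first(p_f, p_i_given_f):
    """P(I, F) = P(I | F) P(F)."""
    return {(i, f): bern(p_i_given_f[f], i) * bern(p_f, f)
            for i in (0, 1) for f in (0, 1)}

# Hypothetical numbers: P(I=1) = 0.1, P(F=1 | I=1) = 0.9, P(F=1 | I=0) = 0.2
forward = joint_cause_first(0.1, {1: 0.9, 0: 0.2})
backward = joint_effect_first(*reverse(forward))
for key in sorted(forward):
    print(key, round(forward[key], 4), round(backward[key], 4))
```

The two columns agree entry by entry, so observational data on I and F alone cannot say which way the arrow points.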
Causal Induction
Discover causal relationships from data
Sometimes distinguishable:
[Figure: two causal structures over A, B and C]
… when the structures imply different conditional independencies
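The slide's two diagrams did not survive extraction. Assuming, for illustration only, that they contrasted a chain A → B → C with a collider A → B ← C, the sketch below shows how the structures differ in conditional independence: in the chain, A and C are dependent but become independent given B; in the collider, the reverse holds. It enumerates exact binary joint distributions; all parameter values are arbitrary assumptions:

```python
from itertools import product

def bern(p, x):
    """Probability that a binary variable with P(x = 1) = p takes value x."""
    return p if x == 1 else 1 - p

def chain_joint():
    """Assumed chain A -> B -> C with illustrative parameters."""
    return {(a, b, c): bern(0.5, a) * bern(0.8 if a else 0.2, b)
            * bern(0.9 if b else 0.1, c)
            for a, b, c in product((0, 1), repeat=3)}

def collider_joint():
    """Assumed collider A -> B <- C with illustrative parameters."""
    return {(a, b, c): bern(0.5, a) * bern(0.5, c)
            * bern(0.95 if (a or c) else 0.05, b)
            for a, b, c in product((0, 1), repeat=3)}

def marginal_dependence(joint):
    """Largest gap |P(a, c) - P(a) P(c)|; zero means A and C are independent."""
    gaps = []
    for a, c in product((0, 1), repeat=2):
        p_ac = sum(joint[(a, y, c)] for y in (0, 1))
        p_a = sum(joint[(a, y, z)] for y in (0, 1) for z in (0, 1))
        p_c = sum(joint[(x, y, c)] for x in (0, 1) for y in (0, 1))
        gaps.append(abs(p_ac - p_a * p_c))
    return max(gaps)

def conditional_dependence(joint):
    """Largest gap |P(a, c | b) - P(a | b) P(c | b)| over all values of B."""
    gaps = []
    for b in (0, 1):
        p_b = sum(v for (x, y, z), v in joint.items() if y == b)
        for a, c in product((0, 1), repeat=2):
            p_ac = joint[(a, b, c)] / p_b
            p_a = sum(joint[(a, b, z)] for z in (0, 1)) / p_b
            p_c = sum(joint[(x, b, c)] for x in (0, 1)) / p_b
            gaps.append(abs(p_ac - p_a * p_c))
    return max(gaps)

for name, joint in [("chain", chain_joint()), ("collider", collider_joint())]:
    print(name, round(marginal_dependence(joint), 4),
          round(conditional_dependence(joint), 4))
```

Because the patterns differ, data can in principle separate these two structures. A chain A → B → C and a fork A ← B → C, by contrast, imply the same independencies, which is why the slide says only "sometimes".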
Causal Induction – Application
Discover causal relationships from data
Need lots of data
Applied to gene regulatory networks
Data from micro-array experiments
Recent explanation of limitations
Estimating Causal Effects
Suppose A is a cause of B
A → B
What is the causal effect?
Is it p(B | A)?
Benefits of Sports?
[Figure: causal diagram linking intelligence, sport and exam result]
Is there a relationship between sport and exam success?
Data available
‘Intelligence’ correlate
Is this the correct test?
$P(\text{exam} = \text{pass} \mid \text{sport}) > P(\text{exam} = \text{pass} \mid \text{no sport})$
Benefits of Sports?
[Figure: the same diagram, with ‘sport’ observed]
$p(\text{pass} \mid \text{sport}) > p(\text{pass} \mid \text{no sport})$: 73% versus 67%
When we condition on ‘sport’, the probability for ‘intelligence’ changes, and with it the probability for ‘exam result’
What if I decide to start sport?
Intervention v Observation
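[Figure: the same diagram, with ‘sport’ changed by intervention]
Causal effect differs from conditional probability:
$P(\text{pass} \mid do(\text{sport})) < P(\text{pass} \mid do(\text{no sport}))$
Mostly interested in the consequence of a change
Causal effects can be measured by a Randomised Controlled Trial
Causal effect of sport on exam results is not identifiable from the observational data, because intelligence is not measured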
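The gap between observing and intervening can be reproduced with a small model. This sketch assumes the diagram is intelligence → sport, intelligence → exam result and sport → exam result, and uses invented probabilities (not the slide's 73% and 67%). Observing sport shifts belief towards high intelligence; intervening with do(sport) leaves intelligence at its prior, which is the stratification (adjustment) formula. Note that `intervene` needs P(intelligence), which in the slide's setting is not measured: that is exactly why the effect is not identifiable there.

```python
# Hypothetical CPTs (illustrative assumptions, not the slide's figures).
P_HIGH = 0.5                                  # P(intelligence = high)
P_SPORT = {"high": 0.8, "low": 0.2}           # P(sport | intelligence)
P_PASS = {                                    # P(pass | sport, intelligence)
    (True, "high"): 0.85, (False, "high"): 0.90,
    (True, "low"): 0.45, (False, "low"): 0.50,
}
LEVELS = ("high", "low")

def p_intel(level):
    return P_HIGH if level == "high" else 1 - P_HIGH

def p_sport(sport, level):
    return P_SPORT[level] if sport else 1 - P_SPORT[level]

def observe(sport):
    """P(pass | sport): conditioning also updates belief about intelligence."""
    num = sum(P_PASS[(sport, i)] * p_sport(sport, i) * p_intel(i) for i in LEVELS)
    den = sum(p_sport(sport, i) * p_intel(i) for i in LEVELS)
    return num / den

def intervene(sport):
    """P(pass | do(sport)): intelligence keeps its prior distribution."""
    return sum(P_PASS[(sport, i)] * p_intel(i) for i in LEVELS)

print(f"observe:   sport {observe(True):.2f}, no sport {observe(False):.2f}")
print(f"intervene: sport {intervene(True):.2f}, no sport {intervene(False):.2f}")
```

With these numbers sport looks beneficial when observed (0.77 against 0.58) but is harmful under intervention (0.65 against 0.70), the same reversal the slide describes.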
Benefit of Sport
[Figure: causal diagram over intelligence, sport (S), attendance (A) and exam result (E)]
New observable variable: ‘attendance at lectures’
Causal effect of sport on exam results now identifiable:
$P(E \mid do(S)) = \sum_{A} P(A \mid S) \sum_{S'} P(E \mid S', A)\,P(S')$
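The formula above is the front-door adjustment. The sketch below checks it numerically under an assumed graph: intelligence (unobserved) → sport, intelligence → exam result, sport → attendance → exam result, with no direct arrow from sport to exam result and none from intelligence to attendance. It builds the observational distribution P(S, A, E) with intelligence summed out, applies the formula using only that distribution, and compares the result with the true interventional value computed from the full model. All probabilities are invented for illustration:

```python
from itertools import product

def bern(p, x):
    """Probability that a binary variable with P(x = 1) = p takes value x."""
    return p if x == 1 else 1 - p

# Hypothetical model (all numbers are illustrative assumptions).
P_U = 0.5                                # P(intelligence = 1); never observed

def p_s(u):                              # P(sport = 1 | intelligence)
    return 0.8 if u else 0.2

def p_a(s):                              # P(attendance = 1 | sport)
    return 0.6 if s else 0.8

def p_e(a, u):                           # P(exam pass | attendance, intelligence)
    return 0.3 + 0.4 * a + 0.2 * u

def observational_joint():
    """P(S, A, E) with intelligence summed out: all that a study would record."""
    return {
        (s, a, e): sum(bern(P_U, u) * bern(p_s(u), s) * bern(p_a(s), a)
                       * bern(p_e(a, u), e) for u in (0, 1))
        for s, a, e in product((0, 1), repeat=3)
    }

def marg(joint, s=None, a=None, e=None):
    """Sum the joint over all entries that match the values given."""
    return sum(v for (s_, a_, e_), v in joint.items()
               if (s is None or s_ == s) and (a is None or a_ == a)
               and (e is None or e_ == e))

def front_door(joint, s):
    """P(E = 1 | do(S = s)) = sum_A P(A | s) sum_S' P(E = 1 | S', A) P(S')."""
    total = 0.0
    for a in (0, 1):
        p_a_given_s = marg(joint, s=s, a=a) / marg(joint, s=s)
        inner = sum(marg(joint, s=s2, a=a, e=1) / marg(joint, s=s2, a=a)
                    * marg(joint, s=s2) for s2 in (0, 1))
        total += p_a_given_s * inner
    return total

def true_effect(s):
    """Ground truth from the full model; requires the unobserved intelligence."""
    return sum(bern(P_U, u) * sum(bern(p_a(s), a) * p_e(a, u) for a in (0, 1))
               for u in (0, 1))

joint = observational_joint()
for s in (1, 0):
    naive = marg(joint, s=s, e=1) / marg(joint, s=s)
    print(f"S={s}: naive P(E|S) {naive:.3f}, front-door {front_door(joint, s):.3f}, "
          f"truth {true_effect(s):.3f}")
```

The front-door estimate matches the truth (0.64 with sport, 0.72 without), while the naive conditional probabilities point the other way (0.70 against 0.66).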
Estimating Causal Effects
Rules convert causal questions into statistical ones, given a causal model
Generalises, e.g., stratification and potential outcomes
Assumptions: a causal model
Some assumptions may be testable
Some variables observed, others not measured
Some causal effects identifiable
Challenges
Causal models for complex applications
Statistical implications
Example Application
Royal London trauma service
Criteria for activation of the trauma team
Aim to prevent unnecessary trauma team calls
Extensive records of trauma patient outcomes
US study of 1495 admissions proposed new ‘triage’ criteria
Significant decrease in overtriage: 51% → 29%
Non-significant increase in undertriage: 1% → 3%
None of the patients undertriaged by the new criteria died
Does this show that the new criteria are safe?
DIADEM Project
Digital Economy in Healthcare
Data, Information and Analysis for clinical DEcision Making
EPSRC Digital Economy Cluster
Partnership between solution providers and clinical data analysis problem holders
Summarise unsolved data analysis needs in relation to the analysis techniques available
Join the DIADEM cluster
Cluster Activities and Outcomes
Engage stakeholders and build a community:
A road map: data and information
Creation of a community website and forum
Meetings with potential ‘problem holders’
Workshops
Follow-up proposal
A self-sustaining website – health data analytics
Summary
Bayesian networks
Prediction and diagnosis
Causal induction
Identify (some) causal relationships from (lots of) data
Causal effects
Experimental results from …
… non-experimental data
… assumptions (causal model)
Join the DIADEM cluster