Objective Bayesian Nets for Integrating Cancer Knowledge
Sylvia Nagl PhD
Cancer Systems Biology & Biomedical Informatics
UCL London
caOBNET: Overview

Knowledge integration by objective Bayesian networks (obNETs)

Maximum entropy method

An integrated clinico-genomic obNET for breast cancer

Conclusions
Bayesian networks
Graphical models
• directed and acyclic graph (DAG)
Joint multivariate probability distribution
• with conditional independencies between variables
Given the data, optimal network topology can be estimated
• heuristic search algorithms, scoring criteria
Statistical significance of edge strengths
• Bayesian methods and bootstrapping
Apolipoprotein E gene SNPs and plasma apoE level (Rodin & Boerwinkle 2005)
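
The link between the DAG and the joint distribution can be written out as the standard Bayesian-network factorisation. The notation below is an assumption added for illustration and does not appear on the slides: X_1, ..., X_n are the variables and Pa(X_i) is the set of parents of X_i in the graph.

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Each factor corresponds to one conditional probability table (CPT), so the conditional independencies encoded by the graph are what keep the joint distribution compact.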
Knowledge integration

Cancer treatment decisions should be based on
all available knowledge

Knowledge is complex and varied:
Patient's symptoms, expert knowledge, clinical databases relating to
past patients, molecular databases, scientific papers, medical
informatics systems

Generated by independent studies with
diverse protocols
Knowledge integration

Diverse data types
Genomic, transcriptomic, proteomic, SNPs, tissue microarray,
histopathology, clinical etc.

New data types, e.g., epigenetic data

All data types capture different characteristics of a
dynamic complex system
• At different spatial and temporal scales
• Cell, tumour, patient, and therapeutic system of patient-therapy interactions

How can this disparate data be used for an integrated
understanding on which to base our actions?
Objective Bayesianism

Data and knowledge impinge on belief – we try to find a coherent
set of beliefs with best fit

Beliefs based on undefeated items of knowledge

In case of conflict, try to find compromise beliefs

Objective Bayesianism offers a formalism for determining the
beliefs that best fit background knowledge

Applying Bayesian theory, an agent’s degree of belief should be
representable by a probability function p

Empirical knowledge imposes quantitative constraints on p

Represented in an obNET (learnt from database)
obNETS for prediction
Standard algorithms can be used to calculate the probability of a
specific outcome
A direct link between variables may suggest a causal connection
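
As a minimal sketch of how such a probability can be calculated, the Python below runs inference by enumeration on a tiny, hypothetical three-node network (grade, lymph node status, recurrence). The structure and every number in the tables are invented for illustration; they are not taken from the SEER or progenetix nets, and tools such as Hugin use more efficient algorithms.

```python
from itertools import product

# Hypothetical binary network: Grade -> LN, and (Grade, LN) -> Recurrence.
# All probabilities are made up for illustration only.
P_GRADE = {0: 0.6, 1: 0.4}                     # P(G)
P_LN = {0: {0: 0.8, 1: 0.2},                   # P(LN | G)
        1: {0: 0.5, 1: 0.5}}
P_REC = {(0, 0): {0: 0.9, 1: 0.1},             # P(R | G, LN)
         (0, 1): {0: 0.7, 1: 0.3},
         (1, 0): {0: 0.6, 1: 0.4},
         (1, 1): {0: 0.3, 1: 0.7}}


def joint(g, ln, r):
    """Joint probability from the DAG factorisation P(G) P(LN|G) P(R|G,LN)."""
    return P_GRADE[g] * P_LN[g][ln] * P_REC[(g, ln)][r]


def posterior_recurrence(evidence):
    """P(R = 1 | evidence), summing the joint over all unobserved variables."""
    num = den = 0.0
    for g, ln, r in product((0, 1), repeat=3):
        state = {"G": g, "LN": ln, "R": r}
        # Skip states that contradict the observed evidence.
        if any(state[name] != value for name, value in evidence.items()):
            continue
        p = joint(g, ln, r)
        den += p
        if r == 1:
            num += p
    return num / den


if __name__ == "__main__":
    # Probability of recurrence for a patient with positive lymph nodes.
    print(posterior_recurrence({"LN": 1}))
```

With these invented tables, conditioning on positive lymph nodes gives a recurrence probability of about 0.55.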
Bayesian networks

Can BNs be integrated?
Spanning genetic/molecular and clinical levels

obNETS offer a principled path to knowledge
integration
Maximum entropy principle

Adopt the p, from all those that satisfy the constraints,
that is maximally equivocal

Williamson, J. (2002): Maximising Entropy Efficiently.
Williamson, J. (2005a): Bayesian Nets and Causality.
Williamson, J. (2005b): Objective Bayesian nets.


www.kent.ac.uk/secl/philosophy/jw/
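
"Maximally equivocal" can be made precise with Shannon entropy. The symbols here are assumptions added for illustration: x ranges over the possible states of the domain and E denotes the set of probability functions that satisfy the empirical constraints.

```latex
H(p) = -\sum_{x} p(x) \log p(x),
\qquad
p^{*} = \arg\max_{p \in \mathbb{E}} H(p)
```

The function p* is the one that commits to as little as possible beyond what the constraints require.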
Example

Two items of empirical knowledge may conflict:

Study 1: Cancer will recur in 50% of patients with a given set of
characteristics
Degree of belief in recurrence in an individual patient = 0.5

Study 2: Frequency of recurrence is 30%

Degree of belief will be constrained to closed interval [0.3,0.5]
In general:
• Belief function will lie within a closed set of probability
functions
• There will be a unique function that maximises entropy
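
The recurrence example can be checked numerically. The sketch below treats recurrence as a yes/no event and searches the interval [0.3, 0.5] for the value with the highest entropy. The function names and the grid search are illustrative choices, not the method of the cited papers.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a yes/no event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def max_entropy_in_interval(lo, hi, steps=1000):
    """Grid-search the point in [lo, hi] with the highest binary entropy."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=binary_entropy)


if __name__ == "__main__":
    # Study 1 says 0.5 and Study 2 says 0.3, so belief is constrained to [0.3, 0.5].
    best = max_entropy_in_interval(0.3, 0.5)
    print(best, binary_entropy(best))
```

Binary entropy peaks at 0.5, so within [0.3, 0.5] the least committal degree of belief is the upper end of the interval, 0.5.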
obNET integration
Original obNETs provide probability distributions
n number of nets
Maximum entropy principle
If CPTs for merged nodes disagree on probabilities, assign a closed interval and take the least committal value in that range
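
A minimal sketch of this merging rule for a binary node follows. It makes simplifying assumptions that are not from the slides: each CPT is stored as a mapping from parent state to P(node = 1 | parent state), entries are merged one at a time, and the value 0.7 for the second net is invented. A full treatment would maximise the entropy of the whole joint distribution.

```python
def least_committal(values):
    """Return the closed interval spanned by the values and its point nearest 0.5.

    For a binary variable, entropy peaks at 0.5, so the point of the interval
    closest to 0.5 is the maximum-entropy (least committal) choice.
    """
    lo, hi = min(values), max(values)
    return (lo, hi), min(max(0.5, lo), hi)


def merge_binary_cpts(*cpts):
    """Merge CPTs given as {parent_state: P(node = 1 | parent_state)}.

    Only parent states present in every CPT are merged (an assumption of
    this sketch).
    """
    shared = set(cpts[0]).intersection(*cpts[1:])
    return {s: least_committal([c[s] for c in cpts])[1] for s in sorted(shared)}


if __name__ == "__main__":
    net_a = {"loss": 0.852, "no change": 0.5}   # 0.852 as in the CPT on the slides
    net_b = {"loss": 0.7, "no change": 0.5}     # hypothetical second source
    print(merge_binary_cpts(net_a, net_b))
```

Where the two nets disagree on an entry, it is constrained to the interval [0.7, 0.852] and the merged value is 0.7, the end of the interval nearest 0.5.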
obNET integration: Proof of principle
Two obNETs from breast cancer knowledge domain

Genomic: Comparative genome hybridisation (CGH) data - progenetix database

Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)

Clinical: American Surveillance, Epidemiology and End Results (SEER) database
Clinical and genomic nets (Hugin 6.6)
SEER database: 4731 cases
?
progenetix database: 28 bands/502 cases
obNET integration
obNET learnt from 2nd progenetix dataset - 119 cases with clinical annotation (lymph node status, tumour size, grade)
CPT for LN given 22q12
22q12:    -1       0       1
LN = 0    0.148    0.5     0.148
LN = 1    0.852    0.5     0.852
Additional empirical knowledge
chr. 22
Fridlyand et al. 2006
Metastasis-associated genes
KREMEN1, MYH9, cadherin11, CD97, BMP7, ELMO2, BCAS1, BCAS4, ZNF217
KREMEN1 (Howard et al., 2003)
Biological knowledge suggests possible causal link
(in context of whole obNET – HR status!)
Knowledge integration
Multi-scale obNETs
Cancer clinical data & epidemiology
Predictive markers
Translation of clinical data to genomics research
Molecular profiling of tumours
Acknowledgements

Jon Williamson (Philosophy, University of Kent)
www.kent.ac.uk/secl/philosophy/jw/

Matt Williams (Cancer Research UK)
Nadjet El-Mehidi (Cancer Systems Biology, UCL)
Vivek Patkar (Cancer Research UK)

Contact: [email protected]


obNET integration: Proof of principle

Two obNETs

Non-independent rearrangements at chromosomal locations in breast cancer from comparative genome hybridisation (CGH) data - progenetix database

Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)

Probabilistic dependencies between clinical parameters from the American Surveillance, Epidemiology and End Results (SEER) database
HR status link
Genomic systems



Genomes are dynamic molecular systems
Selection acts on unstable cancer genomes as integrated
wholes, not just on individual oncogenes or tumour
suppressors.
A multitude of ways to ‘solve the problems’ of achieving a
survival advantage in cancer cells:
• Irreversible evolutionary processes
• Randomness of mutation
• Modularity and redundancy of complex systems
Genome-wide rearrangements



Can we identify probabilistic dependency networks in
large sample sets of genomic data from individual
tumours?
If so, under which conditions may these be interpreted
as causal networks?
Can we identify probabilistic dependency networks
involving molecular and clinical levels?
Systems Biology and Causation

Profound conceptual challenge regarding
causation in complex biological systems

Mutual dependence of physical causes

The biological relevance of any factor, and therefore “the
information” it conveys, is jointly determined, frequently in
a statistically interactive fashion, by that factor and the
system state (Susan Oyama, The Ontogeny of
Information, 2000)
The influence of a gene, or a genetic mutation, depends
on the context, such as availability of other molecular
agents and the state of the biological system, including the
rest of the genome

System state
Physical agents
Cell networks are dynamically instantiated – genes for components are
switched on or off in response to signals and cell state
System state
Cell networks are reconfigured in response to
changes in environment or cell’s internal state
System state
Cell computation networks are reconfigured in response to
changes in environment or cell’s internal state
Cancer: Genome instability re-programs cell networks
Selection for increased proliferation, resistance, invasiveness etc.
Driven by tumour cell – tissue interactions
Proof of principle

Screen the whole genome for chromosomal abnormalities in one experiment
Cytogenetics

Comparative genomic hybridization (CGH)

Fluorescence in situ hybridization (FISH) and multicolour fluorescence in situ hybridization (MFISH)

Detection of allelic instabilities, loss of heterozygosity (LOH)