PowerPoint presentation of comments by the expert


W07
The Discovery Challenge
on Thrombosis Data
Data Provider
Katsuhiko TAKABAYASHI MD
Chiba University Hospital, Japan
Anti-Phospholipid antibody Syndrome
(APS)
Anti-cardiolipin antibodies (aCL)
Lupus anticoagulant (LAC)
• Induces thrombotic events
(such as AMI, stroke, deep venous
thrombosis, miscarriage, pulmonary
hypertension, etc.)
• Sometimes positive in other collagen
diseases (lupus, Sjoegren syndrome)

[Figure: coagulation cascade. Labels: TF, VII, VIIa, TF/VIIa, XII, XIIa, XI, XIa, IX, IXa, VIII, VIIIa, X, Xa, V, Va, prothrombin, thrombin, fibrinogen, fibrin, XIII, XIIIa; cofactors Ca++, Mg++, PL; Protein C]
Collagen disease
SLE
APS
RA
Collagen Diseases



Autoimmune disease
Rheumatic disease
Connective tissue disease
autoantibodies
APS
Thrombosis: vessel stasis by blood clots
Myocardial Infarction, Stroke etc.
Thrombosis

Who in APS?

When?

Which laboratory data are related to
thrombosis, in addition to anti-cardiolipin
antibodies?
The Goal of this trial (1)
Assessment of validity of each study

Whether a data mining technique can correctly
pick out the key factors (aCL, LAC, PT,
APTT) that are already known to be
related to thrombosis from the
many variables we provided.
The Goal of this trial (2)
The Results to expect
1) To identify high-risk patients who have
no history of thrombosis so far.
2) To predict the time of thrombosis, or to
detect changes in some variables
over the course of thrombosis, from the
series of temporal data.
Evaluation of the results
From the current medical point of view:





Common sense results (positive control)
Probable results
Possible results
Unclear results, difficult to evaluate
Nonsense results (negative control)
We cannot judge what we do not know!
A study most of whose results have low accordance with
current knowledge: domain researchers
cannot believe the rest of its unclear results!
A study most of whose results have good accordance with
current knowledge: domain researchers still cannot
say that its other unclear results are also true.
Assessment in domain field
Medical Data Set



The medical data set here comes from 1241
patients with collagen diseases; 7
basic laboratory items for aCL from 806
cases were also provided.
As temporal laboratory data, 41
items from a total of 57,543 tests over 17 years
were prepared.
Seventy-six cases had some thrombotic
events in their clinical course.
Evaluations from medical aspects
Coursac I et al



The bridge theory ;
Genetic Programming
It can predict patients' health state from
spe-exams and lab-exams in 99.28%.
CNS lupus is related to the anti-DNA Ab
level and IgM-type aCL.
aCL IgM and anti-DNA Ab levels are each
independently related to future
thrombosis.
Evaluations from medical aspects
Boulicaut et al




δ-strong classification rules
Many rules had 100% confidence, but
most of them were not useful.
One rule: aCL > 2.4, aCL IgM between
1.9 and 2.7, and KCT (-) implies SLE.
Another rule: sex is M and ANA is 0 implies Behcet.
We would like to look at the other rules not
written here to find attractive ones.
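
The remark about rules with 100% confidence can be illustrated with a minimal sketch. The records and field names below are invented toy data, not the challenge data set, and `rule_stats` is a hypothetical helper. It shows that a rule can reach 100% confidence while covering only a single patient, which is one reason such rules may not be useful.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    covered = [r for r in records if antecedent(r)]
    hits = [r for r in covered if consequent(r)]
    support = len(hits) / len(records) if records else 0.0
    confidence = len(hits) / len(covered) if covered else 0.0
    return support, confidence


# Toy records (hypothetical values, for illustration only).
records = [
    {"sex": "M", "ANA": 0, "diagnosis": "Behcet"},
    {"sex": "F", "ANA": 64, "diagnosis": "SLE"},
    {"sex": "F", "ANA": 256, "diagnosis": "SLE"},
    {"sex": "F", "ANA": 0, "diagnosis": "RA"},
    {"sex": "M", "ANA": 16, "diagnosis": "SLE"},
]

# Rule in the style of the slide: sex is M and ANA is 0 -> Behcet.
support, confidence = rule_stats(
    records,
    antecedent=lambda r: r["sex"] == "M" and r["ANA"] == 0,
    consequent=lambda r: r["diagnosis"] == "Behcet",
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Reporting support next to confidence lets the domain expert see at a glance how many patients stand behind each rule.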
Evaluations from medical aspects
Jensen S et al


CRISP
(cross-industry standard process)
LAC, ANA, U-pro, centromere-type, SSA,
SSB, RNP, SM and Scl-70 were strong contributors
to predicting the presence of thrombosis.
Other possible causes of thrombosis without aCL
antibodies.
Evaluations from medical aspects
Jensen S et al


Sequential analysis of the temporal data did
not show interesting results.
It might be difficult to predict the time of
thrombosis. One possibility is that the data
were modified by treatment or
prophylaxis.
Bias by physicians

Modification of treatment

Selection of the cases, laboratory data
Evaluations from medical aspects
Werner J and Fogarty T



Genetic programming
determined a discriminant function that
separates occurrences of thrombosis with
very few false negatives.
However, is it possible to translate its
meaning so that we can understand it?
Weighting?
When the Results Are Beyond the Expert's
Knowledge

Complicated relations might be difficult to
explain.
No trials of relations among three drugs have been carried out.
Results that come out of a black box might be
ignored by experts simply because they
cannot understand them!
Evaluations from medical aspects
Zytkow J and Gupta S: SQL; cross contingency
classification
• Reasonable results, as with InfoZoom.
• ANA pattern analysis
• Patients with severe attacks are more
likely to have other attacks.
• Thrombosis is related to the level of aCLs.
• Alveolar hemorrhage and CNS attacks are
not associated with milder attacks.
Evaluations from medical aspects



Beilken and Spenke (InfoZoom): thanks to the
user-friendly interface, their test results were easy
to understand. They could choose
reasonable and interesting rules.
Levin: WizWhy produced 7356 rules.
Complicated rules are difficult to comment on
because of their complexity.
Taylor: in the temporal data, missing data
disturbed the analysis. Only common-sense
findings were selected.
To obtain good results efficiently
(1)

Cleaning of data
Preprocessing of the data by domain
researchers who are familiar with the
database is essential to minimize noise.
Definition, classification, adjustment, etc.
Recognition of modification by
treatment or prophylaxis.
Guidance on how to treat missing data
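
As a starting point for discussing missing data with the domain researchers, the following minimal sketch reports how often each laboratory item is missing. The rows and item names are invented for illustration, and `missing_rate` is a hypothetical helper. How the gaps should then be handled is exactly the guidance the data provider would need to give.

```python
def missing_rate(rows, items):
    """Fraction of rows in which each item is missing (None or absent)."""
    total = len(rows)
    return {
        item: (sum(1 for row in rows if row.get(item) is None) / total
               if total else 0.0)
        for item in items
    }


# Toy laboratory rows (hypothetical values, for illustration only).
rows = [
    {"aCL_IgG": 1.2, "aCL_IgM": None, "LAC": "-"},
    {"aCL_IgG": None, "aCL_IgM": None, "LAC": "+"},
    {"aCL_IgG": 3.4, "aCL_IgM": 2.1, "LAC": None},
    {"aCL_IgG": 0.8, "aCL_IgM": None, "LAC": "-"},
]

rates = missing_rate(rows, ["aCL_IgG", "aCL_IgM", "LAC"])
for item, rate in rates.items():
    print(f"{item}: {rate:.0%} missing")
```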
To obtain good results efficiently
(2)
Introduction of the domain knowledge

Include as much medical knowledge as possible
with the data set from the beginning.
Cooperate with domain researchers to
obtain domain knowledge during data
mining.
Causal Relation
Misjudgment of temporal meaning
Backward and non-objective relationships
Bacteria invade
Pneumonia occurs
Bacteria have invaded
Pneumonia occurs
Bacteria will invade
Pneumonia occurs
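
The temporal pitfall above can be guarded against by checking dates before a relation is read as causal. This is a minimal sketch with hypothetical function names and invented dates: a measurement taken after the event cannot have predicted it, and treating it as a predictor would produce a backward relationship.

```python
from datetime import date


def temporal_order(exposure, outcome):
    """Describe when the candidate cause was observed relative to the outcome."""
    if exposure < outcome:
        return "exposure precedes outcome"
    if exposure > outcome:
        return "exposure follows outcome"
    return "same day"


def usable_as_predictor(measurement_date, event_date):
    """Only measurements taken strictly before the event may be used to predict it."""
    return measurement_date < event_date


# Invented dates, for illustration only.
lab_test = date(1995, 3, 10)
thrombosis = date(1995, 6, 1)
print(temporal_order(lab_test, thrombosis))
print(usable_as_predictor(date(1995, 7, 1), thrombosis))
```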
To obtain good results efficiently
(3)
Cooperation with domain researchers

An interactive technique will avoid users'
dissatisfaction with a black box and help
steer the analysis in the right direction.
A hypothetico-deductive method will be
easily accepted by physicians.
Causal Relation
Misjudgment of temporal meaning
Backward and non-objective relationships
It rains
The road is wet.
It rains
The road is wet.
It will rain
The road is wet.
Data mining




Retrospective approach: the data are not arranged and
contain much noise.
Data: a more genuine and adequate data set
must be prepared. Terms, definitions and
background must be introduced beforehand.
Rules: complicated rules (relations among
more than 3 items) found by this analysis can
be neither explained nor proved true
from a medical standpoint.
There are no clinical trials involving three drugs.
Bias
By Physicians


Modification of treatment
Selection of the cases, laboratory data
By Accident

Change of the disease: before and after the
events (thrombosis)