Datamining @ ARTreat Project

Datamining @ ARTreat
Veljko Milutinović
Zoran Babović
Nenad Korolija
Goran Rakočević
Marko Novaković

Agenda
 ARTreat – the project
 Atherosclerosis – the basics
 Plaque classification
 Hemodynamic analysis
 Data mining for the hemodynamic problem
 Data mining from patient records
2/28
ARTreat – the project
 ARTreat aims to provide a patient-specific computational model
of the cardiovascular system, used to improve the quality of prediction
of atherosclerosis progression and its propagation
into life-threatening events.
 FP7 Large-scale Integrating Project (IP)
 16 partners
 Funding: 10,000,000 €
3/28
Atherosclerosis
 Atherosclerosis is the condition in which an artery wall thickens
as the result of a build-up of fatty materials such as cholesterol
4/28
Atherosclerotic plaque
 Begins as a fatty streak (an ill-defined yellow lesion, the fatty plaque),
which develops edges and evolves into fibrous plaques:
whitish lesions with a grumous lipid-rich core
5/28
Plaque components
 Fibrous, Lipid, Calcified, Intra-plaque Hemorrhage
6/28
Plaque classification
 Different types of plaque pose different risks
 Manual plaque classification (done by doctors)
is a difficult task, and is error prone
 Idea: develop an AI algorithm
to distinguish between different types of plaque
 Visual data mining
7/28
Plaque classification (2)
 Developed by Foundation for Research and Technology
 Based on Support Vector Machines
 Uses images produced by IVUS and MRI
that are hand-labeled by physicians
 Up to 90% accurate
8/28
Data mining task in Belgrade
 Two separate paths:
 Data mining from the results of hemodynamic simulations
 Data mining from medical patient records
 Goal:
to provide input regarding the progression of the disease
to be used for medical decision support
9/28
Hemodynamics – the basics
 Study of the flow of blood
through the blood vessels
 Maximum Wall Shear Stress –
an important parameter
for plaque development prognoses
10/28
Hemodynamics - CFD
 Classical methods for hemodynamic calculations
employ Computational Fluid Dynamics (CFD) methods
 Involves solving the Navier-Stokes equation:
 …but involves solving it millions of times!
 One simulation can take weeks
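For reference, a commonly used form of the Navier-Stokes equations is given below. It assumes incompressible, Newtonian flow, which is an assumption made here for illustration and not necessarily the formulation used in the project. The symbols are also illustrative: u is the velocity field, p the pressure, ρ the density, μ the dynamic viscosity and f a body force.

```latex
\rho\left(\frac{\partial \mathbf{u}}{\partial t}
      + (\mathbf{u}\cdot\nabla)\,\mathbf{u}\right)
  = -\nabla p + \mu\,\nabla^{2}\mathbf{u} + \mathbf{f},
\qquad
\nabla\cdot\mathbf{u} = 0
```

A CFD solver discretizes these equations over the vessel geometry and solves them at every mesh point and time step, which is where the very large number of solves comes from.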
11/28
Data mining from hemodynamic simulations
(first path)
 Idea: use results of previously done simulations
 Train a data mining AI system capable of regression analysis
 Use the system to estimate the desired values
in a much shorter time
12/28
Neural Networks - background
 Systems that are inspired by the principle of operation
of biological neural systems (brain)
13/28
Neural Networks – the basics
 A parallel, distributed information processing structure
 Each processing element has a single output which branches
(“fans out”) into as many collateral connections as desired
 One input layer, one output layer, and one or more hidden layers
14/28
Artificial neurons
 Each node (neuron) consists of two segments:
 Integration function
 Activation function
 Common activation function
 Sigmoid
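The two segments of a neuron can be sketched as follows. This is a minimal illustration that assumes the integration function is a weighted sum plus a bias (the most common choice; the slides do not specify it). The function names are hypothetical.

```python
import math

def integrate(inputs, weights, bias):
    """Integration function (assumed here): weighted sum of inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    """Sigmoid activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Output of one neuron: activation applied to the integrated input."""
    return sigmoid(integrate(inputs, weights, bias))

# 0.5 * 2.0 + (-1.0) * 1.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron_output([0.5, -1.0], [2.0, 1.0], 0.0))
```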
15/28
Neural Networks - backpropagation
 A training method for neural networks
 Try to minimize the error function
by adjusting the weights
 Gradient descent:
 Calculate the “blame” of each input for the output error
 Adjust the weights by a step against the error gradient,
scaled by γ (the learning rate)
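In a standard formulation, the error function and the weight update can be written as below. This is a sketch under assumptions: the squared-error form and every symbol other than γ are chosen here for illustration (t_k is the target value, o_k the network output, w_ij a weight, σ the sigmoid).

```latex
E = \frac{1}{2}\sum_{k}\left(t_{k} - o_{k}\right)^{2},
\qquad
\Delta w_{ij} = -\gamma\,\frac{\partial E}{\partial w_{ij}},
\qquad
\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```

The last identity is what makes the sigmoid convenient for backpropagation: the derivative needed for the gradient is computed directly from the neuron's output.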
16/28
Input data set
 Carotid artery
 11 geometric parameters and the MWSS value
17/28
The model
 One hidden layer
 Input layer: linear
 Hidden and output:
sigmoid
 Learning rate 0.6
 500K training cycles
 Decay and momentum
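A network of this shape can be sketched in plain Python as follows. It is not the project's implementation: the class name TinyMLP, the 4 hidden units, the momentum of 0.5, the multiplicative learning-rate decay of 0.999, the 500 cycles and the two-input toy data are all assumptions for illustration (the real model has 11 geometric inputs). Only the single hidden layer, the sigmoid units and the learning rate of 0.6 come from the slide. Because the output unit is a sigmoid, target values must be scaled into (0, 1).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One hidden layer, sigmoid hidden and output units, a single output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # One row of weights per hidden unit; the last entry is the bias.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, kept for the momentum term.
        self.dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_o = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]  # linear input layer: values pass through
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_h]
        hb = h + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w_o, hb)))
        return xb, hb, y

    def train_step(self, x, target, lr, momentum):
        xb, hb, y = self.forward(x)
        # Output delta for the squared error E = 0.5 * (target - y)^2.
        delta_o = (y - target) * y * (1.0 - y)
        # Hidden deltas: each hidden unit's share of the output error,
        # computed before the output weights are changed.
        delta_h = [delta_o * self.w_o[j] * hb[j] * (1.0 - hb[j])
                   for j in range(len(self.w_h))]
        for j, v in enumerate(hb):
            self.dw_o[j] = -lr * delta_o * v + momentum * self.dw_o[j]
            self.w_o[j] += self.dw_o[j]
        for j, row in enumerate(self.w_h):
            for i, v in enumerate(xb):
                self.dw_h[j][i] = (-lr * delta_h[j] * v
                                   + momentum * self.dw_h[j][i])
                row[i] += self.dw_h[j][i]

def mean_error(net, data):
    return sum(0.5 * (t - net.forward(x)[2]) ** 2 for x, t in data) / len(data)

def train(net, data, cycles, lr=0.6, momentum=0.5, decay=0.999):
    for _ in range(cycles):
        for x, t in data:
            net.train_step(x, t, lr, momentum)
        lr *= decay  # assumed: multiplicative learning-rate decay per cycle

# Toy data: two inputs in [0, 1], targets already scaled into (0, 1).
data = [([i / 10, j / 10], 0.1 + 0.4 * (i / 10) + 0.4 * (j / 10) ** 2)
        for i in range(0, 11, 2) for j in range(0, 11, 2)]

net = TinyMLP(n_in=2, n_hidden=4)
before = mean_error(net, data)
train(net, data, cycles=500)
after = mean_error(net, data)
print("mean error before:", before, "after:", after)
```

Once trained, a single forward pass is a few multiplications and additions, which is the source of the speed-up over running a new CFD simulation.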
18/28
Current results
 Average error: 8.6%
 Maximum error: 16.9%
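Figures of this kind could be computed as in the sketch below. It assumes the error is the relative error |predicted - actual| / |actual|, expressed in percent; the slides do not define the metric, and the function names and sample values are hypothetical.

```python
def relative_errors(predicted, actual):
    """Relative error of each prediction, in percent (assumed definition)."""
    return [abs(p - a) / abs(a) * 100.0 for p, a in zip(predicted, actual)]

def summarize(predicted, actual):
    """Return the (average, maximum) relative error in percent."""
    errs = relative_errors(predicted, actual)
    return sum(errs) / len(errs), max(errs)

# Made-up values, used only to show the calculation.
avg, worst = summarize([9.0, 22.0, 30.0], [10.0, 20.0, 30.0])
print(f"average error: {avg:.1f}%, maximum error: {worst:.1f}%")
```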
19/28
The “dreaded” line 4
 Line 4 of the original test set proved difficult to predict
 Error was over 30%
 Turned out to be an outlier
 Combination of parameters was such that it couldn’t
 But the CFD worked, NN worked
 Visually the geometry looked fine
 Goes to show how challenging the data preprocessing can be
20/28
Data mining from medical data
(second path)
 Use a large medical database (3000 patients)
to attempt to find patterns
that help predict the progression of atherosclerosis
 Data include:
 Coronary angiography results
 Blood chemistry
 Risk factors (such as smoking, obesity, family history, etc.)
21/28
Repeated angio dataset
 90 different parameters
 Includes data from two coronary angiographies
taken at different times (intervals between 3 months and 10 years)
22/28
Current approach

Divide the patients into three categories,
according to the second angio:
 Less than 50% stenosis
 50-75% stenosis
 More than 75% stenosis
(percentages chosen based on the dataset values)
 Use neural network and SVM classifiers to attempt classification
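The three-way division above can be sketched as a labeling function. The function name is hypothetical, and placing exactly 50% and exactly 75% in the middle class is an assumption read off the category wording.

```python
def stenosis_class(stenosis_percent):
    """Map the stenosis found at the second angiography to class 1, 2 or 3."""
    if not 0.0 <= stenosis_percent <= 100.0:
        raise ValueError("stenosis must be a percentage between 0 and 100")
    if stenosis_percent < 50.0:
        return 1  # less than 50% stenosis
    if stenosis_percent <= 75.0:
        return 2  # 50-75% stenosis
    return 3      # more than 75% stenosis

print([stenosis_class(s) for s in (30.0, 50.0, 75.0, 90.0)])  # [1, 2, 2, 3]
```

These labels would be the targets that the classifiers are trained to predict from the remaining patient parameters.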
23/28
Current results

Current results: 80% accuracy,

But:
 Division is very crude (“inherited” from the dataset)
 Misclassifications sometimes happen between class 1 and class 3
 Dataset lacks healthy and less critical patients
 LDL data are missing

Further improvements are needed, both in the algorithms and in the data,
to make the results significant
24/28
Genetic data
 Single coronary angiography
 Blood chemistry
 Medications
 Single Nucleotide Polymorphism (SNP) data
on selected DNA sequences
25/28
…and now for something
completely different
26/28
Questions
27/28
Datamining @ ARTreat
Project
Veljko Milutinović
Zoran Babović
Nenad Korolija
Goran Rakočević
Marko Novaković