Datamining @ ARTreat Project
Datamining @ ARTreat
Veljko Milutinović
Zoran Babović
Nenad Korolija
Goran Rakočević
Marko Novaković
Agenda
ARTreat – the project
Atherosclerosis – the basics
Plaque classification
Hemodynamic analysis
Data mining for the hemodynamic problem
Data mining from patient records
2/28
ARTreat – the project
ARTreat aims to provide a patient-specific computational model
of the cardiovascular system, used to improve the prediction
of atherosclerosis progression and of its propagation
into life-threatening events
FP7 Large-scale Integrating Project (IP)
16 partners
Funding: 10,000,000 €
3/28
Atherosclerosis
Atherosclerosis is the condition in which an artery wall thickens
as a result of a build-up of fatty materials such as cholesterol
4/28
Atherosclerotic plaque
Begins as a fatty streak, an ill-defined yellow lesion;
the fatty plaque then develops edges and evolves into a fibrous plaque,
a whitish lesion with a grumous, lipid-rich core
5/28
Plaque components
Fibrous, Lipid, Calcified, Intra-plaque Hemorrhage
6/28
Plaque classification
Different types of plaque pose different risks
Manual plaque classification by physicians
is difficult and error-prone
Idea: develop an AI algorithm
to distinguish between different types of plaque
Visual data mining
7/28
Plaque classification (2)
Developed by the Foundation for Research and Technology
Based on Support Vector Machines (SVMs)
Works on IVUS (intravascular ultrasound) and MRI images
hand-labeled by physicians
Up to 90% accuracy
8/28
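The slide names the technique but not the features or kernel used. As a minimal sketch of the underlying idea only, the code below trains a linear SVM with the Pegasos stochastic sub-gradient method on hypothetical two-dimensional plaque feature vectors. The features, the labels, and the helper names train_linear_svm and predict are illustrative assumptions, not the FORTH implementation, which worked on IVUS and MRI image data.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos (stochastic sub-gradient descent on
    the L2-regularized hinge loss).

    xs: feature vectors; ys: labels in {-1, +1}. A constant 1.0 feature is
    appended so the last weight acts as a bias (it is regularized along with
    the other weights, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization part of the sub-gradient shrinks every weight.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss part contributes only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 for a single feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


# Hypothetical 2-D features, e.g. +1 = lipid-rich plaque, -1 = fibrous plaque.
xs = [(2.0, 1.5), (1.5, 2.5), (2.5, 2.0), (-2.0, -1.5), (-1.5, -2.5), (-2.5, -2.0)]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(xs, ys)
print([predict(w, x) for x in xs])  # [1, 1, 1, -1, -1, -1]
```

A linear kernel keeps the sketch short; features derived from images are often not linearly separable, so a kernel SVM (for example with an RBF kernel) may be closer to what such a classifier needs.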
Data mining task in Belgrade
Two separate paths:
Data mining from the results of hemodynamic simulations
Data mining from medical patient records
Goal:
to provide input regarding the progression of the disease
to be used for medical decision support
9/28
Hemodynamics – the basics
Study of blood flow
through the blood vessels
Maximum Wall Shear Stress (MWSS):
an important parameter
for predicting plaque development
10/28
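To give a feel for the quantity, the sketch below evaluates wall shear stress in the one case with a simple closed-form answer: steady laminar (Poiseuille) flow of a Newtonian fluid in a straight, rigid tube, where tau_w = 4·mu·Q / (pi·R³). The blood viscosity of 3.5 mPa·s and the flow and radius values are assumptions chosen for illustration.

```python
import math


def poiseuille_wss(flow_m3_s, radius_m, viscosity_pa_s=3.5e-3):
    """Wall shear stress (Pa) for steady laminar flow of a Newtonian fluid
    in a straight rigid tube: tau_w = 4 * mu * Q / (pi * R**3).

    The default viscosity (3.5 mPa*s) is a typical textbook value for blood,
    used here as an assumption.
    """
    return 4.0 * viscosity_pa_s * flow_m3_s / (math.pi * radius_m ** 3)


# Hypothetical inputs: 6 mL/s through a vessel of 3 mm radius.
print(round(poiseuille_wss(6e-6, 3e-3), 2))  # 0.99
# Because of the R**3 term, halving the radius at the same flow raises WSS eightfold.
```

In a straight tube this value is the same everywhere along the wall; the maximum WSS that matters for plaque prognosis arises in curved, branching, or narrowed vessels such as the carotid bifurcation, where no such closed form exists.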
Hemodynamics – CFD
Classical methods for hemodynamic calculations
employ Computational Fluid Dynamics (CFD)
This involves solving the Navier-Stokes equations
…but solving them millions of times over!
A single simulation can take weeks
11/28
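For reference, these are the incompressible Navier-Stokes equations in the form commonly used for blood flow when blood is assumed to be an incompressible Newtonian fluid (v: velocity, p: pressure, rho: density, mu: dynamic viscosity):

```latex
\rho\left(\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}\right)
  = -\nabla p + \mu\,\nabla^{2}\mathbf{v},
\qquad
\nabla\cdot\mathbf{v} = 0
```

CFD discretizes these over a mesh of the vessel and solves them repeatedly over the time steps of the cardiac cycle, which helps explain the very large number of solves and the long run times.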
Data mining from hemodynamic simulations
(first path)
Idea: reuse the results of previously run simulations
Train a data mining AI system capable of regression analysis
Use the trained system to estimate the desired values
in a much shorter time
12/28
Neural Networks – background
Systems inspired by the principle of operation
of biological neural systems (the brain)
13/28
Neural Networks – the basics
A parallel, distributed information processing structure
Each processing element has a single output that branches
(“fans out”) into as many collateral connections as desired
One input layer, one output layer, and one or more hidden layers
14/28
Artificial neurons
Each node (neuron) consists of two parts:
Integration function
Activation function
Common activation function:
sigmoid
15/28
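A minimal sketch of one neuron, assuming the common choice of a weighted sum plus bias as the integration function and the sigmoid as the activation, together with a forward pass through fully connected layers. The helper names and the weights in the example are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Integration (weighted sum + bias) followed by sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def forward(x, layers):
    """Propagate x through a list of layers.

    Each layer is a list of (weights, bias) pairs, one per neuron. The input
    layer is linear, so x is fed in unchanged; every neuron's single output
    fans out to all neurons of the next layer.
    """
    activations = list(x)
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations


# Hypothetical 2-2-1 network: two hidden neurons, one output neuron.
hidden = [([0.5, -0.25], 0.0), ([1.0, 1.0], -1.0)]
output = [([1.0, -1.0], 0.0)]
print(forward([1.0, 2.0], [hidden, output]))
```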
Neural Networks – backpropagation
A training method for neural networks
Minimize the error function
by adjusting the weights
Gradient descent:
Calculate the “blame” of each weight for the output error
Adjust each weight against its error gradient,
scaled by γ (the learning rate)
16/28
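In the usual formulation, assumed here to use the sum-of-squared-errors criterion, with target outputs t_k, network outputs o_k, hidden-unit outputs h_j and learning rate γ, the error function, the gradient-descent weight update, and the resulting gradient for a sigmoid output unit are:

```latex
E = \frac{1}{2}\sum_{k}\left(t_k - o_k\right)^{2},
\qquad
\Delta w_{ij} = -\gamma\,\frac{\partial E}{\partial w_{ij}}

% For a sigmoid output unit k with net input net_k = \sum_j w_{jk} h_j:
\frac{\partial E}{\partial w_{jk}} = -\left(t_k - o_k\right) o_k \left(1 - o_k\right) h_j
```

The factor (t_k − o_k) o_k (1 − o_k) is the output unit's share of the blame; it is propagated back through the weights w_jk to obtain the corresponding terms for the hidden layer.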
Input data set
Carotid artery geometries
Each sample: 11 geometric parameters and the corresponding MWSS value from CFD
17/28
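Since the model uses a sigmoid output unit, its values lie strictly between 0 and 1, so MWSS targets in physical units usually have to be rescaled before training and mapped back afterwards; inputs measured in different units benefit from the same treatment. The deck does not say which normalization was used. The min-max sketch below is one common option, and the 0.1 to 0.9 target band, the helper names, and the sample values are assumptions.

```python
def scale(value, lo, hi, out_lo=0.1, out_hi=0.9):
    """Map value from [lo, hi] into [out_lo, out_hi].

    The default 0.1-0.9 band keeps targets away from 0 and 1, which a
    sigmoid output unit can only approach asymptotically. A constant
    column (hi == lo) maps to out_lo.
    """
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)


def unscale(value, lo, hi, out_lo=0.1, out_hi=0.9):
    """Inverse of scale(): map a network output back to physical units."""
    return lo + (value - out_lo) / (out_hi - out_lo) * (hi - lo)


# Hypothetical MWSS training targets (Pa); lo/hi come from the training set only.
mwss = [2.0, 4.0, 6.0]
lo, hi = min(mwss), max(mwss)
targets = [scale(v, lo, hi) for v in mwss]        # approximately [0.1, 0.5, 0.9]
restored = [unscale(t, lo, hi) for t in targets]  # approximately [2.0, 4.0, 6.0]
```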
The model
One hidden layer
Input layer: linear
Hidden and output layers: sigmoid
Learning rate: 0.6
500,000 training cycles
Decay and momentum terms
18/28
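The slide lists the training settings but not the exact update rule. One common way to combine them, assumed here, is online backpropagation with L2 weight decay and classical momentum: v ← α·v − γ·(∂E/∂w + λ·w), then w ← w + v. The sketch trains a tiny network with the same layer types on made-up data; only the learning rate of 0.6 comes from the slide, while the momentum (0.5), decay (1e-4), hidden-layer size, cycle count, and data are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """Linear input layer, one sigmoid hidden layer, one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the incoming weights plus a trailing bias weight.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Momentum (velocity) buffers, same shapes as the weights.
        self.v_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v_o = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]  # inputs pass through unchanged (linear layer)
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_h]
        hb = h + [1.0]
        o = sigmoid(sum(w * v for w, v in zip(self.w_o, hb)))
        return xb, hb, o

    def train_step(self, x, t, lr=0.6, momentum=0.5, decay=1e-4):
        """One online backprop update; returns the sample error 1/2 (t - o)^2."""
        xb, hb, o = self.forward(x)
        # -dE/dnet for the sigmoid output unit.
        d_o = (t - o) * o * (1.0 - o)
        # Hidden deltas, computed with the output weights before updating them.
        d_h = [d_o * self.w_o[j] * hb[j] * (1.0 - hb[j])
               for j in range(len(self.w_h))]
        # Gradient plus L2 decay (applied to biases too, a simplification),
        # then a classical momentum step.
        for j in range(len(self.w_o)):
            grad = -d_o * hb[j] + decay * self.w_o[j]
            self.v_o[j] = momentum * self.v_o[j] - lr * grad
            self.w_o[j] += self.v_o[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                grad = -d_h[j] * xb[i] + decay * row[i]
                self.v_h[j][i] = momentum * self.v_h[j][i] - lr * grad
                row[i] += self.v_h[j][i]
        return 0.5 * (t - o) ** 2


def mean_error(net, data):
    """Average of 1/2 (t - o)^2 over the data set."""
    return sum(0.5 * (t - net.forward(x)[2]) ** 2 for x, t in data) / len(data)


# Hypothetical toy data: 2 scaled inputs, targets already inside (0.1, 0.9).
data = [((a, b), 0.1 + 0.4 * (a + b)) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
net = TinyMLP(n_in=2, n_hidden=3)
before = mean_error(net, data)
for _ in range(2000):  # one pass over the toy data per cycle
    for x, t in data:
        net.train_step(x, t)
after = mean_error(net, data)
print(f"mean error before: {before:.4f}, after: {after:.5f}")
```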
Current results
Average error: 8.6%
Maximum error: 16.9%
19/28
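The deck does not define the error measure. Assuming it is the absolute relative error of each predicted MWSS against the CFD result, averaged and maximized over the test set, it can be computed as follows; the helper names and values are made up for illustration.

```python
def relative_errors_pct(predicted, reference):
    """Absolute relative error (%) of each prediction against its CFD value."""
    return [abs(p - r) / abs(r) * 100.0 for p, r in zip(predicted, reference)]


def summarize(predicted, reference):
    """Return (average error %, maximum error %) over a test set."""
    errs = relative_errors_pct(predicted, reference)
    return sum(errs) / len(errs), max(errs)


# Made-up MWSS values (Pa): network prediction vs. CFD reference.
avg, worst = summarize(predicted=[11.0, 19.0, 40.0], reference=[10.0, 20.0, 40.0])
print(f"average {avg:.1f}%, maximum {worst:.1f}%")  # average 5.0%, maximum 10.0%
```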
The “dreaded” line 4
Line 4 of the original test set proved difficult to predict
Error was over 30%
Turned out to be an outlier
Combination of parameters was such that it couldn’t
Yet both the CFD simulation and the NN produced results
Visually, the geometry looked fine
This shows how challenging data preprocessing can be
20/28
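One generic way to flag such rows before training, offered as an illustration rather than as the procedure used in the project, is a distance-based outlier score: standardize every parameter, then measure how far each row lies from its k-th nearest neighbour. Unlike a per-parameter range check, this also catches rows whose individual values are ordinary but whose combination is not. The value of k, the helper names, and the data are assumptions.

```python
import math


def standardize(rows):
    """Z-score every column so that parameters in large units do not dominate."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def knn_outlier_scores(rows, k=2):
    """Score each row by the distance to its k-th nearest neighbour
    in standardized space; larger means more isolated."""
    z = standardize(rows)
    scores = []
    for i, a in enumerate(z):
        dists = sorted(math.dist(a, b) for j, b in enumerate(z) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical data: two parameters that normally rise together, plus one row
# whose values are each within range but whose combination is not.
rows = [(float(i), float(i)) for i in range(10)] + [(0.0, 9.0)]
scores = knn_outlier_scores(rows)
print(max(range(len(rows)), key=scores.__getitem__))  # 10, the last row
```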
Data mining from medical data
(second path)
Use a large medical database (3,000 patients)
to find patterns
that help predict the progression of atherosclerosis
Data include:
Coronary angiography results
Blood chemistry
Risk factors (such as smoking, obesity, and family history)
21/28
Repeated angio dataset
90 different parameters
Includes data from two coronary angiographies
taken at different times (intervals from 3 months to 10 years)
22/28
Current approach
Divide the patients into three categories,
according to the second angio:
Less than 50% stenosis
50-75% stenosis
More than 75% stenosis
(thresholds chosen based on the dataset values)
Apply neural network and SVM classifiers to the task
23/28
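The three-way split can be written as a labelling function on the stenosis percentage from the second angiography. The slide does not say which class the boundary values 50% and 75% belong to; this sketch assumes both fall into the middle class, and the function name is hypothetical.

```python
def stenosis_class(percent):
    """Return 1 (<50%), 2 (50-75%, both ends inclusive by assumption) or 3 (>75%)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"stenosis must be between 0 and 100%, got {percent}")
    if percent < 50.0:
        return 1
    if percent <= 75.0:
        return 2
    return 3


print([stenosis_class(p) for p in (30, 50, 75, 90)])  # [1, 2, 2, 3]
```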
Current results
80% accuracy, but:
The division is very crude (“inherited” from the dataset)
Misclassifications sometimes occur between class 1 and class 3
The dataset lacks healthy and less critical patients
LDL data are missing
Further improvements, in both the algorithms and the data,
are needed to make the results significant
24/28
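A confusion matrix shows what a single accuracy figure hides: how many class 1 patients were predicted as class 3 and vice versa, the most serious errors on a three-level stenosis scale. The labels below are invented for illustration and do not come from the project dataset; the helper names are hypothetical.

```python
def confusion_matrix(actual, predicted, classes=(1, 2, 3)):
    """Rows: actual class, columns: predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m


def accuracy(m):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


# Invented labels for ten patients.
actual = [1, 1, 1, 2, 2, 3, 3, 3, 3, 1]
predicted = [1, 1, 3, 2, 2, 3, 3, 1, 3, 1]
m = confusion_matrix(actual, predicted)
print(m)                                # [[3, 0, 1], [0, 2, 0], [1, 0, 3]]
print(accuracy(m), m[0][2] + m[2][0])   # 0.8 2
```

Here every error is a swap between the two extreme classes, even though the overall accuracy looks reasonable.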
Genetic data
Single coronary angiography
Blood chemistry
Medications
Single Nucleotide Polymorphism (SNP) data
on selected DNA sequences
25/28
…and now for something
completely different
26/28
Questions
27/28