Transcript Slides

Optimization-Based Data
Mining Approaches in
Neuroscience Research
Panos M. Pardalos
University of Florida
Introduction

Data Mining: “the practice of searching through large
amounts of computerized data to find useful patterns or
trends.”

Optimization: “An act, process, or methodology of making
something (as a design, system, or decision) as fully perfect,
functional, or effective as possible; specifically : the
mathematical procedures (as finding the maximum of a
function) involved in this.”
Merriam Webster Dictionary
Introduction



The combination of data mining and
optimization:
Find the “best” way to extract meaningful
“patterns” from data.
Not always an easy task.
How difficult Optimization can
be?



Given integers N1,N2,…,Nk and M find a
subset of N1,N2,…,Nk such that their sum is
equal to M.
Can you find a better algorithm than of O(2k).
Exponential complexity ?
Hard drive Cost

Approximate
ly 1/10
cheaper
every 5
years
Hard Drive Capacity

Approximat
ely 10 times
more every
5 years
Processing power

Number of
transistors of a
computer
processor
double every
two years
References

Handbook of Massive
Data Sets, co-editors: J.
Abello, P.M. Pardalos,
and M. Resende, Kluwer
Academic Publishers,
(2002).
Main problems in data mining

Data preprocessing

Dimensionality reduction

Feature selection

Regression

Clustering (Unsupervised learning)

Classification (Supervised Learning)

Semi-Supervised learning (between unsupervised and unsupervised)

Biclustering

Result Validation

Data Visualization/Representation

Biomedical Informatics is a challenging area with lots of these problems.
Agenda








Research Background
 Epilepsy
 Seizure Prediction
Sources of Data
 Electroencephalogram (EEG) Time Series
Dimensionality Reduction
 Chaos Theory
Feature Selection for Brain Monitoring
Time Series Classification of Neuro-Physiological States
Brain Clustering
Brain Network Models
Concluding Remarks
Facts About Epilepsy

At least 2 million Americans and other 40-50
million people worldwide (about 1% of
population) suffer from Epilepsy.

Epilepsy is the second most common brain
disorder (after stroke) that causes recurrent
seizures.

Epileptic seizures occur when a massive group
of neurons in the cerebral cortex suddenly begin
to discharge in a highly organized rhythmic
pattern.
Epileptic Seizures

Seizures usually occur spontaneously, in the
absence of external triggers.

Seizures cause temporary disturbances of brain
functions such as motor control, responsiveness
and recall which typically last from seconds to a
few minutes.

Seizures may be followed by a post-ictal period
of confusion or impaired sensorial that can
persist for several hours.
10-second EEGs: Seizure Evolution
Normal
Seizure Onset
Pre-Seizure
Post-Seizure
Why do we care?

Based on 1995 estimates, epilepsy imposes an annual
economic burden of $12.5 billion* in the U.S. in
associated health care costs and losses in employment,
wages, and productivity.

Cost per patient ranged from $4,272 for persons** with
remission after initial diagnosis and treatment to
$138,602 for persons** with intractable and frequent
seizures.
Current Epilepsy Treatment

Pharmacological Therapy
 Anti-Epileptic Drugs (AEDs)
 Mainstay of epilepsy treatment
 Approximately 25 to 30% remain unresponsive

Epilepsy Resective Surgery
 Require long-term invasive EEG monitoring to locate
a specific, localized part of the brain where the
seizures are thought to originate
 50% of pre-surgical candidates do not undergo
respective surgery
 Multiple epileptogenic zones
 Epileptogenic zone located in functional brain tissue
 Only 50-60% of surgery cases result in seizure free
Current Epilepsy Treatment






Electrical Stimulation (Vagus nerve stimulator)
 Parameters (amplitude and duration of stimulation)
arbitrarily adjusted
 As effective as one additional AED dose
 Side Effects
Seizure Prediction?
Monitoring Unit?
Forecasting Impending Seizures?
Seizure Control?
Deep Brain Stimulator?
Electroencephalogram (EEG)




…is a traditional tool for evaluating the physiological
state of the brain.
…offers excellent spatial and temporal resolution
to characterize rapidly changing electrical activity of
brain activation
…captures voltage potentials produced by brain
cells while communicating.
In an EEG, electrodes are implanted in deep brain
or placed on the scalp over multiple areas of the
brain to detect and record patterns of electrical
activity and check for abnormalities.
From Microscopic to Macroscopic
Level (Electroencephalogram - EEG)
Electrode Montage and EEGs
ROF
43 2 1
RST
1234
4321
1 2 34
1
2
1
2
3
3
4
5
RTD
4
5
LTD
LOF
LST
Scalp EEG Data Acquisition
Open Problems





Is the seizure occurrence random?
If not, can seizures be predicted?
If yes, are there seizure pre-cursors (in
EEGs) preceding seizures?
If yes, what data mining techniques can be
used to indicate these pre-cursors?
Does normal brain activity during differ from
abnormal brain activity?
Goals of Research

Test the hypothesis that seizures are not a random process.

Demonstrate that seizures could be predicted
 Feature Selection to identify seizure pre-cursors (Statistical
Process Control)
Demonstrate that normal and abnormal EEGs can be
differentiated
 Time Series Classification
Better understand the epileptogenic process – how seizures are
initiated and propagated.
 Brain Clustering



Develop a closed-loop seizure control device (Brain Pacemaker)
Dimensionality
Reduction
Chaos Theory
EEGs with the Curse of
Dimensionality
The brain is a non-stationary system.
 EEG time series is non-stationary.
 With 200 Hz sampling, 1 hour of EEGs is
comprised of
200*60*60*30 = 21,600,000 data points = 43.2MB
(assume 16-bit ASCI format)
 1 day = 1.04GB
 1 week = 7.28GB
 20 patients ≈ 0.15TB

Kilobytes → Megabytes
→ Gigabytes → Terabytes
Data Transformation Using
Chaos Theory

Measure the brain dynamics from time series:


Stock Market
Currency Exchanges (e.g., Swedish Kroner)

Apply dynamical measures (based on chaos theory)
to non-overlapping EEG epochs of 10.24 seconds =
2048 points.

Maximum Short-Term Lyapunov Exponent
 measure the average uncertainty along the local
eigenvectors and phase differences of an attractor in
the phase space
 measure the stability/chaoticity of EEG signals
EEG Voltage
Measure of Chaos
Time
 d1  d 0 e1t is the major axis.
 d 2  d 0 e2t is the minor axis.
 The i th Lyapunov exponents after n steps can be defined as:
d
1
i  log i
n
d0
STLmax Profiles
Pre-Seizure
Seizure Onset
Post-Seizure
Hidden Synchronization
Patterns
How similar are they?
Statistics to quantify the convergence of STLmax
By paired-T statistic:
Per electrode, for EEG signal epochs i and j, suppose their STLmax
values in the epochs (of length 60 points, 10 minutes) are
Li  {STL max1i , STL max i2 ,
, STL max i60 }
L j  {STL max1j , STL max 2j ,
, STL max 60j },
Dij  Li  L j  {dij1 , dij2 ,
, dij60 }
 {STL max1i  STL max1j , STL max i2  STL max 2j ,
, STL max i60  STL max 60j }
Then, we calculate the average value, D ij ,and the sample standard
deviation, ˆ d , of Dij  {dij , dij2 , , dij60} .
Dij
The T-index between EEG signal epochs i and j is defined as Tij  ˆ
d
,
60
Statistically Quantifying the
Convergence
Convergence of STLmax
Why Feature Selection?


Not every electrode site shows the convergence.
Feature Selection: Select the electrodes that are most
likely to show the convergence preceding the next seizure.
Feature Selection
Quadratic Integer Programming
with Quadratic Constraints
Optimization Problem


Optimization:
 We apply optimization techniques to find a group of
electrode sites such that …
 They are the most converged (in STLmax) electrode
sites during 10-min window before the seizure
 They show the dynamical resetting (diverged in
STLmax) during 10-min window after the seizure.
 Such electrode sites are defined as “critical electrode
sites”.
Hypothesis:
 The critical electrode sites should be most likely to
show the convergence in STLmax again before the
next seizure.
Notation and Modeling





x is an n-dimensional column vector (decision variables), where
each xi represents the electrode site i.
 xi = 1 if electrode i is selected to be one of the critical electrode
sites.
 xi = 0 otherwise.
Q is an (nn) matrix, whose each element qij represents the Tindex between electrode i and j during 10-minute window before a
seizure.
b is an integer constant. (the number of critical electrode sites)
D is an (nn) matrix, whose each element dij represents the Tindex between electrode i and j during 10-minute window after a
seizure.
α = 2.662*b*(b-1), an integer constant. 2.662 is the critical value of
T-index, as previously defined, to reject H0: “`two brain sites
acquire identical STLmax values within 10-minute window”
Multi-Quadratic Integer
Programming

To select critical electrode sites, we
formulated this problem as a multiquadratic integer (0-1) programming
(MQIP) problem with …
 objective function to minimize the
average T-index among electrode
sites
 a linear constraint to identify the
number of critical electrode sites
 a quadratic constraint to ensure
that the selected electrode sites
show the dynamical resetting
Problem P1 :
Min f( x)  xT Qx
n
s.t.
x b
i 1
i
xT Dx  
xi  {0,1}, i  1,..., n
Conventional Linearization Approach for
Multi-Quadratic 0-1 Problem
 For each product xi x j , we introduce new 0-1 variable xij  xi x j (i  j ).
Note that xii  xi 2  xi for xi  0,1 .
 The equivalent linear 0-1 problem is given by:
min
 q x
ij ij
i
s.t.
j
Ax  b
xij  xi , for i, j  1,..., n (i  j )
xij  x j , for i, j  1,..., n (i  j )
xi  x j  1  xij , for i, j  1,..., n (i  j )
d
i
x 
ij ij
j
xi  {0,1}, 0  xij  1, i, j  1,..., n
 Note that the number of continuous variables has been increased to O( n 2 ).
 Note that this problem formulation is computationally inefficient as n increases.
Theoretical Results:
MILP formulation for MQIP problem


Consider the MQIP problem
We proved that the MQIP program is EQUIVALENT to a MILP problem
with the SAME number of integer variables.
Problem P :
1
Min eT s
Problem P :
1
Min f( x)  xT Qx
s.t.
Ax  b
xT Dx  
x {0,1}, i  1,..., n
i
Equivalent
Qx  y  s  0
(1)
Ax  b
(2)
y  M (1 x) (3)
Dx  z  0
(4)
eT z  
(5)
z  M 'x
(6)
s, y, z  0, x 0,1
(7)
where M  max  qij  Q  ,
i j
M '  max  dij  D 
i j
Empirical Results:
Performance on Larger Problems
Hypothesis Testing Simulation


Hypothesis:
 The critical electrode sites should be most likely to show
the convergence in STLmax (drop in T-index below the
critical value) again before the next seizure.
 The critical electrode sites are electrode sites that
 are the most converged (in STLmax ) electrode sites
during 10-min window before the seizure
 show the dynamical resetting (diverged in STLmax )
during 10-min window after the seizure
Simulation:
 Based on 3 patients with 20 seizures, we compare the
probability of showing the convergence in STLmax (drop in
T-index below the critical value) before the next seizure
between the electrode sites, which are
 Critical electrode sites
 Randomly selected (5,000 times)
Optimal VS Non-Optimal
Simulation - Results
Statistical Process Control:
How to automate the system?
Automated Seizure Warning System
EEG Signals
Continuously calculate
STLmax from multichannel EEG.
ASWA
Select critical electrode
sites after every
subsequent seizure
Give a warning when
T-index value drops
below a critical value
Monitor the average
T-index of the
critical electrodes
Data Characteristics
Performance Evaluation for
ASWS



To test this algorithm, a warning was
considered to be true if a seizure occurred
within 3 hours after the warning.
# of accurately predicted seizures
Sensitivity =
# of analyzed seizures
False Prediction Rate = average number of
false warnings per hour
Training Results
Performance characteristics of automated seizure warning
algorithm with the best parameter-settings of training data set.
RECEIVER OPERATING
CHARACTERISTICS (ROC)



ROC curve (receiver operating characteristic) is
used to indicate an appropriate trade-off that one
can achieve between:
the false positive rate (1-Specificity, plotted on Xaxis) that needs to be minimized
the detection rate (Sensitivity, plotted on Y-axis) that
needs to be maximized.
Test Results
Performance characteristics of automated seizure warning
algorithm with the best parameter settings on testing data set.
Validation of the ASWS
algorithm

Temporal Properties



Surrogate Seizure Time Data Set
100 Surrogate Data Sets
Spatial Properties


Non-Optimized ASWS – Selecting non-optimal
electrode sites
100 Randomly Selected Electrodes
Prediction Scores: Surrogate Data
and Non-Optimized ASWS
Remarks





Optimization as feature selection for brain monitoring
Developed an online real-time seizure prediction system
Tested on the dataset of
 10 patients suffering from temporal lope seizures
 ~90 days (2100 hours) of EEG data
 58 seizures
Seizure Prediction
 Predicting ~70% of temporal lobe seizures on average
 Giving a false alarm rate of ~0.16 per hour on average
What’s next?-fundamental questions on brain physiology
Time Series
Classification I
Support Vector Machines with
Dynamic Time Warping
Other Dynamical Measures:
Phase Profiles
Other Dynamical Measures:
Entropy H of Attractor
Classification of Physiological
States
Support Vector Machines
From 1 electrode
Input

Standard SVM Input


30 electrodes, 30 data points, 3 dynamical
features = 2,700 features
Time Series SVM Input

30*29 data pairs, 3 dynamical features = 2,700 –
90 features
Dynamic Time Warping
Preliminary Data Set


132 5-minute epochs of pre-seizure EEGs
300 5-minute epochs of normal EEGs
 Pre-seizure = 0-30 minutes before seizure
 Normal = 10 hours away from seizure
Metrics for Performance
Evaluation
PREDICTED CLASS
Class=Yes Class=No
ACTUAL
CLASS
Class=Yes
a
b
Class=No
c
d
a: TP (true positive); b: FN (false negative);
c: FP (false positive); d: TN (true negative)
Sensitivity and Specificity


Sensitivity measures the fraction of positive cases that are
classified as positive.
Specificity measures the fraction of negative cases classified as
negative.
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)



Sensitivity can be considered as a detection (prediction or
classification) rate that one wants to maximize.
Maximize the probability of correctly classifying patient states.
False positive rate can be considered as 1-Specificity which one
wants to minimize.
Leave-one-out Cross Validation


Cross-validation can be seen as a way of
applying partial information about the
applicability of alternative classification
strategies.
K-fold cross validation:




Divide all the data into k subsets of equal size.
Train a classifier using k-1 groups of training data.
Test a classifier on the omitted subset.
Iterate k times.
Empirical Results
Automated Seizure Prediction Paradigm
Multichannel
Com
Feature
Extraction/
Cluster
Analysis
Data Acquisition
Interface
Technology
Pattern
Recognition
VNS
User
Initiate a variety of
therapies (e.g., electrical
stimulation, drug
injection)
Drug
Related Patents
Multi-dimensional multi-parameter time series processing for seizure warning and
prediction
Patent 7,263,467 (Issued on August 28, 2007).
Optimization of Multi-dimensional Time Series Processing for seizure warning and
prediction
Patent 7,373,199 (Issued on May 13, 2008).
Optimization of spatio-temporal pattern processing for seizure warning and prediction
Patent 7,461,045 (Issued on December 2, 2008).
Multi-dimensional dynamical analysis
U.S. Utility Patent application filed on December 21, 2006, Serial No.: 11/339,606.
Closed-Loop State-Dependent Seizure Prevention Systems
U.S. Utility Patent application filed on December 19, 2006, Serial No.: 11/641,292.
Brain Network Models
Brain Connectivity Networks
Based on fMRI Data
The Problem



Certain neurological diseases are very difficult to
diagnose at early stages
Functional Magnetic Resonance Imaging (fMRI)
technique provides vast amount of information about
structure and function of human brain, but there is
lack of methods to analyse these data
Computational methods and algorithms based on
mathematical models should be applied in order to
find and recognize key patterns in this “ocean” of
data
Network Models



Network models of human brain
Partition of the brain into regions of
interest
Functional interconnections between
regions in brain
Connectivity Networks
MRI Data


Blood flow level as an indicator of neuronal activity
Representation of values of signal in spatial voxels as
2D and 3D images
fMRI Data


The measurements are being performed every 2
seconds over 6 minutes for each voxel of brain
of size 2mm x 2mm x 2mm
The fMRI data is therefore a set of time series,
corresponding to particular elementary volumes
of the brain. In our data set each series contains
180 elements.
fMRI Data, Vector
Representation
Z
(x, y, z)
X
0
Y
Small World Networks

Small world phenomenon first described by
Stanley Milgram in 1960.


“Six degrees of separation”
“Erdos number”
From Random Graph to
Regular Lattice



Random graphs generally have property of low
mean shortest path length and low clustering
coefficient
Regular lattice has high mean shortest path and
high cluster coefficient
Small world networks have low mean shortest
path length while still high clustering coefficient
Random Graph vs Regular
Lattice
Small World Network
Quantitative Measures of
“Small World” Property

Characteristic path length

Clustering coefficient

Global efficiency

Nodal efficiency
Brain Connectivity Networks


Brain connectivity networks possess small
world properties
We predicted, that network characteristics,
such as global and local efficiency values,
would be decreased for people with
Parkinson’s disease.
Nodes in Connectivity Network



How to define brain regions – nodes in the
network?
Clustering problem
Standard MNI template
Signal Time Series Form
Clusters
time
Clustering Problem


Each data set contains roughly 100 000 of
time series, each of them consist of 180
elements
Efficient algorithms should be developed
in order to solve this problem
Standard MNI Brain Atlas

Partition of the brain into 116 brain regions
Edges in Connectivity Network


Weighted graph with nodes corresponding
to MNI brain regions
Weights of edges defined based on
correlation between averaged neural
activity over the regions
Signal Processing




Neural activity
Head movements during the MR session
Respiratory and heart rhythms
Noise
Maximal Overlap Discrete
Wavelet Transform


Wavelet is a “small wave”
Wavelet transform is a decomposition of initial
signal into linear combination of wavelets
Time Series Decomposition by
Wavelets
Wavelet Coefficients
Correlation


Inter-regional correlations in resting state
fMRI data are particularly salient at
frequencies below 0.1 Hz
Second scale wavelet coefficients correspond
to 0.06 – 0.12 Hz frequency range
Connectivity Strength

Averaged over the regions signal vectors

Define level 2 wavelet coefficients of
averaged signals
,
.
The connectivity between regions A and B is

Definition of Distance Between
Nodes



For each time series S = {s1, s2, …, sn } of
size n there is a corresponding point in ndimensional space
For normal vectors x and y the distance
between end points is equal to
Therefore, (1 – corr(x,y)) may serve as a
measure of distance between time series
Geometrical Representation
x
x-y
S = (s1, s2, …, sn)
y
0
Data Set

15 healthy controls, 14 Parkinson patients

Each network for each patient consist of
116 nodes
Averaged Connectivity
Networks
Control
Parkinson
Global Network Efficiency
Values
Control (1.85 +/- 0.57), Parkinson (1.12 +/- 0.55),
independent t-test p-value = 0.0017
Top 30 Nodal Efficiency Values
Nodal Efficiency Plot
Red line – Control set, blue line - PD set
Discussion


Parkinson’s brain network properties possess
measurable alteration in comparison with
healthy ones
Further research, in particular, different
network model, may reveal the pattern in
brain networks, which could be used as a
diagnosis criteria
Concluding Remarks







Overview of Epilepsy Research
Applications of Data Mining and Optimization
Techniques
Interplay between theory and application
Feature Selection
Time Series Classification
Brain Clustering
Brain Network Models
Related Patents
Sensor registration by global optimization
procedures
Patent 7,653,513 (Issued January 26, 2010).
Atomic Magnetometer Sensor Array
Magnetoencephalogram Systems and
Method
United States Patent Application
20100219820 (Filed April 14, 2008)
References

Handbook of Massive
Data Sets, co-editors: J.
Abello, P.M. Pardalos,
and M. Resende, Kluwer
Academic Publishers,
(2002).
References
“Clustering Challenges on Biological Networks” S.
Butenko, W. A. Chaovalitwongse and P. M. Pardalos,
World Scientific (2009).

“Feature Selection for Consistent Biclustering via Fractional 0-1 Programming” (with Stanislav
Busygin and Oleg A. Prokopyev), Journal of Combinatorial Optimization, Volume 10, Number 1
(2005), pp. 7-21.

“Biclustering in Data Mining” (with S. Busygin, and O. Prokopyev), Computers & Operations
Research, Volume 35, Issue 9 (2008), pp. 2964-2987.

“On Biclustering with Features Selection for Microarray Data Set” (with S. Busygin and O.
Prokopyev), In (BIOMAT 2005) Proceedings of the International Symposium on Mathematical
and Computational Biology (Edited by R. Mondaini & R. Dilao), World Scientific (2006), pp. 367377.

“Biclustering: algorithms and applications in data mining and forecasting” (with P. Xanthopoulos,
N. Boyko and N. Fan) In Encyclopedia of Operations Research and Management Science
(accepted to appear) Wiley(2010).
References

Quantitative Neuroscience, co-editors:
P.M. Pardalos, C. Sackellares, P.
Carney, and L. Iasemidis, Kluwer
Academic Publishers, (2004).

Biocomputing, co-editors: P.M.
Pardalos and J. Principe, Kluwer
Academic Publishers, (2002).
References
New in 2010: Computational
Neuroscience, co-editors:
W.A. Chaovalitwongse, P.M.
Pardalos, P. Xanthopoulos
(Eds.) Series: Springer
Optimization and Its
Applications , Vol. 38.
References

Optimization in
Medicine, Carlos
Alves,, Panos M.
Pardalos, Luis Vicente
(Eds.), 2008
References

Handbook of Optimization
in Medicine, Panos M.
Pardalos, Edwin H. Romeijn
(Eds.), 2009
Reference







W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C.
Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to
Epilepsy. Epilepsy Research, 64, 93-133, 2005.
W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Electroencephalogram (EEG) time
series classification: Applications in epilepsy , Annals of Operations Research, 148, 1 (2006), p 227250.
Jicong Zhang, Petros Xanthopoulos ,Chang-Chia Liu, Panos M. Pardalos. Real-time differentiation
of nonconvulsive status epilepticus from other encephalopathies using quantitative EEG analysis: A
pilot study“, Epilepsia, 51, 2 (2010), pp. 243-250
W. Chaovalitwongse , P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical
Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods
and Software, 20 (2-3): 383-394, 2005 .
L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K.
Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure
Prediction. Journal of Clinical Neurophysiology, 116 (3): 532-544, 2005.
P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney,
O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal
Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004. (INFORMS
Pierskalla Best Paper Award 2004)
W. Chaovalitwongse , P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for
Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004.
(Rank 5th in Top 25 Articles in Operations Research Letters)
Thank you for your attention!
Questions?
Conference in 2011