Slide 1 - Academia Sinica
Academia Sinica
June 15, 2014, Taipei
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Symbolic Tree for
Prognosis of Hepato Cellular Carcinoma
Taerim Lee(1) Hyosuk Lee(2) Edwin Diday(3)
(1) Korea National Open University
(2) Department of Internal Medicine, SNU Hospital
(3) University of Paris 9 Dauphine France
Outline
1. Review of Literature
2. Motivation
3. Tree-Structured Classification Model for HCC
4. Symbolic Data Analysis for HCC
5. Remarks
Motivation
1. To develop a powerful modeling technique for exploring the functional form of covariate effects on the prognosis of HCC patients
2. To obtain tree-structured prognostic models for HCC with a time covariate
3. To extract new knowledge from HCC data using Symbolic Data Analysis
Purposes
1. To identify the effect of prognostic factors of HCC.
2. To quantify the patient characteristics that are related to high-risk clinical factors.
3. To explore the functional form of the relationships among the covariates.
4. To extract new knowledge and fit a symbolic tree model.
Previous Work
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984)
developed Classification and Regression Trees (CART).
L. Gordon & R. Olshen (1985)
presented tree-structured survival analysis in Cancer Treatment Reports.
Ciampi, Thiffault, Nakache & Asselain (1986)
proposed a variety of splitting criteria, such as likelihood ratio statistics based on the exponential model or the Cox partial likelihood.
M. LeBlanc & J. Crowley (1992)
developed a method for obtaining tree-structured relative risk estimates, using the log-rank statistic for splitting and a between-node dissimilarity measure in a pruning algorithm.
H. Ahn & W.-Y. Loh (1994)
fitted a piecewise-linear Cox proportional hazards model, using curvature detection tests to find splits rather than an exhaustive search that evaluates all possible splits, which reduces computing time.
W.-Y. Loh & Y.-S. Shih (1997)
derived split selection methods for classification trees in Statistica Sinica.
T. R. Lee, H. S. Moon (1994) Prediction Model of craniofacial growth: dental arch classification of 6 and 7 year old children, The Journal of Korea Society of Dental Health, vol. 21, no. 3
T. R. Lee (1998) Classification Model for High Risk Dental Caries with RBF Neural Networks, The Journal of Data Science and Classification, vol. 2 (2)
T. R. Lee et al. (2006) Independent prognostic factors of 861 cases of oral squamous cell carcinoma in Korean adults, Oral Oncology, vol. 42, p. 208-217
Bock, H.H., Diday, E. (2000) Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer Verlag, Heidelberg
Bravo Llatas, M.C. (2000) Strata decision tree symbolic data analysis software, Data Analysis, Classification and Related Methods, Springer Verlag, p. 409-415
T. R. Lee (2009) Tree Structured Prognostic Model for Hepatocellular Carcinoma, Journal of Korea Health Information & Statistics, Vol. 28, No. 1
T. R. Lee (2011) Survival tree for Hepato Cellular Carcinoma patient, Journal of Korean Society of Public Health Information & Statistics
V. Patel, S. Leethanakul (2001)
reported new approaches to the understanding of the molecular basis of oral cancer.
Billard, L., Diday, E. (2003)
looked at the concept of SDA in general and attempted to review the methods available to analyze such data, in 'From the Statistics of Data to the Statistics of Knowledge'.
Mballo, C., Diday, E. (2005)
compared the Kolmogorov-Smirnov criterion and the Gini index as test selection metrics for decision tree induction.
Tree Structured
Classification
Tree Model
• Tree-structured classification modeling constructs classification rules based on the information provided in a learning sample of objects with known identities.
[Diagram: example classification tree. Root node "total"; internal splits X1 > a, X2 > b, X3 > c, X4 > d; terminal nodes labeled D or L.]
Logistic Regression Model
Using stepwise Logistic Regression Analysis (LRA), four variables were selected to construct the logistic regression model.
• The model is as follows: [equation not available in this transcript]
• Log Likelihood = 611.989, p = 0.0004
• Goodness of fit chi-sq = 569.34, p = 0.02
Schematic comparison of a classification tree and a logistic regression equation for risk assessment
CART
H: High risk, L: Low risk
[Diagram: example CART tree. Root node "total"; internal splits X1 > a, X2 > b, X3 > c, X4 > d; terminal nodes labeled H or L.]
Tree-structured prognostic model with effective covariates:
: CART uses a decision tree to display how data may be classified or predicted.
: It automatically searches for important relationships and uncovers hidden structure, even in highly complex data.
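The search that CART performs at a node can be sketched as follows. This is a minimal illustration and not the authors' software: it assumes the Gini impurity as the splitting criterion and a single numeric covariate, and the toy tumor sizes and risk labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Exhaustively search thresholds on one numeric covariate.

    Returns (threshold, weighted_impurity) for the split 'value <= threshold'
    that minimizes the weighted Gini impurity of the two child nodes.
    """
    n = len(values)
    best = (None, gini(labels))
    candidates = sorted(set(values))
    # Midpoints between consecutive distinct values are the candidate cut points.
    for lo, hi in zip(candidates, candidates[1:]):
        cut = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cut, score)
    return best


# Hypothetical toy data: tumor size (cm) and risk label (H = high, L = low).
size = [1.2, 2.0, 2.5, 3.1, 4.0, 4.8, 5.5, 6.3]
risk = ["L", "L", "L", "L", "H", "H", "H", "H"]
threshold, impurity = best_split(size, risk)
print(threshold, impurity)
```

A full tree would apply the same search to every covariate, split on the best one, and recurse on the two child nodes before pruning.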
FACT
H: high risk, L: low risk
[Diagram: example FACT tree. Root node "total"; internal splits X1 > a, X2 > b, X3 > c, X4 > d; terminal nodes labeled H or L.]
Tree-structured prognostic model with effective covariates:
: FACT employs statistical hypothesis tests to select a variable for splitting each node and then uses discriminant analysis to find the split point.
: The size of the tree is determined by a set of rules.
QUEST
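D: death, L: live
[Diagram: example QUEST tree. Root node "total"; a linear combination split X4 + 2X1 > a and univariate splits X2 > b, X3 > c, X4 > d; terminal nodes labeled D or L.]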
: QUEST is a classification tree algorithm derived from the FACT method. It can be used with univariate splits or linear combination splits.
: Unlike FACT, QUEST uses cross-validation pruning.
: What distinguishes it from other decision tree classifiers is that, when used with univariate splits, it performs approximately unbiased variable selection.
DATA
Classification Tree Model
H: High Risk group
L: Low Risk group
CART
[Figure: CART tree for the TACE group. Root node: 94 patients, 46 in class 0 and 48 in class 1. Splits used: INV ≤ 0.5, CHILD ≤ 5.5, TAENUM ≤ 1.5, SIZE ≤ 3.85, AFP ≤ 10.4.]
Sensitivity 71.7%, Specificity 85.4%, Total 78.7%
Variable importance:
1. TAENUM 100.0
2. AFP 87.7
3. CHILD 72.3
4. SIZE 59.4
5. INV 59.0
6. CLIP 45.5
Fig.4 Tree Structured Model for TACE group of HCC data
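The three summary rates reported for this tree follow from a 2x2 confusion table. The sketch below shows the arithmetic only. The cell counts are hypothetical: they were chosen to be consistent with a node of 94 patients (46 and 48 per class) and with the reported percentages, but the slide does not give the confusion table or say which class is treated as positive.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, overall accuracy) from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (assumption): 46 positives and 48 negatives, 94 in total.
sens, spec, acc = classification_rates(tp=33, fn=13, tn=41, fp=7)
print(round(100 * sens, 1), round(100 * spec, 1), round(100 * acc, 1))
```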
RBF Neural Network Classification
Block diagram representation of nervous system: Stimulus → Receptors → Neural net → Effectors → Response
RBF NN ROC curve according to the
Radial Basis Function
Classification results
Kernel
V16 , V17, V19
66.3
64.2
Survival Tree
Survival Data
• The response variable is survival time: the length of time a patient has survived after diagnosis.
• Censoring is common, since the endpoint may not be observed because of termination of the study or loss to follow-up.
Cox proportional Hazard Model
• Data: $(Y_i, \delta_i, X_i)$, $i = 1, \dots, n$,
where $Y_i = \min(Z_i, C_i)$ is the minimum of a failure time $Z_i$ and a censoring time $C_i$,
$\delta_i = I(Z_i \le C_i)$ is an indicator of the event that a failure is observed, and
$X_i = (X_{1i}, \dots, X_{pi})'$ is a p-dimensional column vector of covariates.
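As a small illustration of this data structure, the sketch below builds the observed pairs from failure and censoring times, assuming the usual convention that the observed time is the minimum of the two and the indicator equals 1 when the failure time does not exceed the censoring time. The numbers are hypothetical.

```python
def observe(failure_times, censoring_times):
    """Return (Y_i, delta_i) pairs: Y_i = min(Z_i, C_i), delta_i = 1 if Z_i <= C_i."""
    data = []
    for z, c in zip(failure_times, censoring_times):
        data.append((min(z, c), 1 if z <= c else 0))
    return data


# Hypothetical failure times Z and censoring times C, in months.
Z = [5.0, 12.0, 7.5, 20.0]
C = [10.0, 8.0, 7.5, 15.0]
observed = observe(Z, C)
print(observed)
```

In practice only the observed pairs are available; the latent failure time of a censored patient is never seen.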
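Let $\lambda(y \mid x)$ be the hazard rate at time $y$ for an individual with risk factors $x = (x_1, \dots, x_p)'$.
Cox proportional hazards model:
$$\lambda(y \mid x) = \lambda_0(y) \exp\left( \sum_{k=1}^{p} \beta_k x_k \right)$$
where $\beta_1, \beta_2, \dots, \beta_p$ are unknown parameters and $\lambda_0(y)$ is the baseline hazard rate at time $y$.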
STUDI
S: short-term survivor, L: long-term survivor
[Diagram: example STUDI tree. Root node "total"; internal splits X1 > a, X2 > b, X3 > c, X4 > d; terminal nodes labeled S or L.]
Survival Tree with Unbiased Detection of Interaction
: STUDI is a tree-structured regression modeling tool.
: It is easy to interpret and predicts survival values for new cases.
: Missing values can easily be handled and time-dependent covariates can be incorporated.
Split Covariate Selection
1. Fit a model to the n- and f-covariates in the node.
2. Obtain the modified Cox-Snell residuals.
3. Perform a curvature test for each of the n-, s- and c-covariates.
4. Perform an interaction test for each pair of n-, s- and c-covariates.
5. Select the covariate that has the smallest p-value.
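Steps 3 and 5 can be sketched as below. The slide does not define the curvature test, so this sketch assumes one common form used in residual-based trees: a chi-square test of the residual sign (above or below the median residual) against the quartile group of each covariate. The function names and toy data are hypothetical, and the interaction tests of step 4 are omitted.

```python
import math


def chi2_sf_df3(x):
    """Upper tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def quartile_group(values):
    """Assign each value to one of four groups by its rank (approximate quartiles)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = min(3, 4 * rank // len(values))
    return groups


def curvature_p_value(residuals, covariate):
    """Chi-square test of residual sign (relative to the median) against covariate quartile."""
    n = len(residuals)
    med = sorted(residuals)[n // 2]
    signs = [1 if r >= med else 0 for r in residuals]
    groups = quartile_group(covariate)
    table = [[0] * 4 for _ in range(2)]
    for s, g in zip(signs, groups):
        table[s][g] += 1
    row = [sum(r) for r in table]
    col = [table[0][g] + table[1][g] for g in range(4)]
    stat = 0.0
    for s in range(2):
        for g in range(4):
            expected = row[s] * col[g] / n
            if expected > 0:
                stat += (table[s][g] - expected) ** 2 / expected
    # A 2 x 4 table has (2 - 1) * (4 - 1) = 3 degrees of freedom.
    return chi2_sf_df3(stat)


def select_split_covariate(residuals, covariates):
    """Return the covariate name with the smallest curvature p-value, plus all p-values."""
    p_values = {name: curvature_p_value(residuals, x) for name, x in covariates.items()}
    return min(p_values, key=p_values.get), p_values


# Hypothetical residuals: they increase with "size" and are unrelated to "age".
residuals = [i / 10 for i in range(1, 17)]
covariates = {
    "size": list(range(1, 17)),
    "age": [50, 60, 70, 80, 51, 61, 71, 81, 52, 62, 72, 82, 53, 63, 73, 83],
}
best, p_values = select_split_covariate(residuals, covariates)
print(best, p_values)
```

Because the covariate is chosen by a p-value rather than by scanning every possible cut point, covariates with many distinct values are not favored merely for offering more candidate splits.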
STUDI
Survival Tree with Unbiased Detection of
Interaction
Cho & Loh(2001)
- STUDI is a tree-structured regression modeling tool.
- It is easy to interpret and predicts survival values for new cases.
- Missing values can easily be handled and time-dependent covariates can be incorporated.
Let the survival function for an individual with covariate $X = x_i$ be
$$S(y \mid X = x_i) = \exp\{-\Lambda_0(y) \exp(\beta' x_i)\}$$
where $\Lambda_0(y)$ is the cumulative baseline hazard rate.
Then the median survival time for an individual $i$ is defined as
$$\tilde{y}_i = \inf\{y \mid S(y \mid X = x_i) \le 0.5\}$$
and the cost at a node $t$ is defined as
$$R(t) = \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
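The median survival time and the node cost can be computed as in the following sketch. It assumes the survival function has the Cox form S(y | x) = exp{-Lambda0(y) exp(linear predictor)} and that the cumulative baseline hazard is available on a grid of times; the grid, hazard values and linear predictors are hypothetical.

```python
import math


def median_survival(times, cum_baseline_hazard, linear_predictor):
    """Smallest grid time y with S(y | x) <= 0.5, or None if it is never reached.

    S(y | x) = exp(-Lambda0(y) * exp(linear_predictor)), with Lambda0 given as
    non-decreasing values on the increasing grid `times`.
    """
    scale = math.exp(linear_predictor)
    for y, lam0 in zip(times, cum_baseline_hazard):
        if math.exp(-lam0 * scale) <= 0.5:
            return y
    return None


def node_cost(observed, predicted):
    """Sum of absolute differences between observed and predicted survival times."""
    return sum(abs(y - y_hat) for y, y_hat in zip(observed, predicted))


# Hypothetical cumulative baseline hazard on a grid of months.
times = [1, 2, 3, 4, 5, 6]
Lambda0 = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]

median_low = median_survival(times, Lambda0, 0.0)   # baseline-risk patient
median_high = median_survival(times, Lambda0, 1.0)  # higher-risk patient
cost = node_cost([4, 6], [median_low, median_high])
print(median_low, median_high, cost)
```

A larger linear predictor scales the cumulative hazard up, so the survival curve crosses 0.5 earlier and the predicted median shortens.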
Tree Structured Survival Model
STUDI
Modified Cox-Snell (MCS) residuals:
$$\mathrm{MCS}_i = \hat{\Lambda}_0(Y_i) \exp(\hat{\beta}' X_i) + 0.693\,(1 - \delta_i), \quad i = 1, \dots, n$$
where $\hat{\Lambda}_0$ is the estimator of the cumulative baseline hazard function.
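A minimal sketch of the residual computation follows. It assumes the usual form of the modified Cox-Snell residual, in which the Cox-Snell residual Lambda0_hat(Y_i) * exp(beta_hat' X_i) is increased by log 2 (about 0.693) for censored observations, and it takes the estimated cumulative baseline hazard at each observed time as given. All input values are hypothetical.

```python
import math

LOG2 = math.log(2.0)  # about 0.693, the median of a unit exponential distribution


def mcs_residuals(cum_baseline_hazard_at_y, linear_predictors, deltas):
    """Modified Cox-Snell residuals.

    cum_baseline_hazard_at_y: estimated Lambda0 evaluated at each observed time Y_i
    linear_predictors: estimated beta' X_i for each individual
    deltas: 1 if the failure was observed, 0 if the observation was censored
    """
    return [
        lam0 * math.exp(eta) + LOG2 * (1 - d)
        for lam0, eta, d in zip(cum_baseline_hazard_at_y, linear_predictors, deltas)
    ]


# Two hypothetical patients with identical fitted hazard; the second is censored.
res = mcs_residuals([0.5, 0.5], [0.0, 0.0], [1, 0])
print(res)
```

The adjustment reflects that a censored patient's true residual is larger than the one computed at the censoring time.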
Fig 4. Scatter plot and box plot of the MCS residuals
Fig.11 Tree Structured Survival Model with SNP and Clinical Data
of HCC using imputed 252 missing data
Fig. 6 Tree structured Survival model for OSCC
[Figure: survival tree for OSCC with splits on Radio, Pstage, Age, txmethod, size, t and Site.]
SDA
(Symbolic Data Analysis)
1. To generalize data mining and statistics to higher-level units described by symbolic data
2. To extract new knowledge from a database by using a standard data table
3. To work on higher-level units called concepts, which are necessarily described by more complex data, extending data mining to knowledge mining
From data mining to knowledge mining
1. SDA needs two levels of units
The first level: individuals
The second level: concepts
2. A concept is described by using the description of the class of individuals in its extent
3. The description of a concept must express the variation of the individuals in its extent
4. The output of SDA provides new symbolic objects associated with new categories, i.e. categories of concepts
SDA steps
1. Relational database: composed of several more or less linked data tables
2. Define a set of categories based on the categorical variables from a query to the given relational database
3. Identify the class of individuals which defines the extent of each category
4. Apply a generalization process to the subset of individuals belonging to the extent of each concept
5. Define a symbolic data table
6. Perform Symbolic Data Analysis
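The generalization from individuals to a symbolic data table can be sketched as follows. This is an illustration under simple assumptions, not the SDA software used in the talk: numeric variables are generalized to [min, max] intervals and categorical variables to relative frequencies. The patient records and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def build_symbolic_table(rows, concept_key, numeric_vars, categorical_vars):
    """Aggregate individual records into one symbolic description per concept."""
    # Steps 2-3: the extent of each concept is the set of individuals in that category.
    extents = defaultdict(list)
    for row in rows:
        extents[row[concept_key]].append(row)

    # Steps 4-5: generalize each extent into a symbolic description.
    table = {}
    for concept, members in extents.items():
        description = {}
        for var in numeric_vars:
            vals = [m[var] for m in members]
            description[var] = (min(vals), max(vals))  # interval-valued variable
        for var in categorical_vars:
            counts = Counter(m[var] for m in members)
            # Modal variable: categories with their relative frequencies.
            description[var] = {cat: c / len(members) for cat, c in counts.items()}
        table[concept] = description
    return table


# Hypothetical individual-level records.
patients = [
    {"stage": "I", "size": 1.5, "ascites": "no"},
    {"stage": "I", "size": 3.0, "ascites": "no"},
    {"stage": "II", "size": 4.2, "ascites": "yes"},
    {"stage": "II", "size": 6.0, "ascites": "no"},
]
table = build_symbolic_table(patients, "stage", ["size"], ["ascites"])
print(table)
```

Each row of the resulting table describes a concept rather than a patient, which is what gives the aggregated representation and volume reduction.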
The main steps of an SDA
Put the Data in a relational Data Base
Define a Context by Giving the Units & Classes
Build a Symbolic Data Table
Apply SDA tools:
Decision tree, Clustering, Graphical visualization
SDA
Advantages:
Aggregated data representation
Confidentiality preservation
Data volume reduction
Symbolic object = intent (symbolic description + recognition function of the extent) + extent (individuals represented by the concept)
E.g. [sex ~ (man(0.8), woman(0.2))] ^ [region ~ {city, rural}] ^ [salary ~ [1.2, 3.1]]
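The example description can be paired with a recognition function that decides which individuals belong to the extent. The sketch below assumes one simple Boolean rule, which is only one of several possible choices: an individual is recognized if its category has positive weight in a modal variable, belongs to the set for a set-valued variable, and lies inside the interval for an interval variable. The individuals are hypothetical.

```python
def make_recognizer(description):
    """Build a Boolean recognition function from a symbolic description."""

    def recognizes(individual):
        for var, desc in description.items():
            value = individual[var]
            if isinstance(desc, dict):
                # Modal variable: category -> weight.
                if desc.get(value, 0.0) <= 0.0:
                    return False
            elif isinstance(desc, (set, frozenset)):
                # Set-valued variable.
                if value not in desc:
                    return False
            else:
                # Interval variable given as (low, high).
                low, high = desc
                if not (low <= value <= high):
                    return False
        return True

    return recognizes


# The description from the slide, written as a Python mapping.
description = {
    "sex": {"man": 0.8, "woman": 0.2},
    "region": {"city", "rural"},
    "salary": (1.2, 3.1),
}
recognizes = make_recognizer(description)

# Hypothetical individuals.
people = [
    {"sex": "man", "region": "city", "salary": 2.0},
    {"sex": "woman", "region": "rural", "salary": 5.0},
    {"sex": "woman", "region": "suburb", "salary": 1.5},
]
extent = [p for p in people if recognizes(p)]
print(extent)
```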
Symbolic Object
SDA
Schematic expression
SDA
Input Symbolic Data
Description of
individual
concepts
Column symbolic variable
Symbolic Data Table
Symbolic Data
variable
Input Symbolic Data 2D Zoom Visualization
3D Zoom Stars
2D and 3D Zoom Stars
Fig5. SDA results according to the newly defined concepts of metastasis & prognosis
Symbolic Tree
Clustering Tree
Table 1. Patients Baseline characteristics of HCC
Table 2 Cox proportional hazards model for metastasis-free
survival
Fig1. Survival curve of metastasis free HCC
Fig2. Survival curve of metastasis free HCC patient
according to AJCC stage
Fig3. Survival curve of metastasis free HCC patient
according to the histologic response
Fig.11 Tree Structured Survival Model with SNP and Clinical Data
of HCC using imputed 252 missing data
Fig.5 Characterization of the classes according to the evolution of HCC.
Classes: individuals free of HCC and liver cirrhosis but bCL (2x0xbCL); individuals free of HCC with diagnosis 3 and liver cirrhosis (3x1xbCL); individuals with diagnosis 4, liver cirrhosis and acute HCC (4x1xaHCC); and individuals with diagnosis 6 and acute HCC, free of liver cirrhosis (6x0xaHCC).
Fig.5 Comparison of the "free of HCC" partition class, which groups the 12 clinical variables and 3 gene data with the lowest frequency of degradation (3x1xbCL), against the "HCC" partition class, which contains the larger variance with the highest frequency of degradation (4x1xaHCC).
Fig.6 The most discriminating variables to influence HCC prognosis.
INPUT: the symbolic data table with three concepts
OUTPUT: a symbolic data table with three rows associated to
the three concepts:
The first column represents the frequencies of missing data "."
the others represent the frequencies of LC = 0 and the last
row the frequencies of LC = 1
Fig.7 Maps of the distribution in the categories of "Encephalopathy" and "Ascites" on the first factorial plane.
Data with the lowest frequency of degradation (3x1xbCL), against the HCC partition class that contains the larger variance with the highest frequency of degradation (4x1xaHCC).
Fig.8 Correlation circle between the most discriminating symbolic variables and some of their bins
The plane and symbolic variables are shown in the smallest square which contains the first quadrant of the circle of correlation. Figure 8 shows the distribution of the categories of weight. From the representation of weight on the left, it can be seen that the class (3x1xbCL), i.e. individuals with HCC at baseline and at the end of the study, contains the lightest individuals, and from the representation of encephalopathy on the right, we can see that the degradation class 3x1xbC has greater ascites as well.
Fig.9 Correlation circle of all the gene variables and bins of clinical variables
Symbolic Tree for HCC
INPUT : Patient data table
OUTPUT : Decision tree and rules
Note that the symbolic tree is better adapted to working on concepts than on individuals.
Fig.10 Symbolic Tree for HCC
Table 3 List of the most characteristic bins of 3x1xbCL and
4x1xaHCC
Remarks
1. Tree-structured classification gave an easily interpretable method with a small number of predictor variables.
2. SDA gave more detailed information and a symbolic description of the classes.
3. SDA gave more practical information through visualization graphs and diagrams.
Q&A
Thank you!