TA_4_Ho - The International Conference on Bioinformatics
A model selection approach to discover age-dependent gene expression patterns using quantile regression models
Joshua W.K. Ho (1,2)
(1) School of IT, The University of Sydney
(2) NICTA, Australian Technology Park, NSW
Joint work with Maurizio Stefani, Cristobal dos Remedios and Michael Charleston
Ageing Microarray Dataset
[Figure: expression data matrix; labels: Arrays, Gene, Age]
InCoB 2009
Discover age-dependent patterns
How to find differential expression (DE) patterns?
Standard approach
Linear regression using the method of least squares
Second-order linear regression (quadratic regression)
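As an illustration of the standard approach, here is a minimal pure-Python sketch of least-squares fitting for a first-order and a second-order polynomial in age. The helper names (`solve_linear_system`, `polyfit_least_squares`) and the toy data are assumptions made for this example, not part of the talk.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit_least_squares(ages, expr, degree):
    """Return [b0, b1, ...] minimising the sum of squared residuals."""
    k = degree + 1
    # Normal equations (X^T X) b = X^T y, with X[i][j] = age_i ** j.
    xtx = [[sum(a ** (i + j) for a in ages) for j in range(k)] for i in range(k)]
    xty = [sum((a ** i) * y for a, y in zip(ages, expr)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data (made up): expression rises exactly linearly with age.
ages = [20, 30, 40, 50, 60, 70, 80]
expr = [1.0 + 0.05 * a for a in ages]

linear = polyfit_least_squares(ages, expr, 1)     # [intercept, slope]
quadratic = polyfit_least_squares(ages, expr, 2)  # [b0, b1, b2]
```

Because the toy data are exactly linear, the quadratic coefficient comes out at approximately zero. With real arrays, a single outlying sample can pull a squared-error fit strongly.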
Quantile regression
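Solve a different optimization problem:
$\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^{\top}\beta\right)$
where $\rho_\tau$ is the check function and $0 < \tau < 1$ is the quantile of interest.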
Check function
$\rho_\tau(u) = u\,\bigl(\tau - I(u < 0)\bigr)$
We obtain a median regression line when $\tau = 0.5$.
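A small sketch of the check function in its standard form, and of how minimising it over a single constant recovers a sample quantile. The function names and the toy expression values are assumptions for illustration.

```python
def check_loss(u, tau):
    """Check function: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(values, tau):
    """Constant minimising the total check loss.

    The objective is piecewise linear and convex, so a minimiser
    can always be found among the data values themselves.
    """
    return min(sorted(values),
               key=lambda c: sum(check_loss(v - c, tau) for v in values))


# Toy expression values (made up) with one extreme observation.
expression = [2.0, 3.0, 5.0, 8.0, 40.0]

median_fit = best_constant(expression, 0.5)  # tau = 0.5 gives the median
upper_fit = best_constant(expression, 0.9)   # a high quantile
```

With tau = 0.5 the loss is half the absolute residual, so positive and negative residuals are penalised equally and the extreme value 40.0 does not move the fit away from the median.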
A quantile regression line
How do we know whether the slope of the quantile regression line is 0?
Solution: a model selection approach
Quantile regression models
Constant Model (C)
Linear Model (L)
Piecewise Linear Model (PL)
where
Model complexity
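One way to picture the three model families is as nested prediction functions of age. The functional forms below are assumptions for illustration only: in particular, writing the piecewise model as a line plus a hinge term at a free breakpoint, and counting free parameters as the measure of complexity, are choices made for this sketch.

```python
def constant_model(age, params):
    """C: the same expression quantile at every age."""
    (b0,) = params
    return b0


def linear_model(age, params):
    """L: the expression quantile changes linearly with age."""
    b0, b1 = params
    return b0 + b1 * age


def piecewise_linear_model(age, params):
    """PL (assumed form): a continuous line whose slope changes by b2 at a breakpoint."""
    b0, b1, b2, breakpoint = params
    return b0 + b1 * age + b2 * max(0.0, age - breakpoint)


# Hypothetical complexity measure: number of free parameters per model.
COMPLEXITY = {"C": 1, "L": 2, "PL": 4}

# The families are nested: PL with b2 = 0 is L, and L with b1 = 0 is C.
as_linear = piecewise_linear_model(70, (1.0, 0.02, 0.0, 50))
as_constant = linear_model(70, (1.0, 0.0))
with_break = piecewise_linear_model(70, (1.0, 0.0, 0.1, 50))
```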
Model selection basics
We measure the goodness-of-fit of a model based on the Residual Sum of Absolute Differences (RSAD):
• The smaller the RSAD, the better the fit.
• A more complex model never yields a higher RSAD than a simpler model nested within it.
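The two points above can be checked on toy data. This sketch reads RSAD literally as the sum of absolute residuals, fits the constant and linear models at the median, and applies a hypothetical selection rule (accept the more complex model only if RSAD falls by more than a fraction `alpha`). The rule, the function names and the data are assumptions; the talk's actual criterion is not reproduced here.

```python
from itertools import combinations


def rsad(observed, fitted):
    """Residual sum of absolute differences."""
    return sum(abs(y - f) for y, f in zip(observed, fitted))


def fit_constant_median(expr):
    """Best constant under absolute loss: a data value that minimises RSAD."""
    return min(expr, key=lambda c: sum(abs(y - c) for y in expr))


def fit_linear_median(ages, expr):
    """Median regression line by exhaustive search.

    Some optimal line passes through two data points, so trying every
    pair is enough for a small example (cubic cost in the sample size).
    """
    best = None
    for i, j in combinations(range(len(ages)), 2):
        if ages[i] == ages[j]:
            continue
        slope = (expr[j] - expr[i]) / (ages[j] - ages[i])
        intercept = expr[i] - slope * ages[i]
        score = rsad(expr, [intercept + slope * a for a in ages])
        if best is None or score < best[0]:
            best = (score, intercept, slope)
    return best[1], best[2]


def select_model(rsad_simple, rsad_complex, alpha):
    """Hypothetical rule: prefer the complex model only for a large relative gain."""
    if rsad_simple == 0:
        return "C"
    gain = (rsad_simple - rsad_complex) / rsad_simple
    return "L" if gain > alpha else "C"


# Toy data (made up): a rising trend with one outlier at age 60.
ages = [20, 30, 40, 50, 60, 70, 80]
expr = [1.0, 1.6, 1.9, 2.4, 9.0, 3.6, 4.1]

level = fit_constant_median(expr)
intercept, slope = fit_linear_median(ages, expr)
rsad_c = rsad(expr, [level] * len(expr))
rsad_l = rsad(expr, [intercept + slope * a for a in ages])
chosen = select_model(rsad_c, rsad_l, alpha=0.2)
```

Since the linear fit can only match or improve on the constant fit, RSAD alone would always favour the larger model, which is why some threshold such as `alpha` is needed.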
Discovering DE patterns
Simulation results – ROC analysis
What about this type of pattern?
A change in variability of gene expression?
Differential Variability (DV) – a missing pattern in microarray analysis
DV analysis is useful in human disease studies
DV is related to differential coexpression
Ho et al. (2008) Bioinformatics (ISMB’08 issue), 24, i390-i398
Solution – DV analysis
DV and Non-DV QR models
Upper quantile: f_upper
Lower quantile: f_lower
where each f(.) is a piecewise linear function
Differences in model:
Non-DV model: the slopes are identical in f_upper and f_lower
DV model: the slopes are all independent
A gene is said to be DV if
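The contrast between the two models can be sketched on toy data. To keep the example short it makes several simplifying assumptions that are not from the talk: each f(.) is a single straight line rather than a piecewise linear function, the lower and upper quantiles are 0.25 and 0.75, and the two models are compared by their total check loss. All function names are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def loss_for_slope(ages, expr, slope, tau):
    """Smallest total check loss over all intercepts, for a fixed slope."""
    # With the slope fixed, some optimal intercept equals one of y - slope * age.
    offsets = [y - slope * a for a, y in zip(ages, expr)]
    return min(sum(check_loss(o - c, tau) for o in offsets) for c in offsets)


def candidate_slopes(ages, expr):
    """Slopes of lines through two data points with distinct ages."""
    pairs = combinations(range(len(ages)), 2)
    return [(expr[j] - expr[i]) / (ages[j] - ages[i])
            for i, j in pairs if ages[i] != ages[j]]


def fit_dv(ages, expr, lower=0.25, upper=0.75):
    """DV model: lower and upper quantile lines with independent slopes."""
    slopes = candidate_slopes(ages, expr)
    s_low = min(slopes, key=lambda s: loss_for_slope(ages, expr, s, lower))
    s_up = min(slopes, key=lambda s: loss_for_slope(ages, expr, s, upper))
    total = (loss_for_slope(ages, expr, s_low, lower)
             + loss_for_slope(ages, expr, s_up, upper))
    return s_low, s_up, total


def fit_non_dv(ages, expr, lower=0.25, upper=0.75):
    """Non-DV model: both quantile lines share one slope."""
    def joint(s):
        return (loss_for_slope(ages, expr, s, lower)
                + loss_for_slope(ages, expr, s, upper))
    shared = min(candidate_slopes(ages, expr), key=joint)
    return shared, joint(shared)


# Toy data (made up): constant centre, spread growing with age.
age_levels = (20, 40, 60, 80)
ages = [a for a in age_levels for _ in range(5)]
expr = [5.0 + k * a / 40.0 for a in age_levels for k in (-2, -1, 0, 1, 2)]

s_low, s_up, dv_total = fit_dv(ages, expr)
shared, non_dv_total = fit_non_dv(ages, expr)
```

On this fan-shaped example the independent slopes diverge (the lower line falls while the upper line rises), and forcing a shared slope gives a worse fit, which is the signature a DV criterion would look for.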
Simulation results – ROC analysis
Analysis of two brain ageing datasets
Lu dataset
12625 genes
30 individuals, aged 26-106
Colantuoni dataset
31 schizophrenia susceptibility genes
72 individuals, aged 18-67
Selection of alpha based on FDR
FDR estimation based on randomization of the dataset
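A generic sketch of randomization-based FDR estimation: shuffle the ages to break any age dependence, count how many genes still pass the threshold, and divide the average null count by the observed count. The statistic used here (`median_shift`) is a simple stand-in chosen for brevity, and the function names, threshold and toy data are assumptions; in practice the statistic would be whatever quantity the selection parameter alpha is applied to.

```python
import random
from statistics import median


def median_shift(ages, expr):
    """Stand-in statistic: |median of older half - median of younger half|."""
    cut = median(ages)
    younger = [y for a, y in zip(ages, expr) if a <= cut]
    older = [y for a, y in zip(ages, expr) if a > cut]
    return abs(median(older) - median(younger))


def estimate_fdr(ages, genes, statistic, threshold, n_permutations=200, seed=0):
    """Estimated FDR = mean number of calls under shuffled ages / observed calls."""
    rng = random.Random(seed)
    called = sum(1 for g in genes if statistic(ages, g) > threshold)
    if called == 0:
        return 0.0
    null_calls = 0
    for _ in range(n_permutations):
        shuffled = list(ages)
        rng.shuffle(shuffled)  # randomise the age labels
        null_calls += sum(1 for g in genes if statistic(shuffled, g) > threshold)
    return min(1.0, (null_calls / n_permutations) / called)


# Toy data (made up): two age-dependent genes and one flat gene.
ages = list(range(20, 80, 5))
genes = [
    [0.1 * a for a in ages],
    [5.0 - 0.05 * a for a in ages],
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
]

fdr = estimate_fdr(ages, genes, median_shift, threshold=1.0)
```

Repeating the estimate over a grid of thresholds and keeping the least strict one whose estimated FDR stays under a target level is one way to choose the cutoff.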
DV – Colantuoni dataset
DV genes – Lu dataset (1)
Observation: Different individuals age at different rates with respect to gene expression changes.
DV genes – Lu dataset (2)
Extension to multi-class problems
Using RSAD as the goodness-of-fit measure, we can extend our approach to discover DE and DV genes in multi-class datasets.
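One possible reading of the multi-class extension, sketched under assumptions: for DE, compare the RSAD about a single pooled median with the RSAD about class-specific medians; for DV, compare the per-class spread about each class median. The class names, values and function names are made up for illustration and are not taken from the talk.

```python
from statistics import median


def rsad_about_median(values):
    """RSAD of a constant-median fit to one set of values."""
    m = median(values)
    return sum(abs(v - m) for v in values)


def multiclass_rsad(groups):
    """RSAD with one common median versus one median per class."""
    pooled = [v for g in groups.values() for v in g]
    rsad_common = rsad_about_median(pooled)
    rsad_by_class = sum(rsad_about_median(g) for g in groups.values())
    return rsad_common, rsad_by_class


def class_spread(groups):
    """Mean absolute deviation about each class median."""
    return {name: rsad_about_median(g) / len(g) for name, g in groups.items()}


# Toy expression values (made up) for one gene in three classes.
groups = {
    "control": [1.0, 1.2, 0.9, 1.1],
    "disease_a": [2.0, 2.3, 1.9, 2.1],   # shifted level
    "disease_b": [1.0, 3.0, 0.2, 2.5],   # wider spread
}

rsad_common, rsad_by_class = multiclass_rsad(groups)
spread = class_spread(groups)
```

A large drop from `rsad_common` to `rsad_by_class` points to a difference in level between classes, while unequal entries in `spread` point to a difference in variability.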
Summary
Novel application of quantile regression models to identify DE and DV patterns in ageing microarray datasets.
Our approach is more robust than the standard least-squares linear regression approach.
Application to human brain ageing.
Acknowledgement
Supervisors
Dr. Michael Charleston (School of IT, USyd)
Prof. Cristobal dos Remedios (School of Med Sci, USyd)
Collaborator
Maurizio Stefani (USyd)
Funding:
Travel fellowship from InCoB’09
The University of Sydney
NICTA
http://www.it.usyd.edu.au/~joshua