TA_4_Ho - The International Conference on Bioinformatics


A model selection approach to discover
age-dependent gene expression patterns
using quantile regression models
Joshua W.K. Ho (1, 2)
1 School of IT, The University of Sydney
2 NICTA, Australian Technology Park, NSW
Joint work with Maurizio Stefani, Cristobal dos Remedios and Michael Charleston
Ageing Microarray Dataset
[Figure: ageing microarray dataset; labels: Arrays, Gene, Age]
InCoB 2009
Discover age-dependent patterns
How to find DE (differentially expressed) patterns?
Standard approach
• Linear regression using the method of least squares
• Second-order linear regression (quadratic regression)
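As a point of reference, both standard fits can be sketched in plain Python by solving the least-squares normal equations for y = b0 + b1·x (linear) or y = b0 + b1·x + b2·x² (quadratic). This is a minimal illustration, not the authors' code; the helper names `solve_linear` and `lstsq_poly` and the example ages and expression values are hypothetical.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - tail) / M[r][r]
    return beta


def lstsq_poly(x, y, degree):
    """Least-squares fit of y = b0 + b1*x + ... + b_d*x^d via the normal equations."""
    p = degree + 1
    # (X^T X)[r][c] = sum_i x_i^(r+c) and (X^T y)[r] = sum_i y_i * x_i^r
    xtx = [[sum(xi ** (r + c) for xi in x) for c in range(p)] for r in range(p)]
    xty = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(p)]
    return solve_linear(xtx, xty)


ages = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
expr = [5.1, 5.4, 5.9, 6.1, 6.6, 6.8]
print("linear   :", lstsq_poly(ages, expr, 1))  # [intercept, slope]
print("quadratic:", lstsq_poly(ages, expr, 2))  # [b0, b1, b2]
```

Because squared residuals grow quickly, a single outlying array can pull either fit noticeably, which is the weakness quantile regression addresses.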
Quantile regression

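Solve a different optimization problem:
$$\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} \rho_\tau\left(y_i - \beta_0 - \beta_1 x_i\right)$$
where $\rho_\tau$ is the check function and $\tau \in (0, 1)$ is the quantile of interest.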
Check function
$$\rho_\tau(u) = u\,\bigl(\tau - I(u < 0)\bigr)$$
We obtain a median regression line when $\tau = 0.5$.
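The check function weights positive residuals by τ and negative residuals by 1 − τ, so at τ = 0.5 it equals half the absolute residual. The minimal sketch below (hypothetical helper names `check_loss` and `best_constant`) shows that minimizing the summed check loss over a single constant recovers a sample quantile: the median at τ = 0.5, which ignores the outlying value 100.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(y, tau):
    """Minimise sum_i rho_tau(y_i - c) over c; an optimum is attained at a data point."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(data, 0.5))  # 3.0, the sample median
print(best_constant(data, 0.9))  # 100.0, the upper tail
print(check_loss(-2.0, 0.5), check_loss(2.0, 0.5))  # 1.0 1.0, i.e. |u| / 2
```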
A quantile regression line
How do we know whether the slope of the quantile regression line is 0?

Solution: a model selection approach
Quantile regression models
• Constant Model (C)
• Linear Model (L)
• Piecewise Linear Model (PL)
Model complexity increases from C to L to PL.
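The three models differ only in their covariates. The sketch below assumes, hypothetically, that PL adds a single hinge term max(0, x − k) at a fixed knot k = 45; the knot, the data and all helper names are illustrative. Each model is fitted at quantile τ by exhaustive enumeration, using the fact that some optimal quantile regression solution passes exactly through p observations (p = number of coefficients). This is exact but costs O(n^(p+1)), so it is only a teaching aid; a linear-programming solver would be the practical choice.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting; returns None for a singular system."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - tail) / M[r][r]
    return beta


def basis(age, model, knot):
    """Covariates for C, L and a hypothetical one-hinge PL model."""
    if model == "C":
        return [1.0]
    if model == "L":
        return [1.0, age]
    return [1.0, age, max(0.0, age - knot)]


def fit_quantile_model(ages, y, model, tau=0.5, knot=45.0):
    """Exact fit by enumerating every p-subset of observations to interpolate."""
    rows = [basis(a, model, knot) for a in ages]
    p = len(rows[0])
    best_loss, best_beta = float("inf"), None
    for idx in combinations(range(len(y)), p):
        beta = solve_linear([rows[i] for i in idx], [y[i] for i in idx])
        if beta is None:
            continue
        loss = sum(check_loss(yi - sum(b * v for b, v in zip(beta, row)), tau)
                   for row, yi in zip(rows, y))
        if loss < best_loss:
            best_loss, best_beta = loss, beta
    return best_loss, best_beta


ages = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
expr = [5.0, 5.2, 4.9, 5.1, 5.0, 5.3, 6.0, 6.6, 7.1, 7.7]
for model in ("C", "L", "PL"):
    loss, beta = fit_quantile_model(ages, expr, model)
    print(model, round(loss, 3), [round(b, 3) for b in beta])
```

Since C is L with a zero slope, and L is PL with a zero hinge coefficient, the minimized loss can only go down from C to L to PL.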

Model selection basics

We measure the goodness of fit of a model by its Residual Sum of Absolute Differences (RSAD):
• The smaller the RSAD, the better the fit.
• A more complex model never yields a higher RSAD than a simpler model nested within it.
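Because RSAD never increases as models become more complex, raw RSAD cannot choose a model by itself; a threshold is needed. One hypothetical rule, sketched below with an assumed relative-improvement threshold `alpha`, accepts a more complex model only when it lowers RSAD by more than that fraction of the currently selected model's RSAD. Under this assumption, a gene whose selected model is not C would be called age-dependent (DE).

```python
def select_model(rsad, alpha):
    """Hypothetical selection rule over nested models C < L < PL.

    A more complex model replaces the current choice only if it reduces RSAD
    by more than a fraction `alpha` of the current model's RSAD.
    """
    chosen = "C"
    for candidate in ("L", "PL"):
        current = rsad[chosen]
        if current > 0 and (current - rsad[candidate]) / current > alpha:
            chosen = candidate
    return chosen


print(select_model({"C": 10.0, "L": 6.0, "PL": 5.9}, alpha=0.1))  # L
print(select_model({"C": 10.0, "L": 9.9, "PL": 5.0}, alpha=0.1))  # PL
print(select_model({"C": 10.0, "L": 9.5, "PL": 9.2}, alpha=0.1))  # C
```

Comparing PL against the current choice (rather than only against L) lets a curved pattern with no overall trend still be detected, as in the second call.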
Discovering DE patterns
Simulation results – ROC analysis
What about this type of pattern?

A change in variability of gene expression?
Differential Variability (DV) – a missing pattern in microarray analysis
• DV analysis is useful in human disease studies
• DV is related to differential coexpression
Ho et al. (2008) Bioinformatics (ISMB’08 issue), 24, i390–i398
Solution – DV analysis
DV and Non-DV QR models
Upper quantile: f_upper; lower quantile: f_lower; each f(.) is a piecewise linear function.
Differences in model:
• Non-DV model: the slopes are identical in f_upper and f_lower.
• DV model: the slopes are all independent.
A gene is said to be DV if
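The DV comparison fits the same gene twice. The sketch below is a simplification under stated assumptions: it uses straight lines instead of the piecewise linear f(.) above, takes the 0.25 and 0.75 quantiles as hypothetical "lower" and "upper" quantiles, and uses the summed check loss as RSAD; data and helper names are illustrative. Since the Non-DV model is the DV model with the two slopes forced to be equal, its RSAD can never be lower, and a large relative gap suggests that the spread of expression changes with age.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def loss_for_slope(x, y, slope, tau):
    """Best check loss for a fixed slope; an optimal intercept is one of the residuals."""
    res = [yi - slope * xi for xi, yi in zip(x, y)]
    return min(sum(check_loss(r - a, tau) for r in res) for a in res)


def candidate_slopes(x, y):
    """Slopes through pairs of points; optimal linear quantile fits use one of them."""
    return {(y[j] - y[i]) / (x[j] - x[i])
            for i, j in combinations(range(len(x)), 2) if x[i] != x[j]}


def rsad_dv(x, y, lower=0.25, upper=0.75):
    """DV model: upper and lower quantile lines have independent slopes."""
    slopes = candidate_slopes(x, y)
    return (min(loss_for_slope(x, y, b, upper) for b in slopes)
            + min(loss_for_slope(x, y, b, lower) for b in slopes))


def rsad_non_dv(x, y, lower=0.25, upper=0.75):
    """Non-DV model: both quantile lines share a single slope."""
    return min(loss_for_slope(x, y, b, upper) + loss_for_slope(x, y, b, lower)
               for b in candidate_slopes(x, y))


# Spread of expression grows with age, so the two quantile lines diverge.
ages = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
expr = [5.0, 5.1, 4.8, 5.3, 4.5, 5.6, 4.2, 6.0, 3.9, 6.4]
dv, non_dv = rsad_dv(ages, expr), rsad_non_dv(ages, expr)
print(round(dv, 3), round(non_dv, 3), round((non_dv - dv) / non_dv, 3))
```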
Simulation results – ROC analysis
Analysis of two brain ageing datasets
• Lu dataset: 12625 genes; 30 individuals, aged 26-106
• Colantuoni dataset: 31 schizophrenia susceptibility genes; 72 individuals, aged 18-67
Selection of alpha based on the false discovery rate (FDR)
FDR estimated by randomizing the dataset
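The randomization idea can be sketched as a generic permutation estimate: shuffle the ages to destroy any genuine age effect, count how many genes still pass the threshold alpha, and divide the average of that count by the number of genes called on the real data. This is a sketch under assumptions: the per-gene statistic here is a stand-in (absolute Pearson correlation) rather than the RSAD-based criterion, and `estimate_fdr`, `count_calls` and the simulated genes are hypothetical.

```python
import random


def abs_pearson(x, y):
    """|Pearson correlation|, used here only as a stand-in per-gene statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def count_calls(genes, ages, stat, alpha):
    """Number of genes whose statistic exceeds the threshold alpha."""
    return sum(1 for g in genes if stat(ages, g) > alpha)


def estimate_fdr(genes, ages, stat, alpha, n_perm=200, seed=0):
    """FDR ~ (mean number of calls on age-permuted data) / (calls on real data)."""
    rng = random.Random(seed)
    observed = count_calls(genes, ages, stat, alpha)
    if observed == 0:
        return 0.0, 0
    null_total = 0
    for _ in range(n_perm):
        shuffled = ages[:]
        rng.shuffle(shuffled)  # breaks any age-expression association
        null_total += count_calls(genes, shuffled, stat, alpha)
    return min(1.0, null_total / n_perm / observed), observed


rng = random.Random(1)
ages = [float(a) for a in range(20, 80, 5)]
genes = [[0.05 * a + rng.gauss(0, 0.2) for a in ages]]         # one age-dependent gene
genes += [[rng.gauss(0, 1) for _ in ages] for _ in range(30)]  # pure-noise genes
for alpha in (0.5, 0.7, 0.9):
    print(alpha, estimate_fdr(genes, ages, abs_pearson, alpha))
```

Scanning alpha in this way and keeping a value whose estimated FDR is acceptable is one way to tie the threshold to the FDR.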
DV – Colantuoni dataset
DV genes – Lu dataset (1)
Observation: different individuals age at different rates with respect to gene expression changes.
DV genes – Lu dataset (2)
Extension to multi-class problems

• Using RSAD as the goodness-of-fit measure, we can extend our approach to discover DE and DV genes in multi-class datasets.
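One way the RSAD idea might carry over to several classes (an assumption, not necessarily the authors' formulation) is to compare a model with one median for all samples against a model with a separate median per class, using the sum of absolute deviations (the τ = 0.5 case) as RSAD. The class names, values and helper names below are invented for illustration.

```python
from statistics import median


def sad(values, center):
    """Sum of absolute deviations from a fitted constant."""
    return sum(abs(v - center) for v in values)


def multiclass_rsad(groups):
    """Compare a single pooled median (no DE) with one median per class (DE)."""
    pooled = [v for g in groups.values() for v in g]
    rsad_pooled = sad(pooled, median(pooled))
    rsad_per_class = sum(sad(g, median(g)) for g in groups.values())
    return rsad_pooled, rsad_per_class


groups = {
    "control": [5.1, 4.9, 5.3, 5.0, 5.2],
    "disease_a": [6.0, 6.4, 5.8, 6.1, 6.3],
    "disease_b": [5.0, 5.4, 4.8, 5.1, 5.2],
}
pooled, per_class = multiclass_rsad(groups)
print(round(pooled, 3), round(per_class, 3), round((pooled - per_class) / pooled, 3))
```

Because each class median minimizes the absolute deviations within its class, the per-class model can never have a larger RSAD; as in the age-dependent case, a threshold on the relative reduction would still be needed to call a gene DE.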
Summary

• Novel application of quantile regression models to identify DE and DV patterns in ageing microarray datasets.
• Our approach is more robust than the standard least-squares linear regression approach.
• Application to human brain ageing.
Acknowledgement
Supervisors
• Dr. Michael Charleston (School of IT, USyd)
• Prof. Cristobal dos Remedios (School of Med Sci, USyd)
Collaborator
• Maurizio Stefani (USyd)
Funding:
• Travel fellowship from InCoB’09
• The University of Sydney
• NICTA
http://www.it.usyd.edu.au/~joshua