An introduction to Genetical Genomics and Systems


Inference of Regulatory
Networks via Systems
Genetics
Ina Hoeschele
Systems Genetics
Infer cell’s regulatory
structure
Systems Biology
Infer molecular basis of
phenotypes / diseases
Complex Trait Biology
Systems Genetics

Measure DNA sequence polymorphisms on a group of
related individuals (<100 to 2000+) covering the entire
genome (e.g. SNPs)



Several genotypes at each polymorphism (e.g. two, 0/1)
Multi-factorial perturbations of a system, genetically randomized
populations
Measure molecular and organismal variables, e.g.




Expression profiling (etraits)
Expression profiling and disease phenotypes
Expression profiling, methylation profiling, disease
Metabolite, protein profiling …
Systems Genetics

The genotypes at some polymorphisms
influence directly the expression of certain
genes


in cis: polymorphism A in gene A’s promoter
region influences its transcript abundance
in trans: polymorphism A in gene A’s coding
region influences the function of protein A; let
gene A be a regulator of gene B, then both
polymorphism A and gene A influence the
expression of gene B
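
The cis and trans cases above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the effect sizes, the additive Gaussian noise, the 0/1 genotype coding, and the names `simulate_cis_trans` and `genotype_contrast` are all hypothetical and chosen for illustration. A promoter polymorphism shifts the transcript abundance of gene A (cis), while a coding polymorphism leaves gene A's transcript unchanged but alters the expression of its target, gene B (trans).

```python
import random
from statistics import mean


def simulate_cis_trans(n=2000, cis_effect=1.0, trans_effect=0.7,
                       reg_effect=0.5, seed=0):
    """Simulate rows of (p_cis, p_coding, y_a, y_b).

    p_cis    : 0/1 genotype at a polymorphism in gene A's promoter
    p_coding : 0/1 genotype at a polymorphism in gene A's coding region
    y_a      : etrait of gene A (affected in cis by p_cis only)
    y_b      : etrait of gene B (regulated by gene A, and affected in
               trans by p_coding through the function of protein A)
    All effect sizes are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        p_cis = rng.randint(0, 1)
        p_coding = rng.randint(0, 1)
        y_a = cis_effect * p_cis + rng.gauss(0.0, 1.0)
        y_b = reg_effect * y_a + trans_effect * p_coding + rng.gauss(0.0, 1.0)
        rows.append((p_cis, p_coding, y_a, y_b))
    return rows


def genotype_contrast(rows, geno_col, trait_col):
    """Difference in mean trait value between genotype 1 and genotype 0."""
    g1 = [r[trait_col] for r in rows if r[geno_col] == 1]
    g0 = [r[trait_col] for r in rows if r[geno_col] == 0]
    return mean(g1) - mean(g0)


if __name__ == "__main__":
    rows = simulate_cis_trans()
    print("cis:   promoter SNP on Y_A", round(genotype_contrast(rows, 0, 2), 2))
    print("none:  coding SNP on Y_A  ", round(genotype_contrast(rows, 1, 2), 2))
    print("trans: coding SNP on Y_B  ", round(genotype_contrast(rows, 1, 3), 2))
```

In this toy setting the promoter polymorphism also shows a weaker association with gene B, because its cis effect on gene A propagates to A's target.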
Systems Genetics

The genes’ expression profiles (=etraits) have
both polymorphism and gene (etrait) regulators







Very large number of targets (regulated genes etc.)
Very large number of potential regulators for each
target
Sample size (n) MUCH smaller than number of
potential regulators (p)
Targets are co-regulated
Regulators are correlated
Regulatory networks are cyclic
Analyses of regulatory programs should account
for all of the above
Systems Genetics

One target – one regulator approach





YT =  + bPR + e
do for each T and each R (except cis analysis)
low power
trans: YT =  + b1YR + b2PR + e (+ cisP)
better power but does not account for coregulation of multiple targets
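
A minimal sketch of the one target, one regulator scan described above: fit a simple linear regression of one etrait on each polymorphism in turn and rank the markers by the t-statistic of the slope. The function names, the toy data generator, and the choice of a plain OLS t-statistic are assumptions for illustration, not the method used in any particular study.

```python
import math
import random


def simple_regression(x, y):
    """OLS fit of y = mu + b*x + e. Returns (mu, b, t-statistic of b).

    Assumes x is not constant and the fit is not perfect (residual
    variance > 0); otherwise a division by zero occurs.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    mu = my - b * mx
    rss = sum((yi - mu - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return mu, b, b / se_b


def scan(genotypes, etrait):
    """Regress one etrait on each marker separately.

    genotypes: list of marker columns, each a list of n genotype codes.
    Returns a list of (marker_index, slope, t_statistic).
    """
    results = []
    for j, column in enumerate(genotypes):
        _, b, t = simple_regression(column, etrait)
        results.append((j, b, t))
    return results


def toy_scan_data(n=300, p=20, causal=7, effect=1.0, seed=3):
    """Hypothetical toy data: p independent 0/1 markers, one of them causal."""
    rng = random.Random(seed)
    genotypes = [[rng.randint(0, 1) for _ in range(n)] for _ in range(p)]
    etrait = [effect * genotypes[causal][i] + rng.gauss(0.0, 1.0)
              for i in range(n)]
    return genotypes, etrait


if __name__ == "__main__":
    genos, y = toy_scan_data()
    best = max(scan(genos, y), key=lambda h: abs(h[2]))
    print("strongest marker:", best[0])
```

Repeating this scan for every target multiplies the number of tests by the number of etraits, which is one reason the approach has low power after multiple-testing correction.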
Systems Genetics

One target – all regulators approach




YT =  + Rb1RPR (+ Rb2RYR ) + e
do for each T, still does not account for co-regulation
standard variable selection methods and regularization methods
tend not to perform well (n<<p, correlated regulators)
May also need to consider interactions among loci


Often ignored or limited to two-way interactions
Penalization/Regularization methods


Constrained OLS, bounds on $L_t$ norm(s) of coefficients (t = 1, 2, …)
Elastic net variable selection (Zou and Hastie 2005)



Extension of lasso (compromise with ridge regression)
n<<p, joint selection of correlated predictors
Bayesian variable selection



Priors on b
MCMC ??
Deterministic (e.g. variational) ??
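
To illustrate the elastic net penalty listed above, here is a minimal coordinate-descent sketch. It minimizes (1/2n)·||y − Xb||² + lam·(alpha·||b||₁ + (1 − alpha)/2·||b||²), which is one common parameterization of the "naive" elastic net (without the rescaling step of Zou and Hastie 2005). The function names, the parameter names `lam` and `alpha`, and the toy data are assumptions made for illustration. Inputs are assumed to be centered, so no intercept is fitted.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Naive elastic net by cyclic coordinate descent.

    X: list of n rows with p (centered) predictors; y: n centered responses.
    alpha = 1 gives the lasso, alpha = 0 gives ridge regression.
    Runs a fixed number of sweeps (no convergence check, for brevity).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ss = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the current b
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            c = cols[j]
            # covariance of column j with the partial residual (j removed)
            rho = sum(c[i] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * c[i]
                b[j] = new
    return b


def toy_correlated_data(n=60, p=120, seed=1):
    """Hypothetical n << p data; predictors 0 and 1 are highly correlated
    and both truly affect y."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[1] = row[0] + 0.1 * rng.gauss(0.0, 1.0)
        X.append(row)
    y = [r[0] + r[1] + rng.gauss(0.0, 0.5) for r in X]
    return X, y


if __name__ == "__main__":
    X, y = toy_correlated_data()
    b = elastic_net(X, y, lam=0.3, alpha=0.5)
    print("nonzero coefficients:", [j for j, v in enumerate(b) if v != 0.0])
```

The ridge part of the penalty is what encourages correlated predictors to enter the model together, whereas a pure lasso tends to pick one of them arbitrarily.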
Systems Genetics

Clustering of targets




Analyze jointly the targets in a cluster
Single regulator model, multivariate analysis
costly
PCA within clusters, analyze PCs separately
Analyze cluster with all regulator model
(individual Y model but joint variable
selection)

Geronemo: iteratively perform clustering and
selection of cluster=module regulators (regression
tree) (Lee et al. 2006)
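
The "PCA within clusters" option above can be sketched as follows: summarize the etraits in one cluster by their first principal component and then treat the PC scores as a single trait in the downstream regulator analysis. This is a minimal sketch using power iteration on the sample covariance matrix; the function name `first_pc`, the random starting vector, and the fixed iteration count are assumptions for illustration.

```python
import math
import random


def first_pc(Y, n_iter=200, seed=0):
    """First principal component of a cluster of etraits.

    Y: list of n rows (individuals), each with m etrait values.
    Returns (loadings, scores, fraction_of_variance_explained).
    Uses power iteration, so it assumes the random start is not
    orthogonal to the leading eigenvector and that the leading
    eigenvalue is separated from the second one.
    """
    n, m = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / n for j in range(m)]
    Z = [[row[j] - means[j] for j in range(m)] for row in Y]
    # sample covariance matrix of the etraits (m x m)
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
          for b in range(m)] for a in range(m)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(n_iter):
        w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v
    eigval = sum(v[a] * sum(C[a][b] * v[b] for b in range(m)) for a in range(m))
    scores = [sum(z[j] * v[j] for j in range(m)) for z in Z]
    total_var = sum(C[j][j] for j in range(m))
    return v, scores, eigval / total_var


if __name__ == "__main__":
    cluster = [[1.0, 2.1, 0.9], [2.0, 3.9, 2.2], [3.0, 6.2, 2.8], [4.0, 8.1, 4.1]]
    loadings, scores, frac = first_pc(cluster)
    print("fraction of variance on PC1:", round(frac, 3))
```

The scores can then be regressed on each candidate regulator, one analysis per retained PC, instead of one analysis per etrait in the cluster.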
Systems Genetics

Biclustering, two-group association



Find groups of targets regulated by groups of
polymorphisms
Biclustering based on the matrix of associations between
targets and polymorphisms: efficient, but are the results
meaningful?
Various approaches for two-group association

Penalized Canonical Correlation Analysis (CCA)


Represent CCA in regression framework
Bayesian CCA (probabilistic interpretation, joint latent factor
model for both groups of variables)


MCMC (convergence issues, see factor analysis)
deterministic (variational)
Systems Genetics

Two-step regulatory network inference
1a) Construct an Undirected Dependency Graph (UDG) using
target data (e.g., expression) only
1b) Determine which polymorphisms affect which targets and use
this information to direct edges (e.g., Neto et al. 2008)
2a) Perform cis and trans polymorphism analysis and combine into
an encompassing network (Liu et al. 2008)
2b) Sparsify the network, using structural equation modeling SEM



Extension of linear regression (variables can be both response and
predictor)
Likelihoods for SEM and linear regression (LR) differ for cyclic networks
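
Why the two likelihoods differ for cyclic networks can be seen from a generic linear Gaussian SEM. The notation here is an assumption introduced for illustration: $y_i$ is the vector of etraits and $p_i$ the vector of polymorphism genotypes for individual $i$, $B$ holds the gene-on-gene effects (zero diagonal), $F$ the polymorphism effects, and $D$ is a diagonal residual covariance matrix.

```latex
\begin{align*}
  y_i &= B\,y_i + F\,p_i + e_i, \qquad e_i \sim N(0, D), \quad i = 1, \dots, n \\
  \ell(B, F, D) &= n \log\bigl|\det(I - B)\bigr| - \frac{n}{2} \log\det D \\
                &\quad - \frac{1}{2} \sum_{i=1}^{n}
                   \bigl[(I - B)\,y_i - F\,p_i\bigr]^{\top} D^{-1}
                   \bigl[(I - B)\,y_i - F\,p_i\bigr] + \text{const}
\end{align*}
```

If the network is acyclic, the variables can be ordered so that $B$ is strictly triangular, hence $\det(I - B) = 1$ and the first term vanishes: the SEM likelihood then equals the product of the separate regression likelihoods, one per target. With cycles, $\det(I - B)$ generally differs from 1, so fitting each target by ordinary regression no longer maximizes the SEM likelihood.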
Toward one-step regulatory network inference

Geronemo (etraits; small list (~300) of candidate regulator
genes)