An introduction to Genetical Genomics and Systems
Inference of Regulatory Networks via Systems Genetics
Ina Hoeschele
1
Systems Genetics
Infer the cell’s regulatory structure
Systems Biology
Infer the molecular basis of phenotypes / diseases
Complex Trait Biology
2
Systems Genetics
Measure DNA sequence polymorphisms (e.g. SNPs) covering the entire genome on a group of related individuals (<100 to 2000+)
Several genotypes at each polymorphism (e.g. two, coded 0/1)
Multi-factorial perturbations of a system, genetically randomized populations
Measure molecular and organismal variables, e.g.
Expression profiling (etraits)
Expression profiling and disease phenotypes
Expression profiling, methylation profiling, disease
Metabolite, protein profiling …
3
Systems Genetics
The genotypes at some polymorphisms directly influence the expression of certain genes
in cis: polymorphism A in gene A’s promoter region influences its transcript abundance
in trans: polymorphism A in gene A’s coding region influences the function of protein A; if gene A is a regulator of gene B, then both polymorphism A and gene A influence the expression of gene B
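The cis/trans distinction above can be made concrete with a small simulation. This is only a sketch and is not taken from the slides: the effect sizes, the 0/1 genotype coding at equal frequency, and the names `p_promoter`, `p_coding` and `activity` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # number of individuals (assumed)

# cis: a promoter polymorphism shifts gene A's own transcript abundance.
p_promoter = rng.integers(0, 2, size=n)          # genotypes coded 0/1
y_a_cis = 5.0 + 1.0 * p_promoter + rng.normal(size=n)

# trans: a coding polymorphism leaves gene A's transcript level unchanged
# but alters the activity of protein A, which in turn regulates gene B.
p_coding = rng.integers(0, 2, size=n)
y_a = 5.0 + rng.normal(size=n)                   # not affected by p_coding
activity = np.where(p_coding == 1, 0.3, 1.0)     # hypothetical loss of function
y_b = activity * y_a + rng.normal(scale=0.5, size=n)


def corr(u, v):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(u, v)[0, 1]


r_cis = corr(p_promoter, y_a_cis)   # polymorphism vs. its own gene's etrait
r_trans = corr(p_coding, y_b)       # polymorphism vs. the downstream etrait
r_none = corr(p_coding, y_a)        # coding variant vs. gene A's transcript
print(f"cis: {r_cis:.2f}  trans: {r_trans:.2f}  coding vs. Y_A: {r_none:.2f}")
```

In this toy setting the coding polymorphism is associated with gene B's expression although it is not associated with gene A's transcript level, which is the situation the trans bullet describes.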
4
Systems Genetics
The genes’ expression profiles (= etraits) have both polymorphism and gene (etrait) regulators
Very large number of targets (regulated genes etc.)
Very large number of potential regulators for each target
Sample size (n) MUCH smaller than number of potential regulators (p)
Targets are co-regulated
Regulators are correlated
Regulatory networks are cyclic
Analyses of regulatory programs should account for all of the above
5
Systems Genetics
One target – one regulator approach
$Y_T = \mu + b P_R + e$
do for each T and each R (except cis analysis)
low power
trans: $Y_T = \mu + b_1 Y_R + b_2 P_R + e$ (+ cis P)
better power but does not account for co-regulation of multiple targets
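The one target, one regulator scan can be sketched as a vectorized simple regression of every etrait on every polymorphism. The function name `single_regulator_scan`, the simulated data and the effect size are assumptions made for illustration; the model fitted for each pair is the first one on this slide (intercept, one polymorphism, residual).

```python
import numpy as np


def single_regulator_scan(Y, P):
    """Fit Y_T = mu + b * P_R + e separately for every (target, polymorphism) pair.

    Y: (n, n_targets) expression matrix; P: (n, n_markers) genotype codes.
    Returns the slopes b and their t-statistics, each (n_targets, n_markers).
    """
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)            # centering absorbs the intercept mu
    Pc = P - P.mean(axis=0)
    sxx = (Pc ** 2).sum(axis=0)        # per-marker sum of squares
    syy = (Yc ** 2).sum(axis=0)        # per-target sum of squares
    sxy = Yc.T @ Pc                    # cross-products for all pairs
    b = sxy / sxx                      # least-squares slopes
    rss = syy[:, None] - b * sxy       # residual sum of squares per pair
    se = np.sqrt(rss / (n - 2) / sxx)  # standard error of each slope
    return b, b / se


# Toy data (assumed): 200 individuals, 5 markers, 3 targets,
# with marker 2 acting on target 0.
rng = np.random.default_rng(0)
n = 200
P = rng.integers(0, 2, size=(n, 5)).astype(float)
Y = rng.normal(size=(n, 3))
Y[:, 0] += 1.5 * P[:, 2]

b, t = single_regulator_scan(Y, P)
best_marker = int(np.argmax(np.abs(t[0])))
print("strongest marker for target 0:", best_marker)
```

With thousands of targets and markers this produces millions of tests, which is one way to read the "low power" remark: the multiple-testing burden is heavy and each test ignores all other regulators.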
6
Systems Genetics
One target – all regulators approach
$Y_T = \mu + \sum_R b_{1R} P_R \; (+ \sum_R b_{2R} Y_R) + e$
do for each T, still does not account for co-regulation
standard variable selection and regularization methods tend not to perform well (n << p, correlated regulators)
May also need to consider interactions among loci
Often ignored or limited to two-way interactions
Penalization/Regularization methods
Constrained OLS, bounds on $L_t$ norm(s) of coefficients (t = 1, 2, …)
Elastic net variable selection (Zou and Hastie 2005)
Extension of lasso (compromise with ridge regression)
n<<p, joint selection of correlated predictors
Bayesian variable selection
Priors on b
MCMC ??
Deterministic (e.g. variational) ??
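To illustrate the penalization idea for one target with all polymorphisms as candidate regulators, here is a minimal coordinate-descent sketch of an elastic net fit. It is not the implementation used in the talk or in Zou and Hastie's software. The parameterization through `alpha` and `l1_ratio`, the simulated linkage structure and the effect sizes are assumptions. The objective minimized is (1/2n)·||y − Xb||² + alpha·l1_ratio·||b||₁ + 0.5·alpha·(1 − l1_ratio)·||b||₂².

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent for the elastic net objective stated above."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)               # centering handles the intercept
    yc = y - y.mean()
    col_ss = (Xc ** 2).sum(axis=0) / n    # (1/n) * x_j' x_j
    b = np.zeros(p)
    r = yc.copy()                         # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:          # skip monomorphic markers
                continue
            rho = Xc[:, j] @ r / n + col_ss[j] * b[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (
                col_ss[j] + alpha * (1.0 - l1_ratio))
            r += Xc[:, j] * (b[j] - new)  # keep the residual in sync
            b[j] = new
    return b


# Toy n << p data (assumed): 60 individuals, 200 markers, neighbouring
# markers correlated as if in linkage, two causal markers.
rng = np.random.default_rng(1)
n, p = 60, 200
P = np.empty((n, p))
P[:, 0] = rng.integers(0, 2, size=n)
for j in range(1, p):
    flip = rng.random(n) < 0.1            # 10% "recombination" between neighbours
    P[:, j] = np.where(flip, 1.0 - P[:, j - 1], P[:, j - 1])
y = 2.0 * P[:, 50] - 2.0 * P[:, 150] + rng.normal(scale=0.5, size=n)

alpha, l1_ratio = 0.2, 0.5
b_hat = elastic_net(P, y, alpha=alpha, l1_ratio=l1_ratio)
selected = np.flatnonzero(b_hat)
print("markers with non-zero coefficients:", selected)
```

The ridge part of the penalty is what lets correlated neighbouring markers enter together, while the lasso part sets most other coefficients to exactly zero.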
7
Systems Genetics
Clustering of targets
Analyze jointly the targets in a cluster
Single regulator model, multivariate analysis
costly
PCA within clusters, analyze PCs separately
Analyze cluster with all regulator model (individual Y model but joint variable selection)
Geronemo: iteratively perform clustering and selection of cluster (= module) regulators (regression tree) (Lee et al. 2006)
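The "PCA within clusters" option can be sketched as follows. The cluster labels are assumed to be given by some earlier clustering step (not shown). The function name `cluster_pcs` and the simulated two-cluster data are illustrative assumptions, not part of the original material.

```python
import numpy as np


def cluster_pcs(Y, labels, n_pcs=1):
    """Return the leading principal-component scores of each target cluster.

    Y: (n, n_targets) expression matrix; labels: cluster label per target.
    Output: dict mapping cluster label -> (n, n_pcs) array of PC scores.
    """
    pcs = {}
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        Yc = Yc - Yc.mean(axis=0)                 # center each target
        U, s, _ = np.linalg.svd(Yc, full_matrices=False)
        pcs[int(c)] = U[:, :n_pcs] * s[:n_pcs]    # scores = U * singular values
    return pcs


# Toy data (assumed): two clusters of 10 targets, each driven by one
# latent factor; a marker affects only the factor behind cluster 0.
rng = np.random.default_rng(2)
n = 100
marker = rng.integers(0, 2, size=n).astype(float)
f0 = 1.0 * marker + rng.normal(size=n)
f1 = rng.normal(size=n)
Y = np.hstack([np.outer(f0, np.ones(10)), np.outer(f1, np.ones(10))])
Y += 0.5 * rng.normal(size=(n, 20))
labels = np.repeat([0, 1], 10)

pcs = cluster_pcs(Y, labels)
# Each PC is then analyzed separately, here with a simple correlation
# (the sign of a PC is arbitrary, so only the magnitude is meaningful).
for c, scores in pcs.items():
    r = np.corrcoef(marker, scores[:, 0])[0, 1]
    print(f"cluster {c}: |corr(marker, PC1)| = {abs(r):.2f}")
```

Replacing a cluster by one or a few PCs trades the cost of a full multivariate analysis for a handful of univariate ones, at the price of ignoring variation not captured by the leading components.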
8
Systems Genetics
Biclustering, two-group association
Find groups of targets regulated by groups of polymorphisms
Biclustering based on matrix of associations between targets and polymorphisms: efficient, but are the results meaningful?
Various approaches for two-group association
Penalized Canonical Correlation Analysis (CCA)
Represent CCA in regression framework
Bayesian CCA (probabilistic interpretation, joint latent factor model for both groups of variables)
MCMC (convergence issues, see factor analysis)
deterministic (variational)
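As one simple instance of a penalized CCA, the sketch below adds a ridge term to both within-set covariance matrices, which keeps the problem well defined when variables outnumber individuals. This is a swapped-in illustration, not the sparse or Bayesian variants the slide alludes to. The function name `ridge_cca`, the penalty `lam` and the simulated data are assumptions.

```python
import numpy as np


def ridge_cca(X, Y, lam=0.1):
    """First canonical pair of a ridge-regularized CCA between X and Y.

    Maximizes a' Sxy b subject to a'(Sxx + lam*I)a = 1 and b'(Syy + lam*I)b = 1.
    Returns (regularized canonical correlation, weights a, weights b).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / n + lam * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + lam * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    Lx = np.linalg.cholesky(Sxx)          # Sxx = Lx Lx'
    Ly = np.linalg.cholesky(Syy)          # Syy = Ly Ly'
    K = np.linalg.solve(Lx, Sxy)          # Lx^{-1} Sxy
    K = np.linalg.solve(Ly, K.T).T        # Lx^{-1} Sxy Ly^{-T}
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    a = np.linalg.solve(Lx.T, U[:, 0])    # map back to the original scale
    b = np.linalg.solve(Ly.T, Vt[0])
    return s[0], a, b


# Toy data (assumed): markers 0-2 jointly drive targets 0-4.
rng = np.random.default_rng(3)
n, n_markers, n_targets = 80, 30, 20
P = rng.integers(0, 2, size=(n, n_markers)).astype(float)
Y = rng.normal(size=(n, n_targets))
Y[:, :5] += P[:, :3].sum(axis=1)[:, None]

rho, a, b = ridge_cca(P, Y, lam=0.5)
print(f"regularized canonical correlation: {rho:.2f}")
print("largest marker weights:", np.argsort(-np.abs(a))[:3])
print("largest target weights:", np.argsort(-np.abs(b))[:5])
```

A ridge penalty shrinks weights but does not zero them out. Obtaining explicit groups of polymorphisms and targets would need a sparsity-inducing penalty or a Bayesian prior instead, as the slide's other bullets suggest.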
9
Systems Genetics
Two-step regulatory network inference
1a) Construct an Undirected Dependency Graph (UDG) using target data (e.g., expression) only
1b) Determine which polymorphisms affect which targets and use this information to direct edges (e.g., Neto et al. 2008)
2a) Perform cis and trans polymorphism analysis and combine into an encompassing network (Liu et al. 2008)
2b) Sparsify the network using structural equation modeling (SEM)
SEM: extension of linear regression (LR) in which variables can be both response and predictor
likelihoods for SEM and LR are not the same for cyclic networks
Toward one-step regulatory network inference
Geronemo (etraits; small list (~300) of candidate regulator genes)
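The remark that SEM and LR likelihoods differ for cyclic networks can be checked numerically, reading "LR" as linear regression. The sketch assumes a linear Gaussian SEM written as y = B y + F x + e with independent residuals; the symbols B, F and sigma2, the function name `sem_loglik` and the two-gene example are assumptions, not notation from the slides. Under this model the SEM log-likelihood equals the sum of the per-equation regression log-likelihoods plus n·log|det(I − B)|. That extra term is zero when B is acyclic and generally non-zero when it contains a cycle.

```python
import numpy as np


def sem_loglik(Y, X, B, F, sigma2):
    """Gaussian log-likelihood of the SEM y = B y + F x + e.

    Returns (full SEM log-likelihood, sum of per-equation regression
    log-likelihoods). They differ by n * log|det(I - B)|.
    """
    n, p = Y.shape
    A = np.eye(p) - B
    resid = Y @ A.T - X @ F.T                   # rows are (I - B) y_i - F x_i
    _, logdet = np.linalg.slogdet(A)            # log|det(I - B)|
    lr_part = (-0.5 * n * np.sum(np.log(2.0 * np.pi * sigma2))
               - 0.5 * np.sum(resid ** 2 / sigma2))
    return n * logdet + lr_part, lr_part


rng = np.random.default_rng(4)
n = 200
X = rng.integers(0, 2, size=(n, 2)).astype(float)   # two polymorphisms
F = np.eye(2)                    # each acts in cis on one gene (assumed)
sigma2 = np.array([1.0, 1.0])
B_acyclic = np.array([[0.0, 0.0], [0.8, 0.0]])      # gene 1 -> gene 2 only
B_cyclic = np.array([[0.0, -0.5], [0.8, 0.0]])      # feedback loop 1 <-> 2


def simulate(B):
    """Draw Y from the reduced form y = (I - B)^{-1} (F x + e)."""
    E = rng.normal(size=(n, 2)) * np.sqrt(sigma2)
    return (X @ F.T + E) @ np.linalg.inv(np.eye(2) - B).T


full_acyc, lr_acyc = sem_loglik(simulate(B_acyclic), X, B_acyclic, F, sigma2)
full_cyc, lr_cyc = sem_loglik(simulate(B_cyclic), X, B_cyclic, F, sigma2)
print("acyclic: SEM minus LR =", full_acyc - lr_acyc)
print("cyclic:  SEM minus LR =", full_cyc - lr_cyc)
```

For the cyclic example det(I − B) = 1.4, so the two criteria differ by n·log(1.4). Sparsifying a cyclic network with equation-by-equation regression would therefore optimize a different objective than the SEM likelihood.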
10