A systems biology approach to the
identification and analysis of
transcriptional regulatory
networks in osteocytes
Angela K. Dean, Stephen E. Harris, Jianhua
Ruan
Overview
Osteocytes – Background & Motivation
Review of Biological Central Dogma
Osteocyte gene set derivation
Osteocyte purification
Microarray experiments
Functional annotation analysis
Sequence Analysis of promoter regions
Construction of regulatory network
Partitioning to define cis-regulatory
modules
Results
Background – Cellular functions
Certain types of cells perform specific
biological functions
Key genes must be activated for the cell to perform correctly
Osteocytes play an essential role in regulating
bone formation and remodeling
We want to identify these key genes and the
activators of these genes
Why study osteocyte cells?
Identifying these key genes (and their
activators) involved in the bone-formation
process may lead to new targeted therapies
For osteoporosis, loss of bone in space travel,
extended bed rest, etc.
Molecular Biology Central Dogma
We want to identify these associations between
Transcription Factors (TFs) and the genes that they regulate
in order to build a “transcriptional regulatory network”
Osteocyte cells are hard to isolate
Embedded within the bone matrix, and lacking
molecular and cell surface markers, they are
seemingly inaccessible
How to characterize and isolate these cells?
Solution: create “special” mouse that contains
inserted “special” gene that drives
fluorescence in osteocytes
Isolating osteocytes
Osteocytes are known to highly express Dentin
matrix protein 1 (DMP1)
A transgene was created in which the same promoter
(activation) region as DMP1 drives GFP, and it was then
inserted into the mouse to make it transgenic
Cells that highly express DMP1 (osteocytes) will
also drive GFP
We can now purify osteocytes from
other cells using fluorescence-activated
cell sorting
Identifying key osteocyte genes using
microarray
Microarray experiments allow us to measure the
activity of genes (expression profile)
We compared the expression profiles of the
purified osteocyte cells (+GFP) to non-osteocyte
cells (-GFP)
Identified the top 269 genes expressed > 3-fold higher
in +GFP than in -GFP cells (FDR-corrected p-value < 0.05)
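The selection rule on this slide (more than 3-fold higher in +GFP, FDR-corrected p-value below 0.05) can be sketched as follows. This is only an illustration: the slides do not say which FDR procedure was used, so Benjamini-Hochberg is assumed here, expression values are assumed to be on a linear (not log) scale, and the function names and toy numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is the FDR correction; the slides only say "FDR-corrected".
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_enriched_genes(records, min_fold=3.0, max_fdr=0.05):
    """records: list of (gene, gfp_pos_expr, gfp_neg_expr, raw_pvalue).

    Keeps genes with fold change > min_fold and adjusted p-value < max_fdr.
    """
    adjusted = benjamini_hochberg([r[3] for r in records])
    selected = []
    for (gene, pos, neg, _), q in zip(records, adjusted):
        fold = pos / neg  # linear-scale ratio, +GFP over -GFP
        if fold > min_fold and q < max_fdr:
            selected.append(gene)
    return selected


# Hypothetical toy data, not values from the study.
toy = [
    ("geneA", 900.0, 100.0, 0.001),
    ("geneB", 250.0, 100.0, 0.002),
    ("geneC", 800.0, 100.0, 0.200),
    ("geneD", 400.0, 100.0, 0.010),
]
print(select_enriched_genes(toy))
```

In the toy data, geneB fails the fold-change filter and geneC fails the FDR filter, which shows that both conditions must hold at once.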
Identifying functionally-related osteocyte genes
Each of the 269 genes has one or more GO
terms or PIR-keywords associated with it
Gene Ontology (GO) terms describe biological
processes, cellular components and molecular
functions
Protein Information Resource (PIR) keyword is an
annotation from the PIR database
Functional Annotation Clustering
For each GO term associated with a gene or group of genes
within the 269 set, a p-value is computed using
the hypergeometric distribution and adjusted for multiple testing using
the Benjamini method
Enrichment score per cluster is the geometric mean of the
individual GO p-values.
DAVID Bioinformatics Tool was used for the clustering
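The two quantities on this slide, a hypergeometric enrichment p-value per term and a geometric mean per cluster, can be sketched in a few lines. This is a minimal illustration rather than DAVID's actual implementation: the function names and toy counts are hypothetical, and reporting the cluster score as minus log10 of the geometric mean is an assumption about DAVID's convention.

```python
from math import comb, exp, log, log10


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): probability of at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def geometric_mean(pvalues):
    """Geometric mean, computed in log space for numerical stability."""
    return exp(sum(log(p) for p in pvalues) / len(pvalues))


def cluster_enrichment_score(pvalues):
    """Assumed convention: minus log10 of the geometric mean of the p-values,
    so that larger scores mean stronger enrichment."""
    return -log10(geometric_mean(pvalues))


# Hypothetical toy numbers: 10 background genes, 4 annotated with a term,
# a gene list of 3 containing 2 annotated genes.
print(hypergeom_enrichment_p(k=2, n=3, K=4, N=10))
print(cluster_enrichment_score([0.01, 0.0001]))
```

The multiple-testing adjustment is left out of this sketch; it would be applied to the per-term p-values before they are combined.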
Functional annotation clustering results
As expected, most enriched clusters relate to
“extracellular region”, “system development”, etc.
Cluster 2 relates to bone, and interestingly, Cluster 5
relates to muscle
We narrowed our 269 gene set to these 98 genes
corresponding to bone and muscle
Identifying TF Binding Sites in the 98 gene set
We searched the 5kb promoter sequence upstream of the
TSS of each gene for known TF binding motifs from
the TRANSFAC database, using the rVista tool
Filtered the TF motifs to keep only those
conserved between mouse and human genomes
Conserved motifs increase confidence
Identifying TF Binding Sites in the 98 gene set
Many motifs identified related to bone & muscle
67 of the 98 genes contained over 10 conserved Mef2
binding sites in their promoters
Bone & muscle genes and their number of conserved Mef2 binding
sites
Building the transcriptional regulatory network
Created a network consisting of the 98 gene set and
their conserved and enriched TFs as nodes
An edge between a gene and a TF represents the
statistically significant presence of that TF’s
binding site on the promoter of that gene
TFs were filtered using conservation AND enrichment
to produce more reliable edges and reduce noise
Enrichment of a TF motif is determined by a p-value
based on the number of occurrences in the 5kb upstream of this
gene, as compared to the number of occurrences in the 5kb
upstream of the rest of the genes in the genome
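The slide does not name the statistical test behind the motif enrichment p-value. As one plausible sketch, the code below assumes a Poisson model: the background rate is the mean number of motif occurrences per 5kb promoter across the other genes, and the p-value is the chance of seeing at least the observed count under that rate. The Poisson choice, the function names, and the toy counts are all assumptions made for illustration.

```python
from math import exp


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = exp(-expected)  # P(X = 0)
    cdf_below = 0.0        # accumulates P(X <= observed - 1)
    for i in range(observed):
        cdf_below += term
        term *= expected / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf_below)


def motif_enrichment_p(count_in_gene, background_counts):
    """Enrichment p-value for one TF motif in one gene's promoter.

    background_counts: motif counts in the promoters of the other genes
    (assumed to be the same 5kb upstream window).
    """
    expected = sum(background_counts) / len(background_counts)
    return poisson_upper_tail(count_in_gene, expected)


# Hypothetical toy counts: the motif averages 1 hit per background promoter,
# and the gene of interest has 3 hits.
print(motif_enrichment_p(3, [0, 1, 2, 1, 1, 1]))
```

An edge between the gene and the TF would then be drawn only when this p-value passes the chosen significance threshold and the motif is also conserved.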
Modular structure of the regulatory network
Final network consisted of 98 genes and 153
conserved and over-represented TFs
To identify possible combinatorial effects of TFBS, we
partitioned the genes in the network using the Q-Cut
algorithm
Q-Cut is a graph partitioning algorithm for finding dense
subnetworks (i.e., communities). It optimizes a statistical score
called the modularity, and automatically determines the most
appropriate number of communities
We reduced noise and created a sparser gene-gene
network for better partitioning
We created this temporary network by assigning a cosine
similarity score to each pair of genes according to their
shared TFs.
Cosine similarity is a measure of similarity between two
vectors (each vector contains 153 slots for the 153 enriched
TFs in the 98 gene set)
Edges between genes represent their similarity score,
and this network was converted to a sparse network by
connecting each gene to its k nearest neighbors (k=7)
and employing a similarity score cutoff of 0.5
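The construction of the sparse gene-gene network described above can be sketched as follows. Each gene is represented by a 0/1 vector over the TFs; the code computes pairwise cosine similarity, keeps each gene's k most similar neighbors, and drops edges below the cutoff. Several details are assumptions, since the slides do not specify them: the vectors are taken to be binary, an edge is kept if either endpoint lists the other among its neighbors, and the cutoff is treated as inclusive. The names and toy profiles are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a gene with no TFs is similar to nothing
    return dot / (norm_u * norm_v)


def sparse_gene_network(profiles, k=7, cutoff=0.5):
    """profiles: dict mapping gene -> TF vector (153 slots in the study).

    Returns a dict mapping frozenset({gene1, gene2}) -> similarity score.
    """
    genes = sorted(profiles)
    edges = {}
    for g in genes:
        scored = [(cosine_similarity(profiles[g], profiles[h]), h)
                  for h in genes if h != g]
        scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first
        for score, h in scored[:k]:
            if score >= cutoff:
                edges[frozenset((g, h))] = score
    return edges


# Hypothetical toy profiles over 4 TFs, with k=1 so the pruning is visible.
toy_profiles = {
    "a": [1, 1, 0, 0],
    "b": [1, 1, 1, 0],
    "c": [0, 0, 1, 1],
    "d": [0, 0, 0, 1],
}
for pair, score in sparse_gene_network(toy_profiles, k=1).items():
    print(sorted(pair), round(score, 3))
```

In the toy data, genes b and c share one TF, but their similarity falls below 0.5 and neither is the other's nearest neighbor, so no edge connects the two groups.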
Identifying modules in the initial regulatory network
Q-Cut was then applied to this gene-gene network,
resulting in communities with many common TF
binding sites
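To make the score that Q-Cut optimizes concrete, the sketch below computes the standard Newman modularity of a given partition: for each community, the fraction of edges that fall inside it minus the fraction expected from its total degree. This is not the Q-Cut algorithm itself, which searches over partitions; it only evaluates one. The graph is treated as unweighted for simplicity, although the gene-gene network carries similarity weights, and the names and toy graph are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community label.
    """
    m = len(edges)
    internal = {}    # edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}
print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of comparison a modularity-optimizing method uses to choose both the communities and their number.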
Interesting clusters
Cluster below shows a strong community structure
between 16 genes and their common TFBS
Representative of many TFs coordinately
regulating a small set of genes
A putative model of a transcriptional network
A proposed model was built using the network results
DMP1 & Sost (highly expressed in osteocytes) are shown
to be regulated by Mef2 and Myogenin
Putative model used to generate hypotheses
We now have an ex vivo system for pure osteocytes in
a proper microenvironment to conduct experimental
validation based on this model
Here the osteocytes will make appropriate levels of
osteocyte-specific genes
Experiments are currently underway
Conclusions
We used a systems biology method to construct a
putative transcriptional regulatory network model for
osteocytes, by integrating
Microarray data
Functional annotation
Comparative genomics
Graph-theoretic knowledge
Many parts of the network can be confirmed by the
literature
Experiments are currently underway to further validate
the model