A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes
Angela K. Dean, Stephen E. Harris, Jianhua Ruan
Overview
• Osteocytes – Background & Motivation
• Review of Biological Central Dogma
• Osteocyte gene set derivation
  • Osteocyte purification
  • Microarray experiments
  • Functional annotation analysis
  • Sequence analysis of promoter regions
• Construction of regulatory network
• Partitioning to define cis-regulatory modules
• Results
Background – Cellular functions
Certain types of cells perform specific biological functions
• Key genes must be activated to perform correctly
Osteocytes play an essential role in regulating bone formation and remodeling
• We want to identify these key genes and the activators of these genes
Why study osteocyte cells?
Identifying these key genes (and their activators) involved in the bone-formation process may lead to new targeted therapies
• For osteoporosis, loss of bone in space travel, extended bed rest, etc.
Molecular Biology Central Dogma
• We want to identify the associations between transcription factors (TFs) and the genes that they regulate in order to build a “transcriptional regulatory network”
Osteocyte cells are hard to isolate
Embedded within the bone matrix, and lacking molecular and cell surface markers, they are seemingly inaccessible
How to characterize and isolate these cells?
Solution: create a “special” mouse that contains an inserted “special” gene that drives fluorescence in osteocytes
Isolating osteocytes
Osteocytes are known to highly express Dentin matrix protein 1 (DMP1)
• A transgene was created in which the same promoter (activation) region as DMP1 drives GFP (green fluorescent protein), then inserted into the mouse
• Cells that highly express DMP1 (osteocytes) will also drive GFP
We can now purify osteocytes from other cells using fluorescence-activated cell sorting (FACS)
Identifying key osteocyte genes using microarray
Microarray experiments allow us to measure the activity of genes (expression profile)
We compared the expression profiles of the purified osteocyte cells (+GFP) to non-osteocyte cells (-GFP)
• Identified the top 269 genes expressed > 3-fold in +GFP as compared to -GFP (FDR-corrected p-value < 0.05)
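The selection rule on this slide (fold change above 3, FDR-corrected p-value below 0.05) can be sketched roughly as follows. This is only an illustration: the slide does not say which FDR procedure was used, so Benjamini-Hochberg is assumed here, and the record layout, function names and toy numbers are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_enriched_genes(records, min_fold=3.0, max_fdr=0.05):
    """records: (gene, mean_plus_gfp, mean_minus_gfp, raw_p) tuples (assumed layout)."""
    adjusted = benjamini_hochberg([r[3] for r in records])
    selected = []
    for (gene, plus, minus, _raw_p), q in zip(records, adjusted):
        fold = plus / minus  # +GFP expression relative to -GFP
        if fold > min_fold and q < max_fdr:
            selected.append((gene, fold, q))
    return selected


# Toy values, invented purely for illustration (not data from the study).
toy_records = [
    ("geneA", 900.0, 100.0, 0.001),
    ("geneB", 250.0, 100.0, 0.002),
    ("geneC", 800.0, 200.0, 0.200),
    ("geneD", 500.0, 100.0, 0.010),
]
hits = select_enriched_genes(toy_records)
print([gene for gene, _fold, _q in hits])
```

In the toy input, geneB fails the fold-change threshold and geneC fails the FDR threshold, so only the genes passing both filters are kept.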
Identifying functionally-related osteocyte genes
Each of the 269 genes has one or more GO terms or PIR keywords associated with it
• Gene Ontology (GO) terms describe biological processes, cellular components and molecular functions
• A Protein Information Resource (PIR) keyword is an annotation from the PIR database
Functional Annotation Clustering
For each GO term associated with a gene or group of genes within the 269-gene set, a p-value is computed using the hypergeometric distribution and adjusted for multiple testing using the Benjamini method
The enrichment score per cluster is the geometric mean (in -log scale) of the individual GO p-values
The DAVID Bioinformatics Tool was used for the clustering
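As a rough illustration of the two quantities named on this slide, the sketch below computes a hypergeometric upper-tail p-value for one term and a cluster enrichment score. It is not DAVID's implementation: a plain hypergeometric tail is assumed, the score is reported on a -log10 scale (an assumption following DAVID's convention), and the function names and toy numbers are hypothetical.

```python
from math import comb, log10


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def cluster_enrichment_score(pvalues):
    """Geometric mean of the term p-values, expressed as -log10.
    Equivalent to the arithmetic mean of -log10(p)."""
    return -sum(log10(p) for p in pvalues) / len(pvalues)


# Toy numbers, invented for illustration only.
p_term = hypergeom_enrichment_p(k=2, n=4, K=5, N=20)
score = cluster_enrichment_score([0.01, 0.0001])
print(p_term, score)
```

A higher score corresponds to smaller member p-values, so clusters can be ranked by it directly.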
Functional annotation clustering results
As expected, the most enriched clusters relate to “extracellular region”, “system development”, etc.
Cluster 2 relates to bone and, interestingly, Cluster 5 relates to muscle
• We narrowed our 269-gene set to the 98 genes corresponding to bone and muscle
Identifying TF Binding Sites in the 98 gene set
We searched the 5 kb promoter sequence upstream of the transcription start site (TSS) of each gene for known TF binding motifs from the TRANSFAC database, using the rVista tool
• Filtered the TF motifs to keep only those conserved between the mouse and human genomes
Conserved motifs increase confidence
Identifying TF Binding Sites in the 98 gene set
Many of the motifs identified relate to bone & muscle
67 of the 98 genes contained over 10 conserved Mef2 binding sites in their promoters
Bone & muscle genes and their number of conserved Mef2 binding sites
Building the transcriptional regulatory network
Created a network consisting of the 98 gene set and their conserved and enriched TFs as nodes
• An edge between a gene and a TF represents the statistically significant presence of that TF’s binding site on the promoter of that gene
• TFs filtered using conservation AND enrichment to produce more reliable edges and reduce noise
Enrichment of a TF motif is determined by a p-value based on the number of occurrences in the 5 kb upstream of this gene, as compared to the number of occurrences in the 5 kb upstream of the rest of the genes in the genome
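The slide does not name the statistical test behind the enrichment p-value. One simple way to make the comparison concrete, purely as an assumption, is a Poisson upper tail: the average motif count per background promoter gives the expected count, and the p-value is the chance of seeing at least the observed count in this gene's promoter. The function names and counts below are hypothetical.

```python
from math import exp


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def motif_enrichment_p(count_in_gene, background_counts):
    """Compare one promoter's motif count with the mean count over
    the background promoters (assumed Poisson model)."""
    lam = sum(background_counts) / len(background_counts)
    return poisson_upper_tail(count_in_gene, lam)


# Toy counts, invented for illustration: 12 sites in this promoter versus
# an average of 2 per background promoter.
p_motif = motif_enrichment_p(12, [2, 3, 1, 2])
print(p_motif)
```

An edge between the gene and the TF would then be kept only when this p-value falls below a chosen significance threshold and the motif is also conserved.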
Modular structure of the regulatory network
Final network consisted of 98 genes and 153 conserved and over-represented TFs
To identify possible combinatorial effects of TF binding sites (TFBSs), we partitioned the genes in the network using the Q-Cut algorithm
Q-Cut is a graph partitioning algorithm for finding dense subnetworks (i.e., communities). It optimizes a statistical score called the modularity and automatically determines the most appropriate number of communities
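To make the modularity score concrete, the sketch below evaluates it for a given partition of a small undirected graph. It assumes the score is the standard Newman-Girvan modularity and does not reproduce Q-Cut's search for the best partition; the function name and the toy graph are hypothetical.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community_of: dict node -> community label.
    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # edges with both endpoints inside the community
    degree = {}    # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )


# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of comparison a modularity-optimizing method relies on when choosing the number of communities.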


We reduced noise and created a sparser gene-gene network for better partitioning
We created this temporary network by assigning a cosine similarity score to each pair of genes according to their shared TFs
Cosine similarity is a measure of similarity between two vectors (each vector contains 153 slots for the 153 enriched TFs in the 98 gene set)
Edges between genes represent their similarity score, and this network was converted to a sparse network by connecting each gene to its k nearest neighbors (k=7) and employing a similarity score cutoff of 0.5
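The gene-gene network construction described above can be sketched as follows. Several details are assumptions not stated on the slide: the TF vectors are treated as 0/1 indicators, an edge is kept when either gene lists the other among its k nearest neighbors, and the cutoff is applied as "greater than or equal". The function names and the four-slot toy vectors (standing in for the 153-slot vectors) are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a gene with no TFs is treated as similar to nothing
    return dot / (norm_a * norm_b)


def knn_gene_network(profiles, k=7, cutoff=0.5):
    """profiles: dict gene -> TF indicator vector.
    Returns a set of undirected edges as sorted (gene, gene) tuples."""
    genes = sorted(profiles)
    edges = set()
    for g in genes:
        scored = [
            (cosine_similarity(profiles[g], profiles[h]), h)
            for h in genes if h != g
        ]
        # Highest similarity first; ties broken by gene name for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        for score, h in scored[:k]:
            if score >= cutoff:
                edges.add(tuple(sorted((g, h))))
    return edges


# Toy profiles with 4 TF slots, invented for illustration only.
toy_profiles = {
    "g1": [1, 1, 0, 0],
    "g2": [1, 1, 1, 0],
    "g3": [0, 0, 1, 1],
    "g4": [0, 0, 0, 1],
}
toy_network = knn_gene_network(toy_profiles, k=2, cutoff=0.5)
print(sorted(toy_network))
```

In the toy example, g2 and g3 share one TF but their similarity falls below the cutoff, so the two gene pairs end up as separate components, which is the sparsifying effect the slide describes.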
Identifying modules in the initial regulatory network
Q-Cut was then applied to this gene-gene network, resulting in communities with many common TF binding sites
Interesting clusters
Cluster below shows a strong community structure between 16 genes and their common TFBSs
• Representative of many TFs coordinately regulating a small set of genes
A putative model of a transcriptional network
A proposed model was built using the network results
DMP1 & Sost (highly expressed in osteocytes) are shown to be regulated by Mef2 and Myogenin
Putative model used to generate hypotheses
We now have an ex vivo system for pure osteocytes in a proper microenvironment to conduct experimental validation based on this model
• Here the osteocytes will make appropriate levels of osteocyte-specific genes
Experiments are currently underway
Conclusions
We used a systems biology method to construct a putative transcriptional regulatory network model for osteocytes, by integrating
• Microarray data
• Functional annotation
• Comparative genomics
• Graph-theoretic knowledge
Many parts of the network can be confirmed by the literature
Experiments are currently underway to further validate the model