Modeling DNA Sequence Based
cis-Regulatory Gene Networks
Hamid Bolouri and Eric H.
Davidson
Presented by Geoffrey
Introduction
Cis-Regulatory elements can be regarded as stretches of
DNA that contain target site sequences recognized by
DNA-binding proteins
They are genetically hardwired information
processors and are linked together to form a huge
network
Each element receives informational input that
determines its activity and produces an informational
output in the form of regulatory instructions (i.e.
activates or inhibits other elements)
Introduction
The genetic regulatory apparatus is the same
in every cell. What it does depends on the
inputs that it receives at each point in time
Some of the inputs depend on the prior
activity of genes that encode the necessary
factors, and some on other events, such as
extracellular signals
Cis-Regulatory Elements
Each element carries out some processing of its input
information
Inputs are often multiple while the output is a unique function
that informs the basal transcription apparatus how frequently to
initiate transcription
Example of an element in diagram form
(This diagram shows a gene whose
expression is activated by a ubiquitous
activator and inhibited by protein A)
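The element in the diagram can be sketched as a small function. This is only an illustration, assuming both inputs are Boolean and that protein A, when present, overrides the activator; the name `gene_expressed` and its parameters are hypothetical and do not come from the slides.

```python
def gene_expressed(ubiquitous_activator: bool, protein_a: bool) -> bool:
    """Return True when the gene is transcribed.

    Assumption: the activator is required, and protein A blocks
    expression whenever it is present.
    """
    return ubiquitous_activator and not protein_a


# Enumerate every input combination to show the element's full logic.
for activator in (True, False):
    for repressor in (True, False):
        print(activator, repressor, gene_expressed(activator, repressor))
```

Only one of the four input combinations (activator present, protein A absent) switches the gene on.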
Cis-Regulatory Elements in
Development
Cis-Regulatory information processing is
important in development because
development depends fundamentally on
spatial (which type of cells and where) and
temporal (when) control of gene
expression
These decisions result from logic functions
carried out by the regulatory elements
For example, a given cis-regulatory
element might lead to the expression of a
gene when two inputs overlap (AND
operation), resulting in the appearance of
a new factor; or it might control the
expression through the interplay between
positive and negative inputs.
Cis-Regulatory Elements in
Development
Hence thinking about cis-regulatory elements from an
informational point of view leads directly to mutable,
measurable regulatory properties of genomic DNA
The DNA sequence of each element dictates which
inputs the element listens to and the functions it
is capable of performing
Each input therefore implies a target site sequence
that can be identified and tested via mutation or
gene transfer
Illustration – Endo16 Model
The cis-regulatory system of the
endo16 gene of the sea urchin has
been studied in great detail
It has a modestly complex pattern
of expression during
embryogenesis
It is activated in the vegetal plate of
the embryo, specifically in the Veg2
lineage, at about the 8th cleavage
The Veg2 lineage consists of the
progeny of eight 6th cleavage
founder cells, and from it derives
most of the endoderm
Illustration – Endo16 Model
The endo16 gene is transcribed in this
endomesodermal field until gastrulation
(the process by which cells of the blastula
are moved to new positions in the
embryo), during which it is expressed
throughout the invaginating archenteron but no
longer in the mesodermal domain
As the gut becomes regionalized,
expression is extinguished in the
foregut and hindgut but increases
in the midgut, where it continues to
be expressed in the feeding larva
Illustration – Endo16 Model
Summary of the expression pattern of the endo16 gene (shown in blue)
Illustration – Endo16 Model
The cis-regulatory system that controls endo16 expression is
about 2300 base pairs in length. It consists of several
clusters of target sites that execute distinct functions, so
each cluster can be thought of as a separable, modular regulatory
element
The basal transcription apparatus (Bp) has no regulatory activity
on its own and is used to service regulatory elements expressed
in every domain of the embryo
Illustration – Endo16 Model
Modules A and B carry out many interesting regulatory functions
They have altogether 17 target sites for factors that recognize and bind specifically at given
sequences
A protein, SpGCF1, interacts at 5
sites of module A. The other 12
target sites are serviced by 9
different transcription factors where
each interaction has a distinct and
measurable functional meaning
The details of the interaction are
shown in the box below. The target
sites are indicated by boxes (blue for
Module B and red for Module A).
The arrows lead from the target site
to the logic operations indicated in
circles. Each logic operation then
states how its inputs are combined
Illustration – Endo16 Model
The complete model of
the endo16 expression is
shown in the figure
below. The elements
now are the individual
genes that are involved
in the expression
Logic Operation
The model specifies logic operations by which
the inputs are processed and the altered
values are carried forward
Common operations include:
AND – when all the conditions are met, then the
indicated operations on the value of the output at
that node will take place
OR – when one (or some) of the conditions are
met, then the indicated operations will take place
Logic Operation
There are direct physical implications of the logic
operations. The ‘AND’ operator shows that the proteins
binding at the respective sites are together necessary
for the function to occur (e.g. formation of a large
functional complex by the transcription factors)
However, it does not necessarily mean that the output is
all-or-none. Alternative output values can be associated
with inputs being absent by adding an ‘else’ branch
The point is that the model describes the functions
that are mediated by each site, conditional on the
inputs present. It does not attempt to describe the
biochemistry of the proteins that contribute to this
function
Simply put, the logic operations are information-processing
constructs similar to those found in ordinary
programming languages
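The point about ‘AND’ with an ‘else’ branch can be sketched as an ordinary conditional. This is a hedged illustration, not the published endo16 logic: the site names, the gain of 2.0, and the function `node_output` are all assumptions made for the example.

```python
def node_output(site_x_bound: bool, site_y_bound: bool,
                upstream_value: float) -> float:
    """Apply an AND condition to two hypothetical target sites.

    If both sites are occupied, the upstream value is boosted
    (hypothetical gain of 2.0). Otherwise the 'else' branch passes
    the value through unchanged, so the output is not all-or-none.
    """
    if site_x_bound and site_y_bound:
        return 2.0 * upstream_value
    else:
        return upstream_value


print(node_output(True, True, 1.5))   # both sites occupied
print(node_output(True, False, 1.5))  # one site empty: 'else' branch
```

The function says nothing about how the two proteins cooperate biochemically; it records only what the node does with its inputs.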
Continuous and Boolean
Functions
Taking again the computational model as an example:
Filled boxes with solid lines extending from them indicate inputs whose amplitude varies
over time, e.g. UI, R, OTX
Open boxes with dashed lines indicate inputs that are often present in excess, and
hence can be regarded as Boolean inputs, i.e. they are either present or absent
Open boxes with thin lines indicate scalar operations on the inputs of the node
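The three kinds of symbol can be mimicked in code as a time-varying input, a Boolean input, and a scalar operation. The sketch below is an assumption-laden illustration: `module_output`, its parameters, and the numbers are hypothetical and are not taken from the endo16 diagram.

```python
from typing import List


def module_output(time_course: List[float], factor_present: bool,
                  scale: float) -> List[float]:
    """Combine a continuous input with a Boolean input.

    time_course    -- amplitude of a time-varying input at each time point
    factor_present -- Boolean input (a factor present in excess, or absent)
    scale          -- scalar operation applied at the node
    """
    if not factor_present:
        # Without the Boolean input the node produces no output.
        return [0.0 for _ in time_course]
    return [scale * value for value in time_course]


print(module_output([0.0, 0.5, 1.0], True, 2.0))
print(module_output([0.0, 0.5, 1.0], False, 2.0))
```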
Continuous and Boolean
Functions
Hence the endo16 model is not a kinetic model per se
It does not consist of a set of time based differential equations
describing the kinetic reactions
Instead, it describes the logic functions mediated by the DNA
target sites
Although this approach is not new in other fields, such as
engineering, it offers a refreshing way of modeling gene
regulatory networks, which are otherwise predominantly modeled
with differential equations
Models for Networks of cis-Regulatory Elements
Symbolism and Significance
All major processes in animal development are driven forward
by regulatory genes, i.e. genes that express transcription factors
Developmental events are not discrete, and the regulatory
networks that control development are often connected to other
networks that control prior and surrounding processes in both
the spatial and temporal domains
The approach used for the cis-regulatory elements can be used to
model the beginning of the process for which the network
displays the genetic program, as well as the end, which is the
activation of gene batteries (a series of genes), e.g. endo16,
which encodes an adhesion protein involved in the gastrulation
of the sea urchin embryo
General Purposes of DNA
sequence based Network Models
The objective of such a model is to:
State the key inputs and outputs of the cis-regulatory
system
Explain why each gene runs where and when it does
Explain how the spatial territories are built up
Even incomplete models are informative as the
interactions found always have some functional
meaning
Each cis-regulatory system can also be considered as
a ‘black box’ which can be connected to other
systems
Genomic and Nuclear Views
A useful concept for DNA sequence-level
networks is the distinction between the “View from
the Genome” (VFG) and the “View from the Nucleus” (VFN)
The VFG shows all the interactions that the
system is capable of, while the VFN focuses on
those sites that are occupied by the indicated
inputs in a given nucleus at a given time,
i.e. a snapshot
Genomic and Nuclear Views
A simple illustration is shown in
the figure. Here there are two
spatial domains of an embryo –
domain A, and the rest (~A)
The VFG shows that there is a
ubiquitous positive activator
needed for all three genes. But
gene 1 also requires another
positive input to be activated
and it acts positively in domain
A and negatively in others (~A)
This will then affect the
expression of gene 2 and gene 3
Hence at any developmental
stage, a given nucleus shows either
VFN(A) or VFN(~A)
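The relationship between the two views can be sketched as a filter: the VFG lists every possible interaction, and a VFN keeps only those whose input is present in a given nucleus. The wiring below is illustrative only. The gene names, input names, and signs are assumptions and do not reproduce the figure.

```python
# View from the genome: every interaction the regulatory DNA can support.
# Each entry is (input name, sign); the wiring is hypothetical.
VFG = {
    "gene1": [("ubiquitous", +1), ("spatial_input", +1)],
    "gene2": [("ubiquitous", +1), ("gene1", +1)],
    "gene3": [("ubiquitous", +1), ("gene1", -1)],
}


def view_from_nucleus(vfg, present):
    """Keep only the interactions whose input is present in this nucleus."""
    return {
        gene: [(source, sign) for source, sign in links if source in present]
        for gene, links in vfg.items()
    }


# Domain A: the spatial input is present, so gene 1 products are present too.
vfn_a = view_from_nucleus(VFG, {"ubiquitous", "spatial_input", "gene1"})
# Outside domain A: only the ubiquitous activator is present.
vfn_not_a = view_from_nucleus(VFG, {"ubiquitous"})

print(vfn_a)
print(vfn_not_a)
```

Both snapshots are derived from the same `VFG` dictionary, which mirrors the idea that the genome is identical in every cell while the occupied sites differ.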
Conclusion
Cis-Regulatory networks serve as a developmental biologist’s essential
organizer for capturing causal relationships between genes
They are essential because of the myriad pieces of information and possible
interactions involved
The models used are not actually genetic models although their key
elements are genomic target site sequence elements
The relationship between the elements can be viewed from several
angles, i.e. views – VFG, VFN, Black Box View (Bird’s eye view). No
transformations are needed to move from one view to another
The model serves also as a predictive tool, enabling developmental
biologists to see what might happen to the regulatory system if a
target site is mutated or experimentally altered
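The predictive use of such a model can be sketched by treating a mutated target site as an input that can no longer be read. This is a minimal sketch under stated assumptions: the two-site element, the site names, and the function `expression` are hypothetical and are not part of the endo16 model.

```python
def expression(inputs: dict, intact_sites: set) -> bool:
    """Predict expression of a hypothetical two-site element.

    An input only counts if its factor is present AND its target
    site is intact (not mutated).
    """
    def bound(site: str) -> bool:
        return inputs.get(site, False) and site in intact_sites

    # Assumed logic: needs the activator, blocked by the repressor.
    return bound("activator") and not bound("repressor")


inputs = {"activator": True, "repressor": True}
wild_type = {"activator", "repressor"}
repressor_site_mutated = wild_type - {"repressor"}

print(expression(inputs, wild_type))               # repressor can bind
print(expression(inputs, repressor_site_mutated))  # repressor site lost
```

In this toy case, mutating the repressor site is predicted to turn the gene on in cells where it is normally silent, which is the kind of prediction that can then be tested experimentally.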