Woolfe, 2005
Highly Conserved Non-Coding
Sequences are Associated with
Vertebrate Development
PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11.
Yvonne Li
Paper presentation for MEDG505
Jan 27, 2005
Outline
Motivation
Method and Results
Discussion
Motivation
Gene Regulatory Networks for development have been
described in invertebrates but not characterized for
vertebrates
Studies have shown:
a number of developmental genes are regulated by highly
conserved enhancer regions at distances of hundreds of kb
ultra-conserved elements are more frequent than expected
there is a significant association between these highly
conserved elements and DNA binding proteins
Goal: look for all such elements in the entire human
genome and see how they relate to development.
Method
Computationally identify
Computationally analyze
Experimentally validate
Identifying
CNE: Highly Conserved Noncoding Elements
Sequence Data
Which 2 species to use for whole-genome alignment?
Human and Fugu
Fugu genome is 1/8 the size of the human genome
but has a similar gene repertoire
Fugu’s developmental blueprint
is very similar to that of human
Two ways to detect CNEs
1. Whole-genome alignment
2. Regional alignments
Obtaining CNEs
Start with Fugu genome assembly
MegaBLAST against Ensembl human genome v18.34.1
Remove alignments < 100bp in length
Mask coding and non-coding RNA content
Remove telomere-like sequences and transposons
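The filtering steps above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the `masked` dictionary of annotated intervals, and the choice to discard any hit that overlaps a masked region are all assumptions made for the example. Only the 100 bp length cutoff comes from the slide.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One Fugu-vs-human alignment (hypothetical record layout)."""
    query: str       # Fugu scaffold name
    subject: str     # human chromosome
    identity: float  # percent identity
    length: int      # alignment length in bp
    s_start: int     # start on the human sequence
    s_end: int       # end on the human sequence

MIN_LENGTH = 100  # from the slide: remove alignments < 100 bp

def overlaps(start, end, intervals):
    """True if [start, end] overlaps any (lo, hi) interval."""
    return any(start <= hi and lo <= end for lo, hi in intervals)

def filter_hits(hits, masked):
    """Keep hits of at least MIN_LENGTH bp that avoid masked regions.

    masked maps a chromosome name to a list of (lo, hi) intervals
    covering coding, ncRNA, or repeat annotation (assumed input).
    """
    kept = []
    for h in hits:
        if h.length < MIN_LENGTH:
            continue
        # Minus-strand hits report start > end, so normalise first.
        lo, hi = sorted((h.s_start, h.s_end))
        if overlaps(lo, hi, masked.get(h.subject, [])):
            continue
        kept.append(h)
    return kept
```

In practice the masked intervals would come from a genome annotation, and an interval tree would replace the linear scan for genome-scale input.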
Stats
1373 core set of elements
Length: ave 199bp, max 736 bp
Identity: ave 84%, max 98%
1365 conserved in mouse
1316 conserved in rat
1310 conserved in chicken
1093 conserved in zebrafish
CNE Distribution
CNEs in human genome are found on all
chromosomes except 21 and Y
Distribution of CNEs is highly clustered
Clustered CNEs by genomic location
165 clusters
The 20 largest clusters have ≥ 20 CNEs
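One simple way to group elements by genomic location is gap-based clustering: sort positions on each chromosome and start a new cluster whenever the distance to the previous element exceeds a threshold. The sketch below assumes this approach. The slides do not state the clustering rule actually used, and `max_gap` is a hypothetical parameter.

```python
from collections import defaultdict

def cluster_positions(cnes, max_gap):
    """Group (chrom, start) pairs into clusters.

    Two consecutive elements on the same chromosome fall in the same
    cluster when they are at most max_gap bp apart (assumed rule).
    Returns a list of clusters, each a list of (chrom, start).
    """
    by_chrom = defaultdict(list)
    for chrom, start in cnes:
        by_chrom[chrom].append(start)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        for pos in sorted(by_chrom[chrom]):
            # Close the running cluster when the gap is too large.
            if current and pos - current[-1][1] > max_gap:
                clusters.append(current)
                current = []
            current.append((chrom, pos))
        clusters.append(current)
    return clusters
```

Counting clusters and sorting them by size would then give figures comparable to the cluster counts on this slide, for whatever threshold is chosen.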
Analyzing
CNE associated genes
For each CNE, extract closest gene from Ensembl
Find most statistically over-represented GO terms
12 of the 13 terms relate to transcriptional regulation and
development
How many clusters situated near such transdev genes?
Over 93% of clusters have a transdev gene within 500kb of their CNEs. 15%
have 2 or more.
CNEs generally located large distances from nearest gene
Average distance between CNE and 5’ end of closest human gene is
182kb, with 93 CNEs > 500kb, and 12 CNEs > 1Mb.
Transdev genes are located in regions of low gene density
Average number of genes within 500 kb upstream or downstream is 16
for all human genes and 6 for transdev genes
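The two measurements on this slide, distance from a CNE to the 5' end of the closest gene and the number of genes within 500 kb of a gene, can be sketched as below. This is an illustration under assumptions: genes are given as hypothetical `(name, five_prime_pos)` pairs on a single chromosome, and strand is ignored.

```python
def nearest_gene(cne_pos, genes):
    """Return (name, distance) of the gene whose 5' end is closest.

    genes: list of (name, five_prime_pos) on the CNE's chromosome.
    """
    name, pos = min(genes, key=lambda g: abs(g[1] - cne_pos))
    return name, abs(pos - cne_pos)

def neighbour_count(gene_name, genes, window=500_000):
    """Count other genes within `window` bp up- or downstream."""
    positions = dict(genes)
    centre = positions[gene_name]
    return sum(
        1
        for name, pos in genes
        if name != gene_name and abs(pos - centre) <= window
    )

# Toy coordinates, invented for illustration only.
genes = [("A", 1_000_000), ("B", 1_300_000), ("C", 2_000_000)]
closest = nearest_gene(1_200_000, genes)
density_a = neighbour_count("A", genes)
```

Averaging `neighbour_count` over all genes and over transdev genes separately would give the kind of density comparison the slide reports.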
Obtaining rCNEs
Use MLAGAN (localized multiple alignment) to identify
additional conserved sequences around specific genes
Species: Human, Fugu, Mouse, Rat
MLAGAN more sensitive than whole-genome alignment
Algorithm itself is more sensitive
Require only 40bp window with 60% identity
Chose 4 cluster regions containing different types of
developmental genes:
SOX21, PAX6, HLXB9, SHH
Sometimes, the CNEs are more conserved than the
gene’s coding exons!
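The 40 bp window with 60% identity criterion can be illustrated with a sliding-window scan over a pairwise alignment. This sketch is not MLAGAN. It assumes two already-aligned, equal-length sequences with `-` for gaps, and the function name and gap handling (gap columns count as mismatches) are choices made for the example.

```python
def conserved_windows(seq_a, seq_b, window=40, min_identity=0.60):
    """Return start indices of windows meeting the identity threshold.

    seq_a and seq_b are rows of a pairwise alignment (same length,
    '-' marks a gap). A column matches only when both bases are equal
    and neither is a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    matches = [
        a == b and a != "-"
        for a, b in zip(seq_a.upper(), seq_b.upper())
    ]
    starts = []
    running = sum(matches[:window])  # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the new one, drop the old one.
            running += matches[start + window - 1] - matches[start - 1]
        if running / window >= min_identity:
            starts.append(start)
    return starts
```

Merging overlapping qualifying windows into contiguous regions would be the natural next step for reporting elements.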
Sox21 MLAGAN
Vertebrates vs Invertebrates
Are the CNEs also found in invertebrates?
Use all CNEs and rCNEs
Search whole genome sequence of
Ciona intestinalis
Drosophila melanogaster
Caenorhabditis elegans
Anopheles gambiae
No significant matches
(however, the genes have clear homologs)
43 CNEs show significant similarity to at least one other
CNE (their genes have clear paralogous relationships)
Method
Computationally identify CNEs
Computationally analyze CNEs
Experimentally validate a few CNEs
Experimental Validation
Coinject CNEs with green fluorescent protein
(GFP) reporter, in zebrafish embryos
Idea:
CNEs contain something that affects the
transcription of a transdev gene
The transdev gene affects development
Examine the ability of CNEs to up-regulate GFP
reporter expression
Experimental Validation
Chose 25 regions for GFP assay
10 CNEs, 15 rCNEs
Look for GFP expression in live embryos
Average of 200 embryos screened per control
No upregulation
Average of 188 embryos screened per element
GFP expression in all but 2 elements; varied from 4% to 44%
SOX21 associated elements
Known
SRY-related box gene
Acts as a transcriptional repressor
during early development
Expressed in a complex manner in
CNS, and in nasal epithelium, lens
and retina of eye, inner ear
PAX6 associated elements
Known
Paired-box containing transcription
factor, known to be influenced by
cis-acting elements in upstream,
intronic and downstream positions
Expressed in developing eye,
forebrain, hindbrain, spinal cord
HLXB9 associated elements
Known
Homeobox gene associated with
autosomal dominant effects
Zebrafish ortholog is expressed in
notochord, hypochord, tail
mesoderm, and tailbud
SHH associated elements
Known
A signaling molecule
Zebrafish ortholog is expressed
mainly in midline structures like
floorplate and notochord, but also
in branchial arches, pectoral fin
buds, retina
Limitations
CNE-gene misassociated, especially in gene-rich regions
Can be partly inferred from the assay results
CNEs missed due to stringent whole-genome analysis
Down regulation of expression will not be detected
Assayed elements out of context and individually
Each element had cases of unexpected expression
Tissues from few cells are underrepresented
Late developing tissues or cell types after 24 h will be
missed completely
Summary
Identified a set of 1373 vertebrate CNEs
Experimentally showed CNE-transdev gene association
CNEs found in clusters, in front of transdev genes
CNEs act at large distances from coding sequence
The relative order and positions of CNEs are conserved
No vertebrate CNEs were found in invertebrates, even
though the genes had clear homologs
Many of these results are paralleled by a similar paper
(Sandelin et al. 2004)
>50bp, >95% Human/Mouse identity
3583 Human/Mouse/Pufferfish UCRs; ave length 125 bp
Discussion
Almost all CNEs are associated with
developmental regulators
CNEs act at large distances from gene
Do most transdev genes have CNEs associated?
They could be enhancers or silencers
The relative positioning and order of CNEs are
completely conserved
Do they play a role in structuring the genomic
architecture around transdev genes?
Discussion
No vertebrate CNEs are found in invertebrates
Are there CNEs in invertebrates?
But PAX6 in Drosophila has been shown to have a
highly effective LE9 enhancer that is also well
conserved in vertebrates (The Interactive Fly)
Why is it not found in this analysis?
Only 52 bp in length! (but MLAGAN should have found it...)
So, maybe invertebrate enhancers/CNEs are shorter
Should maybe look for shorter CNEs in vertebrates
Discussion
Missing whole genome CNEs due to stringency of
parameters.
Try discontinuous MegaBLAST which does not require
exact word match of 20.
Only 109 of the 256 non-coding ultraconserved
regions from Bejerano et al. are identified.
Discussion
What is in the CNE?
Modules of transcription factor binding sites?
Regulatory RNAs? (e.g. microRNAs)
Hard to account for the high level conservation.
Perform assays on portions of the CNEs.
Use computational methods.
Lack of EST evidence.
Use regulatory RNA gene finders?
Something else entirely?
One thing is in agreement:
More functional studies are needed.
Discussion
Do CNEs work together?
How to robustly test combinations of elements?
Mutations in CNEs can cause human disease
Studies are showing that mutations in CNEs cause
disorders. CNEs at very distal locations can still
affect transcription
May be candidates for genetic screens seeking
sequence variation associated with disease
Check it out with dbSNP!
References & Acknowledgements
Thanks to Misha Bilenky for lots of fun discussion
Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T,
Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks
W, Edwards YJ, Cooke JE, Elgar G. Highly conserved non-coding
sequences are associated with vertebrate development. PLoS Biol.
2005 Jan;3(1):e7. Epub 2004 Nov 11.
Elgar, G. Identification and analysis of cis-regulatory elements in
development using comparative genomics with the pufferfish, Fugu
rubripes. Semin Cell Dev Biol. 2004 Dec;15(6):715-9.
Venkatesh B, Yap WH. Comparative genomics using fugu: a tool for
the identification of conserved vertebrate cis-regulatory elements.
Bioessays. 2005 Jan;27(1):100-7.
Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman
WW, Ericson J, Lenhard B. Arrays of ultraconserved non-coding
regions span the loci of key developmental genes in vertebrate
genomes. BMC Genomics. 2004 Dec 21;5(1):99.
The interactive fly. http://www.sdbonline.org/fly/aimain/1aahome.htm