
Manolis Kellis
modENCODE analysis group
January 11, 2007
Part 1: Target identification: comparative vs. experimental
(really the topic for today)
Part 2: Target validation (optional)
Part 3: Motif discovery (optional)
Part 4: Enhancer identification (optional)
Part 1
Identifying targets using comparative genomics
Evolutionary signatures of motif instances
• Allow for motif movements
– Sequencing/alignment errors
– Loss, movement, divergence
• Measure branch-length score
– Sum evidence along branches
– Closely related species contribute little
BLS: 25%
Mef2:YTAWWWWTAR
BLS: 83%
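The branch-length score can be sketched as follows. This assumes a common definition that the slide does not spell out: BLS is the total length of the branches connecting the species in which the motif instance is found, as a fraction of the total tree length. The tree, its branch lengths, and the function names are hypothetical and chosen only for illustration.

```python
# Minimal BLS sketch. A node is (name, branch_length_to_parent, children).
# TOY_TREE is a made-up three-species tree, not the real fly phylogeny.
TOY_TREE = ("root", 0.0, [
    ("anc", 0.2, [
        ("A", 0.1, []),   # reference species
        ("B", 0.1, []),   # close relative of A
    ]),
    ("C", 0.6, []),       # distant species
])


def leaves_below(node):
    """Return the set of leaf names in the subtree rooted at node."""
    name, _, children = node
    if not children:
        return {name}
    found = set()
    for child in children:
        found |= leaves_below(child)
    return found


def branch_length_score(root, present):
    """Fraction of total branch length spanned by the species in `present`.

    A branch counts when species carrying the motif sit on both sides of it,
    i.e. the branch lies on a path connecting two such species.
    """
    present = set(present) & leaves_below(root)
    total = 0.0
    spanned = 0.0
    stack = list(root[2])
    while stack:
        node = stack.pop()
        _, length, children = node
        below = leaves_below(node)
        total += length
        if (below & present) and (present - below):
            spanned += length
        stack.extend(children)
    return spanned / total if total else 0.0


if __name__ == "__main__":
    # Conservation in the close pair only covers little of the tree.
    print(branch_length_score(TOY_TREE, {"A", "B"}))
    # Conservation out to the distant species covers most of it.
    print(branch_length_score(TOY_TREE, {"A", "C"}))
```

In this toy tree, conservation between the two close species alone gives a low score, while conservation reaching the distant species gives a high one, which is the sense in which close species contribute little.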
Motif confidence selects functional instances
Transcription factor motifs: increasing BLS → increasing confidence
• Confidence selects functional regions
• Confidence selects in vivo bound sites
• High sensitivity
microRNA motifs: increasing BLS → increasing confidence
• Confidence selects functional regions
• Confidence selects positive strand
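One way to turn BLS into a confidence value is to compare real motif instances against control motifs at the same BLS cutoff. The sketch below assumes that confidence is the fraction of conserved instances exceeding what the controls would predict. The slides do not give the formula, so the definition, the function names, and the numbers are all assumptions.

```python
def conserved_fraction(bls_scores, threshold):
    """Fraction of instances whose BLS is at or above the threshold."""
    if not bls_scores:
        return 0.0
    kept = sum(1 for score in bls_scores if score >= threshold)
    return kept / len(bls_scores)


def confidence(motif_bls, control_bls, threshold):
    """Assumed definition: 1 - (control conserved rate / motif conserved rate)."""
    motif_rate = conserved_fraction(motif_bls, threshold)
    control_rate = conserved_fraction(control_bls, threshold)
    if motif_rate == 0.0:
        return 0.0
    return max(0.0, 1.0 - control_rate / motif_rate)


if __name__ == "__main__":
    # Hypothetical BLS values for instances of a real motif and of shuffled controls.
    motif = [0.95, 0.9, 0.6, 0.2]
    control = [0.9, 0.6, 0.55, 0.2, 0.1, 0.1, 0.1, 0.1]
    for cutoff in (0.5, 0.85):
        print(cutoff, confidence(motif, control, cutoff))
```

With these made-up values, raising the BLS cutoff raises the confidence, mirroring the trend named on the slide.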
Initial regulatory network for an animal genome
• ChIP-grade quality
– Similar functional enrichment
– High sensitivity, high specificity
• Systems-level
– 81% of transcription factors
– 86% of microRNAs
– 8k + 2k targets
– 46k connections
• Lessons learned
– Pre- and post-transcriptional regulation are correlated (high-high / low-low)
– Regulators are heavily targeted, forming feedback loops
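The two network observations (regulators being heavily targeted, and feedback loops) can be checked on an edge list. The sketch below counts incoming edges per node and finds pairs of regulators that target each other. The edge list and node names are hypothetical placeholders, not edges from the actual network.

```python
from collections import Counter


def in_degree(edges):
    """Count how many regulators target each node."""
    return Counter(target for _, target in edges)


def mutual_feedback_pairs(edges):
    """Return sorted pairs (a, b) where a targets b and b targets a."""
    edge_set = set(edges)
    pairs = {
        tuple(sorted((source, target)))
        for source, target in edge_set
        if source != target and (target, source) in edge_set
    }
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical (regulator, target) edges.
    edges = [
        ("TF_A", "TF_B"),
        ("TF_B", "TF_A"),
        ("miR_X", "TF_A"),
        ("TF_A", "gene_1"),
        ("TF_B", "gene_1"),
    ]
    print(in_degree(edges).most_common())
    print(mutual_feedback_pairs(edges))
```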
Network captures literature-supported connections
Network captures co-expression supported edges
Red = co-expressed
Grey = not co-expressed
Named = literature-supported
Bold = literature-supported
46% of edges are supported (P = 10^-3)
ChIP vs. conservation: similar power / complementary
• Together: best
→ Complementary
• Bound but not conserved: reduced enrichment
→ Conservation selects functional sites
• All-ChIP vs. all-conserved: similar enrichment
→ Similar power
• Conserved-only vs. ChIP-all: similar enrichment
→ Additional sites
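Comparisons of this kind rest on an enrichment measure for each set of sites. A minimal sketch, assuming enrichment is scored as a fold change over expectation with a hypergeometric tail probability; the slides do not say which statistic was used, and the counts below are invented.

```python
from math import comb


def fold_enrichment(hits, set_size, category_size, universe):
    """Observed hits divided by the number expected by chance."""
    expected = set_size * category_size / universe
    return hits / expected


def hypergeom_pvalue(hits, set_size, category_size, universe):
    """P(X >= hits) when drawing set_size genes without replacement."""
    upper = min(set_size, category_size)
    total = comb(universe, set_size)
    tail = sum(
        comb(category_size, k) * comb(universe - category_size, set_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 genes in total, 4 in a functional category,
    # 3 predicted targets, all 3 of them in the category.
    print(fold_enrichment(3, 3, 4, 10))
    print(hypergeom_pvalue(3, 3, 4, 10))
```

Running the same two functions on bound-and-conserved, bound-only, and conserved-only target sets would give the side-by-side comparison the bullets describe.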
Part 2
Cool story of miRNA targets for a new anti-sense miRNA
Surprise: miR-Anti-sense function
• A single miRNA locus transcribed from both strands
• Both processed to mature miRNAs: mir-iab-4, miR-iab-4AS (anti-sense)
• The two miRNAs show distinct expression domains (mutually exclusive)
• The two show distinct Hox targets – another Hox master regulator
[Figure: wing and haltere phenotypes for WT, sense, and antisense mis-expression; labels: sensory bristles, wing, haltere, wing w/bristles. Note: C, D, E same magnification]
• Mis-expression of mir-iab-4S & AS: halteres → wings homeotic transformation
• Stronger phenotype for AS miRNA
• Sense/anti-sense pairs as general building blocks for miRNA regulation
• 9 new anti-sense miRNAs in mouse
Part 3 (optional)
Discovering motifs
Evolutionary signatures for regulatory motifs
[Figure labels: 5’-UTR, 3’-UTR; known engrailed site (footprint)]
D.mel  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sim  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.sec  CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.yak  CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC
D.ere  CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC
D.ana  CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG
       **       *    * *********** *   **** * **
[Figure labels: D.mel, D.ere, D.ana, D.pse]
• Individual motif instances are preferentially conserved
• Measure conservation across entire genome
– Over thousands of motif instances → increased discovery power
– Couple to rapid enumeration and rapid string search
→ De novo discovery of regulatory motifs
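The string-search step can be illustrated on the alignment shown on this slide. The sketch below converts an IUPAC consensus to a regular expression and reports which species carry a match. It assumes the six sequences correspond to the six species labels in the order listed, searches one strand only, and uses `TAATTA` and `GACTAAG` as illustrative queries rather than motifs taken from the talk.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Alignment rows from the slide; the species-to-row pairing is assumed.
ALIGNMENT = {
    "D.mel": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sim": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.sec": "CAGCT--AGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.yak": "CAGC--TAGCC-AACTCTCTAATTAGCGACTAAGTC-CAAGTC",
    "D.ere": "CAGCGGTCGCCAAACTCTCTAATTAGCGACCAAGTC-CAAGTC",
    "D.ana": "CACTAGTTCCTAGGCACTCTAATTAGCAAGTTAGTCTCTAGAG",
}


def iupac_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def species_with_motif(consensus, alignment):
    """List species whose ungapped sequence contains the consensus."""
    pattern = iupac_to_regex(consensus)
    return [
        species
        for species, row in alignment.items()
        if pattern.search(row.replace("-", ""))
    ]


if __name__ == "__main__":
    print(species_with_motif("TAATTA", ALIGNMENT))   # core of the conserved block
    print(species_with_motif("GACTAAG", ALIGNMENT))  # flanking sequence
```

The core of the conserved block is found in all six rows, while the flanking query is found only in the four closest species. Repeating this over thousands of instances genome-wide is what gives the approach its discovery power.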
Power of evolutionary signatures for motif discovery
Rank  Consensus     MCS    Promoters
1     CTAATTAAA     65.6   25.4
2     TTKCAATTAA    57.3   5.8
3     WATTRATTK     54.9   11.7
4     AAATTTATGCK   54.4   4.5
5     GCAATAAA      51     13.2
6     DTAATTTRYNR   46.7   16
7     TGATTAAT      45.7   7.1
8     YMATTAAAA     43.1   7
9     AAACNNGTT     41.2   20.1
10    RATTKAATT     40     3.9
11    GCACGTGT      39.5   17.9
12    AACASCTG      38.8   10.7
13    AATTRMATTA    38.2   19.5
14    TATGCWAAT     37.8   5.8
15    TAATTATG      37.5   14.1
16    CATNAATCA     36.9   1.8
17    TTACATAA      36.9   5.4
18    RTAAATCAA     36.3   3.2
19    AATKNMATTT    36     3.6
20    ATGTCAAHT     35.6   2.4
21    ATAAAYAAA     35.5   57.2
22    YYAATCAAA     33.9   5.3
23    WTTTTATG      33.8   6.3
24    TTTYMATTA     33.6   6.7
25    TGTMAATA      33.2   8.9
26    TAAYGAG       33.1   4.7
27    AAAKTGA       32.9   7.6
28    AAANNAAA      32.9   449.7
29    RTAAWTTAT     32.9   11
30    TTATTTAYR     32.9   30.7
(Promoters column listed under "Expression enrichment" on the slide)
Matches to known (15 of the 30 motifs; row positions not preserved): engrailed (en), reversed-polarity (repo), araucan (ara), paired (prd), ventral veins lacking (vvl), Ultrabithorax (Ubx), apterous (ap), abdominal A (abd-A), fushi tarazu (ftz), broad-Z3 (br-Z3), Antennapedia (Antp), Abdominal B (Abd-B), extradenticle (exd), gooseberry-neuro (gsb-n), Deformed (Dfd)
Enhancers (26 values; row positions not preserved): 2, 4.2, 2.6, 16.5, 0.3, 3.3, 1.7, 2.2, 4.3, 0.7, 1.2, 2, 5.4, 1.7, 2.8, 0, 4.6, -0.5, 0.6, 6, 1.7, 1.6, 2.7, 0.3, 0.8, 0.8
Ability to discover full dictionary of regulatory motifs de novo
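The MCS column is not defined on the slide. A minimal sketch, assuming the motif conservation score is a z-score: how many standard deviations the number of conserved instances lies above the count expected from a background conservation rate, under a binomial model. The function name, the model, and the numbers are assumptions.

```python
from math import sqrt


def motif_conservation_score(conserved, total, background_rate):
    """Z-score of the conserved count under a binomial background model."""
    expected = total * background_rate
    std_dev = sqrt(total * background_rate * (1.0 - background_rate))
    return (conserved - expected) / std_dev


if __name__ == "__main__":
    # Hypothetical motif: 400 instances genome-wide, 60 conserved,
    # while control motifs are conserved 10% of the time.
    print(round(motif_conservation_score(60, 400, 0.10), 2))
```

Because the standard deviation grows only with the square root of the instance count, pooling thousands of instances lets a modest excess in conservation rate produce a large score.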
Tissue-specific enrichment and clustering
Functional clusters emerge
• Infer candidate functions for novel motifs
• Reveal ‘modules’ of co-operating motifs
Discovered motifs show positional biases
• May represent new core promoter elements
• Show enrichment in distinct functional categories
Recognizing functional motifs in coding regions
miRNAs
Top motifs
• Challenge:
– Overlapping selective pressures
– Most ‘motifs’ from di-codon biases
– Hundreds of motifs due to noise
• Solution:
– Test each frame offset separately
– Di-codon biases → frame-biased
– True motifs → frame-unbiased
• Result:
– Top 20 motifs → 11 miRNA seeds
– (before: 11 seeds in 200+ motifs)
Ability to distinguish overlapping pressures
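The frame-offset test can be sketched as a count of motif occurrences at each codon position followed by a chi-square statistic against a uniform split. This assumes every input sequence starts in frame 0. The statistic and the toy sequences are illustrative choices, not necessarily the test used in the analysis.

```python
def frame_counts(coding_seqs, motif):
    """Count motif occurrences by codon offset (0, 1, 2) of their start."""
    counts = [0, 0, 0]
    for seq in coding_seqs:
        start = seq.find(motif)
        while start != -1:
            counts[start % 3] += 1
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return counts


def frame_bias_chi2(counts):
    """Chi-square statistic (2 degrees of freedom) against equal frame usage."""
    total = sum(counts)
    if total == 0:
        return 0.0
    expected = total / 3
    return sum((count - expected) ** 2 / expected for count in counts)


if __name__ == "__main__":
    # Toy coding sequence in which "GCT" only ever appears as a whole codon.
    biased = frame_counts(["ATGGCTGCTGCTTAA"], "GCT")
    print(biased, frame_bias_chi2(biased))
    # Equal use of all three frames gives a statistic of zero.
    print(frame_bias_chi2([2, 2, 2]))
```

A large statistic flags a candidate as frame-biased and therefore likely a di-codon artifact; a small one is consistent with a motif selected independently of the reading frame.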
Evidence of miRNA targeting in coding regions
miRNA targeting in protein-coding regions
• MicroRNA seeds are specifically selected
• Coding & 3’UTRs show same conservation profile
Part 4 (optional)
Characterizing enhancers
Developmental enhancer identification in Drosophila
• Supported by tiling arrays and regulatory motifs (nucleotide resolution)
• Identify nearly all known enhancers (20 of 22 highly bound)
Bound in vivo.
Conserved Dorsal/Twist/Snail (D/Tw/Sn) motifs in 12 flies.
Clear DV expression pattern (lacZ/end).
• Large number of novel enhancers (428 Dorsal/Twi/Sna). They validate!
Surprise 1: AP genes targeted by DV regulators
• Novel dorso-ventral (DV) enhancers in known antero-posterior (AP) genes
– Bound in vivo by DV genes (by all three DV master regulators)
– Show evolutionarily conserved motifs for all three DV factors
– Yet, found in known AP genes, with clear AP expression patterns
→ Integration of DV and AP patterning networks
Surprise 2: Some silent genes show Pol II binding
Active
Repressed
Poised
• Distinct modes of Pol II occupancy
– Active genes (27%): Pol II throughout the gene, transcribing
– Repressed genes (37%): Pol II simply absent, no expression
• Third class (12%): Pol II found only at the TSS, stalled
– Qualitatively different: abundantly bound, but strongly punctate
– Genes not expressed: known repressed genes, confirmed by arrays
– Enriched in development, neurogenesis, ectoderm, muscle differentiation
• Hypothesis: Developmental genes poised for expression
– Reminiscent of ‘bivalent’ K4/K27 domains in mammals
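The three occupancy modes can be expressed as a simple rule on two measurements per gene: Pol II signal at the TSS and signal over the gene body. The cutoff, the signal units, and the function name below are hypothetical; the slides do not state how the classes were called.

```python
def classify_pol2(tss_signal, body_signal, bound_threshold=2.0):
    """Assign a Pol II occupancy class from TSS and gene-body signal.

    bound_threshold is an arbitrary illustrative cutoff on enrichment.
    """
    tss_bound = tss_signal >= bound_threshold
    body_bound = body_signal >= bound_threshold
    if tss_bound and body_bound:
        return "active"      # Pol II throughout the gene
    if tss_bound:
        return "poised"      # Pol II only at the TSS, stalled
    if not body_bound:
        return "repressed"   # Pol II absent
    return "other"           # body signal without TSS signal


if __name__ == "__main__":
    # Hypothetical (TSS, gene body) enrichment values.
    for tss, body in [(5.0, 4.0), (5.0, 0.5), (0.3, 0.2)]:
        print(tss, body, classify_pol2(tss, body))
```

The "other" class is kept because the three percentages on the slide do not sum to 100%, so some genes presumably fall outside these modes.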
Surprise 3: Master regulators also bind downstream targets
• Abundant feed-forward loops in DV patterning
• Cooperation of master regulators & downstream regulators
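A feed-forward loop is the three-node pattern in which a master regulator targets a downstream regulator and both target the same gene. The sketch below enumerates such triples from an edge list. The node names and edges are generic placeholders, not measured binding data.

```python
def feed_forward_loops(edges):
    """Return sorted (a, b, c) triples with edges a->b, b->c and a->c."""
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)

    loops = []
    for a, a_targets in targets.items():
        for b in a_targets:
            if b == a:
                continue
            for c in targets.get(b, ()):
                if c != a and c != b and c in a_targets:
                    loops.append((a, b, c))
    return sorted(loops)


if __name__ == "__main__":
    # Hypothetical (regulator, target) edges.
    edges = [
        ("master", "downstream"),
        ("master", "target_1"),
        ("downstream", "target_1"),
        ("downstream", "target_2"),
    ]
    print(feed_forward_loops(edges))
```

Only `target_1` closes a loop here: `target_2` is bound by the downstream regulator alone, so it does not count.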
Manolis Kellis - modENCODE analysis - summary
• Part 1: Target identification
– Comparative vs. experimental: each has unique advantages
– Bound & not conserved appear less functional!
• Part 2: Target validation (for anti-sense miRNA)
– It’s nice when expected outcome comes true
– Need more collaborations for target validation
• Part 3: Motif discovery
– Methods for genome-wide motif discovery
– Expect increased power in bound regions
• Part 4: Enhancer identification
– Many new enhancers – with motifs & validation
– AP / DV system cross-talk – expect dense network
– Pol II stalling: spatial dynamics matter
Who’s actually doing the work
Main contributors:
Alex Stark
Pouya Kheradpour
Julia Zeitlinger
Collaborators:
Targets
Sushmita Roy @ UNM
iab-4AS
Natascha Bushati, Steve Cohen @ EMBL
Julius Brennecke, Greg Hannon @ CSHL
Calvin Jan, David Bartel @ Whitehead
Enhancers
Julia Zeitlinger, Rick Young @ Whitehead
Robert Zinzen, Mike Levine @ UC Berkeley