Poster - Anil Jegga - Cincinnati Children's Hospital Medical Center


Identification of Compositionally Similar Cis-element Clusters in Coordinately Regulated Genes
Abstract: A single efficient method for deciphering the transcriptional
control elements of higher eukaryotic genomes remains elusive. We have explored
extending comparative genomics approaches to this problem using
known TF binding sites. Starting from a previously developed method for
identifying conserved cis-elements contained within evolutionarily
conserved genomic regions (http://trafac.chmcc.org), we extended the query to
identify compositionally similar cis-regulatory element clusters that occur in
groups of co-expressed genes, within the evolutionarily conserved cis-regulatory
regions of each ortholog pair (“peak analyzer”). We have tested a series of
co-regulated ortholog pairs of promoters and genes, using known regulatory regions
as training sets and co-expressed genes identified from microarray expression
profiles as test sets, in the central nervous system, liver, and olfactory and
immuno-hematologic systems. Our results suggest that this combinatorial approach
is broadly sensitive for identifying known and potential regulatory regions
containing conserved cis-elements for known compartment-specific trans-acting
factors. However, the sensitivity needed to detect some known regulatory regions
also yields an abundance of apparent false positives. We believe this approach
can be substantially refined by improving the compositional similarity
algorithms and by using weighted detection of preferred architecture models.
Anil G Jegga, Ashima Gupta, Andrew T Pinski, James W Carman, Bruce J Aronow
Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229
Skeletal Muscle Genes - Regulogram depiction of shared cis-elements: The horizontal bars are the human and mouse
genomic sequences, with exons shown as colored segments. The different colored quadrilaterals are regions of alignment.
Within each of these blocks, the percent sequence similarity and the number of TF-binding sites are plotted as two separate line graphs.
TraFaC images of the experimentally validated regulatory regions of skeletal
muscle genes (represented as blue circles on the regulograms): The two gray vertical
bars are the two genes being compared. TF-binding sites occurring in both
genes are highlighted as colored bars drawn across the two genes.
DES: Upstream Enhancer Region
MYL1: Intronic Enhancer Region
Method:
GeneChip Experiments, ESTs/cDNAs → A set of Coordinately Expressed Genes
1. Cis-elements that are not conserved across
the orthologous genes cannot be identified,
even if they occur in regions of
sequence similarity across the species.
>Seq 2 Mouse/Human Genomic
GACTGAGGGCTTGTGAAACAG
CAAGAACCTGTCTCAAAAAACA
GTGGGCAGGGAGGGGATTAAT
GAATAGGCAGCTACGTTCTGGG
ACTGGAGGGACTCGAGGTGGC
TAGAAAGCAAGAGGTACTGGGA
GACAAGGCTGCAGACATTTCTT
TTTTTACTAGAGTC
BlastZ
TF Binding Sites
V$ETSF/ETS1_B      8333-8347
V$STAT/STAT1_01    8335-8355
V$ETSF/PU1_B       8335-8350
V$ETSF/GABP_B      8336-8347
V$ETSF/NRF2_01     8338-8347
V$CLOX/CDPCR3_01   8363-8377
TF Binding Sites
CKM: Upstream Enhancer Region
Local Alignment
Similarity Score: 3074
Match Percentage: 51%
Number of Matches: 96
Number of Mismatches: 39
Total Length of Gaps: 52
Begins at (8281, 8874) and ends at (8416, 9059)

Seq 1        <--> Seq 2        Sim%   Nt
8281-8300    <--> 8874-8893    70%    (20 nt)
8301-8310    <--> 8902-8911    90%    (10 nt)
8311-8324    <--> 8923-8936    57%    (14 nt)
8325-8376    <--> 8947-8998    62%    (52 nt)
8378-8386    <--> 8999-9007    67%    (9 nt)
8387-8416    <--> 9030-9059    90%    (30 nt)
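The summary statistics of this alignment are consistent with the segment table. The sketch below recomputes them under our own assumptions, not a documented formula of the alignment tool: gap length is the number of bases skipped in either sequence between consecutive ungapped segments, the match count is estimated from each segment's rounded Sim%, and Match Percentage is matches divided by all alignment columns (aligned bases plus gaps). The Similarity Score (3074) is not reproduced because the scoring scheme is not given.

```python
# Segments from the local alignment table:
# (seq1_start, seq1_end, seq2_start, seq2_end, percent_similarity)
SEGMENTS = [
    (8281, 8300, 8874, 8893, 70),
    (8301, 8310, 8902, 8911, 90),
    (8311, 8324, 8923, 8936, 57),
    (8325, 8376, 8947, 8998, 62),
    (8378, 8386, 8999, 9007, 67),
    (8387, 8416, 9030, 9059, 90),
]


def alignment_summary(segments):
    """Recompute summary statistics from ungapped aligned segments.

    Assumption: bases skipped between consecutive segments count as gap
    length in whichever sequence skips them.
    """
    aligned = sum(e1 - s1 + 1 for s1, e1, _, _, _ in segments)
    # Per-segment matches are not listed, so estimate them from the
    # percentage (assumed to be rounded to the nearest integer).
    matches = sum(round(pct * (e1 - s1 + 1) / 100) for s1, e1, _, _, pct in segments)
    gaps = 0
    for prev, cur in zip(segments, segments[1:]):
        gaps += cur[0] - prev[1] - 1  # bases skipped in Seq 1
        gaps += cur[2] - prev[3] - 1  # bases skipped in Seq 2
    columns = aligned + gaps
    return {
        "aligned": aligned,
        "matches": matches,
        "mismatches": aligned - matches,
        "gaps": gaps,
        "match_pct": round(100 * matches / columns),
    }


print(alignment_summary(SEGMENTS))
# {'aligned': 135, 'matches': 96, 'mismatches': 39, 'gaps': 52, 'match_pct': 51}
```

Under these assumptions, 96 matches over 187 alignment columns (135 aligned bases plus 52 gap positions) gives the reported 51%.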
Trafac
V$ETSF/ETS1_B      8880-8894
V$STAT/STAT1_01    8881-8901
V$ETSF/PU1_B       8882-8897
V$ETSF/NRF2_01     8892-8902
V$CLOX/CDPCR3_01   8908-8922
V$GATA/GATA_C      8916-8928
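The positions in the first TF Binding Sites list fall within the Seq 1 range of the CKM alignment (8281-8416), and those in the Trafac list fall within the Seq 2 range (8874-9059). Assuming the two lists are the hits on the two orthologs, a minimal sketch of the "occurring in both genes" test is shown below. It matches on the matrix name only, which is a simplification; TraFaC's exact criteria are not spelled out here.

```python
# (matrix, start, end) hits copied from the two lists above.
SITES_SEQ1 = [
    ("V$ETSF/ETS1_B", 8333, 8347),
    ("V$STAT/STAT1_01", 8335, 8355),
    ("V$ETSF/PU1_B", 8335, 8350),
    ("V$ETSF/GABP_B", 8336, 8347),
    ("V$ETSF/NRF2_01", 8338, 8347),
    ("V$CLOX/CDPCR3_01", 8363, 8377),
]
SITES_SEQ2 = [
    ("V$ETSF/ETS1_B", 8880, 8894),
    ("V$STAT/STAT1_01", 8881, 8901),
    ("V$ETSF/PU1_B", 8882, 8897),
    ("V$ETSF/NRF2_01", 8892, 8902),
    ("V$CLOX/CDPCR3_01", 8908, 8922),
    ("V$GATA/GATA_C", 8916, 8928),
]


def matrices(sites):
    """Set of TF matrix names that have at least one hit."""
    return {matrix for matrix, _, _ in sites}


def compare_sites(sites_1, sites_2):
    """Split matrix names into shared and sequence-specific groups."""
    m1, m2 = matrices(sites_1), matrices(sites_2)
    return {
        "shared": sorted(m1 & m2),
        "only_seq1": sorted(m1 - m2),
        "only_seq2": sorted(m2 - m1),
    }


result = compare_sites(SITES_SEQ1, SITES_SEQ2)
for key, names in result.items():
    print(key, names)
# shared ['V$CLOX/CDPCR3_01', 'V$ETSF/ETS1_B', 'V$ETSF/NRF2_01', 'V$ETSF/PU1_B', 'V$STAT/STAT1_01']
# only_seq1 ['V$ETSF/GABP_B']
# only_seq2 ['V$GATA/GATA_C']
```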
Coordinately Expressed Genes in Olfactory Mucosa: Three
genes with high levels of expression in olfactory mucosa shared
several clusters of cis-elements, each of which was also
conserved between human and mouse. The window size ranged from 200
to 300 base pairs. Two of the genes depicted here (XM_134943 and XM_143313)
encode hypothetical proteins, while the third is TPD52
(tumor protein D52) (Genter et al., 2003).
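The poster does not give the clustering procedure itself. As a hypothetical sketch of what "sharing a cluster of cis-elements within a 200-300 bp window" could mean, the code below anchors a fixed window at every conserved TF hit, records which TFs fall inside it, and keeps the TF combinations that occur together within one window in every gene. The 250 bp window is an assumed value inside the stated range, and all TF names and positions are invented placeholders, not data from the poster.

```python
from itertools import combinations


def window_compositions(sites, window=250):
    """TF names found in each window [start, start + window) anchored at a hit.

    sites: (tf_name, position) pairs for one gene's conserved TF hits.
    """
    ordered = sorted(sites, key=lambda site: site[1])
    return [
        frozenset(tf for tf, pos in ordered if start <= pos < start + window)
        for _, start in ordered
    ]


def shared_clusters(genes, window=250, min_size=2):
    """Maximal TF combinations found together inside one window in every gene."""
    per_gene = []
    for sites in genes:
        combos = set()
        for comp in window_compositions(sites, window):
            # Every sub-combination of a window's TFs also co-occurs in it.
            for k in range(min_size, len(comp) + 1):
                combos.update(frozenset(c) for c in combinations(sorted(comp), k))
        per_gene.append(combos)
    common = set.intersection(*per_gene)
    # Drop combinations that are strict subsets of another shared one.
    return [c for c in common if not any(c < other for other in common)]


# Hypothetical hits for three co-expressed genes: (TF name, position in bp).
genes = [
    [("TF_A", 100), ("TF_B", 180), ("TF_C", 290), ("TF_D", 900)],
    [("TF_B", 40), ("TF_A", 210), ("TF_C", 260), ("TF_E", 700)],
    [("TF_C", 500), ("TF_A", 560), ("TF_B", 640), ("TF_D", 1500)],
]
print([sorted(c) for c in shared_clusters(genes)])  # [['TF_A', 'TF_B', 'TF_C']]
```

Enumerating every sub-combination grows exponentially with the number of TFs in a window, which is acceptable for a sketch with a few hits per window but would need pruning on dense real data.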
Peak-Analyzer
Gene 1: Hs-Mm, Gene 2: Hs-Mm, Gene 3: Hs-Mm, … Gene n: Hs-Mm
Each human-mouse (Hs-Mm) gene panel lists the conserved aligned segments of its ortholog pair and the number of hits (cis-elements shared by the ortholog pair) in each segment; the figure repeats the same example values in every panel:
Seq 1        <--> Seq 2        Sim%   Nt        Hits
8301-8310    <--> 8902-8911    90%    (10 nt)   3
8311-8324    <--> 8923-8936    57%    (14 nt)   2
8325-8376    <--> 8947-8998    62%    (52 nt)   3
8378-8386    <--> 8999-9007    67%    (9 nt)    0
8387-8416    <--> 9030-9059    90%    (30 nt)   4
After the initial genomic sequence
alignment of orthologous skeletal muscle
genes (DES (Desmin), MYL1 (Myosin light
polypeptide 1), CKM (creatine kinase,
muscle) and ENO3 (enolase 3, beta,
muscle)), the “peaks” or “hits” (cis-elements
common to an orthologous gene pair
and occurring in conserved genomic
regions) were compared to identify shared
cis-regulatory modules. The identified cis-clusters
included the experimentally validated
regulatory regions in each of these genes
and comprised multiple muscle
regulatory cis-elements (Wasserman and
Fickett, 1998). The horizontal lines are the
genomic sequences of the base species
(human in this case). Yellow vertical bars
are the exons. The different colored boxes
represent the different cis-clusters.
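The abstract calls for better compositional similarity algorithms without naming one. As a stand-in chosen here purely for illustration, the Jaccard index on the sets of TF names in two clusters lets clusters from different genes match partially instead of requiring identical composition. Cluster labels, TF names, and the 0.6 threshold below are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two TF-name sets.

    Returns 0.0 when both sets are empty.
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_cluster_pairs(clusters_x, clusters_y, threshold=0.6):
    """(label_x, label_y, score) for cluster pairs whose Jaccard score >= threshold."""
    pairs = []
    for label_x, tfs_x in clusters_x.items():
        for label_y, tfs_y in clusters_y.items():
            score = jaccard(tfs_x, tfs_y)
            if score >= threshold:
                pairs.append((label_x, label_y, round(score, 2)))
    return pairs


# Hypothetical cis-clusters from two co-expressed genes.
gene_1 = {"c1": {"TF_A", "TF_B", "TF_C"}, "c2": {"TF_D", "TF_E"}}
gene_2 = {"k1": {"TF_A", "TF_B", "TF_C", "TF_F"}, "k2": {"TF_E", "TF_G"}}
print(similar_cluster_pairs(gene_1, gene_2))  # [('c1', 'k1', 0.75)]
```

A weighted variant (for example, counting how many times each TF occurs in a cluster) would be one way to move toward the weighted architecture models the abstract mentions.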
Limitations:
BlastN/Blat Search for genomic sequence retrieval
>Seq 1 Human/Mouse Genomic
AGAGAAAATTGCTAGAGCTCA
GGAGTTTGAGACCAGCCTGG
GCAATAGAGTAAGACTTTGTCT
CTATCAAAAATTTAAAAATTAAC
TGGGCTTGGCGGTGTGCACC
TGTGGTCCAGCTACTCAGGAG
GCTGAGGTGGGAGGATTGCTT
GAGCCCAAGA
Peak Analyzer:
ENO3: Intronic Enhancer Region
2. Cis-elements that occur in non-aligned
genomic regions across the two species
cannot be identified by this approach.
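Limitation 2 follows from restricting hits to aligned regions: a site lying outside every aligned segment is discarded before any cross-species comparison, whatever its function. A minimal sketch is shown below. It assumes that any overlap with an aligned Seq 2 interval of the CKM alignment is enough to keep a site (the exact rule is not stated). The second site is hypothetical and sits in the unaligned 9008-9029 stretch.

```python
# Aligned Seq 2 intervals (start, end) from the CKM local alignment table.
ALIGNED_SEQ2 = [
    (8874, 8893), (8902, 8911), (8923, 8936),
    (8947, 8998), (8999, 9007), (9030, 9059),
]


def overlaps_any(start, end, intervals):
    """True if the closed interval [start, end] overlaps any interval in the list."""
    return any(start <= i_end and i_start <= end for i_start, i_end in intervals)


def sites_in_aligned_regions(sites, intervals):
    """Keep (name, start, end) sites that touch at least one aligned interval."""
    return [site for site in sites if overlaps_any(site[1], site[2], intervals)]


sites = [
    ("V$GATA/GATA_C", 8916, 8928),    # from the Trafac list; overlaps 8923-8936
    ("HYPOTHETICAL_TF", 9010, 9025),  # hypothetical; lies in the unaligned 9008-9029 gap
]
print(sites_in_aligned_regions(sites, ALIGNED_SEQ2))
# [('V$GATA/GATA_C', 8916, 8928)]
```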
Conclusions:
1. The combinatorial approach of identifying coordinately regulated genes that
share compositionally similar cis-elements within their orthologous noncoding
genomic regions offers a powerful filter for identifying potentially functional
cis-clusters.
2. Peak analyzer appears capable of identifying known and novel regulatory
modules within a cluster of coordinately regulated genes.
3. These novel cis-element modules may be usable as probes for genome-wide
annotation of potential regulatory regions.
Peak Analyzer
Coordinately Expressed Genes in Cerebellum: Conserved cis-element clusters (200-300 base pair window) between human and
mouse homologs, shared by four genes highly expressed in cerebellum (Zhang et al., 2003): ATP2A2 (Ca++ ATPase);
HPCAL1 (Hippocalcin-like 1); CACNA1A (P/Q-type Ca channel alpha 1A); and PLA2G7 (phospholipase A2 group VII).
References:
http://trafac.chmcc.org
Support:
HHMI and NIEHS U01 ES11038 Mouse Centers Genomics Consortium