Seminar Presentation


Discovery of New Regulatory Motifs of
Purine Biosynthetic Genes in
Escherichia coli and Bacillus subtilis
Indiana University
School of Informatics
Haifeng Zhao
Outline of Presentation
• Project Goals
• Introduction
• PlatCom
• Discovery of DNA Regulatory Motifs
• Results
• Discussion
Project Goals
• Develop a Platform for Comparative Study of Predicted Proteins and Genomic Sequences
• Analyze the Transcription Regulatory Motifs of the De Novo Purine Biosynthetic Pathway of Escherichia coli and Bacillus subtilis
Purine de novo synthesis
[Figure: purine de novo synthesis pathway. Source: http://gtcw3.aist-nara.ac.jp/mori/research/dbservice/operon/fig17.htm]
Genbank Data
[Diagram: organization of the GenBank genome data. Labels: Bacteria (e.g. Escherichia_coli, Bacillus_subtilis, …), Completely Sequenced Genomes, Incompletely Sequenced Genomes, MITOCHONDRIA. Genomes shown: A_thaliana, C_elegans, Plasmodium_falciparum (P_falciparum), S_cerevisiae, D_melanogaster, Anopheles_gambiae, H_sapiens, R_norvegicus, M_musculus.]
PlatCom
A Platform for Computational
Comparative Genomics
1. Building databases of all pairwise
comparisons.
2. A toolkit for multiple genome
comparisons.
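To make the two parts concrete, here is a minimal sketch assuming each genome is identified by a name and each pairwise comparison is stored as a dictionary from a gene in one genome to its best hit in the other. The storage layout, the function names and the toy gene pairs are hypothetical, not PlatCom's actual format. Comparing every pair of n genomes once takes n(n-1)/2 jobs; a multiple-genome question can then be answered by combining the precomputed pairwise tables instead of realigning.

```python
from itertools import combinations

def pairwise_jobs(genomes):
    """All unordered genome pairs; n genomes give n*(n-1)/2 comparisons."""
    return list(combinations(sorted(genomes), 2))

def conserved_genes(reference, others, best_hits):
    """Genes of `reference` that have a hit in every genome in `others`.

    best_hits[(query_genome, subject_genome)] is a hypothetical dict that
    maps each query gene to its best-hit gene in the subject genome.
    """
    shared = None
    for other in others:
        genes = set(best_hits.get((reference, other), {}))
        shared = genes if shared is None else shared & genes
    return sorted(shared) if shared else []

if __name__ == "__main__":
    genomes = ["E_coli", "B_subtilis", "H_influenzae"]
    print(pairwise_jobs(genomes))
    toy_hits = {  # toy data for illustration only
        ("E_coli", "B_subtilis"): {"purF": "purF", "purD": "purD"},
        ("E_coli", "H_influenzae"): {"purF": "purF"},
    }
    print(conserved_genes("E_coli", ["B_subtilis", "H_influenzae"], toy_hits))
```

The design choice here is that the expensive alignments are done once per pair, and multi-genome queries become cheap set operations over stored results.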
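PlatCom
[Diagram: PlatCom system overview. Components: NCBI FTP Server (source of Genbank Data); BlastZ and FASTA; IBM Super Computer; comparison databases (*.fna.cmp, *.faa.cmp, *.est.cmp); Server; PlatCom Browser.]
BlastZ: a gapped BLAST algorithm designed for aligning two long genomic sequences.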
PlatCom
Dynamically Update the Databases
1. Update Genome Data
2. Add New Genome Data
3. Automatically Detect Missing Data
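The three update cases can be folded into one check. The sketch below assumes each genome carries a version label and that every stored comparison records which versions it was computed from; the names `versions`, `done` and `comparisons_to_run` are hypothetical, not part of PlatCom.

```python
from itertools import combinations

def comparisons_to_run(versions, done):
    """Return genome pairs whose comparison is missing or out of date.

    versions: genome -> current data version (hypothetical labels).
    done: frozenset({a, b}) -> {genome: version used when the pair was compared}.
    Covers updated genomes (stale pairs), newly added genomes (no pairs yet)
    and any other pair that is simply missing.
    """
    todo = []
    for a, b in combinations(sorted(versions), 2):
        used = done.get(frozenset((a, b)))
        if used is None or used.get(a) != versions[a] or used.get(b) != versions[b]:
            todo.append((a, b))
    return todo

if __name__ == "__main__":
    versions = {"E_coli": "v2", "B_subtilis": "v1", "H_influenzae": "v1"}
    done = {
        frozenset(("E_coli", "B_subtilis")): {"E_coli": "v1", "B_subtilis": "v1"},
        frozenset(("B_subtilis", "H_influenzae")): {"B_subtilis": "v1", "H_influenzae": "v1"},
    }
    print(comparisons_to_run(versions, done))
```

Here the E_coli/B_subtilis pair is rerun because E_coli changed version, and E_coli/H_influenzae is reported as missing.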
Discovery of DNA Regulatory Motifs
Genome → Predict Coregulated Set of Genes → Sequences of Upstream Regions → Use Motif-Finding Algorithm → DNA Regulatory Motifs
Identify De Novo Purine Biosynthetic (PurR) Genes of E. coli
Source: http://biocyc.org
Identify Orthologs in Bacteria Using the COG Database
547 Genes
COG0015, COG0026, COG0034, COG0041, COG0046, COG0047, COG0138, COG0150, COG0151, COG0152, COG0299, COG0516, COG0517, COG0518, COG0519, …
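Collecting the ortholog genes amounts to filtering a gene-to-COG membership table by the COG IDs of interest. This sketch assumes a simple tab-separated layout `organism<TAB>gene<TAB>COG`, which is hypothetical; the real COG files may be organized differently. Only the COG IDs shown on the slide are included, and that list continues beyond them.

```python
from collections import defaultdict

# COG IDs shown on the slide; the slide's list continues beyond these ("…").
PURINE_COGS = {
    "COG0015", "COG0026", "COG0034", "COG0041", "COG0046",
    "COG0047", "COG0138", "COG0150", "COG0151", "COG0152",
    "COG0299", "COG0516", "COG0517", "COG0518", "COG0519",
}

def collect_orthologs(lines, cogs):
    """Group genes by organism for every line whose COG ID is in `cogs`.

    Each line is assumed to be 'organism<TAB>gene<TAB>COG' (hypothetical
    layout); blank lines and '#' comments are skipped.
    """
    by_organism = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        organism, gene, cog = line.split("\t")
        if cog in cogs:
            by_organism[organism].append(gene)
    return dict(by_organism)

if __name__ == "__main__":
    toy_lines = [  # toy rows with made-up gene names
        "# organism\tgene\tCOG",
        "E_coli\tgene1\tCOG0034\n",
        "B_subtilis\tgene2\tCOG0034",
        "E_coli\tgene3\tCOG9999",
    ]
    print(collect_orthologs(toy_lines, PURINE_COGS))
```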
Identify Upstream Regulatory Regions
[Diagram: genes A, B and C of an operon, with the operon head marked.]
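Because genes inside an operon share the promoter in front of the first gene, upstream regions are only useful when taken in front of the operon head. A common heuristic groups adjacent genes on the same strand separated by a short intergenic gap; the sketch below assumes that heuristic and a 50 bp maximum gap, both of which are illustrative assumptions rather than the method used in this work. Gene names in the example are hypothetical.

```python
from typing import List, NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"

def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group adjacent same-strand genes separated by at most `max_gap` bp."""
    operons: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        last = operons[-1][-1] if operons else None
        if last is not None and last.strand == gene.strand and gene.start - last.end <= max_gap:
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return operons

def operon_head(operon: List[Gene]) -> Gene:
    """First gene in the direction of transcription."""
    return operon[0] if operon[0].strand == "+" else operon[-1]

if __name__ == "__main__":
    genes = [
        Gene("geneA", 100, 400, "+"), Gene("geneB", 420, 800, "+"),
        Gene("geneC", 820, 1000, "+"), Gene("geneD", 2000, 2300, "-"),
        Gene("geneE", 2310, 2500, "-"),
    ]
    for operon in predict_operons(genes):
        print([g.name for g in operon], "head:", operon_head(operon).name)
```

For a minus-strand operon the head is the rightmost gene, since transcription runs right to left on the chromosome.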
Convert Gene Names of COG Database
to Gene Names of GenBank Database
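The name conversion can be sketched as a lookup that first applies a curated alias table and then falls back to case-insensitive matching, keeping unresolved names for manual review. The alias table and the function name are hypothetical; how the two databases' names actually differ is not described in the slides.

```python
def to_genbank_names(cog_names, genbank_names, aliases):
    """Map gene names used in the COG database to GenBank gene names.

    aliases: hypothetical hand-curated dict for names that differ between
    the two databases; all other names are matched case-insensitively.
    Returns (mapping, unmatched) so unresolved names can be reviewed.
    """
    lookup = {name.lower(): name for name in genbank_names}
    mapping, unmatched = {}, []
    for name in cog_names:
        key = aliases.get(name, name).lower()
        if key in lookup:
            mapping[name] = lookup[key]
        else:
            unmatched.append(name)
    return mapping, unmatched

if __name__ == "__main__":
    print(to_genbank_names(["PURD", "alias1", "unknown1"], {"purF", "purD"}, {"alias1": "purF"}))
```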
Extract Upstream Regions
GenBank (*.gbk) → Parser → Databases of Upstream Regions
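Once a parser has produced gene coordinates and strands (for example with Biopython's `SeqIO.parse(path, "genbank")`), extracting an upstream region is a slice of the chromosome, reverse-complemented for minus-strand genes. This is a minimal sketch assuming 0-based, end-exclusive coordinates on the forward strand; the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_region(genome: str, start: int, end: int, strand: str, length: int) -> str:
    """Return up to `length` bp immediately upstream of a gene, written 5'->3'.

    start/end: 0-based, end-exclusive gene coordinates on the forward strand.
    For a minus-strand gene the upstream side lies to the right of `end`,
    so that slice is reverse-complemented. Regions are clipped at the
    chromosome ends.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    return reverse_complement(genome[end:end + length])

if __name__ == "__main__":
    print(upstream_region("GGGCCATGA", 5, 9, "+", 3))   # plus-strand gene
    print(upstream_region("CATAAAGGG", 0, 3, "-", 3))   # minus-strand gene
```

Calling this with `length=100` and `length=300` would give the two upstream databases used in the runs below.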
Motif-Finding Algorithms
1. Gibbs Sampler Algorithm
2. AlignACE (based on the Gibbs Sampler)
3. MEME
4. MACAW
Run AlignACE and MEME (Escherichia coli and Bacillus subtilis)
100 bp and 300 bp Upstream Databases → AlignACE / MEME → Motifs → ScanACE / MAST → Sites
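ScanACE and MAST search sequences for matches to the motifs reported by AlignACE and MEME, respectively. The sketch below shows the general idea with a plain log-odds position weight matrix scanned over both strands; it is not the scoring either tool actually uses (MAST, for example, reports p-values and E-values), and the pseudocount, background and score threshold are arbitrary assumptions.

```python
import math

ALPHABET = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_matrix(sites, pseudo=0.5, background=0.25):
    """log2(observed / background) for each base in each column of aligned sites."""
    sites = [s.upper() for s in sites]
    total = len(sites) + len(ALPHABET) * pseudo
    matrix = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        matrix.append({b: math.log2((column.count(b) + pseudo) / total / background)
                       for b in ALPHABET})
    return matrix

def score(matrix, word):
    return sum(col[base] for col, base in zip(matrix, word))

def scan(matrix, seq, threshold):
    """Report (position, strand, score) for every window scoring >= threshold."""
    width = len(matrix)
    seq = seq.upper()
    hits = []
    for p in range(len(seq) - width + 1):
        word = seq[p:p + width]
        if any(b not in ALPHABET for b in word):
            continue  # skip windows containing N or other symbols
        forward = score(matrix, word)
        reverse = score(matrix, word.translate(_COMPLEMENT)[::-1])
        best, strand = max((forward, "+"), (reverse, "-"))
        if best >= threshold:
            hits.append((p, strand, round(best, 2)))
    return hits

if __name__ == "__main__":
    matrix = log_odds_matrix(["CGCAAACG"] * 3)
    print(scan(matrix, "TTTTTCGCAAACGTTTTT", threshold=10.0))
```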
DPInteract (E. coli)
>guaBA 48->74
ggtagatgcaatcggttacgctctgt
>purB -205->-179
TGCCGACGCAATCGGTTACCTTGATG
>purC 148->174
atgatacgcaaacgtgtgcgtctgca
>purEK 66->92
GAGCAAGGAAAACGGTTGCGTGGCTG
>purF
tccctacgcaaacgttttctttttct
>purH 102->128
GCGTTGCGCAAACGTTTTCGTTACAA
>purL 71->97
tttccacgcaaacggtttcgtcagcg
>purMN 59->85
cagtctcgcaaacgtttgctttccct
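The DPInteract sites listed above are already aligned to the same 26 bp width, so a per-column count matrix and a consensus can be read off directly. The sketch below uses those sequences as given (case ignored) and a simple majority rule: a column gets a base only when that base occurs in more than half of the sites, otherwise `N`. The threshold is an illustrative choice.

```python
from collections import Counter

# DPInteract (E. coli) sites as listed above; case is ignored.
DPINTERACT_SITES = {
    "guaBA": "ggtagatgcaatcggttacgctctgt",
    "purB": "TGCCGACGCAATCGGTTACCTTGATG",
    "purC": "atgatacgcaaacgtgtgcgtctgca",
    "purEK": "GAGCAAGGAAAACGGTTGCGTGGCTG",
    "purF": "tccctacgcaaacgttttctttttct",
    "purH": "GCGTTGCGCAAACGTTTTCGTTACAA",
    "purL": "tttccacgcaaacggtttcgtcagcg",
    "purMN": "cagtctcgcaaacgtttgctttccct",
}

def count_matrix(sites):
    """Per-column base counts for aligned, equal-length sites."""
    seqs = [s.upper() for s in sites]
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("sites must be aligned to the same length")
    return [Counter(s[j] for s in seqs) for j in range(width)]

def consensus(matrix, min_fraction=0.5):
    """Majority base of each column, or 'N' when it does not exceed min_fraction of the sites."""
    n = sum(matrix[0].values())
    letters = []
    for column in matrix:
        base, count = column.most_common(1)[0]
        letters.append(base if count / n > min_fraction else "N")
    return "".join(letters)

if __name__ == "__main__":
    print(consensus(count_matrix(DPINTERACT_SITES.values())))
    # NNNNNACGCAAACGNTTNCGTTNNNN
```

With this rule the conserved central block ACGCAAACG stands out, while the flanking columns fall to N.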
pur Operon of Bacillus subtilis
The Bacillus subtilis purEKBCQLFMNHD operon, called the pur operon, encodes 10 enzymes required for de novo purine synthesis.
DNase I footprinting of the pur operon covered the upstream region from -179 to -30.
The common DNA recognition element for binding of PurR to the pur operon is not known.
Results: AlignACE (Escherichia coli)
Database | Number of Motifs | Known Sites Identified in DPInteract (1000 hits) | Known Sites Identified in DPInteract (2000 hits)
100 bp | 85 | 25% | 50%
300 bp | 100 | 14% | 29%
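A percentage of "identified known sites" needs a rule for when a predicted hit counts as finding a known site. The sketch below assumes a hit identifies a site if it lies in the same gene's upstream region and overlaps the site's interval; this criterion is an assumption, not necessarily the one used in this study. The known-site coordinates are taken from the DPInteract listing above; the predicted hits are toy values.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def fraction_identified(known_sites, predicted_hits):
    """Share of known sites overlapped by at least one predicted hit in the same gene.

    Both arguments are lists of (gene, start, end) tuples.
    """
    if not known_sites:
        return 0.0
    found = sum(
        any(g == gene and overlaps(start, end, ps, pe) for g, ps, pe in predicted_hits)
        for gene, start, end in known_sites
    )
    return found / len(known_sites)

if __name__ == "__main__":
    known = [("purB", -205, -179), ("purH", 102, 128)]
    predicted = [("purB", -200, -185), ("purH", 10, 30)]  # toy hits
    print(f"{fraction_identified(known, predicted):.0%}")
```

Raising the number of reported hits (e.g. from 1000 to 2000) can only keep or increase this fraction, since every earlier hit is still included.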
Results: AlignACE (Bacillus subtilis)
Gene Name | Starting Point | Strand | Sequence Map | Score
purE | -176 | + | cgcagaagcgaacgac | 11.4119
Results: MEME (Escherichia coli)
Database | Number of Motifs | Known Sites Identified in DPInteract (E-value < 10)
100 bp | 30 | 50%
300 bp | 10 | 38%
Results: MEME (Bacillus subtilis)
Gene Name | Starting Point | Strand | Sequence | E-Value
purE | -72 | + | TGTCTTTCTCGAACT | 0.11
Results: Locations of Mapped Genes of Escherichia coli and Bacillus subtilis
[Figure panels: AlignACE; MEME]
Discussion
PlatCom: A Platform for Comparative Study of Multiple Genomes
• Multiple Tools: AlignACE, MEME, …
• Multiple Genomes: Escherichia coli, Bacillus subtilis, …
• Performance: Many significant new motifs are found.
Acknowledgement
Sun Kim, Advisor
Zhiping Wang, Classmate