
Welcome to
Integrated Bioinformatics
Friday, 8 September 2006
• Comparison of genomes – Scenario
• Installing and running Blast
• Weekend/Monday – How to find differences
• Nature of research articles
E. coli: What makes it kill?
Escherichia coli . . .
. . . very small lab rats
Courtesy of Kent State University Microbiology
E. coli: What makes it kill?
Escherichia coli . . .
haemorrhagic colitis
E. coli: What makes it kill?
E. coli K12
TCTACTTATA
AAGAGTCTGT
TTCTGTCTGC
TGGATTTCGG
GAACCTTAGT
CTCCGTAAAC
TGAATAAACT
AAGAGTTTAA
AAACCTGTAT
TTATATATTT
CCCCAGCTGT
GACAGCACTG
GCTGAAATTC
CCCTGCACCA
ATGAATGACT
TTCAATCCAC
TGAATGAACA
TCTGACCTCT
AACTCTAGCC
GACTTCTGCT
CTCTAACATG
TTGTTAAAGG
AGTTAAAAAC
GGTTACATGA
TAAGAAATTA
CATTAAAAAG
ACCCTCAAGA
CGCTGAGAGC
GGTCTTTCCT
GAACGAACGA
AGGGCTACAC
CATACATGGT
GGCAGCTTTC
TGCCCCACTC
ATACCAAAGT
ATGTCAGCAA
TACAAATGAA
GAATTGCAGT
ACTGCCTAAA
ATTGCAATTA
AGGCAAATAC
AGGCACCGGC
AGAGTGGTAC
GTGGGCACTG
TTGAATGAAA
Gene finder
E. coli O157:H7
TCTACTTATA
AAGAGTCTGT
TTCTGTCTGC
TGGATTTCGG
GAACCTTAGT
CTCCGTAAAC
TGAATAAACT
AAGAGTTTAA
AAACCTGTAT
TTATATATTT
CCCCAGCTGT
GACAGCACTG
GCTGAAATTC
CCCTGCACCA
ATGAATGACT
TTCAATCCAC
TGAATGAACA
TCTGACCTCT
AACTCTAGCC
GACTTCTGCT
CTCTAACATG
TTGTTAAAGG
AGTTAAAAAC
GGTTACATGA
TAAGAAATTA
CATTAAAAAG
ACCCTCAAGA
CGCTGAGAGC
GGTCTTTCCT
GAACGAACGA
AGGGCTACAC
CATACATGGT
GGCAGCTTTC
TGCCCCACTC
ATACCAAAGT
ATGTCAGCAA
TACAAATGAA
GAATTGCAGT
ACTGCCTAAA
ATTGCAATTA
AGGCAAATAC
AGGCACCGGC
AGAGTGGTAC
GTGGGCACTG
TTGAATGAAA
Gene finder
E. coli: What makes it kill?
Killer protein
Killer functions
Membrane protein, sodium transporter
Iron responsive transcriptional regulator
Calcium-dependent protein kinase
Unknown protein
Unknown protein
Similarity finder
Unknown protein
...
ideas for new antibiotics
Welcome to
Integrated Bioinformatics
Friday, 8 September 2006
Welcome to
Integrated Bioinformatics
Friday, 8 September 2006
• Nature of research articles
• Comparison of genomes – Scenario
• Weekend/Monday – How to find differences
– Parsing programs
– Regular expressions
Welcome to
Integrated Bioinformatics
Friday, 8 September 2006
• Nature of problem sets
• Nature of research articles
• Comparison of genomes – Scenario
• Weekend/Monday – How to find differences
• Today – Why differences
How do differences arise between genomes?
Addition/deletion of DNA
Where do they come from?
How to distinguish foreign from native genes?
– GC-content
How do differences arise between genomes?
Addition/deletion of DNA
Point mutation
organism 1 TTT TCT GAA TCC GTA GAC GTT
organism 2 TTT TCT GAA TCA GCA GAC GTG
What kind of mutations arise?
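One way to see which codons differ between the two organisms above is to walk through the sequences codon by codon. This is only a minimal sketch in Python: it assumes the two sequences are already aligned, gap-free, and in the same reading frame, and the function name `codon_differences` is made up for illustration.

```python
def codon_differences(seq1, seq2):
    """Return (codon_number, codon1, codon2) for each codon that differs.

    Assumes both sequences are written as space-separated codons,
    are already aligned, and contain no gaps.
    """
    codons1 = seq1.split()
    codons2 = seq2.split()
    if len(codons1) != len(codons2):
        raise ValueError("sequences must contain the same number of codons")
    return [
        (number, c1, c2)
        for number, (c1, c2) in enumerate(zip(codons1, codons2), start=1)
        if c1 != c2
    ]


# The two sequences shown on the slide.
organism1 = "TTT TCT GAA TCC GTA GAC GTT"
organism2 = "TTT TCT GAA TCA GCA GAC GTG"

for number, c1, c2 in codon_differences(organism1, organism2):
    print(f"codon {number}: {c1} -> {c2}")
```

For the slide's sequences this reports differences at codons 4, 5, and 7.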
How do differences arise between genomes?
Addition/deletion of DNA
Point mutation
Keeping track of gene variants
– Concepts of ortholog / paralog
How do differences arise between genomes?
Infection
Phage
Phage genome
Bacterial chromosome
Lysogenic pathway
Phage genome
Death
General transduction
Lytic pathway
How do differences arise between genomes?
Infection
Phage
Phage genome
Bacterial chromosome
Lysogenic pathway
Phage genome
Lytic pathway
Life!
Specialized transduction
The gene encoding diphtheria toxin (tox)
is carried on corynephage β
tox– C. diphtheriae + corynephage β → tox+ C. diphtheriae
Lysogenic conversion by corynephage β confers toxigenicity!
How to distinguish foreign from native genes?
GC-content = ([G] + [C]) / [total nucleotides]
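The GC-content formula above translates almost directly into code. The sketch below is one possible way to compute it in Python; the function name `gc_content` is an assumption, and so is the choice to count only A, C, G, and T (so that ambiguous symbols such as N do not enter the total).

```python
def gc_content(sequence):
    """Fraction of G and C among the A, C, G, T bases of a sequence.

    Assumption: any other symbol (N, gaps, whitespace) is ignored.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return (counts["G"] + counts["C"]) / total


# First 20 nucleotides of the sequence shown on the earlier slides.
print(gc_content("TCTACTTATAAAGAGTCTGT"))  # 6 of 20 bases are G or C
```

Multiply the result by 100 to compare it with percentages such as those quoted for whole genomes.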
SQ2: List the two triplets that code for Lys. What
proportion of each is used in Borrelia burgdorferi
compared to Mycobacterium tuberculosis? Is this
finding surprising? Why or why not?
Codon  Amino acid  Borrelia burgdorferi  Mycobacterium tuberculosis
                   (29% GC content)      (65% GC content)
AAU    Asn         0.80                  0.21
AAC    Asn         0.20                  0.79
AAA    Lys         0.80                  0.26
AAG    Lys         0.20                  0.74
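Codon fractions like those listed above can be obtained by counting codons in coding sequences. The following is a rough sketch, not the method used to produce the slide's numbers: it assumes the input is a single in-frame coding sequence, it covers only the Asn and Lys codon families from the slide, and the names `FAMILIES` and `codon_fractions` are invented for illustration.

```python
from collections import Counter

# Synonymous codon families shown on the slide (RNA alphabet).
FAMILIES = {"Asn": ("AAU", "AAC"), "Lys": ("AAA", "AAG")}


def codon_fractions(coding_sequence, families=FAMILIES):
    """Fraction of each synonymous codon within its amino acid family."""
    rna = coding_sequence.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # drop a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    fractions = {}
    for family in families.values():
        total = sum(counts[codon] for codon in family)
        for codon in family:
            # A family absent from the sequence is reported as None.
            fractions[codon] = counts[codon] / total if total else None
    return fractions


# Made-up toy sequence, for illustration only.
toy_gene = "AAA AAA AAA AAG AAT AAC".replace(" ", "")
print(codon_fractions(toy_gene))
```

A real estimate would pool counts over many genes of a genome rather than a single short sequence.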
How to distinguish foreign from native genes?
SQ4: The GC content of Bacillus anthracis is
33.97%. By analysis of codon use, would it likely be
easier to detect a foreign gene originating from
Borrelia burgdorferi or from Mycobacterium
tuberculosis?
Codon  Amino acid  Borrelia burgdorferi  Mycobacterium tuberculosis
                   (29% GC content)      (65% GC content)
AAU    Asn         0.80                  0.21
AAC    Asn         0.20                  0.79
AAA    Lys         0.80                  0.26
AAG    Lys         0.20                  0.74
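To compare codon usage between organisms, the fractions can be reduced to a single number. The sketch below uses the mean absolute difference in codon fraction, which is just one simple choice of measure and not one named in the lecture; `usage_distance` is a hypothetical helper. The two profiles are the values from the slide. Answering SQ4 this way would also need a profile for Bacillus anthracis, which the slide does not give.

```python
# Codon fractions copied from the slide.
BORRELIA = {"AAU": 0.80, "AAC": 0.20, "AAA": 0.80, "AAG": 0.20}
MYCOBACTERIUM = {"AAU": 0.21, "AAC": 0.79, "AAA": 0.26, "AAG": 0.74}


def usage_distance(profile_a, profile_b):
    """Mean absolute difference in codon fraction over shared codons."""
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        raise ValueError("profiles share no codons")
    total = sum(abs(profile_a[codon] - profile_b[codon]) for codon in shared)
    return total / len(shared)


print(round(usage_distance(BORRELIA, MYCOBACTERIUM), 3))
```

The larger the distance between a gene's codon usage and its host's profile, the more the gene stands out as a candidate foreign gene.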
DNA mutation has multiple causes
• Errors during DNA replication
– base mis-incorporation
– polymerase slippage / repeat amplification
• Errors during recombination or cell division
– chromosome loss or rearrangement
– large insertions or deletions
• Environmental factors – mutagens:
– radiation – UV or ionizing radiation
– chemical – many mechanisms of action
• Spontaneous events:
– tautomerisation
– depurination
– deamination
• Viral infection or transposons
How do differences arise between genomes?
Addition/deletion of DNA
Point mutation
organism 1 TTT TCT GAA TCC GTA GAC GTT
organism 2 TTT TCT GAA TCA GCA GAC GTG
GUU Val    GCU Ala
GUC Val    GCC Ala
GUA Val    GCA Ala
GUG Val    GCG Ala
How do differences arise between genomes?
Addition/deletion of DNA
Point mutation
organism 1 TTT TCT GAA TCC GTA GAC GTT
organism 2 TTT TCT GAA TCA GCA GAC GTG
Silent mutation
GUU Val    GCU Ala
GUC Val    GCC Ala
GUA Val    GCA Ala
GUG Val    GCG Ala
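Whether a point mutation is silent can be checked by translating the codon before and after the change. The sketch below builds the standard genetic code (DNA alphabet, one-letter amino acid codes) and classifies the three codon differences between organism 1 and organism 2; the function name `classify` is an assumption made for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(codon1, codon2):
    """Return (amino acid before, amino acid after, kind of change)."""
    aa1, aa2 = CODON_TABLE[codon1], CODON_TABLE[codon2]
    kind = "silent" if aa1 == aa2 else "amino acid change"
    return aa1, aa2, kind


# Codons that differ between organism 1 and organism 2 on the slide.
for before, after in [("TCC", "TCA"), ("GTA", "GCA"), ("GTT", "GTG")]:
    aa1, aa2, kind = classify(before, after)
    print(f"{before} ({aa1}) -> {after} ({aa2}): {kind}")
```

Two of the three differences leave the amino acid unchanged (Ser and Val); only GTA to GCA changes it, from Val to Ala.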
Single base mutations
Transitions: purine for purine, or pyrimidine for pyrimidine
Transversions: purine for pyrimidine, or pyrimidine for purine
How do differences arise between genomes?
Addition/deletion of DNA
Point mutation
organism 1 TTT TCT GAA TCC GTA GAC GTT
organism 2 TTT TCT GAA TCA GCA GAC GTG
Transition: purine ↔ purine, pyrimidine ↔ pyrimidine
Transversion: purine ↔ pyrimidine
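The transition/transversion distinction is easy to express in code: a substitution is a transition when both bases belong to the same chemical class. A minimal sketch, assuming DNA bases only; `substitution_type` is a name chosen here for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(base1, base2):
    """Classify a single-base substitution as a transition or transversion."""
    base1, base2 = base1.upper(), base2.upper()
    for base in (base1, base2):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if base1 == base2:
        raise ValueError("bases are identical, so there is no substitution")
    same_class = (base1 in PURINES) == (base2 in PURINES)
    return "transition" if same_class else "transversion"


# The three single-base differences between organism 1 and organism 2:
# TCC -> TCA, GTA -> GCA, GTT -> GTG.
for old, new in [("C", "A"), ("T", "C"), ("T", "G")]:
    print(f"{old} -> {new}: {substitution_type(old, new)}")
```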
Tautomerization of bases
Normal pairing: C–G, T–A
Rare tautomers (*): C*–A, T*–G
DNA replication can “lock in” a mutation
Mutations can arise as a consequence of misincorporation during replication
How to distinguish foreign from native genes?
SQ7: There are two codons each for 9 of the amino
acids. Choose any one of these 18 codons.
• Create a transition mutation in the third position of
the codon. What is the result?
• Create a transversion mutation in the third position.
What is the result?
• In the third position, are transition mutations or
transversion mutations more likely to result in a
change in the amino acid encoded?
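An answer to SQ7 for a chosen codon can be checked by listing every third-position mutant together with its mutation type and encoded amino acid. This is a sketch only, using the standard genetic code with one-letter amino acid codes; `third_position_mutants` is a hypothetical helper, and the codon in the usage line is just one possible choice.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def third_position_mutants(codon):
    """List (mutant codon, mutation type, amino acid before, amino acid after)."""
    codon = codon.upper().replace("U", "T")
    original = codon[2]
    results = []
    for base in "ACGT":
        if base == original:
            continue
        mutant = codon[:2] + base
        # Same chemical class (purine or pyrimidine) means a transition.
        same_class = (original in PURINES) == (base in PURINES)
        kind = "transition" if same_class else "transversion"
        results.append((mutant, kind, CODON_TABLE[codon], CODON_TABLE[mutant]))
    return results


for row in third_position_mutants("AAA"):
    print(row)
```

Running the same loop over other two-codon amino acids shows whether the pattern seen for one codon holds in general.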
How do differences arise between genomes?
Addition/deletion of DNA
Point mutation
Keeping track of gene variants
– Concepts of ortholog / paralog
Orthologs, Paralogs, and Xenologs
Speciation event leading to orthologs
Horizontal transfer leads to xenologs
Gene duplication gives rise to paralogs
Orthologs vs Paralogs
SQ5: Are genes B1 and C2 orthologs or paralogs?
How to predict orthology with imperfect information?
[Figure: genes A1, AB1, B1, B2, C1, C2, and C3 in Species A, Species B, and Species C]
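One widely used heuristic for predicting orthology from similarity scores alone is the reciprocal best hit: two genes are called putative orthologs when each is the other's top-scoring match. The slides do not say which method the course uses, so this sketch is an assumed illustration. The gene names and scores are made up, and both function names are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(x_vs_y, y_vs_x):
    """Pairs of genes that are each other's best hit between two genomes."""
    forward = best_hits(x_vs_y)
    backward = best_hits(y_vs_x)
    return sorted((x, y) for x, y in forward.items() if backward.get(y) == x)


# Made-up similarity scores between genes of two hypothetical genomes.
x_vs_y = {("x1", "y1"): 950, ("x1", "y2"): 400,
          ("x2", "y2"): 900, ("x2", "y3"): 880}
y_vs_x = {("y1", "x1"): 950, ("y2", "x2"): 900,
          ("y2", "x1"): 400, ("y3", "x2"): 880}

print(reciprocal_best_hits(x_vs_y, y_vs_x))
```

In this toy data y3 has a best hit (x2) that does not point back to it, so it is left unpaired; in a real comparison such a gene might be a paralog, or the true ortholog may have been lost.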