Introduction to bioinformatics


Biological databases
International genome sequencing and protein
structure determination
Protein Data Bank (PDB)
Sequence data = strings of letters
Nucleotides (bases)
Adenine (A)
Cytosine (C)
Guanine (G)
Thymine (T)
Triplet codons (genetic code)
20 amino acids (A, L, V, S etc.)
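The step from nucleotide strings to amino acids via triplet codons can be sketched in a few lines of Python. This is a minimal illustration, not a complete translator: the table holds only a handful of entries from the standard genetic code, and the input sequence is made up.

```python
# Minimal sketch: translate a DNA string codon by codon.
# Only a few entries of the standard genetic code are listed here.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GCT": "A",  # alanine
    "CTG": "L",  # leucine
    "GTT": "V",  # valine
    "TCT": "S",  # serine
    "TAA": "*",  # stop
}

def translate(dna):
    """Translate DNA in reading frame 1; codons missing from the table become 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        residue = CODON_TABLE.get(codon, "X")
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)

toy_dna = "ATGGCTCTGGTTTCTTAA"  # made-up example sequence
print(translate(toy_dna))  # MALVS
```

A full table would have 64 entries; the partial one keeps the example short.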
Three-dimensional protein structure =
atomic coordinates in 3D space
Conversion into metric
Protein folding
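As a small illustration of structure as atomic coordinates in 3D space, the sketch below stores two atoms as (x, y, z) tuples and computes the distance between them. The coordinates are invented and are not taken from any real structure.

```python
import math

# Made-up coordinates (in angstroms) for two atoms; not from a real structure.
atom_a = (0.0, 0.0, 0.0)
atom_b = (3.0, 4.0, 0.0)

def distance(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(distance(atom_a, atom_b))  # 5.0
```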
Data types
• primary data (primary database): sequence, DNA or amino acid, e.g., DMPVERILEALAVE…
• secondary data (secondary db): secondary protein structure, e.g., alpha-helices, beta-strands; “motifs”: regular expressions, blocks, profiles, fingerprints
• tertiary data (tertiary db): tertiary protein structure, atomic co-ordinates; domains, folding units
• interaction data (interaction db): binary protein-protein interactions, functional networks, pathways
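A motif expressed as a regular expression can be searched for directly in a sequence string. The sketch below is illustrative only: the pattern is a hypothetical motif invented for this example, and the sequence is the fragment shown on the slide, without its elided remainder.

```python
import re

# Hypothetical motif written as a regular expression:
# one of I, L or V, then E, then A, then one of I, L or V.
MOTIF = re.compile(r"[ILV]EA[ILV]")

def find_motif(sequence):
    """Return (start, matched_text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(sequence)]

# Sequence fragment from the slide (the elided remainder is not used).
fragment = "DMPVERILEALAVE"
print(find_motif(fragment))  # [(7, 'LEAL')]
```

Start positions are 0-based, so the hit begins at the eighth residue.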
Primary biological databases
• Nucleic acid
EMBL
GenBank
DDBJ (DNA Data
Bank of Japan)
• Protein
PIR
MIPS
SWISS-PROT
TrEMBL
NRL-3D
International nucleotide data banks
• EMBL (Europe): EMBL, EBI
• GenBank (USA): NLM, NCBI
• DDBJ (Japan): NIG, CIB
• International Advisory Meeting, Collaborative Meeting
• TrEMBL
• NRDB
GenBank file format
Swiss-Prot
SWISS-PROT file format
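SWISS-PROT entries are plain-text records in which each line starts with a two-letter line code (for example ID, AC, DE, SQ), sequence lines follow the SQ line, and `//` ends the record. The sketch below parses a toy record under those assumptions. The identifier, accession and sequence are invented, and real entries carry many more line types than are handled here.

```python
# Toy record: identifier, accession and sequence are invented for illustration.
TOY_RECORD = """\
ID   TOY_EXAMPLE
AC   X00000;
DE   Hypothetical example protein.
SQ   SEQUENCE
     MALVS MALVS
//
"""

def parse_record(text):
    """Group lines by two-letter line code; join sequence lines after SQ."""
    fields = {}
    sequence = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record terminator
            break
        if in_sequence:
            sequence.append(line.replace(" ", ""))
            continue
        code, value = line[:2], line[5:].strip()
        if code == "SQ":
            in_sequence = True  # following lines hold the sequence
        else:
            fields.setdefault(code, []).append(value)
    fields["sequence"] = "".join(sequence)
    return fields

record = parse_record(TOY_RECORD)
print(record["ID"], record["sequence"])  # ['TOY_EXAMPLE'] MALVSMALVS
```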
Other primary protein databases
• TrEMBL (translated EMBL) in SWISS-PROT format
rapid access to sequence data from genome
projects
computer-annotated supplement to SWISS-PROT
translations of all coding sequences (CDS) in
EMBL
• SP-TrEMBL
Other primary protein databases
The Protein Information Resource (PIR)
• integrated system of protein sequence databases
and derived related databases, e.g., alignment
databases
• rapid searching, comparison, and pattern matching of
protein sequences
• retrieval of descriptive, bibliographic, feature, and
concurrent cross-reference information
• aims to be comprehensive and consistently
annotated
PIR: related databases
NRL-3D Sequence-Structure Database
• produced by PIR from sequence and annotation
information extracted from three-dimensional
structures in the Protein Data Bank (PDB)
• allows keyword and similarity searches
Two other useful sites
INFOBIOGEN-The Public Catalog of Databases
http://www.infobiogen.fr/services/dbcat/
KEGG-Kyoto Encyclopedia of Genes and Genomes
http://www.genome.ad.jp/kegg/
Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to
computerize current knowledge of molecular and cellular biology in
terms of the information pathways that consist of interacting molecules
or genes and to provide links from the gene catalogs produced by
genome sequencing projects.
Sequence Retrieval System (SRS)
Database browser that allows
users to
• retrieve
• link
• access
entries from all interconnected
resources.
Users can formulate queries
across a range of different
database types.
Guide to Protein Databases:
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture1/index.html
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture2/index.html
With thanks to Dr Roman Laskowski.
Interaction databases
Biomolecule-ligand interactions
• SRS: Enzymes,
reactions and
metabolic pathway
databases
• Receptor-ligand
database searches
relibase.ebi.ac.uk/
Interaction databases
Yeast model
• YPD - http://www.incyte.com/sequence/proteome
    • proteome database of a model organism
    • 6142 proteins: 3430 known, 804 similarity, 1908 unknown
    • data on protein interaction maps
    • derived from literature and experiment
• Curagen - http://curatools.curagen.com
    • yeast two-hybrid screen data
    • 957 putative interactions of 1004 yeast proteins
    • Uetz et al., 2000 - Nature 403 p623-630
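Binary interaction data of this kind, such as pairs from a two-hybrid screen, map naturally onto an undirected network. The sketch below builds one from a few hypothetical pairs and reports the protein with the most partners. The protein names are placeholders and do not come from any of the databases above.

```python
from collections import defaultdict

# Hypothetical binary interactions; the protein names are placeholders.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

def build_network(interactions):
    """Store each undirected interaction in both directions."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network

network = build_network(pairs)
degrees = {protein: len(partners) for protein, partners in network.items()}
hub = max(degrees, key=degrees.get)  # protein with the most partners
print(hub, sorted(network[hub]))  # P3 ['P1', 'P2', 'P4']
```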
Protein-Protein Interaction Databases
http://www.hgmp.mrc.ac.uk/GenomeWeb/protinteraction.html
Protein-Protein Interactions
DIP
Biocarta
KEGG
KEGG
http://www.genome.ad.jp/kegg/
• Search database for metabolic and regulatory pathways
• Compute KEGG: Generate possible reaction pathways between two
compounds
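Finding a reaction pathway between two compounds can be treated as a path search on a graph of compounds linked by reactions. The sketch below uses a plain breadth-first search on a hypothetical reaction graph. It illustrates the idea only and is not KEGG's own method; the compound names are placeholders.

```python
from collections import deque

# Hypothetical reaction graph: compound -> compounds reachable in one reaction.
reactions = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}

def shortest_pathway(graph, source, target):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_pathway(reactions, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Breadth-first search returns only one shortest route; listing every possible pathway would need an exhaustive search instead.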
http://www.genome.ad.jp/
Metabolic pathways
Signal transduction pathways
(species-specific,
Homo sapiens shown)
Biocarta pathway database
http://www.biocarta.com