PIR-International Protein Sequence Database
Archives and Information Retrieval
CSC 487/687 Computing for Bioinformatics
Introduction
Learning objectives:
What is the general arrangement of biological data in the public databases?
To learn the information retrieval skills that will allow you to make effective use of the databases.
To become familiar with basic database operations.
How does one retrieve information on a particular subject in the literature?
Primary public domain bioinformatics servers
Public Domain Bioinformatics Facilities
National Center for Biotechnology Information (NCBI), United States: Databases, Analysis Tools
European Bioinformatics Institute (EBI), United Kingdom: Databases, Analysis Tools
GenomeNet (KEGG & DDBJ), Japan: Databases, Analysis Tools
The Archives
Biological experiments generate massive amounts of data.
These biological information databases can be classified into two types:
First-level databases
Built from the raw data obtained directly from experiments (“simple”)
Second-level databases
Further reorganized, based on .., in order to achieve specific goals
Some examples:
First-level databases
Nucleic acid sequence databases: GenBank, EMBL Data Library, DNA Data Bank of Japan (DDBJ)
Protein sequence databases: SWISS-PROT, PIR
Protein structure database: PDB
Second-level databases
GDB
TRANSFAC
SCOP
Nucleic acid sequence databases
International Nucleotide Sequence Database Collaboration (INSDC)
NCBI (GenBank), USA (1982)
EMBL (Data Library), Europe (1982)
DDBJ (DNA Data Bank of Japan), Japan (1988)
NCBI
Established in the USA in 1988 as a national resource for molecular biology information
creates public databases
conducts research in computational biology
develops software tools for analyzing genome data
disseminates biomedical information
Nucleic acid sequence databases
GenBank
nucleic acid sequences and the corresponding protein sequences
literature references
biological annotation
A new release is made every two months
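To make the record layout concrete: a GenBank flat-file record ends with an ORIGIN section that holds the sequence in numbered blocks of ten bases, followed by a "//" terminator. The Python sketch below extracts that sequence from one record. The sample record is hypothetical and heavily abbreviated (real entries also carry REFERENCE and FEATURES sections), and the function name is illustrative only.

```python
def origin_sequence(record: str) -> str:
    """Return the nucleotide sequence from the ORIGIN section of one GenBank flat-file record."""
    bases = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True          # sequence lines follow this header
            continue
        if line.startswith("//"):
            break                     # "//" ends the record
        if in_origin:
            # each line: a position number, then blocks of 10 bases separated by spaces;
            # keep letters only, which drops the numbers and the spaces
            bases.extend(ch for ch in line if ch.isalpha())
    return "".join(bases).upper()


# Hypothetical, abbreviated record for illustration only (not a real GenBank entry).
SAMPLE = """LOCUS       DEMO0001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical demo sequence.
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""

if __name__ == "__main__":
    print(origin_sequence(SAMPLE))  # prints GATCCTCCATATACAACGGTATCT
```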
GenBank information retrieval system
NCBI ENTREZ
A platform that provides access to, and links between, databases of biological information
ENTREZ
PubMed
MedLine
GenBank
Protein databases
Genomes
PopSet
Taxonomy
OMIM
MedLine: literature database
OMIM: database of human genes and genetic disorders
GenBank: database of all publicly available DNA sequences
Protein databases: database of amino acid sequences from SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq
Genomes: database of genomes from organisms and viruses
PopSet: database of DNA sequences that have been collected to analyze the evolutionary relatedness of a population
Taxonomy: database of names of organisms with sequences in GenBank or Prot
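Entrez can also be queried from a program through NCBI's E-utilities web service, whose ESearch utility returns the IDs of records in one database (for example "pubmed" or "protein") that match a query. Below is a minimal sketch using only the Python standard library. The helper names are hypothetical, the parsing assumes ESearch's JSON output (retmode=json), and NCBI's usage guidelines (request rate limits, identifying your tool and email) are not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking Entrez for IDs of records in `db` that match `term`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def parse_esearch(payload: str) -> tuple[int, list[str]]:
    """Extract the total hit count and the returned ID list from an ESearch JSON response."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def esearch(db: str, term: str, retmax: int = 20) -> tuple[int, list[str]]:
    """Run the query over the network (requires internet access)."""
    with urlopen(esearch_url(db, term, retmax)) as resp:
        return parse_esearch(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; call esearch(...) to actually contact NCBI.
    print(esearch_url("pubmed", "hemoglobin AND thalassemia", retmax=5))
```

The same call with db="protein" searches the protein database; the returned IDs can then be passed to other E-utilities to fetch the records themselves.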
PubMed Central (PMC)
The U.S. National Library of Medicine's digital archive of life sciences journal literature
Access to the full text of articles in PMC is free, except where a journal requires a subscription for access to recent articles
OMIM: Online Mendelian Inheritance in Man
A catalog of human genes linked to diseases
Founded by Victor A. McKusick at Johns Hopkins University
A good place to start when you want to research a particular disease or biological molecule
This database is cross-referenced to PubMed and other NCBI databases
How to submit sequence data to
GenBank
BankIt: web-based interface
http://www.ncbi.nlm.nih.gov/BankIt
Sequin: stand-alone program
http://www.ncbi.nlm.nih.gov/Sequin
In-class exercise
Protein databases
The Protein Information Resource (PIR) was established in 1984 by the National Biomedical Research Foundation (NBRF).
The PIR Protein Sequence Database evolved from the original NBRF Protein Sequence Database, which had been developed over 20 years.
PIR-International is a collaboration between NBRF, the Munich Information Center for Protein Sequences (MIPS), and the Japan International Protein Information Database (JIPID).
Together, these partners collect and publish what is now the oldest and largest database of biomolecular sequence, source, literature, and feature information.
PIR
PIR-International Protein Sequence Database: an annotated, non-redundant and cross-referenced database of protein sequences.
PIR Alignment Database, PIR-ALN: contains sequence alignments of superfamilies, families and homology domains produced from information in the Protein Sequence Database.
FAMBASE Family Database: a searchable database containing a single representative sequence from each protein family.
RESID Database of Amino Acid Modifications: based on feature information in the Protein Sequence Database.
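Sequences from the PIR lineage are commonly exchanged in the simple NBRF/PIR text format: a header line ">XX;ID", where XX is a two-letter sequence-type code such as P1 (complete protein) or F1 (protein fragment), then one description line, then the sequence terminated by "*". Full PIR-International entries carry far richer annotation than this. The sketch below only reads the NBRF/PIR sequence format; the class and function names and the sample entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PirEntry:
    seq_type: str      # two-letter code before ';' (e.g. P1 = complete protein, F1 = fragment)
    identifier: str
    description: str
    sequence: str


def parse_pir(text: str) -> list[PirEntry]:
    """Parse NBRF/PIR text: '>XX;ID' line, one description line, sequence ending in '*'."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">"):
            continue                                  # skip blank lines between entries
        seq_type, identifier = line[1:].split(";", 1)
        description = next(lines, "").strip()
        chunks = []
        for seq_line in lines:                        # shares the iterator with the outer loop
            chunks.append("".join(seq_line.split()))  # drop layout spaces
            if "*" in seq_line:
                break                                 # '*' terminates the sequence
        sequence = "".join(chunks).split("*", 1)[0]
        entries.append(PirEntry(seq_type.strip(), identifier.strip(), description, sequence))
    return entries


# Hypothetical entry for illustration only.
SAMPLE = """>P1;DEMO1
Hypothetical demo protein, for illustration only.
  MKTAYIAKQR QISFVKSHFS
  RQ*
"""

if __name__ == "__main__":
    for entry in parse_pir(SAMPLE):
        print(entry.identifier, entry.seq_type, len(entry.sequence))
```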
PIR
http://www-nbrf.georgetown.edu/pir/
SWISS-PROT
http://www.ebi.ac.uk/swissprot/
A well-annotated protein sequence database established in 1986.
It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
A curated protein sequence database that provides a high level of annotation, a minimal level of redundancy, and a high level of integration with other databases.
Note: UniProtKB/Swiss-Prot and UniProtKB/TrEMBL have been incorporated into UniProt (the Universal Protein Resource), a one-stop shop allowing easy access to all publicly available information about protein sequences.
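UniProtKB FASTA headers record which section an entry comes from: "sp" marks a reviewed UniProtKB/Swiss-Prot entry and "tr" an unreviewed UniProtKB/TrEMBL entry, followed by the accession and the entry name. A small sketch of splitting such a header, assuming the current layout ">db|accession|entry_name description OS=..."; the function names are hypothetical, and the URL helper assumes UniProt's REST pattern rest.uniprot.org/uniprotkb/{accession}.fasta.

```python
def parse_uniprot_header(header: str) -> dict:
    """Split a UniProtKB FASTA header such as '>sp|ACC|NAME Description OS=...' into fields."""
    ids, _, rest = header.lstrip(">").partition(" ")
    source, accession, entry_name = ids.split("|")
    return {
        "reviewed": source == "sp",            # 'sp' = Swiss-Prot (reviewed), 'tr' = TrEMBL
        "accession": accession,
        "entry_name": entry_name,
        "description": rest.split(" OS=")[0],  # protein name precedes the OS= (organism) tag
    }


def uniprot_fasta_url(accession: str) -> str:
    """URL for downloading one UniProtKB entry in FASTA format (assumed REST endpoint)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


if __name__ == "__main__":
    example = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
    print(parse_uniprot_header(example))
    print(uniprot_fasta_url("P69905"))
```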
PROSITE
http://ca.expasy.org/prosite/
A method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences.
A database of biologically significant sites and patterns, formulated in such a way that, with appropriate computational tools, it can rapidly and reliably identify which known protein family (if any) a new sequence belongs to.
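PROSITE patterns use a compact notation: elements separated by "-", "x" for any residue, [..] for allowed residues, {..} for forbidden residues, x(n) or x(n,m) for repetition, "<" and ">" for N- and C-terminal anchors, and a closing ".". The sketch below translates this basic notation into a Python regular expression, illustrated with the N-glycosylation site pattern N-{P}-[ST]-{P}. (PS00001). It is not ScanProsite: the function names are hypothetical, and it ignores extended syntax (such as ">" inside brackets) and profiles.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")   # N-terminal anchor
        anchor_end = element.endswith(">")       # C-terminal anchor
        element = element.strip("<>")
        # split off an optional repetition such as x(2) or x(2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = m.group(1), m.group(2)
        if core == "x":
            part = "."                           # any residue
        elif core.startswith("["):
            part = core                          # allowed residues: same syntax in regex
        elif core.startswith("{"):
            part = "[^" + core[1:-1] + "]"       # forbidden residues
        else:
            part = core                          # one fixed residue
        if count:
            part += "{" + count + "}"
        if anchor_start:
            part = "^" + part
        if anchor_end:
            part += "$"
        parts.append(part)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))       # prints N[^P][ST][^P]
    print(find_sites("N-{P}-[ST]-{P}.", "MKNGSAT"))  # prints [(3, 'NGSA')]
```

Because re.finditer does not report overlapping matches, a real scanning tool may list sites this sketch misses; a pattern hit also only suggests a site or family, it does not prove function.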
PDB
http://www.rcsb.org/pdb/
The single international repository for public data on the three-dimensional structures of biological macromolecules
Established at Brookhaven National Laboratory in the United States
The contents are primarily experimental data derived from X-ray crystallography and NMR experiments
RasMol can display the 3D structure of a biological macromolecule from its PDB file
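Viewers such as RasMol read atomic coordinates from the fixed-column ATOM and HETATM records of a PDB file. A minimal sketch of reading those records, assuming the legacy PDB text format (large structures are distributed only as PDBx/mmCIF, which this sketch does not handle); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM record (legacy PDB format, 1-based columns noted)."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Collect all ATOM and HETATM records from the text of a PDB file."""
    return [parse_atom_line(line) for line in pdb_text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]
```

To use it, read a downloaded PDB file into a string and pass it to parse_atoms; the resulting coordinates are what a viewer renders as the 3D structure.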