Database Categories List http://www3.oup.co.uk/nar/database/c/

SWISS-PROT
• The SWISS-PROT database consists of sequence entries. It
contains high-quality annotation, is non-redundant, and is cross-referenced to many other databases.
• Release 39.0 of SWISS-PROT contains 86,593 sequence
entries.
• SWISS-PROT is accompanied by TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL contains the
translations of all coding sequences (CDS) present in the
EMBL Nucleotide Sequence Database that are not yet
integrated into SWISS-PROT.
TrEMBL
• TrEMBL release 17 (June 2001) was created from the EMBL
Nucleotide Sequence Database release 66 and updates up to
01.05.01. It contains 540,195 sequence entries, comprising
155,771,315 amino acids.
• TrEMBL is split into two main sections: SP-TrEMBL and REM-TrEMBL. SP-TrEMBL (SWISS-PROT TrEMBL) contains the
entries that should eventually be incorporated into SWISS-PROT. It can be considered a preliminary section of
SWISS-PROT, as all SP-TrEMBL entries have been assigned
SWISS-PROT accession numbers. REM-TrEMBL (REMaining
TrEMBL) contains the entries that we do not want to include in
SWISS-PROT. REM-TrEMBL entries have no accession
numbers.
Protein Information Resource (PIR)
The Protein Information Resource (PIR), in
collaboration with MIPS and JIPID, produces the PIR-International Protein Sequence Database (PIR-PSD), a
comprehensive, non-redundant, expertly annotated, fully
classified and extensively cross-referenced protein
sequence database in the public domain. The PIR-PSD,
iProClass and other PIR auxiliary databases integrate
sequence, functional, and structural
information to support genomics and proteomics
research.
New: UniProt
UniProt (Universal Protein Resource) is the world's most
comprehensive catalogue of information on proteins. It is a
central repository of protein sequence and function created by
joining the information contained in Swiss-Prot, TrEMBL, and
PIR.
UniProt comprises three components, each optimised for
different uses. The UniProt Knowledgebase (UniProt) is the
central access point for extensive curated protein information,
including function, classification, and cross-reference. The
UniProt Non-redundant Reference (UniRef) databases
combine closely related sequences into a single record to
speed searches. The UniProt Archive (UniParc) is a
comprehensive repository, reflecting the history of all protein
sequences.
The sequences and information in UniProt are accessible via
text search, BLAST similarity search, and FTP.
Entrez at NCBI
Entrez is a retrieval system for searching several linked databases. It
provides access to:
PubMed: the biomedical literature
Nucleotide: sequence database (GenBank)
Protein: sequence database
Structure: three-dimensional macromolecular structures
Genome: complete genome assemblies
PopSet: population study data sets
OMIM: Online Mendelian Inheritance in Man
Taxonomy: organisms in GenBank
Books: online books
ProbeSet: Gene Expression Omnibus (GEO)
3D Domains: domains from Entrez Structure
Database links in Entrez
SRS at EBI
SRS is a powerful data integration platform, providing
rapid, easy and user-friendly access to the large volumes of
diverse and heterogeneous life science data stored in more
than 400 internal and public domain databases. SRS enables
the querying of diverse biological and life science data
through a single interface.
SRS facilitates the rapid development of applications and
algorithms, as well as bioinformatics portals for the Internet or
an intranet, making the data efficiently available to entire
organizations. Today, SRS is answering the most
demanding requirements of modern life science companies
and will truly add value to their research programs.
SRS enables:
• Fast access to diverse life science data (genetic, protein, cellular, molecular, and clinical) for researchers and bioinformaticians
• Integration of public and proprietary data through one interface
• Unique ability to perform cross-database queries
• Rapid string search of large volumes of data
• Scalability to the customer's specific requirements
EBI, the European Bioinformatics Institute (EMBL Outstation,
Hinxton, UK)
Different sequence formats
Here is a sequence in GCG format
EXTRACTPEPTIDE of frames: C from: caupol.map
(Linear) MAP of: caupol.raw check: 2457 from: 1 to: 3957
Frame C from: 1 to: 1318
caupol.pep Length: 941 August 27, 1995 16:35 Type: P Check: 9501 ..
1 MAYPLLVLVD GHALAYRAFF ALRESGLRSS RGEPTYAVFG FAQILLTALA
51 EYRPDYAAVA FDVGRTFRDD LYAEYKAGRA ETPEEFYPQF ERIKQLVQAL
101 NIPIYTAEGY EADDVIGTLA RQATERGVDT IILTGDSDVL QLVNDHVRVA
151 LANPYGGKTS VTLYDLEQVR KRYDGLEPDQ LADLRGLKGD TSDNIPGVRG
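The GCG record above has a free-text header that ends with a line containing "..", followed by sequence lines that carry a position number and residues in blocks of ten. A minimal sketch of how such a record could be read in Python follows. The function name `parse_gcg` and the shortened sample record are illustrative assumptions, not part of any GCG tool, and the sketch ignores the Length and Check values in the header.

```python
def parse_gcg(text):
    """Return the bare sequence from a GCG-style record.

    Assumption: the header ends at the first line containing '..';
    everything after it is sequence data mixed with position numbers
    and spaces, which are discarded.
    """
    lines = text.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if ".." in line:
            start = i + 1  # sequence begins on the line after the divider
            break
    residues = []
    for line in lines[start:]:
        # keep letters only; digits and whitespace are layout, not sequence
        residues.extend(ch for ch in line if ch.isalpha())
    return "".join(residues).upper()


# Shortened sample based on the record shown above (first two sequence lines only)
SAMPLE_GCG = """caupol.pep Length: 941 August 27, 1995 16:35 Type: P Check: 9501 ..
1 MAYPLLVLVD GHALAYRAFF ALRESGLRSS RGEPTYAVFG FAQILLTALA
51 EYRPDYAAVA FDVGRTFRDD LYAEYKAGRA ETPEEFYPQF ERIKQLVQAL
"""

sequence = parse_gcg(SAMPLE_GCG)
print(len(sequence), sequence[:10])
```

A more careful reader would also compare the number of residues with the Length field in the header.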
Here is another one in FASTA format
>ECPOLA V00317 E. coli gene polA coding for DNA polymerase I. 9/93
CACCGGGCAACGGCGGCAGAAGTGTTTGGTTTGCCACTGGAAACCGTCACCAGCGAGCAA
CGCCGTAGCGCGAAAGCGATCAACTTTGGTCTGATTTATGGCATGAGTGCTTTCGGTCTG
GCGCGGCAATTGAACATTCCACGTAAAGAAGCGCAGAAGTACATGGACCTTTACTTCGAA
CGCTACCCTGGCGTGCTGGAGTATATGGAACGCACCCGTGCTCAGGCGAAAGAGCAGGGC
TACGTTGAAACGCTGGACGGACGCCGTCTGTATCTGCCGGATATCAAATCCAGCAATGGT
GCTCGTCGTGCAGCGGCTGAACGTGCAGCCATTAACGCGCCAATGCAGGGAACCGCCGCC
GA
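In a FASTA record, a header line starting with ">" is followed by one or more lines of sequence. A minimal Python sketch of a reader for this layout is shown here. The name `parse_fasta` and the short sample record are illustrative assumptions, and the sketch does no validation of the sequence alphabet.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is returned without the leading '>'; sequence lines
    belonging to one record are joined into a single string.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Hypothetical two-line sample, shortened from the record shown above
SAMPLE_FASTA = """>ECPOLA V00317 E. coli gene polA coding for DNA polymerase I. 9/93
CACCGGGCAACGGCGGCAGAAGTG
GA
"""

for name, seq in parse_fasta(SAMPLE_FASTA):
    print(name.split()[0], len(seq))
```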
And this is an example of a plain text file
CGCCGTAGCGCGAAAGCGATCAACTTTGGTCTGATTTATGGCATGAGTGCTTTCGGTCTG
GCGCGGCAATTGAACATTCCACGTAAAGAAGCGCAGAAGTACATGGACCTTTACTTCGAA
CGCTACCCTGGCGTGCTGGAGTATATGGAACGCACCCGTGCTCAGGCGAAAGAGCAGGGC
TACGTTGAAACGCTGGACGGACGCCGTCTGTATCTGCCGGATATCAAATCCAGCAATGGT
GCTCGTCGTGCAGCGGCTGAACGTGCAGCCATTAACGCGCCAATGCAGGGAACCGCCGCC
GA
How do you convert from one format to
another?
ReadSeq
http://www.ebi.ac.uk/cgi-bin/readseq.cgi
ReadSeq can convert to and from 21 different
sequence formats
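To make the idea of format conversion concrete, here is a minimal Python sketch of the simplest case: turning a plain text sequence into a FASTA record. It is not ReadSeq and does not reproduce its behaviour. The function name `plain_to_fasta`, the identifier "seq1" and the default line width of 60 are assumptions chosen for illustration.

```python
def plain_to_fasta(raw, identifier, description="", width=60):
    """Convert a plain text sequence into a single FASTA record.

    All whitespace (including line breaks) is removed from the input,
    and the sequence is re-wrapped to `width` characters per line.
    """
    seq = "".join(raw.split()).upper()
    header = ">" + identifier
    if description:
        header += " " + description
    # slice the sequence into fixed-width lines
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"


# Hypothetical plain text input, shortened from the example shown above
PLAIN = """CGCCGTAGCGCGAAAGCGATCAAC
GA"""

print(plain_to_fasta(PLAIN, "seq1", "example sequence", width=20), end="")
```

Going the other way (FASTA to plain text) only requires dropping the header line. Formats with richer headers, such as GCG, need format-specific handling, which is what a dedicated tool provides.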