Searching GenBank - Institute of Microbial Technology

GenBank
• Nucleotide-only sequence database
• Archival in nature
• Data shared nightly among three collaborating databases:
– GenBank at NCBI
– DNA Data Bank of Japan (DDBJ)
– EMBL at EBI
Source: NCBI
NCBI site map, a good place to find resources:
http://www.ncbi.nlm.nih.gov/Sitemap/index.html
GenBank Release 131.0 (December 15, 2003)
• 30,968,418 sequences
• 36,553,368,485 bases
• Full release every two months
• Incremental and cumulative updates daily
• Available only through the Internet:
ftp://ftp.ncbi.nih.gov/genbank/
GenBank Record
• Header: information that applies to the whole record
• Features: annotations on the record
• Sequence
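To make the three-part layout concrete, here is a minimal Python sketch that splits one GenBank flat-file record into header, features and sequence. It assumes the usual flat-file conventions (the FEATURES and ORIGIN keywords start in column 1 and the record ends with `//`); the function name `split_genbank_record` and the toy record are hypothetical, made up for illustration.

```python
def split_genbank_record(text):
    """Split one GenBank flat-file record into (header, features, sequence).

    Assumes FEATURES and ORIGIN start in column 1 and the record ends with '//'.
    """
    header, features, seq_chunks = [], [], []
    section = "header"
    for line in text.splitlines():
        if line.startswith("//"):            # end-of-record marker
            break
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue                         # the ORIGIN line itself holds no bases
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            # Sequence lines: a position number, then blocks of 10 bases
            seq_chunks.extend(line.split()[1:])
    return "\n".join(header), "\n".join(features), "".join(seq_chunks).upper()


# Hypothetical, heavily shortened record (not a real GenBank entry)
toy_record = """\
LOCUS       TOY0001                   20 bp    mRNA    linear   PRI 15-DEC-2003
DEFINITION  Hypothetical toy record.
ACCESSION   TOY0001
VERSION     TOY0001.1
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 atggcgacga ccgtaccgtt
//
"""

header, features, sequence = split_genbank_record(toy_record)
print(sequence)  # ATGGCGACGACCGTACCGTT
```

Real records can contain further keywords (for example BASE COUNT or CONTIG) that this sketch does not treat specially; for real work a dedicated parser such as Biopython's `SeqIO.read(handle, "genbank")` is the safer choice.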
[Figure: GenBank record header, with callouts for Locus Name, Sequence Length, Molecule Type, GenBank Division, Modification Date, Accession Number and Version Number]
[Figure: GenBank record FEATURES section, with a link to the sequence]
[Figure: GenBank record sequence section]
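The header fields named above (Locus Name, Sequence Length, Molecule Type, GenBank Division, Modification Date, Accession Number, Version Number) can be pulled out of the LOCUS and VERSION lines with simple token splitting. This is a minimal sketch assuming whitespace-separated tokens; the function names and the sample lines are hypothetical.

```python
from datetime import datetime


def parse_locus_line(line):
    """Return the LOCUS-line fields of a GenBank record as a dict.

    Assumed layout (whitespace-separated):
    LOCUS <name> <length> bp <molecule type> [topology] <division> <date>
    """
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    return {
        "locus_name": tokens[1],
        "sequence_length": int(tokens[2]),
        "molecule_type": tokens[4],
        "genbank_division": tokens[-2],
        # Dates look like 15-DEC-2003; strptime matches month names case-insensitively
        "modification_date": datetime.strptime(tokens[-1], "%d-%b-%Y").date(),
    }


def parse_version_line(line):
    """Split a VERSION line such as 'VERSION     TOY0001.1' into (accession, version)."""
    accession_dot_version = line.split()[1]
    accession, _, version = accession_dot_version.partition(".")
    return accession, int(version)


# Hypothetical header lines (not a real GenBank entry)
locus = parse_locus_line(
    "LOCUS       TOY0001                   20 bp    mRNA    linear   PRI 15-DEC-2003"
)
print(locus["genbank_division"], locus["modification_date"])  # PRI 2003-12-15
print(parse_version_line("VERSION     TOY0001.1"))              # ('TOY0001', 1)
```

Some older LOCUS lines omit the topology token (linear/circular); reading the division and date from the end of the line keeps the sketch working either way.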
Entrez
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
Select GenBank
Entrez
Find the mRNA sequence for human “epidermal growth factor receptor”
Specify human as the organism:
1. Click Preview/Index
2. Specify “human” by selecting “Organisms” from the “All Fields” drop-down menu
Limit your search:
• Exclude all technology-generated records
• Select mRNA in the “Molecule” list
• Select “RefSeq” in the database list
RefSeq
• Database of reference sequences
• Curated
• Non-redundant: one record for each gene, or each splice variant, from each organism represented
• Each record is intended to present an encapsulation of the current understanding of a gene or protein, similar to a review article
RefSeq FAQ
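RefSeq records can usually be recognised from their accession prefix, which is handy when checking that a search really returned reference sequences. The sketch below encodes a deliberately partial table of prefixes as NCBI's RefSeq documentation describes them today (for example NM_ for mRNA and NP_ for protein, with X-prefixed accessions marking computationally predicted models); the function name `refseq_molecule` and the sample accessions are hypothetical.

```python
import re

# Partial table of RefSeq accession prefixes (per NCBI's RefSeq documentation)
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA",
    "XR": "predicted non-coding RNA",
    "XP": "predicted protein",
    "NC": "complete genomic molecule",
    "NG": "genomic region",
}

# Two capital letters, an underscore, digits, and an optional .version suffix
_REFSEQ_ACCESSION = re.compile(r"([A-Z]{2})_\d+(?:\.\d+)?")


def refseq_molecule(accession):
    """Return the molecule type implied by a RefSeq accession, or None if not recognised."""
    match = _REFSEQ_ACCESSION.fullmatch(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))


# Hypothetical accessions used only to exercise the function
for acc in ["NM_000001.1", "XP_000002", "AB000003"]:
    print(acc, refseq_molecule(acc))
```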
Molecular databases
Find the gene name by searching LocusLink:
http://www.ncbi.nlm.nih.gov/LocusLink/
Select organism
LocusLink
Find the mRNA sequence for epidermal growth factor receptor (EGFR):
Start with the gene name EGFR
Limit the search to:
1. Gene Name
2. Exclude all technology-generated records
3. Select mRNA as the Molecule
4. Select “RefSeq” as the source database
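The four limits can also be expressed as one Entrez Boolean query using field tags. This is a sketch, not the slide's procedure: the field tags `[Gene Name]` and `[Organism]` and the properties `biomol_mrna[PROP]` and `srcdb_refseq[PROP]` are standard Entrez search terms, but the exclusion properties passed in below (`gbdiv_est[PROP]`, `gbdiv_gss[PROP]`) are assumptions to check against current NCBI documentation. The esearch URL belongs to NCBI's later E-utilities service; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (a programmatic route into Entrez)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_query(gene, organism, exclude=()):
    """Combine gene, organism, molecule and source limits into one Entrez query string."""
    query = " AND ".join([
        f"{gene}[Gene Name]",      # restrict the term to the Gene Name field
        f"{organism}[Organism]",   # restrict to one organism
        "biomol_mrna[PROP]",       # Molecule = mRNA
        "srcdb_refseq[PROP]",      # source database = RefSeq
    ])
    # Exclusions are appended with NOT; the caller supplies the (assumed) property names
    for term in exclude:
        query += f" NOT {term}"
    return query


def esearch_request_url(query, db="nucleotide"):
    """Build (but do not send) an esearch request URL for the query."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': query})}"


# Assumed property names for technology-generated divisions (EST, GSS); verify before use
query = build_entrez_query("EGFR", "human", exclude=["gbdiv_est[PROP]", "gbdiv_gss[PROP]"])
print(query)
print(esearch_request_url(query))
```

Once `srcdb_refseq[PROP]` is in the query the exclusions are likely redundant; they are kept only to mirror the four steps. The URL is built but not fetched; NCBI's E-utilities usage guidelines apply if you send it.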
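Entrez: Neighbors and Hard Links
[Figure: Entrez databases (PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure, Taxonomy, Genomes) connected by hard links, with neighbors computed within databases by word weight (PubMed), BLAST (nucleotide and protein sequences), VAST (3-D structure) and phylogeny (Taxonomy)]
Source: NCBI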
SRS – List of Public SRS Servers
SRS Tutorial
http://srs.ebi.ac.uk
[Screenshot: SRS start page with links to a Permanent session, a Temporary session, the List of public servers, Database Information (which databanks are present and when they were indexed) and Documentation]
What is SRS?
• Central resource for molecular biology data
• Data retrieval system
– More than 250 databanks have been indexed; more than 35 SRS servers are available on the WWW
• Data analysis applications server
– 11 protein applications
– 6 nucleic acid applications
– Uniform query interface on the web
History of SRS
• 1990: main author Dr. Thure Etzold
– Development started at EMBL, Heidelberg
• 1997
– Moved to the EBI in Cambridge. Development work was supported by various grants, among others from EMBnet.
• 1998
– Etzold and his group joined LION Bioscience
Why SRS?
• Information retrieval
– Easy way to retrieve information from sequence and sequence-related databases
– Possibility to search for multiple words or other criteria
• Linkage between different databases
– E.g., find all primary structures with a known three-dimensional structure
• ... and much more
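The linkage idea can be pictured as a join over cross-references: start from sequence entries and keep those whose reference resolves to an entry in a structure databank. In the minimal Python sketch below, the two databanks, their entry IDs and the `xrefs` layout are all hypothetical; it illustrates the concept and is not SRS's own link mechanism.

```python
# Hypothetical toy databanks: sequence entries carry cross-references,
# and the structure databank is keyed by entry ID.
sequence_bank = {
    "SEQ_A": {"description": "toy kinase", "xrefs": {"STRUCT": ["S1"]}},
    "SEQ_B": {"description": "toy receptor", "xrefs": {}},
    "SEQ_C": {"description": "toy protease", "xrefs": {"STRUCT": ["S2", "S9"]}},
}
structure_bank = {"S1": "toy structure 1", "S2": "toy structure 2"}


def link(source, target, xref_field):
    """Map each source entry to the target entries it links to.

    Only cross-references that resolve to an existing target entry are kept;
    source entries with no resolvable link are dropped.
    """
    linked = {}
    for entry_id, entry in source.items():
        hits = [ref for ref in entry["xrefs"].get(xref_field, []) if ref in target]
        if hits:
            linked[entry_id] = hits
    return linked


# "Find all sequences with a known three-dimensional structure"
print(link(sequence_bank, structure_bank, "STRUCT"))
```

The dangling reference `S9` is silently dropped, which is one reasonable design choice; a real system might instead report unresolved links.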
Philosophy of SRS
The original database file (plain text, HTML or XML) is parsed into an index file. The index file supports data retrieval and searchable links between database entries.
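The parse, index, retrieve cycle can be sketched in a few lines of Python. This only illustrates the idea, assuming an EMBL-like flat file with two-letter line codes and an ID line first in each entry; the function names and data are hypothetical, and SRS's real parsers and index formats are far more elaborate.

```python
import re
from collections import defaultdict


def parse_flat_file(text):
    """Split a flat file into entries separated by '//' lines; key each by its ID line."""
    entries = {}
    for chunk in re.split(r"^//\s*$", text, flags=re.MULTILINE):
        chunk = chunk.strip()
        if chunk:
            entry_id = chunk.splitlines()[0].split()[1]   # assumes first line is 'ID <name>'
            entries[entry_id] = chunk
    return entries


def build_index(entries):
    """Inverted index: (line code, lower-cased word) -> set of entry IDs."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for line in text.splitlines():
            code, _, value = line.partition(" ")
            for word in re.findall(r"[a-z0-9]+", value.lower()):
                index[(code, word)].add(entry_id)
    return index


def search(index, code, word):
    """Look a word up in one indexed field without rescanning the original file."""
    return sorted(index.get((code, word.lower()), set()))


# Hypothetical two-entry flat file in an EMBL-like layout
flat_file = """\
ID   ENTRY1
DE   Epidermal growth factor receptor
OS   Homo sapiens
//
ID   ENTRY2
DE   Insulin receptor
OS   Mus musculus
//
"""

entries = parse_flat_file(flat_file)
index = build_index(entries)
print(search(index, "DE", "receptor"))   # ['ENTRY1', 'ENTRY2']
print(search(index, "OS", "sapiens"))    # ['ENTRY1']
```

Retrieval then reads the full entry text via `entries[entry_id]`; the index is consulted only to find which entries match.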
The Library Select Page
[Screenshot: labelled areas for Workbenches, Query Forms, Libraries and Library groups]
SRS main toolbar tabs
• Top Page: displays databases in different database groups
• Query: displays either the standard or the extended query form
• Results, or “the query manager”: maintains a history of all the results obtained during a session
• Projects, or “the project manager”: maintains a history of all queries and views used during a session
• Views: allows users to define their own view for one or more databases
• Databanks: contains a list of, and some facts about, the databases available in the system
Search terms in SRS
• SRS indexed fields can be searched using any of the following:
– Single words
– Multi-word phrases
– Numbers and dates
– Regular expressions
– Wildcards
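To show what each kind of search term matches, here is a small Python illustration over a hypothetical set of indexed values. It mimics only the matching semantics; it does not reproduce SRS's own query syntax, and the data and function names are made up.

```python
import fnmatch
import re

# Hypothetical indexed words for one field, a numeric field, and free-text descriptions
indexed_words = ["kinase", "kinesin", "receptor", "reductase", "transferase"]
lengths = {"E1": 150, "E2": 980, "E3": 4200}
descriptions = {
    "E1": "epidermal growth factor receptor",
    "E2": "growth hormone receptor",
    "E3": "insulin receptor",
}


def single_word(word, words):
    # Exact, case-insensitive match on one indexed word
    return [w for w in words if w == word.lower()]


def wildcard(pattern, words):
    # '*' matches any run of characters, '?' a single character
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]


def regular_expression(pattern, words):
    # The whole word must match the pattern
    rx = re.compile(pattern)
    return [w for w in words if rx.fullmatch(w)]


def phrase(query, texts):
    # All words must occur consecutively and in order
    target = query.lower().split()
    n = len(target)
    hits = []
    for entry_id, text in texts.items():
        tokens = text.lower().split()
        if any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1)):
            hits.append(entry_id)
    return sorted(hits)


def number_range(values, low, high):
    # Inclusive range search on a numeric field
    return sorted(k for k, v in values.items() if low <= v <= high)


print(wildcard("kin*", indexed_words))                # ['kinase', 'kinesin']
print(regular_expression(r"re.*ase", indexed_words))  # ['reductase']
print(phrase("growth factor", descriptions))          # ['E1']
print(number_range(lengths, 500, 5000))               # ['E2', 'E3']
```

The same inclusive comparison in `number_range` works for dates if they are stored as `datetime.date` objects.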