Slide 1

The Reference Sequence database
• A non-redundant collection of richly annotated DNA, RNA, and protein
  sequences from diverse taxa.
• The collection includes sequences from plasmids, organelles, viruses,
  archaea, bacteria, and eukaryotes.
• Each RefSeq represents a single, naturally occurring molecule from one
  organism.
• RefSeq biological sequences (also known as RefSeqs) are derived from
  GenBank records but differ in that each RefSeq is a synthesis of
  information, not an archived unit of primary research data.
• Similar to a review article in the literature, a RefSeq represents the
  consolidation of information by a particular group at a particular time.
The RefSeq accession number format and molecule types
Accession prefix   Molecule type   Comment
AC_                Genomic         Complete genomic molecule, alternate assembly
NC_                Genomic         Complete genomic molecule, reference assembly
NG_                Genomic         Incomplete genomic region
NT_                Genomic         Contig or scaffold, clone-based or WGS (a)
NW_                Genomic         Contig or scaffold, primarily WGS (a)
NS_                Genomic         Environmental sequence
NZ_ (b)            Genomic         Unfinished WGS
NM_                mRNA
NR_                RNA
XM_ (c)            mRNA            Predicted model
XR_ (c)            RNA             Predicted model
AP_                Protein         Annotated on AC_ alternate assembly
NP_                Protein
YP_ (c)            Protein
XP_ (c)            Protein         Predicted model
ZP_ (c)            Protein         Predicted model, annotated on NZ_ genomic records

(a) Whole Genome Shotgun sequence data.
(b) An ordered collection of WGS for a genome.
(c) Computed.
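
Because the prefix alone determines the molecule type, an accession can be classified with a simple lookup. The following is a minimal sketch rather than an NCBI tool: the function name `classify_refseq`, the regular expression describing the accession shape, and the sample accessions are assumptions made for illustration, and the dictionary simply transcribes the prefix table from this slide.

```python
import re

# Prefix table transcribed from the slide: prefix -> (molecule type, comment).
REFSEQ_PREFIXES = {
    "AC_": ("Genomic", "Complete genomic molecule, alternate assembly"),
    "NC_": ("Genomic", "Complete genomic molecule, reference assembly"),
    "NG_": ("Genomic", "Incomplete genomic region"),
    "NT_": ("Genomic", "Contig or scaffold, clone-based or WGS"),
    "NW_": ("Genomic", "Contig or scaffold, primarily WGS"),
    "NS_": ("Genomic", "Environmental sequence"),
    "NZ_": ("Genomic", "Unfinished WGS"),
    "NM_": ("mRNA", ""),
    "NR_": ("RNA", ""),
    "XM_": ("mRNA", "Predicted model"),
    "XR_": ("RNA", "Predicted model"),
    "AP_": ("Protein", "Annotated on AC_ alternate assembly"),
    "NP_": ("Protein", ""),
    "YP_": ("Protein", ""),
    "XP_": ("Protein", "Predicted model"),
    "ZP_": ("Protein", "Predicted model, annotated on NZ_ genomic records"),
}

# Assumed shape: two letters, an underscore, an alphanumeric body,
# and an optional ".version" suffix.
_ACCESSION_RE = re.compile(r"^([A-Z]{2}_)[A-Z0-9]+(?:\.\d+)?$")


def classify_refseq(accession):
    """Return (prefix, molecule_type, comment), or None if not a RefSeq prefix."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(1)
    if prefix not in REFSEQ_PREFIXES:
        return None
    molecule_type, comment = REFSEQ_PREFIXES[prefix]
    return prefix, molecule_type, comment
```

For example, `classify_refseq("NM_000001.1")` (an illustrative accession) reports an mRNA record, while a GenBank-style accession without the two-letter-plus-underscore prefix yields `None`.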
Flat File Format and Annotated Features
RefSeq records appear similar in format to the GenBank records from
which they are derived.
Features of a RefSeq record
RefSeq records may also be displayed in a graphical format
RefSeq status codes
Code               Description
GENOME ANNOTATION  The RefSeq record is provided via automated processing and
                   is not subject to individual review or revision between builds.
INFERRED           The RefSeq record has been predicted by genome sequence
                   analysis, but it is not yet supported by experimental
                   evidence. The record may be partially supported by
                   homology data.
PREDICTED          The RefSeq record has not yet been subject to individual
                   review, and some aspect of the RefSeq record is predicted.
PROVISIONAL        The RefSeq record has not yet been subject to individual
                   review. The initial sequence-to-gene name associations have
                   been established by outside collaborators or NCBI staff.
REVIEWED           The RefSeq record has been reviewed by NCBI staff or by a
                   collaborator. The NCBI review process includes assessing
                   available sequence data and the literature. Some RefSeq
                   records may incorporate expanded sequence and annotation
                   information.
VALIDATED          The RefSeq record has undergone an initial review to provide
                   the preferred sequence standard. The record has not yet
                   been subject to final review, at which time additional
                   functional information may be provided.
WGS                The RefSeq record is provided to represent a collection of
                   whole genome shotgun sequences. These records are not
                   subject to individual review or revisions between genome
                   updates.
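
A status code can be pulled out of a record's text with a short pattern match. This is only a sketch under a stated assumption: it supposes that the COMMENT block of the flat file opens with the status code followed by the word `REFSEQ:` (for example `REVIEWED REFSEQ:`), which the slides do not spell out. The function name `refseq_status` and the sample record text are hypothetical.

```python
import re

# Status codes listed on the slide.
REFSEQ_STATUS_CODES = (
    "GENOME ANNOTATION",
    "INFERRED",
    "PREDICTED",
    "PROVISIONAL",
    "REVIEWED",
    "VALIDATED",
    "WGS",
)

# Assumption: the COMMENT line starts with "<STATUS> REFSEQ:".
_STATUS_RE = re.compile(r"^COMMENT\s+(.+?)\s+REFSEQ:", re.MULTILINE)


def refseq_status(flat_file_text):
    """Return the status code found in the COMMENT block, or None."""
    match = _STATUS_RE.search(flat_file_text)
    if match is None:
        return None
    status = match.group(1).upper()
    # Only report codes that appear in the slide's table.
    return status if status in REFSEQ_STATUS_CODES else None


# Hypothetical record fragment, used only to exercise the function.
sample_record = (
    "LOCUS       EXAMPLE\n"
    "COMMENT     REVIEWED REFSEQ: This record has been curated.\n"
)
status = refseq_status(sample_record)
```

Returning `None` for anything outside the seven listed codes keeps the helper from silently accepting text that merely resembles a status line.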
Using Entrez Limits to restrict a query to RefSeq
http://www.ncbi.nlm.nih.gov/gene
Gene maintains information about genes from
genomes of interest to the RefSeq group
Entrez Gene is accessed like any other Entrez database:
Find genes by...                          Search text
free text                                 human muscular dystrophy
partial name and multiple species         transporter[title] AND ("Drosophila melanogaster"[orgn] OR "Mus musculus"[orgn])
chromosome and symbol                     (II[chr] OR 2[chr]) AND adh*[sym]
associated sequence accession number      M11313[accn]
gene name (symbol)                        BRCA1[sym]
publication (PubMed ID)                   11331580[PMID]
Gene Ontology (GO) terms or identifiers   "cell adhesion"[GO]
                                          10030[GO]
Genes with variants of medical interest   gene_snp_clin[filter]
chromosome and species                    Y[CHR] AND human[ORGN]
Enzyme Commission (EC) numbers            1.9.3.1[EC]
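
The fielded searches listed above can also be assembled programmatically and sent to the Gene database through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the query string and the request URL; it does not contact NCBI. The helper names (`field_term`, `any_of`, `all_of`, `gene_search_url`) and the `retmax` default are assumptions chosen for illustration, not part of any NCBI library.

```python
from urllib.parse import urlencode

# E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value, field):
    """Format one fielded term, quoting values that contain spaces."""
    if " " in value and not value.startswith('"'):
        value = f'"{value}"'
    return f"{value}[{field}]"


def any_of(*terms):
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)


def gene_search_url(term, retmax=20):
    """Build an esearch URL against the Gene database (no request is sent)."""
    params = {"db": "gene", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


# Rebuild the "partial name and multiple species" example from the slide.
query = all_of(
    field_term("transporter", "title"),
    any_of(
        field_term("Drosophila melanogaster", "orgn"),
        field_term("Mus musculus", "orgn"),
    ),
)
url = gene_search_url(query)
```

Keeping query construction separate from the HTTP request makes it easy to check the search text against the slide before anything is sent.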