Gene databases

21 January 2010
Dr. Victor Treviño
BIOINFORMATICS - GENE DATABASES

GENE DATABASES (DNA, RNA, PROTEIN)
- HUGO (www.genenames.org)
- NCBI (http://www.ncbi.nlm.nih.gov)
- EBI – EMBL (http://www.ebi.ac.uk/)
- EBIMed (http://www.ebi.ac.uk/Rebholz-srv/ebimed/)
- SwissProt / UniProt
  http://www.ebi.ac.uk/uniprot/
  http://www.psc.edu/general/software/packages/swiss/swiss.php
- PubGene (http://www.pubgene.org/)
- GeneCards (http://www.genecards.org/)
- iHOP (http://www.ihop-net.org/UniPub/iHOP/)
- Panther (http://www.pantherdb.org/)
- "Others"
[email protected]

HUGO – HGNC (WWW.GENENAMES.ORG)
- HUGO: Human Genome Organization; HGNC: HUGO Gene Nomenclature Committee
- "Official" gene names (approved gene symbols)
- Links to NCBI
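
HGNC assigns one approved symbol to each human gene, so a common first step is to map the names that appear in papers (aliases such as "p53") to that symbol. The following is a minimal sketch. It assumes a tiny hand-made table: APPROVED, ALIASES and to_official_symbol are hypothetical names, and a real pipeline would load the full HGNC data instead.

```python
from typing import Optional

# Hypothetical, hand-made tables for illustration only; the authoritative
# source is the HGNC data at www.genenames.org.
APPROVED = ["TP53", "ERBB2", "BRCA1"]
ALIASES = {"p53": "TP53", "HER2": "ERBB2", "NEU": "ERBB2"}

# Case-insensitive lookup; approved symbols are added last so they win
# over any alias that happens to have the same spelling.
_LOOKUP = {alias.upper(): symbol for alias, symbol in ALIASES.items()}
_LOOKUP.update({symbol.upper(): symbol for symbol in APPROVED})


def to_official_symbol(name: str) -> Optional[str]:
    """Return the approved symbol for a gene name, or None if it is unknown."""
    return _LOOKUP.get(name.strip().upper())
```

For example, to_official_symbol("her2") gives "ERBB2". The stored spelling of the approved symbol is returned, which matters because not every approved symbol is fully upper case.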

NCBI
- The "richest" source of information about genes

NCBI – GENE DATABASE
- Summary
- Species (or species-specific)
- Function
- Sequence
- CDS (coding sequence)
- Chromosome location
- Domains
- Interactions
- GeneRIFs: Gene References Into Function
- Lots of links to all parts of NCBI and to external resources
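
The Gene database can also be queried from a program through NCBI's E-utilities. The following is a minimal sketch that builds an ESearch URL for a gene symbol in one organism. It assumes the [sym] (symbol) and [orgn] (organism) search field tags. The helper names gene_search_url and search_gene_ids are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def gene_search_url(symbol: str, organism: str = "human") -> str:
    """Build an ESearch URL that looks up a gene symbol in NCBI Gene."""
    term = f"{symbol}[sym] AND {organism}[orgn]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def search_gene_ids(symbol: str, organism: str = "human") -> List[str]:
    """Run the search (needs network access) and return the NCBI Gene IDs found."""
    with urlopen(gene_search_url(symbol, organism)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

search_gene_ids("BRCA1") should return a list of Gene ID strings. NCBI limits request rates for E-utilities, so check its usage policy before calling this in a loop.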

NCBI – NUCLEOTIDE / PROTEIN
- GenBank/GenPept format
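
In a GenBank (nucleotide) or GenPept (protein) flat file, the sequence is stored in the ORIGIN section. Each line there holds a position number followed by residues in blocks of ten, and the record ends with "//". The following is a minimal sketch that extracts that sequence. It assumes a single, well-formed record, and genbank_sequence is a hypothetical helper. For real files a dedicated parser (for example Biopython's) is safer.

```python
from typing import List


def genbank_sequence(record: str) -> str:
    """Return the residues from the ORIGIN section of one GenBank/GenPept record."""
    parts: List[str] = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True  # sequence lines start on the next line
            continue
        if line.startswith("//"):
            break  # end of the record
        if in_origin:
            # Each line: a position number, then residues in blocks of ten.
            parts.extend(line.split()[1:])
    return "".join(parts).upper()
```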

GENE SEQUENCE FORMATS
- http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml
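
FASTA is the simplest of the formats BLAST accepts: a ">" definition line followed by one or more sequence lines. The following is a minimal parser sketch. It assumes the identifier is the first word of the definition line and that identifiers are unique; a repeated ID would overwrite the earlier record. parse_fasta is a hypothetical name.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            words = line[1:].split()
            header = words[0] if words else ""
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records
```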

BLAST - SEARCHING A GENE FROM SEQUENCE
- Example query (chosen at random): cgagatgcagatagcagctagagat
- Small sequences may identify a gene (dbEST, dbSTS, ePCR)
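
Why can such a short query identify a gene? Take a simple model, assumed here, in which every base is independent and equally likely. Under that model, the expected number of chance matches of one fixed k-base query in a genome of length L, counting both strands, is about 2(L - k + 1)/4^k. The following sketch does that arithmetic. The default of 3 × 10^9 bp is only a rough figure for the human genome, and expected_random_hits is a hypothetical name.

```python
def expected_random_hits(k: int, genome_length: float = 3e9,
                         both_strands: bool = True) -> float:
    """Expected chance occurrences of one fixed k-mer, assuming uniform, independent bases."""
    positions = max(genome_length - k + 1, 0)  # start positions on one strand
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** k


query = "cgagatgcagatagcagctagagat"  # the 25-base example query
print(len(query), expected_random_hits(len(query)))
```

For the 25-base query the expectation is about 5 × 10^-6, so a real hit is very unlikely to be chance. A 10-base query would be expected to match thousands of times. Real genomes are far from random, so treat this only as an order-of-magnitude argument.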



NCBI - UNIGENE
- Unified information about a gene across reported sequences:
  "set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location."
- VERY IMPORTANT: UniGene ID (Hs.xxxxxx)
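
A UniGene ID joins an organism prefix to a cluster number ("Hs." for human). The following is a minimal sketch that validates such IDs and groups transcript accessions by cluster. It assumes the prefix is one capital letter followed by one to three lower-case letters. The helper names and the accessions used with them are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed ID shape: organism prefix (e.g. "Hs", "Mm"), a dot, a cluster number.
UNIGENE_ID = re.compile(r"([A-Z][a-z]{1,3})\.(\d+)")


def parse_unigene_id(uid: str) -> Tuple[str, int]:
    """Split a UniGene ID into (organism prefix, cluster number)."""
    match = UNIGENE_ID.fullmatch(uid.strip())
    if match is None:
        raise ValueError(f"not a UniGene ID: {uid!r}")
    return match.group(1), int(match.group(2))


def group_by_cluster(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (accession, UniGene ID) pairs into clusters keyed by UniGene ID."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for accession, uid in pairs:
        organism, number = parse_unigene_id(uid)  # validates the ID
        clusters[f"{organism}.{number}"].append(accession)
    return dict(clusters)
```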

NCBI – OMIM / OMIA
On-Line Mendelian Inheritance in Man / Animals
- Curated "function" of genes
- Good references
- Strong history / evidence

NCBI – OTHERS…
- HomoloGene: conserved functions
- dbEST: snapshot of genes expressed in a given tissue
- UniSTS: sequence-tagged sites, PCR primer pairs, genomic position, genes
- SNP: polymorphisms
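
UniSTS stores a PCR primer pair and the expected product for each sequence-tagged site, and electronic PCR (the ePCR mentioned earlier) searches a sequence for such primer pairs. The following is a simplified sketch of that idea. It assumes exact matches on the given strand only, and max_product is a hypothetical size limit. Real e-PCR tolerates mismatches and also searches the opposite strand.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def electronic_pcr(template: str, forward: str, reverse: str,
                   max_product: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, product_size) for each exact, plus-strand primer-pair hit."""
    t = template.upper()
    fwd = forward.upper()
    # The reverse primer anneals to the other strand, so on this strand
    # we look for its reverse complement downstream of the forward primer.
    rev_site = reverse_complement(reverse.upper())
    products: List[Tuple[int, int]] = []
    start = t.find(fwd)
    while start != -1:
        end = t.find(rev_site, start + len(fwd))
        while end != -1:
            size = end + len(rev_site) - start
            if size > max_product:
                break  # later hits would only give longer products
            products.append((start, size))
            end = t.find(rev_site, end + 1)
        start = t.find(fwd, start + 1)
    return products
```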

EBI
HTTP://WWW.EBI.AC.UK/
- Ensembl: automatic annotation of large eukaryotic genomes (gene IDs)
- UniProt (Universal Protein Resource): the world's most comprehensive catalogue of information on proteins
- CiteXplore (good for literature)
- EBIMed (tools, semantic mining)
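
Ensembl gene IDs are stable identifiers whose prefix shows what they point to: ENSG for a gene, ENST for a transcript, ENSP for a protein and ENSE for an exon. For non-human species a three-letter species code is inserted (for example ENSMUSG for mouse), and an optional ".N" version suffix may follow. The following is a minimal parser sketch that assumes exactly that pattern, which not every Ensembl-hosted species follows. parse_ensembl_id and EnsemblId are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: "ENS", optional 3-letter species code, feature letter,
# digits, optional ".version".
_ENSEMBL_ID = re.compile(r"ENS([A-Z]{3})?([GTPE])(\d+)(?:\.(\d+))?")
_FEATURES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


class EnsemblId(NamedTuple):
    species_code: str        # "" for human IDs
    feature: str             # gene, transcript, protein or exon
    number: str              # kept as text to preserve leading zeros
    version: Optional[int]   # None when no ".N" suffix is present


def parse_ensembl_id(stable_id: str) -> EnsemblId:
    match = _ENSEMBL_ID.fullmatch(stable_id.strip())
    if match is None:
        raise ValueError(f"not an Ensembl stable ID: {stable_id!r}")
    species, feature, number, version = match.groups()
    return EnsemblId(species or "", _FEATURES[feature], number,
                     int(version) if version else None)
```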

SWISSPROT
UniProt: http://www.uniprot.org/
HTTP://WWW.EBI.AC.UK/UNIPROT/
- Union of Swiss-Prot, TrEMBL, and PIR
- Curated: Swiss-Prot is manually annotated and reviewed
- Good summaries
- References
- Sequence
- Features (repeats, disulfide bonds, …, domains)
- Examples…
- Entry sections: Names and origin · Protein attributes · General annotation (Comments) · Ontologies · Alternative products · Sequence annotation (Features) · Sequences · References · Cross-references · Entry information · Relevant documents
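
In UniProtKB FASTA files, the first header field shows which part of the database an entry comes from: "sp" for the reviewed Swiss-Prot section and "tr" for unreviewed TrEMBL. The following is a minimal sketch that reads that flag and the OS/OX/GN/PE/SV tags. It assumes the current header layout; older files may lack some tags. parse_uniprot_header is a hypothetical name, and the headers used to exercise it are invented.

```python
import re
from typing import Dict, Union

# Assumed layout: >db|Accession|EntryName Protein name OS=... OX=... GN=... PE=... SV=...
_HEADER = re.compile(r">(sp|tr)\|([^|]+)\|(\S+)\s+(.*)")
_TAG = re.compile(r"\b(OS|OX|GN|PE|SV)=")


def parse_uniprot_header(line: str) -> Dict[str, Union[str, bool]]:
    match = _HEADER.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    db, accession, entry_name, rest = match.groups()
    # re.split with a capturing group returns [name, tag, value, tag, value, ...]
    pieces = _TAG.split(rest)
    info: Dict[str, Union[str, bool]] = {
        "reviewed": db == "sp",  # sp = Swiss-Prot (reviewed), tr = TrEMBL
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": pieces[0].strip(),
    }
    for tag, value in zip(pieces[1::2], pieces[2::2]):
        info[tag] = value.strip()
    return info
```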

PUBGENE
HTTP://WWW.PUBGENE.ORG/
- Good for gene interactions
- References
- Association to Gene Ontology (GO) terms
- TEXT-MINING
- REALLY NICE
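
PubGene builds its gene networks by text mining: genes whose names occur in the same abstracts are linked. The following is a minimal co-occurrence sketch in that spirit. It assumes exact symbol matches on word boundaries, whereas real systems must also handle synonyms and ambiguous names. cooccurrence_counts is a hypothetical name, and the abstracts it is tested on are made-up sentences.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Tuple


def cooccurrence_counts(abstracts: Iterable[str],
                        symbols: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count, for each pair of symbols, how many abstracts mention both."""
    patterns = {s: re.compile(rf"\b{re.escape(s)}\b") for s in sorted(set(symbols))}
    counts: "Counter[Tuple[str, str]]" = Counter()
    for text in abstracts:
        # patterns was built in sorted order, so each pair comes out sorted too
        present = [s for s, p in patterns.items() if p.search(text)]
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts
```

Pairs with high counts become edges of a literature network. Ranking edges by raw counts favours well-studied genes, so real tools usually apply some statistical weighting.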

GENECARDS
HTTP://WWW.GENECARDS.ORG/
- Good summary
- Function
- Lots of links

IHOP
HTTP://WWW.IHOP-NET.ORG/UNIPUB/IHOP/
- "Summary" of information for a protein
- Linked (genes and proteins act as hyperlinks between sentences)
- TEXT-MINING
- REALLY NICE

PANTHER
HTTP://WWW.PANTHERDB.ORG/
- Rich information
- Curated
- Pathways
- Functions
- Families
- Homologs

BIOINFORMATICS LINKS DIRECTORY
http://bioinformatics.ca/links_directory/

GENE DATABASES - SUMMARY
- No SINGLE site contains ALL information, so we have to use several sources (e.g., BioGPS)
- CURATED data is valuable
- Be cautious with predicted data
- Relationships with other genes are more difficult to explore
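
No single site holds everything, so one practical pattern is to merge annotations from several sources and let curated values take precedence over predicted ones. The following is a minimal sketch. It assumes each record has been tagged by hand as "curated" or "predicted". EVIDENCE_RANK and merge_annotations are hypothetical names, and the source names and values used to test it are invented.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical ranking: a lower number wins.
EVIDENCE_RANK = {"curated": 0, "predicted": 1}


def merge_annotations(
    records: Iterable[Tuple[str, str, str, str]],
) -> Dict[str, Tuple[str, str]]:
    """records: (source, evidence, field, value). Returns field -> (value, source)."""
    best: Dict[str, Tuple[int, str, str]] = {}
    for source, evidence, field, value in records:
        rank = EVIDENCE_RANK[evidence]  # unknown evidence labels raise KeyError
        # Keep the first value seen at the best (lowest) rank for each field.
        if field not in best or rank < best[field][0]:
            best[field] = (rank, value, source)
    return {field: (value, source) for field, (_, value, source) in best.items()}
```

Keeping the source name next to each merged value preserves provenance. That makes it possible to go back and check a predicted annotation later.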

BIOGPS
http://biogps.gnf.org/
- It is a portal of portals
- You can add as many portal sites as you want
- Easy to configure
- Versatile
- VERY IMPORTANT!
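
A portal of portals essentially fills per-site URL templates with the identifiers of one gene. The following is a minimal sketch of that idea. The {{EntrezGene}} / {{Symbol}} placeholder syntax is an assumption modelled on BioGPS plugin URLs, and fill_template and portal_links are hypothetical names.

```python
import re
from typing import Dict, Mapping
from urllib.parse import quote

# Assumed placeholder syntax: {{IdentifierName}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, gene: Mapping[str, object]) -> str:
    """Replace each {{Key}} in a URL template with the gene's (URL-escaped) identifier."""
    def substitute(match):
        key = match.group(1)
        if key not in gene:
            raise KeyError(f"gene has no {key!r} identifier")
        return quote(str(gene[key]), safe="")
    return PLACEHOLDER.sub(substitute, template)


def portal_links(templates: Mapping[str, str],
                 gene: Mapping[str, object]) -> Dict[str, str]:
    """Build one link per configured portal for the same gene."""
    return {name: fill_template(t, gene) for name, t in templates.items()}
```

For example, with gene = {"EntrezGene": 672, "Symbol": "BRCA1"}, the illustrative template "http://www.ncbi.nlm.nih.gov/gene/{{EntrezGene}}" becomes "http://www.ncbi.nlm.nih.gov/gene/672". Adding a new portal then only means adding one more template.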