Introduction to BioMart


Data retrieval
• BioMart
• Export View
• Data sets on FTP site
• MySQL queries of databases
• Perl API access to databases
ExportView
Data Mining in Ensembl with EnsMart
August 2005
Possible queries…
• All genes from a candidate region
• Genes with a particular protein domain
• Members of a protein family
• Genes associated with SNPs
More specific queries
• Human genes whose upstream regions are conserved with respect to mouse
• Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
• Genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs
Ensembl core database
• Normalised
• Each data point stored only once
• Quick updates
• Minimal storage requirements
• But:
• Many tables
• Many joins for complicated queries
• Slow for data mining questions
BioMart and EnsMart
• Large-scale data retrieval tool
• Query builder interface
• Databases: Ensembl, SNP, Vega, (MSD, UniProt)
• Associated features or sequences
• Flexible output formats
• http://www.ebi.ac.uk/biomart/
• http://www.ensembl.org/EnsMart/
Mart database
• De-normalised
• Tables with ‘redundant’ information
• Query-optimised
• Fast and flexible
• designed for data mining
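The contrast between the normalised core database and the de-normalised mart can be sketched with a toy example. This is only an illustration using Python's built-in sqlite3 module; the table and column names (gene, transcript, protein_domain, gene_main) and the rows are hypothetical and are not the real Ensembl or Mart schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout (hypothetical tables): each data point is stored once,
# so answering a question means joining several tables.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT, chromosome TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
CREATE TABLE protein_domain (transcript_id INTEGER, domain TEXT);
INSERT INTO gene VALUES (1, 'GENE_A', '1'), (2, 'GENE_B', '2');
INSERT INTO transcript VALUES (10, 1), (20, 2);
INSERT INTO protein_domain VALUES (10, 'kinase'), (20, 'zinc_finger');
""")

# "Genes with a particular protein domain" needs two joins here.
normalised = cur.execute("""
SELECT g.stable_id
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN protein_domain d ON d.transcript_id = t.transcript_id
WHERE d.domain = 'kinase'
""").fetchall()

# De-normalised layout (hypothetical): the joins are done once, in advance,
# and the 'redundant' result is stored as one wide table.
cur.executescript("""
CREATE TABLE gene_main AS
SELECT g.stable_id, g.chromosome, d.domain
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN protein_domain d ON d.transcript_id = t.transcript_id;
""")

# The same question is now a single-table lookup.
denormalised = cur.execute(
    "SELECT stable_id FROM gene_main WHERE domain = 'kinase'"
).fetchall()

print(normalised, denormalised)
```

Both queries return the same gene; the second trades extra storage for a simpler, join-free query, which is the design choice the slide describes.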
Primary Data Sets
• Ensembl genes
• SNP
– Single nucleotide polymorphisms
– Deletion-insertion polymorphisms
– Short tandem repeats
• Vega genes
• (MSD protein structures)
• (UniProt proteomes)
Secondary Data Sets
• Markers
• Diseases
• Gene ontology
• Gene expression information
• Homology predictions
• Protein annotation
Information flow
[Diagram: start → filter → output]
• start: SPECIES, FOCUS
• filter: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP
• output: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP
• Other labels in the diagram: REFSEQ, EMBL, AFFY, SWISSPROT, GO, INTERPRO, FASTA, GTF, HTML, TEXT, EXCEL, FILE
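The start, filter, output flow can be sketched as a small in-memory pipeline. This is a minimal illustration only: the records, field names and helper functions (GENES, run_query, to_tsv) are hypothetical and are not part of BioMart or EnsMart.

```python
import csv
import io

# start: a hypothetical data set standing in for one species and focus.
GENES = [
    {"gene_id": "GENE_A", "chromosome": "1", "start": 1000, "end": 5000, "has_snp": True},
    {"gene_id": "GENE_B", "chromosome": "1", "start": 90000, "end": 95000, "has_snp": False},
    {"gene_id": "GENE_C", "chromosome": "2", "start": 2000, "end": 7000, "has_snp": True},
]


def run_query(records, filters, attributes):
    """Keep records that pass every filter, then keep only the chosen attributes."""
    selected = [r for r in records if all(f(r) for f in filters)]
    return [{a: r[a] for a in attributes} for r in selected]


def to_tsv(rows, attributes):
    """Format the result rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(attributes)
    for row in rows:
        writer.writerow([row[a] for a in attributes])
    return buf.getvalue()


# filter: restrict by region and by SNP association.
filters = [
    lambda r: r["chromosome"] == "1" and r["end"] <= 10000,  # REGION
    lambda r: r["has_snp"],                                   # SNP
]

# output: choose the attributes to report and a text format.
attributes = ["gene_id", "chromosome", "start", "end"]
rows = run_query(GENES, filters, attributes)
tsv = to_tsv(rows, attributes)
print(tsv)
```

Keeping filters and attributes as separate inputs mirrors the diagram: what restricts the result is chosen independently of what is reported.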
BioMart
http://www.biomart.org/
BioMart - Features
BioMart - Sequences
Output formats
HTML
What about queries that are not possible in EnsMart?
• Direct database access at ensembldb.ensembl.org
• martdb.ebi.ac.uk
• MySQL client
Download MySQL for Windows
http://www.winmysql.com/page4.html
File: wmysr11.zip
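Direct access with a MySQL client amounts to pointing the client at one of the hosts above. The sketch below only builds the command line and does not connect to anything. The helper name mysql_command is hypothetical, and the read-only user name "anonymous" is an assumption not stated on the slides; check the server's current connection details before use.

```python
import shlex


def mysql_command(host, user="anonymous", database=None, query=None):
    """Build an argument list for the mysql command-line client.

    The default user name is an assumption; it is not given in the slides.
    """
    args = ["mysql", f"--host={host}", f"--user={user}"]
    if query:
        # --execute runs one statement and exits instead of opening a prompt.
        args.append(f"--execute={query}")
    if database:
        # The database name is passed as the final positional argument.
        args.append(database)
    return args


# List the databases available on the Ensembl server named in the slides.
args = mysql_command("ensembldb.ensembl.org", query="SHOW DATABASES")
print(shlex.join(args))
```

The resulting list could be passed to subprocess.run on a machine with the MySQL client installed; building it as a list avoids shell-quoting problems with the SQL statement.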
Access via Perl object API
• Based on BioPerl
• Ensembl modules
• For an introduction, see the tutorial at: http://www.ensembl.org/info/software/core/
There are other ways…
MartShell
Command-line interface to Mart, written in Java.
It works with a Mart Query Language.
MartExplorer