An overview of the EBI - European Bioinformatics Institute

Bioinformatics tools for biologists @ the EBI
An overview
Bioinformatics
• The science of storing, retrieving and analyzing large
amounts of biological information
• An interdisciplinary science, involving biologists,
computer scientists and mathematicians
• At the heart of modern biology
EBI Overview
“Large-scale” focus
• Data explosion and new types of data
• High-throughput biology
• Emphasis on systems, not reductionism
• Large community of users with no training in
bioinformatics
• Growth of applied biology – molecular medicine,
agriculture, food, environmental sciences…
What is EMBL-EBI?
• Based on the Wellcome
Trust Genome Campus
near Cambridge, UK
• Part of the European
Molecular Biology
Laboratory
• Non-profit organization
The EBI’s mission
• To provide freely available data and bioinformatics
services to all facets of the scientific community in ways
that promote scientific progress
• To contribute to the advancement of biology through
basic investigator-driven research in bioinformatics
• To provide advanced bioinformatics training to scientists
at all levels, from PhD students to independent
investigators
• To help disseminate cutting-edge technologies to
industry
Databases and tools
www.ebi.ac.uk
New types of data
Literature and ontologies
Genomes
Protein sequence
DNA & RNA sequence
Protein structure
Gene expression
Chemical entities
Protein families,
motifs and domains
Protein interactions
Pathways
Systems
Databases: molecules to systems
Literature and ontologies
CiteXplore, GO
Genomes
Ensembl
Ensembl Genomes
EGA
Nucleotide sequence
EMBL-Bank
Protein families,
motifs and domains
InterPro
Microarray & gene
expression data
ArrayExpress
Protein interactions
IntAct
Protein structure
PDBe
Pathways
Reactome
Proteomes
UniProt, PRIDE
Chemical entities
ChEBI
Systems
BioModels
Database collaborations
Standards development – international collaborations
Genomics Standards Consortium (GSC)
http://gensc.org
Genome annotation
www.geneontology.org
Protein sequence
www.uniprot.org
Nucleotide sequence
www.insdc.org
Microarray and Gene
Expression Data (MGED)
www.mged.org
Cheminformatics
www.ebi.ac.uk/chebi
HUPO Proteomics Standards Initiative (PSI)
www.psidev.info
Pathways
www.reactome.org
www.biopax.org
Metabolomics Standards Initiative (MSI)
www.metabolomicssociety.org
Protein structure
www.wwpdb.org
Systems modeling
standards
www.sbml.org
EBI website: www.ebi.ac.uk
Databases
Tools
EBI search engine: EB-eye
Search all main
databases in one go
Nucleotides: European Nucleotide Archive (ENA)
• ENA provides a comprehensive, accessible
and publicly available repository for
nucleotide sequence data
• Collaboration with GenBank and DDBJ for
data sharing
• It consolidates information from EMBL-Bank, the European Trace Archive
(containing raw data from electrophoresis-based sequencing machines) and the
Sequence Read Archive (containing raw
data from next-generation sequencing
platforms)
• Provides access to the whole scale of
sequencing information: from raw data,
through assembly and mapping
information, through to high-level functional
annotation (see figure).
Nucleotides: ENA
Download data
Navigate to view
related data, e.g.
taxon-specific
data
Other types of data include
SRA experiments
Genomes: Ensembl & Ensembl Genomes
• Genome browser providing free access to the complete sequences of
higher and model organisms
• With Ensembl you can:
  - Retrieve all or part of a genome sequence
  - Perform sequence alignment using BLAST or BLAT
  - Link to genome annotation from microarray results
  - View expressed mRNA, protein, etc. in a chromosomal region
  - View variations such as SNPs across strains or populations
  - View all alternative splicing for a gene
  - Explore homologues and phylogenetic trees across > 30 species
  - View conserved regions across species
• Ensembl Genomes extends to non-vertebrate genomes
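Retrieving part of a genome sequence can also be scripted. The sketch below only builds a request URL and sends nothing. It assumes the public Ensembl REST service at rest.ensembl.org and its sequence/region endpoint, neither of which is mentioned in these slides. The function name and the example coordinates are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the public Ensembl REST service (not given in the slides).
ENSEMBL_REST = "https://rest.ensembl.org"


def region_sequence_url(species: str, chromosome: str, start: int, end: int,
                        strand: int = 1) -> str:
    """Build a URL requesting the DNA sequence of one genomic region."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # Region syntax assumed here: chromosome:start..end:strand
    region = f"{chromosome}:{start}..{end}:{strand}"
    path = f"/sequence/region/{quote(species)}/{quote(region, safe=':.')}"
    # Ask for plain text rather than JSON.
    return f"{ENSEMBL_REST}{path}?{urlencode({'content-type': 'text/plain'})}"


# Illustrative coordinates only.
url = region_sequence_url("human", "X", 1000000, 1000100)
print(url)
```

Keeping URL construction separate from the network call makes the region syntax easy to check before any request is sent.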
Genomes: Ensembl
Genomic alignments
Chromosomes
Genes
Pick a genome
Synteny
Gene families
SNPs
Across species
Orthology
Within species
Genomes: Ensembl Genomes
Ensembl
Metazoa
Ensembl-like genome browser for non-vertebrate species
Ensembl Bacteria
Across species
View options
Using view options, you can select
to view only the current gene or
the entire expanded gene tree
Select Orthologue view to
see putative orthologues
Retrieving data with BioMart
• BioMart is a search engine that can be used to download
data into a table format
• Many EBI databases are powered by BioMart
• For example, you can use Ensembl BioMart to retrieve:
  - All the genes for one species
  - Or… only genes on one specific region of a chromosome
  - Or… genes on one region of a chromosome associated with an
    InterPro domain
  - Or… etc.
BioMart – how it works
First step: Choose a dataset
Second step: Add filters to define a gene set
Third step: Add attributes to determine column output
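The three steps above map onto the XML query document that BioMart services accept. The following is a minimal sketch that assembles such a query offline. The dataset, filter and attribute names are illustrative assumptions, not values taken from these slides, and `build_biomart_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string.

    dataset    -- step 1: the dataset to query
    filters    -- step 2: mapping of filter name -> value, narrowing the gene set
    attributes -- step 3: names of the columns wanted in the output table
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",          # tab-separated table output
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed example: genes in one region of one chromosome (names are illustrative).
xml_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "17", "start": 41000000, "end": 42000000},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

Filters narrow the rows that come back and attributes choose the columns, which is the same division the web interface presents.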
BioMart results
www.biomart.org
ArrayExpress & Atlas of Gene Expression
• ArrayExpress Archive is a public repository of functional
genomics experiments, including gene expression,
supporting scientific publications
• You can query it to retrieve experimental information and
download functional genomics data
• Atlas of Gene Expression contains a subset of curated
and re-annotated Archive data
• The Atlas can be queried for individual gene expression under
different biological conditions across experiments
Transcriptomes: ArrayExpress
ArrayExpress
Archive: browse
experiments
Expand results
Search by keyword
Spreadsheets
describing the
experiment, sample
properties or array
design
Transcriptomes: Atlas of Gene Expression
Atlas interface
Gene
summary page
Experiment page
Search by gene name or
biological condition
Protein sequence: UniProt
• Provides the scientific community with a comprehensive, richly curated,
high-quality and freely accessible resource of protein sequence and
functional information
• Users can perform simple and complex text-based queries, run
sequence-based searches, perform multiple sequence alignments, etc.
• Consists of:
  - UniProtKB/Swiss-Prot, manually annotated
  - UniProtKB/TrEMBL, computationally analyzed records
  - UniRef, clustered by sequence identity
  - UniParc, most comprehensive publicly available non-redundant
    protein sequence database, unannotated
  - UniMES, protein sequences from metagenomic and environmental data
UniProt text search for Brca1
Protein families, motifs & domains: InterPro
• Integrated documentation resource for protein families, domains and
functional sites
• Protein signatures from different member databases describing the
same biological protein family or domain are united into a single
InterPro entry containing information about the signature(s)
and links to the protein in UniProt
• Links to Gene Ontology indicate the biological function and process that
the proteins are involved in
Protein families, motifs and domains: InterPro
Compare methods of protein
signature prediction
Visualize the taxonomic range
for a protein signature
View architectures of proteins
containing a signature
Molecular interaction database: IntAct
• IntAct provides a freely available, open source database system and
analysis tools for protein interaction data
• All interactions are derived from literature curation or direct user
submissions
• With IntAct you can:
  - Find molecules that interact with your protein of interest
  - Display interaction networks
  - Analyze interaction networks using GO terms, molecule type, role, etc.
  - Download data
  - Install the IntAct system locally
The Protein Data Bank in Europe (PDBe)
• PDBe is a resource for the collection, organization and dissemination of data
about biological macromolecular structures
• A suite of web-based services is available:
  - PDBeView and PDBeLite provide a flexible and user-friendly query interface to the PDBe
    database
  - PDBeAnalysis provides searches and statistical analyses of macromolecular structure and
    residue information
  - PDBeFold performs pairwise or multiple comparisons as well as 3D alignments of
    structures
  - PDBeChem lets you search for and visualize any molecule in the PDB’s ligand dictionary
  - PDBePisa is an interactive tool for exploring macromolecular interfaces and surfaces,
    predicting probable quaternary structures (assemblies) and searching the PDB for structurally
    similar interfaces and assemblies
  - PDBeMotif allows complex searches of the PDB based on small 3D motifs, sequence motifs
    in conjunction with ligand environment, secondary structure patterns
  - Many more tools are available
Structures: PDBe
Sequence
mapping
Linking to
domain data
Ligands
Assemblies
Electron
density
visualization
Active sites
Fold matching
Surface
matching
PRoteomics IDEntifications database (PRIDE)
• PRIDE is a centralized, standards-compliant, public data repository for
proteomics data
• Provides the proteomics community with a public repository for protein
and peptide identifications together with the evidence supporting these
identifications
• PRIDE is also able to capture details of post-translational
modifications coordinated relative to the peptides in which they have
been found
Enzymes: IntEnz
• IntEnz (Integrated relational Enzyme database) is a freely
available resource focused on enzyme nomenclature
• IntEnz contains the recommendations of the Nomenclature Committee
of the IUBMB on the nomenclature and classification of
enzyme-catalysed reactions
Chemical entities: ChEBI
• ChEBI is a freely available, manually annotated database of small molecular
entities
• A molecular entity is any constitutionally or isotopically distinct atom,
molecule, ion, ion pair, radical, radical ion, complex, conformer, etc.,
identifiable as a separately distinguishable entity, not directly encoded by the
genome
• With ChEBI you can:
  - Find the correct chemical terminology using name, formula or registry number
  - Visualize chemical structures
  - Perform similarity searches
  - View the relationship between molecules using the ChEBI ontology
  - Bridge the gap between small molecules and the macromolecules they interact with
    (cross-links to UniProt and Reactome)
  - Download chemical structures
  - Submit new structures
Chemical entities: ChEBI
View mappings to other
databases such as
Reactome and UniProt
Download flat files,
database dumps and
the ChEBI Ontology
for local installation
View
relationships in
the ChEBI
Ontology
Link to other
databases
View structure,
nomenclature,
formula and more
Chemogenomics: ChEMBL
• ChEMBL is a publicly available database of drugs, drug-like small molecules
and their targets
• The data includes information about how small molecules bind to their
targets, how these compounds affect cells and whole organisms, and
information on the molecules’ absorption, distribution, metabolism, excretion
and toxicity
• ChEMBL holds two-dimensional structures, calculated molecular properties
(e.g. logP, molecular weight, Lipinski ‘Rule of Five’ parameters) and
bioactivity data (such as binding constants and pharmacology)
• The bioactivity data is tagged to show links between molecular targets and
published assays, with a set of varying confidence levels
• Additional data on the clinical progress of compounds is being integrated into
ChEMBL
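To make the 'Rule of Five' parameters concrete, here is a small sketch that counts violations using the commonly quoted Lipinski thresholds: molecular weight over 500 Da, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors. The `Compound` class and its field names are hypothetical and do not reflect ChEMBL's schema, and the property values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float      # molecular weight in daltons
    logp: float            # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria the compound breaks."""
    checks = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Made-up property values, for illustration only.
small = Compound("example-small", mol_weight=180.2, logp=1.2,
                 h_bond_donors=1, h_bond_acceptors=4)
large = Compound("example-large", mol_weight=812.0, logp=6.3,
                 h_bond_donors=3, h_bond_acceptors=12)
print(rule_of_five_violations(small), rule_of_five_violations(large))
```

Returning a count rather than a yes/no answer leaves the choice of cut-off (for example, tolerating one violation) to the caller.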
Chemogenomics: ChEMBL
ChEMBL
Pathways: Reactome
• A free, online, open-source curated database of pathways and
reactions in human biology
• Information in the database is authored by expert biologist
researchers, maintained by Reactome editorial staff
• Used to infer orthologous events in 22 non-human species including
mouse, rat, chicken, puffer fish, worm, fly, yeast
• Extensively cross-referenced to other resources, e.g. NCBI, Ensembl,
UCSC Genome Browser, UniProt, PubMed, KEGG, ChEBI and GO
Pathways: Reactome
View reactions and events in
detail
Select a
pathway
Compare events in
different species
Export pathway
Pathways: Reactome
Display expression data
Link to source
databases
Biological ontologies: Gene Ontology (GO)
• The GO project is a collaborative effort to address the need for
consistent descriptions of gene products in different databases
• GO develops ontologies that describe biological processes,
cellular components and molecular functions in a species-independent
manner
• Several of the EBI’s databases are also annotated with GO terms
User support
• 2Can bioinformatics user support – www.ebi.ac.uk/2Can
• Online help pages – www.ebi.ac.uk/help
• E-mail support – www.ebi.ac.uk/support
http://www.ebi.ac.uk/Information/Brochures/
Research
www.ebi.ac.uk/groups
Key facts about research
• The EBI provides a unique environment for bioinformatics
research
• Seven dedicated research groups aim to understand
biology through new approaches to interpreting biological
data
• Services teams also carry out R&D to enhance existing
services and develop new ones
• Research program complements services and the two
are mutually supportive
Research
• Functional genomics and small RNA analysis: Enright
• Vertebrate genome annotation: Flicek
• Literature analysis and semantic data integration in life science research: Rebholz-Schuhmann
• Algorithmic methods for genome analysis: Birney
• Transcriptome analysis on a genomic scale: Brazma
• Genome analysis using evolutionary tools: Goldman
• Evolutionary biology: Marioni
• Protein sequence analysis and functional annotation: Apweiler
• Analysis of protein structure, function and evolution: Thornton
• Analysis and validation of protein structures; protein–ligand interactions: Kleywegt
• Cheminformatics and metabolism: Steinbeck
• Chemogenomics and drug discovery: Overington
• Genome-scale analysis of regulatory systems: Luscombe
• Neurobiology networks and systems: Le Novère
• Systems Biomedicine: Saez-Rodriguez
• Mammalian stem cell differentiation and development: Bertone
Training
www.ebi.ac.uk/training
A tripartite user-training programme
• eLearning programme: training any time, anywhere, at any pace
www.ebi.ac.uk/training/elearning
• Bioinformatics Roadshow: training comes to you
www.ebi.ac.uk/training/roadshow
• Hands-on training at EMBL-EBI: hands-on user training on all our
core data resources for researchers
www.ebi.ac.uk/training/handson
Hands-on training for all levels of experience
• Interactive training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge
• Learn from the EBI’s experts through a combination of talks and
practical exercises
• Take a tour of all our core data resources, or focus in on specific
data types
• Full programme at www.ebi.ac.uk/training/handson
eLearning project – pilot phase
• Do you want to learn at your
own pace at a time that suits
you?
• We are developing a new
eLearning platform and need our
users to help us test it
• If you would like to get involved,
contact: [email protected]