Biological databases-Intro


Integration and analysis of
multi-type high-throughput
data for biomolecular
knowledge discovery
Dr. Erik Bongcam-Rudloff
SGBC-SLU
Uppsala, Sweden
Biologists' modus operandi
Observing a phenomenon that is in some way
interesting or puzzling.
Making a guess as to the explanation of the
phenomenon.
Devising a test to show how likely this explanation is
to be true or false.
Carrying out the test, and, on the basis of the results,
deciding whether the explanation is a good one or
not. In the latter case, a new explanation will (with
luck) 'spring to mind' as a result of the first test.
http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress2.html
The observed phenomenon
Selection of test times
But what is the real event?
Sometimes you could be lucky
Positive
Next Generation techniques
New challenges
[Figure: Gbases produced at Sanger; 1 TB data]
World NGS Map
http://omicsmaps.com/
But this is wonderful! Or is it?
A sequence without knowledge connected to it is worth: 0
The deluge of data produced by these hordes of
machines worldwide demands automatic workflows
Completely new systems to move data around
Storage for never-before-seen amounts of data
Machines with gigantic amounts of RAM
COSTS
PROBLEMS
Nomenclature
Publishing culture
Moving-target development
Old ways of working and resistance to cultural change
Publishing culture as an example
We get taxpayers' money, we pay publishers to
publish, and the publishers sell the articles and obtain the
copyright
To connect knowledge to sequences we need
automatic methods, workflows and text mining. Most of
this is limited by closed database systems. The only
thing openly available is PubMed, but PubMed has only short
abstracts: no information about conditions, materials and methods (M&M), etc.
We need to change this culture
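As a minimal sketch of the one open, automatable route mentioned above, the snippet builds a search URL for NCBI's E-utilities `esearch` endpoint on PubMed. The helper name `pubmed_search_url` and the example query are hypothetical, and the code only constructs the URL; actually fetching it needs network access and should follow NCBI's usage guidelines. What such a query returns is a list of PubMed IDs pointing at abstracts, which is exactly the limitation described above.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint: searches a database and returns matching IDs.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL for PubMed (hypothetical helper; sends no request)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # free-text / Boolean query
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # request JSON instead of the default XML
    }
    # urlencode percent-encodes values and turns spaces into '+'.
    return f"{ESEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative query only; the term is an assumption, not from the slides.
    print(pubmed_search_url("gut microbiota AND insulin resistance"))
```

Keeping URL construction separate from fetching makes the query easy to inspect and reuse in a larger automated workflow.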
The BLAST analogy...
By far the most-used tool among biologists
It would not be possible if the databases were not Open Access and
freely searchable
Imagine if Nucleotide and Protein databases followed
the life science publishing model
BLAST
Human-centric
What about all other areas of the Life Sciences?
Most genes are named by sequence similarity, but
are the functions the same?
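Naming by similarity usually rests on a score such as percent identity over an alignment (for example, a BLAST hit). The sketch below computes it for two already-aligned sequences; the function name, the gap character `-`, and the choice to count gap columns in the denominator are assumptions (other definitions exist), and the toy sequences are made up. A high score says the sequences are alike; it does not by itself say the functions are the same, which is the point of the question above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Assumed definition: identical, non-gap columns divided by the total
    number of alignment columns, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # Count columns where both residues match and neither is a gap.
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy alignment: 5 identical columns out of 6.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```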
Microbiome
A microbiome is the
totality of microbes, their
genetic elements
(genomes), and
environmental interactions
in a particular
environment.
http://www.secondgenome.com
Fat and lean
Metabolic effects of transplanting gut microbiota from
lean donors to subjects with metabolic syndrome.
A. Vrieze et al, EASD abstracts, 24 September 2012.
The result was: Lean donor faecal infusion improves
hepatic and peripheral insulin resistance as well as
fasting lipid levels in obese individuals with the
metabolic syndrome
Genome sizes
How many species?
Several orders of magnitude:
Some estimates:
3-50 million species of arthropods
1-100 million species of nematodes
Only a portion of bacterial species have been identified; 99%
of bacteria cannot be cultured.
“Once the diversity of the microbial world is
catalogued, it will make astronomy look like a pitiful
science”
Julian Davies, Professor Emeritus. UBC
New research strategies
Microbial
Livestock
Plants
Typical sources of metagenomic samples
Soil samples
Sea water samples
Air samples
Medical samples
Farm animal samples
Ancient bones
Human microbiome
Ion Proton: "Personal Genome Machine".
Real tests of transcriptome sequencing on the Proton.
Using 500 ng of input poly-A RNA, it was possible to
generate 50 million reads from a melanoma cancer
sample.
Joe Boland of the National Cancer Institute, according to GenomeWeb.
LIFE TECHNOLOGIES CORPORATION
Oxford Nanopore
http://www.nanoporetech.com/
High technology everywhere!
New applications
Only imagination will limit what can be done
using Next Generation Technologies!
The big challenge:
Open Access, Open source, collaborative networks
Data sharing
Common language
Tool systems to glue it all together!
SeqAhead
COST Action BM1006: Next Generation Sequencing Data Analysis Network, 2011-2014
COST Action, 25 countries
http://www.seqahead.eu/
ALLBIO
10 partners, 8 countries
FP7 project
Broadening the Bioinformatics Infrastructure to unicellular, animal, and plant science
www.allbioinformatics.eu
THANKS!!