Keynote for 2008 Genomics Workshop
cheap sequencing for regular
Joes and Janes: the demand for
more bioinformatics software
Gane Ka-Shu Wong: iCORE Chair in BioSystems Informatics,
The University of Alberta, Biological Sciences and Medicine;
also Associate Director of the Beijing Genomics Institute
the essence of the human genome
project and its offspring the human
HapMap project was really about
improving the core technologies of
genetics: sequencing, genotyping
2001, 2002, 2005, 2007
historical costs to sequence the
3 billion bp of a human genome
Gordon Moore
[chart: costs to sequence a human genome; y-axis log10 (US dollar), 4 to 10; x-axis year sequenced, 1989 to 2009; labeled values $3 billion, $300 million, and $300,000; annotations "two vendors" and "competition"]
Roche-454 (pyro)sequencing
1. fragment and denature
2. add adapters to both ends
3. one fragment per bead
4. emulsion PCR amplification
5. sequencing by synthesis
6. analyze image of bead array
chemiluminescent signal generation: dNTP incorporation releases PPi;
sulfurylase converts PPi to ATP; luciferase converts ATP to visible light
Illumina-Solexa sequencing
1. fragment, denature, and add adapters
2. bind randomly to primer lawn, perform bridge amplification
3. sequencing by synthesis, four-color labeled dNTPs
4. computer analysis of lawn image
in contrast to Roche-454, the Illumina-Solexa technology generates
multi-colored fluorescent signals on a randomly arrayed 2D surface
capillaries versus next generation
(massively parallel) DNA sequencing
                    PE-ABI (3730xl)            Roche-454 (FLX)              Illumina-Solexa (GA)
Sensitivity         Amplify before loading     Single molecule sensitivity  Single molecule sensitivity
DNA required        ng of DNA per read         μg of DNA per run            μg of DNA per run
Run time            One hour to run 96 reads   7.5 hrs run for 0.4M reads   2.5 days run for 80M reads
Lengths             1000 bp                    200 bp                       40 bp
Daily throughput    2.3 Mb                     0.25 Gb                      1.28 Gb
Accuracy            Higher                     Lower                        Higher
Costs per 1000 bp   $2                         $0.20                        $0.005
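The daily throughput figures in this comparison follow from reads per run, read length, and run time. A minimal sketch of that arithmetic, assuming each instrument runs back to back around the clock with no idle time; the function name and the dictionary layout are illustrative and not from the talk:

```python
# Daily throughput = reads per run x read length x (24 h / run time).
# Assumption: instruments run continuously with no idle time between runs.

def daily_throughput_bp(reads_per_run, read_length_bp, run_hours):
    """Return base pairs sequenced per 24-hour day."""
    return reads_per_run * read_length_bp * 24.0 / run_hours


# Values taken from the comparison slide (2.5 days = 60 hours).
platforms = {
    "PE-ABI (3730xl)": (96, 1000, 1.0),
    "Roche-454 (FLX)": (400_000, 200, 7.5),
    "Illumina-Solexa (GA)": (80_000_000, 40, 60.0),
}

for name, (reads, length, hours) in platforms.items():
    mb_per_day = daily_throughput_bp(reads, length, hours) / 1e6
    print(f"{name}: {mb_per_day:,.1f} Mb per day")
```

Under that assumption the results come out at about 2.3 Mb, 256 Mb, and 1.28 Gb per day, which matches the slide's figures to rounding.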
excitement about
Pacific Biosciences
is based on read
lengths of many kb,
albeit with lower
base pair accuracy
BGI Offers Next-Gen Sequencing Service: Kicks Off 100Genome Sequencing Project [8 January 2008]
Knome, BGI Forge Sequencing Alliance; GATC Spins Off
Personal Genomics Unit [January 15 2008]
Google
580,000 SNPs
BGI-Shenzhen
1 million SNPs
whole genome
BGI-Shenzhen and allies in the US and UK will be
sequencing 1000 human genomes in the next 3 years
Nature: 17 January 2008
Science: 25 January 2008
1000 human genomes will turn the
medical genetics world upside down
PHENOTYPE TO GENOTYPE
cystic fibrosis: CFTR disease affects less than a percent of population
breast cancer: BRCA1+BRCA2 genes affect only a few percent of patients
GENOTYPE TO PHENOTYPE
functional polymorphisms identified in 1000 individuals, linked to disease by
association studies; information of value to policy makers in public health
Rommens JM, … Tsui L-C, Collins FS (1989).
Identification of the cystic fibrosis gene:
chromosome walking and jumping. Science 245:
1059-1065.
Riordan JR, … Collins FS, Tsui L-C (1989).
Identification of the cystic fibrosis gene: cloning
and characterization of complementary DNA.
Science 245: 1066-1073.
Kerem B, … Tsui L-C (1989). Identification of
the cystic fibrosis gene: genetic analysis. Science
245: 1073-1080.
8 September 1989
after 19 years (and 1000 genes) we have not cured a genetic disease
Maynard, I just decided that I hate
your generation. You made all those
promises about the human genome
sequence improving health care, but
my generation will have to deliver.
That’s right. One of these
days one of you will have to
actually cure something!
Prof. Maynard Olson
YanHuang and the panda genome
(raising awareness for the new technologies)
Emperors Yan and Huang were
the first rulers of ancient China,
so modern Chinese say that they
are descendants of YanHuang.
The panda is a Chinese national treasure
and the logo for the World Wildlife Fund.
While not the first endangered species to
be sequenced (chimp was first), it will be
the first with a conservation focus.
Whole genome shotgun assembly is nontrivial for 45 bp reads even with paired end
information and 50x redundancy.
YanHuang genome: data collected
YanHuang genome: read coverage
YanHuang genome: SNP accuracy
YanHuang genome: disease alleles
aftermath of 12 May 2008 earthquake in
Sichuan measuring 7.9 on the Richter scale
our plans for the panda genome
(whole genome assembly using short reads)
50x of paired end data using Solexa
average read lengths 40~50 bp
estimated scaffold sizes 10~100 kbp
anchored by synteny to human
first assembly by end of August ’08
graph and overlap layout based
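To give a sense of scale for a 50x short-read assembly, the number of reads needed is roughly coverage times genome size divided by read length. A rough sketch follows. The panda genome size is not given in the talk, so the 3 billion bp human figure quoted earlier stands in as an assumed placeholder. The 80M reads per run is the Illumina-Solexa figure from the platform comparison, assumed here to hold at 45 bp as well. The function names are hypothetical.

```python
# Back-of-envelope read count for a target coverage:
#   reads = ceil(coverage * genome_size / read_length)

def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads required to reach the given average coverage."""
    total_bases = genome_size_bp * coverage
    return -(-total_bases // read_length_bp)  # integer ceiling division


def runs_needed(n_reads, reads_per_run):
    """Number of instrument runs required to collect n_reads."""
    return -(-n_reads // reads_per_run)  # integer ceiling division


GENOME_SIZE_BP = 3_000_000_000  # ASSUMPTION: human-sized genome as a placeholder
COVERAGE = 50                   # from the slide: 50x of paired end data
READ_LENGTH_BP = 45             # from the slide: reads of 40~50 bp
READS_PER_RUN = 80_000_000      # from the platform comparison; assumed at 45 bp

n_reads = reads_needed(GENOME_SIZE_BP, COVERAGE, READ_LENGTH_BP)
n_runs = runs_needed(n_reads, READS_PER_RUN)
print(f"reads needed: {n_reads:,}")
print(f"runs needed:  {n_runs}")
```

With these placeholder inputs the estimate is on the order of a few billion reads and a few dozen runs. A different genome size changes the result proportionally.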
molecular censusing doubles giant panda
population estimate in a key nature reserve
Zhan X, Li M, Zhang Z, Goossens B, Chen Y, Wang H, Bruford MW, Wei F.
Curr Biol. 2006 Jun 20; 16(12): R451-2
redo experiments on more
comprehensive population
from every panda reserve,
and with 1536 SNPs rather
than just 9 microsatellites
expressed gene sequences of 1000
medicinal plants for only $2 million
There are 96 plant species with more than
20,000 expressed sequence tags (ESTs),
but most are crop plants. If we count only
medicinal plants, generously defined to
include makers of secondary metabolites
with purported health benefits, such as
lycopene for tomatoes and resveratrol for
grapes, there are 16 plant species with
more than 20,000 ESTs. If we use a strict
definition of medicinal, there are just 4
plant species with more than a mere 5000
ESTs. They are artemisia, Madagascar
periwinkle, ginkgo, and ginseng.
1/1000 of the proposed data has
launched the field of phylogenomics
10 April 2008 – 40 Mb total
from ESTs in 29 animals
27 June 2008 – 5.4 Mb total
from genome of 169 birds
artemisinin: poster child for the
synthetic biology investment world
synthesized by Jay Keasling
$40M from Gates foundation
Amyris is now into biofuels
$600M to Berkeley university
CYP71AV1 by
x-species EST
in leaves of sweet wormwood
FPP pathway
most effective anti-malarial
a proposal to crowd source the
writing of bioinformatics software
OPEN SOURCING’s classic example is Linux
sophisticated software (e.g. comparable sophistication in
bioinformatics is whole genome shotgun assembly) that was
developed by a small handful of talented programmers
CROWD SOURCING alternative is Wikipedia
millions of contributors each writing a small article on a
specific topic; similar to much (but not all) of bioinformatics
as it does not require PhDs and can be done by students
who will work for free and how
would we incentivize them to do so
[diagram labels: biologist with data to analyze; technical specification of issues; talented bioinformatics student; contributions recorded on website open to prospective employers]
young people need a chance to prove themselves; we will provide a web
based mechanism for them to do so, on a high profile international scale
Alberta and China: where is this
happening and who is paying for it
BGI – Jian Wang, Jun Wang, Huanming Yang, Jun Yu
UofA Biological Sciences – Michael Deyholos
UofA Medicine – Andrew Mason, Richard Fedorak, Lorne Tyrrell
UofA Computing Science – Paul Lu, Guohui Lin
Research funding from the Alberta Informatics Circle of Research
Excellence and the Government of Shenzhen
Additional support from UofA Biological Sciences