Lecture Transcript
What Is Genomics?
Genomics is the study of how the entire genome of a species functions as a unit and evolves over time.
It is the study of life’s blueprint, life’s diversity, and life’s history.
Bioinformatics: Analyses the information content of genomes.
Comparative Genomics: Compares genome sequences with each other to infer evolutionary relationships and mechanisms of evolution.
Functional Genomics: Probes how genomes function, as a whole, to give rise to organisms.
Ecological Genomics: Examines how genomes, and the organisms they encode, fill specific environmental niches.
>500 complete microbial genomes; 730 in progress
Why Sequence Whole Genomes?
• To speed characterization of genes mapped by linkage
• To obtain a "parts list" for what makes up an organism.
• To discover what sets of genes make organisms (and each of us) similar to and different from one another.
• To understand our evolutionary heritage. Our genomes are a reflection of our recent and ancient origins.
Dideoxy “chain termination” DNA sequencing: Sanger v1, continued
Dideoxy “chain termination” DNA sequencing: Sanger v2!!
Fred Sanger, SECOND Nobel Prize in 1980 (Chemistry; his first was in 1958 for methods for determining amino acid sequences in proteins).
Easily automated with robots.
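The chain-termination idea can be sketched in a few lines of Python. This is an idealized model, assuming every position of the newly synthesized strand is terminated in at least one molecule and ignoring the primer itself: each ddNTP yields fragments that end at that base, and ordering all fragments by length reads off the sequence. The function names are hypothetical, for illustration only.

```python
from collections import defaultdict


def chain_termination_fragments(new_strand: str) -> dict[str, list[int]]:
    """Idealized Sanger reaction: map each ddNTP (by base) to the lengths of the
    fragments it terminates. Lengths are counted from the primer end."""
    lanes: dict[str, list[int]] = defaultdict(list)
    for length, base in enumerate(new_strand, start=1):
        # A molecule that picks up the dideoxy form of `base` here stops growing,
        # leaving a fragment of this length that ends in `base`.
        lanes[base].append(length)
    return dict(lanes)


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all fragments from shortest to longest and read each terminal base,
    as when reading four gel lanes (v1) or one dye-terminator trace (v2)."""
    fragments = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in fragments)


if __name__ == "__main__":
    lanes = chain_termination_fragments("TGCAGCT")
    print(lanes)            # {'T': [1, 7], 'G': [2, 5], 'C': [3, 6], 'A': [4]}
    print(read_gel(lanes))  # TGCAGCT
```

In v1 the four ddNTP reactions run in separate lanes; in v2 each terminator carries its own dye, so one reaction and one ordered readout suffice, which is what made the method easy to automate.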
High-throughput DNA sequencing has allowed the sequencing of whole genomes.
This has driven the genomics revolution.
The Next Gen Technologies: Pyrosequencing (454, Roche)
Workflow (figure): DNA sheared and made single-stranded → oil emulsion with 1 template per bead → polony: clonal amplification on each bead → polony beads + enzyme beads → picoliter plate
First of the ‘parallel’ sequencing platforms.
Sequencing-by-synthesis
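Pyrosequencing
Template 3’-ATCGTTGCACGTCGACGTA; primer 5’-TAGCAACG
A nucleotide is flowed in (dGTP in the figure); if it is incorporated, pyrophosphate (PPi) is released
PPi → ATP (by ATP sulfurylase) → light (by luciferase)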
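To make the reaction concrete, here is a minimal Python sketch of an idealized pyrosequencing flowgram. It assumes nucleotides are flowed one at a time in a repeating order (TACG here, an assumption) and that the light from each flow is exactly proportional to the number of bases incorporated; real signals are noisy. The function names are hypothetical.

```python
# Pairing rules used to derive the strand the polymerase builds on the template.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
FLOW_ORDER = "TACG"  # assumed nucleotide flow order, repeated cyclically


def synthesized_strand(template_3_to_5: str) -> str:
    """Bases added 5'->3' opposite a template written 3'->5'."""
    return "".join(PAIRS[b] for b in template_3_to_5)


def pyrogram(strand: str, flow_order: str = FLOW_ORDER) -> list[tuple[str, int]]:
    """Idealized flowgram: for each nucleotide flow, how many bases are added.
    The light signal is proportional to this count, so a homopolymer run of
    n bases gives one peak roughly n times as bright."""
    if any(b not in flow_order for b in strand):
        raise ValueError("strand contains a base that is never flowed")
    flows: list[tuple[str, int]] = []
    i = 0
    while i < len(strand):
        nucleotide = flow_order[len(flows) % len(flow_order)]
        run = 0
        # The polymerase extends through every consecutive matching base.
        while i < len(strand) and strand[i] == nucleotide:
            run += 1
            i += 1
        flows.append((nucleotide, run))
    return flows


def call_bases(flows: list[tuple[str, int]]) -> str:
    """Turn (nucleotide, count) pairs back into sequence."""
    return "".join(nucleotide * count for nucleotide, count in flows)


if __name__ == "__main__":
    template = "ATCGTTGCACGTCGACGTA"  # 3'->5', from the figure
    primer = "TAGCAACG"               # pairs with the first 8 template bases
    assert synthesized_strand(template[: len(primer)]) == primer
    print(synthesized_strand(template[len(primer):]))  # TGCAGCTGCAT
    print(pyrogram("TTTAGG"))  # [('T', 3), ('A', 1), ('C', 0), ('G', 2)]
```

Because a run of identical bases shows up as a single, brighter peak, telling 5 identical bases from 6 depends on measuring light intensity precisely, which is why this chemistry struggles with homopolymer repeats.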
16-20 Mb of sequence in a 4-hr run!
So a 125 Mb genome takes only ~6-8 runs for 1X coverage (125 ÷ 20 ≈ 6; 125 ÷ 16 ≈ 8)!
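The runs-per-coverage estimate is simple division. The sketch below takes coverage as total bases sequenced divided by genome size and rounds up, since a partial run still costs a whole run; it ignores failed reads, duplicates, and uneven sampling (all simplifying assumptions). The function name is hypothetical.

```python
import math


def runs_for_coverage(genome_mb: float, coverage: float, yield_mb_per_run: float) -> int:
    """Whole runs needed so that total sequence >= coverage x genome size."""
    return math.ceil(genome_mb * coverage / yield_mb_per_run)


if __name__ == "__main__":
    for per_run in (16, 20):  # Mb per run, the range quoted for 454
        print(per_run, "Mb/run ->", runs_for_coverage(125, 1, per_run), "runs")
    # 16 Mb/run -> 8 runs
    # 20 Mb/run -> 7 runs
```

Assembly needs many-fold coverage, not 1X, and the run count scales linearly with the `coverage` argument.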
The Next Gen Technologies: Illumina (Solexa)
Sequencing by synthesis, one base at a time
Each base carries a different fluorescent dye (color)
Each base carries a reversible terminator, so only one base is added per cycle
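Since each base has its own color and only one base is added per cycle, base calling can be sketched as "pick the brightest channel each cycle". This is a deliberately simplified Python model: the channel order, the confidence score, and the intensity values are hypothetical, and real base callers also correct for dye crosstalk and for clusters falling out of phase.

```python
CHANNELS = ("A", "C", "G", "T")  # hypothetical order: one dye (color) per base


def call_cycle(intensities: tuple[float, float, float, float]) -> tuple[str, float]:
    """Call one cycle: the brightest channel wins. Confidence is that channel's
    share of the total signal (a hypothetical, simplified score)."""
    best = max(range(len(CHANNELS)), key=lambda k: intensities[k])
    total = sum(intensities)
    confidence = intensities[best] / total if total > 0 else 0.0
    return CHANNELS[best], confidence


def call_read(cycles: list[tuple[float, float, float, float]]) -> str:
    """One cycle = one base, because the reversible terminator blocks further
    extension until the dye is imaged and the terminator is removed."""
    return "".join(call_cycle(c)[0] for c in cycles)


if __name__ == "__main__":
    # Made-up intensities for one cluster over four cycles (A, C, G, T channels).
    cluster = [
        (0.1, 0.2, 0.1, 0.9),
        (0.1, 0.1, 0.8, 0.2),
        (0.1, 0.7, 0.2, 0.1),
        (0.2, 0.1, 0.1, 0.9),
    ]
    print(call_read(cluster))  # TGCT
```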
Illumina workflow (figure): sample DNA preparation → single-molecule array → cluster growth → sequencing → image acquisition → base calling
Example read: T G C T A C G A T …
5 million clusters / channel; 8 channels / flow cell
= 40 million reads × ~30-35 cycles (bases)
= ~1.2-1.4 billion bases (1.2-1.4 Gb) in 3 days
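Multiplying out the flow-cell figures quoted for the Illumina run (5 million clusters per channel, 8 channels) is a quick sanity check on throughput. The sketch assumes one read per cluster and one base per cycle, so read length equals the cycle count; the function name is hypothetical.

```python
def run_yield(clusters_per_channel: int, channels: int, cycles: int) -> tuple[int, int]:
    """Reads = one per cluster; bases = reads x cycles (one base per cycle)."""
    reads = clusters_per_channel * channels
    return reads, reads * cycles


if __name__ == "__main__":
    for cycles in (30, 35):
        reads, bases = run_yield(5_000_000, 8, cycles)
        print(f"{cycles} cycles: {reads:,} reads, {bases / 1e9:.1f} Gb")
    # 30 cycles: 40,000,000 reads, 1.2 Gb
    # 35 cycles: 40,000,000 reads, 1.4 Gb
```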
Technology Development drives Biology
Feature                         | Sanger shotgun      | 454                                                | Solexa
Cloning                         | Yes                 | No                                                 | No
Chemistry                       | Sanger              | Pyrosequencing                                     | Reversible terminators
DNA bases per $                 | 1,000               | 50,000                                             | 500,000
Human genome, 1X!! ($)          | 3,000,000           | 60,000                                             | 6,000
Time                            | A really long time  | 4 hours                                            | 4 days
Accuracy                        | 99.99% (consensus)  | Worst of the 3; not good with homopolymer repeats | 1-3% errors
Assembly (computational tools)  | Best                | OK                                                 | Horrible
Gap closure and finishing       | Pain                | Pain in the rear                                   | ROYAL pain in the rear
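The "Human (1X)" cost row follows directly from the bases-per-$ row. The sketch below reproduces it, assuming a ~3 Gb haploid human genome (an approximation); because cost is linear in coverage, the `coverage` argument shows how quickly the bill grows at the many-fold depth needed for assembly. Names are hypothetical.

```python
HUMAN_GENOME_BASES = 3_000_000_000  # ~3 Gb haploid human genome (approximation)

# DNA bases per dollar, from the comparison table.
BASES_PER_DOLLAR = {"Sanger shotgun": 1_000, "454": 50_000, "Solexa": 500_000}


def sequencing_cost(genome_bases: int, coverage: float, bases_per_dollar: int) -> float:
    """Dollars to sequence `coverage` x the genome at a given bases-per-dollar rate."""
    return genome_bases * coverage / bases_per_dollar


if __name__ == "__main__":
    for platform, rate in BASES_PER_DOLLAR.items():
        print(platform, f"${sequencing_cost(HUMAN_GENOME_BASES, 1, rate):,.0f}")
    # Sanger shotgun $3,000,000
    # 454 $60,000
    # Solexa $6,000
```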
First, we've learned that we have a lot to learn:
Over 35% of genes in ANY organism (including humans) have no deducible function!