related to Myers et al and sequencing/assembly

Welcome to
Introduction to Bioinformatics
Wednesday, 10 February
Genome Sequencing/Assembly
This demonstration is best viewed as a slide show, enabling you to simulate a session and making changes in cursor position more obvious.
To do this, click Slide Show on the top tool bar, then View show.
Click anywhere to go on to the next slide.
What to do for summer vacation?
Deadline, SUNday Feb 28!
Target, Monday Mar 1!
Deadline, ???
Deadline, FRIday Feb 26!
Global Viral Genome Project
Deadline, whenever!
Learn more about…
HHMI: http://www.vcu.edu/csbc/hhmi/
BBSI: http://www.vcu.edu/csbc/bbsi/
VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm
GVGP: http://biobike.csbc.vcu.edu (News)
Myers et al SQ2
What is the sequence (5' to 3') represented by the gel?
G A T C
Dideoxy sequencing
(= Sanger sequencing)
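Reading a dideoxy gel can be sketched in a few lines of Python. The band lengths below are invented for illustration and are not the bands of the gel in the study question. The sketch assumes the usual setup: each band in a lane is a fragment ending in that lane's dideoxy nucleotide, and shorter fragments (nearer the primer) give the 5' end of the newly synthesized strand.

```python
def read_gel(lanes):
    """Return the newly synthesized strand, 5' to 3', from band lengths per lane.

    lanes maps a lane label (the dideoxy nucleotide used) to the lengths,
    in nucleotides past the primer, of the fragments seen in that lane.
    """
    bands = []
    for base, lengths in lanes.items():
        for length in lengths:
            bands.append((length, base))
    # Shortest fragment = closest to the primer = 5' end of the new strand.
    bands.sort()
    return "".join(base for _, base in bands)


# Hypothetical gel, invented for this example.
example_lanes = {
    "G": [1, 6],
    "A": [2, 5],
    "T": [3],
    "C": [4, 7],
}
print(read_gel(example_lanes))  # GATCAGC
```

The strand read this way is the one made by the polymerase; the template is its reverse complement.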
Myers et al SQ2
What is the sequence (5' to 3') represented by the gel?
G A T C
Figure: sequencing gel with lanes G, A, T, C; five fragments are labeled ddC. Letters recovered from the figure, in extraction order: T C G T G T A C A T C G T A A C A C G G T T A A G T
Sequencing process
Sequence it
Drosophila genome
(~100 million nt)
Technical limitation
Reads limited to 100’s of nt
How many possible 500 nt fragments are there?
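One way to approach this question is to count start positions. The sketch below assumes, purely for simplicity, that the genome is a single linear molecule and that only one strand is counted; neither assumption comes from the slides.

```python
GENOME_LENGTH = 100_000_000  # ~100 million nt, from the slide
READ_LENGTH = 500            # nt


def possible_fragments(genome_length, fragment_length):
    """Count the distinct start positions for a fragment of the given length."""
    if fragment_length > genome_length:
        return 0
    return genome_length - fragment_length + 1


print(possible_fragments(GENOME_LENGTH, READ_LENGTH))  # 99999501
```

In other words, there are nearly as many possible fragments as there are nucleotides.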
SAMPLE
How many 500 nt samples are needed to cover 100 million nt?
100 000 000 / 500 = 1 000 000 / 5 = 200 000
Is this enough?
Oversampling … coverage?
Study Question 8 & 9
"oversampling"? "coverage"?
Shotgun sequencing?
Paint the wall
Wall: 40 " x 25 "
Paint ball: 1 sq "
1000 paint balls?
How long will this take?
Figure: Completeness (vertical axis, 0 to 1) plotted against Oversampling (horizontal axis, 0 to 10)
How much is painted with 1x oversampling?
What fraction won't be painted?
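The paint-ball questions can be explored with a short simulation. This is a sketch under stated assumptions, not the lecture's own calculation: the 40 " x 25 " wall is treated as 1000 cells of 1 sq " each, and every paint ball lands on exactly one cell chosen uniformly at random. The function names are made up for this example.

```python
import random


def expected_painted(cells, shots):
    """Expected fraction of cells hit at least once by `shots` random shots."""
    return 1 - (1 - 1 / cells) ** shots


def simulate_painted(cells, shots, seed=0):
    """Fire `shots` paint balls at random cells; return the fraction painted."""
    rng = random.Random(seed)
    painted = {rng.randrange(cells) for _ in range(shots)}
    return len(painted) / cells


WALL_CELLS = 40 * 25  # 1000 one-square-inch cells (assumed grid)

# Print oversampling, expected completeness, simulated completeness.
for oversampling in (1, 2, 4, 6, 8, 10):
    shots = oversampling * WALL_CELLS
    print(oversampling,
          round(expected_painted(WALL_CELLS, shots), 4),
          round(simulate_painted(WALL_CELLS, shots), 4))
```

Under these assumptions, 1x oversampling (1000 paint balls) is expected to paint about 0.63 of the wall, leaving about 0.37 unpainted.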
Intersection of possibilities
(Rule of multiplication)
Probability that two coins come up both tails
Rule of multiplication: intersection, independent

                    Second coin toss
                    H       T
First coin toss  H  HH      HT
                 T  TH      TT

Gets T from first AND gets T from second
P(TT) = 1/2 x 1/2 = 1/4
Union of possibilities
(Rule of addition)
Probability that either of two coins comes up tails
1/2 x 1/2 = 1/4?
1/2 + 1/2 = 1?
Rule of addition: union, mutually exclusive

                    Second coin toss
                    H       T
First coin toss  H  HH      HT
                 T  TH      TT

Gets HT or TH or TT
P(at least 1 T) = 1/4 + 1/4 + 1/4 = 3/4
Union of possibilities
(Rule of complementation)
Probability that either of two coins comes up tails = 1 - probability that neither does
Rule of complementation: yin-yang, adds to 1

                    Second coin toss
                    H       T
First coin toss  H  HH      HT
                 T  TH      TT

Probability(2 T) = 1 - Probability(NOT 2 T)
P(at least 1 T) = 1 - P(HH) = 1 - 1/4 = 3/4
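The three rules can be checked by listing the four equally likely outcomes of two fair coin tosses. The sketch below is only a verification aid; the helper name `prob` is made up for this example.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT, equally likely
p_each = Fraction(1, len(outcomes))


def prob(event):
    """Sum the probabilities of the outcomes for which event(outcome) is true."""
    return sum(p_each for o in outcomes if event(o))


# Rule of multiplication: independent tosses, both tails.
p_tt = prob(lambda o: o == ("T", "T"))
assert p_tt == Fraction(1, 2) * Fraction(1, 2)

# Rule of addition: HT, TH, TT are mutually exclusive, so their probabilities add.
p_at_least_one_t = prob(lambda o: "T" in o)
assert p_at_least_one_t == 3 * p_each

# Rule of complementation: "at least one T" is everything except HH.
p_hh = prob(lambda o: o == ("H", "H"))
assert p_at_least_one_t == 1 - p_hh

print(p_tt, p_at_least_one_t)  # 1/4 3/4
```

Adding 1/2 + 1/2 fails because "first coin is T" and "second coin is T" are not mutually exclusive: the outcome TT would be counted twice.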
Sequencing process
Drosophila genome
(~100 million nt)
...
Focus on one nucleotide…
What’s the probability that it’s covered by one read?
What’s the probability that it’s covered by two reads?
What’s the probability that it’s covered by 200,000 reads?
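The multiplication and complementation rules give a direct way to answer these questions. The sketch below reads "covered by N reads" as "covered by at least one of N reads" and assumes that reads are 500 nt long, start at positions chosen uniformly and independently, and that effects at the ends of the genome can be ignored. These are simplifying assumptions, not statements from the slides.

```python
import math

GENOME_LENGTH = 100_000_000  # nt, from the slide
READ_LENGTH = 500            # nt


def p_covered(n_reads, genome_length=GENOME_LENGTH, read_length=READ_LENGTH):
    """Probability that one chosen nucleotide lies in at least one of n_reads reads."""
    p_one_read = read_length / genome_length       # a single read covers it
    p_missed_by_all = (1 - p_one_read) ** n_reads  # rule of multiplication
    return 1 - p_missed_by_all                     # rule of complementation


for n in (1, 2, 200_000, 1_000_000):
    print(n, p_covered(n))

# Coverage c = n_reads * read_length / genome_length. For a large genome
# the uncovered fraction is close to exp(-c).
c = 200_000 * READ_LENGTH / GENOME_LENGTH
print(c, 1 - math.exp(-c))
```

With 200,000 reads (1x coverage) the chosen nucleotide is covered with probability of about 0.63, the same figure as for the wall painted with 1x oversampling.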
Problem Set 3, Problem 2
Statistics of mini-plasmid assembly
Myers et al SQ6
Why read pairs? Scaffolds?
Figure: Contig 1 and Contig 2 positioned on the DNA
Figure: plasmid with a ~2000 nt insert (x 1000's); primers at either end of the insert give a pair of reads (mates)
Figure: Bacterial Artificial Chromosome / P1-derived Artificial Chromosome, ~150,000 nt, with mates read from either end
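One way to see why read pairs help is to count the mate pairs that span two contigs: if the two mates of an insert land in different contigs, those contigs are probably neighbors, roughly an insert length apart. The sketch below illustrates that idea only. It is not the assembler described by Myers et al, and the read names, contig assignments, and the function `contig_links` are invented for this example.

```python
from collections import Counter


def contig_links(read_to_contig, mate_pairs):
    """Count mate pairs whose two reads were assembled into different contigs."""
    links = Counter()
    for read_a, read_b in mate_pairs:
        contig_a = read_to_contig.get(read_a)
        contig_b = read_to_contig.get(read_b)
        if contig_a is None or contig_b is None or contig_a == contig_b:
            continue  # unplaced read, or both mates inside one contig
        links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


# Hypothetical data, invented for this example.
read_to_contig = {"r1": "Contig 1", "r2": "Contig 2",
                  "r3": "Contig 1", "r4": "Contig 1",
                  "r5": "Contig 1", "r6": "Contig 2"}
mate_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]

print(contig_links(read_to_contig, mate_pairs))
# Counter({('Contig 1', 'Contig 2'): 2})
```

Several independent links between the same two contigs make the connection more believable than a single link, which could come from a chimeric clone or a misplaced read.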
Myers et al (2000)
SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:
a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."
b. ". . .trillions of overlaps between reads are examined."
c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."