Mouse Genome Sequencing
10/15/2002
Wei Yuan
---Sequencing strategy and the Physical Map
References:
1) Gregory, et al. A physical map of the mouse genome. Nature, 2002, 418, 743-750.
2) Green, ED. Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics, 2001, 2, 573-583.
First Part
Strategies for the systematic sequencing
of complex genomes
Human
Why Mouse?
1) Human and mouse genomes
have conserved blocks of
genetic material
Mouse
Source: Lisa Stubbs, Lawrence Livermore National Lab
Why Mouse?
Humans and mice
share many of the
same genes
Mouse- Human Comparative Map
(2 cM around Acrb gene)
http://www.informatics.jax.org
Why Mouse?
Humans and mice
suffer from similar
diseases
Species Concordance for Susceptibility Alleles for Hypertension
Hypertension QTL in Mouse & Human (B. Paigen & G. Churchill)
Strategies for Sequencing Complex Genomes
• clone-by-clone shotgun sequencing
• whole-genome shotgun sequencing
• hybrid strategies for shotgun sequencing
Contig: an overlapping series of clones or sequence reads (for
a clone contig or sequencing contig, respectively) that
corresponds to a contiguous segment of the source genome.
Two main shotgun-sequencing strategies
clone-by-clone
whole-genome
For clone-by-clone, a sequence-ready BAC contig map is required
Sequence-ready BAC contig map. A collection of overlapping
bacterial artificial chromosome (BAC) clones that contain human DNA
was subjected to restriction enzyme digest-based fingerprint analysis.
The resulting data was analysed using the program FPC, which
constructed the depicted BAC contig map that spans >1 Mb.
Minimal Tiling Path: a minimal set of overlapping clones that together provides
complete coverage across a genomic region. (The 11 clones outlined in red, which
provide a minimal tiling path across the corresponding genomic region, were selected
for sequencing.)
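The idea of a minimal tiling path can be sketched as a greedy interval cover. This is only an illustration, not the procedure used to pick the 11 clones in the figure: the `(name, start, end)` clone layout and the function name `minimal_tiling_path` are assumptions, and real clone selection also requires a minimum overlap between neighbours, which is ignored here.

```python
from typing import List, Tuple

# Hypothetical clone record: (name, start, end) in map coordinates.
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], region_start: int,
                        region_end: int) -> List[Clone]:
    """Greedy interval cover: at each step pick, among the clones that start
    at or before the position covered so far, the one reaching furthest right."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Scan every not-yet-seen clone that starts inside the covered part.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


if __name__ == "__main__":
    demo = [("a", 0, 100), ("b", 50, 180), ("c", 90, 200),
            ("d", 150, 300), ("e", 190, 320)]
    print([name for name, _, _ in minimal_tiling_path(demo, 0, 320)])
```

The greedy choice yields the smallest number of clones for a one-dimensional cover; clones that are fully contained in already chosen neighbours are never selected.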
The program FPC estimates the probability that two clones overlap
from the similarity of their restriction fragments (fingerprint bands).
FPC uses an algorithm to cluster clones into contigs based on their
probability of coincidence score. For each contig, it builds a
consensus band (CB) map which is similar to a restriction map; but
it does not try to resolve all the errors. The CB map is used to
assign coordinates to the clones based on their alignment to the
map and to provide a detailed visualization of the clone overlap.
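The clustering step alone can be sketched as single-linkage grouping with a union-find structure. This is a simplification under stated assumptions, not FPC's implementation: the function name `cluster_into_contigs` is hypothetical, the pairwise overlap test is supplied by the caller, and no CB map or clone coordinates are built.

```python
from typing import Callable, Dict, List, Sequence


def cluster_into_contigs(clones: Sequence[str],
                         overlaps: Callable[[str, str], bool]) -> List[List[str]]:
    """Group clones so that any two clones linked by a chain of pairwise
    overlaps end up in the same contig (single-linkage clustering)."""
    parent: Dict[str, str] = {c: c for c in clones}

    def find(c: str) -> str:
        # Follow parent links to the set representative, halving the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i, a in enumerate(clones):
        for b in clones[i + 1:]:
            if overlaps(a, b):
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for c in clones:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())


if __name__ == "__main__":
    # Toy fingerprints: sets of band sizes; "overlap" means >= 3 shared bands.
    bands = {
        "c1": {10, 20, 30, 40},
        "c2": {20, 30, 40, 50},
        "c3": {30, 40, 50, 60},
        "c4": {700, 800, 900},
    }
    contigs = cluster_into_contigs(
        list(bands), lambda a, b: len(bands[a] & bands[b]) >= 3)
    print(contigs)
```

Comparing every pair is quadratic in the number of clones, which is fine for a toy example but would need indexing for a library of hundreds of thousands of BACs.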
Two clones are considered to overlap if the following score is
below a user-supplied cutoff:
$$\text{score} = \sum_{m=M}^{n_L} \binom{n_L}{m} (1-p)^{m} \, p^{\,n_L-m}$$
where $M$ is the number of shared bands, $n_L$ and $n_H$ are the lowest and highest number of bands in the two clones, respectively, $t$ is
the tolerance, gellen is approximately the number of possible values, $b = 2t/\text{gellen}$, and $p = (1-b)^{n_H}$.
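A minimal sketch of this score, assuming the binomial-tail form usually given for FPC (the probability that at least M of the nL bands match by chance). The function names and the example value `gellen = 3300` are illustrative assumptions; only the symbols M, nL, nH, t, gellen, b and p come from the slide.

```python
from math import comb


def coincidence_score(shared: int, n_low: int, n_high: int,
                      tolerance: float, gellen: float) -> float:
    """Probability that at least `shared` of the `n_low` bands match by chance.

    b is the chance that one band matches a given band within the tolerance;
    p = (1 - b) ** n_high is the chance that a band matches none of the
    n_high bands of the other clone.
    """
    b = 2.0 * tolerance / gellen
    p = (1.0 - b) ** n_high
    return sum(comb(n_low, m) * (1.0 - p) ** m * p ** (n_low - m)
               for m in range(shared, n_low + 1))


def clones_overlap(shared: int, n_low: int, n_high: int,
                   tolerance: float, gellen: float, cutoff: float) -> bool:
    """Clones are called overlapping when the score falls below the cutoff."""
    return coincidence_score(shared, n_low, n_high, tolerance, gellen) < cutoff


if __name__ == "__main__":
    # Assumed example values: 30 bands per clone, tolerance 7, gellen 3300.
    for shared in (5, 15, 25, 30):
        score = coincidence_score(shared, 30, 30, 7, 3300)
        print(shared, f"{score:.3e}", score < 1e-16)
```

A smaller score means the shared bands are less likely to be coincidental, so a stricter (smaller) cutoff produces fewer but more reliable overlaps.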
Shotgun-sequence assembly ---display from the program Consed
Hybrid shotgun-sequencing approach
• combines the benefits of both clone-by-clone and whole-genome shotgun sequencing
• whole-genome shotgun: provides rapid insight about the
sequence of the entire genome
• clone-by-clone shotgun: simplifies the process of sequence
assembly to individual clone-sized genomic segments, thereby
minimizing the likelihood of serious misassemblies
Used by NIH in mouse genome sequencing. (Celera is using
whole-genome shotgun.)
Second Part
Construction of a physical map of the
mouse genome
A physical map of a genome is an essential guide for navigation,
allowing the location of any gene or other landmark in the
chromosomal DNA.
It provides:
• a framework for assembly of whole-genome
shotgun sequence data
• a tile path of clones for generation of the
reference sequence
Strategy: Use the human sequence as a framework!
Benefits:
1. Gives a better level of resolution
2. Accelerates the process of constructing the mouse
clone map
But why choose the human sequence?
Because the two genomes are similar in
sequence organization!
• 180 regions of conserved synteny (regions where the chromosomal
location of multiple genes is conserved)
• conserved segments/linkage (regions where the order of
multiple genes on a single chromosome segment is the same in
both species)
Comparing Human and Mouse DNA
• Most human genes have mouse orthologs
• Coding exons usually correspond 1-1
• Coding sequence similarity ~ 85%
Let’s go back to an old slide
Human
Why Mouse?
1) Human and mouse genomes
have conserved blocks of
genetic material
Mouse
Source: Lisa Stubbs, Lawrence Livermore National Lab
How to construct a physical clone map of the mouse genome
Two Phases:
Phase I: Generation of a human-mouse homology clone map
• Compared restriction digest patterns (‘fingerprints’) of 305,716 BAC
clones. Identified overlaps between clones on the basis of similarity
between fingerprints and used this information to construct 7,587 contigs
of overlapping clones.
---- Done by the program FPC, under high-stringency conditions at
a probability of 1×10^-16 and a match tolerance of seven
• Aligned the mouse BAC contigs to the human genome sequence by BES
(BAC end sequences). Extended and joined contigs where possible after re-examining the fingerprint data (p > 1×10^-12).
---- Done by BLASTN (with a BLAST score > 700)
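The alignment step can be sketched as filtering BES hits by score and summarising where each mouse contig lands on the human sequence. This is an illustration only: the `BesHit` record, the function name `anchor_contigs`, and the majority-chromosome and median-position rule are assumptions, not the published procedure; only the score threshold of 700 comes from the slide. Real input would be parsed from BLASTN output.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, Iterable, List, NamedTuple, Tuple


class BesHit(NamedTuple):
    """Hypothetical record for one BAC-end-sequence match to human sequence."""
    contig: str      # mouse BAC contig containing the clone
    bes: str         # BAC end sequence identifier
    human_chr: str   # human chromosome hit
    human_pos: int   # position of the hit on that chromosome
    score: float     # BLAST score of the match


def anchor_contigs(hits: Iterable[BesHit],
                   min_score: float = 700.0) -> Dict[str, Tuple[str, int]]:
    """Keep hits scoring above min_score, then report for each contig the
    human chromosome hit most often and the median hit position on it."""
    by_contig: Dict[str, List[BesHit]] = defaultdict(list)
    for hit in hits:
        if hit.score > min_score:
            by_contig[hit.contig].append(hit)

    anchors: Dict[str, Tuple[str, int]] = {}
    for contig, kept in by_contig.items():
        chrom, _ = Counter(h.human_chr for h in kept).most_common(1)[0]
        positions = [h.human_pos for h in kept if h.human_chr == chrom]
        anchors[contig] = (chrom, int(median(positions)))
    return anchors


if __name__ == "__main__":
    demo = [
        BesHit("ctg1", "bes1", "6", 100, 800.0),
        BesHit("ctg1", "bes2", "6", 200, 950.0),
        BesHit("ctg1", "bes3", "6", 300, 720.0),
        BesHit("ctg1", "bes4", "1", 5000, 900.0),   # stray hit elsewhere
        BesHit("ctg1", "bes5", "6", 10000, 650.0),  # below threshold
        BesHit("ctg2", "bes6", "4", 42, 300.0),     # below threshold
    ]
    print(anchor_contigs(demo))
```

Contigs with no hit above the threshold are simply absent from the result, which mirrors the idea that only confidently aligned contigs are placed on the human framework.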
Phase II: Generation of a mouse clone map
Use a set of independently mapped mouse markers
(available in existing genetic and radiation hybrid maps of
the mouse) to position the BAC contigs in the mouse
genome.
---Markers were added to the map either by electronic PCR,
or by hybridization using probes
After further manual contig editing (p > 1×10^-10),
a mouse clone map comprising 296 contigs
was generated.
Construction of human–mouse homology clone map
Alignment between part of human chromosome 6 (Hsa6) and mouse
chromosome 4 (Mmu4). A 1.6-Mb interval is enlarged, showing
part of Hsa6q16.1 aligned to a 1.3-Mb mouse BAC contig.
11 of the 15 segments of human sequence match to
29 of the BESs within a mouse BAC contig
Summary statistics of human-mouse homology clone map
Summary statistics of mouse physical clone map by chromosome
More details about the mouse physical map
• Found 51,486 homologous crosslinks between the two genomes
• Of the clones in the human genome tile path, 88% are collinear
with the mouse BAC map. For individual human chromosomes,
coverage by aligned mouse contigs exceeds 80% on all except
chromosome 19 (61%) and the Y chromosome (0%).
• Of the total coverage of the mouse BAC map (in 211 contigs), 97%
(2,658 Mb) is aligned to the human genome sequence.
• Most mouse BAC contigs contained multiple mouse markers
(average 57 markers per contig).
• Coverage of the mouse genome (2.8 Gb) in mapped BACs is
virtually complete: 296 contigs of average size 9.3 Mb cover an
estimated 2,739 Mb. (~98%)
• 275 gaps are due to breaks in synteny between the two genomes.
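The coverage figures in the list above can be checked with a few lines of arithmetic. The variable names are illustrative; the inputs (2.8 Gb genome, 296 contigs, 2,739 Mb covered) are the values quoted on the slide.

```python
# Values quoted on the slide.
genome_size_mb = 2800.0   # mouse genome, 2.8 Gb
covered_mb = 2739.0       # estimated coverage in mapped BACs
n_contigs = 296

mean_contig_mb = covered_mb / n_contigs
coverage_pct = 100.0 * covered_mb / genome_size_mb

print(f"mean contig size: {mean_contig_mb:.1f} Mb")  # 9.3 Mb
print(f"genome coverage:  {coverage_pct:.1f}%")      # 97.8%
```

Both results agree with the quoted average contig size of 9.3 Mb and the rounded coverage of about 98%.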
Now
Available in
Draft Form!
The Mouse Genome
http://www.ncbi.nlm.nih.gov
http://genome.ucsc.edu
http://www.ensembl.org
Future Work for Mouse Genome Sequencing
• Finish sequencing by 2005
• Analysis
Sequence comparisons
Annotation
Gene Expression analysis
Global genomic analysis