Transcript Document
Sequence and Analysis of the Maize B73 Genome
Doreen Ware1,2, Joshua Stein1, Apurva Narechania1, Shiran Pasternak1, Linda McMahan1, Chengzhi Liang1, Wei Zhao1, Sharon Wei1, William Spooner1,,
Ben Faga1, and The Maize Genome Sequencing Consortium3
1
Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY11724, USA
2 USDA-ARS NAA Plant, Soil & Nutrition Laboratory Research Unit, USA
3 Genome Sequencing Center, Washington University, St. Louis, MO 63108, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724; Arizona
Genomics Institute, University of Arizona, Tucson, AZ 85721; and Iowa State University, Ames, IA 50011
Maize Genome Gene Densities
Summary
Maize Accelerated Region Synteny Analysis
From its domestication 8,000 years ago in Central America to its position today
as the world’s leading harvested grain, Zea mays has played an important role
in human civilization, providing food, animal feed, and biofuel. Maize also enjoys
a long and distinguished history as a model organism owing to its rich diversity
and tractable genetics. The complete sequence of the maize genome would
therefore propel advances in basic research as well as agriculture and other
industries.
Sorghum
The Maize Genome Sequencing Consortium was launched with a three-year
grant from NSF to produce a complete sequence of the maize (B73) genome. At
2.5 Gb, the maize genome rivals mammalians in terms of size, and is six times
larger than rice, owing to its high content of retrotransposable elements. To meet
the challenge of producing an assembled sequence we took a BAC-by-BAC
approach, selecting a minimal tiling path of clones from a 20X fingerprint map.
Now in its third year, the project has produced complete sequences of 15,200
BAC clones comprising approximately 2 billion non-redundant bases, all
available via GenBank. Annotation of this first draft, using both ab initio gene
prediction and evidence-based approaches, gives preliminary estimates of gene
numbers, many of which produce alternative transcripts. Comparison to rice,
and a detailed analysis of a 22 Mb contig on chromosome 4, reveals that the
maize genome has been largely shaped by its history of tetraploidization,
subsequent rearrangement and duplicate gene loss. Gene annotations and
comparative maps generated by this project are available at the Gramene
Genome Browser (maizesequence.org).
Mean gene densities across chromosomes
*Densities calculated given 500Kb windows.
*Standard deviations provided in parentheses.
Count
58511
64716
254608
66525
• Genes were called on a
freeze of the maize data
containing 14,042 BACs.
• WH: with homology
(alignment to non-TE)
• NH: no homology
• TE: Alignment to TE’s in
a curated DB
• Prediction on masked
sequence with Gramene
Ensembl using same and
cross species evidence
• Evidence-based genes and transposons called on Maize BACs were
projected to the maize FPC map to illustrate contiguous, chromosome-level
gene frequency.
• Virtual Core Bins were generated via IBM2 anchors to the FPC map.
• The boxed area corresponds to the maize accelerated region. This region
appears to be gene rich relative to the rest of the genome.
22 mb
Sorghum
• The maize accel region contains syntenic
blocks to rice chr2 and sorghum chr4
• Maize: max gap between NETS 100,000
residues; min NET size 5000 residues.
• Rice and sorghum: max NET gap 50,000
residues; min NET size 2000 residues.
• Syntenic blocks are defined in two steps. First,
NETS are grouped if the distance
between them is smaller than twice the max
gap parameter and there are no NETS
breaking the synteny. Second, these groups are
arranged into syntenic blocks up
to 30 times the max gap parameter with two
synteny breaking groups allowed.
• The rice assembly is complements of TIGR
(version 5), and early access to the sorghum
assemblies complements of JGI.
Maize
Sorghum
Maize
Gene Level Synteny With Rice
Survey of Retroelements
Repeat Class†
Nucleotides*
Percent
Class I retroelements
1,725,195,428
76.03
Ty1/copia-like elements
541,469,024
23.86
Ty3/gypsy-like elements
1,035,938,047
45.65
LINES
8,758,596
0.39
SINES
183,194
0.01
Other retroelements
138,846,567
6.12
Class II DNA transposons
38,170,464
1.68
hAT superfamily
2,987,960
0.13
CACTA superfamily
13,098,567
0.58
Mutator
4,220,844
0.19
Tourist-like MITEs
780,447
0.03
Other MITEs
3,120,948
0.14
Other DNA transposons
13,961,698
0.62
Simple repeats
625,223
0.03
High-copy-number genes
3,369,366
0.15
Other repeats
17,054,679
0.75
Total repeats
1,784,415,160
78.64
†
Based on RepeatMasker using MIPS REcat library and classification database.
*Based on 14,042 BAC clones having 2,269,061,581 bp of sequence
• Class I
retroelements
are the most
abundant in the
genome.
• Class II DNA
transposons,
simple repeats,
and other
repeats are far
less abundant
• 78% of the
maize BACs
sequence is
repetitive.
• The high gene density of the accelerated region
(22.1 evidence-genes/500kb) is shown in further detail.
• Clusters of Gramene Genes (GeneBuilder) and Fgenesh Models (ab initio)
seem to mirror each other, a trend that is more apparent at 1 Mbase
magnification.
40
grande 3.3%
35
30
1.0%
1.4%
3.0%
prem1 4.3%
cinful
4.5%
milt
• Retro elements comprise 76% of
tekay
the genome sequence.
• DNA transposes, Sirs, and other
xilon
repeats comprise less than 3%.
• ji and huck families together grande
occupy 24% of the genome
prem1
sequence.
Rice chr 2
(29.0 – 35.8 Mb)
Alignments to
proteins in NR
68 genes
• Inversion associated with a deletion resulting in an unmatched span of 68
genes in rice.
• 2.7 -fold expansion in length of region in maize.
• 476/961 (49.5%) maize genes are syntenic.
• 380/1150 (33.0%) rice genes are syntenic.
• Suggestive of extensive gene movement as well as loss in maize.
Maize Accelerated Region Duplication
Cereal sequence
alignments
1800
1600
cinful
zeon
5.0%
25
20
Maize 22 Mb region
giepum 1.5%
other 1.8%
other
11.2%
opie
15
8.6%
zeon
other gypsy
huck
Phred quality
scores of BAC
sequence
giepum
huck
11.9%
ji
5
11.9%
opie
Mathematically
defined repeats
(20-mer freq’s in a
0.45X WGS maize
library)
ji
0
Ty3/gypsy
Ty1/copia
1400
1200
1000
800
600
400
200
other copia
10
Blastz-NET Alignments
Percent BAC Sequence (%)
45
milt
tekay
xilon
Non-syntenic
deletion
MIPS repeats
Retroelement Composition
50
Syntenic
BACs at MaizeSequence.org
Gene Predictions
• FgenesH
(ab initio)
• Gramene Genes
(evidence-based)
• Ensembl tracks are configurable and data sets can be toggled on and off
with user preference.
0
1
2
3
4
5
6
Rice
Sorghum Inversion
• Maize-sorghum and Maize-rice synteny
illustrates two large scale inversions, one on
Maize Chr4 (the accelerated region), and the
other on Sorghum Chr4.
1 mb
Rice
Maize Inversion
C hromos ome E videnc e- Genes T rans pos ons
1
1 5 .5 (9 .9 )
6 7 (2 1 .2 )
2
1 6 (1 2 .3 )
6 4 .7 (2 0 .5 )
3
1 3 .9 (1 0 .7 )
6 5 .3 (2 1 .1 )
4
1 3 .7 (1 1 .8 )
7 0 .6 (2 1 .7 )
5
1 4 .6 (1 2 .4 )
6 4 .5 (2 1 .2 )
6
1 5 .8 (1 1 .4 )
6 7 .5 (2 1 .5 )
7
1 5 .1 (1 1 .2 )
6 5 .1 (2 2 .2 )
8
1 6 .2 (1 1 .8 )
6 7 .7 (1 9 .9 )
9
1 6 .2 (1 1 .2 )
6 0 .6 (2 1 .8 )
10
1 5 .8 (1 1 .6 )
6 9 .1 (2 1 .8 )
Survey of Gene Statistics
Type
WH
NH
TE
Evidence
Maize
7
8
9
10
Maize Chromosome
• Rice Chr2 from positions 29MB to 36MB aligns to Maize Chromosomes 4 and 5
in equal measure indicating a duplication event. Alignments were made to maize
BAC-contigs and mapped to Chromosomes 4 and 5 using the FPC map.
• The majority of Chr4 hits were on FPC ctg182, corresponding to the accelerated
region. The majority of NETS on Chr5 were on contigs 250, 251, 253, and 254 in
agreement with marker based studies. PLoS Genet. 2007 Jul 20;3(7):e123
Rice