Differential insertion of transposable elements in Anopheles gambiae M & S genomes
Jenica L. Abrudan, Ryan C. Kennedy, Maria F. Unger, Michael R. Olson, Scott J. Emrich, Frank H. Collins, Nora J. Besansky
Eck Institute of Global Health, University of Notre Dame
Abstract:
Mosquitoes in the Anopheles gambiae species complex are the major vectors of malaria in Africa. The original A. gambiae genome sequenced was the PEST strain, which was later discovered to be a composite of the A. gambiae M and S forms. These two sympatric forms demonstrate reproductive isolation and are believed to be incipient or distinct species. Both forms have recently been sequenced individually, so we are performing a computational analysis of the three genomes to identify sequence differences. We hypothesize that transposable elements may be influencing the speciation of A. gambiae.

Insertions of transposable elements have been associated with alterations in chromosome structure, recombination, replication, and gene regulation. Recent studies have indicated the existence of “speciation islands” and of numerous genes differentially expressed between the M and S forms across multiple developmental stages. Many of those genes lie outside the “speciation islands”, implying that more causal factors remain to be discovered. We have identified sequences differentially inserted between the M and S genomes relative to PEST. We then identified the subset of those sequences that contain transposable elements, using a discovery pipeline we have developed. We are currently using this subset to identify the sequences that lie in close proximity (~1 kb) to gene elements, and we will perform experiments designed to measure the expression levels of those genes. We hope to find a correlation between the differentially inserted transposons and the observed gene expression differences.

Generating the start-up data
Indels between M/S and PEST were determined by mapping the reads of M/S to the PEST assembly and comparing the distance between the mate pairs.
• Sequences inserted in the S genome assembly but not in the M or PEST assembly: 6,767 sequences
• Sequences inserted in the M genome assembly but not in the S or PEST assembly: 6,792 sequences
• Sequences inserted in the M and PEST genome assemblies but not in S: 1,301 sequences
• Sequences inserted in the S and PEST genome assemblies but not in M: 2,128 sequences
The differentially inserted sequences are computed from the two genome assemblies by mapping the M and S reads to the PEST genome, measuring the distance between the mapped mate pairs, and comparing it with the distance at which the mates would be expected to map on PEST.
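The mate-pair comparison described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the poster: the `MatePair` record, the function names, the expected span, and the tolerance are all hypothetical, since the poster gives none of these values. The idea is that when mates from an M or S clone map farther apart on PEST than the clone length predicts, PEST carries sequence that the sampled genome lacks, and when they map closer together, the sampled genome carries sequence that PEST lacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatePair:
    """One M or S mate pair after mapping to PEST (hypothetical record)."""
    name: str
    left_start: int  # leftmost mapped coordinate of the pair on PEST
    right_end: int   # rightmost mapped coordinate of the pair on PEST


def classify_pair(pair, expected_span, tolerance):
    """Compare the mapped span on PEST with the span expected from the clone.

    expected_span and tolerance are assumed inputs (for example a library's
    mean clone length and a few standard deviations).
    Returns (label, size_estimate_in_bp).
    """
    mapped_span = pair.right_end - pair.left_start
    delta = mapped_span - expected_span
    if delta > tolerance:
        # Mates are farther apart on PEST: PEST has sequence the sample lacks.
        return "absent_in_sample", delta
    if delta < -tolerance:
        # Mates are closer on PEST: the sample has sequence PEST lacks.
        return "inserted_in_sample", -delta
    return "concordant", 0


def candidate_indels(pairs, expected_span, tolerance):
    """Return {pair name: (label, size)} for every non-concordant pair."""
    calls = {}
    for pair in pairs:
        label, size = classify_pair(pair, expected_span, tolerance)
        if label != "concordant":
            calls[pair.name] = (label, size)
    return calls
```

The sketch judges each pair on its own. A real analysis would presumably want several discordant pairs supporting the same locus before accepting a call.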
Suspicious sequences fell into two categories:
• Sequences present in only one assembly (either M or S)
• Sequences present in PEST and in either M or S, but not in all three assemblies
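The categories above amount to a lookup on which assemblies contain a sequence. The following sketch shows that bookkeeping. The labels, the function names, and the input layout (a mapping from sequence id to three presence flags) are assumptions made for illustration and do not come from the poster.

```python
from collections import Counter

# Presence pattern (M, S, PEST) -> category label used in this sketch.
CATEGORIES = {
    (True, False, False): "M only",
    (False, True, False): "S only",
    (True, False, True): "M and PEST, not S",
    (False, True, True): "S and PEST, not M",
}


def categorize(in_m, in_s, in_pest):
    """Label a sequence by the assemblies that contain it."""
    key = (bool(in_m), bool(in_s), bool(in_pest))
    # Any other pattern does not distinguish M from S.
    return CATEGORIES.get(key, "not differential")


def tally(presence_by_sequence):
    """Count sequences per category.

    presence_by_sequence maps a sequence id to an (in_m, in_s, in_pest) tuple.
    """
    return Counter(categorize(*flags) for flags in presence_by_sequence.values())
```

Applied to the full set of candidate sequences, a tally of this kind would yield per-category counts like those reported on the poster.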
Steps in analyzing the potential differential insertions between M and S

Input data:
• M genome assembly
• S genome assembly
• PEST genome assembly
• DNA sequences from the Repbase and TEfam databases for the known transposable elements in Anopheles gambiae
• BioPerl

Sequences that were computationally found to differ between the M and S assemblies were further analyzed for the presence of transposable elements. For this purpose, a database was compiled from the transposable elements known to be present in A. gambiae. The sequences were searched with blastn against this database, and only those with an e-value of 10^-26 or less were considered further.

[Workflow diagram: putative insertions in M relative to S and in S relative to M, searched with blastn against the compiled transposable elements database]

• 1,075 transposable element related putative insertions into M relative to S
• 1,146 transposable element related putative insertions into S relative to M

The distribution of transposable element families among the TE-derived indels appears highly similar between the two genomes.

Future plans:
• Verify that the insertions are fixed between M & S
• Look at the insertion site relative to genes
• Look at the possible influence of transposable element related sequences on gene expression
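The blastn filter and the planned look at insertion sites relative to genes can be sketched together. This is an illustration under stated assumptions, not the authors' pipeline (the poster lists BioPerl among its inputs, whereas this sketch uses Python). It assumes the blastn results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`, with the e-value in the 11th column). The function names, the gene tuple layout, and the 1,000 bp window (taken from the ~1 kb mentioned in the abstract) are hypothetical choices.

```python
import csv
import io

EVALUE_CUTOFF = 1e-26  # threshold stated on the poster


def best_te_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query id: (TE subject id, e-value)} for hits passing the cutoff.

    Assumes 12-column tab-separated blastn output; column 11 is the e-value.
    Only the lowest e-value hit is kept for each query.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment, or malformed lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def distance_to_gene(position, gene_start, gene_end):
    """Distance in bp from a position to a gene interval (0 if inside it)."""
    if gene_start <= position <= gene_end:
        return 0
    return min(abs(position - gene_start), abs(position - gene_end))


def genes_near(chrom, position, genes, window=1000):
    """Names of genes on the same chromosome within `window` bp.

    genes is a list of (name, chrom, start, end) tuples (hypothetical layout).
    """
    return [
        name
        for name, gene_chrom, start, end in genes
        if gene_chrom == chrom and distance_to_gene(position, start, end) <= window
    ]
```

Keeping only the best hit per query is a simplification made here. The poster states the e-value cutoff but does not say how multiple hits to different elements were resolved.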
[Figure: charts of the transposable element content of the TE-derived indels, with panel titles "M indels TE content", "Distribution of TE derived indels in S relative to M", "S indels TE content", and "Distribution of TE derived indels in M relative to S".]
• First pie chart: DNA transposons 48%, Non-LTR retrotransposons 32%, SINE 10%, LTR retrotransposons 7%, Helitron 3%
• Second pie chart: DNA transposons 45%, Non-LTR retrotransposons 37%, LTR retrotransposons 10%, Helitron 6%, SINE 2%
• Families in the first legend: Harbinger1, HarbingerN1, HarbingerN2, hAT, Ikirara, ITmD37E_Ele1, Mariner, Pegasus, PiggyBac, Gambol
• Families in the second legend: Gambol, Harbinger, hAT, Ikirara, ITmD37E, Mariner, P3_AG, Pegasus, PiggyBac, Tc1

Class I Non-LTR retrotransposons and Class II DNA transposons appear to have the highest representation among the potentially different insertions between the two genomes, both in diversity of sequence and in number of sequences present.

While the numbers differ between the two genomes, Mariner appears to be the most abundant DNA transposon in both, with 212 sequences for S and 189 for M. The next most abundant element of the same class in M, Tc1, has a much lower count in S (151 vs. 94).

References:
D. Lawson, et al., VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Research, 37:D583-7, 2009.
TEfam. http://tefam.biochem.vt.edu.
Repbase. http://www.girinst.org/repbase/index.html.
VectorBase. http://www.vectorbase.org
BioPerl. http://bioperl.org

The VectorBase project is funded by the US National Institute of Allergy and Infectious Diseases (NIAID), contract HHSN266200400039C.