Print your poster - HCGS - University of New Hampshire

Protein Alignment for Functional Profiling Whole Metagenome Shotgun Data
Westbrook, Anthony; Ramsdell, Jordan; Normington, Louisa; Aggarwal, Taruna;
Bergeron, R. Daniel; Thomas, W. Kelley; MacManes, Matthew
Great Bay Community College & University of New Hampshire
Abstract
Whole metagenome shotgun sequencing is a potentially powerful approach to assaying the functional potential of microbial communities. The process is currently limited by the lack of tools that can efficiently and accurately map DNA sequence reads to functionally annotated reference genomes. Here we present a modification of the Burrows-Wheeler Aligner (BWA) that significantly improves the efficiency and accuracy of functional read mapping by mapping directly in protein sequence space.
Background
• Sequencing of DNA directly from the environment (metagenomics) is transforming our knowledge of microbial community diversity.
• Using whole metagenome shotgun reads to understand the functional capacity of a microbial community is challenged by the divergence of reference genomes and the vast number of sequences derived from environmental samples.
Methods
Generate paired-end (PE) reads using 6 bacterial genomes
BWA
Map to nucleotide references in the UniProt database
PALADIN
Transcribe and translate each read into 6 potential ORFs and corresponding protein sequences, respectively
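The PALADIN step in Methods translates each read into six candidate protein sequences. A minimal sketch of six-frame translation follows, assuming the standard genetic code and plain A/C/G/T reads. The function names are illustrative and are not taken from PALADIN itself.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Map frame labels (+1..+3, -1..-3) to the translated protein string."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("CCAAAGTACGAA").items():
        print(label, protein)
```

Each of the six strings can then be treated as a candidate query against a protein reference. Stop codons appear as `*` and are left in place here; how a real aligner handles them is outside this sketch.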
Objective
Hypothesis
Mapping whole genome shotgun reads in protein space will dramatically increase the efficiency of identifying gene functions while maintaining accuracy.
Methods (continued)
PALADIN
Map to protein references in the UniProt database
Novoalign
1. Create degenerate version of references based on nucleotide ambiguity code
2. Map to degenerate references corresponding to UniProt database
Jaccard Similarity Index
Evaluation of Three Mapping Methods
Degenerate reference example (nucleotide ambiguity code): CCA AAG TAC GAA becomes CCN AAR TAY GAR
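The codon example above (CCA AAG TAC GAA to CCN AAR TAY GAR) can be reproduced with a short sketch. The approach here is an assumption, not the poster's stated procedure: for each codon, collect every synonymous codon under the standard genetic code and write the IUPAC ambiguity code for the bases seen at each position. For six-codon amino acids (Leu, Arg, Ser) this yields a single codon that over-generalizes, for example YTN for leucine.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_codon(codon):
    """Collapse a codon to one IUPAC codon covering all its synonyms."""
    amino_acid = CODON_TABLE[codon]
    synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    return "".join(
        IUPAC[frozenset(c[pos] for c in synonyms)] for pos in range(3)
    )


def degenerate_reference(seq):
    """Degenerate an in-frame A/C/G/T coding sequence, codon by codon."""
    seq = seq.upper().replace(" ", "")
    return " ".join(
        degenerate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    print(degenerate_reference("CCA AAG TAC GAA"))  # CCN AAR TAY GAR
```

The sketch assumes the input is already in frame and contains only unambiguous bases; a codon containing any other character raises `KeyError`.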
Graphs
Conclusion
Figure 1. Intersection of GO term graphs, indicated in purple, constructed from two collections of genes from a human breast cancer dataset [2] using the GOrilla software [3]. The darkened vertices at the bottom of the graphs are the exact GO term assignments, and the similarity index between the two collections is 7/29 = 0.24.
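The similarity index in Figure 1 is a Jaccard index: the number of GO terms shared by the two graphs divided by the number of terms in either graph. The sketch below assumes each graph is the set of assigned terms plus all of their ancestors. The `parents` mapping and the term names are hypothetical toy data, not real GO identifiers.

```python
def ancestor_closure(terms, parents):
    """Return the given terms plus every ancestor reachable via `parents`."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical toy ontology: child term -> list of parent terms.
    parents = {"t1": ["root"], "t2": ["mid"], "t3": ["mid"], "mid": ["root"]}
    graph_a = ancestor_closure({"t1", "t2"}, parents)
    graph_b = ancestor_closure({"t3"}, parents)
    print(jaccard_index(graph_a, graph_b))

    # Same arithmetic as Figure 1: 7 shared vertices out of 29 in the union.
    print(round(jaccard_index(range(0, 18), range(11, 29)), 2))  # 0.24
```

Comparing ancestor closures rather than only the exact assignments lets two gene collections score above zero when they share broader functional categories, which appears to be the intent of the graph-based index in the figure.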
Novoalign
To improve the mapping of metagenome-derived DNA sequences to homologous protein-coding sequences in reference databases, we have developed an algorithm that maps sequence reads to a reference in amino acid space.
Graph Similarity Index for Gene Ontology Comparison [1]
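Table columns: CPU; Memory Usage; % Reads Mapped; GO Term Similarity Index
Table rows: BWA (Full UniProt); BWA (Filtered UniProt); Novoalign (Full UniProt); Novoalign (Filtered UniProt); PALADIN (Full UniProt); PALADIN (Filtered UniProt)
Surviving values (column not recoverable): 28.7 following BWA (Full UniProt); 21.9 following PALADIN (Filtered UniProt)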
Future work includes rapid functional characterization of human microbial metagenomic data to support the development of new treatment methods.
References
1. Gene Ontology Consortium. Nucleic Acids Research, 43(D1), D1049-D1056 (2015).
2. van 't Veer, L. J. et al. Nature, 415, 530-536 (2002).
3. Eden, E. et al. BMC Bioinformatics, 10, 48 (2009).
Additional results table values (row and column not recoverable): 4323.57, 39.0, 4377.02, 33.4, 0.75
Acknowledgements
I would like to thank NH-INBRE for funding this research opportunity. Many thanks to my mentor Kelley Thomas and the Hubbard Center for Genome Studies (University of New Hampshire), and to my academic advisor Leslie Barber (Great Bay Community College) for her support with all of my academic endeavors.