CG13335 - University of Pittsburgh

Comparative genome sequence navigation and manipulation with the GenePalette software tool
Mark Rebeiz
University of Pittsburgh
in situ
in fly embryos
Insert into pHStinger to see
expression is
What does GenePalette do?
• Load genome sequences from any genome annotated in GenBank on any computer platform (Windows, Mac, Linux)
• Design primers, search for motifs, look at restriction sites
• Evolutionary comparisons of DNA
• Prepare “to scale” diagrams of gene structure for presentations and publication
Enter a query to GenBank
Select genes to work with from the chromosomal region of interest
The region is loaded into a fully integrated
interface, where every element is
Search for motifs (restriction sites, primers, transcription factor binding sites) within the loaded sequence to visualize
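As a rough illustration of what a motif search over a loaded sequence involves, here is a minimal Java sketch. It is not GenePalette's code: the class `MotifSearch` and its method names are hypothetical, and it assumes exact matching of plain A/C/G/T motifs, reporting hits on either strand in forward-strand coordinates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of GenePalette. */
final class MotifSearch {

    /** Returns the reverse complement of a DNA string (unknown letters become N). */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    /** Returns every 0-based start position of motif in seq, overlaps included. */
    static List<Integer> findAll(String seq, String motif) {
        List<Integer> hits = new ArrayList<>();
        if (motif.isEmpty()) {
            return hits;
        }
        int from = 0;
        while ((from = seq.indexOf(motif, from)) >= 0) {
            hits.add(from);
            from++; // step by one so overlapping matches are also reported
        }
        return hits;
    }

    /** Start positions where the motif or its reverse complement matches exactly. */
    static SortedSet<Integer> findOnBothStrands(String seq, String motif) {
        String s = seq.toUpperCase();
        String m = motif.toUpperCase();
        SortedSet<Integer> hits = new TreeSet<>(findAll(s, m));
        hits.addAll(findAll(s, reverseComplement(m)));
        return hits;
    }
}
```

A palindromic restriction site such as GAATTC is its own reverse complement, so both searches return the same positions and the set keeps one copy of each.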
Design primers for PCR by simply selecting a
region of DNA
Phylogenetic Footprinting
Regions that could be important for binding are often evolutionarily conserved
Phylogenetic footprinting is laborious by hand
• Alignments of non-coding sequences are difficult, since there are lots of insertions/deletions (“indels”)
• Often, binding sites are conserved, but not much else is
• The methods for automating this process are clumsy
GenePalette in the literature
Potential Projects
• Update the interface, make components easier to use
• Automate the acquisition of orthologous sequences from databases
• Improve accuracy and speed of algorithm for sequence alignment
Full text description
In the post-genomic era, the analysis of genomic sequence is a constant experimental need.
A particularly challenging issue is determining the function of non-coding sequences that
control when and where each gene is transcribed. Currently, a limited number of tools are
available for aligning and visualizing regulatory sequence motifs in genomic DNA. The
GenePalette software tool is a program written in the Rebeiz Lab at the University of
Pittsburgh to handle this need. Coded in Java, this program allows users to download
genome sequences from a database, and visualize features within the sequence using a
graphical interface.
Several independent improvements could be the focus of a capstone project:
(1) Update the GUI to make it more user friendly. The software is used by many researchers
(several thousand registered users) who are not necessarily computer-savvy. Thus,
improvements that facilitate logical use of components would greatly improve the software’s
utility to researchers.
(2) Streamline the acquisition of orthologous sequences from various databases. The
software was originally designed to access GenBank, a fairly generic repository for DNA
sequence data. However, several other extremely useful resources, such as ENSEMBL and
UCSC, are now available. In particular, the UCSC database contains a “concordance map”
that allows users to find orthologous genomic sequences. This project would involve
implementing an interface within the software to use these resources.
(3) Improve the sequence alignment algorithm. To compare and contrast evolutionary
conservation or lack thereof, the software implements a sequence alignment algorithm that
finds unique “words” of defined length that are identical between multiple sequences. These
“landmarks” allow the user to assess whether individual motifs are conserved among
species. The current algorithm is a slow “brute force” algorithm. This project would be to
improve this algorithm to make it faster and more robust.
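
One possible direction for project (3), sketched under stated assumptions: instead of comparing every word against every position, index each sequence once in a hash map and then intersect the indexes. The class `LandmarkFinder` and its methods are hypothetical names, not GenePalette's existing code. The sketch also assumes that a "landmark" is a word of length k that occurs exactly once in every input sequence, which is one reading of the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of GenePalette. */
final class LandmarkFinder {

    /** Maps each word of length k to its start position, or to -1 if it repeats. */
    static Map<String, Integer> uniqueWordPositions(String seq, int k) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i + k <= seq.length(); i++) {
            String word = seq.substring(i, i + k);
            // First sighting stores i; any later sighting collapses the entry to -1.
            positions.merge(word, i, (oldPos, newPos) -> -1);
        }
        return positions;
    }

    /**
     * Returns the words that occur exactly once in every sequence, each mapped
     * to its 0-based start position in sequence 0, 1, 2, ... (sorted by word).
     */
    static Map<String, int[]> findLandmarks(List<String> seqs, int k) {
        List<Map<String, Integer>> indexes = new ArrayList<>();
        for (String s : seqs) {
            indexes.add(uniqueWordPositions(s.toUpperCase(), k));
        }
        Map<String, int[]> landmarks = new TreeMap<>();
        if (indexes.isEmpty()) {
            return landmarks;
        }
        // Only words from the first sequence can be shared by all sequences.
        for (String word : indexes.get(0).keySet()) {
            int[] pos = new int[indexes.size()];
            boolean shared = true;
            for (int j = 0; j < indexes.size() && shared; j++) {
                Integer p = indexes.get(j).get(word);
                if (p == null || p < 0) {
                    shared = false; // missing or repeated in sequence j
                } else {
                    pos[j] = p;
                }
            }
            if (shared) {
                landmarks.put(word, pos);
            }
        }
        return landmarks;
    }
}
```

Each sequence is scanned once, so the work grows roughly linearly with total sequence length for a fixed k, at the cost of holding one map entry per distinct word. The sketch handles exact matches on one strand only; tolerating mismatches or searching the reverse strand would need further design.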