- Cal State LA - Instructional Web Server


Weixi Zhong
Mentor: Dr. Andrew Cameron
Center for Computational Regulatory Genomics
California Institute of Technology
Set up an accessible database for the E. tribuloides transcriptome




Compare the quality of Eucidaris tribuloides RNA sequence assemblies
Choose the best assembly
Create a sequence database
Create a web interface to access the database



Facilitate future E. tribuloides gene studies
Share findings on the E. tribuloides transcriptome
Extensions after further research (e.g. more search options, feedback)
1. Image courtesy of http://www.peteducation.com/
Image 1.
Strongylocentrotus purpuratus
Image 2.


Only echinoderm with a fully sequenced genome
Evolutionarily closer to humans than many other model organisms used in developmental biology
Eucidaris tribuloides
Distant relative of S. purpuratus (lineages diverged ~275 million years ago)
Useful in comparative studies
2. Image courtesy of SpBase (http://www.spbase.org/)
Image 3. Panels: E. tribuloides, S. purpuratus
Gene regulatory differences?
*Red arrows point to mesenchyme cells, which develop later in E. tribuloides than in other sea urchins; circles indicate the location of the blastopore
3. Image courtesy of http://www.palaeos.com/
Microscope images of sea urchin gastrula courtesy of Dr. Andrew Cameron
Image 3.
No E. tribuloides genome available
Assemble transcriptome:
[Figure: workflow from early Et gastrula → RNA → cDNA → Solexa reads → Velvet assembly → quality comparison → database → expression studies]

High-throughput short-read sequencing technology
[Figure: cDNA is sequenced into short reads, e.g. AGGTCTTAC]

Velvet: de novo genome assembly software developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI) in the UK
[Figure: SOLEXA reads are assembled into contigs, e.g. AGCATACCTGTAA]
Contig: the sequence of a set of contiguous, overlapping reads

Contigs from a single Velvet run are assumed to be unique and non-overlapping
Information from http://www.ebi.ac.uk/~zerbino/velvet/
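To make the idea of a contig concrete, here is a toy sketch of how overlapping reads collapse into one sequence. It is not Velvet's method: Velvet builds a de Bruijn graph from k-mers, whereas this sketch swaps in a much simpler greedy suffix-prefix merge. The function names and the minimum overlap of 3 bases are illustrative assumptions, and the example reads are hypothetical fragments of the contig shown in the figure above.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler (NOT Velvet's de Bruijn approach): repeatedly merge the
    pair of sequences with the longest suffix-prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = suffix_prefix_overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            # No overlaps left: the remaining sequences are unique, non-overlapping contigs
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_assemble(["AGCATAC", "ATACCTG", "CCTGTAA"])` merges the three reads into the single contig `AGCATACCTGTAA`, while reads with no overlap are returned unchanged as separate contigs.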
Assess assembly quality using the contig length distribution: N50 and 90% complexity calculations
N50: the length of the shortest contig such that the summed length of all contigs of equal or greater length constitutes at least 50% of the total length of all contigs*
90% complexity: a similar calculation, assuming unique contigs
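The N50 definition above translates directly into a short loop: sort contig lengths from longest to shortest and stop at the first contig where the running total reaches half of the assembly. This is a minimal sketch; the function name and the input format (a plain list of contig lengths, e.g. parsed from a Velvet contigs file) are assumptions, and it does not attempt the 90% complexity calculation.

```python
def n50(lengths: list[int]) -> int:
    """Shortest contig length L such that contigs of length >= L
    together cover at least 50% of the total assembled length."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length
    raise AssertionError("unreachable: running equals total at the last contig")
```

For contigs of length 100, 200, 300, 400 and 500 nt (total 1500), the two longest contigs already cover 900 nt, so `n50([100, 200, 300, 400, 500])` returns 400.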
[Figure: Weighted length frequency of contigs. x-axis: length (nucleotides), 0 to 1000; y-axis: weighted frequency (total # of nucleotides in contigs of a given length), 0 to 2,500,000; the N50 is marked on the x-axis.]
*N50 definition based on the one given by Jeremy Leipzig
(http://jermdemo.blogspot.com/2008/11/calculating-n50-from-velvet-output.html)
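The weighted frequency plotted above counts nucleotides rather than contigs: each contig adds its own length to the bin it falls in, so the plot shows where most of the assembled sequence lies. A minimal sketch of that binning follows; the 10-nt bin width and the function name are assumptions, not values taken from the poster.

```python
from collections import Counter


def weighted_length_frequency(lengths: list[int], bin_width: int = 10) -> dict[int, int]:
    """Map each length bin (keyed by its lower edge) to the total number of
    nucleotides in contigs whose length falls in that bin."""
    freq = Counter()
    for length in lengths:
        bin_start = (length // bin_width) * bin_width
        freq[bin_start] += length  # weight each contig by its own length
    return dict(sorted(freq.items()))
```

For instance, contigs of 105, 108 and 230 nt give `{100: 213, 230: 230}`. Because every contig contributes its full length, the values always sum to the total assembly size, which is the same quantity the N50 threshold is computed against.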
Use the S. purpuratus proteome as a reference
Map contigs to the proteome
Using the proteome “removes” silent (synonymous) mutation differences between genes
S. purpuratus: CTC-ATG-TAC-TTC-GAG-GGA-TGC-TTG-AAG
GLEAN3_00299: LEU-MET-TYR-PHE-GLU-GLY-CYS-LEU-LYS
E. tribuloides: TTG-ATG-TAT-TTT-GAA-GGA-TGC-CTG-AAA
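The codon example can be checked mechanically: the two DNA sequences differ at 7 nucleotide positions, yet both translate to the same peptide. The sketch below builds the standard genetic code table and translates both sequences; the helper names are illustrative, and it assumes in-frame input with the hyphens used above as codon separators.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids in TCAG x TCAG x TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence (hyphens allowed) into one-letter amino acids."""
    dna = dna.upper().replace("-", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def nucleotide_differences(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length DNA sequences."""
    a, b = a.replace("-", ""), b.replace("-", "")
    return sum(x != y for x, y in zip(a, b))


sp = "CTC-ATG-TAC-TTC-GAG-GGA-TGC-TTG-AAG"  # S. purpuratus
et = "TTG-ATG-TAT-TTT-GAA-GGA-TGC-CTG-AAA"  # E. tribuloides
```

Here `translate(sp)` and `translate(et)` both return `LMYFEGCLK` (Leu-Met-Tyr-Phe-Glu-Gly-Cys-Leu-Lys, matching GLEAN3_00299), while `nucleotide_differences(sp, et)` is 7: every nucleotide difference in this stretch is synonymous, which is why comparing at the protein level hides it.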

Record metadata: count of matches, annotated matches, and unique matches
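The match metadata could be derived from Blastx output of contigs against the S. purpuratus proteome. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, where column 1 is the query contig, column 2 the subject protein, and column 11 the e-value), an e-value cutoff of 1e-5, and a set of annotated gene IDs. All three are assumptions, and reading "unique matches" as "distinct Sp genes hit" is only one possible interpretation.

```python
def match_metadata(blast_lines, annotated_genes, max_evalue=1e-5):
    """Summarize Blastx hits of Et contigs against the Sp proteome.

    blast_lines: lines of BLAST+ tabular output (-outfmt 6).
    annotated_genes: set of Sp gene IDs that carry an annotation (assumed input).
    max_evalue: hypothetical significance cutoff.
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        contig, gene, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.add((contig, gene))  # several HSPs for one pair count once
    return {
        "matches": len(hits),
        "annotated_matches": sum(1 for _, gene in hits if gene in annotated_genes),
        "unique_genes": len({gene for _, gene in hits}),
        "contigs_with_match": len({contig for contig, _ in hits}),
    }
```

Storing hits as a set of (contig, gene) pairs keeps the counts stable when Blastx reports several high-scoring segments for the same contig and protein.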
Create database using PostgreSQL
User information table
Contig information table
Gene information table
Contig-gene match information table
Sequences
Write webpage to access database
Ability to search using both species
Display in text and graphical formats
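One way the tables listed above could fit together, along with a search from the S. purpuratus side that returns matching Et contigs, is sketched below. The project used PostgreSQL; to keep the example self-contained it swaps in Python's built-in `sqlite3`, and every table and column name is hypothetical. The sort whitelist illustrates a safe "change display order" option, since user input never reaches the `ORDER BY` clause directly.

```python
import sqlite3

# Hypothetical schema; the poster's actual PostgreSQL schema is not shown.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);
CREATE TABLE contigs (              -- Et contigs, including their sequences
    contig_name TEXT PRIMARY KEY,
    length      INTEGER NOT NULL,
    coverage    REAL,
    sequence    TEXT NOT NULL
);
CREATE TABLE genes (                -- Sp genes
    gene_id   TEXT PRIMARY KEY,
    gene_name TEXT
);
CREATE TABLE contig_gene_matches (  -- Blastx matches between the two
    contig_name TEXT REFERENCES contigs(contig_name),
    gene_id     TEXT REFERENCES genes(gene_id),
    score       REAL,
    evalue      REAL,
    PRIMARY KEY (contig_name, gene_id)
);
"""

# Whitelisted display orders; an unknown key raises KeyError instead of reaching SQL.
SORT_ORDERS = {"evalue": "m.evalue ASC", "score": "m.score DESC", "length": "c.length DESC"}


def contigs_for_gene(conn, term, order_by="evalue"):
    """Return (gene_id, contig_name, length, score, evalue) rows for Sp genes
    whose ID or name contains term."""
    order = SORT_ORDERS[order_by]
    sql = f"""
        SELECT g.gene_id, c.contig_name, c.length, m.score, m.evalue
        FROM genes g
        JOIN contig_gene_matches m ON m.gene_id = g.gene_id
        JOIN contigs c ON c.contig_name = m.contig_name
        WHERE g.gene_id LIKE ? OR g.gene_name LIKE ?
        ORDER BY {order}
    """
    pattern = f"%{term}%"
    return conn.execute(sql, (pattern, pattern)).fetchall()
```

Usage: open `sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, insert rows, then call `contigs_for_gene(conn, "SPU_", order_by="length")`. A search from the Et side would be the mirror-image query filtering on `c.contig_name`.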
[Figure: screenshots of the Eucidaris tribuloides RNA Sequence Database website. Labeled features: search; Sp genome search results; Et contigs search results; change display order (for both result lists); gene match information in graphical and tabular display; contig information popup; search history; match details popup]
Conduct research using database information
Share data with researchers through the website
Add functionality to the website as research findings evolve
Special thanks to:
The SoCalBSI faculty and staff
Dr. Jamil Momand, Dr. Sandy Sharp, Dr. Nancy Warter-Perez, Dr. Wendie Johnston, Dr. Beverly Krilowicz, and Ronnie Cheng
My mentor: Dr. Andrew Cameron
The CCRG staff
Autumn Yuan, Dong He, Dave Felt
All the SoCalBSI interns
Funded by:
[Figure: annotated screenshots of the website. Callouts: choose search criterion; narrow down the search using either species; enter search terms (with examples); search results; Sp official gene name and identifier; link to the result page for this gene; link to a page with all Et contigs that match this gene; link to the SpBase page for this gene; top Blastx matches, if existent, with contig name, contig length, score, and e-value; contig link to a popup displaying contig information (contig name, length, coverage, and sequence); gene name with basic gene information and links to more comprehensive webpages; change display format; alignment, with a link to a popup with the detailed alignment; tabulated alignment summary; alignment details; contig information]