Chana & Malka present:


Chani & Malki present:
The OdzFinder
Project adviser: Dr. Ron Wides
WANTED
Name: Odz
a.k.a.: Ten-m
Family: pair-rule gene
Length: 10,000 bp
Getting to Know Odz …



Discovered in D. melanogaster in 1994
Belongs to the pair-rule gene family
Plays a crucial role in the CNS during fetal development
Odz protein is expressed in neurons, the developing brain and the hindgut
Odz protein is expressed during segmentation
The Odz Family
Odz gene orthologs have been found in 3 phyla:
Vertebrates: Ten-m1, Ten-m2, Ten-m3, Ten-m4
Arthropods: Ten-a, Ten-m
Nematodes: Ten-m
The Odz Protein
The only pair-rule gene whose protein is not a transcription factor!


2731 amino acids
Contains 3 domains:
I. Extracellular EGF-like repeats
II. Tyrosine kinase phosphorylation sites
III. Hydrophobic sequences, probably a transmembrane sequence
[Diagram labels: EGF-like domain; intracellular kinase substrate domain; ODZ]
EGF-like Repeats
EGF-like domain:
• 30 - 40 amino acid residues
• Significant homology to epidermal growth factor (EGF)
• Has been found in single or multiple copies in a number of other proteins
• Generally found in the extracellular domain of membrane proteins or secreted proteins
• Involved in receptor-ligand interactions
• Includes 6 conserved cysteine residues involved in disulfide bonds
x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
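The schematic pattern above can be turned into a regular expression for a quick scan of protein sequences. The sketch below is in Python for illustration only (OdzFinder itself is described as a Perl program). It assumes that `a` stands for an aromatic residue (F, W or Y) and treats both glycines as mandatory, which is probably stricter than a real EGF-domain profile. The function name and the test sequences are made up for the example.

```python
import re

# Regex rendering of the slide's schematic pattern, starting at the first cysteine:
#   C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C
# Assumption: 'a' = aromatic residue (F, W or Y); 'x' = any residue.
EGF_LIKE = re.compile(
    r"C.{0,48}C.{3,12}C.{1,70}C.{1,6}C.{2}G[FWY].{0,21}G.{2}C"
)

def find_egf_like(protein: str):
    """Return (start, end) spans of non-overlapping EGF-like matches."""
    return [m.span() for m in EGF_LIKE.finditer(protein.upper())]

# Synthetic sequences built for the example (not real Odz sequence).
synthetic = "AAAACPLCNGGTCIDLCKACAPGYTGRNCE"   # six cysteines in the right spacing
no_match = "AAAACPLCNGGTCIDLCKAAAPGYTGRNCE"    # fifth cysteine replaced by alanine
```

A pattern this loose will also match many non-Odz proteins, which is exactly the problem the EGF handling later in the deck addresses.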
The lab’s goals:
Genomics:
Find a broad family of Odz genes
Build phylogenetic trees to discover the segmentation mechanism
Run massive alignments to find conserved regions
Perform biological in-vivo experiments to change those regions
Proteomics:
The protein’s role
How the protein functions
The protein’s interactions with other proteins (e.g., Notch)
Finding Odz Genes
• BLASTing existing databases
• BLASTing new EST libraries
• Extracting DNA from various innocent creatures
[Diagram: existing databases, EST libraries and sequences discovered in the lab all feed the Odz database]
Odz Database

The collected data was organized by Michal Markovitz in a relational database.

The database consists of 10 different tables.
For example:
2 problems remained:
1. BLAST results include many non-Odz hits:
• prokaryotic hits
• non-metazoan hits
• EGF region hits
• low-similarity hits
2. Every day…
• New sequences are added to the existing databases
• New EST libraries are released
[Bar chart of BLAST hits by category: low score, prokaryotic, non-metazoan, Odz, EGF-like; vertical axis 0 to 80]
We need a program to automatically extract Odz hits from NCBI BLAST results!!!
The OdzFinder
A Perl program that will automatically extract Odz hits from NCBI BLAST results.
S.O.F.T - Screen Odz Flow Template
Blast Report input
→ Tax Report
→ Prokaryote? no
→ Metazoan? yes
→ EGF? No EGF / All EGF / Mixed EGF
   No EGF: Score>x? yes → Evalue<y? yes → Odz
   All EGF: Score>x? yes → Evalue<y? yes → Look up table → Odz
   Mixed EGF: Combination
→ Odz → Update Database
Blast Report input
BLAST searches are performed with the Odz orthologs as queries.
The results are sent to the OdzFinder program to be filtered.
The program extracts relevant information from each hit:
>gi|163076235|gb|AC765764.7
Apis mellifera BAC clone RP11-18D7 , complete sequence
Length = 184032
Score = 153 bits (328), Expect = 3e-36 Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH
Subjct: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
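As an illustration of the extraction step, here is a minimal Python sketch (the original program is in Perl) that pulls the gi number, subject length, bit score, E-value and percent identity out of one hit in the text layout shown above. The function name `parse_hit` and the returned field names are hypothetical, and the sketch assumes every field is present in the hit.

```python
import re

def parse_hit(hit_text: str) -> dict:
    """Extract the fields needed for filtering from one plain-text BLAST hit.

    Assumes the layout of the example hit; raises AttributeError if a field is missing.
    """
    lines = hit_text.strip().splitlines()
    gi = re.search(r"^>gi\|(\d+)\|", hit_text, re.MULTILINE).group(1)
    length = int(re.search(r"Length\s*=\s*(\d+)", hit_text).group(1))
    score = float(re.search(r"Score\s*=\s*([\d.]+)\s*bits", hit_text).group(1))
    evalue = re.search(r"Expect\s*=\s*([0-9.eE+-]+)", hit_text).group(1)
    if evalue.startswith("e"):          # BLAST may print "e-100" for "1e-100"
        evalue = "1" + evalue
    matched, aligned = re.search(r"Identities\s*=\s*(\d+)/(\d+)", hit_text).groups()
    return {
        "gi": gi,
        "description": lines[1].strip(),
        "length": length,
        "score_bits": score,
        "evalue": float(evalue),
        "identity_pct": 100.0 * int(matched) / int(aligned),
    }

# The example hit from the slide (alignment lines omitted).
HIT = """>gi|163076235|gb|AC765764.7
Apis mellifera BAC clone RP11-18D7 , complete sequence
Length = 184032
Score = 153 bits (328), Expect = 3e-36 Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
"""

hit = parse_hit(HIT)
```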
Tax Report
Search for eukaryotic and metazoan results.
Build prokaryotic database for possible future use.
Evolutionary distance becomes relevant when dealing with EGF-like repeats.

Hit:
>gi|163076235|gb|AC765764.7
Apis mellifera BAC clone RP11-18D7 , complete sequence
Lineage: root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Protostomia; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; Apidae; Apinae; Apini; Apis

Taxonomy Report
Eukaryota .................................. 2502 hits 41 orgs [root; cellular organisms]
. Bilateria ................................ 2421 hits 33 orgs [Fungi/Metazoa group; Metazoa; Eumetazoa]
. . Coelomata .............................. 2396 hits 31 orgs
. . . Deuterostomia ........................ 2322 hits 23 orgs
. . . . Chordata ........................... 2296 hits 22 orgs
. . . . . Euteleostomi ..................... 2236 hits 21 orgs [Craniata; Vertebrata; Gnathostomata; Teleostomi]
. . . . . . Tetrapoda ...................... 2022 hits 14 orgs [Sarcopterygii]
. . . . . . . Amniota ...................... 1908 hits 12 orgs
. . . . . . . . Eutheria ................... 1634 hits 10 orgs [Mammalia; Theria]
The program will receive the BLAST hit’s Taxonomy Report and parse it into a manageable hash table.
A default Taxonomy Report will be available when BLASTing against ESTs.
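One way to picture the hash table is a mapping from each hit's gi number to the set of taxa in its lineage, which makes the "Prokaryote?" and "Metazoan?" questions simple membership tests. The Python sketch below (standing in for the Perl original) assumes the lineage arrives as a semicolon-separated string; the function names, the second gi key and the shortened bacterial lineage are invented for the example.

```python
def parse_lineage(lineage: str) -> list:
    """Split a semicolon-separated lineage string into a list of taxon names."""
    return [taxon.strip() for taxon in lineage.split(";") if taxon.strip()]

def build_tax_table(lineages: dict) -> dict:
    """Map each hit identifier to the set of taxa in its lineage."""
    return {gi: set(parse_lineage(text)) for gi, text in lineages.items()}

def is_prokaryote(taxa: set) -> bool:
    return bool(taxa & {"Bacteria", "Archaea"})

def is_metazoan(taxa: set) -> bool:
    return "Metazoa" in taxa

table = build_tax_table({
    # Start of the Apis mellifera lineage from the slide.
    "163076235": "root; cellular organisms; Eukaryota; Fungi/Metazoa group; "
                 "Metazoa; Eumetazoa; Bilateria; Coelomata; Protostomia",
    # Hypothetical bacterial hit, included only to exercise the filter.
    "hypothetical-gi": "root; cellular organisms; Bacteria; Proteobacteria",
})
```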
EGF?
Tenascin-m (Odz) includes 8 EGF-like repeats.
The conserved EGF region gave problematic results.
Many hits appear only due to their similarity to the EGF region.
[Alignment figure: Query vs. Subject. High score!!!]
There are three possible positions regarding the hit’s relation to the query’s EGF-like region
(diagram: EGF-like region at query positions 525 to 804; query axis ends at 2750):
I. The hit is completely outside the query’s EGF-region
II. The hit is completely inside the query’s EGF-region
III. The hit is partially in the query’s EGF-region
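The three cases reduce to an interval comparison between the hit's aligned span on the query and the EGF-like region. A minimal Python sketch follows, using the 525 and 804 boundaries from the diagram and the labels "No EGF", "All EGF" and "Mixed EGF" used elsewhere in the deck; the function name and the choice to accept reversed coordinates are assumptions.

```python
EGF_START, EGF_END = 525, 804   # EGF-like region on the query, as drawn on the slide

def classify_hit(q_start: int, q_end: int,
                 egf_start: int = EGF_START, egf_end: int = EGF_END) -> str:
    """Classify a hit by where its aligned query span falls relative to the EGF region."""
    if q_start > q_end:                      # tolerate reversed coordinates
        q_start, q_end = q_end, q_start
    if q_end < egf_start or q_start > egf_end:
        return "No EGF"                      # completely outside
    if q_start >= egf_start and q_end <= egf_end:
        return "All EGF"                     # completely inside
    return "Mixed EGF"                       # partial overlap
```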
Get a better picture ..
Position I : No EGF
The hit is completely outside the query’s EGF-like region.
Score & e-value are examined.
Set low thresholds to ensure that very small hits are not missed - sometimes they are translocations.
Score>x? yes → Evalue<y? yes → Odz
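The two checks can be written as a single predicate. In this Python sketch the slides' x and y are left as parameters; the default values are placeholders chosen for the example, not the thresholds used in the lab.

```python
def passes_thresholds(score_bits: float, evalue: float,
                      min_score: float = 40.0, max_evalue: float = 1e-3) -> bool:
    """Score>x? and Evalue<y? from the flow chart.

    min_score (x) and max_evalue (y) are hypothetical placeholder values.
    """
    return score_bits > min_score and evalue < max_evalue
```

With the example hit shown earlier (153 bits, E-value 3e-36) this returns True under the placeholder thresholds.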
Position II : All EGF
The hit is completely inside the query’s EGF-like region.
To prevent acceptance of non-Odz hits that score highly only because of their EGF region, a look up table was established:
evolutionarily close query & subject: high id % demanded
evolutionarily distant query & subject: low id % demanded
Score>x? yes → Evalue<y? yes → Look up table → ?
Look up table example:
Query           Hit                        Odz Ortholog   Odz Paralog
Mus musculus    Homo sapiens               95%            70%
Mus musculus    Drosophila melanogaster    75%            55%
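A look up table of this kind maps naturally onto a dictionary keyed by the (query species, hit species) pair. The Python sketch below reads the percentages as minimum percent identity for calling an EGF-only hit an ortholog or a paralog; that reading, the function name and the choice to reject species pairs missing from the table are assumptions.

```python
# Values taken from the example table; interpretation as minimum % identity is assumed.
LOOKUP = {
    ("Mus musculus", "Homo sapiens"): {"ortholog": 95, "paralog": 70},
    ("Mus musculus", "Drosophila melanogaster"): {"ortholog": 75, "paralog": 55},
}

def classify_egf_hit(query_species: str, hit_species: str, identity_pct: float):
    """Return 'ortholog', 'paralog', or None if the hit is rejected."""
    row = LOOKUP.get((query_species, hit_species))
    if row is None:
        return None                      # species pair not in the table
    if identity_pct >= row["ortholog"]:
        return "ortholog"
    if identity_pct >= row["paralog"]:
        return "paralog"
    return None
```

The closer the two species, the higher the identity demanded, so a 60% identical EGF-only hit is rejected for mouse against human but accepted as a paralog for mouse against fly.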
Position III : Mixed EGF
The hit is partially inside the query’s EGF-like region.
2 possibilities:
A. False call! An EGF hit with insignificant similarity outside of the EGF domains.
B. The real thing! EGF with adjacent regions of significant similarity.
Is it more like A or like B?
A → treat like II
B → treat like I
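The slides do not say how "more like A or like B" is decided. Purely as an illustration, the Python sketch below uses the fraction of the aligned query span that lies outside the EGF-like region as a stand-in criterion; the 0.5 cutoff, the function name and the criterion itself are all assumptions rather than the program's actual rule.

```python
def resolve_mixed(q_start: int, q_end: int,
                  egf_start: int = 525, egf_end: int = 804,
                  min_outside_fraction: float = 0.5) -> str:
    """Decide whether a Mixed EGF hit is handled like 'No EGF' (B) or 'All EGF' (A).

    Stand-in criterion: share of the aligned query span outside the EGF region.
    """
    total = q_end - q_start + 1
    inside = max(0, min(q_end, egf_end) - max(q_start, egf_start) + 1)
    outside_fraction = (total - inside) / total
    if outside_fraction >= min_outside_fraction:
        return "No EGF"      # case B: substantial similarity beyond the EGF region
    return "All EGF"         # case A: essentially an EGF-only hit
```

A more faithful version would probably re-score the non-EGF portion of the alignment on its own, which needs the alignment itself rather than just its coordinates.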
Update Database
DBI
• A database interface module for Perl
• Enables Perl applications to access multiple database types
• Provides a consistent database interface independent of the actual database being used
Data flow through DBI:
Perl Script → DBI → DBD::mysql → MySQL RDBMS
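For readers who do not use Perl, the same layering can be shown with Python's DB-API, which plays a role similar to DBI: the script talks to a generic interface and a driver handles the actual database. In this sketch an in-memory SQLite database stands in for MySQL, and the table name `odz_hits` and its columns are hypothetical, not the schema of the Odz database.

```python
import sqlite3

def store_hits(conn, hits):
    """Insert (gi, score, species) rows, replacing any existing row with the same gi."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS odz_hits "
        "(gi INTEGER PRIMARY KEY, score INTEGER, species TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO odz_hits (gi, score, species) VALUES (?, ?, ?)",
        hits,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")           # SQLite stands in for the MySQL server
store_hits(conn, [(48096180, 637, "Apis mellifera")])
rows = conn.execute("SELECT gi, score, species FROM odz_hits").fetchall()
```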
Results!
gi          score   species
49256537    140     Xenopus
48096180    637     Apis mellifera
45382362    619     Gallus gallus
42658224    125     Homo sapiens
34932761    384     Rattus norvegicus
38087011    463     Mus musculus
45446084    419     Drosophila melanogaster
32565715    1604    Caenorhabditis elegans
41469033    760     Gasterosteus aculeatus
[Figure labels: Odz; Metazoa / not Metazoa; EGF / not EGF; Prokaryotic]
Special thanks to our project adviser
Dr. Ron Wides
For his guidance, patience & Krispy Kreme donuts