Protein Modularity and Evolution:
An examination of organism complexity via
protein domain structure
Presented by
Jennelle Heyer and Jonathan Ebbers
December 7, 2004
Presentation Outline
• Background Material
- Protein Evolution, Theory of Domains,
Gene Number
• Hypothesis
- Using a model protein family
• Procedure/Methods
- DPIP Program, Phylogenetic Analysis
• Results
• Discussion/Conclusions
Theories of Protein Evolution
A long time ago, in the primordial
soup of life, small polypeptides
began to form…
HDLC or TCP or….
HDLC + TCP = HCLCTCP
HCI*CTCP + TCP…
HCI*CTCP + QZX…
Functional proteins
Concept of Modularity
• Proteins consist of one or more domains
that were pieced together over time
• Domain  building blocks of proteins
– Defined as “spatially distinct structures
that could conceivably fold and function in
isolation” (Ponting and Russell, 2002)
– Dictate the function of the protein
– Evolutionary pressure to conserve
(sequence and/or structure)
Organismal Complexity
• The nematode, C.
elegans, has 19,500
genes in its genome
• Humans have between
20,000 and 25,000
genes in their genome
• HOW CAN THAT BE?
• Alternative splicing, multi-functional/network proteins
Hypothesis
• Gene products (proteins) can become multifunctional through the introduction of domains
• “…evolution does not produce innovation from
scratch. It works on what already exists, either
transforming a system to give it a new function or
combining several systems to produce a more complex
one” (Jacob, 1977)
• More complex or phylogenetically derived
organisms produce proteins with greater
domain complexity
Hypothesis Part II
• Create a protein domain “tool”
– Position
– Partner domain
– General organization
– Protein evolution
– Using a variety of sequenced genomes
• Allow investigators to learn about domain of
interest and apply to research
Kinesins: A model protein family
http://www.mb.tn.tudelft.nl/projects/
• Motor proteins found in
eukaryotic organisms
• Contain a conserved
motor domain
• Bind and walk along
microtubules
• Can carry a variety of
“cargo”
• May contain multiple
domains
Kinesins: A model protein family
From Reddy and Day, 2001
• Arabidopsis thaliana, a model plant species,
contains 61 kinesins
• S. pombe – 10, C. elegans – 22, Drosophila – 25,
Human and mouse ~ 45
Programming Approach
• Two programs used, BLAST and InterProScan,
held together with Perl scripts
• Give a domain sequence to PSI-BLAST, which
will identify proteins that have that domain.
• One by one, give those protein sequences to
InterProScan (IPR), which identifies the domains in each protein.
• Create a listing of proteins and map the data
into a phylogeny.
• Create a tree based on the phylogeny and
domains
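The steps above can be sketched as a small driver. This is a minimal illustration, not the DPIP source: `find_proteins` and `find_domains` are hypothetical stand-ins for the PSI-BLAST and InterProScan calls, passed in as functions so the sketch runs without either program installed. The toy sequences are made up.

```python
from typing import Callable, Dict, List


def run_pipeline(
    domain_seq: str,
    find_proteins: Callable[[str], Dict[str, str]],
    find_domains: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map each protein containing the query domain to its ordered domain list.

    find_proteins: hypothetical wrapper around PSI-BLAST,
                   returning {protein_id: sequence}.
    find_domains:  hypothetical wrapper around InterProScan,
                   returning domain names from N- to C-terminus.
    """
    proteins = find_proteins(domain_seq)
    # One by one, annotate every hit with the domains it contains.
    return {pid: find_domains(seq) for pid, seq in proteins.items()}


# Toy stand-ins so the sketch runs on its own (not real sequences).
def fake_blast(domain_seq: str) -> Dict[str, str]:
    return {"protA": "MOTORxxFHA", "protB": "xxMOTOR"}


def fake_interproscan(seq: str) -> List[str]:
    names = ["MOTOR", "FHA"]
    found = [(seq.find(n), n) for n in names if n in seq]
    return [n for _, n in sorted(found)]


architectures = run_pipeline("MOTOR", fake_blast, fake_interproscan)
print(architectures)
```

Passing the two search steps in as functions keeps the driver independent of how each external program is actually invoked.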
Program Flowchart
Domain sequence
→ BLAST
→ List of proteins with similar domains
→ InterProScan
→ List of domains in every protein
→ Maketree
→ Tree (includes domains)
Program Details
• Database selection:
– BLAST: RefSeq over nr
– InterProScan: SMART database only
• Threshold values:
– BLAST: Option to change, improve resolution
– InterProScan: E-value at 0.99, up from 0.01
• Used Arabidopsis sequences as a control
• Name: DPIP (Domain Placement in Proteins)
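To show how a threshold changes which hits survive, here is a minimal sketch of an E-value filter. It assumes the hits are available as 12-column tab-separated BLAST output (subject ID in column 2, E-value in column 11). The slides do not say which output format DPIP parsed, so the function name `filter_hits` and the example rows are illustrative only.

```python
import csv
import io
from typing import List, Tuple


def filter_hits(tabular: str, max_evalue: float) -> List[Tuple[str, float]]:
    """Return (subject_id, evalue) pairs with evalue <= max_evalue.

    Assumes 12-column tab-separated BLAST output: column 2 is the
    subject ID and column 11 is the E-value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            hits.append((row[1], evalue))
    return hits


# Made-up rows for illustration only.
example = (
    "query\tprotA\t85.0\t320\t48\t0\t1\t320\t10\t329\t1e-50\t410\n"
    "query\tprotB\t30.0\t100\t70\t2\t5\t104\t50\t149\t0.5\t28\n"
)
strict = filter_hits(example, max_evalue=0.01)
loose = filter_hits(example, max_evalue=0.99)
print(len(strict), len(loose))
```

A loose cutoff such as 0.99 keeps weak matches that a 0.01 cutoff would discard, trading specificity for sensitivity.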
Results
• A Quick Look at the Data
• Phylogenetic Approach
– Hypothesis I
• Qualitative Approach
– Hypothesis II
A Quick Look
Phylogenetic Approach
• “More complex or phylogenetically derived
organisms produce proteins with greater
domain complexity”
• Trace domain characteristics on a preset
tree
– Use MacClade tree drawing software
– Uses input data to create most parsimonious
trace
• Characteristics: maximum # of domains per protein,
# of unique domains per organism
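The two characteristics can be computed from per-organism domain listings roughly as follows. This is a sketch under assumptions: the nested dictionary layout, the function names, and the toy data are all hypothetical, and repeated copies of a domain within one protein are counted toward that protein's total.

```python
from typing import Dict, List

# organism -> protein -> ordered domain names
Architectures = Dict[str, Dict[str, List[str]]]


def max_domains_per_protein(data: Architectures) -> Dict[str, int]:
    """Largest number of domains found in any single protein, per organism."""
    return {
        org: max((len(doms) for doms in prots.values()), default=0)
        for org, prots in data.items()
    }


def unique_domains_per_organism(data: Architectures) -> Dict[str, int]:
    """Number of distinct domain types seen across all proteins, per organism."""
    return {
        org: len({name for doms in prots.values() for name in doms})
        for org, prots in data.items()
    }


# Made-up example data, not results from the study.
toy: Architectures = {
    "organism_1": {"p1": ["motor"], "p2": ["motor"]},
    "organism_2": {"p1": ["motor", "FHA", "PH"], "p2": ["motor", "FHA"]},
}
max_counts = max_domains_per_protein(toy)
unique_counts = unique_domains_per_organism(toy)
print(max_counts, unique_counts)
```

Each resulting integer is a discrete character state of the kind that can be traced onto a fixed tree.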
Maximum # of Domains per Protein
Green = 1
Black = 3
Number of Unique Domains per Organism
Blue = 1
Pink = 2
Dk. Blue = 3
Yellow = 5
Black = 6
Dash - ???
Phylogenetic Conclusions
• Inconclusive or null hypothesis supported
• Possible explanations:
– Kinesins may have limited domain complexity due
to function or folding
– Inherent bias in DPIP (RefSeq database)
• Future Work:
– Testing other domains through same process
– Updating database
– Include measure for position (N/I/C)
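One possible position measure, sketched here purely as an assumption, splits the protein into thirds and labels a domain by where its midpoint falls: N-terminal, internal, or C-terminal. The thirds rule and the function name `classify_position` are hypothetical. The slides name the N/I/C categories but do not define how they would be assigned.

```python
def classify_position(start: int, end: int, protein_length: int) -> str:
    """Label a domain 'N', 'I', or 'C' by the third its midpoint falls in.

    Coordinates are 1-based and inclusive. The thirds rule is an
    illustrative assumption, not a published definition.
    """
    if protein_length <= 0 or not (1 <= start <= end <= protein_length):
        raise ValueError("invalid domain coordinates")
    midpoint = (start + end) / 2
    third = protein_length / 3
    if midpoint <= third:
        return "N"
    if midpoint > 2 * third:
        return "C"
    return "I"


# Made-up coordinates for a 1000-residue protein.
labels = [
    classify_position(1, 300, 1000),
    classify_position(400, 700, 1000),
    classify_position(700, 1000, 1000),
]
print(labels)
```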
Qualitative Approach
• Create a protein domain “tool”
– Position
– Partner domain
– General organization
– Protein evolution
– Using a variety of sequenced genomes
• Compile data into a more
informative table
- Can I trace domain or protein evolution??
Presence of FHA/PH domain in kinesins
Yellow – Absent
Blue - Present
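A presence/absence table of this kind could be compiled from the per-protein domain lists along these lines. The data layout, the function name `presence_table`, and the toy entries are hypothetical and do not reproduce the study's results.

```python
from typing import Dict, Iterable, List


def presence_table(
    data: Dict[str, Dict[str, List[str]]],
    partners: Iterable[str],
) -> Dict[str, Dict[str, bool]]:
    """For each organism, record whether any protein carries each partner domain."""
    partners = list(partners)
    return {
        org: {d: any(d in doms for doms in prots.values()) for d in partners}
        for org, prots in data.items()
    }


# Made-up example data: organism -> protein -> domain names.
toy = {
    "organism_1": {"p1": ["motor"], "p2": ["motor"]},
    "organism_2": {"p1": ["motor", "FHA"], "p2": ["motor", "PH"]},
    "organism_3": {"p1": ["motor", "FHA"]},
}
table = presence_table(toy, ["FHA", "PH"])
for org, row in table.items():
    cells = "  ".join(
        f"{d}: {'Present' if present else 'Absent'}" for d, present in row.items()
    )
    print(f"{org:<12}{cells}")
```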
Conclusions
• DPIP program was created to answer two
questions:
– Does organismal complexity correspond with
protein complexity?
– Can we create a tool for researchers to better
understand domains in protein families?
• For kinesin motor domains: No and Yes
• For other domains: ???
Thanks to Webb Miller, Richard Cyr,
Claude dePamphilis, Alexander
Richter, Plant Physiology, Biology,
and Bioinformatics Depts.