Matrix and Gene families
Dynamic Programming
How to match up sequences and
have the matches make sense
and be quantitative
Question is
• How does a specific sequence compare
to one other specific sequence?
– Is it similar?
– If so, at what level?
• Can’t compare every base to every
other base: too complex
You are in the driver’s seat
• What is the most important?
– Exact nucleotide match?
– One-for-one (no gaps)?
– Length?
Mathematical model
• Derive equation for each position,
based on your value system
• Methodically go through each base for
each sequence and calculate the value
• At the end, find the optimal path
Starting point: three possible
scenarios for each position in
sequences X and Y
• At a given position, the bases (Xm and Yn)
are aligned with each other in X and Y
(identical or mismatched)
• At a given position, the base (Xm) in X is
aligned with a gap in Y (and Yn appeared
earlier)
• At a given position, the base in Y is
aligned with a gap in X (and Xm appeared
earlier)
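One way to write the three scenarios as an equation is sketched below. It is the standard global-alignment recurrence, not something stated on the slides. The symbols are assumptions introduced here: F(m, n) is the best score for aligning the first m bases of X with the first n bases of Y, s(X_m, Y_n) is the value for pairing X_m with Y_n (this covers a mismatch as well as an identical pair), and g is the value for a gap.

```latex
\[
F(m,n) = \max
\begin{cases}
F(m-1,\,n-1) + s(X_m, Y_n) & \text{$X_m$ aligned with $Y_n$} \\
F(m-1,\,n) + g & \text{$X_m$ aligned with a gap in $Y$} \\
F(m,\,n-1) + g & \text{$Y_n$ aligned with a gap in $X$}
\end{cases}
\]
\[
F(0,0) = 0, \qquad F(m,0) = m\,g, \qquad F(0,n) = n\,g
\]
```

Each case corresponds to one of the three scenarios, in the same order.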
Assign a value to each
situation
• Identical: +5
• Mismatch: -2
• Insertion or deletion: -6
(Could have others; could choose
different values)
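A minimal sketch of the whole procedure with these values: fill in a score for every pair of positions, then trace back from the end to recover one optimal path. This is the standard global-alignment approach (Needleman-Wunsch style) and is only an illustration. The function name `global_align` and the test sequences are made up for this example.

```python
# Values from the slide; other choices are possible.
MATCH, MISMATCH, GAP = 5, -2, -6


def global_align(x, y, match=MATCH, mismatch=MISMATCH, gap=GAP):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    rows, cols = len(x) + 1, len(y) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: a prefix aligned entirely against gaps.
    for m in range(1, rows):
        score[m][0] = m * gap
    for n in range(1, cols):
        score[0][n] = n * gap

    # Go through every position and keep the best of the three scenarios.
    for m in range(1, rows):
        for n in range(1, cols):
            pair = match if x[m - 1] == y[n - 1] else mismatch
            score[m][n] = max(
                score[m - 1][n - 1] + pair,  # Xm aligned with Yn
                score[m - 1][n] + gap,       # Xm aligned with a gap in Y
                score[m][n - 1] + gap,       # Yn aligned with a gap in X
            )

    # Trace back from the bottom-right corner to recover one optimal path.
    ax, ay = [], []
    m, n = len(x), len(y)
    while m > 0 or n > 0:
        if m > 0 and n > 0:
            pair = match if x[m - 1] == y[n - 1] else mismatch
            if score[m][n] == score[m - 1][n - 1] + pair:
                ax.append(x[m - 1])
                ay.append(y[n - 1])
                m, n = m - 1, n - 1
                continue
        if m > 0 and score[m][n] == score[m - 1][n] + gap:
            ax.append(x[m - 1])
            ay.append("-")
            m -= 1
        else:
            ax.append("-")
            ay.append(y[n - 1])
            n -= 1

    return score[len(x)][len(y)], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    total, top, bottom = global_align("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

With these values, aligning ACGT against AGT gives three identical pairs and one gap, so the score is 3 × 5 − 6 = 9. Changing the three values changes which alignment counts as optimal.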
http://www.acm.org/crossroads/xrds13-1/dna.html
Alpha-glucosidase in plants:
Enzymes sharing WIDMNE signature
sequence
alpha-glucosidase (all groups)
alpha-xylosidase (plant, bacteria, archaea)
Sucrase/Isomaltase (animal)
Related sequences with broad substrate
specificity
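A signature sequence such as WIDMNE can be located with a plain substring search. The sketch below is only an illustration: the function name `find_signature` and the three toy sequences are invented for this example and are not real enzyme sequences.

```python
SIGNATURE = "WIDMNE"  # signature sequence named on the slide


def find_signature(sequences, signature=SIGNATURE):
    """Map each sequence name to the 1-based position of its first signature hit."""
    hits = {}
    for name, seq in sequences.items():
        pos = seq.upper().find(signature)
        if pos != -1:
            hits[name] = pos + 1  # convert to 1-based residue numbering
    return hits


# Hypothetical toy sequences, made up for illustration only.
toy_sequences = {
    "toy_enzyme_A": "MKTLAGWIDMNEPSNF",
    "toy_enzyme_B": "MSSGLWIDMNEVAA",
    "toy_unrelated": "MKKLLPTAAGGH",
}

if __name__ == "__main__":
    for name, position in find_signature(toy_sequences).items():
        print(name, position)
```

An exact search like this finds only perfect copies of the signature. Related sequences with substitutions would need an alignment or a pattern that tolerates variation.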
[Figure: phylogenetic tree of the related sequences. Group labels: Plantae, Fungi, Protista, Archaea, Animalia, Bacteria. Scale bar: 0.1]
Sequence labels in the tree: At XYL1, Tm XYL, Mj Aglu, Pt Aglu, Sp Aglu, St MAL2, Anig aglA, Pp BAB3946, An AgdA, So Aglu, Ca GAM1, Bv Aglu, Soc GAM1, At Aglu-1, An agdB, Hv Aglu, Tp GAA, Hs GAA, Cj GAAII, Cj GAAI, Ss xylS, Hs S/I-N, Hs S/I-C, Bt Aglu-III, Lv GAA, Ce AAA8317, Bh BAB0442, Aa GlcA, Sc CAB8890, Tm AAD3539, Lp XylQ
Plant α-amylases are located in different
cellular compartments
Plastids (chloroplasts, amyloplasts)
Cytosol
Apoplast (cell wall space)
What is the function of the non-plastid
forms?
[Figure: phylogenetic tree of plant α-amylases in three clades]
• Clade I: secreted, 421-445 aa
• Clade II: cytosolic, 407-414 aa
• Clade III: plastidic, 877-906 aa
Sequence labels in the tree: Arabidopsis AMY1, barley A, rice 2A, barley B, morning glory, rice 3B, dodder, maize, adzuki bean, rice 3E, rice XP_472377, Arabidopsis AMY2, apple 10, cassava, apple 9, kiwifruit, apple 8, plantain, Arabidopsis AMY3, rice NP_916641, potato
Homologous sequences (homologues)
Share a common ancestor
Paralogs
Homologues derived by gene duplication
Functions may vary
Look for differences
Orthologs
Homologues derived by speciation
Common function
Look for similarities
Use alignments to look for:
• Structures important for common
functions (orthologs)
• Structures important for unique
functions (paralogs)
• Unusual structures
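Once sequences are aligned, conserved positions and gaps can be picked out column by column. The sketch below assumes the sequences are already aligned to equal length with "-" for gaps. The function names and the toy alignment are made up for illustration.

```python
def _check_equal_length(aligned):
    """Raise if the aligned sequences do not all have the same length."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")


def conserved_columns(aligned):
    """1-based columns where every sequence has the same residue and no gap."""
    _check_equal_length(aligned)
    columns = []
    for i, residues in enumerate(zip(*aligned), start=1):
        if len(set(residues)) == 1 and residues[0] != "-":
            columns.append(i)
    return columns


def gap_columns(aligned):
    """1-based columns where at least one sequence has a gap."""
    _check_equal_length(aligned)
    return [i for i, residues in enumerate(zip(*aligned), start=1)
            if "-" in residues]


# Hypothetical toy alignment; the second sequence has a three-residue deletion.
toy_alignment = [
    "ACDEFGHIK",
    "ACD---HIK",
    "ACDEFGHLK",
]

if __name__ == "__main__":
    print(conserved_columns(toy_alignment))
    print(gap_columns(toy_alignment))
```

Fully conserved columns are candidates for structures tied to a common function. Columns where one sequence differs or has a gap are candidates for unique or unusual features.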
AMY1 has a three amino acid deletion
[Figure: AtAMY1 shown from N-terminus (N) to C-terminus (C), with a label "3"]
Barley α-amylase
Red: NHDTGST
Blue: VAEIW
Active site residues
Variation in the active site loop among
plant and bacterial α-amylases
AtAMY1