dynamic-programming strategies
Part II : Sequence Comparison
Multiple Sequence Alignment
By Zhiwei Cao
Dept. of Biological Science
National University of Singapore
Email: [email protected]
Pair-Wise Alignment : Two Sequences
Made by Cao Zhiwei
Multiple sequence alignment -- MSA
The multiple sequence alignment problem is to
simultaneously align more than two sequences.
Multiple sequence alignment
What is MSA: A Definition
2D table
Absolute and relative positions

Sequences   Residues
            1    2    3    4    5    6    7    8    9    10
I           Y    D    G    G    A    V    ---  E    A    L
II          Y    D    G    G    ---  ---  ---  E    A    L
III         F    E    G    G    I    L    V    E    A    L
IV          F    D    ---  G    I    L    V    Q    A    V
V           Y    E    G    G    A    V    V    Q    A    L
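The table above can be held in code as a minimal sketch: each row is a string with one character per alignment column, and a single "-" stands for a "---" gap cell. The helper name `residue_numbers` is made up for illustration. It maps each alignment column to the residue's index within its own ungapped sequence, which is one plausible reading of "absolute and relative positions" on this slide.

```python
# Rows of the example alignment; "-" stands for a "---" gap cell.
MSA = {
    "I":   "YDGGAV-EAL",
    "II":  "YDGG---EAL",
    "III": "FEGGILVEAL",
    "IV":  "FD-GILVQAV",
    "V":   "YEGGAVVQAL",
}


def residue_numbers(aligned_row):
    """For each alignment column, return the 1-based index of the residue
    within its own ungapped sequence, or None where the row has a gap."""
    numbers = []
    seen = 0
    for symbol in aligned_row:
        if symbol == "-":
            numbers.append(None)
        else:
            seen += 1
            numbers.append(seen)
    return numbers


# Column 8 of the alignment holds E in sequence II, which is only the
# 5th residue of that sequence once gaps are ignored.
print(residue_numbers(MSA["II"]))
```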
Why multiple sequence alignment
1. Determine whether a group of proteins are related
2. Show regions of conservation within a protein family (sequence pattern)
3. Determine evolutionary history of gene families (phylogeny tree)
MSA: How to Align?
Seq1  AGAC
Seq2  AC
Seq3  AG

Seq1  AGAC      Seq1  AGAC      Seq2  AC
Seq3  AG--      Seq2  --AC      Seq3  AG
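Each pair above can be aligned with the standard pairwise dynamic-programming method. The following is a minimal Needleman-Wunsch sketch, assuming a simple scoring scheme (match +1, mismatch -1, gap -1) that is not taken from the slides. When several alignments tie for the best score, the traceback returns just one of them, so the gap placement may differ from the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


for x, y in [("AGAC", "AG"), ("AGAC", "AC"), ("AC", "AG")]:
    print(needleman_wunsch(x, y))
```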
MSA: Some Possible Alignments
Seq1,2,3

--AC    AGAC    AC--    AGAC    AGAC
--AG    A--C    AG--    AG--    --AC
AGAC    AG--    AGAC    --AC    AG--
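To choose among candidate alignments like those listed above, a score is needed. A common choice is the sum-of-pairs score, sketched here under assumed costs (match +1, mismatch -1, residue-versus-gap -1, gap-versus-gap 0). The slides do not state a scoring scheme, so these numbers are illustrative only.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Score a multiple alignment by summing pairwise scores in every column."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue            # gap against gap costs nothing
            if x == "-" or y == "-":
                total += gap        # residue against gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


candidates = [
    ["--AC", "--AG", "AGAC"],
    ["AGAC", "A--C", "AG--"],
    ["AC--", "AG--", "AGAC"],
    ["AGAC", "AG--", "--AC"],
]
for rows in candidates:
    print(rows, sum_of_pairs(rows))
```

The ranking depends entirely on the chosen costs; a different gap penalty can change which candidate comes out best.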
MSA History
Until 1987: multiple alignments were constructed manually from pairwise alignments
Lipman et al. 1989: the pairwise dynamic programming approach was applied to multiple sequence alignment (the MSA program)
http://www.psc.edu/general/software/packages/msa/msa.html
Commonly Used MSA Methods
1. Dynamic programming: extension of pairwise sequence alignment
2. Progressive sequence alignment: incorporates phylogenetic information to guide the alignment process
3. Iterative sequence alignment: corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences
Progressive Method of MSA
Progressive alignment was introduced in 1987 and 1988 (Feng & Doolittle 1987; Higgins and Sharp 1988)
Based on phylogeny
How MSA: Progressive method
1 - Do pairwise alignment of all sequences and
calculate distance matrix
                  [1]    [2]    [3]    [4]
[1] Scerevisiae
[2] Celegans      0.640
[3] Drosophila    0.634  0.327
[4] Human         0.630  0.408  0.420
[5] Mouse         0.619  0.405  0.469  0.289
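How a distance such as 0.289 might be obtained from an aligned pair can be sketched as follows. The slides do not say which distance measure is used; this sketch assumes the simple p-distance (fraction of differing residues over the columns where neither sequence has a gap). The function names are made up for illustration, and the short rows are the example alignment from the earlier definition slide, used as stand-ins for real pairwise alignments.

```python
from itertools import combinations


def p_distance(row_a, row_b):
    """Fraction of differing residues over columns without a gap in either row."""
    compared = 0
    differing = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue                 # skip columns where either row has a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Map each unordered pair of names to its p-distance."""
    return {frozenset((a, b)): p_distance(aligned[a], aligned[b])
            for a, b in combinations(aligned, 2)}


aligned = {
    "I":   "YDGGAV-EAL",
    "II":  "YDGG---EAL",
    "III": "FEGGILVEAL",
}
for pair, d in distance_matrix(aligned).items():
    print(sorted(pair), round(d, 3))
```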
How MSA: Progressive method
2 - Create a guide tree based on this pairwise
distance matrix
[Figure: guide tree with leaves Human, Mouse, Dmel, Cele, Scer]
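A guide tree can be built from the distance matrix by hierarchical clustering. The sketch below uses UPGMA (average linkage) because it is short; this is a swapped-in method for illustration, and actual programs may use a different one (the ClustalW result later in these slides is a neighbor-joining tree). The distances are the ones from the matrix in step 1, and the tree is returned as nested tuples.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a guide tree as nested tuples from pairwise distances.

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}   # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the two closest clusters and merge them.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for other in sizes:
            # Size-weighted average of the distances to the two merged parts.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


names = ["Scer", "Cele", "Dmel", "Human", "Mouse"]
lower = [[], [0.640], [0.634, 0.327], [0.630, 0.408, 0.420],
         [0.619, 0.405, 0.469, 0.289]]
dist = {frozenset((names[i], names[j])): lower[i][j]
        for i in range(len(names)) for j in range(i)}
print(upgma(names, dist))
```

With these distances Human and Mouse (0.289) are joined first, then Cele and Dmel (0.327), and Scer joins last.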
How MSA: Progressive method
3 - Align progressively following guide tree
Start by aligning most closely related pairs of sequences
At each step align two sequences or one to an existing subalignment
[Figure label: Gaps]
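The order in which a guide tree dictates the alignment steps can be sketched as a post-order walk: both children of a node are finished before the node itself is aligned. The nested-tuple tree format and the function name `merge_order` are assumptions made for this illustration, and the actual alignment of two groups is left out.

```python
def merge_order(tree, steps=None):
    """Return (label, steps): steps lists the pairs to align, children first."""
    if steps is None:
        steps = []
    if isinstance(tree, str):
        return tree, steps                    # a leaf is a single sequence
    left, _ = merge_order(tree[0], steps)     # finish the left subtree
    right, _ = merge_order(tree[1], steps)    # then the right subtree
    steps.append((left, right))               # then align the two results
    return "(" + left + "," + right + ")", steps


# Hypothetical guide tree over the five species used in these slides.
guide_tree = ("Scer", (("Human", "Mouse"), ("Cele", "Dmel")))
label, steps = merge_order(guide_tree)
for number, (left, right) in enumerate(steps, start=1):
    print(number, "align", left, "with", right)
```

A step whose two sides are both single names aligns two sequences; a step with a parenthesised side aligns against an existing subalignment.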
Available programs for progressive MSA
CLUSTAL (Free package):
Higgins,D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple
sequence alignment on a microcomputer. Gene 73,237-244.
http://www.ebi.ac.uk/clustalw/
http://clustalw.genome.ad.jp/ (origin 2)
PILEUP (part of GCG commercial package)
http://www.gcg.com
Others
Example software---ClustalW
http://clustalw.genome.ad.jp
Example Software---ClustalW (Bioedit)
http://www.mbio.ncsu.edu/BioEdit/bioedit.html
Steps To Do ClustalW:
Step 1: Prepare the sequences:
Retrieve sequences
General considerations:
i. The more the better
ii. Exclude similar (>80%) sequences
iii. Necessary modification
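Consideration ii can be sketched as a greedy filter: keep a sequence only if it is not too similar to any sequence already kept. The similarity used here is `difflib.SequenceMatcher.ratio()` from the Python standard library, which is only a rough stand-in for percent identity from a real pairwise alignment. The function names and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough similarity in [0, 1]; a crude proxy for alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_redundant(seqs, threshold=0.8):
    """Keep sequences whose similarity to every kept sequence is <= threshold."""
    kept = {}
    for name, seq in seqs.items():
        if all(similarity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept


# Toy sequences, made up for this example.
seqs = {
    "a": "ACDEFGHIKL",
    "b": "ACDEFGHIKL",   # identical to a
    "c": "ACDEFGHIKV",   # one residue different from a
    "d": "MNPQRSTVWY",   # shares nothing with a
}
print(list(filter_redundant(seqs)))
```

The result depends on input order, since the first member of each similar group is the one that is kept.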
Steps To Do ClustalW:
Step 2: Input the sequences:
– Put all sequences into one file, then copy and paste
– Upload sequences one by one
– Pay attention to sequence format
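Since format problems are a common source of errors, a quick check before submitting can help. This is a minimal sketch of a FASTA reader (header lines start with ">", sequence lines follow), assuming FASTA is the format being pasted. The function name and the two records are made up for the example and are not real SH2 sequences.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                          # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError("duplicate header: " + header)
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)      # sequences may span many lines
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up records, only to show the expected layout.
example = """>seq1 first toy record
ACDEFGHIKL
MNPQRSTVWY
>seq2 second toy record
ACDEFGHIKV
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```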
Steps To Do ClustalW:
Step 3: Set the parameters:
Default parameters for protein alignment
General Setting Parameters:
Output Format: CLUSTALW
Pairwise Alignment: FAST/APPROXIMATE
Example: SH2 domain family
SH2 domains function as regulatory modules of intracellular signalling cascades
Figure: V-Src Tyrosine Kinase Transforming Protein (Phosphotyrosine Recognition Domain Sh2) Complex With Phosphopeptide A (PDB code 1SHA)
Input Sequences For ClustalW
>1SHA-A V-SRC Tyrosine kinase transforming protein (SH2 domain), from Rous sarcoma virus
>1A81-A Chain A, Tandem Sh2 Domain Of The Syk Kinase, from Homo sapiens
>1JWO-A Chain A, Sh2 Domain Of The Csk Homologous Kinase Chk, from Homo sapiens
>1BLJ Nmr Ensemble Of Blk Sh2 Domain, from Mus musculus (house mouse)
Result 1 of ClustalW
Result 2 of ClustalW
Result 3 of ClustalW: N-J tree
Interpret ClustalW results
Three characters are used in Result 2:
1. '*' indicates positions which have a single, fully conserved residue
2. ':' indicates that one of the 'strong' groups is fully conserved
3. '.' indicates that one of the 'weaker' groups is fully conserved
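How such a line of symbols could be derived from an alignment is sketched below. The strong and weak residue groups are reproduced from the ClustalW documentation as an assumption and should be checked against the version in use. Leaving a blank under any column that contains a gap is also an assumption of this sketch. The rows are the small example alignment from the definition slide, with "-" for a gap.

```python
# Residue groups as listed in the ClustalW documentation (assumption: verify
# against your ClustalW version before relying on them).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]


def conservation_line(alignment):
    """Return one symbol per column: '*', ':', '.' or a blank."""
    symbols = []
    for column in zip(*alignment):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")      # assumption: gapped columns get no symbol
        elif len(residues) == 1:
            symbols.append("*")      # single, fully conserved residue
        elif any(residues <= set(group) for group in STRONG):
            symbols.append(":")      # all residues fall in one strong group
        elif any(residues <= set(group) for group in WEAK):
            symbols.append(".")      # all residues fall in one weaker group
        else:
            symbols.append(" ")
    return "".join(symbols)


rows = ["YDGGAV-EAL", "YDGG---EAL", "FEGGILVEAL", "FD-GILVQAV", "YEGGAVVQAL"]
for row in rows:
    print(row)
print(conservation_line(rows))
```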
Interpret ClustalW results
Insertion and deletion, gap
Consensus
………………………QCGG………....G.....C …......C...........YSQC...
Consensus sequence Sequence Pattern
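A consensus line like the one above can be sketched as a per-column majority vote: print the most common residue when it is frequent enough, otherwise a dot. The 60% cutoff and the function name are assumptions for illustration; the slides do not say how the consensus shown was derived. The rows are the example alignment from the definition slide.

```python
from collections import Counter


def consensus(alignment, min_fraction=0.6):
    """Most common residue per column, or '.' if no residue reaches the cutoff."""
    n_rows = len(alignment)
    out = []
    for column in zip(*alignment):
        counts = Counter(symbol for symbol in column if symbol != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            # The fraction is taken over all rows, so gaps count against it.
            out.append(residue if count / n_rows >= min_fraction else ".")
        else:
            out.append(".")          # column made only of gaps
    return "".join(out)


rows = ["YDGGAV-EAL", "YDGG---EAL", "FEGGILVEAL", "FD-GILVQAV", "YEGGAVVQAL"]
print(consensus(rows))
```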
Notes on how to use ClustalW
Remove the signal peptide before alignment; try to compare only the homologous portions
Be careful with sequences containing a repetitive element (such as a domain)
Heuristic algorithm: a perfect alignment is not guaranteed
Notes on how to use ClustalW
Mobilize your biological knowledge: check the alignment, then recheck it
Manually re-align your sequences if the alignment is bad
Application of MSA
Example: Drug discovery for SARS
Anand et al., www.scienceexpress.org //10.1126/science.1085658, published May 13, 2003
Coronaviruses are positive-stranded RNA viruses
Sequence structure function
Human coronavirus 229E: HCoV;
Porcine transmissible gastroenteritis virus: TGEV;
Mouse hepatitis virus: MHV;
Bovine coronavirus: BCoV;
SARS-associated coronavirus: SARS-CoV;
Avian infectious bronchitis virus: IBV.
Summary
What is MSA
Why do MSA
How to do MSA
– Available computational methods
– ClustalW
– Interpret results of ClustalW
– Quality control
Application example of MSA: SARS drug discovery
Phylogeny tree: evolutionary history