dynamic-programming strategies


Part II : Sequence Comparison
Multiple Sequence Alignment
By Zhiwei Cao
Dept. of Biological Science
National University of Singapore
Email: [email protected]
Pair-Wise Alignment : Two Sequences
Made by Cao Zhiwei
Multiple sequence alignment -- MSA

The multiple sequence alignment problem is to
simultaneously align more than two sequences.
Multiple sequence alignment
What is MSA: A Definition
2D table: sequences (rows) x residues (columns)
Absolute and relative positions

Sequence   1    2    3    4    5    6    7    8    9    10
I          Y    D    G    G    A    V    ---  E    A    L
II         Y    D    G    G    ---  ---  ---  E    A    L
III        F    E    G    G    I    L    V    E    A    L
IV         F    D    ---  G    I    L    V    Q    A    V
V          Y    E    G    G    A    V    V    Q    A    L
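The distinction between absolute and relative positions can be sketched in a few lines of Python. This is a minimal illustration, assuming "absolute" means the alignment column and "relative" means the residue number within one ungapped sequence. The function name is hypothetical, and gaps are written as a single "-" per column.

```python
def column_to_residue_positions(aligned_row, gap="-"):
    """For each alignment column, return the 1-based residue number
    within the ungapped sequence, or None where the row has a gap."""
    positions = []
    residue_count = 0
    for ch in aligned_row:
        if ch == gap:
            positions.append(None)  # no residue of this sequence in this column
        else:
            residue_count += 1
            positions.append(residue_count)
    return positions


# Sequence IV from the table above, one character per column.
row_iv = "FD-GILVQAV"
for column, residue in enumerate(column_to_residue_positions(row_iv), start=1):
    print(column, row_iv[column - 1], residue)
```

Column 4 of the alignment holds the third residue of sequence IV, because column 3 is a gap in that row.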
Why multiple sequence alignment
1. Determine whether a group of proteins is related
2. Show regions of conservation within a protein family -> sequence pattern
3. Determine evolutionary history of gene families -> phylogenetic tree
MSA: How to Align?
Seq1  AGAC
Seq2  AC
Seq3  AG

Pairwise alignments:

Seq1  AGAC
Seq3  AG--

Seq1  AGAC
Seq2  --AC

Seq2  AC
Seq3  AG
MSA: Some Possible Alignments
Seq1,2,3: AGAC, AC, AG

--AC
--AG
AGAC

AGAC
A--C
AG--

AC--
AG--
AGAC

AGAC
AG--
--AC

AGAC
--AC
AG--
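With several candidate alignments of the same sequences, some objective score is needed to choose among them. The slides do not name one. A common choice is the sum-of-pairs score, sketched below with arbitrary illustrative values for match, mismatch, and gap (assumptions, not values from the lecture).

```python
from itertools import combinations


def sum_of_pairs(rows, match=1, mismatch=-1, gap=-2):
    """Score an alignment by summing pairwise scores over every column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap paired with a gap contributes nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


candidates = [
    ["AGAC", "--AC", "AG--"],
    ["AGAC", "A--C", "AG--"],
    ["AGAC", "--AC", "--AG"],
]
for rows in candidates:
    print(rows, sum_of_pairs(rows))
```

Under this toy scoring scheme the third candidate scores highest. A different scoring scheme could rank the candidates differently.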
MSA History

Until 1987, multiple alignments were constructed
manually from pairwise alignments

Lipman et al. 1989: pairwise dynamic
programming approach extended to multiple
sequence alignment (the MSA program)
http://www.psc.edu/general/software/packages/msa/msa.html
Commonly Used MSA Methods
1. Dynamic programming - extension of pairwise
   sequence alignment
2. Progressive sequence alignment - incorporates
   phylogenetic information to guide the alignment process
3. Iterative sequence alignment - corrects for problems
   with progressive alignment by repeatedly realigning
   subgroups of sequences
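The first method builds on pairwise dynamic programming, so a compact pairwise example may help. The sketch below is a plain global alignment (Needleman-Wunsch style) with a linear gap penalty. The scoring values are illustrative assumptions, and real tools use substitution matrices and affine gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("AGAC", "AC"))
```

Extending the same recurrence to k sequences needs a k-dimensional table, whose size grows exponentially with the number of sequences. That cost is why the progressive and iterative methods exist.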
Progressive Method of MSA

Progressive alignment was introduced in 1987-88: Feng & Doolittle
1987, Higgins and Sharp 1988

Based on phylogeny
How MSA: Progressive method
1 - Do pairwise alignment of all sequences and
calculate distance matrix
                   [1]     [2]     [3]     [4]
[1] Scerevisiae
[2] Celegans       0.640
[3] Drosophila     0.634   0.327
[4] Human          0.630   0.408   0.420
[5] Mouse          0.619   0.405   0.469   0.289
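Step 1 can be sketched as follows. As a stand-in for a real pairwise alignment score, this example uses edit distance (itself a small dynamic program) normalised by the longer sequence length. That distance measure and the function names are assumptions for illustration, not how ClustalW computes its distances.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def distance_matrix(seqs):
    """Lower-triangular distances keyed by (later_name, earlier_name)."""
    names = list(seqs)
    return {
        (x, y): edit_distance(seqs[x], seqs[y]) / max(len(seqs[x]), len(seqs[y]))
        for i, x in enumerate(names)
        for y in names[:i]
    }


toy = {"Seq1": "AGAC", "Seq2": "AC", "Seq3": "AG"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```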
How MSA: Progressive method
2 - Create a guide tree based on this pairwise
distance matrix
[Figure: guide tree with leaves Human, Mouse, Dmel, Cele, Scer]
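One simple way to turn the distance matrix from step 1 into a guide tree is UPGMA (average-linkage clustering), sketched below on the five-species distances shown earlier. The choice of UPGMA is an assumption made for brevity. ClustalW itself builds its guide tree by neighbour joining, and the function below is illustrative, not taken from any package.

```python
from itertools import combinations


def upgma(names, dist):
    """dist maps frozenset({a, b}) -> distance between leaves a and b.
    Returns (tree as nested tuples, list of merges in order)."""
    clusters = [(name,) for name in names]  # each cluster is a tuple of leaves
    tree = {c: c[0] for c in clusters}      # cluster -> nested-tuple subtree
    merges = []

    def cluster_distance(c1, c2):
        # UPGMA: arithmetic mean of all leaf-to-leaf distances between clusters
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merged = c1 + c2
        tree[merged] = (tree[c1], tree[c2])
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return tree[clusters[0]], merges


names = ["Scerevisiae", "Celegans", "Drosophila", "Human", "Mouse"]
lower = [[], [0.640], [0.634, 0.327], [0.630, 0.408, 0.420],
         [0.619, 0.405, 0.469, 0.289]]
dist = {frozenset((names[i], names[j])): d
        for i, row in enumerate(lower) for j, d in enumerate(row)}

tree, merges = upgma(names, dist)
print(tree)
for c1, c2, d in merges:
    print(c1, "+", c2, round(d, 4))
```

The first merge joins Human and Mouse (distance 0.289), the closest pair in the matrix, and yeast joins last.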
How MSA: Progressive method
3 - Align progressively following guide tree

Start by aligning most closely related pairs of sequences

Gaps

At each step align two sequences or one to an existing
subalignment
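The step "align one sequence to an existing subalignment" can be sketched as a dynamic program in which each column of the subalignment is scored against the incoming residue by averaging over the rows. The scoring values and the function name are assumptions for illustration. Production tools use substitution matrices, sequence weights, and position-specific gap penalties.

```python
def align_to_subalignment(alignment, seq, match=1, mismatch=-1, gap=-2):
    """Add an ungapped sequence to an existing alignment (list of equal-length rows).
    Gaps already in the subalignment are never removed, only new ones are added."""
    ncol, n = len(alignment[0]), len(seq)

    def col_score(col, ch):
        # Average score of residue ch against every row in this column.
        column = [row[col] for row in alignment]
        return sum(gap if c == "-" else (match if c == ch else mismatch)
                   for c in column) / len(column)

    # score[i][j] = best score for the first i columns vs the first j residues
    score = [[0.0] * (n + 1) for _ in range(ncol + 1)]
    for i in range(1, ncol + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, ncol + 1):
        for j in range(1, n + 1):
            score[i][j] = max(score[i - 1][j - 1] + col_score(i - 1, seq[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back, emitting whole columns of the old alignment at a time.
    rows, new_row = [[] for _ in alignment], []
    i, j = ncol, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + col_score(i - 1, seq[j - 1]):
            for out, row in zip(rows, alignment):
                out.append(row[i - 1])
            new_row.append(seq[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            for out, row in zip(rows, alignment):
                out.append(row[i - 1])
            new_row.append("-"); i -= 1
        else:
            for out in rows:
                out.append("-")  # open a new gap column in every existing row
            new_row.append(seq[j - 1]); j -= 1
    return ["".join(reversed(r)) for r in rows] + ["".join(reversed(new_row))]


sub = align_to_subalignment(["AGAC"], "AC")  # first pair
msa = align_to_subalignment(sub, "AG")       # then add the third sequence
print("\n".join(msa))
```

Because earlier columns are carried forward unchanged, an early misplaced gap stays in the final alignment. This is the weakness that iterative methods try to repair.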
Available programs for progressive
MSA

CLUSTAL (Free package):

Higgins,D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple
sequence alignment on a microcomputer. Gene 73,237-244.


http://www.ebi.ac.uk/clustalw/
http://clustalw.genome.ad.jp/ (origin 2)

PILEUP (part of GCG commercial package)

http://www.gcg.com

Others
Example software---ClustalW
http://clustalw.genome.ad.jp
Example Software---ClustalW (Bioedit)
http://www.mbio.ncsu.edu/BioEdit/bioedit.html
Steps To Do ClustalW:
Step 1: Prepare the sequences:

Retrieve sequences
General considerations:
i. The more the better
ii. Exclude similar (>80%) sequences
iii. Necessary modification
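The "exclude similar (>80%) sequences" consideration can be approximated with a greedy filter. In this sketch the similarity measure is difflib's SequenceMatcher ratio from the Python standard library, used as a rough stand-in for percent identity. That substitution and the function name are assumptions, and a proper pairwise alignment would give a more meaningful identity value.

```python
from difflib import SequenceMatcher


def filter_redundant(records, threshold=0.8):
    """Keep a sequence only if it is no more than `threshold` similar
    to every sequence already kept. records: list of (name, sequence)."""
    kept = []
    for name, seq in records:
        is_novel = all(
            SequenceMatcher(None, seq, kept_seq, autojunk=False).ratio() <= threshold
            for _, kept_seq in kept
        )
        if is_novel:
            kept.append((name, seq))
    return kept


# Hypothetical toy records, not real protein sequences.
records = [("a", "ACDEFGHIKL"), ("b", "ACDEFGHIKL"), ("c", "WWWWWYYYYY")]
print([name for name, _ in filter_redundant(records)])
```

The result depends on input order, since the first member of each similar group is the one retained.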
Steps To Do ClustalW:
Step 2: Input the sequences:
– Put all sequences into one file (copy and paste)
– Upload sequences one by one
– Pay attention to sequence format
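Since format errors are a common cause of failed submissions, it can help to check the combined file before uploading. The sketch below assumes the sequences are in FASTA format (a ">" header line followed by sequence lines), which ClustalW accepts. The parser is a minimal illustration, and the record names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 first toy record\nAGAC\n>seq2 second toy record\nAC\nAG\n"
for header, seq in parse_fasta(example):
    print(header, len(seq))
```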
Steps To Do ClustalW:
Step 3: Set the parameters:

Default parameters for protein alignment
General Setting Parameters:
  Output Format: CLUSTALW
  Pairwise Alignment: FAST/APPROXIMATE
Example: SH2 domain family

SH2 domains function as
regulatory modules of intracellular
signalling cascades

V-Src Tyrosine Kinase
Transforming Protein
(Phosphotyrosine Recognition
Domain Sh2) Complex With
Phosphopeptide A (PDB code
1SHA):
Input Sequences For ClustalW




>1SHA-A V-SRC Tyrosine kinase transforming protein (SH2 domain), from Rous sarcoma virus
>1A81-A Chain A, Tandem Sh2 Domain Of The Syk Kinase, from Homo sapiens
>1JWO-A Chain A, Sh2 Domain Of The Csk Homologous Kinase Chk, from Homo sapiens
>1BLJ Nmr Ensemble Of Blk Sh2 Domain, from Mus musculus (house mouse)
Result 1 of ClustalW
Result 2 of ClustalW
Result 3 of ClustalW: N-J tree
Interpret ClustalW results

Three characters are used in Result 2:
1. '*' indicates positions which have a single, fully conserved
   residue
2. ':' indicates that one of the 'strong' groups is fully conserved
3. '.' indicates that one of the 'weaker' groups is fully conserved
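The three symbols can be reproduced from an alignment with a short function. The sketch below uses the strong and weak residue groups as listed in the ClustalW documentation, quoted here from memory and so best checked against the program's own help text. It also treats any column containing a gap as unmarked, which is a simplifying assumption.

```python
# Residue groups as given in the ClustalW documentation (verify before relying on them).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
        "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return one symbol per column: '*', ':', '.' or ' '."""
    symbols = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")          # assumption: gapped columns are unmarked
        elif len(residues) == 1:
            symbols.append("*")          # single, fully conserved residue
        elif any(residues <= set(group) for group in STRONG):
            symbols.append(":")          # all residues fall within one strong group
        elif any(residues <= set(group) for group in WEAK):
            symbols.append(".")          # all residues fall within one weak group
        else:
            symbols.append(" ")
    return "".join(symbols)


# The five-sequence table from the definition slide, one character per column.
rows = ["YDGGAV-EAL", "YDGG---EAL", "FEGGILVEAL", "FD-GILVQAV", "YEGGAVVQAL"]
for row in rows:
    print(row)
print(conservation_line(rows))
```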
Interpret ClustalW results

Insertion and deletion, gap
Consensus

………………………QCGG………....G.....C …......C...........YSQC...
Consensus sequence -> Sequence pattern
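A consensus line like the dotted pattern above can be derived column by column. This sketch keeps a residue only when it reaches a chosen fraction of the rows and writes "." otherwise. The threshold parameter and function name are illustrative assumptions.

```python
from collections import Counter


def consensus(rows, min_fraction=1.0):
    """Return the consensus string: the most common residue in each column
    if its frequency is at least min_fraction, otherwise '.'."""
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            out.append(residue)
        else:
            out.append(".")
    return "".join(out)


# The five-sequence table from the definition slide, one character per column.
rows = ["YDGGAV-EAL", "YDGG---EAL", "FEGGILVEAL", "FD-GILVQAV", "YEGGAVVQAL"]
print(consensus(rows))                    # fully conserved columns only
print(consensus(rows, min_fraction=0.8))  # allow one dissenting row out of five
```

Lowering the threshold reveals more of the pattern at the cost of including positions that are not strictly conserved.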
Notes on how to use ClustalW

Remove signal peptide before alignment, try to compare
homologous portion

Sequence containing a repetitive element (such as a
domain)

Heuristic algorithm: not guaranteed to find the optimal
alignment
Notes on how to use ClustalW

Apply your biological knowledge: check the
alignment, then recheck it

Manually realign your sequences if the alignment is poor
Application of MSA
Example: Drug discovery for SARS
Anand et al., www.scienceexpress.org //10.1126/science.1085658, published May 13, 2003

Coronaviruses are positive-stranded RNA viruses

Sequence structure function
Human coronavirus 229E: HCoV;
 Porcine transmissible gastroenteritis virus: TGEV;
 Mouse hepatitis virus: MHV;
 Bovine coronavirus: BCoV;
 SARS-associated coronavirus: SARS-CoV;
 Avian infectious bronchitis virus: IBV.

Summary




What is MSA
Why do MSA
How to do MSA
– Available computational methods
– ClustalW
– Interpret results of ClustalW
– Quality control
Application example of MSA: SARS drug discovery
Phylogeny tree: evolutionary history