Smith Waterman vs Blast in siRNA Design


Smith-Waterman vs Blast in
siRNA Oligonucleotide Design
and Selection
Christine Lee
Dr. Cecilie Boysen, Ph.D.
Paracel, Applied High Performance Computing
Southern California Bioinformatics Institute
Summer 2004
Funded by the National Science Foundation and National Institutes of Health
Outline






History of RNAi
Small interfering RNA (siRNA) Mechanism
siRNA design and selection
Blast vs Smith-Waterman
Project Objectives and Results
Conclusions & Future Work
History of RNAi



Discovered in 1998 by Andrew Fire, Craig
Mello, and colleagues
RNAi – silencing of gene expression by
dsRNA molecules
Organism used: Caenorhabditis elegans
Short interfering RNA (siRNA)
Mechanism
http://www.bioteach.ubc.ca/MolecularBiology/AntisenseRNA/siRNA.gif
siRNA Selection & Design:
Avoiding Cross-Hybridization



Important to guard against strong cross-hybridization to other genes
Cross-hybridization with non-specific targets results in wasted lab time and materials, as well as inaccurate conclusions
Preliminary sequence analysis allows verification of candidate oligos to protect against cross-hybridization
siRNA Selection & Design

Hybridization concerns:
siRNA mismatch tolerance
  Insertion/deletion vs mismatch

Query:    1 GAACTTATCTTCCTTCTTC 19
            |||||||||||||||||||
Sbjct: 3783 GAACTTATCTTCCTTCTTC 3801

Query:   19 GAAGAAGGAAGATAAGTTC 1
            ||||||||| || ||||||
Sbjct:  778 GAAGAAGGATGAGAAGTTC 796
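The two alignments above can be checked with a short script. This is only a sketch: the helper names reverse_complement and count_identities are made up for illustration and are not part of any tool named on the slides. It reverse-complements the 19-mer to get the minus-strand query and counts matching positions against the second subject.

```python
# Illustrative helpers (hypothetical names, not from Blast or Paracel software).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def count_identities(a: str, b: str) -> int:
    """Count identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a.upper(), b.upper()))

sirna = "GAACTTATCTTCCTTCTTC"            # query of the first alignment
minus_query = reverse_complement(sirna)  # query of the second alignment
off_target = "GAAGAAGGATGAGAAGTTC"       # subject of the second alignment

identities = count_identities(minus_query, off_target)
print(minus_query)                                     # GAAGAAGGAAGATAAGTTC
print(f"{identities}/{len(sirna)} identities, "
      f"{len(sirna) - identities} mismatches")         # 17/19 identities, 2 mismatches
```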
Blast vs Smith-Waterman


Blast may potentially miss relevant alignments
  Using word size seven, nearly 6% of all possible alignments with three mismatches between 21-mers will be missed
  Increasing the word size or allowing more mismatches contributes to a higher rate of missed hits
Smith-Waterman is said to have higher sensitivity, so why not use it?
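The word-size figure can be reproduced by brute force under a simplified model, assumed here: a hit is found only if the alignment contains at least one run of exact matches as long as the word size. The function names are illustrative. The script enumerates every way to place three mismatches in an ungapped 21-mer alignment and counts the placements with no 7-base exact run.

```python
from itertools import combinations

def longest_match_run(length, mismatch_positions):
    """Length of the longest stretch of consecutive matching positions."""
    mismatches = set(mismatch_positions)
    best = run = 0
    for i in range(length):
        run = 0 if i in mismatches else run + 1
        best = max(best, run)
    return best

def missed_alignments(length=21, n_mismatches=3, word_size=7):
    """Count mismatch placements that leave no exact seed of word_size bases."""
    placements = list(combinations(range(length), n_mismatches))
    missed = sum(1 for p in placements
                 if longest_match_run(length, p) < word_size)
    return missed, len(placements)

missed, total = missed_alignments()
print(f"{missed} of {total} placements lack a 7-base exact seed "
      f"({missed / total:.1%})")
# 84 of 1330 placements lack a 7-base exact seed (6.3%)
```

Under this model the missed fraction is about 6.3%, in line with the figure quoted above. Real Blast seeding has further details that this sketch ignores.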
Project Objectives




Test set: 10,000 19-mer oligos/siRNAs
Test database: RefSeq
Comparison study between Blast and Smith-Waterman
Hits with at least 15/19 identities are kept: percent-identity threshold set to 78%
Blast E-value cutoff raised from the default of 10 to 500
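As a small arithmetic check of the 15/19 cutoff, the sketch below uses two hypothetical helpers (not functions of Blast or Smith-Waterman software). It shows that a 78% identity threshold on a 19-base alignment corresponds to at least 15 identical positions.

```python
import math

def min_identities(aligned_length: int, threshold: float) -> int:
    """Smallest number of identical positions that meets the threshold."""
    return math.ceil(threshold * aligned_length)

def passes_threshold(identities: int, aligned_length: int,
                     threshold: float = 0.78) -> bool:
    """True if the percent identity is at or above the threshold."""
    return identities / aligned_length >= threshold

print(min_identities(19, 0.78))    # 15
print(passes_threshold(15, 19))    # True  (15/19 is about 78.9%)
print(passes_threshold(14, 19))    # False (14/19 is about 73.7%)
```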
A Closer Look at Smith-Waterman & Blast Parameters

Smith-Waterman (default parameters)
Query:   19 TCACCGTAGATGCTCTTTC 1
            || |||| |||||||||||
Sbjct: 2376 TC-CCGTGGATGCTCTTTC 2393
Score: 29, identity 17/19 (89%)
Match +2, mismatch -2, gap total -3

Blast (W7, e 500, default scoring)
Query:    1 gaaagagcatctacgg 16
            ||||||||||| ||||
Sbjct: 2393 gaaagagcatccacgg 2378
Score: 12, identity 15/16 (93%)
Match +1, mismatch -3, gap open (G) -5, gap extend (E) -2, gap total -7

Blast (W7, e 500, G1, q2 r2)
Query:    1 gaaagagcatctacggtga 19
            ||||||||||| |||| ||
Sbjct: 2393 gaaagagcatccacgg-ga 2376
Score: 29, identity 17/19 (89%)
Match +2, mismatch -2, gap open (G) -1, gap extend (E) -2, gap total -3
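The raw scores in this comparison follow from the match, mismatch, and gap values. The sketch below assumes that a gap of k positions costs G + k × E, which agrees with the gap totals of -7 and -3 listed above. The function name alignment_score is illustrative. It recomputes the two Blast scores from the aligned strings.

```python
def alignment_score(query, subject, match, mismatch, gap_open, gap_extend):
    """Score a pairwise alignment given as two equal-length gapped strings.

    Assumes each gap costs gap_open once plus gap_extend per gapped position.
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    score = 0
    in_gap = False
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            if not in_gap:
                score += gap_open      # charged once when a gap starts
            score += gap_extend        # charged for every gapped position
            in_gap = True
        else:
            in_gap = False
            score += match if q == s else mismatch
    return score

# Blast with default scoring: 15 matches, 1 mismatch, no gap
print(alignment_score("gaaagagcatctacgg", "gaaagagcatccacgg",
                      match=1, mismatch=-3, gap_open=-5, gap_extend=-2))  # 12

# Blast with G1, q2 r2: 17 matches, 1 mismatch, 1 single-base gap
print(alignment_score("gaaagagcatctacggtga", "gaaagagcatccacgg-ga",
                      match=2, mismatch=-2, gap_open=-1, gap_extend=-2))  # 29
```

With the adjusted parameters the full-length gapped alignment outscores the truncated one, which is why Blast then reports the same 17/19 hit as Smith-Waterman.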
Smith-Waterman vs. Blast Results
Original Query Sequence: CTTTTTAACATCGACGGTC

>gi|4503928|ref|NM_002051.1| Homo sapiens GATA binding protein 3 (GATA3), mRNA
Length = 2365
Score = 31.7 bits (38), Expect = 0.041
Identities = 19/19 (100%)
Strand = Plus / Plus

Query:   1 ctttttaacatcgacggtc 19
           |||||||||||||||||||
Sbjct: 299 ctttttaacatcgacggtc 317

SWN hit-4 bin
Blast hit-1 bin (W7 G1 r2 q2 e500 E2)
Percent Identity: 89%, GATA3 gene
Smith-Waterman vs. Blast Results
Original Query Sequence: AAAATACTGAGAGAGGGAG

>gi|4557424|ref|NM_001248.1| Homo sapiens ectonucleoside triphosphate diphosphohydrolase 3 (ENTPD3), mRNA
Length = 2797
Score = 24.6 bits (29), Expect = 5.7
Identities = 17/19 (89%), Gaps = 1/19 (5%)
Strand = Plus / Minus

Query:    1 aa-aatactgagagaggga 18
            || ||||||||| ||||||
Sbjct: 2044 aagaatactgagggaggga 2026

SWN hit-1 bin
Blast hit-4 bin
Conclusions and Future Work



Produce more conclusive statistics on how often Smith-Waterman returns more accurate results than Blast
No consensus exists as to which hits are considered dangerous or significant for cross-hybridization
Creation of a position-specific matrix
  Mutation tolerance on the 5’ end
  Low tolerance on the 3’ end
  GU wobble
References





Novina, C and Sharp, P. The RNAi revolution.
Nature. 2004 Jul 8;430(6996):161-4.
Dorsett, Y and Tuschl, T. siRNAs: applications in
functional genomics and potential as
therapeutics. Nat Rev Drug Discov. 2004
Apr;3(4):318-29.
Snove, O Jr. and Holen, T. Many commonly used
siRNAs risk off-target activity. Biochem Biophys
Res Commun. 2004 Jun 18;319(1):256-63.
Paroo, Z and Corey, DR. Challenges for RNAi in
vivo. Trends Biotechnol. 2004 Aug;22(8):390-4.
Amarzguioui, M. et al. Tolerance for mutations
and chemical modifications in siRNA. Nucleic Acids
Res. 2003;31(2):589-95.
Acknowledgements








Dr. Cecilie Boysen (advisor), Paracel Scientific Staff
David Meyer, Paracel Software Engineer
Stephanie Pao, Paracel Technical Sales Engineer
Frances Tong, Paracel Intern
William White, Paracel Technical Writer
Southern California Bioinformatics Institute 2004 Faculty and Staff:
Dr. Jamil Momand, Dr. Nancy Warter-Perez, Dr. Sandra Sharp, Dr. Wendie Johnston, and Jackie Leung
Fellow interns
NIH & NSF
Short interfering RNA Mechanism
Post-transcriptional gene silencing.
Novina, C and Sharp, P. The RNAi revolution. Nature Vol 430. July 8, 2004.
Dorsett, Y and Tuschl, T. siRNAs: applications in functional genomics and potential as therapeutics. Nat Rev Drug Discov. 2004 Apr;3(4):318-29.
• Reverse genetic approaches – expensive and time consuming
• siRNA may be chemically synthesized or expressed from DNA vectors
MicroRNAs
Translational silencing.
Picture from:
Novina, C and Sharp, P. The RNAi
revolution. Nature Vol 430. July 8,
2004.



Short RNAs, 19-25 nucleotides long
Abundant, single-stranded RNAs encoded in the genomes of most multicellular organisms: from a few thousand to 40,000 molecules per cell
Some are evolutionarily conserved and developmentally regulated
Differences between siRNA and miRNA

siRNA
  Promote the cleavage or degradation of mRNAs
  Sense strand has “exactly the same sequence as the target strand”
  Target genes or genetic elements from which they originated

miRNA
  Regulate the expression of mRNAs; transcription is not impeded and mRNAs are not destroyed
  Imperfect base-pairing between mRNA targets and miRNA
  Regulate separate genes
Interchangeability of siRNAs and
miRNAs



miRNA may act like siRNA
* perfect or near-perfect complementarity to
cellular mRNAs
Could siRNA also work like miRNA?
* synthetic siRNA partially complementary to
‘reporter’ gene inhibited its expression
Distinction between single site with almost
exact complementarity and numerous partially
complementary binding sites
Laboratory and Clinical
Applications of siRNA



In C. elegans, simple experiment: inject dsRNA, soak in dsRNA solution, or feed with bacteria expressing dsRNA
In worms, screening for obesity and ageing
In fruitflies, purified long dsRNA used to identify roles of genes in cholesterol metabolism and heart formation
Therapeutic potential of siRNAs for humans
File                        Type   Bases  Sequences  # of Oligos
BRCA1                       fasta   3243          1         3255
GATA3                       fasta   3070          1         3070
HLA-molecule                fasta   2918          1         2918
Insulin-like-growth-factor  fasta   4989          1         4971
Interleukin-receptor        fasta   1451          1         1433
NFKB1                       fasta   4104          1         4186
Serine kinase               fasta   3506          1         3488
Serotonin-receptor          fasta   1927          1         1909
TNF2                        fasta   1669          1         1651
Vinculin                    fasta   5647          1         5629
Total: 32554, across 10 sequences
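Candidate oligos of this kind are typically produced by sliding a fixed-length window along each sequence, so a sequence of L bases gives L - 19 + 1 overlapping 19-mers. The sketch below assumes that approach. The function name and toy sequence are illustrative, and the slides do not state how the oligos were actually generated.

```python
def sliding_oligos(sequence: str, oligo_length: int = 19):
    """Yield every overlapping window of oligo_length bases."""
    for start in range(len(sequence) - oligo_length + 1):
        yield sequence[start:start + oligo_length]

toy = "ACGTACGTACGTACGTACGTAC"   # 22 bases, made up for illustration
oligos = list(sliding_oligos(toy))
print(len(oligos))               # 4 windows: 22 - 19 + 1
print(oligos[0])                 # ACGTACGTACGTACGTACG

# A 4989-base sequence gives 4971 19-mers, as in the 4989-base entry above.
print(4989 - 19 + 1)             # 4971
```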
Paroo, Z and Corey, DR. Challenges for RNAi in vivo.
Trends Biotechnol. 2004 Aug;22(8):390-4.
Blast vs Smith-Waterman Speed Test Results
[Bar chart: run time in minutes for SWN and Blast at the Default and e500 settings. Plotted values: 346.08, 205.69, 46.7, 11.35.]