Presentation (HGM2006) - Gmu


Whole-genome approach
to highly specific siRNA design
by CRM
(Comprehensive Redundancy Minimizer)
Algorithm
Tariq Alsheddi, Leonid Vasin &
Ancha Baranova
George Mason University
USA
Presented by Prof. Vikas Chandhoke
Introduction:
DNA
siRNA
mRNA
Protein
RNA interference (RNAi) : use of double-stranded RNA to
target specific mRNAs for degradation, thereby silencing
gene expression.
Short interfering (si) RNAs are commonly used as a tool to
analyze gene function.
Major Challenge in RNAi technology:
Off-targeting by Cross-hybridization
82% of the respondents to the survey said that off-target
effects require major improvement. (Genome Technology 46: Sep. 2004)
Gene 1
AGCAAAGCGAGCGCGTAATGCGACGTG
siRNA
TCGCTCGCGCATTACGCTGC
Gene 2
CCTACAGCGCGTAATGCGACG
Gene 3
CAGCGAGCGCGTAATGTAGTG
Gene 4
AGTTAAGCGCGTAATGCGAAT
Gene 5
AAGCGAGCGCGTAACACTGTC
Off-targeting leads to misinterpreted
experiments or to failures (or side effects)
in siRNA-based therapy
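The cross-hybridization example above can be checked with a short script. This is a minimal Python sketch, not part of the original presentation. It assumes the siRNA is written as the base-by-base complement of its target (which is how it lines up under Gene 1) and uses the five gene fragments with the letter spacing removed. The function names are illustrative.

```python
def complement(seq: str) -> str:
    """Base-by-base complement without reversal (assumed slide layout)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest contiguous substring common to a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


SIRNA = "TCGCTCGCGCATTACGCTGC"
GENES = {
    "Gene 1": "AGCAAAGCGAGCGCGTAATGCGACGTG",
    "Gene 2": "CCTACAGCGCGTAATGCGACG",
    "Gene 3": "CAGCGAGCGCGTAATGTAGTG",
    "Gene 4": "AGTTAAGCGCGTAATGCGAAT",
    "Gene 5": "AAGCGAGCGCGTAACACTGTC",
}

footprint = complement(SIRNA)  # the gene sequence this siRNA would pair with
runs = {name: longest_shared_run(footprint, seq) for name, seq in GENES.items()}
for name, length in runs.items():
    print(f"{name}: longest complementary stretch = {length} nt")
```

Under these assumptions the stretches are 20, 16, 15, 14 and 13 nt for Genes 1 to 5, so even the weakest partial match covers more than half of the 20-nt siRNA.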
[Figure: siRNA and gene pairing in two cases, "Good" and "Off-target"]
Common sense for working with
siRNA
• Don’t assume that your siRNA is working
exclusively through silencing
its intended target and nothing else.
• Assume off-target action until you’ve proven
otherwise.
Peter S. Linsley,
Vice President of Research
at Rosetta Inpharmatics
Methods dealing with off-target hybridization of
siRNA are mostly based on
the BLAST or Smith-Waterman algorithms.
BLAST may sacrifice some sensitivity
to gain speed.
Different BLAST search parameters
may yield very different results.
The Smith–Waterman local alignment algorithm
(exhaustive search of homologies)
may return accurate answers
but is very time-consuming to execute.
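To make the cost of an exhaustive search concrete, here is a minimal Python sketch of Smith-Waterman local alignment scoring. It is an illustration, not the implementation used by any tool named in this talk. The scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for the example.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diagonal, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


print(smith_waterman_score("GCGTAATG", "TTGCGTAATGCC"))
```

Every cell of a len(a) x len(b) table is filled, so scoring one siRNA against a whole transcriptome touches every base of every transcript. That is the source of the running time noted above.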
Both BLAST and Smith-Waterman algorithms
are executed as de novo searches for matches
that your pre-designed siRNA
will off-target
That means that you design your siRNA first,
then you run the siRNA against the transcriptome
in the hope that your siRNA will not be rejected
If the siRNA is rejected,
you need to start from scratch
BOTTOM-UP approach to siRNA
design
siRNA that will efficiently silence
your gene
All possible siRNA
with minimal off-target
Complete human transcriptome
Bottom-up method (simple):
The human genome (and transcriptome) sequence is known
Retrieve sequences of all human genes one by one
Find all the places within human genes
that are represented by unique sequences
with length N (e.g. N = 14)
Mask all non-unique sequences
siRNA with minimal off-target
effects are selected by elimination
CRM
(Comprehensive Redundancy Minimizer)
Algorithm
All human mRNAs are retrieved from UniGene
(one longest mRNA per cluster)
These mRNA targets are parsed
into overlapping kernels of length N
to create a kernel set specific for each target
(target ID is preserved in the database)
• All redundant kernels
that are present more than once in the kernel set
are removed from the set
Each kernel is concatenated with its suffix
according to the sequence of the original mRNA target.
L-length (N+X) siRNA candidate sequences
are created for every gene
L ( siRNA length) = N (kernel length) + X (suffix length)
When all the overlapping kernels with corresponding suffixes
are present in the table,
the siRNA candidate with length L (L = N+X) is truly non-redundant
(it will not produce off-target hybridization with other genes
through its kernels of length N)
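The steps above can be sketched in a few lines of Python. This is a simplified in-memory illustration under stated assumptions, not the authors' implementation: the function names and the two toy transcripts are hypothetical, positions are 0-based, and the kernel counts are held in a `Counter` rather than a database.

```python
from collections import Counter


def kernels(seq: str, n: int) -> list:
    """All overlapping kernels of length n, in order of position."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def crm_candidates(transcripts: dict, n: int, x: int) -> dict:
    """Candidates of length L = n + x whose x + 1 overlapping kernels are all unique."""
    length = n + x
    # Count every kernel across the whole kernel set (all transcripts).
    counts = Counter(k for seq in transcripts.values() for k in kernels(seq, n))
    result = {}
    for gene_id, seq in transcripts.items():
        unique = [counts[k] == 1 for k in kernels(seq, n)]
        result[gene_id] = [
            (start, seq[start:start + length])
            for start in range(len(seq) - length + 1)
            if all(unique[start:start + x + 1])
        ]
    return result


# Hypothetical toy transcripts sharing the stretch "TTGCA".
toy = {"geneA": "ACGTTGCA", "geneB": "TTGCAGGA"}
print(crm_candidates(toy, n=4, x=2))
```

With N = 4 and X = 2 each 6-nt candidate contains L - N + 1 = 3 kernels. Any window touching the shared stretch is rejected, leaving one candidate per toy gene.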
siRNAs
with kernel N = 14 are better than ones with N = 15, and so on
(a shorter unique kernel is a stricter specificity criterion)
Gene Id.       Target            Suffix     P
gi=31560634    AGTACAGCTTGTTG    CGCTCTG    73
gi=31560634    GTACAGCTTGTTGC    GCTCTGA    74
gi=31560634    TACAGCTTGTTGCG    CTCTGAA    75
gi=31560634    ACAGCTTGTTGCGC    TCTGAAT    76
gi=31560634    CAGCTTGTTGCGCT    CTGAATA    77
gi=31560634    AGCTTGTTGCGCTC    TGAATAT    78
gi=31560634    GCTTGTTGCGCTCT    GAATATA    79
gi=31560634    CTTGTTGCGCTCTG    AATATAT    80
gi=31560634    TTGTTGCGCTCTGA    ATATATT    81
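The rows listed above for gi=31560634 can be regenerated from a single stretch of sequence. The sketch below is illustrative only. It assumes that P is the start position of each kernel and that the 29-nt fragment is the one obtained by overlapping the listed kernels and suffixes (N = 14, X = 7). The function name is hypothetical.

```python
def kernel_rows(fragment: str, first_position: int, n: int = 14, x: int = 7) -> list:
    """Return (position, kernel, suffix) for every full-length window in the fragment."""
    rows = []
    for offset in range(len(fragment) - (n + x) + 1):
        kernel = fragment[offset:offset + n]
        suffix = fragment[offset + n:offset + n + x]
        rows.append((first_position + offset, kernel, suffix))
    return rows


# Fragment reconstructed from the overlapping rows (an assumption, see above).
FRAGMENT = "AGTACAGCTTGTTGCGCTCTGAATATATT"
rows = kernel_rows(FRAGMENT, first_position=73)
for position, kernel, suffix in rows:
    print("gi=31560634", kernel, suffix, position)
```

Each kernel plus its suffix gives one 21-nt candidate (L = 14 + 7), and consecutive rows are the same window shifted by one base.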
Data Flow
Download RefSeq
Generate all possible N-length
sequences for each gene.
1: AA   2: AC   3: AG   ...   16: TT
Sort on the sequences, compare,
remove redundant and save unique
(outputs: Redundant, Unique)
Sort the unique sequences on gene ID and starting position,
compare, and remove sequences that do not
have the next (21 - N) present in these files.
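The data flow above reads like a sort-and-compare pipeline over files. The following Python sketch mirrors that flow in memory. It is an interpretation, not the authors' code: it assumes the 16 numbered items are buckets keyed by the first two bases of each kernel, and all function names and the toy transcripts are hypothetical.

```python
from itertools import groupby, product


def bucket_kernels(transcripts: dict, n: int) -> dict:
    """Distribute every N-length kernel into one of 16 buckets (AA, AC, ... TT)."""
    buckets = {"".join(pair): [] for pair in product("ACGT", repeat=2)}
    for gene_id, seq in transcripts.items():
        for pos in range(len(seq) - n + 1):
            kernel = seq[pos:pos + n]
            # setdefault keeps the sketch from failing on ambiguity codes such as N.
            buckets.setdefault(kernel[:2], []).append((kernel, gene_id, pos))
    return buckets


def split_bucket(records: list) -> tuple:
    """Sort one bucket on the kernel and separate unique from redundant records."""
    unique, redundant = [], []
    for _, group in groupby(sorted(records), key=lambda rec: rec[0]):
        group = list(group)
        (unique if len(group) == 1 else redundant).extend(group)
    return unique, redundant


def candidate_starts(unique: list, n: int, length: int = 21) -> dict:
    """Keep start positions whose next (length - n) kernels are also unique."""
    positions = {}
    for _, gene_id, pos in sorted(unique, key=lambda rec: (rec[1], rec[2])):
        positions.setdefault(gene_id, set()).add(pos)
    return {
        gene_id: [p for p in sorted(found)
                  if all(p + k in found for k in range(1, length - n + 1))]
        for gene_id, found in positions.items()
    }


toy = {"geneA": "ACGTTGCA", "geneB": "TTGCAGGA"}  # hypothetical sequences
buckets = bucket_kernels(toy, n=4)
unique = [rec for recs in buckets.values() for rec in split_bucket(recs)[0]]
starts = candidate_starts(unique, n=4, length=6)
print(starts)
```

Bucketing by prefix keeps each sort small and lets the buckets be processed independently, which is one plausible reason for splitting the kernels into 16 groups.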
siRNA
Apply the suggested
siRNA design rules
[Charts for MOUSE and HUMAN: "Percentage covered" and "siRNAs per gene"]
The transcriptome minimization step,
aimed at removal of the hypothetical genes
(19.2% of the human transcriptome),
proved unnecessary: the gains are small
To evaluate the potential for CRM application in siRNA design:
• A number of siRNA sequences used in various published
experiments were run through the CRM database.
• The presence of overlapping kernel targets in siRNA sequences
was analyzed at the level of uniqueness N = 15 nt
(not even very deep).
• 17 papers describing 25 siRNA sequences
used for characterization of phenotypic effects
of the corresponding genes were studied;
only 2 (two) sequences were shown
to produce no significant off-target effects
In some cases the indicated effects
may potentially cloud scientific conclusions
• Experiments with siRNA suggest that histone acetyltransferase
Tip60 (HTATIP)
is putatively involved in the p53 response,
but its siRNA non-specifically targets Cyclin M4,
which plays a role in cell cycle regulation
Target gene IDs: HTATIP
Designed siRNA sequence: ACGGAAGGTGGAGGTGGTT
Genes off-targeted: Cyclin M4 (CNNM4)
Sequence off-targeted and length of overlapping kernel target: GGAAGGTGGAGGTGG (15 nt)
[Legube G, et al 2004].
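The kind of check described for published siRNAs can be sketched as a scan of every 15-nt window of the siRNA against other transcripts. The Python below is a hedged illustration, not the CRM database lookup itself. The siRNA (its two fragments joined) and the 15-nt overlap are taken from the HTATIP example, while the flanking bases of the "unintended" fragment and all names are invented for the demonstration.

```python
def shared_kernels(sirna: str, transcripts: dict, n: int = 15) -> list:
    """Return (transcript id, kernel, offset in siRNA) for each N-mer found in a transcript."""
    hits = []
    for offset in range(len(sirna) - n + 1):
        kernel = sirna[offset:offset + n]
        for name, seq in transcripts.items():
            if kernel in seq:
                hits.append((name, kernel, offset))
    return hits


SIRNA = "ACGGAAGGTGGAGGTGGTT"   # designed siRNA from the HTATIP example
OVERLAP = "GGAAGGTGGAGGTGG"     # reported 15-nt off-targeted sequence

# Hypothetical transcript fragment: only OVERLAP comes from the slide,
# the flanking bases are made up.
other_transcripts = {"unintended_fragment": "AAAA" + OVERLAP + "CATCAT"}

hits = shared_kernels(SIRNA, other_transcripts)
print(hits)
```

A 19-nt siRNA has only five 15-nt windows, so the scan is cheap per siRNA. The work lies in having every transcript's kernels indexed in advance, which is what the bottom-up approach provides.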
Target gene IDs: spindle checkpoint kinase (BUB1B)
Designed siRNA sequence: CGGGCATTTGAATATGAAA
Genes off-targeted:
  heat shock 70kD protein 9B (HSPA9B)
  MAPK/ERK kinase 5 (MAP2K5), transcript variant A
  MAPK/ERK kinase 5 (MAP2K5), transcript variant C
  MAPK/ERK kinase 5 (MAP2K5), transcript variant B
Sequence off-targeted and length of overlapping kernel target: GCATTTGAATATGAA (15 nt)
In Semizarov’s dataset the predicted degree of siRNA “uniqueness”
directly correlates with its efficiency of gene silencing
[Figure: measured effects of 5 different siRNAs specific for gene RB1, labeled from best to worst. Semizarov et al., 2004 (Nucleic Acids Research)]
In Semizarov’s dataset the predicted degree of siRNA “uniqueness”
directly correlates with its efficiency of gene silencing
BEST siRNA (suppressing gene expression by 100%)
  Target gene IDs: Retinoblastoma suppressor (RB1)
  Designed siRNA sequence: GTTGATAATGCTATGTCAA
  Genes off-targeted:
    baculoviral IAP repeat-containing 3 (BIRC3), transcript variant 1
    IAP repeat-containing 3 (BIRC3), transcript variant 2
  Sequence off-targeted and length of overlapping kernel target: GATAATGCTATGTCA (15 nt)
  Note on slide: "were not found in the off-target lists at all"
WORST
  Target gene IDs: Retinoblastoma suppressor (RB1)
  Designed siRNA sequence: ACTCTCACCTCCCATGTTG
  Genes off-targeted, with sequence off-targeted and length of overlapping kernel target:
    Ca++ ATPase subunit (ATP2B4), transcript variant 2
    Ca++ ATPase subunit (ATP2B4), transcript variant 1
      ACTCTCACCTCCCAT (15 nt)
    keratin 4 (KRT4)
      CTCTCACCTCCCATG (15 nt)

  Target gene IDs: Retinoblastoma suppressor (RB1)
  Designed siRNA sequence: CACCCAGGCGAGGTCAGAA
  Genes off-targeted, with sequence off-targeted and length of overlapping kernel target:
    keratin 12 (KRT12)
      CCCAGGCGAGGTCAG (15 nt)
A public interface that allows everybody
to search for siRNA candidates
with minimized off-target effects
http://129.174.194.243
The database can be
searched with human
and mouse gene names
Typical CRM output
(IPO4 – importin 4)
siRNA with kernel N = 14
Acknowledgement:
The authors are grateful
to Dr. Matthias Truss
for providing the HuSiDa database
(Truss et al., 2005)
The algorithm has been filed
with the US Patent and Trademark Office