Meynert1-2 - Computing Science
RNA-RNA interaction
A biological crash course and introduction to prediction methods
Part I – Biological crash course
Bacteria
Plasmid copy control
Post-segregational killing systems
trans-encoded chromosomal RNAs
RNA interference (gene silencing)
Translation regulation
C. elegans developmental regulation
miRNA-miRNA interactions
Human telomerase
DNA vs. RNA
       Bases      #Strands   Structure
DNA    A,C,G,T    2          Double helix
RNA    A,C,G,U    1 or 2     Stem-loop, pseudoknots, etc.
Gene expression
Central dogma of molecular biology
Translation
mRNA -> protein via triplet code
What happens if mRNA is destroyed or
otherwise can’t be translated?
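The triplet code on the Translation slide can be illustrated with a short sketch. It uses the standard genetic code; the function name `translate` and the example mRNA are illustrative and not part of the original slides.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order ('*' = stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical three-codon message: start (AUG), alanine (GCC), stop (UAA).
print(translate("AUGGCCUAA"))
```

If the mRNA is degraded, or its start region is blocked, no codons are read and no protein is produced; this is the lever that the antisense RNAs in the following slides pull.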
Bacteria backgrounder
Single-celled organisms
Prokaryotes = no nucleus
Multi-cistronic transcripts -> multiple
genes transcribed at one time, often with
overlapping reading frames
Bacterial genetic information
Bacterial chromosome (1)
Genome of organism
Required for life
Plasmids (2)
Circular DNA molecules
Double-stranded
Independently self-replicating
Not required for life, often confer selective
advantage such as antibiotic resistance
Plasmid replication
(1),(2) – Genes encoded on plasmid
(3) – Origin of Replication (ORI)
Plasmid copy control
Recall independent self-replication
Copy number fluctuations are unavoidable
Too many -> “runaway”, host dies
Too few -> increased risk of plasmid loss
Problem: How to control copy count?
Solution: negative feedback loop mediated
by RNA-RNA interaction
R1 copy control
Genes:
oriR1 – origin of replication
repA – lots of this protein product is required for replication initiation
tap – translation of protein product is required for translation of repA protein
copA – product is antisense RNA
copB – product is a repressor protein (not covered here)
R1 copy control (2)
copA – RNA with stem-loop structure
copT – target segment of repA/tap mRNA,
also forms a stem-loop structure
Single loop-loop interaction
R1 copy control (3)
R1 copy control (4)
copA RNA is unstable; it degrades
If not enough plasmids are producing copA
antisense RNA (copy number is too low),
more repA protein can be produced
Therefore the plasmid can replicate
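The negative feedback loop described in the R1 copy control slides can be sketched as a toy model. This is not a published model of R1: the update rule and the parameters `max_rate` and `k_inhibit` are assumptions chosen only to show that copA-mediated inhibition pulls the copy number back to a set point from either side.

```python
def simulate_copy_number(n0, generations=40, max_rate=2.0, k_inhibit=10.0):
    """Toy negative-feedback model of plasmid copy number.

    All parameters are illustrative assumptions, not measured values.
    """
    n = float(n0)
    history = [n]
    for _ in range(generations):
        # copA is unstable, so its level is assumed to track the current copy number.
        cop_a = n
        # More copA -> less repA translation -> fewer replications per plasmid.
        replications_per_plasmid = max_rate / (1.0 + cop_a / k_inhibit)
        # Plasmids replicate, then are split between two daughter cells.
        n = n * (1.0 + replications_per_plasmid) / 2.0
        history.append(n)
    return history


low_start = simulate_copy_number(1)
high_start = simulate_copy_number(40)
print(round(low_start[-1], 2), round(high_start[-1], 2))
```

With these assumed parameters the steady state is `k_inhibit * (max_rate - 1)`, so a cell starting with too few or too many copies converges to the same value.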
Post-segregational killing systems
Plasmid self-preservation mechanism
Bacterial host losing plasmid results in
host death
R1 plasmid hok/sok system is the
prototype
All such systems work similarly
R1 hok/sok system
hok/sok locus encodes:
hok protein – “host killing”
Overlapping reading frame – mok – “modulator of killing”
sok RNA – “suppressor of killer”
mok must be translated for hok to be
expressed
mok cannot be translated if sok is present
R1 hok/sok system (2)
hok mRNA is extremely compact
Many stem-loop structures
Flush 5’ – 3’ pairing
Highly stable -> long half-life
Translationally inert
mok segment is both:
Translationally active
Able to bind sok inhibitor RNA
R1 hok/sok system (3)
sok RNA is highly unstable
Bacteria with R1 have lots of sok produced
sok binds mok, hok is not translated
Bacteria which lose R1 have:
Lots of stable hok mRNA
Quickly degrading sok RNA (low stability)
No new sok RNA being produced
hok is translated -> bacterium dies
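The timing argument behind post-segregational killing can be sketched with simple exponential decay. The starting amounts and half-lives below are illustrative assumptions (the slides only state that sok is highly unstable and hok mRNA is highly stable); the function names are hypothetical.

```python
import math


def rna_level(initial, half_life_min, t_min):
    """Amount of RNA left after t_min minutes with no new synthesis."""
    return initial * math.exp(-math.log(2) * t_min / half_life_min)


def minutes_until_hok_unopposed(hok0=100.0, sok0=1000.0,
                                hok_half_life=20.0, sok_half_life=0.5,
                                step=0.1):
    """First time point (in minutes) at which sok drops below hok mRNA.

    Assumes the plasmid was lost at t = 0, so neither RNA is replenished.
    All default values are assumptions for illustration only.
    """
    i = 0
    while rna_level(sok0, sok_half_life, i * step) >= rna_level(hok0, hok_half_life, i * step):
        i += 1
    return i * step


print(round(minutes_until_hok_unopposed(), 1))
```

Even with a tenfold excess of sok at the moment of plasmid loss, the unstable antisense RNA is outlasted by the stable hok mRNA within a few half-lives of sok.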
Bacterial chromosomes
Plasmid antisense RNAs are generally cis-encoded
Implies complete Watson-Crick complementarity
Bacterial chromosomes contain trans-encoded antisense RNAs
Not necessarily complete complementarity
Often stress-related control systems
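The difference between cis- and trans-encoded antisense RNAs can be made concrete with a small complementarity check. A cis-encoded antisense RNA is transcribed from the strand opposite its target, so it is the target's reverse complement. The sequences and the helper `paired_fraction` below are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def paired_fraction(antisense, target):
    """Fraction of Watson-Crick pairs when two equal-length RNAs align antiparallel."""
    if len(antisense) != len(target):
        raise ValueError("sequences must have equal length")
    pairs = sum(COMPLEMENT[a] == t for a, t in zip(antisense, reversed(target)))
    return pairs / len(target)


target = "GGAUCCAUUGCA"                    # hypothetical target segment
cis_antisense = reverse_complement(target)  # encoded opposite the target
trans_antisense = "AGCAAUGGAUGC"            # hypothetical, two positions differ

print(paired_fraction(cis_antisense, target))
print(paired_fraction(trans_antisense, target))
```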
oxyS/fhlA in E. coli
oxyS – RNA transcript induced by stress
fhlA – transcriptional activator site
oxyS/fhlA complex binds via two loop-loop interactions
RNA interference (RNAi)
a.k.a. post-transcriptional gene silencing
Double-stranded RNAs are introduced into
the cell
Complementary to mRNA for a gene
Directly introduced in a wet lab, or
Produced by the cell itself
RNA interference (2)
dsRNAs are cleaved into 21-23 nt
segments (“small interfering RNAs”, or
siRNAs) by an enzyme called Dicer
RNA interference (3)
siRNAs are incorporated into the RNA-induced silencing complex (RISC)
RNA interference (4)
Guided by base complementarity of the
siRNA, the RISC targets mRNA for
degradation
RNA interference – why?
Studying gene function
Knock out or inhibit a gene’s normal function
Can the organism survive?
What phenotypic changes are observed?
Therapeutic suppression
E.g. cancer treatment
micro RNA (miRNA)
Gene expression regulation
Created by similar process to siRNA
Generally prevents binding of ribosome
Ex: C. elegans development
lin-4 and let-7 antisense RNAs
Regulate larval development in C. elegans
One of the two binding sites for lin-41 and
let-7 interaction:
Human telomerase
Telomerase = ribonucleoprotein complex
Ribonucleo = contains ribonucleic acid (RNA)
Protein = contains a protein
Responsible for maintaining telomere
length in eukaryotic chromosomes
Main components:
Telomerase reverse transcriptase
Human telomerase RNA (hTR)
Human telomerase (2)
Reverse transcriptase
Transcribes RNA to DNA (rather than the usual DNA to RNA)
Telomeres – repeated regions at the end
of eukaryotic chromosomes
hTR is the template for the repeated
region
Human telomerase (3)
hTR 11-nt templating region consists of:
Repeat template: CUAACCC
Provides template for repeat region
Alignment domain: UAAC
Positions telomerase on the DNA strand
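A short sketch shows how the 11-nt templating region relates to the telomeric DNA repeat. It concatenates the two segments listed on the slide and copies the whole region into DNA, which is a simplification: in the cell only part of the template is copied per cycle. The function name `reverse_transcribe` is illustrative.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template):
    """DNA strand (5'->3') made by copying an RNA template written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template))


repeat_template = "CUAACCC"
alignment_domain = "UAAC"
templating_region = repeat_template + alignment_domain  # 11 nt

dna = reverse_transcribe(templating_region)
print(dna)              # DNA complementary to the full templating region
print("TTAGGG" in dna)  # the human telomeric repeat is TTAGGG
```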
Human telomerase (4)
Loop-loop interaction
Sometimes referred to as “kissing loops”
Recall that all of the RNA-RNA interactions
discussed so far (except RNAi) involve
loop-loop interaction
Predicting miRNA transcripts and targets
involves loop structure prediction
References
Couzin, J. (2002) “Breakthrough of the year – Small RNAs
make big splash.” Science 298(5602):2296-2297.
Lai, E.C., Wiel, C., and Rubin, G.M. (2004)
“Complementary miRNA pairs suggest a regulatory role
for miRNA:miRNA duplexes.” RNA 10(2):171-175.
Moss, E.G. (2001) “RNA interference – It’s a small RNA
world.” Current Biology 11(19):R772-R775.
Sharp, P.A. (2001) “RNA interference – 2001.” Genes and
Development 15(5):485-90.
Shi, Y. (2003) “Mammalian RNAi for the masses.” TRENDS
in Genetics 19(1):9-12.
References (2)
Ueda, C.T., and Roberts, R.W. (2004) “Analysis of a long-range interaction between conserved domains of human
telomerase RNA.” RNA 10(1):139-147.
Wagner, E.G.H. and Flärdh, K. (2002) “Antisense RNAs
everywhere?” TRENDS in Genetics 18(5):223-226.
Wagner, E.G.H., Altuvia, S., and Romby, P. (2002)
“Antisense RNAs in bacteria and their genetic elements.”
Advances in Genetics 45:361-398.
Part II – Prediction
Identifying effective siRNAs
Neural network approach
Identifying targets
Mammalian miRNA target prediction
Prediction of siRNAs
Sequence properties that make a good
antisense RNA an effective gene inhibitor
are not well understood
Most computational models consider only:
RNA structure prediction
Motif searches
Neural net approach
Training set: 490 known siRNA molecules
Input parameters:
Base composition
mRNA:siRNA binding energy properties
3’ and 5’ binding energy
Structure of siRNA (hairpin energy and
quality)
Target function: efficacy
Neural net approach (2)
Neural net results
14 inputs, 11 hidden units, 1 output
Success rate of 92%
Average prediction of 12 effective siRNAs
per 1000 base pairs
Stringent (high specificity)
Good for designing siRNAs for RNAi
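The 14-11-1 topology reported above can be sketched as a plain feed-forward pass. Only the layer sizes come from the slides; the sigmoid activation, the bias terms, the random (untrained) weights, and the function names are assumptions, so the output here is not a meaningful efficacy prediction.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 14, 11, 1  # layer sizes from the slides


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(seed=0):
    """Random, untrained weights; a real model would be fitted to the training set."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b_hidden = [0.0] * N_HIDDEN
    w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def predict_efficacy(features, network):
    """Forward pass: 14 input features -> 11 hidden units -> 1 efficacy score in (0, 1)."""
    if len(features) != N_INPUTS:
        raise ValueError("expected 14 input features")
    w_hidden, b_hidden, w_out, b_out = network
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def parameter_count():
    """Weights plus biases, assuming one bias per hidden and output unit."""
    return N_INPUTS * N_HIDDEN + N_HIDDEN + N_HIDDEN * N_OUTPUTS + N_OUTPUTS


score = predict_efficacy([0.5] * N_INPUTS, init_network())
print(parameter_count(), 0.0 < score < 1.0)
```

Under these assumptions the network has only 177 free parameters.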
Prediction of miRNA targets
Mammals/vertebrates
Lots of known miRNAs
Mostly unknown target genes
Initial method outline
Look at conserved miRNAs
Look for conserved target sites
micro RNAs in animals
0.5-1.0% of predicted genes encode
miRNA
One of the more abundant regulatory classes
Tissue-specific or developmental stage-specific expression
High evolutionary conservation
micro RNAs in plants
Finding targets in plants is relatively easy
Look for mRNA transcripts with near-perfect complementarity to known miRNAs
Signal-to-noise ratio exceeds 10:1 for
Arabidopsis (model plant organism)
Naïve approach in C. elegans and D.
melanogaster? No more hits than
expected by random chance!
So what can we use?
Pairing to nucleotides 2-8 at the 5’ end of
the miRNA
Target recognition
Target regions enriched for genes involved
in transcriptional regulation
Goals for algorithm
Predict 100s of miRNA targets
Estimate false-positive rates
Provide computational and experimental
evidence of authenticity
Identify common functionality classes
other than transcriptional regulator genes
TargetScan
Algorithm developed by Lewis et al 2003
Input:
miRNA that is known to be conserved across multiple organisms
Orthologous 3’ UTR sequences
Cut-off values for two parameters
Value for one free parameter
Output:
Ranked list of candidate target genes
TargetScan (1)
Search UTRs in one organism
Bases 2-8 from miRNA = “miRNA seed”
Perfect Watson-Crick complementarity
No wobble pairs (G-U)
7nt matches = “seed matches”
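The seed search in this first step can be sketched as a plain string search: take bases 2-8 of the miRNA, reverse-complement them, and look for exact occurrences in the UTR. This is an illustration and not the TargetScan code; the miRNA is a let-7-like sequence used only as an example, and the UTR is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_sites(mirna, utr):
    """Start indices (0-based) of perfect Watson-Crick matches to the miRNA seed."""
    seed = mirna[1:8]                # bases 2-8, counted from the 5' end
    site = reverse_complement(seed)  # what the UTR must contain, read 5'->3'
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7-like sequence, for illustration
utr = "AAGCUACCUCAGGGACUACCUCUU"  # hypothetical 3' UTR fragment

print(seed_match_sites(mirna, utr))
```

Because G-U wobble pairs are excluded at this stage, an exact string match on the reverse complement is sufficient.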
TargetScan (2)
Extend seed matches
Allow G-U (wobble) pairs
Both directions
Stop at mismatches
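The extension step can be sketched as walking outward from the 7-nt seed match in both directions, accepting Watson-Crick or G-U pairs and stopping at the first mismatch. The indexing convention (the seed match at `site_start` pairs antiparallel with miRNA bases 2-8) and the example sequences are assumptions for illustration, not TargetScan's implementation.

```python
# Watson-Crick pairs plus G-U wobble pairs, as (miRNA base, UTR base).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def extend_seed_match(mirna, utr, site_start):
    """Extend a seed match in both directions until a mismatch.

    site_start is the 0-based UTR index where the 7-nt seed match begins.
    Returns the first and last paired miRNA indices (0-based, inclusive).
    """
    def can_pair(i):
        j = site_start + 7 - i  # antiparallel: miRNA index i faces UTR index j
        return 0 <= i < len(mirna) and 0 <= j < len(utr) and (mirna[i], utr[j]) in PAIRS

    first, last = 1, 7  # the seed itself: miRNA bases 2-8
    while can_pair(first - 1):
        first -= 1
    while can_pair(last + 1):
        last += 1
    return first, last


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7-like sequence, for illustration
utr = "GGCUGCUACCUCGAA"           # hypothetical UTR, seed match starts at index 5

print(extend_seed_match(mirna, utr, 5))
```

In this example the duplex grows from miRNA bases 2-8 to bases 1-11, with two of the added pairs being G-U wobbles.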
TargetScan (3)
Optimize basepairing
Remaining 3’ region of miRNA
35 bases of UTR 5’ to each seed match
RNAfold program (Hofacker et al 1994)
TargetScan (4)
Folding free energy (G) assigned to each
putative miRNA:target interaction
Ignores initiation free energy
RNAeval (Hofacker et al 1994)
TargetScan (5)
Z score for each UTR (no match -> Z=1.0)
$Z = \sum_{k=1}^{n} e^{-G_k / T}$
n = number of seed matches in UTR (may be more than one)
G_k = free energy of miRNA:target site interaction of kth seed match
T = parameter influencing relative weighting of UTRs with few high affinity target sites against UTRs with lots of low affinity target sites (experimentally determined)
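The Z score can be sketched in a few lines, assuming the formula is the sum over seed matches of exp(-G_k / T), with the sign chosen so that more negative (more favourable) free energies contribute more. The free energies and T values below are made up for illustration.

```python
import math


def z_score(site_energies, t):
    """Z score of one UTR from the free energies of its seed-match sites.

    A UTR with no seed matches gets Z = 1.0, as stated on the slide.
    """
    if not site_energies:
        return 1.0
    return sum(math.exp(-g / t) for g in site_energies)


one_strong_site = [-20.0]                # hypothetical free energies
three_weak_sites = [-10.0, -10.0, -10.0]

for t in (5.0, 10.0):
    print(t, z_score(one_strong_site, t), z_score(three_weak_sites, t))
```

With the smaller T the single high-affinity site scores higher; with the larger T the three low-affinity sites win, which is the weighting role the slide assigns to T.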
TargetScan (6)
Order UTRs by Z score
Assign rank to each UTR
Repeat this process for each of the other
organisms with UTR datasets
TargetScan (7)
UTR i is a predicted target if for all organisms:
$Z_i \geq Z_C$
$R_i \leq R_C$
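The ranking and cut-off steps can be sketched together. The sketch assumes that a UTR passes when its Z score is at least the cut-off Z_C and its rank is at most the cut-off R_C in every organism, and that orthologous UTRs share one gene identifier. Gene names, scores, and cut-offs are hypothetical.

```python
def rank_by_z(z_scores):
    """Rank UTRs by descending Z score (rank 1 = highest Z)."""
    ordered = sorted(z_scores, key=z_scores.get, reverse=True)
    return {utr: rank for rank, utr in enumerate(ordered, start=1)}


def predicted_targets(z_by_organism, z_cutoff, rank_cutoff):
    """UTRs that pass both cut-offs in every organism."""
    shared = set.intersection(*(set(z) for z in z_by_organism.values()))
    ranks = {org: rank_by_z(z) for org, z in z_by_organism.items()}
    return sorted(
        utr for utr in shared
        if all(
            z[utr] >= z_cutoff and ranks[org][utr] <= rank_cutoff
            for org, z in z_by_organism.items()
        )
    )


# Hypothetical Z scores for three orthologous UTRs in two organisms.
z_by_organism = {
    "human": {"geneA": 9.0, "geneB": 5.0, "geneC": 1.0},
    "mouse": {"geneA": 8.0, "geneB": 2.0, "geneC": 1.0},
}

print(predicted_targets(z_by_organism, z_cutoff=4.5, rank_cutoff=2))
```

Requiring the cut-offs to hold in every organism is what lets evolutionary conservation filter out chance matches such as `geneB`, which passes in one organism only.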
Datasets
nrMamm (mammalian – 79 sequences)
Homologs in human, mouse, and pufferfish
Identical between human and mouse, not necessarily pufferfish (fugu)
nrVert (vertebrate – 55 sequences)
Identical between human, mouse, and fugu
Non-redundant: if multiple miRNAs had
the same seed, one representative chosen
Sample program flow
Results for nrMamm
nrMamm searched against human,
mouse, and rat orthologous 3’ UTRs
451 miRNA:target interactions predicted
for 400 unique genes
Average 5.7 targets per miRNA
Signal:noise ratio of 3.2:1
Results for nrVert
Additional search against fugu UTRs
Signal:noise ratio improves to 4.6:1
Relaxed cut-off values
115 predicted miRNA:target interactions
for 107 unique genes
2.1 putative targets per miRNA
Signal:noise ratio calculation
Signal = number of predicted targets from
nrMamm dataset
Noise = number of predicted targets from
randomly shuffled miRNAs
Shuffled control sequences screened to
ensure preservation of relevant features –
don’t underestimate the noise!
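The signal:noise calculation can be sketched as the number of targets predicted for authentic miRNAs divided by the mean number predicted for shuffled controls. The sketch only shuffles base order; the screening of controls for seed-match frequency and the other features is not implemented. The counts in the example are made up and are not results from the study.

```python
import random
from statistics import mean


def shuffled_controls(mirna, n_controls, seed=0):
    """Control sequences with the same base composition as the miRNA."""
    rng = random.Random(seed)
    controls = []
    for _ in range(n_controls):
        bases = list(mirna)
        rng.shuffle(bases)
        controls.append("".join(bases))
    return controls


def signal_to_noise(n_authentic_targets, n_control_targets):
    """Targets predicted for real miRNAs over the mean count for shuffled controls."""
    return n_authentic_targets / mean(n_control_targets)


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7-like sequence, for illustration
controls = shuffled_controls(mirna, n_controls=3)
print(controls)

# Hypothetical counts: 32 targets for the real miRNA, 10/12/8 for three controls.
print(signal_to_noise(32, [10, 12, 8]))
```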
Screening control sequences
Features to consider:
Expected frequency of seed matches
Expected frequency of matching to 3’ end of
miRNA (after seed extension)
Observed count of seed matches in UTR
datasets
Predicted free energies for seed:match
interactions
Signal:noise results
Filled bars are for
authentic miRNAs
Open bars show the
mean and standard
deviation for shuffled
sequences
nrMamm set used for
first two, nrVert used
for set including fugu
Biological relevance
Hypothesis: 5’ conservation of miRNAs is
important for mRNA target recognition
Highest signal:noise ratio observed when seed positioned close to 5’ end
Hypothesis: highly conserved miRNAs are more involved in regulation
High degree of conservation -> more predicted targets
Membership in large miRNA family -> more
predicted targets
Experimental verification
15 predicted target sites chosen
All with known biological function
Representative of the entire list of candidates
11 target sites confirmed
Expression of upstream ORF influenced
27% false positives – close correspondence to predicted 30% false positives
References
Chalk, A.M. and Sonnhammer, E.L.L. (2002)
“Computational antisense oligo prediction with a neural
network model.” Bioinformatics 18(12):1567-1575.
Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S.,
Tacker, M., and Schuster, P. (1994) “Fast folding and
comparison of RNA secondary structures.” Monatshefte
für Chemie 125:167-188.
Lewis, B.P., Shih, I., Jones-Rhoades, M.W., and Bartel,
D.P. (2003) “Prediction of mammalian microRNA
targets.” Cell 115(7):787-798.