Meynert1-2 - Computing Science


RNA-RNA interaction
A biological crash course and introduction to prediction methods
Part I – Biological crash course

- Bacteria
  - Plasmid copy control
  - Post-segregational killing systems
  - Trans-encoded chromosomal RNAs
- RNA interference (gene silencing)
- Translation regulation
  - C. elegans developmental regulation
  - miRNA-miRNA interactions
- Human telomerase
DNA vs. RNA

|     | Bases   | #Strands | Structure                    |
|-----|---------|----------|------------------------------|
| DNA | A,C,G,T | 2        | Double helix                 |
| RNA | A,C,G,U | 1 or 2   | Stem-loop, pseudoknots, etc. |
Gene expression
Central dogma of molecular biology: DNA -> RNA (transcription) -> protein (translation)

Translation
- mRNA -> protein via the triplet (codon) code
- What happens if the mRNA is destroyed or otherwise can’t be translated?
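To make the triplet code concrete, here is a minimal Python sketch. It assumes a hypothetical partial codon table (only the codons the example needs; a complete table has 64 entries) and reads from the first AUG until a stop codon. An mRNA that has been destroyed, or that offers no start codon to the ribosome, simply yields no protein.

```python
# Hypothetical partial codon table: only the codons used below (a full table has 64).
CODON_TABLE = {
    "AUG": "M",  # Met, start codon
    "UUU": "F",  # Phe
    "GGC": "G",  # Gly
    "AAA": "K",  # Lys
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}

def translate(mrna: str) -> str:
    """Translate codon by codon from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGUUUGGCAAAUAAGG"))  # MFGK
```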

Bacteria backgrounder
- Single-celled organisms
- Prokaryotes = no nucleus
- Multi-cistronic transcripts -> multiple genes transcribed at one time, often with overlapping reading frames

Bacterial genetic information

Bacterial chromosome (1)
- Genome of the organism
- Required for life

Plasmids (2)
- Circular DNA molecules
- Double-stranded
- Independently self-replicating
- Not required for life; often confer a selective advantage such as antibiotic resistance
Plasmid replication
- (1), (2) – genes encoded on the plasmid
- (3) – origin of replication (ORI)

Plasmid copy control
Recall: plasmids replicate independently of the host chromosome
- Copy number fluctuations are unavoidable
- Too many -> “runaway” replication, host dies
- Too few -> increased risk of plasmid loss

Problem: How to control copy number?
Solution: a negative feedback loop mediated by RNA-RNA interaction
R1 copy control

Genes:
- oriR1 – origin of replication
- repA – large amounts of this protein are required for replication initiation
- tap – translation of tap is required for translation of the repA protein
- copA – product is an antisense RNA
- copB – product is a repressor protein (not covered here)
R1 copy control (2)
- copA – RNA with a stem-loop structure
- copT – target segment of the repA/tap mRNA; also forms a stem-loop structure
- copA and copT bind through a single loop-loop interaction

R1 copy control (3)
R1 copy control (4)
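- copA RNA is unstable and degrades quickly, so its level reflects the current plasmid copy number
- If too few plasmids are producing copA antisense RNA (copy number is too low), less repA/tap mRNA is blocked and more RepA protein can be produced
- The plasmid can therefore replicate until the copy number, and with it the copA level, rises again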
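The negative feedback described above can be illustrated with a toy deterministic model. This is not the real kinetics of R1: the rate constants r and k are hypothetical, and copA is simply assumed to be proportional to the copy number n, so each plasmid's replication rate r / (1 + k·n) falls as n rises while cell division halves the copies.

```python
def simulate_copy_number(n0: float, r: float = 4.0, k: float = 0.1,
                         cycles: int = 50) -> list[float]:
    """Toy model of antisense-RNA copy control (r and k are hypothetical).

    Per cell cycle:
      * copA is assumed proportional to the copy number n, so the
        per-plasmid replication rate r / (1 + k*n) drops as n rises;
      * division then halves the copies passed to each daughter cell.
    Fixed point: r / (1 + k*n) = 1, i.e. n* = (r - 1) / k.
    """
    n = n0
    history = [n]
    for _ in range(cycles):
        n = n * (1 + r / (1 + k * n))  # replication, inhibited by copA ~ n
        n = n / 2                      # partition between two daughter cells
        history.append(n)
    return history

low = simulate_copy_number(5)     # starts with too few copies
high = simulate_copy_number(100)  # starts with too many copies
```

Both runs approach the same steady state n* = (4 - 1) / 0.1 = 30: a deficit speeds replication up and an excess slows it down, which is the point of the feedback loop.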

Post-segregational killing systems
Plasmid self-preservation mechanism
- A bacterial host that loses the plasmid dies
- The R1 plasmid hok/sok system is the prototype
- All such systems work similarly

R1 hok/sok system

The hok/sok locus encodes:
- hok protein – “host killing”
- mok, an overlapping reading frame – “modulator of killing”
- sok RNA – “suppression of killing”

- mok must be translated for hok to be expressed
- mok cannot be translated if sok RNA is present

R1 hok/sok system (2)

hok mRNA is extremely compact
- Many stem-loop structures
- Flush 5’-3’ pairing
- Highly stable -> long half-life
- Translationally inert

The mok segment is both:
- Translationally active
- Able to bind the sok inhibitor RNA
R1 hok/sok system (3)
- sok RNA is highly unstable
- Bacteria carrying R1 produce lots of sok RNA
- sok binds mok, so hok is not translated

Bacteria that lose R1 have:
- Lots of stable hok mRNA
- Quickly degrading sok RNA (low stability)
- No new sok RNA being produced
- Result: hok is translated and the bacterium dies
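A small sketch of why plasmid loss is lethal: once the plasmid is gone nothing is resynthesized, and each RNA decays with its own half-life. The half-lives and starting amounts below are hypothetical illustration values, and sok is assumed to neutralize hok/mok mRNA one-to-one.

```python
def remaining(initial: float, half_life: float, t: float) -> float:
    """Amount left after t minutes of first-order decay with no new synthesis."""
    return initial * 0.5 ** (t / half_life)

# Hypothetical illustration values, not measurements.
HOK_HALF_LIFE = 20.0   # minutes: stable, compactly folded hok mRNA
SOK_HALF_LIFE = 0.5    # minutes: unstable sok RNA
HOK_START, SOK_START = 10.0, 100.0  # sok in excess while the plasmid is present

def free_hok(t: float) -> float:
    """hok mRNA not neutralized by sok, t minutes after the plasmid is lost
    (one-to-one sok binding assumed)."""
    hok = remaining(HOK_START, HOK_HALF_LIFE, t)
    sok = remaining(SOK_START, SOK_HALF_LIFE, t)
    return max(0.0, hok - sok)

for t in (0, 1, 2, 5, 10):
    print(f"t = {t:>2} min: free hok = {free_hok(t):.2f}")
```

With these numbers there is no free hok at 0 or 1 minute, but within a few minutes the unstable sok is essentially gone while most of the stable hok mRNA remains.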
Bacterial chromosomes

Plasmid antisense RNAs are generally cis-encoded (transcribed from the opposite strand of the target gene)
- Implies complete Watson-Crick complementarity

Bacterial chromosomes contain trans-encoded antisense RNAs
- Not necessarily complete complementarity
- Often stress-related control systems
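Because a cis-encoded antisense RNA comes from the opposite strand of the same DNA segment as its target, it is the exact reverse complement of that target; a trans-encoded RNA comes from elsewhere and usually pairs only partially. The sketch below scores this as the fraction of Watson-Crick pairs in a gapless antiparallel alignment, a deliberate simplification that ignores G-U pairs, bulges, and structure. The sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Perfect Watson-Crick partner of an RNA, written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def complementarity(antisense: str, target: str) -> float:
    """Fraction of positions forming Watson-Crick pairs when two equal-length
    RNAs are aligned antiparallel without gaps (G-U counts as a mismatch)."""
    if len(antisense) != len(target):
        raise ValueError("sequences must have equal length")
    perfect = reverse_complement(target)
    return sum(a == b for a, b in zip(antisense, perfect)) / len(target)

target = "GGAUCCAUGCAU"  # hypothetical target segment
print(complementarity(reverse_complement(target), target))  # cis-like: 1.0
print(complementarity("AUGCAUGGAACC", target))              # trans-like: partial
```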
oxyS/fhlA in E. coli
- oxyS – RNA transcript induced by oxidative stress
- fhlA – mRNA encoding a transcriptional activator
- The oxyS/fhlA complex binds via two loop-loop interactions

RNA interference (RNAi)
a.k.a. post-transcriptional gene silencing
- Double-stranded RNAs (dsRNAs) are introduced into the cell
  - Complementary to the mRNA of a gene
  - Directly introduced in a wet lab, or
  - Produced by the cell itself
RNA interference (2)
dsRNAs are cleaved into 21-23 nt segments (“small interfering RNAs”, or siRNAs) by an enzyme called Dicer

RNA interference (3)
siRNAs are incorporated into the RNA-induced silencing complex (RISC)

RNA interference (4)
Guided by base complementarity of the siRNA, RISC targets the mRNA for degradation
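The Dicer and RISC steps can be sketched as string operations under heavy simplifying assumptions: the dsRNA is represented by its sense strand (identical to the target mRNA region), it is cut into non-overlapping 21-nt pieces, each siRNA's antisense strand is taken as the guide, and RISC is modeled as finding perfectly complementary sites. The function names and the mRNA are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def dice(sense_strand: str, size: int = 21) -> list[str]:
    """Cut the sense strand of a dsRNA into consecutive, non-overlapping pieces of
    `size` nt (a stand-in for Dicer's 21-23 nt products; leftover bases are dropped)."""
    return [sense_strand[i:i + size] for i in range(0, len(sense_strand) - size + 1, size)]

def risc_targets(guide: str, mrna: str) -> list[int]:
    """mRNA positions where the guide (antisense) strand pairs perfectly."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]

mrna = "GGGAAACCCUUUGAGACUCAGAUCGAUCCAGUACGUAGCUAA"  # hypothetical 42-nt mRNA
sirnas = dice(mrna)                                 # dsRNA sense strand = mRNA region
guides = [reverse_complement(s) for s in sirnas]    # antisense strands guide RISC
print([risc_targets(g, mrna) for g in guides])      # [[0], [21]]
```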
RNA interference – why?

Studying gene function
- Knock out or inhibit a gene’s normal function
- Can the organism survive?
- What phenotypic changes are observed?

Therapeutic suppression
- E.g., cancer treatment
micro RNA (miRNA)
Gene expression regulation
- Created by a process similar to siRNA production
- Generally prevents ribosome binding (translational repression)

Ex: C. elegans development
lin-4 and let-7 antisense RNAs
- Regulate larval development in C. elegans
- [Figure: one of the two binding sites for the lin-41:let-7 interaction]

Human telomerase

Telomerase = ribonucleoprotein complex
- Ribonucleo- = ribonucleic acid (RNA)
- Protein = contains a protein

Responsible for maintaining telomere length in eukaryotic chromosomes
- Main components:
  - Telomerase reverse transcriptase (TERT)
  - Human telomerase RNA (hTR)
Human telomerase (2)

Reverse transcriptase
- Synthesizes DNA from an RNA template (the reverse of the usual DNA-to-RNA transcription)

Telomeres – repeated regions at the ends of eukaryotic chromosomes
- hTR is the template for the repeated region

Human telomerase (3)

The hTR 11-nt templating region consists of:
- Repeat template: CUAACCC
  - Provides the template for the repeat region
- Alignment domain: UAAC
  - Positions telomerase on the DNA strand
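Reverse transcription of the hTR template can be checked with a few lines of Python. The sketch assumes the repeat template and alignment domain are contiguous in the order listed above (5’-CUAACCC-UAAC-3’). The DNA written is complementary and antiparallel to the RNA template, with RNA A pairing to DNA T.

```python
# Base pairing used when a reverse transcriptase copies RNA into DNA.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(template: str) -> str:
    """DNA strand (5'->3') synthesized on an RNA template given 5'->3'."""
    return "".join(RNA_TO_DNA_PARTNER[b] for b in reversed(template))

repeat_template = "CUAACCC"
alignment_domain = "UAAC"
print(reverse_transcribe(repeat_template))                     # GGGTTAG
print(reverse_transcribe(repeat_template + alignment_domain))  # GTTAGGGTTAG
```

The 11-nt product contains TTAGGG, the human telomeric repeat.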

Human telomerase (4)
Loop-loop interaction
- Sometimes referred to as “kissing loops”
- Recall that all of the RNA-RNA interactions discussed so far (except RNAi) involve loop-loop interactions
- Predicting miRNA transcripts and targets involves loop structure prediction
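Loop structure prediction starts with finding where an RNA folds back on itself to form stems closing loops. As a minimal illustration, the sketch below uses the Nussinov algorithm, which maximizes the number of nested base pairs by dynamic programming. This is a swapped-in teaching algorithm, not the free-energy minimization performed by tools such as RNAfold, and the minimum hairpin loop size of 3 is an assumed parameter.

```python
# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov dynamic programming: maximum number of nested base pairs.

    best[i][j] holds the optimum for seq[i..j]; any hairpin loop must contain
    at least `min_loop` unpaired bases.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]                    # base i left unpaired
            for k in range(i + min_loop + 1, j + 1):  # base i paired with base k
                if (seq[i], seq[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k < j else 0
                    score = max(score, inside + 1 + after)
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAAUCCC"))  # 3
```

For the hairpin GGGAAAUCCC it finds 3 base pairs, i.e. a 3-bp stem closing the loop, which is the maximum possible for 10 nt with a loop of at least 3.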

References
Couzin, J. (2002) “Breakthrough of the year – Small RNAs make big splash.” Science 298(5602):2296-2297.
Lai, E.C., Wiel, C., and Rubin, G.M. (2004) “Complementary miRNA pairs suggest a regulatory role for miRNA:miRNA duplexes.” RNA 10(2):171-175.
Moss, E.G. (2001) “RNA interference – It’s a small RNA world.” Current Biology 11(19):R722-775.
Sharp, P.A. (2001) “RNA interference – 2001.” Genes and Development 15(5):485-490.
Shi, Y. (2003) “Mammalian RNAi for the masses.” TRENDS in Genetics 19(1):9-12.
References (2)
Ueda, C.T., and Roberts, R.W. (2004) “Analysis of a long-range interaction between conserved domains of human telomerase RNA.” RNA 10(1):139-147.
Wagner, E.G.H. and Flärdh, K. (2002) “Antisense RNAs everywhere?” TRENDS in Genetics 18(5):223-226.
Wagner, E.G.H., Altuvia, S., and Romby, P. (2002) “Antisense RNAs in bacteria and their genetic elements.” Advances in Genetics 45:361-398.
Part II – Prediction

- Identifying effective siRNAs
  - Neural network approach
- Identifying targets
  - Mammalian miRNA target prediction
Prediction of siRNAs
- The sequence properties that make an antisense RNA an effective gene inhibitor are not well understood
- Most computational models consider only:
  - RNA structure prediction
  - Motif searches
Neural net approach
- Training set: 490 known siRNA molecules
- Input parameters:
  - Base composition
  - mRNA:siRNA binding energy properties
  - 3’ and 5’ binding energy
  - Structure of the siRNA (hairpin energy and quality)
- Target function: efficacy
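Base composition, the first input listed, is straightforward to compute. How the original network encoded its inputs is not given on these slides, so the feature set below (per-base fractions plus G+C content) is a hypothetical choice.

```python
def base_composition(seq: str) -> dict[str, float]:
    """Per-base fractions plus G+C content of an RNA sequence (hypothetical feature set)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    features = {base: seq.count(base) / len(seq) for base in "ACGU"}
    features["GC"] = features["G"] + features["C"]
    return features

print(base_composition("GGCAUAGCUA"))
```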
Neural net approach (2)
Neural net results
- 14 inputs, 11 hidden units, 1 output
- Success rate of 92%
- Predicts 12 effective siRNAs per 1000 base pairs on average
- Stringent (high specificity)
- Good for designing siRNAs for RNAi
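The 14-11-1 architecture can be sketched as a forward pass. The slides do not give the activation functions, weights, or training procedure, so this sketch assumes sigmoid units and uses random, untrained weights; it shows only the shape of the computation (14 features in, one efficacy score in (0, 1) out).

```python
import math
import random

def make_network(n_in: int = 14, n_hidden: int = 11, seed: int = 0):
    """Random, untrained weights for an n_in-n_hidden-1 feed-forward network."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_efficacy(features: list[float], net) -> float:
    """Forward pass: inputs -> sigmoid hidden layer -> single sigmoid output in (0, 1)."""
    w_hidden, b_hidden, w_out, b_out = net
    if len(features) != len(w_hidden[0]):
        raise ValueError(f"expected {len(w_hidden[0])} input features, got {len(features)}")
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

net = make_network()
score = predict_efficacy([0.5] * 14, net)  # 14 inputs -> 1 score
```

In a real model the weights would come from training against measured efficacy; only the layer sizes here are taken from the slide.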

Prediction of miRNA targets

Mammals/vertebrates
- Many known miRNAs
- Mostly unknown target genes

Initial method outline
- Look at conserved miRNAs
- Look for conserved target sites
micro RNAs in animals
- 0.5-1.0% of predicted genes encode miRNAs
- One of the more abundant regulatory classes
- Tissue-specific or developmental-stage-specific expression
- High evolutionary conservation

micro RNAs in plants
- Finding targets in plants is relatively easy
- Look for mRNA transcripts with near-perfect complementarity to known miRNAs
- Signal-to-noise ratio exceeds 10:1 for Arabidopsis (model plant organism)
- The same naïve approach in C. elegans and D. melanogaster? No more hits than expected by random chance!
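The plant-style search amounts to scanning each transcript for windows that nearly match the miRNA's reverse complement. The sketch counts mismatches in a gapless alignment; the cut-off of 2 mismatches is a hypothetical parameter, and G-U pairs are treated as mismatches for simplicity.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def near_perfect_sites(mirna: str, transcript: str,
                       max_mismatches: int = 2) -> list[tuple[int, int]]:
    """(start, mismatches) for each transcript window that differs from the
    miRNA's perfect complement at no more than `max_mismatches` positions."""
    site = reverse_complement(mirna)
    hits = []
    for i in range(len(transcript) - len(site) + 1):
        window = transcript[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits
```

In animals this naïve scan returns about as many hits for real miRNAs as for random ones, which is why the seed-based approach below was developed.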

So what can we use?
- Pairing to nucleotides 2-8 at the 5’ end of the miRNA appears important for target recognition
- Target regions are enriched for genes involved in transcriptional regulation
Goals for the algorithm
- Predict hundreds of miRNA targets
- Estimate false-positive rates
- Provide computational and experimental evidence of authenticity
- Identify common functional classes other than transcriptional regulator genes

TargetScan
Algorithm developed by Lewis et al. (2003)
- Input:
  - A miRNA known to be conserved across multiple organisms
  - Orthologous 3’ UTR sequences
  - Cut-off values for two parameters (Z_C and R_C)
  - A value for one free parameter (T)
- Output:
  - Ranked list of candidate target genes
TargetScan (1)

Search UTRs in one organism
- Bases 2-8 of the miRNA = “miRNA seed”
- Perfect Watson-Crick complementarity
- No wobble (G-U) pairs
- 7-nt matches = “seed matches”
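The seed search can be written directly from the definition above: take miRNA bases 2-8, form their reverse complement, and look for exact occurrences in the UTR. The let-7a sequence serves only as an example input, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def seed_matches(mirna: str, utr: str) -> list[int]:
    """Start positions (0-based) of 7-nt UTR sites that are perfect Watson-Crick
    complements of miRNA bases 2-8; G-U wobble pairs are not accepted."""
    seed = mirna[1:8]                # bases 2-8 (1-based) of the miRNA
    site = reverse_complement(seed)  # what the UTR must contain, read 5'->3'
    return [i for i in range(len(utr) - 6) if utr[i:i + 7] == site]

let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(reverse_complement(let7a[1:8]))  # CUACCUC
```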
TargetScan (2)

Extend seed matches
- Allow G-U (wobble) pairs
- Both directions
- Stop at mismatches
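Seed extension can be sketched as walking outward from a seed match in both directions while the bases still pair, accepting G-U wobble pairs as well as Watson-Crick pairs, and stopping at the first mismatch. The index convention (miRNA index i pairs with UTR position site_start + 7 - i) follows from antiparallel pairing; the function name and return format are assumptions of this sketch.

```python
# Watson-Crick pairs plus G-U wobble pairs (allowed during extension only).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def extend_seed_match(mirna: str, utr: str, site_start: int) -> tuple[int, int]:
    """Extend a 7-nt seed match in both directions until a mismatch or a sequence end.

    `site_start` is the UTR position of the seed match. miRNA index i (0-based)
    pairs antiparallel with UTR position site_start + 7 - i, so the seed
    (indices 1-7) covers UTR positions site_start .. site_start + 6.
    Returns the first and last paired miRNA indices (0-based, inclusive).
    """
    anchor = site_start + 7

    def pairs(i: int) -> bool:
        j = anchor - i
        return 0 <= i < len(mirna) and 0 <= j < len(utr) and (mirna[i], utr[j]) in PAIRS

    first, last = 1, 7
    while pairs(first - 1):  # toward the miRNA 5' end (UTR 3' direction)
        first -= 1
    while pairs(last + 1):   # toward the miRNA 3' end (UTR 5' direction)
        last += 1
    return first, last
```

For a UTR that is the full reverse complement of the miRNA, extension covers the whole miRNA; a single mismatch just past the seed stops extension in that direction at the end of the seed, whereas a G-U pair at the same spot does not.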
TargetScan (3)

Optimize base pairing with the RNAfold program (Hofacker et al. 1994) between:
- The remaining 3’ region of the miRNA
- The 35 bases of UTR 5’ to each seed match
TargetScan (4)
- Folding free energy (ΔG) assigned to each putative miRNA:target interaction
- Ignores initiation free energy
- Computed with RNAeval (Hofacker et al. 1994)

TargetScan (5)

Z score for each UTR (no match -> Z = 1.0):

$$Z = \sum_{k=1}^{n} e^{-\Delta G_k / T}$$

- n = number of seed matches in the UTR (may be more than one)
- ΔG_k = free energy of the miRNA:target site interaction at the kth seed match
- T = parameter influencing the relative weighting of UTRs with a few high-affinity target sites against UTRs with many low-affinity target sites (experimentally determined)
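The Z score formula translates directly into code. T and the energies are inputs here (the sketch does not reproduce how T was determined), and the no-match convention Z = 1.0 is taken from the slide.

```python
import math

def z_score(delta_gs: list[float], T: float) -> float:
    """Z = sum over the n seed matches of exp(-dG_k / T) for one UTR.

    delta_gs: free energies of the miRNA:target interaction at each seed match
    (negative = favourable). A UTR with no seed match gets Z = 1.0.
    """
    if not delta_gs:
        return 1.0
    return sum(math.exp(-dg / T) for dg in delta_gs)

print(z_score([-20.0, -10.0], T=10.0))  # e^2 + e^1
```

Lowering T makes a single strongly paired site dominate the sum; raising T pushes every term toward 1, so the number of sites matters more. That is the trade-off T controls.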
TargetScan (6)
- Order UTRs by Z score
- Assign a rank to each UTR
- Repeat this process for each of the other organisms with UTR datasets

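TargetScan (7)

UTR i is a predicted target if, for all organisms:

$$Z_i \ge Z_C \quad \text{and} \quad R_i \le R_C$$

where Z_i and R_i are the Z score and rank of UTR i, and Z_C and R_C are the cut-off values.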
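Steps (6) and (7) combine into a small sketch: rank the UTRs by Z score separately within each organism (rank 1 = highest Z), then keep only UTRs that pass both cut-offs in every organism. The data layout (organism -> UTR -> Z), the function names, and the example values are hypothetical; ties are broken arbitrarily.

```python
def ranks(z_by_utr: dict[str, float]) -> dict[str, int]:
    """Rank UTRs within one organism: rank 1 = highest Z score."""
    ordered = sorted(z_by_utr, key=lambda u: z_by_utr[u], reverse=True)
    return {utr: r for r, utr in enumerate(ordered, start=1)}

def predicted_targets(z_by_organism: dict[str, dict[str, float]],
                      z_c: float, r_c: int) -> set[str]:
    """UTRs with Z_i >= Z_C and R_i <= R_C in every organism."""
    rank_by_organism = {org: ranks(z) for org, z in z_by_organism.items()}
    shared = set.intersection(*(set(z) for z in z_by_organism.values()))
    return {
        u for u in shared
        if all(z_by_organism[o][u] >= z_c and rank_by_organism[o][u] <= r_c
               for o in z_by_organism)
    }

z = {"human": {"g1": 9.0, "g2": 4.0, "g3": 1.0},   # hypothetical Z scores
     "mouse": {"g1": 8.0, "g2": 1.5, "g3": 6.0}}
print(predicted_targets(z, z_c=3.0, r_c=2))  # {'g1'}
```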
Datasets

nrMamm (mammalian – 79 sequences)
- Homologs in human, mouse, and pufferfish
- Identical between human and mouse, not necessarily in pufferfish (fugu)

nrVert (vertebrate – 55 sequences)
- Identical between human, mouse, and fugu

Non-redundant: if multiple miRNAs had the same seed, one representative was chosen
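Building a non-redundant set by seed is a grouping step. The slides do not say how the representative was chosen, so this sketch simply keeps the alphabetically first miRNA name for each seed (bases 2-8); the names and sequences in the example are placeholders.

```python
def non_redundant_by_seed(mirnas: dict[str, str]) -> dict[str, str]:
    """Keep one representative miRNA per seed (bases 2-8).

    Representative = alphabetically first name (an assumption of this sketch).
    """
    chosen: dict[str, tuple[str, str]] = {}
    for name in sorted(mirnas):
        seed = mirnas[name][1:8]
        chosen.setdefault(seed, (name, mirnas[name]))
    return dict(chosen.values())

example = {
    "mirA": "UGAGGUAGUAGGUUGUAUAGUU",
    "mirB": "UGAGGUAGUAGGUUGUGUGGUU",   # same seed as mirA
    "mirC": "UAAAGUGCUUAUAGUGCAGGUAG",
}
print(sorted(non_redundant_by_seed(example)))  # ['mirA', 'mirC']
```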
Sample program flow
Results for nrMamm
- nrMamm searched against human, mouse, and rat orthologous 3’ UTRs
- 451 miRNA:target interactions predicted for 400 unique genes
- Average of 5.7 targets per miRNA
- Signal:noise ratio of 3.2:1

Results for nrVert
- Additional search against fugu UTRs
- Signal:noise ratio improves to 4.6:1
- Relaxed cut-off values
- 115 predicted miRNA:target interactions for 107 unique genes
- 2.1 putative targets per miRNA

Signal:noise ratio calculation
- Signal = number of predicted targets from the nrMamm dataset
- Noise = number of predicted targets from randomly shuffled miRNAs
- Shuffled control sequences are screened to ensure relevant features are preserved – don’t underestimate the noise!
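The ratio itself is simple arithmetic: signal divided by the average number of predictions for the shuffled controls. The sketch also returns the controls' standard deviation, the quantity shown as error bars for shuffled sequences in the results figure; the counts in the usage line are made-up numbers.

```python
from statistics import mean, stdev

def signal_to_noise(signal: int, control_counts: list[int]) -> tuple[float, float]:
    """Return (signal / mean control predictions, control standard deviation).

    control_counts: number of predicted targets for each shuffled control set
    (at least two values are needed for a standard deviation).
    """
    noise = mean(control_counts)
    if noise == 0:
        raise ValueError("no predictions for the controls; ratio undefined")
    return signal / noise, stdev(control_counts)

ratio, spread = signal_to_noise(60, [18, 22, 20])  # hypothetical counts
```

Here the ratio is 60 / 20 = 3.0, with a control standard deviation of 2.0.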

Screening control sequences

Features to consider:
- Expected frequency of seed matches
- Expected frequency of matching to the 3’ end of the miRNA (after seed extension)
- Observed count of seed matches in the UTR datasets
- Predicted free energies for seed:match interactions
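One way to screen shuffled controls is to generate random permutations of the miRNA (which preserves base composition) and keep only those whose seed-match count in the UTR datasets is close to the real miRNA's. The sketch implements just that one feature from the list; the relative tolerance, the rejection of shuffles that keep the original seed, and all names are assumptions.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna: str) -> str:
    """UTR heptamer complementary to miRNA bases 2-8."""
    return "".join(COMPLEMENT[b] for b in reversed(mirna[1:8]))

def heptamer_counts(utrs: list[str]) -> Counter:
    """Count every 7-mer in the UTR dataset once, so each candidate is a fast lookup."""
    return Counter(utr[i:i + 7] for utr in utrs for i in range(len(utr) - 6))

def screened_controls(mirna: str, kmers: Counter, n_controls: int,
                      tolerance: float = 0.15, rng_seed: int = 0,
                      max_tries: int = 100_000) -> list[str]:
    """Shuffled versions of `mirna` whose seed-match count is within
    `tolerance` (relative) of the real miRNA's count."""
    rng = random.Random(rng_seed)
    real_count = kmers[seed_site(mirna)]
    allowed = tolerance * max(real_count, 1)
    controls: list[str] = []
    for _ in range(max_tries):
        if len(controls) == n_controls:
            break
        bases = list(mirna)
        rng.shuffle(bases)            # same base composition as the real miRNA
        candidate = "".join(bases)
        if candidate[1:8] == mirna[1:8]:
            continue                  # keeps the real seed: not a control
        if abs(kmers[seed_site(candidate)] - real_count) <= allowed:
            controls.append(candidate)
    return controls
```

A fuller screen would add the other listed features (3’-end matching after extension, predicted free energies) as further acceptance tests.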
Signal:noise results
[Figure: filled bars are for authentic miRNAs; open bars show the mean and standard deviation for shuffled sequences. The nrMamm set was used for the first two sets of bars, nrVert for the set including fugu.]
Biological relevance

Hypothesis: 5’ conservation of miRNAs is important for mRNA target recognition
- Highest signal:noise ratio observed when the seed is positioned close to the 5’ end

Hypothesis: highly conserved miRNAs are more involved in regulation
- High degree of conservation -> more predicted targets
- Membership in a large miRNA family -> more predicted targets
Experimental verification

15 predicted target sites chosen
- All with known biological function
- Representative of the entire list of candidates

11 target sites confirmed
- Expression of the upstream ORF was affected
- 4 of 15 unconfirmed = 27% false positives, closely matching the predicted 30% false-positive rate
References
Chalk, A.M. and Sonnhammer, E.L.L. (2002) “Computational antisense oligo prediction with a neural network model.” Bioinformatics 18(12):1567-1575.
Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S., Tacker, M., and Schuster, P. (1994) “Fast folding and comparison of RNA secondary structures.” Monatshefte für Chemie 125:167-188.
Lewis, B.P., Shih, I., Jones-Rhoades, M.W., and Bartel, D.P. (2003) “Prediction of mammalian microRNA targets.” Cell 115(7):787-798.