Given a miRNA sequence, what are its target

Improving miRNA Target Genes Prediction
Rikky Wenang Purbojati
miRNA


MicroRNA (miRNA) is a class of RNA believed
to play important roles in gene regulation.
miRNAs are short (21- to 23-nt) RNAs that bind to the 3′
untranslated regions (3′ UTRs) of target genes.
miRNA Functions


miRNA plays a major role in the RNA-Induced Silencing
Complex (RISC).
miRNAs control the expression of large numbers of
genes by:



mRNA degradation
Translational repression
Recent studies indicate that miRNA plays a role in cancer
development:


A surplus of miRNA might inhibit the cell apoptosis process
A deficit of miRNA might cause an excess of certain oncogenes
RNA Induced Silencing Complex

mRNA degradation


Breaks the structural integrity of an mRNA.
Translational repression

Prevents the mRNA from being translated.
Characteristics of miRNA


Short (22-25nts)
Transcribed from a miRNA gene



Intragenic: the miRNA gene is located inside a host gene (usually
in an intron region)
Intergenic: the miRNA gene is located outside gene bodies
A consistent 5’ and 3’ boundary:



Transcription Start Site
5’ Cap
Poly(A) tail
Development of miRNA
miRNA General Research Question


Much attention has been directed at miRNA processing
and targeting.
Computationally, one basic challenge for miRNA is:
Given a miRNA sequence, what are its target genes?
miRNA sequence target prediction


Predict target genes by matching the complement of the
miRNA sequence.
Two types of complement:

Perfect complement

Imperfect complement
Find a perfect match for the
seed (nt 2-8)
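The seed rule above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular prediction tool: the function names (`reverse_complement`, `seed_sites`) and the toy UTR string are assumptions introduced here, and the miRNA is assumed to be written 5′ to 3′ in the RNA alphabet.

```python
from typing import List

# Watson-Crick pairs in the RNA alphabet.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string."""
    return "".join(PAIRS[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions in `utr` that perfectly match the seed.

    The seed is taken as nucleotides 2-8 of the miRNA (1-based), and a site
    is any stretch of the UTR equal to the reverse complement of that seed.
    """
    seed = mirna[1:8]                 # nucleotides 2-8, 7 nt long
    site = reverse_complement(seed)   # what the UTR must contain
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Hypothetical inputs, for illustration only.
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
example_utr = "AAACUACCUCAAA"
print(seed_sites(example_mirna, example_utr))
```

A perfect-complement search would use the whole miRNA instead of `mirna[1:8]`; the imperfect case only requires the seed to match, which is why it yields many more candidate sites.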
miRNA sequence target prediction

Several requirements for matching:



Strong Watson-Crick base pairing of the 5’ seed (nt 2-8)
Conservation of the miRNA binding site across species
Another approach: thermodynamic rule

Local miRNA-mRNA interaction with a favorable (low)
minimum free energy
Problems and Opportunities

Problem:
Purely computational target gene prediction produces a
lot of candidates




No unifying theory for target gene prediction yet
Most of the candidates are not validated yet
The common assumption is that most of them are false positives
Can we shorten the list to include only the strong candidates?
Problems and Opportunities

Opportunity:
Lots of publicly available experimental datasets, e.g. cDNA
microarray, miRNA microarray, etc.


Use these datasets to computationally validate some of the target
genes
Current Research:
Preliminary research tries to utilize the abundance of
publicly available microarray data.
Assumptions


miRNA works by silencing target genes, thus the expression of a miRNA
gene and of its target genes should be anti-correlated
Intragenic miRNAs are expressed along with their host
gene.


Thus a host gene should be anti-correlated with the target genes of its miRNA
Intergenic miRNAs do not have a host gene, but we
might be able to use an available composite (miRNA
microarray + cDNA microarray) dataset

If a miRNA is up-regulated in the miRNA microarray, then its
target genes should be down-regulated in the cDNA microarray
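The composite-dataset assumption can be expressed as a simple direction check. The sketch below assumes that expression changes are available as log2 fold changes and that a change of at least `min_change` counts as up- or down-regulation; both the function names and the cutoff of 1.0 are hypothetical choices made for this example, not values taken from the slides.

```python
from typing import Dict, List


def supports_targeting(mirna_log2fc: float, target_log2fc: float,
                       min_change: float = 1.0) -> bool:
    """True when the miRNA is up-regulated and the candidate target is
    down-regulated, each by at least `min_change` (log2 units)."""
    return mirna_log2fc >= min_change and target_log2fc <= -min_change


def filter_candidates(mirna_log2fc: float,
                      target_log2fc_by_gene: Dict[str, float],
                      min_change: float = 1.0) -> List[str]:
    """Keep only the candidate target genes whose change opposes the miRNA."""
    return [gene for gene, fc in target_log2fc_by_gene.items()
            if supports_targeting(mirna_log2fc, fc, min_change)]


# Hypothetical fold changes, for illustration only.
candidates = {"geneA": -1.5, "geneB": 0.2, "geneC": 1.8}
print(filter_candidates(2.0, candidates))
```

Only the up/down direction stated above is tested here; whether the mirror case (miRNA down, target up) should also count is left open.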
Current Work


There has been some work related to this idea (e.g.
HOCTAR)
However, we can improve it by:



Using stricter criteria across the microarray data
Using more diverse data
We expect to get a much better specificity than the
previous method
Hoctar Method




Gets a list of target genes from 3 different tools (PicTar,
TargetScan, miRanda)
Uses Pearson correlation to determine the correlation
coefficient between 2 genes
Includes target genes whose correlation is below some
(negative) threshold
Only works for intragenic miRNA
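The filtering step described above can be sketched as follows. This is an illustration of the idea under stated assumptions, not the HOCTAR implementation: the function names, the expression vectors and the threshold of -0.5 are all hypothetical, and each vector is assumed to hold one value per microarray experiment in the same order.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def anticorrelated_targets(host: Sequence[float],
                           candidates: Dict[str, Sequence[float]],
                           threshold: float = -0.5) -> Dict[str, float]:
    """Keep predicted targets whose correlation with the host gene
    falls below `threshold` (an assumed cutoff)."""
    kept = {}
    for gene, expression in candidates.items():
        r = pearson(host, expression)
        if r < threshold:
            kept[gene] = r
    return kept


# Hypothetical expression values across five experiments.
host_gene = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = {
    "geneA": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the host rises
    "geneB": [1.0, 2.0, 3.0, 4.0, 5.0],   # rises with the host
    "geneC": [2.0, 2.0, 3.0, 2.0, 2.0],   # unrelated to the host
}
print(sorted(anticorrelated_targets(host_gene, predicted)))
```

Because the host gene stands in for the miRNA, the same code cannot be applied to intergenic miRNAs, which have no host gene.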
Hoctar Method
Shortcomings of Hoctar



Uses data from all probes even when they are not consistent
Uses only one target gene prediction approach
Depends on Pearson correlation, which is sensitive to
outliers
Improvement Idea (1)


Use only the subset of data whose probes are all consistent
Treat each probe as a different experiment
Improvement Idea (2)

Pearson correlation is very sensitive to outliers;
alternative solutions:



Use rank correlation coefficients instead of Pearson
correlation coefficients
Normalize the dataset to a normal distribution
Ignore outliers
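To make the first alternative concrete, the sketch below compares Pearson correlation with Spearman rank correlation (Pearson computed on ranks, ties given their average rank) on a toy series in which a single outlier hides an otherwise perfect anti-correlation. The data and function names are assumptions made for this illustration.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y falls steadily with x, except for one outlier at the end.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 9, 8, 7, 6, 5, 4, 100]
print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

On this toy series the outlier flips the sign of the Pearson coefficient, while the rank-based coefficient stays negative, so a negative-threshold filter would discard the pair under Pearson but could still retain it under a suitably chosen rank-correlation cutoff.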
Improvement Idea (3)


In addition to probe consistency and rank correlation, we
might use an entropy rule to eliminate candidate target genes
Assumption:



Transcript level can be approximated from expression level data
One miRNA transcript can only degrade one mRNA transcript
Thus miRNA expression changes should not differ much from
mRNA expression changes
Improvement Idea (4)


Use a larger amount of microarray data
We might be able to include miRNA microarray data to
further refine the target gene lists for several miRNAs
Preliminary Result


GSE9234 dataset (hypoxia/normoxia)
Using only consistency criteria
miRNA      Host Gene  Known Target Gene  HOCTAR  Refined
miR-103-2  PANK3      GPD1               YES     YES
miR-103-2  PANK3      FBW1B              NO      YES
miR-140    WWP2       HDAC4              YES     YES
miR-224    GABRE      API5               NO      NO
Refining Intergenic miRNA prediction



Refining intergenic miRNA prediction using microarray
datasets is not a trivial task
A cDNA microarray can only be used to measure the expression
of target genes, but not of the miRNA gene
Might have to rely on additional data:


Proxy measurement
miRNA microarray
Intergenic miRNA proxy measurement

Putative target gene approximation



Use the expression levels of known target genes for that
specific intergenic miRNA
If its target genes are consistently down-regulated, then we
can assume that the expression level of the intergenic miRNA
gene is up-regulated
Cluster miRNA approximation


Some intergenic miRNAs are clustered with each other;
according to (Saini et al. 2007) most of these clusters use the
same pri-miRNA transcript
Use method 1 (putative target gene approximation) for a neighboring
miRNA to get the intergenic miRNA expression approximation
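A minimal sketch of the putative target gene approximation follows. It assumes that "consistently down-regulated" can be read as "at least a given fraction of the known targets drop by at least a given log2 fold change"; that reading, the function name `infer_mirna_up` and the defaults (`min_change=1.0`, `min_fraction=0.8`) are hypothetical choices for this example.

```python
from typing import Sequence


def infer_mirna_up(known_target_log2fc: Sequence[float],
                   min_change: float = 1.0,
                   min_fraction: float = 0.8) -> bool:
    """Guess that an intergenic miRNA is up-regulated when most of its
    known targets are down-regulated.

    known_target_log2fc: log2 fold changes of the known target genes.
    min_change: size of drop (log2 units) that counts as down-regulation.
    min_fraction: share of targets that must be down-regulated.
    """
    if not known_target_log2fc:
        return False  # no evidence, so make no claim
    down = sum(1 for fc in known_target_log2fc if fc <= -min_change)
    return down / len(known_target_log2fc) >= min_fraction


# Hypothetical fold changes of known targets, for illustration only.
print(infer_mirna_up([-1.2, -2.0, -1.5, -1.1]))
print(infer_mirna_up([-1.2, 0.4, 1.0, -0.2]))
```

For the cluster approximation, the same function could be called with the known targets of a neighboring miRNA from the same cluster, on the assumption that both are processed from the same pri-miRNA transcript.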
Further Work


Implementation and evaluation
Standardizing composite dataset repository
