Transcript RNA-Seq

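General Higher Education
"Twelfth Five-Year Plan" Planned Textbook
Bioinformatics
Chapter 12: Second-Generation Sequencing Technologies
and Their Applications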
Sequence 2.0
Web 2.0





Coined by Darcy DiNucci in 1999.
Resurfaced in 2003.
In 2004, O'Reilly Media and MediaLive hosted the first Web
2.0 conference.
Examples of Web 2.0 include web-based communities,
hosted services, web applications, social-networking
sites, video-sharing sites, wikis, blogs, mashups and
folksonomies. A Web 2.0 site allows its users to interact
with other users or to change website content, in
contrast to non-interactive websites where users are
limited to the passive viewing of information that is
provided to them.
Ajax/Flash/Flex etc.
Next-generation DNA sequencing

BAC-based sequencing, whole-genome
sequencing (WGS)



1970s-2004, HGP
Sanger/Maxam-Gilbert
Next-generation DNA sequencing (NGS)




2004-now
Second-generation DNA sequencing
Deep-sequencing
High-throughput DNA sequencing
Second-generation DNA sequencing
technologies





Roche/454 FLX Pyrosequencer
Illumina/Solexa Genome Analyzer
Applied Biosystems SOLiD™ Sequencer
Pacific Sequencer
HeliScope Single Molecule Sequencer
Roche/454 FLX Pyrosequencer
Illumina Genome Analyzer
Applied Biosystems SOLiD™ Sequencer
Applications of NGS
Category: Examples of applications
Complete genome resequencing: Comprehensive polymorphism and mutation discovery in individual human genomes
Reduced representation sequencing: Large-scale polymorphism discovery
Targeted genomic resequencing: Targeted polymorphism and mutation discovery
Paired end sequencing: Discovery of inherited and acquired structural variation
Metagenomic sequencing: Discovery of infectious and commensal flora
RNA-Seq: Deep sequencing of shotgun libraries derived from mRNA/small RNAs; microRNA profiling, splice junctions, transcript boundaries, structural rearrangements, copy number variation
DNA methylation: Determining patterns of cytosine methylation in genomic DNA; large-scale analysis of DNA methylation by deep sequencing of bisulfite-treated DNA
Chromatin immunoprecipitation–sequencing (ChIP-Seq): Genome-wide mapping of protein-DNA interactions by deep sequencing of DNA fragments pulled down by ChIP
Nuclease fragmentation and sequencing: Nucleosome positioning
Molecular barcoding: Multiplex sequencing of samples from multiple individuals
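Molecular barcoding can be illustrated with a small demultiplexing sketch. It assumes that each read begins with a fixed-length barcode identifying the individual and that barcodes must match exactly; the barcode sequences, sample names and the function name demultiplex are hypothetical, not taken from any platform's software.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read.

    Reads whose prefix matches no known barcode are placed under
    'unassigned'. The barcode is trimmed from the returned sequences.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical 4-nt barcodes and sample names, for illustration only.
barcodes = {"ACGT": "individual_1", "TGCA": "individual_2"}
reads = ["ACGTGGGATTC", "TGCATTTACGA", "ACGTCCCAAAG", "NNNNGATTACA"]
groups = demultiplex(reads, barcodes, barcode_length=4)
```

Real pipelines usually tolerate one or two sequencing errors in the barcode; exact matching is used here only to keep the sketch short.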
We are interested in …



mRNA/small RNA analyses (RNA-Seq)
Analysis of mRNA targets of miRNA/siRNA
Epigenetic analyses (ChIP-Seq)
mRNA targets of miRNA/siRNA

Analyses of mRNA targets of miRNA and
mRNA degradation products by NGS
methods.
PARE: parallel analysis of RNA ends
GMUCT: genome-wide mapping of uncapped and cleaved
transcripts
Epigenetic analyses
RNA-Seq
Technologies used in transcriptome studies:
Hybridization-based approaches:
a) Microarrays/chip;
b) genomic tiling microarrays.
based on genome sequence;
cross-hybridization (high background levels);
limited dynamic range of detection (<1000-fold);
normalization problems (across different experiments).
Sequence-based approaches:
a) EST: Expressed Sequence Tag (~400 bp, 20-7000 bp)
low throughput;
expensive;
not quantitative.
b) tag-based methods:
high throughput;
more precise;
based on expensive Sanger sequencing technology;
a portion of the short tags cannot be uniquely mapped.
CAGE: cap analysis of gene expression (~14-20 bp, 5′ ends)
SAGE: serial analysis of gene expression (~14-20 bp, 3′ ends)
MPSS: massively parallel signature sequencing (17-20 bp)
Next-generation Sequencing-based method:
RNA-Seq
RNA-Seq


Sequencing length: 30 - 400bp.
Advantages:







can be used to detect transcripts of any genome.
low background, highly accurate
large dynamic range of expression levels (~10000-fold)
high levels of reproducibility (both for technical and
biological replicates)
requires less RNA sample (no cloning steps)
lower cost
DNA sequencing technologies used for transcriptome
sequencing applications:

Illumina IG, Applied Biosystems SOLiD and Roche 454 Life
Science systems etc.
Advantages of RNA-Seq

Advantages of RNA-Seq vs. other
transcriptomics methods
RNA-Seq technologies

Commercially available sequencing technologies used for
transcriptome sequencing applications (Sep 15, 2008).
A typical RNA-Seq experiment
mapping reads
to the genome
Program | Website | Publication
BLAST | http://www.ncbi.nlm.nih.gov/blast/ | 1990, J. Mol. Biol.
BLAT | http://www.soe.ucsc.edu/~kent/src/ | 2002, Genome Research
Cross_match | http://www.phrap.org/phredphrapconsed.html | ***
ELAND | http://www.illumina.com/ | ***
TopHat | http://tophat.cbcb.umd.edu/ | ***
Novoalign | http://www.novocraft.com/ | ***
Mosaik | http://bioinformatics.bc.edu/marthlab/Mosaik | ***
Bowtie | http://bowtie.cbcb.umd.edu | 2009, Genome Biology
BWA | http://maq.sourceforge.net/bwa-man.shtml | 2009, Bioinformatics
MAQ | http://maq.sourceforge.net/ | 2008, Genome Research
SOAP/SOAP2 | http://soap.genomics.org.cn/ | 2008/2009, Bioinformatics
ZOOM | http://www.bioinfor.com/products/zoom/ | 2008, Bioinformatics
PerM | http://code.google.com/p/perm/ | 2009, Bioinformatics
BWT-SW | http://i.cs.hku.hk/~ckwong3/bwtsw/ | 2008, Bioinformatics
RMAP | http://rulai.cshl.edu/rmap/ | 2008, BMC Bioinformatics
SHRiMP | http://compbio.cs.toronto.edu/shrimp/ | 2009, PLoS Computational Biology
SeqMap | http://biogibbs.stanford.edu/~jiangh/SeqMap/ | 2008, Bioinformatics
MOM | http://mom.csbc.vcu.edu/ | 2009, Bioinformatics
ProbeMatch | http://www.cs.wisc.edu/~jignesh/probematch/ | 2009, Bioinformatics
Exonerate | http://www.ebi.ac.uk/~guy/exonerate/ | 2005, BMC Bioinformatics
SSAHA2 | http://www.sanger.ac.uk/Software/analysis/SSAHA2/ | 2001, Genome Research
Edena | http://www.genomic.ch/edena | 2008, Genome Research
VCAKE | http://sourceforge.net/projects/vcake/ | 2007, Bioinformatics
Euler-SR | *** | 2007, Genome Research
Mapping algorithms
(a) MAQ: based on spaced-seed indexing;
(b) Bowtie: based on the Burrows-Wheeler transform (BWT).
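To make the Burrows-Wheeler idea concrete, here is a minimal sketch of the transform and of exact-match counting by backward search. It is a teaching example under simplifying assumptions (naive rotation sorting, no mismatches, the reference must not contain '$') and is not how Bowtie itself is implemented; the function names are illustrative.

```python
def bwt(text):
    """Return the Burrows-Wheeler transform of text plus a '$' sentinel."""
    s = text + "$"  # '$' sorts before the letters A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact occurrences of pattern in the original text.

    Uses backward search: the pattern is processed from its last
    character to its first, narrowing a range of sorted rotations.
    """
    # first_row[c] = index of the first sorted rotation starting with c.
    counts = {}
    for ch in bwt_string:
        counts[ch] = counts.get(ch, 0) + 1
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_string[:i]; a real index precomputes this.
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)  # half-open range of matching rotations
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGT"
index = bwt(reference)
hits = count_occurrences(index, "ACG")
```

The search cost depends on the read length rather than on the genome size once the occurrence table is precomputed, which is the property that makes BWT-based mappers attractive for short reads.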
More Detailed problems

Library construction: fragmentation
methods.
1. mRNAs are randomly fragmented, adapters are attached to the 5′ and 3′ ends, and then the adapter-ligated mRNAs are reverse transcribed to make cDNA and PCR amplified.
2. mRNAs are randomly fragmented, the fragments are reverse transcribed to cDNA, PCR amplified, and then adapters are attached to the 5′ and 3′ ends.
3. mRNAs are reverse transcribed to cDNA, PCR amplified, randomly fragmented, and then adapters are attached to the 5′ and 3′ ends.
DNA library preparation: RNA fragmentation and DNA fragmentation
compared.
[Figure: plot axes are Tag Count and Genes.]
Gene model coverage by sequencing-based
methods for transcriptome analysis and its
expression profiling.
[Figure: a reference genome sequence (5′ Exon1, Intron1, Exon2, Intron2, Exon3 3′) and its cDNA (Exon1, Exon2, Exon3), with one track each for 5′ EST, 3′ EST, CAGE tags, SAGE tags and RNA-Seq reads. Axes: No. of reads against position in the genome.]
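The "No. of reads against position in the genome" tracks can be computed from mapped read positions. The sketch below assumes ungapped reads of one fixed length and 0-based start coordinates; the function name coverage_profile and the toy coordinates are illustrative.

```python
def coverage_profile(read_starts, read_length, region_length):
    """Return the number of reads covering each position of a region.

    read_starts are 0-based mapped start positions; each read is assumed
    to cover read_length consecutive positions with no gaps.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (region_length + 1)
    for start in read_starts:
        if 0 <= start < region_length:
            end = min(start + read_length, region_length)
            diff[start] += 1
            diff[end] -= 1
    profile, running = [], 0
    for delta in diff[:region_length]:
        running += delta
        profile.append(running)
    return profile


profile = coverage_profile([0, 2, 2, 7], read_length=4, region_length=10)
```

For tag-based methods such as CAGE or SAGE the same profile would be concentrated at the 5′ or 3′ end of the transcript, whereas RNA-Seq reads spread across all exons.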
RNA-Seq used for protein-coding gene
annotation.
[Figure: sequencing reads mapped to the reference genome sequence (5′ Exon1, Intron1, Exon2, Intron2, Exon3), indicating a potential novel intron and a potential novel exon.]
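One simple way to flag a potential novel exon is to look for covered stretches that overlap no annotated exon. This is a sketch under strong assumptions (a per-position coverage list is already available, a fixed depth threshold, half-open coordinates); the function names and the toy annotation are hypothetical.

```python
def covered_intervals(profile, min_depth=1):
    """Return half-open (start, end) runs where coverage >= min_depth."""
    intervals, start = [], None
    for pos, depth in enumerate(profile):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, len(profile)))
    return intervals


def candidate_novel_exons(profile, annotated_exons, min_depth=1):
    """Return covered intervals that overlap no annotated exon."""
    novel = []
    for start, end in covered_intervals(profile, min_depth):
        overlaps = any(start < exon_end and exon_start < end
                       for exon_start, exon_end in annotated_exons)
        if not overlaps:
            novel.append((start, end))
    return novel


# Toy coverage over 20 positions; exon coordinates are hypothetical.
profile = [0, 5, 6, 5, 0, 0, 0, 3, 4, 3, 0, 0, 7, 8, 7, 0, 0, 0, 0, 0]
exons = [(1, 4), (12, 15)]
novel = candidate_novel_exons(profile, exons, min_depth=2)
```

A potential novel intron would show the opposite pattern: a gap in coverage inside an annotated exon, ideally supported by reads that span the gap.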
RNA-Seq used for small ncRNAs
(miRNA/siRNA) discovery and detection.
Transcript rearrangement discovery.
Applications of RNA-Seq





RNA-Seq can be used to detect the expression profiles
of small RNAs (miRNA/piRNA/siRNA etc.) and
larger RNAs (mRNA/tRNA/rRNA etc.).
RNA-Seq can be used for annotation: splice junctions and
transcript (exon/intron) boundaries.
RNA-Seq can be used for mapping structural
rearrangements: translocations, inversions, small
insertions/deletions (indels), and copy number
variants (CNVs).
RNA-Seq can find new genes.
RNA-Seq can also reveal sequence variations (SNPs)
in the transcribed regions.
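As an illustration of how sequence variations in transcribed regions might be spotted, the sketch below piles up ungapped aligned reads over a reference and reports positions where a non-reference base dominates. The thresholds min_depth and min_fraction and the function name are assumptions chosen for the example; real SNP callers also model base quality and sequencing error.

```python
from collections import Counter


def call_snp_candidates(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Report positions where most reads disagree with the reference.

    aligned_reads is a list of (start, sequence) pairs, 0-based and
    ungapped. Returns (position, reference_base, read_base, depth) tuples.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    candidates = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            candidates.append((pos, reference[pos], base, depth))
    return candidates


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (3, "TTCGTAC"), (1, "CGTTCG")]
snps = call_snp_candidates(reference, reads)
```

In this toy input all four reads carry T where the reference has A at position 4, so that single position is reported.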
Our Works
ChIP-seq

ChIP-seq (Chromatin immunoprecipitation
followed by sequencing)
A technique for genome-wide profiling of DNA-binding proteins, histone modifications or
nucleosomes.

Illumina Solexa Genome Analyzer (sequencing is performed by
sequencing-by-synthesis)
Roche 454 platform (sequenced by pyrosequencing) and Applied
Biosystems (ABI) SOLiD platform (sequenced by DNA ligase-driven
synthesis)
Helicos HeliScope platform (single-molecule sequencing platform)
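Genome-wide profiling of a DNA-binding protein comes down to finding regions where ChIP reads pile up relative to a control. The following is a deliberately simple sketch, assuming fixed non-overlapping windows and a fold-enrichment cutoff with a pseudocount; the window size, threshold and function names are illustrative and do not correspond to any published peak caller.

```python
from collections import Counter


def window_counts(positions, window):
    """Count mapped read positions per fixed-size window index."""
    return Counter(pos // window for pos in positions)


def enriched_windows(chip_positions, control_positions, window=100,
                     min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for windows enriched in ChIP over control."""
    chip = window_counts(chip_positions, window)
    control = window_counts(control_positions, window)
    hits = []
    for idx in sorted(chip):
        # The pseudocount avoids division by zero in empty control windows.
        fold = (chip[idx] + pseudocount) / (control[idx] + pseudocount)
        if fold >= min_fold:
            hits.append((idx * window, (idx + 1) * window, fold))
    return hits


chip_reads = [105, 110, 120, 130, 150, 160, 170, 420, 900]
control_reads = [115, 410, 905]
peaks = enriched_windows(chip_reads, control_reads)
```

Dedicated tools additionally use strand information and a statistical background model, and they normalize for differing library sizes.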
ChIP-chip

ChIP-chip (Chromatin immunoprecipitation
followed by hybridization to a microarray)


Affymetrix: the ChIP and control samples are
hybridized to separate arrays (IP vs. control, two-sample and multiple-sample tests)
NimbleGen and Agilent: hybridize a ChIP sample and a
control sample simultaneously to a single array (log
ratios, one-sample test)
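For the two-color design, the "log ratios, one-sample test" idea can be sketched as follows: compute the log2 ratio of ChIP to control intensity for one probe on each replicate array, then test whether the mean log ratio differs from zero. The intensities are made-up numbers and the function names are illustrative; only the t statistic is computed, not its p-value.

```python
import math
import statistics


def log2_ratios(chip, control):
    """Per-replicate log2(ChIP / control) for one probe."""
    return [math.log2(c / k) for c, k in zip(chip, control)]


def one_sample_t(values, mu=0.0):
    """One-sample t statistic for the hypothesis that the mean equals mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return (mean - mu) / (sd / math.sqrt(n))


# Hypothetical intensities of one probe on four replicate arrays.
chip_intensity = [800, 400, 1600, 800]
control_intensity = [200, 200, 200, 200]
ratios = log2_ratios(chip_intensity, control_intensity)
t_stat = one_sample_t(ratios)
```

With separate ChIP and control arrays, as in the Affymetrix design, the two sets of intensities would instead be compared with a two-sample test.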
ChIP-chip and ChIP-seq
More detailed problems
Overview of ChIP-seq analysis
Software tools for ChIP data
analyses
Projects








ENCODE (ENCyclopedia Of DNA Elements) project:

http://www.genome.gov/10005107/
modENCODE (model organism ENCyclopedia Of DNA Elements) project:

http://www.modencode.org/
NIH Roadmap Epigenomics Program

http://nihroadmap.nih.gov/epigenomics/
GWAS: Genome-Wide Association Studies

http://grants.nih.gov/grants/gwas/
The Large-Scale Genome Sequencing Program

http://www.genome.gov/10001691/
International HapMap Project

http://www.genome.gov/10001688/
Genetic Variation Program

http://www.genome.gov/10001551/
1000 Genomes -- A Deep Catalog of Human Genetic Variation

http://www.1000genomes.org/
What would you do if you could sequence everything?
What can high-throughput sequencing do for you?
Now come to Sequence 2.0




Next-generation Sequencing (NGS).
Web 2.0
More than sequence.
Platform, resource and analysis.
JBrowse: A next-generation genome
browser (Dojo library: Ajax)
LookSeq: A browser-based web
viewer for deep sequencing data
(Google Maps: Ajax)
References



















2008 Medicine 2.0: Social Networking, Collaboration, Participation, Apomediation, and
Openness.
2009 Emerging Patient-Driven Health Care Models: An Examination of Health Social Networks,
Consumer Personalized Medicine and Quantified Self-Tracking.
2009 RNA-Seq: a revolutionary tool for transcriptomics.
2009 Applications of New Sequencing Technologies for Transcriptome Analysis.
2008 Next-Generation DNA Sequencing Methods.
2008 Next-generation DNA sequencing.
2005 Next generation sequencing technologies.
2008 Applications of next-generation sequencing technologies in functional genomics.
2008 What would you do if you could sequence everything?
2006 Genome-Wide Analysis of Protein-DNA Interactions.
2008 Design and analysis of ChIP-seq experiments for DNA-binding proteins.
2009 Insights from genomic profiling of transcription factors.
2008 An integrated software system for analyzing ChIP-chip and ChIP-seq data.
2009 ChIP-seq: Using high-throughput sequencing to discover protein–DNA interactions.
2009 Can web 2.0 reboot clinical trials.
2009 LookSeq: A browser-based viewer for deep sequencing data.
2009 JBrowse: A next-generation genome browser.
2009 PMRD: plant microRNA database.
2009 TransmiR: a transcription factor–microRNA regulation database