Transcriptomics
Jim Noonan
GENE 760
Introduction to RNA-seq
RNA-seq workflow
Martin and Wang Nat Rev Genet 12:671 (2011)
Wang et al. Nat Rev Genet 10:57 (2009)
Illumina RNA-seq library preparation
Capture poly-A RNA with oligo(dT)-attached beads, two rounds (2x), starting from 100 ng total RNA
• RNA quality must be high: degradation produces 3’ bias, because only the 3’ fragments of degraded transcripts retain the poly-A tail
• Non-poly-A RNAs are not recovered
Fragment mRNA
Synthesize ds cDNA
Ligate adapters
Amplify
Generate clusters and sequence
Alternative: ribosomal RNA subtraction (e.g., RiboMinus), which also retains non-poly-A RNAs
Quantifying relative expression levels in RNA-seq
Use existing gene annotation:
• Align to genome plus annotated splice junctions
• Depends on high-quality gene annotation
• Which annotation to use: RefSeq, GENCODE, UCSC?
• Isoform quantification?
• Identifying novel transcripts?
• Differential expression
De novo transcript assembly:
• Assemble transcripts directly from reads
• Allows transcriptome analyses of species without reference genomes
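De novo assemblers commonly build a de Bruijn graph of k-mers from the reads and read contigs off its paths; Trinity, for example, takes this approach. The sketch below is a toy illustration only, assuming error-free reads from a single strand and a hypothetical k of 4: nodes are (k-1)-mers, edges are observed k-mers, and a contig is extended only while the path is unambiguous. Branch points, which in real data arise from isoforms, sequencing errors, or repeats, simply stop the walk here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in an observed k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig
```

For example, `walk_unambiguous(build_de_bruijn(["ATGGCGT", "GGCGTGC", "CGTGCAA"], 4), "ATG")` returns `"ATGGCGTGCAA"`, stitching the three overlapping reads into one contig.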
Mapping RNA-seq reads
Quantifying relative expression levels in RNA-seq
Reads per kilobase of feature length per million mapped reads (RPKM)
Fragments per kilobase of feature length per million mapped fragments (FPKM) (paired-end reads, where a read pair represents one fragment)
Transcripts per million (TPM)
Counts per million (CPM)
• What is a “feature”?
• What about genomes with poor genome annotation?
• What about species with no sequenced genome?
For a detailed comparison of normalization methods, see:
Bullard et al. BMC Bioinformatics 11:94 (2010)
Robinson and Oshlack Genome Biol 11:R25 (2010)
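The four measures differ only in what they divide by. A minimal sketch, assuming per-feature read (or fragment) counts and feature lengths in base pairs are already in hand, and using the sum of counted reads as the "total mapped" denominator (a simplification; pipelines often use all mapped reads instead). FPKM uses the same formula as RPKM, applied to fragment counts from paired-end data.

```python
def cpm(counts):
    """Counts per million: scale by library size only."""
    total = sum(counts.values())
    return {f: c * 1e6 / total for f, c in counts.items()}


def rpkm(counts, lengths_bp):
    """Reads per kilobase of feature per million mapped reads (FPKM if counts are fragments)."""
    total = sum(counts.values())  # assumption: total = reads assigned to features
    return {f: c * 1e9 / (lengths_bp[f] * total) for f, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale rates to sum to 1e6."""
    rates = {f: c / (lengths_bp[f] / 1e3) for f, c in counts.items()}  # reads per kb
    scale = sum(rates.values())
    return {f: r * 1e6 / scale for f, r in rates.items()}
```

Because TPM rescales the length-normalized rates to sum to one million in every sample, TPM values behave like proportions and compare across samples more directly than RPKM values, whose per-sample total varies.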
Composite gene models
Map reads to genome
Map remaining reads to known splice junctions
• Requires good gene models
• Isoforms are ignored
Which gene annotation to use?
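One common way to build a composite gene model is to collapse the exons of all annotated isoforms into their union, so a read on any exon counts toward the gene once and the merged exonic length serves as the feature length for RPKM. A minimal sketch, assuming exons are given as half-open (start, end) coordinates on one chromosome and strand; the isoform structures in the example are made up.

```python
def composite_exons(isoforms):
    """Union of exon intervals over all isoforms of one gene.

    Each isoform is a list of half-open (start, end) exon coordinates
    on the same chromosome and strand.
    """
    exons = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:  # overlaps or abuts previous block
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def composite_length(isoforms):
    """Total exonic length of the composite model, usable as the RPKM feature length."""
    return sum(end - start for start, end in composite_exons(isoforms))
```

With isoforms `[(100, 200), (300, 400)]` and `[(100, 200), (350, 450), (600, 700)]`, the composite model is `[(100, 200), (300, 450), (600, 700)]`, 350 bp long; which exon belongs to which isoform is lost, as the slide notes.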
Splice-aware short read aligners
Martin and Wang Nat Rev Genet 12:671 (2011)
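Splice-aware aligners such as TopHat (part of the Tuxedo suite) report a read that spans an intron as a single alignment, encoding the intronic gap with the SAM CIGAR operation `N` (skipped region of the reference). The sketch below, assuming a read split into aligned blocks with no insertions, deletions, or clipping, shows how a block layout maps onto that CIGAR string; the coordinates are hypothetical.

```python
def blocks_to_cigar(blocks):
    """Build a CIGAR string from aligned blocks.

    blocks: (ref_start, length) pairs, sorted and non-overlapping,
    half-open reference coordinates; no indels or clipping assumed.
    """
    ops = []  # list of [length, op]
    prev_end = None
    for start, length in blocks:
        if prev_end is not None:
            gap = start - prev_end
            if gap < 0:
                raise ValueError("blocks must be sorted and non-overlapping")
            if gap > 0:
                ops.append([gap, "N"])  # skipped reference region = intron
        if ops and ops[-1][1] == "M":
            ops[-1][0] += length  # abutting blocks merge into one match run
        else:
            ops.append([length, "M"])
        prev_end = start + length
    return "".join(f"{n}{op}" for n, op in ops)
```

A 100-nt read split 50/50 across a 1,000-nt intron, `blocks_to_cigar([(1000, 50), (2050, 50)])`, gives `"50M1000N50M"`.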
The ‘Tuxedo’ suite
Trapnell et al. Nature Protocols 7:562 (2012)
Cufflinks: ab initio transcript assembly
Step 1: map reads to reference genome
Trapnell et al. Nat Biotechnol 28:511 (2010)
Cufflinks: ab initio transcript assembly
Isoform abundances estimated by maximum likelihood
Trapnell et al. Nat Biotechnol 28:511 (2010)
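Reads in exons shared by several isoforms cannot be assigned unambiguously, which is why abundances are estimated by maximum likelihood. The sketch below is a deliberately simplified expectation-maximization (EM) loop, not Cufflinks' actual model (which also uses the fragment-length distribution, among other terms). It assumes each read is represented only by the set of isoforms it is compatible with, and that a read arises from a given position in isoform t with probability θ_t / L_t, where θ_t is the fraction of reads from t and L_t its effective length. Isoform names and lengths in the example are hypothetical.

```python
def em_isoform_fractions(compat, lengths, n_iter=200):
    """Simplified EM for isoform read fractions.

    compat:  one set per read, naming the isoforms that read is compatible with.
    lengths: isoform -> effective length (same units for all isoforms).
    Returns theta: isoform -> estimated fraction of reads originating from it.
    """
    isoforms = sorted(lengths)
    theta = {t: 1.0 / len(isoforms) for t in isoforms}  # uniform start
    n_reads = len(compat)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in isoforms}
        # E-step: split each read among compatible isoforms in proportion to theta / length
        for read_set in compat:
            weights = {t: theta[t] / lengths[t] for t in read_set}
            z = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / z
        # M-step: new fractions are the expected read shares
        theta = {t: expected[t] / n_reads for t in isoforms}
    return theta
```

With 30 reads unique to isoform A, 10 unique to B, 40 shared, and equal lengths, the loop converges to θ_A = 0.75 and θ_B = 0.25: the shared reads are split in the same 3:1 ratio as the unique ones. Dividing θ by length and renormalizing converts read fractions into transcript fractions.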
Differential expression
Garber et al. Nat Methods 8:469 (2011)
Differential expression
Popular methods:
• edgeR
• DESeq
• Cuffdiff
Require count data
Assume negative binomial or Poisson distribution
Garber et al. Nat Methods 8:469 (2011)
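The negative binomial models a gene's count variance as μ + φμ², so it reduces to the Poisson (variance = μ) when the dispersion φ is 0; biological replicates typically show extra variance the Poisson cannot capture. A minimal method-of-moments sketch for one gene, assuming replicate counts already scaled to a common library size. edgeR and DESeq estimate dispersion far more robustly (for example, by sharing information across genes), so this illustrates the variance model only.

```python
from statistics import mean, variance


def nb_dispersion_mom(counts):
    """Method-of-moments negative binomial dispersion for one gene.

    Solves variance = mu + phi * mu**2 for phi using the replicate mean and
    sample variance. Needs >= 2 replicates; assumes counts share a library-size scale.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0  # no reads: dispersion not estimable, report 0
    var = variance(counts)  # sample variance (n - 1 denominator)
    # phi = 0 is the Poisson case; negative estimates are clamped to 0
    return max(0.0, (var - mu) / mu ** 2)
```

For replicates `[10, 20, 30, 40]` the estimate is about 0.227; for identical replicates `[5, 5, 5, 5]` the variance falls below the Poisson level and φ is clamped to 0.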
What depth of sequencing is required to characterize a transcriptome?
Wang et al. Nat Rev Genet 10:57 (2009)
Considerations
Gene length:
• Long genes are detected at lower sequencing depth than short genes
Expression level:
• Highly expressed genes are detected at lower depth than lowly expressed genes
Complexity of the transcriptome:
• Tissues with many cell types require more sequencing
Feature type:
• Composite gene models
• Common isoforms
• Rare isoforms
Detection vs. quantification:
• Obtaining confident expression level estimates (e.g., “stable” RPKMs) requires greater coverage than detection alone
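These effects can be explored with a simple saturation curve: draw reads in proportion to each gene's expression × length, subsample at increasing depths, and count genes that pass a detection threshold. The sketch assumes that proportional-sampling model, a hypothetical threshold of 5 reads, and made-up gene tables; it ignores mapping, isoforms, and biases such as 3’ bias.

```python
import random
from collections import Counter


def saturation_curve(expression, lengths, depths, min_reads=5, seed=0):
    """Fraction of genes with >= min_reads reads at each sequencing depth.

    Assumption: each read comes from gene g with probability proportional
    to expression[g] * lengths[g] (more or longer transcripts -> more fragments).
    """
    genes = list(expression)
    weights = [expression[g] * lengths[g] for g in genes]
    rng = random.Random(seed)  # fixed seed for a reproducible simulation
    reads = rng.choices(genes, weights=weights, k=max(depths))
    curve = []
    for depth in sorted(depths):
        counts = Counter(reads[:depth])  # prefixes give nested subsamples
        detected = sum(1 for g in genes if counts[g] >= min_reads)
        curve.append((depth, detected / len(genes)))
    return curve
```

Taking prefixes of one fixed read list keeps successive depths nested, so the curve never decreases; long or highly expressed genes cross the threshold at the shallowest depths.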
Applications of RNA-seq
Characterizing transcriptome complexity
• Alternative splicing
Differential expression analysis
• Gene- and isoform-level expression comparisons
Novel RNA species
• lincRNAs
• Pervasive transcription
Allele-specific expression
• Effect of genetic variation on gene expression
• Imprinting
RNA editing
• Novel events
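Allele-specific expression is often tested at heterozygous sites by comparing the reads carrying each allele against a 50:50 expectation. A minimal exact two-sided binomial test, assuming reads have already been counted per allele at one heterozygous SNP. In practice, reference mapping bias and overdispersion across sites must be handled; this sketch does neither.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial test of allele counts against p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k)
    return min(1.0, 2 * tail)
```

For 15 reference and 5 alternate reads, the p-value is 43400 / 2^20, about 0.041; a balanced 10:10 site gives 1.0.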
Alternative isoform regulation in human tissue transcriptomes
Wang et al. Nature 456:470 (2008)
Diversity of alternative splicing events in human tissues
Wang et al. Nature 456:470 (2008)
Novel RNA species: annotating lincRNAs
Guttman et al. Nat Biotechnol 28:503 (2010)
Small RNA sequencing
Rother and Meister Biochimie 93:1905 (2011)
Small RNA sequencing
microRNAs ~22 nt
piRNAs ~25-30 nt
Rother and Meister Biochimie 93:1905 (2011)
Small RNA sequencing: Illumina protocol
microRNAs ~22 nt
piRNAs ~25-30 nt
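Because small RNA inserts are shorter than the read length, each read runs into the 3’ adapter, which must be trimmed before insert lengths can be binned. A minimal sketch, assuming exact (mismatch-free) adapter matching, the Illumina TruSeq Small RNA 3’ adapter sequence shown (substitute the one used for your library), and hypothetical size windows of 20 to 24 nt for miRNA-sized and 25 to 30 nt for piRNA-sized inserts, based on the sizes above. Dedicated trimmers such as cutadapt tolerate sequencing errors, and length alone does not establish that a read is a miRNA or piRNA.

```python
# Assumption: Illumina TruSeq Small RNA 3' adapter; use your library's actual adapter.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=6):
    """Cut the read where the adapter begins: either the full adapter, or a
    >= min_overlap prefix of it running off the read's 3' end. Exact matches only."""
    for i in range(len(read)):
        suffix = read[i:]
        if len(suffix) < min_overlap:
            break  # remaining overlap too short to call reliably
        if suffix.startswith(adapter) or adapter.startswith(suffix):
            return read[:i]
    return read  # no adapter found


def classify_by_length(insert):
    """Hypothetical size windows based on ~22 nt miRNAs and ~25-30 nt piRNAs."""
    n = len(insert)
    if 20 <= n <= 24:
        return "miRNA-sized"
    if 25 <= n <= 30:
        return "piRNA-sized"
    return "other"
```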
Distinguishing functional small RNAs from noise
• Structural similarity to known small RNAs (e.g., miRDeep, miRCat)
• Binding to small RNA processing proteins
• Genetic requirements for processing
Friedländer et al. Nat Biotechnol 26:407 (2008)
Measuring translation by ribosome footprinting
Ingolia Nat Rev Genet 15:205 (2014)
Measuring translation by ribosome footprinting
Ingolia et al. Science 324:218 (2009)
Some lincRNAs are translated in mouse ES cells
Ingolia et al. Cell 147:789 (2011)
Detecting RNA-protein interactions: CLIP
Rother and Meister Biochimie 93:1905 (2011)
Enhancer-associated RNAs (eRNAs)
Ren B. Nature 465:173 (2010)
Enhancer-associated RNAs (eRNAs)
Kim et al. Nature 465:182 (2010)
How much of the genome is transcribed?
Estimates from ENCODE
Kellis et al. Proc. Natl. Acad. Sci. USA 111:6131 (2014)