Indroductory_RNA-seq


Introductory RNA-seq Transcriptome Profiling
Before we start:
Align sequence reads to the reference genome
The most time-consuming part of the analysis is doing the
alignments of the reads (in Sanger fastq format) for all
replicates against the reference genome.
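For orientation, a Sanger-format FASTQ record has four lines (identifier, sequence, a "+" separator, and a quality string), and each quality character encodes a Phred score as its ASCII code minus 33. A minimal Python sketch of reading such records follows; the record shown is made up for illustration, and the parser assumes sequences are not wrapped across multiple lines.

```python
from io import StringIO

def read_fastq(handle):
    """Yield (name, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of input
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        # Sanger encoding: Phred score = ASCII code - 33
        scores = [ord(ch) - 33 for ch in qual]
        yield header[1:], seq, scores

# Hypothetical record, not taken from the WT or hy5 runs
example = StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records)
```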
RNA-seq in the Discovery Environment
Overview: This training module is designed to provide
hands-on experience in using RNA-Seq for transcriptome
profiling.
Questions:
How well is the annotated transcriptome represented in
RNA-seq data in Arabidopsis WT and hy5 genetic
backgrounds?
How can we compare gene expression levels in the two
samples?
Scientific Objective
LONG HYPOCOTYL 5 (HY5) is a basic leucine zipper
transcription factor (TF).
Mutations in the HY5 gene cause aberrant phenotypes in
Arabidopsis morphology, pigmentation and hormonal
response.
We will use RNA-seq to compare the transcriptomes of
seedlings from WT and hy5 genetic backgrounds to identify
HY5-regulated genes.
Samples
• Experimental data downloaded from the NCBI
Short Read Archive (GEO:GSM613465 and
GEO:GSM613466)
• Two replicates each of RNA-seq runs for wild-type (WT) and hy5 mutant seedlings.
Specific Objectives
By the end of this module, you should
1) Be more familiar with the DE user interface
2) Understand the starting data for RNA-seq analysis
3) Be able to align short sequence reads with a reference
genome in the DE
4) Be able to analyze differential gene expression in the DE
5) Be able to use DE text manipulation tools to explore the
gene expression data
Quick Summary
Pre-Configured: Getting the RNA-seq Data
Import SRA data from NCBI SRA
Extract FASTQ files from the downloaded SRA archives
Examining Data Quality with FastQC
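One of the summaries a quality check typically reports is how base quality changes along the read. The sketch below is a simplified stand-in for that idea, not FastQC itself: it averages Phred+33 scores at each read position. The function name and the quality strings are hypothetical.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position (reads may differ in length)."""
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First time this position is seen: extend the accumulators
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

# Hypothetical quality strings for three short reads
quals = ["II5", "I5!", "I5"]
means = mean_quality_per_position(quals)
print(means)
```

A drop in the mean toward the end of the reads is the usual signal that trimming may be worth considering before alignment.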
RNA-Seq Conceptual Overview
Image source: http://www.bgisequence.com
RNA-Seq Workflow Overview
Step 1: Align Reads to the Genome
Built-in ref. genomes
User provided ref. genomes
A single FASTQ file
Folder with >= 1 FASTQ files
Align the four FASTQ files to the Arabidopsis genome using TopHat
TopHat
• TopHat is one of many applications for aligning
short sequence reads to a reference genome.
• It uses the Bowtie aligner internally.
• Alternatives include BWA, MAQ, OLego,
Stampy, and Novoalign.
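Outside the DE, the same alignment step could be run from the command line. The sketch below only builds the TopHat command for each sample and prints it; it does not run anything. The index name, FASTQ file names, and output folders are hypothetical, and the options used (-o for the output directory, -p for threads) should be checked against the installed TopHat version.

```python
import shlex

def tophat_command(bowtie_index, fastq_file, output_dir, threads=1):
    """Build a TopHat argument list for one single-end FASTQ file."""
    return [
        "tophat",
        "-o", output_dir,     # where the alignment output is written
        "-p", str(threads),   # number of alignment threads
        bowtie_index,         # basename of the Bowtie index (hypothetical here)
        fastq_file,
    ]

# Hypothetical sample names matching the four runs in this module
samples = ["WT-1", "WT-2", "hy5-1", "hy5-2"]
commands = [
    tophat_command("arabidopsis_index", f"{s}.fastq", f"{s}_tophat_out", threads=4)
    for s in samples
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each sample is aligned in its own run so that the replicates stay separate for the later differential expression step.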
RNA-seq Sample Read Statistics
• Genome alignments from TopHat were saved as BAM
files, the binary version of SAM
(samtools.sourceforge.net/).
• Reads retained by TopHat are shown below
Sequence run   Reads        Seq. (Mbase)
WT-1           10,866,702   445.5
WT-2           10,276,268   421.3
hy5-1          13,410,011   549.8
hy5-2          12,471,462   511.3
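As a quick consistency check on these statistics, dividing each megabase figure by its read count gives the mean length of the retained reads. The pairing of read counts with megabase values below is an assumption about how the table rows line up, and 1 Mbase is taken as one million bases.

```python
# (reads, megabases) pairs; the pairing is assumed, see the note above
stats = [
    (10_866_702, 445.5),
    (10_276_268, 421.3),
    (13_410_011, 549.8),
    (12_471_462, 511.3),
]

# Mean read length = total bases / number of reads
mean_lengths = [mbase * 1_000_000 / reads for reads, mbase in stats]
for (reads, mbase), length in zip(stats, mean_lengths):
    print(f"{reads:>12,} reads  {mbase:>6} Mbase  mean length {length:.1f}")
```

Under that pairing, every run works out to the same mean length, which is what one would expect if all four libraries were sequenced with the same read length.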
Prepare BAM files for viewing
Index BAM files using SAMtools
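On the command line, indexing is done with "samtools index", which expects a coordinate-sorted BAM file and writes a .bai index next to it. The sketch below builds that command for each alignment and runs it only when asked and when samtools is installed. The folder names are hypothetical; accepted_hits.bam is the name TopHat normally gives its alignment output.

```python
import shutil
import subprocess

def index_bam(bam_path, run=False):
    """Return the `samtools index` command; execute it only if run=True and samtools exists."""
    cmd = ["samtools", "index", bam_path]
    if run and shutil.which("samtools"):
        subprocess.run(cmd, check=True)
    return cmd

# Hypothetical output folders, one per sample
bam_files = [
    f"{sample}_tophat_out/accepted_hits.bam"
    for sample in ("WT-1", "WT-2", "hy5-1", "hy5-2")
]
commands = [index_bam(path) for path in bam_files]
for cmd in commands:
    print(" ".join(cmd))
```

Viewers such as IGV need the index file alongside the BAM in order to jump to a region without reading the whole file.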
Using IGV in Atmosphere
1. We already launched an instance of NGS Viewers in Atmosphere
2. Use a VNC client to connect to your remote desktop
Pre-configured VM for NGS Viewers
Integrative Genomics Viewer (IGV)
The Integrative Genomics Viewer (IGV) is a high-performance
visualization tool for interactive exploration of large, integrated
genomic datasets. It supports a wide variety of data types,
including array-based and next-generation sequence data,
and genomic annotations.
http://www.broadinstitute.org/igv/
Use IGV to inspect outputs from TopHat
ATG44120 (12S seed storage protein) is significantly down-regulated in the hy5 mutant
background (> 9-fold, p = 0). Compare it to the gene on the right, which shows no differential expression.
CuffDiff
• CuffLinks is a program that assembles aligned RNA-Seq
reads into transcripts, estimates their abundances, and
tests for differential expression and regulation
transcriptome-wide.
• CuffDiff is a program in the CuffLinks package that compares
transcript abundance between samples.
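To show how replicates are grouped for the comparison, here is a sketch that builds a CuffDiff command line without running it. Replicates of one condition are joined with commas, and conditions are given as separate arguments. The annotation file name, BAM paths, and output folder are hypothetical; -o (output directory) and -L (condition labels) should be checked against the installed version.

```python
def cuffdiff_command(annotation_gtf, conditions, output_dir):
    """Build a CuffDiff argument list.

    conditions maps a label to its list of replicate BAM files;
    dict insertion order sets the order of labels and samples.
    """
    labels = ",".join(conditions)
    cmd = ["cuffdiff", "-o", output_dir, "-L", labels, annotation_gtf]
    # One comma-separated argument per condition, replicates inside it
    cmd += [",".join(bams) for bams in conditions.values()]
    return cmd

# Hypothetical file layout for the two genetic backgrounds
conditions = {
    "WT": ["WT-1/accepted_hits.bam", "WT-2/accepted_hits.bam"],
    "hy5": ["hy5-1/accepted_hits.bam", "hy5-2/accepted_hits.bam"],
}
cmd = cuffdiff_command("arabidopsis_genes.gtf", conditions, "cuffdiff_out")
print(" ".join(cmd))
```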
Examining Differential Gene Expression
Examining the Gene Expression Data
Differentially expressed genes
Filter CuffDiff results for up- or down-regulated
gene expression in hy5 seedlings
Example filtered CuffDiff results, generated with the Filter_CuffDiff_Results tool to:
1) Select genes with minimum two-fold expression difference
2) Select genes with significant differential expression (q <= 0.05)
3) Add gene descriptions
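The three filtering steps can be sketched in a few lines of Python. This is an illustration of the logic, not the code behind Filter_CuffDiff_Results. The column names follow CuffDiff's gene-level output (reduced here to the columns used), while the gene IDs, values, and descriptions are invented. A two-fold difference corresponds to an absolute log2 fold change of at least 1.

```python
import csv
from io import StringIO

# Hypothetical excerpt of a CuffDiff gene expression table (tab-separated)
header = ["gene_id", "value_1", "value_2", "log2(fold_change)", "q_value"]
rows = [
    ["GENE_A", "100", "10", "-3.32", "0.001"],
    ["GENE_B", "10", "12", "0.26", "0.8"],
    ["GENE_C", "5", "25", "2.32", "0.2"],
    ["GENE_D", "5", "40", "3.0", "0.01"],
]
table = "\n".join("\t".join(r) for r in [header] + rows)

# Hypothetical gene descriptions to attach in step 3
descriptions = {"GENE_A": "example description A", "GENE_D": "example description D"}

def filter_diff(handle, min_abs_log2fc=1.0, max_q=0.05):
    """Keep genes with at least a two-fold change and q <= max_q, then add descriptions."""
    selected = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        log2fc = float(rec["log2(fold_change)"])
        if abs(log2fc) >= min_abs_log2fc and float(rec["q_value"]) <= max_q:
            # Positive log2 fold change means higher expression in the second sample
            direction = "up" if log2fc > 0 else "down"
            gene = rec["gene_id"]
            selected.append((gene, direction, descriptions.get(gene, "")))
    return selected

hits = filter_diff(StringIO(table))
print(hits)
```

GENE_C is dropped despite its large fold change because its q-value is above the cutoff, which is why both criteria are applied together.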
Coming Soon: Downstream Analysis with cummeRbund