Microarray Analysis Using R/Bioconductor
Reddy Gali, Ph.D.
[email protected]
http://catalyst.harvard.edu
Agenda
• Introduction to microarrays
• Workflow of a gene expression microarray experiment
• Publishing microarray data (MIAME format)
• Microarray experimental design
• Public microarray databases
• Microarray preprocessing - Quality control and diagnostic analysis
Agenda
• Introduction to R/Bioconductor
• Installation of R and Bioconductor packages
• General data analysis and strategies
• Data analysis using AffylmGUI
Microarray Applications
• Analyze and compare patterns of gene expression
- before and after an intervention
- between tissue types
- between transgenic strains
- in neighboring cells (laser capture microdissection)
• Find DNA copy-number variations
• SNP detection
• Tool for genotyping
• High-throughput screening tool for drug discovery
• Elucidate gene function (RNAi microarrays; Silva et al., PNAS 2004)
• Investigate interactions between DNA and protein (ChIP-on-chip)
Workflow of Gene Expression
Biological question
Experimental design
QC
Tissue / sample preparation
Extraction of Total RNA
QC
Probe amplification & labeling
QC
Microarray hybridization & processing
Image analysis
QC
Data analysis: expression measures, normalization, statistical filtering, clustering, pathway analysis
QC
Biological verification
Pitfalls of Microarray Experiment
• Gene expression changes detected by microarray analysis may fail to be validated by other methods when:
- The experimental design is inadequate
- Data quality is low
- The statistical approach is not adequate
- The expression level of the gene is below the detection limit
- The change in gene expression is small
- The microarray detection probe is not specific or not sensitive
Microarray Processing
Two color vs Single color
Homemade microarray (two color): tissue (normal, diseased) → total RNA → cDNA synthesis → Cy3- or Cy5-labeled cDNA → mixing → hybridization → raw data output → expression ratio
Affymetrix GeneChip (single color): tissue (normal, diseased) → total RNA → first-strand cDNA synthesis → double-stranded cDNA → in vitro transcription → biotin-labeled cRNA → hybridization and staining → raw data output → absolute expression values
Affymetrix probe design
PM (perfect match) and MM (mismatch) probes
11 probe pairs per probe set
Multiple probe sets per gene
Lipshutz et al., 1999, Nature Genetics 21(1):20-24
Questions usually asked
• What kind of technology or microarray should I use?
• How many replicates do I need?
• What is a real replicate?
• Do I need statistical advice?
• Should I do technical replicates?
• Should I do a dye swap?
• Should I pool my samples?
• How do I analyze my dataset?
• What software should I use?
Design of Microarray Experiment
• Replicates
• Goal, resources, technology, quality, design and analysis
• Two fold change – 3 replicates
• Smaller change – 5 replicates
• Technical replicates and Biological replicates
• Sample pooling
• Amount of sample
• Replicates of pooled sample
• No way to find variance between samples
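The replicate numbers above are rules of thumb. One way to explore them is a power calculation, sketched below in base R. The two-sample t-test, the standard deviation of 0.4 log2 units, the 0.05 significance level and the 0.8 power are all illustrative assumptions rather than values from the slides, and the function name replicates_needed is hypothetical. The result depends strongly on the assumed variability, so it will not necessarily reproduce the numbers quoted above.

```r
# Sketch: replicates per group needed to detect a given log2 fold change.
# Assumptions (hypothetical): two-sample t-test, sd = 0.4 log2 units,
# significance level 0.05, power 0.8. Real values depend on the platform
# and on the biological variability of the samples.
replicates_needed <- function(log2_fc, sd = 0.4, sig_level = 0.05, power = 0.8) {
  res <- power.t.test(delta = log2_fc, sd = sd,
                      sig.level = sig_level, power = power)
  ceiling(res$n)  # round up to whole arrays per group
}

n_twofold <- replicates_needed(log2(2))    # two-fold change
n_smaller <- replicates_needed(log2(1.5))  # 1.5-fold change
```

Smaller changes require more replicates per group under the same assumptions, which is the pattern the slide describes.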
MIAME - How to publish
Minimum Information About a Microarray Experiment (MIAME) - www.mged.org
MIAME – Check list
• Type of experiment: for example, is it a comparison of normal vs. diseased tissue, a time course, or is it designed to study the effects of a gene knock-out?
• Experimental factors: the parameters or conditions tested, such as time, dose, or genetic variation.
• The number of hybridizations performed in the experiment.
• The type of reference used for the hybridizations, if any.
• Hybridization design: if applicable, a description of the comparisons made in each hybridization, whether to a standard reference sample, or between experimental samples. An accompanying diagram or table may be useful.
• Quality control steps taken: for example, replicates or dye swaps.
MIAME – Check list
• The origin of the biological sample (for instance, name of the organism, the provider of the sample) and its characteristics: for example, gender, age, developmental stage, strain, or disease state.
• Manipulation of biological samples and protocols used: for example, growth conditions, treatments, separation techniques.
• Protocol for preparing the hybridization extract: for example, the RNA or DNA extraction and purification protocol.
• Labeling protocol(s)
• External controls (spikes)
MIAME – Check list
• Type of scanning hardware and software used: this information is appropriate for a materials and methods section.
• Type of image analysis software used: specifications should be stated in the materials and methods.
• A description of the measurements produced by the image-analysis software and a description of which measurements were used in the analysis.
• The complete output of the image analysis before data selection and transformation (spot quantitation matrices).
• Data selection and transformation procedures.
• Final gene expression data table(s) used by the authors to make their conclusions after data selection and transformation (gene expression data matrices).
Gene Expression Omnibus (GEO)
Public Microarray Databases
• BodyMap - http://bodymap.ims.u-tokyo.ac.jp/
• SMD - http://genome-www5.stanford.edu/
• RIKEN - http://read.gsc.riken.go.jp/
• MGI - http://www.informatics.jax.org/
• GEO - http://www.ncbi.nlm.nih.gov/geo/
• CIBEX - http://cibex.nig.ac.jp/index.jsp
• ArrayExpress - http://www.ebi.ac.uk/microarray-as/ae/
Microarray Platforms
• Agilent Microarrays 60-mer format
• Codelink Bioarrays 30-mer format
• Affymetrix GeneChips 25-mer format
• Illumina Beadchips
• NimbleGen 60-mer format
RNA quality
• OD 260/280 ratio: 1.8-2.0
• Electropherograms: degradation, rRNA peaks
• Bio-analyzer graphs
Microarray Data Mining
Biological question → Experimental design → Microarray experiment
Pre-processing: image analysis, expression quantification, normalization
Data analysis: estimation/testing, classification/prediction, clustering
Biological verification / interpretation
Microarray Data Mining
CDF / CEL files
Quality assessment
Background correction
Probe-level normalization
Probe set summary
Log ratios / log intensities
Identify genes, clustering, etc.
Microarrays – Image Inspection
Microarray:
- Visual inspection of the chip
  • Scratches, bubbles, uneven hybridization
  • Outlier detection
Diagnostic plots - RNA degradation
Box plots of unnormalized data
Raw vs normalized data (panels: raw data, normalized data)
Histograms of unnormalized data
QC stats
Why Normalize
• It adjusts the individual hybridization intensities to balance them appropriately so that meaningful biological comparisons can be made.
• Unequal quantities of starting RNA
• Differences in labeling or detection efficiencies between the fluorescent dyes used
• Systematic biases in the measured expression levels
• Sample preparation
• Variability in hybridization
• Spatial effects
• Scanner settings
• Experimenter bias
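To make the idea of balancing intensities concrete, here is a minimal base R sketch of quantile normalization, one common method for single-channel arrays (it is the normalization step used in RMA). The slide does not name a specific method, so the choice is an assumption. The function name quantile_normalize and the simulated data are hypothetical.

```r
# Sketch: quantile normalization of a probes x arrays intensity matrix.
# Each array is forced to share the same empirical distribution, namely
# the mean of the sorted columns. Ties are broken by order of appearance.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")   # rank within each array
  sorted <- apply(x, 2, sort)                          # sort each array
  target <- rowMeans(sorted)                           # reference distribution
  normalized <- apply(ranks, 2, function(r) target[r]) # map ranks to reference
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Simulated example: three arrays with different overall brightness
set.seed(1)
raw <- cbind(a1 = rexp(1000, 1 / 100),
             a2 = rexp(1000, 1 / 150),
             a3 = rexp(1000, 1 / 80))
norm <- quantile_normalize(raw)
```

After normalization every array has the same set of values, while the ranking of probes within each array is unchanged. A box plot of raw next to a box plot of norm shows the same kind of before and after contrast as the raw vs normalized slides.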
Data analysis workflow
Free Software – Data analysis
• Bioconductor
– is an open source and open development software project
to provide tools for the analysis and comprehension of
genomic data.
• TMEV 4.0
– is an application that allows the viewing of processed
microarray slide representations and the identification of
genes and expression patterns of interest.
• dCHIP
– DNA-Chip Analyzer (dChip) is a software package for
probe-level (e.g. Affymetrix platform) and high-level
analysis of gene expression microarrays and SNP
microarrays.
R / Bioconductor
• R and Bioconductor packages
• R (http://cran.r-project.org/) is a comprehensive statistical environment and programming language for professional data analysis and graphical display.
• Bioconductor (http://www.bioconductor.org/) is an open source and open development software project for the analysis of microarray, sequence and genome data.
• More than 300 Bioconductor packages.
• http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html
R / Bioconductor - Installation
OneChannelGUI
• A graphical interface (GUI) for Bioconductor libraries to
be used for quality control, normalization, filtering,
statistical validation and data mining for single channel
microarrays
• Affymetrix IVT, Human Gene 1.0 ST and exon arrays are
implemented
• OneChannelGUI is an add-on Bioconductor package
providing a new set of functions extending the capability
of the affylmGUI package.
Tcl and Tk packages
• ActiveTcl is ActiveState's distribution of Tcl. It is most commonly used for rapid prototyping, scripted applications and GUIs.
• Install Tcl - http://www.activestate.com/activetcl/
• Tcl/Tk packages: BWidget and Tktable
• Install in the C:\Tcl directory
Installing R / ActiveTcl
• http://cran.r-project.org/
• http://www.activestate.com/activetcl/
Installing AffylmGUI packages for Affymetrix data
• install.packages("affylmGUI", contriburl="http://bioinf.wehi.edu.au/affylmGUI")
• source("http://www.bioconductor.org/biocLite.R")
• biocLite("affylmGUI", dependencies=TRUE)
• biocLite("affylmGUI")
• biocLite("tkrplot")
• biocLite("affyPLM")
• biocLite("R2HTML")
• biocLite("xtable")
• library(affylmGUI)
AffylmGUI Browser
OneChannelGUI Installation
• source("http://www.bioconductor.org/biocLite.R")
• biocLite("oneChannelGUI")
• biocLite("oneChannelGUI", dependencies=TRUE)
• library(oneChannelGUI)
OneChannelGUI
Target File creation
• Create, with Excel, a tab-delimited file named targets.txt
• The targets file is made of three columns with the following header:
• Name, FileName, Target
• In column Name place a brief name (e.g. c1, c2, etc.)
• In column FileName place the name of the corresponding .CEL file
• In column Target place the experimental conditions (e.g. control, treatment, etc.)
• Place targets.txt and the CEL files into a folder (directory)
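The same file can also be produced from R rather than a spreadsheet. The sketch below uses base R only. The sample names, .CEL file names and conditions are hypothetical placeholders, and a temporary directory stands in for the folder that holds the CEL files.

```r
# Sketch: build targets.txt from R instead of a spreadsheet.
# The sample names, .CEL file names and conditions are hypothetical.
targets <- data.frame(
  Name     = c("c1", "c2", "t1", "t2"),
  FileName = c("control_1.CEL", "control_2.CEL",
               "treated_1.CEL", "treated_2.CEL"),
  Target   = c("control", "control", "treatment", "treatment"),
  stringsAsFactors = FALSE
)

# In practice, write into the folder that holds the .CEL files;
# a temporary directory is used here so the sketch runs anywhere.
targets_path <- file.path(tempdir(), "targets.txt")
write.table(targets, file = targets_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to check the header and the tab-delimited layout
check <- read.delim(targets_path, stringsAsFactors = FALSE)
```

Writing with quote = FALSE and row.names = FALSE keeps the file to exactly the three tab-separated columns described above.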
Target File
Working with OneChannelGUI
A. Click on “File” to start a new project
B. Click on “New” to start a new project
C. Select 3’ IVT arrays
D. Select the working directory that contains the .CEL files and the targets.txt file
Working with OneChannelGUI
Quality control
Normalization
Filtering
Statistical analysis
Biological knowledge extraction
Annotation
Quality Control plots
Click on the Quality Control menu
QC plots/reports
• Work with your data set
• Generate the various QC plots and identify which arrays are not of good quality
• Plot the RNA degradation plot
• Install the affyQCReport package and create a QC report for the dataset you are working with
> library(affyQCReport)
> QCReport(mydata, file="reddy.pdf")
Probe set summary
A. Click on the probe set menu and select the probe set summary and normalization option.
B. Normalization
Exercise 4
• Calculate probe set summaries
– With GCRMA and RMA
– Export and save the normalized values
Filtering - OneChannelGUI
Signal features:
Percentage of intensities greater than a user-defined value
Interquartile range (IQR) greater than a user-defined value
Annotation features:
• Specific gene features (e.g. GO term, presence of transcriptional regulatory elements in promoters, etc.)
• Using the Ingenuity pathway knowledge base
Filtering
• Perform an IQR filter at 0.25 followed by an intensity filter requiring 50% of the arrays to have an intensity over 100.
• Export the data as a tab-delimited file.
- Question: How many probe sets are left after the first and the second filter?
• Using transcription factors from Ingenuity, create a file containing only the Entrez Gene IDs, without a header, and use it to filter the data set. Save the data set.
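The two signal filters in this exercise can also be sketched in base R, outside the GUI. This is an illustration on simulated data, not OneChannelGUI's own implementation. It assumes that rows are probe sets, columns are arrays, values are log2 intensities, the IQR cutoff of 0.25 applies to the log2 values and the intensity cutoff of 100 refers to the unlogged scale. All object names are hypothetical.

```r
# Sketch: IQR filter followed by an intensity filter on simulated data.
# Assumptions (hypothetical): rows are probe sets, columns are arrays,
# values are log2 intensities, the IQR cutoff (0.25) is applied to log2
# values and the intensity cutoff (100) refers to the unlogged scale.
set.seed(42)
n_probesets <- 2000
n_arrays <- 6
expr <- matrix(rnorm(n_probesets * n_arrays, mean = 7, sd = 1.5),
               nrow = n_probesets,
               dimnames = list(paste0("ps", seq_len(n_probesets)),
                               paste0("array", seq_len(n_arrays))))

# Filter 1: keep probe sets whose IQR across arrays exceeds 0.25
iqr_values <- apply(expr, 1, IQR)
after_iqr <- expr[iqr_values > 0.25, , drop = FALSE]

# Filter 2: keep probe sets with intensity > 100 in at least 50% of arrays
frac_above <- rowMeans(after_iqr > log2(100))
after_intensity <- after_iqr[frac_above >= 0.5, , drop = FALSE]

# Probe sets remaining after each step, for this simulated data
remaining <- c(start = nrow(expr),
               after_iqr = nrow(after_iqr),
               after_intensity = nrow(after_intensity))

# Export as a tab-delimited file (temporary path so the sketch runs anywhere)
out_path <- file.path(tempdir(), "filtered_expression.txt")
write.table(after_intensity, file = out_path, sep = "\t",
            quote = FALSE, col.names = NA)
```

The remaining vector gives the count after each filter, which corresponds to the question asked in the exercise.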
Linear Modeling (Limma)
Differential Expression
Compute contrasts: builds differential expression
MA and volcano plots
Expression values
Table columns: Gene Description, Gene Symbol, Log2 FC, Average intensity, P-values, T statistics, Log-odds statistics, AffyID
Differential Expression
• Use the “Table of Genes Ranked in order of Differential Expression” to filter the genes and export the normalized expression values.
• Plot differentially expressed genes with raw p-value ≤ 0.05 and an absolute log2 fold change ≥ 1 for the two contrasts.
• Using “Venn Diagram between probe set lists”, evaluate the level of overlap between the two sets.
Hint: make two sets from the two contrasts
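The selection and overlap steps of this exercise can be sketched in base R as follows. The tables are simulated, and the column names (AffyID, logFC, P.Value) are assumed for illustration, modeled on the output of limma's topTable. The fold-change cutoff is interpreted on the log2 scale, which is also an assumption.

```r
# Sketch: select differentially expressed probe sets for two contrasts
# and measure their overlap. The tables are simulated; the column names
# (AffyID, logFC, P.Value) are assumed for illustration.
set.seed(7)
ids <- paste0("ps", 1:1000)
make_table <- function() {
  data.frame(AffyID = ids,
             logFC = rnorm(1000, sd = 1),
             P.Value = runif(1000),
             stringsAsFactors = FALSE)
}
contrast1 <- make_table()
contrast2 <- make_table()

# Keep probe sets with raw p-value <= 0.05 and |log2 FC| >= 1
select_de <- function(tab, p_cut = 0.05, lfc_cut = 1) {
  tab$AffyID[tab$P.Value <= p_cut & abs(tab$logFC) >= lfc_cut]
}
set1 <- select_de(contrast1)
set2 <- select_de(contrast2)

# Counts for the three regions of a two-set Venn diagram
venn_counts <- c(only_1 = length(setdiff(set1, set2)),
                 both   = length(intersect(set1, set2)),
                 only_2 = length(setdiff(set2, set1)))
```

The three counts in venn_counts are the numbers that would appear in the left, middle and right regions of the Venn diagram.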
Thank you
Reddy Gali, Ph.D.
[email protected]
Phone: 617 432 7471
http://catalyst.harvard.edu