BioConductor R for Microarray Analysis
Claudio Lottaz
Computational Diagnostics Group
Computational Molecular Biology Department
Max Planck Institute for Molecular Genetics
Overview
17-Jul-15
• Introduction
• File Formats
• Data structures
• Analysis methods
• Summary
Claudio Lottaz: BioConductor - R for Microarray Analysis
Introduction
The R-Project
• S/S-PLUS: commercial statistics software package
  • origins in the academic community
  • now commercialized
  • serious effort in graphical user interface and the like
• R: free, open-source statistics software package (GNU GPL)
  • based on the public roots of S
  • still largely compatible with the S language
  • command-line user interface
BioConductor
• R is extensible through packages
  • packages may be written in the R language
  • programming interface to C available
• BioConductor is a collection of packages
  • various contributors
  • various methods on different types of data
  • heterogeneous usage
File Formats
• Importing two-color (red/green) experiment data
  • intensities from image processing output: .spot or .gpr files (Spot or GenePix packages)
  • textual information on probes and targets: .gal and .gdl files generated by GenePix
• Importing Affymetrix data
  • reads Affymetrix CEL files
  • needs copyright-protected CDF files for interpretation
• Exporting tab-delimited ASCII files
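To make the export format concrete, here is a minimal sketch of a tab-delimited expression table: one header row with the chip names, then one row per gene. It is written in Python rather than R, and the function name, column label "gene" and the sample values are illustrative assumptions, not part of BioConductor.

```python
import csv
import io

def write_expression_table(gene_names, sample_names, values, handle):
    """Write a genes x samples matrix as tab-delimited ASCII text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row: a label for the gene column, then one column per chip.
    writer.writerow(["gene"] + list(sample_names))
    for gene, row in zip(gene_names, values):
        writer.writerow([gene] + [f"{v:.3f}" for v in row])

# Hypothetical data: two genes measured on two chips.
buffer = io.StringIO()
write_expression_table(["geneA", "geneB"], ["chip1", "chip2"],
                       [[7.25, 7.5], [3.0, 9.125]], buffer)
exported = buffer.getvalue()
print(exported)
```

Writing to a file instead only requires passing a handle opened with open(path, "w", newline="").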
Data Structures
Red/Green Specific Data Structures
• marrayLayout objects: contain information on
  • Probes and their locations
  • House-keeping genes
• marrayRaw objects: intensities for a batch of arrays
  • red/green, foreground/background
  • information on applied targets
• marrayNorm objects: post-normalization data
  • Average log intensities, normalized log ratios
  • Normalization factors
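The step from raw intensities to the quantities held after normalization can be sketched as follows. This is a Python illustration, not the marray implementation: it assumes simple background subtraction, the usual definitions M = log2(R) - log2(G) for the log ratio and A = (log2(R) + log2(G)) / 2 for the average log intensity, and it marks spots with non-positive corrected intensity as missing. The function name and sample values are hypothetical.

```python
import math

def ma_values(red_fg, red_bg, green_fg, green_bg):
    """Return (M, A) lists for one array from foreground/background intensities."""
    m_values, a_values = [], []
    for rf, rb, gf, gb in zip(red_fg, red_bg, green_fg, green_bg):
        red = rf - rb      # background-corrected red intensity
        green = gf - gb    # background-corrected green intensity
        if red <= 0 or green <= 0:
            # The logarithm is undefined here; treat the spot as missing.
            m_values.append(None)
            a_values.append(None)
            continue
        log_r, log_g = math.log2(red), math.log2(green)
        m_values.append(log_r - log_g)        # log ratio
        a_values.append((log_r + log_g) / 2)  # average log intensity
    return m_values, a_values

# Three hypothetical spots; the third has background above foreground in red.
m, a = ma_values([1124, 300, 90], [100, 44, 100],
                 [356, 300, 500], [100, 44, 100])
print(m, a)
```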
Affymetrix Specific Data Structures
• Cdf objects: chip description
• Cel objects: contain probe data of one chip
• Cel.container object: a set of Cel objects
• PPSet object: all probes for a particular target
• PPSet.container object: a set of PPSet objects
• For convenience: Plobs (probe level objects)
  • contain a Cdf and a Cel.container object
  • Simple use, less flexible access
Common Data Structures
• exprSet objects: hold expression data
  • matrix of expression data and standard errors
  • link to phenotype data and gene annotations
  • geneNames to identify the genes
• phenoData objects: hold phenotype/patient data
  • list of variables for each phenotype
  • matrix of data: row per case, column per variable
• Some packages use their own data structures
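The relationship between the two common structures can be sketched with a small Python analogue. This is not the R class definition: the class and field names below only mirror the slide (expression matrix, standard errors, gene names, linked phenotype data), and subset_cases is a hypothetical helper that shows why keeping expression and phenotype data in one object is convenient.

```python
from dataclasses import dataclass

@dataclass
class PhenoData:
    var_labels: dict   # variable name -> description
    cases: list        # one dict per case (row), keyed by variable name

@dataclass
class ExprSet:
    gene_names: list   # one name per row of exprs
    exprs: list        # rows = genes, columns = cases
    se_exprs: list     # standard errors, same shape as exprs
    pheno: PhenoData

    def __post_init__(self):
        # Check that both matrices agree with the gene names and the cases.
        n_cases = len(self.pheno.cases)
        for matrix in (self.exprs, self.se_exprs):
            if len(matrix) != len(self.gene_names):
                raise ValueError("one row per gene expected")
            if any(len(row) != n_cases for row in matrix):
                raise ValueError("one column per case expected")

    def subset_cases(self, keep):
        """Return a new ExprSet with only the cases for which keep(case) is true."""
        idx = [i for i, case in enumerate(self.pheno.cases) if keep(case)]
        pick = lambda matrix: [[row[i] for i in idx] for row in matrix]
        pheno = PhenoData(self.pheno.var_labels,
                          [self.pheno.cases[i] for i in idx])
        return ExprSet(self.gene_names, pick(self.exprs),
                       pick(self.se_exprs), pheno)

# Hypothetical data: two genes, three cases.
pheno = PhenoData({"group": "tumor or normal"},
                  [{"group": "tumor"}, {"group": "normal"}, {"group": "tumor"}])
eset = ExprSet(["geneA", "geneB"],
               [[7.1, 6.8, 7.4], [3.2, 5.9, 3.0]],
               [[0.1, 0.2, 0.1], [0.3, 0.2, 0.3]],
               pheno)
tumors = eset.subset_cases(lambda case: case["group"] == "tumor")
print(tumors.exprs)
```

Subsetting by phenotype keeps the expression columns and the phenotype rows aligned, which is the main point of linking the two structures.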
Utilities
• Utilities for resampling
• Aggregators
  • e.g. accumulate results in a cross-validation
• Summary statistics
• Convenient methods for graphical output
  • histograms, scatter plots, gene location, boxplots...
  • on various subsets of data
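The aggregator idea can be illustrated with a short Python sketch: in each cross-validation round a gene is selected on the training cases, and an aggregator accumulates how often each gene was chosen. Everything here is an assumption made for illustration (the fold assignment, the selection rule based on the difference in group means, the class name SelectionAggregator); it is not the BioConductor interface.

```python
from collections import Counter

def k_fold_indices(n_cases, k):
    """Split case indices 0..n_cases-1 into k interleaved folds."""
    return [list(range(start, n_cases, k)) for start in range(k)]

def top_gene(exprs, labels, train):
    """Pick the gene with the largest absolute difference in group means."""
    def mean(values):
        return sum(values) / len(values)
    best_gene, best_diff = None, -1.0
    for gene, values in exprs.items():
        group0 = [values[i] for i in train if labels[i] == 0]
        group1 = [values[i] for i in train if labels[i] == 1]
        diff = abs(mean(group1) - mean(group0))
        if diff > best_diff:
            best_gene, best_diff = gene, diff
    return best_gene

class SelectionAggregator:
    """Accumulates which genes were selected across resampling rounds."""
    def __init__(self):
        self.counts = Counter()
        self.rounds = 0

    def add(self, selected_genes):
        self.counts.update(selected_genes)
        self.rounds += 1

    def frequency(self, gene):
        return self.counts[gene] / self.rounds

# Hypothetical data: geneA separates the two groups, geneB does not.
exprs = {"geneA": [1, 1, 1, 5, 5, 5], "geneB": [2, 3, 2, 3, 2, 3]}
labels = [0, 0, 0, 1, 1, 1]

aggregator = SelectionAggregator()
for test_fold in k_fold_indices(len(labels), 3):
    train = [i for i in range(len(labels)) if i not in test_fold]
    aggregator.add([top_gene(exprs, labels, train)])

print(aggregator.frequency("geneA"), aggregator.frequency("geneB"))
```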
Analysis Methods
Red/Green Specific Analysis
• Diagnostic plots to find printing, hybridization or scanning artifacts
  • boxplots, scatter plots and spatial images
  • Foreground, background, log-ratio...
• Normalization (Yang et al. 2001, 2002)
  • location normalization: locally weighted regression, intensity dependent or 2D spatial
  • Scale normalization: median absolute deviation (MAD)
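Scale normalization by MAD can be sketched as below. The sketch is in Python and rests on stated assumptions: the log ratios of each array are already location-normalized (centered near zero), every array has a non-zero MAD, and the scale factor of an array is its MAD divided by the geometric mean of the MADs of all arrays. The function names are hypothetical and the code is not taken from the marray package.

```python
import math
import statistics

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def scale_normalize(log_ratios_by_array):
    """Rescale the log ratios of each array so that all arrays share one MAD."""
    mads = [mad(m) for m in log_ratios_by_array]
    # Geometric mean of the MADs serves as the common reference scale.
    geo_mean = math.exp(sum(math.log(s) for s in mads) / len(mads))
    factors = [s / geo_mean for s in mads]
    normalized = [[v / f for v in m]
                  for m, f in zip(log_ratios_by_array, factors)]
    return normalized, factors

# Two hypothetical arrays with the same center but different spread.
normalized, factors = scale_normalize([[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]])
print(factors)
print(normalized)
```

After rescaling, both arrays have the same MAD, so their log ratios become comparable in spread.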
Affymetrix Specific Analysis
• Exploring probe level data (package affy)
  • probe names, perfect match/mismatch intensities,...
• Normalization (on probe data)
  • MVA plots for Affymetrix data
  • Various methods, default is quantile normalization
• Determining expression levels
  • Various methods: Affymetrix (1999), Li & Wong (2001), Irizarry (2002)
  • Standard errors are determined per expression value
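Quantile normalization, named above as the default, forces the intensity distributions of all chips to be identical. A minimal Python sketch of the idea follows: sort each chip, average the sorted values across chips rank by rank, then give each probe the average that belongs to its rank. It assumes chips of equal length and breaks ties by position, which is a simplification; it is not the affy implementation.

```python
def quantile_normalize(chips):
    """chips: list of chips, each a list of probe intensities of equal length."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the i-th smallest value over all chips.
    reference = [sum(chip[i] for chip in sorted_chips) / len(chips)
                 for i in range(n)]
    normalized = []
    for chip in chips:
        # Probe indices ordered by increasing intensity on this chip.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical chips with three probes each.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 7]])
print(normalized)
```

Each normalized chip contains exactly the same set of values, only assigned to different probes according to the original ranks.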
Common Analysis
• Gene filtering, e.g.
  • find highly expressed genes
  • find differentially expressed genes (also for more than 2 groups)
• Find genes with expression patterns similar to a given gene of interest
• Receiver operating characteristic (ROC)
• Annotation: chromosome location, gene ontology
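One common way to use ROC analysis for a single gene is to ask how well its expression value separates two groups, summarized by the area under the ROC curve (AUC). The Python sketch below computes the AUC as the fraction of (case, control) pairs in which the case has the higher expression, counting ties as one half. The function name and data are hypothetical, and this is only one possible reading of the ROC bullet above.

```python
def roc_auc(expression, labels):
    """AUC for separating label 1 from label 0 by one gene's expression."""
    pos = [x for x, y in zip(expression, labels) if y == 1]
    neg = [x for x, y in zip(expression, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical genes measured on two controls (0) and two cases (1).
labels = [0, 0, 1, 1]
auc_perfect = roc_auc([1.0, 2.0, 3.0, 4.0], labels)
auc_partial = roc_auc([1.0, 3.0, 2.0, 4.0], labels)
print(auc_perfect, auc_partial)
```

An AUC of 1.0 means perfect separation and 0.5 means the gene carries no information about the grouping.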
Common Analysis (continued)
• Expression density diagnostics
  • gene-wise comparison of distributional shapes to find differences between groups
• Multiple hypothesis testing
  • family-wise error rates, false discovery rate
  • minP and maxT procedures, step-up procedures
  • based on various statistics (t-, F-, Wilcoxon...)
  • adjusted p-values for genes declared differentially expressed, obtained through permutation
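As an illustration of adjusted p-values from a step-up procedure, the Python sketch below implements the Benjamini-Hochberg adjustment, which controls the false discovery rate. It is chosen here because it is short; it works on raw p-values and is not the permutation-based minP or maxT procedure mentioned above. The function name and input values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Step-up: walk from the largest p-value down to the smallest.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = n - k   # rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Raw p-values for four hypothetical genes.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.5])
print(adjusted)
```

A gene is declared differentially expressed at FDR level q when its adjusted p-value is at most q.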
Summary
• Free software, reproducible methods
• Open source, references to publications
• Sophisticated methods available
• Rather specific input formats needed, license problem on Affymetrix chip description files
• Some heterogeneity in implementation
• Blurry definition of the R language