Transcript Document
Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 6
David Wishart
A Typical Metabolomics Experiment
2 Routes to Metabolomics
[Figure: a spectrum (ppm axis) analyzed by two routes.
Quantitative (Targeted) Methods: spectrum with peaks labeled TMAO, hippurate, allantoin, creatinine, taurine, citrate, urea, 2-oxoglutarate, water, succinate, and fumarate.
Chemometric (Profiling) Methods: scores plot of PC1 vs. PC2 with groups labeled Control, ANIT, and PAP.]
Metabolomics Data Workflow
Chemometric Methods
• Data Integrity Check
• Spectral alignment or binning
• Data normalization
• Data QC/outlier removal
• Data reduction & analysis
• Compound ID
Targeted Methods
• Data Integrity Check
• Compound ID and quantification
• Data normalization
• Data QC/outlier removal
• Data reduction & analysis
Data Integrity/Quality
• LC-MS and GC-MS have a high number of false positive peaks
• Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc.
• Not usually a problem with NMR
• Check using replicates and adduct calculators
MZedDB http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html
HMDB http://www.hmdb.ca/search/spectra?type=ms_search
Data/Spectral Alignment
• Important for LC-MS
and GC-MS studies
• Not so important for
NMR (pH variation)
• Many programs available (XCMS, ChromA, MZmine)
• Most based on time
warping algorithms
http://mzmine.sourceforge.net/
http://bibiserv.techfak.uni-bielefeld.de/chroma
http://metlin.scripps.edu/download/
Binning (3000 pts to 14 bins)
[Figure: spectrum points (xi, yi) grouped into bins bin1, bin2, ..., bin8, ...; example bin: x = 232.1 (AOC), y = 10 (bin #)]
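The binning idea on this slide can be sketched in a few lines of Python. This is only a minimal illustration, assuming equal-width bins along the x axis and the summed intensity as the value of each bin; the function name `bin_spectrum` and the toy spectrum are hypothetical and are not taken from any tool named in these slides.

```python
def bin_spectrum(x, y, n_bins, x_min=None, x_max=None):
    """Sum intensities y into n_bins equal-width bins along x.

    Assumes every x value lies inside [x_min, x_max].
    """
    x_min = min(x) if x_min is None else x_min
    x_max = max(x) if x_max is None else x_max
    width = (x_max - x_min) / n_bins
    bins = [0.0] * n_bins
    for xi, yi in zip(x, y):
        # Clamp so the right-most point falls into the last bin.
        idx = min(int((xi - x_min) / width), n_bins - 1)
        bins[idx] += yi
    return bins

# Toy spectrum: 3000 points between 0 and 7 ppm, reduced to 14 bins.
n_points = 3000
x = [7.0 * i / (n_points - 1) for i in range(n_points)]
y = [1.0] * n_points  # flat intensity, for illustration only
binned = bin_spectrum(x, y, n_bins=14)
```

Each entry of `binned` plays the role of one bin's area, so 3000 (xi, yi) points become 14 values per sample.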
Data Normalization/Scaling
• Can scale to sample or scale
to feature
• Scaling to whole sample
controls for dilution
• Normalize to integrated
area, probabilistic quotient
method, internal standard,
sample specific (weight or
volume of sample)
• Choice depends on sample
& circumstances
Same or different?
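Two of the sample-wise options listed above, normalization to the integrated area and the probabilistic quotient method, can be sketched as follows. This is a hedged illustration rather than MetaboAnalyst's implementation: the function names are hypothetical, and the reference spectrum is assumed to be the per-feature median over all samples, which is only one possible choice.

```python
from statistics import median

def normalize_to_sum(sample, target=100.0):
    """Scale one sample (row) so its values add up to `target`."""
    total = sum(sample)
    return [target * v / total for v in sample]

def pqn(samples):
    """Probabilistic quotient normalization of a list of samples (rows)."""
    # Step 1: integral (sum) normalization of every sample.
    scaled = [normalize_to_sum(s) for s in samples]
    # Step 2: reference spectrum = per-feature median across samples
    # (an assumption; a median of control samples is another option).
    reference = [median(col) for col in zip(*scaled)]
    result = []
    for s in scaled:
        # Step 3: quotients to the reference, skipping zero reference values.
        quotients = [v / r for v, r in zip(s, reference) if r > 0]
        # Step 4: the median quotient is the estimated dilution factor.
        factor = median(quotients)
        result.append([v / factor for v in s])
    return result

# Toy data: the second sample is the first one diluted two-fold.
samples = [[10.0, 20.0, 30.0, 40.0],
           [5.0, 10.0, 15.0, 20.0],
           [12.0, 18.0, 33.0, 37.0]]
normalized = pqn(samples)
```

After normalization the first two rows coincide, which is the sense in which scaling to the whole sample controls for dilution.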
Data Normalization/Scaling
• Can scale to sample or scale
to feature
• Scaling to feature(s) helps
manage outliers
• Several feature scaling
options available: log
transformation, auto-scaling,
Pareto scaling, probabilistic
quotient, and range scaling
MetaboAnalyst http://www.metaboanalyst.ca
Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
Data QC, Outlier Removal & Data Reduction
• Data filtering (remove solvent peaks, noise
filtering, false positives, outlier removal -- needs
justification)
• Dimensional reduction or feature selection to
reduce number of features or factors to consider
(PCA or PLS-DA)
• Clustering to find similarity
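As a rough illustration of dimensional reduction, the sketch below computes the first principal component of a small samples-by-features table. It assumes mean-centering only and uses plain power iteration on the covariance matrix, which is a simple stand-in and not necessarily the algorithm used by any package mentioned here; the function name and the toy table are hypothetical.

```python
import math

def first_principal_component(data, n_iter=200):
    """Return (loadings, scores) for PC1 of a samples-by-features table."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (features x features).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeatedly multiply a vector by cov and renormalize.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: projection of each centered sample onto the loading vector.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores

# Toy table: two groups of three samples that differ in features 0 and 1.
data = [[1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.9, 1.9, 0.6],
        [3.0, 4.0, 0.5], [3.1, 4.2, 0.6], [2.9, 3.9, 0.4]]
loadings, scores = first_principal_component(data)
```

Here three features are summarized by a single score per sample, and the two groups fall on opposite sides of zero along PC1.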
MetaboAnalyst
• Web server designed to handle
large sets of LC-MS, GC-MS or
NMR-based metabolomic data
• Supports both univariate and
multivariate data processing,
including t-tests, ANOVA, PCA,
PLS-DA
• Identifies significantly altered
metabolites, produces colorful
plots, provides detailed
explanations & summaries
• Links significant metabolites to pathways
via SMPDB
http://www.metaboanalyst.ca
[Figure: MetaboAnalyst workflow diagram]
Input data
• GC/LC-MS raw spectra
• MS / NMR peak lists
• MS / NMR spectra bins
• Metabolite concentrations
Data processing
• Peak detection
• Retention time correction
• Baseline filtering
• Peak alignment
• Data integrity check
• Missing value imputation
Data normalization
• Row-wise normalization (4)
• Column-wise normalization (4)
Statistical analysis
• Univariate analysis
• Dimension reduction
• Feature selection
• Cluster analysis
• Classification
Enrichment analysis
• Over-representation analysis
• Single sample profiling
• Quantitative enrichment analysis
Pathway analysis
• Enrichment analysis
• Topology analysis
• Interactive visualization
Time-series / two factor
• Visualization
• Two-way ANOVA
• ASCA
• Temporal comparison
Resources & utilities
• Peak searching
• Pathway mapping
• Name conversion
• Lipidomics
• Metabolite set libraries
Downloads
• Processed data
• PDF report
• Images
MetaboAnalyst Overview
• Raw data processing
– Using MetaboAnalyst
• Data Reduction & Statistical analysis
– Using MetaboAnalyst
• Functional enrichment analysis
– Using MSEA in MetaboAnalyst
• Metabolic pathway analysis
– Using MetPA in MetaboAnalyst
Example Datasets
Metabolomic Data Processing
Common Tasks
• Purpose: to convert various raw data forms into
data matrices suitable for statistical analysis
• Supported data formats
– Concentration tables (Targeted Analysis)
– Peak lists (Untargeted)
– Spectral bins (Untargeted)
– Raw spectra (Untargeted)
Data Upload
Alternatively …
Data Set Selected
• Here we will be selecting a data set from dairy
cattle fed different proportions of cereal grains
(0%, 15%, 30%, 45%)
• The rumen was analyzed by NMR spectroscopy
using quantitative metabolomic techniques
• High-grain diets are thought to be stressful for
cows
Data Integrity Check
Data Normalization
Data Normalization
• At this point, the data has been transformed to a
matrix with the samples in rows and the variables
(compounds/peaks/bins) in columns
• MetaboAnalyst offers three types of
normalization: row-wise normalization, column-wise normalization, and combined normalization
• Row-wise normalization aims to make the
samples (rows) comparable to each other (e.g. urine
samples with different dilution effects)
Data Normalization
• Column-wise normalization aims to make each variable
(column) comparable to each other. This procedure is
useful when variables are of very different orders of
magnitude. Four methods have been implemented for
this purpose – log transformation, autoscaling, Pareto
scaling and range scaling
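The four column-wise methods named above can be written compactly. The sketch below uses the usual textbook definitions and applies them to a single variable (column); the function names are hypothetical, the choice of log base 2 is an assumption made for the example, and MetaboAnalyst's own implementation may differ in details such as the handling of zeros.

```python
import math
from statistics import mean, stdev

def log_transform(column):
    """Log2 transformation; assumes strictly positive values."""
    return [math.log2(v) for v in column]

def autoscale(column):
    """Mean-center and divide by the standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]

def pareto_scale(column):
    """Mean-center and divide by the square root of the standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / math.sqrt(s) for v in column]

def range_scale(column):
    """Mean-center and divide by the range (max minus min)."""
    m = mean(column)
    span = max(column) - min(column)
    return [(v - m) / span for v in column]

# One variable measured in four samples.
column = [2.0, 4.0, 8.0, 16.0]
logged = log_transform(column)
auto = autoscale(column)
pareto = pareto_scale(column)
ranged = range_scale(column)
```

Autoscaling gives every variable unit variance, while Pareto scaling divides by a smaller factor and so keeps more of the original differences in magnitude between variables.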
Normalization Result
Quality Control
• Dealing with outliers
– Detected mainly by visual inspection
– May be corrected by normalization
– May be excluded
• Noise reduction
– More of a concern for spectral bins / peak lists
– Usually improves downstream results
Visual Inspection
• What does an outlier look like?
Outlier Removal
Noise Reduction
Noise Reduction (cont.)
• Characteristics of noise
– Low intensities
– Low variances (default)
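A variance-based noise filter of the kind hinted at above might look like the following sketch. It assumes that a fixed fraction of the lowest-variance features is discarded; the function name, the `drop_fraction` parameter, and the toy table are hypothetical and do not reproduce MetaboAnalyst's actual defaults.

```python
from statistics import variance

def filter_low_variance(table, names, drop_fraction=0.25):
    """Drop the given fraction of features with the lowest variance."""
    columns = list(zip(*table))
    variances = [variance(col) for col in columns]
    n_drop = int(len(names) * drop_fraction)
    # Feature indices sorted from lowest to highest variance.
    order = sorted(range(len(names)), key=lambda j: variances[j])
    # Keep the remaining features in their original column order.
    keep = sorted(order[n_drop:])
    kept_names = [names[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_table, kept_names

# Toy table: samples in rows, spectral bins in columns.
table = [[1.0, 5.0, 0.10, 9.0],
         [2.0, 5.1, 0.11, 3.0],
         [3.0, 4.9, 0.10, 6.0],
         [4.0, 5.0, 0.09, 1.0]]
names = ["bin1", "bin2", "bin3", "bin4"]
filtered, kept = filter_low_variance(table, names, drop_fraction=0.5)
```

In this toy table the two nearly constant bins are removed and the two variable ones are kept.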
Data Reduction and Statistical Analysis
Common tasks
• To detect interesting patterns
• To identify important features
• To assess differences between the phenotypes
• Classification / prediction
ANOVA
View Individual Compounds
Questions
• Q: Which compounds show a significant difference
between all neighboring groups (0-15, 15-30,
and 30-45)?
• Q: For uracil, are groups 15, 30, and 45 significantly
different from each other?
Template Matching
• Looking for compounds showing interesting patterns of
change
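One simple way to look for a pattern of change is to correlate each compound's profile with a template. The sketch below assumes the match is scored with the Pearson correlation between a compound's group means and a user-defined pattern; the compound names, values, and template are hypothetical and only illustrate the idea.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical group means for the 0%, 15%, 30% and 45% grain diets.
group_means = {
    "compound_A": [1.0, 2.1, 2.9, 4.2],  # steadily increasing
    "compound_B": [4.0, 3.1, 2.0, 3.9],  # down, down, then up
    "compound_C": [2.0, 2.1, 1.9, 2.0],  # roughly flat
}
template = [1, 2, 3, 4]  # pattern of interest: linear increase

# Rank compounds from best to worst match with the template.
ranked = sorted(group_means,
                key=lambda name: pearson(group_means[name], template),
                reverse=True)
```

Changing `template` (for example to a pattern that falls and then rises) ranks the same compounds against a different trend.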
Template Matching (cont.)
Question
• Q: Which compounds decrease across the first three
groups but increase in the last group?
PCA Scores Plot
PCA Loading Plot
Question
Q: Identify the compounds that contribute most to the separation
between groups 15 and 45.
PLS-DA Score Plot
Determine # of Components
Important Compounds
Model Validation
Questions
• Q: What does p < 0.01 mean?
• Q: How many permutations need to be performed if you
want to claim a p-value < 0.0001?
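The logic of a permutation-based p-value can be sketched on a much simpler statistic than a PLS-DA model. The example below is an illustration under stated assumptions, not the validation procedure used by MetaboAnalyst: it permutes group labels and uses the absolute difference in group means as the test statistic, and the function name and data are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Empirical p-value for the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        stat = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        # Small tolerance guards against floating-point rounding.
        if stat >= observed - 1e-12:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the result can never be smaller than 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)

control = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
treated = [2.0, 2.3, 1.9, 2.1, 2.2, 2.4]
p = permutation_p_value(control, treated, n_perm=1000)
```

The p-value is the fraction of random relabellings that do at least as well as the real labels, so its resolution is tied directly to the number of permutations performed.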
Heatmap Visualization
Heatmap Visualization (cont.)
Question
Q: Identify compounds with a low concentration in groups 0 and 15
that increase in groups 30 and 45.
Q: Which compound is the only one significantly increased in
group 45?
Download Results
Analysis Report
Metabolite Set Enrichment Analysis
Metabolite Set Enrichment Analysis (MSEA)
• Web tool designed to handle lists
of metabolites (with or without
concentration data)
• Modeled after Gene Set
Enrichment Analysis (GSEA)
• Supports over-representation
analysis (ORA), single sample
profiling (SSP) and quantitative
enrichment analysis (QEA)
• Contains a library of 6300 predefined metabolite sets including
85 pathway sets & 850 disease
sets
http://www.msea.ca
Enrichment Analysis
• Purpose: To test if there are some biologically meaningful
groups of metabolites that are significantly enriched in
your data
• Biologically meaningful
– Pathways
– Disease
– Localization
• Currently, only supports human metabolomic data
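For a plain list of metabolite names, over-representation analysis is commonly framed as a hypergeometric test: given a reference universe, how surprising is the overlap between the submitted list and a metabolite set? The sketch below assumes that framing; the function name and the counts in the example are hypothetical and not taken from MSEA.

```python
from math import comb

def ora_p_value(n_universe, n_in_set, n_query, n_overlap):
    """P(X >= n_overlap) under the hypergeometric distribution.

    n_universe: metabolites in the reference universe
    n_in_set:   metabolites belonging to the metabolite set
    n_query:    metabolites in the submitted list
    n_overlap:  submitted metabolites that belong to the set
    """
    total = comb(n_universe, n_query)
    upper = min(n_in_set, n_query)
    tail = sum(comb(n_in_set, k) * comb(n_universe - n_in_set, n_query - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical counts: 5 of 30 submitted metabolites fall in a 20-member set
# drawn from a universe of 1000 metabolites.
p_enriched = ora_p_value(n_universe=1000, n_in_set=20, n_query=30, n_overlap=5)
```

With an expected overlap of well under one metabolite, an observed overlap of five gives a small p-value, which is what "significantly enriched" means in this setting.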
MSEA
• Accepts 3 kinds of input files
• 1) list of metabolite names only
• 2) list of metabolite names + concentration data from a
single sample
• 3) a concentration table with a list of metabolite names +
concentrations for multiple samples/patients
Start with a Compound List
Upload Compound List
Compound Name Standardization
Name Standardization (cont.)
Select a Metabolite Set Library
Result
Result (cont.)
The Matched Metabolite Set
Single Sample Profiling
Single Sample Profiling (cont.)
Concentration Comparison
Concentration Comparison (cont.)
Quantitative Enrichment Analysis
Data Set Selected
• Here we are using a collection of metabolites identified
by NMR (compound list + concentrations) in urine
from 77 lung and colon cancer patients, some of whom
were suffering from cachexia (muscle wasting)
Result
The Matched Metabolite Set
Question
• Q: Are these metabolites increased or decreased in the
cachexia group?
Metabolic Pathway Analysis with MetPA
Pathway Analysis
• Purpose: to extend and enhance metabolite set
enrichment analysis for pathways by
– Considering the pathway structures
– Supporting pathway visualization
• Currently supports 15 organisms
Data Upload
Data Set Selected
• Here we are using a collection of metabolites identified
by NMR (compound list + concentrations) in urine
from 77 lung and colon cancer patients, some of whom
were suffering from cachexia (muscle wasting)
Normalization
Pathway Libraries
Network Topology Analysis
Which Node is More Important?
High degree centrality
High betweenness centrality
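The two notions of node importance can be compared on a toy network. The sketch below assumes an unweighted, undirected graph and computes degree centrality as the number of neighbours and betweenness centrality with Brandes' algorithm; the graph itself is hypothetical, and MetPA's own pathway graphs and normalization may differ.

```python
from collections import deque

def degree_centrality(graph):
    """Number of neighbours of each node."""
    return {node: len(neigh) for node, neigh in graph.items()}

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}

# Toy network: two tightly connected clusters joined only through node "D".
graph = {
    "A": ["B", "C", "E", "D"],
    "B": ["A", "C", "E"],
    "C": ["A", "B", "E"],
    "E": ["A", "B", "C"],
    "D": ["A", "F"],
    "F": ["D", "G", "H", "I"],
    "G": ["F", "H", "I"],
    "H": ["F", "G", "I"],
    "I": ["F", "G", "H"],
}
degree = degree_centrality(graph)
betweenness = betweenness_centrality(graph)
```

In this network "A" and "F" have the most neighbours, but "D" has the highest betweenness because every shortest path between the two clusters runs through it.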
Pathway Visualization
Pathway Visualization (cont.)
Question
• Q: Which pathway do you think is likely to be affected the
most? Why?
Result
Not Everything Was Covered
• Clustering (K-means, SOM)
• Classification (SVM, randomForests)
• Time-series data analysis
• Two factor data analysis
• Peak searching
• ….