A blind search for patterns

Unravelling low replicate data
ExSpec Pipeline
Data: Structure and variability
• Structure
  – Between 500 and 10,000+ features
  – Each feature has an associated ion count for each aligned sample
  – Data are not normally distributed
• Variability
  – Up to 30% technical variability
  – Each feature is affected differently
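Each feature is affected differently, so technical variability is best judged feature by feature. The sketch below assumes technical replicates are held in a hypothetical `{feature_id: [ion counts]}` dictionary. It computes each feature's coefficient of variation (CV: sample standard deviation divided by mean) and flags features above the 30% figure quoted above. The feature names and counts are invented for illustration.

```python
from statistics import mean, stdev


def feature_cv(intensities):
    """Coefficient of variation (sample SD / mean) of one feature's ion counts."""
    m = mean(intensities)
    if m == 0:
        return float("inf")  # never detected: CV undefined, so it is flagged
    return stdev(intensities) / m


def flag_noisy_features(table, threshold=0.30):
    """Return the IDs of features whose technical CV exceeds the threshold."""
    return sorted(fid for fid, counts in table.items() if feature_cv(counts) > threshold)


# Hypothetical technical replicates of one sample (invented ion counts per feature).
replicates = {
    "feat_001": [1000.0, 1050.0, 980.0],
    "feat_002": [200.0, 420.0, 150.0],
    "feat_003": [5000.0, 5100.0, 4900.0],
}

print(flag_noisy_features(replicates))
```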
Data: Structure and variability
The majority of detected features are singletons.
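The slide does not define "singleton". If it means a feature with a non-zero ion count in exactly one sample (an assumption), the proportion of singletons can be counted directly. The table layout and values below are hypothetical.

```python
def singleton_features(table):
    """Return IDs of features detected (non-zero ion count) in exactly one sample."""
    return [fid for fid, counts in table.items()
            if sum(1 for c in counts if c > 0) == 1]


# Hypothetical aligned data: one ion count per sample, 0 = not detected.
aligned = {
    "feat_001": [1200.0, 0.0, 0.0, 0.0],
    "feat_002": [300.0, 310.0, 290.0, 305.0],
    "feat_003": [0.0, 0.0, 75.0, 0.0],
    "feat_004": [0.0, 40.0, 52.0, 0.0],
}

singles = singleton_features(aligned)
print(singles, f"{len(singles) / len(aligned):.0%} of features are singletons")
```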
Low Replicate data
• “Suck it and see”
  – One-off projects
  – Pump-priming projects
• Medical samples
  – Biopsies
  – Difficult to access
• Ecological data
  – Resampling is difficult
Methods
• Fingerprinting
• PCA
• Basic scoring
• PDE model
• Gradient search
• Differential analysis
PCA
• Very simple
• Can be highly informative
  – Depends on the data
• Used in the pipeline
  – Data quality
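The slides do not say how PCA is computed in the pipeline. The following is a self-contained sketch under stated assumptions: the input is a samples-by-features list of ion counts, each count is log(1 + x) transformed (the data are not normally distributed), and each feature is mean-centred. The first two components are then found by power iteration with deflation. The function names and the six-sample matrix are hypothetical. In practice a numerical library's PCA or SVD routine would normally be used instead.

```python
import math
import random


def log_centre(matrix):
    """log(1 + x) each ion count, then mean-centre every feature (column)."""
    logged = [[math.log1p(x) for x in row] for row in matrix]
    n, p = len(logged), len(logged[0])
    means = [sum(row[j] for row in logged) / n for j in range(p)]
    return [[x - m for x, m in zip(row, means)] for row in logged]


def covariance(X):
    """p x p covariance matrix of a centred samples-by-features matrix."""
    n, p = len(X), len(X[0])
    return [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def top_eigenpair(C, iters=1000, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix, by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in C]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in C
        v = [x / norm for x in w]
    Cv = [sum(c * x for c, x in zip(row, v)) for row in C]
    return sum(a * b for a, b in zip(v, Cv)), v  # Rayleigh quotient, eigenvector


def pca(matrix, k=2):
    """Return (scores, components) for the first k principal components."""
    X = log_centre(matrix)
    C = covariance(X)
    components = []
    for _ in range(k):
        lam, v = top_eigenpair(C)
        components.append(v)
        # Deflate: subtract the found component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in X]
    return scores, components


# Hypothetical ion counts: 6 samples x 4 features, two invented groups of three.
samples = [
    [1000, 100, 500, 50], [1100, 120, 480, 55], [950, 90, 520, 45],
    [120, 900, 510, 60], [100, 1000, 490, 50], [130, 950, 500, 52],
]

scores, components = pca(samples, k=2)
for i, (pc1, pc2) in enumerate(scores):
    print(f"sample {i + 1}: PC1 = {pc1:+.2f}, PC2 = {pc2:+.2f}")
```

The sign of each component is arbitrary, so only the relative positions of samples along PC1 and PC2 are meaningful.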
PCA Analysis
Brno Project
• Samples:
  – Human biopsies
  – Replication: each biopsy cut into equal parts
PCA Analysis
• N group
  – Non-cancer biopsy
• T group
  – Cancer biopsy
Using PCA clustering, we are able to distinguish between healthy and sick patients.
PCA Analysis
PCA revealed profile similarities that correlated with biological evidence.
PCA Analysis
Human Urine project
• 22 patients sampled
• 11 healthy and 11 sick patients
• Sample labels dropped
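With the sample labels dropped, any grouping has to come from the data alone. The slides do not name the clustering step applied to the PCA scores. As one illustration, the sketch below uses plain two-cluster k-means, a swapped-in technique that is not necessarily the one used here. It groups hypothetical PC1/PC2 scores blind, and the hidden labels are compared only afterwards. The eight scores are invented and do not reproduce the 22-patient study.

```python
import math
from collections import Counter


def farthest_first(points, k):
    """Deterministic seeding: first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    return centroids


def kmeans(points, k=2, iters=100):
    """Lloyd's k-means; returns one cluster index per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical PC1/PC2 scores for eight samples; labels stay hidden during clustering.
pc_scores = [(-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1), (-1.9, 0.4),
             (2.0, -0.1), (1.7, 0.2), (2.3, -0.3), (1.9, 0.0)]
hidden = ["healthy"] * 4 + ["sick"] * 4

clusters = kmeans(pc_scores, k=2)
# Unblind only now: cross-tabulate cluster index against the hidden label.
print(Counter(zip(clusters, hidden)))
```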
PCA Analysis
Ecological Data
A large number of samples without clear replication.
PCA Analysis
Cluster pattern: find the features that hold the cluster pattern.
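One common way to find the features that hold a cluster pattern seen along a principal component is to rank them by the absolute size of their loading on that component. The sketch below assumes the PC1 loadings have already been computed. The feature IDs, loading values and `top_n` cut-off are hypothetical.

```python
def rank_by_loading(feature_ids, loadings, top_n=3):
    """Sort features by absolute loading on one principal component, largest first."""
    ranked = sorted(zip(feature_ids, loadings), key=lambda fl: abs(fl[1]), reverse=True)
    return ranked[:top_n]


# Hypothetical PC1 loadings (one per feature) from an earlier PCA step.
features = ["feat_001", "feat_002", "feat_003", "feat_004", "feat_005"]
pc1_loadings = [0.05, -0.71, 0.02, 0.69, -0.10]

for fid, w in rank_by_loading(features, pc1_loadings):
    # The sign indicates which end of PC1 (and so which cluster) the feature is higher in.
    side = "positive" if w > 0 else "negative"
    print(f"{fid}: loading {w:+.2f} ({side} end of PC1)")
```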
PCA Analysis
Using PCA and profile similarity analysis, a subset of features of interest was found.
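Profile similarity analysis can be sketched as follows. Each feature's abundance profile across samples is correlated with a reference profile, and features above a correlation cut-off are kept. Here the reference is a hypothetical 0/1 template marking the samples in one PCA cluster. Pearson correlation, the 0.9 threshold and the example data are assumptions for illustration, not the pipeline's documented settings.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: correlation undefined, treated as no similarity
    return sxy / math.sqrt(sxx * syy)


def similar_features(table, template, min_r=0.9):
    """Features whose abundance profile across samples correlates with the template."""
    r_values = {fid: pearson(profile, template) for fid, profile in table.items()}
    return {fid: r for fid, r in r_values.items() if r >= min_r}


# Hypothetical template: 1 = sample in the cluster of interest, 0 = not.
template = [1, 1, 1, 0, 0, 0]
profiles = {
    "feat_001": [900, 950, 880, 100, 120, 90],   # high in the cluster
    "feat_002": [100, 110, 95, 800, 820, 790],   # high outside the cluster
    "feat_003": [300, 310, 305, 295, 300, 302],  # roughly flat
}

print(similar_features(profiles, template))
```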
Basic Scoring
• Use Z-scores to sort the data
  – Use these to pull out important features
  – Control – Exp
• With a two-class problem, we can use PDE modelling.
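The Z-score scoring step can be read as follows, though this is one interpretation rather than a documented procedure. Each feature is standardised across all samples, then scored as the mean Z-score of the Control samples minus that of the Exp samples, and features are sorted by the size of that score. The column layout (Control columns first), the function name and the data are hypothetical.

```python
from statistics import mean, stdev


def zscore_features(table, n_control):
    """Score each feature as mean Z (Control) minus mean Z (Exp), largest |score| first.

    Each feature is standardised across all samples; the first n_control
    columns are assumed to be Control and the remainder Exp.
    """
    scores = {}
    for fid, counts in table.items():
        m, s = mean(counts), stdev(counts)
        if s == 0:
            scores[fid] = 0.0  # constant feature carries no signal
            continue
        z = [(c - m) / s for c in counts]
        scores[fid] = mean(z[:n_control]) - mean(z[n_control:])
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical ion counts: 3 Control samples followed by 3 Exp samples.
data = {
    "feat_001": [100, 110, 105, 400, 420, 390],  # higher in Exp
    "feat_002": [500, 510, 490, 505, 495, 500],  # unchanged
    "feat_003": [800, 820, 790, 200, 210, 190],  # higher in Control
}

for fid, score in zscore_features(data, n_control=3):
    print(f"{fid}: Control - Exp = {score:+.2f}")
```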
Basic Scoring: PDE modelling
Multi-class problem
Plants
• Wild type
• act ko mutant
Treatments
• Normal light
• High light
Gradient Analysis
• Use the rate of change of abundance to:
  – Mine data for specific trends
  – Find features of interest
• Use PDE modelling of rates
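The slides do not specify the axis of the gradient. The sketch below assumes abundances are measured at ordered time points under both a treatment and a control. The rate of change is then approximated by finite differences between consecutive points. Features are mined for a rapid increase that appears only under the treatment. The threshold, function names and values are hypothetical, and raw counts are used for simplicity.

```python
def rates(times, values):
    """Finite-difference rate of change between consecutive time points."""
    return [(v1 - v0) / (t1 - t0)
            for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])]


def rapid_risers(times, treated, control, min_rate):
    """Features whose peak rate under treatment reaches min_rate while the control's stays below it."""
    hits = []
    for fid in treated:
        peak_treated = max(rates(times, treated[fid]))
        peak_control = max(rates(times, control[fid]))
        if peak_treated >= min_rate and peak_control < min_rate:
            hits.append((fid, peak_treated))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical time course (hours) and ion counts per feature.
times = [0, 1, 2, 4]
treated = {
    "feat_001": [100, 110, 400, 420],  # sharp rise between 1 h and 2 h
    "feat_002": [100, 120, 140, 180],  # steady, slow rise
    "feat_003": [50, 300, 320, 330],   # sharp rise...
}
control = {
    "feat_001": [100, 105, 110, 115],
    "feat_002": [100, 118, 139, 175],
    "feat_003": [50, 260, 280, 290],   # ...that also happens without treatment
}

print(rapid_risers(times, treated, control, min_rate=100))
```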
Gradient Analysis
Mining for features that showed a rapid increase due to a specific treatment.
Data Provided by:
• Brno
  – Ted Hupp
  – Rob O’Neill
• Urine study
  – Steve Michell
  – John Mcgrath
• Ecological data
  – Dave Hodgson
  – Nicole Goody
• Gradient analysis
  – John Love
• Data scoring
  – Nicholas Smirnoff
  – Mike Page
Metabolomics and Proteomics Mass Spectrometry Facility @ The University of Exeter
Nick Smirnoff (Director of Mass Spectrometry)
Hannah Florance (MS Facility Manager)
Venura Perera (Bioinformatics and Mathematical Support)
http://biosciences.exeter.ac.uk/facilities/spectrometry/
http://bio-massspeclocal.ex.ac.uk/
About me
• Background
  – Applied Maths
  – Untargeted metabolite profiling
• Research interests
  – Data-driven modelling
  – Small molecule profiling
  – Gene regulatory network modelling
  – Application of mathematical methods
  – Metabolite identification using LC-MS/MS