A blind search for patterns
Unravelling low replicate data
ExSpec Pipeline
Data: Structure and variability
Structure
Between 500-10,000+ features
Each feature has an associated ion
count for each aligned sample.
Data is not normally distributed.
Variability
Up to 30% technical variability
Each feature is affected differently
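One way to see how unevenly technical variability hits individual features is the coefficient of variation (CV) across technical replicates. The following is a minimal sketch only: the feature names, the ion counts, and the choice of CV as the variability measure are illustrative assumptions, not taken from the ExSpec pipeline.

```python
from statistics import mean, stdev

def coefficient_of_variation(counts):
    """Return the sample standard deviation divided by the mean."""
    m = mean(counts)
    if m == 0:
        return float("nan")  # CV is undefined when the mean is zero
    return stdev(counts) / m

# Hypothetical ion counts for three features over four technical replicates.
replicates = {
    "feature_A": [1000.0, 1020.0, 980.0, 1000.0],
    "feature_B": [200.0, 260.0, 150.0, 190.0],
    "feature_C": [50.0, 50.0, 50.0, 50.0],
}

cv_by_feature = {
    name: coefficient_of_variation(counts) for name, counts in replicates.items()
}

# List features from most to least variable.
for name, cv in sorted(cv_by_feature.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: CV = {cv:.1%}")
```

Sorting by CV makes it easy to spot features whose technical noise approaches the 30% figure quoted above.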
Data: Structure and variability
The majority of features
that are detected are
singletons.
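A quick way to check how many features are singletons in a given data set is sketched below. It assumes that a singleton is a feature with a non-zero ion count in exactly one sample; that definition, the threshold parameter, and the example table are assumptions made for illustration.

```python
def count_detections(ion_counts, threshold=0.0):
    """Count the samples in which a feature's ion count exceeds the threshold."""
    return sum(1 for c in ion_counts if c > threshold)

def split_singletons(table, threshold=0.0):
    """Separate features seen in exactly one sample from those seen in several."""
    singletons, shared = [], []
    for feature, counts in table.items():
        n = count_detections(counts, threshold)
        if n == 1:
            singletons.append(feature)
        elif n > 1:
            shared.append(feature)
    return singletons, shared

# Hypothetical aligned table: one list of ion counts per feature, one entry per sample.
table = {
    "f1": [0, 0, 530, 0],
    "f2": [120, 140, 110, 135],
    "f3": [0, 75, 0, 0],
    "f4": [0, 40, 0, 55],
}

singletons, shared = split_singletons(table)
print("singletons:", singletons)
print("shared:", shared)
```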
Low Replicate data
“Suck it and see”
One off project
Pump priming projects
Medical samples
Biopsy
Difficult to access
Ecological data
Resampling is difficult
Methods
Fingerprinting
PCA
Basic scoring
PDE model
Gradient search
Differential analysis
PCA
Very simple
Can be highly informative
Depends on the data
Used in pipeline
Data quality
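To make the PCA step concrete, here is a minimal sketch that estimates only the first principal component by power iteration, using the standard library alone. The slides do not say how the pipeline computes PCA, so the algorithm choice, function names, and sample values are all assumptions; a real analysis would more likely use an established numerical library.

```python
import math

def center_columns(rows):
    """Subtract each feature's mean so PCA describes variation around the mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_principal_component(rows, iterations=200):
    """Estimate the leading principal axis by power iteration on X^T X."""
    x = center_columns(rows)
    n_features = len(x[0])
    # Deterministic, non-uniform start vector. Power iteration fails only if
    # the start is exactly orthogonal to the leading axis.
    axis = [float(j + 1) for j in range(n_features)]
    norm = math.sqrt(sum(v * v for v in axis))
    axis = [v / norm for v in axis]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, axis)) for row in x]
        new_axis = [
            sum(s * row[j] for s, row in zip(scores, x)) for j in range(n_features)
        ]
        norm = math.sqrt(sum(v * v for v in new_axis))
        if norm == 0.0:
            break  # no variation in the data
        axis = [v / norm for v in new_axis]
    scores = [sum(a * b for a, b in zip(row, axis)) for row in x]
    return axis, scores

# Hypothetical ion counts: rows are samples, columns are features.
samples = [
    [10.0, 5.0, 1.0],  # group N
    [11.0, 5.0, 2.0],  # group N
    [20.0, 5.0, 1.0],  # group T
    [21.0, 5.0, 2.0],  # group T
]

axis, scores = first_principal_component(samples)
print("loadings:", [round(v, 3) for v in axis])
print("scores:", [round(s, 2) for s in scores])
```

The scores place each sample along the direction of greatest variance, and the loadings show how much each feature contributes to that direction. The sign of a principal axis is arbitrary, so only the relative positions of the samples are meaningful.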
PCA
Analysis
Brno Project
Samples:
Human biopsy
Replication – biopsy cut into
equal parts
PCA
Analysis
N group
Non-cancer biopsy
T group
Cancer biopsy
Using PCA clustering we are
able to distinguish between
healthy and sick patients
PCA
Analysis
PCA revealed profile similarity which
correlated with biological evidence
PCA
Analysis
Human Urine project
• 22 patients sampled
• 11 healthy and 11 sick
patients
• Sample labels dropped
PCA
Analysis
Ecological Data
Large number of
samples without clear
replication.
PCA
Analysis
Cluster pattern:
Find the features which
hold the cluster pattern
PCA
Analysis
Using PCA and profile
similarity analysis, a subset of
features of interest was
found
Basic Scoring
Use Z-score to sort data
Use this to pull out important features.
Control – Exp
With a two-class problem we can use PDE modelling.
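The Z-score step could be sketched as follows. This is one possible reading of the slide, not the pipeline's confirmed method: it takes the Control minus Exp difference for each feature, standardises those differences across all features, and sorts by absolute Z-score. The function names and ion counts are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0 for _ in values]  # no spread, so nothing stands out
    return [(v - m) / s for v in values]

def rank_features(control, experiment):
    """Sort features by how unusual their Control - Exp difference is."""
    names = list(control)
    diffs = [control[n] - experiment[n] for n in names]
    scored = zip(names, z_scores(diffs))
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical ion counts, one value per feature and condition.
control = {"f1": 100.0, "f2": 250.0, "f3": 80.0, "f4": 400.0, "f5": 60.0}
experiment = {"f1": 105.0, "f2": 240.0, "f3": 85.0, "f4": 100.0, "f5": 62.0}

ranked = rank_features(control, experiment)
for name, z in ranked:
    print(f"{name}: z = {z:+.2f}")
```

Because the score is computed across features rather than across replicates, this form still works when each condition has a single sample, which is the low-replicate case the talk is concerned with.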
Basic Scoring:
PDE modelling
Multi-class problem
Plants
Wild type
act ko mutant
Treatments
Normal light
High light
Gradient Analysis
Use rate of change of abundance to
Mine data for specific trends
Find features of interest
Use PDE modelling of rates
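The rate-of-change idea can be sketched with simple finite differences. The example assumes each feature has been measured at a series of time points after treatment; the time points, abundances, threshold, and function names are hypothetical, and the PDE modelling of the rates mentioned above is not reproduced here.

```python
def gradients(times, abundances):
    """Finite-difference rate of change between consecutive measurements."""
    return [
        (abundances[i + 1] - abundances[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]

def rapid_risers(times, table, min_rate):
    """Return features whose steepest increase reaches min_rate, with that rate."""
    hits = {}
    for feature, abundances in table.items():
        peak = max(gradients(times, abundances))
        if peak >= min_rate:
            hits[feature] = peak
    return hits

# Hypothetical sampling times and ion counts per feature.
times = [0.0, 1.0, 2.0, 4.0]
table = {
    "f1": [100.0, 110.0, 115.0, 120.0],
    "f2": [50.0, 55.0, 400.0, 900.0],
    "f3": [300.0, 280.0, 250.0, 240.0],
}

hits = rapid_risers(times, table, min_rate=100.0)
print(hits)
```

Dividing by the time step keeps the rates comparable when sampling intervals are uneven.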
Gradient
Analysis
Mining for
features which
showed rapid
increase due
to a specific
treatment
Data Provided by:
Brno
Ted Hupp
Rob O’Neill
Urine study
Steve Michell
John Mcgrath
Ecological data
Dave Hodgson
Nicole Goody
Gradient analysis
John Love
Data scoring
Nicholas Smirnoff
Mike Page
Metabolomics and Proteomics Mass
Spectrometry Facility
@ The University of Exeter
Nick Smirnoff (Director of Mass Spectrometry) [email protected]
Hannah Florance (MS Facility Manager) [email protected]
Venura Perera (Bioinformatics and Mathematical Support) [email protected]
http://biosciences.exeter.ac.uk/facilities/spectrometry/
http://bio-massspeclocal.ex.ac.uk/
About me
Background
Applied Maths
Untargeted metabolite profiling
Research interests
Data driven modelling
Small molecule profiling
Gene regulatory network modelling
Application of mathematical methods
Metabolite identification using LC-MS/MS