MATLAB for the Life Sciences


© 2006 The MathWorks, Inc.
Accelerating Life Science Research
with MATLAB and SimBiology
John Brinegar – Account Manager
Brett Shoelson, PhD – Principal Application Engineer
A job search on Monster.com reveals 900 results for
MATLAB and 200 for Simulink
[Chart: job search results, June 2001 to February 2006; vertical axis 0 to 1000; labels: MATLAB, Simulink, Sum of 3 technical products]
Google Facts
• 20 million web results, non-MathWorks sites only
• 796,000 Google Groups results since Jan 2001, 26% more than Fortran
• 65,000 results for “matlab tutorial”
2
FMI session, Rhesus ONH
3
FMI session, Rhesus ONH
4
Result: a pseudo-colored map of absolute volumetric blood flow.
A data-intensive quantification of a traditionally qualitative test!
5
MATLAB helped a novice programmer tackle a nightmarish problem
• Pre-defined functionality for a broad range of applications
  - File I/O, image processing, statistics, video manipulation, and visualization
• MATLAB is a high-level language
  - Easy to develop and implement algorithms
  - Small learning curve
  - No pointers, declarations, pre-allocations, etc.
• I could make my computer “sing and dance” within a short time
6
7
Life Science Necessities: Flexibility and Breadth
f(x)
…
my_app.exe
8
Typical Workflow?
Access Data
Analyze Data
Share Results
9
Today’s Agenda
Introduction to MATLAB
Data Gathering
Automating Analyses and Workflows
Introduction to Simulink and SimBiology
Distributed Computing
Summary and Questions
10
A typical workflow example:
Gene Expression Analysis
DEMO
11
Today’s Agenda
Introduction to MATLAB
Data Gathering
Automating Analyses and Workflows
Introduction to Simulink and SimBiology
Distributed Computing
Summary and Questions
12
Data Gathering – Breadth and Depth
Easily load common file formats
Excel, CSV and other text
Image (jpeg, tiff, gif, bmp, png, etc.)
Access to many specialized formats
Sequence data (fasta, embl, genbank, etc.)
Microarray (Affymetrix, GenePix, GEO, etc.)
BLAST Reports, Mass Spec, Phylogenetic Trees, etc.
Complete integration with SQL and ODBC sources
Direct Access to External Hardware
Video Cameras, Medical Equipment, etc.
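To give a feel for what loading a specialized sequence format involves, here is a minimal sketch in Python (a language-neutral illustration, not MATLAB or Bioinformatics Toolbox code). It parses FASTA text into named sequences. The function name `parse_fasta` and the sample records are hypothetical and chosen only for this example.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # header line starts a new record
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical sample data, for illustration only.
sample = """>seq1 hypothetical example
ACTGACTC
CTTGCTAC
>seq2 hypothetical example
CACATAGC
"""

sequences = parse_fasta(sample)
for name, seq in sequences.items():
    print(name, len(seq), seq)
```

A real workflow would read the text from a file or a database rather than from an in-memory string.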
13
Example: Seamless Database Connections
Visual Query Builder
• Access data without knowing SQL
• Scroll through tables and fields
• Customize your query
• Built-in visualization tools
• Plotting and charting
• Creating HTML reports
• Handling date strings
• Reuse SQL statements in your own program
SQL Querying
DEMO
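The idea of reusing a SQL statement inside your own program can be sketched as follows. This is a Python illustration using the standard `sqlite3` module, not the Visual Query Builder or the MATLAB database interface. The table name, column names, and values are hypothetical.

```python
import sqlite3

# In-memory database holding a hypothetical table of expression measurements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (gene TEXT, time_h REAL, level REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [
        ("geneA", 0.0, 1.2),
        ("geneA", 1.0, 2.4),
        ("geneB", 0.0, 0.7),
        ("geneB", 1.0, 0.5),
    ],
)

# One parameterized statement, written once and reused for every gene.
QUERY = "SELECT time_h, level FROM expression WHERE gene = ? ORDER BY time_h"


def fetch_profile(connection, gene):
    """Return the (time, level) rows for one gene."""
    return connection.execute(QUERY, (gene,)).fetchall()


profile_a = fetch_profile(conn, "geneA")
profile_b = fetch_profile(conn, "geneB")
print(profile_a)
print(profile_b)
```

Keeping the statement in one place means a query designed interactively can be dropped into an automated analysis unchanged.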
14
Today’s Agenda
Introduction to MATLAB
Data Gathering
Automating Analyses and Workflows
Introduction to Simulink and SimBiology
Distributed Computing
Summary and Questions
15
The Critical Need for Automation
16
Problems with insufficiently automated analyses

Lack of objectivity

Inadequate metrics for quantification, publishing

Slow, Costly

Boredom

Human error, transcription errors

Limited scientific discovery
17
The advantage of automation

Obtain objective results

Reduce costs and boredom

Decrease processing and analysis time

Alleviate human errors and transcription errors

Serendipity
18
Consider this image from National Cancer Institute:
19
Quantifying Tissue Metastasis

Goal: To quantify the amount of tissue metastasis
for a given image

Initial method: Post-doc sits behind microscope
and counts the number of metastatic spots
Pros
• Relatively simple
• Not too time consuming for one image
Cons
• Error-prone
• Boring
• Not a very convincing metric
20
Quantifying Tissue Metastasis

Goal: To quantify the amount of tissue metastasis for a
given image

Initial method: Post-doc sits behind microscope and counts
the number of metastatic spots

How automation helped:




Obtain objective results
Reduce costs and boredom
Decrease processing and analysis time
Alleviate human errors and transcription errors
DEMO
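The transcript does not record how the demo counts metastatic spots. As one plausible approach, the Python sketch below thresholds a small grayscale image and counts connected groups of bright pixels. The function name `count_spots`, the threshold, and the pixel values are all hypothetical.

```python
def count_spots(image, threshold):
    """Count connected groups of pixels brighter than threshold (4-connectivity)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and not seen[r][c]:
                spots += 1                     # found a new, unvisited spot
                stack = [(r, c)]
                seen[r][c] = True
                while stack:                   # flood-fill the whole spot
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return spots


# Hypothetical 4x5 image with three separate bright regions.
image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 0, 0, 8],
    [7, 0, 0, 0, 0],
]
n_spots = count_spots(image, threshold=5)
print(n_spots)
```

Because the threshold and connectivity rule are fixed in code, every image is scored by the same criterion, which is the objectivity argument made above.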
21
Tectorial Membrane
22
Atomic Force Microscope (AFM)

Goal: Determine elasticity of Tectorial Membrane
23
Atomic Force Microscope (AFM)
Initial method to measure tissue elasticity:
• Analysis of 1 AFM file took 30-40 minutes
• A realistic goal was to analyze 10 files in one day
With automation, the obtainable amount of data increased significantly:
• Analysis of 1 AFM file now took 3-4 seconds
• Now we could analyze 100s of files in a portion of a day
24
Serendipity!


Discovered radial gradient in the material properties;
new hypotheses about function of the tissue
Suggested new use for AFM itself!
Shoelson B et al. Biophys J. 87:2768-2777; Oct 2004.
25
Fluorescein Angiogram of a Rat
26
Analysis of Fluorescein Angiogram
Goal: Determine mean circulation time (MCT) and
retinal blood flow (RBF)
[Plot: Intensity, I, versus Time, t, with Io and Ip marked on the intensity axis and tp and tm marked on the time axis]
Fit Intensity-vs-Time to a lognormal curve
parameterized by Io, Ip, tp, b (shape factor)
27
Analysis of Fluorescein Angiogram
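[Plot: intensity I versus time t, with to and tA marked on the time axis]
tm is computed from tp, tA, and exp(3/(4b)) [operators joining these terms are illegible in the source]
$\mathrm{MCT} = t_{m,\mathrm{vein}} - t_{m,\mathrm{artery}}$
$\mathrm{RBF} \propto (D_{\mathrm{art}}^{2} + D_{\mathrm{vein}}^{2}) / \mathrm{MCT}$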
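As a rough illustration of the MCT calculation, the Python sketch below takes MCT to be the venous mean time minus the arterial mean time. The slides obtain each mean time from fitted lognormal parameters. This sketch swaps in a simpler technique, the numerical first moment of a baseline-subtracted intensity curve, so it is not the method used in the talk. The function name `mean_time` and all data values are hypothetical.

```python
def mean_time(times, intensities, baseline=0.0):
    """First moment of a baseline-subtracted curve, using the trapezoidal rule."""
    area = 0.0
    moment = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        y0 = intensities[i - 1] - baseline
        y1 = intensities[i] - baseline
        area += 0.5 * (y0 + y1) * dt
        moment += 0.5 * (y0 * times[i - 1] + y1 * times[i]) * dt
    return moment / area


# Hypothetical time-intensity curves (arbitrary units), baseline intensity 10.
times = [0, 1, 2, 3, 4, 5, 6]
artery = [10, 10, 30, 50, 30, 10, 10]   # peak centered at t = 3
vein = [10, 10, 10, 30, 50, 30, 10]     # same shape, delayed by one time unit

tm_artery = mean_time(times, artery, baseline=10)
tm_vein = mean_time(times, vein, baseline=10)
mct = tm_vein - tm_artery
print(tm_artery, tm_vein, mct)
```

A direct first moment is sensitive to noise and to dye recirculation in the tail of the curve, which is one reason a fitted curve is often preferred in practice.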
28
Analysis of Fluorescein Angiogram

Previous Method
• Manually track vessels, collecting time-intensity data (40 minutes in a dark room!)
• Manually identify arteries, veins
• Transfer intensity information to statistics package to calculate fit parameters
• Determine MCT
• Manually measure vessel pairs
• Calculate RBF
• Log results in lab notebook
29
Analysis of Fluorescein Angiogram

Improved Method:

Perfect application for neural network classification
Automated the analysis with MATLAB and Toolboxes
Code currently used in labs worldwide

Let’s take a look!


DEMO
30
Analysis of Fluorescein Angiogram

Goal: Determine mean circulation time (MCT) and
retinal blood flow (RBF)

Previous Method
• Time consuming
• Very subjective

Automation allowed us to:
• Obtain objective results
• Decrease processing and analysis time
• Reduce costs and boredom
31
Typical Workflow?
Access Data
Analyze Data
Share Results
32
Today’s Agenda
Introduction to MATLAB
Data Gathering
Automating Analyses and Workflows
Introduction to Simulink and SimBiology
Distributed Computing
Summary and Questions
33
SimBiology, Systems Biology
NEW!
34
Questions to ponder…
What is “Systems Biology”?
The value of modeling biological systems in silico?
35
Why Create Quantitative Biochemical
Reaction Models?

Biochemical pathways start out simple and quickly grow in complexity.

Testing pathways via experiment is expensive in both time and money.
Quantitative modeling narrows the range of experiments.

Once created and validated with experiments, the quantitative model
can be used as an in silico sandbox to test new ideas dramatically
faster than through experimentation.
36
Challenges with in silico biochemical modeling

Integrating knowledge from experimental data,
intuition, literature, and other models is difficult

Modelers and scientists have difficulty
communicating knowledge and sharing work

The mathematics for solving these models is
evolving faster than the tools

Many different tools are needed to complete
entire workflow
37
Modeling Biological systems

• Build models
  - Import SBML
  - Drag-and-drop from block library
  - Enter in chemical reactions
• Simulate
• Estimate parameters using experimental data
• Isolate relevant parameters using sensitivity analysis
Model created by Merrimack Pharmaceuticals
38
SimBiology example
DEMO
>> sbiodesktop
39
Introduction to SimBiology

Provides one environment for both
graphical and programmatic pathway
analysis

Provides one tool for modeling,
simulating, and analyzing pathways

Used by modelers or programmers to
gain insight into their pathway and to
communicate their pathway with
biologists
40
Key Features

• Building a model
  - Tabular interface
  - Block-diagram editor
  - Via MATLAB code
  - Import SBML files
• Running a simulation
  - Stochastic
  - Deterministic
• Analyzing a model
  - Parameter estimation
  - Sensitivity analysis
  - Moiety conservation
41
Let’s build a simple model…
Gene Regulation

A simple gene
regulation model
with transcription,
translation, and
negative feedback
to suppress
transcription
42
Let’s build a simple model…
Transcription: the process
through which a DNA sequence
is enzymatically copied by an
RNA polymerase to produce a
complementary RNA; the
transfer of genetic information
from DNA into RNA.
Translation: the second
part of protein biosynthesis, in
which an mRNA sequence is
converted to a chain of amino
acids to form a protein.
DEMO
>> sbiodesktop
43
Pharmacokinetics
The study of what the body does to a drug after administration
The study of Absorption, Distribution, Metabolism, and
Excretion (ADME) of drugs in the body
Pharmacodynamics
The study of what the drug does to the body
The study of the biochemical and physiological effects of drugs,
the mechanisms of drug action, and the relationship between drug
concentration and effect
PROBLEM: The effect of a drug is calculated from the amount in
the biophase, which, unfortunately, cannot be directly measured.
PK knowledge is needed to model transfer of drug from blood to
effect site
44
Challenges in PK/PD modeling

• Many tools are not user-friendly
  - NONMEM, Basic, Fortran, C: Building and maintaining models can be difficult.
• Organ-specific or niche simulation tools are too complex and/or black-box
  - Organ models not editable, methods are not viewable
  - Flexibility is limited
• Workflow is manual, not automated
  - Modelling, simulation, statistics, and visualization all require different tools
• Manual integration is time consuming
45
PK Example – Transdermal Input
• Nicotine patch is applied to the skin for 16 hours
• Overlapping zero-order input rates
• Drug concentration monitored for 24 hours
• Single compartment model
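[Diagram: Patch on Skin; input phases labeled Burst + controlled infusion, Controlled infusion, No infusion]
Rapid decrease in concentration when infusion rates drop
$F_{\mathrm{fast}} = \dfrac{\text{Total dose} - \mathrm{Dose}_{\mathrm{slow}}}{\mathrm{Time}_{\mathrm{fast}}}$
$F_{\mathrm{slow}} = \dfrac{\mathrm{Dose}_{\mathrm{slow}}}{\mathrm{Time}_{\mathrm{slow}}}$
$dC/dt = (F_{\mathrm{fast}} + F_{\mathrm{slow}} - Cl \cdot C)/V$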
46
PK Example… continued





• Total dose of 15890 µg
• Fast infusion runs for time Timefast
• Slow infusion runs for time Timeslow
• Initial nicotine concentration = 2 µg/L
• Initial Parameter Estimates
  - V = 140 L
  - Cl = 78 L/h
  - Doseslow = 10000 µg
  - Timefast = 6 h
  - Timeslow = 17 h
Obtain from previous studies or create M-files to calculate approximations
Experimental Data
Time    Concentration
0.1     25
0.15    35
0.25    42
0.5     48
0.75    46.5
0.8     47
1       46
1.25    44.5
1.5     45
2       42.1
3       63
4       70.15
5       72
6       65
8       55
12      39
24      14
DEMO
>> PkPD_Demo
47
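The single-compartment model from the transdermal example, dC/dt = (Ffast + Fslow – Cl*C)/V, can be integrated numerically with the initial parameter estimates listed on the slide. The Python sketch below is an illustration, not the `PkPD_Demo` code. It assumes both zero-order inputs start at time 0 and each switches off at its own end time, which is one reading of "overlapping zero-order input rates". It uses a simple forward-Euler step.

```python
# Initial parameter estimates quoted on the slide.
TOTAL_DOSE = 15890.0   # µg
DOSE_SLOW = 10000.0    # µg
TIME_FAST = 6.0        # h
TIME_SLOW = 17.0       # h
V = 140.0              # L
CL = 78.0              # L/h
C0 = 2.0               # µg/L, initial nicotine concentration
DT = 0.001             # h, Euler step size (an assumption of this sketch)

F_FAST = (TOTAL_DOSE - DOSE_SLOW) / TIME_FAST   # µg/h
F_SLOW = DOSE_SLOW / TIME_SLOW                  # µg/h


def input_rate(t):
    """Total zero-order input; assumes both inputs begin at t = 0."""
    rate = 0.0
    if t < TIME_FAST:
        rate += F_FAST
    if t < TIME_SLOW:
        rate += F_SLOW
    return rate


def simulate(t_end=24.0):
    """Forward-Euler integration of dC/dt = (input - Cl*C)/V."""
    conc = [C0]
    c = C0
    for i in range(int(round(t_end / DT))):
        c += DT * (input_rate(i * DT) - CL * c) / V
        conc.append(c)
    return conc


conc = simulate()


def concentration_at(t_query):
    """Look up the simulated concentration at a time on the Euler grid."""
    return conc[int(round(t_query / DT))]


for t in (1, 6, 17, 24):
    print(t, round(concentration_at(t), 2))
```

These are only starting values. Fitting the parameters to the experimental data is a separate step that this sketch does not attempt.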
PK Example – What about more complex models?
Generic PBPK model of rat:
From Poulin and Thiel; J Pharmaceutical Sciences. 91:5, May 2002.
48
PK Example – What about more complex models?
From Poulin and Thiel; J Pharmaceutical Sciences. 91:5, May 2002.
49
PK Example – Let’s show how we might implement
this in SimBiology!
Generic PBPK model of rat:
DEMO
From Poulin and Thiel; J Pharmaceutical Sciences. 91:5, May 2002.
>> sbiodesktop
50
Today’s Agenda
Introduction to MATLAB
Data Gathering
Automating Analyses and Workflows
Introduction to Simulink and SimBiology
Distributed Computing
Summary and Questions
51
Market trend
The 10 GFlop Personal Computer
10⁴ more power for the money vs 1991
[Chart: systems and prices by year; vertical axis 0 to 10,000]
Year    System            Price
1991    Cray Y-MP C916    $40,000,000
1998    Sun HPC10000      $1,000,000
2005    4xShuttle         $4,000
52
Distributed Computing with MATLAB
[Diagram: Client Machine with Toolboxes and Blocksets, alongside four CPUs]
53
Distributed Computing with MATLAB
[Diagram: the Client Machine (Toolboxes, Blocksets, Distributed Computing Toolbox) submits a Job to the Scheduler of the MATLAB Distributed Computing Engine; the Scheduler sends Tasks to Workers, one per CPU, and returns their Results to the client]
Distributed Computing Toolbox functionality:
• Create jobs
• Create tasks
• Pass data
• Retrieve results
MATLAB Distributed Computing Engine functionality:
• Queue jobs
• Dynamically license workers
• Evaluate tasks
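The job, task, and result pattern in the diagram can be sketched in a few lines. The Python example below uses a local thread pool as a stand-in for the scheduler and workers, so it illustrates the pattern only. It is not the Distributed Computing Toolbox API, and it does not distribute work across machines. The function `analyze_file` and its inputs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_file(file_id):
    """Hypothetical stand-in for one independent analysis task."""
    return file_id, sum(i * i for i in range(file_id * 1000))


# The "job": a set of independent tasks, each with its own input data.
task_inputs = [1, 2, 3, 4]

# The pool plays the role of scheduler plus workers on a single machine.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(analyze_file, file_id) for file_id in task_inputs]
    results = [future.result() for future in futures]   # retrieve results

print(results)
```

The pattern fits analyses like the batch file processing described earlier, where each file can be handled without reference to the others.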
54
Support for Parallel Applications
[Diagram: Client Machine (Toolboxes, Blocksets, Distributed Computing Toolbox) sending a Job through the Scheduler to Workers as Tasks and receiving Results]
MPI support
55
56
57
58
Today’s Agenda
Introduction to MATLAB
Data Gathering
Automating Analyses and Workflows
Introduction to Simulink and SimBiology
Distributed Computing
Summary and Questions
59
Career Opportunity with The MathWorks:
BioPharmaceutical Application Engineer



• Lead customer-facing engineer for SimBiology
  - Work directly with customers interested in modeling biochemical pathways with MATLAB and SimBiology
  - Support the broad range of BioPharmaceutical applications of The MathWorks software
  - Work closely with Development and Marketing to influence product direction and positioning
• Requirements
  - MATLAB experience
  - BioTech / Pharmaceutical knowledge
  - Experience modeling dynamic systems, with a strong mathematics background
  - Familiarity with Systems Biology or pathways is a plus
• To Apply
  - Submit resume to Brett: [email protected]
60
Questions?
61
Support and Community
MATLAB Homework Helper
Faculty Center
62
MathWorks Academic Web Site
http://www.mathworks.com/academia





• Faculty and Student Centers
• Curriculum Exchange by Academic Department
  - Course materials
  - Product recommendations
  - MATLAB based books by discipline
• Product Tutorials for Students
  - Introduction to MATLAB Basics
  - Introduction to Simulink
• Academic licensing and pricing overview
• MATLAB & Simulink Student
63
Consulting from The MathWorks
MATLAB Homework Helper
Using MathWorks products, you can explore and solve many different science and
engineering problems. To help you get started using these tools, we've developed
several example homework problems.
The math behind bungee jumping
The physics of baseball
Model a simple control system
View and modify outputs from Simulink and send data to MATLAB
Collision of billiard balls
http://www.mathworks.com/academia/student_center/homework/
64
MATLAB Central
File exchange and newsgroup access for MATLAB and
Simulink users


• 150,000 visits per month
• Over 2,800 files in the exchange
  - General-purpose functions, industry- and application-specific tools and examples
  - 100 new submissions per month
  - 5,000 downloads per day
• 5,000 posts to “CSSM” (comp.soft-sys.matlab) per month, 60% routed through MATLAB Central
www.mathworks.com/matlabcentral
65
Technical Support
• Technical Support
- 90% of problems solved in 24 hours
- 60+ Application Engineers on staff, half with
master's degrees
• World Wide Web
(www.mathworks.com)
- 24x7 self-service technical support
- over 9,000 technical solutions
- software archive (ftp.mathworks.com)
- MATLAB Digest – electronic newsletter
• Newsgroup (comp.soft-sys.matlab)
66
Further Information
• Product information and demos:
John Brinegar
[email protected]
508-647-7649
• MathWorks Academic
Teaching and Research:
www.mathworks.com/academia
• Trials and technical literature: www.mathworks.com
67
Bioinformatics Toolbox
68
The Bioinformatics Toolbox and MATLAB
Data I/O
• File support
• Web connectivity
Microarray Analysis
• Normalization
• Visualization
Mass Spec Analysis
• Baseline removal
• Profile alignment
• Peak detection
Sequence Analysis
• Pairwise sequence alignment
• Multiple sequence alignment
Phylogenetic Analysis
Statistical Learning
• Support Vector Machines
• K-nearest neighbor
69
Supported Data Formats
Sequence Data
• FASTA
• EMBL
• GenBank
• PDB
• PFam
• SCF
• ClustalW
• BLAST results
Microarray Data
• Affymetrix
• Agilent
• GenePix
• SPOT
• ImaGene
• Gene Expression Omnibus
Other Data Formats
• CSV
• Excel
• JCAMP
GenePix is a registered trademark of Axon Instruments, Inc.
Affymetrix is a registered trademark of Affymetrix, Inc.
70
Design of Primers for Automated DNA Sequencing
Calculate properties of primers
Filter primers based on GC content or Tm
Check for dimerization and hairpin formation
Retrieve primer pairs
Find restriction enzymes that cut inside primers
Isolate primers lacking a GC clamp
…
Fwd/Rev Primers         Pos     %GC    Tm       Length
actgactccttgctactctg    845     50     54.58
cacatagcccttgccataag    1137    50     54.37    292
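Two of the tabulated primer properties can be checked by hand. The Python sketch below is an illustration, not Bioinformatics Toolbox code. It computes the GC percentage of each primer and the difference between the two positions, which matches the Length value in the table. Melting temperature is left out because the slide does not say which Tm formula was used. The function name `gc_content` is hypothetical.

```python
def gc_content(primer):
    """Percentage of bases in the primer that are G or C."""
    primer = primer.lower()
    return 100.0 * sum(primer.count(base) for base in "gc") / len(primer)


# Primer sequences and positions taken from the table above.
forward = ("actgactccttgctactctg", 845)
reverse = ("cacatagcccttgccataag", 1137)

gc_forward = gc_content(forward[0])
gc_reverse = gc_content(reverse[0])
span = reverse[1] - forward[1]   # difference between the two positions

print(gc_forward, gc_reverse, span)
```

A filter on GC content, as listed in the workflow above, would then be a simple comparison of these percentages against a chosen range.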
71
Applied Biosystems Develops a Crucial DNA Sequencing Algorithm in MATLAB®
The Challenge
• To develop a robust yet flexible calibration algorithm to be included in a high-throughput DNA analysis instrument
The Solution
• Use MATLAB to test ideas and code a prototype, and then use the MATLAB compiler tools to convert the algorithm to C/C++ code and test it at a component level
The Results
• Research options extended
• Calibration errors reliably detected
• Development completed in 2-3 weeks
A portion of the DNA dye-label spectral profile, which
allows the researcher to read the sequence of bases in a
selected strand of DNA
“Having one integrated package is a big
advantage. Using MATLAB® and the MATLAB
Compiler reduced my development time by a
factor of 4 or 5."
Jim Labrenz,
Applied Biosystems
72
Infinity Pharmaceuticals Standardizes on MATLAB® for Drug
Discovery Data Analysis
The Challenge
• To standardize on a drug discovery data analysis tool that handles large data sets and integrates with existing systems
The Solution
• Use MATLAB® to automate the process of capturing, analyzing, visualizing, and importing data
The Results
• Development costs reduced by $100,000
• Future cost savings expected
• Quality results delivered consistently
The test station interface
“With MATLAB, we reduced our
development time significantly. That
has resulted in an annual savings of
$100,000."
Dennis Underwood,
Infinity Pharmaceuticals
73
Life Science Necessities: Flexibility and Breadth
my_app.exe
75
Integrating MATLAB with Excel
Problem:



Wet-bench scientists prefer to work with Excel
Many instruments output data to Excel
Excel is ubiquitous in the bioinformatics field
Solution:

MATLAB works seamlessly with Excel…
• The advantage of the flexible MATLAB programming environment and powerful mathematics in Excel
76
Integrating with Excel
Spreadsheet Applications
• MATLAB Excel Link can be the computational engine behind your Excel applications
• Transport data and perform MATLAB analysis with the click of a button
• Fast, scalable solution
MLPutMatrix("data",B2:H43)
MLPutMatrix("Genes",A2:A43)
MLPutMatrix("TimeSteps",B1:H1)
MLEvalString("clustergram(data,'RowLabels',Genes,'ColLabels',TimeSteps)")
77
Deploying to Excel
MATLAB Builder for Excel

Create Excel plug-in
to give to biologists
& clinicians

Appears as user-defined function in
formula toolbar
• As easy to use as sum, mean, etc.
78
GENE REGULATION
Demo
…Switch to MATLAB, and launch the
SimBiology platform
>>sbiodesktop
79
GENE REGULATION
Demo
…Create and name a new project
File → New Project → RENAME to
GeneRegulation →
Dbl Click DIAGRAM pane
80
GENE REGULATION
Demo
…Add 4 species and 5 reactions
SPECIES
Protein
DNA
Protein_DNAComplex
mRNA
REACTION
BindingUnbinding
Transcription
Translation
ProteinDegradation
mRNA_Degradation
81
GENE REGULATION
Demo
…Connect species and reactions with lines appropriately
82
GENE REGULATION
Demo
…Use Kinetic Law Tab (of Reaction Pane) to set reaction type
(Mass Action) and forward (and reverse) parameters
83
GENE REGULATION
Demo
…Initialize amounts (DNA = 50)
and parameters (optional)
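For readers without SimBiology at hand, the model assembled in this demo can be approximated with a few lines of Python. The sketch uses the four species and five reactions named above with mass-action kinetics and DNA = 50. The reaction stoichiometries and all rate constants are assumptions made for illustration, since the transcript does not give them. Integration uses a simple forward-Euler step rather than a SimBiology solver.

```python
def simulate_gene_regulation(t_end=10.0, dt=0.001):
    """Forward-Euler simulation of a gene regulation model with negative feedback."""
    # Hypothetical rate constants (not taken from the demo).
    k_tx, k_tl = 0.2, 2.0      # Transcription, Translation
    k_f, k_r = 0.02, 1.0       # BindingUnbinding (forward, reverse)
    k_dm, k_dp = 1.5, 1.0      # mRNA_Degradation, ProteinDegradation

    dna, mrna, protein, complex_ = 50.0, 0.0, 0.0, 0.0   # DNA = 50 as in the demo

    for _ in range(int(round(t_end / dt))):
        # Assumed mass-action rates:
        transcription = k_tx * dna                      # DNA -> DNA + mRNA
        translation = k_tl * mrna                       # mRNA -> mRNA + Protein
        binding = k_f * dna * protein - k_r * complex_  # DNA + Protein <-> complex
        mrna_deg = k_dm * mrna                          # mRNA -> null
        protein_deg = k_dp * protein                    # Protein -> null

        dna += dt * (-binding)
        mrna += dt * (transcription - mrna_deg)
        protein += dt * (translation - binding - protein_deg)
        complex_ += dt * binding

    return {
        "DNA": dna,
        "mRNA": mrna,
        "Protein": protein,
        "Protein_DNAComplex": complex_,
    }


state = simulate_gene_regulation()
for name, amount in state.items():
    print(name, round(amount, 3))
```

The negative feedback shows up as free DNA falling below its initial amount: protein binds DNA, and bound DNA is no longer available for transcription.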
84