Introduction to Matlab & Data Analysis
Final Project: That’s all, Folks!
Yuval Hart, Weizmann 2010©
Outline
Parsing files
Efficient programming - vectorization
Correlation coefficients
Passing extra parameters
Image plotting
Curve Fitting & Optimization
Figure handling
“Rotation in 60 minutes”
During the past month you’ve measured promoter activity of 20 genes. Your PI wants you to present your results at the next group meeting.
To Do List
Get the sequences of the genes from GenBank and Fasta files and calculate their GC content
Display all correlation coefficients of the measured promoter activity (PA) and its relation to GC content
For the 4 highest genes, find how the correlation decays with distance from the initial gene in the pathway
GenBank file format
Step 3: Attach every gene name to its DNA sequence
% Build the structure with all needed fields:
% Build the structure Genes with the desired genes and their data:
% name, StartPosition, EndPosition, sequence, complement (1/0), GCcontent
% This is also the way to preallocate for structures:
Genes(1,sum(indGeneList))=struct('name',[],'complement',[],'sequence',[],...
    'StartPosition',[],'EndPosition',[],'GCcontent',1);
Genes=struct('name',geneNames(indGeneList),...
    'complement',num2cell(indComplement(indGeneList)'),...
    'StartPosition',CDSpositionStartEndCelled(indGeneList,1)',...
    'EndPosition',CDSpositionStartEndCelled(indGeneList,2)',...
    'sequence',seq,'GCcontent',GCcontent);
a=Genes;
Note: struct() builds a struct array with one element per entry only when the field values are passed as cell arrays
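A minimal sketch of the struct-array construction and the GC content calculation, on toy data. The gene names and sequences below are made-up placeholders, not values from the project's GenBank or Fasta files.

```matlab
% Hypothetical toy data (not from the project files)
names = {'geneA', 'geneB', 'geneC'};
seqs  = {'ATGCGC', 'ATATAT', 'GGCCAT'};

% GC content: fraction of G or C characters in each sequence
GCcontent = cellfun(@(s) mean(upper(s) == 'G' | upper(s) == 'C'), seqs);

% Cell arrays of equal size passed to struct() give a struct array with
% one element per cell; num2cell converts the numeric vector to a cell array
Genes = struct('name', names, 'sequence', seqs, ...
               'GCcontent', num2cell(GCcontent));

disp(size(Genes));
disp(Genes(1));
```

Passing GCcontent as a plain numeric vector instead of num2cell(GCcontent) would store the whole vector in every element, which is usually not what is wanted here.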
Calculate and plot Correlation Matrix
% Load the list of genes and measurements
% Input:
% measurement mat file contains:
% geneList - a cell array of the gene names
% measurements - a matrix of 20 genes' measurements at 1001 time points
% GenesGCcontent - a vector of the genes' GC content values
% measurements has a row for each gene containing its measurements through
% 1001 time points and the geneList names
load measurements
Plot GC content and mean PA dependence
Plot the fit results on top of the previous graph:
Note: Smoothing the data can reduce the effect of outliers
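One possible way to do this is sketched here. The slides do not say which fit or smoothing method was used, so the straight-line fit (polyfit) and the moving average (movmean, which needs R2016a or later) are assumptions, and the data are synthetic stand-ins for the contents of the measurements file.

```matlab
% Synthetic stand-ins for the variables in the measurements file (assumed shapes)
nGenes = 20; nTime = 1001;
t = linspace(0, 1, nTime);
GenesGCcontent = linspace(0.35, 0.65, nGenes);                 % made-up GC values
measurements = GenesGCcontent' * (1 + 0.2*sin(2*pi*5*t)) ...   % 20 x 1001 matrix
               + 0.05*sin((1:nGenes)' * (1:nTime));            % deterministic "noise"

meanPA = mean(measurements, 2)';        % mean promoter activity per gene (1 x 20)

% Moving average over 5 neighbouring genes (sorted by GC content) to damp outliers
[gcSorted, order] = sort(GenesGCcontent);
meanPAsmooth = movmean(meanPA(order), 5);

% Assumed model: straight line fitted by least squares
p = polyfit(gcSorted, meanPAsmooth, 1);

figure;
plot(GenesGCcontent, meanPA, 'o'); hold on;     % raw data
plot(gcSorted, polyval(p, gcSorted), 'r-');     % fit drawn on the same axes
xlabel('GC content'); ylabel('Mean PA');
legend('data', 'linear fit', 'Location', 'best');
hold off;
```

hold on keeps the first plot in place so that the fit is drawn on the same axes.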
Calculate and plot Correlation Matrix
Calculate and display the correlation matrix
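A minimal sketch of this step, assuming the measurements matrix has one row per gene as described above. The data here are synthetic, and the choice of imagesc for the display is an assumption rather than the slide's own code.

```matlab
% Synthetic stand-in for the 20 x 1001 measurements matrix (one row per gene)
nGenes = 20; nTime = 1001;
t = linspace(0, 10, nTime);
measurements = sin((1:nGenes)' * t / 5) + 0.1 * cos((1:nGenes)' * t);

% corrcoef treats columns as variables, so transpose to correlate genes
R = corrcoef(measurements');            % 20 x 20 matrix, ones on the diagonal

figure;
imagesc(R);                             % display the matrix as a color image
colorbar;
caxis([-1 1]);                          % correlation coefficients lie in [-1, 1]
axis square;
xlabel('Gene index'); ylabel('Gene index');
title('Correlation matrix of promoter activity');
```

Forgetting the transpose would give a 1001 x 1001 matrix of correlations between time points instead of between genes.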
Step 2: Fit correlations to the desired function
Use an anonymous function to pass extra parameters, and fit using lsqcurvefit:
initDis=-0.1;
c0=[.7 0.1]; % initial values for the fit search
paramfunc = @(c,x)FittingCurveExpGuess(c,x,initDis); % anonymous function that fixes the extra parameter initDis
% Arguments: function, initial guess, X data, Y data, lower bounds, upper bounds, options
ExpParam=lsqcurvefit(paramfunc,c0,XdataPoints,correl,[0 -1],[1 1],options);

function y_hat=FittingCurveExpGuess(c,x,init)
% This assumes an exponentially decreasing curve
y_hat=init+c(1)*exp(c(2).*x);
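A self-contained sketch of the same pattern, runnable as a script (lsqcurvefit requires the Optimization Toolbox). The data are generated from made-up parameters cTrue, and the model is written as an anonymous function expModel instead of a separate function file. Both are assumptions for illustration.

```matlab
% Synthetic, noise-free data generated from known parameters (illustration only)
initDis = -0.1;                         % extra, fixed parameter (offset)
cTrue   = [0.8 -0.5];                   % made-up amplitude and decay rate
XdataPoints = 0:9;
expModel = @(c, x, init) init + c(1) * exp(c(2) .* x);  % same form as FittingCurveExpGuess
correl = expModel(cTrue, XdataPoints, initDis);

% lsqcurvefit calls fun(c, x); the anonymous function captures initDis
paramfunc = @(c, x) expModel(c, x, initDis);

c0 = [0.7 0.1];                                          % initial guess
options = optimoptions('lsqcurvefit', 'Display', 'off');
ExpParam = lsqcurvefit(paramfunc, c0, XdataPoints, correl, [0 -1], [1 1], options);

disp(ExpParam);   % should be close to cTrue for this noise-free data
```

The value of initDis is captured when paramfunc is created. Changing initDis afterwards has no effect until paramfunc is defined again.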
Step 3: Plot the correlation data and fit
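A possible sketch of this plot. The data points and fitted parameters below are made-up example values. In the project they would come from the correlation and fitting steps.

```matlab
% Made-up example values standing in for the results of the previous steps
initDis = -0.1;
XdataPoints = 0:9;
correl = initDis + 0.8 * exp(-0.5 * XdataPoints) + 0.02 * (-1).^XdataPoints;
ExpParam = [0.8 -0.5];                  % pretend output of lsqcurvefit

% Evaluate the fitted curve on a dense grid so that it looks smooth
xFine = linspace(min(XdataPoints), max(XdataPoints), 200);
fitCurve = initDis + ExpParam(1) * exp(ExpParam(2) .* xFine);

figure;
plot(XdataPoints, correl, 'bo'); hold on;   % measured correlations
plot(xFine, fitCurve, 'r-');                % fitted exponential
xlabel('Distance from initial gene'); ylabel('Correlation coefficient');
legend('data', 'exponential fit');
hold off;
```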
Best of Luck in the Group Meeting!
This is the end, my friend, the end
"Louis, I think this is the beginning of a beautiful friendship."