Octave - Regression + Signals Slides
Systems of Equations and Signal Processing in Octave
Mitch Parry
First, a Short Primer on Regression
• Predict one (or more) dependent variables given one (or more) independent variables.
• Matrix Multiplication
• Regression
– Equation of a line
– Best fit line for given 2-dimensional data
– Multiple dependent and/or independent variables: Multivariate regression
Quick: What is Matrix Multiplication?
[Diagram: matrix product with blocks labeled Y, A, and X]
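In Octave
>> A = [1 2 3; 4 5 6]
>> B = [1 3 5; 2 4 6]
>> C = A * B     % error: A is 2x3 and B is 2x3, so the inner dimensions do not match
>> size(A)
>> size(B)
In Octave
>> C = A * B'    % (2x3)(3x2) gives a 2x2 result
>> C = A' * B    % (3x2)(2x3) gives a 3x3 result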
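The dimension rule behind those commands can be checked in one script. This is a minimal sketch that reuses the A and B from the slide; the try/catch wrapper and the variable names failed, C1, and C2 are additions for illustration only.

```octave
A = [1 2 3; 4 5 6];   % 2 x 3
B = [1 3 5; 2 4 6];   % 2 x 3

% A * B is not defined: the inner dimensions (3 and 2) do not match.
try
  C = A * B;
  failed = false;
catch
  failed = true;
end

C1 = A * B';   % (2 x 3) * (3 x 2) -> 2 x 2
C2 = A' * B;   % (3 x 2) * (2 x 3) -> 3 x 3

disp(failed); disp(size(C1)); disp(size(C2));
```

The product only exists when the number of columns on the left equals the number of rows on the right, which is why the transpose is needed.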
First, a Short Primer on Regression
• Matrix Multiplication
• Regression
– Equation of a line
– Best fit line for given 2-dimensional data
– Multiple dependent and/or independent
variables: Multivariate regression
Regression: 1 Point Makes a Line?
[Plot: College GPA vs. SAT, one data point]
For now, assume y-intercept b = 0.
• In Octave: a = y / x
In Octave
>> x = 5
>> y = 3
>> a = y / x
Slope = rise / run = 3/5 = 0.6
Regression: N Points Make a Line
[Plot: College GPA vs. SAT, N data points]
For now, assume y-intercept b = 0.
How to infer ‘a’?
Regression: N Points Make a Line
Find Global Minimum
For N = 1: a = y / x
Matrix-vector form: stack samples into vectors
[Diagram: y (College GPA, one entry per student) = a (slope) × x (SAT, one entry per student)]
• In Octave: a = y / x (Regardless of N!!)
• a contains the slope.
• x contains SAT scores.
• y contains college GPAs.
In Octave
>> x = [1 2 5]
>> y = [2 4 5]
>> a = y / x                   % least-squares slope through the origin
>> figure;
>> plot(x,y,'o');              % data points
>> hold on;
>> plot(x,a*x,'xk-');          % fitted line
>> saveas(gcf,'npoints.png');
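What "find the global minimum" amounts to for a line through the origin can be sketched as follows. Minimizing sum((y - a*x).^2) over a gives a = sum(x.*y) / sum(x.^2), and the slash operator returns the same number. The names a_closed, a_slash, and residual are introduced here for illustration.

```octave
x = [1 2 5];
y = [2 4 5];

% Setting the derivative of sum((y - a*x).^2) to zero gives this closed form.
a_closed = sum(x .* y) / sum(x .^ 2);

% The slash operator solves the same least-squares problem.
a_slash = y / x;

residual = y - a_slash * x;   % what the fitted line misses at each point
printf('a_closed = %.4f, a_slash = %.4f\n', a_closed, a_slash);
```

With more than one point the line usually cannot pass through all of them, so residual is nonzero; the fit only makes its sum of squares as small as possible.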
More Generally: K-dimensional Regressors (X)
[Plot: College GPA vs. SAT]
K-dimensional Regressors (X)
[Diagram: y (College GPA, one entry per student) = a (slopes) × X (rows: SAT, HS GPA, ACT; one column per student)]
• In Octave: a = y / X (Same as before!!)
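In Octave
>> x = [1 2 5; -9 -7 -1]       % two regressors (rows), three samples (columns)
>> y = [2 4 5]
>> a = y / x                   % one slope per regressor
>> figure;
>> plot(x(1,:),y,'o');         % data against the first regressor
>> hold on;
>> plot(x(1,:),a*x,'xk-');     % fitted values
>> saveas(gcf,'npoints.png');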
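With several regressors the same slash call solves the normal equations. The sketch below reuses the slide's numbers and compares the two; a_normal and r are names added here, and the explicit normal-equation form is shown only as a cross-check, not as the recommended way to compute the fit.

```octave
X = [1 2 5; -9 -7 -1];   % K = 2 regressors (rows), N = 3 samples (columns)
y = [2 4 5];             % 1 x N

a_slash  = y / X;                 % 1 x K slopes
a_normal = (y * X') / (X * X');   % normal equations: a * (X*X') = y*X'

% At the least-squares solution the residual is orthogonal to every regressor.
r = y - a_slash * X;
disp(a_slash); disp(r * X');
```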
K-dimensional Regressors (X)
[Plot: College GPA vs. SAT]
• In Octave: a = y / X
M-dimensional Predictions (Y)
[Plot: College GPA, GRE vs. SAT]
M-dimensional Y
[Diagram: Y (College Stats, one column per student) = A (Slopes) × X (HS Stats, one column per student)]
• In Octave: A = Y / X (Same as before!!)
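A quick way to see that the same call handles several outputs at once is to build noise-free data from known slopes and recover them. All numbers below are made up for illustration; A_true, X, and Y are not from the slides.

```octave
A_true = [0.5 1 -2; 3 0 1];               % M = 2 outputs, K = 3 regressors (made up)
X = [1 2 3 4 5; 2 0 1 3 1; 1 1 0 2 5];    % K x N, N = 5 students (made up)
Y = A_true * X;                           % M x N noise-free predictions

A = Y / X;                                % recover all slopes in one call
disp(A);
```

Each row of A is the set of slopes for one predicted variable, so solving for the whole matrix is the same as running one regression per output.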
Some applications
• Sports Ratings
• Audio (musical notes/instruments)
• Mass spectrometry imaging
• Quantum Dot Fluorescence Imaging
• Genotype Data
Fun with Regression
• Sports ratings
– Want to predict score differential for game between team i and team k.
• Linear Regression: score differential (one entry per game) = A (games × teams) × team ratings

Results (2011)      Elon  GSU  ASU   Score differential
Elon 14, GSU 41       1    -1    0   -27
Elon 24, ASU 28       1     0   -1    -4
GSU 17, ASU 24        0     1   -1    -7

• In Octave: x = A \ y (Same as before!!)
In Octave
[Diagram: score differentials (-27, -4, -7) = games × teams matrix × team ratings (Elon, GSU, ASU), for the 2011 results: Elon 14, GSU 41; Elon 24, ASU 28; GSU 17, ASU 24]
>> A = [1 -1 0; 1 0 -1; 0 1 -1];
>> y = [-27; -4; -7];
>> x = A \ y
warning: matrix singular to machine precision, rcond = 0
x =
Games are not independent:
Row 1 + Row 3 = Row 2
We need another constraint…
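The singularity warning can be confirmed with a few core calls. This is a small sketch; the names r, dep, and shift are introduced here for illustration.

```octave
A = [1 -1 0; 1 0 -1; 0 1 -1];

r = rank(A);                         % fewer than 3 independent rows
dep = A(1,:) + A(3,:) - A(2,:);      % Row 1 + Row 3 - Row 2

% Adding the same constant to every rating changes no predicted differential,
% so the ratings are only determined up to a common shift.
shift = A * ones(3,1);

disp(r); disp(dep); disp(shift');
```

Because only rating differences enter the model, one more equation is needed to pin down the overall level of the ratings.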
Linear Regression
[Diagram: the games × teams system for the 2011 results with one extra row appended: a row of ones [1 1 ... 1] in the matrix and [0] in the score differential vector, so the team ratings sum to zero]
Linear Regression
[Diagram: score differentials (-27, -4, -7) = games × teams matrix × team ratings (Elon, GSU, ASU), for the 2011 results: Elon 14, GSU 41; Elon 24, ASU 28; GSU 17, ASU 24]
>> A = [1 -1 0; 1 0 -1; 0 1 -1; 1 1 1];
>> y = [-27; -4; -7; 0];
>> x = A \ y
x =
-10.3333
6.6667
3.6667
Ratings
GSU, 6.67
ASU, 3.67
Elon, -10.33
Predict that on average Georgia Southern beats App State by 3 points.
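The 3-point prediction is just the difference of two ratings. The sketch below recomputes the ratings from the slide's system and compares fitted with actual differentials; fitted and gsu_vs_asu are names added here.

```octave
A = [1 -1 0; 1 0 -1; 0 1 -1; 1 1 1];   % three games plus the sum-to-zero row
y = [-27; -4; -7; 0];

x = A \ y;                     % ratings in the order Elon, GSU, ASU
fitted = A(1:3,:) * x;         % predicted differential for each game played
gsu_vs_asu = x(2) - x(3);      % predicted GSU minus ASU margin

disp([y(1:3) fitted]);         % actual vs. fitted, one row per game
```

The fitted values do not reproduce the three results exactly: the least-squares ratings trade off all games at once, which is why GSU is rated above ASU even though ASU won their meeting.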
How many more points does App State need to finish first?
[Diagram: score differentials (-27, -4, -7) = games × teams matrix × team ratings (Elon, GSU, ASU), for the 2011 results: Elon 14, GSU 41; Elon 24, ASU 28; GSU 17, ASU 24]
Ratings
GSU, 6.67
ASU, 3.67
Elon, -10.33
Just Win!
[Diagram: the same games × teams matrix for the 2011 results (Elon 14, GSU 41; Elon 24, ASU 28; GSU 17, ASU 24), with the score differentials replaced by a "Win?" vector of -1, -1, -1]
>> A = [1 -1 0; 1 0 -1; 0 1 -1; 1 1 1];
>> y = [-1; -1; -1; 0];
>> x = A \ y
x =
-6.6667e-001
1.6025e-017
6.6667e-001
Ratings
ASU, 0.67
GSU, 0.00
Elon, -0.67
Predict that on average App State beats Georgia Southern.
Some applications
• Sports Ratings
• Audio (musical notes/instruments)
• Mass spectrometry imaging
• Quantum Dot Fluorescence Imaging
• Genotype Data
Cocktail Party Problem
[Diagram: Source and Microphone, separated by a distance]
Key Mixture Characteristic:
- Amplitude difference
Physical Isolation
• Sound booths
Hardware + Software
http://www.zoom.co.jp/english/products/h2/
Software Solution
[Figure: Source A + Source B = mixture, each plotted against Time]
Source Separation
[Figure: Estimate A and Estimate B, each plotted against Time]
Signal Representations
Time Domain
Time-Frequency Domain
MIDI Piano
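In Octave
>> pkg install -forge signal                  % one-time install from Octave Forge
>> pkg load signal
>> [x,fs] = audioread('Lateralus-60.wav');    % wavread in older Octave releases
>> x = mean(x,2);                             % average the channels down to mono
>> [S,f,t] = specgram(x,1024,fs);             % 1024-sample windows
>> figure;
>> imagesc(t,f,log10(abs(S)));                % log-magnitude spectrogram
>> axis xy
>> ylim([0 6000]);
>> xlim([10 15]);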
Also, mouse pan/zoom. [P]
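To see what a spectrogram computes without the signal package or a WAV file, here is a minimal sketch using only core functions. The 440 Hz test tone, the 8000 Hz sample rate, the 50% overlap, and the Hann window are all assumptions chosen for illustration; specgram's own defaults may differ.

```octave
fs = 8000;                      % sample rate in Hz (assumed)
t  = (0:fs-1)' / fs;            % one second of sample times
x  = sin(2*pi*440*t);           % synthetic 440 Hz tone

n   = 1024;                     % window length, matching the slide's 1024
hop = n / 2;                    % 50% overlap (assumed)
w   = 0.5 - 0.5*cos(2*pi*(0:n-1)'/(n-1));   % Hann window, written out

nframes = floor((length(x) - n) / hop) + 1;
S = zeros(n/2 + 1, nframes);
for k = 1:nframes
  seg = x((k-1)*hop + (1:n)) .* w;   % windowed frame
  F = fft(seg);
  S(:,k) = F(1:n/2 + 1);             % keep the nonnegative frequencies
end
f = (0:n/2)' * fs / n;          % frequency in Hz for each row of S

[~, idx] = max(mean(abs(S), 2));
peak_hz = f(idx);               % strongest frequency, within one bin of 440 Hz
```

Each column of S is the spectrum of one short frame, so plotting log10(abs(S)) with imagesc gives the time-frequency picture shown on the slides.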
In Octave
>> [x,fs] = audioread('Lateralus-60.wav');    % wavread in older Octave releases
>> x = mean(x,2);
>> [S] = stft(x,1024);
>> figure;
>> imagesc(log10(abs(S)));                    % axes are now row and column indices
>> axis xy
>> ylim([0 150]);
>> xlim([1000 1500]);
Also, mouse pan/zoom. [P]
In Octave
>> x2 = istft(S,1024);
>> figure;
>> mx = length(x2);
>> plot([x(1:mx) x2(1:mx)]);                  % original and resynthesized signals together
Also, mouse pan/zoom. [P]
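Spectra
[Figure: Piano Note. Amplitude over Frequency (Hz) and Time (seconds), with factors labeled A and X]
Spectra
[Figure: Bass Pluck. Amplitude over Frequency (Hz) and Time (seconds), with factors labeled A and X]
Spectra
[Figure: Trumpet Note. Amplitude over Frequency (Hz) and Time (seconds), with factors labeled A and X]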
Nonnegative Matrix Factorization
Regression?
[Diagram: Y (Frequency × Time) = A (Frequency × Sources) × X (Sources × Time), with per-source parts Y1 (Piano), Y2 (Bass), and Y3 (Trumpet)]
• A contains spectral shapes.
• X contains amplitude envelopes.
Multivariate Regression
• Block coordinate descent
– Randomly initialize A and X, then:
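The update equations on this slide did not survive in the transcript. As a stand-in, the sketch below alternates Lee-Seung multiplicative updates for the squared error between Y and A*X, holding one factor fixed while the other is updated. This choice of update rule, the toy sizes, and the iteration count are assumptions, and the rule is not necessarily the one used in the talk or in nmf_omnispect_v3.

```octave
rand('state', 1);                 % fixed seed so runs are repeatable
F = 6; T = 8; K = 2;              % toy sizes (made up)
Y = rand(F, K) * rand(K, T);      % nonnegative data with an exact rank-K factorization

A = rand(F, K);                   % random nonnegative initialization
X = rand(K, T);
err0 = norm(Y - A*X, 'fro');      % error before any update

for it = 1:200
  X = X .* (A' * Y) ./ (A' * A * X + eps);   % update X with A held fixed
  A = A .* (Y * X') ./ (A * X * X' + eps);   % update A with X held fixed
end
err = norm(Y - A*X, 'fro');

printf('error before: %g, after: %g\n', err0, err);
```

Because every update multiplies by a nonnegative ratio, A and X stay nonnegative throughout, which is the property that separates this from ordinary least-squares regression.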
In Octave
>> [x,fs] = audioread('Lateralus-60.wav');    % wavread in older Octave releases
>> x = x(1:10*fs,:);                          % keep the first 10 seconds
>> X = stft(mean(x,2),1024);
>> [W,H] = nmf_omnispect_v3(abs(X),5);
>> figure;
>> plot(W); xlim([0 100]);
>> figure;
>> plot(H');
In Octave
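For One Microphone
[Diagram: the Mix is split into components (Piano note, Flute note A, Flute note B), which then have to be grouped into instruments (Piano, Flute). How to cluster?]
Electric Bass, Vocals, and Organ
[Figure: Spectra]
Electric Bass, Vocals, and Organ
[Figure: Amplitude]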
Extension to Multiple Microphones
• Extend factorization from one microphone
to multiple microphones utilizing spatial
information.
Some applications
• Sports Ratings
• Audio (musical notes/instruments)
• Quantum Dot Fluorescence Imaging
• Mass spectrometry imaging
• Genotype Data
Fluorescence Imaging
news.thomasnet.com
Xing, Chaudry, Shen, et al. “Bioconjugated quantum dots for multiplexed and quantitative immunohistochemistry,” Nature Protocols 2, 1152-1165, 2007.
Han, Gao, Su, and Nie. “Quantum-dot-tagged microbeads for multiplexed optical coding of biomolecules,” Nature Biotechnology 19, 631-635, 2001.
Gao, Cui, Levenson, et al. “In vivo cancer targeting and imaging with semiconductor quantum dots,” Nature Biotechnology 22, 969-976, 2004.
Physical Isolation
Single stain on adjacent tissue slices
Liu et al. “Molecular Mapping of Tumor Heterogeneity on Clinical Tissue Specimens with Multiplexed
Quantum Dots,” ACS Nano 4(5): 2755-2765, 2010.
Spectral Isolation
- Pick wavelengths with no overlap.
- At most two stains
Liu et al. “Molecular Mapping of Tumor Heterogeneity on Clinical Tissue Specimens with Multiplexed
Quantum Dots,” ACS Nano 4(5): 2755-2765, 2010.
Nonnegative Matrix Factorization
[Diagram: Image 1, with pixels numbered 1 to 7, is vectorized into a row (1 2 3 4 5 6 7). Y (Multispectral Image, Spectra × Pixels) = A × X]
• A contains spectral shapes.
• X contains the intensity at each pixel.
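The vectorization step can be sketched with reshape. The cube layout (rows, columns, bands) and the toy sizes are assumptions for illustration; real image files may store the bands in a different order.

```octave
H = 2; W = 3; S = 4;                 % toy image size and number of spectral bands (made up)
cube = reshape(1:H*W*S, H, W, S);    % cube(i,j,s): intensity at pixel (i,j) in band s

% One column per pixel, one row per band: Y is S x (H*W).
Y = reshape(cube, H*W, S)';

% Undo the vectorization for one band to get an H x W image back.
s = 2;
band = reshape(Y(s,:), H, W);

disp(size(Y));
```

After factoring Y, each row of X can be reshaped the same way to view one component as an image.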
4 QDs + Autofluorescence
QD 565 nm
QD 655 nm
QD 605 nm
QD 705 nm
Some applications
• Sports Ratings
• Audio (musical notes/instruments)
• Quantum Dot Fluorescence Imaging
• Mass spectrometry imaging
• Genotype Data
Mass Spectrometry Imaging
Albany.edu
Richard Caprioli, Vanderbilt
http://ww2.chemistry.gatech.edu/~bims/
Mass Spectrometry Imaging
• Immunohistochemistry (histochemical stain)
– 1 or 2 molecules
• Quantum dots (quantum dot stain)
– Up to 10 molecules
• Mass spectrometry (imaging mass spectrometry)
– >100 molecules
[Plot: mass spectrum, Abundance vs. m/z]
Nonnegative Matrix Factorization
[Diagram: Image 1, with pixels numbered 1 to 7, is vectorized into a row (1 2 3 4 5 6 7). Y (Mass Spectrometry Image, Molecules × Pixels) = A (Mass Spectrum) × X]
• A contains spectral shapes.
• X contains the intensity at each pixel.
Acetaminophen + Sharpie (pen)
Total Ion Image
Ion Isolation
m/z 152
m/z 443
m/z 174
Source Separation
NMF 1
NMF 3
NMF 2
Some applications
• Sports Ratings
• Audio (musical notes/instruments)
• Quantum Dot Fluorescence Imaging
• Mass spectrometry imaging
• Genotype Data
DNA Sequencing
http://www.shimadzu.com
http://www.ocf.berkeley.edu/~edy/genome/sequencing.html
ccbusm.com
http://hapmap.ncbi.nlm.nih.gov/
Population Inference from
Genotype Information
• Genotype
– Each gene exists in various versions called alleles.
– Different allele combinations produce different phenotypes
• BB or Bb or bB → Brown eyes
• bb → Blue eyes
• Single Nucleotide Polymorphisms (SNP)
– Marker to distinguish different alleles
– G, C, A, or T, compared to a reference, say ‘T’
• Human Genotype:
– 0, 1, or 2 copies of the reference nucleotide
• Data matrix:
– L loci × N individuals
– e.g., 10K × 1,000 data matrix
Nonnegative Matrix Factorization
[Diagram: Y (Loci × Individuals) = A × X]
• A contains populations.
• X contains individuals.
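One half of this factorization is an ordinary least-squares problem: if the population matrix A were known, the per-individual mixing weights X would follow from the backslash operator. The sketch below uses expected genotype values rather than observed 0/1/2 counts, and every number in it is made up for illustration; in practice both A and X are unknown and must be estimated together.

```octave
rand('state', 2);                       % fixed seed so runs are repeatable
L = 50; N = 4; K = 2;                   % loci, individuals, populations (toy sizes)
A = 2 * rand(L, K);                     % expected reference-allele count per locus and population
X_true = [1 0 0.5 0.2; 0 1 0.5 0.8];    % K x N ancestry fractions; each column sums to 1
Y = A * X_true;                         % expected genotypes, values between 0 and 2

X = A \ Y;                              % least-squares estimate of X with A held known
disp(X);
```

The first two columns of X_true describe individuals from a single population, and the last two describe mixed individuals.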
Genomic Isolation:
Homogeneous Populations
Average Population
Genotype
High-dimensional space
(thousands or millions of loci)
More Realistic Mixture
Average Population
Genotype
Individuals are a combination of one or more homogeneous ancestral populations
High-dimensional space
(thousands or millions)
Ancestral Population
(Hardy-Weinberg Principle)
q2 = 0.6
q1 = 0.1
q3 = 0.3
Timing Comparison
Mixture Denseness
Number of Populations
Timing Comparison
Number of Individuals
Real Genotype Data from
HapMap3 Project
Admixture
Least-Squares
What do you want to do with Octave?