
18.338 Course Project
Numerical Methods for Empirical Covariance Matrix Analysis
Miriam Huntley
SEAS, Harvard University
May 15, 2013
Real World Data
“When it comes to RMT in the real world, we know close to nothing.”
- Prof. Alan Edelman, last week
RMT
Who Cares about
Covariance Matrices?
• Basic assumption in many areas of data analysis:
multivariate data $X = Y \Sigma^{1/2}$
• You get $X$, want to find $\Sigma$
• $X^T X / n$ can be a very bad estimator if $p/n$ is finite
• Current standard using PCA (=SVD): distinguish from
null model X = YI
• In RMT language: any eigenvalues which lie very far
away from the distribution expected for a white
Wishart matrix should be considered signal
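The null-model comparison in the bullets above can be sketched numerically. This is a minimal illustration, not the procedure used in the talk: the sizes `n` and `p`, the planted eigenvalue of 10, and the 10% margin above the Marchenko-Pastur upper edge are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 500                 # illustrative sizes (assumed)
gamma = p / n

# Population spectrum: identity plus one planted "signal" eigenvalue (hypothetical).
pop_eigs = np.ones(p)
pop_eigs[0] = 10.0

# X = Y Sigma^{1/2} for a diagonal Sigma: scale column j of Y by sqrt(pop_eigs[j]).
X = rng.standard_normal((n, p)) * np.sqrt(pop_eigs)
sample_eigs = np.linalg.eigvalsh(X.T @ X / n)

# Upper edge of the white-Wishart (Marchenko-Pastur) bulk for identity covariance.
mp_upper = (1 + np.sqrt(gamma)) ** 2     # 2.25 for gamma = 0.25
margin = 0.1 * mp_upper                  # ad hoc safety margin (assumed)

# Eigenvalues well beyond the null-model edge are flagged as signal.
signal = sample_eigs[sample_eigs > mp_upper + margin]
print("MP upper edge:", mp_upper)
print("flagged eigenvalues:", signal)
```

Note that the bulk of the sample eigenvalues spreads over roughly the whole Marchenko-Pastur interval even though all but one population eigenvalue equal 1, which is the sense in which the sample covariance is a poor estimator at finite $p/n$.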
Who Cares about
Covariance Matrices?
Gene Expression Data
[Figure: gene expression data. Axes: Genes (ticks from 500 to 4000) and Samples (ticks from 20 to 80).]
Data from:
Alizadeh A, et al. (2000) Distinct types of
diffuse large B-cell lymphoma identified by
gene expression profiling. Nature 403:503-511.
Why adventure beyond
white Wishart?
• Null model X = YI not particularly sophisticated.
Can we do better?
• Noise with structure
  ◦ Example: Financial data $x_i = \sigma^t \xi_i$
  ◦ What if there is no right edge?
• Known $\Sigma$: how many samples do we need before
we recover it from empirical data?
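The sample-size question in the last bullet can be explored with a small experiment. The sketch below assumes a diagonal $\Sigma$ with half its eigenvalues at 1 and half at 3, Gaussian entries, and a worst-case gap between sorted sample and population eigenvalues as the error measure; the helper name `spectrum_error` is hypothetical and none of these choices come from the slides.

```python
import numpy as np

def spectrum_error(n, pop_eigs, rng):
    """Largest gap between sorted sample and population eigenvalues."""
    p = pop_eigs.size
    # X = Y Sigma^{1/2} with diagonal Sigma (column scaling).
    X = rng.standard_normal((n, p)) * np.sqrt(pop_eigs)
    sample_eigs = np.linalg.eigvalsh(X.T @ X / n)   # returned in ascending order
    return np.max(np.abs(sample_eigs - np.sort(pop_eigs)))

rng = np.random.default_rng(2)
pop_eigs = np.repeat([1.0, 3.0], 50)                # p = 100 (assumed toy spectrum)

errors = {n: spectrum_error(n, pop_eigs, rng) for n in (100, 1_000, 10_000)}
for n, err in errors.items():
    print(f"n = {n:6d}, p/n = {pop_eigs.size / n:.2f}, max eigenvalue error = {err:.3f}")
```

In this toy setting the error shrinks only slowly as $p/n$ decreases, so many more samples than dimensions are needed before the sample spectrum resembles the true one.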
Approach:
General MP Law
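• Data matrix $X$ where $X = Y \Sigma_p^{1/2}$ ($X$ and $Y$ are $n \times p$, $\Sigma_p$ is $p \times p$)
• $Y$ entries are iid (real or complex) with $E(Y_{i,j}) = 0$ and
$E(|Y_{i,j}|^2) = 1$; define $\gamma = p/n$
• Let $H_p$ be the spectral distribution of $\Sigma_p$ and
assume $H_p$ converges weakly to $H_\infty$
• Let $F_p$ be the spectral distribution of $XX^T$ (empirical)
and $v_{F_p}$ its Stieltjes transform
• Then: $v_{F_p} \to v_\infty$, where
$$-\frac{1}{v_\infty(z)} = z - \gamma \int \frac{\lambda \, dH_\infty(\lambda)}{1 + \lambda v_\infty(z)}, \quad \forall z \in \mathbb{C}^+$$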
See: Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a
class of large-dimensional random matrices. J. Multivariate Anal. 54(2), 175–192.
El Karoui, N., Spectrum estimation for large dimensional covariance matrices using random
matrix theory, Ann. Statist. 36 (2008), 2757–2790
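As a sanity check on the statement above, the general MP equation can be tested on simulated data at a single point $z$. This sketch makes several assumptions not stated on the slide: Gaussian entries, a diagonal $\Sigma_p$ with eigenvalues 1 and 3 in equal proportion, the normalization $XX^T/n$, and the Stieltjes transform convention $v(z) = \int dF(\lambda)/(\lambda - z)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 500                           # illustrative sizes (assumed)
gamma = p / n
pop_eigs = np.repeat([1.0, 3.0], p // 2)   # spectrum of Sigma_p, i.e. H_p (assumed)

Y = rng.standard_normal((n, p))
X = Y * np.sqrt(pop_eigs)                  # X = Y Sigma^{1/2} for diagonal Sigma

# Eigenvalues of the n x n matrix X X^T / n (1/n normalization is an assumption).
eigs = np.linalg.eigvalsh(X @ X.T / n)

def v_emp(z):
    """Empirical Stieltjes transform, convention: mean of 1 / (lambda - z)."""
    return np.mean(1.0 / (eigs - z))

z = 2.0 + 1.0j                             # any point in the upper half-plane
v = v_emp(z)

lhs = -1.0 / v
rhs = z - gamma * np.mean(pop_eigs / (1.0 + pop_eigs * v))   # integral against H_p
residual = abs(lhs - rhs)
print("residual of the general MP equation:", residual)
```

The residual should be small at these sizes and shrink further as $n$ and $p$ grow with $p/n$ fixed.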
Numerical Solutions of
General MP
Single, True Covariance Matrix
True Covariance Matrix Spectral Distribution
$$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \quad \forall z \in \mathbb{C}^+$$
Discretize in z
Numerically Solve
Empirical Spectral Distribution
Live Demos…
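One way the forward pipeline on this slide could look in code is sketched below. It is not the solver from the live demo: the plain fixed-point iteration, the two-atom spectrum $H$, the offset `eta` above the real axis, and the function name `solve_v` are all assumptions made for illustration. The density of the $p \times p$ sample covariance spectrum is recovered through the relation $v(z) = -(1-\gamma)/z + \gamma m(z)$ between $v$ and the Stieltjes transform $m$ of that spectrum.

```python
import numpy as np

def solve_v(z, atoms, weights, gamma, tol=1e-10, max_iter=20000):
    """Solve -1/v = z - gamma * sum_k w_k t_k / (1 + t_k v) by fixed-point iteration.

    z is a 1-D array of points in the upper half-plane; H is discrete with
    atoms t_k and weights w_k.
    """
    z = np.asarray(z, dtype=complex)
    v = -1.0 / z                                   # starting guess, lies in C^+
    for _ in range(max_iter):
        integral = (weights * atoms / (1.0 + atoms * v[:, None])).sum(axis=1)
        v_new = -1.0 / (z - gamma * integral)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

gamma = 0.5
atoms = np.array([1.0, 3.0])        # assumed toy spectrum of the true covariance
weights = np.array([0.5, 0.5])

eta = 0.05                          # distance above the real axis (assumed)
x = np.linspace(0.05, 10.0, 400)    # discretization in z
z = x + 1j * eta
v = solve_v(z, atoms, weights, gamma)

# Convert v to the Stieltjes transform m of the p x p spectrum, then read off
# a smoothed density from its imaginary part.
m = (v + (1.0 - gamma) / z) / gamma
density = m.imag / np.pi

integral = (weights * atoms / (1.0 + atoms * v[:, None])).sum(axis=1)
residual = np.max(np.abs(-1.0 / v - (z - gamma * integral)))
print("max residual:", residual)
```

Smaller `eta` gives a sharper density but slower convergence of the iteration, so the value here is a compromise rather than a recommendation.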
Inverse Solutions of
General MP?
Single, True Covariance Matrix
True Covariance Matrix Spectral Distribution
$$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \quad \forall z \in \mathbb{C}^+$$
Discretize in z
Numerically Solve
Empirical Spectral Distribution
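To make the inverse question concrete, here is a deliberately simplified sketch. It assumes the candidate atom locations of $H$ are known and only estimates their weights, using the fact that the general MP equation is linear in those weights once $v(z)$ is plugged in from data. The unconstrained least-squares fit is a stand-in chosen for brevity; it is not the convex-optimization approach of El Karoui (2008), and the function names `empirical_v` and `estimate_weights` are hypothetical.

```python
import numpy as np

def empirical_v(sample_eigs, n, z):
    """Stieltjes transform of X X^T / n built from the p eigenvalues of X^T X / n."""
    p = sample_eigs.size
    total = (1.0 / (sample_eigs[None, :] - z[:, None])).sum(axis=1)
    total += (n - p) * (-1.0 / z)          # the other n - p eigenvalues are zero
    return total / n

def estimate_weights(sample_eigs, n, atoms, z):
    """Least-squares fit of weights w_k in gamma * sum_k w_k t_k/(1 + t_k v) = z + 1/v."""
    gamma = sample_eigs.size / n
    v = empirical_v(sample_eigs, n, z)
    A = gamma * atoms[None, :] / (1.0 + atoms[None, :] * v[:, None])
    b = z + 1.0 / v
    # Stack real and imaginary parts, plus one row asking the weights to sum to 1.
    A_real = np.vstack([A.real, A.imag, np.ones((1, atoms.size))])
    b_real = np.concatenate([b.real, b.imag, [1.0]])
    w, *_ = np.linalg.lstsq(A_real, b_real, rcond=None)
    return w

rng = np.random.default_rng(3)
n, p = 1000, 500                                   # illustrative sizes (assumed)
true_eigs = np.repeat([1.0, 3.0], p // 2)          # true weights are 0.5 and 0.5
X = rng.standard_normal((n, p)) * np.sqrt(true_eigs)
sample_eigs = np.linalg.eigvalsh(X.T @ X / n)

atoms = np.array([1.0, 3.0])                       # candidate atoms, assumed known
z = np.linspace(0.5, 8.0, 30) + 1.0j               # evaluation points in C^+
w = estimate_weights(sample_eigs, n, atoms, z)
print("estimated weights:", w)
```

With unknown atom locations the fit needs a fine grid of candidates and a positivity constraint on the weights, and it becomes much more delicate, which is the harder problem the slide's question mark points at.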
Toy Example:
Block Covariance Matrix
?
Warning: Don’t try this at home
Toy Example:
Block Covariance Matrix
Thanks!
This was fun.
• Colwell LJ, Qin Y, Manta A and Brenner MP (2013). Signal
identification from Sample Covariance Matrices with
Correlated Noise. Under Review
• El Karoui, N., Spectrum estimation for large dimensional
covariance matrices using random matrix theory, Ann.
Statist. 36 (2008), 2757–2790
• Marčenko, V. A. and Pastur, L. A. (1967). Distribution
of eigenvalues in certain sets of random matrices. Mat.
Sb. (N.S.) 72, 507–536.
• Silverstein, J. W. and Bai, Z. D. (1995). On the empirical
distribution of eigenvalues of a class of large-dimensional
random matrices. J. Multivariate Anal. 54(2), 175–192.