
Bayesian Parameter
Estimation Techniques for
LISA
Nelson Christensen, Carleton College, Northfield,
Minnesota, USA
Journées LISA-France, Meudon, May 15-16, 2006
Outline of talk
Bayesian methods - Quick Review
– Fundamentals
– Markov chain Monte Carlo (MCMC) Methods
LISA data analysis applications
– LISA source confusion problem
– Time Delay Interferometry variables
– Parameter Estimation: Binary inspiral signals
MCMC Collaborators
• Glasgow Physics and Astronomy: Dr. Graham
Woan, Dr. Martin Hendry, John Veitch
• Auckland Statistics: Dr. Renate Meyer,
Richard Umstätter, Christian Röver
Why Bayesian methods?
Orthodox statistical methods are concerned solely with
deductions following experiments with populations:
" The trouble is that what we [statisticians] call
modern statistics was developed under strong
pressure on the part of biologists. As a result, there is
practically nothing done by us which is directly
applicable to problems of astronomy."
Jerzy Neyman, founder of frequentist hypothesis testing.
Why Bayesian methods?
Bayesian methods explore the joint probability space of data and
hypotheses within some global model, quantifying their joint uncertainty
and consistency as a scalar function:
p(data, hypotheses | world view)
(the vertical bar "|" means "given")
There is only one algebra consistent with this idea (and some further,
very reasonable, constraints), which leads to (amongst other things) the
product rule:
p(d, h | w) = p(d | w) p(h | d, w) = p(h | w) p(d | h, w)
Why Bayesian methods?
Bayes’ theorem: the appropriate rule for updating our degree
of belief (in one of several hypotheses within some world view)
when we have new data:
p(h | d, w) = p(h | w) p(d | h, w) / p(d | w)
In model/data notation:
Posterior = Prior × Likelihood / Evidence
p(model | data, I) = p(model | I) p(data | model, I) / p(data | I)
where p(data | I) is the evidence, or "global likelihood".
We can usually calculate all these terms.
Marginalisation
• We can also deduce the marginal probabilities. If X and Y are propositions that
can take on values drawn from Y ∈ {y_1, y_2, ..., y_m} and X ∈ {x_1, x_2, ..., x_n},
then
p(x_i) = p(x_i) Σ_{j=1..m} p(y_j | x_i) = Σ_{j=1..m} p(x_i) p(y_j | x_i) = Σ_{j=1..m} p(x_i, y_j)
(using Σ_{j=1..m} p(y_j | x_i) = 1)
this gives us the probability of X when we don't care about Y. In these
circumstances, Y is known as a nuisance parameter.
• All these relationships can be smoothly extended from discrete probabilities to
probability densities, e.g.
p(x) = ∫ p(x, y) dy
where "p(y)dy" is the probability that y lies in the range y to y+dy.
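A quick numeric check of the discrete marginalisation rule above (the joint table here is hypothetical):

```python
import numpy as np

# Hypothetical joint distribution p(x_i, y_j) over 3 x-values and 2 y-values.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.25],
                  [0.05, 0.25]])

# Marginalise out the nuisance parameter y:  p(x_i) = sum_j p(x_i, y_j)
p_x = joint.sum(axis=1)
print(p_x)  # approximately [0.3, 0.4, 0.3]
```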
Markov Chain Monte Carlo methods
• We need to be able to evaluate marginal integrals of the form
p(x) = ∫ ... ∫ p(x, x_1, ..., x_n) dx_1 ... dx_n
• The approach is to sample in the (x, x_1, ..., x_n) space so that the density of
samples reflects the posterior probability p(x, x_1, ..., x_n).
• MCMC algorithms perform random walks in the parameter space so that the
probability of being in a hypervolume dV is ∝ p(x, x_1, ..., x_n) dV.
• The random walk is a Markov chain: the transition probability of making a step
depends only on the proposed location a'_k and the current location a_k.
• MCMC methods have a demonstrated record of success on problems with large
numbers of parameters.
• Used by Google, WMAP, Financial Markets, LISA???
Metropolis-Hastings Algorithm
• We want to explore p(a). Let the current location be a_t.
1) Choose a candidate state a'_t using a proposal distribution q(a'_t | a_t).
2) Compute the Metropolis ratio
R = [ p(a'_t) p(B_k | a'_t) q(a_t | a'_t) ] / [ p(a_t) p(B_k | a_t) q(a'_t | a_t) ]
3) If R ≥ 1 then make the step (i.e., a_{t+1} = a'_t);
if R < 1 then make the step with probability R, otherwise set a_{t+1} = a_t, so that
the location is repeated.
I.e., make the step with an acceptance probability
α(a'_t | a_t) = min{ 1, [ p(a'_t) p(B_k | a'_t) q(a_t | a'_t) ] / [ p(a_t) p(B_k | a_t) q(a'_t | a_t) ] }
4) Choose the next candidate based on the (new) current position…
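As an illustration, a minimal random-walk Metropolis sampler (a sketch, not the code used in this work; the target here is a hypothetical one-dimensional standard normal, and with a symmetric Gaussian proposal the q terms cancel from R):

```python
import numpy as np

def metropolis_hastings(log_target, a0, n_steps, step_size=2.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalised log density.
    With a symmetric Gaussian proposal, q(a'|a) = q(a|a'), so the q terms
    cancel and the Metropolis ratio reduces to R = p(a') / p(a)."""
    rng = np.random.default_rng(seed)
    a = a0
    chain = [a]
    for _ in range(n_steps):
        a_prop = a + step_size * rng.standard_normal()  # candidate state
        log_r = log_target(a_prop) - log_target(a)      # log of Metropolis ratio R
        if np.log(rng.uniform()) < log_r:               # accept with prob min(1, R)
            a = a_prop                                  # make the step
        chain.append(a)                                 # on rejection the location repeats
    return np.array(chain)

# Target: a standard normal, log p(a) = -a^2/2 up to a constant.
chain = metropolis_hastings(lambda a: -0.5 * a**2, a0=5.0, n_steps=20000)
post_burn = chain[2000:]   # discard the burn-in period
print(post_burn.mean(), post_burn.std())  # close to 0 and 1
```

The histogram of `post_burn` approximates the target pdf, as described on the next slide.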
Metropolis-Hastings Algorithm
•
{at} form a Markov chain drawn
from p(a), so a histogram of {at}, or
any of its components,
approximates the (joint) pdf of
those components.
• The form of the acceptance
probability guarantees reversibility
even for proposal distributions that
are asymmetric.
• There is a burn-in period before the
equilibrium distribution is reached:
Application to LISA data analysis
Source Confusion Problem
TDI Variables
Parameter Estimation for Signals
LISA source identification
• This has implications for the analysis of LISA data, which is expected to contain
many (perhaps 50,000) signals from white dwarf binaries. The data will contain
resolvable binaries and binaries that just contribute to the overall noise (either
because they are faint or because their frequencies are too close together).
Bayes can sort these out without having to introduce ad hoc acceptance and
rejection criteria, and without needing to know the “true noise level”
(whatever that means!).
Things that are not generally true
• “A time series of length T has a frequency resolution of 1/T.”
Frequency resolution also depends on signal-to-noise ratio. We know the period
of PSR 1913+16 to 1e-13 Hz, but haven’t been observing it for 3e5 years.
In fact the frequency resolution is ∝ 1/(T × snr).
• “you can subtract sources piece-wise from data.”
Only true if the source signals are orthogonal over the observation period.
• “frequency confusion sets a fundamental limit for low-frequency LISA.”
This limit is set by parameter confusion, which includes sky location and other
relevant parameters (with a precision dependent on snr).
LISA source identification
• Toy (zeroth-order LISA) problem:
You are given a time series of 1000 data points comprising a number of sinusoids
embedded in gaussian noise. Determine the number of sinusoids, their amplitudes, phases
and frequencies, and the standard deviation of the noise.
• We could think of this as comparing hypotheses H_m that there are m sinusoids in the data,
with m ranging from 0 to m_max. Equivalently, we could consider this a parameter fitting
problem, with m an unknown parameter within the global model.
• The signal is s(t) = Σ_{j=1..m} [A_j cos(2π f_j t) + B_j sin(2π f_j t)],
parameterised by a_m = (A_1, B_1, f_1, ..., A_m, B_m, f_m),
giving data d_i = s(t_i) + n_i with gaussian noise of standard deviation σ,
and a gaussian likelihood p({d} | a_m, σ, m).
LISA source identification
• With suitably chosen priors on m and a_m we can write down the full posterior
pdf of the model. But this is (3m+2)-dimensional, with m ~ 100 in our toy problem,
so the direct evaluation of marginal pdfs (for, say, m or σ, or to extract the pdf
of a component amplitude) is unfeasible.
• Explore this space using a modified Markov Chain Monte Carlo technique…
Reversible Jump MCMC
• Trans-dimensional moves (changing m) cannot be performed in conventional
MCMC. We need to make jumps between parameter spaces of different dimension.
• Reversibility is guaranteed if the acceptance probability for an upward
transition includes the Jacobian determinant of the transformation taking
the old parameters [and a proposal random vector r drawn from q(r)] to the
new set of parameters.
• We use two sorts of trans-dimensional moves:
– 'split and merge', involving adjacent signals
– 'birth and death', involving single signals
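A hypothetical sketch of how a split move and its inverse merge move can form the required bijection (this particular linear map and the proposal q(u) are illustrative, not the mapping used in this work):

```python
def split(A, f, u):
    """Split one sinusoid (A, f) into two of similar frequency but about
    half the amplitude, using a random vector u = (uA, uf) drawn from some
    proposal q(u).  The map (A, f, u) -> (A1, f1, A2, f2) is a bijection,
    as reversible-jump moves require."""
    A1, A2 = A / 2 + u[0], A / 2 - u[0]
    f1, f2 = f + u[1], f - u[1]
    return A1, f1, A2, f2

def merge(A1, f1, A2, f2):
    """Inverse move: merge two sinusoids to their mean, recovering u."""
    A, f = A1 + A2, (f1 + f2) / 2
    u = ((A1 - A2) / 2, (f1 - f2) / 2)
    return A, f, u

# Round trip: merging the split components recovers the original state.
A, f, u = merge(*split(1.0, 0.25, (0.1, 0.003)))
print(A, f, u)  # approximately 1.0, 0.25, (0.1, 0.003)
```

For a linear map like this one the Jacobian determinant entering the acceptance ratio is a constant, which keeps the acceptance computation cheap.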
Trans-dimensional split-and-merge transitions
• A split transition takes a parameter subvector (A, B, f) from a_k
and splits it into two components of similar frequency but about half the
amplitude:
(Sketch: one peak in the amplitude-frequency plane splits into two.)
Trans-dimensional split-and-merge transitions
• A merge transition takes two parameter subvectors and merges them to their
mean:
(Sketch: two nearby peaks in the amplitude-frequency plane merge into one.)
Initial values
• A good initial choice of parameters greatly decreases the length of the
‘burn-in’ period to reach convergence (equilibrium). For simplicity we
use a thresholded FFT:
• The threshold is set low, as it is easier to destroy bad signals than to
create good ones.
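A minimal sketch of such a thresholded-FFT initialisation (the data, signal parameters and threshold value here are hypothetical):

```python
import numpy as np

# Hypothetical data: two sinusoids (f = 0.1 and 0.3) in unit-variance noise.
rng = np.random.default_rng(1)
t = np.arange(1000)
data = (1.5 * np.cos(2 * np.pi * 0.1 * t)
        + 1.0 * np.sin(2 * np.pi * 0.3 * t)
        + rng.standard_normal(1000))

# Initial parameter guesses: FFT bins whose power exceeds a low threshold.
spec = np.fft.rfft(data)
freqs = np.fft.rfftfreq(1000)
power = np.abs(spec) ** 2 / 1000

threshold = 5.0   # deliberately low: bad starting signals are cheap to remove
peaks = freqs[power > threshold]
print(peaks)      # includes bins at (or very near) f = 0.1 and f = 0.3
```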
Simulations
• 1000 time samples with Gaussian noise
• 100 embedded sinusoids of the form A cos 2πft + B sin 2πft
• A's and B's chosen randomly in [-1, 1]
• f's chosen randomly in [0, 0.5]
• Noise σ = 1
Priors
• A_m, B_m uniform over [-5, 5]
• f_m uniform over [0, 0.5]
• σ_m² has a standard vague inverse gamma prior, IG(σ_m²; 0.001, 0.001)
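The simulated dataset described above can be generated in a few lines (the random seed is arbitrary):

```python
import numpy as np

# Generate the toy dataset: 1000 samples, 100 sinusoids of the form
# A_j cos(2*pi*f_j*t) + B_j sin(2*pi*f_j*t), plus unit-variance Gaussian noise.
rng = np.random.default_rng(42)
n, m = 1000, 100
t = np.arange(n)

A = rng.uniform(-1, 1, m)      # amplitudes in [-1, 1]
B = rng.uniform(-1, 1, m)
f = rng.uniform(0, 0.5, m)     # frequencies in [0, 0.5]

signal = (A[:, None] * np.cos(2 * np.pi * f[:, None] * t)
          + B[:, None] * np.sin(2 * np.pi * f[:, None] * t)).sum(axis=0)
data = signal + rng.standard_normal(n)   # noise sigma = 1
print(data.shape)  # (1000,)
```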
results teaser (spectral density)
(Figure: estimated energy spectral density vs frequency.)
Strong, close signals
(Figure: A vs f and B vs f scatter plots for two strong signals separated by about 1/T.)
Signal mixing
• Two signals (red and green) approaching in frequency:
Marginal pdfs for m and σ
p(m | {D}, I) = ∫ p(m, {A}_m, {B}_m, {f}_m, σ | {D}, I) d{A}_m d{B}_m d{f}_m dσ
Spectral density estimates
Joint energy/frequency posterior
Well-separated signals (~1/T)
These signals (separated by ~1 Nyquist
step) can be easily distinguished and
parameterized.
Closely-spaced signals
Signals can be distinguished, but parameter estimation is difficult.
95% contour
Source Confusion: Extensions to full LISA
• We have implemented an MCMC method of extracting useful
information from zeroth-order LISA data under difficult
conditions.
• Extension to orbital Doppler/source location information should
improve source identification. This code extension is currently
being tested.
• Extension to TDI variables is straightforward. Raw Doppler
measurements could also be used, with a suitable data
covariance matrix.
• There is nothing special about WD-WD signals here. Similar
analyses could be performed for BH mergers, EMRI sources etc…
Time Delay Interferometry (TDI) variables
• Principal Component Analysis (PCA)
• See Romano and Woan gr-qc/0602033
• Estimate signal parameters and noise with
MCMC
• All information is in the likelihood
• TDI variables fall right out
Simple Example with Correlated Noise
Data from 2 detectors:
s_1 = p + n_1 + h_1
s_2 = p + n_2 + h_2
Astrophysical signal: h_1 = 2a and h_2 = a
n_1 and n_2 uncorrelated noise, p common noise; all noise zero mean:
<n_1²> = <n_2²> = σ_n² and <p²> = σ_p²
Uncorrelated: <n_1 n_2> = <n_1 p> = <n_2 p> = 0
Likelihood: p(s_1, s_2 | a) ∝ exp[-Q/2], where
Q = (s_i - h_i) C⁻¹_ij (s_j - h_j),  C the noise covariance matrix
Principal Component Analysis
C: find its eigenvectors and factorise the likelihood:
p(s_1, s_2 | a) ∝ p(s_+ | a) p(s_- | a)
where s_+ = s_1 + s_2 and s_- = s_1 - s_2, with
p(s_+ | a) ∝ exp[-(s_+ - 3a)² / (8σ_p² + 4σ_n²)]
p(s_- | a) ∝ exp[-(s_- - a)² / (4σ_n²)]
For LISA σ_p² >> σ_n², so there is
no loss of information by
doing statistical inference
only on the s_- term - TDI.
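A quick numerical check of this factorisation (the signal value a and the noise levels are hypothetical, chosen so that σ_p >> σ_n):

```python
import numpy as np

# Simulate the two-detector toy model: s_i = p + n_i + h_i with
# h1 = 2a, h2 = a, and dominant common noise p (sigma_p >> sigma_n).
rng = np.random.default_rng(0)
N = 100_000
a = 0.5
sigma_p, sigma_n = 10.0, 1.0

p = sigma_p * rng.standard_normal(N)
n1 = sigma_n * rng.standard_normal(N)
n2 = sigma_n * rng.standard_normal(N)
s1 = p + n1 + 2 * a
s2 = p + n2 + a

# Principal components of the noise covariance: the sum and difference.
s_plus = s1 + s2    # mean 3a, variance 4*sigma_p^2 + 2*sigma_n^2
s_minus = s1 - s2   # mean a,  variance 2*sigma_n^2: the common noise cancels

print(s_minus.mean(), s_minus.var())  # close to 0.5 and 2.0
```

The difference channel carries the signal at the small noise level σ_n, while the noisy sum channel is dominated by p, illustrating why inference on s_- alone loses almost nothing.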
More Realistic LISA
Data streams:
s_1 = D_3 p_2 - p_1 + n_1 + h_1
s'_1 = D_2 p_3 - p_1 + n'_1 + h'_1
…
D - delay operator
MCMC: everything is in the likelihood.
Toy problem: a sinusoidal gravitational wave from above
LISA; 3 signal parameters and 9 noise levels. 1000
data points at 1 Hz.
σ_p² >> σ_n² and p >> h
Trace Plots
Markov chains; the posterior PDFs are made from these.
Parameter Estimation
Posterior
PDFs for signal
parameters
and
noise levels.
TDI Variables: Summary
• TDI variables fall out of likelihood - matches
well to MCMC approach
• Simplify calculation for MCMC too
• Incorporate LISA complexity, step by step.
Realistic noise and signal terms, LISA orbit,
arm length change, etc.
• MCMC methods handle large numbers of
parameters; computational time grows linearly
• Long-term effort to develop realistic LISA
scenario, with good prospects for success
Parameter Estimation for Signals
• Binary inspiral as an MCMC exercise
• Numerous parameters
• MCMC provides a means for parameter estimates
• Applications for other types of signals too
• Demonstrated to work with a network of
ground-based interferometers; extend this
work to LISA
Interferometer Detection
• Single detector, 5 parameters: m_1, m_2, effective
distance d_L, phase φ_c and time t_c at coalescence
• Reparameterise the masses: chirp mass m_c and a second mass parameter
• For multiple detectors: coherent addition of signals
• Parameters for estimation: m_1, m_2, φ_c, t_c, actual
distance d, polarisation ψ, inclination angle ι of the
orbital plane, sky position RA and dec
Amplitude Correction Work with Inspirals
We have already developed an inspiral (ground-based
interferometer) MCMC pipeline for signals that are
3.5 post-Newtonian (PN) in phase and 2.5 PN in amplitude
Time domain templates, then FFT into
frequency domain
MCMC provides parameter estimation and
statistics.
Future work will include spin of masses
Likelihood for Inspiral Signal
• Work in the frequency domain
• The detector output z(t) is the sum of the gravitational-wave
signal s(t, θ), which depends on unknown
parameters θ, and the noise n(t)
• z(t) = s(t, θ) + n(t)
• Noise spectral density S_n(f)
p(z | θ) ∝ exp[ -2 ∫_0^∞ df |z̃(f) - s̃(f, θ)|² / S_n(f) ]
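A discrete approximation of this frequency-domain log-likelihood (a sketch; the white-noise spectral density and the sinusoidal "template" are hypothetical stand-ins for an inspiral waveform):

```python
import numpy as np

def log_likelihood(z, template, S_n, dt):
    """Frequency-domain log-likelihood (up to a constant) for data
    z(t) = s(t, theta) + n(t):
        log p(z | theta) = -2 * integral_0^inf |z~(f) - s~(f)|^2 / S_n(f) df
    approximated with a one-sided FFT sum.  S_n is the one-sided noise
    spectral density evaluated at the rfft frequencies."""
    n = len(z)
    df = 1.0 / (n * dt)
    residual = np.fft.rfft(z - template) * dt   # continuous-FT convention
    return -2.0 * np.sum(np.abs(residual) ** 2 / S_n) * df

# Hypothetical check: the likelihood prefers the true template.
rng = np.random.default_rng(3)
dt, n = 1.0, 1000
t = np.arange(n) * dt
true = np.sin(2 * np.pi * 0.1 * t)
z = true + rng.standard_normal(n)
S_n = np.full(n // 2 + 1, 2.0 * dt)   # white noise, sigma = 1, one-sided PSD

better = log_likelihood(z, true, S_n, dt)
worse = log_likelihood(z, np.sin(2 * np.pi * 0.12 * t), S_n, dt)
print(better > worse)  # True
```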
Ground-based example: 2 LIGO sites and Virgo.
Code works, but optimization is still in progress.
MCMC LISA Summary
• LISA faces extremely complex data analysis
challenges
• MCMC methods have a demonstrated record of
success on problems with large numbers of
parameters
• MCMC for source confusion - binary
background
• TDI variables, signal and noise estimation
• Parameter estimation: binary inspirals and
other waveforms
Delayed rejection
• Sampling can be improved (beyond Metropolis-Hastings) if a second proposal is made
following, and based on, an initial rejected proposal. The initial proposal is only rejected
if this second proposal is also rejected.
• The acceptance probability of the second stage has to be chosen to preserve reversibility
(detailed balance) at both stages.
• Delayed Rejection Reversible Jump Markov Chain Monte Carlo method
'DRRJMCMC': Green & Mira (2001), Biometrika 88, 1035-1053.
Label-switching
• As set up, the posterior is invariant under signal renumbering: we
have not specified what we mean by 'signal 1'.
• Break the symmetry by ordering in frequency:
1. Fix m at the most probable number of signals; this part of the chain contains n MCMC steps.
2. Order the nm MCMC parameter triples (A, B, f) in frequency.
3. Perform a rough density estimate to divide the samples into m blocks.
4. Perform an iterative minimum-variance cluster analysis on these blocks.
5. Merge clusters to get exactly m signals.
6. Tag the parameter triples in each cluster.
Application to the LISA confusion problem