Fundamentals of Bayes’ method Lecture 2


Preparatory Statistics
• Beta Handbook: all (most) distributions
• Wikibook in statistics:
http://en.wikibooks.org/wiki/Statistics
• MIT open course: EE&CS:
Introductory Probability and Statistics
• Mainly as lookup literature
Kolmogorov Axioms
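The axioms themselves do not appear in the transcript; the standard statement, for a probability measure P on a sample space Ω with events A, is:

```latex
\begin{align*}
&\text{(1)}\quad P(A) \ge 0 \text{ for every event } A,\\
&\text{(2)}\quad P(\Omega) = 1,\\
&\text{(3)}\quad P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i) \text{ for pairwise disjoint } A_1, A_2, \ldots
\end{align*}
```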
Fundamentals of Bayes’ method
D - observations from simple or complex observation space
θ - state or parameter from state or parameter space Θ
P(D|θ) - likelihood function: probability that D is generated in state θ
f(θ) - prior: what we know about the state
f(θ|D) - posterior: what we know after seeing D
Sign between right and left part: proportionality, f(θ|D) ∝ P(D|θ)f(θ). Multiply right
part with c normalising the left part so it integrates to 1 over Θ
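Written out, the proportionality and the normalising constant c are:

```latex
f(\theta \mid D) \propto P(D \mid \theta)\, f(\theta),
\qquad
f(\theta \mid D) = c\, P(D \mid \theta)\, f(\theta),
\quad
c = \frac{1}{\int_{\Theta} P(D \mid \theta)\, f(\theta)\, d\theta}
```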
State space alternatives
• Discrete, two or more states
(e.g., diseased, non-diseased)
• Continuous, e.g., an interval of reals
• High-dimensional, a vector of reals
(e.g., target position and velocity)
• Composite, a vector plus a label
(e.g., missile type x at point p with velocity v)
• Product space: vector of non-fixed dimension
(e.g., one or more targets approaching,
one or more disease-causing genes)
• Voxel set (medical imaging, atmosphere
mapping)
Testing for disease
• State space (d,n), disease or not
• Observation space (P,N) positive or negative
• Prior: 0.1% of population has disease,
prior is (0.001,0.999)
• Likelihood: Test gives 5% false negatives, 10%
false positives:
P: (0.95, 0.1), N: (0.05, 0.90)
• Combining prior and likelihood:
Prior.*P=[0.00095,0.0999]; ->[0.01,0.99]
Prior.*N=[0.00005,0.8991]; ->[0.0001, 0.9999]
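The same combination can be sketched in Python (plain lists, standard library only; variable names are illustrative):

```python
def posterior(prior, likelihood):
    """Multiply prior by likelihood elementwise and normalise to sum 1."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    c = sum(unnorm)
    return [u / c for u in unnorm]

prior = [0.001, 0.999]    # (diseased, healthy)
lik_pos = [0.95, 0.10]    # P(positive test | state)
lik_neg = [0.05, 0.90]    # P(negative test | state)

post_pos = posterior(prior, lik_pos)  # ~[0.0094, 0.9906]: even a positive test leaves disease unlikely
post_neg = posterior(prior, lik_neg)  # ~[0.0001, 0.9999]
```

Note that despite the 95% sensitive test, a positive result gives only about a 1% disease probability, because the prior is so small.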
Deciding target type
• Attack aircraft: small, dynamic
• Bomber aircraft: large, dynamic
• Civilian: large, slow dynamics
• Prior: (0.5, 0.4, 0.1)
• Observer 1: probably small,
likelihood (0.8, 0.1, 0.1)
• Observer 2: probably fast,
likelihood (0.4, 0.4, 0.2)
Target classification, Matlab
>> prior=[0.5,0.4,0.1 ];
>> lik1=[0.8,0.1,0.1 ];
>> lik2=[0.4,0.4,0.2];
>> post1=prior.*lik1; post1=post1/sum(post1)
post1 =
0.8889 0.0889 0.0222
>> post2=prior.*lik2; post2=post2/sum(post2)
post2 =
0.5263 0.4211 0.0526
>> post12=post1.*lik2; post12=post12/sum(post12)
post12 =
0.8989 0.0899 0.0112
Odds and Bayes’ Factor
Odds for A against B is P(A|D)/P(B|D):
>> OddsCiv=post12(3)/(1-post12(3)) %Civilian vs not
OddsCiv =
0.0114
>> OddsAtt=post12(1)/(1-post12(1)) % Attack vs not
OddsAtt =
8.8889
>>
In the first case the odds are conveniently low: probably not Civilian.
In the second case they are high: probably Attack aircraft.
Inference on a probability
• Bayes’ original problem: estimating
success probability p from experiment
with f failures and s successes, n=s+f;
• Prior is uniform probability for p;
• In particular s=9; n=12;
• Likelihood: p^9 (1 - p)^3
Estimating a probability
>> p=[0:0.01:1];
>> likh=p.^9.*(1-p).^3;
>> posterior=likh/sum(likh);
>> plot(p,posterior)
>> print -depsc beta
>>
NOTE: In Lecture notes example, s and f are swapped
and the computation is analytic instead of numeric!
Estimating a probability
>> postcum=cumsum(posterior);
>> plot(p,postcum,'b-',[0,1],[0.025 0.025],...
'r-',[0,1],[0.975 0.975],'r-');
95% credible interval for p: [0.46, 0.91];
in other words, fairness is not rejected.
Estimate p by the posterior mean:
>> sum(posterior.*p)
ans =
0.7143
>> postcum([50:51])
ans = 0.0427 0.0497
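The grid computation above can be reproduced in Python (standard library only; a numerical sketch of the slide's Matlab, same grid p = 0, 0.01, ..., 1):

```python
# Posterior for p after s=9 successes, f=3 failures, uniform prior.
p = [i / 100 for i in range(101)]
likh = [x**9 * (1 - x)**3 for x in p]
total = sum(likh)
posterior = [l / total for l in likh]

postcum = []            # cumulative distribution on the grid
acc = 0.0
for q in posterior:
    acc += q
    postcum.append(acc)

post_mean = sum(q * x for q, x in zip(posterior, p))          # ~0.714
lower = next(x for x, c in zip(p, postcum) if c >= 0.025)     # lower end of 95% interval
upper = next(x for x, c in zip(p, postcum) if c >= 0.975)     # upper end of 95% interval
```

The interval endpoints land near 0.46 and 0.91 as on the slide, up to grid discretisation.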
Is the coin balanced (LN 2.1.10)?
• Use outcome D:(s,f) in flipping n=s+f times
• Evaluate using two models, one H_r where probability
is 0.5, one H_u where it is uniformly
distributed over [0,1].
• P(D:(s,f)|H_r) = 2^(-n)
• P(D:(s,f)|H_u) = s!f!/(n+1)!
(normalization in Beta dist)
• For s=3, f=9,
Bayes factor P(D|H_u)/P(D|H_r) ≈ 1.4, or
P(H_r|D) ≈ 0.42; P(H_u|D) ≈ 0.58
HW 1
Is the coin balanced (LN 2.1.10)?
>> s=3;f=9;
>> gamma(s+1)*gamma(f+1)/gamma(s+f+2)*2^(s+f)
ans = 1.4322
>> s=6; f=18;
>> gamma(s+1)*gamma(f+1)/gamma(s+f+2)*2^(s+f)
ans = 4.9859
>> s=30;f=90;
>> gamma(s+1)*gamma(f+1)/gamma(s+f+2)*2^(s+f)
ans = 6.4717e+05
% in logs:
>> exp(gammaln(s+1)+gammaln(f+1)-gammaln(s+f+2)+...
log(2)*(s+f))
ans = 6.4717e+05
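The same computation in Python, using math.lgamma to stay in logs (a sketch mirroring the slide's Matlab):

```python
import math

def bayes_factor(s, f):
    """P(D|H_u) / P(D|H_r) = s! f! / (n+1)! * 2^n, computed in logs to avoid overflow."""
    n = s + f
    log_bf = (math.lgamma(s + 1) + math.lgamma(f + 1)
              - math.lgamma(n + 2) + n * math.log(2))
    return math.exp(log_bf)

bayes_factor(3, 9)    # ~1.43
bayes_factor(6, 18)   # ~4.99
bayes_factor(30, 90)  # ~6.47e5: same 1:3 ratio, but far more data
```

Note how the same observed ratio of successes to failures gives ever stronger evidence against the balanced coin as n grows.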
Dissecting Master Bayes’ formula
• Parametrized and composite models:
Recursive & Dynamic inference
• Repeated measurements improve accuracy
• Chapman-Kolmogorov, tracking in time
• Retrodiction: what happened?
Retrodiction (smoothing) gives additional precision, but later.
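The formulas for this slide were not transcribed; the standard recursion they refer to, for state x_t and data D_t, is:

```latex
\underbrace{f(x_t \mid D_{1:t-1}) = \int f(x_t \mid x_{t-1})\, f(x_{t-1} \mid D_{1:t-1})\, dx_{t-1}}_{\text{prediction (Chapman--Kolmogorov)}}
\qquad
\underbrace{f(x_t \mid D_{1:t}) \propto P(D_t \mid x_t)\, f(x_t \mid D_{1:t-1})}_{\text{update (Bayes)}}
```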
MCMC: PET camera
D: film, count by detector j
X: radioactivity in voxel i
a_ij: camera geometry, fraction of emission
from voxel i reaching detector j
Inference about X gives a posterior; its
mean is often a good picture of the patient.
MCMC: PET camera
MCMC: Stochastic solution of probability problems
Generate sequence of states with the same distribution
as the posterior. In this case (X1, X2, …). Each member
is a full 3D image.
ESTIMATE X by taking mean over trace.
MCMC: PET camera
Main MCMC loop: We have (X1, X2, ..., Xk) and want
to compute X(k+1).
Propose a new image Z by changing the value in one voxel.
Compute the acceptance probability a = π(Z)/π(Xk).
Accept X(k+1)=Z if a>1, or with probability a.
If not accepted, X(k+1)=Xk.
Matlab: if a>rand, X(k+1)=Z; else X(k+1)=X(k); end
In practice: compute in logarithms to avoid underflow.
Differential computation: most of the terms in π(Z) are the
same as in π(Xk).
Sinogram and reconstruction
(Figures: tumour; fruit fly, Drosophila family, X-ray.)
Does Bayes give the right answer?
• Output is a posterior. How accurate?
Depends on the prior and likelihood
assessed.
• If data is generated by distribution g(x)
and inference is for the parameter of f(x|θ),
then asymptotically the posterior for θ will
concentrate on argmin_θ KL(g(·), f(·|θ)).
KL: Kullback-Leibler distance
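For reference, the Kullback-Leibler divergence between the true density g and the model density f(·|θ) is:

```latex
\mathrm{KL}\big(g(\cdot),\, f(\cdot \mid \theta)\big)
= \int g(x)\, \log \frac{g(x)}{f(x \mid \theta)}\, dx
```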
Does Bayes give right answer?
• Coherence:
If you evaluate bets on uncertain events, anyone who
does not use Bayes’ rule to evaluate them can potentially
lose an unlimited amount to you, who do use Bayes’ rule.
(Freedman & Purves, 1969)
• Consistency:
Observing properties of Boolean (set) algebra, the
calculus of plausibility has to be embeddable in an
ordered field where + and × correspond to the
combination functions for deriving plausibilities of
disjunction and conjunction. (Jaynes Ch. 2; Arnborg &
Sjödin, MaxEnt 2000, ECCAI 2000)