Transcript Document

HW1
Beta(p;4,10), Beta(p;9,15), Beta(p;6,6)
Beta(p;1,1)
likelihood
FDR, Evidence Theory, Robustness
General Dependency test
function PV=testdep(D,N)
% General dependency test TESTDEP(D,N)
% D: n by d data matrix
% N: Monte Carlo comparison sample size
F=[]; [n,d]=size(D);
mv=mean(D);
D=D-repmat(mv,n,1);    % remove mean
st=std(D);
D=D./repmat(st,n,1);   % standardize variance
for i=1:d
  for j=1:i-1
    q=mean(D(:,i).*D(:,j));
    F=[F q];
  end
end
Empirical no-dependency distribution
EE=[];
for iN=1:N-1
  E=[];
  for i=1:d
    if i>1
      D(:,i)=D(randperm(n),i);   % permute column i to break any dependence
      for j=1:i-1
        q=mean(D(:,i).*D(:,j));
        E=[E q];
      end
    end
  end
  EE=[EE;E];
end
Computing P-value
%Sorting twice gives value ranks of EE - test statistics
EE=[F ; EE];
[EEs,iix]=sort(EE);
[EEs,iix]=sort(iix);
% p-value is proportional to value rank
PV=iix(1,:)/N;
% reshuffle to matrix
PVM(ix)=PV
Correlation coefficient
>> D=[1:100]';
>> D=[D -D D.^2 D+200*rand(size(D)) randn(size(D))];
>> [c pv]=corrcoef(D)
c=
1.0000 -1.0000 0.9689 0.2506 -0.0977
-1.0000 1.0000 -0.9689 -0.2506 0.0977
0.9689 -0.9689 1.0000 0.2959 -0.0540
0.2506 -0.2506 0.2959 1.0000 -0.0242
-0.0977 0.0977 -0.0540 -0.0242 1.0000
Correlation coefficient
>> [c pv]=corrcoef(D)
>> pv=testdep(D,N)
[p-value matrices not legible in the transcript; legible testdep
entries include 0.0119, 0.3335, 0.9936, 0.1668]
Multiple testing
• The probability of rejecting a true null
hypothesis at the 99% level is 1%.
• Thus, if you repeat the test 100 times, each time
with new data, you will reject at least once with
probability 1-0.99^100 ≈ 0.63.
• Bonferroni correction, FWE control:
in order to reach significance level 1% in an
experiment involving 1000 tests, each test
should be checked at significance level
1/1000 % (i.e., 0.01/1000).
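The slide's numbers can be checked directly:

```python
# With a per-test false-positive rate of 1%, the chance of at least one
# false rejection in 100 independent tests is 1 - 0.99^100 ~ 0.63; the
# Bonferroni-corrected per-test level for 1000 tests at overall level 1%
# is 0.01/1000 = 1e-5.
alpha = 0.01
p_any = 1 - (1 - alpha) ** 100
bonferroni = alpha / 1000
```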
Multiple testing
• Several approaches try to verify an excess of
small p-values.
• Sort the set of p-values and test whether there is an
excess of small values - this is an indication
of false null hypotheses.
Approaches to multiple testing
Definition of FDR, positive
correlation
[Figure: sorted p-values plotted against rank, with the FDR-corrected
lower envelope]
• No significance: all sorted p-values stay above the FDR envelope.
• Some significance: one of the 15 first tests is not null, at 5%
significance.
• More significance (FDR): 95% of the first 3 tests are not null
hypotheses.
• Even more significance: 95% of the first 14 tests are not null -
worth the effort to investigate all of them.
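The sorted-p-value test sketched above corresponds to the standard Benjamini-Hochberg step-up procedure; the slide's Fdrex routine is not shown, so this is an illustrative stand-in with a different name:

```python
import numpy as np

def bh_fdr(pv, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the k smallest
    p-values, where k is the largest index with p_(k) <= q*k/m.
    Controls FDR at level q under independence or positive
    correlation of the tests."""
    pv = np.asarray(pv, dtype=float)
    m = len(pv)
    order = np.argsort(pv)
    below = pv[order] <= q * np.arange(1, m + 1) / m
    k = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

For pv = [0.001, 0.008, 0.039, 0.041, 0.6, 0.9] at q=0.05 only the two smallest p-values survive: the third, 0.039, exceeds its threshold 0.05*3/6 = 0.025.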
FDR Example - independence
Fdrex(pv,0.05,0)
10 signals suggested.
Smallest p-value not
significant with
Bonferroni correction
(0.019 vs 0.013)
FDR Example - dependency
Fdrex(pv,0.05,1)
10 signals were suggested
assuming independence;
all disappear with the
dependency correction term.
Ed Jaynes devoted a large
part of his career to promoting
Bayesian inference.
He also championed the
use of Maximum Entropy in physics.
Outside physics, he met
resistance from people who had
already invented other methods.
Why should statistical mechanics
say anything about our daily human
world?
Generalisation of Bayes/Kalman:
What if:
• You have no prior?
• The likelihood is infeasible to compute (imprecision)?
• The parameter space is vague, i.e., not the same for
all likelihoods (fuzziness, vagueness)?
• The parameter space has complex structure?
(a simple structure is, e.g., a Cartesian product
of the reals, R, and some finite sets)
Philippe Smets (1938-2005)
Developed Dempster's and Shafer's method
of uncertainty management into the
Transferable Belief Model, which combines imprecise
'evidence' (likelihood or prior) using Dempster's rule,
and uses the pignistic transformation to get a sharp
decision criterion.
Some approaches...
• Robust Bayes: replace distributions by convex sets of
distributions (Berger et al.)
• Dempster/Shafer/TBM: describe imprecision with
random sets
• DSm: transform the parameter space to capture
vagueness (Dezert/Smarandache, controversial)
• FISST, FInite Set STatistics: generalises
observation and parameter spaces to products of
spaces described as random sets
(Goodman, Mahler, Nguyen)
Combining Evidence
Robust Bayes
• Priors and likelihoods are convex sets of probability distributions
(Berger, de Finetti, Walley, ...): imprecise probability:
f(θ|D) ∝ f(D|θ) f(θ)
F(θ|D) ∝ F(D|θ) F(θ)
• Every member of the posterior set is a 'parallel combination' of one member of
the likelihood set and one member of the prior set.
• For decision making: Jaynes recommends using the member of the
posterior set with maximum entropy (MaxEnt estimate).
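The 'parallel combination' can be sketched numerically: each posterior member comes from combining one member of the prior set with one member of the likelihood set. All numbers below are illustrative, not from the slides.

```python
import numpy as np

def posterior(prior, lik):
    """Precise Bayes: normalized pointwise product."""
    p = np.asarray(prior) * np.asarray(lik)
    return p / p.sum()

def entropy(p):
    return -np.sum(p * np.log(p))

# A prior set given by its extreme points, and one precise likelihood
# (illustrative numbers); the posterior set is spanned by the
# pairwise 'parallel combinations'.
prior_extremes = [np.array([0.5, 0.5]), np.array([0.8, 0.2])]
lik = np.array([0.9, 0.1])
posterior_set = [posterior(p, lik) for p in prior_extremes]

# Jaynes' recommendation: decide using the maximum-entropy member.
maxent_member = max(posterior_set, key=entropy)
```

Here the uniform prior yields the flatter posterior (0.9, 0.1), which is the MaxEnt choice.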
Ellsberg's Paradox:
Ambiguity Avoidance
Urn A contains
4 white and 4 black
balls, and 4 of
unknown colour (black
or white).
Urn B contains
6 white and
6 black balls.
You win one krona if you draw a black ball. From which urn
do you want to draw?
A precise Bayesian should first assume how the ?-balls are
coloured and then answer. But a majority prefer urn B, even
if black is exchanged for white.
Prospect Theory: Kahneman,
Tversky
• Safety belts eliminate car
collision injuries at low speed
completely.
(I BUY IT!!!)
• Safety belts eliminate 90% of
injuries in car accidents; in
10% the speed is too high.
(So belts are not that
good!???)
How are imprecise probabilities used?
• The expected utility of each decision alternative becomes an interval
instead of a point: maximax, maximin, maxi-mean?
[Figure: utility u as a function of alternative a, with the Bayesian,
pessimist and optimist choices marked]
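The three decision rules on interval-valued expected utilities can be stated in a few lines; the actions and intervals below are illustrative.

```python
# Expected-utility intervals (lower, upper) for three actions:
eu = {"a1": (0.2, 0.9), "a2": (0.4, 0.5), "a3": (0.3, 0.6)}

maximin  = max(eu, key=lambda a: eu[a][0])        # pessimist: best lower bound
maximax  = max(eu, key=lambda a: eu[a][1])        # optimist: best upper bound
maximean = max(eu, key=lambda a: sum(eu[a]) / 2)  # 'maxi-mean': best midpoint
```

The pessimist picks a2 (best worst case 0.4), while the optimist and the midpoint rule both pick a1.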
Dempster/Shafer/Smets
• Evidence is a random set over Θ,
i.e., a probability distribution over 2^Θ.
• Probability of a singleton: 'belief' allocated to one
alternative, i.e., probability.
• Probability of a non-singleton: 'belief' allocated to a set of
alternatives, but not to any particular part of it.
• Evidences are combined by random-set intersection,
conditioned to be non-empty (Dempster's rule).
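Dempster's rule as described above - intersect the random sets, drop empty intersections, renormalize - fits in a short sketch; the two mass functions are made-up examples over the frame {A, B}.

```python
from itertools import product

def dempster(m1, m2):
    """Dempster's rule: random-set intersection conditioned on
    non-emptiness. m1, m2 map frozensets (subsets of the frame) to
    probability mass; mass on empty intersections (the conflict) is
    discarded and the rest renormalized."""
    out, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            out[c] = out.get(c, 0.0) + pa * pb
        else:
            conflict += pa * pb
    if conflict >= 1.0:
        raise ValueError("total conflict: combination undefined")
    return {s: v / (1.0 - conflict) for s, v in out.items()}

A, B = frozenset({"A"}), frozenset({"B"})
m1 = {A: 0.6, A | B: 0.4}   # evidence mostly for A, some mass undecided
m2 = {B: 0.5, A | B: 0.5}
m12 = dempster(m1, m2)
```

The conflicting mass 0.6*0.5 = 0.3 (A against B) is discarded, so the combined masses on A, B and {A,B} are 0.3/0.7, 0.2/0.7 and 0.2/0.7.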
Logic of Dempster's rule
• Each observer has a private state space and
assesses the posterior over it.
• Each private state can correspond to one or
more global or common states (a multivalued
mapping).
• The observers' state spaces are assumed
independent.
Correspondence DS-structure - set of probability distributions
For a pdf (bba) m over 2^Θ, consider all
ways of reallocating the probability mass
of non-singletons to their member atoms:
this gives a convex set of probability distributions
over Θ. Example: Θ={A,B,C}

bba          set of pdfs
A: 0.1       A: 0.1+0.5*x
B: 0.3       B: 0.3+0.5*(1-x)
C: 0.1       C: 0.1
AB: 0.5      for all x in [0,1]

Can we regard any set of pdfs as a bba? The answer is NO!
There are more convex sets of pdfs than DS-structures.
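The reallocation above can be enumerated: the extreme points of the convex set arise by sending each non-singleton's mass entirely to one of its atoms. A minimal sketch, reproducing the slide's example:

```python
from itertools import product

def extreme_pdfs(bba):
    """Extreme points of the convex set of pdfs encoded by a bba
    (dict: frozenset of atoms -> mass): allocate each focal set's
    mass entirely to one of its atoms, in every possible way."""
    focal = sorted(bba.items(), key=lambda kv: sorted(kv[0]))
    points = set()
    for choice in product(*[sorted(s) for s, _ in focal]):
        p = {}
        for atom, (_, mass) in zip(choice, focal):
            p[atom] = p.get(atom, 0.0) + mass
        points.add(tuple(sorted(p.items())))
    return points

# The slide's example over {A, B, C}:
bba = {frozenset({"A"}): 0.1, frozenset({"B"}): 0.3,
       frozenset({"C"}): 0.1, frozenset({"A", "B"}): 0.5}
pts = extreme_pdfs(bba)
```

The two vertices, (A,B,C) = (0.6, 0.3, 0.1) and (0.1, 0.8, 0.1), are exactly the slide's x=1 and x=0 endpoints; the convex set is the segment between them.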
Representing a probability set as a bba: 3-element universe
[Figure: black - the convex set; blue - rounded up using the
lower envelope; red - rounded down using linear programming]
Rounding is not unique!
Another appealing conjecture
• A precise pdf can be regarded as a (singleton) random set.
• Bayesian combination of precise pdfs corresponds to random-set
intersection (conditioned on non-emptiness).
• A DS-structure corresponds to a Choquet capacity
(a set of pdfs).
• Is it then reasonable to combine Choquet capacities by (non-empty)
random-set intersection (Dempster's rule)?
• The answer is NO!
• Counterexample: Dempster's combination cannot be obtained
by combining members of the prior and likelihood sets:
Arnborg: JAIF vol 1, No 1, 2006
Consistency of fusion operators
[Figure: probability simplex with axes P(A) and P(B) in a 3-element
universe, P(C)=1-P(A)-P(B), showing the two operands (evidence) and
the results of robust fusion, Dempster's rule (DS), modified
Dempster's rule (MDS), and the rounded robust combination]
Deciding target type
• Attack aircraft: small, dynamic
• Bomber aircraft: large, dynamic
• Civilian: large, slow dynamics
• Prior: (0.5, 0.4, 0.1)
• Observer 1: probably small,
likelihood (0.8, 0.1, 0.1)
• Observer 2: probably fast,
likelihood (0.4, 0.4, 0.2)
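With the precise numbers from the slide, the two observers' likelihoods combine with the prior by ordinary Bayes: pointwise product, then normalization.

```python
import numpy as np

# (attack, bomber, civilian), numbers from the slide
prior = np.array([0.5, 0.4, 0.1])
lik1 = np.array([0.8, 0.1, 0.1])    # observer 1: probably small
lik2 = np.array([0.4, 0.4, 0.2])    # observer 2: probably fast

post = prior * lik1 * lik2
post = post / post.sum()            # posterior ~ (0.899, 0.090, 0.011)
```

The unnormalized masses are (0.16, 0.016, 0.002), so the posterior puts about 90% on the attack aircraft.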
Estimators
Center of enclosing sphere
Pignistic
MaxEnt
3 states: P(C) = 1-P(A)-P(B)
What about Smets' TBM?
• TBM combines the original Dempster's rule with the
pignistic transformation. This is not compatible with
precise Bayesian analysis.
• However, there is nothing against claiming TBM to be
some kind of robust Bayesian scheme.
• Main problem: Dempster's rule and its motivation
via multivalued mappings run against the dominant
argumentation used in introductions and tutorials:
TBM is incompatible with the capacity interpretation
of DS structures.