A Bayesian multi-scale model of perceptual organization


Probabilistic Reasoning for Modeling
Unreliable Data
Ron Tal
York University
Agenda
 Modeling Uncertainty
 Bayesian Reasoning
 M-Estimation
 Maximum Likelihood
 Common Pitfall
 More Advanced Models
Modeling Uncertainty
 Why is it necessary?
 The only certainty in this world is uncertainty
 Often we cannot afford, or are simply unable, to explicitly enumerate every variable
 Sometimes uncertainty is caused by limits on the reliability of the technology
 Making decisions with unreliable data
Modeling Uncertainty (cont.)
 Three competing paradigms:
 Non-monotonic Logic
 Fuzzy Logic
 Probability Theory
 Since we cannot construct a deterministic solution to
many problems, we model sources of uncertainty as
probability distributions
Bayesian Reasoning
 At the core of probabilistic frameworks is Bayesian
Inference
 Let’s define a few concepts:
 P( E | H ) - The probability of witnessing evidence E given a
hypothesis H
 P( H | E ) - The probability of hypothesis H given the evidence E
 P( H ) - Probability of H prior to observing E
 P( E ) - Probability of the evidence, obtained by marginalizing over all hypotheses:
P( E ) = Σ_i P( E | H_i ) P( H_i )
Bayesian Reasoning: Bayes’ theorem
 States that:
P( H | E ) = P( E | H ) P( H ) / P( E )
 Expressed in terms of our model:
 P( H | E ) - what we want to maximize
 P( E | H ) and P( H ) - we usually know!
 P( E ) - we don't always care!
 Our life becomes simpler
Bayesian Reasoning: Bayes’ theorem
 If we prefer, it can also be written as
P( H | E ) = P( E ∩ H ) / P( E )
where P( E ∩ H ) is the joint probability
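As an illustration (not from the slides), here is a minimal Python sketch of Bayes' theorem with made-up numbers for two hypotheses and one piece of evidence:

```python
# Minimal sketch of Bayes' theorem; the priors and likelihoods below are
# made-up numbers for illustration only.
prior = {"H0": 0.7, "H1": 0.3}        # P(H)
likelihood = {"H0": 0.2, "H1": 0.9}   # P(E | H)

# P(E) = sum_i P(E | H_i) P(H_i)  -- the normalization constant
p_e = sum(likelihood[h] * prior[h] for h in prior)

# P(H | E) = P(E | H) P(H) / P(E)
posterior = {h: likelihood[h] * prior[h] / p_e for h in prior}
print(posterior)  # after seeing E, H1 becomes the more probable hypothesis
```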
M-Estimation
 Bayesian Inference gives us a powerful tool for choosing the hypothesis that best models the data
 A simple example is the set of parameters of a line of
best fit through noisy data
 Statistical tools to achieve this are called M-Estimators
 The most popular choice is a special case called
“Maximum Likelihood Estimator”
Maximum Likelihood
 Recall Bayes’ theorem:
P( H | E ) = P( E | H ) P( H ) / P( E )
 The denominator is merely a normalization constant
 Maximum Likelihood can be applied if we assume the model prior is constant
Maximum Likelihood (cont.)
 When model prior is constant:
L( H | e_1, ..., e_n ) = ∏_{i=1..n} P( e_i | H )
 Thus, we can fit model parameters by maximizing the
likelihood
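A minimal sketch (an assumed setup, not from the slides): the data are Gaussian with unknown mean, the hypothesis H is a candidate mean, and the likelihood is the product of the per-point probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)   # e_1, ..., e_n

def likelihood(mean, sigma=1.0):
    # L(H | e_1, ..., e_n) = prod_i P(e_i | H)
    p = np.exp(-(data - mean) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    return np.prod(p)

print(likelihood(2.0), likelihood(0.0))  # the candidate near the true mean scores higher
```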
Maximum Likelihood (cont.)
 To determine the parameters of a model, we minimize the negative log likelihood:
θ̂ = argmin_θ [ −log L( θ ) ]
 This lets us avoid playing with products
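A minimal sketch of the same estimation problem using the negative log likelihood (the sum is numerically far better behaved than the raw product); the Gaussian setup is again an assumption for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)

def neg_log_likelihood(theta, sigma=1.0):
    # -log L(theta) = sum_i [ (e_i - theta)^2 / (2 sigma^2) + log Z ]
    return np.sum((data - theta) ** 2 / (2 * sigma ** 2) + 0.5 * np.log(2 * np.pi * sigma ** 2))

theta_hat = minimize_scalar(neg_log_likelihood).x
print(theta_hat, data.mean())  # the ML estimate coincides with the sample mean
```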
Maximum Likelihood (cont.)
 For a Gaussian distribution this is especially convenient:
θ̂ = argmin_θ [ −Σ_{i=1..n} log( (1/Z) exp( −( e_i − θ )² / 2σ² ) ) ]
  = argmin_θ [ Σ_{i=1..n} ( ( e_i − θ )² / 2σ² − log( 1/Z ) ) ]
Maximum Likelihood
 Since log( 1/Z ) and 2σ² are constants, this becomes
θ̂ = argmin_θ Σ_{i=1..n} ( e_i − θ )²
 Least Squares!
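As a sanity check (an illustration, not from the slides): under Gaussian noise the ML fit of a line is exactly the ordinary least-squares fit, which np.polyfit solves directly:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)   # line plus Gaussian noise

# np.polyfit minimizes sum_i (y_i - (a*x_i + b))^2, i.e. the Least-Squares /
# Gaussian-ML objective derived above
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # close to the true parameters 3.0 and 1.0
```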
Common Pitfall
 We love Gaussian Distributions
 We love Least-Squares
 However, using Least-Squares without the process of
probabilistic reasoning is a common rookie mistake
Common Pitfall: Illustration
Better Modeling
 Many statistical tools are available for when the Gaussian
assumption fails
 Assumptions can include
 Good data is Gaussian, but outliers are present
 The pdf can be represented as a mixture of causes
 No parametric model is well suited for the job
Robust Statistics
 In Robust M-Estimation it is assumed that the data is locally Gaussian, but outliers make traditional Least-Squares unsuitable
 With plain Least-Squares, we essentially give 'bad' data more credibility than it deserves
 The Robust formulation 'weights' the data with a Robust Influence Function
Robust Statistics (cont.)
 E.g. Tukey’s Biweight:
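The transcript does not reproduce the formula or plot shown here; below is a minimal sketch of the biweight influence function in the form commonly attributed to Tukey, with the usual tuning constant c = 4.685 (an assumption, not stated in the slides):

```python
import numpy as np

def tukey_psi(r, c=4.685):
    # psi(r) = r * (1 - (r/c)^2)^2 for |r| <= c, and 0 otherwise:
    # residuals beyond the cutoff c get zero influence on the estimate.
    r = np.asarray(r, dtype=float)
    psi = r * (1 - (r / c) ** 2) ** 2
    return np.where(np.abs(r) <= c, psi, 0.0)

print(tukey_psi([0.5, 2.0, 10.0]))  # the gross outlier (10.0) is ignored entirely
```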
Mixture Models
 Data can be represented as arising from one of several possible causes
 Essentially a weighted sum of distributions
 The Gaussian Mixture Model (GMM) is extremely powerful
 EM Clustering is the ideal estimator for fitting it
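A minimal sketch of fitting a GMM with EM, using scikit-learn's GaussianMixture on synthetic data (the component count and parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic data drawn from two causes: one cluster around -2, one around +3
data = np.concatenate([rng.normal(-2, 0.5, 300),
                       rng.normal(3, 1.0, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)   # EM under the hood
print(gmm.means_.ravel(), gmm.weights_)  # recovers the two causes and their mixing weights
```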
Non-parametric
 Actual observed data is used in place of a fitted model
 Usually a histogram
 To find the ML fit between newly observed data and the histogram, we can minimize the Bhattacharyya distance:
D_B( p, q ) = −ln( Σ_{i∈N} √( p(i) q(i) ) )
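A minimal sketch of the Bhattacharyya distance between two normalized histograms (the example bin counts are made up):

```python
import numpy as np

def bhattacharyya(p, q):
    # D_B(p, q) = -ln( sum_i sqrt(p(i) * q(i)) ), with p and q normalized
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return -np.log(np.sum(np.sqrt(p * q)))

hist_model = [10, 40, 30, 20]   # stored histogram (hypothetical)
hist_new   = [12, 35, 33, 20]   # newly observed data (hypothetical)
print(bhattacharyya(hist_model, hist_new))  # near 0 for similar histograms
```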
Non-parametric
 Very simple to use
 Sometimes the most accurate option
 Very inefficient for problems with high dimensionality
Thank You 