A Bayesian multi-scale model of perceptual organization
Probabilistic Reasoning for Modeling Unreliable Data
Ron Tal
York University
Agenda
Modeling Uncertainty
Bayesian Reasoning
M-Estimation
Maximum Likelihood
Common Pitfall
More Advanced Models
Modeling Uncertainty
Why is it necessary?
The only certainty in this world is uncertainty
Often we cannot afford to, or are not able to, explicitly enumerate every variable exactly
Sometimes uncertainty is caused by limits in the reliability of the technology
Making decisions with unreliable data
Modeling Uncertainty (cont.)
Three competing paradigms:
Non-monotonic Logic
Fuzzy Logic
Probability Theory
Since we cannot construct a deterministic solution to
many problems, we model sources of uncertainty as
probability distributions
Bayesian Reasoning
At the core of probabilistic frameworks is Bayesian
Inference
Let’s define a few concepts:
P(E | H) - the probability of witnessing evidence E given a hypothesis H
P(H | E) - the probability of hypothesis H given the evidence E
P(H) - the probability of H prior to observing E
P(E) - the probability of the evidence; by total probability, P(E) = Σ_i P(E | H_i) P(H_i)
Bayesian Reasoning: Bayes’ theorem
States that:
Expressed in terms of our model
P(H | E) = P(E | H) P(H) / P(E)
P(H | E) is what we want to maximize; the likelihood P(E | H) we usually know; a known prior P(H) makes our life simpler; and we don't always care about the normalizer P(E)
Bayesian Reasoning: Bayes’ theorem
If we prefer, it can also be written as
P(H | E) = P(E, H) / P(E)
where P(E, H) is the joint probability of E and H
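As a quick numerical illustration, here is a minimal Python sketch of applying the theorem end to end; the probabilities (a made-up diagnostic-test scenario) are purely illustrative.

# Minimal sketch of Bayes' theorem with made-up numbers:
# H = "patient has the condition", E = "test is positive".
p_h = 0.01                 # prior P(H)
p_e_given_h = 0.95         # likelihood P(E | H)
p_e_given_not_h = 0.05     # P(E | not H)

# Evidence via total probability: P(E) = sum_i P(E | H_i) P(H_i)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: P(H | E) = P(E | H) P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e
print(f"P(H | E) = {p_h_given_e:.3f}")   # ~0.161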
M-Estimation
Bayesian Inference gives us a powerful tool to choose the hypothesis that best models the data
A simple example is finding the parameters of a line of best fit through noisy data
Statistical tools to achieve this are called M-Estimators
The most popular choice is a special case called the “Maximum Likelihood Estimator”
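For instance, a minimal sketch of the line-fitting case in Python (synthetic noisy data; NumPy assumed):

import numpy as np

# Synthetic noisy observations of a line y = 2x + 1 (values are illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of slope and intercept (an M-estimate under a Gaussian noise model).
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)   # close to 2 and 1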
Maximum Likelihood
Recall Bayes’ theorem:
P(H | E) = P(E | H) P(H) / P(E)
The denominator is merely a normalization constant
Maximum Likelihood can be applied if we assume the model prior is constant (uniform)
Maximum Likelihood (cont.)
When the model prior is constant:
L(H | e1, ..., en) = Π_{i=1..n} P(e_i | H)
Thus, we can fit the model parameters by maximizing the likelihood
Maximum Likelihood (cont.)
To determine the parameters of a model, we minimize the negative log likelihood:
θ̂ = argmin_θ [ -log L(θ | e1, ..., en) ]
This lets us avoid playing with products
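A minimal sketch of this in Python, estimating the mean of Gaussian-distributed observations by numerically minimizing the negative log likelihood (synthetic data; SciPy assumed):

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Observed data e_1, ..., e_n (synthetic, true mean 3.0, sigma fixed to 1).
rng = np.random.default_rng(1)
e = rng.normal(loc=3.0, scale=1.0, size=100)

# Negative log likelihood: -sum_i log P(e_i | theta), a sum instead of a product.
def neg_log_likelihood(theta):
    return -np.sum(norm.logpdf(e, loc=theta, scale=1.0))

theta_hat = minimize_scalar(neg_log_likelihood).x
print(theta_hat, e.mean())   # both close to 3.0; the Gaussian MLE equals the sample mean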
Maximum Likelihood (cont.)
For a Gaussian distribution this is especially convenient:
θ̂ = argmin_θ [ -Σ_{i=1..n} log( (1/Z) exp( -(e_i - θ)² / (2σ²) ) ) ]
  = argmin_θ Σ_{i=1..n} [ (e_i - θ)² / (2σ²) - log(1/Z) ]
Maximum Likelihood
Becoming:
θ̂ = argmin_θ Σ_{i=1..n} (e_i - θ)²
since the -log(1/Z) term is constant and the 1/(2σ²) factor only rescales the objective
Least Squares
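A tiny numerical check of this reduction: on Gaussian data, the minimizer of the sum of squared residuals coincides with the sample mean, which is the Gaussian maximum-likelihood estimate (synthetic data and a brute-force grid, purely for illustration):

import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(loc=5.0, scale=2.0, size=200)   # synthetic Gaussian data

# Least-squares objective sum_i (e_i - theta)^2, evaluated on a grid of candidates.
thetas = np.linspace(3.0, 7.0, 2001)
sq_err = ((e[None, :] - thetas[:, None]) ** 2).sum(axis=1)

theta_ls = thetas[np.argmin(sq_err)]
print(theta_ls, e.mean())   # the least-squares minimizer matches the sample mean (the Gaussian MLE)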
Common Pitfall
We love Gaussian Distributions
We love Least-Squares
However, using Least-Squares without going through this probabilistic reasoning is a common rookie mistake
Common Pitfall: Illustration
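For example, a single gross outlier can drag a least-squares line fit far away from the trend of the remaining data; a minimal sketch with made-up numbers:

import numpy as np

# Points that lie almost exactly on y = x, plus one gross outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 2.1, 2.9, 4.0, 5.1, 30.0])   # last point is the outlier

slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)   # slope far from 1: the single outlier dominates the squared error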
Better Modeling
Many statistical tools are available for when the Gaussian assumption fails
Assumptions can include:
Good Data is Gaussian, Outliers are present
pdf can be represented as a mixture of causes
No parametric model is best suited for the job
Robust Statistics
In Robust M-Estimators it is assumed that the data is locally Gaussian, but outliers make traditional Least-Squares unsuitable
With plain Least-Squares we essentially give ‘bad’ data more credibility than it deserves
The robust formulation ‘weights’ the data with a Robust Influence Function
Robust Statistics (cont.)
E.g. Tukey’s Biweight:
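A common form of the biweight gives small residuals a smoothly decaying weight and residuals beyond a cutoff c zero weight; the sketch below uses it for a robust location estimate via iterative reweighting (the cutoff c ≈ 4.685 and the unit-scale residuals are illustrative assumptions):

import numpy as np

def tukey_biweight_weight(r, c=4.685):
    """Tukey biweight weight function: w(r) = (1 - (r/c)^2)^2 for |r| <= c, else 0."""
    w = np.zeros_like(r)
    inside = np.abs(r) <= c
    w[inside] = (1.0 - (r[inside] / c) ** 2) ** 2
    return w

# Robust location estimate via iteratively reweighted least squares (IRLS) sketch.
# In practice residuals are divided by a robust scale estimate (e.g. MAD); unit scale assumed here.
rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 20.0)])  # 5 gross outliers

theta = np.median(data)                       # robust starting point
for _ in range(20):
    w = tukey_biweight_weight(data - theta)   # outliers beyond the cutoff get weight 0
    theta = np.sum(w * data) / np.sum(w)

print(theta, data.mean())   # robust estimate near 0; the plain mean is pulled toward the outliers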
Mixture Models
Each data point can be modeled as arising from one of several possible causes
Essentially a weighted sum of distributions
The Gaussian Mixture Model (GMM) is extremely powerful
EM clustering is the standard estimator for fitting it
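A minimal sketch of fitting such a mixture with EM, here via scikit-learn's GaussianMixture on synthetic one-dimensional data drawn from two causes:

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data drawn from two Gaussian "causes".
rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal(3.0, 1.0, 700)]).reshape(-1, 1)

# Fit a 2-component GMM with EM: estimates mixing weights, means, and variances.
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.weights_, gmm.means_.ravel())   # roughly (0.3, 0.7) and (-2, 3)

# Soft assignment: posterior probability of each cause for a new observation.
print(gmm.predict_proba(np.array([[0.0]])))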
Non-parametric
Actual observed data is used in place of a fitted model
Usually a histogram
To find the ML fit between new observed data and the histogram we can minimize the Bhattacharyya Distance:
D_B(p, q) = -ln Σ_{i∈N} √( p(i) q(i) )
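A minimal sketch of the distance between two normalized histograms (NumPy assumed; the bin counts are illustrative):

import numpy as np

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln sum_i sqrt(p_i * q_i), with p and q normalized to sum to 1."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return -np.log(np.sum(np.sqrt(p * q)))

# Illustrative 4-bin histograms: identical histograms give distance ~0.
print(bhattacharyya_distance([10, 20, 30, 40], [10, 20, 30, 40]))   # ~0
print(bhattacharyya_distance([10, 20, 30, 40], [40, 30, 20, 10]))   # > 0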
Non-parametric (cont.)
Very simple to use
Sometimes most accurate
Very inefficient for problems with high dimensionality
Thank You