
Probabilistic Inference
Lecture 1
M. Pawan Kumar
[email protected]
Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/
About the Course
• 7 lectures + 1 exam
• Probabilistic Models – 1 lecture
• Energy Minimization – 4 lectures
• Computing Marginals – 2 lectures
• Related Courses
• Probabilistic Graphical Models (MVA)
• Structured Prediction
Instructor
• Assistant Professor (2012 – Present)
• Center for Visual Computing
• 12 Full-time Faculty Members
• 2 Associate Faculty Members
• Research Interests
• Probabilistic Models
• Machine Learning
• Computer Vision
• Medical Image Analysis
Students
• Third year at ECP
• Specializing in Machine Learning and Vision
• Prerequisites
• Probability Theory
• Continuous Optimization
• Discrete Optimization
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
Example (on board) !!
Outline
• Probabilistic Models
• Markov Random Fields (MRF)
• Bayesian Networks
• Factor Graphs
• Conversions
• Exponential Family
• Inference
MRF
[Figure: grid of unobserved random variables connected by edges (neighbors)]
Edges define a neighborhood over random variables
MRF
[Figure: 3×3 grid of variables V1, V2, …, V9]
Variable Va takes a value or a label va from a set L = {l1, l2,…, lh}
V = v is called a labeling
The label set L is discrete and finite.
MRF
MRF assumes the Markovian property for P(v)
MRF
Va is conditionally independent of any non-neighboring Vb given Va’s neighbors
Hammersley-Clifford Theorem
MRF
[Figure: 3×3 grid with pairwise potentials on the edges, e.g. ψ12(v1,v2) and ψ56(v5,v6)]
Probability P(v) can be decomposed into clique potentials
MRF
[Figure: 3×3 grid of variables V1–V9, each connected to its observed data da via a potential, e.g. ψ1(v1,d1)]
Probability P(v) proportional to Π(a,b) ψab(va,vb)
Probability P(d|v) proportional to Πa ψa(va,da)
MRF
Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)
Z is known as the partition function
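To make the partition function concrete, here is a minimal brute-force sketch (not from the lecture) that computes Z and P(v,d) for a tiny chain of three binary variables; all potential values are illustrative.

```python
import itertools
import numpy as np

# Tiny pairwise MRF: 3 variables in a chain, binary labels {0, 1}.
# psi_unary[a][v_a] plays the role of psi_a(v_a, d_a) for fixed observed data d.
psi_unary = np.array([[2.0, 1.0],
                      [1.0, 3.0],
                      [1.5, 1.5]])
# psi_pair[(a, b)][v_a, v_b] are the pairwise potentials psi_ab(v_a, v_b).
psi_pair = {(0, 1): np.array([[2.0, 0.5], [0.5, 2.0]]),
            (1, 2): np.array([[2.0, 0.5], [0.5, 2.0]])}

def unnormalized(v):
    """Product of all unary and pairwise potentials for the labeling v."""
    p = 1.0
    for a, va in enumerate(v):
        p *= psi_unary[a][va]
    for (a, b), psi in psi_pair.items():
        p *= psi[v[a], v[b]]
    return p

# The partition function Z sums the unnormalized product over all labelings,
# so that P(v, d) = unnormalized(v) / Z sums to one.
labelings = list(itertools.product([0, 1], repeat=3))
Z = sum(unnormalized(v) for v in labelings)
for v in labelings:
    print(v, unnormalized(v) / Z)
```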
MRF
[Figure: 3×3 grid with a high-order potential ψ4578(v4,v5,v7,v8) over the clique {V4,V5,V7,V8}]
Pairwise MRF
[Figure: 3×3 grid with unary potentials, e.g. ψ1(v1,d1), and pairwise potentials, e.g. ψ56(v5,v6)]
Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)
Z is known as the partition function
MRF
A is conditionally independent of B given C if
there is no path from A to B when C is removed
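This separation criterion is easy to check algorithmically. A minimal sketch (illustrative node names, using the 3×3 grid from the figures above) tests reachability after removing the conditioning set C:

```python
from collections import deque

def conditionally_independent(edges, A, B, C):
    """MRF separation test: no path from a node in A to a node in B once the
    nodes in C are removed from the undirected graph."""
    adjacency = {}
    for u, w in edges:
        adjacency.setdefault(u, set()).add(w)
        adjacency.setdefault(w, set()).add(u)
    blocked = set(C)
    frontier = deque(a for a in A if a not in blocked)
    visited = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False                      # found a path that avoids C
        for w in adjacency.get(u, ()):
            if w not in visited and w not in blocked:
                visited.add(w)
                frontier.append(w)
    return True

# The 3x3 grid of variables 1..9 used in the figures above.
grid = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9),
        (1, 4), (4, 7), (2, 5), (5, 8), (3, 6), (6, 9)]
print(conditionally_independent(grid, {1}, {9}, {5}))        # False: path 1-2-3-6-9
print(conditionally_independent(grid, {1}, {9}, {2, 4}))     # True: V1 is cut off
```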
Conditional Random Fields (CRF)
CRF assumes the Markovian property for P(v|d)
Hammersley-Clifford Theorem
CRF
Probability P(v|d) proportional to Πa ψa(va;d) Π(a,b) ψab(va,vb;d)
Clique potentials that depend on the data
CRF
Probability P(v|d) = (1/Z) Πa ψa(va;d) Π(a,b) ψab(va,vb;d)
Z is known as the partition function
MRF and CRF
Probability P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb)
Outline
• Probabilistic Models
• Markov Random Fields (MRF)
• Bayesian Networks
• Factor Graphs
• Conversions
• Exponential Family
• Inference
Bayesian Networks
[Figure: directed acyclic graph over variables V1–V8]
Directed Acyclic Graph (DAG) – no directed loops
Ignoring directionality of edges, a DAG can have loops
Bayesian Networks
Bayesian Network concisely represents the probability P(v)
Bayesian Networks
Probability P(v) = Πa P(va|Parents(va))
P(v1)P(v2|v1)P(v3|v1)P(v4|v2)P(v5|v2,v3)P(v6|v3)P(v7|v4,v5)P(v8|v5,v6)
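A minimal sketch (random conditional probability tables, structure taken from the slide) that evaluates this factorization and checks that the joint distribution normalizes to 1:

```python
import itertools
import numpy as np

# DAG from the slide, 0-indexed: parents[a] lists Parents(V_{a+1}).
parents = {0: [], 1: [0], 2: [0], 3: [1], 4: [1, 2], 5: [2], 6: [3, 4], 7: [4, 5]}

rng = np.random.default_rng(0)
cpt = {}
for a, pa in parents.items():
    # One row per joint parent configuration, one column per label of V_a.
    table = rng.random((2 ** len(pa), 2))
    cpt[a] = table / table.sum(axis=1, keepdims=True)      # each row sums to 1

def joint(v):
    """P(v) = prod_a P(v_a | Parents(v_a)), read off the conditional tables."""
    p = 1.0
    for a, pa in parents.items():
        row = sum(v[q] << i for i, q in enumerate(pa))     # parent configuration index
        p *= cpt[a][row, v[a]]
    return p

# Sanity check: the joint distribution sums to 1 over all 2^8 labelings.
print(sum(joint(v) for v in itertools.product([0, 1], repeat=8)))
```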
Bayesian Networks
[Figure: example Bayesian network (courtesy Kevin Murphy)]
Bayesian Networks
Va is conditionally independent of its ancestors given its parents
Bayesian Networks
Conditional independence of A and B given C
Courtesy Kevin Murphy
Outline
• Probabilistic Models
• Markov Random Fields (MRF)
• Bayesian Networks
• Factor Graphs
• Conversions
• Exponential Family
• Inference
Factor Graphs
[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]
Two types of nodes: variable nodes and factor nodes
Bipartite graph between the two types of nodes
Factor Graphs
Factor a connects V1 and V2, with potential ψa(v1,v2)
Factor graphs concisely represent the probability P(v)
Factor Graphs
Written generically: ψa({v}a), where {v}a is the set of variables connected to factor a
Factor graphs concisely represent the probability P(v)
Factor Graphs
Factor b connects V2 and V3, with potential ψb(v2,v3)
Factor graphs concisely represent the probability P(v)
Factor Graphs
Written generically: ψb({v}b)
Factor graphs concisely represent the probability P(v)
Factor Graphs
Probability P(v) = (1/Z) Πa ψa({v}a)
Z is known as the partition function
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
MRF to Factor Graphs
Bayesian Networks to Factor Graphs
Factor Graphs to MRF
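The conversions themselves are worked through on the board. As a rough illustration of the first one, here is a minimal sketch (with made-up potential names) of turning a pairwise MRF into a factor graph: every unary and pairwise potential becomes a factor node connected to the variables in its scope.

```python
# Pairwise MRF with (made-up) unary potentials psi_a and pairwise potentials psi_ab.
mrf_unaries = {1: "psi_1", 2: "psi_2", 3: "psi_3"}
mrf_pairwise = {(1, 2): "psi_12", (2, 3): "psi_23"}

# Factor graph: one factor node per potential, connected to the variables in its scope.
variable_nodes = set(mrf_unaries)
factor_scopes = {}
for a, name in mrf_unaries.items():
    factor_scopes[name] = (a,)
for (a, b), name in mrf_pairwise.items():
    factor_scopes[name] = (a, b)

# Edges of the bipartite factor graph are (factor, variable) pairs.
edges = [(f, v) for f, scope in factor_scopes.items() for v in scope]
print(sorted(edges))
```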
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
Motivation
Random Variable V
Label set L = {l1, l2,…, lh}
Samples V1, V2, …, Vm that are i.i.d.
Functions ϕα: L → Reals
α indexes a set of functions
Empirical expectations: μα = (Σi ϕα(Vi))/m
Expectation wrt distribution P: EP[ϕα(V)] = Σi ϕα(li)P(li)
Given empirical expectations, find compatible distribution
Underdetermined problem
Maximum Entropy Principle
max Entropy of the distribution
s.t. Distribution is compatible
Maximum Entropy Principle
max -Σi P(li)log(P(li))
s.t. Distribution is compatible
Maximum Entropy Principle
max -Σi P(li)log(P(li))
s.t. Σi ϕα(li)P(li) = μα for all α
Σi P(li) = 1
P(v) proportional to exp(-Σα θαϕα(v))
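As a sanity check on this derivation, the following sketch (a hypothetical three-label set with made-up empirical expectations μ) fits the parameters θ by gradient descent on the dual of the maximum-entropy problem and recovers a distribution whose expectations match μ.

```python
import numpy as np

# Label set with h = 3 labels and two feature functions phi_alpha : L -> Reals.
phi = np.array([[1.0, 0.0],      # phi(l1)
                [0.0, 1.0],      # phi(l2)
                [1.0, 1.0]])     # phi(l3)
mu = np.array([0.6, 0.7])        # made-up empirical expectations mu_alpha

theta = np.zeros(2)
for _ in range(5000):
    # P(li) proportional to exp(-sum_alpha theta_alpha phi_alpha(li))
    scores = np.exp(-phi @ theta)
    P = scores / scores.sum()
    model_mu = phi.T @ P                    # E_P[phi_alpha(V)]
    theta -= 0.5 * (mu - model_mu)          # gradient step on the dual

# The fitted exponential-family distribution matches the empirical expectations.
scores = np.exp(-phi @ theta)
P = scores / scores.sum()
print(np.round(phi.T @ P, 3))               # approximately [0.6, 0.7]
```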
Exponential Family
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2,…, lh}
Labeling V = v, va ∈ L for all a ∈ {1, 2,…, n}
Functions ϕα: Ln → Reals
α indexes a set of functions
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Parameters θα, Sufficient Statistics Φα, Normalization Constant A(θ)
Minimal Representation
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Parameters θα, Sufficient Statistics Φα, Normalization Constant A(θ)
No non-zero c such that Σα cαΦα(v) = constant for all v
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2}
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {-1, +1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
va, θa for all Va ∈ V
vavb, θab for all (Va,Vb) ∈ E
Ising Model
P(v) = exp{-Σa θava -Σa,b θabvavb - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {-1, +1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
va, θa for all Va ∈ V
vavb, θab for all (Va,Vb) ∈ E
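To make the minimal representation concrete, here is a small brute-force sketch (a chain graph with made-up parameters) that evaluates the exponent of P(v) for every labeling in {-1,+1}^n and computes the log-partition function:

```python
import itertools
import numpy as np

# Ising model on a 4-node chain in the minimal representation, labels {-1, +1}.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]
theta_unary = np.array([0.2, -0.5, 0.1, 0.3])       # theta_a
theta_pair = {e: -0.4 for e in edges}               # theta_ab (negative favours agreement)

def exponent(v):
    """-(sum_a theta_a v_a + sum_(a,b) theta_ab v_a v_b), the exponent of P(v) before -A(theta)."""
    s = sum(theta_unary[a] * v[a] for a in range(n))
    s += sum(theta_pair[(a, b)] * v[a] * v[b] for (a, b) in edges)
    return -s

labelings = list(itertools.product([-1, +1], repeat=n))
scores = np.array([np.exp(exponent(v)) for v in labelings])
A = np.log(scores.sum())                             # log-partition function A(theta)
probabilities = scores / scores.sum()
print("A(theta) =", round(A, 3))
print("most probable labeling:", labelings[int(np.argmax(probabilities))])
```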
Interactive Binary Segmentation
Interactive Binary Segmentation
Foreground histogram of RGB values FG
Background histogram of RGB values BG
‘+1’ indicates foreground and ‘-1’ indicates background
Interactive Binary Segmentation
More likely to be foreground than background
Interactive Binary Segmentation
θa proportional to -log(FG(da)) + log(BG(da))
More likely to be background than foreground
Interactive Binary Segmentation
More likely to belong to same label
Interactive Binary Segmentation
θab proportional to -exp(-(da-db)²)
Less likely to belong to same label
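A minimal sketch of how these parameters could be computed, assuming a toy single-channel image and hypothetical FG/BG histograms; all numbers are illustrative.

```python
import numpy as np

# Toy single-channel "image" with 4 intensity bins, and hypothetical
# foreground/background histograms FG and BG (both normalized).
image = np.array([[0, 1, 3],
                  [0, 2, 3]])
FG = np.array([0.05, 0.10, 0.35, 0.50])
BG = np.array([0.50, 0.30, 0.15, 0.05])

# Unary parameters: theta_a proportional to -log FG(d_a) + log BG(d_a).
# In the {-1, +1} representation, a negative theta_a favours the foreground label +1.
theta_unary = -np.log(FG[image]) + np.log(BG[image])

# Pairwise parameters for horizontally adjacent pixels:
# theta_ab proportional to -exp(-(d_a - d_b)^2), i.e. stronger agreement for similar pixels.
da, db = image[:, :-1].astype(float), image[:, 1:].astype(float)
theta_pair = -np.exp(-(da - db) ** 2)

print(np.round(theta_unary, 2))
print(np.round(theta_pair, 2))
```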
Rest of lecture 1 ….
Exponential Family
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Parameters θα, Sufficient Statistics Φα, Log-Partition Function A(θ)
Random Variables V = {V1,V2,…,Vn}
Random Variable Va takes a value or label va
va ∈ L = {l1,l2,…,lh}
Labeling V = v
Overcomplete Representation
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Parameters θα, Sufficient Statistics Φα, Log-Partition Function A(θ)
There exists a non-zero c such that Σα cαΦα(v) = constant for all v
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2}
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {0, 1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
Ia;i(va), θa;i for all Va ∈ V, li ∈ L
Iab;ik(va,vb), θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
Ia;i(va): indicator for va = li
Iab;ik(va,vb): indicator for va = li, vb = lk
Ising Model
P(v) = exp{-Σa Σi θa;iIa;i(va) -Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {0, 1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
Ia;i(va), θa;i for all Va ∈ V, li ∈ L
Iab;ik(va,vb), θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
Ia;i(va): indicator for va = li
Iab;ik(va,vb): indicator for va = li, vb = lk
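The following sketch (a tiny chain with made-up parameters) evaluates the exponent of P(v) in this overcomplete representation by building the indicator sufficient statistics explicitly:

```python
import numpy as np

# Overcomplete representation for a 3-variable chain, labels {0, 1}.
n, h = 3, 2
edges = [(0, 1), (1, 2)]

def sufficient_statistics(v):
    """Indicator sufficient statistics I_{a;i}(v_a) and I_{ab;ik}(v_a, v_b)."""
    unary = np.zeros((n, h))
    for a in range(n):
        unary[a, v[a]] = 1.0
    pairwise = np.zeros((len(edges), h, h))
    for e, (a, b) in enumerate(edges):
        pairwise[e, v[a], v[b]] = 1.0
    return unary, pairwise

# Made-up parameters theta_{a;i} and theta_{ab;ik} (Ising-like: disagreement costs 1).
theta_unary = np.array([[0.0, 1.0], [0.5, 0.0], [0.2, 0.4]])
theta_pair = np.zeros((len(edges), h, h))
theta_pair[:, 0, 1] = 1.0
theta_pair[:, 1, 0] = 1.0

v = (0, 0, 1)
U, P = sufficient_statistics(v)
# Exponent of P(v), i.e. -(sum theta_{a;i} I_{a;i} + sum theta_{ab;ik} I_{ab;ik}), before -A(theta).
print(-(np.sum(theta_unary * U) + np.sum(theta_pair * P)))
```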
Interactive Binary Segmentation
Foreground histogram of RGB values FG
Background histogram of RGB values BG
‘1’ indicates foreground and ‘0’ indicates background
Interactive Binary Segmentation
More likely to be foreground than background
Interactive Binary Segmentation
θa;0 proportional to -log(BG(da))
θa;1 proportional to -log(FG(da))
More likely to be background than foreground
Interactive Binary Segmentation
More likely to belong to same label
Interactive Binary Segmentation
θab;ik proportional to exp(-(da-db)²) if i ≠ k
θab;ik = 0 if i = k
Less likely to belong to same label
Metric Labeling
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2, …, lh}
Metric Labeling
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {0, …, h-1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
Ia;i(va), θa;i for all Va ∈ V, li ∈ L
Iab;ik(va,vb), θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
θab;ik is a metric distance function over labels
Metric Labeling
P(v) = exp{-Σa Σi θa;iIa;i(va) -Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {0, …, h-1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
Ia;i(va), θa;i for all Va ∈ V, li ∈ L
Iab;ik(va,vb), θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
θab;ik is a metric distance function over labels
Stereo Correspondence
Disparity Map
Stereo Correspondence
L = {disparities}
Pixel (xa,ya) in the left image corresponds to pixel (xa+va,ya) in the right image
Stereo Correspondence
L = {disparities}
θa;i is proportional to the difference in RGB values
Stereo Correspondence
L = {disparities}
θab;ik = wab d(i,k)
wab proportional to exp(-(da-db)²)
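A minimal sketch of these stereo pairwise parameters, assuming a truncated linear metric d(i,k) over disparities and an illustrative intensity scaling for wab (both are assumptions, not specified on the slide):

```python
import numpy as np

# Pairwise parameters theta_{ab;ik} = w_ab * d(i, k) for two neighbouring pixels.
h = 8                                          # number of disparity labels
disparities = np.arange(h)
truncation = 3.0
# d(i, k): truncated linear metric over disparities (an assumed choice of metric).
d = np.minimum(np.abs(disparities[:, None] - disparities[None, :]), truncation)

da, db = 120.0, 130.0                          # intensities of the two pixels
w_ab = np.exp(-((da - db) / 255.0) ** 2)       # edge weight; the scaling is illustrative
theta_ab = w_ab * d                            # h x h matrix of pairwise parameters

print(np.round(theta_ab, 2))
```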
Pairwise MRF
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
Ia;i(va), θa;i for all Va ∈ V, li ∈ L
Iab;ik(va,vb), θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L
Pairwise MRF
P(v) = exp{-Σa Σi θa;iIa;i(va) -Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Probability P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb)
A(θ) : log Z
ψa(li) : exp(-θa;i)
ψab(li,lk) : exp(-θab;ik)
Parameters θ are sometimes also referred to as potentials
Pairwise MRF
P(v) = exp{-Σa Σi θa;iIa;i(va) -Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Labeling as a function f : {1, 2, … , n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Pairwise MRF
P(f) = exp{-Σa θa;f(a) -Σa,b θab;f(a)f(b) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Labeling as a function f : {1, 2, … , n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)
Pairwise MRF
P(f) = exp{-Q(f) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Label set L = {l1, l2, …, lh}
Neighborhood over variables specified by edges E
Labeling as a function f : {1, 2, … , n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
Inference
maxv ( P(v) = exp{-Σa Σi θa;iIa;i(va) -Σa,b Σi,k θab;ikIab;ik(va,vb) - A(θ)} )
Maximum a Posteriori (MAP) Estimation
minf ( Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b) )
Energy Minimization
P(va = li) = Σv P(v)δ(va = li)
P(va = li, vb = lk) = Σv P(v)δ(va = li)δ(vb = lk)
Computing Marginals
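Both inference problems can be checked by brute force on a tiny model with made-up parameters; exact enumeration is only feasible for a handful of variables, which is why the remaining lectures develop efficient methods.

```python
import itertools
import numpy as np

# Tiny pairwise MRF with made-up parameters: 3 variables, 2 labels, chain edges.
n, h = 3, 2
edges = [(0, 1), (1, 2)]
theta_unary = np.array([[0.0, 1.0], [0.3, 0.0], [0.6, 0.1]])
theta_pair = {e: np.array([[0.0, 0.8], [0.8, 0.0]]) for e in edges}

def energy(f):
    """Q(f) = sum_a theta_{a;f(a)} + sum_(a,b) theta_{ab;f(a)f(b)}."""
    Q = sum(theta_unary[a, f[a]] for a in range(n))
    Q += sum(theta_pair[(a, b)][f[a], f[b]] for (a, b) in edges)
    return Q

labelings = list(itertools.product(range(h), repeat=n))
energies = np.array([energy(f) for f in labelings])
probabilities = np.exp(-energies)
probabilities /= probabilities.sum()

# MAP estimation is the same as energy minimization.
print("MAP labeling:", labelings[int(np.argmin(energies))])

# Marginals P(v_a = l_i) = sum_v P(v) [v_a = l_i].
marginals = np.zeros((n, h))
for f, p in zip(labelings, probabilities):
    for a in range(n):
        marginals[a, f[a]] += p
print(np.round(marginals, 3))
```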
Next Lecture …
Energy minimization for tree-structured pairwise MRF