Probabilistic temporal models

Temporal Probabilistic Models
Reminders
• HW5 due on Thursday
• HW6 out on Tuesday
• ~1 month until final projects due
Motivation
• Observing a stream of data
– Health monitoring (of people, computer systems, etc.)
– Surveillance, tracking
– Finance & economics
– Science
• Questions:
– Modeling & forecasting
– Unobserved variables
Time Series Modeling
• Time occurs in steps t = 0, 1, 2, …
– A time step can be seconds, days, years, etc.
• State variable Xt, t = 0, 1, 2, …
• For partially observed problems, we see observations Ot, t = 1, 2, … and do not see the X's
– The X's are hidden variables (aka latent variables)
Modeling Time
• Arrow of time (causes → effects)
• Causality? Bayesian networks to the rescue
Probabilistic Modeling
• For now, assume fully observable case
[Figure: nodes X0, X1, X2, X3]
• What parents?
Markov Assumption
• Assume Xt+k is conditionally independent of all Xi for i < t, given the k most recent states:
  P(Xt+k | X0, …, Xt+k-1) = P(Xt+k | Xt, …, Xt+k-1)
• This is a k-th order Markov chain
[Figure: chains over X0, X1, X2, X3 with the dependency arcs for orders 0, 1, 2, and 3]
1st order Markov Chain
• MCs of order k > 1 can be converted into a 1st order MC [left as exercise]
• So, without loss of generality, "MC" refers to a 1st order MC
[Figure: X0 → X1 → X2 → X3]
Inference in MC
• What independence relationships can we read from the BN?
[Figure: X0 → X1 → X2 → X3]
• Observing X1 makes X0 independent of X2, X3, …
• P(Xt|Xt-1) is known as the transition model
Inference in MC
• Prediction: the probability of future state?
– P(Xt) = Σx0,…,xt-1 P(x0, …, xt-1, Xt)
        = Σx0,…,xt-1 P(x0) Πi=1..t P(xi|xi-1)
        = Σxt-1 P(Xt|xt-1) P(xt-1)
• The distribution "blurs" over time and approaches a stationary distribution as t grows
– Limited prediction power
– The rate of blurring is known as the mixing time (see the sketch below)
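A minimal sketch of this prediction recursion, assuming a made-up 2-state transition matrix (none of these numbers come from the slides):

```python
# Propagating P(Xt) forward in an assumed 2-state Markov chain, where
# T[i, j] = P(X_{t+1}=j | X_t=i). Repeated multiplication shows the
# distribution "blurring" toward the stationary distribution.
import numpy as np

T = np.array([[0.9, 0.1],    # assumed transition model
              [0.3, 0.7]])
p = np.array([1.0, 0.0])     # P(X0): start in state 0 with certainty

for t in range(1, 21):
    p = p @ T                # P(Xt) = sum_{x_{t-1}} P(Xt|x_{t-1}) P(x_{t-1})
    if t in (1, 5, 20):
        print(f"t={t}: P(Xt) = {p}")
# As t grows, p approaches the stationary distribution of T,
# illustrating why long-range prediction power is limited.
```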
How does the Markov assumption affect the choice of state?
• Suppose we're tracking a point (x,y) in 2D
• What if the point is…
– A momentumless particle subject to thermal vibration?
– A particle with velocity?
– A particle with intent, like a person?
How does the Markov assumption affect the choice of state?
• Suppose the point is the position of our robot, and we observe velocity and intent
• What if:
– Terrain affects speed?
– Battery level affects speed?
– Position is noisy, e.g. GPS?
Is the Markov assumption appropriate for:
• A car on a slippery road?
• Sales of toothpaste?
• The stock market?
Partial Observability
• Hidden Markov Model (HMM)
– Roughly equivalent to a Dynamic Bayesian Network (DBN)
[Figure: hidden state variables X0 → X1 → X2 → X3, each Xt emitting an observed variable Ot]
• P(Ot|Xt) is called the observation model (or sensor model)
Inference in HMMs
• Filtering
• Prediction
• Smoothing, aka hindsight
• Most likely explanation
[Figure: HMM X0 → X1 → X2 → X3 with observations O1, O2, O3]
Inference in HMMs
• Filtering
• Prediction
• Smoothing, aka hindsight
• Most likely explanation
[Figure: query variable is the current state (e.g., X2), given observations O1, O2]
Filtering
• Name comes from signal processing
• P(Xt|o1:t) = Σxt-1 P(xt-1|o1:t-1) P(Xt|xt-1, ot)
• P(Xt|xt-1, ot) = P(ot|xt-1, Xt) P(Xt|xt-1) / P(ot|xt-1)
               = α P(ot|Xt) P(Xt|xt-1)
[Figure: query variable Xt, given observations o1:t]
Filtering
• P(Xt|o1:t) = α Σxt-1 P(xt-1|o1:t-1) P(ot|Xt) P(Xt|xt-1)
• Forward recursion (see the sketch below)
• If we keep track of P(Xt|o1:t) => O(1) updates for all t!
[Figure: query variable Xt, given observations o1:t]
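A minimal sketch of this forward recursion for a discrete HMM; the transition matrix T, observation matrix O, prior p0, and observation sequence are all assumed for illustration:

```python
# Forward (filtering) recursion for an assumed 2-state HMM.
import numpy as np

T = np.array([[0.7, 0.3],      # T[i, j] = P(X_{t+1}=j | X_t=i)
              [0.4, 0.6]])
O = np.array([[0.9, 0.1],      # O[i, k] = P(o_t=k | X_t=i)
              [0.2, 0.8]])
p0 = np.array([0.5, 0.5])      # prior P(X0)

def filter_step(belief, obs):
    """One update: P(Xt|o1:t) = alpha * P(ot|Xt) * sum_{xt-1} P(Xt|xt-1) P(xt-1|o1:t-1)."""
    predicted = belief @ T                 # sum over the previous state
    unnormalized = O[:, obs] * predicted   # multiply in the observation likelihood
    return unnormalized / unnormalized.sum()

belief = p0
for obs in [0, 0, 1]:                      # an assumed observation sequence
    belief = filter_step(belief, obs)
    print(belief)
```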
Inference in HMMs
• Filtering
• Prediction
• Smoothing, aka hindsight
• Most likely explanation
[Figure: HMM X0–X3 with observations O1–O3; the query is a state beyond the observations]
Prediction
• P(Xt+k|o1:t)
• 2 steps: compute P(Xt|o1:t), then P(Xt+k|Xt)
• Filter, then predict as with a standard MC
[Figure: HMM X0–X3 with observations O1–O3; the query is a future state]
Inference in HMMs
• Filtering
• Prediction
• Smoothing, aka hindsight
• Most likely explanation
[Figure: HMM X0–X3 with observations O1–O3; the query is a past state Xk, k < t]
Smoothing
• P(Xk|o1:t) for k < t
• P(Xk|o1:k, ok+1:t)
  = P(ok+1:t|Xk, o1:k) P(Xk|o1:k) / P(ok+1:t|o1:k)
  = α P(ok+1:t|Xk) P(Xk|o1:k)    [P(Xk|o1:k): standard filtering up to time k]
[Figure: HMM X0–X3 with observations O1–O3; the query is Xk]
Smoothing
• Computing P(ok+1:t|Xk)
• P(ok+1:t|Xk) = Σxk+1 P(ok+1:t|Xk, xk+1) P(xk+1|Xk)
             = Σxk+1 P(ok+1:t|xk+1) P(xk+1|Xk)
             = Σxk+1 P(ok+2:t|xk+1) P(ok+1|xk+1) P(xk+1|Xk)
• Backward recursion
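A minimal sketch of this backward recursion, combined with filtering to get smoothed estimates; it reuses the assumed T, O, p0, and filter_step from the filtering sketch above:

```python
# Smoothing P(Xk|o1:t) via forward and backward passes (toy example).
import numpy as np

def backward_messages(observations):
    """b[k][i] = P(o_{k+1:t} | X_k = i), computed by the backward recursion."""
    n = T.shape[0]
    b = [np.ones(n)]                          # base case: empty future evidence
    for obs in reversed(observations[1:]):    # process o_t, ..., o_2
        b.insert(0, T @ (O[:, obs] * b[0]))   # sum_{x_{k+1}} P(x_{k+1}|X_k) P(o_{k+1}|x_{k+1}) b_{k+1}
    return b

def smooth(observations):
    """Return P(Xk|o1:t) for k = 1..t by combining the two passes."""
    forward, belief = [], p0
    for obs in observations:
        belief = filter_step(belief, obs)
        forward.append(belief)
    backward = backward_messages(observations)
    return [f * b / np.sum(f * b) for f, b in zip(forward, backward)]

print(smooth([0, 0, 1]))
```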
Given prior states
[Figure: HMM X0 → X1 → X2 → X3 with observations O1, O2, O3]
• What's the probability of this sequence?
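One way to make the question concrete (my gloss, using the standard HMM factorization): for a fully specified state sequence x0, …, x3 and observations o1, …, o3,

P(x0, …, x3, o1, …, o3) = P(x0) · Πi=1..3 P(xi|xi-1) P(oi|xi)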
Inference in HMMs
• Filtering
• Prediction
• Smoothing, aka hindsight
• Most likely explanation
[Figure: HMM X0–X3 with observations O1–O3; the query returns a path x0,…,x3 through state space]
MLE: Viterbi Algorithm
• Recursive computation of the maximum likelihood of a path to each xt in Val(Xt)
• mt(Xt) = maxx1:t-1 P(x1, …, xt-1, Xt | o1:t)
         = α P(ot|Xt) maxxt-1 P(Xt|xt-1) mt-1(xt-1)
• Previous ML state: argmaxxt-1 P(Xt|xt-1) mt-1(xt-1)
• Does this sound familiar?
MLE: Viterbi Algorithm
• Do the "logarithm trick"
• log mt(Xt) = log α P(ot|Xt) + maxxt-1 [ log P(Xt|xt-1) + log mt-1(xt-1) ]
• View:
– log α P(ot|Xt) as a reward
– log P(Xt|xt-1) as a cost
– log mt(Xt) as a value function
• Bellman equation
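A minimal sketch of the Viterbi recursion in log space, again reusing the assumed T, O, and p0 from the filtering sketch; the back-pointer bookkeeping is one common way to recover the path, not necessarily how the course code does it:

```python
# Viterbi: most likely state path for an assumed 2-state HMM, in log space.
import numpy as np

def viterbi(observations):
    """Return the most likely state path x1,...,xt for the given observations."""
    t, n = len(observations), T.shape[0]
    logm = np.log(p0 @ T) + np.log(O[:, observations[0]])  # log m_1 (X0 marginalized out)
    backptr = np.zeros((t, n), dtype=int)
    for k in range(1, t):
        scores = logm[:, None] + np.log(T)         # scores[i, j] = log m_{k-1}(i) + log P(j|i)
        backptr[k] = np.argmax(scores, axis=0)     # best predecessor for each state j
        logm = np.max(scores, axis=0) + np.log(O[:, observations[k]])
    path = [int(np.argmax(logm))]
    for k in range(t - 1, 0, -1):                  # follow back-pointers to recover the path
        path.insert(0, int(backptr[k][path[0]]))
    return path

print(viterbi([0, 0, 1]))
```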
Applications of HMMs in NLP
• Speech recognition
• Hidden: phones (e.g., ah eh ee th r)
• Observed: noisy acoustic features (produced by signal processing)
Phone Observation Models
[Figure: node Phonet → node Featurest; raw audio goes through signal processing to produce a feature vector, e.g. (24, 13, 3, 59)]
• The model is defined to be robust over variations in accent, speed, pitch, and noise
Phone Transition Models
[Figure: chain of Phone nodes, each emitting Featurest]
• Good models will capture (among other things):
– Pronunciation of words
– Subphone structure
– Coarticulation effects
• Triphone models = order 3 Markov chain
Word Segmentation
• Words run together when pronounced
• Unigrams P(wi)
• Bigrams P(wi|wi-1)
• Trigrams P(wi|wi-1,wi-2)
Random 20 word samples from R&N using N-gram models
Logical are as confusion a may right tries agent goal the was diesel more object then informationgathering search is
Planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate
Planning and scheduling are integrated the success of naïve bayes model is just a possible prior source by that time
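A minimal sketch of how such samples are generated from a bigram model, with a tiny made-up conditional table (none of these words or probabilities come from R&N's corpus):

```python
# Sampling a sentence word-by-word from an assumed bigram model P(wi|wi-1).
import random

bigram = {
    "<s>":        {"planning": 0.5, "logical": 0.5},
    "planning":   {"and": 0.6, "purely": 0.4},
    "and":        {"scheduling": 1.0},
    "scheduling": {"are": 1.0},
    "are":        {"integrated": 1.0},
    "purely":     {"diagnostic": 1.0},
    "logical":    {"are": 1.0},
    "integrated": {"<end>": 1.0},
    "diagnostic": {"<end>": 1.0},
}

def sample_sentence():
    word, words = "<s>", []
    while True:
        nxt = random.choices(list(bigram[word]), weights=bigram[word].values())[0]
        if nxt == "<end>":
            return " ".join(words)
        words.append(nxt)
        word = nxt

print(sample_sentence())
```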
Tricks to improve recognition
• Narrow the # of variables
– Digits, yes/no, phone tree
• Training with real user data
– Real story: “Yes ma’am”
Kalman Filtering
• In a nutshell
– Efficient filtering in continuous state spaces
– Gaussian transition and observation models
• Ubiquitous for tracking with noisy sensors, e.g. radar, GPS, cameras
Linear Gaussian Transition Model
• Consider position and velocity xt, vt
• Time step h
• Without noise:
  xt+1 = xt + h vt
  vt+1 = vt
• With Gaussian noise of std σ1:
  P(xt+1|xt) ∝ exp( -(xt+1 - (xt + h vt))² / (2σ1²) )
  i.e. xt+1 ~ N(xt + h vt, σ1²)
Linear Gaussian Transition Model
• If the prior on position is Gaussian, then the propagated distribution is also Gaussian
[Figure: a Gaussian shifted by vh and widened by the noise σ1]
  N(μ, σ²) → N(μ + vh, σ² + σ1²)   (the mean shifts by vh; the variances add)
Linear Gaussian Observation Model
• Position observation zt
• Gaussian noise of std σ2:
  zt ~ N(xt, σ2²)
Linear Gaussian Observation Model
• If the prior on position is Gaussian, then the posterior is also Gaussian
[Figure: position prior, observation probability, and posterior probability curves]
  μ ← (σ²z + σ2²μ) / (σ² + σ2²)
  σ² ← σ²σ2² / (σ² + σ2²)
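A minimal 1D sketch combining the transition and observation updates above into a predict/correct loop; the velocity, noise levels, and observations are assumed values, not from the slides:

```python
# 1D Kalman filter: Gaussian predict (transition) then correct (observation).
h, sigma1, sigma2 = 1.0, 0.5, 1.0   # time step, transition noise std, observation noise std
v = 1.0                             # assumed known velocity

mu, var = 0.0, 1.0                  # Gaussian prior on position: N(mu, var)
for z in [1.2, 1.9, 3.1]:           # assumed position observations z1, z2, z3
    # Predict: mean shifts by v*h, variances add
    mu, var = mu + v * h, var + sigma1**2
    # Correct: combine the prediction with the observation z ~ N(x, sigma2^2)
    mu = (var * z + sigma2**2 * mu) / (var + sigma2**2)
    var = var * sigma2**2 / (var + sigma2**2)
    print(f"posterior: N({mu:.3f}, {var:.3f})")
```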
Multivariate Case
• Transition matrix F, covariance Σx
• Observation matrix H, covariance Σz
  μt+1 = F μt + Kt+1 (zt+1 - H F μt)
  Σt+1 = (I - Kt+1 H)(F Σt Fᵀ + Σx)
  where Kt+1 = (F Σt Fᵀ + Σx) Hᵀ ( H (F Σt Fᵀ + Σx) Hᵀ + Σz )⁻¹
• Got that memorized?
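A minimal sketch of one multivariate Kalman step implementing these three equations with numpy; F, H, Σx, Σz, and the data are assumed values for a toy position/velocity tracker, not from the slides:

```python
# One multivariate Kalman predict+correct step for a toy 2D state (position, velocity).
import numpy as np

h = 1.0
F = np.array([[1.0, h], [0.0, 1.0]])   # transition matrix
H = np.array([[1.0, 0.0]])             # we observe position only
Sx = 0.01 * np.eye(2)                  # transition (process) covariance
Sz = np.array([[1.0]])                 # observation covariance

def kalman_step(mu, S, z):
    """Return (mu_{t+1}, Sigma_{t+1}) given the current estimate and observation z_{t+1}."""
    P = F @ S @ F.T + Sx                                   # predicted covariance F Σt Fᵀ + Σx
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Sz)          # Kalman gain K_{t+1}
    mu_next = F @ mu + K @ (z - H @ F @ mu)                # mean update
    S_next = (np.eye(len(mu)) - K @ H) @ P                 # covariance update
    return mu_next, S_next

mu, S = np.zeros(2), np.eye(2)
for z in ([1.1], [2.0], [2.9]):                            # assumed position observations
    mu, S = kalman_step(mu, S, np.array(z))
    print(mu)
```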
Properties of Kalman Filter
• Optimal Bayesian estimate for linear Gaussian transition/observation models
• Need estimates of covariance… model identification necessary
• Extensions to nonlinear transition/observation models work as long as they aren't too nonlinear
– Extended Kalman Filter
– Unscented Kalman Filter
Properties of Kalman Filter
• Optimal Bayesian estimate for linear Gaussian transition/observation models
• Need estimates of covariance… model identification necessary
• Extensions to nonlinear systems
– Extended Kalman Filter: linearize the models
– Unscented Kalman Filter: pass sample points through the nonlinear model to reconstruct a Gaussian
– These work as long as the system isn't too nonlinear
Non-Gaussian distributions
• Gaussian distributions are a "lump"
[Figure: a multimodal distribution vs. the unimodal Kalman filter estimate]
Non-Gaussian distributions
• Integrating continuous and discrete states
[Figure: a distribution splitting into "up" and "down" modes with a binary choice]
Example: Failure detection
• Consider a battery meter sensor
– Battery = true level of battery
– BMeter = sensor reading
• Transient failures: send garbage at time t
• Persistent failures: send garbage forever
Example: Failure detection
• Consider a battery meter sensor
– Battery = true level of battery
– BMeter = sensor reading
• Transient failures: send garbage at time t
– 5555500555…
• Persistent failures: sensor is broken
– 5555500000…
Dynamic Bayesian Network
[Figure: Batteryt-1 → Batteryt → BMetert (think of this structure "unrolled" forever…)]
• BMetert ~ N(Batteryt, σ)
Dynamic Bayesian Network
[Figure: Batteryt-1 → Batteryt → BMetert]
• BMetert ~ N(Batteryt, σ)
• Transient failure model: P(BMetert=0 | Batteryt=5) = 0.03
Results on Transient Failure
Meter reads 55555005555…
[Figure: E(Batteryt) over time, with and without the transient failure model, around the point where the transient failure occurs]
Results on Persistent Failure
Meter reads 5555500000…
[Figure: E(Batteryt) over time with the transient failure model, around the point where the persistent failure occurs]
Persistent Failure Model
[Figure: DBN with Brokent-1 → Brokent and Batteryt-1 → Batteryt; BMetert depends on Batteryt and Brokent]
• BMetert ~ N(Batteryt, σ)
• P(BMetert=0 | Batteryt=5) = 0.03
• P(BMetert=0 | Brokent) = 1
Results on Persistent Failure
Meter reads 5555500000…
[Figure: E(Batteryt) over time around the persistent failure, comparing the transient failure model and the persistent failure model]
How to perform inference on DBN?
• Exact inference on the "unrolled" BN
– Variable elimination: eliminate old time steps
– After a few time steps, all variables in the state space become dependent!
– Sparsity structure is lost
• Approximate inference
– Particle filtering
Particle Filtering
• Represent the distribution at time t as a set of N "particles" St1, …, StN
• Repeat for t = 0, 1, 2, …
– Sample S[i] from P(Xt+1|Xt=Sti) for all i
– Compute weight w[i] = P(et+1|Xt+1=S[i]) for all i
– Sample St+1,i from S[.] according to weights w[.]  (weighted resampling step)
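A minimal sketch of this loop on the battery example that follows; the transition dynamics and the meter discretization are my own assumptions, while P(BMeter=0|Broken)=1 and P(BMeter=0|Battery=5)=0.03 come from the slides:

```python
# Toy particle filter for the battery-meter DBN.
import random

N = 1000
particles = [{"battery": 5.0, "broken": False} for _ in range(N)]

def transition(p):
    """Sample the next state given the current particle (assumed dynamics)."""
    broken = p["broken"] or random.random() < 0.01         # assumed per-step failure probability
    battery = max(p["battery"] - 0.01 + random.gauss(0, 0.05), 0.0)
    return {"battery": battery, "broken": broken}

def obs_weight(p, bmeter):
    """P(BMeter = bmeter | state), crudely discretized for this sketch."""
    if p["broken"]:
        return 1.0 if bmeter == 0 else 0.0                 # broken sensor always reads 0
    if bmeter == 0:
        return 0.03                                        # transient garbage reading
    return 1.0 if abs(bmeter - p["battery"]) < 0.5 else 0.0

for bmeter in [5, 5, 0, 0, 0]:                             # observed meter readings
    particles = [transition(p) for p in particles]         # sampling step
    weights = [obs_weight(p, bmeter) for p in particles]   # weighting step
    particles = random.choices(particles, weights=weights, k=N)  # weighted resampling
    frac_broken = sum(p["broken"] for p in particles) / N
    print(f"BMeter={bmeter}: P(Broken) ≈ {frac_broken:.2f}")
```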
Battery Example
• Sampling step
[Figure: particles sampled forward through the DBN (Brokent-1, Batteryt-1 → Brokent, Batteryt → BMetert)]
Battery Example
• Suppose we now observe BMeter = 0
• P(BMeter=0|sample) = 0.03 for working-sensor particles, 1 for broken-sensor particles
[Figure: DBN with particles, annotated with these observation probabilities]
Battery Example
• Compute weights P(BMeter=0|sample): 0.03 for working-sensor particles, 1 for broken-sensor particles
[Figure: particle weights drawn as particle size]
Battery Example
• Weighted resampling according to the weights P(BMeter=0|sample)
[Figure: resampled particle set]
Battery Example
• Sampling step (advance to the next time slice)
[Figure: particles sampled forward through the DBN]
Battery Example
• Now observe BMetert = 5
[Figure: DBN with the current particle set]
Battery Example
• Compute weights: 1 for working-sensor particles, 0 for broken-sensor particles
[Figure: particle weights drawn as particle size]
Battery Example
• Weighted resampling
[Figure: resampled particle set]
Applications of Particle Filtering in Robotics
• Simultaneous Localization and Mapping (SLAM)
• Observations: laser rangefinder
• State variables: position, walls
Couple of Plugs
• CSCI B659: Principles of Intelligent Robot Motion
– http://cs.indiana.edu/classes/b659-hauserk
• CSCI B657: Computer Vision
– Chen Yu
Next Time
• Learning distributions from data
• Read R&N 20.1-3
• HW5 due at end of class