Bayesian Networks


Bayesian Networks
VISA
Hyoungjune Yi
BN – Intro.
Introduced by Pearl (1986)
Resembles human reasoning
Causal relationships
Decision support systems / expert systems
Common-sense reasoning about uncertainty






June is waiting for Larry and Jacobs, who are both late for the VISA seminar
June is worried that if the roads are icy, one or both of them may have crashed his car
Suddenly June learns that Larry has crashed
June thinks: “If Larry has crashed, then probably the roads are icy. So Jacobs has probably also crashed”
June then learns that it is warm outside and the roads are salted
June thinks: “Larry was unlucky; Jacobs should still make it”
Causal Relationships
[Diagram: State of Road (icy / not icy) with arrows to Larry (crash / no crash) and Jacobs (crash / no crash)]
Larry Crashed!
[Diagram: information flow from Larry (crashed) up through State of Road (icy / not icy) and down to Jacobs (crash / no crash)]
But Roads are dry
[Diagram: State of Road observed to be not icy; information no longer flows from Larry (crash / no crash) to Jacobs (crash / no crash)]
Wet grass





To avoid icy roads, Larry moves to UCLA; Jacobs moves to USC
One morning as Larry leaves for work, he notices that his grass is wet. He wonders whether he has left his sprinkler on or it has rained
Glancing over at Jacobs’ lawn, he notices that it is also wet
Larry thinks: “Since Jacobs’ lawn is wet, it probably rained last night”
Larry then thinks: “If it rained, then that explains why my lawn is wet, so probably the sprinkler is off”
Larry’s grass is wet
[Diagram: Larry’s grass (wet) observed; information flows to Sprinkler (on / off) and Rain (yes / no); Jacobs’ grass (wet / dry) depends on Rain]
Jacobs’ grass is also wet
[Diagram: Jacobs’ grass (wet) also observed; information flows through Rain (yes / no) and explains away Sprinkler (on / off); Larry’s grass (wet)]
Bayesian Network
A data structure that represents the dependences between variables
Gives a concise specification of the joint probability distribution
A Bayesian belief network is a graph in which
– Nodes are a set of random variables
– Each node has a conditional probability table
– Edges denote conditional dependencies
– DAG: no directed cycles
– Markov condition holds
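To make the pieces just listed concrete, here is a minimal Python sketch (not from the slides; it reuses the rain/sprinkler/wet-grass variables from the earlier story with invented numbers) of a BN held as nodes, per-node parent lists, and per-node CPTs, plus a check of the "no directed cycle" requirement.

```python
# Minimal sketch of a Bayesian-network data structure (hypothetical example).
# Each node stores its parents (defining the DAG edges) and a CPT giving
# P(node = True | parent values), indexed by the tuple of parent values.
bn = {
    "Rain":      {"parents": [], "cpt": {(): 0.2}},
    "Sprinkler": {"parents": [], "cpt": {(): 0.4}},
    "WetGrass":  {"parents": ["Rain", "Sprinkler"],
                  "cpt": {(True, True): 0.99, (True, False): 0.9,
                          (False, True): 0.8, (False, False): 0.05}},
}

def is_acyclic(bn):
    """Check the 'no directed cycle' requirement by depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in bn}

    def visit(v):
        color[v] = GRAY
        for child in (u for u in bn if v in bn[u]["parents"]):
            if color[child] == GRAY:
                return False            # back edge: a directed cycle exists
            if color[child] == WHITE and not visit(child):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in bn if color[v] == WHITE)

print(is_acyclic(bn))   # True for this DAG
```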
Bayesian network

Markov Assumption
– Each random variable X is independent of its non-descendants given its parents Pa(X)
– Formally: Ind(X; NonDesc(X) | Pa(X))
This holds if G is an I-Map of P
I-Map? Defined later
[Diagram: node X with parents Y1 and Y2]
Markov Assumption
[Diagram: Earthquake → Radio; Earthquake and Burglary → Alarm; Alarm → Call]
In this example:
– Ind(E; B)
– Ind(B; E, R)
– Ind(R; A, B, C | E)
– Ind(A; R | B, E)
– Ind(C; B, E, R | A)
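A quick way to see the Markov assumption at work is to build the joint from the network's factorization and check one consequence of Ind(R; A, B, C | E) numerically. The sketch below uses invented CPT numbers; only the structure (B = Burglary, E = Earthquake, R = Radio, A = Alarm, C = Call) comes from the slide.

```python
from itertools import product

# Hypothetical CPT numbers for the Earthquake/Burglary network above.
def p_b(b): return 0.01 if b else 0.99
def p_e(e): return 0.02 if e else 0.98

def p_r(r, e):
    q = 0.9 if e else 0.05                      # P(R=1 | E)
    return q if r else 1 - q

def p_a(a, b, e):
    q = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}[(b, e)]
    return q if a else 1 - q                    # P(A=1 | B, E)

def p_c(c, a):
    q = 0.7 if a else 0.01                      # P(C=1 | A)
    return q if c else 1 - q

def joint(b, e, r, a, c):
    # Factorization implied by the DAG.
    return p_b(b) * p_e(e) * p_r(r, e) * p_a(a, b, e) * p_c(c, a)

def cond(query, given):
    """P(query | given); query and given are dicts like {'R': 1}."""
    num = den = 0.0
    for b, e, r, a, c in product([0, 1], repeat=5):
        world = dict(B=b, E=e, R=r, A=a, C=c)
        p = joint(b, e, r, a, c)
        if all(world[k] == v for k, v in given.items()):
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# Ind(R; A | E): once E is known, learning A does not change the belief in R.
print(cond({'R': 1}, {'E': 1}))          # P(R=1 | E=1)
print(cond({'R': 1}, {'E': 1, 'A': 1}))  # P(R=1 | E=1, A=1), equal to the line above
```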
I-Maps
A DAG G is an I-Map of a distribution P if all Markov assumptions implied by G are satisfied by P
Examples:
[Diagram: example graphs over variables X and Y]
I-MAP

G is a minimal I-Map iff
– G is an I-Map of P
– If G’ ⊂ G, then G’ is not an I-Map of P
An I-Map is not unique
Factorization
Given that G is an I-Map of P, can we simplify the representation of P?
Example:
[Diagram: two nodes X and Y with no edge between them]
Since Ind(X; Y), we have that P(X|Y) = P(X)
Applying the chain rule:
P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
Thus, we have a simpler representation of P(X,Y)
Factorization Theorem
Thm: if G is an I-Map of P, then
P(X1, ..., Xn) = ∏_i P(Xi | Pa(Xi))
[Diagram: Earthquake → Radio; Earthquake and Burglary → Alarm; Alarm → Call]
P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)
versus
P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)
So what?
We can write P in terms of “local” conditional probabilities
If G is sparse, that is, |Pa(Xi)| ≤ k,
each conditional probability can be specified compactly
– e.g. for binary variables, each requires O(2^k) params
⇒ the representation of P is compact: linear in the number of variables
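A back-of-the-envelope comparison, assuming n = 20 binary variables with at most k = 3 parents each (numbers chosen only for illustration):

```python
# Rough parameter count for n binary variables, each with at most k parents.
n, k = 20, 3
compact = n * (2 ** k)     # one CPT row per parent configuration: O(2^k) per node
full    = 2 ** n - 1       # an explicit joint table over all variables
print(compact, full)       # prints 160 and 1048575
```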
Formal definition of BN
A Bayesian network specifies a probability distribution via two components:
– A DAG G
– A collection of conditional probability distributions P(Xi | Pai)
The joint distribution P is defined by the factorization
P(X1, ..., Xn) = ∏_i P(Xi | Pai)
Additional requirement: G is a minimal I-Map of P
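A sketch of how the two components combine, assuming binary variables and a small hypothetical two-node network: the joint probability of an assignment is just the product of the CPT entries picked out by each node's parent values.

```python
# Sketch of the factorization: joint = product over nodes of P(Xi | Pai).
def joint_probability(bn, assignment):
    """bn maps each variable to (parents, cpt), where cpt maps a tuple of
    parent values to P(X = True | parents); assignment maps var -> bool."""
    p = 1.0
    for var, (parents, cpt) in bn.items():
        q = cpt[tuple(assignment[pa] for pa in parents)]
        p *= q if assignment[var] else 1 - q
    return p

# Hypothetical two-node network X -> Y (numbers invented).
bn = {"X": ([], {(): 0.3}),
      "Y": (["X"], {(True,): 0.8, (False,): 0.1})}
print(joint_probability(bn, {"X": True, "Y": False}))   # 0.3 * 0.2 = 0.06
```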
Bayesian Network - Example
[Diagram: Pneumonia → Lung Infiltrates ← Tuberculosis; Lung Infiltrates → XRay; Tuberculosis → Sputum Smear]
CPT for Lung Infiltrates, P(I | P, T):
P    T    P(i | P, T)    P(¬i | P, T)
p    t    0.8            0.2
p    ¬t   0.6            0.4
¬p   t    0.2            0.8
¬p   ¬t   0.01           0.99

Each node Xi has a conditional probability distribution P(Xi | Pai)
– If variables are discrete, P is usually multinomial
– P can be linear Gaussian, a mixture of Gaussians, …
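For the discrete case, a CPT is just a lookup keyed by the parents' values. The sketch below holds the table above in code; the row-to-value assignment follows the table as reconstructed here.

```python
# P(Infiltrates = yes | Pneumonia, Tuberculosis), keyed by (pneumonia, tuberculosis).
p_infiltrates = {
    (True,  True):  0.8,
    (True,  False): 0.6,
    (False, True):  0.2,
    (False, False): 0.01,
}

def p_i(i, p, t):
    q = p_infiltrates[(p, t)]
    return q if i else 1 - q

print(p_i(True, True, False))    # P(i | p, ¬t) = 0.6
print(p_i(False, False, False))  # P(¬i | ¬p, ¬t) = 0.99
```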
BN Semantics
[Diagram: Pneumonia (P) → Infiltrates (I) ← Tuberculosis (T); I → XRay (X); T → Sputum Smear (S)]
conditional independencies in the BN structure + local probability models = full joint distribution over the domain
P(p, t, i, x, s) = P(p) P(t) P(i | p, t) P(x | i) P(s | t)

Compact & natural representation:
– nodes have ≤ k parents ⇒ 2^k · n vs. 2^n params
d-separation

d-sep(X; Y | Z, G)
– X is d-separated from Y given Z if all paths from a node in X to a node in Y are blocked given Z
Meaning?
– On the blackboard
Path
– Active: dependency between the end nodes of the path
– Blocked: no dependency
– Three cases: common cause, intermediate (chain), common effect
– On the blackboard
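The slides defer the details to the blackboard. One standard way to make "blocked" concrete (an assumption here, not taken from the slides) is the ancestral-moral-graph test: X and Y are d-separated by Z iff they are disconnected, after removing Z, in the moralized ancestral graph of X ∪ Y ∪ Z. A sketch, exercised on the two stories from earlier:

```python
# d-separation via the moralized ancestral graph (standard construction).
def d_separated(parents, X, Y, Z):
    """parents: dict node -> list of parent nodes. X, Y, Z: sets of nodes."""
    # 1. Ancestral subgraph of X ∪ Y ∪ Z.
    anc, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # 2. Moralize: marry co-parents, then drop edge directions.
    und = {v: set() for v in anc}
    for v in anc:
        ps = parents[v]
        for p in ps:
            und[v].add(p); und[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                und[ps[i]].add(ps[j]); und[ps[j]].add(ps[i])
    # 3. Remove Z and test whether any node of Y is reachable from X.
    seen, stack = set(), [v for v in X if v not in Z]
    while stack:
        v = stack.pop()
        if v in seen or v in Z:
            continue
        seen.add(v)
        stack.extend(und[v])
    return not (seen & Y)

# Common cause (icy roads): State of Road -> Larry, State of Road -> Jacobs.
roads = {"Road": [], "Larry": ["Road"], "Jacobs": ["Road"]}
print(d_separated(roads, {"Larry"}, {"Jacobs"}, set()))      # False: path active
print(d_separated(roads, {"Larry"}, {"Jacobs"}, {"Road"}))   # True: blocked by Road

# Common effect (wet grass): Rain -> Wet <- Sprinkler.
grass = {"Rain": [], "Sprinkler": [], "Wet": ["Rain", "Sprinkler"]}
print(d_separated(grass, {"Rain"}, {"Sprinkler"}, set()))    # True: collider blocks
print(d_separated(grass, {"Rain"}, {"Sprinkler"}, {"Wet"}))  # False: explaining away
```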
BN – Belief, Evidence and Query


BN is for “queries” (in part)
Query involves evidence
– Evidence is an assignment of values to a set of variables in the domain
Query is an a posteriori belief
– Belief: P(x) = 1 or P(x) = 0 for the evidence variables
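A sketch of answering such a query by brute-force enumeration over the wet-grass network from earlier. The CPT numbers are invented, so only the qualitative behaviour, including the explaining-away effect from the story, should be read from the output.

```python
from itertools import product

# Hypothetical CPTs for the wet-grass story (numbers invented).
def p_rain(r):      return 0.2 if r else 0.8
def p_sprinkler(s): return 0.1 if s else 0.9

def p_jacobs(j, r):                       # Jacobs' grass wet given rain
    q = 0.95 if r else 0.05
    return q if j else 1 - q

def p_larry(l, r, s):                     # Larry's grass wet given rain, sprinkler
    q = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.01}[(r, s)]
    return q if l else 1 - q

def joint(r, s, j, l):
    return p_rain(r) * p_sprinkler(s) * p_jacobs(j, r) * p_larry(l, r, s)

def posterior(query, evidence):
    """A posteriori belief P(query | evidence) by summing the joint."""
    num = den = 0.0
    for r, s, j, l in product([0, 1], repeat=4):
        world = dict(Rain=r, Sprinkler=s, JacobsWet=j, LarryWet=l)
        p = joint(r, s, j, l)
        if all(world[k] == v for k, v in evidence.items()):
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# Larry's grass is wet: both rain and the sprinkler become more likely.
print(posterior({'Rain': 1}, {'LarryWet': 1}))
print(posterior({'Sprinkler': 1}, {'LarryWet': 1}))
# Jacobs' grass is also wet: rain is now the better explanation,
# so the belief in the sprinkler drops back (explaining away).
print(posterior({'Sprinkler': 1}, {'LarryWet': 1, 'JacobsWet': 1}))
```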
Learning Structure

Problem definition
– Given: data D
– Return: a directed graph expressing the BN
Issues
– Superfluous edges
– Missing edges
Very difficult
– http://robotics.stanford.edu/people/nir/tutorial/
BN Learning
[Diagram: Data → Inducer → BN over P, T, I, X, S]
BN models can be learned from empirical data
– parameter estimation via numerical optimization
– structure learning via combinatorial search
The BN hypothesis space is biased towards distributions with independence structure
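For the parameter-estimation half, with a known structure and complete discrete data, the maximum-likelihood CPT entries are just relative frequencies. A toy sketch under those assumptions, with hypothetical variable names:

```python
from collections import Counter

# Maximum-likelihood CPT estimation for a known structure: Rain -> WetGrass.
data = [  # each row: (Rain, WetGrass), toy data
    (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (1, 1),
]

counts = Counter()              # (rain, wet) -> count
for rain, wet in data:
    counts[(rain, wet)] += 1

# Estimate P(WetGrass = 1 | Rain) as a relative frequency per parent value.
for rain in (0, 1):
    total = counts[(rain, 0)] + counts[(rain, 1)]
    print(rain, counts[(rain, 1)] / total)   # prints 0 0.25 and 1 0.75
```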