BayesianNetworks

Download Report

Transcript BayesianNetworks

Bayesian Belief Networks
Bayesian Belief Networks
What does it mean for two variables to be independent?
Consider a multidimensional distribution p(x).
If for two features we know that p(xi,xj) = p(xi)p(xj)
we say the features are statistically independent.
If we know which features are independent and which not,
we can simplify the computation of joint probabilities.
Figure 2.23
Bayesian Belief Networks
A Bayesian Belief Network is a method to describe the joint probability
distribution of a set of variables.
It is also called a casual network or belief net.
Let x1, x2, …, xn be a set of variables or features.
A Bayesian Belief Network or BBN will tell us the probability
of any combination of x1, x2 , .., xn.
An Example
Set of Boolean variables and their relations:
Storm
Bus Tour Group
Lightning
Campfire
Thunder
Forest Fire
Conditional Probabilities
S,B
S,~B ~S,B
~S,~B
C
0.4
0.1
0.8
0.2
~C
0.6
0.9
0.2
0.8
Storm
Bus Tour Group
Campfire
Conditional Independence
We say x1 is conditionally independent of x2 given x3 if the
probability of x1 is independent of x2 given x3:
P(x1|x2,x3) = P(x1|x3)
The same can be said for a set of variables:
x1,x2,x3 is independent of y1,y2,y3 given z1,z2,z3:
P(x1,x2,x2|y1,y2,y3,z1,z2,z3) = P(x1,x2,x3|z1,z2,z3)
Representation
A BBN represents the joint probability distribution of a set of variables
by explicitly indicating the assumptions of conditional independence
through :
a) directed acyclic graph and
b) local conditional probabilities.
Storm
Bus Tour Group
conditional
probabilities
Campfire
Representation
Each variable is independent of its non-descendants given its predecessors
We say x1 is a descendant of x2 if there is a direct path from x2 to x1.
Example: Predecessors of Campfire: Storm, Bus Tour Group.
(Campfire is a descendant of these two variables).
Campfire is independent of Lightning given its predecessors.
Storm
Bus Tour Group
Lightning
Campfire
Figure 2.25
Joint Probability Distribution
To compute the joint probability distribution of a set of variables
given a Bayesian Belief Network we simply use the formula:
P(x1,x2,…,xn) = Π i P(xi | Parents(xi))
Where parents are the immediate predecessors of xi.
Example:
P(Campfire, Storm, BusGroupTour, Lightning, Thunder, ForestFire)?
P(Storm)P(BusTourGroup)P(Campfire|Storm,BusTourGroup)
P(Lightning|Storm)P(Thunder|Lightning)
P(ForestFire|Lightning,Storm,Campfire).
Joint Distribution, An Example
Storm
Bus Tour Group
Lightning
Campfire
Thunder
Forest Fire
P(Storm)P(BusTourGroup)P(Campfire|Storm,BusTourGroup)
P(Lightning|Storm)P(Thunder|Lightning)
P(ForestFire|Lightning,Storm,Campfire).
Conditional Probabilities, An Example
S,B
S,~B
~S,B
~S,~B
C
0.4
0.1
0.8
0.2
~C
0.6
0.9
0.2
0.8
Storm
Bus Tour Group
Campfire
P(Campfire=true|Storm=true,BusTourGroup=true) = 0.4
Learning Belief Networks
We can learn BBN in different ways. Two basic approaches follow:
1. Assume we know the network structure:
We can estimate the conditional probabilities for each variable
from the data.
2. Assume we know part of the structure but some variables are
missing:
This is like learning hidden units in a neural network.
One can use a gradient ascent method to train the BBN.
3. Assume nothing is known.
We can learn the structure and conditional probabilities by
looking in the space of possible networks.
Naïve Bayes
What is the connection between a BBN and classification?
Suppose one of the variables is the target variable. Can we compute the
probability of the target variable given the other variables?
In Naïve Bayes:
X1
Concept wj
X2
…
Xn
P(x1,x2,…xn,wj) = P(wj) P(x1|wj) P(x2|wj) … P(xn|wj)
General Case
In the general case we can use a BBN to specify independence
assumptions among variables.
General Case:
Concept wj
X1
X2
x4
X3
P(x1,x2,…xn,wj) = P(wj) P(x1|wj) P(x2|wj) P(x3|x1,x2,wj)P(x4,wj)
Judea Pearl – coined the term in 1985
Professor at Univ. of California at LA
Pioneer of Bayesian Networks
Application – Medical Domain
Simple Diagnosis
Smoker
Congestion
Lung
Disease
Positive
Results
Cancer
Constructing Bayesian Networks
Choose the right order from causes to effects.
P(x1,x2,…,xn) = P(xn|xn-1,..,x1)P(xn-1,…,x1)
= Π P(xi|xi-1,…,x1) -- chain rule
Example:
P(x1,x2,x3) = P(x1|x2,x3)P(x2|x3)P(x3)
How to construct BBN
P(x1,x2,x3)
root cause
x3
x2
x1
leaf
Correct order: add root causes first, and then
“leaves”, with no influence on other nodes.
Compactness
BBN are locally structured systems.
They represent joint distributions compactly.
Assume n random variables, each influenced
by k nodes.
Size BBN: n2k
Full size: 2n
Representing Conditional Distributions
Even if k is small O(2k) may be unmanageable.
Solution: use canonical distributions.
Example:
U.S.
Mexico
Canada
North America
simple
disjunction
Noisy-OR
Cold
Flu
Malaria
Fever
A link may be inhibited due to uncertainty
Noisy-OR
Inhibitions probabilities:
P(~fever | cold, ~flu, ~malaria) = 0.6
P(~fever | ~cold, flu, ~malaria) = 0.2
P(~fever | ~cold, ~flu, malaria) = 0.1
Noisy-OR
Now the whole probability can be built:
P(~fever | cold, ~flu, malaria) = 0.6 x 0.1
P(~fever | cold, flu, ~malaria) = 0.6 x 0.2
P(~fever | ~cold, flu, malaria) = 0.2 x 0.1
P(~fever | cold, flu, malaria) = 0.6 x 0.2 x 0.1
P(~fever | ~cold, ~flu, ~malaria) = 1.0
Continuous Variables
Continuous variables can be discretized.
Or define probability density functions
Example: Gaussian distribution.
A network with both variables is called a
Hybrid Bayesian Network.
Continuous Variables
Subsidy
Harvest
Cost
Buys
Continuous Variables
P(cost | harvest, subsidy)
P(cost | harvest, ~subsidy)
Normal
distribution
P(x)
x
Summary
• Bayesian networks are directed acyclic graphs
that concisely represent conditional
independence relations among random
variables.
• BBN specify the full joint probability
distribution of a set of variables.
• BBN can be hybrid, combining categorical
variables with numeric variables.