Bayesian Networks
CPSC 386 Artificial Intelligence
Ellen Walker
Hiram College
Bayes’ Rule
• P(A^B) = P(A|B) * P(B) = P(B|A) * P(A)
• So P(A|B) = P(B|A) * P(A) / P(B)
• This allows us to compute diagnostic probabilities from causal probabilities and prior probabilities!
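A minimal Python sketch of that computation. The numbers are illustrative, chosen to match the flu/fever joint PD later in these slides (P(fever|flu) = 0.8, P(flu) = 0.25, P(fever) = 0.45):

```python
def bayes_rule(p_b_given_a, p_a, p_b):
    """Diagnostic probability from causal and prior probabilities:
    P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# P(flu | fever) from P(fever | flu) = 0.8, P(flu) = 0.25, P(fever) = 0.45
print(bayes_rule(0.8, 0.25, 0.45))  # 0.444... = 4/9
```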
Joint Probability Distribution
• Consider all possibilities of a set of propositions
– E.g. picking 2 cards from a deck:
P(card1 is red and card2 is red)
P(card1 is red and card2 is black)
P(card1 is black and card2 is red)
P(card1 is black and card2 is black)
– Sum of all combinations should be 1
– Sum of all “card1 is red” combinations is P(card1 is red)
Joint P.D. table

                      Second card is red       Second card is black
First card is red     26*25/(52*51) ≈ 0.245    26*26/(52*51) ≈ 0.255
First card is black   26*26/(52*51) ≈ 0.255    26*25/(52*51) ≈ 0.245
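A short Python check of these entries (a sketch, not part of the original slides): the fractions come from counting the 26 red and 26 black cards in a 52-card deck, drawn without replacement.

```python
from fractions import Fraction

def joint(first_red, second_red):
    """P(first card color, second card color) without replacement."""
    first = Fraction(26, 52)
    # 25 cards of the same color remain, 26 of the other color
    second = Fraction(25, 51) if first_red == second_red else Fraction(26, 51)
    return first * second

table = {(c1, c2): joint(c1, c2) for c1 in (True, False) for c2 in (True, False)}
assert sum(table.values()) == 1                        # all combinations sum to 1
assert table[(True, True)] + table[(True, False)] == Fraction(1, 2)  # P(1st red)
print(float(table[(True, True)]))                      # 26*25/(52*51) ≈ 0.245
```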
Operations on Joint P.D.
• Marginalization (summing out)
– Add up elements in a row or column that cover all possibilities for a given variable, to remove that variable from the distribution
• P(1st card is red) = 0.245 + 0.255 = 0.5
• Conditioning
– Get a distribution over one variable from conditional probabilities, summing over the values of the other variables: P(Y) = Σz P(Y|z) * P(z)
• Normalization
– Take the values in the distribution and find a proper multiplier (alpha) so that they add up to 1.
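These operations are easy to state in code. A sketch, assuming the joint distribution is stored as a dict from assignment tuples to probabilities (like the card table above):

```python
def marginalize(joint_pd, var_index):
    """Sum out the variable at var_index, returning a smaller joint."""
    out = {}
    for assignment, p in joint_pd.items():
        reduced = assignment[:var_index] + assignment[var_index + 1:]
        out[reduced] = out.get(reduced, 0) + p
    return out

def normalize(values):
    """Find alpha so that alpha * values sums to 1, and apply it."""
    alpha = 1 / sum(values)
    return [alpha * v for v in values]

# Summing out the second card from the card table leaves P(1st card):
# marginalize(table, 1) -> {(True,): 1/2, (False,): 1/2}
```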
A bigger joint PD

                 Flu                     No flu
           Fever     No fever      Fever     No fever
Ache       .15       .05           .05       .1
No ache    .05       0             .2        .4
Based on that PD…
• Summing out…
– P(flu) = 0.25
– P(fever) = 0.45
– P(flu ^ fever) = 0.2
– P(~flu ^ fever) = 0.25
– P(flu | fever) = P(flu ^ fever) / P(fever) = 0.2 / 0.45 = 4/9
• Normalizing
– <P(flu | fever), P(~flu | fever)> = alpha <0.2, 0.25> = <0.2/0.45, 0.25/0.45> = <4/9, 5/9>
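The same numbers, reproduced in Python from the eight table entries (a sketch; keys are (flu, fever, ache) truth values):

```python
pd = {
    (True,  True,  True):  0.15, (True,  False, True):  0.05,
    (False, True,  True):  0.05, (False, False, True):  0.10,
    (True,  True,  False): 0.05, (True,  False, False): 0.00,
    (False, True,  False): 0.20, (False, False, False): 0.40,
}

p_flu        = sum(p for (f, _, _), p in pd.items() if f)              # 0.25
p_fever      = sum(p for (_, fv, _), p in pd.items() if fv)            # 0.45
p_flu_fever  = sum(p for (f, fv, _), p in pd.items() if f and fv)      # 0.20
p_nflu_fever = sum(p for (f, fv, _), p in pd.items() if not f and fv)  # 0.25

# Normalizing <0.2, 0.25> gives <4/9, 5/9>:
alpha = 1 / (p_flu_fever + p_nflu_fever)
print(alpha * p_flu_fever, alpha * p_nflu_fever)  # 0.444..., 0.555...
```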
Evaluating Full Joint PD’s
• Advantage
– All combinations are available
– Any joint, marginal, or conditional probability can be computed
• Disadvantage
– Combinatorial explosion! For N Boolean variables, we need 2^N individual probabilities
– Difficult to get probabilities for all combinations
Independence
• Absolute independence:
– P(A|B) = P(A) or P(A,B) = P(A)*P(B)
– No need for joint table
Conditional independence
• P(A|B,C) = P(A|C) or P(A,B|C) = P(A|C) * P(B|C)
• If we know the truth of C, then A and B become independent
– (e.g. ache and fever are independent given flu)
• We can say C “separates” A and B
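A quick numerical illustration (with made-up numbers): if a joint distribution is built as P(c) * P(a|c) * P(b|c), then once C is known, learning B tells us nothing more about A.

```python
# Arbitrary illustrative numbers, not from the slides
p_c = {True: 0.3, False: 0.7}             # P(C)
p_a_given_c = {True: 0.9, False: 0.2}     # P(A=True | C)
p_b_given_c = {True: 0.8, False: 0.1}     # P(B=True | C)

def p(a, b, c):
    """Joint built so that A and B are independent given C."""
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

# P(A | B, C) equals P(A | C) = 0.9 whatever value B takes:
for b in (True, False):
    p_a_given_bc = p(True, b, True) / (p(True, b, True) + p(False, b, True))
    print(p_a_given_bc)  # 0.9 both times
```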
Naïve Bayes model
• Assume that all possible effects (symptoms) are separated by the cause, i.e. conditionally independent given the cause
• Then:
– P(cause, effect1, effect2, effect3…) = P(cause) * P(effect1|cause) * P(effect2|cause) * …
• Can work surprisingly well in many cases
• Necessary conditional probabilities can be learned
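A sketch of such a model in Python. The conditional probabilities are read off the earlier flu/fever/ache joint PD (e.g. P(fever|flu) = 0.2/0.25 = 0.8); treating them as a naïve Bayes model is an approximation for illustration:

```python
p_cause = 0.25                                          # P(flu)
p_effect_given_cause  = {"fever": 0.8, "ache": 0.8}     # P(effect | flu)
p_effect_given_ncause = {"fever": 1/3, "ache": 0.2}     # P(effect | ~flu)

def nb_joint(cause, observed):
    """P(cause, effect1, effect2, ...) under the naive Bayes assumption."""
    cpt = p_effect_given_cause if cause else p_effect_given_ncause
    prob = p_cause if cause else 1 - p_cause
    for effect, present in observed.items():
        prob *= cpt[effect] if present else 1 - cpt[effect]
    return prob

# Diagnose by normalizing over the two values of the cause:
obs = {"fever": True, "ache": True}
num, den = nb_joint(True, obs), nb_joint(True, obs) + nb_joint(False, obs)
print(num / den)  # P(flu | fever, ache) ≈ 0.76 under this model
```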
Bayesian Network
• Data structure that represents
– Dependencies (and conditional independencies) among variables
– Necessary information to compute a full joint probability distribution
Structure of Bayesian Network
• Nodes represent random variables
– (e.g. flu, fever, ache)
• Directed links (arrows) connect pairs of nodes, from parent to child
• Each node has a conditional P.D. P(child | parents)
– More parents, bigger P.D.
• Graph has no directed cycles
– No node is its own (great… grand) parent!
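One simple way to store this structure in Python is a dict mapping each node to its parents. The nodes below follow the example network on the next slide; the edges are inferred from the surrounding slides, since the original figure is not reproduced here.

```python
parents = {
    "damp":    [],                   # root causes have no parents
    "flu":     [],
    "measles": [],
    "ache":    ["damp", "flu"],
    "fever":   ["flu", "measles"],
    "spots":   ["measles"],
    "therm":   ["fever"],            # thermometer > 100 F
}

def acyclic(parents):
    """Check that no node is its own (great... grand) parent."""
    def no_cycle_from(node, path):
        for p in parents[node]:
            if p == node or p in path or not no_cycle_from(p, path + (node,)):
                return False
        return True
    return all(no_cycle_from(n, ()) for n in parents)

print(acyclic(parents))  # True
```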
Example Bayesian Network
[Figure: network with nodes damp weather, flu, measles, ache, fever, spots, and thermometer > 100 F; arrows run from the causes (damp weather, flu, measles) to their effects (ache, fever, spots), and from fever to thermometer > 100 F.]
Probabilities in the network
• Probability of a complete set of variable assignments is the product of the conditional probabilities computed from parents
– P(x1, x2, …) = P(x1 | parents(X1)) * P(x2 | parents(X2)) * …
• Example
– P(~therm ^ damp ^ ache ^ ~fever ^ flu) = P(damp) * P(flu) * P(ache | damp ^ flu) * P(~fever | flu) * P(~therm | ~fever)
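A sketch of this product in Python, restricted to the five variables in the example (dropping measles and spots for brevity). Only the factorization rule comes from the slide; every CPT number below is an invented placeholder.

```python
# P(node = True | values of its parents); all numbers are made up
cpt = {
    "damp":  {(): 0.3},
    "flu":   {(): 0.25},
    "ache":  {(True, True): 0.9, (True, False): 0.6,
              (False, True): 0.8, (False, False): 0.05},
    "fever": {(True,): 0.8, (False,): 0.02},
    "therm": {(True,): 0.95, (False,): 0.01},
}
parents = {"damp": [], "flu": [], "ache": ["damp", "flu"],
           "fever": ["flu"], "therm": ["fever"]}

def joint_prob(assign):
    """P(x1, x2, ...) = product over nodes of P(xi | parents(Xi))."""
    prob = 1.0
    for node in cpt:
        key = tuple(assign[p] for p in parents[node])
        p_true = cpt[node][key]
        prob *= p_true if assign[node] else 1 - p_true
    return prob

# P(~therm ^ damp ^ ache ^ ~fever ^ flu):
print(joint_prob({"damp": True, "flu": True, "ache": True,
                  "fever": False, "therm": False}))  # 0.3*0.25*0.9*0.2*0.99
```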
Constructing a Bayesian Network
• Start with root causes
• Add direct consequences next (connected), and so on…
– E.g. damp weather -> ache, not damp weather -> flu
• Each node should be directly connected to (influenced by) only a few others
• If we choose a different order, we’ll get tables that are too big!