Bayesian Networks

Bayesian Networks
What is the probability of X given evidence E, i.e., P(X|E) = ?
Issues
• Representational Power
– allows for unknown, uncertain information
• Inference
– Question: what is the probability of X given that E is true?
– Processing: in general, exponential
• Acquisition or Learning
– network structure: human input
– probabilities: data + learning
Bayesian Network
• Directed Acyclic Graph (DAG)
• Nodes are RVs
• Edges denote dependencies
• Root nodes = nodes without predecessors
– prior probability table
• Non-root nodes
– conditional probabilities for all predecessors
Pearl Alarm Example
Edges: B -> A, E -> A, A -> JC, A -> MC
P(B) = .001 P(-B) = .999
P(E) = .002 P(-E) = .998 etc.
P(A|B&E) = .95 P(A|B&-E) = .94
P(A|-B&E) = .29 P(A|-B&-E) = .001
P(JC|A) = .90 P(JC|-A) = .05
P(MC|A) = .70 P(MC|-A) = .01
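As an illustration, here is one way the structure and the tables might be encoded (a minimal Python sketch; the node names and the dict layout are my own choices, not from the lecture):

```python
# Pearl's alarm network: the parent lists define the DAG; each CPT maps
# a tuple of parent values to P(node = True | parents).

parents = {
    "B": [],            # Burglary (root)
    "E": [],            # Earthquake (root)
    "A": ["B", "E"],    # Alarm depends on Burglary and Earthquake
    "JC": ["A"],        # JohnCalls depends on Alarm
    "MC": ["A"],        # MaryCalls depends on Alarm
}

cpt = {
    "B": {(): 0.001},                       # P(B)
    "E": {(): 0.002},                       # P(E)
    "A": {(True, True): 0.95,               # P(A | B & E)
          (True, False): 0.94,              # P(A | B & -E)
          (False, True): 0.29,              # P(A | -B & E)
          (False, False): 0.001},           # P(A | -B & -E)
    "JC": {(True,): 0.90, (False,): 0.05},  # P(JC | A), P(JC | -A)
    "MC": {(True,): 0.70, (False,): 0.01},  # P(MC | A), P(MC | -A)
}
```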
Joint Probability yields all
• Event = fully specified values for RVs.
• Prob of an event: P(x1, x2, ..., xn) = P(x1 | Parents(X1)) * ... * P(xn | Parents(Xn))
• E.g. P(j & m & a & -b & -e) = P(j|a) * P(m|a) * P(a|-b&-e) * P(-b) * P(-e) = .9 * .7 * .001 * .999 * .998 ≈ .000628
• Do this for all events and then sum as
needed.
• Yields exact probabilities
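A sketch of that chain-rule product, reusing the `parents` and `cpt` dicts from the sketch above (the function name `joint_prob` is mine):

```python
def joint_prob(event, parents, cpt):
    """P(x1, ..., xn) = product of P(xi | Parents(Xi)) over all nodes.

    `event` maps every node name to True/False.
    """
    prob = 1.0
    for node, pars in parents.items():
        key = tuple(event[par] for par in pars)
        p_true = cpt[node][key]
        prob *= p_true if event[node] else 1.0 - p_true
    return prob

# P(j & m & a & -b & -e) = .9 * .7 * .001 * .999 * .998 ≈ .000628
print(joint_prob({"JC": True, "MC": True, "A": True, "B": False, "E": False},
                 parents, cpt))
```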
Summing out
• P(b) from the full joint = sum over all events with b = true (marginalization)
– Silly here: it must be .001, since that is b's prior.
• P(MC|JC): MC and JC are linked through the alarm; the sum is still too much work, though less obviously so.
• Need to answer: which RVs are independent of the others, given the evidence?
• We're skipping: Markov Blankets
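Summing out can be sketched as brute-force enumeration over the full joint, again building on the hypothetical `joint_prob`, `parents`, and `cpt` from the sketches above:

```python
from itertools import product

nodes = list(parents)

def sum_out(fixed):
    """Sum the full joint over all events consistent with `fixed`."""
    free = [n for n in nodes if n not in fixed]
    total = 0.0
    for values in product([True, False], repeat=len(free)):
        event = dict(fixed, **dict(zip(free, values)))
        total += joint_prob(event, parents, cpt)
    return total

print(sum_out({"B": True}))   # marginal P(B): must come out to .001
# P(MC | JC) by the definition of conditional probability:
print(sum_out({"MC": True, "JC": True}) / sum_out({"JC": True}))
```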
Probability Calculation Cost
• For example, the full joint over 5 boolean variables requires 2^5 = 32 entries. In general, 2^n.
• For Bayes net, only need tables for all
conditional probabilities and priors.
• If each node has at most k inputs, and there are n RVs, then only n*2^k table entries are needed.
• Inference computation can be reduced further, but this is difficult in general.
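The size comparison is easy to check numerically (the (n, k) pairs below are illustrative):

```python
# Full joint vs. Bayes-net table sizes: 2**n entries vs. n * 2**k.
for n, k in [(5, 2), (30, 5)]:
    print(f"n={n}, k={k}: full joint {2**n}, Bayes net {n * 2**k}")
# n=30, k=5: ~1.07e9 entries for the full joint vs. 960 for the net.
```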
Approximate Inference
• Simple Sampling
• Use the Bayes network as a generative model
• E.g., generate 10,000 examples by sampling in topological order.
• Generates examples with appropriate
distribution.
• Now use examples to estimate probabilities.
Sampling -> probabilities
• Generate examples with the proper probability density.
• Use the ordering of the nodes to construct events.
• Finally, use counting to yield an estimate of the exact probability.
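A minimal sketch of this simple (forward) sampling, assuming the `parents` and `cpt` dicts from the earlier sketch: walk the nodes in topological order, sample each one given its already-sampled parents, then estimate by counting.

```python
import random

order = ["B", "E", "A", "JC", "MC"]   # a topological order of the DAG

def sample_event():
    """Sample one full event by visiting nodes parents-first."""
    event = {}
    for node in order:
        key = tuple(event[par] for par in parents[node])
        event[node] = random.random() < cpt[node][key]
    return event

samples = [sample_event() for _ in range(10_000)]
# Any probability can now be estimated by counting, e.g. P(A):
print(sum(s["A"] for s in samples) / len(samples))
```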
Estimate P(JC|MC)
1. Do the following a large number of times:
   1. Use the prior tables to sample the root nodes ("root events").
   2. Say b = false and e = false.
   3. Use the conditional tables to sample the internal nodes' values (IN ORDER).
   4. Say a = false.
   5. Say JC = false and MC = true.
2. Count the number of appropriate events.
3. Note that many samples are irrelevant: here an event is considered only if MC = true. Markov Chain Monte Carlo constructs only appropriate events.
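The procedure above amounts to rejection sampling; a sketch, reusing the `samples` list from the previous sketch:

```python
# Keep only samples consistent with the evidence MC = true, then count JC.
kept = [s for s in samples if s["MC"]]
if kept:
    print("P(JC | MC) ~", sum(s["JC"] for s in kept) / len(kept))
# MC = true is rare (roughly 1% of samples), so most of the work is
# discarded -- the inefficiency that Markov Chain Monte Carlo avoids.
```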
Confidence of Estimate
• Given n examples, of which k are heads.
• How many examples are needed to be 99% certain that k/n is within .01 of the true p?
• From statistics (binomial): mean = np, variance = npq
• For confidence of .99, t = 3.25 (table)
• 3.25*sqrt(pq/N) < .01; with worst case pq = 1/4, this gives N > ~26,400.
• Often the exact probabilities are not needed: just the correct ordering.
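Solving the stated bound for N, with the worst case pq = 1/4 (a sketch of the arithmetic only):

```python
import math

t, eps = 3.25, 0.01   # tabulated multiplier and target accuracy
pq = 0.25             # worst case p*q, attained at p = q = 0.5
N = (t * math.sqrt(pq) / eps) ** 2
print(math.ceil(N))   # ~26,407 samples suffice in the worst case
```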
Applications
• Bayesian networks have been extended to decision theory.
– decision nodes have actions attached
– value nodes indicate expected utility
• Pathfinder (Heckerman): medical diagnosis
– adds utility theory (decision theory)
– some actions are specific tests
– 60 diseases, 130 features
• Research Arena