
TIGHTENING BOUNDS FOR BAYESIAN NETWORK STRUCTURE LEARNING
Xiannian Fan, Changhe Yuan and Brandon Malone
A recent breadth-first branch and bound algorithm (BFBnB) for learning Bayesian network structures (Malone et al. 2011) uses two bounds to prune the search space for better efficiency: one is a lower bound calculated from pattern database heuristics, and the other is an upper bound obtained by a hill climbing search. Whenever the lower bound of a search path exceeds the upper bound, the path is guaranteed to lead to suboptimal solutions and is discarded immediately. This paper introduces methods for tightening both bounds. The lower bound is tightened by using more informed variable groupings in creating the pattern databases, and the upper bound is tightened using an anytime learning algorithm. Empirical results show that the tightened bounds improve the efficiency of Bayesian network learning by two to three orders of magnitude.
Bayesian Network Structure Learning
Representation. Joint probability distribution over a set of variables.
Structure. DAG storing conditional dependencies.
• Vertices correspond to variables.
• Edges indicate relationships among variables.
Parameters. Conditional probability distributions.
Learning. Find the network with the minimal score for complete dataset D. We often omit D for brevity.
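The scores used in this line of work are decomposable: the score of a network is the sum of local scores, one per variable and parent set. The following is a minimal sketch of that idea in Python; the variable names, local_score values, and the example DAG are hypothetical placeholders, and in practice the local scores come from a scoring function such as MDL computed on D.

    # Decomposable score: total = sum of local scores score(X | parents(X)).
    local_score = {
        ("A", frozenset()): 10.2,            # hypothetical values
        ("B", frozenset({"A"})): 7.8,
        ("C", frozenset({"A", "B"})): 5.1,
    }

    def network_score(parents):
        """parents maps each variable to its parent set in a candidate DAG."""
        return sum(local_score[(X, frozenset(pa))] for X, pa in parents.items())

    # Score of the DAG A -> B, {A, B} -> C; learning seeks the DAG minimizing this.
    print(network_score({"A": set(), "B": {"A"}, "C": {"A", "B"}}))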
Tightening Upper Bound
Anytime window A* (AWA*) was shown to find high-quality, often optimal, solutions very quickly, thus providing a tight upper bound.
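Any complete anytime solution yields a valid upper bound, since its score can only be matched or improved by the optimal network. As a minimal illustration (not the AWA* algorithm itself): fixing a variable ordering and giving each variable its best parents among its predecessors produces a DAG whose score is such a bound. Here best_score is an assumed, hypothetical interface to the precomputed local scores.

    def ordering_upper_bound(order, best_score):
        """Score of the best network consistent with one fixed variable ordering."""
        bound, predecessors = 0.0, set()
        for X in order:
            # Best parent set for X drawn only from the variables before it.
            bound += best_score(X, frozenset(predecessors))
            predecessors.add(X)
        return bound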
Tightening Lower Bound
The lower bound was calculated from a pattern database heuristic called the k-cycle conflict heuristic. In particular, the static k-cycle conflict pattern database was shown to perform well.
Computing the k-Cycle Conflict Heuristic.
The main idea is to relax the acyclicity constraint between groups of variables, while acyclicity is still enforced among the variables within each group.
For an 8-variable problem, partition the variables by Simple Grouping (SG) into two groups: G1 = {X1, X2, X3, X4} and G2 = {X5, X6, X7, X8}. We created the pattern database for each group with a backward breadth-first search in the order graph.
E.g., how do we calculate the heuristic for the pattern {X2, X3, X5, X7}?
P1 = h1({X2, X3}) = BestScore(X2, {X1, X4} ∪ G2) + BestScore(X3, {X1, X2, X4} ∪ G2)
P2 = h2({X5, X7}) = BestScore(X5, {X6, X8} ∪ G1) + BestScore(X7, {X5, X6, X8} ∪ G1)
Additive pattern database heuristic:
h({X2, X3, X5, X7}) = h1({X2, X3}) + h2({X5, X7}) = P1 + P2
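The two lookups can simply be added because each database only charges for the variables in its own group. A minimal sketch of the lookup step in Python, assuming the two databases were already built by the backward breadth-first searches above (the stored values are hypothetical placeholders):

    G1 = frozenset({"X1", "X2", "X3", "X4"})
    G2 = frozenset({"X5", "X6", "X7", "X8"})

    pd1 = {frozenset({"X2", "X3"}): 12.4}   # h1: patterns over G1 (hypothetical)
    pd2 = {frozenset({"X5", "X7"}): 9.7}    # h2: patterns over G2 (hypothetical)

    def h(pattern):
        """h(U) = h1(U ∩ G1) + h2(U ∩ G2): additive over the two groups."""
        return pd1[pattern & G1] + pd2[pattern & G2]

    print(h(frozenset({"X2", "X3", "X5", "X7"})))   # P1 + P2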
More Informed Grouping Strategies
Rather than using SG (a 1st-half vs. 2nd-half grouping), we developed more informed grouping strategies.
1. Maximize the correlation between the variables within each group, and minimize the correlation between groups.
a) Family Grouping (FG): We created a correlation graph using the Max-Min Parents and Children (MMPC) algorithm, weighted the edges by negative p-value, and then performed graph partitioning (a sketch of the partitioning step follows this list).
b) Parents Grouping (PG): We created a correlation graph by considering only each variable's optimal parent set out of all the other variables, weighted the edges by negative p-value, and then performed graph partitioning.
2. Use topological ordering information.
a) Topology Grouping (TG): We took the topological ordering of an anytime Bayesian network solution found by AWA*, then partitioned the variables according to that ordering.
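The partitioning step shared by FG and PG can be done with any balanced graph partitioner. Below is a minimal sketch using networkx's Kernighan-Lin bisection; the edges and weights are hypothetical stand-ins for the MMPC or optimal-parent-set correlations (heavier here means more strongly correlated, so minimizing the cut keeps correlated variables in the same group):

    import networkx as nx
    from networkx.algorithms.community import kernighan_lin_bisection

    G = nx.Graph()
    G.add_weighted_edges_from([
        ("X1", "X2", 0.9), ("X2", "X3", 0.8), ("X3", "X4", 0.7),
        ("X5", "X6", 0.9), ("X6", "X7", 0.8), ("X7", "X8", 0.7),
        ("X4", "X5", 0.1),   # weak cross-cluster edge, cheap to cut
    ])
    g1, g2 = kernighan_lin_bisection(G, weight="weight", seed=0)
    print(sorted(g1), sorted(g2))   # two groups of four variables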
Dynamic Programming
Intuition. All DAGs must have a leaf. Optimal networks for a single variable are trivial. Recursively add new leaves and select their optimal parents until all variables have been added. All orderings have to be considered.
Begin with a single variable. Pick one variable as a leaf and find its optimal parents. Pick another leaf and find its optimal parents from the current variables. Continue picking leaves and finding optimal parents.
Recurrences.
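The recurrence formulas appeared as display equations on the poster; reconstructed here in the standard form for score-based learning, with base case Score(∅) = 0:

    \[
    \mathrm{Score}(U) = \min_{X \in U} \bigl[ \mathrm{Score}(U \setminus \{X\}) + \mathrm{BestScore}(X,\, U \setminus \{X\}) \bigr]
    \]
    \[
    \mathrm{BestScore}(X, U) = \min_{PA \subseteq U} \mathrm{score}(X \mid PA)
    \]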
Graph Search Formulation
The dynamic programming formulation can be visualized as a search through an order graph.
The Order Graph
Calculation. Score(U), the score of the best subnetwork for U.
Node. One node per subset U, storing Score(U).
Successor. Add X as a leaf to U.
Path. Induces an ordering on the variables.
Size. 2^n nodes, one for each subset.
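As a small illustration of the order graph's structure (the variable names are hypothetical):

    from itertools import combinations

    def successors(U, variables):
        """Successors of node U in the order graph: add one new leaf X."""
        return [(X, U | {X}) for X in variables if X not in U]

    variables = {"X1", "X2", "X3"}
    nodes = [frozenset(c) for r in range(len(variables) + 1)
             for c in combinations(sorted(variables), r)]
    assert len(nodes) == 2 ** len(variables)   # one node per subset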
Admissible Heuristic Search Formulation
Start Node. Top node, {}.
Goal Node. Bottom node, V.
Shortest Path. Corresponds to the optimal structure.
g(U). Score(U).
h(U). Obtained by relaxing acyclicity.
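A minimal sketch of how the two bounds prune this search, written as a layer-by-layer breadth-first branch and bound over the order graph; best_score, h, and upper_bound are assumed inputs (hypothetical interfaces), and this simplified loop stands in for the frontier search of the referenced BFBnB algorithm:

    def bfbnb(variables, best_score, h, upper_bound):
        g = {frozenset(): 0.0}                  # g(U) = Score(U); start node {}
        for _ in variables:                     # one order-graph layer per pass
            nxt = {}
            for U, g_U in g.items():
                for X in variables:
                    if X in U:
                        continue
                    V = U | {X}                 # add X as a leaf
                    cand = g_U + best_score(X, U)
                    if cand >= nxt.get(V, float("inf")):
                        continue                # a cheaper path to V is known
                    if cand + h(V) > upper_bound:
                        continue                # lower bound exceeds upper bound: prune
                    nxt[V] = cand
            g = nxt
        return g.get(frozenset(variables))      # Score(V) at the goal node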
Experiments
We measured the running time (in seconds) and the number of expanded nodes of BFBnB, compared against the previous heuristic.
Figure: The effect of upper bounds generated by running AWA* for different amounts of time on the performance of BFBnB search.
Figure: The effect of different grouping strategies on the number of expanded nodes and running time. The four grouping methods are simple grouping (SG), FG, PG, and TG.
Selected References
1. Yuan, C.; Malone, B.; and Wu, X. 2011. Learning optimal Bayesian networks using A* search. In IJCAI '11.
2. Malone, B.; and Yuan, C. 2011. Improving the scalability of optimal Bayesian network learning with frontier breadth-first branch and bound search. In UAI '11.
3. Felner, A.; Korf, R. E.; and Hanan, S. 2004. Additive pattern database heuristics. Journal of Artificial Intelligence Research (JAIR) 22.
4. Malone, B.; and Yuan, C. 2013. Evaluating anytime algorithms for learning optimal Bayesian networks. In UAI '13.