Jensen.Gitelman.SSDAAR2.Poster.2003



Bayesian Belief Networks: Computational Considerations for the Environmental
Researcher
Bayesian Belief Networks (BBN)
Bayesian Belief Networks are a class of models that succinctly describe the dependencies and
interactions among large sets of variables. BBN have been used extensively in fields such as artificial
intelligence and the decision sciences, and they are well suited to environmental research, with its
large numbers of variables and extensive dependencies and interactions. In particular, BBN are well
suited to studying relationships between effluent limitations and water quality (e.g., Borsuk et al.,
2003).
Stephen Jensen & Alix I Gitelman
Statistics Department
Oregon State University
August, 2003
BBN are graphical models: a set of nodes (vertices) represents the random variables, and a set of
edges represents relationships between the nodes. A BBN can contain directed or undirected edges,
and even a mixture of the two (some examples are shown below):
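To make the directed case concrete, a small BBN can be stored as a parent map plus a conditional probability table (CPT) per node; the joint distribution then factors as the product of the local conditionals. The sketch below is not from the poster: the three-node chain and all probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical three-node DAG, encoded as a parent map: A -> B -> C.
parents = {"A": [], "B": ["A"], "C": ["B"]}

# P(node = 1 | parent values); all numbers are invented for the sketch.
p_one = {
    "A": {(): 0.6},
    "B": {(0,): 0.7, (1,): 0.2},
    "C": {(0,): 0.8, (1,): 0.3},
}

def joint(assign):
    """P(assignment) as the product of local conditionals --
    the factorization a directed BBN encodes."""
    p = 1.0
    for node, pa in parents.items():
        parent_vals = tuple(assign[q] for q in pa)
        p1 = p_one[node][parent_vals]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p

# Sanity check: the joint sums to 1 over all 2^3 assignments.
total = sum(joint(dict(zip(parents, vals)))
            for vals in product([0, 1], repeat=3))
print(total)
```

The parent map is all the structural learning has to recover; the CPT entries are what parameter learning estimates.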
Four Node Example: Structural Learning vs RJMCMC Methods
or “This is all very nice, Steve, but will it work?”
In this example we consider a simple data set of four continuous variables, to which we fit a directed graphical model. The model is fit with a traditional structural-learning algorithm (in this case DEAL) and
with an RJMCMC algorithm (using the package BayesX). The data come from the Mid-Atlantic Integrated Assessment (MAIA) for 1997-1998. Specifically, the variables are BUGIBI, an index of biotic integrity for
macro-invertebrates; LAT, latitude of the sample point; ANC, acid neutralizing capacity; and SO4, sulfate concentration.
(Figures: examples of a directed graph, an undirected graph, and a chain graph.)
Undirected BBN have their genesis in the statistical physics of Gibbs (1902); they have since been
used for modeling spatial interactions (Besag, 1974) and for interpreting hierarchical log-linear
models by analogy to Markov random fields (Darroch et al., 1980). Directed BBN originated in path
analysis (Wright, 1921) and have seen much more use recently in the guise of causal networks
(Pearl, 1986; Lauritzen & Spiegelhalter, 1988).
Structural Learning Method | Reversible Jump MCMC Method
(Figures: candidate networks deal1 and deal2, drawn with dot; the EPS previews were not saved with the images. Panel labels: Probability: 0.130, 0.125, 0.107, 0.638.)
Estimating BBN
Several directed BBN algorithms have been devised, including HUGIN (Andersen et al., 1989),
TETRAD (Spirtes et al., 1993) and DEAL (Bøttcher & Dethlefsen, 2003). To determine a BBN model
for a set of data, a canned package such as HUGIN or DEAL can be used exclusively. These packages
search some space of likely models, either exhaustively or with a heuristic such as greedy search,
and score the networks in some manner to determine the best candidates. They are useful for
structural learning, that is, for understanding which nodes are connected to which. However, these
packages do not give the probability that an edge is included in the graph, nor do they provide
standard error estimates for the parameter (i.e., node probability) estimates. Furthermore, for
efficiency, these canned packages all require frequent triangulation of the graph, which is an
NP-hard problem, although restriction to decomposable models requires only a single triangulation
(Deshpande et al., 2001).
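To illustrate what score-based structural learning does (this is a toy, not DEAL or HUGIN), the sketch below runs a greedy, BIC-scored search for parent sets over three binary variables, on synthetic data with one planted dependency. Fixing the node order A < B < C keeps the graph acyclic for free, a simplification the real packages do not make.

```python
import math, random

random.seed(1)

# Synthetic binary data with one planted dependency, A -> B (C independent).
data = []
for _ in range(500):
    a = random.random() < 0.5
    b = random.random() < (0.8 if a else 0.2)
    c = random.random() < 0.5
    data.append({"A": int(a), "B": int(b), "C": int(c)})

nodes = ["A", "B", "C"]

def family_score(node, pa):
    """BIC-style score of one node given a candidate parent set."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in pa)
        counts.setdefault(key, [0, 0])[row[node]] += 1
    ll = 0.0
    for c0, c1 in counts.values():
        n = c0 + c1
        for cnt in (c0, c1):
            if cnt:
                ll += cnt * math.log(cnt / n)
    # one free parameter per parent configuration (binary nodes)
    return ll - 0.5 * len(counts) * math.log(len(data))

def bic(parent_map):
    return sum(family_score(v, pa) for v, pa in parent_map.items())

# Greedy edge additions, restricted to the node order A < B < C so the
# graph stays acyclic without any triangulation machinery.
best = {v: [] for v in nodes}
improved = True
while improved:
    improved = False
    for i, v in enumerate(nodes):
        for u in nodes[:i]:
            if u in best[v]:
                continue
            trial = {w: list(pa) for w, pa in best.items()}
            trial[v].append(u)
            if bic(trial) > bic(best):
                best, improved = trial, True
print(best)  # the planted edge A -> B should be recovered
```

Note what the output lacks: a single best structure, with no probability attached to any edge and no standard errors on the CPT estimates, which is exactly the limitation discussed above.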
An extension of this method is to first use HUGIN or another package to find a set of candidate
models (that is, to perform the structural learning), and then to estimate the parameters of those
models with a Markov chain Monte Carlo method (the parameter-learning phase). In this way,
variability estimates for the model parameters can be calculated. However, this approach is
two-stage, and the probability of an edge being included in the model remains uncalculated. Others
have simply assumed a structure for a BBN and then used extant software to estimate the strength of
the edges (Borsuk et al., 2003).
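The parameter-learning stage can be sketched in a few lines: with the structure held fixed, random-walk Metropolis sampling of a single CPT entry yields both a point estimate and the variability estimate that the canned packages omit. The counts and the flat prior here are invented for illustration, not taken from any analysis in the poster.

```python
import math, random

random.seed(2)

# Structure held fixed; sample the posterior of one CPT entry
# theta = P(B = 1 | A = 1) by random-walk Metropolis.
# Counts are invented; the flat prior on (0, 1) is an assumption.
n, k = 120, 84          # cases with A = 1; those among them with B = 1

def log_post(theta):
    if not 0.0 < theta < 1.0:
        return -math.inf
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

theta, samples = 0.5, []
for step in range(20000):
    prop = theta + random.gauss(0.0, 0.05)
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop
    if step >= 5000:    # discard burn-in
        samples.append(theta)

mean = sum(samples) / len(samples)
sd = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(round(mean, 2), round(sd, 3))  # point estimate plus a variability estimate
```

In practice this stage would be delegated to software such as WinBUGS; the point is that MCMC gives the posterior spread of each parameter, not just a single fitted value.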
Reversible jump Markov chain Monte Carlo (RJMCMC; Green, 1995) generalizes the
Metropolis-Hastings algorithm to models whose parameter spaces vary in dimension. This makes it
well suited to graphical-model estimation, where the number (and direction) of edges may be
unknown a priori. That is, using the RJMCMC algorithm, we can accomplish both structural and
parameter learning; furthermore, edge probabilities and estimate variability are calculated as a matter
of course. RJMCMC works well for discrete undirected graphical models (Dellaportas & Forster,
1999), but is computationally demanding. Efficiency is improved by restricting the search to
decomposable models. Algorithms for undirected decomposable graphs (UDGs) exist in purely
continuous (Giudici & Green, 1999) and purely discrete settings (Giudici et al., 2000), and Fronk &
Giudici (2000) provide an algorithm for directed acyclic graphs (DAGs).
(Figures: candidate networks deal3 and deal4, drawn with dot; the EPS previews were not saved with the images. Panel labels: Relative Score = 1, Relative Score = 0.8211.)
References
(Panel labels: Relative Score = 0.7999, Relative Score = 0.7234.)
Recommendations
Depending on the nature of prior information about the ecological system one wishes to model
with a BBN, there are several computational approaches to choose from:
• Assume "known" nodes and perform structural learning (e.g., DEAL, HUGIN).
• Assume "known" edges and perform parameter learning (e.g., WinBUGS).
1. Select an edge for addition, removal, or reversal.
2. Decide whether to make the structural change; if so, update the structure and parameters.
3. Otherwise, update the parameters in some fashion without making a structural change.
RJMCMC algorithms will often require the same re-triangulation of the graph as the traditional
algorithms, making the structural updating quite computationally intensive. Restriction to
decomposable models makes the computation easier, as the model structure can then be updated in
polynomial time (Deshpande et al., 2001). Updating the parameters (step 3 above) is a simple
Metropolis-Hastings step, which is rather quick.
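The three steps above can be sketched with a toy RJMCMC run on a single candidate edge for binary data. With uniform priors and uniform jump proposals the dimension-changing acceptance ratio reduces to a likelihood ratio; a real graphical-model sampler would also handle edge reversal and triangulation. All counts below are invented.

```python
import math, random

random.seed(3)

# Toy RJMCMC over one candidate edge A -> B:
# model 0 (no edge) has one parameter p = P(B=1);
# model 1 (edge) has p0 = P(B=1|A=0) and p1 = P(B=1|A=1).
n0, k0 = 60, 18     # A = 0: trials, successes (invented counts)
n1, k1 = 60, 45     # A = 1: trials, successes

def loglik(model, params):
    if any(not 0.0 < t < 1.0 for t in params):
        return -math.inf
    if model == 0:
        (p,) = params
        return (k0 + k1) * math.log(p) + (n0 + n1 - k0 - k1) * math.log(1 - p)
    p0, p1 = params
    return (k0 * math.log(p0) + (n0 - k0) * math.log(1 - p0)
            + k1 * math.log(p1) + (n1 - k1) * math.log(1 - p1))

model, params = 0, (0.5,)
in_m1, iters = 0, 20000
for step in range(iters):
    if random.random() < 0.5:
        # Steps 1-2: propose toggling the edge (a dimension change);
        # uniform priors and proposals leave a plain likelihood ratio.
        new_model = 1 - model
        new_params = tuple(random.random() for _ in range(new_model + 1))
        if math.log(random.random()) < loglik(new_model, new_params) - loglik(model, params):
            model, params = new_model, new_params
    else:
        # Step 3: within-model random-walk Metropolis update.
        new_params = tuple(t + random.gauss(0.0, 0.05) for t in params)
        if math.log(random.random()) < loglik(model, new_params) - loglik(model, params):
            params = new_params
    in_m1 += model

print(round(in_m1 / iters, 2))  # posterior probability of the edge
```

The fraction of iterations spent in model 1 is precisely the edge-inclusion probability that the structural-learning packages cannot provide, obtained here from the same chain that updates the parameters.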
BayesX. http://www.stat.uni-muenchen.de/~lang/bayesx/bayesx.html
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236.
Borsuk, M.E., Stow, C.A., Reckhow, K.H. (2003) Integrated approach to total maximum daily load development for Neuse River Estuary using Bayesian
probability network model. J. Water Res. Pl. & Mgmt. 129,4 271-282.
Bøttcher, S.G. and Dethlefsen, C. (2003). DEAL: A package for learning Bayesian networks. Technical report, R-2003-03. Department of Mathematical
Sciences, Aalborg University.
BUGS. http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml
Darroch, J.N., Lauritzen, S.L. and Speed, T.P. (1980). Markov fields and log-linear interaction models for contingency tables. Ann. Stat. 8 522-539.
Dawid, A. and Lauritzen, S. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat., 21 1272-1317.
DEAL. http://www.math.auc.dk/novo/deal/
• Perform a two-step approach: first learn the structure, then the parameters.
Dellaportas, P., Forster, J.J. (1999). Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika 86 615-633.
• Perform structural and parameter learning simultaneously (RJMCMC; e.g., BayesX).
Deshpande, A., Garofalakis, M. and Jordan, M.I. (2001). Efficient stepwise selection in decomposable models. In J. Breese and D. Koller (eds.), Uncertainty in Artificial Intelligence (UAI),
Proceedings of the Seventeenth Conference.
Tradeoffs among these approaches include computational expense (in terms of programming costs and CPU time)
and the feasibility (really, the availability) of probability and uncertainty estimation.
The RJMCMC algorithms for graphical models involve the three steps listed above.
Andersen, S.K., Olesen, K.G., Jensen, F.V. and Jensen, F. (1989). HUGIN – A shell for building Bayesian belief universes for expert systems, in Shafer, G. and
Pearl, J. (eds) (1990). Readings in Uncertain Reasoning, Morgan Kaufmann Publishers, San Mateo, CA.
Fronk, E. and Giudici, P. (2000). Markov chain Monte Carlo model selection for DAG models. Technical report #118, Dept. of Political Economy and
Quantitative Methods, University of Pavia.
Gansner, E., Koutsofios, E., and North, S. Drawing graphs with dot. http://www.research.att.com/sw/tools/graphviz/dotguide.pdf
Gibbs, W. (1902). Elementary Principles of Statistical Mechanics. Yale University Press, New Haven, Connecticut.
Giudici, P., Green, P.J. and Tarantola, C. (2000). Efficient model determination for discrete graphical models. Biometrika. To appear.
Giudici, P. and Green, P.J. (1999). Decomposable graphical Gaussian model determination. Biometrika 86 785-801.
Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711-732.
This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreement #CR-829095.
HUGIN. http://www.hugin.com/
Lauritzen, S.L. (2003). gRaphical models -- A Software Perspective. Slides from DSC-2003.
Lauritzen, S.L. and Spiegelhalter, D.J. (1988). Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J. Roy. Statist. Soc. Ser. B 50 157-224.
Pearl, J. (1986). Fusion, propagation and structuring in belief networks. Artificial Intelligence 29 241-288.
Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction and Search. Springer-Verlag, New York.
TETRAD. http://www.phil.cmu.edu/projects/tetrad/
Waagepetersen, R. and Sorensen, D. (2001). A tutorial on reversible jump MCMC with a view toward QTL-mapping. International Statistical Review 69 49-61.
Wright, S. (1921). Correlation and causation. J. Agric. Res. 20 557-585.
The work reported here was developed under STAR Research Assistance Agreement CR-829095, awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State
University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the authors and of STARMAP, the program they represent. EPA does not
endorse any products or commercial services mentioned in this presentation.