Lecture 30 of 41
Graphical Models of Probability
for Causal Reasoning
Monday, 01 November 2004
William H. Hsu
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University
http://www.kddresearch.org
This presentation is based upon:
http://www.kddresearch.org/KSU/CIS/Math-20021107.ppt
CIS 730: Introduction to Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences
Overview
• Graphical Models of Probability
– Markov graphs
– Bayesian (belief) networks
– Causal semantics
– Direction-dependent separation (d-separation) property
• Learning and Reasoning: Problems, Algorithms
– Inference: exact and approximate
• Junction tree – Lauritzen and Spiegelhalter (1988)
• (Bounded) loop cutset conditioning – Horvitz and Cooper (1989)
• Variable elimination – Dechter (1996)
– Structure learning
• K2 algorithm – Cooper and Herskovits (1992)
• Variable ordering problem – Larrañaga (1996), Hsu et al. (2002)
• Probabilistic Reasoning in Machine Learning, Data Mining
• Current Research and Open Problems
Graphical Models Overview [1]:
Bayesian Networks
• Conditional Independence
– X is conditionally independent (CI) of Y given Z (sometimes written X ⊥ Y | Z) iff
P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
– Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., Thunder ⊥ Rain | Lightning
• Bayesian (Belief) Network
– Acyclic directed graph model B = (V, E, Θ) representing CI assertions over a set of random variables
– Vertices (nodes) V: denote events (each a random variable)
– Edges (arcs, links) E: denote conditional dependencies
• Markov Condition for BBNs (Chain Rule): P(X_1, X_2, …, X_n) = ∏_{i=1}^{n} P(X_i | parents(X_i))
• Example BBN
[Figure: seven-node BBN with Age (X1), Gender (X2), Exposure-To-Toxins (X3), Smoking (X4), Cancer (X5), Serum Calcium (X6), Lung Tumor (X7). Age is a parent of Exposure-To-Toxins; Age and Gender are parents of Smoking; Exposure-To-Toxins and Smoking are parents of Cancer; Cancer is the parent of Serum Calcium and Lung Tumor. The figure highlights the parents, descendants, and non-descendants of the Cancer node.]
P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative)
= P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
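The product above is the Markov condition in action: one CPT lookup per node. Below is a minimal sketch of that chain-rule computation, assuming Python; the network matches the example, but the CPT entries are hypothetical placeholders, not numbers from the lecture.

```python
# A minimal sketch (not the lecture's code) of the BBN chain rule:
# P(x1,...,xn) = product over nodes of P(xi | parents(xi)).
# The CPT numbers below are hypothetical, chosen only for illustration.

parents = {"Age": [], "Gender": [], "Toxins": ["Age"], "Smoking": ["Age", "Gender"],
           "Cancer": ["Toxins", "Smoking"], "SerumCa": ["Cancer"], "Tumor": ["Cancer"]}

# cpts[node][(value, parent_value_tuple)] = P(node = value | parents = parent_values)
cpts = {
    "Age":     {("20s", ()): 0.2},
    "Gender":  {("F", ()): 0.5},
    "Toxins":  {("Low", ("20s",)): 0.9},
    "Smoking": {("No", ("20s", "F")): 0.8},
    "Cancer":  {("No", ("Low", "No")): 0.95},
    "SerumCa": {("Neg", ("No",)): 0.9},
    "Tumor":   {("Neg", ("No",)): 0.95},
}

def joint_probability(assignment, parents, cpts):
    """Joint probability of a full assignment: one CPT factor per node."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[u] for u in parents[node])
        p *= cpts[node][(value, parent_values)]
    return p

query = {"Age": "20s", "Gender": "F", "Toxins": "Low", "Smoking": "No",
         "Cancer": "No", "SerumCa": "Neg", "Tumor": "Neg"}
print(joint_probability(query, parents, cpts))  # mirrors the slide's product of seven factors
```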
Graphical Models Overview [2]:
Markov Blankets and d-Separation Property
Motivation: The conditional independence status of nodes within a BBN may change as the availability of evidence E changes. Direction-dependent separation (d-separation) is a technique used to determine the conditional independence of nodes as evidence changes.
Definition: A set of evidence nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E.
A path is blocked if one of three conditions holds:
(1) Z ∈ E and the path meets head-to-tail (a chain) at Z
(2) Z ∈ E and the path meets tail-to-tail (a fork, common cause) at Z
(3) the path meets head-to-head (a collider, common effect) at Z, and neither Z nor any descendant of Z is in E
[Figure: a path from a node in X to a node in Y through Z, given evidence set E; one panel per blocking case. From S. Russell & P. Norvig (1995); adapted from J. Schlabach (1996)]
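A minimal sketch of the three blocking rules as a path test, assuming Python; the DAG encoding (a dict mapping each node to its parent set) and the variable names are illustrative, not from the lecture.

```python
# A minimal sketch (not the lecture's code) of the three path-blocking rules.
# The DAG is a dict: node -> set of parents; every node appears as a key.

def descendants(node, parents):
    """Return all descendants of `node` in the DAG described by `parents`."""
    children = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].add(v)
    out, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_blocked(path, evidence, parents):
    """True iff some interior node Z of the undirected path satisfies rule (1), (2), or (3)."""
    for i in range(1, len(path) - 1):
        a, z, b = path[i - 1], path[i], path[i + 1]
        a_into_z = a in parents[z]            # edge a -> z (otherwise z -> a)
        b_into_z = b in parents[z]            # edge b -> z (otherwise z -> b)
        collider = a_into_z and b_into_z      # head-to-head at z
        if not collider and z in evidence:    # rules (1) and (2): chain or fork, Z observed
            return True
        if collider and z not in evidence and \
           not (descendants(z, parents) & evidence):   # rule (3): collider, Z and descendants unobserved
            return True
    return False

# Cancer BBN from the previous slide (X1=Age, ..., X7=Lung Tumor):
parents = {"X1": set(), "X2": set(), "X3": {"X1"}, "X4": {"X1", "X2"},
           "X5": {"X3", "X4"}, "X6": {"X5"}, "X7": {"X5"}}
# The path X6 - X5 - X7 (fork at Cancer) is blocked once Cancer is observed:
print(path_blocked(["X6", "X5", "X7"], {"X5"}, parents))   # True
```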
Graphical Models Overview [3]:
Inference Problem
Multiply-connected case: exact and approximate inference are #P-complete
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
Other Topics in Graphical Models [1]:
Temporal Probabilistic Reasoning
• Goal: Estimate P(X_t | y_{1:r})
• Filtering: r = t
– Intuition: infer current state from observations
– Applications: signal identification
– Variation: Viterbi algorithm
• Prediction: r < t
– Intuition: infer future state
– Applications: prognostics
• Smoothing: r > t
– Intuition: infer past hidden state
– Applications: signal enhancement
• CF Tasks
– Plan recognition by smoothing
– Prediction cf. WebCANVAS – Cadez et al. (2000)
Adapted from Murphy (2001), Guo (2002)
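As an illustration of the filtering case (r = t), here is a minimal forward-recursion sketch for a two-state hidden Markov model, assuming Python and NumPy; the transition and emission numbers are hypothetical placeholders.

```python
# A minimal sketch (not the lecture's code) of filtering in a discrete HMM/DBN:
# recursively estimate P(X_t | y_{1:t}). All numbers are hypothetical.
import numpy as np

T = np.array([[0.7, 0.3],    # T[i, j] = P(X_t = j | X_{t-1} = i), two hidden states
              [0.2, 0.8]])
E = np.array([[0.9, 0.1],    # E[j, y] = P(y | X = j), two observation symbols
              [0.3, 0.7]])
prior = np.array([0.5, 0.5])

def filter_states(observations):
    """Forward recursion: belief_t ∝ P(y_t | X_t) * Σ_x P(X_t | x) * belief_{t-1}(x)."""
    belief = prior.copy()
    for y in observations:
        belief = E[:, y] * (T.T @ belief)   # predict one step, then weight by the evidence
        belief /= belief.sum()              # normalize
    return belief

print(filter_states([0, 0, 1]))  # P(X_3 | y_1..y_3)
```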
Other Topics in Graphical Models [2]:
Learning Structure from Data
• General-Case BBN Structure Learning: Use Inference to Compute Scores
• Optimal Strategy: Bayesian Model Averaging
– Assumption: models h ∈ H are mutually exclusive and exhaustive
– Combine predictions of models in proportion to marginal likelihood
• Compute conditional probability of hypothesis h given observed data D
• i.e., compute expectation over unknown h for unseen cases
• Let h ≡ structure, parameters Θ ≡ CPTs
P(x_{m+1} | D) = P(x_{m+1} | x_1, x_2, …, x_m) = Σ_{h ∈ H} P(x_{m+1} | D, h) · P(h | D)

Posterior score: P(h | D) ∝ P(D | h) · P(h) = P(h) · ∫ P(D | h, Θ) · P(Θ | h) dΘ
where P(h) is the prior over structures, P(Θ | h) the prior over parameters, P(D | h, Θ) the likelihood, and P(D | h) the marginal likelihood.
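A minimal sketch of Bayesian model averaging over a handful of candidate structures, assuming Python; the scores and per-model predictions are hypothetical placeholders standing in for P(D | h) · P(h) and P(x_{m+1} | D, h).

```python
# A minimal sketch (not the lecture's code) of Bayesian model averaging:
# combine per-structure predictions in proportion to the posterior P(h | D).
# All numbers below are hypothetical placeholders.

scores = {"h1": 0.04, "h2": 0.01, "h3": 0.002}        # unnormalized P(D | h) * P(h)
predictions = {"h1": 0.70, "h2": 0.55, "h3": 0.20}     # P(x_{m+1} | D, h) per structure

total = sum(scores.values())
posterior = {h: s / total for h, s in scores.items()}          # P(h | D)
p_next = sum(posterior[h] * predictions[h] for h in scores)    # P(x_{m+1} | D)
print(p_next)
```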
Propagation Algorithm in Singly-Connected
Bayesian Networks – Pearl (1983)
[Figure: singly-connected network of nodes C1–C6. Upward (child-to-parent) λ messages: λ′(Ci) is modified during the λ message-passing phase. Downward π messages: P′(Ci) is computed during the π message-passing phase.]
Multiply-connected case: exact and approximate inference are #P-complete
(#P-complete counting problems are the counting analogues of NP-complete decision problems)
Adapted from Neapolitan (1990), Guo (2000)
Inference by Clustering [1]: Graph Operations
(Moralization, Triangulation, Maximal Cliques)
Moralize → Triangulate → Find Maximal Cliques
[Figure: the Bayesian network (acyclic digraph) over nodes A1, B2, E3, C4, G5, F6, H7, D8 is first moralized, then triangulated, and finally its maximal cliques Clq1–Clq6 are identified.]
Adapted from Neapolitan (1990), Guo (2000)
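A minimal sketch of the moralization step, assuming Python; the graph encoding (each node mapped to its parent set) and the small example graph are illustrative, not taken from the slide.

```python
# A minimal sketch (not the lecture's code) of moralization: marry each node's
# parents, then drop edge directions. The DAG is a dict: node -> set of parents,
# with every node appearing as a key.
from itertools import combinations

def moralize(parents):
    """Return the undirected (moral) graph as a dict node -> set of neighbors."""
    neighbors = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:                        # drop direction on each original edge
            neighbors[child].add(p)
            neighbors[p].add(child)
        for u, w in combinations(ps, 2):    # marry co-parents of the same child
            neighbors[u].add(w)
            neighbors[w].add(u)
    return neighbors

# Example: co-parents of a common child become adjacent in the moral graph.
print(moralize({"A": set(), "B": {"A"}, "C": {"A", "B"}}))
```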
Inference by Clustering [2]:
Junction Tree – Lauritzen & Spiegelhalter (1988)
Input: list of cliques of triangulated, moralized graph G_u
Output: tree of cliques; separator nodes S_i, residual nodes R_i, and potential probability ψ(Clq_i) for all cliques
Algorithm:
1. S_i = Clq_i ∩ (Clq_1 ∪ Clq_2 ∪ … ∪ Clq_{i-1})
2. R_i = Clq_i − S_i
3. If i > 1, identify a j < i such that Clq_j is a parent of Clq_i
4. Assign each node v to a unique clique Clq_i such that {v} ∪ c(v) ⊆ Clq_i (c(v): parents of v)
5. Compute ψ(Clq_i) = ∏ P(v | c(v)) over the nodes v assigned to Clq_i (ψ(Clq_i) = 1 if no v is assigned to Clq_i)
6. Store Clq_i, R_i, S_i, and ψ(Clq_i) at each vertex in the tree of cliques
Adapted from Neapolitan (1990), Guo (2000)
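A minimal sketch of steps 1 and 2 of the algorithm above (separators and residuals), assuming Python and using the clique list from the example on the next slide.

```python
# A minimal sketch (not the lecture's code) of computing S_i and R_i
# from cliques listed in construction order.

def separators_and_residuals(cliques):
    """For each Clq_i: S_i = Clq_i ∩ (Clq_1 ∪ ... ∪ Clq_{i-1}), R_i = Clq_i - S_i."""
    seen, result = set(), []
    for clq in cliques:
        s_i = clq & seen
        r_i = clq - s_i
        result.append((clq, r_i, s_i))
        seen |= clq
    return result

# Cliques Clq1..Clq6 from the clique-tree example that follows:
cliques = [{"A", "B"}, {"B", "E", "C"}, {"E", "C", "G"},
           {"E", "G", "F"}, {"C", "G", "H"}, {"C", "D"}]
for clq, r, s in separators_and_residuals(cliques):
    print(sorted(clq), "R =", sorted(r), "S =", sorted(s))
```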
Inference by Clustering [3]:
Clique-Tree Operations
R_i: residual nodes
S_i: separator nodes
ψ(Clq_i): potential probability of clique i

Clique              R_i       S_i       ψ(Clq_i)
Clq1 = {A, B}       {A, B}    {}        P(B|A) · P(A)
Clq2 = {B, E, C}    {E, C}    {B}       P(C|B,E)
Clq3 = {E, C, G}    {G}       {E, C}    1
Clq4 = {E, G, F}    {F}       {E, G}    P(E|F) · P(G|F) · P(F)
Clq5 = {C, G, H}    {H}       {C, G}    P(H|C,G)
Clq6 = {C, D}       {D}       {C}       P(D|C)

[Figure: the tree of cliques, each clique labeled with its nodes and each edge labeled with its separator.]
Adapted from Neapolitan (1990), Guo (2000)
Inference by Loop Cutset Conditioning
• Split vertex in undirected cycle; condition upon each of its state values
[Figure: the cancer BBN from earlier (Age X1, Gender X2, Exposure-To-Toxins X3, Smoking X4, Cancer X5, Serum Calcium X6, Lung Tumor X7), with cutset vertex X1 split into instances X1,1 (Age = [0, 10)), X1,2 (Age = [10, 20)), …, X1,10 (Age = [100, ∞)), one per state value.]
• Number of network instantiations: product of the arities of the nodes in the minimal loop cutset
• Posterior: marginal conditioned upon cutset variable values (see the sketch after this list)
• Deciding Optimal Cutset: NP-hard
• Current Open Problems
– Bounded cutset conditioning: ordering heuristics
– Finding randomized algorithms for loop cutset optimization
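A minimal sketch of the conditioning loop, assuming Python; `polytree_query` is a hypothetical callback standing in for a singly-connected inference routine (e.g., Pearl's propagation), so this is a skeleton under stated assumptions rather than a complete implementation.

```python
# A minimal sketch (not the lecture's code) of loop cutset conditioning: for each
# instantiation of the cutset variables, run an assumed polytree inference routine,
# then combine the answers weighted by the probability mass of that instantiation.
from itertools import product

def cutset_conditioning(cutset_domains, polytree_query):
    """Combine polytree results over every cutset instantiation.

    cutset_domains: dict cutset variable -> list of its values
    polytree_query(instantiation): hypothetical callback returning
        (posterior dict P(x | evidence, cutset=c), weight P(evidence, cutset=c))
    Number of instantiations = product of the arities of the cutset variables.
    """
    names = list(cutset_domains)
    totals, norm = {}, 0.0
    for values in product(*(cutset_domains[n] for n in names)):
        instantiation = dict(zip(names, values))
        posterior, weight = polytree_query(instantiation)
        norm += weight
        for x, p in posterior.items():
            totals[x] = totals.get(x, 0.0) + weight * p
    return {x: p / norm for x, p in totals.items()}   # posterior marginal given evidence
```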
Inference by Variable Elimination [1]:
Intuition
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
Inference by Variable Elimination [2]:
Factoring Operations
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
Inference by Variable Elimination [3]:
Example
P(A), P(B|A), P(C|A), P(D|B,A), P(F|B,C), P(G|F)
[Figure: BBN with Season (A) → Sprinkler (B) and Rain (C); A and B → Manual Watering (D); B and C → Wet (F); F → Slippery (G).]
Query: P(A | G = 1) = ?
Elimination ordering: d = <A, C, B, F, D, G>
Buckets (processed from last to first in d):
– bucket(G): P(G|F), evidence G = 1
– bucket(D): P(D|B,A)
– bucket(F): P(F|B,C)
– bucket(B): P(B|A)
– bucket(C): P(C|A)
– bucket(A): P(A)
λ_G(f) = Σ_{G=1} P(G|F)
Adapted from Dechter (1996), Joehanes (2002)
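A minimal end-to-end sketch of this bucket-elimination query, assuming Python; the network matches the slide, but all variables are treated as binary and the CPT numbers are hypothetical placeholders.

```python
# A minimal sketch (not the lecture's code) of variable (bucket) elimination for
# P(A | G = 1) in the Season/Sprinkler/Rain network. CPT numbers are hypothetical.
from itertools import product

def cpt_factor(child, parents_, p1):
    """Build P(child | parents_) from p1[parent_values] = P(child = 1 | parents)."""
    vars_ = [child] + parents_
    return (vars_, {vals: (p1[vals[1:]] if vals[0] else 1 - p1[vals[1:]])
                    for vals in product([0, 1], repeat=len(vars_))})

def multiply(f, g):
    """Pointwise product of two factors (each a pair: vars list, table keyed by value tuples)."""
    fv, ft = f
    gv, gt = g
    vars_ = fv + [v for v in gv if v not in fv]
    table = {}
    for vals in product([0, 1], repeat=len(vars_)):
        a = dict(zip(vars_, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return (vars_, table)

def sum_out(var, f):
    """Marginalize var out of factor f."""
    fv, ft = f
    keep = [v for v in fv if v != var]
    table = {}
    for vals, p in ft.items():
        key = tuple(x for x, name in zip(vals, fv) if name != var)
        table[key] = table.get(key, 0.0) + p
    return (keep, table)

def restrict(f, var, value):
    """Fix evidence var = value in factor f (the lambda_G(f) step for G = 1)."""
    fv, ft = f
    keep = [v for v in fv if v != var]
    return (keep, {tuple(x for x, n in zip(vals, fv) if n != var): p
                   for vals, p in ft.items() if vals[fv.index(var)] == value})

factors = [
    cpt_factor("A", [], {(): 0.4}),
    cpt_factor("B", ["A"], {(0,): 0.2, (1,): 0.7}),
    cpt_factor("C", ["A"], {(0,): 0.5, (1,): 0.3}),
    cpt_factor("D", ["B", "A"], {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}),
    cpt_factor("F", ["B", "C"], {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.85, (1, 1): 0.95}),
    cpt_factor("G", ["F"], {(0,): 0.1, (1,): 0.9}),
]

# Bucket G holds the evidence: restrict to G = 1 instead of summing.
factors = [restrict(f, "G", 1) if "G" in f[0] else f for f in factors]

# Process the remaining buckets D, F, B, C (reverse of d = <A, C, B, F, D, G>, A kept last).
for var in ["D", "F", "B", "C"]:
    bucket = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    prod = bucket[0]
    for f in bucket[1:]:
        prod = multiply(prod, f)
    factors = rest + [sum_out(var, prod)]

# Multiply what remains (all factors over A) and normalize to get P(A | G = 1).
answer = factors[0]
for f in factors[1:]:
    answer = multiply(answer, f)
total = sum(answer[1].values())
print({a: p / total for a, p in answer[1].items()})
```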
Genetic Algorithms for Parameter Tuning in
Bayesian Network Structure Learning
D: training data; I_e: inference specification; α: candidate representation; f(α): representation fitness; α̂: optimized representation
[Figure: genetic wrapper for change of representation and inductive bias control. [1] A genetic algorithm proposes candidate representations α. [2] A representation evaluator for learning problems scores each candidate via inductive learning on D_train and inference on D_val under specification I_e, returning fitness f(α). The loop outputs an optimized representation α̂.]
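A minimal sketch of the wrapper loop, assuming Python, with the candidate representation taken to be a variable ordering (as in the K2 ordering problem above); `inferential_loss` is a hypothetical callback standing in for box [2], i.e., learning a structure on D_train under ordering α and scoring it by inference on D_val.

```python
# A minimal sketch (not the KDD Lab's code) of a genetic wrapper over variable
# orderings. `inferential_loss` is a hypothetical callback: lower loss = fitter.
import random

def genetic_wrapper(variables, inferential_loss, pop_size=20, generations=50):
    """Evolve a permutation of `variables`; return the fittest ordering (alpha-hat)."""
    population = [random.sample(variables, len(variables)) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=inferential_loss)
        survivors = scored[: pop_size // 2]                  # truncation selection
        children = []
        while len(children) < pop_size - len(survivors):
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, len(variables))        # one-point order crossover
            child = p1[:cut] + [v for v in p2 if v not in p1[:cut]]
            if random.random() < 0.2:                        # swap mutation
                i, j = random.sample(range(len(variables)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children
    return min(population, key=inferential_loss)
```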
Computational Genomics and
Microarray Gene Expression Modeling
[Figure: a learning environment compares Treatment 1 (control) with Treatment 2 (pathogen); messenger RNA (mRNA) extracts 1 and 2 are converted to cDNA and hybridized on a DNA microarray (read under LASER), yielding the data D (user, microarray). [A] Structure learning produces G = (V, E) over genes G1–G5; [B] parameter estimation produces B = (V, E, Θ). D_val supports model validation by inference, with specification fitness measured as inferential loss.]
Adapted from Friedman et al. (2000)
http://www.cs.huji.ac.il/labs/compbio/
Tools for Building Graphical Models
• Commercial Tools: Ergo, Netica, TETRAD, Hugin
• Bayes Net Toolbox (BNT) – Murphy (1997-present)
– Distribution page
http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
– Development group
http://groups.yahoo.com/group/BayesNetToolbox
• Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
– Distribution page
http://bndev.sourceforge.net
– Development group
http://groups.yahoo.com/group/bndev
– Current (re)implementation projects for KSU KDD Lab
• Continuous state: Minka (2002) – Hsu, Guo, Perry, Boddhireddy
• Formats: XML BNIF (MSBN), Netica – Guo, Hsu
• Space-efficient DBN inference – Joehanes
• Bounded cutset conditioning – Chandak
References [1]:
Graphical Models and Inference Algorithms
• Graphical Models
– Bayesian (Belief) Networks tutorial – Murphy (2001)
http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
– Learning Bayesian Networks – Heckerman (1996, 1999)
http://research.microsoft.com/~heckerman
• Inference Algorithms
– Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988)
http://citeseer.nj.nec.com/huang94inference.html
– (Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989)
http://citeseer.nj.nec.com/shachter94global.html
– Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996)
http://citeseer.nj.nec.com/dechter96bucket.html
– Recommended Books
• Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
• Castillo, Gutierrez, Hadi (1997)
• Cowell, Dawid, Lauritzen, Spiegelhalter (1999)
– Stochastic Approximation http://citeseer.nj.nec.com/cheng00aisbn.html
References [2]:
Machine Learning, KDD, and Bioinformatics
• Machine Learning, Data Mining, and Knowledge Discovery
– K-State KDD Lab: literature survey and resource catalog (2002)
http://www.kddresearch.org/Resources
– Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer,
Thornton (2004)
http://bndev.sourceforge.net
– Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002)
http://mldev.sourceforge.net
– NCSA Data to Knowledge (D2K): Welge, Redman, Auvil, Tcheng, Hsu
http://www.ncsa.uiuc.edu/STI/ALG
• Bioinformatics
– European Bioinformatics Institute Tutorial: Brazma et al. (2001)
http://www.ebi.ac.uk/microarray/biology_intro.htm
– Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002)
http://www.cs.huji.ac.il/labs/compbio/
– K-State BMI Group: literature survey and resource catalog (2002)
http://www.kddresearch.org/Groups/Bioinformatics