On Distributing Probabilistic Inference
Thor Whalen
Metron, Inc.
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
Outline
• Inference and distributed inference
  – Probabilistic inference
  – Distributed probabilistic inference
  – Use of distributed probabilistic inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
Probabilistic Inference
• “(Probabilistic) inference, or model evaluation, is the process of updating probabilities of outcomes based upon the relationships in the model and the evidence known about the situation at hand.” (research.microsoft.com)
• Let V = {X1, ..., Xn} be the set of variables of interest
• We are given a (prior) probability space P(V) on V
• Bayesian Inference: Given some evidence e, we compute the posterior probability space P' = P(V|e)
Probabilistic Inference
• A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V
• Such a table is commonly called a “potential”
• A receives evidence → have P(e|A) → want P(A,B,C|e)
[Table: the joint potential P(A,B,C) over three binary variables, with entries 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920, 0.0960, 0.2240, shown next to the likelihood P(e|A) = (0.3, 0.9).]
Probabilistic Inference
• Assuming evidence e only depends on variable A, we have P(A,B,C|e) ∝ P(A,B,C)P(e|A); normalizing the product by P(e) = Σ P(A,B,C)P(e|A) = 0.78 gives the posterior:

  P(A,B,C)   P(e|A)   P(A,B,C)·P(e|A)   P(A,B,C|e)
  0.0120     0.3      0.0036            0.0046
  0.0080     0.3      0.0024            0.0031
  0.0540     0.3      0.0162            0.0208
  0.1260     0.3      0.0378            0.0485
  0.2880     0.9      0.2592            0.3323
  0.1920     0.9      0.1728            0.2215
  0.0960     0.9      0.0864            0.1108
  0.2240     0.9      0.2016            0.2585
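To make the update concrete, here is a minimal numpy sketch of the computation above, using the slide's numbers. The assignment of table entries to states of B and C is my guess from the layout; only the grouping by states of A matters for the result.

```python
import numpy as np

# P(A, B, C): axis 0 is A; the four entries per A-state fill B x C
# (the B/C ordering is assumed, not given on the slide).
prior = np.array([[[0.0120, 0.0080],
                   [0.0540, 0.1260]],
                  [[0.2880, 0.1920],
                   [0.0960, 0.2240]]])

lik = np.array([0.3, 0.9])                  # P(e | A)

post = prior * lik[:, np.newaxis, np.newaxis]   # P(A,B,C) * P(e|A)
post /= post.sum()                              # divide by P(e) = 0.78
print(post.round(4))                            # the posterior entries above
```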
Distributed Probabilistic Inference
• Several components, each inferring on a given subspace
• Two components may communicate information about variables they have in common
• Wish to be able to fuse evidence received throughout the system
[Figure: two components, one holding a probability space on A and B, the other on B and C; the shared variable B must be the same in both, and evidence may enter either component.]
Use of Distributed Probabilistic Inference
• Be able to implement probabilistic inference
• Implement multi-agent systems:
  - agents sense their environment and take actions intelligently
  - they can observe given variables of the probability space
  - they need to infer on other variables in order to take action
  - cooperate in exchanging information about the probability space
[Plot: inference cost as a function of the number of variables n, assuming one million operations per second.]
Use of Distributed Probabilistic Inference
[Figure: two agents, Agent 1 and Agent 2, cooperating as described above.]
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
Conditional independence
• Marginalize out C from P(A,B,C) to get P(A,B), e.g. 0.0120 + 0.0080
• P(C|A,B) = P(A,B,C)/P(A,B)
• Here it turns out that P(C|A,B) = P(C|B)
Conditional independence
[Cartoon: two speech bubbles commenting on the tables.]
“Wait a minute, you little...! Why is P(C|A,B) a table of size 8, whereas P(C|B) is of size 4? This is entropicaly impossible, Dr. Watson!”
“Eh, but I see only four distinct numbers here, doc... and ‘entropicaly’ is not a word!”
Conditional independence
• Marginalize out C from P(A,B,C) to get P(A,B)
• P(C|A,B) = P(A,B,C)/P(A,B)
• P(C|A,B) = P(C|B), i.e. P(C|A,B) is insensitive to A:
  ∀(a,b,c) ∈ A×B×C, P(C=c | A=a, B=b) = P(C=c | B=b)
We say that A and C are conditionally independent given B.
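A small numpy sketch of this check, with assumed numbers of my own (not the slide's table): build a joint in which C depends on B but not on A, then verify that P(C|A,B) collapses to P(C|B).

```python
import numpy as np

# A joint in which C depends on B but not on A, so the CI holds by construction.
p_a = np.array([0.3, 0.7])
p_b = np.array([0.6, 0.4])
p_c_given_b = np.array([[0.2, 0.9],        # rows: c, columns: b
                        [0.8, 0.1]])
joint = np.einsum('a,b,cb->abc', p_a, p_b, p_c_given_b)    # P(a,b,c)

p_ab = joint.sum(axis=2)                                   # marginalize out C
p_c_given_ab = joint / p_ab[:, :, np.newaxis]              # P(C|A,B): 8 entries
p_bc = joint.sum(axis=0)
p_c_given_b2 = p_bc / p_bc.sum(axis=1, keepdims=True)      # P(C|B): 4 entries

# P(C|A,B) has 8 entries but only 4 distinct values: it does not vary with A.
print(np.allclose(p_c_given_ab[0], p_c_given_ab[1]))       # True
print(np.allclose(p_c_given_ab[0], p_c_given_b2))          # True
```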
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
  – Bayes nets
  – Markov networks
• Using CI: Sufficient information
Bayes Nets
[Figure: a DAG on five nodes A, B, C, D, E.]
Note that, by the chain rule,
P(A,B,C,D,E) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)P(E|A,B,C,D)
which, given the independencies encoded by the DAG, reduces to
P(A,B,C,D,E) = P(A)P(B)P(C|A)P(D|B,C)P(E|C)
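As a sketch of how the reduced factorization is used in practice, the following numpy snippet (with hypothetical CPD numbers of my own) assembles the joint on A, B, C, D, E from just the five CPDs the DAG prescribes, rather than the full chain-rule tables.

```python
import numpy as np

p_a = np.array([0.6, 0.4])
p_b = np.array([0.3, 0.7])
p_c_given_a = np.array([[0.9, 0.2], [0.1, 0.8]])            # [c, a]
p_d_given_bc = np.random.default_rng(2).random((2, 2, 2))   # [d, b, c]
p_d_given_bc /= p_d_given_bc.sum(axis=0)                    # normalize over d
p_e_given_c = np.array([[0.5, 0.1], [0.5, 0.9]])            # [e, c]

# P(a,b,c,d,e) = P(a) P(b) P(c|a) P(d|b,c) P(e|c)
joint = np.einsum('a,b,ca,dbc,ec->abcde',
                  p_a, p_b, p_c_given_a, p_d_given_bc, p_e_given_c)
print(joint.sum())    # 1.0: a valid joint built from just the CPDs
```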
Bayes Nets
• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables
Bayes Nets
• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables
• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: variables of V
    • Edges: causal relations
[Figure: a directed graph on nodes X1, ..., X13. A DAG is a directed graph with no directed cycles; the graph as first shown is a DAG, while a second version containing a directed cycle is not.]
Bayes Nets
• A BN consists of
  – A Directed Acyclic Graph (DAG)
  – A list of conditional probability distributions (CPDs); one for every node of the DAG
[Figure: the same DAG on X1, ..., X13, with a CPD table attached to each node.]
Bayes Nets
• The DAG exhibits particular (in)dependencies of P(V)
[Figure: three nodes A, B, C of the DAG, with B between A and C.]
A and C are independent given B
  - i.e. P(A, C | B) = P(A | B) P(C | B)
  - i.e. P(C | A, B) = P(C | B)
We say that B separates A and C
Bayes Nets
• The DAG characterizes the (in)dependency structure of P(V)
• The CPDs characterize the probabilistic and/or deterministic relations between parent states and children states
Bayes Nets
[Figure: the DAG on X1, ..., X13; the parentless nodes are highlighted, and evidence arrives at one variable.]
• The prior distributions on the variables of parentless nodes, along with the CPDs of the BN, induce prior distributions (called “beliefs” in the literature) on all the variables
• If the system receives evidence on a variable:
  – this evidence impacts its belief,
  – along with the beliefs of all other variables
Markov networks
• The edges of a Markov network exhibit direct dependencies between variables
• The absence of an edge means absence of direct dependency
• If a set B of nodes separates the graph into several components, then these components are independent given B
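A minimal sketch of how such separation can be tested mechanically; the helper below is my own illustration, not part of the talk. Delete the separating set B from the graph and look at the connected components that remain: given B, these components are mutually independent.

```python
from collections import deque

def components_given(adj, B):
    """Connected components of the undirected graph `adj`
    (dict node -> set of neighbors) after deleting the nodes in B."""
    remaining = set(adj) - set(B)
    comps = []
    while remaining:
        seed = remaining.pop()
        comp, frontier = {seed}, deque([seed])
        while frontier:
            v = frontier.popleft()
            for w in adj[v]:
                if w in remaining:
                    remaining.discard(w)
                    comp.add(w)
                    frontier.append(w)
        comps.append(comp)
    return comps

# A path-shaped Markov network A - B - C: removing {B} splits it in two.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(components_given(adj, {"B"}))   # [{'A'}, {'C'}] (order may vary)
```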
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
  – Specifications for a distributed inference system
  – A naive solution
  – Using separation
Using CI: Sufficient information
Specifications
• A probability space
• A number of agents, each having
  - query variables
  - evidence variables
[Figure: variables A through M spread across four agents; Agent 1's query variables are A and B, while Agents 2, 3 and 4 hold evidence variables among E, F, G, H, I, J, K, L and M.]
What variables must agent 1 represent so that it may fuse the evidence received by other agents?
Using CI: Sufficient information
A naïve solution
• Agents contain their own query and evidence variables
• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M
[Figure: a communication graph in which Agent 1 holds A and B together with copies of EFG, HIJ and KLM, while Agents 2, 3 and 4 hold EFG, HIJ and KLM respectively.]
Using CI: Sufficient information
A naïve solution
• Agent 1 must represent many variables! How else could the other agents communicate their evidence?
[Figure: the same network, now labeled with X = {A,B}, Z = {C,D}, and Y standing for one of the remote sets of variables.]
• Note that Z separates X and Y, whether Y is equal to:
  • {E,F,G},
  • {H,I,J}, or
  • {K,L,M}.
Using CI: Sufficient information
Z separates X and Y
→ P(Y|X,Z) = P(Y|Z)
→ P(eY|X,Z) = P(eY|Z), i.e. the likelihood of evidence on Y given X and Z equals its likelihood given Z alone
→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X
[Figure: Agent 1 holds X = {A,B} and Z = {C,D}; Agent 2 holds Z = {C,D} and Y = {E,F,G}. Evidence eY enters agent 2, which updates P(Y,Z) to P(Y,Z|eY), marginalizes over Y to obtain P(Z|eY), and sends it to agent 1; agent 1 then computes P(X,Z|eY) = P(X,Z)P(Z)^-1 P(Z|eY).]
Using CI: Sufficient information
Z separates X and Y
→ P(Y|X,Z) = P(Y|Z)
→ P(eY|X,Z) = P(eY|Z)
→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X. Because:
P(X,Z)P(Z)^-1 P(Z|eY)
  = P(X,Z)P(Z)^-1 P(Z,eY)P(eY)^-1
  = P(X,Z) P(eY|Z) P(eY)^-1
  = P(X,Z) P(eY|X,Z) P(eY)^-1
  = P(X,Z,eY) P(eY)^-1
  = P(X,Z|eY)
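The following numpy sketch (my own illustration, with made-up numbers) walks through this exchange end to end: agent 2 conditions P(Y,Z) on eY and sends P(Z|eY); agent 1 rescales P(X,Z) by P(Z|eY)/P(Z). The result matches conditioning the full joint, provided the joint was built so that Z separates X and Y.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a joint P(X, Y, Z) in which X and Y are independent given Z
# (all variables binary); neither agent ever sees this full joint.
p_z = np.array([0.4, 0.6])
p_x_given_z = rng.random((2, 2)); p_x_given_z /= p_x_given_z.sum(axis=0)
p_y_given_z = rng.random((2, 2)); p_y_given_z /= p_y_given_z.sum(axis=0)
joint = np.einsum('xz,yz,z->xyz', p_x_given_z, p_y_given_z, p_z)

p_xz = joint.sum(axis=1)            # agent 1's table P(X, Z)
p_yz = joint.sum(axis=0)            # agent 2's table P(Y, Z)

# Agent 2 receives evidence on Y as a likelihood vector P(eY | Y) ...
lik_y = np.array([0.9, 0.2])
p_yz_post = p_yz * lik_y[:, np.newaxis]
p_yz_post /= p_yz_post.sum()        # P(Y, Z | eY)
msg = p_yz_post.sum(axis=0)         # ... and sends the message P(Z | eY)

# Agent 1 absorbs it: P(X, Z | eY) = P(X, Z) P(Z)^-1 P(Z | eY)
p_xz_post = p_xz * (msg / p_xz.sum(axis=0))[np.newaxis, :]

# Sanity check against conditioning the full joint directly.
full = joint * lik_y[np.newaxis, :, np.newaxis]
full /= full.sum()
print(np.allclose(p_xz_post, full.sum(axis=1)))   # True
```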
Using CI: Sufficient information
Using separation
• Agent 1 only needs to represent two extra variables, C and D
• Agent 1 may compute its posterior queries faster from CD than from EFGHIJKLM
• Communication lines need to transmit two variables instead of three
[Figure: the communication graph in which Agent 1 holds A, B, C, D, each remote agent holds CD together with its own variables (EFG, HIJ or KLM), and every communication line carries only CD.]
Using CI: Sufficient information
Let E and Q be respectively the evidence and query sets of two different agents. In order for the agent querying Q to fuse evidence received on E by the other agent, we wish to find a sequence S_{1,2}, S_{2,3}, ..., S_{j-1,j} of separator sets such that, for all k ≥ 2,
P(Q_k | S_{k-1,k}, E) = P(Q_k | S_{k-1,k}) and P(S_{k,k+1} | S_{k-1,k}, E) = P(S_{k,k+1} | S_{k-1,k})
[Figure: a chain of agents, with query variables, evidence variables and other variables marked.]
MatLab Tool
Main functionality
• Insert evidence into given agents and propagate its impact inside the subnet
• Initiate communication between agents, followed by the propagation of new information
• View the marginal distributions of the different agents at every step
• Step forward and backward
• Save eye-friendly logs to a file

Command menu:
i + [Enter] to initialize,
v + [Enter] to view all variables (even those containing no information),
e + [Enter] to enter evidence,
c + [Enter] to perform an inter-agent communication,
p + [Enter] to go to the previous step,
n + [Enter] to go to the next step,
a + [Enter] to add a sensor,
r + [Enter] to remove a sensor,
t + [Enter] to turn true marginals view ON,
m + [Enter] to turn discrepancy marking OFF,
s + [Enter] to save to a file,
q + [Enter] to quit.
Enter Command:
MatLab Tool: Display
• The display indicates the step number and the last action that was taken
• “The TRUTH” shows the marginal distributions that would have been obtained by inferring on the entire Bayes Net
• Each SUBNET block shows the marginal distributions of the variables represented in that subnet; ###### marks variables the subnet does not represent, and the AD figures mark discrepancies from the true marginals
• The tool then prompts for a new action

* Configuration 2: After evidence L(e|C) = (2,5) has been entered into subnet number 2

The TRUTH:
  A: True 0.2005, False 0.7995     B: True 0.4403, False 0.5597
  C: True 0.1434, False 0.8566     D: True 0.2780, False 0.7220
  E: True 0.5426, False 0.4574     F: True 0.2901, False 0.7099

SUBNET 1 (adjacent to subnet 2): Err(ACB) = 0.0527
  A: True 0.3000, False 0.7000 (AD = 0.0704)
  C: True 0.2950, False 0.7050 (AD = 0.1072)
  B: True 0.5100, False 0.4900 (AD = 0.0493)
  D, E, F: not represented

SUBNET 2 (adjacent to subnets 1, 3):
  C: True 0.1434, False 0.8566     B: True 0.4403, False 0.5597
  E: True 0.5426, False 0.4574     D: True 0.2780, False 0.7220
  A, F: not represented

SUBNET 3 (adjacent to subnet 2): Err(EDF) = 0.0169
  E: True 0.4820, False 0.5180 (AD = 0.0429)
  D: True 0.2745, False 0.7255 (AD = 0.0025)
  F: True 0.2969, False 0.7031 (AD = 0.0048)
  A, B, C: not represented

Enter a command (enter h + [Enter] for help):
Communication Graph Considerations
[Figure: a communication graph on six agents, numbered 1 through 6.]
• Agent 6 receives info from agent 1 through both agents 4 and 5. How should subnet 6 deal with possible redundancy?
• One solution (often adopted) would be to impose a tree structure on the communication graph.
Communication Graph Considerations
• When choosing the communication graph,
one should take into consideration
- The quality of the possible communication lines
- The processing speed of the agents
- The importance of given queries
...then
thisiscommunication
graph
If this
the key decisionis more
appropriate
making
agent
… than this one
36
Problem Specification
Given:
• A BN (or, more generally, a probability space) on V = {X1, ..., Xn}
• A number of agents, each having:
  – Qi: a set of query variables
  – Ei: a set of evidence variables
Determine:
• An agent communication graph
• A subset Si of V for each agent
• An inference protocol that specifies
  – How to fuse evidence and messages received from other agents
  – The content of messages between agents
Distributed Inference Design
• A communication graph:
  – Nodes represent agents
  – Edges represent communication lines
• Each agent i has:
  – Qi: a set of query variables
  – Ei: a set of evidence variables
  – Pi(Si): a probability space on a subset Si of V
  – An inference protocol, which includes a specification of
    • What to do with received evidence or messages
    • What messages must be sent to other agents
Distributed Bayesian Inference Problem
Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme so that, given any evidence e = e1, ..., ek (where ei is the set of evidence received by subnet i), the agents may compute the correct posterior on their query variables,
i.e. for all i, Pi (the probability space of agent i) must become consistent with P on its query variables,
i.e. agent i must be able to compute, for every query variable Q of Qi, the probability Pi(Q|e) = P(Q|e)
More Definitions
Let X, Y and Z be subsets of V:
• If P is a prob. space on V, I(X,Y|Z)P is the statement “X is independent of Y given Z,” i.e. P(X|Y,Z) = P(X|Z)
• If D is a DAG, I(X,Y|Z)D is the statement “X and Y are d-separated by Z”
• If G is a graph, I(X,Y|Z)G is the statement “X and Y are disconnected by Z”
• Theorem: If D is a Bayes Net for P and G is the moral graph of the ancestor hull of X∪Y∪Z, then I(X,Y|Z)G ↔ I(X,Y|Z)D → I(X,Y|Z)P
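As a sketch of the theorem in code (helper names are my own, not from the talk): to decide I(X,Y|Z)_D, take the ancestor hull of X∪Y∪Z, moralize it, and test graph separation.

```python
from collections import deque

def ancestor_hull(parents, nodes):
    """All nodes in `nodes` plus their ancestors; `parents` maps node -> parent list."""
    hull, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in hull:
            hull.add(v)
            stack.extend(parents.get(v, []))
    return hull

def moral_graph(parents, hull):
    """Undirected adjacency: DAG edges with direction dropped, plus edges
    ('marriages') between every pair of co-parents of a common child."""
    adj = {v: set() for v in hull}
    for child in hull:
        ps = [p for p in parents.get(child, []) if p in hull]
        for p in ps:
            adj[p].add(child); adj[child].add(p)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    return adj

def d_separated(parents, X, Y, Z):
    """True iff Z d-separates X and Y: Z disconnects X from Y
    in the moral graph of the ancestor hull of X u Y u Z."""
    hull = ancestor_hull(parents, set(X) | set(Y) | set(Z))
    adj = moral_graph(parents, hull)
    blocked = set(Z)
    frontier = deque(set(X) - blocked)
    seen = set(frontier)
    while frontier:                    # BFS that may not pass through Z
        v = frontier.popleft()
        if v in Y:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w); frontier.append(w)
    return True

# Example: chain A -> B -> C, so A and C are d-separated by {B}.
parents = {"B": ["A"], "C": ["B"]}
print(d_separated(parents, {"A"}, {"C"}, {"B"}))   # True
print(d_separated(parents, {"A"}, {"C"}, set()))   # False
```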
Use of Conditional Independence
in the Distributed Inference Problem
• What should S1 send to S2 so that Q2 may “feel” the effect of evidence received by S1 on E1?
• We want S2 to be able to update its probability space so that P2(Q2 | e1) = P(Q2 | e1)
• Claim: If I(E1,Q2|A,B)P, then P1(A,B|e1) is sufficient information for S2 to update its probability space
• “Proof”: P(Q2 | E1,A,B) = P(Q2 | A,B)
[Figure: subnet S1 contains E1 and shares the variables A and B with subnet S2, which contains Q2.]
Distributed Bayesian Inference: Inference Protocol
• A message between two subnets is a joint distribution on a common subset of variables, computed from the probability space of the sender
• Each subnet remembers the last message that each of its neighbors sent to it
• A subnet divides the new message by the old one and absorbs the result into its probability space
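A minimal numpy sketch of this protocol (my own illustration, not the MatLab tool's code): the subnet keeps the last message received on the shared variables Z and absorbs each new one by multiplying its table by new/old and renormalizing. When the stored message is the prior P(Z), the ratio is P(Z|e)/P(Z), so this reduces to the update derived earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Subnet 1 holds a table P1(X, Z); X on axis 0, the shared variables Z on axis 1.
p1 = rng.random((2, 2)); p1 /= p1.sum()

old_msg = p1.sum(axis=0)           # last stored message on Z (initially the prior P(Z))
new_msg = np.array([0.7, 0.3])     # freshly received message, e.g. P(Z | e)

def absorb(table, old_msg, new_msg):
    """Divide the new message by the old one and absorb the ratio into the table."""
    ratio = np.divide(new_msg, old_msg,
                      out=np.zeros_like(new_msg), where=old_msg > 0)
    updated = table * ratio[np.newaxis, :]   # broadcast over the non-shared axis
    return updated / updated.sum()

p1 = absorb(p1, old_msg, new_msg)
old_msg = new_msg                  # remember the message for the next round
print(p1.sum(axis=1))              # updated marginal on X
```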
Communication Graph Considerations
• In a tree communication graph, every edge is the only communication line between two parts of the network
• Hence it must deliver enough information so that the evidence received in one part may convey its impact to the query variables of the other part
• We restrict ourselves to the case where every node represented by an agent can be queried or receive evidence
• In this case it is sufficient that the set Z of variables represented on a communication line separates the set X of variables on one side of the network from the set Y of variables on the other side
[Figure: Z separating X from Y.]