Computational Models For Probabilistic Inference
Download
Report
Transcript Computational Models For Probabilistic Inference
George Mason University
Department of Systems Engineering and Operations Research
GMU
Bayesian Decision Theory
Instructor: Kathryn Blackmond Laskey
George Mason University
Department of Systems Engineering and
Operations Research
[email protected]
http://www.ite.gmu.edu/~klaskey
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 1 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
This workshop is dedicated to the memory
of journalist Danny Pearl, brutally murdered
in Pakistan in February 2002, and to the
pioneering research of his father Judea
Pearl. Judea Pearl’s research has the
potential to create unprecedented advances
in our ability to anticipate and prevent
future terrorist incidents.
A portion of any compensation derived from this workshop
will be donated to the foundation to honor Danny Pearl:
Pearl Family Foundation:
c/o Wall Street Journal
P.O. Box 300
Princeton, New Jersey 08543
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 2 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Software for Bayesian Networks
and Decision Graphs
•
•
•
A number of COTS software tools are now available for Bayesian
networks and decision graphs
The CD supplied with this seminar contains a set of examples
constructed in the Netica® package. A limited functionality version
of Netica® is included on the CD. All the examples can be run
directly from the CD. To purchase a full functionality version of
Netica® visit http://www.norsys.com.
Examples on the following pages are annotated with the name of the
Netica® file or files containing a Bayesian network model for the
example
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 3 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Agenda
1.
2.
3.
4.
5.
6.
Introduction
Basics of decision theory and graphical models
Knowledge engineering and model development
Multi-entity Bayesian networks and decision graphs
Inference using Bayes Rule
Combining expert knowledge and data
7. Conclusion
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 4 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Models and Representations
• Engineers, scientists, and policy analysts construct models to
represent systems
– We use the models to answer questions about the system
– Goal: Build “good enough” models
» “Good enough” depends on purpose for which model is used
» Simplifications and inaccuracies don’t matter if they don’t materially
affect results
• Representations are approximations
– Restricted set of variables
– Unrealistic simplifications
– Untested assumptions
• Models are constructed from:
– Past data on system or related systems
– Judgment of subject matter experts
– Judgment of experienced model builders
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 5 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
Observations
Representation
Real World
Actions
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 6 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 7 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
The Fusion Challenge
• Fusion is the process of incorporating information from
different sources into a single “fused” representation
• Why fusion is difficult:
–
–
–
–
Vast quantities of sensor information
Real-time processing requirements
Restrictions on weight, communication bandwidth
Need to integrate physical and geometrical models with
qualitative knowledge
– Noisy, unreliable, ambiguous data
– Active attempts at deception
– Requirement for robustness to new or little-known threat
types and/or behavior patterns
• Why fusion is important:
– Features that are meaningless in isolation are definitive in
combination
Data, data everywhere, and
not the time to think…
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 8 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
The Systems Engineering Challenge
• Systems engineering is the process of transforming a user need
into an operational system that meets that need
• Why systems engineering is difficult:
–
–
–
–
–
–
–
–
Increasing complexity of systems
Moving from stovepipe to system of systems
Constant improvement cycle
Need to predict performance of integrated systems made up of
legacy, COTS and newly developed subsystems
Need to predict performance of new and untested technologies
Need to do trade studies among large numbers of options with
varying degrees of technology maturity and performance uncertainty
Increased reliance on models and simulations with varying degrees
of fidelity and resolution
Increasing cost and schedule pressure
• Why systems engineering is important:
– An effective and efficient systems engineering process produces
quality systems in a cost-effective manner
– Poor systems engineering leads to spectacular performance
debacles and cost overruns
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 9 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Paradigm Shift in Computing
• Old paradigm: Algorithms running on Turing machines
– Deterministic steps transform inputs into outputs
– Result is either right or wrong
– Semantics based on Boolean logic
• New paradigm: Economy of software agents executing on a physical
symbol system
– Agents make decisions (deterministic or stochastic) to achieve objectives
– “Program” is replaced by dynamic system in which solution quality
improves over time
– Semantics based on decision theory / game theory / stochastic processes
• Hardware realizations of physical symbol systems
–
–
–
–
Physical systems minimize action
Decision theoretic systems maximize utility / minimize loss
Hardware realization of physical symbol system maps action to utility
Programming languages are replaced by specification / interaction
languages
– Software designer specifies goals, rewards and information flows
– Unified theory spans sub-symbolic to cognitive levels
• Old paradigm is limiting case of new paradigm
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 10 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Requirements for New Paradigm Logic
•
•
•
•
Embrace uncertainty
Perform plausible reasoning
Learn from experience
Incorporate observation, historical data, expert
knowledge
• Explore multiple alternatives
• Replace rote procedure with focus on attaining
objectives
• Trade off multiple objectives
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 11 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Agenda
1.
2.
3.
4.
5.
6.
7.
Introduction
Basics of decision theory and graphical models
Knowledge engineering and model development
Multi-entity Bayesian networks and decision graphs
Inference and optimization algorithms
Combining expert knowledge and data
Conclusion
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 12 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
The GOOD-D Process
• Bayesian decision theory is based on the GOOD-D process
• Good decision makers:
– Think carefully about their Goals
– Explore multiple Options
– Predict the possible Outcomes of the decision and the likelihood
of each outcome under each of the options they are considering
– Weigh uncertainties and trade off different goals to Decide which
option best serves their goals
– Do it! Implement a sound and effective plan to carry through on
the decision and monitor success
• Computational models to support GOOD-D process should be
grounded in new-paradigm theory
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 13 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
Decision Graph
•
Utility variables represent decision maker’s goals
•
Action variables represent options available to decision maker
•
World state variables represent outcomes of the decision and other relevant aspects of world state
•
Arrows represent influences
–
Cause and effect relationships between actions and world state variables
–
Statistical associations among world state variables
–
How utility for the decision maker depends on state of the world
Probabilities: P(Has disease) = .6
Disease
Expected utilities:
Outcome
Utility
Treat:
.690
+ .490
=
90
Don't treat:
.60
+ .4100 =
40
Best action: Treat patient
Action
Outcom e
Dis e as e
Outcom e
Utility
Tr e at
Has dis e as e
90
Tr e at
Doe s n't have
dis e as e
Don't tre at
Has dis e as e
Dis e as e fr ee
Side e ffe cts
Dis e as e fr ee
Side e ffe cts
Long illne s s
No s ide e ffe cts
Don't tre at
Doe s n't have
dis e as e
Dis e as e fr ee
No s ide e ffe cts
100
©Kathryn Blackmond Laskey
June 2002
90
0
Decision graphs are also
called influence diagrams
BDT Workshop - 14 -
George Mason University
GMU
•
Department of Systems Engineering and Operations Research
Decision Theory: Criticisms and Rebuttals
Argument:
– Bayesians insert subjectivity into analyses
•
Rebuttal:
– Every analysis depends on assumptions
– Bayesians make subjective assumptions explicit in a way that leaves them open for debate and
analysis
– Bayesian theory provides a principled, theoretically justifiable way to integrate informed expert
judgment with scientific theory and statistical data
•
Argument:
– You just can’t put a numerical value on some things, such as human lives
•
Rebuttal:
– Every time you get into your car you are implicitly putting a value on your own and others’ lives
– Every time the airline allows a person to pass through a metal detector a value is being placed on
human lives
– Bayesian decision theory provides a scientifically sound way to make these tradeoffs explicit and
open to public debate
– Bayesian decision theory provides a language for public discourse to improve society’s ability to
make sound policy judgments
•
Argument:
– If anybody can have any prior they want, how can we justify our model output?
•
Rebuttal:
–
–
–
–
Bayesian decision theory allows for reasonable people to disagree
When there is disagreement the political process must operate to determine policy
Decision theoretic models can inject sound science into the political debate
With enough data non-dogmatic Bayesians will come to agree
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 15 -
George Mason University
Decision Graph: An Example
GMU
•
•
•
Department of Systems Engineering and Operations Research
Maria is visiting a friend when she suddenly
begins sneezing.
"Oh dear, I'm getting a cold," she thinks. “I
had better not visit Grandma.”
Then she notices scratches on the furniture.
She sighs in relief. "I'm not getting a cold!
It's only my cat allergy acting up!”
Plausible inference
The evidence for cat allergy
“explains away” sneezing
and cold is no longer needed
as an explanation
Cat_Nearby
True
3.00
Cat_Nearby
False
97.0
True
15.2
Cat_Nearby
False
84.8
True
91.5
False
8.53
Allergic_Reaction
Cold
True
3.22
True
8.00
Allergic_Reaction
False
96.8
Cold 92.0
False
True
20.8
True
51.7
Allergic_Reaction
False
79.2
Cold
False
48.3
True
88.4
True
14.4
False
11.6
False
85.6
Scratches_on_Furniture
Sneezing
True
2.77
True
15.3
Scratches_on_Furniture
Sneezing
False
97.2
False
84.7
True
9.95
True
100
HealthOfGrandmother
Scratches_on_Furniture
Sneezing
False
90.1
False
0
True
100
True
100
HealthOfGrandmother
False
0
False
0
VisitGrandmother
Go
117.000
VisitGrandmother
StayHome 100.000
Go
73.2510
VisitGrandmother
StayHome 100.000
Go
110.592
StayHome 100.000
©Kathryn Blackmond Laskey
June 2002
HealthOfGrandmother
Pleasure
Pleasure
Pleasure
3
2
1
Netica® file Cat_DG.dne
BDT Workshop - 16 -
George Mason University
Department of Systems Engineering and Operations Research
What is a Decision Graph?
GMU
• Both a knowledge representation and a computational
architecture
– Represents knowledge about entities and their interactions
– Modular elements with defined interconnections
– Computation can exploit loosely coupled structure for efficiency
• Used for inference and/or decision problems
– Infer likely values of some variables from other variables
– Choose policy that best achieves decision maker’s objectives
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 17 -
George Mason University
GMU
•
•
•
Department of Systems Engineering and Operations Research
Types of Influence
Links into chance variables from other chance variables
are called relevance links. A relevance arc indicates that
the value of one variable is relevant to the probability
distribution of the other variable.
Links from decision variables into chance variables are
called influence links. An influence link means that the
decision affects, or influences, the outcome of the chance
variable.
Links into decision variables are called information links.
An information link means that the quantity will be known
at the time the decision is made.
– Decision variables are ordered in time
– In standard influence diagrams, a decision variable and all its
information predecessors are (implicit) information
predecessors to all future decision variables (no forgetting)
•
•
Relevance Link
Influence Link
Information Link
Links from chance or decision variables into value
variables represent functional links. Value variables may
not be parents to decision or chance variables. If there is
more than one terminal value variable, the total value is the
sum of all terminal value variables.
Bayesian network is a decision graph having only world
state variables
Value Links
Notation for value variables varies. Some packages use
rounded boxes, others diamonds, others hexagons.
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 18 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Specifying a BN
Variable
Parent Variable Values
Cold
Allergic_Reaction
True
False
8%
92%
Cat_Nearby = True
75%
25%
Cat_Nearby = False
1%
99%
99%
1%
5%
95%
3%
97%
Cat_Nearby = True
60%
40%
Cat_Nearby = False
1%
99%
Allergic_Reaction = True
Sneezing
Cold = True
Cold = False
Cold = True
Allergic_Reaction = False
Cold = False
Cat_Nearby
Scratches_On_Furniture
•
Simplicity
–
–
•
Scalability
–
–
•
31 probabilities required to specify general distribution for 5 binary variables
8 probabilities needed to specify this model
Over 1030 probabilities required to specify general distribution for 50 variables with 4 values per variable
About 9000 probabilities required to specify BN for 50 variables, 4 values per variable, 3 parents per variable, no local structure
Tractability
–
–
General probabilistic inference is exponential in number of variables
Inference in singly-connected BN is linear in number of variables
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 19 -
George Mason University
GMU
•
Department of Systems Engineering and Operations Research
Directed Graphs Represent Dependence
Graphs with nodes and edges provide a powerful tool for visualizing dependencies
between variables
– Variables are represented by nodes
– Direct dependencies between variables are represented by edges connecting nodes
– Absence of an edge between 2 variables means no direct dependency
•
Example: Rain or a sprinkler could cause the pavement in front of the house to be wet.
This could cause a passerby to slip and fall, and could also ruin my new shoes
Sprinkler
Rain
P avement
Fall
©Kathryn Blackmond Laskey
– Rain and Sprinkler are independent, but
are dependent conditional on Pavement
– Fall and Shoes are dependent (if a fall is
more likely, so are ruined shoes) but are
independent given Pavement
– Sprinkler and Fall are dependent, but are
independent given Pavement
Shoes
June 2002
BDT Workshop - 20 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Causation and Explaining Away
• Alternate causes of an event are often (approximately) independent
given some generally understood context. They become dependent
when a common effect is observed. (Knowing that one cause is true
"explains away" other potential causes)
• Learning about an effect that could be caused by either of two
variables introduces an informational dependence between them
• Informational dependence is different from causal dependence
– We can change an effect by intervening to change
the cause
»
We can make the car start if the battery is dead by
putting in a new battery
BatteryOK
SolenoidOK
– We cannot change the state of a variable by
changing one on which it depends informationally
»
If the car won’t start and we learn that the solenoid
is bad we infer that the battery is probably OK
»
But we cannot make the battery OK by destroying
the solenoid
©Kathryn Blackmond Laskey
June 2002
CarStarts
BDT Workshop - 21 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Exercise: Multi-Sensor Fusion
• Several blind men are congregating around an elephant,
arguing loudly over what they are feeling. Some of them,
feeling the legs, shout: “It is a tree!” Others, feeling the tail,
shout: “It is a rope!” Still others, feeling the ears, shout: “It
is a blanket!” Others, feeling the trunk, shout: “It is a snake!”
• Use a Bayesian network to show that a fusion system that
integrates information from all these sources, each looking at
a different aspect of the elephant, can reach a conclusion that
none of them is able to reach on the basis of his own
information.
Netica® file BlindManAndElephant.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 22 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Exercise: Develop a Bayesian Network
The temperature warning light on a piece of equipment is designed to come on
when the engine overheats. The light will also flash if the temperature sensor is
broken. If the engine has overheated, it may be because a belt is broken or
because the temperature in the engine room is too high. A high temperature in
the engine room might cause the sensor to malfunction, and may itself be caused
by a broken air conditioner. Either a broken belt or a broken air conditioner could
be caused by a company history of poor equipment maintenance. An overheated
engine might cause product defects.
– Draw a Bayesian network representing the uncertainties faced by a technician in
diagnosing the reason the light is on.
– Use a Bayesian network software package to enter probabilities for this network. Use
your judgment to choose reasonable probability distributions.
– Use your Bayesian network to answer the following questions
» What is the probability that the engine has overheated?
» What is the probability that the engine has overheated given that the warning light is on?
» What is the probability that the engine has overheated given that the warning light is on, and
the air conditioner is broken?
» What is the probability that there are product defects?
» What is the probability that there are product defects given that the light is on, the air
conditioner is OK, and the maintenance history is poor?
Netica® file Eqpt_Diagnosis.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 23 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Example: Develop a Decision Graph
• If the belt is replaced it is unlikely to break regardless of
maintenance practice. Replacing the belt has a cost and
so do product defects.
– Add a decision variable and value variables to your graph
– Use your model to analyze the decision of whether to replace
the belt
– Use your model to analyze the decision of whether to replace
the belt when the warning light comes on
Netica® file Eqpt_Diagnosis_DG.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 24 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Action, Causation and Information
• Actions have two aspects
– Actions can change the world (intervening actions)
» Decision to change the belt
– Actions can provide information about the state of the world (nonintervening actions)
» Decision to check warning light
• Intervening action
–
–
–
–
Normal time evolution of variable is “interrupted”
Agent sets the value of variable to desired value
Variables causally downstream are affected by change
Variables causally upstream are unaffected
• Non-intervening action
– Variable is information predecessor to future actions
– No other changes
• Every action has aspects of both
– When we take an action our knowledge is updated to include the
fact that we took the action (but we may forget actions we took)
– Observing the world changes it (we often do not model the impact
of actions whose impact is negligible)
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 25 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Example: Value of Information
•
You are deciding whether to go on a picnic (D=g) or stay home (D=h). Your utility
depends on the whether it rains (W=r) or shines (W=s):
– u(g,r)
=
0
you go and get wet
– u(g,s)
=
10
you go and have fun
– u(h,r)
=
3
you stay dry indoors
– u(h,s)
=
4
you stay home and it’s nice
•
You can call for a weather forecast before you make your decision
•
The forecast is for rain (r*) or shine (s*)
– P(F=r* | W=r)
= .9
– P(F=s* | W=s)
= .8
Forecas
t
Weathe
r
Utility
Call?
Go?
•
What is the optimal decision if the probability of rain is 0.1? What if the
probability is 0.8?
•
Is it to your benefit to call for a weather forecast before making your decision?
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 26 -
George Mason University
GMU
•
Department of Systems Engineering and Operations Research
VOI as Function of Prior Probability
This graph shows the expected
utility of three policies as a function
of the prior probability of rain:
– g: go no matter what
– h: stay home no matter what
– f: go only if the forecast says no rain
•
•
•
When line for f is above other lines
information is valuable
Difference is called value of
information
Summary:
(costless)
– Collecting information is useful if it
might change your decision
– The difference between your
expected utility with and without the
information is called the expected
value of information
– Costless information has positive
value for 4/13<p<16/17
– Costly information has value when
value is greater than cost of
Low-cost information is worth collecting for some P(rain)
information
High-cost information is never worth collecting
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 27 -
George Mason University
GMU
•
Department of Systems Engineering and Operations Research
Variable-Level Independence
Independence assumptions are an important tool for simplifying knowledge
elicitation and inference
– We have seen how Bayesian networks use independence to reduce the number of
parameters required to specify a probability distribution
– Independence assumptions also reduce the computational complexity of inference
•
Bayesian networks are not expressive enough to encode all independence
assumptions that can be exploited to:
– simplify knowledge elicitation
– reduce computations for inference
•
Additional types of independence which can simplify knowledge elicitation and
inference:
– Independence of causal influence (ICI)
» The mechanism by which one parent variable causes child variable is independent of
•
•
values of other parent variables and
mechanism by which they cause the child
– Context-specific independence (CSI)
» Variables may be independent of other variables in some contexts but not others
– Local expressions
» The probability distribution of a variable given its parents can be specified as any
parameterized function of the variable’s parents
» For many local expressions, the probability distribution of the child depends on the parents
only through a sufficient statistic
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 28 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Example of Noisy-OR
•
Sneezing can be caused by an allergy (A) , a cold (C), or
dust (D) in the air
•
The Noisy-OR model:
– Each cause may or may not "trigger" the effect.
A
C
D
» Only causes that are true can trigger the effect
» Causes operate "noisily" - if true, they may or may not trigger
the effect
– There is a "trigger probability" associated with each cause.
» Allergy triggers sneezing with probability pA = .6
» Cold triggers sneezing with probability pC = .9
S
» Dust triggers sneezing with probability pD = .3
– Basic assumptions of the "noisy-OR" model:
•
» Effect occurs if one or more of its causes has triggered it
» Whether one cause has triggered the effect is independent of
whether another cause has triggered the effect
©Kathryn Blackmond Laskey
June 2002
•
Noisy-OR reduces 8 probability
judgments to 3 (or 4 with “leak
probability”)
If there were 10 parents, it would
reduce 1024 probability judgments
to 10 (or 11 with “leak probability”)
BDT Workshop - 29 -
George Mason University
GMU
•
Department of Systems Engineering and Operations Research
Context-Specific Independence
Activity of military unit depends on the weather (wind and precipitation) and time of day
W ind
Pre cipitation Time of Day
When there are many parents
context-specific independence
can provide orders of magnitude
savings in elicitation
Activity
•
Context-specific independence reduces number of distributions for “Activity” from 8 to 4:
Wind=Windy
Wind=C alm
Wet Dry
Wet Dry
Night
Day
Night
Day
P(ÒActivityÓ= ÔM oveÕ) =
.9
.7
Best to travel on a calm, dry night:good
driving conditions and low visibility
Calm, wet weather or windy, dry nights
reduce visibility: OK driving conditions
.5
.1
Poor driving conditions in windy,
wet weather, but low visibility.
Most visible in dry daylight.
Example due to Suzanne Mahoney
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 30 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Agenda
1.
2.
3.
4.
5.
6.
7.
Introduction
Basics of decision theory and graphical models
Knowledge engineering and model development
Multi-entity Bayesian networks and decision graphs
Inference using Bayes Rule
Combining expert knowledge and data
Conclusion
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 31 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Importance of
Sound Knowledge Engineering Process
• Bayesian networks are increasing in popularity
• Applications are growing more complex
• A formal, repeatable process for knowledge engineering is
becoming more important
– Early work on elicitation of probability models (1970’s) focused on
eliciting single probabilities or univariate probability distributions
– Early work in BNs tended to assume that structure elicitation was
relatively straightforward
– As models become more complex the KE process must be
managed
• Knowledge elicitation for large Bayesian networks is a
problem in systems engineering
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 32 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Systems Engineering and the System Lifecycle
• Systems engineering is the technical and managerial process
by which a user need is translated into an operational system
• System life cycle evolves through predictable phases
Retire
Requirements
Requirements
Operate
Analyze
Test
Analyze
Design
Build
Test
Design
Build
Operate
Retire
Spiral Lifecycle Model
Waterfall Lifecycle Model
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 33 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Spiral Model of BN Engineering
• Goal of knowledge engineering
– Discovery and construction of appropriate model
– Not extraction of pre-existing model
• Spiral model is necessary for systems in which requirements
are discovered as development progresses
• Spiral KE
– Construct series of prototype models
– Explore behavior of prototype model on sample problems
– Evaluate prototypes and restructure as necessary
• KE changes both expert and elicitor
– Understanding of expert and elicitor deepen as KE proceeds
– Improves communication between elicitor and expert
Requirements
Operate
Test
Analyze
Design
©Kathryn Blackmond Laskey
June 2002
Build
BDT Workshop - 34 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Applying Spiral BN Engineering
• Begin with a small subproblem
–
–
–
–
Self-contained
Can be completed in short time
Interesting in its own right
Reasonably representative of global problem
• Build and test model for subproblem
– Look for common structures and processes that will recur
– Think about more efficient ways to structure KE
– Develop and document conventions (“style guide”) to be followed
as models are expanded
• Iteratively expand to more complex problems
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 35 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Model Construction Process
• What are the variables?
– World state variables represent possible outcomes
– Decision variables represent available options
– Utility variables represent attributes of value
• What are the possible values of world state and decision
variables?
• What is the graph structure?
• What is the structure of the local distributions for world state
variables?
• What are the local probability distributions for world state
variables?
• What are the local utility functions for the utility variables?
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 36 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
The Participants
• Naive view
– Put problem experts and modeling experts in a room together and
magic will happen”
• Realistic view
– Pure “problem experts” and pure “modeling experts” will talk past
each other
– Modeling experts must learn about the problem and problem
experts must learn what models can do
– This process can be time consuming and frustrating
– Team will be more productive if both sides expect and tolerate this
process
• Training
– The most productive way of training modelers and problem
experts is to construct very simple models of stylized domain
problems
– Goal is understanding and NOT realism or accuracy!
– Beware: the training phase can seem pointless and frustrating
– It is important to get expert buy-in
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 37 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Selecting a Subproblem
• Initial model or expansion of existing model
• Characteristics
–
–
–
–
Manageable size
Interesting in its own right
Path to expansion
Risk mitigation
• How to restrict
– Focus or target variables - variables of direct interest to the
decision maker
» Restrict to subset of variables of interest
» Restrict to subset of values
– Evidence variables - variables for which information will be
available; used to draw inferences about the focus variables
» Restrict to subset of evidence sources
– Context variables - variables that will be assumed known and will
be set to definite values
» Restrict to subset of contextual conditions (sensing conditions,
background casual conditions; assignment of objects to sensors;
number of objects)
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 38 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Managing Knowledge Acquisition
• Record rationale for modeling decisions
• Develop “style guide” to maintain consistency across multiple
subproblems
– Naming conventions
– Variable definitions
– Modeling conventions
• Enforce configuration management
– History of model versions
– Protocols for making and logging changes to current model
– Rationale for changes
• Develop protocol for testing models
– Record of test results traced to model changes and rationale
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 39 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Model Agility
• Requirement: rapid adaptation of model to a new situation
• Support for model agility
– Libraries of reusable model fragments
– Documentation of stable and changeable aspects of model
– Development of data sources for inputs to changeable model
components
» Protocols for data collection and maintenance
» Protocols for importing data into knowledge base
– Automated support for propagating impact of changes
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 40 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Model Evaluation
• Model walk-through
– Present completed model to "fresh" experts and/or modelers
– Evaluate all components of model
• Sensitivity analysis
– Measure effect of one variable on another
– Compare with expert intuition to evaluate model
– Evaluate whether additional modeling is needed
• Case-based evaluation
– Run model on set of test cases
» Cases to check local model fragments (component testing)
» Cases to test behavior of global model (whole-model testing)
– Compare with expert judgment or “ground truth”
– Important issue: selection of test cases
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 41 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Agenda
1.
2.
3.
4.
5.
6.
7.
Introduction
Basics of decision theory and graphical models
Knowledge engineering and model development
Multi-entity Bayesian networks and decision graphs
Inference using Bayes Rule
Combining expert knowledge and data
Conclusion
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 42 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Limits of Standard BN / DG
• “One size fits all” model
– All problem instances involve:
» Same set of variables
» Same states for variables
» Same relationships between variables
– Only “evidence” (instantiated variables) varies from instance to
instance
– All potentially relevant explanations are explicitly represented
• In complex, open-world problems:
– Number of actors and relationships to each other not fixed in
advance
– Attribution of evidence to actors may not be known in advance
– Situation evolves in time
– Need to represent only the most important explanations
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 43 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Maria’s Continuing Saga…
• Variation 1:
– Tran is sneezing and saw scratches
– Tran was recently exposed to a cold and probably is not allergy
prone
• Variation 2:
– Tran saw scratches
– Maria did not see scratches
– Tran is in room with Maria
• Variation 3:
– Tran and Maria both are sneezing, are allergy prone, and saw
scratches
– Tran and Maria are a continent apart
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 44 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
Variation 1
NearCat(Maria)
True
91.4
False
8.63
•
•
AllergyProne(Maria)
True
100
False
0
Add background variables to
specialize model to different
individuals
Still a “one size fits all” model
ExposedToCold(Maria)
True
15.7
False
84.3
AllergicReaction(Mar...
True
88.2
False
11.8
SeesScratches(Mar...
True
100
False
0
ColdStatus(Maria)
True
15.0
False
85.0
Sneezing(Maria)
True
100
False
0
Health(Grandmother1)
Exposed_to_Cold(Tran)
True
100
False
0
Cold_Status(Tran)
True
97.3
False
2.69
Allergy_Prone(Tran)
True
5.71
False
94.3
Visit(Maria,Grandmother1)
Go
109.977
StayHome 100.000
Pleasure(Maria,Grandmother1)
Allergic_Reaction(Tran)
True
4.43
False
95.6
Sneezing(Tran)
True
100
False
0
Health(Grandmother2)
Near_Cat(Tran)
True
65.5
False
34.5
Sees_Scratches(Tran)
True
100
False
0
Visit(Tran,Grandmother2)
Go
27.6902
StayHome 100.000
Pleasure(Tran,Grandmother2)
Netica® file Cat_SSN_Variation1.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 45 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Variation 2
Near(Maria,Tran)
True
100
False
0
Loc(Tran)
ExposedToCold(Tran)
True
100
False
0
ColdStatus(Tran)
True
97.0
False 2.98
AllergyProne(Tran)
True
5.99
False 94.0
Loc(Cat1)
Near(Tran,Cat1)
True
91.4
False 8.59
Loc(Maria)
Near(Maria,Cat1)
True
91.3
False 8.68
AllergicReaction(Tran)
True
5.66
False
94.3
Sneezing(Tran)
True
100
False
0
AllergyProne(Maria)
True
100
False
0
ExposedToCold(Maria)
True
15.8
False
84.2
AllergicReaction(Maria)
True
88.1
False
11.9
Sees_Scratches(Tran)
True
100
False
0
SeesScratches(Maria)
True
54.9
False 45.1
ColdStatus(Maria)
True
15.0
False 85.0
Sneezing(Maria)
True
100
False
0
Health(Grandmother1)
Visit(Maria,Grandmother1)
Go
109.950
StayHome 100.000
•
•
Pleasure(Maria,Grandmother1)
Decision graph has replicated sub-parts
Different kinds of entities (cats, people)
Netica® file Cat_SSN_Variation2.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 46 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Variation 3 Done Wrong
Near(Maria,Tran)
True
0
False
100
Loc(Tran)
ExposedToCold(Tran)
True
33.8
False
66.2
ColdStatus(Tran)
True
35.9
False 64.1
AllergyProne(Tran)
True
100
False
0
Loc(Cat1)
Near(Tran,Cat1)
True
49.5
False 50.5
Loc(Maria)
Near(Maria,Cat1)
True
49.5
False 50.5
AllergicReaction(Tran)
True
51.0
False
49.0
Sneezing(Tran)
True
100
False
0
AllergyProne(Maria)
True
100
False
0
ExposedToCold(Maria)
True
33.8
False
66.2
AllergicReaction(Maria)
True
51.0
False
49.0
Sees_Scratches(Tran)
True
100
False
0
SeesScratches(Maria)
True
100
False
0
ColdStatus(Maria)
True
35.9
False 64.1
Sneezing(Maria)
True
100
False
0
Health(Grandmother1)
Visit(Maria,Grandmother1)
Go
89.1446
StayHome 100.000
•
•
Pleasure(Maria,Grandmother1)
Variation 2 model gets wrong answer if Maria and Tran are not near each other
and both are near cats!
We need to be able to hypothesize additional cats if and when necessary
Netica® file Cat_SSN_Variation2.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 47 -
George Mason University
Department of Systems Engineering and Operations Research
Variation 3 Done Right
GMU
(…but what a mess!)
Cat
Other
Type(Cat2)
92.2
7.77
Near(Tran,Cat2,T1)
True
42.9
False 57.1
Cat
Other
Loc(Cat2,T1)
Loc(Tran,T1)
Near(Maria,Tran,T1)
True
0
False
100
Loc(Maria,T1)
Type(Cat1)
92.2
7.77
Near(Maria,Cat2,T1)
True
42.9
False 57.1
Loc(Cat1,T1)
Near(Tran,Cat1:Cat,T1)
True
42.7
False
57.3
ColdStatus(Tran,T0)
Cold
13.8
Exposed 13.0
Healthy
73.2
ColdStatus(Tran,T1)
Cold
18.0
Exposed 8.79
Healthy
73.2
AllergyProne(Tran)
True
100
False
0
Near(Tran,Cat1,T1)
True
42.9
False 57.1
Near(Maria,Cat1,T1)
True
42.9
False 57.1
Near(Tran,Cat2:Cat,T1)
True
42.7
False
57.3
Near(Maria,Cat1:Cat,...
True
42.7
False
57.3
Near(Tran,{c}:Cat,T1)
True
85.3
False
14.7
Near(Maria,{c}:Cat,T1)
True
85.3
False
14.7
AllergicReaction(Tran,T1)
True
82.8
False
17.2
Sneezing(Tran,T1)
True
100
False
0
Near(Maria,Cat2:Cat,...
True
42.7
False
57.3
ColdStatus(Maria,T0)
Cold
13.8
Exposed 13.0
Healthy
73.2
AllergyProne(Maria)
True
100
False
0
ColdStatus(Maria,T1)
Cold
18.0
Exposed 8.79
Healthy
73.2
AllergicReaction(Maria,...
True
82.8
False
17.2
SeesScratches(Tran,...
True
100
False
0
SeesScratches(Maria,...
True
100
False
0
Sneezing(Maria,T1)
True
100
False
0
Visit(Maria,Grandmother,...
Go
107.033
StayHome 99.9999
•
Health(Grandmother,T1)
Pleasure(Maria,Grandmother,T1)
This model gets the “right answer” on all the variations
Netica® file Cat_SSN_Variation3.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 48 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
The Solution
Near(h,:Cat,t)
True
2.76
False 97.2
AllergyProne(h)
True
5.00
False 95.0
Loc(x,t)
Spatial Loc(y,t)
Fragment
AllergicReaction(h,t)
True
1.10
False 98.9
SeesScratches(h,t)
True
2.63
False 97.4
Cats
& Allergies
Fragment
AllergicReaction(h,t)
True
1.13
False 98.9
ColdStatus(h,t)
Cold
12.0
Exposed
0
Healthy
88.0
Sneezing(h,t)
True
17.2
False 82.8
Sneezing
Fragment
True
False
Near(x,y,t)
3.04
97.0
ColdStatus(h1,t)
Cold
8.35
Exposed 9.82
Healthy
81.8
Visit(h1,h2,t)
Go
116.651
StayHome 100.000
True
False
Near(x,c,t)
3.00
97.0
Type(c)
50.0
50.0
Near(x,c:Cat,t)
True
1.50
False 98.5
Hypothesis
Management
Fragment
Value
Fragment
Near(x,{c}:Cat,t)
True
2.48
False 97.5
ColdStatus(h,t-1)
Cold
8.35
Exposed 9.82
Healthy
81.8
Colds&Time
Fragment
Health(h2,t)
Pleasure(h1,h2,t)
•
Cat
Other
ColdStatus(h,t)
Cold
8.35
Exposed 9.82
Healthy
81.8
Specify model in pieces and let the computer compose them
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 49 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Multi-Entity Bayesian Networks
and Decision Graphs
• Represent knowledge as model fragments
– Implicitly represents complete and consistent model of domain and
anticipated situations
– No a priori bound on #entities, #relevant relationships, #observations
• Compose fragments dynamically into situation specific network
(SSN)
– A situation is a snapshot of the world at an instant of time
– A situation-specific network is an ordinary, finite Bayesian network or
decision graph constructed from the MEDG knowledge base using
network construction operators
• Use SSN to compute response to query
– General purpose algorithm
– Approximates the “correct answer” encoded by the knowledge base
– Models with special structure can be solved with special-case
algorithms
• Use expert-guided Bayesian learning to update knowledge
patterns over time
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 50 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Model Construction
• Simpler models give same results as more complex model on
problems for which they are adequate
• We want to construct “good enough” model for our situation
• Model constructor builds situation-specific DG from
knowledge base implicitly encoding infinite-dimensional DG
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 51 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Functional Architecture
for Model Construction System
Suggestors
BN/DG
Fragment
KB
• Retrieve model
fragments
• Match variables
• Attach evidence
to variables
Alert
Messages
• Combine
fragments into
situation-specific
model
• Update inferences
and decisions
Model
Workspace
Streaming
Evidence
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 52 -
George Mason University
Department of Systems Engineering and Operations Research
MEDGs and
Object-Oriented Representation
GMU
•
Entities are represented as objects
•
Probability part of MEDG expresses uncertainty about entities
–
–
–
–
•
Attribute uncertainty
Existence uncertainty (false alarm)
Type and subtype uncertainty (discrimination within a type hierarchy)
Reference uncertainty (association)
Value and action part of MEDG represents objectives and plans of software agent
Physical
Object
Near
Mammal
Location
Cat
Human
Time
Composition
Physical
Object
Association
Inheritance
©Kathryn Blackmond Laskey
Physical
Object
Physical
Object
June 2002
BDT Workshop - 53 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
MEBN / MEDG Value Added
Inference Collaborative Environment (ICE)
• MEBN software developed by Information
Extraction & Transport, Inc.
• BMD/ICE development sponsored by MDA
• SPARTA engineers are using ICE to develop
fusion architecture for missile defense
Modeling with standard
Bayesian Networks
Effort
Modeling with richer
representation for
managing and reasoning
w/ Uncertainty
Problem Complexity
# of entities, # interactions, spatio-temporal variables, ...
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 54 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Agenda
1.
2.
3.
4.
5.
6.
7.
Introduction
Basics of decision theory and graphical models
Knowledge engineering and model development
Multi-entity Bayesian networks and decision graphs
Inference using Bayes Rule
Combining expert knowledge and data
Conclusion
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 55 -
Department of Systems Engineering and Operations Research
George Mason University
GMU
•
•
Bayesian Inference
Bayesian inference is a theory of rational belief dynamics
Uncertainty about state of world is represented by a probability distribution
over states
– Probability is a rational agent’s degree of belief about uncertain states of the world
•
•
Beliefs are updated over time by conditioning on new information about the
world
The rule for updating beliefs with new information is called Bayes rule:
P( H2 | E )
P(E | H1 ) P(H1)
P( H1 | E)
P( E | H 2 ) P(H1)
Posterior odds ratio
•
Prior odds ratio
Likelihood ratio
Evidence increases odds of hypothesis H2 relative to hypothesis H1 if the
evidence is more likely under H2 than under H1
– Posterior probability can increase even with unlikely evidence if it is more unlikely
under alternate hypothesis
– Posterior probability can fail to increase even with likely evidence if it is more likely
under alternate hypothesis
•
Bayesian theory justifies the scientific process
– The best way to confirm a hypothesis is to enumerate many plausible alternatives
and find evidence to disconfirm them
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 56 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Graphical View of Bayes Rule
* * *
*
*
*
* *
*
*** * * *
* * * * * *
*
* *
*
*
*
*
*
* *
* * *
*
*
* * *
* ~ 1 million Zappists
*
* ~ 1 million non-Zappists
•
Province
Province
Northeastern 12.5
Other
87.5
Northeastern
Other
63.2
36.8
Religion
Religion
Zappist 11.9
Other
88.1
Zappist
Other
100
0
Depravia has a population of 40 million people
– 5 million in Northeastern province and 35 million in the rest of Depravia
•
•
The Zappist fundamentalists make up 60% of the population of the
Northeastern province and 5% of the population of the rest of Depravia
What is the probability that Frangolina is from the Northern province of
Depravia if:
– All we know about her is that she lives somewhere in Depravia?
– We learn that she is a fundamentalist Zappist?
Netica® file ZappistProvince.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 57 -
George Mason University
GMU
•
Department of Systems Engineering and Operations Research
Rare Events and the False Positive Index
Diagnostic tests can be evaluated by sensitivity and specificity
– Sensitivity is probability condition will be detected if present
– Specificity is probability condition will not be detected if not present
– We can increase sensitivity by adjusting the threshold for declaring a positive, but at
the cost of decreasing specificity
•
The false positive index is the number of false positives for every true positive
•
The chart plots false
positive index against
base rate for a test with 3
different thresholds
The best way to increase
both sensitivity and
specificity is to integrate
information from multiple
sources
Bayesian networks do this
in a principled way
Increasing the accuracy of
single-source sensors
may not help if single
source cannot observe all
relevant features
•
•
•
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 58 -
George Mason University
GMU
•
•
•
•
•
Department of Systems Engineering and Operations Research
Soft Evidence
Ten year old Leslie comes downstairs complaining of a
headache and chills. You put your hand on her
forehead and decide that she probably has a fever.
What's the probability she's sick enough to stay home
from school?
We draw a Bayesian network with variables S (sick?), H
(headache?), F (fever?), C (chills?). How do we
process the evidence that her forehead "feels hot?"
We could add a “virtual” child of F called T (touch
forehead). We would assign possible values (e.g.,
very hot, hot, warm, cool) for T and assess P(T | F) for
each value of F and T
Why should we have to assess P(F|T) for values of T
that did not occur?
We can achieve the same effect using “soft evidence”
for F
– Soft evidence represents the relative likelihood of
“Leslie’s feel” if she has a fever and if she doesn’t
– An 8 to 1 ratio could be entered as (0.8,0.1), (0.4,0.05) or
any pair of values with an 8 to 1 ratio
S
H
F
C
T
S
H
F
T(f) = .8
_
T(f) = .1
C
Netica® file Leslie.dne
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 59 -
George Mason University
Inference by Local Distributed Computation
GMU
•
Department of Systems Engineering and Operations Research
Distributed inference algorithms perform fast computation
– Algorithm computes probability distribution of non-evidence variables given evidence
variables using Bayes rule
– Tractable exact inference on probability distributions of complex phenomena
– Approximate inference makes even the most difficult problems feasible
•
Graph separation provides the basis for defining communication architecture for
fusion systems
– Compile Bayesian network into computational representation for inference
– Determine information requirements at variables and required flows along links
– Analyze tradeoffs in accuracy and computation/communication
AB
A
F
BEC
B
E
G
ECG
C
EGF
D
CGH
H
DC
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 60 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Inference Algorithms
• Exact, graph theory based
– Pearl's tree algorithm and its modifications
– Junction tree algorithm
– Influence diagram reduction
• Exact, factorization based
– Symbolic probabilistic inference
– Bucket elimination
• Approximate
– Monte Carlo simulation
– Variational methods
– Various special case approaches
All these algorithms solve the canonical inference problem:
find the posterior probability distribution for some variable(s)
given direct or virtual evidence about other variable(s)
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 61 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Agenda
1.
2.
3.
4.
5.
6.
7.
Introduction
Basics of decision theory and graphical models
Knowledge engineering and model development
Multi-entity Bayesian networks and decision graphs
Inference using Bayes Rule
Combining expert knowledge and data
Conclusion
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 62 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
The Learning Problem
•
Why learning?
–
–
–
–
•
Knowledge elicitation is difficult
Experts can be biased
There may be no experts for some problems
In many applications data are plentiful
Learning tasks
– Learn a Bayesian network from observations alone
– Combine expert knowledge and data
•
Structure of a learning algorithm
– Method for searching over structures
– Method for evaluating “goodness” of structures
» Bayesian learner compares Structure1 and Structure2 by ratio of posterior
probabilities P(Data | Structure1) / P(Data | Structure2)
– Method for estimating parameters given structure
» Bayesian learner computes posterior distribution of parameters given structure
– Choice of output
» Single most probable structure and estimated probability table?
» Sample of structures from posterior distribution of structures, expected value of
probability table, standard deviations of probabilities?
» Other possibilities
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 63 -
Department of Systems Engineering and Operations Research
George Mason University
GMU
Raw Material: The Observations
• Easiest case:
– Random sample of cases from the network to be learned
– Each case contains an observed value for all variables in the network
A
A
B
?
C
D
B
C
E
D
E
• Complexities:
A
0
1
0
1
1
1
?
0
1
1
B
1
1
1
0
0
1
0
0
1
1
C
0
0
1
1
1
0
1
1
1
0
D
1
1
1
1
0
1
0
0
0
1
E
1
1
0
0
1
0
1
0
0
1
– Missing observations: Some variables are not observed in some cases
– Hidden variables: Some variables are not observed in any cases
– Non-random sampling: Sampled cases are not representative of the
population for which a Bayesian network is being learned
A
B
A
?
C
D
©Kathryn Blackmond Laskey
E
B
C
D
E
A
0
1
0
1
1
1
?
0
1
1
B
?
?
?
?
?
?
?
?
?
?
June 2002
C
0
0
?
1
1
0
1
1
1
?
D
1
1
1
?
0
1
0
0
0
1
E
1
1
0
0
1
0
1
0
0
1
BDT Workshop - 64 -
Department of Systems Engineering and Operations Research
George Mason University
GMU
Learning a Single Probability
•
Objective: From a sample of student responses to a problem, learn the
probability that a similar student will answer correctly
If we know nothing about the probability a priori then we may consider all
probabilities equally likely (uniform distribution)
Suppose 7 of 10 students answer correctly
The posterior distribution has a Beta distribution with parameters 8 and 4
The posterior expected value is 8/(8+4) = 0.67
•
•
•
•
Correct(1)
Correct(10)
True
50.0
False
50.0
True
50.0
False
50.0
Correct(10)
True
100
False
0
True
False
Correct(1)
100
0
Correct(2)
Correct(9)
50.0
0 to 0.1
10.0
False
50.0
0.1 to 0.2
10.0
0.2 to 0.3
10.0
0.3 to 0.4
10.0
0.4 to 0.5
10.0
0.5 to 0.6
10.0
0.6 to 0.7
10.0
0.7 to 0.8
10.0
0.8 to 0.9
10.0
Correct(8)
50.0
False
50.0
50.0
False
50.0
True
False
PercentCorrect
True
True
True
0.9 to 1
True
False
Correct(9)
100
0
PercentCorrect
0 to 0.1
0+
0.1 to 0.2 .014
0.2 to 0.3 0.34
0.3 to 0.4 2.33
0.4 to 0.5 8.21
0.5 to 0.6 18.3
0.6 to 0.7 27.8
0.7 to 0.8 27.6
0.8 to 0.9 14.3
0.9 to 1
1.15
0.67 ± 0.13
Correct(3)
True
50.0
False
50.0
10.0
True
False
0.5 ± 0.29
Correct(8)
100
0
True
False
Correct(4)
Correct(7)
True
50.0
False
50.0
True
50.0
False
50.0
True
False
Correct(7)
True
0
False
100
Correct(5)
Correct(6)
True
50.0
True
50.0
False
50.0
False
50.0
True
False
Prior distribution: All percentages are equally
likely (uniform distribution)
June 2002
True
False
Correct(3)
100
0
Correct(4)
100
0
Correct(5)
0
100
Posterior distribution: Beta(8,4) distribution
with expected value 67%
The possible values for a probability are any number between 0 and 1. Netica®
approximates the posterior distribution with finitely many “bins”
©Kathryn Blackmond Laskey
Correct(6)
100
0
Correct(2)
0
100
Netica® file Learning.dne
BDT Workshop - 65 -
George Mason University
GMU
•
•
•
Department of Systems Engineering and Operations Research
Virtual Counts and Expert/Data Combination
Suppose 6 out of 10 students in a second sample answered correctly
Our new posterior distribution is also a Beta distribution with parameters 14
and 8
The rule for updating:
– Prior Beta distribution has “virtual count” of 8 for True and 4 for False
– Data contains 6 true values and 4 false values
– Add the virtual counts to the actual counts to obtain the posterior virtual counts of
14 and 8
•
Uniform distribution is a Beta distribution with virtual count of 1 for True and
1 for False
– Posterior probability for True is (1 + #True)/(2 + #Sample)
•
We can assess virtual counts from the expert and combine them with data
– Larger virtual counts mean less uncertainty
– Expected value for a state is its virtual count divided by total virtual count
•
The same idea works when there are more than two states if we use the
Dirichlet distribution instead of the Beta distribution
Why not just use observed frequencies to estimate probabilities?
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 66 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
Triplot
• A triplot is a convenient tool for visualizing the combination of
prior and data to form a posterior distribution. It plots the
prior, posterior and “normalized likelihood” on the same axes.
• Triplot for vague but on target expert
– Expert distribution is spread out but likely to be near 70% True
– 5 out of 25 observations are True
– Posterior expected value is 23/30 = 77% True
6
5
4
P rior (Beta(3,2))
NLK (B eta(21,6))
P os t. (B eta(23,7))
3
2
1
• “Blind” application of
updating formula can be
a bad idea if expert is
wrong
• Adding “pinch of
probability” on
hypothesis that model is
wrong adds robustness
• Serious conflict should
trigger an alert!
©Kathryn Blackmond Laskey
0.98
0.91
0.84
0.77
0.70
0.63
0.56
0.49
0.42
0.35
0.28
0.21
0.14
0.07
0.00
0
June 2002
BDT Workshop - 67 -
Department of Systems Engineering and Operations Research
George Mason University
The “Pinch of Probability”
GMU
10
9
8
7
Prior (Beta(60,40))
NLK (Beta(21,6))
Post. (Beta(80,45))
6
5
4
3
2
1
0.98
0.91
0.84
0.77
0.70
0.63
0.56
0.49
0.42
0.35
0.28
0.21
0.14
0.07
0.00
0
Overconfident but approximately correct expert
With 1% “pinch” on uniform prior
7
6
5
4
Prior (Beta(10,30))
3
NLK (Beta(21,6))
Post. (Beta(30,35))
2
1
0.98
0.91
0.84
0.77
0.70
0.63
0.56
0.49
0.42
0.35
0.28
0.21
0.14
0.07
0.00
0
Overconfident and wrong expert
©Kathryn Blackmond Laskey
With 1% “pinch” on uniform prior
June 2002
BDT Workshop - 68 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Learning Structure
• We can represent uncertainty about the structure of a
Bayesian network as a probability distribution over structures
– Assign prior probability to each structure and to local
distributions conditional on structure
– Apply Bayes rule to obtain posterior distribution for structure and
parameters given data
• “Natural Occam’s Razor”
– When sample sizes are relatively small Bayesian learning tends to
favor simpler models (models with fewer arcs and simpler local
distributions)
– As observations accumulate Bayesian learning will adjust the
number of parameters as needed to account for the data
– In general Bayesian learning tends to converge to the simplest
model that is consistent with the observations
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 69 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Agenda
1.
2.
3.
4.
5.
6.
7.
Introduction
Basics of decision theory and graphical models
Knowledge engineering and model development
Multi-entity Bayesian networks and decision graphs
Inference using Bayes Rule
Combining expert knowledge and data
Conclusion
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 70 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Application Areas
• BNs and DGs are increasingly being applied to a broad array
of problems
– Automated and mixed initiative multi-source fusion
– Automated and mixed initiative data mining
– Models and simulations for systems engineering trade studies
and system design
– Agent models for agent-based computing
– Cognitive modeling
– Corporate and government strategic planning
– Biometrics
– Cyber-security
– Decision support for many applications
– Human genome project
• How could BNs and DGs be applied in your area of work?
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 71 -
George Mason University
GMU
Department of Systems Engineering and Operations Research
Summary
•
BNs and DGs move from hand-crafted special purpose models to
genuine open-world reasoning capability
– MEDGs are logic for new-paradigm computing
•
General-purpose language for formulating and specifying scientific,
engineering and policy models
– Specify knowledge via modular components with well-defined
interconnections
– Uncertainty and observability intrinsic elements of the representation
– Combine physical, logical, and subjective knowledge in a unified and
logically defensible manner
– Represent both causal and correlational knowledge
•
Modular representation for composing complex probability models
from manageable sub-units
– First-principles domain model
– Decision theory provides approach to trading off expressive power against
complexity in
»
»
»
»
Knowledge engineering
Model construction
Inference
Learning
– Unified knowledge modeling methodology & process spans sub-symbolic
through cognitive and social/organizational levels
•
•
Application experience to date shows strong promise
There are many open research issues
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 72 -
George Mason University
Department of Systems Engineering and Operations Research
GMU
References
•
Bayesian networks and decision graphs
– Charniak, E. Bayesian Networks without Tears, AI Magazine, 1993.
– Jensen, F. Bayesian Networks and Decision Graphs Springer-Verlag, 2001.
– Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference,
Morgan Kaufmann, 1988. The first book popularizing Bayesian networks.
– Clemen, R. Making Hard Decisions
•
Knowledge engineering
– Special issue of IEEE Transactions on Knowledge and Data Engineering: Probability Models:
Where do the Numbers Come From?, 2000.
•
Constructing BNs and DGs from modular and repeatable components
– Glesner, S. and D. Koller (1995) Constructing Flexible Dynamic Belief Networks from FirstOrder Probabilistic Knowledge Bases, in ECSQARU ’95, pp. 217-226.
– Laskey, K.B. , Mahoney, S.M. and Wright, E. (2001) Hypothesis Management in SituationSpecific BN Construction. In Koller, D. and Breese, J., Uncertainty in Artificial Intelligence:
Proceedings of the Sixteenth Conference, San Francisco, CA: Morgan Kaufmann.
•
Combining expert knowledge and data
– D. Heckerman. A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06,
Microsoft Research, March, 1995.
– Laskey, K.B. and Mahoney, S.M. Knowledge and Data Fusion in Probabilistic Networks
submitted to Journal of Machine Learning Research special issue on knowledge and data
fusion
•
Web site for my GMU course
– contains lecture notes, additional references, exercises, useful links
– http://ite.gmu.edu/~klaskey/CompProb
©Kathryn Blackmond Laskey
June 2002
BDT Workshop - 73 -