Transcript PowerPoint

CS 416
Artificial Intelligence
Lecture 15
Uncertainty
Chapters 13 and 14
Conditional probability
The probability of a, given that all we know is b
• P (a | b)
Written in terms of unconditional probabilities
• P (a | b) = P (a ^ b) / P (b), provided P (b) > 0
Conditioning
A distribution over Y can be obtained by summing
out all the other variables from any joint
distribution containing Y
P(Y) = SUM_z P(Y | z) P(z)
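A minimal sketch of conditioning in Python. The variable names and all of the numbers are illustrative assumptions, not values from the lecture.

```python
# Conditioning: P(Y) = SUM_z P(Y | z) P(z)
# All numbers below are assumed for illustration.
p_z = {"cavity": 0.2, "no_cavity": 0.8}            # P(z)
p_y_given_z = {"cavity": 0.6, "no_cavity": 0.1}    # P(toothache | z)

# Sum out z to obtain the unconditional probability of a toothache.
p_toothache = sum(p_y_given_z[z] * p_z[z] for z in p_z)
print(round(p_toothache, 3))
```

Each term weights the conditional probability by how likely that value of z is, so the result is the distribution over Y alone.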
Independence
Independence of variables in a domain can
dramatically reduce the amount of information
necessary to specify the full joint distribution
• Assume dental scenario has three T/F conditions
– Toothache – yes / no
– Catch – pick (does / does not) get caught
– Cavity – yes / no
• 2^3 = 8 probabilities are required to cover all the cases
Independence
• 2^3 = 8 probabilities are required to cover all the cases
• Consider adding weather (four states) to this table
– For each weather condition, there are 8 dental conditions
8*4=32 cells
Independence
• Weather states: rainy, cloudy, sunny, windy
Independence
Conditional probability stipulates:
• P(dental condition and weather condition) = P(weather|dental) P(dental)
Because weather and dentistry are independent
• P (weather | dental) = P (weather)
• P (toothache, catch, cavity, Weather=cloudy) =
P(Weather=cloudy) * P(toothache, catch, cavity)
• A 4-cell weather table plus an 8-cell dental table: 12 cells total instead of 32
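A short Python sketch of the same point: the 32-cell joint table can be rebuilt on demand from the 4-cell and 8-cell tables. The probabilities are assumed for illustration; the slides do not give them.

```python
from itertools import product

# Assumed numbers, for illustration only.
p_weather = {"rainy": 0.1, "cloudy": 0.3, "sunny": 0.5, "windy": 0.1}

# 8-cell dental table indexed by (toothache, catch, cavity).
p_dental = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

# Independence: P(weather, dental) = P(weather) * P(dental).
joint = {
    (w, d): p_weather[w] * p_dental[d]
    for w, d in product(p_weather, p_dental)
}

stored = len(p_weather) + len(p_dental)   # numbers we actually keep
print(stored, len(joint))
```

Only the two small tables are stored; any of the 32 joint entries is one multiplication away.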
Bayes’ Rule
Useful when you know three of the four terms and need to
know the fourth
• P (b | a) = P (a | b) P (b) / P (a)
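As a small illustration, Bayes' rule as a Python function. The function name and the example numbers are assumptions made for this sketch.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(a | b) computed from P(b | a), P(a), and P(b)."""
    if p_b <= 0:
        raise ValueError("P(b) must be positive")
    return p_b_given_a * p_a / p_b

# Assumed numbers: P(toothache | cavity) = 0.6, P(cavity) = 0.2,
# P(toothache) = 0.2.
p_cavity_given_toothache = bayes(0.6, 0.2, 0.2)
print(round(p_cavity_given_toothache, 3))
```

Three known quantities go in and the fourth comes out, which is the typical diagnostic use: causal knowledge P(effect | cause) is turned into P(cause | effect).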
Conditional independence
Consider toothaches, the pick catching, and
cavities
• A cavity causes the pick to catch
• A cavity causes toothaches
Both are likely if you have a cavity, but neither causes the other…
• A toothache doesn’t cause the pick to catch
• The pick catching doesn’t cause a toothache
Catching and toothaches are not related…
Conditional independence
Toothache and catch are independent given the
presence or absence of a cavity
• If you know you have a cavity, there’s no reason to believe
the toothache and the dentist’s pick are related
Conditional independence
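In general, when a single cause influences
multiple effects, all of which are conditionally
independent (given the cause), the full joint factors as
• P (Cause, Effect1, …, Effectn) = P (Cause) * PRODUCT_i P (Effecti | Cause)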
2n+1
2*n*(2^2) = 8n
Assuming binary variables
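A sketch of the single-cause, multiple-effects structure in Python, with cavity as the cause and toothache and catch as the effects. The numbers are assumed. This sketch stores one number for the cause and two per effect.

```python
from itertools import product

# Assumed numbers, for illustration only.
p_cause = 0.2                                  # P(cavity)
p_effect_given_cause = [
    {True: 0.6, False: 0.1},                   # P(toothache | cavity = key)
    {True: 0.9, False: 0.2},                   # P(catch | cavity = key)
]

def joint(cause, effects):
    """P(cause, effects) = P(cause) * PRODUCT_i P(effect_i | cause)."""
    p = p_cause if cause else 1 - p_cause
    for cpt, effect in zip(p_effect_given_cause, effects):
        p_true = cpt[cause]
        p *= p_true if effect else 1 - p_true
    return p

n = len(p_effect_given_cause)
full_joint = {
    (c, es): joint(c, es)
    for c in (True, False)
    for es in product((True, False), repeat=n)
}
print(len(full_joint), round(sum(full_joint.values()), 6))
```

All 2^(n+1) joint entries are recoverable, yet the number of stored values grows linearly with the number of effects.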
Wumpus
Are there pits in (1,3) (2,2) (3,1)
given breezes in (1,2) and (2,1)?
One way to solve…
• Find the full joint distribution
– P (P1,1, …, P4,4, B1,1, B1,2, B2,1)
Find the full joint distribution
• Remember the product rule
• P (P1,1, …, P4,4, B1,1, B1,2, B2,1) =
P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) P(P1,1, …, P4,4)
– Solve this for all P and B values
Find the full joint distribution
• P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) P(P1,1, …, P4,4)
– Givens:
▪ the rules relating breezes to pits
▪ each square contains a pit with probability 0.2
– For any given P1,1, …, P4,4 setting with n pits
▪ The rules of breezes tell us the value of P (B | P)
▪ 0.2^n * 0.8^(16-n) tells us the value of P(P)
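The two ingredients can be sketched in Python as follows. The function names are hypothetical, and the breeze rule is assumed to be the usual one: a square is breezy exactly when an adjacent square (up, down, left, right) contains a pit.

```python
PIT_PROB = 0.2
SIZE = 4

def prior(pits):
    """P(P1,1, ..., P4,4) for a set of pit squares: 0.2^n * 0.8^(16-n)."""
    n = len(pits)
    return PIT_PROB ** n * (1 - PIT_PROB) ** (SIZE * SIZE - n)

def neighbors(square):
    """Adjacent squares that lie inside the 4x4 grid."""
    x, y = square
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for i, j in candidates
            if 1 <= i <= SIZE and 1 <= j <= SIZE]

def breeze(square, pits):
    """Assumed deterministic rule: breezy iff some neighbor has a pit."""
    return any(nb in pits for nb in neighbors(square))

def likelihood(breezes, pits):
    """P(B | P): 1 if the observed breezes match the rule, else 0."""
    ok = all(breeze(sq, pits) == b for sq, b in breezes.items())
    return 1.0 if ok else 0.0

observed = {(1, 1): False, (1, 2): True, (2, 1): True}
pits = {(2, 2)}
print(likelihood(observed, pits) * prior(pits))
```

Multiplying `likelihood` by `prior` gives one entry of the full joint distribution.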
Solving an instance
We have the following facts:
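• b = ~b1,1 ^ b1,2 ^ b2,1 (breezes observed in (1,2) and (2,1), none in (1,1))
• known = ~p1,1 ^ ~p1,2 ^ ~p2,1 (the visited squares contain no pits)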
Query: P (P1,3 | known, b)
• We know the full joint probability so we can solve this
– 2^12 = 4096 terms must be summed
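A brute-force sketch of this query in Python. It assumes the usual setup for this example: squares (1,1), (1,2), and (2,1) have been visited and contain no pits, there is no breeze in (1,1), and a square is breezy exactly when an adjacent square has a pit. Variable names are hypothetical.

```python
from itertools import product

SIZE, PIT_PROB = 4, 0.2
squares = [(x, y) for x in range(1, SIZE + 1) for y in range(1, SIZE + 1)]
known_no_pit = {(1, 1), (1, 2), (2, 1)}                  # assumed facts
observed = {(1, 1): False, (1, 2): True, (2, 1): True}   # assumed breezes
query = (1, 3)
others = [s for s in squares if s not in known_no_pit and s != query]

def breezy(square, pits):
    x, y = square
    return any(nb in pits
               for nb in [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)])

def weight(pits):
    """One joint entry: P(b | pits) * P(pits)."""
    if any(breezy(sq, pits) != b for sq, b in observed.items()):
        return 0.0
    n = len(pits)
    return PIT_PROB ** n * (1 - PIT_PROB) ** (len(squares) - n)

totals = {True: 0.0, False: 0.0}
for q in (True, False):
    # 2^12 assignments to the 12 unknown squares for each query value.
    for assignment in product((True, False), repeat=len(others)):
        pits = {s for s, has_pit in zip(others, assignment) if has_pit}
        if q:
            pits.add(query)
        totals[q] += weight(pits)

terms_per_value = 2 ** len(others)
p_pit_13 = totals[True] / (totals[True] + totals[False])
print(terms_per_value, round(p_pit_13, 2))
```

The loop visits every assignment, including the many that differ only in squares far from the query, which is the waste the next slides remove.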
Solving an instance more quickly
Independence
• The contents of [4,4] don’t affect the
presence of a pit at [1,3]
• Create Fringe and Other
– Breezes are conditionally
independent of the Other variables
Chapter 14
Probabilistic Reasoning
• First, Bayesian Networks
• Then, Inference
Bayesian Networks
Difficult to build a probability table with a large
amount of data
• Independence and conditional independence seek to reduce
complications (time) of building full joint distribution
Bayesian Network captures these dependencies
Bayesian Network
Directed Acyclic Graph (DAG)
• Random variables are the nodes
• Arcs indicate direct influence (parent to child); missing arcs encode conditional independence relationships
• Each node labeled with P(Xi | Parents (Xi))
Another example
Burglar Alarm
• Goes off when there is an intruder (usually)
• Goes off during earthquake (sometimes)
• Neighbor John calls when he hears the alarm, but he also
calls when he confuses the phone for the alarm
• Neighbor Mary calls when she hears the alarm, but she
doesn’t hear it when listening to music
Another example
Burglar Alarm
Note the absence of information about John and Mary’s errors.
Note the presence of Conditional Probability Tables (CPTs)
Full joint distribution
The Bayesian Network describes the full joint
distribution
P(X1 = x1 ^ X2 = x2 ^ … ^ Xn = xn)
abbreviated as…
P (x1, x2, …, xn)
P (x1, x2, …, xn) = PRODUCT_i P (xi | parents (Xi))
• Each factor is an entry in a CPT
Burglar alarm example
P (John calls, Mary calls, alarm goes off, no intruder or earthquake)
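A Python sketch of this calculation. The transcript does not reproduce the CPTs, so the numbers below are an assumption: they are the values commonly used with this textbook example.

```python
# CPT values are assumed (commonly used textbook numbers).
P_B = 0.001                                   # P(Burglary)
P_E = 0.002                                   # P(Earthquake)
P_A = {                                       # P(Alarm | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}               # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}               # P(MaryCalls | Alarm)

def pr(p_true, value):
    """Probability that a Boolean variable takes `value`."""
    return p_true if value else 1 - p_true

def joint(b, e, a, j, m):
    """Product of one CPT entry per node, each conditioned on its parents."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# John calls, Mary calls, alarm goes off, no intruder, no earthquake.
p = joint(b=False, e=False, a=True, j=True, m=True)
print(p)
```

With these assumed tables the entry is about 0.00063: five CPT lookups and four multiplications, with no 32-cell table stored anywhere.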
Constructing a Bayesian Network
• Top-down is more likely to work
• Causal rules are better
• Adding arcs is a judgment call
– Consider decision not to add error info about John/Mary
Conditional distributions
It can be time consuming to fill up all the CPTs of
discrete random variables
• Sometimes standard templates can be used
– The canonical 20% of the work takes 80% of the time
• Sometimes simple logic summarizes a table
– A V B V C => D
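A sketch of how a logical rule can stand in for a hand-filled table. It assumes the rule is read as "D is true exactly when at least one of A, B, C is true"; the helper name is hypothetical.

```python
from itertools import product

def cpt_from_rule(rule, n_parents):
    """P(D = true | parents) for a deterministic node.

    The entry is 1.0 where the rule holds and 0.0 elsewhere.
    """
    return {
        parents: (1.0 if rule(*parents) else 0.0)
        for parents in product((True, False), repeat=n_parents)
    }

# Assumed reading of the rule: D = A or B or C.
cpt_d = cpt_from_rule(lambda a, b, c: a or b or c, 3)
for parents, p_true in cpt_d.items():
    print(parents, p_true)
```

The eight rows are generated from a one-line rule instead of being entered individually.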
Conditional distributions
Continuous random variables
• Discretization
– Subdivide continuous region into a fixed set of intervals
▪ Where do you put the regions?
• Standard Probability Density Functions (PDFs)
– Gaussian, where only mean and variance need to be
specified
Conditional distributions
Mixing discrete and continuous
Continuous
Example:
• Probability I buy fruit is a function of its cost
• Its cost is a function of the harvest quality and the presence
of government subsidies
Discrete
How do we mix the items?
Hybrid Bayesians
P(Cost | Harvest, Subsidy)
Enumerate the
discrete choices
• P (Cost | Harvest, subsidy)
• P (Cost | Harvest, ~subsidy)
Hybrid Bayesians
How does Cost change as a function of Harvest?
• Linear Gaussian
– Cost has a Gaussian distribution whose mean varies
linearly with the value of the parent and whose standard
deviation is constant
Need two of these…
One for each subsidy
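A Python sketch of the linear Gaussian model with one parameter set per subsidy value. The slope, intercept, and standard deviation are hypothetical numbers chosen for illustration.

```python
import math

def linear_gaussian_pdf(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h, with mean a*h + b."""
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# One (a, b, sigma) triple per value of the discrete parent.
# All values are hypothetical.
PARAMS = {
    True: (-1.0, 8.0, 1.0),     # subsidy
    False: (-1.0, 10.0, 1.0),   # no subsidy
}

def p_cost(c, h, subsidy):
    """P(Cost = c | Harvest = h, Subsidy = subsidy) as a density."""
    a, b, sigma = PARAMS[subsidy]
    return linear_gaussian_pdf(c, h, a, b, sigma)

print(p_cost(3.0, 5.0, True), p_cost(3.0, 5.0, False))
```

The discrete parent only selects which parameter set is used; the continuous parent shifts the mean.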
Multivariate Gaussian
A network of continuous variables with linear
Gaussian distributions has a joint distribution that
is a multivariate Gaussian distribution over all the
variables
• A surface in n-dimensional space where there is a peak at
the point with coordinates constructed from each dimension’s
means
• It drops off in all directions from the mean
Conditional Gaussian
Adding discrete variables to a multivariate
Gaussian results in a conditional Gaussian
• Given any assignment to the discrete variables, the
distribution over the continuous ones is multivariate Gaussian
Discrete variables with continuous
parents
Either you buy or you don’t
• But there is a soft threshold
around your desired cost
Thresholding functions
• Probit
• Logit
(Plot parameters: mean = 0.6, std. dev. = 1.0)
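A sketch of the two soft-threshold functions in Python, using the mean and standard deviation shown on the slide. The function names and the exact parameterization (probability of buying falls as cost rises past the mean) are assumptions.

```python
import math

def probit(cost, mu, sigma):
    """P(buys | cost) as the standard normal CDF of (mu - cost) / sigma."""
    x = (mu - cost) / sigma
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def logit(cost, mu, sigma):
    """P(buys | cost) as a sigmoid: 1 / (1 + exp(-2 (mu - cost) / sigma))."""
    return 1 / (1 + math.exp(-2 * (mu - cost) / sigma))

MU, SIGMA = 0.6, 1.0
for cost in (0.0, 0.6, 1.2):
    print(cost, round(probit(cost, MU, SIGMA), 3), round(logit(cost, MU, SIGMA), 3))
```

Both curves pass through 0.5 at the threshold; they differ in how quickly the tails flatten out.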
Inference in Bayesian Networks
First we talk about computing this exactly
• Will be shown to be intractable in many cases
Later, we’ll talk about approximations
Inference by enumeration
What is the probability of a query variable X, given
observed values e for a set of evidence variables E1, …, Em?
• P (X | e)
Let Y represent the hidden variables
• P (X | e) = α P(X, e) = α SUM_y P (X, e, y), where α is a normalization constant
We had solved this by walking through the full
joint distribution.
The Bayesian Network provides another way
Inference by enumeration
Compute sums of products of conditional
probabilities from the network
• P (Burglary | JohnCalls=true, MaryCalls=True)
– Hidden variables = Earthquake and Alarm
– P (B | j, m) = α P (B, j, m) = α SUM_e SUM_a P (B, e, a, j, m)
– Add four numbers, each a product of five CPT entries
• Network with n Booleans requires O(n 2^n) computations!!!
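A Python sketch of inference by enumeration for this query. The CPT numbers are assumed (the values commonly used with this textbook example), since the transcript does not include the tables.

```python
# CPT values are assumed (commonly used textbook numbers).
P_B, P_E = 0.001, 0.002
P_A = {
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
TF = (True, False)

def pr(p_true, value):
    return p_true if value else 1 - p_true

def joint(b, e, a, j, m):
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# P(B, j, m): sum the joint over the hidden variables e and a.
unnormalized = {
    b: sum(joint(b, e, a, True, True) for e in TF for a in TF)
    for b in TF
}
alpha = 1 / sum(unnormalized.values())
posterior = {b: alpha * p for b, p in unnormalized.items()}
print(posterior)
```

With these assumed tables, the probability of a burglary given both calls comes out at roughly 0.284.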
Variable elimination algorithm
There are ways to reduce computation costs
• Move variables outside of the summation
• Use dynamic programming to store work you’ve done for
future use
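A sketch of both ideas on the burglary query, in Python. This is not the full variable elimination algorithm, only the two tricks it is built on. The CPT numbers are assumed (commonly used textbook values).

```python
# CPT values are assumed (commonly used textbook numbers).
P_B, P_E = 0.001, 0.002
P_A = {
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
TF = (True, False)

def pr(p_true, value):
    return p_true if value else 1 - p_true

def flat(b):
    """SUM_e SUM_a of the full five-factor product (j and m are true)."""
    return sum(
        pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a) * P_J[a] * P_M[a]
        for e in TF for a in TF
    )

def nested(b):
    """P(b) * SUM_e P(e) * SUM_a P(a | b, e) P(j | a) P(m | a)."""
    # P(j | a) P(m | a) does not depend on e: compute once, reuse.
    jm = {a: P_J[a] * P_M[a] for a in TF}
    return pr(P_B, b) * sum(
        pr(P_E, e) * sum(pr(P_A[(b, e)], a) * jm[a] for a in TF)
        for e in TF
    )

alpha = 1 / (nested(True) + nested(False))
print(alpha * nested(True), alpha * flat(True))
```

Both forms give the same answer; the nested form multiplies by P(b) once instead of four times and reuses the stored products.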
Will this always help?
Complexity of exact inference
Polytree (or singly connected): there is at most one
undirected path between any two nodes
• Time and space complexity are linear in the size of the network
Multiply connected
• Can have exponential costs
• In practice, people try to cluster nodes of the network to
make it a polytree
Solving an instance more quickly
Independence
• Use conditional independence of b
Solving an instance more quickly
Set up the use of independence
Apply conditional independence
Solving an instance more quickly
Move summation inwards
Use absolute independence
Solving an instance more quickly
Do some additional reorganization
Reduced to four terms to sum over
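A sketch of the reduced computation in Python. It assumes the fringe consists of (2,2) and (3,1), that a breeze in (1,2) requires a pit in (1,3) or (2,2), and that a breeze in (2,1) requires a pit in (2,2) or (3,1). Names are hypothetical.

```python
from itertools import product

PIT_PROB = 0.2

def prior(has_pit):
    return PIT_PROB if has_pit else 1 - PIT_PROB

def consistent(p13, p22, p31):
    """True when the pit layout explains both observed breezes."""
    return (p13 or p22) and (p22 or p31)

unnormalized = {}
for p13 in (True, False):
    # Four fringe assignments per query value; inconsistent ones add 0.
    fringe_sum = sum(
        prior(p22) * prior(p31)
        for p22, p31 in product((True, False), repeat=2)
        if consistent(p13, p22, p31)
    )
    unnormalized[p13] = prior(p13) * fringe_sum

alpha = 1 / sum(unnormalized.values())
posterior = {k: alpha * v for k, v in unnormalized.items()}
print(posterior)
```

Under these assumptions the probability of a pit in (1,3) is about 0.31, obtained from a handful of terms instead of 4096.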