CS 416
Artificial Intelligence
Lecture 14
Uncertainty
Chapter 13
An apology to Red Sox fans
The only team ever in baseball to take a 3-0 series
to a game seven
I was playing the
probabilities…
Shortcomings of first-order logic
Consider dental diagnosis
• ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
– Not all patients with toothaches have cavities. There are
other causes of toothaches
Shortcomings of first-order logic
What’s wrong with this?
• An unlimited number of toothache causes
Shortcomings of first-order logic
Alternatively, create a causal rule
• Again, not all cavities cause pain. The rule must be expanded with qualifications
Shortcomings of first-order logic
Both diagnostic and causal rules require
countless qualifications
• Difficult to be exhaustive
– Too much work
– We don’t know all the qualifications
– Even correctly qualified rules may not be useful if data is missing when the rules are applied in real time
Shortcomings of first-order logic
As an alternative to exhaustive logic…
Probability Theory
• Serves as a hedge against our laziness and ignorance
Degrees of belief
I believe with 50% probability that the glass is full
• Note this does not indicate the statement is half-true
– We are not talking about a glass half-full
• “The glass is full” is the only statement being considered
• My statement indicates I believe with 50% probability that the
statement is true. It makes no claims about what other beliefs I
have regarding the glass.
– Fuzzy logic handles partial-truths
Decision Theory
What is rational behavior in the context of probability?
• Pick the answer that satisfies the goals with the highest
probability of actually working?
– Sometimes more risk is acceptable
• Must have a utility function that measures the many factors
related to an agent’s happiness with an outcome
• An agent is rational if and only if it chooses the action that
yields the highest expected utility, averaged over all the
possible outcomes of the action
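The expected-utility rule above can be sketched in a few lines of Python. The action names, probabilities, and utilities are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
# Hypothetical actions: each maps to a list of (probability, utility) outcomes.
# All numbers are illustrative assumptions, not lecture data.
actions = {
    "leave_early": [(0.95, 80.0), (0.05, -100.0)],
    "leave_late": [(0.70, 100.0), (0.30, -100.0)],
}


def expected_utility(outcomes):
    """Probability-weighted average of the utilities of an action's outcomes."""
    return sum(p * u for p, u in outcomes)


def rational_choice(candidates):
    """Return the action whose expected utility is highest."""
    return max(candidates, key=lambda name: expected_utility(candidates[name]))


best = rational_choice(actions)
print(best, expected_utility(actions[best]))
```

In this sketch "leave_late" has the better best-case outcome, but its higher risk gives it the lower expected utility.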
Building probability notation
Propositions
• Like propositional logic. The things we believe
Atomic Events
• A complete specification of the state of the world
Prior Probability
• Probability something is true in absence of other data
Conditional Probability
• Probability something is true given something else is known
Propositions
Like propositional logic
• Random variables refer to parts of the world with unknown
status
• Random variables have a well-defined domain
– Boolean
– Discrete (countable)
– Continuous
Atomic events
A complete specification of the world
• All variables in the world are assigned values
• Only one atomic event can be true
• The set of all atomic events is exhaustive – at least one must
be true
• Any atomic event entails the truth or falsehood of every
proposition
Prior probability
The degree of belief in the absence of other info
• P (Weather)
– P (Weather = sunny) = 0.7
– P (Weather = rainy) = 0.2
– P (Weather = cloudy) = 0.08
– P (Weather = snowy) = 0.02
• P (Weather) = <0.7, 0.2, 0.08, 0.02>
– Probability distribution for the random variable Weather
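As a minimal sketch, the Weather distribution can be stored as a dictionary and checked against the requirement that a distribution's entries lie in [0, 1] and sum to 1. The helper name is_distribution is an assumption made for this example.

```python
import math

# Prior distribution for Weather, using the values from the slide.
weather_prior = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snowy": 0.02}


def is_distribution(dist, tol=1e-9):
    """True if every entry is in [0, 1] and the entries sum to 1 (within tol)."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    return in_range and math.isclose(sum(dist.values()), 1.0, abs_tol=tol)


print(is_distribution(weather_prior))
```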
Prior probability - Discrete
Joint probability distribution
• P (Weather, NaturalDisaster) = an n × m table of probabilities
– n = number of weather values
– m = number of natural-disaster values
Full joint probability distribution
• Probabilities for all variables are established
What about continuous variables where a table won’t suffice?
Prior probability - Continuous
Probability density functions (PDFs)
• P (X = x) = Uniform [18, 26] (x)
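– The probability density at a temperature of 20.5 degrees
Celsius tomorrow is U [18, 26] (20.5) = 0.125 per degree
– This is a density, not a probability: the probability of any
single exact temperature is 0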
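A small sketch of the uniform density on [18, 26]. The function name uniform_pdf is an assumption for this example. Multiplying the density by an interval width gives the probability of that interval.

```python
def uniform_pdf(x, low, high):
    """Density of Uniform[low, high] at x: 1 / (high - low) inside, 0 outside."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return 1.0 / (high - low) if low <= x <= high else 0.0


density = uniform_pdf(20.5, 18.0, 26.0)

# For a uniform density, P(20 <= X <= 21) = density * interval width.
prob_20_to_21 = density * (21.0 - 20.0)
print(density, prob_20_to_21)
```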
Conditional probability
The probability of a, given that all we know is b
• P (a | b)
Written in terms of unconditional probabilities
• P (a | b) = P (a ∧ b) / P (b), whenever P (b) > 0
Axioms of probability
• All probabilities are between 0 and 1
• Necessarily true propositions have probability 1
Necessarily false propositions have probability 0
• The probability of a disjunction is:
P (a ∨ b) = P (a) + P (b) − P (a ∧ b)
Using axioms of probability
The probability of a proposition is equal to the
sum of the probabilities of the atomic events in
which it holds:
• P (a) = Σ P (e_i), summed over the atomic events e_i in which a holds
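A sketch of summing over atomic events, using the dentist domain (Toothache, Catch, Cavity). The four toothache = true entries are the ones that appear on the normalization slide. The four toothache = false entries are an assumption taken from the standard textbook version of this table.

```python
# Full joint over (toothache, catch, cavity).
# toothache=True rows: from the lecture's normalization slide.
# toothache=False rows: assumed from the standard textbook table.
JOINT = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}


def prob(holds):
    """Sum the probabilities of the atomic events in which the proposition holds."""
    return sum(p for event, p in JOINT.items() if holds(*event))


p_cavity_or_toothache = prob(lambda toothache, catch, cavity: cavity or toothache)
print(round(p_cavity_or_toothache, 3))
```

The same helper reproduces the disjunction axiom: P(cavity) + P(toothache) − P(cavity ∧ toothache) gives the same value.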
An example
Marginalization:
• P (Y) = Σ_z P (Y, z)
Conditioning:
• P (Y) = Σ_z P (Y | z) P (z)
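Marginalization and conditioning should give the same answer, which the sketch below checks on a two-variable joint over (toothache, cavity). The toothache = true row matches the <0.12, 0.08> vector from the normalization slide. The toothache = false row is an assumption taken from the standard textbook table.

```python
# Joint over (toothache, cavity).
# toothache=True row: from the lecture; toothache=False row: assumed.
joint = {
    (True, True): 0.12,
    (True, False): 0.08,
    (False, True): 0.08,
    (False, False): 0.72,
}


def p_toothache(t):
    """Marginal P(Toothache = t), summing out cavity."""
    return sum(p for (tt, _), p in joint.items() if tt == t)


def marginal_cavity(cavity):
    """Marginalization: P(cavity) = sum over t of P(t, cavity)."""
    return sum(p for (_, c), p in joint.items() if c == cavity)


def conditioned_cavity(cavity):
    """Conditioning: P(cavity) = sum over t of P(cavity | t) * P(t)."""
    total = 0.0
    for t in (True, False):
        p_cavity_given_t = joint[(t, cavity)] / p_toothache(t)
        total += p_cavity_given_t * p_toothache(t)
    return total


print(round(marginal_cavity(True), 3), round(conditioned_cavity(True), 3))
```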
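Conditional probabilities
• P (cavity | toothache) = P (cavity ∧ toothache) / P (toothache)
– = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.12 / 0.2 = 0.6
Conditional probabilities
• P (¬cavity | toothache) = P (¬cavity ∧ toothache) / P (toothache)
– = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.08 / 0.2 = 0.4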
Normalization
The two previous calculations had the same denominator, P(toothache),
so 1 / P(toothache) can be treated as a normalization constant α
• P(Cavity | toothache) = α P(Cavity, toothache)
– = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
– = α [<0.108, 0.016> + <0.012, 0.064>] = α <0.12, 0.08> = <0.6, 0.4>
Generalized (X = Cavity, e = toothache, y = catch)
• P (X | e) = α P (X, e) = α Σ_y P (X, e, y)
• P (X, e, y) is a subset of the full joint distribution
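The normalization step can be sketched directly from the slide's numbers: sum out the hidden variable (catch), then rescale so the entries sum to 1. The helper name normalize is an assumption for this example.

```python
def normalize(vector):
    """Rescale non-negative entries so they sum to 1 (the role of alpha)."""
    total = sum(vector)
    if total == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / total for v in vector]


# Entries are ordered <cavity, not cavity>, all with toothache = true.
with_catch = [0.108, 0.016]      # P(Cavity, toothache, catch)
without_catch = [0.012, 0.064]   # P(Cavity, toothache, not catch)

# Sum out the hidden variable, then normalize.
unnormalized = [a + b for a, b in zip(with_catch, without_catch)]
posterior = normalize(unnormalized)
print([round(p, 3) for p in posterior])
```

P(toothache) never has to be computed explicitly: it is the sum of the unnormalized entries.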
Using the full joint distribution
It does not scale well…
• n Boolean variables
– Table size O (2^n)
– Process time O (2^n)
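A quick sketch of how fast the table grows. The function name and the chosen values of n are only illustrative.

```python
def full_joint_size(n_boolean_vars):
    """Number of cells in the full joint table over n Boolean variables."""
    return 2 ** n_boolean_vars


for n in (3, 10, 20, 30):
    print(n, full_joint_size(n))
```

Three variables need 8 cells, as in the dentist table, but 30 variables already need more than a billion.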
Independence
Independence of variables in a domain can
dramatically reduce the amount of information
necessary to specify the full joint distribution
• Adding weather (four states) to this table requires creating
four versions of it (one for each weather state): 8 × 4 = 32 cells
Independence
• P (toothache, catch, cavity, Weather=cloudy) =
P(Weather=cloudy | toothache, catch, cavity) *
P(toothache, catch, cavity)
Because weather and dentistry are independent
• P (Weather=cloudy | toothache, catch, cavity) =
P (Weather = cloudy)
• P (toothache, catch, cavity, Weather=cloudy) =
P(Weather=cloudy) * P(toothache, catch, cavity)
• P (Weather): 4-cell table
• P (toothache, catch, cavity): 8-cell table
• 4 + 8 = 12 stored numbers instead of 32
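A sketch of the saving from independence: store the two small tables and multiply entries on demand to rebuild any cell of the combined table. The Weather values come from the prior-probability slide. The toothache = false entries of the dental table are an assumption taken from the standard textbook table.

```python
import math

weather = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snowy": 0.02}

# Dental joint over (toothache, catch, cavity).
# toothache=False rows are assumed from the standard textbook table.
dental = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}

# Independence: P(dental event, weather) = P(dental event) * P(weather).
combined = {
    (event, w): p_event * p_w
    for event, p_event in dental.items()
    for w, p_w in weather.items()
}

stored_numbers = len(weather) + len(dental)
print(len(combined), stored_numbers)
print(math.isclose(sum(combined.values()), 1.0))
```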
Bayes’ Rule
• P (b | a) = P (a | b) P (b) / P (a)
Useful when you know three of these terms and need to
know the fourth
Example
Meningitis
• Doctor knows meningitis causes stiff necks 50% of the time
• Doctor knows unconditional facts
– The probability of having meningitis is 1 / 50,000
– The probability of having a stiff neck is 1 / 20
• The probability of having meningitis given a stiff neck:
– P (m | s) = P (s | m) P (m) / P (s) = (0.5 × 1 / 50,000) / (1 / 20) = 0.0002 = 1 / 5,000
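The meningitis calculation as a short sketch. The function name bayes is an assumption for this example; the three inputs are the numbers on the slide.

```python
import math


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(a | b) = P(b | a) * P(a) / P(b)."""
    if p_b <= 0:
        raise ValueError("P(b) must be positive")
    return p_b_given_a * p_a / p_b


p_s_given_m = 0.5       # meningitis causes a stiff neck 50% of the time
p_m = 1 / 50_000        # prior probability of meningitis
p_s = 1 / 20            # prior probability of a stiff neck

p_m_given_s = bayes(p_s_given_m, p_m, p_s)
print(math.isclose(p_m_given_s, 1 / 5_000))
```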
Power of Bayes’ rule
Why not collect more diagnostic evidence?
• Statistically sample to learn P (m | s) = 1 / 5,000
If P(m) changes (for example, due to an outbreak), the
Bayes computation adjusts automatically, but a sampled
P(m | s) is rigid
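A sketch of why the causal direction is more robust. It assumes that P(s | m) and P(s | ¬m) stay fixed during an outbreak, derives P(s | ¬m) from the baseline numbers, and uses a hypothetical tenfold increase in P(m). None of the outbreak numbers come from the lecture.

```python
import math

P_S_GIVEN_M = 0.5
BASE_P_M = 1 / 50_000
BASE_P_S = 1 / 20

# Law of total probability, solved for P(s | not m) at the baseline.
# Assumption: this value does not change during an outbreak.
P_S_GIVEN_NOT_M = (BASE_P_S - P_S_GIVEN_M * BASE_P_M) / (1 - BASE_P_M)


def posterior_meningitis(p_m):
    """P(m | s) recomputed from the causal model for a given prior P(m)."""
    p_s = P_S_GIVEN_M * p_m + P_S_GIVEN_NOT_M * (1 - p_m)
    return P_S_GIVEN_M * p_m / p_s


baseline = posterior_meningitis(BASE_P_M)
outbreak = posterior_meningitis(10 * BASE_P_M)  # hypothetical outbreak prior
print(baseline, outbreak)
```

A P(m | s) estimated by sampling before the outbreak would stay at the baseline value, while the recomputed posterior rises with the new prior.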
Conditional independence
Consider the infeasibility of full joint distributions
• We must know P(toothache and catch) for all Cavity values
Simplify using independence
• Toothache and catch are not independent
• Toothache and catch are independent given the presence or
absence of a cavity
Conditional independence
Toothache and catch are independent given the
presence or absence of a cavity
• If you know you have a cavity, there’s no reason to believe
the toothache and the dentist’s pick are related
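Both claims can be checked numerically on the dentist table: toothache and catch are dependent on their own, but independent once Cavity is fixed. The toothache = false entries are an assumption taken from the standard textbook table, and the helper names are made up for this sketch.

```python
# Full joint over (toothache, catch, cavity).
# toothache=False rows are assumed from the standard textbook table.
JOINT = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}


def prob(holds):
    """Sum over the atomic events in which the proposition holds."""
    return sum(p for event, p in JOINT.items() if holds(*event))


def cond(holds, given):
    """P(holds | given) = P(holds and given) / P(given)."""
    return prob(lambda *e: holds(*e) and given(*e)) / prob(given)


# Unconditionally: P(toothache, catch) versus P(toothache) * P(catch).
joint_tc = prob(lambda t, c, cav: t and c)
product_tc = prob(lambda t, c, cav: t) * prob(lambda t, c, cav: c)

# Given each Cavity value: (P(t, c | cavity), P(t | cavity) * P(c | cavity)).
results = {}
for value in (True, False):
    given = lambda t, c, cav, v=value: cav == v
    both = cond(lambda t, c, cav: t and c, given)
    separate = cond(lambda t, c, cav: t, given) * cond(lambda t, c, cav: c, given)
    results[value] = (both, separate)

print(round(joint_tc, 3), round(product_tc, 3))
print({k: (round(a, 3), round(b, 3)) for k, (a, b) in results.items()})
```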
Conditional independence
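In general, when a single cause influences
multiple effects, all of which are conditionally
independent (given the cause), the full joint factors as
• P (Cause, Effect_1, …, Effect_n) = P (Cause) × Π_i P (Effect_i | Cause)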
Naïve Bayes
Even when “effect” variables are not conditionally
independent, this model is sometimes used
• Sometimes called a Bayesian Classifier
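A minimal naive Bayes sketch for the dentist domain: multiply the prior by one likelihood per observed effect, then normalize. The function name and the probability values are assumptions; the values are derived from the standard textbook version of the dentist table rather than stated on these slides.

```python
def naive_bayes_posterior(prior, likelihoods, observed):
    """Posterior over causes given observed effects, assuming the effects
    are conditionally independent given the cause.

    prior:       {cause: P(cause)}
    likelihoods: {cause: {effect: P(effect | cause)}}
    observed:    list of effects observed to be true
    """
    scores = {}
    for cause, p_cause in prior.items():
        score = p_cause
        for effect in observed:
            score *= likelihoods[cause][effect]
        scores[cause] = score
    total = sum(scores.values())
    return {cause: score / total for cause, score in scores.items()}


# Assumed values, derived from the standard textbook dentist table.
prior = {"cavity": 0.2, "no_cavity": 0.8}
likelihoods = {
    "cavity": {"toothache": 0.6, "catch": 0.9},
    "no_cavity": {"toothache": 0.1, "catch": 0.2},
}

posterior = naive_bayes_posterior(prior, likelihoods, ["toothache", "catch"])
print({cause: round(p, 3) for cause, p in posterior.items()})
```

With n effects this model needs only a prior plus one small table per effect, rather than a full joint table that grows exponentially in n.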