CS 416
Artificial Intelligence
Lecture 13
Uncertainty
Chapter 13
Midterm
March 16th
See class web page for old tests and study guide
Shortcomings of first-order logic
Consider dental diagnosis
• ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
– Not all patients with toothaches have cavities;
there are other causes of toothaches (gum disease, an abscess, etc.)
Shortcomings of first-order logic
What’s wrong with this rule?
• An unlimited number of alternatives is needed to make the implication true:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumProblem) ∨ Disease(p, Abscess) ∨ …
Shortcomings of first-order logic
Alternatively, create a causal rule
• ∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
• Again, not all cavities cause pain, so this rule must be expanded too
Shortcomings of first-order logic
Both diagnostic and causal rules require
countless qualifications
• Difficult to be exhaustive
– Too much work
– We don’t know all the qualifications
– Even correctly qualified rules may not be useful if data is missing when the rules are applied in real time
Shortcomings of first-order logic
As an alternative to exhaustive logic…
Probability Theory
• Serves as a hedge against our laziness and ignorance
Degrees of belief
I believe, with 50% probability, that the glass is full
• Note this does not indicate the statement is half-true
– We are not talking about a glass half-full
• “The glass is full” is the only statement being considered
• My statement indicates that I believe, with probability 0.5, that the
statement is true. It makes no claims about any other beliefs I hold
regarding the glass.
– Fuzzy logic handles partial-truths
Decision Theory
What is rational behavior in the context of probability?
• Pick the answer that satisfies the goals with the highest probability of
actually working?
– Sometimes more risk is acceptable
• Must have a utility function that measures the many factors
related to an agent’s happiness with an outcome
• An agent is rational if and only if it chooses the action that
yields the highest expected utility, averaged over all the
possible outcomes of the action
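A minimal sketch of the expected-utility rule described above; the actions, outcome probabilities, and utilities below are invented purely for illustration:

# Pick the action with the highest expected utility,
# averaged over the possible outcomes of each action.
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# Hypothetical actions with invented (probability, utility) outcomes.
actions = {
    "drill": [(0.8, 10), (0.2, -50)],   # likely fixes the cavity, small risk
    "wait":  [(0.5, 0), (0.5, -20)],    # pain may worsen
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))   # drill -2.0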
Building probability notation
Propositions
• Like propositional logic. The things we believe
Atomic Events
• A complete specification of the state of the world
Prior Probability
• Probability something is true in absence of other data
Conditional Probability
• Probability something is true given something else is known
Propositions
Assertions that “such and such is true”
• Random variables refer to parts of the world with unknown
status
• Random variables have a well-defined domain
– Boolean, P(Heads)
– Discrete (countable), P (Weather = sunny)
– Continuous, P (Speed = 55.0 mph)
Atomic events
A complete specification of the world
• All variables in the world are assigned values
• Atomic events are mutually exclusive: only one can be true
• The set of all atomic events is exhaustive
• Any atomic event entails the truth or falsehood of every
proposition
Prior probability
The degree of belief in the absence of other info
• P (Weather)
– P (Weather = sunny) = 0.7
– P (Weather = rainy) = 0.2
– P (Weather = cloudy) = 0.08
– P (Weather = snowy) = 0.02
• P (Weather) = <0.7, 0.2, 0.08, 0.02>
– Probability distribution for the random variable Weather
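As a quick sketch, a discrete prior like P(Weather) can be stored as a simple lookup table; the values are the ones from the slide:

# The prior distribution P(Weather) as a lookup table.
weather_prior = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snowy": 0.02}

# A valid distribution over the whole domain must sum to 1.
assert abs(sum(weather_prior.values()) - 1.0) < 1e-9
print(weather_prior["sunny"])   # 0.7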
Prior probability - Discrete
Joint probability distribution
• P (Weather, NaturalDisaster) = an n × m table of probabilities
– n = instances of weather
– m = instances of natural disasters
Full joint probability distribution
• Probabilities for all variables are established
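A sketch of a joint distribution stored as such a table; for brevity it uses a Boolean Disaster variable, and the entries are invented (chosen so they marginalize to the P(Weather) prior above):

# Joint distribution P(Weather, Disaster) as a table keyed by value pairs.
# The probability entries are invented for illustration.
joint = {
    ("sunny", True): 0.01,  ("sunny", False): 0.69,
    ("rainy", True): 0.04,  ("rainy", False): 0.16,
    ("cloudy", True): 0.01, ("cloudy", False): 0.07,
    ("snowy", True): 0.01,  ("snowy", False): 0.01,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Summing out Disaster recovers the prior P(Weather = sunny).
p_sunny = sum(p for (w, d), p in joint.items() if w == "sunny")
print(p_sunny)   # 0.7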
What about continuous variables where a table won’t suffice?
Prior probability - Continuous
Probability density functions (PDFs)
• P (X = x) = Uniform [18, 26] (x)
– The probability density at a temperature of 20.5 degrees Celsius is
Uniform [18, 26] (20.5) = 1 / (26 − 18) = 0.125 (a density, not a probability)
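A sketch of that density function; 0.125 is the constant height of the uniform PDF over the interval, not the probability of an exact temperature:

# Uniform probability density on [18, 26] degrees Celsius.
def uniform_pdf(x, lo=18.0, hi=26.0):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

print(uniform_pdf(20.5))   # 0.125 per degree
print(uniform_pdf(30.0))   # 0.0 (outside the interval)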
Conditional probability
The probability of a given that all we know is b
• P (a | b)… P(cavity | toothache) = 0.8
Written in terms of unconditional probabilities:
• P (a | b) = P (a ∧ b) / P (b), whenever P (b) > 0
– Equivalently, the product rule: P (a ∧ b) = P (a | b) P (b)
Axioms of probability
• All probabilities are between 0 and 1: 0 ≤ P(a) ≤ 1
• Necessarily true propositions have probability 1;
necessarily false propositions have probability 0:
P(true) = 1 and P(false) = 0
• The probability of a disjunction is:
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
Using axioms of probability
The probability of a proposition is equal to the
sum of the probabilities of the atomic events in
which it holds:
P(a) = Σe ∈ e(a) P(e), where e(a) is the set of atomic events in which a holds
An example
The full joint distribution for Toothache, Catch, and Cavity:
• P(cavity, toothache, catch) = 0.108, P(cavity, toothache, ¬catch) = 0.012
• P(cavity, ¬toothache, catch) = 0.072, P(cavity, ¬toothache, ¬catch) = 0.008
• P(¬cavity, toothache, catch) = 0.016, P(¬cavity, toothache, ¬catch) = 0.064
• P(¬cavity, ¬toothache, catch) = 0.144, P(¬cavity, ¬toothache, ¬catch) = 0.576
Marginalization: sum out the other variables
• P(Y) = Σz P(Y, z)
– P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
Conditioning: a variant of marginalization using the product rule
• P(Y) = Σz P(Y | z) P(z)
Conditional probabilities
• P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache) = 0.12 / 0.2 = 0.6
• P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache) = 0.08 / 0.2 = 0.4
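These calculations can be reproduced directly from the table; a sketch using the slide's numbers:

# Full joint distribution keyed by (toothache, catch, cavity).
joint = {
    (True, True, True): 0.108,   (True, True, False): 0.016,
    (True, False, True): 0.012,  (True, False, False): 0.064,
    (False, True, True): 0.072,  (False, True, False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

def prob(holds):
    """Sum the probabilities of the atomic events in which a proposition holds."""
    return sum(p for event, p in joint.items() if holds(*event))

p_cavity = prob(lambda t, c, cav: cav)        # marginalization: 0.2
p_toothache = prob(lambda t, c, cav: t)       # 0.2
p_either = prob(lambda t, c, cav: cav or t)   # P(cavity or toothache) = 0.28

# P(cavity | toothache) = P(cavity and toothache) / P(toothache) = 0.6
print(prob(lambda t, c, cav: cav and t) / p_toothache)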
Normalization
The two previous calculations had the same denominator, P(toothache) = 0.2
• P(Cavity | toothache) = α P(Cavity, toothache)
– = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
– = α [<0.108, 0.016> + <0.012, 0.064>] = α <0.12, 0.08> = <0.6, 0.4>
Generalized (X = Cavity, e = toothache, Y = Catch):
• P(X | e) = α P(X, e) = α Σy P(X, e, y)
– The terms P(X, e, y) form a subset of the full joint distribution
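A sketch of the normalization step: compute the unnormalized values, then rescale so they sum to 1, never computing P(toothache) explicitly:

# Unnormalized values of P(Cavity | toothache), summing out the hidden variable Catch.
unnormalized = [0.108 + 0.012,   # cavity
                0.016 + 0.064]   # not cavity
alpha = 1.0 / sum(unnormalized)           # 1 / 0.2 = 5.0
print([alpha * v for v in unnormalized])  # [0.6, 0.4]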
Using the full joint distribution
It does not scale well…
• n Boolean variables
– Table size O(2^n)
– Processing time O(2^n)
Independence
Independence of variables in a domain can
dramatically reduce the amount of information
necessary to specify the full joint distribution
• Adding Weather (four states) to the eight-cell dental table requires
creating four versions of it (one for each weather state): 8 × 4 = 32 cells
Independence
• P (toothache, catch, cavity, Weather=cloudy) =
P(Weather=cloudy | toothache, catch, cavity) *
P(toothache, catch, cavity)    (by the product rule)
Because the weather and dentistry are independent
• P (Weather=cloudy | toothache, catch, cavity) =
P (Weather = cloudy)
• P (toothache, catch, cavity, Weather=cloudy) =
P(Weather=cloudy) * P(toothache, catch, cavity)
• The 32-cell table decomposes into a 4-cell table for Weather and an
8-cell table for the dental variables
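A sketch of that factorization: the 32-entry joint over four variables is recovered on demand from the 4-entry and 8-entry tables (values taken from the earlier example):

# P(Weather) and P(toothache, catch, cavity), stored separately.
weather_prior = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snowy": 0.02}
dental_joint = {
    (True, True, True): 0.108,   (True, True, False): 0.016,
    (True, False, True): 0.012,  (True, False, False): 0.064,
    (False, True, True): 0.072,  (False, True, False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

def joint(w, t, c, cav):
    # Independence: P(toothache, catch, cavity, Weather=w)
    #             = P(Weather=w) * P(toothache, catch, cavity)
    return weather_prior[w] * dental_joint[(t, c, cav)]

print(joint("cloudy", True, True, True))   # 0.08 * 0.108 = 0.00864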
Bayes’ Rule
P(b | a) = P(a | b) P(b) / P(a)
• Useful when you know three of the four quantities and need the fourth
Example
Meningitis
• Doctor knows meningitis causes stiff necks 50% of the time: P(s | m) = 0.5
• Doctor knows unconditional facts
– The probability of having meningitis is P(m) = 1 / 50,000
– The probability of having a stiff neck is P(s) = 1 / 20
• The probability of having meningitis given a stiff neck:
P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50,000) / (1/20) = 0.0002 = 1 / 5,000
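The arithmetic as a sketch:

# Bayes' rule applied to the meningitis example.
p_s_given_m = 0.5    # P(stiff neck | meningitis)
p_m = 1 / 50_000     # P(meningitis)
p_s = 1 / 20         # P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)   # 0.0002, i.e. 1 in 5,000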
Power of Bayes’ rule
Why not collect more diagnostic evidence?
• Statistically sample to learn P (m | s) = 1 / 5,000
If P(m) changes (due to an outbreak, say), the Bayes
computation adjusts automatically, but the sampled
P(m | s) is rigid
Conditional independence
Consider the infeasibility of full joint distributions
• We must know P(toothache ∧ catch | Cavity) for each value of Cavity
Simplify using independence
• Toothache and catch are not independent
• Toothache and catch are independent given the presence or
absence of a cavity
Conditional independence
Toothache and catch are independent given the
presence or absence of a cavity
• If you know you have a cavity, there’s no reason to believe
the toothache and the dentist’s pick are related
Conditional independence
In general, when a single cause influences multiple effects, all of which
are conditionally independent given the cause:
• P(Cause, Effect1, …, Effectn) = P(Cause) Πi P(Effecti | Cause)
Naïve Bayes
Even when “effect” variables are not conditionally
independent, this model is sometimes used
• Sometimes called a Bayesian Classifier
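A minimal naïve Bayes sketch for the dental domain. The conditional probabilities are derived from the full joint table in the earlier example (e.g., P(toothache | cavity) = 0.12 / 0.2 = 0.6); the posterior is P(Cause) times the product of P(effect | Cause) for each observed effect, renormalized:

# Naive Bayes: assume the effects are conditionally independent given the cause.
p_cavity = 0.2
p_effect_given_cause = {
    ("toothache", True): 0.6, ("toothache", False): 0.1,
    ("catch", True): 0.9,     ("catch", False): 0.2,
}

def posterior(observed_effects):
    """P(Cavity | effects) = alpha * P(Cavity) * prod_i P(effect_i | Cavity)."""
    scores = {}
    for cav, prior in ((True, p_cavity), (False, 1 - p_cavity)):
        s = prior
        for e in observed_effects:
            s *= p_effect_given_cause[(e, cav)]
        scores[cav] = s
    alpha = 1.0 / sum(scores.values())
    return {cav: alpha * s for cav, s in scores.items()}

print(posterior(["toothache", "catch"]))   # cavity with probability ~0.871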