chap13 - Computer Science
Uncertainty
Logical approach problem: we do not
always know complete truth about the
environment
Example:
Leave(t) = leave for airport t minutes
before flight
Query: ∃ t  Leave(t) ⇒ ArriveOnTime
Problems
Why can’t we determine t exactly?
Partial observability
Uncertainty in action outcomes
road state, other drivers’ plans
flat tire
Immense complexity of modeling and
predicting traffic
Problems
Three specific issues:
Laziness
Too much work to list all antecedents or
consequents
Theoretical ignorance
Not enough information on how the world
works
Practical ignorance
Even if we know all the “physics”, we may not
have all the facts
What happens with a purely
logical approach?
Either risks falsehood…
… or leads to conclusions too weak to do
anything with:
“Leave(45) will get me there on time”
“Leave(45) will get me there on time if there’s no
snow and there’s no train crossing Route 19 and
my tires remain intact and...”
Leave(1440) might work fine, but then I’d
have to spend the night in the airport
Solution: Probability
Given the available evidence, Leave(35) will
get me there on time with probability 0.04
Probability addresses uncertainty, not degree of
truth
Degree of truth handled by fuzzy logic
IsSnowing is true to degree 0.2
Probabilities summarize effects of laziness
and ignorance
We will use combination of probabilities and
utilities to make decisions
Subjective or Bayesian
probability
We will make probability estimates
based on knowledge about the world
P(Leave(45) | No Snow) = 0.55
Probability assessment if the world were a
certain way
Probabilities change with new
information
P(Leave(45) | No Snow, 5 AM) = 0.75
Making decisions under uncertainty
Suppose I believe the following:
P(Leave(35) gets me there on time | ...) = 0.04
P(Leave(45) gets me there on time | ...) = 0.55
P(Leave(60) gets me there on time | ...) = 0.95
P(Leave(1440) gets me there on time | ...) = 0.9999
Which action do I choose?
Depends on my preferences for missing flight vs.
eating in airport, etc.
Decision theory takes into account utility and
probabilities
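The choice among these actions can be made mechanical. Below is a minimal sketch of an expected-utility decision: the probabilities are the ones listed above, while the utility numbers (value of catching the flight, cost of missing it, cost per minute spent waiting) are assumed purely for illustration.

```python
# Sketch of decision theory: pick the action with the highest expected utility.
# Probabilities are from the slide; the utility values are assumed for illustration.
p_on_time = {35: 0.04, 45: 0.55, 60: 0.95, 1440: 0.9999}

U_CATCH = 100            # assumed utility of catching the flight
U_MISS = -1000           # assumed utility of missing the flight
WAIT_COST_PER_MIN = 0.5  # assumed cost per minute spent waiting at the airport

def expected_utility(t):
    p = p_on_time[t]
    return p * U_CATCH + (1 - p) * U_MISS - WAIT_COST_PER_MIN * t

for t in sorted(p_on_time):
    print(f"Leave({t}): EU = {expected_utility(t):.1f}")

best = max(p_on_time, key=expected_utility)
print(f"Best action: Leave({best})")   # Leave(60) under these assumed utilities
```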
Axioms of Probability
For any propositions A and B:
0 ≤ P(A) ≤ 1
P(True) = 1, P(False) = 0
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Example:
A = computer science major
B = born in Minnesota
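As a worked instance of the third axiom, the numbers below are assumed (the slides do not give proportions for CS majors or Minnesota-born students); the point is only that P(A ∧ B) must be subtracted so the overlap is not counted twice.

```python
# Inclusion-exclusion check with assumed illustrative numbers.
p_a = 0.20        # assumed P(A): computer science major
p_b = 0.10        # assumed P(B): born in Minnesota
p_a_and_b = 0.03  # assumed P(A and B); cannot exceed min(p_a, p_b)

p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)   # 0.27: adding p_a and p_b alone would double-count the overlap
```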
Notation and Concepts
Unconditional probability or prior
probability:
P(Cavity) = 0.1
P(Weather = Sunny) = 0.55
corresponds to belief prior to arrival of any
new evidence
Weather is a multivalued random variable
Could be one of <Sunny, Rain, Cloudy, Snow>
P(Cavity) shorthand for P(Cavity=true)
Probability Distributions
Probability Distribution gives probability
values for all possible values of a random variable
P(Weather) = <0.55, 0.05, 0.2, 0.2>
must be normalized: sum to 1
Joint Probability Distribution gives probability
values for combinations of random variables
P(Weather, Cavity) = 4 x 2 matrix
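One way these objects might be represented in code is sketched below; the Weather distribution is the one from the slide, while the 4 x 2 joint table uses assumed values (chosen so the marginals match P(Weather) and P(Cavity) = 0.1).

```python
# Prior distribution over the multivalued variable Weather (values from the slide).
p_weather = {"Sunny": 0.55, "Rain": 0.05, "Cloudy": 0.2, "Snow": 0.2}
assert abs(sum(p_weather.values()) - 1.0) < 1e-9   # must be normalized

# Joint distribution P(Weather, Cavity): a 4 x 2 table.
# The individual joint values are assumed for illustration only.
p_joint = {
    ("Sunny",  True): 0.055, ("Sunny",  False): 0.495,
    ("Rain",   True): 0.005, ("Rain",   False): 0.045,
    ("Cloudy", True): 0.020, ("Cloudy", False): 0.180,
    ("Snow",   True): 0.020, ("Snow",   False): 0.180,
}
assert abs(sum(p_joint.values()) - 1.0) < 1e-9      # joint must also sum to 1
```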
Posterior Probabilities
Conditional or Posterior probability:
P(Cavity | Toothache) = 0.8
For conditional distributions:
P(Weather | Earthquake) is a 4 x 2 table: one distribution over Weather for each value of Earthquake (false, true)
Posterior Probabilities
New evidence does not invalidate what we
already knew, but may render old
evidence unnecessary
P(Cavity | Toothache, Cavity) = 1
New evidence may be irrelevant
P(Cavity | Toothache, Schiller in Mexico) =
0.8
Definition of Conditional
Probability
Two ways to think about it
P(A | B) = P(A ∧ B) / P(B),  if P(B) ≠ 0
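A quick numeric check of this definition, using assumed joint values for two Boolean propositions:

```python
# Ratio definition of conditional probability, with assumed joint values.
p = {  # joint distribution over (A, B); values assumed for illustration
    (True, True): 0.08, (True, False): 0.12,
    (False, True): 0.32, (False, False): 0.48,
}
p_b = p[(True, True)] + p[(False, True)]   # P(B) by marginalizing out A
p_a_given_b = p[(True, True)] / p_b        # P(A | B) = P(A and B) / P(B)
print(p_a_given_b)                         # 0.2
```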
Definition of Conditional
Probability
Another way to think about it
P(A ∧ B) = P(A) P(B | A) = P(B) P(A | B)
Sanity check: Why isn’t it just
P(A ∧ B) = P(A) P(B)?
(That holds only when A and B are independent.)
General version holds for probability
distributions:
P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
This is a 4 x 2 set of equations
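A small sketch of that set of equations, with an assumed conditional distribution P(Weather | Cavity) and the prior P(Cavity) = 0.1 from the slides:

```python
# Distribution form of the product rule:
# P(Weather, Cavity) = P(Weather | Cavity) P(Cavity), one equation per cell.
p_cavity = {True: 0.1, False: 0.9}   # P(Cavity) from the slides
p_weather_given_cavity = {           # assumed conditional values for illustration
    True:  {"Sunny": 0.55, "Rain": 0.05, "Cloudy": 0.2, "Snow": 0.2},
    False: {"Sunny": 0.55, "Rain": 0.05, "Cloudy": 0.2, "Snow": 0.2},
}

# Build the 4 x 2 joint table.
p_joint = {
    (w, c): p_weather_given_cavity[c][w] * p_cavity[c]
    for c in p_cavity
    for w in p_weather_given_cavity[c]
}
assert abs(sum(p_joint.values()) - 1.0) < 1e-9
print(p_joint[("Sunny", True)])  # 0.055
```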
Bayes’ Rule
Product rule given by
P(A ∧ B) = P(A) P(B | A) = P(B) P(A | B)
Bayes’ Rule:
P(A | B) = P(B | A) P(A) / P(B)
Bayes’ rule is extremely useful for inferring
the probability of a diagnosis (the cause) when
the probability of the evidence given the cause is known.
Bayes’ Rule example
Does my car need a new drive axle?
If a car needs a new drive axle, with 30% probability
this car jerks around
Unconditional probabilities:
P(jerks) = 1/1000
P(needs axle) = 1/10,000
Then:
P(jerks | needs axle) = 0.3
P(needs axle | jerks) = P(jerks | needs axle) P(needs axle) / P(jerks)
= (0.3 x 1/10,000) / (1/1000) = 0.03
Conclusion: 3 of every 100 cars that jerk need an axle
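The same calculation as a short script, using only the numbers given on the slide:

```python
# Bayes' rule applied to the drive-axle example.
p_jerks_given_axle = 0.3   # P(jerks | needs axle): causal knowledge
p_axle = 1 / 10_000        # P(needs axle): prior
p_jerks = 1 / 1_000        # P(jerks)

p_axle_given_jerks = p_jerks_given_axle * p_axle / p_jerks
print(p_axle_given_jerks)  # 0.03: 3 of every 100 cars that jerk need an axle
```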
Not a dumb question
P(A | B) = P(B | A) P(A) / P(B)
Question:
Why should I be able to provide an
estimate of P(B|A) to get P(A|B)?
Why not just estimate P(A|B) and be done
with the whole thing?
Not a dumb question
Answer:
Diagnostic knowledge is often more tenuous
than causal knowledge
Suppose drive axles start to go bad in an
“epidemic”
e.g. poor construction in a major drive axle brand
two years ago is now haunting us
P(needs axle) goes way up, easy to measure
P(needs axle | jerks) should (and does) go up
accordingly – but how to estimate?
P(jerks | needs axle) is based on causal
information, doesn’t change
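A small sketch of this last point, assuming one number not on the slides: the base rate of jerking for cars that do not need an axle, chosen here so the normal case reproduces P(jerks) ≈ 1/1000. The causal probability P(jerks | needs axle) stays fixed; only the prior P(needs axle) changes, and the diagnostic probability follows from Bayes’ rule.

```python
# Causal knowledge is fixed; only the prior changes during the "epidemic".
p_jerks_given_axle = 0.3          # P(jerks | needs axle): does not change
p_jerks_given_no_axle = 0.00097   # assumed base rate of jerking without a bad axle

def p_axle_given_jerks(p_axle):
    # P(jerks) by total probability, then Bayes' rule for the diagnosis.
    p_jerks = (p_jerks_given_axle * p_axle
               + p_jerks_given_no_axle * (1 - p_axle))
    return p_jerks_given_axle * p_axle / p_jerks

print(p_axle_given_jerks(1 / 10_000))  # normal prior:   ~0.03
print(p_axle_given_jerks(1 / 1_000))   # epidemic prior: ~0.24
```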