Transcript notes

Uncertainty in AI
• Birds can fly, right?
• Seems like common sense knowledge.
• The problem with First-Order Logic is that universal
rules are too strict
• ∀x bird(x) → flies(x) means all birds must fly
• but there are many exceptions...
• If we asserted:
• penguin(opus)
• ∀x penguin(x) → ¬flies(x)
• ∀x penguin(x) → bird(x)
• this would make the KB inconsistent
[slide illustration: example birds Tweety, Opus, Big Bird, Woodstock]
• we can't enumerate all possible exceptions
• what about a robin with a broken wing?
• what about birds that are made out of plastic?
• what about Big Bird?
• this is known as the "qualification problem"
• Uncertainty in actions:
• If a gun is loaded and you pull the trigger, the gun will
fire, right?
• ...unless it is a toy gun
• ...unless it is underwater
• ...unless the barrel is filled with concrete
• ...unless it is defective
• what we might want to say is...
• "most birds fly" or
• "some birds do not fly" or
• "tweety possibly flies"
• We cannot express these in FOL
• there are no degrees of truth
• "some birds do not fly" becomes x bird(x)flies(x)
• solutions:
• Default (non-monotonic) Logics
• Fuzzy Logics
• procedural approaches
• Probability
Default Logic
• extensions of FOL that allow for exceptions
• FOL is monotonic:
• anything that is entailed by a set of sentences will still be
entailed by a superset of sentences
• α ⊨ γ implies α ∧ β ⊨ γ
• bird(opus) ⊨ flies(opus)
• bird(opus) ∧ penguin(opus) ⊨ flies(opus) [necessarily]
• Default Logics are non-monotonic
• old conclusions can get retracted due to new
information
• bird(opus) ∧ penguin(opus) ⊭ flies(opus)
• Default Logics require an alternative model theory,
such as possible worlds (or Kripke) semantics
• without going into the complexities, most default
logics are based on drawing conclusions from
"minimal models" that make the fewest assumptions
necessary
• Default Logic system I
• new type of "soft" implication
• bird(x) ⇝ flies(x)
• "⇝" means "...and if it is not inconsistent to believe so, then..."
• example:
• penguin(opus)
• bird(tweety)
• ∀x penguin(x) → ¬flies(x)
• ∀x penguin(x) → bird(x)
• ∀x bird(x) ⇝ flies(x)
• model: m0=<U,D,R>
• U = {a, b}
• D = {opus→a, tweety→b}
• R: bird = {a,b}, penguin = {a}, flies = {b}
• because opus is a penguin, 'a' cannot be in the set of things that fly
• this makes it inconsistent to apply the "birds fly" rule for opus
• this model is the "minimal" consistent model of the sentences
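• for comparison with formal Default Logic (Reiter), this soft rule corresponds to the default
bird(x) : flies(x) / flies(x)
which reads "if bird(x) holds, and flies(x) is consistent with everything else believed, then conclude flies(x)"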
• Circumscription - another approach to Defeasible
Reasoning
• add 'abnormal' predicates to default sentences
• penguin(opus)
• bird(tweety)
• ∀x penguin(x) → ¬flies(x)
• ∀x penguin(x) → bird(x)
• ∀x bird(x) ∧ ¬abnormal1(x) → flies(x)
• find models that make the fewest abnormal
assumptions to be consistent
• U = {a, b}
• D = {opus→a, tweety→b}
• R: bird = {a,b}, penguin = {a}, flies = {b}, abnormal1 = {a}
• there is no reason to assume tweety is abnormal
• but opus needs to be abnormal to explain not flying
Procedural Solutions
• Semantic networks
• [slide diagram: a semantic network with nodes opus, tweety, penguin, and bird connected by 'isa' links; the canfly property is F at penguin and T at bird]
• 'isa' links encode
inheritance relationships
• to answer queries, follow pointers until the property is
defined (see the Prolog sketch below)
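• a minimal Prolog sketch of this pointer-following lookup (the predicates isa/2, prop/3, and value/3 are illustrative names, not from the slides):
% 'isa' (inheritance) links of the network
isa(opus, penguin).
isa(tweety, bird).
isa(penguin, bird).
% properties attached to nodes
prop(penguin, canfly, false).
prop(bird, canfly, true).
% property lookup: use a value defined at the node itself,
% otherwise follow an isa link upward and try again
value(X, Prop, V) :- prop(X, Prop, V), !.
value(X, Prop, V) :- isa(X, Y), value(Y, Prop, V).
% ?- value(opus, canfly, V).    V = false  (inherited from penguin)
% ?- value(tweety, canfly, V).  V = true   (inherited from bird)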
• Closed World Assumption in Prolog
• anything you cannot prove is assumed to be false
• this makes it more concise to write KBs by asserting only true things
flies(X) :- bird(X), \+ penguin(X), \+ too_big_to_fly(X).
bird(tweety).
bird(opus).
bird(big_bird).
penguin(opus).
too_big_to_fly(big_bird).
• note that in FOL, we could not conclude flies(tweety) because
¬penguin(tweety) would have to be explicitly asserted as a fact
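• for example, loading this KB and querying it (a hypothetical interactive session):
?- flies(tweety).
true.
?- flies(opus).
false.
?- flies(big_bird).
false.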
• Rule strengths or priorities in Expert Systems
• ...an old ad-hoc approach (with unclear semantics)
• penguin(x) →0.9 ¬flies(x)
• bird(x) →0.5 flies(x)
• example:
• can assign 'salience' (integers) to defrules in Jess
• this will determine the order in which rules fire during
forward chaining
Probability
• encode knowledge in the form of prior probabilities
and conditional probabilities
• P(x speaks Portuguese | x is from Brazil) = 0.9
• P(x speaks Portuguese) = 0.012
• P(x is from Brazil) = 0.007
• P(x flies | x is a bird) = 0.8 (?)
• inference is done by calculating posterior
probabilities given evidence
• compute P(cavity | toothache, flossing, dental history,
recent consumption of candy...)
• compute P(fed will raise interest rate | employment=5%,
inflation=0.5%, GDP=2%, recent geopolitical events...)
• Many modern knowledge-based systems are based
on probabilistic inference
• including Bayesian networks, Hidden Markov Models,
Markov Decision Processes
• example: Bayesian networks are used for inferring user
goals or help needs from actions like mouse clicks in an
automated software help system (think 'Clippy')
• Decision Theory combines utilities with probabilities of
outcomes to decide which actions to take
• the challenge is capturing all the numbers needed
for the prior and conditional probabilities
• objectivists (frequentists) - probabilities represent
outcomes of trials/experiments
• subjectivists - probabilities are degrees of belief
• probability and statistics are at the core of many
Machine Learning algorithms
• causal vs. diagnostic knowledge
• causal: P(x has a toothache|x has a cavity)=0.9
• diagnostic: P(x has a cavity|x has a toothache)=0.5
• typically it is easier to articulate knowledge in the causal
direction, but we often want to use it in a diagnostic way
to make inferences from observations
• product rule: the joint probability P(A,B) = P(A|B) P(B)
• Bayes' Rule: convert between causal and diagnostic
P(H|E) = P(E|H) P(H) / P(E)
H = hypothesis (cause, disease)
E = evidence (effect, symptoms)
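• for example, plugging in the Brazil/Portuguese numbers from above (this arithmetic is added for illustration):
P(x is from Brazil | x speaks Portuguese)
= P(x speaks Portuguese | x is from Brazil) P(x is from Brazil) / P(x speaks Portuguese)
= 0.9 × 0.007 / 0.012
= 0.525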
• Joint probability table (JPT)
• you can calculate the answer to any question from the JPT
• the problem is that there are an exponential number of entries
(2^N, where N is the number of binary random variables)
P(cavity | toothache)
= P(cavity  toothache) / P(toothache)
=
0.016+0.064 /
(0.108 + 0.012 + 0.016 + 0.064)
= 0.4
Solution to reduce complexity: Employ the Independence Assumption
Most variables are not strictly independent, in that there is a statistical association (but
which is cause? which is effect?). However, many variables are conditionally independent.
P(A,B|C) = P(A|C) P(B|C)
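• to see the savings (an illustrative count, not from the original notes): with one cause C and n binary effects E1,...,En that are conditionally independent given C, the joint factors as
P(C, E1, ..., En) = P(C) P(E1|C) ... P(En|C)
so only about 2n+1 numbers must be specified instead of the 2^(n+1) entries of the full JPT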
Bayesian networks
note: absence of link between Burglary and JohnCalls means JohnCalls is conditionally
independent of Burglary given Alarm
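• assuming the usual five-variable version of this example (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls; only three are named in the note above), the network encodes the factored joint
P(B, E, A, J, M) = P(B) P(E) P(A | B,E) P(J | A) P(M | A)
which requires far fewer numbers than the full joint probability table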