Transcript chapter9

CS 2710, ISSP 2160
Chapter 9
Inference in First-Order Logic
1
Pages to skim
• Storage and Retrieval (p. starts bottom 328)
• Efficient forward chaining (starts p. 333) through
Irrelevant facts (ends top 337)
• Efficient implementation of logic programs (starts p. 340)
through Constraint logic programming (ends p. 345)
• Completeness of resolution (starts p. 350) (though see notes
in slides)
2
Inference with Quantifiers
• Universal Instantiation:
– Given X (person(X)  likes(X, sun))
– Infer person(john)  likes(john,sun)
• Existential Instantiation:
– Given x likes(x, chocolate)
– Infer: likes(S1, chocolate)
– S1 is a “Skolem Constant” that is not found anywhere else
in the KB and refers to (one of) the individuals that likes
sun.
3
Reduction to Propositional Inference
• Simple form (pp. 324-325) not efficient. Useful
conceptually.
• Replace each universally quantified sentence by all possible
instantiations
– All X (man(X)  mortal(X)) replaced by
– man(tom)  mortal(tom)
– man(chocolate)  mortal(chocolate)
– …
• Now, we have propositional logic.
• Use propositional reasoning algorithms from Ch 7
4
Reduction to Propositional Inference
• Problem: when the KB includes a function symbol, the set of
term substitutions is infinite. father(father(father(tom)))
…
• Herbrand 1930: if a sentence is entailed by the original FO
KB, then there is a proof using a finite subset of the
propositionalized KB
• Since any subset has a maximum depth of nesting in terms,
we can find the subset by generating all instantiations with
constant symbols, then all with depth 1, and so on
5
First-Order Inference
• We have an approach to FO inference via
propositionalization that is complete: any entailed sentence
can be proved
• Entailment for FOPC is semi-decidable: algorithms exist that
say yes to every entailed sentence, but no algorithm exists
that also says no to every nonentailed sentence.
• Our proof procedure could go on and on, generating more and
more deeply nested terms, but we will not know whether it is
stuck in a loop, or whether the proof is just about to pop out
6
Generalized Modus Ponens
• This is a general inference rule for FOPC that does not
require universal instantiation first
• Given:
– p1’, p2’ … pn’, (p1  … pn)  q
– Subst(theta, pi’) = subst(theta, pi) for all i
• Conclude:
– Subst(theta, q)
7
GMP is a lifted version of MP
• GMP “lifts” MP from propositional to first-order logic
• Key advantage of lifted inference rules over
propositionalization is that they make only substitutions
which are required to allow particular inferences to proceed
8
GMP Example
• x,y,z ((parent(x,y)  parent(y,z))  grandparent(x,z))
• parent(james, john), parent(james, richard), parent(harry,
james)
• We can derive:
– Grandparent(harry, john), bindings:
{x/harry,y/james,z/john}
– Grandparent(harry, richard), bindings:
{x/harry,y/james,z/richard}
9
Unification
• Process of finding all legal substitutions
• Key component of all FO inference algorithms
• Unify(p,q) = theta, where Subst(theta,p) == Subst(theta,q)
Assuming all variables universally quantified
10
Standardizing apart
• All X knows(john,X).
• All X knows(X,elizabeth).
• These ought to unify, since john knows everyone, and
everyone knows elizabeth.
• Rename variables to avoid such name clashes
Note:
all X p(X) == all Y p(Y)
All X (p(X) ^ q(X)) == All X p(X) ^ All Y q(Y)
11
def Unify (p, q, bdgs):
d = disagreement(p, q)
# If there is no disagreement, then success.
if not d: return bdgs
elif not isVar(d[0]) and not isVar(d[1]): return 'fail'
else:
if isVar(d[0]): var = d[0] ; other = d[1]
else: var = d[1] ; other = d[0]
if occursp (var,other): return ‘fail’
# Make appropriate substitutions and recurse on the result.
else:
pp = replaceAll(var,other,p)
qq = replaceAll(var,other,q)
return Unify (pp,qq, bdgs + [[var,other]])
For code, see “resources” on the webpage
12
================================
unify:
['loves', ['dog', 'var_x'], ['dog', 'fred']]
['loves', 'var_z', 'var_z']
subs: [['var_z', ['dog', 'var_x']], ['var_x', 'fred']]
result: ['loves', ['dog', 'fred'], ['dog', 'fred']]
================================
unify:
['loves', ['dog', 'fred'], 'fred']
['loves', 'var_x', 'var_y']
subs: [['var_x', ['dog', 'fred']], ['var_y', 'fred']]
result: ['loves', ['dog', 'fred'], 'fred']
================================
unify:
['loves', ['dog', 'fred'], 'mary']
['loves', ['dog', 'var_x'], 'var_y']
subs: [['var_x', 'fred'], ['var_y', 'mary']]
result: ['loves', ['dog', 'fred'], 'mary']
================================
13
unify:
['loves', ['dog', 'fred'], 'mary']
['loves', ['dog', 'var_x'], 'var_y']
subs: [['var_x', 'fred'], ['var_y', 'mary']]
result: ['loves', ['dog', 'fred'], 'mary']
================================
unify:
['loves', ['dog', 'fred'], 'fred']
['loves', 'var_x', 'var_x']
failure
================================
unify:
['loves', ['dog', 'fred'], 'mary']
['loves', ['dog', 'var_x'], 'var_x']
failure
================================
unify:
['loves', 'var_x', 'fred']
['loves', ['dog', 'var_x'], 'fred']
var_x occurs in ['dog', 'var_x']
failure
14
unify:
['loves', 'var_x', ['dog', 'var_x']]
['loves', 'var_y', 'var_y']
var_y occurs in ['dog', 'var_y']
failure
================================
unify:
['loves', 'var_y', 'var_y']
['loves', 'var_x', ['dog', 'var_x']]
var_x occurs in ['dog', 'var_x']
failure
================================
unify: (fails because vars not standardized apart)
['hates', 'agatha', 'var_x']
['hates', 'var_x', ['f1', 'var_x']]
failure
================================
unify:
['hates', 'agatha', 'var_x']
['hates', 'var_y', ['f1', 'var_y']]
subs: [['var_y', 'agatha'], ['var_x', ['f1', 'agatha']]]
result: ['hates', 'agatha', ['f1', 'agatha']]
15
Most General Unifier
• The Unify algorithm returns a MGU
L1 = p(X,f(Y),b)
L2 = p(X,f(b),b)
Subst1 = {X\a, Y\b}
Result1 = p(a,f(b),b)
Subst2 = {Y\b}
Result2 = p(X,f(b),b)
Subst1 is more restrictive than Subst2. In fact,
Subst2 is a MGU of L1 and L2.
16
Storage and retrieval
• Hash statements by predicate for quick retrieval (predicate
indexing), e.g., of all sentences that unify with tall(X)
• Why attempt to unify
– tall(X) and silly(dog(Y))
• Instead
– Predicates[tall] = {all tall facts}
– Unify(tall(X),s) for s in Predicates[tall]
• Subsumption lattice for efficiency (see p. 329 for your
interest)
17
Inference Methods
• Unification (prerequisite)
• Forward Chaining
– Production Systems
– RETE Method (OPS)
• Backward Chaining
– Logic Programming (Prolog)
• Resolution
– Transform to CNF
– Generalization of Prop. Logic resolution
18
Resolution Theorem Proving (FOL)
• Convert everything to CNF
• Resolve, with unification
– Save bindings as you go!
• If resolution is successful, proof succeeds
• If there was a variable in the item to prove, return
variable’s value from unification bindings
19
Converting to CNF
20
Converting sentences to CNF
1. Eliminate all ↔ connectives
(P ↔ Q)  ((P  Q) ^ (Q  P))
2. Eliminate all  connectives
(P  Q)  (P  Q)
3. Reduce the scope of each negation symbol to a single predicate
P  P
(P  Q)  P  Q
(P  Q)  P  Q
(x)P  (x)P
(x)P  (x)P
4. Standardize variables: rename all variables so that each quantifier
has its own unique variable name
21
Converting sentences to clausal form: Skolem
constants and functions
5. Eliminate existential quantification by introducing Skolem
constants/functions
(x)P(x)  P(c)
c is a Skolem constant (a brand-new constant symbol that is not used
in any other sentence)
(x)(y)P(x,y) becomes (x)P(x, F(x))
since  is within the scope of a universally quantified variable, use a
Skolem function F to construct a new value that depends on the
universally quantified variable
f must be a brand-new function name not occurring in any other sentence
in the KB.
E.g., (x)(y)loves(x,y) becomes (x)loves(x,F(x))
In this case, F(x) specifies the person that x loves
E.g., x1 x2 x3 y P(… y …) becomes
x1 x2 x3 P(… FF(x1,x2,x3) …) (FF is a new name)
22
Converting sentences to clausal form
6. Remove universal quantifiers by (1) moving them all to the
left end; (2) making the scope of each the entire sentence;
and (3) dropping the “prefix” part
Ex: (x)P(x)  P(x)
7. Put into conjunctive normal form (conjunction of
disjunctions) using distributive and associative laws
(P  Q)  R  (P  R)  (Q  R)
(P  Q)  R  (P  Q  R)
8. Split conjuncts into separate clauses
9. Standardize variables so each clause contains only variable
names that do not occur in any other clause
23
An example
(x)(P(x)  ((y)(P(y)  P(F(x,y)))  (y)(Q(x,y)  P(y))))
2. Eliminate 
(x)(P(x)  ((y)(P(y)  P(F(x,y)))  (y)(Q(x,y)  P(y))))
3. Reduce scope of negation
(x)(P(x)  ((y)(P(y)  P(F(x,y))) (y)(Q(x,y)  P(y))))
4. Standardize variables
(x)(P(x)  ((y)(P(y)  P(F(x,y))) (z)(Q(x,z)  P(z))))
5. Eliminate existential quantification
(x)(P(x) ((y)(P(y)  P(F(x,y))) (Q(x,G(x))  P(G(x)))))
6. Drop universal quantification symbols
(P(x)  ((P(y)  P(F(x,y))) (Q(x,G(x))  P(G(x)))))
24
An Example
7. Convert to conjunction of disjunctions
(P(x)  P(y)  P(F(x,y)))  (P(x)  Q(x,G(x))) 
(P(x)  P(G(x)))
8. Create separate clauses
P(x)  P(y)  P(F(x,y))
P(x)  Q(x,G(x))
P(x)  P(G(x))
9. Standardize variables
P(x)  P(y)  P(F(x,y))
P(z)  Q(z,G(z))
P(w)  P(G(w))
Note: Now that quantifiers are gone, we do need the upper/lower-case
distinction
25
1. all X (read (X) --> literate (X))
2. all X (dolphin (X) --> ~literate (X))
3. exists X (dolphin (X) ^ intelligent (X))
(a translation of ``Some dolphins are intelligent'')
``Are there some who are intelligent but cannot read?''
4. exists X (intelligent(X) ^ ~read (X))
Set of clauses (1-3):
1. ~read(X) v literate(X)
2. ~dolphin(Y) v ~literate(Y)
3a. dolphin (a)
3b. intelligent (a)
Negation of 4:
~(exists Z (intelligent(Z) ^ ~read (Z)))
In Clausal form:
~intelligent(Z) v read(Z)
Resolution proof: in lecture.
26
More complicated example
Did Curiosity kill the cat
• Jack owns a dog. Every dog owner is an animal lover. No
animal lover kills an animal. Either Jack or Curiosity killed the
cat, who is named Tuna. Did Curiosity kill the cat?
• These can be represented as follows:
A. (x) (Dog(x)  Owns(Jack,x))
B. (x) (((y) (Dog(y)  Owns(x, y)))  AnimalLover(x))
C. (x) (AnimalLover(x)  ((y) Animal(y)  Kills(x,y)))
D. Kills(Jack,Tuna)  Kills(Curiosity,Tuna)
E. Cat(Tuna)
F. (x) (Cat(x)  Animal(x) )
G. Kills(Curiosity, Tuna)
GOAL
27
• Convert to clause form
A1. (Dog(D))
D is a skolem constant
A2. (Owns(Jack,D))
B. (Dog(y), Owns(x, y), AnimalLover(x))
C. (AnimalLover(a), Animal(b), Kills(a,b))
D. (Kills(Jack,Tuna), Kills(Curiosity,Tuna))
E. Cat(Tuna)
F. (Cat(z), Animal(z))
• Add the negation of query:
G: (Kills(Curiosity, Tuna))
28
• The resolution refutation proof
R1: G, D, {}
(Kills(Jack, Tuna))
R2: R1, C, {a/Jack, b/Tuna}
(~AnimalLover(Jack),
~Animal(Tuna))
R3: R2, B, {x/Jack}
(~Dog(y), ~Owns(Jack, y),
~Animal(Tuna))
R4: R3, A1, {y/D}
(~Owns(Jack, D),
~Animal(Tuna))
R5: R4, A2, {}
(~Animal(Tuna))
R6: R5, F, {z/Tuna}
(~Cat(Tuna))
R7: R6, E, {}
FALSE
29
• The proof tree
G
D
{}
R1: K(J,T)
C
{a/J,b/T}
R2: AL(J)  A(T)
B
{x/J}
R3: D(y)  O(J,y)  A(T)
A1
{y/D}
R4: O(J,D), A(T)
A2
{}
R5: A(T)
F
{z/T}
R6: C(T)
A
{}
R7: FALSE
30
Decidability and Completeness
• Resolution is a refutation complete inference procedure for
First-Order Logic
– If a set of sentences contains a contradiction, then a
finite sequence of resolutions will prove this.
– If not, resolution may loop forever (“semi-decidable”)
• Here are notes by Charles Elkan that go into this more
deeply
31
Decidability and Completeness
• Refutation Completeness: If KB |= A then KB |- A
– If it’s entailed, then there’s a proof
• Semi-decidable:
– If there’s a proof, we’ll halt with it.
– If not, maybe halt, maybe not
• Logical entailment in FOL is semi-decidable: if the desired
conclusion follows from the premises, then eventually
resolution refutation will find a contradiction
32
Decidability and Completeness
• Propositional logic
– logical entailment is decidable
– There exists a complete inference procedure
• First-Order logic
– logical entailment is semi-decidable
– Resolution procedure is refutation complete
33
• Strategies (heuristics) for efficient
resolution include
– Unit preference. If a clause has only one
literal, use it first.
– Set of support. Identify “useful” rules and
ignore the rest. (p. 305)
– Input resolution. Intermediately generated
sentences can only be combined with original
inputs or original rules.
– Subsumption. Prune unnecessary facts from
the database.
34
Horn Clauses
• A Horn Clause is a CNF clause with at most one positive
literal
• Horn Clauses form the basis of forward and backward
chaining
• The Prolog language is based on Horn Clauses
• Deciding entailment with Horn Clauses is linear in the size of
the knowledge base
35
Reasoning with Horn Clauses
• Forward Chaining
– For each new piece of data, generate all new facts, until
the desired fact is generated
– Data-directed reasoning
• Backward Chaining
– To prove the goal, find a clause that contains the goal as
its head, and prove the body recursively
– Goal-directed reasoning
• The state space is an AND-OR graph; see 7.5.4
36
Forward Chaining over FO Definite (Horn)
Clauses
• Clauses (disjunctions) with at most one positive literal
• First-order literals can include variables, which are assumed
to be universally quantified
• Use GMP to perform forward chaining
(Semi-decidable as for full FOPC)
37
Def FOL-FC-Ask(KB,A) returns subst or false
KB: set of FO definite clauses with variables standardized apart
A: the query, an atomic sentence
Repeat until new is empty
new  {}
for each implication (p1 ^ … ^ pn  q) in KB:
for each T such that SUBST(T,p1^…^pn) =
SUBST(T,p1’^…^pn’) for some p1’,…,pn’ in KB
q’  SUBST(T,q)
if q’ is not a renaming of a sentence already in KB or new:
add q’ to new
S  Unify(q’,A)
if S is not fail then return S
add new to KB
Return false
Process can be made more efficient; read on your own, for interest
38
Backward Chaining over Definite (Horn)
Clauses
• Logic programming
• Prolog is most popular form
• Depth-first search, so space requirements are lower, but
suffers from problems from repeated states
39
american(X) ^ weapon(Y) ^ sells(X,Y,Z) ^ hostile(Z)  criminal(X).
owns(nono,m1). missile(m1).
missile(X1) ^ owns(nono,X1)  sells(west,X1,nono).
missile(X2)  weapon(X2).
enemy(X3,america)  hostile(X3).
american(west).
enemy(nono,america).
Goal: criminal(west).
Backward chaining proof: in lecture
In Prolog:
criminal(X) :- american(X), weapon(Y), sells(X,Y,Z), hostile(Z).
40
Horn clauses are all of the form:
L1 ^ L2 ^ ... ^ Ln -> Ln+1
Or, equivalently, in clausal form:
~L1 v ~L2 v ... v ~Ln v Ln+1
Prolog (like databases) makes the "closed world assumption":
if P cannot be proved, infer not P
Think of the system as an arrogant know-it-all:
"If it were true, I would know it. Since I can't prove
it, it must not be true"
Thus, it uses "negation as failure".
41
neighbor(canada,us)
neighbor(mexico,us)
neighbor(pakistan,india)
?- neighbor(canada,india).
no
In full first-order logic, you would have to be able to
infer “~neighbor(canada,india)" for
"neighbor(canada,india)" to be false.
Be careful! “~neighbor(canada,india) is not entailed by the
Sentences above!
42
bachelor(X) :- male(X), \+ married(X).
male(bill).
male(jim).
married(bill).
married(mary).
An individual is a bachelor if it is male and it is
not married. \+ is the negation-as-failure operator in Prolog.
| ?- bachelor(bill).
no
| ?- bachelor(jim).
yes
| ?- bachelor(mary).
no
| ?- bachelor(X).
X = jim;
no
| ?43
Comparing backward chaining in
prolog with resolution
• [In lecture]
44
WrapUp
• You are responsible for everything in Chapter 9 except the
following (though you are encouraged to read them):
– Storage and Retrieval (p. starts bottom 328)
– Efficient forward chaining (starts p. 333) through
Irrelevant facts (ends top 337)
– Efficient implementation of logic programs (starts p.
340) through Constraint logic programming (ends p. 345)
– Completeness of resolution (starts p. 350) (though see
notes in slides)
• Also, see files posted on the schedule (clausal form
conversion, resolution, etc.)
45