
Learning abductive reasoning
using random examples
Brendan Juba
Washington University in St. Louis
Outline
1. Models of abductive reasoning
2. An abductive reasoning algorithm
3. “Not-easiness” of abducing conjunctions
Abductive reasoning:
making plausible guesses
• Abductive reasoning: Given a conclusion c,
find a “plausible” h that implies/leads to/… c
• Two varieties of “plausibility” in common use
– Logical plausibility: a small h from which c follows
– Bayesian plausibility: an h which has
large posterior probability given c
In symbols… Pr[h | c true] > …
(Requires a prior distribution over representations…)
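One way to unpack the Bayesian notion (a sketch only; the slide leaves the actual threshold as "…"): the posterior plausibility of h is given by Bayes' rule, which is exactly where the prior over representations enters.

```latex
% Posterior plausibility of an explanation h once the conclusion c is observed;
% Pr[h] is the prior over candidate representations.
\Pr[\,h \mid c \text{ true}\,] \;=\; \frac{\Pr[\,c \text{ true} \mid h\,]\;\Pr[\,h\,]}{\Pr[\,c \text{ true}\,]}
```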
Why might we want a new model?
• Existing models only tractable in simple cases
– E.g. Horn rules (a⋀b⋀c⇒d …no negations), “nice”
(conjugate) priors
• The choice of prior distribution really matters
– And, it’s difficult to specify by hand
New model: abductive reasoning
from random examples
• Fix a set of attributes
(propositional variables x1, x2, …, xn)
• An environment is modeled by an arbitrary,
unknown distribution D over examples, i.e.,
settings of the n propositional variables.
• Task: for a conclusion c, find an h such that
1. Plausibility: Prx∈D[h(x)=1] ≥ μ (for some given μ)
2. h almost entails c: Prx∈D[c(x)=1|h(x)=1] ≥ 1-ε
All probabilities over examples from D
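As a concrete reading of the two conditions, here is a minimal sketch (a hypothetical helper, not from the talk) that estimates both probabilities from a sample drawn from D:

```python
def check_explanation(h, c, samples, mu, eps):
    """Estimate Pr[h(x)=1] and Pr[c(x)=1 | h(x)=1] from examples drawn from D.

    h, c: functions mapping an example (a tuple of n 0/1 attribute values) to 0 or 1.
    samples: a list of examples drawn independently from D.
    Returns whether h looks plausible (>= mu) and almost entails c (>= 1 - eps).
    """
    h_true = [x for x in samples if h(x) == 1]
    pr_h = len(h_true) / len(samples)
    # Conditional probability Pr[c=1 | h=1]; treat it as 0 if h is never true.
    pr_c_given_h = sum(c(x) for x in h_true) / len(h_true) if h_true else 0.0
    return pr_h >= mu, pr_c_given_h >= 1 - eps
```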
Example: identifying a subgoal
• Consider: blocks world. For t=1,2,…,T
– Propositional state vars. (“fluents”):
ONt(A,B), ONt(A,TABLE), ONt(C,A), etc.
– Actions also encoded by propositional vars.:
PUTt(B,A), PUTt(C,TABLE), etc.
(Or, even given by examples, not explicitly formulated…)
• Given many examples of interaction…
• Our goal c: ONT(A,TABLE) ⋀ ONT(B,A) ⋀ ONT(C,B)
• A perhaps plausible “subgoal” h:
[ONT-1(B,A) ⋀ PUTT(C,B)] ⋁ [PUTT-1(B,A) ⋀ PUTT(C,B)]
Formally: abductive reasoning
from random examples for a class H
• Fix a class of Boolean representations H
• Given Boolean formula c; ε, δ, μ∈(0,1);
independent examples x(1),…,x(m) ∈D,
• Suppose that there exists an h*∈H such that
1. Plausibility: Prx∈D[h*(x)=1] ≥ μ
2. h* entails c: Prx∈D[c(x)=1|h*(x)=1] = 1
• Find an h (ideally in H) such that with prob. 1-δ,
1. Plausibility: Prx∈D[h(x)=1] ≥ 1/poly(1/μ, 1/(1-ε), n)
2. h almost entails c: Prx∈D[c(x)=1|h(x)=1] ≥ 1-ε
in pictures…
[Venn diagram over x ∈ {0,1}n: the set of x with h(x)=1 lies (almost entirely) inside the set with c(x)=1;
c: goal/observation…, h: explanation/solution/…]
Outline
1. Models of abductive reasoning
2. An abductive reasoning algorithm
3. “Not-easiness” of abducing conjunctions
Theorem 1. If there is a k-DNF h* such that
1. Plausibility: Prx∈D[h*(x)=1] ≥ μ
2. h* entails c: Prx∈D[c(x)=1|h*(x)=1] = 1
then using m = O((1/με)(n^k + log(1/δ))) examples,
in time O(m·n^k) we can find a k-DNF h such that
with probability 1-δ,
1. Plausibility: Prx∈D[h(x)=1] ≥ μ
2. h almost entails c: Prx∈D[c(x)=1|h(x)=1] ≥ 1-ε
k-DNF: an OR of “terms of size k” – ANDs of at most k
“literals” – attributes or their negations
Algorithm for k-DNF abduction
• Start with h as an OR over all terms of size k
• For each example x(1),…,x(m)
– If c(x(i)) = 0,
delete all terms T from h such that T(x(i)) = 1
• Return h
Simple algorithm, 1st proposed by J.S. Mill, 1843
Running time is clearly O(m·n^k)
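A minimal Python sketch of this elimination algorithm (the names and the term encoding are mine, chosen for illustration; terms are conjunctions of at most k literals, and h is the OR of whatever terms survive):

```python
from itertools import combinations, product

def abduce_kdnf(samples, c, n, k):
    """Mill-style elimination for k-DNF abduction (a sketch).

    samples: the examples x(1),...,x(m), each a tuple of n values in {0,1}.
    c: the goal/conclusion, a function from an example to 0 or 1.
    Returns the surviving terms; a term is a frozenset of literals (i, b),
    read as "attribute x_i takes value b".
    """
    def term_true(term, x):
        return all(x[i] == b for (i, b) in term)

    # Start with h as the OR over all terms of size at most k.
    terms = set()
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for bits in product((0, 1), repeat=size):
                terms.add(frozenset(zip(idxs, bits)))

    # For each counterexample to the conclusion, delete every term it satisfies.
    for x in samples:
        if c(x) == 0:
            terms = {t for t in terms if not term_true(t, x)}
    return terms

def as_hypothesis(terms):
    """The k-DNF h: an OR of the surviving terms."""
    return lambda x: int(any(all(x[i] == b for (i, b) in t) for t in terms))
```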
Analysis pt 1: Prx∈D[h(x)=1] ≥ μ
• We are given that some k-DNF h* has
1. Plausibility: Prx∈D[h*(x)=1] ≥ μ
2. h* entails c: Prx∈D[c(x)=1|h*(x)=1] = 1
• Initially, every term of h* is in h
• Terms of h* are never true when c(x)=0 by 2.
☞every term of h* remains in h
☞h* implies h, so Prx[h(x)=1]≥Prx[h*(x)=1]≥μ.
Analysis pt 2:
Prx[c(x)=1|h(x)=1] ≥ 1-ε
• Rewrite conditional probability:
Prx∈D [c(x)=0⋀h(x)=1] ≤ εPrx∈D [h(x)=1]
• We’ll show: Prx∈D [c(x)=0⋀h(x)=1] ≤ εμ
( ≤ εPrx∈D [h(x)=1] by part 1)
• Consider any h’ s.t. Prx[c(x)=0⋀h’(x)=1] > εμ
– Since each x(i) is drawn independently from D,
Prx∈D [no i has c(x(i))=0 ⋀ h’(x(i))=1] < (1-εμ)^m
– Some term of h’ is deleted whenever an example has c=0 and h’=1
– So, h’ is only possibly output w.p. < (1-εμ)^m
Analysis pt 2, cont’d:
Prx[c(x)=1|h(x)=1] ≥ 1-ε
• We’ll show: Prx∈D [c(x)=0⋀h(x)=1] ≤ εμ
• Consider any h’ s.t. Prx[c(x)=0⋀h’(x)=1] > εμ
– h’ is only possibly output w.p. < (1-εμ)^m
• There are only 2^O(n^k) possible k-DNF h’
• Since (1-1/x)^x ≤ 1/e, m = O((1/με)(n^k + log(1/δ))) ex’s
suffice to guarantee that each such h’
is only possible to output w.p. < δ/2^O(n^k)
☞w.p. >1-δ, our h has Prx[c(x)=0⋀h(x)=1] ≤ εμ
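The arithmetic behind the sample bound, spelled out (a sketch of the union-bound step):

```latex
(1-\epsilon\mu)^m \le e^{-\epsilon\mu m},
\qquad\text{so } 2^{O(n^k)}\cdot e^{-\epsilon\mu m} \le \delta
\text{ once } m \;\ge\; \frac{1}{\epsilon\mu}\Bigl(O(n^k) + \ln\tfrac{1}{\delta}\Bigr)
\;=\; O\!\Bigl(\tfrac{1}{\mu\epsilon}\bigl(n^k + \log\tfrac{1}{\delta}\bigr)\Bigr).
```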
Theorem 1. If there is a k-DNF h* such that
1. Plausibility: Prx∈D[h*(x)=1] ≥ μ
2. h* entails c: Prx∈D[c(x)=1|h*(x)=1] = 1
then using m = O((1/με)(n^k + log(1/δ))) examples,
in time O(m·n^k) we can find a k-DNF h such that
with probability 1-δ,
1. Plausibility: Prx∈D[h(x)=1] ≥ μ
2. h almost entails c: Prx∈D[c(x)=1|h(x)=1] ≥ 1-ε
k-DNF: an OR of “terms of size k” – ANDs of at most k
“literals” – attributes or their negations
A version that tolerates exceptions is also possible – see paper…
Outline
1. Models of abductive reasoning
2. An abductive reasoning algorithm
3. “Not-easiness” of abducing conjunctions
Theorem 2. Suppose that a polynomial-time
algorithm exists for learning abduction from
random examples for conjunctions.
Then there is a polynomial-time algorithm for
PAC-learning DNF.
PAC Learning
[Diagram: an unknown target c∈C and a distribution D supply labeled examples
(x(1), c(x(1))), (x(2), c(x(2))), …, (x(m), c(x(m))); with prob. 1-δ over the examples,
the learner outputs f such that, w.p. 1-ε over x’=(x’1,x’2,…,x’n) drawn from D, f(x’)=c(x’)]
Theorem 2. Suppose that a polynomial-time
algorithm exists for learning abduction from
random examples for conjunctions.
Then there is a polynomial-time algorithm for
PAC-learning DNF.
Theorem (Daniely & Shalev-Shwartz ’14). If
there is a polynomial-time algorithm for PAC-learning DNF, then for every f(k)⟶∞ there is a
polynomial-time algorithm for refuting random
k-SAT formulas with n^f(k) clauses.
This is a new hardness assumption. Use at your discretion.
Key learning technique:
Boosting (Schapire, 1990)
• Suppose that there is a polynomial-time
algorithm that, given examples of c∈C, w.p. 1-δ produces a circuit f s.t.
Prx[f(x)=c(x)] > ½ + 1/poly(n).
Then there is a polynomial-time PAC-learning
algorithm for the class C.
– i.e., using the ability to produce such f’s, we
produce a g for which Prx[g(x)=c(x)] is “boosted”
to any 1-ε we require.
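For illustration only, here is a later and commonly used boosting procedure (AdaBoost, due to Freund & Schapire); Schapire's 1990 construction is different, but the idea of combining barely-better-than-random hypotheses into an accurate one is the same. The weak_learner interface below is an assumption of this sketch.

```python
import math

def boost(weak_learner, examples, labels, rounds):
    """Combine weak hypotheses into a strong one (AdaBoost-style sketch).

    weak_learner(examples, labels, weights) must return a hypothesis f whose
    weighted agreement is a bit above 1/2; labels and f(x) are in {-1, +1}.
    """
    m = len(examples)
    weights = [1.0 / m] * m
    committee = []  # list of (vote weight, hypothesis)
    for _ in range(rounds):
        f = weak_learner(examples, labels, weights)
        err = sum(w for w, x, y in zip(weights, examples, labels) if f(x) != y)
        err = min(max(err, 1e-9), 1 - 1e-9)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        committee.append((alpha, f))
        # Reweight so the next round focuses on examples this f got wrong.
        weights = [w * math.exp(-alpha * y * f(x))
                   for w, x, y in zip(weights, examples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    # The boosted hypothesis g: a weighted majority vote of the committee.
    return lambda x: 1 if sum(a * f(x) for a, f in committee) >= 0 else -1
```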
Sketch of learning DNF using
conjunctive abduction
• If c is a DNF and Pr[c(x)=1]> ¼, then some
term T of c has Pr[T(x)=1] > 1/(4|c|)
– …and otherwise f≡0 satisfies Pr[f(x)=c(x)]>½+¼
• Note: Pr[c(x)=1|T(x)=1]=1, T is a conjunction
☞Abductive reasoning finds some h such that
Pr[h(x)=1]>1/poly(n) and Pr[c(x)=1|h(x)=1]>¾
• Return f: f(x)=1 whenever h(x)=1; if h(x)=0,
f(x) = maj x:h(x)=0{c(x)}. Since f agrees with c w.p. > ¾ when h(x)=1 and w.p. ≥ ½
when h(x)=0, Pr[f(x)=c(x)] > ½ + (¼)·Pr[h(x)=1] > ½ + 1/(4·poly(n))
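A sketch of the weak learner this argument describes, in Python (abduce_conjunction stands in for the hypothetical polynomial-time conjunctive-abduction algorithm assumed by Theorem 2):

```python
def weak_learner_from_abduction(abduce_conjunction, samples, c):
    """Turn conjunctive abduction into a weak learner for a DNF target c (sketch).

    abduce_conjunction(samples, c) is assumed to return an h with
    Pr[h(x)=1] > 1/poly(n) and Pr[c(x)=1 | h(x)=1] > 3/4.
    Returns f with Pr[f(x)=c(x)] > 1/2 + 1/(4 poly(n)) over D.
    """
    labels = [c(x) for x in samples]
    if sum(labels) <= len(labels) / 4:
        # Pr[c(x)=1] <= 1/4 (empirically): the constant-0 hypothesis already
        # agrees with c on more than 1/2 + 1/4 of the distribution.
        return lambda x: 0
    h = abduce_conjunction(samples, c)
    # Majority label of c among the sampled examples where h(x)=0.
    off = [y for x, y in zip(samples, labels) if h(x) == 0]
    maj = 1 if off and 2 * sum(off) >= len(off) else 0
    return lambda x: 1 if h(x) == 1 else maj
```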
Recap: learning abduction
from random examples
For goal condition c, if there is an h* such that
Prx∈D[h*(x)=1] ≥ μ & Prx∈D[c(x)=1|h*(x)=1] = 1
using examples x from D, find an h such that
Prx∈D[h(x)=1] ≥ μ’ & Prx∈D[c(x)=1|h(x)=1] ≥ 1-ε
Thm 1. An efficient algorithm exists for k-DNF h
Thm 2. Unless DNF is PAC-learnable, there is no
efficient algorithm for conjunctions