Descriptive ILP for Mathematical Discovery

Descriptive ILP for
Mathematical Discovery
Simon Colton
Computational Bioinformatics Lab
Department of Computing
Imperial College, London
Overview
Inductive logic programming
Predictive versus descriptive induction
The HR program
Applications to mathematics
All-new application to vision data
Future work
Logic Programs
Representation language
Subset of first order logic
Conjunction of literals (body) implies a
single literal (head)
E.g., p(X) ∧ q(Y,X) → r(X)
Huge amount of theoretical work
Many knowledge bases
Even a programming language (Prolog)
Inductive Logic Programming
Machine learning
Improve with respect to a task
Usually classification/prediction via concept/rule learning
In light of experience
Usually data/background knowledge
Given:
Background logic programs, B
Positive examples E+ and negative examples E-
Learn:
A hypothesis logic program, H such that
(B ∧ H) ⊨ E+ and (B ∧ H) ⊭ E-
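The success criterion above can be sketched in code: add a hypothesis rule H to the background facts B, forward-chain, and check that every positive example is entailed and no negative one is. The predicates, the unary-rule representation, and the closed-world evaluation here are illustrative assumptions, not the actual machinery of any ILP system.

```python
# Toy check of the ILP success criterion over ground facts.
# A fact is a (predicate, argument) pair; a hypothesis is a single
# rule body(X) -> head(X) over unary predicates (a simplification).

def entails(background, hypothesis, example):
    """Forward-chain facts from B using rule H, then test the example."""
    facts = set(background)
    body, head = hypothesis
    changed = True
    while changed:
        changed = False
        for pred, arg in list(facts):
            if pred == body and (head, arg) not in facts:
                facts.add((head, arg))
                changed = True
    return example in facts

B = {("has_milk", "dog"), ("has_milk", "cat"), ("lays_eggs", "snake")}
H = ("has_milk", "mammal")  # has_milk(X) -> mammal(X)

positives = [("mammal", "dog"), ("mammal", "cat")]
negatives = [("mammal", "snake")]

assert all(entails(B, H, e) for e in positives)      # (B and H) entail E+
assert not any(entails(B, H, e) for e in negatives)  # (B and H) do not entail E-
```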
Predictive Induction
AI went down the problem solving path
Everything shoe-horned into this paradigm
Set up a problem as follows:
Given a set of positives and a set of negatives
Learn a reason why positives are positives and the
negatives are negative
Reasons are rulesets/concepts/math-formulae
Positives only
Manage the same task without negatives
Reasons allow for prediction of the class of unseen examples
Predictive ILP programs: Progol, FOIL, …
Descriptive Induction
Much broader remit:
Given the same background and data
Find something interesting about the data
Interesting thing may be:
A particular example
A concept which categorises examples
A hypothesis which relates concepts
An explanation of a hypothesis
Descriptive ILP systems: Claudien, Warmr
Descriptive versus Predictive
Predictive:
“You know what you’re looking for, but you don’t
know what it looks like”
Descriptive:
“You don’t know what you’re looking for”
Future:
“You don’t even know you’re looking for something”
E.g., adding an entry to a database
Example – the Animals
Classic toy problem for ILP
Given details of a set of animals
E.g., covering, milk, homeothermic, legs
Learn reasons for categorisation into:
Mammals, reptiles, birds, fish
Descriptive induction finds the same rules
But as conjectures, not as solutions to a problem
Finds other things of interest:
E.g., “Did you see that the duck-billed platypus is the only
mammal which lays eggs?”
The HR Program
Performs descriptive induction
Developed since 1997
Edinburgh, York, Imperial
PhD (Prolog), postdoc (Java), beyond (projects)
Mainly applied to mathematics
But developed as a general purpose ML program
Also applied to AI tasks
Predictive learning, theorem proving, CSPs
Automated Theory Formation
A theory consists of at least:
Examples, concept, conjectures, proofs
Given background knowledge
Examples, concepts, axioms
E.g., groups, multiplication, identity axiom
Theory formation proceeds by a cycle of:
Concept formation
Conjecture making
Proof/disproof attempts
Assessment of concepts
Concept Formation
15 production rules
Take old concept(s) and produce new ones
Most PRs are generic
Exists, forall, compose, negate, split, size,
match, equal, disjunct, linear constraint
A few are more domain specific
Record, arithmetic, embed graph, …
Example Construction
Target concept: odd prime numbers
Start: [a,b] : b|a (divisor relation)
Size: [a,n] : n = |{b : b|a}|
Split (n=2): [a] : 2 = |{b : b|a}| (primes)
Split (b=2): [a] : 2|a (even numbers)
Negate: [a] : not 2|a (odd numbers)
Compose: [a] : 2 = |{b : b|a}| & not 2|a (odd primes)
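The construction of the odd-primes concept can be traced extensionally over a small range of integers. Representing each concept as its success set, and naming the steps after the production rules on the slides, is a simplification of HR's actual data structures, intended only to show the composition of rules.

```python
# HR-style concept formation over the integers 1..50, with concepts
# represented extensionally as success sets (an illustrative sketch).

N = range(1, 51)

divisors = {a: {b for b in N if a % b == 0} for a in N}  # [a,b] : b|a

# size PR:   [a,n] : n = |{b : b|a}|
num_divisors = {a: len(divisors[a]) for a in N}

# split PR (fix n = 2):  [a] : 2 = |{b : b|a}|   -- the primes
primes = {a for a in N if num_divisors[a] == 2}

# split PR (fix b = 2):  [a] : 2|a               -- the evens
evens = {a for a in N if a % 2 == 0}

# negate PR: [a] : not 2|a                       -- the odds
odds = set(N) - evens

# compose PR: [a] : 2 = |{b : b|a}| & not 2|a    -- odd primes
odd_primes = primes & odds

print(sorted(odd_primes))  # [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```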
Conjecture Making
Conjectures made empirically
At each attempt to make a new concept
New concept has same success set as old one
An equivalence conjecture is made
New concept has empty success set
A non-existence conjecture is made
Success set is a proper subset/superset
An implication conjecture is made
More succinct results are extracted from these
Implications, implicates and prime implicates
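The conjecture-making cycle above amounts to comparing the success set of each newly built concept against those already in the theory. The following sketch makes that comparison concrete; the function and variable names are illustrative, not HR's.

```python
# Empirical conjecture making: compare a new concept's success set
# with the success sets of existing concepts (illustrative sketch).

def conjecture(new_name, new_set, theory):
    if not new_set:
        return f"non-exists: no example satisfies {new_name}"
    for name, s in theory.items():
        if new_set == s:
            return f"equivalence: {new_name} <-> {name}"
        if new_set < s:                       # proper subset
            return f"implication: {new_name} -> {name}"
        if new_set > s:                       # proper superset
            return f"implication: {name} -> {new_name}"
    return None  # genuinely new concept: add it to the theory

N = range(1, 51)
theory = {"odd": {a for a in N if a % 2 == 1}}

primes = {a for a in N if a > 1 and all(a % b for b in range(2, a))}
odd_primes = primes & theory["odd"]

print(conjecture("odd_prime", odd_primes, theory))
# implication: odd_prime -> odd
```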
Explanation Handling
User supplies some axioms of domain
E.g., three axioms of group theory
HR appeals to third party ATP software
Uses Otter to try to prove that each
conjecture follows from the axioms
If unsuccessful, uses Mace to try and find a
counterexample
New example added to the theory
Other reasoning software used (MathWeb)
Interestingness
HR has many types of search:
BFS, DFS, random, tiered, reactive, heuristic
Usually depth limit (complexity)
Heuristic search:
Measure the “interestingness” of each concept
Build new concepts from the best old ones
Intrinsic measures
Comprehensibility, applicability, variety, …
Relative values
Novelty, child/parent, interestingness of conjectures
Utilitarian measures
Coverage, highlight, invariance, discrimination, …
Handling Uncertain Data
Given a genuinely new concept
HR tries to make conjectures opportunistically
Near-implications
One concept has nearly a subset of examples of another
User specifies the minimum percentage match, e.g., 20%
E.g., prime numbers are odd (only one exception)
Near-equivalences
Two concepts have nearly the same success sets
Often choose to look at the positives only
Near-nonexists
New concept has very few examples
E.g., even prime number if and only if equal to 2
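A near-implication check with a user-supplied minimum percentage match can be sketched as below, using the slides' "primes are odd" example (the prime 2 is the single exception below 100). The naming is illustrative.

```python
# Near-implication: membership of `left` nearly implies membership of
# `right`, subject to a minimum percentage match (illustrative sketch).

def near_implication(left, right, min_pct=80.0):
    if not left:
        return None
    pct = 100.0 * len(left & right) / len(left)
    if pct >= min_pct:
        exceptions = sorted(left - right)
        return pct, exceptions
    return None

N = range(1, 101)
primes = {a for a in N if a > 1 and all(a % b for b in range(2, a))}
odds = {a for a in N if a % 2 == 1}

result = near_implication(primes, odds, min_pct=90.0)
print(result)  # (96.0, [2]) -- 24 of the 25 primes below 100 are odd
```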
Mathematical Discovery
HR has been applied to:
Number theory, graph theory, algebraic
domains, e.g., group theory, ring theory
Early approaches were entirely descriptive
Interesting concepts and conjectures looked for
in the theories HR formed
Later approaches largely driven by
particular applications
For mathematical reasons and for AI reasons
Number Theory
First big success for HR
Encyclopedia of Integer sequences
Coming up to 100,000 sequences
Such as primes, squares, odds, evens, fibonacci
HR invented 20 new sequences
Which are now in the encyclopedia
All supplied with reasons why they are interesting
Some nice examples
Odd refactorable numbers are perfect squares
Sum of divisors is prime → number of divisors is prime
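Both conjectures are easy to check empirically. The sketch below verifies them for all n up to 10,000 using a divisor sieve; this is a finite sanity check, not a proof.

```python
# Empirical check of the two conjectures above, for n up to 10000:
#   (1) odd refactorable numbers are perfect squares
#       (n is refactorable when tau(n), its number of divisors, divides n)
#   (2) sum of divisors prime -> number of divisors prime

from math import isqrt

LIMIT = 10000
tau = [0] * (LIMIT + 1)    # tau[n]   = number of divisors of n
sigma = [0] * (LIMIT + 1)  # sigma[n] = sum of divisors of n
for d in range(1, LIMIT + 1):
    for m in range(d, LIMIT + 1, d):
        tau[m] += 1
        sigma[m] += d

def is_prime(n):
    return n > 1 and all(n % b for b in range(2, isqrt(n) + 1))

for n in range(1, LIMIT + 1):
    if n % 2 == 1 and n % tau[n] == 0:   # odd refactorable...
        assert isqrt(n) ** 2 == n        # ...is a perfect square
    if is_prime(sigma[n]):
        assert is_prime(tau[n])          # tau prime whenever sigma prime

print("both conjectures hold for n <=", LIMIT)
```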
Spin-off Systems
NumbersWithNames
www.machine-creativity.com/programs/nwn
Data-mines a subset of the encyclopedia
E.g., Perfect numbers are pernicious
HOMER
Available online soon
HR, Otter and Maple combination
Maple background file, HR forms conjectures, Otter acts as a
filter (provable theorems in the bin)
Interacts with the user
Demonstration
Graph Theory
Siemion Fajtlowicz & Ermelinda Delavina
Graffiti program and the writing on the wall
Scores of papers written about the conjectures
Including by Paul Erdos
MSc. Project of Noor Mohamadali
Use HR to re-discover Graffiti’s conjectures
Very successful
Found all the ones we expected it to, and some others
which Graffiti hasn’t found
Currently being proved/disproved by the AGX program
Pierre Hansen and Giles Caporossi
Algebraic Domains
Otter/Mace are pretty good in these domains
HR can be completely bootstrapping
Start from the axioms of the domain alone
Building of classification trees
HR used in a predictive induction way
Find classifying concepts (distinguish pairs of algebras)
Part of a larger system which verifies the results: IJCAR04
Example Classification
Groups of size eight can be classified using their
self inverse elements (x⁻¹ = x)
“They will either have (i) all self inverse elements (ii)
an element which squares to give a non-self inverse
element (iii) no self-inverse elements which aren't
also commutators (iv) a self inverse element which
can be expressed as the product of two noncommutative elements or (v) none of these
properties.”
Classification tree produced for loops of size 6
109 isomorphism classes
AI Applications
Machine learning
Used for bioinformatics datasets
Constraint solving
Reformulation of CSPs (implied constraints)
E.g., QG3-quasigroups are anti-Abelian
Automated reasoning
Producing theorems for TPTP library
40,000 possible theorems in 10 minutes
Around 100 accepted into the library
“Proving” non-theorems
The TM System
Inspired by Lakatos’ philosophy of maths
PhD project of Alison Pease
Starts with a non-theorem
Uses Mace to:
Find counterexamples and supporting examples
Uses HR to find concept which is true of a subset
of counters or supporting examples
Adds appropriate axioms
Uses Otter to prove reformulated theorem
Example from TM
Non-theorem from TPTP library:
In ring theory, the following identity is true
∀ w x ((((w*w)*x)*(w*w)) = id)
Mace found 7 supporting examples and 6
falsifying examples
HR found a single concept true of 3 positives:
¬(∃ b (b*b=b & ¬(b+b=b)))
“In rings for which, for all b, b*b=b implies
that b+b=b, the identity is true”
Proved by Otter, hand proof by Roy McCasland
Vision Application
Work with Paulo Santos and Derek Magee
Vision data:
Learning the rules of a dice game
Context of autonomous agents
Learning/adapting from observations alone
Ask Paulo et al. about the vision side!
First serious application of HR’s handling of noisy,
incomplete data
First quantitative comparison of HR and Progol
(predictive ILP system)
Progol Setup
State and successor data transformed to:
Transition/4 predicates
E.g., trans([c1],[c1,c2],t101,t103)
Mode declarations chosen wisely
Application of a language bias
Rules extracted from the answer to a positives-only learning problem:
Find a set of rules to cover the trans/4 predicate
Also a meta-level application of Progol
See the Santos et al. ECAI paper
HR Setup
State and successor data transformed to:
state0/1, state1/2 and state2/2 predicates
(HR cannot handle lists)
Various different setups tried with HR
Using match, compose, exists, split PRs
Using near-implications at 50%, 70%, 100%
Using different search strategies
BFS, split-first, compose-last
Rules extracted from the theory produced
i.e., no explicit problem to solve
Experiments
8 different sessions from vision system
Plus one with all the data
Looking for 18 rules of the game
Empty becomes 1-state
1-state becomes 2-state
2 equal dice faces are both removed
Otherwise, the smaller of the two is removed
15 instantiations of this rule
Recorded how many rules each system found
Also recorded the coverage of the 15 rules
Found in the data
Sensitivity Results
HR Results
First application of HR with uncertain data
Wanted to assess its performance
Altering parameters
Affected sensitivity in entirely expected ways
Increasing these increases sensitivity:
Number of steps
Complexity limit (but not too much)
Decreasing percentage match increases sensitivity
HR able to pick up conjectures which are only 65% true
Altering the search strategy had little effect
Drawbacks to Approaches
Predictive approach
Problem may be solved without finding some game rules
Due to over-generalisation with little data
Need to know the mode declarations in advance
In general, rules will require the solution of many different
prediction problems
Descriptive approach
Can take a long time to find rules (10x slower)
In general, rules will not be so simple to find
Can produce a glut of information from which it’s difficult
to extract the pertinent rules
Descriptive best in theory, predictive slightly better
in practice (at least for this data set, but we’ll see…)
Proposed Combination
Can we get the best of both worlds?
Idea:
Use HR for an initial theory formation exploration
Look at some rules found
Extract a general format for them
Turn general rules into mode declarations
Use Progol to solve individual problems
In practice, HR may take too long to work
But the process would be entirely automatic
Conclusions
HR shown good abilities for discovery in
pure mathematics and AI applications
Characterised by lack of noise, concrete axioms
Not clear that the application to discovery in
noisy, incomplete data sets will be as good
Future Work
Applying for grants
To continue the application to CSP reformulation
With Ian Miguel
To continue the Lakatos approach, possibly with
applications to software verification
With Alison Pease
My focus is on integration of
Learning, constraint solving, automated reasoning
Application to creative tasks in
Mathematics, bioinformatics and vision!