Automated Puzzle Generation

Simon Colton
Universities of Edinburgh and York
Background
• Train journey with Jeremy Gow
– To meet Herb Simon
– Puzzle generation rather than problem solving
– Wrote some puzzles for Jeremy
– Jeremy kept getting the “wrong” answer
– Puzzle generation is a difficult task
• Reviewer’s comment
– View puzzles independently of implementation
Some Example Puzzles
• Which is the odd one out?
– Hair, triangles, squares, plants, words, trees
– Answer: triangles (others have roots)
• Jingle is to corporation as ? is to politician
– Campaign, platform, slogan, promises
– Answer: slogan
• What is next in the sequence
– 4, 3, 6, 6, 2, 9 ?
– Answer: revealed later (see Sequences Results)
Overview of What’s Needed
• Structure for puzzles
– Characterisation of puzzles
• Puzzles must have single solutions
– Theory formation helps here
• Puzzles must be of correct difficulty
– Methods for disguising the answer
Queendom.com Examples
• What’s the odd one out?
– Coconuts, oysters, clams, eggs, walnuts, haddock
– A: haddock (the others have shells)
• Hair is to stubble as potatoes are to ?
– French fries, sweet potatoes, potato skins, vegetable
– A: French fries
• What’s next in the sequence
– 3, 8, 15, 24, 35?
– A: 48 (square integers and subtract 1)
A Characterisation of Puzzles
• Three (of many) types of puzzle are:
– Odd one out, analogy, next in sequence
• Have (almost) the same structure:
– Question statement
– Set of choices, one of which is answer
– Solution which is an embedded concept
• Some tweaking necessary to make a fit
– Next in sequence puzzles have no choices
– Analogy puzzles have no solution concept
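The shared structure above can be sketched as a small record type. This is only an illustration: the `Puzzle` class and its field names are hypothetical and are not taken from HR. The "tweaking" is modelled by allowing an empty choice list and a missing concept.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Puzzle:
    """Hypothetical record for the three puzzle types on this slide."""
    question: str                   # question statement
    choices: Sequence[object]       # set of choices (empty for next in sequence)
    answer: object                  # the intended answer
    concept: Optional[str] = None   # embedded solution concept (None for analogy)


# The three Queendom.com examples, expressed in the same structure.
odd_one_out = Puzzle(
    question="What's the odd one out?",
    choices=["coconuts", "oysters", "clams", "eggs", "walnuts", "haddock"],
    answer="haddock",
    concept="has a shell",
)
next_in_sequence = Puzzle(
    question="What's next in the sequence 3, 8, 15, 24, 35?",
    choices=[],                     # next in sequence puzzles have no choices
    answer=48,
    concept="square integers and subtract 1",
)
analogy = Puzzle(
    question="Hair is to stubble as potatoes are to ?",
    choices=["French fries", "sweet potatoes", "potato skins", "vegetable"],
    answer="French fries",
    concept=None,                   # analogy puzzles have no solution concept
)
```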
Solutions to Puzzles
• Solution is a single embedded concept
– Fairly simple and positively stated
• Which is the odd one out: 4, 9, 8, 36?
– A: 9 (even numbers), A: 8 (square numbers)
– Puzzle is unsatisfying if there are two answers
• Which is the odd one out: 2, 3, 9, 20?
– A: 9 (it is a square number)
• Which is the odd one out: 23, 25, 27, 29?
– A: 27 (others are primes or squares)
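A minimal sketch of why the first puzzle on this slide is unsatisfying, assuming a concept is just a predicate on integers. The function name `odd_one_out_answers` and the two-concept table are illustrative assumptions, not HR's representation. A positively stated solution is a concept that holds for every choice except one.

```python
from math import isqrt


def is_even(n):
    return n % 2 == 0


def is_square(n):
    return isqrt(n) ** 2 == n


# Toy concept table (assumption: only these two concepts are known).
CONCEPTS = {"even": is_even, "square": is_square}


def odd_one_out_answers(choices, concepts):
    """Return (answer, concept name) for each concept that exactly one choice fails."""
    found = []
    for name, holds in concepts.items():
        misfits = [c for c in choices if not holds(c)]
        if len(misfits) == 1:
            found.append((misfits[0], name))
    return found


print(odd_one_out_answers([4, 9, 8, 36], CONCEPTS))
# [(9, 'even'), (8, 'square')]  -> two answers, so the puzzle is ambiguous
print(odd_one_out_answers([2, 3, 9, 20], CONCEPTS))
# []  -> 9 only stands out via the negated concept "not a square"
```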
The Difficulty of Puzzles
• Embedded concept is usually not complex
– Probably in order to ensure single solution
• Number of possible answers
– Increases the search space for answer
– Could make the problem easier
• Disguising concepts
– Odd one out: haddock puzzle, they’re all foodstuffs
– Next in sequence (from queendom): 2, 7, 4, 14, 6?
– Another concept interleaved (or stuck on)
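Interleaving as a disguise can be sketched as follows. Reading the queendom sequence as the even numbers interleaved with the multiples of 7 is an assumption made for this example, and the helper names are hypothetical.

```python
from itertools import chain


def interleave(first, second):
    """Alternate the terms of two sequences: first[0], second[0], first[1], ..."""
    return list(chain.from_iterable(zip(first, second)))


def split_interleaved(terms):
    """Undo the disguise: terms at even positions, then terms at odd positions."""
    return terms[0::2], terms[1::2]


evens = [2, 4, 6]
multiples_of_7 = [7, 14, 21]
disguised = interleave(evens, multiples_of_7)

print(disguised[:5])                     # [2, 7, 4, 14, 6]
print(split_interleaved(disguised[:5]))  # ([2, 4, 6], [7, 14])
```

Under this reading, the solver has to notice two simple concepts instead of one, and the next term would be the next multiple of 7.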
The HR Program
• Automated theory formation
– Concepts (ex. & def.), conjectures, proofs
– Theory is a collection of concepts (in this case)
• Concept formation via 8 production rules
– Builds new concepts from old ones
– Compose, disjunct, exists, forall, match, negate, size, split
• Complexity of a concept:
– Number of production rule steps
• Specialisation concepts important
– Specialisation of objects of interest (e.g., prime numbers)
Extension for Puzzles (General)
• HR generates theory, then builds puzzles
– Embed each concept, make all puzzles, choose rep.
• From characterisation of solution:
– Don’t use negate or disjunct production rules in ATF
• From single solution:
– Exhaust theory up to a complexity limit
– Check for alternative solutions and discard
• From difficulty consideration
– Present puzzles in order of conc. complexity, disguise
– Actively add disguise where possible
Extension for Puzzles (Special)
• User: chooses the number of possible answers (n)
– Answers are presented in random order
• Odd one out:
– Choose n positive and 1 negative example of spec. conc
– Check all other concepts for a different solution
• Next in sequence (only in domain of integers)
– Embed number type (e.g. primes, 2, 3, 5, 7, ?)
– Embed function (e.g. number of divisors, 1, 2, 2, 3, ?)
– Actively disguise by interleaving simple seq.
• Analogy: A is to B as C is to: D, E, F, G?
– A, B, C and D share spec. property, E, F and G do not
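The odd one out procedure above can be sketched as generate-and-test. This is not HR's code: the concept table, the function names, and the retry limit are assumptions. The sketch also assumes the user's n is the total number of choices shown, so it draws n - 1 positives plus one negative, matching the four-choice puzzles in the experiments.

```python
import random
from math import isqrt


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


# Toy theory of specialisation concepts over the integers (assumption).
CONCEPTS = {
    "even": lambda n: n % 2 == 0,
    "square": lambda n: isqrt(n) ** 2 == n,
    "prime": is_prime,
}


def solutions(choices, concepts):
    """All (answer, concept) pairs where exactly one choice fails the concept."""
    out = []
    for name, holds in concepts.items():
        misfits = [c for c in choices if not holds(c)]
        if len(misfits) == 1:
            out.append((misfits[0], name))
    return out


def make_odd_one_out(target, objects, concepts, n_choices, rng, tries=1000):
    """Embed concept `target`; return (choices, answer), or None if no puzzle is found."""
    holds = concepts[target]
    positives = [o for o in objects if holds(o)]
    negatives = [o for o in objects if not holds(o)]
    if len(positives) < n_choices - 1 or not negatives:
        return None
    for _ in range(tries):
        answer = rng.choice(negatives)
        choices = rng.sample(positives, n_choices - 1) + [answer]
        # Check all other concepts for a different solution; discard if one exists.
        if solutions(choices, concepts) == [(answer, target)]:
            rng.shuffle(choices)  # answers are presented in random order
            return choices, answer
    return None


rng = random.Random(0)  # seeded only to make the sketch repeatable
puzzle = make_odd_one_out("even", range(1, 31), CONCEPTS, 4, rng)
print(puzzle)
```

Candidates such as 4, 16, 6, 9 are rejected here, because 6 is also the odd one out under "square".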
Experiment 1: Animals
• Animals dataset (distributed with Progol)
– 18 animals (dog, platypus, snake, eagle, etc.)
– 12 properties (class, homeothermic, eggs, etc.)
• Theory formation up to comp. limit 5
– Compose, exists, forall, match, size, split
• Asked for all odd one out & analogy puzzles
– User specifies: 4 answers possible
Animals Results
• 31 puzzles about animals formed
• Good examples
[15] Which is OOO: penguin, ostrich, cat, bat?
[31] Eel is to platypus as shark is to snake, eagle, turtle, lizard?
• Bad example
[27] Cat is to dog as eagle is to lizard, eel, ostrich, trout?
• Observations:
– Low complexity of concepts, little disguise found
– Need more examples of animals
• Conclusion:
– Single solutions worked OK, but fairly easy to solve
Experiment 2: Integer Sequences
• Integers 1 to 30 provided
– Addition, multiplication, digits, divisors
– Compose, exists, match, size, split
• Theory formed up to complexity 4
• Disguise simple concepts (comp. < 3)
– By interleaving other simple concepts
• All next in sequence puzzles asked for
– User specifies: 6 terms of the sequence given
Sequences Results
• 24 next in sequence puzzles generated
• Good examples:
[2] 4, 3, 6, 6, 2, 9, ? [numdiv, 27, mult. of 3]
[3] 21, 3, 24, 6, 27, 9, ? [mult 3, mult 3]
[10] 21, 22, 24, 25, 26, 28, ? [digit is a div]
• Bad examples:
[20] 6, 0, 2, 0, 4, 0, ? [# even divisors of 24, …]
[22] 11, 12, 12, 13, 13, 14, ?
• Observations
– Functions should start earlier on number line
– Embedded concepts are in general too complex
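Two of the good examples can be reproduced with a short sketch. The readings of the bracketed annotations are assumptions: [2] is taken as the number of divisors of 27, 28, 29, ... interleaved with the multiples of 3, and [10] as the integers from 21 upward that are divisible by one of their own nonzero digits. The function names are hypothetical.

```python
from itertools import chain


def num_divisors(n):
    """Number of positive divisors of n (brute force, fine for n <= 30)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def digit_is_divisor(n):
    """True if some nonzero decimal digit of n divides n."""
    return any(ch != "0" and n % int(ch) == 0 for ch in str(n))


def interleave(first, second):
    """Alternate the terms of two sequences."""
    return list(chain.from_iterable(zip(first, second)))


# Puzzle [2]: divisor counts from 27 onward, disguised with multiples of 3.
puzzle_2 = interleave([num_divisors(n) for n in range(27, 31)], [3, 6, 9, 12])
print(puzzle_2[:6], "->", puzzle_2[6])    # [4, 3, 6, 6, 2, 9] -> 8

# Puzzle [10]: integers in 21..30 with a digit that divides them.
puzzle_10 = [n for n in range(21, 31) if digit_is_divisor(n)]
print(puzzle_10[:6], "->", puzzle_10[6])  # [21, 22, 24, 25, 26, 28] -> 30
```

The first sequence only becomes recognisable once the solver guesses that the function starts at 27, which fits the observation that functions should start earlier on the number line.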
Remarks about Creativity
• Setter: creative act is finding concept/examples
• Solver: creative act is finding the answer/solution
• Having a single solution:
– Want the solver to be P-creative, not H-creative
• Difference between answer and solution
– IQ tests: interested in answer, not solution
• More will come to light after field testing
– Comments very welcome
Conclusions and Future Work
• Characterisation of puzzles
– Single pos. simp. solution, difficulty (disguise)
• Puzzle generation can be automated
– Results not stunning, but still preliminary
• Puzzle generation needs improvement
• Also needs hand crafting of input files
• More answers/questions about puzzle solver/setter creativity
– After a field test of HR’s puzzles