Transcript Crossword
Combining KR and search:
Crossword puzzles
Next: Logic representations
Reading: C. 7.4-7.8
1
Changes in Homework
Mar 4th: Hand in written design, planned
code for all modules
Mar 9th: midterm
Mar 25th: Fully running system due
Mar 30th: Tournament begins
2
Changes in Homework
Dictionary
Use dictionary provided; do not use your own
Start with 300 words only
Switch to larger set by time of tournament
Representation of dictionary is
important to reducing search time
Using knowledge to generate word
candidates could also help
3
Midterm Survey
Start after 9AM Friday and finish by
Thursday, Mar. 4th
Your answers are important: they will
affect remaining class structure
4
Crossword Puzzle Solver
Proverb: Michael Litman, Duke Univ
Developed by his AI class
Combines knowledge from multiple
sources to solve clues (clue/target)
Uses constraint propogation in
combination with probabilities to select
best target
5
Algorithm Overview
Independent programs specialize in different
types of clues – knowledge experts
Information retrieval, database search, machine learning
Each expert module generates a candidate list
(with probabilities)
Centralized solver
Merges the candidates lists for each clue
Places candidates on the puzzle grid
6
Performance
Averages 95.3% words correct and
98.1% letters correct
Under 15 minutes/puzzle
Tested on a sample of 370 NYT puzzles
Misses roughly 3 words or 4 letters on a
daily 15X15 puzzle
7
Questions
Is this approach any more intelligent
than the chess playing programs?
Does the use of knowledge correspond
to intelligence?
Do any of the techniques for generating
words apply to Scrabble?
8
9
To begin: research style
Study of existing puzzles
How hard?
What are the clues like?
What sources of knowledge might be helpful?
Crossword Puzzle database (CWDB)
350,000 clue-target pairs
>250,000 unique pairs
= # of puzzles seen over 14 years at rate of one
puzzle/day
10
How novel are crossword
puzzles?
Given complete database and a new
puzzle, expect to have seen
91% of targets
50% of clues
34% of clue target pairs
96% of individual words in clues
11
12
Categories of clues
Fill in the blank:
28D: Nothing ____: less
Trailing question mark
4D: The end of Plato?:
Abbreviations
55D: Key abbr: maj
13
Expert Categories
Synonyms
40D Meadowsweet: spiraea
Kind-of
27D Kind of coal or coat: pea
“pea coal” and “pea coat” standard phrases
Movies
50D Princess in Woolf’s “Orlando”: sasha
Geography
59A North Sea port: aberdeen
Music
2D “Hold Me” country Grammay winner, 1988: oslin
Literature
53A Playwright/novelist Capek: karel
Information retrieval
6D Mountain known locally as Chomolungma: everest
14
15
16
17
Candidate generator
Farrow of “Peyton Place”: mia
Movie module returns:
0.909091 mia
0.010101 tom
0.010101 kip
0.010101 ben
0.010101 peg
0.010101 ray
18
19
20
Ablation tests
Removed each module one at a time,
rerunning all training puzzles
No single module changed overall percent
correct by more than 1%
Removing all modules that relied on CWDB
94.8% to 27.1% correct
Using only the modules that relied exclusively
on CWDB
87.6% correct
21
Word list modules
WordList, WordListBig
Ignore their clues and return all words of correct length
WordList
655,000 terms
WordListBig
WordList plus constructed terms
First and last names, adjacent words from clues
2.1 million terms, all weighted equally
5D 10,000 words, perhaps: novelette
Wordlist-CWDB
58,000 unique targets
Returns all targets of appropriate length
Weights with estimates of their “prior” probabilities as targets of
arbitrary clues
Examine frequency in crossword puzzles and normalize to account for bias
caused by letters intersecting across and down terms
22
CWDB-specific modules
Exact Match
Returns all targets of the correct length associated with the clue
Example error: it returns eeyore for 19A Pal of Pooh: tigger
Transformations
Learns transformations to clue-target pairs
Single-word substitution, remove one phrase from beginning or end
and add another, depluralizing a word in clue, pluralize word in target
Nice X <-> X in France
X for short <-> X abbr.
X start <-> Prefix with X
X city <-> X capital
51D: Bugs chaser: elmer, solved by Bugs pursuer: elmer and the
transformation rule X pursuer <-> X chaser
http://www.oneacross.com
23
Information retrieval modules
Encyclopedia
For each query term, compute distribution of terms
“close” to query
Counted 10-k times every times it apears at a distance of
k<10 from query term
Extremely common terms (as, and) are ignored
Partial match
For a clue c, find all clues in CWDB that share words
For each such clue, give its target a weight
LSI-Ency, LSI-CWDB
Latent semantic indexing (LSI) identifies correlations
between words: synonyms
Return closest word for each word in the clue
24
Database Modules
Movie
www.imdb.com
Looks for patterns in the clue and formulates query to database
Quoted titles: 56D “The Thief of Baghdad” role: abu
Boolean operations: Cary or Lee: grant
Music, literary, geography
Simple pattern matching of clue (keywords “city”, “author”,
“band”, etc) to formulate query
15A “Foundation of Trilogy” author: asimov
Geography database: Getty Information Institute
25
Synonyms
WordNet
Look for root forms of words in the clue
Then find variety of related words
49D Chop-chop: apace
Synonyms of synonyms
Forms of related words converted to forms of
clue word (number, tense)
18A Stymied: thwarted
Is this relevant to Scrabble?
26
Syntactic Modules
Fill-in-the-blanks
>5% clues
Search databases (music, geography, literary and
quotes) to find clue patterns
36A Yerby’s “A Rose for _ _ _ Maria”: ana
Kindof
Pattern: for _ _ _ Maria
Allow any 3 characters to fill the blanks
Pattern matching over short phrases
50 clues of this type
“type of” (A type of jacket: nehru)
“starter for” (Starter for saxon: anglo)
“suffix with” (Suffix with switch or sock: eroo
27
Implicit Distribution Modules
Some targets not included in any database,
but more probable than random
Schaeffer vs. srhffeeca
Bigram module
Generates all possible letter sequences of the given length
by returning a letter bigram distribution over all possible
strings, learned from CWDB
Lowest probability clue-target, but higher probability than
random sequence of letters
Honolulu wear: hawaiianmuumuu
How could this be used for Scrabble?
28
Questions
Is this approach any more intelligent
than the chess playing programs?
Does the use of knowledge correspond
to intelligence?
Do any of the techniques for generating
words apply to Scrabble?
29