Learning to Parse Database
Queries Using Inductive Logic
Programming
施林锋
Department of Computer Science and Technology, Nanjing University
Outline
• Introduction
• Learning to Parse DB Queries
• Overview of CHILL
• Parsing DB Queries
• Experimental
• Future work & Conclusions
• References & Related articles
Introduction
• Empirical (corpus-based) methods for constructing natural language
systems replace hand-generated rules with models learned from corpora
• Statistical and probabilistic approaches to constructing parsers
• Stochastic grammars (Black, Lafferty, …)
• Transition networks (Miller et al.)
• Acid test for empirical methods: do they enable the construction of
better natural language systems?
• The authors aim to use CHILL to engineer a natural language front end for a database-query task.
Overview of CHILL
• CHILL: Constructive Heuristics Induction for Language Learning
• CHILL is a general approach to the problem of inducing natural
language parsers.
• CHILL uses inductive logic programming to learn a deterministic shift-reduce parser written in Prolog.
• Input:
• A set of training instances, each a <sentence, desired parse> pair
• Output:
• A shift-reduce parser that maps sentences to parses
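The control loop of a deterministic shift-reduce parser can be sketched as follows. This is only a toy illustration in Python, not CHILL itself: the grammar in `RULES`, the `LEXICON`, and the hand-written `choose_action` policy are all hypothetical stand-ins for the control rules that CHILL would induce from training pairs.

```python
# Toy deterministic shift-reduce parser.
# RULES, LEXICON and choose_action are hypothetical; in CHILL the control
# decisions are learned by inductive logic programming, not hand-written.

# Reduce rules: categories on top of the stack -> new category.
RULES = [
    (("det", "noun"), "np"),
    (("verb", "np"), "vp"),
    (("np", "vp"), "s"),
]

# Word -> category.
LEXICON = {"the": "det", "dog": "noun", "cat": "noun", "chased": "verb"}


def label(node):
    """Return the category of a stack node, stored as (category, content)."""
    return node[0]


def choose_action(stack, buffer):
    """Deterministic policy: reduce if any rule matches, else shift, else halt."""
    for rhs, lhs in RULES:
        n = len(rhs)
        if len(stack) >= n and tuple(label(x) for x in stack[-n:]) == rhs:
            return ("reduce", lhs, n)
    if buffer:
        return ("shift",)
    return ("halt",)


def parse(sentence):
    """Run the shift-reduce loop and return the final stack."""
    buffer = [(LEXICON[w], w) for w in sentence.lower().split()]
    stack = []
    while True:
        action = choose_action(stack, buffer)
        if action[0] == "shift":
            stack.append(buffer.pop(0))
        elif action[0] == "reduce":
            _, lhs, n = action
            children = stack[-n:]
            del stack[-n:]
            stack.append((lhs, children))
        else:
            break
    return stack


result = parse("the dog chased the cat")
print(result[0][0])  # category of the single node left on the stack
```

Because exactly one action is chosen at every step, the parser never backtracks; the quality of the parse depends entirely on the control policy, which is the part CHILL learns.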
Parsing DB Queries
• Example
• What is the capital of the state with the largest population?
answer(C, (capital(S,C),largest(P,(state(S),population(S,P))))).
• What are the major cities in Kansas?
answer(C, (major(C), city(C), loc(C, S), equal(S, stateid(kansas)))).
• Query language
• Logical form
• Provides a more straightforward mapping from natural language utterances than SQL
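To make the logical form concrete, the second example query can be read as a conjunction of conditions over one answer variable. The Python sketch below mirrors that reading on a tiny fact table. The facts, the `MAJOR` set, and the function name `major_cities_in` are illustrative assumptions and are not taken from Geobase.

```python
# Illustrative facts only; these are NOT the Geobase contents.
# Each pair plays the role of city(C) together with loc(C, S).
CITY_IN_STATE = {
    ("wichita", "kansas"),
    ("topeka", "kansas"),
    ("dodge city", "kansas"),
    ("omaha", "nebraska"),
}

# Hypothetical stand-in for the major(C) predicate.
MAJOR = {"wichita", "topeka", "omaha"}


def major_cities_in(state):
    """Mirror answer(C, (major(C), city(C), loc(C, S), equal(S, stateid(state))))."""
    return sorted(
        c
        for (c, s) in CITY_IN_STATE   # city(C), loc(C, S)
        if s == state                 # equal(S, stateid(state))
        and c in MAJOR                # major(C)
    )


answers = major_cities_in("kansas")
print(answers)
```

Each conjunct of the logical form becomes one filter, which is the sense in which the logical form sits closer to the structure of the English question than a hand-written SQL join would.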
Parsing DB Queries
• Database
• United States geography database system
• An existing natural language interface called Geobase
• Geobase contains 800 Prolog facts about states: capital city, population, area,
major rivers, major cities, highest and lowest points
Parsing DB Queries
• Query language – Geoquery
• Basic Objects
Parsing DB Queries
• Query language – Geoquery
• Basic relations (right)
• Meta-predicate (left)
Experimental
• 250 sentences paired with their parses
• Question patterns:
• which states | where is | what be/states/rivers (203 in total)
• how many/long/large/high (41 in total)
• give me…
• name the rivers in arkansas (6 in total)
• Questions mainly ask about states, rivers, cities, and populations, often combined with superlatives
Experimental
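• Random splits
• 225 training examples, 25 test examples
• 10-fold cross-validation
• CHILL parses each test sentence into a query, and the query is then executed
• Evaluation
• A parse counts as correct if its query returns the same answer as the correct query; otherwise it counts as incorrect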
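The protocol on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `ten_fold_splits` and `accuracy` are hypothetical helper names, and `parser` and `execute` are placeholders for the learned parser and the database query engine, neither of which is reproduced here.

```python
import random


def ten_fold_splits(examples, n_folds=10, seed=0):
    """Shuffle once, then yield (train, test) pairs, one per fold.

    With 250 examples and 10 folds, each split has 225 training and
    25 test examples. If the corpus size is not divisible by n_folds,
    the leftover examples are only ever used for training.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    fold_size = len(items) // n_folds
    for k in range(n_folds):
        test = items[k * fold_size:(k + 1) * fold_size]
        train = items[:k * fold_size] + items[(k + 1) * fold_size:]
        yield train, test


def accuracy(parser, execute, test):
    """Fraction of test sentences whose query returns the correct answer.

    `parser` maps a sentence to a query (or None if it fails to parse);
    `execute` runs a query against the database. Both are assumptions.
    """
    correct = 0
    for sentence, gold_query in test:
        predicted = parser(sentence)
        if predicted is not None and execute(predicted) == execute(gold_query):
            correct += 1
    return correct / len(test)


splits = list(ten_fold_splits(range(250)))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Comparing executed answers rather than query strings means that two differently written queries still count as a match when they retrieve the same result.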
Experimental
• Result
• CHILL outperforms the existing
system when trained on 175
or more examples
• Two kinds of failure
• Wrong parses
• Wrong answers
Conclusions & Future work
• Conclusions
• CHILL parsers outperform an existing system
• Empirical approaches are important for NLP applications
• Future work
• Much larger corpora and other domains
• Extent to which performance can be improved by corpus “manufacturing”
References & Related articles
• Zelle J M, Mooney R J. Learning semantic grammars with constructive
inductive logic programming[C]//AAAI. 1993: 817-822.
• Zelle J M, Mooney R J. Inducing deterministic Prolog parsers from
treebanks: A machine learning approach[C]//AAAI. 1994: 748-753.
• Zettlemoyer L S, Collins M. Learning to map sentences to logical form:
Structured classification with probabilistic categorial grammars[J].
arXiv preprint arXiv:1207.1420, 2012.
• Q&A