A Natural Language Interface
for Crime-related Spatial Queries
Chengyang Zhang, Yan Huang, Rada Mihalcea, Hector Cuellar
Department of Computer Science and Engineering
University of North Texas
ISI 2009 Presentation
Outline
• Motivation
• Related Work
• Proposed Method
• System Evaluation
Motivation
• The databases and query interfaces hosted by Federal and state justice
departments are heterogeneous and complicated.
Motivation
• Need tools for crime-related spatial queries, for example:
  • “Find a police office near the school” [1]
  • “Find a house in a neighborhood with a low crime rate” [2]
Motivation
• Neither web forms nor keyword search has the expressive power and
flexibility desired in crime-related spatial queries.
• But natural language does!
  • No need for training
  • No need for a proprietary user interface or an esoteric formal language such as SQL or XQuery
  • Ideal for ad hoc, real-time queries in emergency conditions
Our Contributions
• We propose a method to translate crime-related natural language spatial
queries into spatial data queries
• We implement a prototype query system
• Experiments show that the system achieves results significantly better
than those obtained by using Google Maps.
Related Work
• Syntax-based methods [3-4] use templates or grammar rules to map
natural language sentences onto database schemas
  • Simple but not scalable
  • May sometimes lead to serious errors
• Semantic parsing algorithms [5-9] preserve syntactic dependencies but also
seek to enforce semantic constraints over the possible mappings
  • The quality of the mapping is significantly improved
  • The PRECISE system [9] focuses on high precision only
CSCE 5290
Related Work
• Lambda-calculus encoding can be used as the intermediate
representation between natural language and database queries [10]
  • A training corpus is used to derive lexicons and grammars for the specific domain
  • The approach was found to lead to good results
• The structure of XML documents can be used to match natural language parse
trees [11]
  • Identifies a meaningful lowest common ancestor structure (MLCAS) from the tree structure
  • Includes an interactive component to receive help from the user when formulating the query
System Framework
1. Part of Speech Tagging
• In POS tagging, we employ the classic Viterbi algorithm.
  • Dynamic programming framework coupled with a Markov assumption
  • Efficient and widely used
  • Trained on the manually labeled Penn Treebank dataset
Running Example:
I wish to find a police department within 2 miles of a law court
POS Tagging:
I/NP wish/VB to/IN find/VB a/DT police/NN department/NN
within/IN 2/CD miles/NNS of/IN a/DT law/NN court/NN
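The tagging step can be illustrated with a minimal Viterbi sketch in Python. It is not the authors' implementation: the one-sentence training set stands in for the Penn Treebank, add-one smoothing is an assumption, and the names `train_hmm`, `log_prob`, and `viterbi` are hypothetical.

```python
import math
from collections import defaultdict

# Toy training data: the running example, standing in for the Penn Treebank.
TAGGED = ("I/NP wish/VB to/IN find/VB a/DT police/NN department/NN "
          "within/IN 2/CD miles/NNS of/IN a/DT law/NN court/NN")
TRAIN = [[tuple(tok.split("/")) for tok in TAGGED.split()]]

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            vocab.add(word.lower())
            prev = tag
    return trans, emit, vocab

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counts.get(key, 0) + 1) / (sum(counts.values()) + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Most likely tag sequence under a first-order Markov assumption."""
    tags = list(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    words = [w.lower() for w in words]
    # Each column maps tag -> (best log score ending in tag, previous tag).
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                              for p in tags)
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

sentence = [word for word, _ in TRAIN[0]]
print(list(zip(sentence, viterbi(sentence, *train_hmm(TRAIN)))))
```

Dynamic programming keeps only the best-scoring path into each tag at each position, so decoding costs time linear in the sentence length and quadratic in the number of tags.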
2. Semantic Parsing
• In semantic parsing, we identify three types of “key words” using the
parse tree.
  • Target object
  • Spatial predicate
  • Reference object
[Figure: example parse tree]
2. Semantic Parsing
Running Example:
I wish to find a police department within 2 miles of a law court
Semantic Parsing:
Target Object: police department
Spatial predicate: within 2 miles
Reference object: law court
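A rough Python sketch of how the three key phrases could be pulled out of the tagged running example follows. It is a simplification, not the method on the slides: it scans a flat tag sequence instead of a parse tree, and the preposition lexicon and the function name `extract_key_phrases` are hypothetical.

```python
# Hypothetical lexicon of prepositions that can start a spatial predicate.
SPATIAL_PREPOSITIONS = {"within", "near", "inside", "around"}

TAGGED = ("I/NP wish/VB to/IN find/VB a/DT police/NN department/NN "
          "within/IN 2/CD miles/NNS of/IN a/DT law/NN court/NN")

def extract_key_phrases(tagged):
    """Return target object, spatial predicate, and reference object."""
    # 1. Locate the preposition that opens the spatial predicate.
    idx = next((i for i, (word, tag) in enumerate(tagged)
                if tag == "IN" and word.lower() in SPATIAL_PREPOSITIONS), None)
    if idx is None:
        raise ValueError("no spatial predicate found")
    # 2. Target object: the run of nouns directly before the preposition.
    start = idx
    while start > 0 and tagged[start - 1][1].startswith("NN"):
        start -= 1
    target = [word for word, _ in tagged[start:idx]]
    # 3. Spatial predicate: preposition plus an optional "<number> <unit>".
    predicate = [tagged[idx][0]]
    pos = idx + 1
    if pos < len(tagged) and tagged[pos][1] == "CD":
        predicate.append(tagged[pos][0])
        pos += 1
        if pos < len(tagged) and tagged[pos][1].startswith("NN"):
            predicate.append(tagged[pos][0])
            pos += 1
    # 4. Reference object: the nouns that follow the predicate.
    reference = [word for word, tag in tagged[pos:] if tag.startswith("NN")]
    return {"target": " ".join(target),
            "predicate": " ".join(predicate),
            "reference": " ".join(reference)}

tagged = [tuple(tok.split("/")) for tok in TAGGED.split()]
print(extract_key_phrases(tagged))
```

A parse tree gives the phrase boundaries directly; the flat scan above only works for sentences shaped like the running example.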
3. Schema Matching
• In schema matching, we match the target and reference spatial objects
against the backend spatial database using
  • Table names
  • Attribute names
  • Database content
• We then perform a spatial join for each retrieved candidate pair based on
the spatial predicate
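The two steps above can be sketched in Python under strong assumptions: the toy in-memory `DATABASE`, its planar coordinates in miles, the token-overlap scoring, and the function names `match_table` and `spatial_join_within` are all hypothetical and do not come from the prototype system.

```python
import math

# Hypothetical toy database: table name -> rows, coordinates in planar miles.
DATABASE = {
    "police_department": [
        {"name": "Central Station", "x": 0.0, "y": 0.0},
        {"name": "North Substation", "x": 5.0, "y": 5.0},
    ],
    "law_court": [
        {"name": "Municipal Court", "x": 1.0, "y": 1.0},
    ],
}

def tokens(text):
    """Lowercased word set; underscores count as word separators."""
    return set(text.lower().replace("_", " ").split())

def match_table(phrase, database):
    """Pick the table that best matches a key phrase.

    Tables are ranked by token overlap with the table name first, then
    attribute names, then row content.
    """
    query = tokens(phrase)

    def score(item):
        name, rows = item
        attrs = set().union(*(tokens(a) for row in rows for a in row))
        content = set().union(*(tokens(str(v)) for row in rows for v in row.values()))
        return (len(query & tokens(name)), len(query & attrs), len(query & content))

    best = max(database.items(), key=score)
    return best[0] if any(score(best)) else None

def spatial_join_within(targets, references, max_distance):
    """Pairs (target, reference) whose Euclidean distance is <= max_distance."""
    return [(t["name"], r["name"])
            for t in targets for r in references
            if math.hypot(t["x"] - r["x"], t["y"] - r["y"]) <= max_distance]

target_table = match_table("police department", DATABASE)
reference_table = match_table("law court", DATABASE)
print(spatial_join_within(DATABASE[target_table], DATABASE[reference_table], 2.0))
```

A real spatial database would evaluate the distance predicate with a spatial index instead of the nested loop used here.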
Query Interface
Experimental Evaluation
• Database contains real spatial data obtained from the City of Denton
  • 32 tables
  • Includes crime-related objects such as police offices and law courts
• Gold standard: human-prepared answers for 30 different crime-related
queries.
• Baseline: top 10 answers from Google Maps
• Result:
Summary
• We proposed a method to build a natural language interface to spatial
database queries. The prototype system demonstrated the effectiveness of
our approach on crime-related spatial queries.
• In future work, we plan to extend our system by increasing the
dataset size and improving the accuracy of the tagging and parsing
algorithms. We will collect more user queries and improve the system
performance based on a larger evaluation dataset.
References
1. http://maps.google.com/
2. http://maps.met.police.uk/
3. I. Androutsopoulos, G. Ritchie, and P. Thanisch, “Natural language interfaces to databases – an introduction,” Journal
of Natural Language Engineering, vol. 1, no. 1, 1995.
4. W. Woods, R. Kaplan, and B. Webber, “The Lunar sciences natural language information system,” Bolt Beranek and
Newman, Tech. Rep., 1972.
5. R. Ge and R. J. Mooney, “A statistical semantic parser that integrates syntax and semantics,” in Proceedings of the
Ninth Conference on Computational Natural Language Learning (CoNLL-2005), Ann Arbor, MI, Jul. 2005, pp. 9–16.
6. R. J. Kate and R. J. Mooney, “Using string-kernels for learning semantic parsers,” in Proceedings of the 21st
International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational
Linguistics (COLING/ACL-06), Sydney, Australia, July 2006, pp. 913–920.
7. R. J. Mooney, “Learning for semantic parsing,” in Computational Linguistics and Intelligent Text Processing:
Proceedings of the 8th International Conference, CICLing 2007, Mexico City, A. Gelbukh, Ed. Berlin: Springer Verlag,
2007, pp. 311–324.
8. Y. Wong and R. J. Mooney, “Learning for semantic parsing with statistical machine translation,” in Proceedings of
Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics
Annual Meeting (HLT-NAACL-06), New York City, NY, 2006, pp. 439–446.
9. A. Popescu, A. Armanasu, and O. Etzioni, “Modern natural language interfaces to databases: Composing statistical
parsing with semantic tractability,” in Proceedings of the 20th International Conference on Computational Linguistics
(COLING 2004), Geneva, Switzerland, 2004.
10. L. Zettlemoyer and M. Collins, “Learning to map sentences to logical form: Structured classification with probabilistic
categorial grammars,” in Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI-05),
2005.
11. Y. Li, H. Yang, and H. Jagadish, “NaLIX: an interactive natural language interface for querying XML,” in Proceedings
of SIGMOD 2005, Baltimore, MD, 2005.