Name of presentation - Columbia University

NLP And The Semantic Web
Dainis Kiusals
COMS E6125
Spring 2010
Natural Language Processing
• 1950s and 1960s – researchers began developing
techniques that allow computers to process and
understand natural language.
• Noam Chomsky studied the ability to capture context.
His theory is based on generative grammars: constructs
that describe how a sentence is formed. They may be used
to create formal grammars through which an input stream
of words may be parsed as a first step toward extracting
its meaning [1].
[sentence]
[noun phrase], [verb phrase]
[determiner], [noun], [verb], [article], [adjective], [adverb]
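The rewriting levels above can be sketched as a toy grammar and parser. This is only an illustration under stated assumptions: the production rules, the five-category lexicon, and the function names (`parse`, `expand`, `parse_sentence`) are hypothetical and not taken from the presentation, and the [article] category is left out for brevity.

```python
# Toy generative grammar: each rule rewrites a symbol into a sequence of symbols.
# The rules and the lexicon are illustrative assumptions, not from the slides.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["determiner", "adjective", "noun"], ["determiner", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"], ["verb", "adverb"], ["verb"]],
}

LEXICON = {
    "determiner": {"the", "a"},
    "adjective": {"large"},
    "noun": {"company", "ship"},
    "verb": {"sells", "runs"},
    "adverb": {"quickly"},
}


def parse(symbol, words, start):
    """Yield (tree, next_position) for every way `symbol` matches words[start:]."""
    if symbol in LEXICON:
        # Word category: consume exactly one word if it belongs to the category.
        if start < len(words) and words[start] in LEXICON[symbol]:
            yield (symbol, words[start]), start + 1
        return
    for production in GRAMMAR[symbol]:
        yield from expand(symbol, production, words, start)


def expand(symbol, production, words, start):
    """Match the symbols of one production in sequence, keeping all alternatives."""
    partial = [([], start)]
    for child in production:
        next_partial = []
        for children, pos in partial:
            for tree, new_pos in parse(child, words, pos):
                next_partial.append((children + [tree], new_pos))
        partial = next_partial
    for children, pos in partial:
        yield (symbol, children), pos


def parse_sentence(text):
    """Return a parse tree covering the whole sentence, or None if none exists."""
    words = text.lower().rstrip(".").split()
    for tree, end in parse("sentence", words, 0):
        if end == len(words):
            return tree
    return None
```

Calling `parse_sentence("The company sells a ship.")` returns a nested tree whose root is `sentence`, while a word sequence the rules cannot generate, such as `"ship sells the"`, returns `None`.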
NLP Issues / Challenges
o Morphology – different forms of words (singular/plural, tense),
e.g. runs, ran, running
o Syntax – grammatical structure (verbs, nouns),
e.g. "The company is ready to sell."
o Spelling – different spellings (and misspellings) of words,
e.g. color/colour, organize/organise
o Text Segmentation – identifying word boundaries
o Word Sense Disambiguation – multiple word meanings,
e.g. bow (bend forward, weapon, ribbon, front of ship)
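To make the morphology and spelling challenges concrete, here is a minimal normalization sketch. It assumes tiny hand-written lookup tables and two crude suffix rules, all hypothetical; a real system would rely on a full lexicon or a proper stemmer or lemmatizer.

```python
# Hypothetical lookup tables covering only the examples on this slide.
SPELLING_VARIANTS = {"colour": "color", "organise": "organize"}
IRREGULAR_FORMS = {"ran": "run"}


def normalize(word):
    """Map a word to a rough base form so that variants compare as equal."""
    word = word.lower()
    # Spelling: collapse known regional variants onto one spelling.
    word = SPELLING_VARIANTS.get(word, word)
    # Morphology: irregular forms need an explicit table.
    if word in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[word]
    # Morphology: strip "-ing", undoing consonant doubling ("runn" -> "run").
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    # Morphology: strip a plural or third-person "-s".
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word
```

With these tables, "runs", "ran" and "running" all normalize to "run", and "colour" normalizes to "color". The suffix rules are deliberately naive and will mishandle many English words.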
Semantic Web
• Proposed by Tim Berners-Lee
(W3C Director) as a method
for adding concepts via
semantic annotation to Web
content.
• W3C is standardizing the RDF
and OWL specifications.
• At the lowest level, concepts
are stored as triples, which are
defined at higher levels by ontologies.
[3]
[2]
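The triple structure described above can be sketched with plain tuples of (subject, predicate, object). The `ex:` identifiers and the `match` helper are hypothetical names chosen for illustration; `rdf:type`, `rdfs:domain` and `rdfs:range` are standard RDF/RDFS terms. This is not an RDF library, only a picture of the data model.

```python
# Each statement is a (subject, predicate, object) triple.
# All "ex:" names are made up for this example.
TRIPLES = {
    # Instance-level facts.
    ("ex:TimBernersLee", "rdf:type", "ex:Person"),
    ("ex:TimBernersLee", "ex:directorOf", "ex:W3C"),
    ("ex:W3C", "rdf:type", "ex:Organization"),
    # Ontology-level statements that constrain how "ex:directorOf" is used.
    ("ex:directorOf", "rdfs:domain", "ex:Person"),
    ("ex:directorOf", "rdfs:range", "ex:Organization"),
}


def match(pattern, triples=TRIPLES):
    """Return the triples matching `pattern`; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )
```

For example, `match(("ex:TimBernersLee", "ex:directorOf", None))` returns the single triple whose object is `ex:W3C`, and `match((None, "rdf:type", None))` lists every typed resource.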
Keyword Search
Queries are processed only as a statistical analysis of
keyword appearances in documents, with some advanced
logical features.
Keyword search does not distinguish between different
interpretations of a word in a given context in the
searched data (corpus), so search results might contain
different uses of a word.
[4]
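A minimal sketch of the keyword approach shows the limitation. The three-document corpus and the function names are assumptions made for illustration, and ranking by raw term counts is a simplification of what production engines do.

```python
from collections import Counter

# Hypothetical corpus: the same keyword appears with three different senses.
DOCUMENTS = {
    "doc1": "The archer drew the bow and released the arrow.",
    "doc2": "Waves broke over the bow of the ship.",
    "doc3": "She tied the ribbon into a bow.",
}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,!?").lower() for w in text.split()]


def keyword_search(query, documents=DOCUMENTS):
    """Rank documents by how often the query terms appear in them."""
    terms = tokenize(query)
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Searching for "bow" returns all three documents with equal scores, because counting occurrences cannot tell the weapon from the front of a ship or a ribbon.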
NLP / Semantic Search
• Increased relevancy of results vs. keyword search.
– Longer query phrases and questions yield better results.
– Makes use of semantic information to attain better results.
• Users need to change their habits (they are used to keywords).
– NLP Search pages need to encourage use of complex queries.
• Web Search vs. Enterprise Search?
– NLP Search may be better suited to smaller domains.
• Top-Down or Bottom-Up approach?
– Top-Down approach relies more on NLP.
– Creating the Semantic Web (Bottom-Up) will be more costly.
NLP / Semantic Web Relationship
• NLP and the Semantic Web complement each other
and will grow together.
• As Semantic Web (RDF and OWL) annotation is
added to Web pages, NLP search engines can take
advantage of this information.
• NLP processes can be used to automate the
generation of content that populates new
Semantic Web annotations.
• Global and domain-specific ontologies (which
represent concepts and their relationships) combined
with NLP techniques define the search process.
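As a rough illustration of NLP-generated annotation, the sketch below turns simple sentences into triples. Regular-expression patterns stand in for real NLP here (a real pipeline would use parsing and entity recognition), and the predicates, patterns, and function names are all hypothetical.

```python
import re

# Hypothetical sentence patterns, each mapped to an illustrative predicate.
PATTERNS = [
    (re.compile(r"^(?P<s>.+?) is the director of (?P<o>.+?)\.?$"), "ex:directorOf"),
    (re.compile(r"^(?P<s>.+?) was acquired by (?P<o>.+?)\.?$"), "ex:acquiredBy"),
]


def to_identifier(phrase):
    """Turn a phrase such as 'Tim Berners-Lee' into 'ex:TimBernersLee'."""
    parts = re.findall(r"[A-Za-z0-9]+", phrase)
    return "ex:" + "".join(p[:1].upper() + p[1:] for p in parts)


def extract_triples(text):
    """Return (subject, predicate, object) triples for sentences that match."""
    triples = []
    # Naive sentence split: break after each period followed by whitespace.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for pattern, predicate in PATTERNS:
            found = pattern.match(sentence)
            if found:
                triples.append(
                    (to_identifier(found["s"]), predicate, to_identifier(found["o"]))
                )
                break
    return triples
```

Sentences that match no pattern are skipped, so coverage depends entirely on the pattern list. That gap is what a full NLP pipeline would be expected to close.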
Case Study: Powerset
1. Founded in San Francisco in 2005 with the goal of creating an
NLP search engine.
2. In 2007 obtained exclusive rights to several decades of
Xerox/PARC NLP research.
3. Launched first public software beta in May 2008 – NLP
search website covering approx. 2.5 million Wikipedia web
pages (also referenced Freebase).
4. Created innovative user interface which leveraged
NLP/semantic search results (ex: highlighting of relevant
phrases/sentences within a larger document).
5. Two months after the public beta, the company was acquired by
Microsoft in order to be incorporated into the Bing search engine.
NLP Search Companies
Resources
1. An Executive's Guide to Information Technology: Principles,
Business Models and Terminology, by Robert Plant and
Stephen Murrell, Cambridge University Press, 2007.
2. Enterprise 2.0 Implementation, Chapter 13, by Aaron C.
Newman and Jeremy Thomas, McGraw-Hill/Osborne, 2009.
3. Encyclopedia of Knowledge Management, "RDF and OWL",
by David G. Schwartz (ed.), IGI Global, 2006.
4. Semantic Knowledge Management: An Ontology-Based
Framework, by Antonio Zilli (ed.) et al., IGI Global, 2009.
5. http://www.parc.com/work/focus-area/NLP/
6. http://www.powerset.com/
Resources/information taken from Full Paper submitted 3/12/10.