Machine Learning in Natural Language Processing

Noriko Tomuro
November 16, 2006
What is NLP?
• Natural Language Processing (NLP) is a field in Computer Science devoted to creating computers that use natural language as input and/or output.
NLP is AI-complete
• “The most difficult problems in AI manifest themselves in human language phenomena.”
• Use of language is the touchstone of intelligent behavior.
NLP Tasks
• NLP can be a stand-alone application or a component embedded in another system.
• NLP essentially analyzes (natural-language) sentences linguistically. It involves (minimally) the three steps below; a short code sketch after this list illustrates them:
  1. Determining the part-of-speech category of words (e.g. noun, verb).
  2. Identifying the syntactic structure of a sentence.
  3. Deriving the meaning of the sentence.
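To make the three steps concrete, here is a brief sketch that performs all three analyses on one sentence. This is an editorial illustration, not part of the original lecture; it assumes the modern spaCy library and its small English model are installed.

    import spacy

    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("John ate the cake.")

    for token in doc:
        # token.pos_  : step 1, the part-of-speech category
        # token.dep_  : step 2, the token's role in the syntactic structure
        # token.lemma_: a first step toward meaning (the dictionary form)
        print(token.text, token.pos_, token.dep_, token.lemma_)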
NLP Applications
• Speech Recognition
• Sentence Analysis
  • Parsing for syntactic analysis
  • Word sense disambiguation for semantic analysis
• Dialogue Systems, e.g.
  • Tutoring systems
• Question-Answering Systems
• Machine Translation
• Automatic Text Summarization (e.g. the Newsblaster system at Columbia University)
• and much more
Part-Of-Speech (POS)
• Identifying POS is not as easy as one might think, because of ambiguity.
• “He authored the book.”
  • “he” – PRON
  • “author” – N or V
  • “the” – DET
  • “book” – N or V
• Without knowing the structure (or sometimes the meaning) of the sentence, ambiguous categories cannot be correctly disambiguated.
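The example can be reproduced with an off-the-shelf tagger. Below is a minimal sketch using NLTK, an editorial addition; it assumes nltk is installed and the 'punkt' and 'averaged_perceptron_tagger' data packages have been downloaded.

    import nltk

    # Tokenize, then tag each token with its most likely POS category.
    tokens = nltk.word_tokenize("He authored the book.")
    print(nltk.pos_tag(tokens))
    # e.g. [('He', 'PRP'), ('authored', 'VBD'), ('the', 'DT'), ('book', 'NN')]

Note how “authored” and “book” are disambiguated (verb vs. noun) from context, which is exactly the problem described above.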
Stemming is NOT NLP
• Note that stemming (e.g. the Porter stemmer) is a technique in Information Retrieval; it’s NOT NLP.
• Stemming only removes the common morphological and inflectional endings from words:
  • “police”, “policy”, “policies” => “polic”
  • “went” => “went”
• Whereas proper NLP analysis (lemmatization) yields:
  • “police” => “police”, “policies” => “policy”
  • “went” => “go”
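The contrast is easy to see in code. A small sketch using NLTK’s Porter stemmer and WordNet lemmatizer (an editorial example; it assumes nltk and its 'wordnet' data package):

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    # Stemming: crude suffix stripping, which may produce non-words.
    stemmer = PorterStemmer()
    for word in ["police", "policy", "policies", "went"]:
        print(word, "->", stemmer.stem(word))

    # Lemmatization: dictionary-based reduction to the base form.
    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("policies"))       # 'policy'
    print(lemmatizer.lemmatize("went", pos="v"))  # 'go'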
Syntactic Structure
• Grammar rules (often in a CFG) specify the grammatically correct ways to form phrases (and a sentence).

Grammar:
  R0: S   → NP VP
  R1: NP  → Det N
  R2: VP  → VG NP
  R3: VG  → V
  R4: NP  → "John"
  R5: V   → "ate"
  R6: Det → "the"
  R7: N   → "cake"

The slide shows the parse tree for “John ate the cake”; in bracketed form:

  (S (NP "John")
     (VP (VG (V "ate"))
         (NP (Det "the") (N "cake"))))
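The same grammar can be written down in NLTK’s CFG notation. This encoding is an editorial sketch (the lecture used the author’s own parser):

    import nltk

    # Rules R0-R7 from the slide, in NLTK's grammar syntax.
    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> Det N | 'John'
        VP  -> VG NP
        VG  -> V
        V   -> 'ate'
        Det -> 'the'
        N   -> 'cake'
    """)
    print(grammar)  # lists the productions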
Parsing Algorithms
• Computational complexity (with no optimization) is exponential.
• Various parsing algorithms:
  • Top-down Parsing – (top-down) derivation
  • Bottom-up Parsing
  • Chart Parsing
  • Earley’s Algorithm – most efficient, O(n³)
  • Left-corner Parsing – optimization of Earley’s
  • and lots more…
• Demo using my CF parser
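The original demo used the author’s own context-free parser; as a rough stand-in, NLTK’s chart parser can parse the example sentence with the grammar above (again an editorial sketch):

    import nltk

    # Same toy grammar (R0-R7) as on the earlier slide.
    grammar = nltk.CFG.fromstring("""
        S   -> NP VP
        NP  -> Det N | 'John'
        VP  -> VG NP
        VG  -> V
        V   -> 'ate'
        Det -> 'the'
        N   -> 'cake'
    """)

    # Chart parsing stores partial results in a table, avoiding the
    # exponential blow-up of naive search.
    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("John ate the cake".split()):
        tree.pretty_print()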
Semantics
• Derive the meaning of a sentence.
• Often applied on the result of syntactic analysis.

  “John ate the cake.”
    NP    V     NP

  maps to the semantic frame:

  ((action INGEST)    ; syntactic verb
   (actor  JOHN-01)   ; syntactic subj
   (object FOOD))     ; syntactic obj

• To do semantic analysis, we need a (semantic) dictionary (e.g. WordNet, http://www.cogsci.princeton.edu/~wn/).
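As a toy illustration of this mapping (entirely editorial; a real system would consult a semantic dictionary such as WordNet), syntactic roles can be rewritten into frame slots:

    # Hypothetical word-to-concept lexicon standing in for a real
    # semantic dictionary.
    LEXICON = {"ate": "INGEST", "John": "JOHN-01", "cake": "FOOD"}

    def build_frame(subj, verb, obj):
        """Map syntactic subj/verb/obj to a flat semantic frame."""
        return {"action": LEXICON.get(verb, verb.upper()),
                "actor":  LEXICON.get(subj, subj.upper()),
                "object": LEXICON.get(obj, obj.upper())}

    print(build_frame("John", "ate", "cake"))
    # {'action': 'INGEST', 'actor': 'JOHN-01', 'object': 'FOOD'}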
NLP is Hard…
• Understanding natural languages is hard … because of inherent ambiguity.
• Engineering NLP systems is also hard … because of:
  • the huge amount of resources needed (e.g. grammar, dictionary, documents to extract statistics from)
  • the computational complexity (intractability) of analyzing a sentence
Ambiguity
• “At last, a computer that understands you like your mother.”
• Three possible (syntactic) interpretations:
  1. It understands you as well as your mother understands you.
  2. It understands that you like your mother.
  3. It understands you as well as it understands your mother.
Empirical Approaches to NLP
• Formal analysis of natural languages is too hard and computationally heavy.
• Abandon complete/correct solutions, and shoot for “approximate” solutions using cheaper/faster techniques.
• Collect data from documents (a corpus), rather than building linguistic resources manually.
Part-Of-Speech Tagging
• Typically, from a corpus where words are already tagged with POS categories, collect statistics of n-word sequences (n-grams) => probabilities of POS sequences.
• For a given untagged/test sentence, assign the most probable POS tag to each word, starting from the beginning, based on the POS tags of the preceding words.
• Popular techniques:
  • Hidden Markov Models (HMM) – with parameter estimation using the Viterbi and EM algorithms
  • Transformation rules (e.g. the Brill tagger)
  • Various ML classification algorithms
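A minimal sketch of the n-gram approach using NLTK’s tagger classes (an editorial example; it assumes nltk and its bundled 'treebank' corpus sample). It also shows the backoff idea that reappears later under data sparseness:

    import nltk
    from nltk.corpus import treebank

    # Tagged sentences serve as the annotated training corpus.
    train_sents = treebank.tagged_sents()[:3000]

    # A bigram tagger backs off to a unigram tagger, which backs off
    # to a default tag for words never seen in training.
    default = nltk.DefaultTagger("NN")
    unigram = nltk.UnigramTagger(train_sents, backoff=default)
    bigram  = nltk.BigramTagger(train_sents, backoff=unigram)

    print(bigram.tag("He authored the book .".split()))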
Probabilistic Parsing
• For ambiguous sentences, we’d like to know which parse tree is more likely than others.
• So we must assign a probability to each parse tree … but how?
• The probability of a parse tree t is

    P(t) = ∏ p(r)

  where r ranges over the rules used in t, and p(r) is obtained from an (annotated) corpus.
• Again, parameters are estimated using various ML techniques.
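NLTK supports exactly this rule-probability model; below is a toy sketch (an editorial example with invented probabilities, and with the VG level inlined into VP for brevity):

    import nltk

    # Probabilities of rules sharing a left-hand side must sum to 1.
    pcfg = nltk.PCFG.fromstring("""
        S   -> NP VP   [1.0]
        NP  -> Det N   [0.5]
        NP  -> 'John'  [0.5]
        VP  -> V NP    [1.0]
        V   -> 'ate'   [1.0]
        Det -> 'the'   [1.0]
        N   -> 'cake'  [1.0]
    """)

    # ViterbiParser returns the single most probable parse.
    parser = nltk.ViterbiParser(pcfg)
    for tree in parser.parse("John ate the cake".split()):
        print(tree)
        print(tree.prob())  # the product of the used rules' probabilities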
Partial (Chunk) Parsing
• Abandon full parsing; instead aim to obtain just phrases (chunks).
• Build a flat structure (instead of the hierarchical tree built by full parsing).
• Each chunk is identified by a regular expression – a finite-state automaton. => Polynomial time complexity.
• Then a chunker is a cascade of finite-state automata.
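NLTK’s RegexpParser implements this style of chunking over POS-tagged input; a short sketch (the NP pattern is an illustrative choice, not from the lecture):

    import nltk

    # One chunk rule: an NP is an optional determiner, any number of
    # adjectives, then one or more nouns.
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

    tagged = [("He", "PRP"), ("ate", "VBD"), ("the", "DT"),
              ("delicious", "JJ"), ("cake", "NN")]
    print(chunker.parse(tagged))  # a flat tree with one NP chunk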
Semantic Analysis
• Natural language sentences are ambiguous, largely because a word often has several meanings/senses.
  • e.g. “bass” (n) has two senses:
    1. a type of fish
    2. tones of low frequency
• Which sense is used?
  • “The bass part of the song is very moving.”
  • “I went fishing for some sea bass.”
• It’s easy for humans, but not for computers.
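The sense inventory itself comes from a semantic dictionary. Looking “bass” up in WordNet through NLTK (an editorial sketch; assumes the 'wordnet' data package):

    from nltk.corpus import wordnet as wn

    # WordNet lists several noun senses of "bass", including the fish
    # and the low-frequency-tone senses mentioned above.
    for synset in wn.synsets("bass", pos=wn.NOUN):
        print(synset.name(), "-", synset.definition())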
Word Sense Disambiguation
• The task of assigning the proper sense to words in a sentence (thus a classification task).
• Using a training corpus annotated with proper senses, obtain statistics of n-word sequences (their word senses).
• Apply classification algorithms, such as:
  • Decision Trees
  • Support Vector Machines
  • Conditional Random Fields (CRF)
• Difficulty with data sparseness. Techniques:
  • Smoothing
  • Backoff models
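A quick way to try WSD in code is NLTK’s implementation of the Lesk algorithm, a dictionary-overlap method rather than one of the supervised classifiers listed above (an editorial sketch; assumes the 'wordnet' data package):

    from nltk.wsd import lesk

    # Lesk picks the WordNet sense whose definition overlaps most with
    # the surrounding context words.
    context = "I went fishing for some sea bass".split()
    sense = lesk(context, "bass", pos="n")
    if sense is not None:
        print(sense.name(), "-", sense.definition())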
Other NLP’ish ML Tasks
• Text Categorization
  • Classify a document into one of the document categories (see the textbook on Naïve Bayes; a sketch follows below).
• Multi-lingual Retrieval
  • Enter a query in one language, and retrieve relevant documents of ANY language.
  • The Machine Translation part is done by ML.
• Automatic construction of domain ontology
  • Build a conceptual hierarchy of a specific domain.
• and many, many more!
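A minimal text-categorization sketch with a Naïve Bayes classifier, here via scikit-learn (the library choice and the toy data are editorial, not from the lecture):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy labeled corpus.
    docs = ["the team won the game", "stocks fell sharply today",
            "the coach praised the players", "the market rallied"]
    labels = ["sports", "finance", "sports", "finance"]

    # Bag-of-words counts feed a multinomial Naive Bayes model.
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)
    clf = MultinomialNB().fit(X, labels)

    print(clf.predict(vectorizer.transform(["the players won"])))
    # expected: ['sports']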
Analyzing Web Documents
• Recently there have been many NLP applications which analyze (not just retrieve) web documents:
  • Blogs – for semantic analysis, sentiment (polarity/opinion) identification
  • Email spam filtering – but most often systems utilize simple word probabilities
  • A general approach: “Web as a corpus” – the web as a vast collection of documents