Parts of Speech

Sudeshna Sarkar
7 Aug 2008
1
Why Do We Care about Parts of Speech?
•Pronunciation
Hand me the lead pipe.
•Predicting what words can be expected next
Personal pronoun (e.g., I, she) ____________
•Stemming
-s marks third-person singular for verbs, plural for nouns
•As the basis for syntactic parsing and then meaning extraction
I will lead the group into the lead smelter.
•Machine translation
• (E) content +N → (F) contenu +N
• (E) content +Adj → (F) content +Adj or satisfait +Adj
2
What is a Part of Speech?
Is this a semantic distinction? For example, maybe Noun is the
class of words for people, places and things. Maybe Adjective
is the class of words for properties of nouns.
Consider:
green book
book is a Noun
green is an Adjective
Now consider:
book worm
This green is very soothing.
3
How Many Parts of Speech Are There?
A first cut at the easy distinctions:
Open classes:
•nouns, verbs, adjectives, adverbs
Closed classes: function words
•conjunctions: and, or, but
•pronouns: I, she, him
•prepositions: with, on
•determiners: the, a, an
4
Part of speech tagging
8 (ish) traditional parts of speech
Noun, verb, adjective, preposition, adverb, article,
interjection, pronoun, conjunction, etc.
This idea has been around for over 2000 years (Dionysius
Thrax of Alexandria, c. 100 B.C.)
Also called: parts of speech, lexical categories, word classes,
morphological classes, lexical tags, POS
We’ll use POS most frequently
I’ll assume that you all know what these are
5
POS examples
TAG   CLASS         EXAMPLES
N     noun          chair, bandwidth, pacing
V     verb          study, debate, munch
ADJ   adjective     purple, tall, ridiculous
ADV   adverb        unfortunately, slowly
P     preposition   of, by, to
PRO   pronoun       I, me, mine
DET   determiner    the, a, that, those
6
Tagsets
Brown corpus tagset (87 tags):
http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html
Penn Treebank tagset (45 tags):
http://www.cs.colorado.edu/~martin/SLP/Figures/ (8.6)
C7 tagset (146 tags):
http://www.comp.lancs.ac.uk/ucrel/claws7tags.html
7
POS Tagging: Definition
The process of assigning a part-of-speech or lexical
class marker to each word in a corpus:
WORDS: the, koala, put, the, keys, on, the, table
TAGS: N, V, P, DET
8
POS Tagging example
WORD    TAG
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N
9
POS tagging: Choosing a tagset
There are many parts of speech and many potential distinctions we could draw
To do POS tagging, we need to choose a standard set of tags to work with
We could pick a very coarse tagset:
N, V, Adj, Adv
A more commonly used set is finer-grained: the “UPenn Treebank tagset”, with 45 tags, e.g.
PRP$, WRB, WP$, VBG
Even more fine-grained tagsets exist
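One way to see the coarse/fine trade-off is to collapse fine-grained tags into coarse ones. The minimal Python sketch below maps a few Penn Treebank tags onto the coarse classes from the POS examples slide. The prefix table and the function name coarsen are hypothetical illustrations, not a standard mapping (for instance, filing IN under P ignores its subordinating-conjunction use).

```python
# Hypothetical mapping from Penn Treebank tag prefixes to coarse classes.
# The first matching prefix wins; anything unmatched becomes "OTHER".
COARSE_BY_PREFIX = [
    ("PRP", "PRO"),  # PRP, PRP$ (personal / possessive pronouns)
    ("WP", "PRO"),   # WP, WP$ (wh-pronouns)
    ("WRB", "ADV"),  # wh-adverbs
    ("RB", "ADV"),   # RB, RBR, RBS
    ("NN", "N"),     # NN, NNS, NNP, NNPS
    ("VB", "V"),     # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "ADJ"),   # JJ, JJR, JJS
    ("DT", "DET"),
    ("IN", "P"),     # prepositions (and, loosely, subordinating conjunctions)
]


def coarsen(penn_tag: str) -> str:
    """Map a fine-grained Penn Treebank tag to a coarse class."""
    for prefix, coarse in COARSE_BY_PREFIX:
        if penn_tag.startswith(prefix):
            return coarse
    return "OTHER"


print([coarsen(t) for t in ["PRP$", "WRB", "WP$", "VBG"]])  # ['PRO', 'ADV', 'PRO', 'V']
```

Four distinct fine tags collapse into three coarse ones, which is exactly the information a coarse tagset gives up.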
10
Penn TreeBank POS Tag set
11
Using the UPenn tagset
The/DT grand/JJ jury/NN commented/VBD on/IN
a/DT number/NN of/IN other/JJ topics/NNS ./.
Prepositions and subordinating conjunctions are marked
IN (“although/IN I/PRP..”)
The exception is the preposition/complementizer “to”, which
is simply marked TO.
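The word/TAG notation is simple to process mechanically. A minimal Python sketch, assuming whitespace-separated tokens and a tag that follows the last slash (so the sentence-final ./. parses correctly); parse_tagged is a hypothetical helper name.

```python
from typing import List, Tuple


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split 'word/TAG' tokens into (word, tag) pairs.

    Splitting on the last '/' means a token such as './.' yields ('.', '.').
    """
    pairs = []
    for token in text.split():
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


sentence = ("The/DT grand/JJ jury/NN commented/VBD on/IN "
            "a/DT number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(sentence)
print(pairs[3], pairs[-1])  # ('commented', 'VBD') ('.', '.')
```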
12
POS Tagging
Words often have more than one POS: back
The back door = JJ
On my back = NN
Win the voters back = RB
Promised to back the bill = VB
The POS tagging problem is to determine the
POS tag for a particular instance of a word.
13
How hard is POS tagging? Measuring ambiguity
14
Algorithms for POS Tagging
•Ambiguity – In the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags).
Worse, 40% of the tokens are ambiguous, since many of the most frequent words are ambiguous.
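The gap between the type and token figures can be reproduced on a toy scale: ambiguous words tend to be frequent, so they account for a larger share of tokens than of types. The Python sketch below counts a type as ambiguous if it occurs with more than one tag; the function ambiguity_rates and the toy corpus (built from the “back” examples) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def ambiguity_rates(tagged: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (share of ambiguous word types, share of ambiguous tokens)."""
    tags_by_type: Dict[str, Set[str]] = defaultdict(set)
    for word, tag in tagged:
        tags_by_type[word.lower()].add(tag)
    # A type is ambiguous if it was seen with more than one tag.
    ambiguous = {w for w, tags in tags_by_type.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_by_type)
    # A token is ambiguous if its type is ambiguous.
    token_rate = sum(w.lower() in ambiguous for w, _ in tagged) / len(tagged)
    return type_rate, token_rate


# Toy corpus built from the "back" examples (Penn Treebank tags).
toy = [("the", "DT"), ("back", "JJ"), ("door", "NN"),
       ("on", "IN"), ("my", "PRP$"), ("back", "NN"),
       ("win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB"),
       ("promised", "VBD"), ("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")]
print([round(r, 3) for r in ambiguity_rates(toy)])  # [0.1, 0.267]
```

Here “back” is 1 of 10 types but 4 of 15 tokens, the same pattern as the Brown corpus figures on a much smaller scale.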
15
Algorithms for POS Tagging
Why can’t we just look them up in a dictionary?
•Words that aren’t in the dictionary
http://story.news.yahoo.com/news?tmpl=story&cid=578&ncid=578&e=1&u=/nm/20030922/ts_nm/iraq_usa_dc
•One idea: for an unknown word $w_i$, estimate $P(t_i \mid w_i)$ as the probability that a random hapax legomenon (a word occurring only once) in the corpus has tag $t_i$.
Nouns are more likely than verbs, which are more likely
than pronouns.
•Another idea: use morphology.
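The hapax idea can be sketched in a few lines of Python: collect the words that occur exactly once in a tagged corpus and use the distribution of their tags as the guess for an unknown word. The function name and the one-sentence toy corpus (the koala example, coarse tags) are assumptions for illustration; a usable estimate needs a large corpus.

```python
from collections import Counter
from typing import Dict, List, Tuple


def hapax_tag_distribution(tagged: List[Tuple[str, str]]) -> Dict[str, float]:
    """Estimate P(t | unknown word) from the tags of hapax legomena."""
    freq = Counter(word.lower() for word, _ in tagged)
    # Keep only tokens whose word type occurs exactly once in the corpus.
    hapax_tags = Counter(tag for word, tag in tagged if freq[word.lower()] == 1)
    total = sum(hapax_tags.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in hapax_tags.items()}


toy = [("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
       ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N")]
print(hapax_tag_distribution(toy))  # {'N': 0.6, 'V': 0.2, 'P': 0.2}
```

The frequent word “the” never contributes, which is the point: rare words behave more like unknown words than frequent ones do.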
16
Algorithms for POS Tagging - Knowledge
•Dictionary
•Morphological rules, e.g.,
  •_____-tion
  •_____-ly
  •capitalization
•N-gram frequencies
  •to _____
  •DET _____ N
•But what about rare words, e.g., smelt (two verb forms, “melt” and the past tense of “smell”, and one noun form, “a small fish”)
•Combining these
  •V _____-ing
  I was gracking vs. Gracking is fun.
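A minimal Python sketch of combining these knowledge sources for an unknown word: context patterns are checked first, then suffixes, with N as the fallback. The rules, their order, the coarse tag names, and the function guess_tag are all hypothetical; capitalization is left out because, with this coarse tagset, it would only ever predict N.

```python
from typing import Optional


def guess_tag(word: str, prev_word: Optional[str] = None,
              prev_tag: Optional[str] = None, next_tag: Optional[str] = None) -> str:
    """Guess a coarse tag for an unknown word (hypothetical heuristic rules)."""
    w = word.lower()
    # Context, "V _____-ing": an -ing word after a verb is read as a verb ("I was gracking").
    if w.endswith("ing") and prev_tag == "V":
        return "V"
    # Context, "to _____": read the gap as a verb ("to smelt").
    if prev_word is not None and prev_word.lower() == "to":
        return "V"
    # Context, "DET _____ N": read the gap as an adjective.
    if prev_tag == "DET" and next_tag == "N":
        return "ADJ"
    # Morphology: suffix rules.
    if w.endswith("tion"):
        return "N"
    if w.endswith("ly"):
        return "ADV"
    # Fallback: nouns are the most likely tag for unseen words.
    return "N"


print(guess_tag("gracking", prev_word="was", prev_tag="V"))  # V
print(guess_tag("Gracking"))  # N
```

With these rules, “gracking” after the verb “was” comes out as V, while sentence-initial “Gracking” with no context falls back to N.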
17
POS Tagging - Approaches
Approaches
  •Rule-based tagging (ENGTWOL)
  •Stochastic (=Probabilistic) tagging: HMM (Hidden Markov Model) tagging
  •Transformation-based tagging: Brill tagger
•Do we return one best answer or several answers and let later steps decide?
•How does the requisite knowledge get entered?
18
3 methods for POS tagging
1. Rule-based tagging
Example: Karlsson (1995) EngCG tagger, based on the Constraint Grammar architecture and the ENGTWOL lexicon
– Basic Idea:
  •Assign all possible tags to words (using a morphological analyzer)
  •Remove wrong tags according to a set of constraint rules (typically more than 1000 hand-written constraint rules, but they may be machine-learned)
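The two-step idea (assign all readings, then prune with constraints) can be sketched in Python as below. This is a toy illustration, not EngCG: the lexicon, the two constraints, and all function names are hypothetical, and real constraint rules look at much wider contexts. The sketch also assumes a safeguard: a constraint never removes a word's last remaining tag.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy lexicon listing every tag each word may take (coarse tags).
LEXICON: Dict[str, Set[str]] = {
    "they": {"PRO"}, "lead": {"N", "V"}, "the": {"DET"}, "group": {"N", "V"},
}

# A constraint looks at all candidate sets and names a tag to remove at position i.
Constraint = Callable[[List[Set[str]], int], Optional[str]]


def no_verb_after_det(cands: List[Set[str]], i: int) -> Optional[str]:
    # Remove V if the previous word is unambiguously a determiner.
    return "V" if i > 0 and cands[i - 1] == {"DET"} else None


def no_noun_after_pro(cands: List[Set[str]], i: int) -> Optional[str]:
    # Remove N if the previous word is unambiguously a pronoun.
    return "N" if i > 0 and cands[i - 1] == {"PRO"} else None


CONSTRAINTS: List[Constraint] = [no_verb_after_det, no_noun_after_pro]


def constraint_tag(words: List[str]) -> List[Set[str]]:
    # Step 1: assign every possible tag from the lexicon.
    cands = [set(LEXICON[w.lower()]) for w in words]
    changed = True
    while changed:  # re-apply constraints until nothing changes
        changed = False
        for i in range(len(cands)):
            for rule in CONSTRAINTS:
                tag = rule(cands, i)
                # Step 2: discard the tag, but never remove a word's last reading.
                if tag is not None and tag in cands[i] and len(cands[i]) > 1:
                    cands[i].discard(tag)
                    changed = True
    return cands


print(constraint_tag("they lead the group".split()))  # [{'PRO'}, {'V'}, {'DET'}, {'N'}]
```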
19
3 methods for POS tagging
2. Transformation-based tagging
Example: Brill (1995) tagger, a combination of rule-based and stochastic (probabilistic) tagging methodologies
– Basic Idea:
  •Start with a tagged corpus + dictionary (with most frequent tags)
  •Set the most probable tag for each word as a start value
  •Change tags according to rules of the type “if word-1 is a determiner and word is a verb, then change the tag to noun”, applied in a specific order (like rule-based taggers)
  •Machine learning is used: the rules are automatically induced from a previously tagged training corpus (like the stochastic approach)
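Applying an already learned list of transformations is straightforward; the Python sketch below shows only that step, not the rule-induction loop. The most-frequent-tag dictionary and the single rule, which encodes the example rule above, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical most-frequent-tag dictionary (the start value for each word).
MOST_FREQUENT: Dict[str, str] = {"they": "PRO", "watch": "V", "the": "DET", "play": "V"}

# Hypothetical learned transformations, in order: (from_tag, to_tag, previous_tag).
# This single rule encodes "if word-1 is a determiner and word is a verb, change the tag to noun".
RULES: List[Tuple[str, str, str]] = [("V", "N", "DET")]


def brill_tag(words: List[str]) -> List[str]:
    # Start value: each word's most probable tag.
    tags = [MOST_FREQUENT[w.lower()] for w in words]
    # Apply each transformation, in order, over the whole sentence.
    for from_tag, to_tag, prev_tag in RULES:
        new_tags = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                new_tags[i] = to_tag
        tags = new_tags
    return tags


print(brill_tag("they watch the play".split()))  # ['PRO', 'V', 'DET', 'N']
```

Each rule here conditions on the tags as they stood before that rule was applied (changes go to a copy); updating in place, left to right, is an alternative choice that can give different results when rules chain.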
20
3 methods for POS tagging
3. Stochastic (=Probabilistic) tagging
Example: HMM (Hidden Markov Model) tagging: a training corpus is used to compute the probability (frequency) of a given word having a given POS tag in a given context
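Computing those probabilities from a training corpus amounts to relative-frequency counting. A minimal Python sketch, assuming a bigram tag model with a "<s>" start symbol and no smoothing; estimate_hmm and the one-sentence corpus are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def estimate_hmm(corpus: List[Sentence]):
    """Relative-frequency (maximum-likelihood) estimates of the two HMM tables."""
    trans_counts: Counter = Counter()  # (previous tag, tag) -> count
    prev_counts: Counter = Counter()   # previous tag -> count
    emit_counts: Counter = Counter()   # (tag, word) -> count
    tag_counts: Counter = Counter()    # tag -> count
    for sentence in corpus:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans_counts[(prev, tag)] += 1
            prev_counts[prev] += 1
            emit_counts[(tag, word.lower())] += 1
            tag_counts[tag] += 1
            prev = tag
    # P(tag | previous tag) and P(word | tag)
    trans = {(p, t): c / prev_counts[p] for (p, t), c in trans_counts.items()}
    emit = {(t, w): c / tag_counts[t] for (t, w), c in emit_counts.items()}
    return trans, emit


corpus = [[("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
           ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N")]]
trans, emit = estimate_hmm(corpus)
print(trans[("N", "V")], emit[("DET", "the")])  # 0.5 1.0
```

Unseen (tag, word) or (tag, tag) pairs simply have no entry; a practical tagger would smooth these estimates.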
21
Hidden Markov Model (HMM) Tagging
Using an HMM to do POS tagging
HMM is a special case of Bayesian inference
It is also related to the “noisy channel” model in ASR
(Automatic Speech Recognition)
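The Bayesian view can be written out as a short derivation, writing $w_1^n$ for the word sequence $w_1 \ldots w_n$ and $t_1^n$ for a candidate tag sequence. The denominator $P(w_1^n)$ is the same for every tag sequence, so it can be dropped. The last line adds the usual HMM simplifications, assumed here: each word depends only on its own tag, and each tag only on the previous tag (a bigram version of “previous n tags”), with $t_0$ a sentence-start symbol.

```latex
\begin{aligned}
\hat{t}_1^n &= \arg\max_{t_1^n} P(t_1^n \mid w_1^n)
  = \arg\max_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)} \\
&= \arg\max_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n) \\
&\approx \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
\end{aligned}
```

The two factors in the product are the word likelihood P(word|tag) and the tag-sequence likelihood P(tag|previous tag).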
22
Hidden Markov Model (HMM) Taggers
Goal: maximize P(word|tag) × P(tag|previous n tags)
•P(word|tag) supplies lexical information
•P(tag|previous n tags) supplies syntagmatic information
P(word|tag)
•word/lexical likelihood
•probability that, given this tag, we have this word
•NOT the probability that this word has this tag
•modeled through a language model (word-tag matrix)
P(tag|previous n tags)
•tag sequence likelihood
•probability that this tag follows these previous tags
•modeled through a language model (tag-tag matrix)
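The difference between P(word|tag) and P(tag|word) is easy to see with counts. In the hypothetical counts below (made up for illustration), “race” is usually a noun, so P(NN|race) is high, yet “race” is a small share of all NN tokens, so P(race|NN) is low. The HMM factor is the latter.

```python
from collections import Counter

# Hypothetical toy counts of (word, tag) tokens, made up for illustration.
tokens = ([("race", "NN")] * 9 + [("race", "VB")] * 1
          + [("dog", "NN")] * 90 + [("run", "VB")] * 40)

pair_counts = Counter(tokens)
tag_counts = Counter(tag for _, tag in tokens)
word_counts = Counter(word for word, _ in tokens)


def p_word_given_tag(word: str, tag: str) -> float:
    # Lexical likelihood: of all tokens tagged `tag`, the share that are `word`.
    return pair_counts[(word, tag)] / tag_counts[tag]


def p_tag_given_word(tag: str, word: str) -> float:
    # Not the HMM factor: of all tokens of `word`, the share tagged `tag`.
    return pair_counts[(word, tag)] / word_counts[word]


print(round(p_word_given_tag("race", "NN"), 3))  # 0.091
print(round(p_tag_given_word("NN", "race"), 3))  # 0.9
```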
23
POS tagging as a sequence classification task
We are given a sentence (an “observation” or “sequence of observations”):
Secretariat is expected to race tomorrow
that is, a sequence of n words $w_1 \ldots w_n$.
What is the best sequence of tags corresponding to this sequence of observations?
Probabilistic/Bayesian view:
Consider all possible sequences of tags.
Out of this universe of sequences, choose the tag sequence that is most probable given the observation sequence of n words $w_1 \ldots w_n$.
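The “consider all possible sequences” view can be implemented literally by enumerating every tag sequence and scoring each with the product of P(word|tag) and P(tag|previous tag). The Python sketch below assumes a bigram model; the candidate tags and every probability are hypothetical toy values, not corpus estimates. Enumeration grows exponentially with sentence length, which is why dynamic programming (the Viterbi algorithm) is normally used instead.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical candidate tags per word; only "race" is ambiguous here.
CANDIDATES: Dict[str, List[str]] = {
    "secretariat": ["NNP"], "is": ["VBZ"], "expected": ["VBN"],
    "to": ["TO"], "race": ["NN", "VB"], "tomorrow": ["NN"],
}
# Hypothetical toy probabilities, NOT estimated from any real corpus.
TRANS: Dict[Tuple[str, str], float] = {  # P(tag | previous tag)
    ("<s>", "NNP"): 0.3, ("NNP", "VBZ"): 0.2, ("VBZ", "VBN"): 0.1,
    ("VBN", "TO"): 0.1, ("TO", "VB"): 0.8, ("TO", "NN"): 0.001,
    ("VB", "NN"): 0.01, ("NN", "NN"): 0.05,
}
EMIT: Dict[Tuple[str, str], float] = {  # P(word | tag)
    ("NNP", "secretariat"): 0.001, ("VBZ", "is"): 0.4, ("VBN", "expected"): 0.01,
    ("TO", "to"): 1.0, ("NN", "race"): 0.0006, ("VB", "race"): 0.0001,
    ("NN", "tomorrow"): 0.001,
}


def score(words: List[str], tags: Tuple[str, ...]) -> float:
    """Product of P(word | tag) * P(tag | previous tag) over the sentence."""
    p, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        p *= EMIT.get((tag, word), 0.0) * TRANS.get((prev, tag), 0.0)
        prev = tag
    return p


def best_tags(sentence: str) -> List[str]:
    words = sentence.lower().split()
    # Enumerate every possible tag sequence and keep the most probable one.
    all_sequences = product(*(CANDIDATES[w] for w in words))
    return list(max(all_sequences, key=lambda tags: score(words, tags)))


print(best_tags("Secretariat is expected to race tomorrow"))
# ['NNP', 'VBZ', 'VBN', 'TO', 'VB', 'NN']
```

With these toy numbers the high P(VB|TO) outweighs the higher P(race|NN), so “race” is tagged as a verb.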
24