ppt - UMIACS

Download Report

Transcript ppt - UMIACS

CMSC 723 / LING 645: Intro to
Computational Linguistics
September 15, 2004: Dorr
More about FSA’s,
Finite State Morphology (J&M 3)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee
More about FSAs
Transducers
Equivalence of DFSAs and NFSAs
Recognition as search: depth-first, breadth-
search
Recognition using NFSAs
NFSA Recognition of “baaa!”
Breadth-first Recognition
of “baaa!”
should be q2
Regular languages
Regular languages are characterized by
FSAs
For every NFSA, there is an equivalent
DFSA.
Regular languages are closed under
concatenation, Kleene closure, union.
Concatenation
Kleene Closure
Union
Morphology
Definitions and Problems
– What is Morphology?
– Topology of Morphologies
Approaches to Computational Morphology
– Lexicons and Rules
– Computational Morphology Approaches
Morphology
The study of the way words are built up from
smaller meaning units called Morphemes
Abstract versus Realized
HOP +PAST  hop +ed  hopped  /hapt/
Syntax
Lexeme/Inflected Lexeme
Grammars
sentences
Morphology
Morpheme/Allomorph
Morphotactics
words
Phonology
Phoneme/Allophone
Phonotactics
letters
Phonology and Morphology
 Phonology vs. Orthography
 Historical spelling
– night, nite
– attention, mission, fish
 Script Limitations
– Spoken English has 14 vowels
• heed hid hayed head had hoed hood who’d hide how’d taught
Tut toy enough
– English Alphabet has 5
• Use vowel combinatios: far fair fare
• Consonantal doubling (hopping vs. hoping)
Syntax and Morphology
 Phrase-level agreement
– Subject-Verb
• John studies hard (STUDY+3SG)
– Noun-Adjective
• Las vacas hermosas
 Sub-word phrasal structures
–
–
–
–
conj
‫שבספרינו‬
prep
‫נו‬+‫ים‬+‫ספר‬+‫ב‬+‫ש‬
That+in+book+PL+Poss:1PL
noun
Which are in our books
article plural
poss
Topology of Morphologies
Concatenative vs. Templatic
Derivational vs. Inflectional
Regular vs. Irregular
Concatenative Morphology
 Morpheme+Morpheme+Morpheme+…
 Stems: also called lemma, base form, root, lexeme
– hope+ing  hoping
hop  hopping
 Affixes
–
–
–
–
Prefixes: Antidisestablishmentarianism
Suffixes: Antidisestablishmentarianism
Infixes: hingi (borrow) – humingi (borrower) in Tagalog
Circumfixes: sagen (say) – gesagt (said) in German
 Agglutinative Languages
– uygarlaştıramadıklarımızdanmışsınızcasına
– uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
– Behaving as if you are among those whom we could not cause to become
civilized
Templatic Morphology
Roots and Patterns
‫ك ت ب‬
KTB
‫כת ב‬
? ‫مَ ? ? و‬
?‫??ו‬
‫مكتوب‬
‫כתוב‬
maktuub
written
ktuuv
written
Templatic Morphology:
Root Meaning
KTB: writing “stuff”
‫كتاب‬
book
‫مكتبة‬
library
‫مكتب‬
office
‫كتب‬
‫مكتوب‬
‫كاتب‬
write
letter
writer
‫כתב‬
‫מכתב‬
‫כתב‬
‫כתיב‬
spelling
‫כתובת‬
address
Derivational vs. Inflectional
Word Classes
– Parts of speech: noun, verb, adjectives, etc.
– Word class dictates how a word combines with
morphemes to form new words
Derivational morphology
Nominalization: computerization,
appointee, killer, fuzziness
Formation of adjectives: computational,
clueless, embraceable
CatVar: Categorial Variation Database
http://clipdemos.umiacs.umd.edu/catvar/
Inflectional morphology
Adds: Tense, number, person, mood, aspect
Word class doesn’t change
Word serves new grammatical role
Five verb forms in English
Other languages have (lots more)
Nouns and Verbs (in English)
Nouns have simple inflectional
morphology
– cat
– cat+s, cat+’s
Verbs have more complex morphology
Regulars and Irregulars
Nouns
– Cat/Cats
– Mouse/Mice, Ox, Oxen, Goose, Geese
Verbs
– Walk/Walked
– Go/Went, Fly/Flew
Regular (English) Verbs
Morphological Form Classes
Regularly Inflected Verbs
Stem
walk
merge
try
map
-s form
walks
merges
tries
maps
-ing form
walking
merging
trying
mapping
Past form or –ed participle
walked
merged
tried
mapped
Irregular (English) Verbs
Morphological Form Classes Irregularly Inflected Verbs
Stem
eat
catch
cut
-s form
eats
catches
cuts
-ing form
eating
catching cutting
Past form
ate
caught
cut
-ed participle
eaten
caught
cut
“To love” in Spanish
Computational Morphology
Finite State Morphology
– Finite State Transducers (FST)
Input/Output
Analysis/Generation
Computational Morphology
WORD
 cats
 cat
 cities
 geese
 ducks
STEM (+FEATURES)*
cat +N +PL
cat +N +SG
city +N +PL
goose +N +PL
(duck +N +PL) or
(duck +V +3SG)
 merging merge +V +PRES-PART
 caught
(catch +V +PAST-PART) or
(catch +V +PAST)
Building a Morphological
Parser
 The Rules and the Lexicon
–
–
–
–
General versus Specific
Regular versus Irregular
Accuracy, speed, space
The Morphology of a language
 Approaches
– Lexicon only
– Lexicon and Rules
• Finite-state Automata
• Finite-state Transducers
– Rules only
Lexicon-only Morphology
• The lexicon lists all surface level and lexical level pairs
• No rules …?
• Analysis/Generation is easy
• Very large for English
• What about Arabic or Turkish?
• Chinese?
acclaim
acclaim
acclaimed
acclaimed
acclaiming
acclaims
acclaims
acclamation
acclamations
acclimate
acclimated
acclimated
acclimates
acclimating
acclaim $N$
acclaim $V+0$
acclaim $V+ed$
acclaim $V+en$
acclaim $V+ing$
acclaim $N+s$
acclaim $V+s$
acclamation
acclamation
acclimate
acclimate
acclimate
acclimate
acclimate
$N$
$N+s$
$V+0$
$V+ed$
$V+en$
$V+s$
$V+ing$
Building a Morphological
Parser
 The Rules and the Lexicon
–
–
–
–
General versus Specific
Regular versus Irregular
Accuracy, speed, space
The Morphology of a language
 Approaches
– Lexicon only
– Lexicon and Rules
• Finite-state Automata
• Finite-state Transducers
– Rules only
Lexicon and Rules:
FSA Inflectional Noun Morphology
• English Noun Lexicon
reg-noun
Irreg-pl-noun
Irreg-sg-noun
plural
fox
cat
dog
geese
sheep
mice
goose
sheep
mouse
-s
• English Noun
Rule
Lexicon and Rules: FSA English Verb
Inflectional Morphology
reg-verb-stem
irreg-verb-stem
irreg-past-verb
past
past-part
pres-part
3sg
walk
fry
talk
impeach
cut
speak
spoken
sing
sang
caught
ate
eaten
-ed
-ed
-ing
-s
FSA for Derivational Morphology:
Adjectival Formation
More Complex
Derivational Morphology
Using FSAs for Recognition:
English Nouns and their Inflection
Morphological Parsing
 Finite-state automata (FSA)
–
–
Recognizer
One-level morphology
 Finite-state transducers (FST)
–
Two-level morphology
•
–
PC-Kimmo (Koskenniemi 83)
input-output pair
Terminology for PC-Kimmo
 Upper = lexical tape
 Lower = surface tape
 Characters correspond to pairs, written a:b
 If “a:a”, write “a” for shorthand
 Two-level lexical entries
 # = word boundary
 ^ = morpheme boundary
 Other = “any feasible pair that is not in this
transducer”
 Final states indicated with “:” and non-final states
indicated with “.”
Four-Fold View of FSTs
As a recognizer
As a generator
As a translator
As a set relater
Nominal Inflection FST
Lexical and Intermediate Tapes
Spelling Rules
Name
Rule Description
Example
Consonant Doubling
1-letter consonant doubled before -ing/-ed
beg/begging
E-deletion
Silent e dropped before -ing and -ed
make/making
E-insertion
e added after s,z,x,ch,sh before s
watch/watches
Y-replacement
-y changes to -ie before -s, -i before -ed
try/tries
K-insertion
verbs ending with vowel + -c add -k
panic/panicked
Chomsky and Halle Notation
ε→e/
x
s
z
^ __ s #
Intermediate-to-Surface
Transducer
State Transition Table
Two-Level Morphology
Sample Run
KIMMO DEMO
FSTs and ambiguity
 Parse Example 1: unionizable
– union +ize +able
– un+ ion +ize +able
 Parse Example 2: assess
– assessv
– assN +essN
 Parse Example 3: tender
– tenderAJ
– tenNum+dAJ+erCMP
What to do about
Global Ambiguity?
Accept first successful structure
Run parser through all possible paths
Bias the search in some manner
Computational Morphology
 The Rules and the Lexicon
–
–
–
–
General versus Specific
Regular versus Irregular
Accuracy, speed, space
The Morphology of a language
 Approaches
– Lexicon only
– Lexicon and Rules
• Finite-state Automata
• Finite-state Transducers
– Rules only
Computational Morphology
 The Rules and the Lexicon
–
–
–
–
General versus Specific
Regular versus Irregular
Accuracy, speed, space
The Morphology of a language
 Approaches
– Lexicon only
– Lexicon and Rules
• Finite-state Automata
• Finite-state Transducers
– Rules only (next time!!)
Readings for next time
J&M Chapter 6