Morphological Analyzers

Paradigm-based Morphological Analyzers
Dr. Radhika Mamidi
Morphological Analyzers
They are tools that automatically decompose
a word into its root and affixes and give the
features associated with them.
Example:
1st stage – identifying morphemes
ate: root = eat
suffix = ed
2nd stage – analyzing morphemes
ate: root = eat
tense = past
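The two stages above can be sketched roughly as follows. This is only an illustration: the lookup tables and the function names `identify_morphemes` and `analyze_morphemes` are hypothetical, and a real analyzer would not list every word form by hand.

```python
# Hypothetical, hand-written lookup tables for illustration only.
SURFACE_TO_MORPHEMES = {
    "ate": ("eat", "ed"),
    "played": ("play", "ed"),
}
SUFFIX_FEATURES = {
    "ed": {"tense": "past"},
}


def identify_morphemes(word):
    """Stage 1: split a word into (root, suffix)."""
    return SURFACE_TO_MORPHEMES[word]


def analyze_morphemes(root, suffix):
    """Stage 2: replace the suffix with the features it carries."""
    analysis = {"root": root}
    analysis.update(SUFFIX_FEATURES[suffix])
    return analysis


root, suffix = identify_morphemes("ate")
print(root, suffix)                     # eat ed
print(analyze_morphemes(root, suffix))  # {'root': 'eat', 'tense': 'past'}
```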
Some Applications
• Machine Translation
• Speech Processing
Machine Translation
• A POS tagger gives only the part of speech.
More information is needed to translate a
word correctly.
• This includes the tense, aspect and mood
of verbs, and the gender, number and
person of nouns.
Example: [Eng Hindi translation]
ENGLISH: She went home.
HINDI: vaha ghar gayi.
ENGLISH: He went home.
HINDI: vaha ghar gayaa.
• The gender of the pronoun is essential for the
translation in Hindi.
• The morph analyzer will give the gender
information.
Example: [Hindi Eng translation]
In Hindi ‘vaha’ can have different senses – ‘he’,
‘she’ or ‘that’.
“vaha ghar gayaa”
If we were to translate this, then the extra
information on the verb will help us to translate
the above sentence correctly as
“He went home”
• The ‘yaa’ indicates past tense as well as singular
number and masculine gender.
• The morph analyzer will give this information.
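As a rough sketch of how a translation step might use this information, the snippet below picks the English pronoun for 'vaha' from the gender feature on the verb. The table `VERB_FEATURES` stands in for real analyzer output and is an assumption, as are the names used here. The 'that' sense of 'vaha' is ignored for simplicity.

```python
# Hypothetical analyzer output for the two verb forms used in the examples.
VERB_FEATURES = {
    "gayaa": {"tense": "past", "number": "singular", "gender": "masculine"},
    "gayi": {"tense": "past", "number": "singular", "gender": "feminine"},
}
PRONOUN_FOR_GENDER = {"masculine": "He", "feminine": "She"}


def translate_vaha(verb_form):
    """Choose 'He' or 'She' for 'vaha' from the gender marked on the verb."""
    gender = VERB_FEATURES[verb_form]["gender"]
    return PRONOUN_FOR_GENDER[gender]


for sentence in ("vaha ghar gayaa", "vaha ghar gayi"):
    verb = sentence.split()[-1]  # the verb is the last word in these examples
    print(sentence, "->", translate_vaha(verb), "went home.")
```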
Speech Processing
• In text-to-speech tools, too, a morph
analyzer is essential alongside part-of-speech
information.
• With extra information on the words, the
system performs better.
• The intonation, pauses, stress, etc.
can come closer to the way humans speak.
• This additional information is given by
morph analyzers.
Approaches
• Paradigm based
• Finite State based
We will discuss the first approach.
Requirements for building paradigm-based
Morph Analyzers
• Knowledge of Lexeme and Word forms
• Root and Affix dictionaries
• Paradigm Table
• Paradigm class
• The lexemes are stored in the dictionaries
and the word forms as paradigms.
Lexeme and Word form
APPLE: apple, apples
CHURCH: church, churches
BOY: boy, boys
WATCH: watch, watches
SPY: spy, spies
• The word in upper case is called the LEXEME and
its inflected forms are WORD FORMS.
• Lexemes are the headwords in a dictionary.
Lexeme and Word form
Another example:
played is a word form of the lexeme PLAY
plays is a word form of the lexeme PLAY(1)
plays is a word form of the lexeme PLAY(2)
where PLAY(1) is a verb and PLAY(2) is a noun.
PLAY(1) and PLAY(2) are two different lexemes.
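One way to picture the relation between lexemes and word forms is an index from each word form back to the lexemes it can belong to. The sketch below is illustrative only; the `LEXEMES` table and the (headword, category) keys are assumptions made for the example.

```python
from collections import defaultdict

# Each lexeme is keyed by (headword, category) so that PLAY the verb and
# PLAY the noun stay distinct. The entries are illustrative.
LEXEMES = {
    ("PLAY", "verb"): ["play", "plays", "played", "playing"],
    ("PLAY", "noun"): ["play", "plays"],
    ("APPLE", "noun"): ["apple", "apples"],
}

# Invert the table: word form -> list of lexemes it can belong to.
form_to_lexemes = defaultdict(list)
for lexeme, forms in LEXEMES.items():
    for form in forms:
        form_to_lexemes[form].append(lexeme)

print(form_to_lexemes["played"])  # [('PLAY', 'verb')]
print(form_to_lexemes["plays"])   # [('PLAY', 'verb'), ('PLAY', 'noun')]
```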
Exercise 1
Give the lexeme of the following word forms
ate
played
manufactured
glasses
players
bites
Exercise 2
“manufactured” can be a verb in past tense or an
adjective. So it belongs to two different lexemes
– manufacture and manufactured.
Which of the following words belong to more than
one lexeme?
ate
wanted
wrote
written
finished
Root and Affix dictionaries
The root dictionary contains a list of roots,
i.e. the base forms to which affixation takes
place.
Each root is usually stored with its part of speech.
The affix dictionary contains a list of all the
affixes in a language.
The features of the affixes are stored here
as attribute-value pairs.
Example entries in a dictionary
Root dictionary
eat <root=‘eat’, category=‘verb’>
book <root=‘book’, category=‘verb’>
book <root=‘book’, category=‘noun’>
Affix dictionary
+s <tense = ‘present’>
+ed <tense = ‘past’>
+en <aspect = ‘perfective’>
+ing <aspect = ‘progressive’>
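The entries above could be held in memory along the following lines. This is a minimal sketch, not a prescribed storage format: the names `ROOT_DICTIONARY`, `AFFIX_DICTIONARY` and `lookup_root` are hypothetical. The root dictionary is a list rather than a mapping because one root ('book') can have several entries.

```python
# Root dictionary: one entry per (root, category) pair.
ROOT_DICTIONARY = [
    {"root": "eat", "category": "verb"},
    {"root": "book", "category": "verb"},
    {"root": "book", "category": "noun"},
]

# Affix dictionary: suffix -> attribute-value pairs.
AFFIX_DICTIONARY = {
    "s": {"tense": "present"},
    "ed": {"tense": "past"},
    "en": {"aspect": "perfective"},
    "ing": {"aspect": "progressive"},
}


def lookup_root(root):
    """Return every root-dictionary entry for the given root."""
    return [entry for entry in ROOT_DICTIONARY if entry["root"] == root]


print(lookup_root("book"))
print(AFFIX_DICTIONARY["ing"])  # {'aspect': 'progressive'}
```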
Paradigm table
A paradigm table represents the inflected forms of
a particular word.
It includes the conjugation of verbs and
declensions of nouns, adjectives, pronouns etc.
Example:
apple, apples
eat, eats, ate, eaten, eating
smart, smarter, smartest
Conjugation of English verbs
• play plays played played playing
• eat eats ate eaten eating
• look looks looked looked looking
• dance dances danced danced dancing
• push pushes pushed pushed pushing
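A paradigm table like the one above can be represented as rows with named slots. The slot names below (`base`, `present_3sg`, `past`, `past_participle`, `progressive`) are labels assumed for this sketch; the slides list the five forms without naming the columns.

```python
from typing import NamedTuple


class VerbParadigm(NamedTuple):
    # Column labels are an assumption; the source gives the forms unlabeled.
    base: str
    present_3sg: str
    past: str
    past_participle: str
    progressive: str


ROWS = (
    "play plays played played playing",
    "eat eats ate eaten eating",
    "look looks looked looked looking",
    "dance dances danced danced dancing",
    "push pushes pushed pushed pushing",
)
VERB_TABLE = [VerbParadigm(*row.split()) for row in ROWS]

for entry in VERB_TABLE:
    print(entry.base, "->", entry.past, "/", entry.past_participle)
```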
Declension of English nouns
• apple, apples
• boy, boys
• church, churches
• watch, watches
• spy, spies
Exercise 3
• Give the paradigm table for 5 different
nouns and 5 different verbs in English.
Paradigm Class
• A paradigm class groups words: it contains
a prototypical root together with all the
other roots that inflect in the same way.
• By the term ‘root’ we mean the base form
or stem to which affixation takes place.
• Those words which decline or conjugate in
exactly the same way, fall into one class.
The English verbs ‘play’ and ‘look’ have the
following paradigm:
• play plays played played playing
• look looks looked looked looking
So they belong to the same class.
But ‘push’ differs in its present tense
form, i.e. it takes ‘-es’ and not ‘-s’, so it falls in
another class. Its paradigm is as follows:
• push pushes pushed pushed pushing
The English nouns ‘play’ and ‘boy’ have the
following paradigm:
• play plays
• boy boys
So they belong to the same class.
But ‘spy’ falls in another class. Its
paradigm is as follows:
• spy spies
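The idea that words inflecting in exactly the same way share a class can be approximated by comparing the endings left over once the part common to all forms is removed. The sketch below is one simple heuristic, not necessarily how a production analyzer assigns classes; `signature` and `group_by_paradigm` are hypothetical names.

```python
import os
from collections import defaultdict


def signature(forms):
    """Endings of each form after removing the prefix shared by all forms."""
    prefix = os.path.commonprefix(forms)
    return tuple(form[len(prefix):] for form in forms)


def group_by_paradigm(tables):
    """Group roots (first form of each row) by their ending signature."""
    classes = defaultdict(list)
    for forms in tables:
        classes[signature(forms)].append(forms[0])
    return dict(classes)


VERBS = [
    ["play", "plays", "played", "played", "playing"],
    ["look", "looks", "looked", "looked", "looking"],
    ["push", "pushes", "pushed", "pushed", "pushing"],
]
NOUNS = [["play", "plays"], ["boy", "boys"], ["spy", "spies"]]

print(group_by_paradigm(VERBS))
# {('', 's', 'ed', 'ed', 'ing'): ['play', 'look'], ('', 'es', 'ed', 'ed', 'ing'): ['push']}
print(group_by_paradigm(NOUNS))
# {('', 's'): ['play', 'boy'], ('y', 'ies'): ['spy']}
```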
Paradigm class is represented by one
member of the class.
Class    Category   Members
eat      V          eat
play     V          play, talk, walk, train
push     V          push, fish
play     N          play, boy, day
spy      N          spy, sky
church   N          church, watch
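Putting the pieces together, a paradigm-based analyzer can try each class's suffix rules on a word and accept an analysis only when the recovered root is a member of that class. The following is a minimal sketch under stated assumptions: the rule format (suffix to strip, string to add back, features), the feature labels for number, and the name `analyze` are all hypothetical. The irregular 'eat' class is left out for brevity.

```python
# (class representative, category) -> list of (strip, add_back, features)
PARADIGMS = {
    ("play", "V"): [("", "", {}), ("s", "", {"tense": "present"}),
                    ("ed", "", {"tense": "past"}),
                    ("ing", "", {"aspect": "progressive"})],
    ("push", "V"): [("", "", {}), ("es", "", {"tense": "present"}),
                    ("ed", "", {"tense": "past"}),
                    ("ing", "", {"aspect": "progressive"})],
    ("play", "N"): [("", "", {"number": "singular"}),
                    ("s", "", {"number": "plural"})],
    ("spy", "N"): [("", "", {"number": "singular"}),
                   ("ies", "y", {"number": "plural"})],
    ("church", "N"): [("", "", {"number": "singular"}),
                      ("es", "", {"number": "plural"})],
}
MEMBERS = {
    ("play", "V"): ["play", "talk", "walk", "train"],
    ("push", "V"): ["push", "fish"],
    ("play", "N"): ["play", "boy", "day"],
    ("spy", "N"): ["spy", "sky"],
    ("church", "N"): ["church", "watch"],
}


def analyze(word):
    """Return every analysis whose recovered root belongs to the class."""
    results = []
    for cls, rules in PARADIGMS.items():
        for strip, add_back, features in rules:
            if not word.endswith(strip):
                continue
            root = word[:len(word) - len(strip)] + add_back
            if root in MEMBERS[cls]:
                results.append({"root": root, "category": cls[1], **features})
    return results


print(analyze("plays"))   # one verb reading and one noun reading
print(analyze("skies"))   # [{'root': 'sky', 'category': 'N', 'number': 'plural'}]
```

Checking class membership is what keeps a word like 'skies' from being analyzed as 'skie' + 's'.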
Exercise 4
Which of the following verbs belong to the same
paradigm class?
mince
ride
walk
speak
shake
play
dance
take
Which of the following nouns belong to the same
paradigm class?
girl
house
dish
book
mouse
beach
flower
pencil