3. Linguistic Essentials

Natural Language Processing
Spring 2007
V. “Juggy” Jagannathan
Course Book
Foundations of Statistical Natural Language Processing
By Christopher Manning & Hinrich Schütze
Chapter 3
Linguistic Essentials
January 22, 2007
Parts of Speech and Morphology
• Syntactic/Grammatical categories – Parts
of Speech (POS)
– Nouns – refer to people, animals, concepts & things
– Verbs – express the action in a sentence
– Adjectives – describe properties of nouns
• Substitution test for adjectives
• Ex: The {sad, intelligent, green, fat…} one is in the
corner.
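The substitution test can be sketched in a few lines of Python. The frame and the candidate words come from the slide; the helper name `substitution_frames` is hypothetical. The code only generates the test sentences. Deciding whether each one sounds acceptable is still left to a speaker of the language.

```python
# Sketch of the substitution test: drop each candidate word into the
# frame "The ___ one is in the corner." and inspect the results.
FRAME = "The {} one is in the corner."


def substitution_frames(candidates, frame=FRAME):
    """Return one test sentence per candidate word (hypothetical helper)."""
    return [frame.format(word) for word in candidates]


for sentence in substitution_frames(["sad", "intelligent", "green", "fat"]):
    print(sentence)
```

A word that yields an acceptable sentence in this frame is a candidate adjective.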
Word class/lexical categories
• Open or lexical categories
– Nouns, verbs and adjectives, which have a large membership that continually grows as new words are added to the language
• Closed or functional categories
– Prepositions and determiners
• Ex. of, on, the, a
• Words are listed in a “dictionary” referred
to by linguists as the “lexicon”
Tags
• Parts of Speech tagging – 8 categories –
referred to as POS tags.
• Corpus linguists use more fine-grained tagging
• Various corpora have been tagged extensively; the pioneering one is the Brown corpus.
– Adjectives in the Brown corpus are referred to by the tag “JJ”
Morphological process
• Source: http://www.sil.org/LINGUISTICS/GlossaryOfLinguisticTerms/WhatIsAMorphologicalProcess.htm
– Definition: “A morphological process is a means of changing a stem to adjust its meaning to fit its syntactic and communicational context.”
• Examples
– Plural form (dog-s) derived from (dog)
Morphological processes
• Major forms of morphological processes
– Inflection
• Systematic modification of a root (stem) form by means of prefixes and
suffixes
• Inflection does not change the meaning of the word but does change word
features such as tense and plurality.
• All of the inflectional forms of a word are grouped as manifestations of a single “lexeme”
– Derivation
• Can dramatically change the meaning of the derived word.
• Ex: Adverb “widely” derived from adjective “wide”
• Ex: suffix use – weak-en; soft-en; understand-able; accept-able; teach-er;
lead-er;
– Compounding
• Merging of two or more words into a new word (concept)
• Ex. disk drive, tea kettle, college degree, down market, mad cow disease, overtake
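As a rough illustration of derivation by suffixing, the sketch below splits a word into a stem and one of the suffixes from the examples above. The suffix list and the function name `split_suffix` are assumptions made for this example. It is plain string matching, not a real morphological analyzer.

```python
# Toy segmenter for the derivational suffixes used in the examples above.
# The suffix list is illustrative only.
SUFFIXES = ("able", "en", "er", "ly")


def split_suffix(word, min_stem=3):
    """Return (stem, suffix), or (word, "") when no listed suffix applies."""
    # Try longer suffixes first so "able" is preferred over shorter matches.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem:
            return stem, suffix
    return word, ""


for w in ["weaken", "soften", "understandable", "teacher", "dog"]:
    print(w, "->", split_suffix(w))
```

Because it only looks at spelling, this sketch over-segments words such as "water", which is one reason practical systems combine suffix rules with a lexicon of known stems. It also returns the surface stem, so "widely" yields "wide" but a word like "happily" would not yield "happy".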
Nouns and Pronouns
• Nouns – refer to people, animals and things
– Dog, tree, person, hat, speech, idea, philosophy
– Inflection is a process by which the stem of a word is modified to create a new word form
– In English, the only form of noun inflection is the one indicating whether a noun is singular or plural
– Ex. Dogs, trees, hats, speeches, persons
– Irregular inflection example: women
– Other languages use inflection to convey gender (masculine, feminine, neuter) and case (nominative, genitive, dative, accusative).
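Plural inflection in English can be sketched as an irregular-form lookup followed by regular suffix rules. The table of irregular nouns and the function name `pluralize` are illustrative assumptions, and the rules cover only the examples on this slide.

```python
# Toy English plural inflection: irregular lookup first, then suffix rules.
IRREGULAR = {"woman": "women", "man": "men", "child": "children"}  # small illustrative subset


def pluralize(noun):
    """Return a plural form for a singular noun (very incomplete rules)."""
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    # Nouns ending in a sibilant sound take "-es": speech -> speeches.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


for n in ["dog", "tree", "hat", "speech", "person", "woman"]:
    print(n, "->", pluralize(n))
```

Spelling changes such as "city" to "cities" are deliberately left out to keep the sketch short.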
Gender forms
• Pronouns
– Masculine (he), feminine (she), neuter (it)
• Case relationship in English – the genitive case
– Ex: the woman’s house; the students’ grievances
• Possessive pronouns
– Ex: my car
– Second possessive form of pronoun: a friend of mine
• Reflexive pronouns – ex. Herself, myself
– Ex:
• Mary saw herself in the mirror.
• Mary saw her in the mirror.
– Reflexive pronouns are also referred to as “anaphors”; they must refer to something nearby in the text.
Brown tags
• Nouns [NN]
– Ex. Candy, woman
• Proper nouns [NNP]
– Ex. Mary, Smith, United States
• Adverbial nouns [NR]
– Ex. “The large cat might weigh twenty pounds”, “He will attend the meeting next month” **
• Plural nouns [NNS]
– Ex. Women, dogs
• Plural proper nouns [NNPS]
– Ex. All the Johns, please step aside.
• Plural adverbial nouns [NRS]
– Ex. “He will attend the meeting next few months.”
• Possessive singular nouns [NN$]
– Ex. Man’s role in society…
• Possessive plural nouns [NNS$]
– Ex. Dogs’ den located in downtown…
** Examples from: http://www.tameri.com/edit/doubles.html
Pronoun forms and Brown Tags
Words that accompany nouns:
determiners and adjectives
• Determiners – describe the particular reference of a noun
– Articles – indicate whether the noun refers to something already known or to something new
– “the” refers to someone or something we already know about
• Ex. “the tree” refers to a known tree.
– “a” or “an” introduces a new reference to something that has not appeared before or whose identity cannot be inferred from the context.
Determiners and adjectives
• Demonstratives
– “this” or “that”
• Adjectives
– Describe properties of nouns
– ex: a red rose, this long journey, many intelligent
children, a very trendy magazine.
– This use is referred to as attributive or adnominal.
– Predicative use of an adjective (appearing as the complement of a verb such as “be”)
• Ex. The rose is red. The journey will be long.
Agreement
• Agreement here refers to congruence in gender, case and number between the determiner, the adjective and the noun. In many languages, this can be quite complex.
Adjectives and Brown tags
• Positive – the basic form of an adjective [JJ]
– Ex. Rich, trendy, intelligent
• Comparative [JJR]
– Ex. Richer, trendier
• Superlative [JJT]
– Ex. Richest, trendiest
• Semantically superlative adjectives [JJS]
– Ex. Chief, main and top
• Numbers – are subclasses of adjectives
– Cardinals [CD]
• Ex. One, two, and 6,000,000
– Ordinals [OD]
• Ex. First, second, tenth
• Periphrastic forms - forms made by using auxiliary words
– Ex. More intelligent, most intelligent
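The mapping from adjective forms to the Brown tags above can be approximated with a lookup plus suffix checks. This is a crude heuristic written for illustration; the function name `brown_adjective_tag` is hypothetical, and a real tagger would use context and a lexicon.

```python
# Toy mapping from adjective forms to the Brown tags listed above.
# Suffix checks are a rough heuristic (e.g. "clever" would be mis-tagged JJR).
SEMANTIC_SUPERLATIVES = {"chief", "main", "top"}


def brown_adjective_tag(word):
    """Guess JJ / JJR / JJT / JJS for a single adjective form."""
    w = word.lower()
    if w in SEMANTIC_SUPERLATIVES:
        return "JJS"
    if w.endswith("est"):
        return "JJT"
    if w.endswith("er"):
        return "JJR"
    return "JJ"


for adj in ["rich", "richer", "richest", "chief", "intelligent"]:
    print(adj, brown_adjective_tag(adj))
```

Periphrastic forms such as "more intelligent" are not handled, since each word is tagged on its own.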
Brown tags for determiners, quantifiers
• Determiners
– Articles [AT]
– Singular determiners [DT]
• This, that
– Plural determiners [DTS]
• These, those
– Determiners that can be either singular or plural [DTI]
• Some, any
– Double conjunction determiners [DTX]
• Either, neither
• Quantifiers
– Words that express ideas like “all”, “many”, “some”
– Pre-quantifier [ABN]
• All, many
– Nominal pronoun [PN]
• One, something, anything
• Interrogative pronouns
– [WDT] – wh-determiner: what, which
– [WP$] – possessive wh-pronoun: whose
– [WPO] – objective wh-pronoun: whom, which, that
– [WPS] – nominative wh-pronoun: who, which, that
Verbs
Phrase Structure
• Noun phrases [NP]
– Noun is the head of the noun phrase
• Prepositional phrases [PP]
– Headed by preposition and contain a NP complement
• Verb phrases [VP]
– Headed by a verb
• Ex. Getting to school on time was a struggle.
• Adjective phrases [AP]
– She is very sure of herself
– He seemed a man who was quite certain to succeed.
Phrase Structure Grammars
• Syntactic analysis allows us to infer meaning – the following two sentences use the same words but mean completely different things
– Mary gave Peter a book
– Peter gave Mary a book
• In some languages the order of the words matters far less – these are free word order languages
Rewrite rules
Labeled bracketing
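One possible illustration of rewrite rules and labeled bracketing is sketched below. The grammar is a hypothetical toy written for the sentence "Mary gave Peter a book", not the rule set shown in the lecture. The tree is stored as nested tuples, checked against the rules, and printed as a bracketed string.

```python
# Toy rewrite rules (hypothetical): each category on the left rewrites
# to one of the sequences on the right.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Mary"], ["Peter"], ["AT", "NN"]],
    "VP": [["VBD", "NP", "NP"]],
    "AT": [["a"]],
    "NN": [["book"]],
    "VBD": [["gave"]],
}

# Parse tree for "Mary gave Peter a book": (label, child, child, ...)
TREE = ("S",
        ("NP", "Mary"),
        ("VP", ("VBD", "gave"), ("NP", "Peter"),
         ("NP", ("AT", "a"), ("NN", "book"))))


def licensed(tree):
    """Check that every local tree corresponds to one of the rewrite rules."""
    if isinstance(tree, str):
        return True
    label, *children = tree
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    return rhs in RULES.get(label, []) and all(licensed(c) for c in children)


def bracket(tree):
    """Render the tree as a labeled bracketing string."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


print(licensed(TREE))
print(bracket(TREE))
```

Swapping "Mary" and "Peter" in the tree gives a different bracketing for the same set of words, which is how the structure captures who gave the book to whom.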
Non-local and long-distance
dependencies
• Subject-verb agreement
– The women who found the wallet were given
a reward.
• Long-distance relationship
– Which book should Peter buy?
• These dependencies impact statistical
NLP approaches
Dependency: Arguments and
adjuncts
• Dependency
– Concept of dependents
– “Sue watched the man at the next table”
• Sue and the man are dependents of watched.
• The PP “at the next table” is a dependent of man. It modifies man.
• The two noun phrases can be viewed as “arguments” of the verb “watched”.
• Semantic roles
– Agent of an action is the person or thing doing the action [typically the subject]
– Patient – is the person or thing that is being acted on [typically the object]
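The dependency analysis of the example sentence can be written down as a list of head and dependent pairs. The relation labels and the helper name `dependents_of` are illustrative choices, not a standard annotation scheme.

```python
# Head -> dependent relations for "Sue watched the man at the next table",
# following the analysis on the slide. Relation labels are illustrative.
DEPENDENCIES = [
    ("watched", "Sue", "subject / agent"),
    ("watched", "man", "object / patient"),
    ("man", "at the next table", "PP modifier"),
]


def dependents_of(head, deps=DEPENDENCIES):
    """Return all dependents attached to the given head word."""
    return [dependent for h, dependent, _ in deps if h == head]


print(dependents_of("watched"))
print(dependents_of("man"))
```

Storing the relations as triples keeps the head of each phrase explicit, which is the key difference from the phrase structure view.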
Active & Passive voice
• Example
– Children eat candy.
– Candy is eaten by children
Adjuncts
Subcategorization frame
The set of arguments that a verb can appear with is referred to as its subcategorization frame.
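A subcategorization lexicon can be sketched as a mapping from each verb to the argument frames it accepts. The verbs, frames and the function name `allows` below are assumptions chosen for illustration, and the frames list only the arguments that follow the subject.

```python
# Hypothetical subcategorization lexicon: each verb lists the argument
# frames (categories after the subject) it can appear with.
SUBCAT = {
    "sleep": [[]],                          # "The children slept."
    "eat": [[], ["NP"]],                    # "Children eat." / "Children eat candy."
    "give": [["NP", "NP"], ["NP", "PP"]],   # "gave Peter a book" / "gave a book to Peter"
}


def allows(verb, frame):
    """Return True if the verb can appear with this sequence of arguments."""
    return list(frame) in SUBCAT.get(verb, [])


print(allows("give", ["NP", "NP"]))
print(allows("sleep", ["NP"]))
```

A parser can consult such a table to reject analyses in which a verb is given arguments it does not take.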
Selectional restrictions or selectional preferences
X’ Theory
• N’ – “N bar nodes”
• http://en.wikipedia.org/wiki/X-bar_theory
Phrase Structure Ambiguity
Garden Paths
• Parsing the following sentence
– The horse raced past the barn fell.
– A garden path is the phenomenon in which the parse built for “the horse raced past the barn” has to be abandoned to accommodate “fell”.
Ungrammatical constructs
• Parsing may fail or yield multiple parses because of ungrammatical constructs
– Slept children the
• Some sentences may be grammatically
correct but meaningless
– Colorless green ideas sleep furiously.
– The cat barked.
Semantics and Pragmatics
Lexical semantics: study of the meanings of individual words. A second part of semantics studies how the meanings of individual words are combined into the meaning of sentences.
Hypernymy vs Hyponymy
animal is a hypernym of cat
cat is a hyponym of animal
Antonym – words with opposite meanings
Meronymy – part belonging to a whole
tire is a meronym of a car
Holonym – whole corresponding to a part
Synonyms – words with similar meanings
Homonyms – words that are spelled the same but have different meanings
bank – river bank; bank – a financial institution
Polysemy – a word is a polyseme if its different senses (meanings) are related. Example: “branch” could mean part of a tree or a dependent part of an organization.
Ambiguity – lexical ambiguity refers to both homonymy and polysemy
Homophony – homonyms that are also pronounced the same. “bass”, for example, could mean a fish or a low-pitched sound; the two senses are pronounced differently, so it is NOT a homophone.
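The hypernym and meronym relations above can be stored as simple lookup tables. The entries below are a tiny hand-built lexicon for illustration ("organism" is added only to show a longer chain), and the function names are hypothetical.

```python
# Tiny hand-built lexicon (illustrative) of hypernym and holonym links.
HYPERNYM = {"cat": "animal", "dog": "animal", "animal": "organism"}
HOLONYM = {"tire": "car"}  # part -> whole


def hypernyms(word):
    """Follow hypernym links upward, e.g. cat -> animal -> organism."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(word, ancestor):
    """A word is a hyponym of every word on its hypernym chain."""
    return ancestor in hypernyms(word)


def meronyms(whole):
    """Return the parts recorded for a given whole."""
    return [part for part, w in HOLONYM.items() if w == whole]


print(hypernyms("cat"))
print(is_hyponym_of("cat", "animal"))
print(meronyms("car"))
```

Hyponymy is the inverse of hypernymy and meronymy the inverse of holonymy, so one table per pair is enough.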
Compositionality
• Once we have the meaning of individual words,
we need to assemble them into the meaning of a
whole sentence. This is not easy…
– White paper, white hair, white skin, white wine
– Only the paper is white!
– These are examples of collocations
• Idioms – individual word meaning does not
predict the meaning of the whole
– Kick the bucket
– Carriage return
Scope and discourse analysis
• Scope of quantifiers can be tricky
• Discourse analysis requires resolution of
“anaphoric relations”
• Ex. Mary helped Peter get out of the cab.
He thanked her.
• Resolving anaphoric relations means correctly mapping “he” to Peter and “her” to Mary.
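A very naive way to resolve the pronouns in this example is to match each pronoun to the most recently mentioned name of the same gender. The gender table is a hypothetical stand-in for real lexical knowledge, and the function name `resolve` is an assumption of this sketch.

```python
# Naive pronoun resolution by gender agreement. The gender lexicon is a
# hypothetical stand-in; real systems need far richer knowledge.
GENDER = {"Mary": "f", "Peter": "m"}
PRONOUN_GENDER = {"he": "m", "him": "m", "she": "f", "her": "f"}


def resolve(pronoun, candidates):
    """Return the most recently mentioned candidate whose gender matches."""
    wanted = PRONOUN_GENDER[pronoun.lower()]
    for name in reversed(candidates):
        if GENDER.get(name) == wanted:
            return name
    return None


# Entities mentioned in "Mary helped Peter get out of the cab."
mentions = ["Mary", "Peter"]
print(resolve("He", mentions), resolve("her", mentions))
```

Gender agreement is enough here only because the two candidates differ in gender. With "Mary helped Sue", the same sentence would need syntactic or world knowledge to resolve.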
Other areas in linguistics
• Phonetics – study of physical sounds of language –
phenomena like consonants, vowels and intonations.
• Phonology – structure of sound system in languages
• Sociolinguistics – interactions of social organization and
language
• Historical linguistics – study of how language changes
over time
• Psycholinguistics – study of how language is perceived
• Mathematical linguistics – use of mathematical modeling
approach to linguistics