Lecture 2: 13/3/2006
Introduction to Computational
Linguistics
Eleni Miltsakaki
AUTH
Spring 2006-Lecture 2
1
Outline of English syntax
• Words
• Phrases
• Simple Sentences
2
Review
• What is computational linguistics?
• What is the subject matter of theoretical
computational linguistics?
• What is the subject matter of applied
computational linguistics?
• Why is language hard for the computer?
3
Review
• Give examples of
– Syntactic ambiguity
– Semantic ambiguity
– Phonological ambiguity
4
Words
• Two basic ways to form words
– Inflectional (e.g. English verbs)
• Open + ed = opened
• Open + ing = opening
– Derivational (e.g. adverbs from adjectives, nouns
from adjectives)
• Happy → happily (adverbs from adjectives)
• Happy → happiness (nouns from adjectives)
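A minimal sketch of the two word-formation processes above. The function names `inflect` and `derive` are hypothetical, and the spelling rule (a final "y" becomes "i" before a derivational suffix) is a toy assumption that covers only the slide's examples, not English morphology in general.

```python
def inflect(stem: str, suffix: str) -> str:
    """Inflection: attach a grammatical suffix such as -ed or -ing."""
    return stem + suffix


def derive(stem: str, suffix: str) -> str:
    """Derivation: build a new word, often of a different word class.

    Toy spelling rule (an assumption): a final 'y' becomes 'i'.
    """
    if stem.endswith("y"):
        stem = stem[:-1] + "i"
    return stem + suffix


print(inflect("open", "ed"))    # opened
print(inflect("open", "ing"))   # opening
print(derive("happy", "ly"))    # happily
print(derive("happy", "ness"))  # happiness
```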
5
Basic classes of words
• Classes of words aka parts of speech (POS)
– Nouns
– Verbs
– Adjectives
– Adverbs
• The above classes are open class words
• We also have closed class words
– Articles, pronouns, prepositions, particles, quantifiers,
conjunctions
6
Basic phrases
• A word from an open class can be used to
form the basis of a phrase
• The basis of a phrase is called the head
7
Examples of phrases
• Noun phrases
– The manager of the institute
– Her worry to pass the exams
– Several students from the English Department
• Adjective phrases
– easy to understand
– mad as a dog
– glad that he passed the exam
8
Examples of phrases
• Adverb phrases
– fast like the wind
– outside the building
• Verb phrases
– ate her sandwich
– went to the doctor
– believed what I told him
9
“Complements”
• Notice that, to be meaningful, the verb “go”, for
example, requires a phrase for “location”
– *John went
– John went home
• Such phrases “complete” the meaning of the
verb (or other type of head) and are called
complements
10
Inside the noun phrase
• NPs are used to refer to things: objects, places,
concepts, events, qualities, etc
• NPs may consist of:
– A single pronoun (he, she, etc)
– A name or proper noun (John, Athens, etc)
– A specifier and a noun
– A qualifier and a noun
– A specifier and a qualifier and a noun (e.g., the first
three winners)
11
Specifiers
• Specifiers indicate how many objects are
described and also how these objects
relate to the speaker
• Basic types of specifiers
– Ordinals (e.g., first, second)
– Cardinals (e.g., one, two)
– Determiners (see next slide)
12
Determiners
• Basic types of determiners
– Articles (the, a, an)
– Demonstratives (this, that, these, those)
– Possessives (‘s, her, my, whose, etc)
– Wh-determiners (which, what –in questions)
– Quantifying determiners (some, every, most,
no, any etc)
13
Qualifiers
• Basic types of qualifiers
– Adjectives
• Happy cat
• Angry feelings
– Noun modifiers
• Cook book
• University hospitals
14
Inside the verb phrase
• A simple VP
– Adverbial modifier + head verb +
complements
• Types of verbs
– Auxiliary (be, do, have)
– Modal (will, can, could)
– Main (eat, work, think)
15
Types of verb complements
• Intransitive verbs do not require complements
• Transitive verbs require an object as a complement (e.g.
find a key)
• Transitive verbs allow passive forms (e.g. a key was
found)
• Ditransitive verbs require one direct and one indirect
object (e.g. give Mary a book)
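The verb types above can be sketched as a small lookup table of complement patterns. The `SUBCAT` table, the function name `licenses`, and the verb "sleep" are hypothetical illustrations; each verb is given only the single pattern named on the slide, which is a simplification.

```python
# Hypothetical toy lexicon: each verb lists the complement patterns it allows.
SUBCAT = {
    "sleep": [[]],            # intransitive: no complement
    "find": [["NP"]],         # transitive: one object
    "give": [["NP", "NP"]],   # ditransitive: indirect + direct object
}


def licenses(verb, complements):
    """Return True if the verb allows exactly this list of complements."""
    return list(complements) in SUBCAT.get(verb, [])


print(licenses("find", ["NP"]))        # find a key
print(licenses("give", ["NP", "NP"]))  # give Mary a book
print(licenses("sleep", ["NP"]))       # an intransitive verb with an object
```

The first two calls return True and the last returns False, because "sleep" lists only the empty complement pattern.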
16
Other verb complements
• Clausal complements
– Some verbs require clausal complements
• Mary knows that John left
• Prepositional phrase complements
– Some verbs require specific PP complements
• Mary gave the book to John
– Others require any PP complement
• John put the book on the shelf/in the room/under the table
17
Adjective phrases
• Simple
– Angry, easy, etc
• Complex
– Pleased with the prize
– Angry at the committee
– Willing to read the book
• Complex AdjPs normally do not precede nouns; they are
used as complements of verbs such as be or seem
18
Adverbial phrases
• Indicators of
– Degree
– Location
– Manner
– The time of something (now, yesterday, etc)
– Frequency
– Duration
• Location in the sentence
– Initial
– Medial
– Final
19
The famous argument-adjunct
problem
• Sometimes it’s hard to say if an adverbial is a
verb complement (i.e. it’s an argument of the
verb) or simply a modification of the verb phrase
(i.e. an adjunct)
• Consider
– Mary put the book on the shelf
– *Mary put
– Mary painted the room with a brush
– Mary painted the room
20
Grammars and parsing
• What is syntactic parsing?
– Determining the syntactic structure of a
sentence
• Basic steps
– Identify sentence boundaries
– Identify the part of speech of each word
– Identify syntactic relations
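A rough sketch of the first two steps, assuming a naive punctuation rule for sentence boundaries and a tiny hand-written lexicon for parts of speech. `LEXICON`, `split_sentences` and `tag` are hypothetical names; real systems need much more than this (abbreviations, ambiguous words, unknown words). The third step, identifying syntactic relations, is the job of the parser and is not covered here.

```python
import re

# Toy lexicon (an assumption): one part of speech per word.
LEXICON = {
    "john": "N", "mary": "N", "pizza": "N",
    "ate": "V", "left": "V",
    "the": "Det",
}


def split_sentences(text):
    """Step 1: split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag(sentence):
    """Step 2: look each word up in the lexicon; unknown words get 'UNK'."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in words]


for sentence in split_sentences("John ate the pizza. Mary left."):
    print(tag(sentence))
```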
21
Tree representation
• John ate the pizza
(S (NP (N John))
(VP (V ate)
(NP (Det the)
(N pizza))))
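A small sketch of how a bracketed tree for "John ate the pizza" could be read into a nested data structure. The representation (a `(label, children)` tuple) and the function names `parse_brackets` and `leaves` are assumptions made for illustration.

```python
def parse_brackets(text):
    """Read a bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # a subtree
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return read()


def leaves(tree):
    """Collect the words at the bottom of the tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree = parse_brackets("(S (NP (N John)) (VP (V ate) (NP (Det the) (N pizza))))")
print(tree[0])                 # S
print(" ".join(leaves(tree)))  # John ate the pizza
```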
22
Some basic tree terminology
• Nodes
• Links
• Root
• Leaves
• Parent node
• Child node
• Ancestor
• The notion of “domination”
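The terms above can be made concrete on the tree for "John ate the pizza". This is a sketch under assumptions: the tree is stored as nested `(label, children)` tuples, words are plain strings, and the helper names are hypothetical. A node dominates another if the second is one of its descendants.

```python
# The root is the S node; the leaves are the words.
TREE = ("S", [
    ("NP", [("N", ["John"])]),
    ("VP", [("V", ["ate"]),
            ("NP", [("Det", ["the"]), ("N", ["pizza"])])]),
])


def label(node):
    return node[0] if isinstance(node, tuple) else node


def children(node):
    """Child nodes; a word (plain string) has none."""
    return node[1] if isinstance(node, tuple) else []


def descendants(node):
    """Yield every node below `node` (children, grandchildren, ...)."""
    for child in children(node):
        yield child
        yield from descendants(child)


def dominates(a, b):
    """True if `a` is an ancestor of `b`."""
    return any(d is b for d in descendants(a))


def leaves(node):
    return [label(d) for d in descendants(node) if not children(d)]


subject_np = TREE[1][0]  # child of the root
vp = TREE[1][1]          # child of the root
object_np = vp[1][1]     # child of VP

print(leaves(TREE))                   # the words of the sentence
print(dominates(TREE, object_np))     # the root dominates every node
print(dominates(vp, subject_np))      # VP does not dominate the subject NP
```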
23
How to construct a tree
• To construct a tree of an English sentence
you need to know which structures are legal
in English
• Rewrite rules
– Describe what tree structures are allowed in
the language
24
Rewrite rules for English
NP ==> N
NP ==> Det N
VP ==> V
VP ==> V NP
S ==> NP VP
S
==> NP VP
==> N VP
==> John VP
==> John V NP
==> John ate NP
==> John ate Det N
==> John ate the N
==> John ate the pizza
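The derivation above can be replayed mechanically. The sketch below assumes the phrase rules used in the derivation (including NP ==> Det N) plus lexical rules such as N ==> John, which are not listed on the slide and are added here as an assumption. The names `GRAMMAR`, `rewrite` and `derive` are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
    # Lexical rules: assumed here, not listed on the slide.
    "N": [["John"], ["pizza"]],
    "V": [["ate"]],
    "Det": [["the"]],
}

# One rule application per line of the derivation.
STEPS = [
    ("S", ["NP", "VP"]),
    ("NP", ["N"]),
    ("N", ["John"]),
    ("VP", ["V", "NP"]),
    ("V", ["ate"]),
    ("NP", ["Det", "N"]),
    ("Det", ["the"]),
    ("N", ["pizza"]),
]


def rewrite(form, lhs, rhs):
    """Replace the leftmost occurrence of `lhs` in `form` with `rhs`."""
    if rhs not in GRAMMAR.get(lhs, []):
        raise ValueError(f"no rule {lhs} ==> {' '.join(rhs)}")
    i = form.index(lhs)
    return form[:i] + rhs + form[i + 1:]


def derive(start, steps):
    """Return every intermediate form, beginning with the start symbol."""
    forms = [[start]]
    for lhs, rhs in steps:
        forms.append(rewrite(forms[-1], lhs, rhs))
    return forms


forms = derive("S", STEPS)
print(" ".join(forms[0]))
for form in forms[1:]:
    print("==>", " ".join(form))
```

Asking for a rule that is not in the grammar (for example VP ==> NP V) raises an error, which is the sense in which rewrite rules describe what structures are allowed.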
25
What makes a good grammar?
• Generality
– The range of sentences covered by the rules
• Selectivity
– The range of sentences that can be identified
as ungrammatical
• Understandability
– How simple the grammar is
26
Hint for making rules general
• Pay attention to constituents
• Diagnostic of constituency
– Conjunction
• Compare
– I ate a hamburger and a hot dog
– I will eat the hamburger and throw away the hot dog
– I ate a hamburger and John ate a hot dog
– *I ate a hamburger and on the stove
– *I ate a cold hot dog and well burned
– *I ate the hot dog
27
How the conjunction test can help
• Compare
– I looked up John’s number
– I looked up John’s chimney
– *I looked up John’s number and in his
cupboards
– I looked up John’s chimney and in his
cupboards
28
Parsing strategies
• Top-down
– A top down parser starts with S and attempts to rewrite it into a
sequence of terminal symbols that matches the words in the input
sentence
• Bottom-up
– You take a sequence of symbols and match it to the right hand side of
the rule, i.e. start with Det N and match it to get the NP
• Bottom-up chart parsing
– To avoid unnecessary repetition of the matching process you use a data
structure called a chart that allows you to record partial results
We’ll see examples in J. Allen’s Natural Language Understanding, Chapter 3
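A minimal sketch contrasting the first two strategies on the small grammar from the rewrite-rule slide. The lexical rules and all function names are assumptions, both functions only recognize (they answer yes or no and build no tree), and the bottom-up version uses plain backtracking rather than a chart, so it repeats work that a chart parser would record.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
    # Lexical rules: assumed for illustration.
    "N": [["John"], ["pizza"]],
    "V": [["ate"]],
    "Det": [["the"]],
}


def expand(symbol, words, start):
    """Top-down: every position where `symbol`, begun at `start`, can end."""
    if symbol not in GRAMMAR:  # a word: it must match the input
        if start < len(words) and words[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        for part in rhs:
            positions = {e for p in positions for e in expand(part, words, p)}
        ends |= positions
    return ends


def recognize_top_down(words):
    """Start with S and try to rewrite it into exactly the input words."""
    return len(words) in expand("S", words, 0)


def recognize_bottom_up(symbols):
    """Match right-hand sides in the input and reduce until only S remains."""
    if symbols == ["S"]:
        return True
    for lhs, alternatives in GRAMMAR.items():
        for rhs in alternatives:
            n = len(rhs)
            for i in range(len(symbols) - n + 1):
                if symbols[i:i + n] == rhs:
                    reduced = symbols[:i] + [lhs] + symbols[i + n:]
                    if recognize_bottom_up(reduced):
                        return True
    return False


for sentence in ["John ate the pizza", "ate the pizza"]:
    words = sentence.split()
    print(sentence, recognize_top_down(words), recognize_bottom_up(words))
```

Both recognizers accept "John ate the pizza" and reject "ate the pizza". The top-down version would loop forever on a left-recursive rule such as NP ==> NP PP; this toy grammar has none.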
29
What is generative capacity?
• The range of languages that a formalism can
describe
• Formal languages allow a precise (mathematical)
characterization
• Natural languages CANNOT be characterized
precisely enough to define generative capacity
30