Transcript slides7

Text Books
Natural Language Processing - Lecture Seven
Partial Parsing
Oren Glickman
Department of Computer Science
Bar-Ilan University
‫‪1‬‬
88-680
Syntax
• The study of grammatical relations
between words and other units within
the sentence.
The Concise Oxford Dictionary of Linguistics
• the way in which linguistic elements (as
words) are put together to form
constituents (as phrases or clauses)
Merriam-Webster Dictionary
Brackets
• “I prefer a morning flight”
• [S [NP [pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
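Bracketed notation is easy to process mechanically, and processing it is a quick way to confirm that every opening bracket has a matching close. The following is a minimal Python sketch, not taken from the course; `parse_brackets` and `leaves` are hypothetical helper names. It turns a labeled bracketing into nested `(label, children)` tuples and raises an error on unbalanced input.

```python
import re

def parse_brackets(text):
    """Parse a labeled bracketing such as "[S [NP [pro I]] ...]" into
    nested (label, children) tuples; words are plain strings."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # consume the closing "]"
        return (label, children)

    try:
        tree = parse_node()
    except IndexError:
        raise ValueError("missing closing bracket") from None
    if pos != len(tokens):
        raise ValueError("extra tokens after the root (unbalanced brackets)")
    return tree

def leaves(node):
    """Return the words of a tree in left-to-right order."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words

tree = parse_brackets("[S [NP [pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]")
print(tree[0])       # S
print(leaves(tree))  # ['I', 'prefer', 'a', 'morning', 'flight']
```

Adding a sixth closing bracket after `flight` makes `parse_brackets` raise `ValueError`, because tokens remain after the root `S` has been closed.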
Parse Tree
S
  NP
    Pronoun: I
  VP
    Verb: prefer
    NP
      Det: a
      Nom
        Noun: morning
        Noun: flight
Parsing
• The problem of mapping from a string of words to its parse tree is called parsing.
Generative Grammar
• A set of rules which indicate precisely what can and cannot be a sentence in a language.
• A grammar which precisely specifies
the membership of the set of all the
grammatical sentences in the language
in question and therefore excludes all
the ungrammatical sentences.
Formal Languages
• The set of all grammatical sentences in
a given natural language.
• Are natural languages regular?
English is not a regular language!
• a^n b^n is not regular
• Look at the following English sentences:
– John and Mary like to eat and sleep,
respectively.
– John, Mary, and Sue like to eat, sleep, and
dance, respectively.
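– John, Mary, Sue, and Bob like to eat, sleep, dance, and cook, respectively.
• The number of names must equal the number of verbs: the same counting pattern as a^n b^n, which no finite-state machine can enforce for unbounded n.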
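Recognizing a^n b^n requires remembering how many a's have been seen, with no fixed upper bound. A minimal Python sketch (the name `is_anbn` is hypothetical) makes that memory explicit as a counter; a finite-state machine has only a fixed number of states, so it cannot keep such a count for arbitrarily large n. In the "respectively" sentences, each name plays the role of an a and each verb the role of a b.

```python
def is_anbn(s):
    """Accept exactly the strings a^n b^n (n >= 0) using an unbounded counter."""
    count = 0
    i = 0
    while i < len(s) and s[i] == "a":   # count the leading a's
        count += 1
        i += 1
    while i < len(s) and s[i] == "b":   # cancel one a per b
        count -= 1
        i += 1
    return i == len(s) and count == 0   # nothing left over, counts match

for s in ["", "ab", "aabb", "aab", "abab"]:
    print(repr(s), is_anbn(s))
# '' True, 'ab' True, 'aabb' True, 'aab' False, 'abab' False
```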
Constituents
• Certain groupings of words behave as
constituents.
• Constituents are able to occur in various
sentence positions:
– I saw [the thin boy]
– I saw him talking with [the thin boy]
– [The thin boy] lives across the street
The Noun Phrase (NP)
• Examples:
– He
– Ariel Sharon
– The prime minister
– The minister of defense during the war in
Lebanon.
• They can all appear in a similar context:
___ was born in Kfar-Malal
Prepositional Phrases
• Examples:
– the man in the white suit
– Come and look at my paintings
– Are you fond of animals?
– Put that thing on the floor
Verb Phrases
• Examples:
– Getting to school on time was a struggle.
– He was trying to keep his temper.
– That woman quickly showed me the way to
hide.
Chunking
• Text chunking is dividing sentences into non-overlapping phrases.
• Noun phrase chunking deals with extracting the noun phrases from a sentence.
• While NP chunking is much simpler than parsing, building an NP chunker that is both accurate and very efficient is still a challenging task.
What is it good for?
• Chunking is important because it is used in many applications:
– Information Retrieval & Question Answering
– Machine Translation
– Preprocessing before full syntactic analysis
– Text-to-speech
– Many other applications
What kind of structures should a partial parser identify?
• Different structures are useful for different tasks:
– Partial constituent structure
[NP I] [VP saw [NP a tall man in the park]].
– Prosodic segments
[I saw] [a tall man] [in the park].
– Content word groups
[I] [saw] [a tall man] [in the park].
Chunk Parsing
• Goal: divide a sentence into a sequence of
chunks.
• Chunks are non-overlapping regions of a text:
– [I] saw [a tall man] in [the park].
• Chunks are non-recursive
– a chunk cannot contain other chunks
• Chunks are non-exhaustive
– not all words are included in chunks
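The three properties above can be checked mechanically once chunks are represented as token spans. In this minimal Python sketch, `is_valid_chunking` is a hypothetical helper, and a chunk is assumed to be a `(start, end)` pair of token indices with `end` exclusive. A nested chunk always overlaps its container, so rejecting overlap also enforces non-recursion, while gaps between chunks are accepted.

```python
def is_valid_chunking(spans, n_tokens):
    """Check that (start, end) spans, end exclusive, form a legal chunking."""
    prev_end = 0
    for start, end in sorted(spans):
        if not 0 <= start < end <= n_tokens:
            return False  # empty or out-of-range chunk
        if start < prev_end:
            return False  # overlaps or nests inside the previous chunk
        prev_end = end
    return True  # gaps between chunks are allowed: chunking is non-exhaustive

tokens = ["I", "saw", "a", "tall", "man", "in", "the", "park", "."]
print(is_valid_chunking([(0, 1), (2, 5), (6, 8)], len(tokens)))  # True
print(is_valid_chunking([(2, 8), (6, 8)], len(tokens)))          # False (nested)
```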
Chunk Parsing Examples
• Noun-phrase chunking:
– [I] saw [a tall man] in [the park].
• Verb-phrase chunking:
– The man who [was in the park] [saw me].
• Prosodic chunking:
– [I saw] [a tall man] [in the park].
Chunks and Constituency
Constituents: [a tall man in [the park]].
Chunks: [a tall man] in [the park].
• Chunks are not constituents
– Constituents are recursive
• Chunks are typically subsequences of constituents
– Chunks do not cross constituent
boundaries
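The claim that chunks do not cross constituent boundaries can be stated as a check on token spans. A minimal Python sketch, with hypothetical helpers `crosses` and `respects_constituents` and spans written as `(start, end)` with `end` exclusive: two spans cross when they overlap but neither contains the other.

```python
def crosses(a, b):
    """True if spans a and b (start, end exclusive) overlap without nesting."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end

def respects_constituents(chunks, constituents):
    """True if no chunk crosses the boundary of any constituent."""
    return not any(crosses(c, k) for c in chunks for k in constituents)

# Tokens: a(0) tall(1) man(2) in(3) the(4) park(5)
constituents = [(0, 6), (4, 6)]   # [a tall man in [the park]]
chunks = [(0, 3), (4, 6)]         # [a tall man] in [the park]
print(respects_constituents(chunks, constituents))    # True
print(respects_constituents([(2, 5)], constituents))  # False: "man in the"
```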
Chunk Parsing: Accuracy
• Chunk parsing achieves higher accuracy than full parsing:
– Smaller solution space
– Less word-order flexibility within chunks than
between chunks
– Better locality:
• Fewer long-range dependencies
• Less context dependence
– No need to resolve ambiguity
– Less error propagation
Chunk Parsing: Domain Specificity
Chunk parsing is less domain specific:
• Dependencies on lexical/semantic
information tend to occur at levels
"higher" than chunks:
– Attachment
– Argument selection
– Movement
• Fewer stylistic differences within chunks
Chunk Parsing: Efficiency
• Chunk parsing is more efficient than full parsing:
– Smaller solution space
– Relevant context is small and local
– Chunks are non-recursive
– Chunk parsing can be implemented with a
finite state machine
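To illustrate the finite-state claim, the noun phrase tag pattern `<DT>? <JJ>* <NN.?>` that appears later in these slides can be written as an explicit deterministic finite automaton over part-of-speech tags. In this minimal Python sketch the state numbering and the names `step` and `fsm_chunk` are hypothetical. The scanner keeps only the current state, takes the longest accepted run starting at each position, and jumps past each chunk so that chunks never overlap.

```python
def is_noun(tag):
    # Mirrors <NN.?>: "NN" plus at most one more character (NN, NNS, NNP, ...).
    return tag.startswith("NN") and len(tag) <= 3

def step(state, tag):
    """DFA for the tag pattern DT? JJ* NN.?; returns None for the dead state.
    States: 0 = start, 1 = after DT, 2 = inside JJ*, 3 = noun seen (accept)."""
    if state == 0 and tag == "DT":
        return 1
    if state in (0, 1, 2) and tag == "JJ":
        return 2
    if state in (0, 1, 2) and is_noun(tag):
        return 3
    return None

ACCEPTING = {3}

def fsm_chunk(tags):
    """Scan left to right; at each position take the longest accepted run."""
    spans = []
    i = 0
    while i < len(tags):
        state, j, last_accept = 0, i, None
        while j < len(tags):
            state = step(state, tags[j])
            if state is None:
                break
            j += 1
            if state in ACCEPTING:
                last_accept = j
        if last_accept is None:
            i += 1                      # no chunk starts here
        else:
            spans.append((i, last_accept))
            i = last_accept             # continue after the chunk
    return spans

print(fsm_chunk(["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"]))  # [(0, 3), (5, 7)]
```

Each state only records how far into the pattern the scan has progressed, never a stack of open phrases, which is why non-recursive chunks fit a finite-state model.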
Psycholinguistic Motivations
Chunk parsing is psycholinguistically motivated:
• Chunks as processing units
– Humans tend to read texts one chunk at a time
– Eye-movement tracking studies
• Chunks are phonologically marked
– Pauses, stress patterns
• Chunking might be a first step in full parsing
Chunk Parsing Techniques
• Chunk parsers usually ignore lexical content
• They only need to look at part-of-speech tags
• Techniques for implementing chunk parsing:
– Regular expression matching / finite-state machines
– Transformation-Based Learning
– Memory-Based Learning
– Others
Regular Expression Matching
• Define a regular expression that
matches the sequences of tags in a
chunk
– A simple noun phrase chunk regexp:
<DT>? <JJ>* <NN.?>
– Chunk all matching subsequences:
the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
[the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
– If matching subsequences overlap, the first one gets priority
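The tag pattern above is written in an angle-bracket style resembling NLTK's `RegexpParser` syntax (an assumption about its origin). The following minimal Python sketch does not use NLTK; it translates the pattern into an ordinary `re` pattern over a string of tags such as `<DT><JJ><NN>`, and `regexp_chunk` and `render` are hypothetical names. Because `re.finditer` scans left to right and returns non-overlapping matches, the earlier of two overlapping candidates wins, as the slide describes.

```python
import re

# The slide's pattern <DT>? <JJ>* <NN.?> as a Python regex over a tag string.
# "." is restricted to non-bracket characters so a match never spans two tags.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN[^<>]?>")

def regexp_chunk(tagged):
    """Return NP chunk spans (start, end) over token indices, end exclusive."""
    tag_string = ""
    offset_to_index = {}
    for i, (_, tag) in enumerate(tagged):
        offset_to_index[len(tag_string)] = i  # where token i's tag begins
        tag_string += f"<{tag}>"
    offset_to_index[len(tag_string)] = len(tagged)  # end of the sentence
    # finditer scans left to right and never returns overlapping matches,
    # so of two overlapping candidates the one starting first wins.
    return [(offset_to_index[m.start()], offset_to_index[m.end()])
            for m in NP_PATTERN.finditer(tag_string)]

def render(tagged, spans):
    """Return word/TAG tokens as a string, with square brackets around chunks."""
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        token = f"{word}/{tag}"
        if i in starts:
            token = "[" + token
        if i + 1 in ends:
            token += "]"
        out.append(token)
    return " ".join(out)

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(render(sentence, regexp_chunk(sentence)))
# [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
```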
Chunking as Tagging
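• Map part-of-speech tag sequences to {I,O,B}*
– I: tag is part of an NP chunk
– O: tag is not part of any NP chunk
– B: the first tag of an NP chunk which immediately follows another NP chunk
• Example:
– Input: The little cat sat on the mat
– Output: B I I O O B I
– This output uses the common variant in which B marks the first word of every chunk; under the stricter definition of B above, which applies only when a chunk immediately follows another chunk, the output is I I I O O I I.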
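The definition of B given here (used only when a chunk immediately follows another chunk) is often called IOB1, and the variant that puts B on the first word of every chunk is often called IOB2. A minimal Python sketch of both encodings plus a decoder that accepts either; the function names are hypothetical, and chunks are assumed to be `(start, end)` token spans with `end` exclusive.

```python
def spans_to_iob2(n_tokens, spans):
    """B on the first word of every chunk, I inside, O outside."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags

def spans_to_iob1(n_tokens, spans):
    """I inside chunks; B only on a chunk that directly follows another chunk."""
    tags = ["O"] * n_tokens
    prev_end = None
    for start, end in sorted(spans):
        for i in range(start, end):
            tags[i] = "I"
        if start == prev_end:
            tags[start] = "B"
        prev_end = end
    return tags

def iob_to_spans(tags):
    """Decode either encoding back into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # a B closes the open chunk
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

# "The little cat sat on the mat" with chunks [The little cat] and [the mat]
spans = [(0, 3), (5, 7)]
print(spans_to_iob2(7, spans))  # ['B', 'I', 'I', 'O', 'O', 'B', 'I']
print(spans_to_iob1(7, spans))  # ['I', 'I', 'I', 'O', 'O', 'I', 'I']
```

With two adjacent chunks, such as spans `(0, 2)` and `(2, 4)`, the IOB1 encoding is `I I B I`: the B is what separates the chunks when no O lies between them.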
Chunking State of the Art
• 90-95%, depending on the task specification and test set
Homework
Context-Free Grammars
• Putting the constituents together
• Next Week…