Lecture 6 - School of Computer Science


Grammar Writing
Lecture 6
11-721
Grammars and Lexicons
Teruko Mitamura
www.cs.cmu.edu/~teruko
Carnegie Mellon
School of Computer Science
LTI Grammars and Lexicons
Copyright © 2007, Carnegie Mellon. All Rights Reserved.
1
Schedule: November 26
• Review of Bird2.gra
• Feedback on the 1st assignment
• Project Grading Criteria
• Submit Grammar exercise (mlb.gra/results)
• Start Grammar exercise (Japanese grammar)
Review: Bird 2 Grammar
• Goal: To learn more about unification
• Much better than bird.gra
• Fine to have semantic features in the f-structure
• Some problems:
  – Semantic features that are not scalable
    • ((x0 semclass) = Morris)
    • (meower -) for “bird”
  – Incomplete f-structures
  – Ambiguous f-structures
Feedback on the 1st assignment
Planning
• Make sure to estimate the time for each step (or each type) and the total time
• No schedule was presented
• No duration was presented
• More time for testing and debugging may be needed
• No additional types were presented
Design
• Subject-verb agreement with the “be” verb and a regular verb
  – Test sentences must be full sentences, not just a pronoun and the “be” verb.
  – Test sentences should contain all the pronouns.
Feedback on the 1st assignment (2)
• Intransitive/Transitive/Ditransitive with PP
  – Make sure that the sentences contain a PP.
  – PP attachment may be ambiguous. The grammar should be able to handle both noun attachment and verb attachment.
• Conjunction with PP and NP: the conjoined phrase should be in an object position, not in a subject position
• If you choose Passive:
  – Test sentences should contain a “by” phrase.
• If you choose Sentences with the modified nominal construction:
  – Test sentences must be full sentences, not just NPs.
Feedback on the 1st assignment (3)
• If you choose Sentences with Noun-Noun compounds, test sentences must be full sentences.
• The optional notation doesn’t work with the parser, e.g.
  NP -> (DET) N
• A quotation mark, as in <N’>, doesn’t work inside Lisp.
• Not enough test sentences
• No statement of testing purposes or of reasons for pass/failure
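Because the parser does not accept the optional notation, a rule such as NP -> (DET) N has to be written out as separate rules, one with and one without the optional constituent. The Python sketch below is a hypothetical helper (not part of the course tools) that spells out every alternative; with n optional items a rule expands into up to 2^n plain rules, each of which still needs its own equations in the grammar file.

```python
from itertools import product

def expand_optionals(lhs, rhs):
    """Expand a rule whose right-hand side marks optional items as '(X)'
    into the equivalent list of rules without optional items."""
    choices = []
    for sym in rhs:
        if sym.startswith("(") and sym.endswith(")"):
            choices.append([sym[1:-1], None])  # optional: present or absent
        else:
            choices.append([sym])              # obligatory
    rules = []
    for combo in product(*choices):
        body = [s for s in combo if s is not None]
        if body:  # drop the empty expansion if everything was optional
            rules.append((lhs, body))
    return rules

for lhs, body in expand_optionals("NP", ["(DET)", "N"]):
    print(lhs, "->", " ".join(body))
# NP -> DET N
# NP -> N
```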
Feedback on the 1st assignment (4)
• No feature structures
• No attribute-value pairs
• You don’t need to write attribute-value pairs for each sentence type
• Test sentences were too short
• Some students did more work than was required for the 1st assignment
• Overall comment
  – Follow the steps of the grammar-writing process (see the slides or handouts)
  – If you have questions, please ask
Pseudo-Unification
• Ordering of test and action equations
• Right- and left-hand sides of equations
• Split f-structures
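The ordering point can be illustrated with a deliberately simplified model in Python. Assumptions: f-structures are nested dicts with atomic string values, and '=' and '=c' stand in for action and test equations that are applied strictly in order. This is not the parser's actual pseudo-unification (there are no x0/x1 variables, disjunctions, or structure copying); it only shows why a test placed before the action that sets its value fails.

```python
def get_path(fs, path):
    """Follow a list of attributes; return None if any step is missing."""
    for attr in path:
        if not isinstance(fs, dict) or attr not in fs:
            return None
        fs = fs[attr]
    return fs

def set_path(fs, path, value):
    """Action equation: put value at path. Returns False on a conflict."""
    for attr in path[:-1]:
        nxt = fs.setdefault(attr, {})
        if not isinstance(nxt, dict):
            return False  # an atomic value already occupies this attribute
        fs = nxt
    old = fs.get(path[-1])
    if old is not None and old != value:
        return False      # conflicting values
    fs[path[-1]] = value
    return True

def apply_equations(equations):
    """Apply equations strictly left to right to an empty f-structure.
    ('=', path, value) is an action; ('=c', path, value) is a test that
    only succeeds if an earlier equation already put value at path."""
    fs = {}
    for op, path, value in equations:
        if op == "=":
            ok = set_path(fs, path, value)
        elif op == "=c":
            ok = get_path(fs, path) == value
        else:
            raise ValueError(f"unknown operator {op!r}")
        if not ok:
            return None  # the rule fails
    return fs

action = ("=", ["subj", "agr"], "3sg")
test = ("=c", ["subj", "agr"], "3sg")
print(apply_equations([action, test]))  # {'subj': {'agr': '3sg'}}
print(apply_equations([test, action]))  # None: the test ran too early
```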
Grammar Writing Project
Grading Criteria
There are 5 parts, each given a separate letter grade
See the introduction slides for references
1. Plan/Design
2. Grammar
3. Test Suites
4. Results
5. Issues/Discussions
1. Plan/Design
• Clear statement of goals, tasks, time
estimates and actual time spent
• Set of structures covered
• Trees for each type of construction
• Grammatical functions used
• Feature structures used (attribute values)
– See the grammar exercise cover page
2. Grammar
• Grammar writing principles followed:
  – Generality
  – Extensibility
  – Simplicity
  – Selectivity
• Quality of grammar: easy to understand
– Well-organized
– Well-documented, etc.
3. Test Suites
• Enough sentences to test each type of
construction
• Include both grammatical and
ungrammatical sentences
• Clear statement of testing purposes, reasons
for pass/failure
3. Test Suites (2)
Examples
1. Full sentences that contain the “be” verb
   Test for person agreement: *I are a student
   Test for number agreement: *She is teachers
   Test for case agreement: *Me is a teacher
   Test for missing determiner: *She is teacher
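Writing the test suite as data makes the purpose and the expected outcome of each sentence explicit. The Python sketch below uses a toy checker as a stand-in for the course parser; the pronoun list, the feature names, and the order of the checks are assumptions made for illustration, not part of the assignment. Note that the toy checker reports the person clash in *I are a student* as a "subject-verb" failure.

```python
# Toy lexicon (assumed for this sketch): person, number, and case features.
PRONOUNS = {
    "i":    {"person": 1, "number": "sg", "case": "nom"},
    "me":   {"person": 1, "number": "sg", "case": "acc"},
    "she":  {"person": 3, "number": "sg", "case": "nom"},
    "we":   {"person": 1, "number": "pl", "case": "nom"},
    "they": {"person": 3, "number": "pl", "case": "nom"},
}
# Each form of "be" lists the (person, number) pairs it agrees with.
BE = {
    "am":  {(1, "sg")},
    "is":  {(3, "sg")},
    "are": {(2, "sg"), (1, "pl"), (2, "pl"), (3, "pl")},
}
NOUNS = {"student": "sg", "teacher": "sg", "students": "pl", "teachers": "pl"}
DETS = {"a": "sg", "the": None}  # None: compatible with either number

def check(sentence):
    """Stand-in for the parser on 'PRONOUN be (DET) NOUN' sentences.
    Returns 'ok' or the name of the first check that fails."""
    words = sentence.lower().split()
    subj, be_forms, rest = PRONOUNS[words[0]], BE[words[1]], words[2:]
    det = rest[0] if len(rest) == 2 else None
    noun_number = NOUNS[rest[-1]]
    if subj["case"] != "nom":
        return "case"
    if (subj["person"], subj["number"]) not in be_forms:
        return "subject-verb"
    if noun_number == "sg" and det is None:
        return "determiner"
    if noun_number != subj["number"]:
        return "number"
    if det is not None and DETS[det] not in (None, noun_number):
        return "number"
    return "ok"

# Each test: sentence, expected result, and the purpose of the test.
TESTS = [
    ("I am a student", "ok", "grammatical baseline"),
    ("I are a student", "subject-verb", "person agreement"),
    ("She is teachers", "number", "number agreement"),
    ("Me is a teacher", "case", "case agreement"),
    ("She is teacher", "determiner", "missing determiner"),
]

for sentence, expected, purpose in TESTS:
    result = check(sentence)
    status = "PASS" if result == expected else "FAIL"
    print(f"{status}: {sentence} ({purpose}; got {result})")
```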
4. Results
• Grammatical sentences – all parsed
• Ungrammatical sentences – all failed
• Well-formed f-structures
  – No NILs in the f-structures
  – No missing f-structures or elements
  – No unnecessary redundancy
  – No unnecessary feature structures
  – No conflicting information in the f-structures
  – No unnecessary ambiguity
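Some of these criteria can be checked mechanically once the parser output is available. The Python sketch below assumes the f-structures have already been converted into dicts, with NIL represented as None; that conversion and the example f-structure are hypothetical. It flags NIL or empty values and more than one f-structure per sentence, while redundancy and conflicting information still need a manual review.

```python
def find_nils(fs, path=()):
    """Return the attribute paths whose value is NIL (None) or an empty f-structure."""
    found = []
    for attr, value in fs.items():
        here = path + (attr,)
        if value is None or value == {}:
            found.append(here)
        elif isinstance(value, dict):
            found.extend(find_nils(value, here))
    return found

def review(parses):
    """Collect mechanical warnings for the f-structures of one sentence."""
    notes = []
    if len(parses) > 1:
        notes.append(f"ambiguous: {len(parses)} f-structures")
    for i, fs in enumerate(parses):
        for p in find_nils(fs):
            notes.append(f"parse {i}: NIL at {' '.join(p)}")
    return notes

# Hypothetical f-structure with a goal slot that was never filled.
parse = {"pred": "go", "tense": "past", "subj": {"pred": "Hideki"}, "goal": None}
print(review([parse]))  # ['parse 0: NIL at goal']
```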
5. Issues/Discussions (What you learned)
• Any interesting findings
• Grammar design issues (decisions and results)
• Grammar writing principles vs. actual
implementation issues
• Time estimate vs. Actual time spent
• Problems and Reasons
– E.g. Ambiguity: reasons for more than one parse
– E.g. Any limitations that you encountered
• Platform, parser, grammar limitations
• Other issues/discussions/future extension
Files to submit
1. Plan and Grammar Design
2. Test Suite
3. Grammar File
- Specify the location of the grammar file
4. Test Results: two separate files
   1. Grammatical sentences (parsed)
   2. Ungrammatical sentences (failed)
   No traces in the results files: set (dmode 0).
5. Issues/Discussions
   – If you use English grammar references, list them.
Project Due
• December 7 (Friday) at 3pm
• Late submissions will be downgraded.
• Cari will collect the project (hard copy).
• Don’t send the files by email.
• Work alone.
Grammar Exercise
(Japanese Grammar)
• Free word-order language
• SOV language
• Case markers determine grammatical relations (ga, wo, ni, de, etc.)
• Grammar file: jpn.gra
• Test files: jpn-test1.lisp
/afs/cs/project/cmt-55/lti/Lab/Modules/GNL-721/2007/
Japanese Lexicon
nichiyoubi (Sunday)
nyuuyooku (New York)
hoomuran (home run)
itta (went)
utta (hit-past)
Hideki, Ichiro (person’s name)
ga (NOM case)
wo (ACC case)
ni (Time-on)
ni (Loc-to)
e (Loc-to)
de (Loc-at)
Japanese Examples
Nichiyoubi ni Hideki ga Nyuuyooku e itta.
Sunday on Hideki NOM New York to go-PAST
“Hideki went to New York on Sunday.”
Nichiyoubi ni Nyuuyooku e Hideki ga itta.
Hideki ga nichiyoubi ni Nyuuyooku e itta.
Hideki ga Nyuuyooku e nichiyoubi ni itta.
Nyuuyooku e Hideki ga nichiyoubi ni itta.
Nyuuyooku e nichiyoubi ni Hideki ga itta.
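Because the case markers, not word order, carry the grammatical relations, the three postpositional phrases can appear in any order as long as the verb stays final (SOV). A short Python sketch can generate the 3! = 6 orders above as test input (up to capitalization); the phrase list is taken from the example sentence.

```python
from itertools import permutations

# Postpositional phrases (noun + case marker) from the example sentence.
PHRASES = ["Nichiyoubi ni", "Hideki ga", "Nyuuyooku e"]
VERB = "itta"

def orderings(phrases, verb):
    """All sentences with the phrases in any order and the verb last (SOV)."""
    return [" ".join(order + (verb,)) + "." for order in permutations(phrases)]

for s in orderings(PHRASES, VERB):
    print(s)
```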
Japanese Examples (2)
Hideki ga Nyuuyooku e itta
Nyuuyooku e Hideki ga itta
Nichiyoubi ni Nyuuyooku e itta
Nyuuyooku e nichiyoubi ni itta
Hideki ga nichiyoubi ni itta
Nichiyoubi ni Hideki ga itta
Hideki ga itta
Nyuuyooku e itta
Nichiyoubi ni itta
Japanese Example
Nichiyoubi ni Ichiro ga hoomuran wo utta.
Sunday on Ichiro NOM home run ACC hit-PAST
“Ichiro hit a home run on Sunday.”
Ungrammatical Sentences
• A sentence cannot contain two nominatives or two accusatives (see jpn-test1.lisp).
*Hideki ga nichiyoubi ga itta
*Hideki ga Hideki ga itta
*Hideki ga hoomuran ga utta
*Hideki wo hoomuran wo utta
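The constraint itself is easy to state as a check over the case markers, as in the Python sketch below; it is a hypothetical test-suite helper, not a grammar rule, and in jpn.gra the same effect has to come from the equations.

```python
# Case markers that may occur at most once per sentence, per the slide.
UNIQUE_CASES = {"ga": "NOM", "wo": "ACC"}

def duplicate_case(sentence):
    """Return the first case (NOM or ACC) marked twice, or None."""
    words = sentence.lstrip("*").rstrip(".").split()
    seen = set()
    for word in words:
        case = UNIQUE_CASES.get(word)
        if case is None:
            continue  # nouns, verbs, and other particles are ignored
        if case in seen:
            return case
        seen.add(case)
    return None

for s in ["*Hideki ga nichiyoubi ga itta", "*Hideki ga Hideki ga itta",
          "*Hideki ga hoomuran ga utta", "*Hideki wo hoomuran wo utta",
          "Nichiyoubi ni Ichiro ga hoomuran wo utta."]:
    print(s, "->", duplicate_case(s))
```

The *Hideki ga Hideki ga itta* case deserves attention: both nominative phrases are identical, so if the grammar only unifies a new subject with an existing one, the two values would not conflict and the sentence could slip through. A check that the subject slot is still empty avoids this.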
Japanese Grammar
• Use of recursive rules
(<S> <==> (<NP> <S>))
(<S> <==> (<V>))
Recursive Rules
(S (NP (N Hideki) (P ga))
   (S (NP (N Nyuuyooku) (P e))
      (S (V itta))))
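The two recursive rules can be mirrored by a small recursive-descent function in Python: each call either consumes a single verb (S -> V) or a noun plus case marker followed by another S (S -> NP S). The mini-lexicon, the role names subj, obj, goal, and loc, and the f-structure layout are assumptions for illustration only; "ni" is left out because it is ambiguous between Time-on and Loc-to.

```python
# Hypothetical mini-lexicon; "ni" is omitted because it is ambiguous.
NOUNS = {"hideki", "ichiro", "nichiyoubi", "nyuuyooku", "hoomuran"}
CASE_ROLES = {"ga": "subj", "wo": "obj", "e": "goal", "de": "loc"}
VERBS = {"itta": "go", "utta": "hit"}  # both forms are past tense

def parse_s(words):
    """S -> V | NP S, with NP -> N P. Returns an f-structure (dict) or None."""
    if len(words) == 1:                              # S -> V
        pred = VERBS.get(words[0])
        return {"pred": pred, "tense": "past"} if pred else None
    if len(words) >= 3 and words[0] in NOUNS and words[1] in CASE_ROLES:
        rest = parse_s(words[2:])                    # S -> NP S: recurse on the tail
        role = CASE_ROLES[words[1]]
        if rest is None or role in rest:             # no parse, or slot already filled
            return None
        rest[role] = {"pred": words[0]}
        return rest
    return None

def parse(sentence):
    return parse_s(sentence.lower().rstrip(".").split())

print(parse("Hideki ga Nyuuyooku e itta"))
# {'pred': 'go', 'tense': 'past', 'goal': {'pred': 'nyuuyooku'}, 'subj': {'pred': 'hideki'}}
```

Swapping the two phrases (*Nyuuyooku e Hideki ga itta*) yields an equal f-structure, which is the free word order behavior the grammar needs, and the filled-slot check rejects a second ga-phrase.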
Japanese Grammar (2)
• “ni” is ambiguous in Japanese: Time-on or Loc-to
  Nichiyoubi ni itta (went on Sunday)
  Nyuuyooku ni itta (went to New York)
• Both “ni” and “e” can be used for Loc-to
  Nyuuyooku ni/e itta (went to New York)
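Left unresolved, every “ni” phrase doubles the number of possible analyses. One possible way to cut this down is to let the noun's semantic class select the reading, in the spirit of the semantic features discussed for Bird2. The slides do not say how jpn.gra should resolve the ambiguity, so the semantic classes and the reading-to-class table in this Python sketch are assumptions.

```python
# Hypothetical semantic classes for the nouns in the lexicon.
SEMCLASS = {"nichiyoubi": "time", "nyuuyooku": "place"}

# Readings of each postposition, following the lexicon slide.
READINGS = {"ni": ["time-on", "loc-to"], "e": ["loc-to"], "de": ["loc-at"]}

# Semantic class each reading expects (an assumption of this sketch).
EXPECTS = {"time-on": "time", "loc-to": "place", "loc-at": "place"}

def readings(noun, particle, use_semantics=True):
    """List the possible readings of a noun + postposition phrase.
    Nouns without a known class get no reading when semantics is used."""
    options = READINGS[particle]
    if not use_semantics:
        return options
    cls = SEMCLASS.get(noun.lower())
    return [r for r in options if EXPECTS[r] == cls]

print(readings("Nichiyoubi", "ni", use_semantics=False))  # ['time-on', 'loc-to']
print(readings("Nichiyoubi", "ni"))                        # ['time-on']
print(readings("Nyuuyooku", "ni"))                         # ['loc-to']
```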
Japanese grammar exercise
• Grammar file:
/afs/cs/project/cmt-55/lti/Lab/Modules/GNL721/2007/jpn.gra
• Test file:
jpn-test1.lisp
Questions?