CS 4705 Final Review
CS4705
Julia Hirschberg
Format and Coverage
• Covers only material from <date> thru <date> (i.e.
beginning with Probabilistic Parsing)
• Same format as midterm:
– Short answers: 2-3 sentences
– True/False: for false statements, provide a true
correction that is not just the negation of the
false statement, e.g.
– Good answer:
• The exam is on Dec 14. FALSE! The exam is on
Dec 16.
– Bad answer:
• The exam is on Dec 14. FALSE! The exam is not on Dec 14.
• Exercises
• Short essays: 2 essays, 3-5 paragraphs each
• The final will be only slightly longer than the
midterm, although you will have the full 3h to
complete it.
Probabilistic Parsing
• Problems with CFGs:
– Rules unordered, many possible parses
• Solutions:
– Weight the rules by their probabilities
– But rules aren’t sensitive to lexical items or
subcategorization frames
– Add headwords to trees
– Add subcategorization probabilities
– Add complement/adjunct distinction
– Etc.
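The "weight the rules by their probabilities" idea as a minimal sketch: in a PCFG the probability of a parse tree is the product of the probabilities of the rules it uses, so competing parses of an ambiguous sentence can be ranked. The grammar, rule probabilities, and example sentence below are made up for illustration.

```python
# Toy PCFG sketch: score a parse tree as the product of the probabilities
# of the rules used to build it. Grammar and probabilities are illustrative.

rule_prob = {
    ("S",  ("NP", "VP")):       1.0,
    ("VP", ("V", "NP")):        0.6,   # PP attaches inside the object NP
    ("VP", ("V", "NP", "PP")):  0.4,   # PP attaches to the VP
    ("NP", ("NP", "PP")):       0.2,
    ("NP", ("Det", "N")):       0.5,
    ("NP", ("Pro",)):           0.3,
    ("PP", ("P", "NP")):        1.0,
}

def tree_prob(tree):
    """tree = (label, [child trees]) for nonterminals, (label, word) for preterminals."""
    label, rest = tree
    if isinstance(rest, str):                  # preterminal over a word
        return 1.0                             # lexical probabilities omitted here
    rhs = tuple(child[0] for child in rest)
    p = rule_prob[(label, rhs)]
    for child in rest:
        p *= tree_prob(child)
    return p

# Two parses of "I saw the man with the telescope":
np_the_man  = ("NP", [("Det", "the"), ("N", "man")])
pp_scope    = ("PP", [("P", "with"), ("NP", [("Det", "the"), ("N", "telescope")])])
vp_attach   = ("S", [("NP", [("Pro", "I")]),
                     ("VP", [("V", "saw"), np_the_man, pp_scope])])
np_attach   = ("S", [("NP", [("Pro", "I")]),
                     ("VP", [("V", "saw"), ("NP", [np_the_man, pp_scope])])])

print(tree_prob(vp_attach))   # 0.030: seeing with the telescope
print(tree_prob(np_attach))   # 0.009: the man who has the telescope
```

In practice the rule probabilities are estimated from a treebank, and the lexicalization steps listed above (headwords, subcategorization, complement/adjunct) address the fact that bare rule probabilities ignore the words themselves.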
Semantics
• Meaning Representations
– Predicate/argument structure and FOPC
∃x, y{Having(x) ∧ Haver(S, x) ∧ HadThing(y, x) ∧ Car(y)}
– Problems with mapping to NL (e.g. English and vs. logical ∧)
• Frame semantics
  Having
    Haver: S
    HadThing: Car
– Problems with reasoning from representation
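To make the contrast concrete, here are the same two representations of "S has a car" rendered as simple data structures; the names and string form are illustrative, not a committed formalism.

```python
# FOPC-style: a single logical formula (string form, for readability only)
fopc = "exists x, y. Having(x) & Haver(Speaker, x) & HadThing(y, x) & Car(y)"

# Frame-style: a slot/filler structure that is easy to match against,
# but gives less support for general logical inference
frame = {
    "frame": "Having",
    "Haver": "Speaker",
    "HadThing": "Car",
}
```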
Subcategorization Frames and Thematic Roles
• What patterns of arguments can different verbs take?
– NP likes NP
– NP likes Inf-VP
– NP likes NP Inf-VP
• What roles can arguments take?
– Agent, Patient, Theme (The ice melted), Experiencer (Bill likes pizza),
Stimulus (Bill likes pizza), Goal (Bill ran to Copley Square), Recipient
(Bill gave the book to Mary), Instrument (Bill ate the burrito with a
plastic spork), Location (Bill sits under the tree on Wednesdays)
Selectional Restrictions
George assassinated the senator.
?The spider assassinated the fly
*Cain assassinated Abel.
George broke the bank.
Lexical Semantics
• Lexemes
• Lexicon
• Wordnet: synsets
• Framenet: subcategorization frames/verb semantics
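A quick way to explore synsets and lexical relations is NLTK's WordNet interface; this sketch assumes the nltk package and its wordnet data are installed, and "bank", "dog", and "big" are just sample lexemes.

```python
# Browsing WordNet synsets with NLTK
# (pip install nltk, then nltk.download('wordnet') once).
from nltk.corpus import wordnet as wn

for syn in wn.synsets("bank"):             # one synset per sense of "bank"
    print(syn.name(), "-", syn.definition())

dog = wn.synset("dog.n.01")
print(dog.hypernyms())                     # more general synsets (e.g. canine)
print(dog.hyponyms()[:3])                  # more specific synsets (e.g. poodle)
print([l.name() for l in wn.synsets("big")[0].lemmas()])   # synonyms in one synset
```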
Word Relations
• Types of word relations
– Homonymy: bank/bank
– Homophones: red/read
– Homographs: bass/bass
– Polysemy: bank/sperm bank
– Synonymy: big/large
– Hyponym/hypernym: poodle/dog
– Metonymy: (printing press)/the press
– Meronymy: (wheel)/car
– Metaphor: Nothing scares Google.
Word Sense Disambiguation
Time flies like an arrow.
• Tasks: all-words vs. lexical sample
• Techniques:
– Supervised, semi-supervised bootstrapping,
unsupervised
– Corpora needed
– Features that are useful
– Competitions and Evaluation methods
• Specific approaches:
– Naïve Bayes, Decision Lists, Dictionary-based,
Selectional Restrictions
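One of the listed approaches, Naïve Bayes, in a minimal sketch: choose the sense that maximizes the log prior plus the summed log likelihoods of the context words, with add-one smoothing. The tiny "bass" training set is invented for illustration.

```python
# Naive Bayes word sense disambiguation sketch: pick the sense s maximizing
# log P(s) + sum over context words w of log P(w | s), with add-one smoothing.
import math
from collections import Counter, defaultdict

train = [
    ("fish",  "caught a huge bass in the lake".split()),
    ("fish",  "bass fishing on the river".split()),
    ("music", "the bass guitar line was loud".split()),
    ("music", "turn up the bass on the speakers".split()),
]

sense_counts = Counter(s for s, _ in train)
word_counts = defaultdict(Counter)
vocab = set()
for sense, words in train:
    word_counts[sense].update(words)
    vocab.update(words)

def disambiguate(context):
    best, best_score = None, float("-inf")
    for sense in sense_counts:
        score = math.log(sense_counts[sense] / len(train))      # prior
        total = sum(word_counts[sense].values())
        for w in context:                                        # bag-of-words features
            score += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best

print(disambiguate("played the bass line on his guitar".split()))   # -> music
```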
Discourse Structure and Coherence
• Topic segmentation
– Useful Features
– Hearst’s TextTiling – how does it work? (see the sketch below)
– Supervised methods – how do we evaluate?
• Coherence relations
– Hobbs’
– Rhetorical Structure Theory – what are its
problems?
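For the TextTiling question, here is a rough sketch of the idea: compare adjacent blocks of sentences by lexical overlap (cosine over word counts) and place topic boundaries where cohesion drops. TextTiling proper uses depth scores at valleys rather than a fixed threshold; the block size and threshold below are arbitrary choices.

```python
# TextTiling-style topic segmentation sketch: low lexical cohesion between
# adjacent blocks of sentences suggests a topic boundary.
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def boundaries(sentences, block=2, threshold=0.1):
    """Return indices i such that a topic boundary is placed after sentence i."""
    bags = [Counter(s.lower().split()) for s in sentences]
    cuts = []
    for i in range(block, len(bags) - block + 1):
        left = sum(bags[i - block:i], Counter())      # block before the gap
        right = sum(bags[i:i + block], Counter())     # block after the gap
        if cosine(left, right) < threshold:           # a "valley" in cohesion
            cuts.append(i - 1)
    return cuts
```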
Reference Terminology
• Referring expressions
• Discourse referents
• Anaphora and cataphora
• Coreference
• Antecedents
• Pronouns
• One-anaphora
• Definite and indefinite NPs
• Anaphoric chains
Constraints on Anaphoric Reference
• Salience
• Recency of mention: rule of 2 sentences
• Discourse structure
• Agreement
• Grammatical function
• Repeated mention
• Parallel construction
• Verb semantics/thematic roles
• Pragmatics
Algorithms for Coreference Resolution
• Lappin & Leass (see the sketch below)
• Hobbs
• Centering Theory
• Supervised approaches
• Evaluation
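A rough sketch of the flavor of a salience-based resolver like Lappin & Leass: hard agreement constraints filter the candidate antecedents, then weighted salience factors (recency, grammatical function, repeated mention, parallelism) pick the winner. The weights and candidate encoding below are made up for illustration, not the published values.

```python
# Salience-based pronoun resolution sketch (in the spirit of Lappin & Leass).
WEIGHTS = {
    "recency": 60,        # mentioned in the current/previous sentence
    "subject": 80,        # grammatical subject outranks object
    "object": 50,
    "repeated": 40,       # entity mentioned more than once
    "parallel_role": 35,  # same grammatical role as the pronoun
}

def agrees(candidate, pronoun):
    """Hard constraints: number and gender agreement."""
    return (candidate["number"] == pronoun["number"]
            and candidate["gender"] in (pronoun["gender"], "any"))

def resolve(pronoun, candidates):
    best, best_score = None, float("-inf")
    for c in candidates:
        if not agrees(c, pronoun):
            continue
        score = sum(WEIGHTS[f] for f in c["features"] if f in WEIGHTS)
        score -= 10 * c["sentence_distance"]      # salience decays with distance
        if score > best_score:
            best, best_score = c, score
    return best

pronoun = {"number": "sg", "gender": "masc"}
candidates = [
    {"text": "John", "number": "sg", "gender": "masc",
     "features": ["subject", "recency"], "sentence_distance": 1},
    {"text": "the book", "number": "sg", "gender": "neut",
     "features": ["object", "recency"], "sentence_distance": 1},
]
print(resolve(pronoun, candidates)["text"])   # -> John
```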
Information Extraction
• Template-based IE
– Named Entity Tagging
– Sequence-based relation tagging: supervised
and bootstrapping
– IE for Question Answering, e.g. biographical
information (Biadsy’s `bouncing’ between
Wikipedia and Google)
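A toy illustration of template-based extraction: a hand-written lexico-syntactic pattern pulls (person, organization) tuples out of text. The pattern, the crude capitalization-based name chunks, and the example sentences are all made up; real systems use trained named entity taggers and many patterns, often learned by bootstrapping.

```python
# Template-based relation extraction sketch: one hand-written pattern
# extracting "employed_by" tuples. Everything here is illustrative.
import re

NAME = r"(?:[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"          # rough capitalized chunks
HIRED_BY = re.compile(rf"({NAME}),? (?:joined|was hired by) ({NAME})")

text = "Alice Smith joined Acme Corp last spring. Bob was hired by Initech."
for person, org in HIRED_BY.findall(text):
    print({"relation": "employed_by", "person": person, "org": org})
```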
Information Retrieval
• Vector-Space model
– Cosine similarity
– TF/IDF weighting
• NIST competition retrieval tasks
• Techniques for improvement
• Metrics
– Precision, recall, F-measure
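The vector-space pieces listed above in a minimal sketch: documents and the query become TF-IDF vectors and are ranked by cosine similarity. The toy document collection is made up.

```python
# Vector-space retrieval sketch: TF-IDF weighting + cosine similarity ranking.
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell sharply today",
]
tokenized = [d.split() for d in docs]
N = len(docs)
df = Counter(w for doc in tokenized for w in set(doc))   # document frequency

def tfidf(tokens):
    tf = Counter(tokens)
    # terms unseen in the collection (df = 0) are simply dropped
    return {w: tf[w] * math.log(N / df[w]) for w in tf if df[w]}

def cosine(a, b):
    num = sum(a[w] * b.get(w, 0.0) for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

doc_vecs = [tfidf(t) for t in tokenized]
query = tfidf("cat on a mat".split())
ranked = sorted(range(N), key=lambda i: cosine(query, doc_vecs[i]), reverse=True)
print([docs[i] for i in ranked])   # the "cat ... mat" document ranks first
```

Precision, recall, and F-measure would then be computed by comparing the ranked list against relevance judgments.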
Question Answering
• Factoid questions
• Useful Features
• Answer typing
• UT Dallas System
Summarization
• Types and approaches to summarization
– Indicative vs. informative
– Generative vs. extractive
– Single vs. multi-document
– Generic vs. user-focused
• Useful features
• Evaluation methods
• Newsblaster – how does it work?
– Multi-document
– Sentence fusion and ordering
– Topic tracking
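A bare-bones illustration of the extractive, single-document case using two of the "useful features" (word frequency and sentence position): score each sentence, keep the top k, and emit them in document order. The stopword list, weights, and scoring are made up, and Newsblaster's actual multi-document pipeline (topic tracking, fusion, ordering) is far richer.

```python
# Extractive summarization sketch: score sentences by the frequency of their
# content words plus a small lead bonus, then emit the top-k in original order.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was"}

def summarize(sentences, k=2):
    words = [w.lower() for s in sentences for w in s.split()
             if w.lower() not in STOPWORDS]
    freq = Counter(words)
    scores = []
    for i, s in enumerate(sentences):
        content = [w.lower() for w in s.split() if w.lower() not in STOPWORDS]
        score = sum(freq[w] for w in content) / (len(content) or 1)
        if i == 0:
            score += 1.0                     # lead sentences are often important
        scores.append((score, i))
    keep = sorted(i for _, i in sorted(scores, reverse=True)[:k])
    return [sentences[i] for i in keep]
```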
MT
• Multilingual challenges
– Orthography, Lexical ambiguity, morphology,
syntax
• MT Approaches:
– The Pyramid
– Statistical vs. Rule-based vs. Hybrid
• Evaluation metrics
– Human vs. BLEU score
– Criteria: fluency vs. accuracy
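The core of BLEU in a sketch: clipped n-gram precisions combined by geometric mean, times a brevity penalty. Real BLEU is computed at the corpus level against multiple references; this single-sentence, single-reference version is only meant to show the mechanics.

```python
# BLEU-style evaluation sketch: clipped (modified) n-gram precisions for
# n = 1..4, geometric mean, times a brevity penalty. No smoothing.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(count, r_ngrams[g]) for g, count in c_ngrams.items())
        total = sum(c_ngrams.values())
        if total == 0 or clipped == 0:
            return 0.0                        # any zero precision kills the score
        log_precisions.append(math.log(clipped / total))
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat today", "the cat sat on the mat"))   # ~0.81
```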
Dialogue
• Turns and Turn-taking
• Speech Acts and Dialogue Acts
• Grounding
• Intentional Structure: Centering
• Pragmatics
– Presupposition
– Conventional Implicature
– Conversational Implicature
The Final
• Dec. 16, MUDD 535, 1:10-4pm
• Good luck!