Natural language processing - UVa CS


Lecture 1: Introduction
Kai-Wei Chang
CS @ University of Virginia
[email protected]
Course webpage: http://kwchang.net/teaching/NLP16
CS6501– Natural Language Processing
Announcements
 Waiting list: Start attending the first few meetings
of the class as if you are registered. Given that
some students will drop the class, some space
will free up.
 We will use Piazza as an online discussion
platform. Please enroll.
Staff
 Instructor: Kai-Wei Chang
 Email: [email protected]
 Office: R412 Rice Hall
 Office hour: 2:00 – 3:00, Tue (after class).
 Additional office hour: 3:00 – 4:00, Thu
 TA: Wasi Ahmad
 Email: [email protected]
 Office: R432 Rice Hall
 Office hour: 4:00 – 5:00, Mon
This lecture
 Course Overview
 What is NLP? Why is it important?
 What will you learn from this course?
 Course Information
 What are the challenges?
 Key NLP components
What is NLP
 Wiki: Natural language processing (NLP) is
a field of computer science, artificial
intelligence, and computational linguistics
concerned with the interactions between
computers and human (natural) languages.
Go beyond keyword matching
 Identify the structure and meaning of
words, sentences, texts and conversations
 Deep understanding of broad language
 NLP is all around us
Machine translation
Facebook translation, image credit: Meedan.org
Statistical machine translation
Image credit: Julia Hockenmaier, Intro to NLP
Dialog Systems
Sentiment/Opinion Analysis
Text Classification
www.wired.com
 Other applications?
Question answering
'Watson' computer wins at 'Jeopardy'
credit: ifunny.com
Question answering
 Go beyond search
Natural language instruction
https://youtu.be/KkOCeAtKHIc?t=1m28s
Digital personal assistant
More on natural language instruction
credit: techspot.com
 Semantic parsing – understand tasks
 Entity linking – “my wife” = “Kellie” in the phone
book
Information Extraction
 Unstructured text to database entries
Yoav Artzi: Natural language processing
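A minimal sketch of "unstructured text to database entries", assuming a toy sentence pattern ("X joined Y in YEAR") that does not come from the lecture. A regular expression turns free text into dictionary records that could be loaded into a table. The pattern, the field names, and the example sentences are all hypothetical; practical information extraction systems typically rely on learned models rather than one hand-written rule.

```python
import re

# Hypothetical rule: "<Person> joined <Organization> in <Year>".
# Names are approximated as runs of capitalized words.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) joined "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*) in "
    r"(?P<year>\d{4})"
)


def extract_records(text):
    """Return one dict per sentence fragment that matches the toy pattern."""
    return [match.groupdict() for match in PATTERN.finditer(text)]


text = "Alice Smith joined Acme Labs in 2014. Later, Bob Lee joined Initech in 2016."
for record in extract_records(text):
    print(record)
```

Any sentence that phrases the same fact differently ("Acme Labs hired Alice Smith") is missed by this rule, which is one reason extraction is treated as an NLP problem rather than a pattern-matching one.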
Language Comprehension
Christopher Robin is alive and well. He is the same
person that you read about in the book, Winnie the Pooh.
As a boy, Chris lived in a pretty home called Cotchfield
Farm. When Chris was three years old, his father wrote
a poem about him. The poem was printed in a magazine
for others to read. Mr. Robin then wrote a book
 Q: Who wrote Winnie the Pooh?
 Q: Where did Chris live?
What will you learn from this course
 The NLP Pipeline
 Key components for
understanding text
 NLP systems/applications
 Current techniques & limitations
 Build realistic NLP tools
What’s not covered by this course
 Speech recognition – no signal processing
 Natural language generation
 Details of ML algorithms / theory
 Text mining / information retrieval
This lecture
 Course Overview
 What is NLP? Why is it important?
 What will you learn from this course?
 Course Information
 What are the challenges?
 Key NLP components
Overview
 New course, first time being offered
 Comments are welcome
 Aimed at first- or second-year PhD students
 Lecture + Seminar
 No course prerequisites, but I assume
 programming experience (for the final project)
 basics of probability calculus, and linear
algebra (HW0)
Grading
 No exam & HW -- hooray
 Lectures & forum
 Participate in discussion (additional credits)
 Review quizzes (25%): 3 quizzes
 Critical review report (10%)
 Paper presentation (15%)
 Final project (50%)
Quizzes
 Format
 Multiple choice questions
 Fill-in-the-blank
 Short answer questions
 Each quiz: ~20 min in class
 Schedule: see course website
 Closed book, Closed notes, Closed laptop
Critical review report
 1 page maximum
 Pick one paper from the suggested list
 Summarize the paper (use your own words)
 Provide detailed comments
 What can be improved
 Potential future directions
 Other related work
 Some students will be selected to present
their critical reviews
Paper presentation
 Each group has 2~3 students
 Pick one paper from the suggested
readings, or your favorite paper
 Cannot be the same as critical review report
 Can be related to your final project
 Register your choice early
 15 min presentation + 2 min Q&A
 Will be graded by the instructor, the TA, and
other students
Final Project
 Work in groups (2~3 students)
 Project proposal
 Written report, 2 page maximum
 Project report (35%)
 < 8 pages, ACL format
 Due 2 days before the final presentation
 Project presentation (15%)
 5-min in-class presentation (tentative)
Late Policy
 Credit of 48 hours for all the assignments
 Including proposal and final project
 No accumulation
 No more grace period
 No make-up exam
 unless there is an emergency
Cheating/Plagiarism
 No. Ask if you have concerns
 UVA Honor Code:
http://www.virginia.edu/honor/
Lectures and office hours
 Participation is highly appreciated!
 Ask questions if you are still confused
 Feedback is welcome
 Lead the discussion in this class
 Enroll in Piazza
https://piazza.com/virginia/fall2016/cs6501004
Topics of this class
 Fundamental NLP problems
 Machine learning & statistical approaches
for NLP
 NLP applications
 Recent trends in NLP
What to Read?
 Natural Language Processing
ACL, NAACL, EACL, EMNLP, CoNLL, Coling, TACL
aclweb.org/anthology
 Machine learning
ICML, NIPS, ECML, AISTATS, ICLR, JMLR, MLJ
 Artificial Intelligence
AAAI, IJCAI, UAI, JAIR
Questions?
This lecture
 Course Overview
 What is NLP? Why is it important?
 What will you learn from this course?
 Course Information
 What are the challenges?
 Key NLP components
Challenges – ambiguity
 Word sense ambiguity
Challenges – ambiguity
 Word sense / meaning ambiguity
Credit: http://stuffsirisaid.com
Challenges – ambiguity
 PP attachment ambiguity
Credit: Mark Liberman, http://languagelog.ldc.upenn.edu/nll/?p=17711
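The slide's example image is not reproduced in this transcript, so the sketch below uses the classic sentence "I saw the man with the telescope" instead (an assumption, not the lecture's example). It counts parse trees with a small CKY-style chart over a hypothetical toy grammar; the grammar, lexicon, and function name are illustrative only. Two trees come out because the prepositional phrase can attach either to the verb phrase (seeing with a telescope) or to the noun phrase (the man who has a telescope).

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(sentence, root="S"):
    """Count the distinct parse trees of a sentence under the toy grammar."""
    words = sentence.lower().split()
    n = len(words)
    # chart[start, end, label] = number of trees with that label over words[start:end]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[i, i + 1, label] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, split, left] * chart[split, end, right]
                    )
    return chart[0, n, root]


print(count_parses("I saw the man"))                     # 1
print(count_parses("I saw the man with the telescope"))  # 2
```

The grammar alone cannot choose between the two trees; picking the intended one needs extra knowledge, such as statistics about which attachments are likely.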
Challenges – ambiguity
 Ambiguous headlines:
 Include your children when baking cookies
 Local High School Dropouts Cut in Half
 Hospitals are Sued by 7 Foot Doctors
 Iraqi Head Seeks Arms
 Safety Experts Say School Bus Passengers
Should Be Belted
 Teacher Strikes Idle Kids
Challenges – ambiguity
 Pronoun reference ambiguity
Credit: http://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakes
Challenges – language is not static
 Language grows and changes
 e.g., cyber lingo
LOL: Laugh out loud
G2G: Got to go
BFN: Bye for now
B4N: Bye for now
Idk: I don’t know
FWIW: For what it’s worth
LUWAMH: Love you with all my heart
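As a rough illustration of why changing language is a challenge, the sketch below expands the abbreviations listed above using a fixed lookup table. The function name and the example message are hypothetical. A static dictionary like this covers only the entries it was built with, so any newly coined abbreviation passes through unchanged.

```python
# Lookup table built from the abbreviations on this slide (keys lowercased).
LINGO = {
    "lol": "laugh out loud",
    "g2g": "got to go",
    "bfn": "bye for now",
    "b4n": "bye for now",
    "idk": "I don't know",
    "fwiw": "for what it's worth",
    "luwamh": "love you with all my heart",
}


def expand_lingo(message):
    """Replace known abbreviations, keeping surrounding punctuation."""
    expanded = []
    for token in message.split():
        core = token.strip(".,!?")          # drop leading/trailing punctuation
        expansion = LINGO.get(core.lower())
        if expansion:
            token = token.replace(core, expansion)
        expanded.append(token)
    return " ".join(expanded)


print(expand_lingo("idk, g2g. BFN!"))
```

An abbreviation that is missing from the table, for example "brb", is left as-is, which shows how quickly a hand-maintained resource falls behind actual usage.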
Challenges – language is compositional
Carefully Slide
Challenges – language is compositional
小心: Carefully / Careful / Take Care / Caution
地滑: Slide / Landslip / Wet Floor / Smooth
Challenges – scale
 Examples:
 Bible (King James version): ~700K words
 Penn Treebank: ~1M words from the Wall Street Journal
 Newswire collection: 500M+ words
 Wikipedia: 2.9 billion words (English)
 Web: several billion words
This lecture
 Course Overview
 What is NLP? Why is it important?
 What will you learn from this course?
 Course Information
 What are the challenges?
 Key NLP components
Part of speech tagging
Syntactic (Constituency) parsing
Syntactic structure => meaning
Image credit: Julia Hockenmaier, Intro to NLP
Dependency Parsing
Semantic analysis
 Word sense disambiguation
 Semantic role labeling
Credit: Ivan Titov
Q: [Chris] = [Mr. Robin] ?
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
Slide modified from Dan Roth
Co-reference Resolution
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
Questions?