Natural language processing - UVa CS
Lecture 1:
Introduction
Kai-Wei Chang
CS @ University of Virginia
[email protected]
Course webpage: http://kwchang.net/teaching/NLP16
CS6501– Natural Language Processing
Announcements
Waiting list: Start attending the first few meetings
of the class as if you are registered. Given that
some students will drop the class, some space
will free up.
We will use Piazza as an online discussion
platform. Please enroll.
Staff
Instructor: Kai-Wei Chang
Email: [email protected]
Office: R412 Rice Hall
Office hour: 2:00 – 3:00, Tue (after class).
Additional office hour: 3:00 – 4:00, Thu
TA: Wasi Ahmad
Email: [email protected]
Office: R432 Rice Hall
Office hour: 4:00 – 5:00, Mon
This lecture
Course Overview
What is NLP? Why is it important?
What will you learn from this course?
Course Information
What are the challenges?
Key NLP components
What is NLP
Wiki: Natural language processing (NLP) is
a field of computer science, artificial
intelligence, and computational linguistics
concerned with the interactions between
computers and human (natural) languages.
Go beyond keyword matching
Identify the structure and meaning of
words, sentences, texts and conversations
Deep understanding of broad language
NLP is all around us
Machine translation
Facebook translation, image credit: Meedan.org
Statistical machine translation
Image credit: Julia Hockenmaier, Intro to NLP
Dialog Systems
Sentiment/Opinion Analysis
Text Classification
www.wired.com
Other applications?
Question answering
'Watson' computer wins at 'Jeopardy'
credit: ifunny.com
Question answering
Go beyond search
Natural language instruction
https://youtu.be/KkOCeAtKHIc?t=1m28s
Digital personal assistant
More on natural language instruction
credit: techspot.com
Semantic parsing – understand tasks
Entity linking – “my wife” = “Kellie” in the phone
book
Information Extraction
Unstructured text to database entries
Yoav Artzi: Natural language processing
Language Comprehension
Christopher Robin is alive and well. He is the same
person that you read about in the book, Winnie the Pooh.
As a boy, Chris lived in a pretty home called Cotchfield
Farm. When Chris was three years old, his father wrote
a poem about him. The poem was printed in a magazine
for others to read. Mr. Robin then wrote a book
Q: Who wrote Winnie the Pooh?
Q: Where did Chris live?
What will you learn from this course
The NLP Pipeline
Key components for
understanding text
NLP systems/applications
Current techniques & limitations
Build realistic NLP tools
What’s not covered by this course
Speech recognition – no signal processing
Natural language generation
Details of ML algorithms / theory
Text mining / information retrieval
This lecture
Course Overview
What is NLP? Why is it important?
What will you learn from this course?
Course Information
What are the challenges?
Key NLP components
Overview
New course, first time being offered
Comments are welcome
Aimed at first- or second-year PhD students
Lecture + Seminar
No course prerequisites, but I assume
programming experience (for the final project)
and the basics of probability calculus and linear
algebra (HW0)
Grading
No exam & HW -- hooray
Lectures & forum
Participate in discussion (additional credit)
Review quizzes (25%): 3 quizzes
Critical review report (10%)
Paper presentation (15%)
Final project (50%)
Quizzes
Format
Multiple choice questions
Fill-in-the-blank
Short answer questions
Each quiz: ~20 min in class
Schedule: see course website
Closed book, Closed notes, Closed laptop
Critical review report
1 page maximum
Pick one paper from the suggested list
Summarize the paper (use your own words)
Provide detailed comments
What can be improved
Potential future directions
Other related work
Some students will be selected to present
their critical reviews
Paper presentation
Each group has 2~3 students
Pick one paper from the suggested
readings, or your favorite paper
Cannot be the same as critical review report
Can be related to your final project
Register your choice early
15-min presentation + 2-min Q&A
Will be graded by the instructor, the TA, and other
students
Final Project
Work in groups (2~3 students)
Project proposal
Written report, 2 page maximum
Project report (35%)
< 8 pages, ACL format
Due 2 days before the final presentation
Project presentation (15%)
5-min in-class presentation (tentative)
Late Policy
Credit of 48 hours for all the assignments
Including proposal and final project
No accumulation
No additional grace period
No make-up exam
except in an emergency
Cheating/Plagiarism
No. Ask if you have concerns
UVA Honor Code:
http://www.virginia.edu/honor/
Lectures and office hours
Participation is highly appreciated!
Ask questions if you are still confused
Feedback is welcome
Lead the discussion in this class
Enroll in Piazza
https://piazza.com/virginia/fall2016/cs6501004
Topics of this class
Fundamental NLP problems
Machine learning & statistical approaches
for NLP
NLP applications
Recent trends in NLP
What to Read?
Natural Language Processing
ACL, NAACL, EACL, EMNLP, CoNLL, Coling, TACL
aclweb.org/anthology
Machine learning
ICML, NIPS, ECML, AISTATS, ICLR, JMLR, MLJ
Artificial Intelligence
AAAI, IJCAI, UAI, JAIR
Questions?
This lecture
Course Overview
What is NLP? Why is it important?
What will you learn from this course?
Course Information
What are the challenges?
Key NLP components
Challenges – ambiguity
Word sense ambiguity
Challenges – ambiguity
Word sense / meaning ambiguity
Credit: http://stuffsirisaid.com
Challenges – ambiguity
PP attachment ambiguity
Credit: Mark Liberman, http://languagelog.ldc.upenn.edu/nll/?p=17711
Challenges – ambiguity
Ambiguous headlines:
Include your children when baking cookies
Local High School Dropouts Cut in Half
Hospitals are Sued by 7 Foot Doctors
Iraqi Head Seeks Arms
Safety Experts Say School Bus Passengers
Should Be Belted
Teacher Strikes Idle Kids
Challenges – ambiguity
Pronoun reference ambiguity
Credit: http://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakes
Challenges – language is not static
Language grows and changes
e.g., cyber lingo
LOL: Laugh out loud
G2G: Got to go
BFN: Bye for now
B4N: Bye for now
Idk: I don’t know
FWIW: For what it’s worth
LUWAMH: Love you with all my heart
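One simple way to handle the abbreviations above is a lookup table, sketched here in Python. The table contents come from the slide; the function name `expand_lingo` and the whitespace tokenization are assumptions made for illustration, not a method from the course.

```python
# Hypothetical lookup table built from the abbreviations on the slide.
CYBER_LINGO = {
    "lol": "laugh out loud",
    "g2g": "got to go",
    "bfn": "bye for now",
    "b4n": "bye for now",
    "idk": "I don't know",
    "fwiw": "for what it's worth",
    "luwamh": "love you with all my heart",
}


def expand_lingo(text, table=CYBER_LINGO):
    """Expand known abbreviations token by token; unknown tokens pass through."""
    return " ".join(table.get(tok.lower(), tok) for tok in text.split())


print(expand_lingo("IDK G2G"))  # I don't know got to go
print(expand_lingo("brb"))      # brb (not in the table, so left unchanged)
```

The second call shows the limitation: a fixed table only covers the lingo that existed when it was written, so new coinages pass through unexpanded.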
Challenges – language is compositional
Carefully
Slide
Challenges – language is compositional
小心:
Carefully
Careful
Take Care
Caution
地滑:
Slide
Landslip
Wet Floor
Smooth
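A rough Python sketch of why word-by-word translation struggles here: picking one gloss per word and concatenating them yields many candidates, and nothing in the word lists alone says which one is right. The glosses are copied from the slide (treating "Take Care" as a single gloss is an assumption); the function name `word_by_word` is hypothetical.

```python
from itertools import product

# Candidate glosses per word, copied from the slide.
GLOSSES = {
    "小心": ["Carefully", "Careful", "Take Care", "Caution"],
    "地滑": ["Slide", "Landslip", "Wet Floor", "Smooth"],
}


def word_by_word(words, glosses=GLOSSES):
    """Enumerate every translation formed by choosing one gloss per word."""
    return [" ".join(choice) for choice in product(*(glosses[w] for w in words))]


candidates = word_by_word(["小心", "地滑"])
print(len(candidates))                    # 16
print("Carefully Slide" in candidates)    # True
print("Caution Wet Floor" in candidates)  # True
```

Both the mistranslation "Carefully Slide" and the intended "Caution Wet Floor" are among the 16 candidates, so choosing between them requires knowing how the words combine in context.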
Challenges – scale
Examples:
Bible (King James version): ~700K words
Penn Treebank: ~1M words from the Wall Street Journal
Newswire collection: 500M+ words
Wikipedia: 2.9 billion words (English)
Web: several billion words
This lecture
Course Overview
What is NLP? Why is it important?
What will you learn from this course?
Course Information
What are the challenges?
Key NLP components
Part of speech tagging
Syntactic (Constituency) parsing
Syntactic structure => meaning
Image credit: Julia Hockenmaier, Intro to NLP
Dependency Parsing
Semantic analysis
Word sense disambiguation
Semantic role labeling
Credit: Ivan Titov
Q: [Chris] = [Mr. Robin] ?
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
Slide modified from Dan Roth
Co-reference Resolution
Christopher Robin is alive and well. He is the
same person that you read about in the book,
Winnie the Pooh. As a boy, Chris lived in a
pretty home called Cotchfield Farm. When
Chris was three years old, his father wrote a
poem about him. The poem was printed in a
magazine for others to read. Mr. Robin then
wrote a book
Questions?