Intro to the module - School of Computer Science

Introduction to the Module
John Barnden
School of Computer Science
University of Birmingham
Natural Language Processing 1
2014/15 Semester 2
Me
• John Barnden is my name
• And natural language processing is my game ...
– Specifically and mainly: metaphor theory & processing
• I’m Professor of Artificial Intelligence
• Coords:
– Room 136
– Tel. 4-3816
– [email protected]
Demonstrator
• To be specified.
• His job:
– Help you with any aspect of the module
– Incl.: understanding the material, getting a start on exercises (even when
assessed), using some computer programs that will be available, helping
with marking
Additional Assistant (sometimes)
• Andrew Gargett — [email protected]
my Research Fellow on an EU-funded project on metaphor
processing
• His job:
– Teach some of the module
– Introduce you to certain software tools
Syllabus Page and Website
• FIND and READ the syllabus page for this module!!
• In the Relevant Links section, follow the link to my own
webpages for the module.
• READ the top page there.
• Lecture slides, exercises, etc. will hang from it.
• Currently the Canvas page just points to my page.
You
• What degrees are you on?
• Why did you choose this module?
• What have you heard about NLP?
Assessment
• 1.5 hour exam (80%).
– NB: in its details, it will differ considerably from previous exams.
• Mid-term test (10%), in approx. week 7 of this term
– Date will be specified nearer the time.
• Exercise-set as homework, Weeks 9-11 probably (10%)
– To be done individually, with limited collaboration (to be clarified later).
– Be aware of the plagiarism documentation in the student handbook on
the School website!!
Official Aims of Module
(plus Notes by me)
• Introduce Natural Language Processing as one of the
components of Artificial Intelligence, both from engineering and
cognitive viewpoints. Note:
– NLP gives insight into mind and AI in general.
• Show how Natural Language Processing techniques can be
programmed …. Notes:
– The module is not a workshop and only aims to give you abstract algorithms and
other background for NLP programming. Emphasis will be more on the
underlying concepts, theory, problems, and understanding of algorithms. You
will also be introduced to some practical tools.
– Ignore the mentions of Prolog on the module syllabus page – that aspect is out
of date.
More Notes on Aims of Module
• The module will largely be about processing of textual
language.
– Only occasional comments will be made about processing of speech.
– The language-processing field is largely divided into textual and speech-processing aspects.
– Speech brings in a host of extra technical problems.
– Text processing is (more than) enough for (more than) one module!
• The main module textbook contains much information about speech
processing (optional reading).
• The module will (very briefly) mention ramifications into sign
language and manual gesture.
Unofficial Aims of Module
• Make you aware of language as a really fun thing to think
about!
• Show you that it acts strangely and wonderfully all around us all
the time!
• Show you that it’s technically challenging to deal with, in all sorts
of fascinating ways!
Textbook and Its Relationship to Module
• The main textbook is the Jurafsky & Martin (2009) book listed on the syllabus page.
• Plays an important role in the module.
• In many cases the lectures can only give a brief intro to a more detailed
treatment in the textbook.
• Assessed work will assume a (reasonable level of) knowledge of specified
parts of the textbook.
• Lectures will cover some things not covered in the textbook, and will further
illuminate some things that are.
• You can of course ask me or the demonstrator privately for help with
understanding textbook material.
Nature of Class Sessions
• Mainly lecture, but with
– Occasional in-class exercises (formative)
– Mid-term in-class test (assessed -- 10%).
• You are strongly encouraged to ask questions or make comments
in class.
• I will have detailed lecture slides (accessible via my module
website), but may say important things that are not on the slides.
• These slides will always be on the web.
• I will occasionally supply additional notes (electronic), including
answer notes about exercises.
What the Study of Language Covers, 1
• What language is, as distinct from other things we do or use.
• But also how it’s related to some such things.
• Whether other creatures use language.
• Speech aspects, textual aspects, signing aspects, gestural aspects.
• Connection of language to diagrams, pictures, music, thought ...
• Poetic aspects of language.
• Specific purposes of language such as persuasion and intimacy-building.
• Learning/teaching of language (either naturally or deliberately).
• Development of language over history.
What the Study of Language Covers, 2
• How we get meaning (in the broadest sense, including things like
emotion) from discourse.
• How discourse is broken down into components (e.g., sentences,
phrases, words, parts of words).
• How the meaning of a phrase, sentence or complex discourse
segment depends on the meanings of the parts and other
information.
• How the above differs between: text, speech, signing, ...
• Translation between different languages.
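To make the idea of breaking discourse into components concrete, here is a minimal Python sketch of naive sentence and word segmentation. The regular expressions and the names `split_sentences` and `split_words` are illustrative assumptions, not a standard tool; the point is how quickly simple rules run into trouble (an abbreviation such as "e.g." is wrongly treated as a sentence end).

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive rule (an assumption for illustration): a sentence ends at
    # '.', '!' or '?' followed by whitespace. Abbreviations such as
    # "e.g." break this rule.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(sentence: str) -> list[str]:
    # A "word" here is a run of letters/digits, optionally with one internal
    # apostrophe ("isn't"); each punctuation mark becomes its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

discourse = "Language is everywhere. Isn't it wonderful? Yes!"
for sentence in split_sentences(discourse):
    print(split_words(sentence))

# The abbreviation problem: "e.g." is taken to end a sentence.
print(split_sentences("See e.g. this example."))
```

Practical segmenters typically rely on abbreviation lists or trained models for exactly this reason.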
Language Technology
• Any use of language processing by a computer system. Some main
topical examples, all of extensive practical importance:
– Machine translation.
– Document summarization.
– Information extraction.
– Text mining.
– Information retrieval (usually = retrieval of whole documents).
– Conversational agents, whether for
• general chat, as in the fronting of sites (IKEA, US Army, ...), chatrooms and artificial companions
• or for specific tasks such as booking tickets, therapy, other life help.
– Sentiment analysis: extracting the emotional/evaluative tone of language
objects such as product reviews, customer complaints or user interactions
with an HCI system.
– Web searching.
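Sentiment analysis can be illustrated with about the crudest approach possible: sum word scores from a polarity lexicon, and flip the sign of an opinion word that follows a negator. This Python sketch is a toy under stated assumptions. The lexicon `POLARITY`, the negation rule and the function names are hypothetical and hand-made. Real systems use far larger resources or trained classifiers.

```python
import re

# Hypothetical hand-made polarity lexicon (word -> score), illustration only.
POLARITY = {"good": 1, "great": 2, "excellent": 2,
            "bad": -1, "awful": -2, "broken": -1}
NEGATORS = {"not", "never"}

def sentiment_score(text: str) -> int:
    """Sum lexicon scores; a negator flips the sign of the next opinion word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATORS or tok.endswith("n't"):
            negate = True
            continue
        value = POLARITY.get(tok, 0)
        if value:
            score += -value if negate else value
            negate = False  # the negation is used up by one opinion word
    return score

def sentiment_label(text: str) -> str:
    s = sentiment_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

print(sentiment_label("The delivery was great but the box was broken."))  # positive (2 - 1 = 1)
print(sentiment_label("The product is not good."))  # negative
```

The negation rule is very crude: "It isn't awful" comes out as strongly positive (+2), which hints at why sentiment analysis is a research topic and not a solved problem.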
A Standard Breakdown
• Language is traditionally (and still currently) viewed as having the
following aspects or levels:
– Phonological / orthographical (and the analogous level in sign language):
  The patterns of sounds, letters or hand/body movements in basic units such as words,
  and what happens to them when words (etc.) are put together.
– Morphological:
  Largely about how words are broken down into conceptually significant segments (i.e.,
  not just into letters, etc.).
– Syntactic:
  The patterns of words of various types found in bigger units such as sentences.
– Semantic:
  The primary meanings of words, phrases and sentences.
– Pragmatic:
  More subtle and/or context-dependent aspects of the way in which meaning and
  other effects arise from language.
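The morphological level can be illustrated by a deliberately naive Python sketch that peels known suffixes off the end of a word. The suffix list, the minimum stem length and the function name `segment` are assumptions invented for illustration. Real morphological analysers use lexicons and spelling rules. The output for "happily" shows an interaction between spelling and morphology: the stem comes out as "happi", not "happy".

```python
# Hypothetical, tiny suffix list, ordered longest first.
SUFFIXES = ["ness", "ing", "er", "ed", "ly", "s"]

def segment(word: str, min_stem: int = 3) -> list[str]:
    """Greedily strip known suffixes, keeping a stem of at least min_stem letters."""
    parts: list[str] = []
    stem = word.lower()
    changed = True
    while changed:
        changed = False
        for suf in SUFFIXES:
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                parts.insert(0, suf)      # suffixes come off right-to-left
                stem = stem[: -len(suf)]
                changed = True
                break                     # restart the scan on the shorter stem
    return [stem] + parts

for w in ["walkers", "kindness", "happily", "sing"]:
    print(w, segment(w))
```

The `min_stem` guard stops "sing" from being split into "s" + "ing", but it is only a heuristic.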
But This Breakdown is Broken Down!
• The semantics/pragmatics distinction is hugely contentious and theory-laden.
There are many different versions of what sort of meaning semantics gets at, and
of what pragmatics adds.
• The syntax/semantics distinction is somewhat difficult and theory-laden. Even
defining what the traditional “parts of speech” (nouns, verbs, etc.) are in an
objective way is tricky, and brings in both syntax and semantics.
• There is no sharp distinction between morphology and syntax. For one thing,
what counts as a word is unclear. And words can be built from other words.
• Even if the breakdown could be theoretically maintained, it would not imply that
language processing would, should, or even could, be correspondingly divided,
because of extensive interaction between the different aspects.
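The point that words can be built from other words, so that what counts as a word is unclear, can be made concrete with a small Python sketch. It finds every way of splitting a string into words from a lexicon. The mini-lexicon `WORDS` and the function `split_compound` are hypothetical. "blackbirdhouse" gets two analyses, depending on whether "blackbird" is treated as one word.

```python
from functools import lru_cache

# Hypothetical mini-lexicon of free-standing English words.
WORDS = {"black", "board", "bird", "house", "boat", "sun", "flower", "blackbird"}

def split_compound(word: str) -> list[list[str]]:
    """Return every way of writing `word` as a sequence of lexicon words."""
    @lru_cache(maxsize=None)
    def splits(start: int) -> tuple[tuple[str, ...], ...]:
        # All segmentations of word[start:]; the empty suffix has one.
        if start == len(word):
            return ((),)
        results = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in WORDS:
                for rest in splits(end):
                    results.append((piece,) + rest)
        return tuple(results)
    return [list(s) for s in splits(0)]

print(split_compound("blackbirdhouse"))
print(split_compound("sunflower"))
```

Memoising `splits` on the start position avoids recomputing the same suffix analyses, which matters once words and lexicons get longer.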
Rough Set of Topics
• What counts as a word?
• Morphology.
• Simple Grammar and Parts of Speech (POSs).
• POS Analysis.
• Syntactic Analysis.
• Some Logic needed for ...
• Semantic Analysis.
• Pragmatics and Other Advanced Topics.
Some Intriguing Exercises
You do “Introductory Exercise-Set A.”
If there’s time, we discuss those exercises.
You do “Introductory Exercise-Set B.”
That will lead into the next segment of the module ...