Computational Linguistics


Computational
Linguistics
INTroduction
Lecture 2
Computers and Language
CL: Two Main Disciplines
Feb 2010 -- MR
CLINT - Lecture 1
[Diagram labels: LINGUISTICS, COMP SCI, language and computers]
Language and Computers
includes …
Natural Language Processing (NLP)
  Computational models of language analysis, interpretation, and generation.
  syntax/semantics interface
Human Language Technology
  emphasis on large-scale performance
  example1: Google search
  example2: speech technology
Computational Linguistics


Emphasis on mechanised linguistic theories.
Grew out of early Machine Translation efforts
Linguistics






Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use
Noam Chomsky



Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
Chomsky has been the dominant figure in linguistics ever since.
Chomsky invented the generative approach to grammar.
Formal v. Natural Languages
Formal Languages
Arithmetic: 3290 1 1010101
Logic: ∀x man(x) → mortal(x)
URL: http://www.cs.um.edu.mt
Natural Languages
English: John saw the dog
German: Johann hat den Hund gesehen
Maltese: Ġianni ra kelb
Ambiguity

Morphological Ambiguity

Lexical Ambiguity

Syntactic Ambiguity

Semantic Ambiguity

Pragmatic Ambiguity

The management of ambiguity is central to the
success of CL
Ambiguity

Find at least 5 meanings of this sentence:
I made her duck
4/11/2017
Speech and Language Processing - Jurafsky and Martin
I made her duck






I cooked a duck for her
I cooked a duck belonging to her
I created a duck for her
I created a duck that now belongs to her
I caused her to lower her head
I turned her into a duck
Ambiguity





I cooked waterfowl for her benefit (to eat)
I cooked waterfowl belonging to her
I created the (ceramic?) duck she owns
I caused her to quickly lower her upper body
I waved my magic wand and turned her into
undifferentiated waterfowl
Sources of Ambiguity

I caused her to quickly lower her head or body.
  Lexical category (part of speech): “duck” can be a noun or verb; a verb in this case
I cooked waterfowl belonging to her.
  Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (ceramic) duck statue she owns
  Lexical Semantics: “make” can mean “create” or “cook”, and about 100 other things as well
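The lexical choices listed above combine with one another, which is why a four-word sentence can have so many candidate readings. A minimal sketch, assuming a hand-written toy lexicon (the name `LEXICON`, the function `candidate_analyses`, and the labels are invented for illustration and come from no real resource), enumerates the combinations for "I made her duck":

```python
from itertools import product

# Toy lexicon (an assumption for this sketch): each word maps to its
# possible (category, sense) analyses, following the slide's examples.
LEXICON = {
    "I": [("pronoun", "speaker")],
    "made": [("verb", "create"), ("verb", "cook")],
    "her": [("possessive", "of her"), ("dative", "for her")],
    "duck": [("noun", "waterfowl"), ("verb", "lower the head")],
}

def candidate_analyses(sentence):
    """Return every combination of per-word analyses for the sentence."""
    words = sentence.split()
    options = [LEXICON[w] for w in words]
    return [list(zip(words, combo)) for combo in product(*options)]

analyses = candidate_analyses("I made her duck")
print(len(analyses))  # 1 * 2 * 2 * 2 = 8 combinations
```

Not every combination is a coherent reading; ruling out the incoherent ones is the job of syntactic and semantic analysis.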
Ambiguity


Ambiguity is a fundamental problem of
computational linguistics
Resolving ambiguity is a crucial goal
Computer Science

The study of basic concepts




Information
Data
Algorithm
Program
Information Data
Algorithm Program




Information is a theoretical concept introduced by Shannon in 1948 to measure uncertainty. The units of this measure are called bits.
  Length – metres
  Weight – kilos
  Information – bits
1 bit is the amount of uncertainty inherent in a situation when there are exactly two equally likely outcomes. Example: for breakfast I will have coffee or I will have tea (nothing else).
When I tell you that I have tea, I have conveyed one bit of information.
The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.
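As a rough illustration of this measure, assuming all outcomes are equally likely, the number of bits is the base-2 logarithm of the number of outcomes. The function name `bits_for_outcomes` is made up for this sketch.

```python
import math

def bits_for_outcomes(n_outcomes: int) -> float:
    """Bits conveyed by naming one of n equally likely outcomes."""
    if n_outcomes < 1:
        raise ValueError("need at least one possible outcome")
    return math.log2(n_outcomes)

print(bits_for_outcomes(2))  # coffee or tea: 1.0
print(bits_for_outcomes(8))  # eight possible breakfasts: 3.0
```

With only one possible outcome the result is 0 bits: a statement that could not have been otherwise conveys no information.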
Information Data
Algorithm Program




Data: a formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
Example: a telephone directory
Unlike information, which is abstract, data is concrete.
Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
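The directory structure described here can be sketched as a list of records. This is only one possible representation; the class name `Entry`, the helper `lookup`, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Entry:
    """One directory entry: a name, an address, and a number."""
    name: str
    address: str
    number: str

# Made-up entries, for illustration only.
directory = [
    Entry("A. Borg", "12 Main Street", "555-0101"),
    Entry("C. Vella", "3 Harbour Road", "555-0102"),
]

def lookup(entries: List[Entry], name: str) -> List[str]:
    """Return the numbers of all entries listed under the given name."""
    return [e.number for e in entries if e.name == name]

print(lookup(directory, "C. Vella"))  # ['555-0102']
```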
Information Data
Algorithm Program
Algorithm: a completely defined procedure for the solution of a given problem in a finite number of steps.
Algorithm for
Chocolate Cake
Computer Program




A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
A program is concrete.
A program can implement an algorithm.
More than one program may implement the same algorithm.
Not all programs express good algorithms!
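To make the algorithm/program distinction concrete, here is a sketch of two programs that implement the same algorithm. Euclid's algorithm for the greatest common divisor is chosen only as a familiar example; it does not come from the lecture, and the function names are invented.

```python
def gcd_iterative(a: int, b: int) -> int:
    """Euclid's algorithm written as a loop."""
    while b != 0:
        a, b = b, a % b
    return a

def gcd_recursive(a: int, b: int) -> int:
    """The same algorithm written as a recursive function."""
    return a if b == 0 else gcd_recursive(b, a % b)

print(gcd_iterative(48, 18), gcd_recursive(48, 18))  # 6 6
```

Both functions carry out the same finite sequence of remainder steps; only the concrete program text differs.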
Algorithms and Linguistics


Do linguistic theories in the abstract make sense?
Linguistic theories explain linguistic knowledge in the form of
  grammar rules
  theories about grammar rules
But performance involves processing issues:
Computational Linguistics –
Issues

Can an artificial system learn a language with limited
exposure to grammatical sentences?
Computers and Language
Twin Goals

Scientific Goal:
Contribute to Linguistics by adding a
computational dimension.

Technological Goal:
Develop machinery capable of handling
human language that can support “language
engineering”
Computers and Language:
Applications






Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Multimodal Interaction
Machine Translation
Algorithms


Many of the algorithms that we’ll study will turn out to be transducers: algorithms that take one kind of structure as input and output another.
Unfortunately, ambiguity makes this process difficult. This leads us to employ algorithms of various sorts that are designed to manage ambiguity.
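A toy sketch of the transducer idea: the input structure is a list of surface tokens, and the output is a list pairing each token with its lexical analyses. The table `ANALYSES`, the tag names, and the function `transduce` are all assumptions made for illustration; real morphological transducers are far richer.

```python
# Toy table (invented): surface word -> possible lexical analyses.
ANALYSES = {
    "duck": ["duck+N+SG", "duck+V+INF"],
    "ducks": ["duck+N+PL", "duck+V+3SG"],
    "her": ["she+PRON+POSS", "she+PRON+OBJ"],
    "made": ["make+V+PAST"],
}

def transduce(tokens):
    """Map each surface token to all of its lexical analyses."""
    return [
        (tok, ANALYSES.get(tok.lower(), [tok + "+UNKNOWN"]))
        for tok in tokens
    ]

for token, forms in transduce("I made her duck".split()):
    print(token, forms)
```

Ambiguity shows up as tokens with more than one analysis; a later step has to choose among them.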