Computational Language

Download Report

Transcript Computational Language

Computational Language
Andrew Hippisley
Computational Language



Computational language and AI
Language engineering: applied
computational language
Case study: spell checkers
Computational language & AI

Artificial Intelligence:
“the simulation on computer of distinctly human
mental functions.” Wilks (1993)
Computational language & AI

Language integral to intelligent systems



Artificial Intelligence
Turing Test
ELIZA
ELIZA
Eliza: Hello. I am ELIZA. How can I help you?
Me: I am feeling a bit depressed today
Eliza: Do you think coming here will help you not to be depressed?
Me: yes
Eliza: You are sure?
Me: a friend told me I should talk to you
Eliza: Do you wish to you should talk to me?
Me: I am willing to try
Eliza: How long have you been willing to try?
Me: I have been willing to try for about two weeks
Eliza: Can you elaborate on that?
Me: a friend said I should try to talk to you about my problem
Eliza: Why do you say your problem?
Me: I have a problem with depression
Eliza: Can you elaborate on that?
Me: no, I don't wish to discuss it
Eliza: You are being a bit negative
Computational language & AI


Why language engineering?
Language integral to intelligent systems




Artifiicial Intelligence
Turing Test
ELIZA
Expert systems: natural language interface,
natural language database
Computational language & AI

Methods shared across systems





Finite State Transition Networks (FSTN)
Logic
Formal rules
Probability
Data: you know it!
Applied computational language

History of the field
Machine Translation: 1960, 1966,
post 1966
 Database access
 Text interpretation
 Information retrieval
 Text categorisation

Language engineering

Information overload


Need a way of automatically processing
text documents
Information extraction
Language engineering

Information extraction

GIDA: system for automatically monitoring
financial market sentiment
GIDA
Language engineering

Information overload



Need a way of automatically processing
text documents
Information extraction
Summarisation
Automatic summarisation
(courtesy of Paulo FERNANDES de OLIVEIRA, PhD)
• Get information source;
• Extract some content from it;
• Present the most important part to the user
xx xxx xxxx x xx xxxx
xxx xx xxx xx xxxxx x
xxx xx xxx xx x xxx xx
xx xxx x xxx xx xxx x
xx x xxxx xxxx xxxx xx
xx xxxx xxx
xxx xx xx xxxx x xxx
xx x xx xx xxxxx x x xx
xxx xxxxxx xxxxxx x x
xxxxxxx xx x xxxxxx
xxxx
xx xx xxxxx xxx xx x xx
xx xxxx xxx xxxx xx
xxx xx xxx xxxx xx
xxx x xxxx x xx xxxx
xx xxx xxxx xx x xxx
xxx xxxx x xxx x xxx
xx xx xxxxx x x xx
xxxxxxx xx x xxxxxx
xxxx
xx xx xxxxx xxx xx
xxx xx xxxx x xxxxx
xx xxxxx x
Lexical Cohesion
Links Example
Sentence 15:
"For the stock market this
move was so deeply
discounted that I don't think it
will have a major impact".
Sentence 42:
Lucent, the most active stock on
the New York Stock Exchange,
skidded 47 cents to $4.31, after
falling to a low at $4.30.
Sentence 23:
J&J's stock added 83 cents to
$65.49.
Sentence 26:
Flagging stock markets
kept merger activity and
new stock offerings on
the wane, the firm said.
Text title: U.S. stocks hold some gains.
Collected from Reuters’ Website on 20 March 2002.
Lexical Cohesion
Bonds Example
17.
In other news, Hewlett-Packard said preliminary
estimates showed shareholders had approved its purchase of
Compaq Computer -- a result unconfirmed by voting officials.
19.
In a related vote, Compaq shareholders are expected on
Wednesday to back the deal, catapulting HP into contention
against International Business Machines for the title of No. 1
computer company.
Text title: U.S. stocks hold some gains.
Collected from Reuters’ Website on 20 March 2002.
Language engineering

Information overload






Need a way of automatically processing
text documents
Information extraction
Summarisation
Translation
Retrieve only relevant documents
Voice processing
Language engineering

Two main approaches


Symbolic
Stochastic
Case study spell checkers
Spelling dictionaries
aim?
given a sequence of symbols:





1. identify misspelled strings
2. generate a list of possible ‘candidate’ correct
strings
3. select most probable candidate from the list
Spelling dictionaries
Implementation:




Probabilistic framework
bayesian rule
noisy channel model
Spelling dictionaries
Types of spelling error



actual word errors
non-word errors
Spelling dictionaries
Types of spelling error


actual word errors



/piece/ instead of /peace/
/there/ instead of /their/
non-word errors
Spelling dictionaries
Types of spelling error


actual word errors



/piece/ instead of /peace/
/there/ instead of /their/
non-word errors

/graffe/ instead of /giraffe/
Spelling dictionaries
Types of spelling error


actual word errors



non-word errors


/piece/ instead of /peace/
/there/ instead of /their/
/graffe/ instead of /giraffe/
of all errors in type written texts, 80% are nonword errors
Spelling dictionaries
non-word errors


Cognitive errors



/seperate/ instead of /separate/
phonetically equivalent sequence of symbols has been
substituted
due to lack of knowledge about spelling conventions
Spelling dictionaries
non-word errors



Cognitive errors
Typographic (‘typo’) errors



influenced by keyboard
e.g. substitution of /w/ for /e/ due to its adjacency on
the keyboard
/thw/ instead of /the/
Spelling dictionaries
non-word errors
noisy channel model






The actual word has been passed through a noisy
communication channel
This has distorted the word, thereby changing it in some
way
The misspelled word is the distorted version of the actual
word
Aim: recover the actual word by hypothesising about the
possible ways in which it could have been distorted
Spelling dictionaries
non-word errors
noisy channel model
What are the possible distortions?








insertion
deletion
substitution
transposition
all of these viewed as transformations that take
place in the noisy channel
Spelling dictionaries

Implementing spelling identification and
correction algorithm
Spelling dictionaries
Implementing spelling identification and correction
algorithm



STAGE 1: compare each string in document with a list of
legal strings; if no corresponding string in list mark as
misspelled
STAGE 2: generate list of candidates




Apply any single transformation to the typo string
Filter the list by checking against a dictionary
STAGE 3: assign probability values to each candidate in
the list
STAGE 4: select best candidate
Spelling dictionaries
STAGE 3


prior probability



likelihood



given all the words in English, is this candidate more likely to
be what the typist meant than that candidate?
P(c) = c/N where N is the number of words in a corpus
Given, the possible errors, or transformation, how likely is it
that error y has operated on candidate x to produce the
typo?
P(t/c), calculated using a corpus of errors, or transformations
Bayesian rule:


get the product of the prior probability and the likelihood
P(c) X P(t/c)
Spelling dictionaries
non-word errors
Implementing spelling identification and
correction algorithm







STAGE 1: identify misspelled words
STAGE 2: generate list of candidates
STAGE 3a: rank candidates for probability
STAGE 3b: select best candidate
Implement:


noisy channel model
Bayesian Rule
Next week
Finite state machines and regular expressions