Computational Language
Download
Report
Transcript Computational Language
Computational Language
Andrew Hippisley
Computational Language
Computational language and AI
Language engineering: applied
computational language
Case study: spell checkers
Computational language & AI
Artificial Intelligence:
“the simulation on computer of distinctly human
mental functions.” Wilks (1993)
Computational language & AI
Language integral to intelligent systems
Artificial Intelligence
Turing Test
ELIZA
ELIZA
Eliza: Hello. I am ELIZA. How can I help you?
Me: I am feeling a bit depressed today
Eliza: Do you think coming here will help you not to be depressed?
Me: yes
Eliza: You are sure?
Me: a friend told me I should talk to you
Eliza: Do you wish to you should talk to me?
Me: I am willing to try
Eliza: How long have you been willing to try?
Me: I have been willing to try for about two weeks
Eliza: Can you elaborate on that?
Me: a friend said I should try to talk to you about my problem
Eliza: Why do you say your problem?
Me: I have a problem with depression
Eliza: Can you elaborate on that?
Me: no, I don't wish to discuss it
Eliza: You are being a bit negative
Computational language & AI
Why language engineering?
Language integral to intelligent systems
Artifiicial Intelligence
Turing Test
ELIZA
Expert systems: natural language interface,
natural language database
Computational language & AI
Methods shared across systems
Finite State Transition Networks (FSTN)
Logic
Formal rules
Probability
Data: you know it!
Applied computational language
History of the field
Machine Translation: 1960, 1966,
post 1966
Database access
Text interpretation
Information retrieval
Text categorisation
Language engineering
Information overload
Need a way of automatically processing
text documents
Information extraction
Language engineering
Information extraction
GIDA: system for automatically monitoring
financial market sentiment
GIDA
Language engineering
Information overload
Need a way of automatically processing
text documents
Information extraction
Summarisation
Automatic summarisation
(courtesy of Paulo FERNANDES de OLIVEIRA, PhD)
• Get information source;
• Extract some content from it;
• Present the most important part to the user
xx xxx xxxx x xx xxxx
xxx xx xxx xx xxxxx x
xxx xx xxx xx x xxx xx
xx xxx x xxx xx xxx x
xx x xxxx xxxx xxxx xx
xx xxxx xxx
xxx xx xx xxxx x xxx
xx x xx xx xxxxx x x xx
xxx xxxxxx xxxxxx x x
xxxxxxx xx x xxxxxx
xxxx
xx xx xxxxx xxx xx x xx
xx xxxx xxx xxxx xx
xxx xx xxx xxxx xx
xxx x xxxx x xx xxxx
xx xxx xxxx xx x xxx
xxx xxxx x xxx x xxx
xx xx xxxxx x x xx
xxxxxxx xx x xxxxxx
xxxx
xx xx xxxxx xxx xx
xxx xx xxxx x xxxxx
xx xxxxx x
Lexical Cohesion
Links Example
Sentence 15:
"For the stock market this
move was so deeply
discounted that I don't think it
will have a major impact".
Sentence 42:
Lucent, the most active stock on
the New York Stock Exchange,
skidded 47 cents to $4.31, after
falling to a low at $4.30.
Sentence 23:
J&J's stock added 83 cents to
$65.49.
Sentence 26:
Flagging stock markets
kept merger activity and
new stock offerings on
the wane, the firm said.
Text title: U.S. stocks hold some gains.
Collected from Reuters’ Website on 20 March 2002.
Lexical Cohesion
Bonds Example
17.
In other news, Hewlett-Packard said preliminary
estimates showed shareholders had approved its purchase of
Compaq Computer -- a result unconfirmed by voting officials.
19.
In a related vote, Compaq shareholders are expected on
Wednesday to back the deal, catapulting HP into contention
against International Business Machines for the title of No. 1
computer company.
Text title: U.S. stocks hold some gains.
Collected from Reuters’ Website on 20 March 2002.
Language engineering
Information overload
Need a way of automatically processing
text documents
Information extraction
Summarisation
Translation
Retrieve only relevant documents
Voice processing
Language engineering
Two main approaches
Symbolic
Stochastic
Case study spell checkers
Spelling dictionaries
aim?
given a sequence of symbols:
1. identify misspelled strings
2. generate a list of possible ‘candidate’ correct
strings
3. select most probable candidate from the list
Spelling dictionaries
Implementation:
Probabilistic framework
bayesian rule
noisy channel model
Spelling dictionaries
Types of spelling error
actual word errors
non-word errors
Spelling dictionaries
Types of spelling error
actual word errors
/piece/ instead of /peace/
/there/ instead of /their/
non-word errors
Spelling dictionaries
Types of spelling error
actual word errors
/piece/ instead of /peace/
/there/ instead of /their/
non-word errors
/graffe/ instead of /giraffe/
Spelling dictionaries
Types of spelling error
actual word errors
non-word errors
/piece/ instead of /peace/
/there/ instead of /their/
/graffe/ instead of /giraffe/
of all errors in type written texts, 80% are nonword errors
Spelling dictionaries
non-word errors
Cognitive errors
/seperate/ instead of /separate/
phonetically equivalent sequence of symbols has been
substituted
due to lack of knowledge about spelling conventions
Spelling dictionaries
non-word errors
Cognitive errors
Typographic (‘typo’) errors
influenced by keyboard
e.g. substitution of /w/ for /e/ due to its adjacency on
the keyboard
/thw/ instead of /the/
Spelling dictionaries
non-word errors
noisy channel model
The actual word has been passed through a noisy
communication channel
This has distorted the word, thereby changing it in some
way
The misspelled word is the distorted version of the actual
word
Aim: recover the actual word by hypothesising about the
possible ways in which it could have been distorted
Spelling dictionaries
non-word errors
noisy channel model
What are the possible distortions?
insertion
deletion
substitution
transposition
all of these viewed as transformations that take
place in the noisy channel
Spelling dictionaries
Implementing spelling identification and
correction algorithm
Spelling dictionaries
Implementing spelling identification and correction
algorithm
STAGE 1: compare each string in document with a list of
legal strings; if no corresponding string in list mark as
misspelled
STAGE 2: generate list of candidates
Apply any single transformation to the typo string
Filter the list by checking against a dictionary
STAGE 3: assign probability values to each candidate in
the list
STAGE 4: select best candidate
Spelling dictionaries
STAGE 3
prior probability
likelihood
given all the words in English, is this candidate more likely to
be what the typist meant than that candidate?
P(c) = c/N where N is the number of words in a corpus
Given, the possible errors, or transformation, how likely is it
that error y has operated on candidate x to produce the
typo?
P(t/c), calculated using a corpus of errors, or transformations
Bayesian rule:
get the product of the prior probability and the likelihood
P(c) X P(t/c)
Spelling dictionaries
non-word errors
Implementing spelling identification and
correction algorithm
STAGE 1: identify misspelled words
STAGE 2: generate list of candidates
STAGE 3a: rank candidates for probability
STAGE 3b: select best candidate
Implement:
noisy channel model
Bayesian Rule
Next week
Finite state machines and regular expressions