CHAPTER 9
LANGUAGE PROCESSING:
HUMANS AND COMPUTERS
PowerPoint by Don L. F. Nilsen
to accompany
An Introduction to Language (8th or 9th edition, 2007/2011)
by Victoria Fromkin, Robert Rodman
and Nina Hyams
BOTTOM-UP AND TOP-DOWN PROCESSING
Bottom-up processing relates to decoding.
You start with the actual sounds, letters,
morphemes, etc. and figure out the words,
phrases, clauses, sentences, paragraphs,
etc.
Top-down processing is based on reasoning.
You make a generalization and see how well
the sounds, letters, morphemes, etc. support
your generalization.
(Fromkin Rodman Hyams [2011] 381-382)
Top-down reasoning is powerful, but it
can be dangerous if it is not
accompanied by bottom-up reasoning.
For example, Otto Jespersen assumed
that men were better thinkers than
women.
He conducted an experiment in which
men and women read a story and were
given a quiz.
The women responded more quickly and more
accurately than the men, which was not what
Jespersen had expected.
So he concluded that women’s minds have
“vacant chambers” that men’s minds don’t
have.
This allowed Jespersen to account for his
evidence while at the same time not
disproving his original hypothesis that men
were better thinkers than women.
COMPUTER WORDS AND METAPHORS
COMPUTER WORDS: bits, bytes, code police,
cyberspace, future shock, hackers, hard copy,
menu, third wave, user-friendly
COMPUTER METAPHORS: 42, bug, cookies, GIGO,
great runes, heavy wizardry, Lotus Software, Melvyl
(California Library System), a sagan, snail mail,
Sorcerer’s Apprentice mode, Trojan Horse, UTSL,
vulcan nerve pinch, web, YABA compatible
CONCORDANCE
A concordance checks for word frequency and word
associations.
Word associations are determined by providing a
window of three words (more or less) on each side of
the targeted word.
(Fromkin Rodman Hyams [2011] 407-409)
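The windowing idea above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's program: the function name `concordance`, the sample sentence, and the simple letters-only tokenizer are all assumptions made for the example.

```python
import re
from collections import Counter

def concordance(text, target, window=3):
    """Return (frequency of target, Counter of words within `window` positions of it)."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    neighbors = Counter()
    hits = 0
    for i, word in enumerate(words):
        if word != target:
            continue
        hits += 1
        left = words[max(0, i - window):i]       # up to `window` words before
        right = words[i + 1:i + 1 + window]      # up to `window` words after
        neighbors.update(left + right)
    return hits, neighbors

sample = "The doctor saw the nurse. The nurse called the doctor again."
freq, assoc = concordance(sample, "doctor")
print(freq)            # 2
print(assoc["nurse"])  # 2
```

Changing `window` widens or narrows the span, which is the "more or less" in the description above.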
ELIZA
ELIZA is an early conversational program often
used to illustrate the Turing test. If a person
communicating with ELIZA cannot tell whether or
not ELIZA is a human, then ELIZA passes the
Turing test.
ELIZA plays the role of a psychiatrist talking to a
patient. ELIZA, like a psychiatrist, is
attempting to get rather than give
information:
PATIENT: Men are all alike.
ELIZA: In what way?
PATIENT: They’re always bugging me about something
or other.
ELIZA: Can you think of a specific example?
PATIENT: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here?
PATIENT: He says I’m depressed much of the time.
ELIZA: I am sorry to hear that you are depressed.
(Fromkin Rodman Hyams [2011] 398)
Victor Raskin fooled ELIZA by saying, “I
don’t like my mother and I don’t want to
talk about her.”
ELIZA responded, “Tell me more about
your mother.”
(Nilsen Nilsen & Combs 199)
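A rough sketch of why ELIZA can be fooled this way: if the program only matches keywords, it reacts to "my mother" and never notices the speaker's refusal. The rules below are hypothetical and are not ELIZA's actual script; they only imitate the replies quoted above.

```python
import re

# Hypothetical keyword rules: (pattern, reply template). Not the real ELIZA script.
RULES = [
    (re.compile(r"\bmy (mother|father|boyfriend)\b", re.I), "Tell me more about your {0}."),
    (re.compile(r"\balways\b", re.I), "Can you think of a specific example?"),
]

def respond(utterance):
    """Return the reply of the first rule whose keyword appears in the utterance."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Fill the template with any captured words (none for the second rule).
            return template.format(*(g.lower() for g in match.groups()))
    return "Please go on."  # assumed default when no keyword matches

print(respond("I don't like my mother and I don't want to talk about her."))
# Tell me more about your mother.
```

Nothing in the rules represents what the speaker wants, so the negation has no effect on the reply.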
CIRCUIT FIX-IT SHOP
Twenty-five years after ELIZA, a program named
“Circuit Fix-It Shop” was developed:
COMPUTER: This is the circuit fix-it shop. How may I
help you?
TECHNICIAN: I need to fix the RS111 circuit.
COMPUTER: I am familiar with that circuit. The LED is
supposed to be displaying alternately blinking one
and seven.
TECHNICIAN: Where is the LED?
COMPUTER: In the middle right of the blue region is
the LED.
TECHNICIAN: No light.
COMPUTER: What is the switch at when the LED is off?
TECHNICIAN: The switch is down.
COMPUTER: You put up the switch.
(Fromkin Rodman Hyams [2011] 398-402, 405)
MORE SOPHISTICATED PROGRAMS
Today more sophisticated programs are needed. One
such program is the Office Assistant, the animated paperclip
that answered questions in earlier versions of Microsoft Word.
Another sophisticated program is “Script Model
Grammar” designed by Roger Schank and Robert
Abelson and modified by linguist Victor Raskin and
others at Purdue University and elsewhere.
SAM: SCRIPT APPLIER MECHANISM
Of course sentences need to be parsed in Artificial
Intelligence. But constituents larger than a sentence
must be parsed as well.
One of the devices for doing this discourse parsing is
the “Script Applier Mechanism” devised by Roger
Schank and Robert Abelson and modified for humor
and language play by Victor Raskin and others.
Note that a play or a movie has a script
for the actors to follow.
The script in Artificial Intelligence is the
same, but it is much simpler. It is a
“mundane script.”
The “Restaurant Script,” for example,
involves a customer, a server, a
cashier, etc.
Props in the “Restaurant Script” include the restaurant, the table,
the menu, the food, the check, the payment, the tip, etc.
The sequence of actions is as follows:
1. Customer goes to restaurant.
2. Customer goes to table.
3. Server brings menu.
4. Customer orders food.
5. Server brings food.
6. Customer eats food.
7. Server brings check.
8. Customer leaves tip for server.
9. Customer gives payment to cashier.
10. Customer leaves restaurant.
(Hendrix and Sacerdoti 654)
(Nilsen Nilsen & Combs 199)
There are two exciting things about the Script
Applier Mechanism. First, it can
spot anything that is missing, added, or out
of place in the sequence of events and ask,
“What’s up?”
Second, it is able to handle two scripts at the
same time, so that it is capable of dealing
with jokes, language play, satire, irony,
sarcasm, parody, paradox and double
entendre in general.
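The first capability can be sketched as a comparison between a story and the ten-step Restaurant Script. This is only an illustration under simplifying assumptions: events are matched as exact strings, the function name `check_story` is hypothetical, and the two-scripts-at-once capability is not modeled.

```python
RESTAURANT_SCRIPT = [
    "customer goes to restaurant",
    "customer goes to table",
    "server brings menu",
    "customer orders food",
    "server brings food",
    "customer eats food",
    "server brings check",
    "customer leaves tip for server",
    "customer gives payment to cashier",
    "customer leaves restaurant",
]

def check_story(events, script=RESTAURANT_SCRIPT):
    """Report script steps that are missing, events that are added, and events out of place."""
    missing = [step for step in script if step not in events]
    added = [event for event in events if event not in script]
    # Script positions of the story's known events, in the order the story tells them.
    positions = [script.index(event) for event in events if event in script]
    # An event is out of place if it belongs earlier in the script than the event before it.
    out_of_place = [script[cur] for prev, cur in zip(positions, positions[1:]) if cur < prev]
    return {"missing": missing, "added": added, "out_of_place": out_of_place}

story = [
    "customer goes to restaurant",
    "customer goes to table",
    "customer eats food",
    "customer orders food",
    "customer juggles plates",
    "customer leaves restaurant",
]
report = check_story(story)
print(report["added"])         # ['customer juggles plates']
print(report["out_of_place"])  # ['customer orders food']
```

Each non-empty list is a place where the program could ask its “What’s up?” question.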
PARSING PROBLEMS
GARDEN PATH:
The horse raced past the barn fell.
After the child visited the doctor prescribed a course of injections.
The doctor said the patient will die yesterday.
(Fromkin Rodman Hyams [2011] 385)
EMBEDDING: “Never imagine yourself not to be otherwise than
what it might appear to others…to be otherwise.”
(Lewis Carroll’s Alice’s Adventures in Wonderland)
(Fromkin Rodman Hyams [2011] 377)
RIGHT-BRANCHING VS. EMBEDDING
RIGHT BRANCHING: This is the dog that worried the
cat that killed the rat that ate the malt that lay in the
house that Jack built.
EMBEDDING: Jack built the house that the malt that
the rat that the cat that the dog worried killed ate lay
in.
NOTE: Multiple embedding is OK for a computer, but
not OK for the human brain.
(Fromkin Rodman Hyams [2011] 386)
ANOMALOUS WORDS: A sniggle blick is
procking a slar.
(Fromkin Rodman Hyams [2007] 368)
METANALYSIS (incorrect phrase breaking):
grade A vs. grey day
night rate vs. nitrate
(Fromkin Rodman Hyams [2007] 370)
NOTE: English “adder” and “apron” arose by
metanalysis: “a nadder” (Old English “næddre”)
and “a napron” (from French “naperon”) were
reanalyzed as “an adder” and “an apron.”
AMBIGUOUS SYNTAX IN NEWSPAPER
HEADLINES:
Teacher Strikes Idle Kids
Enraged Cow Injures Farmer with Ax
Killer Sentenced to Die for Second Time in 10
Years
Stolen Painting Found by Tree
(Fromkin Rodman Hyams [2011] 384)
REAL-WORLD KNOWLEDGE
Explain why the following sentences are ambiguous to
a computer but not to a human:
A cheesecake was on the table. It was delicious and
was soon eaten.
SIGN IN A CHURCH: For those of you who have
children and don’t know it, we have a nursery
downstairs.
NEWSPAPER AD: Our bikinis are exciting; they are
simply the tops.
(Fromkin Rodman Hyams [2011] 423-424)
ANTISMOKING CAMPAIGN SLOGAN:
It’s time we make smoking history.
Do you know the time?
Concerned with spreading violence, the president called a press
conference.
The ladies of the church have cast off clothing of every kind and
they may be seen in the church basement Friday.
(Fromkin Rodman Hyams [2011] 423-424)
AMBIGUOUS NEWSPAPER HEADLINES
Red Tape Holds Up New Bridge
Kids Make Nutritious Snacks
Sex Education Delayed, Teachers
Request Training
(Fromkin Rodman Hyams [2011] 423-424)
SEMANTIC PRIMING
In the human brain, the word “doctor” is more easily
and more completely processed if it is preceded by
“nurse” than if it is preceded by “flower.”
This is because “doctor” and “nurse” “are located in
the same part of the mental lexicon.”
(Fromkin Rodman Hyams [2011] 383-384)
This same feature could easily be built into Artificial
Intelligence.
SPEECH RECOGNITION
& SPEECH SYNTHESIS
Computational phonetics and phonology have two concerns. The
first is with programming computers to analyze the speech
signal into its component phones and phonemes.
The second is to send the proper signals to an electronic speaker
so that it enunciates the phones of the language and combines
them into morphemes and words.
The first of these is speech recognition; the second is speech
synthesis.
(Fromkin Rodman Hyams [2011] 391-395)
Machines that imitate human speech
are difficult to construct because so many
agencies are involved in producing even a
single word.
Things that must be considered include
not only the sounds, but also the
inflections and variations of tone and
articulation.
(Fromkin Rodman Hyams [2011] 391-395)
TO SYNTHESIZE SPEECH:
1. Start with a tone at the same frequency as vibrating vocal cords (higher
if a woman’s or child’s voice is being synthesized, lower for a man’s)
2. Emphasize the harmonics corresponding to the formants required for a
particular vowel, liquid, or nasal quality.
3. Add hissing or buzzing for fricatives.
4. Add nasal resonances for nasal sounds.
5. Temporarily cut off sound to produce stops and affricates….
(Fromkin Rodman Hyams [2011] 394)
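Steps 1 and 2 above can be sketched with the Python standard library alone. This is a toy illustration, not a real synthesizer: the sample rate, the 120 Hz fundamental, the formant values of 700 and 1200 Hz, and the bandwidth are all assumed numbers chosen for the example, and steps 3 to 5 are not modeled.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)

def synth_vowel(f0=120.0, formants=(700.0, 1200.0), duration=0.05, bandwidth=100.0):
    """Sum the harmonics of f0, emphasizing those that lie near a formant frequency."""
    n_samples = int(SAMPLE_RATE * duration)
    n_harmonics = int((SAMPLE_RATE / 2) // f0)  # keep harmonics below the Nyquist limit
    # Step 2: give each harmonic a weight that is large only near a formant.
    weights = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        weights.append(sum(math.exp(-((freq - f) / bandwidth) ** 2) for f in formants))
    # Step 1: build the tone from the fundamental and its weighted harmonics.
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        samples.append(sum(w * math.sin(2 * math.pi * (k + 1) * f0 * t)
                           for k, w in enumerate(weights)))
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # normalize to the range -1..1

wave = synth_vowel()
print(len(wave))  # 800
```

Raising `f0` corresponds to the higher starting tone for a woman's or child's voice, and changing `formants` changes which vowel quality is approximated.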
A Sound Spectrogram will give an indication of some of the variables of
analyzing or synthesizing speech:
SOUND SPECTROGRAM
[Figure: sound spectrogram]
(Fromkin Rodman Hyams [2011] 379)
SPELL CHECKER
I have a spelling checker.
It came with my PC.
It plane lee marks four my revue
Miss steaks aye can knot sea.
(Fromkin Rodman Hyams [2011] 411)
Explain why the spell checker is not
working in the poem above.
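One way to approach the question is to sketch what a simple spell checker does: it looks up each word in a word list, one word at a time, without regard to context. The word list and the function name `misspelled` below are assumptions for the example; a real checker would load a full dictionary.

```python
import re

# Tiny stand-in word list (assumed); every entry is a real English word or abbreviation.
DICTIONARY = {
    "i", "have", "a", "spelling", "checker", "it", "came", "with", "my", "pc",
    "plane", "lee", "marks", "four", "revue", "miss", "steaks", "aye", "can",
    "knot", "sea",
}

def misspelled(text, dictionary=DICTIONARY):
    """Return the words in text that are not found in the dictionary."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in dictionary]

poem = """I have a spelling checker.
It came with my PC.
It plane lee marks four my revue
Miss steaks aye can knot sea."""

print(misspelled(poem))       # []
print(misspelled("teh sea"))  # ['teh']
```

Because each homophone is itself a correctly spelled word, a lookup of this kind reports nothing, while a non-word such as "teh" is caught.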
THEORIES AND MODELS
In The Physicist’s Conception of Nature, Manfred Eigen
said, “A theory has only the alternative of being right
or wrong. A model has a third possibility: it may be
right, but irrelevant.”
(Fromkin Rodman Hyams [2007] 397)
Explain why a theory for Artificial Intelligence must be
rigorous and at the same time allow for language
play. In AI, are rigor and language play compatible
concepts or not?
TRANSLATION
Translation is not just a word-for-word
replacement.
Often there is no equivalent word in the target
language, and the order of words may differ,
as in translating from an SVO language like
English to an SOV language like Japanese.
There is also difficulty in translating idioms,
metaphors, jargon, and so on.
(Fromkin Rodman Hyams [2011] 391-406)
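The word-order point can be sketched as follows, assuming the clause has already been parsed into subject, verb, and object. The function name is hypothetical, and English words stand in for the target-language words, which would still need their own translation.

```python
def svo_to_sov(clause):
    """Reorder a (subject, verb, object) triple into subject-object-verb order."""
    subject, verb, obj = clause
    return (subject, obj, verb)

clause = ("Taro", "ate", "an apple")
print(" ".join(clause))              # Taro ate an apple
print(" ".join(svo_to_sov(clause)))  # Taro an apple ate
```

The reordering works on whole constituents ("an apple"), which is why a word-for-word replacement that has no parse cannot do it.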
Machine translation is often impeded by lexical
and syntactic ambiguities, structural
disparities between the two languages,
morphological complexities, and other cross-linguistic differences.
(Fromkin Rodman Hyams [2011] 391-406)
In the following examples consider what
information must be taken into consideration
for better machine translation:
BUCHAREST HOTEL: The lift is being fixed for the next day. During
that time we regret that you will be unbearable.
SWISS NUNNERY HOSPITAL: The nuns harbor all diseases and have
no respect for religion.
GERMAN HOTEL: All the water has been passed by the manager.
ZURICH HOTEL: Because of the impropriety of entertaining guests of
the opposite sex in the bedroom, it is suggested that the lobby be
used for this purpose.
TURKEY: The government bans the smoking of children.
(Fromkin Rodman Hyams [2007] 382)
Having Fun with
Computer
Terminology
1024
When Alan Schoenfeld of the University of
California at Berkeley attended a conference
on Artificial Intelligence, he was given Hotel
Room Number 1024.
“Wow!” he said.
1024 is 2 to the tenth power, the number of bytes in a kilobyte.
(Nilsen & Nilsen 98)
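The arithmetic behind the joke is easy to check. The variable name `room` is just a label for the example.

```python
room = 1024
print(room == 2 ** 10)        # True
print(room.bit_length() - 1)  # 10, the exponent
print(2 ** 20)                # 1048576, the number of bytes in a binary megabyte
```

In binary units, 2 to the tenth power bytes make a kilobyte, and 2 to the twentieth make a megabyte.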
ACRONYMS
Acronyms are so common in computer terminology that
programmers make fun of them.
“TLA” stands for “Three Letter Acronym.”
“YABA” stands for “Yet Another Bloody Acronym.”
“YABA Compatible” means that the initials can be pronounced
easily and are not obscene.
(Nilsen & Nilsen 99)
CHAT GROUPS
Linguist Susan Herring at the University of Texas, Arlington
studied the humor in chat groups. Her results were as follows:
imaginary situations: 20 percent
a mock persona: 14 percent
teasing: 13 percent
irony: 6 percent
name play: 5 percent
silliness: 4 percent
real situations: 3 percent
riddles: 2 percent
pretended misunderstandings: 2 percent
puns: 1 percent
(Nilsen & Nilsen 167)
EMOTICONS
In conversation we can show our emotions, but on the internet
this is difficult, so we use emoticons:
:-) Smiling
:-)))))))))) Really Smiling
;-) Winking
:-* Kissing
|-O Yawning
:-& Tongue-Tied
:’-{ Crying
:-/ Undecided
:-|| Angry
(Nilsen & Nilsen 100)
SCIENCE FICTION AND FANTASY
Many computer terms come from Science Fiction and Fantasy:
A huge network packet is a “Godzillagram” from Godzilla
Teenage hackers are “Munchkins” from The Wizard of Oz
A mischievous program is called a “wabbit” from Elmer Fudd’s
“You wascawwy wabbit.”
A program that repeats itself indefinitely is said to be in
“Sorcerer’s Apprentice Mode” from Fantasia
The answer to life, the universe, and everything is “42,” from the
computer in Douglas Adams’ novel The Hitchhiker’s Guide to the Galaxy.
(Nilsen & Nilsen 99)
When someone asks online for information that is
easily available from the source code or a manual,
the Cyber Police might say, “UTSL.” This means
“Use the Source, Luke!” from Star Wars.
Another term from Star Wars is an “Obi-Wan
Error.” This comes from the name “Obi-Wan
Kenobi” and refers to an “off-by-one error,” as
in 2001: A Space Odyssey, where the computer
is named “HAL.” “HAL” is “IBM” with each letter
replaced by the one just before it in the alphabet.
(Nilsen & Nilsen 99)
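The relation between “HAL” and “IBM” is a shift of one letter, which a short sketch can make concrete. The function name `shift_letters` is an assumption for the example, and it handles uppercase A to Z only.

```python
def shift_letters(word, offset):
    """Shift each uppercase letter by `offset` places in the alphabet, wrapping around."""
    return "".join(chr((ord(c) - ord("A") + offset) % 26 + ord("A")) for c in word)

print(shift_letters("IBM", -1))  # HAL
print(shift_letters("HAL", 1))   # IBM
```

Each letter is off by exactly one position, which is the same kind of slip that “off-by-one” names in programming.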
In computer terminology a soft boot refers to
pressing “Control,” “Alt,” and “Delete” at the
same time.
This is referred to as the “Vulcan Nerve Pinch” from
Star Trek.
“Droid” from “Android” has become a suffix in such
words as “trendroids,” who follow trends, and “sales
droids,” who promise customers things that cannot be
delivered or are useless.
The “code police” and “net police” are named after the
“thought police” in George Orwell’s 1984.
SIGNATURES
People like to create enigmatic and puzzling
signatures. One user named Eddie follows
his signature with “Ceci n’est pas une
signature.”
This is an allusion to a painting of a pipe by
René Magritte with the disclaimer, “Ceci n’est
pas une pipe.”
(Nilsen & Nilsen 166)
TEXT MESSAGING
Since numbers and letters require more than a single stroke on cell
phones, acronyms are often used:
AFAIK: As far as I know
BTW: By the way
CUL or CUL8R: See you later
GIGO: Garbage In Garbage Out
GFR: Grim File Reaper
LOL: Laughing Out Loud (or Lots of Laughs)
OIC: Oh, I see
OMG: Oh My Gosh
http://www.youtube.com/watch?v=0P0jY-Di6fg
POS: Parent Over Shoulder
ROTF: Rolling on the Floor
ROTFLMAO: Rolling on the Floor Laughing My Ass Off
RUOK: Are you OK?
TIA: Thanks in Advance
WTF: Not translatable
WYSIWYG: What You See Is What You Get
BCNU: Be Seein’ You
(Nilsen & Nilsen 99)
TWENTE, NETHERLANDS
The University of Twente hosts an annual
workshop on Language Technology.
In 1996 this workshop was devoted to
“Automatic Interpretation and Generation
of Verbal Humor.”
The papers at this workshop had such
titles as:
“Why do People Use Irony?”
“Password Swordfish: Verbal Humour in the Interface.”
“Computer Implementation of the General Theory of Verbal Humor.”
“Humor Theory beyond Jokes.”
“Speculations on Story Puns.”
“Relevance Theory and Humorous Interpretations.”
“What Sort of a Speech Act is the Joke?”
“A Neural Resolution of the Incongruity-Resolution Theory of Humor”
“Humorous Analogy: Modeling the Devil’s Dictionary.”
“Why Is a Riddle Not Like a Metaphor?” and
“An Attempt at Natural Humor from a Natural Language Robot.”
(Nilsen and Nilsen 98)
VIRUS JOKES
AT&T Virus: Every three minutes it tells
you what great service you are getting.
MCI Virus: Every three minutes it
reminds you that you’re paying too
much for the AT&T virus.
Paul Revere Virus: This revolutionary
virus does not horse around. It warns
you of impending hard disk attack—
once if by LAN, twice if by C:\>.
New World Order Virus: Probably
harmless, but it makes a lot of people
really mad just thinking about it.
(Nilsen & Nilsen 177)
KURT VONNEGUT ON THE INTERNET
In August of 1997 a piece attributed to Kurt Vonnegut appeared on
the Internet.
When Vonnegut’s wife was given a copy of the article she was so
pleased with her clever husband that she forwarded a copy to
their children.
Vonnegut said that it was “funny and wise and charming,” but he
said he never wrote it.
The article had actually been published by Mary Schmich in the
Chicago Tribune and then picked up and redistributed by a
computer hacker.
Ian Fisher of The New York Times said that as long as readers
thought the piece was Vonnegut’s, they viewed the Internet as a
wonderful tool that could keep people in touch with each other.
But when they learned it was a hoax, their perception of the
Internet changed. The Internet was now an unreliable hotbed of
hoaxes and wild-eyed conspiracies.
Probably both opinions are true.
(Nilsen & Nilsen 168)
Computer-Humor Websites
ANIMATOR VS. ANIMATION II:
http://www.metacafe.com/watch/689540/animator_vs_animation_2/
THE THE IMPOTENCE OF PROOFREADING (TAYLOR MALI):
http://www.youtube.com/watch?v=p_rwB5_3PQc
TOP 50 POPULAR TEXT & CHAT ACRONYMS (NETLINGO):
http://www.netlingo.com/top50/popular-text-terms.php
USHER’S OMG, FEATURING WILL.I.AM—AUTOTUNE:
http://www.youtube.com/watch?v=0P0jY-Di6fg
References.
Clark, Virginia, Paul Eschholz, and Alfred Rosa. Language: Readings in
Language and Culture, 6th Edition. New York, NY: St. Martin’s Press,
1998.
English, Katharine, ed. Most Popular Web Sites: The Best of the Net from
A2Z. Indianapolis, IN: Lycos Press, 1996.
Fromkin, Victoria, Robert Rodman, and Nina Hyams. “Language
Processing: Humans and Computers.” An Introduction to Language,
9th Edition. Boston, MA: Thomson Wadsworth, 2011, 375-429.
Gralla, Preston. How the Internet Works. Emeryville, CA: Ziff-Davis Press,
1997.
Hempelmann, Christian F. “Computational Humor: Beyond the Pun?” in
Raskin [2008]: 333-360.
Hendrix, Gary G., and Earl D. Sacerdoti. “Natural-Language Processing:
The Field in Perspective.” in Language: Introductory Readings, 4th
edition. Eds. Virginia P. Clark, Paul A. Eschholz and Alfred F. Rosa.
New York, NY: St. Martin’s, 1985.
Hulstijn, J., and A. Nijholt eds. Twente Workshop on Language Technology
12: Automatic Interpretation and Generation of Verbal Humor. Twente,
Netherlands: Univ of Twente Dept of Computer Science, 1996.
Nilsen, Alleen Pace, and Don L. F. Nilsen. “Computer Humor,” and
“Internet Influences.” Encyclopedia of 20th Century American Humor.
Westport, CT: Greenwood, 2000, 97-100 and 165-168.
Nilsen, Don L. F., Alleen Pace Nilsen, and Nathan H. Combs. “Teaching a
Computer to Speculate.” Computers and the Humanities. 22 (1988):
193-201.
Nilsen, Kelvin, and Alleen Pace Nilsen. “Literary Metaphors and Other
Linguistic Innovations in Computer Language” (Clark, 166-176).
Raskin, Victor. Semantic Mechanisms of Humor. Boston, MA:
Reidel/Kluwer, 1985.
Raskin, Victor. The Primer of Humor Research. New York, NY: Mouton de
Gruyter, 2008.
Raymond, Eric S. The New Hacker’s Dictionary, 2nd Edition. Cambridge,
MA: MIT Press, 1993.
Roberts, Steven K. “Artificial Intelligence.” in Writing and Reading Across
the Curriculum, 2nd Edition. Laurence Behrens and Leonard J. Rosen.
Boston, MA: Little, Brown, 1985, 214-222.
Rosch, Eleanor. “On the Internal Structure of Perceptual and Semantic
Categories.” in Cognitive Development and the Acquisition of
Language. Ed. T. Moore. New York, NY: Academic Press, 1973.
Schank, Roger C., and Robert Abelson. Scripts, Plans, Goals, and
Understanding: An Inquiry Into Human Knowledge Structures.
Hillsdale, NJ: Lawrence Erlbaum, 1977.
Siegel, David. Creating Killer Web Sites. Indianapolis, IN: Hayden Books,
1996.