How Humans Process Language


Language Processing: Humans & Computers
Psycholinguistics & Computational Linguistics
Lauren Kafka
Marina Hamoy
August 3, 2006
Psycholinguistics:

The area of linguistics that is concerned with
linguistic performance–how we use our
linguistic competence–in speech (or sign)
production and comprehension.
The Speech Chain: Brain-to-Brain Linking




A spoken utterance starts as
a message in the speaker’s
brain/mind.
The message is put into
linguistic form and
interpreted as articulation
commands.
It emerges as an acoustic
signal.
The signal is processed by
the listener’s ear and sent to
the brain/mind, where it is
interpreted.
Physiological Level
Linguistic Level
Acoustic Level
Comprehension


One goal of psycholinguistics is to describe
the processes people normally use in
speaking and understanding language.
Breakdowns in performance such as “tip-of-the-tongue” phenomena, speech errors, and
failure to comprehend tricky sentences tell us
a lot about how language is processed.
Can you think of any of your own?

Examples of when some word was on the tip of your tongue, but you couldn’t think of it

Speech errors (Hung go)

Failure to comprehend tricky sentences

http://www.zippyvideos.com/5589295543497276/time_out-1/original
Speech Sounds: Understanding Begins with Hearing


Sound is produced whenever there is a
disturbance in the position of air molecules.
Acoustic phonetics is concerned only with
speech sounds, all of which can be heard by
the normal human ear.
Frequency, Pitch & Volume



The speed of the variations of air pressure
determines the fundamental frequency of
sounds.
This is perceived by the hearer as pitch.
The magnitude, or intensity, of the variations
determines the loudness of the sound.
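
As a rough illustration of the two properties above, the following Python sketch synthesizes a pure tone and recovers its frequency and intensity from the samples. The sample rate, the zero-crossing estimator, and all function names are assumptions made for this example; real speech is far more complex than a single sine wave.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value for this sketch)

def make_tone(freq_hz, amplitude, seconds=1.0):
    """Return samples of a sine wave, standing in for air-pressure variation."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def estimate_frequency(samples):
    """Estimate frequency (Hz) by counting upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * SAMPLE_RATE / len(samples)

def rms(samples):
    """Root-mean-square amplitude, a simple stand-in for intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

quiet = make_tone(220, 0.2)
loud = make_tone(220, 0.8)
print("estimated frequency (Hz):", estimate_frequency(quiet))
print("RMS of quiet vs. loud tone:", rms(quiet), rms(loud))
```

Both tones have the same estimated frequency (heard as the same pitch), while the larger pressure variations give the second tone a higher RMS value (heard as louder).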
Speech Perception






The speech signal can be broken into strings
of:
Phonemes
Syllables
Morphemes
Words
Phrases
Context & Lexical Access
Night rate vs. nitrate: which one is heard depends on context
Meaning of words depends on lexical access
or word recognition
Example: A sniggle blick is procking a slar.
If you don’t recognize the words, you
conclude that the sentence is nonsense.
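
The lexical access step can be sketched in Python as a lookup against a mental dictionary. The tiny lexicon and the function names below are invented for illustration; they are not a model of how human word recognition actually works.

```python
# Toy lexicon (assumed for this sketch): a handful of known English words.
LEXICON = {"a", "is", "the", "dog", "cat", "chasing"}

def lexical_access(sentence):
    """Pair each word with True/False depending on whether it is recognized."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return [(w, w in LEXICON) for w in words if w]

def unrecognized(sentence):
    """Return the words that fail lexical access."""
    return [w for w, known in lexical_access(sentence) if not known]

print(unrecognized("A sniggle blick is procking a slar."))
print(unrecognized("A dog is chasing a cat."))
```

In the nonsense sentence only the function words (a, is) are found, so the structure looks like English while the content words cannot be interpreted.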

Lexical Semantics



Processing speech to get at the meaning of
what is said requires syntactic analysis as
well as knowledge of lexical semantics.
Stress and intonation provide some clues to
syntactic structure. Example: He lives in the
white house. He lives in the White House.
Loudness, pitch, and duration of syllables
provide information about meaning.
Timing & Rhythm





I vant to sock your blut.
Ivan tsuckyour blut.
Ted Koppel gave an address.
Ted Koppel gave Ann a dress.
Can you think of two sentences that include
the same letters or sounds, but differ in
timing, rhythm, and meaning?
Language Analysis & Computer Technology

Machine translation (MT)
– Between natural languages
– Analysis of authentic materials
Communication between people & computers
– Artificial intelligence (AI)
– World Wide Web (www)
Research in linguistic theories
Frequency Analysis

Corpus: ~1M spoken or written language data
gathered for linguistic research or analysis
1) Frequency analysis
– SAE: 30% - and, the, to, that, of, a, I, you, it, & know
– WAE: 25% - the (7%), of, and, to, a, that, in, is, was, & he
– English prepositions: WAE (except TO)
– Profane/taboo: SAE
– http://textalyser.net/
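
A frequency analysis of the kind summarized above can be sketched in a few lines of Python. This is a minimal illustration using the standard library; the tokenization rule, the function names, and the sample sentence are assumptions, and a real corpus study would use far more text.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens; return the counts and the total token count."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words), len(words)

def top_words(text, n=10):
    """Return (word, count, percent of all tokens) for the n most frequent words."""
    counts, total = word_frequencies(text)
    return [(w, c, 100.0 * c / total) for w, c in counts.most_common(n)]

sample = "The cat sat on the mat and the dog sat by the door."
for word, count, percent in top_words(sample, 3):
    print(f"{word:<6} {count:>3} {percent:5.1f}%")
```

Even in this one-sentence sample, the function word "the" dominates, which mirrors the pattern the slide reports for large corpora.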
Concordance Analysis

http://www.dundee.ac.uk/english/wics/wics.htm

1320   taste it is that such  poor  cattle always have in their mouths
 948          of sparing the  poor  child the inheritance of any part of
 778    small property of my  poor  father, whom I never saw -- so long
1870    desolate, while your  poor  heart pined away, weep for it
 947            Miss, if the  poor  lady had suffered so intensely
1884          the love of my  poor  mother hid his torture from me
1615  stockings, and all his  poor  tatters of clothes, had, in a long
1577       faded away into a  poor  weak stain. So sunken and
1001      on your way to the  poor  wronged gentleman, and, with a
1036     detachment from the  poor  young lady, by laying a brawny hand
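
A keyword-in-context (KWIC) display like the one above can be produced with a short Python function. This is a minimal sketch: the function name, the context width, and the whitespace tokenization are assumptions, and the sample text simply joins two fragments from the concordance lines on this slide.

```python
def concordance(text, keyword, width=30):
    """Return one line per occurrence of keyword, with left and right context."""
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(tokens[:i])[-width:]       # text before the keyword
            right = " ".join(tokens[i + 1:])[:width]   # text after the keyword
            lines.append(f"{left:>{width}}  {tok}  {right}")
    return lines

sample = ("the love of my poor mother hid his torture from me "
          "faded away into a poor weak stain")
for line in concordance(sample, "poor"):
    print(line)
```

Right-aligning the left context keeps the keyword in a single column, which is what makes recurring patterns around the word easy to scan.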
Collocation Analysis

2 or more words with customary relationships
http://esl.about.com/library/vocabulary/blcollocation_1.htm

STRONG              POWERFUL
support      50     force      13
safety       22     computers  10
sales        21     position    8
opposition   19     men         8
showing      18     computer    8
sense        18     man         7
message      15     military    6
defense      14     machines    6

w1 w2             C (w1, w2)
New York          11,487
United States      7,261
Los Angeles        5,412
last year          3,301
Saudi Arabia       3,191
last week          2,699
vice president     2,514
Persian Gulf       2,378
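
The counts C (w1, w2) are simply the number of times w2 directly follows w1 in a corpus. A minimal Python sketch, assuming whitespace-separated tokens and an invented sample sentence, might look like this:

```python
from collections import Counter

def bigram_counts(tokens):
    """Count every adjacent word pair (w1, w2) in the token list."""
    return Counter(zip(tokens, tokens[1:]))

def collocates(tokens, word):
    """Count the words that appear immediately after the given word."""
    return Counter(w2 for w1, w2 in zip(tokens, tokens[1:]) if w1 == word)

# Invented sample text; a real study would use a large corpus.
tokens = ("strong support for strong support and powerful computers "
          "with strong sense").split()

print(bigram_counts(tokens)[("strong", "support")])
print(collocates(tokens, "strong").most_common())
print(collocates(tokens, "powerful").most_common())
```

Raw counts like these favor frequent words, so corpus studies usually combine them with a statistical association measure; that step is outside this sketch.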
Information Retrieval: WWW


Search engines
Databases
– http://www.language-archives.org/index.html
Prevent spammers from scanning your e-mail
address by clicking on the active e-mail link & by
using a simple JavaScript code
Data Mining

Information extraction using keyword queries

Typical applications: customer profiling, fraud detection,
credit risk analysis, promotion evaluation

Norway to Wal-Mart: We don't want your shares
Pension-fund investing with a social consciousness.

Intelligence obtained by applying data mining to a
database of French theses on the subject of Brazil
Machine Translation

“There's a message coming through, captain
TRANSLATION SOFTWARE, the science-fiction
dream of a machine that understands any
language, has taken a step closer to reality.”
http://www.gutenberg.org/etext/6737
free download of literature
Computational Phonetics & Phonology



Computers programmed to produce synthetic
speech by following a ‘recipe’ of electronic
blending
Speech Recognition
Speech Synthesis
– TTS difficulties
  More than 300 heteronyms: read [reed] & [red]
  Inconsistent spelling: tough, bough, cough, dough
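
The two text-to-speech difficulties can be made concrete with a small Python sketch. The lookup table and the informal respellings for the -ough words (tuf, bow, kof, doh) are assumptions chosen for this example, not output from any real TTS system.

```python
# Informal respellings (assumed for this sketch), keyed by written form.
PRONUNCIATIONS = {
    "read": ["reed", "red"],   # heteronym: one spelling, two pronunciations
    "tough": ["tuf"],
    "bough": ["bow"],
    "cough": ["kof"],
    "dough": ["doh"],
}

def needs_context(word):
    """True if spelling alone cannot decide how the word is pronounced."""
    return len(PRONUNCIATIONS.get(word.lower(), [])) > 1

def ough_sounds():
    """Collect the different pronunciations of words ending in -ough."""
    return {p for w, ps in PRONUNCIATIONS.items() if w.endswith("ough") for p in ps}

print(needs_context("read"))
print(sorted(ough_sounds()))
```

A heteronym forces the system to look at the surrounding sentence, while the -ough words show that letter-to-sound rules alone are not enough and a pronunciation dictionary is needed.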
Computational Morphology




Computers need to understand the inter-weaving of
rules, exceptions & morpheme & word structure
Computer’s dictionary: morphological forms – needs
continual updating
Form predictability: impossible for compounding –
sky+box= skybox
Component morpheme
– Monomorpheme or not: [reZENT] or [Resent]
– Heteronyms - lead [leed] & [led]
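
The interweaving of rules and exceptions can be illustrated with English plural formation. The Python sketch below is a simplified assumption, not a full morphological analyzer: the exception list is deliberately tiny and the spelling rules cover only the most common cases.

```python
# Exceptions must be listed in the computer's dictionary (tiny sample, assumed).
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice", "sheep": "sheep"}

def pluralize(noun):
    """Form an English plural: check exceptions first, then apply spelling rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

for noun in ["box", "city", "day", "child", "skybox"]:
    print(noun, "->", pluralize(noun))
```

A new compound such as skybox is handled by the general rules without a dictionary entry, whereas every irregular form has to be stored, which is why the dictionary needs continual updating.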
Computational Syntax: ELIZA

ELIZA: 1st human-machine communication
invented by J Weizenbaum
– using syntax (print) simulating a psychiatric session
Circuit-Fix-It-Shop: NCSU & DU repair tech
programmed speech
– Capable of understanding & speaking complex
utterances
Computer parser
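
An ELIZA-style exchange can be approximated with a few pattern-and-template rules. The Python sketch below only illustrates the general idea of matching surface patterns in typed input; the rules and replies are invented for this example and are not Weizenbaum's original script.

```python
import re

# Hypothetical rules: (pattern to find in the input, reply template).
RULES = [
    (re.compile(r"\bi am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.I), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

def respond(utterance):
    """Return the reply for the first matching rule, or a default prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT_REPLY

for line in ["I am sad.", "I feel tired today", "My mother hid it.", "Hello"]:
    print(">", line)
    print(respond(line))
```

The program has no understanding of the input; it reuses the user's own words inside a canned frame, which is enough to simulate the turn-taking of a therapy session.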
References

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2001T02
http://www.language-archives.org/index.html
http://www.gutenberg.org/etext/6737
http://www.nsknet.or.jp/~peterrs/concordancing/usingconcs.html
www.otal.umd.edu/SHORE2001/crossLang/index.html
http://www.dundee.ac.uk/english/wics/wics.htm
http://textalyser.net/
http://www.zippyvideos.com/5589295543497276/time_out-1/original





