McCoy - University of Delaware


Kathy McCoy
Artificial Intelligence
Natural Language Processing
Applications for People with Disabilities
Primary Research Areas

Natural Language Generation – problem of choice.
Deep Generation – structure and content of coherent text
Surface Generation – particularly using TAG (multi-lingual
generation and machine translation)
Discourse Processing
Second Language Acquisition
Applications for people with disabilities affecting
their ability to communicate
Projects

Augmentative Communication – Center for
Applied Science and Engineering in Rehabilitation
(ASEL) – Word Prediction and Contextual Information
(Keith Trnka, (Jay McCaw), Chris Pennington, Debbie
Yarrington)
 ICICLE – CALL system for teaching English as a
second language to ASL natives (Rashida Davis, Charlie
Greenbacker)
 Text Skimming – for someone who is blind to skim a
document to find an answer to a question (Debbie
Yarrington).
 Generating Textual Summaries of Graphs – (Sandee
Carberry, Seniz Demir)
Developing Intelligent
Communication Aids for
People with Disabilities
Kathleen F. McCoy
Computer and Information Sciences &
Center for Applied Science and
Engineering in Rehabilitation
University of Delaware
Augmentative Communication

Intervention that gives a non-speaking person an
alternative means to communicate
User Population
 May have severe motor impairments





Unable to speak
Unable to write
Cannot use sign language
May have cognitive impairments and/or developmental
disabilities
May be too young to have developed literacy skills
Row-Column Scanning
Row-Column Scanning II
Language Representation: Words
Still Need to Spell!
Predicting Fringe Vocabulary
Word Prediction of Spelled Words (infrequent
context-specific words)
Methods
 Statistical NLP Methods
 Learning from the context of the individual
 Other Contextual Clues


Geographic Location, Time of Day, Conversational
Partner, Topic of Conversation, Style of the
Document
Prediction Example
Trigram Model: $P(w \mid h) = P(w \mid w_{-2}\, w_{-1})$
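A minimal sketch of trigram word prediction, assuming plain maximum-likelihood counts with no smoothing or backoff. The function names, the `<s>` padding token, and the tiny corpus are illustrative assumptions, not taken from the slides.

```python
from collections import Counter, defaultdict

def train_trigrams(sentences):
    """Count how often each word follows each two-word history."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with two start markers so the first words also have a history.
        words = ["<s>", "<s>"] + sentence.lower().split()
        for w2, w1, w in zip(words, words[1:], words[2:]):
            counts[(w2, w1)][w] += 1
    return counts

def predict(counts, w2, w1, prefix="", n=5):
    """Rank words by the maximum-likelihood estimate of P(w | w2 w1),
    keeping only words that start with the letters typed so far."""
    followers = counts.get((w2, w1), Counter())
    total = sum(followers.values())
    ranked = sorted(
        ((w, c / total) for w, c in followers.items() if w.startswith(prefix)),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:n]

# Hypothetical training text, for illustration only.
corpus = [
    "i want to go home",
    "i want to go out",
    "i want to eat now",
    "i want to go home",
]
model = train_trigrams(corpus)
suggestions = predict(model, "want", "to")
```

After "want to", this toy model ranks "go" ahead of "eat"; typing the prefix "e" narrows the list to "eat". An unseen history returns no suggestions, which is one reason a real predictor would need smoothing or backoff.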
Can we do better??


Intuitively, not all words occur with
equal likelihood during a conversation.
The topic of the conversation affects the words
that will occur.
E.g., when talking about baseball: ball, bases, pitcher,
bat, triple….
 How often do these same words occur in your
algorithms class?

Topic Modeling


Goal: Automatically identify the topic of the
conversation and increase the probability of
related words and decrease probability of
unrelated words.
Questions
Topic Representation
 Topic Identification
 Topic Application
 Topic Language Model Use

Topic Modeling Approach
Topic Identification
Topic Identification
Topic Application


How do we use those similarity scores?
Essentially weight the contribution of each topic
by the amount of similarity that topic has with
the current conversation.
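One way to read this weighting, as a sketch under stated assumptions: each topic has its own unigram model, similarity is cosine similarity between word counts, and the similarity scores are normalized into mixture weights. The slides do not specify the similarity measure or the model order, so those choices, the function names, and the toy topic data are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def topic_weighted_prob(word, conversation, topics):
    """Mix per-topic unigram probabilities, weighting each topic by its
    similarity to the conversation so far."""
    convo = Counter(conversation)
    sims = {name: cosine(convo, counts) for name, counts in topics.items()}
    total_sim = sum(sims.values())
    if total_sim == 0:
        # No topic resembles the conversation: fall back to equal weights.
        weights = {name: 1.0 / len(topics) for name in topics}
    else:
        weights = {name: s / total_sim for name, s in sims.items()}
    return sum(
        weights[name] * counts[word] / sum(counts.values())
        for name, counts in topics.items()
    )

# Hypothetical topic corpora, for illustration only.
topics = {
    "baseball": Counter("ball bases pitcher bat triple ball pitcher".split()),
    "algorithms": Counter("graph sort tree heap sort graph".split()),
}
conversation = "the pitcher threw the ball".split()
p_bat = topic_weighted_prob("bat", conversation, topics)
p_heap = topic_weighted_prob("heap", conversation, topics)
```

Here the conversation overlaps only with the baseball topic, so "bat" receives probability mass while "heap" receives none. In practice this topic estimate would presumably be combined with a smoothed general model so that unrelated words are made less likely rather than impossible.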
Results Using Topics
Current Work


What happens with significantly larger corpora?
What other kinds of tuning to the user can we
do:
Recency
 Style


Does keystroke savings translate into
communication rate enhancement?
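For reference, keystroke savings is usually computed as the percentage of keystrokes avoided relative to typing every character. The sketch below also shows why savings need not raise communication rate: if reading the prediction list makes each selection slower, the gain can cancel out. The timing model and all numbers are hypothetical.

```python
def keystroke_savings(keys_without_prediction, keys_with_prediction):
    """Percentage of keystrokes saved relative to typing every character."""
    if keys_without_prediction <= 0:
        raise ValueError("keys_without_prediction must be positive")
    saved = keys_without_prediction - keys_with_prediction
    return 100.0 * saved / keys_without_prediction

def words_per_minute(words, keystrokes, seconds_per_keystroke):
    """Communication rate, assuming every keystroke costs the same time."""
    return words * 60.0 / (keystrokes * seconds_per_keystroke)

# Hypothetical message: 20 words, 100 keystrokes without prediction.
savings = keystroke_savings(100, 50)
rate_plain = words_per_minute(20, 100, 1.0)
# Assumed cost of prediction: each selection takes twice as long.
rate_predicted = words_per_minute(20, 50, 2.0)
```

With these made-up numbers the user saves half the keystrokes yet communicates at the same rate, which is the concern the question raises.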
Text Skimming
Debra Yarrington, Kathleen McCoy
Problem:

Blind and dyslexic individuals cannot skim text
Example: “What’s the syntax for calling a function with
template parameters?” (skimming through code)
 “Why was Ayers Rock renamed?”
 “What type of tree produces leaves with three distinct
shapes?”
 “Where can I find more information about Portugal?”


People who cannot read text rely on
screen readers (Jaws, Window-Eyes)
 braille output



more difficult to come by
extremely bulky to carry around
Example of Jaws Output at 400
wpm

“What psychological and philosophical significance should we attach to recent efforts at
computer simulations of human cognitive capacities? In answering this question, I find it useful
to distinguish what I will call "strong" AI from "weak" or "cautious" AI (Artificial Intelligence).
According to weak AI, the principal value of the computer in the study of the mind is that it
gives us a very powerful tool. For example, it enables us to formulate and test hypotheses in a
more rigorous and precise fashion. But according to strong AI, the computer is not merely a tool
in the study of the mind; rather, the appropriately programmed computer really is a mind, in the
sense that computers given the right programs can be literally said to understand and have other
cognitive states. In strong AI, because the programmed computer has cognitive states, the
programs are not mere tools that enable us to test psychological explanations; rather, the
programs are themselves the explanations.
I have no objection to the claims of weak AI, at least as far as this article is concerned. My
discussion here will be directed at the claims I have defined as those of strong AI, specifically the
claim that the appropriately programmed computer literally has cognitive states and that the
programs thereby explain human cognition. When I hereafter refer to AI, I have in mind the
strong version, as expressed by these two claims.
I will consider the work of Roger Schank and his colleagues at Yale (Schank & Abelson
1977), because I am more familiar with it than I am with any other similar claims, and because it
provides a very clear example of the sort of work I wish to examine. But nothing that follows
depends upon the details of Schank's programs. The same arguments would apply to Winograd's
SHRDLU (Winograd 1973), Weizenbaum's ELIZA (Weizenbaum 1965), and indeed any Turing
machine simulation of human mental phenomena.”
Proposed Solution:


A system that takes a question and a document
or a few documents, and returns a small set of
text links where potential answers to the
question might be found
In order to accomplish this, we will potentially
use:
Techniques used in existing Question Answering
systems
 Data collected from skimming text with an eye
tracking device

Example
Gaze Plot
Hot Spots
[Figure: eye-tracking hot spots overlaid on a passage about Romanesque and Gothic church art and architecture; only scattered fragments of the passage are legible.]
Current Directions
Have collected eye-tracking data from close to 100
people (on several documents each)
 Analysis quite interesting – enough data to find
patterns in where the skimmers are looking.
 Analyzing data with “text tiling methods” to pick out
places in the text where the “same thing” is being
discussed.
 Incorporate question extraction techniques
 How to present this to the user?
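To illustrate the text tiling idea, here is a simplified sketch that compares the vocabulary on either side of each sentence gap and marks a boundary where similarity drops below a threshold. Full TextTiling smooths these scores and uses depth scores rather than a fixed cutoff; the threshold, block size, function names, and sample sentences are all assumptions for this example.

```python
import math
import re
from collections import Counter

def bag(sentences):
    """Word counts over a block of sentences."""
    return Counter(w for s in sentences
                   for w in re.findall(r"[a-z]+", s.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def topic_boundaries(sentences, block=1, threshold=0.1):
    """Return each index g where the block of sentences before g and the
    block starting at g share too little vocabulary."""
    boundaries = []
    for gap in range(block, len(sentences) - block + 1):
        left = bag(sentences[gap - block:gap])
        right = bag(sentences[gap:gap + block])
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries

# Invented sentences, for illustration only.
sentences = [
    "Romanesque churches had round arches",
    "Round arches framed doors in Romanesque churches",
    "Gothic windows were tall",
    "Tall windows filled every Gothic nave",
]
boundaries = topic_boundaries(sentences)
```

The only boundary found falls before the third sentence, where the vocabulary shifts from arches and doors to windows.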

Modeling the Acquisition of
English in the ICICLE System
Kathleen F. McCoy
Department of Computer and Information Sciences
University of Delaware
People

Current People



Rashida Davis
Charlie Greenbacker
Others


Chris Pennington, Dan Blanchard, Mike Bloodgood, Greg
Silber, Meghan Boyle, Mohamed Mostagir, Stephanie Baker,
Heejong Yi, David Derman
Graduates: Matthew Huenerfauth, Jill Janofsky, Lisa
Masterman Michaud, Litza Stark, David Schneider
The ICICLE Project
Interactive Computer Identification and
Correction of Language Errors


Interactive writing tutor for native signers of
American Sign Language (ASL)
Purpose: analyze student-written English texts
and provide individualized feedback and
instruction on grammar
The ICICLE Project

Cycle of user input, system response
student provides piece of text
 system analyzes text for grammatical
errors



system provides student with tutorial
instruction on the errors
student has opportunity to make
corrections and request re-analysis
The
ICICLE
System
Current Implementation
the student
enters text here
the system shows
which sentences
have errors
explanations
shown here
Writing From Deaf Students



Literacy is a serious issue for the Deaf population.
Lots of variation in level of acquisition.
Marked Differences from writing of hearing peers.



Dropped be: She really pretty.
Missing Possessives: She age is 13.
Subject/verb agreement, plural markers, determiners: She
really like go with friend to mall.
Work on ICICLE
Previous work focused on developing grammar
and mal-rules and modeling the user’s level of
acquisition (so different analyses can be found
depending on it)
Current Work
 Tutorial Responses
 Probabilistic Parsing – need help!
 NEED SYSTEM HELP!!!!!

What Mal-Rules do We Use?
“She is teach piano on Tuesdays.”

Beginner: Over-application of auxiliary IS,
missing simple present morphology:
She teaches piano on Tuesdays.
Intermediate: Botched progressive tense:
She is teaching piano on Tuesdays.
Advanced: Botched passive voice:
She is taught piano on Tuesdays.
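The choice among these readings can be sketched as a lookup keyed on the user's modeled level of acquisition. The three diagnoses and corrections come from the slide; the `Interpretation` class, the distance-based ranking, and the function name are hypothetical and not a description of how ICICLE actually selects mal-rules.

```python
from dataclasses import dataclass

LEVELS = ["beginner", "intermediate", "advanced"]

@dataclass(frozen=True)
class Interpretation:
    level: str        # acquisition level at which this error is typical
    diagnosis: str    # what the mal-rule says went wrong
    correction: str   # the sentence the student probably intended

# Candidate readings of "She is teach piano on Tuesdays." from the slide.
CANDIDATES = [
    Interpretation(
        "beginner",
        "over-application of auxiliary IS, missing simple present morphology",
        "She teaches piano on Tuesdays.",
    ),
    Interpretation(
        "intermediate",
        "botched progressive tense",
        "She is teaching piano on Tuesdays.",
    ),
    Interpretation(
        "advanced",
        "botched passive voice",
        "She is taught piano on Tuesdays.",
    ),
]

def rank_interpretations(user_level, candidates=CANDIDATES):
    """Order readings by closeness to the user's level (assumed heuristic)."""
    target = LEVELS.index(user_level)
    return sorted(candidates,
                  key=lambda c: abs(LEVELS.index(c.level) - target))

best = rank_interpretations("intermediate")[0]
```

For an intermediate writer, the preferred reading is the botched progressive, so the tutorial feedback would target "She is teaching piano on Tuesdays."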
