Far Reaching Research (FRR) Project
IBM Research
See, Hear, Do: Language and Robots
Jonathan Connell
Exploratory Computer Vision Group
Etienne Marcheret
Speech Algorithms & Engines Group
Sharath Pankanti (ECVG)
Josef Vopicka (Speech)
© 2002 IBM Corporation
Challenge = Multi-modal instructional dialogs
Use speech, language, and vision to learn objects & actions
Innate perception abilities (objects / properties)
Innate action capabilities (navigation / grasping)
Easily acquire terms not knowable a priori
Example dialog:
Command following
User: Round up my mug.
Verb learning
Robot: I don’t know how to “round up” your mug.
User: Walk around the house and look for it. When you find it, bring it back to me.
Noun learning
Robot: I don’t know what your “mug” looks like.
User: It is like this <shows another mug> but sort of orange-ish.
Advice taking
Robot: OK … I could not find your mug.
User: Try looking on the table in the living room.
Robot: OK … Here it is!
Language Learning & Understanding is a AAAI Grand Challenge
http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/GrandChallenges#language
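The verb-learning step in this dialog can be pictured as a small lexicon that defines a new verb in terms of actions the robot already has. The Python below is a hypothetical sketch, not the system described in these slides: the class, the primitive action names (`search`, `grab`, `deliver`), and the reply wording are all assumptions made for illustration.

```python
# Hypothetical sketch of dialog-driven verb learning. The primitive actions
# stand in for the robot's innate navigation / grasping abilities; all names
# here are illustrative assumptions, not part of the original system.

PRIMITIVES = {"search", "grab", "deliver"}


class VerbLexicon:
    def __init__(self):
        # Learned verb -> ordered list of known actions (primitive or learned).
        self.learned = {}

    def knows(self, verb):
        return verb in PRIMITIVES or verb in self.learned

    def teach(self, verb, steps):
        # Reject redefinitions, and accept a definition only if every step
        # is already understood; together these rule out cyclic definitions.
        if self.knows(verb):
            raise ValueError(f"{verb!r} is already known")
        unknown = [step for step in steps if not self.knows(step)]
        if unknown:
            raise ValueError(f"cannot define {verb!r}: unknown steps {unknown}")
        self.learned[verb] = list(steps)

    def expand(self, verb):
        # Recursively flatten a verb into primitive actions.
        if verb in PRIMITIVES:
            return [verb]
        plan = []
        for step in self.learned[verb]:
            plan.extend(self.expand(step))
        return plan


def handle_command(lexicon, verb, obj):
    # An unknown verb produces a request for instruction, as in the dialog.
    if not lexicon.knows(verb):
        return f"I don't know how to '{verb}' your {obj}."
    return [f"{action}({obj})" for action in lexicon.expand(verb)]


lexicon = VerbLexicon()
print(handle_command(lexicon, "round up", "mug"))
# User: "Walk around the house and look for it. When you find it, bring it back to me."
lexicon.teach("round up", ["search", "grab", "deliver"])
print(handle_command(lexicon, "round up", "mug"))
```

The first call prints the request for instruction; after teaching, the same command expands to `['search(mug)', 'grab(mug)', 'deliver(mug)']`. Restricting definitions to already-known actions keeps every learned verb expandable into primitives.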
© 2005 IBM Corporation
Eldercare as an application
Example tasks:
Pick up dropped phone
Get blanket from another room
Bring me the book I was reading yesterday
Large potential market
Many affluent societies have a demographic imbalance (Japan, EU, US)
Institutional care can be very expensive (to person, insurance, state)
A little help can go a long way
Can be supplied immediately (no waiting list for admission)
Allows person to stay at home longer (generally easier & less expensive)
Boosts independence and feeling of control (psychological advantage)
Note: We are not attempting to address the whole problem. Out of scope:
Aggressive production cost containment
Robust self-recharging and stair traversal
Bathing and bathroom care, patient transfer, cooking
OSHA, ADA, FDA, FCC, UL, or CE certification
State of the art
Indoor navigation
Minerva from CMU, Jose from Univ. British Columbia
No object perception
No manipulation capability
Perception & manipulation
Herb from CMU / Intel (Kanade), PR2 from Willow Garage
Off-line object model generation
No natural language interface
Language learning
Ripley from MIT (Deb Roy), HAM from KTH in Sweden
Either fetch or carry
No procedural learning
Dialog and speech
Honda system from IBM, call center handling from IBM
No physical presence or action
No visual perception of objects
Business Model
OEM → buy hardware → IBM ($70B / year) → add software and services → Third Party → customers
Costs & revenue potential
OEM sales price for hardware: $6000
Electromechanical parts: $1300
Onboard computer: $500
Assembly (15 hrs x $80 / hr): $1200
+ 30% sales & distribution + 20% profit: $3000
Value-added wholesale price (w/ software): $15,000
10% continued R&D: $1500
30% sales & distribution: $4500
20% profit: $3000
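The slide's figures add up only if each percentage is read as a share of the selling price rather than a markup on cost. A short Python check under that assumption (the helper names `price_from_margins` and `margin_amounts` are illustrative):

```python
# Checks the price build-up. Assumption: each percentage is a share of the
# selling price, which is the reading that reproduces the slide's numbers.

def price_from_margins(cost, *margin_pcts):
    """Selling price when the listed margins are percentages of that price."""
    return cost * 100 / (100 - sum(margin_pcts))


def margin_amounts(price, *margin_pcts):
    """Dollar value of each margin at the given selling price."""
    return [price * pct / 100 for pct in margin_pcts]


# OEM hardware: parts + onboard computer + assembly (15 hrs x $80 / hr)
hardware_cost = 1300 + 500 + 15 * 80                 # 3000
oem_price = price_from_margins(hardware_cost, 30, 20)  # S&D, profit
print(oem_price, margin_amounts(oem_price, 30, 20))    # 6000.0 [1800.0, 1200.0]

# Wholesale: OEM hardware price + 10% R&D + 30% S&D + 20% profit
wholesale = price_from_margins(oem_price, 10, 30, 20)
print(wholesale, margin_amounts(wholesale, 10, 30, 20))  # 15000.0 [1500.0, 4500.0, 3000.0]
```

A 50% markup on the $3000 hardware cost would give $4500 instead of $6000, which is why the share-of-price interpretation is used here.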
Price = less than a new car
Total cost of ownership: $8000 / yr
Purchase price over 3-year lifetime: $5000 / yr
Service (15 hrs / quarter x $50 / hr x 4 quarters): $3000 / yr
Effective wage (40 hrs / wk x 50 wks / yr = 2000 hrs / yr): $4 / hr
Eldercare market in US (x3 if EU and AP also): $24B / yr
Total US population: 300 million
Ages 75-85: 10%
Suitable (ability level, desire, finances): 10%
= 3 million
Revenue: resell robot + value-added software + field service
Manufacturing business ($2000 / robot yr): $6B / yr
Services business ($3000 / robot yr): $9B / yr
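The ownership-cost and market-size figures can be recomputed from the slide's inputs. In this Python sketch, reading "$2000 / robot yr" as the $6000 OEM hardware price spread over the 3-year lifetime is an assumption; the other inputs come straight from the slide.

```python
# Recomputes the ownership-cost and market-size arithmetic.
# Assumption: "$2000 / robot yr" = $6000 OEM hardware price / 3-year lifetime.

PRICE = 15_000          # value-added wholesale price ($)
OEM_HARDWARE = 6_000    # OEM sales price for hardware ($)
LIFETIME_YEARS = 3

amortized = PRICE / LIFETIME_YEARS      # $5000 / yr
service = 15 * 50 * 4                   # 15 hrs/quarter x $50/hr x 4 = $3000 / yr
tco = amortized + service               # $8000 / yr

hours_per_year = 40 * 50                # 2000 hrs / yr
effective_wage = tco / hours_per_year   # $4 / hr

population = 300_000_000
aged_75_85 = population * 10 // 100    # 10% of the population
clients = aged_75_85 * 10 // 100        # 10% of those are suitable: 3 million

market = clients * tco                                   # $24B / yr
manufacturing = clients * OEM_HARDWARE / LIFETIME_YEARS  # $6B / yr
services = clients * service                             # $9B / yr

for label, dollars in [("market", market),
                       ("manufacturing", manufacturing),
                       ("services", services)]:
    print(f"{label}: ${dollars / 1e9:.0f}B / yr")
```

This prints `market: $24B / yr`, `manufacturing: $6B / yr`, and `services: $9B / yr`, matching the slide.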
Sample business case
Home eldercare now (employer costs): $25,000 / yr
1 aide from 8 am to 6 pm = 10 hrs / day
50 wks x 5 days / wk x 10 hrs / day = 2500 hrs / yr
Federal min. wage = $7.25 / hr
+38% overhead (FICA + 401K + medical) = $10 / hr
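The current employer cost follows directly from the inputs above; a quick Python check (variable names are illustrative):

```python
# Recomputes the "home eldercare now" employer cost from the slide's inputs.
weeks, days_per_week, hours_per_day = 50, 5, 10
hours_per_year = weeks * days_per_week * hours_per_day  # 2500 hrs / yr

min_wage = 7.25                            # federal minimum wage ($ / hr)
overhead = 0.38                            # FICA + 401K + medical
loaded_rate = min_wage * (1 + overhead)    # about $10.005 / hr
rate = round(loaded_rate)                  # the slide rounds this to $10 / hr

annual_cost = hours_per_year * rate        # $25,000 / yr
print(hours_per_year, rate, annual_cost)   # 2500 10 25000
```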
Aide’s activities:
Help with clothes, hygiene, meals
Odd tasks such as fetching objects
Sitting around watching TV
Alternative: Half-time aide + robot: $20,500 / yr
Human still helps with clothes, hygiene, meals
Robot potentially available after hours and on weekends
No robot problems with training, turnover, or trust (stealing)
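The $20,500 figure matches an aide working 5 hrs / day at the same $10 / hr plus the robot's $8000 / yr total cost of ownership from the costs slide. That decomposition is an inference rather than something the slide states; a Python check under that assumption:

```python
# Assumed decomposition of the "half-time aide + robot" cost.
RATE = 10                               # loaded aide wage ($ / hr)
aide_hours = 50 * 5 * 5                 # 50 wks x 5 days x 5 hrs = 1250 hrs / yr
aide_cost = aide_hours * RATE           # $12,500 / yr
robot_tco = 15_000 // 3 + 15 * 50 * 4   # $5000 amortized + $3000 service
alternative = aide_cost + robot_tco
print(alternative)                      # 20500
```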
Value proposition (to client):
30% more hours @ 10% less cost
Human 5 hrs + robot 8 hrs = 13 hrs / day during week (vs. 10 hrs now)
Split savings with customer ($50,000 → $45,000 per client)
10% less revenue but 22% more profit (= $6.6B / yr extra profit if 100% market share)
Bill at $20,000 - $3000 service = $17,000 / yr revenue
10.6 months payback on $15,000 purchase
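The hours, price, and payback ratios above can be checked in a few lines of Python (variable names are illustrative; the inputs are the slide's):

```python
# Checks the hours, price, and payback ratios quoted on the slide.
hours_now, hours_new = 10, 5 + 8                  # aide alone vs. aide + robot per weekday
more_hours_pct = (hours_new - hours_now) * 100 / hours_now  # 30.0

price_now, price_new = 50_000, 45_000             # per client per year
less_cost_pct = (price_now - price_new) * 100 / price_now   # 10.0

revenue = 20_000 - 3_000                          # billed minus service = $17,000 / yr
payback_months = 15_000 / revenue * 12            # about 10.6 months

print(more_hours_pct, less_cost_pct, round(payback_months, 1))  # 30.0 10.0 10.6
```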
What’s different and important
Speech-driven interface
No headset required (far-field); can learn new nouns and verbs
Multi-modal dialog
Responds to gestures, exploits synergies between modalities
Manipulation as well as mobility
Not just a walking telephone; can also do useful physical work
One-shot learning
No turntable scanning, no hundreds of examples, no trial-and-error experiments
Cost containment
Vision instead of special-purpose sensors and precise mechanicals