Transcript Slide 1

Far Reaching Research (FRR) Project
IBM Research
See, Hear, Do: Language and Robots
Jonathan Connell
Exploratory Computer Vision Group
Etienne Marcheret
Speech Algorithms & Engines Group
Sharath Pankanti (ECVG)
Josef Vopicka (Speech)
© 2002 IBM Corporation
Challenge = Multi-modal instructional dialogs
• Use speech, language, and vision to learn objects & actions
  - Innate perception abilities (objects / properties)
  - Innate action capabilities (navigation / grasping)
  - Easily acquire terms not knowable a priori
Example dialog:
  Command following:
    Round up my mug.
  Verb learning:
    I don’t know how to “round up” your mug.
    Walk around the house and look for it.
    When you find it bring it back to me.
  Noun learning:
    I don’t know what your “mug” looks like.
    It is like this <shows another mug> but sort of orange-ish.
  Advice taking:
    OK … I could not find your mug.
    Try looking on the table in the living room.
    OK … Here it is!
Language Learning & Understanding is an AAAI Grand Challenge
http://www.aaai.org/aitopics/pmwiki/pmwiki.php/AITopics/GrandChallenges#language
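The verb-learning turn in the example dialog can be pictured as composing a new verb out of innate actions the robot already has. Below is a minimal Python sketch under that assumption; the `VerbLexicon` class and the action names `search_house`, `grasp`, and `return_to_user` are hypothetical illustrations, not the FRR project's implementation, and speech and vision are left out entirely.

```python
# Hypothetical sketch of verb learning: a new verb is stored as a sequence of
# already-known actions. Names are illustrative, not from the FRR system.

INNATE_ACTIONS = {"search_house", "grasp", "return_to_user"}


class VerbLexicon:
    def __init__(self):
        self.learned = {}  # verb -> list of steps (innate actions or learned verbs)

    def knows(self, verb):
        return verb in INNATE_ACTIONS or verb in self.learned

    def teach(self, verb, steps):
        # Only new verbs may be defined, and only from known steps, so the
        # definitions can never form a cycle.
        if self.knows(verb):
            raise ValueError(f"{verb!r} is already known")
        unknown = [s for s in steps if not self.knows(s)]
        if unknown:
            raise ValueError(f"cannot define {verb!r}: unknown steps {unknown}")
        self.learned[verb] = list(steps)

    def expand(self, verb):
        # Flatten a verb into the innate actions the robot can execute.
        if verb in INNATE_ACTIONS:
            return [verb]
        if verb not in self.learned:
            raise KeyError(f"I don't know how to {verb!r}")
        plan = []
        for step in self.learned[verb]:
            plan.extend(self.expand(step))
        return plan


lexicon = VerbLexicon()
print(lexicon.knows("round up"))  # False -> robot asks the user to explain
# "Walk around the house and look for it. When you find it bring it back to me."
lexicon.teach("round up", ["search_house", "grasp", "return_to_user"])
print(lexicon.expand("round up"))  # ['search_house', 'grasp', 'return_to_user']
```

Refusing to redefine known verbs keeps the lexicon acyclic, so `expand` always terminates.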
© 2005 IBM Corporation
Eldercare as an application
• Example tasks:
  - Pick up dropped phone
  - Get blanket from another room
  - Bring me the book I was reading yesterday
• Large potential market
  - Many affluent societies have aging populations (Japan, EU, US)
  - Institutional care can be very expensive (to person, insurance, state)
• A little help can go a long way
  - Can be supplied immediately (no waiting list for admission)
  - Allows person to stay at home longer (generally easier & less expensive)
  - Boosts independence and feeling of control (psychological advantage)
• Note: We are not attempting to address the whole problem. Out of scope:
  X Aggressive production cost containment
  X Robust self-recharging and stair traversal
  X Bathing and bathroom care, patient transfer, cooking
  X OSHA, ADA, FDA, FCC, UL or CE certification
State of the art
• Indoor navigation
  - Minerva from CMU, Jose from Univ. of British Columbia
  - No object perception
  - No manipulation capability
• Perception & manipulation
  - Herb from CMU / Intel (Kanade), PR2 from Willow Garage
  - Off-line object model generation
  - No natural language interface
• Language learning
  - Ripley from MIT (Deb Roy), HAM from KTH in Sweden
  - Either fetch or carry
  - No procedural learning
• Dialog and speech
  - Honda system from IBM, call center handling from IBM
  - No physical presence or action
  - No visual perception of objects
Business Model
[Diagram: OEM → IBM buys hardware → IBM ($70B / year) adds software and services → Third Party customers]
Costs & revenue potential
• OEM sales price for hardware: $6000
  - Electromechanical parts: $1300
  - Onboard computer: $500
  - Assembly (15 hrs x $80 / hr): $1200
  - + 30% sales & distribution + 20% profit: $3000
• Value-added wholesale price (with software): $15,000
  - 10% continued R&D: $1500
  - 30% sales & distribution: $4500
  - 20% profit: $3000
  - Price = Less than a new car
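The price build-up above only adds up if each percentage is read as a share of the sale price rather than a markup on cost (50% of $6000 is $3000; 60% of $15,000 is $9000). A short Python check under that reading; `price_from_costs` is a hypothetical helper, and only the dollar figures come from the slide.

```python
# Sanity check of the slide's price build-up. Percentages are treated as
# shares of the sale price (margins), which is what makes the figures add up.

def price_from_costs(base_cost, overhead_pcts):
    """Sale price at which overhead_pcts (percent of price) sit on top of base_cost."""
    return base_cost * 100 / (100 - sum(overhead_pcts))


parts, computer, assembly = 1300, 500, 15 * 80   # assembly = 15 hrs x $80/hr
hardware_cost = parts + computer + assembly        # $3000

oem_price = price_from_costs(hardware_cost, [30, 20])     # sales & dist. + profit
wholesale = price_from_costs(oem_price, [10, 30, 20])     # R&D + sales & dist. + profit

print(oem_price)   # 6000.0
print(wholesale)   # 15000.0
print({pct: wholesale * pct / 100 for pct in (10, 30, 20)})
# {10: 1500.0, 30: 4500.0, 20: 3000.0}
```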
• Total cost of ownership: $8000 / yr
  - Lifetime = 3 years ($15,000 / 3): $5000 / yr
  - Service (15 hrs / quarter x $50 / hr x 4 quarters): $3000 / yr
• Effective wage (40 hrs / wk x 50 wks / yr = 2000 hrs / yr): $4 / hr
• Eldercare market in US (x3 if EU and AP also): 3 million clients, $24B / yr (resell robot + value-added software + field service)
  - Total US population: 300 million
  - Ages 75-85: 10%
  - Suitable (ability level, desire, finances): 10%
• Manufacturing business ($2000 / robot-yr): $6B / yr
• Services business ($3000 / robot-yr): $9B / yr
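The ownership-cost and market-size figures follow from a few multiplications, sketched below in Python with the slide's numbers. One assumption is ours: the $2000 / robot-yr manufacturing figure is read as the $6000 OEM hardware price spread over the 3-year lifetime, since that reproduces $6B / yr. Variable names are illustrative.

```python
# Reproduce the slide's ownership-cost and market-size arithmetic.
PURCHASE_PRICE = 15_000          # value-added wholesale price, $
OEM_HARDWARE_PRICE = 6_000       # OEM sales price for hardware, $
LIFETIME_YEARS = 3
SERVICE_PER_YEAR = 15 * 50 * 4   # 15 hrs/quarter x $50/hr x 4 quarters = $3000

tco_per_year = PURCHASE_PRICE / LIFETIME_YEARS + SERVICE_PER_YEAR
effective_wage = tco_per_year / (40 * 50)          # 2000 working hrs/yr

us_population = 300_000_000
clients = us_population * 10 // 100 * 10 // 100    # ages 75-85, then suitable

# Assumption: $2000 / robot-yr is the $6000 hardware spread over 3 years.
manufacturing_per_robot = OEM_HARDWARE_PRICE / LIFETIME_YEARS

print(tco_per_year, effective_wage)                # 8000.0 4.0
print(clients)                                     # 3000000
print(clients * tco_per_year / 1e9)                # 24.0  ($B / yr)
print(clients * manufacturing_per_robot / 1e9)     # 6.0   ($B / yr)
print(clients * SERVICE_PER_YEAR / 1e9)            # 9.0   ($B / yr)
```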
Sample business case
• Home eldercare now (employer costs): $25,000 / yr
  - 1 aide from 8am to 6pm = 10 hrs / day
  - 50 wks x 5 days / wk x 10 hrs / day = 2500 hrs / yr
  - Federal min. wage = $7.25 / hr, +38% overhead (FICA + 401K + medical) ≈ $10 / hr
• Aide’s activities:
  - Help with clothes, hygiene, meals
  - Odd tasks such as fetching objects
  - Sitting around watching TV
• Alternative: Half-time aide + robot: $20,500 / yr
  - Human still helps with clothes, hygiene, meals
  - Robot potentially available after hours and on weekends
  - Robot avoids the Training, Turnover, and Trust (stealing) problems
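The two annual costs can be checked in a few lines of Python. This sketch assumes the robot's share of the alternative is the $8000 / yr total cost of ownership from the cost slide, which is the reading that reproduces $20,500 (a half-time aide at 1250 hrs x $10 / hr is $12,500). Names are illustrative.

```python
# Employer cost of a full-time aide versus a half-time aide plus a robot.
MIN_WAGE = 7.25                          # federal minimum, $/hr
OVERHEAD = 0.38                          # FICA + 401K + medical
loaded_wage = MIN_WAGE * (1 + OVERHEAD)  # about 10.005; the slide rounds to $10
WAGE = 10                                # $/hr, as used on the slide

full_time_hours = 50 * 5 * 10            # wks/yr x days/wk x hrs/day
aide_now = full_time_hours * WAGE

ROBOT_TCO = 8_000                        # assumption: $/yr from the cost slide
half_time_hours = full_time_hours // 2   # 5 hrs/day instead of 10
alternative = half_time_hours * WAGE + ROBOT_TCO

print(full_time_hours, aide_now)         # 2500 25000
print(alternative)                       # 20500
```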
• Value proposition (to client):
  - 30% more hours @ 10% less cost
  - Split savings with customer ($50,000 → $45,000 per client)
  - Human 5 hrs + robot 8 hrs = 13 hrs / day during week
  - 10% less revenue but 22% more profit (= $6.6B / yr extra profit if 100% market share)
  - Bill at $20,000 - $3000 service = $17,000 / yr revenue → 10.6 months payback on $15,000 purchase
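The hour and price ratios and the payback period can be reproduced directly from the slide's numbers, as in the Python check below. The 22% profit figure rests on cost assumptions the slide does not spell out, so it is not recomputed here. Variable names are illustrative.

```python
# Value-proposition ratios and robot payback period, using the slide's figures.
hours_now = 10                           # full-time aide, hrs/day
hours_new = 5 + 8                        # half-time aide + robot, hrs/day
client_now, client_new = 50_000, 45_000  # annual price per client, $

more_hours_pct = round((hours_new / hours_now - 1) * 100)
less_cost_pct = round((1 - client_new / client_now) * 100)

ROBOT_BILL = 20_000                      # $/yr billed for the robot
SERVICE = 3_000                          # $/yr field service
PURCHASE = 15_000                        # $, value-added wholesale price
net_revenue = ROBOT_BILL - SERVICE
payback_months = PURCHASE / net_revenue * 12

print(more_hours_pct, less_cost_pct)            # 30 10
print(net_revenue, round(payback_months, 1))    # 17000 10.6
```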
What’s different and important
• Speech-driven interface
  - No headset required (far field), can learn new nouns and verbs
• Multi-modal dialog
  - Responds to gestures, exploits synergies between modalities
• Manipulation as well as mobility
  - Not just a walking telephone, can also do useful physical work
• One-shot learning
  - No turntable scanning, not hundreds of examples, no trial-and-error experiments
• Cost containment
  - Vision instead of special-purpose sensors and precise mechanicals