
Overview of research at HP Labs India
© 2006 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
HP Labs around the world
• 7 locations: Palo Alto, Beijing, Tokyo, Bristol, St. Petersburg, Haifa, Bangalore
• 600 researchers in 23 labs
• 20-30 large projects in 8 high-impact areas
High-Impact Research Areas
The next technology challenges and opportunities
Digital Commercial Print
Intelligent Infrastructure
Content Transformation
Sustainability
Immersive Interaction
Cloud
Analytics
Information Management
Digital Commercial Print
End state: Flexible, customized, on-demand printing that replaces the traditional distribution of mass-produced materials
HP Labs’ research contribution: Breakthrough technology to accelerate the transformation to digital commercial printing
• Printing process: Commercial-grade throughput, cost and quality
• Data path: Efficient processing of massive data streams
• Color: Self-calibration, intuitive rendering
• Job creation: Automated content generation
Content Transformation
End state: Complete convergence of physical and digital information
HP Labs’ research contribution: Technologies to transfer content seamlessly from paper to digital and to access digital content wherever paper is used today
• Displays/materials: Unbreakable, conformable, ultra-thin and lightweight; digital with the look and feel of paper
• Content management: Intuitive, personalized organization; intelligent content extraction; live, interactive documents
Immersive Interaction
End state: Intuitive human interaction through and with technology
HP Labs’ research contribution: Radically simplify the user experience to make technology more useful, intuitive and pervasive
• Intuitive interfaces: Natural, multi-modal, computer-human interactions
• Seamless collaboration: Immersive multimedia communication – anytime, anywhere – with no physical barriers
• Contextual services: Delivering “the right thing at the right time”; personal paradigms to simplify Web interaction
Information Management
End state: The vast universe of enterprise information transformed into immediate, business-relevant insight
HP Labs’ research contribution: Redefine the twin tasks of taming and exploiting information to revolutionize enterprise decision making
• Management: Superior analysis, extraction and delivery of massive enterprise content
• Intelligence: Capabilities to transform massive-scale, real-time data into transactional, operational business intelligence
Analytics
End state: Applying mathematical and scientific methodologies to create better-run businesses
HP Labs’ research contribution: Drive secure, informed, highly effective decision making
• Solutions: Predictive customer behavior; individual profile learning
• Software: Enhance automation and business processes
• Services: Analytics that address and transform operational efficiency and security
Cloud
End state: Everything-as-a-Service: billions of users, millions of services, thousands of service providers, millions of servers, exabytes of data, terabytes of traffic
HP Labs’ research contribution: Develop an integrated cloud stack, from infrastructure to services
• Infrastructure: Enterprise-grade security, capacity and management
• Services: Disrupt traditional industries and offer rich, dynamic experiences
Intelligent Infrastructure
End state: Capture more value via dramatic computing performance and cost improvements
HP Labs’ research contribution: Radical new approaches for collecting, storing and transmitting data to feed the exascale data center
• Nanotechnology: Memristors, sensors, photonic interconnect
• Intelligent storage: Cloud-scale, dynamic, enterprise-grade
• Data center: Cost- and power-efficient; manageable, reliable; easily programmable
• Networks: Programmable, scalable, energy-efficient
Sustainability
End state: An IT industry with a light carbon footprint that drives the reduction of carbon emissions throughout the global economy
HP Labs’ research contribution: Displace conventional supply chains with sustainable IT ecosystems
• Data centers: Integrated, end-to-end management of compute, power & cooling resources from cradle to cradle
• Tools & methodologies: Reengineer existing value chains using IT to lower environmental footprint
2008 HP Labs Innovation Research Awards
41 awards, 34 universities, 14 countries
Americas
• Stanford University
• University of California, Berkeley
• University of California, Davis
• University of California, San Diego
• University of California, Santa Barbara
• University of Southern California
• University of Toronto
• Carnegie Mellon University
• Massachusetts Institute of Technology
• State University of New York at Buffalo
• Rochester Institute of Technology
• University of Illinois at Urbana-Champaign
• University of Michigan
• University of Wisconsin-Madison
• Purdue University
• Georgia Institute of Technology
EMEA (Europe, Middle East & Africa)
• University of Edinburgh, Scotland
• University of Bath, England
• University of Leeds, England
• University of Bristol, England
• Konstanz University, Germany
• Technische Universitaet Muenchen, Germany
• Vrije Universiteit Amsterdam, Netherlands
• Universidade do Minho, Portugal
• Russian Academy of Sciences, Russia
• University of Saint-Petersburg, Russia
• Bilkent University, Turkey
• Technion, Israel Institute of Technology, Israel
APJ (Asia-Pacific & Japan)
• National Institute of Informatics, Japan
• Peking University, China
• Tsinghua University, China
• Nanyang Technological University, Singapore
• Indian Institute of Technology, Madras, India
• Indian Institute of Technology, Bombay, India
17 July 2015
Open cloud computing research test bed
• A loose federation of “Centers of Excellence” (CoEs) around the globe
− UIUC, Singapore IDA, KIT: 3 initial CoEs
− HP, Intel, Yahoo: 3 initial sponsors with CoEs
• Research objectives
− Multi-datacenter, multi-geography, multi-tenancy, secure, massive-scale, open test bed
• Each center: 1000-4000 cores and up to petabyte-scale storage
− Base service: PRS (physical resource set)
− Required services: open EC2-like, S3 and Hadoop-on-demand
− Plus additional local extensions/variants/service types
HP Labs India
Gesture-based keyboard (GKB)
PrintCast
Uplink side: Data from PC, AV signal, Inserter, Encoder, Modulator, Up converter, Solid state power amplifier, Uplink dish
Downlink side: Receiver dish & LNBC, Set top box, PrintCast decoder, Television, Printer
Paper & IT convergence
Secure AiO
HP Labs India
• Three ongoing projects
− Simplifying web consumption for the next billion (SWAN) – remainder of this talk
− Intuitive multimodal and gestural interaction (IMAGIN)
− Paper in the digital enterprise (PRIDE)
SWAN project - Motivation
Simplifying web consumption for all
• The Web is useful but complex to use for people who are not tech-savvy
• The Web has to be useful in the mobile context as well
Why is web consumption complex?
• Each website forces its own cognitive model on the user
− The website decides the interaction model; the user has to learn it and remember it
− Different websites of the same genre each impose their own model
• The Web requires very “low-level” instructions
− Information access is through querying and manual filtering
− Content adaptation, e.g. translation, requires considerable technical skill
• Mobile web consumption is challenging
− The user’s frame of mind is different (limited attention span, distracted)
− Devices are resource-constrained
• Broken web experience across different access methods
− No continuity of experience across broadband, mobile and disconnected connectivity
State of the art
Web simplification | Passive consumption | Mobile environments
• chumby
• Web widgets
• Browser scripting
• Pipes
• Personalized web content alerts
• Personalized web pages
• Mashups
The Gap: Need to simplify personal web interactions, especially for mobile environments
Technical Goals
• Let users set their own preferred interaction pattern
− Enable users to easily express their own web interaction patterns
− Provide a familiar interface to all personal actions on the web
• Higher-level intent while interacting with services
− Implicit web content consumption based on higher-level expression of user intent, user feedback and the user profile
− Understand and translate user intent into web actions
• Always-responsive interactions
− Provide continuous interaction across multiple devices and connectivity situations
− Provide ‘responsive behavior’ despite disconnections
Approach
Create simple interactions for long-term and exploratory information needs
Intent -> Query -> Goal
User profiles
Aggregation, ranking
Query expansion
Summarization
Google, YouTube, Digg/Delicious
End user value: Simplify the “Intent -> Query -> Goal” cycle
Using user profiles to personalize services
User -> Explicit and implicit info -> Data collection -> User profile constructor -> User profile -> Application
Personalized services (search, news, video, shopping)
Aren’t online portals already doing this?
• Online portals and search engines build user profiles using cookies and other stored data (search keywords, web pages accessed)
− However, they don’t see all of the user’s data
− Users have no way to aggregate and reuse the profiles that different websites (Google, Yahoo, ..) build from their data
− Privacy is a big problem
Implicit profile construction - prior approaches and their limitations
• Word-based approach
− Use words in user documents to represent user interests
− Problems
• Words appear independent of page content (“Home”, “page”)
• Polysemy and synonymy
• Large profile sizes
• DMOZ approach
− Use the existing ontology, which is maintained for free
− Problems
• Too large (about 6 lakh, i.e. 600,000, DMOZ nodes); the ontology has to be drastically pruned for use
• Classifiers have to be built for each DMOZ node
Our approach
• Use Wikipedia as the language of profile representation: map user documents to Wikipedia concepts
− Has lower bias than DMOZ and lower variance than words
• Build a hierarchical profile based on Wikipedia
• Tag the profile concepts as transactional or recreational
• Compute the recency of user interest in a particular topic
Mapping documents (web pages) to Wikipedia concepts
Item: “Sony to slash PlayStation3 price”
Term vector representation: <sony:1>, <slash:1>, <playstation3:1>, <price:1>
Item: “Jittery Sony Knocks $100 Off PS3 Price Tag”
Term vector representation: <jittery:1>, <sony:1>, <knocks:1>, <ps3:1>, <price:1>, <tag:1>
Query: “Sony to slash PlayStation3 price” -> Index of Wikipedia dump -> retrieved articles:
1. PlayStation Network Platform
2. PlayStation 2
3. Ducks demo
4. PlayStation 3
5. PlayStation
6. Ken Kutaragi
7. PlayStation Portable
8. Console manufacturer
9. Sony Group
10. Crystal Dynamics
11. PlayStation 3 accessories
12. …
13. …
Additional features: titles of the retrieved articles
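To make the contrast concrete, here is a minimal Python sketch of the two representations. It assumes a tiny hypothetical stopword list and a three-article toy dictionary (`WIKI_INDEX`) standing in for the index of a Wikipedia dump, and it scores articles by plain cosine similarity of term counts. The slides do not specify the real system's retrieval or scoring, so this is an illustration only; all names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list, just large enough for the two headlines.
STOPWORDS = {"to", "off", "the", "of", "and", "a"}


def term_vector(text):
    """Bag-of-words counts: lowercase, keep tokens starting with a letter (drops "$100")."""
    tokens = re.findall(r"[a-z][a-z0-9]*", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy stand-in for the index of a Wikipedia dump: title -> article text.
WIKI_INDEX = {
    "PlayStation 3": "playstation3 ps3 sony console price",
    "Sony Group": "sony electronics conglomerate japan",
    "Bank fraud": "bank fraud crime money",
}


def map_to_concepts(text, index, k=2):
    """Use the document as a query; return titles of the k best-matching articles."""
    query = term_vector(text)
    scored = [(cosine(query, term_vector(body)), title) for title, body in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored if score > 0][:k]


item1 = "Sony to slash PlayStation3 price"
item2 = "Jittery Sony Knocks $100 Off PS3 Price Tag"
print(cosine(term_vector(item1), term_vector(item2)))
print(map_to_concepts(item1, WIKI_INDEX))
print(map_to_concepts(item2, WIKI_INDEX))
```

With this toy index the two headlines share only the words "sony" and "price" (cosine about 0.41), yet both map to the same concepts, PlayStation 3 and Sony Group. That is the synonymy problem (PlayStation3 vs PS3) the concept representation is meant to absorb.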
Term Vector vs Wikipedia profiles
Words in TF * IDF based user profile: Search, Home, Help, News, Privacy, Google, Terms, New, Page, Use, Web, View, Results, Information, Account
Concepts in Wikipedia based user profile: Text Retrieval Conference, HTML element, Bank of America, Google search, ICICI Bank, IDBI Bank, Bank fraud, Artificial neural network, Web crawler, Web design, Debit card, Extensible Markup Language, Hewlett-Packard, Microsoft, XHTML, Demand account
Constructing the hierarchical profile
Algorithm of Xu et al. [WWW 2007]
Support = number of pages mapped to this concept
Photography (15)
− Wildlife photography (5)
− Nature photography (10)
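The figure's numbers (5 + 10 = 15) are consistent with a parent concept's support being the sum of its children's support. The Python sketch below shows only that aggregation step plus a minimum-support pruning filter; it is not the Xu et al. algorithm itself. The `parent` map (which in practice might come from Wikipedia's category structure), the threshold and the function names are assumptions, and the hierarchy is assumed to be a tree with no cycles.

```python
from collections import defaultdict


def aggregate_support(direct_support, parent):
    """Add each concept's page count to the concept itself and to all of its ancestors.

    direct_support: concept -> number of pages mapped directly to it
    parent: concept -> parent concept (roots are simply absent from the map)
    """
    support = defaultdict(int)
    for concept, count in direct_support.items():
        node = concept
        while node is not None:  # walk up to the root; assumes no cycles
            support[node] += count
            node = parent.get(node)
    return dict(support)


def prune(support, min_support):
    """Keep only concepts backed by at least min_support pages (hypothetical threshold)."""
    return {concept: s for concept, s in support.items() if s >= min_support}


direct = {"Wildlife photography": 5, "Nature photography": 10}
parent = {"Wildlife photography": "Photography", "Nature photography": "Photography"}
profile = aggregate_support(direct, parent)
print(profile)
print(prune(profile, min_support=10))
```

Pruning by support matters because, per the evaluation results, profile elements with high support are the ones with high precision.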
Tagging concepts in user profiles
• Two types of tags
− Whether the concept is of commercial or recreational interest
− Recency of interest
• Tagging commercial interest
− Crawl shopping site pages, map the pages to concepts and label these concepts as commercial interests
• Tagging recreational interest
− Use topics in Wikipedia recreational/hobby categories
• Recency of interest in a topic $c$:
$$R(c) = \sum_{p \in P(c)} \frac{1}{e^{\,t_{\text{today}} - t_p}} = \sum_{p \in P(c)} e^{-(t_{\text{today}} - t_p)}$$
where $P(c)$ is the set of pages supporting topic $c$ and $t_p$ is the time page $p$ was last accessed
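The recency formula translates directly into code. This Python sketch assumes time is measured in whole days (the slide does not give a unit) and that the caller already has, for each page supporting the topic, the date it was last accessed; the function name `recency` and the example dates are hypothetical.

```python
import math
from datetime import date


def recency(last_access_dates, today):
    """Sum of e^-(days since last access) over the pages supporting one topic."""
    return sum(math.exp(-(today - accessed).days) for accessed in last_access_dates)


today = date(2009, 1, 10)  # arbitrary example date
print(recency([today, date(2009, 1, 9)], today))  # 1 + e^-1
print(recency([date(2008, 12, 31)], today))  # e^-10, close to zero
```

A page read today contributes 1, while a page last seen ten days ago contributes about 4.5e-5, so with daily units the score is dominated by very recent activity; a slower decay would need a scaled exponent, which the slide does not specify.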
Wikipedia based profile
Evaluation results
• Profiles are stable (fig 1)
• Profile elements with high support have high precision (fig 2)
• Profile elements at all levels of the hierarchy have similar precision (fig 3)
• Bookmarks are not a good data source for profiles
[Figure 1: Stability (Stability_alpha, Stability_date) vs. number of web pages in cache]
[Figure 2: Precision and percentage in profile (%) for Support > 5, 3 < Support < 5 and Support < 3]
[Figure 3: Precision at hierarchy Levels 1 to 6]
Query expansion – Personalized video
• Approach
− Create three additional queries (based on terms with high TF in the title, tags and description)
− Evaluate which expansion is better
• Example: query on YouTube for “trains”, expanded using
− Title: train+osbourne+midnight+bullet+rollin+mystery+maglev
− Description: train+runaway+record+version+video+http+track
− Tags: train+railroad+guitar+osbourne+railway+bullet
• Cross-lingual expansions
− Baba Ramdev
− Baba+ramdev+yoga+swami+pranayam+liye+ram+disease+dev+india+dhyan
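The per-field expansion can be sketched in Python as follows. It assumes each search result carries `title`, `description` and `tags` text, picks the highest-frequency terms in one field across the results, and appends them to the query; stemming ("trains" to "train") and the evaluation of which expansion is better are left out. The stopword list, function names and result metadata are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list for this sketch.
STOPWORDS = {"the", "a", "of", "and", "on", "in", "to"}


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query, field_texts, n_terms=3):
    """Append the n most frequent non-query terms of one metadata field to the query."""
    query_terms = set(tokens(query))
    counts = Counter(
        tok
        for text in field_texts
        for tok in tokens(text)
        if tok not in STOPWORDS and tok not in query_terms
    )
    # Highest term frequency first; ties broken alphabetically for determinism.
    top = sorted(counts, key=lambda t: (-counts[t], t))[:n_terms]
    return "+".join(tokens(query) + top)


def expansions(query, results, n_terms=3):
    """One candidate query per field, so the three expansions can be compared."""
    return {
        field: expand_query(query, [r[field] for r in results], n_terms)
        for field in ("title", "description", "tags")
    }


# Made-up metadata for three video results returned for the query "train".
results = [
    {"title": "Crazy Train Osbourne", "description": "runaway train record version",
     "tags": "train railroad guitar osbourne"},
    {"title": "Osbourne Crazy Train live", "description": "runaway train video",
     "tags": "train railway bullet"},
    {"title": "Bullet train maglev", "description": "record run video",
     "tags": "railroad bullet train"},
]
print(expansions("train", results, n_terms=2))
```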
Query expansion - “Find similar”
Problem – Can we construct queries that make getting “similar content” easier?
Approach – Identify key phrases in a text document, query a standard search engine, rank the results
Query – Ed Lazowska’s talk
• Retrieving the original document: the query capture restart+capture random+random walk+page rank+capture random walk+restart retrieves Hopcroft’s talk at rank 1 in Google
Result – Hopcroft’s talk
Query expansion – “Find similar”
Key phrases: economic growth, global development, economic history, economic governance, adam smith, good governance, economic growth process, modern technology
Query
economic+growth+global+development+history+governance+adam+smith+process+rich+good+new+knowledge+cgd+brief+world+property+rights+productivity+labor+human+capital+getting+use+modern+technology+trade+barriers+public+goods+poor+countries+machine+natural+resources+research+intellectua
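A Python sketch of the key-phrase step, assuming that key phrases are the most frequent two-word sequences containing no stopword, counted within sentences (the slides do not say how phrases are chosen), and that the query is formed by joining them with "+", as in the examples on these slides. The stopword list and function names are hypothetical, and the final step of ranking the search engine's results is not shown.

```python
import re
from collections import Counter

# Hypothetical stopword list for this sketch.
STOPWORDS = {"the", "a", "an", "of", "and", "with", "to", "in", "is"}


def key_phrases(text, k=5):
    """Most frequent stopword-free word pairs, never spanning a sentence boundary."""
    counts = Counter()
    for sentence in re.split(r"[.!?]", text):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        for w1, w2 in zip(words, words[1:]):
            if w1 not in STOPWORDS and w2 not in STOPWORDS:
                counts[f"{w1} {w2}"] += 1
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(counts, key=lambda p: (-counts[p], p))[:k]


def similar_content_query(text, k=5):
    """Join the key phrases with '+' to form a search-engine query."""
    return "+".join(key_phrases(text, k))


doc = "Random walk with restart. Random walk and page rank. Page rank."
print(key_phrases(doc, k=2))
print(similar_content_query(doc, k=2))
```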
Aggregating search results
• Current search interfaces are geared to immediate gratification; there is no way to trade off search latency for more relevant results
• Different search engines have different coverage, but there is no way to benefit from this
• Navigating results requires clicking back and forth between search results
− Search result snippets are often misleading
Our solution
• To create an aggregated and personalized Information Retrieval (IR) system that
− compiles and consolidates the most relevant information on particular topics from the web
− automatically creates a PDF document on the topic
Ranking results
• Content-based ranking (CBR), based on TF, IDF, document boost and field boost
• Delicious vector cosine similarity (DVCS)
$$\text{Rank}(URL) = d \cdot \text{CBR} + (1 - d) \cdot \text{DVCS}$$
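A Python sketch of the ranking formula. The slide does not say what a URL's Delicious tag vector is compared against, so the sketch assumes a reference tag vector (for example, one built from the user's own tags). It also assumes the content-based score is already normalized to [0, 1], and it uses d = 0.5 as an arbitrary default. All names and inputs are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse tag-count vectors (dicts: tag -> count)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_url(cbr, url_tags, reference_tags, d=0.5):
    """Rank(URL) = d * CBR + (1 - d) * DVCS, with DVCS as a Delicious tag-vector cosine."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    return d * cbr + (1 - d) * cosine(url_tags, reference_tags)


# Hypothetical inputs: a content-based score scaled to [0, 1] and tag counts.
user_tags = {"python": 3, "tutorial": 1}
print(rank_url(0.8, {"python": 3, "tutorial": 1}, user_tags))  # about 0.9
print(rank_url(0.8, {"cooking": 2}, user_tags))  # 0.4: no tag overlap
```

Setting d closer to 1 trusts the text-matching score more; setting it closer to 0 trusts the social-tagging signal more.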
User Interface – results
User study
Document summarization using Wikipedia
Algorithm 1
• Document sentences mapped to Wikipedia concepts (query -> index of Wikipedia content -> ranked article titles, with the titles of the retrieved articles as additional features)
• Uses the in-degree of the concept-sentence bipartite graph for sentence selection
• Tested on DUC 2002 data from NIST
• Would have come in 3rd in the NIST challenge
Limitations
- Controlling the size of the summary
- General concepts (e.g. Sports) may win over specific concepts (e.g. Soccer)
Example concept-sentence graph (1 = sentence mapped to concept):
      C1  C2  C3  C4
S1     1   0   1   0
S2     0   1   1   0
S3     0   0   0   1
In degree of C3 = 2
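The slide does not say exactly how in-degrees are turned into a sentence ranking. The Python sketch below assumes one simple rule: score each sentence by the sum of the in-degrees of the concepts it maps to, and keep the top k in document order. The sentence-to-concept mapping is given directly rather than computed, the function names are hypothetical, and the example encodes the slide's S1 to S3 / C1 to C4 matrix read row by row.

```python
from collections import Counter


def concept_in_degree(sentence_concepts):
    """In-degree of each concept node = number of sentences mapped to it."""
    return Counter(c for concepts in sentence_concepts.values() for c in set(concepts))


def select_sentences(sentence_concepts, k):
    """Hypothetical rule: score a sentence by the summed in-degree of its concepts,
    keep the k best (ties go to the earlier sentence), return them in document order."""
    indeg = concept_in_degree(sentence_concepts)
    order = list(sentence_concepts)
    score = {s: sum(indeg[c] for c in set(cs)) for s, cs in sentence_concepts.items()}
    best = sorted(order, key=lambda s: (-score[s], order.index(s)))[:k]
    return sorted(best, key=order.index)


# Example graph, read row by row: S1 -> C1, C3; S2 -> C2, C3; S3 -> C4.
graph = {"S1": ["C1", "C3"], "S2": ["C2", "C3"], "S3": ["C4"]}
print(concept_in_degree(graph)["C3"])  # 2
print(select_sentences(graph, k=2))  # ['S1', 'S2']
```

This rule also exhibits the listed limitation: a broad concept that many sentences map to has a high in-degree and dominates the scores, so general concepts can win over specific ones.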
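Document summarization - Algorithm 2
Intuition: Important sentences in the document map to important concepts, and vice versa
Propagate sentence importance to concepts and concept importance to sentences over multiple iterations:
$$x_n^{t+1} = f(x_n^{t}, G)$$
Accumulate step:
$$y_m^{t+1} = \sum_{n \in N(m)} x_n^{t}$$
Broadcast step:
$$x_n^{t+1} = \sum_{m \in M(n)} y_m^{t}$$
where $x_n$ is the importance of sentence $n$, $y_m$ the importance of concept $m$, $N(m)$ the sentences mapped to concept $m$, $M(n)$ the concepts that sentence $n$ maps to, and $G$ the concept-sentence graph
Future work – Size of summary, multi-document summaries, Indian language summaries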
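A minimal Python sketch of the accumulate and broadcast steps, written as parallel updates (each new value uses only the previous iteration's values). Several choices are assumptions not stated on the slide: both score vectors start uniform, both are L1-normalized after every iteration so they stay bounded, and the iteration count is arbitrary. The example reuses the small S1 to S3 / C1 to C4 graph from Algorithm 1; function names are hypothetical.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (an assumption; keeps values bounded)."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()} if total else dict(scores)


def propagate(sentence_concepts, iterations=20):
    """Iterate accumulate/broadcast over a concept-sentence graph.

    sentence_concepts maps each sentence n to its concepts M(n) (no duplicates).
    Returns (x, y): x[n] is sentence importance, y[m] is concept importance.
    """
    concepts = sorted({m for ms in sentence_concepts.values() for m in ms})
    x = {n: 1.0 for n in sentence_concepts}  # uniform start (assumption)
    y = {m: 1.0 for m in concepts}
    for _ in range(iterations):
        # Accumulate: y_m(t+1) = sum of x_n(t) over sentences n in N(m).
        new_y = {m: 0.0 for m in concepts}
        for n, ms in sentence_concepts.items():
            for m in ms:
                new_y[m] += x[n]
        # Broadcast: x_n(t+1) = sum of y_m(t) over concepts m in M(n).
        new_x = {n: sum(y[m] for m in ms) for n, ms in sentence_concepts.items()}
        x, y = normalize(new_x), normalize(new_y)
    return x, y


graph = {"S1": ["C1", "C3"], "S2": ["C2", "C3"], "S3": ["C4"]}
x, y = propagate(graph)
print(sorted(x, key=x.get, reverse=True))  # sentences by importance
print(max(y, key=y.get))  # most important concept
```

The concept shared by two sentences (C3) ends up with the highest concept score, and S3, whose only concept is shared with no other sentence, fades toward zero. This matches the intuition that important sentences map to important concepts.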
Challenge 1
• Better intent expression
• Multi-lingual query reformulation
− Baba Ramdev
− Baba+ramdev+yoga+swami+pranayam+liye+ram+disease+dev+india+dhyan
• Interfaces to simplify feedback for query reformulation
Challenge 2
• Long-standing queries
• Queries spread over time
− Learning photography
− Information delivery needs to be incremental and non-repetitive
− Video retrieval
• Channels
• Create initial stickiness
• Ensure ongoing interest
− Caching – utility models
• What are good evaluation measures for such systems?
Challenge 3
• Document summarization
− Extracting leads
− Compression versus missed information
− Cross-lingual summarization