Information Capture and Re-Use

Joe Hellerstein
Scenario
• Ubiquitous computing is more than clients!
– sensors and their data feeds are key
– smart dust (MEMS sensors)
– biomedical monitoring devices (MEMS sensors)
– every item of value records its use/misuse (disposable computing)
– tacit information from human behavior
– video from surveillance cameras, broadcasts, etc.
There’s a Data Flood Coming
• What does it look like?
– Never ends: interactivity required
– Big: data reduction/aggregation is key
– Unpredictable: devices and networks at this scale will not behave nicely
• Key Technologies:
– CONTROL:
• early answers and interactivity
• online aggregation for data reduction
– River/Eddy:
• massively parallel, adaptive dataflow
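To make "early answers" and "online aggregation" concrete, here is a minimal sketch, not the CONTROL implementation. It assumes rows arrive in random order and uses a normal-approximation confidence interval; the function name `online_mean` and the parameters `z` and `report_every` are hypothetical.

```python
import math
import random

def online_mean(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while the stream is consumed.

    half_width is z * standard error, i.e. an approximate 95% interval
    for z=1.96, assuming rows arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)

random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(20000)]
estimates = list(online_mean(data))
for n, mean, half_width in estimates[:3]:
    print(f"after {n} rows: {mean:.2f} +/- {half_width:.2f}")
```

The user sees an estimate after the first thousand rows and can stop as soon as the interval is tight enough, rather than waiting for the full scan.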
CONTROL
Continuous Output and Navigation Technology with Refinement On Line
• Data-intensive jobs are long-running. How to give early answers and interactivity?
– Statistical estimators and their performance implications
– Online query processing algorithms: ripple joins
– Online interactivity over feeds: data “juggle”
• Appreciate the interplay of massive data processing, statistics, and UIs
• Challenges: apply to sequence data, scale up
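The ripple join mentioned above can be sketched as follows. This is a simplified "square" variant for a COUNT query, written under stated assumptions: both inputs have the same length and are already in random order, and the running estimate simply scales the matches found so far by the fraction of the cross product examined. The function name and the `key_r`/`key_s` parameters are hypothetical.

```python
import random

def ripple_join_count(r, s, key_r, key_s):
    """Yield (r_seen, s_seen, estimated_join_size) after each step.

    Each step reads one new tuple from each input and joins it with
    everything already seen from the other input.
    """
    seen_r, seen_s = [], []
    matches = 0
    for new_r, new_s in zip(r, s):
        # The new R tuple meets every S tuple seen so far.
        matches += sum(1 for t in seen_s if key_s(t) == key_r(new_r))
        seen_r.append(new_r)
        # The new S tuple meets every R tuple seen so far, including new_r.
        matches += sum(1 for t in seen_r if key_r(t) == key_s(new_s))
        seen_s.append(new_s)
        scale = (len(r) * len(s)) / (len(seen_r) * len(seen_s))
        yield len(seen_r), len(seen_s), matches * scale

random.seed(1)
r = [random.randrange(50) for _ in range(400)]
s = [random.randrange(50) for _ in range(400)]
identity = lambda value: value
steps = list(ripple_join_count(r, s, identity, identity))
exact = sum(1 for a in r for b in s if a == b)
print("estimate after 100 steps:", round(steps[99][2]))
print("exact join size:", exact)
```

Every pair of tuples is compared exactly once, so the final estimate equals the exact count; the earlier steps give progressively better approximations.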
River
• We built the world’s fastest sorting machine
– On the “NOW”: 100 Sun workstations + SAN
– But it only beat the record under ideal conditions!
• River: performance adaptivity for data flows on clusters
– simplifies management and programming
– perfect for sensor-based streams
• Challenges: deploy over a wide area
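A toy illustration of why performance adaptivity matters when conditions are not ideal. This is a single-process simulation, not River's API: it compares consumers pulling records from one shared queue at their own pace against a static equal split. All names and the per-record costs are hypothetical.

```python
import heapq

def simulate_shared_queue(num_records, seconds_per_record):
    """Consumers take the next record whenever they become free.

    Returns (records handled per consumer, time the last record finishes).
    """
    handled = {name: 0 for name in seconds_per_record}
    # Heap of (time the consumer becomes free, consumer name).
    free_at = [(0.0, name) for name in sorted(seconds_per_record)]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_records):
        now, name = heapq.heappop(free_at)
        done = now + seconds_per_record[name]
        handled[name] += 1
        finish = max(finish, done)
        heapq.heappush(free_at, (done, name))
    return handled, finish

def simulate_static_split(num_records, seconds_per_record):
    """Every consumer gets an equal share; the slowest one sets the finish time."""
    share = num_records / len(seconds_per_record)
    return max(share * cost for cost in seconds_per_record.values())

# One node is four times slower than the others (an assumed perturbation).
costs = {"fast_a": 1.0, "fast_b": 1.0, "slow": 4.0}
handled, adaptive_time = simulate_shared_queue(900, costs)
static_time = simulate_static_split(900, costs)
print("records per consumer:", handled)
print("shared queue:", adaptive_time, "static split:", static_time)
```

With the shared queue the slow node simply takes fewer records, so one perturbed machine no longer dictates the completion time of the whole job.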
Eddy
• How to order and reorder operators over time
• Key complement to River: adapt not only to the hardware, but to the processing rates
• Challenges: scale up, consider parallel scheduling
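The idea of reordering operators over time can be sketched with a small router over filter predicates. This is a greedy stand-in chosen for brevity, not necessarily the routing policy the Eddy work itself uses: each tuple is sent first to the filter with the lowest observed pass rate. The class name `Eddy`, its methods, and the smoothing constants are assumptions for illustration.

```python
class Eddy:
    """Route each tuple through filters in an order chosen per tuple."""

    def __init__(self, filters):
        # filters: dict mapping an operator name to a predicate function.
        self.filters = filters
        self.seen = {name: 0 for name in filters}
        self.passed = {name: 0 for name in filters}
        self.calls = 0

    def pass_rate(self, name):
        # Smoothed estimate so that unseen operators start at 0.5.
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def route(self, tup):
        pending = set(self.filters)
        while pending:
            # Try the filter most likely to reject the tuple first.
            name = min(pending, key=lambda n: (self.pass_rate(n), n))
            pending.remove(name)
            self.seen[name] += 1
            self.calls += 1
            if not self.filters[name](tup):
                return False
            self.passed[name] += 1
        return True

def fixed_order_calls(predicates, data):
    """Count predicate evaluations for a fixed operator order."""
    calls = 0
    for x in data:
        for pred in predicates:
            calls += 1
            if not pred(x):
                break
    return calls

filters = {
    "is_even": lambda x: x % 2 == 0,
    "is_multiple_of_10": lambda x: x % 10 == 0,
}
data = list(range(10000))
eddy = Eddy(filters)
kept = [x for x in data if eddy.route(x)]
bad_order = fixed_order_calls(
    [filters["is_even"], filters["is_multiple_of_10"]], data
)
print("eddy calls:", eddy.calls, "fixed bad order calls:", bad_order)
```

The router starts with no knowledge of either filter and settles on the more selective one after a handful of tuples, without a query optimizer fixing the order up front.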
Telegraph: Putting it Together
• Want to build a next-gen global DB system: capture and re-use embodied in a vertical solution.
• Marriage of:
– CONTROL, River & Eddy
– OceanStore + optionally transactional storage that handles new hardware realities, scale
– Federation in the wide area via Negotiation/Economics
– Combinations of browse/query/mine at UI
• no magic bullet there! CONTROL is key.
Integration with other options
• Integration
– Use Oceanic Data Utility for distribution, caching, protection of streams
– Use negotiation architectures to connect federated and stored streams
– Be data-intensive backbone to diverse clients
– Be a scalable platform for tacit knowledge extraction
• Cooperation
– Tacit information as a feed
– Capture/merge classroom feeds
– Use UI design tools for device-independent, interactive stream-based apps
Plan for Success
• One Year
– Implement River/Eddy over parallel cluster, deploy CONTROL modules
– Deploy data analysis apps over sequence data (MEMS/Web/Video)
• Three Year
– Integrate w/ wide area storage & processing
– Get data-intensive Endeavour apps running on architecture (e.g. tacit knowledge mining)
– Develop UI tools for interacting with never-ending streams