Information Capture and Re-Use

Download Report

Transcript Information Capture and Re-Use

Telegraph: An Adaptive GlobalScale Query Engine
Joe Hellerstein
Scenarios
• Ubiquitous computing: more than clients!
– sensors and their data feeds are key
• smart dust, biomedical (MEMS sensors)
• each consumer good records (mis)use
– disposable computing
• video from surveillance cameras, broadcasts, etc.
• Global Data Federation
– all the data is online – what are we waiting for?
– The plumbing is coming
• XML/HTTP, etc. give LCD communication
• but how do you query robustly over many sites in the wide
area?
There’s a Data Flood Coming
There’s a Data Flood Coming
• What does it look like?
– Never ends: interactivity required
– Big: data reduction/aggregation is key
– Unpredictable: this scale of devices and nets
will not behave nicely
The Telegraph Query Engine
• Key technologies
– Interactive Control
• interactivity with early answers
• online aggregation for data reduction
– Continuously adaptive flow optimization
• massively parallel, adaptive dataflow via
Rivers and Eddies
CONTROL
Continuous Output, Navigation & Transformation with Refinement On Line
• Data-intensive jobs are long-running. How to
give early answers and interactivity?
– online interactivity over feeds: data “juggle”
– online query processing algs: ripple joins
– statistical estimators, and their performance
implications
• Appreciate interplay of massive data
processing, stats, and UIs
CONTROL
Continuous Output and Navigation Technology with Refinement On Line
CONTROL
Continuous Output and Navigation Technology with Refinement On Line
River
• We built the world’s fastest sorting machine
– On the “NOW”: 100 Sun workstations + SAN
– But it only beat the record under ideal
conditions!
• River: performance adaptivity for data flows on
clusters
– simplifies management and programming
– perfect for sensor-based streams
Eddy
Eddy
• How to order and reorder operators over time
– based on performance, economic/admin feedback
• Vs.River:
– River optimizes each operator “horizontally”
– Eddies optimize a pipeline “vertically”
Telegraph: Putting it Together
• Scalable, adaptive dataflow infrastructure. Apps include…
– sensor nets
– massively parallel and wide-area query engines
– net appliances: chaining xform8n/aggreg8n/etc. proxies
– any unpredictable dataflow scenario
• Technology: a marriage of…
– CONTROL, River & Eddy
• Many research questions here
• E.g. how to combine River and Eddy adaptivity
• E.g. how to tune Eddies for statistical performance goals
– Combinations of browse/query/mine at UI
– Storage management to handle new hardware realities
Integration with Endeavour
• Give
– Be data-intensive backbone to diverse clients
– Be replication dataflow engine for OceanStore
– Telegraph Storage Manager provides storage
(xactional/otherwise) for OceanStore
– Provide platform for data-intensive “tacit info mining”
• Take
– Leverage OceanStore to manager distributed metadata,
security
– Leverage protocols out of TinyOS for sensors
Additional Slides
• For use in questions, etc.
Connectivity & Heterogeneity
• Lots of folks working on data format translation, parsing
– we will borrow, not build
– currently using JDBC & Cohera Net Query
• commercial tool, donated by Cohera Corp.
• gateways XML/HTML (via http) to ODBC/JDBC
– we may write “Teletalk” gateways from sensors
• Heterogeneity
– never a simple problem
– Control project developed interactive, online data
transformation tool: Potter’s Wheel
Potter’s Wheel Anomaly Detection