HiFi Systems: Network-centric Query Processing for the Physical World


HiFi Systems: Network-Centric Query Processing for the Physical World
Michael Franklin
UC Berkeley
February 13, 2004
Introduction

• Continuing improvements in sensor devices:
  – Wireless motes
  – RFID
  – Cellular-based telemetry
• Cheap devices can monitor the environment at a high rate.
• Connectivity enables remote monitoring at many different scales.
• Widely different concerns arise at each of these levels and scales.
Plan of Attack
• Motivation / Applications / Examples
• Characteristics of HiFi Systems
• Foundational Components
  – TelegraphCQ
  – TinyDB
• Research Issues
• Conclusions
The Canonical HiFi System
RFID - Retail Scenario
• “Smart Shelves” continuously monitor item addition and removal.
• This information is sent back through the supply chain.
“Extranet” Information Flow
[Diagram: Retailers A and B and Manufacturers C and D exchange information through an Aggregation/Distribution Service.]
M2M - Telemetry/Remote Monitoring
• Energy monitoring - demand response
• Traffic
• Power generation
• Remote equipment
Time-Shift Trend Prediction
• National companies can exploit East Coast/West Coast time differentials to optimize West Coast operations.
Virtual Sensors
• Sensors don’t have to be physical sensors.
• Network monitoring algorithms for detecting viruses, spam, DoS attacks, etc.
• Disease outbreak detection
Properties
• High fan-in, globally-distributed architecture.
• Large data volumes generated at the edges.
  – Filtering and cleaning must be done there.
• Successive aggregation as you move inwards.
  – Summaries/anomalies continually, details later.
• Strong temporal focus.
• Strong spatial/geographic focus.
• Streaming data and stored data.
• Integration within and across enterprises.
One View of the Design Space
Time scale, from seconds to years:
• Seconds (on-the-fly processing): filtering, cleaning, alerts
• Recent history (combined stream/disk processing): monitoring, time-series, data mining
• Years (disk-based processing): archiving, provenance, and schema evolution
Another View of the Design Space
Geographic scope, from local to global:
• Local (several readers): filtering, cleaning, alerts
• Regional centers: monitoring, time-series, data mining (recent history)
• Global (central office): archiving, provenance, and schema evolution
One More View of the Design Space
Degree of detail vs. aggregate data volume:
• Filtering, cleaning, alerts: duplicate elimination, history of hours
• Monitoring, time-series: interesting events, history of days
• Data mining (recent history) and archiving (provenance and schema evolution): trends/archive, history of years
Building Blocks
TinyDB
TelegraphCQ
TelegraphCQ: Monitoring Data Streams
• Streaming data:
  – Network monitors
  – Sensor networks
  – News feeds
  – Stock tickers
• B2B and enterprise apps:
  – Supply-chain, CRM, RFID
  – Trade reconciliation, order processing, etc.
• (Quasi) real-time flow of events and data.
• Must manage these flows to drive business (and other) processes.
• Can mine flows to create/adjust business rules or to perform on-line analysis.
TelegraphCQ (Continuous Queries)
• An adaptive system for large-scale shared dataflow processing.
• Based on an extensible set of operators:
  1) Ingress (data access) operators: wrappers, file readers, sensor proxies
  2) Non-blocking data processing operators: selections (filters), XJoins, …
  3) Adaptive routing operators: Eddies, SteMs, FLuX, etc.
• Operators are connected through “Fjords”:
  – A queue-based framework unifying push and pull (a sketch follows below).
  – Fjords also allow us to easily mix and match streaming and stored data sources.
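To make the Fjord idea concrete, here is a minimal sketch (in Python, and not the TelegraphCQ implementation) of a queue connector that unifies push and pull: a streaming source deposits tuples as they arrive, while a downstream operator polls without blocking. All names are illustrative.

from collections import deque

class Fjord:
    """Toy queue linking a push-based producer to a pull-based consumer."""
    def __init__(self, maxlen=1024):
        self.queue = deque(maxlen=maxlen)        # bounded buffer between operators

    def push(self, tup):                         # push side: e.g., a sensor proxy
        self.queue.append(tup)

    def pull(self):                              # pull side: a downstream operator
        return self.queue.popleft() if self.queue else None   # never blocks

fjord = Fjord()
fjord.push({"sensor": 7, "temp": 21.5})          # source pushes when data arrives
while (tup := fjord.pull()) is not None:         # operator pulls when scheduled
    if tup["temp"] > 20:
        print(tup)

The real framework does far more (scheduling, many operators per Fjord, mixing stored and streaming sources), but the push-in/pull-out queue is the core idea.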
Extreme Adaptivity
The adaptivity spectrum, from fully static plans to per-tuple decisions:
• Static plans: current DBMSs
• Late binding: dynamic, parametric, and competitive query processing
• Inter-operator: query scrambling, mid-query re-optimization
• Intra-operator: XJoin, DPHJ, convergent query processing
• Per tuple: Eddies, CACQ, PSoup
• Traditional query optimization depends on statistical knowledge of the data and a stable environment; the streaming world has neither.
• The highly adaptive end of this spectrum is the region we are exploring in the Telegraph project.
Adaptivity Overview [Avnur & Hellerstein 2000]
[Diagram: a static dataflow runs tuples through operators A, B, C, D in a fixed order, while an eddy routes each tuple among A, B, C, D adaptively.]
• How to order and reorder operators over time?
  – Traditionally, using performance and economic/administrative feedback.
  – That won’t work for never-ending queries over volatile streams.
• Instead, use adaptive record routing: re-optimization becomes a change in routing policy (a sketch follows below).
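As a rough illustration of adaptive record routing (not the Telegraph eddy code), the sketch below sends each tuple to whichever remaining filter has recently been most selective, so a shift in the data shifts the routing policy. The filters and statistics are invented for the example.

import random

class ToyEddy:
    """Route each tuple through a set of filters, preferring the filter that
    has been dropping the most tuples lately (a crude selectivity estimate)."""

    def __init__(self, filters):
        self.filters = filters                        # name -> predicate
        self.seen = {name: 1 for name in filters}     # tuples routed to each filter
        self.dropped = {name: 1 for name in filters}  # tuples it rejected

    def route(self, tup):
        remaining = set(self.filters)
        while remaining:
            # Pick the remaining filter most likely to drop the tuple first.
            name = max(remaining, key=lambda n: self.dropped[n] / self.seen[n])
            self.seen[name] += 1
            if not self.filters[name](tup):
                self.dropped[name] += 1
                return None                           # tuple eliminated early
            remaining.discard(name)
        return tup                                    # tuple satisfied every filter

eddy = ToyEddy({
    "hot":  lambda t: t["temp"] > 30,
    "loud": lambda t: t["sound"] > 200,
})
stream = ({"temp": random.uniform(0, 50), "sound": random.uniform(0, 400)}
          for _ in range(1000))
matches = [t for t in (eddy.route(t) for t in stream) if t is not None]
print(len(matches), "tuples passed both filters")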
The TelegraphCQ Architecture
[Architecture diagram: the TelegraphCQ Front End (Listener, Parser, Planner, Mini-Executor, Proxy, Catalog) communicates with the TelegraphCQ Back End (Scans, CQEddy, and Split modules) through shared-memory queues (query plan queue, eddy control queue, query result queues); a TelegraphCQ Wrapper ClearingHouse hosts the wrappers, and all components share a shared-memory buffer pool backed by disk.]
• A single CQEddy can encode multiple queries.
The StreaQuel Query Language
SELECT projection_list
FROM from_list
WHERE selection_and_join_predicates
ORDEREDBY
TRANSFORM…TO
WINDOW…BY

• Target language for TelegraphCQ.
• Windows can be applied to individual streams.
• Window movement is expressed using a “for loop” construct in the TRANSFORM clause.
• We’re not completely happy with our syntax at this point.
Example Window Query: Landmark
[Diagram: query timelines at NOW = 40, 41, 45, and 50; the window start time ST stays anchored at a landmark while the window’s right edge advances with NOW, so the window grows over time.]
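A minimal sketch of landmark-window semantics over a timestamped stream (illustrative Python, not StreaQuel): the lower bound stays fixed at the landmark ST while the upper bound tracks NOW, so each evaluation sees everything that has arrived since the landmark.

def landmark_window(stream, st):
    """Yield (now, window_contents) as time advances.

    stream: iterable of (timestamp, value) pairs in timestamp order.
    st: the landmark; the window is [st, now] and only grows.
    """
    window = []
    for ts, value in stream:
        if ts >= st:
            window.append((ts, value))
        # Evaluate the query over everything seen since the landmark.
        yield ts, list(window)

readings = [(40, 17.0), (41, 18.5), (45, 16.2), (50, 19.9)]
for now, contents in landmark_window(readings, st=40):
    print(f"NOW={now}: window covers {len(contents)} readings since ST=40")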
Current Status - TelegraphCQ

• System developed by modifying PostgreSQL.
• Initial version released August 2003:
  – Open source (PostgreSQL license)
  – Shared joins with windows and aggregates
  – Archived/unarchived streams
  – Next major release planned for this summer.
• Initial users include:
  – Network monitoring project at LBL (NetLogger)
  – Intrusion detection project at Eurecom (France)
  – Our own project on sensor data processing
  – Class projects at Berkeley, CMU, and ???
• Visit http://telegraph.cs.berkeley.edu for more information.
TinyDB
• Query-based interface to sensor networks.
• Developed on TinyOS/Motes.
• Benefits:
  – Ease of programming and retasking
  – Extensible aggregation framework
  – Power-sensitive optimization and adaptivity
• Sam Madden (Ph.D. thesis) in collaboration with Wei Hong (Intel).
• http://telegraph.cs.berkeley.edu/tinydb

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

[Diagram: an application sends queries and triggers to TinyDB, which runs over the sensor network and returns data.]
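To show what a query like the one above asks of each mote, here is a simplified sketch (not TinyDB’s actual execution code): every sample period the node samples, applies the WHERE predicate locally, and reports only qualifying readings. sample_magnitude() and send_to_parent() are hypothetical stand-ins for the sensing and radio calls, and the MAX itself would be combined in-network as illustrated later.

import random
import time

SAMPLE_PERIOD_S = 0.064   # SAMPLE PERIOD 64ms
THRESH = 100              # the query's `thresh` parameter

def sample_magnitude():
    """Hypothetical sensor read; a real mote would sample its ADC here."""
    return random.randint(0, 200)

def send_to_parent(reading):
    """Hypothetical radio send toward the network root."""
    print("reporting", reading)

def run_query_epochs(num_epochs=10):
    # Each epoch: sample, filter locally (WHERE mag > thresh), report if it qualifies.
    for epoch in range(num_epochs):
        mag = sample_magnitude()
        if mag > THRESH:
            send_to_parent({"epoch": epoch, "mag": mag})
        time.sleep(SAMPLE_PERIOD_S)

run_query_epochs()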
Declarative Queries in Sensor Nets
• Many sensor network applications can be described using query language primitives.
  – Potential for tremendous reductions in development and debugging effort.
• Example: “Report the light intensities of the bright nests.”

SELECT nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensor readings (one row per nest per epoch; rows with light > 400 satisfy the query; x = value elided):

Epoch | nestNo | Light | Temp | Accel | Sound
  0   |   1    |  455  |  x   |   x   |   x
  0   |   2    |  389  |  x   |   x   |   x
  1   |   1    |  422  |  x   |   x   |   x
  1   |   2    |  405  |  x   |   x   |   x
Aggregation Query Example
• “Count the number of occupied nests in each loud region of the island.”

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Result (regions with AVG(sound) > 200):

Epoch | region | CNT(…) | AVG(…)
  0   | North  |   3    |  360
  0   | South  |   3    |  520
  1   | North  |   3    |  370
  1   | South  |   3    |  520
Query Language (TinySQL)
SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]
Sensor Queries @ 10000 Ft
[Diagram: a query over nodes {A,B,C,D,E,F} is disseminated down the routing tree, with each node forwarding it to its descendants.]
• (Almost) all queries are continuous and periodic.
• Written in SQL, with extensions for:
  – Sample rate
  – Offline delivery
  – Temporal aggregation
In-Network Processing: Aggregation

SELECT COUNT(*)
FROM sensors

[Animation across intervals 4 down to 1 of each epoch: every node in the routing tree collects its children’s partial counts, adds one for itself, and forwards the partial COUNT to its parent; at the end of the epoch the root reports the network-wide count (5, for the five sensors in the example). A sketch of this pattern follows below.]
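A compact sketch of the in-network aggregation pattern above (in the spirit of TAG, not the TinyDB implementation): each node merges its children’s partial state with its own contribution and transmits a single partial record to its parent, so COUNT sends one small record per node instead of every raw reading. The routing tree here is made up.

# Toy routing tree: node -> list of children (node 0 is the root/base station).
TREE = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}

def count_init(node):
    """Partial state for COUNT(*): every node contributes 1 for itself."""
    return 1

def count_merge(a, b):
    """Merge two partial COUNT states."""
    return a + b

def aggregate(node, init, merge):
    """Post-order pass up the tree: fold children's partials into this node's."""
    state = init(node)
    for child in TREE[node]:
        state = merge(state, aggregate(child, init, merge))
    return state  # this is what the node would transmit to its parent

print("COUNT(*) =", aggregate(0, count_init, count_merge))  # 6 nodes -> 6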
In-Network Aggregation: Example Benefits
• Simulation: 2500 nodes in a 50x50 grid, tree depth ~10, ~20 neighbors per node.
[Chart: total bytes transmitted vs. aggregation function (EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN); y-axis from 0 to 100,000 bytes.]
Taxonomy of Aggregates
• TinyDB insight: classify aggregates according to various functional properties.
  – This yields a general set of optimizations that can be applied automatically (a sketch follows below).

Property              | Examples                                    | Affects
Partial State         | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive  | Routing Redundancy
Exemplary vs. Summary | MAX: exemplary, COUNT: summary              | Applicability of Sampling, Effect of Loss
Monotonic             | COUNT: monotonic, AVG: non-monotonic        | Hypothesis Testing, Snooping
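To illustrate the “partial state” property (a sketch, not TinyDB’s aggregate API), each aggregate can be written as init/merge/evaluate over a partial state record: MAX carries one value, AVG a constant-size (sum, count) pair, and MEDIAN must carry every value, which is why it gains the least from in-network aggregation.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Aggregate:
    init: Callable[[float], Any]       # reading -> partial state
    merge: Callable[[Any, Any], Any]   # combine two partial states
    evaluate: Callable[[Any], float]   # partial state -> final answer

MAX = Aggregate(init=lambda v: v,
                merge=max,
                evaluate=lambda s: s)                          # 1-record state

AVG = Aggregate(init=lambda v: (v, 1),
                merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),
                evaluate=lambda s: s[0] / s[1])                # constant-size state

MEDIAN = Aggregate(init=lambda v: [v],
                   merge=lambda a, b: a + b,
                   evaluate=lambda s: sorted(s)[len(s) // 2])  # unbounded state

readings = [18.0, 21.5, 19.2, 30.1]
for name, agg in [("MAX", MAX), ("AVG", AVG), ("MEDIAN", MEDIAN)]:
    state = agg.init(readings[0])
    for r in readings[1:]:
        state = agg.merge(state, agg.init(r))
    print(name, "=", agg.evaluate(state))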
Current Status - TinyDB
• System built on top of TinyOS (~10K lines of embedded C code); latest release 9/2003.
• Several deployments, including redwoods at the UC Botanical Garden.
[Charts: relative humidity (%) and temperature (C) vs. time, 7/7/03 through 9/7/03, from sensors 101-111 mounted at heights between 10m and 36m in a redwood.]
• Visit http://telegraph.cs.berkeley.edu/tinydb for more information.
Putting It All Together?
TinyDB
TelegraphCQ
Ursa - A HiFi Implementation
• Current effort toward building an integrated infrastructure that spans the large scale in:
  – Time
  – Geography
  – Resources
• Ursa-Minor (TinyDB-based)
• Mid-tier (???)
• Ursa-Major (TelegraphCQ w/Archiving)
TelegraphCQ/TinyDB Integration
• Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream.
• The main issues revolve around what to run where:
  – TCQ is a query processor.
  – TinyDB is also a query processor.
  – Optimization criteria include total cost, response time, answer quality, answer likelihood, power conservation on motes, …
• Project is ongoing; it should work by summer.
• Related work: Gigascope work at AT&T.
TCQ-based Overlay Network
• TCQ is primarily a single-node system.
  – Flux operators [Shah et al 03] support cluster-based processing.
• We want to run TCQ at each internal node.
• The primary issue is support for wide-area temporal and geographic aggregation.
  – In an adaptive manner, of course.
• Currently under design.
• Related work: Astrolabe, IRISNet, DBIS, …
Querying the Past, Present, and Future
• Need to handle archived data:
  – Adaptive compression can reduce processing time.
  – Historical queries
  – Joins of live and historical data
  – Dealing with detail info that arrives later
• Archiving Storage Manager: a split-stream SM for stream and disk-based processing.
• Initial version of the new SM is running.
• Related work: temporal and time-travel DBs.
XML, Integration, and Other Realities
• Eventually need to support XML.
• Must integrate with existing enterprise apps.
• In many areas, standardization is well underway.
• Augmenting moving data.
• High fan-in → high fan-out.
• Related work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al., OGI], 30+ years of data integration research, 10+ years of XML research, …
Conclusions
• Sensors, RFIDs, and other data collection devices enable real-time enterprises.
• These will create high fan-in systems.
• We can exploit recent advances in streaming and sensor data management.
• Lots to do!