DIMENSIONS: Why do we need a new Data Handling

Transcript DIMENSIONS: Why do we need a new Data Handling

DIMENSIONS: Why do we need a
new Data Handling architecture for
sensor networks?
Deepak Ganesan, Deborah Estrin (UCLA),
John Heidemann (USC/ISI)
Presenter: Vijay Sundaram
1
Deployment: Microclimate monitoring
at James Reserve Park (UC Riverside)
How well does data fit model <M> of
variation of temperature with altitude.
Send robotic agent to
edge between low and
high precipitation
regions
Weather Sensor Network
Hmm…I wonder why packet-loss is so
high. Get a connectivity map of the
network for all transmit power settings
Get detailed data from node with
maximum precipitation from Sept
to Dec 2003
2
Goals

Flexible spatio-temporal querying



Distributed Long-term networked data storage


Provide ability to mine for interesting patterns and
features in data.
Drill-down on details
Preserve ability for long-term data mining, while
catering to node storage constraints
Performance


Reasonable Accuracy for wide range of queries
Low communication (energy) overhead
3
How can we achieve goals?

Exploit redundancy in data


Exploit rarity of interesting features


Preserve only interesting features.
Exploit scale of sensor network.


Potentially huge gains from lossy compression exploiting
spatio-temporal correlation
large distributed storage, although limited local storage.
Exploit low cost of approximate query processing

allow approximate query processing that obtain sufficiently
accurate responses.
4
Can existing systems satisfy
design goals?
Temporal Spatial
Geo-Spatial
Data Mining,
Streaming
Media (MPEG-2)
None
Exploited Data Correlation
Data Correlation Vs Decentralization
Centralized
Data
Collection
Web
Caches
Centralized
Hierarchical
Wireless
Sensor Networks
P2P: DHT
Gnutella
Fully Distributed
Degree of Decentralization
5
DIMENSIONS Design: Key Ideas


Construct hierarchy of
lossy compressed
summaries of data using
wavelet compression.
Queries “drill-down” from
root of hierarchy to focus
search on small portions
of the network.
Progressively age lossy
data along spatiotemporal hierarchy to
enable long-term storage
AGE
PROGRESSIVELY
PROGRESSIVELY LOSSY

Level 2
Level 1
Level 0
6
Roadmap




Why wavelets?
Example Precipitation Hierarchy
Spatial and Temporal Processing
internals
Initial Results: Precipitation Dataset
7
Enabling Technique: Wavelets

Very popular signal processing approach, that
provides good time and frequency
localization.



JPEG2000, Geo-Spatial Data Mining
preserves spatio-temporal features (edges,
discontinuities) while providing good
approximation of long-term trends in data
Efficient distributed implementation possible.
8
Sample Architecture: Precipitation
Hierarchy
What is the maximum precipitation
between Sept-Dec 2002?
Direct query to
quadrant that
best matches
query

Local Processing: Construct

Spatial Data Processing:
decreasing
spatial
resolution
lossy time-series summary
(zero communication cost)
Hierarchical Lossy
Compression

Wavelet
 Coeffs

decreasing
temporal
resolution
Organize network into
hierarchy. At each higher
level, reduce number of
participating nodes by a
factor of 4.
At each step of the hierarchy,
summarize data from 4
quadrants, and propagate
9
Spatial Decomposition


Recursively split network into nonoverlapping square grids.
At each level of the hierarchy,



Hierarchy
construction

Elect clusterhead
Cluster-head combines and summarizes
data from 4 quadrants
Cluster-head propagates compressed
data to the next level of the hierarchy.
Routing protocol: GPSR variant
(DCS - Ratnasamy et al,)
10
Wavelet
Subband
Decomposition
Input
Data
Thresholding
+
Quantization
+
Drop Subbands
Filter
y
Compressed
Output
Lossless
Encoder
Cost Metric
time
time
Wavelet Compression Internals
y
x
x
 Haar Filter
 Debauchies 9/7 filter
 Communication Budget
 Error bound
11
Initial Results with Precipitation
Dataset: Communication Overhead



15x12 grid (50km edge) of precipitation data from 19491994, from Pacific Northwest†. Gridded before processing.
Handpicked choice of threshold, quantization intervals,
subbands to drop. Huffman Encoder at output.
Very large compression ratio up the hierarchy
Level
Raw data
size (Kb) - R
Mean data sent
to n ext level (Kb) - M
Compression
Ratio = R/M
1
262.5
5.6
46.6
2
984.4
3.8
257.2
3
3937.7
7.4
987
4
11813.2
2.5
2286.2
†M. Widmann and C.Bretherton. 50 km resolution daily precipitation for the
Pacific Northwest, 1949-94.
12
Find maximum annual precipitation
for each year.

Fraction Error
1 - (DrillDown Answer/Real
Answer)
Drill Down Query: Error in Max Annual
Precipitation from 1949-1994
0.40
0.35
0.30
0.25

0.20
0.15
0.10

0.05
0.00
1950
1960
1970
1980
1990
2000
Exact Answer for
89% of queries.
Within 90% of
answer for >95% of
queries.
Queries require less
than 3% of network.
Good performance
on average with very
low lookup overhead
Year Queried
13
Locate boundary in annual precipitation
between Low and High Precipitation Areas

Fraction of Boundary
nodes missed
Drill Down Edge Query: Number of Nodes Missed
Per Year
1.2
1
0.8

0.6
0.4
within 13% error for
75% of the queries)
0.2
0
1940
1950
1960
1970
1980
Year Queried
1990
Error Metric: Number
of nodes greater
than 1 pixel distance
from drill-down
boundary
Accuracy: Within
25% error for 93%
of the queries (or
2000

Less than 5% of the
network queried.
14
Open Issues

Load Balancing and Robustness


Irregular Node Placement



Use wavelet extensions for irregular node
placement. Computationally more expensive
Gridify dataset with interpolation
Providing Query Guarantees


Hierarchical Model vs Peer Model: lot of work in
p2p systems…
Can we bound error in response obtained for a
drill-down query at a particular level of hierarchy?
Implementation on IPAQ/mote network
15
Summary

DIMENSIONS provides a holistic
data handling architecture for
sensor networks that can



Support a wide range of sensor-network usage
and query models (using drill-down querying of
wavelet summaries)
Provide a gracefully degrading lossy storage model
(by progressively ageing summaries)
Offer ability to tune energy expended for query
performance. (tunable lossy compression)
16
Different optimization metrics
Internetbased Peer-to
Peer Systems
Geo-Spatial
Data Mining
Web
Caches
Streaming
Media
(MPEG-2)
Energy

Latency

Bandwidth


Approximate
Results OK
Resource
Constraints





Spatio-temporal
Query Performance
Lookup Cost
Wireless
Sensor
Networks







17
Other Examples: Packet Loss
Different example of
dataset that exhibits
spatial correlation
6
0.
0. 4
0. 6
10
0. 6
6
6
0.
2
0.
5
0.
0.8
0.2
0. 6
2
0.
0.4
0.4
4
0.4
0.
0.4
0.6 0.
4
0.2
0. 2
0.6
0
0
0. 2
5
10
0.6
2
0.
0. 2
0.4
0.6
0.4
0.2
0.6
Typically, what we want to
query is the deviations
from normal and average
throughput.
0.6
0. 2

0. 2
0.4
0. 2
0. 4

Throughput from one
transmitter to proximate
receivers is correlated
Throughput from multiple
proximate transmitters to
one receiver is correlated.
0. 2

Contour Map
15
Distance (ft)

0.
4
15
Distance (ft)
18
Packet-Loss Dataset: Get
Throughput Vs Distance Map


Involves expensive
transfer of 12x14
map from each node.
Good approximate
results can be
obtained from
querying compressed
data.
19
Data is
progressively aged,
both locally, and
along the hierarchy.

Summaries that
cover larger areas
and longer timeperiods are retained
for much longer
than raw timeseries.
Wavelet
Coefficients

Slower Ageing
Long-term Storage: Concepts
20
Load Balancing and Robustness:
Concepts

Hierarchical Model



Naturally fits wavelet processing
Strict hierarchies are vulnerable to node failures.
Failures near root of hierarchy can be expensive to
repair
Decentralized Peer Model


Summaries communicated to multiple nodes
probabilistically.
Better robustness, but incurs greater
communication overhead.
21

DIMENSIONS: Why do we need a new Data Handling

Transcript DIMENSIONS: Why do we need a new Data Handling

Directory