DIMENSIONS: Why do we need a new Data Handling
Download
Report
Transcript DIMENSIONS: Why do we need a new Data Handling
DIMENSIONS: Why do we need a
new Data Handling architecture for
sensor networks?
Deepak Ganesan, Deborah Estrin (UCLA),
John Heidemann (USC/ISI)
Presenter: Vijay Sundaram
1
Deployment: Microclimate monitoring
at James Reserve Park (UC Riverside)
How well does data fit model <M> of
variation of temperature with altitude.
Send robotic agent to
edge between low and
high precipitation
regions
Weather Sensor Network
Hmm…I wonder why packet-loss is so
high. Get a connectivity map of the
network for all transmit power settings
Get detailed data from node with
maximum precipitation from Sept
to Dec 2003
2
Goals
Flexible spatio-temporal querying
Distributed Long-term networked data storage
Provide ability to mine for interesting patterns and
features in data.
Drill-down on details
Preserve ability for long-term data mining, while
catering to node storage constraints
Performance
Reasonable Accuracy for wide range of queries
Low communication (energy) overhead
3
How can we achieve goals?
Exploit redundancy in data
Exploit rarity of interesting features
Preserve only interesting features.
Exploit scale of sensor network.
Potentially huge gains from lossy compression exploiting
spatio-temporal correlation
large distributed storage, although limited local storage.
Exploit low cost of approximate query processing
allow approximate query processing that obtain sufficiently
accurate responses.
4
Can existing systems satisfy
design goals?
Temporal Spatial
Geo-Spatial
Data Mining,
Streaming
Media (MPEG-2)
None
Exploited Data Correlation
Data Correlation Vs Decentralization
Centralized
Data
Collection
Web
Caches
Centralized
Hierarchical
Wireless
Sensor Networks
P2P: DHT
Gnutella
Fully Distributed
Degree of Decentralization
5
DIMENSIONS Design: Key Ideas
Construct hierarchy of
lossy compressed
summaries of data using
wavelet compression.
Queries “drill-down” from
root of hierarchy to focus
search on small portions
of the network.
Progressively age lossy
data along spatiotemporal hierarchy to
enable long-term storage
AGE
PROGRESSIVELY
PROGRESSIVELY LOSSY
Level 2
Level 1
Level 0
6
Roadmap
Why wavelets?
Example Precipitation Hierarchy
Spatial and Temporal Processing
internals
Initial Results: Precipitation Dataset
7
Enabling Technique: Wavelets
Very popular signal processing approach, that
provides good time and frequency
localization.
JPEG2000, Geo-Spatial Data Mining
preserves spatio-temporal features (edges,
discontinuities) while providing good
approximation of long-term trends in data
Efficient distributed implementation possible.
8
Sample Architecture: Precipitation
Hierarchy
What is the maximum precipitation
between Sept-Dec 2002?
Direct query to
quadrant that
best matches
query
Local Processing: Construct
Spatial Data Processing:
decreasing
spatial
resolution
lossy time-series summary
(zero communication cost)
Hierarchical Lossy
Compression
Wavelet
Coeffs
decreasing
temporal
resolution
Organize network into
hierarchy. At each higher
level, reduce number of
participating nodes by a
factor of 4.
At each step of the hierarchy,
summarize data from 4
quadrants, and propagate
9
Spatial Decomposition
Recursively split network into nonoverlapping square grids.
At each level of the hierarchy,
Hierarchy
construction
Elect clusterhead
Cluster-head combines and summarizes
data from 4 quadrants
Cluster-head propagates compressed
data to the next level of the hierarchy.
Routing protocol: GPSR variant
(DCS - Ratnasamy et al,)
10
Wavelet
Subband
Decomposition
Input
Data
Thresholding
+
Quantization
+
Drop Subbands
Filter
y
Compressed
Output
Lossless
Encoder
Cost Metric
time
time
Wavelet Compression Internals
y
x
x
Haar Filter
Debauchies 9/7 filter
Communication Budget
Error bound
11
Initial Results with Precipitation
Dataset: Communication Overhead
15x12 grid (50km edge) of precipitation data from 19491994, from Pacific Northwest†. Gridded before processing.
Handpicked choice of threshold, quantization intervals,
subbands to drop. Huffman Encoder at output.
Very large compression ratio up the hierarchy
Level
Raw data
size (Kb) - R
Mean data sent
to n ext level (Kb) - M
Compression
Ratio = R/M
1
262.5
5.6
46.6
2
984.4
3.8
257.2
3
3937.7
7.4
987
4
11813.2
2.5
2286.2
†M. Widmann and C.Bretherton. 50 km resolution daily precipitation for the
Pacific Northwest, 1949-94.
12
Find maximum annual precipitation
for each year.
Fraction Error
1 - (DrillDown Answer/Real
Answer)
Drill Down Query: Error in Max Annual
Precipitation from 1949-1994
0.40
0.35
0.30
0.25
0.20
0.15
0.10
0.05
0.00
1950
1960
1970
1980
1990
2000
Exact Answer for
89% of queries.
Within 90% of
answer for >95% of
queries.
Queries require less
than 3% of network.
Good performance
on average with very
low lookup overhead
Year Queried
13
Locate boundary in annual precipitation
between Low and High Precipitation Areas
Fraction of Boundary
nodes missed
Drill Down Edge Query: Number of Nodes Missed
Per Year
1.2
1
0.8
0.6
0.4
within 13% error for
75% of the queries)
0.2
0
1940
1950
1960
1970
1980
Year Queried
1990
Error Metric: Number
of nodes greater
than 1 pixel distance
from drill-down
boundary
Accuracy: Within
25% error for 93%
of the queries (or
2000
Less than 5% of the
network queried.
14
Open Issues
Load Balancing and Robustness
Irregular Node Placement
Use wavelet extensions for irregular node
placement. Computationally more expensive
Gridify dataset with interpolation
Providing Query Guarantees
Hierarchical Model vs Peer Model: lot of work in
p2p systems…
Can we bound error in response obtained for a
drill-down query at a particular level of hierarchy?
Implementation on IPAQ/mote network
15
Summary
DIMENSIONS provides a holistic
data handling architecture for
sensor networks that can
Support a wide range of sensor-network usage
and query models (using drill-down querying of
wavelet summaries)
Provide a gracefully degrading lossy storage model
(by progressively ageing summaries)
Offer ability to tune energy expended for query
performance. (tunable lossy compression)
16
Different optimization metrics
Internetbased Peer-to
Peer Systems
Geo-Spatial
Data Mining
Web
Caches
Streaming
Media
(MPEG-2)
Energy
Latency
Bandwidth
Approximate
Results OK
Resource
Constraints
Spatio-temporal
Query Performance
Lookup Cost
Wireless
Sensor
Networks
17
Other Examples: Packet Loss
Different example of
dataset that exhibits
spatial correlation
6
0.
0. 4
0. 6
10
0. 6
6
6
0.
2
0.
5
0.
0.8
0.2
0. 6
2
0.
0.4
0.4
4
0.4
0.
0.4
0.6 0.
4
0.2
0. 2
0.6
0
0
0. 2
5
10
0.6
2
0.
0. 2
0.4
0.6
0.4
0.2
0.6
Typically, what we want to
query is the deviations
from normal and average
throughput.
0.6
0. 2
0. 2
0.4
0. 2
0. 4
Throughput from one
transmitter to proximate
receivers is correlated
Throughput from multiple
proximate transmitters to
one receiver is correlated.
0. 2
Contour Map
15
Distance (ft)
0.
4
15
Distance (ft)
18
Packet-Loss Dataset: Get
Throughput Vs Distance Map
Involves expensive
transfer of 12x14
map from each node.
Good approximate
results can be
obtained from
querying compressed
data.
19
Data is
progressively aged,
both locally, and
along the hierarchy.
Summaries that
cover larger areas
and longer timeperiods are retained
for much longer
than raw timeseries.
Wavelet
Coefficients
Slower Ageing
Long-term Storage: Concepts
20
Load Balancing and Robustness:
Concepts
Hierarchical Model
Naturally fits wavelet processing
Strict hierarchies are vulnerable to node failures.
Failures near root of hierarchy can be expensive to
repair
Decentralized Peer Model
Summaries communicated to multiple nodes
probabilistically.
Better robustness, but incurs greater
communication overhead.
21