Hongbo Jiang
Download
Report
Transcript Hongbo Jiang
EECS 600 Advanced Network Research, Spring 2005
Median and Beyond: New Aggregation
Techniques for Sensor Networks
Hongbo Jiang
April 11, 2005
Context
• Introduction
• Background & Related Work
• Q-digest
• Queries on q-digest
• Experiment
• Discussion
EECS 600 Advanced Network Research, Spring 2005
2
Introduction – wireless sensor networks
• Limitation
– Computation capability, communication bandwidth,
battery power......
• Unreliability
– Inherent unreliability of sensing function. That is,
individual sensor readings are inherently unreliable.
• To address these two aspects, let us look insight
some methods for communication in query
processing…
EECS 600 Advanced Network Research, Spring 2005
3
Typical Models – (I)
• Each node collects data and this data needs to
be delivered to the users through the network
interconnection.
– How to accomplish this? Let each sensor node deliver
its data periodically to host computer (base station),
where the data can be assembled for subsequent
analysis.
• Drawback
– Excessive communication.
EECS 600 Advanced Network Research, Spring 2005
4
Typical Models – (II)
• Exploit the multi-hop routing protocols in sensor
networks in such a way that messages from
multiple nodes are combined en-route from the
sensor nodes to base station.
– Routing tree with the base station as the root.
• Drawback: It suffers from the problem of larger
message sizes as information passes through
the routing tree from the leaf nodes to the base
station.
EECS 600 Advanced Network Research, Spring 2005
5
TinyDB & Cougar – In-networks Aggregation
• Observations:
– The individual sensor values do not hold much value.
– Extracting all the data out of a sensor network is very inefficient
in terms bandwidth and power usage.
• In-network aggregation
– Compute aggregation value such as AVG, SUM, COUNT and
MIN/MAX, over routing tree, minimizing both the number of
messages as well as the size of the message.
• Drawback: not suitable for some application
(sophisticated analysis by computing median, quantiles,
and consensus measures.).
EECS 600 Advanced Network Research, Spring 2005
6
AVG vs. MEDIAN
•
To compute AVG, every node sends two
integers to its parent:
1. One representing the sum of all data values of its
children
2. Total number of its children.
– That is, AVG can be computed by using constant
memory and by sending constant sized messages.
•
To compute MEDIAN, we need to keep track of
all distinct values and thus the message size
and memory required to store it grows linearly
with the size of the networks.
EECS 600 Advanced Network Research, Spring 2005
7
This paper focus on…
• Approximation schemes:
– 100% accurate is not necessary
– Approximation scheme in this paper based on qdigest can be adapted to meet any user specified
tolerance at the expense of higher memory and
bandwidth consumption.
• Q-digest:
– A novel data structure provides guarantees on
approximation error and maximum resource
consumption.
EECS 600 Advanced Network Research, Spring 2005
8
Context
• Introduction
• Background & Related Work
• Q-digest
• Queries on q-digest
• Experiment
• Discussion
EECS 600 Advanced Network Research, Spring 2005
9
Assumption
• Each sensor’s reading is assumed to be an
integer value in the range [1,σ] – σis the
maximum possible value of the signal.
• Query from BS; The sensors organize
themselves in a spanning tree. (Q: How can we
get this spanning tree?)
• Link quality is perfect – no packets loss
EECS 600 Advanced Network Research, Spring 2005
10
An aggregation such as MEDAIN is more
difficult than MIN, MAX, or AVERGAGE
• Under the natural assumption that each sensor
only forwards a fixed amount of data, it is easy
to argue that one cannot calculate the median
(or any other quantile) precisely.
• Example: A = B U C
– B = {1,2,3}; C={1000,1001,1002}
• Then only approximation is possible.
EECS 600 Advanced Network Research, Spring 2005
11
Context
• Introduction
• Background & Related Work
• Q-digest
• Queries on q-digest
• Experiment
• Discussion
EECS 600 Advanced Network Research, Spring 2005
12
Q-digest
• Interesting properties:
– Error-Memory Trade-off
– Confidence Factor
– Multiple Queries
EECS 600 Advanced Network Research, Spring 2005
13
Example of q-digest
• A q-digest consists of a set
of buckets of different sizes
and their associated counts.
– The depth of the tree T is
log(σ)
– Each node v can be
considered a bucket and has
a range (v.min, v.max)
– The size of the q-digest is
determined by a compression
parameter k. (Q: How to
determine the compression
parameter k?)
EECS 600 Advanced Network Research, Spring 2005
14
Q-digest property
• Property
– i) count(v) <= [n/k] (for a non-leaf node)
– ii) count(v) + count(vp) + count(vs) > [n/k] (for a non-root node)
•
Comments
– i) assert that unless it is a leaf node, no node should have a high
count. This property will be used later to prove error bounds on
q-digest
– ii) says that we should not have a node and its children with low
counts. The intuition behind this property is that if two adjacent
buckets which are siblings have low counts, then we do not want
to include tow separate counters for them
EECS 600 Advanced Network Research, Spring 2005
15
Building a q-digest
• An exact representation of the
data will consist of the
frequencies {f1,f2,…,fσ}
• To construct the q-digest we will
hierarchically merge and reduce
the number of buckets (bottomup)
• Comment
– Detailed information concerning
data values which occur
frequently (such as node with 4
and 6) are preserved in the
digest.
– While less frequently occurring
values (such as node d) are
lumped into larger buckets
resulting in information loss.
EECS 600 Advanced Network Research, Spring 2005
16
Merging q-digests
•
Take union of the two q-digest and add
the counts of buckets with the same
range ([min,max])
•
Then compress the result, to build a new
q-digest
EECS 600 Advanced Network Research, Spring 2005
17
Representation of a q-digest
• To represent a q-digest tree in a compact fashion
we number the nodes from 1 to 2σ-1 in a lever
by lever order
• To transmit the q-digest we send a set of tuple of
the following form <nodeid(v), count(v)> which
requires a total of (log(2σ)+log n) bits for each
tuple. Q: Why?
EECS 600 Advanced Network Research, Spring 2005
18
Representation of a q-digest
• Example
– {<1,1>,<6,2>,<7,2>,<10,4>,<
11,6>}
EECS 600 Advanced Network Research, Spring 2005
19
Context
• Introduction
• Background & Related Work
• Q-digest
• Queries on q-digest
• Experiment
• Discussion
EECS 600 Advanced Network Research, Spring 2005
20
Queries on q-digest
• Quantile Query: Given a fraction q belongs to
(0,1), find the value whose rank in sorted
sequent of the n value is nq.
– Sort the nodes of q-digest in increasing right
endpoints (max values)
– Post-order traversal of list nodes in q-digest
– Scan list L (from the beginning) and add the counts of
nodes as they are seen. For some node v, this sum
becomes more than qn, we report v.max as our
estimate of the quantile.
• Example: MEDIAN query on q-digest Q shown in
Fig 1.
EECS 600 Advanced Network Research, Spring 2005
21
Example
• Q: MEDIAN query on qdigest Q shown in Fig 1. {<1,1>,<6,2>,<7,2>,<10,4
>,<11,6>}
• A: (post-order)
– The sorted list is
{<10,4>,<11,6>,<6,2>,<7,2>
,<1,1>}
– The count at node <11,6>
will be more than 0.5n (8).
Then the answer is 4
EECS 600 Advanced Network Research, Spring 2005
22
Queries on q-digest
• Other queries
– Inverse quantile
– Range Query
– Consensus Query
• The method is similar…
EECS 600 Advanced Network Research, Spring 2005
23
Context
• Introduction
• Background & Related Work
• Q-digest
• Queries on q-digest
• Experiment
• Discussion
EECS 600 Advanced Network Research, Spring 2005
24
Experimental Evaluation
• Setup
– C++
– Network topology – routing
tree
– All pair of nodes within a
fixed radio range can be
considered as neighbors.
– 1000*1000 area and 1000
sensors (then 2000*2000
area and 2000 sensors)
– Sensor data value:
random and correlated
EECS 600 Advanced Network Research, Spring 2005
25
Range queries and Histogram
EECS 600 Advanced Network Research, Spring 2005
26
Accuracy and Message Size
EECS 600 Advanced Network Research, Spring 2005
27
Accuracy and Message Size
• What is the list?
• Q: Why the listcorrelated and listrandom looks
different.
EECS 600 Advanced Network Research, Spring 2005
28
Accuracy and Message Size
• Given a message size
m, we ask the question:
What fraction of total
nodes transmitted
messages of size large
then m?
EECS 600 Advanced Network Research, Spring 2005
29
Total Data Transmission
• Sine the number of
distinct values is
less for correlated
scenario, the
amount of data
transferred is lower
for correlated data.
EECS 600 Advanced Network Research, Spring 2005
30
Thanks
• Comment?
EECS 600 Advanced Network Research, Spring 2005
31