Data Dissemination
Peyman Teymoori
1
Introduction
Data dissemination: the process by which
queries or data are routed in the network
Source: the node generating data
Event: the information to be reported
Sink: the node interested in an event
Two general steps:
Interest propagation: e.g., interest in temperature readings or intrusion events
Data propagation: routing, aggregation
2
Routing Models
Address Centric
Each source independently sends data to the sink
Data Centric
Routing nodes en route examine the content of the data
[Figure: AC vs. DC routing. Source 1 and Source 2 send data through intermediate nodes A and B to the sink; with AC routing each source's data travels independently, while with DC routing node B combines the data en route.]
3
Differences with Current Networks
Difficult to pay special attention to any individual
node:
Collecting information within the specified region
Collaboration between neighbors
Sensors may be inaccessible:
embedded in physical structures.
thrown into inhospitable terrain.
Sensor networks are deployed in a very large,
ad hoc manner
No static infrastructure
4
Differences with Current Networks
They suffer substantial changes as:
nodes fail (battery exhaustion, accidents)
new nodes are added
User and environmental demands also contribute
to dynamics:
Nodes move
Objects move
Data-centric and application-centric
Location aware
Time aware
5
Overall Design of Sensor Networks
One possible solution?
Internet technology coupled with ad-hoc routing
mechanism
Each node has one IP address
Each node can run applications and services
Nodes establish an ad-hoc network amongst
themselves when deployed
Application instances running on each node can
communicate with each other
6
Why Different and Difficult?
A sensor node is not identified by an address
Content based and data centric
Where are the nodes whose temperature will exceed
10 degrees for the next 10 minutes?
Tell me the location of the object (with interest
specification) every 100 ms for 2 minutes.
Multiple sensors collaborate to achieve one goal.
Intermediate nodes can perform data aggregation
and caching in addition to routing
Where, when, and how to aggregate?
7
Challenges
Energy-limited nodes
Computation
Aggregate data
Suppress redundant routing information
Communication
Bandwidth-limited
Energy-intensive
Scalability: ad-hoc deployment in large scale
Robustness: unexpected sensor node failures
Dynamic changes: no a-priori knowledge, mobility
Goal: Minimize energy dissipation
8
Aggregation
Many studies address not only the routing problem
but also how to represent and combine data more
efficiently
Data is processed while being forwarded toward the
sink
Reducing the number of transmissions
Definition:
Gathering & routing information in a multihop network
Processing data at intermediate nodes to
Reduce resource consumption
Increase network lifetime
9
Aggregation
Two approaches:
In-network with size reduction
More ability to reduce traffic
Less accuracy
Difficulty in reconstructing the original data
In-network without size reduction
Merging some smaller packets into one
Requires:
Networking protocol: routing
Effective aggregation functions
Data representation
10
Aggregation
A different routing paradigm is required
Data-centric routing: nodes route packets based on
packet content
Taking into account:
The most suitable aggregation points
Data type
Priority of information
Timing strategies:
Periodic simple aggregation
Periodic per-hop aggregation
Periodic per-hop adjusted aggregation
depends on the node’s position in the gathering tree
11
Aggregation
Aggregation functions:
Lossy & lossless
Duplicate sensitive vs. duplicate insensitive
(e.g., AVG vs. MAX)
Data representation:
Limited storage capabilities
Varies according to the application requirements
Distributed source coding: a method to deal
with data representation and compression
12
Network Protocols for In-Network
Aggregation
How to forward packets in order to facilitate
in-network aggregation
Several approaches:
Tree-based (hierarchical): shortest-path trees (SPTs)
rooted at the sink, or nodes grouped into clusters
Multi-path routing: over a DAG; more robust
Hybrid approaches
13
Flooding
Just rebroadcast whatever you receive that is not
yours (subject to a maximum hop count)
Disadvantages:
Implosion: A broadcasts (a) to both B and C, and each forwards it on to D, which receives duplicate copies
Data overlap: A covers (q, r) and B covers (r, s), so C receives the overlapping item r twice
Resource blindness: nodes do not adapt their activity to their available energy
14
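A minimal sketch of flooding with duplicate suppression and a hop-count limit; the network is modeled as a hypothetical neighbors map, and message IDs are assumed unique:

def flood(node, msg_id, payload, hops_left, neighbors, seen):
    # Drop duplicates (the cause of implosion) and exhausted messages
    if msg_id in seen[node] or hops_left == 0:
        return
    seen[node].add(msg_id)
    # Rebroadcast to every neighbor, blind to need or remaining energy
    for nbr in neighbors[node]:
        flood(nbr, msg_id, payload, hops_left - 1, neighbors, seen)

# Example: neighbors = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C']}
# seen = {n: set() for n in neighbors}; flood('A', 1, 'event(a)', 3, neighbors, seen)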
Gossiping
A modified version of flooding
Random selection of neighbors for broadcast
Avoids implosion
Disadvantages:
Taking a long time to propagate
No delivery guarantee
15
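Gossiping changes a single line of the flooding sketch above: forward to one randomly chosen neighbor instead of all of them, which avoids implosion at the price of slow, unguaranteed delivery.

import random

def gossip(node, msg_id, payload, hops_left, neighbors, seen):
    if msg_id in seen[node] or hops_left == 0:
        return
    seen[node].add(msg_id)
    # One random neighbor instead of a broadcast to all of them
    gossip(random.choice(neighbors[node]), msg_id, payload,
           hops_left - 1, neighbors, seen)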
Rumor Routing
An agent-based path creation algorithm
Agents or “ants”:
long-lived entities created at random by nodes
packets circulated to establish shortest paths
to events they encounter
perform path optimization
Motivation
Sometimes a non-optimal route is satisfactory
16
Rumor Routing
Creating Paths:
Nodes that observe an event send out agents,
which leave routing state toward the event
in the nodes they traverse
Agents attempt to travel in
a straight line
If an agent crosses a path
to another event, it begins
to build the path to both
Agents also optimize paths
if they find shorter ones.
17
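A sketch of the agent's walk, assuming each node keeps a table mapping events to a (next hop, distance) pair; the straight-line behavior is approximated by never stepping back to the previous node. Names and structures are illustrative, not from the original algorithm:

import random

def agent_walk(event_node, event, neighbors, tables, ttl):
    # The agent starts at the observing node carrying (event, hops-to-event)
    node, prev, dist = event_node, None, 0
    for _ in range(ttl):
        entry = tables[node].get(event)
        # Leave routing state; optimize if this agent knows a shorter path
        if entry is None or dist < entry[1]:
            tables[node][event] = (prev, dist)  # next hop toward the event
        choices = [n for n in neighbors[node] if n != prev] or neighbors[node]
        prev, node, dist = node, random.choice(choices), dist + 1

# tables = {n: {} for n in neighbors}; agent_walk('D', 'fire', neighbors, tables, ttl=20)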
Rumor Routing
Basis for algorithm:
Observation: Two lines in
a bounded rectangle
have a 69% chance of
intersecting
Create a set of straight
line gradients from event,
then send query along a
random straight line from
source.
[Figure: straight-line gradients radiate from the event; a random query line from the source intersects one of them.]
18
Rumor Routing
[Figure: (a) and (b): rumor routing path creation and query delivery.]
19
Rumor Routing
Advantages:
Tunable best effort delivery
Tunable for a range of query/event ratios
Disadvantages:
Optimal parameters depend heavily on
topology (but can be adaptively tuned)
Does not guarantee delivery
20
Sensor Protocols for Information via
Negotiation (SPIN)
A data-centric routing approach
Uses negotiation & resource adaptation to
address deficiencies of flooding
Two basic ideas:
Exchanging sensor data may be
expensive, but exchanging data about
sensor data may not be.
Nodes need to monitor and adapt to
changes in their own energy resources
21
Sensor Protocols for Information via
Negotiation (SPIN)
Data negotiation
Meta-data (data naming)
Application-level control
Model “ideal” data paths
SPIN messages
ADV- advertise data
REQ- request specific data
DATA- requested data
Resource management
[Figure: the three-way handshake: node A sends ADV to B; B replies with REQ; A delivers DATA.]
22
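A sketch of the ADV/REQ/DATA exchange, with meta-data reduced to a name string and resource adaptation omitted; the class and method names are assumptions for illustration:

class SpinNode:
    def __init__(self):
        self.store = {}                    # meta-data name -> data

    def advertise(self, meta, nbrs):       # ADV: offer data by name only
        for nbr in nbrs:
            nbr.on_adv(self, meta)

    def on_adv(self, sender, meta):        # REQ: ask only for missing data
        if meta not in self.store:
            sender.on_req(self, meta)

    def on_req(self, requester, meta):     # DATA: deliver the actual reading
        requester.on_data(meta, self.store[meta])

    def on_data(self, meta, data):
        self.store[meta] = data            # cache; may re-advertise onward

# a, b = SpinNode(), SpinNode(); a.store['temp@t0'] = 21.5
# a.advertise('temp@t0', [b])   # b ends up holding the data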
Sensor Protocols for Information via
Negotiation (SPIN)
[Figure: the SPIN handshake repeating hop by hop as data propagates from node A through node B to the rest of the network.]
23
Cost-Field Approach
Sets up minimum cost paths to a sink
A two-phase process:
Set up the cost field (metrics such as delay)
Data dissemination using the cost
At each node, cost = min cost to the sink
No explicit path information
24
Cost-Field Approach
Setting up the cost field:
Sink broadcasts: ADV + its cost as 0
A node N hears an ADV from M:
Its path cost: L_N = min(L_N, L_M + C_NM)
L_N: the total cost from node N to the sink
L_M: the total cost from node M to the sink
C_NM: the cost of the link from N to M
If L_N was updated, the new cost is broadcast (a new
ADV)
Flooding-based implementation of Dijkstra’s algorithm
A back-off-based approach: time to defer = γ · C_MN
(deferring ADVs lets cheaper paths announce first, reducing redundant broadcasts)
γ is a parameter of the algorithm
25
Cost-Field Approach
[Figure: an example of setting up the cost field, with γ = 10.]
26
Cost-Field Approach
Data dissemination:
A source sends a message with:
cost = C_S
cost-so-far = 0
At an intermediate node M (with cost C_M):
If cost-so-far + C_M = C_S, the node lies on a
minimum-cost path: forward the packet
27
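A sketch of both phases under static link costs. The setup below is the centralized equivalent (Dijkstra) of the flooding-based protocol; in the distributed version each node instead defers its ADV by γ · C_MN so that cheaper paths announce first:

import heapq

def setup_cost_field(links, sink):
    """links[(n, m)] = C_NM, the cost of link n -> m; returns L_N per node."""
    L = {sink: 0}
    heap = [(0, sink)]
    while heap:
        cost_m, m = heapq.heappop(heap)
        if cost_m > L[m]:
            continue                          # a stale (superseded) ADV
        for (n, m2), c_nm in links.items():
            if m2 == m and cost_m + c_nm < L.get(n, float('inf')):
                L[n] = cost_m + c_nm          # L_N = min(L_N, L_M + C_NM)
                heapq.heappush(heap, (L[n], n))
    return L

def should_forward(cost_so_far, c_m, c_s):
    # An intermediate node forwards only if it lies on a minimum-cost path
    return cost_so_far + c_m == c_s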
Directed Diffusion
A reactive data-centric protocol
Suitable for monitoring
Expressed in terms of named data
Organized in three phases:
Interest dissemination
Gradient setup
Path reinforcement & forwarding
28
Directed Diffusion
Naming
A list of attribute-value pairs
Animal tracking:
Request
Interest ( Task ) Description
Type = four-legged animal
Interval = 20 ms
Duration = 1 minute
Location = [-100, -100; 200, 400]
Reply
Node data
Type = four-legged animal
Instance = elephant
Location = [125, 220]
Confidence = 0.85
Time = 02:10:35
29
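The attribute-value naming maps naturally onto dictionaries; a sketch with a deliberately simplified matching rule (same type, and a location inside the interest's rectangle):

interest = {"type": "four-legged animal", "interval_ms": 20,
            "duration_s": 60, "rect": (-100, -100, 200, 400)}

reply = {"type": "four-legged animal", "instance": "elephant",
         "location": (125, 220), "confidence": 0.85, "time": "02:10:35"}

def matches(interest, data):
    # Simplified: type equality plus containment in the interest rectangle
    x1, y1, x2, y2 = interest["rect"]
    x, y = data["location"]
    return data["type"] == interest["type"] and x1 <= x <= x2 and y1 <= y <= y2

# matches(interest, reply) -> True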
Directed Diffusion
Interest
The sink periodically broadcasts interest
messages
Every node maintains an interest cache
Each item corresponds to a distinct interest
No information about the sink
Interest aggregation: identical type, completely
overlapping rectangle attributes
Each entry in the cache has several fields
Timestamp: last received matching interest
Several gradients: data rate, duration, direction
30
Directed Diffusion
Setting Up Gradient
[Figure: interests flow from the sink toward the source, setting up gradients hop by hop.]
Neighbor's choices:
1. Flooding
2. Geographic routing
3. Cache data to direct interests
Interest = Interrogation
Gradient = Who is interested
(data rate , duration, direction)
31
Directed Diffusion
Data propagation
Sensor node computes the highest
requested event rate among all its outgoing
gradients
When a node receives data:
Find a matching interest entry in its cache
Examine the gradient list and send the data out at
each gradient's rate
The cache keeps track of recently seen data items
(loop prevention)
Data message is sent individually to the
relevant neighbors (unicast)
32
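A sketch of the per-node forwarding step: match incoming data against the interest cache, suppress recently seen items, and unicast along each gradient. All field names are illustrative:

class DiffusionNode:
    def __init__(self):
        self.cache = []        # interest entries: {"type": ..., "gradients": [...]}
        self.seen = set()      # recently seen data items, for loop prevention
        self.rx_queue = []     # stands in for the radio receive path

def on_data(node, data):
    entry = next((e for e in node.cache if e["type"] == data["type"]), None)
    key = (data["type"], data["time"])
    if entry is None or key in node.seen:
        return                               # no matching interest, or a loop
    node.seen.add(key)
    for grad in entry["gradients"]:          # each gradient names a neighbor
        grad["neighbor"].rx_queue.append(data)   # sent individually (unicast)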
Directed Diffusion
Reinforcing the best path
[Figure: the sink reinforces the best path from the source; unreinforced gradients keep delivering events at a low rate.]
The node reinforces a path by choosing:
1. At least one neighbor
2. The neighbor from whom it first received the latest event (low delay)
3. All neighbors from which new events were recently received
Reinforcement = increased interest
33
Directed Diffusion
The reinforced path must be periodically
refreshed
A trade-off based on network dynamics:
Frequency of gradient setup
Achieved performance
MAC layer issues:
Keeping local (control) traffic at a low level
Avoid collision, delay
Enhanced Directed Diffusion
A joint design of Directed Diffusion & a cluster-based architecture
34
Tiny AGgregation (TAG)
A tree-based data-centric approach
Timing: periodic per-hop adjusted
Two main phases:
Distribution: disseminating queries
Collection: aggregating & routing readings
Declarative interface for data collection and
aggregation – SQL style
35
Tiny AGgregation (TAG)
A sample query:
SELECT {agg(expr), attrs} FROM sensors
WHERE {selPreds}
GROUP BY {attrs}
HAVING {havingPreds}
EPOCH DURATION i
SELECT: an expression over one or more aggregation values
expr: the name of a single attribute
agg: aggregation function
attrs: the attributes by which the sensor readings are partitioned
WHERE, HAVING: filter out irrelevant readings
GROUP BY: specifies an attribute based partitioning of readings
EPOCH DURATION: the time interval over which each aggregate record is computed
36
Tiny AGgregation (TAG)
Distribution:
The sink broadcasts an organizing message
Message contains level & distance from the root
Node receiving the message:
If it doesn’t belong to any level yet:
Its level = message.level + 1
Its parent = the sender node
Rebroadcasts the message adding its own ID & level
Broadcasting the query along the structure
37
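The level assignment is a breadth-first flood of the organizing message; a sketch in which a FIFO queue stands in for the sequence of radio broadcasts:

from collections import deque

def build_tree(neighbors, root):
    """Returns (level, parent) for every node reachable from the root."""
    level, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        sender = queue.popleft()
        for n in neighbors[sender]:
            if n not in level:               # ignore if already in some level
                level[n] = level[sender] + 1 # my level = message.level + 1
                parent[n] = sender           # my parent = the sender
                queue.append(n)              # rebroadcast with own ID & level
    return level, parent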
Tiny AGgregation (TAG)
Collection:
Each parent waits for data from its children
Then sends its aggregate up the tree
Each epoch is divided into time slots, one per level
up to the maximum depth of the tree (sleep & wake up)
Every epoch, a new aggregate is produced
Most of the time, motes are idle and in a
low-power state
38
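A sketch of one epoch of COUNT collection over the tree built above; the per-level time slots are simulated by visiting nodes from the deepest level upward:

def collect_count(level, parent):
    partial = {n: 1 for n in level}          # each sensor counts itself
    # Deeper levels transmit in earlier slots, one slot per level
    for n in sorted(level, key=lambda v: -level[v]):
        if parent[n] is not None:
            partial[parent[n]] += partial[n] # child sends its partial count up
    root = next(n for n in level if parent[n] is None)
    return partial[root]

# level, parent = build_tree(neighbors, root='sink'); collect_count(level, parent)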
Tiny AGgregation (TAG)
SELECT COUNT(*)
FROM sensors
[Figure, slides 39-43: a step-by-step animation of in-network COUNT over the routing tree of sensors 1-5. In each epoch every node transmits its partial count during its level's time slot; partial counts move one hop up the tree per epoch until the root reports the total count of 5.]
39-43
Tiny AGgregation (TAG)
A grouping example
44
Synopsis Diffusion
Synopsis Diffusion: in-network aggregation
Synopsis functions:
Synopsis Generation: s = SG(r)
Synopsis Fusion: s = SF(s1, s2)
Synopsis Evaluation: r* = SE(s)
Efficient topology for Synopsis Diffusion:
Rings (R0, R1, …, R(i-1), Ri)
Mapping duplicate-sensitive (DS) aggregates to
order- and duplicate-insensitive (ODI) synopses
45
Synopsis Diffusion
Aggregate: a metric of aggregation
Class of Aggregates
Median
Count Distinct
Average
Histogram
MIN
Count
[Figure: nodes arranged in rings around the sink, unlike TAG's tree.]
46
Synopsis Diffusion
Phases of SD:
Distribution Phase
Aggregate query is flooded through the network
Network nodes form a set of rings
Aggregation Phase
Each node uses SG to convert local data into a local
synopsis and then uses SF to merge synopses into
a new one. The query initiator uses SE to
generate the final result.
Adapting the Topology
Ring Topology
Adaptive Ring Topology
Nodes move up or down in the rings depending on
the messages they overhear.
48
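For COUNT, the classic order- and duplicate-insensitive synopsis is a Flajolet-Martin bit vector: SG sets one bit at a geometrically distributed index, SF is bitwise OR (so duplicates and arrival order cannot change the result), and SE estimates the count from the lowest unset bit. A sketch:

import random

def SG(reading, bits=32):
    i = 0
    while i < bits - 1 and random.random() < 0.5:
        i += 1                      # bit i chosen with probability 2^-(i+1)
    return 1 << i                   # one-bit synopsis for this reading

def SF(s1, s2):
    return s1 | s2                  # OR: order- and duplicate-insensitive

def SE(s):
    i = 0
    while s & (1 << i):
        i += 1                      # index of the lowest zero bit
    return round(2 ** i / 0.77351)  # standard Flajolet-Martin estimate

# s = 0
# for r in range(1000): s = SF(s, SG(r))
# SE(s) is roughly 1000 (a single sketch has high variance)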
Delay Bounded Medium Access
Control (DB-MAC)
A tree-based aggregation scheme
A joint design of routing & MAC protocols
Minimizes latency for delay-bounded
applications
Takes advantage of data aggregation
Adopts CSMA/CA scheme based on
RTS/CTS/DATA/ACK handshake
Suitable for cases where:
Different sources sense an event
There is delay constraint
49
Delay Bounded Medium Access
Control (DB-MAC)
A message exchange example in DB-MAC
50
Delay Bounded Medium Access
Control (DB-MAC)
Dynamically aggregates data while forwarding toward
the sink
Aggregation tree is built on the fly
No knowledge of the network topology
RTS/CTS are exploited to do aggregation
Back-off intervals respect priorities
The relay node is chosen by overhearing CTS messages
Advantages:
Flexible & distributed construction of the tree
Suitable for dynamic topologies
Energy-efficient due to cross-layer design
51
Tributaries and Deltas
A hybrid approach: tree-based & multipath
Idea
In low packet loss rate regions, use tree (T)
In high loss rate regions, use multipath (M)
How to link the regions:
M nodes only send data to other M nodes
The subgraph of M nodes includes the sink
Thresholds control the percentage of M nodes
52
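A sketch of a node's role decision under a simplified switching rule; the loss threshold and the exact conditions are illustrative assumptions, not the paper's precise correctness rules:

def choose_mode(loss_rate, parent_modes, threshold=0.2):
    """Return 'M' (multipath) or 'T' (tree) for one node.
    Because M nodes may only send to M nodes, a node switches to M only
    if some parent toward the sink is already M; the M subgraph therefore
    grows outward from the sink as loss rates rise."""
    if loss_rate > threshold and any(m == 'M' for m in parent_modes):
        return 'M'
    return 'T'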
Tributaries and Deltas
[Figure: tree-based regions (tributaries) feed a multipath region (the delta) surrounding the sink.]
53