
Resilient and Coherence Preserving
Dissemination of Dynamic Data Using
Cooperating Peers
Shetal Shah, IIT Bombay
Kirthi Ramamritham, IIT Bombay
Prashant Shenoy, UMass
1
Streaming (Dynamic) Data
- Rapid and unpredictable changes
- Used in on-line decision making/monitoring
- Data gathered by (wireless sensor) networks
  - Sensors that monitor light, humidity, pressure, and heat
  - Network traffic passing via switches
  - Sports scores, stock prices

Query bw_police:
SELECT source_domain, number_of_packets
FROM network_state
WHEN bw_used_exceeds_allocation;
2
Coherency Requirement (c)
Clients specify the bound on the tolerable imprecision of requested data.
- Example: #packets sent, max incoherency = 100
[Figure: Source S(t) → Repository P(t) → Client U(t)]
Coherency requirement: |U(t) - S(t)| ≤ c
Metric -- Fidelity: % of time when the coherency requirement is met
Goal: to achieve high fidelity and resiliency (a sketch of the fidelity metric follows this slide)
3
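A minimal sketch (hypothetical Python, not part of the talk) of how the fidelity metric could be computed for one data item, assuming synchronized samples of the source value S(t) and the client-visible value U(t):

# Fidelity: % of time when the coherency requirement |U(t) - S(t)| <= c is met.
def fidelity(source_values, client_values, c):
    assert len(source_values) == len(client_values)
    in_sync = sum(1 for s, u in zip(source_values, client_values) if abs(u - s) <= c)
    return 100.0 * in_sync / len(source_values)

# Example with c = 100 (#packets sent): the bound is met in 3 of 4 samples -> 75.0
print(fidelity([1000, 1080, 1210, 1350], [1000, 1000, 1210, 1210], c=100))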
Point to Point Data Dissemination
- Pull: the client pulls data from the source.
- Push: the source pushes data of interest to the client.
4
Generic Architecture
[Figure: end-hosts (mobile and wired hosts) connect through the network to the data sources (sensors, servers)]
5
Generic Architecture
[Figure: the same architecture with proxies/caches placed between the end-hosts and the data sources]
6
Where should the queries execute?
- At clients
  - can't optimize across clients, links
- At the source (where changes take place)
  - Advantages
    - minimum number of refresh messages, high fidelity
  - Main challenge
    - scalability
    - multiple sources hard to handle
- At Data Aggregators -- DAs/proxies -- placed at the edge of the network
  - Advantages
    - allows scalability through consolidation, multiple data sources
  - Main challenge
    - need mechanisms for maintaining data consistency at DAs
7
The Basic Problem…
Executing CQs at the edge of the network -- at Data Aggregators --
will improve scalability and reduce overheads,
but poses challenges for consistency maintenance.

Computation / data dissemination:
moving query processing on streaming/changing data
from sources to execution points
(edge servers -- proxies/caches/repositories)
8
Focus of this talk
To create a resilient and efficient
content distribution network (CDN)
for streaming/dynamic data.

Existing CDNs for static data and dynamic web pages:
Akamai, Dynamai
[Figure: Sources → Cooperating Repositories → Clients]
9
Overview
- Architecture of our CDN
- Techniques for data dissemination
- Techniques to build the overlay network
- Resilient data dissemination -- suitable replication of disseminated data
[Figure: Sources → Cooperating Repositories → Clients]
10
Cooperative Repository Architecture
- Source(s), repositories (and clients)
- Each repository specifies its coherency requirement
- The source pushes specific changes to selected repositories
- Repositories cooperate with
  - each other
  - the source
11
Dissemination Graph: Example
Data Set: p, q, r, s
Max # push connections: 2
[Figure: dissemination graph -- Source at the root, repositories A (p:0.2, q:0.2), B (r:0.2), C (q:0.3), D (p:0.4, r:0.3), and a client requesting q:0.3]
12
Overview
- Architecture of our CDN
- Techniques for data dissemination
- Techniques to build the overlay network
- Resilient data dissemination -- suitable replication of disseminated data
[Figure: Sources → Cooperating Repositories → Clients]
13
Data Dissemination
- Different users have different coherency requirements for the same data item.
- The coherency requirement at a repository should be at least as stringent as that of its dependents.
- Repositories disseminate only changes of interest.
[Figure: Source with repositories A (p:0.2, q:0.2), B (r:0.2), C (q:0.3), D (p:0.4, r:0.3), and a client requesting q:0.4]
14
Data Dissemination
A repository P sends a change of interest for data item x to its dependent Q if
|x_P - x_Q| ≥ c_Q
(x_P: P's current value of x; x_Q: the last value P sent to Q; c_Q: Q's coherency requirement)
15
Data dissemination
-- must be done with care
c  0.3
c  0.5
Repository P
Repository Q
1
1
1
1
1.4
1.4
1
1
P
Source
1
1.2
1.4
1.5
1.7
Q
1.7
should prevent missed updates!
1.7
16
Dissemination Algorithms
- Source Based (Centralized)
- Repository Based (Distributed)
17
Source Based Dissemination Algorithm
- For each data item, the source maintains
  - the unique coherency requirements of repositories
  - the last update sent for each such coherency
- For every change, the source
  - finds the maximum coherency for which the change must be disseminated
  - tags the change with that coherency
  - disseminates (changed data, tag)
(a sketch follows this slide)
18
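A minimal sketch of the source-based check described above (hypothetical Python, not the authors' code; that the last-sent values for more stringent requirements are also refreshed when a looser tag is disseminated is an assumption drawn from the example on the next slide):

# One data item at the source: keep the distinct coherency requirements and the last
# value sent for each; tag a change with the largest (loosest) coherency whose bound
# it violates and push it to repositories whose requirement is at least that stringent.
class SourceState:
    def __init__(self, initial_value, coherencies):
        self.last_sent = {c: initial_value for c in coherencies}

    def on_change(self, new_value):
        violated = [c for c, last in self.last_sent.items()
                    if abs(new_value - last) >= c]
        if not violated:
            return None                       # nothing to disseminate
        tag = max(violated)                   # maximum coherency for this change
        for c in self.last_sent:
            if c <= tag:                      # assumption: receivers with c <= tag get the update
                self.last_sent[c] = new_value
        return (new_value, tag)

# The slides' example: requirements 0.3 and 0.5, source values 1, 1.2, 1.4, 1.5, 1.7.
src = SourceState(1.0, [0.3, 0.5])
for v in [1.2, 1.4, 1.5, 1.7]:
    print(v, src.on_change(v))    # None, (1.4, 0.3), (1.5, 0.5), None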
Source Based Dissemination Algorithm
c_P = 0.3, c_Q = 0.5
[Figure: source values 1, 1.2, 1.4, 1.5, 1.7; the change to 1.4 is tagged 0.3 and reaches only P; the change to 1.5 is tagged 0.5 and reaches both P and Q; so P holds 1, 1.4, 1.5, Q holds 1, 1.5, and no update of interest is missed]
19
Repository Based Dissemination Algorithm
A repository P sends a change of interest for data item x to its dependent Q if
|x_P - x_Q| ≥ c_Q - c_P
(a sketch follows this slide)
20
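A minimal sketch of the repository-based check (hypothetical Python, not the authors' code); P forwards to Q when its value differs from the last value sent to Q by at least c_Q - c_P, which compensates for the fact that P itself may lag the source by up to c_P:

# Corrected, distributed check between a repository P and its dependent Q.
def should_forward(x_p, last_sent_to_q, c_p, c_q):
    return abs(x_p - last_sent_to_q) >= c_q - c_p

# The slides' example: c_P = 0.3, c_Q = 0.5, P receives 1, 1.4, 1.7.
print(should_forward(1.4, 1.0, 0.3, 0.5))   # True: 0.4 >= 0.2, so Q also gets 1.4
print(should_forward(1.7, 1.4, 0.3, 0.5))   # True: 0.3 >= 0.2, so Q gets 1.7 as well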
Repository Based Dissemination Algorithm
c_P = 0.3, c_Q = 0.5
[Figure: source values 1, 1.2, 1.4, 1.5, 1.7; P (c = 0.3) receives 1, 1.4, 1.7 and, since the forwarding threshold is c_Q - c_P = 0.2, forwards each of them to Q, so Q also holds 1, 1.4, 1.7]
21
Dissemination Algorithms -- number of checks by source
The repository-based algorithm requires fewer checks at the source.
22
Dissemination Algorithms -- number of messages
The source-based algorithm requires fewer messages.
23
Overview
- Architecture of our CDN
- Techniques for data dissemination
- Techniques to build the overlay network
- Resilient data dissemination -- suitable replication of disseminated data
[Figure: Sources → Cooperating Repositories → Clients]
24
Logical Structure of the CDN
- Repositories want many data items, each with different coherency requirements.
- Issue: which repository should serve what to whom?
[Figure: Source with repositories A (p:0.2, q:0.2), B (r:0.2, q:0.4), C (q:0.3), D (p:0.4, r:0.3)]
25
Constructing the Layout Network
- Insert repositories one by one.
- Check level by level starting from the source.
  - Each level has a load controller.
  - The load controller tries to find data providers for the new repository (Q).
26
Selecting Data Providers
- Repositories with a low preference factor are considered as potential data providers.
- The most preferred repository with a needed data item is made the provider of that data item.
- The most preferred repository is made to provide the remaining data items.
27
Preference Factor
- Resource availability factor: can repository P be the provider for one more dependent?
- Data availability factor: # data items that P can provide for the new repository Q.
- Computational delay factor: # dependents P provides for.
- Communication delay factor: network delay between the two repositories.
delay(P, Q) / (# data items P can serve Q)
(a sketch follows this slide)
28
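A minimal sketch of provider selection (hypothetical Python; the slides do not spell out how the four factors are combined, so only the delay / data-availability ratio shown above is used, with a lower value meaning a more preferred provider):

# Lower preference factor = more preferred provider: nearby repositories that can serve
# more of the new repository's items come first.
def preference_factor(delay_ms, servable_items):
    return delay_ms / max(len(servable_items), 1)

def choose_providers(candidates, needed_items):
    """candidates: list of (name, delay_ms, offered_items). Returns {item: provider}."""
    assignment, remaining = {}, set(needed_items)
    ranked = sorted(candidates,
                    key=lambda c: preference_factor(c[1], set(c[2]) & remaining))
    for name, _, offered in ranked:
        for item in sorted(set(offered) & remaining):
            assignment[item] = name
        remaining -= set(offered)
        if not remaining:
            break
    return assignment

# Example: the new repository needs p, q, r.
print(choose_providers([("A", 20, ["p", "q"]), ("B", 25, ["r", "q"])], ["p", "q", "r"]))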
Experimental Methodology
- Physical network: 4 servers, 600 routers, 100 repositories
- Communication delay: 20-30 ms
- Computation delay: 3-5 ms
- Real stock traces: 100-1000
- Time duration of observations: 10,000 s
- Tight coherency range: 0.01 to 0.05
- Loose coherency range: 0.5 to 0.99
29
Loss of fidelity for different coherency requirements
[Graph: loss in fidelity (%) vs. degree of cooperation, where T% of the data items have stringent coherency requirements]
The less stringent the coherency requirement, the better the fidelity.
For little/no cooperation, loss in fidelity is high.
Too much cooperation?
30
Controlled cooperation
Actual degree of cooperation =
(average network delay) / (average computational delay × # interested dependents)
(a sketch follows this slide)
31
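A minimal sketch of the formula above (one reading of the slide; the exact grouping of terms is an assumption):

# Degree of cooperation a repository actually uses, capped by the ratio of network
# delay to the computation cost of serving its interested dependents.
def actual_degree_of_cooperation(avg_network_delay_ms, avg_comp_delay_ms, interested_dependents):
    return avg_network_delay_ms / (avg_comp_delay_ms * interested_dependents)

# With the delays from the experimental setup (~25 ms network, ~4 ms computation), 3 dependents:
print(actual_degree_of_cooperation(25, 4, 3))   # ~2.08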
Controlled cooperation is essential
[Graph: loss in fidelity vs. degree of cooperation, with and without controlled cooperation, up to the maximum degree of cooperation]
32
But …
Loss in fidelity increases for a large number of data items.
Repositories with stringent coherency requirements should be closer to the source.
33
If parents are not chosen judiciously
It may result in
- uneven distribution of load on repositories
- an increase in the number of messages in the system
[Figure: Source with repositories A (p:0.2, q:0.2), B (r:0.2), C (q:0.3), D (p:0.4, r:0.3)]
Increase in loss in fidelity!
34
Data item at a Time Algorithm
- A dissemination tree for each data item.
- The source serving the data item is the root.
- Repositories with more stringent coherency requirements are placed closer to the root.
[Figure: Dynamic Data Dissemination Tree for q -- Source at the root, A (q:0.2) below it, C (q:0.3) below A]
When multiple data items are considered, repositories form a peer-to-peer network.
35
DiTA
- Repository N needs data item x.
- If the source has available push connections, or the source is the only node in the dissemination tree for x
  - N is made a child of the source.
- Else
  - N is inserted in the most suitable subtree, where
    - N's ancestors have more stringent coherency requirements
    - N is closest to the root
36
Most Suitable Subtree?
- l: smallest level in the subtree with coherency requirement less stringent than N's.
- d: communication delay from the root of the subtree to N.
- Smallest l × d: most suitable subtree.
Essentially, minimize communication delays! (a sketch follows this slide)
37
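A minimal sketch of choosing the most suitable subtree in DiTA (hypothetical Python; how l is measured per subtree and how ties are broken are assumptions):

# Pick the subtree minimizing l * d, where l is the smallest level whose coherency
# requirement is less stringent (numerically larger) than the new repository's, and
# d is the communication delay from the subtree's root to the new repository.
def most_suitable_subtree(subtrees, new_coherency):
    """subtrees: list of (root_name, delay_to_new_repo, levels); levels[i] is the most
    stringent coherency requirement found at level i+1 of that subtree."""
    best, best_score = None, float("inf")
    for root, delay, levels in subtrees:
        l = next((i + 1 for i, c in enumerate(levels) if c > new_coherency),
                 len(levels) + 1)
        score = l * delay
        if score < best_score:
            best, best_score = root, score
    return best

# Example: a new repository wants q with coherency 0.3.
print(most_suitable_subtree([("A", 25, [0.2, 0.4]), ("B", 20, [0.1, 0.2, 0.5])], 0.3))  # "A"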
Example
Initially the network consists of only the source.
A and B request service of q with coherency requirement 0.2.
C then requests service of q with coherency requirement 0.1.
[Figure: A and B first join as children of the source with q:0.2; when C joins with the more stringent requirement q:0.1, it is placed closer to the source and A is served through C]
38
Experimental Evaluation
39
Overview
- Architecture of our CDN
- Techniques for data dissemination
- Techniques to build the overlay network
- Resilient data dissemination -- suitable replication of disseminated data
[Figure: Sources → Cooperating Repositories → Clients]
40
Handling Failures in the Network
- Need to detect permanent/transient failures in the network and to recover from them.
- Resiliency is obtained by adding redundancy.
- Without redundancy, failures ⇒ loss in fidelity.
- Adding redundancy can increase cost ⇒ possible loss of fidelity!
- Handle failures such that the cost of adding resiliency is low!
41
Passive/Active Failure Handling
- Passive failure detection:
  - The parent sends "I'm alive" messages at the end of every time interval.
  - But what should the time interval be?
- Active failure handling:
  - Always be prepared for failures.
  - For example: 2 repositories can serve the same data item at the same coherency to a child.
  - This means lots of work ⇒ greater loss in fidelity.
Need to be clever!
42
Middle Path
Let repository R want data item x with coherency c.
- A backup parent B is found for each data item that the repository needs.
- At what coherency should B serve R?
- B serves R with coherency k × c.
[Figure: P serves R with coherency c; backup parent B serves R with coherency k × c]
43
If a parent fails
- Detection: the child gets two consecutive updates from the backup parent with no updates from the parent.
- Recovery: the backup parent is asked to serve at coherency c until an update arrives from the parent.
[Figure: P serves R at coherency c, B at k × c; when P fails, B steps in at coherency c]
(a sketch follows this slide)
44
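A minimal sketch of the detection/recovery behaviour at a child R (hypothetical Python; message handling and timing details are omitted):

# State kept at a child R: the backup parent B serves at k*c; after two consecutive
# backup updates with nothing from the parent P, R asks B to tighten to c; the first
# update from P switches the backup back to k*c.
class ChildState:
    def __init__(self, k=2):
        self.k = k
        self.backup_updates_since_parent = 0
        self.backup_coherency_factor = k      # backup currently serves at this multiple of c

    def on_update_from_parent(self):
        self.backup_updates_since_parent = 0
        self.backup_coherency_factor = self.k

    def on_update_from_backup(self):
        self.backup_updates_since_parent += 1
        if self.backup_updates_since_parent >= 2:
            self.backup_coherency_factor = 1  # presumed parent failure: serve at c

# Example: two backup updates in a row trigger recovery; a parent update ends it.
r = ChildState(k=2)
r.on_update_from_backup(); r.on_update_from_backup()
print(r.backup_coherency_factor)   # 1
r.on_update_from_parent()
print(r.backup_coherency_factor)   # 2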
Adding Resiliency to DiTA
- A sibling B of P is chosen as the backup parent of R.
- If P fails, B serves R with coherency c ⇒ the change is local.
- If P has no siblings, a sibling of the nearest ancestor is chosen.
- Else the source is made the backup parent.
[Figure: A is the parent of siblings P and B; P serves R at coherency c, B at k × c]
45
Markov Analysis for k
Assumptions
- Data changes as a random walk along the line.
- The probability of an increase is the same as that of a decrease.
- No assumptions are made about the unit of change or the time taken for a change.
Expected # misses for any k ≤ 2k² - 2; for k = 2, expected # misses ≤ 6.
46
Failure and Recovery Modelling
- Failures and recovery are modeled based on trends observed in practice
  (Analysis of link failures in an IP backbone, G. Iannaccone et al., Internet Measurement Workshop 2002).
[Graph: trend for time between failures]
- Recovery: 10% > 20 min; 40% between 1 min and 20 min; 50% < 1 min.
47
In the Presence of Failures,
Varying Recovery Times
Addition of resiliency does improve fidelity.
48
In the Presence of Failures,
Varying Data Items
[Graph: increasing load]
Fidelity improves with the addition of resiliency even for a large number of data items.
49
In the Absence of Failures
[Graph: increasing load]
Often, fidelity improves with the addition of resiliency, even in the absence of failures!
50
Ongoing Work
- Mapping clients to repositories
  - locality, coherency requirements, load, mobility
- Reorganization of the CDN
  - snapshot based
  - dynamic reorganization
51