Slides - Bretagne

Efficient P2P backup through buffering at the edge
S. Defrance, A.-M. Kermarrec (INRIA), E. Le Merrer, N. Le Scouarnec, G. Straub, A. van Kempen
Peer to Peer backup system
Exploit users' resources:
each user provides storage space
"Pure" P2P backup systems are severely limited by:
• Low availability
• Asymmetric bandwidth (Low uplink speed)
• Asynchrony
[Figure: online periods of Peer 1 and Peer 2 on a 24-hour timeline (0 h, 12 h, 24 h)]
Time To Backup (TTB) and Time To Restore (TTR) may be very high
Practical deployment is limited
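The effect of asynchrony on transfer time can be illustrated with a minimal sketch. It assumes, purely for illustration, that each peer is online during fixed daily windows and that data flows directly between two peers only while both are online. The schedules and function names are hypothetical; the 66 kB/s uplink is the mean uplink quoted later in these slides.

```python
def overlap_hours(a, b):
    """Hours per day during which both peers are online.

    a and b are lists of (start_hour, end_hour) windows within one day.
    """
    total = 0.0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0.0, min(e1, e2) - max(s1, s2))
    return total


def days_to_transfer(size_mb, uplink_kb_s, a, b):
    """Days needed to push size_mb directly from one peer to the other.

    Assumption: data only flows while both peers are online (no buffer).
    """
    per_day_mb = overlap_hours(a, b) * 3600 * uplink_kb_s / 1000.0
    if per_day_mb == 0:
        return float("inf")
    return size_mb / per_day_mb


# Hypothetical schedules: peer 1 online 8h-14h, peer 2 online 12h-20h.
peer1 = [(8, 14)]
peer2 = [(12, 20)]
print(overlap_hours(peer1, peer2))               # 2.0 hours of overlap per day
print(days_to_transfer(1000, 66, peer1, peer2))  # a bit over 2 days for 1000 MB
```

With only two hours of common uptime per day, a transfer that needs about four hours of upload time stretches over more than two days, which is the kind of inflation a stable buffer is meant to mask.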
4/9/2016
CDN-assisted architecture
Architecture proposed at P2P 2010:
Server = Reliable component
Performance approaches that of client-server systems
(in terms of Time To Backup and Time To Restore)
However:
• A centralized part remains
• Not fully convenient for users
What we propose
Take into account the low-level structure of the network
(i.e., the presence of gateways in home networks)
Use gateways to distribute the centralized part of the hybrid scheme
[Figure: several home networks (LANs) connected through their gateways]
Gateways are turned into stable buffering layers
Mask the asynchrony between peers
Why are gateways good candidates?
• Already present in users' homes
• Storage capable (for buffering)
• Highly available
• At the frontier between a fast LAN and a slow WAN
Gateways are highly available
We periodically pinged a random set of static IP addresses of a French ISP*
• 25,000 gateways
• For 7.5 months
• Average gateway availability: 86%
• A large part is very stable
• A few have power-off habits (daily or holiday basis)
[Figure: number of gateways up over time; y-axis "Gateways up" (10,000 to 25,000), x-axis months from July to February; annotation "School holidays in France"]
*The trace is available at: http://www.thlab.net/~lemerrere/trace_gateways
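As an illustration of how an average availability figure can be derived from periodic pings, here is a minimal sketch. The data layout (a mapping from gateway identifier to a list of ping outcomes) is an assumption made for the example and is not the format of the published trace.

```python
def availability(ping_results):
    """Fraction of answered pings per gateway.

    ping_results maps a gateway id to a list of booleans
    (True = the gateway replied to that ping). Hypothetical layout.
    """
    return {gw: sum(r) / len(r) for gw, r in ping_results.items() if r}


def mean_availability(ping_results):
    """Average of the per-gateway availabilities."""
    per_gw = availability(ping_results)
    return sum(per_gw.values()) / len(per_gw)


# Toy data: one always-on gateway, one that is switched off half the time.
trace = {
    "gw-a": [True, True, True, True],
    "gw-b": [True, False, True, False],
}
print(availability(trace))
print(mean_availability(trace))  # 0.75
```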
How does it work?
Prepare (LAN speed)
Backup (WAN speed)
Offload (LAN speed)
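The three phases can be sketched as a back-of-the-envelope timing model. This is a simplification under stated assumptions: it ignores redundancy added by the erasure code and any unavailability, and the LAN rate of 10,000 kB/s is a hypothetical value. The 66 kB/s WAN rate is the mean uplink used in the evaluation.

```python
def phase_hours(size_mb, rate_kb_s):
    """Hours needed to move size_mb at a constant rate in kB/s."""
    return size_mb * 1000.0 / rate_kb_s / 3600.0


def gwa_phases(size_mb, lan_kb_s, wan_kb_s):
    """Duration of each phase, in hours, for one archive.

    prepare: user's machine -> local gateway (LAN speed)
    backup:  local gateway -> remote gateways (WAN speed)
    offload: remote gateway -> remote peer (LAN speed)
    """
    return {
        "prepare": phase_hours(size_mb, lan_kb_s),
        "backup": phase_hours(size_mb, wan_kb_s),
        "offload": phase_hours(size_mb, lan_kb_s),
    }


# 1 GB archive, assumed 10,000 kB/s LAN, 66 kB/s WAN uplink.
phases = gwa_phases(1000, lan_kb_s=10_000, wan_kb_s=66)
for name, hours in phases.items():
    print(f"{name}: {hours:.2f} h")
```

In this model the user's machine is busy only for the short prepare phase; the slow WAN phase runs between gateways.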
How do we evaluate?
Trace-based simulation using public traces
• To model peer behavior:
  - Skype: 28 days, 1269 peers, mean availability = 0.5
  - Jabber: 28 days, 465 peers, mean availability = 0.27
• To model gateway behavior: our gateway trace
• To model uplink bandwidth: trace from a study of residential broadband networks (mean uplink = 66 kB/s)
We randomly assign one gateway and one uplink speed to each peer of each trace
Scenario:
• Size of archive: 1 GB
• Data creation: Poisson process (3 backups/month/user on average)
• Erasure code
• 50 simulations/curve
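A Poisson data-creation process like the one in the scenario can be sketched by drawing exponential inter-arrival times. This is not the authors' simulator: the function name, the 30-day month used to convert the rate, and the seed are assumptions for the example.

```python
import random


def backup_request_times(rate_per_month, duration_days, rng):
    """Arrival times (in hours) of a Poisson process over duration_days.

    Inter-arrival times of a Poisson process are exponentially distributed.
    Assumption: a month is counted as 30 days when converting the rate.
    """
    rate_per_hour = rate_per_month / (30 * 24)
    horizon = duration_days * 24
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_hour)
        if t >= horizon:
            return times
        times.append(t)


rng = random.Random(42)  # fixed seed so the run is repeatable
requests = backup_request_times(3, 28, rng)
print(len(requests), "backup requests over the 28-day trace")
```

Averaged over many users, this yields about 2.8 requests per user over a 28-day trace (3 per 30 days).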
What do we evaluate?
We evaluate:
• Time To Backup (hours)
• Time To Restore (hours)
• Mean and max data buffered (MB)
TTB: time between the backup request and the time when the last block has been completely uploaded
TTR: time between the restore request and the time when enough data has been downloaded to reconstruct the file
We compare:
• Pure P2P (P2P)
• CDN-Assisted (CDNA)
• Gateway-Assisted (GWA)
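The TTB and TTR definitions above can be written down as a short sketch. It assumes an erasure code where any k blocks suffice to reconstruct the file; the slides do not give the code parameters, so k and all timestamps below are hypothetical.

```python
def time_to_backup(request_time, block_upload_times):
    """TTB: delay until the last block has been completely uploaded."""
    return max(block_upload_times) - request_time


def time_to_restore(request_time, block_arrival_times, k):
    """TTR: delay until k blocks have been downloaded.

    Assumption: any k blocks are enough to reconstruct the file.
    Returns None if fewer than k blocks ever arrive.
    """
    arrivals = sorted(t for t in block_arrival_times if t >= request_time)
    if len(arrivals) < k:
        return None
    return arrivals[k - 1] - request_time


# Hypothetical timestamps in hours.
print(time_to_backup(0.0, [5.0, 9.0, 30.0]))                # 30.0
print(time_to_restore(10.0, [11.0, 12.5, 18.0, 30.0], 3))   # 8.0
```

The asymmetry is visible here: a backup waits for the slowest block, while a restore can stop as soon as the k fastest blocks are in.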
TTB & TTR (Skype trace)
• Time To Backup (stored safely at a remote place)
90th percentile of completed backups:
  GWA   30 h
  CDNA  60 h
  P2P   140 h
• Time To Restore (retrieve an archive locally)
90th percentile of completed restores:
  GWA   3 h
  CDNA  40 h
  P2P   40 h
[Figure: CDFs (0 to 1) over hours (log scale, 0.1 to 100) for GWA, CDNA and P2P; one curve is labelled "CDNA & P2P"]
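For readers unfamiliar with the metric, a 90th percentile and an empirical CDF can be computed from a list of completion times as in the sketch below. It uses the nearest-rank definition of a percentile, which may differ from the one used for the slides, and the sample values are made up for the example.

```python
import math


def percentile_nearest_rank(values, p):
    """Smallest value such that at least p percent of samples are <= it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def empirical_cdf(values, x):
    """Fraction of samples less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)


# Made-up completion times in hours (not results from the slides).
completion_hours = [2, 5, 8, 12, 20, 25, 30, 45, 60, 140]
print(percentile_nearest_rank(completion_hours, 90))  # 60
print(empirical_cdf(completion_hours, 60))            # 0.9
```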
Scaling (Skype trace)
Better scaling with archive size:
this enables users to back up larger amounts of data
[Figure: TTR (hours, 0 to 120) versus archive size (GB, 0 to 10) for GWA, CDNA and P2P]
Dimensioning (Skype trace)
• Low storage needs
1 GB archives: 2.5 GB needed (99%)
Realistic for current gateways
• Average usage remains low
Less than 1 MB here
Data is really offloaded to peers
Gateways are effectively used as buffers
[Figure: CDF (0 to 1) of the provisioned buffer (max, in GB; 0 to 4)]
[Figure: average storage on gateways (MB, 0 to 1.2) versus time (hours, 0 to 700), curve "Total", with "Stopping backups" marked]
Conclusion
• Realistic architecture for P2P backup systems
• Evaluation using trace-based simulation
• TTB and TTR are greatly reduced
(the network connection can be used more efficiently)
• More convenient for users:
backup tasks are offloaded quickly (at LAN speed)
from the user's machine to the gateway
• Fully decentralized
• Trace of gateway availability
Thank you!