3rd Edition: Chapter 2
P2P Networking and Content Distribution
March 28, 2013
2: Application Layer
Announcements
H/W due today (Calendar, Packet pair)
Calendar app: a 1-week extension is possible (with a 10% point deduction)
Meeting with project mentors by Monday
Project plan presentation
Introduction/background
Problem definition (or research questions)
Related work (no need to be complete)
Approach (+supporting materials)
Plans (including refining research questions and experimenting with ideas)
Reviews
Network app: client-server, p2p, hybrid
Programming: socket
Addressing issues
Transport layer vs. service requirements
TCP vs. UDP (differences)
HTTP: persistent vs. non-persistent
HTTP: cookies
DNS: distributed, hierarchical DB
DNS name hierarchy vs. Internet's topology
DNS resolution: iterative vs. recursive
Contents
P2P architecture and benefits
P2P content distribution
Content distribution network (CDN)
Pure P2P architecture
no always-on server
arbitrary end systems
directly communicate peer-peer
peers are intermittently connected and change IP addresses
Three topics:
File distribution
Searching for information
Case Study: Skype
File Distribution: Server-Client vs P2P
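Question: How much time does it take to distribute a file of size F from one server to N peers?
$u_s$: server upload bandwidth
$u_i$: peer i upload bandwidth
$d_i$: peer i download bandwidth
[Figure: a server holding a file of size F connects through a network with abundant bandwidth to peers 1 to N, each with upload rate $u_i$ and download rate $d_i$]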
File distribution time: server-client
server sequentially sends N copies: $NF/u_s$ time
client i takes $F/d_i$ time to download
[Figure: server (upload rate $u_s$, file F) and clients (rates $u_1, d_1, \dots, u_N, d_N$) connected by a network with abundant bandwidth]
Time to distribute F to N clients using the client/server approach:
$$d_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$
increases linearly w.r.t. N (for large N)
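The client-server formula can be evaluated directly. A minimal Python sketch, assuming F is given in bits and rates in bits/s; the function name and the sample numbers are made up for illustration and do not come from the slide:

```python
def client_server_time(F: float, u_s: float, d: list[float]) -> float:
    """d_cs = max{N*F/u_s, F/min_i d_i} for N = len(d) clients.

    F   -- file size in bits
    u_s -- server upload bandwidth in bits/s
    d   -- download bandwidth d_i of each client in bits/s
    """
    N = len(d)
    server_time = N * F / u_s  # the server uploads N full copies, one after another
    client_time = F / min(d)   # the slowest client still needs F/d_min to receive the file
    return max(server_time, client_time)


# Hypothetical numbers: F = 1 Gbit, u_s = 30 Mbps, every client downloads at 2 Mbps
for N in (10, 100, 1000):
    print(N, client_server_time(1e9, 30e6, [2e6] * N))
```

With these numbers the server term N·F/u_s overtakes F/d_min beyond N = 15, after which the time grows linearly with N.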
File distribution time: P2P
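server must send one copy: $F/u_s$ time
client i takes $F/d_i$ time to download
NF bits must be downloaded (aggregate)
fastest possible upload rate: $u_s + \sum_i u_i$
$$d_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_{i=1}^{N} u_i} \right\}$$
[Figure: server (upload rate $u_s$, file F) and peers (rates $u_i$, $d_i$) connected by a network with abundant bandwidth]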
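The P2P bound adds a third term because every peer's upload capacity joins the server's. A minimal Python sketch of the formula, assuming bits and bits/s as units; the function name and sample numbers are hypothetical:

```python
def p2p_time(F: float, u_s: float, u: list[float], d: list[float]) -> float:
    """d_P2P = max{F/u_s, F/min_i d_i, N*F/(u_s + sum_i u_i)} for N = len(u) peers.

    F in bits, rates in bits/s; u[i] and d[i] are peer i's upload and download rates.
    """
    N = len(u)
    return max(
        F / u_s,                 # the server must upload at least one full copy
        F / min(d),              # the slowest peer must download F bits
        N * F / (u_s + sum(u)),  # N*F bits in total, uploaded at most at u_s + sum(u_i)
    )


# Hypothetical numbers: F = 1 Gbit, u_s = 30 Mbps, each peer uploads 1 Mbps, downloads 2 Mbps
for N in (10, 100, 1000):
    print(N, p2p_time(1e9, 30e6, [1e6] * N, [2e6] * N))
```

Because each new peer also brings upload capacity, the third term stays below F/u_i (1000 s here) no matter how large N gets.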
Server-client vs. P2P: example
Client upload rate = $u$, $F/u$ = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$
[Figure: minimum distribution time (hours, 0 to 3.5) vs. N (0 to 35) for P2P and client-server]
Client-server $\sim NF/u_s$ vs. P2P $\sim NF/(u_s + \sum_i u_i)$
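The example's curves can be reproduced from the two formulas. A minimal sketch in normalized units (F = u = 1, so times come out in hours); it assumes every peer uploads at exactly u and that d_min equals u_s, since the slide only states d_min ≥ u_s:

```python
def example_times(N: int) -> tuple[float, float]:
    """(client-server, P2P) minimum distribution times in hours for the slide's example."""
    F, u = 1.0, 1.0  # units chosen so that F/u = 1 hour
    u_s = 10 * u     # server upload rate u_s = 10u
    d_min = u_s      # assumption: the slide only says d_min >= u_s
    client_server = max(N * F / u_s, F / d_min)
    p2p = max(F / u_s, F / d_min, N * F / (u_s + N * u))  # every peer uploads at rate u
    return client_server, p2p


for N in (5, 10, 20, 30):
    cs, p2p = example_times(N)
    print(f"N={N:2d}  client-server={cs:.2f} h  P2P={p2p:.2f} h")
```

Client-server time is N/10 hours and grows without bound, while the P2P time (N/(10 + N) once N ≥ 2) stays below 1 hour.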
Contents
P2P architecture and benefits
P2P content distribution
Content distribution network (CDN)
P2P content distribution issues
Issues
Group management and data search
Reliable and efficient file exchange
Security/privacy/anonymity/trust
Approaches for group management and data search (i.e., who has what?)
Centralized (e.g., BitTorrent tracker)
Unstructured (e.g., Gnutella)
Structured (Distributed Hash Tables [DHT])
Centralized model (Napster)
original “Napster” design
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries for “Hey Jude”; the server notifies her that Bob has the file
3) Alice requests the file from Bob
[Figure: peers (Alice, Bob) and a centralized directory server; Q: “Hey Jude”, A: Bob has it]
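The directory's role in steps 1 and 2 can be sketched as an in-memory index. This is a hypothetical illustration, not Napster's actual server: the class, its methods, and the IP address are made up:

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical Napster-style central index: content name -> IP addresses of peers holding it."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = defaultdict(set)

    def register(self, peer_ip: str, files: list[str]) -> None:
        # Step 1: a connecting peer reports its IP address and its content
        for name in files:
            self.index[name].add(peer_ip)

    def unregister(self, peer_ip: str) -> None:
        # A departing peer is removed from every entry
        for holders in self.index.values():
            holders.discard(peer_ip)

    def query(self, name: str) -> set[str]:
        # Step 2: tell the requester which peers have the file (step 3, the transfer, is peer-to-peer)
        return set(self.index.get(name, set()))


server = DirectoryServer()
server.register("10.0.0.2", ["Hey Jude"])  # Bob joins (hypothetical address)
print(server.query("Hey Jude"))            # Alice asks -> {'10.0.0.2'}
```

All lookups go through one object, which is exactly why search is fast and simple but the server is a single point of failure.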
Centralized model
File transfer is decentralized, but locating content is highly centralized
[Figure: peers Bob, Alice, Judy, and Jane]
Centralized model
Benefits:
Low per-node state
Limited bandwidth usage
Short search time
High success rate
Fault tolerant (to peer failures)
Drawbacks:
Single point of failure
Limited scale
Possibly unbalanced load
File distribution: BitTorrent
P2P file distribution
tracker: tracks peers participating in a torrent
torrent: group of peers exchanging chunks of a file
[Figure: a joining peer obtains a list of peers from the tracker, then trades chunks with them]
BitTorrent (1)
file divided into 256KB chunks
peer joining torrent:
has no chunks, but will accumulate them over time
registers with a tracker to get a list of peers, connects to a subset of peers (“neighbors”)
while downloading, a peer uploads chunks to other peers
peers may come online and go offline
once a peer has the entire file, it may (selfishly) leave or (altruistically) remain
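Splitting a file into fixed-size chunks is simple integer arithmetic. A minimal sketch, assuming "256KB" means 256 KiB (262,144 bytes); the helper names and the file size are hypothetical:

```python
CHUNK_SIZE = 256 * 1024  # assumption: "256KB" read as 256 KiB = 262,144 bytes


def num_chunks(file_size: int) -> int:
    """Number of chunks for a file of file_size bytes (the last chunk may be shorter)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE  # integer ceiling division


def chunk_ranges(file_size: int) -> list[tuple[int, int, int]]:
    """(index, start, end) byte range of every chunk, end exclusive."""
    return [
        (i, i * CHUNK_SIZE, min((i + 1) * CHUNK_SIZE, file_size))
        for i in range(num_chunks(file_size))
    ]


size = 10 * 1024 * 1024 + 1000     # hypothetical file: 10 MiB plus 1000 bytes
have = [False] * num_chunks(size)  # a newly joined peer holds no chunks yet
print(len(have), chunk_ranges(size)[-1])
```

The `have` list is one simple way to track which chunks a peer has accumulated so far.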
BitTorrent (2)
Pulling Chunks
at any given time, different peers have different subsets of file chunks
periodically, a peer (Alice) asks each neighbor for a list of chunks that it has
Alice sends requests for her missing chunks, rarest first
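Rarest-first can be expressed as "count how many neighbors hold each chunk, then request missing chunks in increasing order of that count." A minimal sketch; the function name, neighbor names, and chunk lists are hypothetical, and ties are broken by chunk index only to make the order deterministic:

```python
from collections import Counter


def rarest_first(my_chunks: set[int], neighbor_chunks: dict[str, set[int]]) -> list[int]:
    """Order the chunks Alice is missing so that the least-replicated ones come first.

    neighbor_chunks maps each neighbor to the set of chunk indices it reported having.
    """
    copies = Counter()
    for chunks in neighbor_chunks.values():
        copies.update(chunks)  # count how many neighbors hold each chunk
    missing = [c for c in copies if c not in my_chunks]  # only chunks some neighbor can supply
    return sorted(missing, key=lambda c: (copies[c], c))  # fewest copies first; index breaks ties


neighbors = {"Bob": {0, 1, 2}, "Judy": {1, 2}, "Jane": {2, 3}}  # hypothetical chunk lists
print(rarest_first({0}, neighbors))  # -> [3, 1, 2]
```

Fetching rare chunks first makes them more widely replicated before the few peers holding them leave.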
Sending Chunks: tit-for-tat
Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
re-evaluates the top 4 every 10 secs
every 30 secs: randomly selects another peer and starts sending it chunks (“optimistically unchoke”); the newly chosen peer may join the top 4
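The two timers above can be sketched as two small functions: one ranks neighbors by the rate at which they upload to Alice, the other picks a random outsider. A hypothetical sketch; the names and rates are made up, and real clients also handle timers, choke messages, and ties:

```python
import random
from typing import Optional


def top_uploaders(rates: dict[str, float], k: int = 4) -> list[str]:
    """Neighbors currently sending to Alice at the highest rates (re-evaluated every 10 s)."""
    return sorted(rates, key=lambda peer: rates[peer], reverse=True)[:k]


def optimistic_unchoke(rates: dict[str, float], unchoked: list[str],
                       rng: random.Random) -> Optional[str]:
    """Every 30 s: pick one random neighbor outside the current top k and start sending to it."""
    candidates = [peer for peer in rates if peer not in unchoked]
    return rng.choice(candidates) if candidates else None


# Hypothetical receive rates (kB/s) from Alice's neighbors
rates = {"Bob": 50.0, "Judy": 80.0, "Jane": 20.0, "Carol": 65.0, "Dave": 10.0, "Erin": 40.0}
top4 = top_uploaders(rates)
extra = optimistic_unchoke(rates, top4, random.Random(0))
print(top4, extra)
```

Because the extra peer is chosen at random, a newcomer with nothing to offer yet still gets data and, if it reciprocates fast enough, can enter Alice's top 4.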
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!
P2P Case study: Skype
inherently P2P: pairs of users communicate
proprietary application-layer protocol (inferred via reverse engineering)
hierarchical overlay with supernodes (SNs)
index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC) and supernodes (SN) forming the overlay, plus the Skype login server]
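Skype's real index protocol is proprietary, so the following is only a hypothetical illustration of what "distributed over SNs" can mean: each username is hashed to pick the supernode responsible for storing its IP address. The class, the supernode names, and the hashing scheme are all assumptions, not Skype's design:

```python
import hashlib
from typing import Optional


class SupernodeIndex:
    """Hypothetical username -> IP index partitioned over supernodes by hashing the username."""

    def __init__(self, supernodes: list[str]) -> None:
        self.supernodes = supernodes
        self.tables: dict[str, dict[str, str]] = {sn: {} for sn in supernodes}

    def responsible_sn(self, username: str) -> str:
        # Hash the username and map it onto one supernode (simple modulo partitioning)
        digest = hashlib.sha1(username.encode("utf-8")).digest()
        return self.supernodes[int.from_bytes(digest[:8], "big") % len(self.supernodes)]

    def publish(self, username: str, ip: str) -> None:
        # A client registers its current IP address with the responsible supernode
        self.tables[self.responsible_sn(username)][username] = ip

    def lookup(self, username: str) -> Optional[str]:
        # Any client can compute the same supernode and ask it
        return self.tables[self.responsible_sn(username)].get(username)


index = SupernodeIndex(["sn-a", "sn-b", "sn-c"])  # hypothetical supernode names
index.publish("alice", "10.0.0.5")
print(index.responsible_sn("alice"), index.lookup("alice"))
```

Plain modulo hashing reshuffles most entries when a supernode joins or leaves; the structured (DHT) approaches listed earlier use consistent hashing to avoid that.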
Contents
P2P architecture and benefits
P2P content distribution
Content distribution network (CDN)
Why Content Networks?
More hops between client and Web server mean more congestion!
Same data flowing repeatedly over links between clients and Web server
[Figure: clients C1, C2, C3, and C4 reach Web server S through a network of IP routers]
Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt
Why Content Networks?
Origin server is a bottleneck as the number of users grows
Flash crowds (for instance, Sept. 11)
The Content Distribution Problem: arrange a rendezvous between a content source at the origin server (www.cnn.com) and a content sink (us, as users)
Example: Web Server Farm
Simple solution to the content distribution problem: deploy a large group of servers
[Figure: requests from grad.umd.edu and ren.cis.udel.edu pass through an L4-L7 switch to copies 1, 2, and 3 of www.cnn.com]
Arbitrate client requests to servers using an “intelligent” L4-L7 switch
Widely used today
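The slide does not say how the "intelligent" switch picks a server. As one hypothetical policy, a least-connections dispatcher sends each new request to the replica with the fewest open connections; the class name and policy below are assumptions for illustration:

```python
class L47Switch:
    """Hypothetical front-end switch dispatching requests to replicas of the same site.

    Least-connections is an assumed policy; the slide does not specify one.
    """

    def __init__(self, servers: list[str]) -> None:
        self.open_conns = {server: 0 for server in servers}

    def dispatch(self) -> str:
        # Pick the replica with the fewest open connections (ties go to the first listed)
        server = min(self.open_conns, key=lambda s: self.open_conns[s])
        self.open_conns[server] += 1
        return server

    def close(self, server: str) -> None:
        self.open_conns[server] -= 1


switch = L47Switch(["www.cnn.com (Copy 1)", "www.cnn.com (Copy 2)", "www.cnn.com (Copy 3)"])
first = switch.dispatch()   # e.g. request from grad.umd.edu
second = switch.dispatch()  # e.g. request from ren.cis.udel.edu
switch.close(first)
third = switch.dispatch()   # goes back to the now-idle copy 1
print(first, second, third)
```

A layer-7 switch could additionally inspect the HTTP request (for example the URL) before choosing; a layer-4 switch sees only addresses and ports.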
Example: Caching Proxy
Mainly motivated by ISP business interests: reducing the ISP's bandwidth consumption from the Internet
Reduced network traffic
Reduced user-perceived latency
[Figure: inside the ISP, interceptors redirect TCP port 80 traffic from clients (ren.cis.udel.edu, merlot.cis.udel.edu) to the caching proxy, while other traffic goes directly to the Internet, where www.cnn.com resides]
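A caching proxy is reactive: it contacts the origin only on a miss and otherwise reuses its stored copy. A minimal sketch, assuming a fixed freshness lifetime (TTL) and a fake clock for the demo; `fetch_from_origin` is a hypothetical stand-in for an HTTP request:

```python
import time
from typing import Callable


class CachingProxy:
    """Minimal sketch of a reactive ISP cache keyed by URL, with a fixed freshness lifetime (TTL)."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch_from_origin  # hypothetical callable: URL -> content from the origin server
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache: dict[str, tuple[str, float]] = {}  # URL -> (content, time it was fetched)
        self.origin_fetches = 0

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.cache.get(url)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]        # hit: served inside the ISP, possibly stale
        content = self.fetch(url)  # miss or expired: only now is the origin contacted
        self.origin_fetches += 1
        self.cache[url] = (content, now)
        return content


origin = {"http://www.cnn.com/": "old content"}  # stand-in for the origin server
fake_time = [0.0]
proxy = CachingProxy(origin.__getitem__, ttl_seconds=60.0, clock=lambda: fake_time[0])
print(proxy.get("http://www.cnn.com/"))  # miss: fetched from origin
origin["http://www.cnn.com/"] = "WTC News!"
fake_time[0] = 30.0
print(proxy.get("http://www.cnn.com/"))  # hit within TTL: still the old page
fake_time[0] = 61.0
print(proxy.get("http://www.cnn.com/"))  # expired: refetched, new page
```

The 60-second TTL is an assumption for the demo; real HTTP caches derive freshness from response headers such as Cache-Control.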
But on Sept. 11, 2001
[Figure: the Web server www.cnn.com has new content (“WTC News!”); requests from 1,000,000 other hosts congest the paths toward it (congestion/bottleneck), while a user at mslab.kaist.ac.kr, behind an ISP caching proxy, receives old content]
Problems with discussed approaches:
Server farms and Caching proxies
Server farms do nothing about problems due to network congestion
Caching proxies serve only their own clients, not all users on the Internet
Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies
Accounting issues with caching proxies
For instance, www.cnn.com needs to know the number of hits to the webpage for the advertisements displayed on it
Again on Sept. 11, 2001 with CDN
[Figure: new content (“WTC News!”) on the Web server www.cnn.com is replicated through the CDN distribution infrastructure to surrogates in WA, CA, MI, IL, MA, FL, NY, and DE; 1,000,000 other users and the user at mslab.kaist.ac.kr send requests to surrogates and receive the new content]
Web replication - CDNs
Overlay network to distribute content from origin servers to users
Avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
Reduces Web server load
Reduces user-perceived latency
Tries to route around congested networks
CDN vs. Caching Proxies
Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
Caches are reactive; CDNs are proactive
Caching proxies cater to their users (web clients) and not to content providers (web servers); CDNs cater to both the content providers (web servers) and clients
CDNs give control over the content to the content providers; caching proxies do not
CDN Architecture
[Figure: CDN architecture: the origin server connects to the CDN's distribution & accounting infrastructure, which serves surrogates; the request routing infrastructure directs clients to a surrogate]
CDN Components
Distribution Infrastructure:
Moving or replicating content from content source
(origin server, content provider) to surrogates
Request Routing Infrastructure:
Steering or directing a content request from a client to a suitable surrogate
Content Delivery Infrastructure:
Delivering content to clients from surrogates
Accounting Infrastructure:
Logging and reporting of distribution and delivery activities
Server Interaction with CDN
1. Origin server pushes new content to the CDN, OR the CDN pulls content from the origin server (distribution infrastructure)
2. Origin server requests logs and other accounting info from the CDN, OR the CDN provides logs and other accounting info to the origin server (accounting infrastructure)
[Figure: origin server www.cnn.com exchanging content and accounting information with the CDN]
Client Interaction with CDN
1. Client: “Hi! I need www.cnn.com/sept11” (to the CDN request routing infrastructure)
2. Request routing infrastructure: “Go to surrogate newyork.cnn.akamai.com”
3. Client: “Hi! I need content /sept11” (to newyork.cnn.akamai.com)
[Figure: the CDN with surrogates california.cnn.akamai.com (CA) and newyork.cnn.akamai.com (NY)]
Q: How did the CDN choose the New York surrogate over the California surrogate?
Request Routing Techniques
Request routing techniques use a set of metrics to direct users to the “best” surrogate
Proprietary, but the underlying techniques are known:
DNS based request routing
Content modification (URL rewriting)
Anycast based (how common is anycast?)
URL based request routing
Transport layer request routing
Combination of multiple mechanisms
DNS based Request-Routing
Common due to the ubiquity of DNS as a directory service
Specialized DNS server inserted in the DNS resolution process
DNS server is capable of returning a different set of A, NS, or CNAME records based on policies/metrics
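A policy-driven DNS server can be sketched as "match the requester's address against a policy table and answer with the matching surrogate." A minimal sketch: the prefixes, the policy table, and the tuple-shaped record are hypothetical simplifications; the surrogate addresses, the dns.nyu.edu address, and the 10-second TTL come from the example on the next slides:

```python
import ipaddress

# Hypothetical policy table: prefix of the *requesting DNS server* -> surrogate in the A record
POLICY = [
    (ipaddress.ip_network("128.4.0.0/16"), "145.155.10.15"),  # resolvers like dns.nyu.edu -> NY surrogate
    (ipaddress.ip_network("0.0.0.0/0"), "58.15.100.152"),     # everyone else -> CA surrogate
]


def answer(qname: str, resolver_ip: str, ttl: int = 10) -> tuple[str, int, str, str]:
    """Return a simplified A record (name, TTL, type, address) chosen by longest-prefix match."""
    addr = ipaddress.ip_address(resolver_ip)
    matches = [(net, surrogate) for net, surrogate in POLICY if addr in net]
    _, surrogate = max(matches, key=lambda m: m[0].prefixlen)  # most specific rule wins
    return (qname, ttl, "A", surrogate)


print(answer("www.cnn.com", "128.4.4.12"))   # dns.nyu.edu
print(answer("www.cnn.com", "203.0.113.7"))  # some other resolver (documentation address)
```

Note that the policy sees only the requesting DNS server's address, not the end client's, which is the root of the originator problem discussed later.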
DNS based Request-Routing
Q: How does the Akamai DNS know which surrogate is closest?
[Figure: the Akamai CDN serving www.cnn.com, with the Akamai DNS and two surrogates, california.cnn.akamai.com (58.15.100.152) and newyork.cnn.akamai.com (145.155.10.15); client test.nyu.edu (128.4.30.15) uses local DNS server dns.nyu.edu (128.4.4.12)]
1) DNS query: www.cnn.com
DNS response: A 145.155.10.15 (newyork.cnn.akamai.com)
DNS based Request-Routing
[Figure: test.nyu.edu (128.4.30.15) sends a DNS query for www.cnn.com to its local DNS server dns.nyu.edu (128.4.4.12); the Akamai CDN runs the Akamai DNS and two surrogates]
DNS based Request-Routing
[Figure: the Akamai DNS (Akamai CDN, serving www.cnn.com) compares the paths from the requesting DNS 76.43.32.4 to surrogates 145.155.10.15 and 58.15.100.152: available bandwidth = 10 kbps, RTT = 10 ms vs. available bandwidth = 5 kbps, RTT = 100 ms; client 76.43.35.53 uses client DNS 76.43.32.4]
DNS response: www.cnn.com A 145.155.10.15, TTL = 10s
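The selection step in this example can be sketched as "score each measured path and return the best surrogate." A minimal sketch: the scoring rule (lowest RTT, then higher bandwidth) is a hypothetical policy, and pairing the 10 ms / 10 kbps path with 145.155.10.15 is an assumption consistent with the response shown:

```python
from dataclasses import dataclass


@dataclass
class PathMetrics:
    surrogate: str
    rtt_ms: float
    bandwidth_kbps: float


def pick_surrogate(paths: list[PathMetrics]) -> str:
    # Hypothetical policy: lowest RTT wins; higher available bandwidth breaks ties
    best = min(paths, key=lambda p: (p.rtt_ms, -p.bandwidth_kbps))
    return best.surrogate


# Measurements toward the requesting DNS 76.43.32.4 (values from the slide;
# pairing the 10 ms / 10 kbps path with 145.155.10.15 is an assumption)
paths = [
    PathMetrics("145.155.10.15", rtt_ms=10, bandwidth_kbps=10),
    PathMetrics("58.15.100.152", rtt_ms=100, bandwidth_kbps=5),
]
record = ("www.cnn.com", "A", pick_surrogate(paths), 10)  # TTL = 10 s, as on the slide
print(record)
```

The short TTL forces the client DNS to ask again after 10 seconds, giving the CDN frequent chances to re-steer requests as measurements change.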
DNS based Request Routing: Discussion
Originator Problem: the client may be far removed from the client DNS
Client DNS Masking Problem: virtually all DNS servers, except for root DNS servers, honor requests for recursion
Q: Which DNS server resolves a request for test.nyu.edu?
Q: Which DNS server performs the last recursion of the DNS request?
Hidden Load Factor: a DNS resolution may result in drastically different load on the selected surrogate, which complicates load balancing requests and predicting load on surrogates
CDN Strategies
Pushing content closer to the users: hop count reduction (overall network traffic reduction)
CDN Strategies:
Limelight: places CDN servers near a small number of ISP core networks
Akamai: places CDN servers deep inside a large number of ISP networks' sites
Nano Data Center (NaDa): uses home gateways (STBs/modems) as CDN servers (peer-to-peer delivery among NaDa servers)
[Figure: network hierarchy from the core network (core routers) through the metro/edge network (edge routers) to access (OLT/ONT, DSLAM/modem), marking the digital media delivery platform and NaDa placement]
Summary
P2P architecture and its benefits
P2P content distribution
BitTorrent, Skype
Content distribution network (CDN)
DNS-based request routing