Lecture 5: P2P vs ISP - Ofir Israel + Guy Paskar

Download Report

Transcript Lecture 5: P2P vs ISP - Ofir Israel + Guy Paskar

Ofir Israel
Guy Paskar
An
Internet
Tale
 Once upon a time..
 Users unhappy (slow connection)
 ISPs unhappy (poor revenues)
 Then came Broadband access...
 And everybody were happy
The Villain arrives
 P2P File-Sharing Applications (Kazaa,
eMule, BitTorrent, etc..)
 Users love it!
 Good and free content, overnight downloads
 ISPs hate it!
 Users using their entire link
 Internet link utilization gone wild
 More bandwidth costs more money!
But is it really a villain?
 Users love it
 Driving force for broadband adoption
 Increased revenues for ISPs
 What should the ISPs do??
Some Ideas
 User unfriendly ideas
 Increase subscription
cost
 Volume-based pricing
 Block/shape P2P traffic
(priority for non-P2P
packets)
 User friendly ideas
 Acquire more BW
 Network caching
Today..
 Generally understand the problem – DONE! (? )
 Describe an analytical model to help us understand
situation better
 Describe one practical solution and it’s empirical
results (Hint: it works)
Research Goals
• Modeling framework to analyze interactions between
P2P File-Sharing users and their ISPs
• Basic insight about system dynamics
• Used to evaluate different strategies to manage P2P
traffic
Meet the Players
 User
 Generates queries (P2P application finds the object and
retrieves it)
 Pays a subscription price, has QoS expectations
 What’s popular, what’s not
 ISP
 Goal: TO MAKE MONEY
 Sets subscription price
 Controls bandwidth
 Influences P2P app behavior
System Setting
n users
inside
ISP
User-ISP
ul., dl.
bandwidt
h
ISP-ISP ul.,
dl. bandwidth
User generates query
Gets a response from within the ISP
Or from a user in another ISP
N users in the
world
The Simple System Model
Aggregate
Average
query
queryrate
rate
Prob. P2P
App.
locates
object
Prob.
Object is
located
inside ISP
Unconstrained
downloads from
within the ISP
System
throughput
Object retrieval prob. (QoS):
User Utility Function
Benefit
Shape parameter
Cost
Subscription cost
Object retrieval prob.
Users subscribe only if:
Equivalently, if:
is the minimal service level acceptable by
user i
ISP Utility Function
Benefit
Revenues from
subscribers’ fee
ISP starts service only if:
Fixed cost
Cost
Cost per unit of BW
Traffic Locality
 Probability that there exists at least one internal
replica of object replicated r times in the system
Number of files inside ISP
Number of files outside ISP
 Probability to download from internal replica
Locality
parameter
Minimum BW
 Reminder:
 So:
 Assuming
 We get
Minimum BW
 Non-linear behavior (on n)
 More users  more locality  less BW needed
 Can be zero if n large enough (self-sustainability)
 Dependant on multiple parameters
 Self-Sustainability
Simple(?!) Model
Impact of Object Replication (r)
 More replicas  Better locality  Lower Bd needed
 Bmin has two roots: x1 – No users, x2 – Enough users for self-
sustainability
Impact of external QoS ( )
 Higher external QoS  More BW needed (because there
are more replicas externally)
Impact of prob. to locate objects (q)
 Some ISPs drop queries. This graph shows them different.
Impact of prob. to retrieve objects internally (Gamma)
 Det. by the ability to find a local object given that it exists.
 Can be influenced by the ISP – this graph shows it should.
Model Refinements
 Simple Model
 Users’ access BW are
unconstrained
 Object popularity is
identical
 Users availability
identical
 Refined Model
 Relax these assumptions
 Propose object
popularity and
replication model
Model Refinements
 We adopt a processor sharing model with rate limit bd
to describe the sharing of Bd
Now each user is
limited by it’s own BW.
Queue Model
Model Refinements
 We introduce a new parameter:
that describes user
patience
 Denote b as the initial download rate, and assuming
the decision to abort is made at the beginning then the
prob. pg to continue the transfer is:
 Larger eta  user claims to get a rate close to what
they paid for
Bmin as a function of bd=bu with different values of
gamma
 Higher gamma  smaller bu needed for self-sustainability
 Optimal gamma is not gamma=1 !!!
 For bu < 250 the BW available inside the ISP is not enough to satisfy
demanding users
Impact of asymmetric access BWs
 Cost for ISP increases as ratio increases (what about ADSL??)
 Larger bu  Better locality  lower Bd
Conclusions
 Locality is good for the ISPs
 More replicas, larger querying probability, larger
upload bandwidth for users’ access, larger probability
to retrieve objects internally (gamma)  SELF
SUSTAINABILITY == GOOD
 Reading slow leads to better understanding 
Further Reading
 Original paper of course:
Garetto et al, “A modeling framework to
understand the tussle between ISPs and peer-topeer file-sharing users” in Performance Evaluation,
June 2007
 Same as the original paper but talks about ISP-ISP
connections:
Wang et al, “Modeling the Peering and Routing
Tussle between ISPs and P2P Applications” in the
proceedings of IWQoS 2006
BREAK ?
Academic Work
 Oracle-based vs. non-Oracle-based (e.g., with ISP
cooperation or without)
 Legality issues, reluctance issues
 Improvements via locality research
 Network location or Geographic location?
 Which method of network location?
 Improvements via redirection research
 Can we redirect traffic to inexpensive links?
 Many more
Part 2
Taming the Torrent - A
Practical Approach to Reducing
Cross-ISP Traffic in Peer-to-Peer
Systems
David R. Choffnes and Fabián E. Bustamante
The problem




Over 66% of P2P users & growing
But how do we know which peer to choose?
Which peers? Trackers provide a random subset of
peers in the torrent
Random peer connections → growing ISP operation
costs.
So , how do we know if a suggested peer is inside our Isp
or outside?
 We want to reduce cross isp transport. Meaning use the
“closest” peers.
 But , how can we do that?

The ISP Perspective
 P2P performance - key factor for service upgrade &
selection by users
 A major engineering challenge for ISPs
 ≈70% of the Internet traffic
 But , a lot of cross isp ,means a lot of cost for the Isp.
 What can the isp do in order to fight the p2p users?
Isp methods and its problems
 ISPs shape traffic directed to standard ports
 P2Ps move to dynamic, non-standard ports
 ISPs turn to deep-packet inspection to identify &
shape P2P flows
 P2Ps encrypt their connections
 ISPs place caches and/or spoofs TCP RST msgs
 Legality issues. (Some ways to overcome this – in Israel!)
So good solution must be agreed by the p2p users!
One solution: Oracles.
 Suggestion – the isp’s itself will have to implement an
oracle, this oracles will guide the user which peers to
choose.
 Help reduce cross-ISP traffic
 This solution looks appealing But:
 Assumes P2P users & ISPs trust each other
 Misses incentive for user adoption
 Therefore not so good after all
34
The authors suggestion
 CDN’S – content distribution network. What is it?
 CDNs attempt to improve web performance by
redirecting requests to replica servers
 The goal is to help content providers (i.e. CNN) to
distribute content by redirecting requests to replica
servers that are:


Topologically proximate
Provide lower-latency
 But how do they do that?
How does CDN work?
 There are some ways that a CDN works by for example:
 Way 1 : I want to go to cnn.com  dns lookup ,
directs me to the domain name of the CDN
(cnn.akamai.com) CDN sends me to the right
replica.
 Way 2: I want to go to cnn.com dns lookup  first
page from original cnn.com, directs me to CDN
server  sends to right replica.
Reusing CDNs’ network views
 Client’s request redirected to “nearby” server
 Client gets web site’s DNS CNAME entry with domain name in CDN
network
 Hierarchy of CDN’s DNS servers direct client to nearby servers
Multiple redirections to find
nearby edge servers
Internet
Hierarchy of CDN
DNS servers
Customer DNS
servers
Web replica servers
(3)
Clients and replica
servers
are
“nearby”]
Client is given
2 web
replica
(4)
servers (fault tolerance)
Client gets CNAME entry
(2)
with domain name in Akamai
Client requests
translation for Yahoo
LDNS
(5)
(6)
(1)
Web client
37
Another web client
The authors suggestion
 So how do we use CDN?
 We are going to recycle data that is already being
collected by Content Distribution Networks, and use
it.
 But how?
 By simply comparing DNS redirections.
 Assumptions :
Links between “nearby” hosts cross few ISPs
If two hosts are close to the same CDN replica
servers, they are close to each other
Reducing cross-ISP traffic
 So we can use the CDN’s data, what are the advantages
for this recycilng?
 Does not requires any trust between isp and p2p users
 The infrastructure is already exist
 And most importantly reduces cross isp traffic without
harming the p2p users (even improving)
An approach to reducing cross isp
 Introducing “Ono”



Extension (plugin) to Azureus BitTorrent client
Will use CDN and ratio maps(?) to determine who is “closer”
We will describe its implementation
 Uses multiple CDN customers



Only DNS resolution, no content download needed
Adaptive lookup rates on CDN names
Represent redirection behavior using ratio-maps
Ratio Maps
 A ratio map represents the frequency of redirecting to a
specific replica
 Number of replicas is usually small (max 31)
 Keep a time window about 24 hours
 How does it looks?
Ratio maps represantation
 The ratio map of a peer (a) is a set of (replica server,
ratio)
for peer a
 Specifically, if peer a is redirected toward replica server
r1 75% of the time window, and toward replica server r2
25% of the time window, then the corresponding ratiomap is
 The sum of all
in a given ratio map equals one
 For each peer there exist a ratio map
 But what can we do with it?

choosing peers by ratio map
 2 peers has close ratio map , than we say that they
are close. ( possibly in the same network), and the
ooposit
 So , we need a calculation that will determine for
2 peers if they are “close” or not.
 Than we can check for all available peers and choose the
“closest” one
 For that we define cosine similarity for 2 peers
Cosine-similarity
 the cosine similarity of two maps will range from 0 to 1, since
the term frequencies cannot be negative
 If cos_sim(a,b) = 0 , the vectors are orthogonal
 if cos_sim(a,b) = 1 than they are the same
 This is very close to dot product
 And we determine a threshold currently 0.15 , if
cos_sim(a,b)>0.15 than we recommend these peers as close
Implementation
 Ono, an extension to Azureus client
 Performs periodic DNS lookups on popular CDN and
create a ratio map

Periodically updates the ratio-maps
 Exchanging ratio maps for cos-sim(a,b):
 On Handshake
 From trackers
 But how do we deal with peers not using Ono?
Ono also attempts to perform DNS lookups on behalf of
other peers that it encounters, to determine their ratio maps
 How?
 Taken from Ono code : getting the other peer DNS server
 And querying it

So what's now?
 Get ratio information from other peers that got from
tracker , and understand who is close

When determine similar redirection behavior, attempts to bias
traffic towards that peer by ensuring that the connection is
always on
 Sends Ono information to supporting trackers(in case of
supporting trackers)
 But what is the cost? How much is our overhead?
 18KB upstream, 36KB downstream per day
 Computation of cosine-similarity is easy
Important notes
 CDN names being used:
 Initialization of ratio map:
 DNS on each CDN name at most once every 30 sec. for 2
min. this gives basis ratio map
 After this phase


If the redirection info for CDN name similar to prev. query 
the interval between queries increases by 1 min.
Otherwise the interval is halved(to a min. of 30 sec.)
Some statistics regarding Ono
 Details for 2007 :
 > 195,000 users worldwide
 … collecting ~15GB of data per day
Empirical results
 Over 120,000 peers use Ono
 Ono collects extra network data
 Samples transfer rates for each connection every 5 sec.
 Get RTT for endpoint using pings
 Get Trace-route between end points



Note :
Not easy to determine cross-ISP hops
IP hops is easy and gives some measure
Empirical resualts






So in practice Trace Route gives a router level views of path
between hosts. BUT an ISP can contain many routers, we
wish for a metric that is closely correspond to ISP hops.
How do we get this metric?
Autonomous systems , how?
Although there is no 1 to 1 correlation between AS and
ISPs, the number of AS hops gives us an upper bound
estimate on the number of cross-ISP hops
So in practice we generate AS level path info from our
trace-routes using mapping that can be provided
Example :
Example
Empirical results



Ono finds shorter paths
Median in less than half
More than 20%
are only one hop
away, via less
than 2%
Reducing cross-ISP traffic
 Average number of AS hops to reach Ono-
recommended/random peers
> 30% of paths to Ono-recommended
peers do not leave the AS of origin
Note BT curve includes all peers, either
Ono-recommended or randomly selected
53
Finding nearby peers
Two orders of magnitude difference
And, on average, 31% lower loss rates!
54
Improving transfer performance
Heavy
Tail – Average
performance
improves
by up
31%
DSL
in England
-- 4/8Mbps
down, only
768Kbps
ISP bandwidth allocation policy
Median
islink
~2KB/s
brings
bottleneck
the
access
Even when
Ono doesn’tto
help,
it difference
allows
BT to
naturally
select faster peers
55
… with the right bandwidth
allocation policy
 Romania: 50 Mb/s in metro-area, 4 Mb/s
outside
883% median improvement
56
Helpful ISPs can help themselves
57
Duscussion
 Absolute network positioning system , and just throw
away the “far” peers
 Problem – all peers must take a part in the service, in
contrast to our method
 Use just AS numbers
 There are ISPs (like comcast) that have many AS
numbers, so using these numbers can restrict cross-AS
traffic that is not cross-ISP traffic
Conclusion
 Recycling network views collected by CDNs
 The method reduces cross-ISP traffic
 Performance of peers is not effected(we saw this)
 Scalable (the more clients adopt it, the more accurate
the bias would get)
 Available easily and freely
 Therefore the method is good and can provide good
results in reducing Cross-ISP traffic
Question
 TBD
Extras