ppt1 - People.cs.uchicago.edu

Download Report

Transcript ppt1 - People.cs.uchicago.edu

CSPP 54001:
Large-Scale Networked Systems
Week 5:
P2P Technologies and Applications
Matei Ripeanu
P2P Definition(s)
A number of definitions coexist:
 Def 1: “A class of applications that takes advantage of
resources — storage, cycles, content, human presence —
available at the edges of the Internet.”



Edges often turned off, without permanent IP addresses
Def 2: “A class of decentralized, self-organizing distributed
systems, in which all or most communication is symmetric.”
Lots of other definitions that fit in between
CSPP 54001 – February 7, 2003
Week 5: P2P





Definitions
Impact
Uses and Examples
Principles
More in-depth Case Studies




Gnutella
CDN: P2P, Web and Akamai traffic analysis
DHTs
Your Role
CSPP 54001 – February 7, 2003
P2P Impact: Widespread adoption


KaZaA – 170 millions downloads (3.5M/week)
the most popular application ever!
Number of users for file-sharing applications
(www.slyck.com,
FastTrack
4,114,120
2/4/2003)
iMesh
1,375,199
eDonkey
DirectConnect
Blubster
Gnutella
CSPP 54001 –Cvernet
February 7, 2003
569,097
136,552
97,128
92,678
91,750
P2P Impact (2): Huge traffic

P2P generated traffic now dominates the
Internet load



Internet2 traffic statistics
Cornell.edu (March ’02): 60% P2P
UChicago estimate (March100%
‘01): Gnutella control
90%
traffic about 1% of all Internet
traffic.
80%
70%
60%
Other
50%
Data transfers
40%
Unidentified
30%
File sharing
20%
10%
0%
CSPP 54001 – February 7,
2003
Feb.'02
Aug.'02
Feb.'03
P2P Impact (3)

Shows a huge pool of underutilized resources that
can be put to work

Seti@Home statistics
Users
Last 24 Hours
4,236,090
2,365
764M
1.13M
Total CPU time
1.3 M years
1.3 K years
Floating Point
Operations
2.564760e+21
4.441277e+18
(51.40 TeraFLOPs/sec)
Results received

Total
Entropia statistics :
CSPP 54001 – February 7, 2003
P2P Impact (4)



Might force a few companies to change their
business models
Data copying and distribution carries zero
almost cost now  this might impact
copyright laws
New research domain  grants and PhD
theses 
CSPP 54001 – February 7, 2003
Week 5: P2P






Definitions
Impact
Applications
Mechanisms
More in-depth Case Studies
Your Role
CSPP 54001 – February 7, 2003
Applications: Number crunching



Examples: Seti@Home, Entropia, UnitedDevices,
DistributedScience, many others
Approach suitable for a particular class of problems.
Some characteristics (for Seti@Home):






Massive parallelism
Low bandwidth/computation ratio
Fixed-rate data processing task
Error tolerance
Users do donate *real* resources
What are the problems?



$1.5M / year
extra consumed power
Centralized. Does it scale?
How to prevent cheating?
How to extend the model
to problems
CSPP 54001
– February 7,that
2003 are not massively parallel
Applications: File sharing


The ‘killer application’ to date
Too many to list them all:

Napster, FastTrack (KaZaA, KazaaLite), Gnutella
(LimeWire, Morpheus, BearShare), iMesh
CSPP 54001 – February 7, 2003
Applications: Content distribution



uServ project: your own Web Server at no (or little) cost
Goal: Provide a system and software that users run to
serve content on the web with their own machines
Alternatives:




Run your own webserver
Hosting Services (free or paid)
Other P2P applications
Not just a webserver, but a webserving community:


Users cooperatively improve availability (through site
replication/mirroring)
Users cooperatively liberate fire-walled content (through P2P
proxying/relaying)
CSPP 54001 – February 7, 2003
Applications: Content Distribution


Streaming: the user plays the data as as it arrives
Possible solution:
If A is the first
user, A gets the
stream from the
server
 Else, A can get the
stream from the
central site or a set
of users who have
already been
receiving the stream

source
Oh, I am exhausted!
P2P approach
CSPP 54001 – February 7, 2003
Client/server approach
Applications: Measurements

Evaluate the performance of your Web site
form end-user perspective (example: Porivo
Networks)


Multiple views on your site performance
Generating Internet statistics


Connectivity statistics
Routing errors
CSPP 54001 – February 7, 2003
Measurements: The Performance “Blind Spot”
Back-end
Infrastructure
Network
Landscape
Last-mile
“Blind Spot”
Datacenter Testing
“Beacon”
Database
Web server
Backbone
ISP
Enterprise Provider
T1
Firewall
Corporate
Network
ISP
App server
Backbone
3rd party
content
Regional
Network
Corporate
User
Major
Provider
Local ISP
Component
Testing
•BMC
•Mercury Interactive
•Tivoli
•ProactiveNet
•HP OpenView
•Computer Associates
Datacenter
Monitoring
•Keynote Systems
•Mercury Interactive
•BMC/SiteAngel
•Service Metrics
Consumer
User
Critical to estimate end-to-end
performance
CSPP 54001 – February 7, 2003
Slide source: www.porivo.com
Measurements: End-to-end Performance
Back-end
Infrastructure
Database

Network
Landscape
Web server

App server

Backbone
ISP
Enterprise Provider
T1
Firewall


Corporate
Network
ISP
Backbone
3rd party
content
Regional
Network
Corporate
User
Major
Provider
Local ISP
Component
Testing
Datacenter
Monitoring
Consumer
User
End-to-end
Web Performance
Testing
CSPP 54001 – February 7, 2003
Slide source: www.porivo.com
More applications …






Instant messaging (Yahoo, AOL)
Collaborative environments (Groove)
Backup storage (HiveNet, OceanStore)
Spam filtering
Anonymous email
Censorship-resistant publishing systems
(Ethernity, Freenet)
CSPP 54001 – February 7, 2003
Week 5: P2P






Definitions
Impact
Uses and Examples
Mechanisms
More in-depth Case Studies
Your Role
CSPP 54001 – February 7, 2003
Mechanisms (1)

To obtain a resilient system


 integrate multiple components with uncorrelated
failure curves. Use data and service replication.
To improve quality of service delivered


 integrate multiple providers with uncorrelated
demand curves (this way less over-provisioning is
necessary for each of them)
 Move service delivery closer to the user
CSPP 54001 – February 7, 2003
Mechanisms (2)

To provide anonymity


Detect anomalies, generate good statistics


 use large number of independent components,
“hide in the crowd” and make search impossible (or
costly)
 Use multiple views
Other facts


It is not clear what’s the QoS acceptable for users
(sometime they are willing to accept a degradation in
service quality if they get it for free)
Social engineering – building communities
CSPP 54001 – February 7, 2003
Week 5: P2P





Definitions
Impact
Uses and Examples
Mechanisms
More in-depth Case Studies




Gnutella
Traffic analysis: P2P vs. Web vs. Akamai
DHTs
Your Role
CSPP 54001 – February 7, 2003
Gnutella Network
Why analyze Gnutella network?
 Large scale
– up to 500k nodes, 100TB data, 10M files today
 Self-organizing network
 Fast growth in its early stages
– more than 50 times during first half of 2001
 Open architecture, simple and flexible protocol
 Interesting mix of social and technical issues
CSPP 54001 – February 7, 2003
Gnutella protocol overview
 P2P file sharing app. on top of an overlay network


Nodes maintain open TCP connections
Messages are broadcasted (flooded) or back-propagated
 (Initial) protocol
Membership
Query
Broadcast
(Flooding)
Backpropagated
PING
PONG
QUERY
QUERY HIT
File download
Node to node
GET, PUSH
 Protocol refinements (2001 and later)
 Ping messages used more efficiently, Vendor specific
extensions, GWebCaches, XML searches, super-nodes (2layer hierarchy).
CSPP 54001 – February 7, 2003
Gnutella search mechanism
Steps:
• Node 2 initiates search for file A
7
1
A
4
2
6
3
5
CSPP 54001 – February 7, 2003
Gnutella search mechanism
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
7
1
4
2
3
A
6
A
5
CSPP 54001 – February 7, 2003
Gnutella search mechanism
A
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
7
1
4
2
6
3
A
A
5
CSPP 54001 – February 7, 2003
Gnutella search mechanism
A:7
A
7
1
4
2
6
3
A:5
A
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
5
CSPP 54001 – February 7, 2003
Gnutella search mechanism
7
1
4
2
3
A:7
A:5
A 6
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
• Query reply message is backpropagated
5
CSPP 54001 – February 7, 2003
Gnutella search mechanism
7
1
A:7
2
4
A:5
6
3
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
• Query reply message is backpropagated
5
CSPP 54001 – February 7, 2003
Gnutella search mechanism
download A
1
7
4
2
6
3
5
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
• Query reply message is backpropagated
• File download
CSPP 54001 – February 7, 2003
Gnutella: tools for network exploration
 Eavesdropper - modified node inserted into
the network to log traffic.
 Crawler - connects to all active nodes and
uses the membership protocol to discover
graph topology.
 Client-server approach.
CSPP 54001 – February 7, 2003
Gnutella: Network size
Explosive growth in 2001, but slowly shrinking after
 High user interest
 Users tolerate high
latency, low quality
results
 Better resources
 DSL and cable modem
nodes grew from 24%
to 41% over first 6
months.
CSPP 54001 – February 7, 2003
Gnutella: Growth Invariants
 (1) Unchanged average node connectivity
Number of links ('000)
 3.4 links/node on average
200
150
100
50
0
0
10000
20000
30000
40000
50000
Number of nodes
CSPP 54001 – February 7, 2003
Gnutella: Growth Invariants
Average node-to-node
distance varied only
25% while the network
grew 50 times over 6
months
Percent of node pairs (%)
 (1) Unchanged average node connectivity
 (2) Node-to-node distance maintains similar
distribution
50%
40%
30%
20%
10%
0%
1
2
3
CSPP 54001 – February 7, 2003
4
5 6 7 8 9 10 11 12
Node-to-node shortest path (hops)
Is Gnutella a power-law network?
Power-law networks: the number of links per node
follows a power-law distribution
N = L-k
Num. of nodes (log scale)
10000
November 2000
1000
100
10
1
1
10
Number of links (log scale)
100
Examples:
 The Internet,
 In/out links to/from
HTML pages,
 Citations network,
 US power grid,
 Social networks.
Implications: High tolerance to random node failure but low
reliability when facing CSPP
of an
‘intelligent’
adversary
54001
– February 7, 2003
Is Gnutella a power-law network?
 Later, larger networks display a bimodal distribution
 Implications:

High tolerance to random
node failures preserved
Increased reliability
when facing an
attack.
10000
Number of nodes
(log scale)

May 2001
1000
100
10
1
1
CSPP 54001 – February 7, 2003
10
100
Number of links (log scale)
Gnutella: Query distribution


Similar to Web pages popularity: Zipf
distribution for query popularity
Significance: caching will work well
CSPP 54001 – February 7, 2003
Gnutella: Traffic analysis
messages per secod
  6-8 kbps per link over all connections
 Traffic structure
Message Frequency
25
changed over time
.
20
Ping
Push
Query
Other
15
10
5
CSPP 54001 – February 7, 2003
353
321
289
257
225
193
161
129
97
65
33
1
minute
Gnutella: Total generated traffic
1Gbps (or 330TB/month)!



Note that this estimate excludes actual file
transfers
Q: Does it matter?
Compare to 15,000TB/month estimated in US
Internet backbone (Dec. 2000)
CSPP 54001 – February 7, 2003
Gnutella: Topology mismatch
A
B
F
D
E
C
G
H
Physical links
Logical (overlay) links
Perfect mapping!
CSPP 54001 – February 7, 2003
Gnutella: Topology mismatch
A
B
F
D
E
C


G
H
Inefficient mapping
Link D-E needs to support six times higher traffic.
CSPP 54001 – February 7, 2003
Gnutella: Topology mismatch
The overlay network topology doesn’t
match the underlying Internet
infrastructure topology!



40% of all nodes are in the 10 largest
Autonomous Systems (AS)
Only 2-4% of all TCP connections link nodes
within the same AS
Largely ‘random wiring’
CSPP 54001 – February 7, 2003
Gnutella: Free Riding
 More than 25% of Gnutella
clients share no files; 75%
share 100 files or less
 Conclusion: Gnutella has a
high percentage of free
riders
 If only a few individuals
contribute to the public
good, these few peers
effectively act as
centralized servers.
Adar and Huberman (Aug ’00)
CSPP 54001 – February 7, 2003
Gnutella: Summary
 Gnutella: self-organizing, large-scale, P2P
application based on overlay network. It works!
 Discovered growth invariants specific to largescale systems that:
 Help predict resource usage.
 Give hints for better search and resource
organization techniques.
 Growth hindered by the volume of generated
traffic and inefficient resource use.
CSPP 54001 – February 7, 2003
Week 5: P2P





Definitions
Impact
Uses and Examples
Mechanisms
More in-depth Case Studies




Gnutella
Content Distribution Systems traffic analysis: P2P vs. Web vs.
Akamai
DHTs
Your Role
CSPP 54001 – February 7, 2003
Week 5: P2P






Definitions
Impact
Uses and Examples
Mechanisms
More in-depth Case Studies
Your Role
CSPP 54001 – February 7, 2003
Your Role





Network administrator
Lawyer
Application designer/developer
Entrepreneur
User
CSPP 54001 – February 7, 2003
More information
Links to reports, papers, slides used to prepare
this presentation:
 http://www.cs.uchicago.edu/~matei/P2P/
CSPP 54001 – February 7, 2003
Course Outline (Subject to Change)
1.
2.
3.
4.
5.
(January 9th) Internet design principles and protocols
(January 16th) Internetworking, transport, routing
(January 23rd) Mapping the Internet and other networks
(January 30th) Security
(February 6th) P2P technologies & applications (Matei Ripeanu)
(plus midterm)
6.
7.
8.
9.
10.
(February 13th) Optical networks (Charlie Catlett)
*(February 20th) Web and Grid Services (Steve Tuecke)
(February 27th) Network operations (Greg Jackson)
*(March 6th) Advanced applications (with guest lecturers: Terry
Disz, Mike Wilde)
(March 13th) Final exam
* Ian Foster
is out of town.
CSPP 54001 – February 7, 2003