CIS 6930.5: Federated Distributed Systems
Adriana Iamnitchi (Anda)
[email protected]
Contact Info
Email: [email protected]
Office: ENB 334
Office hours: Wednesdays, 10:45 – 1:00 and by appointment
Course page:
http://www.csee.usf.edu/~anda/CIS6930.5
CIS6930.5: Federated Distributed Systems (Fall 2005)
Examples of Distributed Systems
ATT web
A Sensor Network
Gnutella network
The Internet
Definition (a version)
A distributed system is a collection of
autonomous, programmable, failure-prone
entities that are able to communicate through
a communication medium that is unreliable.
– Entity=a process on a device (PC, PDA, mote)
– Communication Medium=Wired or wireless
network
“Federated” – spanning multiple institutional
or network (DNS) domains
Outline
Case study: SETI@home, Napster, Gnutella
Administrivia
SETI@home Operations
[Figure: SETI@home data flow. Components shown: data recorder, DLT tapes, splitters, WU (work unit) storage, data server, screensavers, CGI program, result queue, acct. queue, garbage collector, user DB, science DB, master DB, redundancy checking, RFI elimination, repeat detection, tape archive/delete, tape backup, web page generator, web site.]
How does it work?
SETI@home
Master-worker architecture
Fixed-rate data processing task
Low bandwidth/computation ratio
Independent parallelism
Error tolerance
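The master-worker pattern listed above can be sketched in a few lines. This is a toy illustration, not SETI@home's actual code: the function names, the "analysis" (taking the maximum of a chunk), the round-robin assignment, and the majority-vote rule are all assumptions made for the example. It shows independent work units being handed to several workers each, so that one faulty result does not corrupt the outcome.

```python
from collections import Counter

def split_into_work_units(samples, unit_size):
    """Split a recorded data stream into independent work units."""
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]

def honest_worker(unit):
    """Stand-in for the real signal analysis: report the strongest sample."""
    return max(unit)

def faulty_worker(unit):
    """A malfunctioning or cheating worker that returns a bogus result."""
    return -1

def run_master(samples, workers, unit_size=4, replicas=3):
    """Hand each work unit to `replicas` workers and keep the majority answer."""
    accepted = []
    for index, unit in enumerate(split_into_work_units(samples, unit_size)):
        # Round-robin assignment (an assumption); workers never talk to each other.
        chosen = [workers[(index + k) % len(workers)] for k in range(replicas)]
        votes = Counter(worker(unit) for worker in chosen)
        result, count = votes.most_common(1)[0]
        # Accept a result only when a strict majority of replicas agree.
        accepted.append(result if count > replicas // 2 else None)
    return accepted

samples = [3, 9, 1, 4, 7, 2, 8, 6, 5, 0, 2, 1]
workers = [honest_worker, honest_worker, faulty_worker, honest_worker]
results = run_master(samples, workers)
print(results)
```

Because the units are independent, the master is the only coordination point, which is also why it becomes the single point of failure and the bottleneck.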
History and Statistics
Conceived 1995, launched April 1999
“scientific experiment that uses Internet-connected
computers in the Search for Extraterrestrial
Intelligence (SETI). You can participate by running
a free program that downloads and analyzes radio
telescope data.”
No ET signals yet, but other results
                            Total                   Last 24 Hours (as of Wed Feb 23 07:04:51)
Users                       5,361,313               4,391
Results received            1,779 millions          5 million
Total CPU time              2.2 million years       3610.717 years
Average CPU time/work unit  10 hr 58 min 14.0 sec   6 hr 19 min 30.1 sec
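As a rough sanity check on these statistics, the average CPU time per work unit should be close to total CPU time divided by results received. The sketch below assumes a 365.25-day year (the slide does not say which convention was used) and uses the rounded figures from the slide, so only approximate agreement with the reported averages can be expected.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: Julian year

def avg_hours_per_work_unit(cpu_years, results):
    """Average CPU hours spent per returned result."""
    return cpu_years * HOURS_PER_YEAR / results

# Figures copied from the slide (rounded there, so the check is approximate).
total_avg = avg_hours_per_work_unit(2.2e6, 1779e6)
last_day_avg = avg_hours_per_work_unit(3610.717, 5e6)
print(f"total: {total_avg:.2f} h, last 24 hours: {last_day_avg:.2f} h")
```

Both values land within a few minutes of the reported 10 hr 58 min and 6 hr 19 min, which suggests the rows of the table are mutually consistent.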
Public-resource computing
Utilizes idle computing cycles over Internet
Other systems:
– Original: GIMPS, distributed.net
– Commercial: United Devices, Entropia,
Porivo, Popular Power
– Academic, open-source
> Cosm, folding@home
None has the popularity of SETI!
ET
How to get and retain users (from David Anderson,
the leader of the SETI@home project)
– Graphics are important (but monitors do burn in)
– Teams: users recruit other users
– Keep users informed
Science news
System management news
Periodic project emails
Reward users:
– PDF certificates
– Milestone pages and emails
– Leader boards (overall, country, …)
Millions and millions of computers!
(Problems)
Server scalability
Dealing with excess CPU time
Cheating
Bad behavior:
– Team recruitment by spam
– Sale of accounts on eBay
Malfunctions
Network bandwidth costs money
SETI@home: Summary
Master-worker design
– Centralized solution
>Master=central point of control
>Single point of failure
>Performance bottleneck
Incentives for participation
– Sometimes mean incentives for cheating
Massive (“embarrassing”) parallelism
Low bandwidth/computation ratio
Users do donate real resources: $1.5M/year in consumed power
More information:
http://setiathome.ssl.berkeley.edu
Outline
Case study: SETI@home, Napster, Gnutella
Administrivia
The File Location Problem
(Napster and Gnutella)
Where is file A?
Napster: How It Works
napster.com
• Client-server: Use central server to locate files
• Download files directly from peers
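The split between a central index and direct peer-to-peer transfer can be illustrated with a small sketch. Everything here is hypothetical: the class and function names, the peer identifiers, and the transfer rates are made up for the example, and the real Napster protocol is not reproduced. The point is only that the server stores who has which file, while the file itself never passes through it.

```python
class CentralIndex:
    """Toy stand-in for the central server: it records who has which file."""

    def __init__(self):
        self.holders = {}  # file name -> set of peer identifiers

    def upload_file_list(self, peer, file_names):
        """A peer announces the files it is willing to share."""
        for name in file_names:
            self.holders.setdefault(name, set()).add(peer)

    def search(self, name):
        """Return the peers that claim to have the file (not the file itself)."""
        return sorted(self.holders.get(name, set()))

def pick_best_peer(candidates, measured_rate):
    """Pick the candidate with the highest measured transfer rate."""
    return max(candidates, key=measured_rate)

index = CentralIndex()
index.upload_file_list("peer1", ["a.mp3", "b.mp3"])
index.upload_file_list("peer2", ["c.mp3"])
index.upload_file_list("peer3", ["a.mp3"])

candidates = index.search("a.mp3")
rates_kbps = {"peer1": 56, "peer2": 1500, "peer3": 1500}  # made-up measurements
best = pick_best_peer(candidates, rates_kbps.get)
print(candidates, best)
```

If the `CentralIndex` object disappears, no search can be answered even though every file is still available on some peer.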
Napster
1. File list is uploaded to napster.com by users.
2. User requests search at server (request and results go between user and napster.com).
3. User pings hosts that apparently have data. Looks for best transfer rate.
4. User retrieves file.
Napster: History
Program for sharing files over the Internet
History:
– 5/99: Shawn Fanning (freshman, Northeastern U.) founds Napster online music service
– 12/99: first lawsuit
– 3/00: 25% of UWisc traffic is Napster
– 2000: est. 60M users
– 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
– 7/01: # simultaneous online users:
Napster 160K, Gnutella: 40K, Morpheus: 300K
Napster: Summary
Centralized server:
– Client-server architecture
– Single logical point of failure
– Potential for congestion (bottleneck)
– Napster “in control” (freedom is an illusion)
No security:
– Passwords in plain text
– No authentication
– No anonymity
Outline
Public-resource computing
– Case study: SETI@home
Peer-to-peer systems
– Case study 1: Napster
– Case study 2: Gnutella
Discuss:
– Characteristics
– Impact
– Architecture
– Killer application
Gnutella: Search for Files with No Central Server
napster.com
Ideas?
Where is file A?
Gnutella: Search
Query (flooding): Where is file A?
Reply: I have file A. (sent by each peer that has the file)
Gnutella: History and Statistics
Gnutella history:
– 3/14/00: release by AOL, almost immediately withdrawn
– too late: 1,859,340 users on Gnutella on August 25, 2am
– many iterations to fix poor initial design
High impact:
– Versions implemented
– Different designs
– Lots of research papers/ideas
Network         Users
eDonkey2K       4,123,688
FastTrack       2,521,887
Gnutella        1,516,762
Overnet         1,146,880
DirectConnect   294,255
MP2P            251,137
(www.slyck.com, 06/24/’05)
What would you ask about Gnutella?
…
…
Gnutella: Heterogeneity
All Peers Equal? (1)
[Figure: peers connected over links of very different capacity: 10Mbps LAN, 1.5Mbps DSL, 56kbps modem]
Gnutella: Free Riding
All Peers Equal? (2)
More than 25% of Gnutella clients share no files; 75% share 100 files or less
Conclusion: Gnutella has a high percentage of free riders
If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.
Adar and Huberman (Aug ’00)
Flooding in Gnutella: Loop Prevention
Seen request already
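Flooding with the "seen request already" rule can be sketched as a breadth-first spread over the overlay graph. This is a simplified model, not the Gnutella wire protocol: the function name, the example topology, and the hop limit (`ttl`) are assumptions for illustration. Each peer forwards the query to all its neighbors, and a peer that has already handled the query drops any further copies, which is what keeps the flood from looping forever on cycles.

```python
from collections import deque

def flood_query(graph, files, origin, wanted, ttl=4):
    """Flood a query from `origin`; return (peers that reply, messages sent)."""
    seen = {origin}              # peers that have already handled this query
    replies = []
    messages = 0
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if wanted in files.get(peer, ()):
            replies.append(peer)         # "I have file A."
        if hops_left == 0:
            continue                     # hop limit reached: stop forwarding
        for neighbor in graph[peer]:
            messages += 1                # the query is sent on every link...
            if neighbor in seen:
                continue                 # ...but a peer that has seen it drops it
            seen.add(neighbor)
            frontier.append((neighbor, hops_left - 1))
    return replies, messages

# Hypothetical overlay with cycles (A-B-C and B-C-D).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"D": {"fileA"}, "E": {"fileA"}}

replies, messages = flood_query(graph, files, "A", "fileA")
print(replies, messages)
```

In this example 12 messages are sent to reach 4 other peers; the extra copies are the duplicates that flooding produces even when loops are prevented.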
Gnutella Topology Mismatch
Gnutella Summary
Search by flooding
Self-configuring
Phenomena:
– Not all peers equal
– Free riding
Problems:
– Topology mismatch
– Duplicates due to flooding
Good source for technical info/open questions:
– http://www.limewire.com/index.jsp/tech_papers
Problems in Distributed Systems
…
Communication
– Routing [IP,BGP]
– Multicast [IP multicast, SRM, RMTP]
Post and retrieve [Usenet]
Search [Gnutella, Kazaa, etc., Google]
Storage [Databases]
Coordination [SETI@Home]
…
Challenges
…
Failures
Scale
Asynchrony
Security
Deployment
Adoption
…
Challenges (2)
…
Learn from usage
– Example 1: The Internet
– Example 2: Napster
Conflicting requirements:
– Light but adaptable?
– Light but data-consistent? (think transactions)
– … (other examples?)
… (other examples?)
Course Organization/Syllabus/etc.
Administrivia: Grading
Reviewing: 30%
Discussion leading: 15%
Project: 55%
– Aim high!
– Have fun!
Administrivia: Paper Reviewing (1)
Goals:
– Think of what you read
– Get used to writing paper reviews
Reviews due by midnight before class
Follow the form when relevant.
State the main contribution of the paper
Critique the main contribution.
– Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
Administrivia:
Paper Reviewing (2)
Rate how convincing the methodology is.
Do the claims and conclusions follow from the
experiments?
Are the assumptions realistic?
Are the experiments well designed?
Are there different experiments that would be
more convincing?
Are there other alternatives the authors should
have considered?
(And, of course, is the paper free of
methodological errors?)
Administrivia:
Paper Reviewing (3)
What is the most important limitation of the approach?
What are the three strongest and/or most interesting
ideas in the paper?
What are the three most striking weaknesses in the
paper?
Name three questions that you would like to ask the
authors.
Detail an interesting extension to the work not
mentioned in the future work section.
Optional comments on the paper that you’d like to see
discussed in class.
Paper Reviewing (final)
Be professional in your writing
Have an eye on the writing style:
– Clarity
– Beware of traps: learn to use them in writing
and detect them in reading
– Detect (and stay away from) trivial claims.
E.g., 1st sentence in the Introduction:
“The tremendous/unprecedented/phenomenal
growth/scale/ubiquity of the Internet…”
Administrivia:
Discussion leading
Come prepared!
– Prepare discussion outline
– Prepare questions:
> “What if”s
> Unclear things
>…
– Similar ideas in different contexts
– Initiate short brainstorming sessions
Leaders do NOT need to submit paper reviews
Main goals:
– Keep discussion flowing
– Keep discussion relevant
– Engage everybody (I’ll have an eye on this, too)
Administrivia:
Projects
Combine with your research if relevant to the class
Get approval from all instructors if you overlap
final projects:
– Don’t sell the same piece of work twice
– You can get more than twice as many results with
less than twice as much work
Aim high!
– Put in one extra month and get a publication out of it
– It is doable
Try ideas that you postponed out of fear: it’s just a
class, not your PhD.
Administrivia:
Project deadlines (tentative)
Sept. 15: 1-page project proposal
Oct. 11: 3-page literature survey
– Know relevant work in your problem area
– If implementation project, list tools, similar projects
Nov. 11: 5-page Midterm project due
– Have a clear image of what’s possible/doable
– Report preliminary results
Last class(es): In-class project presentation
– Demo, if appropriate
Dec. 16:
– 10-page write-up
Next Class (Wed, August 31)
Read the 4 chapters from the Grid book
Send brief summaries (lists of ideas/problems
discussed, etc)
– Do not follow the reviewing form
– Be brief and efficient!
– Be BRIEF and EFFICIENT!
In-class discussion + some project ideas
Need discussion leader to team up with me for the
class next week:
– The structure of networks (pick 2):
1. Small-world file sharing communities, Iamnitchi, Ripeanu, Foster.
Infocom 2004.
2. On Power-Law Relationships of the Internet Topology, Faloutsos,
Faloutsos, and Faloutsos, SIGCOMM 1999
3. Mapping the Gnutella network, M. Ripeanu et al., IEEE Internet Computing
Journal 2002.
Questions?