ppt - Computer Science and Engineering
Download
Report
Transcript ppt - Computer Science and Engineering
CIS 6930.5:
Federated
Distributed Systems
Adriana Iamnitchi (Anda)
[email protected]
Contact Info
Email: [email protected]
Office: ENB 334
Office hours: by appointment (email me)
Course page:
http://www.csee.usf.edu/~anda/CIS6930.5
CIS6930.5: Federated Distributed Systems (Fall 2006)
2
CIS 6930.5: Course Goals
Primary
– Gain deep understanding of fundamental
issues that affect design of large-scale
federated distributed systems
– Map primary contemporary research themes
– Gain experience in network research
Secondary
– By studying a set of outstanding papers,
build knowledge of how to present research
– Learn how to read papers & evaluate ideas
CIS6930.5: Federated Distributed Systems (Fall 2006)
3
What I’ll Assume You Know
Basic Internet architecture
– IP, TCP, DNS, HTTP
Basic principles of distributed computing
– Asynchrony (cannot distinguish between
communication failures and latency)
– Partial global state knowledge (cannot know
everything correctly)
– Failures happen. In very large systems,
even rare failures happen often
If there are things that don’t make sense,
ask!
CIS6930.5: Federated Distributed Systems (Fall 2006)
4
Examples of Distributed Systems
ATT web
A Sensor Network
CIS6930.5: Federated Distributed Systems (Fall 2006)
Gnutella network
The Internet
5
Definition (a version)
A distributed system is a collection of
autonomous, programmable, failure-prone
entities that are able to communicate through
a communication medium that is unreliable.
– Entity=a process on a device (PC, PDA, mote)
– Communication Medium=Wired or wireless
network
“Federated” – spanning multiple institutional
or network (DNS) domains
CIS6930.5: Federated Distributed Systems (Fall 2006)
6
Outline
Case study (and project ideas):
– Volunteer computing: SETI@home and BOINC
– Grid computing
– P2P systems
Administravia
CIS6930.5: Federated Distributed Systems (Fall 2006)
7
CIS6930.5: Federated Distributed Systems (Fall 2006)
8
SETI@home Operations
tape backup
user DB
data
recorder
science DB
tape archive,
delete
redundancy
checking
master DB
DLT tapes
CGI program
garbage
collector
acct.
queue
result
queue
splitters
screensavers
WU storage
data
server
CIS6930.5: Federated Distributed Systems (Fall 2006)
web page
generator
web site
RFI
elimination
repeat
detection
9
How does it work?
SETI@home
Master-worker
architecture
Fixed-rate data processing task
Low bandwidth/computation ratio
Independent parallelism
Error tolerance
CIS6930.5: Federated Distributed Systems (Fall 2006)
10
History and Statistics
Conceived 1995, launched April 1999
“scientific experiment that uses Internet-connected
computers in the Search for Extraterrestrial
Intelligence (SETI). You can participate by running
a free program that downloads and analyzes radio
telescope data. “
No ET signals yet, but other results
Total
Users
Last 24 Hours
(as of Wed Feb 23 07:04:51)
5,361,313
4,391
1,779 millions
5 million
Total CPU time
2.2 million years
3610.717 years
Average CPU
time/work unit
10 hr 58 min 14.0 sec
6 hr 19 min 30.1 sec
Results received
CIS6930.5: Federated Distributed Systems (Fall 2006)
11
Volunteer computing
Also called “public-resource computing”
Utilizes idle computing cycles over Internet
Other systems:
– Original: GIMPS, distributed.net
– Commercial: United Devices, Entropia,
Porivo, Popular Power
– Academic, open-source
> Cosm, folding@home
CIS6930.5: Federated Distributed Systems (Fall 2006)
12
None of the popularity of SETI!
ET
How to get and retain users (from David Anderson,
the leader of the SETI@home project)
– Graphics are important (but monitors do burn in)
– Teams: users recruit other users
– Keep users informed
Science news
System management news
Periodic project emails
Reward users:
– PDF certificates
– Milestone pages and emails
– Leader boards (overall, country, …)
CIS6930.5: Federated Distributed Systems (Fall 2006)
13
Millions and millions of computers!
(Problems)
Server scalability
Dealing with excess CPU time
Cheating
Bad behavior:
– Team recruitment by spam
– Sale of accounts on eBay
Malfunctions
Network bandwidth costs money
CIS6930.5: Federated Distributed Systems (Fall 2006)
14
SETI@home: Summary
Master-worker design
– Centralized solution
>Master=central point of control
>Single point of failure
>Performance bottleneck
Incentives for participation
– Mean sometimes incentives for cheating
Massive (“embarrassing”) parallelism
Low bandwidth/computation ratio
Users do donate real resources: $1.5M / year
consumed power
More information:
http://setiathome.ssl.berkeley.edu
CIS6930.5: Federated Distributed Systems (Fall 2006)
15
BOINC
Berkeley Open Infrastructure for Network
Computing
“Open-source software for volunteer computing and
desktop grid computing. “
http://boinc.berkeley.edu/
Project idea: install and configure BOINC on a set
of machines at USF to run large embarrassingly
parallel applications.
– Two candidate applications from mechanical
engineering and physics (code already exists)
– Report experience. Think along the following idea:
would it be beneficial to use the administrative
desktops for scientific computations at USF?
CIS6930.5: Federated Distributed Systems (Fall 2006)
16
Outline
Case study (and project ideas):
– Volunteer computing: SETI@home and BOINC
– Grid computing
– P2P systems
Administravia
CIS6930.5: Federated Distributed Systems (Fall 2006)
17
Grid Computing: Current Status
The metaphor: power grid
Many deployed grids running in
production mode
Scientists are the most traditional
users
Users:
– 100s, 10s of institutions
– Well-established communities
Resources:
– Computers, data, instruments,
storage, applications
– Owned/administered by institutions
Applications: data- and computeintensive processing
Approach: common infrastructure
CIS6930.5: Federated Distributed Systems (Fall 2006)
18
Why Don’t We Build a Huge Supercomputer?
time:
2003
2000
1998
1996
2002
1995
1996
-0.82
-0.84
1000
2001
1999
1997
1995
2001
-0.80
2000
-0.76
-0.78
10000
.
-k
1999
-0.72
1998
Zipf distribution: Perf(rank)
≈ rank
-0.74
LinPack perf.GFLOPS (log scale) .
Parameter 'k' evolution
1997
Top500 supercomputer list
-0.68
over
-0.70
100
10
1
1
10
CIS6930.5: Federated Distributed Systems (Fall 2006)
100
1000
Rank (log scale)
19
-0.68
-0.70
Impact
Parameter 'k' evolution
.
-0.72
-0.74
-0.76
-0.78
-0.80
– A virtual machine that aggregates the last 10 in Top500
would rank 32nd in ’95 but 14th in ‘03
Both Grid and P2P computing are results of this trend:
– Grids: focus on assembling (a relatively small number of)
resources to enable controlled, secure resource sharing
– P2P focus: scale, deployability.
Challenge: design services that offer the best of both
worlds
complex, secure services, that deliver controlled QoS;
are scalable and can be easily deployed.
CIS6930.5: Federated Distributed Systems (Fall 2006)
20
2003
2001
2000
1999
1998
1997
1996
Trend: it is increasingly interesting to
aggregate the capabilities of the machines in the tail of
this distribution.
1995
2002
-0.82
-0.84
Outline
Case study (and project ideas):
– Volunteer computing: SETI@home and BOINC
– Grid computing
– P2P systems
Administravia
CIS6930.5: Federated Distributed Systems (Fall 2006)
21
Peer-to-Peer Systems
Revived (?) by music sharing
A variety of applications deployed today
Def 1: “A class of applications that take
advantage of resources (e.g., storage, cycles,
content) available at the edge of the Internet.”
– Edges often turned off, without permanent IP
addresses, etc.
Def 2: “A class of decentralized, self-organizing
distributed systems, in which all or most
communication is symmetric.”
Lots of other definitions that fit in between
CIS6930.5: Federated Distributed Systems (Fall 2006)
22
P2P Impact (1)
Widespread adoption leading to
– KaZaA – 170 millions downloads (3.5M/week)
one of the most popular applications ever!
(almost) zero-cost data distribution
… is forcing companies to change their
business models
… might impact copyright laws
CIS6930.5: Federated Distributed Systems (Fall 2006)
23
P2P Impact (2)
Killer application for broadband to
consumers
– P2P generated traffic may be the single
largest contributor to Internet traffic today
Internet2 traffic statistics
Other
100%
Data transfers
80%
Unidentified
File sharing
60%
40%
20%
0%
Feb.'02
Aug.'02
Feb.'03
Aug.'03
Feb. '04
Source: www.internet2.edu
CIS6930.5: Federated Distributed Systems (Fall 2006)
July'04
24
Applications (1)
File sharing
– The ‘killer’ application to date
– Too many to list them all: Napster, FastTrack (KaZaA,
iMesh), Gnutella (LimeWire, Morpheus, BearShare),
Streaming: the user ‘plays’ the data as it arrives
Possible solution:
The first few users get the
stream from the server
New users get the stream
from the server or from users
who are already receiving the
stream
source
Oh, I am exhausted!
P2P
approach
CIS6930.5: Federated Distributed Systems (Fall 2006)
Client/server
approach
25
Applications (2)
Performance benchmarking Problem:
– Evaluate the performance of your Web site form end-user
perspective
> Multiple views on your site performance
– Generate Internet statistics
> Connectivity statistics
> Routing errors, routing maps
Backup storage (HiveNet, OceanStore)
Collaborative environments (Groove Networks)
Instant messaging (Yahoo, AOL)
Web serving communities (uServ)
Spam filtering
Anonymous email
Censorship-resistant publishing systems (Ethernity, Freenet
CIS6930.5: Federated Distributed Systems (Fall 2006)
26
P2P Networks: Current Status
Users:
– Millions
– Anonymous individuals
Resources:
– Computing cycles XOR files
– Owned/administered (?) by user
– Intermittent participation:
Network
Users
eDonkey2K
3,108,066
FastTrack
2,848606
Gnutella
2,219,539
Overnet
645,120
DirectConnect
???
> Gnutella: 60 min. (‘01)
MP2P
???
> MojoNation: 1/6 users always connected
(‘01)
(www.slyck.com, 06/14/’06)
> Overnet: 50% nodes available 70% of
time over a week (‘02)
Applications: file retrieval or parallel
computations
Approach: vertically integrated solutions
CIS6930.5: Federated Distributed Systems (Fall 2006)
27
Trend:
Large, Dynamic, Self-Configuring Grids
Functionality &
infrastructure
Grids
•Large scale
•Weaker trust assumptions
•Ease of integration
•No centralized authority
•Intermittent resource/user participation
•Diversity in:
•Shared resources
•Sharing characteristics
•Variable technical support
•Infrastructure (sharable services)
•Support for diverse applications
P2P
Scale & volatility
On Death, Taxes, and the Convergence of Grid and P2P Systems,
CIS6930.5:
Federated Distributed
Systems (Fall 2006)
Foster
and Iamnitchi,
IPTPS’03
28
Challenges in Distributed Systems
Scale
Real problems: spam, denial of service attacks
(and distributed), security, fault tolerance, etc.
We’ll look at latest solutions to such problems
proposed in:
– Top conferences in systems and networking:
SIGCOMM, OSDI, NSDI
– Top workshops (hot topics): IPTPS, HotOS
– Other venues
(Digression: how do you tell when a conference is
top?)
CIS6930.5: Federated Distributed Systems (Fall 2006)
29
Course Organization/Syllabus/etc.
CIS6930.5: Federated Distributed Systems (Fall 2006)
30
Administravia: Grading
Reviewing:30%
Discussion leading: 15%
Project: 55%
– Aim high!
– Have fun!
CIS6930.5: Federated Distributed Systems (Fall 2006)
31
Administravia:
Paper Reviewing (1)
Goals:
–
–
Think of what you read
Get used to writing paper reviews
Reviews due by noon before class
Be professional in your writing
Have an eye on the writing style:
–
–
Clarity
Beware of traps: learn to use them in writing and
detect them in reading
– Detect (and stay away from) trivial claims.
E.g., 1st sentence in the Introduction:
“The tremendous/unprecedented/phenomenal
growth/scale/ubiquity of the Internet…”
CIS6930.5: Federated Distributed Systems (Fall 2006)
32
Administravia:
Paper Reviewing (2)
Follow the form provided when relevant.
State the main contribution of the paper
Critique the main contribution: Rate the significance of the
paper on a scale of 5 (breakthrough), 4 (significant
contribution), 3 (modest contribution), 2 (incremental
contribution), 1 (no contribution or negative contribution).
Explain your rating in a sentence or two.
Rate how convincing the methodology is.
Do the claims and conclusions follow from the experiments?
Are the assumptions realistic?
Are the experiments well designed?
Are there different experiments that would be more convincing?
Are there other alternatives the authors should have
considered?
(And, of course, is the paper free of methodological errors?)
CIS6930.5: Federated Distributed Systems (Fall 2006)
33
Administravia:
Paper Reviewing (3)
What is the most important limitation of the approach?
What are the three strongest and/or most interesting ideas in
the paper?
What are the three most striking weaknesses in the paper?
Name three questions that you would like to ask the authors.
Detail an interesting extension to the work not mentioned in
the future work section.
Optional comments on the paper that you’d like to see
discussed in class.
CIS6930.5: Federated Distributed Systems (Fall 2006)
34
Administravia:
Discussion leading
Come prepared!
– Prepare discussion outline
– Prepare questions:
> “What if”s
> Unclear aspects of the solution proposed
>…
– Similar ideas in different contexts
– Initiate short brainstorming sessions
Leaders do NOT need to submit paper reviews
Main goals:
– Keep discussion flowing
– Keep discussion relevant
– Engage everybody (I’ll have an eye on this, too)
CIS6930.5: Federated Distributed Systems (Fall 2006)
35
Administravia:
Projects
Combine with your research if relevant to the class
Get approval from all instructors if you overlap
final projects:
– Don’t sell the same piece of work twice
– You can get more than twice as many results with
less than twice as much work
Aim high!
– Put one extra month and get a publication out of it
– It is doable (we have proofs)
Try ideas that you postponed out of fear: it’s just a
class, not your PhD.
CIS6930.5: Federated Distributed Systems (Fall 2006)
36
Administravia:
Project deadlines (tentative)
Sept. 14: 1-page project proposal
Oct. 10: 3-page literature survey
– Know relevant work in your problem area
– If implementation project, list tools, similar projects
Nov. 13: 5-page Midterm project due
– Have a clear image of what’s possible/doable
– Report preliminary results
Last class(es):In-class project presentation
– Demo, if appropriate
Dec. 15:
– 10-page write-up
CIS6930.5: Federated Distributed Systems (Fall 2006)
37
Next Class (Wed, August 30)
In-class discussion of papers:
– “Automated Worm Fingerprinting”, OSDI ‘04.
– “Planet Scale Software Updates”, SIGCOMM ’06.
Discussion of some project ideas
Need discussion leader to team up with me for the
class next week: Real systems (1): BitTorrent
– Exploiting BitTorrent For Fun (IPTPS’06)
– A Case for Efficient Execution of Data-Intense
Applications with BitTorrent on Computational
Desktop Grids ()
CIS6930.5: Federated Distributed Systems (Fall 2006)
38
Questions?
CIS6930.5: Federated Distributed Systems (Fall 2006)
39