CS290F, Winter 2004 Large-scale Networked Systems
Download
Report
Transcript CS290F, Winter 2004 Large-scale Networked Systems
CS290F, Winter 2005
Large-scale Networked Systems
Instructor
Ben Y. Zhao (ravenben at cs.ucsb.edu, 1151 Engineering I)
Office hour: Thur, 3-4pm
Lecture time: TuTh, 1:00-3:00 pm
Place: 1401 Phelps
Teaching Assistant
Ted Huffmire (huffmire at cs.ucsb.edu)
Office hours: TBA
Overview
Administrivia
What this class is, and is not
Enrollment, etc…
What you have to do for this class
Grading
Break
Course projects
organization
some initial ideas
Administrivia
Course webpage
http://www.cs.ucsb.edu/~ravenben/classes/290F
check it periodically for announcements
Class mailing list
cs290f at lists.cs.ucsb.edu
go to: http://lists.cs.ucsb.edu/mailman/listinfo/cs290f
sign up today!
Deadlines
Unless otherwise specified, it means 10 minutes before the
lecture
Special circumstances should be brought to my attention before
deadlines
Class Registration
Please send me ([email protected] ) an e-mail with the
subject “cs290f registration" and the following:
Last and first name
Student ID
Your department
Preferred email address
URL of your home page
Please indicate explicitly if I can add you to the on-line web
page that lists each student enrolled in the class (only your
name and URL will be made public).
Class Enrollment
Class size capped at 40
currently enrollment is full
for good discussion, we should be at ~ 25
For those unable to enroll
email me with your student ID (perm #)
I expect enrollment to open up in first 1-2 weeks
Goals of this Course
Understand
How do peer-to-peer systems work?
What are the design issues in building large networked
systems?
Where is networking research heading?
Get familiar with current network research
Understand solutions in context
Goals
Assumptions
Goals of this Course (cont’d)
Appreciate what is good research
Problem selection
Solution and research methodology
Presentation
Apply what you learned in a class project
Reasons to Not Take This Class
It’s not...
a survey class on p2p systems
a reading seminar
I might do a p2p reading group / seminar later
will be focused more on reading papers
Unreasonably high expectations
this is my first grad class
it’s in my research area
And…
smaller class will benefit everyone
also, this class will likely be offered next year
What do you need to do?
A research-oriented class project
Paper reading / reviews
Lead class discussion on one research paper
Participate in discussions in class
this is meant to be an *interactive* class
Research Project
Investigate new ideas and solutions in class project
Define the problem
Execute the research
Work with your partners
Write up and present your research
Ideally, best projects will become conference or
workshop papers
e.g., SIGCOMM, MOBICOM, SOSP/OSDI, NSDI
IPTPS, WCW, HotOS, WMCSA, NOSSDAV
Research Project: Steps
I’ll distribute a list of projects
You can either choose one of these projects or come up
with your own
Pick your project, partner, and submit a one-page
proposal describing:
The problem you are solving
Your plan of attack with milestones and dates
Any special resources you may need
A midterm presentation of your progress (5
minutes)
Poster session
Submit project papers
Paper Reviews
Goal: synthesize main ideas and concepts in papers
Number: 2-3 per class
Length: no more than ½ page per paper
Content
Main points intended by the author
Points you particularly liked/disliked
Other comments (writing, conclusions…)
Submission
Submit each review via email before class on lecture day
See class web page for details
What is a Presentation?
Presentations are in Powerpoint
or PDF if you hate MSFT
Target a 30 minute presentation
that’s about 15 slides
idea is to engage the class in discussion on your points
leave the audience with thoughtful questions
You’ll be graded on several things:
did you present the high-level points of the paper?
did you get a good discussion going
did you relate the paper to others in the course (context!)
Choose Your Paper
Pick your paper from the lecture schedule online
the readings and dates will be “finalized” by Thursday
Email me to “claim” the paper
I’ll let you know if it’s already been taken
presenters names will be updated online asap
About Grading
10%
Class presentations
20%
Class Project
70%
This is a graduate networking class
Paper reviews
key is what you learn, not the grade
Evaluation of projects
2 components: execution and impact
execution:
impact:
proposal, design, implementation, evaluation, presentation
how novel is your work, how does it advance the state of the
art?
good execution B+/A-, impact A/A+
Class Topics
Study existing p2p systems:
Study fundamental system issues:
Gnutella, Freenet, Tapestry, OceanStore, PAST, RON, I3,
Vivaldi
security, fault-tolerance, system measurement and
deployment
Study issues in alternative network environments:
sensor networks
ad-hoc and mobile networks
Break
Take a break, get up, walk around, stretch
come back in 10 minutes
Course Projects
Small project teams
ideally teams of 3
2 (likely too small to tackle significant project)
4 (likely too large to coordinate well)
Choose your team members
use the class enrollment webpage (I’ll put that up shortly)
use the class mailing list
Send me a one-page project proposal by Jan 15:
what is the problem you’re solving (w/ related works)
motivation/challenges (why is this important and new)
plan of attack (w/ milestones and dates)
any resources you might need
Background on Structured Overlays
Each data item and machine (node) in the system
has associated a unique ID in a large ID space
ID space is partitioned among nodes
DHT: Hash table like interface
put(id, data)
data = get(id)
data items are stored at the node responsible for its ID
DOLR: directory service interface
publish (id)
routeToId (id, message)
data stored anywhere and located via directory
Structured Peer-to-Peer Overlays
Assign random nodeIDs and keys from secure hash
incrementally route towards destination ID
each node has small set of outgoing routes, e.g. prefix routing
ID: ABCE
ABC0
To: ABCD
A930
AB5F
What’s in a Protocol?
Definition of name-proximity
each hop gets you “closer” to destination ID
prefix routing, numerical closeness, hamming distance
Size of routing table
# of overlay routing hops
amount of state kept by each node as f (N), N = network size
worst case routing performance (in overlay hops, not IP)
Network locality
does choice of neighbor consider network distance
impact on “actual” performance of P2P routing
Application Interface
Chord
NodeIDs are numbers on ring
Closeness defined by numerical proximity
Finger table
Node 0/1024
keep routes for next node 2i away in
namespace
routing table size: log2 n
n = total # of nodes
0
128
896
Routing
iterative hops from source
at most log2 n hops
256
768
640
384
512
Chord II
Pros
Cons
simplicity
limited flexibility in routing
neighbor choices unrelated to network proximity
* but can be optimized over time
Application Interface:
distributed hash table (DHash)
Tapestry / Pastry
incremental prefix routing
routing table
11110XXX00XX 000X0000
Node 0/1024
keep nodes matching at least i digits
to destination
table size: b * logb n
routing
recursive routing from source
at most logb n hops
0
128
896
256
768
640
384
512
Routing in Detail
Example: Octal digits, 212 namespace, 2175 0157
2175
0880
0123
0154
0157
0 1
0 1
0 1
0 1
0 1
2
2
2
2
2
3 4 5
3 4 5
3 4 5
3 4 5
3 4 5
Neighbor 5712
Map
For “2175” (Octal)
6 7
6 7
6 7
6 7
6 7
0xxx
20xx
210x
2170
1xxx 0880
----
211x
2171
----
22xx
212x
2172
3xxx
23xx
213x 3210
2173
4xxx
24xx
214x
2174
5xxx 4510
25xx
215x
----
6xxx
26xx
216x
2176
7xxx
27xx
----
2177
7510
4
3
2
1
Routing Levels
Project Ideas
Discuss 10 project suggestions
some related to peer-to-peer protocols and systems
Legend: based on how well-defined projects are not
necessary how difficult they are
Well-defined projects (5)
Less-defined project (3)
You need to define project’s goals (2)
You can also work on variants of these, or make up
your own topics
Need to send me a one page proposal by Jan 15
I’ll provide feedback/meetings to iterate on topic
Project 1: Implement Chimera
Implement a structured peer-to-peer protocol that
is a hybrid of Tapestry, Pastry, and Bamboo
implement in C/C++
focus on usability and performance
Key features
simplicity (leafsets like Pastry)
stability (p2p exchange of routing tables like Bamboo)
performance (locality like Tapestry)
support “standard” p2p API and easy to build up on
might support multiple teams
Project 2: the DIY P2P protocol
Study the related literature on existing p2p
protocols, and design something different
finding an area yet unexplored in the p2p space
The result
can be new contribution in performance, security,
theoretical interest, specialized for a useful application…
The challenge
must have something significantly novel/new
well-specified protocols on routing, node
insertion/deletion, failure handling
Warning: this is not easy
Project 3-5: P2P Applications/Systems
All applications built on common API
should work w/ existing protocols and new ones developed
in class
Project 3: a lightweight p2p CDN/web-cache
use a DHT/DOLR to quickly search for desired object
use request tracking to determine optimal data placement
implement a client-side http proxy
The challenge
maintaining data stability as nodes come and go
providing fairness / load balancing across nodes
Project 4: P2P EBay
Design and implement a scalable and secure
auctioning / e-commerce system on a p2p
infrastructure
support large # of transactions/time
support large # of users and items
be stable and resilient against attacks
The challenge
understanding the types of attacks
selecting a reasonable subset to resist
maintaining scalability despite security enhancements
Project 5: Lightweight Data Synchronization
Design and implement Quartz
lightweight p2p data sharing system
store your most critical files (<100MB) online
use simple application-specific handlers to provide fast
data synchronization (a la CVS)
synchronize your HTML bookmarks across machines
synchronize your papers, homework files, financial records
end to end encryption
The challenge
keeping data highly-persistent on a dynamic p2p network
optimizing per-node operational overhead
a simple user interface that people will *actually use*
Project 6: P2P Security
Security is one of the biggest challenges in p2p
large population spread across networks and domains
high probability of node compromise
need to function in presence of malicious nodes
Existing work on security discouraging
Sybil attacks hard to prevent
“bad” users can create lots of identities online and generate
collusion attacks
hard to avoid malicious nodes
you’re 1, they are many…
Project 6: a P2P reputation system
Design and evaluate a highly-adaptive p2p
reputation system
quickly recognize malicious (or compromised) nodes
use p2p collaboration to share reputation information
form “trusted circles”
use “third-party” anonymous verification to build reputations
The challenge
building reliable reputations that are highly adaptable
doing so without generating massive network probe
traffic
minimizing impact if system is circumvented
Project 7: Distributed Workload Logging
Modelnet and Emulab provide emulation environments for up
to 1000’s of virtual nodes
good for reproducible experiments
bad: in a dedicated cluster, unlike real world conditions
PlanetLab provides real-world platform for distributed
applications (~200+ nodes across Inet)
good for deploying / hardening your application/system
bad for reproducible results
What we need
a way to gather interesting (and unexpected) environmental
conditions from distributed networks like Planetlab, and use it
as trace to drive Modelnet/Emulab
Project 7: cont.
The goal:
design, implement, and deploy a distributed sensor and
logging interface
needs to be scalable, lightweight (minimize impact on
environment), and accurate
gather a large data set, and build an environmental trace
Project 8: P2P Benchmarking
Design and test and benchmark for P2P routing
needs to account for a wide-range set of application-level
behaviors and working conditions
throughput
latency
key lookup versus data read/store
rate of node churn
The challenge
being comprehensive
being fair to goals of different protocols
Project 9: Quantifying P2P Performance
Test a set of protocols against a number of
network topologies
mostly simulations (might need to implement some
protocols)
test using different types of network models, GT-ITM,
BRITE, Waxman, etc…
vary protocol parameters to measure impact on basic
performance
The challenge
making sense of large quantities of test data
getting a better understanding of inherent properties of
network models and how they impact overlay networks
Project 10: P2P + Ad-hoc
Take p2p routing algorithms, and apply to the adhoc or sensor net space
consider new constraints: power, processing, no underlying
IP
design new routing protocol
validate via simulation
The challenge
there’s fairly little previous work, and it’s not clear the
idea is feasible
Support Infrastructure
CSIL machines
you should all have accounts
A new 32 node networking cluster
just assembled, being installed now
Dell Xeon servers, 2.4Ghz, 2G RAM
cluster building in progress
will have modelNet installed, and support large-scale
network emulation tests (~300 nodes)
Summary of TODOs
What you need to do:
send me registration email with subject:
cs290f registration
sign up online for class mailing list
http://lists.cs.ucsb.edu/mailman/listinfo/cs290f
pick a paper to present and email me to claim it
(*after* Thur, first come first serve)
find other team members and pick a project topic
(first draft of proposal due Jan 15)