CS290F, Winter 2004 Large-scale Networked Systems

Download Report

Transcript CS290F, Winter 2004 Large-scale Networked Systems

CS290F, Winter 2005
Large-scale Networked Systems

Instructor

Ben Y. Zhao (ravenben at cs.ucsb.edu, 1151 Engineering I)

Office hour: Thur, 3-4pm

Lecture time: TuTh, 1:00-3:00 pm

Place: 1401 Phelps

Teaching Assistant

Ted Huffmire (huffmire at cs.ucsb.edu)

Office hours: TBA
Overview

Administrivia

What this class is, and is not

Enrollment, etc…

What you have to do for this class

Grading

Break

Course projects

organization

some initial ideas
Administrivia



Course webpage

http://www.cs.ucsb.edu/~ravenben/classes/290F

check it periodically for announcements
Class mailing list

cs290f at lists.cs.ucsb.edu

go to: http://lists.cs.ucsb.edu/mailman/listinfo/cs290f

sign up today!
Deadlines


Unless otherwise specified, it means 10 minutes before the
lecture
Special circumstances should be brought to my attention before
deadlines
Class Registration


Please send me ([email protected] ) an e-mail with the
subject “cs290f registration" and the following:

Last and first name

Student ID

Your department

Preferred email address

URL of your home page
Please indicate explicitly if I can add you to the on-line web
page that lists each student enrolled in the class (only your
name and URL will be made public).
Class Enrollment


Class size capped at 40

currently enrollment is full

for good discussion, we should be at ~ 25
For those unable to enroll

email me with your student ID (perm #)

I expect enrollment to open up in first 1-2 weeks
Goals of this Course

Understand



How do peer-to-peer systems work?
What are the design issues in building large networked
systems?
Where is networking research heading?

Get familiar with current network research

Understand solutions in context

Goals

Assumptions
Goals of this Course (cont’d)


Appreciate what is good research

Problem selection

Solution and research methodology

Presentation
Apply what you learned in a class project
Reasons to Not Take This Class


It’s not...

a survey class on p2p systems

a reading seminar
I might do a p2p reading group / seminar later



will be focused more on reading papers
Unreasonably high expectations

this is my first grad class

it’s in my research area
And…

smaller class will benefit everyone

also, this class will likely be offered next year
What do you need to do?

A research-oriented class project

Paper reading / reviews

Lead class discussion on one research paper

Participate in discussions in class

this is meant to be an *interactive* class
Research Project


Investigate new ideas and solutions in class project

Define the problem

Execute the research

Work with your partners

Write up and present your research
Ideally, best projects will become conference or
workshop papers

e.g., SIGCOMM, MOBICOM, SOSP/OSDI, NSDI

IPTPS, WCW, HotOS, WMCSA, NOSSDAV
Research Project: Steps

I’ll distribute a list of projects



You can either choose one of these projects or come up
with your own
Pick your project, partner, and submit a one-page
proposal describing:

The problem you are solving

Your plan of attack with milestones and dates

Any special resources you may need
A midterm presentation of your progress (5
minutes)

Poster session

Submit project papers
Paper Reviews

Goal: synthesize main ideas and concepts in papers

Number: 2-3 per class

Length: no more than ½ page per paper

Content


Main points intended by the author

Points you particularly liked/disliked

Other comments (writing, conclusions…)
Submission

Submit each review via email before class on lecture day

See class web page for details
What is a Presentation?

Presentations are in Powerpoint



or PDF if you hate MSFT
Target a 30 minute presentation

that’s about 15 slides

idea is to engage the class in discussion on your points

leave the audience with thoughtful questions
You’ll be graded on several things:

did you present the high-level points of the paper?

did you get a good discussion going

did you relate the paper to others in the course (context!)
Choose Your Paper

Pick your paper from the lecture schedule online


the readings and dates will be “finalized” by Thursday
Email me to “claim” the paper

I’ll let you know if it’s already been taken

presenters names will be updated online asap
About Grading

10%
Class presentations
20%
Class Project
70%
This is a graduate networking class


Paper reviews
key is what you learn, not the grade
Evaluation of projects


2 components: execution and impact
execution:


impact:


proposal, design, implementation, evaluation, presentation
how novel is your work, how does it advance the state of the
art?
good execution  B+/A-, impact  A/A+
Class Topics

Study existing p2p systems:


Study fundamental system issues:


Gnutella, Freenet, Tapestry, OceanStore, PAST, RON, I3,
Vivaldi
security, fault-tolerance, system measurement and
deployment
Study issues in alternative network environments:

sensor networks

ad-hoc and mobile networks
Break

Take a break, get up, walk around, stretch

come back in 10 minutes
Course Projects



Small project teams

ideally teams of 3

2 (likely too small to tackle significant project)

4 (likely too large to coordinate well)
Choose your team members

use the class enrollment webpage (I’ll put that up shortly)

use the class mailing list
Send me a one-page project proposal by Jan 15:

what is the problem you’re solving (w/ related works)

motivation/challenges (why is this important and new)

plan of attack (w/ milestones and dates)

any resources you might need
Background on Structured Overlays

Each data item and machine (node) in the system
has associated a unique ID in a large ID space

ID space is partitioned among nodes

DHT: Hash table like interface


put(id, data)

data = get(id)

data items are stored at the node responsible for its ID
DOLR: directory service interface

publish (id)

routeToId (id, message)

data stored anywhere and located via directory
Structured Peer-to-Peer Overlays

Assign random nodeIDs and keys from secure hash


incrementally route towards destination ID
each node has small set of outgoing routes, e.g. prefix routing
ID: ABCE
ABC0
To: ABCD
A930
AB5F
What’s in a Protocol?


Definition of name-proximity

each hop gets you “closer” to destination ID

prefix routing, numerical closeness, hamming distance
Size of routing table


# of overlay routing hops



amount of state kept by each node as f (N), N = network size
worst case routing performance (in overlay hops, not IP)
Network locality

does choice of neighbor consider network distance

impact on “actual” performance of P2P routing
Application Interface
Chord

NodeIDs are numbers on ring

Closeness defined by numerical proximity

Finger table


Node 0/1024
keep routes for next node 2i away in
namespace

routing table size: log2 n

n = total # of nodes
0
128
896
Routing

iterative hops from source

at most log2 n hops
256
768
640
384
512
Chord II

Pros


Cons



simplicity
limited flexibility in routing
neighbor choices unrelated to network proximity
* but can be optimized over time
Application Interface:

distributed hash table (DHash)
Tapestry / Pastry

incremental prefix routing


routing table



11110XXX00XX 000X0000
Node 0/1024
keep nodes matching at least i digits
to destination
table size: b * logb n
routing

recursive routing from source

at most logb n hops
0
128
896
256
768
640
384
512
Routing in Detail
Example: Octal digits, 212 namespace, 2175  0157
2175
0880
0123
0154
0157
0 1
0 1
0 1
0 1
0 1
2
2
2
2
2
3 4 5
3 4 5
3 4 5
3 4 5
3 4 5
Neighbor 5712
Map
For “2175” (Octal)
6 7
6 7
6 7
6 7
6 7
0xxx
20xx
210x
2170
1xxx 0880
----
211x
2171
----
22xx
212x
2172
3xxx
23xx
213x 3210
2173
4xxx
24xx
214x
2174
5xxx 4510
25xx
215x
----
6xxx
26xx
216x
2176
7xxx
27xx
----
2177
7510
4
3
2
1
Routing Levels
Project Ideas

Discuss 10 project suggestions




some related to peer-to-peer protocols and systems
Legend: based on how well-defined projects are not
necessary how difficult they are

Well-defined projects (5)

Less-defined project (3)

You need to define project’s goals (2)
You can also work on variants of these, or make up
your own topics
Need to send me a one page proposal by Jan 15

I’ll provide feedback/meetings to iterate on topic
Project 1: Implement Chimera


Implement a structured peer-to-peer protocol that
is a hybrid of Tapestry, Pastry, and Bamboo

implement in C/C++

focus on usability and performance
Key features

simplicity (leafsets like Pastry)

stability (p2p exchange of routing tables like Bamboo)

performance (locality like Tapestry)

support “standard” p2p API and easy to build up on

might support multiple teams
Project 2: the DIY P2P protocol

Study the related literature on existing p2p
protocols, and design something different



finding an area yet unexplored in the p2p space
The result


can be new contribution in performance, security,
theoretical interest, specialized for a useful application…
The challenge


must have something significantly novel/new
well-specified protocols on routing, node
insertion/deletion, failure handling
Warning: this is not easy
Project 3-5: P2P Applications/Systems

All applications built on common API



should work w/ existing protocols and new ones developed
in class
Project 3: a lightweight p2p CDN/web-cache

use a DHT/DOLR to quickly search for desired object

use request tracking to determine optimal data placement

implement a client-side http proxy
The challenge

maintaining data stability as nodes come and go

providing fairness / load balancing across nodes
Project 4: P2P EBay


Design and implement a scalable and secure
auctioning / e-commerce system on a p2p
infrastructure

support large # of transactions/time

support large # of users and items

be stable and resilient against attacks
The challenge

understanding the types of attacks

selecting a reasonable subset to resist

maintaining scalability despite security enhancements
Project 5: Lightweight Data Synchronization

Design and implement Quartz

lightweight p2p data sharing system

store your most critical files (<100MB) online



use simple application-specific handlers to provide fast
data synchronization (a la CVS)

synchronize your HTML bookmarks across machines

synchronize your papers, homework files, financial records
end to end encryption
The challenge

keeping data highly-persistent on a dynamic p2p network

optimizing per-node operational overhead

a simple user interface that people will *actually use*
Project 6: P2P Security


Security is one of the biggest challenges in p2p

large population spread across networks and domains

high probability of node compromise

need to function in presence of malicious nodes
Existing work on security discouraging

Sybil attacks hard to prevent


“bad” users can create lots of identities online and generate
collusion attacks
hard to avoid malicious nodes

you’re 1, they are many…
Project 6: a P2P reputation system


Design and evaluate a highly-adaptive p2p
reputation system

quickly recognize malicious (or compromised) nodes

use p2p collaboration to share reputation information

form “trusted circles”

use “third-party” anonymous verification to build reputations
The challenge



building reliable reputations that are highly adaptable
doing so without generating massive network probe
traffic
minimizing impact if system is circumvented
Project 7: Distributed Workload Logging



Modelnet and Emulab provide emulation environments for up
to 1000’s of virtual nodes

good for reproducible experiments

bad: in a dedicated cluster, unlike real world conditions
PlanetLab provides real-world platform for distributed
applications (~200+ nodes across Inet)

good for deploying / hardening your application/system

bad for reproducible results
What we need

a way to gather interesting (and unexpected) environmental
conditions from distributed networks like Planetlab, and use it
as trace to drive Modelnet/Emulab
Project 7: cont.

The goal:



design, implement, and deploy a distributed sensor and
logging interface
needs to be scalable, lightweight (minimize impact on
environment), and accurate
gather a large data set, and build an environmental trace
Project 8: P2P Benchmarking

Design and test and benchmark for P2P routing


needs to account for a wide-range set of application-level
behaviors and working conditions

throughput

latency

key lookup versus data read/store

rate of node churn
The challenge

being comprehensive

being fair to goals of different protocols
Project 9: Quantifying P2P Performance

Test a set of protocols against a number of
network topologies




mostly simulations (might need to implement some
protocols)
test using different types of network models, GT-ITM,
BRITE, Waxman, etc…
vary protocol parameters to measure impact on basic
performance
The challenge


making sense of large quantities of test data
getting a better understanding of inherent properties of
network models and how they impact overlay networks
Project 10: P2P + Ad-hoc

Take p2p routing algorithms, and apply to the adhoc or sensor net space


consider new constraints: power, processing, no underlying
IP

design new routing protocol

validate via simulation
The challenge

there’s fairly little previous work, and it’s not clear the
idea is feasible
Support Infrastructure

CSIL machines


you should all have accounts
A new 32 node networking cluster

just assembled, being installed now

Dell Xeon servers, 2.4Ghz, 2G RAM

cluster building in progress

will have modelNet installed, and support large-scale
network emulation tests (~300 nodes)
Summary of TODOs

What you need to do:




send me registration email with subject:
cs290f registration
sign up online for class mailing list
http://lists.cs.ucsb.edu/mailman/listinfo/cs290f
pick a paper to present and email me to claim it
(*after* Thur, first come first serve)
find other team members and pick a project topic
(first draft of proposal due Jan 15)