Application II


Network Applications
and Network Programming:
Web and P2P
Slides are from Richard Yang of Yale; minor modifications have been made
Recap: FTP, HTTP

- FTP: file transfer
  - ASCII (human-readable format) requests and responses
  - stateful server
  - one data channel and one control channel
- HTTP
  - extensibility: ASCII request line, header lines, entity body, and response status line (see the example exchange below)
  - scalability/robustness
    - stateless server (each request should contain the full information); DNS load balancing
    - client caching, Web caches
  - one data channel
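For instance, a minimal GET exchange in this ASCII format (header values are illustrative):

GET /index.html HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1234

<html> ... entity body ... </html>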
Recap: Web Server Flow

Create ServerSocket(6789)
connSocket = accept()
read request from connSocket
read local file
write file to connSocket
close connSocket

[Diagram: the server's TCP socket space (hosts 128.36.232.5 and 128.36.230.2), showing a listening socket (state: listening; address: {*.6789, *.*}; completed connection queue) and, after accept(), an established socket (state: established; address: {128.36.232.5:6789, 198.69.10.10:1500}; sendbuf/recvbuf). A second listening socket {*.25, *.*} illustrates another service on the same host.]
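To make the flow concrete, here is a minimal Java sketch of this iterative server (the class name is ours; request parsing and file handling are elided):

import java.io.*;
import java.net.*;

public class IterativeWebServer {
    public static void main(String[] args) throws IOException {
        ServerSocket serverSocket = new ServerSocket(6789);
        while (true) {
            Socket connSocket = serverSocket.accept();       // blocks until a client connects
            BufferedReader in = new BufferedReader(
                new InputStreamReader(connSocket.getInputStream()));
            String request = in.readLine();                  // read request from connSocket
            // ... map the request to a local file and read it ...
            OutputStream out = connSocket.getOutputStream();
            out.write("HTTP/1.0 200 OK\r\n\r\n".getBytes()); // write (file) to connSocket
            connSocket.close();                              // close connSocket
        }
    }
}

Note that this version serves one request at a time: every blocking call stalls all waiting clients, which motivates the concurrency designs below.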
Recap: Writing High-Performance Servers

- Major issue: many socket/I/O operations can cause processing to block, e.g.,
  - accept: waiting for a new connection
  - read on a socket: waiting for data or close
  - write on a socket: waiting for buffer space
  - disk read/write: waiting for the I/O to finish
- Thus a crucial aspect of network server design is concurrency design (non-blocking)
  - for high performance
  - to avoid denial of service
- One technique to avoid blocking: threads
Multi-Threaded Web Server

Create ServerSocket(6789)
connSocket = accept()
create thread for connSocket

Each thread then runs the per-connection steps independently:
read request from connSocket
read local file
write file to connSocket
close connSocket

[Diagram: the main loop spawns one thread per accepted connection; two threads are shown executing the read/serve/close sequence in parallel. A Java sketch follows.]
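A minimal Java sketch of this thread-per-connection structure (class and method names are ours; the file-serving step is elided):

import java.io.*;
import java.net.*;

public class MultiThreadedWebServer {
    public static void main(String[] args) throws IOException {
        ServerSocket serverSocket = new ServerSocket(6789);
        while (true) {
            Socket connSocket = serverSocket.accept();        // main loop only accepts
            new Thread(() -> handle(connSocket)).start();     // create thread for connSocket
        }
    }

    static void handle(Socket connSocket) {
        try (Socket conn = connSocket;
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream()));
             OutputStream out = conn.getOutputStream()) {
            String request = in.readLine();                   // read request from connSocket
            // ... read the local file named by the request ...
            out.write("HTTP/1.0 200 OK\r\n\r\n".getBytes());  // write reply to connSocket
        } catch (IOException e) {
            // a failure on one connection does not take down the server
        }
    }
}

Unlike the iterative version, accept() returns immediately to take the next client while the new thread blocks on that connection's I/O.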
Recap: Writing High-Performance Servers

- Problems of multiple threads
  - too many threads → throughput meltdown, response time explosion
Event-Driven Programming

- Event-driven programming, also called asynchronous I/O
  - tell the OS not to block when accepting/reading/writing on sockets
  - Java: asynchronous I/O
    - for an example see: http://www.cafeaulait.org/books/jnp3/examples/12/
  - yields efficient and scalable concurrency
  - many examples: Click router, Flash web server, TP Monitors, etc.
Web Server

If the OS will not block on sockets, what might the program structure look like?

[Diagram: the thread-per-connection flow from the previous slides (accept, create thread, read request from connSocket, read local file, write file to connSocket, close connSocket), to be restructured for non-blocking operation.]
Typical Structure of Async I/O

- Typically, async I/O programs use Finite State Machines (FSMs) to monitor the progress of requests
  - the state info keeps track of the execution stage of processing each request, e.g., reading request, writing reply, …
- The program has a loop that checks potential events at each state
Async I/O in Java

- An important class is Selector, which supports the event loop
  - a Selector is a multiplexer of selectable channel objects
  - example channels: DatagramChannel, ServerSocketChannel, SocketChannel
  - use configureBlocking(false) to make a channel non-blocking
- A selector may be created by invoking the open method of this class
Async I/O in Java

- A selectable channel registers events (represented by a SelectionKey) with a selector via the register method
- A SelectionKey object contains two operation sets
  - interest set
  - ready set
- A SelectionKey object has an attachment which can store data
  - often the attachment is a buffer

[Diagram: a SelectableChannel registers with a Selector, yielding a SelectionKey that holds the interest set, the ready set, and the attachment.]
Async I/O in Java

- Call select() (or selectNow(), or select(int timeout)) to check for ready events, called the selected key set
- Iterate over the set to process all ready events (a sketch follows)
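Putting the Selector pieces together, a minimal sketch of the event loop (the port, buffer size, and class name are illustrative choices, not from the slides):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class SelectorWebServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(6789));
        server.configureBlocking(false);                       // never block in accept
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {                                         // the event loop
            selector.select();                                 // wait until some event is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel conn = server.accept();
                    if (conn == null) continue;                // non-blocking accept may yield nothing
                    conn.configureBlocking(false);
                    // the attachment holds this request's state (here just a buffer)
                    conn.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
                } else if (key.isReadable()) {
                    SocketChannel conn = (SocketChannel) key.channel();
                    ByteBuffer buf = (ByteBuffer) key.attachment();
                    if (conn.read(buf) < 0) {                  // peer closed the connection
                        conn.close();
                    }
                    // ... advance this request's FSM: parse, then register OP_WRITE ...
                }
            }
        }
    }
}

The loop never blocks on any single connection; each ready key advances one request's FSM, matching the structure described two slides earlier.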
Problems of Event-Driven Servers

- Difficult to engineer, modularize, and tune
- No performance/failure isolation between Finite State Machines (FSMs)
- FSM code can never block (but page faults, I/O, and garbage collection may still force a block)
  - thus multiple threads are still needed
Summary of Traditional C-S Web Servers

- Is the application extensible, scalable, robust, secure?

[Diagram: DNS and an application server C0 serving clients 1 through n.]
Content Distribution History...
“With 25 years of Internet experience, we’ve
learned exactly one way to deal with the
exponential growth: Caching”.
(1997, Van Jacobson)
Web Caches (Proxy)

- Web cache/proxy placed at the entrance of an ISP
- Client sends all HTTP requests to the web cache
  - if the object is at the web cache, the web cache immediately returns the object in an HTTP response
  - else it requests the object from the origin server, then returns the HTTP response to the client (sketched below)

[Diagram: clients send requests to a proxy server, which fetches misses from origin servers.]
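A minimal sketch of the hit/miss logic just described (class and method names are hypothetical; real caches also handle expiration and validation):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WebCache {
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    byte[] handleRequest(String url) {
        byte[] object = store.get(url);
        if (object != null) {
            return object;                 // hit: return the cached object immediately
        }
        object = fetchFromOrigin(url);     // miss: request the object from the origin server
        store.put(url, object);           // keep it for subsequent clients
        return object;
    }

    private byte[] fetchFromOrigin(String url) {
        // ... open an HTTP connection to the origin server and read the response body ...
        return new byte[0];
    }
}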
Web Proxy/Cache

- Web caches give good performance because very often
  - a single client repeatedly accesses the same document
  - a nearby client also accesses the same document
- Cache hit ratio increases logarithmically with the number of users

[Diagram: an application server C0 behind two ISP caches, each fronting three clients.]
Benefits of Web Caching

Assume: cache is "close" to the client (e.g., in the same network)

- smaller response time: cache is "closer" to the client
- decreased traffic to distant servers
  - the link out of the institutional/local ISP network is often the bottleneck

[Diagram: an institutional network with a 10 Mbps LAN and an institutional cache, connected to origin servers on the public Internet over a 1.5 Mbps access link.]
What Went Wrong with Web Caches?

- Web protocols evolved extensively to accommodate caching, e.g., HTTP/1.1
- However, Web caching was developed from a strong ISP perspective, leaving content providers out of the picture
  - it is the ISP that places a cache and controls it
  - ISPs' only interest in using Web caches is to reduce bandwidth
    - in the USA: bandwidth is relatively cheap
    - in Europe, there were many more Web caches
  - however, ISPs can arbitrarily tune Web caches to deliver stale content
Content Provider Perspective

- Content providers care about
  - user experience (latency)
  - content freshness
  - accurate access statistics
  - avoiding flash crowds
  - minimizing bandwidth usage on their access link
Content Distribution Networks

- Content Distribution Networks (CDNs) build overlay networks of caches to provide fast, cost-effective, and reliable content delivery, while working tightly with content providers
- Example:
  - Akamai: the original and largest commercial CDN; operates over 25,000 servers in over 1,000 networks
  - Akamai (AH kuh my) is Hawaiian for intelligent, clever, and informally "cool"; founded April 1999 in Boston, MA by MIT students
Basics of Akamai Operation

- Content provider server
  - provides the base HTML document
- Akamai caches embedded objects at a set of its cache servers (called edge servers)
- Akamaization of embedded content, e.g.,
  <IMG SRC=http://www.provider.com/image.gif>
  is changed to
  <IMG SRC=http://a661.g.akamai.net/hash/image.gif>
- Akamai customizes DNS to select serving edge servers based on
  - closeness to the client browser
  - server load
More Akamai Information

- URL akamaization is becoming obsolete and is only supported for legacy reasons
  - currently most content providers prefer to use DNS CNAME techniques to get all their content served from the Akamai servers
  - content providers still need to run their origin servers
- Akamai evolution:
  - files/streaming
  - secure pages and whole pages
  - dynamic page assembly at the edge (ESI)
  - distributed applications
Lab: Problems of Traditional Content Distribution

[Diagram: DNS and an application server C0 serving clients 1 through n, as in the earlier client-server summary.]
Objectives of P2P

- Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness
- Bypass DNS to find clients with resources!
  - examples: instant messaging, Skype
P2P

- But P2P is not new
- The original Internet was a P2P system:
  - the original ARPANET connected UCLA, Stanford Research Institute, UCSB, and Univ. of Utah
  - no DNS or routing infrastructure, just connected by phone lines
  - computers also served as routers
- P2P is simply an iteration of scalable distributed systems
P2P Systems

- File sharing: BitTorrent, LimeWire
- Streaming: PPLive, PPStream, Zattoo, …
- Research systems
- Collaborative computing:
  - SETI@Home project
  - human genome mapping
  - Intel NetBatch: 10,000 computers in 25 worldwide sites for simulations; saved about $500 million
Peer-to-Peer Computing

- 40-70% of total traffic in many networks
- upset the music industry; drew college students, web developers, recording artists, and universities into court

Source: ipoque Internet Study 2008/2009
Recap: P2P Objectives

- Bypass DNS to locate clients with resources!
  - examples: instant messaging, Skype
- Share the storage and bandwidth of individual clients to improve scalability/robustness
The Lookup Problem

Find where a particular file is stored. Pay particular attention to its equivalence to DNS.

[Diagram: a publisher at N4 inserts (key="title", value=MP3 data…) into a network of nodes N1-N6; a client issues Lookup("title") to find it.]
Outline

- Recap
- P2P
  - the lookup problem
  - Napster
Centralized Database: Napster

- Program for sharing music over the Internet
- History:
  - 5/99: Shawn Fanning (freshman, Northeastern U.) founded Napster, an online music service; wrote the program in 60 hours
  - 12/99: first lawsuit
  - 3/00: 25% of UWisc traffic was Napster
  - 2000: est. 60M users
  - 2/01: US Circuit Court of Appeals ruled that Napster knew its users were violating copyright laws
  - 7/01: # of simultaneous online users: Napster 160K
  - 9/02: bankruptcy

We are referring to the Napster before closure.
Napster: How Does It Work?

Application-level, client-server protocol over TCP.
A centralized index system maps files (songs) to the machines that are alive and have the files.

Steps:
- connect to a Napster server
- upload your list of files (push) to the server
- give the server keywords to search the full list
- select the "best" of the hosts with answers
Napster Architecture

[Diagram: peers connect to a central Napster index server for control; file transfers go directly between peers.]
Napster: Publish

[Diagram: the peer at 123.2.21.23 publishes "I have X, Y, and Z!" to the central server, which records insert(X, 123.2.21.23), etc.]
Napster: Search

[Diagram: the client at 123.2.0.18 sends the query "Where is file A?" to the central server; the server's reply, search(A) --> 124.1.0.1, points the client to the peer holding the file.]
Napster: Ping

[Diagram: the client at 123.2.0.18 pings the candidate hosts (e.g., 124.1.0.1) to pick the "best" one.]
Napster: Fetch

[Diagram: the client at 123.2.0.18 fetches the file directly from the peer at 124.1.0.1.]
Napster Messages

General packet format (a parsing sketch follows):
[chunksize] [chunkinfo] [data...]

CHUNKSIZE: Intel-endian (little-endian) 16-bit integer; size of [data...] in bytes

CHUNKINFO: Intel-endian 16-bit integer (hex):
00 - login rejected
02 - login requested
03 - login accepted
0D - challenge? (nuprin1715)
2D - added to hotlist
2E - browse error (user isn't online!)
2F - user offline
5B - whois query
5C - whois result
5D - whois: user is offline!
69 - list all channels
6A - channel info
90 - join channel
91 - leave channel
…
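A minimal Java sketch of reading this packet format (class and helper names are hypothetical, not a real Napster library):

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class NapsterMessage {
    final int type;      // CHUNKINFO, e.g. 0x5B = whois query
    final byte[] data;   // payload of CHUNKSIZE bytes

    NapsterMessage(int type, byte[] data) { this.type = type; this.data = data; }

    // Intel-endian (little-endian) 16-bit integer: low byte first
    static int readLE16(InputStream in) throws IOException {
        int lo = in.read(), hi = in.read();
        if (lo < 0 || hi < 0) throw new IOException("short read");
        return (hi << 8) | lo;
    }

    static NapsterMessage read(InputStream in) throws IOException {
        int size = readLE16(in);                 // CHUNKSIZE
        int type = readLE16(in);                 // CHUNKINFO
        byte[] data = new byte[size];
        new DataInputStream(in).readFully(data); // [data...]
        return new NapsterMessage(type, data);
    }
}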
Centralized Database: Napster

- Summary of features: a hybrid design
  - control: client-server (akin to a special-purpose DNS) for files
  - data: peer-to-peer
- Advantages
  - simplicity; easy to implement sophisticated search engines on top of the index system
- Disadvantages
  - application-specific (compared with DNS)
  - lack of robustness and scalability: the central search server is a single point of bottleneck/failure
  - easy to sue!
Variation: BitTorrent

- The global central index server is replaced by one tracker per file (the set of peers sharing a file is called a swarm)
  - this reduces centralization, but requires other means to locate trackers
- The bandwidth-scalability management technique is more interesting
  - more later
Outline

- Recap
- P2P
  - the lookup problem
  - Napster (central query server; distributed data servers)
  - Gnutella
Gnutella

- On March 14, 2000, J. Frankel and T. Pepper from AOL's Nullsoft division (also the developers of the popular Winamp MP3 player) released Gnutella
- Within hours, AOL pulled the plug on it
- It was quickly reverse-engineered, and soon many other clients became available: BearShare, Morpheus, LimeWire, etc.
Decentralized Flooding: Gnutella

- On startup, a client contacts other servents (server + client) in the network to form interconnection/peering relationships
  - servent interconnections are used to forward control messages (queries, hits, etc.)
- How to find a resource record: decentralized flooding
  - send requests to neighbors
  - neighbors recursively forward the requests
Decentralized Flooding

[Diagram: an overlay of servents A-N and S connected by peering links.]
Decentralized Flooding

Send the query to neighbors; each node forwards the query to all of its neighbors other than the one that forwarded it the query.

[Diagram: the query from S propagates hop by hop across the overlay of servents A-N.]
Background: Decentralized Flooding

Each node should keep track of forwarded queries to avoid loops! Two options:
- nodes keep state (which will time out: soft state)
- carry the state in the query, i.e., carry a list of visited nodes

[Diagram: without duplicate suppression, the same query would circulate around cycles in the overlay.]
Decentralized Flooding: Gnutella

- Basic message header
  - unique ID, TTL, hops
- Message types
  - Ping: probes the network for other servents
  - Pong: response to a Ping; contains IP address, # of files, etc.
  - Query: search criteria + speed requirement of the servent
  - QueryHit: successful response to a Query; contains address + port to transfer from, speed of the servent, etc.
- Pings and Queries are flooded
- QueryHits and Pongs travel the reverse path of the corresponding message
Advantages and Disadvantages of Gnutella

- Advantages:
  - totally decentralized, highly robust
- Disadvantages:
  - not scalable; the entire network can be swamped with flooded requests
    - especially hard on slow clients; at some point broadcast traffic on Gnutella exceeded 56 kbps
  - to alleviate this problem, each request has a TTL to limit its scope, as in the sketch below
    - each query has an initial TTL, and each node forwarding it reduces it by one; if the TTL reaches 0, the query is dropped (consequence?)
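A minimal sketch of the flooding rule with duplicate suppression and a TTL (class and method names are hypothetical; real Gnutella servents exchange these messages over TCP):

import java.util.*;

class Servent {
    // IDs of queries already seen; real servents expire these entries (soft state)
    private final Set<UUID> seen = new HashSet<>();
    final List<Servent> neighbors = new ArrayList<>();

    void onQuery(UUID id, String criteria, int ttl, Servent from) {
        if (!seen.add(id)) return;      // duplicate: already forwarded, avoid loops
        // ... check local files; send a QueryHit back along the reverse path on a match ...
        if (ttl - 1 <= 0) return;       // TTL exhausted: drop the query, limiting flood scope
        for (Servent n : neighbors) {
            if (n != from) {
                n.onQuery(id, criteria, ttl - 1, this); // flood to all other neighbors
            }
        }
    }
}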
Flooding: FastTrack (aka Kazaa)

- Modifies the Gnutella protocol into a two-level hierarchy
- Supernodes
  - nodes that have better connections to the Internet
  - act as temporary indexing servers for other nodes
  - help improve the stability of the network
- Standard nodes
  - connect to supernodes and report their lists of files
- Search
  - broadcast (Gnutella-style) search across supernodes
- Disadvantages
  - kept centralized registration → prone to lawsuits
Optional Slides
Aside: Search Time?
Aside: All Peers Equal?

[Diagram: peers in the overlay have very different access bandwidths: 10 Mbps LAN, 1.5 Mbps DSL, and 56 kbps modems.]
Aside: Network Resilience

[Diagram: three partial topologies comparing the original overlay, random failure of 30% of nodes, and targeted removal of 4% of nodes.]

from Saroiu et al., MMCN 2002
Asynchronous Network Programming (C/C++)
A Relay TCP Client: a telnet-like Program

[Diagram: the TCP client reads terminal input with fgets and relays it to the TCP server with writen; it reads the server's bytes with readn and prints them with fputs.]

http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpclient
Method 1: Process and Thread

- Process
  - fork()
  - waitpid()
- Thread: lightweight process
  - pthread_create()
  - pthread_exit()
pthread

/* Main thread: relay data from the socket to stdout;
   a second thread (copy_to) relays stdin to the socket. */
void main() {
    char recvline[MAXLINE + 1];
    pthread_t tid;
    ss = new socketstream(sockfd);
    if (pthread_create(&tid, NULL, copy_to, NULL)) {
        err_quit("pthread_create()");
    }
    while (ss->read_line(recvline, MAXLINE) > 0) {
        fprintf(stdout, "%s\n", recvline);
    }
}

/* Thread body: copy stdin lines to the socket, then half-close it. */
void *copy_to(void *arg) {
    char sendline[MAXLINE];
    if (debug) cout << "Thread create()!" << endl;
    while (fgets(sendline, sizeof(sendline), stdin))
        ss->writen_socket(sendline, strlen(sendline));
    shutdown(sockfd, SHUT_WR);
    if (debug) cout << "Thread done!" << endl;
    pthread_exit(0);
}
Method 2: Asynchronous I/O (select)

- select: deals with blocking system calls

int select(int n, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

void FD_CLR(int fd, fd_set *set);
void FD_ZERO(fd_set *set);
int  FD_ISSET(int fd, fd_set *set);
void FD_SET(int fd, fd_set *set);
Method 3: Signal and Select

- signal: handles events such as timeouts
Examples of Network Programming

- A library to make life easier
- Four design examples
  - TCP client
  - TCP server using select
  - TCP server using process and thread
  - reliable UDP
- Warning: it would be hard to listen to me reading through the code. Read the code.
Example 2: A Concurrent TCP Server Using Process or Thread

- Gets a line and echoes it back
- Uses select()
- For how to use process or thread, see later
- Check the code at:
  http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpserver
- Are there potential denial-of-service problems with the code?
Example 3: A Concurrent HTTP TCP Server Using Process/Thread

- Uses process-per-request or thread-per-request
- Check the code at:
  http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/simple_httpd