Content-aware Switch

Introduction to Content-aware Switch
Presented by Li Zhao
Content-aware Switch (CS)
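[Figure: the switch sits between the Internet and three back-end servers for www.yahoo.com (image server, application server, HTML server). The example packet carries IP, TCP, and application data: "GET /cgi-bin/form HTTP/1.1", "Host: www.yahoo.com…"]
• Front end of a web cluster
• Routes packets based on layer 5/7 (content) information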
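As an illustration of routing on layer 5/7 information, the following minimal sketch parses the request line shown in the figure and picks a back end from the URL. The routing rules and the server names (`application-server`, `image-server`, `html-server`) are assumptions made for the example, not taken from any real switch.

```python
def parse_request_line(request: bytes):
    """Return (method, path) from the first line of an HTTP request."""
    first_line = request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    method, path, _version = first_line.split(" ", 2)
    return method, path


def choose_server(path: str) -> str:
    """Pick a back end by URL content; these rules are illustrative only."""
    if path.startswith("/cgi-bin/"):
        return "application-server"   # hypothetical name for the CGI back end
    if path.lower().endswith((".gif", ".jpg", ".png")):
        return "image-server"         # hypothetical name for the image back end
    return "html-server"              # everything else goes to the HTML back end


request = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
method, path = parse_request_line(request)
target = choose_server(path)
```

A layer 4 switch could not make this choice, because the path only becomes visible after the TCP connection is established and the request data has arrived.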
Why use CS
• Servers can be specialized for certain types of request
– Content segregation
• Exploit locality
– Affinity-based routing
– Performance increases because of the improved hit rate
• Partial replication of the server file set
– Partition the server's file set over different nodes
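One simple way to get both affinity and a partitioned file set is to hash the URL onto a node, so that repeated requests for the same document reach the same server. This is only a sketch of the idea; the node names are hypothetical, and real schemes such as the one in [Pai98] also take load into account.

```python
import zlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical back-end names


def node_for(path: str, nodes=NODES) -> str:
    """Map a URL path to one node so repeated requests hit the same server."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return nodes[zlib.crc32(path.encode("utf-8")) % len(nodes)]


first = node_for("/news/today.html")
second = node_for("/news/today.html")   # same path, same node
```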
Content-aware Switch Architecture
• Two-way architecture: the server returns the response to the switch
• One-way architecture: the server returns the response directly to the client
[Figure: client, switch, and server]
Layer 7 Two-way Architecture
Layer-7 Two-way Mechanisms
• TCP gateway: an application-level proxy running on the web switch mediates the communication between the client and the server.
• TCP splicing: reduces the overhead of the TCP gateway. Packet forwarding occurs at the network level, between the network interface driver and the TCP/IP stack, and is carried out directly by the OS.
[Figure: user and kernel levels on the switch for each mechanism]
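To make the gateway overhead concrete, here is a minimal sketch of the relay loop at the heart of an application-level proxy. The function name `relay` and the buffer size are assumptions for the example; a real gateway would also parse the request, choose the back end, and relay in both directions.

```python
import socket


def relay(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Copy bytes from src to dst until src reports end-of-stream."""
    total = 0
    while True:
        chunk = src.recv(bufsize)      # kernel -> user space copy
        if not chunk:                  # empty read means the peer closed
            break
        dst.sendall(chunk)             # user space -> kernel copy
        total += len(chunk)
    return total
```

Every chunk crosses the user/kernel boundary twice in this loop, which is the cost that TCP splicing avoids by forwarding below the TCP/IP stack.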
TCP Splicing
[Figure: message sequence among client, content switch, and server, steps 1 to 8. Messages recovered from the diagram, in order:]
Client and content switch: SYN(CSEQ); SYN(DSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(DSEQ+1); DATA(DSEQ+1) ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1)
Content switch and server: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1)
lenR: size of the HTTP request.
lenD: size of the returned document.
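The sequence above implies that, once the two connections are spliced, the switch only has to shift server sequence numbers from the SSEQ space into the DSEQ space (and client acknowledgments back again); client sequence numbers pass through unchanged because the switch reuses CSEQ toward the server. A minimal sketch of that translation, with the class and method names chosen for this example:

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Splice:
    """Sequence-number translation for one spliced connection."""

    def __init__(self, dseq: int, sseq: int):
        # Offset between the switch-chosen DSEQ and the server-chosen SSEQ.
        self.offset = (dseq - sseq) % MOD

    def server_to_client_seq(self, seq: int) -> int:
        """Rewrite a server sequence number into the client's DSEQ space."""
        return (seq + self.offset) % MOD

    def client_to_server_ack(self, ack: int) -> int:
        """Rewrite a client acknowledgment into the server's SSEQ space."""
        return (ack - self.offset) % MOD


splice = Splice(dseq=1000, sseq=4000)
data_seq = splice.server_to_client_seq(4001)      # DATA(SSEQ+1) -> DATA(DSEQ+1)
ack_to_server = splice.client_to_server_ack(1501)  # ACK(DSEQ+lenD+1), lenD = 500
```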
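TCP Splicing w/ Pre-forked Connections
[Figure: message sequence among client, switch, and server, steps 1 to 9. The switch opens the server connection before the client arrives. Messages recovered from the diagram, in order:]
Switch and server (pre-forked): SYN(PSEQ); SYN(SSEQ) ACK(PSEQ+1); ACK(SSEQ+1)
Client and switch: SYN(CSEQ); SYN(DSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(DSEQ+1); DATA(DSEQ+1) ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1)
Switch and server (after the request arrives): DATA(PSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(PSEQ+lenR+1); ACK(SSEQ+lenD+1)
lenR: size of the HTTP request.
lenD: size of the returned document.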
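With pre-forked connections the server-side handshake used PSEQ rather than CSEQ, so the sequence suggests the switch must translate in both directions: CSEQ to PSEQ toward the server, and SSEQ to DSEQ toward the client. The sketch below shows a pool of idle connections and the two offsets; `PreforkedPool` and `make_translators` are names invented for this example.

```python
from collections import deque

MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


class PreforkedPool:
    """Idle server connections opened before any client arrives."""

    def __init__(self):
        self._idle = {}  # server name -> deque of (pseq, sseq)

    def add(self, server, pseq, sseq):
        self._idle.setdefault(server, deque()).append((pseq, sseq))

    def take(self, server):
        """Return an idle connection's (pseq, sseq), or None if none is left."""
        idle = self._idle.get(server)
        return idle.popleft() if idle else None


def make_translators(cseq, dseq, pseq, sseq):
    """Build (to_server, to_client) functions mapping (seq, ack) pairs."""
    c2s = (pseq - cseq) % MOD   # client sequence space -> pre-forked space
    s2c = (dseq - sseq) % MOD   # server sequence space -> client-visible space

    def to_server(seq, ack):
        return (seq + c2s) % MOD, (ack - s2c) % MOD

    def to_client(seq, ack):
        return (seq + s2c) % MOD, (ack - c2s) % MOD

    return to_server, to_client


pool = PreforkedPool()
pool.add("html-server", 700, 900)             # hypothetical server name
pseq, sseq = pool.take("html-server")
to_server, to_client = make_translators(cseq=100, dseq=300, pseq=pseq, sseq=sseq)
```

For example, `to_server(101, 301)` maps DATA(CSEQ+1) ACK(DSEQ+1) onto DATA(PSEQ+1) ACK(SSEQ+1), matching the diagram.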
Pre-Allocate Server Scheme
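[Figure: message sequence among client, content switch, and pre-allocated server, steps 1 to 5. Messages recovered from the diagram, in order:]
Client and content switch: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1)
Content switch and pre-allocated server: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1)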
• Make a guessed routing decision based on IP address, port number, and history
• Advantages:
– Faster than TCP splicing
– Reduced session processing overhead: no need to convert server sequence numbers
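The guess has to be made when the SYN arrives, before any content is visible. One possible form of the "history" input is a table that remembers which server last handled each client; this sketch assumes that form, and `GuessTable` and the server names are hypothetical.

```python
class GuessTable:
    """Remember which server last handled each client address."""

    def __init__(self, default_server):
        self.default_server = default_server
        self._last = {}  # client IP -> server that turned out to be right

    def guess(self, client_ip):
        """Pick a server at SYN time, before the request is known."""
        return self._last.get(client_ip, self.default_server)

    def record(self, client_ip, right_server):
        """Update history once the request has been parsed."""
        self._last[client_ip] = right_server


table = GuessTable(default_server="html-server")
first_guess = table.guess("10.0.0.1")          # no history yet
table.record("10.0.0.1", "image-server")
second_guess = table.guess("10.0.0.1")         # now follows history
```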
Degenerates to TCP Splicing If the Guess Is Wrong
[Figure: message sequence among client, content switch, pre-allocated server, and right server. Messages recovered from the diagram:]
Client and content switch: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(SSEQ+1) ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1)
Content switch and pre-allocated server: SYN(CSEQ); SYN(SSEQ) ACK(CSEQ+1); FIN(CSEQ+1)
Content switch and right server: SYN(CSEQ); SYN(RSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(SSEQ+1); DATA(RSEQ+1) ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1)
Sequence number conversion is needed.
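When the guess is wrong, the client has already seen SSEQ from the pre-allocated server, while the right server picks its own RSEQ. The conversion therefore plausibly maps RSEQ-based numbers back into the SSEQ space the client expects. The sketch below assumes exactly that; the function names are invented for the example.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def resolve_guess(guessed, right, sseq, rseq):
    """Return (needs_reconnect, seq_offset) once the request has been parsed."""
    if guessed == right:
        return False, 0                   # guess was right: no conversion needed
    return True, (sseq - rseq) % MOD      # shift RSEQ space into SSEQ space


def right_server_seq_to_client(seq, offset):
    """Rewrite a right-server sequence number into what the client expects."""
    return (seq + offset) % MOD


hit = resolve_guess("html-server", "html-server", sseq=500, rseq=900)
need_reconnect, offset = resolve_guess("html-server", "image-server", sseq=500, rseq=900)
client_seq = right_server_seq_to_client(901, offset)   # DATA(RSEQ+1) -> DATA(SSEQ+1)
```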
Case Study
• Linux-based content-aware switch [Yang99]
• IBM Layer 5 [Pradhan00]
Functional Overview of Content-aware Distributor
Results
• Overhead of the switch
– Reduced by 89 µs with pre-forked connections
• CS vs. Layer 4 switch
– Affinity-based routing vs. weighted round-robin (WRR)
– Content segregation vs. WRR
– CGI: 27%
– Static: 36%
IBM Switch Architecture
• Switch core
• Port controller:
– Identifies layer 5 packets and sends them to the CPU
– Processes all other packets
• CPU: PowerPC 603e
– Parses the HTTP request
– URL-based routing
Flow Diagram on Layer 5 System
• Client ports vs. server ports
• Classifier: Identify packets
Results
• CS vs. Layer 4 switch
– Entire set of files is replicated
– Some servers share files by NFS
– Partitioned file set
Layer-7 One-way Architecture
Layer-7 One-way Mechanisms
• TCP handoff: the switch hands off the TCP connection endpoint to the server
• TCP connection hop
– Software-based proprietary solution
– Encapsulates the IP packet in an RPX packet and sends it to the server
TCP Handoff
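[Figure: message sequence among client, content switch, and server, steps 1 to 6. Messages recovered from the diagram, in order:]
Client and content switch: SYN(CSEQ); SYN(DSEQ) ACK(CSEQ+1); DATA(CSEQ+1) ACK(DSEQ+1)
Content switch to server: Migrate(Data, CSEQ, DSEQ)
Server to client: DATA(DSEQ+1) ACK(CSEQ+lenR+1)
Client to server, forwarded by the content switch: ACK(DSEQ+lenD+1)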
• Migrate the established TCP connection from the switch to the back-end server
– Create a TCP connection at the back end without going through the TCP three-way handshake
– Retrieve the state of an established connection and destroy the connection without going through the normal message handshake required to close a TCP connection
• Once the connection is handed off to the back-end server, the switch must forward packets from the client to the appropriate back-end server
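The Migrate(Data, CSEQ, DSEQ) message carries enough state for the back end to continue the connection as if it had performed the handshake itself. The following sketch models that state and the switch's forwarding table in plain Python; all class, field, and function names are assumptions for illustration, and a real handoff is implemented inside the kernel TCP stack.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


@dataclass
class HandoffState:
    client_addr: tuple   # (ip, port) of the client
    cseq: int            # client's initial sequence number
    dseq: int            # switch's initial sequence number
    data: bytes          # HTTP request already received by the switch


@dataclass
class ServerConnection:
    client_addr: tuple
    snd_nxt: int         # next sequence number the server will send
    rcv_nxt: int         # next sequence number the server expects
    pending_request: bytes


forwarding = {}  # switch side: client address -> back-end server name


def hand_off(state: HandoffState, server_name: str) -> ServerConnection:
    """Create the back-end endpoint without a three-way handshake."""
    forwarding[state.client_addr] = server_name   # later client packets go here
    return ServerConnection(
        client_addr=state.client_addr,
        snd_nxt=(state.dseq + 1) % MOD,                     # DATA(DSEQ+1)
        rcv_nxt=(state.cseq + 1 + len(state.data)) % MOD,   # ACK(CSEQ+lenR+1)
        pending_request=state.data,
    )


state = HandoffState(("10.0.0.1", 4321), cseq=100, dseq=300,
                     data=b"GET / HTTP/1.1\r\n\r\n")
conn = hand_off(state, "html-server")   # hypothetical server name
```

Because the server answers the client directly with DSEQ-based numbers, no sequence translation is needed on the response path; the switch only forwards client packets using the table.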
References
• [Pradhan00] G. Apostolopoulos et al., Design, Implementation and Performance of a Content-Based Switch, Proceedings of IEEE INFOCOM 2000
• [Pai98] V. S. Pai et al., Locality-Aware Request Distribution in Cluster-based Network Servers, Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, Oct. 1998
• [Aron00] Mohit Aron et al., Scalable Content-aware Request Distribution in Cluster-based Network Servers, Proc. of the 2000 Annual USENIX Technical Conference, June 2000
• [Edward] C. Edward Chow, Introduction to Content Switch
• [Valeria01] Valeria Cardellini et al., The State of the Art in Locally Distributed Web-server Systems, IBM research report
• [Yang99] Chu-Sing Yang et al., Efficient Support for Content-based Routing in Web Server Clusters, Proc. of USITS '99