Content Processing C. Edward Chow Department of Computer

Introduction to Content Switch
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
[email protected]
This tutorial is available at
http://cs.uccs.edu/~chow/pub/agere/contentswitch.ppt
With agere as login and ag2003ere as password
4/11/2003
Edward Chow
Content Switch 1
Outline of the Talk
• Overview of Content Delivery Network and Linux Virtual Server Technologies
• Overview of Content Switching Concepts
• TCP Delayed Binding and Their Improvement
• Conflict Detection in Content switching Rule Set
• Persistent Issues
• Problems Encountered in Content Processing and their Solutions
• Specific Implementations and Their Performance
• Achieving High Availability with Content Switch
Content Delivery Network (CDN)
[Figure: clients reach the Host Server through many ISP networks (@Home, PSINet, Sprint, QWest, UUnet, Mind Spring, Gloobix). Huge Requests at the Host Server lead to Slow Response and Server Crash.]
Content Delivery Problems
http://www.akamai.com
Use Client Cache/
Client Side Cache Server
[Figure: the same ISP networks, with a Client Cache and a Client Side Cache Server placed next to the clients. Result: Fewer Requests reach the Host Server and clients see Fast Response.]
Use Mirror Sites
Need improvement by guiding the selection of mirror servers
with server load/network bandwidth measurement
[Figure: the same ISP networks, with two Mirror Sites added. Result: Fewer Requests reach the Host Server and clients see Fast Response.]
Edge Network Cache Servers
[Figure: Cache Servers are placed inside the ISP networks (Edge Network Cache Servers), alongside the Client Cache, the Client Side Cache Server, and the Mirror Sites. Result: Fewer Requests reach the Host Server and clients see Fast Response.]
Content Delivery Problem
• Cache Location Problem:
Where to put cache servers?
• How many are needed?
• When/where/how to push/delivery the content?
• How about dynamic content?
Akamai Edge Delivery Service
Date      # of Edge Servers   # of Networks   # of Countries
11/2000   6000                335             54
6/2001    9700                650             56
• Peering Bottleneck Problem:
Access traffic evenly spread over 7400+ networks
(no one over 5%; most << 1%)
=> Need to put edge servers in many networks.
• 11/2000, 4 billion bits/day for 2800 sites.
• Source: http://www.akamai.com
Caching Dynamic Content at
Web Proxies
• Active Cache Project : [PeiCao 98] Univ. Wisconsin
– Cache Java applet to be executed at proxies
– Choice of passing to server, delivery cached copy,
or generate dynamically.
• Edge Side Include (ESI):
– XML tag to specify ESI fragment in a web page.
– Each ESI fragment can have different cache/
Edge Side Include Example
http://www.esi.org/
<table>
<tr><td colspan="2">
<esi:try>
<esi:attempt>
<esi:include
src="http://www.myxyz.com/news/top.html"
onerror="continue" />
</esi:attempt>
<esi:except>
<!--esi
This spot is reserved for your company's
advertising. For more info <a
href="http://www.myxyz.com"> click here </a>
-->
</esi:except>
</esi:try>
</td></tr>
</table>
Solution to First Mile Problem
• First Mile Problem: Huge requests at web site of CDN
• High Bandwidth Connection
• Caching
– End System Cache
• Client Cache
• Client Site Proxy Cache Server
• Mirror Site Caches
– Cache Servers in Internet
• Hierarchical Cache Servers, e.g., Squid/Harvest/Adaptive Web
• Edge Servers of Akamai
• Faster Server/Server Farm (Server Side Caching+Cluster)
• Layer4 Load balancer+Real Servers
• Content Switch+Real Servers
• Distributed Packet Rewrite
Web Server Cluster
Load balancer can run at
• Application level: Reverse Proxy
• Kernel level: Linux Virtual Server
Load balancer can distribute requests based on
• Layer 3-4 info: fixed field/fast hash
• Layer 7 info: var. length/slow parsing
[Figure: a Load Balancer or Content Switch in front of four Real Servers.]
Comparison of Load Balancers
• Reverse Proxy runs as an application process and requires more memory/packet copying.
• Linux Virtual Server runs in the kernel => no memory copying.

Name                                   Type   Level          Layer Info
Reverse Proxy/Apache/Tomcat/Servlet    SW     Application    3-7
Linux Virtual Server                   SW     Kernel         3-4
Linux Content Switch                   SW     Kernel/Appl.   3-7
Layer4 Switch (narrow def.)            HW     Embedded OS    3-4
Content/Web Switch                     HW     Embedded OS    3-7
Linux Virtual Server (LVS)
• “Virtual server is a highly scalable and highly available server built on a cluster of real servers. The architecture of the cluster is transparent to end users, and the users see only a single virtual server” with Virtual IP address (VIP).
• http://www.linuxvirtualserver.org/
CIP: Client IP Address
VIP: Virtual IP Address
RIP: Real Server IP Address
[Figure: the Client (CIP) reaches, over the Internet, a Load Balancer/Director (Linux box) that holds the VIP; the director connects over a WAN/LAN to Real Server1 (RIP1), Real Server2 (RIP2), and Real Server3 (RIP3).]
LVS-NAT Configuration
(Network Address Translation)
• All return traffic goes through the Director => slow
• Modify IP addr/port #/Checksum at Director
• Director and real servers at same LAN
• No modification needed on real servers
• Port remapping: real web server can run on 8080
[Figure: Client (CIP) -> Internet -> Director (VIP) -> Switch -> Real Server1 (RIP1), Real Server2 (RIP2), Real Server3 (RIP3).]
LVS-NAT Configuration
Step 2. Director routes Pkt
• Based on CIP, source port#, VIP and dst port#,
director selects one of the real servers
• Change the dst IP addr or port # of pkt.
[Figure: 1. The request (CIP -> VIP) arrives at the Director. 2. Scheduling/rewrite packet: using the LVS routing and scheduling rules set with the ipvsadm cmd, the Director rewrites the packet to CIP -> RIP1 and sends it through the Switch to Real Server1.]
LVS-NAT Configuration
Step 3. Real Server Replies
• Real server processes the request and produces the response.
• All real servers set default gateway to Director, like any other NAT or IP masquerade setup.
• Packet will be sent back to Director.
[Figure: 3. Real Server1 processes the request and sends the reply (RIP1 -> CIP) back through the Switch to the Director.]
LVS-NAT Configuration
Step 4. Director rewrites reply
• Director changes the source IP addr. (RIP1) of pkt to VIP
• Modify port # if needed.
• Modify the checksum; send back pkt.
[Figure: 4. Rewrite reply: the Director rewrites the reply from RIP1 -> CIP to VIP -> CIP and sends it to the client.]
LVS-NAT Configuration
(Network Address Translation)
• All return traffic goes through the Director => slow
• Modify IP addr/port #/Checksum at Director.
• Director and real servers at same LAN
[Figure: complete LVS-NAT flow. 1. Request (CIP -> VIP). 2. Scheduling/rewrite packet (CIP -> RIP1). 3. Real server processes request (reply RIP1 -> CIP). 4. Director rewrites reply (VIP -> CIP). 5. Client receives reply.]
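The two rewrites in the LVS-NAT steps can be sketched in a few lines. This is a minimal illustration, not IPVS code: the Packet class, the client address 10.0.0.7, and the function names are assumptions, the VIP/RIP values are borrowed from the setup commands in this deck, and the checksum update that a real director must perform is left out.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

VIP, VPORT = "202.103.106.5", 80   # virtual service address
RIP, RPORT = "172.16.0.3", 8000    # selected real server (port remapped)

def rewrite_request(pkt: Packet) -> Packet:
    """Step 2: replace the destination VIP:port with the real server's RIP:port."""
    return replace(pkt, dst_ip=RIP, dst_port=RPORT)

def rewrite_reply(pkt: Packet) -> Packet:
    """Step 4: replace the source RIP:port with VIP:port so the client sees the VIP."""
    return replace(pkt, src_ip=VIP, src_port=VPORT)

request = Packet("10.0.0.7", 40000, VIP, VPORT)   # hypothetical client (CIP)
to_server = rewrite_request(request)
reply = Packet(RIP, RPORT, request.src_ip, request.src_port)
to_client = rewrite_reply(reply)
```

Because both directions are rewritten at the director, every reply has to pass through it, which is the bottleneck the slide points out.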
LVS-NAT Setup Commands
# make the director forward the masquerading packets
echo 1 > /proc/sys/net/ipv4/ip_forward
ipchains -A forward -j MASQ -s 172.16.0.0/24 -d 0.0.0.0/0
# Add virtual service and link a scheduler to it
# wlc: Weighted Least-Connection scheduling
ipvsadm -A -t 202.103.106.5:80 -s wlc
# wrr: Weighted Round Robin scheduling
ipvsadm -A -t 202.103.106.5:21 -s wrr
# Add real servers and select forwarding method (-m: masquerading/NAT) and weight (-w)
ipvsadm -a -t 202.103.106.5:80 -r 172.16.0.2:80 -m
ipvsadm -a -t 202.103.106.5:80 -r 172.16.0.3:8000 -m -w 2
ipvsadm -a -t 202.103.106.5:21 -r 172.16.0.2:21 -m
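To show what the wlc scheduler selected above is doing, here is a simplified sketch of weighted least-connection selection. It is an assumption-laden toy: the RealServer class and pick_wlc function are hypothetical names, and only active connections are counted, whereas the kernel scheduler uses a more detailed load estimate.

```python
from dataclasses import dataclass

@dataclass
class RealServer:
    address: str
    weight: int
    active_connections: int = 0

def pick_wlc(servers):
    """Pick the server with the smallest connections-to-weight ratio."""
    candidates = [s for s in servers if s.weight > 0]
    return min(candidates, key=lambda s: s.active_connections / s.weight)

# Same two real servers as in the port-80 service; weight 1 is the default.
servers = [
    RealServer("172.16.0.2:80", weight=1),
    RealServer("172.16.0.3:8000", weight=2),
]
for _ in range(6):
    chosen = pick_wlc(servers)
    chosen.active_connections += 1
```

With six connections that all stay open, the weight-2 server ends up holding twice as many as the weight-1 server.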
LVS-Tunnel Configuration
(IP Tunneling)
• Real Servers need to handle IP over IP packets.
• Real Servers can be geographically separated, and return traffic goes through different routes.
• Security implication!
[Figure: 1. The request (CIP -> VIP) reaches the Load Balancer (Linux box; VIP, RIP0). 2. Scheduling: the load balancer puts the packet in an IP tunnel (outer header RIP0 -> RIP2, inner packet CIP -> VIP). 3. Real Server2 processes the request. 4. The client receives the reply (VIP -> CIP) directly from the real server.]
LVS-Tunnel Setup Commands
#The load balancer (LinuxDirector), kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -i
#The real server 1, kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
# insert it if it is compiled as module
insmod ipip
ifconfig tunl0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev tunl0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden
LVS-DR Configuration
(Direct Routing)
• Real servers need to configure a non-arp alias interface with virtual IP address, and that interface must share same physical segment with load balancer.
• Only Director’s interface replies to VIP ARP request.
• Director only rewrites server MAC address; IP packet not changed => Fast!
GMAC: Gateway MAC address
[Figure: 1. The request frame (GMAC -> VMAC, carrying CIP -> VIP) arrives at the Director. 2. Scheduling/rewrite packet: the Director forwards the frame as VMAC -> RMAC3 through the router/switch to Real Server3; the IP packet is still CIP -> VIP.]
LVS-DR Configuration
Step 3. Process Request
• Real server returns the reply.
• Reply goes directly through switch/router; not Director.
GMAC: Gateway MAC address
[Figure: 1. Request (GMAC -> VMAC, CIP -> VIP). 2. Scheduling/rewrite packet at the Linux Director (VMAC -> RMAC3). 3. Real Server3 processes the request and replies (RMAC3 -> GMAC, VIP -> CIP) through the switch. 4. Client receives reply.]
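The direct-routing step can be contrasted with NAT in a short sketch: only the link-layer destination changes and the IP header is forwarded untouched. The Frame class and direct_route function are illustrative assumptions, and the symbolic names (GMAC, VMAC, RMAC3, CIP, VIP) stand in for real addresses.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str

def direct_route(frame: Frame, real_server_mac: str) -> Frame:
    """Rewrite only the destination MAC; the IP packet stays CIP -> VIP."""
    return replace(frame, dst_mac=real_server_mac)

incoming = Frame(src_mac="GMAC", dst_mac="VMAC", src_ip="CIP", dst_ip="VIP")
forwarded = direct_route(incoming, "RMAC3")
```

Since the real server accepts packets addressed to the VIP on its non-arp alias interface, it can answer the client with source VIP without involving the director again.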
LVS-DR Setup Commands
#The load balancer (LinuxDirector), kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g
#The real server 1, 172.26.20.112, kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ifconfig lo:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev lo:0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
Performance of LVS-based Systems
“We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 Web servers answering with tiny identical dummy responses for a few minutes. Worked just fine.”
Jerry Glomph Black, Director, Internet & Technical Operations, RealNetworks.
“I had basically (1024) four class-Cs of virtual servers which were loadbalanced through a LinuxDirector (two, actually -- I used redundant directors) onto four real servers which each had the four different class-Cs aliased on them.”
"Ted Pavlic" <[email protected]>
LVS Usage Survey 2/15/2001 Lorn Key
Clusters: 20, 1, 2, 2, 2
Directors Per Cluster: 2, 2, 2, 2, 2
Total Real Servers: 170, 12, 4, 15, 6
Routing Methods: DR/NAT, DR, NAT, DR, NAT
Schedule Methods: RR/WLC, WRR, LC, WLC, WLC
Types of Real Servers: RH6.2, Linux, Win, Linux, Linux, Solaris, RH
Service Offered: WWW, WWW/other, WWW, DB, WWW, SMTP, WWW
File System Replication: rsync, rsync, Coda, NFS, Custom, rsync, custom
Monitoring Software: Heartbeat, ldirectord, Nanny/Pulse, Heartbeat, Mon, Nanny, Pulse, Heartbeat
C. Edward Chow
Department of Computer Science
University of Colorado at Colorado Springs
Sponsored by Computer Comm. Lab/ITRI
Content Switch Topics
• What is a Content Switch?
• What Services it Can Provide
• Content Switch Example
• Related Technologies
• Content Switch Architecture and Basic Operations
• TCP Delay Binding and Related Improvement
• Content Switch Rule and Conflict Detection
• Conclusion
Content Switch (CS)
• Route packets based on high layer (Layer 5/7) headers
and content.
• Examples:
– Direct Web traffic based on pattern of
• URLs, cookies – URL Switching
• XML Tag Value– Web Switching
– Can Route incoming email based on email address;
Connect POP/IMAP based on login
• Web switches and Intel XML Director/accelerator are
special cases of content switch.
What Services It Can Provide
• Enabling premium services for e-commerce, ISP, and
Web hosting providers
• Load Balancing and High Available Server Clusters:
Web, E-commerce, Email, Computing, File, SAN
• Policy-based networking, differential/QoS services.
• Firewall, Strengthening DoS protection, cache/firewall
load-balancing
• ‘Flash-crowd' management
• Email Spam Protection, Virus Detection/Removal
• Applet Authentication/Filtering
F5 VRM Solution
[Figure: a User and a Local DNS reach, over the Internet, Site I (newyork.domain.com), Site II (losangeles.domain.com), Site III (tokyo.domain.com), and london.domain.com. Components shown: Router, 3-DNS, BIG-IP, Server Array, and GLOBAL-SITE operated by the Webmaster.]
ServerIron 100
Web Switch
• Integrated Layer 2 through Layer 7 switching
• Support for up to 7,000,000 concurrent sessions, and 20 Gbps of
throughput
• High-availability server load balancing with active/active
configuration and stateful fail-over
• Industry's most powerful content switching capabilities, including
URL, Cookie and SSL Session ID based switching
• Content-aware cache switching
• High performance VPN/Firewall load balancing
• Robust protection against Denial of Service (DoS) attacks
• Most comprehensive global server load balancing with DNS Proxy
and client proximity measurements
Cisco CSS11000
Content Service Switch
The switch comprises four high-speed RISC processors with 512 MB of memory and 20.0 Gbps of throughput. Distributed flow forwarding engines feature up to 16 port-level network processors with up to 128 MB of memory for wire-speed delivery of Web content. Support for "sticky" connections based on IP address, Secure Socket Layer (SSL) session ID, and cookies ensures reliability and security for e-commerce transactions. The unique Cisco content replication technology enables dynamic expansion of site capacity in response to sudden "flash crowds" for "hot" content or seasonal peaks in traffic that can overwhelm servers.
Nortel Alteon
Web Switch
• Provides wire-speed Layer 2/3 Ethernet switching, plus high-speed processing based on Layer 4 through 7 information (TCP ports, URLs, HTTP headers and cookies, SSL session ID, etc.)
• Processes hundreds of thousands of concurrent sessions each second on eight multi-rate Ethernet ports (rate selectable per port), with one Gigabit or 100/1000 Mbps Ethernet uplink port
• Performs local and global server load balancing, application redirection, content filtering, streaming media load balancing, wireless Internet load balancing and content-aware Layer 7 switching
• Filters packets based on up to 2048 filtering rules (224 filtering rules for Alteon AD3/180e Web Switches), uniquely definable per switch and per port
• Meters, controls, and accounts for bandwidth use by client, server farm, virtual service, application, user class, content type and other traffic classes, and supports guaranteed minimum, metered available, and maximum burst bandwidth rates
Intel Netstructure
XML Director 7280
• Example of Rule:
Server1: create */order.asp & //Amount[Value >= 10000]
Phobos In-Switch
• Only load balancing switch in a PCI card form factor
• Plugs directly into any server PCI slot
• Supports up to 8,192 servers, ensuring availability and
maximum performance
• Six different algorithms are available for optimum performance:
Round Robin, Weighted Percentage, Least Connections,
Fastest Response Time, Adaptive and Fixed.
• Provides failover to other servers for high-availability of the web
site
• U.S. Retail $1995.00
E-Commerce Example: 1. Client
Client submits via HTTP/Post (or SOAP) the following purchase in XML:
<purchase>
<customerName>CCL</customerName>
<customerID>111222333</customerID>
<item><productID>309121544</productID>
<productName>IBM Thinkpad T21</productName>
<unitPrice>5000</unitPrice>
<noOfUnits>10</noOfUnits>
<subTotal>50000</subTotal>
</item>
<item><productID>309121538</productID>
<productName>Intel wireless LAN PC Card</productName>
<unitPrice>200</unitPrice>
<noOfUnits>10</noOfUnits>
<subTotal>2000</subTotal>
</item>
<totalAmount>52000</totalAmount>
</purchase>
E-Commerce Example:
2. Content Switch
• Content switch receives the packet.
• Recognize it is an HTTP POST request from the HTTP request line
POST /purchase.cgi HTTP/1.1
• Recognize it is an XML document from the meta header
content-type: TEXT/XML
• Parse the XML content
• Extract values of tag sequences:
purchase/totalAmount = 52000
purchase/customerName = CCL
• Rule 1 is matched and packet is routed to one of highSpeedServers.
Rule 1: if (xml.purchase/totalAmount > 5000) routeTo(highSpeedServers);
Rule 2: if (xml.purchase/customerName == CCL) routeTo(specialCustomerServers);
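The extract-then-match sequence on this slide can be sketched with Python's standard XML parser. This is an illustration only, not the content switch's implementation: tag_value, RULES, route, and the defaultServers fallback are hypothetical names, and the rules are checked in strict priority order so Rule 1 wins even though Rule 2 also matches.

```python
import xml.etree.ElementTree as ET

PURCHASE = """<purchase>
  <customerName>CCL</customerName>
  <totalAmount>52000</totalAmount>
</purchase>"""

def tag_value(root, path):
    """Return the text at a tag sequence such as 'purchase/totalAmount'."""
    head, _, rest = path.partition("/")
    if root.tag != head:
        return None
    node = root.find(rest) if rest else root
    return node.text if node is not None else None

# (rule name, condition, server group), checked in priority order
RULES = [
    ("Rule 1",
     lambda r: float(tag_value(r, "purchase/totalAmount") or 0) > 5000,
     "highSpeedServers"),
    ("Rule 2",
     lambda r: tag_value(r, "purchase/customerName") == "CCL",
     "specialCustomerServers"),
]

def route(xml_text, default="defaultServers"):
    root = ET.fromstring(xml_text)
    for name, condition, group in RULES:
        if condition(root):
            return name, group
    return None, default

matched = route(PURCHASE)
```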
No Free Lunch:
Penalty of Having Content Switch
                           Layer 4 Switching     Layer 7 Switching
packet header extraction   fixed short fields    varying length long fields
switch rule matching       hash table look up    pattern matching
=> Increased packet processing time.
• For XML Director/Accelerator, it needs to parse XML document and match tag sequences.
=> 1-3? order of processing time

Size of XML Document (Bytes)   XML Content Extract Time (ms)
600                            14
7000                           21
67104                          53
Related Technologies
• Application level solution:
Proxy server; Apache/Tomcat/Servlet; Microsoft NLB
• Kernel level layer 4 load balancing solution:
http://www.linuxvirtualserver.org/
– Joseph Mark’s presentation
– LVS-NAT(Network Address Translation) web page
– LVS-IP Tunnel web page
– LVS-DR (Direct Routing) web page
• Hardware solution: Cisco 11000, F5 (Big IP), Alteon Web
Systems, Foundry Networks (ServerIron),
Excellent information on: Foundry ServerIron Installation and
Configuration Guide, May 2000.
http://www.foundrynet.com/services/documentation/siug/
Basic Operations of Content Switching
CS: Content Switching
[Figure: Incoming Packets -> Packet Classification -> Header Content Extraction -> CS Rule Matching Algorithm -> Packet Routing (Load Balancing) -> Forward Packet To Servers. Other inputs shown: CS Rules (from the CS Rule Editor), Network Path Info, Server Load Status.]
Content Switch Architecture
Apostolopoulos
Infocom 2000
Content Switch Architecture
Case A: Controller finds there is an entry in its Hash Table; route request to “sticky connection” outgoing port.
[Figure: Client, controller Hash Table, and Real Server1.]
Content Switch Architecture
Case B, Step 1. Controller finds there is no entry in Hash Table; route request to content switch processor.
[Figure: Client, controller Hash Table, and Real Server1.]
Content Switch Architecture
Case B, Step 1. Controller finds there is no entry in Hash Table; route request to content switch processor.
Step 2. CS processor
a. Extract content/Match CS rules
b. Route request
c. Setup Sequence# modification on server side port
[Figure: Client, Hash Table, CS Rules, pkt modification info, and Real Server1.]
Content Switch Architecture
Step 3. At server side port, return pkts are modified (Sequence#/IP addr/Chksum) and routed back to client.
[Figure: Client, Hash Table, CS Rules, pkt modification info, and Real Server1.]
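Case A and Case B amount to a fast path and a slow path around one hash table. The sketch below is a hedged illustration of that split, assuming a flow key of (client IP, client port, VIP, virtual port); the Controller class and toy_cs_processor rule are hypothetical and stand in for the real controller and CS processor.

```python
from typing import Callable, Dict, Tuple

FlowKey = Tuple[str, int, str, int]   # (client IP, client port, VIP, virtual port)

class Controller:
    def __init__(self, cs_processor: Callable[[bytes], str]):
        self.hash_table: Dict[FlowKey, str] = {}
        self.cs_processor = cs_processor
        self.slow_path_calls = 0

    def handle(self, flow: FlowKey, payload: bytes) -> str:
        port = self.hash_table.get(flow)
        if port is not None:           # Case A: existing "sticky" entry
            return port
        self.slow_path_calls += 1      # Case B: hand off to the CS processor
        port = self.cs_processor(payload)
        self.hash_table[flow] = port   # later packets take the fast path
        return port

def toy_cs_processor(payload: bytes) -> str:
    """Hypothetical rule: .gif requests go to server2, the rest to server1."""
    request_line = payload.split(b"\r\n", 1)[0]
    url = request_line.split()[1]
    return "server2" if url.endswith(b".gif") else "server1"

controller = Controller(toy_cs_processor)
flow = ("10.0.0.7", 40000, "202.103.106.5", 80)
first = controller.handle(flow, b"GET /logo.gif HTTP/1.0\r\n\r\n")
second = controller.handle(flow, b"")   # follow-up packet, no parsing needed
```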
Efficient Content Switching Architecture
• Tasks: Millions of packets with thousands of rules to match and load balancing algorithms to run.
• How to assign tasks to the (network) processors and threads?
– Packet Extraction
(Understand header formats, XML parsing)
– Content Switching Rule Matching
– Packet Routing
(Load Balancing, Bandwidth Control)
• How Much Packet Processing Should Controllers Do?
• What a controller can do?
• A Typical Parallel Processing Problem?
TCP Delay Binding (Splicing)
[Figure: message sequence (steps 1-11) among client, content switch, and server.]
Client <-> content switch, in order: SYN(CSEQ); SYN(DSEQ), ACK(CSEQ+1); ACK(DSEQ+1); DATA(CSEQ+1), ACK(DSEQ+1); DATA(DSEQ+1), ACK(CSEQ+lenR+1); ACK(DSEQ+lenD+1); DATA(?) 2nd request, ACK(?)
Content switch <-> server, in order: SYN(CSEQ); SYN(SSEQ), ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1), ACK(SSEQ+1); DATA(SSEQ+1), ACK(CSEQ+lenR+1); ACK(SSEQ+lenD+1)
lenR: size of http request.
lenD: size of return document
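After the binding, the content switch has promised DSEQ to the client while the server chose SSEQ, so every later packet needs a sequence-number translation. A minimal sketch of that arithmetic follows; the SpliceState class and the sample sequence numbers are assumptions, and a real switch must also patch the TCP checksum.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits

class SpliceState:
    """Per-connection offset kept by the content switch after binding."""

    def __init__(self, dseq: int, sseq: int):
        self.delta = (sseq - dseq) % MOD

    def server_to_client_seq(self, seq: int) -> int:
        # DATA(SSEQ+1) from the server is shown to the client as DATA(DSEQ+1)
        return (seq - self.delta) % MOD

    def client_to_server_ack(self, ack: int) -> int:
        # ACK(DSEQ+lenD+1) from the client is forwarded as ACK(SSEQ+lenD+1)
        return (ack + self.delta) % MOD

dseq, sseq, len_d = 1000, 4_294_967_000, 5000   # sample values; sseq is near wrap-around
state = SpliceState(dseq, sseq)
first_data_seq = state.server_to_client_seq((sseq + 1) % MOD)
final_ack = state.client_to_server_ack(dseq + len_d + 1)
```

The client-side sequence number (CSEQ) is reused unchanged on the server side, which is why only the server-originated numbers need converting.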
Improve Content Switching
• Setup CS-Real Server connections ahead of time (Persistent HTTP Connections). NetScale
=> Reduce TCP 3-way handshake time
• Pre-allocate Server Scheme (Guess Real Server based on the TCP SYN)
• Sequence# modification on every return pkt => Need to recompute checksum also.
• Filter Scheme (Offload Sequence# modification/rule matching to real servers).
• Buffering/Pipeline (aggregate) Requests
Pre-Allocate Server Scheme
[Figure: message sequence (steps 1-6) among client, content switch, and pre-allocated server. The client's SYN(CSEQ) is passed to the pre-allocated server, and the server's SYN(SSEQ)/ACK(CSEQ+1) is returned to the client, so both sides of the content switch use the same sequence numbers (CSEQ, SSEQ).]
• Guess routing decision based on IP/Port#/History.
• Advantage:
– Faster than TCP delay binding.
– Possible direct route between client and server
– Reduce session processing overhead: no need to convert server sequence #
Degenerated to TCP Delayed Binding If Guess is Wrong
[Figure: message sequence (steps 1-12) among client, content switch, pre-allocated server, and right server. The handshake and request first go to the pre-allocated server (SSEQ), and the server sent HTTP 404. The content switch then opens a connection to the right server (RSEQ) and sends the request again; sequence # conversion is needed for the right server from then on.]
Filter Process Scheme
[Figure: message sequence (steps 1-10) among client, content switch, and server, with a Filter Process run on the server. The content switch completes the handshake with the client using DSEQ and receives the request, then opens a connection to the server and migrates (Data, CSEQ, DSEQ). The server's DATA(SSEQ+1) reaches the client as DATA(DSEQ+1).]
Pre-allocate performance plot
[Figure: plot of response time (microseconds, 0 to 500000) vs document size (bytes, 0 to 40000).]
Series 1 - Basic scheme with no rule matching module inserted, i.e., using default IPVS.
Series 2 - Basic scheme with the rule matching module inserted.
Series 3 - Pre-allocate scheme with all hits, i.e., where all pre-allocate guesses were correct.
Series 4 - Pre-allocate scheme with all misses, i.e., where all pre-allocate guesses were wrong.
Figure 3. Performance of Pre-allocate Server Scheme
Handling multiple requests
in a Keep-Alive connection
• Determine when new request arrives
– Verify that previous request has been completely received
– Request data size is > 0
• Key assumption is only one outstanding request is
sent at a time by client, i.e., requests are not
pipelined
• Reuse connections
– Store each connection control information in a
hash table keyed by real server address, once it is
established.
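The two ideas on this slide, detecting that the previous request is complete and reusing server connections from a table, can be sketched as follows. This is an assumption-based illustration: request_complete and ServerConnections are hypothetical names, the completeness test relies only on Content-Length, and the non-pipelining assumption stated above is taken for granted.

```python
def request_complete(buffer: bytes) -> bool:
    """True once the headers and any declared body have fully arrived."""
    head, sep, body = buffer.partition(b"\r\n\r\n")
    if not sep:
        return False                      # header block still incomplete
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return len(body) >= length

class ServerConnections:
    """Connection control info keyed by real server address, for reuse."""

    def __init__(self):
        self.table = {}
        self.opened = 0

    def get(self, server_addr: str) -> dict:
        if server_addr not in self.table:
            self.opened += 1              # first use: set up a new connection
            self.table[server_addr] = {"server": server_addr, "requests": 0}
        self.table[server_addr]["requests"] += 1
        return self.table[server_addr]

pool = ServerConnections()
pool.get("172.16.0.2:80")
reused = pool.get("172.16.0.2:80")
```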
Quiz
• Web server keeps the TCP connection alive,
expecting the browser to return for images and in-line
media files.
• How many keep-alive connections are setup on IE5
and Netscape 4.7 for web page with many .jpg/.gif
images?
• Can these image requests be pipelined from client
browser to web server?
Multiple HTTP Requests from One TCP Connection
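NAT approach
[Figure: the client holds one connection to the Content Switch, which connects to server1, server2, ..., server9; the requests for Index.htm and cs.jpg go to different servers.]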
• A keep alive TCP connection may include multiple HTTP “GET” requests.
• Content Switch examines each “GET” request and makes new routing decision.
• Content Switch establishes another connection with a different server based
on the routing decision.
• Those HTTP responses from different servers need to be interleaved and
seen by the user as if from the same server.
• Solutions: In-order delivery (buffer requirement); out-of-order delivery (seq# tracking)?
• Problems: Should we throw away earlier HTML requests if later requests are received?
Multiple HTTP Requests from One TCP Connection
[Figure: client, Content Switch, and server1, server2, ..., server9.]
• Can servers return documents directly to client in keep-alive session case?
• Can equivalent VS-Tunnel or VS-DR be implemented using Content Switch?
Content Switch Rule Survey
Survey shows that existing switches support
• rules in basic (condition action) or (action condition)
form
• some define condition as class, then specify the
action in separate statement or command
• simple single conditional term
• command line interface (to facilitate incremental
update?)
• Actions can include reject, forward, put in queue (for
bandwidth control, scheduling)
Content Switch Rule Design
• Rule syntax generic to support all Intended features.
• Use simple C if statement syntax rule: if (condition) { action }
– Easy to read
– Allow optimization using c compiler
• Condition consists of multiple terms of
– variable relational_operator value
e.g. xml.purchase/totalAmount > 50000
smtp.to == “[email protected]”
cookie.name == “servlet1”
bitmatch(64, 8, 0xff) == 64
# the line above means TTL == 64 (idea from the netfilter universal filter)
– suffix(variable, string) e.g. suffix(url, “gif”)
– regex(variable, pattern) e.g. regex(url, “/purchase”)
• Action consists of reject, forward(server | queue),
loadBalance(serverGroup, loadBalancingAlgorithm)
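The three condition helpers named in the rule syntax are easy to prototype. The versions below are a sketch under assumptions: the slide's bitmatch takes three arguments and reads the current packet implicitly, whereas this one takes the packet bytes as an explicit first parameter, and the sample IP header is made up for the example.

```python
import re

def suffix(variable: str, string: str) -> bool:
    return variable.endswith(string)

def regex(variable: str, pattern: str) -> bool:
    return re.search(pattern, variable) is not None

def bitmatch(packet: bytes, bit_offset: int, bit_length: int, mask: int) -> int:
    """Read bit_length bits starting at bit_offset, then apply mask."""
    value = int.from_bytes(packet, "big")
    shift = len(packet) * 8 - bit_offset - bit_length
    return (value >> shift) & ((1 << bit_length) - 1) & mask

# Hypothetical 20-byte IPv4 header; byte 8 (bit offset 64) is the TTL field.
ip_header = bytes([0x45, 0, 0, 20,  0, 0, 0, 0,  64, 6, 0, 0,
                   10, 0, 0, 7,  10, 0, 0, 1])
ttl_is_64 = bitmatch(ip_header, 64, 8, 0xFF) == 64
```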
Efficient CS Rule Matching
• Brute force, strict priority: Rules are executed in
sequential manner.
• Efficient Rule Matching Method:
– Organize Rules so that rules can be skipped
based on existing content types.
– Utilize compiler optimization technique.
Simple CS Rule Editor GUI
Conflict Detection on
Content Switching Rules
• Detect conflicts among rules or rule set.
• Absolute conflict type:
r1: if (xml.purchase/customerName == “CCL”) {routeTo(r1)}
r2: if (xml.purchase/customerName == “CCL”) {routeTo(r2)}
• Potential conflict type:
r1: if (xml.purchase/totalAmount > 5000) {routeTo(quickServers)}
r2: if (xml.purchase/totalAmount >20000) {routeTo(superServers)}
• Algorithm: Build tree with the same variable, check operator and
value to see if they are the same or lead to potential conflict,
compare actions to decide conflict type or duplication.
• Developed conflict detection algorithm for rules with multiple term
condition. Can be applied to policy-based rules conflict detection.
• Editor can build these trees while a user enters rules and warns
about conflict right away.
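The two conflict types can be reproduced with a small pairwise check. Note the substitution: the slide describes a tree built per variable, while this sketch simply compares two single-term rules directly; the Rule class, the supported operators (>, <, ==), and the function names are assumptions made for the example.

```python
from dataclasses import dataclass
from math import inf
from typing import Optional, Union

@dataclass(frozen=True)
class Rule:
    variable: str
    op: str                          # one of ">", "<", "=="
    value: Union[int, float, str]
    action: str

def holds(op, value, x):
    return {">": x > value, "<": x < value, "==": x == value}[op]

def bounds(op, value):
    return {">": (value, inf), "<": (-inf, value), "==": (value, value)}[op]

def overlap(a: Rule, b: Rule) -> bool:
    """True if some value of the variable satisfies both conditions."""
    if isinstance(a.value, str) or isinstance(b.value, str):
        return a.op == b.op == "==" and a.value == b.value
    lo = max(bounds(a.op, a.value)[0], bounds(b.op, b.value)[0])
    hi = min(bounds(a.op, a.value)[1], bounds(b.op, b.value)[1])
    if lo < hi:
        return True
    return lo == hi and holds(a.op, a.value, lo) and holds(b.op, b.value, lo)

def classify(a: Rule, b: Rule) -> Optional[str]:
    if a.variable != b.variable:
        return None
    if (a.op, a.value) == (b.op, b.value):
        return "duplicate" if a.action == b.action else "absolute conflict"
    if overlap(a, b) and a.action != b.action:
        return "potential conflict"
    return None

absolute = classify(
    Rule("xml.purchase/customerName", "==", "CCL", "routeTo(r1)"),
    Rule("xml.purchase/customerName", "==", "CCL", "routeTo(r2)"))
potential = classify(
    Rule("xml.purchase/totalAmount", ">", 5000, "routeTo(quickServers)"),
    Rule("xml.purchase/totalAmount", ">", 20000, "routeTo(superServers)"))
```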
XML Tag Value Extraction
• A function xmlContentExtract() is built to extract the tag values of a list of unique tag sequences.
• It is based on Clark Cooper’s expat 1.0 XML parser.
• Its arguments include the pointer to an XML document, the pointer to the array of strings (unique XML tag sequences; we follow the XSL selector syntax), and the number of sequences.
• It returns a list of structure nodes, each with the tag sequence, its attribute, and its value.
• Currently, it supports one attribute, and each tag sequence needs to be unique.
Persistence Handling in LVS
• Some network applications require packets from
same users/sessions be routed to same real servers.
– For consistent treatment?
– For fast performance, e.g. servers maintain
persistent data/info for sessions
• Tomcat web server returns cookie value so that
return client requests can be routed to the same
Tomcat web server.
• But cookie value is in HTTP header, a Layer 7 info.
Layer 4 switch cannot access it.
• This is so called persistence handling problem.
• One solution: Sticky connection. Same IP address
served by same server.
Persistent handling Problems
FTP Case:
• Normally FTP uses port 21 for control, port 20 for data.
• But for passive FTP, the server tells the clients the port that it
listens to. The client initiates the data connection connecting to
that port.
• For the LVS/TUN and LVS/DR, LinuxDirector is only on the client-to-server half of the connection, so it is impossible for LinuxDirector to get the data port from the packet that goes to the client directly.
SSL Session Case:
• port 443 for secure Web servers and port 465 for secure mail
server,
• key for connection must be chosen/exchanged and only the initial
real server has the key.
• Persistent or sticky connection is needed.
Persistent Connection Solution
• When the client first accesses the service, LinuxDirector creates a template between the given client and the selected server, then creates an entry for the connection in the hash table.
• Connections for any port from the client will be sent to the same server before the template expires.
• The template expires in a configurable time, and the template won't expire until all its connections expire.
• The timeout of persistent templates can be configured by users, and the default is 300 seconds.
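The template mechanism can be modelled as a table from client address to (server, expiry time). The sketch below is a simplification under stated assumptions: PersistenceTable and its select method are hypothetical, time is passed in explicitly, and the expiry is simply pushed forward on each new connection rather than tracking every open connection as the director does.

```python
class PersistenceTable:
    def __init__(self, timeout: float = 300.0):   # default from the slide
        self.timeout = timeout
        self.templates = {}               # client IP -> (server, expiry time)

    def select(self, client_ip: str, now: float, schedule) -> str:
        entry = self.templates.get(client_ip)
        if entry is not None and now < entry[1]:
            server = entry[0]             # template still valid: same server
        else:
            server = schedule()           # first access or expired template
        self.templates[client_ip] = (server, now + self.timeout)
        return server

servers = iter(["RIP1", "RIP2", "RIP3"])
schedule = lambda: next(servers)          # stand-in for the real scheduler

table = PersistenceTable()
a = table.select("CIP", now=0, schedule=schedule)     # new template
b = table.select("CIP", now=100, schedule=schedule)   # within 300 s
c = table.select("CIP", now=500, schedule=schedule)   # template has expired
```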
Problems Encountered in The
Design of Linux-based Content
Switch
• Handle a Request Contained in Multiple Packets
• Handle Different Data Encoding Methods
• Allow Referencing Specific XML Tags
• Handle Long Transactions in SSL and Email network services
Handle a Request Contained in
Multiple Packets
• For a long request, its headers and content will be carried by multiple packets due to the packet size limitation.
• We have observed Netscape 4.7 splitting a short request (<1000 bytes) into two packets.
• Due to interleaving with other sessions, packets of the same session may not be allocated consecutive memory.
• Even if packets of the same session arrive without being interleaved with packets of other sessions, application level data will be fragmented in kernel packet buffers such as sk_buff.
• Matching application data patterns in the kernel is tricky.
Example: Determine Content
Length
TCP Segment n contains:
POST /cgi-bin/cs622/purchase.pl HTTP/1.0\r\n
Referer: http://archie.uccs.edu/~acsd/lcs/xmldemo.html\r\n
Connection: Keep-Alive\r\n
User-Agent: Mozilla/4.75 [en] (X11; U; Linux 2.2.16-22enterprise i686) \r\n
Host: viva.uccs.edu\r\n
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png,
*/*\r\n
Accept-Encoding: gzip\r\n
Accept-Language: en\r\n
Accept-Charset: iso-8859-1,*,utf-8\r\n
Content-type: application/x-www-form-urlencoded\r\n
Content-length: 7
TCP Segment n+1 contains:
53\r\n
data (753 bytes)
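One way to cope with the split shown above is to carry the partial header line from one segment to the next. The following is a minimal sketch under that assumption: the names `struct cl_parser`, `cl_init` and `cl_feed` are hypothetical, the parser runs on plain byte buffers rather than kernel packet buffers, and header lines longer than the line buffer are simply truncated.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parser state kept per TCP session. */
struct cl_parser {
    char   line[256];        /* current, possibly incomplete, header line */
    size_t len;              /* bytes stored in line */
    long   content_length;   /* -1 until the header has been seen */
};

void cl_init(struct cl_parser *p)
{
    p->len = 0;
    p->content_length = -1;
}

/* Case-insensitive prefix test; lower_prefix must already be lower case. */
static int has_prefix_ci(const char *line, const char *lower_prefix)
{
    for (; *lower_prefix != '\0'; line++, lower_prefix++)
        if (tolower((unsigned char)*line) != *lower_prefix)
            return 0;
    return 1;
}

/* Feed the payload of one TCP segment.
 * Returns the content length once known, otherwise -1. */
long cl_feed(struct cl_parser *p, const char *seg, size_t n)
{
    static const char name[] = "content-length:";

    assert(p != NULL && seg != NULL);
    for (size_t i = 0; i < n && p->content_length < 0; i++) {
        if (seg[i] != '\n') {
            if (p->len < sizeof p->line - 1)
                p->line[p->len++] = seg[i];   /* over-long lines are truncated */
            continue;
        }
        /* A full line is available, even if it began in an earlier segment. */
        p->line[p->len] = '\0';
        if (has_prefix_ci(p->line, name)) {
            long v = strtol(p->line + sizeof name - 1, NULL, 10);
            if (v >= 0)
                p->content_length = v;
        }
        p->len = 0;
    }
    return p->content_length;
}
```

Fed the two segments from the example, the first call returns -1 because the line ends at "Content-length: 7", and the second call completes the line and returns 753.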
Potential Solutions
• Allocate the application data of a session in
consecutive memory. This requires major rework of most
kernel packet buffer allocation schemes.
• Use carry lookahead memory hardware.
• Write more complicated pattern matching code that can
match patterns over fragmented data.
• Use application-level content switching and bear the
overhead of copying data from the kernel to the application
level.
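As an illustration of the third option, a matcher can keep its state between fragments so that a pattern split across buffers is still found. The sketch below uses the Knuth-Morris-Pratt algorithm, which is a choice made for this example and not necessarily what LCS implements. The names `struct stream_matcher`, `sm_init` and `sm_feed` and the 64-byte pattern limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAT 64

/* Hypothetical matcher state kept per session across fragments. */
struct stream_matcher {
    const char *pat;
    size_t      m;               /* pattern length */
    size_t      fail[MAX_PAT];   /* KMP failure table */
    size_t      state;           /* pattern bytes matched so far */
};

/* Build the failure table. Returns 0 on success, -1 on a bad pattern. */
int sm_init(struct stream_matcher *s, const char *pat)
{
    size_t m = strlen(pat);

    assert(s != NULL);
    if (m == 0 || m > MAX_PAT)
        return -1;
    s->pat = pat;
    s->m = m;
    s->state = 0;
    s->fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = s->fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        s->fail[i] = k;
    }
    return 0;
}

/* Feed one fragment. Returns 1 as soon as the pattern has been seen,
 * even when it started in an earlier fragment; otherwise returns 0. */
int sm_feed(struct stream_matcher *s, const char *frag, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        while (s->state > 0 && frag[i] != s->pat[s->state])
            s->state = s->fail[s->state - 1];
        if (frag[i] == s->pat[s->state])
            s->state++;
        if (s->state == s->m)
            return 1;
    }
    return 0;
}
```

Because only `state` is carried over, no fragment has to be copied or kept after it has been scanned, which is the property that matters when the data sits in separate packet buffers.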
Handle Different Data Encoding Methods
• XML data can be passed as text/plain.
• When it is submitted with a form, the XML request data
is encoded using the application/x-www-form-urlencoded
method.
• When extracting XML data for rule matching, the
data encoding method needs to be detected through
the Content-Type header.
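The detection step can be sketched as a small dispatch on the Content-Type value. The function names `url_decode` and `extract_body` are hypothetical, and the sketch makes simplifying assumptions: the body is a NUL-terminated string already reassembled into one buffer, the media type is compared case-sensitively, and any type other than form encoding is copied unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode form-urlencoded text from src into dst (capacity cap bytes).
 * Returns the decoded length, or -1 on malformed input or overflow. */
long url_decode(const char *src, char *dst, size_t cap)
{
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;
    for (size_t i = 0; src[i] != '\0'; i++) {
        int c = (unsigned char)src[i];
        if (c == '+') {
            c = ' ';                          /* '+' encodes a space */
        } else if (c == '%') {                /* %XX encodes one byte */
            int hi = hexval((unsigned char)src[i + 1]);
            int lo = (hi < 0) ? -1 : hexval((unsigned char)src[i + 2]);
            if (lo < 0)
                return -1;
            c = hi * 16 + lo;
            i += 2;
        }
        if (o + 1 >= cap)
            return -1;
        dst[o++] = (char)c;
    }
    dst[o] = '\0';
    return (long)o;
}

/* Pick the decoding method from the Content-Type header value. */
long extract_body(const char *content_type, const char *body,
                  char *out, size_t cap)
{
    static const char form[] = "application/x-www-form-urlencoded";
    size_t n;

    if (strncmp(content_type, form, sizeof form - 1) == 0)
        return url_decode(body, out, cap);
    n = strlen(body);                         /* e.g. text/plain: copy as is */
    if (n + 1 > cap)
        return -1;
    memcpy(out, body, n + 1);
    return (long)n;
}
```

For instance, the form-encoded text `%3Cid%3E111+222%3C%2Fid%3E` decodes to `<id>111 222</id>`, after which the same XML rule matching can run on either encoding.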
An E-Commerce XML Example
Client submits the following purchase in XML via HTTP POST (or SOAP):
<purchase>
<customerName>CCL</customerName>
<customerID>111222333</customerID>
<item><productID>309121544</productID>
<productName>IBM Thinkpad T21</productName>
<unitPrice>5000</unitPrice>
<noOfUnits>10</noOfUnits>
<subTotal>50000</subTotal>
</item>
<item><productID>309121538</productID>
<productName>Intel wireless LAN PC Card</productName>
<unitPrice>200</unitPrice>
<noOfUnits>10</noOfUnits>
<subTotal>2000</subTotal>
</item>
<totalAmount>52000</totalAmount>
</purchase>
Allow Referencing Specific XML Tags
• An ambiguous XML tag sequence specification can
match multiple instances.
• To avoid that and to speed up the matching, we
propose an XML tag sequence specification
that identifies one specific XML tag sequence.
• For example, to specify a rule based on the subTotal
value in the second item tag within the first
purchase tag, the condition of the rule is
written as “purchase:1.item:2.subTotal > 5000”.
• As another example, “purchase:2.totalAmount <
15000” specifies the condition of a rule based on the
totalAmount tag within the second purchase
tag.
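A tag sequence specification such as "purchase:1.item:2.subTotal" could be evaluated roughly as follows. This is a simplified sketch, not the LCS extraction code: `narrow` and `xml_path_long` are hypothetical names, a step without ":n" is assumed to mean the first occurrence, tags must have no attributes, elements of the same name must not be nested, and occurrences are counted among all descendants of the enclosing element.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Narrow [*beg, *end) to the content of the idx-th (1-based)
 * <name>...</name> element found inside it. Returns 0 on success. */
static int narrow(const char **beg, const char **end,
                  const char *name, size_t nlen, int idx)
{
    char otag[64], ctag[64];
    const char *p = *beg;

    if (nlen + 4 > sizeof otag)
        return -1;
    snprintf(otag, sizeof otag, "<%.*s>", (int)nlen, name);
    snprintf(ctag, sizeof ctag, "</%.*s>", (int)nlen, name);
    for (int k = 1; ; k++) {
        const char *s = strstr(p, otag);
        const char *e;
        if (s == NULL || s >= *end)
            return -1;
        s += strlen(otag);
        e = strstr(s, ctag);
        if (e == NULL || e > *end)
            return -1;
        if (k == idx) {
            *beg = s;
            *end = e;
            return 0;
        }
        p = e + strlen(ctag);              /* skip this element, try the next */
    }
}

/* Evaluate a path like "purchase:1.item:2.subTotal" on xml and
 * store the selected element's text as an integer in *out. */
int xml_path_long(const char *xml, const char *path, long *out)
{
    const char *beg = xml;
    const char *end = xml + strlen(xml);

    assert(xml != NULL && path != NULL && out != NULL);
    while (*path != '\0') {
        size_t seg  = strcspn(path, ".");   /* length of this step */
        size_t nlen = strcspn(path, ":.");  /* length of the tag name */
        int idx = 1;
        if (nlen < seg)                     /* step carries ":n" */
            idx = atoi(path + nlen + 1);
        if (nlen == 0 || idx < 1 ||
            narrow(&beg, &end, path, nlen, idx) != 0)
            return -1;
        path += seg;
        if (*path == '.')
            path++;
    }
    *out = strtol(beg, NULL, 10);
    return 0;
}
```

With the purchase document from the e-commerce example, the path "purchase:1.item:2.subTotal" selects 2000, so the condition "> 5000" would be false, and "purchase:2.totalAmount" fails because there is no second purchase element.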
Handle Long Transactions in SSL and Email Network Services
• Some of the packet processing functions are better
handled at the application level.
• For example, there are many packages for detecting and
removing email viruses, including McAfee’s uvscan,
AMAVis scanmail, and mutt (to recombine email
components), but almost all of them are implemented
at the application level and interact with the
sendmail program. Rewriting them as kernel modules
would require significant effort.
• The same observation applies to SSL processing.
Web Switching/SSL Processing Overhead and Performance Differences
between Prefork and Dynamic Fork
Figure: Overall WebBench Requests/Second. Y axis: requests/second (0 to 300).
X axis: number of clients (1, 8, 16, 24, 32, 40, 48, 56). Series: Prefork
NonSSLProxy, Dynamic NonSSLProxy, Apache NonSSL, Prefork SSLProxy,
Dynamic SSLProxy, Apache SSL.
• Significant SSL processing overhead: 240 req/sec vs. 38 req/sec.
• Content switching processing overhead may reduce the performance to
lower than that of a single web server. What do we gain here? How can we
improve it?
IXP1200-based Content Switch
• We have ported OpenSSL and our Linux Secure Web
System to run on the IXP12EB with VxWorks.
• The port uses WindRiver’s Tornado II IDE.
• The preliminary version runs purely on the StrongARM core.
• We are currently working on offloading the header
extraction and rule matching code to run as hardware
threads on the microengines.
Intel IXP1200 NP and IXP12EB
• The IXP 1200 Network Processor
• The IXP12EB Evaluation Board:
– PCI form factor board based on IXP1200 Network
Processor
– eight 10/100 Mbps ports
– two Gigabit Ethernet ports
– PCI back-plane and an Ethernet Network
Interface Card (NIC)
Figure: IXP 1200 Network Processor
Figure: Packets Receiving & Transmitting
Agere Network Processor
The following figures are from Douglas Comer’s new text
“Network System Design using Network Processors”
Figure: Agere’s FPP
Figure: Agere’s RSP
Figure: Alchemy’s Au1000
Figure: Applied Micro Circuit Corp nP7510
Figure: Cisco Parallel eXpress Forwarding (PXF)
Figure: Cognigine’s Reconfigurable Communication Unit (RCU)
Figure: EZChip NP-1
Figure: IBM PowerNP
Figure: IBM NP Embedded Processor Complex
Figure: Motorola’s C-Port
Figure: Motorola Single CP
Figure: Packet Flow and IXP2400
Figure: Intel IXP2400
HA-LVS Configuration
Figure: highly available LVS configuration. Components: Client (CIP),
Internet, Linux Director, Backup Director, Heart Beat, MON,
Real Server1, Real Server2, Real Server3.
1. When the Backup Director detects Linux Director failure
through the heartbeat protocol, it “graciously negotiates”
the take-over of the VIP.
-> Provides fault tolerance.
2. Monitor the server processes running on the real servers.
-> Route requests to server processes that are alive.
Initiate restart/repair.
Highly Available Web Server Cluster
Figure: highly available web server cluster. Components: Client (CIP),
Internet, Web Switch1, Web Switch2, Heart Beat, MON,
Real Server1, Real Server2, Real Server3.
1. A web switch detects the failure of the other web switch and
takes over the processing of routing requests.
2. The web switch monitors the server processes running on the real
servers. When they die:
• Route requests to server processes that are alive.
• Rewrite the web switching rules. Initiate restart/repair.
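The two mechanisms on the HA slides, peer failure detection by heartbeat and routing only to live server processes, can be sketched together. Everything named here is an assumption for illustration: `struct cluster`, `check_peer`, `route`, the round-robin choice among live servers, and the idea that a monitor fills in the `alive` flags. The actual VIP take-over negotiation is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define NSERVERS 3

/* Hypothetical state kept by one switch (or director). */
struct cluster {
    int    alive[NSERVERS];   /* set by the monitor for each real server */
    long   last_heartbeat;    /* time the peer switch was last heard from */
    long   dead_after;        /* seconds of silence before the peer is declared dead */
    int    active;            /* 1 if this switch currently owns the VIP */
    size_t next;              /* round-robin cursor */
};

/* Called periodically on the standby switch.
 * Returns 1 when this call takes over the VIP, otherwise 0. */
int check_peer(struct cluster *c, long now)
{
    assert(c != NULL);
    if (!c->active && now - c->last_heartbeat > c->dead_after) {
        c->active = 1;        /* take over request routing */
        return 1;
    }
    return 0;
}

/* Pick the next live real server, skipping dead ones.
 * Returns the server index, or -1 if no server is alive. */
int route(struct cluster *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < NSERVERS; i++) {
        size_t s = (c->next + i) % NSERVERS;
        if (c->alive[s]) {
            c->next = (s + 1) % NSERVERS;
            return (int)s;
        }
    }
    return -1;
}
```

With Real Server2 marked dead, successive calls to `route` alternate between the other two servers, and a standby whose peer has been silent for longer than `dead_after` becomes active on its next check.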
Status of UCCS ACSD Project
• Two versions of the Linux kernel-based LCS content switch, LCS01 and
LCS02, were developed.
• A Linux application-level secure web switch (LSWS) was developed using
the OpenSSL package.
• LSWS is ported to run on the Intel IXP12EB and IXP1200 network processor
with WindRiver VxWorks.
• Part of the above research projects are sponsored by CCL/ITRI.
• The current release, LCS02, is based on Linux-2.2.16-3.
• It is being ported to Linux-2.4.18 and integrated with KTCPVS.
• ip_forward.c, ip_masq.c, and ip_vs.c are modified to implement basic TCP
delay binding.
• ip_cs.c is added for most of the content switching functions, with HTTP
header extraction and XML content extraction.
• A simple Java-based ruleEdit program was created for rule editing and
conflict detection. A C-based program can detect conflicts among rules with
regular expressions in their condition expressions.
• A rule translation program converts the rule set into a Linux kernel module
and allows dynamic replacement of rules without restarting the system.
• We are currently working on integrating KTCPVS and providing a unified
configuration/monitor command.
LCS Demo
• We set up viva.uccs.edu as a content switch, with wait
and ace as the two real servers.
• URL Switching demo:
http://viva.uccs.edu/~lcs1/ routes to ace.uccs.edu
http://viva.uccs.edu/~lcs2/ routes to wait.uccs.edu
• XML Web Switching (E-commerce applications)
http://archie.uccs.edu/~acsd/lcs/xmldemo.html
When the 2nd subTotal tag >= 50000, route to ace.
When the 2nd subTotal tag < 50000, route to wait.
• Let us know if you have problems accessing them.
My students may be working on LCS extensions.
LCS Rule Example
/* R4, R5: XML switching; the thresholds match the XML demo conditions. */
R4: if (atoi(rule_fields[1].value) >= 50000) {
        return route_to("ace", NON_STICKY, saddr);
    }
R5: if ((atoi(rule_fields[1].value) > 0) &&
        (atoi(rule_fields[1].value) < 50000)) {
        IP_RULE_MSG("server=wait\n");
        return route_to("wait", NON_STICKY, saddr);
    }
/* R10, R11: URL switching on a substring of the request URL. */
R10: if (strstr(url, "lcs1") != NULL) {
        IP_RULE_MSG("server=ace\n");
        return route_to("ace", NON_STICKY, saddr);
    }
R11: if (strstr(url, "lcs2") != NULL) {
        IP_RULE_MSG("server=wait\n");
        return route_to("wait", NON_STICKY, saddr);
    }
Intel 7280 Demo
• http://cs.uccs.edu/~chow/pub/master/ycai/doc/csdemo.html
Related Load Balancing Research Results
• Modified the Apache status module to report
– Total bytes to be transferred by child processes
– Average document transfer speed
• Modified LB-DNS to receive server status and
bandwidth probing results.
• LB-DNS returns the IP address of the best server based
on a weight contributed by both server load and
bandwidth.
• Modified the WebStone benchmark to test the
performance of load balancing web server clusters.
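The slides do not give the weight formula used by LB-DNS, so the following is only an illustrative sketch of combining load and bandwidth into one score. The linear formula, the parameter `w_load`, the normalization by the largest probed bandwidth, and the names `struct server_stat` and `best_server` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-server report received by the DNS. */
struct server_stat {
    double load;        /* fraction of capacity in use, 0.0 to 1.0 */
    double bandwidth;   /* probed bandwidth, any consistent unit */
};

/* Return the index of the server with the highest combined score,
 * or -1 if n is 0. w_load (0.0 to 1.0) is the share given to load;
 * the remainder goes to bandwidth. */
int best_server(const struct server_stat *s, size_t n, double w_load)
{
    double max_bw = 0.0;
    double best_score = -1.0;
    int best = -1;

    assert(w_load >= 0.0 && w_load <= 1.0);
    for (size_t i = 0; i < n; i++)
        if (s[i].bandwidth > max_bw)
            max_bw = s[i].bandwidth;

    for (size_t i = 0; i < n; i++) {
        /* Normalize bandwidth so both terms lie between 0 and 1. */
        double bw = (max_bw > 0.0) ? s[i].bandwidth / max_bw : 0.0;
        double score = w_load * (1.0 - s[i].load) + (1.0 - w_load) * bw;
        if (score > best_score) {
            best_score = score;
            best = (int)i;
        }
    }
    return best;
}
```

With equal weights, a lightly loaded server with slightly lower bandwidth can outrank a heavily loaded server with the best bandwidth, which is the kind of trade-off a combined weight is meant to capture.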
Load Balancing Systems
Figure: load balancing system architecture. Components: Modified Web
Server 1 through Modified Web Server n; Statistics Gathering Daemon;
/tmp/StatFile; Bandwidth Probe Results; Server Delay; Server Ranking;
LBA: Modified DNS; Request for Web pages.
Connection Rate: LBA vs. Round-Robin
Figure: server connection rate for 4 servers, in connections/sec
(Y axis 0 to 1000, X axis 1 to 12).

 #    load balancing system    round-robin
 1    418.2                    327.6
 2    656.6                    327.6
 3    907.9                    327.6
 4    420                      327.6
 5    636.7                    327.6
 6    322.6                    327.6
 7    711.6                    327.6
 8    420.5                    327.6
 9    638.3                    327.6
10    670.6                    327.6
11    683.4                    327.6
12    899                      327.6

Update for LBA: per sec. Round-robin was run only once.
Conclusion
• Content Delivery Network improves internet content retrieval
• LVS provides a low cost layer 4 switching service for cluster.
• Linux Content Switch with generic rules can be easily configured
for a wide variety of value-added services:
– Premium services
– Load balancing/highly available server farm
– Firewall
– Bandwidth control/traffic shaping
• It requires efficient SW/HW architecture and rule matching
algorithms to reduce processing overhead.
• Content rule design and conflict detection are important and
challenging.
• TCP delay binding can be improved.