Transcript part I

Improving Web-Server Performance
Objectives:
- Scalable Web-server systems
- Locally distributed architectures
- Cluster-based Web systems
- Distributed Web systems
- Cluster-based solutions
- Distributed Web-based solutions
- Dispatching algorithms for cluster-based Web systems
Reference
“The State of the Art in Locally Distributed Web-server Systems”
Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni and Philip S. Yu
Concepts
- A Web-server system is a system that provides Web services
- The trend is toward:
  - an increasing number of clients
  - growing complexity of Web applications
- Scalable Web-server systems: the ability to support large numbers of accesses and resources while still providing adequate performance
Architecture solutions for scalable Web-server systems
Model architecture for a locally distributed Web system
Locally Distributed Web System
- Cluster-based Web system
  - The server nodes hide their IP addresses from clients behind a virtual IP (VIP) address corresponding to one device (the Web switch) in front of the set of servers
  - The Web switch receives all packets and then sends them to the server nodes
- Distributed Web system
  - The IP addresses of the Web-server nodes are visible to clients
  - There is no Web switch; at most a layer-3 router may be employed to route the requests
Cluster-based Architecture
Distributed Architecture
Request routing mechanisms
- Web systems have been classified into two types:
  - cluster-based Web systems
  - distributed Web systems
- The question now becomes: how are packets routed to each of the Web servers?
Request routing mechanisms for cluster-based Web systems
- Layer-4 switch
  - Content-blind routing
- Layer-7 switch
  - Content-aware switching
  - Also called a layer-5 switch, since the application layer is layer 5 in the TCP/IP protocol stack
- What are the trade-offs between layer-4 and layer-7 switches?
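A layer-4 switch can base its decision only on TCP/IP header fields, yet every packet of a TCP connection must reach the same server. The Python sketch below illustrates content-blind routing under stated assumptions: the server addresses are hypothetical, round-robin stands in for whatever dispatching policy a real switch uses, and the caller tells the switch whether a packet is a SYN. The server is chosen when the SYN arrives and recorded in a binding table that later packets of that connection follow.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class FourTuple:
    """The only fields a layer-4 switch inspects: no URL, no HTTP headers."""
    src_ip: str
    src_port: int
    dst_ip: str    # the virtual IP (VIP) advertised to clients
    dst_port: int


class Layer4Switch:
    """Content-blind dispatcher sketch: pick a server on SYN, stay sticky afterwards."""

    def __init__(self, servers: Sequence[str]):
        self._next_server = cycle(servers)          # hypothetical round-robin policy
        self._bindings: Dict[FourTuple, str] = {}   # connection -> chosen server

    def route(self, conn: FourTuple, syn: bool) -> Optional[str]:
        if syn and conn not in self._bindings:
            # New connection: the choice ignores the request content entirely.
            self._bindings[conn] = next(self._next_server)
        # Every later packet of the connection must reach the same server.
        # None means an unknown connection; a real switch would drop the packet.
        return self._bindings.get(conn)

    def close(self, conn: FourTuple) -> None:
        """Forget the binding once FIN/RST has been seen."""
        self._bindings.pop(conn, None)


switch = Layer4Switch(["10.0.0.1", "10.0.0.2"])
c1 = FourTuple("198.51.100.7", 40001, "192.0.2.10", 80)
c2 = FourTuple("198.51.100.8", 40002, "192.0.2.10", 80)
print(switch.route(c1, syn=True))   # 10.0.0.1
print(switch.route(c2, syn=True))   # 10.0.0.2
print(switch.route(c1, syn=False))  # 10.0.0.1 (same connection, same server)
```

The binding table is the per-connection state a layer-4 switch must keep. A layer-7 switch cannot choose a server at SYN time at all, because the URL arrives only after the TCP handshake has completed.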
Two Approaches
Taxonomy of cluster-based architectures
Layer-4 two-way architecture
Layer-4 one-way architecture
Layer-4 one-way mechanisms
- Packet single-rewriting
  - Same as in the two-way architecture; the only difference is that the source address of outbound packets is rewritten (from the server's IP address to the VIP) by the server node rather than by the Web switch
- Packet tunneling
  - Also known as IP encapsulation: IP datagrams are encapsulated within IP datagrams
  - Requires that all servers support IP tunneling
- Packet forwarding
  - Assumes that the Web switch and the server nodes are on the same LAN
  - All nodes share the VIP address
  - Server nodes need to disable ARP for the VIP, so that only the Web switch answers ARP queries for it
  - The Web switch forwards the inbound packet to the target server at the MAC layer, without modifying the TCP/IP header
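To make the differences between the three one-way mechanisms concrete, the following sketch models headers as immutable Python dataclasses and shows which fields each mechanism changes. The field layout is a simplification (no ports, checksums or TTLs), and the addresses, including the use of the switch's own address as the outer tunnel source, are illustrative assumptions.

```python
from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class IPPacket:
    src_ip: str
    dst_ip: str
    payload: Union["IPPacket", bytes]   # bytes = TCP segment; IPPacket = tunneled datagram


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    packet: IPPacket


# Packet single-rewriting
def switch_rewrite_inbound(pkt: IPPacket, server_ip: str) -> IPPacket:
    """Web switch: replace the VIP destination with the chosen server's address."""
    return replace(pkt, dst_ip=server_ip)


def server_rewrite_outbound(pkt: IPPacket, vip: str) -> IPPacket:
    """Server node: restore the VIP as source so the client only ever sees the VIP.
    The response then goes to the client without passing through the switch."""
    return replace(pkt, src_ip=vip)


# Packet tunneling (IP-in-IP encapsulation)
def encapsulate(pkt: IPPacket, switch_ip: str, server_ip: str) -> IPPacket:
    """Web switch: wrap the unmodified datagram in an outer header addressed to the server."""
    return IPPacket(src_ip=switch_ip, dst_ip=server_ip, payload=pkt)


def decapsulate(outer: IPPacket) -> IPPacket:
    """Server node: strip the outer header; the inner datagram is still addressed to the VIP."""
    if not isinstance(outer.payload, IPPacket):
        raise ValueError("not an encapsulated datagram")
    return outer.payload


# Packet forwarding (MAC level; switch and servers on the same LAN)
def forward_at_mac(frame: Frame, switch_mac: str, server_mac: str) -> Frame:
    """Web switch: change only the link-layer addresses; the IP header is untouched."""
    return replace(frame, src_mac=switch_mac, dst_mac=server_mac)


VIP, SERVER = "192.0.2.10", "10.0.0.2"
req = IPPacket("198.51.100.7", VIP, b"GET / HTTP/1.1")
print(switch_rewrite_inbound(req, SERVER).dst_ip)                   # 10.0.0.2
print(decapsulate(encapsulate(req, "10.0.0.254", SERVER)) == req)   # True
frame = Frame("02:00:00:00:00:01", "02:00:00:00:00:fe", req)
print(forward_at_mac(frame, "02:00:00:00:00:fe", "02:00:00:00:00:02").packet.dst_ip)  # 192.0.2.10
```

Only packet single-rewriting touches the IP addresses of the original datagram; tunneling adds a header around it, and forwarding rewrites the frame alone, which is why it requires a shared LAN.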
LAN Addresses
Each adapter on a LAN has a unique LAN (MAC) address
LAN Address (more)
- MAC address allocation is administered by the IEEE
- A manufacturer buys a portion of the MAC address space (to assure uniqueness)
- Analogy:
  - MAC address: like a Social Security Number
  - IP address: like a postal address
- MAC flat address => portability
- IP hierarchical address => NOT portable
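A MAC address is 48 bits long. The first 24 bits form the Organizationally Unique Identifier (OUI) that the IEEE assigns to a manufacturer, and the manufacturer numbers its adapters within the remaining 24 bits, which is how uniqueness is assured. The small Python helpers below are an illustrative sketch, not any library's API: they split an address into these two parts and check for the broadcast address and the locally administered bit.

```python
from typing import Tuple


def parse_mac(text: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' or 'AA-BB-CC-DD-EE-FF' into 6 raw bytes."""
    parts = text.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a 48-bit MAC address: {text!r}")
    return bytes(int(p, 16) for p in parts)   # bytes() rejects values above 0xff


def split_mac(text: str) -> Tuple[str, str]:
    """Return (OUI assigned by the IEEE, part assigned by the manufacturer)."""
    mac = parse_mac(text)
    return mac[:3].hex(":"), mac[3:].hex(":")


def is_broadcast(text: str) -> bool:
    return parse_mac(text) == b"\xff" * 6


def is_locally_administered(text: str) -> bool:
    """The 0x02 bit of the first octet marks an address set by an administrator,
    not one taken from a manufacturer's IEEE-assigned block."""
    return bool(parse_mac(text)[0] & 0x02)


print(split_mac("00:1b:21:3a:4f:10"))      # ('00:1b:21', '3a:4f:10')
print(is_broadcast("FF-FF-FF-FF-FF-FF"))   # True
```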
Routing discussion
Starting at A, given an IP datagram addressed to B:
- look up the network address of B; find that B is on the same network as A
- the link layer sends the datagram to B inside a link-layer frame
  - frame source and destination addresses: A's and B's MAC addresses
  - datagram source and destination addresses: A's and B's IP addresses
[Figure: three networks (223.1.1, 223.1.2 and 223.1.3) joined by a router; host A (223.1.1.1) and host B are on network 223.1.1, and host E is also shown. The IP datagram (A's IP address, B's IP address, IP payload) travels inside the link-layer frame.]
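The decision "is B on the same network as A?" amounts to comparing network prefixes. The sketch below uses Python's standard ipaddress module; the /24 prefix length, B's address (223.1.1.3), E's address (223.1.2.2) and the router interface (223.1.1.4) are assumptions matched to the figure's address plan, not values the slide states. The address the function returns is the one whose MAC address must then be found, which is the job of ARP.

```python
import ipaddress


def next_hop(src_ip: str, dst_ip: str, prefix_len: int, router_ip: str) -> str:
    """Return the IP address whose MAC address the sender must put in the frame."""
    src_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in src_net:
        return dst_ip      # same network: deliver directly inside a link-layer frame
    return router_ip       # different network: hand the frame to the first-hop router


# Assumed values: /24 networks, B = 223.1.1.3, E = 223.1.2.2, router interface 223.1.1.4.
print(next_hop("223.1.1.1", "223.1.1.3", 24, "223.1.1.4"))  # 223.1.1.3
print(next_hop("223.1.1.1", "223.1.2.2", 24, "223.1.1.4"))  # 223.1.1.4
```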
ARP: Address Resolution Protocol
Question: how can the MAC address of B be determined, knowing B's IP address?
- Each IP node (host or router) on a LAN has an ARP table
- ARP table: IP/MAC address mappings for some LAN nodes
  < IP address; MAC address; TTL >
  - TTL (Time To Live): time after which the address mapping will be forgotten (typically 20 min)
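The ARP table described above is a small soft-state cache. The Python class below sketches it under assumptions: entries are keyed by IP address, the default TTL is the 20 minutes quoted on the slide, and the clock is injectable so that expiry can be demonstrated without waiting. Real operating systems use their own timers and entry states.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

DEFAULT_TTL = 20 * 60   # seconds: the "typically 20 min" from the slide


@dataclass
class ArpEntry:
    mac: str
    expires_at: float


class ArpTable:
    """<IP address; MAC address; TTL> entries that are forgotten unless refreshed."""

    def __init__(self, ttl: float = DEFAULT_TTL,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, ArpEntry] = {}

    def learn(self, ip: str, mac: str) -> None:
        # Inserting or refreshing a mapping restarts its TTL.
        self._entries[ip] = ArpEntry(mac, self._clock() + self._ttl)

    def lookup(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            del self._entries[ip]   # soft state: a stale mapping goes away
            return None
        return entry.mac


now = [0.0]   # fake clock so the example does not have to wait 20 minutes
table = ArpTable(clock=lambda: now[0])
table.learn("223.1.1.3", "02:00:00:00:00:0b")
print(table.lookup("223.1.1.3"))   # 02:00:00:00:00:0b
now[0] += 21 * 60
print(table.lookup("223.1.1.3"))   # None: the entry timed out
```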
ARP protocol
- A wants to send a datagram to B, and A knows B's IP address.
- Suppose B's MAC address is not in A's ARP table.
- A broadcasts an ARP query packet containing B's IP address
  - all machines on the LAN receive the ARP query
- B receives the ARP packet and replies to A with its (B's) MAC address
  - the reply frame is sent to A's MAC address (unicast)
- A caches (saves) the IP-to-MAC address pair in its ARP table until the information becomes old (times out)
  - soft state: information that times out (goes away) unless refreshed
- ARP is "plug-and-play":
  - nodes create their ARP tables without intervention from the network administrator
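The query/reply exchange can be simulated on an in-memory "LAN". In this Python sketch (hypothetical classes, not a real network API), a query is delivered to every host, only the owner of the target IP replies, the reply is unicast to the sender, and the sender caches the result, so a second resolution generates no frames. TTL handling is left out to keep the focus on the exchange.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass
class ArpPacket:
    op: str           # "request" or "reply"
    sender_ip: str
    sender_mac: str
    target_ip: str


@dataclass
class Host:
    ip: str
    mac: str
    arp_cache: Dict[str, str] = field(default_factory=dict)   # TTLs omitted here

    def handle(self, pkt: ArpPacket) -> Optional[ArpPacket]:
        """Only the host that owns target_ip answers a query."""
        if pkt.op == "request" and pkt.target_ip == self.ip:
            return ArpPacket("reply", self.ip, self.mac, pkt.sender_ip)
        return None


class Lan:
    def __init__(self, hosts: List[Host]):
        self.hosts = hosts
        self.frames: List[Tuple[str, str]] = []   # (destination MAC, op) of each frame sent

    def resolve(self, sender: Host, target_ip: str) -> Optional[str]:
        if target_ip in sender.arp_cache:          # cache hit: no frames at all
            return sender.arp_cache[target_ip]
        query = ArpPacket("request", sender.ip, sender.mac, target_ip)
        self.frames.append((BROADCAST_MAC, "request"))
        # Broadcast: every other machine on the LAN receives the query.
        replies = [h.handle(query) for h in self.hosts if h is not sender]
        for reply in replies:
            if reply is not None:
                self.frames.append((sender.mac, "reply"))   # reply is unicast to the sender
                sender.arp_cache[reply.sender_ip] = reply.sender_mac
                return reply.sender_mac
        return None                                # no host owns target_ip


a = Host("223.1.1.1", "02:00:00:00:00:0a")
b = Host("223.1.1.3", "02:00:00:00:00:0b")
c = Host("223.1.1.2", "02:00:00:00:00:0c")
lan = Lan([a, b, c])
print(lan.resolve(a, "223.1.1.3"))   # 02:00:00:00:00:0b (query + reply)
print(lan.resolve(a, "223.1.1.3"))   # 02:00:00:00:00:0b (from A's cache)
print(lan.frames)  # [('ff:ff:ff:ff:ff:ff', 'request'), ('02:00:00:00:00:0a', 'reply')]
```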
Layer-7 two-way architecture
Layer-7 two-way mechanisms
- TCP gateway
  - An application-level proxy running on the Web switch mediates the communication between the client and the server
  - It maintains separate TCP connections to the client and to the server
- TCP splicing
  - Reduces the overhead of the TCP gateway. For outbound packets, forwarding occurs at the network level by rewriting the client IP address
Layer-7 two-way mechanisms
- TCP gateway
  - An application-level proxy running on the Web switch mediates the communication between the client and the server; the proxy runs in user space, so all relayed data crosses the user/kernel boundary on the switch
- TCP splicing
  - Reduces the overhead of the TCP gateway. Packet forwarding occurs at the network level, between the network interface driver and the TCP/IP stack, and is carried out directly by the OS (in the kernel)
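Content-aware Switch
[Figure: a switch at the front of www.yahoo.com receives packets from the Internet (IP header, TCP header, application data such as "GET /cgi-bin/form HTTP/1.1 / Host: www.yahoo.com…") and routes them to an image server, an application server or an HTML server.]
- Front end of a set of Web servers
- Routes packets based on layer-5/7 (content) information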
Why Use Content-aware Switching?
- Servers can be specialized for certain types of request
  - Content segregation
- Exploit locality
- Affinity-based routing
- Increased performance because of the improved cache hit rate
- Partial replication of the server file set
- Partition the server's file set over different nodes
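Locality and partitioning both come down to sending the same URL to the same node. The Python sketch below assumes one simple scheme: hash the URL path to pick a node (partitioning), optionally store each file on the next few nodes as well (partial replication), and route to the least-loaded holder (affinity-based routing). The node names and load numbers are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence


def owners(path: str, nodes: Sequence[str], replicas: int = 1) -> List[str]:
    """Nodes that store `path` when the file set is partitioned by a hash of the URL.

    replicas=1 is a pure partition; replicas > 1 is a partial replication
    in which each file lives on `replicas` consecutive nodes.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and len(nodes)")
    digest = hashlib.sha256(path.encode("utf-8")).digest()   # stable across runs
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


def route_with_affinity(path: str, nodes: Sequence[str], load: Dict[str, int],
                        replicas: int = 2) -> str:
    """Pick the least-loaded node among those holding the file, so repeated
    requests for the same URL keep hitting the same few server caches."""
    return min(owners(path, nodes, replicas), key=lambda n: load.get(n, 0))


nodes = ["node-a", "node-b", "node-c", "node-d"]
print(owners("/news/today.html", nodes))   # the same single node on every run
print(route_with_affinity("/news/today.html", nodes, load={}))
```

A stable hash (SHA-256 here) is used instead of Python's built-in hash(), which is randomized per process for strings. Plain modulo placement remaps most files when a node is added or removed; consistent hashing is a common alternative when that matters.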
URL Parsing Is Expensive!
- Performing content-aware routing implies that some kind of string searching and matching algorithm is required
  - Such a time-consuming function is expensive on a heavy-traffic Web site
- Experience has shown that system performance would be severely degraded if URL parsing functions were implemented in the distributor
TCP splicing
- Once the two TCP connections are established, they are spliced
  - IP packets are then forwarded at the network layer
- TCP splicing requires
  - connection binding
  - a packet analyzer to rewrite packets:
    - appropriate address translation
    - sequence number modifications to be performed on the packets
- Basically, we are deploying connection reuse
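The sequence-number modification is the core of the packet rewriting. The Python sketch below assumes a constant offset per direction, fixed when the client connection is bound to a pre-forked server connection. It borrows labels from the distributor diagram on the next slide, read (as an assumption) as: CISN, the client's initial sequence number; DISN, the switch's initial sequence number toward the client; PSN and SSN, the next sequence numbers of the switch and of the server on the pre-forked connection at binding time. A real splice also rewrites addresses and ports and must recompute the IP and TCP checksums, which the sketch leaves out.

```python
from dataclasses import dataclass
from typing import Tuple

MOD = 2 ** 32   # TCP sequence numbers wrap around modulo 2^32


@dataclass(frozen=True)
class Splice:
    """Sequence-number offsets fixed at the moment the two connections are bound.

    client_seq:   next byte the client will send (CISN + 1)
    switch_seq_c: next byte the switch would send to the client (DISN + 1)
    switch_seq_s: next byte the switch would send to the server (PSN)
    server_seq:   next byte the server will send (SSN)
    """
    client_seq: int
    switch_seq_c: int
    switch_seq_s: int
    server_seq: int

    def to_server(self, seq: int, ack: int) -> Tuple[int, int]:
        """Client -> server packet: map (seq, ack) into the server connection's numbering."""
        return ((seq - self.client_seq + self.switch_seq_s) % MOD,
                (ack - self.switch_seq_c + self.server_seq) % MOD)

    def to_client(self, seq: int, ack: int) -> Tuple[int, int]:
        """Server -> client packet: map (seq, ack) into the client connection's numbering."""
        return ((seq - self.server_seq + self.switch_seq_c) % MOD,
                (ack - self.switch_seq_s + self.client_seq) % MOD)


# Example numbers: CISN = 1000, DISN = 5000, PSN = 7000, SSN = 9000.
s = Splice(client_seq=1001, switch_seq_c=5001, switch_seq_s=7000, server_seq=9000)
print(s.to_server(1001, 5001))   # (7000, 9000): the request leaves as seq PSN, ack SSN
print(s.to_client(9000, 7120))   # (5001, 1121): first response byte after a 120-byte request
```

Because the offsets never change after binding, each rewrite is two additions per packet, which is what lets splicing run at the network level instead of copying data through a user-space proxy.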
Operation of Content-aware Distributor
[Figure: message sequence between client, layer-7 switch and server.
(1) Pre-forked connection, switch to server: SYN(PISN); SYN(SISN), ACK(PISN+1); HTTP KeepAlive(PISN+1), ACK(SISN+1).
(2) Client connection setup: SYN(CISN); SYN(DISN), ACK(CISN+1); ACK(DISN+1).
(3) Client sends HTTP request(CISN+1), ACK(DISN+1); the switch binds the client connection to the pre-forked one (connection binding) and rewrites the packet into HTTP request(PSN), ACK(SSN), Option(bind).
(4) Response data from the server, Data(SSN), passes through the switch, which rewrites sequence and acknowledgment numbers into the client connection's numbering before forwarding; the client's ACKs are rewritten in the other direction. At the end of data (FIN/ACK), the server-side connection is kept for reuse.]
Layer-7 one-way architecture
Layer-7 one-way mechanisms
- TCP handoff
  - The switch hands off the TCP connection endpoint to the server
  - Needs changes to the OS on both components
- TCP connection hop
  - Software-based proprietary solution
  - Encapsulates the IP packet and sends it to the server
Layer-7 one-way mechanisms: TCP handoff
- Migrate the established TCP connection from the switch to the back-end server
  - Create a TCP connection at the back end without going through the TCP three-way handshake
  - Retrieve the state of an established connection and destroy the connection without going through the normal message handshake required to close a TCP connection
- Once the connection is handed off to the back-end server, the switch must forward packets from the client to the appropriate back-end server
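The two handoff steps can be modelled in memory. The Python sketch below is a toy: the fields in TcpState are an assumed minimal set, and real handoff implementations move kernel socket state, which is why the slide notes that both operating systems must change. The switch removes its copy of the connection without a FIN exchange, the back end installs it as already established, and the switch keeps only a forwarding entry for subsequent client packets.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]   # (IP address, TCP port)
MOD = 2 ** 32


@dataclass
class TcpState:
    """Assumed minimal state that travels with a handed-off connection."""
    client: Endpoint
    local: Endpoint        # VIP and port the client connected to
    snd_nxt: int           # next sequence number to send to the client
    rcv_nxt: int           # next sequence number expected from the client
    pending: bytes = b""   # request bytes the switch already read (the HTTP request)


class Backend:
    def __init__(self, name: str):
        self.name = name
        self.connections: Dict[Endpoint, TcpState] = {}
        self.received: Dict[Endpoint, bytes] = {}

    def install(self, state: TcpState) -> None:
        # Recreate the connection directly as established: no three-way handshake.
        self.connections[state.client] = state
        self.received[state.client] = state.pending

    def deliver(self, client: Endpoint, data: bytes) -> None:
        state = self.connections[client]
        state.rcv_nxt = (state.rcv_nxt + len(data)) % MOD
        self.received[client] += data


class Switch:
    def __init__(self):
        self.connections: Dict[Endpoint, TcpState] = {}   # connections it terminated
        self.forwarding: Dict[Endpoint, Backend] = {}     # handed-off connections

    def handoff(self, client: Endpoint, backend: Backend) -> None:
        # Retrieve the state and drop the local connection without a FIN exchange.
        state = self.connections.pop(client)
        backend.install(state)
        self.forwarding[client] = backend

    def forward(self, client: Endpoint, data: bytes) -> Optional[str]:
        """After the handoff, inbound client packets are relayed to the back end.
        Responses go from the back end straight to the client (one-way)."""
        backend = self.forwarding.get(client)
        if backend is None:
            return None
        backend.deliver(client, data)
        return backend.name


sw = Switch()
cli = ("198.51.100.7", 40001)
sw.connections[cli] = TcpState(cli, ("192.0.2.10", 80), snd_nxt=5001, rcv_nxt=1121,
                               pending=b"GET /cgi-bin/form HTTP/1.1\r\n\r\n")
app = Backend("app-server")
sw.handoff(cli, app)
print(cli in sw.connections)           # False: the switch no longer owns the endpoint
print(sw.forward(cli, b"more data"))   # app-server
```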
Summary
- So far, we have discussed: