COE 590 Special Topics: Parallel Architectures


Programming Multi-Core Processors based Embedded Systems
A Hands-On Experience on Cavium Octeon based Platforms
Lecture 4: Layering & Deep Packet Inspection
Course Outline

Introduction
Multi-threading on multi-core processors
Applications for multi-core processors
Application layer computing on multi-core
  Application layer protocols
  Deep packet inspection and content processing
  Performance measurement and tuning
Copyright © 2009
4-2
KICS, UET
Agenda for Today

Application layer protocols
  Application layering
  Layer 4-7 applications
  Header and deep packet inspection
  Performance considerations
Case study: deep packet inspection
Case study: deep message inspection
Multi-Core Systems for Application Layer

Internet architecture typically consists of




Network core
Network edge and access network
End-point hosts
Compute capabilities within networks


Historically, network does simple operations
Compute intensive tasks left to end hosts


Examples: packet re-assembly, state-ful protocol
processing, content filtering and matching, etc.
However, “intelligence” moving to networks from
end hosts with increasing processor capabilities
Multi-Core Systems for Application Layer (2)

Increasing processing capabilities in networks



Moore’s law in 90’s: increasing clock frequencies
Moore’s law in 00’s: increasing number of cores
Additional capabilities are being used for:



Access control, filtering, caching, load balancing, packet
header and content inspection, and intrusion detection
In addition to regular switching and routing
Enabling new computing paradigms:



Network as a platform
Software as a Service (SaaS)
Cloud Computing
Multi-Core Systems for Application Layer (3)

Networks are leveraging multi-core systems



One or more cores dedicated to routing/switching
Other cores for application layer processing
Performance challenges:



Layer 2 and 3 devices operate at link speeds
Users expect the same performance even with application-layer processing
Multiple challenges to deliver expected performance:





Processor-memory speed disparities
Processor-I/O speed disparities
OS and system software level overheads
Thread synchronization overheads
We address these topics today by:



Understanding the nature of application layer protocols
Specific examples: layer 4-7 filtering and deep packet inspection
Discussion of performance considerations at application layer
Application Layering
Distributed Operating Systems
By Andrew S. Tanenbaum
Application Layering

Increasing number of distributed applications
Multiple end-points
  Clients, servers, databases, etc.
  At geographically dispersed locations
  Connected through public or private (IP) networks
Application level issues:
  Hide low level networking details → interface
  Application level abstraction → protocols
    Convenient to deal with at application layer
    No need to change lower level protocols (layer 1-4)
Layered Protocols

Message-passing requires agreements at different levels
  Signal voltage levels to represent 0 or 1 bits
  Detection of the last bit by the receiver
  How a receiver can detect message errors or loss, and what it should do in that case
  How numbers and strings should be represented
Open Systems Interconnection (OSI) reference model
  Identifies various levels and gives them standard names
  Points out the functions that belong to a particular level
  Designed to allow open systems to communicate through standard rules → protocols
    Protocols govern message format, content, and their meanings
    A group of computers must agree on protocols to communicate
The OSI Reference Model

Communication divided into 7 layers:
  Each layer deals with a specific aspect
  Provides an interface to the one above
  Interface defines its services
Message-Passing Using OSI Model


Process A on machine #1 needs to send a message to process B on machine #2
Message is built up through layers before it goes to the network
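
A minimal sketch of how a message is built up through the layers, assuming each layer simply prepends a text tag as its header. The tags and the helper names `encapsulate` and `decapsulate` are hypothetical; real headers are binary structures, the data link layer also appends a trailer, and the physical layer adds no header of its own.

```python
# Hypothetical header tags, chosen only for illustration.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data link"]


def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload with one header per layer, top layer first."""
    frame = payload
    for layer in LAYERS:
        header = f"[{layer}]".encode()
        frame = header + frame  # each lower layer prepends its own header
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip headers in the reverse order, as the receiving stack would."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


message = b"hello from process A"
on_the_wire = encapsulate(message)
received = decapsulate(on_the_wire)
```

The outermost header belongs to the lowest layer, so the receiver can peel headers off in the order it encounters them.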
Protocols at various Levels

Several protocols at different layer levels
  Protocol suite or stack used in a particular system
  Implementation → different from reference model itself
Three lower level protocols in OSI model:
  Physical layer
  Data link layer
  Network layer
Transport protocols
Higher level protocols
  Session and presentation layer protocols
  Application protocols
  Middleware protocols
Example of a Data Link Layer Protocol
Message exchange between A and B at link layer level
Examples of Transport Protocols

Regular TCP
TCP for transaction processing (T/TCP)
Middleware Protocols



Consist of general-purpose protocols below application layer
Examples: authentication, distributed commit, and
synchronization protocols
An adapted OSI model:
Example: Organization of a Search Engine
Client-Server Architectures

Using three levels, a simple architecture includes:
  A client machine containing an implementation of only the user-interface level
  A server machine containing the rest → processing and data level
Problem: this is not a truly distributed system
  Everything is handled by the server
  Client is simply a dumb terminal
Multitiered Architectures

Physically, a 2-tiered architecture



Consists of two types of machines: clients and servers
Various organizations are possible at three levels
Fig. (e) example: a browser with local cache of WWW pages
Example: Server Acting as a Client

Physically, a 3-tiered architecture



Server can act as a client
Programs or data may be distributed across
multiple servers
Example: a transaction processing system
Modern Architectures

Vertical distribution
  Achieved by placing logically different parts on different machines
  Example: a multitiered architecture
Horizontal distribution
  Client or server is split in logically equivalent parts
  Each part operates on its own share of complete data set
  Goal is often to balance the load
  Example: replicated web servers
Peer-to-peer distribution
  No server
  Clients collaborate with one another
  Other clients can dynamically join or leave a group of clients
  Example: Napster
Example: Horizontal Distribution

Web service


Replicated across three servers
Useful for highly popular web sites with enough
bandwidth
Layer 4-7 Applications
Examples and Multi-Core Platforms
Layer 4-7 Switching Examples

Load balancing
  HTTP/HTTPS
  VPN
  NAT
Proxying
  HTTP/HTTPS traffic proxy
  Forward or reverse proxy
  Web caching
Filtering
  Firewalls
  Content based processing and routing
Application delivery
  Business intelligence
  Software as a Service (SaaS) gateways
  Cloud computing
  Policy enforcements
Layer 4-7 Switch Usage
A Web Proxy
[Figure: Client, Web proxy server, Internet, and Web server. Message sequence: 1. Client request, 2. Proxy request, 3. Server response, 4. Proxy response]

A proxy works as an intermediary




Between client and server
Typically at the “edge” of the network
Network aggregation points are administered by ISPs
Services

Infrastructure, caching, access control, authentication, and firewall
Caching Services

Caching is beneficial for entire web infrastructure
  Upstream bandwidth saving
  Improved latency for the end user
  Content distribution to downstream users
  Less load on servers
Types of transactions that can be cached
  HTTP
  FTP
  Gopher
  NNTP
  Streaming media
Proxy should be transparent for other types of transactions
  For example, SSL tunneling
Modes of Proxy Usage

Server vs. appliance
  Proxy is a software application
    Examples: Microsoft Proxy, Inktomi Traffic Server, and Squid
  Proxy is a pre-packaged box
    Examples: Cisco, CacheFlow, and Network Appliances caching products
Explicit vs. transparent
  Explicit: user explicitly points the browser to a proxy
    Example: ITC proxy
  Transparent: user is unaware of the proxy
    Example: WCCP, L-4 switches, WPAD (Web Proxy Auto Discovery), and others
Forward vs. reverse
  Forward: used for bandwidth saving at aggregation points
  Reverse: used as a server accelerator
Single vs. hierarchical (chained) vs. cluster (array) configurations
  Distributed caching
  Load balancing
Platforms for Layer 4-7: Multi-Core

Requirements for layer 4-7 devices:




Suitable processor for a spectrum of applications
Efficient memory hierarchy
Efficient I/O and networking support
A flexible operating environment




A generic OS (typically POSIX compliant)
System level support for development and maintenance
High throughput to match user expectations
Multi-core systems are the choice



Technology and price-performance dictate it
OS and system software support
Multi-threading to achieve high application performance
Deep Packet Inspection
T. Lam et al., “XML Document Parsing: Operational and Performance Characteristics,” IEEE Computer, Sept. 2008
Current Internet Architecture
Source: Fang Yu, “High Speed Deep Packet Inspection with Hardware Support,” PhD Dissertation, University of
California, Berkeley, 2006. http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-156.html.
Characteristics of Internet Architecture

Internet core
  Limited to packet routing and switching
  Typically, little or no “intelligence” involved
  Use only packet header information
Network edges
  Perform application level functions
  Increasing “intelligence” at edges and hosts
  Involves processing packet header as well as content → deep packet inspection
    One or more packets
    Stateless or stateful
Deep Packet Inspection (DPI)


Unlike traditional packet inspection, DPI checks the complete payload, not just the packet header (5-tuple), to make the decision.
When the protocol ID is hidden, DPI is required to find protocol information in the packet payload based on predefined protocol features.
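
The contrast between header-only inspection and DPI can be sketched as follows, assuming a tiny port table and a handful of payload prefixes standing in for "predefined protocol features". The tables, addresses, and function names are illustrative assumptions, not a real signature database.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "tcp" or "udp"


# Header-only view: guess the application from the well-known port.
PORT_TO_APP = {80: "http", 25: "smtp", 22: "ssh"}

# Payload view: illustrative byte prefixes, not a real signature set.
PAYLOAD_SIGNATURES = {
    "http": (b"GET ", b"POST ", b"HTTP/1."),
    "ssh": (b"SSH-",),
}


def classify_by_header(flow: FiveTuple) -> Optional[str]:
    return PORT_TO_APP.get(flow.dst_port)


def classify_by_payload(payload: bytes) -> Optional[str]:
    for app, prefixes in PAYLOAD_SIGNATURES.items():
        if payload.startswith(prefixes):
            return app
    return None


# HTTP on a non-standard port: the header check learns nothing,
# while the payload check still recognizes the protocol.
flow = FiveTuple("10.0.0.1", "10.0.0.2", 40000, 8081, "tcp")
payload = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
header_guess = classify_by_header(flow)
payload_guess = classify_by_payload(payload)
```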
DPI—Classification and Matching

Packet classification depends on:
  Inspecting header → layer 2-4 protocols
  Inspecting payload → layer 7 protocols → DPI
DPI applications
  Detect virus, spam, etc.
  Network intrusion detection
  Network monitoring
  Load balancing
DPI is all about comparing content against a set of specified patterns
Example DPI Applications

Stopping worms from spreading



Current schemes rely on end-host anti-virus
software and are not very effective
DPI allows content checking at network edges
Content based routing



Identifying HTML traffic for a web server
Directing XML traffic to an application server
Balancing load
Representing a Pattern

An explicit text string
  Represents what we are looking for
  One string can represent only one pattern
A regular expression
  Replacing strings for regular text searching
  Enhance the expressive power of a pattern
  More flexible and widely understood
Other possibilities, e.g., XPath
  Standard language for matching in XML content
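
The three pattern representations can be compared side by side. This is a sketch using Python's standard library; the payload and the XML document are made-up examples, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

payload = b"USER admin\r\nPASS secret123\r\n"

# 1. Explicit string: one string stands for exactly one pattern.
has_user = b"USER admin" in payload

# 2. Regular expression: one pattern covers a whole family of strings.
login_re = re.compile(rb"USER (\w+)\r\nPASS (\w+)")
match = login_re.search(payload)
user = match.group(1) if match else None

# 3. XPath: matches on document structure rather than on raw bytes.
order = ET.fromstring("<order><item qty='50'>widget</item></order>")
items = order.findall("./item[@qty='50']")
```

The explicit string would miss `USER root`, while the regular expression matches any user name; the XPath query ignores byte offsets entirely and selects by element and attribute.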
Complexities with DPI

Patterns are pre-defined keywords or strings



Explicit strings
Regular expressions
Patterns related requirements:





Match multiple patterns
Simultaneously or in specified sequence
For wide variety of attacks
One packet to be matched against 1000’s of
signatures
Example: SNORT has more than 4000 signatures
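
Matching one packet against thousands of explicit strings is usually done in a single pass rather than one scan per signature. The sketch below uses the Aho-Corasick automaton, a standard multi-string matching technique chosen here for illustration; the slides do not say which algorithm SNORT or the L7 filter uses, and the three signatures are made up.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for byte patterns."""
    goto = [{}]      # goto[state][byte] -> next state
    fail = [0]       # fallback state on mismatch
    out = [set()]    # patterns recognized on reaching a state
    for pat in patterns:
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(pat)
    # Breadth-first pass fills in failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def scan(payload, automaton):
    """Return (end_offset, pattern) for every match, in one pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, byte in enumerate(payload):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pat in out[state]:
            hits.append((i + 1, pat))
    return hits


signatures = [b"cmd.exe", b"/etc/passwd", b"exe"]  # illustrative only
automaton = build_automaton(signatures)
hits = scan(b"GET /scripts/cmd.exe?/c+dir", automaton)
```

Scan time grows with the payload length and the number of matches, not with the number of signatures, which is what makes large signature sets tractable.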
Complexities with DPI (2)

Keywords complexities:
  Can be of any length
  Can be anywhere in the payload of a packet
Performance challenge:
  Match at line speed
  Continuously increasing: 10, 100, 1,000, 10,000 Mbps
  Flexibility to accommodate new patterns
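
To make "match at line speed" concrete, the time available per packet can be worked out from the link rate. This is back-of-the-envelope arithmetic that ignores Ethernet preamble and inter-frame gap; the 64-byte and 1500-byte packet sizes are assumed for illustration.

```python
def per_packet_budget_ns(link_mbps: float, packet_bytes: int) -> float:
    """Time available to process one packet at line rate, in nanoseconds."""
    bits = packet_bytes * 8
    return bits / (link_mbps * 1e6) * 1e9


for mbps in (10, 100, 1_000, 10_000):
    small = per_packet_budget_ns(mbps, 64)
    large = per_packet_budget_ns(mbps, 1500)
    print(f"{mbps:>6} Mbps: {small:>10.1f} ns (64 B)  {large:>12.1f} ns (1500 B)")
```

At 10,000 Mbps a 64-byte packet leaves roughly 51 ns, which is on the order of a single main-memory access, so every payload byte must be matched with very little work per byte.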
XML Pattern Matching

XML is widely used at application layer
  Extensible Markup Language
  De facto standard for Internet document format
Stages for XML processing:
  Parsing
  Access
  Modification
  Serialization
Parsing is the most expensive operation
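
The four stages map directly onto a few library calls. A minimal sketch with Python's `xml.etree.ElementTree`, using a made-up order document:

```python
import xml.etree.ElementTree as ET

raw = b"<order><qty>50</qty><price>500</price></order>"

# 1. Parsing: turn the byte stream into an in-memory tree.
root = ET.fromstring(raw)

# 2. Access: navigate the tree and read values.
qty = int(root.findtext("qty"))
price = int(root.findtext("price"))

# 3. Modification: add a derived element.
total = ET.SubElement(root, "total")
total.text = str(qty * price)

# 4. Serialization: turn the tree back into bytes for transmission.
out = ET.tostring(root)
```

Parsing touches every byte of the input and allocates the tree, which is consistent with it being the costliest stage.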
XML Processing Stages
DPI Case Study: A Layer 7 Filter

Linux L7 filter:




An extension to Netfilter
Classify packets in connection
D. Guo et al. “A Scalable Multithreaded L7-filter
Design for Multi-Core Servers,” ANCS ’08, 2008
Multi-core architecture

Multi-core architecture duplicates hardware
resources such as ALU, L1 cache, etc. on the
same die, and hence allows multiple processes to
run concurrently on different cores.
Implementation Details

Offline model
  Well-controlled research environment
Connection level parallelism
  Faster processing
  Better cache performance
  Connection-based affinity for multi-core scalability
Direct mapping in VM
  Transplant native optimization to VM
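
One simple way to get connection-based affinity is to hash the connection identifier to a worker index, so that every packet of a connection lands on the same thread and its state stays in that core's cache. This is a sketch of the general idea only, not the scheduler from the cited paper; `NUM_WORKERS` and `worker_for` are hypothetical names.

```python
import zlib

NUM_WORKERS = 7  # illustrative value


def worker_for(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Map a connection to a worker index, independent of direction."""
    # Sort the endpoints so both directions of a connection hash alike.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{proto}|{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode()
    # crc32 is stable across runs, unlike Python's built-in hash() on str.
    return zlib.crc32(key) % NUM_WORKERS


forward = worker_for("10.0.0.1", 40000, "10.0.0.2", 80)
reverse = worker_for("10.0.0.2", 80, "10.0.0.1", 40000)
```

Static hashing keeps per-connection state on one core but cannot rebalance when a few connections carry most of the traffic; a real scheduler would need to handle that case.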
Trace-Driven L7 Filter Data Flow
Affinity Based Multithreaded L7 Filter Architecture
[Figure labels: Xen Driver Domain; Incoming Packets; libnids Preprocessing; PT #0; Scheduler (schedule new packets to MTs); MT #1 to MT #7; VCPU #0 to VCPU #7; Xen Hypervisor; Direct Mapping; PCPU #0 to PCPU #7; Physical Devices]
Data Flow in Scheduler
Throughput and Core Utilization
[Figure: throughput (Gbps, axis 0.0 to 1.4) and CPU utilization (axis 0% to 100%) versus # of MTs (1 to 8); series T-ori, T-aff, U-ori, U-aff]
L2 Cache Misses
Execution Time Comparisons
Execution Time Profile
Summary of Case Study Results

L7 filter implementation



With multiple threads
A scheduling mechanism using core affinity to
optimize L2 cache utilization
Improvements compared to native Linux


51% higher throughput
15% lower core utilization
Performance Consideration
Case Studies
DPI Performance Evaluation


We consider two case studies
Case study: DPI



String matching using regular expression
Regular vs. optimized regular expressions
Case study: deep message inspection



XML based message content
TCP termination → messages instead of packets
Performance of XML operations: parsing,
validation, and XPath based matching
Case Study:
Efficient Regular Expression Matching for Deep Packet Inspection
Fang Yu et al., ANCS ’06, 2006
DPI—Performance Requirements

High speed



Match core link speeds (10 Gbps)
Match edge link speeds (1 Gbps)
Match increasing processor performance




Moore’s law in 90’s → increasing clock speeds
Moore’s law now → increasing number of cores
Such rates are difficult to achieve with software
based worm/virus pattern detection
Example: SNORT can achieve up to 250 Mbps
DPI—Performance Requirements (2)

Low cost



High throughput requirement can be fulfilled with
powerful parallel architectures but at high costs
Currently available multi-core systems provide
highly parallel and high performance compute
systems at low cost
Layer 4-7 application performance challenges


Cache and memory latencies → architecture
Multi-threading complexities → software development
Performance Optimization


Techniques evolved for all representations
Explicit text representations



Algorithmic optimizations e.g., Bloom filters
Architectural optimizations e.g., TCAMs
Regular expression based representations



Nondeterministic Finite Automaton (NFA) based approaches → inefficient
Deterministic Finite Automaton (DFA) based approaches
Grouping for over-lapping patterns
Optimization Case Study: RegEx

RegEx pattern optimizations:




NFA → NFA based implementation
DFA RP → Flex generated DFA based repeated scan engine
DFA OP → Optimized DFA with grouping
Experimental setup:


NFA based algorithms from Linux L7 filter and
SNORT
Packet traffic based on real traces from MIT and
UCB real network dumps
Throughput Comparison of Scanners
Results Summary for Case Study

Implemented DFA based approach
  Used instead of the commonly used NFA based approach
  Re-writing results in reduced memory needs compared to exponential memory with regular DFA
Selective grouping enhances throughput
  2 to 3 orders of magnitude more than NFA
  1 to 2 orders of magnitude faster than regular DFA
  Suitable for multi-core processor implementation
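
The "exponential memory with regular DFA" point can be reproduced with a textbook pattern, `(a|b)*a(a|b){n}`, whose NFA has n + 2 states while its DFA needs 2^(n+1). The sketch below runs the subset construction and counts reachable DFA states. The pattern is a standard illustration chosen here; it is not one of the signatures from the cited study.

```python
def dfa_state_count(n: int) -> int:
    """Count DFA states for the pattern (a|b)*a(a|b){n}.

    NFA states are 0..n+1: state 0 loops on any symbol and also moves to
    state 1 on 'a'; states 1..n advance on any symbol; n+1 is accepting.
    """
    def step(states: frozenset, symbol: str) -> frozenset:
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)
                if symbol == "a":
                    nxt.add(1)
            elif s <= n:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:  # subset construction over reachable state sets
        current = frontier.pop()
        for symbol in "ab":
            target = step(current, symbol)
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return len(seen)


for n in (1, 4, 8):
    print(n, "NFA states:", n + 2, "DFA states:", dfa_state_count(n))
```

The DFA has to remember the last n + 1 symbols, so each added position doubles the state count. This is the kind of blow-up that pattern rewriting and grouping aim to avoid.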
Case Study:
Benchmarking XML Based Application Oriented Network Infrastructure and Services
Abdul Waheed and Jason Ding
SAINT ’07
January 18, 2007
Outline

Application oriented networking
  Application services in network infrastructure
  Role of XML for AON
Benchmarking
  AON benchmarking requirements
  State of application services benchmarks
AONBench
  Methodology
  Specifications
  Case studies of using AONBench
Conclusions
Application Oriented Networking (AON)

Increasing “intelligence” in the network
  Beyond switching, routing, and traffic engineering functions
  Traditionally, complex functions left to end hosts
  Cost-effective processing capabilities enable increasingly complex functions within the network
AON provides application awareness in the network
  Provides value added functions in network infrastructure
  Adds value to a vast array of enterprise applications
Examples:
  Network edge devices for caching, filtering, and security
  Protocol translation for integration at enterprise level
  Network policing and monitoring conformance to SLAs
AON is the “Network for Applications”
[Figure labels: Applications, Processes, People; Application Oriented Networking (Message Routing, Event Capture, Application Security); Packet Network]
Next stage in the evolution of the network:
  Process at application message level
  Understand the content and context of messages
  Enable application to focus on business logic and user interaction while offloading application overhead aspects to the network with no changes to the existing systems
AON understands Application Messages

Conventional networks:
  Provide intelligent packet level services
  Cannot interpret message contents
AON interprets application message contents for much richer detailed information:
  Example: Ship To, Part#, Qty, $, SLA
  Allows business driven policies to be executed on application messages at runtime
[Figure: under packet networking the bit stream 101011001011011011010100110101 is opaque (“?”); under application-oriented networking the same stream is read as the message below]
PURCHASE ORDER #: 012345678
FROM: BigWig Co, Anytown
TO: Cisco Systems DATE: 04/01/05
QTY: 50
PART#: Widget #12345a
PRICE:=$500 ea. TOTAL: = $25,000
DELIVERY: Urgent SLA:= 2 days
Cisco AON Core Capabilities
Intelligent Messaging
  Reliable messaging
  Content based routing
  Transformation
  Protocol switching
  Message distribution
  Message load balance
Application level Security
  Authentication
  Authorization
  Encryption/Decryption
  Data integrity / N-R
  Digital signatures
  Centralized PKI mgt.
Event Visibility
  Event capture, filtering
  Logging for audit
  Automatic notification
  Policy controlled
  Feed to dashboards
  Link to Network events
Application Optimization
  Hardware Acceleration (SSL, Crypto, XML)
  Message level Caching and Compression
  High Availability, Failover, Load Balancing
Extensibility
  ADK (for custom adapters)
  SDK (for custom bladelets)
  AON Technology Partners
Application Services without AON
[Figure labels: Client end; WAN; Application Server; Common services; Back-end]
Application server hosts the business services
Uses back-end services and network infrastructure
  Back-end: to conduct common or compute intensive tasks
  Network infrastructure: for connectivity and routing
Application Services with AON
[Figure labels: Client end; WAN; AON module; Application Server]
AON modules push common services to the network
Application servers can leverage AON to:
  Off-load compute-intensive tasks
  Off-load routing, security, load balancing, caching, protocol classification, and application-specific integration
  Focus on providing business logic
Cisco AON Products

Pushes common application services to infrastructure








Security
Firewalls and access control
Content caching and delivery
XML content processing
Load balancing
Protocol matching
Integration of heterogeneous standards/applications
Multiple AON form factors:



AON modules in Catalyst 6500 series switches
AON modules in Cisco 2600/2800/3700/3800 series routers
Cisco 8340 AON appliance
Role of XML for AON

XML is becoming ubiquitous for network applications
  Similar to IP, which is ubiquitous for packet routing
  Application message content is increasingly XML
XML features
  Self defining, extensible format
  Convenient to parse, validate, and transform
  Suitable for integrating multiple services in enterprise networks
Performance is a challenge
  Different from stateless packet (header) processing
  XML processing and security are compute-intensive
  Benchmarking is needed as AON awareness grows
Requirements for an AON Benchmark

Benchmark should be vendor-neutral
  Need a level playing field for everyone
  Should focus on services rather than specific products
It should be open
  Open in terms of specifications
  Also, open source tools
  User should be able to employ their own choice of tools
Benchmarks should exercise
  Networking/web services and
  XML Content processing services, such as parsing, pattern matching, validation, transformation, and security
State of Application Services Benchmarks

Two distinct classes of benchmarks: WWW and XML
  Web services benchmarks
  XML microbenchmarks
  XML storage management benchmarks
  XML security benchmarks
Available benchmarks do not meet all requirements:
  Benchmark either web services or XML processing
  Product-specific benchmarks
  Commercial tools that are not open benchmarks or specification for benchmarking
Our approach:
  Provide open specifications for benchmarking
  Leave tool development/selection to the evaluator
AONBench

AONBench is not a benchmarking tool
  Consists of two parts: methodology and specifications
  Tool development or selection is left to the evaluator
Methodology
  Setup needed for measurements
  Setup for service request and response
  Criteria for XML functional success and failure
Specifications
  Application services based use cases
  XML messages with schemas and style sheets
We do have two realizations of AONBench
  Using public domain tools
  Using custom tools
AONBench Methodology (1)
[Figure labels: Client end; GigE switch; AON module (Ingress / Egress); Default server; Second server endpoint; HTTP text/xml message; HTTP response]

Isolated measurement setup
  AON module
  GigE network connecting client, servers, and AON
  It can scale with traffic generation needs
AONBench Methodology (2)

Client/server setup
  Client sends XML content as HTTP POST messages
  AON module intercepts (explicitly or implicitly)
  Forwards the message to an endpoint
Service selection
  Based on unique URL in POST request
  AON module provides pre-configured services
  Services can succeed or fail
Modes of AON forwarding
  AON selects an end-point based on success/failure of a service request
  No need to use any product-specific interface
  XML function can be verified by modifying the message
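
The client side of this setup can be sketched with Python's `urllib.request`: one HTTP POST per message, with the service chosen by URL. The host name, port, and paths are hypothetical, since the slides only require that each service have a unique URL; the use-case labels (FR, CBR, SV, Trans) are the ones used in the AONBench specification slides.

```python
import urllib.request

# Hypothetical address and paths, for illustration only.
AON_BASE = "http://aon-module.example:8080"
SERVICE_PATHS = {"FR": "/fr", "CBR": "/cbr", "SV": "/sv", "Trans": "/trans"}


def build_request(use_case: str, xml_body: bytes) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST for one use case."""
    return urllib.request.Request(
        url=AON_BASE + SERVICE_PATHS[use_case],
        data=xml_body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_request("CBR", b"<order><qty>50</qty></order>")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Because the service is selected by URL alone, any HTTP load generator that can issue POST requests can drive the benchmark.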
AONBench Methodology (3)

Connection type
  Default: keep-alive connections with unlimited messages
  Keep-alive can be disabled as needed
Content sizes
  Application services frequently use small messages
  Small: 1 KB to 5 KB messages
  Large: 500 KB messages
Basic performance metrics
  Throughput
    Messages per second
    Megabits per second
  End-to-end request-response latency
  Other metrics can be derived from the basic metrics
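
The two throughput metrics are related by the message size. A small worked example, assuming 1 KB = 1024 bytes and counting message payload only (no HTTP or TCP overhead); the run parameters are made up.

```python
def throughput(messages: int, message_bytes: int, elapsed_s: float):
    """Return (messages per second, megabits per second)."""
    msgs_per_s = messages / elapsed_s
    mbps = msgs_per_s * message_bytes * 8 / 1e6
    return msgs_per_s, mbps


# Hypothetical run: 10,000 messages of 5 KB completed in 4 seconds.
msgs_per_s, mbps = throughput(10_000, 5 * 1024, 4.0)
print(f"{msgs_per_s:.0f} msg/s, {mbps:.1f} Mbps")
```

Reporting both metrics matters: small messages stress per-message overhead (messages per second), while large messages stress data movement (megabits per second).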
Specifications for Use Cases

We specify three classes of use cases
Network infrastructure services
  Forwarding Request (FR) without modifications
  Provide HTTP proxying baseline
XML content based services
  Content Based Routing (CBR)
  Schema Validation (SV)
  XML Transformations (Trans)
XML security services
  Encryption, Decryption, Digital Signature, and Verification
We also specify XML messages for these use cases
  Message contents
  Schemas and style sheets
Specifications: Forward Request

AON module actions
  Receives the HTTP POST request from a client
  Identifies it as FR use case
  Forwards the message to specified server end point
  Receives the response from server end point
  Forwards the response back to the client
Message
  Any XML message can be used
  Contents are irrelevant
  Size: one of 1, 5, or 500 KB
Performance significance
  Provides a baseline for all other cases
  Should result in highest throughput compared to other UCs
Specifications: XML Processing Use Cases

AON module actions:
  Receives the HTTP POST request
  Identifies the type of service: CBR, SV, or Trans
  Performs the selected service on content
  If no errors → forwards to specified end point; else → forwards to the default end point
  Receives response from the end point
  Forwards response back to the client
Messages:
  CBR: XML message with SOAP envelope containing a string of interest
  SV: XML message with name of schema XSD file
  Trans: XML message that uses a pre-stored style sheet
Performance significance:
  Exercise XML compute-intensive operations
  Also exercise network I/O
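
The content based routing (CBR) decision can be sketched as a small function: parse the message, look for the string of interest, and fall back to the default end point on failure. The endpoint URLs and the keyword are hypothetical, and treating "string not found" the same as a processing error is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical endpoint names and keyword, for illustration only.
SPECIFIED_ENDPOINT = "http://server-2.example/orders"
DEFAULT_ENDPOINT = "http://server-1.example/default"
STRING_OF_INTEREST = "Urgent"


def route(message: bytes) -> str:
    """Pick the end point for a content-based-routing request."""
    try:
        root = ET.fromstring(message)
    except ET.ParseError:
        return DEFAULT_ENDPOINT  # service failed: use the default end point
    # itertext() walks all text nodes, so envelope nesting does not matter.
    if any(STRING_OF_INTEREST in text for text in root.itertext()):
        return SPECIFIED_ENDPOINT
    return DEFAULT_ENDPOINT


soap = (b'<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">'
        b"<Body><po><delivery>Urgent</delivery></po></Body></Envelope>")
chosen = route(soap)
```

Because the response comes back from a different server depending on the outcome, a client can tell whether the XML function succeeded without any product-specific interface.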
Specifications: XML Security Use Cases

AON module actions:
  Receives the HTTP POST request
  Identifies the type of service: Enc/Dec/Sign/Verify
  Performs the selected service on content
  If no errors → forwards to specified end point; else → forwards to the default end point
  Receives response from the end point
  Forwards response back to the client
Messages:
  Encryption/Signature: XML message with SOAP envelope containing a PO
  Decryption: Uses message resulting from Encryption
  Verification: Uses result of Sign
Performance significance:
  Exercise XML crypto operations
  CPU intensive use cases
AONBench Implementations

Implementation does not depend on any tool
  Users can select tools of their choice
  Need client/server setup
  Tool needs to support HTTP POST request
Our setup
  Client end point: ApacheBench
  Server end point: Apache web server
  Multiple AON form factors
  We use specified messages, schema, and style sheet
  Exercise services through unique URIs
  In-house tools can also be used
Studies of Using AONBench

We have used AONBench at various phases of AON product development
  Product requirements and definition
  Product design phase → for architects
  Product execution phase → performance regression testing
  Product release phase → for marketing
Case studies of applying AONBench
  Competitive analysis
  Selection of appliance hardware
  Evaluation of acceleration hardware
Case Study: Selection of Appliance H/W
Hardware Platform | Speedup: Transformation | Speedup: Schema Validation
2x3.2 GHz CPUs, 1 MB L2, 2 MB L3, 667 MHz FSB, and 4 GB DRAM | Baseline-1 | Baseline-2
4x3.2 GHz CPUs, 1 MB L2, no L3, 667 MHz FSB, and 4 GB DRAM | 1.25x | 1.1x
2x3.2 GHz CPUs, 1 MB L2, 8 MB L3, 667 MHz FSB, and 4 GB DRAM | 1.6x | 1.8x
Increased L3 cache size more effective than doubling the number of CPUs
Case Study: Competitive Analysis
[Figure: bar chart of relative performance (axis 0 to 25) for Design-1, Design-2, Design-3, Competitor-1, and Competitor-2 across ValidateSchema, Encryption, Decryption, Signature, and VerifySignature]
Competitive analysis of three designs using AONBench
Case Study: Evaluating Accelerator H/W
Performance speedup due to an accelerator
Use Case | 5 KB messages | 500 KB messages
Content Based Routing | 1.08 | 3.08
Schema Validation | 1.12 | 1.11
Encryption | 1.06 | 1.44
Decryption | 1.39 | 1.18
Digital Signature | 1.11 | 1.02
Signature Verification | 1.03 | 1.02
Hardware accelerator is more valuable for processing larger messages
Case Study Conclusions

AONBench contributions
  Introduces specification based benchmarking for networked application services
    Use cases
    XML messages
  Neutral to vendor and measurement tools while focusing on the performance of networked services
  Useful for architects, developers, as well as end users
Future work
  Use current “atomic” use cases to develop more realistic UCs
  Extend the scope of use cases: caching, load balancing, message filtering, protocol classification, and application integration
Key Takeaways for Today’s Session

Most of the action at application layer
  Wide range of protocols and applications
  Simpler to deploy on existing IP networks
  Exponentially increasing demand
Multi-core systems are suitable
  To deliver high throughput
  To match link speeds despite compute/memory intensive tasks at application layer (e.g., DPI)
  To concurrently perform compute intensive security operations for scalability