
FroNtier Stress Tests
Status report
Luis Ramos
LCG3D Meeting - August 17, 2006
Agenda
1. Objectives
2. Test Plan
3. Hw/Sw Setup
4. Test Cases - Results
5. Future Work
6. Conclusions
Objectives
• Develop a benchmark for Frontier servers
– DB schema independent
• Build a tool that identifies performance
bottlenecks of a given setup
• Performance analysis of the complete sw stack
– CORAL / Frontier plugin
– Frontier Client
– Squid
– Frontier Servlet
Test Plan
• How fast are the individual components?
– Database
– Application Server
– Cache Server
– Network
• Explore performance impact of:
– Different data (content, size, storage type, compression)
– Different complexity database schemas
– Different caching policies
• How do DB throughput, network bandwidth,
payload size, # of clients or server CPU correlate?
Metrics and Parameters
• Metrics:
– Individual and total throughput
– Server errors
– CPU consumption/load (clients, frontier server, squid server)
– Memory usage and disk space needs
– Network bandwidth usage
• Parameters:
– # of client nodes
– # of test clients
– Payload sizes
– Database structure and content
– Caching policy
FroNtier Test Setup
[Diagram of the test setup]
FroNtier Test Setup
• Hardware setup:
– 1 server running Frontier & Squid:
• Dual Intel Xeon CPU 2.80GHz
• 2 GB RAM
• 150 GB HD
• Fast Ethernet (100 Mbps)
– 1 Backend Oracle Database 10gR2 (cooldev)
• Software Setup:
– FroNtier v3.1 (need to check keep_alive feature in v3.2)
– Frontier Squid v1.0rc4
Client Setup
• Hardware:
– Dedicated lxplus nodes
• Dual Pentium III 1GHz
• 500 MB RAM
• 6 GB HD
• Fast Ethernet (100 Mbps)
• Software:
– CORAL_1_5_1 with FrontierClient v2.4.7 (hacked to print some metrics)
– C++ CORAL/FrontierPlugin test (see the sketch at the end of this list)
• Queries the server
• Gathers results
• Outputs measures
– Python controller script
• Starts a number of clients in several client nodes
• Gathers measures
• Generates structured data for plotting
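A minimal sketch of what such a CORAL/FrontierPlugin test client could look like is given below; the connection string, table name and column name are placeholders (not the ones used in these tests) and the timing is deliberately simplified:

#include <ctime>
#include <iostream>
#include <string>
#include "RelationalAccess/AccessMode.h"
#include "RelationalAccess/ConnectionService.h"
#include "RelationalAccess/ISessionProxy.h"
#include "RelationalAccess/ITransaction.h"
#include "RelationalAccess/ISchema.h"
#include "RelationalAccess/ITable.h"
#include "RelationalAccess/IQuery.h"
#include "RelationalAccess/ICursor.h"
#include "CoralBase/Attribute.h"
#include "CoralBase/AttributeList.h"

// Minimal Frontier read test: connect through the Frontier plugin,
// scan a payload column of a test table and print the achieved rate.
// "frontier://cooldev/stress", STRESS_TABLE and DATA are placeholders.
int main() {
  coral::ConnectionService connSvc;
  coral::ISessionProxy* session =
      connSvc.connect("frontier://cooldev/stress", coral::ReadOnly);

  session->transaction().start(true);  // read-only transaction
  coral::ITable& table = session->nominalSchema().tableHandle("STRESS_TABLE");
  coral::IQuery* query = table.newQuery();
  query->addToOutputList("DATA");

  std::time_t t0 = std::time(0);
  coral::ICursor& cursor = query->execute();
  unsigned long rows = 0, bytes = 0;
  while (cursor.next()) {
    bytes += cursor.currentRow()["DATA"].data<std::string>().size();
    ++rows;
  }
  double dt = std::difftime(std::time(0), t0);

  delete query;
  session->transaction().commit();
  delete session;

  std::cout << rows << " rows, " << bytes << " bytes in " << dt << " s ("
            << (dt > 0 ? bytes / dt / 1024.0 : 0.0) << " kB/s)" << std::endl;
  return 0;
}

The controller script then only has to start N copies of such a binary on the lxplus nodes and collect the printed rates.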
Test Cases developed
• Data Types analysis
• Payload Sizes analysis
• Network analysis
• Throughput analysis
– Includes Database, FroNtier and Squid analysis
Data Types analysis
• CORAL static mapping (CORAL sketch below)
C++ type           Oracle 10g type
int                NUMBER(10)
float              BINARY_FLOAT
double             BINARY_DOUBLE
std::string(100)   VARCHAR2(100)
coral::Blob        BLOB
If DB schema is not created with CORAL:
– User may define a different mapping
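For reference, a sketch of how a test table covering this mapping could be described through CORAL, which then creates the Oracle columns listed above; the table and column names are invented for the example:

#include <string>
#include "RelationalAccess/ISchema.h"
#include "RelationalAccess/TableDescription.h"
#include "CoralBase/AttributeSpecification.h"
#include "CoralBase/Blob.h"

// Describe a table whose columns exercise each entry of the static
// mapping; CORAL translates the C++ type names into the Oracle types.
void createStressTable(coral::ISchema& schema) {
  coral::TableDescription desc;
  desc.setName("STRESS_TABLE");
  desc.insertColumn("ID",
      coral::AttributeSpecification::typeNameForType<int>());          // NUMBER(10)
  desc.insertColumn("FVAL",
      coral::AttributeSpecification::typeNameForType<float>());        // BINARY_FLOAT
  desc.insertColumn("DVAL",
      coral::AttributeSpecification::typeNameForType<double>());       // BINARY_DOUBLE
  desc.insertColumn("SVAL",
      coral::AttributeSpecification::typeNameForType<std::string>(),   // VARCHAR2(100)
      100, false);
  desc.insertColumn("BVAL",
      coral::AttributeSpecification::typeNameForType<coral::Blob>());  // BLOB
  schema.createTable(desc);
}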
Payload Compression
• std::string(100) in C++, mapped to VARCHAR2(100) in the DB, filled with random numbers
• Network payload size in % of user data size:
(for different compression levels)
[Plots: "Payload Sizes, different compression levels": network payload size in % of user data vs. user data size (0.001 to 100 MB); left panel compares zip levels 0 and 5, right panel zip levels 1, 5 and 9]
Payload Size
• Client side payload size evolution
– (HTTP) XML ASCII size
– (FrontierClient) a few bytes down, to the BLOB base64 size
– (FrontierClient) down by the ~33% base64 overhead, to the compressed BIN size
– (FrontierClient) up after unzipping, to the uncompressed BIN size
– (CORAL/FrontierPlugin) down to the user C++ data types
• With std::strings, the user data size is ~ the same as the DB size
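As a rough illustration of these factors (not a measurement from the tests), consider 1 MB of user data sent with zip level 0; the only input is the base64 definition, which encodes every 3 bytes as 4 ASCII characters:

\begin{align*}
\text{BIN (zip level 0)} &\approx 1.00~\text{MB}\\
\text{base64 payload}    &= \tfrac{4}{3}\times\text{BIN} \approx 1.33~\text{MB}\\
\text{XML payload}       &\approx \text{base64} + \text{markup (a few bytes)}
\end{align*}

With compression switched on, BIN shrinks by whatever ratio the data allows, and the same ~33% base64 overhead then applies to the smaller compressed size.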
Network Analysis
• Test tool that checks network performance
between multiple given client nodes and a
single server…
– Generates TCP/IP traffic between the clients and the server
– Each client shows a throughput and a CPU%
• Done using the netcat utility
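The actual tool is just a wrapper around netcat, but the measurement it performs can be illustrated with a minimal stand-alone probe: stream a fixed amount of data to a server that discards it (for example a plain netcat listener such as "nc -l -p 5000 > /dev/null") and report the achieved rate. Host name, port and data volume below are placeholders.

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <ctime>
#include <iostream>
#include <vector>

// Stream `totalMB` of zeros to a discarding TCP server and print the rate.
int main(int argc, char** argv) {
  const char* host = (argc > 1) ? argv[1] : "frontier-test-server";  // placeholder
  const int port = 5000;
  const size_t totalMB = 100;

  hostent* he = gethostbyname(host);
  if (!he) { std::cerr << "cannot resolve " << host << std::endl; return 1; }

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  std::memcpy(&addr.sin_addr, he->h_addr, he->h_length);

  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0 || connect(fd, (sockaddr*)&addr, sizeof(addr)) < 0) {
    std::cerr << "connect to " << host << " failed" << std::endl;
    return 1;
  }

  std::vector<char> buf(1024 * 1024, 0);  // 1 MB of zeros per iteration
  std::time_t t0 = std::time(0);
  for (size_t i = 0; i < totalMB; ++i) {
    size_t sent = 0;
    while (sent < buf.size()) {
      ssize_t n = write(fd, &buf[sent], buf.size() - sent);
      if (n <= 0) { std::cerr << "write failed" << std::endl; return 1; }
      sent += static_cast<size_t>(n);
    }
  }
  close(fd);

  double dt = std::difftime(std::time(0), t0);
  std::cout << totalMB << " MB in " << dt << " s ("
            << (dt > 0 ? totalMB / dt : 0.0) << " MB/s)" << std::endl;
  return 0;
}

Each client's printed rate, together with a CPU reading on both ends, gives the per-client throughput and CPU% mentioned above.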
Throughput analysis
Frontier Server
• Up to 150 clients running against a single server
(direct FroNtier server access, no Squid involved)
[Plot: "FroNtier Server, Total Throughput (per # of clients for different payload sizes)": total throughput in MBps (up to ~3.5) vs. # of clients (0 to 160), for payloads of 1.36, 2.7, 5.56, 11.2 and 22.6 MB]
Throughput analysis
Server error rate
Depending on the size of the payload, up to 80 clients can be
served from one FroNtier node (no caching)
[Plot: "FroNtier Server, % of failed client requests (per # of clients for different payload sizes)": failed requests in % (0 to 100) vs. # of clients (0 to 160), for payloads of 1.3, 2.7, 5.5, 11.2 and 22.6 MB]
The Frontier v3.2 keepalive feature may drastically reduce the error rates
Throughput analysis
Network, Squid and Frontier
• TCP/IP vs Frontier Server vs Squid Cache Hits
[Plot: "FroNtier, Squid and Network, Total Throughput (payload size = 1.3 MB)": total throughput in MBps (up to ~14) vs. # of clients (0 to 160), comparing Squid (cache hits), FroNtier (DB queries) and the raw network]
Throughput Analysis
Compression level
• Zipping performance (single client)
[Plot: "Throughput per Payload Size, Single Client": throughput in kBps (up to ~300) vs. payload size (0 to 12 MB), for zip levels 0 and 5]
• Zip5 throughput is lower because:
– CPU time is higher (both on server and client)
– Network is not the bottleneck (can be in other scenarios)
Throughput Analysis
CPU% and load average
• How CPU% correlates with server total throughput (payload = 30 kB)
[Plots: "Throughput and CPU (per # of clients)": total throughput in kBps and server CPU% vs. # of clients; "Throughput per CPU%": throughput in kBps vs. CPU% (0 to 100); "Server Load Average": load average vs. # of clients, with 2 marked as a good value]
Throughput Analysis
FroNtierPlugin vs OraclePlugin
• CORAL FrontierPlugin vs OraclePlugin
(same CORAL code, different connect string)
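The comparison only changes the connection string handed to the CORAL ConnectionService, which in turn selects the Frontier or Oracle plugin. A minimal sketch; both URLs are placeholders, not the strings used in the test:

#include <string>
#include "RelationalAccess/AccessMode.h"
#include "RelationalAccess/ConnectionService.h"
#include "RelationalAccess/ISessionProxy.h"

// Same CORAL test code, different connect string: the URL scheme alone
// decides whether the Frontier or the Oracle plugin is used.
coral::ISessionProxy* openSession(coral::ConnectionService& connSvc,
                                  bool useFrontier) {
  const std::string url = useFrontier ? "frontier://cooldev/stress"  // placeholder
                                      : "oracle://cooldev/stress";   // placeholder
  return connSvc.connect(url, coral::ReadOnly);  // query code is identical after this
}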
[Plot: "Frontier vs Oracle (single client)": throughput in kBps (~100 to 180) vs. # of rows (up to ~1.2 million), FRT vs ORA]
To do: plot multi client results
Future work
• Run multi client throughput tests changing:
– DB data type, structure and contents
– Access method (Oracle vs Frontier)
– Frontier configuration (compression level and caching)
• Frontier server error rate analysis
– Easily identify the limits of a given setup
• Squid caching
– Influence of partial caching of the queries
• Tests from outside CERN
– Run throughput tests from outside to measure the influence of a
poorer network connection
Conclusions
• Some values already collected
• Test scripts developed
– Next step: make the scripts easily reusable by others
• Real world data needed!
– “Real” data and an estimate of the ratio between cached and uncached accesses would allow us to:
• Make much more precise statements about compression and achievable rates
– Who can provide this estimate/measurement to us?
• Further results and analysis
– 3D workshop in September
Questions? Ideas?