Tests and support for ENEA GRID

Tests and tools for ENEA GRID
Performance test: HPL (High Performance Linpack)
Network monitoring
A. Funel
December 11, 2007
HPL TEST
• HPL measures the floating-point execution rate for
solving a system of linear equations AX = B
• HPL requires the availability of MPI and of a linear
algebra library (BLAS, VSIPL, ATLAS)
• HPL is scalable: parallel efficiency is constant with respect
to the per-processor memory usage
www.netlib.org/benchmark/hpl
HPL Results (1)
$A_{n \times n}\, X = B$
$\mathrm{GFLOPS} = \dfrac{\tfrac{2}{3} n^3 + \tfrac{3}{2} n^2}{t_h \times 10^9}$
$t_h$ = CPU time
Th. Peak = (# of CORES) × (CPU CLOCK SPEED) × (FPO ISSUE RATE)
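The two formulas above can be evaluated directly. The following is a minimal Python sketch, not the script used for these tests. The function names are illustrative, and the clock speed and issue rate in the usage lines are hypothetical per-core values, chosen only so that 15 cores give a 72 GFLOPS peak.

```python
def hpl_flop_count(n: int) -> float:
    """Floating-point operations counted for an n x n system: (2/3)n^3 + (3/2)n^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def gflops(n: int, seconds: float) -> float:
    """Sustained rate in GFLOPS for a run of matrix size n that took `seconds`."""
    return hpl_flop_count(n) / (seconds * 1e9)


def theoretical_peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: float) -> float:
    """Th. Peak = cores x clock speed x floating-point issue rate."""
    return cores * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    # Hypothetical per-core figures (2.4 GHz, 2 FPO per cycle), not taken from the slides.
    print(theoretical_peak_gflops(15, 2.4, 2))
    # Rate for a matrix size and a run time supplied by the user.
    print(gflops(4000, 16.0))
```

The rate reported by HPL itself depends on which time it measures, so a value computed this way from CPU time need not coincide with the benchmark's own GFLOPS figure.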
LSF SUBMISSION
Linux (bw305): 15 CORES,
Th. Peak = 72 GFLOPS
Test completed
AIX (sp4-2): 32 CORES,
Th. Peak ≈ 96 GFLOPS
AIX (sp4-3-4): 32 CORES
Th. Peak ≈ 122 GFLOPS
Test did not complete!!!
HPL Results (2)
Expected CPU time $t_h$ = # FPO / (# of CORES × CPU CLOCK SPEED × FPO ISSUE RATE)
USED: ATLAS Version 3.6
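The expected CPU time and the memory share of the matrix follow from simple arithmetic. The sketch below assumes 8-byte double-precision entries (the 8n² rule used in these slides) and takes the total memory in bytes as a parameter; the function names are illustrative, not part of HPL.

```python
def expected_cpu_time(n: int, peak_gflops: float) -> float:
    """Expected time t_h in seconds: # FPO divided by the theoretical peak rate."""
    flop_count = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flop_count / (peak_gflops * 1e9)


def matrix_memory_fraction(n: int, total_bytes: float) -> float:
    """Fraction of total memory taken by an n x n matrix of 8-byte doubles."""
    return 8.0 * n**2 / total_bytes


if __name__ == "__main__":
    # 12 GB is taken here as 12e9 bytes; a binary definition of GB shifts the result slightly.
    print(matrix_memory_fraction(12000, 12e9))
    print(expected_cpu_time(12000, 72.0))
```

Comparing the obtained CPU time with `expected_cpu_time` gives a quick indication of how far a run is from the theoretical peak.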
Linux (bw305), P × Q = 3 × 5 CORES (LSF SUBMISSION) → HPL COMPLETED

n (matrix size)   bytes (B) = 8n^2   % MEMORY (TOTAL = 12 GB)   obtained CPU time (sec)   expected CPU time $t_h$ (sec)   GFLOPS
4000              32 × 10^3 B        ≈ 1.2 %                    ≈ 16.0                    0.5                             3.486
8000              64 × 10^3 B        ≈ 4.0 %                    ≈ 74.0                    ≈ 4.2                           6.806
12000             96 × 10^3 B        ≈ 9.0 %                    ≈ 331.3                   ≈ 14.5                          7.237
HIGH USER WAIT TIME (NOT CPU TIME) → MAYBE DUE TO THE NETWORK
INTERCONNECTS (PUBLIC WHEN THE TEST WAS DONE)
A POINT-TO-POINT COMMUNICATION TEST USING MPI
HPL POINT-TO-POINT COMMUNICATION BETWEEN PROCESSORS IS BASED
ON MPI ROUTINES (MPI_Send, MPI_Recv)
HPL Results (3)
PROBLEMS FOR AIX (LSF SUBMISSION) → HPL MAKES THE
MACHINES HANG; THE TEST DOES NOT COMPLETE
EVEN IF MEMORY USAGE < 10%
ONLY A FEW CPU SECONDS OVER DAYS OF RUNNING TIME!!! →
UNDER INVESTIGATION
INTERACTIVE SUBMISSIONS
AIX (sp4-1), 4 × 8 = 32 CORES → HPL COMPLETED

n (matrix size)   bytes (B) = 8n^2   % MEMORY (TOTAL = 25.6 GB)   obtained CPU time (sec)   expected CPU time $t_h$ (sec)   GFLOPS
18000             144 × 10^3 B       ≈ 10 %                       ≈ 266.5                   52.0                            16.25
AIX (sp4-2), 4 × 8 = 32 CORES, 20% TOTAL (32 GB) MEMORY → HPL NOT COMPLETED
AIX (ostro), 4 × 4 = 16 CORES, 20% TOTAL (16 GB) MEMORY → HPL NOT COMPLETED
NETWORK MONITORING (coll. G. Guarnieri)
A TOOL HAS BEEN PROVIDED IN ORDER TO DETECT WHETHER THE
COMMUNICATION SPEED BETWEEN TWO HOSTS (CLIENT AND SERVER)
OF THE ENEA GRID CHANGES OVER TIME
THE TEST MEASURES THE ROUND-TRIP TIME NEEDED TO SEND A SMALL
PACKET (10, 100, 1000 BYTES) OF DATA AND RECEIVE IT BACK
SMALL PACKETS: NOT FRAGMENTED (NO SPURIOUS DELAY EFFECTS); FAST
FLUCTUATIONS ARE NOT HIDDEN BY THE INTEGRATED AVERAGE TIME
SPENT WAITING FOR LARGE PACKETS
60 PACKETS SENT IN SEQUENCE EACH SECOND
TCP/IP PROTOCOL
[Diagram: client/server packet exchange; labels: start, client, server, stop]
BOTH CLIENT AND SERVER BLOCK UNTIL
THE FULL PACKET IS SENT/RECEIVED:
NO LOSS OF DATA
www.afs.enea.it/funel
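The measurement described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the tool referenced by the slides: the function names are hypothetical, the echo server runs in a thread on localhost instead of on a second grid host, and `TCP_NODELAY` is an added choice so that small packets are sent immediately.

```python
import socket
import threading
import time
from typing import List


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Block until exactly `size` bytes have arrived (TCP may deliver them in pieces)."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(listener: socket.socket, size: int, count: int) -> None:
    """Accept one client and echo back `count` packets of `size` bytes each."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(count):
            conn.sendall(recv_exact(conn, size))


def measure_rtts(host: str, port: int, size: int, count: int) -> List[float]:
    """Return `count` round-trip times in seconds for packets of `size` bytes."""
    payload = b"x" * size
    rtts = []
    with socket.create_connection((host, port)) as sock:
        # Assumption: disable Nagle's algorithm so small packets are not delayed.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(payload)            # blocks until the full packet is sent
            echoed = recv_exact(sock, size)  # blocks until the full packet is back
            rtts.append(time.perf_counter() - start)
            if echoed != payload:
                raise ValueError("echoed packet differs from the one sent")
    return rtts


def run_local_demo(size: int = 100, count: int = 60) -> List[float]:
    """Run server and client on localhost; a real test would use two separate hosts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener, size, count))
    server.start()
    try:
        return measure_rtts("127.0.0.1", port, size, count)
    finally:
        server.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    for packet_size in (10, 100, 1000):
        times = run_local_demo(size=packet_size)
        print(packet_size, max(times), sum(times) / len(times))
```

Recording each round trip separately, instead of only an average, is what allows short spikes in the communication delay to show up.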
NETWORK MONITORING (2)
Client: eurofel00 Server: bw305-2
Client: kleos Server: feronix0
HIGH SPIKES CLEARLY DETECTED → OVERALL COMMUNICATION DELAY
Conclusions
HPL BENCHMARK TEST:
Linux (LSF) → THE TEST COMPLETES, HOWEVER:
1. OBTAINED CPU TIME >> EXPECTED CPU TIME → (PEAK)exp < (PEAK)th
2. TOO MUCH (USER) TIME TO COMPLETE
AIX (LSF) → THE TEST DOES NOT COMPLETE:
ONLY A FEW CPU SECONDS OVER DAYS OF RUNNING TIME!!!!
AIX (INTERACTIVE SUBMISSION):
ONLY sp4-1 (32 CORES, 10% TOTAL MEMORY) TESTED →
TEST COMPLETED BUT STILL (CPU TIME) >> (EXPECTED CPU TIME)
USER WAIT TIME ≈ 35 minutes
NETWORK MONITORING:
A TOOL HAS BEEN PROVIDED TO DETECT VARIATIONS IN THE
COMMUNICATION SPEED BETWEEN TWO HOSTS OF THE ENEA GRID
USEFUL FOR IMPROVING THE OVERALL NETWORK EFFICIENCY