Dept_present3

Communication Performance
Measurement and Analysis on
Commodity Clusters
Research Proposal
Name
Nor Asilah Wati Abdul Hamid
Supervisor
Dr. Paul Coddington
Dr. Francis Vaughan
Table of Contents
Introduction
Message-Passing Multicomputers.
Previous Research to Improve Communication Over
Ethernet.
Communication Performance Measurement.
Previous Benchmark Software
Performance Analysis for MPIBench.
Motivation
Methodology
Value of the Research.
Introduction
The proposed research is in parallel computing and focuses
on message-passing parallel computers.
It will study communication benchmark software and
performance measurement and analysis for
message-passing parallel computers.
It aims to give a clearer understanding of
communication performance problems and how they can
be addressed, particularly for commodity clusters using
Linux PCs and Ethernet networks.
Message-Passing Parallel Computers
There are various types of message-passing parallel computers, from
the high end to the low end.
Beowulf clusters are high-performance computers built from off-the-shelf
commodity components: PCs running Linux, connected by a Fast Ethernet network.
However, some clusters use high-end Unix workstations (such as
Compaq Alpha or Sun UltraSPARC machines) and/or high-end gigabit
networks (such as Myrinet or QsNet).
[Pictured: Hydra, APAC NF]
Message-Passing Parallel Computers
The low-end commodity cluster consists of PCs running Linux,
connected by a Fast Ethernet network, e.g. Perseus.
Such clusters use MPI message-passing libraries, e.g. MPICH or LAM MPI.
MPI – a standard library specification for message passing.
MPICH – a freely available implementation of MPI.
The proposed research focuses mainly on low-end commodity clusters.
Perseus
Message-Passing Parallel Computers
Beowulf clusters have become very popular over the past couple of
years, due to rapid improvements in the performance of commodity
processors and networking infrastructure, and the development of
Linux for PCs.
For most applications, Beowulf clusters offer much better
price/performance than standard supercomputers.
Beowulf clusters commonly use an Ethernet network with TCP/IP for
communication and MPICH as the MPI library.
Ethernet is much cheaper than high-speed networks.
However, Ethernet clusters suffer from several inadequacies
caused by TCP/IP and the MPI implementation.
Network Cost Comparison
(Clustervision.com)
Interconnect         Bandwidth (Mbytes/s)   Latency (µs)   Cost/port (Euro)
QsNet (Quadrics)     360                    5              4770
Myrinet (Myricom)    245                    10             2050
Gigabit Ethernet     90                     100            200
Megabit Ethernet     12                     100            28
Infiniband: 13 - 17; 2000; 560 - 610
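To make the bandwidth and latency figures easier to compare, the sketch below estimates a one-way transfer time with a simple linear model (time = latency + size / bandwidth). The linear model and the function name `transfer_time_us` are assumptions made for illustration. Only the bandwidth and latency values for QsNet, Myrinet and Gigabit Ethernet are taken from the comparison above.

```python
# Simple linear cost model: time = latency + size / bandwidth.
# The bandwidth and latency values are the ones quoted in the comparison above;
# the linear model itself is an illustrative assumption.

INTERCONNECTS = {
    # name: (bandwidth in Mbytes/s, latency in microseconds)
    "QsNet": (360.0, 5.0),
    "Myrinet": (245.0, 10.0),
    "Gigabit Ethernet": (90.0, 100.0),
}


def transfer_time_us(message_bytes, bandwidth_mbytes_s, latency_us):
    """Estimated one-way transfer time in microseconds."""
    # 1 Mbyte/s equals 1 byte per microsecond (taking 1 Mbyte = 10**6 bytes).
    return latency_us + message_bytes / bandwidth_mbytes_s


for name, (bandwidth, latency) in INTERCONNECTS.items():
    for size in (64, 1_000_000):
        t = transfer_time_us(size, bandwidth, latency)
        print(f"{name:17s} {size:>9d} bytes: {t:10.1f} us")
```

Under this model, small messages are dominated by latency and large messages by bandwidth, which is why latency matters so much for fine-grained parallel programs.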
Ethernet Problems
TCP/IP was designed for Internet use, so there
are several problems in using it for parallel computing.
Examples: its mechanisms for packet loss and congestion
control, timeouts, etc.
Problems in MPI implementations occur because:
TCP/IP detects errors and lost data, and retransmits until
the data is received correctly,
BUT
MPI implementations assume a network with reliable data transfer.

Much research has tried to improve the performance of
TCP/IP, but it has mostly focussed on optimizing performance
for the Internet and local-area networks.
Previous Research to Improve
Communication Over Ethernet
Active Messages – aims to reduce communication overhead and
allow communication and computation to overlap.
GAMMA – an extension to the communication layer for Linux on clusters of
PCs.
BIP – Basic Interface for Parallelism, a network communication
interface for message-passing parallel computing.
VIA – a standard communication infrastructure for System Area
Networks (SANs) that provides protected, zero-copy, user-space interprocess communication.
MVICH – an MPICH-based implementation of MPI for the Virtual
Interface Architecture (VIA).
Protocol Comparison
(Ping-Pong Application)
Platform                    Latency (µs)   Bandwidth (Mbytes/s)
BIP – Myrinet               5.0            108.0
TCP – Myrinet               103.0          42.0
GAMMA – Gigabit Ethernet    9.6            90.0
TCP – Gigabit Ethernet      103.0          62.0
GAMMA – Fast Ethernet       12.7           12.2
VIA – Fast Ethernet         27.0           -
TCP – Fast Ethernet         105.0          10.0
Previous Research to Improve
Communication Over Ethernet
Previous research has focused mainly on designing new protocols to
replace TCP/IP.
However, a new protocol requires new software (e.g. drivers) for all
Ethernet hardware.
The MPI implementation also needs to be ported to the new protocol, e.g. MVICH.
TCP/IP and MPICH are widely used in existing Beowulf clusters, so a
more flexible TCP/IP and a better MPICH would be preferable to a new
protocol.
The work of Pope et al. is an example of research aiming to design a
more flexible TCP/IP using a compliant systems approach.
They argue for the separation of policy and mechanism,
and examine which policies are suitable for TCP/IP stacks
depending on the type of communication in use.
Communication Performance Measurement
Why is communication performance measurement important? Examples:
To improve the performance of the machine and the MPI
implementation.
As input to performance modeling tools for parallel
programs.
To compare the performance of machines, in order to find the
fastest machine.
Benchmark software, e.g. SKaMPI, MPBench, Mpptest,
Pallas MPI Benchmark, and the recently developed
MPIBench.
Previous Benchmark Software
SKaMPI, MPBench, Pallas MPI Benchmark, Mpptest.
Existing benchmark software has several weaknesses that can
make time measurements inaccurate:
They use relatively coarse-grained clocks for timing,
which forces a benchmark to average results over a large number of
test repetitions.
They rely on MPI_Wtime for timing and use ping-pong tests that measure the
total round-trip time, not the time of a single communication.
None of the communication patterns used in existing benchmarks
takes clusters of SMP nodes into account.
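The effect of averaging ping-pong round trips can be illustrated with a small sketch. It does not call MPI. The round-trip times are hypothetical numbers chosen only to show how one delayed repetition is hidden inside an average, and `pingpong_estimate` is an illustrative name, not part of any benchmark.

```python
from statistics import mean


def pingpong_estimate(round_trip_times):
    """One-way time as a ping-pong test reports it: half the mean round trip."""
    return mean(round_trip_times) / 2.0


# Hypothetical round-trip times in microseconds: 99 ordinary repetitions
# and one repetition delayed by a long timeout.
round_trips = [210.0] * 99 + [200_210.0]

estimate = pingpong_estimate(round_trips)
typical = 210.0 / 2.0
print(f"averaged one-way estimate: {estimate:.1f} us")
print(f"typical one-way time:      {typical:.1f} us")
```

The averaged figure is far from the typical time, yet it gives no indication that a single repetition caused the difference. Timing individual communications avoids this loss of information.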
MPIBench
MPIBench has been developed by Duncan Grove as part of
his PhD research.
The extra functionality in MPIBench :
Topology-aware, specifically designed to ensure
meaningful results on clusters of SMP nodes.
Uses an accurate globally synchronized clock to measure
the performance of all the processes involved.
Can measure times of single communications - not just
averages.
Can generate histograms (distributions) of communication
times.
The proposed research will use MPIBench for
performance measurement and will also improve MPIBench.
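As an illustration of what a distribution of communication times looks like, the sketch below bins individual timings into a text histogram. It is not MPIBench code. The sample times, the bin width and the function name `histogram` are all hypothetical.

```python
from collections import Counter


def histogram(times_us, bin_width_us):
    """Count samples per bin; keys are the lower edge of each bin."""
    counts = Counter()
    for t in times_us:
        lower_edge = int(t // bin_width_us) * bin_width_us
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical single-message times in microseconds.
samples = [101.0, 103.5, 104.0, 108.2, 112.9, 118.0, 240.0]

for edge, count in histogram(samples, 10).items():
    print(f"{edge:5d}-{edge + 10:<5d} {'#' * count}")
```

A histogram like this shows both the main peak and any long tail, which a single average cannot.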
Performance Analysis with MPIBench
Comparison of communication performance of different networks.
Beowulf-type cluster of PCs connected by Fast Ethernet (Perseus
and Bunyip).
Perseus vs Bunyip – to analyse effects of different
communication topology.
Sun Technical Compute Farm connected with Myrinet (Orion).
Compaq AlphaServer SC connected with QsNet (APAC NF).
Performance Analysis with MPIBench
The performance analysis with MPIBench revealed several inadequacies,
for example:
Problems caused by TCP/IP timeouts and congestion control.
Problems with MPI implementations.
Problems caused by network congestion.
Distributions with long tails, including ‘outliers’ with very long
communication times, due to:
Spurious interference from unrelated operating system services.
Cluster management system daemons.
Outlier - an extreme point that is much longer than the average value of the
distribution.
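One way to turn the outlier definition into a measurable quantity is sketched below. The threshold (ten times the mean), the function names and the sample timings are assumptions for illustration, not the criterion used by MPIBench.

```python
from statistics import mean


def find_outliers(times_us, factor=10.0):
    """Return the samples that exceed `factor` times the mean of all samples."""
    threshold = factor * mean(times_us)
    return [t for t in times_us if t > threshold]


def percent_processes_with_outliers(times_by_process, factor=10.0):
    """Percentage of processes that recorded at least one outlier."""
    all_times = [t for times in times_by_process.values() for t in times]
    threshold = factor * mean(all_times)
    affected = sum(
        1
        for times in times_by_process.values()
        if any(t > threshold for t in times)
    )
    return 100.0 * affected / len(times_by_process)


# Hypothetical per-process times in microseconds for four processes.
measurements = {
    0: [100.0] * 50,
    1: [100.0] * 49 + [200_000.0],
    2: [100.0] * 50,
    3: [100.0] * 50,
}

print(percent_processes_with_outliers(measurements), "% of processes affected")
```

Because the mean is itself pulled upward by outliers, a median-based threshold may be more robust in practice. The mean is used here only because the definition above refers to the average.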
Perseus: Average time for MPI_Bcast
Perseus: Percentage of processes experiencing outliers during MPI_Bcast
Distribution of times for MPI_Bcast
Perseus: Average times for MPI_Alltoall
Perseus: Percentage of processes experiencing outliers during MPI_Alltoall
Motivation 1
MPIBench is a new communication benchmark with capabilities
that existing benchmark software lacks.
HOWEVER, there has been no detailed comparison of
MPIBench with the existing MPI benchmarks. Such a comparison
is also needed to identify any inadequacies in MPIBench
so that it can be improved.
Research Aims
1.
To compare MPIBench with other existing benchmark software.
The comparison will also test the scalability, functionality and usability
of MPIBench relative to the existing software.
2.
To improve and modify MPIBench based on the
comparison results.
Methodology
1.
Comparison of different benchmark software for message-passing
parallel computers.
The comparison is divided into a theoretical and an experimental
part.
The theoretical part will involve a study of conference and
journal papers and the documentation of the benchmark software.
The experimental part will involve installing the benchmark software
on the Hydra cluster and testing its functionality.
Then a standard set of test parameters, such as data size, MPI
routine and number of iterations, will be identified to standardize the
experiments. All data obtained from the experiments will be recorded
and compared.
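A standardized procedure of this kind could be written down as an explicit test plan, so that every benchmark is run with the same parameters. The sketch below is one possible layout. The message sizes, the routine list and the repetition count are hypothetical placeholders, not values chosen in the proposal.

```python
from itertools import product

# All values below are hypothetical placeholders for illustration.
BENCHMARKS = ["MPIBench", "SKaMPI", "MPBench", "Mpptest", "Pallas MPI Benchmark"]
MPI_ROUTINES = ["MPI_Send", "MPI_Bcast", "MPI_Alltoall"]
MESSAGE_SIZES_BYTES = [0, 1024, 65536]
REPETITIONS = 1000


def build_test_plan(benchmarks, routines, sizes, repetitions):
    """Return one run description per (benchmark, routine, size) combination."""
    return [
        {
            "benchmark": benchmark,
            "routine": routine,
            "size_bytes": size,
            "repetitions": repetitions,
        }
        for benchmark, routine, size in product(benchmarks, routines, sizes)
    ]


plan = build_test_plan(BENCHMARKS, MPI_ROUTINES, MESSAGE_SIZES_BYTES, REPETITIONS)
print(len(plan), "runs planned")
print(plan[0])
```

Keeping the plan as data makes it straightforward to repeat exactly the same runs after MPIBench has been modified.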
Methodology
2.
Improvement to MPIBench
The second stage requires a detailed understanding
of the MPIBench code.
The required changes will then be identified and
made to the code.
Testing MPIBench after the changes is crucial:
the tests used in the first stage should be repeated
to ensure the correctness of the program.
Motivation 2
Previously, Grove used MPIBench to compare two clusters, Perseus and Bunyip,
which have similar commodity components but different
topologies.
HOWEVER, no experimental work has yet been done with
MPIBench on a machine with the same components and
topology that differs only in network type.
Research Aims
3. To analyze the performance of Myrinet and Ethernet networks on a
large Linux PC cluster (Hydra). The results will be
analyzed and may provide ideas on how to improve communication
performance for Ethernet networks in Beowulf clusters.
Methodology
3.
Performance Analysis and Investigation of Communication
Performance on Different Networks.
Design a method for selecting whether a program runs over the Ethernet
or the Myrinet network.
A set of procedures and parameters is required to standardize the experiments,
for example the number of iterations, MPI routine, number of processors and
data size.
The performance analysis results will be recorded and analysed,
and then used to investigate the problems in the Ethernet network.
The investigation will involve study, analysis and discussion of the
comparison of communication performance on Myrinet and
Ethernet networks.
This stage is expected to yield insight into the problems that occur in
the Ethernet network, particularly in TCP/IP and the MPI implementation.
Motivation 3
Previously, there have been several research efforts to overcome
communication performance problems on Ethernet networks in Beowulf clusters.
However, previous research has focused more on designing new protocols. A new
protocol requires new software (e.g. drivers) for all Ethernet hardware,
and the MPI implementation must also be ported to the new protocol.
It would be more valuable if the problems of TCP/IP and MPICH themselves could
be fixed.
Research Aims
4. To propose or develop solutions to communication problems in Beowulf
clusters using Ethernet network, particularly for TCP/IP and MPI
implementation.
Methodology
4.
Propose or Develop Solutions for the Ethernet Network
Problems in Beowulf Cluster Computers.
This will involve study, analysis, comparison of results and experiments.
Based on the study done so far, several
problems are expected in TCP/IP, for example packet loss
and congestion.
Possible suggestions for TCP/IP: decrease the timeout
or improve the algorithm of the resend mechanism.
Problems that occur in MPICH include poor performance and
an unusual distribution of times for MPI_Alltoall.
Suggest or develop optimised code for some MPI routines that is
suitable for TCP/IP and Ethernet networks.
Re-run the experiments to test changes to the MPICH code or TCP, in order
to check for performance improvement.
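To see why the timeout value matters, the sketch below adds up the time a sender spends waiting when the same packet is lost several times in a row. It assumes the usual TCP behaviour of doubling the retransmission timeout after each loss. The initial timeout values and the function name `retransmission_delay` are hypothetical and not taken from any particular TCP stack.

```python
def retransmission_delay(initial_timeout_s, losses, backoff=2.0):
    """Total waiting time when one packet is lost `losses` times in a row."""
    delay = 0.0
    timeout = initial_timeout_s
    for _ in range(losses):
        delay += timeout    # wait for the current timeout to expire
        timeout *= backoff  # back off before the next retransmission
    return delay


# Hypothetical initial timeouts in seconds.
for initial in (0.2, 0.02):
    waited = retransmission_delay(initial, losses=3)
    print(f"initial timeout {initial:.2f} s -> {waited:.2f} s waiting after 3 losses")
```

Under these assumptions the waiting time scales directly with the initial timeout, so on a low-latency cluster network even one loss can cost far more than the message transfer itself.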
Motivation 4
Previously, Grove used MPIBench to benchmark several machines.
In his analysis he recorded “outlier” results showing very long
communication times.
The main causes of ‘outliers’ are:
Spurious interference from unrelated operating system services.
Cluster management system daemons.
However, there has been no further work to investigate solutions to these
problems.
Research Aims
5. To find solutions for loss of performance in Beowulf clusters with Linux
PCs.
6. Possibly develop a customized installation of Linux.
Methodology
5. Investigation of the Outliers Problem.
Repeat the experiments previously run with MPIBench on Perseus.
Based on the expected main causes of the outliers, the experiments will involve:
Removing operating system and cluster management system
processes.
Reducing the frequency of interference from process
execution.
Try to identify the cause of the outliers and propose solutions.
Value of the Research
This proposed research will provide:
1.
An improvement to MPIBench, which can be used
to analyze communication networks and MPI
implementations.
2.
Results that can be used in future studies of
PEVPM, a new performance modelling technique.
3.
An improvement in communication performance
for Beowulf clusters using Ethernet networks, which
can provide a solution for cheap high-performance
computing.
END.
