Lecture1101 - Department of Computer Science


Metrics
FLOPS (FLoating-point Operations Per Second): a measure of the numerical processing speed of a CPU, which can be an indicator of its scientific computing capability.

The floating-point format is a variation of scientific notation: the real number is represented using a mantissa, a base and an exponent.
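A minimal Python sketch of the mantissa/base/exponent split for base 2. The standard `math.frexp` function returns a mantissa normalised to 0.5 <= |m| < 1, so the hypothetical helper `normalised_base2` rescales it to the 1.xxx form used in these notes; it assumes a non-zero, finite input.

```python
import math

def normalised_base2(x: float) -> tuple[float, int]:
    """Hypothetical helper: return (m, e) with x == m * 2**e and 1 <= |m| < 2.

    Assumes x is non-zero and finite.
    """
    m, e = math.frexp(x)  # frexp normalises the mantissa to 0.5 <= |m| < 1
    return m * 2, e - 1   # rescale to the 1.xxx convention used in these notes

print(normalised_base2(6.5))     # (1.625, 2)   since 6.5 = 1.625 x 2^2
print(normalised_base2(-0.375))  # (-1.5, -2)   since -0.375 = -1.5 x 2^-2
```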

Storing real numbers in computers:
• Use a fixed-length word as the storage space for a real number (e.g. 64 bits).
• The mantissa is normalised (1.61 is normalised, 16.1 is not).
• The mantissa and exponent are converted to base-2.
• Some parts of the word are used to store the mantissa, 1 bit to store the sign, and the rest to store the exponent.
Advantages and disadvantages:
• A fixed-length space can store a wide overall range of values.
• If 64 bits are used to store real numbers, with 11 bits for the exponent, 52 bits for the mantissa and the remaining 1 bit for the sign, we can derive the range of numbers this storage layout can represent.
• More bits for the mantissa give higher precision but a smaller range.
• More bits for the exponent give a wider range but lower precision.
• The difference between two successive representable numbers is not uniform.
• When numbers cannot be converted exactly to base-2, they must be rounded to be stored in the format, which leads to cases where algebraic rules do not appear to apply.
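A Python sketch (Python 3.9 or later) of the 64-bit points above. It relies on the fact that a Python `float` is an IEEE 754 double with exactly this 1/11/52 split; the exponent bias of 1023 and the reserved all-zeros/all-ones exponent patterns come from the IEEE 754 standard rather than from these notes, and `fields` is a hypothetical helper name. The sketch derives the largest and smallest positive normalised values, shows that the gap between neighbouring values grows with magnitude, and shows rounding breaking exact equality and associativity.

```python
import math
import struct
import sys

# Field widths from the 64-bit layout above (they match IEEE 754 double precision).
EXP_BITS = 11
MANT_BITS = 52
# Assumption from IEEE 754, not stated in the notes: the exponent is stored with a bias.
BIAS = 2 ** (EXP_BITS - 1) - 1  # 1023

def fields(x: float) -> tuple[int, int, int]:
    """Hypothetical helper: split a float into (sign, biased exponent, mantissa) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> MANT_BITS) & ((1 << EXP_BITS) - 1), bits & ((1 << MANT_BITS) - 1)

# Exponent fields 0 and 2047 are reserved (zero/subnormals and infinity/NaN), so
# normalised numbers have exponents from 1 - BIAS to 2046 - BIAS, i.e. -1022 .. 1023.
largest = (2 - 2.0 ** -MANT_BITS) * 2.0 ** (2046 - BIAS)  # 1.11...1 (52 ones) x 2^1023
smallest = 2.0 ** (1 - BIAS)                             # 1.00...0 x 2^-1022
print(largest, smallest)  # 1.7976931348623157e+308 2.2250738585072014e-308
print(largest == sys.float_info.max, smallest == sys.float_info.min)  # True True

# Non-uniform spacing: the gap to the next representable number grows with magnitude.
gaps = [math.ulp(x) for x in (1.0, 1024.0, 1e16)]
print(gaps)  # [2.220446049250313e-16, 2.2737367544323206e-13, 2.0]

# Rounding: 0.1, 0.2 and 0.3 are not exact in base 2, so algebraic rules appear to fail.
print(0.1 + 0.2 == 0.3)                        # False
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False: addition is not associative here
print(math.isclose(0.1 + 0.2, 0.3))            # True: compare with a tolerance instead
```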
The LINPACK benchmark produces a FLOPS result: it solves a dense system of linear equations by Gaussian elimination.
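A small pure-Python sketch of what such a benchmark does: solve Ax = b by Gaussian elimination (with partial pivoting, added here for numerical stability) and turn the elapsed time into a FLOPS figure. It assumes the nominal operation count 2/3 n^3 + 2 n^2 for an n x n system; the helper names and the random test matrix are illustrative only, and pure Python reports far lower rates than an optimised LINPACK build.

```python
import random
import time

def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve Ax = b by Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |a[i][k]| into position k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_flops(n: int) -> float:
    """Assumed nominal operation count for an n x n solve: 2/3 n^3 + 2 n^2."""
    return 2 * n ** 3 / 3 + 2 * n ** 2

# Random, diagonally dominant (hence non-singular) test system whose exact solution is all ones.
rng = random.Random(0)
n = 100
A = [[rng.random() + (n if i == j else 0) for j in range(n)] for i in range(n)]
b = [sum(row) for row in A]  # A @ [1, 1, ..., 1]

start = time.perf_counter()
x = solve(A, b)
elapsed = time.perf_counter() - start

print(f"n = {n}: {elapsed:.4f} s, {linpack_flops(n) / elapsed / 1e6:.1f} MFLOPS")
print("max error:", max(abs(xi - 1.0) for xi in x))
```

Only the leading 2/3 n^3 term matters for large n, which is why larger problem sizes give a more representative rate.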
Computer Science, University of Warwick
Example of Floating Point Numbers
172.625 (base 10)
= 10101100.101 × 2^0 (base 2)
= 1.0101100101 × 2^7 (base 2, normalised)
Using 32 bits (4 bytes) to store the number in computers, with 1 bit for the sign, 8 bits for the exponent and the rest (23 bits) for the mantissa:
S  Exp       Mantissa
0  00000111  00000000000010101100101
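The layout above stores the exponent 7 directly and keeps the whole normalised mantissa, including its leading 1, right-aligned. Hardware following IEEE 754 single precision uses the same 1/8/23 split but stores the exponent with a bias of 127 and omits the leading 1 (it is implicit), filling the mantissa from the left. A short Python check with the standard `struct` module shows the bits actually stored for 172.625; `float32_fields` is a hypothetical helper name.

```python
import struct

def float32_fields(x: float) -> tuple[str, str, str]:
    """Hypothetical helper: sign, exponent and mantissa bit strings of x stored as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = f"{bits:032b}"
    return s[0], s[1:9], s[9:]

sign, exponent, mantissa = float32_fields(172.625)
print(sign, exponent, mantissa)  # 0 10000110 01011001010000000000000
print(int(exponent, 2) - 127)    # 7: the stored exponent is biased by 127
print(hex(struct.unpack(">I", struct.pack(">f", 172.625))[0]))  # 0x432ca000
```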
Metrics
MIPS (Millions of Instructions Per Second): a measure of the speed of a processor.
• Peak MIPS rates (usually vendor-supplied) can be misrepresentative.
• Jokingly expanded as "Meaningless Information on Performance for Salespeople".
• People seldom refer to it.
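The MIPS rating is the instruction count divided by (execution time in seconds × 10^6). The figures below are hypothetical, but they show why MIPS can mislead: the same program may need very different numbers of instructions on different instruction sets.

```python
def mips(instructions: int, seconds: float) -> float:
    """MIPS = instruction count / (execution time in seconds x 10^6)."""
    return instructions / (seconds * 1e6)

# Hypothetical figures: the same program compiled for two different instruction sets.
machine_a = mips(10_000_000_000, 5.0)  # many simple instructions
machine_b = mips(4_000_000_000, 4.0)   # fewer, more powerful instructions
print(machine_a, machine_b)  # 2000.0 1000.0
# Machine A has twice the MIPS rating, yet machine B finishes the program sooner.
```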
Metrics
SPECint: measures a processor's integer processing capabilities.
• Latest version: SPECint2006
• Can test the CPU, memory and compiler, but cannot test networking or I/O
• Consists of a series of benchmarks (12, including compression and compilation)
• Each benchmark has a reference time
• Dividing the reference time of the benchmark by its measured runtime and multiplying by 100 gives a base ratio.
For example, suppose we test a system with the benchmark 401.bzip2, whose reference time is 1400 sec, and the measured runtime is 140 sec. The base ratio is then 1400/140 × 100 = 1000.
• These ratios are averaged (SPEC uses the geometric mean) to produce a final performance figure for the processor.
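The ratio calculation from the worked example can be sketched in Python as below. SPEC combines the per-benchmark ratios with a geometric mean; the second benchmark's reference time and runtime are hypothetical values chosen for illustration.

```python
import math

def spec_ratio(reference_time: float, runtime: float) -> float:
    """Base ratio as in the worked example: reference time / measured runtime x 100."""
    return reference_time / runtime * 100

def geometric_mean(values: list[float]) -> float:
    """Geometric mean, computed through logarithms to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# 401.bzip2 uses the figures from the example; the second entry is hypothetical.
ratios = [spec_ratio(1400, 140), spec_ratio(1000, 400)]
print(ratios)                  # [1000.0, 250.0]
print(geometric_mean(ratios))  # approximately 500
```

With a geometric mean, a given speed-up on any one benchmark changes the overall figure by the same factor, regardless of that benchmark's reference time.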
SPECint2006 benchmark suite
Benchmark         Language  Category
400.perlbench     C         Programming Language
401.bzip2         C         Compression
403.gcc           C         C Compiler
429.mcf           C         Combinatorial Optimization
445.gobmk         C         Artificial Intelligence
456.hmmer         C         Search Gene Sequence
458.sjeng         C         Artificial Intelligence
462.libquantum    C         Physics / Quantum Computing
464.h264ref       C         Video Compression
471.omnetpp       C++       Discrete Event Simulation
473.astar         C++       Path-finding Algorithms
483.xalancbmk     C++       XML Processing
Metrics
Communication:
• Bandwidth (bytes/sec)
  • How much data can be sent per second over the network
• Latency (seconds)
  • The time between one processor sending a message and the other processor receiving it
• Interconnection type: on-board interconnection or over networks
• Topologies: bus, crossbar, hub, switch
• Protocols: stacks
• Unicast, multicast, broadcast
Storage capabilities:
• Storage facilities: register, cache, memory, hard disk
• Bandwidth and latency:
  • Bandwidth: how much data can be accessed per second in a given storage facility
  • Latency: the time between sending a data access request and receiving the requested data
• Memory hierarchies (CPU register -> cache -> main memory -> remote memory)
• Local and remote file systems
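A common first-order model (not given on the slide) combines the two metrics: time = latency + size / bandwidth. It applies equally to a network message and to a request served by one level of the storage hierarchy. The latency and bandwidth values below are hypothetical; the output shows that small transfers are dominated by latency, so their effective bandwidth is far below the nominal figure.

```python
def transfer_time(n_bytes: int, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """First-order model: time = latency + size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s

# Hypothetical link or storage device: 5 microseconds latency, 1 GB/s bandwidth.
LATENCY_S = 5e-6
BANDWIDTH = 1e9

for size in (8, 1_000, 1_000_000):
    t = transfer_time(size, LATENCY_S, BANDWIDTH)
    effective = size / t / 1e6  # achieved MB/s, far below 1000 MB/s for small transfers
    print(f"{size:>9} bytes: {t * 1e6:9.3f} us, effective bandwidth {effective:8.2f} MB/s")
```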
Top500 Supercomputer list
Website: www.top500.org
• Top500 project started in 1993, updated twice a year
• Aims to track trends in HPC
• Uses LINPACK to measure performance (FLOPS)
  • Essentially, LINPACK solves a dense system of linear equations Ax = b (commonly encountered in engineering)
  • Users are allowed to change the problem size to obtain the maximum performance, which is used to rank the supercomputers
  • The theoretical peak performance is also given for reference
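Top500 entries report the best measured LINPACK result (Rmax) next to the theoretical peak (Rpeak). A rough sketch of the peak calculation and the resulting efficiency, assuming peak = processors × clock rate × floating-point operations per cycle; every machine figure below is hypothetical.

```python
def theoretical_peak(processors: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Peak FLOPS = processors x clock rate x floating-point operations per cycle."""
    return processors * clock_hz * flops_per_cycle

# Hypothetical machine: 1024 processors at 2 GHz, each completing 4 flops per cycle.
rpeak = theoretical_peak(1024, 2.0e9, 4)
rmax = 6.5536e12  # hypothetical best LINPACK result over the problem sizes tried
efficiency = rmax / rpeak
print(f"Rpeak = {rpeak / 1e12:.3f} TFLOPS, Rmax = {rmax / 1e12:.3f} TFLOPS, "
      f"efficiency = {efficiency:.0%}")
```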
Top500 Supercomputer list
• Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected.
• Does not consider storage or I/O issues.
• Both custom-designed machines and commodity machines win positions in the list.
• General trend towards commodity machines (COTS: Commodity Off-The-Shelf). BlueGene/L, however, is not a COTS machine.
• Connecting a large number of machines with relatively low performance is more rewarding than connecting a small number of machines each with high performance.
  • Read the paper: “A note on the Zipf distribution of Top500 supercomputers” (download from my homepage)
• Performance doubles each year, better than Moore's Law.
  • Moore's Law (as commonly quoted): performance doubles approximately every 18 months
• Dominated by the United States (location map of the Top100 machines: http://www.top500.org/lists/2006/11/top100map)
• UK supercomputers in the list:
  • Cambridge: No. 20 (http://www.top500.org/system/8267)
  • AWE: No. 15
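To see how quickly the two growth rates quoted for the list diverge, compound them over a few years. This is simple arithmetic on the stated doubling periods (12 months versus 18 months), not a model of any real machine.

```python
def growth(years: float, doubling_period_years: float) -> float:
    """Factor by which performance grows if it doubles every doubling_period_years."""
    return 2 ** (years / doubling_period_years)

for years in (3, 6, 9):
    top500 = growth(years, 1.0)  # doubling every year
    moore = growth(years, 1.5)   # doubling every 18 months
    print(f"{years} years: x{top500:.0f} vs x{moore:.0f}")
```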
Top Machine: BlueGene/L
• First supercomputer in the Blue Gene project
• Specialised system based on the Power architecture:
  • Individual PowerPC 440 processors at 700 MHz
  • Two processors reside in a single chip.
  • Two chips reside on a “compute card” with 512 MB memory.
  • 16 of these compute cards are placed on a node board.
  • 32 node boards fit into one cabinet, and there are 64 cabinets.
  • 131,072 CPUs with a theoretical peak of 183.5 TFLOPS
  • Multiple network topologies are available, which can be selected depending on the application.
• High density of processors in a small area:
  • Low-power and (comparatively) slow processors, just lots of them!
  • Fast, low-latency interconnects.
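Multiplying out the packaging hierarchy listed for BlueGene/L gives its total processor count. The variable names are illustrative; the factors are the ones on the slide.

```python
# Packaging hierarchy of BlueGene/L as listed on the slide.
processors_per_chip = 2
chips_per_card = 2
cards_per_node_board = 16
node_boards_per_cabinet = 32
cabinets = 64

total = (processors_per_chip * chips_per_card * cards_per_node_board
         * node_boards_per_cabinet * cabinets)
print(total)  # 131072
```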