Modern Computer Architecture


COMPUTER
ARCHITECTURE
(T120B125)
Assoc.Prof. Stasys Maciulevičius
Computer Dept.
Computer performance
Performance. What is it? How can it be evaluated
or measured?
Calculation time; it includes everything: CPU time,
memory and drives access time, input/output time, time
spent by operating system;
• CPU time includes:
- CPU time for user tasks;
- CPU time for system tasks.
• CPU time: $T_{CPU} = N_{CPU} \times \tau = N_{CPU} / F$, where:
- $N_{CPU}$ - number of clocks used by the executing program;
- $\tau$ - clock time;
- $F$ - clock rate ($\tau = 1/F$).
2012-2014
©S.Maciulevičius
Computer performance
Average clocks per instruction:
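$CPI = N_{CPU} / N$,
where CPI is the average number of clocks per instruction and
N is the number of instructions executed by the program.
Replacing $N_{CPU}$ by $N \times CPI$, we get:
$T_{CPU} = N \times CPI \times \tau$
or
$T_{CPU} = N \times CPI / F$.
When the instructions fall into groups with different clock counts:
$CPI = \sum_i (CPI_i \times N_i) / N = \sum_i (CPI_i \times N_i / N)$,
where $CPI_i$ is the number of clocks per instruction in group i and $N_i$ is the number of executed instructions in that group.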
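As a minimal sketch of the two formulas above, the following Python functions compute the average CPI of an instruction mix and the resulting CPU time. The function names, the sample instruction counts and the 1 GHz clock are illustrative assumptions, not part of the lecture material.

```python
def average_cpi(groups):
    """Weighted average CPI for (clocks_per_instruction, instruction_count) pairs."""
    total_instructions = sum(count for _, count in groups)
    total_clocks = sum(cpi * count for cpi, count in groups)
    return total_clocks / total_instructions


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """T_CPU = N x CPI / F, in seconds."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 600 one-clock, 300 two-clock and 100 four-clock instructions.
groups = [(1, 600), (2, 300), (4, 100)]
cpi = average_cpi(groups)
print(cpi)                                  # average clocks per instruction
print(cpu_time(1000, cpi, 1_000_000_000))   # seconds at an assumed 1 GHz clock
```

The weighted sum mirrors the grouped CPI formula: each group contributes its clock count in proportion to its share of the executed instructions.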
Gibson mix
The Gibson Mix was produced by J. Gibson of IBM for scientific applications. The variation used here has the following weighting:

Group of instructions        Rate
Fixed Point Add/Subtract     0.330
Fixed Point Multiply         0.006
Fixed Point Divide           0.002
Branch                       0.065
Compare                      0.040
Transfer (8 characters)      0.175
Shift                        0.046
Logical                      0.017
Modification                 0.190
Floating Point Add           0.073
Floating Point Multiply      0.040
Floating Point Divide        0.016
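One way to use such a mix is to weight per-group clock counts by the rates, which is the grouped CPI formula with $N_i / N$ replaced by the mix rate. The sketch below assumes hypothetical clock counts per group; only the rates come from the slide.

```python
# Rates from the Gibson mix table (they sum to 1.0).
GIBSON_MIX = {
    "Fixed Point Add/Subtract": 0.330,
    "Fixed Point Multiply": 0.006,
    "Fixed Point Divide": 0.002,
    "Branch": 0.065,
    "Compare": 0.040,
    "Transfer (8 characters)": 0.175,
    "Shift": 0.046,
    "Logical": 0.017,
    "Modification": 0.190,
    "Floating Point Add": 0.073,
    "Floating Point Multiply": 0.040,
    "Floating Point Divide": 0.016,
}


def mix_cpi(clocks_per_group, mix=GIBSON_MIX):
    """Average clocks per instruction for a machine under the given mix."""
    return sum(rate * clocks_per_group[group] for group, rate in mix.items())


# Hypothetical machine: every group takes 1 clock except the slow ones below.
hypothetical_clocks = {group: 1 for group in GIBSON_MIX}
hypothetical_clocks["Fixed Point Multiply"] = 4
hypothetical_clocks["Fixed Point Divide"] = 20
hypothetical_clocks["Floating Point Add"] = 3
hypothetical_clocks["Floating Point Multiply"] = 5
hypothetical_clocks["Floating Point Divide"] = 25

print(mix_cpi(hypothetical_clocks))
```

Because the rare divide instructions carry small weights, even a large clock count for them shifts the average only slightly.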
MIPS metrics
MIPS (millions of instructions per second):
$MIPS = N / (T \times 10^6) = F / (CPI \times 10^6)$.
This measure is easy to understand. However, it should be mentioned that:
• MIPS ratings depend heavily on the instruction set, so comparing computers with different instruction sets is problematic;
• effective MIPS ratings depend heavily on the programming language used;
• in some cases MIPS varies inversely with actual performance: a program can reach a higher MIPS rating and still take longer to run.
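The last point can be illustrated with a small sketch. The two program variants, their instruction counts, CPI values and the 500 MHz clock are hypothetical numbers chosen only to show the effect.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """T = N x CPI / F, in seconds."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, seconds):
    """MIPS = N / (T x 10^6)."""
    return instruction_count / (seconds * 1e6)


CLOCK_HZ = 500e6  # assumed 500 MHz clock

# Hypothetical variant A: many simple instructions (CPI = 1).
time_a = execution_time(9e6, 1.0, CLOCK_HZ)
# Hypothetical variant B: fewer, more complex instructions (CPI = 2).
time_b = execution_time(4e6, 2.0, CLOCK_HZ)

print("A:", mips(9e6, time_a), "MIPS,", time_a, "s")
print("B:", mips(4e6, time_b), "MIPS,", time_b, "s")
```

Variant A reports the higher MIPS rating, yet variant B finishes the same task sooner, which is why execution time rather than MIPS is the safer basis for comparison.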
MIPS metrics
MIPS speeds of some CPUs:

Processor                                 MIPS      Clock      MIPS/MHz   Year
AMD Athlon                                3,561     1.2 GHz    3.0        2000
Pentium 4 Extreme Edition                 9,726     3.2 GHz    3.0        2003
Intel Core 2 Extreme X6800                27,079    2.93 GHz   9.2        2006
Intel Core 2 Extreme QX9770 (Quad core)   59,455    3.2 GHz    18.6       2008
AMD Phenom II X4 940 Black Edition        42,820    3.0 GHz    14.3       2009
Intel Core i7 Extreme 3960X (Hex core)    177,730   3.33 GHz   53.4       2011
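The MIPS/MHz column is simply the MIPS rating divided by the clock rate in MHz. A short sketch that recomputes it from the MIPS and clock values on the slide (the abbreviated processor labels are chosen here for brevity):

```python
# (label, MIPS, clock in MHz) as listed on the slide
CPUS = [
    ("Athlon", 3561, 1200),
    ("Pentium 4 EE", 9726, 3200),
    ("X6800", 27079, 2930),
    ("QX9770", 59455, 3200),
    ("Phenom II X4 940", 42820, 3000),
    ("3960X", 177730, 3330),
]


def mips_per_mhz(mips, clock_mhz):
    """MIPS delivered per MHz of clock rate."""
    return mips / clock_mhz


for label, mips, clock_mhz in CPUS:
    print(f"{label}: {mips_per_mhz(mips, clock_mhz):.1f} MIPS/MHz")
```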
MFLOPS metrics
MFLOPS is an acronym for Millions of FLoating-point Operations Per Second:
$MFLOPS = N / (T \times 10^6)$,
where N here is the number of floating-point operations executed and T is the execution time.
It should be taken into account that:
• different processors may have different floating-point instruction sets (the Cray-2 has no division instruction, the Motorola 68882 has one, etc.);
• the duration of floating-point instructions varies over a wide range.
Intel Processor Numbers
• The processor number is one of several factors, along with processor brand, specific system configurations and system-level benchmarks, to be considered when choosing the right processor for your computing needs.
• Intel processor numbers are based on a variety of features that may include the processor's underlying architecture, cache, Front Side Bus, clock speed, power and other Intel technologies.
• A processor number represents a broad set of features that can influence the overall computing experience, but it is not a measurement of performance.
Intel Processor Numbers
• Processor numbers for the 1st generation Intel® Core™ i7 brand have the i7 identifier followed by a three-digit numerical sequence.
Intel Processor Numbers
The table below explains the alpha prefixes used for the Intel Core 2 processor families:

Alpha Prefix   Description
QX             Desktop or mobile quad-core extreme performance processors
X              Desktop or mobile dual-core extreme performance processors
Q              Desktop quad-core high performance processors
E              Desktop energy efficient dual-core processors with TDP greater than or equal to 55 W
T              Mobile highly energy efficient processors with TDP 30-39 W
P              Mobile highly energy efficient processors with TDP 20-29 W
L              Mobile highly energy efficient processors with TDP 12-19 W
U              Mobile ultra high energy efficient processors with TDP less than or equal to 11.9 W
S              Mobile small form-factor processors with 22x22 BGA package
Intel Processor Numbers
• Processor numbers for the 3rd generation Intel Core processor family have an alpha/numerical identifier followed by a four-digit numerical sequence (3xxx), and may have an alpha suffix depending on the processor. The table below explains the alpha suffixes used for the 3rd generation Intel Core processor family:

Alpha Suffix   Description
K              Unlocked
QM             Quad-Core Mobile
S              Performance optimized lifestyle
T              Power optimized lifestyle
Intel Atom Processor Numbers
• Processor numbers for the Intel Atom processor family are categorized by a three-digit numerical sequence.
• Netbook-class Intel Atom processors have an alpha prefix of N; an alpha prefix of Z indicates that the processor is for Mobile Internet Devices (MIDs).
Intel Xeon and Itanium Numbers
• Intel Xeon and Intel Itanium processor numbers are categorized in four-digit numerical sequences, and may have an alpha prefix to indicate power and performance.

Alpha Prefix   Description
X              Performance
E              Mainstream (rack-optimized)
L              Power-Optimized

Processor Family            Number Sequence   System Type
Intel® Itanium® processor   9000              Multi-processor and dual-processor
Intel® Xeon® processor      7000              Multi-processor
Intel® Xeon® processor      5000              Dual-processor
Intel® Xeon® processor      3000              Single-processor
Intel Xeon Phi Coprocessor Numbers
AMD Opteron processor numbers
Single-Core Options

Series         100 Series              200 Series              800 Series
Scalability    1-way                   Up to 2-way             Up to 8-way
Socket         Socket 939*             Socket 940              Socket 940
Performance    100 Series Benchmarks   200 Series Benchmarks   800 Series Benchmarks

Frequency      Model Numbers
1.6GHz         -                       Model 242               Model 842
1.8GHz         Model 144               Model 244               Model 844
2.0GHz         Model 146               Model 246               Model 846
2.2GHz         Model 148               Model 248               Model 848
2.4GHz         Model 150               Model 250               Model 850
2.6GHz         Model 152               Model 252               Model 852
2.8GHz         Model 154               Model 254               Model 854
AMD Opteron processor numbers
Dual-Core Options

Series         100 Series              200 Series              800 Series
Scalability    1-way                   Up to 2-way             Up to 8-way
Socket         Socket 939*             Socket 940              Socket 940
Performance    100 Series Benchmarks   200 Series Benchmarks   800 Series Benchmarks

Frequency      Model Numbers
1.8GHz         Model 165               Model 265               Model 865
2.0GHz         Model 170               Model 270               Model 870
2.2GHz         Model 175               Model 275               Model 875
2.4GHz         Model 180               Model 280               Model 880
2.6GHz         Model 185               Model 285               Model 885
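In both Opteron tables the leading digit of the model number matches the series, and the series determines the scalability. A minimal lookup sketch under that reading; the function name and the error handling are assumptions, and it covers only the three series shown on the slides.

```python
# Scalability per series, as listed in the tables.
SERIES_SCALABILITY = {
    100: "1-way",
    200: "Up to 2-way",
    800: "Up to 8-way",
}


def opteron_series(model_number):
    """Return (series, scalability) for a three-digit Opteron model number."""
    series = (model_number // 100) * 100
    if series not in SERIES_SCALABILITY:
        raise ValueError(f"unknown Opteron series for model {model_number}")
    return series, SERIES_SCALABILITY[series]


print(opteron_series(246))  # single-core 2.0GHz part from the 200 Series
print(opteron_series(875))  # dual-core 2.2GHz part from the 800 Series
```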
Processor performance
In the paper “The Fundamentals of Performance”
(http://www.devx.com/Intel/Article/30831?trk=DXRSS_LATEST)
the performance of modern microprocessors is characterized as follows:
performance = clock speed × IPC × (number of cores × effectiveness)
IPC - number of instructions executed per clock
The effectiveness multiplier for a dual-core CPU equals 1.5-1.7
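A minimal sketch of this model follows. It assumes that the 1.5-1.7 figure refers to the whole bracketed factor for a dual-core CPU, which corresponds to a per-core effectiveness of about 0.75-0.85; the clock speed and IPC values are hypothetical.

```python
def relative_performance(clock_ghz, ipc, cores=1, effectiveness=1.0):
    """performance = clock speed x IPC x (number of cores x effectiveness)."""
    return clock_ghz * ipc * (cores * effectiveness)


# Hypothetical 3.0 GHz processor sustaining 2 instructions per clock.
single = relative_performance(3.0, 2.0)
# Same core duplicated, with an assumed per-core effectiveness of 0.8.
dual = relative_performance(3.0, 2.0, cores=2, effectiveness=0.8)

print(single, dual, dual / single)
```

The ratio between the two results is the bracketed factor itself, so the second core adds well under 100% in this model.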
Benchmarks
Who develops them:
1. Manufacturers
2. Users
3. Special institutions
Four types of benchmarks:
1) Real applications
2) Kernels
3) Toy benchmarks
4) Synthetic benchmarks
Benchmarks
Some benchmarks for evaluating processor performance:
• Dhrystone – for integer arithmetic performance
• Whetstone – for floating-point arithmetic performance
• Livermore Loops – a benchmark for parallel computers (based on applied physics tasks)
• Linpack – a software library for performing numerical linear algebra on digital computers (problem sizes 100×100, 1000×1000, ...)
• NAS Parallel Benchmarks – a set of benchmarks targeting performance evaluation of highly parallel supercomputers
Benchmarks – PCMark7
PCMark7 includes more than 25 individual workloads combined into 7 separate tests to give different views of system performance:
• The PCMark test measures overall system performance and returns an official PCMark score.
• The Lightweight test measures the capabilities of entry-level systems unable to run the full PCMark suite.
• The Entertainment test measures system performance in entertainment, media and gaming scenarios.
• The Creativity test measures performance in typical creativity scenarios involving images and video.
• The Productivity test measures system performance in scenarios using the Internet and office applications.
• The Computation test contains workloads that isolate the computation performance of the system.
• The Storage test contains workloads that isolate the performance of the PC's storage system.
Benchmarks – PCMark8
PCMark 8 Basic Edition (free):
• Complete performance measurement for your PC.
• Includes Home, Creative and Work benchmarks.
• Test everything from tablets to desktop PCs.
• Easy to use, no technical know-how needed.
• Free online account to manage your results.
Benchmarks - SYSmark 2012
• SYSmark 2012 is an application-based benchmark that reflects usage patterns of business users in the areas of office productivity, data/financial analysis, system management, media creation, 3D modeling and web development.
• SYSmark 2012 is a ground-up development and features the latest and most popular applications from each of their respective fields.
• SYSmark 2012 v1.5 supports Microsoft Windows 7 and Windows 8.
SPEC
• The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers.
• SPEC develops benchmark suites and also reviews and publishes submitted results from its member organizations and other benchmark licensees.
SPEC benchmarks
SPEC CPU2006 is designed to provide performance measurements that can be used to compare compute-intensive workloads on different computer systems. SPEC CPU2006 contains two benchmark suites:
• CINT2006 for measuring and comparing compute-intensive integer performance, and
• CFP2006 for measuring and comparing compute-intensive floating point performance.
SPEC benchmarks
SPECint and SPECfp benchmarks differ in the percentage of FP operations.
SiSoft Sandra 2013
Sandra means "System ANalyser, Diagnostic and Reporting Assistant."
SiSoftware Sandra 2013 is the latest version of the utility, which includes remote analysis, benchmarking and diagnostic features for PCs, servers, Pocket PC and Smartphone devices, small office/home office (SOHO) networks and enterprise networks.
It supports the Win32 x86, Win64 x64, WinCE and ARM platforms.
What benchmarks do engineers use when reviewing processors?
In Intel’s Second-Gen Core CPUs: The Sandy Bridge Review, performance is evaluated in the following areas:
• PCMark Vantage – Memory, Gaming, Productivity,…
• 3DMark11 – Graphics, Physics, Performance,…
• SiSoftware Sandra 2011 – Processor Arithmetic, Multimedia, Cryptography, Memory,…
• Content Creation
• Productivity – OCR, WinZip, WinRar,…
• Media Encoding
• Games – Metro 2033, F1 2010 (DX11), Aliens Vs. Predator (DX11)
• Power Consumption
Speedup
Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl. It is used to find the maximum expected improvement to an overall system when only part of the system is improved:
$S = \frac{1}{(1 - p) + p / k}$
Here:
- S - resulting speedup,
- p - proportion of the computation that benefits from the improvement,
- k - speedup of the improved part of the computation.
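A minimal sketch of the formula in Python; the function name, the argument checks and the sample values are illustrative assumptions.

```python
def amdahl_speedup(p, k):
    """Overall speedup when a fraction p of the work is made k times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - p) + p / k)


# Hypothetical case: half of the run time is made twice as fast.
print(amdahl_speedup(0.5, 2))
# Hypothetical case: 90% of the run time is made ten times faster.
print(amdahl_speedup(0.9, 10))
```

Even a large k gives a modest overall speedup when p is small, because the unimproved share (1 - p) is left untouched.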
Speedup of parallelized implementations
Amdahl's law models the expected speedup of a parallelized implementation of an algorithm relative to the serial algorithm, under the assumption that the problem size remains the same when parallelized.
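A short worked form of this case, assuming the parallelizable fraction p is spread evenly over n identical processors (so that k = n); the symbol n and the sample value p = 0.95 are illustrative assumptions:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

For p = 0.95, eight processors give S(8) = 1 / (0.05 + 0.11875) ≈ 5.9, and no number of processors can push the speedup beyond 1 / 0.05 = 20.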
TOP500
• The TOP500 project ranks and details the 500 most powerful (non-distributed) computer systems in the world (see www.top500.org).
• The project was started in 1993 and publishes an updated list of the supercomputers twice a year.
• The LINPACK Benchmarks are a measure of a system's floating point computing power. They measure how fast a computer solves a dense n by n system of linear equations Ax = b, which is a common task in engineering.
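To make the idea concrete, here is a small pure-Python sketch that solves a dense system Ax = b by Gaussian elimination with partial pivoting and converts the elapsed time into MFLOPS. It is not the LINPACK code itself: the function names are hypothetical, and the nominal operation count of 2n³/3 + 2n² is the conventional figure assumed here for reporting.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (b[row] - s) / a[row][row]
    return x


def nominal_flops(n):
    """Assumed nominal operation count for an n by n solve."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def measure_mflops(n=100, seed=0):
    """Time one solve of a random n by n system and report MFLOPS."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    # Guard against a zero reading on a coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return nominal_flops(n) / (elapsed * 1e6)


print(f"{measure_mflops(100):.2f} MFLOPS")
```

An interpreted solver like this one runs far below what the hardware can deliver, so the printed figure says more about the implementation than about the processor; real LINPACK runs use tuned numerical libraries.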
Green500 List
• The Green500 list ranks computers from the TOP500 list of supercomputers in terms of energy efficiency.
• The November 2013 release of the Green500 List (http://www.green500.org/lists/green201311) shows that the top of the list is dominated by heterogeneous supercomputers: those that combine two or more types of processing elements, such as a traditional central processing unit (CPU) combined with a graphics processing unit (GPU) or coprocessor.
Green500 List

Rank 1: 4,503.17 MFLOPS/W, total power 27.78 kW
  Site*: GSIC Center, Tokyo Institute of Technology
  Computer*: TSUBAME-KFC - LX 1U4GPU/104Re-1G Cluster, Intel Xeon E5-2620v2 6C 2.100GHz, NVIDIA K20x
Rank 2: 3,631.86 MFLOPS/W, total power 52.62 kW
  Site*: Cambridge University
  Computer*: Wilkes - Dell T620 Cluster, Intel Xeon E5-2630v2 6C 2.600GHz, NVIDIA K20
Rank 3: 3,517.84 MFLOPS/W, total power 78.77 kW
  Site*: Center for Computational Sciences, University of Tsukuba
  Computer*: HA-PACS TCA - Cray 3623G4-SM Cluster, Intel Xeon E5-2680v2 10C 2.800GHz, NVIDIA K20x
Rank 4: 3,185.91 MFLOPS/W, total power 1,753.66 kW
  Site*: Swiss National Supercomputing Centre (CSCS)
  Computer*: Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, NVIDIA K20x (Level 3 measurement data available)
Rank 5: 3,130.95 MFLOPS/W, total power 81.41 kW
  Site*: ROMEO HPC Center Champagne-Ardenne
  Computer*: romeo - Bull R421-E3 Cluster, Intel Xeon E5-2650v2 8C 2.600GHz, NVIDIA K20x
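Multiplying the efficiency by the total power gives the performance each system sustained during the measured run. A small sketch using the five entries listed on the slide; the function name and the choice of TFLOPS as the output unit are assumptions.

```python
# (system, MFLOPS per watt, total power in kW) from the Green500 entries
GREEN500_TOP5 = [
    ("TSUBAME-KFC", 4503.17, 27.78),
    ("Wilkes", 3631.86, 52.62),
    ("HA-PACS TCA", 3517.84, 78.77),
    ("Piz Daint", 3185.91, 1753.66),
    ("romeo", 3130.95, 81.41),
]


def sustained_tflops(mflops_per_watt, power_kw):
    """MFLOPS/W times kW gives GFLOPS; divide by 1000 for TFLOPS."""
    gflops = mflops_per_watt * power_kw
    return gflops / 1000.0


for name, efficiency, power_kw in GREEN500_TOP5:
    print(f"{name}: {sustained_tflops(efficiency, power_kw):.1f} TFLOPS")
```

The comparison shows that an energy-efficiency ranking and a raw-performance ranking can differ widely: the fourth entry draws far more power than the others and therefore delivers far more total performance despite its lower MFLOPS/W figure.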