Lecture 17 - Multicore Computers 1

+ CS 325: CS Hardware and Software Organization and Architecture
Multicore Computers
+ Outline
• Introduction
• Motivation for Multi-Core
• What is a multi-core processor?
• Properties of multi-core systems
• Applications that benefit from multi-core
• Multiprocessor memory types
• Multi-core design
  ▪ Symmetric multi-core processor
  ▪ Asymmetric multi-core processor
• Advantages & disadvantages of multi-core
+ Hardware Performance Issues
• Microprocessors have seen an exponential increase in performance
  ▪ Improved organization
  ▪ Increased clock frequency
• Increase in parallelism
  ▪ Pipelining
  ▪ Superscalar (multi-issue)
  ▪ Simultaneous multithreading (SMT)
• Diminishing returns
  ▪ More complexity requires more logic
  ▪ Increasing chip area for coordinating and signal transfer logic
  ▪ Harder to design, make and debug
+ Introduction
• Flood of computer tasks (1990s)
  ▪ Increasing number of computer users
  ▪ Server management
  ▪ We need better performance from PCs and servers.
  → These demands accelerated the development of microprocessors.
• Emergence of multi-core processors (2000s)
  ▪ Improvements over single core
  ▪ Put multiple execution cores on one die
+ Increased Complexity
• Power requirements grow exponentially with chip density and clock frequency
  ▪ Can use more chip area for cache
    ▪ Smaller
    ▪ Order of magnitude lower power requirements
• By 2016
  ▪ >100 billion transistors on 300 mm² die
  ▪ >1 billion transistors for logic
+ Increased Complexity
• Multicore has the potential for near-linear improvement
  ▪ Needs some programming effort
  ▪ Won't work for all problems
• Unlikely that one core can use all of a huge cache effectively, so add processing units (cores) to make an MPSoC (Multiprocessor System on Chip)
+ Power and Memory Considerations
[Figure: chart with regions labeled "More action" and "Less action"]
+ Chip Utilization of Transistors
[Figure: share of chip transistors used by cache versus CPU]
• We passed 50%! Is this a RAM or a processor?
+ Effective Applications for Multicore Processors
• Database (e.g. Select *)
• Servers handling independent transactions
• Multi-threaded native applications
  ▪ Lotus Domino, Siebel CRM
• Multi-process applications
  ▪ Oracle, SAP, PeopleSoft
• Java applications
  ▪ Java VM is multi-threaded with scheduling and memory management
  ▪ Sun's Java Application Server, IBM WebSphere, Tomcat
• Multi-instance applications
  ▪ One application running multiple times
+ Motivation for Multi-Core
• Exploits reduced feature size and increased density
• Increases functional units per chip
• Limits energy consumption per operation
• Constrains growth in processor complexity
+ Multi-Core Computer
• A multi-core processor is a processing system composed of two or more independent cores (or CPUs). The cores are typically integrated onto a single integrated circuit die (known as a chip multiprocessor or CMP).
• A many-core processor is one in which the number of cores is large enough that traditional multiprocessor techniques are no longer efficient.
  ▪ This threshold is somewhere in the range of several tens of cores, and such a design likely requires a network on chip.
+ Multi-Core Computer
• A dual-core processor contains two independent microprocessors.
• A dual-core set-up is somewhat comparable to having multiple, separate processors installed in the same computer.
  ▪ But because the two processors are actually plugged into the same socket, the connection between them is faster.
• Ideally, a dual-core processor is nearly twice as powerful as a single-core processor.
  ▪ In practice, performance gains are about 50%: a dual-core processor is likely to be about one-and-a-half times as powerful as a single-core processor.
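One way to read the gap between the ideal 2x and the observed 1.5x is Amdahl's law. The slides do not name it, so it is brought in here as an assumed model. The symbols are introduced only for this sketch: p is the fraction of execution time that can run in parallel, N is the number of cores, and S(N) is the resulting speedup.

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
\qquad
S(2) = 1.5
\;\Longrightarrow\;
(1 - p) + \frac{p}{2} = \frac{2}{3}
\;\Longrightarrow\;
p = \frac{2}{3}
```

Under this model, a 50% gain on two cores would correspond to roughly two thirds of the work being parallelizable. Real workloads also lose time to memory contention, which this formula ignores.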
+ Multi-Core Computer
• A multi-core processor implements multiprocessing in a single physical package.
  ▪ Cores may or may not share caches.
  ▪ May implement message passing or shared memory inter-core communication methods.
• All cores are identical in symmetric multi-core systems.
  ▪ Example: Intel Core 2 Duo
• They are not identical in asymmetric multi-core systems.
  ▪ Example: IBM Cell processor
+ CMP benefits
• With a shared on-chip cache memory, communication events can be reduced to just a handful of processor cycles.
• Therefore, with low latencies, communication delays have a much smaller impact on overall performance.
• Threads can also be much smaller and still be effective.
• Automatic parallelization becomes more feasible.
+ Core i7 and Duo
• Let us review these two Intel architectures…
+ Individual Core Architecture
• Intel Core Duo uses superscalar cores
  ▪ More than one instruction is executed during a clock cycle.
• Intel Core i7 uses simultaneous multi-threading (SMT)
  ▪ Scales up the number of threads supported (extended superscalar architecture)
  ▪ 4 SMT cores, each supporting 4 threads, appear as 16 cores (the i7 has 2 threads per core, so its 4 cores appear as 8)
[Figure: Core i7 and Core 2 Duo]
+ Intel x86 Multicore Organization Core Duo
• 2006
• Two x86 superscalar cores, shared L2 cache
• Dedicated L1 cache per core
  ▪ 32KB instruction and 32KB data
• Thermal control unit per core
  ▪ Manages chip heat dissipation with sensors; clock speed is throttled
  ▪ Maximizes performance within thermal constraints
  ▪ Improved ergonomics (quiet fan)
• Advanced Programmable Interrupt Controller (APIC)
  ▪ Inter-processor interrupts between cores
  ▪ Routes interrupts to appropriate core
  ▪ Includes timer so OS can self-interrupt a core
+ Intel x86 Multicore Organization Core Duo
• Power management logic
  ▪ Monitors thermal conditions and CPU activity
  ▪ Adjusts voltage (and thus power consumption)
  ▪ Can switch on/off individual logic subsystems to save power
  ▪ Split-bus transactions can sleep on one end
• 2MB shared L2 cache
  ▪ Dynamic allocation
  ▪ MESI support for L1 caches
  ▪ Extended to support multiple Core Duo in SMP (not SMT)
  ▪ L2 data shared between local cores (fast) or external
• Bus interface is FSB
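The "MESI support for L1 caches" bullet refers to the four-state coherence protocol (Modified, Exclusive, Shared, Invalid). The sketch below is a textbook-style transition table for a single cache line in one core's L1 cache. It is not the Core Duo's actual logic, and the event names are hypothetical labels chosen for this illustration.

```python
# Textbook MESI transitions for one cache line in one core's L1 cache.
# Event names are hypothetical labels invented for this sketch.
MESI = {
    # (current state, event): next state
    ("I", "local_read_shared"): "S",     # miss; another cache holds the line
    ("I", "local_read_exclusive"): "E",  # miss; no other cache holds the line
    ("I", "local_write"): "M",
    ("I", "remote_read"): "I",
    ("I", "remote_write"): "I",
    ("E", "local_read"): "E",
    ("E", "local_write"): "M",           # upgrade without invalidating anyone
    ("E", "remote_read"): "S",
    ("E", "remote_write"): "I",
    ("S", "local_read"): "S",
    ("S", "local_write"): "M",           # other copies must be invalidated
    ("S", "remote_read"): "S",
    ("S", "remote_write"): "I",
    ("M", "local_read"): "M",
    ("M", "local_write"): "M",
    ("M", "remote_read"): "S",           # dirty data is written back first
    ("M", "remote_write"): "I",          # dirty data is written back first
}


def next_state(state, event):
    """Return the next MESI state; raises KeyError for an undefined pair."""
    return MESI[(state, event)]


def run(events, state="I"):
    """Apply a sequence of events to one line, starting from Invalid."""
    for event in events:
        state = next_state(state, event)
    return state


# A core loads a line nobody else has, writes it, then another core reads it.
final = run(["local_read_exclusive", "local_write", "remote_read"])
```

Here `final` ends up as "S": the line goes Invalid, Exclusive, Modified, then Shared once the other core reads it.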
+ Intel x86 Multicore Organization Core i7
• November 2008
• Four x86 SMT processors
• Dedicated L2, shared L3 cache
• Speculative pre-fetch for caches
• On-chip DDR3 memory controller
  ▪ Three 8-byte channels (192 bits) giving 32GB/s
  ▪ No front side bus (just like labs 1 & 2 with the SDRAM controller)
• QuickPath Interconnect
  ▪ Cache-coherent point-to-point link
  ▪ High-speed communications between processor chips
  ▪ Total bandwidth 25.6GB/s
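The two bandwidth figures on this slide can be checked with simple arithmetic. The slide gives only the totals, so the transfer rates below are assumptions: DDR3 at about 1333 million transfers per second, and QuickPath at 6.4 billion transfers per second carrying 2 bytes each way. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(links, bytes_per_transfer, transfers_per_second):
    """Peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    return links * bytes_per_transfer * transfers_per_second / 1e9


# Assumption: three DDR3 channels, 8 bytes wide, ~1333 million transfers/s.
ddr3 = peak_bandwidth_gb_s(links=3, bytes_per_transfer=8,
                           transfers_per_second=1333e6)

# Assumption: QuickPath moves 2 bytes per transfer at 6.4 billion transfers/s,
# counted once per direction (hence links=2).
qpi = peak_bandwidth_gb_s(links=2, bytes_per_transfer=2,
                          transfers_per_second=6.4e9)

print(round(ddr3, 1), round(qpi, 1))
```

With these assumed rates the results round to 32.0 and 25.6, matching the slide. These are peak figures; sustained throughput is lower.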
+ What applications benefit from multi-core?
• Database servers
• Web servers
• Telecommunication markets
• Multimedia applications
• Scientific applications
• In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
+ Multi-core architectures
• Replicate multiple processor cores on a single die.
• The cores fit on a single processor socket.
• The cores run in parallel; each core handles several threads (like on a uniprocessor).
[Figure: four cores (core 1 to core 4), each running several threads]
+ Programming for multi-core
• Programmers must use threads or processes.
• Spread the workload across multiple cores.
• Write parallel algorithms.
• OS will map threads/processes to cores.
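The points on this slide can be sketched as follows: split the data into chunks, hand each chunk to a worker thread, and let the OS decide which core runs which thread. The function names are hypothetical. In standard CPython the global interpreter lock keeps threads from executing Python bytecode on several cores at once, so this shows the structure of the approach rather than a real speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunked(data, parts):
    """Split data into at most `parts` contiguous slices of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=None):
    """Sum x*x over data, giving each worker thread one slice."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each slice becomes an independent task; the OS scheduler, not this
        # code, decides which core runs each worker thread.
        partials = pool.map(lambda part: sum(x * x for x in part),
                            chunked(data, workers))
        return sum(partials)


total = parallel_sum_of_squares(list(range(10)), workers=4)
```

A process-based pool such as `concurrent.futures.ProcessPoolExecutor` follows the same pattern but needs a named module-level function in place of the lambda.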
+ Examples
• Editing a photo while recording a TV show through a digital video recorder.
• Downloading software while running an antivirus program.
• "Anything that can be threaded today will map efficiently to multi-core".
• BUT: some applications are difficult to parallelize.
  ▪ Examples?
  ▪ Piped processes
+ Multiprocessor memory types
• Shared memory: In this model, there is one (large) common shared memory for all processors.
• Distributed memory: In this model, each processor has its own (small) local memory, and its content is not replicated anywhere else.
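The two models can be contrasted with a small sketch. Threads inside one process really do share memory, so the second function only imitates the distributed discipline: each worker keeps private data and hands results over by explicit message. All names are hypothetical.

```python
import queue
import threading


def _run_all(worker, n_workers):
    """Start one thread per worker index and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_demo(n_workers=4):
    """Every worker writes into one list that all threads can see."""
    shared = [0] * n_workers          # the single common memory

    def worker(i):
        shared[i] = i * i             # each worker owns one slot, so no lock

    _run_all(worker, n_workers)
    return shared


def distributed_memory_demo(n_workers=4):
    """Each worker keeps local data and shares it only by sending a message."""
    mailbox = queue.Queue()           # stands in for the interconnect

    def worker(i):
        local = i * i                 # local value, not visible to others
        mailbox.put((i, local))       # explicit message to the coordinator

    _run_all(worker, n_workers)
    results = dict(mailbox.get() for _ in range(n_workers))
    return [results[i] for i in range(n_workers)]
```

Both functions return the same list. The difference is how data moves: implicitly through a common address space in the first case, explicitly through messages in the second.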
+ Microprocessor Design
• Taking the idea of superscalar operations to the next level, it is possible to put multiple microprocessor cores onto a single chip, and have the cores operate in parallel with one another.
+ Symmetric Multi-core Processor (SMP)
• A symmetric multi-core processor is one that has multiple cores on a single chip, and all of those cores are identical.
• Example: Intel i3, i5, i7
• The Intel i series CPU is an example of a symmetric multi-core processor. The i series can have either 2 cores on chip ("i3") or 4 cores on chip ("i5/i7"). The cores in an i series CPU are identical and can function independently of one another. A mixture of scheduling software and hardware is required to farm tasks out to each core.
+ Symmetric Multi-core Processor
• Applications
  ▪ Personal computers
  ▪ Servers/clusters
+ Asymmetric Multi-core Processor
• An asymmetric multi-core processor is one that has multiple cores on a single chip, but those cores might be different designs.
  ▪ For instance, there could be 2 general purpose cores and 2 vector cores on a single chip.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
• Applications
  ▪ Supercomputing: IBM's Roadrunner supercomputer is a hybrid of general-purpose CISC Opteron processors and Cell processors.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
• Applications
  ▪ Home cinema: Toshiba is considering producing HDTVs using Cell. They have already presented a system to decode 48 standard definition MPEG-2 streams. This can enable a viewer to choose a channel based on dozens of thumbnail videos displayed on the screen at the same time.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
• Applications
  ▪ Video processing card: Some companies, such as Leadtek, have plans to release a PCI-E card based upon the Cell to allow for "faster than real time" transcoding of H.264, MPEG-2 and MPEG-4 video.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
• Applications
  ▪ Console video games
    ▪ The first major commercial application of Cell was in Sony's PlayStation 3 game console.
    ▪ This video game console contains the first production application of the Cell processor, clocked at 3.2 GHz, with seven of its eight cores operational.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
• Future
  ▪ Based on its unique features, Cell can bridge the gap between conventional desktop processors and more specialized high-performance processors, such as the NVIDIA and ATI graphics processors (GPUs).
+ Challenges resulting from multi-core
• Aggravates memory wall
  ▪ Memory bandwidth
    ▪ Way to get data out of memory banks
    ▪ Way to get data into multi-core processor array
  ▪ Memory latency
  ▪ Fragments L3 cache
• Pins become strangle point
  ▪ Rate of pin growth projected to slow and flatten
  ▪ Rate of bandwidth per pin (pair) projected to grow slowly
• Requires mechanisms for efficient inter-processor coordination
  ▪ Synchronization
  ▪ Mutual exclusion
  ▪ Context switching
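Two of the coordination mechanisms listed above, mutual exclusion and synchronization, can be sketched with standard threading primitives. A lock guards a shared counter, and a barrier makes every worker wait until all have finished the first phase. The function name and parameters are hypothetical, and context switching is left to the OS and not shown.

```python
import threading


def run_workers(n_workers=4, increments=1000):
    """Return the final counter and what each worker saw after the barrier."""
    counter = 0
    lock = threading.Lock()                  # mutual exclusion
    barrier = threading.Barrier(n_workers)   # synchronization point
    seen = []

    def worker(i):
        nonlocal counter
        for _ in range(increments):
            with lock:                       # one thread updates at a time
                counter += 1
        barrier.wait()                       # wait for all workers to finish
        with lock:
            seen.append((i, counter))        # everyone now sees the full total

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, seen


total, seen = run_workers(n_workers=4, increments=1000)
```

Because no worker passes the barrier until all increments are done, every entry in `seen` records the same total of `n_workers * increments`.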
+ Advantages of Multi-core
• Cache circuitry can operate at a much higher clock rate than is possible if the signals have to travel off-chip.
• Signals between different CPUs (cores) travel shorter distances, so those signals degrade less.
  ▪ These higher quality signals allow more data to be sent in a given time period.
• A dual-core processor uses slightly less power than two coupled single-core processors.
+ Disadvantages of Multi-core
• The ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.
  ▪ Most current video games will run faster on a 3 GHz single-core processor than on a 2 GHz dual-core processor (of the same core architecture).
• Two processing cores sharing the same system bus and memory bandwidth limits the real-world performance advantage.
  ▪ If a single core is close to being memory bandwidth limited, going to dual-core might only give a 30% to 70% improvement.
  ▪ If memory bandwidth is not a problem, a 90% improvement can be expected.
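The shared-bus limit can be illustrated with a deliberately simple bound. Assume performance scales with the memory bandwidth a program consumes, and that one core already uses a known fraction of the bus. This model and its names are assumptions made for this sketch, not something the slides define.

```python
def bandwidth_bound_speedup(cores, bus_utilization_one_core):
    """Upper bound on speedup when all cores share one memory bus.

    bus_utilization_one_core is the fraction of peak memory bandwidth that
    a single core already consumes (a hypothetical input, in (0, 1]).
    """
    if not 0.0 < bus_utilization_one_core <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # The cores cannot speed up beyond their count, nor beyond the point
    # where the shared bus is saturated.
    return min(float(cores), 1.0 / bus_utilization_one_core)


for u in (0.25, 0.60, 0.75):
    gain = (bandwidth_bound_speedup(2, u) - 1.0) * 100
    print(f"one core uses {u:.0%} of the bus: at most {gain:.0f}% gain")
```

With one core at 60% to 75% of the bus, the bound gives roughly 67% down to 33%, in the same range as the slide's 30% to 70%. The slide's 90% for the unconstrained case sits below this model's 100% ceiling, which suggests other overheads that the bound ignores.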
+ Conclusion
• Multi-core processors represent an important new trend in computer architecture.
  ▪ Decreased power consumption and heat generation.
  ▪ Minimized wire lengths and interconnect latencies.
• They enable true thread-level parallelism with great energy efficiency and scalability.
• To utilize their full potential, applications will need to move from a single to a multi-threaded model.
  ▪ Parallel programming techniques are likely to gain importance.
  ▪ The difficult problem is not building multi-core hardware, but programming it in a way that lets mainstream applications benefit from the continued growth in CPU performance.