Lecture 17 - Multicore Computers 1
+ CS 325: CS Hardware and Software Organization and Architecture
Multicore Computers
+ Outline
Introduction
Motivation for Multi-Core
What is multi-core processor?
Properties of Multi-core systems
Applications benefit from multi-core
Multiprocessor memory types
Multi-core design
Symmetric multi-core processor
Asymmetric multi-core processor
Advantages & disadvantages of multi-core
+ Hardware Performance Issues
Microprocessors have seen an exponential increase in performance
Improved organization
Increased clock frequency
Increase in parallelism
Pipelining
Superscalar (multi-issue)
Simultaneous multithreading (SMT)
Diminishing returns
More complexity requires more logic
Increasing chip area for coordinating and signal transfer logic
Harder to design, make and debug
+ Introduction
Flood of computer tasks (1990s)
Increasing number of computer users
Server management
▪ PCs and servers needed better performance.
→ These demands accelerated the development of microprocessors.
Emergence of multi-core processors (2000s)
Improvements over single core
▪ Put multiple execution cores on one die
+ Increased Complexity
Power requirements grow exponentially with chip density and clock frequency
Can use more chip area for cache
Smaller
Order of magnitude lower power requirements
By 2016
>100 billion transistors on 300 mm² die
>1 billion transistors for logic
+ Increased Complexity
Multicore has the potential for near-linear improvement
Needs some programming effort
Won’t work for all problems
Unlikely that one core can use all of a huge cache effectively, so add processing units (cores) to make an MPSoC (Multiprocessor System on Chip)
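The caveat that multicore "won’t work for all problems" is commonly quantified with Amdahl's law. The law is not named on these slides and is brought in here as an assumption. A minimal sketch: if a fraction p of a program's run time can be parallelized across n cores, the predicted speedup is 1 / ((1 - p) + p / n). The function name and the sample fractions are illustrative only.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law (an assumed model, not from the slides)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for p in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 8, 16)
        )
        print(f"parallel fraction {p:.2f} -> {row}")
```

Under this model the speedup approaches the near-linear ideal only when almost all of the work is parallel; any serial remainder caps the gain no matter how many cores are added.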
+ Power and Memory Considerations
[Figure: chart with regions labeled "More action" and "Less action"]
We passed 50%!!!
Is this a RAM or a processor?
+ Chip Utilization of Transistors
[Figure: chip transistor utilization, cache vs. CPU]
+ Effective Applications for Multicore Processors
Database servers (e.g. Select *)
Servers handling independent transactions
Multi-threaded native applications
Lotus Domino, Siebel CRM
Multi-process applications
Oracle, SAP, PeopleSoft
Java applications
Java VM is multi-threaded with scheduling and memory management
Sun’s Java Application Server, IBM WebSphere, Tomcat
Multi-instance applications
One application running multiple times
+ Motivation for Multi-Core
Exploits reduced feature size and increased density
Increases functional units per chip
Limits energy consumption per operation
Constrains growth in processor complexity
+ Multi-Core Computer
A multi-core processor is a processing system
composed of two or more independent cores (or
CPUs). The cores are typically integrated onto a
single integrated circuit die (known as a chip
multiprocessor or CMP).
A many-core processor is one in which the number of cores is large enough that traditional multiprocessor techniques are no longer efficient.
This threshold is somewhere in the range of several tens of cores, and such a design likely requires a network on chip.
+ Multi-Core Computer
A dual-core processor contains two independent microprocessors.
A dual-core set-up is somewhat comparable to having multiple, separate processors installed in the same computer.
But because the two processors are actually plugged into the same socket, the connection between them is faster.
Ideally, a dual-core processor is nearly twice as powerful as a single-core processor.
In practice, performance gains are about 50%:
A dual-core processor is likely to be about one-and-a-half times as powerful as a single-core processor.
+ Multi-Core Computer
A multi-core processor implements multiprocessing in a single physical package.
Cores may or may not share caches
May implement message passing or shared memory inter-core communication methods.
All cores are identical in symmetric multi-core systems.
EX: Intel Core 2 Duo
They are not identical in asymmetric multi-core systems.
EX: IBM Cell Processor
+ CMP benefits
With a shared on-chip cache memory, communication events can be reduced to just a handful of processor cycles.
Therefore, with low latencies, communication delays have a much smaller impact on overall performance.
Threads can also be much smaller and still be effective.
Automatic parallelization becomes more feasible.
+ Core i7 and Duo
Let us review these two Intel architectures…
+ Individual Core Architecture
Intel Core Duo uses superscalar cores
More than one instruction executed at a time during a clock cycle.
Intel Core i7 uses simultaneous multi-threading (SMT)
Scales up number of threads supported (extended superscalar architecture)
4 SMT cores, each supporting 4 threads, appear as 16 cores (the i7 has 2 threads per core)
[Figure: Core i7 and Core 2 Duo]
+ Intel x86 Multicore Organization Core Duo
2006
Two x86 superscalar cores, shared L2 cache
Dedicated L1 cache per core
32KB instruction and 32KB data
Thermal control unit per core
Manages chip heat dissipation with sensors; clock speed is throttled
Maximize performance within thermal constraints
Improved ergonomics (quiet fan)
Advanced Programmable Interrupt Controller (APIC)
Inter-processor interrupts between cores
Routes interrupts to appropriate core
Includes timer so OS can self-interrupt a core
+ Intel x86 Multicore Organization Core Duo
Power Management Logic
Monitors thermal conditions and CPU activity
Adjusts voltage (and thus power consumption)
Can switch on/off individual logic subsystems to save power
Split-bus transactions can sleep on one end
2MB shared L2 cache
Dynamic allocation
MESI support for L1 caches
Extended to support multiple Core Duo in SMP (not SMT)
L2 data shared between local cores (fast) or external
Bus interface is FSB
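MESI tracks each cache line in one of four states: Modified, Exclusive, Shared, or Invalid. The following is a simplified sketch of the textbook state transitions for a single line held by several L1 caches. It is not a model of the Core Duo's actual implementation, and the function names are hypothetical.

```python
from typing import List

M, E, S, I = "M", "E", "S", "I"  # Modified, Exclusive, Shared, Invalid


def read(states: List[str], cache: int) -> None:
    """Cache `cache` reads the line; `states` is updated in place."""
    if states[cache] != I:
        return  # read hit: M, E and S all satisfy a read locally
    others_hold_copy = any(s != I for i, s in enumerate(states) if i != cache)
    for i, s in enumerate(states):
        # Snooping caches holding M or E drop to S (M writes back first).
        if i != cache and s in (M, E):
            states[i] = S
    states[cache] = S if others_hold_copy else E


def write(states: List[str], cache: int) -> None:
    """Cache `cache` writes the line; every other copy is invalidated."""
    for i in range(len(states)):
        if i != cache:
            states[i] = I  # a remote M copy would be written back before this
    states[cache] = M


if __name__ == "__main__":
    line = [I, I]  # one cache line as seen by two cores
    steps = [("core 0 reads", read, 0), ("core 1 reads", read, 1),
             ("core 1 writes", write, 1), ("core 0 reads", read, 0)]
    for label, operation, cache in steps:
        operation(line, cache)
        print(f"{label:14s} -> {line}")
```

The sketch leaves out the data itself, write-backs to memory, and bus arbitration; it only shows how a read or write by one core changes the state every other core holds for the same line.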
+ Intel x86 Multicore Organization Core i7
November 2008
Four x86 SMT processors
Dedicated L2, shared L3 cache
Speculative pre-fetch for caches
On chip DDR3 memory controller
Three 8 byte channels (192 bits) giving 32GB/s
No front side bus (just like labs 1 & 2 with the SDRAM controller)
QuickPath Interconnect
Cache coherent point-to-point link
High speed communications between processor chips
Total bandwidth 25.6GB/s
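The two bandwidth figures above can be reproduced with simple arithmetic. The sketch below assumes DDR3-1333 memory (1333 million transfers per second per channel) and a 6.4 GT/s QuickPath link carrying 2 bytes of data per transfer in each direction. Neither rate appears on the slide, so treat them as assumptions chosen to match the quoted totals.

```python
def memory_bandwidth_gb_s(channels: int, bytes_per_channel: int,
                          mega_transfers_per_s: float) -> float:
    """Peak memory bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    bytes_per_second = channels * bytes_per_channel * mega_transfers_per_s * 1e6
    return bytes_per_second / 1e9


def qpi_bandwidth_gb_s(giga_transfers_per_s: float, bytes_per_transfer: int,
                       directions: int = 2) -> float:
    """Total link bandwidth in GB/s, summed over both directions by default."""
    return giga_transfers_per_s * bytes_per_transfer * directions


if __name__ == "__main__":
    # Assumed rates: DDR3-1333 and a 6.4 GT/s QuickPath link.
    print(f"DDR3, 3 channels x 8 bytes: {memory_bandwidth_gb_s(3, 8, 1333):.1f} GB/s")
    print(f"QuickPath, both directions: {qpi_bandwidth_gb_s(6.4, 2):.1f} GB/s")
```

With these assumed rates the memory controller comes out at roughly 32 GB/s and the QuickPath link at 25.6 GB/s, matching the slide.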
+ What applications benefit from multi-core?
Database servers
Web servers
Telecommunication markets
Multimedia applications
Scientific applications
In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
+ Multi-core architectures
Replicate multiple processor cores on a single die.
The cores fit on a single processor socket.
The cores run in parallel; each core time-slices its own threads (like on a uniprocessor)
[Figure: four cores (core 1 to core 4), each running several threads]
+ Programming for multi-core
Programmers must use threads or processes.
Spread the workload across multiple cores.
Write parallel algorithms.
OS will map threads/processes to cores
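One way to spread a workload is to split the data into chunks and hand each chunk to a worker. This is a minimal sketch using Python's standard `concurrent.futures`; the function names and the chunking scheme are illustrative. Note that in CPython, threads running pure-Python CPU-bound code are serialized by the global interpreter lock, so a process pool (`ProcessPoolExecutor`) is the usual choice when real multi-core speedup is needed. Threads are used here only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Split `data` into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk: Sequence[int]) -> int:
    """The per-worker task: independent of every other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: Sequence[int], workers: int = 4) -> int:
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The program only creates the workers; the OS decides which
        # core runs each one.
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000_000)), workers=4))
```

The pattern works because each chunk can be processed without looking at any other chunk; only the final addition of partial sums is serial.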
+ Examples
Editing a photo while recording a TV show through a digital video recorder.
Downloading software while running an antivirus program.
“Anything that can be threaded today will map efficiently to multi-core”.
BUT: some applications are difficult to parallelize.
Examples?
Piped processes
+ Multiprocessor memory types
Shared memory:
In this model, there is one (large) common shared memory for all processors.
Distributed memory:
In this model, each processor has its own (small) local memory, and its content is not replicated anywhere else.
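The two models lead to different programming styles. In the sketch below (illustrative names, with threads standing in for processors), the shared-memory version has every worker write into one common list, while the distributed-style version gives each worker a private copy of its data and lets it communicate only by sending a message. Real distributed-memory machines use separate address spaces and an interconnect; the queue here merely imitates that.

```python
import queue
import threading
from typing import List


def shared_memory_total(blocks: List[List[int]]) -> int:
    """All workers store results in one shared list, each in its own slot."""
    results = [0] * len(blocks)  # common memory visible to every worker

    def worker(index: int) -> None:
        results[index] = sum(blocks[index])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(blocks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


def message_passing_total(blocks: List[List[int]]) -> int:
    """Each worker keeps its block private and sends only its partial sum."""
    mailbox = queue.Queue()

    def worker(local_block: List[int]) -> None:
        mailbox.put(sum(local_block))  # the only data that leaves the worker

    # list(b) gives each worker its own copy, imitating local memory.
    threads = [threading.Thread(target=worker, args=(list(b),)) for b in blocks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in blocks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(shared_memory_total(data), message_passing_total(data))
```

In the shared version, correctness depends on workers not touching each other's slots; in the message-passing version there is nothing to share, so the only coordination is the act of sending.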
+ Microprocessor Design
Taking the idea of superscalar operations to the next level, it is possible to put multiple microprocessor cores onto a single chip, and have the cores operate in parallel with one another.
+ Symmetric Multi-core Processor (SMP)
A symmetric multi-core processor is one that has multiple cores on a single chip, and all of those cores are identical.
Example: Intel i3, i5, i7
The Intel i series CPU is an example of a symmetric multi-core processor. The i series can have either 2 cores on chip (“i3”) or 4 cores on chip (“i5/i7”). The cores in an i series CPU are identical and can function independently of one another. It requires a mixture of scheduling software and hardware to farm tasks out to each core.
+ Symmetric Multi-core Processor
Applications
Personal Computers
Servers/Clusters
+ Asymmetric Multi-core Processor
An asymmetric multi-core processor is one
that has multiple cores on a single chip, but
those cores might be different designs.
For instance, there could be 2 general purpose
cores and 2 vector cores on a single chip.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
Applications
• Super Computing:
▪ IBM's Roadrunner supercomputer is a hybrid of general-purpose CISC Opteron processors and Cell processors.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
Applications
• Home cinema
▪ Toshiba is considering producing HDTVs using Cell. They have already presented a system to decode 48 standard definition MPEG-2 streams. This can enable a viewer to choose a channel based on dozens of thumbnail videos displayed on the screen at the same time.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
Applications
• Video Processing Card
▪ Some companies, such as Leadtek, have plans to release a PCI-E card based upon the Cell to allow for "faster than real time" transcoding of H.264, MPEG-2 and MPEG-4 video.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
Applications
• Console Video Games
▪ The first major commercial application of Cell was in Sony's PlayStation 3 game console.
▪ This video game console contains the first production application of the Cell processor, clocked at 3.2 GHz, with seven of its eight synergistic processing elements (SPEs) operational.
+ Asymmetric Multi-core Processor (ASMP) – Cell Processor
Future
Based on its unique features, Cell can bridge the gap between conventional desktop processors and more specialized high-performance processors, such as the NVIDIA and ATI graphics processors (GPUs).
+ Challenges resulting from multi-core
Aggravates memory wall
Memory bandwidth
▪ Way to get data out of memory banks
▪ Way to get data into multi-core processor array
Memory latency
Fragments L3 cache
Pins become strangle point
▪ Rate of pin growth projected to slow and flatten
▪ Rate of bandwidth per pin (pair) projected to grow slowly
Requires mechanisms for efficient inter-processor coordination
Synchronization
Mutual exclusion
Context switching
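Mutual exclusion is the simplest of these coordination mechanisms to illustrate. In this sketch (names are illustrative), several threads increment one shared counter. The lock ensures that each read-modify-write completes before another thread starts its own, so no updates are lost.

```python
import threading


def count_with_lock(threads: int, increments: int) -> int:
    """Have `threads` workers each add 1 to a shared counter `increments` times."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:  # critical section: only one thread at a time
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock(threads=4, increments=10_000))
```

The lock serializes the critical section, which is itself a cost: time a core spends waiting for the lock is time it is not doing useful work.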
+ Advantages of Multi-core
Cache coherency circuitry can operate at a much higher clock rate than is possible if the signals have to travel off-chip.
Signals between different CPUs (cores) travel shorter distances, so those signals degrade less.
These higher quality signals allow more data to be sent in a given time period.
A dual-core processor uses slightly less power than two coupled single-core processors.
+ Disadvantages of Multi-core
The ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.
Most current video games will run faster on a 3 GHz single-core processor than on a 2 GHz dual-core processor (of the same core architecture).
Two processing cores sharing the same system bus and memory bandwidth limits the real-world performance advantage.
If a single core is close to being memory bandwidth limited, going to dual-core might only give 30% to 70% improvement.
If memory bandwidth is not a problem, a 90% improvement can be expected.
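The 3 GHz single-core versus 2 GHz dual-core comparison can be made concrete under a simple model. Assuming performance scales with clock rate and that parallel speedup follows Amdahl's law (both are modeling assumptions, not claims from the slides), the dual-core part wins only if its two cores deliver more than a 1.5x speedup. The sketch below solves for the parallel fraction at which the two processors break even; the function name is hypothetical.

```python
def break_even_parallel_fraction(single_ghz: float, multi_ghz: float,
                                 cores: int) -> float:
    """Parallel fraction p at which `cores` cores at multi_ghz match one core
    at single_ghz, under the assumed Amdahl model.

    Solves  multi_ghz / ((1 - p) + p / cores) = single_ghz  for p.
    A result above 1 means the multi-core part can never catch up.
    """
    if cores < 2:
        raise ValueError("need at least two cores")
    required_speedup = single_ghz / multi_ghz
    return (1.0 - 1.0 / required_speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    p = break_even_parallel_fraction(single_ghz=3.0, multi_ghz=2.0, cores=2)
    print(f"break-even parallel fraction: {p:.3f}")
```

For the slide's numbers this gives p = 2/3: unless roughly two thirds of a game's run time is spread across both cores, the faster single core comes out ahead in this model.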
+ Conclusion
Multi-core processors represent an important new trend
in computer architecture.
Decreased power consumption and heat generation.
Minimized wire lengths and interconnect latencies.
They enable true thread-level parallelism with great
energy efficiency and scalability.
To utilize their full potential, applications will need to
move from a single to a multi-threaded model.
Parallel programming techniques likely to gain importance.
The difficult problem is not building multi-core hardware, but
programming it in a way that lets mainstream applications benefit from
the continued growth in CPU performance.