Chapter 18 Multicore Computers
Vy Luong
Multicore Computers
Chip multiprocessor: combines two or more processors (cores) on a single die.
Each core consists of: registers, ALU, pipeline hardware, control unit, and L1 cache.
Hardware Performance Issues
Goal: increase instruction-level parallelism.
Superscalar
Replicate execution resources so that instructions can execute in parallel pipelines.
Simultaneous multithreading (SMT)
Duplicate register banks so that multiple threads can share the use of pipeline resources.
Problem: the added complexity of managing multiple threads, and rising power consumption.
Why Multicore?
Control power density by using more of the chip area for cache memory (instead of logic transistors).
Potential for near-linear performance improvement as cores are added, provided the software can use them.
Applications That Benefit From Multicore Systems
Servers
Multithreaded native applications
Multiprocess applications
Java applications
Multi-instance applications
Valve Game Software
Valve reprogrammed its Source engine software to use multithreading to exploit the power of multicore processor chips from Intel and AMD.
Coarse threading: up to twice the performance across two processors compared with one.
Hybrid threading approach: combines coarse threading with fine-grained threading.
Example: construct scene-rendering lists for multiple scenes in parallel (along with other graphics-related simulation).
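The difference between coarse and fine-grained threading can be sketched in a few lines of Python. This is only an illustration of the structure, not Valve's code: the subsystem functions (`render`, `ai`, `physics`) and `build_render_list` are hypothetical stand-ins, and Python threads are used for readability rather than for real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subsystems; the names are illustrative, not taken from the Source engine.
def render(frame):
    return f"render:{frame}"

def ai(frame):
    return f"ai:{frame}"

def physics(frame):
    return f"physics:{frame}"

def coarse_step(frame):
    """Coarse threading: each whole subsystem gets its own worker thread."""
    systems = (render, ai, physics)
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = [pool.submit(system, frame) for system in systems]
        return [future.result() for future in futures]

def build_render_list(scene):
    # Stand-in for constructing a scene-rendering list from a scene's objects.
    return sorted(scene)

def fine_step(scenes):
    """Fine-grained threading: one identical task spread across many data items."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(build_render_list, scenes))
```

A hybrid approach would call something like `fine_step` from inside one of the coarse subsystems, for example building the render lists for several scenes in parallel within the rendering system.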
Multicore Organization
Variables in a multicore organization:
Number of core processors on the chip
Number of levels of cache memory
Amount of cache memory that is shared
Superscalar or SMT?
Intel Core Duo: individual cores are superscalar.
Intel Core i7: implements SMT cores.
Advantage of SMT: scales up the number of hardware threads that the system supports.
A multicore system with four SMT cores, each supporting four simultaneous threads, appears at the application level the same as a system with 16 cores.
As software is developed to exploit parallel resources more fully, SMT appears more attractive than superscalar.
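The arithmetic behind the 16-core example is simply cores multiplied by threads per core. A minimal sketch, with `hardware_threads` as a hypothetical helper name:

```python
def hardware_threads(cores, threads_per_core):
    """Number of hardware threads (logical processors) visible to software."""
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be positive")
    return cores * threads_per_core

# The slide's example: four SMT cores, four simultaneous threads each.
logical = hardware_threads(cores=4, threads_per_core=4)
```

On a running system, Python's `os.cpu_count()` reports logical CPUs, which on an SMT machine is typically this product rather than the number of physical cores.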
Intel Core Duo
Introduced in 2006.
Two x86 superscalar processors.
Separate thermal control unit for each core.
Advanced Programmable Interrupt Controller (APIC)
Intel Core i7
Introduced in November 2008.
Four x86 SMT processors.
On-chip DDR3 memory controller
QuickPath Interconnect (QPI)
ARM11 MPCore
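Can be configured with up to four processors.
Distributed interrupt controller (DIC): handles interrupt detection and prioritization.
Timer: each CPU has its own private timer that can generate interrupts.
Watchdog: issues warning alerts in the event of software failures.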
Interrupt Handling
The distributed interrupt controller (DIC) can be configured to support between 0 and 255 hardware interrupt inputs.
It maintains a list of interrupts, showing their priority and status.
The DIC satisfies two requirements:
Route an interrupt request to a single CPU or to multiple CPUs, as required.
Provide interprocessor communication so a thread on one CPU can cause activity by a thread on another CPU.
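Interrupt states, from the point of view of a given CPU:
Inactive: not asserted, or completely processed by that CPU, although it may still be pending or active in other CPUs to which it is targeted.
Pending: asserted, but processing has not started on that CPU.
Active: processing has started on that CPU but is not yet complete.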
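The three states can be read as a small per-CPU state machine. The sketch below is a simplified model, not the ARM11 MPCore register interface: the class and method names are hypothetical, and it tracks only the three states listed above.

```python
from enum import Enum

class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    ACTIVE = "active"

class Interrupt:
    """One interrupt with a separate state for each CPU it targets (toy model)."""

    def __init__(self, irq_id, priority, target_cpus):
        self.irq_id = irq_id
        self.priority = priority
        self.state = {cpu: State.INACTIVE for cpu in target_cpus}

    def assert_irq(self):
        # Asserting the interrupt makes it pending on every idle target CPU.
        for cpu in list(self.state):
            if self.state[cpu] is State.INACTIVE:
                self.state[cpu] = State.PENDING

    def start(self, cpu):
        # A CPU begins processing: pending -> active.
        if self.state[cpu] is not State.PENDING:
            raise ValueError(f"interrupt {self.irq_id} is not pending on CPU {cpu}")
        self.state[cpu] = State.ACTIVE

    def complete(self, cpu):
        # The CPU finishes processing: active -> inactive.
        if self.state[cpu] is not State.ACTIVE:
            raise ValueError(f"interrupt {self.irq_id} is not active on CPU {cpu}")
        self.state[cpu] = State.INACTIVE
```

After CPU 0 starts and completes an interrupt that also targets CPU 1, the interrupt is inactive on CPU 0 while still pending on CPU 1, which is the situation the Inactive definition describes.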
Cache Coherency
Snoop control unit (SCU): designed to resolve most of the traditional bottlenecks related to access to shared data.
The SCU introduces three types of optimization:
Direct data intervention: enables copying clean data from one CPU L1 data cache to another CPU L1 data cache without accessing external memory.
Duplicated tag RAMs: duplicated versions of the L1 tag RAMs, used by the SCU to check for data availability before sending coherency commands to the relevant CPUs.
Migratory lines: enables moving dirty data from one CPU to another without writing to L2 and reading the data back in from external memory.
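A toy model can show how duplicated tags and direct data intervention work together. It is a sketch under strong assumptions: the `SnoopControlUnit` class and its fields are hypothetical, each L1 is reduced to a set of line addresses, and clean/dirty state, evictions, and migratory lines are left out.

```python
class SnoopControlUnit:
    """Toy SCU: keeps a duplicate of each CPU's L1 tags and serves reads from peers."""

    def __init__(self, num_cpus):
        # Duplicated tag RAMs: one set of cached line addresses per CPU L1.
        self.tags = [set() for _ in range(num_cpus)]
        self.commands_sent = []   # (cpu, address) coherency commands actually issued
        self.external_reads = 0   # reads that had to go to external memory

    def holders(self, address, exclude):
        # Consult the duplicated tags instead of querying every CPU's L1.
        return [cpu for cpu, lines in enumerate(self.tags)
                if cpu != exclude and address in lines]

    def read(self, cpu, address):
        sources = self.holders(address, exclude=cpu)
        if sources:
            # Direct data intervention: copy the line from a peer L1.
            self.commands_sent.append((sources[0], address))
            origin = f"cpu{sources[0]}"
        else:
            # No peer holds the line, so no coherency command is sent.
            self.external_reads += 1
            origin = "memory"
        self.tags[cpu].add(address)
        return origin
```

In this model the first read of a line goes to memory and a second CPU's read of the same line is served by the first CPU's cache, with a coherency command sent only to the CPU that the tags show is holding it.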