Chapter 18 Multicore Computers

Vy Luong
Multicore Computers
 Chip multiprocessor:
combines two or more
processors (cores) on a
single die.
 Each core consists of:
registers, ALU, pipeline
hardware, control unit
and L1 cache.
Hardware Performance Issues
 Goal: increase instruction-level parallelism.
 Superscalar
Replicate execution resources enabling parallel execution
of instructions in parallel pipelines.
 Simultaneous multithreading (SMT)
Duplicate register banks so that multiple threads can share
the use of pipeline resources.
 Problem: growing complexity of managing multiple threads,
and rising power consumption.
Why Multicore?
 Control power density by
using more of the chip
area for cache memory
(instead of logic
transistors).
 Potential for near-linear
performance improvement
as cores are added.
Applications That Benefit From
Multicore Systems
 Servers
 Multithreaded native applications
 Multiprocess applications
 Java applications
 Multi-instance applications
Valve Game Software
 Valve reprogrammed its Source engine software to use
multithreading to exploit the power of multicore
processor chips from Intel and AMD.
 Up to twice the performance on two processors with coarse threading.
 Hybrid threading approach (combines coarse with fine-grained threading).
 Builds scene-rendering lists for multiple scenes in parallel
(and overlaps other graphics-related simulation).
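The difference between coarse, fine-grained and hybrid threading can be sketched roughly as follows. This is only an illustration: the subsystem names (`render`, `physics`, `ai`) and the helper `scale_vertex` are hypothetical and are not taken from the Source engine. Coarse threading gives each whole subsystem its own thread; fine-grained threading spreads many small, similar tasks over a pool of threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subsystems; the names are illustrative only.
def render(frame):
    return f"render:{frame}"

def physics(frame):
    return f"physics:{frame}"

def ai(frame):
    return f"ai:{frame}"

def coarse_threading(frame):
    """Coarse threading: one thread per whole subsystem."""
    subsystems = [render, physics, ai]
    with ThreadPoolExecutor(max_workers=len(subsystems)) as pool:
        futures = [pool.submit(task, frame) for task in subsystems]
        return [f.result() for f in futures]

def scale_vertex(v):
    """One small, independent unit of work (hypothetical)."""
    return v * 2

def fine_grained_threading(vertices, workers=4):
    """Fine-grained threading: many similar small tasks spread over a pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scale_vertex, vertices))

def hybrid_frame(frame, vertices):
    """Hybrid: coarse threads per subsystem plus fine-grained work on the data."""
    return coarse_threading(frame), fine_grained_threading(vertices)
```

In CPython, threads show the structure of the approach rather than a real speedup for CPU-bound work, because the global interpreter lock serializes bytecode execution.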
Multicore Organization
 Variables in a multicore
organization:
 Number of core
processors on the chip
 Number of levels of
cache memory
 Amount of cache
memory that is shared
Superscalar or SMT?
 Intel Core Duo: individual cores are superscalar.
 Intel Core i7: cores implement SMT.
 Advantage of SMT: scales up the number of hardware threads
that the system supports.
 A multicore system with four cores, each supporting four
simultaneous threads through SMT, appears at the application
level the same as a system with 16 cores.
 As software exploits parallel resources more fully, SMT appears
more attractive than a purely superscalar design.
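The arithmetic behind the four-core example is simply cores multiplied by SMT threads per core. A minimal sketch, where `hardware_threads` is a hypothetical helper name:

```python
import os

def hardware_threads(cores, threads_per_core):
    """Number of hardware threads the system presents to software."""
    return cores * threads_per_core

# Four cores with four SMT threads each, as in the example above.
print(hardware_threads(4, 4))

# Logical CPUs visible to the operating system on the machine running this
# script; the value depends on the hardware and may be None if unknown.
print(os.cpu_count())
```

On an SMT machine, `os.cpu_count()` reports logical CPUs (hardware threads), not physical cores, which is the application-level view the slide describes.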
Intel Core Duo
 Introduced in 2006.
 Two x86 superscalar
processors.
 Separate thermal control
units.
 Advanced Programmable
Interrupt Controller (APIC)
Intel Core i7
 Introduced in November
2008.
 Four x86 SMT processors
 On-chip DDR3 memory controller
 QuickPath Interconnect (QPI)
ARM11 MPCore
 Can be configured with
up to four processors.
 Distributed interrupt
controller (DIC)
 Timer
 Watchdog
Interrupt Handling
 The DIC supports between 0 and 255 hardware interrupt inputs.
 It maintains a list of interrupts, showing their priority and
status.
 The DIC satisfies two requirements:
 Routing an interrupt request to a single CPU or to multiple CPUs, as required.
 Providing interprocessor communication so a thread on one CPU
can cause activity by a thread on another CPU.
 Interrupt states, as seen by each CPU:
 Inactive: not asserted, or completely processed by that CPU, though it
may still be pending or active in other CPUs to which it is targeted.
 Pending: asserted, but processing has not started on that CPU.
 Active: processing has started on that CPU but is not complete.
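The interrupt list and its per-CPU states can be modelled as a small state machine. The sketch below is a simplified illustration, not the ARM11 MPCore register interface; the class and method names (`InterruptEntry`, `assert_irq`, `start`, `complete`) are assumptions made for this example.

```python
from enum import Enum

class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    ACTIVE = "active"

class InterruptEntry:
    """One interrupt in the controller's list: a priority plus per-CPU status."""

    def __init__(self, irq, priority, target_cpus):
        self.irq = irq
        self.priority = priority
        # Status is tracked separately for every CPU the interrupt targets.
        self.status = {cpu: State.INACTIVE for cpu in target_cpus}

    def assert_irq(self):
        """The source asserts the interrupt: it becomes pending on each target CPU."""
        for cpu in list(self.status):
            if self.status[cpu] is State.INACTIVE:
                self.status[cpu] = State.PENDING

    def start(self, cpu):
        """A CPU begins processing: pending -> active on that CPU."""
        if self.status[cpu] is not State.PENDING:
            raise ValueError("interrupt is not pending on this CPU")
        self.status[cpu] = State.ACTIVE

    def complete(self, cpu):
        """A CPU finishes processing: active -> inactive on that CPU only."""
        if self.status[cpu] is not State.ACTIVE:
            raise ValueError("interrupt is not active on this CPU")
        self.status[cpu] = State.INACTIVE

# Interrupt routed to CPUs 0 and 1; CPU 0 handles it first.
entry = InterruptEntry(irq=42, priority=1, target_cpus=[0, 1])
entry.assert_irq()
entry.start(0)
entry.complete(0)
```

After these calls the interrupt is inactive on CPU 0 but still pending on CPU 1, which is the situation the Inactive definition covers.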
Cache Coherency
 Snoop control unit (SCU): resolves bottlenecks related to access
to shared data.
 The SCU introduces three types of optimization:
 direct data intervention
 enables copying clean data from one CPU L1 data cache to another
CPU L1 data cache without accessing external memory.
 duplicated tag RAMs
 duplicated versions of L1 tag RAMs used by the SCU to check for data
availability before sending coherency commands to the relevant
CPUs.
 migratory lines
 enables moving dirty data from one CPU to another without
writing to L2 and reading the data back in from external memory.
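The three optimizations can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual MPCore coherency protocol: the class `ToySCU`, its methods, and the write-invalidate policy are all invented for illustration. Each L1 cache is a dictionary, the duplicated tag RAMs are per-CPU sets that the SCU consults instead of querying the CPUs, and a counter records accesses to memory.

```python
class ToySCU:
    """Toy snoop control unit for a few CPUs, each with a private L1 cache."""

    def __init__(self, num_cpus, memory):
        self.memory = memory                          # stands in for L2/external memory
        self.l1 = [dict() for _ in range(num_cpus)]   # addr -> (value, dirty)
        self.tags = [set() for _ in range(num_cpus)]  # duplicated tag RAMs
        self.memory_accesses = 0

    def _holder(self, addr, requester):
        """Check the duplicated tags to find another CPU holding the line."""
        for cpu, tags in enumerate(self.tags):
            if cpu != requester and addr in tags:
                return cpu
        return None

    def _install(self, cpu, addr, value, dirty):
        self.l1[cpu][addr] = (value, dirty)
        self.tags[cpu].add(addr)

    def _evict(self, cpu, addr):
        del self.l1[cpu][addr]
        self.tags[cpu].discard(addr)

    def read(self, cpu, addr):
        if addr in self.l1[cpu]:                      # local L1 hit
            return self.l1[cpu][addr][0]
        holder = self._holder(addr, cpu)
        if holder is None:                            # nobody has it: go to memory
            self.memory_accesses += 1
            self._install(cpu, addr, self.memory[addr], dirty=False)
        else:
            value, dirty = self.l1[holder][addr]
            if dirty:
                # Migratory line: move the dirty line, no write-back to memory.
                self._evict(holder, addr)
            # Clean line: direct data intervention, copied L1 to L1.
            self._install(cpu, addr, value, dirty)
        return self.l1[cpu][addr][0]

    def write(self, cpu, addr, value):
        # Assumed write-invalidate policy: drop copies found via the tags.
        for other, tags in enumerate(self.tags):
            if other != cpu and addr in tags:
                self._evict(other, addr)
        self._install(cpu, addr, value, dirty=True)
```

For example, if CPU 0 reads an address from memory and CPU 1 then reads the same address, the second read is served from CPU 0's L1 and `memory_accesses` stays at 1. If CPU 0 then writes the line and CPU 1 reads it again, the dirty line migrates to CPU 1 without touching memory.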