Lecture 11 - It works!


Computing Machinery
Chapter 11: Alternative Architectures
Flynn's Taxonomy
Parallel Architectures Functional Diagrams
Pipeline Processing
PRAM (Parallel Random Access Machine)
EREW - Exclusive Read/Exclusive Write
CREW - Concurrent Read/Exclusive Write
ERCW - Not Used
CRCW - Concurrent Read/Concurrent Write
Concurrent Read/Exclusive Write (CREW)
In this model, a particular address in shared memory can be read by multiple processors
concurrently.
However only one processor at a time can write to a particular address in shared
memory.
Concurrent means that the order in which two operations occur does not affect the
outcome (or state) of the system.
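As a minimal sketch of the CREW rule, the check below flags any address that more than one processor tries to write in the same step, while letting concurrent reads through. The function name crew_violations and the (processor_id, op, address) tuple layout are assumptions made for this illustration, not part of the PRAM definition.

```python
from collections import Counter


def crew_violations(accesses):
    """Return the addresses written by more than one processor in one step.

    `accesses` is a list of (processor_id, op, address) tuples, where op is
    "read" or "write". Concurrent reads are always allowed under CREW, so
    only the write operations are counted.
    """
    writers = Counter(addr for _pid, op, addr in accesses if op == "write")
    return sorted(addr for addr, count in writers.items() if count > 1)


step = [
    (0, "read", 100),
    (1, "read", 100),   # concurrent read of address 100: allowed
    (2, "write", 200),
    (3, "write", 200),  # second writer to address 200: violates CREW
]
print(crew_violations(step))  # [200]
```

An EREW check would apply the same counting to reads as well as writes.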
Concurrent Read/Concurrent Write (CRCW)
In the concurrent read, concurrent write PRAM model, multiple processors can read from
or write to the same address in shared memory concurrently.
A number of alternative interpretations for the concurrent write operation have been
studied. We can choose from a number of operations for concurrent write such as
RANDOM, PRIORITY, MAX, and SUM.
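The four concurrent-write interpretations named above can be sketched as a single resolution function. This is an illustration under stated assumptions: the function name, the (processor_id, value) representation, and the choice that the lowest processor id has the highest priority are all hypothetical, since the slides do not specify them.

```python
import random


def resolve_concurrent_write(writes, policy, rng=None):
    """Pick the value stored when several processors write one address.

    `writes` is a list of (processor_id, value) pairs aimed at the same
    address in the same step. `policy` is one of the four names below.
    """
    if not writes:
        raise ValueError("no writes to resolve")
    if policy == "RANDOM":
        # An arbitrary writer succeeds.
        chooser = rng if rng is not None else random
        return chooser.choice(writes)[1]
    if policy == "PRIORITY":
        # Assumption: the lowest processor id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "MAX":
        return max(value for _pid, value in writes)
    if policy == "SUM":
        return sum(value for _pid, value in writes)
    raise ValueError(f"unknown policy: {policy}")


writes = [(2, 7), (0, 3), (1, 9)]
print(resolve_concurrent_write(writes, "PRIORITY"))  # 3
print(resolve_concurrent_write(writes, "MAX"))       # 9
print(resolve_concurrent_write(writes, "SUM"))       # 19
```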
Parallel Architecture Performance Analysis
Speed - The speed of a computing system is the amount of work accomplished (e.g.
number of instructions completed) in a specified time. So we normally refer to
processing speed in terms of instructions per second.
$$ S = \frac{\text{work}}{\text{time}} $$
Speedup - The speedup for a multi-processor system is the ratio of the speed of a
multi-processor computer on a given problem to the speed of a single-processor
computer on the same problem. Equivalently, it is the single-processor solution time
divided by the multi-processor solution time. Since speedup is the ratio of two
quantities that have the same units (instructions per second), it is a unitless quantity.
$$ \text{Speedup} = \frac{S_n}{S_1} $$
Efficiency - The efficiency of an n-processor multi-processor computer system is
defined as the speedup of the multi-processor divided by the number of processors, n.
Traditionally it has been assumed that efficiency cannot be greater than unity (1).
$$ \text{Efficiency} = \frac{S_n}{n \cdot S_1} $$
Simultaneous Multithreading (SMT)
The functional difference between conventional multiprocessing and SMT is that in the
first case each functional processor is a separate physical processor, whereas in the
second case one set of arithmetic and logic functional units is shared among the logical
processors within a physical (multicore) CPU.
Scheduling Priority in SMT
When two instructions are in contention for a resource, the one from the higher-priority
thread slot "wins" the contention.
In order to prevent indefinite postponement, the SMT scheduling policy rotates the
priority ranking periodically.
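The rotating-priority idea can be sketched as follows. This is a simplified model, not a description of any particular SMT implementation: it assumes four thread slots, a round-robin rotation, and a rotation on every cycle, whereas the policy above only says the ranking rotates periodically. The names arbitrate and rotate are hypothetical.

```python
def arbitrate(requesting, priority_order):
    """Return the requesting thread slot that ranks highest, or None.

    `priority_order` lists thread slots from highest to lowest priority.
    """
    for slot in priority_order:
        if slot in requesting:
            return slot
    return None


def rotate(priority_order):
    """Move the highest-priority slot to the back of the ranking."""
    return priority_order[1:] + priority_order[:1]


# Slots 1 and 3 contend for the same resource on four successive cycles.
order = [0, 1, 2, 3]
winners = []
for _cycle in range(4):
    winners.append(arbitrate({1, 3}, order))
    order = rotate(order)  # assumption: rotate once per cycle

print(winners)  # [1, 1, 3, 3]
```

Because the ranking rotates, slot 3 eventually outranks slot 1 and wins the resource, so neither thread is postponed indefinitely.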
Internal Organization of an SMT Architecture
Commercial Vector Processor Super-Computers
The vector processor concept is still a viable one. While general-purpose vector
supercomputers are being surpassed by much less expensive multiprocessors, the vector
processor concept has a likely future in special applications such as audio and video
processing and computer-generated imagery.
Array Processor for Video Decoding
Shared-Memory Multiprocessor
For speed and efficiency, each processor of a shared-memory multiprocessor system keeps
a cache of local memory, periodically updated from a common shared memory.
The shared memory of a parallel processing system needs a management scheme that
ensures that all processors keep a current version of all data values (this is called
memory coherence).
MESI Protocol
In the MESI protocol, a two-bit tag is used to designate the status of each address of
shared memory.
modified - When the status is modified, this means that the data value in cache
has been altered but is not currently held in the cache of any other processor.
This status indicates that the address must be written back to shared memory
before it is overwritten by another word.
exclusive - When the status is exclusive, this means that the data value is being
held only by the current processor and has not been modified. When it is time
to write over this value in cache, it does not need to be written back to the
shared memory.
shared - The shared status means that copies of this value may be stored in the
caches of other processors.
invalid - The invalid status indicates that this cache line is not valid. In order
to validate these data, the cache must be updated from shared memory.
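The four states can be read as a small state machine for a single cache line. The sketch below follows the commonly described MESI transitions in a simplified form; the event names (local_read, local_write, remote_read, remote_write) and the function mesi_next are assumptions for this illustration, and bus signalling details are omitted. Four states fit in the two-bit tag mentioned above.

```python
STATES = ("M", "E", "S", "I")  # four states, so a two-bit tag suffices


def mesi_next(state, event, others_have_copy=False):
    """Return (next_state, writeback_needed) for one cache line.

    `others_have_copy` matters only when an invalid line is read: the
    line becomes shared if another cache holds it, else exclusive.
    """
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    if event == "local_read":
        if state == "I":
            return ("S" if others_have_copy else "E"), False
        return state, False
    if event == "local_write":
        # The writer ends up with the only valid, altered copy.
        return "M", False
    if event == "remote_read":
        if state == "M":
            return "S", True   # altered data is written back first
        if state == "E":
            return "S", False
        return state, False    # S stays S, I stays I
    if event == "remote_write":
        # Another processor takes the line; a modified copy is
        # written back before this cache invalidates it.
        return "I", state == "M"
    raise ValueError(f"unknown event: {event}")


print(mesi_next("I", "local_read"))    # ('E', False)
print(mesi_next("E", "local_write"))   # ('M', False)
print(mesi_next("M", "remote_read"))   # ('S', True)
```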
Multicore Data Coherence
The MOESI protocol is an extension of the MESI protocol that adds a new status called
owned. A processor can write to a cache line it owns even if other processors are
holding copies.
When a processor modifies data it owns, it is responsible for updating the copies being
held by other processors.
The MOESI protocol is used in multicore CPUs in which processor-to-processor
communication is much faster than access to shared memory.
4-D Hypercube Interconnections
Neural Networks
The Future of Computer Architecture
The End of Moore's Law
It is believed that the ability to achieve process shrinks will continue into the
early 2010s but will reach its limit relatively soon.
Specifically, the quantum mechanical properties of electrons and atoms begin
to dominate in the substrate when the feature size reaches around 50 nanometers.
At sizes smaller than this, only a few electrons are needed to saturate the channel.
Statistical fluctuations due to thermal effects will make the switching of transistors
difficult to control.