Ch 12 - RISC

Chapter XI
Reduced Instruction Set Computing
(RISC)
CS 147
Li-Chuan Fang
Introduction
 The world of microprocessors and CPUs can be
divided into two parts:
   complex instruction set computers (CISC processors)
   reduced instruction set computers (RISC processors)
 CISC processors have larger instruction sets that
often include some particularly complex
instructions. These instructions usually correspond
to specific statements in high-level languages.
 RISC processors exclude these instructions, opting
for a smaller instruction set with simpler
instructions.
Overview
 the rationale for RISC processors
 RISC instruction sets
 instruction pipelines and register windows
RISC Rationale
 The first microprocessors ever developed
were very simple processors with very
simple instruction sets.
 Current CISC microprocessor instruction
sets may include over 300 instructions.
 In general, the greater the number of
instructions in an instruction set, the
greater the propagation delay is within the CPU.
RISC’s Features
 Fixed - Length Instructions
 Only Load and Store Instructions Access Memory
 Fewer Addressing Modes
 Instruction Pipeline
 Large Number of Registers
RISC’s Features cont.
 Hardwired Control Unit
 Delayed Loads and Branches
 Speculative Execution of Instructions
 Optimizing Compiler
 Separate Instruction and Data Streams
Fixed - Length Instructions
In RISC processors, every instruction has
the same size. For instance, an immediate
mode instruction might include an 8-bit
operand. Other instructions might use these
8 bits for opcodes or address information.
Only Load and Store Instructions Access Memory
All processors can load data from and store
data to memory.
RISC processors limit interaction with
memory to loading and storing data. If a
value from memory is to be ANDed with
the accumulator, the CPU first loads the
value into a register and then performs the
AND operation.
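The two-step sequence above can be sketched with a toy Python model. This is only an illustration under stated assumptions: the register names, the memory address, the bit patterns, and the helper functions are hypothetical and do not correspond to any particular processor.

```python
# Toy load/store machine: only load() and store() touch memory.
memory = {0x10: 0b1100_1010}          # hypothetical address and contents
regs = {"acc": 0b1111_0000, "r1": 0}  # hypothetical register names

def load(reg, addr):
    """Copy a word from memory into a register."""
    regs[reg] = memory[addr]

def store(reg, addr):
    """Copy a register into memory."""
    memory[addr] = regs[reg]

def and_regs(dst, src):
    """AND two registers; no memory access is involved."""
    regs[dst] &= regs[src]

# AND a memory value with the accumulator in two steps.
load("r1", 0x10)        # step 1: bring the operand into a register
and_regs("acc", "r1")   # step 2: register-to-register AND
```

A CISC processor might offer a single instruction that ANDs a memory operand directly; in this sketch the arithmetic helper never sees an address.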
Fewer Addressing Modes
RISC processors typically allow only a few
addressing modes that can be processed
quickly, such as register indirect and
relative modes.
Instruction Pipeline
A pipeline is like an assembly line in which many
products are being worked on simultaneously,
each at a different station.
In RISC processors, one instruction is executed
while the following instruction is being fetched.
By overlapping these operations, the CPU
completes one instruction per clock cycle once the
pipeline is full, even though each instruction
requires three cycles to be fetched, decoded, and
executed.
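The cycle counts behind this claim can be worked out with a short sketch. It assumes an ideal pipeline with no stalls or hazards and one clock cycle per stage; the function names are hypothetical.

```python
def pipelined_cycles(num_instructions, num_stages):
    """Cycles to finish all instructions in an ideal pipeline (no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to pass through;
    # each later instruction completes one cycle after the previous one.
    return num_stages + num_instructions - 1

def unpipelined_cycles(num_instructions, num_stages):
    """Cycles when each instruction must finish before the next one starts."""
    return num_stages * num_instructions

if __name__ == "__main__":
    n, k = 100, 3  # 100 instructions, three stages: fetch, decode, execute
    print(pipelined_cycles(n, k), unpipelined_cycles(n, k))
```

For long instruction sequences the pipelined count approaches one cycle per instruction, which is the sense in which the CPU executes one instruction per clock cycle.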
Large Number of Registers
Having a large number of registers allows the
CPU to store many operands internally.
When the operands are needed, the CPU fetches
them from the registers, rather than from memory.
This reduces the access time significantly. The
registers can also be used to pass parameters to
subroutines in an efficient manner; this is
accomplished using register windowing.
Hardwired Control Unit
Combinatorial logic generally has a lower
propagation delay than a lookup ROM. For this
reason, a hardwired control unit can run at a
higher clock frequency than its corresponding
microcoded control unit.
For RISC processors, the benefit of a higher clock
rate outweighs the advantages offered by
microcoded control units, such as ease of
modification.
Delayed Loads and Branches
RISC processors use delayed loads and
delayed branches to avoid waiting time.
The RISC instruction pipeline can
encounter hazards during branch
instructions or consecutive instructions that
use a common operand.
Speculative Execution of
Instructions
In speculative execution, the CPU executes
the instruction but does not store its result.
If the instruction is to be executed, the result
is stored. If not, the result is discarded.
Optimizing Compiler
An optimizing compiler can arrange
instructions to facilitate delayed loads and
branches, as well as to optimally assign
operands to registers. Fewer instructions
make it much simpler to design an
optimizing compiler for a RISC processor
than for a CISC processor.
Separate Instruction and Data
Streams
The instruction pipeline may need to access
instructions and operands from memory
simultaneously. Separating the instruction
and data streams helps to avoid memory
access conflicts.
RISC Instruction Sets
 The instruction sets of RISC processors are
reduced, or smaller in size than those of
CISC processors.
 A CISC processor might have over 300
instructions in its instruction set, but RISC
CPUs typically have fewer than 100.
Instruction Formats for the SPIM (MIPS) CPU
addi Rt, Rs, Imm
Field   opcode 0x08 | Rs | Rt | Immediate value
Bits    6           | 5  | 5  | 16
jal label
Field   opcode 0x03 | absolute jump target address
Bits    6           | 26
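As an illustration of the two formats, the fields can be packed into 32-bit words as follows. This is a minimal sketch: the function names are hypothetical, and it relies on the standard MIPS32 encoding, in which the Rs field precedes the Rt field and the 26-bit jump field holds the target address shifted right by two bits (the word address).

```python
def encode_addi(rt, rs, imm):
    """Pack addi Rt, Rs, Imm: opcode (6) | Rs (5) | Rt (5) | immediate (16)."""
    return (0x08 << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)

def encode_jal(target_address):
    """Pack jal label: opcode (6) | jump target field (26)."""
    # The 26-bit field stores the byte address divided by four.
    return (0x03 << 26) | ((target_address >> 2) & 0x03FFFFFF)

if __name__ == "__main__":
    # addi $8, $0, 5 and a jump to the example byte address 0x00400000
    print(hex(encode_addi(rt=8, rs=0, imm=5)))
    print(hex(encode_jal(0x00400000)))
```

Both instructions occupy exactly 32 bits, which is the fixed-length property described earlier.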
Instruction Pipelines and Register
Windows
 Two implementation techniques are commonly used in
RISC processors to improve performance:
   instruction pipeline: allows RISC processors to
  execute one instruction per clock cycle
   incorporation of large numbers of registers within
  the CPU: allows more variables to be stored in
  registers, rather than memory, which reduces the
  time needed to access data
Instruction Pipelines
 An instruction pipeline is very similar to a
manufacturing assembly line.
 An instruction pipeline processes an instruction
the way the assembly line processes a product.
 The first stage fetches the instruction from
memory.
 The second stage decodes the instruction and
fetches any required operands.
Instruction Pipelines cont.
 The third stage executes the instruction.
 The fourth stage stores the result.
 As with the assembly line, each stage
processes a different instruction simultaneously
(after an initial latency, or delay, to fill the
pipeline).
 This allows the CPU to execute one
instruction per clock cycle.
Instruction Pipelines cont.
 The IBM 801, the first RISC computer, also uses a four-stage instruction pipeline.
 Other processors, such as the RISC II, use only three
stages; they combine the execute and store result
operations in a single stage.
 The MIPS processor uses a five-stage pipeline; it
decodes the instruction and selects the operand
registers in separate stages.
 Note that each stage has a register that latches its data
at the end of the stage to synchronize data flow between
stages.
Instruction Pipelines cont.
 Although we could employ several complete
control units to process instructions, a single
pipelined control unit offers several
advantages.
 The primary advantage is the reduced hardware
requirements of the pipeline.
 A second advantage of instruction pipelines is the
reduced complexity of the memory interface.
Three-stage RISC Pipeline
Fetch instruction -> Decode instr., select regs. -> Execute instr., store result
Four-stage RISC Pipeline
Fetch instruction -> Decode instr., select regs. -> Execute instruction -> Store result
Five-stage RISC Pipeline
Fetch instruction -> Decode instruction -> Select registers -> Execute instruction -> Store result
Data flow through three-stage RISC pipeline
Clock cycle   1  2  3  4  5  6  7
Stage 1      11 12 13 14 15 16 17
Stage 2      -- 11 12 13 14 15 16
Stage 3      -- -- 11 12 13 14 15
Data flow through four-stage RISC pipeline
Clock cycle   1  2  3  4  5  6  7
Stage 1      11 12 13 14 15 16 17
Stage 2      -- 11 12 13 14 15 16
Stage 3      -- -- 11 12 13 14 15
Stage 4      -- -- -- 11 12 13 14
Data flow through five-stage RISC pipeline
Clock cycle   1  2  3  4  5  6  7
Stage 1      11 12 13 14 15 16 17
Stage 2      -- 11 12 13 14 15 16
Stage 3      -- -- 11 12 13 14 15
Stage 4      -- -- -- 11 12 13 14
Stage 5      -- -- -- -- 11 12 13
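The pattern in the data-flow tables above can be generated for any pipeline depth. The following is a rough sketch assuming an ideal pipeline in which every instruction advances one stage per clock cycle; the function names are hypothetical, and instructions are numbered 1, 2, 3, ... in program order.

```python
def pipeline_occupancy(num_stages, num_cycles):
    """Return rows[stage][cycle]: the instruction number in that stage,
    or None while the pipeline is still filling."""
    rows = []
    for stage in range(num_stages):
        row = []
        for cycle in range(num_cycles):
            # Instruction n enters stage s (0-based) in cycle n + s (1-based).
            instr = cycle - stage + 1
            row.append(instr if instr >= 1 else None)
        rows.append(row)
    return rows

def format_table(rows):
    """Render the occupancy rows as text, using -- for empty stages."""
    lines = []
    for stage, row in enumerate(rows, start=1):
        cells = " ".join("--" if i is None else f"{i:2d}" for i in row)
        lines.append(f"Stage {stage}  {cells}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_table(pipeline_occupancy(num_stages=5, num_cycles=7)))
```

Each additional stage adds one more cycle of initial latency before the last stage receives its first instruction.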
Pipelines’ Problems
 One problem is memory access.
 Another problem is caused by branch
statements.
How to solve these problems?
 problem 1 - memory access
   As we noted previously, the cache must separate
  instructions and data to avoid memory conflicts
  from the different stages of the pipeline.
 problem 2 - branch statements
   There is not much that the pipeline can do about
  this. Instead, an optimizing compiler is needed to
  reorder the instructions to avoid this problem.
Register Windowing
 The CPU can access data in registers more
quickly than data in memory, so having
more registers makes more data available
faster.
 Having more registers also helps reduce the
number of memory references, particularly
when calling and returning from
subroutines.
Register Windowing cont.
 Although a RISC processor has many
registers, it may not be able to access all of
them at any given time.
 Most RISC CPUs have some global
registers, which are always accessible.
 The remaining registers are windowed so
that only a subset of the registers is
accessible at any specific time.
Register windowing in the
SPARC processor
[Figure: 8 global registers, plus windowed registers arranged as windows # 1, # 2, and # 3; each window has 8 common input registers, 8 local registers, and 8 common output registers.]
Register Windowing cont.
 The RISC CPU must keep track of which
window is active and which windows
contain valid data.
 A window pointer register contains the
value of the window that is currently active.
 A window mask register contains 1 bit per
window and denotes which windows
contain valid data.
Register Windowing cont.
 Register windows provide their greatest benefit
when the CPU calls a subroutine.
 During the calling process, the register window is
moved down one window position.
 In the SPARC example, if window 1 is active and the
CPU calls a subroutine, the processor activates
window 2 by updating the window pointer and
window mask registers.
Register Windowing cont.
 The CPU can pass parameters to the
subroutine via the registers that overlap
both windows, instead of through memory;
this saves a significant amount of time in
accessing data.
 The CPU can use the same registers to
return results to the calling routine.
Register windowing in a CPU:
during execution of the main routine
[Figure: window pointer register = 00 (first window active); window mask register = 1000 (only first window has valid data); the first window holds Param. # 1, Param. # 2, and Param. # 3.]
Register windowing in a CPU:
executing a subroutine
[Figure: window pointer register = 01 (second window active); window mask register = 1100 (first two windows have valid data); Param. # 1, Param. # 2, Param. # 3, and Result are labeled across the first and second windows.]
Register windowing in a CPU:
after returning from the subroutine
[Figure: window pointer register = 00 (first window active); window mask register = 1000 (only first window has valid data); the first window holds Result.]
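The call and return sequence shown in the three figures above can be imitated with a small Python model. This is a sketch under simplifying assumptions, not a description of real SPARC hardware: the class and method names are hypothetical, four windows are assumed to match the 4-bit mask, the windows are not circular, and spilling to memory when the windows run out is not modeled. The output registers of one window are the input registers of the next.

```python
class RegisterWindows:
    """Toy register file with overlapping windows (8 in, 8 local, 8 out)."""

    def __init__(self, num_windows=4):
        self.num_windows = num_windows
        self.globals = [0] * 8                       # always accessible
        # Each window adds 16 new registers; the final 8 are the last outs.
        self.windowed = [0] * (16 * num_windows + 8)
        self.pointer = 0                             # active window
        self.mask = [1] + [0] * (num_windows - 1)    # 1 = holds valid data

    def _index(self, kind, n):
        offset = {"in": 0, "local": 8, "out": 16}[kind]
        return 16 * self.pointer + offset + n

    def read(self, kind, n):
        return self.windowed[self._index(kind, n)]

    def write(self, kind, n, value):
        self.windowed[self._index(kind, n)] = value

    def call(self):
        """Move down one window; the caller's outs become the callee's ins."""
        if self.pointer + 1 >= self.num_windows:
            raise OverflowError("no free window (spilling is not modeled)")
        self.pointer += 1
        self.mask[self.pointer] = 1

    def ret(self):
        """Move back up one window and mark the callee's window invalid."""
        if self.pointer == 0:
            raise RuntimeError("already in the first window")
        self.mask[self.pointer] = 0
        self.pointer -= 1

rw = RegisterWindows()
rw.write("out", 0, 7)            # main routine places a parameter
rw.call()                        # pointer 0 -> 1, mask 1000 -> 1100
param = rw.read("in", 0)         # subroutine sees the parameter, no memory used
rw.write("in", 1, param * 2)     # subroutine leaves a result in a shared register
rw.ret()                         # pointer 1 -> 0, mask 1100 -> 1000
result = rw.read("out", 1)       # main routine picks up the result
```

No value is copied during the call or the return; only the window pointer and the window mask change.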
Register Renaming
 More recent processors may use register renaming
to add flexibility to the idea of register
windowing.
 A processor that uses register renaming can select
any registers to comprise its working register
“window”.
 The CPU uses pointers to keep track of which
registers are active and which physical register
corresponds to each logical register.
Register Renaming cont.
 Unlike register windowing, in which only specific
registers are active at any given time, register
renaming allows any group of physical registers to
be active.
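The pointer bookkeeping described for register renaming can be sketched as a mapping from logical to physical registers plus a list of free physical registers. This is a simplified illustration only: the class name, method names, and register counts are hypothetical, and real processors add further machinery that is not shown here.

```python
class RenameTable:
    """Toy logical-to-physical register map with a free list."""

    def __init__(self, num_logical, num_physical):
        # Initially logical register i lives in physical register i.
        self.mapping = list(range(num_logical))
        self.free = list(range(num_logical, num_physical))

    def lookup(self, logical):
        """Physical register that currently holds this logical register."""
        return self.mapping[logical]

    def rename(self, logical):
        """Assign a fresh physical register to a logical register.
        Returns (old_physical, new_physical)."""
        if not self.free:
            raise RuntimeError("no free physical registers")
        old = self.mapping[logical]
        new = self.free.pop(0)
        self.mapping[logical] = new
        return old, new

    def release(self, physical):
        """Return a physical register to the free list once it is unused."""
        self.free.append(physical)

table = RenameTable(num_logical=4, num_physical=8)
old, new = table.rename(2)   # logical register 2 moves to a new physical one
table.release(old)           # the old physical register can be reused later
```

Because any free physical register can back any logical register, the active set is not restricted to a fixed block as it is with register windows.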