Chapter 1
An Introduction to Processor Design
Pusan National University
Department of Computer Engineering
1.1 Processor Architecture & Organization
All modern general-purpose computers employ the “stored-program concept”
IAS computer by von Neumann at the Institute for Advanced Study, Princeton (in 1946)
First implemented in the ‘Baby’ machine at the Univ. of Manchester, England (in 1948)
[Figure 1.1] The state in a stored-program digital computer
(figure: a processor containing registers sends an address to memory and exchanges instructions and data with it; the memory holds instructions and data at addresses 00..00 (hex) to FF..FF (hex))
2015-04-09
PNU Computer Eng.
1.1 Processor Architecture & Organization
50 years of development:
  Performance of processors: increased
  Cost: decreased
  Cost-effective computers (principles of operation not changed much)
Most of the improvements:
  Advances in the technology of electronics
    Vacuum tubes -> transistors -> ICs -> VLSI
  New insights:
    Virtual memory (early 1960s)
    Cache memory
    Pipelining
    RISC
1.2 Abstraction in Hardware Design
Transistors (elementary component)
Logically act as inverters
Logic gates
CMOS NAND gate (using 4 transistors)
If A = B = Vdd, output = Vss
If either A or B (or both) = Vss, output = Vdd
=> output = not(A.B)
Transistor circuit, logic symbol, truth table
(figure: inputs A and B, output not(A.B), supply rails Vdd and Vss)
Truth table
A  B  Output
0  0  1
0  1  1
1  0  1
1  1  0
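The NAND behaviour above can be checked with a few lines of Python. This is only a logic-level sketch: it assumes logic 1 stands for Vdd and logic 0 for Vss, and the function names are made up for illustration.

```python
# Logic-level model only: 1 stands for Vdd, 0 for Vss (an assumption of this sketch).

def nand(a: int, b: int) -> int:
    """Return not(A.B): the output is 0 only when both inputs are 1."""
    return 0 if (a == 1 and b == 1) else 1


def inverter(a: int) -> int:
    """An inverter can be modelled as a NAND gate with both inputs tied together."""
    return nand(a, a)


def truth_table():
    """List (A, B, Output) rows in the same order as the truth table on the slide."""
    return [(a, b, nand(a, b)) for a in (0, 1) for b in (0, 1)]


if __name__ == "__main__":
    print("A  B  Output")
    for a, b, out in truth_table():
        print(f"{a}  {b}  {out}")
```

The caller only sees inputs and outputs, which is the point of the gate abstraction: nothing in `truth_table` depends on how `nand` is built.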
1.2 Abstraction in Hardware Design
The gate abstraction
Simplifies the process of designing circuits with a great number of transistors
Removes the need to know that the gate is built from transistors
Function-level design is free from the implementation technology
E.g., field-effect transistors, bipolar transistors, etc.
However, performance difference exists
Levels of abstraction
Trs
Gates, memory cells
Adder, MUX, decoder, registers
ALUs, shifters, memory blocks
Processors, peripherals, memories
ICs
PCBs
PCs, controllers, mobile phones
1.3 MU0 – a simple processor
A simple form of processor can be built from a few
basic components
PC (program counter)
ACC (accumulator)
ALU (arithmetic-logic unit)
IR (instruction register)
Instruction decoder, control logic
The MU0 instruction set
A 16-bit machine with a 12-bit address space (4K words x 2 bytes = 8 Kbytes of memory)
Instructions: 16 bits long (opcode: 4 bits, address field S: 12 bits)
Instruction format: opcode (4 bits) | S (12 bits)
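The 16-bit format (4-bit opcode in the top bits, 12-bit address field S below it) can be packed and unpacked with shifts and masks. A minimal sketch; the function names `encode` and `decode` are hypothetical.

```python
def encode(opcode: int, s: int = 0) -> int:
    """Pack a 4-bit opcode and a 12-bit address field S into one 16-bit word."""
    if not 0 <= opcode <= 0xF:
        raise ValueError("opcode must fit in 4 bits")
    if not 0 <= s <= 0xFFF:
        raise ValueError("S must fit in 12 bits")
    return (opcode << 12) | s


def decode(word: int) -> tuple:
    """Split a 16-bit word into (opcode, S)."""
    return (word >> 12) & 0xF, word & 0xFFF


if __name__ == "__main__":
    word = encode(0b0010, 0x123)
    print(hex(word), decode(word))
```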
1.3 MU0 – a simple processor
[Table 1.1] The MU0 instruction set
Instruction  Opcode  Effect
LDA S        0000    ACC := mem16[S]
STO S        0001    mem16[S] := ACC
ADD S        0010    ACC := ACC + mem16[S]
SUB S        0011    ACC := ACC - mem16[S]
JMP S        0100    PC := S
JGE S        0101    if ACC >= 0 PC := S
JNE S        0110    if ACC != 0 PC := S
STP          0111    stop
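Table 1.1 is small enough to turn into an instruction-level simulator. The sketch below is one possible reading, not the MU0 hardware: it assumes execution starts at address 0, that ACC is a 16-bit two's-complement value (so "ACC >= 0" means the sign bit is clear), and that arithmetic wraps around at 16 bits. The name `run_mu0` and the step limit are made up for illustration.

```python
MASK16 = 0xFFFF   # 16-bit data words
MASK12 = 0x0FFF   # 12-bit addresses


def run_mu0(program, max_steps=10_000):
    """Run MU0 code until STP. Returns (ACC, memory).

    Assumptions of this sketch: PC starts at 0, ACC is 16-bit two's complement.
    """
    mem = list(program) + [0] * (4096 - len(program))  # 4K words of memory
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = mem[pc]                      # fetch
        pc = (pc + 1) & MASK12
        opcode, s = ir >> 12, ir & MASK12  # decode
        if opcode == 0b0000:    # LDA S
            acc = mem[s]
        elif opcode == 0b0001:  # STO S
            mem[s] = acc
        elif opcode == 0b0010:  # ADD S
            acc = (acc + mem[s]) & MASK16
        elif opcode == 0b0011:  # SUB S
            acc = (acc - mem[s]) & MASK16
        elif opcode == 0b0100:  # JMP S
            pc = s
        elif opcode == 0b0101:  # JGE S: taken when the sign bit of ACC is clear
            if acc & 0x8000 == 0:
                pc = s
        elif opcode == 0b0110:  # JNE S
            if acc != 0:
                pc = s
        elif opcode == 0b0111:  # STP
            return acc, mem
        else:
            raise ValueError(f"undefined opcode {opcode:04b}")
    raise RuntimeError("step limit reached without STP")


# LDA 4; ADD 5; STO 6; STP; then data words 2, 3 and a result slot.
ADD_PROGRAM = [0x0004, 0x2005, 0x1006, 0x7000, 2, 3, 0]

if __name__ == "__main__":
    acc, mem = run_mu0(ADD_PROGRAM)
    print(acc, mem[6])
```

Instructions and data share the same memory list, which is the stored-program concept from section 1.1 in miniature.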
1.3 MU0 – a simple processor
Datapath
A register transfer level (RTL) design style based on registers, MUXs, and so on
[Figure 1.5] MU0 datapath example
(figure: PC, IR, ALU, ACC, and control logic, connected to memory by the address bus and the data bus)
RTL level design
[Figure 1.6] MU0 register transfer level organization
Control signals:
Enables on all of the registers
Function select lines to the ALU
Select control lines for the two MUXs
Control for a tri-state driver to send the ACC value to memory
MEMrq (memory request)
RnW (read/write control line)
1.4 Instruction set design
To build a high-performance processor (beyond MU0
inst. set), inst. set design is important.
4 address insts (the most general form)
Ex) add d, s1, s2, next_i; d := s1 + s2
Format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits) | next_i addr. (n bits)
3 address insts
Make address of the next inst. implicit using PC (except for branch)
Ex) add d, s1, s2; d := s1 + s2
Format: function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits)
1.4 Instruction set design
2 address insts
Make the destination reg. the same as one of the source regs.
Ex) add d, s1; d := d + s1
Format: function (f bits) | op 1 addr. (n bits) | dest. addr. (n bits)
1 address insts
The accumulator (AC) is used as destination
Ex) add s1; AC := AC + s1
Format: function (f bits) | op 1 addr. (n bits)
0 address insts (using a stack)
Ex) add; tos := tos + next on stack
Format: function (f bits)
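Each step from 4 addresses down to 0 drops one n-bit field, so the instruction length is f + k x n bits for a k-address format. A minimal sketch of that arithmetic; the widths f = 8 and n = 12 are hypothetical values chosen for illustration, since the slides leave f and n symbolic.

```python
def instruction_bits(f_bits: int, n_bits: int, addresses: int) -> int:
    """Length of an instruction with one f-bit function field and `addresses` n-bit address fields."""
    return f_bits + addresses * n_bits


# Hypothetical field widths, not taken from the slides.
F_BITS, N_BITS = 8, 12

if __name__ == "__main__":
    for k in (4, 3, 2, 1, 0):
        print(f"{k}-address format: {instruction_bits(F_BITS, N_BITS, k)} bits")
```

Shorter formats save bits per instruction but make more operands implicit (PC, the destination register, the accumulator, or the stack), so more instructions may be needed for the same computation.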
1.4 Instruction set design
Addressing modes
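Immediate addressing: inst. contains the data itself
Absolute addressing: inst. contains full address for data
Indirect addressing: inst. contains address of location that contains address of data
Register addressing: data is in a reg.
Register indirect addressing: a reg. contains address of data
Index addressing: address of data is a base reg. plus an offset
Stack addressing: data is on top of a stack (implicit address)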
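The difference between the modes is just how many lookups sit between the instruction field and the data. A rough sketch using the standard textbook meaning of each mode; the function name, the mode strings, and the (base register, offset) encoding for index addressing are assumptions of this example, and stack addressing is left out.

```python
def fetch_operand(mode, field, memory, regs):
    """Return the operand selected by `field` under the given addressing mode.

    `memory` and `regs` are plain lists; for "index", `field` is a
    (base_register_number, offset) pair (an encoding invented for this sketch).
    """
    if mode == "immediate":          # the field is the data
        return field
    if mode == "absolute":           # the field is the address of the data
        return memory[field]
    if mode == "indirect":           # the field addresses a location holding the address
        return memory[memory[field]]
    if mode == "register":           # the field names a register holding the data
        return regs[field]
    if mode == "register_indirect":  # the named register holds the address
        return memory[regs[field]]
    if mode == "index":              # address = base register + offset
        base_reg, offset = field
        return memory[regs[base_reg] + offset]
    raise ValueError(f"unknown addressing mode: {mode}")


if __name__ == "__main__":
    memory = [10, 3, 0, 42, 7]
    regs = [4, 1]
    for mode, field in [("immediate", 5), ("absolute", 3), ("indirect", 1),
                        ("register", 0), ("register_indirect", 0), ("index", (1, 2))]:
        print(mode, fetch_operand(mode, field, memory, regs))
```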
1.4 Instruction set design
Control flow instructions
  Branch, jump
  Conditional branch
  Subroutine calls & returns
  System calls
    Branch to an operating system routine
  Exceptions
    Error handling
1.5 Processor design trade-offs
CISC vs RISC
CISC
  To reduce the semantic gap b/w high-level languages & machine instructions
  Complex sequences of operations in a single instruction
  Makes the compiler's job easier
RISC
  ARM's middle name comes from RISC
  Reducing the semantic gap is not the right way to make an efficient computer
[Table 1.3] Typical dynamic instruction usage
Instruction type       Dynamic usage
Data movement          43%
Control flow           23%
Arithmetic operations  15%
Comparisons            13%
Logical operations     5%
Other                  1%
1.5 Processor design trade-offs
Data movement b/w regs and memory: almost half of the insts executed
Control flow such as branches & procedure calls: almost a quarter
Arithmetic operations: only 15%
  Complex arithmetic insts do not help much
The most important techniques to make processors go faster: pipelining, cache memory
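Why complex arithmetic instructions "do not help much" can be made concrete with Amdahl's law, which the slides do not name but which fits the argument. The sketch assumes, for illustration only, that every instruction takes the same time, so the 15% dynamic usage is also 15% of execution time.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the time is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


ARITHMETIC_FRACTION = 0.15  # from Table 1.3, read as a share of time (assumption)

if __name__ == "__main__":
    print(overall_speedup(ARITHMETIC_FRACTION, 2.0))           # arithmetic twice as fast
    print(overall_speedup(ARITHMETIC_FRACTION, float("inf")))  # arithmetic made free
```

Under that assumption, doubling arithmetic speed gains about 8% overall, and even making arithmetic free gains less than 18%, so effort is better spent on techniques that speed up every instruction.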
1.5 Processor design trade-offs
Pipelines
1. Fetch
2. Decode
3. REG: get operands from register bank
4. ALU
5. MEM: access memory for an operand, if necessary
6. RES: write result back to register bank
[Figure 1.13] Pipelined instruction execution
(figure: instructions 1 to 3 plotted against time; each passes through fetch, dec, reg, ALU, mem, res, starting one stage after the previous instruction)
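The timing in Figure 1.13 follows a simple rule: instruction i enters the pipeline one cycle after instruction i-1. A minimal sketch of an ideal pipeline with no hazards, assuming one cycle per stage; the function names are made up for illustration.

```python
STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def schedule(n_instructions: int, stages=STAGES) -> dict:
    """Map each instruction (numbered from 1) to the cycle in which it occupies each stage."""
    return {
        i: {stage: i + k for k, stage in enumerate(stages)}
        for i in range(1, n_instructions + 1)
    }


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles to finish n instructions when a new one starts every cycle."""
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles to finish n instructions when each runs to completion before the next starts."""
    return n_stages * n_instructions


if __name__ == "__main__":
    for i, timing in schedule(3).items():
        print(i, timing)
    print(pipelined_cycles(3), unpipelined_cycles(3))
```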
1.5 Processor design trade-offs
Pipeline hazards
Read after write hazard (data hazard)
Result from one inst is used as an operand by the next inst => inst 2 must stall until the result is available
[Figure 1.14] Read-after-write pipeline hazard
(figure: instruction 1: fetch, dec, reg, ALU, mem, res; instruction 2: fetch, dec, stall, reg, ALU, mem, res)
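How long the stall lasts depends on details the slide does not give. The sketch below assumes the simplest case: no forwarding, one cycle per stage, and a register can only be read in the cycle after the producing instruction's res stage. Under those assumptions the stall length follows from the stage positions.

```python
STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def raw_stall_cycles(distance: int = 1, stages=STAGES) -> int:
    """Stall cycles for an instruction issued `distance` slots after the one whose result it reads.

    Assumes no forwarding: the consumer's reg stage must come after the
    producer's res stage (an assumption of this sketch, not a statement
    about any particular processor).
    """
    reg_pos = stages.index("reg")
    res_pos = stages.index("res")
    earliest_read = res_pos + 1            # first cycle in which the result can be read
    unstalled_read = distance + reg_pos    # when the consumer would read without stalling
    return max(0, earliest_read - unstalled_read)


if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(d, raw_stall_cycles(d))
```

The stall shrinks as independent instructions are placed between producer and consumer, which is one reason instruction ordering matters on pipelined machines.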
1.5 Processor design trade-offs
Branch hazard
  Solution:
    Compute branch target earlier (if possible)
    The target may be computed speculatively
    Delayed branch
[Figure 1.15] Pipelined branch behavior
(figure: instruction 1 is the branch; instructions 2 to 4 follow it into the pipeline before instruction 5, the branch target, is fetched)
Pipeline efficiency
  The deeper the pipeline, the worse the problems get: RISC approach is better
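The cost of branch hazards can be estimated with a back-of-the-envelope formula: average cycles per instruction = base CPI + (fraction of branches) x (penalty cycles). The sketch assumes a base CPI of 1 and, as a pessimistic simplification, that every control flow instruction (23% in Table 1.3) pays a fixed penalty; the 3-cycle penalty is a hypothetical value.

```python
def effective_cpi(branch_fraction: float, penalty_cycles: float, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction when each branch wastes `penalty_cycles` cycles."""
    return base_cpi + branch_fraction * penalty_cycles


if __name__ == "__main__":
    # 23% control flow from Table 1.3; the penalties are hypothetical.
    for penalty in (1, 3, 5):
        print(penalty, effective_cpi(0.23, penalty))
```

Because the penalty grows with the number of stages before the branch target is known, the same branch fraction costs more in a deeper pipeline, and computing the target earlier reduces the penalty term directly.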
1.6 RISC
In 1980, Patterson: RISC I project
RISC I arch
  Fixed (32-bit) inst size with few formats
  Load-store arch:
    Insts that process data operate only on regs
    Separate insts to access memory
  A large register bank (32 32-bit regs) to allow load-store arch to operate efficiently
RISC I organization
  Hard-wired inst decode logic
  Pipelined execution
  Single cycle execution
RISC I advantages
  A smaller die size
  A shorter development time
  A higher performance (controversial)