ComputerArchitecture_v4.0
Download
Report
Transcript ComputerArchitecture_v4.0
Computer Architecture
This course for using in HEDSPI Project
Nguyen Thanh Kien
Department of Computer Engineering
Faculty of Information Technology
Hanoi University of Technology
Acknowledge
These materials are used as reference for this slide:
– “Computer Architecture, Background and Motivation” slides of UCSB,
B.Parhami.
– Computer architecture - Behrooz Parhami
– Computer organization and design - John L. Hennessy & David A.
Patterson
Page 2
© NTK 2009
Reference
Text Books:
– Computer architecture - Behrooz Parhami
– Computer organization and design - John L. Hennessy & David A.
Patterson
Reference books
– Computer Architecture and Organization - John P. Hayes
– Computer Organization and Architecture – Designing for Performance William Stallings
– Computer Architecture: Single and Parallel Systems - Mehdi R.
Zargham
– Assembly Language Programming and Organization of the IBM-PC Ytha Yu, Charles Marut
Page 3
© NTK 2009
About
Lecturer:
– Nguyen Thanh Kien
– Department of Computer Engineering, Faculty of Information
Technology, Hanoi University of Technology
– Mobile: +84 983588135
– Email: [email protected]
[email protected]
– Address:
• Room 322, C1, Hanoi University of Technology
• No.1, Dai Co Viet, Hai Ba Trung, Hanoi.
Page 4
© NTK 2009
Content
1. Introduction - Computer system technology and Computer
Performance
2. Instruction Set Architecture
3. Arithmetic for Computer
4. CPU Organization
Page 5
© NTK 2009
Content
1. Introduction - Computer system technology and Computer
Performance
2. Instruction Set Architecture
3. Arithmetic for Computer
4. CPU Organization
Page 6
© NTK 2009
Content
1. Introduction - Computer system technology and Computer
Performance
2. Instruction Set Architecture
3. Arithmetic for Computer
4. CPU Organization
Page 7
© NTK 2009
2. Instruction Set Architecture
2.1. Instructions and addressing
2.2. Procedures and Data
Page 8
© NTK 2009
2.1. Instructions and addressing
2.1.1 Abstract View of Hardware
2.1.2 Instruction Formats
2.1.3 Simple Arithmetic / Logic Instructions
2.1.4 Load and Store Instructions
2.1.5 Jump and Branch Instructions
2.1.6 Addressing Modes
Page 9
© NTK 2009
2.1. Instructions and addressing
2.1.1 Abstract View of Hardware
2.1.2 Instruction Formats
2.1.3 Simple Arithmetic / Logic Instructions
2.1.4 Load and Store Instructions
2.1.5 Jump and Branch Instructions
2.1.6 Addressing Modes
Page 10
© NTK 2009
2.1.1 Abstract View of Hardware
...
m 2 32
Loc 0 Loc 4 Loc 8
4 B / location
Memory
up to 2 30 words
Loc
Loc
m 8 m 4
...
EIU
(Main proc.)
$0
$1
$2
$31
ALU
Execution
& integer
unit
FPU
(Coproc. 1)
Integer
mul/div
Hi
FP
arith
$0
$1
$2
Floatingpoint unit
$31
Lo
TMU
Chapter
10
Chapter
11
Chapter
12
BadVaddr Trap &
(Coproc. 0) Status memory
Cause unit
EPC
Memory and processing subsystems for MiniMIPS.
Page 11
© NTK 2009
Data Types
Byte =Byte
8 bits
Halfword= 2 bytes
Halfword
Word =Word
4 bytes
Doubleword
= 8 bytes
Doubleword
MiniMIPS registers hold 32-bit (4-byte) words. Other common
data sizes include byte, halfword, and doubleword.
Page 12
© NTK 2009
$0
$1
0
$zero
$at
$2
$3
$v0
$v1
$4
$a0
$5
$6
$7
$a1
$a2
$a3
$8
$t0
$9
$10
$t1
$t2
$11
$t3
$12
$13
$t4
$t5
$14
$t6
$15
$16
$t7
$s0
$17
$18
$s1
$s2
$19
$s3
$20
$21
$s4
$s5
$22
$s6
$23
$24
$s7
$t8
$25
$t9
$26
$27
$28
$k0
$k1
$gp
$29
$sp
$30
$31
$fp
$ra
Page 13
A 4-byte word
sits in consecutive
memory addresses
according to the
big -endian order
(most significant
byte has the
lowest address)
Reserved for assembler use
Procedure results
Procedure
arguments
Saved
Byte numbering:
3
3
2
1
0
2
1
Register Conventions
0
When loading
a byte into a
register, it goes
in the low end Byte
Temporary
values
Word
Doubleword
Operands
Saved
across
procedure
calls
A doubleword
sits in consecutive
registers or
memory locations
according to the
big -endian order
(most significant
word comes first)
More
temporaries
Reserved for OS (kernel)
Global pointer
Stack pointer
Frame pointer
Return address
Saved
© NTK 2009
Registers and
data sizes in
MiniMIPS.
$4
$5
Registers
$6
$7
$8
$9
$10
$11
$12
$13
$14
$15
$16
$17
$18
$19
$20
$21
$22
$23
$24
$25
$26
$27
$28
Page 14
$29
Used
$a0
$a1
in
$a2
$a3
$t0
$t1
$t2
$t3
$t4
$t5
$t6
$t7
$s0
$s1
$s2
$s3
$s4
$s5
$s6
$s7
$t8
$t9
$k0
$k1
$gp
$sp
Procedure
Thisarguments
Chapter
big-endian order
(most significant
byte has the
lowest address)
Saved
10 temporary registers
Byte numbering:
Temporary
values
8 operand registers
3
2
When loading
a byte into a
register, it goe
in the low end
Word
Doublew ord
Operands
Saved
across
procedure
calls
More
temporaries
Reserved for OS (kernel)
Global pointer
© NTK 2009
Stack pointer
A doublewor
sits in conse
registers or
memory loca
according to
big-endian o
(most signifi
2.1.2 Instruction Formats
High-level language statement:
a = b + c
Assembly language instruction:
add $t8, $s2, $s1
Machine language instruction:
000000 10010 10001 11000 00000 100000
ALU-type Register Register Register
Addition
Unused
instruction
18
17
24
opcode
Instruction
cache
P
C
Register
file
Data cache
(not used)
$17
$18
Instruction
fetch
Register
readout
Register
file
ALU
$24
Operation
Data
read/store
Register
writeback
A typical instruction for MiniMIPS and steps in its execution.
Page 15
© NTK 2009
Add, Subtract, and Specification of Constants
MiniMIPS add & subtract instructions; e.g., compute:
g = (b + c) (e + f)
add
add
sub
$t8,$s2,$s3
$t9,$s5,$s6
$s7,$t8,$t9
# put the sum b + c in $t8
# put the sum e + f in $t9
# set g to ($t8) ($t9)
Decimal and hex constants
Decimal
Hexadecimal
25, 123456, 2873
0x59, 0x12b4c6, 0xffff0000
Machine instruction typically contains
an opcode
one or more source operands
possibly a destination operand
Page 16
© NTK 2009
MiniMIPS Instruction Formats
31
R
31
I
31
J
op
25
rs
20
rt
15
6 bits
5 bits
5 bits
Opcode
Source
register 1
Source
register 2
op
25
rs
20
rt
rd
10
5 bits
Destination
register
15
sh
fn
5
5 bits
6 bits
Shift
amount
Opcode
extension
operand / offset
6 bits
5 bits
5 bits
16 bits
Opcode
Source
or base
Destination
or data
Immediate operand
or address offset
op
25
jump target address
0
0
6 bits
1 0 0 0 0 0 0 0 0 0 0 0 26
0 bits
0 0 0 0 0 0 0 1 1 1 1 0 1
Opcode
Memory word address (byte address divided by 4)
MiniMIPS instructions come in only three formats: register (R),
immediate (I), and jump (J).
Page 17
0
© NTK 2009
2.1.3 Simple Arithmetic/Logic Instructions
Add and subtract already discussed; logical instructions are similar
add
sub
and
or
xor
nor
31
R
$t0,$s0,$s1
$t0,$s0,$s1
$t0,$s0,$s1
$t0,$s0,$s1
$t0,$s0,$s1
$t0,$s0,$s1
op
25
rs
#
#
#
#
#
#
20
rt
set
set
set
set
set
set
15
$t0
$t0
$t0
$t0
$t0
$t0
rd
to
to
to
to
to
to
10
($s0)+($s1)
($s0)-($s1)
($s0)($s1)
($s0)($s1)
($s0)($s1)
(($s0)($s1))
sh
5
fn
0
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 x 0
ALU
instruction
Source
register 1
Source
register 2
Destination
register
Unused
add = 32
sub = 34
The arithmetic instructions add and sub have a format that is common
to all two-operand ALU instructions. For these, the fn field specifies the
arithmetic/logic operation to be performed.
Page 18
© NTK 2009
Arithmetic/Logic with One Immediate Operand
An operand in the range [32 768, 32 767], or [0x0000, 0xffff],
can be specified in the immediate field.
addi
andi
ori
xori
$t0,$s0,61
$t0,$s0,61
$t0,$s0,61
$t0,$s0,0x00ff
#
#
#
#
set
set
set
set
$t0
$t0
$t0
$t0
to
to
to
to
($s0)+61
($s0)61
($s0)61
($s0) 0x00ff
For arithmetic instructions, the immediate operand is sign-extended
31
I
op
25
rs
20
rt
15
operand / offset
0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
1 0 Errors 0 1
addi = 8
Source
Destination
Immediate operand
Instructions such as addi allow us to perform an arithmetic or logic
operation for which one operand is a small constant.
Page 19
0
© NTK 2009
2.1.4 Load and Store Instructions
31
I
op
25
rs
20
rt
15
operand / offset
0
1 0 x 0 1 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
lw = 35
sw = 43
Memory
A[0]
A[1]
A[2]
.
.
.
A[i]
Base
register
lw
lw
Data
register
$t0,40($s3)
$t0,A($s3)
Address in
base register
Offset = 4i
Element i
of array A
Offset relative to base
Note on base and offset:
The memory address is the sum
of (rs) and an immediate value.
Calling one of these the base
and the other the offset is quite
arbitrary. It would make perfect
sense to interpret the address
A($s3) as having the base A
and the offset ($s3). However,
a 16-bit base confines us to a
small portion of memory space.
MiniMIPS lw and sw instructions and their memory addressing
convention that allows for simple access to array elements via a base
address and an offset (offset = 4i leads us to the i th word).
Page 20
© NTK 2009
lw, sw, and lui Instructions
lw
sw
$t0,40($s3)
$t0,A($s3)
lui
$s0,61
31
I
op
25
rs
# load mem[40+($s3)] in $t0
# store ($t0) in mem[A+($s3)]
# “($s3)” means “content of $s3”
# The immediate value 61 is
# loaded in upper half of $s0
# with lower 16b set to 0s
20
rt
15
operand / offset
0
0 0 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
lui = 15
Unused
Destination
Immediate operand
0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Content of $s0 after the instruction is executed
The lui instruction allows us to load an arbitrary 16-bit value into the
upper half of a register while setting its lower half to 0s.
Page 21
© NTK 2009
Initializing a Register
Example
Show how each of these bit patterns can be loaded into $s0:
0010 0001 0001 0000 0000 0000 0011 1101
1111 1111 1111 1111 1111 1111 1111 1111
Solution
The first bit pattern has the hex representation: 0x2110003d
lui
ori
$s0,0x2110
# put the upper half in $s0
$s0, $s0, 0x003d# put the lower half in $s0
Same can be done, with immediate values changed to 0xffff
for the second bit pattern. But, the following is simpler and faster:
nor
Page 22
$s0,$zero,$zero # because (0 0) = 1
© NTK 2009
2.1.5 Jump and Branch Instructions
Unconditional jump and jump through register instructions
j
jr
verify
$ra
31
J
op
# go to mem loc named “verify”
# go to address that is in $ra;
# $ra may hold a return address
jump target address
25
0 0 0 0 1 0
0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
j=2
x x x x 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
From PC
31
R
op
Effective target address (32 bits)
25
rs
20
rt
15
rd
10
sh
5
fn
0
0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
ALU
instruction
Source
register
Unused
Unused
Unused
jr = 8
The jump instruction j of MiniMIPS is a J-type instruction which is shown
along with how its effective target address is obtained. The jump register
(jr) instruction is R-type, with its specified register often being $ra.
Page 23
© NTK 2009
Conditional Branch Instructions
Conditional branches use PC-relative addressing
bltz $s1,L
beq $s1,$s2,L
bne $s1,$s2,L
31
I
op
25
20
rt
15
operand / offset
31
op
Source
25
rs
Zero
20
rt
Relative branch distance in words
15
operand / offset
0
0 0 0 1 0 x 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
beq = 4
bne = 5
Source 1
Source 2
Relative branch distance in words
Conditional branch instructions of MiniMIPS.
Page 24
0
0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
bltz = 1
I
rs
# branch on ($s1)< 0
# branch on ($s1)=($s2)
# branch on ($s1)($s2)
© NTK 2009
Comparison Instructions for Conditional Branching
slt
$s1,$s2,$s3
slti
$s1,$s2,61
31
R
op
25
20
if ($s2)<($s3), set $s1 to 1
else set $s1 to 0;
often followed by beq/bne
if ($s2)<61, set $s1 to 1
else set $s1 to 0
rt
15
rd
10
sh
31
op
fn
0
Source 1
register
25
rs
Source 2
register
20
rt
Destination
15
Unused
operand / offset
slt = 42
0
0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
slti = 10
Source
Destination
Immediate operand
Comparison instructions of MiniMIPS.
Page 25
5
0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0
ALU
instruction
I
rs
#
#
#
#
#
© NTK 2009
Examples for Conditional Branching
If the branch target is too far to be reachable with a 16-bit offset
(rare occurrence), the assembler automatically replaces the branch
instruction beq $s0,$s1,L1 with:
bne
j
L2: ...
$s1,$s2,L2
L1
# skip jump if (s1)(s2)
# goto L1 if (s1)=(s2)
Forming if-then constructs; e.g., if (i == j) x = x + y
bne $s1,$s2,endif
add $t1,$t1,$t2
endif: ...
# branch on ij
# execute the “then” part
If the condition were (i < j), we would change the first line to:
slt
beq
Page 26
$t0,$s1,$s2
$t0,$0,endif
# set $t0 to 1 if i<j
# branch if ($t0)=0;
# i.e., i not< j or ij
© NTK 2009
if-then-else Statements
Example
Show a sequence of MiniMIPS instructions corresponding to:
if (i<=j) x = x+1; z = 1; else y = y–1; z = 2*z
Solution
slt
bne
addi
addi
j
else: addi
add
endif:...
Page 27
$t0,$s2,$s1
$t0,$zero,else
$t1,$t1,1
$t3,$zero,1
endif
$t2,$t2,-1
$t3,$t3,$t3
#
#
#
#
#
#
#
© NTK 2009
j<i? (inverse condition)
if j<i goto else part
begin then part: x = x+1
z = 1
skip the else part
begin else part: y = y–1
z = z+z
while Statements
Example
The simple while loop: while (A[i]==k) i=i+1;
Assuming that: i, A, k are stored in $s1,$s2,$s3
Solution
loop: add
$t1,$s1,$s1 # t1 = 4*i
add
$t1,$t1,$t1 #
add
$t1,$t1,$s2 # t1 = A + 4*i
lw
$t0,0($t1)
# t0 = A[i]
bne
$t0,$s3,endwhl
#
addi $s1,$s1,1
#
j
#
loop
endwhl: …
Page 28
#
© NTK 2009
switch Statements
Example
The simple switch
switch(test) {
case 0:
a=a+1; break;
case 1:
a=a-1; break;
case 2:
b=2*b; break;
default:
}
Assuming that: test,a,b are
stored in $s1,$s2,$s3
Page 29
beq
beq
beq
b
case_0:
addi
b
case_1:
sub
b
case_2:
add
b
default:
continue:
© NTK 2009
s1,t0,case_0
s1,t1,case_1
s1,t2,case_2
default
s2,s2,1
continue
#a=a+1
s2,s2,t1
continue
#a=a-1
s3,s3,s3
continue
#b=2*b
2.1.6 Addressing Modes
Addressing mode is the method by which the location of an operand
is specified within an instruction. MiniMIPS uses sixs addr modes:
Addressing
Instruction
Other elements involved
Reg spec
Register
Reg file
Constant offset
Base
Reg base
Reg file
Reg
data
Constant offset
Reg data
Mem
Add addr
Mem
Add addr
PC
Page 30
addi, ori…
Extend,
if required
Immediate
Pseudodirect
jal
Some place
in the machine
Implied
PC-relative
Operand
PC
lw, sw…
Mem
Memory data
Mem
Memory data
branch instr
Mem
addr Memory Mem
data
© NTK 2009
Schematic representation of addressing modes in MiniMIPS.
j
Finding the Maximum Value in a List of Integers
Example
List A is stored in memory beginning at the address given in $s1.
List length is given in $s2.
Find the largest integer in the list and copy it into $t0.
Solution
lw
$t0,0($s1)
# initialize maximum to A[0]
addi $t1,$zero,0
# initialize index i to 0
loop: add
$t1,$t1,1
# increment index i by 1
beq
$t1,$s2,done # if all elements examined,
quit
add
$t2,$t1,$t1
# compute 2i in $t2
add
$t2,$t2,$t2
# compute 4i in $t2
add
$t2,$t2,$s1
# form address of A[i] in $t2
lw
$t3,0($t2)
# load value of A[i] into $t3
slt
$t4,$t0,$t3
# maximum < A[i]?
beq
$t4,$zero,loop# if not, repeat with no
change
addi $t0,$t3,0
# if so, A[i] is the
new maximum
j
loop
# change completed; now repeat
# continuation of the program
Page 31done: ...
© NTK 2009
The 20 MiniMIPS Instructions
Covered So Far
Copy
Arithmetic
Logic
Memory access
Control transfer
Table 5.1
Page 32
Instruction
Usage
Load upper immediate
Add
Subtract
Set less than
Add immediate
Set less than immediate
AND
OR
XOR
NOR
AND immediate
OR immediate
XOR immediate
Load word
Store word
Jump
Jump register
Branch less than 0
Branch equal
Branch not equal
lui
add
sub
slt
addi
slti
and
or
xor
nor
andi
ori
xori
lw
sw
j
jr
bltz
beq
bne
© NTK 2009
rt,imm
rd,rs,rt
rd,rs,rt
rd,rs,rt
rt,rs,imm
rd,rs,imm
rd,rs,rt
rd,rs,rt
rd,rs,rt
rd,rs,rt
rt,rs,imm
rt,rs,imm
rt,rs,imm
rt,imm(rs)
rt,imm(rs)
L
rs
rs,L
rs,rt,L
rs,rt,L
op fn
15
0
0
0
8
10
0
0
0
0
12
13
14
35
43
2
0
1
4
5
32
34
42
36
37
38
39
8
2. Instruction Set Architecture
2.1. Instructions and addressing
2.2. Procedures and Data
Page 33
© NTK 2009
2.2. Procedures and Data
2.2.1 Simple Procedure Calls
2.2.2 Using the Stack for Data Storage
2.2.3 Parameters and Results
2.2.4 Data Types
2.2.5 Arrays and Pointers
2.2.6 Additional Instructions
Page 34
© NTK 2009
2.2.1 Simple Procedure Calls
#Laboratory Exercise 4, Home Assignment 1
#include <iregdef.h>
.text
.set noreorder
.globl start
.ent start
start:
li
a0,-45
#load input parameter
jal
abs
#jum and link to abs procedure
nop
.end start
.ent abs
abs:
sub
v0,zero,a0
#put -(a0) in v0; in case (a0)<0
bltz
a0,done
nop
add
v0,a0,zero
done:
jr
ra
.end abs
Page 35
#if (a0)<0 then done
#else put (a0) in v0
© NTK 2009
2.2.1 Simple Procedure Calls
A procedure is: a subprogram that when called
(initiated, invoked) performs a specific task,
perhaps leading to one or more results, based on the
input parameters (arguments) with which it is
provided and returns to the point of call, having perturbed
nothing else.
In assembly language, a procedure is associated with a
symbolic name that denotes its starting address. The
jal instruction in MIPS is intended specifically for
procedure calls:
– it performs the control transfer (unconditional jump) to the starting
address of the procedure,
– while also saving the return address in register $ra.
Page 36
© NTK 2009
Illustrating a Procedure Call
main
PC
jal
proc
Prepare
to call
Prepare
to continue
proc
Save, etc.
Restore
jr
$ra
Relationship between the main program and a procedure.
Page 37
© NTK 2009
2.2.1 Simple Procedure Calls
Using a procedure involves the following sequence of actions:
1.
2.
3.
4.
5.
6.
Put arguments in places known to procedure (reg’s $a0-$a3)
Transfer control to procedure, saving the return address (jal)
Acquire storage space, if required, for use by the procedure
Perform the desired task
Put results in places known to calling program (reg’s $v0-$v1)
Return control to calling point (jr)
MiniMIPS instructions for procedure call and return from procedure:
Page 38
jal
proc
# jump to loc “proc” and link;
# “link” means “save the return
# address” (PC)+4 in $ra ($31)
jr
rs
# go to loc addressed by rs
© NTK 2009
$0
$1
$2
$3
$4
$5
$6
$7
$8
$9
$10
$11
$12
$13
$14
$15
$16
$17
$18
$19
$20
$21
$22
$23
$24
$25
$26
$27
$28
$29
$30
$31
Page 39
0
$zero
$at Reserved for assembler use
$v0
Procedure results
$v1
$a0
Procedure
$a1
Saved
arguments
$a2
$a3
$t0
$t1
$t2
Temporary
$t3
values
$t4
$t5
$t6
$t7
$s0
$s1
Saved
$s2
across
$s3
Operands
procedure
$s4
calls
$s5
$s6
$s7
More
$t8
temporaries
$t9
$k0
Reserved for OS (kernel)
$k1
$gp Global pointer
$sp Stack pointer
Saved
$fp Frame pointer
$ra Return address
A 4-b yte word
sits in consecutive
memory addresses
according to the
big-endian order
(most significant
byte has the
lowest address)
Byte numbering:
© NTK 2009
3
2
3Recalling Register
2
Conventions
1
0
1
0
When loading
a byte into a
register, it goes
in the low end Byte
Word
Doublew ord
A doubleword
sits in consecutive
registers or
memory locations
according to the
big-endian order
(most significant
word comes first)
Registers and
data sizes in
MiniMIPS.
A Simple MiniMIPS Procedure
Example
Procedure to find the absolute value of an integer.
$v0 |($a0)|
Solution
The absolute value of x is –x if x < 0 and x otherwise.
abs: sub
$v0,$zero,$a0
bltz $a0,done
add $v0,$a0,$zero
done: jr
$ra
#
#
#
#
#
put -($a0) in $v0;
in case ($a0) < 0
if ($a0)<0 then done
else put ($a0) in $v0
return to calling program
In practice, we seldom use such short procedures because of the
overhead that they entail. In this example, we have 3-4
instructions of overhead for 3 instructions of useful computation.
Page 40
© NTK 2009
Nested Procedure Calls
main
PC
jal
abc
Prepare
to call
Prepare
to continue
abc
Procedure
abc
Save
xyz
jal
Text version
is incorrect
xyz
Restore
jr
$ra
Example of nested procedure calls.
Page 41
Procedure
xyz
© NTK 2009
jr
$ra
2.2.2 Using the Stack for Data Storage
sp
b
a
Push c
sp
c
b
a
Pop x
sp
sp = sp – 4
mem[sp] = c
b
a
x = mem[sp]
sp = sp + 4
Effects of push and pop operations on a stack.
push: addi
sw
Page 42
$sp,$sp,-4
$t4,0($sp)
pop: lw
addi
© NTK 2009
$t5,0($sp)
$sp,$sp,4
Memory Map in MiniMIPS
Hex address
00000000
Reserved
1 M words
Program
Text segment
63 M words
00400000
10000000
Addressable
with 16-bit
signed offset
Static data
10008000
1000ffff
Data segment
Dynamic data
$gp
$28
$29
$30
448 M words
$sp
$fp
80000000
Stack
Stack segment
7ffffffc
Second half of address
space reserved for
memory-mapped I/O
Page 43
2009
Overview of the memory© NTK
address
space in MiniMIPS.
2.2.3 Parameters and Results
Stack allows us to pass/return an arbitrary number of values
$sp
Local
variables
z
y
..
.
Saved
registers
Frame for
current
procedure
Old ($fp)
$sp
$fp
c
b
a
..
.
Frame for
current
procedure
c
b
a
..
.
$fp
Before calling
After calling
Use of the stack by a procedure.
Page 44
© NTK 2009
Frame for
previous
procedure
Example of Using the Stack
Saving $fp, $ra, and $s0 onto the stack and restoring
them at the end of the procedure
$sp
$sp
$fp
$fp
Page 45
proc: sw
addi
addi
sw
sw
.
($s0)
.
($ra)
.
($fp)
lw
lw
addi
lw
jr
$fp,-4($sp)
$fp,$sp,0
$sp,$sp,–12
$ra,-8($fp)
$s0,-12($fp)
#
#
#
#
#
save the old frame pointer
save ($sp) into $fp
create 3 spaces on top of stack
save ($ra) in 2nd stack element
save ($s0) in top stack element
$s0,-12($fp)
$ra,-8($fp)
$sp,$fp, 0
$fp,-4($sp)
$ra
#
#
#
#
#
put top stack element in $s0
put 2nd stack element in $ra
restore $sp to original state
restore $fp to original state
return from procedure
© NTK 2009
2.2.4 Data Types
Data size (number of bits), data type (meaning assigned to bits)
Signed integer:
Unsigned integer:
Floating-point number:
Bit string:
byte
byte
byte
word
word
word
word
doubleword
doubleword
Converting from one size to another
Type
Page 46
8-bit number Value
32-bit version of the number
Unsigned 0010 1011
Unsigned 1010 1011
43
171
0000 0000 0000 0000 0000 0000 0010 1011
0000 0000 0000 0000 0000 0000 1010 1011
Signed
Signed
+43
–85
0000 0000 0000 0000 0000 0000 0010 1011
1111 1111 1111 1111 1111 1111 1010 1011
0010 1011
1010 1011
© NTK 2009
ASCII Characters
ASCII (American standard code for information interchange)
0
1
2
3
4
5
6
NUL
DLE
SP
0
@
P
`
p
SOH
DC1
!
1
A
Q
a
q
STX
DC2
“
2
B
R
b
r
ETX
DC3
#
3
C
S
c
s
4
EOT
DC4
$
4
D
T
d
t
5
ENQ
NAK
%
5
E
U
e
u
6
ACK
SYN
&
6
F
V
f
v
7
BEL
ETB
‘
7
G
W
g
w
8
BS
CAN
(
8
H
X
h
x
9
HT
EM
)
9
I
Y
i
y
a
LF
SUB
*
:
J
Z
j
z
b
VT
ESC
+
;
K
[
k
{
c
FF
FS
,
<
L
\
l
|
d
CR
GS
-
=
M
]
m
}
e
SO
RS
.
>
N
^
n
~
f
SI
US
/
?
O
_
o
DEL
0
1
2
3
Page 47
© NTK 2009
7
8-9
a-f
More
More
controls
symbols
8-bit ASCII code
(col #, row #)hex
e.g., code for +
is (2b) hex or
(0010 1011)two
Loading and Storing Bytes
Bytes can be used to store ASCII characters or small integers.
MiniMIPS addresses refer to bytes, but registers hold words.
31
I
lb
$t0,8($s3)
lbu
$t0,8($s3)
sb
$t0,A($s3)
op
25
rs
#
#
#
#
#
20
rt
load rt with mem[8+($s3)]
sign-extend to fill reg
load rt with mem[8+($s3)]
zero-extend to fill reg
LSB of rt to mem[A+($s3)]
15
immediate / offset
1 0 x x 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
lb = 32
lbu = 36
sb = 40
Base
register
Data
register
Address offset
Load and store instructions for byte-size data elements.
Page 48
0
© NTK 2009
Meaning of a Word in Memory
Bit pattern
(02114020) hex
0000 0010 0001 0001 0100 0000 0010 0000
00000010000100010100000000100000
Add instruction
00000010000100010100000000100000
Positive integer
00000010000100010100000000100000
Four-character string
A 32-bit word has no inherent meaning and can be interpreted in
a number of equally valid ways in the absence of other cues
(e.g., context) for the intended meaning.
Page 49
© NTK 2009
2.2.5 Arrays and Pointers
Index: Use a register that holds the index i and increment the register in
each step to effect moving from element i of the list to element i + 1
Pointer: Use a register that points to (holds the address of) the list element
being examined and update it in each step to point to the next element
Array index i
Add 1 to i;
Compute 4i;
Add 4i to base
Base
Array A
A[i]
A[i + 1]
Pointer to A[i]
Add 4 to get
the address
of A[i + 1]
Array A
A[i]
A[i + 1]
Stepping through the elements of an array using the indexing
method and the pointer updating method.
Page 50
© NTK 2009
Selection Sort
To sort a list of numbers, repeatedly perform the following:
Find the max element, swap it with the last item, move up
the “last” pointer
A
A
first
first
max
A
first
x
y
last
last
last
Start of iteration
y
x
Maximum identified
End of iteration
One iteration of selection sort.
Page 51
© NTK 2009
Selection Sort Using the Procedure max
A
A
first
Inputs to
proc max
first
In $a0
max
In $v1
In $a1
y
Outputs from
proc max
last
last
Start of iteration
Page 52
x
In $v0
last
sort: beq
jal
lw
sw
sw
addi
j
done: ...
A
first
$a0,$a1,done
max
$t0,0($a1)
$t0,0($v0)
$v1,0($a1)
$a1,$a1,-4
sort
#
#
#
#
#
#
#
#
y
x
Maximum identified
End of iteration
single-element list is sorted
call the max procedure
load last element into $t0
copy the last element to max loc
copy max value to last element
decrement pointer to last element
repeat sort for smaller list
continue with rest of program
© NTK 2009
2.2.6 Additional Instructions
MiniMIPS instructions for multiplication and division:
mult
div
$s0, $s1
$s0, $s1
mfhi
mflo
$t0
$t0
31
R
op
#
#
#
#
#
25
rs
20
rt
set
set
and
set
set
Hi,Lo to ($s0)($s1)
Hi to ($s0)mod($s1)
Lo to ($s0)/($s1)
$t0 to (Hi)
$t0 to (Lo)
rd
15
10
sh
5
fn
0
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 x 0
ALU
instruction
Source
register 1
Source
register 2
Unused
Unused
mult = 24
div = 26
The multiply (mult) and divide (div) instructions of MiniMIPS.
31
R
op
25
rs
20
rt
15
rd
10
sh
5
fn
0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 x 0
ALU
instruction
Unused
Unused
Destination
register
Unused
mfhi = 16
mflo = 18
MiniMIPS instructions for copying the contents of Hi and Lo registers into
general registers .
Page 53
© NTK 2009
Logical Shifts
MiniMIPS instructions for left and right shifting:
sll
srl
sllv
srlv
$t0,$s1,2
$t0,$s1,2
$t0,$s1,$s0
$t0,$s1,$s0
31
R
op
25
20
rt
15
left-shifted by 2
right-shifted by 2
left-shifted by ($s0)
right-shifted by ($s0)
rd
10
sh
5
31
op
Unused
25
0
rs
Source
register
20
rt
Destination
register
15
rd
Shift
amount
10
sh
sll = 0
srl = 2
5
fn
0
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 x 0
ALU
instruction
Amount
register
Source
register
Destination
register
Unused
The four logical shift instructions of MiniMIPS.
Page 54
fn
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 x 0
ALU
instruction
R
rs
# $t0=($s1)
# $t0=($s1)
# $t0=($s1)
# $t0=($s1)
© NTK 2009
sllv = 4
srlv = 6
Unsigned Arithmetic and Miscellaneous Instructions
MiniMIPS instructions for unsigned arithmetic (no overflow exception):
addu
subu
multu
divu
$t0,$s0,$s1
$t0,$s0,$s1
$s0,$s1
$s0,$s1
addiu $t0,$s0,61
#
#
#
#
#
#
#
#
set $t0 to ($s0)+($s1)
set $t0 to ($s0)–($s1)
set Hi,Lo to ($s0)($s1)
set Hi to ($s0)mod($s1)
and Lo to ($s0)/($s1)
set $t0 to ($s0)+61;
the immediate operand is
sign extended
To make MiniMIPS more powerful and complete, we introduce later:
sra
$t0,$s1,2
srav $t0,$s1,$s0
syscall
Page 55
# sh. right arith (Sec. 10.5)
# shift right arith variable
# system call (Sec. 7.6)
© NTK 2009
The 20 MiniMIPS Instructions
from Chapter 6
(40 in all so far)
Copy
Arithmetic
Table 6.2 (partial)
Shift
Memory access
Instruction
Usage
Move from Hi
mfhi
rd
Move from Lo
mflo
rd
Add unsigned
addu
rd,rs,rt
Subtract unsigned
subu
rd,rs,rt
Multiply
mult
rs,rt
Multiply unsigned
multu rs,rt
Divide
div
rs,rt
Divide unsigned
divu
rs,rt
Add immediate unsigned
addiu rs,rt,imm
Shift left logical
sll
rd,rt,sh
Shift right logical
srl
rd,rt,sh
Shift right arithmetic
sra
rd,rt,sh
Shift left logical variable
sllv
rd,rt,rs
Shift right logical variable
srlv
rt,rd,rs
Shift right arith variable
srav
rd,rt,rd
Load byte
lb
rt,imm(rs)
Load byte unsigned
lbu
rt,imm(rs)
Store byte
sb
rt,imm(rs)
Jump and link
jal
L
System call
syscall
Control transfer
Page 56
© NTK 2009
op fn
0
0
0
0
0
0
0
0
9
0
0
0
0
0
0
32
36
40
3
0
16
18
33
35
24
25
26
27
0
2
3
4
6
7
12
The 37 + 3 MiniMIPS Instructions Covered So Far
Instruction
Usage
Instruction
Usage
Load upper immediate
lui
rt,imm
Move from Hi
mfhi
rd
Add
add
rd,rs,rt
Move from Lo
mflo
rd
Subtract
sub
rd,rs,rt
Add unsigned
addu
rd,rs,rt
Set less than
slt
rd,rs,rt
Subtract unsigned
subu
rd,rs,rt
Add immediate
addi
rt,rs,imm
Multiply
mult
rs,rt
Set less than immediate
slti
rd,rs,imm
Multiply unsigned
multu rs,rt
AND
and
rd,rs,rt
Divide
div
rs,rt
OR
or
rd,rs,rt
Divide unsigned
divu
rs,rt
XOR
xor
rd,rs,rt
Add immediate unsigned
addiu rs,rt,imm
NOR
nor
rd,rs,rt
Shift left logical
sll
rd,rt,sh
AND immediate
andi
rt,rs,imm
Shift right logical
srl
rd,rt,sh
OR immediate
ori
rt,rs,imm
Shift right arithmetic
sra
rd,rt,sh
XOR immediate
xori
rt,rs,imm
Shift left logical variable
sllv
rd,rt,rs
Load word
lw
rt,imm(rs)
Shift right logical variable
srlv
rd,rt,rs
Store word
sw
rt,imm(rs)
Shift right arith variable
srav
rd,rt,rs
Jump
j
L
Load byte
lb
rt,imm(rs)
Jump register
jr
rs
Load byte unsigned
lbu
rt,imm(rs)
Branch less than 0
bltz
rs,L
Store byte
sb
rt,imm(rs)
Branch equal
Page 57
Branch not equal
beq
rs,rt,L
jal
L
bne
rs,rt,L
Jump and link
© NTK 2009
System call
syscall
Content
1. Introduction - Computer system technology and Computer
Performance
2. Instruction Set Architecture
3. Arithmetic for Computer
4. CPU Organization
Page 58
© NTK 2009
3. Arithmetic for Computer
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
Page 59
© NTK 2009
3.1. Introduction
Page 60
© NTK 2009
Number Representation
Numbers are normally represented using a positional
number system:
N (b ) an an 1an 2 ...a1a0 .a1a2 ...a m
– Base/radix: b (the number of digits)
– Digits: 0..(b-1)
• 0 ≤ ai ≤ (b-1)
– Binary: b=2, digits:0,1
– Decimal: b=10, digits: 0,1,2,3,4,5,6,7,8,9
– Octal: b=8, digits: 0,1,2,3,4,5,6,7
© NTK 2009
– Hexadecimal: b=16, digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
Page 61
Number Representation
N (b ) an an 1an 2 ...a1a0 .a1a2 ...a m
N(10) an .bn an1.bn1 ... a1.b1 a0 .b0 a1.b1 ... am .bm
N (10)
n
a .b
i m
i
i
11101.11(2) = 1x24+1x23+1x22+0x21+1x20+1x2-1+1x2-2= 29.75(10)
Page 62
© NTK 2009
Number Representation
Decimal:
– b=10
– Digits: 0,1,2,3,4,5,6,7,8,9
N (10) an an 1an 2 ...a1a0 .a1a 2 ...a m
– Eg:
N (10)
n
a .10
i m
i
i
539.45(10) = 5x102+3x101+9x100+4x10-1+5x10-2
Page 63
© NTK 2009
ai = 0..9
Number Representation
Binary:
– b=2
bit – binary digit
– Digits: 0,1
N ( 2) an an 1an 2 ...a1a0 .a1a2 ...am
N (10)
n
a .2
i m
i
i
– Eg:
1011.011(2) = 1x23 + 0x22 + 1x21 + 1x20 + 0x2-1 + 1x2-2 + 1x2-3
Page 64
© NTK 2009
ai = 0,1
Number Representation
Binary (cnt’)
– n-bit binary number can represent which range?
• an-1...a1a0 from 0 to 2n-1
– MSB – Most Significant Bit
N ( 2 ) an 1an 2 ...a1a0
– LSB – Least Significant Bit
MSB
0001 = 1
0110 = 6
1011 = 11
0010 = 2
0111 = 7
1100 = 12
0011 = 3
1000 = 8
1101 = 13
0100 = 4
1001 = 9
1110 = 14
0101 = 5
1010 = 10
1111 = 15
Page 65
© NTK 2009
LSB
Number Representation
Octal:
– b=8
N (8) an an 1...a1a0 .a1a2 ...a m
– Digits: 0,1,2,3,4,5,6,7
ai = 0..7
– Eg:
503.071(8) = 5x82 + 0x81 + 3x80 + 0x8-1 + 7x8-2 + 1x8-3
Hexadecimal:
N (16) an an 1...a1a0 .a1a 2 ...a m
– b=16
ai = 0..F
– Digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
– Eg:
503.071(16) = 5x162 + 0x161 + 3x160 + 0x16-1 + 7x16-2 + 1x16-3
Page 66
© NTK 2009
Convert from base b to base 10
Base b to base 10 conversion
N (b ) an an 1an 2 ...a1a0 .a1a2 ...a m
N(10) an .bn an1.bn1 ... a1.b1 a0 .b0 a1.b1 ... am .bm
Eg:
– 1010.11(2)= 1x23+0x22+1x21+0x20+1x2-1+1x2-2=10.75(10)
– 1010.11(8)=?
– A12(16)=?
Page 67
© NTK 2009
Convert from base 10 to base b
Base 10 to base b conversion
– For integer part:
• Divide integer part by b until the result is 0
• Write remainders in reverse order to get the converted
result.
– For the odd part after “.”
• Multiply by b until the result is 0
Page 68
© NTK 2009
Convert from base 10 to base 2
Eg1: 6.625(10) = ?(2)
– The odd part after “.”
– The integer part
6
• 0.625 x 2 = 1.25
2
• 0.25 x 2 = 0.5
0
3
2
1
1
2
1
0
• 0.5
6.625(10) = 110.101(2)
Eg2: 120.5625(10) = 1111000.1001(2)
Page 69
x 2 = 1.0
© NTK 2009
Convert from base 2 to base 2n
Group from right to left n-bit groups and replace the
equivalent values in base 2n
Eg:
101011(2) = ?(8)
1010.110(2)=12.6(8)
101011(2) = ?(16)
1010.110(2)=A.C(16)
Page 70
© NTK 2009
Convert from base 2n to base 2
Each digit in base 2n is replaced by n bit in base 2.
Eg:
37A.B(16)=?(2)
Page 71
© NTK 2009
Convert from base i to base j
If both i and j are powers of 2, use base 2 as an
intermediate base:
– Eg: base 8 base 2 base 16
– 735.37(8)=?(16)
Else, use base 10 as an intermediate base:
– Eg: base 5 base 10 base 2
Page 72
© NTK 2009
3. Arithmetic for Computer
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
Page 73
© NTK 2009
3.2. Unsigned Numbers
The general form of signed number A:
an-1an-2...a2a1a0
Value of A:
A an 1 2 n 1 an 2 2 n 2 ... a1 21 a0 20
n 1
A ai 2i
i 0
Range of representation:
– Use n bit to represent 2’s complement numbers
– Range: 0 => 2n-1
Page 74
© NTK 2009
3. Arithmetic for Computer
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
Page 75
© NTK 2009
3.3. Signed Numbers
1’s complement and 2’s complement number
– A binary integer A is represented by n bit:
• 1’s complement number of A is (2n - 1) – A
• 2’s complement number of A is 2n - A
• Notes: 2’s complement number of A = 1’s complement number + 1
– Eg:
• n=8, A = 00110101
• 1’s complement number of A is (28 - 1) - 00110101=
• 2’s complement number of A is 28 - 00110101=
Page 76
© NTK 2009
2’s complement representation of signed numbers
Most left bit is sign bit
Positive and 0 numbers are expressed in usual binary format.
– The largest number can be represented is 2n-1-1
– n=8 => largest signed number: 28-1-1 = 127
Negative number a is stored as the binary equivalent of 2n-A
in one n-bit system.
– -3 is stored as 28-3=11111101 in a 8-bit system
– The most negative number can be stored is -2n-1
Page 77
© NTK 2009
2’s complement representation of signed numbers
The general form of signed number A:
an-1an-2...a2a1a0
Value of A:
A an 1 2
n 1
n2
ai 2i
i 0
Range of representation:
– Use n bit to represent 2’s complement numbers
– Range: -2n-1 => 2n-1-1
Page 78
© NTK 2009
2’s complement representation of signed numbers
+10 = 0000 1010
- 10 = 28-10 = 1 0000 0000
–
0000 1010
1111 0110
- 10 = 1111 0110
+10 + (-10) = ?
Page 79
© NTK 2009
MIPS signed number representation
32 bit signed numbers:
0000
0000
0000
...
0111
0111
1000
1000
1000
...
1111
1111
1111
Page 80
0000 0000 0000 0000 0000 0000 0000two = 0(10)
0000 0000 0000 0000 0000 0000 0001two = + 1(10)
0000 0000 0000 0000 0000 0000 0010two = + 2(10)
1111
1111
0000
0000
0000
1111
1111
0000
0000
0000
1111
1111
0000
0000
0000
1111
1111
0000
0000
0000
1111
1111
0000
0000
0000
1111
1111
0000
0000
0000
1110two
1111two
0000two
0001two
0010two
=
=
=
=
=
+
+
–
–
–
2,147,483,646(10)
2,147,483,647(10)
2,147,483,648(10)
2,147,483,647(10)
2,147,483,646(10)
1111 1111 1111 1111 1111 1111 1101two = – 3(10)
1111 1111 1111 1111 1111 1111 1110two = – 2(10)
1111 1111 1111 1111 1111 1111 1111two = – 1(10)
© NTK 2009
2’s complement representation of signed numbers
Procedure to find binary representation of negative number in
2’s complement:
– Find the binary equivalent of the magnitude
– Complement each bit (0=>1, 1=>0)
– Add 1
Eg: find representation of -13 in 8-bit signed number system
using 2’s complement:
• Magnitude:
13 = 0000 1101
• 1’s complement:
• Add 1:
• -13 =
Page 81
1111 0010
+
1
1111 0011
© NTK 2009
2’s complement representation of signed numbers
To find the magnitude of a negative number:
– Complement each bit
– Add 1
Eg:
-5 11111011
-1
00000100
00000000
1
1
5 00000101
Page 82
11111111
1
© NTK 2009
00000001
3. Arithmetic for Computer
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
Page 83
© NTK 2009
3.4. Addition & Subtraction
3.4.1. Addition of unsigned numbers
3.4.2. Addition of signed numbers
3.4.3. Subtraction of signed numbers
Page 84
© NTK 2009
3.4.1. Addition of Unsigned Numbers
Unsigned binary addition similar to decimal addition.
carry
decimal binary
1100
11110
A
2565
10110
B
6754
11011
sum
9319
110001
Eg: 10101(2) + 11011(2) = ? (2)
Page 85
© NTK 2009
Addition of Unsigned Numbers
Overflow
– Occur when the result of addition is out of range of representation (the
result can not be stored in the predefined number of bits)
– Eg:
X = 1001 0110 = 150
X = 1100 0101
= 197
Y = 0001 0011 = 19
Y = 0100 0110
=
S = 1010 1001 = 169
S = 0000 1011=11 267
Cout = 0
Cout = 1 carry-out
Overflow occurs when Cout = 1
Page 86
© NTK 2009
70
3.4.2. Addition of Signed Numbers
The reason that 2’s complement is so popular is the simplicity
of addition.
To add any two numbers, no matter what the sign of each is,
we just do binary addition on their representation.
Page 87
-5 1011
-5
1011
-5
1011
+7 0111
+5 0101
+3
0011
+2 0010
0 0000
-2
1110
© NTK 2009
Addition of Signed Numbers
Overflow
– Occur when the result of addition is out of range of representation (the
result can not be stored in the predefined number of bits)
– Occur when?
• Add two numbers of the opposite sign?
• Add two positive numbers?
• Add two negative numbers?
Overflow occurs when adding two numbers with the same
sign and the result is in different sign
Page 88
© NTK 2009
3.4.3. Subtraction of Signed Numbers
Principle:
– Subtraction is addition of negative number.
a – b = a + (-b)
Eg: 7 – 5 = ?
5
0101
7
0111
1010
-5
+1011
+
2
0010
1
-5 1011
Page 89
© NTK 2009
Overflow when adding or subtracting signed numbers:
Page 90
© NTK 2009
Addition & Subtraction in MIPS
Unsigned integers are commonly used for memory addresses
where overflow ignored, so MIPS provide two kinds of
addition and subtraction:
– add, addi, sub cause exception when overflow
– addu, addiu, subu do not cause exception when overflow
– MIPS has a register called exception program counter (EPC) to store
the address of the instruction that caused the exception.
– Instruction mfc0 is used to copy EPC into a general purpose register
mfc0 s1,epc
Page 91
© NTK 2009
3. Arithmetic for Computer
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
Page 92
© NTK 2009
3.5. Multiplication
3.5.1. Multiplication of unsigned numbers
3.5.2. Multiplication of signed numbers
Page 93
© NTK 2009
3.5.1. Multiplication of unsigned numbers
1000
x 1001
1000
0000
0000
1000
1001000
Multiplicand (+8)
Multiplier (+9)
Product
(+72)
Multiplication of two n-bit unsigned numbers, the product is
one 2n-bit unsigned number.
Page 94
© NTK 2009
Multiplication implementation - 1st version
Multiplicand, ALU, product are 64 bit
Multiplier is 32 bit
Page 95
© NTK 2009
Multiplication algorithm
If each step take one
clock cycle, this
multiplication algorithm
will require almost 100
clock cycles to multiply
two 32-bit numbers.
Page 96
© NTK 2009
Multiplication implementation – 2nd version
Multiplicand, ALU, multiplier are 32 bit
Product is 64 bit
Page 97
© NTK 2009
3.5.2. Multiplication of signed numbers
Use unsigned multiplication:
– Convert multiplicand & multiplier to positive numbers
– Multiply using unsigned multiplication algorithm
– Change the sign of product:
• If the signs of multiplicand and multiplier are the same, the product is
the result of step 2.
• If the signs disagree, the product is two’s complement of the result of
step 2.
Page 98
© NTK 2009
Faster multiplication
Page 99
© NTK 2009
Multiply in MIPS
Product is stored in a pair of 32-bit special registers, called
Hi & Lo
To produce properly signed and unsigned product, MIPS has
two multiplication instructions:
– mult – multiply signed
– multu – multiply unsigned
To fetch the 32-bit product, MIPS provide two instructions:
– mfhi
– mflo
Page 100
© NTK 2009
3. Arithmetic for Computer
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
Page 101
© NTK 2009
3.6. Division
3.6.1. Division of unsigned numbers
3.6.2. Division of signed numbers
Page 102
© NTK 2009
3.6.1. Division of unsigned numbers
Divide 1001010(10) by 1000(10)
Page 103
© NTK 2009
Division implementation – 1st version
Divisor, ALU, remainder are 64 bit
Quotient is 32 bit
Page 104
© NTK 2009
Division algorithm
Page 105
© NTK 2009
Division implementation – 2nd version
Page 106
© NTK 2009
3.6.2. Division of signed numbers
Use unsigned division algorithm:
– Convert dividend and divisor to positive numbers
– Divide using unsigned division algorithm
– Change the sign of product:
• Dividend
Divisor
Quotient
Remainder
+
+
Keep
Keep
+
-
Negate
Keep
-
+
Negate
Negate
-
-
Keep
Negate
Page 107
© NTK 2009
Faster division
Page 108
© NTK 2009
Divide in MIPS
Remainder is stored in Hi, quotient is stored in Lo
To produce properly signed and unsigned result, MIPS has
two division instructions:
– div – divide signed
– divu – divide unsigned
To fetch the 32-bit result, MIPS provide two instructions:
– mfhi
– mflo
Page 109
© NTK 2009
3. Arithmetic for Computer
1. Introduction
2. Unsigned Numbers
3. Signed Number
4. Addition and Subtraction
5. Multiplication
6. Division
7. Floating Point
Page 110
© NTK 2009
3.7. Floating Point
Floating-point numbers can be represented in the
form:
(-1)sx1.xxxxx(2)x2yyyy
s: sign (s=0 => positive, s=1 => negative)
x,y={0,1}
15.75(10) =1111.11(2) = 1.11111x23
Page 111
© NTK 2009
Floating-point number representation
Page 112
© NTK 2009
IEEE 754 standard
To represent floating-point numbers
Base b=2
Three basic forms:
– Single-precision number representation, 32 bit
– Double-precision number representation, 64 bit
– Extended-precision number representation, 80 bit
Formats:
31 30
S
23 22
e
m
63 62
S
52 51
S
0
e
m
79 78
Page 113
0
64 63
0
e
m
© NTK 2009
IEEE 754 standard
31 30
S
23 22
e
m
63 62
S
52 51
0
e
m
79 78
S
0
64 63
0
e
m
X = (-1)S x 1.m x 2e-b
s: sign bit (s=0 => positive, s=1 => negative)
e: excess
b: bias
– Single-precision 32-bit : b = 127
– Double-precision 64-bit : b = 1023
– Extended-precisioin 80-bit : b = 16383
Page 114
© NTK 2009
IEEE 754 standard
Example: One real number X is represented using IEEE 754
standard with the following format:
1100 0001 0101 0110 0000 0000 0000 0000
Show the decimal value of X
Solution:
1100 0001 0101 0110 0000 0000 0000 0000
s=1
e=130
m
X = (-1)S x 1.m x 2e-b
= (-1)1 x 1.1010110 x 2130-127
= -1.101011 x 23= -1101.011 = -13.375(10)
Page 115
© NTK 2009
IEEE 754 standard
Example: One real number X is represented using IEEE 754
standard with the following format:
0011 1111 1000 0000 0000 0000 0000 0000
Show the decimal value of X
Solution:
X = (-1)S x 1.m x 2e-127
= (-1)0 x 1.0 x 2127-127
=1
Page 116
© NTK 2009
IEEE 754 standard
Example: Represent the real number -19.7890625(10) using
32 bit IEEE 754 standard
19.7890625 = 10011.1100101(2) = 1.00111100101 x 24
-19.7890625 = (-1)1 x 1.00111100101 x 2131-127
= 1 1000 0011 001111001010...0
Page 117
© NTK 2009
Special convention
Page 118
© NTK 2009
Range of Representation
underflow
overflow
¥
-b
-a
-0 +0
overflow
a
b
32 bit: a = 2-127 ≈ 10-38
b = 2+127 ≈ 10+38
64 bit: a = 2-1023 ≈ 10-308
b = 2+1023 ≈ 10+308
80 bit: a = 2-16383 ≈ 10-4932
b = 2+16383 ≈ 10+4932
Page 119
© NTK 2009
¥
Floating-point addition
Page 120
© NTK 2009
Block
diagram
of
an arithmetic
unit
dedicated to
floating-point
addition
Page 121
© NTK 2009
Floating-point multiplication
Page 122
© NTK 2009
Floating-point instructions in MIPS
Operations
Single
Double
Addition
add.s
add.d
Subtraction
sub.s
sub.d
Multiplication
mul.s
mul.d
Division
div.s
div.d
Comparison
c.x.s
c.x.d
bclt
Branch if true
bclf
Branch if false
Page 123
© NTK 2009
x = eq, neq, lt, le, gt, ge
Floating-point instructions in MIPS
MIPS has dedicated registers and instructions used for
floating-point operations:
– 32 floating-point registers: $f0,$f1,$f2...,$f31
– Separate load and store for floating-point registers: lwc1, swc1,
Page 124
lwc1
f4,0(sp)
# Load 32 bit F.P number into f4
lwc1
f6,4(sp)
# Load 32 bit F.P number into f6
add.s f2,f4,f6
# f2 = f4 + f6
swc1
# Store F.P number from f2
f2,8(sp)
© NTK 2009
Floating-point instructions in MIPS
A double precision register is combination of an even-old pair
of single precision registers, using even register as its name
lwc1 f4,0(sp)
# Load upper part of 64 bit F.P number into f4
lwc1 f5,4(sp)
# Load lower part of 64 bit F.P number into f5
lwc1 f6,8(sp)
# Load upper part of 64 bit F.P number into f6
lwc1 f7,12(sp)
# Load lower part of 64 bit F.P number into f7
add.d f2,f4,f6
# (f2,f3) = (f4,f5) + (f6,f7)
swc1 f2,16(sp)
# Store upper part of 64 bit F.P number from f2
swc1 f2,20(sp)
# Store upper part of 64 bit F.P number from f2
Page 125
© NTK 2009
MIPS floating-point assembly language
Page 126
© NTK 2009
Content
1. Introduction - Computer system technology and Computer
Performance
2. Instruction Set Architecture
3. Arithmetic for Computer
4. CPU Organization
Page 127
© NTK 2009
4. CPU Organization
4.1. Building a datapath
4.2. Single cycle implementation
4.3. Multi cycle implementation
4.4. Exceptions
4.5. Pipelining
Page 128
© NTK 2009
4.1. Building a datapath
A basic MIPS implementation
Implement a subset of the core MIPS instruction set:
– Memory-reference instructions: lw, sw
– Arithmetic-logical instructions: add, sub, and, or, slt
– Branch instructions: beq, j
Page 129
© NTK 2009
Review of instruction classes
31
R
31
I
31
J
Page 130
op
25
rs
20
rt
15
6 bits
5 bits
5 bits
Opcode
Source
register 1
Source
register 2
op
25
rs
20
rt
rd
10
5 bits
Destination
register
15
sh
fn
5
5 bits
6 bits
Shift
amount
Opcode
extension
operand / offset
6 bits
5 bits
5 bits
16 bits
Opcode
Source
or base
Destination
or data
Immediate operand
or address offset
op
25
jump target address
0
0
0
6 bits
1 0 0 0 0 0 0 0 0 0 0 0 26
0 bits
0 0 0 0 0 0 0 1 1 1 1 0 1
Opcode
Memory word address (byte address divided by 4)
© NTK 2009
A basic MIPS implementation
For every instruction, the first two steps are identical:
– Send the program counter PC to instruction memory and fetch the instruction
from that memory.
– Read one or two registers, using fields of the instruction to select the registers
to read.
– Instructions class except for jump use ALU:
• Memory-reference instructions use ALU for address calculation
• Arithmetic-logical instructions use ALU for operation execution
• Branch instructions use ALU for comparison
– Next actions differ according to instructions:
• Memory-reference instructions: access memory to write / read
• Arithmetic-logical instructions: write data from ALU back to registers
• Branch instructions: change next instruction address based in comparison
Page 131
© NTK 2009
Abstract view of a basic MIPS implementation – v1.0
3
1
2
Page 132
© NTK 2009
Abstract view of a basic MIPS implementation – v1.0
Drawback in version 1.0 of the basic MIPS implementation:
– In several places, data going into a particular unit as coming from two
different sources.
• Value written into PC can come from one of two adders.
• Data written into registers can come from either ALU or date memory
– Several of the units must be controlled depending on the type of
instruction.
• Data memory must read on load and write on store.
• Registers must be written on a load and an arithmetic-logical
instruction.
Page 133
© NTK 2009
Abstract view of a basic MIPS implementation – v2.0
Page 134
© NTK 2009
Abstract view of a basic MIPS implementation – ver 2
Benefits:
– Simple, easy to understand
– Single-cycle datapath
Requirement:
– Instruction memory and data memory are separate because:
• Format of data and instruction is different
• Having separate memories is less expensive
• The processor operates in one cycle and cannot use single-portet
memory for two different access within that cycle.
Page 135
© NTK 2009
Logic design conventions
To design the machine, how the logic implementing the
machine will operate and how the machine is clocked needed
to be decided.
MIPS design consists of two types of logic elements:
– Combinational elements
– Sequential elements:
• Flip-flops, memories, registers
Please revise
Digital Logic Design
Page 136
© NTK 2009
Building a datapath
Fetching instructions and incrementing PC
Implement R-format ALU operations
Loading and storing data
Branching with beq
Page 137
© NTK 2009
Fetching instructions and incrementing PC
Read data from instruction memory to output
PC = PC + 4
Page 138
© NTK 2009
Implement R-format ALU operations
R-format ALU instructions: add, sub, and, or, slt
add s1,s2,s3
# s1 = s2 + s3
1. Read s2,s3 from register file according to two register numbers to outputs
2. ALU execute operation add
3. Write result back to register s1, ©need
write control signal
NTK 2009
Page 139
Loading and storing data
lw t1, offset_value(t2)
sw t1, offset_value(t2)
1. Compute memory address by adding the base register t2
with 16-bit sign extended field offset_value.
2. Read data from register t1 to write to calculated address in
data memory
or Read data from calculated
address in data memory to
write to register t1.
Page 140
© NTK 2009
Branching with beq
beq t1,t2,offset
Compute branch target address by adding sign-extended
offset to PC
– Two notes in the definition of branch instruction:
• The instruction set architecture specifies that the base for branch
address calculation is the address of the instruction following the
branch. So we can always compute PC+4 in fetching period.
• The offset field is shifted left 2 bits so that it’s a word offset.
Use ALU to evaluate branch condition:
– Read two registers t1, t2 from register file to inputs of ALU
– Subtract two inputs of ALU, assert control signal
Page 141
© NTK 2009
Branching with beq
Page 142
© NTK 2009
Creating a single datapath
Single datapath:
– Execute every instruction in one clock cycle.
– No resource used more than once per instruction, so any elements
needed more than once must be duplicated.
• Separate memories for instruction and data.
Page 143
© NTK 2009
A single datapath for memory and R-type instructions
Page 144
© NTK 2009
A single datapath for a basic MIPS
Control signals
are not connected
Page 145
© NTK 2009
4. CPU Organization
4.1. Building a datapath
4.2. Single cycle implementation
4.3. Multi cycle implementation
4.4. Exceptions
4.5. Pipelining
Page 146
© NTK 2009
4.2. Single cycle implementation
ALU Control
Main Control Unit
Page 147
© NTK 2009
ALU Control
ALU has four control inputs:
Depend on the instruction class, one of first five functions of ALU will be
performed (NOR is needed for other parts):
– Load, store instructions: use ALU to compute memory address by addition
– R-type instructions: ALU performs one of five actions (add, sub, and, or, slt)
based on 6-bit function field in the instruction.
– Branch beq: ALU performs subtraction
Page 148
© NTK 2009
ALU control inputs
Page 149
© NTK 2009
Truth table which uses don’t care values to have compact minimization form
Designing the Main Control Unit
Page 150
© NTK 2009
Three instruction classes
The opcode is always contained in bits 31:26. We refer to this field as Opcode[5:0]
Two registers to be read (R-type, beq, sw) are always at 25:21 and 20:16
The base register for load and store instruction is always in bit 25:21
16 bit offset for load/store/beq is always in 15:0
The destination register is in one of two places:
– For load, it’s in 20:16
– For R-type, it’s in 15:11
Page 151
© NTK 2009
A simple datapath with all control lines identified
Page 152
© NTK 2009
Single datapath
with control unit
153
Single control & datapath
extended to handle jump
154
Why a single-cycle implementation is not used today?
Although single-cycle design will work correctly, it will not be
used in modern designs because it’s inefficient:
– Clock cycle must have the same length for every instruction in this
single-cycle design => Clock cycle is determined by the longest
possible path in the machine.
• Longest instruction: Load instruction: uses 5 functional units in series:
instruction memory, register file, ALU, data memory, register file.
=> Not good since several instruction could fit in a shorter clock cycle
– Some functional units (ALU) must be duplicated
=> Hardware cost
Page 155
© NTK 2009
4. CPU Organization
4.1. Building a datapath
4.2. Single cycle implementation
4.3. Multicycle implementation
4.4. Exceptions
4.5. Pipelining
Page 156
© NTK 2009
4.3. Multicycle implementation
Multicycle implementation:
– Break each instruction into a series of steps corresponding to the
functional unit operations needed.
• Each step in the execution will take one clock cycle. Each instruction
will take different numbers of clock cycles.
• Allow a functional unit to be used more than once per instruction =>
hardware share.
Page 157
© NTK 2009
High-level view of multicycle datapath
Key elements:
– A shared memory unit for both instructions & data
– A single ALU
– Require additional registers: IR, Memory data register, A,B, ALUout
– Require additional multiplexers© NTK 2009
Page 158
Multicycle implementation
At the end of a clock cycle, all data used in subsequent clock
cycle must be stored in a state element:
– Data used by subsequent instructions in a later clock cycle is stored in
one of the programmer-visible state elements: register file, PC, memory
– Data used by the same instruction in a later clock cycle is stored in one
of additional registers.
One clock cycle can accommodate at most one of the
following operations: A memory access, A register file access
(two reads and one write), An ALU operation
=> Any data produced by these three functional units must be saved
into a temporary register for use in later cycle. If not saved, timing race.
Page 159
© NTK 2009
Additional registers
Instruction Register (IR) and Memory data register (MDR) are
added to save the output of memory for instruction read and a
data read. Two separate registers are used since both values
are needed during the same clock.
A,B registers are used to hold values read from register file.
ALUout register holds the output of ALU.
Page 160
© NTK 2009
Additional multiplexers
Replacing three ALUs of single-cycle datapath by a single ALU requires:
– An additional multiplexer: {A,PC} => the first ALU input.
– An additional multiplexer: {B,4,Sign extend, shift left 2} => the second ALU input
Page 161
© NTK 2009
Multicycle datapath with control lines
Page 162
© NTK 2009
Branch and jump instructions
With jump and branch instructions, three possible sources of
values to be written into PC:
– The output of ALU, PC + 4 => PC directly
– Register ALUout – the address of branch target after it is computed
– Address of jump target = Lower 26 bit of IR << 2 concatenated with 4
upper bits of the incremented PC
Page 163
© NTK 2009
Complete datapath for multicycle implementation
Page 164
© NTK 2009
31
R
31
I
31
J
Page 165
op
25
rs
20
rt
15
6 bits
5 bits
5 bits
Opcode
Source
register 1
Source
register 2
op
25
rs
20
rt
rd
10
5 bits
Destination
register
15
sh
fn
5
5 bits
6 bits
Shift
amount
Opcode
extension
operand / offset
6 bits
5 bits
5 bits
16 bits
Opcode
Source
or base
Destination
or data
Immediate operand
or address offset
op
25
jump target address
0
0
0
6 bits
1 0 0 0 0 0 0 0 0 0 0 0 26
0 bits
0 0 0 0 0 0 0 1 1 1 1 0 1
Opcode
Memory word address (byte address divided by 4)
© NTK 2009
ftp address to download materials
ftp://dce.hut.edu.vn/kiennt/
Page 166
© NTK 2009
Action of the 1-bit control signals
Page 167
© NTK 2009
Action of the 2-bit control signals
Page 168
© NTK 2009
Breaking instruction execution into clock cycles
Each MIPS instruction needs from three to five of these steps
– 1. Instruction fetch step.
– 2. Instruction decode and register fetch step.
Same for all instructions
– 3. Execution, memory address computation or branch completion.
– 4. Memory access or R-type instruction completion step.
– 5. Memory read completion step.
Page 169
© NTK 2009
1. Instruction fetch step
Fetch the instruction from memory and compute the address
of the next sequential instructions:
– IR <= Memory[PC];
• Assert the control signals MemRead and IRWrite and set IorD = 0 to
select PC as the source of address.
– PC <= PC + 4;
• ALUSrcA = 0 (sending PC to ALU input)
• ALUSrcB = 01 (sending 4 to ALU input)
• ALUOp = 00 (to make ALU add)
• PCSource = 00 (to select output of ALU addition as source)
• assert PCWrite to write back to PC
Page 170
© NTK 2009
2. Instruction decode and register fetch step
Read two registers in rs and rt instruction fields and compute
the branch target address with ALU
– A <= Reg[IR[25:21]]
Are these operations
necessary for all
– B <= Reg[IR[20:16]]
instructions?
– ALUOut <= PC + (sign-extend (IR[15:0])<<2
• ALUSrcA = 0 to select PC as ALU input
• ALUSrcB = 11 to select sign-extended and shifted offset field as ALU
input
• ALUOp = 00 to make ALU add.
Page 171
© NTK 2009
3. Execution, memory address computation or branch
completion step
This is the first cycle during which the datapath operation is
determined by the instruction class
– Memory reference
• ALUOut <= A + sign-extend (IR[15:0])
– Arithmetic-logical instruction (R-type)
• ALUOut <= A op B
– Branch
• if (A==B) PC <= ALUOut
– Jump
• PC <= {PC[31:28], (IR[25:0],2’b00)};
Page 172
© NTK 2009
4. Memory access or R-type instruction completion
During this step, a load or store instruction accesses memory
and an arithmetic-logical instruction writes its result.
– Memory reference:
• MDR <= Memory [ALUOut];
• or Memory [ALUOut] <= B;
– Arithmetic-logical instruction:
• Reg[ IR[15:11] ] <= ALUOut;
Page 173
© NTK 2009
5. Memory read completion step
During this step, loads complete by writing back the value
from memory
– Reg [IR[20:16]] <= MDR;
Page 174
© NTK 2009
Summary of steps taken to execute any instruction class
Page 175
© NTK 2009
Defining the control
We use state machine to describe the operation of MIPS
Page 176
© NTK 2009
Figure 5.32. FSM for instruction fetch and decode
Page 177
© NTK 2009
Figure 5.33. FSM for controlling memory-reference instructions
Page 178
© NTK 2009
Figure 5.34. FSM for R-type instruction
Page 179
© NTK 2009
Figure 5.35. FSM for branch instruction
Page 180
© NTK 2009
Figure 5.36. FSM for jump instruction
Page 181
© NTK 2009
Complete FSM
Page 182
© NTK 2009
4. CPU Organization
4.1. Building a datapath
4.2. Single cycle implementation
4.3. Multicycle implementation
4.4. Exceptions
Page 183
© NTK 2009
4.4. Exceptions
Exception:
– Is an unexpected event from within the processor
– Ex: arithmetic overflow, using an undefined instruction...
Interrupt:
– Is an event that causes an unexpected change in control flow but
comes from outside of the processor.
– Interrupts are used by IO devices to communicate with the processor.
– Ex: timer interrupt...
Page 184
© NTK 2009
How exceptions are handled?
Two types of exceptions which our current implementation can generate:
– Execution of an undefined instruction
– Execution of an arithmetic overflow
Basic action of machine when an exception occurs:
– Save address of offending instruction in the Exception Program Counter (EPC).
– Transfer control to the OS at some specific address.
• The OS can then take appropriate action which may involve providing some
service to the user program, taking predefined action in response to and
overflow, or stopping the execution of program and reporting errors.
• Then the OS can terminate the program or may continue its execution, using
EPC to determine the address to restart the program.
Page 185
© NTK 2009
How exceptions are handled?
We can implement the processing required for exception by
adding a few extra registers and control signals to our basic
implementation and by slightly extending the FSM.
– Two additional registers:
• EPC: 32-bit register used to hold address of the affected instruction
• Cause: 32-bit register used to record the cause of exception
– Cause register = 0: undefined instruction
– Cause register = 1: arithmetic overflow
– Two control signals used to cause EPC and Cause register to be
written: EPCWrite, CauseWrite.
– One-bit signal to set value 0/1 to Cause register.
– Write exception address of handling code to PC (8000 0180(16))
Page 186
© NTK 2009
The multicycle datapath with exception handling
Page 187
© NTK 2009
How control checks for exceptions?
Undefined instruction exception:
– This is detected when no next state is defined from state 1 for op value
– We handle this exception by defining the next-state value for all op
values rather than lw, sw, 0 (R-type), j and beq as state 10.
Arithmetic overflow exception:
– The ALU designed includes logic to detect overflow and a signal called
Overflow is provided as an output from ALU. This signal is used to
specify additional possible next state (state 11) for state 7.
Page 188
© NTK 2009
FSM
with
exception
detection
Page 189
© NTK 2009
Midterm exam (70’)
Using multicycle datapath implementation of MIPS, explain
the operation of the following instructions:
– a. load/store t1,15(s0)
– b. add/sub/and/or/slt s0,t1,t2
– c. beq t1,t2,Label
– d. jump Label
Page 190
© NTK 2009
4. CPU Organization
4.1. Building a datapath
4.2. Single cycle implementation
4.3. Multicycle implementation
4.4. Exceptions
4.5. Pipelining
Page 191
© NTK 2009
4.5. Pipelining
Introduction
Pipelined Datapath
Pipelined Control
Pipelined Hazards
Page 192
© NTK 2009
4.5. Pipelining
Introduction
Pipelined Datapath
Pipelined Control
Pipelined Hazards
Page 193
© NTK 2009
Introduction
Pipelining is an implementation technique used to enhance
performance in which multiple instructions are overlapped in
execution.
The idea of pipelining is: Divide an instruction into smaller
steps which can be executed concurrently.
Page 194
© NTK 2009
Introduction
MIPS instructions classically take five steps:
– 1. Fetch instruction from memory.
– 2. Read registers while decoding the instruction. The format of MIPS
instruction allow reading and decoding to occur simultaneously.
– 3. Execute the operation or calculate the address.
– 4. Access an operand in memory.
– 5. Write the result into a register.
Page 195
© NTK 2009
Pipelining
Inst 1
Inst 2
Inst 3
Inst 4
Inst 5
Inst 6
Inst 7
Page 196
1
2
3
4
FI
DI
EX
FO
5
6
7
8
WR
© NTK 2009
9
10
11
12
Single-cycle versus Pipelined performance
In this example, we limit our attention to 8 instructions: lw,
sw, add, sub, and, or, slt, beq.
Total time for each instruction:
Page 197
© NTK 2009
Single-cycle versus Pipelined performance
Page 198
© NTK 2009
Pipelined performance
If the stages are perfectly balanced, then the time between
instructions in pipelined processor when the number of
instruction is large is equal to:
Page 199
© NTK 2009
4.5. Pipelining
Introduction
Pipelined Datapath
Pipelined Control
Pipelined Hazards
Page 200
© NTK 2009
A pipelined datapath
The division of an instruction into five stages means a fivestage pipeline, which in turn means five instructions will be in
execution during any single clock cycle:
Page 201
© NTK 2009
Pipelined execution in single-cycle datapath of MIPS
Because each resource is used during only one of the five stages
of an instruction, allowing it to be shared by other instructions
during the other four stages.
=> To retain the value of an individual instruction for its other four
Page 202
© NTK 2009
stages, the value must be saved in a register.
Pipelined datapath of MIPS
Registers
Registers must be wide enough to store all data corresponding to the lines
Page 203
© NTK 2009
that go through them. IF/ID:64, ID/EX:128, EX/MEM:97,MEM/WB:64
How are portions of datapath used during an
instruction?
– Example of a load instruction
Page 204
© NTK 2009
IF: first stage of an instruction
Page 205
© NTK 2009
ID: second stage of an instruction
Page 206
© NTK 2009
EX: third stage of an instruction
Page 207
© NTK 2009
MEM: forth stage of an instruction
Page 208
© NTK 2009
WB: fifth stage of an instruction
Page 209
© NTK 2009
Write register number needed to be saved for WB stage
Corrected datapath to handle a load instruction
Page 210
© NTK 2009
4.5. Pipelining
Introduction
Pipelined Datapath
Pipelined Control
Pipelined Hazards
Page 211
© NTK 2009
Pipelined datapath of MIPS with control
For pipelined datapath, we can divide control lines into five
groups according to the pipeline stage:
– IF: The control signals to read instruction memory and to write the PC
are always asserted, so there is nothing special to control in this
pipeline stage.
– ID: As in the previous stage, the same thing happens at every clock
cycle, so there are no optional control lines to set.
– EX: The signals to be set are RegDst, ALUOp, ALUSrc. The signals
select Result register, the ALU operation, and either read data 2 or a
sign-extended immediate for ALU.
– MEM: The control lines set in this stage are Branch, MemRead,
MemWrite. These signals are set by the branch equal, load, store
instructions.
– WB: two control lines are MemtoReg, which decides between sending
the ALU result or the memory value to the register file, and RegWrite,
which writes the chosen value.
Page 212
© NTK 2009
Pipelined datapath of MIPS with control
Page 213
© NTK 2009
9 control lines for final three stages
Four control lines for EX stage
Three control lines for MEM stage
Page 214
© NTK 2009
Two control lines for WB stage
The pipelined datapath with control signals connected
Page 215
© NTK 2009
Designing Instruction sets for pipelining
The design of MIPS was designed for pipeline execution:
– 1. All MIPS instructions are the same length.
» Easy to fetch instruction in 1st stage and decode it in 2nd stage.
– 2. MIPS only has a few instruction formats, with the source register fields being
located in the same place in each instruction.
» In 2nd stage, register file can be read at the same time that hardware is
deciding what type of instruction was fetched.
– 3. Memory operands only appear in load and store in MIPS.
– 4. Operands are aligned in memory.
– RISC – Reduced Instruction Set Computer
– CISC – Complex Instruction Set Computer
Page 216
© NTK 2009
4.5. Pipelining
Introduction
Pipelined Datapath
Pipelined Control
Pipelined Hazards
Page 217
© NTK 2009
Pipeline Hazards
There are situations in pipelining when the next instruction
can not execute in the following clock cycle => hazards.
Three types of hazards:
– Structural Hazards
– Data Hazards
• Forwarding
• Stall
– Control Hazards
Page 218
© NTK 2009
Hazards
Structural Hazards
Data Hazards
– Forwarding
– Stall
Control Hazards
Page 219
© NTK 2009
Structural Hazards
Structural Hazards occur when the hardware cannot support
the combination of instructions that we want to execute in the
same clock cycle.
MIPS is designed to avoid structural harzards when designing
a pipeline.
– Ex: suppose we have a single memory instead of two mem (instruction
memory + data memory)
Conflict when
both read
memory
Page 220
© NTK 2009
Hazards
Structural Hazards
Data Hazards
– Forwarding
– Stall
Control Hazards
Page 221
© NTK 2009
Data Hazards
Data Hazards occur when the pipeline must be stalled
because one step must wait for another to complete.
– In a computer pipeline, data hazards arise from the dependence of one
instruction on an ealier one that is still in pipeline.
add s0, t0, t1
sub t2, s0, t3
# add instruction doesn’t write its result to s0 until fifth stage
# sub instruction read s0 register in second stage
Solution:
– Observation: we don’t need to wait for instruction to complete before
trying to resolve the data hazards. For code above, as soon as the ALU
creates sum for addition, we can supply it as input for substraction.
– Adding extra hardware to to retrieve missing item early from the internal
Page 222
© NTK 2009
resources is called forwading or
bypassing.
Representation of instruction pipeline
IF: Instruction fetch
ID: Instruction decode + register file read
EX: Execution
MEM: Memory access
WB: Write back
The shading indicates the element is used by instruction
– MEM: white => add doesn’t access memory
– Shading on the right half: Read
Page
© NTK 2009
– 223
Shading on the left half: Write
Hazards
Structural Hazards
Data Hazards
– Forwarding
– Stall
Control Hazards
Page 224
© NTK 2009
Data hazards and forwarding
Instead of waiting until the fifth stage of add instruction for the
result in s0 register, forwarding uses extra hardware to write
the output of EX stage of add instruction to the input of EX
stage for sub instruction. (extra hardware to create connection)
Page 225
© NTK 2009
Data hazards and forwarding
Forwarding paths are valid if only the destination stage is
later in time than the source stage.
– Ex:
lw
s0,20(t1)
sub t2,s0,t3
there cannot be a valid forwarding path from the output of memory access
stage in the first instruction to the input of the execution stage of the
following , since that would mean going backward in time.
pipeline
stall
Page 226
© NTK 2009
Data hazards and forwarding
pipeline
stall
Pipeline stall or bubble is added to solve data hazards when
an R-format instruction following a load tries to use the data.
Page 227
© NTK 2009
Reordering code to avoid pipeline stalls
Page 228
© NTK 2009
Reordering code to avoid pipeline stalls
Page 229
© NTK 2009
Forwarding
Forwarding yields another insight into the MIPS architecture.
Each MIPS instruction writes at most one result and does so
near the end of the pipeline. Forwarding is harder if there are
multiple results to forward per instruction or they need to write
a result early on in instruction execution.
Notes:
– The name “forwarding” comes from the idea that the result is passed
forward from an earlier instruction to a later instruction. “Bypassing”
comes from passing the result by register file to the desired unit.
Page 230
© NTK 2009
ALU and pipeline registers without forwarding
Page 231
© NTK 2009
ALU and pipeline registers with forwarding
Page 232
© NTK 2009
Datapath modified to resolve hazards via forwarding
Page 233
© NTK 2009
Hazards
Structural Hazards
Data Hazards
– Forwarding
– Stall
Control Hazards
Page 234
© NTK 2009
Data hazards and Stalls
Forwarding can not resolve
this program because the
destination stage is ealier in
time than the source stage.
?
Page 235
© NTK 2009
Stalls are inserted into pipeline
Page 236
© NTK 2009
Harzards detection unit used for stalls
Page 237
© NTK 2009
Harzards detection unit used for stalls
Harzards detection unit is used to solve hazard when an
instruction tries to read a register following a load instruction
that writes the same register.
Checking for load instruction:
– if (ID/EX.MemRead and
load instruction or not?
(( ID/EX.RegisterRt = IF/ID.RegisterRs) or
( ID/EX.RegisterRt = IF/ID.RegisterRt)))
stall the pipeline
Page 238
© NTK 2009
the destination register
field of load instruction
in EX stage matches
either source register
of the instruction in ID
stage
Hazards
Structural Hazards
Data Hazards
– Forwarding
– Stall
Control Hazards
Page 239
© NTK 2009
Control Hazards (Branch Hazards)
Control hazards or branch hazards occur when the proper
instruction can not execute in the proper clock cycle because
the instruction that was fetched is not the one that is needed,
that is the flow of instruction addresses is not what the
pipeline expected.
In branch instruction:
– We need to fetch the instruction following the branch on the next clock
cycle to allow pipeline.
– But the pipeline cannot possibly know what the next instruction should
be, since it only just received the branch instruction from memory.
Page 240
© NTK 2009
Impact of pipeline on the branch instruction
if branch is taken,
we need to discard
(flush) these instructions
Page 241
© NTK 2009
Solution 1: Stall
If we can move branch decision up earlier, we have less delay.
Assume that we put in enough extra hardware so that we can test
registers, calculate the branch address, and update PC during the second
stage of pipeline. Even with this highly costed extra hardware, we still
have stall:
– lw instruction, executed if the branch fails, is stalled one extra 200ps clock
cycle before starting.
Page 242
The cost of ©extra
NTK 2009hw for most computer is too high
Solution 2: Branch Prediction
To solve branch hazards, we use branch prediction:
– One simple approach is to always predict that branches will be
undertaken.
• When branches are correctly undertaken, the pipeline proceeds at
full speed.
• When branches are taken, we have some stalls.
Page 243
© NTK 2009
Branch prediction is solution to control hazards
Page 244
© NTK 2009
Branch prediction
A more sophisticated version of branch prediction would have
some branches predicted as undertaken, and some as taken:
– For usual branch like previous example, we predict branch as
undertaken.
– For branches at the loops, branches are usually jump back to the top of
loop. So branches are predicted as taken.
– Dynamic hardware predictor:
• May change predictions for a branch over the life of a program.
• Keeping a history table for each branch as taken or not taken, the
using the past behavior to predict the future.
Page 245
© NTK 2009
Solution 3: Delayed decision (used by MIPS)
The delayed branch always executes the next sequential
instruction, with branch taking place after that one instruction
delay.
Compiler and assembler try to place an instruction that
always executes after the branch in the branch delay slot.
add t1,t2,t4
beq s0,s1,LABEL
or t5,t6,t7
LABEL:
....
Page 246
beq s0,s1,LABEL
add t1,t2,t4
or t5,t6,t7
LABEL:
....
© NTK 2009
Scheduling the branch delay slot
Page 247
© NTK 2009
Pipeline summary
We have seen three models of execution: single cycle,
multicycle, and pipelined.
Pipelined control strives for 1 clock cycle per instruction, like
single cycle, but also for a fast clock cycle, like multicycle.
Page 248
© NTK 2009
FINISH
Page 249
© NTK 2009