Chapter-2-Instructions-Language-of-the

Download Report

Transcript Chapter-2-Instructions-Language-of-the

Chapter 2
Instructions: Language
of the Computer
Instructions: Overview



Language of the machine
More primitive than higher level languages, e.g., no sophisticated
control flow such as while or for loops
Very restrictive


We’ll be working with the MIPS instruction set architecture





e.g., MIPS arithmetic instructions
inspired most architectures developed since the 80's
used by NEC, Nintendo, Silicon Graphics, Sony
the name is not related to millions of instructions per second !
it stands for microcomputer without interlocked pipeline stages !
Design goals: maximize performance and minimize cost and reduce
design time
MIPS Arithmetic

All MIPS arithmetic instructions have 3 operands
Operand order is fixed (e.g., destination first)

Example:

C code:
A = B + C
MIPS code:
add $s0, $s1, $s2
compiler’s job to associate
variables with registers
MIPS Arithmetic

Design Principle 1: simplicity favors regularity.
Translation: Regular instructions make for simple hardware!

Simpler hardware reduces design time and manufacturing cost.


Of course this complicates some things...
C code:
A = B + C + D;
E = F - A;
MIPS code
(arithmetic):
add $t0, $s1, $s2
add $s0, $t0, $s3
sub $s4, $s5, $s0
Allowing variable number
of operands would
simplify the assembly
code but complicate the
hardware.
Performance penalty: high-level code translates to denser
machine code.
MIPS Arithmetic


Operands must be in registers – only 32 registers provided
(which require 5 bits to select one register). Reason for small
number of registers:
Design Principle 2: smaller is faster.


Why?
Electronic signals have to travel further on a physically larger chip
increasing clock cycle time.
Smaller is also cheaper!
Aside: MIPS Register Convention
Name
Register
Number
$zero
0
$at
1
$v0 - $v1
2-3
$a0 - $a3
4-7
$t0 - $t7
8-15
$s0 - $s7
16-23
$t8 - $t9
24-25
$gp
28
$sp
29
$fp
30
$ra
31
Usage
Preserve
on call?
constant 0 (hardware)
n.a.
reserved for assembler
n.a.
returned values
no
arguments
yes
temporaries
no
saved values
yes
temporaries
no
global pointer
yes
stack pointer
yes
frame pointer
yes
return addr (hardware)
yes
Chapter 2 — Instructions: Language of the Computer — 6
Registers vs. Memory


Registers are faster to access than
memory
Operating on memory data requires loads
and stores


More instructions to be executed
Compiler must use registers for variables
as much as possible


Only spill to memory for less frequently used
variables
Register optimization is important!
Chapter 2 — Instructions: Language of the Computer — 7
Memory Organization



Viewed as a large single-dimension array with access by
address
A memory address is an index into the memory array
Byte addressing means that the index points to a byte of
memory, and that the unit of memory accessed by a load/store
is a byte
0
1
8 bits of data
2
8 bits of data
3
4
5
6
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
...
Memory Organization


Bytes are load/store units, but most data items use larger words
For MIPS, a word is 32 bits or 4 bytes.
0
4
8
12
32 bits of data
32 bits of data
Registers correspondingly hold 32 bits of data
32 bits of data
32 bits of data
...


232 bytes with byte addresses from 0 to 232-1
230 words with byte addresses 0, 4, 8, ... 232-4


i.e., words are aligned
what are the least 2 significant bits of a word address?
Memory Operands

Main memory used for composite data


To apply arithmetic operations



Each address identifies an 8-bit byte
Words are aligned in memory


Load values from memory into registers
Store result from register to memory
Memory is byte addressed


Arrays, structures, dynamic data
Address must be a multiple of 4
MIPS is Big Endian


Most-significant byte at least address of a word
c.f. Little Endian: least-significant byte at least address
Chapter 2 — Instructions: Language of the Computer — 10
Immediate Operands

Constant data specified in an instruction
addi $s3, $s3, 4

No subtract immediate instruction

Just use a negative constant
addi $s2, $s1, -1

Design Principle 3: Make the common
case fast


Small constants are common
Immediate operand avoids a load instruction
Chapter 2 — Instructions: Language of the Computer — 11
The Constant Zero

MIPS register 0 ($zero) is the constant 0


Cannot be overwritten
Useful for common operations

E.g., move between registers
add $t2, $s1, $zero
Chapter 2 — Instructions: Language of the Computer — 12
Load/Store Instructions


Load and store instructions
Example:
C code:
A[8] = h + A[8];
value
MIPS code (load):
(arithmetic):
(store):


offset
address
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 32($s3)
Load word has destination first, store has destination last
Remember MIPS arithmetic operands are registers, not memory
locations

therefore, words must first be moved from memory to registers using
loads before they can be operated on; then result can be stored back to
memory
A MIPS Example

Can we figure out the assembly code?
swap(int v[], int k);
{ int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
swap:
muli $2, $5, 4
add $2, $4, $2
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
jr $31
So far we’ve learned:

MIPS



loading words but addressing bytes
arithmetic on registers only
Instruction
Meaning
add $s1, $s2, $s3
sub $s1, $s2, $s3
lw $s1, 100($s2)
sw $s1, 100($s2)
$s1 = $s2 + $s3
$s1 = $s2 – $s3
$s1 = Memory[$s2+100]
Memory[$s2+100]= $s1
Machine Language

Instructions, like registers and words of data, are also 32 bits long



Example: add $t0, $s1, $s2
registers are numbered, e.g., $t0 is 8, $s1 is 17, $s2 is 18
Instruction Format R-type (“R” for aRithmetic):
000000 10001 10010 01000 00000
op
opcode –
operation
6 bits
rs
first
register
source
operand
5 bits
rt
second
register
source
operand
5 bits
rd
register
destination
operand
5 bits
100000
shamt
shift
amount
5 bits
funct
function field selects variant
of operation
6 bits
Machine Language

Consider the load-word and store-word instructions,

what would the regularity principle have us do?



we would have only 5 or 6 bits to determine the offset from a base
register - too little…
Design Principle 3: Good design demands a compromise
Introduce a new type of instruction format


I-type (“I” for Immediate) for data transfer instructions
Example: lw $t0, 1002($s2)
100011 10010
6 bits
op
01000
5 bits
rs
rt
0000001111101010
5 bits
16 bits
16 bit offset
Stored Program Concept


Instructions are bit sequences, just like data
Programs are stored in memory

to be read or written just like data
memory for data, programs,
compilers, editors, etc.
Processor
Memory

Fetch & Execute Cycle



instructions are fetched and put into a special register
bits in the register control the subsequent actions (= execution)
fetch the next instruction and repeat
SPIM – the MIPS simulator

SPIM (MIPS spelt backwards!) is a MIPS simulator that







reads MIPS assembly language files and translates to machine
language
executes the machine language instructions
shows contents of registers and memory
works as a debugger (supports break-points and single-stepping)
provides basic OS-like services, like simple I/O
SPIM is freely available on-line
An important part of our course is to actually write MIPS assembly
code and run using SPIM – the only way to learn assembly (or any
programming language) is to write lots and lots of code!!!
Memory Organization:
Big/Little Endian Byte Order

Bytes in a word can be numbered in two ways:


byte 0 at the leftmost (most significant) to byte 3 at the rightmost
(least significant), called big-endian 0 1 2 3
byte 3 at the leftmost (most significant) to byte 0 at the rightmost
(least significant), called little-endian
3 2 1 0
Big-endian
Memory
Little-endian
Memory
Byte 0 Byte 1 Byte 2 Byte 3
Word 0
Byte 3 Byte 2 Byte 1 Byte 0
Word 0
Byte 4 Byte 5 Byte 6 Byte 7
Word 1
Byte 7 Byte 6 Byte 5 Byte 4
Word 1
Memory Organization:
Big/Little Endian Byte Order

SPIM’s memory storage depends on that of the underlying
machine






Intel 80x86 processors are little-endian
because SPIM always shows words from left to right a “mental
adjustment” has to be made for little-endian memory as in Intel PCs
in our labs: start at right of first word go left, start at right of next word
go left, …!
Word placement in memory (from .data area of code) or word
access (lw, sw) is the same in big or little endian
Byte placement and byte access (lb, lbu, sb) depend on big or
little endian because of the different numbering of bytes within a
word
Character placement in memory (from .data area of code)
depend on big or little endian because it is equivalent to byte
placement after ASCII encoding
Run storeWords.asm from SPIM examples!!
Control: Conditional Branch

Decision making instructions

alter the control flow,


i.e., change the next instruction to be executed
MIPS conditional branch instructions:
bne $t0, $t1, Label
beq $t0, $t1, Label
000100 01000 01001

Example:
I-type instructions
0000000000011001
if (i==j) h = i + j;
bne $s0, $s1, Label
add $s3, $s0, $s1
Label:
....
beq $t0, $t1, Label
(= addr.100)
word-relative addressing:
25 words = 100 bytes;
also PC-relative (more…)
Addresses in Branch

Instructions:
bne $t4,$t5,Label
beq $t4,$t5,Label

Next instruction is at Label if $t4 != $t5
Next instruction is at Label if $t4 = $t5
Format:
I
op
rs
rt
16 bit offset

16 bits is too small a reach in a 232 address space

Solution: specify a register (as for lw and sw) and add it to offset


use PC (= program counter), called PC-relative addressing, based
on
principle of locality: most branches are to instructions near current
instruction (e.g., loops and if statements)
Addresses in Branch


Further extend reach of branch by observing all MIPS
instructions are a word (= 4 bytes), therefore word-relative
addressing:
MIPS branch destination address = (PC + 4) + (4 * offset)
Because hardware typically increments PC early
in execute cycle to point to next instruction


so offset = (branch destination address – PC – 4)/4
but SPIM does offset = (branch destination address – PC)/4
Control: Unconditional Branch (Jump)

MIPS unconditional branch instructions:
j Label

Example:
if (i!=j)
h=i+j;
else
h=i-j;

beq $s4, $s5, Lab1
add $s3, $s4, $s5
j Lab2
Lab1: sub $s3, $s4, $s5
Lab2: ...
J-type (“J” for Jump) instruction format

Example: j Label # addr. Label = 100
word-relative
addressing:
25 words = 100 bytes
000010
6 bits
op
00000000000000000000011001
26 bits
26 bit number
Addresses in Jump

Word-relative addressing also for jump instructions
J

op
26 bit address
MIPS jump j instruction replaces lower 28 bits of the PC with
A00 where A is the 26 bit address; it never changes upper 4 bits




Example: if PC = 1011X (where X = 28 bits), it is replaced with
1011A00
there are 16(=24) partitions of the 232 size address space, each
partition of size 256 MB (=228), such that, in each partition the upper
4 bits of the address is same.
if a program crosses an address partition, then a j that reaches a
different partition has to be replaced by jr with a full 32-bit address
first loaded into the jump register
therefore, OS should always try to load a program inside a single
partition
Constants

Small constants are used quite frequently (50% of operands)
e.g.,
A = A + 5;
B = B + 1;
C = C - 18;

Solutions? Will these work?


create hard-wired registers (like $zero) for constants like 1
put program constants in memory and load them as required

MIPS Instructions:
addi $29, $29, 4
slti $8, $18, 10
andi $29, $29, 6
ori $29, $29, 4

How to make this work?
Immediate Operands

Make operand part of instruction itself!

Design Principle 4: Make the common case fast

Example: addi $sp, $sp, 4 # $sp = $sp + 4
001000
11101
11101
6 bits
5 bits
5 bits
op
rs
rt
0000000000000100
16 bits
16 bit number
How about larger constants?


First we need to load a 32 bit constant into a register
Must use two instructions for this: first new load upper immediate
instruction for upper 16 bits
lui $t0, 1010101010101010
filled with zeros
1010101010101010

0000000000000000
Then get lower 16 bits in place:
ori $t0, $t0, 1010101010101010
1010101010101010
0000000000000000
0000000000000000
1010101010101010
1010101010101010
1010101010101010
ori

Now the constant is in place, use register-register arithmetic
So far

Instruction
Format
add $s1,$s2,$s3
sub $s1,$s2,$s3
lw $s1,100($s2)
sw $s1,100($s2)
bne $s4,$s5,Lab1
beq $s4,$s5,Lab2
j Lab3

Meaning
R
R
I
I
I
I
J
$s1 = $s2 + $s3
$s1 = $s2 – $s3
$s1 = Memory[$s2+100]
Memory[$s2+100] = $s1
Next instr. is at Lab1 if $s4 != $s5
Next instr. is at Lab2 if $s4 = $s5
Next instr. is at Lab3
Formats:
R
op
rs
rt
rd
I
op
rs
rt
16 bit address
J
op
26 bit address
shamt
funct


Instructions for bitwise manipulation
Operation
C
Java
MIPS
Shift left
<<
<<
sll
Shift right
>>
>>>
srl
Bitwise AND
&
&
and, andi
Bitwise OR
|
|
or, ori
Bitwise NOT
~
~
nor
§2.6 Logical Operations
Logical Operations
Useful for extracting and inserting
groups of bits in a word
Chapter 2 — Instructions: Language of the Computer — 31
Control Flow

We have: beq, bne. What about branch-if-less-than?

New instruction:
slt $t0, $s1, $s2

if
$s1 < $s2 then
$t0 = 1
else
$t0 = 0
Can use this instruction to build blt $s1, $s2, Label


how? We generate more than one instruction – pseudo-instruction
can now build general control structures

The assembler needs a register to manufacture instructions
from pseudo-instructions

There is a convention (not mandatory) for use of registers
Compiling If Statements

C code:
if (i==j) f = g+h;
else f = g-h;


f, g, … in $s0, $s1, …
Compiled MIPS code:
bne
add
j
Else: sub
Exit: …
$s3, $s4, Else
$s0, $s1, $s2
Exit
$s0, $s1, $s2
Assembler calculates addresses
Chapter 2 — Instructions: Language of the Computer — 33
Compiling Loop Statements

C code:
while (save[i] == k) i += 1;


i in $s3, k in $s5, address of save in $s6
Compiled MIPS code:
Loop: sll
add
lw
bne
addi
j
Exit: …
$t1,
$t1,
$t0,
$t0,
$s3,
Loop
$s3, 2
$t1, $s6
0($t1)
$s5, Exit
$s3, 1
Chapter 2 — Instructions: Language of the Computer — 34
Basic Blocks

A basic block is a sequence of instructions
with


No embedded branches (except at end)
No branch targets (except at beginning)


A compiler identifies basic
blocks for optimization
An advanced processor
can accelerate execution
of basic blocks
Chapter 2 — Instructions: Language of the Computer — 35
More Conditional Operations

Set result to 1 if a condition is true


slt rd, rs, rt


if (rs < rt) rd = 1; else rd = 0;
slti rt, rs, constant


Otherwise, set to 0
if (rs < constant) rt = 1; else rt = 0;
Use in combination with beq, bne
slt $t0, $s1, $s2
bne $t0, $zero, L
# if ($s1 < $s2)
#
branch to L
Chapter 2 — Instructions: Language of the Computer — 36
Branch Instruction Design


Why not blt, bge, etc?
Hardware for <, ≥, … slower than =, ≠




Combining with branch involves more work
per instruction, requiring a slower clock
All instructions penalized!
beq and bne are the common case
This is a good design compromise
Chapter 2 — Instructions: Language of the Computer — 37
Signed vs. Unsigned



Signed comparison: slt, slti
Unsigned comparison: sltu, sltui
Example



$s0 = 1111 1111 1111 1111 1111 1111 1111 1111
$s1 = 0000 0000 0000 0000 0000 0000 0000 0001
slt $t0, $s0, $s1 # signed


–1 < +1  $t0 = 1
sltu $t0, $s0, $s1

# unsigned
+4,294,967,295 > +1  $t0 = 0
Chapter 2 — Instructions: Language of the Computer — 38

Steps required
1.
2.
3.
4.
5.
6.
Place parameters in registers
Transfer control to procedure
Acquire storage for procedure
Perform procedure’s operations
Place result in register for caller
Return to place of call
§2.8 Supporting Procedures in Computer Hardware
Procedure Calling
Chapter 2 — Instructions: Language of the Computer — 39
Register Usage



$a0 – $a3: arguments (reg’s 4 – 7)
$v0, $v1: result values (reg’s 2 and 3)
$t0 – $t9: temporaries


$s0 – $s7: saved





Can be overwritten by callee
Must be saved/restored by callee
$gp: global pointer for static data (reg 28)
$sp: stack pointer (reg 29)
$fp: frame pointer (reg 30)
$ra: return address (reg 31)
Chapter 2 — Instructions: Language of the Computer — 40
Procedure Call Instructions

Procedure call: jump and link
jal ProcedureLabel
 Address of following instruction put in $ra
 Jumps to target address

Procedure return: jump register
jr $ra
 Copies $ra to program counter
 Can also be used for computed jumps

e.g., for case/switch statements
Chapter 2 — Instructions: Language of the Computer — 41
Leaf Procedure Example

C code:
int leaf_example (int g, h, i, j)
{ int f;
f = (g + h) - (i + j);
return f;
}
 Arguments g, …, j in $a0, …, $a3
 f in $s0 (hence, need to save $s0 on stack)
 Result in $v0
Chapter 2 — Instructions: Language of the Computer — 42
Leaf Procedure Example

MIPS code:
leaf_example:
addi $sp, $sp, -4
sw
$s0, 0($sp)
add $t0, $a0, $a1
add $t1, $a2, $a3
sub $s0, $t0, $t1
add $v0, $s0, $zero
lw
$s0, 0($sp)
addi $sp, $sp, 4
jr
$ra
Save $s0 on stack
Procedure body
Result
Restore $s0
Return
Chapter 2 — Instructions: Language of the Computer — 43
Non-Leaf Procedures


Procedures that call other procedures
For nested call, caller needs to save on the
stack:



Its return address
Any arguments and temporaries needed after
the call
Restore from the stack after the call
Chapter 2 — Instructions: Language of the Computer — 44
Non-Leaf Procedure Example

C code:
int fact (int n)
{
if (n < 1) return f;
else return n * fact(n - 1);
}
 Argument n in $a0
 Result in $v0
Chapter 2 — Instructions: Language of the Computer — 45
Non-Leaf Procedure Example

MIPS code:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
L1: addi
jal
lw
lw
addi
mul
jr
$sp,
$ra,
$a0,
$t0,
$t0,
$v0,
$sp,
$ra
$a0,
fact
$a0,
$ra,
$sp,
$v0,
$ra
$sp, -8
4($sp)
0($sp)
$a0, 1
$zero, L1
$zero, 1
$sp, 8
$a0, -1
0($sp)
4($sp)
$sp, 8
$a0, $v0
#
#
#
#
adjust stack for 2 items
save return address
save argument
test for n < 1
#
#
#
#
#
#
#
#
#
#
if so, result is 1
pop 2 items from stack
and return
else decrement n
recursive call
restore original n
and return address
pop 2 items from stack
multiply to get result
and return
Chapter 2 — Instructions: Language of the Computer — 46
Local Data on the Stack

Local data allocated by callee


e.g., C automatic variables
Procedure frame (activation record)

Used by some compilers to manage stack storage
Chapter 2 — Instructions: Language of the Computer — 47
Memory Layout


Text: program code
Static data: global
variables



Dynamic data: heap


e.g., static variables in C,
constant arrays and strings
$gp initialized to address
allowing ±offsets into this
segment
E.g., malloc in C, new in
Java
Stack: automatic storage
Chapter 2 — Instructions: Language of the Computer — 48
Byte/Halfword Operations


Could use bitwise operations
MIPS byte/halfword load/store

String processing is a common case
lb rt, offset(rs)

Sign extend to 32 bits in rt
lbu rt, offset(rs)

lhu rt, offset(rs)
Zero extend to 32 bits in rt
sb rt, offset(rs)

lh rt, offset(rs)
sh rt, offset(rs)
Store just rightmost byte/halfword
Chapter 2 — Instructions: Language of the Computer — 49
String Copy Example

C code (naïve):
Null-terminated string
void strcpy (char x[], char y[])
{ int i;
i = 0;
while ((x[i]=y[i])!='\0')
i += 1;
}
 Addresses of x, y in $a0, $a1
 i in $s0

Chapter 2 — Instructions: Language of the Computer — 50
String Copy Example

MIPS code:
strcpy:
addi
sw
add
L1: add
lbu
add
sb
beq
addi
j
L2: lw
addi
jr
$sp,
$s0,
$s0,
$t1,
$t2,
$t3,
$t2,
$t2,
$s0,
L1
$s0,
$sp,
$ra
$sp, -4
0($sp)
$zero, $zero
$s0, $a1
0($t1)
$s0, $a0
0($t3)
$zero, L2
$s0, 1
0($sp)
$sp, 4
#
#
#
#
#
#
#
#
#
#
#
#
#
adjust stack for 1 item
save $s0
i = 0
addr of y[i] in $t1
$t2 = y[i]
addr of x[i] in $t3
x[i] = y[i]
exit loop if y[i] == 0
i = i + 1
next iteration of loop
restore saved $s0
pop 1 item from stack
and return
Chapter 2 — Instructions: Language of the Computer — 51

Most constants are small


16-bit immediate is sufficient
For the occasional 32-bit constant
lui rt, constant


Copies 16-bit constant to left 16 bits of rt
Clears right 16 bits of rt to 0
lhi $s0, 61
0000 0000 0111 1101 0000 0000 0000 0000
ori $s0, $s0, 2304 0000 0000 0111 1101 0000 1001 0000 0000
§2.10 MIPS Addressing for 32-Bit Immediates and Addresses
32-bit Constants
Chapter 2 — Instructions: Language of the Computer — 52
Branch Addressing

Branch instructions specify


Opcode, two registers, target address
Most branch targets are near branch


Forward or backward
op
rs
rt
constant or address
6 bits
5 bits
5 bits
16 bits
PC-relative addressing


Target address = PC + offset × 4
PC already incremented by 4 by this time
Chapter 2 — Instructions: Language of the Computer — 53
Jump Addressing

Jump (j and jal) targets could be
anywhere in text segment


Encode full address in instruction
op
address
6 bits
26 bits
(Pseudo)Direct jump addressing

Target address = PC31…28 : (address × 4)
Chapter 2 — Instructions: Language of the Computer — 54
Target Addressing Example

Loop code from earlier example

Assume Loop at location 80000
Loop: sll
$t1, $s3, 2
80000
0
0
19
9
4
0
add
$t1, $t1, $s6
80004
0
9
22
9
0
32
lw
$t0, 0($t1)
80008
35
9
8
0
bne
$t0, $s5, Exit 80012
5
8
21
2
19
19
1
addi $s3, $s3, 1
80016
8
j
80020
2
Exit: …
Loop
20000
80024
Chapter 2 — Instructions: Language of the Computer — 55
Branching Far Away


If branch target is too far to encode with
16-bit offset, assembler rewrites the code
Example
beq $s0,$s1, L1
↓
bne $s0,$s1, L2
j L1
L2: …
Chapter 2 — Instructions: Language of the Computer — 56
Addressing Mode Summary
Chapter 2 — Instructions: Language of the Computer — 57
Assembler Pseudoinstructions


Most assembler instructions represent
machine instructions one-to-one
Pseudoinstructions: figments of the
assembler’s imagination
→ add $t0, $zero, $t1
blt $t0, $t1, L → slt $at, $t0, $t1
move $t0, $t1
bne $at, $zero, L

$at (register 1): assembler temporary
Chapter 2 — Instructions: Language of the Computer — 58


Illustrates use of assembly instructions
for a C bubble sort function
Swap procedure (leaf)

void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
v in $a0, k in $a1, temp in $t0
§2.13 A C Sort Example to Put It All Together
C Sort Example
Chapter 2 — Instructions: Language of the Computer — 59
The Procedure Swap
swap: sll $t1, $a1, 2
# $t1 = k * 4
add $t1, $a0, $t1 # $t1 = v+(k*4)
#
(address of v[k])
lw $t0, 0($t1)
# $t0 (temp) = v[k]
lw $t2, 4($t1)
# $t2 = v[k+1]
sw $t2, 0($t1)
# v[k] = $t2 (v[k+1])
sw $t0, 4($t1)
# v[k+1] = $t0 (temp)
jr $ra
# return to calling routine
Chapter 2 — Instructions: Language of the Computer — 60
The Sort Procedure in C

Non-leaf (calls swap)

void sort (int v[], int n)
{
int i, j;
for (i = 0; i < n; i += 1) {
for (j = i – 1;
j >= 0 && v[j] > v[j + 1];
j -= 1) {
swap(v,j);
}
}
}
v in $a0, k in $a1, i in $s0, j in $s1
Chapter 2 — Instructions: Language of the Computer — 61
The Procedure Body
move
move
move
for1tst: slt
beq
addi
for2tst: slti
bne
sll
add
lw
lw
slt
beq
move
move
jal
addi
j
exit2:
addi
j
$s2, $a0
$s3, $a1
$s0, $zero
$t0, $s0, $s3
$t0, $zero, exit1
$s1, $s0, –1
$t0, $s1, 0
$t0, $zero, exit2
$t1, $s1, 2
$t2, $s2, $t1
$t3, 0($t2)
$t4, 4($t2)
$t0, $t4, $t3
$t0, $zero, exit2
$a0, $s2
$a1, $s1
swap
$s1, $s1, –1
for2tst
$s0, $s0, 1
for1tst
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
save $a0 into $s2
save $a1 into $s3
i = 0
$t0 = 0 if $s0 ≥ $s3 (i ≥ n)
go to exit1 if $s0 ≥ $s3 (i ≥ n)
j = i – 1
$t0 = 1 if $s1 < 0 (j < 0)
go to exit2 if $s1 < 0 (j < 0)
$t1 = j * 4
$t2 = v + (j * 4)
$t3 = v[j]
$t4 = v[j + 1]
$t0 = 0 if $t4 ≥ $t3
go to exit2 if $t4 ≥ $t3
1st param of swap is v (old $a0)
2nd param of swap is j
call swap procedure
j –= 1
jump to test of inner loop
i += 1
jump to test of outer loop
Move
params
Outer loop
Inner loop
Pass
params
& call
Inner loop
Outer loop
Chapter 2 — Instructions: Language of the Computer — 62
The Full Procedure
sort:
addi $sp,$sp, –20
sw $ra, 16($sp)
sw $s3,12($sp)
sw $s2, 8($sp)
sw $s1, 4($sp)
sw $s0, 0($sp)
…
…
exit1: lw $s0, 0($sp)
lw $s1, 4($sp)
lw $s2, 8($sp)
lw $s3,12($sp)
lw $ra,16($sp)
addi $sp,$sp, 20
jr $ra
#
#
#
#
#
#
#
make room on stack for 5 registers
save $ra on stack
save $s3 on stack
save $s2 on stack
save $s1 on stack
save $s0 on stack
procedure body
#
#
#
#
#
#
#
restore $s0 from stack
restore $s1 from stack
restore $s2 from stack
restore $s3 from stack
restore $ra from stack
restore stack pointer
return to calling routine
Chapter 2 — Instructions: Language of the Computer — 63
Effect of Compiler Optimization
Compiled with gcc for Pentium 4 under Linux
Relative Performance
3
140000
Instruction count
120000
2.5
100000
2
80000
1.5
60000
1
40000
0.5
20000
0
0
none
O1
O2
Clock Cycles
180000
160000
140000
120000
100000
80000
60000
40000
20000
0
none
O3
O1
O2
O3
O2
O3
CPI
2
1.5
1
0.5
0
none
O1
O2
O3
none
O1
Chapter 2 — Instructions: Language of the Computer — 64
Effect of Language and Algorithm
Bubblesort Relative Performance
3
2.5
2
1.5
1
0.5
0
C/none
C/O1
C/O2
C/O3
Java/int
Java/JIT
Quicksort Relative Performance
2.5
2
1.5
1
0.5
0
C/none
C/O1
C/O2
C/O3
Java/int
Java/JIT
Quicksort vs. Bubblesort Speedup
3000
2500
2000
1500
1000
500
0
C/none
C/O1
C/O2
C/O3
Java/int
Java/JIT
Chapter 2 — Instructions: Language of the Computer — 65

Powerful instruction  higher performance


Fewer instructions required
But complex instructions are hard to implement



May slow down all instructions, including simple ones
§2.18 Fallacies and Pitfalls
Fallacies
Compilers are good at making fast code from simple
instructions
Use assembly code for high performance


But modern compilers are better at dealing with
modern processors
More lines of code  more errors and less
productivity
Chapter 2 — Instructions: Language of the Computer — 66
Fallacies

Backward compatibility  instruction set
doesn’t change

But they do accrete more instructions
x86 instruction set
Chapter 2 — Instructions: Language of the Computer — 67
Pitfalls

Sequential words are not at sequential
addresses


Increment by 4, not by 1!
Keeping a pointer to an automatic variable
after procedure returns


e.g., passing pointer back via an argument
Pointer becomes invalid when stack popped
Chapter 2 — Instructions: Language of the Computer — 68

Design principles
1.
2.
3.
4.

Layers of software/hardware


Simplicity favors regularity
Smaller is faster
Make the common case fast
Good design demands good compromises
§2.19 Concluding Remarks
Concluding Remarks
Compiler, assembler, hardware
MIPS: typical of RISC ISAs

c.f. x86
Chapter 2 — Instructions: Language of the Computer — 69
Concluding Remarks

Measure MIPS instruction executions in
benchmark programs


Consider making the common case fast
Consider compromises
Instruction class
MIPS examples
SPEC2006 Int
SPEC2006 FP
Arithmetic
add, sub, addi
16%
48%
Data transfer
lw, sw, lb, lbu,
lh, lhu, sb, lui
35%
36%
Logical
and, or, nor, andi,
ori, sll, srl
12%
4%
Cond. Branch
beq, bne, slt,
slti, sltiu
34%
8%
Jump
j, jr, jal
2%
0%
Chapter 2 — Instructions: Language of the Computer — 70