Transcript Chapter 2

ECE369
Chapter 2
ECE369
1
Instruction Set Architecture
•
•
•
The repertoire of instructions of a computer
Different computers have different instruction sets
– But with many aspects in common
Early computers had very simple instruction sets
– Simplified implementation
Many modern computers also have simple instruction sets
•
A very important abstraction
•
– interface between hardware and low-level software
– standardizes instructions, machine language bit patterns, etc.
– advantage: different implementations of the same architecture
– disadvantage: sometimes prevents using new innovations
•
Modern instruction set architectures:
– IA-32, PowerPC, MIPS, SPARC, ARM, and others
ECE369
2
The MIPS Instruction Set
•
•
•
•
Used as the example throughout the book
Stanford MIPS commercialized by MIPS Technologies (www.mips.com)
Large share of embedded core market
– Applications in consumer electronics, network/storage equipment,
cameras, printers, …
Typical of many modern ISAs
– See MIPS Reference Data tear-out card, and Appendixes B and E
ECE369
3
MIPS arithmetic
•
•
All instructions have 3 operands
Operand order is fixed (destination first)
Example:
C code:
a = b + c
MIPS ‘code’:
add a, b, c
(we’ll talk about registers in a bit)
“The natural number of operands for an operation like addition is
three…requiring every instruction to have exactly three operands, no
more and no less, conforms to the philosophy of keeping the
hardware simple”
ECE369
4
MIPS arithmetic
•
•
Design Principle: simplicity favors regularity.
Of course this complicates some things...
C code:
a = b + c + d;
MIPS code:
add a, b, c
add a, a, d
•
•
Operands must be registers, only 32 registers provided
Each register contains 32 bits
•
Design Principle: smaller is faster.
ECE369
Why?
5
Registers vs. Memory
•
•
•
•
•
•
•
•
•
•
Arithmetic instructions use register
operands
Control
MIPS has a 32 × 32-bit register file
Use for frequently accessed data
Numbered 0 to 31
Datapath
32-bit data called a “word”
Assembler names
Processor
$t0, $t1, …, $t9 for temporary values
$s0, $s1, …, $s7 for saved variables
Compiler associates variables with registers
What about programs with lots of variables
ECE369
Input
Memory
Output
I/O
6
Memory Operands
•
•
•
•
•
Main memory used for composite data
– Arrays, structures, dynamic data
To apply arithmetic operations
– Load values from memory into registers
– Store result from register to memory
Memory is byte addressed
– Each address identifies an 8-bit byte
Words are aligned in memory
– Address must be a multiple of 4
MIPS is Big Endian
– Most-significant byte at least address of a word
– c.f. Little Endian: least-significant byte at least address
ECE369
7
Memory Organization
•
•
•
Viewed as a large, single-dimension array, with an address.
A memory address is an index into the array
"Byte addressing" means that the index points to a byte of memory.
0
1
2
3
4
5
6
...
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
ECE369
8
Memory Organization
•
•
•
•
•
Bytes are nice, but most data items use larger "words"
For MIPS, a word is 32 bits or 4 bytes.
0 32 bits of data
4 32 bits of data
Registers hold 32 bits of data
32
bits
of
data
8
12 32 bits of data
...
232 bytes with byte addresses from 0 to 232-1
230 words with byte addresses 0, 4, 8, ... 232-4
Words are aligned
i.e., what are the least 2 significant bits of a word address?
ECE369
9
Memory Operand Example 1
•
C code:
g = h + A[8];
•
– g in $s1, h in $s2, base address of A in $s3
Compiled MIPS code:
– Index 8 requires offset of 32
• 4 bytes per word
lw $t0, 32($s3)
add $s1, $s2, $t0
offset
# load word
base register
ECE369
10
Memory Operand Example 2
•
C code:
A[12] = h + A[8];
•
– h in $s2, base address of A in $s3
Compiled MIPS code:
– Index 8 requires offset of 32
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 48($s3)
# load word
# store word
ECE369
11
Registers vs. Memory
•
•
•
Registers are faster to access than memory
Operating on memory data requires loads and stores
– More instructions to be executed
Compiler must use registers for variables as much as possible
– Only spill to memory for less frequently used variables
– Register optimization is important!
ECE369
12
Instructions
•
Example:
C code:
g = h + A[i];
# $s3 stores base address of A and
# g,h and i in $s1,$s2 and $s4
Add $t1,$s4,$s4
t1 = 2*i
Add $t1,$t1,$t1
t1 = 4*i
Add $t1,$t1,$s3
t1 = 4*i + s3
Lw $t0,0($t1)
t0 = A[i]
Add $s1,$s2,$t0
g = h + A[i]
ECE369
13
So far we’ve learned:
•
MIPS
— loading words but addressing bytes
— arithmetic on registers only
•
Instruction
Meaning
add $s1, $s2, $s3
sub $s1, $s2, $s3
lw $s1, 100($s2)
sw $s1, 100($s2)
$s1 = $s2 + $s3
$s1 = $s2 – $s3
$s1 = Memory[$s2+100]
Memory[$s2+100] = $s1
ECE369
14
Immediate Operands
•
Constant data specified in an instruction
addi $s3, $s3, 4
•
No subtract immediate instruction
– Just use a negative constant
addi $s2, $s1, -1
•
Design Principle : Make the common case fast
– Small constants are common
– Immediate operand avoids a load instruction
ECE369
15
The Constant Zero
•
•
MIPS register 0 ($zero) is the constant 0
– Cannot be overwritten
Useful for common operations
– E.g., move between registers
add $t2, $s1, $zero
ECE369
16
Policy of Use Conventions
Name Register number
$zero
0
$v0-$v1
2-3
$a0-$a3
4-7
$t0-$t7
8-15
$s0-$s7
16-23
$t8-$t9
24-25
$gp
28
$sp
29
$fp
30
$ra
31
Usage
the constant value 0
values for results and expression evaluation
arguments
temporaries
saved
more temporaries
global pointer
stack pointer
frame pointer
return address
Register 1 ($at) reserved for assembler, 26-27 for operating system
ECE369
17
Unsigned Binary Integers
•
Given an n-bit number
n 1
x  x n1 2


 x n2 2
   x1 2  x 0 2
1
0
Range: 0 to +2n – 1
Example


n2
0000 0000 0000 0000 0000 0000 0000 10112
= 0 + … + 1×23 + 0×22 +1×21 +1×20
= 0 + … + 8 + 0 + 2 + 1 = 1110
Using 32 bits

0 to +4,294,967,295
ECE369
18
2s-Complement Signed Integers
•
Given an n-bit number
n 1
x   x n1 2


 x n2 2
   x1 2  x 0 2
1
0
Range: –2n – 1 to +2n – 1 – 1
Example


n2
1111 1111 1111 1111 1111 1111 1111 11002
= –1×231 + 1×230 + … + 1×22 +0×21 +0×20
= –2,147,483,648 + 2,147,483,644 = –410
Using 32 bits

–2,147,483,648 to +2,147,483,647
ECE369
19
2s-Complement Signed Integers
• Bit 31 is sign bit
– 1 for negative numbers
– 0 for non-negative numbers
• –(–2n – 1) can’t be represented
• Non-negative numbers have the same
unsigned and 2s-complement
representation
• Some specific numbers
–
–
–
–
0: 0000 0000 … 0000
–1: 1111 1111 … 1111
Most-negative:
1000 0000 … 0000
Most-positive: 0111 1111 … 1111
ECE369
20
Signed Negation
•
Complement and add 1
– Complement means 1 → 0, 0 → 1
x  x  1111...1112  1
x  1  x

Example: negate +2


+2 = 0000 0000 … 00102
–2 = 1111 1111 … 11012 + 1
= 1111 1111 … 11102
ECE369
21
Sign Extension
• Representing a number using more bits
– Preserve the numeric value
• In MIPS instruction set
– addi: extend immediate value
– lb, lh: extend loaded byte/halfword
– beq, bne: extend the displacement
• Replicate the sign bit to the left
– c.f. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
– +2: 0000 0010 => 0000 0000 0000 0010
– –2: 1111 1110 => 1111 1111 1111 1110
ECE369
22
Representing Instructions
• Instructions are encoded in binary
– Called machine code
• MIPS instructions
– Encoded as 32-bit instruction words
– Small number of formats encoding operation
code (opcode), register numbers, …
– Regularity!
• Register numbers
– $t0 – $t7 are reg’s 8 – 15
– $t8 – $t9 are reg’s 24 – 25
– $s0 – $s7 are reg’s 16 – 23
ECE369
23
MIPS Format
•
Instructions, like registers and words
– are also 32 bits long
–
–
•
add $t1, $s1, $s2
Registers: $t1=9, $s1=17, $s2=18
Instruction Format:
000000
op
10001
rs
10010
rt
01001
rd
00000
shamt
ECE369
100000
funct
24
MIPS I-format Instructions
op
rs
rt
constant or address
6 bits
5 bits
5 bits
16 bits
• Immediate arithmetic and load/store
instructions
– rt: destination or source register number
– Constant: –215 to +215 – 1
– Address: offset added to base address in rs
• Design Principle: Good design demands
good compromises
– Different formats complicate decoding, but allow
32-bit instructions uniformly
– Keep formats as similar as possible
25
ECE369
Machine Language
•
Consider the load-word and store-word instructions,
•
Introduce a new type of instruction format
– I-type for data transfer instructions
– other format was R-type for register
Example: lw $t0, 32($s2)
•
35
18
9
op
rs
rt
32
16 bit number
ECE369
26
Summary
Name Register number
$zero
0
$v0-$v1
2-3
$a0-$a3
4-7
$t0-$t7
8-15
$s0-$s7
16-23
$t8-$t9
24-25
$gp
28
$sp
29
$fp
30
$ra
31
Usage
the constant value 0
values for results and expression evaluation
arguments
temporaries
instruction format op
saved
add
R 0
more temporaries
global pointer
sub
R 0
stack pointer
lw
I 35
frame pointer
sw
I 43
return address
A[300]=h+A[300]
Lw $t0,1200($t1)
Add $t0, $s2, $t0
Sw $t0, 1200($t1)
rs
reg
reg
reg
reg
rt
reg
reg
reg
reg
rd shamt funct address
reg 0 32 na
reg 0 34 na
na na na address
na na na address
# $t1 = base address of A, $s2 stores h
# use $t0 for temporary register
Op rs,rt,address
Op,rs,rt,rd,shamt,funct
Op,rs,rt,address
ECE369
35,9,8,1200
0,18,8,8,0,32
43,9,8,1200
27
Summary of Instructions We Have Seen So Far
ECE369
28
Logical Operations
•

Instructions for bitwise manipulation
Operation
C
Java
MIPS
Shift left
<<
<<
sll
Shift right
>>
>>>
srl
Bitwise AND
&
&
and, andi
Bitwise OR
|
|
or, ori
Bitwise NOT
~
~
nor
Useful for extracting and inserting
groups of bits in a word
ECE369
29
Shift Operations
•
•
•
op
rs
rt
rd
shamt
funct
6 bits
5 bits
5 bits
5 bits
5 bits
6 bits
shamt: how many positions to shift
Shift left logical
– Shift left and fill with 0 bits
– sll by i bits multiplies by 2i
Shift right logical
– Shift right and fill with 0 bits
– srl by i bits divides by 2i (unsigned only)
ECE369
30
AND Operations
•
Useful to mask bits in a word
– Select some bits, clear others to 0
and $t0, $t1, $t2
$t2
0000 0000 0000 0000 0000 1101 1100 0000
$t1
0000 0000 0000 0000 0011 1100 0000 0000
$t0
0000 0000 0000 0000 0000 1100 0000 0000
ECE369
31
OR Operations
•
Useful to include bits in a word
– Set some bits to 1, leave others unchanged
or $t0, $t1, $t2
$t2
0000 0000 0000 0000 0000 1101 1100 0000
$t1
0000 0000 0000 0000 0011 1100 0000 0000
$t0
0000 0000 0000 0000 0011 1101 1100 0000
ECE369
32
NOT Operations
•
•
Useful to invert bits in a word
– Change 0 to 1, and 1 to 0
MIPS has NOR 3-operand instruction
– a NOR b == NOT ( a OR b )
nor $t0, $t1, $zero
Register 0:
always read
as zero
$t1
0000 0000 0000 0000 0011 1100 0000 0000
$t0
1111 1111 1111 1111 1100 0011 1111 1111
ECE369
33
Summary of New Instructions
ECE369
34
Example
swap(int* v, int k);
{ int temp;
temp = v[k]
v[k] = v[k+1];
v[k+1] = temp;
}
swap:
sll $t0, $a1, 4
add $t0, $t0, $a0
lw $t1, 0($t0)
lw $t2, 4($t0)
sw $t2, 0($t0)
sw $t1, 4($t0)
jr $31
ECE369
35
Conditional Operations
•
Branch to a labeled instruction if a condition is true
– Otherwise, continue sequentially
• beq rs, rt, L1
– if (rs == rt) branch to instruction labeled L1;
• bne rs, rt, L1
– if (rs != rt) branch to instruction labeled L1;
• j L1
– unconditional jump to instruction labeled L1
ECE369
36
Compiling If Statements
•
C code:
if (i==j) f = g+h;
else f = g-h;
•
– f, g, … in $s0, $s1, …
Compiled MIPS code:
bne
add
j
Else: sub
Exit: …
$s3, $s4, Else
$s0, $s1, $s2
Exit
$s0, $s1, $s2
Assembler calculates
addresses
ECE369
37
Using If-Else
$s0 = f
$s1 = g
$s2 = h
$s3 = i
$s4 = j
$s5 = k
Where is 0,1,2,3 stored?
ECE369
38
Addresses in Branches
•
•
Instructions:
bne $t4,$t5,Label
beq $t4,$t5,Label
Next instruction is at Label if $t4≠$t5
Next instruction is at Label if $t4=$t5
Formats:
I
op
rs
rt
16 bit address
•What if the “Label” is too far away (16 bit address is not enough)
ECE369
39
Addresses in Branches and Jumps
• Instructions:
bne $t4,$t5,Label
beq $t4,$t5,Label
j Label
if $t4 != $t5
if $t4 = $t5
Next instruction is at Label
• Formats:
I
op
J
op
rs
rt
16 bit address
26 bit address
•
ECE369
40
More Conditional Operations
•
Set result to 1 if a condition is true
– Otherwise, set to 0
• slt rd, rs, rt
– if (rs < rt) rd = 1; else rd = 0;
• slti rt, rs, constant
– if (rs < constant) rt = 1; else rt = 0;
ECE369
41
Control Flow
• We have: beq, bne, what about Branch-if-less-than?
If (a<b) # a in $s0, b in $s1
slt
$t0, $s0, $s1
bne $t0, $zero, Less
# t0 gets 1 if a<b
# go to Less if $t0 is not 0
Combination of slt and bne implements branch on less than.
ECE369
42
Branch Instruction Design
•
•
Why not blt, bge, etc?
Hardware for <, ≥, … slower than =, ≠
– Combining with branch involves more work per instruction,
requiring a slower clock
– All instructions penalized!
• beq and bne are the common case
• This is a good design compromise
ECE369
43
Signed vs. Unsigned
•
•
•
Signed comparison: slt, slti
Unsigned comparison: sltu, sltui
Example
– $s0 = 1111
1111 1111 1111 1111 1111 1111 1111
$s1 = 0000 0000 0000 0000 0000 0000 0000 0001
–
– slt
$t0, $s0, $s1
# signed
• –1 < +1  $t0 = 1
– sltu $t0, $s0, $s1
# unsigned
• +4,294,967,295 > +1  $t0 = 0
ECE369
44
While Loop
While (save[i] == k)
i = i+j;
# i, j and k correspond to registers
# $s3, $s4 and $s5
# array base address at $s6
Loop: add $t1, $s3, $s3
add $t1, $t1, $t1
add $t1, $t1, $s6
lw $t0, 0($t1)
bne $t0, $s5, Exit
add $s3, $s3, $s4
j
loop
Exit:
ECE369
45
What does this code do?
ECE369
46
Overview of MIPS
•
•
•
simple instructions all 32 bits wide
very structured, no unnecessary baggage
only three instruction formats
R
op
rs
rt
rd
I
op
rs
rt
16 bit address
J
op
shamt
funct
26 bit address
ECE369
47
Arrays vs. Pointers
clear1( int array[ ], int size)
{
int i;
for (i=0; i<size; i++)
array[i]=0;
}
clear2(int* array, int size)
{
int* p;
for( p=&array[0]; p<&array[size]; p++)
*p=0;
}
CPI for arithmetic, data transfer, branch type of instructions are 1, 2,
and 1 correspondingly. Which code is faster?
ECE369
48
Clear1
array in $a0
size in $a1
i in $t0
clear1( int array[ ], int size)
{
int i;
for (i=0; i<size; i++)
array[i]=0;
}
add
loop1: add
$t0,$zero,$zero # i=0, register $t0=0
$t1,$t0,$t0
# $t1=i*2
add
add
sw
addi
slt
$t1,$t1,$t1
$t2,$a0,$t1
$zero, 0($t2)
$t0,$t0,1
$t3,$t0,$a1
# $t1=i*4
# $t2=address of array[i]
# array[i]=0
# i=i+1
# $t3=(i<size)
bne
$t3,$zero,loop1 # if (i<size) go to loop1
ECE369
49
Clear2, Version 2
clear2(int* array, int size)
{
int* p;
for( p=&array[0]; p<&array[size]; p++)
*p=0;
}
loop2:
Array and size
to registers $a0 and $a1
add
$t0,$a0,$zero
# p = address of array[0]
add
add
add
sw
addi
slt
bne
$t1,$a1,$a1
$t1,$t1,$t1
$t2,$a0,$t1
$zero,0($t0)
$t0,$t0,4
$t3,$t0,$t2
$t3,zero,loop2
# $t1 = size*2
# $t1 = size*4 Distance of last element
# $t2 = address of array[size]
# memory[p]=0
# p = p+4
# $t3=(p<&array[size])
# if (p<&array[size]) go to loop2
50
ECE369
Array vs. Pointer
loop1:
add
add
add
add
sw
addi
slt
bne
add
loop2:
add
add
add
sw
addi
slt
bne
$t0,$zero,$zero
$t1,$t0,$t0
$t1,$t1,$t1
$t2,$a0,$t1
$zero, 0($t2)
$t0,$t0,1
$t3,$t0,$a1
$t3,$zero,loop1
# i=0, register $t0=0
# $t1=i*2
# $t1=i*4
# $t2=address of array[i]
# array[i]=0
# i=i+1
# $t3=(i<size)
# if (i<size) go to loop1
$t0,$a0,$zero
7 instructions
inside loop
# p = address of array[0]
$t1,$a1,$a1
$t1,$t1,$t1
$t2,$a0,$t1
$zero,0($t0)
# $t1 = size*2
# $t1 = size*4
4 instructions
# $t2 = address of array[size] inside loop
# memory[p]=0
$t0,$t0,$4
$t3,$t0,$t2
$t3,zero,loop2
# p = p+4
# $t3=(p<&array[size])
# if (p<&array[size]) go to loop2
ECE369
51
Summary
ECE369
52
Procedure Calling
•
Steps required
1.
Place parameters in registers
2.
Transfer control to procedure
3.
Acquire storage for procedure
4. Perform procedure’s operations
5.
Place result in register for caller
6.
Return to place of call
ECE369
53
Elaboration
Name Register number
$zero
0
$v0-$v1
2-3
$a0-$a3
4-7
$t0-$t7
8-15
$s0-$s7
16-23
$t8-$t9
24-25
$gp
28
$sp
29
$fp
30
$ra
31
Usage
the constant value 0
values for results and expression evaluation
arguments
temporaries
saved
more temporaries
global pointer
stack pointer
frame pointer
return address
What if there are more than 4 parameters for a function call?
Addressable via frame pointer
References to variables in the stack have the same offset
ECE369
54
Procedure Call Instructions
•
Procedure call: jump and link
jal ProcedureLabel
•
– Address of following instruction put in $ra
– Jumps to target address
Procedure return: jump register
jr $ra
– Copies $ra to program counter
– Can also be used for computed jumps
• e.g., for case/switch statements
ECE369
55
What is the Use of Frame Pointer?
Variables local to procedure do not fit in registers !!!
ECE369
56
Nested Procedures,
function_main(){
function_a(var_x);
:
return;
}
function_a(int size){
function_b(var_y);
:
return;
}
function_b(int count){
:
return;
}
/* passes argument using $a0 */
/* function is called with “jal” instruction */
/* passes argument using $a0 */
/* function is called with “jal” instruction */
Resource Conflicts ???
ECE369
57
Stack
• Last-in-first-out queue
• Register # 29 reserved as stack
pointer
• Points to most recently
allocated address
• Grows from higher to lower
address
• Subtracting $sp
• Adding data – Push
• Removing data – Pop
ECE369
58
Function Call and Stack Pointer
jr
ECE369
59
Recursive Procedures Invoke Clones !!!
int fact (int n) {
if (n < 1 )
return ( 1 );
else
return ( n * fact ( n-1 ) );
}
Registers $a0 and $ra
“n” corresponds to $a0
Program starts with the label of the procedure “fact”
How many registers do we need to save on the stack?
ECE369
60
Factorial Code
200 fact:addi
204
sw
sw
L1:
236
240
$sp, $sp, -8 #adjust stack for 2 items
$ra, 4($sp) #save return address
$a0, 0($sp) #save argument n
slti
beq
$t0, $a0, 1
# is n<1?
$t0, $zero, L1 # if not  go to L1
addi
addi
jr
$v0, $zero, 1
$sp, $sp, 8
$ra
addi
jal
$a0, $a0, -1
fact
lw
lw
addi
mult
jr
:
100 fact(3)
104 add ….
#return result
#pop items off stack
#return to calling proc.
#decrement n
# call fact(n-1)
$a0, 0($sp) # restore “n”
$ra, 4($sp) # restore address
$sp, $sp,8
# pop 2 items
$v0,$a0,$v0 # return n*fact(n-1)
$ra
# return to caller
ECE369
ra = 104
a0= 3
sp= 40
vo=
int fact (int n) {
if (n < 1 )
return ( 1 );
else
return ( n * fact ( n-1 ) );
}
61
Assembly to Hardware Example
k is stored in $t0;
3 stored in $t2;
int i;
int k = 0;
add
for (i=0; i<3; i++){
add
k = k + i + 1;
addi
}
loop: add
k = k/2;
addi
addi
slt
bne
srl
i is stored in $t1
$t3 used as temp
$t0,$zero,$zero
$t1,$zero,$zero
$t2,$zero,3
$t0,$t0,$t1
$t0,$t0,1
$t1,$t1,1
$t3,$t1,$t2
$t3,$zero,loop
$t0,$t0,1
ECE369
# k=0, register $t0=0
# i=0, register $t1=0
# $t2=3
#k=k+i
#k=k+1
# i=i+1
# $t3= (i<3)
# if (i<3) go to loop
#k=k/2
62
Assembly to Hardware Example
add
add
addi
loop: add
addi
addi
slt
bne
$t0,$zero,$zero R-Type
$t1,$zero,$zero R-Type
$t2,$zero,3
I-Type
$t0,$t0,$t1
R-Type
$t0,$t0,1
I-Type
$t1,$t1,1
I-Type
$t3,$t1,$t2
R-Type
$t3,$zero,loop I-Type
srl
$t0,$t0,1
Instruction Types?
R-Type
R
op
rs
rt
rd
I
op
rs
rt
16 bit address
ECE369
shamt
funct
63
How do we represent in machine language?
op
rs
rt
rd
shamt
funct
0:
add
$t0,$zero,$zero 000000_00000_00000_01000_00000_100000
4:
add
$t1,$zero,$zero 000000_00000_00000_01001_00000_100000
8:
addi
$t2,$zero,3
001000_00000_01010_0000000000000011
12: loop:
add
$t0,$t0,$t1
000000_01000_01001_01000_00000_100000
16:
addi
$t0,$t0,1
001000_01000_01000_0000000000000001
20:
addi
$t1,$t1,1
001000_01001_01001_0000000000000001
24:
slt
$t3,$t1,$t2
000000_01001_01010_01011_00000_101010
28:
bne
$t3,$zero,loop
000101_00000_01011_1111111111111011
32:
srl
$t0,$t0,2
000000_00000_01000_01000_00001_000010
6 bits
R
I
5 bits
5 bits
5 bits
5 bits
6 bits
shamt
funct
op
rs
rt
rd
op
rs
rt
16 bit address
ECE369
PC+4+BR Addr
- 5
$t0 is reg 8
$t1 is reg 9
$t2 is reg 10
$t3 is reg 11
64
How do we represent in machine language?
loop:
add
$t0,$zero,$zero
add
$t1,$zero,$zero
addi
$t2,$zero,3
op
rs
rt
rd
shamt
funct
Instruction Memory
0
000000_00000_00000_01000_00000_100000
4
000000_00000_00000_01001_00000_100000
8
001000_00000_01010_0000000000000011
add
$t0,$t0,$t1
addi
$t0,$t0,1
12
000000_01000_01001_01000_00000_100000
addi
$t1,$t1,1
16
001000_01000_01000_0000000000000001
20
001000_01001_01001_0000000000000001
24
000000_01001_01010_01011_00000_101010
28
000101_00000_01011_1111111111111011
32
000000_00000_01000_01000_00001_000010
slt
$t3,$t1,$t2
bne
$t3,$zero,loop
srl
$t0,$t0,2
ECE369
65
Byte/Halfword Operations
•
•
Could use bitwise operations
MIPS byte/halfword load/store
– String processing is a common case
lb rt, offset(rs)
lh rt, offset(rs)
– Sign extend to 32 bits in rt
lbu rt, offset(rs)
offset(rs)
lhu rt,
– Zero extend to 32 bits in rt
sb rt, offset(rs)
sh rt, offset(rs)
– Store just rightmost byte/halfword
ECE369
66
String Copy Example
•
C code (naïve):
– Null-terminated string
void strcpy (char x[], char y[])
{ int i;
i = 0;
while ((x[i]=y[i])!='\0')
i += 1;
}
– Addresses of x, y in $a0, $a1
– i in $s0
ECE369
67
String Copy Example
• MIPS code:
strcpy:
addi
sw
add
L1: add
lbu
add
sb
beq
addi
j
L2: lw
addi
jr
$sp,
$s0,
$s0,
$t1,
$t2,
$t3,
$t2,
$t2,
$s0,
L1
$s0,
$sp,
$ra
$sp, -4
0($sp)
$zero, $zero
$s0, $a1
0($t1)
$s0, $a0
0($t3)
$zero, L2
$s0, 1
0($sp)
$sp, 4
ECE369
#
#
#
#
#
#
#
#
#
#
#
#
#
adjust stack for 1 item
save $s0
i = 0
addr of y[i] in $t1
$t2 = y[i]
addr of x[i] in $t3
x[i] = y[i]
exit loop if y[i] == 0
i = i + 1
next iteration of loop
restore saved $s0
pop 1 item from stack
and return
68
32-bit Constants
•
•
Most constants are small
– 16-bit immediate is sufficient
For the occasional 32-bit constant
lui rt, constant
– Copies 16-bit constant to left 16 bits of rt
– Clears right 16 bits of rt to 0
lhi $s0, 61
0000 0000 0111 1101 0000 0000 0000 0000
ori $s0, $s0, 2304 0000 0000 0111 1101 0000 1001 0000 0000
ECE369
69
Branch Addressing
•
•
Branch instructions specify
– Opcode, two registers, target address
Most branch targets are near branch
– Forward or backward

op
rs
rt
constant or address
6 bits
5 bits
5 bits
16 bits
PC-relative addressing


Target address = PC + offset × 4
PC already incremented by 4 by this time
ECE369
70
Jump Addressing
•

Jump (j and jal) targets could be anywhere in text segment
– Encode full address in instruction
op
address
6 bits
26 bits
(Pseudo)Direct jump addressing

Target address = PC31…28 : (address × 4)
ECE369
71
Target Addressing Example
•
Loop code from earlier example
– Assume Loop at location 80000
Loop: sll
$t1, $s3, 2
80000
0
0
19
9
4
0
add
$t1, $t1, $s6
80004
0
9
22
9
0
32
lw
$t0, 0($t1)
80008
35
9
8
0
bne
$t0, $s5, Exit 80012
5
8
21
2
19
19
1
addi $s3, $s3, 1
80016
8
j
80020
2
Exit: …
Loop
20000
80024
ECE369
72
Branching Far Away
•
•
If branch target is too far to encode with 16-bit offset, assembler
rewrites the code
Example
beq $s0,$s1, L1
↓
bne $s0,$s1, L2
j L1
L2:
…
ECE369
73
C Sort Example
•
•
Illustrates use of assembly instructions for a C bubble sort
function
Swap procedure (leaf)
void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
– v in $a0, k in $a1, temp in $t0
ECE369
74
The Procedure Swap
swap: sll $t1, $a1, 2
# $t1 = k * 4
add $t1, $a0, $t1 # $t1 = v+(k*4)
#
(address of v[k])
lw $t0, 0($t1)
# $t0 (temp) = v[k]
lw $t2, 4($t1)
# $t2 = v[k+1]
sw $t2, 0($t1)
# v[k] = $t2 (v[k+1])
sw $t0, 4($t1)
# v[k+1] = $t0 (temp)
jr $ra
# return to calling
routine
ECE369
75
The Sort Procedure in C
• Non-leaf (calls swap)
void sort (int v[], int n)
{
int i, j;
for (i = 0; i < n; i += 1) {
for (j = i – 1;
j >= 0 && v[j] > v[j + 1];
j -= 1) {
swap(v,j);
}
}
}
– v in $a0, k in $a1, i in $s0, j in $s1
ECE369
76
The Procedure Body
move
move
move
for1tst: slt
beq
addi
for2tst: slti
bne
sll
add
lw
lw
slt
beq
move
move
jal
addi
j
exit2:
addi
j
$s2, $a0
$s3, $a1
$s0, $zero
$t0, $s0, $s3
$t0, $zero, exit1
$s1, $s0, –1
$t0, $s1, 0
$t0, $zero, exit2
$t1, $s1, 2
$t2, $s2, $t1
$t3, 0($t2)
$t4, 4($t2)
$t0, $t4, $t3
$t0, $zero, exit2
$a0, $s2
$a1, $s1
swap
$s1, $s1, –1
for2tst
$s0, $s0, 1
for1tst
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
save $a0 into $s2
save $a1 into $s3
i = 0
$t0 = 0 if $s0 ≥ $s3 (i ≥ n)
go to exit1 if $s0 ≥ $s3 (i ≥ n)
j = i – 1
$t0 = 1 if $s1 < 0 (j < 0)
go to exit2 if $s1 < 0 (j < 0)
$t1 = j * 4
$t2 = v + (j * 4)
$t3 = v[j]
$t4 = v[j + 1]
$t0 = 0 if $t4 ≥ $t3
go to exit2 if $t4 ≥ $t3
1st param of swap is v (old $a0)
2nd param of swap is j
call swap procedure
j –= 1
jump to test of inner loop
i += 1
jump to test of outer loop
ECE369
77
The Full Procedure
sort:
addi $sp,$sp, –20
sw $ra, 16($sp)
sw $s3,12($sp)
sw $s2, 8($sp)
sw $s1, 4($sp)
sw $s0, 0($sp)
…
…
exit1: lw $s0, 0($sp)
lw $s1, 4($sp)
lw $s2, 8($sp)
lw $s3,12($sp)
lw $ra,16($sp)
addi $sp,$sp, 20
jr $ra
#
#
#
#
#
#
#
make room on stack for 5 registers
save $ra on stack
save $s3 on stack
save $s2 on stack
save $s1 on stack
save $s0 on stack
procedure body
#
#
#
#
#
#
#
restore $s0 from stack
restore $s1 from stack
restore $s2 from stack
restore $s3 from stack
restore $ra from stack
restore stack pointer
return to calling routine
ECE369
78
Addressing Mode Summary
ECE369
79
Translation and Startup
Many compilers
produce object
modules directly
Static
linking
ECE369
80
Big Picture
ECE369
81
Assembler Pseudoinstructions
•
•
Most assembler instructions represent machine instructions one-toone
Pseudoinstructions: figments of the assembler’s imagination
→ add $t0, $zero, $t1
blt $t0, $t1, L → slt $at, $t0, $t1
move $t0, $t1
bne $at, $zero, L
– $at (register 1): assembler temporary
ECE369
82
Producing an Object Module
• Assembler (or compiler) translates program
into machine instructions
• Provides information for building a
complete program from the pieces
– Header: described contents of object module
– Text segment: translated instructions
– Static data segment: data allocated for the life of
the program
– Relocation info: for contents that depend on
absolute location of loaded program
– Symbol table: global definitions and external
refs
– Debug info: for associating with source code
ECE369
83
Linking Object Modules
•
•
Produces an executable image
1. Merges segments
2. Resolve labels (determine their addresses)
3. Patch location-dependent and external refs
Could leave location dependencies for fixing by a relocating loader
– But with virtual memory, no need to do this
– Program can be loaded into absolute location in virtual memory
space
ECE369
84
Loading a Program
•
Load from image file on disk into memory
1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
• Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including $sp, $fp, $gp)
6. Jump to startup routine
• Copies arguments to $a0, … and calls main
• When main returns, do exit syscall
ECE369
85
Dynamic Linking
•
Only link/load library procedure when it is called
– Requires procedure code to be relocatable
– Avoids image bloat caused by static linking of all (transitively)
referenced libraries
– Automatically picks up new library versions
ECE369
86
Assembly Language vs. Machine Language
•
•
•
•
Assembly provides convenient symbolic representation
– much easier than writing down numbers
– e.g., destination first
Machine language is the underlying reality
– e.g., destination is no longer first
Assembly can provide 'pseudoinstructions'
– e.g., “move $t0, $t1” exists only in Assembly
– would be implemented using “add $t0,$t1,$zero”
When considering performance you should count real instructions
ECE369
87
Summary
•
•
•
Instruction complexity is only one variable
– lower instruction count vs. higher CPI / lower clock rate
Design Principles:
– simplicity favors regularity
– smaller is faster
– good design demands compromise
– make the common case fast
Instruction set architecture
– a very important abstraction indeed!
ECE369
88
ARM & MIPS Similarities
• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS
ARM
MIPS
1985
1985
Instruction size
32 bits
32 bits
Address space
32-bit flat
32-bit flat
Data alignment
Aligned
Aligned
9
3
15 × 32-bit
31 × 32-bit
Memory
mapped
Memory
mapped
Date announced
Data addressing modes
Registers
Input/output
ECE369
89
The Intel x86 ISA
• Evolution with backward compatibility
– 8080 (1974): 8-bit microprocessor
• Accumulator, plus 3 index-register pairs
– 8086 (1978): 16-bit extension to 8080
• Complex instruction set (CISC)
– 8087 (1980): floating-point coprocessor
• Adds FP instructions and register stack
– 80286 (1982): 24-bit addresses, MMU
• Segmented memory mapping and protection
– 80386 (1985): 32-bit extension (now IA-32)
• Additional addressing modes and operations
• Paged memory mapping as well as segments
ECE369
90
The Intel x86 ISA
• Further evolution…
– i486 (1989): pipelined, on-chip caches and FPU
• Compatible competitors: AMD, Cyrix, …
– Pentium (1993): superscalar, 64-bit datapath
• Later versions added MMX (Multi-Media eXtension) instructions
• The infamous FDIV bug
– Pentium Pro (1995), Pentium II (1997)
• New microarchitecture (see Colwell, The Pentium Chronicles)
– Pentium III (1999)
• Added SSE (Streaming SIMD Extensions) and associated
registers
– Pentium 4 (2001)
• New microarchitecture
• Added SSE2 instructions
ECE369
91
The Intel x86 ISA
• And further…
– AMD64 (2003): extended architecture to 64 bits
– EM64T – Extended Memory 64 Technology
(2004)
• AMD64 adopted by Intel (with refinements)
• Added SSE3 instructions
– Intel Core (2006)
• Added SSE4 instructions, virtual machine support
– AMD64 (announced 2007): SSE5 instructions
• Intel declined to follow, instead…
– Advanced Vector Extension (announced 2008)
• Longer SSE registers, more instructions
• If Intel didn’t extend with compatibility, its
competitors would!
– Technical elegance ≠ market success
ECE369
92
Fallacies
• Powerful instruction  higher performance
– Fewer instructions required
– But complex instructions are hard to implement
• May slow down all instructions, including simple ones
– Compilers are good at making fast code from
simple instructions
• Use assembly code for high performance
– But modern compilers are better at dealing with
modern processors
– More lines of code  more errors and less
productivity
ECE369
93
Fallacies
•
Backward compatibility  instruction set doesn’t change
– But they do accrete more instructions
x86 instruction set
ECE369
94
Pitfalls
•
•
Sequential words are not at sequential addresses
– Increment by 4, not by 1!
Keeping a pointer to an automatic variable after procedure returns
– e.g., passing pointer back via an argument
– Pointer becomes invalid when stack popped
ECE369
95
Concluding Remarks
•
•
•
Design principles
1. Simplicity favors regularity
2. Smaller is faster
3. Make the common case fast
4. Good design demands good compromises
Layers of software/hardware
– Compiler, assembler, hardware
MIPS: typical of RISC ISAs
– c.f. x86
ECE369
96