Register - University of Rochester ECE
Download
Report
Transcript Register - University of Rochester ECE
ECE200 – Computer Organization
Chapter 3 – Instructions:
The Language of the
Machine
Homework 3
3.2, 3.5, 3.9, 3.11, 3.14, 3.16, 3.19, 3.25, 3.27
Warning: Some of you will find this to be the
most difficult chapter and hardest homework of
the course
Outline for Chapter 3 lectures
Basic processor operation
ISA motivation: enabling fast hardware
implementations
Reduced Instruction Set Computing (RISC)
MIPS instruction formats and addressing modes
Rundown of MIPS instructions
Procedure call
Arrays and pointers
Assembling, linking, and loading programs
Motorola MCore ISA
The heart of a modern microprocessor
destination operand
source
operand 1
Load instructions
bring data from
memory into the
register file
register
file
source
operand 2
ALU
ALU = Arithmetic
Logic Unit
operation
Store instructions
store data from the
register file into
memory
Register file holds data
to be operated on
Basic processing of an instruction
Instruction fetch
Get
the instruction from instruction memory
Instruction decode
Figure
Read source operands
Get
data to be operated on from the register file
Execute
Do
out what kind it is and how to control it
the operation
Write destination operand
Write
the result of the operation to the register file
Basic processing of an instruction
Instruction fetch
Get
the instruction from instruction memory
Basic processing of an instruction
Instruction decode
Figure
out what kind it is and how to control it
control
logic
Basic processing of an instruction
Read source operands
Get
data to be operated on from the register file
Basic processing of an instruction
Execute
Do
the operation
Basic processing of an instruction
Write destination operand
Write
the result of the operation to the register file
The job of the ISA designer is to…
create an ISA that enables fast hardware
designs…
that meet the design constraints (cost, power
dissipation, etc.)…
of the targeted market (workstation, handheld,
etc.)
Other jobs as well, such as
Simplifying
choices for the compiler
Ensuring the longevity of the ISA by anticipating future
technology trends
Making computer hardware fast
Computer hardware consists of datapaths and
control, each of which may limit machine speed
The datapaths pass the data that is being
computed on to the different units
The control steers the data to different units and
selects the operation of each unit
Example
input select
arithmetic
logic unit
MUX
operation
datapath
control
Making computer hardware fast
Goal #1: fast execution of simple operations
Goal #2: fast access of data from memory
Goal #3: fast fetching and decoding of
instructions
Fast execution of simple operations
Many programs contain predominately simple
integer operations (e.g., add, and, load, branch)
The results of operations are often used by
successive instructions
a + b + c + d
Microprocessor designers make these simple
operations as fast as possible and try to set the
clock period of the chip to match this speed
Register-register ISA
All operands for arithmetic operations are read
from the register file
destination operand
register #
source operand 1
register #
source operand 2
register #
destination operand
register
file
source
operand 1
source
operand 2
ALU
operation
Register
file (32 registers in MIPS) holds data to be executed
Big enough to hold most variables but small enough for a fast clock
Very
fast datapath and simple, fast control
Operations which require the result of immediately prior
operations can execute on back-to-back clock cycles
through an operation called forwarding (Ch 4)
Register-register ISA
add $4, $3, $10
destination operand
register #
source operand 1
register #
source operand 2
register #
destination operand
register
file
source
operand 1
source
operand 2
ALU
operation
Making computer hardware fast
Goal #1: fast execution of simple operations
Goal #2: fast access of data from memory
Goal #3: fast fetching and decoding of
instructions
Fast access of data from memory
In a register-register machine, load and store
operations are used to access data in memory
To access data memory, we need to first
calculate the data’s memory address, called the
effective address
Addressing modes provide the means for
representing offsets, array indexing, pointers,
etc. at the assembly language level
The number and complexity of addressing modes
is a tradeoff between INST and (CPI,CT)
More,
powerful modes reduces INST
Fewer, simple modes simplifies the EA calculation control and
datapath, and thus the time required to access memory
The ALU can serve the additional purpose of
effective address calculation to save hardware
Addressing mode example #1
Base mode (also called displacement mode)
Used
to access local variables
Example: ld $2, 4 ($1)
destination operand
destination operand
register #
source operand 1
register #
register
file
source
operand 1
effective
address
ALU
immediate
value
operation = add
data
cache
Addressing mode example #2
Scaled mode
Used
to index arrays
Example: ld $2, 4 ($1) [$3]
destination operand
destination operand
register #
source operand 1
register #
source operand 2
register #
register
file
source
operand 1
source
operand 2
x
ALU
scale
immediate
value
operation = add
+
effective
address
data
cache
Making computer hardware fast
Goal #1: fast execution of simple operations
Goal #2: fast access of data from memory
Goal #3: fast fetching and decoding of
instructions
Fast instruction fetching
The Program Counter is a special register that
holds the memory address of the currently
executing instruction
The PC is normally incremented to point to the
next sequential instruction, except in the case of
a jump or taken branch
logic
branch
displacement
branch
condition
PC
instruction
cache
instruction
Limiting the way that branches can modify the
PC speeds up the calculation of the next PC
Fast instruction decoding
After an instruction is fetched, we parse its fields
in hardware (decode it) to determine
The
operation to be performed
The location of the source and destination operands
Other instruction-specific information such as
The addressing mode and values (e.g., offset) for memory instructions
Constant values
Decode is greatly simplified (and therefore
execution begins sooner) if
All
instructions have the same length
There is consistency (regularity) among instruction types in
terms of where the fields are located
The number of fields is limited
Reduced Instruction Set Computing
RISC has its roots in
The
IBM 801 project started by John Cocke at IBM Research
in 1975
Precursor to the IBM RS/6000 workstation processors which later
influenced PowerPC
The
Evolved into the SPARC ISA of Sun Microsystems
The
Berkeley RISC project started by Dave Patterson in 1980
Stanford MIPS project started by John Hennessy ~1980
Hennessy co-founded MIPS Computer
RISC philosophy is that instruction sets should
be simplified to enable fast hardware
implementations that can be exploited by
optimizing compilers
University
projects had the added goal of single chip
implementation
Original RISC approach
Fixed-length (32 bits for MIPS) instructions that
have only a few formats
Simplifies
instruction fetch and decode
Code density is sacrificed
Some bits are wasted for some instruction types
Register-register architecture
called load-store architecture
Permits very fast implementation of simple instructions
Easier to pipeline (Chapter 6)
Requires more instructions to implement a HLL program
Also
Limited number of addressing modes
Simplifies
EA calculation and thus speeds up memory access
Few if any complex arithmetic functions
Instead
more, simpler instructions are used
Some recent deviations from RISC
Shorter instructions (16 bits) intermixed with
longer (32 bits)
Higher
code density for embedded environments
Some complex operations
Multiply-add
for DSP and multimedia applications
Experience is that selectively adding some
complexity back into the ISA improves
performance and/or cost effectiveness
MIPS instruction formats
R-type (Register)
31
26 25
op
rs
16 15
rt
11 10
rd
6 5
shamt
0
funct
I-Type (Immediate)
31
26 25
op
21 20
21 20
rs
16 15
rt
0
immediate
J-Type (Jump)
31
26 25
op
0
target
MIPS instruction formats
where
op
is a 6-bit operation code (opcode)
rs is a 5-bit source register specifier
rt is a 5-bit (source or destination) register specifier or branch
condition
rd is a 5-bit destination register specifier
shamt is a 5-bit shift amount
funct is a 6-bit function field
immediate is a 16-bit immediate, branch displacement, or
memory address displacement
target is a 26-bit jump target address
Simplifications
Fixed
length
Limited number of field types
Many fields located in same location in different formats
MIPS addressing modes
Rundown of MIPS instructions
Computational
R-type
and I-type
Load and store instructions
I-type
Jump and branch instructions
R-type,
I-type, and J-type
Only dealing with integer instructions for now
Registers in MIPS
31
general purpose registers
r0
r1
.
.
.
0
31
31
multiply/divide registers
HI
LO
program counter register
PC
0
0
r30
r31
32 GPRs
r0
is equal to zero in all MIPS implementations, which makes it
easy to…
Load a constant, e.g., addi $2,$0,#3
Do absolute addressing, e.g., lw $4,1000($0)
r31
holds return address for subroutine returns
All other GPRs are “generic” although software conventions
define their usage (discussed later)
Registers in MIPS
HI and LO
Hold
multiply result (HI holds upper half)
Hold divide quotient (LO) and remainder (HI)
PC holds address of currently executing
instruction
Also have a separate set of 32 floating point
registers if floating point arithmetic is supported
Computational operands in MIPS
All computations occur on full 32 bit words
Computations are on signed (2’s complement) or
unsigned operands (positive numbers)
Example:
1111 1111 1111 1111 1111 1111 1111 1111 is –1
as a signed number and 4,294,967,294 as an unsigned
number
Operands are in registers or are immediates
Immediate (constant) values are 16 bits wide
is sign extended (most significant bit is replicated in the
uppermost 16 bits) before operated on to convert it to a
32-bit operand
Value
Example: 1000 0000 0000 0000
becomes
1111 1111 1111 1111 1000 0000 0000 0000
Computational instructions
Arithmetic (signed, unsigned, and immediate)
add,
sub, mult, div
addu, subu, multu, divu
addi
Logical
and,
or, nor, xor
andi, ori, xori
Shift (logical and arithmetic)
srl,
sll
Shift shamt digits to right or left; put zeros in leftmost positions for srl
srlv,
Shift [rs] digits to right or left; put zeros in leftmost positions for srlv
sral,
sllv
srav
Same as above except sign-extend the high-order bits
Computational instructions
Set less than (signed, unsigned, and immediate)
rd to 1 if condition is met, set to 0 otherwise
Set
Almost
slt,
sltu
Condition is rs<rt
slti,
always paired with a branch instruction
sltiu
Condition is rs<immediate
Load upper immediate (lui)
immediate 16 bits left, append 16 zeros to right, put 32bit result into rd
Shift
Can
create a 32-bit constant from lui and ori
Move to/from HI/LO
mfhi,
Put HI or LO register contents into rd
mthi,
mflo
mtlo
Put rs contents into HI or LO (for interrupt recovery)
Load and store operands in MIPS
Memory operands can be bytes, half-words (2
bytes), or words (4 bytes [32 bits])
opcode
determines the operand size
Number of address bits used is
32
for accessing bytes
31 for accessing half-words
30 for accessing words
Half-word and word addresses must be aligned
Bit
0 must be zero for half-word accesses
Bits 0 and 1 must be zero for word accesses
31
word effective address
1 0
0 0
.
.
.
12
8
4
0
byte
addresses
1 byte
memory
1 word
Load operands can be signed or unsigned
.
.
.
12
8
4
0
Load and store instructions
lb, lh, lw
Form
effective address, get data from addressed memory
location, sign extend if lb or lh, load into rt
lbu, lhu, lwu
Form
effective address, get data from addressed memory
location, zero extend if lb or lh, load into rt
sb, sh, sw
effective address, store data from rt (partial if sb or sh)
into addressed location
Form
Also load/store instructions for unaligned data
MIPS assembly example
A is an array of words
Starting address of A (memory address of A[0])
is in $4
Want to perform A[3] = A[0] + A[1] - A[2]
MIPS assembly language:
lw
lw
add
lw
sub
sw
$1, 0($4)
$2, 4($4)
$1, $1, $2
$2, 8($4)
$1, $1, $2
$1, 12($4)
# A[0] into $1
# A[1] into $2
# A[0]+A[1] into $1
# A[2] into $2
# A[0]+A[1]-A[2] into $1
# store result in A[3]
Branch and jump instructions
Branch instructions test a condition, either
Equality
beq, bne
Often coupled with slt, sltu, slti, sltiu
or
the value of rs relative to zero
or inequality of rs and rt
bltz, bgtz, blez, bgez
and form the target address by adding the sign
extended immediate to the PC
all instructions are words, immediate is shifted left two
bits before being sign extended
Since
Operation
true
PC = target
test
condition
false
PC = PC+4
Branch and jump instructions
Jump instructions unconditionally branch to the
address formed by either
left the 26-bit target two bits and combining it with
the 4 high-order PC bits
Shifting
j
The
jr
contents of register rs
Branch and link and jump and link instructions
also save the address of the next instruction into
r31
bltzal,
bgezal, jal, jalr
Used for subroutine calls
jr $31 used to return from a subroutine
for loop and if-then-else example
i, j, n are in $1, $2, $3; $4 holds A base address
for (i=0; i<n; i++) {
if (A[i] < j)
A[i]=A[i]*2
else
A[i]=A[i]+1
}
Loop:
Else:
Exit:
add
lw
slt
beq
add
j
addi
sw
addi
addi
bne
$1, $0, $0
$5, 0($4)
$6, $5, $2
$6, $0, Else
$5, $5, $5
Exit
$5, $5, 1
$5, 0($4)
$4, $4, 4
$1, $1, 1
$1, $3, Loop
# initialize i to 0
# A[i] into $5
# set $6=1 if A[i]<j
# if not true, do else
# A[i]=A[i]*2
# skip else
# A[i]=A[i]+1
# store A[i]
# point to next A[i]
# increment i
# loop back if i<n
Special instructions
System call (syscall)
Call
OS to perform system services, e.g., printf
$2 contains code specifying the service to perform
breakpoint
Transfer
control to exception handler
Useful for debugging
Many instruction extensions have been made to
the original MIPS ISA
Branches
System
Etc.
functions
Pseudoinstructions
Assembler can create instruction sequences for
other common operations
Examples
move
$2, $3
blt $7, $14, Label
li $4, 100
Many more examples in Appendix A
Do not use these in your assignments!
Register usage conventions
Software conventions for the use of registers
ensure that routines developed by different
programmers can work together
MIPS assembly convention assigns special
names and meanings to the GPRs
Preferred to use symbolic names instead of
register numbers
Caveats
Conventions may vary by compiler (SGI/MIPS,
gcc, etc.)
The book has taken some liberties in the spirit of
simplification (and so will my notes)
So what is presented here (and in the book) may
not be 100% accurate in terms of following
“MIPS conventions”
MIPS register usage conventions
Register
0
1
2-3
4-7
8-15
24,25
16-23
26,27
28
29
30
31
name
$zero
$at
$v0,$v1
$a0-$a3
$t0-$t7
$t8,$t9
$s0-$s7
$k0,$k1
$gp
$sp
$fp
$ra
usage
always returns zero
reserved by assembler
value returned by subroutine
arguments passed to subroutine
temps not saved by called subroutine
temps saved by called subroutine
reserve for use by interrupt handler
pointer to global area
stack pointer
frame pointer
subroutine return address
Procedure call
To support procedures, we need means for
Passing
parameters to the procedure
Transferring control to the procedure
The procedure to use registers and memory without disturbing
that of the calling program
Passing results back to the calling program
Returning control to the calling program
calling
program
(1) pass
parameters,
transfer control
procedure
(2) save variables,
allocate memory
for local variables
(3) execute
procedure
(4) pass results,
return control
(4) restore
variables
stack
Procedure call
Passing parameters
$a0-$a3
reserved for parameter passing
Additional parameters passed via the stack (later)
Transferring control
jal
ProcedureName
Puts PC+4 into $ra ($31) and jumps to address associated with label
ProcedureName
Passing results
$v0,$v1
reserved for passing results
Stack used for additional results
Returning control
jr
$ra
Jumps to address contained in register $ra
Caller and callee-saved registers
Caller: the procedure that calls another
Callee: the called procedure
Values in registers that are needed by the caller
after the callee returns must be preserved
Loop:
Else:
Exit:
$s1-$s4
add
lw
slt
beq
add
jal
add
j
addi
sw
addi
addi
bne
$s1, $zero, $zero # initialize i to 0
$t0, 0($s4)
# A[i] into $t0
$t1, $t0, $s2
# set $t1=1 if A[i]<j
$t1, $zero, Else # if not true, do else
$a0, $t0, $zero # copy A[i] into $a0
doubleit
# doubleit performs A[i]=A[i]*2
$t0, $v0, $zero # copy result into $t0
Exit
# skip else
$t0, $t0, 1
# A[i]=A[i]+1
$t0, 0($s4)
# store A[i]
$s4, $s4, 4
# point to next A[i]
$s1, $s1, 1
# increment i
$s1, $s3, Loop # loop back if i<n
must be the same after return from doubleit
Caller and callee-saved registers
$t0-$t9 are caller-saved registers
Calling
procedure must save them before the procedure call if
its want the values preserved across the call
Called procedure can use them without saving them first
Used for temporary values that do not need to be saved
across a call, or if we use all $s0-$s7
$a0-$a3 and $v0,$v1 are also caller-saved
$s0-$s7 are callee-saved register
Called
procedure must save them before using them
Calling procedure does not need to save them before the call
Used for values that need to be preserved across a call, or if
we use all $t0-$t9
If the caller does not need $t0-$t9, $a0-$a3, and
$v0,$v1 to be saved across a call, and the callee
does not use $s0-$s7, then none of them need
to be saved
The stack
The stack is a particular area of memory used for
Passing
additional parameters (beyond $a0-$a4) and results
(beyond $v0, $v1)
Preserving register values that need to be maintained across a
procedure call
Creating memory space for a procedure (e.g., for local
variables)
Items are pushed onto the stack and popped off
31
push 22
19
7
push 31
22
19
7
pop
31
22
19
7
22
pop
22
19
7
19
7
But note that the stack is not strictly accessed
as a first-out structure
The stack
The LIFO nature of the stack matches the nested
nature of procedure calling
calling
program
non-leaf
procedures
leaf
procedure
Creating a stack in MIPS
Need storage
Convention
is to use memory starting at address 7fff ffffhex
(bottom of the stack) and growing to lower addresses
Need a pointer (memory address register) to the
top of the stack
$sp
($29) by convention
7fff ffffhex
$sp
memory
stack
direction
of stack
growth
0hex
Using the stack
Pushing items onto the stack
Adjust
$sp to make room for new items
Store items onto stack
Popping items off the stack
Retrieve
items off stack
Adjust $sp to position before items
Example
addi $sp, $sp, -12
sw $t1, 8($sp)
sw $t0, 4($sp)
sw $s0, 0($sp)
lw $s0, 0($sp)
lw $t0, 4($sp)
lw $t1, 8($sp)
addi $sp, $sp, 12
Procedure call steps
Caller (before call)
Save
caller-saved registers
Store arguments
First four in $a0-$a3, remaining pushed onto the stack
jal
Callee
Adjust
stack pointer
Save callee-saved registers (including $ra if non-leaf)
Read arguments
Execute the procedure
Restore callee-saved registers
Store return values and adjust stack pointer
jr
First two in $v0, $v1, remaining pushed onto the stack
$ra
Caller (after return)
Retrieve
arguments and restore caller-saved registers
Stack before (a), during (b), and after (c)
Frame pointer ($fp) provides a stable base
register from which to reference local variables
Provided
for convenience only; some MIPS conventions use
$30 as $s8 instead
Example procedure: factorial
Recursive function, therefore non-leaf
C code
int fact (int n)
{
if (n < 1) return (1);
else return (n*fact(n-1));
}
Example procedure: factorial
MIPS code (see Appendix A for slight variation)
fact:
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
# adjust stack for two items
# save $ra on stack
# save $a0 (n) on stack
# test n<1
# if not, handle (n*fact(n-1))
# put 1 in result register $v0
# readjust stack pointer
# return
# put n-1 in argument register $a0
# call fact
# restore n from stack
# restore $ra from stack
# readjust stack pointer
# put n*fact(n-1) in result register $v0
# return
fact execution with n=2
fact called three times, each with different
values of n
calling
program
n=2
n=1
$a0 used to pass the value of n
$v0 used for the return value
n=0
fact execution with n=2
fact:
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$sp
fact execution with n=2
$a0=2
fact:
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$sp
$ra
$a0
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$ra
$a0
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
$v0=1
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$v0=2*1*1
$a0=1
L1:
$v0=1*1
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
$v0=1
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
How pointers map to MIPS instructions
Pointers are addresses of data objects
Just as in HLLs, pointer versions of assembly
language programs are often more efficient than
array versions
Compilers often will map an array version in a
HLL into a pointer version in machine code
How pointers map to MIPS instructions
Array version of a procedure to clear an array
clear1 (int array[], int size)
{
int i;
for (i=0; i<size; i++)
array[i] = 0;
}
clear1:
Loop:
add
add
add
add
sw
addi
slt
bne
$t0, $zero, $zero # initialize i to 0
$t1, $t0, $t0
# $t1 = 2*i
$t1, $t1, $t1
# $t1 = 4*i
$t2, $a0, $t1
# $t2 = &array[i]
$zero, 0($t2)
# clear array[i]
$t0, $t0, 1
# increment i
$t3, $t0, $a1
# set $t3=1 if i<size
$t3, $zero, Loop # if $t3 not equal 0, reiterate loop
How pointers map to MIPS instructions
Pointer version of a procedure to clear an array
clear2 (int *array, int size)
{
int *p;
for (p=&array[0]; p<&array[size]; p++)
*p = 0;
}
clear2:
Loop:
add
add
add
add
sw
addi
slt
bne
$t0, $a0, $zero
$t1, $a1, $a1
$t1, $t1, $t1
$t2, $a0, $t1
$zero, 0($t0)
$t0, $t0, 4
$t3, $t0, $t2
$t3, $zero, Loop
# initialize p to &array[0]
# $t1 = 2*size
# $t1 = 4*size
# $t2 = &array[size]
# clear location pointed to by p
# increment p to point to next element
# set $t3=1 if p<&array[size]
# if $t3 not equal 0, reiterate loop
Assembling, linking, loading programs
compile
time
run
time
Compiler
Creates assembly language from HLL
May
produce object files directly (built-in assembly)
Phases
Front
end
Translate from HLL to intermediate (internal) form
Optimizations
Improve execution performance, often at the expense of increasing
code size
Back
(optional)
end (code generator)
Generate assembly (or object code) from intermediate form
Compiler structure
Compiler optimizations
Assembler
Converts assembly language into an object file
in binary format that can be linked with other
object files
Object file format
Header:
size and position of other object file sections
Text segment: machine code (instructions)
Data segment: data
Relocation information: list of instructions and data that
depend on absolute memory addresses when the program is
loaded into memory
Symbol table: location of each label in the program as well as
list of unresolved references
Debugging
information
Example object file
Header
Text segment
Data segment
Relocation info
Symbol table
Name
myprog
Text size
100hex
Data size
20hex
Address
Instruction
0
lw $a0, 0($gp)
4
jal 0
…
…
Address
Data
0
(X)
…
…
Address
0
Instruction type
lw
Dependency
X
4
jal
yourproc
Label
Address
X
yourproc
Linker
Links object files together to form an executable
(binary) that can be run on the machine
should be
“printf”
and executable files should show machine language
(in binary format) not assembly language
Object
Linker tasks
Determine memory addresses of code and data
Resolve absolute addresses
Find library routines referenced by the program
Resolve references among files
Resulting program resides in I/O
on
hard drive
on CD
on another machine’s hard drive accessible via the network
Etc.
Example executable file
Header
Text segment
Data segment
Name
Text size
myprog
300hex
Data size
50hex
Address
0040 0000hex
Instruction
lw $a0, 8000hex($gp)
0040 0004hex
jal 100040hex
…
…
0040 0100hex
[start of yourproc]
…
…
Address
1000 0000hex
Data
(X)
…
…
$gp = 1000 8000hex
Loader
Part of the operating system that loads
programs into memory, starts them, and
terminates them when done
Steps
Determine
memory size of text and data segments from
executable header
Create memory address space big enough for the program
Copy text and initialized data into those memory locations
Copy input parameters onto the stack
Clear general registers and set $sp to first free stack location
Call a startup routine that
Copies input parameters into $a0-$a4
Loads the PC with the starting address of main
Terminates the program when main returns
The Motorola MCore architecture
Introduced in 1997 to attack embedded market
Cell
phones
Personal Digital Assistants (PDAs)
Portable Global Positioning Systems (GPSs)
Automotive systems (braking, engine control, etc)
Etc.
Features
32-bit
architecture (integer arithmetic is on 32 bits)
Many embedded processors are 8 or 16-bit
16-bit
fixed-length instructions
Two-operand register-register (load-store) architecture
One source operand also serves as the destination (e.g., rx = rx + ry)
Byte,
half-word, and word memory accesses
Power saving modes
http://www.motorola.com/mcore
MCore user visible registers
16 general purpose 32-bit
registers
R0
is stack pointer
R15 holds return address
32-bit PC register
1-bit condition register
Set
by compare, arithmetic, and
logical operations
Alternative to setting a GPR for
branch conditions as in MIPS
MCore instruction formats
Register-to-register
Monadic
Example: abs rx (absolute value of register rx)
Dyadic
Example: movt rx, ry (move ry to rx if condition register is “true”)
5-bit
Example: lsli rx, imm5 (logical shift left of rx by imm5 bits)
5-bit
immediate
offset immediate
Example: addi rx, oimm5 (add rx and unsigned value oimm5)
MCore instruction formats
7-bit
immediate
Only used by movi rx, imm7 (put unsigned imm7 into rx)
Control
register
Used to move between rx and special register (CReg)
Memory access
Scaled
4-bit immediate
Scale (shift left) imm4 to match access type (byte, half-word, word) and
add to rx
MCore instruction formats
Load
relative word
Only used by lrw rz,label (left-shift zero-extended DISP_8 by 2, add to
PC+2, force two LSBs to zero, and load word at this address into rz)
Register
quadrant and multiple register
Used to load/store multiple registers
Flow control
Scaled
11-bit displacement
Example: bf label (if condition register is “false”, branch to address
formed by appending one zero to right of sign-extended DISP_11
and adding to PC+2; otherwise increment PC)
MCore instruction formats
Register
Example: jsr rx (put PC+2 in r15; jump to subroutine addressed by
appending one zero to right of rx)
Indirect
Example jsri label (put PC+2 in r15; jump to subroutine addressed by
left-shifting zero-extended DISP_8 by 2, adding to PC+2, and forcing two
LSBs to zero
Register
with 4-bit negative displacement
Only used for loopt rx, label (decrement rx; set condition bit if signed
rx>0; clear the condition bit otherwise; if prior value of condition register
was “true”, branch to address formed by appending one to right of
one-extended DISP_4 and adding to PC+2; otherwise increment
PC+2)
MCore instruction set
Mnemonic
Description
ABS
ADDC
ADDI
ADDU
AND
ANDI
ANDN
ASR
ASRC
Absolute Value
Add with C bit
Add Immediate
Add Unsigned
Logical AND
Logical AND Immediate
AND NOT
Arithmetic Shift Right
Arithmetic Shift Right, Update C Bit
BCLRI
BF
BGENI
BGENR
BKPT
BMASKI
BR
BREV
BSETI
BSR
BT
BTSTI
Clear Bit
Branch on Condition False
Bit Generate Immediate
Bit Generate Register
Breakpoint
Bit Mask Immediate
Branch
Bit Reverse
Bit Set Immediate
Branch to Subroutine
Branch on Condition True
Bit Test Immediate
CLRF
CLRT
CMPHS
CMPLT
CMPLTI
CMPNE
CMPNEI
Clear Register on Condition False
Clear Register on Condition True
Compare Higher or Same
Compare Less-Than
Compare Less-Than Immediate
Compare Not Equal
Compare Not Equal Immediate
MCore instruction set
DECF
DECGT
DECLT
DECNE
DECT
DIVS
DIVU
DOZE
Decrement on Condition False
Decrement Register and Set Condition if Result Greater-Than
Zero
Decrement Register and Set Condition if Result Less-Than Zero
Decrement Register and Set Condition if Result Not Equal to Zero
Decrement on Condition True
Divide (Signed)
Divide (Unsigned)
Doze
FF1
Find First One
INCF
INCT
IXH
IXW
Increment on Condition False
Increment on Condition True
Index Halfword
Index Word
JMP
JMPI
JSR
JSRI
Jump
Jump Indirect
Jump to Subroutine
Jump to Subroutine Indirect
LD.[BHW]
LDM
LDQ
LOOPT
LRW
LSL, LSR
LSLC,
LSRC
LSLI, LSRI
Load
Load Multiple Registers
Load Register Quadrant
Decrement with C-Bit Update and Branch if Condition True
Load Relative Word
Logical Shift Left and Right
Logical Shift Left and Right, Update C bit
Logical Shift Left and Right by Immediate
MCore instruction set
MFCR
MOV
MOVI
MOVF
MOVT
MTCR
MULT
MVC
MVCV
Move from Control Register
Move
Move Immediate
Move on Condition False
Move on Condition True
Move to Control Register
Multiply
Move C Bit to Register
Move Inverted C Bit to
Register
NOT
Logical Complement
OR
Logical Inclusive-OR
ROTLI
RSUB
RSUBI
RTE
RFI
Rotate Left by Immediate
Reverse Subtract
Reverse Subtract Immediate
Return from Exception
Return from Interrupt
SEXTB
SEXTH
ST.[BHW]
STM
STQ
STOP
SUBC
SUBU
SUBI
SYNC
Sign-Extend Byte
Sign-Extend Halfword
Store
Store Multiple Registers
Store Register Quadrant
Stop
Subtract with C Bit
Subtract
Subtract Immediate
Synchronize
MCore instruction set
TRAP
TST
TSTNB
Z
Trap
Test Operands
Test for No Byte Equal Zero
WAIT
Wait
XOR
XSR
XTRB0
XTRB1
XTRB2
XTRB3
Exclusive OR
Extended Shift Right
Extract Byte 0
Extract Byte 1
Extract Byte 2
Extract Byte 3
ZEXTB
ZEXTH
Zero-Extend Byte
Zero-Extend Halfword
MCore versus MIPS
Both are 32-bit load-store GPR architectures
MIPS was designed for desktops and servers,
MCore for embedded systems
MCore instruction length is 16 bits to save
memory code space but this reduces
The
number of GPRs (16 versus 32 in MIPS)
The number of operands (2 versus 3 in MIPS)
MCore uses a condition bit versus a GPR for
MIPS to set branch conditions
Condition
bit saves a GPR and sometimes an instruction but
constrains instruction ordering
MCore has more addressing modes and
instruction types
Saves
code space but increases complexity of decoder
and perhaps arithmetic units
Questions?