Assembly Process
Machine Code Generation
Assembling a program entails translating the assembly language into binary machine code.
This requires more than simply mapping assembly instructions to machine instructions:
  Each instruction is bound to an address.
  Labels are bound to addresses.
  Assembly instructions that refer to labels generate machine instructions that contain the label's address.
  Pseudo-instructions are translated into one or more machine instructions.
Instruction Format
addi $13,$7,50

  opcode    rs       rt       immediate operand
  001000    00111    01101    0000 0000 0011 0010
  6 bits    5 bits   5 bits   16 bits

add $13,$7,$8

  opcode    rs       rt       rd       extended opcode
  000000    00111    01000    01101    00000 100000
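The two layouts above can be reproduced by shifting each field into place. The following is a minimal sketch, not an assembler; the helper names `encode_i_type` and `encode_r_type` are hypothetical and chosen only for illustration.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into one 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack opcode 0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# addi $13, $7, 50: opcode 001000, source register $7, destination $13
addi_word = encode_i_type(0b001000, 7, 13, 50)
# add $13, $7, $8: opcode 000000, extended opcode 00000 100000
add_word = encode_r_type(7, 8, 13, 0, 0b100000)

print(f"{addi_word:032b}")
print(f"{add_word:032b}")
```

Printing the words in binary makes it easy to compare them field by field with the bit strings on the slide.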
The symbol table
The assembler scans the source code and generates the appropriate bit string for each line encountered.
The assembler must remember:
  what memory locations have been allocated
  to which address each label is bound
A symbol table is a list of (label, address) pairs.
When the data and text segments have been generated, they are stored as an executable file.
The file is used by a program called the loader to initialize memory to the appropriate state before execution.
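As an illustration of the bookkeeping described above, the sketch below allocates words in a data segment and records a (label, address) pair for each label. The base address 0x10000000 and the function name `allocate_words` are assumptions made for this example, not something the slide specifies.

```python
DATA_BASE = 0x10000000  # assumed start of the data segment


def allocate_words(declarations, base=DATA_BASE):
    """Bind each label to the next free address and store its words there."""
    symbols = {}   # label -> address (the symbol table)
    memory = {}    # address -> word value (what has been allocated)
    next_free = base
    for label, values in declarations:
        symbols[label] = next_free
        for value in values:
            memory[next_free] = value
            next_free += 4  # each .word occupies four bytes
    return symbols, memory


symbols, memory = allocate_words(
    [("a1", [3]), ("a2", [16, 16, 16, 16]), ("a3", [5])]
)
for label, address in symbols.items():
    print(f"{label}: 0x{address:08x}")
```

A dictionary is used here instead of a literal list of pairs only because lookups by label are the common operation.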
Instructions
The .text directive tells the assembler that the lines
which follow are instructions.

By default, the text segment starts at 0x00400000
In some cases, a symbol may not yet have an assigned address when the assembler scans a line that refers to it. There are two ways to handle this:
  A second pass through the code can update instructions containing unresolved labels.
  Maintain a list of addresses at which each unresolved label appears.
    When the label is added to the symbol table, all locations in the corresponding list are updated to hold the address associated with the label.
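The second approach (keeping a list of locations per unresolved label) can be sketched as follows. This is a simplified model under stated assumptions: the input holds only labels and one-word instructions, only lines of the form `b label` refer to labels, and the function name `assemble_branches` is hypothetical.

```python
def assemble_branches(lines, base=0x00400000):
    """Single pass that patches forward references once the label is defined."""
    symbols = {}   # label -> address
    pending = {}   # unresolved label -> addresses of instructions using it
    targets = {}   # instruction address -> resolved target address
    address = base
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            label = line[:-1]
            symbols[label] = address
            # Update every location that was waiting for this label.
            for site in pending.pop(label, []):
                targets[site] = address
            continue
        if line.startswith("b "):
            label = line.split()[1]
            if label in symbols:
                targets[address] = symbols[label]
            else:
                pending.setdefault(label, []).append(address)
        address += 4  # every instruction occupies one word
    return symbols, targets, pending


program = ["b done", "loop:", "nop", "b loop", "b done", "done:", "nop"]
symbols, targets, pending = assemble_branches(program)
print(symbols, targets, pending)
```

Any label still present in `pending` after the pass was never defined, which a real assembler would report as an error.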
Branch offset in the MIPS R2000
In machine code, the target address in a branch must be specified as an offset from the address of the branch.
During execution, this offset is simply added to the program counter to fetch the next instruction:
  PC contains the address of the branch instruction
  Offset is measured in words, not bytes
  PC_NEW = offset*4 + PC_OLD
To calculate the offset, the assembler uses the formula:
  offset = (target instruction address - branch instruction address) / 4
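The formula can be written out directly. The sketch below follows the convention used in these slides, where the offset is measured from the branch instruction itself; the function names are hypothetical and the addresses are sample values.

```python
def branch_offset(target_address, branch_address):
    """Word offset from the branch instruction to its target."""
    byte_distance = target_address - branch_address
    if byte_distance % 4 != 0:
        raise ValueError("instructions must lie on word boundaries")
    return byte_distance // 4


def new_pc(old_pc, offset):
    """PC_NEW = offset*4 + PC_OLD."""
    return old_pc + offset * 4


offset = branch_offset(0x00400000, 0x00400068)
print(offset)                            # -26
print(hex(new_pc(0x00400068, offset)))   # 0x400000
```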
Branch offset calculation
The offset is stored in the instruction as a word offset rather than a byte offset.
  Instructions are only stored at word boundaries
  For both the target and the branch instruction, the two least significant bits of the address are zero
An offset may be negative
  if the target instruction precedes the branch instruction
The offset is stored in the 16-bit immediate field
  This means the branch can only jump about 2^15 instructions before or after the current address
  2^15 instructions (words) = 2^17 bytes
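Storing a possibly negative offset in a 16-bit field amounts to a two's-complement encoding with a range check. A minimal sketch, with hypothetical helper names:

```python
def encode_offset16(offset):
    """Two's-complement encoding of a word offset in a 16-bit field."""
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("branch target out of range")
    return offset & 0xFFFF


def decode_offset16(field):
    """Recover the signed word offset from the 16-bit field."""
    return field - 0x10000 if field & 0x8000 else field


print(f"{encode_offset16(-26):04x}")   # ffe6
print(decode_offset16(0xFFE6))         # -26
print((1 << 15) * 4 == (1 << 17))      # True: 2^15 words = 2^17 bytes
```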
Branch offset calculation
An entry in the SPIM instruction list:

[0x00400068] 0x1440ffe6 bne $2, $0, -104 [__start-0x00400068]; 44: bnez $v0, __start

  [0x00400068]            instruction address
  0x1440ffe6              machine code
  [__start-0x00400068]    offset calculation, in bytes (ignores PC increment)
  44                      line number in source file
  bnez $v0, __start       original assembly code

Offset in bytes (__start = 0x00400000): 0x00400000 - 0x00400068 = -104
Stored offset: ffe6 = -26 = -104/4
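The machine word in this entry can be taken apart to check the annotation. The sketch below splits 0x1440ffe6 into its fields and recomputes the target under the same convention as the listing (PC increment ignored); `decode_branch` is a hypothetical helper.

```python
def decode_branch(word):
    """Split an I-type branch word into opcode, rs, rt and signed offset."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    field = word & 0xFFFF
    offset = field - 0x10000 if field & 0x8000 else field
    return opcode, rs, rt, offset


opcode, rs, rt, offset = decode_branch(0x1440FFE6)
target = 0x00400068 + offset * 4   # offset is in words, so scale by 4
print(opcode, rs, rt, offset)      # 5 2 0 -26
print(f"0x{target:08x}")           # 0x00400000
```

Opcode 5 is bne, and registers 2 and 0 match `bne $2, $0` in the listing.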
Jump target calculation
The jump instruction has two forms


Pseudo-direct, for j and jal
Register direct for jr and jalr
jr and jalr specify a register
containing the address to be loaded
into the PC
j and jal specify most of the address
of the target within the instruction.

However, they have a range of at most
one-sixteenth of the memory space
[Figure: the memory space divided into sixteen regions, labeled 0 through f]
Jump target calculation
The target address is a 32-bit quantity
  Since all word addresses are multiples of 4, there is no need to store the last two bits
  The jump instruction format has 26 bits for the target address
  The remaining 6 bits of the instruction are used for the opcode
The highest-order 4 bits of the target are taken from the address currently stored in the program counter

  Instruction:     opcode (6 bits) | jump target bits (26)
  Target address:  highest-order 4 bits of PC | jump target bits (26) | 00
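The way the three pieces combine can be sketched with masks and shifts. The function names and sample addresses below are assumptions for illustration only.

```python
def jump_field(target_address):
    """26-bit field stored in the instruction: drop the top 4 and low 2 bits."""
    return (target_address >> 2) & 0x03FFFFFF


def jump_target(pc, field):
    """Rebuild the 32-bit target: top 4 bits of PC | 26-bit field | 00."""
    return (pc & 0xF0000000) | ((field & 0x03FFFFFF) << 2)


field = jump_field(0x00400008)
word = (0b000010 << 26) | field          # opcode 000010 is j
print(f"{field:07x}")                    # 0100002
print(f"{word:08x}")                     # 08100002
print(f"{jump_target(0x00400014, field):08x}")   # 00400008
```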
Jump Target Calculation
Jump instructions have a range of 2^26 words, or 2^26 x 2^2 = 2^28 bytes

This range is NOT symmetric about the jump instruction
[Figure: the memory space divided into sixteen regions, labeled 0 through f. A jump at 0x80000080 can reach +0x0fffff7c forward but only -0x00000080 backward.]
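The asymmetry follows from the fixed top four bits: a jump can only reach addresses inside the 2^28-byte region that contains it. A small sketch (the name `jump_reach` is hypothetical):

```python
def jump_reach(pc):
    """Backward and forward reach, in bytes, of a jump located at pc."""
    region_start = pc & 0xF0000000          # top 4 bits kept, rest cleared
    region_end = region_start | 0x0FFFFFFC  # last word address in the region
    return region_start - pc, region_end - pc


backward, forward = jump_reach(0x80000080)
print(f"-0x{-backward:08x}")   # -0x00000080
print(f"+0x{forward:08x}")     # +0x0fffff7c
```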
Program relocation
Program modules may be developed separately by individual programmers. When these modules are loaded into memory, they must not be assigned overlapping memory space.
Thus, the modules have to be relocated
  Relative addresses are relocatable
  Any absolute references must be "fixed" by the loader
Use a logical base address known at load time
  Absolute addresses are stored as offsets from this to-be-determined (TBD) base
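One way to picture the fix-up step is a loader that adds the base address to every word marked as an absolute reference. The sketch below uses a made-up module format (a list of words plus a list of indices to patch); real object file formats carry more detail, and `relocate` is a hypothetical name.

```python
def relocate(words, absolute_sites, base):
    """Return a copy of words with base added at each absolute-reference index."""
    relocated = list(words)
    for index in absolute_sites:
        # The stored value is an offset from the base chosen at load time.
        relocated[index] = (relocated[index] + base) & 0xFFFFFFFF
    return relocated


module = [0x00000004, 0x1000FFFD, 0x00000010]   # words 0 and 2 hold offsets
loaded = relocate(module, [0, 2], base=0x10000000)
print([f"{w:08x}" for w in loaded])
```

The middle word holds a relative branch, so it is left untouched wherever the module is placed.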
From source to executable
[Figure: high-level source code -> compiler -> asm -> assembler -> obj; obj and lib files -> linker -> exe -> loader -> memory]
Some examples of assembling code
        .data
a1:     .word 3
a2:     .word 16, 16, 16, 16
a3:     .word 5
        .text
__start:
        la   $6, a2
loop:
        lw   $7, 4($6)
        mul  $9, $10, $7
        b    loop
        li   $v0, 10
        syscall
Symbol Table
symbol     address
a1         1000 0000
a2         1000 0004
a3         1000 0014
__start    0040 0000
loop       0040 0008
Memory map of data section
address      contents
1000 0000    0000 0003
1000 0004    0000 0010
1000 0008    0000 0010
1000 000c    0000 0010
1000 0010    0000 0010
1000 0014    0000 0005
Translate pseudo-instructions
source code              translation
la   $6, a2              lui  $6, 0x1000
                         ori  $6, $6, 0x0004
loop:
lw   $7, 4($6)           lw   $7, 4($6)
mul  $9, $10, $7         mult $10, $7
                         mflo $9
b    loop                b    loop
li   $v0, 10             ori  $v0, $0, 10
syscall                  syscall
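The expansions above are mechanical. As a hedged sketch of how an assembler might produce them, the hypothetical helpers below split a 32-bit address into halves for la and rewrite mul as a mult/mflo pair; the address of a2 is taken from the symbol table.

```python
def expand_la(reg, address):
    """la reg, label -> lui loads the upper half, ori fills in the lower half."""
    upper = (address >> 16) & 0xFFFF
    lower = address & 0xFFFF
    return [f"lui {reg}, 0x{upper:04x}", f"ori {reg}, {reg}, 0x{lower:04x}"]


def expand_mul(dest, src1, src2):
    """mul dest, src1, src2 -> multiply into LO, then copy LO to dest."""
    return [f"mult {src1}, {src2}", f"mflo {dest}"]


a2_address = 0x10000004
print(expand_la("$6", a2_address))
print(expand_mul("$9", "$10", "$7"))
```

Because la becomes two instructions, every label after it moves by one extra word, which is why loop is bound to 0040 0008 rather than 0040 0004.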
Translate to machine code
instruction            address     contents
lui  $6, 0x1000        00400000    3c06 1000    (lui)
ori  $6, $6, 0x0004    00400004    34c6 0004    (ori)
lw   $7, 4($6)         00400008    8cc7 0004    (lw)
mult $10, $7           0040000c    0147 0018    (mult)
mflo $9                00400010    0000 4812    (mflo)
b    loop              00400014    1000 xxxx    (beq)
ori  $v0, $0, 10       00400018    3402 000a    (ori)
syscall                0040001c    0000 000c    (syscall)
Resolve relative references
instruction            address     contents
lui  $6, 0x1000        00400000    3c06 1000
ori  $6, $6, 0x0004    00400004    34c6 0004
lw   $7, 4($6)         00400008    8cc7 0004
mult $10, $7           0040000c    0147 0018
mflo $9                00400010    0000 4812
b    loop              00400014    1000 fffd (-3)
ori  $v0, $0, 10       00400018    3402 000a
syscall                0040001c    0000 000c

[0x400008 - (0x400014)]/4 = -12/4 = -3 = 0xfffd
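Filling in the xxxx placeholder is a matter of looking up the label, computing the word offset, and writing it into the low 16 bits of the branch word. A minimal sketch, assuming the convention used here (offset measured from the branch itself); `patch_branch` is a hypothetical name.

```python
def patch_branch(word, branch_address, target_address):
    """Write the word offset to the target into the low 16 bits of word."""
    offset = (target_address - branch_address) // 4
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("branch target out of range")
    return (word & 0xFFFF0000) | (offset & 0xFFFF)


symbol_table = {"__start": 0x00400000, "loop": 0x00400008}
# b loop is assembled as beq $0, $0, offset: upper half 0x1000, offset pending
patched = patch_branch(0x10000000, 0x00400014, symbol_table["loop"])
print(f"{patched:08x}")   # 1000fffd
```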