The Assembly Process

Basically how does it all work
The Assembly Process
Assembly code → Assembler → Machine code
• A computer understands machine code (binary)
• People (and compilers) write assembly language
CMPE12c
2
Gabriel Hugh Elkaim
The Assembly Process
An assembler is a program that translates each instruction
to its binary machine code equivalent.
• It is a relatively simple program
• There is a one-to-one or near one-to-one correspondence between assembly language instructions and machine language instructions.
• Assemblers do some code manipulation
- Like MAL to TAL
- Label resolution
• A “macro assembler” can process simple macros like puts, or preprocessor directives.
MAL → TAL
MAL is the set of instructions accepted by the assembler.
TAL is a subset of MAL – the instructions that can be
directly turned into machine code.
•There are many MAL instructions that have no single
TAL equivalent.
•To determine whether an instruction is a TAL instruction
or not:
•Look in appendix C or on the MAL/TAL sheet.
•The assembler takes (non MIPS) MAL instructions and
synthesizes them into 1 or more MIPS instructions.
MAL → TAL
For example
mul $8, $17, $20
Becomes
mult $17, $20
mflo $8
•MIPS has 2 registers for results from integer multiplication
and division: HI and LO
•Each is a 32 bit register
•mult and multu place the least significant 32 bits of the
result into LO, and the most significant 32 bits into HI.
•Multiplying two 32-bit numbers gives a 64-bit result
•(2^32 – 1)(2^32 – 1) = 2^64 – 2×2^32 + 1
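The HI/LO split can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the hardware is built; the function name `multu` is borrowed from the instruction, and the 32-bit mask is an assumption about operand width taken from the bullets above.

```python
MASK32 = 0xFFFFFFFF

def multu(rs, rt):
    """Return (HI, LO) for an unsigned 32-bit by 32-bit multiply."""
    product = (rs & MASK32) * (rt & MASK32)  # can be up to 64 bits wide
    hi = (product >> 32) & MASK32            # most significant 32 bits
    lo = product & MASK32                    # least significant 32 bits
    return hi, lo

# Largest unsigned operands: (2^32 - 1)^2 = 2^64 - 2*2^32 + 1
hi, lo = multu(0xFFFFFFFF, 0xFFFFFFFF)
print(f"HI=0x{hi:08x} LO=0x{lo:08x}")  # HI=0xfffffffe LO=0x00000001
```

A mul synthesized as mult followed by mflo keeps only the LO half, so the result is correct only when the product fits in 32 bits.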
MAL → TAL
mflo, mtlo, mfhi, mthi
Move From lo
Move To hi
•Data is moved into or out of register HI or LO
•One operand is needed to tell where the data is
coming from or going to.
•For division (div or divu)
•HI gets the remainder
•LO gets the quotient
•Why aren’t these just put in $0-$31 directly?
MAL → TAL
TAL has only base displacement addressing
So this:
lw $8, label
Becomes:
la $7, label
lw $8, 0($7)
Which becomes
lui $7, 0xMSPART of label
ori $7, $7, 0xLSpart of label
lw $8, 0($7)
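How the "MS part" and "LS part" of a label are obtained can be shown with a short Python sketch. The helper names `split_address` and `expand_la` are hypothetical, and the example address 0x00400004 is the value of a2 used in the worked example later in these slides.

```python
def split_address(addr):
    """Split a 32-bit address into its upper and lower 16-bit halves."""
    upper = (addr >> 16) & 0xFFFF  # goes into the lui immediate
    lower = addr & 0xFFFF          # goes into the ori immediate
    return upper, lower

def expand_la(reg, addr):
    """Expand 'la reg, label' into a lui/ori pair for a known address."""
    upper, lower = split_address(addr)
    return [f"lui {reg}, 0x{upper:04x}",
            f"ori {reg}, {reg}, 0x{lower:04x}"]

for line in expand_la("$6", 0x00400004):
    print(line)
# lui $6, 0x0040
# ori $6, $6, 0x0004
```

ori is used for the low half because it zero-extends its immediate, so the upper 16 bits loaded by lui are left untouched.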
MAL → TAL
Instructions with immediate values are
synthesized with other instructions
So:
add $sp, $sp, 4
Becomes:
addi $sp, $sp, 4
For TAL:
•add requires 3 operands in registers.
•addi requires 2 operands in registers and one
operand that is an immediate.
•In MIPS assembly immediate instructions include:
•addi, addiu, andi, lui, ori, xori
•Why not more?
MAL → TAL
TAL implementation of I/O instructions
This:
putc $18            # if you get to use macros
Becomes:
addi $2, $0, 11     # code for putc
add $4, $18, $0     # put character argument in $4
syscall             # ask operating system to do a function
MAL → TAL
getc $11
Becomes:
addi $2, $0, 12
syscall
add $11, $0, $2
done
Becomes:
addi $2, $0, 10
syscall
puts $13
Becomes:
addi $2, $0, 4
add $4, $0, $13
syscall
MAL → TAL
Arithmetic Instructions:
MAL                 TAL
move $4, $3         add $4, $3, $0
add $4, $3, 15      addi $4, $3, 15     # also andi, ori, ..
mul $8, $9, $10     mult $9, $10        # HI || LO ← product, never overflow
                    mflo $8             # $8 ← LO, ignore HI!
div $8, $9, $10     div $9, $10         # LO ← quotient, HI ← remainder
                    mflo $8
rem $8, $9, $10     div $9, $10
                    mfhi $8
MAL → TAL
Branch Instructions:
MAL: bltz, bgez, blez, bgtz, beqz, bnez, blt, bge, bgt, beq, bne
TAL: bltz, bgez, blez, bgtz, beq, bne
MAL                     TAL
beqz $4, loop           beq $4, $0, loop
blt $4, $5, target      slt $t0, $4, $5         # $t0 is 1 if $4 < $5, 0 otherwise
                        bne $t0, $0, target
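The MAL-to-TAL synthesis shown on these slides amounts to a small rewriting table. The Python sketch below covers only a handful of the cases listed above; the function name `expand` and the choice of `$t0` as the scratch register for blt are assumptions made for illustration, not the behavior of any particular assembler.

```python
def expand(instr, scratch="$t0"):
    """Rewrite one MAL instruction into a list of TAL instructions."""
    op, *args = instr.replace(",", " ").split()
    if op == "move":                       # move rd, rs
        rd, rs = args
        return [f"add {rd}, {rs}, $0"]
    if op == "mul":                        # mul rd, rs, rt
        rd, rs, rt = args
        return [f"mult {rs}, {rt}", f"mflo {rd}"]
    if op == "rem":                        # rem rd, rs, rt
        rd, rs, rt = args
        return [f"div {rs}, {rt}", f"mfhi {rd}"]
    if op == "beqz":                       # beqz rs, label
        rs, label = args
        return [f"beq {rs}, $0, {label}"]
    if op == "blt":                        # blt rs, rt, label
        rs, rt, label = args
        return [f"slt {scratch}, {rs}, {rt}",
                f"bne {scratch}, $0, {label}"]
    return [instr]                         # assumed to be TAL already

print(expand("blt $4, $5, target"))
# ['slt $t0, $4, $5', 'bne $t0, $0, target']
```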
Assembler
The assembler will:
•Assign addresses
•Generate machine code
If necessary, the assembler will:
•Translate (synthesize) from the accepted assembly
to the instructions available in the architecture
•Provide macros and other features
•Generate an image of what memory must look like for
the program to be executed.
Assembler
What should the assembler do when it sees a directive?
• .data
• .text
• .space, .word, .byte, .float
• main:
How is the memory image formed?
Assembler
Example Data Declaration
      .data
a1:   .word 3
a2:   .byte '\n'
a3:   .space 5

Address       Contents
0x00001000    0x00000003
0x00001004    0x??????0a
0x00001008    0x????????
0x0000100c    0x????????
•Assembler aligns data to word addresses unless told not
to.
•Assembly process is very sequential.
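The sequential, word-aligned placement in the table above can be sketched as follows. This assumes, as the slide states, that every declaration starts on a word boundary; the function names and the tuple format for declarations are hypothetical.

```python
def align_up(addr, boundary=4):
    """Round addr up to the next multiple of boundary."""
    return (addr + boundary - 1) // boundary * boundary

FIXED_SIZES = {".word": 4, ".byte": 1}

def layout(decls, base):
    """Assign an address to each (label, directive, operand) in order."""
    table = {}
    addr = base
    for label, directive, operand in decls:
        addr = align_up(addr)              # word-align each item (assumed)
        table[label] = addr
        # .space reserves 'operand' bytes; other directives have fixed size
        addr += operand if directive == ".space" else FIXED_SIZES[directive]
    return table

decls = [("a1", ".word", 3), ("a2", ".byte", 10), ("a3", ".space", 5)]
for label, addr in layout(decls, 0x00001000).items():
    print(label, f"0x{addr:08x}")
# a1 0x00001000
# a2 0x00001004
# a3 0x00001008
```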
Machine code generation
Assembly language:
addi $8, $20, 15    (opcode = addi, rt = $8, rs = $20, immediate = 15)
Machine code format (bit 31 on the left, bit 0 on the right):
opcode | rs | rt | immediate
•opcode is 6 bits – addi is defined to be 001000
•rs – source register is 5 bits, encoding of 20, 10100
•rt – target register is 5 bits, encoding of 8, 01000
The 32-bit instruction for addi $8, $20, 15 is:
001000 10100 01000 0000000000001111
Or
0x2288000f
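The same packing can be reproduced with shifts and masks. This is a minimal sketch of the field layout above; the function name `encode_i_type` is hypothetical.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into one 32-bit word."""
    return ((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21) \
         | ((rt & 0x1F) << 16) | (immediate & 0xFFFF)

ADDI = 0b001000  # opcode for addi, as given above

word = encode_i_type(ADDI, rs=20, rt=8, immediate=15)
print(f"0x{word:08x}")  # 0x2288000f
```

Masking the immediate with 0xFFFF also gives the two's complement encoding of a negative immediate.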
Instruction Formats
I-Type Instructions with 16-bit immediates
•ADDI, ORI, ANDI, …
OPC:6 | rs1:5 | rd:5 | immediate:16
•LW, SW
OPC:6 | rs1:5 | rs2/rd:5 | displacement:16
•BNE
OPC:6 | rs1:5 | rs2:5 | distance(instr):16
Instruction Formats
J-Type Instructions with 26-bit immediate
•J, JAL
OPC:6 | 26-bits of jump address
R-Type All other instructions
•ADD, AND, OR, JR, JALR, SYSCALL, MULT, MFHI, SLT
OPC:6 | rs1:5 | rs2:5 | rd:5 | ALU function:11
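An R-type word can be packed the same way as an I-type word. The sketch below follows the slide's layout, treating the low 11 bits as a single "ALU function" field; the function name is hypothetical, and the function codes 0x18 (mult) and 0x12 (mflo) are the standard MIPS values.

```python
def encode_r_type(rs, rt, rd, function, opcode=0):
    """Pack OPC(6) | rs1(5) | rs2(5) | rd(5) | ALU function(11)."""
    return ((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21) \
         | ((rt & 0x1F) << 16) | ((rd & 0x1F) << 11) | (function & 0x7FF)

MULT = 0x18  # function code for mult
MFLO = 0x12  # function code for mflo

print(f"0x{encode_r_type(9, 10, 0, MULT):08x}")  # mult $9, $10 -> 0x012a0018
print(f"0x{encode_r_type(0, 0, 8, MFLO):08x}")   # mflo $8      -> 0x00004012
```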
Assembly Example
      .data
a1:   .word 3
a2:   .word 16:4
a3:   .word 5
      .text
main: la $6, a2
loop: lw $7, 4($6)
      mul $8, $9, $10
      b loop
      done

“Symbol Table”
Symbol    Address
a1        0040 0000
a2        0040 0004
a3        0040 0014
main      0080 0000
loop      0080 0008
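The symbol table for this example can be reproduced with a single sequential pass over each section. The sketch below is a simplification: the base addresses come from the slide, while the function names, the input format, and the table of how many TAL instructions each MAL instruction expands to are assumptions for illustration.

```python
DATA_BASE = 0x00400000
TEXT_BASE = 0x00800000

# Number of TAL instructions each MAL instruction in this example becomes.
# lw counts as 1 here because 4($6) is already base-displacement form.
TAL_COUNT = {"la": 2, "lw": 1, "mul": 2, "b": 1, "done": 2}

def data_size(operand):
    """Size in bytes of '.word value' or '.word value:count'."""
    count = int(operand.split(":")[1]) if ":" in operand else 1
    return 4 * count

def build_symbol_table(data, text):
    symbols = {}
    addr = DATA_BASE
    for label, operand in data:            # (label, .word operand)
        if label:
            symbols[label] = addr
        addr += data_size(operand)
    addr = TEXT_BASE
    for label, op in text:                 # (label or None, MAL opcode)
        if label:
            symbols[label] = addr
        addr += 4 * TAL_COUNT[op]
    return symbols

data = [("a1", "3"), ("a2", "16:4"), ("a3", "5")]
text = [("main", "la"), ("loop", "lw"), (None, "mul"), (None, "b"), (None, "done")]
for name, addr in build_symbol_table(data, text).items():
    print(name, f"{addr:08x}")
```

Because la expands to two instructions, loop lands at 0080 0008 rather than 0080 0004.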
Assembly Example
Memory map of .data section
address      Contents (hex)   Contents (binary)
0040 0000    0000 0003        0000 0000 0000 0000 0000 0000 0000 0011
0040 0004    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0008    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 000c    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0010    0000 0010        0000 0000 0000 0000 0000 0000 0001 0000
0040 0014    0000 0005        0000 0000 0000 0000 0000 0000 0000 0101
Assembly Example
Translation of MAL to TAL code
      .text
main: lui $6, 0x0040        # la $6, a2
      ori $6, $6, 0x0004
loop: lw $7, 4($6)
      mult $9, $10          # mul $8, $9, $10
      mflo $8
      beq $0, $0, loop      # b loop
      ori $2, $0, 10        # done
      syscall
Assembly Example
Memory map of .text section
address      Contents (hex)   Contents (binary)
0080 0000    3c06 0040        0011 1100 0000 0110 0000 0000 0100 0000 (lui)
0080 0004    34c6 0004        0011 0100 1100 0110 0000 0000 0000 0100 (ori)
0080 0008    8cc7 0004        1000 1100 1100 0111 0000 0000 0000 0100 (lw)
0080 000c    012a 0018        0000 0001 0010 1010 0000 0000 0001 1000 (mult)
0080 0010    0000 4012        0000 0000 0000 0000 0100 0000 0001 0010 (mflo)
0080 0014    1000 fffc        0001 0000 0000 0000 1111 1111 1111 1100 (beq)
0080 0018    3402 000a        0011 0100 0000 0010 0000 0000 0000 1010 (ori)
0080 001c    0000 000c        0000 0000 0000 0000 0000 0000 0000 1100 (sys)
Assembly Example
Branch offset computation
At execution time:
PC ← NPC + {sign extended offset field,00}
•PC points to instruction after the beq when offset
is added.
At assembly time:
Byte offset = target addr – (address of branch + 4)
            = 00800008 – (00800014 + 00000004)
            = FFFFFFF0 (-16)
Word offset = -16 / 4 = -4, stored in the 16-bit field as FFFC
Assembly Example
4 important observations:
• Offset is stored in the instruction as a word offset
• An offset may be negative
• The field dedicated to the offset is 16 bits, range
is thus limited
• More simply: Just count the number of
instructions from instruction following branch to
target, encode that as a 16-bit value
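The offset computation can be checked against the beq in the .text memory map, which sits at 0080 0014, targets loop at 0080 0008, and is encoded as 1000 fffc. The function name below is hypothetical; the range check reflects the 16-bit field limit noted above.

```python
def branch_offset_field(branch_addr, target_addr):
    """Return the 16-bit offset field for a branch at branch_addr."""
    byte_offset = target_addr - (branch_addr + 4)   # relative to next instruction
    if byte_offset % 4 != 0:
        raise ValueError("target is not word aligned")
    word_offset = byte_offset // 4                  # stored as a word offset
    if not -(1 << 15) <= word_offset < (1 << 15):
        raise ValueError("branch target out of 16-bit range")
    return word_offset & 0xFFFF                     # two's complement, 16 bits

field = branch_offset_field(0x00800014, 0x00800008)
print(f"{field:04x}")  # fffc
```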
Assembly
Jump target computation
At execution time:
PC ← {most significant 4 bits of PC, target field, 00}
At assembly time:
•Take 32 bit target address
•Eliminate least significant 2 bits (since word aligned)
•Eliminate most significant 4 bits
•What remains is 26 bits, and goes in the target field
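Both directions of the jump computation fit in two one-line functions. This is a sketch with hypothetical function names; the example address 0x004003B0 is chosen because its target field is 1048812, the jal operand that appears in the resolved MIPS listing later in these slides, and the PC value is an arbitrary address in the same 256 MB region.

```python
def jump_target_field(target_addr):
    """Assembly time: drop the low 2 bits and the high 4 bits (26 bits remain)."""
    return (target_addr >> 2) & 0x03FFFFFF

def jump_destination(pc, field):
    """Execution time: {most significant 4 bits of PC, target field, 00}."""
    return (pc & 0xF0000000) | ((field & 0x03FFFFFF) << 2)

field = jump_target_field(0x004003B0)
print(field)                                         # 1048812
print(f"{jump_destination(0x00400048, field):08x}")  # 004003b0
```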
Linking N’ Loading
The process of building/configuring the
executable, placing it in memory, and running
it.
Linking and Loading
Linker
•Searches libraries
•Reads object files
•Relocates code/data
•Resolves external references
•Creates the executable file
Linking and Loading
Loader
•Creates address spaces for text & data
•Copies text & data in memory
•Initializes stack and copies args
•Initializes regs (maybe)
•Initializes other things (OS)
•Jumps to startup routine
– And then to address of “main:”
Linking and Loading
Object file
Section:           Description:
Header             Start/size of other parts
Text               Machine Language
Data               Static data – size and initial values
Relocation info    Instructions and data with absolute addresses
Symbol table       Addresses of external labels
Debugging info     Break points
Linking and Loading
•The data section starts at 0x0040 0000 for the MIPS processor.
•If the source code has,
      .data
a1:   .word 15
a2:   .word -2
then the assembler specifies initial configuration memory as
address:      contents:
0x00400000    0000 0000 0000 0000 0000 0000 0000 1111
0x00400004    1111 1111 1111 1111 1111 1111 1111 1110
•Like the data, the code needs to be placed starting at a specific
location to make it work
Linking and Loading
Consider the case where the assembly language code is
split across 2 files. Each is assembled separately.
File 1:
      .data
a1:   .word 15
a2:   .word -2
      .text
main: la $t0, a1
      add $t1, $t0, $s3
      jal proc5
      done

File 2:
      .data
a3:   .word 0
      .text
proc5: lw $t6, a1
      sub $t2, $t0, $s4
      jr $ra
Linking and Loading
What happens to…
• a1
• a3
• main
• proc5
• lw
• la
• jal
Linking and Loading
Problem: there are absolute addresses in the machine
code.
Solutions:
1. Only allow a single source file
• Why not?
2. Allow linking and loading to
• Relocate pieces of data and code sections
• Finish the machine code where symbols were left
undefined
• Basically makes absolute address a relative
address
Linking and Loading
The assembler will:
•Start both data and code sections at address
0, for all files.
•Keep track of the size of every data and code
section.
•Keep track of all absolute addresses within
the file.
Linking and Loading
Linking and loading will:
• Assign starting addresses for all data and
code sections, based on their sizes.
• The blocks of data and code go at non-overlapping locations.
• Fix all absolute addresses in the code
• Place the linked code and data in memory
at the location assigned
• Start it up
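The address-assignment step can be sketched for the two-file example. Each module's labels are recorded as offsets from 0, and the linker adds the start address it picks for that module's section. The base addresses are the ones used earlier in these slides; the function name, the module dictionary format, and the text section sizes (24 and 20 bytes, from counting the synthesized instructions) are assumptions for illustration.

```python
def assign_addresses(modules, data_base, text_base):
    """Place modules back to back and return absolute symbol addresses."""
    symbols = {}
    next_addr = {"data": data_base, "text": text_base}
    for module in modules:
        for name, (section, offset) in module["symbols"].items():
            symbols[name] = next_addr[section] + offset   # relocate the label
        next_addr["data"] += module["data_size"]          # sections must not overlap
        next_addr["text"] += module["text_size"]
    return symbols

file1 = {"data_size": 8, "text_size": 24,
         "symbols": {"a1": ("data", 0), "a2": ("data", 4), "main": ("text", 0)}}
file2 = {"data_size": 4, "text_size": 20,
         "symbols": {"a3": ("data", 0), "proc5": ("text", 0)}}

for name, addr in assign_addresses([file1, file2], 0x00400000, 0x00800000).items():
    print(name, f"{addr:08x}")
```

With the final addresses known, the linker can go back and patch the instructions listed in each file's relocation info, such as the lui/ori pair for a1 and the jal to proc5.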
MIPS Example
Code levels of abstraction (from James Larus)
“C” code
#include <stdio.h>
int main (int argc, char *argv[])
{
    int I;
    int sum = 0;
    for (I = 0; I <= 100; I++) sum += I * I;
    printf ("The sum 0..100=%d\n", sum);
    return 0;
}
Compile this HLL into a machine’s assembly language with the
compiler.
MIPS Example
Converted into MAL…
      .data
str:  .asciiz "The sum 0..100=%d\n"
      .text
main: subu $sp, 32
      sw $31, 20($sp)
      sw $4, 32($sp)
      sw $0, 24($sp)
      sw $0, 28($sp)
loop: lw $14, 28($sp)
      mul $15, $14, $14
      lw $24, 24($sp)
      addu $25, $24, $15
      sw $8, 28($sp)
      ble $8, 100, loop
      la $4, str
      lw $5, 24($sp)
      jal printf
      move $2, $0
      lw $31, 20($sp)
      addu $sp, 32
      jr $31
MIPS Example
Now resolve the labels and convert to MIPS…
addiu $sp, $sp, -32
sw $ra, 20($sp)
sw $a0, 32($sp)
sw $a1, 36($sp)
sw $0, 24($sp)
sw $0, 28($sp)
lw $t6, 28($sp)
lw $t8, 24($sp)
multu $t6, $t6
addiu $t0, $t6, 1
slti $at, $t0, 101
sw $t0, 28($sp)
mflo $t7
addu $t9, $t8, $t7
bne $at, $0, -9
sw $t9, 24($sp)
lui $a0, 4096
lw $a1, 24($sp)
jal 1048812
addiu $a0, $a0, 1072
lw $ra, 20($sp)
addiu $sp, $sp, 32
jr $ra
Which the assembler then translates into binary machine code for instructions and data.
MIPS Example
Real MIPS Machine language
00100111101111011111111111100000
10101111101111110000000000010100
10101111101001000000000000100000
10101111101001010000000000100100
10101111101000000000000000011000
10101111101000000000000000011100
10001111101011100000000000011100
10001111101110000000000000011000
00000001110011100000000000011001
00100101110010000000000000000001
00101001000000010000000001100101
10101111101010000000000000011100
00000000000000000111100000010010
00000011000011111100100000100001
00010100001000001111111111110111
10101111101110010000000000011000
00111100000001000001000000000000
10001111101001010000000000011000
00001100000100000000000011101100
00100100100001000000010000110000
10001111101111110000000000010100
00100111101111010000000000100000
00000011111000000000000000001000
00000000000000000001000000100001
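Going the other way, the fields of any word in this listing can be pulled back out with shifts and masks. The sketch below is a field extractor only, not a disassembler; the function names are hypothetical, and which fields are meaningful depends on the instruction format selected by the opcode.

```python
def sign_extend_16(value):
    """Interpret a 16-bit field as a two's complement number."""
    return value - 0x10000 if value & 0x8000 else value

def decode_fields(bits):
    """Split a 32-character binary string into MIPS instruction fields."""
    word = int(bits, 2)
    return {
        "opcode": word >> 26,                          # all formats
        "rs": (word >> 21) & 0x1F,                     # I- and R-type
        "rt": (word >> 16) & 0x1F,                     # I- and R-type
        "immediate": sign_extend_16(word & 0xFFFF),    # I-type
        "target": word & 0x03FFFFFF,                   # J-type
    }

first = decode_fields("00100111101111011111111111100000")
print(first["opcode"], first["rs"], first["rt"], first["immediate"])
# 9 29 29 -32   (addiu $sp, $sp, -32)

jal = decode_fields("00001100000100000000000011101100")
print(jal["opcode"], jal["target"])
# 3 1048812     (jal 1048812)
```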