Assembler Design
Download
Report
Transcript Assembler Design
CPE 471
Assemblers
1
Assembly Instructions
Four parts to an instruction:
We are using a flexible format
Label(s)
Operation
Operands
Comments
Instruction is still on one line (except label)
Spaces and tabs can appear between “tokens”
Alternative: fixed format
Label in column 1; Op in column 16; Operands in col 24, etc.
CPE 471
2
Assembly Instructions
Label definition:
Symbolic name for instructions address
Clarifies branching to an address and data's address
Often severely restricted in length
Starts anywhere before operation
Contains letters (a-zA-Z), digits, ‘$’, ‘_’, ‘.’
CPE 471
3
Assembly Instructions
Operation field
Operand field
Mnemonic for an instruction (ADD, BRA) or
Mnemonic for a pseudo-instruction (DC, EQU)
Addresses and registers used by instruction
In other words, what to add, where to branch
Comment field: no affect on assembler
Used for documentation.
CPE 471
4
What's an Assembler
Translation:
Source program translated into target
Source and target define levels
Source is not directly executed (what is that?)
Target or object file is executed or translated later
Assembly language means the source is a
symbolic representation of machine language
When the source is higher level the translator is
usually called a compiler
CPE 471
5
Advantages of Assembly
Compared to machine code
Easier to remember (HLT vs 26577) (symbolic)
Similarly for addresses in program (symbolic)
BGT loop1 vs
BGT 6210
Over high-level language
Access to full capabilities of the machine
testing overflow flag, test & set (atomic), registers
Speed
CPE 471
6
Advantages of Assembly
Often a holdover from when machines were
expensive and people where cheap
Systems programming often done in a language like C
Old myth: "if a program will be used a lot, it should
(for efficiency) be written in assembly"
Hard to write (10 lines/day independent of language)
Hard to read: high maintenance, high turnover
Reality: good compilers, fast machines
CPE 471
7
Modern Approach
Write in high level language
Analyze to find where time spent
Invariably, it's a tiny part of the code
Improve that tiny part (perhaps with assembly)
Problem oriented language allows high level insights
Algorithmic insights save tremendously
Assembly programmer immersed in bit-twiddling
(penny wise, pound foolish)
CPE 471
8
Types of Assemblers
Assemblers can be classified according to the number
of passes (scanning the source file) to:
One-pass assembler: some restrictions are imposed on the
forward referencing such as eliminating forward references
to data.
Two-pass assembler: the only restriction on forward
referencing is in the symbol definition, i.e., all assembler
directives (e.g. EQU and ORG) that define symbols can only
use symbols that are previously defined.
Multi-pass assembler, restrictions are made on the level of
nesting in forward referencing.
CPE 471
9
Assembler Tasks
Parse assembly instructions
Assigns addresses to instructions
Check for syntax
Tokenize string
Maintains location counter
LC = eventual location in memory of this instruction
Generate machine code
Evaluation mnemonic instructions (replace w/opcode)
Evaluate operand sub-field (replace symbols with value)
CPE 471
10
Assembler Tasks
Process pseudo-ops
Concatenate to form instructions
generate header record
evaluate DC, DS, … etc
Write output object file
Nothing here seems all that hard!
CPE 471
11
Example: First Attempt
Read each input line and generate machine code
Associate symbol with location counter
Lookup mnemonic and get opcode
Generate instruction
Example:
CPE 471
12
Example
Test
A
Begin
N
ANS
ORIG
EQU
LD
LD
ST
HLT
DC
DS
END
$0100
16
R0,N
R1, #A
R0, ANS
13
1
Begin
CPE 471
13
Data Structures
Location counter (LocCounter)
Search a table (OpTable) for mnemonic
Get opcode
Prepare to handle arguments (group like
instructions together?)
Translate arguments
Lookup/add symbol names (SymTable) and
replace with location
Watch out for relative offsets
CPE 471
14
Location Counter
Eventual address of instruction
Initialized with 0
Increment with each instruction (see OpTable).
Always two for our machine
CPE 471
15
2-pass Assembler
Solution: 2-pass assembler
Pass #1: identify and define all labels
As location of each label is determined, save in SymTable
Requires some knowledge of instructions
Pass #2: generate object code
Whenever a label is used, replace with value saved in
table
CPE 471
16
Pass I:
Determine length of machine instructions.
Keep track of Location Counter (LC).
Generate a table of symbols for pass 2.
Process some pseudo operations.
Remember literals.
CPE 471
17
Pass II
Look up value of symbols.
Generate machine code for instructions.
Generate data for DS, DC, and literals.
Process pseudo operations.
CPE 471
18
Pass 1 & 2 Communication
Scan text file twice
Save symbol locations first pass, then plug in 2nd
Simpler
Disadvantage: slow
6,800 instructions medium size file
52,000 instructions large file
CPE 471
19
Big Picture: Labs 1 & 2
Assembly
File
Assembler
Object
File
Linker
Lab 2
Disassembler
Lab 1
Lab 3
Executable
File
CPE 471
20
Table Driven Software
Many times software is very repetitive
Many times the information processes is
repetitive
Use functions!
Use loops
Many times the information to write the code is
repetitive but static:
Use tables
CPE 471
21
Table Driven Software
Easier to modify: add new entries, change
existing ones, well centralized
Code is easier, eventually, to understand
Works if there are not many exceptions
CPE 471
22
Machine OP Table
Mnemonic
Name
Opcode
Instruction
Size
Instruction
Format
This table is static (unchanging)
For our machine:
All opcodes are 6 bits
Instructions size is one or two words.
Formats do differ
CPE 471
23
Machine Op Table
Variable length instructions:
"Branch relative": PC <- PC + operand
Near: operand is 9 bits, far operand is 16 bits.
Varying formats
Fixed format makes parsing simple
CPE 471
24
Machine Op Table
Fields might include:
Name:
Type:
opcode:
Size:
“add”
ALU
000000
2 or 4
One entry for each instruction
CPE 471
25
Symbol Table
Pass 1:
Each symbol is defined. Every time a new symbol
seen (i.E., In a label) put it in the symbol table
If already there, error
CPE 471
26
Symbol Table (Pass #1)
Each symbol gives a value
Explicit: A
EQU 16
Easy: just put operand in table
Implicit: N DC 20
Must know address of instruction
Therefore, keep track of addresses as program is
scanned
Use location counter (LC)
CPE 471
27
Symbol Table (Pass #2)
Symbols in operand replaced with value
Look up symbol in symbol table
If there, replace with value
Else, ERROR
CPE 471
28
Literals
Implicit allocation & initialization of memory
Allows us to put the value itself right in the instruction
Prefice with “=“
Example:
LD D3,=#16
This means:
allocate storage at end of program
initialize this storage with the value #16
use this address in the instruction
CPE 471
29
Literal Table
Pass #1
To complete pass #1:
literals are identified and placed in the table
“name”, “value”, and “size” fields updated
duplicates can be eliminated
literals are added to the end of the program
“address” field can now be calculated
Pass #2:
literals in instructions are replaced with the
__________ field from the Literal Table
what if the literal is not in the table?
CPE 471
30
Pseudo Operations
Unlike operations, typically do not have
machine instruction (opcode) equivalents
Give information to the assembler itself.
Another term: assembler directive
Not intended to implement higher level control
structures (if-then, loops)
Uses: segment definition, symbol definition,
memory initialization, storage allocation
Low level bookeeping easily done by machine
CPE 471
31
Psuedo-op Table
Also a static table
Some lengths are 0, some 1, others?
CPE 471
32
Object File
Header record: contains information on program
length, symbols that other modules may be looking
for, and the name of this module.
Format:
1
H for header
2-5 Program length in HEX + a space
6-80 List of entry names and locations
Entry name (followed by a space).
Entry location in HEX 4 columns + a space
The first entry should be the first ORG.
CPE 471
33
Object File II
Text (type T): contains the hex equivalent of the
machine instruction.
Format:
1
T for text
2-5
Address in HEX where this text should be placed in
memory.
6-80 Instructions and/or data in HEX.
Each word (byte) should be followed by an allocation
character to determine the type:
S (Absolute – Does not need modification), R (Relative:
needs relocation), and X (external).
CPE 471
34
Object File III
End (type E): indicates the end of the module
and where the execution would begin.
Format:
Column
1
2-5
E for END
Execution start address in HEX
CPE 471
35
Information Flow Pass 1/2
CPE 471
36
Two Pass Assembler:
Limitations
Q: Does our 2-pass approach solve all forwardreference problems?
A: no! Something is still broken…
CPE 471
37
Forward Reference Restriction
Consider:
Y EQU 0
X EQU Y
To avoid this trouble, impose a restriction:
X EQU Y
Y EQU 0
-------------
What about DS? Is a forward symbol allowed
as the operand?
CPE 471
38
Containers
What does this mean
We can't just generate a machine instruction from
each assembly instruction -- must save info
We need to start using some data structures!
Table means any data structure or container
CPE 471
39
Absolute Programs
Programmer decides a priori where program will
reside
e.g., Prog ORIG $3176
But memory is shared
concurrent users, concurrent jobs/processes
several programs loaded simultaneously
CPE 471
40
Absolute Programs: Limitation
CPE 471
41
Absolute Programs: Limitation II
Would like the loading to be flexible
decide at load time where it goes!
(not at ____________ time)
this decision is made by
__________________
What the programmer wants:
“find a free slot in memory that is big enough to fit
this program”
CPE 471
42
Motivating Relocation -Example
Prog
X
Start
Y
ORIG
0
DC.W Y
LD.L
D1,X
ST.L
D1,Y
HLT
DS.W
1
END Start
•In Memory, this program
appears as:
0000
0002
0004
0006
0008
000A
000C
CPE 471
43
Motivating Relocation -Example
* One slight change
Prog
ORIG
$0100
X
DC.W Y
Start
LD.L
R1,X
ST.L
R1,Y
HLT
Y
DS.L
1
END Start
•In Memory, this program
appears as:
0100
0102
0104
0106
0108
010A
010C
CPE 471
44
Relocation
The loader must update some (parts of) text
records, but not all
after load address has been determined
The assembler does 2 things:
assemble with a load address
tell the loader which parts will need to be updated
CPE 471
45
Modification Records
One approach: define a new record type that identifies
the address to be modified
We could add the following record to the object file:
M address e.g. M 0004
Also need to indicate size of quantity being relocated:
Two sizes: 9 bit pgoffset and 16 bit full word
M0000_16
M0001_9
M0002_9
One disadvantage of this approach:
-
CPE 471
46
Alternative: Bit Masks
Use 1 bit / memory cell (or byte)
Size of relocation data independent of number
of records needing modification
bit value is 0 means no relocation necessary
bit value is 1 means relocation necessary
but more densely packed (1 bit / text record)
Hard to read (debug, grade,…)
CPE 471
47
Kinds of Data
Our machine has two flavors of data:
relative (to the load address)
absolute
• The first must be modified, the second not
• Let’s look at how these kinds arise…
CPE 471
48
Symbols
Some are relative:
Some are absolute:
e.g.,
e.g.,
Symbol Table
CPE 471
49
Searching
We must associate a name with a value.
Example: A symbol table is a collection of <key,
value> pairs:
Search is given a key, return the corresponding
value.
Very important for assemblers. Every line has an
instruction (look up in MOT or POT). Lots of
symbols (or literals) used. 50% of time
searching tables.
CPE 471
50
Searching
Cover several methods:
Linear
Binary
Hash (best for assembler)
For each method, lots of analysis (i.E. Theory
and mathematics) about how long they take and
why
CPE 471
51
Linear Search: Performance
For a table of length N, how long to insert?
Constant time: O(1) (eg. fast)
Why? Insert at beginning or end
How long to find?
Best case: 1 (i.e. At first location)
Worst case: O(N) (i.e. At last location)
average case: (N+1)/2 = O(N). How?
CPE 471
52
Linear Search: Performance
One way to lower average case is to change the
probability: put most likely matches at beginning
All this analysis assumes name is in the table
If not present, time is N. Average case should
consider this too
What does this mean? Time to find one item is
linear (double size, twice as long). N
CPE 471
53
Binary Search
This algorithm requires:
Sorted table
There is a "<" operator
Constant time access to arbitrary keys
Note: each iteration divides the problem in half
How long? Best case: 1. Worst case lg N
CPE 471
54
Compromise
In the assembler, we do all insertions then the
bulk of the searching
So insert as a linear table: O(1)
Then sort: O(N lg N)
Then binary search the sorted table
When do we detect duplicates
Insertion, or
After sort
CPE 471
55
Hash Tables
Combines advantage of binary search (fast
retrieval) and advantage of linear search (fast
insertion)
Basic idea: we want to use the key as an index
in an array so it only takes one step to find
something: table["key"]
Problem: strings (or any other data type) are not
integers. If we gave strings an ordinal value,
way too big 128^8 =72 quadrillion
CPE 471
56
Hash Tables
Solution: Use a function that transforms our data into
an integer. This is called the hash function, h:
h: K --> Z(m)
where Z(n) = {1,2,... m};
m = table size;
K = set of possible keys
An ideal function generates a unique integer for each
key (problem: 72Quadrillion to 4billion integers)
CPE 471
57
Hash Table
To insert <key, value>, we want to do:
To find key:
table[h (key)] = value
return table[h (key)];
Since |K| >> m what we look for is a hash
function h that returns a uniform distribution
CPE 471
58
Hash Functions
We want to design h so that small differences in
key makes a big difference in h(key).
H is like a deterministic random number
generator.
Why?
Digital signatures (md5, pgp) sometimes called hash
function
Need to find the key after insertion
CPE 471
59
Hash Function Example
h(key) = (sum of letter values) mod 50 + 1
h("magenta") = (13+1+7+5+14+20+1) mod 50 + 1
= (61 mod 50) + 1 = 12
But two different keys can get mapped to the same
value:
h("rub") = 18+21+2 = 41;
h("dream") = 4+18+5+1+13 = 41
This is called a collision.
CPE 471
60
Dealing With Collisions
Go to next open cell. (X2,X3,X4 hash to same value)
X1
X2X3X4
Problem: leads to clustering:
Rehash with another, different function:
Cost: computation, some clustering
CPE 471
61
Dealing With Collisions
Keep a linked list
X1
X2
X3
X4
CPE 471
62
Hash Tables: Complexity
How long does insertion take (w/ chaining)
Number of attempts depends on how full the
table is - 1st entry never collides
Assume there are n elements in table (size is m)
Average time for an unsuccessful search:
Average time to search to end of a list
Average list length: r = n/m
Including hash and indexing: O(1 + r)
CPE 471
63
Expressions
Most assemblers permit use of expressions
Used as instruction operands
in machine op’s and pseudo op’s
Typically simple mathematical forms only
Two parts:
operators (+, -, *, /)
individual terms
CPE 471
64
• Individual terms may be:
constants (e.g., #4, ‘A’, $3F)
user-defined symbols (e.g., X, Buff)
special terms (e.g., * for LC)
Examples
Buff
DS
4
Buff
DS
4
LD D2, Buff
Bend EQU *
LD D2, Buff+1
Len
CPE 471
EQU Bend - Buff
65
Relocation
Expressions are evaluated at ____________
Expressions can be relative or absolute
Intuition:
(not entirely true… as we’ll see later)
the value of an absolute expression _________
__________ with program relocation
What are the “rules” for well-formed
expressions?
CPE 471
66
Absolute Expressions
An expression is absolute iff:
1. contains only absolute terms, OR
2. contains relative terms provided:
i) they occur in pairs, AND
ii) terms in each pair have opposite sign, AND
iii) relative terms do not enter in * or /
Examples of 1: 3+4+x2
2*X where:
Examples of 2: Buff2-Buf
CPE 471
67
Relative Expressions
An expression is relative iff:
1. all relative terms can be paired as above,
except one, AND
2. that remaining unpaired term is positive,
AND
3. relative terms do not enter into * or /
Examples:
Buff+6
CPE 471
68