Assembler Design

Download Report

Transcript Assembler Design

CPE 471
Assemblers
1
Assembly Instructions

Four parts to an instruction:





We are using a flexible format



Label(s)
Operation
Operands
Comments
Instruction is still on one line (except label)
Spaces and tabs can appear between “tokens”
Alternative: fixed format

Label in column 1; Op in column 16; Operands in col 24, etc.
CPE 471
2
Assembly Instructions

Label definition:

Symbolic name for instructions address

Clarifies branching to an address and data's address

Often severely restricted in length

Starts anywhere before operation

Contains letters (a-zA-Z), digits, ‘$’, ‘_’, ‘.’
CPE 471
3
Assembly Instructions

Operation field



Operand field



Mnemonic for an instruction (ADD, BRA) or
Mnemonic for a pseudo-instruction (DC, EQU)
Addresses and registers used by instruction
In other words, what to add, where to branch
Comment field: no affect on assembler

Used for documentation.
CPE 471
4
What's an Assembler

Translation:



Source program translated into target
Source and target define levels
Source is not directly executed (what is that?)



Target or object file is executed or translated later
Assembly language means the source is a
symbolic representation of machine language
When the source is higher level the translator is
usually called a compiler
CPE 471
5
Advantages of Assembly

Compared to machine code


Easier to remember (HLT vs 26577) (symbolic)
Similarly for addresses in program (symbolic)



BGT loop1 vs
BGT 6210
Over high-level language

Access to full capabilities of the machine


testing overflow flag, test & set (atomic), registers
Speed
CPE 471
6
Advantages of Assembly






Often a holdover from when machines were
expensive and people where cheap
Systems programming often done in a language like C
Old myth: "if a program will be used a lot, it should
(for efficiency) be written in assembly"
Hard to write (10 lines/day independent of language)
Hard to read: high maintenance, high turnover
Reality: good compilers, fast machines
CPE 471
7
Modern Approach


Write in high level language

Analyze to find where time spent

Invariably, it's a tiny part of the code

Improve that tiny part (perhaps with assembly)

Problem oriented language allows high level insights

Algorithmic insights save tremendously
Assembly programmer immersed in bit-twiddling
(penny wise, pound foolish)
CPE 471
8
Types of Assemblers

Assemblers can be classified according to the number
of passes (scanning the source file) to:



One-pass assembler: some restrictions are imposed on the
forward referencing such as eliminating forward references
to data.
Two-pass assembler: the only restriction on forward
referencing is in the symbol definition, i.e., all assembler
directives (e.g. EQU and ORG) that define symbols can only
use symbols that are previously defined.
Multi-pass assembler, restrictions are made on the level of
nesting in forward referencing.
CPE 471
9
Assembler Tasks

Parse assembly instructions



Assigns addresses to instructions



Check for syntax
Tokenize string
Maintains location counter
LC = eventual location in memory of this instruction
Generate machine code


Evaluation mnemonic instructions (replace w/opcode)
Evaluate operand sub-field (replace symbols with value)
CPE 471
10
Assembler Tasks


Process pseudo-ops




Concatenate to form instructions
generate header record
evaluate DC, DS, … etc
Write output object file
Nothing here seems all that hard!
CPE 471
11
Example: First Attempt

Read each input line and generate machine code




Associate symbol with location counter
Lookup mnemonic and get opcode
Generate instruction
Example:
CPE 471
12
Example
Test
A
Begin
N
ANS
ORIG
EQU
LD
LD
ST
HLT
DC
DS
END
$0100
16
R0,N
R1, #A
R0, ANS
13
1
Begin
CPE 471
13
Data Structures


Location counter (LocCounter)
Search a table (OpTable) for mnemonic




Get opcode
Prepare to handle arguments (group like
instructions together?)
Translate arguments
Lookup/add symbol names (SymTable) and
replace with location

Watch out for relative offsets
CPE 471
14
Location Counter

Eventual address of instruction

Initialized with 0

Increment with each instruction (see OpTable).
Always two for our machine
CPE 471
15
2-pass Assembler

Solution: 2-pass assembler

Pass #1: identify and define all labels



As location of each label is determined, save in SymTable
Requires some knowledge of instructions
Pass #2: generate object code

Whenever a label is used, replace with value saved in
table
CPE 471
16
Pass I:





Determine length of machine instructions.
Keep track of Location Counter (LC).
Generate a table of symbols for pass 2.
Process some pseudo operations.
Remember literals.
CPE 471
17
Pass II




Look up value of symbols.
Generate machine code for instructions.
Generate data for DS, DC, and literals.
Process pseudo operations.
CPE 471
18
Pass 1 & 2 Communication

Scan text file twice

Save symbol locations first pass, then plug in 2nd

Simpler

Disadvantage: slow

6,800 instructions medium size file

52,000 instructions large file
CPE 471
19
Big Picture: Labs 1 & 2
Assembly
File
Assembler
Object
File
Linker
Lab 2
Disassembler
Lab 1
Lab 3
Executable
File
CPE 471
20
Table Driven Software

Many times software is very repetitive


Many times the information processes is
repetitive


Use functions!
Use loops
Many times the information to write the code is
repetitive but static:

Use tables
CPE 471
21
Table Driven Software



Easier to modify: add new entries, change
existing ones, well centralized
Code is easier, eventually, to understand
Works if there are not many exceptions
CPE 471
22
Machine OP Table
Mnemonic
Name


Opcode
Instruction
Size
Instruction
Format
This table is static (unchanging)
For our machine:



All opcodes are 6 bits
Instructions size is one or two words.
Formats do differ
CPE 471
23
Machine Op Table

Variable length instructions:



"Branch relative": PC <- PC + operand
Near: operand is 9 bits, far operand is 16 bits.
Varying formats

Fixed format makes parsing simple
CPE 471
24
Machine Op Table

Fields might include:





Name:
Type:
opcode:
Size:
“add”
ALU
000000
2 or 4
One entry for each instruction
CPE 471
25
Symbol Table

Pass 1:


Each symbol is defined. Every time a new symbol
seen (i.E., In a label) put it in the symbol table
If already there, error
CPE 471
26
Symbol Table (Pass #1)

Each symbol gives a value



Explicit: A
EQU 16
Easy: just put operand in table
Implicit: N DC 20



Must know address of instruction
Therefore, keep track of addresses as program is
scanned
Use location counter (LC)
CPE 471
27
Symbol Table (Pass #2)

Symbols in operand replaced with value



Look up symbol in symbol table
If there, replace with value
Else, ERROR
CPE 471
28
Literals

Implicit allocation & initialization of memory

Allows us to put the value itself right in the instruction




Prefice with “=“
Example:
LD D3,=#16
This means:



allocate storage at end of program
initialize this storage with the value #16
use this address in the instruction
CPE 471
29
Literal Table

Pass #1




To complete pass #1:



literals are identified and placed in the table
“name”, “value”, and “size” fields updated
duplicates can be eliminated
literals are added to the end of the program
“address” field can now be calculated
Pass #2:



literals in instructions are replaced with the
__________ field from the Literal Table
what if the literal is not in the table?
CPE 471
30
Pseudo Operations




Unlike operations, typically do not have
machine instruction (opcode) equivalents
Give information to the assembler itself.
Another term: assembler directive
Not intended to implement higher level control
structures (if-then, loops)
Uses: segment definition, symbol definition,
memory initialization, storage allocation

Low level bookeeping easily done by machine
CPE 471
31
Psuedo-op Table

Also a static table

Some lengths are 0, some 1, others?
CPE 471
32
Object File

Header record: contains information on program
length, symbols that other modules may be looking
for, and the name of this module.

Format:
1
H for header
2-5 Program length in HEX + a space
6-80 List of entry names and locations
Entry name (followed by a space).
Entry location in HEX 4 columns + a space

The first entry should be the first ORG.
CPE 471
33
Object File II

Text (type T): contains the hex equivalent of the
machine instruction.

Format:
1
T for text
2-5
Address in HEX where this text should be placed in
memory.
6-80 Instructions and/or data in HEX.
 Each word (byte) should be followed by an allocation
character to determine the type:
 S (Absolute – Does not need modification), R (Relative:
needs relocation), and X (external).
CPE 471
34
Object File III

End (type E): indicates the end of the module
and where the execution would begin.


Format:
Column
1
2-5
E for END
Execution start address in HEX
CPE 471
35
Information Flow Pass 1/2
CPE 471
36
Two Pass Assembler:
Limitations


Q: Does our 2-pass approach solve all forwardreference problems?
A: no! Something is still broken…

CPE 471
37
Forward Reference Restriction
Consider:
Y EQU 0
X EQU Y

To avoid this trouble, impose a restriction:


X EQU Y
Y EQU 0
-------------
What about DS? Is a forward symbol allowed
as the operand?
CPE 471
38
Containers




What does this mean
We can't just generate a machine instruction from
each assembly instruction -- must save info
We need to start using some data structures!
Table means any data structure or container
CPE 471
39
Absolute Programs

Programmer decides a priori where program will
reside


e.g., Prog ORIG $3176
But memory is shared


concurrent users, concurrent jobs/processes
several programs loaded simultaneously
CPE 471
40
Absolute Programs: Limitation
CPE 471
41
Absolute Programs: Limitation II

Would like the loading to be flexible




decide at load time where it goes!
(not at ____________ time)
this decision is made by
__________________
What the programmer wants:
“find a free slot in memory that is big enough to fit
this program”
CPE 471
42
Motivating Relocation -Example
Prog
X
Start
Y
ORIG
0
DC.W Y
LD.L
D1,X
ST.L
D1,Y
HLT
DS.W
1
END Start
•In Memory, this program
appears as:
0000
0002
0004
0006
0008
000A
000C
CPE 471
43
Motivating Relocation -Example
* One slight change
Prog
ORIG
$0100
X
DC.W Y
Start
LD.L
R1,X
ST.L
R1,Y
HLT
Y
DS.L
1
END Start
•In Memory, this program
appears as:
0100
0102
0104
0106
0108
010A
010C
CPE 471
44
Relocation

The loader must update some (parts of) text
records, but not all


after load address has been determined
The assembler does 2 things:


assemble with a load address
tell the loader which parts will need to be updated
CPE 471
45
Modification Records



One approach: define a new record type that identifies
the address to be modified
We could add the following record to the object file:
M address e.g. M 0004
Also need to indicate size of quantity being relocated:

Two sizes: 9 bit pgoffset and 16 bit full word





M0000_16
M0001_9
M0002_9
One disadvantage of this approach:
-
CPE 471
46
Alternative: Bit Masks

Use 1 bit / memory cell (or byte)



Size of relocation data independent of number
of records needing modification


bit value is 0 means no relocation necessary
bit value is 1 means relocation necessary
but more densely packed (1 bit / text record)
Hard to read (debug, grade,…)
CPE 471
47
Kinds of Data

Our machine has two flavors of data:




relative (to the load address)
absolute
• The first must be modified, the second not
• Let’s look at how these kinds arise…
CPE 471
48
Symbols

Some are relative:


Some are absolute:


e.g.,
e.g.,
Symbol Table
CPE 471
49
Searching




We must associate a name with a value.
Example: A symbol table is a collection of <key,
value> pairs:
Search is given a key, return the corresponding
value.
Very important for assemblers. Every line has an
instruction (look up in MOT or POT). Lots of
symbols (or literals) used. 50% of time
searching tables.
CPE 471
50
Searching


Cover several methods:

Linear

Binary

Hash (best for assembler)
For each method, lots of analysis (i.E. Theory
and mathematics) about how long they take and
why
CPE 471
51
Linear Search: Performance

For a table of length N, how long to insert?

Constant time: O(1) (eg. fast)


Why? Insert at beginning or end
How long to find?



Best case: 1 (i.e. At first location)
Worst case: O(N) (i.e. At last location)
average case: (N+1)/2 = O(N). How?
CPE 471
52
Linear Search: Performance




One way to lower average case is to change the
probability: put most likely matches at beginning
All this analysis assumes name is in the table
If not present, time is N. Average case should
consider this too
What does this mean? Time to find one item is
linear (double size, twice as long). N
CPE 471
53
Binary Search

This algorithm requires:

Sorted table

There is a "<" operator

Constant time access to arbitrary keys

Note: each iteration divides the problem in half

How long? Best case: 1. Worst case lg N
CPE 471
54
Compromise


In the assembler, we do all insertions then the
bulk of the searching

So insert as a linear table: O(1)

Then sort: O(N lg N)

Then binary search the sorted table
When do we detect duplicates

Insertion, or

After sort
CPE 471
55
Hash Tables



Combines advantage of binary search (fast
retrieval) and advantage of linear search (fast
insertion)
Basic idea: we want to use the key as an index
in an array so it only takes one step to find
something: table["key"]
Problem: strings (or any other data type) are not
integers. If we gave strings an ordinal value,
way too big 128^8 =72 quadrillion
CPE 471
56
Hash Tables

Solution: Use a function that transforms our data into
an integer. This is called the hash function, h:
h: K --> Z(m)
where Z(n) = {1,2,... m};
m = table size;
K = set of possible keys

An ideal function generates a unique integer for each
key (problem: 72Quadrillion to 4billion integers)
CPE 471
57
Hash Table

To insert <key, value>, we want to do:


To find key:


table[h (key)] = value
return table[h (key)];
Since |K| >> m what we look for is a hash
function h that returns a uniform distribution
CPE 471
58
Hash Functions


We want to design h so that small differences in
key makes a big difference in h(key).
H is like a deterministic random number
generator.



Why?
Digital signatures (md5, pgp) sometimes called hash
function
Need to find the key after insertion
CPE 471
59
Hash Function Example



h(key) = (sum of letter values) mod 50 + 1
h("magenta") = (13+1+7+5+14+20+1) mod 50 + 1
= (61 mod 50) + 1 = 12
But two different keys can get mapped to the same
value:
h("rub") = 18+21+2 = 41;
h("dream") = 4+18+5+1+13 = 41

This is called a collision.
CPE 471
60
Dealing With Collisions

Go to next open cell. (X2,X3,X4 hash to same value)
X1
X2X3X4

Problem: leads to clustering:

Rehash with another, different function:

Cost: computation, some clustering
CPE 471
61
Dealing With Collisions

Keep a linked list
X1
X2
X3
X4
CPE 471
62
Hash Tables: Complexity


How long does insertion take (w/ chaining)
Number of attempts depends on how full the
table is - 1st entry never collides


Assume there are n elements in table (size is m)
Average time for an unsuccessful search:

Average time to search to end of a list

Average list length: r = n/m

Including hash and indexing: O(1 + r)
CPE 471
63
Expressions


Most assemblers permit use of expressions
Used as instruction operands



in machine op’s and pseudo op’s
Typically simple mathematical forms only
Two parts:


operators (+, -, *, /)
individual terms
CPE 471
64

• Individual terms may be:




constants (e.g., #4, ‘A’, $3F)
user-defined symbols (e.g., X, Buff)
special terms (e.g., * for LC)
Examples
Buff
DS
4
Buff
DS
4
LD D2, Buff
Bend EQU *
LD D2, Buff+1
Len
CPE 471
EQU Bend - Buff
65
Relocation

Expressions are evaluated at ____________



Expressions can be relative or absolute
Intuition:


(not entirely true… as we’ll see later)
the value of an absolute expression _________
__________ with program relocation
What are the “rules” for well-formed
expressions?
CPE 471
66
Absolute Expressions



An expression is absolute iff:
1. contains only absolute terms, OR
2. contains relative terms provided:
i) they occur in pairs, AND
ii) terms in each pair have opposite sign, AND
iii) relative terms do not enter in * or /
Examples of 1: 3+4+x2
2*X where:
Examples of 2: Buff2-Buf
CPE 471
67
Relative Expressions


An expression is relative iff:
1. all relative terms can be paired as above,
except one, AND
2. that remaining unpaired term is positive,
AND
3. relative terms do not enter into * or /
Examples:
Buff+6
CPE 471
68