parsing assembler

Download Report

Transcript parsing assembler

The Assembly Language Level
Part B – The Assembly Process
Specifying numeric values
Java
int
int
int
MASM
i = 10;
j = 0x10;
k = 010;
System.out.println( i );
System.out.println( j );
System.out.println( k );
What appears?
i
j
k
el
m
n
C/C++
dword
dword
dword
=
=
=
10d
10h
10o
20D
20H
20O
int i = 10;
int j = 0x10;
int k = 010;
#define
el
#define
m
#define
n
20
0x20
020
Specifying numeric values
Java
int
int
int
MASM
i = 10;
j = 0x10;
k = 010;
System.out.println( i );
System.out.println( j );
System.out.println( k );
What appears?
10
16
8
i
j
k
el
m
n
C/C++
dword
dword
dword
=
=
=
10d
10h
10o
20D
20H
20O
int i = 10;
int j = 0x10;
int k = 010;
#define
el
#define
m
#define
n
20
0x20
020
(when a symbol is used before it is defined)
FORWARD REFERENCES
Two-pass assembler (translator)
class test {
public static void main ( String args[] ) {
System.out.println( "k=" + k );
f();
}
static int k = 72;
static void f ( ) {
System.out.println( "in f()" );
}
}
valid forward
references
Two-pass assembler (translator)
class test {
public static void main ( String args[] ) {
System.out.println( "k=" + k );
int k = 72;
}
}
invalid forward
reference
Two-pass assembler (translator)
• Forward reference
– symbol is used before it is defined
• Solutions:
1. (One-pass translators make only one pass and don’t
allow any forward references at all.)
2. Make two passes (read the source file twice).
•
•
The first pass collects symbol definitions and label locations.
The second pass uses the table built in the first pass.
3. Make one pass over the source and produce an
intermediate form.
•
A second pass is made over this intermediate data.
PASS ONE
Pass one
• Purpose: build the symbol table
– a table containing:
• the name of the symbol and its value
• or the name of the label and its location
– ILC = Instruction Location Counter
– Variable set to 0 and incremented by instruction (or data)
length for each line of code.
– Note: code or data may both be labeled.
Pass One: Two-Pass Assemblers
The instruction location counter (ILC) keeps track of the address where the
instructions will be loaded in memory.
In this example, the statements prior to MARIA occupy 100 bytes.
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
Pass one
• Symbol types:
1. Symbol and corresponding value
•
•
Assigned by a pseudoinstruction
Ex. bufsize equ 8192
2. Label and corresponding location
;CF flag
test ebx, CF
jz
cf0
print SADD(" CF:1")
jmp nxt
;carry set?
;jump if 0
;flag is on
;skip over else part
print SADD(" CF:0")
;flag is off
cf0:
nxt:
Pass one
• Employs 3 (or sometimes 4) tables:
1.
2.
3.
4.
Symbol table
Pseudoinstruction table
Opcode table
Literal table (optional)
Pass one
• Employs 3 tables:
1. Symbol table
•
•
•
•
•
Symbol name
Value (ILC for labels; defn/value for equ)
Length of data field (especially for strings)
Relocation bits (relocatable?)
Scope of symbol
2. Pseudoinstruction table
3. Opcode table
4. Literal table (optional)
Pass One: Two-Pass Assemblers
A symbol table for the program of Fig. 7-7.
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
Pass one
• Employs 3 tables:
1. Symbol table
2. Pseudoinstruction table
db
dw
dd
1
2
4
3. Opcode table
4. Literal table (optional)
Pass one
• Employs 3 tables:
1.
2.
3.
4.
Symbol table
Pseudoinstruction table
Opcode table
Literal table (optional)
Pass one: Two-Pass Assemblers
A few excerpts from the opcode table for a
Pentium 4 assembler.
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
Pass one
• Employs 3 tables:
1.
2.
3.
4.
Symbol table
Pseudoinstruction table
Opcode table
Literal table (optional)
•
for pseudoimmediate instructions
–
–
when there is not any support for immediate instructions
only loads from memory are allowed
Pass One (part 1)
Pass one of a simple assembler.
...
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
Pass One (part 2)
...
Pass one of a simple assembler.
(Of course, a line
might contain more
than one literal!)
...
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
Pass One (part 3)
...
Pass one of a simple assembler.
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
PASS TWO
Pass two
• Purpose:
– to generate object code
– to optionally generate assembly listing
• Reads in each line, 1 line at a time (from
original or intermediate code).
• Processes each line.
– writes out binary object code
MASM listing (.lst) file
00000000
= "<hit return>"
prompt
.data
equ
;insert variables below
"<hit return>"
;prompt string
…
00000000
00000000
00000000 B8 00000001
00000005 BB 00000002
0000000A B9 00000003
0000000F BA 00000004
00000014 BE 00000005
00000019 BF 00000006
0000001E E8 00000036
00000000
1
00000000 3C 68 69 74 20 1
72 65 74 75 72
6E 3E 00
00000000
1
00000000
1
00000023
1
0000003C C6 80 00000000 R
00
0000004D B8 00000000 R
00000059
main
??0019
??001A
1
main
.code
;insert executable instructions below
PROC
;program execution begins here
mov
eax, 1
;set regs values
mov
ebx, 2
mov
ecx, 3
mov
edx, 4
mov
esi, 5
mov
edi, 6
call
dump
;show contents of regs
.data
db prompt, 0
.data?
db 132 dup (?)
.code
mov BYTE PTR [??001A+eax], 0
mov
exit
ENDP
eax, input(prompt) ;prompt the user
;end of program
00000000
00000000
00000000
00000005
0000000A
0000000F
00000014
00000019
main
B8
BB
B9
BA
BE
BF
00000001
00000002
00000003
00000004
00000005
00000006
.code
PROC
mov
mov
mov
mov
mov
mov
;insert executable instructions below
;program execution begins here
eax, 1
;set regs values
ebx, 2
ecx, 3
edx, 4
esi, 5
edi, 6
00000000
00000000
00000000
00000005
0000000A
0000000F
00000014
00000019
main
B8
BB
B9
BA
BE
BF
00000001
00000002
00000003
00000004
00000005
00000006
.code
PROC
mov
mov
mov
mov
mov
mov
;insert executable instructions below
;program execution begins here
eax, 1
;set regs values
ebx, 2
ecx, 3
edx, 4
esi, 5
edi, 6
00000000
= "<hit return>"
00000000
1
00000000 3C 68 69 74 20 1
72 65 74 75 72
6E 3E 00
prompt
.data
equ
;insert variables below
"<hit return>"
;prompt string
??0019
.data
db prompt, 0
ASCII & MASM listing file
Pass two
Possible errors:
1.
2.
3.
4.
5.
6.
7.
Symbol used but not defined
Multiply defined symbol
Unrecognized opcode
Too few/many operands
Bad (binary/octal/decimal/hex) #
Illegal register use (ex. Branch to reg)
END missing
Pass Two (part 1)
Pass two of a simple assembler.
(Questionable
limitations (16
bytes).)
Read one
entry from the
intermediate
file.
...
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
...
Pass Two (part 2)
Pass two of a simple assembler.
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
SYMBOL TABLE ORGANIZATION
Symbol table
• Stores <name,value> pairs.
• How do we find the value associated with a
particular symbol?
• If we store the symbol table in an unordered
array, then linear search requires:
– best case:
– worst case:
– average:
– O(n)
1
n
n/2 search effort on average
Symbol table
• Stores <name,value> pairs.
• How do we find the value associated with a
particular symbol?
• If we store the symbol table in a sorted array,
then binary search requires:
– best case:
– worst case:
– O(log2 n)
1
log2 (n)
Symbol table
• Stores <name,value> pairs.
• How do we find the value associated with a
particular symbol?
• If we store the symbol table in a hash table,
then search requires:
– best case = worst case = average case = O(1) iff we
have a perfect hash function
• H:S-->I
(H maps Strings onto the Ints)
Symbol table
• Hash function H:S-->I
– H maps Strings onto the Ints
– We store our symbols in a table of size k.
– Possible hash functions:


  si % k
 i 


  i * si %k
 i

  s % k
 i i
  i * s %k
i
 i

The Symbol Table (1)
Hash coding. (a) Symbols,
values, and the hash
codes derived from the
symbols.
We need a method to cope
with the situation when the
hash scheme is less than
perfect (i.e., when a collision
occurs).
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
The Symbol Table (1)
Hash coding. (a) Symbols,
values, and the hash
codes derived from the
symbols.
We need a method to cope
with the situation when the
hash scheme is less than
perfect (i.e., when a collision
occurs).
Chaining: One solution is to
Tanenbaum, Structured Computer
make a linked list of
symbols
Organization,
Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
that hash to the same
value.
The Symbol Table (2)
Hash coding. (b) Eight-entry hash table with
linked lists of symbols and values.
Tanenbaum, Structured Computer
Organization, Fifth Edition, (c) 2006
Pearson Education, Inc. All rights reserved.
Analysis of hashing with chaining
• If all symbols hash to the same position (i.e.,
to only one position), what happens?
• Load factor
– Given a hash table T with
• m slots that stores
• n elements,
• we define a load factor f for T as
f = n / m.
» Average number of elements in a chain.
Java and hash tables
• http://download.oracle.com/javase/6/docs/api
/java/util/Hashtable.html
This example creates a hashtable of names & numbers. It uses names as keys:
Hashtable<String, Integer> tbl = new Hashtable< String, Integer >();
tbl.put( "fred",
1 );
tbl.put( "ethel", 29 );
Note:
What’s going on with 1, 29, and
tbl.put( "barney", 3 );
3, and Integer?
To retrieve a number, use the following code:
Integer v = tbl.get( "ethel" );
if (v != null) { System.out.println( "ethel = " + v ); }
C++ and hash tables
See http://en.wikipedia.org/wiki/Hash_map_%28C%2B%2B%29
#include <hash_map>
struct eqstr {
bool operator() ( const char* s1, const char* s2 ) const {
return strcmp(s1,s2) == 0; //true when eq
}
};
…
hash_map< const char*, int, hash<const char*>, eqstr > tbl;
…
tbl["fred"] = 1;
Or one may use
tbl["ethel"] = 29;
iterator find ( const key_type& k )
tbl["barney"] = 3;
or
…
size_type count ( const key_type& k ) const
int v = tbl["ethel"];
to check.
Comparison (Java vs. C++)
import java.util.Hashtable;
Objects
only!
…
Hashtable<String, Integer>
tbl = new Hashtable< String, Integer >();
…
tbl.put( "fred",
1 );
tbl.put( "ethel", 29 );
tbl.put( "barney", 3 );
…
Integer v = tbl.get( "ethel" );
if (v != null) { System.out.println( "ethel = " + v ); }
#include <hash_map>
Why?
struct eqstr {
bool operator() ( const char* s1, const char* s2 )
const {
Objects or
return strcmp(s1,s2)==0;
primitive
}
types.
};
…
hash_map< const char*, int, hash<const char*>,
eqstr > tbl;
…
tbl["fred"] = 1;
tbl["ethel"] = 29;
Nicer syntax.
tbl["barney"] = 3;
…
int v = tbl["ethel"];
//can/should use find or count first to check for
// existence