An introduction to assembly language

Assemblers
The low level way to program
Copyright 2004-2005 Curt Hill
Brief History
• Programs on the ENIAC were put in
via switches
• Von Neumann machines store the
program and data in same memory
– Machine language
• Assemblers follow
– Then macro assemblers
• Finally compilers and interpreters
First Programs
• The first programs were written in
machine language - very tedious and
error prone
• We still see machine language in certain
processors
• Several compilers have an Inline feature
– Use short pieces of machine language within
a high level language
– Decimal numbers make up the language
• For the most part using machine language
is dead
Assemblers
• Assembly usually converts one line into
one machine language statement
• Assemblers have a number of advantages
• Much easier to use than machine
language
• Full access to the hardware of all sorts
• The best code is produced by human
assembly programmers
– Quickest and most compact
Assembler Disadvantages:
• Very machine specific
• Very little checking for reasonability
is done
• Data and instructions are in the
same pool
– A whole class of errors can be made in
assembly that are impossible
elsewhere
– In assembly we can execute data and
compute instructions
High Level Languages
• Next came compilers and high level
languages
• First compiler is likely FORTRAN in
1957
• First interpreter is likely LISP in
1959-1960
Notion of translation
• An assembler converts from one form to
another
– The basic item is the same
– Similar to oral and written communication
• Compilers are more like the translators at
the U.N.
– The basic form of the languages that they
hear and speak may be quite different:
FORTRAN and Assembly language are quite
different
– They hear one and speak the other
• Interpreters are like the butler
– He hears it then does it
Translation
• In translation we consider the source
language and the target language
– Most compilers take a high level language and
produce a form of machine language
– Source FORTRAN, target S360 machine language
• However, the target can be anything
– The first ICON compiler (a derivative of
SNOBOL) took ICON as source and generated
FORTRAN (later C) as the target
– This gave them a machine independence
usually lacking in new compilers
Pascal Example
• Part of the early popularity of Pascal
stems from the fact that there existed a
series of P-compilers (P1-P4)
– Each took Pascal as its source and generated
P-code as its target; the compiler itself was
written in Pascal
• The person who wanted to implement
Pascal on a new machine then wrote an
interpreter for the P-code or a translator
into the local machine language
• Much easier than doing it from scratch
• The Java Virtual Machine is similar
High level language
advantages:
• Easier to understand and use
– Closer to the way we think
– They make many things much easier to express
• They tend to be machine independent
– They remove the complications of using registers and
memory
• They usually conceptualize programs as data
and instructions which are mutually exclusive
• Studies show that a programmer produces the
same number of lines of code per hour,
regardless of language
• There is usually a loss of efficiency in terms of
time and space
– In our current climate this is disregarded
Compilers and assemblers
• Most assemblers and compilers
produce object code that is not
ready to run
• Two things are lacking:
– Relocation to the correct address (if
needed)
– Connection of external subroutines
• Object code is relocatable code
Linkers and loaders
• A linker will put together several
pieces of object code so that they
form one program and make sure
the links to each other are correct
– This is the executable
• A loader will load the item into its
actual memory location and relocate
the absolute addresses as it does so
– It then causes it to be executed
– Loaders are fundamental to an operating
system
The process of assembling
• Almost always consists of two
passes
– There may be more for macro
processing
– Some assemblers do it all in one pass with
the right code
• First pass computes locations
• Second generates code
S360 Assembly Format
• The format is:
[label] opcode ops comments
• The label is optional
– Names data and branch targets
• The opcode is a mnemonic
– L = Load
– A = Add
• The ops are operands
– Comma separated – no blanks
• Comments follow
• A * in column 1 makes line a comment
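As a minimal sketch of the field layout above, the following Python splits one source line into its four fields. The function name `parse_line` is an assumption made for illustration, and the rule that a label must begin in column 1 is the usual 360 convention rather than something stated on the slide. The operand field is kept as one string, since operands contain no blanks.

```python
def parse_line(line):
    """Split a source line into (label, opcode, operands, comment)."""
    line = line.rstrip()
    if line.startswith("*"):
        # A * in column 1 makes the whole line a comment.
        return (None, None, "", line[1:].strip())
    # Assumption: a label, when present, starts in column 1.
    has_label = bool(line) and not line[0].isspace()
    # Split off the leading fields; whatever remains is the comment.
    fields = line.split(None, 3 if has_label else 2)
    label = fields.pop(0) if has_label else None
    opcode = fields.pop(0) if fields else None
    operands = fields.pop(0) if fields else ""
    comment = fields.pop(0) if fields else ""
    return (label, opcode, operands, comment)


print(parse_line("LOOP     L     3,COUNT   load the counter"))
print(parse_line("         AR    3,4"))
```

Note that a statement with no operands but a comment would have the first word of its comment taken as the operand field, because the sketch only counts blank-delimited fields.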
Why two pass?
• Declaration before use is not
required
• An assembler instruction may use
data or location that has not yet
been seen
• So the first pass finds where and
what
• Second pass does the hard work
Pass 1 Processes each line
• Labels are processed and their
characteristics noted
– Location
– Size and type are sometimes recorded
• Operation codes are noted
– Mostly to compute current location
• Operands are usually only checked that
they match the operation code
– Validity of the operands is not considered yet,
since some of the symbols have not been seen
yet
• A symbol table is created
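The first pass described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of any real assembler: the `LENGTHS` table, the function name `pass1`, and the sample program are all hypothetical, every DC/DS is assumed to be a 4-byte fullword, and alignment is ignored.

```python
# Hypothetical table: bytes occupied by each statement type.
# RR instructions take 2 bytes, RX instructions 4; DC/DS assumed fullword.
LENGTHS = {"LR": 2, "AR": 2, "L": 4, "A": 4, "ST": 4, "DC": 4, "DS": 4}


def pass1(lines):
    """Build the symbol table: label -> location, starting at zero."""
    symbols = {}
    location = 0
    for line in lines:
        if not line.strip() or line.startswith("*"):
            continue  # blank lines and comments occupy no storage
        fields = line.split()
        if not line[0].isspace():
            # A label in column 1 names the current location.
            symbols[fields.pop(0)] = location
        # The opcode is looked at only to advance the location counter.
        location += LENGTHS[fields[0]]
    return symbols


SOURCE = [
    "* add two numbers",
    "BEGIN    L     3,X",
    "         A     3,Y",
    "         ST    3,Z",
    "X        DC    F'1'",
    "Y        DC    F'2'",
    "Z        DS    F",
]
print(pass1(SOURCE))
```

The operands X, Y and Z are used before they are declared, which is why their validity cannot be checked until the table is complete.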
The Symbol Table
• Contains one entry for each name
found in the source
• The main function is to record the
location
• The name may also be connected to
other attributes
– Type
– Length
Pass 2
• Generate the machine language for the
operation code
• This is a table look up
• Take the mnemonic
– Search a table for it
– The table contains the opcode
– The search is usually done in the first
pass to find length
– The generation is always done in the
second pass
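The table lookup can be sketched as follows. The opcode values are the standard System/360 ones for these five mnemonics; the table layout and the function names `instruction_length`, `opcode_byte` and `encode_rr` are assumptions made for illustration. The same table serves both passes: the first needs only the length, the second the opcode.

```python
# mnemonic -> (opcode byte, instruction length in bytes)
OPCODE_TABLE = {
    "LR": (0x18, 2),
    "AR": (0x1A, 2),
    "L": (0x58, 4),
    "A": (0x5A, 4),
    "ST": (0x50, 4),
}


def instruction_length(mnemonic):
    """What the first pass needs: how far to advance the location."""
    return OPCODE_TABLE[mnemonic][1]


def opcode_byte(mnemonic):
    """What the second pass needs: the machine opcode itself."""
    return OPCODE_TABLE[mnemonic][0]


def encode_rr(mnemonic, r1, r2):
    """Encode a register-to-register instruction: opcode, then R1 and R2."""
    return bytes([opcode_byte(mnemonic), (r1 << 4) | r2])


print(encode_rr("AR", 3, 4).hex())
```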
Pass 2 Continued
• Convert symbolic operands into machine
language
• Look up the name in symbol table
• Make that location into the kind of reference
needed in the machine instruction
• In real S360 (not always BALSX) this is a base
offset pair
• We had earlier announced the value of the
register that we wanted to use as a base register
• The difference in the location from that point
becomes the offset
• A variable declaration is just a special kind of
opcode
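A short sketch of turning a symbol's location into a base offset pair, using the standard 360 RX layout (opcode, R1, X2, B2, and a 12-bit displacement). The function name `encode_rx` and its parameter names are assumptions made for illustration. In the example, a BALR 12,0 at location 0 leaves the value 2 in register 12, so the base value announced to the assembler is 2.

```python
def encode_rx(opcode, r1, target, base_reg, base_value, index_reg=0):
    """Encode an RX instruction whose operand is a symbol at `target`."""
    # The offset is the distance from the announced base value.
    displacement = target - base_value
    if not 0 <= displacement <= 0xFFF:
        raise ValueError("target is not addressable from this base register")
    return bytes([
        opcode,
        (r1 << 4) | index_reg,
        (base_reg << 4) | (displacement >> 8),
        displacement & 0xFF,
    ])


# L 3,X with X at location 12, base register 12 holding 2
print(encode_rx(0x58, 3, target=12, base_reg=12, base_value=2).hex())
```

A symbol more than 4095 bytes past the base value cannot be reached with a single base register, which the sketch reports as an error.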
Pseudo Instructions
• These give the assembler some
information, but are not translated
into an operation
• In one sense a DC/DS is a pseudo
instruction, but it does generate
machine code of some sort
S360 Pseudo Instructions
• Start
– Begins a control section
– A control section is a named module
– Also has some addressability considerations
• EQU
– Equates the label with the item in the operand field
– Often used to give new names to old things:
– R4 EQU 4 - allows us to name registers
– LOC EQU *
– * is the current location
– PARMS EQU 0(1)
• Using
– Describes which register points at what item for the
purpose of addressability
Entry
• The 360 needs to do a number of
things when a program or
subroutine is entered
– Save caller’s registers
– Establish a base register
– Establish a save area
• This is done with a sequence such
as shown on next screen
S360 Entry Conventions
STM 14,12,12(13) save caller's registers
BALR 12,0 get adr in 12
USING *,12 make 12 base
* Named items may be used only after the
* USING above
LA 15,Save
ST 15,8(13)
ST 13,Save+4
LR 13,15
* Now ready for program
Assembly formats
• There are several different assembly
language formats
• 360/370 format has been discussed
• Tanenbaum shows VAX/VMS assembly
language which is similar to Intel
• Same basic format, but with several
changes
– Comments always start with a ;
– Labels end in a : when declared, but not when
used
– Pseudo instructions start with a .
Object Code
• Object code is mostly machine language
– It also needs to allow for linking and loading
• An assembler or compiler records
external references and things needing
relocation
– It can resolve neither of these itself
• The linker needs to organize the modules
in some order
– Then patch up the addresses
– This happens after the assembly or
compilation
IBM 360 object
• Four different kinds of cards that
represent the kinds of data that were
present
– Each was originally punched on a card deck
– Later stored on disk in card images
– Thus each was 80 bytes long
– The first three characters described the type
• TXT – This was the actual machine
language
• ESD – External Symbol Dictionary
• RLD – Relocation Dictionary
• END – The last card of the deck
ESD
• ESD - External symbol dictionary
• This was the location and name of every
external item
– Procedures, functions and data defined
outside of the module in question
• Suppose that external routine X was
called in the subroutine
• Somewhere in the routine there would
have to be an instruction like:
L 15,addrx
...
addrx DC V(X)
• The V indicated that it was external
reference
– Thus only the linker could resolve it
More on ESD
• The ESD contained each reference to
each external symbol
– In this case the relative location of addrx and
the external name X
• The linker then would load into that
memory location another relative address
when it linked the two together
• The ESD should also contain the name of
this module
• The ESD should contain the names of
both external code and external data
RLD - Relocation Dictionary
• Here were identified any addresses that
need to be relocated when the item was
loaded
• Since the 360 addressed storage with
base offset pairs, and the base could be
set after execution began, there did
not need to be many of these
• There was, however, an A form storage
location, which was the address of an
item in this control section
local DC A(item)
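What the loader does with relocation entries can be sketched as below. This is an illustration only: the function name `relocate`, the representation of the relocation dictionary as a plain list of offsets, and the use of full 4-byte big-endian address constants are assumptions, not the actual RLD card layout.

```python
def relocate(image, rld_offsets, load_address):
    """Add the load address to every address constant the RLD points at."""
    relocated = bytearray(image)
    for offset in rld_offsets:
        # Each entry marks a 4-byte address assembled relative to zero.
        value = int.from_bytes(relocated[offset:offset + 4], "big")
        relocated[offset:offset + 4] = (value + load_address).to_bytes(4, "big")
    return bytes(relocated)


# An A(item) constant at offset 0 pointing at an item at offset 4.
image = (4).to_bytes(4, "big") + bytes.fromhex("deadbeef")
print(relocate(image, [0], 0x5000).hex())
```

Instructions that use base offset pairs need no entry at all, since their base register is loaded at run time.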
Linker Again
• The linker would be given any object
files that were to be in the final
program
• It would also be given libraries that
contained common routines
• It would then construct an
executable that contained all the
needed modules and fill in the
relative addresses
• A map could be produced that
showed how things loaded
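A toy version of the linking step, as a sketch under stated assumptions: modules are hypothetical Python dictionaries rather than real object decks, each module defines exactly one external name (its own), external references are 4-byte big-endian slots, and modules are packed end to end with no alignment.

```python
def link(modules):
    """Concatenate modules and fill in addresses of external symbols.

    Each module is {"name": str, "code": bytes, "externals": {offset: name}}.
    Returns (executable image, load map of name -> relative address).
    """
    # First decide where each module will sit in the executable.
    load_map = {}
    address = 0
    for module in modules:
        load_map[module["name"]] = address
        address += len(module["code"])
    # Then patch every external reference with the address now known.
    image = bytearray()
    for module in modules:
        code = bytearray(module["code"])
        for offset, name in module["externals"].items():
            code[offset:offset + 4] = load_map[name].to_bytes(4, "big")
        image += code
    return bytes(image), load_map


main = {"name": "MAIN", "code": bytes(8), "externals": {4: "X"}}
sub = {"name": "X", "code": bytes(4), "externals": {}}
image, load_map = link([main, sub])
print(load_map)
```

The returned `load_map` plays the role of the map mentioned above: it shows where each module ended up.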
Macro processing
• Most assemblers are also macro
processors
– One should not do 360 assembly
language processing without macros
• A macro is a sequence of assembly
language statements that will be
generated by a single assembly
language command
– Similar to a subroutine, except we do
not branch anywhere and return
– Just expand the instructions inline
Non-assembly macros
• Macros are not limited to assembly
language
• C and C++ have a defined macro facility
that is part of the preprocessor
• Many processors, such as Turbo Pascal
have macro processors as part of their
extensions to the language
• There are also many stand alone macro
processors that can be used in any
programming language environment
Macros
• Macros are a nice feature in a high
level language
• They are much more important to
assembly, which lacks many of the
nice features of a high level language
• Common sequences of instructions
are rendered as macros
– Almost all I/O is a macro on the 360
Macros can be parameterized
• The header indicates what
parameters are to be used
• These operands are incorporated
into the code when it is expanded
• Each parameter usually carries a
special symbol to indicate that it is a
macro parameter rather than an
assembly operand
• This was the & on the 360
Example
• The entry sequence of a 360
program is pretty standard:
• Save the registers in the provided
save area
• Load register 12 with a value that
can be used for base addressing
• Issue the USING directive to tell the
assembler which base register to use
• These three are usually
incorporated as the SAVE macro
Save macro
• The save macro:
MACRO
&L SAVE &R
STM 14,12,12(13)
BALR &R,0
USING *,&R
MEND
– The &R is replaced by whatever is on
the invocation line, possibly with
defaults
Macro Example Notes
• Starts with MACRO and ends with
MEND
• The prototype is the second line
• If this line was encountered:
SAVE 10
– The &L would become nothing
– The &R would be 10, which is part of
two instructions
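The expansion just described can be sketched as plain text substitution. This is a simplified illustration, not how the 360 macro processor is implemented: the function names `define_macro` and `expand` are assumptions, the prototype is assumed to always carry a label parameter, and defaults and conditional assembly are left out.

```python
MACRO_TEXT = """\
MACRO
&L SAVE &R
STM 14,12,12(13)
BALR &R,0
USING *,&R
MEND"""


def define_macro(text):
    """Record a macro definition: prototype on line 2, body up to MEND."""
    lines = text.splitlines()
    if lines[0].strip() != "MACRO" or lines[-1].strip() != "MEND":
        raise ValueError("a macro starts with MACRO and ends with MEND")
    label_param, name, *rest = lines[1].split()
    params = rest[0].split(",") if rest else []
    return {"name": name, "label": label_param,
            "params": params, "body": lines[2:-1]}


def expand(macro, operands="", label=""):
    """Replace each & parameter in the body with the invocation's value."""
    values = {macro["label"]: label}
    values.update(zip(macro["params"], operands.split(",")))
    expanded = []
    for line in macro["body"]:
        # Longest names first, so &R cannot clobber a longer &RX.
        for param in sorted(values, key=len, reverse=True):
            line = line.replace(param, values[param])
        expanded.append(line)
    return expanded


for line in expand(define_macro(MACRO_TEXT), "10"):
    print(line)
```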
I/O processing
• The parameters for a Get macro would
include the Data Control Block and a
place to put the data
• The macro would create a parameter list
containing the addresses of the two items
• Load the address of that list into register
1
• Then call the needed subroutine or do a
SVC
• There is also the notion of conditional
assembly
– Where the lines of code generated would be
different given different situations
Discussion of Handout
• The first page is the assembly
language listing
• The second page is the execution
trace
– Starts at the last line of the first page
and continues from there
• The third page is the post-mortem
dump
• The fourth page is the scratch file
– This demonstrates what the assembler
does and does not do in its first pass
Assembly listing
• Several columns
– Memory location, starting at zero
– Object code in hexadecimal
– Line number
– Statement
• Notice the Pascal program in
comments
• Notice executing data and using
instructions as data
Trace Page
• Again several columns for the trace
• All trace lines are prefixed with *trace*
– Other lines are output
• For each trace line there is:
– The instruction address
– The instruction
– Its mnemonic
– Changed items
• Register, memory, execution location or condition
codes
• The output is interspersed
Post Mortem Dump Shows:
• Last 10 instructions
• Registers at the end
• Memory
• All in hexadecimal dump format
Scratch File
• The point of looking at this is to determine
what is done on the first pass
• The entire source is here but it is prefixed
by five columns
• Index
• Type indicator
• Memory location
• Position for later parsing
• Length for later parsing
Index
• Subscripts into an opcode table
• This is not the operation code per se
– Rather it is an index in a table
• The table contains the mnemonic,
opcode, kind of instruction, kind of
operands needed etc.
• -1 indicates a comment that can be
ignored in all other respects
Type Indicator
• The second number is a boolean that
captures whether the operation is an RR
instruction
– BALSX only implements RR, RS, RX, so only
two lengths
• Each RR opcode is 64 less than the RX
equivalent and its length is always 2
• So rather than add new opcodes to the
opcode table, the program searches for
the opcode after stripping the R
• Only certain opcodes can be RR, so some
are eliminated even after that
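The stripping trick can be sketched as follows. This is a reconstruction for illustration, not BALSX's actual code: the table contents, the set `HAS_RR_FORM`, and the function name `lookup` are assumptions. The opcode values are the standard 360 ones, where for example L is 0x58 and LR is 0x18, a difference of 64.

```python
# Only RX mnemonics are stored in the table.
RX_OPCODES = {"L": 0x58, "A": 0x5A, "S": 0x5B, "C": 0x59, "ST": 0x50}

# Only some RX opcodes have an RR twin; there is no STR, for instance.
HAS_RR_FORM = {"L", "A", "S", "C"}


def lookup(mnemonic):
    """Return (opcode, length) for an RX mnemonic or its RR twin."""
    if mnemonic in RX_OPCODES:
        return RX_OPCODES[mnemonic], 4
    stem = mnemonic[:-1]  # the mnemonic with its trailing R stripped
    if mnemonic.endswith("R") and stem in HAS_RR_FORM:
        # An RR opcode is 64 less than its RX equivalent, length always 2.
        return RX_OPCODES[stem] - 64, 2
    raise KeyError(mnemonic)


print(lookup("A"), lookup("AR"))
```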
Memory Location
• The third number is the address at which
this item will be located
• This presumes a starting point of
zero
• This could be relocated elsewhere,
though
• BALSX will assemble without a proper
base register
Position
• The fourth is the position of the first
character of the operand field
• This is just to make reparsing the
line easy and we never have to
reparse the label or operation field
• Most things start at 15, because of
the code alignment
• First position is zero
Length
• The fifth is the length of the operand
field
• This is just to make reparsing the
line easy
• Notice the NOP labeled by endloop
• NOPs have no operand, but the
assembler tagged the comment as
the operand without worrying about
whether this thing had an operand
or not
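How a first pass might record the position and length of the operand field can be sketched like this. It is a hypothetical illustration, not the BALSX implementation: the function name `operand_field` and the rule "the field after the operation field is the operand" are assumptions. The second call shows the same side effect as the NOP above, where part of the comment is taken as the operand.

```python
import re


def operand_field(line):
    """Return (position, length) of the field after the operation field.

    Positions are zero-based, matching the scratch file convention.
    """
    # optional label, blanks, operation, blanks, then the next field
    match = re.match(r"(\S*)\s+(\S+)\s+(\S+)", line)
    if match is None:
        return None  # nothing follows the operation field
    return match.start(3), len(match.group(3))


with_operand = "LOOP" + " " * 5 + "L" + " " * 5 + "3,COUNT   load the counter"
nop_line = "ENDLOOP" + " " * 2 + "NOP" + " " * 3 + "end of loop"
print(operand_field(with_operand))
print(operand_field(nop_line))
```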