Transcript 52223_ALP

52.223 Low Level Programming
Lecturer: Duncan Smeed
Overview of IA-32 Assembly Language Programming
Part 1
Program Translation Hierarchy
[Figure: program translation hierarchy, with the "Assembly Language Programming level" marked]
Overview of IA-32 Assembly Language Programming - Part 1
An Assembly Language Program: Global View
■ Typically, an Assembly Language Program (ALP) is divided
into three sections that specify the main components of a
program. In some cases these sections can be intermixed to
provide for better design and structure. These sections are:
• Assembler Directives (aka Pseudo-ops)
• Assembly Language Instructions
• Data Storage Directives
Assembler Directives (Pseudo-ops)
■ These are directives supplied by the user to the
assembler for defining data and symbols, setting
assembler and linking conditions, and specifying
output formats, etc. The directives do not produce
machine code. Examples:
DOSSEG - Specifies a standard segment order for the code,
data and stack segments.
PROC - Marks the start of a procedure. In a simple program the main
procedure holds the first executable instruction; the entry point
itself is named by the operand of END.
END - Program End. This informs the assembler that the
program source is finished.
Assembly Language Instructions
■ These are the actual IA-32 instructions that are
translated into executable machine code. Examples:
MOV [operands] ; to move data, e.g. memory to register
ADD [operands] ; to add two data values
AND [operands] ; to logically AND two data values
Data Storage Directives
■ Also known as Data Definition Directives
■ These allocate data storage locations containing
initialized or uninitialized data. Examples:
        db "Good afternoon",0
        db 20 dup(0)            ; 20 bytes, all zeroed
        db 20 dup(?)            ; 20 uninitialised bytes
        dw ?,?,?,?,?            ; 5 uninitialised words
Format of Assembly Language Statements
■ In general an assembly language (AL) statement can
contain up to four fields, namely:
[name] [mnemonic] [operand(s)] [comment]
■ name identifies a label, variable, constant (symbol) or
keyword.
■ mnemonic identifies the AL instruction (opcode) or an
assembler directive.
■ operand(s) identifies the operand(s) for the mnemonic.
■ comment signifies AL commentary/documentation.
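The four-field layout can be sketched as a small splitter. This is an illustrative Python sketch only, not how any real assembler tokenises its input: split_statement and the MNEMONICS set are hypothetical names, only a handful of mnemonics are listed, and a ';' inside a quoted string is not handled.

```python
# Hypothetical, deliberately tiny keyword table (a real assembler knows many more).
MNEMONICS = {"mov", "add", "and", "jmp", "int", "db", "dw", "dd", "dq", "equ"}

def split_statement(line):
    """Return (name, mnemonic, operands, comment) for one source line."""
    code, _, comment = line.partition(";")        # the comment runs to end of line
    name = mnemonic = operands = ""
    rest = code.strip()
    if rest:
        first, tail = (rest.split(None, 1) + [""])[:2]
        # A first token with a ':' suffix, or one that is not a keyword, is a name.
        if first.endswith(":") or first.lower() not in MNEMONICS:
            name = first.rstrip(":")
            first, tail = (tail.split(None, 1) + ["", ""])[:2]
        mnemonic, operands = first, tail.strip()
    return name, mnemonic, operands, comment.strip()

print(split_statement("else_01: mov ax,10 ; load AX"))
print(split_statement("Count1 db 50 ; the variable Count1"))
```

Because the name field is recognised partly by elimination, a keyword in the first position (e.g. add) is taken as the mnemonic, which is one way to see why keywords cannot double as labels.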
[name]
■ This field identifies a label, variable, constant or keyword.
• Label - When a name appears next to a program instruction, it is called a
label. Labels serve as place markers to be used as, for example, an
address reference in a jump instruction: jmp endif_01
• Variable - A name used before a data allocation directive identifies a
location where data resides in memory. E.g.:
Count1  db  50          ; the variable Count1
• Constant - A name used to define a constant. E.g.:
max_col equ 80          ; the constant max_col
• Keyword - A keyword, or reserved word, has some predefined meaning
to the assembler. It may be an instruction mnemonic or an assembler
directive. Keywords cannot be used out of context or as identifiers:
add     mov ax,10       ; illegal use of add as label
[mnemonic]
■ This field contains the mnemonic of:
• an instruction opcode (e.g. MOV, ADD) or,
• a pseudo-op (e.g. DB, EQU)
■ To distinguish labelled statements from unlabelled ones, either
(depending on the assembler):
• the mnemonic field of an unlabelled statement must not start in
the first column, since that’s where labels start,
• or labels must have an identifying character - often a ‘:’
suffix - to differentiate them from other fields. E.g. the
following code uses both types of formatting for illustration
(but note most assemblers use just one style or the other):
          jmp endif_01          ; 'tabbed in' statement
else_01:  mov ax,10             ; 'suffix :' style label
endif_01  add dx,ax             ; 'column 1' style label
[operand(s)]
■ For those instructions or pseudo-ops that require operands,
this field contains one or more operands, typically separated by
commas (e.g. registers or addresses of data), to be operated
upon by the instruction in the mnemonic (op-code) field.
Examples:
ax
'A'
ax,100
[200],bx
dx,[bx]
[bx+si],cx
ax,[bx+si+2]
[comment]
■ The remainder of the statement is the comment field.
■ Some assemblers require this field to start with a
special character, such as ';' or '#'.
■ Comments in the program are for documentation
purposes only and are ignored by the assembler.
■ Such comments are absolutely vital when
programming in AL since there is such a large
semantic gap between the design of a
program/algorithm at a high level and its
implementation at such a low level.
Comment-only Statements
■ The exception to the format of [name] [mnemonic]
[operand(s)] [comment] is that if a line starts with a
special comment-line character then the whole line is
treated as a comment:
; This is an example of a comment-only line. If you ever
; write AL programs then such comment lines should
; ideally outnumber code lines by a significant factor!
; IOW, AL is a write-only language ;-)
; Incidentally, the following in-line comment is almost
; worthless!!:
        mov ax,10       ; move the value 10 into AX
Field Separators
■ In general, the fields are separated by spaces and if the label
field is NOT present it must be replaced by at least one space.
To improve the appearance of the program it is wise to position
the fields at particular column positions (e.g. at tab stops). For
example, contrast the following two programs - one with an
untidy layout and the other with a neat layout.
;1) Untidily laid out
; example program
 mov ax,[150]
 mov    bx,[152]
 add ax,2
 mov        [154],ax
 mov [150], bx
 int 20
;2) Neatly laid out
;       example program
        mov     ax,[150]        ; blah
        mov     bx,[152]        ; blah blah
        add     ax,2            ; wibble
        mov     [154],ax        ; ...wibble
        mov     [150],bx        ; blah
        int     20              ; End program
Data Definition Directives Revisited
■ Variables are really just symbolic names for locations
in memory where data is stored. In assembly
language, (global) variables are identified by labels.
■ A label does not, however, indicate how many bytes of
storage are allocated to a variable - it is, in effect, the
address of the first byte of a data structure.
■ The following syntax diagram shows that label is
optional, and only one initialvalue is required. If more
are supplied, they must be separated by commas:
[label] <directive> initialvalue [,initialvalue]
…Data Definition Directives
■ Data definition directives are used to allocate storage
and include the following pre-defined types:
Directive   Defines      Bytes
DB          Byte         1
DW          Word         2
DD          Doubleword   4
DQ          Quadword     8
DB - Define Byte
■ The DB directive allocates storage for one or more 8-bit values:
[label] DB initialvalue [,initialvalue]
■ Initialvalue can be one or more 8-bit values, a string constant, a
constant expression (evaluated at assembly time), or a question
mark (?). If the value is signed, it has the range -128 to +127; if
unsigned, the range is 0 to 255. Here are a few examples:
char    db 'A'          ; ASCII character
min_s   db -128         ; min. signed value
max_s   db +127         ; max. signed value
min_u   db 0            ; min. unsigned value
max_u   db 255          ; max. unsigned value
… DB - Define Byte
■ Each value may also be expressed in a different radix.
For example, the following variables all contain
exactly the same value. Which radix to use is entirely
up to the programmer but is usually chosen to
reinforce the context of its use, i.e. if a value is to be
treated in a 'character' context then the definition
reflects that. Thus:
char_version    db 'A'          ; ASCII character
hex_version     db 41h          ; as hexadecimal
dec_version     db 65           ; as decimal
bin_version     db 01000001b    ; as binary
oct_version     db 101q         ; as octal
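The claim that all five definitions hold the same byte can be checked outside the assembler. A minimal Python sketch, using Python's own literal syntax in place of the h/b/q suffixes (the dictionary name is just for illustration):

```python
# The same byte written five ways; keys mirror the variable names above.
values = {
    "char_version": ord("A"),      # 'A'
    "hex_version": 0x41,           # 41h
    "dec_version": 65,             # 65
    "bin_version": 0b01000001,     # 01000001b
    "oct_version": 0o101,          # 101q
}

all_equal = len(set(values.values())) == 1
print(all_equal, values["char_version"])   # True 65
```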
… DB - Define Byte
■ A list of values may be grouped under a single label, with the
values separated by commas. In the following example, list1
and list2 have the same contents:
list1   db 10, 32, 41h, 00100010b
list2   db 0Ah,20h,'A',22h
■ A variable's contents may be left undefined by using the question
mark (?) operator. Or a numeric expression can initialise a
variable with a value that is calculated at assembly time.
Examples:
count           db ?
ages            db ?,?,?,?,?
scrn_size       dw 80*24        ; 1920 does not fit in a byte, so a word is used
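A quick cross-check of list contents and of whether an assembly-time expression fits the chosen directive can be sketched in Python. The helper name fits_in_byte is hypothetical; the byte values are those of a four-element list written once in mixed radices and once in hex/character form.

```python
list1 = bytes([10, 32, 0x41, 0b00100010])      # 10, 32, 41h, 00100010b
list2 = bytes([0x0A, 0x20, ord("A"), 0x22])    # 0Ah, 20h, 'A', 22h

def fits_in_byte(value):
    """True if value is representable in 8 bits, signed or unsigned."""
    return -128 <= value <= 255

print(list1 == list2)          # True
print(fits_in_byte(80 * 24))   # False
```

The second result is a reminder that 80*24 evaluates to 1920, which needs a word (DW) rather than a byte (DB).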
…DB - Define Byte
■ A string may be assigned to a variable, in which case the
variable (label) stands for the address of the first byte.
C_string        db "Good morning",0
pascal_string   db 12,"Good morning"
■ Long strings can be made more readable in an AL source
program by continuing them over multiple lines without the
necessity of supplying a label for each. The following string is
terminated by an end-of-line sequence and a null byte:
a_long_string   db "This is a string "
                db "that clearly is going to take "
                db "several lines to store in an "
                db "assembly language program."
                db 0Dh,0Ah,0    ; EOL sequence + NULL
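The two string conventions differ only in where the length information lives. A small Python sketch of the resulting byte layouts (variable names follow the labels above; nothing here is assembler output):

```python
text = b"Good morning"

c_string = text + b"\x00"                    # "Good morning",0  : terminator at the end
pascal_string = bytes([len(text)]) + text    # 12,"Good morning" : length byte at the front

print(len(text), c_string[-1], pascal_string[0])   # 12 0 12
```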
$ Operator
■ The assembler can automatically calculate the length
of a string by making use of the $ operator which
represents the assembler's current location counter
value. In the following example, a_string_len is
initialised to 16:
a_string        db "This is a string"
a_string_len    db $-a_string
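How $-a_string comes out as 16 can be mimicked with a toy location counter. This is a sketch under the assumption that the data starts at offset 0; the db helper and the image/symbols names are hypothetical stand-ins for what an assembler tracks internally.

```python
image = bytearray()   # bytes emitted so far; len(image) plays the role of $
symbols = {}          # label -> offset

def db(label, data):
    """Record the label at the current location, then emit the bytes."""
    symbols[label] = len(image)
    image.extend(data)

db("a_string", b"This is a string")
# '$' is read before the length byte itself is emitted, so it is 16 here.
db("a_string_len", bytes([len(image) - symbols["a_string"]]))

print(image[symbols["a_string_len"]])   # 16
```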
DW - Define Word
■ The DW directive creates storage for one or more 16-bit words. The syntax is:
[label] DW initialvalue [,initialvalue]
■ Initialvalue can be any 16-bit value from 0 to 65,535
(FFFFh) if unsigned or -32,768 (8000h) to +32,767 (7FFFh) if
signed, a constant expression (evaluated at assembly
time), or a question mark (?) to leave a variable
uninitialised.
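The signed and unsigned ranges are two readings of the same 16 bit patterns. A minimal Python sketch using the standard struct module (as_signed is a hypothetical helper name):

```python
import struct

def as_signed(word):
    """Reinterpret an unsigned 16-bit pattern as a two's-complement value."""
    return struct.unpack("<h", struct.pack("<H", word))[0]

print(as_signed(0x8000), as_signed(0x7FFF), as_signed(0xFFFF))   # -32768 32767 -1
```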
DW and Near Pointers
■ The offset of a variable or subroutine may be stored in
another variable. In the next example, the assembler
sets listPtr to the offset of list. Then
listPtrPtr contains the address of listPtr.
Finally, aProcPtr contains the offset of a label
called clear_screen.
list            dw 256,257,258,259
listPtr         dw list
listPtrPtr      dw listPtr
aProcPtr        dw clear_screen
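The chain of near pointers can be followed by hand in a simulated data segment. The sketch below assumes the data begins at offset 0 (the real offsets depend on what precedes it) and omits aProcPtr, since the offset of clear_screen is not given; dw, word_at, image and symbols are hypothetical names.

```python
import struct

image = bytearray()   # simulated data segment, assumed to start at offset 0
symbols = {}          # label -> offset

def dw(label, *words):
    """Record the label, then emit each value as a little-endian 16-bit word."""
    symbols[label] = len(image)
    for w in words:
        image.extend(struct.pack("<H", w))

def word_at(offset):
    return struct.unpack_from("<H", image, offset)[0]

dw("list", 256, 257, 258, 259)
dw("listPtr", symbols["list"])
dw("listPtrPtr", symbols["listPtr"])

ptr = word_at(symbols["listPtrPtr"])   # offset of listPtr
first = word_at(word_at(ptr))          # follow both pointers to list[0]
print(symbols, first)
```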
DD - Define Doubleword
■ The DD directive creates storage for one or more 32-bit
doublewords. The syntax is:
[label] DD initialvalue [,initialvalue]
■ Initialvalue can be any 32-bit value up to FFFFFFFFh, a
segment-offset address, a 4-byte encoded real number, or a
decimal real number. The bytes are stored in little-endian
format, i.e. the value 12345678h would be stored in memory as:
memory address (offset):  00 01 02 03
contents:                 78 56 34 12
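The byte order shown above can be reproduced with Python's struct module, where the "<" prefix selects little-endian packing. A minimal sketch:

```python
import struct

stored = struct.pack("<I", 0x12345678)        # 32-bit value, least significant byte first
print(" ".join(f"{b:02X}" for b in stored))   # 78 56 34 12
```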
…DD - Define Doubleword
■ You can define either a single doubleword or a list of
doublewords. In the example that follows, far_pointer1 is
uninitialised and the assembler automatically initialises
far_pointer2 to the 32-bit segment-offset address of
subroutine1:
signed_val      dd -2147483648
far_pointer1    dd ?
far_pointer2    dd subroutine1
DUP Operator
■ The DUP operator only appears after a storage allocation
directive (DB, DW,...). DUP allows for the repetition of one or
more values when allocating storage. This is especially useful
when allocating space for a table or array. For example:
        db 20 dup(0)            ; 20 bytes, all zeroed
        db 20 dup(?)            ; 20 uninitialised bytes
        db 4 dup('ABC')         ; 12 bytes: 'ABCABCABCABC'
■ The DUP operator may also be nested. The first example below
creates storage containing (in ASCII) 000XX000XX000XX000XX. The
second example creates a 2-dimensional word table of 3 rows
by 4 columns:
aTable  db 4 dup( 3 dup('0'), 2 dup('X') )
anArray dw 3 dup( 4 dup(0) )
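Nested DUP is simply repetition of a repetition, which a few lines of Python can expand. The dup helper is a hypothetical stand-in for the operator, and each word is modelled as two zero bytes:

```python
def dup(count, *items):
    """Repeat the concatenation of items count times, like 'count dup(items)'."""
    return b"".join(items) * count

a_table = dup(4, dup(3, b"0"), dup(2, b"X"))    # 4 dup( 3 dup('0'), 2 dup('X') )
an_array = dup(3, dup(4, b"\x00\x00"))          # 3 dup( 4 dup(0) ) as 16-bit words

print(a_table.decode(), len(a_table))   # 000XX000XX000XX000XX 20
print(dup(4, b"ABC").decode())          # ABCABCABCABC
print(len(an_array))                    # 24
```

The 24 bytes of the word table are 3 rows of 4 words at 2 bytes each.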
Type Checking
■ When a variable is created using DB, DW, etc., the
assembler gives it a default attribute (byte, word, etc.)
based on its size. This type is checked on referencing
the variable and an error results if the types do not
match. So:
count   dw 20h
        ...
        mov al,count    ; error: type mismatch
…Type Checking
■ Overcoming the type check requires a LABEL directive, which
creates a new name (and associated type) at the same
address. Thus:
count_lo label byte             ; byte attribute
count    dw 20h                 ; word attribute
         ...
         mov al,count_lo        ; use low byte of count
         mov cx,count           ; use all of count
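Two names at one address give two views of the same bytes, and because words are stored little-endian the byte view sees the low byte. A Python sketch of that idea; views is a hypothetical helper, and the second value, 1234h, is an illustrative value not taken from the slides:

```python
import struct

def views(word_value):
    """Return (low byte, whole word) read from the same starting offset."""
    storage = struct.pack("<H", word_value)    # count dw word_value
    count_lo = storage[0]                      # byte view: mov al,count_lo
    count = struct.unpack("<H", storage)[0]    # word view: mov cx,count
    return count_lo, count

print([hex(v) for v in views(0x0020)])   # ['0x20', '0x20']
print([hex(v) for v in views(0x1234)])   # ['0x34', '0x1234']
```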
Addressing Modes Revisited
■ As we have seen, an instruction consists of
a) the op-code that tells the processor what instruction to
perform and,
b) the operand or address field which tells the processor where
to find the data to be operated upon. This address is known
as the Effective Address (EA).
■ To determine the EA, the processor uses one of a number of
addressing modes that are defined by the operand field of the
instruction. Getting the EA from the addressing mode may be
quite simple (e.g. the operand is [the contents of] a data
register) or complex (e.g. the operand is in memory, the
address of which is contained in an address register). [See
52223_02/16-34 for details of the IA-32 AMs.]
Aside: Lecture Notes Archive
■ Further examples of AMs, etc., of the IA-16 subset of
IA-32 can be found in my lecture notes archive at:
<http://www.cis.strath.ac.uk/~dunc/cdrom/archives/ay2000/teaching/llp/lectures/odd/part2.html>
DEBUG
■ DEBUG is included as part of the standard Windows
installation.
■ DEBUG is a DOS-mode debugger, which means:
• It’s of no use for debugging Win 32 applications
• But it is useful to explore the wonderful(!) world of the IA-16 (real mode) subset of IA-32.
■ An overview of DEBUG can be found in my lecture
notes archive at:
<http://www.cis.strath.ac.uk/~dunc/cdrom/archives/ay2000/teaching/llp/practicals/debug.html>
H:/llp/p1>debug
Statement       Comment
A 150           Assemble data at offset 150h
db 10,20,30,0   1st 3 bytes are array, last is sum
                <ENTER> ends assembly
A 100           Assemble code at offset 100h
mov bx,150      BX points to the array
mov si,2        SI will be an index
mov al,[bx]     Indirect operand
add al,[bx+1]   Base-offset operand
add al,[bx+si]  Base-indexed operand
mov [153],al    Direct operand
int 20          End program
                <ENTER> ends assembly
T               Trace each instruction
…               [trace output appears…]
D 150,153       Dump array and sum
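The effect of the session on memory can be predicted with a short simulation. This Python sketch is not DEBUG output: it assumes, as DEBUG does, that every number typed is hexadecimal, models only the bytes and registers involved, and uses a hypothetical helper ea for the effective-address sum.

```python
memory = bytearray(0x200)                                  # a small slice of the segment
memory[0x150:0x154] = bytes([0x10, 0x20, 0x30, 0x00])      # db 10,20,30,0

bx, si = 0x150, 2                                          # mov bx,150 / mov si,2

def ea(base=0, index=0, disp=0):
    """Effective address = base + index + displacement, wrapped to 16 bits."""
    return (base + index + disp) & 0xFFFF

al = memory[ea(base=bx)]                                   # mov al,[bx]     indirect
al = (al + memory[ea(base=bx, disp=1)]) & 0xFF             # add al,[bx+1]   base-offset
al = (al + memory[ea(base=bx, index=si)]) & 0xFF           # add al,[bx+si]  base-indexed
memory[ea(disp=0x153)] = al                                # mov [153],al    direct

dump = " ".join(f"{b:02X}" for b in memory[0x150:0x154])
print(dump)   # 10 20 30 60
```

The final byte, 60h, is the sum 10h + 20h + 30h, which is what the closing D command should show at offset 153h.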
References & Bibliography
■ Duncan’s Archived 52.223 Lecture Notes
<http://www.cis.strath.ac.uk/~dunc/cdrom/archives/ay2000/teaching/llp/>
■ sandpile.org -- IA-32 architecture
<http://www.sandpile.org/ia32/index.htm>
■ PC Assembly Language
<http://www.drpaulcarter.com/pcasm/>
■ Linux Assembly HOWTO
<http://www.faqs.org/docs/Linux-HOWTO/Assembly-HOWTO.html>
■ Inline Assembly with DJGPP
<http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html>
■ docs.sun.com: IA-32 Assembly Language Reference Manual
<http://docs.sun.com/app/docs/doc/806-3773/6jct9o0ad?a=view>
■ Pentium Assembly Code Using gcc
<http://william.krieger.faculty.noctrl.edu/archive/c2003_09_csc220/assembly/>
■ Microsoft Windows XP - Debug
<http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/debug.mspx>