Assembly Language 1


ECE 485/585
Microprocessors
Chapter 3
x86 Assembly Language 1
Herbert G. Mayer, PSU
Status 11/22/2016
1
Syllabus
Motivation
16-bit, 32-bit, 64-bit Processor
Null Program
Print Character
Print String
INT Function
Assembler Abbreviations
Macros
Procedures
Assembly and Linking
nasm Assembler
Summary
Appendix
2
Motivation
 Almost impossible to communicate with a
microprocessor on the binary level; causes insanity!
 Assembler offers abstraction, relocatability, and
program reuse
 Symbolic names permit convenient definition and
reference of named data and code objects
 Assembler offers high level data and control
constructs, similar to high-level languages
 Assembler programming allows high level of control
over the target machine
 And can achieve the highest performance, for short code sections
3
Motivation
 Intel x86 is the most widely used microprocessor for
general computing; made by Intel and AMD
 The ARM processor is the most widely used processor for portable devices, e.g. tablets and cell phones
 We use Intel x86 here to explain the relation of μP and assembly language; for any one μP there may be many assemblers, but only a single binary code
 The μP architecture defines the details of the assembler instructions; yet some assembly language details are independent of the architecture
 E.g. the syntactic order in which operands are listed
in assembly instructions is arbitrary, but the bits
have to be assembled into their specific bit positions
of a machine instruction
4
Motivation
 Any machine instruction has its corresponding
assembler syntax; AKA mnemonics
 Different manufacturers of an assembler may have
different syntax rules and different mnemonics for
the same machine instruction
 For example, some define the destination register to
be situated in the leftmost position of the various
defined operands; e.g. a load instruction for a
hypothetical machine could be:
ld r1, [foo]
-- load word at address foo into reg r1
 Others might reverse the order, or use different
mnemonics, or name registers differently, such as:
load foo, %r1 -- load word at address foo into reg r1
5
Motivation
 Some manufacturers refer to moving bits from
memory into a register as a load instruction (IBM);
others as a move instruction (Intel)
 Assembly Language bridges the gap between low
level binary machine instructions and higher level
interface with human programmers
 Binary instructions execute on a digital computer, while an assembler provides a tool for expressing programs in text form, readable by programmers
 Assembly language is by no means high-level in the sense of machine independent, structured, or object-oriented
 It is a low level, target machine specific interface; but
shields programmers from the tedium of binary code
6
Motivation
 Users do not deal with the target machine in terms of
bits that represent binary machine instructions
 An assembler is a piece of system software that
maps an assembly source program into binary
instructions
 Thus assembly language provides an abstraction:
1. It elevates the user to the level of textual language, up
from the level of binary object code
2. Several different assemblers may do this in syntactically different ways for the same target μP
3. Yet the generated binary code has to be identical for
each assembler, in order to render the object code
executable on the targeted μP
7
Motivation
 Common to many architectures is the notion
(and separation) of data space, instruction
space, and perhaps other areas of program
logic (stack space, read-only space etc.)
 The x86 architecture embodies so called data
segments, code segments, stack segments,
and several of each, if needed
 Each segment is identified at run time by a
segment register
 Specific data or code elements are identified by their offsets from the start of their respective segment
8
Motivation
 For example, the code label next: will be interpreted by the hardware as seg:offset, where seg is the segment register cs, and offset is the offset of label next from the start of the code segment
 Let’s say the offset of next is 248x and the value in the cs register is 2030x; then the resulting run time (code) address is 20548x. Note the left-shift of the segment value by 4 bits!
 This works, and is required, because in real mode all segments are aligned at modulo-16 addresses (paragraph boundaries) on the Intel x86 architecture
 Thus a segment’s starting address is always a multiple of 16, and its binary address always has the rightmost (low-order) 4 bits 0; hence those 4 bits need not be stored, and a 16-bit segment register suffices
9
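The address arithmetic in this example can be checked with a short sketch. Python serves only as a calculator here, not as x86 code; the function name real_mode_address is hypothetical, and the wrap-around at 20 bits models the 1 MB address space of the original 8086.

```python
def real_mode_address(seg, offset):
    """Translate a real-mode seg:offset pair into a 20-bit physical address.

    The 16-bit segment value is shifted left by 4 bits (times 16) and the
    16-bit offset is added; the original 8086 has 20 address lines, so the
    sum wraps around at 1 MB.
    """
    if not (0 <= seg <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((seg << 4) + offset) & 0xFFFFF


# Example from the text: cs = 2030x, offset of label next = 248x
print(hex(real_mode_address(0x2030, 0x248)))   # 0x20548
```

Different seg:offset pairs can name the same address; for example 2000x:548x also yields 20548x.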
Motivation
 This chapter introduces complete programs,
written in assembly language
 Starting with the smallest possible but
complete assembly program, we progress to
more sophisticated programs
 One example emits a single character, the
next prints a complete string onto the
standard screen, followed by conventions that
allow us to communicate with the assembler
in an abbreviated way
 We also discuss macros and simple
procedures (AKA functions) with calls and returns
10
16-Bit, 32-Bit, 64-bit Architecture
 The Intel x86 processor started out as a 16-bit
architecture in the late 1970s
 The first x86 product was the Intel 8086 μP
 Then the x86 architecture grew to become a 32-bit architecture
 The first 32-bit product was the Intel 80386; it was preceded by the 16-bit 80186 and 80286
 The 32-bit version was backwards compatible with the 16-bit architecture and could execute old 16-bit code
 In the early 2000s, after AMD had produced a 64-bit version of the x86 family, very much to the surprise of Intel, Intel productized a 64-bit version as well, in addition to its new and different Itanium
11
16-Bit, 32-Bit, 64-bit Architecture
 AMD named its 64-bit x86 product AMD64
 Intel’s name: Intel 64
 Old 32-bit x86 code, and 16-bit code in legacy mode, executes unchanged on the new 64-bit processors
 Though not with optimal speed, as legacy object code cannot take advantage of new instructions that may speed up certain applications
12
Photo of AMD64 μP
16-Bit, 32-Bit, 64-bit Architecture
 AMD’s 64-bit version of the old x86 architecture must
have sent shock waves through Intel, which at the
time of AMD’s release had no published plans to
release a 64-bit version of the old x86 machine
 That quickly changed, as Intel had been smart enough to have a skunk-works team design a new 64-bit μP in secrecy
 All 8 old general-purpose registers were extended to 64 bits, with names modified correspondingly to differentiate them from their 32-bit or 16-bit siblings
 Old names, e.g. “eax” for the 32-bit version of the ax register, were modified to “rax” for the 64-bit version; reminder: the ax register has 16 bits
 The 64-bit architecture (AMD64, and likewise Intel 64) also added 8 more GPRs to the register-starved x86; these are known as rn, with n = 8..15
13
16-Bit, 32-Bit, 64-bit Architecture
14
MMX Extensions
 MMX registers were introduced with the Intel Pentium® processor with MMX technology and carried into the Pentium® II; Streaming SIMD Extensions (SSE) were introduced later with the Pentium III
 MMX stands for Multi Media Extensions; other meanings of the acronym MMX exist!
 The x87 floating-point registers are 80 bits long, to process the 80-bit extended-precision format
 This 80-bit extended format meets the IEEE 754 requirements for double-extended precision
 Named ST(0) .. ST(7) in Intel and masm syntax, st0 .. st7 in nasm
 Aliased with the 8 64-bit MMX registers MM0 .. MM7; one can use one or the other in their respective lengths, but not both at the same time; a switch is needed to go from one to the other (the emms instruction empties the MMX state before x87 code resumes)
 SSE did not lengthen the MMX registers; it added new 128-bit registers instead
15
MMX Extension
Feature | Pentium® III Processor | Pentium® III Processor | Pentium® 4 Processor | Pentium® 4 Processor (Northwood)
--- | --- | --- | --- | ---
Processor MHz | 450-600 MHz | 600 MHz - 1.13 GHz | 1.5 GHz | 2+ GHz
L2 Cache | 512k off-die | 256k on-die | 256k on-die | 512k on-die
Execution Type | Dynamic | Dynamic | Intel® NetBurst™ μArch | Intel® NetBurst™ μArch
System Bus | 100 MHz | 133 MHz | 400 MHz (4x100 MHz) | 400/533 MHz (4x100/133 MHz)
MMX™ Technology | Yes | Yes | Yes | Yes
Streaming SIMD Extensions | Yes | Yes | Yes | Yes
Streaming SIMD Extensions 2 | No | No | Yes | Yes
Manufacturing Process | .25 micron | .18 micron | .18 micron | .13 micron
Chipset | ICH-1 | ICH-2 | ICH-2 | ICH-2
16
MMX Sample Operations
17
XMM Extension
 8 new 128-bit registers with SSE introduction,
named XMM0 .. XMM7
 Not aliased with any other registers, hence usable alongside MMX
 XMM registers usable as scratch, for various purposes
 Different XMM registers can hold different data types at the same time; e.g.:
 xmm0 - Extended SIMD integer data
 xmm1 - Single-precision FP
 xmm2 - Double-precision FP, etc.
 Usage does not add latencies, as long as all operations on a given register use the same data type
18
XMM SHUFPD Operation
SHUFPD: Shuffle Packed Double-FP
[Figure: XMM1 holds x2 (high half) and x1 (low half); XMM2 holds y2 (high half) and y1 (low half)]
SHUFPD XMM1, XMM2, 3    ; binary 11: XMM1 becomes y2 (high), x2 (low)
SHUFPD XMM1, XMM2, 2    ; binary 10: XMM1 becomes y2 (high), x1 (low)
19
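The selection rule in these two examples can be modeled in a few lines. This is a sketch, not processor code: Python tuples of (low, high) stand in for the 128-bit registers, string placeholders stand in for the doubles, and the function name shufpd is hypothetical. Bit 0 of the immediate picks the half of XMM1 that lands in the low half of the result; bit 1 picks the half of XMM2 that lands in the high half.

```python
def shufpd(dest, src, imm8):
    """Model SHUFPD dest, src, imm8 on (low, high) pairs of doubles."""
    low = dest[imm8 & 1]           # bit 0: which half of dest becomes the low half
    high = src[(imm8 >> 1) & 1]    # bit 1: which half of src becomes the high half
    return (low, high)


xmm1 = ("x1", "x2")   # (low, high)
xmm2 = ("y1", "y2")
print(shufpd(xmm1, xmm2, 3))   # ('x2', 'y2')
print(shufpd(xmm1, xmm2, 2))   # ('x1', 'y2')
```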
16-Bit, 32-Bit, 64-bit Usage
In assembly code below we use the following names for the ax register, depending on 16-bit, 32-bit, or 64-bit modes:
 ax: 16 bits; also al is the low-order byte register
 eax: 32 bits
 rax: 64 bits
Ditto with the other registers, for example, the bx:
 bx: 16 bits; also bh is the high-order byte register
 ebx: 32 bits
 rbx: 64 bits
Etc.
20
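How the names overlap can be modeled with bit masks on one 64-bit value. This is an illustrative sketch with hypothetical helper names, not processor code. It also models one 64-bit-mode rule worth knowing: writing a 32-bit register such as eax clears the upper 32 bits of rax, whereas writing ax leaves bits 63..16 unchanged.

```python
MASK64 = (1 << 64) - 1


def views(rax):
    """Return what the narrower register names see of a 64-bit rax value."""
    rax &= MASK64
    return {
        "rax": rax,
        "eax": rax & 0xFFFF_FFFF,   # low 32 bits
        "ax": rax & 0xFFFF,         # low 16 bits
        "ah": (rax >> 8) & 0xFF,    # high byte of ax (bits 15..8)
        "al": rax & 0xFF,           # low byte of ax (bits 7..0)
    }


def write_eax(rax, value):
    """64-bit mode: a 32-bit write zero-extends, so the old upper half of rax is lost."""
    return value & 0xFFFF_FFFF


def write_ax(rax, value):
    """A 16-bit write replaces bits 15..0 and keeps bits 63..16 of rax."""
    return (rax & MASK64 & ~0xFFFF) | (value & 0xFFFF)


r = 0x1122_3344_5566_7788
print(hex(views(r)["eax"]))      # 0x55667788
print(hex(write_ax(r, 0x1)))     # 0x1122334455660001
```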
A Null Program
In x86 Assembly Language
21
Null Program
 Goal here is to craft an x86 assembly language program that assembles, links, loads and executes correctly, and then does nothing :-)
 Set up segments: code, data, and stack
 Here only the code segment is defined, as the others would be empty
 Note the 'code' string identifying the code segment
 The assume directive communicates the implied seg portion of seg:offset, here cs for labels in code_s
 Define start address (actually offset) via label, here label start:
 Labels are user-defined identifiers, each followed by a colon, in the code segment
22
Null Program
; Source: out1.asm
; Purpose: simplest program, no data seg, no stack
code_s  segment 'code'        ; 'code' identifies segment
        assume cs:code_s      ; implied seg register cs
; use of some magic numbers: 0h, 4ch, 21h
start:  mov al, 0h            ; termination code, same as 0
        mov ah, 4ch           ; to terminate: place 4ch in ah
        int 21h               ; call system sw for help: 21h
code_s  ends                  ; end of code segment, good death
        end start             ; end argument defines start
23
Null Program
 Use services provided by the operating system (here MS-DOS): 4ch requests termination; the ‘h’ stands for ‘hexadecimal’
 Run-time services are requested via INT 21h
 The requested service is specified in register ah; here the ‘h’ stands for the ‘high’ byte of the 2 bytes in ax
 The return code in al is 0, meaning: no errors occurred
 Comments start anywhere on a line with ;
 Comments end at the end of the line
 Can be different in different assemblers! Careful!
 The assembler used here is assumed to be a Microsoft product: masm or ML; ML is the newer, compatible system SW tool
24
Print Single Character:
We Choose ’$’
25
Print Character ‘$’
 Goal to craft an x86 assembly language program that
assembles, links, loads and executes a complete
program for the purpose of printing one character
 Also define data and stack segments, though they remain unused; they serve only for demonstration
 Use an assembler directive to define data, here a single machine word, via dw:
dw 999            ; reserves 1 word, initialized: 999
 And we define an array of 100 machine words, via the dup operator:
dw 100 dup( 0 )   ; defines 100 words, initialized: 0
                  ; but all unused in program below!
26
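What these two directives place in memory can be sketched with Python's standard struct module. This models only the resulting byte layout (x86 stores a word little-endian, low byte first), not the assembler itself.

```python
import struct

# dw 999: one 16-bit word, stored little-endian (low byte first)
word = struct.pack("<H", 999)
print(word.hex())          # e703

# dw 100 dup( 0 ): 100 words, each initialized to 0, i.e. 200 zero bytes
array = struct.pack("<100H", *([0] * 100))
print(len(array))          # 200
```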
Print Character ’$’
; Source: out2.asm
; Purpose: simple DOS program to output a character: '$'
data_s   segment                 ; unused data segment
         dw    999               ; define a word, init 999
data_s   ends
stack_s  segment                 ; unused stack segment
         dw    100 dup( 0 )      ; reserve 100 words, init 0
stack_s  ends
code_s   segment 'code'          ; THE Code Segment
         assume cs:code_s, ds:data_s
start:   mov   ax, seg data_s    ; initialize ds via ax
         mov   ds, ax            ; cannot load directly into ds
         mov   dl, '$'           ; char to print is assumed in dl
         mov   ah, 2h            ; call 2h emits char in dl
         int   21h               ; call OS routine, e.g. DOS
         mov   ax, 4c00h         ; termination code in ah + al
         int   21h               ; terminate finally via call
code_s   ends                    ; repeat seg name at ends
         end   start             ; say: Where to start
27
Print Character ‘$’
 Again a special DOS system routine is called to
provide help: INT 21h
 The specific argument, communicating which help
is needed, must be passed in register ah
 Value 2 (AKA 2h) in ah states: character output is
desired
 OS service routine 2 prints a char; it outputs the
one found in register dl; that is the ‘$’ character
 Moving 4c00h into register ax is same as 4ch into
register ah and 00h into al
 Note that one of the h qualifiers says “hex”, while the other says “high”!! To confuse students :-)
 4c00h is just the two byte values 4ch and 00h concatenated
28
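Since ah is the high and al the low byte of ax, the equivalence of mov ax, 4c00h and the pair mov ah, 4ch / mov al, 00h is plain byte arithmetic. A sketch with hypothetical helper names:

```python
def compose_ax(ah, al):
    """ax consists of ah (bits 15..8) and al (bits 7..0)."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


def split_ax(ax):
    """Return (ah, al) for a 16-bit ax value."""
    return (ax >> 8) & 0xFF, ax & 0xFF


# mov ax, 4c00h has the same effect as: mov ah, 4ch / mov al, 00h
print(hex(compose_ax(0x4C, 0x00)))     # 0x4c00
print(split_ax(0x4C00))                # (76, 0)
```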
Printing a Character String
29
Print String
 Goal now is to craft an x86 assembly program that
assembles, links, loads and executes a program to
print a character string
 The Data Segment defines a string of bytes, initialized
to some string literal, identified by symbol msg
 This name msg is a user-defined name for the byte
address, where the string starts
 Note the $ character ending the string; since the end is marked by this sentinel rather than by a length count, the length has no inherent upper limit
 Used as end criterion for system SW routine 9
 Stack segment here is solely a dummy segment:
 It holds 10 unused strings, each of length 16, solely
for demonstration
30
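How routine 9 uses the $ sentinel can be sketched as follows. This is a model, not DOS code; a bytes object stands in for the data segment, the offset argument plays the role of dx, and the function name is hypothetical.

```python
def dos_print_string(memory, offset):
    """Model of DOS function 09h: return the text from offset up to the first '$'.

    `memory` stands in for the data segment, `offset` for the value in dx.
    There is no length argument; the '$' byte itself is not printed.
    """
    end = memory.index(b"$", offset)   # ValueError if no terminator is found
    return memory[offset:end].decode("ascii")


data_s = b"Hello CCUT class$"           # msg db "Hello CCUT class$"
print(dos_print_string(data_s, 0))      # Hello CCUT class
```

One consequence of the sentinel: a string printed this way cannot itself contain '$'. And if the terminator is missing, real DOS keeps printing whatever follows in memory, whereas this model raises an error.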
Print String
; Source: out3.asm
; Purpose: simplest program to output character string
data_s   segment
msg      db    "Hello CCUT class$"   ; was done in China!
data_s   ends
stack_s  segment                     ; unused
         db    10 dup( "---S t a c k----" )
stack_s  ends                        ; repeating name stack_s OK
code_s   segment 'code'
         assume cs:code_s, ds:data_s
start:   mov   ax, seg data_s        ; silly detour via ax
         mov   ds, ax                ; ds points to data_s
         mov   dx, offset msg        ; System SW prints
         mov   ah, 9h                ; sys call 9h emits string
         int   21h                   ; call OS routine
         mov   ax, 4c00h             ; term code in ah + al
         int   21h                   ; term finally via call
code_s   ends                        ; label seg name at ends
         end   start                 ; start here! Yup: Microsoft
31
Print String
 System SW routine 9 emits character string to the
standard output file; note 9 is same as 9h
 It finds the string’s start address in ds:dx, i.e. the offset is communicated in register dx
 Note the assembler’s built-in operator offset, applied to a data label, here label msg
 The assembler also provides the built-in operator seg, which yields the segment part of the final address
32
INT Function
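 The x86 INT instruction, AKA software interrupt, is not what computer science usually calls an interrupt, i.e. an asynchronous hardware event
 Instead, it is a call to a low-level system SW routine, selected by the INT operand (e.g. 21h) via the interrupt vector table
 For INT 21h, that routine is parameterized by the single-byte argument residing in the ah register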
 The actual system SW being executed as a
result of INT is dependent on the actual
operating system on which the x86 code
executes; here Microsoft DOS
 Thus it may be different on a Linux system,
Windows, or Unix system
33
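For the real-mode case, the routine that INT n calls is found through the interrupt vector table at physical address 0: entry n occupies the 4 bytes at address 4 * n and holds the handler's offset followed by its segment, each a little-endian word. A sketch of that lookup, where a bytearray stands in for low memory and the installed handler address is made up:

```python
import struct


def interrupt_vector(memory, n):
    """Return (segment, offset) of the real-mode handler for INT n."""
    offset, segment = struct.unpack_from("<HH", memory, 4 * n)  # offset word first
    return segment, offset


low_memory = bytearray(4 * 256)              # 256 vectors of 4 bytes each
# Install a made-up handler at 0F00h:1234h as the INT 21h vector
struct.pack_into("<HH", low_memory, 4 * 0x21, 0x1234, 0x0F00)

print(hex(4 * 0x21))                          # 0x84
seg, off = interrupt_vector(low_memory, 0x21)
print(hex(seg), hex(off))                     # 0xf00 0x1234
```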
Assembler Abbreviations
34
Assembler Abbreviations
 Assembler directive .model small allows for certain default abbreviations and assumptions
 For example .data, .code, .stack and @data are predefined in Microsoft assemblers, and the corresponding assume statements are implied
 Here another string is printed, that string is “Hello”
 Note again the $ terminator, which must be supplied
 $ has different meanings on different systems and assemblers :-)
 E.g. in an assembler expression (in nasm as well as masm), $ means the current location counter; the $ string terminator is a DOS convention
 Under Microsoft assembler SW, the symbol @data is predefined by ML (or masm), same as seg data
 Note again the offset operator, which computes the byte distance (i.e. offset) from the start address of the segment
35
Assembler Abbreviations
; Source file: out4.asm
; Purpose: simpler program to output string
.model small               ; assumes stack data code
.stack 10h                 ; assumes segment name: stack
.data                      ; assumes segment name: data
hi      db "Hello$"
.code                      ; assumes segment name: code
start:  mov ax, @data      ; @data predefined macro
        mov ds, ax         ; now data segment reg set
        mov dx, offset hi  ; string 2 b output by System SW
        mov ah, 9h         ; System SW 9h emits string
        int 21h            ; call System SW
        mov ax, 4c00h      ; we want to terminate: ah + al
        int 21h            ; terminate finally
        end start          ; start here!
36
Assembler Abbreviations
 Note again the System SW routine 9 under Microsoft system SW, to output some $-terminated string of characters, whose address is found in ds:dx
 Program using .model small abbreviation is smaller,
more compact, easier to read
 The .code ends previous segment, if any (here data)
 And starts code segment
 The .data ends previous segment, if any
 And starts the data segment
 So how does one output a string that contains the ‘$’ character?
37
Macros
38
Macros
 Programmers get tired of writing segment … ends
 The .model small allows defaults and abbreviations
 Macros make program source more readable, easier
to maintain; here are the rules:
 Macros can be defined anywhere in assembler source, as long as each definition precedes its first use
 The initial assembler translation process extracts all
macro definitions, stores them during assembly time,
and uses (expands) them, each time a macro name is
found in the asm source
 A macro definition is introduced by a user-defined name followed by the macro keyword
 Terminated by the endm keyword
39
Macros
; Source file: out5.asm
; Purpose: macro-ized program to output character string
start   macro                 ; no parameters
        mov  ax, @data        ; @data predefined macro
        mov  ds, ax           ; now data segment reg set
        endm                  ; end of start macro
Put_Str macro Str             ; one formal parameter, "Str"
        mov  dx, offset Str   ; string 2 b output by DOS
        mov  ah, 9h           ; DOS call 9h emits string
        int  21h              ; call system SW
        endm                  ; end of Put_Str macro
Done    macro ret_code        ; formal parameter "ret_code"
        mov  ah, 4ch          ; want to terminate, ah = 4c
        mov  al, ret_code     ; communicate: all is o.k.
        int  21h              ; terminate finally via DOS
        endm                  ; end of macro body: Done
40
Macros, Program Cont’d
.model small          ; allow predefined assumptions
.stack 10h            ; assumes segment name: stack
.data                 ; assumes segment name: data
hi      db "Hello$"   ; terminate string with $
.code                 ; assumes segment name: code
                      ; compare to page 31 (out3.asm)! Way shorter!
main:   start         ; use of macro "start"
        Put_Str hi    ; invoke macro "Put_Str" with hi
        Done 0        ; use of macro "Done"
        end main      ; start here!
41
Macros
 Macros specify 0 or more formal macro parameters, which can be referenced in the macro body
 At the place of macro definition, these parameters are called formal parameters
 Formal parameters follow the macro keyword at the place of definition
 At the place of use (the place where the macro is expanded) formals are substituted by actual parameters
 When the macro name is used, its body is expanded inline at that place, with all actual parameters taking the place of the formal ones
42
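The substitution of actual for formal parameters can be illustrated with a toy expander. It is a deliberately simplified, hypothetical model of textual macro expansion (no local labels, nesting, or other masm macro features); its macro table mirrors start, Put_Str and Done from out5.asm.

```python
import re

# Toy macro table: name -> (formal parameters, body lines)
MACROS = {
    "start": ([], ["mov ax, @data", "mov ds, ax"]),
    "Put_Str": (["Str"], ["mov dx, offset Str", "mov ah, 9h", "int 21h"]),
    "Done": (["ret_code"], ["mov ah, 4ch", "mov al, ret_code", "int 21h"]),
}


def expand(line):
    """Return the lines that replace `line`: a macro body if it names a macro."""
    parts = line.split(None, 1)
    if not parts or parts[0] not in MACROS:
        return [line]                      # ordinary instruction: unchanged
    formals, body = MACROS[parts[0]]
    actuals = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
    if len(actuals) != len(formals):
        raise ValueError(f"{parts[0]} expects {len(formals)} parameter(s)")
    if not formals:
        return list(body)
    binding = dict(zip(formals, actuals))
    # Substitute each whole-word occurrence of a formal by its actual parameter
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, formals)) + r")\b")
    return [pattern.sub(lambda m: binding[m.group(1)], b) for b in body]


for src in ["start", "Put_Str hi", "Done 0"]:
    print("\n".join(expand(src)))
```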
Assembler Procedures:
Like High-Level Language Procedures
43
Procedures
 An assembler procedure is delimited by the keywords proc and endp
 Procedures can be called and provide a syntactic grouping mechanism to form physical modules containing logically connected actions
 The Microsoft syntax rule for procedure names does not allow the colon used after labels
 The return instruction ret returns control to the place of call, immediately after the call instruction
 Physical procedure definitions allow logical
modularization
44
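The call/return mechanism relies on the stack: call pushes the address of the instruction following it, and ret pops that address back into ip. A toy model, assuming a 16-bit near call; the class, its fields, and the insn_len parameter (the encoded length of the call, 3 bytes for a near call with a 16-bit displacement) are hypothetical:

```python
class ToyCPU:
    """Hypothetical model of a 16-bit near call/ret: just ip, sp and a stack."""

    def __init__(self, sp=0x100):
        self.ip = 0
        self.sp = sp
        self.mem = {}                          # stack memory: address -> 16-bit word

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF       # x86 stacks grow toward lower addresses
        self.mem[self.sp] = word & 0xFFFF

    def pop(self):
        word = self.mem.pop(self.sp)           # KeyError models popping an empty stack
        self.sp = (self.sp + 2) & 0xFFFF
        return word

    def call(self, target, insn_len):
        """Push the address of the instruction after the call, then jump."""
        self.push(self.ip + insn_len)
        self.ip = target

    def ret(self):
        """Pop the return address into ip."""
        self.ip = self.pop()


cpu = ToyCPU()
cpu.ip = 0x0010
cpu.call(0x0050, 3)                # a 3-byte call at 0010h to a procedure at 0050h
print(hex(cpu.ip), hex(cpu.sp))    # 0x50 0xfe
cpu.ret()
print(hex(cpu.ip), hex(cpu.sp))    # 0x13 0x100
```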
Procedures
; Source file: out6.asm
; Purpose: modular macro program to output string
start   macro                 ; no parameters
        mov  ax, @data        ; @data predefined macro
        mov  ds, ax           ; now data segment reg set
        endm                  ; end of "start" macro body
Put_Str macro Str             ; "Str" must be data label
. . .
.data                         ; assumes name: data
hi      db "Hello$"           ; terminate string with $
.code                         ; assumes name: code
main    proc                  ; begin of procedure body
        start                 ; invoke "start" macro
        Put_Str hi            ; invoke "Put_Str" with actual hi
        Done 0                ; invoke "Done" with actual 0
        ret                   ; redundant return
main    endp
        end main              ; entry point is "main"
45
Procedures
 As in high-level language programs, procedures are a key syntax tool to modularize
 Eases the pain of asm programming
 Physical modules (procedures) encapsulate data and actions that belong together
 Physical modules, delineated by the proc and endp keywords, are the language tool to define such logical modules
 Net result: programs that are easier to write, and
above all, easier to read
46
Assembly and Linking of Full Programs
47
Assembly
 Linking is the process of binding 2 or more pieces of
software together in a way that they constitute one
running program
 Clearly the start address, where execution begins,
must be defined, by convention
 Typical tools to assemble and link include:
1. Microsoft Macro Assembler masm
2. Borland Turbo Assembler tasm
3. Microsoft Macro Assembler ml
4. Microsoft Linker link
5. Borland Linker tlink
48
Assembly With MASM
 The older version of the Microsoft Macro Assembler is named masm
 A newer assembler from Microsoft is named ml
 This section explains the masm command briefly
 The masm command in version 5.10 and older has 4
arguments, separated from one another by commas.
These arguments are file names
 Arguments are considered omitted, if no comma (and
thus no file name) is given
 The assembler prompts for each omitted one, so it is generally better to provide them, or at least the commas, lest there be repeated interaction with the assembler asking for file names, or hitting of carriage returns
49
Assembly With MASM
 It is a nuisance in masm 5.10 that the last comma (the
third one to separate 4 arguments) must be followed
by another comma (or semicolon, indicating the end
of a command line)
 Else the assembler does not recognize that the
default should be used for the fourth argument
 If commas without file names are given, then default
file names are assumed
 The four file names, which are the arguments of the
masm command, are left to right:
50
Assembly With MASM
1. assembly source program, e.g. source.asm
2. object program generated by assembler, e.g.
source.obj
3. the listing, generated by the assembler, say
source.lst; yes, in days of old, people actually created
paper listings of programs being processed
4. the cross-reference file, named source.crf
51
Assembly With MASM
 Suffixes obj, lst, and crf are automatically generated
by the assembler, if no other names are provided
 Some complete masm commands for the assembly source file src1.asm would be:
masm src1.asm, src.obj, src.lst, src.crf   ; no prompting
masm src1,src1,src1,src1                   ; no prompting
masm src1,src1.obj,src1,src1.crf           ; no prompting
masm src1,,,;                              ; no prompting
 In the above cases the assembler will not prompt you, because you provided all file names or explicitly requested the defaults
 It derives the suffixes (like .lst and .obj) from the respective argument positions
52
Assembly With MASM
 Some incomplete masm commands for source file
src2.asm, are shown next
 The assembler will prompt the user for the missing
ones:
masm src2.asm, src2.obj   ; asks for: list, cross ref file (xref)
masm src2,foo,src2        ; creates foo.obj, src2.lst, asks xref
masm src2,,bar.lst        ; creates src2.obj, bar.lst, asks xref
masm src2                 ; asks for object, list, cross ref file
 Borland Turbo Assembler tasm 5.10
 Similar to masm, but the command is tasm
53
Linking Assembler Programs
54
Linking
 The Microsoft link command also has 4 arguments: the input object files, the executable and load map output files, and the libraries
 Input is the object to be linked
 The object may be a concatenation of multiple object
files, typically ending in the .obj suffix, concatenated
via the + operator. For example:
link mem0 + putdec,,,
 creates an executable mem0.exe
 The file name mem0 is derived from the first part of
the first argument; suffix .exe is assumed
 Also, the object file putdec.obj is used as input, used
to resolve external names used in mem0.obj
55
Linking
 The link command has 4 arguments: the 4 file names
are:
1. object files, concatenated by + with default suffix .obj
2. the linked executable with suffix .exe
3. the load map file, whose name ends in .map
4. the library
 If the input file is provided without suffix then the
suffix .obj is assumed
 If the executable file is specified without suffix, then
.exe is assumed
 Any other file and suffix is allowable too
56
Linking
 The file for the load map can be specified
 If none is provided, the linker uses the file name nul, i.e. no map file is written
 If no file suffix is provided, then the .map suffix is assumed. Similarly, a library file name may be specified
 The suffix is .lib
 The commands below do not cause the linker to
prompt for additional file name inputs, because
sufficient information is assumed:
link mem0 + putdec,,,,          ; mem0.exe, no map, no library
link mem0+putdec,foo.bar,,,     ; generate executable foo.bar
link putdec+mem0,mem0.exe,,,    ; mem0.exe
57
Linking
 The concatenation operator + may be surrounded by any number of blanks
 Commas may be surrounded by 0 or more blanks
 The order of specifying object files is immaterial,
provided the main entry point is unambiguous
 The commands below cause the linker to prompt for
some additional information:
link mem0 + putdec        ; asks for executable, map, and library
link mem0+putdec,x.y      ; asks for map and lib
link putdec+mem0,,        ; gen putdec.exe, asks for map and lib
58
Main Entry Point
 Each assembly unit (.asm source file) must end in an end directive (in Microsoft terms, the end statement)
 This end statement may have a label, identifying one of the labels or proc names of the program. Such a label specifies the entry point, i.e. the initial value of ip, set by the loader
 However, if an executable file is composed of multiple
objects, there must be one single entry point. All
other source modules should not specify an
argument after their end statement
 If, however, two or more object modules to be linked into an executable do have entry points specified, the linker does not complain!
 Instead, it takes the first one of the objects listed as
the first argument in the link command. And if this is
not the intended entry point, program execution will
bring surprises
59
nasm Assembler
60
Nasm Assembler
Simplest possible, meaningful asm program that outputs a character string. Assumes translation via the nasm command (Netwide Assembler) for 32-bit Linux, e.g. nasm -f elf32, linked with ld -m elf_i386 on a 64-bit system
1. ; introduces a comment, until the end of the source line
2. %define macro_name value: the macro name is replaced by value wherever it is found
3. section pseudo instruction defines one of various data sections, or the code section
4. mov moves bits into the destination (register or memory) on the left, from the source on the right
5. $ pseudo-operator means: current value of the location counter
6. int 80h is an x86 instruction that calls the Linux kernel, which reads the GPRs (eax: system call number; ebx, ecx, edx: arguments) to determine what to do
61
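The msglen line in the program relies on item 5: while assembling, $ holds the offset of the next byte to be emitted, so $ - message equals the number of bytes emitted since label message. A hypothetical model of such a location counter (only db-style byte strings and the form $ - name are handled):

```python
def assemble_data(items):
    """Hypothetical model of a data section's location counter ($).

    `items` is a list of (label, payload) pairs. A bytes payload acts like
    `label: db ...` and emits bytes; a string payload "$ - name" acts like
    `label: equ $ - name` and emits nothing. Returns label -> value.
    """
    symbols = {}
    location = 0                          # current value of $
    for label, payload in items:
        if isinstance(payload, bytes):
            symbols[label] = location     # a data label names its start offset
            location += len(payload)      # emitting bytes advances $
        else:
            dollar, minus, name = payload.split()
            if (dollar, minus) != ("$", "-"):
                raise ValueError("only '$ - name' is modeled")
            symbols[label] = location - symbols[name]
    return symbols


syms = assemble_data([("message", b"Hello CCUT class"), ("msglen", "$ - message")])
print(syms)  # {'message': 0, 'msglen': 16}
```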
Nasm Assembler
; Asm:  Netwide Assembler (nasm)
; Note: uses Linux system calls, not Microsoft!
; Define convenient symbolic names for Linux system calls
%define __NR_exit    1       ; symbolic names system dependent
%define __NR_write   4       ; 4 for output under Linux
%define STDOUT_FILE  1       ; 1 for standard out under Linux
section .data                ; Other section names: .rodata and .bss
                             ; have specific, and distinct, meanings
message: db "Hello CCUT class"
msglen:  equ $ - message     ; # bytes in message
section .text                ; All executable code is in the .text section
global _start                ; required to announce name "_start" to linker
_start:                      ; used by linker; similar to "main()" in C
62
Nasm Assembler
; Display the string on stdout
        mov eax, __NR_write   ; system call number for write
        mov ebx, STDOUT_FILE  ; write string to stdout
        mov ecx, message      ; address of string
        mov edx, msglen       ; number of bytes to write
        int 80h               ; call Linux
; Exit the program
        mov eax, __NR_exit    ; system call number for exit
        mov ebx, 0            ; exit status 0: "success"
        int 80h               ; call Linux
63
Summary
 Comments introduced by ;
 The .model directive tells the assembler which memory model to use, and pulls in predefined symbols and defaults
 .stack is one such abbreviation; it tells the assembler to include a stack segment (of the given size) in this program
 Leftmost column used for optional labels
 Labels are symbolic names you can refer to in the
source; eases relocation
 Next column used for commands or pseudo
commands; but if no label is used, first string is the
asm command
 data_s is a symbolic name chosen to name a data
segment
 Define a string literal by embedding it between a pair of double quotes, e.g. "Hello ECE class"; remember the ‘$’ terminator for DOS output
64
Summary
 The ends pseudo instruction says: end of segment; the same segment may be opened and closed again any number of times
 The assume pseudo instruction tells the assembler which segment the cs and ds registers are assumed to address; it does not load them, hence ds is set explicitly via ax
 The segment 'code' pseudo instruction defines the code segment
 mov is the instruction to move bits to/from a register or memory, or (as source) a literal
 mov dx, offset message uses only the offset part of the segment:offset address of message
 The int 21h instruction is an x86 software interrupt (really a system call) that uses other registers to determine what to do
 The end start pseudo instruction says: start execution at the address of the label start
65
Appendix:
Some Definitions
66
Definitions
Address
 Identity of any one of the distinguishable memory
units, e.g. bytes or words
 On the x86 architecture a logical address is a pair
seg:offset, which is translated by the hardware into
linear address
 The segment and the offset are 16 bits long each in
real mode
 The machine address, called a linear address, is 20
bits long on the original x86 microprocessor
 Since the 1980s Intel has produced the more famous
32-bit version of its x86 μP, and since the 2000s, the
64-bit version has become common
67
Definitions
Assembler
 A source to object translator, reading relocatable,
abstract, machine-specific source programs,
translating them into binary object code
 After linking, the binary code is executable
68
Definitions
Binary Object
 These are strings of bits, which, when interpreted by
the target machine, are legal machine operations plus
associated memory references
 Jointly, these bit strings represent executable
programs
69
Definitions
Code Segment
 The code segment is a subsection of memory which
holds executable instructions
 It is possible to embed data, e.g. constants, in the code segment, but these are not meant for execution; generally they are prevented from being executed by a branch around the embedded data
 On the x86 microprocessors, the start address of the
code segment is identified by the cs register
 A complete program is comprised of one or more
code segments
70
Definitions
Data Segment
 Subsection of memory which holds data to be
manipulated
 Like any segment, a data segment is identified by a
segment register, holding its start address
 Such an address must be evenly divisible by 16 on
the x86 family processors
 Such aligned addresses are also the starts of
paragraphs
71
Definitions
Offset
 Byte distance of a named object (addressable unit)
from the beginning of an area that encompasses the
name
72
Definitions
Relocation, Relocatability
 Ability of data to be placed in any location of memory
 For example, referring to data (or object code) by
offsets relative to some start address allows the code
to be placed anywhere, as long as the respective start
address is always added at execution time
 Even object code can be relocatable, if all address references in that code are relative to the code’s start address
73
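When code does contain absolute address references, a loader can still relocate it by adding the start address to each such reference at load time; DOS EXE files carry a relocation table for exactly this purpose. A minimal sketch with a hypothetical word-list image and fixup list, not the format of any real loader:

```python
def relocate(image, fixups, load_base):
    """Hypothetical loader step: place a word image at load_base.

    Words at the indices in `fixups` hold addresses relative to the start of
    the image; adding the load base turns them into absolute addresses.
    All other words (instructions, plain data) are copied unchanged.
    """
    loaded = list(image)
    for i in fixups:
        loaded[i] = (image[i] + load_base) & 0xFFFF   # keep 16-bit words
    return loaded


image = [0x1111, 0x0004, 0x2222, 0x3333]   # word 1 refers to offset 0004h
print([hex(w) for w in relocate(image, [1], 0x0100)])
print([hex(w) for w in relocate(image, [1], 0x0800)])
```

The same image can thus be loaded at any base; only the fixed-up words change, and the original image is left untouched.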
Definitions
Segment
 Subsection of memory with no fixed or predefined length; in real mode a segment spans at most 64 KB, since offsets are 16 bits
 A segment is identified by a segment register and
holds either code, data, or stack space
74
Definitions
Stack
 Data structure holding data, identified by a stack segment register (ss). Access to these data is restricted in a specific way, often referred to as last-in, first-out
 The amount of actively live data varies over time:
 Increase of data is accomplished through an
operation called pushing, decreases via popping
 A stack segment register points to the beginning of
the stack
 While the stack pointer register (sp) points to the current top
 This top (i.e. the value of the sp register) varies
frequently during execution
75
Definitions
Top of Stack
 The element on the stack that is immediately accessible, AKA addressable
 That element is said to be “at the top”
 There may be other elements in the stack as well,
hidden by the top element
 Additional elements are created by pushing, and
elements are removed by popping
 If the stack is empty, and the top element is accessed,
an error occurs
76