Assembly Language Fundamentals
Chapter 2
Directives and Instructions
Assembly language statements are either directives or
instructions
Instructions are executable statements. They are translated
by the assembler into machine instructions. Ex:
call MySub ;transfer of control
mov ax,5 ;data transfer
Directives tell the assembler how to generate machine code
and allocate storage. Ex:
count db 50 ;creates 1 byte of storage initialized to 50
A Template for Assembly Language Programs
.386 = directive to accept all instructions of the 386 and previous processors (use .586 to assemble Pentium-specific instructions)
end = directive that marks the end of the program
main = label of the entry point of the program (first instruction to execute)
ret = instruction that returns control to the caller (here the Win32 console)
Macros to perform I/O are included in csi2121.inc
.386
.model flat
include csi2121.inc
.data
;data allocation directives here
.code
main:
;instructions here
ret
end
The FLAT Memory Model
The .model flat directive tells the assembler to generate code
that will run in protected mode and in 32-bit mode
It also asks the assembler to do whatever is needed so
that code, stack, and data share the same 32-bit memory
segment
All the segment registers will be loaded with the correct
values at load time and do not need to be changed by the
programmer
Only the offset part of a logical address becomes relevant
Each data byte (or instruction) is referred to only by a 32-bit
offset address
The directives .code and .data mark the beginning of the
code and data segments. They are used only for protection
.code is read-only
.data is read and write
Steps to Produce an Executable File
[Figure: source file -> assembler -> object file -> linker -> executable file; the linker also takes a library as input]
The assembler produces an object file from the assembly
language source
The object file contains machine language code with some
external and relocatable addresses that will be resolved by
the linker. Their values are undetermined at that stage.
The linker extracts object modules (compiled procedures)
from a library and links them with the object file to produce
the executable file.
The addresses in the executable file are all resolved but they
are still logical addresses.
Using Borland’s BCC32
All these steps are performed with the command:
bcc32 -v hello.asm
The bcc32 command calls TASM32 to assemble
and produce an object file
It then calls ILINK32 to link this object file with the
C/C++ library functions and Win32 functions used
by the program to produce the executable file
hello.exe
The -v option produces full debugging info
See the LabInfo page for all the info you need
Names
A name identifies either:
a variable
a label
a constant
a keyword (assembler-reserved word).
Names (Cont.)
A variable is a symbolic name for a location in memory that
was allocated by a data allocation directive. Ex:
count db 50 ;allocates 1 byte to variable count
A label is a name given to an instruction. It must be followed
by ':'. Ex:
main:
mov eax, 5
xor eax, ebx
jmp main
Names (Cont.)
The first character must be a letter or any one of
‘@’, ‘_’, ‘$’, ‘?’
Subsequent characters can include digits
A programmer-chosen name must be different
from every assembler reserved word
Avoid using ‘@’ as the first character since many
keywords start with it
When called from bcc32, the TASM32 assembler is
case sensitive for user-defined words but case
insensitive for the assembler reserved words
Integer Constants
Integer constants are made of numerical digits
with, possibly, a sign and a suffix. Ex:
-23 (a negative integer, base 10 is default)
1011b (a binary number)
1011 (a decimal number)
0A7Ch (a hexadecimal number)
A7Ch (this is the name of a variable; a hexadecimal
number must start with a decimal digit)
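As a small illustration, the sketch below writes the same value in three notations. It assumes the program template and the putint and putch macros from csi2121.inc that are described elsewhere in these notes; the variable names are made up for the example.

```asm
.386
.model flat
include csi2121.inc
.data
decVal dd 11 ;decimal (default base)
binVal dd 1011b ;binary notation for 11
hexVal dd 0Bh ;hexadecimal notation for 11 (leading 0 required)
.code
main:
putint decVal
putch 10
putint binVal
putch 10
putint hexVal
putch 10
ret
end
```

If the macros behave as described, all three output lines should show 11: the suffix changes only how the constant is written, not the value that is stored.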
Character and String Constants
They are any sequence of characters enclosed
either in single or double quotation marks.
Embedded quotes are permitted. Ex:
'A'
'ABC'
"Hello World!"
"123" (this is a string, not a number)
"This isn't a test"
'Say "hello" to him'
Simple Data Allocation Directives
The DB (define byte) directive allocates storage for
one or more byte values
[name] DB initval [,initval]
Each initializer can be any constant. Ex:
a db 10, 32, 41h ;allocate 3 bytes
b db 0Ah, 20h, 'A' ;same values as above
A question mark (?) in the initializer leaves the initial
value of the variable undefined. Ex:
c db ? ;the initial value for c is undefined
Everything that follows ';' on a line is ignored by the
assembler; it is thus a comment
Simple Data Allocation Directives (cont.)
A string is stored as a sequence of characters. Ex:
aString db "ABCD"
bString db 'A','B','C','D' ;same values
cString db 41h,42h,43h,44h ;same values again
The (offset) address of a variable is the address of its first byte.
Ex: If the following data segment starts at address 0
.data
Var1 db "ABC"
Var2 db "DEFG"
The address of Var1 is 0 = the address of ‘A’
The address of ‘B’ is 1
The address of ‘C’ is 2
The address of Var2 is 3
The address of ‘E’ is 4 …
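The layout above can be checked with a short program. This is only a sketch: it borrows the displacement notation, MOVZX, and the putch macro that are introduced later in these notes, and it assumes the assembler places Var2 immediately after Var1, as described above.

```asm
.386
.model flat
include csi2121.inc
.data
Var1 db "ABC"
Var2 db "DEFG"
.code
main:
movzx eax,Var1+3 ;byte 3 bytes past Var1, i.e. the first byte of Var2
putch eax ;expected to print D
putch 10
ret
end
```

Var1+3 and Var2 name the same byte here because the two strings are stored back to back with no terminator between them.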
Simple Data Allocation Directives (cont.)
Define Word (DW) allocates a sequence of words.
Ex:
A dw 1234h, 5678h ; allocates 2 words
Intel x86 processors are little endian: the lowest
order byte (of a word or double word) is always
stored at the lowest address.
Ex: if variable A (above) is located at address 0, we
have:
address: 0   1   2   3
value:   34h 12h 78h 56h
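One way to observe the byte order is to read a word one byte at a time. The sketch below uses the `byte ptr` operator, a standard TASM/MASM size override that these notes do not cover, together with MOVZX and the putint macro described later; the variable name wVal is made up for the example.

```asm
.386
.model flat
include csi2121.inc
.data
wVal dw 1234h
.code
main:
movzx eax,byte ptr wVal ;byte at the lowest address: 34h
putint eax ;expected to print 52 (= 34h)
putch 10
movzx eax,byte ptr wVal+1 ;byte at the next address: 12h
putint eax ;expected to print 18 (= 12h)
putch 10
ret
end
```

The override is needed because wVal was declared with DW, so a plain byte-sized access would be rejected as a size mismatch.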
Simple Data Allocation Directives (cont.)
Define Double Word (DD) allocates a sequence of double
words. Ex:
B dd 12345678h ;allocates 1 double word
If this variable is located at address 0, we have:
address: 0   1   2   3
value:   78h 56h 34h 12h
If a value fits into a byte, it is stored in the lowest-order
byte and the remaining bytes are filled with zeros. Ex:
V dw 'A'
the value will be stored as:
address: 0   1
value:   41h 00h
Simple Data Allocation Directives (cont.)
The DUP operator enables us to repeat values when
allocating storage. Ex:
a db 100 dup(?) ;100 bytes uninitialized
b db 3 dup("Ho") ;6 bytes: "HoHoHo"
DUP can be nested:
c db 2 dup('a', 2 dup('b')) ;this allocates 6 bytes: 'abbabb'
DUP must be used with data allocation directives
There is a bug in some TASM32 versions:
b db 3 dup("Ho")
will allocate 6 bytes that are filled with 0 (i.e. the specified
initial values are ignored).
Constants
We can use the equal-sign (=) directive or the EQU
directive to give a name to a constant. Ex:
one = 1 ;this is a constant
two equ 2 ;also a constant
The EQU and = directives are equivalent
The assembler does not allocate storage to a
constant (in contrast with data allocation
directives)
It merely substitutes, at assembly time, the value
of the constant at each occurrence of the
assigned name
Constants (cont.)
In place of a constant, we can use a constant
expression involving the standard operators used
in HLLs: +, -, *, /
Ex: the following constant expression is evaluated
and given a name at assembly time:
A = (-3 * 8) + 2
A constant can be defined in terms of another
constant:
B = (A+2)/2
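The following sketch shows named constants being used as immediate operands. The names rows, cols, cells and half are hypothetical, and the program relies on the putint and putch macros from csi2121.inc described later in these notes.

```asm
.386
.model flat
include csi2121.inc
rows = 4
cols = 8
cells = rows * cols ;evaluated by the assembler: 32
half equ cells / 2 ;defined in terms of another constant: 16
.code
main:
putint cells ;the name is replaced by the immediate value 32
putch 10
putint half ;expected to print 16
putch 10
ret
end
```

No .data section is needed here: the constants occupy no storage and exist only while the program is being assembled.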
Exercise 1
Suppose that the following data segment starts at
address 0
.data
A DW 1,2
B DW 6ABCh
Z EQU 232
C DB 'ABCD'
A) Find the address of variable A.
B) Find the address of variable B.
C) Find the address of variable C.
D) Find the address of character ‘C’.
Data Transfer Instructions
The MOV instruction transfers the content of the source
operand to the destination operand
mov destination,source
This changes the content of destination (but not the content
of source)
Both operands must be of the same size.
An operand can be either direct or indirect
Direct operands (this chapter) are either:
Immediate (a constant): noted imm
Register: noted reg
Memory variable (with displacement), noted mem
Indirect operands are used for indirect addressing (later
chapter)
Data Transfer Instructions (cont.)
Some restrictions on MOV:
imm cannot be the destination operand...
EIP cannot be an operand
Source and destination cannot both be mem.
Direct memory-to-memory data transfer is
forbidden!
mov wordVar1,wordVar2 ;illegal
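The usual workaround is to pass the value through a register. A minimal sketch, assuming the course template and the putint macro described later in these notes (MOVZX, also covered later, is used only to widen the result for printing):

```asm
.386
.model flat
include csi2121.inc
.data
wordVar1 dw 0
wordVar2 dw 1234h
.code
main:
mov ax,wordVar2 ;mem to reg
mov wordVar1,ax ;reg to mem
movzx eax,wordVar1 ;widen to 32 bits for putint
putint eax ;expected to print 4660 (= 1234h)
putch 10
ret
end
```

Any 16-bit register would serve as the intermediate; AX is used here only because it is free.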
Data Transfer Instructions (cont.)
The type of an operand is given by its size (byte,
word, doubleword…)
Both operands of MOV must be of the same type
Type check is done by the assembler
The type assigned to a mem operand is given by
its data allocation directive (DB, DW…)
The type assigned to a register is given by its size
An imm source operand of MOV must fit into the
size of the destination operand
Data Transfer Instructions (cont.)
Examples of MOV usage:
mov bh, 255 ;8-bit operands
mov al, 256 ;error: constant too large
mov bx, AwordVar ;16-bit operands
mov bx, AbyteVar ;error: size mismatch
mov edx, AdoublewordVar ;32-bit operands
mov cx, bl ;error: size mismatch
mov wordVar1,wordVar2 ;error: mem-to-mem
MOVZX: Move with Zero Extend
Often we want to move the content of a source operand into
a destination operand of larger size
The MOVZX instruction does this operation by filling with
zeros the high order part of the destination. Usage:
MOVZX destination,source
Immediate operands are not allowed here
The size of destination must be strictly larger than the size
of source
Example:
mov bh, 80h
movzx ah,bh ;illegal, size mismatch
movzx ax,bh ;AX = 0080h
movzx ecx,ax ;ECX = 00000080h
Notice that if the signed value in the source operand is
negative, then MOVZX will not preserve the sign.
mov bh, 80h ;BH = 80h (negative)
movzx ax,bh ;AX = 0080h (positive)
MOVSX: Move with Sign Extend
We can use the MOVSX instruction to preserve the sign of
the source operand. Usage:
MOVSX destination,source
The high order part of the destination operand will be the
sign extension of the source operand
The sign extension of a negative number is …111111
The sign extension of a positive number is …0000000
Examples:
mov bh, 80h ;BH = 80h (negative)
movsx ax,bh ;AX = FF80h (negative)
;FFh is the sign extension of 80h
mov bl, 7Ah ;BL = 7Ah (positive)
movsx ax,bl ;AX = 007Ah (positive)
;00h is the sign extension of 7Ah
MOVSX preserves the signed value whereas MOVZX
preserves the unsigned value
Immediate operands are not allowed and the size of
destination must be strictly larger than the size of source.
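The difference between the two instructions shows up when the same byte is extended both ways and printed as a signed integer. This is a sketch that assumes the course template and the putint macro described later; bVal is a made-up name.

```asm
.386
.model flat
include csi2121.inc
.data
bVal db 80h
.code
main:
movzx eax,bVal ;EAX = 00000080h (unsigned value kept)
putint eax ;expected to print 128
putch 10
movsx eax,bVal ;EAX = FFFFFF80h (signed value kept)
putint eax ;expected to print -128
putch 10
ret
end
```

Which instruction to use therefore depends on whether the program treats the byte as signed or unsigned; the bits in memory are the same in both cases.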
Data Transfer Instructions (cont.)
We can add a displacement to a memory operand to access a
memory value without a name. Ex:
.data
arrB db 10h, 20h
arrW dw 1234h, 5678h
arrB+1 refers to the location one byte beyond the beginning of
arrB and arrW+2 refers to the location two bytes beyond the
beginning of arrW.
mov al,arrB ;AL = 10h
mov al,arrB+1 ;AL = 20h (mem with displacement)
mov ax,arrW+2 ;AX = 5678h
mov ax,arrW+1 ;AX = 7812h (little endian convention!)
mov ax,arrW-2 ;AX = 2010h (negative displacement permitted)
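Displacements make it possible to process the elements of a small array one by one. The sketch below adds the two words of arrW. It assumes the course template and uses ADD and the putint macro, both described later in these notes.

```asm
.386
.model flat
include csi2121.inc
.data
arrW dw 1234h, 5678h
.code
main:
mov ax,arrW ;AX = 1234h (first element)
add ax,arrW+2 ;AX = 1234h + 5678h = 68ACh
movzx eax,ax ;widen to 32 bits for putint
putint eax ;expected to print 26796 (= 68ACh)
putch 10
ret
end
```

The displacement is counted in bytes, not in elements, which is why the second word is at arrW+2 rather than arrW+1.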
Data Transfer Instructions (cont.)
The XCHG instruction exchanges the content of
the source and destination operands:
XCHG destination,source
Only mem and reg operands are permitted (and
must be of the same size)
Both operands cannot be mem (direct mem-to-mem exchange is forbidden).
To exchange the content of word1 and word2, we
have to do:
mov ax,word1
xchg word2,ax
mov word1,ax
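Placed in a complete program, the three-instruction exchange can be verified by printing both variables afterwards. This sketch assumes the course template and the putint macro described later; the initial values 1 and 2 are arbitrary.

```asm
.386
.model flat
include csi2121.inc
.data
word1 dw 1
word2 dw 2
.code
main:
mov ax,word1 ;AX = old word1
xchg word2,ax ;word2 = old word1, AX = old word2
mov word1,ax ;word1 = old word2
movzx eax,word1
putint eax ;expected to print 2
putch 10
movzx eax,word2
putint eax ;expected to print 1
putch 10
ret
end
```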
Exercise 2
Given the following data segment
.data
A dw 1234h,-1
B dd 55h,66778899h
For each of the following instructions, indicate whether it is legal. If it is, give
the value, in hexadecimal, of the destination operand
immediately after the instruction is executed (please verify
your answers with a debugger)
MOV eax,A
MOV bx,A+1
MOV bx,A+2
MOV dx,A+4
MOV cx,B+1
MOV edx,B+2
Simple Arithmetic Instructions
The ADD instruction adds the source to the
destination and stores the result in the
destination (source remains unchanged)
ADD destination,source
The SUB instruction subtracts the source from
the destination and stores the result in the
destination (source remains unchanged)
SUB destination,source
Both operands must be of the same size and
they cannot both be mem operands
Recall that to perform A - B the CPU in fact
performs A + NEG(B)
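The equivalence between A - B and A + NEG(B) can be tried out directly. The sketch below computes the same difference both ways. It assumes the course template, the NEG instruction and the putint macro described later in these notes; numA and numB are made-up names.

```asm
.386
.model flat
include csi2121.inc
.data
numA dd 100
numB dd 58
.code
main:
mov eax,numA
sub eax,numB ;EAX = 100 - 58 = 42
putint eax ;expected to print 42
putch 10
mov ecx,numB
neg ecx ;ECX = -58
mov eax,numA
add eax,ecx ;EAX = 100 + (-58) = 42
putint eax ;expected to print 42 again
putch 10
ret
end
```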
Simple Arithmetic Instructions (cont.)
ADD and SUB affect all the status flags according to the result
of the operation
ZF (zero flag) = 1 iff the result is zero
SF (sign flag) = 1 iff the msb of the result is one
OF (overflow flag) = 1 iff there is a signed overflow
CF (carry flag) = 1 iff there is an unsigned overflow
Signed overflow: when the operation generates an out-of-range (erroneous) signed value
Unsigned overflow: when the operation generates an out-of-range (erroneous) unsigned value
More on Overflows
An unsigned overflow occurs if and only if (IFF) the
unsigned value of the result does not fit into the
destination operand
This occurs IFF the unsigned interpretation of
the result is erroneous
It is signaled by CF=1
A signed overflow occurs IFF the signed value of
the result does not fit into the destination operand
This occurs IFF the signed interpretation of the
result is erroneous
It is signaled by OF=1
Simple Arithmetic Instructions (cont.)
Both types of overflow occur independently and are
signaled separately by CF and OF
mov al,0FFh
add al,1 ;AL=00h, OF=0, CF=1
mov al,7Fh
add al,1 ;AL=80h, OF=1, CF=0
mov al,80h
add al,80h ;AL=00h, OF=1, CF=1
Hence: we can have either type of overflow or both of
them at the same time
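Besides using a debugger, the flags can be copied into registers and printed. The sketch below does this for the second case above (7Fh + 1). It relies on the SETC and SETO instructions, which exist on the 386 and later but are not covered in these notes, and on the putint macro described later. The flag copies are saved in memory before any macro is called because the notes do not say whether the macros preserve registers or flags.

```asm
.386
.model flat
include csi2121.inc
.data
cfVal dd ? ;copy of CF after the addition
ofVal dd ? ;copy of OF after the addition
.code
main:
mov al,7Fh
add al,1 ;AL = 80h
setc cl ;CL = CF (SETcc does not change the flags)
seto dl ;DL = OF
movzx eax,cl
mov cfVal,eax
movzx eax,dl
mov ofVal,eax
putint cfVal ;expected to print 0
putch 10
putint ofVal ;expected to print 1
putch 10
ret
end
```

Changing the two operands of ADD to the other pairs listed above should reproduce the remaining CF/OF combinations.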
Overflow Example
mov ax,4000h
add ax,ax ;AX = 8000h
Unsigned Interpretation:
The sum of the 2 magnitudes 4000h + 4000h
gives 8000h. This is the result in AX (the
unsigned value of the result is correct). CF=0
Signed Interpretation:
We add two positive numbers (4000h + 4000h)
and obtain a negative number!
The signed value of the result in AX is erroneous.
Hence OF=1
Overflow Example
mov ax,8000h
sub ax,0FFFFh ;AX = 8001h
Unsigned Interpretation:
from the magnitude 8000h we subtract the
larger magnitude FFFFh
the unsigned value of the result is erroneous.
Hence CF=1
Signed Interpretation:
We subtract -1 from the negative number 8000h
and obtained the correct signed result 8001h.
Hence OF=0
Overflow Example
mov ah,40h
sub ah,80h ;AH = C0h
Unsigned Interpretation:
we subtract from 40h the larger number 80h
the unsigned value of the result is wrong.
Hence CF=1
Signed Interpretation:
we subtract from 40h (64) a negative number 80h
(-128) to obtain a negative number
the signed value of the result is wrong. Hence
OF=1
Exercise 3
For each of these instructions, give the content (in
hexadecimal) of the destination operand and the
CF and OF flags immediately after the execution of
the instruction (verify your answers with a
debugger).
ADD AX,BX when AX contains 8000h and BX
contains FFFFh.
SUB AL,BL when AL contains 00h and BL contains
80h.
ADD AH,BH when AH contains 2Fh and BH
contains 52h.
SUB AX,BX when AX contains 0001h and BX
contains FFFFh.
Simple Arithmetic Instructions (cont.)
The INC (increment) and DEC (decrement)
instructions add 1 to or subtract 1 from a single
operand (mem or reg operand)
INC destination
DEC destination
They affect all status flags, except CF. Suppose that
initially CF=OF=0:
mov bh,0FFh ;CF=0, OF=0
inc bh ;bh=00h, CF=0, OF=0
mov bh,7Fh ;CF=0, OF=0
inc bh ;bh=80h, CF=0, OF=1
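That INC leaves CF alone can be checked by setting CF first and reading it back after the increment. As in any sketch of this kind, SETC is a 386 instruction not covered in these notes, cfVal is a made-up name, and the putint macro is assumed to behave as described later.

```asm
.386
.model flat
include csi2121.inc
.data
cfVal dd ? ;copy of CF taken after INC
.code
main:
mov al,0FFh
add al,1 ;AL = 00h, CF = 1
mov cl,5 ;MOV does not change the flags
inc cl ;CL = 6, CF is left at 1
setc dl ;DL = CF
movzx eax,dl
mov cfVal,eax
putint cfVal ;expected to print 1
putch 10
ret
end
```

Replacing `inc cl` with `add cl,1` should print 0 instead, because ADD recomputes CF from its own result.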
Simple Arithmetic Instructions (cont.)
The NEG instruction replaces its operand with
its two's complement
NEG destination
Where destination is either mem or reg
CF=0 IFF the result is 0
OF=1 IFF there is a signed overflow. Ex:
mov ax,-5
neg ax ;CF = 1, OF = 0
mov ax,8000h
neg ax ;CF = 1, OF = 1 signed overflow!
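The overflow case is easy to see in 32 bits as well: the most negative value has no positive counterpart, so NEG returns it unchanged. A sketch, assuming the course template and the putint macro described later in these notes:

```asm
.386
.model flat
include csi2121.inc
.code
main:
mov eax,-5
neg eax ;EAX = 5
putint eax ;expected to print 5
putch 10
mov eax,80000000h ;most negative 32-bit value
neg eax ;EAX is still 80000000h (signed overflow)
putint eax ;expected to print -2147483648
putch 10
ret
end
```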
I/O on the Win32 Console
Our programs will communicate with the user via the Win32
console (the MS-DOS box)
Input is done on the keyboard
Output is done on the screen
Modern operating systems such as Windows forbid user programs from interacting
directly with I/O hardware
User programs can only perform I/O operations via system
calls
For simplicity, our programs will perform I/O operations by
using macros that are provided in the csi2121.inc file
These macros call C library functions like printf()
which, in turn, call the Win32 API
Hence, these I/O operations will be slow but simple to use
and easy to migrate to another OS
We will examine the mechanisms involved in I/O operations
later in the course
Character Output
The putch macro prints on the screen the character whose
ASCII code is given by the operand. Usage:
putch source
Where source must be a 32-bit operand
i.e. either imm, reg32, or mem32 (a double word variable)
.data
aword dw 41h
adword dd 61h
.code
putch aword ;error: 16-bit operand
putch adword ;'a' is written on screen
putch 'b' ;'b' is written on screen
mov eax,'c'
putch eax ;'c' is written on screen
putch ax ;error: 16-bit operand
Character Output (cont.)
Also: the cursor will advance one position after
printing the character
The putch macro calls the putchar() function from
the C library. Hence:
The number 10 = 0Ah will direct the cursor to the
beginning of the next line (the “newline character”
in C). So the <CR> and <LF> functions are both
performed on the screen.
putch 10 ;move the cursor to the beginning of next line
String Output
To print a string, use the following macro:
putstr source
Where source must be mem operand (i.e. the name of a
variable). It cannot be a reg or imm operand.
This macro calls printf(“%s”, ) of the C library. Hence:
The number 10 = 0Ah will move the cursor to the beginning of
the next line (the “newline character” in C)
The string must be a null-terminated string: the last
character must have ASCII code = 0h. Ex:
.data
msg db "hello",0ah,"world",0h
.code
putstr msg ;prints 'hello' on one line and 'world' on the next line
Integer Output
To print the signed value of an integer, use:
putint source
Where source must be a 32-bit operand
i.e. either imm, reg32, or mem32 (a double word variable).
Ex:
.data
aword dw 243
adword dd -266
.code
putint aword ;error: 16-bit operand
putint adword ;-266 is written on screen
putint -1 ; -1 is written on screen
mov eax,0FFFFFFFFh
putint eax ;-1 is written on screen
putint ax ;error: 16-bit operand
Character Input
To read characters from the keyboard, one at a time, we will use
the getch macro. Usage:
getch
This macro calls getchar() from the C library. So it uses a
memory buffer that we will call the input buffer.
Upon execution of getch, the input buffer is first examined.
If the input buffer is empty, then getch waits for the user to
enter an input line (a sequence of char ended by <CR>).
Each character that the user enters (at the keyboard) is
copied into the input buffer
When the user enters the <CR>: the screen cursor moves to
the next line, the value 0Ah is stored in the input buffer and
control passes to the instruction following getch
The ASCII code of the first character entered on the keyboard
will be stored in AL. The remaining bits of EAX are filled with
zeros. Ex:
mov eax,-1
getch ; eax=41h if the user first hits ‘A’
Character Input (cont.)
Example: Suppose that the input buffer is initially empty
and, upon execution of getch, the user enters
"hello"+<CR> on the keyboard.
Then, when control returns to the instruction following
getch, EAX contains 68h (= 'h') and the input buffer looks
like this:
input buffer: 'h' 'e' 'l' 'l' 'o' 0Ah
pointer to next char: 'e'
pointer to last char: 0Ah
If the input buffer is not empty when getch is executed, then
EAX will get loaded with the ASCII code of the next character
in the input buffer and the pointer to the next char will
increase by one.
The input buffer is empty only when the pointer to the next
char points beyond the last character (i.e: 0Ah)
The user is prompted only when the input buffer is empty
Character Input (example)
Try to understand this program:
.386
.model flat
include csi2121.inc
.code
main:
putch '?'
putch 10
getch
putch eax
getch
putch eax
getch
putch eax
ret
end
It first prints "?" and moves the cursor to the next line awaiting user input
When the user enters "abcdef" + <CR>, the program displays (before exiting):
abc
But if, instead, the user enters "a" + <CR>, the program displays:
a
and the cursor moves to the next line awaiting user input. If the user then
enters "bcdef" + <CR>, the program prints on the next line (before exiting):
b
I/O Example: Case Conversion
.386
.model flat
include csi2121.inc
.data
msg1 db "Enter a lower case letter: ",0
msg2 db 'In upper case it is: '
char db ?,0
.code
main:
putstr msg1
getch ;char in EAX; the cursor goes to the next line
sub al,20h ;converts to upper case
mov char,al
putstr msg2
ret
end
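The opposite conversion follows the same pattern, adding 20h instead of subtracting it. This variant is only a sketch built on the program above: it assumes the same macros and, like the original, does not check that the character typed is actually an upper case letter.

```asm
.386
.model flat
include csi2121.inc
.data
msg1 db "Enter an upper case letter: ",0
msg2 db 'In lower case it is: '
char db ?,0 ;stored right after msg2, so putstr msg2 prints both
.code
main:
putstr msg1
getch ;char in EAX; the cursor goes to the next line
add al,20h ;converts to lower case ('A' = 41h becomes 'a' = 61h)
mov char,al
putstr msg2
ret
end
```

Because msg2 has no terminating 0 of its own, putstr keeps printing into char and stops at the 0 that follows it.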