The Netwide Assembler: NASM


Computer Architecture and Assembly Language
Practical session outline
• Introduction
• 80x86 assembly
– Data storage
– The registers
– Flags
– Instructions
• Assignment 0
Introduction:
Administration:
- Background.
- Guy’s office hours:
Wednesday 16:00-18:00, room -105/58.
email: guyshat@cs….
- 4 practical assignments in the course, 1 theoretical.
Introduction:
Why assembly?
- Assembly is widely used in industry:
  - Embedded systems.
  - Real-time systems.
  - Low-level, direct access to hardware.
- Assembly is also widely used outside industry:
  - Cracking software protections: patching, patch-loaders and emulators
    (executable file compression, encryption, decryption).
  - Hacking into computer systems: buffer underflows and overflows (worms and
    trojans).
Byte structure:
A byte has 8 bits, numbered from 7 down to 0; bit 7 is the msb (most significant bit):
7 6 5 4 3 2 1 0
Data storage in memory:
NASM stores multi-byte data in little endian order, the byte order of the 80x86.
Little endian means that the low-order byte of the number is
stored in memory at the lowest address, and the high-order byte
at the highest address.
Example:
You want to store 0x1AB3 (a hex number) in memory.
This number has two bytes: 1A and B3.
It would be stored this way in a memory block of 2 bytes:
address 0: B3
address 1: 1A
Note: when the stored data is read back from memory as a 16-bit value, it comes out in the original order (0x1AB3).
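The byte order described above can be sketched in C. This is only an illustrative model, not part of the course material: the helper names store_le16 and load_le16 are hypothetical, and the array mem stands in for the 2-byte memory block.

```c
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value in little endian order. */
void store_le16(uint16_t value, uint8_t mem[2])
{
    mem[0] = (uint8_t)(value & 0xFF);   /* low-order byte at the lowest address */
    mem[1] = (uint8_t)(value >> 8);     /* high-order byte at the highest address */
}

/* Hypothetical helper: read the two bytes back as one 16-bit value. */
uint16_t load_le16(const uint8_t mem[2])
{
    return (uint16_t)(mem[0] | (mem[1] << 8));
}
```

Calling store_le16(0x1AB3, mem) leaves B3 in mem[0] and 1A in mem[1]; load_le16(mem) then gives back 0x1AB3.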
Registers:
The CPU contains a unit called the “register file”.
This unit contains registers of the following types:
1. 8-bit general registers: AL, BL, CL, DL, AH, BH, CH, DH
2. 16-bit general registers: AX, BX, CX, DX, SP, BP, SI, DI
3. 32-bit general registers: EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI
(Accumulator, Base, Counter, Data, Stack pointer, Base pointer,
Source index, Destination index)
4. Segment registers: ES, CS, SS, DS, FS, GS
5. Floating-point registers: ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7
6. Instruction pointer: IP
Note: the registers above are the basic ones. More registers exist.
Let's zoom in:
IP - instruction pointer:
contains the offset (address) of the next instruction that is going to be
executed. It exists only at run time and cannot be accessed directly.
AX, BX, CX, DX - 16-bit general registers:
each contains two 8-bit registers: the high byte XH and the low byte XL.
Example: AH (high byte) and AL (low byte) for AX.
EAX - 32-bit general purpose register: its lower 16 bits are AX.
Segment registers: we use the flat memory model, a 32-bit 4 GB address
space without segments. So for this course you can ignore
segment registers.
ST0 - floating-point registers: used for calculations on
floating point numbers; you can ignore these registers.
ESP - stack pointer: contains the address of the top of the stack (the most recently pushed item).
Let's zoom in (2):
• Some instructions use only specific registers.
Examples:
1. For the DIV r/m8 instruction, AX is divided by the given operand;
the quotient is stored in AL and the remainder in AH.
2. The LOOP imm,CX instruction uses the CX register as a counter register.
3. The LAHF instruction sets the AH register according to the contents of
the low byte of the flags word.
• We use the ESP and EBP registers to work with the stack.
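The fixed register usage of DIV r/m8 in example 1 can be modelled in C. This is a sketch under assumptions: the function name div_r8 is hypothetical, and the caller is assumed to pass a non-zero divisor and a quotient that fits in 8 bits (the real instruction raises a divide error otherwise).

```c
#include <stdint.h>

/* Hypothetical model of DIV r/m8: divides AX by an 8-bit operand and
   returns the new AX, with the remainder in the high byte (AH) and the
   quotient in the low byte (AL).
   Assumption: divisor != 0 and the quotient fits in 8 bits. */
uint16_t div_r8(uint16_t ax, uint8_t divisor)
{
    uint8_t quotient  = (uint8_t)(ax / divisor);   /* goes to AL */
    uint8_t remainder = (uint8_t)(ax % divisor);   /* goes to AH */
    return (uint16_t)((remainder << 8) | quotient);
}
```

For AX = 100 and an operand of 7, the model returns 0x020E: AH holds the remainder 2 and AL holds the quotient 14.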
Example for using registers:
instruction      content of the register AX after the instruction execution
                 AH          AL
mov ax, 0        00000000    00000000
mov ah, 0x13     00010011    00000000
mov ax, 0x13     00000000    00010011
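The relation between AX, AH and AL in this example can be mimicked in C with masks and shifts. The helper names below (mov_ah, get_ah, get_al) are hypothetical and only model the register contents; they are not NASM or CPU APIs.

```c
#include <stdint.h>

/* Hypothetical model of "mov ah, value": replaces the high byte of AX
   and leaves the low byte (AL) unchanged. */
uint16_t mov_ah(uint16_t ax, uint8_t value)
{
    return (uint16_t)((ax & 0x00FF) | (value << 8));
}

/* Read the high byte (AH) of AX. */
uint8_t get_ah(uint16_t ax) { return (uint8_t)(ax >> 8); }

/* Read the low byte (AL) of AX. */
uint8_t get_al(uint16_t ax) { return (uint8_t)(ax & 0xFF); }
```

Starting from ax = 0, mov_ah(ax, 0x13) gives 0x1300 (AH = 00010011, AL = 00000000), while assigning 0x13 to the whole of ax gives AH = 00000000 and AL = 00010011.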
Status Flags:
A flag is a bit of the Flags Register.
• The status flags provide information about the result of the last
(usually arithmetic) instruction that was executed.
This information can be used by conditional instructions (such as Jcc
and CMOVcc) as well as by some other instructions (such as
ADC).
There are 6 status flags:
CF - Carry flag: set if an arithmetic operation generates a carry or a
borrow out of the most-significant bit of the result; cleared otherwise.
This flag indicates an overflow condition for unsigned-integer arithmetic.
PF - Parity flag: set if the least-significant byte of the result contains an
even number of ‘1’ bits; cleared otherwise.
Status Flags (2):
AF - Adjust flag: set if an arithmetic operation generates a carry or a
borrow out of bit 3 of the result; cleared otherwise. This flag is used in
binary-coded decimal (BCD) arithmetic – not needed in our course.
ZF - Zero flag: set if the result is zero; cleared otherwise.
SF - Sign flag: set equal to the most-significant bit of the result, which is
the sign bit of a signed integer. (0 indicates a positive value and 1
indicates a negative value).
OF - Overflow flag: set if the integer result is too large a positive number
or too small a negative number (excluding the sign-bit) to fit in the
destination operand; cleared otherwise. This flag indicates an overflow
condition for signed-integer (two's complement) arithmetic.
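The six definitions above can be checked against a small C model of an 8-bit addition. This is a sketch, not the processor's implementation: the type add8_flags and the function add8 are hypothetical names introduced here for illustration.

```c
#include <stdint.h>

/* Hypothetical record of the result and status flags of an 8-bit ADD. */
typedef struct {
    uint8_t result;
    int cf, pf, af, zf, sf, of;
} add8_flags;

add8_flags add8(uint8_t a, uint8_t b)
{
    add8_flags r;
    unsigned wide = (unsigned)a + (unsigned)b;
    int ones = 0;
    int bit;

    r.result = (uint8_t)wide;
    r.cf = wide > 0xFF;                          /* carry out of bit 7 */
    r.af = ((a & 0x0F) + (b & 0x0F)) > 0x0F;     /* carry out of bit 3 */
    r.zf = r.result == 0;
    r.sf = (r.result >> 7) & 1;                  /* copy of the sign bit */
    /* OF: operands have the same sign, but the result's sign differs */
    r.of = ((~(a ^ b) & (a ^ r.result)) >> 7) & 1;
    for (bit = 0; bit < 8; bit++)
        ones += (r.result >> bit) & 1;
    r.pf = (ones % 2) == 0;                      /* even number of 1 bits */
    return r;
}
```

For example, add8(0xFF, 0x01) gives result 0 with CF, ZF, AF and PF set, while add8(0x7F, 0x01) gives result 0x80 with SF and OF set and CF clear.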
Instructions on Flags Register:
We cannot access the Flags Register directly, but a few instructions let
us get and set its value:
1. LAHF: sets the AH register according to the contents of the low
byte of the flags word:
bit: 7  6  5  4  3  2  1  0
AH:  SF ZF 0  AF 0  PF 1  CF
2. SAHF: set the low byte of the flags word according to the contents
of the AH register.
3. SALC: set AL to zero if the carry flag is clear, or to 0xFF if it is set.
4. STC: sets the carry flag.
5. CLC: clears the carry flag.
Note: this is not a complete set of the flag instructions. You can find more in the
NASM tutorial.
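The bit layout that LAHF produces, and the effect of SALC, can be modelled in C as follows. The function names lahf and salc are borrowed from the mnemonics but are hypothetical C helpers; each flag argument is assumed to be 0 or 1.

```c
#include <stdint.h>

/* Hypothetical model of LAHF: builds the AH value from the flags,
   using the layout SF ZF 0 AF 0 PF 1 CF (bits 7 down to 0).
   Assumption: every argument is 0 or 1. */
uint8_t lahf(int sf, int zf, int af, int pf, int cf)
{
    return (uint8_t)((sf << 7) | (zf << 6) | (af << 4) |
                     (pf << 2) | (1 << 1) | cf);
}

/* Hypothetical model of SALC: AL becomes 0xFF if CF is set, else 0. */
uint8_t salc(int cf)
{
    return cf ? 0xFF : 0x00;
}
```

With SF, ZF and CF set and the other flags clear, lahf returns 0xC3 (11000011); with all flags clear it returns 0x02, because bit 1 is always 1.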
Basic assembly instructions:
Each standard NASM source line contains a combination of these 4 fields:
label:    (pseudo) instruction    operands    ; comment
The label, instruction and comment fields are optional; the operands are
either required or forbidden by the instruction.
Notes:
1. A backslash (\) is used as the line continuation character: if a line ends with
a backslash, the next line is considered to be part of the backslash-ended line.
2. There are no restrictions on white space within a line.
3. A colon after a label is optional.
Examples:
1. mov ax, 2 ; moves constant 2 to the register ax
2. buffer: resb 64 ; reserves 64 bytes
Instruction arguments
A typical instruction has 2 operands.
The left operand is the target operand, while the right operand is the source
operand.
3 kinds of operands exist:
1. Immediate, i.e. a value
2. Register, such as AX, EBP, DL
3. Memory location: a variable or a pointer.
Note that the x86 processor does not allow
both operands to be memory locations, so the following is illegal:
mov [var1],[var2]
Move instructions:
B.4.156: MOV – move data
mov r/m8,reg8
(copies content of 8-bit register (source) to 8-bit register or 8-bit memory unit
(destination) )
mov reg32,imm32
(copies content of 32-bit immediate (constant) to 32-bit register)
* for all the possible variants of operands look at NASM manual, B.4.156
- In all forms of the MOV instruction, the two operands are the same size
Examples:
mov EAX, 0x2334AAFF
mov word [buffer], ax
* Note: NASM doesn't remember the types of the variables you declare.
Whereas MASM will remember, on seeing var dw 0, that you declared var as a word-size
variable, and will then be able to fill in the ambiguity in the size of the instruction mov var,2,
NASM will deliberately remember nothing about the symbol var except where it begins, and so
you must explicitly code mov word [var],2.
Move instructions (2):
B.4.181 MOVSX, MOVZX: move data with sign or zero extend
movsx reg16,r/m8
(sign-extends its source (second) operand to the length of its destination (first)
operand, and copies the result into the destination operand)
movzx reg32,r/m8
(does the same, but zero-extends rather than sign-extending)
* for all the possible variants of operands look at NASM manual, B.4.181
Examples:
movsx EAX, AX (if AX has the value 10…0b, EAX would have the value 111…1 100…0)
movzx EAX, BL (if BL has the value 10…0b, EAX would have the value 000…0 100…0)
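The difference between sign extension and zero extension can be sketched in C. The function names below are hypothetical; they model movsx reg32,r/m16 and movzx reg32,r/m8 respectively.

```c
#include <stdint.h>

/* Hypothetical model of "movsx reg32, r/m16": copies the sign bit
   (bit 15) of the source into all 16 upper bits of the result. */
uint32_t movsx_16_to_32(uint16_t src)
{
    if (src & 0x8000)
        return 0xFFFF0000u | src;   /* negative: upper bits become 1 */
    return src;                     /* non-negative: upper bits stay 0 */
}

/* Hypothetical model of "movzx reg32, r/m8": upper 24 bits become 0. */
uint32_t movzx_8_to_32(uint8_t src)
{
    return (uint32_t)src;
}
```

Here movsx_16_to_32(0x8000) gives 0xFFFF8000, whereas movzx_8_to_32(0x80) gives 0x00000080.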
Basic arithmetical instructions:
B.4.3 ADD: add integers
add r/m16,imm16
(adds its two operands together, and leaves the result in its destination (first)
operand)
* for all the possible variants of operands look at NASM manual, B.4.3
Examples:
add AX, BX
B.4.2 ADC: add with carry
adc r/m16,imm8
(adds its two operands together, plus the value of the carry flag, and leaves
the result in its destination (first) operand)
* for all the possible variants of operands look at NASM manual, B.4.2
Examples:
adc AX, BX (AX gets a value of AX+BX+CF)
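A common use of ADC is adding numbers that are wider than a register: ADD handles the low parts and ADC the high parts plus the carry. The following C sketch models a 32-bit addition done with 16-bit halves; the function name add32_with_adc is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: add two 32-bit numbers using 16-bit operations,
   the way an ADD / ADC pair would. */
uint32_t add32_with_adc(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* add: low halves; CF is the carry out of bit 15 */
    uint32_t lo_sum = (uint32_t)a_lo + b_lo;
    unsigned cf = lo_sum > 0xFFFF;

    /* adc: high halves plus the carry flag */
    uint16_t hi = (uint16_t)(a_hi + b_hi + cf);

    return ((uint32_t)hi << 16) | (uint16_t)lo_sum;
}
```

For instance, adding 0x0001FFFF and 0x00000001 produces a carry out of the low half, and the model returns 0x00020000.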
Basic arithmetical instructions (2):
B.4.305 SUB: subtract integers
sub reg16,r/m16
(subtracts its second operand from its first, and leaves the result in its destination
(first) operand)
* for all the possible variants of operands look at NASM manual, B.4.305
Examples:
sub AX, BX
B.4.285 SBB: subtract with borrow
sbb r/m16,imm8
(subtracts its second operand, plus the value of the carry flag, from its first,
and leaves the result in its destination (first) operand)
* for all the possible variants of operands look at NASM manual, B.4.285
Examples:
sbb AX, BX (AX gets a value of AX-BX-CF)
Basic arithmetical instructions (3):
B.4.120 INC: increment integer
inc r/m16
(adds 1 to its operand)
* does not affect the carry flag; affects all the other flags according to the result
* for all the possible variants of operands look at NASM manual, B.4.120
Examples:
inc AX
B.4.58 DEC: decrement integer
dec reg16
(subtracts 1 from its operand)
* does not affect the carry flag; affects all the other flags according to the result
* for all the possible variants of operands look at NASM manual, B.4.58
Examples:
dec byte [buffer]
Basic logical instructions:
B.4.189 NEG, NOT: two's and one's complement
neg r/m16
(replaces the contents of its operand by the two's complement negation - invert all
the bits, and then add one)
not r/m16
(performs one's complement negation- inverts all the bits)
* for all the possible variants of operands look at NASM manual, B.4.189
Examples:
neg AL (if AL = (11111110), it becomes (00000010))
not AL (if AL = (11111110), it becomes (00000001))
Basic logical instructions (2):
B.4.191 OR: bitwise or
or r/m32,imm32
(each bit of the result is 1 if and only if at least one of the corresponding bits of the
two inputs was 1; stores the result in the destination (first) operand)
* for all the possible variants of operands look at NASM manual, B.4.191
Example:
or AL, BL (if AL = (11111100), BL= (00000010) => AL would be (11111110))
B.4.8 AND: bitwise and
and r/m32,imm32
(each bit of the result is 1 if and only if the corresponding bits of the two inputs were
both 1; stores the result in the destination (first) operand)
* for all the possible variants of operands look at NASM manual, B.4.8
Example:
and AL, BL (if AL = (11111100), BL= (00000010) => AL would be (00000000))
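The four logical operations NEG, NOT, OR and AND can be tried out on 8-bit values with C's bitwise operators. The wrapper names below are hypothetical; they only exist so that each instruction has a matching one-line model.

```c
#include <stdint.h>

/* Hypothetical 8-bit models of NEG, NOT, OR and AND. */
uint8_t neg8(uint8_t x) { return (uint8_t)(~x + 1); }     /* invert bits, add one */
uint8_t not8(uint8_t x) { return (uint8_t)~x; }           /* invert bits */
uint8_t or8(uint8_t a, uint8_t b)  { return (uint8_t)(a | b); }
uint8_t and8(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
```

With AL = 11111110 (0xFE), neg8 gives 00000010 and not8 gives 00000001. With AL = 11111100 (0xFC) and BL = 00000010 (0x02), or8 gives 11111110 and and8 gives 00000000, because no bit is set in both inputs.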
Compare instruction:
B.4.24 CMP: compare integers
cmp r/m32,imm8
(performs a ‘mental’ subtraction of its second operand from its first operand, and
affects the flags as if the subtraction had taken place, but does not store the result
of the subtraction anywhere)
* for all the possible variants of operands look at NASM manual, B.4.24
Example:
cmp AL, BL (if AL = (11111100), BL= (00000010) => ZF would be 0)
(if AL = (11111100), BL= (11111100) => ZF would be 1)
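Since CMP is a subtraction whose result is thrown away, its effect can be modelled in C by computing the difference and keeping only the flags. The type cmp_flags and the function cmp8 are hypothetical names, and only three of the status flags are modelled in this sketch.

```c
#include <stdint.h>

/* Hypothetical record of some flags set by an 8-bit CMP. */
typedef struct {
    int zf;   /* operands are equal */
    int cf;   /* borrow: first operand is below the second (unsigned) */
    int sf;   /* sign bit of the discarded difference */
} cmp_flags;

cmp_flags cmp8(uint8_t a, uint8_t b)
{
    cmp_flags f;
    uint8_t diff = (uint8_t)(a - b);   /* computed, but never stored in a register */
    f.zf = diff == 0;
    f.cf = a < b;
    f.sf = (diff >> 7) & 1;
    return f;
}
```

Comparing 11111100 with 00000010 leaves ZF clear, while comparing 11111100 with itself sets ZF.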
Labels definition (basic):
• Each instruction in the code has its own offset (address from the beginning of
the address space).
• If we want to refer to a specific instruction in the code, we should mark it
with a label:
my_loop1:
add ax, ax
….
- a label can be written with or without a colon
- the instruction that follows it can be on the same line or on the next line
- the code can’t contain two different non-local labels (such as the one above) with the
same name
Loop definition:
B.4.142 LOOP, LOOPE, LOOPZ, LOOPNE, LOOPNZ: loop with counter
* for all the possible variants of operands look at NASM manual, B.4.142
Example:
mov ax, 1
mov cx, 3
my_loop:
add ax, ax
loop my_loop, cx
The loop instruction:
1. decrements its counter register (in this case the CX register)
2. if the counter does not become zero as a result of this operation,
jumps to the given label
Note: the counter register can be either CX or ECX; if one is not specified explicitly,
the BITS setting dictates which is used.
LOOPE (or its synonym LOOPZ) adds the additional condition that it only jumps if the
counter is nonzero and the zero flag is set. Similarly, LOOPNE (and LOOPNZ) jumps only
if the counter is nonzero and the zero flag is clear.
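The control flow of the LOOP example above corresponds to a do-while loop in C: the body runs first, then the counter is decremented and tested. The function name run_loop_example is hypothetical, and the 16-bit types stand in for AX and CX.

```c
#include <stdint.h>

/* Hypothetical C model of:
       my_loop:
           add ax, ax
           loop my_loop, cx
   Returns the final value of AX. */
uint16_t run_loop_example(uint16_t ax, uint16_t cx)
{
    do {
        ax = (uint16_t)(ax + ax);   /* add ax, ax */
        cx = (uint16_t)(cx - 1);    /* loop: decrement the counter first ... */
    } while (cx != 0);              /* ... then jump back if it is not zero */
    return ax;
}
```

With ax = 1 and cx = 3 the body runs three times and the result is 8. Note that in this model, as with the instruction itself, starting with cx = 0 does not skip the loop: the counter wraps around to 0xFFFF and the body runs 65536 times.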
ASCII code
The standard ASCII code defines 128 character codes (from 0 to 127), of which
the first 32 are control codes (non-printable) and the other 96 are representable
characters.
Example: in an ASCII table whose rows give the high hexadecimal digit and whose
columns give the low one, the character A is located at row 4 and column 1, so it
is represented in hexadecimal as 0x41.
Assignment 0
• You get a simple program which prints the Nth element
of the Fibonacci series.
• Add a function written in assembly to the program:
– function_double: given 2 arguments, m and n, prints the
number m * 2^n (m, n > 0).
section .rodata
LC0:
        DB "the result is: %d", 10, 0
section .data
an_2:   DD 0
an_1:   DD 1
helper: DD 0
section .text
        global function_fib
        extern printf
function_fib:
        push ebp
        mov ebp, esp
        pushad
        mov ecx, dword [ebp+8]
label_here:
        mov eax,[an_1]
        mov ebx,[an_2]
        add eax,ebx
        mov [helper],eax
        mov ebx,[an_1]
        mov [an_2],ebx
        mov ebx,[helper]
        mov [an_1],ebx
        loop label_here,ecx
        mov eax,[an_2]
        push eax
        push dword LC0
        call printf
        add dword esp,8
;;;;;;;;;;;;;;;;;;;;
        popad
        mov eax,[an_2]
        mov esp, ebp
        pop dword ebp
        ret
; add necessary modifications
; to the various sections
function_double:
        push ebp
        mov ebp, esp
        pushad
        mov ecx, dword [ebp+8]
        mov edx, dword [ebp+12]
        ;; insert your code here
        push eax
        push dword LC0
        call printf
        add dword esp,8
        popad
        mov esp, ebp
        pop dword ebp
        ret
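To see what value function_fib is expected to print, its loop can be mirrored in C. This is a reference sketch only: the name fib_model is hypothetical, it uses local variables instead of the an_2, an_1 and helper memory cells (so, unlike the assembly, repeated calls do not share state), and it assumes n > 0.

```c
#include <stdint.h>

/* Hypothetical C model of the loop in function_fib.
   Assumption: n > 0 (with n == 0 the counter would wrap around,
   just as ECX does with the LOOP instruction). */
uint32_t fib_model(uint32_t n)
{
    uint32_t an_2 = 0, an_1 = 1, helper;
    do {
        helper = an_1 + an_2;   /* eax = an_1 + an_2; helper = eax */
        an_2 = an_1;            /* an_2 = an_1 */
        an_1 = helper;          /* an_1 = helper */
        n--;                    /* loop label_here, ecx */
    } while (n != 0);
    return an_2;                /* the value passed to printf */
}
```

For n = 5 the model returns 5, and for n = 10 it returns 55.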
Running NASM
To assemble a file, you issue a command of the form
> nasm -f <format> <filename> [-o <output>] [-l <listing>]
Example:
> nasm -f elf mytry.s -o myelf.o -l mylist.lst
It would create the file myelf.o in elf format (executable and linkable format)
and a listing file named mylist.lst.
We use a main.c file (written in C) to start our program, and
sometimes also for input / output from a user. So to compile main.c with our
assembled file we should execute the following command:
> cc main.c myelf.o -o myexe.out
It would create the executable file myexe.out.
In order to run it, write its name on the command line (prefixed with ./ if the
current directory is not in your PATH):
> ./myexe.out