Transcript PS01_142

Computer Architecture and Assembly Language
Data Representation Basics
• Bit - the basic unit of information: (true/false) or (1/0)
• Byte structure: a byte has 8 bits, numbered 7 6 5 4 3 2 1 0,
where bit 7 is the MSB (Most Significant Bit) and bit 0 is the LSB (Least Significant Bit).
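To make the bit numbering concrete, here is a small C sketch. C is used because the course's main.c is written in it, and the helper names `bit_of`, `msb` and `lsb` are hypothetical. It reads single bits of a byte, with bit 7 as the MSB and bit 0 as the LSB.

```c
#include <assert.h>
#include <stdint.h>

/* Returns bit number i (0 = LSB, 7 = MSB) of the byte b. */
int bit_of(uint8_t b, int i) {
    return (b >> i) & 1;
}

/* The most significant bit is bit 7; the least significant bit is bit 0. */
int msb(uint8_t b) { return bit_of(b, 7); }
int lsb(uint8_t b) { return bit_of(b, 0); }
```

For example, 0x61 is 01100001 in binary, so its MSB is 0, its LSB is 1, and bit 5 is 1.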
Registers:
The CPU contains a unit called the “register file”, which holds registers of the following types:
1. 8-bit general registers:
AL, BL, CL, DL, AH, BH, CH, DH
2. 16-bit general registers:
AX, BX, CX, DX, SP, BP, SI, DI
3. 32-bit general registers:
EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI (Accumulator, Base, Counter, Data, Stack pointer, Base pointer, Source index, Destination index)
4. Segment registers: ES, CS, SS, DS, FS, GS
5. Instruction pointer: EIP
Note: the list above is partial; there are more registers.
EIP - instruction pointer:
contains the offset (address) of the next instruction to be executed. It exists only during run time. Software changes it by performing an unconditional jump, a conditional jump, a procedure call, or a return.
AX, BX, CX, DX - 16-bit general registers:
each contains two 8-bit registers, the high byte XH and the low byte XL.
Example: AX consists of AH (high byte) and AL (low byte).
EAX - 32-bit general purpose register: its lower 16 bits are AX.
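The nesting of EAX, AX, AH and AL can be modeled with masks and shifts. Below is a minimal C sketch. The function names are hypothetical, and a plain `uint32_t` stands in for the register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers that model how the smaller registers alias EAX. */
uint16_t ax_of(uint32_t eax) { return (uint16_t)(eax & 0xFFFF); }     /* bits 0..15 */
uint8_t  al_of(uint32_t eax) { return (uint8_t)(eax & 0xFF); }        /* bits 0..7  */
uint8_t  ah_of(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFF); } /* bits 8..15 */

/* Writing AL changes only the lowest byte of EAX (like "mov al, imm8"). */
uint32_t set_al(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

With EAX = 0x2334AAFF, AX is 0xAAFF, AH is 0xAA and AL is 0xFF. Setting AL to 0x61 leaves the upper 24 bits unchanged.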
Segment registers: we use a flat memory model (a 32-bit, 4 GB address space without segments), so for this course you can ignore the segment registers.
ESP - stack pointer: contains the address of the last used dword in the stack.
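On x86 the stack grows toward lower addresses. PUSH first moves ESP down by 4 and then stores the dword. POP first reads the dword at ESP and then moves ESP up by 4. The toy C model below sketches this behavior. The `Stack` type, `STACK_BYTES` and the function names are hypothetical, and overflow/underflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64   /* hypothetical size of the toy stack */

/* 'mem' stands for memory; 'esp' is a byte offset into it that points at
   the last used dword. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp;
} Stack;

/* Empty stack: ESP points just past the end, nothing is used yet. */
void stack_init(Stack *s) { s->esp = STACK_BYTES; }

/* push: move ESP down by 4, then store the dword at the new top. */
void push(Stack *s, uint32_t v) {
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

/* pop: read the dword at the top, then move ESP up by 4. */
uint32_t pop(Stack *s) {
    uint32_t v = s->mem[s->esp / 4];
    s->esp += 4;
    return v;
}
```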
Assembly language program
• A program written in assembly language consists of a series of processor instructions, meta-statements, comments and data.
• It is translated by an assembler into machine language instructions (binary code) that can be loaded into memory and executed.
Example
assembly code:
MOV AL, 61h        ; load AL with 97 decimal (61 hex)
binary code:
10110000 01100001
10110              a code of the instruction 'MOV'
000                an identifier for the register 'AL'
01100001           97 decimal (61 hex)
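The breakdown above follows the pattern of the 8-bit "mov register, immediate" form. The first byte is 10110 followed by a 3-bit register code, and the second byte is the immediate. The C sketch below assumes the standard x86 8-bit register codes (AL = 000, CL = 001, DL = 010, BL = 011, AH = 100, CH = 101, DH = 110, BH = 111). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 3-bit register codes used by the 8-bit form of MOV (AL is 000). */
enum Reg8 { AL = 0, CL = 1, DL = 2, BL = 3, AH = 4, CH = 5, DH = 6, BH = 7 };

/* Encodes "mov reg8, imm8": first byte 10110rrr (0xB0 | reg),
   second byte is the immediate value itself. */
void encode_mov_r8_imm8(enum Reg8 reg, uint8_t imm, uint8_t out[2]) {
    out[0] = (uint8_t)(0xB0 | reg);
    out[1] = imm;
}
```

Encoding `MOV AL, 61h` this way yields the bytes 0xB0 0x61, i.e. 10110000 01100001.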
The Netwide Assembler (NASM) is an assembler for the x86 architecture.
Basic assembly instructions:
Each NASM standard source line contains a combination of 4 fields:
label:    (pseudo) instruction    operands    ; comment
The label, the (pseudo) instruction and the comment are optional fields; the operands are either required or forbidden by an instruction.
Notes:
1. Backslash (\) is used as the line continuation character: if a line ends with a backslash, the next line is considered to be a part of the backslash-ended line.
2. There are no restrictions on white space within a line.
3. A colon after a label is optional.
Examples:
1. mov ax, 2            ; moves constant 2 to the register ax
2. buffer: resb 64      ; reserves 64 bytes
Instruction arguments
A typical instruction has 2 operands.
The left operand is the target operand, while the right operand is the source operand.
3 kinds of operands exist:
1. Immediate, i.e. a value
2. Register, such as AX, EBP, DL
3. Memory location; a variable or a pointer.
Note that the x86 processor does not allow both operands to be memory locations, so the following is illegal:
mov [var1],[var2]
Move instructions:
MOV: move data
mov r/m8,reg8
(copies the content of an 8-bit register (source) to an 8-bit register or an 8-bit memory unit (destination))
mov reg32,imm32
(copies a 32-bit immediate (constant) to a 32-bit register)
- In all forms of the MOV instruction, the two operands are the same size.
Examples:
mov EAX, 0x2334AAFF
mov [buffer], ax
Note: NASM doesn’t remember the types of the variables you declare. It deliberately remembers nothing about a symbol such as var except where it begins, so you must specify the size explicitly: mov word [var], 2.
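The size keyword matters because the same constant overwrites a different number of bytes. The C sketch below models a store of 1 (byte), 2 (word) or 4 (dword) bytes, least significant byte first as on x86. The function name `store_sized` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models "mov <size> [var], value": writes the low 'size' bytes of value
   (1 = byte, 2 = word, 4 = dword) at mem, least significant byte first. */
void store_sized(uint8_t *mem, size_t size, uint32_t value) {
    for (size_t i = 0; i < size; i++) {
        mem[i] = (uint8_t)(value >> (8 * i));
    }
}
```

Starting from four 0xFF bytes, a word store of 2 gives 02 00 FF FF. A dword store of 2 gives 02 00 00 00.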
Basic arithmetical instructions:
ADD: add integers
add r/m16,imm16
(adds its two operands together, and leaves the result in its destination (first)
operand)
Examples:
add AX, BX
ADC: add with carry
adc r/m16,imm8
(adds its two operands together, plus the value of the carry flag, and leaves
the result in its destination (first) operand)
Examples:
adc AX, BX (AX gets the value AX+BX+CF)
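A typical use of ADD followed by ADC is adding numbers wider than a register. You add the low halves with ADD, then add the high halves plus the carry with ADC. The C sketch below models a 64-bit addition from 32-bit halves, and the function name `add64` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two 64-bit numbers given as 32-bit halves:
   ADD on the low halves, then ADC on the high halves. */
void add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi,
           uint32_t *r_lo, uint32_t *r_hi) {
    uint32_t lo = a_lo + b_lo;   /* add: wraps around modulo 2^32 */
    uint32_t cf = lo < a_lo;     /* carry out of the low half (CF) */
    *r_lo = lo;
    *r_hi = a_hi + b_hi + cf;    /* adc: high halves plus CF */
}
```

Adding 0x00000000FFFFFFFF and 1 gives a low half of 0 and a high half of 1, because the carry propagates.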
Basic arithmetical instructions (Cont.):
SUB: subtract integers
sub reg16,r/m16
(subtracts its second operand from its first, and leaves the result in its destination
(first) operand)
Examples:
sub AX, BX
SBB: subtract with borrow
sbb r/m16,imm8
(subtracts its second operand, plus the value of the carry flag, from its first,
and leaves the result in its destination (first) operand)
Examples:
sbb AX, BX (AX gets the value AX-BX-CF)
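SUB and SBB pair up the same way for wide subtraction. CF after SUB on the low halves is the borrow, and SBB subtracts it from the high halves. Here is a C sketch with a hypothetical `sub64` function.

```c
#include <assert.h>
#include <stdint.h>

/* Subtracts two 64-bit numbers given as 32-bit halves:
   SUB on the low halves, then SBB on the high halves. */
void sub64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi,
           uint32_t *r_lo, uint32_t *r_hi) {
    uint32_t cf = a_lo < b_lo;   /* borrow from the low half (CF after sub) */
    *r_lo = a_lo - b_lo;         /* sub: wraps around modulo 2^32 */
    *r_hi = a_hi - b_hi - cf;    /* sbb: high halves minus CF */
}
```

Subtracting 1 from 0x0000000100000000 gives a high half of 0 and a low half of 0xFFFFFFFF.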
Basic arithmetical instructions (Cont.):
INC: increment integer
inc r/m16
(adds 1 to its operand)
* does not affect the carry flag; affects all the other flags according to the result
Examples:
inc AX
DEC: decrement integer
dec reg16
(subtracts 1 from its operand)
* does not affect the carry flag; affects all the other flags according to the result
Examples:
dec byte [buffer]
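The difference between `inc al` and `add al, 1` is visible only in the carry flag. The C sketch below models a few flags for 8-bit operands. The `Flags` struct and function names are hypothetical, and only CF, ZF and SF are tracked (OF, AF and PF are left out for brevity).

```c
#include <assert.h>
#include <stdint.h>

/* A subset of the flags register, for illustration only. */
typedef struct { int cf, zf, sf; } Flags;

/* Models "add a, b" on 8 bits: updates CF, ZF and SF. */
uint8_t add8(uint8_t a, uint8_t b, Flags *f) {
    unsigned wide = (unsigned)a + b;
    uint8_t r = (uint8_t)wide;
    f->cf = wide > 0xFF;          /* carry out of bit 7 */
    f->zf = r == 0;
    f->sf = (r >> 7) & 1;
    return r;
}

/* Models "inc a" on 8 bits: same result as adding 1, but CF is untouched. */
uint8_t inc8(uint8_t a, Flags *f) {
    uint8_t r = (uint8_t)(a + 1);
    f->zf = r == 0;
    f->sf = (r >> 7) & 1;
    return r;
}
```

Starting from 0xFF with CF = 0, both operations produce 0 and set ZF. Only the addition sets CF.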
Basic logical instructions:
NEG, NOT: two's and one's complement
neg r/m16
(replaces the contents of its operand by its two's complement negation: invert all the bits, and then add one)
not r/m16
(performs one's complement negation- inverts all the bits)
Examples:
neg AL     ; (if AL = (11111110), it becomes (00000010))
           ; check: 11111110 + 00000010 = 100000000 = 0 (the carry out of bit 7 is discarded)
not AL     ; (if AL = (11111110), it becomes (00000001))
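The relationship "two's complement = one's complement + 1" can be checked in C on 8-bit values. Here is a minimal sketch with hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* One's complement (NOT): invert all the bits. */
uint8_t not8(uint8_t x) { return (uint8_t)~x; }

/* Two's complement (NEG): invert all the bits, then add one. */
uint8_t neg8(uint8_t x) { return (uint8_t)(not8(x) + 1); }
```

For 0xFE (11111110), `not8` gives 0x01 and `neg8` gives 0x02. Adding 0xFE and 0x02 in 8 bits wraps around to 0.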
Basic logical instructions (Cont.):
OR: bitwise or
or r/m32,imm32
(each bit of the result is 1 if and only if at least one of the corresponding bits of the
two inputs was 1; stores the result in the destination (first) operand)
Example:
or AL, BL (if AL = (11111100), BL= (00000010) => AL would be (11111110))
AND: bitwise and
and r/m32,imm32
(each bit of the result is 1 if and only if the corresponding bits of the two inputs were
both 1; stores the result in the destination (first) operand)
Example:
and AL, BL (if AL = (11111100), BL= (11000010) => AL would be (11000000))
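A practical use of OR and AND is setting or clearing a single bit with a mask. In ASCII, uppercase and lowercase letters differ only in bit 5 (0x20). The C sketch below relies on this, and is valid for letters only. The function names are hypothetical.

```c
#include <assert.h>

/* For ASCII letters only: bit 5 (0x20) separates 'A'..'Z' from 'a'..'z'. */
char to_lower_bits(char c) { return (char)(c | 0x20); }  /* OR: set bit 5    */
char to_upper_bits(char c) { return (char)(c & 0xDF); }  /* AND: clear bit 5 */
```

In assembly the same idea would be an `or` with 0x20 or an `and` with 0xDF on the register holding the character.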
Compare instruction:
CMP: compare integers
cmp r/m32,imm8
(performs a ‘mental’ subtraction of its second operand from its first operand, and
affects the flags as if the subtraction had taken place, but does not store the result
of the subtraction anywhere)
Example:
cmp AL, BL (if AL = (11111100), BL= (00000010) => ZF would be 0)
(if AL = (11111100), BL= (11111100) => ZF would be 1)
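The "mental" subtraction can be modeled in C. The sketch computes a - b on 8 bits, derives ZF, SF, CF and OF from it, and discards the difference. The `CmpFlags` struct and `cmp8` are hypothetical names. The OF formula is the usual signed-overflow test for subtraction: the operands have different signs, and the result's sign differs from the first operand.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int zf, sf, cf, of; } CmpFlags;

/* Models "cmp a, b" on 8-bit operands: flags as if a - b were computed. */
CmpFlags cmp8(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);             /* the difference, then dropped */
    CmpFlags f;
    f.zf = r == 0;                            /* operands are equal */
    f.sf = (r >> 7) & 1;                      /* sign bit of the difference */
    f.cf = a < b;                             /* unsigned borrow */
    f.of = ((a ^ b) & (a ^ r) & 0x80) != 0;   /* signed overflow */
    return f;
}
```

This reproduces the examples above: comparing 11111100 with 00000010 gives ZF = 0, and comparing 11111100 with itself gives ZF = 1.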
Labels definition (basic):
Each instruction of the code has its offset (address from the beginning of the address space).
To refer to a specific instruction in the code, mark it with a label:
my_instruction:
add ax, ax
…
- a label can be written with or without a colon
- the instruction that follows it can be on the same line or on the next line
- the code can’t contain two different non-local labels (such as the one above) with the same name
Unconditional Jump:
JMP: jump to instruction
Usually it takes the form:
jmp label
*see section B.4.130 JMP in the nasm manual for full specification
Tells the processor that the next instruction to be executed is located at
the label that is given as part of the instruction.
Example:
mov eax,1
inc_again:
inc eax
jmp inc_again      ; in this case it is an infinite loop!
mov ebx,eax        ; never reached from this code
…
Conditional Jumps:
JE, JG, JL, JGE, JLE, JNE: jump to instruction if condition is satisfied
Usually it takes the form:
j<cond> label
*see section B.4.128 Jcc in the nasm manual for full specification
Execution is transferred to the target instruction only if the specified condition is satisfied. Usually, the condition being tested depends on the flags set by the last arithmetic or logic operation (for example, CMP).
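The listed jumps test the flags left behind by the last operation. JE and JNE look only at ZF. JG, JGE, JL and JLE are the signed comparisons and combine ZF, SF and OF. Below is a C sketch of those conditions. The `JFlags` struct and the function names are hypothetical.

```c
#include <assert.h>

/* Flags left behind by the last cmp/arithmetic instruction. */
typedef struct { int zf, sf, of; } JFlags;

/* Conditions under which each jump is taken. */
int je_taken (JFlags f) { return f.zf; }
int jne_taken(JFlags f) { return !f.zf; }
int jg_taken (JFlags f) { return !f.zf && f.sf == f.of; }
int jge_taken(JFlags f) { return f.sf == f.of; }
int jl_taken (JFlags f) { return f.sf != f.of; }
int jle_taken(JFlags f) { return f.zf || f.sf != f.of; }
```

For example, `cmp` of -128 with 1 on 8 bits yields 0x7F with OF = 1 and SF = 0. JL is taken, which is correct because -128 < 1, even though the stored difference looks positive.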
Example:
read_char:
mov dl,0
...
(code for reading a character into AL)
...
cmp al, 'a'        ; compare the character to 'a'
je a_received      ; if equal, jump to a_received
inc cl             ; otherwise, increment CL and
jmp read_char      ; go back to read another
a_received:
…
DB, DW, DD : declaring initialized data
DB, DW, DD, DQ (DT, DDQ, and DO) are used to declare
initialized data in the output file. They can be invoked in a wide
range of ways:
db 0x55                  ; just the byte 0x55
db 0x55,0x56,0x57        ; three bytes in succession
db 'a',0x55              ; character constants are OK
db 'hello',13,10,'$'     ; so are string constants
dw 0x1234                ; 0x34 0x12
dw 'a'                   ; 0x61 0x00 (it's just a number)
dw 'ab'                  ; 0x61 0x62 (character constant)
dw 'abc'                 ; 0x61 0x62 0x63 0x00 (string)
dd 0x12345678            ; 0x78 0x56 0x34 0x12 (dword)
Example
var: dd 0                ; define variable 'var' of size dword, initialized to 0
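The byte listings in the comments follow two rules. Numbers are stored least significant byte first. Quoted strings are stored in order and zero-padded to a whole number of units. The C sketch below models both rules, with `unit` as 1 for db, 2 for dw and 4 for dd. The function names are hypothetical, and this is not how NASM is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emits a numeric constant as one data unit of 'unit' bytes,
   least significant byte first. Returns the number of bytes written. */
size_t emit_number(uint64_t value, size_t unit, uint8_t *out) {
    for (size_t i = 0; i < unit; i++)
        out[i] = (uint8_t)(value >> (8 * i));
    return unit;
}

/* Emits a quoted string: its characters in order, then zero padding up to
   a whole number of units. Returns the number of bytes written. */
size_t emit_string(const char *s, size_t unit, uint8_t *out) {
    size_t n = 0;
    while (s[n] != '\0') {
        out[n] = (uint8_t)s[n];
        n++;
    }
    while (n % unit != 0)
        out[n++] = 0;
    return n;
}
```

With this model, `emit_number(0x12345678, 4, buf)` writes 0x78 0x56 0x34 0x12, and `emit_string("abc", 2, buf)` writes 0x61 0x62 0x63 0x00.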
DT, DDQ, and DO : declaring initialized data
dq 0x1122334455667788                     ; 8 bytes
ddq 0x112233445566778899aabbccddeeff00    ; 16 bytes
do 0x112233445566778899aabbccddeeff00     ; 16 bytes
dd 1.234567e20                            ; floating-point constant
dq 1.234567e20                            ; double-precision float
dt 1.234567e20                            ; extended-precision float
Assignment 0
You get a simple program that receives a string from the user.
Then, it calls a function (that you’ll implement in assembly) that receives one string as an argument and should do the following:
1. Convert every uppercase letter to lowercase and every lowercase letter to uppercase.
2. Convert ‘(’ into ‘[’ and ‘[’ into ‘{’.
3. Convert ‘)’ into ‘]’ and ‘]’ into ‘}’.
4. Convert each digit n to the character that follows it by n places in the ASCII table.
5. Count the number of characters that aren’t uppercase or lowercase letters.
e.g. “53: [heLL() WorLd]!" → “:6:{HEll[]wORlD}!“
Returns 8
The function shall return the number of characters that aren’t uppercase or lowercase letters (the output should be just the number).
The character conversion should be in-place.
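Before writing the assembly, it can help to have a reference model to compare against. The C sketch below is one reading of the five rules, not the required solution; `do_str_model` is a hypothetical name. It assumes the count is based on each character before conversion, and it applies exactly one rule per character, so ‘(’ becomes ‘[’ and not ‘{’.

```c
#include <assert.h>
#include <string.h>

/* Converts s in place according to rules 1-4 and returns the number of
   characters that were not letters (rule 5), judged before conversion. */
int do_str_model(char *s) {
    int count = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 'a' && c <= 'z') {
            *s = (char)(c - 'a' + 'A');      /* rule 1: lower -> upper */
        } else if (c >= 'A' && c <= 'Z') {
            *s = (char)(c - 'A' + 'a');      /* rule 1: upper -> lower */
        } else {
            count++;                         /* rule 5: not a letter */
            switch (c) {
            case '(': *s = '['; break;       /* rule 2 */
            case '[': *s = '{'; break;       /* rule 2 */
            case ')': *s = ']'; break;       /* rule 3 */
            case ']': *s = '}'; break;       /* rule 3 */
            default:
                if (c >= '0' && c <= '9')    /* rule 4: move n places on */
                    *s = (char)(c + (c - '0'));
                break;
            }
        }
    }
    return count;
}
```

For the input "53:[heLL()WorLd]!", the model produces ":6:{HEll[]wORlD}!" and returns 8.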
section .data                   ; data section, read-write
        an: DD 0                ; this is a temporary var

section .text                   ; our code is always in the .text section
        global do_str           ; makes the function appear in global scope
        extern printf           ; tell linker that printf is defined elsewhere
                                ; (not used in the program)

do_str:                         ; functions are defined as labels
        push ebp                ; save Base Pointer (bp) original value
        mov ebp, esp            ; use base pointer to access stack contents
        pushad                  ; push all general-purpose registers onto the stack
        mov ecx, dword [ebp+8]  ; get function argument

;;;;;;;;;;;;;;;; FUNCTION EFFECTIVE CODE STARTS HERE ;;;;;;;;;;;;;;;;

        mov dword [an], 0       ; initialize answer
label_here:

        ; Your code goes somewhere around here...

        inc ecx                 ; increment pointer
        cmp byte [ecx], 0       ; check if byte pointed to is zero
        jnz label_here          ; keep looping until it is null terminated

;;;;;;;;;;;;;;;; FUNCTION EFFECTIVE CODE ENDS HERE ;;;;;;;;;;;;;;;;

        popad                   ; restore all previously used registers
        mov eax,[an]            ; return an (returned values are in eax)
        mov esp, ebp
        pop ebp
        ret
Running NASM
To assemble a file, you issue a command of the form
> nasm -f <format> <filename> [-o <output>] [-l <listing>]
Example:
> nasm -f elf mytry.s -o myelf.o
It creates the file myelf.o in ELF format (Executable and Linkable Format).
We use a main.c file (written in C) to start our program, and sometimes also for input/output with the user. To compile main.c together with our assembly file, we execute the following command:
> gcc -m32 main.c myelf.o -o myexe.out
The -m32 option tells gcc to compile for a 32-bit environment.
It creates the executable file myexe.out.
To run it, write its name on the command line:
> ./myexe.out