Association for Computing Machinery
Intro to Intel Assembly Language
By Michael Kornbluh
March 27, 2003
Layout of Talk
• Pros/Cons of Assembly
• Intel vs. MIPS
• The general-purpose registers
• NASM instruction syntax
• The Instructions Themselves
• Interfacing C and Assembly
• Comparisons
• Optimizing
• Demonstrations
• Where to find more information
Pros of Programming In Assembly
• Needed for things a high-level language can’t express, or that are processor-specific, e.g. disabling interrupts.
• You know exactly what the computer is doing.
• Learn ASM; learn your chip.
• SPEED!
• Cool tricks.
• Ultra-tight code for time-critical sections and slow processors.
Cons of Assembly
• Takes many lines of code to do a small amount of work (decreased productivity).
• Can allow the most horrible spaghetti code ever. Assembly code can get tangled in ways that would make “GOTO” blush.
• Annoying side effects of instructions (can make debugging horrible).
• Compilers are getting better every day, and they know the architecture.
• ASM is processor-specific.
• No error checking (e.g. no type checking).
Intel vs. MIPS
• Intel assembly has higher-level instructions. E.g. pushing onto the stack is one instruction.
• Is RISC (like MIPS) faster?
– Each instruction completes faster
– But: it takes more instructions to do things
• Does it matter? Intel is more widely
supported.
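The Registers
The eight 32-bit general-purpose registers are EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP (the stack pointer).
Also, some segment registers, but you shouldn’t touch those.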
Newer processors have such great innovations as floating
point registers, etc.
But we’re only talking about the basics today.
Commonly-Used Flags
Some parts of EFLAGS (the register that holds all flags):
• C: true if last math operation carried
• Z: true if last math operation gave a zero
• O: true if last math operation overflowed
• S: true if result of last operation was negative
• I: true if interrupts enabled
Flag commands:
• Stc: set carry flag to true
• Clc: clear carry flag (set to false)
• Similarly: sti, cli, etc.
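To make the flag definitions above concrete, here is a minimal C sketch that models how C, Z, O, and S would come out of a 32-bit add. The Flags struct and the add32 function are hypothetical names made up for this illustration; the real processor sets these bits in EFLAGS for you.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of four EFLAGS bits (names are for illustration only). */
typedef struct {
    bool carry;    /* C: the unsigned result did not fit in 32 bits */
    bool zero;     /* Z: the result is zero */
    bool overflow; /* O: the signed result did not fit in 32 bits */
    bool sign;     /* S: the top bit of the result is set (negative) */
} Flags;

/* Models "add dst, src": stores the 32-bit sum in *dst and returns the flags. */
Flags add32(uint32_t *dst, uint32_t src)
{
    uint32_t a = *dst;
    uint32_t sum = a + src;      /* unsigned arithmetic wraps modulo 2^32 */
    Flags f;

    f.carry = sum < a;           /* the sum wrapped, so a carry left bit 31 */
    f.zero = (sum == 0);
    f.sign = (sum >> 31) != 0;
    /* Signed overflow: both inputs have a sign bit that the result lacks. */
    f.overflow = (((a ^ sum) & (src ^ sum)) >> 31) != 0;

    *dst = sum;
    return f;
}
```

For example, adding 1 to 0xFFFFFFFF in this model gives zero with C and Z set, while adding 1 to 0x7FFFFFFF sets O and S but not C.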
NASM Instruction syntax (1)
• Just the name. e.g. “nop”
• Name, then argument. E.g. “call 1337”
• Destination, then source: e.g. “mov eax, 5”
means “let eax = 5”.
• Another example: “mov esi, ebx” means
“let esi = ebx”
• Destination AND arg1, then arg2.
e.g. “add eax, edx” means “let eax = eax +
edx”. Thus, eax is an arg, and it is where
the result is stored. A lot of instructions do
this.
NASM Instruction syntax (2)
• For registers or numbers, just type them. E.g.
“add ecx, 5”
• For memory locations, put them in brackets:
E.g. “mov [72], eax” means “move the number in
eax into the variable at address 72”.
• You can even put registers in brackets: “mov
[eax], bh” means “store the value in bh into the
variable pointed to by eax”.
• You can’t use two memory operands in one
instruction, so “mov [8], [3]” is illegal.
• Source and destination must be the same size.
• Advanced: e.g. “mov [eax+8*ebx+78], ecx” (but let’s not worry about that
yet)
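For the curious, the “advanced” bracket form is just arithmetic on an address. The C sketch below models that calculation only. The function name effective_address is made up for this handout, and the check on the scale reflects the processor rule that the multiplier must be 1, 2, 4, or 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the address that "[base + scale*index + disp]"
   refers to, e.g. [eax+8*ebx+78] with base = eax, index = ebx, disp = 78. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, uint32_t disp)
{
    /* x86 only allows the index register to be scaled by 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + scale * index + disp;
}
```

With eax = 1000 and ebx = 3, “mov [eax+8*ebx+78], ecx” would store ecx at address 1000 + 24 + 78 = 1102. This is the same arithmetic a C compiler does when indexing an array of 8-byte elements.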
NASM Instruction syntax (3)
You must specify the size you’re transferring if it’s
not obvious to the assembler.
For example, “mov eax, ebx” is obviously moving
32 bits, because eax and ebx are 32-bit registers.
However, “mov [7], 3” is illegal, because the
variable at address 7 could be a byte, a word,
or a dword.
So: byte = 8 bits, word = 16 bits, dword = 32 bits,
etc. (sizes bigger than dword are not used quite
so often)
So, write “mov word [7], 3” or “mov dword [7], 3”,
depending on how many bits that variable is.
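Why the size matters: the same value 3 overwrites a different number of bytes depending on the size you name. The C sketch below models this with a plain byte array standing in for memory. The store_byte, store_word, and store_dword helpers are hypothetical names for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Models "mov byte [addr], value": writes exactly 1 byte. */
void store_byte(uint8_t *memory, size_t addr, uint8_t value)
{
    memcpy(memory + addr, &value, sizeof value);
}

/* Models "mov word [addr], value": writes exactly 2 bytes. */
void store_word(uint8_t *memory, size_t addr, uint16_t value)
{
    memcpy(memory + addr, &value, sizeof value);
}

/* Models "mov dword [addr], value": writes exactly 4 bytes. */
void store_dword(uint8_t *memory, size_t addr, uint32_t value)
{
    memcpy(memory + addr, &value, sizeof value);
}
```

If memory is first filled with 0xFF, store_word(memory, 7, 3) changes only the bytes at addresses 7 and 8, while store_dword(memory, 7, 3) changes addresses 7 through 10. Without a size, the assembler can’t know which of these you meant.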
Instructions (general)
• Mov: copies value from second arg to first arg. E.g. “mov
eax, ebx” copies the value in ebx into eax. (mov should
really be called copy, since that’s what it does. Oh well.)
• Add: adds its two args together and stores answer in the
first one: “add ebx, ecx” means “let ebx = ebx + ecx”
• Sub: works just like add, but subtracts: “sub ebx, ecx”
means “let ebx = ebx - ecx”
• Cmp: like sub, but doesn’t store the result anywhere; it only
sets the flags. (we’ll see why it’s still useful later)
• And: takes bitwise AND of both args, and stores answer
in first arg. So, “and eax, edx” means “let eax = eax &
edx”. Or and xor work the same way.
• Mul and div are complicated, so we’ll ignore them for
now.
• Push: pushes its argument onto the stack. E.g. “push eax”
• Pop: pops the top value off the stack into its argument. E.g. “pop eax”.
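Push and pop are easier to picture with a model. Here is a minimal C sketch of a downward-growing stack of 32-bit values. The Stack type, the STACK_WORDS size, and the function names are all invented for this handout, and the esp field is a slot index rather than a real byte address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64            /* hypothetical, tiny stack for illustration */

typedef struct {
    uint32_t slots[STACK_WORDS];
    int esp;                      /* index of the top slot; STACK_WORDS = empty */
} Stack;

void stack_init(Stack *s)
{
    s->esp = STACK_WORDS;
}

/* Models "push value": move ESP down one slot, then store the value there. */
void stack_push(Stack *s, uint32_t value)
{
    assert(s->esp > 0);           /* no room left would be a stack overflow */
    s->esp -= 1;
    s->slots[s->esp] = value;
}

/* Models "pop reg": read the top value, then move ESP back up one slot. */
uint32_t stack_pop(Stack *s)
{
    assert(s->esp < STACK_WORDS); /* popping an empty stack is a bug */
    uint32_t value = s->slots[s->esp];
    s->esp += 1;
    return value;
}
```

Pushing 1 and then 2, then popping twice, gives back 2 and then 1: last in, first out. On the real chip each 32-bit push moves ESP down by 4 bytes and each pop moves it up by 4.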
Instructions (control flow)
• Call: call a function. E.g. “call 500” calls
the function at memory address 500.
• Ret: return from function. Works like
“return;”
• Jmp: like goto. “jmp 100” goes to 100.
• Conditional jumps: only jumps if a
condition is met. E.g. “jz 100” jumps only
if last instruction produced a (z)ero result.
• Use cmp and conditional jumps to do ifs
(as we’ll see later.)
Interfacing C and Assembly
(Calling a Function)
• Push arguments onto the stack from last to first, and get rid
of them later:
printf(stringPointer, 5, 7); becomes:
    push dword 7
    push dword 5
    push dword stringPointer
    call printf
    pop eax
    pop eax
    pop eax
The 3 “pop eax”s would normally be replaced by a single
“add esp, 12”, which adjusts ESP directly. Also, a register besides eax is fine
(use one if you need printf’s return value, which comes back in EAX).
Always keep stack “even”! (every push should have a matching pop,
or an equivalent adjustment of ESP)
There are a whole bunch of ways to pass an argument in
assembly; you’re not restricted to how C does it.
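Why last to first? Because then the first argument always ends up closest to the top of the stack, right above the return address, no matter how many arguments follow. The C sketch below simulates the call sequence above with a small array. All the names (stack_mem, esp, push32, callee_arg, simulate_call) are hypothetical, esp counts 32-bit slots instead of bytes, and the return address is a placeholder value.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_SLOTS = 16 };        /* hypothetical size, in 32-bit slots */

uint32_t stack_mem[STACK_SLOTS];
int esp = STACK_SLOTS;            /* slot index; the stack grows downward */

/* Models "push dword v". */
void push32(uint32_t v)
{
    assert(esp > 0);
    stack_mem[--esp] = v;
}

/* What the callee sees: slot 0 is the return address, so argument n
   (counting from 0) is at [esp + 4 + 4*n] in byte terms. */
uint32_t callee_arg(int n)
{
    return stack_mem[esp + 1 + n];
}

/* Simulates calling f(a, b, c) the way C does; returns ESP afterwards. */
int simulate_call(uint32_t a, uint32_t b, uint32_t c, uint32_t seen[3])
{
    push32(c);                    /* push dword c  (last argument first) */
    push32(b);                    /* push dword b */
    push32(a);                    /* push dword a  (first argument last) */
    push32(0xDEADBEEFu);          /* "call" pushes a return address (placeholder) */

    for (int n = 0; n < 3; n++)   /* the callee reads its arguments */
        seen[n] = callee_arg(n);

    esp += 1;                     /* "ret" pops the return address */
    esp += 3;                     /* caller cleanup: "add esp, 12" */
    return esp;
}
```

After simulate_call(10, 5, 7, seen), the callee has seen 10, 5, 7 in that order, and ESP is back where it started, so the stack is “even” again.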
Interfacing C and Assembly
(Returning a value in a Function)
• Put the return value in EAX before
returning.
• Thus, “return 42;” becomes:
    mov eax, 42
    ret
Interfacing C and Assembly
(linking ASM and C)
The C side (cfile.c):
#include <stdio.h>
int asmFunc(void); //prototype: it's defined in the ASM file
int cFunc(void) {
    return 84;
}
int main(void) {
    printf("%d\n", asmFunc()); //call asmFunc, which returns 91 in EAX.
    return 0;
}
The ASM side (asmfile.asm):
extern cFunc     ; means that cFunc is defined outside the ASM file
global asmFunc   ; means that other files can use the symbol asmFunc
asmFunc:
    call cFunc   ; calls cFunc (in cfile.c), which returns 84 in EAX
    add eax, 7   ; adds 7 to EAX, giving 91.
    ret
Building:
nasm -f elf asmfile.asm -o asmfile.o
gcc -c cfile.c -o cfile.o
gcc cfile.o asmfile.o -o finalfile.exe
Interfacing C and Assembly
(dropping ASM right into a C file)
• Much easier than dealing with linking ASM and
C. The Linux kernel uses ASM this way in many places.
• But, with GCC you have to use AT&T (not NASM) syntax by default.
• Just type asm(); with the instructions, as a string, in the
parentheses.
e.g:
//this function returns 7:
int giveAnswer() {
    int result;
    asm("movl $7, %%eax" : "=a"(result)); //same as "mov eax, 7"; "=a" tells GCC the result is in EAX
    return result;
}
Comparison
• Use cmp and conditional jumps:
    cmp dword [someVariable], 3
    jne 200
This will CoMPare someVariable to 3. (The size, here dword, is required
because the assembler can’t tell how big someVariable is.)
Jne will jump to 200 if they’re not equal. (jne
= “Jump when Not Equal”)
• This works with jg (greater than), jl (less
than), je (equal), etc.
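If it helps to see the cmp/jne pair in C terms, here is a rough sketch that mirrors it with goto. The function name is_three and the label not_three are made up for this example; a compiler would not necessarily produce exactly this shape.

```c
/* Mirrors:   cmp dword [someVariable], 3
              jne not_three
              (code that runs only when someVariable == 3)
   not_three:                                              */
int is_three(int someVariable)
{
    int result = 0;

    if (someVariable != 3)   /* cmp + jne: skip ahead when not equal */
        goto not_three;

    result = 1;              /* fall-through path: the values were equal */

not_three:
    return result;
}
```

Note the condition is inverted compared to the C “if”: to run code when the values are equal, you jump over it when they are not equal.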
Optimizing
• Use registers as much as possible.
• Use as few jumps as possible, since they
mess up the pipeline.
– For example, try to set up conditional jumps to
fall through more often than they jump.
• Don’t use archaic instructions like “loop”.
• Short instructions are good: less time
spent getting instructions from memory.
(But this rule doesn’t usually apply to
archaic instructions, which should still be
avoided.)
Demonstrations
• Example.
• TicTacToe.
Where to get more information
• http://webster.cs.ucr.edu/Page_asm/ArtOfAsm.html – Where I learned assembly.
• http://nasm.sourceforge.net – To download the NASM
assembler.
• http://developer.intel.com – For Intel’s official information.