x86 Assembly Intro

X86 Assembly Language
We will be using the NASM assembler
(other assemblers: MASM and the GNU assembler, as/gas)
Outline
• x86 registers
• x86 memory basics
• Introduction to x86 assembly
Why Assembly?
• Assembly lets you write fast programs
– You can write programs that execute calculations at the maximum
hardware speed…
– …assuming that you know what you’re doing…
• Why not always use assembly?
• Assembly is necessary for writing certain portions of system
software
– Compilers
– Operating systems
• Assembly is used to program embedded devices and DSPs
x86
• Intel/Microsoft’s monopoly: cost-effective
microprocessors for the mass market
• x86 refers to an instruction set (also known as IA-32)
rather than a specific processor architecture
• The processor core that implements the x86
instruction set has gone through substantial
modifications and additions over the last 20 years
• x86 is a Complex Instruction Set Computer
– 20,000+ instructions
– This course will take you through some of them!
• amd64 instruction set has replaced x86 for new code
Registers
• Registers are storage locations
• The first level of a computer’s memory hierarchy
• The fastest-to-access storage in your system
• Purposes
– Data used in arithmetic/logical operations
– Pointers to memory locations containing data or instructions
– Control information (e.g. outcome of arithmetic instructions,
outcome of instructions that change the control flow of a
program)
x86 Registers at a Glance
General Purpose (sort of)
• Accumulator: EAX, AX, AH, AL
• Base: EBX, BX, BH, BL
• Count: ECX, CX, CH, CL
• Data: EDX, DX, DH, DL
Index Registers
• Stack Pointer: ESP, SP
• Base Pointer: EBP, BP
• Source Index: ESI, SI
• Dest Index: EDI, DI
Special Registers
• Instr Pointer: EIP, IP
• Flags: EFLAGS, FLAGS
Segment Registers
• CS: Code Segment
• DS: Data Segment
• ES: Extra Segment
• SS: Stack Segment
• FS, GS
General Purpose Registers
• Accumulator (AH,AL,AX,EAX)
– Accumulates results from mathematical calculations
• Base (BH,BL,BX,EBX)
– Points to memory locations
• Count (CL,CH,CX,ECX)
– Counter used typically for loops
– Can be automatically incremented/decremented
• Data (DL,DH,DX,EDX)
– Data used in calculations
– Most significant bits of a 32-bit mul/div operation
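A minimal sketch of the roles listed above, not a definitive reference. It assumes a 32-bit Linux target (entry point _start, exit through int 80h) and a build along the lines of "nasm -f elf32 regs.asm" followed by "ld -m elf_i386 regs.o -o regs". The file name and the local label .next are made up for illustration.

```nasm
; Assumption: 32-bit Linux, flat memory model.
        global  _start

        section .text
_start:
        mov     eax, 12345678h  ; EAX = 12345678h
                                ; AX  = 5678h (low 16 bits of EAX)
                                ; AH  = 56h, AL = 78h (the two bytes of AX)
        mov     al, 0           ; EAX = 12345600h: only the low byte changes

        mov     eax, 80000000h  ; first factor
        mov     ebx, 4          ; second factor
        mul     ebx             ; unsigned multiply: EDX:EAX = EAX * EBX
                                ; EDX = 2 (upper 32 bits), EAX = 0 (lower 32 bits)

        mov     ecx, 5          ; ECX is the loop counter
        xor     eax, eax        ; running sum starts at 0
.next:
        add     eax, ecx        ; sum = 5 + 4 + 3 + 2 + 1
        loop    .next           ; decrement ECX, jump back while ECX != 0
                                ; here EAX = 15 and ECX = 0

        mov     ebx, eax        ; exit status = 15
        mov     eax, 1          ; sys_exit (assumption: 32-bit Linux)
        int     80h
```

After running the program, "echo $?" in the shell should print the exit status left in EBX.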
A note on GP registers
• In 80386 and newer processors GP registers can be
used with a great deal of flexibility…
• But you should remember that each GP register is
meant to be used for specific purposes…
• Memorizing the names of the registers will help you
understand how to use them
• Learning how to manage your registers will help you
develop good programming practices
• You will find that you are generally short of registers
Index Registers
• SP, ESP
– Stack pointer (more on that in upcoming lectures…)
• BP, EBP
– Address stack memory, used to access subroutine
arguments as a stack frame mechanism
• SI, ESI, DI, EDI
– Source/Destination registers
– Point to the starting address of a string/array
– Used to manipulate strings and similar data types
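As a sketch of how the source and destination index registers work together, the program below copies a string with rep movsb. It assumes 32-bit Linux (sys_write and sys_exit through int 80h). The names src, dst and len are hypothetical.

```nasm
; Assumption: 32-bit Linux, flat memory model (DS and ES cover the same memory).
        global  _start

        section .data
src:    db      "my dog has fleas", 10  ; 17 bytes including the newline
len     equ     $ - src                 ; length computed by the assembler

        section .bss
dst:    resb    len                     ; uninitialised destination buffer

        section .text
_start:
        cld                             ; direction flag = 0: addresses count up
        mov     esi, src                ; ESI = address of the source string
        mov     edi, dst                ; EDI = address of the destination
        mov     ecx, len                ; ECX = number of bytes to copy
        rep movsb                       ; copy byte [ESI] to [EDI], ECX times,
                                        ; advancing ESI and EDI after each byte

        mov     eax, 4                  ; sys_write
        mov     ebx, 1                  ; file descriptor 1 = stdout
        mov     ecx, dst                ; print the copy, not the original
        mov     edx, len
        int     80h

        mov     eax, 1                  ; sys_exit
        xor     ebx, ebx                ; exit status 0
        int     80h
```

Note that ECX plays its usual role as the counter here, which is why the registers are best used for the purposes their names suggest.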
Segment Registers
• CS
– Points to the memory area where your program’s
instructions are stored
• DS
– Points to the memory area where your program’s data is
stored
• SS
– Points to the memory area where your stack is stored
• ES,FS,GS
– They can be used to point to additional data segments, if
necessary
Special Registers
• IP, EIP
– Instruction pointer, always points to the next instruction that
the processor is going to execute
– Only changed indirectly by branching instructions
• FLAGS, EFLAGS
– Flags register, contains individual bits set by different
operations (e.g. carry, overflow, zero)
– Used heavily by branch instructions
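A small sketch of how the flags drive branches, assuming 32-bit Linux for the exit call. The label quit is hypothetical. The exit status is 0 only if the zero flag and the carry flag behave as the comments claim.

```nasm
; Assumption: 32-bit Linux, exit through int 80h.
        global  _start

        section .text
_start:
        mov     ebx, 1          ; exit status 1 = "a branch went the wrong way"

        mov     eax, 7
        cmp     eax, 7          ; computes EAX - 7 and sets the flags: ZF = 1
        jne     quit            ; not taken, because ZF = 1

        mov     al, 0FFh
        add     al, 1           ; AL wraps around to 0, so CF = 1 (and ZF = 1)
        jnc     quit            ; not taken, because CF = 1

        xor     ebx, ebx        ; both checks passed: exit status 0
quit:
        mov     eax, 1          ; sys_exit
        int     80h
```

The conditional jumps never name the flags register directly. They read the bits left behind by the previous arithmetic or compare instruction.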
x86 memory addressing modes
• Width of the address bus determines the amount of addressable
memory
• The amount of addressable memory is NOT the amount of
physical memory available in your system
• Real mode addressing
– A throwback to the age of 8086, 20-bit address bus, 16-bit data bus
– In real mode we can only address memory locations 0 through
0FFFFFh. Used only with 16-bit registers
– We will not be using real mode!
• Protected mode addressing
– 32-bit address bus, 32-bit data bus, 32-bit registers
– Up to 4 Gigabytes of addressable memory
– 80386 and higher operate in either real or protected mode
Real-mode addressing on the x86
• Memory address format Segment:Offset
• Linear address obtained by:
– Shifting segment left by 4 bits
– Adding offset
• Example: 2222h:3333h Linear address: 25553h
• Example: 2000h:5553h Linear address: 25553h
• THIS WILL NOT APPLY TO US IN 32-bit
PROTECTED MODE!
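The shift-and-add arithmetic above can be checked with ordinary 32-bit instructions. This is only a sketch of the calculation, run as a normal 32-bit Linux program (an assumption), not real-mode code.

```nasm
; Assumption: 32-bit Linux. The registers hold plain numbers here, not segments.
        global  _start

        section .text
_start:
        mov     eax, 2222h      ; segment of the first example
        shl     eax, 4          ; shift left by 4 bits: 22220h
        add     eax, 3333h      ; add the offset:       25553h

        mov     ebx, 2000h      ; segment of the second example
        shl     ebx, 4          ; 20000h
        add     ebx, 5553h      ; 25553h

        sub     ebx, eax        ; EBX = 0: both pairs name the same linear address
        mov     eax, 1          ; sys_exit, exit status = EBX = 0
        int     80h
```

Two different Segment:Offset pairs landing on the same linear address is a normal property of real-mode addressing.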
Assembly files
• Five simple things:
• Labels
– Variables are declared as labels pointing to specific memory
locations
– Labels mark the start of subroutines or locations to jump to
in your code
• Instructions – cause machine code to be generated
• Directives – affect the operation of the assembler
• Comments
• Data
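A minimal file showing all five elements side by side. It is a sketch that assumes 32-bit Linux and a build such as "nasm -f elf32" followed by "ld -m elf_i386". The label answer is hypothetical.

```nasm
; Comment: everything after a semi-colon is ignored
        global  _start          ; directive: make the label visible to the linker

        section .data           ; directive: what follows goes in the data section
answer: dd      42              ; label + data: a 32-bit variable holding 42

        section .text           ; directive: what follows is code
_start:                         ; label: marks the program entry point
        mov     ebx, [answer]   ; instruction: load the variable into EBX
        mov     eax, 1          ; instruction: sys_exit number (Linux assumption)
        int     80h             ; instruction: call the kernel, exit status 42
```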
Comments, comments, comments!
; Comments are denoted by semi-colons.
; Please comment your code thoroughly.
; It helps me figure out what you were doing.
; It also helps you figure out what you were
; doing when you look back at code you
; wrote more than two minutes ago.
; everything from the semi-colon to the end
; of the line is ignored.
Labels
; Labels are local to your file/module
; unless you direct otherwise, the colon
; identifies a label (an address!)
MyLabel:
; to make it global we say
global  MyLabel
; And now the linker will see it
Example with simple instructions
var1:   dd      0FFh
str1:   db      "my dog has fleas",10
var2:   dd      0
; Here are some simple instructions
        mov     eax, [var1]     ; notice the brackets
        mov     edx, str1       ; notice lack of brackets
        call    dspmsg          ; dspmsg is assumed to be defined elsewhere
        jmp     done
        mov     ebx, [var2]     ; this will never happen
done:   nop
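To make the bracket rule concrete, here is a self-contained sketch that loads the same variable both ways. It assumes 32-bit Linux for the exit call and leaves out dspmsg, which is not defined in this transcript.

```nasm
; Assumption: 32-bit Linux, exit through int 80h.
        global  _start

        section .data
var1:   dd      0FFh

        section .text
_start:
        mov     eax, [var1]     ; brackets: EAX = contents of var1 = 0FFh
        mov     edx, var1       ; no brackets: EDX = address of var1
        mov     ebx, [edx]      ; brackets around a register: EBX = 0FFh
        sub     ebx, eax        ; EBX = 0, both loads read the same dword
        mov     eax, 1          ; sys_exit, exit status = EBX = 0
        int     80h
```

Without brackets a label is just a number (an address). With brackets the processor goes to memory and fetches what is stored there.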