x86 ISA
Compiler
Baojian Hua
Front End
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR
Code Generation
Before discussing code generation, we must understand what we are trying to generate
virtual machines
bare architecture
…
This course uses x86
So you’d learn how to program at the x86 level
There is an online manual covering every detail
relatively old, but enough for understanding Linux, Windows, gcc, …
x86
Complex Instruction Set Computer (CISC)
Instructions can operate on memory values
e.g., add [eax], ebx
Complex, multi-cycle instructions
e.g., string-copy, call
Many ways to do the same thing
e.g., add eax, 1; inc eax; sub eax, -1
Instructions are variable-length (1-15 bytes)
Registers are not orthogonal
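To make "many ways to do the same thing" and "variable-length" concrete, here is a small C table of byte encodings for instructions that all add 1 to eax. This is a sketch, not assembler output: the struct and function names are made up for illustration, and the bytes are the standard 32-bit-mode encodings as I understand them from the Intel manual (an assembler is free to pick among equivalent forms). The instructions agree on the value left in eax, but their effects on the flags differ (inc, for example, leaves the carry flag untouched).

```c
#include <stddef.h>

/* One machine-code encoding of an instruction (32-bit mode).
   The struct and its field names are hypothetical, for illustration only. */
struct encoding {
    const char *asm_text;      /* Intel-syntax assembly text */
    unsigned char bytes[15];   /* an x86 instruction is at most 15 bytes long */
    size_t length;             /* number of bytes actually used */
};

/* Four encodings that all leave eax + 1 in eax. */
static const struct encoding add_one_variants[] = {
    { "inc eax",                 { 0x40 },                         1 },
    { "add eax, 1 (imm8 form)",  { 0x83, 0xC0, 0x01 },             3 },
    { "sub eax, -1 (imm8 form)", { 0x83, 0xE8, 0xFF },             3 },
    { "add eax, 1 (imm32 form)", { 0x05, 0x01, 0x00, 0x00, 0x00 }, 5 },
};

/* Number of variants in the table. */
size_t variant_count(void) {
    return sizeof add_one_variants / sizeof add_one_variants[0];
}

/* Encoded length in bytes of the i-th variant (i must be < variant_count()). */
size_t variant_length(size_t i) {
    return add_one_variants[i].length;
}
```

The same operation takes 1, 3, or 5 bytes depending on the form chosen, which is one reason instruction selection and encoding are a real part of code generation for x86.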
Capsule History
1978, 8086: First x86 microprocessor, 16-bit
1985, 80386: 32-bit, protected mode
1989, 80486
1993, Pentium: MMX
2000, Pentium 4: Deeply pipelined, high frequency
2006, Intel Core 2: Low power, multi-core
x86 ISA
Instruction Set Architecture
another programming language (instruction set)
encoding
decoding
assemble, compile to
…
different implementations
say Intel vs AMD
Basis for OS, compilers, etc.
hardware-software interface
x86 ISA
What’s important here?
OS and library
language syntax
another CFG, read the manual
assembler
directives etc. think “compiler”, read the gas manual
Note: assembly programs are NOT portable
OS and Library
OS simplifies programming model
e.g., Linux and Windows disable segmentation
the so-called “flat” model in the manual
so all segment-related details may be ignored when reading the manual
OS provides protection
e.g., Linux and Windows run user programs in ring 3
so you cannot change the page table, etc.
OS and Library
OS provides system calls
hide many crazy details
but may still be annoying
Libraries
another level of indirection on top of OS system calls
In particular, we’d use the C library
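The layering can be sketched in C by printing the same string two ways. This assumes a POSIX system (for `write` in `unistd.h`); the two function names are hypothetical. Note that `write` is itself a thin C-library wrapper around the system call, while `printf` adds formatting and buffering on top of it.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* POSIX: write(), STDOUT_FILENO */

/* Close to the OS: hand raw bytes and a length to a file descriptor.
   Returns the number of bytes written, or -1 on error. */
long print_with_syscall(const char *msg) {
    return (long)write(STDOUT_FILENO, msg, strlen(msg));
}

/* Through the C library: formatted, buffered output.
   Returns the number of characters printed, or a negative value on error. */
int print_with_library(const char *msg) {
    int n = printf("%s", msg);
    fflush(stdout);   /* push the library's buffer down to the OS */
    return n;
}
```

Generated assembly code can call either layer; calling the C library is usually the more convenient choice, which is why the course uses it.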
Syntax
Syntax = data + instructions
Data
Immediate
4, 3.14, “hello”
Register
general-purpose
eax, ebx, …
segment
remember? we don’t care
Data
Memory
different usage:
global
stack
heap
but same behavior
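A minimal C sketch of "different usage, but same behavior": one load routine reads an int from each of the three regions. The names are hypothetical; the point is only that the memory access itself does not depend on where the address came from.

```c
#include <stdlib.h>

int global_value = 1;   /* global: lives in the data section */

/* Reads an int through a pointer; the same code works for any region. */
static int load(const int *p) {
    return *p;
}

/* Sums one value from each region; returns -1 if allocation fails. */
int sum_three_regions(void) {
    int stack_value = 2;                            /* stack: lives in this call's frame */
    int *heap_value = malloc(sizeof *heap_value);   /* heap: allocated at run time */
    if (heap_value == NULL) {
        return -1;
    }
    *heap_value = 3;

    int sum = load(&global_value) + load(&stack_value) + load(heap_value);
    free(heap_value);
    return sum;
}
```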
Data
Memory addressing mode
seg:[base+index*scale+disp]
any part can be omitted
complex! right?
e.g., int a[5][10], to read a[3][2]
mov eax, 120              # byte offset of row 3: 3*10*4
mov ebx, 2                # column index
mov ecx, [eax+ebx*4+a]    # a + 120 + 2*4 = address of a[3][2]
Problems with this strategy?
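The address arithmetic behind reading a[3][2] can be checked in C. This is a sketch with hypothetical helper names, assuming the usual row-major layout of C arrays: row 3 starts 3*10 elements into the array, column 2 is 2 more elements in, and each element is sizeof(int) bytes (4 on x86).

```c
#include <stddef.h>

enum { ROWS = 5, COLS = 10 };   /* shape of int a[5][10] */

/* Byte offset of a[i][j] from the start of the array (row-major layout). */
size_t byte_offset(size_t i, size_t j) {
    return (i * COLS + j) * sizeof(int);
}

/* Reads a[i][j] by explicit address arithmetic: base + byte offset,
   which is the address the mov instruction has to form. */
int read_element(int (*a)[COLS], size_t i, size_t j) {
    const char *base = (const char *)a;
    return *(const int *)(base + byte_offset(i, j));
}
```

With 4-byte ints, byte_offset(3, 2) is 128: 120 bytes to reach row 3 plus 2*4 bytes within the row. Only the index register can be scaled in the addressing mode, so the row part of the offset has to be computed separately.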
Instructions
The manual covers all instructions in detail:
Data movement
Arithmetic
Control transfer
…
Rather than explaining all of these bit by bit, I’ll give an example next
Assembler
An assembler is more than just a compiler:
it customizes the assembly syntax
it also offers so-called directives
Two main branches:
Intel syntax
assembler on Windows: masm, nasm, …
the Intel manual!
AT&T syntax
Linux assembler: gas
the GCC output!
the good news is that recent versions of gas support Intel syntax!
This course uses gas with Intel syntax
So reading the Intel manual is relatively easy
Example
# Sum up an array of integers
# compiled by GCC:
# $ gcc test.s
# (comments start with “#”; C/C++ style comments are also supported)
        .intel_syntax noprefix   # directive: tells the assembler that we prefer the Intel syntax
        .data                    # directive: assemble the following to the data section
a:                               # label: the current address
        .int 1, 2, 3, 4, 5, 6, 7, 8, 9, 10   # directive: store 10 integers starting from the address “a”
        .globl main              # directive: global symbol
        .text                    # directive: assemble the following to the text section
main:                            # label: another address
Example, cont’
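        push ebp
        mov  ebp, esp            # set up the frame pointer (Intel syntax: destination first)
        # convention: eax: the sum, ebx: index
        # (ebx is callee-saved under the usual 32-bit C calling convention;
        #  a careful version would save and restore it)
        xor  eax, eax            # sum = 0
        mov  ebx, eax            # index = 0
L_start:
        add  eax, dword ptr [ebx*4+a]   # sum += a[index]
        inc  ebx
        cmp  ebx, 10
        jl   L_start             # loop while index < 10
        leave
        ret                      # eax holds the sum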
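For comparison, the loop above corresponds roughly to the following C function. This is a sketch, not compiler input or output: the function name is hypothetical, and the comments map each statement to the instruction it mirrors. The test sits at the bottom of the loop, as in the assembly, hence the do-while.

```c
/* The same ten integers as the .int directive stores at label a. */
static const int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

/* Sums the array the way the assembly loop does. */
int sum_array(void) {
    int sum = 0;             /* xor eax, eax */
    int index = 0;           /* mov ebx, eax */
    do {
        sum += a[index];     /* add eax, dword ptr [ebx*4+a] */
        index++;             /* inc ebx */
    } while (index < 10);    /* cmp ebx, 10 ; jl L_start */
    return sum;              /* the result is returned in eax */
}
```

Since main returns the sum in eax, running the assembled program and inspecting its exit status is one way to see the result.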
Summary
Assembly programming is fun and conceptually simple
but CISC architecture is …
and it requires a combined knowledge of OS, architecture and compilers
Read the online manual
Essential for code generation