x86 ISA
Compiler
Baojian Hua
Front End: source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR
Code Generation

Before discussing code generation, we must
understand what we are trying to generate




virtual machines
bare architecture
…
This course uses x86


So you'll learn how to program at the x86 level
There is an online manual covering every detail

relatively old, but enough for understanding Linux,
Windows, gcc, …
x86

Complex Instruction Set Computer (CISC)

Instructions can operate on memory values

e.g., add [eax], ebx
Complex, multi-cycle instructions

e.g., string-copy, call
Many ways to do the same thing

e.g., add eax, 1; inc eax; sub eax, -1
Instructions are variable-length (1-15 bytes)
Registers are not orthogonal
Capsule History

1978, 8086

First x86 microprocessor, 16-bit
1985, 80386

32-bit, protected mode
1989, 80486
1993, Pentium

MMX
2000, Pentium 4

Deeply pipelined, high frequency
2006, Intel Core 2

Low power, multi-core
x86 ISA

Instruction Set Architecture

another programming language (instruction set)

encoding
decoding
assemble, compile to
…
different implementations

say Intel vs AMD
Basis for OS, compilers, etc.

hardware-software interface
x86 ISA

What’s important here?

OS and library

Note: assembly programs are NOT portable
language syntax

another CFG, read the manual
assembler

directives etc. think “compiler”, read the gas manual
OS and Library

OS simplifies programming model

e.g., Linux and Windows disable segmentation



the so-called “flat” model in the manual
so all segment-related details may be ignored when
reading the manual
OS provides protection

e.g., Linux and Windows run user programs in ring 3

so you cannot change the page table, etc.
OS and Library

OS provides system calls



hide many crazy details
but may still be annoying
Libraries

another level of indirection on top of OS system calls
In particular, we'll use the C library
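To make the two levels concrete, here is a minimal C sketch that sends the same message once through the C library and once through the POSIX write call. The function names say_with_libc and say_with_syscall are hypothetical, and the sketch assumes a POSIX system (Linux), since unistd.h is not part of standard C. Note that write here is itself a thin C library wrapper around the kernel's system call.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* POSIX only: declares write() */

/* Library route: buffered output through the C library.
   Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int say_with_libc(FILE *out, const char *msg) {
    if (fputs(msg, out) == EOF) {
        return -1;
    }
    return fflush(out) == EOF ? -1 : 0;
}

/* System-call route: hand the raw bytes to the kernel via write().
   Returns the number of bytes written, or -1 on failure.
   (Hypothetical helper.) */
long say_with_syscall(int fd, const char *msg) {
    return (long)write(fd, msg, strlen(msg));
}
```

The library version works on a FILE stream and takes care of buffering; the system-call version needs a file descriptor and an explicit byte count, which is the kind of detail the library hides.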
Syntax


Syntax = data + instructions
Data

Immediate


4, 3.14, “hello”
Register

general-purpose


eax, ebx, …
segment

remember? we don’t care
Data

Memory

different usage:




global
stack
heap
but same behavior
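The "same behavior" point can be sketched in C: one function updates an integer through a pointer, and it does not matter whether that integer is a global, a stack variable, or a heap block. The names bump and touch_all_regions are hypothetical, chosen only for this illustration.

```c
#include <stdlib.h>

int global_counter = 1;   /* global: lives in the data section */

/* Add delta to whatever integer p points at. The generated load/store
   is the same regardless of which memory region p refers to. */
void bump(int *p, int delta) {
    *p += delta;
}

/* Touch one integer in each region and return their sum,
   or -1 if the heap allocation fails. */
int touch_all_regions(void) {
    int on_stack = 2;                         /* stack */
    int *on_heap = malloc(sizeof *on_heap);   /* heap */
    int total;

    if (on_heap == NULL) {
        return -1;
    }
    global_counter = 1;   /* reset so repeated calls give the same result */
    *on_heap = 3;

    bump(&global_counter, 10);
    bump(&on_stack, 10);
    bump(on_heap, 10);

    total = global_counter + on_stack + *on_heap;
    free(on_heap);
    return total;
}
```

The three regions differ only in who allocates them and how long they live, not in how the instructions address them.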
Data

Memory addressing mode


seg:[base+index*scale+disp]
any part can be null


complex! right?
e.g., int a[5][10], to read a[3][2]
mov eax, 120              # byte offset of row 3: 3 rows * 10 ints * 4 bytes
mov ebx, 2                # column index
mov ecx, [eax+ebx*4+a]    # load a[3][2]
Problems with this strategy?
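The address arithmetic behind base+index*scale+disp can be checked with a small C sketch. It assumes the flat model (segment ignored) and a 4-byte int, as on 32-bit x86; the function names effective_address and address_of_element are hypothetical. For a[3][2] in int a[5][10], the row contributes 3 * 10 * 4 = 120 bytes and the column 2 * 4 = 8 bytes, so the element sits 128 bytes past the start of a.

```c
#include <assert.h>
#include <stdint.h>

int a[5][10];

/* Effective address of the operand [base + index*scale + disp].
   The segment part is ignored (flat model). */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, uintptr_t disp) {
    /* x86 only allows a scale factor of 1, 2, 4 or 8 */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}

/* Address of a[row][col], split the way the addressing mode needs it:
   base  = byte offset of the row,
   index = column number,
   scale = size of one element,
   disp  = address of the array itself. */
uintptr_t address_of_element(unsigned row, unsigned col) {
    uintptr_t row_offset = (uintptr_t)row * 10 * sizeof(int);
    return effective_address(row_offset, col, (unsigned)sizeof(int),
                             (uintptr_t)a);
}
```

Comparing address_of_element(3, 2) with (uintptr_t)&a[3][2] confirms that the split matches what the C compiler computes for the same element.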
Instructions

Manual covers all instructions in detail:





Data movement
Arithmetic
Control transfer
…
Rather than explain all these bit-by-bit,
I’ll give an example next
Assembler

Assembler is more than just a compiler:



it consumes assembly syntax
it also offers the so-called directives
Two main branches:

Intel syntax

assembler on Windows: masm, nasm, …
the Intel manual!
AT&T syntax

Linux assembler: gas
the GCC output!
the good news is that recent versions of gas support Intel syntax!
This course uses gas (as) with Intel syntax

So reading the Intel manual is relatively easy
Example
# Sum up an array of integers
# compiled by GCC:
# $ gcc test.s
# (comments start with "#"; C/C++ style is also supported)
.intel_syntax noprefix                # directive: tells the assembler that we prefer the Intel syntax
.data                                 # directive: assemble the following to the data section
a:                                    # label: the current address
.int 1, 2, 3, 4, 5, 6, 7, 8, 9, 10    # directive: store 10 integers starting from the address "a"
.globl main                           # directive: globl symbol
.text                                 # directive: assemble the following to the text section
main:                                 # label: another address
Example, cont’
push ebp
mov ebp, esp                    # set up the stack frame
# convention: eax: the sum, ebx: index
xor eax, eax                    # sum = 0
mov ebx, eax                    # index = 0
L_start:
add eax, dword ptr [ebx*4+a]    # sum += a[index]
inc ebx                         # index++
cmp ebx, 10
jl L_start                      # loop while index < 10
leave
ret
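For comparison, here is a rough C equivalent of the assembly loop, with each statement annotated by the instruction it corresponds to. This is a sketch for reading the example, not the C source the slide's code was derived from; the function name sum_array is hypothetical.

```c
/* Same data as the .int directive in the example */
int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

/* Sum the array with the same control flow as the assembly:
   the body runs first, the test comes at the bottom of the loop. */
int sum_array(void) {
    int sum = 0;      /* xor eax, eax */
    int index = 0;    /* mov ebx, eax */

    do {
        sum += a[index];     /* add eax, dword ptr [ebx*4+a] */
        index++;             /* inc ebx */
    } while (index < 10);    /* compare ebx with 10, then jl L_start */

    return sum;              /* the result is left in eax at ret */
}
```

The do-while shape mirrors the single backward jump in the assembly: there is no test before the first iteration, which is safe here only because the array is known to be non-empty.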
Summary

Assembly programming is fun and
simple conceptually



but the CISC architecture is …
and it requires combined knowledge of the OS, architecture and compilers
Read the online manual

Essential for code generation