
Machine-Level Representation of Programs I
Outline
• Compiler drivers
• History of the Intel IA-32 architecture
• Assembly code and object code
• Memory and Registers
• Addressing Mode
• Data Formats
• Suggested reading
– Chap 1.2, 1.4.1, 1.7.3, 3.1, 3.2, 3.3, 3.4.1
The Hello Program
• It begins life as a high-level C program
– Can be read and understood by human beings
• The individual C statements must be translated by the compiler driver
– So that the hello program can run on a computer system
The Hello Program
• The C program is translated into
– A sequence of low-level machine-language instructions
• These instructions are then packaged in a form
– Called an object program
• The object program is stored as a binary disk file
– Also referred to as an executable object file
The Context of a Compiler (gcc)
Figure 1.3 P5
hello.c: source program (text)
→ Preprocessor (cpp) → hello.i: modified source program (text)
→ Compiler (cc1) → hello.s: assembly program (text)
→ Assembler (as) → hello.o: relocatable object program (binary)
→ Linker (ld) → hello: executable object program (binary)
Characteristics of High-Level Programming Languages
• Abstraction
– More productive
– More reliable
• Type checking
• Compiled code is usually as efficient as hand-written assembly code
• Can be compiled and executed on a number of different machines, whereas assembly code is highly machine-specific
Characteristics of Assembly Language Programming
• The programmer manages memory directly
• Low-level instructions carry out the computation
• Highly machine-specific
Why Should We Understand Assembly Code?
• To understand the optimization capabilities of the compiler
• To analyze the underlying inefficiencies in the code
• Sometimes we need to understand the run-time behavior of a program at the machine level
From Writing Assembly Code to Understanding Assembly Code
• Requires a different set of skills
– Knowing the transformations the compiler performs
– Relating the source code to the assembly code
• Reverse engineering
– Trying to understand the process by which a system was created
• By studying the system
• By working backward
A Historical Perspective
• Long evolutionary development
– Started from rather primitive 16-bit processors
– Added more features
• To take advantage of technology improvements
• To satisfy the demand for higher performance and for supporting more advanced operating systems
– Laden with obsolete features that are kept only for backward compatibility
X86 family (year introduced, number of transistors)
• 8086 (1978, 29K)
– The heart of the IBM PC and DOS
– 1 MB addressable, 640 KB available to users
• 80286 (1982, 134K)
– More (now obsolete) addressing modes
– Basis of the IBM PC-AT and Windows
X86 family
• i386 (1985, 275K)
– 32-bit architecture with a flat addressing model
– Could support a Unix operating system
• i486 (1989, 1.9M)
– Integrated the floating-point unit onto the processor chip
X86 family
• Pentium (1993, 3.1M)
• Pentium Pro (1995, 6.5M)
– P6 microarchitecture
– Conditional move instructions
• Pentium/MMX (1997, 4.5M)
– New class of instructions for manipulating vectors of integers
X86 family
• Pentium II (1997, 7M)
– Implemented the MMX instructions within P6
• Pentium III (1999, 8.2M)
– New class of instructions for manipulating vectors of floating-point numbers (SSE, Streaming SIMD Extensions)
X86 family
• Pentium 4 (2001, 42M)
– NetBurst microarchitecture
– 144 new SSE2 instructions
X86 family
• Advanced Micro Devices (AMD)
– Now a close competitor to Intel
– Developing its own extension to 64 bits
X86 family
• Transmeta
– In January 2000, introduced the Crusoe™ processor
– Radically different approach to implementation
• Translates x86 code into "Very Long Instruction Word" (VLIW) code
• High degree of parallelism
– Targets the low-power market, such as laptop computers
Hardware Organization
Figure 1.4 P7
• CPU: Central Processing Unit
• ALU: Arithmetic/Logic Unit
• PC: Program Counter
• USB: Universal Serial Bus
Virtual Address Space
• A linear array of bytes
– Each byte has its own unique address (array index), starting at zero
[Figure: memory drawn as a column of byte contents, with addresses running 0x0, 0x1, 0x2, … up to 0xfffffffe, 0xffffffff]
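The byte-array view can be observed from C by looking at an object through an `unsigned char` pointer: byte k of the object sits at the object's address plus k. The sketch below is illustrative only; the helper name `int_bytes` is hypothetical, and it uses `uint32_t` so the object is exactly 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the 4 bytes of a 32-bit value into out[0..3].
   p[k] reads the byte stored at address (&value + k), i.e. it walks the
   linear array of bytes that holds the object. */
void int_bytes(uint32_t value, unsigned char out[4]) {
    const unsigned char *p = (const unsigned char *)&value;
    for (size_t k = 0; k < sizeof value; k++)
        out[k] = p[k];
}
```

On a little-endian machine such as IA-32, `int_bytes(0x11223344u, b)` leaves 0x44 in `b[0]` (the lowest address) and 0x11 in `b[3]`.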
Data layout
• Object model in C
– Objects of different data types can be declared and allocated in memory
Data layout
• Object model in assembly
– Memory is simply a large, byte-addressable array
– No distinction even between signed and unsigned integers
– It holds code, user data, and OS data
– A run-time stack manages procedure calls and returns
– Blocks of memory are allocated by the user
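To illustrate "no distinction between signed and unsigned", the sketch below reinterprets the same 4 bytes both ways; `memcpy` copies the bits without any value conversion. The function names are hypothetical, and `int32_t` is assumed to be two's complement (which C guarantees whenever that type exists).

```c
#include <stdint.h>
#include <string.h>

/* The same 32 bits viewed as an unsigned number. */
uint32_t bits_as_unsigned(int32_t x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);   /* copy the bit pattern unchanged */
    return u;
}

/* The same 32 bits viewed as a signed (two's-complement) number. */
int32_t bits_as_signed(uint32_t u) {
    int32_t x;
    memcpy(&x, &u, sizeof x);   /* copy the bit pattern unchanged */
    return x;
}
```

For example, the bit pattern 0xFFFFFFFF is 4294967295 as unsigned and -1 as signed; at the machine level it is just 4 bytes, and the instruction that uses it decides the interpretation.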
[Figure 1.13, P17]
Operations in C Constructs
• Arithmetic expression evaluation
• Loops
• Procedure calls and returns
• Each is translated into a sequence of machine instructions
Operations in Assembly Instructions
• Each instruction performs only a very elementary operation
• Instructions normally execute one by one, in sequence
• Perform arithmetic on data stored in registers
• Transfer data between memory and a register
• Conditionally branch to a new instruction address
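The gap between a C loop and elementary instructions can be sketched in C itself: each statement below does one simple step, and the loop is expressed with conditional jumps (`goto`) rather than `for`. This is a hypothetical illustration of the style, not actual compiler output.

```c
/* Hypothetical illustration: summing an array using only elementary
   steps and conditional branches, the way a machine executes a loop. */
int sum_array_goto(const int *a, int n) {
    int result = 0;        /* a compiler would typically keep this in a register */
    int i = 0;             /* loop index, also typically register-resident */
    if (i >= n)
        goto done;         /* conditional branch past the loop body */
loop:
    result += a[i];        /* read a value from memory and add it */
    i++;                   /* register arithmetic */
    if (i < n)
        goto loop;         /* conditional branch back to the top of the loop */
done:
    return result;
}
```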
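Assembly Programmer’s View (Figure 3.2 P136)
[Figure: the CPU holds the program counter %eip, the condition codes %eflags, and the integer registers %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, and %ebp; the low bytes of %eax can also be accessed as %ah and %al, and likewise %ch/%cl, %dh/%dl, %bh/%bl. The CPU exchanges addresses, data, and instructions with memory, which holds the stack, DLLs, heap, data, and text (program code) regions.]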
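The byte registers in the figure are views of parts of a 32-bit register: %al is bits 7-0 of %eax and %ah is bits 15-8 (%ax, the low 16 bits, covers both). A minimal sketch with hypothetical helper names, modeling the register as a `uint32_t`:

```c
#include <stdint.h>

/* %al: bits 7-0 of %eax */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* %ah: bits 15-8 of %eax */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* %ax: bits 15-0 of %eax (the concatenation of %ah and %al) */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
```

With %eax = 0x12345678, %al is 0x78, %ah is 0x56, and %ax is 0x5678. The same layout applies to %ecx, %edx, and %ebx.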
Programmer-Visible States
P129
• Program counter (%eip)
– Address of the next instruction to be executed
• Register file
– Heavily used program data
– Integer and floating-point
Programmer-Visible States
• Condition code registers
– Hold status information about the most recently executed arithmetic or logical instruction
– Used to implement conditional changes in the control flow
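As an illustration of the status information the condition codes hold, the sketch below computes in C four IA-32 flags that a 32-bit add sets: carry (CF), zero (ZF), sign (SF), and overflow (OF). The flag names are IA-32's; the struct and function names are hypothetical, and the other flags are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } cond_codes;

/* Sketch: flags an IA-32 addl would set when computing t = a + b. */
cond_codes add_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint32_t t = a + b;                 /* wraps modulo 2^32, like the hardware */
    cond_codes f;
    f.cf = t < a;                       /* carry out of bit 31: unsigned overflow */
    f.zf = (t == 0);                    /* result is zero */
    f.sf = (t >> 31) & 1u;              /* result is negative as a signed value */
    /* signed overflow: operands share a sign but the result's sign differs */
    f.of = ((~(a ^ b) & (a ^ t)) >> 31) & 1u;
    *result = t;
    return f;
}
```

For instance, 0x7FFFFFFF + 1 sets SF and OF (signed overflow) but not CF, while 0xFFFFFFFF + 1 sets CF and ZF but not OF.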
Code Examples
P130
C code
int accum = 0;

int sum(int x, int y)
{
    int t = x+y;
    accum += t;
    return t;
}
Assembly file code.s, obtained with the command gcc -O2 -S code.c
_sum:
    pushl %ebp
    movl %esp,%ebp
    movl 12(%ebp),%eax
    addl 8(%ebp),%eax
    addl %eax,_accum
    movl %ebp,%esp
    popl %ebp
    ret
Code Examples
P131
Relocatable object file code.o, obtained with the command gcc -O2 -c code.c
55 89 e5 8b 45 0c
03 45 08 01 05 00
00 00 00 89 ec 5d
c3
Code Examples
Disassembly output (P132), obtained with the command objdump -d code.o
0x80483b4 <sum>:
0x80483b4: 55                 push %ebp
0x80483b5: 89 e5              mov %esp,%ebp
0x80483b7: 8b 45 0c           mov 0xc(%ebp),%eax
0x80483ba: 03 45 08           add 0x8(%ebp),%eax
0x80483bd: 01 05 00 00 00 00  add %eax,0x0
0x80483c3: 89 ec              mov %ebp,%esp
0x80483c5: 5d                 pop %ebp
0x80483c6: c3                 ret
0x80483c7: 90                 nop
C Code
• Add two signed integers
• int t = x+y;
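Assembly Code
• Operands of addl 8(%ebp),%eax
– x: Memory, M[%ebp+8]
– y: Register %eax (loaded from M[%ebp+12] by the previous instruction)
– t: Register %eax
• Instruction
– addl 8(%ebp),%eax
– Adds two 4-byte integers
– Similar to the expression y += x, after which %eax holds t
• The function value is returned in %eax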
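The two instructions that compute t can be mirrored at the register level in C. In this hypothetical sketch, the array `frame` stands for the 4-byte stack words starting at %ebp (saved %ebp, return address, then the arguments x and y), and a local variable plays the role of %eax; 4-byte integers are assumed, as on IA-32.

```c
#include <stdint.h>

/* Hypothetical register-level rendering of sum()'s core.
   frame[0] = M[%ebp]    (saved %ebp)
   frame[1] = M[%ebp+4]  (return address)
   frame[2] = M[%ebp+8]  = x
   frame[3] = M[%ebp+12] = y                                       */
int32_t sum_register_level(const int32_t frame[4]) {
    int32_t eax;
    eax = frame[3];    /* movl 12(%ebp),%eax : %eax = y                  */
    eax += frame[2];   /* addl 8(%ebp),%eax  : %eax += x (no overflow assumed,
                          since signed overflow is undefined in C)        */
    return eax;        /* the return value is left in %eax               */
}
```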
Object Code
• 3-byte instruction
• Stored at address 0x80483ba
• 0x80483ba: 03 45 08
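The bytes 03 45 08 can be taken apart by hand. In IA-32 encoding, 0x03 is the opcode for adding a 32-bit register-or-memory operand into a register, 0x45 is a ModR/M byte that names the operands, and 0x08 is an 8-bit displacement. The hypothetical helper below only splits a ModR/M byte into its three fields; it is a sketch of that one byte's layout, not a disassembler.

```c
#include <stdint.h>

typedef struct { unsigned mod, reg, rm; } modrm_fields;

/* Hypothetical helper: split an IA-32 ModR/M byte into its fields.
   Register numbers: 0=%eax 1=%ecx 2=%edx 3=%ebx 4=%esp 5=%ebp 6=%esi 7=%edi */
modrm_fields decode_modrm(uint8_t b) {
    modrm_fields f;
    f.mod = (b >> 6) & 0x3u;  /* bits 7-6: 01 = memory operand with 8-bit displacement,
                                 11 = register operand */
    f.reg = (b >> 3) & 0x7u;  /* bits 5-3: the register operand */
    f.rm  = b & 0x7u;         /* bits 2-0: base register; 101 selects %ebp unless mod = 00 */
    return f;
}
```

For 0x45 this gives mod = 01, reg = 000 (%eax), rm = 101 (%ebp), so with displacement 0x08 the instruction is addl 8(%ebp),%eax. The same helper applied to 0xe5 (from 89 e5) gives mod = 11, reg = 100 (%esp), rm = 101 (%ebp), matching mov %esp,%ebp.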
Operands
P137
• In high-level languages
– Either constants
– Or variables
• Example
– A = A + 4, where A is a variable and 4 is a constant
Operands
• Counterparts in assembly languages
– Immediate (constant)
– Register (variable)
– Memory (variable)
• Example
– movl 8(%ebp), %eax: 8(%ebp) is a memory operand and %eax is a register operand
– addl $4, %eax: $4 is an immediate operand and %eax is a register operand
Simple Addressing Mode
• Immediate
– Represents a constant
– The format is $Imm (e.g., $4, $0xffffffff)
• Registers
– The fastest storage units in a computer system
– 32 bits long on IA-32
– Register mode Ea
• The operand value is the value stored in register Ea
• Denoted R[Ea]
Memory References
• The memory array is denoted M
• If addr is a memory address, M[addr] is the content of memory starting at addr
– addr is used as an index into the array
• How many bytes are there in M[addr]?
– It depends on the context, i.e., on the size of the data the instruction operates on
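A minimal sketch of "it depends on the context": the same starting address yields a 1-, 2-, or 4-byte value depending on how many bytes are read. It assumes IA-32's little-endian byte order (least significant byte at the lowest address); the array `mem` and the function `read_mem` are hypothetical stand-ins for M.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of M[addr]: assemble nbytes (1, 2, or 4) bytes
   starting at mem[addr] into a value, least significant byte first. */
uint32_t read_mem(const uint8_t *mem, size_t addr, int nbytes) {
    uint32_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | mem[addr + (size_t)i];  /* highest address is most significant */
    return v;
}
```

With the bytes 78 56 34 12 stored at addresses 0 through 3, a 1-byte read of M[0] gives 0x78, a 2-byte read gives 0x5678, and a 4-byte read gives 0x12345678.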
Memory Addressing Mode
• An expression for
– A memory address (or an array index)
• Most general form
– Imm(Eb, Ei, s)
– Imm: displacement, Eb: base register, Ei: index register
– s: scale factor, which must be 1, 2, 4, or 8
• The address represented by this form is
– Imm + R[Eb] + R[Ei] * s
• The operand value is
– M[Imm + R[Eb] + R[Ei] * s]
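The address formula can be written directly as a function. This is a sketch: the name `effective_address` is hypothetical, the register values are passed in as plain numbers, and the arithmetic wraps modulo 2^32 as 32-bit address arithmetic does on IA-32 (so a negative displacement works as expected).

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: *addr = Imm + R[Eb] + R[Ei]*s for the form Imm(Eb, Ei, s).
   Returns false if s is not one of the allowed scale factors. */
bool effective_address(int32_t imm, uint32_t rb, uint32_t ri, uint32_t s,
                       uint32_t *addr) {
    if (s != 1 && s != 2 && s != 4 && s != 8)
        return false;                       /* only 1, 2, 4, 8 are encodable */
    *addr = (uint32_t)imm + rb + ri * s;    /* unsigned arithmetic wraps mod 2^32 */
    return true;
}
```

For example, with R[%ecx] = 0x1 and R[%edx] = 0x3, the operand 260(%ecx,%edx) has address 260 + 0x1 + 0x3 = 0x104 + 0x4 = 0x108.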
Addressing Mode (Figure 3.3 P137)
Type       Form             Operand value                Name
Immediate  $Imm             Imm                          Immediate
Register   Ea               R[Ea]                        Register
Memory     Imm              M[Imm]                       Absolute
Memory     (Ea)             M[R[Ea]]                     Indirect
Memory     Imm(Eb)          M[Imm + R[Eb]]               Base + displacement
Memory     (Eb, Ei)         M[R[Eb] + R[Ei]]             Indexed
Memory     Imm(Eb, Ei)      M[Imm + R[Eb] + R[Ei]]       Indexed
Memory     (, Ei, s)        M[R[Ei] * s]                 Scaled indexed
Memory     (Eb, Ei, s)      M[R[Eb] + R[Ei] * s]         Scaled indexed
Memory     Imm(Eb, Ei, s)   M[Imm + R[Eb] + R[Ei] * s]   Scaled indexed
Practice Problem 3.1 (P138)
Memory contents:
Address   Value
0x100     0xFF
0x104     0xAB
0x108     0x13
0x10C     0x11
Register contents:
Register  Value
%eax      0x100
%ecx      0x1
%edx      0x3
Operand values:
Operand          Value   Comment
%eax             0x100   Register
(%eax)           0xFF    Address 0x100
$0x108           0x108   Immediate
0x108            0x13    Absolute address
260(%ecx,%edx)   0x13    Address 0x108
(%eax,%edx,4)    0x11    Address 0x10C
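The answers can be checked mechanically. The sketch below models only the machine state given in the problem (three registers and four memory words); the type `machine`, the functions `mem_read` and `mem_operand`, and the convention of passing -1 for an absent base or index register are all hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum reg { EAX, ECX, EDX, NREGS };

/* Hypothetical toy state: R[...] plus the four memory words from the table */
typedef struct {
    uint32_t r[NREGS];   /* register file */
    uint32_t addr[4];    /* addresses that hold data */
    uint32_t val[4];     /* M[addr[i]] = val[i] */
} machine;

/* M[a]; this sketch only knows the listed addresses (others read as 0) */
uint32_t mem_read(const machine *m, uint32_t a) {
    for (size_t i = 0; i < 4; i++)
        if (m->addr[i] == a)
            return m->val[i];
    return 0;
}

/* Value of the memory operand Imm(Eb, Ei, s); base/index = -1 when absent */
uint32_t mem_operand(const machine *m, int32_t imm, int base, int index, uint32_t s) {
    uint32_t a = (uint32_t)imm;
    if (base >= 0)
        a += m->r[base];
    if (index >= 0)
        a += m->r[index] * s;
    return mem_read(m, a);
}
```

Usage: with `machine m = { {0x100, 0x1, 0x3}, {0x100, 0x104, 0x108, 0x10C}, {0xFF, 0xAB, 0x13, 0x11} };`, the call `mem_operand(&m, 260, ECX, EDX, 1)` returns 0x13 and `mem_operand(&m, 0, EAX, EDX, 4)` returns 0x11, matching the table.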
Data Formats (Figure 3.1 P135)
C declaration   Intel data type      GAS suffix   Size (bytes)
char            Byte                 b            1
short           Word                 w            2
int             Double word          l            4
unsigned        Double word          l            4
long int        Double word          l            4
unsigned long   Double word          l            4
char *          Double word          l            4
float           Single precision     s            4
double          Double precision     l            8
long double     Extended precision   t            10/12
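The sizes in Figure 3.1 describe IA-32; a given compiler's sizes can be compared with them using `sizeof`. This sketch assumes a mainstream 32- or 64-bit target. The first check covers only rows whose sizes usually agree between IA-32 and x86-64; the second covers rows that hold on IA-32 (e.g. gcc -m32) but typically not on x86-64, where `long` and pointers are often 8 bytes. Both function names are hypothetical, and `long double` is omitted because its size varies.

```c
#include <stdbool.h>

/* Hypothetical check: rows of Figure 3.1 that usually match on IA-32 and x86-64 */
bool common_sizes_match_figure_3_1(void) {
    return sizeof(char) == 1 && sizeof(short) == 2 &&
           sizeof(int) == 4 && sizeof(unsigned) == 4 &&
           sizeof(float) == 4 && sizeof(double) == 8;
}

/* Hypothetical check: the 4-byte long and pointer rows (true on IA-32) */
bool long_and_pointer_are_4_bytes(void) {
    return sizeof(long) == 4 && sizeof(unsigned long) == 4 &&
           sizeof(char *) == 4;
}
```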
Data Formats
• Move data instruction
–
–
–
–
mov (general)
movb (move byte)
movw (move word)
movl (move double word)
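The suffix on a move instruction encodes the operand size from the GAS suffix column of Figure 3.1. A minimal sketch of that mapping for integer moves; the function name is hypothetical.

```c
/* Hypothetical helper: GAS suffix of an integer IA-32 mov for a given size,
   following Figure 3.1 (b = byte, w = word, l = double word). */
char mov_suffix(int size_bytes) {
    switch (size_bytes) {
    case 1:  return 'b';   /* movb: move byte */
    case 2:  return 'w';   /* movw: move word */
    case 4:  return 'l';   /* movl: move double word */
    default: return '?';   /* no single IA-32 integer mov of this size */
    }
}
```

For example, `mov_suffix((int)sizeof(short))` gives 'w', matching the movw used for short data.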