Transcript 17Assembly2

Assembly Language
Part 2
Professor Jennifer Rexford
COS 217
1
Goals of Today’s Lecture
• Machine language
 Encoding the operation and the operands
 Simpler MIPS instruction set as an example
• More on IA32 assembly language
 Different sizes of data
 Example instructions
 Addressing modes
• Layout of assembly language program
2
Machine Language
Using MIPS Architecture as an Example
(since it has a simpler instruction set than IA32)
3
Three Levels of Languages
• High-level languages (e.g., Java and C)
 Easier programming by describing operations in a more natural, human-readable notation
 Increased portability of the code
• Assembly language (e.g., IA32 and MIPS)
 Tied to the specifics of the underlying machine
 Instructions and names to make code human readable
• Machine language
 Also tied to the specifics of the underlying machine
 In binary format the computer can read and execute
 Every instruction is a sequence of one or more numbers
4
Machine-Language Instructions
An ADD instruction (assembly): add r1 = r2 + r3
 add is the opcode; r1, r2, and r3 are the operands
Parts of the instruction:
• Opcode (verb) – what operation to perform
• Operands (nouns) – what to operate upon
• Source operands – where values come from
• Destination operand – where to deposit data values
Machine-Language Instruction
• Opcode
 What to do
• Source operand(s)
 Immediate (in the instruction itself)
 Register
 Memory location
 I/O port
• Destination operand
 Register
 Memory location
 I/O port
• Assembly syntax
Opcode source1, [source2,] destination
6
MIPS Has Three Kinds of 32-bit Instructions
• R: Registers
 Two source registers (rs and rt)
 One destination register (rd)
 E.g., “rd = rs + rt” or “rd = rs & rt” or “rd = rs xor rt”
 Fields: op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
 op and funct: operation and specific variant; shamt: shift amount
7
MIPS Has Three Kinds of 32-bit Instructions
• I: Immediate, transfer, branch
 One source register (rs) and one 16-bit constant (imm)
 One destination register (rd)
 E.g., “rd = rs + imm” or “rd = rs & imm”
 E.g., “rd = MEM[rs + imm]” (treating rs+imm as address)
 E.g., “jump to address contained in rs” (rs as address)
 E.g., “jump to word imm if rs is 0” (i.e., change instruction pointer)
 Fields: op (6 bits) | rs (5) | rd (5) | address/immediate (16)
8
MIPS Has Three Kinds of 32-bit Instructions
• J: Jump
 One 26-bit constant (imm) for # of 32-bit words to jump (28 bits once converted to a byte address)
 E.g., “jump by imm words” (i.e., change the instruction pointer)
 Fields: op (6 bits) | target address (26)
9
MIPS “Add” Instruction Encoding
Add registers 18 and 19, and store result in register 17.
add is an R instruction
 op = 0 | rs = 18 | rt = 19 | rd = 17 | shamt = 0 | funct = 32
10
MIPS “Subtract” Instruction Encoding
Subtract register 19 from register 18 and store in register 17.
sub is an R instruction
 op = 0 | rs = 18 | rt = 19 | rd = 17 | shamt = 0 | funct = 34
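As a rough illustration of how the six fields above become one 32-bit number, here is a minimal sketch in C. The function name encode_r is hypothetical (it is not from the lecture), and the field widths of 6, 5, 5, 5, 5, and 6 bits are assumed from the standard MIPS R format.

```c
#include <stdint.h>

/* Hypothetical helper: pack the six R-format fields into one 32-bit word.
   Assumed layout, most significant field first:
   op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6). */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct)
{
    return ((op    & 0x3Fu) << 26) |
           ((rs    & 0x1Fu) << 21) |
           ((rt    & 0x1Fu) << 16) |
           ((rd    & 0x1Fu) << 11) |
           ((shamt & 0x1Fu) <<  6) |
            (funct & 0x3Fu);
}
```

Under these assumptions, encode_r(0, 18, 19, 17, 0, 32) gives 0x02538820 for the add, and changing funct to 34 gives 0x02538822 for the sub; only the low six bits differ.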
11
Greater Detail on IA32 Assembly:
Instruction Set and Data Sizes
12
Earlier Example
n is in %edx, count is in %ecx

count=0;
while (n>1) {
   count++;
   if (n&1)
      n = n*3+1;
   else
      n = n/2;
}

        movl $0, %ecx
.loop:
        cmpl $1, %edx
        jle .endloop
        addl $1, %ecx
        movl %edx, %eax
        andl $1, %eax
        je .else
        movl %edx, %eax
        addl %eax, %edx
        addl %eax, %edx
        addl $1, %edx
        jmp .endif
.else:
        sarl $1, %edx
.endif:
        jmp .loop
.endloop:
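One way to see how the assembly lines up with the C loop is to rewrite the loop in C using only gotos, one statement per instruction. This is only a sketch: the function name count_steps and the variable tmp (standing in for %eax) are hypothetical, and the labels are chosen to mirror the assembly labels.

```c
/* Hypothetical helper: the same loop written with gotos so that each
   statement corresponds to one instruction of the assembly version.
   n plays the role of %edx, count of %ecx, and tmp of %eax. */
int count_steps(int n)
{
    int count = 0;             /* movl $0, %ecx */
    int tmp;
loop:
    if (n <= 1) goto endloop;  /* cmpl $1, %edx ; jle .endloop */
    count += 1;                /* addl $1, %ecx */
    tmp = n;                   /* movl %edx, %eax */
    tmp &= 1;                  /* andl $1, %eax */
    if (tmp == 0) goto else_;  /* je .else */
    tmp = n;                   /* movl %edx, %eax */
    n += tmp;                  /* addl %eax, %edx  (n is now 2n) */
    n += tmp;                  /* addl %eax, %edx  (n is now 3n) */
    n += 1;                    /* addl $1, %edx */
    goto endif;                /* jmp .endif */
else_:
    n >>= 1;                   /* sarl $1, %edx  (n is positive here) */
endif:
    goto loop;                 /* jmp .loop */
endloop:
    return count;
}
```

For example, count_steps(6) walks through 3, 10, 5, 16, 8, 4, 2, 1 and returns 8. Note that the multiply by 3 is done with two additions and the divide by 2 with a shift.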
13
Size of Variables
• Data types in high-level languages vary in size
 Character: 1 byte
 Short, int, and long: varies, depending on the computer
 Pointers: typically 4 bytes
 Struct: arbitrary size, depending on the elements
• Implications
 Need to be able to store and manipulate in multiple sizes
 Byte (1 byte), word (2 bytes), and extended (4 bytes)
 Separate assembly-language instructions
– e.g., addb, addw, addl
 Separate ways to access (parts of) a 4-byte register
14
Four-Byte Memory Words
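[Figure: memory drawn as a column of 4-byte words, from address 0 up to 2^32-1. Within each word the bit positions are 31-24, 23-16, 15-8, and 7-0. The word at address 0 holds Byte 3, Byte 2, Byte 1, Byte 0; the next word holds Byte 7, Byte 6, Byte 5, Byte 4.]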
Byte order is little endian
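To make "little endian" concrete, the sketch below stores a 32-bit value into four consecutive bytes with the least significant byte at the lowest address, and reads it back. The function names store_le32 and load_le32 are hypothetical; the code spells out the byte order explicitly, so it behaves the same on any host.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value into four consecutive bytes,
   least significant byte first (little endian). */
void store_le32(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)(value & 0xFFu);          /* Byte 0: bits 7-0   */
    mem[1] = (uint8_t)((value >> 8) & 0xFFu);   /* Byte 1: bits 15-8  */
    mem[2] = (uint8_t)((value >> 16) & 0xFFu);  /* Byte 2: bits 23-16 */
    mem[3] = (uint8_t)((value >> 24) & 0xFFu);  /* Byte 3: bits 31-24 */
}

/* Hypothetical helper: rebuild the 32-bit value from its four bytes. */
uint32_t load_le32(const uint8_t mem[4])
{
    return (uint32_t)mem[0] |
           ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) |
           ((uint32_t)mem[3] << 24);
}
```

Storing 0x11223344 this way puts 0x44 at the lowest address and 0x11 at the highest.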
15
IA32 General Purpose Registers
General-purpose registers

32-bit (bits 31-0)   16-bit (bits 15-0)   8-bit (bits 15-8)   8-bit (bits 7-0)
EAX                  AX                   AH                  AL
EBX                  BX                   BH                  BL
ECX                  CX                   CH                  CL
EDX                  DX                   DH                  DL
ESI                  SI
EDI                  DI
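The relationship between EAX, AX, AH, and AL can be sketched in C with masks and shifts. The function names below are hypothetical; they only model which bits of the 32-bit register each smaller name refers to.

```c
#include <stdint.h>

/* Hypothetical helpers: which bits of a 32-bit register each name covers. */
uint8_t  low_byte(uint32_t eax)  { return (uint8_t)(eax & 0xFFu); }         /* AL: bits 7-0  */
uint8_t  high_byte(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* AH: bits 15-8 */
uint16_t low_word(uint32_t eax)  { return (uint16_t)(eax & 0xFFFFu); }      /* AX: bits 15-0 */

/* Writing AL replaces only bits 7-0; the upper 24 bits of EAX are unchanged. */
uint32_t write_low_byte(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

With EAX holding 0x12345678, AL is 0x78, AH is 0x56, and AX is 0x5678.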
16
Arithmetic Instructions
• Simple instructions
 add{b,w,l} source, dest        dest = source + dest
 sub{b,w,l} source, dest        dest = dest – source
 inc{b,w,l} dest                dest = dest + 1
 dec{b,w,l} dest                dest = dest – 1
 neg{b,w,l} dest                dest = –dest
 cmp{b,w,l} source1, source2    source2 – source1 (result discarded, flags set)
• Multiply
 mul (unsigned) or imul (signed)
mull %ebx     # edx:eax = eax * ebx
• Divide
 div (unsigned) or idiv (signed)
idivl %ebx    # eax = edx:eax / ebx, edx = remainder
• Many more in Intel manual (volume 2)
 adc, sbb, decimal arithmetic instructions
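The multiply and divide instructions use the EDX and EAX registers together as one 64-bit quantity. The sketch below models the unsigned forms (mull and divl) in C; the signed forms imul and idiv follow the same register pattern. The function names are hypothetical, and the model assumes the divisor is nonzero and the quotient fits in 32 bits (the real hardware raises an exception otherwise).

```c
#include <stdint.h>

/* Hypothetical model of `mull src`: edx:eax = eax * src (unsigned). */
void model_mull(uint32_t *eax, uint32_t *edx, uint32_t src)
{
    uint64_t product = (uint64_t)*eax * src;
    *eax = (uint32_t)product;          /* low 32 bits  */
    *edx = (uint32_t)(product >> 32);  /* high 32 bits */
}

/* Hypothetical model of `divl src`: eax = edx:eax / src, edx = remainder.
   Assumes src != 0 and that the quotient fits in 32 bits. */
void model_divl(uint32_t *eax, uint32_t *edx, uint32_t src)
{
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    *eax = (uint32_t)(dividend / src);
    *edx = (uint32_t)(dividend % src);
}
```

For instance, multiplying 0xFFFFFFFF by 2 leaves 0xFFFFFFFE in EAX and 1 in EDX; dividing that 64-bit pair by 2 restores 0xFFFFFFFF in EAX with remainder 0 in EDX.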
17
Bitwise Logic Instructions
• Simple instructions
 and{b,w,l} source, dest                 dest = source & dest
 or{b,w,l} source, dest                  dest = source | dest
 xor{b,w,l} source, dest                 dest = source ^ dest
 not{b,w,l} dest                         dest = ~dest
 sal{b,w,l} source, dest (arithmetic)    dest = dest << source
 sar{b,w,l} source, dest (arithmetic)    dest = dest >> source
• Many more in Intel Manual (volume 2)
 Logical shift
 Rotate
 Bit scan
 Bit test
 Byte set on conditions
18
Branch Instructions
• Conditional jump
 j{l,g,e,ne,...} target      if (condition) {eip = target}

Comparison        Signed   Unsigned   Meaning
=                 e        e          “equal”
≠                 ne       ne         “not equal”
>                 g        a          “greater, above”
≥                 ge       ae         “...-or-equal”
<                 l        b          “less, below”
≤                 le       be         “...-or-equal”
overflow/carry    o        c
no ovf/carry      no       nc
• Unconditional jump
 jmp target
 jmp *register
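The signed and unsigned condition codes exist because the same 32 bits can order differently under the two interpretations. The sketch below is a simplified model in C, with hypothetical function names; it reports whether jl (signed "less") or jb (unsigned "below") would be taken when a is compared against b.

```c
#include <stdint.h>

/* Hypothetical model: would `jb` be taken, i.e. is a below b as unsigned? */
int taken_jb(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Hypothetical model: would `jl` be taken, i.e. is a less than b as signed?
   Flipping the sign bit maps two's-complement order onto unsigned order,
   which avoids relying on implementation-defined casts. */
int taken_jl(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}
```

With a = 0xFFFFFFFF and b = 1, the signed view sees -1 < 1, so jl is taken, while the unsigned view sees 4294967295 < 1 as false, so jb is not.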
19
Setting the EFLAGS Register
• Comparison cmpl compares two integers
 Done by subtracting the first number from the second
– Discarding the result, but setting the eflags register
 Example:
– cmpl $1, %edx (computes %edx – 1)
– jle .endloop (looks at the sign, overflow, and zero flags)
• Logical operation andl also sets the flags
 Example:
– andl $1, %eax (bit-wise AND of %eax with 1)
– je .else (looks at the zero flag)
• Unconditional branch jmp ignores the flags
 Example:
– jmp .endif and jmp .loop
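A small C model may help show how cmpl followed by jle works. This is a sketch under stated assumptions: the struct and function names are hypothetical, only four flags are modeled, and the jle condition used is the one documented in the Intel manual (ZF set, or SF different from OF).

```c
#include <stdint.h>

/* Hypothetical container for the four flags modeled here. */
struct flags { int zf, sf, of, cf; };

/* Hypothetical model of `cmpl src1, src2`: compute src2 - src1,
   discard the result, and keep only the flags. */
struct flags model_cmpl(uint32_t src1, uint32_t src2)
{
    uint32_t result = src2 - src1;
    struct flags f;
    f.zf = (result == 0);          /* zero flag: result is zero */
    f.sf = (int)(result >> 31);    /* sign flag: top bit of the result */
    /* overflow flag: operands differ in sign and the result's sign
       differs from src2's sign */
    f.of = (int)(((src2 ^ src1) & (src2 ^ result)) >> 31);
    f.cf = (src2 < src1);          /* carry flag: unsigned borrow */
    return f;
}

/* jle is taken when ZF is set or SF differs from OF. */
int taken_jle(struct flags f)
{
    return f.zf || (f.sf != f.of);
}
```

Modeling cmpl $1, %edx as model_cmpl(1, edx), the jump to .endloop is taken for edx = 1 or 0 and not taken for edx = 5, which matches the loop condition n > 1.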
20
EFLAGS Register & Condition Codes
Bits 31-22: Reserved (set to 0)
Bit 21: ID – Identification flag
Bit 20: VIP – Virtual interrupt pending
Bit 19: VIF – Virtual interrupt flag
Bit 18: AC – Alignment check
Bit 17: VM – Virtual 8086 mode
Bit 16: RF – Resume flag
Bit 15: 0
Bit 14: NT – Nested task flag
Bits 13-12: IOPL – I/O privilege level
Bit 11: OF – Overflow flag
Bit 10: DF – Direction flag
Bit 9: IF – Interrupt enable flag
Bit 8: TF – Trap flag
Bit 7: SF – Sign flag
Bit 6: ZF – Zero flag
Bit 5: 0
Bit 4: AF – Auxiliary carry flag or adjust flag
Bit 3: 0
Bit 2: PF – Parity flag
Bit 1: 1
Bit 0: CF – Carry flag
21
Data Transfer Instructions
• mov{b,w,l} source, dest
 General move instruction
• push{w,l} source
pushl %ebx
# equivalent instructions
subl $4, %esp
movl %ebx, (%esp)
 esp moves down by 4 bytes, then points at the pushed value
• pop{w,l} dest
popl %ebx
# equivalent instructions
movl (%esp), %ebx
addl $4, %esp
 esp moves back up by 4 bytes after the value is read
• Many more in Intel manual (volume 2)
 Type conversion, conditional move, exchange, compare and
exchange, I/O port, string move, etc.
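The push and pop equivalences above can be modeled in C with a byte array standing in for memory. Everything named here (struct machine, STACK_BYTES, model_pushl, model_popl) is hypothetical, and the sketch does no bounds checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Hypothetical toy machine: a tiny memory and a stack pointer.
   esp is an index into mem, and the stack grows toward index 0. */
struct machine {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;
};

/* Model of pushl: make room first, then store the value. */
void model_pushl(struct machine *m, uint32_t value)
{
    m->esp -= 4;                         /* subl $4, %esp      */
    memcpy(&m->mem[m->esp], &value, 4);  /* movl value, (%esp) */
}

/* Model of popl: read the value first, then release the room. */
uint32_t model_popl(struct machine *m)
{
    uint32_t value;
    memcpy(&value, &m->mem[m->esp], 4);  /* movl (%esp), dest */
    m->esp += 4;                         /* addl $4, %esp     */
    return value;
}
```

Starting with esp equal to STACK_BYTES (an empty stack), pushing 7 and then 9 lowers esp by 8; the two pops then return 9 and 7, in that order, and esp ends where it started.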
22
Greater Detail on IA32 Assembly:
Addressing Modes
23
Ways to Read and Write Data
• Processors have many ways to access data
 Known as “addressing modes”
• Two simplest ways (used in earlier example)
 Immediate addressing: movl $0, %ecx
– Data embedded in the instruction
– Initialize register ECX with zero
 Register addressing: movl %edx, %ecx
– Data stored in a register
– Copy value in register EDX into register ECX
• The others all deal with memory addresses
 To read and write data from main memory
 E.g., to get data from memory into a register
 E.g., to write data from a register back into memory
24
Direct vs. Indirect Addressing
• Read or write from a particular memory location
 Essentially dereferencing a pointer
• Direct addressing: movl 2000, %ecx
 Address embedded in the instruction
 E.g., address 2000 corresponds to a global variable
 Load ECX register with the long located at address 2000
• Indirect addressing: movl (%eax), %ebx
 Address stored in a register
 E.g., EAX register is a pointer
 Load EBX register with long located at address in EAX
25
More Complex Addressing Modes
• Base pointer addressing: movl 4(%eax), %ebx
 Extends indirect addressing by allowing an offset
 E.g., add “4” to the register EAX to get the address
 Allows access to a particular field in a structure
 E.g., if “age” starts at byte offset 4 of a record
• Indexed addressing: movl 2000(,%ecx,1), %ebx
 Starts from a base address (e.g., 2000)
 Adds an offset from a register (e.g., ECX)
 With a multiplier of 1, 2, 4, or 8 (e.g., 1 for a byte array)
 Allows register to be index for byte, word, or long array
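In C terms, base pointer addressing is roughly what happens on a structure field access, and indexed addressing is roughly what happens on an array access. The sketch below uses a hypothetical struct record whose age field is assumed to sit at byte offset 4; the instructions named in the comments are what a compiler could plausibly emit, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout: age is assumed to start at byte offset 4. */
struct record {
    int32_t id;   /* offset 0 */
    int32_t age;  /* offset 4 */
};

/* p->age reads 4 bytes at (address in p) + 4,
   in the spirit of movl 4(%eax), %ebx. */
int32_t get_age(const struct record *p)
{
    return p->age;
}

/* table[i] reads 1 byte at (address of table) + i * 1,
   in the spirit of indexed addressing with a multiplier of 1. */
uint8_t get_byte(const uint8_t *table, size_t i)
{
    return table[i];
}
```

For an array of 4-byte elements the multiplier would be 4 instead of 1, which is why the hardware offers multipliers of 1, 2, 4, and 8.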
26
Effective Address
Offset = Base + (Index * scale) + displacement
 Base: eax, ebx, ecx, edx, esp, ebp, esi, or edi
 Index: eax, ebx, ecx, edx, ebp, esi, or edi (esp cannot be used as an index)
 Scale: 1, 2, 4, or 8
 Displacement: none, 8-bit, 16-bit, or 32-bit
• Displacement
movl foo, %ebx
• Base
movl (%eax), %ebx
• Base + displacement
movl foo(%eax), %ebx
movl 1(%eax), %ebx
• (Index * scale) + displacement
movl (,%eax,4), %ebx
• Base + (index * scale) + displacement
movl foo(%edx,%eax,4),%ebx
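The effective-address formula is plain 32-bit arithmetic, which the following C sketch spells out. The function name effective_address is hypothetical; an absent base or index is modeled as 0, and an absent scale as 1.

```c
#include <stdint.h>

/* Hypothetical helper: Offset = base + (index * scale) + displacement,
   with 32-bit wraparound. scale is expected to be 1, 2, 4, or 8;
   pass 0 for a missing base or index. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t displacement)
{
    return base + index * scale + (uint32_t)displacement;
}
```

For example, with base 1000, index 3, scale 4, and displacement 8, the address is 1000 + 12 + 8 = 1020. A negative displacement such as -4 with base 1000 gives 996.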
27
Data Access Methods: Summary
• Immediate addressing: data stored in the instruction itself
 movl $10, %ecx
• Register addressing: data stored in a register
 movl %eax, %ecx
• Direct addressing: address stored in instruction
 movl 2000, %ecx
• Indirect addressing: address stored in a register
 movl (%eax), %ebx
• Base pointer addressing: includes an offset as well
 movl 4(%eax), %ebx
• Indexed addressing: instruction contains base address, and
specifies an index register and a multiplier (1, 2, 4, or 8)
 movl 2000(,%ecx,1), %ebx
28
Layout of an Assembly Language
Program
29
A Simple Assembly Program
        .section .data
        # pre-initialized
        # variables go here

        .section .bss
        # zero-initialized
        # variables go here

        .section .rodata
        # pre-initialized
        # constants go here

        .section .text
        .globl _start
_start:
        # Program starts executing
        # here

        # Body of the program goes
        # here

        # Program ends with an
        # “exit()” system call
        # to the operating system
        movl $1, %eax
        movl $0, %ebx
        int $0x80
30
Main Parts of the Program
• Break program into sections (.section)
 Data, BSS, RoData, and Text
• Starting the program
 Making _start a global (.globl _start)
– Tells the assembler to remember the symbol _start
– … because the linker will need it
 Identifying the start of the program (_start)
– Defines the value of the label _start
31
Main Parts of the Program
• Exiting the program
 Specifying the exit() system call (movl $1, %eax)
– Linux expects the system call number in EAX register
 Specifying the status code (movl $0, %ebx)
– Linux expects the status code in EBX register
 Trapping into the operating system with a software interrupt (int $0x80)
32
Conclusions
• Machine code
 Binary representation of instructions
 What operation to do, and on what data
• IA32 instructions
 Manipulate bytes, words, or longs
 Numerous kinds of operations
 Wide variety of addressing modes
• Next time
 Calling functions, using the stack
33