Transcript 17Assembly2
Assembly Language
Part 2
Professor Jennifer Rexford
COS 217
Goals of Today’s Lecture
• Machine language
Encoding the operation and the operands
Simpler MIPS instruction set as an example
• More on IA32 assembly language
Different sizes of data
Example instructions
Addressing modes
• Layout of assembly language program
Machine Language
Using MIPS Architecture as an Example
(since it has a simpler instruction set than IA32)
Three Levels of Languages
• High-level languages (e.g., Java and C)
Easier programming: operations are described in a notation
closer to natural language
Increased portability of the code
• Assembly language (e.g., IA32 and MIPS)
Tied to the specifics of the underlying machine
Instructions and names to make code human readable
• Machine language
Also tied to the specifics of the underlying machine
In binary format the computer can read and execute
Every instruction is a sequence of one or more numbers
Machine-Language Instructions
An ADD instruction (assembly): add r1 = r2 + r3
Opcode: add
Operands: r1, r2, r3
Parts of the Instruction:
• Opcode (verb) – what operation to perform
• Operands (noun) – what to operate upon
• Source Operands – where values come from
• Destination Operand – where to deposit data values
Machine-Language Instruction
• Opcode
What to do
• Source operand(s)
Immediate (in the instruction itself)
Register
Memory location
I/O port
• Destination operand
Register
Memory location
I/O port
• Assembly syntax
Opcode source1, [source2,] destination
MIPS Has Three Kinds of 32-bit Instructions
• R: Registers
Two source registers (rs and rt)
One destination register (rd)
E.g., “rd = rs + rt” or “rd = rs & rt” or “rd = rs xor rt”
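Field layout (32 bits, most significant field first):
op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
op and funct: operation and specific variant
shamt: shift amount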
• I: Immediate, transfer, branch
One source register (rs) and one 16-bit constant (imm)
One destination register (rt)
E.g., “rt = rs + imm” or “rt = rs & imm”
E.g., “rt = MEM[rs + imm]” (treating rs+imm as address)
E.g., “jump to address contained in rs” (rs as address)
E.g., “jump to word imm if rs is 0” (i.e., change the instruction pointer)
Field layout (32 bits, most significant field first):
op (6 bits) | rs (5) | rt (5) | address/immediate (16)
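The I-format packing can be sketched in C. This is an illustration under stated assumptions, not code from the lecture: the function names encode_i and decode_imm are hypothetical, and the field widths (6, 5, 5, 16) and the name rt for the destination field follow the standard MIPS layout.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the four I-format fields into one 32-bit word:
 * op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits). */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, uint16_t imm)
{
    assert(op < 64 && rs < 32 && rt < 32);   /* each field must fit its width */
    return (op << 26) | (rs << 21) | (rt << 16) | imm;
}

/* Recover the immediate from an encoded word, sign-extended to 32 bits. */
int32_t decode_imm(uint32_t word)
{
    uint32_t low = word & 0xFFFFu;
    return (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
}
```

For instance, encode_i(8, 18, 17, 100) gives 0x22510064. Opcode 8 is addi in the standard MIPS opcode table, which the slides do not list.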
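• J: Jump
One 26-bit constant (imm) giving the target in 32-bit words
(a 28-bit byte address once multiplied by 4)
E.g., “jump by imm words” (i.e., change the instruction pointer)
Field layout (32 bits, most significant field first):
op (6 bits) | target address (26)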
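A small C sketch of the J format, assuming the standard MIPS layout: the target field is 26 bits wide and counts words, so it turns into a 28-bit byte address when shifted left by 2. The function names are hypothetical. As a simplification, the upper 4 bits come from the pc argument as given, whereas real MIPS takes them from the address of the following instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a J-format word: op (6 bits) | target (26 bits, counted in words). */
uint32_t encode_j(uint32_t op, uint32_t target_words)
{
    assert(op < 64 && target_words < (1u << 26));
    return (op << 26) | target_words;
}

/* Byte address the jump lands on: the 26-bit word count is shifted left
 * by 2 (giving 28 bits) and combined with the top 4 bits of pc. */
uint32_t jump_destination(uint32_t pc, uint32_t word)
{
    uint32_t target_words = word & 0x03FFFFFFu;
    return (pc & 0xF0000000u) | (target_words << 2);
}
```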
MIPS “Add” Instruction Encoding
Add registers 18 and 19, and store result in register 17.
add is an R instruction
op | rs | rt | rd | shamt | funct
0  | 18 | 19 | 17 | 0     | 32
MIPS “Subtract” Instruction Encoding
Subtract register 19 from register 18 and store result in register 17.
sub is an R instruction
op | rs | rt | rd | shamt | funct
0  | 18 | 19 | 17 | 0     | 34
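The two encodings above can be checked by packing the six fields into a 32-bit word. This is a minimal sketch: the name encode_r is hypothetical, and the field widths (6, 5, 5, 5, 5, 6 bits) are the standard MIPS R-format widths.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-format fields, most significant field first:
 * op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6). */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct)
{
    assert(op < 64 && funct < 64);                        /* 6-bit fields */
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32);  /* 5-bit fields */
    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}
```

With the field values shown above, encode_r(0, 18, 19, 17, 0, 32) gives 0x02538820 for the add, and changing funct to 34 gives 0x02538822 for the sub. The two instructions differ only in the funct field.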
Greater Detail on IA32 Assembly:
Instruction Set and Data Sizes
Earlier Example
C code:
    count=0;
    while (n>1) {
        count++;
        if (n&1)
            n = n*3+1;
        else
            n = n/2;
    }
Register allocation: n in %edx, count in %ecx
Assembly code:
        movl    $0, %ecx
    .loop:
        cmpl    $1, %edx
        jle     .endloop
        addl    $1, %ecx
        movl    %edx, %eax
        andl    $1, %eax
        je      .else
        movl    %edx, %eax
        addl    %eax, %edx
        addl    %eax, %edx
        addl    $1, %edx
        jmp     .endif
    .else:
        sarl    $1, %edx
    .endif:
        jmp     .loop
    .endloop:
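To see how each assembly instruction maps back to the C loop, here is a C sketch that imitates the assembly one instruction at a time. The variables edx, ecx, and eax stand in for the registers, and the function name count_steps is hypothetical. The overflow assertion is an addition for this sketch, since the assembly itself performs no such check.

```c
#include <assert.h>
#include <stdint.h>

int32_t count_steps(int32_t n)
{
    int32_t edx = n;      /* n lives in %edx */
    int32_t ecx = 0;      /* count lives in %ecx:   movl $0, %ecx */
    int32_t eax;          /* scratch register */

    while (edx > 1) {     /* cmpl $1, %edx ; jle .endloop */
        ecx += 1;         /* addl $1, %ecx */
        eax = edx;        /* movl %edx, %eax */
        eax &= 1;         /* andl $1, %eax */
        if (eax != 0) {   /* je .else skips this branch when n is even */
            assert(edx <= (INT32_MAX - 1) / 3);  /* keep 3n+1 in range */
            eax = edx;    /* movl %edx, %eax */
            edx += eax;   /* addl %eax, %edx   (now 2n) */
            edx += eax;   /* addl %eax, %edx   (now 3n) */
            edx += 1;     /* addl $1, %edx     (now 3n+1) */
        } else {
            edx >>= 1;    /* sarl $1, %edx     (n/2, n is positive here) */
        }
    }                     /* jmp .loop */
    return ecx;
}
```

Note that the multiply by 3 is done with two additions rather than a multiply instruction, and the divide by 2 with a shift.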
Size of Variables
• Data types in high-level languages vary in size
Character: 1 byte
Short, int, and long: sizes vary, depending on the computer
Pointers: typically 4 bytes
Struct: arbitrary size, depending on the elements
• Implications
Need to be able to store and manipulate in multiple sizes
Byte (1 byte), word (2 bytes), and long (4 bytes, the “extended” size)
Separate assembly-language instructions
– e.g., addb, addw, addl
Separate ways to access (parts of) a 4-byte register
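One way to see why separate byte, word, and long instructions matter is that the same addition wraps around at a different point for each size. The sketch below imitates addb, addw, and addl with C fixed-width types. The function names add_b, add_w, and add_l are hypothetical, and the sketch ignores the flags the real instructions set.

```c
#include <assert.h>
#include <stdint.h>

/* The fixed-width types stand in for the three operand sizes. */
static_assert(sizeof(uint8_t) == 1, "byte operand is 1 byte");
static_assert(sizeof(uint16_t) == 2, "word operand is 2 bytes");
static_assert(sizeof(uint32_t) == 4, "long operand is 4 bytes");

/* addb: result keeps only the low 8 bits. */
uint8_t add_b(uint8_t dest, uint8_t src)    { return (uint8_t)(dest + src); }

/* addw: result keeps only the low 16 bits. */
uint16_t add_w(uint16_t dest, uint16_t src) { return (uint16_t)(dest + src); }

/* addl: result keeps only the low 32 bits. */
uint32_t add_l(uint32_t dest, uint32_t src) { return dest + src; }
```

For example, 200 + 100 is 44 as a byte addition but 300 as a word or long addition.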
Four-Byte Memory Words
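[Figure: memory drawn as rows of four-byte words, byte addresses 0 through 2^32 - 1]
Bit positions within a word: 31-24 | 23-16 | 15-8 | 7-0
Word at address 4: Byte 7 | Byte 6 | Byte 5 | Byte 4
Word at address 0: Byte 3 | Byte 2 | Byte 1 | Byte 0
Byte order is little endian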
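Little endian means the least significant byte of a four-byte value sits at the lowest address. The sketch below writes the bytes out explicitly, so it behaves the same on any host. The function names store_le32 and load_le32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value into 4 consecutive bytes, little-endian order. */
void store_le32(uint8_t *mem, uint32_t value)
{
    assert(mem != NULL);
    mem[0] = (uint8_t)(value & 0xFFu);          /* bits 7-0 at lowest address */
    mem[1] = (uint8_t)((value >> 8) & 0xFFu);   /* bits 15-8  */
    mem[2] = (uint8_t)((value >> 16) & 0xFFu);  /* bits 23-16 */
    mem[3] = (uint8_t)((value >> 24) & 0xFFu);  /* bits 31-24 at highest address */
}

/* Reassemble the 32-bit value from its 4 little-endian bytes. */
uint32_t load_le32(const uint8_t *mem)
{
    assert(mem != NULL);
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

Storing 0x12345678 at address 0 therefore places 0x78 in byte 0 and 0x12 in byte 3.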
IA32 General Purpose Registers
General-purpose registers
32-bit (bits 31-0) | 16-bit (bits 15-0) | bits 15-8 | bits 7-0
EAX                | AX                 | AH        | AL
EBX                | BX                 | BH        | BL
ECX                | CX                 | CH        | CL
EDX                | DX                 | DH        | DL
ESI                | SI                 |           |
EDI                | DI                 |           |
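The register names above are views onto the same 32 bits. A C sketch of how AL, AH, and AX relate to EAX follows. The function names are hypothetical, and masking and shifting stand in for what the hardware does directly.

```c
#include <assert.h>
#include <stdint.h>

/* AL is bits 7-0 of EAX, AH is bits 15-8, AX is bits 15-0. */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* Writing AL replaces the low byte and leaves the upper 24 bits alone. */
uint32_t write_al(uint32_t eax, uint32_t value)
{
    assert(value <= 0xFFu);   /* AL holds a single byte */
    return (eax & 0xFFFFFF00u) | value;
}
```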
Arithmetic Instructions
• Simple instructions
add{b,w,l} source, dest        dest = source + dest
sub{b,w,l} source, dest        dest = dest - source
inc{b,w,l} dest                dest = dest + 1
dec{b,w,l} dest                dest = dest - 1
neg{b,w,l} dest                dest = -dest
cmp{b,w,l} source1, source2    source2 - source1 (result discarded, flags set)
• Multiply
mul (unsigned) or imul (signed)
mull %ebx
# edx, eax = eax * ebx (high half in edx, low half in eax)
• Divide
div (unsigned) or idiv (signed)
idivl %ebx
# eax = edx,eax / ebx (quotient); edx = remainder
• Many more in Intel manual (volume 2)
adc, sbb, decimal arithmetic instructions
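Multiply and divide are unusual in that they use the edx, eax register pair as one 64-bit quantity. The C sketch below models that behavior as the Intel manual describes it: mull leaves the high half of the product in edx and the low half in eax, and idivl leaves the quotient in eax and the remainder in edx. The function names are hypothetical, and the assertions stand in for the divide fault the real instruction raises.

```c
#include <assert.h>
#include <stdint.h>

/* mull %ebx: unsigned 32x32 multiply giving a 64-bit product,
 * high half to edx, low half to eax. */
void mull_model(uint32_t *edx, uint32_t *eax, uint32_t ebx)
{
    uint64_t product = (uint64_t)(*eax) * ebx;
    *edx = (uint32_t)(product >> 32);
    *eax = (uint32_t)product;
}

/* idivl %ebx: signed 64-bit dividend (the edx,eax pair) divided by ebx,
 * quotient to eax, remainder to edx. */
void idivl_model(int64_t dividend, int32_t ebx, int32_t *eax, int32_t *edx)
{
    assert(ebx != 0);                 /* hardware would fault on divide by 0 */
    assert(dividend > INT64_MIN);     /* keeps the C division well defined */
    int64_t quotient = dividend / ebx;
    assert(quotient >= INT32_MIN && quotient <= INT32_MAX);  /* must fit eax */
    *eax = (int32_t)quotient;
    *edx = (int32_t)(dividend % ebx);
}
```

Like idiv, C integer division truncates toward zero, so -7 divided by 2 gives quotient -3 and remainder -1.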
Bitwise Logic Instructions
• Simple instructions
and{b,w,l} source, dest                dest = source & dest
or{b,w,l} source, dest                 dest = source | dest
xor{b,w,l} source, dest                dest = source ^ dest
not{b,w,l} dest                        dest = ~dest
sal{b,w,l} source, dest (arithmetic)   dest = dest << source
sar{b,w,l} source, dest (arithmetic)   dest = dest >> source
• Many more in Intel Manual (volume 2)
Logical shift
Rotate
Bit scan
Bit test
Byte set on conditions
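The difference between an arithmetic and a logical right shift only shows up for negative values: sar copies the sign bit in from the left, while the logical shift (shr in the Intel manual) fills with zeros. A portable C sketch follows, with hypothetical function names. The arithmetic shift is written without right-shifting a negative number, because C leaves that case implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift (shrl): zero-fills from the left. */
uint32_t shr_l(uint32_t dest, unsigned count)
{
    assert(count < 32);
    return dest >> count;
}

/* Arithmetic right shift (sarl): copies the sign bit in from the left,
 * which rounds toward negative infinity. */
int32_t sar_l(int32_t dest, unsigned count)
{
    assert(count < 32);
    if (dest >= 0)
        return dest >> count;
    /* For negative dest, -1 - dest is non-negative, so the shift is safe. */
    return -1 - ((-1 - dest) >> count);
}
```

Because sar rounds toward negative infinity, sar_l(-7, 1) is -4, whereas the C expression -7 / 2 is -3. The earlier example can use sarl for n/2 because n is positive at that point.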
Branch Instructions
• Conditional jump
j{l,g,e,ne,...} target
if (condition) {eip = target}
Comparison      | Signed | Unsigned | Reads as
=               | e      | e        | “equal”
≠               | ne     | ne       | “not equal”
>               | g      | a        | “greater, above”
≥               | ge     | ae       | “...-or-equal”
<               | l      | b        | “less, below”
≤               | le     | be       | “...-or-equal”
overflow/carry  | o      | c        |
no ovf/carry    | no     | nc       |
• Unconditional jump
jmp target
jmp *register
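The signed and unsigned columns matter because the same 32-bit pattern can compare differently depending on how it is interpreted. The sketch below evaluates each condition for two bit patterns a and b, as the jump would see them after "cmpl b, a". The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum cond { COND_E, COND_NE, COND_G, COND_GE, COND_L, COND_LE,
            COND_A, COND_AE, COND_B, COND_BE };

/* Reinterpret a 32-bit pattern as a signed two's-complement value. */
int64_t as_signed(uint32_t bits)
{
    return (bits & 0x80000000u) ? (int64_t)bits - 4294967296LL
                                : (int64_t)bits;
}

/* Would "j<cond>" be taken after comparing a against b? */
int jump_taken(enum cond c, uint32_t a, uint32_t b)
{
    int64_t sa = as_signed(a), sb = as_signed(b);
    switch (c) {
    case COND_E:  return a == b;
    case COND_NE: return a != b;
    case COND_G:  return sa > sb;     /* signed comparisons */
    case COND_GE: return sa >= sb;
    case COND_L:  return sa < sb;
    case COND_LE: return sa <= sb;
    case COND_A:  return a > b;       /* unsigned comparisons */
    case COND_AE: return a >= b;
    case COND_B:  return a < b;
    case COND_BE: return a <= b;
    }
    assert(0 && "unknown condition code");
    return 0;
}
```

With a = 0xFFFFFFFF and b = 1, jl is taken (as signed values, -1 is less than 1) but jb is not (as unsigned values, 4294967295 is above 1).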
Setting the EFLAGS Register
• Comparison cmpl compares two integers
Done by subtracting the first number from the second
– Discarding the results, but setting the eflags register
Example:
– cmpl $1, %edx (computes %edx – 1)
– jle .endloop (looks at the zero, sign, and overflow flags)
• Logical operation andl also sets the eflags register
Example:
– andl $1, %eax (bit-wise AND of %eax with 1)
– je .else (looks at the zero flag)
• Unconditional branch jmp
Example:
– jmp .endif and jmp .loop
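How a comparison turns into flags, and flags into a branch decision, can be sketched in C. The flag formulas below follow the Intel manual's definitions for subtraction. In particular, jle is taken when the zero flag is set or the sign flag differs from the overflow flag. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct flags { unsigned zf, sf, cf, of; };

/* Flags produced by "cmpl src, dst", which computes dst - src
 * and discards the result. */
struct flags cmpl_flags(uint32_t src, uint32_t dst)
{
    uint32_t result = dst - src;                  /* wraps modulo 2^32 */
    struct flags f;
    f.zf = (result == 0);                         /* zero flag */
    f.sf = result >> 31;                          /* sign flag: top bit */
    f.cf = (dst < src);                           /* carry flag: unsigned borrow */
    f.of = ((dst ^ src) & (dst ^ result)) >> 31;  /* overflow flag: signed overflow */
    return f;
}

/* jle: taken when dst <= src as signed numbers. */
int jle_taken(struct flags f)
{
    assert(f.zf <= 1 && f.sf <= 1 && f.of <= 1);  /* flags are single bits */
    return f.zf || (f.sf != f.of);
}
```

For "cmpl $1, %edx" with %edx equal to 5, no flag is set and jle falls through. With %edx equal to 1 the zero flag is set and the jump is taken.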
EFLAG Register & Condition Codes
Bit(s) | Flag | Meaning
31-22  |      | Reserved (set to 0)
21     | ID   | Identification flag
20     | VIP  | Virtual interrupt pending
19     | VIF  | Virtual interrupt flag
18     | AC   | Alignment check
17     | VM   | Virtual 8086 mode
16     | RF   | Resume flag
15     |      | Reserved (0)
14     | NT   | Nested task flag
13-12  | IOPL | I/O privilege level
11     | OF   | Overflow flag
10     | DF   | Direction flag
9      | IF   | Interrupt enable flag
8      | TF   | Trap flag
7      | SF   | Sign flag
6      | ZF   | Zero flag
5      |      | Reserved (0)
4      | AF   | Auxiliary carry flag or adjust flag
3      |      | Reserved (0)
2      | PF   | Parity flag
1      |      | Reserved (1)
0      | CF   | Carry flag
Data Transfer Instructions
• mov{b,w,l} source, dest
General move instruction
• push{w,l} source
pushl %ebx
# equivalent instructions
subl $4, %esp
movl %ebx, (%esp)
• pop{w,l} dest
popl %ebx
# equivalent instructions
movl (%esp), %ebx
addl $4, %esp
• Many more in Intel manual (volume 2)
Type conversion, conditional move, exchange, compare and
exchange, I/O port, string move, etc.
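The push and pop equivalences above can be imitated in C with a byte array for memory and an integer for esp. This is only a toy model: the struct machine, the 64-byte stack size, and the function names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

struct machine {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;              /* byte offset of the top of the stack */
};

/* An empty stack: esp points just past the end of the stack area. */
void machine_init(struct machine *m)
{
    memset(m->mem, 0, sizeof m->mem);
    m->esp = STACK_BYTES;
}

void pushl(struct machine *m, uint32_t value)
{
    assert(m->esp >= 4);                       /* room for 4 more bytes */
    m->esp -= 4;                               /* subl $4, %esp */
    memcpy(&m->mem[m->esp], &value, 4);        /* movl value, (%esp) */
}

uint32_t popl(struct machine *m)
{
    uint32_t value;
    assert(m->esp + 4 <= STACK_BYTES);         /* stack is not empty */
    memcpy(&value, &m->mem[m->esp], 4);        /* movl (%esp), dest */
    m->esp += 4;                               /* addl $4, %esp */
    return value;
}
```

The stack grows toward lower addresses: each push lowers esp by 4, and each pop raises it by 4, returning values in last-in, first-out order.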
Greater Detail on IA32 Assembly:
Addressing Modes
Ways to Read and Write Data
• Processors have many ways to access data
Known as “addressing modes”
• Two simplest ways (used in earlier example)
Immediate addressing: movl $0, %ecx
– Data embedded in the instruction
– Initialize register ECX with zero
Register addressing: movl %edx, %ecx
– Data stored in a register
– Copy value in register EDX into register ECX
• The others all deal with memory addresses
To read and write data from main memory
E.g., to get data from memory into a register
E.g., to write data from a register back into memory
Direct vs. Indirect Addressing
• Read or write from a particular memory location
Essentially dereferencing a pointer
• Direct addressing: movl 2000, %ecx
Address embedded in the instruction
E.g., address 2000 corresponds to a global variable
Load ECX register with the long located at address 2000
• Indirect addressing: movl (%eax), %ebx
Address stored in a register
E.g., EAX register is a pointer
Load EBX register with long located at address in EAX
More Complex Addressing Modes
• Base pointer addressing: movl 4(%eax), %ebx
Extends indirect addressing by allowing an offset
E.g., add “4” to the register EAX to get the address
Allows access to a particular field in a structure
E.g., if “age” starts at byte offset 4 of a record
• Indexed addressing: movl 2000(,%ecx,1), %ebx
Starts from a base address (e.g., 2000)
Adds an offset from a register (e.g., ECX)
Multiplied by a scale of 1, 2, 4, or 8 (here 1, so ECX counts bytes)
Allows a register to index a byte, word, or long array (scale 1, 2, or 4)
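These two modes correspond to familiar C operations: reading a struct field through a pointer, and indexing an array. In the sketch below, the struct person, its fields, and the function names are hypothetical. The age field lands at byte offset 4 because it follows a single 4-byte field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct person { int32_t id; int32_t age; };

/* Base pointer addressing: the address is the pointer plus a fixed offset,
 * like movl 4(%eax), %ebx with the record's address in EAX. */
int32_t load_age(const struct person *p)
{
    const unsigned char *base = (const unsigned char *)p;
    int32_t age;
    assert(p != NULL);
    memcpy(&age, base + offsetof(struct person, age), sizeof age);
    return age;
}

/* Indexed addressing: the address is the array start plus index * 4,
 * i.e. an index register with a scale of 4 for an array of longs. */
int32_t load_element(const int32_t *table, size_t count, size_t i)
{
    assert(table != NULL && i < count);
    return table[i];
}
```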
Effective Address
Offset = Base + (Index * Scale) + Displacement
Base: eax, ebx, ecx, edx, esp, ebp, esi, or edi
Index: eax, ebx, ecx, edx, ebp, esi, or edi (esp cannot be used as an index)
Scale: 1, 2, 4, or 8
Displacement: none, 8-bit, 16-bit, or 32-bit
• Displacement
movl foo, %ebx
• Base
movl (%eax), %ebx
• Base + displacement
movl foo(%eax), %ebx
movl 1(%eax), %ebx
• (Index * scale) + displacement
movl (,%eax,4), %ebx
• Base + (index * scale) + displacement
movl foo(%edx,%eax,4),%ebx
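The effective-address calculation is a single formula, which a short C function can make concrete. The function name is hypothetical. A missing base, index, or displacement is modeled by passing 0, and the arithmetic wraps at 32 bits as addresses do on IA32.

```c
#include <assert.h>
#include <stdint.h>

/* Offset = base + (index * scale) + displacement, modulo 2^32. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

For example, with a base of 0x1000, an index of 3, a scale of 4, and a displacement of 8, the address is 0x1000 + 12 + 8 = 0x1014.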
Data Access Methods: Summary
• Immediate addressing: data stored in the instruction itself
movl $10, %ecx
• Register addressing: data stored in a register
movl %eax, %ecx
• Direct addressing: address stored in instruction
movl 2000, %ecx
• Indirect addressing: address stored in a register
movl (%eax), %ebx
• Base pointer addressing: includes an offset as well
movl 4(%eax), %ebx
• Indexed addressing: instruction contains base address, and
specifies an index register and a multiplier (1, 2, 4, or 8)
movl 2000(,%ecx,1), %ebx
Layout of an Assembly Language Program
A Simple Assembly Program
.section .data
        # pre-initialized
        # variables go here
.section .bss
        # zero-initialized
        # variables go here
.section .rodata
        # pre-initialized
        # constants go here
.section .text
.globl _start
_start:
        # Program starts executing
        # here
        # Body of the program goes
        # here
        # Program ends with an
        # “exit()” system call
        # to the operating system
        movl $1, %eax
        movl $0, %ebx
        int $0x80
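For comparison, here is roughly where a C compiler places things in these same four sections. The placements in the comments are what compilers for Linux typically do rather than something the C language guarantees, and all of the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

int initialized_counter = 5;        /* typically .data: pre-initialized variable */
int zeroed_buffer[16];              /* typically .bss: zero-initialized variable */
const char greeting[] = "hello";    /* typically .rodata: pre-initialized constant */

/* Function code typically goes in .text. */
int sum_buffer(const int *buf, size_t n)
{
    int total = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```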
Main Parts of the Program
• Break program into sections (.section)
Data, BSS, RoData, and Text
• Starting the program
Making _start a global (.globl _start)
– Tells the assembler to remember the symbol _start
– … because the linker will need it
Identifying the start of the program (_start)
– Defines the value of the label _start
• Exiting the program
Specifying the exit() system call (movl $1, %eax)
– Linux expects the system call number in EAX register
Specifying the status code (movl $0, %ebx)
– Linux expects the status code in EBX register
Trapping into the operating system (int $0x80)
– Software interrupt 0x80 transfers control to the Linux kernel, which performs the system call
Conclusions
• Machine code
Binary representation of instructions
What operation to do, and on what data
• IA32 instructions
Manipulate bytes, words, or longs
Numerous kinds of operations
Wide variety of addressing modes
• Next time
Calling functions, using the stack