14_x86_Part3

x86 Programming
Memory Accessing Modes,
Characters, and Strings
Computer Architecture
Multi-byte storage
• Multi-byte data types include:
– word/short (2 bytes)
– int (4 bytes)
– long or quad (8 bytes)
• Conceptual representation
– Most significant byte (MSB) is the leftmost byte
– Least significant byte (LSB) is the rightmost byte
– Example:
• Number: 0xaabb
• MSB: 0xaa
• LSB: 0xbb
• In-memory representation (applicable only to multi-byte storage)
– Big Endian
• MSB is stored at the lower memory address
– Little Endian
• MSB is stored at the higher memory address
Big vs. Little Endian
• Consider the integer: 0x11aa22bb
• Big Endian Storage
Memory Address: 0x1000 0x1001 0x1002 0x1003
Value:          0x11   0xaa   0x22   0xbb
• Little Endian Storage (x86 architecture)
Memory Address: 0x1000 0x1001 0x1002 0x1003
Value:          0xbb   0x22   0xaa   0x11
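The two layouts can also be reproduced in a few lines of C. This is only an illustrative sketch: the function names store_big_endian, store_little_endian, and host_is_little_endian are made up for this example and are not part of the lecture material. Index 0 of the output array stands for the lowest memory address (0x1000 in the slide).

```c
#include <stdint.h>

/* Big endian: the most significant byte goes to the lowest address (out[0]). */
void store_big_endian(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Little endian (x86): the least significant byte goes to the lowest address. */
void store_little_endian(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}

/* Returns 1 when the machine running this code stores the LSB first. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

Calling store_little_endian(0x11aa22bb, bytes) fills bytes with 0xbb, 0x22, 0xaa, 0x11, matching the x86 row above. On an x86 machine host_is_little_endian() is expected to return 1.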
Characters
• Characters are represented as unsigned 8-bit (1-byte) numbers
– In memory as well as in instructions.
– The number is interpreted and displayed as a character for Input-Output (I/O) purposes only!
– The mapping from byte values to characters (as displayed on screen) is defined by the American Standard Code for Information Interchange (ASCII)
• ASCII is widely supported by I/O devices
– Like: Monitors, keyboards, etc.
Standard ASCII Codes
• Here is a short table illustrating standard ASCII codes that are frequently used:
Range of ASCII Codes (decimal)    Range of Characters
48 to 57                          ‘0’ to ‘9’
65 to 90                          ‘A’ to ‘Z’
97 to 122                         ‘a’ to ‘z’
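Because characters are just numbers, these ranges can be used directly in arithmetic. The following C sketch shows the idea. The helper names digit_value and to_upper_ascii are hypothetical, and the code assumes an ASCII execution character set.

```c
/* Returns 0..9 for the characters '0'..'9' (codes 48..57), or -1 otherwise. */
int digit_value(char c) {
    if (c >= 48 && c <= 57) {
        return c - 48;
    }
    return -1;
}

/* Maps 'a'..'z' (codes 97..122) to 'A'..'Z' (codes 65..90) by subtracting 32;
 * every other character is returned unchanged. */
char to_upper_ascii(char c) {
    if (c >= 97 && c <= 122) {
        return (char)(c - 32);
    }
    return c;
}
```

For example, digit_value('7') yields 7 and to_upper_ascii('h') yields 'H' (code 72).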
Characters in assembly
• Example assembly code with 5 characters
– Note that the characters are stored at consecutive memory addresses! This is guaranteed by the assembler!
/* Assembly program involving characters */
.text
/* Instructions */
.data
char1: .byte 72  /* ASCII code for 'H' */
char2: .byte 101 /* ASCII code for 'e' */
char3: .byte 108 /* ASCII code for 'l' */
char4: .byte 108 /* ASCII code for 'l' */
char5: .byte 111 /* ASCII code for 'o' */
For the Java programmer…
• Assembler permits direct representation of
characters
– It converts characters to ASCII codes
/* Assembly program involving characters */
.text
/* Instructions */
.data
char1: .byte 'H' /* Assembler converts the */
char2: .byte 'e' /* characters to ASCII */
char3: .byte 'l'
char4: .byte 'l'
char5: .byte 'o'
Memory organization
• Bytes declared consecutively in the assembly
source are stored at consecutive memory
locations
– Assume that the assembler places char1 (‘H’) at address 0x20, then other characters have the following memory addresses:
Address: 0x20 0x21 0x22 0x23 0x24
Value:   H    e    l    l    o
Working with characters
• All symbols (characters as well as other data) have 2 values associated with them
– The address in memory
• Accessed by prefixing the symbol with a $ (dollar) sign
• The memory address is always 32 bits (4 bytes) on 32-bit x86 processors
– It is 64 bits wide on 64-bit x86 processors.
– The value contained in the memory location
• Accessed without any prefix on the symbol.
• The number of bytes read depends on the instruction's size suffix, which should match the type of the symbol
– 1 byte (suffix b) for byte, 4 bytes (suffix l) for int, etc.
• This is exactly how we have been doing it so far.
Cross Check
• Given the following symbol table and memory layout, what are the values of $letter, Yellow, $k, and e?
– Addresses of symbols (expressions with a $ sign) are obtained from the symbol table, while values of symbols (expressions without a $ sign) are obtained from the memory layout.
Symbol    Address
letter    0x20
Yellow    0x21
k         0x22
e         0x24
Address:  0x20 0x21 0x22 0x23 0x24
Value:    H    e    l    l    o
• Answers:
– $letter: 0x20
– Yellow: ‘e’
– $k: 0x22
– e: ‘o’
Example assembly
/* Example use of characters */
.text
movb char1, %al   /* al = ASCII('H') */
addb $1, %al      /* al = ASCII('I') */
movb %al, char1   /* char1 = 'I' */
movl $char1, %ebx /* ebx = addressOf(char1) */
.data
char1: .byte 'H'
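For readers who think more easily in a high-level language, the same value-versus-address distinction can be sketched in C. This is a rough analogue rather than a translation the assembler performs. The function name increment_char1 is invented for the example, and the local variable al merely mimics the register of the same name.

```c
char char1 = 'H';           /* plays the role of: char1: .byte 'H' */

/* Mirrors the four instructions above: load, add 1, store, take the address. */
char *increment_char1(void) {
    char al = char1;        /* movb char1, %al    (value of the symbol)   */
    al = (char)(al + 1);    /* addb $1, %al                               */
    char1 = al;             /* movb %al, char1                            */
    return &char1;          /* movl $char1, %ebx  (address of the symbol) */
}
```

Using the symbol without a prefix corresponds to reading or writing the variable char1, while $char1 corresponds to &char1.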
What’s the use of addresses?
• Why bother loading addresses into registers?
– x86 permits indirect memory access and
manipulation using addresses stored in registers!
– A variety of mechanisms are supported by x86
processors for generating the final memory
address for retrieving data
• These mechanisms are collectively called memory Addressing Modes
Addressing Modes
• x86 supports the following addressing modes
1. Register mode
2. Immediate mode
3. Direct mode
4. Register direct mode
5. Base displacement mode
6. Base-index scaled mode
Register mode
• Instructions involving only registers
• This is the simplest and fastest mechanism
• Data is read from and written to registers only.
• In this mode, the processor does not access RAM.
.text
movb %al, %ah   /* ah = al */
addl %eax, %ebx /* ebx += eax */
mull %ebx       /* edx:eax = eax * ebx (unsigned) */
Immediate mode
• Instructions involving registers & constants
• This mode is used to load constant values into registers
• The constant value to be loaded is encoded as a part of the instruction.
• Consequently, no separate memory access is needed to fetch the operand
.text
movb $5, %ah    /* ah = 5 */
addl $-35, %ebx /* ebx += -35 */
Direct Mode
• Standard mode used with symbols
– Address to load/store data is part of the instruction
• Involves 1 memory access using the address
• Number of bytes loaded depends on the instruction suffix, which should match the symbol's type
• Symbols are used to represent addresses
– The other operand (source or destination) cannot also be a memory location; here it is a register!
.text
movb char1, %ah /* ah = 'H' */
addl %eax, i1   /* i1 += eax */
.data
char1: .byte 'H'
i1: .int 100
Register direct mode
• Addresses for memory references are obtained from a register (this mode is commonly called register indirect addressing).
– The address needs to be loaded into a register first.
• Addresses can be manipulated like regular numbers!
.text
movl $char1, %eax /* eax = addressOf(char1) */
movb (%eax), %bl  /* bl = 'H' */
inc %eax          /* eax++ */
movb %bl, (%eax)  /* char2 = char1 */
.data
char1: .byte 'H'
char2: .byte 'e'
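Accessing memory through an address held in a register behaves much like dereferencing a pointer in C. The sketch below is an analogy under that assumption. The function name copy_to_next is hypothetical, and the pointer p stands in for %eax.

```c
/* Copies the byte at p into the byte that follows it.
 * The caller must make sure that p[1] is valid, writable memory. */
void copy_to_next(char *p) {
    char bl = *p;   /* movb (%eax), %bl : read the byte eax points to */
    p++;            /* inc %eax         : the address is just a number */
    *p = bl;        /* movb %bl, (%eax) : write through the new address */
}
```

Applied to a two-byte array holding 'H' and 'e', the call leaves both bytes equal to 'H', just as the assembly leaves char2 equal to char1.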
Register direct mode (Contd.)
• Register direct mode is most frequently used!
– It is analogous to accessing objects through references in Java
– Note that the other operand of an instruction using register direct mode cannot be a memory location; it is typically a register
– Pay attention to the following syntax
• $symbol: To obtain the address of symbol
– The address is always 32 bits on 32-bit x86 processors!
• (%register): Data stored at the memory address contained in register.
– The number of bytes read from the given memory location depends on the instruction.
Base Displacement Mode
• Constant offset from a given address stored in a register
– Used to access parameters to a method
• We will see the use for this mode in the near future.
– The displacement value is constant. The base value is contained in a register!
.text
movl $char1, %eax  /* eax = addressOf(char1) */
movb 1(%eax), %bl  /* bl = char2 */
inc %eax
movb %bl, -1(%eax) /* char1 = char2 */
.data
char1: .byte 'H'
char2: .byte 'e'
Base-Index scaled Mode
• Most complex form of memory referencing. It involves:
– A displacement constant
– A base register
– An index register
– A scale factor (must be 1, 2, 4, or 8)
• Final address for accessing memory is computed as:
address = base_register + (index_register * scale_factor) + displacement_constant
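The address formula can be checked with ordinary integer arithmetic. The C sketch below models 32-bit addresses as uint32_t. The function effective_address and its return convention are assumptions made for this example. The scale check reflects the fact that x86 instructions can only encode scale factors of 1, 2, 4, and 8.

```c
#include <stdint.h>

/* Computes base + (index * scale) + disp with 32-bit wraparound.
 * Stores the result in *out and returns 0, or returns -1 when the scale
 * factor is not one that x86 can encode (1, 2, 4, or 8). */
int effective_address(uint32_t base, uint32_t index, uint32_t scale,
                      int32_t disp, uint32_t *out) {
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
        return -1;
    }
    *out = base + index * scale + (uint32_t)disp;
    return 0;
}
```

With base 0x20, index 0, scale 4, and displacement 1, the computed address is 0x21, which is the byte right after the base.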
Base-Index scaled Mode
• Examples of this complex mode are shown below:
.text
movl $char1, %eax /* eax = addressOf(char1) */
movl $0, %ebx
movb 1(%eax, %ebx, 4), %bl /* bl = char2 */
/* Address = %eax + (%ebx * 4) + 1
           = %eax + (0 * 4) + 1
           = %eax + 1 */
inc %eax
movl $1, %ebx
movb %bl, -2(%eax, %ebx, 1) /* char1 = bl */
/* Address = %eax + (%ebx * 1) - 2
           = %eax + (1 * 1) - 2
           = %eax - 1
   A scale factor of 0 cannot be encoded, so scale 1 with
   displacement -2 is used to reach the same address. */
.data
char1: .byte 'H'
char2: .byte 'e'
LEA Instruction
• The x86 architecture provides a special instruction called LEA (Load Effective Address)
– This instruction computes the effective address produced by a memory addressing mode and loads that address (not the data stored there) into a given register.
– Examples:
• LEA -1(%eax, %ebx, 1), %edi
• LEA (%eax, %ebx), %edi
• LEA -5(%eax), %edi
LEA Example (Contd.)
• Here is an example of the LEA instruction
.text
movl $char1, %eax          /* eax = addressOf(char1) */
movl $0, %ebx
lea 1(%eax, %ebx, 2), %edi /* edi = address of char2 */
movb $'h', (%edi)          /* change 'e' to 'h' */
.data
char1: .byte 'H'
char2: .byte 'e'
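In C terms, LEA resembles pointer arithmetic without a dereference: an address is computed, but memory is not touched until the pointer is used. The helper lea_like below is a hypothetical name chosen for this sketch. It assumes the resulting pointer stays inside the same array as base.

```c
#include <stddef.h>

/* Returns base + (index * scale) + disp as a pointer, without reading
 * or writing the memory it points to. */
char *lea_like(char *base, size_t index, size_t scale, ptrdiff_t disp) {
    return base + (ptrdiff_t)(index * scale) + disp;
}
```

With a two-byte array holding 'H' and 'e', lea_like(chars, 0, 2, 1) returns the address of the second byte. Writing 'h' through that pointer then changes 'e' to 'h', as the movb in the slide does.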
Strings
• Strings are simply represented as a
sequence (or array) of characters in memory
– Each character is stored at a consecutive
memory address!
– Every string is terminated by ASCII value 0
• Represented as ‘\0’ in assembly source
Declaring Strings in Assembly
• Strings are defined using the .string directive
.text
/* Instructions go here */
.data
msg1: .string "Hello\n"
msg2: .string "World!\n"
Memory representation
• Given the previous example, the strings (msg1 and msg2) are stored in memory as shown below (addresses in hexadecimal):
msg1 = 20
Address: 20 21 22 23 24 25 26
Value:   H  e  l  l  o  \n \0
msg2 = 27
Address: 27 28 29 2A 2B 2C 2D 2E
Value:   W  o  r  l  d  !  \n \0
Displaying Strings
• Strings or characters can be displayed on standard output (analogous to System.out) using a system call:
– Set eax to 4
• To write characters to a file (stream)
• Changing eax to 3 will cause reading characters instead!
– Set ebx to 1
• Destination stream is standard output
• You may set ebx to 2 for standard error
• If ebx is 0 it indicates standard input (you can write to it!)
– Set ecx to the address of the message to display
– Set edx to the number of characters to display
– Invoke the system call with int $0x80
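The register setup above corresponds closely to the POSIX write() function, which C programs on Linux normally use instead of raising the interrupt directly. The wrapper below is a minimal sketch. The name write_message is made up for this example, and the three arguments to write() play the roles of ebx, ecx, and edx.

```c
#include <string.h>
#include <unistd.h>

/* Asks the OS to write the characters of msg (without the terminating '\0')
 * to the file descriptor fd. Returns whatever write() returns: the number
 * of bytes written (possibly fewer than requested) or -1 on error. */
ssize_t write_message(int fd, const char *msg) {
    return write(fd, msg, strlen(msg));
}
```

A call such as write_message(1, "Hello!\nWorld!\n") requests that 14 bytes be sent to standard output. Passing 2 instead of 1 targets standard error.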
Complete Example
/* Console output example */
.text
.global _start
_start:
mov $4, %eax   /* System call to write to a file handle */
mov $1, %ebx   /* File handle=1 implies standard output */
mov $msg, %ecx /* Address of message to be displayed */
mov $14, %edx  /* Number of bytes to be displayed. Calculated by hand!
                  Can be cumbersome for large strings. */
int $0x80      /* Call OS to display the characters. */
mov $1,%eax    /* The system call for exit (sys_exit) */
mov $0,%ebx    /* Exit with return code of 0 (no error) */
int $0x80
.data
/* The data to be displayed */
msg: .string "Hello!\nWorld!\n"
Rewritten using Macro!
/* Console output example */
.text
.global _start
_start:
mov $4, %eax   /* System call to write to a file handle */
mov $1, %ebx   /* File handle=1 implies standard output */
mov $msg, %ecx /* Address of message to be displayed */
mov $len, %edx /* Number of bytes to be displayed */
int $0x80      /* Call OS to display the characters. */
mov $1,%eax    /* The system call for exit (sys_exit) */
mov $0,%ebx    /* Exit with return code of 0 (no error) */
int $0x80
.data
/* The data to be displayed */
msg: .string "Hello!\nWorld!\n"
/* The assembler computes the constant len by subtracting the address of
   msg from the current address, represented by the special symbol . (dot).
   .string appends a terminating '\0', so 1 is subtracted to get 14.
   Every use of $len is replaced with the resulting constant value. */
.equ len, . - msg - 1
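C offers a comparable compile-time computation through sizeof, which may help make the ". - msg" idiom less mysterious. In this sketch the names msg, MSG_LEN, and message_length are illustrative only. Like the .string directive, a C string literal carries a terminating '\0', so sizeof counts one byte more than the visible text.

```c
#include <stddef.h>

/* 14 visible characters plus the terminating '\0' give 15 bytes in total. */
const char msg[] = "Hello!\nWorld!\n";

/* Evaluated by the compiler, much like .equ is evaluated by the assembler. */
enum { MSG_LEN = sizeof msg - 1 };

/* Returns the number of characters to display, excluding the '\0'. */
size_t message_length(void) {
    return MSG_LEN;
}
```

If the text of msg changes, MSG_LEN follows automatically and no hand-counted constant needs to be updated.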
Compute string length
• The previous examples use fixed length
strings
– For strings that change values or change lengths,
the string length must be computed using
suitable assembly code.
– The corresponding Java source is shown below:
public static int length(char[] str) {
int i;
for (i = 0; (str[i] != '\0'); i++);
return i;
}
Compute string length
_length:
/* Let eax correspond to i */
movl $0, %eax    /* eax = 0 */
/* Let ebx correspond to str */
movl $str, %ebx  /* ebx = address(str) */
loop:
/* Base register = ebx, offset (index) register = eax,
   displacement (implicit) = 0, scale value (implicit) = 1 */
cmpb $0, (%ebx, %eax) /* str[i] != '\0' */
je done          /* We have hit the '\0' in string */
inc %eax         /* i++ */
jmp loop         /* Continue the loop */
done:
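The same loop can be written in C as a sanity check on the assembly. This is a sketch that mirrors the loop structure, not a replacement for the standard strlen function. The parameter str corresponds to the address in ebx and the counter i to eax.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the terminating '\0'. */
size_t length(const char *str) {
    size_t i = 0;              /* movl $0, %eax                       */
    while (str[i] != '\0') {   /* cmpb $0, (%ebx, %eax) and je done   */
        i++;                   /* inc %eax                            */
    }                          /* jmp loop                            */
    return i;
}
```

For the string "Hello\n" the loop stops at index 6, so the function returns 6.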