14_x86_Part3
x86 Programming
Memory Accessing Modes,
Characters, and Strings
Computer Architecture
Multi-byte storage
• Multi-byte data types include:
– word/short (2 bytes)
– int (4 bytes)
– long or quad (8 bytes)
• Conceptual representation
– Most significant byte (MSB) is the leftmost byte
– Least significant byte (LSB) is the rightmost byte
– Example:
• Number: 0xaabb
• MSB: 0xaa
• LSB: 0xbb
• In-memory representation (applicable only to multi-byte storage)
– Big Endian
• MSB is stored at the lower memory address
– Little Endian
• MSB is stored at the higher memory address
Big vs. Little Endian
• Consider the integer: 0x11aa22bb
• Big Endian Storage
Memory Address: 0x1000  0x1001  0x1002  0x1003
Value:          0x11    0xaa    0x22    0xbb
• Little Endian Storage (x86 architecture)
Memory Address: 0x1000  0x1001  0x1002  0x1003
Value:          0xbb    0x22    0xaa    0x11
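The little endian layout above can be observed directly. The following is a minimal sketch, assuming a 32-bit Linux target and the GNU assembler (for example `as --32 endian.s -o endian.o` followed by `ld -m elf_i386 endian.o -o endian`). The label `value` and the file name are made up for illustration. The program reads the byte at the lowest address of the integer and returns it as the exit code.

```asm
/* Little endian sketch: inspect the bytes of a 4-byte integer */
.text
.global _start
_start:
    movb value, %al        /* al = byte at the lowest address  = 0xbb (LSB) */
    movb value+3, %ah      /* ah = byte at the highest address = 0x11 (MSB) */
    movzbl %al, %ebx       /* ebx = 0xbb, used as the exit code */
    mov $1, %eax           /* sys_exit */
    int $0x80
.data
value: .int 0x11aa22bb     /* stored in memory as bb 22 aa 11 */
```

After running the program, `echo $?` in the shell should report 187 (0xbb), showing that the LSB sits at the lowest address.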
Characters
• Characters are simply represented as
unsigned 8-bit (byte) numbers
– In memory as well as in instructions.
– The number is interpreted and displayed as
characters for Input-Output (I/O) purposes only!
– The mapping from byte values to character (as
displayed on screen) is based on the American
Standard Code for Information Interchange
(ASCII)
• It is widely used around the world by I/O devices
– Like: Monitors, keyboards, etc.
Standard ASCII Codes
• Here is a short table illustrating standard ASCII
codes that are frequently used:
Range of ASCII Codes (decimal)    Range of Characters
48 to 57                          ‘0’ to ‘9’
65 to 90                          ‘A’ to ‘Z’
97 to 122                         ‘a’ to ‘z’
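Because characters are just numbers, the ranges in the table allow simple arithmetic on them. The sketch below assumes 32-bit Linux and the GNU assembler. The labels `lower`, `upper`, and `digit` are illustrative names. It converts a lowercase letter to uppercase by subtracting 32 (the distance between 97 and 65) and converts a digit character to its numeric value by subtracting 48.

```asm
/* ASCII arithmetic sketch */
.text
.global _start
_start:
    movb lower, %al        /* al = 97 ('a') */
    subb $32, %al          /* al = 65 ('A'): 'a'..'z' lie 32 above 'A'..'Z' */
    movb %al, upper        /* upper = 'A' */
    movb digit, %bl        /* bl = 55 ('7') */
    subb $48, %bl          /* bl = 7: subtract the ASCII code of '0' */
    movzbl %bl, %ebx       /* ebx = 7, used as the exit code */
    mov $1, %eax           /* sys_exit */
    int $0x80
.data
lower: .byte 97            /* ASCII code for 'a' */
upper: .byte 0             /* receives the uppercase letter */
digit: .byte 55            /* ASCII code for '7' */
```

The exit code (visible with `echo $?`) should be 7, the numeric value of the character '7'.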
Characters in assembly
• Example assembly code with 5 characters
– Note that the characters are stored at consecutive
memory addresses! This is guaranteed by the
assembler!
/* Assembly program involving characters */
.text
/* Instructions */
.data
char1: .byte 72  /* ASCII code for 'H' */
char2: .byte 101 /* ASCII code for 'e' */
char3: .byte 108 /* ASCII code for 'l' */
char4: .byte 108 /* ASCII code for 'l' */
char5: .byte 111 /* ASCII code for 'o' */
For the Java programmer…
• Assembler permits direct representation of
characters
– It converts characters to ASCII codes
/* Assembly program involving characters */
.text
/* Instructions */
.data
char1: .byte 'H' /* Assembler converts the */
char2: .byte 'e' /* characters to ASCII */
char3: .byte 'l'
char4: .byte 'l'
char5: .byte 'o'
Memory organization
• Bytes declared consecutively in the assembly
source are stored at consecutive memory
locations
– Assume that the assembler places char1 (‘H’) at
address 0x20, then other characters have the
following memory addresses:
Addresses: 0x20  0x21  0x22  0x23  0x24
Contents:  H     e     l     l     o
Working with characters
• Every symbol (whether it labels a character or any other data) has
2 unique values associated with it
– The address in memory
• Accessed by prefixing the symbol with a $ (dollar) sign
• The memory address is always 32 bits (4 bytes) on 32-bit x86 processors
– It is 64 bits wide on 64-bit x86 processors.
– The value contained in the memory location
• Accessed without any prefixes to the symbol.
• The number of bytes read depends on the instruction, which should match the type used to declare the symbol
– 1 byte (e.g. movb) for .byte, 4 bytes (e.g. movl) for .int, etc.
• This is exactly how we have been doing it so far.
Cross Check
• Given the following memory layout and
symbol table, what are the values of:
– $letter: 0x20
– Yellow: ‘e’
– $k: 0x22
– e: ‘o’
• Addresses of symbols (expressions with a $ sign) are obtained from the
symbol table, while values of symbols (expressions without a $ sign) are
obtained from the memory layout shown below.
Memory layout:
Address:  0x20  0x21  0x22  0x23  0x24
Contents: H     e     l     l     o
Symbol table:
Symbol    Address
letter    0x20
Yellow    0x21
k         0x22
e         0x24
Example assembly
/* Example use of characters */
.text
movb char1, %al   /* al = ASCII('H') */
addb $1, %al      /* al = ASCII('I') */
movb %al, char1   /* char1 = 'I' */
movl $char1, %ebx /* ebx = addressOf(char1) */
.data
char1: .byte 'H'
What’s the use of addresses?
• Why bother loading addresses into registers?
– x86 permits indirect memory access and
manipulation using addresses stored in registers!
– A variety of mechanisms are supported by x86
processors for generating the final memory
address for retrieving data
• These mechanisms are collectively called memory
Addressing Modes
Addressing Modes
• x86 supports the following addressing
modes
1. Register mode
2. Immediate mode
3. Direct mode
4. Register direct mode
5. Base displacement mode
6. Base-index scaled mode
Register mode
• Instructions involving only registers
• This is the simplest and fastest mechanism
• Data is read from and written to registers.
• In this mode, the processor does not access
RAM.
.text
movb %al, %ah   /* ah = al */
addl %eax, %ebx /* ebx += eax */
mull %ebx       /* edx:eax = eax * ebx (the high 32 bits of the product go to edx) */
Immediate mode
• Instructions involving registers & constants
• This mode is used to load constant values into
registers
• The constant value to be loaded is encoded as a
part of the instruction.
• Consequently, there is no separate memory access for the data
.text
movb $5, %ah    /* ah = 5 */
addl $-35, %ebx /* ebx += -35 */
Direct Mode
• Standard mode used with symbols
– Address to load/store data is part of the instruction
• Involves 1 memory access using the address
• Number of bytes loaded depends on the instruction, which should match the declared type
• Symbols are used to represent addresses
– The other operand (source or destination) has to be a register or a constant, because an instruction cannot name two memory locations!
.text
movb char1, %ah /* ah = 'H' */
addl %eax, i1   /* i1 += eax */
.data
char1: .byte 'H'
i1: .int 100
Register direct mode
• The address for a memory reference is obtained
from a register (this mode is commonly called register indirect addressing).
– The address needs to be loaded into a register first.
• Addresses can be manipulated like regular numbers!
.text
movl $char1, %eax /* eax = addressOf(char1) */
movb (%eax), %bl  /* bl = 'H' */
inc %eax          /* eax++ (eax now holds the address of char2) */
movb %bl, (%eax)  /* char2 = char1 */
.data
char1: .byte 'H'
char2: .byte 'e'
Register direct mode (Contd.)
• Register direct mode is most frequently used!
– It is analogous to accessing using references in
Java
– Note that one of the operands in register direct
mode has to be a register
– Pay attention to the following syntax
• $symbol: To obtain address of symbol
– Address is always 32 bits wide on 32-bit x86 processors!
• (%register): Data stored at the memory address
contained in register.
– The number of bytes read from the given memory location
depends on the instruction.
Base Displacement Mode
• Constant offset from a given address stored
in a register
– Used to access parameters to a method
• We will see the use for this mode in the near future.
– The displacement value is constant.
– The base value is contained in a register!
.text
movl $char1, %eax  /* eax = addressOf(char1) */
movb 1(%eax), %bl  /* bl = char2 */
inc %eax
movb %bl, -1(%eax) /* char1 = char2 */
.data
char1: .byte 'H'
char2: .byte 'e'
Base-Index scaled Mode
• Most complex form of memory referencing. It involves:
– A displacement constant
– A base register
– An index register
– A scale factor (must be 1, 2, 4, or 8)
• Final address for accessing memory is computed
as: address = base_register +
(index_register * scale_factor) +
displacement_constant
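A typical use of this formula is indexing an array whose elements are wider than one byte. The following is a minimal sketch under stated assumptions: 32-bit Linux, the GNU assembler, and a made-up array label `nums`. The scale factor 4 matches the size of each `.int` element, so the index register holds the element number rather than a byte offset.

```asm
/* Base-index scaled sketch: read elements of an array of 4-byte ints */
.text
.global _start
_start:
    movl $nums, %eax            /* eax = base address of the array */
    movl $2, %ebx               /* ebx = index of the element */
    movl (%eax, %ebx, 4), %ecx  /* ecx = nums[2]; address = eax + (2 * 4) + 0 */
    movl 4(%eax, %ebx, 4), %edx /* edx = nums[3]; address = eax + (2 * 4) + 4 */
    movl %ecx, %ebx             /* exit code = nums[2] */
    mov $1, %eax                /* sys_exit */
    int $0x80
.data
nums: .int 10, 20, 30, 40
```

Running it and checking `echo $?` should show 30, the value of the element at index 2.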
Base-Index scaled Mode
• An example of this mode is shown
below (the scale factor must be 1, 2, 4, or 8, so the second access uses a scale of 1):
.text
movl $char1, %eax            /* eax = addressOf(char1) */
movl $0, %ebx
movb 1(%eax, %ebx, 4), %bl   /* bl = char2 */
/* Address = %eax + (%ebx * 4) + 1
           = %eax + (0 * 4) + 1
           = %eax + 1 */
inc %eax
movl $1, %ebx
movb %bl, -2(%eax, %ebx, 1)  /* char1 = char2 */
/* Address = %eax + (%ebx * 1) - 2
           = %eax + (1 * 1) - 2
           = %eax - 1 */
.data
char1: .byte 'H'
char2: .byte 'e'
LEA Instruction
• x86 architecture provides a special
instruction called LEA (Load Effective
Address)
– This instruction computes the effective address
produced by a memory addressing mode and stores
that address in a given register, without accessing memory.
– Examples:
• LEA -1(%eax, %ebx, 1), %edi
• LEA (%eax, %ebx), %edi
• LEA -5(%eax), %edi
LEA Example (Contd.)
• Here is an example of the LEA instruction
.text
movl $char1, %eax          /* eax = addressOf(char1) */
movl $0, %ebx
lea 1(%eax, %ebx, 2), %edi /* edi = address of char2 */
movb $'h', (%edi)          /* change 'e' to 'h' */
.data
char1: .byte 'H'
char2: .byte 'e'
Strings
• Strings are simply represented as a
sequence (or array) of characters in memory
– Each character is stored at a consecutive
memory address!
– Every string is terminated by ASCII value 0
• Represented as ‘\0’ in assembly source
Declaring Strings in Assembly
• Strings are defined using the .string directive
.text
/* Instructions go here */
.data
msg1: .string "Hello\n"
msg2: .string "World!\n"
Memory representation
• Given the previous example, the strings
(msg1 and msg2) are stored in memory as
shown below (addresses in hexadecimal):
.text
/* Instructions go here */
.data
msg1: .string "Hello\n"
msg2: .string "World!\n"
msg1=20
Address:  20  21  22  23  24  25  26
Contents: H   e   l   l   o   \n  \0
msg2=27
Address:  27  28  29  2A  2B  2C  2D  2E
Contents: W   o   r   l   d   !   \n  \0
Displaying Strings
• Strings or characters can be displayed on standard
output (analogous to System.out) using a system call:
– Set eax to 4
• To write characters to a file (stream)
• Changing eax to 3 will cause reading characters instead!
– Set ebx to 1
• Destination stream is standard output
• You may set ebx to 2 for standard error
• If ebx is 0 it indicates standard input (you can write to it!)
– Set ecx to address of message to display
– Set number of characters to display in edx
– Execute int $0x80
Complete Example
/* Console output example */
.text
.global _start
_start:
mov $4, %eax   /* System call to write to a file handle */
mov $1, %ebx   /* File handle=1 implies standard output */
mov $msg, %ecx /* Address of message to be displayed */
mov $14, %edx  /* Number of bytes to be displayed. Calculated value by hand!
                  Can be cumbersome for large strings. */
int $0x80      /* Call OS to display the characters. */
mov $1,%eax    /* The system call for exit (sys_exit) */
mov $0,%ebx    /* Exit with return code of 0 (no error) */
int $0x80
.data
/* The data to be displayed */
msg: .string "Hello!\nWorld!\n"
Rewritten using Macro!
/* Console output example */
.text
.global _start
_start:
mov $4, %eax   /* System call to write to a file handle */
mov $1, %ebx   /* File handle=1 implies standard output */
mov $msg, %ecx /* Address of message to be displayed */
mov $len, %edx /* Number of bytes to be displayed */
int $0x80      /* Call OS to display the characters. */
mov $1,%eax    /* The system call for exit (sys_exit) */
mov $0,%ebx    /* Exit with return code of 0 (no error) */
int $0x80
.data
/* The data to be displayed */
msg: .string "Hello!\nWorld!\n"
/* Compute assembler constant len by subtracting the address of msg from the
   current address, represented by the special symbol . (dot). The extra -1
   excludes the terminating '\0' that .string appends. Every use of $len is
   replaced with the resulting constant value. */
.equ len, . - msg - 1
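The same register conventions apply to reading: with eax set to 3 and ebx set to 0, the system call reads characters from standard input. The sketch below is an illustration only, assuming 32-bit Linux and the GNU assembler. The buffer label `buf`, its 64-byte size, and the use of the `.bss` section are choices made for this example. It reads one chunk of input and writes the same bytes back to standard output.

```asm
/* Console input sketch: read up to 64 bytes and echo them back */
.text
.global _start
_start:
    mov $3, %eax       /* System call to read from a file handle */
    mov $0, %ebx       /* File handle=0 implies standard input */
    mov $buf, %ecx     /* Address of the buffer that receives the bytes */
    mov $64, %edx      /* Maximum number of bytes to read */
    int $0x80          /* On return, eax = number of bytes actually read */
    mov %eax, %edx     /* Number of bytes to be displayed */
    mov $4, %eax       /* System call to write to a file handle */
    mov $1, %ebx       /* File handle=1 implies standard output */
    mov $buf, %ecx     /* Address of the bytes to be displayed */
    int $0x80
    mov $1, %eax       /* The system call for exit (sys_exit) */
    mov $0, %ebx       /* Exit with return code of 0 (no error) */
    int $0x80
.bss
buf: .space 64         /* Uninitialized 64-byte buffer */
```

The byte count returned in eax by the read call is reused as the length for the write call, so no length has to be computed by hand.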
Compute string length
• The previous examples use fixed length
strings
– For strings that change values or change lengths,
the string length must be computed using
suitable assembly code.
– The corresponding Java source is shown below:
public static int length(char[] str) {
int i;
for(i = 0; (str[i] != '\0'); i++);
return i;
}
Compute string length
_length:
/* Let eax correspond to i */
movl $0, %eax          /* eax = 0 */
/* Let ebx correspond to str */
movl $str, %ebx        /* ebx = address(str) */
loop:
cmpb $0, (%ebx, %eax)  /* str[i] != '\0' */
/* Base register = ebx
   Offset register = eax
   Displacement (implicit) = 0
   Scale value (implicit) = 1 */
je done                /* We have hit the '\0' in string */
inc %eax               /* i++ */
jmp loop               /* Continue the loop */
done:
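The length loop can be combined with the console output example so that the byte count no longer has to be known at assembly time. The following is a sketch under stated assumptions: 32-bit Linux, the GNU assembler, and a string label `str` defined in the same file. The label `next_char` replaces `loop` purely to avoid confusion with the x86 instruction of the same name.

```asm
/* String length sketch: compute the length of str, then display it */
.text
.global _start
_start:
    movl $0, %eax          /* eax = i = 0 */
    movl $str, %ebx        /* ebx = address(str) */
next_char:
    cmpb $0, (%ebx, %eax)  /* Is str[i] the terminating '\0'? */
    je done                /* Yes: eax now holds the length */
    inc %eax               /* i++ */
    jmp next_char          /* Continue the loop */
done:
    movl %eax, %edx        /* Number of bytes to be displayed */
    movl %ebx, %ecx        /* Address of message to be displayed */
    mov $4, %eax           /* System call to write to a file handle */
    mov $1, %ebx           /* File handle=1 implies standard output */
    int $0x80              /* Call OS to display the characters. */
    mov $1, %eax           /* The system call for exit (sys_exit) */
    mov $0, %ebx           /* Exit with return code of 0 (no error) */
    int $0x80
.data
str: .string "Hello!\nWorld!\n"
```

The length and address are copied into edx and ecx before eax and ebx are overwritten with the system call number and file handle, because the loop leaves its results in those two registers.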