Transcript Chapter 1

CS2422 Assembly Language and System Programming
IA-32 Processor
Architecture
Department of Computer Science
National Tsing Hua University
Assembly Language for Intel-Based Computers, 5th Edition
Kip Irvine
Chapter 2: IA-32 Processor
Architecture
Slides prepared by the author
Revision date: June 4, 2006
(c) Pearson Education, 2006-2007. All rights reserved. You may modify and copy this slide show for your personal use,
or for use in the classroom, as long as this copyright statement, the author's name, and the title are not changed.
Chapter Overview
Goal: Understand IA-32 architecture

Basic Concepts of Computer Organization
‒ Instruction execution cycle
‒ Basic computer organization
‒ Data storage in memory
‒ How programs run
IA-32 Processor Architecture
IA-32 Memory Management
Components of an IA-32 Microcomputer
Input-Output System
Recall: Computer Model for ASM
MOV AX, a
ADD AX, b
MOV x, AX
…
[Figure: the CPU contains the registers (PC, AX, BX) and the ALU; memory holds the variables a, b, and x as binary words; the program computes x ← a + b]
Meanings of the Code (assumed)
Assembly code and the corresponding machine code:

MOV AX, a      01 0000001 1000010
(Take the data stored in memory address ‘a’, and move it to register AX)

ADD AX, b      11 0000001 1000110
(Take the data stored in memory address ‘b’, and add it to register AX)

MOV x, AX      01 1000011 0000001
(Take the data stored in register AX, and move it to memory address ‘x’)

Fields of the first machine instruction: 01 = MOV, 0000001 = register address of AX, 1000010 = memory address of a
Another Computer Model for ASM
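[Figure: stored-program architecture. The processor contains the registers (AX, BX, …), the ALU, the IR, and the PC. Memory holds the data (x, a, b) and the program MOV AX, a / ADD AX, b / MOV x, AX as machine code 01 0000001 1000010, 11 0000001 1000110, 01 1000011 0000001. The PC supplies the address.]
PC: program counter
IR: instruction register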
Step 1: Fetch (MOV AX, a)
[Figure: fetch. The PC (0000111) supplies the address to memory, and the instruction 01 0000001 1000010 (MOV AX, a) is loaded into the IR.]
Step 2: Decode (MOV AX,a)
[Figure: decode. Driven by the clock, the controller decodes the instruction in the IR (01 0000001 1000010); the PC holds 0000111.]
Step 3: Execute (MOV AX,a)
[Figure: execute. The value of a (00000000 00000001) is copied from memory into AX; IR = 01 0000001 1000010, PC = 0000111.]
Step 1: Fetch (ADD AX,b)
[Figure: fetch. The PC now holds 0001000, and the instruction 11 0000001 1000110 (ADD AX, b) is loaded into the IR; AX still holds 00000000 00000001.]
Step 2: Decode (ADD AX,b)
[Figure: decode. Driven by the clock, the controller decodes the instruction in the IR (11 0000001 1000110); the PC holds 0001000 and AX holds 00000000 00000001.]
Step 3a: Execute (ADD AX,b)
[Figure: execute. The ALU adds AX (00000000 00000001) and the value of b (00000000 00000010), producing 00000000 00000011; IR = 11 0000001 1000110, PC = 0001000.]
Step 3b: Write Back (ADD AX,b)
[Figure: write back. The ALU result 00000000 00000011 replaces the old value 00000000 00000001 in AX; IR = 11 0000001 1000110, PC = 0001000.]
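The fetch, decode, execute, and write-back steps walked through above can be sketched as a small simulator. This is only an illustration built on the slides' assumed 16-bit format (a 2-bit opcode followed by two 7-bit operand fields). Two things are assumptions of this sketch and not stated on the slides: an operand field with its top bit set is treated as a memory address (otherwise as a register number), and the PC is incremented right after the fetch. The helper names `encode`, `read`, `write`, `run`, and `demo` are hypothetical.

```python
# Toy stored-program machine modeled on the slides' assumed encoding.
MOV, ADD = 0b01, 0b11                        # opcodes as shown on the slides
AX = 0b0000001                               # register address of AX
A, X, B = 0b1000010, 0b1000011, 0b1000110    # memory addresses of a, x, b


def encode(opcode, dst, src):
    """Pack opcode (2 bits) and two operand fields (7 bits each) into 16 bits."""
    return (opcode << 14) | (dst << 7) | src


def is_memory(operand):
    # Assumption of this sketch: top bit set means "memory address".
    return bool(operand & 0b1000000)


def read(operand, registers, memory):
    return memory[operand] if is_memory(operand) else registers[operand]


def write(operand, value, registers, memory):
    if is_memory(operand):
        memory[operand] = value
    else:
        registers[operand] = value


def run(memory, pc, count):
    """Execute `count` instructions starting at address `pc`."""
    registers = {AX: 0}
    for _ in range(count):
        ir = memory[pc]                      # fetch into the instruction register
        pc += 1                              # assumption: PC advances after fetch
        opcode = ir >> 14                    # decode the three fields
        dst = (ir >> 7) & 0x7F
        src = ir & 0x7F
        value = read(src, registers, memory)
        if opcode == ADD:                    # execute: the ALU adds dst and src
            value = (read(dst, registers, memory) + value) & 0xFFFF
        elif opcode != MOV:
            raise ValueError(f"unknown opcode {opcode:02b}")
        write(dst, value, registers, memory)  # write back
    return registers, memory, pc


def demo():
    # Program at addresses 7..9, matching the PC values on the slides;
    # a = 1 and b = 2 as in the execute slides.
    memory = {
        7: encode(MOV, AX, A),
        8: encode(ADD, AX, B),
        9: encode(MOV, X, AX),
        A: 1, B: 2, X: 0,
    }
    return run(memory, pc=0b0000111, count=3)
```

Calling `demo()` runs the three instructions in order and leaves the sum of a and b in both AX and x.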
Basic Computer Organization



Clock synchronizes CPU operations
Control unit (CU) coordinates execution sequence
ALU performs arithmetic and bitwise processing
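[Figure: block diagram. The Central Processor Unit (CPU), containing the registers, ALU, CU, and clock, is connected to the Memory Storage Unit and to I/O Device #1 and I/O Device #2 through the data bus, the control bus, and the address bus.]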
Clock


Operations in a computer are triggered and thus
synchronized by a clock
Clock tells “when”: (no need to ask each other!!)



When to put data on output lines
When to read data from input lines
Clock cycle measures time of a single operation

Must be long enough to allow signal propagation
[Figure: clock signal alternating between 1 and 0, with one cycle marked]
Instruction/Data for Operations


Where do the instructions needed for computer
operations come from?
Stored-program architecture:




The whole program is stored in main memory,
including program instructions (code) and data
CPU loads the instructions and data from memory
for execution
Don’t worry about the disk for now
Where are the data needed for execution?



Registers (inside the CPU, discussed later)
Memory
Constant encoded in the instructions
Memory

Organized like mailboxes, numbered 0, 1, 2, 3, …,
2^n - 1.



Each box can hold 8 bits (1 byte)
So it is called byte-addressing
Address of mailboxes:



16-bit address is enough for up to 64K
20-bit for 1M
32-bit for 4G
Most servers need more than 4G!!
That’s why we need 64-bit CPUs
like Alpha (DEC/Compaq/HP) or
Merced (Intel)
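The address-width figures above follow from the fact that n address bits can name 2^n byte-sized mailboxes. A minimal sketch of that arithmetic (the function name `addressable_bytes` is hypothetical):

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3

for bits in (16, 20, 32):
    print(bits, "bits ->", addressable_bytes(bits), "bytes")
```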
…

Storing Data in Memory

Character String:





So how are strings like “Hello, World!” stored
in memory?
ASCII Code! (or Unicode…etc.)
Each character is stored as a byte
Review: how is “1234” stored in memory?
Integer:

A byte can hold an integer number:
‒ between 0 and 255 (unsigned) or
‒ between –128 and 127 (2’s complement)


How to store a bigger number?
Review: how is 1234 stored in memory?
Big or Little Endian?




Example: 1234 is stored in 2 bytes.
= 100 1101 0010 in binary
= 04 D2 in hexadecimal
Do you store 04 or D2 first?
Big Endian: 04 first
Little Endian: D2 first (Intel’s choice)

Reason: more consistent for variable length (e.g.,
2 bytes, 4 bytes, 8 bytes…etc.)
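The two review questions (the string “1234” versus the integer 1234) and the byte-order choice can be checked with a short sketch. It uses Python's built-in `str.encode` and `int.to_bytes`; the variable names are only illustrative.

```python
# The string "1234": one ASCII byte per character.
text_bytes = "1234".encode("ascii")

# The integer 1234 (hexadecimal 04D2) stored in 2 bytes, in both byte orders.
big_endian = (1234).to_bytes(2, "big")        # most significant byte first
little_endian = (1234).to_bytes(2, "little")  # least significant byte first

print(text_bytes.hex())      # bytes of the character string
print(big_endian.hex())      # 04 stored first
print(little_endian.hex())   # d2 stored first
```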
Cache Memory

High-speed expensive static RAM both inside
and outside the CPU.




Level-1 cache: inside the CPU chip
Level-2 cache: often outside the CPU chip
Cache hit: when data to be read is already in
cache memory
Cache miss: when data to be read is not in cache
memory
How a Program Runs?
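[Figure: the user sends the program name to the operating system; the operating system searches for the program in the current directory and the system path, gets the starting cluster from the directory entry, then loads and starts the program; when finished, the program returns to the operating system.]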
Load and Execute Process







OS searches for program’s filename in current
directory and then in directory path
If found, OS reads information from directory
OS loads file into memory from disk
OS allocates memory for program information
OS executes a branch to cause CPU to execute
the program. A running program is called a
process
Process runs by itself. OS tracks execution and
responds to requests for resources
When the process ends, its handle is removed
and memory is released
How?
OS is only a program!
Multitasking




OS can run multiple programs at the same time
Multiple threads of execution within the same
program
Scheduler utility assigns a given amount of CPU
time to each running program
Rapid switching of tasks


Gives illusion that all programs are running at the
same time
Processor must support task switching
What support is needed from hardware?
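The idea of a scheduler handing each running program a slice of CPU time can be sketched as follows. The slides do not name a scheduling policy; round-robin is used here only as one simple assumption, and the function name `round_robin` and the task names are hypothetical.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate time slicing.

    tasks: dict mapping a task name to the CPU time units it still needs.
    Returns the timeline as a list of (task name, units run) pairs.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)      # run for at most one slice
        timeline.append((name, ran))
        if remaining > ran:                   # not finished: back of the queue
            queue.append((name, remaining - ran))
    return timeline


print(round_robin({"A": 3, "B": 5}, time_slice=2))
```

Because the slices are short and alternate quickly, both tasks appear to make progress at the same time.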
What's Next


General Concepts
IA-32 Processor Architecture
‒ Modes of operation
‒ Basic execution environment
‒ Floating-point unit
‒ Intel microprocessor history
IA-32 Memory Management
Components of an IA-32 Microcomputer
Input-Output System
Modes of Operation

Protected mode
‒ native mode (Windows, Linux)
‒ Programs are given separate memory areas named segments
Real-address mode
‒ native MS-DOS
System management mode
‒ power management, system security, diagnostics
Virtual-8086 mode
‒ hybrid of Protected
‒ each program has its own 8086 computer
Basic Execution Environment
Address space:
Protected mode
‒ 4 GB
‒ 32-bit address
Real-address and Virtual-8086 modes
‒ 1 MB space
‒ 20-bit address
Basic Execution Environment

Program execution registers: named storage
locations inside the CPU, optimized for speed
32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI
16-bit Segment Registers: CS, SS, DS, ES, FS, GS
EFLAGS
EIP
[Figure: these registers placed into the earlier computer model (registers, memory, ALU, controller, clock, N and Z flags, IR, PC)]
General Purpose Registers


Used for arithmetic and data movement
Addressing:

AX, BX, CX, DX: 16 bits

Split into H and L parts, 8 bits each
Extended into E?X to become 32-bit register (i.e.,
EAX, EBX,…etc.)
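How one 32-bit register contains its 16-bit and 8-bit parts can be shown with bit masks. This is a minimal sketch; the function name `sub_registers` is hypothetical, and the same layout applies to EBX, ECX, and EDX.

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into the AX, AH, and AL views of it."""
    eax &= 0xFFFFFFFF          # keep 32 bits
    ax = eax & 0xFFFF          # lower 16 bits
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


parts = sub_registers(0x12345678)
print({name: hex(value) for name, value in parts.items()})
```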

Index and Base Registers

Some registers have only a 16-bit name for their
lower half: SI (ESI), DI (EDI), BP (EBP), SP (ESP)
Some Specialized Register Uses

General purpose registers






EAX: accumulator, automatically used by
multiplication and division instructions
ECX: loop counter
ESP: stack pointer
ESI, EDI: index registers (source, destination) for
memory transfer, e.g. a[i,j]
EBP: frame pointer to reference function
parameters and local variables on stack
EIP: instruction pointer (i.e. program counter)
Some Specialized Register Uses

Segment registers







In real-address mode: indicate base addresses of
preassigned memory areas named segments
In protected mode: hold pointers to segment
descriptor tables
CS: code segment
DS: data segment
SS: stack segment
ES, FS, GS: additional segments
EFLAGS


Status and control flags (single binary bits)
Control the operation of the CPU or reflect the
outcome of some CPU operation
Status Flags (EFLAGS)
Reflect the outcomes of arithmetic and logical
operations performed by the CPU
• Carry: unsigned arithmetic out of range
• Overflow: signed arithmetic out of range
• Sign: result is negative
• Zero: result is zero
• Auxiliary Carry: carry from bit 3 to bit 4
• Parity: sum of 1 bits is an even number
[Figure: the earlier computer model, with the N and Z flags shown next to the ALU]
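The flag definitions above can be made concrete with a sketch that computes them for an 8-bit addition. This is an illustration written for this note, not the processor's actual circuitry; the function name `add8_flags` is hypothetical, and the flags are abbreviated CF, OF, SF, ZF, AF, and PF.

```python
def add8_flags(a, b):
    """Add two 8-bit values and report the status flags the addition would set."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        # Carry: the unsigned sum does not fit in 8 bits.
        "CF": int(total > 0xFF),
        # Overflow: both operands have the same sign, the result a different one.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        # Sign: the top bit of the result.
        "SF": result >> 7,
        # Zero: the result is zero.
        "ZF": int(result == 0),
        # Auxiliary carry: carry out of bit 3 into bit 4.
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),
        # Parity: the result has an even number of 1 bits.
        "PF": int(bin(result).count("1") % 2 == 0),
    }
    return result, flags


print(add8_flags(0x7F, 0x01))   # signed overflow case
print(add8_flags(0xFF, 0x01))   # unsigned carry case
```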
System Registers
Application programs cannot access system
registers
• IDTR (Interrupt Descriptor Table Register)
• GDTR (Global Descriptor Table Register)
• LDTR (Local Descriptor Table Register)
• Task Register
• Debug Registers
• Control registers CR0, CR2, CR3, CR4
• Model-Specific Registers
Floating-Point, MMX, XMM Reg.

Eight 80-bit floating-point data registers
‒ ST(0), ST(1), . . . , ST(7)
‒ arranged in a stack
‒ used for all floating-point arithmetic
Eight 64-bit MMX registers
Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations
[Figure: the 80-bit data registers ST(0) through ST(7) and the opcode register]
Intel Microprocessors

Early microprocessors:

Intel 8080:
‒ 64K addressable RAM, 8-bit registers
‒ CP/M operating system
‒ S-100 BUS architecture
‒ 8-inch floppy disks!

Intel 8086/8088
‒ IBM-PC used 8088
‒ 1 MB addressable RAM, 16-bit registers
‒ 16-bit data bus (8-bit for 8088)
‒ separate floating-point unit (8087)
This is where “real-address mode” comes from!
Intel Microprocessors

The IBM-AT

Intel 80286
‒ 16 MB addressable RAM
‒ Protected memory
‒ Introduced IDE bus architecture
‒ 80287 floating point unit

Intel IA-32 Family



Intel386: 4 GB addressable RAM, 32-bit registers,
paging (virtual memory)
Intel486: instruction pipelining
Pentium: superscalar, 32-bit address bus, 64-bit
internal data path
Intel Microprocessors

Intel P6 Family




Pentium Pro: advanced optimization techniques in
microcode
Pentium II: MMX (multimedia) instruction set
Pentium III: SIMD (streaming extensions)
instructions
Pentium 4 and Xeon: Intel NetBurst microarchitecture, tuned for multimedia
What’s Next



General Concepts of Computer Architecture
IA-32 Processor Architecture
IA-32 Memory Management (understand it from the viewpoint of the processor)
‒ Real-address mode
‒ Calculating linear addresses
‒ Protected mode
‒ Multi-segment model
‒ Paging
Components of an IA-32 Microcomputer
Input-Output System
Real-Address Mode




Programs have 1 MB RAM maximum
addressable with 20-bit addresses
Application programs can access any area of the
1MB memory
Single tasking: one program at a time, but CPU
can momentarily interrupt that program to
process requests (called interrupts) from
peripherals
MS-DOS runs in real-address mode
Ancient History




IBM PC XT (Intel 8088/8086) is a so-called 16-bit
machine
Each register has 16 bits
2^16 = 65536 = 64K
Want to use more memory (640K, 1M)…


How to hold 20-bit addresses with 16-bit registers
in the 8086 processor?
Solution: segmented memory


All of memory is divided into 64KB units called
segments → 16 segments in total
One 16-bit register for segment value and another
for 16-bit offset within the segment
Segmented Memory

Segmented memory addressing: absolute (linear) address
is a combination of a 16-bit segment value (in CS, DS, SS,
or ES) added to a 16-bit offset
[Figure: the 1 MB address space drawn in 64 KB steps from 00000 to F0000. One segment starts at segment value 8000 and spans 8000:0000 to 8000:FFFF; the location at offset 0250 inside it is represented as 8000:0250 (seg:ofs, that is, segment value and offset).]
Calculating Linear Addresses


Given a segment address, multiply it by 16 (add
a hexadecimal zero), and add it to the offset
→ all done by the processor
Example:
convert 08F1:0100 to a linear address
Adjusted segment value:  0 8 F 1 0
Add the offset:            0 1 0 0
Linear address:          0 9 0 1 0
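The calculation above can be reproduced in a few lines. This is a minimal sketch; the function name `linear_address` is hypothetical, and masking the result to 20 bits is an assumption that models a 20-bit address bus.

```python
def linear_address(segment, offset):
    """Real-address mode: shift the segment left by 4 bits (x16), add the offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # keep 20 bits


# The worked example: 08F1:0100
print(f"{linear_address(0x08F1, 0x0100):05X}")
```

The same function applies to the segment diagram earlier: 8000:0250 is segment value 8000 shifted by one hexadecimal digit plus offset 0250.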
Protected Mode



Designed for multitasking
Each process (running program) is assigned a
total of 4GB of addressable RAM
Two parts:


Segmentation: provides a mechanism of isolating
individual code, data, and stack so that multiple
programs can run without interfering with one another
Paging: provides demand-paged virtual memory
where sections of a program’s execution environment
are moved into physical memory as needed
→ Gives segmentation the illusion that it has 4GB
of physical memory
Segmentation in Protected Mode

Segment: a logical unit of storage (not the same
as the “segment” in real-address mode)





e.g., code/data/stack of a program, system data
structures
Variable size
Processor hardware provides protection
All segments in the system are in the processor’s
linear address space (physical space if without
paging)
Need to specify: base address, size, type, …
→ segment descriptor & descriptor table
→ linear address = base address + offset
Flat Segment Model


Use a single global descriptor table (GDT)
All segments (at least 1 code and 1 data)
mapped to entire 32-bit address space
[Figure: a segment descriptor in the Global Descriptor Table with base address 00000000, limit 00040, and an access field. It maps onto physical RAM from 00000000 to 00040000; the rest of the address space up to FFFFFFFF (4GB) is not used.]
Multi-Segment Model

Local descriptor table (LDT) for each program
(located in a system segment of LDT type)

One descriptor for each segment

Local Descriptor Table (each entry also has an access field):
base        limit
00026000    0010
00008000    000A
00003000    0002
[Figure: each descriptor points to its segment in RAM, at 26000, 8000, and 3000]
Segmentation Addressing

Program references a memory location with a
logical address: segment selector + offset


Segment selector: provides an offset into the
descriptor table
CS/DS/SS point into the descriptor table for the
code/data/stack segment
Convert Logical to Linear Address
Segment selector
points to a segment
descriptor, which
contains base
address of the
segment.
The 32-bit offset
from the logical
address is added to
the segment’s base
address, generating
a 32-bit linear
address
[Figure: the selector of the logical address picks a segment descriptor from the descriptor table; GDTR/LDTR contains the base address of the descriptor table; the descriptor's base address plus the offset gives the linear address.]
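The conversion described above (look up the descriptor, then add its base address to the offset) can be sketched as follows. The table is a hypothetical stand-in for a descriptor table, using the base addresses from the multi-segment slide. As a simplification, a plain list index plays the role of the selector; a real selector carries additional bits besides the table index. The names `DESCRIPTOR_TABLE` and `to_linear` are hypothetical.

```python
# Hypothetical descriptor table; only the base address is modeled here.
DESCRIPTOR_TABLE = [
    {"base": 0x00026000},
    {"base": 0x00008000},
    {"base": 0x00003000},
]


def to_linear(selector_index, offset, table=DESCRIPTOR_TABLE):
    """Logical address (selector, offset) -> 32-bit linear address."""
    descriptor = table[selector_index]                 # selector picks a descriptor
    return (descriptor["base"] + offset) & 0xFFFFFFFF  # base address + offset


print(hex(to_linear(1, 0x10)))
```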
Paging



Supported directly by the processor
Divides each segment into 4096-byte blocks
called pages
Part of running program is in memory, part is on
disk



Sum of all programs can be larger than physical
memory
Virtual memory manager (VMM): An OS utility
that manages loading and unloading of pages
Page fault: issued by processor when a page
must be loaded from disk
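With 4096-byte pages, a linear address splits into a page number and an offset within the page, and touching a page that is not in memory raises a page fault. The sketch below illustrates only that idea. The `resident_pages` mapping (page number to physical frame number), the `PageFault` exception, and the function names are all hypothetical and do not reflect the real IA-32 page-table format.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when the page containing an address is not in physical memory."""


def split_address(linear):
    """Return (page number, offset within the page)."""
    return linear // PAGE_SIZE, linear % PAGE_SIZE


def translate(linear, resident_pages):
    """Map a linear address to a physical address, or signal a page fault."""
    page, offset = split_address(linear)
    if page not in resident_pages:
        raise PageFault(page)        # the OS would now load the page from disk
    return resident_pages[page] * PAGE_SIZE + offset


resident = {0x26: 3}                 # page 0x26 currently sits in frame 3
print(hex(translate(0x26250, resident)))
```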
What's Next




General Concepts
IA-32 Processor Architecture
IA-32 Memory Management
Components of an IA-32 Microcomputer (skipped)
Input-Output System
What's Next





General Concepts
IA-32 Processor Architecture
IA-32 Memory Management
Components of an IA-32 Microcomputer
Input-Output System

How to access I/O systems?
Different Access Levels of I/O

Call a HLL library function (C++, Java)
‒ easy to do; abstracted from hardware
‒ slowest performance
Call an operating system function
‒ specific to one OS; device-independent
‒ medium performance
Call a BIOS (basic input-output system) function
‒ may produce different results on different systems
‒ knowledge of hardware required
‒ usually good performance
Communicate directly with the hardware
‒ May not be allowed by some operating systems
Displaying a String of Characters

When a HLL program displays a string of
characters, the following steps take place:





Calls an HLL library function to write the string to
standard output
Library function (Level 3) calls an OS function,
passing a string pointer
OS function (Level 2) calls a BIOS subroutine,
passing ASCII code and color of each character
BIOS subroutine (Level 1) maps the character to a
system font, and sends it to a hardware port
attached to the video controller card
Video controller card (Level 0) generates timed
hardware signals to video display to display pixels
Summary










Central Processing Unit (CPU)
Arithmetic Logic Unit (ALU)
Instruction execution cycle
Multitasking
Floating Point Unit (FPU)
Complex Instruction Set
Real mode and Protected mode
Motherboard components
Memory types
Input/Output and access levels