Appendix A:
Instruction Set Principles
and Examples
• Classifying Instruction Set Architecture
• Memory addressing mode
• Operations in the instruction set
• Control flow instructions
• Instruction format
• Structure of recent compilers
• MMX technology
• MIPS instruction set
Introduction
• An instruction set architecture is a specification of a
standardized programmer-visible interface to
hardware, comprised of:
– A set of instructions (really, instruction types)
• With associated argument fields, assembly syntax, and
machine encoding.
– A set of named storage locations
• Registers, memory, … Programmer-accessible caches?
– A set of addressing modes (ways to name locations)
– Often an I/O interface (usually memory-mapped)
Classifying Architectures
• One important classification scheme is by where
operands are held (the type of internal storage).
– Stack architecture: Operands are implicitly on top of a stack.
(Early machines.)
– Accumulator architecture: One operand is implicitly the
accumulator (a special register). (Early machines.)
– General-purpose register architecture: Operands may be
any of a large number (typically tens to hundreds) of registers.
• Register-memory architectures: One operand may be in memory.
• Load-store architectures: All operands are registers, except in
special load and store instructions.
Four Architecture Classes
Assembly for C:=A+B:
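The figure for this slide did not survive extraction. As a minimal sketch, the toy interpreter below runs the usual textbook-style code sequences for C = A + B on the four classes. The mnemonics (`push`, `lda`, `addrm`, ...) and the machine model are hypothetical, chosen only for illustration; they are not the syntax of any real ISA.

```python
# Toy interpreter with made-up mnemonics; memory maps variable names to values.

def run(program, mem):
    """Execute a list of (opcode, *operands) tuples and return the final memory."""
    mem = dict(mem)
    stack, acc, regs = [], 0, {}
    for op, *args in program:
        if op == "push":        # stack: push M[x]
            stack.append(mem[args[0]])
        elif op == "pop":       # stack: pop top of stack into M[x]
            mem[args[0]] = stack.pop()
        elif op == "add0":      # stack: both operands are implicit
            stack.append(stack.pop() + stack.pop())
        elif op == "lda":       # accumulator: acc <- M[x]
            acc = mem[args[0]]
        elif op == "adda":      # accumulator: acc <- acc + M[x]
            acc += mem[args[0]]
        elif op == "sta":       # accumulator: M[x] <- acc
            mem[args[0]] = acc
        elif op == "load":      # GPR: reg <- M[x]
            regs[args[0]] = mem[args[1]]
        elif op == "store":     # GPR: M[x] <- reg
            mem[args[1]] = regs[args[0]]
        elif op == "addrm":     # register-memory: rd <- rs + M[x]
            regs[args[0]] = regs[args[1]] + mem[args[2]]
        elif op == "addrr":     # load-store: rd <- rs1 + rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem

PROGRAMS = {
    "stack": [("push", "A"), ("push", "B"), ("add0",), ("pop", "C")],
    "accumulator": [("lda", "A"), ("adda", "B"), ("sta", "C")],
    "register-memory": [("load", "R1", "A"), ("addrm", "R3", "R1", "B"),
                        ("store", "R3", "C")],
    "load-store": [("load", "R1", "A"), ("load", "R2", "B"),
                   ("addrr", "R3", "R1", "R2"), ("store", "R3", "C")],
}

# Every class computes the same C; they differ in instruction count and operands.
results = {name: run(prog, {"A": 2, "B": 3})["C"] for name, prog in PROGRAMS.items()}
lengths = {name: len(prog) for name, prog in PROGRAMS.items()}
```

All four sequences leave the same value in C; the load-store version needs the most instructions, but each of its instructions has the simplest operands.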
Number of Operands
A further classification is by the maximum number of
operands and how many of them may be in memory, e.g.:
– 2-operand (e.g. a += b)
• src/dest(reg), src(reg)
• src/dest(reg), src(mem): IBM 360, x86, 68k
• src/dest(mem), src(mem): VAX
– 3-operand (e.g. a = b+c)
• dest(reg), src1(reg), src2(reg): MIPS, PPC, SPARC, etc.
• dest(reg), src1(reg), src2(mem): IBM 370
• dest(mem), src1(mem), src2(mem): IBM 370, VAX
Further Classification
| # of Memory Operands | # of Operands | Type of Architecture | Examples |
|---|---|---|---|
| 0 | 3 | Register-register | Alpha, ARM, MIPS, PowerPC, SPARC, etc. |
| 1 | 2 | Register-memory | IBM 360/370, Intel 80x86, Motorola 68000, TI C54x |
| 2 | 2 | Memory-memory | VAX |
| 3 | 3 | Memory-memory | VAX |
Comparison of Architecture Types
| Type | Instruction Encoding | Code Generation | # of Clock Cycles/Inst. | Code Size |
|---|---|---|---|---|
| Register-register | Fixed-length | Simple | Similar | Large |
| Register-memory | Easy | Moderate | Different | Medium |
| Memory-memory | Variable-length | Complex | Large variation | Compact |

Legend on the slide: Advantages, Disadvantages
Endians & Alignment
Byte addresses 7 6 5 4 3 2 1 0 (byte address increases from 0 to 7)
• Word-aligned word at byte address 4.
• Halfword-aligned word at byte address 2.
• Byte-aligned (non-aligned) word, at byte address 1.
• Little-endian byte order (least-significant byte “first”):
byte 0 (LSB) of the word is at the lowest address, byte 3 (MSB) at the highest.
• Big-endian byte order (most-significant byte “first”):
byte 3 (MSB) of the word is at the lowest address, byte 0 (LSB) at the highest.
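Both ideas on this slide can be checked in a few lines. The sketch below uses Python's standard `struct` module to lay out one 32-bit word in each byte order, plus a small alignment test. The helper `is_aligned` and the sample value are illustrative assumptions, not part of the slides.

```python
import struct

value = 0x0A0B0C0D  # a 32-bit word: 0x0A is the MSB, 0x0D is the LSB

# Bytes are listed in order of increasing address.
little = struct.pack("<I", value)  # LSB at the lowest address
big = struct.pack(">I", value)     # MSB at the lowest address

def is_aligned(address, size):
    """An access of `size` bytes is aligned if the address is a multiple of size."""
    return address % size == 0

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d

# The three placements from the slide, for a 4-byte word:
word_at_4 = is_aligned(4, 4)                    # word-aligned
word_at_2 = (is_aligned(2, 4), is_aligned(2, 2))  # only halfword-aligned
word_at_1 = (is_aligned(1, 4), is_aligned(1, 2))  # only byte-aligned
```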
Addressing Modes
| Mode | Example | Meaning (RTL) |
|---|---|---|
| Immediate | add r4, #3 | R[4] ← R[4] + 3 |
| Register | add r4, r3 | R[4] ← R[4] + R[3] |
| Direct | add r1, (1001) | R[1] ← R[1] + M[1001] |
| Indirect | add r4, (r1) | R[4] ← R[4] + M[R[1]] |
| Displacement | add r4, 100(r1) | R[4] ← R[4] + M[100 + R[1]] |
| Indexed | add r3, (r1+r2) | R[3] ← R[3] + M[R[1] + R[2]] |
| Memory indirect | add r1, @(r3) | R[1] ← R[1] + M[M[R[3]]] |
• In example assembly syntax in middle column, ( )
indicates memory access. (A typical syntax.)
• In RTL syntax on right, [ ] denotes accessing a
member of an array, Register or Memory.
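The RTL meanings translate almost directly into code. The sketch below fetches the source operand for each mode; the register and memory contents, and the function name `operand`, are made up for illustration.

```python
# R is the register file, M a sparse memory; all contents are arbitrary examples.
R = {1: 200, 2: 8, 3: 300, 4: 10}
M = {1001: 7, 200: 5, 208: 9, 300: 200}  # M[300] holds a pointer to address 200

def operand(mode, **f):
    """Return the source operand value selected by the given addressing mode."""
    if mode == "immediate":
        return f["imm"]
    if mode == "register":
        return R[f["reg"]]
    if mode == "direct":
        return M[f["addr"]]
    if mode == "indirect":
        return M[R[f["reg"]]]
    if mode == "displacement":
        return M[f["disp"] + R[f["reg"]]]
    if mode == "indexed":
        return M[R[f["base"]] + R[f["index"]]]
    if mode == "memory_indirect":
        return M[M[R[f["reg"]]]]
    raise ValueError(f"unknown mode: {mode}")

# The source operands of the seven example instructions, in the same order.
sources = [
    operand("immediate", imm=3),             # add r4, #3
    operand("register", reg=3),              # add r4, r3
    operand("direct", addr=1001),            # add r1, (1001)
    operand("indirect", reg=1),              # add r4, (r1)
    operand("displacement", disp=100, reg=1),  # add r4, 100(r1)
    operand("indexed", base=1, index=2),     # add r3, (r1+r2)
    operand("memory_indirect", reg=3),       # add r1, @(r3)
]
```

Memory indirect is the only mode here that needs two memory reads to obtain one operand.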
Addressing Mode Usage
Three SPEC89 programs on VAX
Displacement Distribution
SPEC CPU2000 on Alpha
Sign bit is not counted
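The chart itself is missing, but the bit-counting convention in its caption can be made concrete. The sketch below counts how many bits a displacement needs when the sign bit is excluded. It assumes two's-complement encoding, and the helper names are hypothetical; the slides do not say exactly how the measurement was done.

```python
def magnitude_bits(d):
    """Bits needed to hold d in two's complement, not counting the sign bit."""
    return d.bit_length() if d >= 0 else (~d).bit_length()

def fits(d, field_bits):
    """True if d fits in a signed field that is field_bits wide (sign bit included)."""
    return magnitude_bits(d) <= field_bits - 1

sizes = [magnitude_bits(d) for d in (0, 100, -8, -9)]
```

Under this convention a 16-bit displacement field covers every value that needs at most 15 magnitude bits.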
Use of Immediate Operand
Distribution of Immediate
SPEC CPU2000 on Alpha
Sign bit is not counted
Instruction Type
Instruction Distribution
(5 SPECint92 programs)
Control Flow Instructions
• Four basic types:
– (Conditional) branches
– (Unconditional) jumps
– Procedure calls
– Procedure returns
• Control flow addressing modes:
– Often PC-relative (PC + displacement). Relocatable.
– Also useful: register indirect jumps (the register holds the
target address). Uses:
• Procedure returns
• Case / switch statements
• Virtual functions / methods (abstract class method calls)
• Higher-order functions / function pointers
• Dynamically shared libraries
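A small sketch of the two target-address styles. All addresses, the jump table, and the function names are made-up assumptions for illustration.

```python
def pc_relative_target(pc, displacement):
    """Target of a PC-relative branch: PC plus a signed displacement (in bytes here)."""
    return pc + displacement

def branch_offset_in_image(load_base):
    """Where a branch at image offset 0x10 lands, relative to the load address."""
    branch_pc = load_base + 0x10
    return pc_relative_target(branch_pc, -0x0C) - load_base

# Register-indirect jump: the target is only known at run time, for example
# read from a jump table that implements a switch statement.
JUMP_TABLE = {0: 0x4000, 1: 0x4020, 2: 0x4044}

def switch_target(case_value, default=0x4060):
    reg = JUMP_TABLE.get(case_value, default)  # load the target address into a register
    return reg                                 # jump to the address held in the register
```

`branch_offset_in_image` returns the same value for any load address, which is why PC-relative code is relocatable: the encoded displacement never needs patching.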
Conditional Branch Options
• Condition Code (CC) Register
– E.g.: x86, ARM, PPC, SPARC, …
– ALU ops set condition code flags in the CCR
– Branch just checks the flag
• Condition register
– E.g.: Alpha, MIPS
– Comparison instruction puts result in a GPR
– Branch instruction checks the register
• Compare & Branch
– E.g.: PA-RISC, VAX
– Compare & branch in 1 instruction.
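The three options decide the same thing ("is a < b?") but keep the intermediate result in different places. A minimal sketch follows; it ignores overflow, and the flag names and function names are illustrative assumptions.

```python
def taken_condition_codes(a, b):
    # 1) Condition codes: an ALU op (here a subtract) sets flags as a side
    #    effect; the branch only inspects a flag.
    diff = a - b
    flags = {"Z": diff == 0, "N": diff < 0}
    return flags["N"]                 # "branch if negative"

def taken_condition_register(a, b):
    # 2) Condition register: a compare instruction writes 0 or 1 into a
    #    general-purpose register; the branch tests that register.
    r1 = 1 if a < b else 0            # e.g. a set-on-less-than instruction
    return r1 != 0                    # "branch if register is non-zero"

def taken_compare_and_branch(a, b):
    # 3) Compare and branch: one instruction compares and decides.
    return a < b

pairs = [(1, 2), (2, 1), (3, 3), (-5, 4)]
outcomes = [taken_compare_and_branch(a, b) for a, b in pairs]
```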
Procedure Calling Conventions
• Two major calling conventions:
– Caller saves:
• Before the call, the caller saves the registers that it will
need later, even if the callee does not use them
– Callee saves:
• Inside the call, the called procedure saves the registers that it
will overwrite
• Can be more efficient when there are many small procedures
• Many architectures use a combination of schemes:
– E.g., MIPS: Some registers are caller-saved, some callee-saved
Three Classes of Control Instructions
SPEC CPU2000 on Alpha
Branch Distance Distribution
SPEC CPU2000 on Alpha
Branch Comparison Types
SPEC CPU2000 on Alpha
Encoding An Instruction Set
Compiler Structure
Compiler Optimizations
Compiler Optimizations (cont.)
Effect of Optimization
Architectural Support for Compiler
• Provide regularity
– Orthogonality (independence) of:
• Registers used
• Addressing modes
• Operations used
• Provide primitives, not solutions
– Don’t directly support specific kernels or languages
• Simplify trade-offs among alternatives
– Make it easy to determine the fastest code sequence at compile time
• Don’t interpret at run time values that are known at compile time
– Allow compile-time constants to be provided in
immediates
MIPS Architecture
• RISC, load-store architecture, simple addressing
• 32-bit instructions, fixed format
• 32 64-bit GPRs, R0-R31.
– Really, only 31: R0 is just a constant 0.
• 32 64-bit FPRs, F0-F31
– Can hold 32-bit floats also (with the other half unused).
– “SIMD” extensions operate on more than one float in 1 FPR
• A few special registers
– Floating-point status register
• Load/store 8-, 16-, 32-, 64-bit integers
– Narrower values are sign- or zero-extended to fill the 64-bit GPR
– Also 32-bit floats and 64-bit doubles
MIPS Addressing Modes
• Register (arith./logical ops only)
• Immediate (arith./logical only) & Displacement
(load/stores only)
– 16-bit immediate / offset field
– Register indirect: use 0 as displacement offset
– Direct (absolute): use R0 as displacement base
• Byte-addressed memory, 64-bit address
• Software-settable big-endian/little-endian flag
• Alignment required
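The way a single displacement mode covers register indirect and direct addressing can be sketched as follows. The register contents and the helper names are illustrative assumptions.

```python
MASK64 = (1 << 64) - 1

def sign_extend16(field):
    """Interpret a 16-bit field as a signed two's-complement offset."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def effective_address(regs, base, offset_field):
    """Displacement mode: base register plus sign-extended 16-bit offset (64-bit wrap)."""
    return (regs[base] + sign_extend16(offset_field)) & MASK64

regs = [0] * 32          # regs[0] plays the role of R0, which always reads as 0
regs[5] = 0x1000

displacement = effective_address(regs, 5, 0xFFF8)  # 0x1000 - 8
register_indirect = effective_address(regs, 5, 0)  # offset 0: address is just R5
direct = effective_address(regs, 0, 0x0100)        # base R0: address is just the offset

def is_aligned(address, size_bytes):
    """Alignment rule: an access must start at a multiple of its own size."""
    return address % size_bytes == 0
```

Because the offset field is only 16 bits and sign-extended, the direct form built this way reaches only addresses near 0 and near the top of the address space.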
Inst. Format: I-type Instructions
Inst. Format: R-type Instructions
Inst. Format: J-type Instructions
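The format figures for these three slides did not survive extraction. The sketch below assumes the standard MIPS field layout: I-type is opcode(6) rs(5) rt(5) immediate(16), R-type is opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6), and J-type is opcode(6) offset(26). The opcode and funct values in the examples (8 for ADDI, 0 with funct 0x20 for ADD, 2 for J) are the classic MIPS32 ones and are used here only as illustrations.

```python
def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_j(opcode, target):
    """J-type: opcode(6) | offset(26)."""
    return (opcode << 26) | (target & 0x03FFFFFF)

def decode_i(word):
    """Split a 32-bit word back into I-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }

addi = encode_i(8, 0, 8, 5)            # addi r8, r0, 5
add = encode_r(0, 9, 10, 8, 0, 0x20)   # add r8, r9, r10
jump = encode_j(2, 0x100)              # j 0x100 (word offset)
```

Keeping opcode, rs, and rt in the same bit positions across formats is what makes fixed-format decoding simple: the register fields can be read before the instruction type is fully known.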
MIPS Instruction Set
• Go through Figures A.23-A.25 in the textbook:
– Loads and stores in MIPS, Figure A.23
– Arithmetic and logical instructions, Figure A.24
– Control flow instructions, Figure A.25
• More in Appendix A: Figures A.26 – A.30.
MIPS Dynamic Instr. Frequencies
Integer benchmarks
FP benchmarks
Multimedia Extensions
• Graphics displays work on pixels: 8, 16, or 32 bits
per pixel to define pixel colors
• Audio samples are 16 or 24 bits
• Exploit subword parallelism using existing 64/128-bit
registers and ALUs
• Intel i860 was the first (1989) to operate on 8 8-bit, 4 16-bit, or 2 32-bit operands on 64-bit ALUs
• Almost all microprocessors have media
extensions
• Intel uses the term SIMD to describe the MMX extensions; the
only limit is the width of the registers, e.g. 64 bits
Intel MMX Technology
• MMX registers: 64-bit MM0 to MM7, shared with the FP registers
R0 to R7; using them has side effects on FPU state; used only for operands
• Four MMX data types (in a 64-bit MMX register, bits 63 to 0):
– Packed byte: 8 bits x 8
– Packed word: 16 bits x 4
– Packed doubleword: 32 bits x 2
– Quadword: 64 bits
• 64-bit / 32-bit access modes from memory to MMX registers
• SIMD techniques for arithmetic/logical operations on bytes,
words, doublewords from/to 64-bit registers
MMX Instruction Set
• The MMX instruction set consists of 57 instructions,
grouped into 7 categories:
– Arithmetic instructions
– Data transfer instructions
– Comparison instructions
– Conversion instructions
– Logical instructions
– Shift instructions
– Empty MMX state instruction (EMMS)
• See the Intel Architecture Software Developer’s Manual: Vol. 1 Basic
Architecture (order#: 243190); Vol. 2 Instruction
Set Ref. (order#: 243191); Vol. 3 System
Programming Guide (order#: 243192) at:
http://developer.intel.com/design/archives/processors/mmx/index.htm
SIMD – Parallel Operations
• Conventional scalar operations vs. SIMD (PADDW)
– Scalar: four separate additions, A1+B1, A2+B2, A3+B3, A4+B4
– SIMD: one PADDW adds packed words [A4 A3 A2 A1] + [B4 B3 B2 B1],
giving [A4+B4 A3+B3 A2+B2 A1+B1]
• Up to 4 times faster, but requires moving data in and out of
the MMX registers
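A software model of the packed add, as a sketch. It follows the wraparound behaviour of PADDW; the helper names and the lane numbering (lane 0 = bits 15 to 0) are illustrative choices.

```python
MASK64 = (1 << 64) - 1

def unpack_words(q):
    """Split a 64-bit value into four 16-bit lanes, lane 0 = bits 15..0."""
    return [(q >> (16 * i)) & 0xFFFF for i in range(4)]

def pack_words(lanes):
    """Pack four 16-bit lanes into one 64-bit value."""
    q = 0
    for i, w in enumerate(lanes):
        q |= (w & 0xFFFF) << (16 * i)
    return q

def paddw(a, b):
    """Lane-wise 16-bit add with wraparound; no carry crosses a lane boundary."""
    sums = [(x + y) & 0xFFFF for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)

a = pack_words([0xFFFF, 2, 3, 4])
b = pack_words([1, 20, 30, 40])

simd = unpack_words(paddw(a, b))              # lane 0 wraps to 0, lane 1 is unaffected
scalar64 = unpack_words((a + b) & MASK64)     # an ordinary 64-bit add leaks a carry into lane 1
```

The comparison with a plain 64-bit add shows what the SIMD hardware adds: the carry chain is cut at each lane boundary.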
Packed Multiply Add
• 4 multiplications and 2 adds in one PMADDWD instruction
– Source 1: A3 A2 A1 A0
– Source 2: B3 B2 B1 B0
– Intermediate: A3xB3, A2xB2, A1xB1, A0xB0
– Destination (result DWs): A3xB3 + A2xB2, A1xB1 + A0xB0
• PMADDWD produces 2 DW (32-bit) results
– Useful instruction for many media and signal-processing applications
– Inputs and outputs must be arranged and packed into/out of MMX
registers, which adds programming complexity and performance
overhead
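The data flow of the packed multiply-add can be modelled as below. This is a sketch: operands are passed as Python lists of four signed 16-bit words ordered [word0, word1, word2, word3], and the helper names are hypothetical.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def pmaddwd(src1, src2):
    """Multiply four signed 16-bit word pairs, then add adjacent products.

    Returns [A1*B1 + A0*B0, A3*B3 + A2*B2] as signed 32-bit values.
    """
    products = [to_signed(a, 16) * to_signed(b, 16) for a, b in zip(src1, src2)]
    low = to_signed(products[0] + products[1], 32)
    high = to_signed(products[2] + products[3], 32)
    return [low, high]

pair_sums = pmaddwd([1, 2, 3, 4], [5, 6, 7, 8])
dot_product = sum(pair_sums)   # a 4-element dot product needs one more add
```

One instruction does most of the work of a short dot product, which is why it suits filters and transforms; the final add across the two doublewords still has to be done separately.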
Data Move Instructions
• MOVD m32, mm
– Stores bits 31-0 of the MMX register mm (A3 A2 A1 A0) to memory m32;
bits 63-32 (xx xx xx xx) are not stored
• MOVD mm, r32
– Copies the 32-bit register r32 (A3 A2 A1 A0) into bits 31-0 of mm
and fills bits 63-32 with zeros
• Move data between MMX registers and memory or regular
registers for SIMD instructions
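The two MOVD directions can be modelled with simple masking. This is a sketch with hypothetical function names and made-up values; the byte order of the stored value assumes a little-endian memory, as on x86.

```python
def movd_to_mm(r32):
    """MOVD mm, r32: bits 31..0 come from the source, bits 63..32 are zeroed."""
    return r32 & 0xFFFFFFFF

def movd_from_mm(mm):
    """MOVD m32, mm: only bits 31..0 of the MMX register are stored."""
    return mm & 0xFFFFFFFF

mm = movd_to_mm(0xA3A2A1A0)
stored = movd_from_mm(0x1122334455667788)
stored_bytes = stored.to_bytes(4, "little")  # byte at the lowest address comes first
```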