Computer Architecture

Lecture 3
Basic Fundamentals and Instruction Sets
The Task of a Computer Designer

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

[Design-cycle diagram: evaluate existing systems for bottlenecks (using benchmarks and workloads), simulate new designs and organizations (weighing implementation complexity and technology trends), then implement the next generation system.]
2
Technology and Computer Usage Trends

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

When building a cathedral, numerous very practical considerations need to be taken into account:
• available materials
• worker skills
• willingness of the client to pay the price.

Similarly, computer architecture is about working within constraints:
• What will the market buy?
• Cost/performance
• Tradeoffs in materials and processes
3
Trends

Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed onto a chip doubles every year. This has CONTINUED to be true since then. (Moore later revised the doubling period to roughly two years.)

[Chart: Transistors Per Chip, log scale from 1.E+03 to 1.E+08, plotted over 1970-2005. Data points include the 4004, 8086, 80286, 386, 486, Power PC 601, Pentium, Pentium Pro, Pentium II, Power PC G3, and Pentium 3.]
4
Measuring and Reporting Performance

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

This section talks about:
1. Metrics – how do we describe the performance of a computer in a numerical way?
2. What tools do we use to find those metrics?
5
Metrics

Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

• Time to run the task (ExTime)
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
  – Throughput, bandwidth
6
Metrics - Comparisons

"X is n times faster than Y" means:

  ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = n

Speed of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde
7
Metrics - Comparisons

Pat has developed a new product, "rabbit", whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used performance engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons
Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle    30                    0.0333                3
Rabbit    60                    0.0166                1

Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
8
Metrics - Throughput

[Diagram: performance metrics at each level of the system stack]

Application                       Answers per month, operations per second
Programming Language / Compiler
ISA                               (Millions of) instructions per second: MIPS
                                  (Millions of) FP operations per second: MFLOP/s
Datapath / Control                Megabytes per second
Function Units                    Cycles per second (clock rate)
Transistors, Wires, Pins
9
Methods For Predicting
Performance
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles
10
Benchmarks

SPEC: System Performance Evaluation Cooperative

• First Round 1989
  – 10 programs yielding a single number ("SPECmarks")
• Second Round 1992
  – SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
  – Compiler flags unlimited. March 93 flags for a DEC 4000 Model 610:
      spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
      wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
      nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
  – New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  – "Benchmarks useful for 3 years"
  – Single flag setting for all programs: SPECint_base95, SPECfp_base95
11
Benchmarks

CINT2000 (Integer Component of SPEC CPU2000):

Program       Language   What Is It
164.gzip      C          Compression
175.vpr       C          FPGA Circuit Placement and Routing
176.gcc       C          C Programming Language Compiler
181.mcf       C          Combinatorial Optimization
186.crafty    C          Game Playing: Chess
197.parser    C          Word Processing
252.eon       C++        Computer Visualization
253.perlbmk   C          PERL Programming Language
254.gap       C          Group Theory, Interpreter
255.vortex    C          Object-oriented Database
256.bzip2     C          Compression
300.twolf     C          Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/
12
Benchmarks

CFP2000 (Floating Point Component of SPEC CPU2000):

Program        Language     What Is It
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/
13
Sample Results for SPECint2000 Benchmarks

http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
Machine: Intel OR840 (1 GHz Pentium III processor)

Benchmark     Base Ref   Base Run   Base    Peak Ref   Peak Run   Peak
              Time       Time       Ratio   Time       Time       Ratio
164.gzip      1400       277        505*    1400       270        518*
175.vpr       1400       419        334*    1400       417        336*
176.gcc       1100       275        399*    1100       272        405*
181.mcf       1800       621        290*    1800       619        291*
186.crafty    1000       191        522*    1000       191        523*
197.parser    1800       500        360*    1800       499        361*
252.eon       1300       267        486*    1300       267        486*
253.perlbmk   1800       302        596*    1800       302        596*
254.gap       1100       249        442*    1100       248        443*
255.vortex    1900       268        710*    1900       264        719*
256.bzip2     1500       389        386*    1500       375        400*
300.twolf     3000       784        382*    3000       776        387*

SPECint_base2000 = 438
SPECint2000 = 442
14
Benchmarks
Performance Evaluation

• "For better or worse, benchmarks shape a field"
• Good products are created when you have:
  – Good benchmarks
  – Good ways to summarize performance
• Since sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary
• If the benchmarks or summary are inadequate, a company must choose between improving its product for real programs and improving its product to get more sales; sales almost always win!
• Execution time is the measure of computer performance!
15
Benchmarks
How to Summarize Performance

Management would like to have one number. Technical people want more:
1. They want evidence of reproducibility – there should be enough information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple times.

How would you report these results?

                    Computer A   Computer B   Computer C
Program P1 (secs)   1            10           20
Program P2 (secs)   1000         100          20
Total Time (secs)   1001         110          40
16
Quantitative Principles of Computer Design

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Make the common case fast.

Amdahl's Law: relates the total speedup of a system to the speedup of some portion of that system.
17
Quantitative Design
Amdahl's Law

Speedup due to enhancement E:

  Speedup(E) = Execution_Time_Without_Enhancement / Execution_Time_With_Enhancement
             = Performance_With_Enhancement / Performance_Without_Enhancement

Suppose that enhancement E accelerates a fraction F of the task (the "fraction enhanced") by a factor S, and the remainder of the task is unaffected.
18
Quantitative Design
Cycles Per Instruction

CPI = (CPU Time * Clock Rate) / Instruction Count
    = Cycles / Instruction Count

CPU_Time = Cycle_Time * Σ (i = 1..n) CPI_i * I_i

where I_i is the number of instructions of type i.

Using the "instruction frequency" F_i = I_i / Instruction_Count:

CPI = Σ (i = 1..n) CPI_i * F_i

Invest resources where time is spent!
19
Quantitative Design
Cycles Per Instruction

Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

Base Machine (Reg / Reg)
Op          Freq   Cycles   CPI(i)   (% Time)
ALU         50%    1        .5       (33%)
Load        20%    2        .4       (27%)
Store       10%    2        .2       (13%)
Branch      20%    2        .4       (27%)
Total CPI                   1.5
20
Quantitative Design
Locality of Reference

Programs access a relatively small portion of the address space at any instant of time. There are two different types of locality:

Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)

Spatial Locality (locality in space/location): if an item is referenced, items whose addresses are close by tend to be referenced soon (straight-line code, array access, etc.)
21
The Concept of Memory Hierarchy

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Fast memory is expensive. Slow memory is cheap. The goal is to get the best performance at a particular price point.
22
Memory Hierarchy

                     Registers         Level 1 cache   Level 2 cache   Memory          Disk
Typical Size         4 - 64            <16 Kbytes      <2 Mbytes       <16 Gigabytes   >5 Gigabytes
Access Time          1 nsec            3 nsec          15 nsec         150 nsec        5,000,000 nsec
Bandwidth (MB/sec)   10,000 - 50,000   2000 - 5000     500 - 1000      500 - 1000      100
Managed By           Compiler          Hardware        Hardware        OS              OS/User
23
Memory Hierarchy

• Hit: data appears in some block in the upper level (example: Block X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
• Miss: data needs to be retrieved from a block in the lower level (Block Y)
  – Miss Rate = 1 - (Hit Rate)
  – Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on 21264!)
24
Memory Hierarchy

Registers → Level 1 cache → Level 2 cache → Memory → Disk

What is the cost of executing a program if:
• Stores are free (there's a write pipe)
• Loads are 20% of all instructions
• 80% of loads hit (are found) in the Level 1 cache
• 97% of loads hit in the Level 2 cache.
25
The Instruction Set
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
Bonus
26
Introduction

The Instruction Set Architecture is that portion of the machine visible to the assembly-level programmer or to the compiler writer.

  software
  ------ instruction set ------
  hardware

1. What are the advantages and disadvantages of various instruction set alternatives?
2. How do languages and compilers affect the ISA?
3. We use the DLX architecture as an example of a RISC architecture.
27
Classifying Instruction Set Architectures

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Classifications can be by:
1. Stack/accumulator/register
2. Number of memory operands.
3. Number of total operands.
28
Instruction Set Architectures
Basic ISA Classes

Accumulator:
  1 address     add A       acc ← acc + mem[A]
  1+x address   addx A      acc ← acc + mem[A + x]

Stack:
  0 address     add         tos ← tos + next

General Purpose Register:
  2 address     add A B     EA(A) ← EA(A) + EA(B)
  3 address     add A B C   EA(A) ← EA(B) + EA(C)

Load/Store:
  0 memory      load R1, Mem1
                load R2, Mem2
                add  R1, R2
  1 memory      add  R1, Mem2

ALU instructions can have two or three operands, and 0, 1, 2, or 3 memory operands; shown here are the cases of 0 and 1.
29
Instruction Set Architectures
Basic ISA Classes

The results of the different address classes are easiest to see with the examples here, all of which implement the sequence C = A + B.

Stack     Accumulator   Register            Register
                        (register-memory)   (load-store)
Push A    Load A        Load R1, A          Load R1, A
Push B    Add B         Add R1, B           Load R2, B
Add       Store C       Store C, R1         Add R3, R1, R2
Pop C                                       Store C, R3

Registers are the class that won out. The more registers on the CPU, the better.
30
Instruction Set Architectures
Intel 80x86 Integer Registers

GPR0     EAX      Accumulator
GPR1     ECX      Count register: string, loop
GPR2     EDX      Data register: multiply, divide
GPR3     EBX      Base address register
GPR4     ESP      Stack pointer
GPR5     EBP      Base pointer – for base of stack segment
GPR6     ESI      Index register
GPR7     EDI      Index register

CS       Code segment pointer
SS       Stack segment pointer
DS       Data segment pointer
ES       Extra data segment pointer
FS       Data segment 2
GS       Data segment 3

EIP      Instruction pointer (PC)
EFLAGS   Condition codes
31
Memory Addressing

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Sections include:
  Interpreting Memory Addresses
  Addressing Modes
  Displacement Address Mode
  Immediate Address Mode
32
Memory
Addressing
Interpreting Memory
Addresses
What object is accessed as a function of the address and length?
Objects have byte addresses – an address refers to the number of bytes
counted from the beginning of memory.
Little Endian – puts the byte whose address is xx00 at the least
significant position in the word.
Big Endian – puts the byte whose address is xx00 at the most significant
position in the word.
Alignment – data must be aligned on a boundary equal to its size.
Misalignment typically results in an alignment fault that must be
handled by the Operating System.
33
Memory Addressing
Addressing Modes

This table shows the most common modes.

Addressing Mode     Example Instruction   Meaning                         When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]          When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3              For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100+R[R1]]   Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]       Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]        Used for static data.
34
Memory Addressing
Displacement Addressing Mode

How big should the displacement be?

For addresses that do fit in the displacement size:
  Add R4, 10000(R0)

For addresses that don't fit in the displacement size, the compiler must do the following:
  Load R1, address
  Add R4, 0(R1)

How big the field should be depends on the typical displacements seen in practice. On both IA32 and DLX, the space allocated is 16 bits.
35
Memory Addressing
Immediate Address Mode

Used where we want to get a numerical value into an instruction.

At high level:        At assembler level:

a = b + 3;            Load    R2, 3
                      Add     R0, R1, R2

if ( a > 17 )         Load    R2, 17
                      CMPBGT  R1, R2

goto Addr             Load    R1, Address
                      Jump    (R1)
36
Operations in the Instruction Set

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Sections include:
  Detailed information about types of instructions.
  Instructions for Control Flow (conditional branches, jumps)
37
Operations in the Instruction Set
Operator Types

Arithmetic and logical   and, add
Data transfer            move, load
Control                  branch, jump, call
System                   system call, traps
Floating point           add, mul, div, sqrt
Decimal                  add, convert
String                   move, compare
Multimedia               2D, 3D? e.g., Intel MMX and Sun VIS
38
Operations in the Instruction Set
Control Instructions

Conditional branches are 20% of all instructions!!

Control instruction issues:
• taken or not
• where is the target
• link return address
• save or restore state

Instructions that change the PC:
• (conditional) branches, (unconditional) jumps
• function calls, function returns
• system calls, system returns
39
Operations in the Instruction Set
Control Instructions

There are numerous tradeoffs:

Compare and branch
  + no extra compare, no state passed between instructions
  -- requires ALU op, restricts code scheduling opportunities

Implicitly set condition codes (Z, N, V, C)
  + can be set "for free"
  -- constrains code reordering, extra state to save/restore

Explicitly set condition codes
  + can be set "for free", decouples branch/fetch from pipeline
  -- extra state to save/restore

Condition in general-purpose register
  + no special state, but uses up a register
  -- branch condition separate from branch logic in pipeline

Some data for MIPS:
  > 80% of branches use immediate data; > 80% of those use zero
  50% of branches use == 0 or <> 0

Compromise in MIPS:
  branch==0, branch<>0
  compare instructions for all other compares
40
Operations in the Instruction Set
Control Instructions

Link return address:
  Implicit register – many recent architectures use this
    + fast, simple
    -- s/w must save the register before the next call; surprise traps?
  Explicit register
    + may avoid saving a register
    -- register must be specified
  Processor stack
    + recursion is direct
    -- complex instructions

Save or restore state:
  What state?
    Function calls: registers
    System calls: registers, flags, PC, PSW, etc.
  Hardware need not save registers:
    Caller can save registers in use
    Callee saves registers it will use
  Hardware register save:
    IBM STM, VAX CALLS
    Faster?
  Many recent architectures do no register saving, or do implicit register saving with register windows (SPARC)
41
Type and Size of Operands

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

The type of the operand is usually encoded in the opcode – a LDW implies loading of a word.

Common sizes are:
  Character (1 byte)
  Half word (16 bits)
  Word (32 bits)
  Single Precision Floating Point (1 Word)
  Double Precision Floating Point (2 Words)

Integers are two's complement binary. Floating point is IEEE 754. Some languages (like COBOL) use packed decimal.
42
Encoding an Instruction Set

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

This section has to do with how an assembly-level instruction is encoded into binary. Ultimately, it's the binary that is read and interpreted by the machine.
43
Encoding an Instruction Set
80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with the debugger option, so it is not optimized. This code was produced using Visual Studio.

for ( index = 0; index < iterations; index++ )
0040D3AF  C7 45 F0 00 00 00 00   mov    dword ptr [ebp-10h],0
0040D3B6  EB 09                  jmp    main+0D1h (0040d3c1)
0040D3B8  8B 4D F0               mov    ecx,dword ptr [ebp-10h]
0040D3BB  83 C1 01               add    ecx,1
0040D3BE  89 4D F0               mov    dword ptr [ebp-10h],ecx
0040D3C1  8B 55 F0               mov    edx,dword ptr [ebp-10h]
0040D3C4  3B 55 F8               cmp    edx,dword ptr [ebp-8]
0040D3C7  7D 15                  jge    main+0EEh (0040d3de)
    long_temp = (*alignment + long_temp) % 47;
0040D3C9  8B 45 F4               mov    eax,dword ptr [ebp-0Ch]
0040D3CC  8B 00                  mov    eax,dword ptr [eax]
0040D3CE  03 45 EC               add    eax,dword ptr [ebp-14h]
0040D3D1  99                     cdq
0040D3D2  B9 2F 00 00 00         mov    ecx,2Fh
0040D3D7  F7 F9                  idiv   eax,ecx
0040D3D9  89 55 EC               mov    dword ptr [ebp-14h],edx
0040D3DC  EB DA                  jmp    main+0C8h (0040d3b8)
44
Encoding an Instruction Set
80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with optimization. This code was produced using Visual Studio.

for ( index = 0; index < iterations; index++ )
00401000  8B 0D 40 54 40 00   mov    ecx,dword ptr ds:[405440h]
00401006  33 D2               xor    edx,edx
00401008  85 C9               test   ecx,ecx
0040100A  7E 14               jle    00401020
0040100C  56                  push   esi
0040100D  57                  push   edi
0040100E  8B F1               mov    esi,ecx
    long_temp = (*alignment + long_temp) % 47;
00401010  8D 04 11            lea    eax,[ecx+edx]
00401013  BF 2F 00 00 00      mov    edi,2Fh
00401018  99                  cdq
00401019  F7 FF               idiv   eax,edi
0040101B  4E                  dec    esi
0040101C  75 F2               jne    00401010
0040101E  5F                  pop    edi
0040101F  5E                  pop    esi
00401020  C3                  ret
45
Encoding an Instruction Set
80x86 Instruction Encoding

Here's some sample code that's been disassembled. It was compiled with optimization. This code was produced using gcc and gdb. Note that the representation of the code is dependent on the compiler/debugger!

for ( index = 0; index < iterations; index++ )
0x804852f <main+143>: add $0x10,%esp
0x8048532 <main+146>: lea 0xfffffff8(%ebp),%edx
0x8048535 <main+149>: test %esi,%esi
0x8048537 <main+151>: jle 0x8048543 <main+163>
0x8048539 <main+153>: mov %esi,%eax
0x804853b <main+155>: nop
0x804853c <main+156>: lea 0x0(%esi,1),%esi
    long_temp = (*alignment + long_temp) % 47;
0x8048540 <main+160>: dec %eax
0x8048541 <main+161>: jne 0x8048540 <main+160>
0x8048543 <main+163>: add $0xfffffff4,%esp
46
Encoding an Instruction Set
80x86 Instruction Encoding

A morass of disjoint encodings!!

[Instruction format diagram. Field widths in bits:
  ADD:  4-bit opcode, 3-bit Reg, 1-bit W, 8-bit displacement
  SHL:  6-bit opcode, 2-bit V/W, 8-bit postbyte, 8-bit displacement
  TEST: 7-bit opcode, 1-bit W, 8-bit postbyte, 8-bit immediate]
47
Encoding an Instruction Set
80x86 Instruction Encoding

[Instruction format diagram, continued. Field widths in bits:
  JE:    4-bit opcode, 4-bit condition, 8-bit displacement
  CALLF: 8-bit opcode, 16-bit offset, 16-bit segment number
  MOV:   6-bit opcode, 2-bit D/W, 8-bit postbyte, 8-bit displacement
  PUSH:  5-bit opcode, 3-bit Reg]
48
The Role of Compilers

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Compiler goals:
• All correct programs execute correctly
• Most compiled programs execute fast (optimizations)
• Fast compilation
• Debugging support
49
The Role of Compilers
Steps in Compilation

Parsing → intermediate representation
Procedure in-lining
Jump optimization
Loop optimizations
Common sub-expression elimination
Constant propagation
Strength reduction
Register allocation
Pipeline scheduling
Code generation → assembly code
50
The Role of Compilers
Steps in Compilation

Optimization Name   Explanation                                     % of the total number of optimizing transformations
High Level          At or near the source level; machine-independent   Not Measured
Local               Within straight-line code                          40%
Global              Across a branch                                    42%
Machine Dependent   Depends on machine knowledge                       Not Measured
51
The Role of Compilers

What compiler writers want:
• regularity
• orthogonality
• composability

Compilers perform a giant case analysis:
• too many choices make it hard

Orthogonal instruction sets:
• operation, addressing mode, data type

Provide one solution or all possible solutions:
• 2 branch conditions (eq, lt)
• or all six (eq, ne, lt, gt, le, ge)
• not 3 or 4

There are advantages to having instructions that are primitives. Let the compiler put the instructions together to make more complex sequences.
52
The MIPS Architecture

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture

MIPS is very RISC oriented. MIPS will be used for many examples throughout the course.

MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a RISC microprocessor architecture developed by MIPS Technologies. We will look at the pipeline concept in our next lecture.

The acronym RISC (pronounced "risk"), for reduced instruction set computer, represents a CPU design strategy emphasizing the insight that simplified instructions which "do less" may still provide higher performance if this simplicity can be utilized to make instructions execute very fast. Well-known RISC families include DEC Alpha, ARC, ARM, AVR, MIPS, PA-RISC, Power Architecture (including PowerPC), and SPARC.

53
The MIPS Architecture

• 32-bit byte addresses, aligned
• Load/store, only displacement addressing
• Standard data types
• 3 fixed-length instruction formats
• 32 32-bit GPRs (r0 = 0)
• 16 64-bit (or 32 32-bit) FPRs
• FP status register
• No condition codes

There's MIPS-64 – the current architecture:
• Standard data types
• 4 fixed-length formats (8, 16, 32, 64)
• 32 64-bit GPRs (r0 = 0)
• 64 64-bit FPRs

MIPS Characteristics

Addressing Modes
• Immediate
• Displacement
• (Register mode used only for ALU)

Data transfer
• load/store word, load/store byte/half word (signed?)
• load/store FP single/double
• moves between GPRs and FPRs

ALU
• add/subtract (signed? immediate?)
• multiply/divide (signed?)
• and, or, xor (immediate?); shifts: ll, rl, ra (immediate?)
• sets (immediate?)
54
The MIPS Architecture
MIPS Characteristics

Control
• branches == 0, <> 0
• conditional branch testing FP bit
• jump, jump register
• jump & link, jump & link register
• trap, return-from-exception

Floating Point
• add/sub/mul/div
• single/double
• fp converts, fp set
55
The DLX Architecture

The DLX is a RISC processor architecture designed by the principal designers of the MIPS and the Berkeley RISC designs, the two benchmark examples of RISC design. The DLX is essentially a cleaned-up and simplified MIPS with a simple 32-bit load/store architecture. Intended primarily for teaching purposes, the DLX design is widely used in university-level computer architecture courses.

The next couple of lectures will use the MIPS and DLX architectures as examples to demonstrate concepts.
56
End of Lecture
57