Drawing inside the lines: Musings on architectural research


Relaxing Constraints: Thoughts on the Evolution of Computer Architecture
Joel Emer
Alpha Development Group
Compaq Computer Corporation
Better answers
Moore’s Law Alpha-style
[Figure: SPECint95 (log scale, 1 to 100) versus date of introduction for EV4-200, EV45-275, EV5-300, EV56-400, EV56-500, EV56-600, EV6-575, and EV67-730. The value 3.73 also appears on the chart.]
Iron Law of Performance
Performance = Frequency / (CPI * Instructions)

Frequency – largely circuit design/technology

CPI – largely organization

Instructions – largely architecture/compiler
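
As a minimal sketch of the iron law, the snippet below treats performance as the reciprocal of execution time for one program. The function names and the sample numbers (500 MHz, CPI of 1.25, one billion instructions) are illustrative assumptions, not figures from the talk.

```python
def execution_time(frequency_hz, cpi, instructions):
    """Seconds to run a program: instructions * cycles per instruction / frequency."""
    return instructions * cpi / frequency_hz


def performance(frequency_hz, cpi, instructions):
    """Programs per second, the reciprocal of execution time."""
    return frequency_hz / (cpi * instructions)


# Hypothetical machine and workload, chosen only to make the arithmetic easy.
base_time = execution_time(500e6, 1.25, 1e9)

# Halving CPI (an organization change) doubles performance at fixed frequency
# and instruction count.
speedup = performance(500e6, 0.625, 1e9) / performance(500e6, 1.25, 1e9)
print(base_time, speedup)
```

Each of the three terms can be varied independently here, which mirrors the split between circuit design, organization, and architecture/compiler.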
Outline

Review of technology factors

Retrospective on the quantitative method

Augmenting the quantitative method

Recommendation
Power Dissipation Trends
[Figure: two charts across the 21064, 21164, 21264, and 21364. "Power Dissipation" plots Power (W) and Voltage (V); "Supply Current" plots Current (A) and Voltage (V).]
•Power consumption is increasing
•Supply current is increasing faster!

Coping With Power Growth

Technology techniques
• Better cooling technology needed
• Accelerate Vdd scaling
• SOI
• Clock distribution

Architectural possibilities
• Use less power-hungry structures
• Reduce useless speculation

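
One way to see why Vdd scaling appears among the technology techniques is the standard first-order CMOS model, in which switching power is roughly activity * C * Vdd^2 * f. The sketch below is an illustration under that model only; the activity factor, capacitance, voltages, and frequency are made-up values, not data from the talk.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power in watts: a * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical chip: 50 nF switched capacitance, half of it toggling per cycle.
base = dynamic_power(0.5, 50e-9, 2.0, 600e6)

# Same design with the supply lowered from 2.0 V to 1.5 V.
scaled = dynamic_power(0.5, 50e-9, 1.5, 600e6)

# Power falls with the square of the voltage ratio.
ratio = scaled / base
print(base, scaled, ratio)
```

The model ignores leakage and short-circuit current, so it is only a rough guide to the benefit.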
Clock Distribution Trends
[Figure: pie chart, "21264 Power (Peak)". Slice labels: Global Clock Networks, Instruction Issue Units, Caches, Floating Execution Units, Integer Execution Units, Memory Management Unit, I/O, Miscellaneous Logic. Slice values: 32%, 18%, 15%, 10%, 10%, 8%, 5%, 2%.]
• Frequencies will continue to scale
• Clock edge rates are not scaling

Coping With Clock Distribution

Technology solution
• Low swing differential clocks
• Adiabatic clocking

Architectural possibilities
• Multiple clock zones
• Asynchronous design

Communication Delay
[Figure: microprocessor chip, not drawn to scale, annotated with communication delay per generation.]
21064 ~ 1 cycle
21164 ~ 1.5 cycles
21264 ~ 3 cycles
21464 ~ 6 cycles

Coping With Communication Delay

Technology solutions
• Low K dielectrics
• Thinner (Cu) interconnect

Architectural possibilities
• Deeper pipelining
• Replication/clustering of structures
• More autonomous computation

SIA Roadmap
                                                 1997      1999      2002      2005      2008      2012
Technology Node (nm)                              250       180       130       100        70        50
Memory (bit/chip)                                256M        1G        4G       16G       64G      256G
Transistors/chip (MPU)                            11M       21M       76M      200M      520M      1.4G
Chip Frequency (MHz)                              750      1250      2100      3500      6000    10,000
Wiring Levels (max)                                 6    6 to 7         7    7 to 8    8 to 9         9
Power Supply Voltage, Vdd (V)                 1.8-2.5   1.5-1.8   1.2-1.5   0.9-1.2   0.6-0.9   0.5-0.6
Power - High Performance (W), w/Heat sink          70        90       130       160       170       175
Power - Hand-held (W)                             1.2       1.4         2       2.4       2.8       3.2
*The 2012 is directly from the SIA 1997 National Technology Roadmap
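
The roadmap rows for power and Vdd can be combined to show why supply current grows faster than power. The sketch below assumes, as a simplification, that all high-performance power is drawn at Vdd, so current is power divided by voltage. The dictionary and function names are illustrative.

```python
# year: (high-performance power in W, Vdd low in V, Vdd high in V), from the roadmap table
ROADMAP = {
    1997: (70, 1.8, 2.5),
    1999: (90, 1.5, 1.8),
    2002: (130, 1.2, 1.5),
    2005: (160, 0.9, 1.2),
    2008: (170, 0.6, 0.9),
    2012: (175, 0.5, 0.6),
}


def current_range(power_w, vdd_low, vdd_high):
    """Implied supply current in amps (min, max), assuming I = P / Vdd."""
    return power_w / vdd_high, power_w / vdd_low


for year, row in sorted(ROADMAP.items()):
    lo, hi = current_range(*row)
    print(f"{year}: {lo:.0f} to {hi:.0f} A")

power_growth = ROADMAP[2012][0] / ROADMAP[1997][0]
current_growth = current_range(*ROADMAP[2012])[1] / current_range(*ROADMAP[1997])[1]
```

Under this assumption, power grows 2.5 times from 1997 to 2012 while the implied current grows roughly ninefold, because Vdd falls over the same period.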
Outline

Review of technology factors

Retrospective on the quantitative method

Augmenting the quantitative method

Recommendation
Disclaimer
The names used and events depicted in this talk are
meant to be real. The events are, however, not an
exhaustive enumeration of significant milestones.
The misrepresentations of fact and omission of
contributors are unintentional and solely the
responsibility of the presenter. Finally, the
interpretations are just that and are mine as well.
Early quantitative method - 1981
uPC Histogram Chart – 1981-5
TABLE 8
Average VAX Instruction Timing (Cycles per Instruction)

              Compute     Read  R-Stall    Write  W-Stall IB-Stall    Total
Decode          1.000        -        -        -        -    0.613    1.613
Spec1           0.895    0.306    0.364        -        -        -    1.565
Spec2-6         1.052    0.148    0.116    0.161    0.192    0.102    1.771
B-Disp          0.221        -        -        -        -    0.005    0.226
Simple          0.870    0.029    0.017    0.033    0.027        -    0.977
Field           0.482    0.049    0.058    0.007    0.002        -    0.600
Float           0.292    0.000    0.000    0.008    0.001        -    0.302
Call/Ret        0.937    0.133    0.074    0.130    0.184        -    1.458
System          0.434    0.015    0.031    0.014    0.028        -    0.522
Character       0.318    0.039    0.099    0.046    0.004        -    0.506
Decimal         0.026    0.002    0.000    0.001    0.002        -    0.031
Int/Except      0.055    0.002    0.005    0.004    0.006        -    0.071
Mem Mngmt       0.555    0.061    0.200    0.004    0.003        -    0.824
Abort           0.127        -        -        -        -        -    0.127
TOTAL           7.267    0.783    0.964    0.409    0.450    0.720   10.593

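
The breakdown in Table 8 can be cross-checked arithmetically. In the sketch below, the placement of each cell in its column is inferred from the listed row and column totals, so it should be read as a reconstruction rather than the authoritative table. `None` marks a cell with no value, and all names are illustrative.

```python
COLUMNS = ("compute", "read", "r_stall", "write", "w_stall", "ib_stall")

# (row name, cells in COLUMNS order, total listed on the slide)
ROWS = [
    ("Decode",     (1.000, None,  None,  None,  None,  0.613), 1.613),
    ("Spec1",      (0.895, 0.306, 0.364, None,  None,  None),  1.565),
    ("Spec2-6",    (1.052, 0.148, 0.116, 0.161, 0.192, 0.102), 1.771),
    ("B-Disp",     (0.221, None,  None,  None,  None,  0.005), 0.226),
    ("Simple",     (0.870, 0.029, 0.017, 0.033, 0.027, None),  0.977),
    ("Field",      (0.482, 0.049, 0.058, 0.007, 0.002, None),  0.600),
    ("Float",      (0.292, 0.000, 0.000, 0.008, 0.001, None),  0.302),
    ("Call/Ret",   (0.937, 0.133, 0.074, 0.130, 0.184, None),  1.458),
    ("System",     (0.434, 0.015, 0.031, 0.014, 0.028, None),  0.522),
    ("Character",  (0.318, 0.039, 0.099, 0.046, 0.004, None),  0.506),
    ("Decimal",    (0.026, 0.002, 0.000, 0.001, 0.002, None),  0.031),
    ("Int/Except", (0.055, 0.002, 0.005, 0.004, 0.006, None),  0.071),
    ("Mem Mngmt",  (0.555, 0.061, 0.200, 0.004, 0.003, None),  0.824),
    ("Abort",      (0.127, None,  None,  None,  None,  None),  0.127),
]


def row_sum(cells):
    """Sum the cycles in one row, skipping empty cells."""
    return sum(c for c in cells if c is not None)


def column_sums():
    """Sum each column over all rows."""
    sums = [0.0] * len(COLUMNS)
    for _name, cells, _total in ROWS:
        for i, c in enumerate(cells):
            if c is not None:
                sums[i] += c
    return dict(zip(COLUMNS, sums))


def stall_share():
    """Fraction of all cycles spent in read, write, and instruction-buffer stalls."""
    sums = column_sums()
    stalls = sums["r_stall"] + sums["w_stall"] + sums["ib_stall"]
    return stalls / sum(sums.values())


# Largest gap between a recomputed row sum and the listed total (rounding only).
worst_row_error = max(abs(row_sum(cells) - total) for _n, cells, total in ROWS)
print(column_sums(), stall_share(), worst_row_error)
```

The row sums agree with the listed totals to within rounding, and stalls account for about a fifth of the total cycles per instruction.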
Paper counts
                  ISCA 1   ISCA 24
No model              22         1
Analytic Model         5         ½
Simulation             1       21½
Measurement            0         7

Scientific Method
• Make hypothesis about behavior
• Design experiment
• Run experiment and quantify
• Interpret results
• New hypothesis

Scientific Method
• Make hypothesis about behavior
• Pick baseline design and workload
• Run experiment and quantify
• Interpret results
• New hypothesis

Scientific Method
• Make hypothesis about behavior
• Pick baseline design and workload
• Run simulation model or measure hardware
• Interpret results
• New hypothesis

Scientific Method
• Make hypothesis about behavior
• Pick baseline design and workload
• Run simulation model or measure hardware
• Interpret results
• Propose new design

Making and Testing Hypotheses

Cache experiment (Schlansker)
• 64K word cache
• 32-way set associative cache/LRU replacement
• 200x200 matrix subblock of an N x N matrix
• Read twice

Sizes
N=2727: 0 misses
N=2729: 24160 misses
N=2731: 36382 misses
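
An experiment of this kind can be sketched as a small cache simulation. The version below is a rough model and not the original experiment: it assumes one-word cache lines, row-major storage starting at address 0, modulo set indexing, and it counts only misses on the second read. The slide does not state these details, so the counts it produces are not guaranteed to match the ones listed above.

```python
from collections import OrderedDict


def second_pass_misses(n, block=200, cache_words=64 * 1024, ways=32):
    """Read a block x block corner of an n x n matrix twice; count second-read misses."""
    num_sets = cache_words // ways  # 2048 sets under the one-word-line assumption
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for sweep in range(2):
        for row in range(block):
            for col in range(block):
                addr = row * n + col          # row-major word address (assumed)
                lines = sets[addr % num_sets]
                if addr in lines:
                    lines.move_to_end(addr)   # hit: mark as most recently used
                    continue
                if sweep == 1:
                    misses += 1
                if len(lines) >= ways:
                    lines.popitem(last=False)  # evict the least recently used line
                lines[addr] = True
    return misses


# A contiguous block fits easily; a stride equal to the number of sets thrashes.
print(second_pass_misses(200), second_pass_misses(2048))
```

Running the function for nearby values of N shows how sensitive the miss count is to the stride between matrix rows, which is the point of the experiment.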

Propose new design

Skewed associative (Seznec)
[Figure: cache organizations compared: direct mapped, 4-way associative, 4-way skewed.]
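
The idea behind skewed associativity is that each way indexes the cache with a different function, so lines that collide in one way are spread out in the others. The sketch below uses a simple XOR-based hash chosen for illustration; it is a hypothetical stand-in and not Seznec's published skewing functions.

```python
SET_BITS = 11                      # 2048 sets per way (illustrative)
MASK = (1 << SET_BITS) - 1


def set_index(line_addr, way):
    """Set chosen for a line address in the given way (hypothetical hash)."""
    low = line_addr & MASK
    high = (line_addr >> SET_BITS) & MASK
    if way == 0:
        return low                 # way 0 uses conventional modulo indexing
    return low ^ ((high * way) & MASK)


# Four addresses that all collide in a conventional cache (same low bits).
addrs = [k << SET_BITS for k in range(4)]
way0 = [set_index(a, 0) for a in addrs]
way1 = [set_index(a, 1) for a in addrs]
print(way0, way1)
```

With this hash, the four addresses share one set in way 0 but land in four different sets in way 1, so a stride that thrashes a conventional set-associative cache need not thrash the skewed one.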

Quantitative Approach Problems

Too much abstraction
• Intra-chip latencies
• Memory subsystem

Poor workloads

Too incremental…

Quantitative -> Incremental
[Figure: bar chart, vertical axis from 0 to 4, bars labeled a through l.]

Outline

Review of technology factors

Retrospective on the quantitative method

Augmenting the quantitative method

Recommendation
Relaxing Constraints

Select a constraint to relax

Generate design

Employ quantitative method

Evaluate results
Important Steps…

Before
• Carefully pick a constraint to relax

After
• Find contributions without constraint
• Preserve results after reinstating the constraint

Extrapolate From Current Trends

Personal Workstation – Xerox PARC – late 70’s

VAX 11/780        Dorado
5 MHz             15 MHz
512 Kilobytes     8 Megabytes
40+ Users         1 User

Results
• Accelerate innovation

Throw Out Standards

Distributed file system - 1985
Use a Simpler Starting Point

RISC out-of-order (Johnson, Torng)
[Figure: out-of-order pipeline. Stages: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire. Structures: PC, Icache, Register Map, Regs, Dcache, Regs.]

CISC-based O-O-O
• K6 (Johnson)
• Pentium Pro (Colwell, Papworth…)
[Figure: PC and Icache feed a "Convert CISC to RISC" stage, which feeds a RISC O-O-O core.]

Abandon conventions

VLIW (Fisher)
• Relieve hardware of all dependency responsibility
• Give that responsibility to compiler

Expected consequences
• Much simpler implementation
• Faster cycle time

Sometimes not what you expect

Compiler scheduling for hardware is a great idea
• For 21064 - narrow in-order
• For 21164 - wider in-order
• For 21264 - wider out-of-order

Issue Logic Critical Loop
[Figure: Instruction Slot (stage S2) feeds Instruction Issue (stage S3), where an Issue Conflict Checker steers instructions to the floating point multiply pipeline, the floating point add pipeline, integer pipeline 0, and integer pipeline 1.]

Make a Radical Departure

Multiscalar research (Sohi, Smith…)
New Mechanism Required

Dependence prediction (Moshovos)
[Figure: a sequence of loads and stores shown in program order and in execution order, with one reordered load marked "Trap!".]
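
A very simplified form of memory dependence prediction can be sketched as a table of load PCs that have trapped before. This is an illustrative toy and not the mechanism from the cited work: the class, the function, and the policy (a load that once trapped always waits for earlier stores) are assumptions made for the example.

```python
class StoreWaitPredictor:
    """Remembers loads that were caught running ahead of a conflicting store."""

    def __init__(self):
        self.wait_pcs = set()

    def may_bypass_stores(self, load_pc):
        return load_pc not in self.wait_pcs

    def record_trap(self, load_pc):
        self.wait_pcs.add(load_pc)


def issue_load(predictor, load_pc, load_addr, pending_store_addrs):
    """Return 'wait', 'trap', or 'ok' for a load with earlier stores still pending."""
    if not predictor.may_bypass_stores(load_pc):
        return "wait"                      # predicted dependent: hold the load back
    if load_addr in pending_store_addrs:
        predictor.record_trap(load_pc)     # ordering violation: replay and learn
        return "trap"
    return "ok"                            # speculation succeeded


predictor = StoreWaitPredictor()
first = issue_load(predictor, load_pc=0x40, load_addr=0x1000, pending_store_addrs={0x1000})
second = issue_load(predictor, load_pc=0x40, load_addr=0x1000, pending_store_addrs={0x1000})
other = issue_load(predictor, load_pc=0x44, load_addr=0x2000, pending_store_addrs={0x1000})
print(first, second, other)
```

The first execution of the load traps, and later executions of the same load wait instead of speculating, while unrelated loads still run ahead of pending stores.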

What Was Really Important

Full hardware management (Sohi)
• Sequencing
• Register dependencies
• Memory dependencies

Refinement (Mowry and Olukotun)
• Compiler managed – registers, sequencing
• Hardware managed – memory dependence only

Ignoring Implementation Realities

SMT - in-order (Tullsen, Eggers, Levy)
[Figure: in-order pipeline. Stages: Fetch, Issue, Reg Read, Execute, Dcache/Store Buffer, Reg Write. Structures: PC, Icache, Regs, Dcache, Regs.]

Solution Already Available

SMT out-of-order
[Figure: out-of-order pipeline. Stages: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire. Structures: PC, Icache, Register Map, Regs, Dcache, Regs.]

Outline

Review of technology factors

Retrospective on the quantitative method

Augmenting the quantitative method

Recommendation
Pay Attention to Reality

Look at technology trends
• Power
• Latency

Use more realistic models
• More organizational details
• Better workloads

Ignore Reality

Look for revolutionary contributions
• Decide on a constraint to relax
• Apply the scientific method

Revolutionary contributions may arise because
– Constraint will be relaxed in time
– Constraint wasn’t fundamental
– New avenues of exploration will be opened

Acknowledgments
• Bill Bowhill
• Paul Gronowski
• Bill Herrick
• Toni Juan
• Geoff Lowney
• Ellen Piccioli
• Andre Seznec