Drawing inside the lines: Musings on architectural research
Relaxing Constraints: Thoughts on the Evolution of Computer Architecture
Joel Emer
Alpha Development Group
Compaq Computer Corporation
Better answers
Moore’s Law Alpha-style
[Figure: SPECint95 (log scale, 1 to 100) vs. date of introduction for the Alpha implementations EV4-200, EV45-275, EV5-300, EV56-400, EV56-500, EV56-600, EV6-575, and EV67-730]
Iron Law of Performance
$$\text{Performance} = \frac{\text{Frequency}}{\text{Instructions} \times \text{CPI}}$$
Frequency – largely circuit design/technology
CPI – largely organization
Instructions – largely architecture/compiler
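Because the three terms are owned by different disciplines, a gain in one can be cancelled by a loss in another. The sketch below applies the Iron Law to two hypothetical designs running the same program; all numbers (instruction count, clock rates, CPIs) are invented for illustration and do not describe any Alpha chip.

```python
def execution_time(instructions: float, cpi: float, frequency_hz: float) -> float:
    """Iron Law: time = instructions x (cycles / instruction) / (cycles / second)."""
    return instructions * cpi / frequency_hz


def performance(instructions: float, cpi: float, frequency_hz: float) -> float:
    """Performance is the reciprocal of execution time (programs per second)."""
    return frequency_hz / (instructions * cpi)


if __name__ == "__main__":
    n = 1e9  # hypothetical dynamic instruction count of one program
    # Hypothetical design A: 500 MHz, CPI 1.2.
    # Hypothetical design B: 600 MHz (faster circuits) but CPI 1.5 (worse organization).
    t_a = execution_time(n, 1.2, 500e6)
    t_b = execution_time(n, 1.5, 600e6)
    print(f"A: {t_a:.2f} s, B: {t_b:.2f} s, A is {t_b / t_a:.3f}x faster")
```

Here design B's 20% higher frequency is outweighed by its 25% higher CPI, so the lower-clocked design finishes first.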
Outline
Review of technology factors
Retrospective on the quantitative method
Augmenting the quantitative method
Recommendation
Power Dissipation Trends
[Figure, two panels: Power Dissipation (Power (W) and Voltage (V)) and Supply Current (Current (A) and Voltage (V)), each plotted for the 21064, 21164, 21264, and 21364]
•Power consumption is increasing
•Supply current is increasing faster!
Coping With Power Growth
Technology techniques
Better cooling technology needed
Accelerate Vdd scaling
SOI
Clock distribution
Architectural possibilities
Use less power-hungry structures
Reduce useless speculation
Clock Distribution Trends
21264 Power (Peak)
Global Clock Networks: 32%
Instruction Issue Units: 18%
Caches: 15%
Floating Execution Units: 10%
Integer Execution Units: 10%
Memory Management Unit: 8%
I/O: 5%
Miscellaneous Logic: 2%
Frequencies will continue to scale
Clock edge rates are not scaling
Coping With Clock Distribution
Technology solution
Low swing differential clocks
Adiabatic clocking
Architectural possibilities
Multiple clock zones
Asynchronous design
Communication Delay
Microprocessor Chip
21064 ~ 1 cycle
21164 ~ 1.5 cycles
21264 ~ 3 cycles
21464 ~ 6 cycles
Not drawn to scale
Coping With Communication Delay
Technology solutions
Low K dielectrics
Thinner (Cu) interconnect
Architectural possibilities
Deeper pipelining
Replication/clustering of structures
More autonomous computation
SIA Roadmap
Year                                      1997     1999     2002     2005     2008     2012*
Technology node (nm)                      250      180      130      100      70       50
Memory (bits/chip)                        256M     1G       4G       16G      64G      256G
Transistors/chip (MPU)                    11M      21M      76M      200M     520M     1.4G
Chip frequency (MHz)                      750      1250     2100     3500     6000     10,000
Wiring levels (max)                       6        6-7      7        7-8      8-9      9
Power supply voltage, Vdd (V)             1.8-2.5  1.5-1.8  1.2-1.5  0.9-1.2  0.6-0.9  0.5-0.6
Power, high performance w/ heat sink (W)  70       90       130      160      170      175
Power, hand-held (W)                      1.2      1.4      2        2.4      2.8      3.2
*The 2012 values are taken directly from the SIA 1997 National Technology Roadmap.
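The earlier observation that supply current grows faster than power follows from I = P / Vdd: power rises while Vdd falls, so current rises on both counts. A small sketch applying that relation to the high-performance power and Vdd rows of the roadmap; it treats power as a steady average draw, which is a simplifying assumption.

```python
# (year, Vdd range in volts, high-performance power in watts), from the SIA roadmap rows.
ROADMAP = [
    (1997, (1.8, 2.5), 70),
    (1999, (1.5, 1.8), 90),
    (2002, (1.2, 1.5), 130),
    (2005, (0.9, 1.2), 160),
    (2008, (0.6, 0.9), 170),
    (2012, (0.5, 0.6), 175),
]


def supply_current(power_w: float, vdd_v: float) -> float:
    """Average supply current in amperes: I = P / V."""
    return power_w / vdd_v


def current_range(power_w: float, vdd_range: tuple) -> tuple:
    """Return (min, max) current: the highest Vdd gives the lowest current."""
    lo_v, hi_v = vdd_range
    return supply_current(power_w, hi_v), supply_current(power_w, lo_v)


if __name__ == "__main__":
    for year, vdd, power in ROADMAP:
        i_min, i_max = current_range(power, vdd)
        print(f"{year}: {power:>3} W -> {i_min:6.1f} to {i_max:6.1f} A")
    first, last = ROADMAP[0], ROADMAP[-1]
    power_growth = last[2] / first[2]
    current_growth = current_range(last[2], last[1])[1] / current_range(first[2], first[1])[1]
    print(f"power grows {power_growth:.1f}x, worst-case current grows {current_growth:.1f}x")
```

Between 1997 and 2012, power grows 2.5x while the worst-case current grows about 9x, from under 40 A to 350 A.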
Outline
Review of technology factors
Retrospective on the quantitative method
Augmenting the quantitative method
Recommendation
Disclaimer
The names used and events depicted in this talk are
meant to be real. The events are, however, not an
exhaustive enumeration of significant milestones.
The misrepresentations of fact and omission of
contributors are unintentional and solely the
responsibility of the presenter. Finally, the
interpretations are just that and are mine as well.
Early quantitative method - 1981
µPC Histogram Chart – 1981-5
TABLE 8
Average VAX Instruction Timing (Cycles per Instruction)
Compute
Decode       1.000
Spec1        0.895
Spec2-6      1.052
B-Disp       0.221
Simple       0.870
Field        0.482
Float        0.292
Call/Ret     0.937
System       0.434
Character    0.318
Decimal      0.026
Int/Except   0.055
Mem Mngmt    0.555
Abort        0.127
TOTAL        7.267
Read
R-Stall
Write
W-Stall
IB-Stall
0.613
0.306
0.148
0.364
0.116
0.161
0.192
0.102
0.005
0.029
0.049
0.000
0.133
0.015
0.039
0.002
0.002
0.061
0.017
0.058
0.000
0.074
0.031
0.099
0.000
0.005
0.200
0.033
0.007
0.008
0.130
0.014
0.046
0.001
0.004
0.004
0.027
0.002
0.001
0.184
0.028
0.004
0.002
0.006
0.003
0.783
0.964
0.409
0.450
0.720
Total
1.613
1.565
1.771
0.226
0.977
0.600
0.302
1.458
0.522
0.506
0.031
0.071
0.824
0.127
10.593
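Summing the Total column and ranking its rows turns the table into a CPI stack, which shows where the cycles of an average instruction go. This sketch assumes the Total values follow the same row order as the Compute column (they sum to the TOTAL row's 10.593). The 5 MHz clock is the VAX 11/780 figure quoted in the workstation comparison later in the talk.

```python
# Total column of Table 8 (cycles per average VAX instruction), assuming the
# Total values are listed in the same row order as the Compute column.
TOTAL_CPI = {
    "Decode": 1.613, "Spec1": 1.565, "Spec2-6": 1.771, "B-Disp": 0.226,
    "Simple": 0.977, "Field": 0.600, "Float": 0.302, "Call/Ret": 1.458,
    "System": 0.522, "Character": 0.506, "Decimal": 0.031,
    "Int/Except": 0.071, "Mem Mngmt": 0.824, "Abort": 0.127,
}


def cpi_stack(components: dict) -> tuple:
    """Return (total CPI, [(name, cycles, share of total)]) sorted by contribution."""
    total = sum(components.values())
    ranked = sorted(components.items(), key=lambda kv: kv[1], reverse=True)
    return total, [(name, cycles, cycles / total) for name, cycles in ranked]


def mips(cpi: float, clock_mhz: float) -> float:
    """Native MIPS = clock rate in MHz divided by cycles per instruction."""
    return clock_mhz / cpi


if __name__ == "__main__":
    total, stack = cpi_stack(TOTAL_CPI)
    print(f"total CPI = {total:.3f}")
    for name, cycles, share in stack[:4]:
        print(f"{name:<10} {cycles:.3f} cycles ({share:.0%})")
    print(f"about {mips(total, 5.0):.2f} native MIPS at 5 MHz")
```

Operand specifier processing (Spec1 and Spec2-6) together with Decode accounts for roughly half of the average instruction's cycles.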
Paper counts      ISCA 1   ISCA 24
No model          22       1
Analytic Model    5        ½
Simulation        1        21½
Measurement       0        7
Scientific Method
Make hypothesis about behavior
Design experiment
Run experiment and quantify
Interpret results
New hypothesis
Scientific Method
Make hypothesis about behavior
Pick baseline design and workload
Run experiment and quantify
Interpret results
New hypothesis
Scientific Method
Make hypothesis about behavior
Pick baseline design and workload
Run simulation model or measure hardware
Interpret results
New hypothesis
Scientific Method
Make hypothesis about behavior
Pick baseline design and workload
Run simulation model or measure hardware
Interpret results
Propose new design
Making and Testing Hypotheses
Cache experiment (Schlansker)
64K word cache
32-way set associative cache/LRU replacement
200x200 matrix subblock of an N x N matrix
Read twice
Sizes
N=2727: 0 misses
N=2729: 24160 misses
N=2731: 36382 misses
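A minimal simulation sketch of the cache experiment, under assumptions the slide does not state: one-word cache lines, a row-major matrix starting at address 0, the sub-block at the top-left corner, and misses counted only on the second read. The names `LRUCache` and `second_pass_misses` are hypothetical. With 64K words and 32 ways there are 2048 sets, so N mod 2048 (679, 681, and 683 for the three sizes) decides how the 200 rows pile up in the sets.

```python
from collections import OrderedDict

CACHE_WORDS = 64 * 1024  # 64K-word cache (from the slide)
WAYS = 32                # 32-way set associative, LRU (from the slide)
LINE_WORDS = 1           # assumption: one word per cache line
NUM_SETS = CACHE_WORDS // (WAYS * LINE_WORDS)  # 2048 sets


class LRUCache:
    """Set-associative cache with true LRU replacement within each set."""

    def __init__(self, num_sets: int = NUM_SETS, ways: int = WAYS):
        self.num_sets = num_sets
        self.ways = ways
        # Each set maps line address -> None, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, word_addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        line = word_addr // LINE_WORDS
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)      # mark as most recently used
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)    # evict the least recently used line
        s[line] = None
        return False


def second_pass_misses(n: int, block: int = 200) -> int:
    """Read a block x block sub-block of a row-major n x n matrix twice and
    return the number of misses during the second read."""
    cache = LRUCache()
    misses = 0
    for pass_no in range(2):
        for i in range(block):
            for j in range(block):
                hit = cache.access(i * n + j)  # assumption: matrix base address 0
                if pass_no == 1 and not hit:
                    misses += 1
    return misses


if __name__ == "__main__":
    for n in (2727, 2729, 2731):
        print(n, second_pass_misses(n))
```

Under these assumptions the model reports zero second-pass misses for N=2727 and a sharp jump for N=2729 and N=2731, the same pattern as the slide: any set that receives more than 32 of the block's lines misses on every access of the second read.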
Propose new design
Skewed associative (Seznec)
Direct mapped
4-way associative
4-way skewed
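The idea behind a skewed-associative cache is that each way is indexed with a different function of the address, so lines that collide in one way usually land in different sets of the other ways. The toy sketch below contrasts a conventional 4-way cache with a 4-way skewed one. The skewing function (XOR of the index bits with the tag scaled by an odd, way-specific constant) and the replacement rule (evict the least recently used of the candidate slots) are illustrative assumptions, not Seznec's published functions.

```python
SETS = 64        # sets per way (assumed; must be a power of two)
INDEX_BITS = 6   # log2(SETS)
WAYS = 4
MASK = SETS - 1


def conventional_index(line: int, way: int) -> int:
    """Every way uses the same index: the low-order line-address bits."""
    return line & MASK


def skewed_index(line: int, way: int) -> int:
    """Hypothetical skewing function: XOR the low bits with the tag bits scaled by
    an odd, way-specific constant. Odd multipliers are bijections mod 2**k, so
    lines that share low bits are spread differently in each way."""
    low = line & MASK
    high = line >> INDEX_BITS
    return (low ^ (high * (2 * way + 1))) & MASK


class Cache:
    def __init__(self, index_fn):
        self.index_fn = index_fn
        # slots[way][index] holds (line, last_use_time) or None.
        self.slots = [[None] * SETS for _ in range(WAYS)]
        self.time = 0

    def access(self, line: int) -> bool:
        self.time += 1
        candidates = [(w, self.index_fn(line, w)) for w in range(WAYS)]
        for w, i in candidates:
            entry = self.slots[w][i]
            if entry is not None and entry[0] == line:
                self.slots[w][i] = (line, self.time)  # hit: refresh recency
                return True
        # Miss: use an empty candidate slot, else evict the least recently used candidate.
        empty = [(w, i) for w, i in candidates if self.slots[w][i] is None]
        if empty:
            w, i = empty[0]
        else:
            w, i = min(candidates, key=lambda c: self.slots[c[0]][c[1]][1])
        self.slots[w][i] = (line, self.time)
        return False


def misses(index_fn, trace) -> int:
    cache = Cache(index_fn)
    return sum(not cache.access(line) for line in trace)


if __name__ == "__main__":
    # Five lines that share their low-order bits, accessed cyclically ten times.
    trace = [k * SETS for k in range(5)] * 10
    print("4-way associative:", misses(conventional_index, trace))
    print("4-way skewed:     ", misses(skewed_index, trace))
```

In the conventional cache all five lines compete for one 4-way set, so LRU misses on every access; with skewing they spread across different sets and only the five cold misses remain.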
Quantitative Approach Problems
Too much abstraction
Intra-chip latencies
Memory subsystem
Poor workloads
Too incremental…
Quantitative -> Incremental
[Figure: bar chart, y-axis from 0 to 4, bars labeled a through l]
Outline
Review of technology factors
Retrospective on the quantitative method
Augmenting the quantitative method
Recommendation
Relaxing Constraints
Select a constraint to relax
Generate design
Employ quantitative method
Evaluate results
Important Steps…
Before
Carefully pick a constraint to relax
After
Find contributions without the constraint
Preserve results after reinstating the constraint
Extrapolate From Current Trends
Personal Workstation – Xerox PARC – late 70’s
          VAX 11/780      Dorado
Clock     5 MHz           15 MHz
Memory    512 Kilobytes   8 Megabytes
Users     40+             1
Results
Accelerate innovation
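One rough way to read the comparison is per user: the Dorado gave one person what the VAX shared among 40 or more. A back-of-the-envelope sketch using only the figures on the slide; treating "40+" as exactly 40 is an assumption, so the ratios are lower bounds, and dividing clock rate among time-sharing users is a simplification.

```python
# Figures from the slide; "40+ users" is taken as exactly 40 (so ratios are lower bounds).
VAX_11_780 = {"clock_mhz": 5, "memory_kb": 512, "users": 40}
DORADO = {"clock_mhz": 15, "memory_kb": 8 * 1024, "users": 1}


def per_user(machine: dict, resource: str) -> float:
    """Share of a resource available to each user of a machine."""
    return machine[resource] / machine["users"]


def advantage(resource: str) -> float:
    """How many times more of a resource one Dorado user had than one VAX user."""
    return per_user(DORADO, resource) / per_user(VAX_11_780, resource)


if __name__ == "__main__":
    print(f"clock per user:  {advantage('clock_mhz'):.0f}x")
    print(f"memory per user: {advantage('memory_kb'):.0f}x")
```

On these assumptions each Dorado user had at least 120x the processor clock and 640x the memory of a VAX user.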
Throw Out Standards
Distributed file system - 1985
Use a Simpler Starting Point
RISC out-of-order (Johnson, Torng)
[Figure: pipeline with stages Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire; structures PC, Icache, Register Map, Regs, Dcache]
CISC-based O-O-O
K6 (Johnson)
Pentium Pro (Colwell, Papworth…)
[Figure: PC and Icache feed a Convert CISC to RISC stage, which feeds a RISC O-O-O core]
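The "Convert CISC to RISC" stage lets a RISC out-of-order core execute CISC instructions by splitting each one into simple micro-ops. A toy sketch of that cracking step for a hypothetical two-operand format, where an operand written as `[rX]` is a memory operand addressed by register rX; the format, the temporaries `t0`/`t1`, and the micro-op syntax are all assumptions, and real x86 decoders are far more involved.

```python
from typing import List, Tuple

# Hypothetical CISC instruction: (opcode, destination, source).
Instr = Tuple[str, str, str]


def is_mem(operand: str) -> bool:
    """An operand like "[r1]" refers to memory at the address held in r1."""
    return operand.startswith("[") and operand.endswith("]")


def crack(instr: Instr, temp: str = "t0") -> List[str]:
    """Split one CISC-style instruction into RISC-like load / ALU / store micro-ops."""
    op, dst, src = instr
    uops = []
    if is_mem(src):
        # Memory source: load it into a temporary register first.
        uops.append(f"load {temp}, {src}")
        src = temp
    if is_mem(dst):
        # Read-modify-write memory destination: load, operate, store back.
        uops.append(f"load t1, {dst}")
        uops.append(f"{op} t1, t1, {src}")
        uops.append(f"store t1, {dst}")
    else:
        uops.append(f"{op} {dst}, {dst}, {src}")
    return uops


if __name__ == "__main__":
    for instr in [("add", "r3", "r4"), ("sub", "r3", "[r5]"), ("add", "[r1]", "r2")]:
        print(instr, "->", crack(instr))
```

A register-to-register instruction passes through as one micro-op, while a memory-destination instruction becomes a load, an ALU operation, and a store that the out-of-order core can schedule independently.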
Abandon conventions
VLIW (Fisher)
Relieve hardware of all dependency responsibility
Give that responsibility to compiler
Expected consequences
Much simpler implementation
Faster cycle time
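Moving dependency checking to the compiler means the compiler must pack independent operations into wide instruction words itself. A minimal sketch of greedy list scheduling into VLIW bundles, assuming unit latency for every operation and that each register is written at most once (so only true dependences need checking); the operation names and registers are hypothetical.

```python
from typing import List, Tuple

# Each op: (name, destination register, source registers).
Op = Tuple[str, str, List[str]]


def schedule(ops: List[Op], width: int) -> List[List[str]]:
    """Place each op, in program order, into the earliest bundle that follows all of
    its producers and still has a free slot. Assumes unit latency and that every
    register is written at most once, so only true dependences matter."""
    ready_at = {}  # register -> first bundle index in which its value is available
    bundles: List[List[str]] = []
    for name, dest, srcs in ops:
        cycle = max((ready_at.get(s, 0) for s in srcs), default=0)
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1  # bundle full: try the next one
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(name)
        ready_at[dest] = cycle + 1  # result usable in the following bundle
    return bundles


if __name__ == "__main__":
    ops = [
        ("a", "r1", []),            # r1 = load x
        ("b", "r2", []),            # r2 = load y
        ("c", "r3", ["r1", "r2"]),  # r3 = r1 + r2
        ("d", "r4", []),            # r4 = load z
        ("e", "r5", ["r3", "r4"]),  # r5 = r3 * r4
    ]
    print(schedule(ops, width=2))
```

The independent op `d` is hoisted into the same bundle as `c`, which is exactly the reordering an out-of-order core would otherwise discover in hardware.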
Sometimes not what you expect
Compiler scheduling for hardware is a great idea
For 21064 - narrow in-order
For 21164 - wider in-order
For 21264 – wider out-of-order
Issue Logic Critical Loop
[Figure: Instruction Slot (S2) and Instruction Issue (S3) stages with an Issue Conflict Checker; outputs to the floating point multiply pipeline, floating point add pipeline, integer pipeline 0, and integer pipeline 1]
Make a Radical Departure
Multiscalar research (Sohi, Smith…)
New Mechanism Required
Dependence prediction (Moshovos)
[Figure: loads and stores in program order vs. execution order; a load executed ahead of an earlier store triggers a Trap!]
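Dependence prediction lets most loads issue ahead of earlier stores while holding back the few that have trapped before. A toy sketch of the idea that remembers (store PC, load PC) pairs which previously caused a trap; it is not the published hardware design, and the PCs, addresses, and class name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DependencePredictor:
    """Remembers (store PC, load PC) pairs that previously caused a mis-speculation trap."""
    pairs: set = field(default_factory=set)

    def must_wait(self, load_pc: int, pending_store_pcs) -> bool:
        # Predict a dependence if any older, not-yet-executed store previously
        # conflicted with this load.
        return any((s, load_pc) in self.pairs for s in pending_store_pcs)

    def record_violation(self, store_pc: int, load_pc: int) -> None:
        self.pairs.add((store_pc, load_pc))


def simulate(iterations: int):
    """Each iteration: a store at PC 0x10 writes 0x1000; a load at PC 0x20 reads 0x1000
    (dependent) and a load at PC 0x30 reads 0x2000 (independent). Both loads are ready
    before the store executes. Returns (traps, waits, early_issues)."""
    pred = DependencePredictor()
    traps = waits = early = 0
    for _ in range(iterations):
        store_pc, store_addr = 0x10, 0x1000
        for load_pc, load_addr in ((0x20, 0x1000), (0x30, 0x2000)):
            if pred.must_wait(load_pc, [store_pc]):
                waits += 1                        # load held until the store executes
            elif load_addr == store_addr:
                traps += 1                        # squash and re-execute
                pred.record_violation(store_pc, load_pc)
            else:
                early += 1                        # speculation paid off
    return traps, waits, early


if __name__ == "__main__":
    print(simulate(10))
```

The dependent load traps once and then waits on every later iteration, while the independent load keeps issuing early.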
What Was Really Important
Full hardware management (Sohi)
Sequencing
Register dependencies
Memory dependencies
Refinement (Mowry and Olukotun)
Compiler managed – registers, sequencing
Hardware managed memory dependence only
Ignoring Implementation Realities
SMT - in-order (Tullsen, Eggers, Levy)
[Figure: in-order pipeline with stages Fetch, Issue, Reg Read, Execute, Dcache/Store Buffer, Reg Write; structures PC, Icache, Regs, Dcache]
Solution Already Available
SMT out-of-order
[Figure: pipeline with stages Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire; structures PC, Icache, Register Map, Regs, Dcache]
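One way to see why an out-of-order core already provides much of what SMT needs: register renaming already maps architectural registers onto a pool of physical registers, so adding threads mostly means adding per-thread map tables over the same shared pool. A hypothetical sketch of that sharing; the class and its sizes are illustrative, not the design of any Alpha implementation.

```python
class SharedRenamer:
    """Per-thread register map tables over one shared physical register file."""

    def __init__(self, num_threads: int, arch_regs: int, phys_regs: int):
        assert phys_regs > num_threads * arch_regs, "need spare physical registers"
        self.free = list(range(phys_regs))
        # Each thread's architectural registers start on distinct physical registers.
        self.map = [[self.free.pop(0) for _ in range(arch_regs)]
                    for _ in range(num_threads)]

    def rename_dest(self, thread: int, arch_reg: int):
        """Allocate a fresh physical register for a write to arch_reg.
        Returns (new_phys, old_phys); old_phys is freed when the write retires.
        Raises IndexError if the free list is empty (a real core would stall)."""
        new = self.free.pop(0)
        old = self.map[thread][arch_reg]
        self.map[thread][arch_reg] = new
        return new, old

    def source(self, thread: int, arch_reg: int) -> int:
        """Physical register currently holding this thread's arch_reg."""
        return self.map[thread][arch_reg]

    def retire(self, old_phys: int) -> None:
        """Return the previous mapping to the shared free list."""
        self.free.append(old_phys)


if __name__ == "__main__":
    r = SharedRenamer(num_threads=2, arch_regs=4, phys_regs=16)
    a, old_a = r.rename_dest(thread=0, arch_reg=1)  # thread 0 writes r1
    b, old_b = r.rename_dest(thread=1, arch_reg=1)  # thread 1 writes its own r1
    print("thread 0 r1 ->", a, "| thread 1 r1 ->", b)
```

Both threads write "r1", yet the shared renamer keeps their values in different physical registers, so the rest of the out-of-order pipeline needs no thread awareness for register dependencies.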
Outline
Review of technology factors
Retrospective on the quantitative method
Augmenting the quantitative method
Recommendation
Pay Attention to Reality
Look at technology trends
Power
Latency
Use more realistic models
More organizational details
Better workloads
Ignore Reality
Look for revolutionary contributions
Decide on a constraint to relax
Apply the scientific method
Revolutionary contributions may arise because
– Constraint will be relaxed in time
– Constraint wasn’t fundamental
– New avenues of exploration will be opened
Acknowledgments
Bill Bowhill
Paul Gronowski
Bill Herrick
Toni Juan
Geoff Lowney
Ellen Piccioli
Andre Seznec