Transcript p148_rea_s

MAPLD 148: "Is Scaling the Correct Approach for Radiation
Hardened Conversions of Deep Submicron Microprocessors?"
D. Rea, D. Bayles, A. Kazemzadeh, F. Thoma, and N. Haddad
Rea
Cleared for Open Publication July 30, 2004 04-S-2144
1
P148/MAPLD 2004
Introduction
• Opportunity: Provide faster and lower power devices for satellite
  applications through the use of advanced technologies
  – Migrate existing designs to new technologies
  – Develop new designs in new technologies
• Migration Challenge: Affordably maximize benefits of new technologies
• Situation: Migrate a 0.25 µm CMOS version of the RAD750™ to both a
  0.18 µm CMOS process and a 0.15 µm CMOS process and increase
  performance by ~33% at 0.18 µm and ~50% at 0.15 µm
  – All technologies are bulk CMOS
  – Transistor behavior and back end metallurgy are very compatible
Increasing demands for highly reliable, radiation hardened processing power on
satellites continue to push the capabilities of technology.
2
Title III Radiation Hardened Microprocessor for Space Program Goals
Technical Performance Measures (TPM) – Objectives

                                   Baseline    Phase 2     Phase 3     Units
Technology                         R25         R18         RH15
Machine Instructions Per Second    264         360         >400        MIPS-Dhrystone
Equivalent Processor Frequency     132         180         200         MHz
  (@2 MIPS/MHz)
Single Event Upset (SEU)           1.60E-10    1.4E-10     1.4E-10     upsets/bit-day
Latchup immune                     Yes         Yes         Yes

• Prototypes 4/06
• Flight parts 7/06
Challenge: Increase microprocessor performance at each technology node
without a degradation in radiation performance while maintaining affordability.
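The objectives table ties throughput to clock rate at 2 MIPS/MHz. The short Python sketch below uses only the numbers in that table to reproduce the frequency row and the gain over the R25 baseline. The function names are illustrative, and because the Phase 3 goal is stated as ">400", 400 is treated as a lower bound (an assumption).

```python
# Illustrative check of the TPM objectives; all names here are hypothetical.
MIPS_PER_MHZ = 2.0  # from the table: "@2 MIPS/MHz"
BASELINE_MIPS = 264  # R25 baseline

# MIPS objectives from the table. Phase 3 is given as ">400",
# so 400 is used as a lower bound (assumption).
mips_goal = {"R25": 264, "R18": 360, "RH15": 400}


def equivalent_mhz(mips: float) -> float:
    """Equivalent processor frequency implied by a MIPS figure."""
    return mips / MIPS_PER_MHZ


def gain_over_baseline(mips: float, baseline: float = BASELINE_MIPS) -> float:
    """Percent throughput gain relative to the baseline."""
    return (mips / baseline - 1.0) * 100.0


for tech, mips in mips_goal.items():
    print(f"{tech}: {equivalent_mhz(mips):.0f} MHz, "
          f"{gain_over_baseline(mips):+.1f}% vs. R25")
```

Under these assumptions the gains come out at roughly 36% for R18 and at least 51% for RH15, close to the ~33% and ~50% figures quoted in the Introduction.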
3
Circuit Families in the RAD750™
• Standard Cells (RLMs) (Control Logic)
  - Harden latches and clock splitters
  - Design circuits to minimize injected pulses
  - Replace low Vt devices
• Complex Cells (OTS) (Data Flow)
  - Harden latches and clock splitters
  - Replace dynamic logic with static equivalents
  - Design circuits to minimize injected pulses
  - Replace low Vt devices
• Custom Blocks
  - Harden RAM cells, sense amps, and decoders
  - Harden latches and clocks
  - Harden PLL and add temperature compensation
  - Replace dynamic logic with static equivalents
  - Design circuits to minimize injected pulses
  - Replace low Vt devices
A variety of circuit families were utilized in the RAD750 to provide density and
performance. Modifications were made to all circuit types to harden the design.
4
Scaling Options
• Gate shrink only (one dimension, 1D)
  – Pro: drive current increases from larger W/L
  – Pro: simple to implement
  – Pro: minimal impact to floorplan
  – Con: no decrease in die size or wiring parasitics
  – Con: uneven performance improvement
• Two dimensional shrink (2D)
  – Pro: die size and parasitics decrease
  – Con: uneven performance improvement
  – Con: greater perturbation to routing
• Hybrid approach (combination of 1D, 2D shrinks and circuit optimization)
  – Pro: achieve balanced improvement
  – Con: increase in effort to implement (circuit level and full chip)
The objective is to pick the migration option that delivers the highest performance at the lowest cost.
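The geometric side of the 1D versus 2D trade-off can be sketched to first order. The Python below is a textbook simplification, not the authors' model. It assumes that the drawn gate length equals the node name (0.25 µm and 0.18 µm), that a gate-only shrink leaves transistor width W unchanged, that a 2D shrink scales every dimension by the same linear factor, and that drive current is proportional to W/L (ignoring supply voltage and short-channel effects). Function names are hypothetical.

```python
def gate_shrink_1d(l_old_um: float, l_new_um: float) -> dict:
    """Gate-length-only shrink: W is fixed, so W/L grows by l_old/l_new.

    Cell area is taken as unchanged, matching the "no decrease in die size"
    con listed for this option.
    """
    return {"w_over_l_ratio": l_old_um / l_new_um, "area_ratio": 1.0}


def shrink_2d(linear_factor: float) -> dict:
    """Uniform 2D shrink: W and L scale together, so W/L is unchanged
    while area scales with the square of the linear factor."""
    return {"w_over_l_ratio": 1.0, "area_ratio": linear_factor ** 2}


one_d = gate_shrink_1d(0.25, 0.18)
two_d = shrink_2d(0.18 / 0.25)

print(f"1D: W/L x{one_d['w_over_l_ratio']:.2f}, area x{one_d['area_ratio']:.2f}")
print(f"2D: W/L x{two_d['w_over_l_ratio']:.2f}, area x{two_d['area_ratio']:.2f}")
```

In this idealized picture the 1D shrink buys drive strength but no area, while the 2D shrink buys area (and therefore shorter wires) but no extra W/L, which is one way to see why a hybrid of the two is attractive.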
5
Scaling Options - Standard Cell Study (RLMs)
Average Standard Cell Delay (No load)

Migration Technique        Normalized Cell Delay
R18, 2DC scale/compact     0.900
R18, 1DC scale/compact     0.783
R18, 1D scale              0.813
R25, baseline              1.000

1D - 1 dimension scaling, 2D - 2 dimension scaling, C - scaling with compaction
Largest average improvement observed with 1 dimensional scaling plus
compaction. However, minimal cell size reduction yields little parasitic reduction,
so advantage seen with no load is lost when loads are taken into consideration.
6
Scaling Options - Standard Cell Study (RLMs)
Average Standard Cell Area

Migration Technique        Normalized Area
R18, 2DC scale/compact     0.728
R18, 1DC scale/compact     0.802
R18, 1D scale              1.000
R25, baseline              1.000

1D - 1 dimension scaling, 2D - 2 dimension scaling, C - scaling with compaction
Largest average improvement observed with 2 dimensional scaling plus compaction.
However, performance improvement is not uniform across all cells.
7
Scaling Options - Standard Cell Study (RLMs)
Average Standard Cell Power (fixed cycle time)

Migration Technique        Normalized Power
R18, 2DC scale/compact     0.455
R18, 1DC scale/compact     0.496
R18, 1D scale              0.509
R25, baseline              1.000

1D - 1 dimension scaling, 2D - 2 dimension scaling, C - scaling with compaction
As expected, reduced power supply voltage accounts for majority of power reduction.
Overall chip power will increase when frequency of operation increases.
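The supply-voltage claim can be sanity-checked with the standard dynamic power relation P ∝ C·V²·f. The sketch below is only a first-order estimate. It assumes supplies of 2.25 V for R25 and 1.62 V for R18 (the worst-case corner voltages quoted in the custom macro study of this presentation) and holds capacitance and frequency constant. The normalized power values are the bar values as read from the chart, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_new: float, v_old: float,
                        c_ratio: float = 1.0, f_ratio: float = 1.0) -> float:
    """First-order dynamic power ratio, P ~ C * V^2 * f."""
    return c_ratio * (v_new / v_old) ** 2 * f_ratio


# Voltage change alone, with capacitance and frequency held fixed (assumption).
voltage_only = dynamic_power_ratio(1.62, 2.25)

# Normalized power per migration technique, as read from the chart.
measured = {
    "R18, 2DC scale/compact": 0.455,
    "R18, 1DC scale/compact": 0.496,
    "R18, 1D scale": 0.509,
}

print(f"voltage-only estimate: {voltage_only:.3f}")
for technique, power in measured.items():
    # Share of the total reduction that the voltage-only model explains.
    share = (1.0 - voltage_only) / (1.0 - power)
    print(f"{technique}: measured {power:.3f}, voltage share {share:.0%}")

# Raising the clock from 132 MHz to 180 MHz (the Phase 2 objective)
# scales dynamic power back up in the same model.
with_faster_clock = dynamic_power_ratio(1.62, 2.25, f_ratio=180 / 132)
print(f"voltage plus faster clock: {with_faster_clock:.3f}")
```

In this model the voltage drop alone roughly halves dynamic power, which accounts for most of the reduction in every technique. The remaining gap is attributable to capacitance, and a higher clock rate claws part of the saving back.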
8
Scaling Options - Complex Cells (Data Flow) Study
Complex Cells 20% 2D Scaling Exercise
[Histogram: Number of Unique Cells (axis 0 to 35) versus Percent Speed Improvement (Unloaded), binned as slower, 0-10%, 10-15%, 15-20%, >20%. Chart annotations: "Optimize or synthesize as necessary to improve performance" and "Use 'as is' until first full chip timing run".]
Using two dimensional scaling, the non-uniformity in performance improvement
shows that the average is misleading. Should the “slower” cells end up in the critical
path, overall speed could go down.
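A small numerical sketch shows why the average can mislead: the clock period is set by the slowest path, not by the mean cell. The delays and scaling ratios below are hypothetical values chosen for illustration, not data from the histogram, and the three-path structure is likewise an assumption.

```python
from statistics import mean

# Hypothetical paths: each cell is (original delay in ns, scaled/original ratio).
# A ratio below 1 means the cell got faster, above 1 means slower.
paths = {
    "path_a": [(1.0, 0.80), (1.0, 0.80), (1.0, 0.80)],
    "path_b": [(1.0, 0.85), (0.9, 0.80), (1.0, 0.80)],
    "path_c": [(1.0, 1.10), (1.0, 1.10), (0.9, 0.90)],  # holds "slower" cells
}


def cycle_time(all_paths: dict, scaled: bool) -> float:
    """Longest path delay, before or after applying the scaling ratios."""
    return max(
        sum(delay * (ratio if scaled else 1.0) for delay, ratio in cells)
        for cells in all_paths.values()
    )


avg_ratio = mean([ratio for cells in paths.values() for _, ratio in cells])
old_cycle = cycle_time(paths, scaled=False)
new_cycle = cycle_time(paths, scaled=True)

print(f"average cell delay ratio: {avg_ratio:.3f}")
print(f"cycle time: {old_cycle:.2f} ns -> {new_cycle:.2f} ns")
```

With these made-up numbers the average cell is more than 10% faster, yet the cycle time does not improve because the path containing the slower cells becomes critical.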
9
Scaling Options - Custom Macros
Custom macros are the heart of the processor, representing over 2/3 of the total
transistor count and driving the critical performance paths.
10
Scaling Options - Custom Macros
The MMU/TAG/CACHE paths on both the instruction and data sides comprise the
processor critical path.
11
Scaling Options - Custom Macro Study
1D Scaling Results

Custom   Timing Arc              R25 (ns)           R18 (ns)           % difference
Block    (Longest Path)          2.25V, 125C, SS    1.62V, 125C, SS
BHT      RCLK->BR_PREDICT        1.346              1.072              -20.36%
BTIC     CLK->INST_VALID_OUT     2.922              2.309              -20.98%
DQ0      IN_QC2->DQOUT           1.327              1.439                8.44%
FPR      RCLK->DOUT              2.315              2.252               -2.72%
GPR      RCLK->DOUT              1.959              1.870               -4.54%
AVERAGE                                                                 -8.03%
1D scaling was chosen for the custom macros for the following reasons:
• Critical node spacing in the memory arrays had to be maintained
• Because of the amount of custom layout in these macros, simple 2D scaling resulted
  in a very large number of DRC errors that would have required considerable manual
  intervention to correct
As with the data flow complex cell macros (see p. 9), scaling of the custom
macros produced uneven results.
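The percent differences in the 1D scaling results can be reproduced directly from the quoted longest-path delays. The Python sketch below uses only the timing values from the table. The variable and function names are illustrative, and the average is taken as a plain mean of the per-block percentages, which matches the figure on the slide.

```python
# Longest-path delay per custom block in ns: (R25, R18), from the table.
timing_ns = {
    "BHT": (1.346, 1.072),
    "BTIC": (2.922, 2.309),
    "DQ0": (1.327, 1.439),
    "FPR": (2.315, 2.252),
    "GPR": (1.959, 1.870),
}


def pct_difference(r25_ns: float, r18_ns: float) -> float:
    """Percent change in delay going from R25 to R18 (negative is faster)."""
    return (r18_ns - r25_ns) / r25_ns * 100.0


diffs = {block: pct_difference(*delays) for block, delays in timing_ns.items()}
average = sum(diffs.values()) / len(diffs)

for block, diff in diffs.items():
    print(f"{block}: {diff:+.2f}%")
print(f"AVERAGE: {average:+.2f}%")
```

The spread is the point: BHT and BTIC improve by about 20%, FPR and GPR by under 5%, and DQ0 actually gets slower, so the -8.03% average describes none of the blocks well.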
12
Scaling Solution for the RAD750™
• Standard Cells
  - Resize transistors
  - Automatically generate layouts (~2D scaling)
  - Resynthesize at chip level
• Complex Cells
  - 2D scale where appropriate
  - Optimize when possible
  - Synthesize from standard cells where economically advantageous and
    performance isn’t required
• Custom Blocks
  - 1D scale as baseline (minimize cost)
  - Optimize circuit design (new topologies, layout structure) as necessary
Bottom line - scaling by itself is not sufficient to meet performance goals. Scaling
combined with other techniques supports the performance goals at a reasonable cost.
13
Predicted Performance of RAD750™
Technical Performance Measures (TPM) – Estimated Achievement
Results from Phase 1 Study

                                   Phase 2/3       Phase 2/3
                                   Without         With
                                   Optimization    Optimization    Units
Machine Instructions Per Second    300-340         370-420         MIPS-Dhrystone
Equivalent Processor Frequency     150-170         185-210         MHz
* Frequency rated at nominal and fast process given worst case conditions.

                                   Phase 2         Phase 3         Units
Single Event Upset (SEU)           <1.4E-10        <1.4E-10        Upsets/bit-day
Latchup immune                     Yes             Yes
By combining scaling with other design techniques, program goals can be met at an
affordable price.
14
Summary
• Simple scaling is not sufficient to meet performance objectives at
  advanced technology nodes
  – Improvements are not uniform
  – Improvements from scaling would not meet objectives even if every
    block achieved the average improvement
• To maintain affordability, a hybrid approach consisting of several scaling
  techniques and circuit optimization was selected to maximize the
  advantages of the advanced technologies
• Automation is used where possible to support changes in technology
  groundrules and conversion to future technologies
  – Additional automation is required in the custom macro area to resolve
    issues with simple scaling
• Program goals can be met with the hybrid approach
15