ahmed-fpga2010 - University of British Columbia


Impact of Interconnect Architecture on VPSAs (Via-Programmed Structured ASICs)
Usman Ahmed
Guy Lemieux
Steve Wilton
System-on-Chip Lab
University of British Columbia
What is a Structured ASIC?
• An FPGA without reprogrammable interconnect
– Interconnect is mask-programmed
• Two types
– Via Programmable (VPSA): This Talk
– Metal Programmable (MPSA): FPT 2009
[Figure: interconnect layers above transistors]
4
Key Messages
1. Structured ASICs will be the key technology of the future.
Because the key issues that make structured ASICs attractive have not been solved. They are growing more prominent.
2. Interconnect matters.
MPSAs have better performance, VPSAs are cheaper.
7
Motivation for Structured ASICs
• Enormous NRE + design cost limit access to advanced process
– SAs reduce mask cost + risk
– SAs must reduce design effort
• Many ICs are still manufactured with old process technologies
– New processes (90nm and below): 49% of TSMC revenue in 2009 Q1
– SAs make advanced processes profitable
• FPGAs not suitable for low power, consumer-oriented, hand-held devices
– SAs: lower power, lower cost than FPGAs
8
Talk Outline
• Cost model
• Experimental methodology
– Metrics
– CAD flow
– Architecture modeling
• Area, cost trends
• Conclusions
9
VPSA Die-Cost
• Cost is more important than die area
• Primary cost components
– Die Area
– Number of configurable layers (New for structured ASICs)
• Secondary cost components
– Die Yield
– Wafer and processing cost
– Volume requirements
11
VPSA Cost Model
$$\mathrm{Cost}_{die} = C_{base} + C_{custom} + C_{proto}$$
• $C_{base}$: cost of the masks for the base (common) portion + cost of fabricating the base portion
• $C_{custom}$: cost of the remaining masks + cost of fabricating the remaining portion
• $C_{proto}$: similar to $C_{custom}$, but depends on the number of spins
15
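VPSA Cost Model
$$\mathrm{Cost}_{die} = C_{base} + C_{custom} + C_{proto}$$
• $C_{base}$: base mask cost (terms in $C_{sml} N_{fml}$ and $C_{smu}$ with $N_{fmm}$, $N_{fmv}$, $N_{fmu}$) spread over $V_{tot}$, plus base wafer cost (terms in $C_{wpm}$ and $C_{sw}$) divided by $N_{gdpw}$
• $C_{custom}$: custom mask cost (terms in $C_{smu} N_{vl}$) spread over $V_c$, plus custom wafer cost (terms in $C_{wpm}$) divided by $N_{gdpw}$
• $C_{proto}$: depends on the number of spins $N_s$ and on the same mask and wafer terms, spread over $V_c$
• Primary cost variables
– Die area and yield: $N_{gdpw}$
– Configurable layers: $N_{vl}$
• Constants
– Fixed layers: $N_{fml}$, $N_{fmm}$, $N_{fmv}$, $N_{fmu}$
– Mask/wafer cost: $C_{sml}$, $C_{smu}$, $C_{wpm}$, $N_{mprl}$
– Volume requirements: $V_{tot}$, $V_c$
16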
VPSA Cost Model
$$\mathrm{Cost}_{die} = \frac{K_0}{N_{gdpw}} + N_{vl}\left(\frac{K_1}{N_{gdpw}} + K_2\right) + K_3$$
• Constants: $K_0$ = $4800, $K_1$ = $440, $K_2$ = $0.74, $K_3$ = $1.08
• Die area: $N_{gdpw}$ = 4503 (10mm2 die) … 450 (100mm2 die)
• Configurability: $N_{vl}$ = 1 … 5 via layers
• Key Assumptions
– 45nm Maskset cost: $2.5M
– Total volume: 2M
– Per-customer volume: 100k
– No. of spins: 2
18
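As a rough illustration, the simplified model on this slide can be evaluated directly. The sketch below assumes the form Cost_die = K0/Ngdpw + Nvl*(K1/Ngdpw + K2) + K3 and assigns the four dollar figures annotated on the slide to K0 through K3 in left-to-right order. Both the grouping and that assignment are read off the slide layout, and the function name `die_cost` is hypothetical.

```python
# Constants as annotated on the slide (assumed mapping, left to right).
K0 = 4800.0  # dollars, divided by good dies per wafer
K1 = 440.0   # dollars per via layer, divided by good dies per wafer
K2 = 0.74    # dollars per via layer, per die
K3 = 1.08    # dollars per die


def die_cost(n_gdpw, n_vl):
    """Die cost in dollars for n_gdpw good dies per wafer and n_vl custom via layers."""
    if n_gdpw <= 0:
        raise ValueError("n_gdpw must be positive")
    if n_vl < 0:
        raise ValueError("n_vl must be non-negative")
    return K0 / n_gdpw + n_vl * (K1 / n_gdpw + K2) + K3


if __name__ == "__main__":
    # Endpoints quoted on the slide: 4503 dies (10 mm2 die), 450 dies (100 mm2 die).
    for n_gdpw in (4503, 450):
        for n_vl in range(1, 6):
            print(f"Ngdpw={n_gdpw:5d}  Nvl={n_vl}  cost=${die_cost(n_gdpw, n_vl):.2f}")
```

Under these assumptions, both a larger die (smaller Ngdpw) and more via layers raise the cost, which is the trade-off the following slides explore.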
VPSA Cost Model
[Figure: constant-cost contours ($8, $12, $18, $25, $35) of core area (mm2, 40 to 200) versus number of customizable layers. VPSA panel: Nvl = 1 to 5, 11 mm2/layer. MPSA panel: Nrl = 2 to 6, 30 mm2/layer.]
• At constant cost, area can be traded for number of customizable layers
21
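The area-for-layers trade can be sketched by inverting the simplified cost model at a fixed cost. This is only an approximation under stated assumptions: the constants K0 through K3 use the same left-to-right reading of the earlier slide, good dies per wafer is taken as inversely proportional to area (about 45000 / area, from the quoted 4503 dies at 10 mm2 and 450 dies at 100 mm2), and the difference between core area and die area is ignored. The function names are hypothetical.

```python
K0, K1, K2, K3 = 4800.0, 440.0, 0.74, 1.08  # assumed mapping of the slide's constants
DIES_TIMES_AREA = 45000.0  # assumption: Ngdpw * area (mm2) is roughly constant


def die_cost_from_area(area_mm2, n_vl):
    """Simplified die cost with Ngdpw approximated as DIES_TIMES_AREA / area."""
    n_gdpw = DIES_TIMES_AREA / area_mm2
    return K0 / n_gdpw + n_vl * (K1 / n_gdpw + K2) + K3


def area_at_cost(cost, n_vl):
    """Area (mm2) that gives exactly `cost` dollars with n_vl custom via layers."""
    # cost = (K0 + n_vl*K1) * area / DIES_TIMES_AREA + n_vl*K2 + K3, solved for area
    return (cost - K3 - n_vl * K2) * DIES_TIMES_AREA / (K0 + n_vl * K1)


if __name__ == "__main__":
    target = 12.0  # one of the contour values on the slide
    previous = None
    for n_vl in range(1, 6):
        area = area_at_cost(target, n_vl)
        step = "" if previous is None else f"  ({previous - area:.1f} mm2 less than Nvl={n_vl - 1})"
        print(f"Nvl={n_vl}: {area:.1f} mm2{step}")
        previous = area
```

Each added via layer shrinks the area affordable at the same cost, so a fabric that needs more layers has to save at least that much area to break even.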
Metrics
• Cost
– Detailed cost model (just presented)
• Area
– Placement grid size after whitespace insertion
• Determined by CAD flow
• Delay and Power
– Please see paper
23
CAD Flow
• Input Circuit
• Tech. Mapping: T-VPACK
• Placement: Std. Cell Placer (CAPO)
– Simulated annealing too slow
– Uniform whitespace distribution
– Logic cells snapped to grid
• Global Routing: Std. Cell Router (FGR)
• Any congestion?
– Yes: Insert Whitespace (increase placement grid size)
– No: proceed to detailed routing
• Detailed Routing: Custom Router (Pathfinder based), using the Interconnect Architecture
– Routing graph for only single basic tile
– Expand wavefront only along the global route
• Any congestion?
– Yes: iterate again
– No: proceed
• Area/Delay/Power/Cost Estimate
28
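The congestion loop in this flow can be sketched as growing the placement grid until routing succeeds. This is a minimal sketch, not the authors' implementation: `is_congested` is a hypothetical callback standing in for the whole place-and-route attempt, the grid is assumed square, and whitespace is added in fixed uniform steps.

```python
import math


def grid_side_for(num_blocks, whitespace_fraction):
    """Smallest square grid side that holds num_blocks with the given fraction of empty sites."""
    sites_needed = num_blocks / (1.0 - whitespace_fraction)
    return math.ceil(math.sqrt(sites_needed))


def fit_grid(num_blocks, is_congested, step=0.05, max_steps=18):
    """Increase whitespace until is_congested(side) reports a routable grid."""
    for k in range(max_steps + 1):
        whitespace = k * step
        side = grid_side_for(num_blocks, whitespace)
        if not is_congested(side):
            return side, whitespace
    raise RuntimeError("still congested at the maximum whitespace tried")


if __name__ == "__main__":
    # Toy congestion check (made up): routing demand of 1200 units against
    # a capacity of 10 units per grid site.
    def toy_congested(side):
        return 1200 > 10 * side * side

    side, whitespace = fit_grid(100, toy_congested)
    print(f"grid {side}x{side}, whitespace {whitespace:.2f}")
```

The final grid side is what the area metric reports, so any inefficiency in how whitespace is inserted shows up directly as extra area.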
Routing Fabrics
Crossover Fabric
All wires same length!
32
Routing Fabrics
Jumper Fabric
Long wires OK!
35
Routing Fabric Comparison
[Figure labels: "Long segment that spans two blocks"; "These points coincide"; "Jumper is not required at this point"]
Crossover Fabric (Layer 2i+1 and Layer 2i+2, i = 0, 1, …)
• Single via to extend
• All wires same: length-1
Jumper Fabric (Layer 2i+1 and Layer 2i+2, i = 0, 1, …)
• Two vias to extend
• Short segments: 1 block
• Long segments: 4 blocks, staggered
• Two variants
– Jumper20: 20% Long segments
– Jumper40: 40% Long segments
36
Logic Block Model
• Characteristics of logic block
– Physical dimensions
(in wire pitches)
– Pin locations
• Do not need low-level
layout details
37
Parameterize Logic Block
• Cover wide search space for logic blocks
• Vary layout density
– Dense: Determined by # pins (small layout area)
– Sparse: Determined by Standard Cell implementation
• Vary logic capacity
– Sweep number of inputs and outputs
• 2-input, 1-output logic blocks (shown here)
• 16-input, 8-output logic blocks (also in paper)
– Use logic clustering (T-VPack) as tech-mapper
38
Area, Cost Trends
• Experimental results
– MCNC benchmarks
• Geometric mean over 19 large circuits
– Logic block density
• Dense, medium, and sparse
– Logic block capacity
• From 2-input, 1-output to 16-input, 8-output
• Only 2-input, 1-output results shown here
40
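Results are summarized as a geometric mean across the benchmark circuits. A minimal sketch of that aggregation follows; the per-circuit ratios are made-up placeholders, not numbers from the talk, and the function name is hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Placeholder area ratios (fabric under test / reference) for a few circuits.
    area_ratios = [1.10, 1.25, 0.95, 1.40]
    print(f"geomean area ratio: {geometric_mean(area_ratios):.3f}")
```

The geometric mean suits normalized ratios because a circuit that is twice as large and one that is half as large cancel out, which an arithmetic mean would not give.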
Area and Die-Cost Trends
[Figure, Dense Logic Block: core area and cost ($) versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, Jumper40, and MPSA]
• Area: MPSA < Crossover < Jumper
– Nvl = 1 more area, needs whitespace to route
• Cost
– MPSAs: more layers → higher cost
– VPSAs: more layers → lower cost
44
Area and Die-Cost Trends
[Figure, Sparse Logic Block: core area and cost ($) versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, Jumper40, and MPSA]
• Sparse layout is better! ???
– Less whitespace needed
• Need to study whitespace allocation
45
Delay and Power Trends
Key results (in paper):
MPSA is significantly better than VPSA
Conclusions
• Trends for VPSAs
– Die-cost more important than die-area
– MPSAs better in Area, Delay, and Power
– VPSAs better in Cost
– Interconnect Matters
• Performance varies with different routing fabrics
• Even significant variation among VPSA structures
• Ongoing research
– Interconnect architectures
– Whitespace insertion algorithm
48
Limitations
• CAD framework available online
http://groups.google.com/group/sasic-pr
• This is early work … need improvements!
– Whitespace insertion
– Buffer insertion
– Delay/Power of logic blocks
– Power/clock network area overhead
– SRAM-configurable logic blocks
49
Key Message
1. Structured ASICs will be the key technology of the future.
Because the key issues that make structured ASICs attractive have not been solved. They are growing more prominent.
2. Interconnect matters.
MPSAs have better performance, VPSAs are cheaper.
50
CAD Framework Available
52
Power and Delay Trends
Metrics
• Area
– Determined from placement grid size
• Delay
– Average net delay (Elmore model)
• Register locations unknown; critical path delay
calculation is difficult
• CAD flow is not timing driven
• Power
– Total metal + via capacitance
54
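The delay and power metrics can be illustrated on a single driver-to-sink path. This is a sketch under simplifying assumptions: the net is treated as an RC chain rather than a tree, each wire or via segment has its capacitance lumped at its far end, and the resistance and capacitance values are placeholders rather than technology data. The function names are hypothetical.

```python
def elmore_delay(segments, load_cap=0.0):
    """Elmore delay (seconds) from the driver to the far end of an RC chain.

    segments: (resistance_ohm, capacitance_farad) pairs ordered driver to sink,
    each capacitance lumped at the far end of its segment.
    """
    downstream_cap = load_cap + sum(c for _, c in segments)
    delay = 0.0
    for resistance, capacitance in segments:
        # Each resistance charges all capacitance at or beyond its far end.
        delay += resistance * downstream_cap
        downstream_cap -= capacitance
    return delay


def total_capacitance(segments):
    """Sum of metal and via capacitance, used here as the power proxy."""
    return sum(c for _, c in segments)


if __name__ == "__main__":
    # Placeholder path: wire, via, wire (values are illustrative only).
    path = [(100.0, 1.0e-15), (20.0, 0.2e-15), (100.0, 1.0e-15)]
    print(f"Elmore delay: {elmore_delay(path, load_cap=0.5e-15):.3e} s")
    print(f"Total capacitance: {total_capacitance(path):.3e} F")
```

Because every via adds both series resistance and capacitance, a route that needs more vias or more wirelength worsens both metrics in this model.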
Talk Outline
• Cost model
• Experimental methodology
– Metrics
– CAD flow
– Architecture modeling
• Area, delay, power, cost trends
• Cost model sensitivity
• Conclusions
55
Power Trends
[Figure: interconnect power versus Nvl for Crossover, Jumper20, and Jumper40; panels: Dense Logic Block, Sparse Logic Block]
• Significant range for different routing fabrics
• More custom via layers → Lower Power
– Especially for dense layouts
57
Power Trends
[Figure: VPSA interconnect power / MPSA interconnect power versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, and Jumper40; panels: Sparse Logic Block, Dense Logic Block]
• Re-Normalized to MPSAs
• VPSAs use more power
– 2x (sparse) to 6x (dense) more than MPSAs
58
Delay Trends
[Figure: interconnect delay versus Nvl for Crossover, Jumper20, and Jumper40; panels: Dense Logic Block, Sparse Logic Block]
• Significant range for different fabrics
– Delay improves with more custom via layers
• Jumper Fabric: Long segments improve delay (but higher power)
62
Delay Trends
[Figure: VPSA interconnect delay / MPSA interconnect delay versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, and Jumper40; panels: Dense Logic Block, Sparse Logic Block]
• Re-Normalized to MPSAs
• VPSA delay up to 20x worse
• Why is VPSA delay worse than MPSA delay?
64
Delay Trends
[Figure, Dense Logic Block: three ratios versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, and Jumper40: VPSA total no. of vias / MPSA total no. of vias ("more vias"), VPSA total wirelength / MPSA total wirelength ("more wirelength"), and VPSA interconnect delay / MPSA interconnect delay]
• Re-Normalized to MPSAs
• VPSA delay up to 20x worse
67
Cost Model Sensitivity
Cost Model Sensitivity
• How sensitive is the die-cost to various factors?
• Primary factors
– Die area
– Number of customizable layers
• Secondary factors
– Maskset cost
– Volume requirements
– Number of fixed lower masks
71
Cost Model Sensitivity
– Sensitivity to Maskset Cost
[Figure: cost ($) versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, Jumper40, and MPSA; panels: Maskset Cost = $2.5M, Maskset Cost = $5M]
• VPSAs less sensitive to maskset cost
72
Cost Model Sensitivity
– Sensitivity to Number of Fixed Lower Masks ($N_{fml}$)
[Figure: cost ($) versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, Jumper40, and MPSA; panels: $N_{fml}$ (Fixed Lower Masks) = 18, $N_{fml}$ (Fixed Lower Masks) = 36]
• VPSA cost increases more rapidly than MPSAs
– Large area of VPSAs
73
Cost Model Sensitivity
– Sensitivity to Per Customer Volume (Vc)
[Figure: cost ($) versus Nvl (MPSAs: Nrl – 1) for Crossover, Jumper20, Jumper40, and MPSA; panels: Vc (Per Customer Volume) = 100k, Vc (Per Customer Volume) = 50k]
• VPSAs less sensitive to customer volume than MPSAs
74