FPGAs and Structured ASICs
Overview & Research Challenges
Vaughn Betz
Director, Software Engineering
© 2006 Altera Corporation
Agenda
What is an FPGA?
FPGA & ASIC market dynamics
FPGA technology
Structured ASIC technology
Research Challenges
Power
Scalable CAD
CAD to raise abstraction level
Structured ASIC total cost
What is an FPGA?
Field Programmable Gate Array
Gate Array
Two-dimensional array of logic gates
Traditionally connected with customized metal
Every logic circuit (customer) needs a custom-manufactured chip
Field Programmable
Customized by programming after manufacture
One FPGA can serve every customer
FPGA: re-programmable hardware
Basic Internals of an FPGA
[Figure: array of nine logic elements joined by programmable connections]
Each logic element is programmed to implement the desired function
Embedding a circuit in an FPGA
All done by CAD system (e.g. Quartus)
Chop up circuit into little pieces of logic
Each piece goes in a separate logic element (LE)
Hook them together with the programmable routing
[Figure: desired circuit with inputs x, y, z and output f, mapped onto the logic elements (LEs) and I/O pads of an FPGA]
FPGA Logic Element
Look-Up Table (LUT) + register + extra …
[Figure: LUT built from SRAM cells feeding a multiplexer; inputs A and B select which stored bit drives Out]
FPGAs typically use 4-input or larger LUTs
Cyclone family (low cost): 4 inputs
Stratix II: Adaptive Logic Module implements 4- to 6-input LUTs efficiently
Virtex 5: 6 inputs
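The look-up table idea on this slide can be sketched in a few lines of Python: the SRAM cells hold a truth table, and the inputs simply select one stored bit. The `make_lut` helper and the AND/XOR contents are illustrative assumptions, not Altera code.

```python
def make_lut(sram_bits):
    """Return a k-input LUT function backed by a list of 2**k stored bits."""
    k = (len(sram_bits) - 1).bit_length()
    if len(sram_bits) != 2 ** k:
        raise ValueError("need 2**k configuration bits")

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs")
        # The inputs form the address of the SRAM cell to read (first input = MSB).
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return sram_bits[address]

    return lut


# Programming the four cells with 0, 0, 0, 1 gives a 2-input AND gate.
and_gate = make_lut([0, 0, 0, 1])
# Reprogramming the same structure with 0, 1, 1, 0 gives XOR instead.
xor_gate = make_lut([0, 1, 1, 0])
```

A 4-input LUT works the same way with 16 stored bits, which is why larger LUTs cost more SRAM per logic element.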
Connecting the Logic
[Figure: FPGA with inputs x, y, z entering through I/O pads and logic elements (LEs) producing f]
Logic elements implement the pieces of the circuit
Now hook them up with the programmable routing
Programmable Routing
Programmable switches connect fixed metal wires
Choose pattern so any logic element can connect to any other
[Figure: logic block with inputs In1 and In2 and output Out; an SRAM cell controls a routing switch]
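As a rough model of programmable routing, each switch can be treated as a link between two fixed wires that conducts only when its SRAM cell holds a 1. The wire names and the switch list below are hypothetical; real switch patterns are far larger and are chosen by the FPGA architects.

```python
from collections import deque


def reachable(switches, sram_bits, source, sink):
    """True if `source` reaches `sink` through switches whose SRAM bit is 1."""
    adjacency = {}
    for (a, b), on in zip(switches, sram_bits):
        if on:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical switch pattern: each pair is one programmable switch.
switches = [
    ("LE0.out", "wire0"),
    ("wire0", "wire1"),
    ("wire1", "LE1.in1"),
    ("wire0", "LE2.in2"),
]
```

Changing only the SRAM bits reroutes the signal: `[1, 1, 1, 0]` connects `LE0.out` to `LE1.in1`, while `[1, 0, 0, 1]` connects it to `LE2.in2`.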
Modern, mid-size FPGA – 2S60
90nm Stratix II 2S60
60,440 Equivalent Logic Elements
2,544,192 Memory Bits
[Figure: die floorplan labeled with Adaptive Logic Modules, M512 Blocks, M4K Blocks, M-RAM Blocks, Digital Signal Processing (DSP) Blocks, Phase-Locked Loops (PLL), High-Speed I/O Channels with Dynamic Phase Alignment (DPA), and I/O Channels with External Memory Interface Circuitry]
FPGA and ASIC Market Dynamics
FPGAs vs. Standard Cell ASICs
Parameter                            FPGA               Standard Cell
CAD Tool Cost                        $2000              $Millions
Mask Cost                            0                  $1.4M US @ 90 nm
Bug Fix                              1 hour             ~10 weeks
Electrical & Optical Check & Debug   Vendor’s Problem   Your Problem!
Time to Market                       Fast               Slow
Die Size                             2X to 20X          1X
Volume Cost                          1X to 20X          1X
Speed                                0.3X to 0.6X       1X
Power                                2X to 5X           1X
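The Mask Cost and Volume Cost rows imply a break-even volume, sketched below. The $10 ASIC unit cost is a hypothetical figure chosen only for illustration, and the model counts the mask set as the only up-front ASIC cost, ignoring CAD tools, engineering time and re-spins.

```python
def break_even_volume(asic_nre, asic_unit_cost, fpga_cost_ratio):
    """Units at which total ASIC cost equals total FPGA cost.

    Returns None when the FPGA unit cost is not higher than the ASIC's,
    because the FPGA is then cheaper at every volume.
    """
    fpga_unit_cost = asic_unit_cost * fpga_cost_ratio
    extra_per_unit = fpga_unit_cost - asic_unit_cost
    if extra_per_unit <= 0:
        return None
    return asic_nre / extra_per_unit


MASK_COST_90NM = 1.4e6   # mask cost quoted on the slide, in dollars
ASIC_UNIT_COST = 10.0    # hypothetical dollars per chip

volume_at_5x = break_even_volume(MASK_COST_90NM, ASIC_UNIT_COST, 5)
volume_at_1x = break_even_volume(MASK_COST_90NM, ASIC_UNIT_COST, 1)
```

Under these assumptions a 5X unit-cost penalty breaks even at 35,000 units, while at 1X the FPGA never loses on cost.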
CMOS Semiconductor Market
2003 Total: $26.0B
Standard Cell 39%
ASSP 37%
Programmable Logic 10%
Standard Logic 6%
Gate Array 5%
Custom IC 3%
Traditional FPGA Users
Std Cell ASIC Development Cost Trend
[Figure: stacked bar chart of Total Development Costs ($M, axis 0 to 45) by process node: 0.18 µm, 0.15 µm, 0.13 µm, 90 nm, 65 nm, 45 nm; components: Masks & Wafers, Software, Test & Product Engineering, Design/Verification & Layout]
Note: Conservative estimate; does not include re-spins.
Result: Declining ASIC Starts
[Figure: Standard Cell/Gate Array Design Starts per year, 1996 to 2007 (axis 0 to 12000)]
Source: Dataquest/Gartner
Today’s “typical design”
EE Times, Aug 23, 2004
New FPGA Users & Products
Designers “Priced Out” Of ASIC
Start-Ups / Risk-Averse
Replacement For DSP
Consumer & Industrial
Broadcast/Audio/Video
Wireless
Industrial, Test & Measurement
Consumer: Displays
Consumer Gadgets
FPGA Technology
FPGAs Need Vertical Integration
Silicon process models & expertise
FPGA architecture
Complete CAD system
Intellectual Property cores
Including soft processors
Embedded software development tools
Silicon Process Knowledge
FPGAs move to latest process very early
Helps close speed, area gap with ASICs in older processes
High volume covers development costs
Foundries use FPGAs as process drivers
Large dies
Both logic and RAM
Regular structures help shake out systematic fab issues
Need good silicon expertise to stay on the bleeding edge of process
90nm 9-layer Interconnect
90nm Transistor Cross-section
[Figure: transistor cross-section labeled Dielectric, Contact, Salicide, Spacer, Poly, Isolation, Diffusion]
FPGA Architecture
Want to improve speed, area & power to
Close gap with ASICs
Stay ahead of competition
Need to ensure device
Is routable
Has right mix of features
Huge problem space
Routing wires, switch pattern, LUT size, RAM types, logic block size, …
Architecting via Virtual Prototyping
[Figure: FPGA Arch. Spec (150 pages) and Params feed an FPGA Database (300M) with Timing, Area Models; Customer Designs, IP and Reference Designs run through FMT Synthesis and FMT Place&Route; Analysis reports Speed & Area, Routability, Power]
[Figure: Critical Path Delay (s), roughly 1.4E-08 to 1.6E-08, vs. Fraction Length 4 Wires (0 to 1), for Length 4/16 and Length 4/8]
Distribution of Designs and Types
computer
automotive
medical
wireline
networking
storage
wireless
Parallel design
Carefully Manage Risk vs. Reward
Can’t Do This Sequentially
[Figure: Concurrent Design linking Circuit Design, Process Technology, FPGA Architecture and Software; the circuit panel shows a logic element schematic with LUT 3 and LUT 4 blocks, adders, registers, carry_in/carry_out, share_in/share_out, reg_cascade_in/reg_cascade_out and control signals aclr[1:0], sclr, aload, sload, ena[2:0], clk[1:0]]
Complete Design Flow: Quartus II
[Figure: Verilog/VHDL source (sample write-control always blocks clocked on wrbusy_int and write0_done) and IP Cores feed Synthesis (3rd-party or Altera), Placement & Routing, Physical Synthesis, Timing & Power Analysis, and the Assembler]
Over 10 Million Lines of Code!
IP Core: Nios II Soft Processor
Three CPU Choices:
Nios II/f, Fast: Optimized for Performance
Nios II/s, Standard: Faster and Smaller than Nios
Nios II/e, Economy: Smallest FPGA Footprint
Choose peripherals you want
SoPC Builder software builds bus interfaces, arbitration etc.
[Figure: Nios II/e, Nios II/s and Nios II/f arranged along an axis from Smaller to Faster]
Soft Processors are Affordable
Nios II/e “economy”: 600 LEs, 13% of a small Cyclone II (4600 LEs); 35¢ in lowest cost FPGA
Nios II/f “fast”: 1800 LEs, 1% of the largest Stratix II (180,000 LEs)
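The percentages on this slide follow directly from the LE counts, as the short check below shows. The `share_of_device` helper is an illustrative name, and the core count is only an upper bound based on logic elements, ignoring memory, routing and I/O limits.

```python
def share_of_device(core_les, device_les):
    """Percentage of a device's logic elements used by one core."""
    return 100.0 * core_les / device_les


economy_share = share_of_device(600, 4600)      # Nios II/e in a small Cyclone II
fast_share = share_of_device(1800, 180_000)     # Nios II/f in the largest Stratix II
fast_cores_that_fit = 180_000 // 1800           # upper bound by LE count alone
```

The economy core works out to about 13% of the small device and the fast core to 1% of the large one, matching the slide.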
Massively Parallel Nios II
Barco Media & Entertainment
Olite 510 LED Display System
Modular LED Display System
100 Nios II Processors per square meter!
FPGA Used:
Structured ASIC Technology
What is a Structured ASIC?
Use fixed masks for most layers
Use customer-specific masks for a few via & metal layers
To customize the logic cells and to route signals between logic cells
Has characteristics between an FPGA and a standard cell ASIC
Faster and smaller than an FPGA
But lower development cost & time than a standard cell ASIC
FPGA to Structured ASIC
[Figure: die stack with Flip-Chip Bumps, Two Metal Layers for Customization (Signal Routing, Configuration Routing), and a Common Base Die holding LEs, Memory, PLLs, DSP Blocks and Internal Routing]
Stratix HardCopy Base Array
Start with the FPGA die
Remove configuration memory & logic
Remove interconnect programmability
Resulting base die: up to 70% smaller
[Figure: die areas labeled Logic Elements, Memory, Configuration Memory & Logic, Interconnect]
Development Cost and Risk
Mask cost reduced vs. Std. Cell
~5 masks instead of ~30
Verification of crosstalk, electromigration etc. much easier than Std. Cell
Since most layers are standard
Same PLLs, I/Os, RAMs and packages as FPGA
Debug your system with an FPGA, then do a drop-in replacement with HardCopy
Can ship systems with FPGA until volume merits going to HardCopy
Can get customer feedback on systems with FPGAs and tweak before going to HardCopy
Identical Operation
EP1S80F1020, 105C, VCC-5%
HC1S80F1020, 105C, VCC-5%
Data Rate 840 Mbps, LVDS
Key selling point for Altera HardCopy
FPGA to HardCopy CAD Flow
[Figure: HDL Code with FPGA Constraints enters the Quartus Stratix II Flow, producing a Stratix II POF; HDL Code with HardCopy Constraints enters the Quartus HardCopy II Flow, producing Handoff Design Files for the HardCopy Design Center; an Equality Checker sits between the two flows]
Same CAD flow & guaranteed equivalence
2nd Generation: HardCopy II
First generation HardCopy
Removed programmability from FPGA
Second generation HardCopy II
Removes programmability
Re-maps logic and DSP blocks to a fabric that is more efficient in a structured ASIC
Larger die size reduction
But more complex CAD flow
Typical results vs. Stratix II
70% die size reduction
60% power reduction
50% speed increase
HardCopy II Logic Remapping
[Figure: section of Stratix II floorplan (Logic ALMs, M4K Block) beside section of HardCopy II floorplan (HCell Macro implementations of ALMs)]
Not drawn to scale; illustration only, not actual Quartus II Floorplan View
DSP Block Remapping
[Figure: Stratix II Floorplan (only DSP Blocks shown) beside HardCopy II Floorplan]
HardCopy II DSP blocks: built as needed using HCell Macros
Can be placed anywhere in the floorplan where HCells exist
Research Challenges
Power
Power Scaling
130 nm and above
FPGAs scaled without regard to power
Got full performance boost of process
90 nm and below
Power-constrained scaling
Low-cost FPGA power budget: ¼ W to 3 W
High-speed FPGA: 2 W to 20 W
Maximum performance within power budget
Process Scaling & Power
Dynamic Power drops per LE
But reduction is less than 50% / LE
Doubling LE count increases power budget
Static Power tends to increase
Use higher Vt, thicker Tox, longer L on non-timing-critical circuitry
If still too high, sacrifice speed by increasing Vt, Tox, L on timing-critical circuitry
Can compensate by making architecture faster
E.g. Larger LUT
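The dynamic power argument is simple arithmetic: if per-LE power falls by less than half while the LE count doubles, total dynamic power rises. In the sketch below the 0.7 ratio (a 30% per-LE drop) is a hypothetical value, not a measured one.

```python
def total_dynamic_power_ratio(per_le_power_ratio, le_count_ratio=2.0):
    """New-node total dynamic power relative to the old node."""
    return per_le_power_ratio * le_count_ratio


break_even = total_dynamic_power_ratio(0.5)   # exactly halving per-LE power
typical = total_dynamic_power_ratio(0.7)      # hypothetical 30% drop per LE
```

Only a full 50% reduction per LE keeps the total flat; anything less pushes the chip's dynamic power up with each doubling of capacity.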
Controlling Power
90 nm
Process parameters
FPGA CAD tools optimize for power
20% dynamic power reduction
Innovate on performance, then trade for Pstatic
E.g. Stratix II ALM: larger LUT
65 & 45 nm
Innovation needed!
32 nm
Process will likely have better static power
Double-gate FETs, high-K gate dielectric
CAD for Power Optimization
Timing-Driven Compiler: Timing Critical? Yes: Min Delay. No: Min Area.
Power-Driven Compiler: Timing Critical? Yes: Min Delay. No: Power Critical? Yes: Min Power. No: Min Area.
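The two decision flows on this slide can be written as one small function. This is only a reading of the flowchart; the function name and the string labels are illustrative, and a real compiler applies such choices per net or per block with far more nuance.

```python
def optimization_goal(timing_critical, power_critical, power_driven=True):
    """Pick the objective applied to one piece of the design."""
    if timing_critical:
        return "min delay"      # timing always wins in both compilers
    if power_driven and power_critical:
        return "min power"      # extra branch of the power-driven compiler
    return "min area"           # default for everything else
```

With `power_driven=False` the function reduces to the timing-driven flow, where non-critical logic is simply optimized for area.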
E.g. Power-Optimized RAM Mapping
1K X 16 RAM
Default Option: 4 1Kx4 M4K RAMs, each supplying 4 of the 16 data bits
Power Efficient Option: 4 256x16 M4K RAMs selected by a 2:4 Decoder
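A behavioral sketch of the two mappings shows why the second saves power: both store the same 1K x 16 data, but the width-sliced version touches all four blocks on every access while the depth-sliced version touches one. The class names are hypothetical, and counting touched blocks is only a stand-in for real power analysis.

```python
DEPTH, WIDTH = 1024, 16


class SlicedByWidth:
    """Default option: four 1Kx4 blocks, each holding one nibble of every word."""

    def __init__(self):
        self.blocks = [[0] * DEPTH for _ in range(4)]

    def write(self, addr, value):
        for i, block in enumerate(self.blocks):
            block[addr] = (value >> (4 * i)) & 0xF

    def read(self, addr):
        value = 0
        for i, block in enumerate(self.blocks):
            value |= block[addr] << (4 * i)
        return value, 4          # (data, number of blocks accessed)


class SlicedByDepth:
    """Power-efficient option: four 256x16 blocks behind a 2:4 decoder."""

    def __init__(self):
        self.blocks = [[0] * (DEPTH // 4) for _ in range(4)]

    def write(self, addr, value):
        # The top two address bits act as the 2:4 decoder's select lines.
        self.blocks[addr >> 8][addr & 0xFF] = value & 0xFFFF

    def read(self, addr):
        return self.blocks[addr >> 8][addr & 0xFF], 1


default_ram, low_power_ram = SlicedByWidth(), SlicedByDepth()
for ram in (default_ram, low_power_ram):
    ram.write(700, 0xBEEF)
```

Both objects return `0xBEEF` from address 700; they differ only in how many blocks were active to produce it.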
E.g. Power-Driven Place & Route
Minimize capacitance of high-toggling signals
Without violating timing constraints
[Figure: nets toggling at 20 Million Toggle/s and 100 Million Toggle/s, before and after Power Optimize]
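The reasoning can be sketched with the usual switching-energy estimate of ½·C·V² per toggle. The capacitances, the 1.2 V supply and the route names are hypothetical, and the timing constraints mentioned above are left out to keep the example short.

```python
from itertools import permutations


def dynamic_power(capacitance_f, toggles_per_s, vdd=1.2):
    """Switching power in watts: each toggle costs 1/2 * C * V^2."""
    return 0.5 * capacitance_f * vdd ** 2 * toggles_per_s


def best_assignment(toggle_rates, route_caps):
    """Try every net-to-route pairing; return (power, pairing) with least power."""
    nets = list(toggle_rates)
    best = None
    for routes in permutations(route_caps, len(nets)):
        power = sum(
            dynamic_power(route_caps[r], toggle_rates[n])
            for n, r in zip(nets, routes)
        )
        if best is None or power < best[0]:
            best = (power, dict(zip(nets, routes)))
    return best


toggle_rates = {"net_a": 100e6, "net_b": 20e6}   # toggles per second, from the slide
route_caps = {"short": 1e-12, "long": 3e-12}     # hypothetical capacitances in farads
power, pairing = best_assignment(toggle_rates, route_caps)
```

The search gives the short, low-capacitance route to the 100 Million Toggle/s net, which halves the total compared with the opposite pairing under these assumed values.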
CAD Scalability
FPGA Logic & Memory Growth
[Figure: dual-axis chart of Logic Elements (K) and Memory Bits (Mbits) by year and process: 1998 (250 nm), 1999 (180 nm), 2000 (180 nm), 2001 (150 nm), 2002 (130 nm), 2004 (90 nm), 2006 (65 nm), 2009 (45 nm)]
FPGA Capacity vs. CPU Speed
30X logic growth from 1998 to 2006
Over 30X memory bits growth
~8X CPU speed increase from 1998 to 2006
FPGA CAD problem growing more rapidly than CPU speed
But productivity of FPGA designers depends on many compiles
To iteratively debug, add features, close timing
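The 30X and 8X figures can be converted to compound annual rates to make the gap concrete. The final ratio assumes CAD run time grows only linearly with design size, which is an optimistic simplification.

```python
def annual_growth(total_factor, years):
    """Compound annual growth factor that yields total_factor after `years`."""
    return total_factor ** (1.0 / years)


YEARS = 2006 - 1998
logic_per_year = annual_growth(30, YEARS)   # FPGA logic capacity
cpu_per_year = annual_growth(8, YEARS)      # CPU speed
work_per_cpu_speed = 30 / 8                 # relative compile effort, linear-time assumption
```

That is roughly 53% capacity growth per year against about 30% CPU speed growth, so even a linear-time tool flow would take about 3.75 times longer on the largest device in 2006 than in 1998.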
Compile Time
Need to find highly scalable algorithms
For placement, routing, synthesis
Do not sacrifice result quality
Future: single processor speed-up will fall further behind FPGA capacity growth
But more cores per chip
Today: 2
2007: 4
Parallel CAD tools, with same result quality?
Need sequentially consistent algorithms, or debugging is a nightmare
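One way to read "same result quality" is that a parallel run must return exactly what a single-threaded run returns. The sketch below shows the idea on a toy cost function, assuming that reading; the weighted-distance metric and the net list are illustrative, and Python threads are used here to show determinism rather than speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def net_cost(net):
    """Weighted Manhattan length of a two-pin net (illustrative metric)."""
    (x1, y1), (x2, y2), weight = net
    return weight * (abs(x1 - x2) + abs(y1 - y2))


def total_cost(nets, workers):
    # pool.map yields results in input order regardless of which thread
    # finishes first, so the floating-point sum is accumulated identically
    # for any worker count.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(net_cost, nets))


nets = [((0, 0), (3, 4), 0.1), ((2, 5), (7, 1), 0.7), ((9, 9), (1, 2), 1.3)]
serial = sum(net_cost(n) for n in nets)
```

Accumulating results in completion order instead would let floating-point rounding differ from run to run, which is the kind of nondeterminism that makes debugging hard.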
Increasing Design Abstraction
FPGA Usage
FPGA design is usually done in Hardware Description Language (HDL)
Limits FPGA use to hardware designers
FPGAs can:
Outperform DSPs
Create custom hardware / software systems that outperform fixed microcontrollers
Usage in these fields limited by unfamiliarity with HDL design
Efficiency vs. Development Cost
[Figure: Power & System Cost* and Development Difficulty & Cost, each on a Low to High scale, for Processor, DSP, FPGA, Struct. ASIC, Std. Cell and Full Custom]
*For applications with significant parallelism
Raising Design Abstraction
Ideal: software engineers can design hardware
C to gates
Not achievable in general
Practical: domain-specific higher-level tools
SoPC builder:
Build a custom microcontroller
Integrate IP cores
C-HAC, Impulse, Celoxica: Hardware accelerator for targeted C code, soft processor for rest
DSP Builder: Convert DSP block diagrams to hardware
Other tools?
Modern FPGA RTL Design Flow
[Figure: design flow chart; labels: Specification, Custom RTL Development, Functional Verification, IP Cores, Design, Third-Party Software, RTL Logic Synthesis, Place-&-Route & Physical Synth., Timing Verification & Debug, Compilation & Optimization, Hardware/Software Debug, Product, Verification]
Extending the Design Flow
[Figure: RTL Design Flow, Back-end Flow, Hardware/Software Debug, Product]
Extending the Design Flow To System Level
[Figure: system-level additions (HW/SW Interface Generation, Higher Level Languages, Embedded Soft Processors, IP Core Reuse, System Integration, Interface Synthesis) added to RTL Design Flow, Back-end Flow, Hardware/Software Debug, Product]
Structured ASIC Architecture
Many questions similar to FPGA
Logic cell, RAM types, structure of custom metal routing layers for best speed, area, power
Metal-programmed answers differ from the FPGA answers
How to keep non-recurring engineering cost low
Few masks?
Cheap masks?
Make custom layers easy to electrically and optically verify?
Clever tricks?
Still have to beat FPGA speed, area, power
And device must be routable