2006 Altera Presentation Template


FPGAs and Structured ASICs
Overview & Research Challenges
Vaughn Betz
Director, Software Engineering
© 2006 Altera Corporation
Agenda
• What is an FPGA?
• FPGA & ASIC market dynamics
• FPGA technology
• Structured ASIC technology
• Research challenges
  • Power
  • Scalable CAD
  • CAD to raise abstraction level
  • Structured ASIC total cost
What is an FPGA?
• Field Programmable Gate Array
  • Gate Array
    • Two-dimensional array of logic gates
    • Traditionally connected with customized metal
    • Every logic circuit (customer) needs a custom-manufactured chip
  • Field Programmable
    • Customized by programming after manufacture
    • One FPGA can serve every customer
• FPGA: re-programmable hardware
Basic Internals of an FPGA
[Figure: an array of logic elements joined by programmable connections]
• Each logic element is programmed to implement the desired function
Embedding a circuit in an FPGA
• All done by the CAD system (e.g. Quartus)
  • Chop the circuit into little pieces of logic
  • Put each piece in a separate logic element (LE)
  • Hook them together with the programmable routing
[Figure: a desired circuit with inputs x, y, z and output f, mapped onto the LEs and I/O pads of an FPGA]
FPGA Logic Element
• Look-Up Table (LUT) + register + extra …
[Figure: a 2-input LUT; SRAM cells feed a multiplexer whose select inputs are A and B, driving Out]
• FPGAs typically use 4-input or larger LUTs
  • Cyclone family (low cost): 4 inputs
  • Stratix II: Adaptive Logic Module implements 4- to 6-input LUTs efficiently
  • Virtex 5: 6 inputs
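A LUT is simply a truth table held in configuration memory and indexed by its inputs. The behavioral Python model below is a minimal sketch under stated assumptions, not Altera's hardware: `LUT` and `program_lut` are hypothetical names, and treating input 0 as the least-significant select bit is an arbitrary choice.

```python
from itertools import product


class LUT:
    """A k-input look-up table: 2**k configuration bits (the SRAM cells)
    selected by the inputs, which act as multiplexer select lines."""

    def __init__(self, k, bits):
        if len(bits) != 2 ** k:
            raise ValueError(f"a {k}-input LUT needs {2 ** k} bits")
        self.k = k
        self.bits = [b & 1 for b in bits]

    def __call__(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        # Assumption: input 0 is the least-significant select bit.
        index = sum((v & 1) << i for i, v in enumerate(inputs))
        return self.bits[index]


def program_lut(k, func):
    """'Program' a LUT: evaluate func on every input combination and
    store the results, i.e. write the truth table into the SRAM cells."""
    bits = [0] * (2 ** k)
    for combo in product((0, 1), repeat=k):
        index = sum(v << i for i, v in enumerate(combo))
        bits[index] = func(*combo) & 1
    return LUT(k, bits)


# Any function of up to k inputs fits in a k-input LUT.
and2 = program_lut(2, lambda a, b: a & b)
f4 = program_lut(4, lambda a, b, c, d: (a & b) ^ (c | d))
print(and2.bits)  # [0, 0, 0, 1]
```

Because the LUT stores every output value, its cost depends only on k, not on the function: this is why larger LUTs absorb more logic per element but need exponentially more SRAM cells.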
Connecting the Logic
[Figure: the circuit pieces placed in LEs, with inputs x, y, z and output f at the I/O pads of the FPGA]
• Logic elements implement the pieces of the circuit
• Now hook them up with the programmable routing
Programmable Routing
• Programmable switches connect fixed metal wires
• Choose the switch pattern so any logic element can connect to any other
[Figure: a logic block with inputs In1 and In2 and output Out, connected to the routing through a programmable switch controlled by an SRAM cell]
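The routing idea can be modeled as fixed wires joined by switches, each switch opened or closed by one configuration bit; two wires are connected when a path of closed switches joins them. This is a minimal sketch: `RoutingFabric` and the wire names (`LE1.out`, `H1`, `V3`, …) are hypothetical, and a real FPGA uses buffered multiplexers and pass transistors rather than an abstract graph.

```python
class RoutingFabric:
    """Fixed wires joined by programmable switches; each switch is
    on or off according to one configuration (SRAM) bit."""

    def __init__(self):
        self.switches = {}  # frozenset({wire_a, wire_b}) -> SRAM bit

    def add_switch(self, a, b):
        if a == b:
            raise ValueError("a switch must join two different wires")
        self.switches[frozenset((a, b))] = 0  # unprogrammed: open

    def program(self, a, b, on=True):
        key = frozenset((a, b))
        if key not in self.switches:
            raise KeyError(f"no switch between {a} and {b}")
        self.switches[key] = 1 if on else 0

    def connected(self, a, b):
        """True if a path of closed switches joins wires a and b
        (union-find over the switches whose SRAM bit is 1)."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for key, bit in self.switches.items():
            if bit:
                u, v = tuple(key)
                parent[find(u)] = find(v)
        return find(a) == find(b)


fabric = RoutingFabric()
for a, b in [("LE1.out", "H1"), ("H1", "V3"), ("V3", "LE2.in"), ("H1", "LE3.in")]:
    fabric.add_switch(a, b)
print(fabric.connected("LE1.out", "LE2.in"))  # False: nothing programmed yet
for a, b in [("LE1.out", "H1"), ("H1", "V3"), ("V3", "LE2.in")]:
    fabric.program(a, b)
print(fabric.connected("LE1.out", "LE2.in"))  # True
```

The switch between `H1` and `LE3.in` stays open, so `LE3.in` remains isolated; the same metal can serve a different connection in another design just by changing the bits.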
Modern, mid-size FPGA – 2S60
[Figure: 90 nm Stratix II 2S60 floorplan showing adaptive logic modules; M512, M4K and M-RAM memory blocks; digital signal processing (DSP) blocks; phase-locked loops (PLLs); I/O channels with external memory interface circuitry; and high-speed I/O channels with dynamic phase alignment (DPA)]
• 60,440 equivalent logic elements
• 2,544,192 memory bits
FPGA and ASIC Market Dynamics
FPGAs vs. Standard Cell ASICs
Parameter                            FPGA               Standard Cell
CAD tool cost                        $2000              $Millions
Mask cost                            0                  $1.4M US @ 90 nm
Bug fix                              1 hour             ~10 weeks
Electrical & optical check & debug   Vendor’s problem   Your problem!
Time to market                       Fast               Slow
Die size                             2X to 20X          1X
Volume cost                          1X to 20X          1X
Speed                                0.3X to 0.6X       1X
Power                                2X to 5X           1X
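The table implies a break-even volume: the FPGA avoids the large one-time cost, while the ASIC wins on per-unit cost once enough units ship. The sketch below works through that arithmetic. Only the $2000 CAD cost, zero FPGA mask cost and $1.4M mask set come from the table; the ASIC CAD cost and both unit costs are hypothetical placeholders, and the model ignores re-spins and time to market.

```python
# Figures taken from the table.
FPGA_NRE = 2_000             # FPGA CAD tool cost; FPGA mask cost is 0
ASIC_MASK_COST = 1_400_000   # standard cell mask set at 90 nm

# Hypothetical figures (NOT from the slide).
ASIC_CAD_COST = 2_000_000    # the table only says "$Millions"
ASIC_UNIT_COST = 10.0        # assumed cost per standard cell die
FPGA_UNIT_COST = 3 * ASIC_UNIT_COST  # assumed 3X, inside the 1X-20X range


def total_cost(nre, unit_cost, volume):
    """One-time (non-recurring) cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(fpga_nre, fpga_unit, asic_nre, asic_unit):
    """Volume above which the standard cell ASIC is cheaper overall."""
    if fpga_unit <= asic_unit:
        raise ValueError("FPGA unit cost must exceed ASIC unit cost to break even")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


asic_nre = ASIC_MASK_COST + ASIC_CAD_COST
volume = break_even_volume(FPGA_NRE, FPGA_UNIT_COST, asic_nre, ASIC_UNIT_COST)
print(f"break-even at about {volume:,.0f} units")
```

With these placeholder numbers the crossover lands near 170,000 units; rising mask and CAD costs at each new node push the crossover higher, which is the trend shown on the following slides.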
CMOS Semiconductor Market
[Figure: pie chart of the 2003 CMOS semiconductor market, total $26.0B]
• Gate array: 5%
• Standard logic: 6%
• Custom IC: 3%
• ASSP: 37%
• Standard cell: 39%
• Programmable logic: 10%
Traditional FPGA Users
Std Cell ASIC Development Cost Trend
[Figure: total development costs ($M, 0 to 45) by process node (0.18 µm, 0.15 µm, 0.13 µm, 90 nm, 65 nm, 45 nm), broken down into masks & wafers, software, test & product engineering, and design/verification & layout]
Note: Conservative estimate; does not include re-spins.
Result: Declining ASIC Starts
[Figure: standard cell/gate array design starts per year (0 to 12,000), 1996 to 2007]
Source: Dataquest/Gartner
Today’s “typical design”
EE Times, Aug 23, 2004
New FPGA Users & Products
• Designers “Priced Out” of ASIC
• Start-Ups / Risk-Averse
• Replacement for DSP
• Consumer & Industrial
Broadcast/Audio/Video
Wireless
Industrial, Test & Measurement
Consumer: Displays
Consumer Gadgets
FPGA Technology
FPGAs Need Vertical Integration
• Silicon process models & expertise
• FPGA architecture
• Complete CAD system
• Intellectual Property cores
  • Including soft processors
• Embedded software development tools
Silicon Process Knowledge
• FPGAs move to the latest process very early
  • Helps close the speed and area gap with ASICs in older processes
  • High volume covers development costs
• Foundries use FPGAs as process drivers
  • Large dies
  • Both logic and RAM
  • Regular structures help shake out systematic fab issues
• Need good silicon expertise to stay on the bleeding edge of process
90nm 9-layer Interconnect
90nm Transistor Cross-section
[Figure: cross-section of a transistor labelled dielectric, contact, salicide, poly with spacers, isolation, and diffusion]
FPGA Architecture
• Want to improve speed, area & power to
  • Close the gap with ASICs
  • Stay ahead of the competition
• Need to ensure the device
  • Is routable
  • Has the right mix of features
• Huge problem space
  • Routing wires, switch pattern, LUT size, RAM types, logic block size, …
Architecting via Virtual Prototyping
[Figure: virtual prototyping flow. Inputs: an FPGA architecture specification (150 pages), architecture parameters, and customer designs, IP and reference designs. Tools: FMT synthesis and FMT place & route with timing and area models, producing an FPGA database (300M). Outputs: analysis of speed & area, routability and power.]
[Figure panel: distribution of designs and types (computer, automotive, medical, wireline, networking, storage, wireless)]
[Figure panel: example analysis of critical path delay (s, about 1.4E-08 to 1.6E-08) vs. fraction of length-4 wires (0 to 1), comparing length 4/16 and length 4/8 routing mixes]
Parallel design
• Carefully manage risk vs. reward
• Can’t do this sequentially
[Figure: concurrent design of process technology, circuit design, FPGA architecture and software. The circuit design panel shows a logic element schematic with LUT 4 and LUT 3 blocks, adders, registers, and ports including aclr[1:0], sclr, aload, carry_in, ena[2:0], sload, share_in, clk[1:0], reg_cascade_in, share_out, carry_out and reg_cascade_out.]
Complete Design Flow: Quartus II
[Figure: the Quartus II flow. Verilog or VHDL source (shown as overlapping write-control code excerpts) and IP cores feed synthesis (3rd-party or Altera), placement & routing, physical synthesis, timing & power analysis, and the assembler, with reports.]
• Over 10 million lines of code!
IP Core: Nios II Soft Processor
• Three CPU choices:
  • Nios II/f (fast): optimized for performance
  • Nios II/s (standard): faster and smaller than the original Nios
  • Nios II/e (economy): smallest FPGA footprint
• Choose the peripherals you want
• SoPC Builder software builds bus interfaces, arbitration, etc.
[Figure: Nios II/e, Nios II/s and Nios II/f on a scale from smaller to faster]
Soft Processors are Affordable
[Figure: Nios II cores placed in a small Cyclone II FPGA and in the largest Stratix II FPGA]
• Small Cyclone II (4600 LEs): Nios II/e “economy” uses 600 LEs, 13% of the FPGA, or 35¢ in the lowest-cost FPGA
• Largest Stratix II (180,000 LEs): Nios II/f “fast” uses 1800 LEs, 1% of the FPGA
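The percentages on this slide follow directly from the LE counts, and the same numbers bound how many soft processors a large device can hold. The short check below uses only the slide's figures; `share_of_device` is a hypothetical helper, and the core-count bound assumes (unrealistically) that LEs are the only limit, ignoring routing, memory and peripherals.

```python
def share_of_device(core_les, device_les):
    """Fraction of an FPGA's logic elements used by one soft processor."""
    return core_les / device_les


# LE counts from the slide.
economy_share = share_of_device(600, 4_600)    # Nios II/e in a small Cyclone II
fast_share = share_of_device(1_800, 180_000)   # Nios II/f in the largest Stratix II

print(f"Nios II/e: {economy_share:.0%} of the Cyclone II")
print(f"Nios II/f: {fast_share:.0%} of the Stratix II")

# Upper bound on Nios II/f cores by LE count alone (assumption: nothing else
# limits the design).
max_fast_cores = 180_000 // 1_800
print(max_fast_cores)  # 100
```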
Massively Parallel Nios II
• Barco Media & Entertainment Olite 510 LED Display System
• Modular LED display system
• 100 Nios II processors per square meter!
• FPGA used:
Structured ASIC Technology
What is a Structured ASIC?
• Use fixed masks for most layers
• Use customer-specific masks for a few via & metal layers
  • To customize the logic cells and to route signals between logic cells
• Has characteristics between an FPGA and a standard cell ASIC
  • Faster and smaller than an FPGA
  • But lower development cost & time than a standard cell ASIC
FPGA to Structured ASIC
[Figure: starting with the FPGA die (logic elements, system logic & memory, configuration memory & logic, and interconnect), the configuration circuitry and programmable interconnect are removed. The resulting Stratix HardCopy base array keeps LEs, memory, PLLs, DSP blocks and internal routing on a common base die, with two metal layers for customization (signal routing and configuration routing) beneath the flip-chip bumps.]
• Resulting base die: up to 70% smaller
Development Cost and Risk
• Mask cost reduced vs. standard cell
  • ~5 masks instead of ~30
• Verification of crosstalk, electromigration, etc. much easier than with standard cell
  • Since most layers are standard
• Same PLLs, I/Os, RAMs and packages as the FPGA
  • Debug your system with an FPGA, then do a drop-in replacement with HardCopy
  • Can ship systems with the FPGA until volume merits going to HardCopy
  • Can get customer feedback on systems with FPGAs and tweak before going to HardCopy
Identical Operation
[Figure: 840 Mbps LVDS measurements from the EP1S80F1020 FPGA and the HC1S80F1020 HardCopy device, both at 105C and VCC-5%]
• Key selling point for Altera HardCopy
FPGA to HardCopy CAD Flow
[Figure: one HDL source compiled by two Quartus flows. The Stratix II flow uses FPGA constraints and produces a Stratix II POF; the HardCopy II flow uses HardCopy constraints and produces handoff design files for the HardCopy Design Center. An equality checker compares the two.]
• Same CAD flow & guaranteed equivalence
2nd Generation: HardCopy II
• First-generation HardCopy
  • Removed programmability from the FPGA
• Second-generation HardCopy II
  • Removes programmability
  • Re-maps logic and DSP blocks to a fabric that is more efficient in a structured ASIC
  • Larger die size reduction
  • But more complex CAD flow
• Typical results vs. Stratix II
  • 70% die size reduction
  • 60% power reduction
  • 50% speed increase
HardCopy II Logic Remapping
[Figure: a section of the Stratix II floorplan (logic ALMs and an M4K block) beside a section of the HardCopy II floorplan, where the ALMs are implemented as HCell macros. Not drawn to scale; illustration only, not an actual Quartus II floorplan view.]
DSP Block Remapping
[Figure: Stratix II floorplan (only DSP blocks shown) beside the HardCopy II floorplan]
• DSP blocks are built as needed using HCell macros
• They can be placed anywhere in the floorplan where HCells exist
Research Challenges
Power
Power Scaling
• 130 nm and above
  • FPGAs scaled without regard to power
  • Got the full performance boost of the process
• 90 nm and below
  • Power-constrained scaling
    • Low-cost FPGA power budget: ¼ W to 3 W
    • High-speed FPGA: 2 W to 20 W
  • Maximum performance within the power budget
Process Scaling & Power
• Dynamic power drops per LE
  • But the reduction is less than 50% per LE
  • So doubling the LE count increases the power budget
• Static power tends to increase
  • Use higher Vt, thicker Tox, longer L on non-timing-critical circuitry
  • If still too high, sacrifice speed by increasing Vt, Tox, L on timing-critical circuitry
  • Can compensate by making the architecture faster, e.g. larger LUTs
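Why a per-LE drop of less than 50% still raises total power can be seen from the textbook CMOS dynamic power relation P = activity × C × V² × f. The scaling factors below (30% less capacitance per LE, 1.2 V to 1.0 V, 20% faster clock) are purely hypothetical illustrations, not Altera process data.

```python
def dynamic_power(c, v, f, activity=1.0):
    """Textbook CMOS dynamic power: P = activity * C * V^2 * f."""
    return activity * c * v ** 2 * f


# Hypothetical one-node shrink (illustrative numbers only):
# 30% less switched capacitance per LE, 1.2 V -> 1.0 V, 20% higher clock.
old_le = dynamic_power(c=1.0, v=1.2, f=1.0)
new_le = dynamic_power(c=0.7, v=1.0, f=1.2)

per_le_ratio = new_le / old_le   # below 1, but above 0.5
chip_ratio = per_le_ratio * 2    # the new device has twice the LEs
print(f"per-LE power: {per_le_ratio:.2f}x, whole chip: {chip_ratio:.2f}x")
```

Per-LE power falls to about 0.58X, yet the twice-as-large device draws about 1.17X the dynamic power, before any rise in static power is counted.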
Controlling Power
• 90 nm
  • Process parameters
  • FPGA CAD tools optimize for power
    • 20% dynamic power reduction
  • Innovate on performance, then trade for Pstatic
    • E.g. Stratix II ALM: larger LUT
• 65 & 45 nm
  • Innovation needed!
• 32 nm
  • Process will likely have better static power
    • Double-gate FETs, high-K gate dielectric
CAD for Power Optimization
[Figure: decision flowcharts for the two compilers]
• Timing-driven compiler: if timing critical, minimize delay; otherwise minimize area
• Power-driven compiler: if timing critical, minimize delay; otherwise, if power critical, minimize power; otherwise minimize area
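The two compiler flows on this slide reduce to a small decision function. The sketch below encodes it; `Goal`, `optimization_goal`, the netlist pieces and the slack and toggle-rate thresholds used to decide "critical" are all hypothetical, since the slide does not say how criticality is measured.

```python
from enum import Enum


class Goal(Enum):
    MIN_DELAY = "minimize delay"
    MIN_POWER = "minimize power"
    MIN_AREA = "minimize area"


def optimization_goal(timing_critical, power_critical, power_driven):
    """Pick the cost to minimize for one piece of the netlist,
    following the timing-driven and power-driven compiler flows."""
    if timing_critical:
        return Goal.MIN_DELAY   # both compilers protect speed first
    if power_driven and power_critical:
        return Goal.MIN_POWER   # only the power-driven compiler asks this
    return Goal.MIN_AREA


# Hypothetical netlist pieces: (name, slack in ns, toggle rate in MHz).
# The thresholds (0.5 ns, 10 MHz) are illustrative assumptions.
pieces = [("alu", 0.1, 50.0), ("bus_mux", 2.0, 100.0), ("config_reg", 5.0, 0.01)]
for name, slack, toggle in pieces:
    goal = optimization_goal(slack < 0.5, toggle > 10.0, power_driven=True)
    print(name, goal.value)
```

Checking timing first is what keeps the power-driven compiler from slowing the design: power is only traded off where there is slack to spend.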
E.g. Power-Optimized RAM Mapping
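[Figure: two ways to build a 1K × 16 RAM from M4K blocks]
• Default option: four 1K × 4 M4K RAMs, each supplying 4 of the 16 data bits, so all four blocks are active on every access
• Power-efficient option: four 256 × 16 M4K RAMs with a 2:4 decoder on the upper address bits, so only one block is active per access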
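The two RAM mappings can be compared with a small behavioral model that stores identical data and counts how many M4K blocks each access enables. This is a sketch under stated assumptions: the class names are hypothetical, and "block activations" stands in for dynamic power without modeling real M4K energy per access.

```python
M4K_BITS = 4096  # an M4K block holds 4 Kbits: 1K x 4 or 256 x 16


class DefaultMapping:
    """1K x 16 RAM as four 1K x 4 M4Ks: each block stores 4 of the 16
    data bits, so every access enables all four blocks."""

    def __init__(self):
        self.blocks = [[0] * 1024 for _ in range(4)]
        self.block_activations = 0

    def write(self, addr, data):
        for i, block in enumerate(self.blocks):
            block[addr] = (data >> (4 * i)) & 0xF
        self.block_activations += 4

    def read(self, addr):
        self.block_activations += 4
        return sum(block[addr] << (4 * i) for i, block in enumerate(self.blocks))


class PowerEfficientMapping:
    """1K x 16 RAM as four 256 x 16 M4Ks: a 2:4 decoder on address
    bits 9..8 enables exactly one block per access."""

    def __init__(self):
        self.blocks = [[0] * 256 for _ in range(4)]
        self.block_activations = 0

    @staticmethod
    def decode_2to4(addr):
        sel = (addr >> 8) & 0b11
        return [i == sel for i in range(4)]  # one-hot block enables

    def write(self, addr, data):
        enables = self.decode_2to4(addr)
        self.blocks[enables.index(True)][addr & 0xFF] = data & 0xFFFF
        self.block_activations += sum(enables)

    def read(self, addr):
        enables = self.decode_2to4(addr)
        self.block_activations += sum(enables)
        return self.blocks[enables.index(True)][addr & 0xFF]


default, efficient = DefaultMapping(), PowerEfficientMapping()
addresses = range(0, 1024, 37)
for addr in addresses:
    value = (addr * 2654435761) & 0xFFFF  # arbitrary 16-bit test pattern
    default.write(addr, value)
    efficient.write(addr, value)
same = all(default.read(a) == efficient.read(a) for a in addresses)
print(same, default.block_activations, efficient.block_activations)
```

Both mappings use the same four M4Ks and return the same data; the decoder costs a little logic but cuts the number of enabled RAM blocks per access by 4X.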
E.g. Power-Driven Place & Route
• Minimize capacitance of high-toggling signals
  • Without violating timing constraints
[Figure: placement of nets toggling at 20 million and 100 million toggles/s, before and after power optimization]
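A power-driven placer can weight each net's capacitance by its toggle rate and pick the cheapest placement that still meets timing. The sketch below shows that cost function on three candidate placements. The toggle rates come from the slide; the net names, capacitance per unit length, supply voltage, length limits and the "length as delay" timing stand-in are all hypothetical simplifications.

```python
# Hypothetical electrical constants, for illustration only.
CAP_PER_UNIT = 0.1e-12   # farads of wire capacitance per unit of wirelength
VDD = 1.2                # volts

TOGGLE_RATE = {"net_slow": 20e6, "net_fast": 100e6}  # toggles/s, from the slide
MAX_LENGTH = {"net_slow": 10, "net_fast": 10}         # hypothetical timing limits


def net_power(length, toggles_per_s):
    """Each toggle charges or discharges the net: P = 0.5 * C * V^2 * rate."""
    return 0.5 * CAP_PER_UNIT * length * VDD ** 2 * toggles_per_s


def placement_power(lengths):
    return sum(net_power(lengths[n], TOGGLE_RATE[n]) for n in lengths)


def meets_timing(lengths):
    # Stand-in for timing analysis: assume net delay grows with length.
    return all(lengths[n] <= MAX_LENGTH[n] for n in lengths)


def pick_placement(candidates):
    """Lowest-power candidate among those that still meet timing."""
    legal = {name: l for name, l in candidates.items() if meets_timing(l)}
    return min(legal, key=lambda name: placement_power(legal[name]))


candidates = {
    "A": {"net_slow": 2, "net_fast": 8},    # fast-toggling net is long
    "B": {"net_slow": 8, "net_fast": 2},    # fast-toggling net is short
    "C": {"net_slow": 12, "net_fast": 1},   # lowest power, but fails timing
}
print(pick_placement(candidates))
```

Placement C would burn the least power but violates the (hypothetical) timing limit on the slow net, so the placer settles for B: the 100 million toggle/s net gets the short wire.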
CAD Scalability
FPGA Logic & Memory Growth
[Figure: FPGA logic elements (K) and memory bits (Mbits) by year and process node: 1998 (250 nm), 1999 (180 nm), 2000 (180 nm), 2001 (150 nm), 2002 (130 nm), 2004 (90 nm), 2006 (65 nm), 2009 (45 nm)]
FPGA Capacity vs. CPU Speed
• 30X logic growth from 1998 to 2006
  • Over 30X memory bits growth
• ~8X CPU speed increase from 1998 to 2006
• FPGA CAD problem growing more rapidly than CPU speed
• But productivity of FPGA designers depends on many compiles
  • To iteratively debug, add features, close timing
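The gap between 30X design growth and ~8X CPU speed translates directly into longer compiles. The estimate below uses the slide's growth factors; the runtime exponent is an assumption, since placement and routing runtimes often grow faster than linearly with design size, and 1.2 is only a hypothetical example.

```python
def relative_compile_time(design_growth, cpu_speedup, exponent=1.0):
    """Compile time relative to the 1998 baseline if runtime grows as
    (design size) ** exponent and shrinks with CPU speed-up."""
    return design_growth ** exponent / cpu_speedup


# Growth factors from the slide: ~30X logic, ~8X CPU speed (1998 to 2006).
linear = relative_compile_time(30, 8)
superlinear = relative_compile_time(30, 8, exponent=1.2)  # hypothetical exponent
print(f"linear: {linear:.2f}x, n^1.2: {superlinear:.1f}x")
```

Even the optimistic linear case gives compiles 3.75X longer than in 1998, which is why each design iteration costs the designer more.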
Compile Time
• Need to find highly scalable algorithms
  • For placement, routing, synthesis
  • Do not sacrifice result quality
• Future: single-processor speed-up will fall further behind FPGA capacity growth
  • But more cores per chip
    • Today: 2
    • 2007: 4
• Parallel CAD tools, with the same result quality?
  • Need sequentially consistent algorithms, or debugging is a nightmare
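One common way to get a sequentially consistent parallel algorithm is to evaluate candidate moves in parallel against a frozen snapshot, then commit them serially in a fixed order, so the result never depends on thread count or timing. The toy 1-D placer below illustrates that general pattern; it is not Quartus's algorithm, every name in it is hypothetical, and CPython threads will not actually speed up this pure-Python work, so it shows the structure rather than the performance.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def wirelength(pos, nets):
    """Toy placement cost: cells in 1-D slots, sum of net spans."""
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def swap_delta(pos, nets, a, b):
    """Cost change if cells a and b swap slots (pos is not modified)."""
    trial = list(pos)
    trial[a], trial[b] = trial[b], trial[a]
    return wirelength(trial, nets) - wirelength(pos, nets)


def parallel_place(num_cells, nets, batches, batch_size, workers, seed=0):
    rng = random.Random(seed)        # all randomness stays on the main thread
    pos = list(range(num_cells))     # pos[cell] = slot
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(batches):
            moves = [tuple(rng.sample(range(num_cells), 2))
                     for _ in range(batch_size)]
            snapshot = tuple(pos)    # read-only state for this batch
            # Parallel phase: score each proposed swap against the snapshot.
            # pool.map returns results in input order regardless of timing.
            deltas = list(pool.map(lambda m: swap_delta(snapshot, nets, *m), moves))
            # Serial phase: commit in fixed order, re-checking each promising
            # move against the current state so earlier commits are respected.
            for (a, b), delta in zip(moves, deltas):
                if delta < 0 and swap_delta(pos, nets, a, b) < 0:
                    pos[a], pos[b] = pos[b], pos[a]
    return pos


nets = [(i, (7 * i + 3) % 20) for i in range(20)]
one = parallel_place(20, nets, batches=50, batch_size=8, workers=1)
four = parallel_place(20, nets, batches=50, batch_size=8, workers=4)
print(one == four, wirelength(four, nets))
```

The run with one worker and the run with four produce the same placement, so a bug seen on one machine reproduces on any other.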
Increasing Design Abstraction
FPGA Usage
• FPGA design is usually done in a Hardware Description Language (HDL)
  • Limits FPGA use to hardware designers
• FPGAs can:
  • Outperform DSPs
  • Create custom hardware/software systems that outperform fixed microcontrollers
• Usage in these fields is limited by unfamiliarity with HDL design
Efficiency vs. Development Cost
[Figure: power & system cost* vs. development difficulty & cost, each from low to high, for processor, DSP, FPGA, structured ASIC, standard cell and full custom implementations]
*For applications with significant parallelism
Raising Design Abstraction
• Ideal: software engineers can design hardware
  • C to gates
  • Not achievable in general
• Practical: domain-specific higher-level tools
  • SoPC Builder:
    • Build a custom microcontroller
    • Integrate IP cores
  • C-HAC, Impulse, Celoxica: hardware accelerator for targeted C code, soft processor for the rest
  • DSP Builder: convert DSP block diagrams to hardware
  • Other tools?
Modern FPGA RTL Design Flow
[Figure: modern FPGA RTL design flow, from specification through custom RTL development, IP cores, functional verification, RTL logic synthesis, place-&-route & physical synthesis (compilation & optimization), timing verification & debug, and hardware/software debug to product verification; third-party software is also shown]
Extending the Design Flow
[Figure: RTL design flow and back-end flow leading to hardware/software debug and the product]
Extending the Design Flow To System Level
[Figure: system-level extensions added to the RTL design flow and back-end flow: HW/SW interface generation, higher-level languages, embedded soft processors, IP core reuse, system integration and interface synthesis; followed by hardware/software debug and the product]
Structured ASIC Architecture
• Many questions similar to FPGA
  • Logic cell, RAM types, structure of custom metal routing layers for best speed, area, power
  • Metal programmed, so the answers differ from FPGA
• How to keep non-recurring engineering cost low?
  • Few masks?
  • Cheap masks?
  • Make custom layers easy to electrically and optically verify?
  • Clever tricks?
• Still have to beat FPGA speed, area, power
  • And the device must be routable