FPGAs The Superior Solution to Custom Logic November, 1998

Challenges and opportunities
for FPGA platforms
Ivo Bolsens
Xilinx Research Labs
Thanks to
• Bill Carter
• David Eden
• Erich Goetting
• Alireza Kaviani
• Bernie New
• Cameron Patterson
• Steve Trimberger
• Tim Tuan
Xilinx Confidential
Overview
• FPGAs ride the tide
• Opportunities
• Challenges
ASICs buck the tide, FPGAs ride the tide
• Process Technology
• Performance
• Architecture
• Cost
• Flexibility
• Market trends
Moore’s Law
A tale of two numbers: what process people don’t tell you
[Figure: transistor cross-section labeling gate, source, drain, substrate, channel leakage, and gate leakage; CD values 320nm, 240nm, 160nm, 80nm; Tox values 1.3, 2.7, 4.5, 6.5 (nm)]
Trend: Line Widths Smaller Than the Wavelength of Light
[Figure: process geometry (micron, 0.100 to 0.700) versus year (1988 to 2002); curves: Optical Processing Wavelength, Process Geometry]
Painting a one cm line with a three cm brush…
Courtesy: IBM
Gate Oxide
[Figure: polysilicon gate on gate oxide on silicon crystal]
• About 10 molecular layers of SiO2 for this 150nm example
• 90nm technology is about half the thickness
FPGAs are ahead of the curve
[Figure: process technology feature size (nm; 350, 250, 180, 150, 130, 100, 70) versus year (97 to 05); curves: SIA Roadmap, Xilinx; annotations: Virtex-II FPGA to market 1 year earlier; Cu/Low-K; Xilinx is developing 90nm in 2002]
Where are we today
[Figure: XC2V8000 (= 350M transistors) and XC2VP125 compared on logic cells, Block RAM, multipliers, 840Mb/s LVDS, 3.125Gb/s MGTs, and PowerPC CPUs; values shown: 105K, 125K, 3Mb, 10Mb, 168, 556, 340, 442, 24, 4]
FPGAs are leading
Intel’s Roadmap
Source : Intel
Gate count requirement for ASICs
[Figure: pie chart of ASIC designs by gate count, categories <3M, 3-5M, >5M; shares shown: 7, 20, 73. Source: IMS]
FPGAs can address a very large part of the ASIC market today
Performance requirement for ASICs
[Figure: pie chart of ASIC designs by clock rate, categories <100MHz, 100-200MHz, >200MHz; shares shown: 10.8, 53.9, 38.5. Source: IMS 2000]
FPGAs can address a very large part of the ASIC market today
A Decade of Progress
[Figure: capacity, speed, and price on a log scale (1x to 1000x) versus year (1/91 to 1/01); device families marked: XC4000, Spartan, Virtex & Virtex-E (excl. Block RAM), Virtex-II (excl. Block RAM)]
The Cost/Volume Crossover
[Figure: relative cost (log scale, 0.1 to 1000) versus unit volume (10 to 1,000K); curves: ASIC Cost, FPGA Cost]
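The crossover in this chart can be sketched with a simple amortization model. This is only an illustration: the model (one-time NRE spread over the volume plus a recurring per-unit price) and every dollar figure below are assumptions, not values taken from the slide.

```python
def unit_cost(nre, per_unit, volume):
    """Effective cost per unit: one-time NRE amortized over the volume,
    plus the recurring per-unit price."""
    return nre / volume + per_unit


def crossover_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume at which total ASIC cost equals total FPGA cost.

    Solves nre_asic + unit_asic * v == nre_fpga + unit_fpga * v for v.
    """
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Hypothetical numbers, chosen only to show the shape of the curves.
NRE_ASIC, UNIT_ASIC = 1_000_000.0, 10.0   # high up-front cost, cheap parts
NRE_FPGA, UNIT_FPGA = 0.0, 60.0           # no mask cost, pricier parts

if __name__ == "__main__":
    v = crossover_volume(NRE_ASIC, UNIT_ASIC, NRE_FPGA, UNIT_FPGA)
    print(f"crossover at about {v:,.0f} units")
    for volume in (100, 1_000, 10_000, 100_000):
        asic = unit_cost(NRE_ASIC, UNIT_ASIC, volume)
        fpga = unit_cost(NRE_FPGA, UNIT_FPGA, volume)
        print(f"{volume:>7} units: ASIC {asic:>10.2f}  FPGA {fpga:>7.2f}")
```

Below the crossover volume the FPGA is cheaper per unit; above it the ASIC wins. Lower FPGA unit prices or higher ASIC NRE both push the crossover to higher volumes.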
Are Transistors Free?
[Figure: 1 pin compared with 10,000 transistors (10,000X)]
Performance Scaling
[Figure: delay (ps, 0 to 45) versus line width (nm, 0 to 1000); curves: Gate Delay, Wire Delay (Al), Total Delay (Al), Wire Delay (Cu+Low k), Total Delay (Cu+Low k). Source: ITRS]
Localisation of storage and computing
[Figure: arrays of adders (+) and store elements drawn at pitch l and at pitch l/2, with scaling relations for heat/area (in terms of V and l) and Tconnect (nsec) (in terms of l). Courtesy: IMEC]
Market Requirements
[Figure: devices per human (0.01, 1, >100 # / human) versus decade (60 to 10) for Mainframe, PC, and Smart Things; compute power trend annotated with +DSP, +Communications, +Ambient Intelligence]
Electronics Industry Dynamics
[Figure: market size ($) versus time for VCR, Cable Decoders, Smart cards, Digital VCR, Satellite/Cable + Digital, and Residential Gateway (Broadband access), each carrying a growing set of standards (NTSC, DES, ATAPI, DBS, DOCSIS, HomePNA, HomeRF, HomePLUG, Bluetooth, Hiperlan2, DSL...); custom features (Pay-Per-View); annotation: dramatic increase in new standards]
• New Products
– Take less time to reach high volumes
– Shorter Product Life Cycles
– Many standards / More Interoperability
Complex ASIC Design
The Shrinking Window of Innovation
[Figure: pie chart of ASIC design tasks]
– Design authoring 20%
– Place and Route 17%
– Synthesis 16%
– Simulation 14%
– Gate Simulation 7%
– Simulation 5%
– Extraction 5%
– Floorplanning 5%
– Static Timing Analysis 5%
– Interconnect Power Analysis 3%
– Transistor Analysis 3%
• Average iterations between design and layout = 20
(Source: Electronic Systems, Jan 99)
Simpler/Faster Design Flows
[Figure: ASIC flow with stages Spec, Design and Verification, Silicon Prototype, System Integration, and Silicon Production, with a Design Freeze marker; FPGA flow with stages Spec, Design and Verification, and System Integration, with a Design Freeze marker]
• 2:1 proven Time-to-Market Advantage
• No silicon design or verification steps
• More design flexibility through later design freeze
Today’s Product Lifecycle
[Figure: profit versus time; profit for first to market compared with reduced profit for latecomers]
• 37% of new digital products were late to market
• Entering the market first can result in up to a 40% greater total profit contribution over the product’s life vs. the #2 entrant
Today’s Product Lifecycle
[Figure: profit versus time; IRL extends product life in market]
Virtex-II Pro PowerPC Technology
[Figure: PPC block diagram: Fetch & Decode, I-Cache 16KB, D-Cache 16KB, MMU, Timers and Debug Logic, Execution Unit (32x32b GPR, ALU, MAC)]
IBM PowerPC™ 405 RISC CPU
• 32-bit RISC CPU, Harvard Architecture
• 130nm CMOS with 1.5V Operation
• 456 Dhrystone MIPS at 300MHz
• 32 x 32-bit General Purpose Registers
• Hardware Multiply / Divide
• 5-Stage Execution Pipeline
• 16KB D-Cache, 16KB I-Cache
• Memory Management Unit (MMU)
• High-Bandwidth Interface to Logic
• Built-In Hardware Timers
• Built-In JTAG Debug and Trace support
3.8 sq mm = 1% of 2VP100
High Performance
[Figure: bar chart of Dhrystone MIPS (axis 100 to 1600); Virtex-II Pro PowerPC 405: 1 CPU 456, 2 CPUs 912, 4 CPUs 1824; Altera Excalibur ARM9: 220]
“Low PowerPC”: 0.59mW/MIPS
• Full-Custom IBM CPU Design
• 1.5V 130nm CMOS Technology
• Low-K Dielectric
• IP-Immersion
[Figure: power (mW, 0 to 400) versus performance (Dhrystone MIPS, 0 to 400); annotation: 100mW = 1 LED Indicator …or 169 MIPS!]
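The headline numbers on the last two slides follow from simple arithmetic, sketched here as a check. The constants come from the slides; the function names are illustrative, and the linear multi-CPU scaling is the idealized reading of the bar chart, not a measured result.

```python
MW_PER_MIPS = 0.59      # "Low PowerPC" figure from the slide
DMIPS_PER_CPU = 456     # Dhrystone MIPS of one PowerPC 405 at 300MHz


def mips_for_budget(power_mw):
    """Dhrystone MIPS available within a given power budget (mW)."""
    return power_mw / MW_PER_MIPS


def power_for_mips(mips):
    """Power (mW) drawn at a given Dhrystone MIPS rating."""
    return mips * MW_PER_MIPS


def aggregate_dmips(n_cpus):
    """Peak aggregate DMIPS, assuming ideal linear scaling across CPUs."""
    return n_cpus * DMIPS_PER_CPU


if __name__ == "__main__":
    # 100 mW, the budget of one LED indicator, buys about 169 MIPS.
    print(int(mips_for_budget(100)))
    # One CPU at full speed draws roughly 269 mW.
    print(round(power_for_mips(DMIPS_PER_CPU)))
    # 1, 2 and 4 CPUs give 456, 912 and 1824 DMIPS.
    print([aggregate_dmips(n) for n in (1, 2, 4)])
```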
IP-Immersion
Embed multiple IP blocks of arbitrary shape with high-bandwidth connectivity to FPGA core logic, memory & I/O
Technologies Enabling IP-Immersion
• Active Interconnect™
• Segmented Routing
• Metal ‘Headroom’
[Figure: layer stack from Silicon Substrate and Poly through Metal 1 to Metal 9, with an advanced hard-IP block (e.g. PowerPC CPU, labeled PPC) embedded in the stack]
System Architecture Options
• “Logic-Centric Architecture”
– PowerPC Executes Entirely out of Cache
– No FPGA Logic, Memory, or I/O Used
– 10-20 Pages of C-Code or More
– Use as Complex Algorithmic Engine
  • Web Server
  • Encryption/Decryption
  • Packet Processor
• “CPU-Centric Architecture”
– PowerPC forms Heart of Embedded System
– On & Off-Chip Peripherals
– External Interfaces
  • e.g. PCI, 3GIO, Gb Ethernet, ZBT SRAM
– CoreConnect™ On-Chip Bus
  • Ties System Together
– Peripherals implemented in FPGA Logic
– Typically Runs Embedded OS
[Figure: for each option, the PPC shown with External Devices and External Interfaces]
HW acceleration
PowerPC with Application-Specific Hardware Acceleration
[Figure: Virtex-II Pro with a C++ code stack and control tasks on the PowerPC processor and RAM, next to a concatenated FEC engine (Viterbi, Interleaver, Reed-Solomon); processing-time bars compare Traditional processing (Control Tasks, Viterbi, Interleave, Reed-Solomon) with The Virtex-II Pro Advantage, XTREME Processing™ (Control Tasks)]
HW/SW Interfacing
[Figure: PowerPC core (Fetch & Decode, Timers and Debug Logic, I-Cache 16KB, D-Cache 16KB, MMU, Execution Unit with 32x32b GPR, ALU, MAC) connected through BlockRAMs to Acceleration Logic over 6.4Gb/sec links]
• Provides Specialized Connectivity Between PowerPC & FPGA Logic
• Dual-Port BlockRAM Memory
– CPU & Logic Each Own 1 Port
• High-Bandwidth
– 6.4Gb/sec
• Low-Latency
• Non-Caching
– Designed for Communications Data Processing
• Enables PowerPC & FPGA Logic to Work together on Complex Problems
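The dual-port arrangement described above, where the CPU and the logic each own one port onto the same memory, can be pictured with a small software model. This is a toy sketch only: the class names, the word layout, and the flag-word handshake are hypothetical and are not taken from the slides or from any Xilinx interface.

```python
class Port:
    """One access port onto a shared memory array (32-bit words)."""

    def __init__(self, words):
        self._words = words

    def read(self, addr):
        return self._words[addr]

    def write(self, addr, value):
        self._words[addr] = value & 0xFFFFFFFF


class DualPortRam:
    """Toy dual-port memory: two independent ports, one storage array."""

    def __init__(self, depth):
        words = [0] * depth
        self.port_a = Port(words)   # owned by the CPU in this sketch
        self.port_b = Port(words)   # owned by the FPGA logic in this sketch


# Hypothetical layout: word 0 holds the payload length (0 means empty),
# the payload itself starts at word 1.
FLAG_ADDR = 0
DATA_BASE = 1


def logic_produce(port, payload):
    """Logic side: write the payload first, then publish its length."""
    for i, word in enumerate(payload):
        port.write(DATA_BASE + i, word)
    port.write(FLAG_ADDR, len(payload))


def cpu_consume(port):
    """CPU side: return the payload if one is ready, else None."""
    n = port.read(FLAG_ADDR)
    if n == 0:
        return None
    data = [port.read(DATA_BASE + i) for i in range(n)]
    port.write(FLAG_ADDR, 0)    # hand the buffer back to the logic side
    return data


if __name__ == "__main__":
    ram = DualPortRam(depth=16)
    logic_produce(ram.port_b, [0xDEAD, 0xBEEF])
    print(cpu_consume(ram.port_a))
```

Because the memory is non-caching in the slide's description, a read on one port sees what the other port last wrote, which is what makes a simple flag-word handshake like this workable.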
APU Controller
• Micro-controller style interface to fabric for control plane applications
• Benefits:
– Up to 10x faster than memory mapped interface
– Saves PLB bandwidth for code execution
– Minimizes pipeline stalls
[Figure: Processor Block containing the 405 Core, PLB, and APU Controller, connected to a Hardware Coprocessor]
Creating Complete Communications Solutions
[Figure: Gb Ethernet (1000BaseLX/SX/CX) protocol stack, TCP, IP, MAC, PHY, mapped onto the device]
• Upper Layers on PowerPC (ftp, telnet, rlogin, mail, etc)
• TCP/IP Stack on PowerPC
• Link Layer in FPGA Logic (GbE MAC)
• RocketIO is PHY (1000Base-SX/LX)
Infiniband Example
CPU Makes Communications Practical, Easier, & Cheaper
[Figure: InfiniBand TCA built with CPU + fabric …or built with fabric only; CPU Based Solution: 8 Times Less Area. Sources: Intel, Xilinx]
Configurable Platform
Specify System Architecture
Create System Architecture
Define Addresses
Configure Peripherals
The MicroBlaze™
High Performance Soft CPU
[Figure: PPC 405 blocks (32-Bit RISC, 130nm Process, 300+ MHz Core, 420 D MIPS) with UART, Interrupt Controller, and Arbiter on a Local OPB Bus, using CoreConnect™ Technology]
Incremental Design
lessens the impact of design changes
– “Next Generation” technology
– Easy set-up through floorplanning along HDL hierarchy boundaries
– Changes only affect the module that was changed
– The remainder of the design stays locked and intact
– Timing repeatability
  • preserves routing
– Faster turnaround for localized design changes
Partial Reconfigurability
FPGA Flexibility for the Field
• Re-program part of an FPGA while it’s still running
• Virtex-II and Virtex-E
[Figure: bitstream (011011) loaded into PR Logic regions placed between Fixed Logic regions, with User Definable Boundaries]
System Exploration
[Figure: generic system block diagram with bus, line, system interface, and payload processing blocks; Tx path: Payload Assembly, Payload Qualify, Data Format, Line Coding; Rx path: Payload Buffer, Payload Quality, Data Alignment, Line Decoding]
Traditional Architecture
[Figure: generic design built around a Motorola PowerQUICC MPC860 processor (U-Bus, Memory Interface, CPM) with RAM, FLASH, and EEPROM on the µP Bus, an AAL5 Processor, G704 Framer, G703 LIU, and a PCI Bridge Device to the PCI Bus and Other Peripherals; regions labeled System Interfaces and System Payload Processing; Tx path: Payload Assembly, Payload Qualify, Data Format, Line Coding; Rx path: Payload Buffer, Payload Quality, Data Alignment, Line Decoding]
CPM = Communications Processor Module
Traditional Architecture
[Figure: generic design built around a Motorola PowerQUICC MPC860 processor (U-Bus, Memory Interface, CPM) with RAM, FLASH, and EEPROM on the µP Bus, an AAL5 Processor, G704 Framer, G703 LIU, and a PCI Bridge Device to the PCI Bus and Other Peripherals, with the Data Direction marked; Tx path: Payload Assembly, Payload Qualify, Data Format, Line Coding; Rx path: Payload Buffer, Payload Quality, Data Alignment, Line Decoding]
CPM = Communications Processor Module
Optimized Architecture
[Figure: generic design with an FPGA Boundary marked; blocks: PowerPC Processor, MicroB Processor, Dual Port Block RAM, Fast I/F FIFO, Memory Interface, RAM, FLASH, EEPROM, G704 Framer, G703 LIU, PCI Bridge Device, PCI Bus, µP Bus, Other Peripherals; regions labeled System Interfaces and Payload Processing; Tx path: Payload Assembly, Payload Qualify, Data Format, Line Coding; Rx path: Payload Buffer, Payload Quality, Data Alignment, Line Decoding]
Interconnect and power
Source: Bill Daly
Interconnect and performance
Source: Bill Daly
Power Analysis
• Typical design
– 5.9uW/CLB/MHz [FPGA00]
– Fabric power is ~69% of total power
– 2V6000 = 5.9uW/CLB/MHz × 8448 CLBs × 100MHz ÷ 69% = 7.5W
[Figure: pie chart of power by resource: Fabric 69%, BRAM 13%, Mult 11%, IOB 7%]
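The 2V6000 estimate can be reproduced as a short calculation, reading it as fabric power scaled up by the fabric's 69% share of the total. The constants are the slide's; the function names are illustrative. Worked through directly, these inputs give a total of about 7.2W, slightly under the 7.5W quoted on the slide.

```python
UW_PER_CLB_MHZ = 5.9    # fabric power per CLB per MHz, in microwatts
CLBS_2V6000 = 8448      # CLB count used on the slide
CLOCK_MHZ = 100
FABRIC_SHARE = 0.69     # fabric power as a fraction of total power


def fabric_power_w(uw_per_clb_mhz, clbs, clock_mhz):
    """Fabric power in watts from the per-CLB, per-MHz figure."""
    return uw_per_clb_mhz * clbs * clock_mhz * 1e-6


def total_power_w(fabric_w, fabric_share):
    """Total power in watts, given the fabric's share of the total."""
    return fabric_w / fabric_share


if __name__ == "__main__":
    fabric = fabric_power_w(UW_PER_CLB_MHZ, CLBS_2V6000, CLOCK_MHZ)
    total = total_power_w(fabric, FABRIC_SHARE)
    print(f"fabric: {fabric:.2f} W, total: {total:.2f} W")
```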
Dynamic Power
• Normalized to 2001
– Best fit is a quadratic trend line
– Predicts 5X by 2007
[Figure: dynamic power (normalized, 0 to 6) versus year (1994 to 2008); data points: 1996: 4000EX, 1997: 4000XL, 1998: 4000XV, 1999: Virtex, 2000: Virtex-E, 2001: Virtex-II]
Static Power
• Normalized to 2001
– Best fit is a power trend
– Predicts 100X by 2007
• Future data points projected using linear trend for 1/VTH
[Figure: static power (normalized, log scale from 0.000001 to 1000) versus year (1996 to 2007)]
Static versus Dynamic
The Age of Accumulation
[Figure: features accumulated in the FPGA across generations (XC2000, XC4000, Virtex, Virtex-II Pro): Gates, Routing, Memory, Special Arithmetic Functions, High Performance I/O, System Clock Management, uProc., Mixed Signal]
Incremental Design
lessens the impact of design changes
– “Next Generation” technology
– Easy set-up through floorplanning along HDL hierarchy boundaries
– Changes only affect the module that was changed
– The remainder of the design stays locked and intact
– Timing repeatability
  • preserves routing
– Faster turnaround for localized design changes
“New in ISE5.1i”
System Level Design Flow
[Figure: design flow with stages Functional Modeling, Architectural Exploration, System Design and Partitioning, and Implementation Architecture, leading to Embedded Software and HW Synthesis; representations: IP Models, Behavioral C/C++, ANSI C/C++, Cycle-Accurate C/C++, HDL; enabling technologies: Formal/Plug&Play, HW-SW Codesign]
Reconfigurability: 4th dimension
• Applications open, close, and change priority over time
• Use model: Object-Oriented Multiprocessing
[Figure: Applications 1 to 4, each composed of Memory, S/W Objects, Prog H/W Objects, and Fixed H/W Objects]
The mother of all complex flows
– No established formal models / semantics
– Design capture and simulation issues
– Hardware/software co-design
– Modular design with uniform interfaces
– Dynamic management of diverse hardware resources creates new place & route issues
– Bitstream production, storage, loading and linking
Combining the Best of FPGA and ASIC: XBlue
[Figure: device combining a PowerPC Core, Embedded FPGA Core, Special Functions, and Block RAM, placed on a spectrum from 100% Fixed Logic to 100% Programmable]
Traditional ASIC Market
• Inflexible, but highest performance/integration
Traditional FPGA Market
• Flexible, but expensive
Conclusions
• FPGAs ride the tide
• Today: Programmable System Platform
• Challenges
– Design Technology
– Interconnect
– Low Power
– Exploit 3rd and 4th dimension