Coarse Grain Reconfigurable Architectures


Reconfigurable HPC, part 4: miscellaneous
Reiner Hartenstein, TU Kaiserslautern
May 14, 2004, TU Tallinn, Estonia
Time to Market
• A Fundamental Paradigm Shift in Silicon Application
[Figure: revenue per month vs. time in months (0 to 30): ASIC product vs. reconfigurable product with download (Update 1, Update 2) [Tom Kean]]
© 2004, [email protected]
http://hartenstein.de
The next Revolution: Makimoto’s 3rd wave [Keutzer / Newton]
• EDA industry paradigm switching every 7 years
• 82% of designers hate their tools [Richard Newton]
[Figure: EDA paradigm shifts at 1978, 1985, 1992, 1999 and 2006 (phases: Paradigm Shift, Tornado, Mainstream): Transistor entry (Applicon, Calma, CV ...), Schematics entry (Daisy, Mentor, Valid ...), Synthesis (Cadence, Synopsys ...), (Co-)Compilation & Reconfigurability, Data-stream-based (r)DPAs [Hartenstein]]
Software to Configware Migration
• Software to Configware Migration is the most important source of speed-up
• Hardware is just frozen Configware
• This talk illustrates the performance benefit which may be obtained from Reconfigurable Computing
• Stressing the coarse grain Reconfigurable Computing (RC) point of view, this talk hardly mentions FPGAs (but coarse grain may always be mapped onto FPGAs)
• avoiding specific silicon ….
Number of design starts [N. Tredennick, Gilder Technology Report, 2003]
[Figure: design starts per year, 2001 to 2004 (0 to 50,000): rGA-based vs. ASIC (0.13 µm)]
System gates per rGA chip [Xilinx Data]
[Figure: system gates per rGA chip vs. year, 1984 to 2004, log scale 100 to 10,000,000: XC 4085XL, XC 40250XV, Virtex, Virtex II, planned Mega-rGAs]
Xputer Lab, University of Kaiserslautern
Embedded hardware CPU & memory cores on chip [à la S. Guccione]
[Figure: HLL -> Compiler -> FPGA core; HLL -> Compiler -> on-chip CPU core and memory core]
© 2002, [email protected]
http://kressarray.de
Entire system on a single chip
• Xilinx Virtex-II Pro FPGA architecture
• PowerPC 405 RISC CPU (PPC405) cores
• FPGA fabric based on the Virtex-II architecture
• All you need on board: Rocket IO, PowerPC core, on-chip memory controller, embedded RAM
Source: Ivo Bolsens, Xilinx
What’s Wrong with This Picture? [Jonathan Rose]
What About PLD Cores on ASICs? (embedded FPGA fabric)
1. Still Have to Make the Chip
2. Need Two Sets of Software to Build It
– The ASIC Flow
– The PLD Flow
3. Have No Idea What to Connect the PLD Pins to
– Chances Are, You Are Going to Get It Wrong!
What’s Right with This Picture! [Jonathan Rose]
[Figure: embedded CPU, serial link, analog, “etc.”]
1. Pre-Fabricated
2. One CAD Tool Flow!
3. Can Connect Anything to Anything
– PLDs are built for general connectivity
>> rGAs <<
• rGAs
• Placement & Routing
• Soft Processors
• History of Frameworks
• RTR
• Support by rGA vendors
• EDA
• Future directions
• conclusions
http://www.uni-kl.de
Different Morphware Platforms:
• Reconfigurable Logic Blocks: fine grain reconfigurable
• Reconfigurable Interconnect Blocks: reconfigurable interconnect fabrics
• Reconfigurable Datapath Arrays: coarse grain reconfigurable
rGA with island architecture (detail)
[Figure: reconfigurable logic blocks embedded in reconfigurable interconnect fabrics, with connect boxes, switch boxes and interconnect switches]
© 2003, [email protected]
Switch box
[Figure: switch box with its switch points]
[Figure: connect box with its connect points]
[Figure: connect point activated; connect point (enlarged)]
[Figure: switch boxes activated: 3 switch points, then the 4th and the 5th switch point]
Result
[Figure: routing completed for 1 net, from A to B]
• 20 transistors + 20 flip-flops
• 1979: Silva Lisco (Silicon Valley Research Corp.) offers CALM-P
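The switch-box slides route one net by activating individual switch points. A minimal Python sketch, assuming a disjoint (subset) switch box, a common textbook topology that the slides do not name: track i on each side can reach only track i on the other three sides, which gives 6 switch points per track. The helper names are hypothetical; activated switch points are merged with a union-find to show which terminals end up electrically connected.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")

def disjoint_switch_points(width):
    """Enumerate the switch points of a disjoint (subset) switch box.

    Track i on one side can only be connected to track i on each other side,
    so every track index contributes C(4, 2) = 6 switch points.
    """
    return [((a, i), (b, i)) for i in range(width) for a, b in combinations(SIDES, 2)]

def connected_terminals(activated):
    """Group (side, track) terminals joined through activated switch points."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in activated:
        parent[find(a)] = find(b)  # an activated switch point merges two nets

    groups = {}
    for terminal in list(parent):
        groups.setdefault(find(terminal), set()).add(terminal)
    return list(groups.values())

# Activate two switch points of track 0: the net enters at N and leaves at S.
points = disjoint_switch_points(width=2)
net = connected_terminals([(("N", 0), ("E", 0)), (("E", 0), ("S", 0))])
print(len(points), sorted(net[0]))
```

Each activated switch point costs one configuration bit in this model; other switch-box topologies (e.g. Wilton) change only `disjoint_switch_points`.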
>> Placement & Routing <<
Routing: long-distance net passing through (from A to B)
• At any given time, a path may be used for only one signal ...
• ... Bridges of Königsberg
Routing congestion
• C and D are not reachable: C cannot be connected with D
• C and D need another placement
• rLBs are not 100% usable
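The two routing slides state the core constraint: a track carries only one signal at a time, so an earlier net can make later pins unreachable. As a hedged illustration, the sketch uses Lee's breadth-first maze router, a classic technique the slides do not name, on a simplified grid model assumed here (not the rGA's actual routing graph): `True` marks a track already occupied, and `None` means unroutable, as for C and D.

```python
from collections import deque

def lee_route(occupied, src, dst):
    """Breadth-first (Lee) maze routing on a grid.

    occupied[r][c] is True where a track already carries another signal.
    Returns the shortest list of cells from src to dst, or None if unroutable.
    """
    rows, cols = len(occupied), len(occupied[0])
    came_from = {src: None}
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:
            path = []
            while cell is not None:  # walk back to the source
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not occupied[nr][nc] and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

def commit(occupied, path):
    """Reserve the tracks of a routed net: each may carry only one signal."""
    for r, c in path:
        occupied[r][c] = True

grid = [[False] * 3 for _ in range(3)]
net_ab = lee_route(grid, (0, 1), (2, 1))  # net A-B runs down the middle column
commit(grid, net_ab)
net_cd = lee_route(grid, (1, 0), (1, 2))  # C and D sit on either side of it
print(net_ab, net_cd)
```

Routing C and D first, or moving them (another placement), would succeed on the same grid; routing order and placement jointly decide congestion.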
Leonhard Euler
• Euler’s problem of the bridges of Königsberg (1736): Königsberg is such a network
• Find a way which crosses each bridge exactly once .....
• ... Also an optimization: none of the bridges is unused.
L. Euler: Solutio Problematis Ad geometriam Situs Pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140
[Figure: the bridges of Königsberg as a graph; nodes: Left Bank, Right Bank, Kneiphof Island, Other Island]
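Euler’s 1736 argument reduces the puzzle to node degrees: a connected multigraph has a walk crossing every edge exactly once only if zero or two nodes have odd degree. A small Python check on the Königsberg graph. The figure did not survive extraction, so the standard historical layout is assumed: two bridges from the Kneiphof to each bank, one from the Kneiphof to the other island, one from each bank to the other island.

```python
from collections import Counter, defaultdict

# The seven bridges as an undirected multigraph (standard historical layout).
BRIDGES = [
    ("Kneiphof", "Left Bank"), ("Kneiphof", "Left Bank"),
    ("Kneiphof", "Right Bank"), ("Kneiphof", "Right Bank"),
    ("Kneiphof", "Other Island"),
    ("Left Bank", "Other Island"),
    ("Right Bank", "Other Island"),
]

def odd_degree_nodes(edges):
    """Nodes touched by an odd number of bridges, sorted by name."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(n for n, d in degree.items() if d % 2)

def is_connected(edges):
    """Depth-first reachability over all nodes that have at least one edge."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(adjacency))  # assumes a non-empty edge list
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adjacency)

def has_euler_walk(edges):
    """Euler's criterion: connected and 0 or 2 nodes of odd degree."""
    return is_connected(edges) and len(odd_degree_nodes(edges)) in (0, 2)

print(odd_degree_nodes(BRIDGES), has_euler_walk(BRIDGES))
```

All four land masses have odd degree, so no such walk exists; removing any single bridge leaves exactly two odd nodes and makes one possible.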
Crossbar
• Crossbar switch: 1913 J. N. Reynolds’ crossbar switch, 1915 patent granted
• 1919 Betulander’s crossbar switch
• 1926 first public telephone switching application in Sweden
• 1964 NASA telemetrics crossbar array
Crossbar complete?
• Crossbar chips available from Aptix, Texas Instruments and others
• One bar connects 2 pins
• Size of a full (complete) switch: n x n / 2
[Figure: crossbar chips in a row]
Number of crossbar chips needed:
n | partial | full
4 | 4 | 8
100 | 100 | 5000
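The slide’s crossbar sizes follow from its formula: the partial arrangement needs n crossbar chips, a full (complete) switch n x n / 2. A tiny Python check reproduces both rows. For comparison it also counts distinct pin pairs, n(n - 1)/2, the exact number of two-pin connections, which the slide’s n x n / 2 slightly overestimates. The function names are illustrative only.

```python
def partial_crossbar(n: int) -> int:
    """Chips needed for the partial arrangement, per the slide: n."""
    return n

def full_crossbar(n: int) -> int:
    """Size of a full (complete) switch, per the slide: n x n / 2."""
    return n * n // 2

def pin_pairs(n: int) -> int:
    """Exact number of distinct two-pin connections among n pins."""
    return n * (n - 1) // 2

for n in (4, 100):
    print(n, partial_crossbar(n), full_crossbar(n), pin_pairs(n))
```

The quadratic term is why complete crossbars stop scaling: going from 4 to 100 pins multiplies the partial cost by 25 but the full cost by 625.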
Detour connection
• Routing congestion example with detour: a direct connection is impossible, so the net is routed through an rLB with the identity function configured
• Routing resources: logic gates and/or pass transistors
[Figure: detour through an rLB within the rGA]
Crossbar-based Architectures
• 1990: UC Berkeley (Jan Rabaey)
• 1993: PADDI-II (Jan Rabaey)
• 1997: Pleiades (mesh & crossbar)
[Figure: eight EXUs, each with its own CTL unit, connected through a crossbar switch with I/O; 16-bit and 32-bit paths]
PADDI-II Architecture
[Figure: 48 processing elements (P1 to P48) grouped in 4-PE clusters; Level-1 network and Level-2 network (16 x 16b) with break-switches, a 16 x 6 switch matrix, 6 x 16b connections, and I/O ports]
>> Soft Processors <<
FPGA CPUs in teaching and academic research
• Michigan State
• Universidad de Valladolid, Spain
• Virginia Tech
• Washington University, St. Louis
• New Mexico Tech
• UC Riverside
• Tokai University, Japan
• UCSC: 1990!
• Mälardalen University, Eskilstuna, Sweden
• Chalmers University, Göteborg, Sweden
• Cornell University
• Gray Research
• Georgia Tech
• Hiroshima City University, Japan
Some soft CPU core examples
core | architecture | platform
MicroBlaze | 32-bit standard RISC, 32 reg. by 32, LUT-RAM-based reg.; 125 MHz, 70 D-MIPS | Xilinx, up to 100 on one FPGA
Nios | 16-bit instr. set; 50 MHz | Altera
Leon | SPARC; 25 MHz |
ARM7 clone | ARM |
uP1232 | 8-bit CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8-bit instr. + ext. ROM | 2 Xilinx 3020 LCA
Reliance-1 | 12-bit DSP | Lattice: 4 isp30256, 4 isp1016
1Popcorn-1 | 8-bit CISC | Altera, Lattice, Xilinx
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
YARD-1A | 16-bit RISC, 2-operand instr. | old Xilinx FPGA board
DSPuva16 | 16-bit DSP | Spartan-II
xr16 | RISC, integer C | SpartanXL
Acorn-1 | | 1 Flex 10K20
Also listed on the slide: 32-bit instr. set; 22 D-MIPS; 8 bit; Altera Mercury
It’s a Paradigm Shift!
• Using FPGAs (fine grain reconfigurable) has mainly been just classical logic synthesis on a “strange hardware” platform
• Coarse Grain Reconfigurable Arrays (rDPAs) (Reconfigurable Computing), however, mean a really fundamental paradigm shift
• This is still ignored by CS and EE curricula and by almost all R&D scenes
Why the speed-up ...
... although the FPGA clock is slower by a factor of 3 or even more
(most know-how comes from the “high level synthesis” discipline)
• support operations: neither clock nor memory cycles
• decisions without memory cycles or clock cycles
• moving the operator to the data stream (before run time)
• most “data fetch” without memory cycles
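The slide’s argument reduces to back-of-the-envelope arithmetic: overall speed-up is the reduction in cycles per result divided by the clock penalty. The numbers below are hypothetical, not from the talk: a processor spends 12 cycles per result on instruction fetch, address computation and data fetch, while a pipelined datapath delivers one result per clock at a 3x slower clock.

```python
def speedup(cpu_cycles_per_result, fpga_cycles_per_result, clock_ratio):
    """Estimated speed-up of a configured datapath over an instruction-stream CPU.

    clock_ratio = CPU clock frequency / FPGA clock frequency
    (the slide: a factor of 3 or more).
    """
    cycle_gain = cpu_cycles_per_result / fpga_cycles_per_result
    return cycle_gain / clock_ratio

# Hypothetical example: 12 cycles per result on the CPU, 1 on the FPGA pipeline.
print(speedup(12, 1, 3))
```

The configured datapath only wins while its cycle gain exceeds the clock ratio; parallel pipelines raise the gain further.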
>> History of Frameworks <<
Goal: away from complex design flow [à la S. Guccione]
[Figure: Schematics/HDL -> Netlister -> Netlist -> Place and Route -> Bitstream; HLL -> Compiler]
Overcome traditional separate design flow [à la S. Guccione]
[Figure: hardware flow Schematics/HDL -> Netlister -> Netlist -> Place and Route -> Bitstream, separate from the software flow HLL / User Code -> Compiler -> Executable]
Overcome traditional co-processing design: separate flow -> JBits Design Flow [à la S. Guccione]
[Figure: the separate flows (Schematics/HDL -> Netlister -> Netlist -> Place and Route -> Bitstream; User Code -> Compiler -> Executable) become User Java Code + JBits API -> Java Compiler -> Executable]
New directions in application development
• automatic partitioning compilers for designer productivity, like CoDe-X (Jürgen Becker, Univ. of Karlsruhe)
• support of Run-Time Reconfiguration (RTR), a key enabler of error handling and fault correction by partial re-routing of the FPGA at run time, as well as of remote patching for upgrading, remote debugging, and remote repair by reconfiguration, even over the internet
>> RTR <<
CPU use for configuration management
• an on-board microprocessor CPU is available anyhow, even along with a little RTOS
• use this CPU for configuration management
[Figure: RTR System Design: HLL -> Compiler]
Hard CPU & memory core on the same chip
[Figure: RTR System Design: HLL -> Compiler -> FPGA core; HLL -> Compiler -> CPU core and memory core]
Converging factors for RTR
• Converging factors make RTR-based system design viable:
• 1) million-gate FPGA devices and co-processing with standard microprocessors are commonplace
– direct implementation of complex algorithms in FPGAs
– this alone has already revolutionized FPGA design
• 2) new tools like the Xilinx JBits software tool suite directly support co-processing and RTR
[Figure: User Java Code + JBits API -> Java Compiler -> Executable]
RTR
• divides an application into a series of sequentially executed stages, each mapped as a separate execution module
• Excellent example: Xtrem platform by PACT AG, Munich
• Without RTR, all configurable platforms are just ASIC emulators
• direct support for development and debugging of RTR applications
• will also heavily influence future system organization
>> Support by rGA vendors <<
>> Support …
• Support by FPGA vendors
– Xilinx: software, configware (soft IP cores), hardware
– Altera: software, configware, hardware
Xilinx
• fabless FPGA semiconductor vendor, San Jose, CA, founded 1984
• key patents on FPGAs (expiring in a few years)
• Fortune 2001: No. 14 of the Best Companies to Work For (Intel: no. 42, HP: no. 64, TI: no. 65)
• DARPA grant (Nov. ’99) to develop JBits API tools for internet-reconfigurable / upgradable logic (with VT)
• Less brilliant in the early/mid 1990s (president Curt Wozniak): 1995 market share down from 84% to 62% [Dataquest]
• As designs grew larger, Xilinx lost its advantage (bug fixes did not require burning new chips):
• meanwhile, weeks of expensive debug time are needed
Software by Xilinx
• Full design flow from Cadence, Mentor, and Synopsys
• Xilinx Software Alliance EDA Program:
– Alliance Series Development System
– Foundation Series Development Systems
– Xilinx Foundation Series ISE (Integrated Synthesis Environment)
– free WebPOWERED software with WebFitter & WebPACK-ISE
– StateCAD XE and HDL Bencher
– Foundation Base Express
– Foundation ISE Base Express
• More: ModelSim Xilinx Edition (ModelSim XE) | Forge Compiler | Modular Design | Chipscope ILA | The Xilinx System Generator | XPower | JBits SDK | The Xilinx XtremeDSP Initiative | MathWorks / Xilinx Alliance | System Generator | The Wind River / Xilinx alliance
Configware (soft IP Products)
• For libraries, creation and reuse of configware
• To search for IPs see: List of all available IP
• The AllianceCORE program is a cooperation
between Xilinx and third-party core developers
• The Xilinx Reference Design Alliance Program
• The Xilinx University Program
• LogiCORE soft IP with LogiCORE PCI Interface.
• Consultants
Xilinx hardware
• Virtex, Virtex-II: first with 1 million system gates
– Virtex-E series: > 3 million system gates
• Virtex-EM on a copper process, with additional on-chip memory for network switch applications
• The Virtex XCV3200E: > 3 million gates, 0.15-micron technology
• Spartan, Spartan-XL, Spartan-II
– for low-cost, high-volume applications as ASIC replacements
– multiple I/O standards, on-chip block RAM, digital delay-locked loops
– eliminate phase-locked loops, FIFOs, I/O translators, system bus drivers
• XC4000XV, XC4000XL/XLA, CPLD: low-cost families
– rapid development, longer system life, robust field upgradability
– support In-System Programming (ISP), in-board debugging,
– test during manufacturing, field upgrades, full JTAG-compliant interface
• CoolRunner: low power, high speed/density, standby mode
• Military & Aerospace: QPRO high-reliability, QML certified
• Configuration Storage Devices
Altera
• Altera was founded in June 1983
• EDA: synthesis, place & route, and verification
• Quartus II: APEX, Excalibur, Mercury, FLEX 6000 families
• MAX+PLUS II: FLEX, ACEX & MAX families
• Flow with Quartus II: Mentor Graphics, Synopsys and Synplicity deliver design software to support Altera SOPC solutions
• Mentor: the only EDA vendor with a complete design environment for APEX II, incl. IP, design capture, simulation, synthesis, and hardware/software co-verification
• Configware: Altera offers over a hundred IP cores
• Third-party IP core design services and consultants
Altera hardware
• Newer families: APEX 20KE, APEX 20KC, APEX II, MAX 7000B, ACEX 1K, Excalibur, Mercury families
– APEX EP20K1500E (0.18 µm): up to 2.4 million system gates
– APEX II (all-copper 0.13 µm) for data path applications, supports many I/O standards, 1-Gbps True-LVDS performance
– wQ2001, an ARM-based Excalibur device
• Altera mainstream: MAX 7000A, 3000A; FLEX 6000, 10KA, 10KE; APEX 20K families
• Mature and other: Classic, MAX 7000, 7000S, 9000; FLEX 8000, 10K families
>> EDA <<
• EDA as the Key Enabler (major EDA vendors)
– Altera
– Cadence
– Mentor Graphics
– Synopsys
– Xilinx
• Changing EDA Tools Market
EDA as the Key Enabler (major EDA vendors)
• Select for EDA quality / productivity, not for FPGA architectures
• EDA often has massive software quality problems
• Customer: highest priority is an EDA center of excellence
– collecting EDA expertise and EDA user experience
– to assemble the best possible tool environments
– for optimum support of design teams
– to cope with interoperability problems
– to keep track of the EDA scene as a rapidly moving target
• Being fabless, FPGA vendors spend most of their qualified manpower on the development of EDA, IP cores, applications, and support
• Xilinx and Altera are morphing into EDA companies
Cadence
• FPGA Designer: top-down FPGA design system
• high-level mapping, architecture-specific optimization
• Verilog, VHDL, schematic-level design entry
• Verilog, VHDL to Synergy (logic synthesis) and FPGA Designer
• FPGAs simulated by themselves using Cadence’s Verilog-XL or Leapfrog VHDL simulators, and
• simulated with the rest of the system design in the Logic Workbench board/system verification environment
• Libraries for the leading FPGA manufacturers
Mentor Graphics
• System Design and Verification
• PCB design and analysis
• IC Design and Verification
• shifts the ASIC design flow to FPGAs (Altera, Xilinx)
– by FPGA Advantage with IP support
– by ModuleWare,
– Xilinx CORE Generator
– Altera MegaWizard integration
Synopsys
• FPGA Compiler II
• Version of ASIC Design Compiler Ultra
• Block Level Incremental Synthesis (BLIS)
• ASIC <-> FPGA migration
• Actel, Altera, Atmel, Cypress, Lattice, Lucent, Quicklogic, Triscend, Xilinx
RTR
• divides an application into a series of sequentially executed stages, each implemented as a separate execution module
• Partial RTR partitions these stages into finer-grain sub-modules to be swapped in as needed
• Without RTR, all configurable platforms are just ASIC emulators
• needs a new kind of application development environment
• direct support for development and debugging of RTR applications
• essential for the advancement of configurable computing
• will also heavily influence future system organization
• Xilinx, VT and BYU work on run-time kernels, run-time support, RTR debugging tools and other associated tools
• smaller, faster circuits, simplified hardware interfacing, fewer IOBs; smaller, cheaper packages, simplified software interfaces
Run-time Mapping
• Run-time reconfigurable devices include the Xilinx Virtex FPGA family and the RAs that are part of the Chameleon CS2000 series systems
• Using such devices changes many of the basic assumptions in the HW/SW co-design process:
• host/RL interaction is dynamic and needs a tiny OS like eBIOS, also to organize RL reconfiguration under host control
• a typical goal is minimization of reconfiguration latency (especially important in communication processors), to hide configuration loading latency, and
• scheduling to find the ’best’ schedule for eBIOS calls (C side)
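Hiding configuration loading latency can be illustrated with a simple timeline model. A sketch under stated assumptions: the application is split into sequential stages as on the RTR slides, each with a configuration load time and an execution time (arbitrary units invented for the example), and partial reconfiguration lets the next stage’s configuration load while the current stage runs. The model ignores configuration-port contention and host/eBIOS overhead.

```python
from typing import List, Tuple

Stage = Tuple[int, int]  # (configuration load time, execution time)

def serial_time(stages: List[Stage]) -> int:
    """Load and run each stage strictly one after another."""
    return sum(load + run for load, run in stages)

def overlapped_time(stages: List[Stage]) -> int:
    """Prefetch stage i+1's configuration while stage i executes."""
    total = stages[0][0]  # the first load cannot be hidden
    for i, (_, run) in enumerate(stages):
        if i + 1 < len(stages):
            next_load = stages[i + 1][0]
            total += max(run, next_load)  # wait for whichever finishes last
        else:
            total += run
    return total

stages = [(2, 5), (3, 4), (1, 6)]
print(serial_time(stages), overlapped_time(stages))
```

Loading is fully hidden only when each stage runs at least as long as the next configuration takes to load; otherwise the load time dominates.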
>> future directions <<
Soft CPU: new job for compilers
[Figure: HLL -> Compiler -> soft CPU and memory core implemented on the FPGA]
Soft rDPA feasible? [à la S. Guccione]
Array I/O examples [à la S. Guccione]
• data streams, or, from / to embedded memory banks
[Figure: processor-memory performance gap, performance (log scale, 1 to 1000) vs. year, 1980 to 2000: µProc (CPU) 60%/yr., DRAM 7%/yr.; the gap grows 50% / year]
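The chart’s growth rates combine multiplicatively: with processor performance growing 60% per year and DRAM 7% per year, their ratio grows by 1.60 / 1.07, about 1.5, i.e. the roughly 50% per year quoted on the slide. A short Python check; compounding over several years extrapolates the two annual rates and is not data read from the chart.

```python
CPU_GROWTH = 0.60   # per year, from the chart
DRAM_GROWTH = 0.07  # per year, from the chart

def annual_gap_growth(cpu=CPU_GROWTH, dram=DRAM_GROWTH):
    """Yearly growth rate of the processor/DRAM performance ratio."""
    return (1 + cpu) / (1 + dram) - 1

def gap_after(years, cpu=CPU_GROWTH, dram=DRAM_GROWTH):
    """Factor by which the gap widens after the given number of years."""
    return ((1 + cpu) / (1 + dram)) ** years

print(f"{annual_gap_growth():.1%} per year, x{gap_after(10):.0f} after 10 years")
```

Subtracting the rates (60% - 7%) would overstate the gap; the ratio of growth factors is the correct combination.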
HLL 2 Soft Array [à la S. Guccione]
[Figure: HLL -> Compiler -> soft CPU and memory]
HLL 2 “flex” rDPA [à la S. Guccione]
[Figure: HLL -> Compiler -> CPU and memory]
>> HLLs <<
HLLs for Hardware Design vs. System Design vs. RTR System Design [à la S. Guccione]
[Figure: HLL -> Compiler for System Design; HLL -> Compiler for RTR System Design]
HLLs for Hardware Design vs. System Design vs. RTR System Design [à la S. Guccione]
[Figure: three HLL -> Compiler flows: hardware design, System Design, RTR System Design]
CPU and memory on chip [à la S. Guccione]
[Figure: RTR System Design: HLL -> Compiler -> FPGA core; HLL -> Compiler -> CPU core and memory core]
JBits Environment [à la S. Guccione]
[Figure: components: User Code, JBits API, JRoute API, RTP Core Library, BoardScope Debugger, XHWIF, TCP/IP, Device Simulator]
HLLs for Hardware Design vs. System Design vs. RTR System Design [à la S. Guccione]
[Figure: two HLL -> Compiler flows (System Design)]
Embedded System Design [à la S. Guccione]
[Figure: HLL -> Compiler -> FPGA core with hard CPU and memory cores; HLL -> Compiler -> soft CPU and memory core on the FPGA]
>> conclusions <<
Missing the next revolution
Ignoring reconfigurable computing when teaching computing fundamentals within our CS curricula is one of the biggest mistakes in the history of information technology application, causing the waste of billions of dollars.
© 2001, [email protected]
“EDA industry shifts into CS mentality” [Wojciech Maly]
• Microprogramming to replace FSM design
• Hardware languages replace EE-type schematics
• EDA software and its interfacing languages
• Newer system-level languages like SystemC etc.
• Small and large module re-use
• Hierarchical organization of designs, EDA, et al.
• .....................
“EDA industry shifts into CS mentality” [Wojciech Maly]
• Which language to select?
Roadmap
Old CS lab course philosophy:
given an application: implement it by a program -/-
New CS freshman lab course environment:
Given an application:
a) implement it by writing a program
b) implement it as a morphware prototype
c) partition it into P and Q
c.1) implement P by software
c.2) implement Q by morphware
c.3) implement the P / Q communication interface
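Step c) of the lab course splits an application into a software part P and a morphware part Q. One way to make that concrete is a greedy heuristic; the task names, timings, area budget and the time-saved-per-area ranking below are all hypothetical choices for illustration, not a method from the talk, and the P / Q communication cost of step c.3 is ignored.

```python
from typing import List, NamedTuple, Tuple

class Task(NamedTuple):
    name: str
    sw_time: float   # run time as software on the CPU
    hw_time: float   # run time as a morphware module
    area: int        # reconfigurable resources the module occupies

def partition(tasks: List[Task], area_budget: int) -> Tuple[List[str], List[str]]:
    """Greedy split: move tasks with the best time saved per unit of area into Q."""
    ranked = sorted(tasks, key=lambda t: (t.sw_time - t.hw_time) / t.area, reverse=True)
    q, used = [], 0
    for t in ranked:
        if t.hw_time < t.sw_time and used + t.area <= area_budget:
            q.append(t.name)
            used += t.area
    p = [t.name for t in tasks if t.name not in q]
    return p, q

tasks = [Task("filter", 10, 2, 4), Task("fft", 6, 1, 2), Task("ui", 3, 3, 1)]
print(partition(tasks, area_budget=5))
```

A greedy pass is easy for students to reason about but not optimal; an exact split under an area budget is a knapsack problem.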
All enabling technologies are available
• literature from the last 30 years
• languages & (co-)compilation techniques
• the anti machine and all its architectural resources
• parallel memory IP cores and generators
• morphware vendors like PACT ....
• anything else needed
END
The dichotomy of models
• Note for von Neumann: the state register is with the CPU
• Note for the anti machine: the state register is with the memory bank / state registers are within memory banks
Machine Paradigms
machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine
driven by | instruction streams | data streams (no “dataflow”)
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up (data path) | at run time (“instruction fetch”) | at load time
resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
operation | sequential | parallel pipe network etc.
The Anti Machine: also hardwired implementations*
*) e.g. Bee project, Prof. Brodersen
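The table’s central contrast (one program counter sequencing instructions at run time versus data counters sequencing data streams through an operator set up at load time) can be made concrete with two toy interpreters. This Python sketch is a hypothetical illustration: the three-instruction accumulator ISA and the memory layout are invented here, and it does not model any particular anti machine or Xputer implementation.

```python
import operator

def von_neumann(program, memory):
    """Instruction-stream driven: the state register (pc) sits with the CPU."""
    pc, acc = 0, 0
    while pc < len(program):
        opcode, address = program[pc]  # instruction fetch at run time
        if opcode == "load":
            acc = memory[address]
        elif opcode == "add":
            acc += memory[address]
        elif opcode == "store":
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        pc += 1
    return memory

def anti_machine(configured_op, bank_a, bank_b):
    """Data-stream driven: the operator is fixed before run time and one
    data counter per memory bank sequences the operands."""
    dc_a = dc_b = 0  # state registers live with the memory banks
    result = []
    while dc_a < len(bank_a) and dc_b < len(bank_b):
        result.append(configured_op(bank_a[dc_a], bank_b[dc_b]))
        dc_a += 1
        dc_b += 1
    return result

mem = {"x": 2, "y": 5, "z": 0}
print(von_neumann([("load", "x"), ("add", "y"), ("store", "z")], mem)["z"])
print(anti_machine(operator.add, [1, 2, 3], [10, 20, 30]))
```

In the second function no instruction is fetched per element; only the data counters advance, which is the property the speed-up slide attributes to the second paradigm.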
Benefit from RAM-based & 2nd paradigm
1) RAM-based platform needed for:
• flexibility, programmability
• avoiding the need for specific silicon (mask cost: currently 2 million $, rapidly growing)
2) simple 2nd machine paradigm needed as a common model:
• to avoid the need for circuit expertise
• needed to educate zillions of programmers
Design Space Exploration Systems
Explorer System | year | source | interactive | status evaluation | status generation
DPE | 1991 | [66] | no | abstract models | rule-based
Clio | 1992 | [67] | yes | prediction models | device generator
DIA | 1998 | [68] | yes | prediction from library | rule-based
DSE for RAW | 1998 | [49] | no | analytical models | analytical
ICOS | 1998 | [76] | no | fuzzy logic | greedy search
DSE for Multimedia | 1999 | [77] | no | simulation | branch and bound
Xplorer | 1999 | [11] [50] | yes | fuzzy rule-based | simulated annealing