SPARK Overview

SPARK
Accelerating ASIC designs through parallelizing high-level synthesis
Sumit Gupta
Rajesh Gupta
Outline
■ The target
■ The problem
■ The technology
■ The competition
■ The market opportunity
■ The people
■ The status
■ The plan
©2003 Spark Team, Confidential
A Chip Is A Wonderful Thing!
A typical chip, circa 2006:
■ 50 square millimeters
■ 50 million transistors
■ 1-10 GHz, 100-1000 MOP/sq mm, 10-100 MIPS/mW
■ 300 mm wafers, 10,000 units/wafer, 20K wafers/month
■ $5 per part
It does not matter what you build:
■ Processor, MEMS, networking, wireless, memory
■ But it takes $20M to build one today, going to $50M+
■ So there is a strong incentive to port your application, system, or box to the “chip”
But Design Decisions Matter!
Technical Target
❑ Anyone and everyone with technology IP to grind (build on-chip)
– E.g., WLAN and cellphone chips:
• About 50 GOPS in baseband (BB) processing
– And about 72 other application ‘markets’ enhanced by ASIC/FPGA parts
❑ More technically:
– Behavioral descriptions with complex and nested conditionals and loops
The Problem
❑ Doing chip design in a system house is an increasingly costly proposition
– Case study: Conexant's 802.11a chip
• 9 months from PRD to parts
• 7 months from PRD to synthesizable RTL
• The pain is in getting the algorithms right for the chip implementation
❑ Would love a “compiler”
– But “push-button” flows just do not work
Enter High-Level Synthesis
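[Figure: system design flow. Task analysis and HW/SW partitioning split the system into a hardware behavioral description, mapped by high-level synthesis onto an ASIC or FPGA, and a software behavioral description, mapped by a software compiler onto a processor core; the system also contains memory and I/O]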
Poor QOR, even Poor Controllability
    x = a + b;
    c = a < b;
    if (c)
        d = e - f;
    else
        g = h + i;
    j = d * g;
    l = e + x;
[Figure: the code above mapped onto hardware: a control unit containing an if-node (T branch: d=e-f, F branch: g=h+i) and a data path with an ALU and memory executing x=a+b, c=a<b, j=d*g, and l=e+x]
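A minimal C sketch of speculative code motion on this fragment may help; it is our illustration, not SPARK output. Both branch operations, e - f and h + i, are hoisted above the conditional so they can execute in the same step as a + b and a < b, and the condition only selects which result is committed. The struct result_t, the function names, and the d_prev/g_prev parameters (standing in for the register values a branch leaves unchanged) are hypothetical.

```c
/* Values produced by the example fragment. */
typedef struct {
    int d, g, j, l;
} result_t;

/* Original form: d and g are computed under control of the if-node,
 * so neither operation can start until c = a < b is known.
 * d_prev and g_prev (hypothetical) model the register contents kept
 * by the branch that does not write d or g. */
result_t original(int a, int b, int e, int f, int h, int i,
                  int d_prev, int g_prev)
{
    int d = d_prev, g = g_prev;
    int x = a + b;
    int c = a < b;
    if (c)
        d = e - f;
    else
        g = h + i;
    result_t r = { d, g, d * g, e + x };
    return r;
}

/* Speculated form: both branch operations are hoisted above the
 * conditional; the condition only drives the selects (multiplexers)
 * that decide which speculative result is committed. */
result_t speculated(int a, int b, int e, int f, int h, int i,
                    int d_prev, int g_prev)
{
    int x = a + b;        /* these four operations are independent */
    int c = a < b;
    int t_d = e - f;      /* speculative: discarded when c is false */
    int t_g = h + i;      /* speculative: discarded when c is true  */
    int d = c ? t_d : d_prev;
    int g = c ? g_prev : t_g;
    result_t r = { d, g, d * g, e + x };
    return r;
}
```

The trade-off is an extra functional-unit use per path plus two multiplexers, in exchange for taking the control dependence on c off the critical path.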
The Technology: Enter SPARK
By the time you get to the CDFG, it is already too late
■ Parallelize (judiciously) and submerge it with HLS
[Figure: SPARK flow: C input → original CDFG → source-level compiler transformations → scheduling & binding (scheduling compiler & dynamic transformations) → optimized CDFG → VHDL output]
Why SPARK, Why Now?
❑ The chip designer is finally
– letting go of the cycle boundary in design
– being replaced by non-chip types
❑ Education and awareness through
– Synopsys Behavioral Compiler
– But it is not ready to dominate…
❑ SPARK changes the landscape
– Parallelizing compilation as the ‘power tool’
SPARK Core Strengths
❑ Focus on
– Transformations that increase the amount of parallelism available in the source description
– Tight integration with parallelizing compiler transformations
❑ Provide an HLS toolbox for the microarchitect
– Fire the circuit designer.
The POC and The Experiments
❑ Intel ILD design
– Produced a design that fundamentally restructures the input description (the way a designer would, and no tool could)
❑ A bunch of other media benchmarks
– 40-70% improvement in delay for the same area
– Based on the Synopsys back end
❑ See appendix.
The Market Opportunity
❑ The big picture
– Semi is $140B; fabless semi is $15B
– EDA is currently about $4B
❑ Current EDA market
– $1B in synthesis and verification
• $400M synthesis, $400M verification, $200M E.
– $3B in PDA, IP, and design services
❑ $400M synthesis
– 90% is RTL and below
❑ Market movement and ‘structural’ changes
Future ESL and Synthesis Market
❑ Keys to growth
– ASIC focus (including structured ASICs)
– ‘Power tool’ is key to commanding high ASPs
❑ Challenges
– The raid of the FPGAs
• If so, PHLS will be OEM’d
– ASICs mired in the nano swamp
• Attention shifts to PDA, stationary semi market
The Competition
❑ The early educator: Synopsys BC
– Classical HLS that just does not work; fundamentally flawed
❑ The improviser: Cadence Get2Chip A2C
– Has done a good job at RTL
❑ The others
– Celoxica, Forte, Synfora, BlueSpec
– “Boutiques” primarily targeted at “somebody else”
The Competition
Company | Product | Approach | Limitation
Synopsys | Behavioral Compiler | Traditional HLS: synthesis from a subset of SystemC and behavioral VHDL | No parallelizing or beyond-basic-block (BBB) transformations
Cadence/Get2Chip | A2C | Traditional HLS; closely tied to logic synthesis | No parallelizing or BBB transformations
Celoxica | DK Design Suite | Uses explicitly parallelized input in Handel-C; traditional HLS | No pure behavioral input such as C or SystemC
Forte DS | Cynthesizer | Traditional HLS from SystemC with design space exploration | No parallelizing or BBB transformations
Synfora | NA | Maps applications to a VLIW processor and a pipelined array of processors; uses parallelizing transformations in the VLIW compiler | Does not do HLS at all; more of a mapping tool from C to a processor array
BlueSpec | NA | Based on term rewriting systems; starts from a description closer to RTL than to behavioral code | Not HLS; input is already scheduled into states
What Do We Want To Do?
❑ Make it accessible to SystemC and SystemVerilog
– Front-end architecture to port it across
❑ Implement missing compiler passes
– Really standard stuff, but a missing piece now
❑ Work out a design flow
– Build a path to the existing RTL flow, including validation
❑ Industry-strength characterization
❑ Secure IP rights
Synergistic Activities
❑ SPARK release on the web
– Mailing list
– Build the users group
– Expand to the SystemC user community
❑ Kluwer book in preparation
– Announcement at DATE, Feb 2004
– Availability at DAC, June 2004
Exit Strategy
Not yet worked out, but…
❑ Build a stand-alone EDA company
– As a stand-alone it would not work unless complemented by verification
❑ Build to be bought
– As an HLS company
❑ License technology
– Companies that have shown interest in licensing it:
• Poseidon Systems, Cadence
SPARK History
❑ A joint project
– Rajesh Gupta, Nikil Dutt, Alex Nicolau
❑ Kicked off in Fall 1999
– First Ph.D., Sumit Gupta, 2003
❑ Supported by
– Semiconductor Research Corporation (SRC)
– Intel grant as a match to UC Micro
– National Science Foundation
Case Study: Intel Instruction Length Decoder
[Figure: a stream of instructions enters the instruction buffer; the instruction length decoder identifies the first, second, and third instructions in the buffer]
Copyright Sumit Gupta 2003
ILD Synthesis: Resulting Architecture
[Figure: speculating operations, fully unrolling the loop, and eliminating the loop index variable turn a multi-cycle sequential architecture into a single-cycle parallel architecture]
Our toolbox approach enables us to develop scripts to synthesize applications from different domains
The final design looks close to the actual implementation done by Intel
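The three transformations on this slide can be sketched in C on a toy length decoder. The encoding (toy_len, where the low two bits of a byte give a length of 1 to 4), the buffer size ILD_N = 8, and the function names are all hypothetical; only the pattern follows the slide. The sequential version advances a data-dependent loop index, so each length waits for the previous one. The speculative version computes a length at every byte as if an instruction started there; its loops have constant bounds, so a synthesis tool could fully unroll them, turning the loop indices into constants.

```c
#include <stdbool.h>
#include <stdint.h>

#define ILD_N 8  /* bytes in the (hypothetical) instruction buffer */

/* Hypothetical length rule: low two bits encode a length of 1..4. */
static int toy_len(uint8_t byte)
{
    return (byte & 3) + 1;
}

/* Sequential version: index i advances by a data-dependent amount,
 * so each length computation waits on the previous one
 * (a multi-cycle, sequential architecture). */
void ild_sequential(const uint8_t buf[ILD_N], bool start[ILD_N])
{
    for (int k = 0; k < ILD_N; k++)
        start[k] = false;
    for (int i = 0; i < ILD_N; i += toy_len(buf[i]))
        start[i] = true;
}

/* Speculative version: lengths at all bytes are independent and can
 * be computed in parallel; instruction starts are then derived from
 * the speculated lengths with constant-bound (unrollable) loops. */
void ild_speculative(const uint8_t buf[ILD_N], bool start[ILD_N])
{
    int len[ILD_N];
    for (int k = 0; k < ILD_N; k++)   /* speculate at every byte */
        len[k] = toy_len(buf[k]);

    start[0] = true;                  /* byte 0 always starts one */
    for (int k = 1; k < ILD_N; k++) {
        bool s = false;
        for (int j = 0; j < k; j++)   /* unrolls into an OR of ANDs */
            s = s || (start[j] && j + len[j] == k);
        start[k] = s;
    }
}
```

The speculative version does more work (a length at every byte, most of which is discarded); that extra area is what buys the single-cycle result.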
Target Applications
Design | # of Ifs | # of Loops | # of Non-Empty Basic Blocks | # of Operations
MPEG-1 pred1 | 4 | 2 | 17 | 123
MPEG-1 pred2 | 11 | 6 | 45 | 287
MPEG-2 dp_frame | 18 | 4 | 61 | 260
GIMP tiler | 11 | 2 | 35 | 150
Scheduling & Logic Synthesis Results
[Figure: normalized longest path (l, cycles), critical path (c, ns), total delay (c*l), and unit area for the MPEG-1 Pred1 and Pred2 functions under successive optimizations: non-speculative code motions (CMs) within basic blocks (BBs) and across hierarchical blocks; + pre-synthesis transforms; + speculative code motions; + dynamic CSE]
Overall: 63-66% improvement in delay, with almost constant area
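The total delay on these charts combines the other two metrics: schedule length in cycles (l) times the clock period set by the critical path in ns (c). A small C sketch of that arithmetic and of the percentage improvement, assuming a baseline and an optimized design; the function names are hypothetical and no measured SPARK numbers are used.

```c
/* Total execution delay in ns: schedule length in cycles (l)
 * times clock period in ns (c, set by the critical path). */
double total_delay_ns(double l_cycles, double c_ns)
{
    return l_cycles * c_ns;
}

/* Percentage delay reduction of an optimized design vs. a baseline. */
double delay_improvement_pct(double baseline_ns, double optimized_ns)
{
    return 100.0 * (baseline_ns - optimized_ns) / baseline_ns;
}
```

Because the product is what matters, a transformation that shortens the schedule can still pay off if it lengthens the clock period slightly, and vice versa.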
Scheduling & Logic Synthesis Results
[Figure: normalized longest path (l, cycles), critical path (c, ns), total delay (c*l), and unit area for the MPEG-2 DpFrame and GIMP Tiler functions under successive optimizations: non-speculative code motions (CMs) within basic blocks (BBs) and across hierarchical blocks; + pre-synthesis transforms; + speculative code motions; + dynamic CSE]
Overall: 48-76% improvement in delay, with almost constant area
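In the legend, dynamic CSE is applied on top of speculative code motions. A small C sketch of why the two combine well, offered as a generic illustration under our own assumptions rather than SPARK's algorithm: a + b is first available on only one path into the join, so its recomputation cannot be reused; once a + b is speculated above the conditional, the later a + b is a common subexpression on every path and reuses the hoisted temporary. The function names are hypothetical.

```c
/* Before: a + b is available on only one path into the join,
 * so the later a + b cannot simply reuse it. */
int before_cse(int a, int b, int c, int e)
{
    int x;
    if (c)
        x = a + b;
    else
        x = e;
    int y = a + b;      /* recomputed: a second adder use */
    return x * y;
}

/* After speculating a + b above the conditional, the later a + b
 * is common on every path and is replaced by the temporary. */
int after_cse(int a, int b, int c, int e)
{
    int t = a + b;      /* speculated operation */
    int x = c ? t : e;
    int y = t;          /* common subexpression eliminated */
    return x * y;
}
```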