LAMP: A Tool Suite for Families of FPGA-based
Computation Accelerators
Tom VanCourt
Martin Herbordt
Accelerator design isn’t logic design.
Logic design:
• Optimizes individual problems
• Reuse of leaf components
• Specific to hardware platform
• Stable implementations
• Parallelism defined by designer
Accelerators:
• Optimize families of problems
• Reuse of control components
• Should be portable between platforms
• Flexible, user-defined applications
• Degree of parallelism undefined: as much as this FPGA will fit for this app
Accelerators require skilled logic design for high performance.
Create a model with behavior left as a parameter to be provided.
Implement applications as families.
Case study: Dynamic Programming for Approximate String Matching. Choose:
• Character-by-character alignment or goodness-of-match only
• Global alignment (with end-rule options) or local; gap parameters
• Character type: DNA [2 bits], IUPAC wildcards [4], amino acid [5], codons [6], ASCII text [8], Unicode 3.0 text [16]
• Mismatch scoring, may be parameterized
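The choices listed above can be read as parameters of one dynamic-programming kernel. The following is a minimal software sketch of that idea, not the LAMP implementation: the names `AlignmentParams`, `align_score`, and `dna_compare`, the linear gap penalty, and the +1/-1 DNA scoring are all assumptions made for illustration. It covers only the goodness-of-match (score only) result.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AlignmentParams:
    """Hypothetical parameter bundle; field names are illustrative only."""
    compare: Callable[[str, str], int]  # scoring of one reference/query character pair
    gap: int = -2                       # linear gap penalty (assumed value)
    local: bool = False                 # True: local alignment, False: global


def align_score(ref: Sequence[str], query: Sequence[str], p: AlignmentParams) -> int:
    """Return the best alignment score only (no traceback)."""
    # prev[j] is the score of aligning the ref prefix seen so far with query[:j].
    prev = [0 if p.local else j * p.gap for j in range(len(query) + 1)]
    best = 0
    for i, r in enumerate(ref, start=1):
        curr = [0 if p.local else i * p.gap]
        for j, q in enumerate(query, start=1):
            cell = max(prev[j - 1] + p.compare(r, q),  # align r with q
                       prev[j] + p.gap,                # gap in query
                       curr[j - 1] + p.gap)            # gap in reference
            if p.local:
                cell = max(cell, 0)                    # local scores never go negative
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if p.local else prev[-1]


def dna_compare(r: str, q: str) -> int:
    """Assumed exact-match scoring for DNA: +1 match, -1 mismatch."""
    return 1 if r == q else -1


global_dna = AlignmentParams(compare=dna_compare, local=False)
local_dna = AlignmentParams(compare=dna_compare, local=True)
print(align_score("ACGT", "AGT", global_dna))
print(align_score("TTACGTT", "GGACGCC", local_dna))
```

Swapping `compare`, `gap`, or `local` gives a different member of the family without touching the loop, which is the software analogue of leaving behavior as a parameter.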
[Figure: model instance built from annotated VHDL components. Labels: Logic designer provides Annotated VHDL, App Abstraction, HW Abstraction, HW Concretion; Score Only; Alignment Type.]
The semantic gap isn’t going away.
Semantic gap: gulf between high-level design and low-level implementation
• Compiled programs: C++, Java, high-level programming languages vs. compiled machine code
• FPGA accelerators: application-specific knowledge in framework vs. gate-level implementation primitives
Why not compile C into logic?
[Figure: semantic gap between C++, Java and machine code (compiled code), compared with synthesized logic.]
[Figure: application family options. Labels: Global (Needleman-Wunsch), Local (Smith-Waterman), Nucleotide, Amino acid, Codon, Wildcard, Exact Match, PAM-N, Gonnet, …]
Cost vs. value of design effort
• Smaller PEs: higher parallelism
• Larger fabric: increased computing capacity
• Larger PEs: don’t constrain other implementations
[Figure: value of FPGA acceleration, plotted as FPGA capacity increases →]
abstract const Score zeroScore;
Score compare(Ref r, Que q) {
    bool isMatch =
        (r.a & q==0) | (r.c & q==1) |
        (r.g & q==2) | (r.t & q==3);
    …
Concrete definition (partial)
Abstract definition of character type
Application-specific implementation can give acceleration > 100×.
Application acceleration, Xilinx VP70 Virtex-II Pro relative to 3 GHz Intel Xeon:
DNA alignment        152× to 215×
Protein alignment    77× to 175×
Rigid docking        ~100× to ~500×
Every different application gets individually tuned performance.
Simple applications don’t have to run at ‘worst case’ speed.
Approximate matching application family:
Each component varies individually
Combinatorics work in our favor
Each user creates new possibilities!
Repetition increases value of the design effort
Cost of
FPGA
design
class IUPAC extends CharType {
    type Ref {bool: a, c, g, t};
    type Que int 0 .. 3;
    type Score int -1000 .. 1000;
    const Score scoreZero = 0,
        match = +1,
        miss = -10;
abstract Score compare(Ref refCh, Que queryCh);
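As a rough software analogue of the IUPAC character type, the sketch below models the reference character as four flags (one per nucleotide it may stand for) and the query as an integer 0..3, then applies the same match test as the poster's `compare` fragment. The poster elides what follows `isMatch`; returning `match` or `miss` is an assumption, as are the Python names `Ref`, `MATCH`, `MISS`, `R`, and `N`.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Wildcard reference character: which of A, C, G, T it can stand for."""
    a: bool
    c: bool
    g: bool
    t: bool


MATCH = 1    # 'match = +1' on the poster
MISS = -10   # 'miss = -10' on the poster


def compare(r: Ref, q: int) -> int:
    """Score a wildcard reference character against a query nucleotide 0..3."""
    if not 0 <= q <= 3:
        raise ValueError("query nucleotide must be in 0..3 (A, C, G, T)")
    is_match = ((r.a and q == 0) or (r.c and q == 1) or
                (r.g and q == 2) or (r.t and q == 3))
    # Assumption: the elided remainder of the poster's function maps the
    # boolean to the match/miss constants.
    return MATCH if is_match else MISS


R = Ref(a=True, c=False, g=True, t=False)   # IUPAC code R: A or G
N = Ref(a=True, c=True, g=True, t=True)     # IUPAC code N: any nucleotide

print(compare(R, 2))  # query G against R
print(compare(R, 1))  # query C against R
```

The four-flag reference matches the 4 bits listed for IUPAC wildcards in the case study, and the 0..3 query matches the 2 bits listed for DNA.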
“10-100 of performance ... has been at the cost of 10-100
increase in difficulty in application development” *
Effort in designing leaf components is about the same
… Effort in designing an array is largely independent of array size
… Larger FPGAs hold larger computation arrays
Subclassing creates application-specific data types and behaviors.
class CharType {
abstract type Ref, Que, Score;
Automated replication makes
maximum use of FPGA fabric.
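The replication step can be pictured as simple capacity arithmetic: after fixed control and interface logic is placed, the array is filled with as many processing elements (PEs) as fit. The sketch below is only an illustration of that arithmetic; the function name `max_pes` and every resource figure in it are made-up assumptions, not measurements from the poster or from any real device.

```python
def max_pes(fabric_units: int, overhead_units: int, pe_units: int) -> int:
    """Number of identical PEs that fit once fixed overhead is subtracted.

    All three quantities are in the same abstract resource unit
    (for example logic slices); the unit is an assumption.
    """
    if pe_units <= 0:
        raise ValueError("a PE must use at least one resource unit")
    return max(0, (fabric_units - overhead_units) // pe_units)


# Hypothetical numbers, chosen only to show the trend.
FABRIC = 30_000
OVERHEAD = 2_000
print(max_pes(FABRIC, OVERHEAD, pe_units=40))    # a small PE
print(max_pes(FABRIC, OVERHEAD, pe_units=100))   # a larger PE, fewer copies
```

With these invented figures the smaller PE is replicated 700 times and the larger one 280 times, which mirrors the poster's point that smaller PEs give higher parallelism and that a simple application need not run at the size of the most complex one.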
FPGA capacity is exploding.
“An order of magnitude increase in any computing resource
changes the way in which that resource is used”
Logic designer provides:
• Annotated VHDL: reusable control and interface components
• App Abstraction: interface definition of application classes and operations
• HW Abstraction: abstract definition of FPGA hardware resources
• HW Concretion: actual resources present in the FPGA platform
Application specialist provides:
• App Concretion: actual definitions specific to the application instance
Model Instance: generic accelerator bound to specific HW and application logic
}
Domain
Knowledge
Gates
BLOSUM-N
FPGAs are near a crossing point.
What changed?
…
Forty years of research haven’t solved the problem.
Semantic complexity increases →
Until now
[Figure: accelerator model for application family. Labels: AppAbstraction, AppConcretion, HwConcretion; Result Type, Alignment (Local), Character Type.]
Accelerator design also requires domain specialists for tailoring to details of specific applications.
HwAbstraction
BOSTON UNIVERSITY
This research was supported by NIH grant RR020209-01,
“FPGA-Based Computational Accelerators.”


2 result types × 17 alignment types × 15 character types = 510 different accelerators created on demand
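The count above is a plain product of independent choices, which is why the combinatorics work in the designer's favor. A short sketch of that enumeration follows. The two result-type names come from the poster's case study; the poster does not list the 17 alignment types or the 15 character types, so placeholder labels are used and are purely hypothetical.

```python
from itertools import product

result_types = ["score_only", "alignment"]                # the two result types
alignment_types = [f"alignment_{i}" for i in range(17)]   # placeholder labels
character_types = [f"chartype_{i}" for i in range(15)]    # placeholder labels

# Every accelerator in the family is one choice along each axis.
family = list(product(result_types, alignment_types, character_types))
print(len(family))  # 2 * 17 * 15 = 510
```

Adding one more character type would add 2 × 17 = 34 accelerators to the family for the cost of designing a single leaf component.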
* M. Gokhale, J. Stone, J. Arnold, and M. Kalinowski. Stream-oriented FPGA computing in the Streams-C high-level language. Proc. FCCM, 2000.