High-Level Synthesis
Download
Report
Transcript High-Level Synthesis
Politecnico di Milano
HIGH LEVEL SYNTHESIS WITH AREA
CONSTRAINTS FOR FPGA DESIGNS:
AN EVOLUTIONARY APPROACH
Tesi di Laurea di:
Christian Pilato
Matr.n. 674373
Relatore: Prof. Fabrizio FERRANDI
Correlatore: Ing. Antonino TUMEO
Outlines
High-Level Synthesis
Proposed methodology
Experimental results
Some further extensions…
Conclusion and future works
Summary
2
High-Level Synthesis
Three main sub-tasks:
Inputs:
“High-Level
Synthesislevel
means
going
from
an algorithmic
level
Output:
register-transfer
(RTL)
design
in
a
hardware
description
behavioral
description
(in
C
language)
1. operation scheduling: when operations start their execution
specification
of
a behaviour
of resources
aVerilog)
digital system to a register level
language
(e.g. of
SystemC,
VHDL
and
library
different
types of
2. structure
resourcethat
allocation
and binding:
where operations are executed,
implements
that behavior”.
set of constraints
where
values objectives
are stored (area,
and how
elements
Goal:
minimize
latency,
etc.)are interconnected.
McFarland, et al., Proc. IEEE, February 1990.
3. controller synthesis: which operations are issued
Resource
Library
Behavioral
specification
Design
constraints
High-Level
Synthesis tool
Objectives
Scheduling
Allocation
Binding
Datapath
& Controller
Controller
Synthesis
High-Level Synthesis – Problem description
3
What are the problems?
All the sub-tasks are NP-complete: no efficient algorithms
Interconnections have to be considered: up to 80% of final area
All the tasks are closely interdependent
Most of information are available only at the end of the synthesis
Try non-deterministic approaches with feedback information
Genetic algorithms
Multi-objective optimization: reducing to single-objective
(weighted average) is not efficient
Non-dominated Sorting Genetic Algorithm (NSGA-II)
K. Deb, S. Agrawal, A. Pratab, and T. Meyarivan, “A Fast and Elitist Multi-Objective Genetic Algorithm: NSGAII,” Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849–858, 2000.
High-Level Synthesis – Problem description
4
The proposed methodology
Fitness Evaluation
Chromosome
Partial binding
scheduling
Finite State
Machine creation
Initial
population
Rank sorting
(NSGA-II)
Register
allocation
Interconnection
optimization
Selection
Result
estimations
Mutation
Add offsprings to
parent population
NO
Are any
stopping criteria
met?
End
YES
Crossover
High-Level Synthesis and Design Space Exploration
5
Experimental results
Development framework
Integrated in the PandA framework
an open-source C++ framework covering different aspects of
the hardware-software design of embedded systems
Evolutionary computation with Open BEAGLE framework
Functional validation
Comparison between Verilog and C simulations
Estimation model validation
Comparison between estimations and logic synthesis values
average error equal 4.02 %
standard deviance equal 2.82 %
maximum error less than 10 %
These values can be effectively used as fitness values
Experimental results
6
Experimental results
Design Space Exploration validation
Population size of 1.000 individuals, evolving up to a maximum of
200 generations
the best trade-off between overall execution time and solution
quality.
Considerations:
It takes into account all elements in the design solution
It can cover a good number of trade-offs between the fastest solution
and the minimal area solution
Better approach than existing tools to deal with area constraints
Paper accepted for publication at International Symposium on Systems, Architectures, MOdeling and
Simulation (SAMOS), Samos, Greece, July 2007
Title: “An Evolutionary Approach to Area-Time Optimization of FPGA designs”
Experimental results
7
Some features just provided…
Weighted clique covering: in register allocation to reduce interconnections
An higher weight is assigned to compatibility edge when the two
values involve the same functional units
Clique covering on a weighted graphs; results show a further
reduction of overall area up to 10%.
Fitness inheritance: to reduce overall execution time
A fraction of expensive real evaluations is substituted with an
estimation based on similar evaluated individuals
It is able to reduce overall execution time over by 25%
No substantial difference in the final Pareto-optimal solution
Paper submitted to IEEE Congress of Evolutionary Computation (CEC) 2007, Singapore, September
2007.
Title: “Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis”
Some extensions…
8
Conclusion and future works
The main contributions from this thesis are:
An high-level synthesis flow from C specifications to HDL
descriptions
Integration of a model for fast estimation of synthesis results
Design space exploration with a genetic algorithm:
It takes into account all elements composing the design solution
High fitting with real values
Multi-objective concurrent optimization
Future works:
Optimize the results coming from the synthesis flow
Further reduce the overall execution time of the proposed methodology
Refine the estimation model and specialize it for different targets
Conclusion and future works
9
Thank you!
Christian PILATO
Matr. n. 674373
High-Level Synthesis Flow
The proposed flow is organized as follows:
From C to intermediate representation
from GIMPLE to produce graph representation
High-Level Synthesis Flow
1. Partial binding and Scheduling
2. Finite State Machine creation
3. Register allocation
4. Interconnection allocation
5. Performance and area estimations
From data structures to intermediate representation in form of graph
From intermediate representation to Hardware Description Language (e.g.
Verilog) ready for logic synthesis
The proposed High-Level Synthesis flow
5
Partial Binding and Scheduling
Partial binding: force an operation to be executed on a selected
functional unit instance
β (+1) = < plus; 0 >
A technique introduced to partially control the final area occupation
It can affect scheduling, register allocation and interconnection
allocation
Scheduling: assign a starting control step to each operation to be
executed
Many scheduling algorithms are able to support partial binding
(Integer Linear Programming formulation, list based algorithm, etc.)
Different solutions based on the selected algorithm
1. Partial binding and Scheduling
6
FSM and Register allocation
Scheduling gives information about concurrent operations.
This information is useful for controller synthesis and register allocation
State Transition Graph (STG), based on Moore-FSM model, is created
on scheduled specification
It represents control flow and concorrent operations
Conditional operations create bifurcation based on evaluated
conditional values
Register allocation: allocate elements to store values across cycle step
boundaries. A compiler approach has been implemented on STG:
Liveness analysis based on dataflow equations
Interference graph based on liveness information
Different heuristics to minimize number of registers
2-3. Finite State Machine creation and Register allocation
7
The final steps…
Interconnection allocation: allocate elements to interconnect the
hardware components
Mux-based architecture: port swapping for commutative operations
Glue logic: represent logic netlist to decode commands and select inputs
Truth tables based on signals from controller
The RTL structural description is now available and it considers all
elements. Objective values could be retrieved from logic synthesis
too slow!
Estimation model: perform fast estimations of objective values.
Area is difficult to be estimated
Updated and used an existing area model*
*: C. Brandolese, W. Fornaciari, and F. Salice. “An Area Estimation Methodology for FPGA Based Designs at
SystemC-Level ”, DAC '04: Proceedings of the 41st annual conference on Design automation, pp. 129– 132,
2004.
4-5. Interconnection allocation and result estimations
8
Problem dependent elements
Chromosome encoding
Each operation in the specification has a gene to represent a feasible
partial binding
Genes are added to represent algorithms used to perform high-level
synthesis steps: scheduling, register allocation and interconnection
optimization
Fitness Evaluation
Information from chromosome about partial binding and algorithms
are used to perform a synthesis flow.
Objective values are estimated using the proposed model
Design Space Exploration by Genetic Algorithm
10
Problem independent elements
Generic operators
common operators (crossover and mutation) used without
modifications: no unfeasible chromosomes can be created.
If the gene changed by operators is related to:
operation: a new binding constraint for that operation.
algorithm: a different algorithm to solve the related synthesis step
Initial population
created by random or starting from some interesting points to explore
around them.
Solution ranking
ranking into different levels according to their fitness values.
accelerated using the fast-non-dominated-sort algorithm available in
the NSGA-II
Design Space Exploration by Genetic Algorithm
11