LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization
Harikrishnan K.C.
University of Massachusetts Amherst
Overview
• Motivation
• Introduction
• FPGA Architecture
• LOPASS Synthesis Flow
• High Level Power Estimation
• Power Optimization Engine
• Multiplexer Optimization for Interconnect Reduction
• Experimental Results
• Conclusion
Motivation
• Power consumption
• A critical constraining factor in the IC design flow
• Field Programmable Gate Arrays (FPGAs)
• Power-inefficient due to the large number of transistors needed for programmability
• Fixed logic and routing resources
• Difficult to optimize during the physical design stage
Introduction
• Behavioral-level optimization
• Scheduling, allocation, and binding
• Techniques for power reduction
• High-level power estimation
• Simultaneous scheduling, allocation, and binding for power optimization
• Interconnect optimization
Previous Work
• Most previous high-level synthesis techniques for FPGAs optimized objectives other than power reduction
• Dynamic reconfiguration at run time to save area [M. Vasilko, Int. Workshop Logic Architecture Synthesis, 1995]
• Tradeoff between power and circuit speed by selecting different implementations of components
• Power consumption in steering logic and interconnects was not considered [F. G. Wolff, Proc. IEEE Nat. Aerospace Conf., 2000]
• Newer studies have looked into simultaneous resource allocation and binding algorithms for power reduction [D. Chen, Proc. Asia South Pacific Des. Autom. Conf., Jan. 2007]
Techniques for Power Reduction
• High-level power estimation
• Needed for effective power optimization
• Accounts for wire capacitance, wire length, and FPGA characteristics
• Power optimization engine
• Explores the combined scheduling, allocation, and binding solution space
• Simulated-annealing-based algorithm
• Interconnect optimization
• Reduces multiplexer (MUX) requirements
FPGA Architecture
• SRAM-based technology
• Configurable Logic Block (CLB)
• Basic Logic Element (BLE)
• Look Up Table (LUT)
• Routing architecture parameters
• Channel Width (W)
• Switch box flexibility (Fs)
• Connection box flexibility (Fc)
LOPASS Synthesis Flow
• The HDL design is converted to a control data flow graph (CDFG)
• Power values are estimated by the high-level power estimator
• Power is optimized by the low-power optimization engine
• RTL synthesis is performed with Synopsys Design Compiler
• The FPGA evaluation tool fpgaEva_LP2 reports delay, power, and area
High Level Power Estimation
• Wire length estimation
• Rent’s rule: $T = k N^{p}$, where T is the number of terminals of a region containing N CLBs
• Interconnect density function i(l)
• p is Rent’s exponent, α is the fraction of sink terminals
• f.o. is the average fan-out, k is the average number of inputs/outputs per CLB
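A minimal Python sketch of the two relations on this slide, assuming the standard reading of Rent’s rule (T terminals for a region of N CLBs) and the usual stochastic wire-length-model relation α = f.o./(f.o. + 1), which follows from each net having one source and f.o. sinks. The function names and example numbers are hypothetical, and the interconnect density function i(l) itself is not reproduced.

```python
def rent_terminals(n_blocks: int, k: float, p: float) -> float:
    """Rent's rule: T = k * N**p terminals for a region of N CLBs."""
    return k * n_blocks ** p


def sink_fraction(avg_fanout: float) -> float:
    """Fraction of terminals that are sinks: each net has 1 source and f.o. sinks."""
    return avg_fanout / (avg_fanout + 1)


if __name__ == "__main__":
    # Hypothetical values: 100 CLBs, k = 4 terminals per CLB, Rent exponent p = 0.5
    print(rent_terminals(100, k=4, p=0.5))  # 40.0
    print(sink_fraction(3))                 # 0.75
```

A smaller p implies more local connectivity, since proportionally fewer terminals leave a region as it grows.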
High Level Power Estimation cont.
• Switching activity estimation
• CDFG simulation
• $C_{in}(O, O')$: number of input transitions when a functional unit (FU) switches from operation O to O′
• The switching activity Sin is given by
• The total switching activity of the overall design
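The slide’s formulas for S_in and the total switching activity are not recoverable, so this is only a sketch of how $C_{in}(O, O')$ might be measured from CDFG simulation. It assumes input transitions are counted as bit toggles (Hamming distance) between the operands of O and those of O′ on the same FU, averaged over simulation runs; the trace format and function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions that toggle between two non-negative operand values."""
    return bin(a ^ b).count("1")


def input_transitions(trace_o, trace_o_next):
    """Average FU input toggles when the FU switches from operation O to O'.

    trace_o[i] and trace_o_next[i] are the operand tuples of O and O'
    observed in simulation run i (hypothetical trace format).
    """
    if not trace_o or len(trace_o) != len(trace_o_next):
        raise ValueError("traces must be non-empty and the same length")
    total = 0
    for ops_o, ops_next in zip(trace_o, trace_o_next):
        # Sum toggles over all FU input ports for this run
        total += sum(hamming(x, y) for x, y in zip(ops_o, ops_next))
    return total / len(trace_o)


if __name__ == "__main__":
    # Two simulation runs of a 2-input FU: operands of O, then operands of O'
    trace_o = [(3, 5), (0, 0)]
    trace_o_next = [(0, 5), (1, 0)]
    print(input_transitions(trace_o, trace_o_next))  # 1.5
```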
High Level Power Estimation cont.
• Resource library characterization
• DesignWare libraries from Synopsys
• Different resource versions implementing the same operation type
Figure: Resource characterization flow
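High Level Power Estimator
• Both static and dynamic power need to be considered
• Dynamic power: $P_{dynamic} = P_{LUT} + P_{REG} + P_{LW} + P_{GW}$
• Static power: $P_{static} = P_{s\_LUT} + P_{s\_FF} + P_{s\_LB} + P_{s\_GB}$
• $P_{LUT} = N_{LUT} \cdot S \cdot E_{LUT} \cdot f$
• $P_{REG} = N_{REG} \cdot S \cdot E_{REG} \cdot f$
• $P_{LW}, P_{GW} = 0.5 \, f \cdot S \cdot V_{dd}^{2} \cdot C_{wire}$
• N: number of LUTs or registers; S: switching activity; E: energy per switching event; f: clock frequency; $V_{dd}$: supply voltage; $C_{wire}$: capacitance of the local (LW) or global (GW) wires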
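A small Python sketch that evaluates the component models above. Argument names mirror the slide’s symbols; the example values (100 LUTs, S = 0.2, 1 pJ per LUT transition, 100 MHz, 1.2 V, 10 pF of wire) are hypothetical and only show the units working out to watts. The static terms are taken as given inputs because the slide does not show how they are computed.

```python
def component_power(n: int, s: float, e: float, f: float) -> float:
    """P = N * S * E * f for LUTs or registers (E in joules, f in hertz)."""
    return n * s * e * f


def wire_power(f: float, s: float, vdd: float, c_wire: float) -> float:
    """P = 0.5 * f * S * Vdd^2 * C_wire for local or global wires."""
    return 0.5 * f * s * vdd ** 2 * c_wire


def total_power(p_lut, p_reg, p_lw, p_gw, static_terms):
    """P_dynamic + P_static, with static terms (LUT, FF, LB, GB) supplied by the caller."""
    p_dynamic = p_lut + p_reg + p_lw + p_gw
    return p_dynamic + sum(static_terms)


if __name__ == "__main__":
    f, s = 100e6, 0.2
    p_lut = component_power(100, s, 1e-12, f)  # about 2 mW
    p_lw = wire_power(f, s, 1.2, 10e-12)       # about 0.144 mW
    print(p_lut, p_lw)
```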
Power Optimization Engine
• FPGAs have an abundance of distributed registers
• FPGAs lack efficient support for wide MUXes
• Uses simulated annealing, a hill-climbing search that occasionally accepts uphill moves to escape local minima, to gradually reduce overall power
Figure: Power optimization engine
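The following is a generic simulated-annealing sketch, not the LOPASS engine itself: the slide does not give the engine’s move set or cost function. A hypothetical toy problem binds 8 operations to 3 FUs, with the sum of squared FU loads as a stand-in cost (a crude proxy for wide-MUX pressure); the cooling parameters are arbitrary assumptions.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor, t0=10.0, alpha=0.95,
                        iters_per_temp=50, t_min=1e-3, seed=0):
    """Minimize cost(); accept uphill moves with probability exp(-delta / T)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Downhill moves are always taken; uphill moves become rarer as T cools
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost


NUM_FUS = 3  # hypothetical number of functional units


def load_cost(binding):
    """Toy cost: sum of squared FU loads (penalizes piling ops onto one FU)."""
    loads = [0] * NUM_FUS
    for fu in binding:
        loads[fu] += 1
    return sum(load * load for load in loads)


def move_one_op(binding, rng):
    """Neighbor move: rebind one randomly chosen operation to a random FU."""
    ops = list(binding)
    ops[rng.randrange(len(ops))] = rng.randrange(NUM_FUS)
    return tuple(ops)


if __name__ == "__main__":
    start = (0,) * 8  # all 8 operations bound to FU 0: cost 64
    best, best_cost = simulated_annealing(start, load_cost, move_one_op)
    print(best, best_cost)
```

Accepting some uphill moves at high temperature is what distinguishes annealing from pure greedy hill climbing.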
Multiplexer Optimization for Interconnect Reduction
• Register binding
• k-cofamily-based algorithm
• Port assignment
• Port assignment algorithm
• Definitions
• Data flow graph (DFG): G = (V, A)
• Compatibility graph: $G_c = (V_c, A_c)$
Register Binding
• Given a compatibility graph $G_c = (V_c, A_c)$,
• find a subset of $A_c$ that covers all vertices in $V_c$
• such that the total weight of the selected edges is minimum
• Compute minimum-weight cofamilies of a partially ordered set (POSET)
• POSET concepts
• chain: a set of mutually comparable elements; antichain: a set of mutually incomparable elements
• k-family: a union of at most k antichains; k-cofamily: a union of at most k chains
• Theorem: Register binding on a compatibility graph $G_c$ into k registers is equivalent to finding k disjoint chains in the POSET.
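For register binding, the POSET elements are variables and x precedes y when x’s lifetime ends no later than y’s begins, so one register holds one chain. Assuming half-open lifetimes [birth, death) taken from a schedule (a hypothetical representation), Dilworth’s theorem says the fewest chains covering every variable equals the largest antichain, which for lifetimes is the peak number of simultaneously live variables. The sketch computes that lower bound on k and checks whether a given order forms a valid chain.

```python
def min_registers(lifetimes):
    """Peak number of simultaneously live variables = minimum chain cover size.

    lifetimes: list of (birth, death) pairs with half-open [birth, death).
    """
    events = []
    for birth, death in lifetimes:
        events.append((birth, +1))
        events.append((death, -1))
    # At equal times, deaths (-1) sort before births (+1), so a register freed
    # at time t can be reused by a variable born at t
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def is_chain(lifetimes, chain):
    """True if the variables in chain can share one register in the given order."""
    return all(lifetimes[a][1] <= lifetimes[b][0] for a, b in zip(chain, chain[1:]))


if __name__ == "__main__":
    lifetimes = [(0, 2), (1, 3), (2, 4), (3, 5)]  # variables 0..3
    print(min_registers(lifetimes))  # 2
    print(is_chain(lifetimes, [0, 2]), is_chain(lifetimes, [0, 1]))  # True False
```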
Register Binding cont.
• Find the minimum-weight k-cofamily in the POSET
• Convert the POSET into a network flow graph, the split graph
• Find the minimum-cost flow in this split graph
• Edge costs come from the MUX cost function formulated below
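A Python sketch of the split-graph idea under stated assumptions: each variable v becomes an arc v_in → v_out with a large negative cost (so the flow prefers to cover every variable), v_out → w_in arcs join compatible variables, and k units of flow from a source to a sink then trace k disjoint chains, one per register. The edge weights come from a caller-supplied `weight(i, j)` (hypothetical), not from the LOPASS cost function, and the solver is plain successive shortest paths with Bellman-Ford, chosen for brevity rather than speed.

```python
BIG = 10**6  # reward for covering a variable; must exceed the total edge weight


class MinCostFlow:
    """Successive-shortest-path min-cost flow for unit-capacity graphs."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge: [to, cap, cost, rev, orig_cap]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v]), cap])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1, 0])

    def run(self, s, t, max_flow):
        flow = 0
        while flow < max_flow:
            # Bellman-Ford tolerates negative costs (the residual graph has no negative cycles)
            dist, prev = [float("inf")] * self.n, [None] * self.n
            dist[s] = 0
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for i, (v, cap, cost, _, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v], prev[v], changed = dist[u] + cost, (u, i), True
                if not changed:
                    break
            if dist[t] == float("inf"):
                break  # no augmenting path left
            v = t
            while v != s:  # push one unit along the shortest path
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= 1
                self.graph[v][edge[3]][1] += 1
                v = u
            flow += 1
        return flow


def bind_registers(lifetimes, k, weight):
    """Return up to k chains (registers) covering the variables at minimum weight."""
    n = len(lifetimes)
    S, T = 0, 1
    v_in, v_out = (lambda i: 2 + 2 * i), (lambda i: 3 + 2 * i)
    mcf = MinCostFlow(2 + 2 * n)
    for i in range(n):
        mcf.add_edge(S, v_in(i), 1, 0)
        mcf.add_edge(v_in(i), v_out(i), 1, -BIG)
        mcf.add_edge(v_out(i), T, 1, 0)
        for j in range(n):
            if i != j and lifetimes[i][1] <= lifetimes[j][0]:  # i may precede j
                mcf.add_edge(v_out(i), v_in(j), 1, weight(i, j))
    mcf.run(S, T, k)

    def used_successor(node):
        # A saturated forward edge (orig_cap 1, cap now 0) carries the flow
        return next(e[0] for e in mcf.graph[node] if e[4] > 0 and e[1] == 0)

    chains = []
    for e in mcf.graph[S]:
        if e[4] > 0 and e[1] == 0:  # flow leaves the source here: a chain starts
            chain = [(e[0] - 2) // 2]
            nxt = used_successor(v_out(chain[0]))
            while nxt != T:
                chain.append((nxt - 2) // 2)
                nxt = used_successor(v_out(chain[-1]))
            chains.append(chain)
    return chains


if __name__ == "__main__":
    lifetimes = [(0, 2), (1, 3), (2, 4), (3, 5)]
    print(bind_registers(lifetimes, k=2, weight=lambda i, j: 1))  # [[0, 2], [1, 3]]
```

With positive weights, a larger k lets the flow split chains, so k directly controls the number of registers used.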
Cost Function Formulation
• A MUX is needed in two situations
• when two or more registers feed data to the same FU port
• when two or more FUs produce results that are stored in the same register
• The cost function is defined as
$N_{mux}$: number of MUXes saved or wasted
$T_{r-f}$: total number of connections between registers and fan-out FUs
$T_{fu}$: total number of fan-out FUs involved
α and β are positive scaling constants
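The two MUX situations above can be counted directly once a binding is fixed. This hypothetical helper takes register-to-port reads and FU-to-register writes and reports how many MUXes are needed and their total inputs; it does not implement the slide’s cost function, whose exact combination of $N_{mux}$, $T_{r-f}$, $T_{fu}$, α and β is not recoverable here.

```python
from collections import defaultdict


def count_muxes(reads, writes):
    """Count MUXes implied by a binding.

    reads:  iterable of (register, fu, port) data transfers into FU ports
    writes: iterable of (fu, register) result transfers into registers
    Returns (number of MUXes, total MUX inputs).
    """
    port_sources = defaultdict(set)  # (fu, port) -> registers feeding it
    for reg, fu, port in reads:
        port_sources[(fu, port)].add(reg)
    reg_sources = defaultdict(set)  # register -> FUs writing into it
    for fu, reg in writes:
        reg_sources[reg].add(fu)
    muxes = mux_inputs = 0
    for sources in list(port_sources.values()) + list(reg_sources.values()):
        if len(sources) >= 2:  # two or more distinct sources need a MUX
            muxes += 1
            mux_inputs += len(sources)
    return muxes, mux_inputs


if __name__ == "__main__":
    reads = [("R1", "ADD", "a"), ("R2", "ADD", "a"), ("R3", "ADD", "b")]
    writes = [("ADD", "R1"), ("MUL", "R1"), ("ADD", "R2")]
    print(count_muxes(reads, writes))  # (2, 4)
```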
Port Assignment
• Technique for reducing MUX connections
• Case 1
• Case 2
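The two cases on this slide are not recoverable, but one common form of port assignment exploits commutativity: swapping the operands of a commutative operation can make each FU port see fewer distinct source registers. The brute-force sketch below is a hypothetical illustration, not the LOPASS port assignment algorithm; it tries every operand order for the operations bound to one 2-input FU and is exponential in the number of operations, so it suits only small examples.

```python
from itertools import product


def mux_inputs(assignment):
    """Total MUX inputs on the two ports of a 2-input FU for a given operand order."""
    left = {a for a, _ in assignment}
    right = {b for _, b in assignment}
    # A port needs a MUX only when two or more distinct registers feed it
    return sum(len(port) for port in (left, right) if len(port) >= 2)


def best_port_assignment(ops):
    """Try both operand orders of every commutative op; keep the cheapest."""
    best = None
    for swaps in product((False, True), repeat=len(ops)):
        assignment = [(b, a) if swap else (a, b) for (a, b), swap in zip(ops, swaps)]
        cost = mux_inputs(assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    # Two commutative ops on one FU, e.g. R1 + R2 and R2 + R1
    print(best_port_assignment([("R1", "R2"), ("R2", "R1")]))
    # (0, [('R1', 'R2'), ('R1', 'R2')])
```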
Experimental Results
• Power estimation
• Estimated values compared with those reported by fpgaEva_LP2
• Estimated wire length deviates by 13.7% from the fpgaEva_LP2 result
• Estimated total power deviates by 14.1% from the fpgaEva_LP2 result
• Multiplexer optimization
• The k-cofamily algorithm compared with the bipartite and left-edge algorithms
• 24.7% better than the bipartite algorithm
• 29.6% better than the left-edge algorithm
Experimental Results cont.
• LOPASS compared to SPARK
• 9.1% better latency
• LOPASS compared to Synopsys Behavioral Compiler
• 57.3% reduction in CLBs
• 61.6% reduction in total power consumption
• 10.6% reduction in critical delay
• LOPASS compared to Impulse C
• On average, 77.1% reduction in multipliers and 27.9% reduction in logic elements (LEs)
• 44.1% reduction in dynamic power and 31.1% reduction in total power
Conclusion
• LOPASS, a low-power architectural synthesis system for FPGA designs, is presented
• It includes three major components
• a flexible high-level power estimator
• a simulated-annealing-based optimization engine
• a k-cofamily-based register binding algorithm
• LOPASS is 61.6% better on power consumption and 10.6% better on clock period compared to Synopsys BC
• LOPASS is 31.1% better on power consumption, with an 11.8% clock-period penalty, compared to Impulse C
Thank You!