lect05-ece697f

Download Report

Transcript lect05-ece697f

ECE 697F
Reconfigurable Computing
Lecture 5
Technology Mapping: Packing Logic into LUTs
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Overview
• Logic synthesis
• LUT Clustering
• LUT capacity
• Chortle – example technology mapper
• Architecture-specific optimization
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Boolean network
° A Boolean network is the main representation of
the logic functions for technology independent
optimizations.
° Each node can be represented as sum-ofproducts (or product-of-sums).
° Provides multi-level structure, but functions in the
network need not correspond to logic gates.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Boolean network example
primary outputs
out1 = k2 + x2’
k2 = x1’ x2 x4 + k1
out2 = k3 + x1
k3 = k1 x4’
k1 = x2 + x3
primary inputs
x1
Lecture 5: Technology Mapping: Packing Logic Into LUTs
x2
x3
September 22, 2004
x4
Terms
° Support: set of variables used by a function.
° Transitive fanout: all the primary outputs and
intermediate variables of a function.
° Transitive fanin: all the primary inputs and
intermediate variables used by a function.
Transistive fanin determines a cone of logic.
primary inputs
cone
Lecture 5: Technology Mapping: Packing Logic Into LUTs
output
September 22, 2004
Partially-specified function
1
x2
don’t care
x1
0
1
1
x3
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Optimizations
° Simplification.
• Changing the way a function is represented.
° Network restructuring.
• Adding and removing nodes.
° Delay restructuring.
• Optimizations that reduce the height of critical paths.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Partial collapsing
f1
f4
f2
f3
before
Lecture 5: Technology Mapping: Packing Logic Into LUTs
F
f4
f3
after
September 22, 2004
Technology mapping
° Cover the function:
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
FPGA tech mapping
° Cost (number of inputs)
doesn’t always increase
with added functions:
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
FPGAs vs. custom logic
° Cost metric for static gates is literal:
• ax + bx’ has four literals, requires 8 transistors.
° Cost metric for FPGAs is logic element:
• All functions that fit in an LE have the same cost.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
LUT-based logic synthesis
° Find the largest logic cone that will fit into the LUT:
r = q + s’
q = g’ + h
s = d’
d=a+b
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
How much fits in a LUT?
• One 2-input NAND gate frequently used for comparison.
A
B
C
D
A
B
C
D
• Approximately 12 ~ 15 gates per four-input LUT.
• 216 functions -> 80 after IO swapping
14 after IO inversion
• 4-input determined to be optimal
[Rose 1990]
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Technology-Independent Logic Optimization
• Improve circuit based on cost
- Keep same functionality
• Boolean Evaluation/decomposition
• Simple factoring -> minimizing literals
f = ac + ad + bc + bd
g=a+b+c
e=a+b
g=e+c
f = e(c + d)
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Factorization
° Based on division:
• formulate candidate divisor;
• test how it divides into the function;
• if g = f/c, we can use c as an intermediate function for f.
° Algebraic division: don’t take into account
Boolean simplification. Less expensive then
Boolean division.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Library-based Technology Mapping – MIS II
• Three steps: decomposition, matching, covering
• Circuit first decomposed into NAND representations
• Different collections of NANDs can be implemented differently in
VLSI
Inv, cost 2
NAND2, cost 3
Lecture 5: Technology Mapping: Packing Logic Into LUTs
AOI-21, cost 4
September 22, 2004
MIS II
Cost =
Cost =
• Decompose into NAND-2 using Boolean techniques
• Use dynamic programming to match subtrees with libraries
• Choose lowest cost implementation that covers all
primitives.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Tech Mapping for LUTs
• Minimize total number of LUTs
• Minimize the number of levels of LUTs
• Many different approaches
- Partitioning -> Flowmap
- BDDs -> XMAP
- Chortle -> Covering
• Basic Xilinx tech mapping follows Chortle with
modification to handle registers.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Chortle-crf
• Dynamic programming approach
• Minimize # LUTs – primary goal
• Minimize # input circuit root uses
- Secondary goal
• Operates on AND-OR circuits.
AB C DE F
GH I
J
K L
M
w
x
Locate boundaries
Lecture 5: Technology Mapping: Packing Logic Into LUTs
y
z
September 22, 2004
Chortle-crf
• Major innovation is bin packing
• Simultaneously addresses decomposition and matching
• Goal: Find decomposition of every node in the network that
minimizes # LUTs in final circuit
With decomposition
2-LUTs
Without decomp
4-LUTs
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Mapping Each Tree
• Dynamically visit each node in the graph
- Fanin nodes drive the node under evaluation
Boxes -> fanin LUTs, cost is number of inputs
Bins -> N input LUT (in this case 5)
First Fit Decreasing /* construct 2-level decomp */
box list <- fanin LUTs sorted by size
bin list <- 0
while (box list is not 0) {
box <- largest LUT
find bin that will contain LUT
if bin doesn’t exist
bin <- box /* create new bin */
else
bin <- box
/* pack in exisiting */
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Multi-Level Decomposition
• Chain LUTs together
• Output of largest second level LUT connected to LUT with unused
input
• May need to add a new LUT
• Leads to min LUTs and fanout LUT with smallest # input
• This fanout LUT used as input to next stage
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Examples
a) Fanin LUTs
u
v
w
x
y
b) Two-level Decomposition
u
u
v
w
x
z.2
v
y
z.1
w
x
z.1
c) Multi-level Decomposition
y
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Optimality
• For LUTs with fewer than 6 inputs Chortle will create an optimal result for
subtree
• Combination of sub-trees is not optimized.
• Local optimizations needed to ensure global optimality.
Reconvergent paths -> net drives multiple gates.
Replicating logic -> creating additional fanout
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Translating a Design to an FPGA
• Improve 2-level decomposition to take fanout into account
• Replace FFD with an exhaustive search that repeatedly invokes
FFD.
• Try both with and without reconvergent path and select best
mapping (forced merging)
• Inputs must reconverge at node being decomposed.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Reconvergent Paths
• Frequently, more than one pair of fan-in LUTs share inputs
• For each combination of pairs that share inputs, perform FFD.
• Two-level decomp with fewest bins and smallest least filled bin
retained
Reconverge
pair list <- all pairs of fanin LUTs with
shared inputs
best LUTs <- 0
for all possible pairs from pair list {
merged LUTs <- copy of fanin LUTs
with forced merge
FFD(merged LUTs) /* best combo */
}
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Maximum Share Decreasing
• Exhaustive search prohibitive
• Select box using following criteria
1. Greatest # inputs
2. Shares greatest # inputs with any existing bin
3. Shares greatest # of inputs with existing (remaining)
boxes
• Reduces to FFD for no input sharing
• Points 2 and 3 optimize network sharing
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Node Replication
Without Replication
•
•
•
•
With Replication
Apply replication to fanout nodes
Map without replication first
Locally decompose fanout nodes to determine savings
Ordering important
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Results – Chortle-crf
• 20 netlists mapped to 5-input LUTs
• Reconvergence reduced LUTs by 2.7%
• Replication reduced LUTs by 3.7%
• Combined 14% reduction achieved
• Replication exposes reconvergent paths creating additional
opportunities for optimization.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Chortle-d
• Minimize delay through circuit
• Generally increases hardware required
- Reduced logic levels by 38%
- Increased # LUTs by 79%
• Note most delay in FPGA in interconnect
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Other Approaches
• MIS-PGA
- Groups inputs into LUTs
- Decompose into 4-LUTs (Roth-Karp)
- 47 times slower than Chortle
- 14% fewer LUTs
• XMAP
- Represent circuit as BDDs
- Effective for multiplexer based devices.
- Also, BDS-PGA
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Flowmap
1. Use network flow to partition circuit.
2. Determine point where minimum flow achieved for
minimum cut
3. Cut until LUTs of size N achieved.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Taking Flip flops into Account
• FPGA devices contain fixed resources – FFs
• Technology mapping should take these into account
• Consider fanout nodes.
FF
FF
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
LUT Packing - VPACK
• Seed BLE – choose BLE with most inputs.
• Select next BLE -> BLE which shares most inputs and outputs with
cluster
• Continue until cluster is full or adding any BLE will overflow I -> #
inputs
• Hill Climbing – exceed I limit temporarily to find better minimum.
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004
Summary
• Many tech mapping algorithms exist to minimize
delay/area
• Chortle use dynamic programming heuristic to perform
mapping
• Largely a solved problem
• More sophisticated techniques evaluated recently
Lecture 5: Technology Mapping: Packing Logic Into LUTs
September 22, 2004