Transcript good `i`

Simulated Evolution Algorithm for MultiObjective VLSI Netlist Bi-Partitioning
Sadiq M. Sait,, Aiman El-Maleh, Raslan Al Abaji
King Fahd University of Petroleum & Minerals
Dhahran, Saudi Arabia
27 May 2003, IEEE ISCAS, Bangkok, Thailand
1
Outline
•
•
•
•
•
•
Introduction
Problem Formulation
Cost Functions
Proposed Approaches
Experimental results
Conclusion
2
VLSI Technology Trends
Design Characteristics
0.06M
2MHz
6um
SPICE
Simulation
0.13M
12MHz
1.5um
CAE
Systems,
Silicon
compilation
1.2M
50MHz
0.8um
HDLs,
Synthesis
3.3M
200MHz
0.6um
Top-Down
Design,
Emulation
7.5M
333MHz
0.25um
Cycle-based
simulation,
Formal
Verification
Key CAD Capabilities
The challenges to sustain such a fast growth to achieve giga-scale
integration have shifted in a large degree, from the process of
3
manufacturing technologies to the design technology.
The VLSI Chip in 2006
Technology
Transistors
Logic gates
Size
Clock
Chip I/O’s
Wiring levels
Voltage
Power
Supply current
0.1 um
200 M
40 M
520 mm2
2 - 3.5 GHz
4,000
7-8
0.9 - 1.2
160 Watts
~160 Amps
Performance
Power consumption
Noise immunity
Area
Cost
Time-to-market
Tradeoffs!!!
4
VLSI Design Cycle
VLSI design process is carried out at a number of
levels.
1. System Specification
2. Functional Design
3. Logic Design
4. Circuit Design
5. Physical Design
6. Design Verification
7. Fabrication
8. Packaging Testing and Debugging
5
Physical Design
The physical design cycle consists of:
1. Partitioning
2. Floorplanning and Placement
3. Routing
4. Compaction
Physical design converts a circuit description
(behavioral/structural), into a geometric description.
This description is used to manufacture a chip.
6
Why we need Partitioning ?
• Decomposition of a complex system into
smaller subsystems.
• Each subsystem can be designed independently
speeding up the design process (divide-and
conquer-approach).
• Dividing a complex IC into a number of
functional blocks, each of them designed by one
or a team of engineers.
• The partitioning scheme has to minimize the
interconnections between subsystems.
7
Levels of Partitioning
System
System Level Partitioning
PCBs
Board Level Partitioning
Chip Level Partitioning
Chips
Subcircuits
/ Blocks
8
Classification of Partitioning Algorithms
Partitioning Algorithms
Group Migration
Iterative Heuristics
1.
Kernighan-Lin
1.
Simulated annealing
2.
FiducciaMattheyeses (FM)
2.
Simulated evolution
3.
Tabu Search
4.
Genetic
3.
Multilevel K-way
Partitioning
Performance
Driven
1.
2.
3.
4.
Lawler et al. 1.
2.
Vaishnav
Choi et al.
Jun’ichiro et
al.
Others
Spectral
Multilevel
Spectral
9
Related previous Works
1999 Two low power oriented techniques based on simulated annealing (SA)
algorithm by choi et al.
1969 A bottom-up approach for delay optimization (clustering) was proposed by
Lawler et al.
1998 A circuit partitioning algorithm under path delay constraint is proposed by
jun’ichiro et al. The proposed algorithm consists of the clustering and
iterative improvement phases.
1999 Enumerative partitioning algorithm targeting low power is proposed in
Vaishnav et al. Enumerates alternate partitionings and selects a partitioning
that has the same delay but less power dissipation. (not feasible for huge
circuits.)
10
Motivation
•
•
•
•
•
•
Need for Power optimization
Portable devices
Power consumption is a hindrance in further integration
Increasing clock frequency
Need for delay optimization
In current sub micron design wire delay tend to
dominate gate delay.
Larger die size imply long on-chip global routes, which
affect performance
Optimizing delay due to off-chip capacitance
11
Objective
• Design a class of iterative algorithms for
VLSI multi-objective partitioning.
• Explore partitioning from a wider angle and
consider circuit delay, power dissipation
and interconnect in the same time, under a
given balance constraint
12
Problem formulation
Objectives
• Power cost is optimized
• Delay cost is optimized
• Cutset cost is optimized
Constraint
• Balanced partitions to a certain tolerance degree (10%)
13
Problem formulation
• the circuit is modeled as a hypergraph H(V,E),
where V ={v1,v2,v3,… vn} is a set of modules
(cells).
• And E = {e1, e2, e3,… ek} is a set of hyperedges.
Being the set of signal nets, each net is a subset of
V containing the modules that the net connects.
• A two-way partitioning of a set of nodes V is to
determine two subsets VA and VB such that VA U VB = V
and VA VB = 
14
Cutset
•
•
•
•
Based on hypergraph model H = (V, E)
Cost 1: c(e) = 1 if e spans more than 1 block
Cutset = sum of hyperedge costs
Efficient gain computation and update
cutset = 3
15
Delay Model
Metal 1
Metal 2
C2
C7
C3
SE1
C1
C4
C5
SE2
Cut Line
C6
CoffChip
path : SE1  C1C4C5SE2.
Delay = CDSE1 + CDC1+ CDC4+ CDC5+ CDSE2
CDC1 = BDC1 + LFC1 * ( Coffchip + CINPC2+ CINPC3+ CINPC4)
16
Delay
Delay(Pi) =
 Delay (cell )   Delay (net )
cellPi
netPi
Objective : Max Delay ( Pi) 
PiP
Pi: is any path Between 2 cells or nodes
P: set of all paths of the circuit.
17
Power
The average dynamic power consumed by CMOS logic gate in a
synchronous circuit is given by:
Pi
average
2
dd
V
Load
 0 .5 
 Ci  N i
Tcycle
Ni is the number of output gate transition per cycle (Switching
Probability)
Load
i
C
: is the load capacitance
18
Power
Load
i
C
C
basic
i
C
extra
i
basic : Load Capacitances driven by a cell before
i
Partitioning
C
C
extra : Additional load due to off chip capacitance.
i
(cut net)
Total Power dissipation of a Circuit:
2
dd


V
basic
extra
P   
  Ci  C i  N i
Tcycle i
19
Power
C
C
extra
i
 C
basic
i
extra
: Can be assumed identical for all nets
i
objective:  N i
v
iv
:Set of Visible gates Driving a load outside the
partition.
20
Unifying Objectives, How ?
• Weighted Sum Approach




 DelayCosto fCircuit 
Power 
 CutsetCost 

  Wc 
cos t  Wd * 
  Wp 
Maxdelay
MaxPower 
 MaxCutest 






1. Problems in choosing weights.
2. Need to tune for every circuit.
21
Fuzzy logic for cost function
•Imprecise values of the objectives
– best represented by linguistic terms that are
basis of fuzzy algebra
• Conflicting objectives
• Operators for aggregating function
22
Fuzzy logic for Multi-objective function
1. The cost to membership mapping
2. Linguistic fuzzy rule for combining the
membership values in an aggregating
function
3. Translation of the linguistic rule in form of
appropriate fuzzy operators
4.
And-like operators: Min operator  = min (1, 2)
And-like OWA: = * min (1,2) + ½ (1-) (1+
2)
Or-like operatorsMax operator  = max (1, 2)
Or-like OWA: = * max (1,2) + ½ (1-) (1+
2)
23
Membership functions
Where Oi and Ci are lower bound and actual cost of objective “i”
 i(x) is the membership of solution x in set “good ‘i’ ”
gi is the relative acceptance limit for each objective.
24
Fuzzy linguistic rule
• A good partitioning can be described by the following
fuzzy rule
IF solution has small cutset AND low power AND
short delay AND
good Balance.
THEN it is a good solution
25
Fuzzy cost function
The above rule is translated to AND-like OWA
 ( x)    min  C ,  P ,  D ,  B  1   
1
 C   P   D   B 
4
 (x) Represent the total Fuzzy fitness of the solution,
our aim is to Maximize this fitness
C ,  P ,  D ,  B Respectively (Cutset, Power, Delay, Balance)
Fitness
26
Simulated Evolution
Algorithm Simulated evolution
Begin
Start with an initial feasible Partition S
Repeat
Evaluation :
Evaluate the Gi (goodness) of all modules
Selection :
For each Vi (cell) DO
begin
if Random Rm > Gi then select the cell
End For
Allocation: For each selected Vi (cell) DO
begin
Move the cell to destination Block.
End For
Until Stopping criteria is satisfied.
Return best solution.
End
27
Cut goodness
d i  wi
gci 
di
gc5 
3 2
3
 0.33
di: set of all nets, Connected
and not cut.
wi : set of all nets, Connected
and cut.
28
Power Goodness
k
gpi 
k
 S  j V    S  j U 
j 1
j
I
j
j 1
i
k
 S  j V 
j 1
j
I
Vi is the set of all nets connected and Ui is
the set of all nets connected and cut.
gp5 
0.7  0.4
 0.428
0.7
29
Delay Goodness
gd i 
K i  Li
Ki
Ki: is the set of cells in all paths passing by cell i.
Li: is the set of cells in all paths passing by cell i
and are not in same block as i.
52
gd 5 
 0.6
5
53
gd 4 
 0 .4
5
30
Final selection Fuzzy rule
IF Cell I is near its optimal Cut-set goodness as
compared to other cells
AND
AND
near its optimal power goodness
compared to other cells
near its optimal net delay goodness as compared
to other cells
OR
T(max)(i) is much smaller than Tmax
THEN it has a high goodness.
31
Fuzzy Goodness
Tmax :delay of most critical
path in current iteration.
T(max)(i) :delay of longest path
traversing cell i.
Xpath= Tmax / T(max)(i)
Fuzzy Goodness:
1
gi  i ( x)    min iC , iP , iD  1    iC  iP  iD 
3
 iC ,  iP ,  iD
Respectively (Cutset, Power, Delay ) goodness.
32
Experimental Results
ISCAS 85-89 Benchmark Circuits
33
SimE versus Tabu Search & GA against time Circuit S13207
34
Experimental Results SimE versus Ts versus GA
SimE results were better than TS and GA, with faster execution time.
35
•
Conclusion
Re-write
this
The present work successfully addressed the
important issue of reducing power and delay
consumption in VLSI circuits.
• The present work successfully formulate and
provide solutions to the problem of multiobjective VLSI partitioning
• TS partitioning algorithm outperformed GA in
terms of quality of solution and execution time
36
Thank you
37