Transcript Document
Fast Synthesis of Clock
Gating from Existing Logic
Aaron P. Hurst
Univ. of California, Berkeley
Portions In Collaboration with…
Artur Quiring and Andreas Kuehlmann
Cadence Berkeley Labs
IWLS 2007
Motivation
Dynamic power consumption of clock network
consumes 30-40% of total power in current designs
Every register clock input is switched every cycle
A large fraction of these transitions can be avoided
Clock gating inserts combinational logic on clock
path to conditionally block switching
Capacitive load is “hidden” behind gates
Implementation
G is the gating condition for the registers R1-R3
The clock is not propagated when active
R1
R1
G
G
R2
R2
clk
clk
clkG
Clock Gate
R3
R3
clkG
Glitch-Safe Clock Gate
Glitches in G may propagate without a latch
Clock Gating Problem
Problem: How to produce gating conditions that are…
1.
2.
3.
4.
Functionally correct
Meet timing and physical constraints
Result in maximal dynamic power savings
Require minimal additional area and power to generate
Combinational versus sequential
Combinational gating conditions are functions of signals
available within the same cycle
This work addresses the combinational problem…
Previous Approaches
Human effort
Worthwhile investment at architectural level
Automatic approaches are needed at netlist level
Symbolic computation of gating functions
Problem 1: Symbolic functional manipulation is not scalable
Problem 2: Implementing required logic is unpredictable
?
quality
quality
0
1
New Algorithm
Computation is scalable and bounded
Combining simulation and SAT-solving
Existing logic is heavily reused
Contains difficulty of synthesis problem
Minimizes design perturbation
Terminology
Let x be the set of combinational inputs (PIs and states)
Given a register R
Let xR be its current state
Let FR(x) be its next state
The register doesn’t switch when FR(x) = xR
A function G is a valid combinational clock gating
function for R if the validity condition is met
GR FR ( x) xR
G1 (x)
FR ( x) xR
G3 (x)
G2 (x)
G4 (x)
Nodes
Assume existing logic network is some sort of DAG
Each node in the network implements some function f
Can be used “for free”… less additional load and wiring
Nodes of interest are collected for each register
Not all pairs need to be enumerated. Constrained by…
1.
2.
3.
4.
Physical location
Timing information
Functionality
Potential power savings
Nodes: Constraints
Timing Constraints
Physical Constraints
Reason: gating condition must be
available before clock
Reason: length of clock gating nets
Limit nodes to a local region around reg.
Late-arriving signals are discarded
R
kept
discarded
Nodes: Pruning
Multiple simulation vectors are pushed through
circuit and node / register pairs are examined
1.
Search for counterexamples to functional validity
2.
If one is found, node is marked as proven invalid
Accumulate probabilistic information about the actual
coverage of a gating function
P GR FR ( x) xR
Using size of Boolean ON-set not an effective estimate
If available, simulation traces reflective of actual operation
Nodes: Proving
Functional validity of nodes must be conclusive
Problem is formulated as
Boolean satisfiability using
a simple test structure
SAT?
x
g(x)
FR(x)
R
xR
Two powerful speed-ups
1. Solver can be run in incremental mode
2. Counter-examples can be used for further simulation
Covers
Nodes are merged together into disjunctive covers
A cover G is a set of nodes { fi }
Is valid gating function for register R iff all fi are valid for R
FR ( x) xR
f1 (x)
f (x)
G (x) 3
f2 (x)
Functional coverage of clock gate is increased with
little additional hardware
Only an additional input is required on the AND-gate
Covers: Problem
Information collected thus far…
registers
Possible nodes for each register
Validity and probability
R4 R5 . . . R66 R87
Proven
Unknown
Disproven
nodes
Functional Validity
f1
0.10 0.05
f7
0.23
f104
0.33
0.05
...
f233
0.17
0.00
Selecting gates is an instance of rectangle covering
But… correlation between nodes is not known
Split rectangle covering into two phases
Cover generation and selection
Redo simulation between to capture correlation
Covers: Generation & Selection
A heuristic is used generate interesting covers
R4
NP-complete
Efficient heuristics exist
Problem instance is sparse
G1 0.10 0.05
G7
0.01
G104 0.02
...
Problem is maximum set covering
covers
registers
R5 . . . R66 R87
G233
0.25
0.31
0.01
0.01
0.15
Covers are greedily selected in order of power savings
Power cost of additional logic included
Negative power savings are ignored
There’s More…
When clock is gated, register keeps its value. Its input is irrelevant
An observability don’t care (ODC) is produced
ODC FR ( x) GR ( x)
In general, these are difficult to utilize in logic minimization
A purely structural simplification can be applied
Rule: Any immediate fanout of f can be rewired to a constant if its
transitive fanout contains only registers gated by a cover containing f
f
0
f G
clkG
clk
Results
Procedure can be applied both pre- and postmapping
Post-mapping includes physical and timing information
Pre-mapping exposes non-physical signals
If useful for clock gating, mark to be mapped
Preliminary experimental results: technology
independent netlist without pair-wise compounds
Applied to OpenCores benchmarks
Pre-synthesized using ABC logic synthesis package
Implemented in OpenAccess / OpenAccess Gear
Results
Total Register Switching
1
1
Average reduction in
register switching: 14.4%
Average reduction in size
of circuit: 7.7%
0.855
0.8
0.6
0.4
0.4
0.2
Original
Average
Best
Completed Next Steps
Technology-dependent version
New scheme for delaying SAT-based proof
Creating new simple functions of existing nodes
Additional candidates for improving coverage
G2(x)
G1 (x)
FR ( x) xR
G2 (x)
G3(x)
Future Next Steps
G1
Hierarchical Clock Gating
G2
G3
clk
clkG1
clkG2
clkG3
Bounded sequential gating selection
Reaching forward, to identify more unobservable transitions
Reaching backward, to generate more signals
xi-2
xi-1
xi
g(xi-2)
clk
clkG