Transcript Document

Fast Synthesis of Clock
Gating from Existing Logic
Aaron P. Hurst
Univ. of California, Berkeley
Portions In Collaboration with…
Artur Quiring and Andreas Kuehlmann
Cadence Berkeley Labs
IWLS 2007
Motivation

Dynamic power consumption of clock network
consumes 30-40% of total power in current designs

Every register clock input is switched every cycle

A large fraction of these transitions can be avoided

Clock gating inserts combinational logic on clock
path to conditionally block switching

Capacitive load is “hidden” behind gates
Implementation

G is the gating condition for the registers R1-R3
The clock is not propagated when active

R1
R1
G
G
R2
R2
clk
clk
clkG
Clock Gate

R3
R3
clkG
Glitch-Safe Clock Gate
Glitches in G may propagate without a latch
Clock Gating Problem

Problem: How to produce gating conditions that are…
1.
2.
3.
4.

Functionally correct
Meet timing and physical constraints
Result in maximal dynamic power savings
Require minimal additional area and power to generate
Combinational versus sequential

Combinational gating conditions are functions of signals
available within the same cycle
This work addresses the combinational problem…
Previous Approaches

Human effort



Worthwhile investment at architectural level
Automatic approaches are needed at netlist level
Symbolic computation of gating functions
Problem 1: Symbolic functional manipulation is not scalable
Problem 2: Implementing required logic is unpredictable
?
quality
quality
0
1
New Algorithm

Computation is scalable and bounded


Combining simulation and SAT-solving
Existing logic is heavily reused


Contains difficulty of synthesis problem
Minimizes design perturbation
Terminology


Let x be the set of combinational inputs (PIs and states)
Given a register R


Let xR be its current state
Let FR(x) be its next state

The register doesn’t switch when FR(x) = xR

A function G is a valid combinational clock gating
function for R if the validity condition is met
GR  FR ( x)  xR
G1 (x)
FR ( x)  xR
G3 (x)
G2 (x)
G4 (x)
Nodes


Assume existing logic network is some sort of DAG
Each node in the network implements some function f

Can be used “for free”… less additional load and wiring

Nodes of interest are collected for each register

Not all pairs need to be enumerated. Constrained by…
1.
2.
3.
4.
Physical location
Timing information
Functionality
Potential power savings
Nodes: Constraints
Timing Constraints


Physical Constraints
Reason: gating condition must be
available before clock

Reason: length of clock gating nets

Limit nodes to a local region around reg.
Late-arriving signals are discarded
R
kept
discarded
Nodes: Pruning

Multiple simulation vectors are pushed through
circuit and node / register pairs are examined
1.
Search for counterexamples to functional validity

2.
If one is found, node is marked as proven invalid
Accumulate probabilistic information about the actual
coverage of a gating function
P  GR   FR ( x)  xR  


Using size of Boolean ON-set not an effective estimate
If available, simulation traces reflective of actual operation
Nodes: Proving

Functional validity of nodes must be conclusive

Problem is formulated as
Boolean satisfiability using
a simple test structure
SAT?
x
g(x)
FR(x)
R
xR
Two powerful speed-ups
1. Solver can be run in incremental mode
2. Counter-examples can be used for further simulation
Covers

Nodes are merged together into disjunctive covers


A cover G is a set of nodes { fi }
Is valid gating function for register R iff all fi are valid for R
FR ( x)  xR

f1 (x)
f (x)
G (x) 3
f2 (x)
Functional coverage of clock gate is increased with
little additional hardware

Only an additional input is required on the AND-gate
Covers: Problem
Information collected thus far…


registers
Possible nodes for each register
Validity and probability
R4 R5 . . . R66 R87
Proven
Unknown
Disproven



nodes
Functional Validity
f1
0.10 0.05
f7
0.23
f104
0.33
0.05
...

f233
0.17
0.00
Selecting gates is an instance of rectangle covering
But… correlation between nodes is not known
Split rectangle covering into two phases


Cover generation and selection
Redo simulation between to capture correlation
Covers: Generation & Selection

A heuristic is used generate interesting covers
R4




NP-complete
Efficient heuristics exist
Problem instance is sparse
G1 0.10 0.05
G7
0.01
G104 0.02
...
Problem is maximum set covering
covers

registers
R5 . . . R66 R87
G233
0.25
0.31
0.01
0.01
0.15
Covers are greedily selected in order of power savings

Power cost of additional logic included

Negative power savings are ignored
There’s More…

When clock is gated, register keeps its value. Its input is irrelevant

An observability don’t care (ODC) is produced
ODC FR ( x)  GR ( x)



In general, these are difficult to utilize in logic minimization
A purely structural simplification can be applied
Rule: Any immediate fanout of f can be rewired to a constant if its
transitive fanout contains only registers gated by a cover containing f
f
0
f  G
clkG
clk
Results

Procedure can be applied both pre- and postmapping


Post-mapping includes physical and timing information
Pre-mapping exposes non-physical signals


If useful for clock gating, mark to be mapped
Preliminary experimental results: technology
independent netlist without pair-wise compounds



Applied to OpenCores benchmarks
Pre-synthesized using ABC logic synthesis package
Implemented in OpenAccess / OpenAccess Gear
Results
Total Register Switching
1
1

Average reduction in
register switching: 14.4%

Average reduction in size
of circuit: 7.7%
0.855
0.8
0.6
0.4
0.4
0.2
Original
Average
Best
Completed Next Steps

Technology-dependent version

New scheme for delaying SAT-based proof

Creating new simple functions of existing nodes

Additional candidates for improving coverage
G2(x)
G1 (x)
FR ( x)  xR
G2 (x)
G3(x)
Future Next Steps

G1
Hierarchical Clock Gating
G2
G3
clk
clkG1

clkG2
clkG3
Bounded sequential gating selection


Reaching forward, to identify more unobservable transitions
Reaching backward, to generate more signals
xi-2
xi-1
xi
g(xi-2)
clk
clkG