slides (PowerPoint) - Newcastle University

Download Report

Transcript slides (PowerPoint) - Newcastle University

Efficient Automatic Resolution
of Encoding Conflicts Using
STG Unfoldings
Victor Khomenko
School of Computing Science,
Newcastle University, UK
Asynchronous circuits
 The traditional synchronous (clocked) designs
lack flexibility to cope with contemporary
design technology challenges
Asynchronous circuits – no clocks:
 Low power consumption and EMI
 Tolerant of voltage, temperature and
manufacturing process variations
 Modularity – no problems with the clock skew
and related subtle issues
[ITRS’05]: 22% of designs will be driven by ‘handshake
clocking’ in 2013, and 40% in 2020
 Hard to synthesize efficient circuits
 The theory is not sufficiently developed
 Limited tool support
2
Motivation
•
Resolution of encoding conflicts is one of
the most difficult tasks in logic synthesis
•
The quality of the resulting circuit (in terms
of area and latency) depends to a large
extent on the way the encoding conflicts
were resolved
3
Example: VME Bus Controller
Data Transceiver
Bus
dsr
dtack
VME Bus
Controller
lds
ldtack
dsr+
lds+
dtackd-
ldsdsr-
ldtackdtack+
Device
d
ldtack+
d+
4
Example: Encoding Conflict
dtack-
00100
10000
dsr+
00000
lds+
ldtack-
ldtack-
dtack01100
lds01110
01000
ldtack-
dsr+
10010
11000
ldsldsdtackdsr+
01010 11010 M’’
ldtack+
M’
11010
d+
ddsr01111
dtack+
11111
11011
5
State Graphs vs. Unfoldings
State Graphs:
 Relatively easy theory
 Many efficient algorithms
 Not visual
 State space explosion problem
6
State Graphs vs. Unfoldings
Unfoldings:
 Alleviate the state space explosion problem
 More visual than state graphs
 Proven efficient for model checking
 Quite complicated theory
 Not sufficiently investigated
 Relatively few algorithms
7
Example: Encoding Conflict
e8
e10
dtack- dsr+
e1
e2
e3
e4
dsr+
lds+
ldtack+
d+
Code(conf’)=10110
e5
e6
dtack+ dsr-
Code(conf’’)=10110
e7
e12
lds+
dlds-
ldtack-
e9
e11
8
Example: Resolving the conflict
dtack-
dsr+
csc+
lds+
d-
lds-
ldtack-
ldtack+
csc-
dsr-
dtack+
d+
9
Example: Resolving the conflict
dtack-
001000
ldtack-
dsr+
000000
ldtack-
lds011100
d-
010000
100001
lds+
ldtack-
dtack011000
csc+
100000
dsr+
100101
110000
ldtack+
ldsldsdtackdsr+
110100 M’’ M’ 110101
010100
d+
csc011110
011111
dsr111111
dtack+
110111
10
Example: Resulting Circuit
Data Transceiver
Bus
d
dtack
dsr
csc
Device
lds
ldtack
11
Core map
•
•
•
Cores often overlap
High-density areas are good candidates for
signal insertion
Analogy with topographic maps
Core1
Core2
A1
A2
A3
Core3
12
Transformations
•
•
•
Need to check the validity of the
transformation: safeness, bisimulation, not
delaying inputs, preserving semi-modularity
The validity should be checked before the
transformation is performed, i.e. on the original
prefix (to avoid backtracking)
Perform the transformation directly on the
prefix to avoid re-unfolding
 Re-unfolding is time-consuming
 Good for visualization (re-unfolding can
dramatically change the look of the prefix)
 Can transfer unresolved encoding conflicts
between the iterations of the algorithm
13
Sequential pre-insertion
 Preserves safeness
 Preserves traces
 Can introduce deadlocks: check that the new transition
never ‘steals’ tokens from any other enabled transition
 state property – can be checked on the prefix
 Easy to check that inputs are not delayed
 Can violate semi-modularity
 state property – can be checked on the prefix
14
Sequential post-insertion




Preserves safeness
Yields a bisimular PN
Preserves semi-modularity
Easy to check (conservatively) that inputs are not
delayed
15
Concurrent insertion
 Can introduce non-safeness and/or deadlocks
 the correctness can be checked on the prefix
 Easy to check that inputs are not delayed
 Preserves semi-modularity
16
Equivalent insertions
 Equivalence is easy to check
 Fewer transformations to consider
 Can convert to ‘canonical form’, e.g.
pre-insertions
 No need to check validity – postinsertions are always valid
17
Commutative insertions
Two transition insertions commute if they can be
performed in any order
• concurrent insertions commute with any other
insertions
• pre-insertions commute with post-insertions
• two pre/post-insertions commute iff they split
different transitions or the sets of split off
places do not overlap
 A valid insertion remains valid if another valid
commutative insertion is applied first, i.e. the
validity needs to be checked only once
18
Outline of the algorithm
1.
2.
3.
4.
Compute unfolding prefix of the STG
Compute conflict cores and terminate if none
Compute the set of valid transition insertions
Find a subset of these insertions such that
 no two of them are
SAT
– non-commuting, or
Incremental
– concurrent, or
– in auto-conflict, or
SAT
– one of them can trigger the other
 consistent assignment of signs is possible
 some encoding conflicts are resolved
 a cost function is minimised
5. Apply the insertions to the STG and the prefix and
goto 2
19
Cost function
Parameterised by the user; takes into account:
• the number of unresolved CSC and USC cores
• the delay introduced by the insertion
• the number of syntactic triggers of all noninput signals
• the number of inserted transitions of a signal
• the number of signals which are not locked
with the newly inserted signal
20
Case study 1: VME bus controller
e8
e10
dtack- dsr+
e1
e2
e3
e4
dsr+
lds+
ldtack+
d+
•
•
•
•
e5
e6
dtack+ dsr-
e7
e12
lds+
dlds-
ldtack-
e9
e11
Fully sequential solution
Fully concurrent solution
Solution with two set and two reset transitions
Explored design space: 17 different solutions
21
Case study 2: 8-way sequencer
•
•
Fully concurrent solution with 3 signals (csc1 is
set and reset twice)
Petrify needs 4 signals
22
Experimental results
•
•
•
Compared with Petrify and the ILP approach of
[CC06]
Small benchmarks:
 always better in terms of inserted signals
 8.5-8.8% smaller area
 Jordi Cortadella was impressed
Scalable benchmarks (the tool of [CC06] was
not available):
 almost the same area and number of signals
 runtime and memory consumption are better
by orders of magnitude
 some intractable for Petrify benchmarks
were easily solved
23
Thank you!
Any questions?
24