On-Chip Logic Minimization - Department of Electrical and

Download Report

Transcript On-Chip Logic Minimization - Department of Electrical and

On-Chip Logic Minimization
Roman Lysecky & Frank Vahid*
Department of Computer Science and Engineering
University of California, Riverside
*Also with the Center for Embedded Computer Systems, UC Irvine
This work was supported in part by the National Science Foundation, the Semiconductor
Research Corporation, and a Department of Education GAANN fellowship
Introduction
• Boolean logic minimization typically used during logic
synthesis
• Two-level logic minimization can be considered as a
general optimization technique
• Many applications can benefit from using logic
minimization dynamically
– IP routing table reduction
– Access control list (ACL) reduction
2
On-Chip Minimization Applications
(IP Routing Table Reduction)
138.23.16.9
Incoming IP packet
138.23.16.9
Destination IP address
Prefix
138.23.16.x
Next hop
Port 7
138.23.x.x
Port 5
125.x.x.x
Port 3
Port 7
Lookup Destination IP in
Routing Table
Choose Longest Prefix Match
3
On-Chip Minimization Applications
(IP Routing Table Reduction)
• Longest Prefix Match
– Ternary CAM (McAuley & Francis, 1993)
•
•
•
•
•
•
Store 0,1,*
Store IP address as TCAM entry
Store prefix length using the TCAM entries mask
Fast
Smaller hardware resources than binary CAM
Very large power consumption
– How can we reduce hardware resources and power
consumption?
4
On-Chip Minimization Applications
(IP Routing Table Reduction)
• Mask Extension (Liu, 2002)
– Uses two-level logic
minimization
– Performing minimization for
each update too slow
– Incremental update
• Existing minimized set
becomes don’t care set
• New route becomes single
entry in on set
• Achieves an average of 50
updates/second
– Not considering
communication, though
Original TCAM Entries
#
IP Addr.
Mask
Next Hop
P1
10011100
11111100
7
P2
10001100
11111100
7
Logic Minimization
TCAM Entries after Mask Extension
#
IP Addr.
P1&P2 10001100
Mask
Next Hop
11101100
7
5
On-Chip Minimization Applications
(Access Control List Reduction)
• Access Control List (ACL)
– Used to restrict IP traffic through network routers
– ACL size can range anywhere from from 300 (UCR CS&E
Dept.) to 10,000 (AOL)
– Common use is to block a particular protocol or port
number to avoid attacks such as Denial of Service attacks
• ACL Minimization
– Similar approach as used for IP routing table reduction
– However, order of the list must be preserved
ACL Input Format
Type Protocol
In IP
In Port
Out IP
Out Port
Action
6
Introduction
(Off-chip Logic Minimization)
Router
Transmit Data
to Server
Proc.
I$
•
Slow due to
communication
•
Sensitive to
server failures
•
Security issues
MEM
Execute
Minimizer
D$
Server
Network Router Chip
Transmit
Result to
Router
7
Introduction
(On-chip Logic Minimization)
Router
Initialize
Minimizer
I$
Proc.
D$
Execute
Minimizer
Indicate
Completion
ARM7
MEM
DMA
Mem.
On-chip Minimizer
Network Router Chip
8
On-Chip Logic Minimization
Requirements
• On-chip Logic Minimization Requirements
– Data Memory Resources
• On-chip minimization algorithms must be very memory conscious
– Instruction Memory Resources
• On-chip minimization algorithm must incorporate simplified
approaches that result in acceptable designs
– Execution time
• Limited data and instruction memory will likely lead to longer
execution times
• Must still remain reasonable
– Quality of results
• Must be capable of producing solution relatively close to optimal
Focus on developing an on-chip logic minimization tool that
produces acceptable results with reasonable increases in
execution time while using limited memory resources.
9
ROCM
• ROCM – Riverside On-Chip Minimizer
– Two-level minimization tool
– Utilized a combination of approaches from Espresso-II
(Brayton, et al. 1984) and Presto (Svoboda & White, 1979)
Optimize(F,D)
{
OrderCubes(F)
for i=1 to |F|
{
c = Fi
(c',W) = IterativeExpand(F,D,c)
F = (F  c') - W
}
}
IterativeExpand(F,D,c)
{
W = {}
c' = c
for i=1 to |c|
{
c' = Expand(c',i)
(val,W') = ValidExpansion(F,D,c')
if val = true
W = W  W'
else
Revert(c',i)
}
return (c',W)
}
10
ROCM Results – Quality
(Full Routing Table Reduction)
Initial
Espresso-Exact
Espresso-II
ROCM
35000
30000
25000
Only 2% larger
than optimal on
average
20000
15000
10000
5000
ra
ge
Pa
ix
Pa
cB
el
l
AA
DS
Av
e
M
ae
W
es
t
0
11
ROCM Results – Performance
(Incremental Update Execution Time)
14
Espresso-Exact
Espresso-II
ROCM
ROCM (ARM7)
12
10
ROCM executing on a
40MHz ARM7 requires
less than 1 second
8
6
4
2
0
Exec. Time
(seconds)
On a 500 MHz On a 40 MHz
ARM 7
Sun Ultra60
12
ROCM Results – Memory
(Code Size and Data Memory Usage)
Small code size of
only 22 kilobytes
250
Espresso-Exact
Espresso-II
ROCM
200
3500
3000
2500
150
2000
100
1500
1000
50
0
Average data
memory usage of
only 1 megabyte
500
Code Mem (KB)
Code Size
0
Data Mem (KB)
Data Memory Usage
13
ROCM Results – Quality
(Access Control List Reduction)
4500
4000
3500
3000
Initial
Espresso-Exact
Espresso-II
ROCM
Only 2% larger
than optimal on
average
2500
2000
1500
1000
500
ag
e
Av
er
Lo
ng
Ba
d
Ty
p2
Ty
p1
Un
iv
0
14
Customizing ROCM
• ROCM Customization
– Beneficial to optimize an algorithm for a particular
application
– Customize ROCM’s data structures and algorithms for a
particular input size
• Require less memory
• Reduce dynamic memory allocation
• Improve performance
– Created ROCM-32 customized for IP routing table
reduction
15
Customized ROCM-32 Results
(IP Routing Table Reduction)
37% reduction in
execution time vs.
ROCM
250
Espresso-Exact
Espresso-II
ROCM
ROCM-32
200
150
11% reduction in
data memory usage
vs. ROCM
3500
3000
2500
2000
1500
100
1000
50
0
500
0
Exec. Time
(seconds)
Data (kilobytes)
16
Conclusions
• Presented Riverside On-Chip Minimizer (ROCM)
• Feasible to execute logic minimization on chip
– Can be executed on an embedded 40 MHz ARM7 in
seconds for real networking problem sizes
– Requires small code size (22 kilobytes)
– Requires small data memory (1 megabyte)
• Produces good results
– On average only 2% larger than exact minimization
• Shown usefulness for networking applications
17
Future Work
• More Applications
– May appear now that on-chip minimization is feasible
• Dynamic HW/SW Partitioning
– Dynamically partition executing binary to on-chip
configurable logic
– Logic minimization is used during the logic synthesis
stage
– Initial work on dynamic HW/SW partitioning presented at
DAC 2003 yesterday in session 15
18