Transcript PPT - Unife

Digital Integrated
Circuits
A Design Perspective
Jan M. Rabaey
Anantha Chandrakasan
Borivoje Nikolic
Modified and integrated by
Davide Bertozzi
Design
Methodologies:
Standard cell design
© Digital Integrated Circuits2nd
Design Methodologies
Impact of Implementation Choices
Energy Efficiency (in MOPS/mW), 0.25um CMOS process:
• Hardwired components: 100-1000
• Configurable/Parameterizable: 10-100
• Domain-specific processor (e.g. DSP): 1-10
• Embedded microprocessor: 0.1-1
Three orders of magnitude higher efficiency
Flexibility (or programmability): none, somewhat flexible, fully flexible
Providing programmability adds overhead to the implementation
•Late binding
•Re-use across multiple applications
•Software upgrade
Flexibility comes at a cost in terms of power and performance
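The efficiency ranges on this slide translate directly into energy per operation, since 1 MOPS/mW equals 10^9 operations per joule, i.e. 1 nJ per operation. The sketch below does the conversion. The pairing of each range with an implementation style follows the usual reading of the figure and is an assumption, and the function and dictionary names are illustrative only.

```python
import math

# Energy-efficiency ranges quoted on the slide, in MOPS/mW (0.25 um CMOS).
# The pairing range <-> implementation style is assumed from the figure.
EFFICIENCY_MOPS_PER_MW = {
    "hardwired components": (100.0, 1000.0),
    "configurable/parameterizable": (10.0, 100.0),
    "domain-specific processor (DSP)": (1.0, 10.0),
    "embedded microprocessor": (0.1, 1.0),
}


def energy_per_op_pj(mops_per_mw: float) -> float:
    """1 MOPS/mW = 1e9 op/J, so the energy per operation is 1000/x pJ."""
    return 1000.0 / mops_per_mw


for style, (low, high) in EFFICIENCY_MOPS_PER_MW.items():
    print(f"{style}: {energy_per_op_pj(high):g} to {energy_per_op_pj(low):g} pJ/op")

# Gap between the two ends of the flexibility spectrum, in orders of magnitude.
gap = math.log10(
    EFFICIENCY_MOPS_PER_MW["hardwired components"][0]
    / EFFICIENCY_MOPS_PER_MW["embedded microprocessor"][0]
)
```

With these ranges the gap evaluates to about three orders of magnitude, which is the figure quoted on the slide.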
Mapping Computation to…
Coarse-grain parallelism
Strong memory consistency models
OK with few large independent threads
General-Purpose Computing (host processor)
Leading core count
Single-Instruction Multiple-Data (SIMD)
• OK with thousands of small threads (better if almost identical) to expose massive HW multithreading
Parallel threads heavily dependent on local data content
Many, truly independent parallel computations
Branch divergence between threads
Throughput Computing (GPGPUs)
Highest GOPS/W
No flexibility (lower yield)
HW accelerators
Cluster-based, many-core programmable accelerators
Heterogeneous Parallel Computing
• There is not only one perfect mapping solution!
• Architectural heterogeneity and many-cores are THE design paradigm for embedded SoCs (e.g., HSA initiative)
• This is today the way to pursue high performance with energy efficiency, that is, high performance-per-watt
(figure: heterogeneous SoC block diagram. Labels: Host (multi-core heterog.) with cache-coherent processor interconnect; General Purpose Programmable Accelerator; Hardware accelerators; Graphics; High-Speed I/O; DRAM memory controller; DMA engine; Intermediate Fabric; FPGA-like fabric; Top-level NoC; two 2nd-level NoCs)
Implementation Choices
Design for high performance/high density: handcrafted full custom design
Design for fast time-to-market: design automation techniques
Digital Circuit Implementation Approaches
• Custom
• Semicustom
  - Cell-based: Standard Cells, Compiled Cells, Macro Cells
  - Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGA's)
The Custom Approach – early days
Intel 4004 microprocessor
(108 KHz, 2300 transistors, 10um)
When performance or design density is critical, handcrafting circuit topology and physical design seems to be the only option
• high cost
• long time-to-market
It is OK when
• custom blocks can be reused
• cost can be amortized over a large volume (e.g., uP, memories)
• cost is not the primary design criterion (e.g., supercomputers)
Courtesy Intel
Transition to Automation and Regular Structures
Intel 4004 (‘71)
Intel 8080
Intel 8085
Intel 80286
Intel 80486
Evolution of full custom design
• Replication of the same custom-designed block multiple times (e.g., memories)
• Composition of different custom-designed blocks with a regular composition pattern
In both cases regularity enables deployment of automation
Courtesy Intel
Pentium 4 processor
• Almost all parts were designed automatically
– composing custom blocks together in a regular way
(semicustom design with a library of cells)
• Performance critical modules (PLL, clock buffers) were still
designed manually
•Basic automation levels even for full-custom design:
- layout editors, DRC.
Full-custom design


• Complete control over transistor and interconnect dimensions (within design rule constraints)
Design rules:
  - Minimum spacing between metal lines (varies per layer)
  - Line width
  - Transistor channel length
• Circuit designers create application-specific building blocks
  - Technology provider (foundry) provides SPICE/HSPICE transistor models, parasitic extraction tools
  - Models are used to drive transistor sizing/layout constraints
• Continual verification of design as it becomes more defined
PRO: Produces optimized design (density, power, performance)
CON: Time-consuming, error-prone, highest NRE
Layout editors
Stick diagram of inverter (figure labels: VDD, In, Out, GND)
• Dimensionless layout entities
• Only topology is important
• Final layout generated by “compaction” program
Magic Layout Editor (UC Berkeley)
Cell-based semicustom design (CBD)
Predefined and custom-designed cells are instantiated multiple times and
interconnected to yield a given logic function
Metaphor: mosaic
cell
Logic function
ADVANTAGES
• cuts down on design time and costs
• reduces implementation effort by REUSING a library of cells for different designs
  - cells need to be designed and verified once for a given technology node
  - cell reuse amortizes the cost of their full-custom design
DRAWBACKS
• Reduced integration density and performance
• No design fine-tuning (i.e., transistor-level) allowed
CBD approaches are categorized based on the granularity of library elements
Standard cells
Standardizes the design entry level at the logic gate
Based on a library of standard pre-designed, pre-verified
cells
• basic logic functions (NOR, NOT, NAND, XOR,..)
• complex functions (basic MUX, decoders, adder, comparator,..)
• storage elements (DFF, SR latches, ...)
• special cells (e.g., brute-force synchronizers; tie-high; tie-low)
• logic cell variants to cover a wide range of fan-in/fan-out
conditions (e.g., 4:1 mux vs 2:1 mux; different transistor sizing)
• specialization for other parameters:
- supply voltage, threshold voltage, corner cases
Foundries or even fabless companies (in partnership with
foundries) provide libraries of standard cells for semicustom
design with tens or hundreds of cells
Standard cell layout methodology
Strong restrictions on the layout allow high levels of automation
(e.g. automatic layout generation)
Row of standard cells (all cells must have same height)
Routing channel requirements are reduced by presence of more interconnect layers
Intermixing with other layout design approaches is possible for those modules which do not fit the logic cell paradigm (e.g., highly regular structures, more stringent performance requirements)
Standard cell layout methodology


• A standard cell library is complemented by an I/O cell library
• I/O circuits are analog in nature, and analog delays are not easy to predict/model
• IC designers are faced with interfacing to a growing diversity of standards and parts
  - Memory, I/O, graphics, networking
  - Standards: DDR, SDRAM, PCI, USB,..
  - Different signaling methods: LVDS, CML,…
  - Circuits for latchup, ESD, isolation,..
  - Ground and power pins (many)
(figure labels: routing, cell, I/O pad ring, IO cell)
I/O library and packaging options
Low-cost packaging with low pin count
Peripheral pads for bond wires allow the I/O cell circuitry to be placed in alignment with the pads, leading to simple logical, electrical and physical structures
High-cost packaging with high pin count
Creating a grid array of pads for flip chip mounting allows for easier alignment in the packaging, but may cause the routing to and from the I/O circuitry to become very complex
Standard Cell — Early Example
• Large area overhead for the interconnects
  - Feedthrough cells
  - large routing channels
• Adding more metal layers
  - fewer requirements on routing channels
[Brodersen92]
Standard Cell – The New Generation
Design in a 7 metal layers technology
Cell-structure
hidden under
interconnect layers
• Density: 90%
• small area overhead
for interconnects
Standard cells
Designing a standard cell library is time consuming, although the cost is amortized over a large number of designs
• Today it is common practice to have several cell versions
  - number of inputs
  - transistor sizing for different capacitive loads (driving strength)
  - pullup/pulldown ratios
  - technology: Vth, Vdd, technology corner cases
Non-trivial choice of the mix of logic cells
• small library with most cells having limited fan-ins?
• large library with many versions of the same cell?
• conservative large driving capabilities lead to power/area overhead
Technology libraries are broadly differentiated based on the
target design goal (low-power vs. high-performance)
Synthesis tools choose the correct cell version in the library
based on speed/area/power constraints
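What “choosing the correct cell version” means can be sketched with a first-order linear delay model, delay = intrinsic delay + drive resistance × load. This is only an illustration of the idea, not how any particular synthesis tool works; all cell names and numbers below are invented and do not come from a real library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CellVersion:
    name: str            # hypothetical cell name
    area_um2: float      # cell area
    intrinsic_ps: float  # delay with zero load
    ps_per_ff: float     # extra delay per fF of load (drive resistance)


# Hypothetical drive-strength variants of one logic function.
NAND2_VERSIONS = [
    CellVersion("NAND2_X1", 2.0, 20.0, 6.0),
    CellVersion("NAND2_X2", 3.0, 22.0, 3.0),
    CellVersion("NAND2_X4", 5.0, 25.0, 1.5),
]


def delay_ps(cell: CellVersion, load_ff: float) -> float:
    """First-order delay of the cell when driving load_ff femtofarads."""
    return cell.intrinsic_ps + cell.ps_per_ff * load_ff


def pick_cell(versions: List[CellVersion], load_ff: float,
              max_delay_ps: float) -> Optional[CellVersion]:
    """Smallest-area version that meets the delay budget, or None."""
    feasible = [c for c in versions if delay_ps(c, load_ff) <= max_delay_ps]
    return min(feasible, key=lambda c: c.area_um2) if feasible else None
```

With a light load the smallest cell is enough; as the load grows, only the larger drive strengths meet the same delay budget, at the price of area (and power).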
Standard cell structure
Routing
channel
VDD
PMOS transistors close to the Vdd rail
Intra-cell wiring
signals
NMOS transistors close to the ground rail
GND
Mirrored Cell
No Routing
channels
VDD
VDD
Cell mirroring enables
sharing of
power and ground rails
M2
M3
GND
Mirrored Cell
GND
Inverter standard cell layout
Power rail
p-mos diffusions
N-well
n-mos diffusions
Ground rail
Design rules



• The feature size f is the minimum spacing between drain and source (min. poly width)
• Design rules are expressed in terms of λ = f/2
• A wiring track is the space required for a wire
  - E.g., 4 λ width, 4 λ spacing from neighbor = 8 λ pitch
  - The rule applies to transistors as well
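The λ-based rules can be turned into physical dimensions with a couple of lines. The sketch below reuses the slide's example (4 λ width plus 4 λ spacing); the 0.18 um feature size is just an example value chosen here, and the function names are illustrative.

```python
def lambda_um(feature_size_um: float) -> float:
    """lambda = f / 2, as defined in the design rules."""
    return feature_size_um / 2.0


def track_pitch_lambda(width_lambda: float, spacing_lambda: float) -> float:
    """A wiring track needs the wire width plus the spacing to its neighbor."""
    return width_lambda + spacing_lambda


f_um = 0.18                                   # example feature size (assumption)
pitch_lambda = track_pitch_lambda(4, 4)       # 4 lambda width + 4 lambda spacing
pitch_um = pitch_lambda * lambda_um(f_um)     # physical pitch for this process
print(f"pitch = {pitch_lambda} lambda = {pitch_um:.2f} um")
```

Expressing the rules in λ is what makes the same layout scalable: only `f_um` changes when moving to another process, under the assumption that the rules scale uniformly.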
Cell height
Tall cells (11 or 12 metal tracks) support more complex routing and larger driving-strength transistors, and are typically tuned for performance, but may exhibit higher leakage power.
Short cells (7 or 8 metal tracks) are optimized for area efficiency, but are generally designed with smaller, lower driving-strength transistors, so they are less appropriate for high-speed designs.
Standard height cells (9 or 10 tracks) are an intermediate trade-off
(figure: cell with a height of 12 metal tracks, where a metal track is the M1 pitch. Labels: VDD, GND, Rails ~10, N Well, In, Out, Cell boundary)
Standard Cell – Example in 0.18 um
3-input NAND cell (from ST Microelectronics):
C = Load capacitance
T = input rise/fall time
• 5 cell versions
  - C from 0.18 to 0.72 pF
  - area from 16.4 to 32.8 um2
• Not just performance, but also energy given in datasheet
Low power library; high-Vth transistors against leakage
Library cells documentation is critical, although time-intensive
(figure labels: power rail, ground rail, input signals wired through PolySi)
Routing tracks
• Tracks form a grid for routing.
  - Spacing between tracks is center-to-center distance between wires.
  - Track spacing depends on wire layer used.
• Different layers are (generally) used for horizontal and vertical wires.
  - Horizontal and vertical can be routed relatively independently.
• Cell pins placed at intersections between vertical and horizontal tracks.
  - Pin placement dictates the complexity of the routing problem
(figure: two rows of cells separated by a routing channel. Labels: cell, routing channel, wire, horizontal track, vertical track)
Left-edge algorithm
• Basic channel routing algorithm.
• Assumes one horizontal segment per net.
• Sweep pins from left to right:
  - assign horizontal segment to lowest available track.
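The sweep described above can be sketched in a few lines. This is a minimal version under stated assumptions: each net is reduced to the column interval spanned by its pins, vertical constraints are ignored, and the data layout (a dictionary from net name to interval) is chosen here for illustration.

```python
from typing import Dict, List, Tuple


def left_edge(nets: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Assign each net's single horizontal segment to a track.

    nets maps a net name to the (leftmost, rightmost) pin column of the net.
    Returns net -> track index, 0 being the lowest track.
    Vertical constraints are ignored in this sketch.
    """
    track_right_end: List[int] = []   # rightmost occupied column of each track
    assignment: Dict[str, int] = {}
    # Sweep the segments by increasing left edge.
    for name, (left, right) in sorted(nets.items(), key=lambda kv: kv[1]):
        for track, end in enumerate(track_right_end):
            if end < left:            # lowest track that is free at this column
                track_right_end[track] = right
                assignment[name] = track
                break
        else:                         # no free track: open a new one
            track_right_end.append(right)
            assignment[name] = len(track_right_end) - 1
    return assignment


example = left_edge({"A": (0, 3), "B": (2, 6), "C": (4, 8)})
```

In the example, A and C do not overlap and share the lowest track, while B overlaps both and needs a second one.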
Example
(figure: left-edge routing example for three nets A, B and C in one channel)
Limitations of left-edge algorithm
• Some combinations of nets require more than one horizontal segment per net.
(figure: nets A and B whose pins are aligned in two columns, A facing B in one column and B facing A in the other)
Vertical constraints
• Aligned pins form vertical constraints.
• Wire to lower pin must be on lower track; wire to upper pin must be above lower pin’s wire.
(figure: aligned pins of nets A and B)
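Vertical constraints are conveniently collected in a directed graph with an edge from the net of the upper pin to the net of the lower pin in each column. If that graph contains a cycle, no assignment with one horizontal segment per net can satisfy all constraints. The sketch below builds the graph and checks for cycles; the pin-list representation (`top`, `bottom`, with an empty string for a missing pin) is an assumption made for illustration.

```python
from typing import Dict, List, Set, Tuple


def vertical_constraints(top: List[str], bottom: List[str]) -> Set[Tuple[str, str]]:
    """Edges (upper, lower): in each column, the net of the top pin must be
    routed on a track above the net of the bottom pin. '' marks no pin."""
    return {(t, b) for t, b in zip(top, bottom) if t and b and t != b}


def has_cycle(edges: Set[Tuple[str, str]]) -> bool:
    """Depth-first search for a cycle in the vertical constraint graph."""
    graph: Dict[str, List[str]] = {}
    for upper, lower in edges:
        graph.setdefault(upper, []).append(lower)
        graph.setdefault(lower, [])
    state: Dict[str, int] = {}  # 1 = on the DFS stack, 2 = finished

    def visit(node: str) -> bool:
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in graph)


# A over B in one column and B over A in another: cyclic constraint.
crossed = vertical_constraints(["A", "B"], ["B", "A"])
```

For the crossed case `has_cycle(crossed)` is true, which is exactly the situation where a net has to be split into more than one horizontal segment.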
Dogleg wire
• A dogleg wire has more than one horizontal segment.
• But it requires an additional metal layer!
(figure: nets A and B routed with a dogleg)
Technology library –
Architectural effects
Power-Length/Speed trade-off for NoC link design (65nm LP-LVT, NoC links synthesized in isolation)
From: A. Pullini, F. Angiolini, S. Murali, D. Atienza, G. De Micheli, and L. Benini, ``Bringing NoCs to 65nm,'' IEEE Micro Magazine, vol. 12, no. 5, September/October, pp. 75-85, 2007.
• Short and/or slow-clocked links don’t pose any problem
• Long and/or high-speed links force routing tools to infer a large number of buffering gates, increasing power
One Technology Library?
A single technology library no longer exists for standard cell design
(figure: link feasibility with the 65nm LP-LVT and 65nm LP-HVT libraries)
• An aggressively low-power library (LP-HVT) infers buffers with lower size and speed, resulting in much tighter constraints on operation frequency or length (i.e., link feasibility)
• The spread increases as technology scales down
• We need to pick the right library for specific design constraints
Mixed-Library Design
During logic synthesis, it is possible to link different technology libraries at the same time to span the performance-power trade-off for the design at hand.
4x4 2D mesh NoC
65nm library variants:
• Low-Vth (fast)
• High-Vth (low-power)
• Mixed-Vth (multiple Vths)
  - aiming for max performance
  - aiming for a power-perf. trade-off
Clock gating always enabled except for Low-Vth, to avoid performance penalties
Handle with care:
When you link more libraries, you increase mask complexity and fabrication cost, since the manufacturing steps for the transistors are different
Mixed-Vth
• Gates on the critical path should come from the fastest library (Low-Vth)
• Gates on non-critical paths should come from the low-power library (High-Vth)
(figure labels: 10ns, 5ns)
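The assignment rule above can be illustrated with a greedy sketch: start with every gate taken from the fast Low-Vth library, then move gates one by one to High-Vth and keep the change only if every path still meets the clock period. Real synthesis tools use far more refined cost functions; the per-gate delays, the path representation and the function name below are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-gate delays in ns for the two libraries.
DELAY_NS = {"lvt": 1.0, "hvt": 2.0}


def assign_vth(paths: Dict[str, List[str]], clock_period_ns: float) -> Dict[str, str]:
    """Greedy sketch: start all-LVT, then move gates to HVT while timing holds."""
    gates = sorted({g for gate_list in paths.values() for g in gate_list})
    choice = {g: "lvt" for g in gates}

    def meets_timing() -> bool:
        return all(
            sum(DELAY_NS[choice[g]] for g in gate_list) <= clock_period_ns
            for gate_list in paths.values()
        )

    for g in gates:
        choice[g] = "hvt"            # try the low-power version
        if not meets_timing():
            choice[g] = "lvt"        # revert: this gate is timing critical
    return choice


paths = {
    "critical": ["g1", "g2", "g3", "g4", "g5"],  # 5 ns with LVT gates
    "short": ["g6", "g7"],                       # plenty of slack
}
result = assign_vth(paths, clock_period_ns=5.0)
```

In this example the five gates on the critical path stay in the Low-Vth library, while the two gates on the short path end up in High-Vth.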
Power-Performance Trade-Off
Library variant     HVth      MVthA     MVthB     LVth
Frequency target    Max       300 MHz   Max       Max
Clock gating        Enabled   Enabled   Enabled   Disabled
Frequency (MHz)     142       300       714       952
Bandwidth (GB/s)    27        57        137       183
Power (mW)          11        25        88        145
There is almost an order of magnitude difference in the power and performance achievable by LVth and HVth libraries
Mixed Vth is attractive:
- approaches LVth performance at a lower power
- outperforms HVth at almost the same power efficiency (GB/s over mW)
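The power efficiency mentioned above can be checked directly from the table figures. The sketch below assumes the values pair up with the library variants in the order HVth, MVthA, MVthB, LVth; the dictionary layout and function name are illustrative.

```python
# Figures from the table: (frequency in MHz, bandwidth in GB/s, power in mW).
# The column order HVth, MVthA, MVthB, LVth is assumed.
RESULTS = {
    "HVth": (142, 27, 11),
    "MVthA": (300, 57, 25),
    "MVthB": (714, 137, 88),
    "LVth": (952, 183, 145),
}


def efficiency_gbps_per_mw(variant: str) -> float:
    """Bandwidth delivered per unit of power for one library variant."""
    _, bandwidth, power = RESULTS[variant]
    return bandwidth / power


for variant in RESULTS:
    print(f"{variant}: {efficiency_gbps_per_mw(variant):.2f} GB/s per mW")
```

Under that assumption MVthA stays close to the HVth efficiency while running at more than twice the frequency, and MVthB gets to 75% of the LVth frequency with a better efficiency than LVth.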
Semicustom Design Flow
(figure: design flow)
Main steps: Design Capture (HDL) -> Logic Synthesis -> Floorplanning -> Placement -> Routing -> GDSII file (tape-out to silicon foundry for mask generation)
Verification steps, feeding the design iteration loop: Pre-Layout Simulation, Circuit Extraction, Post-Layout Simulation
Abstraction levels: Behavioral, Structural (RTL), Physical
Semicustom design flow
Design capture: schematics, block diagrams, HDLs, imported IPs
Logic synthesis: from HDL language into a gate-level netlist,
combined with the netlist of reused or generated macros
PreLayout Simulation: (grossly) estimated parasitics and layout
parameters; performance analysis
Floorplanning: chip outlay creation based on estimated module
sizes, early design of clock and power distribution networks
Placement: Precise positioning of cells within blocks
Semicustom design flow
Routing: Interconnects between cells and blocks
Extraction: chip model from actual physical layout and parasitics
PostLayout Simulation: Check functionality and correctness
of the circuit in presence of layout parasitics; Performance AND
Power analysis
Tape out: binary file generation in GDSII format, containing
information needed for mask generation. To silicon foundry.
Integrating Logic synthesis with
Physical Design
(figure: RTL, (timing) constraints, macromodules and fixed netlists feed Physical Synthesis, i.e. logic synthesis with first-order place-and-route; the resulting netlist with place-and-route info feeds Place-and-Route Optimization, i.e. accurate place-and-route meeting timing constraints)
• Exponential increase of design tool complexity and run-time
Design synthesis
Logic synthesis
Design Environment

• The process parameters
  - Technology library
  - Operating conditions (PVT)
• I/O port attributes
  - Driving strength of input ports
  - Capacitive loading of output ports
  - Design rule constraints: max_transition, max_fanout, max_capacitance
• Statistical wire-load model
  - wirelength = f(fanout)
  - Resistance/Capacitance/Area-per-unit-length given
  - pre-layout static timing analysis
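A statistical wire-load model of the kind listed above can be sketched as a fanout-to-length table plus per-unit-length parasitics. All numbers below are invented for illustration and do not come from any real technology library; the extrapolation rule beyond the table is also an assumption.

```python
from typing import Tuple

# Hypothetical wire-load table: fanout -> estimated wire length (um).
WIRE_LENGTH_UM = {1: 10.0, 2: 25.0, 3: 40.0, 4: 60.0}
SLOPE_UM_PER_FANOUT = 20.0   # extrapolation beyond the table (assumption)
CAP_FF_PER_UM = 0.2          # capacitance per unit length (assumption)
RES_OHM_PER_UM = 0.5         # resistance per unit length (assumption)


def wire_length_um(fanout: int) -> float:
    """wirelength = f(fanout): table lookup, linear extrapolation above it.

    fanout is expected to be at least 1.
    """
    max_fanout = max(WIRE_LENGTH_UM)
    if fanout <= max_fanout:
        return WIRE_LENGTH_UM[fanout]
    return WIRE_LENGTH_UM[max_fanout] + SLOPE_UM_PER_FANOUT * (fanout - max_fanout)


def wire_rc(fanout: int) -> Tuple[float, float]:
    """Lumped resistance (ohm) and capacitance (fF) of the estimated wire."""
    length = wire_length_um(fanout)
    return RES_OHM_PER_UM * length, CAP_FF_PER_UM * length
```

Pre-layout static timing analysis then uses these estimated parasitics in place of the extracted ones, which are only known after routing.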
Input and output delay constraints
These parameters may have a tremendous impact on driving strength of boundary cells and power consumption of the design as a whole
Design constraints

• Clock signal specification
  - Period
  - Duty cycle
  - Transition time
  - Skew
• Delay specifications
  - Maximum delays
  - Minimum delays
• Timing exceptions
  - Multicycle paths
  - False paths
• Path grouping
  - E.g., for multi-clock designs
When searching for the max. speed of the design, an unachievably short target period (e.g., 0.1ns) can be given as a constraint. The min. achievable period can then be derived from the amount of violation
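Deriving the minimum period from the amount of violation is a one-line computation: the achievable period is the target period minus the worst (negative) slack. The sketch below is a first-order estimate only, since re-synthesizing at the derived period may give a somewhat different result; the 1.3 ns violation is a made-up value.

```python
def min_period_ns(target_period_ns: float, worst_slack_ns: float) -> float:
    """Smallest period that would have met timing in this synthesis run.

    worst_slack_ns is the worst setup slack reported for the target period
    (negative when the constraint is violated).
    """
    return target_period_ns - worst_slack_ns


# Over-constrained run: 0.1 ns target, 1.3 ns violation (made-up value).
period = min_period_ns(0.1, -1.3)
max_freq_mhz = 1000.0 / period
print(f"min period = {period:.2f} ns, max frequency = {max_freq_mhz:.0f} MHz")
```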
Enforce absolute constraints
Extract timing of paths
Enforce minimum delay requirements on bundling paths
Are bundling constraints fulfilled?
Timing exception example (multicycle path):
set_multicycle_path -from U1 -to U5
Performance-Area/Power trade-off during logic synthesis
LET US COMPARE SEVERAL ADDER IMPLEMENTATIONS
WHILE RELAXING TARGET CLOCK SPEED FOR SYNTHESIS
•As the target clock period increases, new adder architectures come
progressively into play (see lower side of bars in the plots).
• As the period is further increased, adders’ slack is exploited for power
optimizations (RTL netlist transformations, insertion of HVT cells), therefore
adders do not show slacks for a certain time window
• After a certain period, RTL netlists of adders cannot be power-optimized any
more, and they start having slacks (upper side of the bars in the plots)
(figure: two bar plots, “Target clock periods for 64 bit adders” [ns] and “Target clock periods for 32 bit adders” [ps]; adders compared: BK, CLF, PPARCH, CSM, RPL, Instance (DW))
Area-Power for 32 bit adders
Let us sweep a range of target clock periods
Maximum data introduction rate
Synthesis tool optimizes adder slack for power.
(figure: two plots, “Area 32 bit” [u^2] and “Power 32 Bit” (total power), versus target clock period [ps] at 330, 341, 385, 418, 979, 1199 and 3500 ps; adders compared: CLF, BK, PPARCH, CLA, CSM, RPCS, RPL)
• The “new entry” adder for a given target period is never the most power
efficient
• Higher area always means higher power
Floorplanning
Typical issues the floorplanning tool copes with:
• does the design fit the chip budgeted area?
• estimates area of major units and defines their relative placement based on some objective function
• estimates wire lengths and wiring congestion, although more advanced cost functions can be considered:
  - Having high communication traffic (thick lines) spread over short (up) or long (bottom) links is likely to heavily affect the power required for data transmission.
  - Best IR drop solutions spread out the hot spot across a large part of the floorplan, instead of concentrating it in a specific region.
Placement

• Placement: assign cells to positions on the chip, such that no two cells overlap with each other (legalization), and some cost function (e.g., projected wirelength) is optimized
  - Considers: wirelength, routability/channel density, power, timing,....
Placement
By acting on the “row utilization” parameter of most placement tools, a given cell placement density can be set, trading a more compact layout against reduced routing congestion
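The effect of row utilization on area follows from its definition as the fraction of row area occupied by cells. A minimal sketch, with a made-up total cell area and example utilization values:

```python
def core_area_um2(total_cell_area_um2: float, row_utilization: float) -> float:
    """Core area needed to place the cells at a given row utilization (0..1]."""
    if not 0.0 < row_utilization <= 1.0:
        raise ValueError("row utilization must be in (0, 1]")
    return total_cell_area_um2 / row_utilization


cell_area = 100_000.0  # um^2 of standard cells (made-up value)
for utilization in (0.85, 0.70, 0.50):
    area = core_area_um2(cell_area, utilization)
    print(f"utilization {utilization:.0%}: core area {area:,.0f} um^2")
```

Lowering the utilization leaves more free space in the rows for buffers and routing resources, at the cost of a proportionally larger block.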
Link with routing
• Ideally, placement and routing (P&R) should be performed simultaneously as they depend on each other’s results
  - This is however often too computation-intensive
  - Approximation: placement estimates the wire length of a net using some wirelength model
• During P&R, the gate-level netlist will change (good reasons why row utilization should NOT be 100%):
  - Buffer insertion
  - Driving strength resizing
  - Local logic optimizations ending up in selective netlist modifications to meet design constraints
  - Avoid wiring congestion
Wirelength estimation models
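One widely used estimate is the half-perimeter wirelength (HPWL): the half perimeter of the bounding box that encloses all pins of a net. Whether this is among the models shown on the original slide is an assumption; the sketch below only illustrates the idea, with pins given as (x, y) coordinates.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def hpwl(pins: Sequence[Point]) -> float:
    """Half-perimeter of the bounding box enclosing all pins of a net."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Iterable[Sequence[Point]]) -> float:
    """Sum of HPWL over all nets: a common placement cost function."""
    return sum(hpwl(net) for net in nets)


two_pin = [(0, 0), (3, 4)]
three_pin = [(1, 1), (4, 1), (2, 5)]
cost = total_hpwl([two_pin, three_pin])
```

HPWL is exact for two- and three-pin nets routed with minimal rectilinear wiring and a lower bound for nets with more pins, which is why it is cheap enough to evaluate inside a placement loop.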
The (lucky) physical synthesis flow
Open tool
Create Floorplan
Timing analysis
Create Power Grid
Insert clock tree (CTS)
Placement
Timing optimization and new reports
Routing
Post-routing optimization and design closure
Timing convergence (min-max analysis)
Clock Domain       Clock Frequency   Slack Pre-Opt   Slack Post-Opt
clk_Audio          100 MHz           5.75 ns         3.45 ns
clk_CPU            500 MHz           0.2 ns          0.09 ns
clk_DDR            250 MHz           0.38 ns         0.17 ns
clk_DMA            200 MHz           0.65 ns         0.39 ns
clk_DSP            300 MHz           0.34 ns         0.19 ns
clk_Radio          150 MHz           1.12 ns         0.82 ns
clk_SD_USB_WiFi    200 MHz           0.53 ns         0.23 ns
clk_SPI            140 MHz           6.33 ns         2.27 ns
clk_SRAM           500 MHz           -0.19 ns        0.14 ns
clk_Video          300 MHz           0.38 ns         0.2 ns
• Cells in those clock domains that have a big slack are relaxed (from a driving strength viewpoint) to save power.
• Small timing violations in the fastest clock domains are easily fixed.
This was a lucky (or successful?) case of first-time-right design
In practice: The “Timing Closure” Concern
Due to the increased role of parasitics (mostly interconnect-related) in deep sub-micron designs, prediction models of synthesis tools are having a hard time
Iterative Removal of Timing Violations (white lines): initial design, intermediate design, final design
Synthesis iterations, buffer insertion, placement constraints, routing issues,..
Courtesy Synopsys
Case study – a NoC switch
Fixing layout rule violations
(figure: SWITCH made of four input buffers (Inputbuf), a crossbar, an arbiter and four output buffers (Outbuf))
Case study – a NoC switch
Fixing layout rule violations
(figure: SWITCH made of four input buffers (Inputbuf), crossbar & control, four arbiters and four output buffers (Outbuf))
Switch radix
Topologies often differentiate themselves based
on the switch radix they require
65nm MVth 1.2V technology; Clock gating enabled
Area and power increased with switch radix, while
frequency decreased dramatically
Switch radix


• Placement-aware logic synthesis worked as expected
• Physical synthesis is aware of placement…not routing!!!!
  - Beginning from 14x14 switches, wire density in the switch crossbar becomes an issue
  - Meeting timing constraints, avoiding crosstalk and resolving DRC violations cannot all be achieved at the same time
  - Hundreds of violations in 14x14, tens of thousands in 30x30
Switch radix

• There are two options to fix DRC violations
  - Increase switch area
  - Decrease switch frequency
Switch area can be controlled by specifying the “row utilization” parameter
85% was OK up to 10x10
70% was OK for 14x14
At 30x30 even a utilization of 50% did not fix violations
Tuning switch area only partially effective
Switch radix

• There are two options to fix DRC violations
  - Increase switch area
  - Decrease switch frequency
25% slow-down was OK for 14x14
30% was OK for 18x18
At 30x30 even halving clock speed did not fix violations
Frequency slowdown somewhat more effective
Switch radix

• Key take-away: high radix switches at 65nm are feasible up to 10x10 or 14x14, after which their overhead in area and frequency becomes too severe.
We would need long links to connect cores to the switch. They would be pipelined, with additional area and power cost
Standard cell design
It has become immensely popular, except for
• very high performance ICs
• ultra low energy consumption ICs
• extremely regularly structured ICs (memory, multiplier,..)
Reasons for the success
• Increased quality of automatic cell placement
and routing tools
• Availability of multiple routing layers
• Advent of sophisticated logic-synthesis tools
- abstract design inputs:
behavioural models, RTL models
- gate-level netlist production
(behavioural synthesis, logic synthesis, respectively)
Drawbacks
• Cell redesign with every migration to a new technology
• Huge cost for mask sets (order of million $)
Macrocells
For certain blocks, the standard cell approach might be inefficient (multipliers, memories, embedded uP, DSPs)
• Blocks whose complexity is larger than traditional standard cells: macrocells
• 2 kinds of macrocells: Hard Macros and Soft Macros
Hard macrocells

• custom designs of the requested functions
  - Functionality and layout are fixed
  - Some parameterization is feasible (e.g., multipliers, memories)
• good properties of custom design (dense layout, optimized performance and power)
  - opportunity for reuse in many designs
  - hard to port them to new manufacturers or technologies -> less and less used
  - Examples: embedded uP or memories, DSPs
• Parameterization by means of module compilers
  - Replication of basic macrocells
Hard MacroModules
25632 (or 8192 bit) SRAM
Generated by hard-macro module generator
• automatic layout generation
• provides timing and power information
• adds redundancy to deal with defects
Soft Macrocells
Module with a given functionality, but without a
specific physical implementation
• Placement and routing may vary from instance to instance
• Timing is not predictable – wait for final layout
• No advantages of full custom design, they rely on the
semicustom physical design process
• Ease of migration to new technologies
• Structural generators: specify function and
parameters, and they generate:
-a netlist of standard cells
-constraints for the place and route tools
Cleverer structures (than logic synthesis) based on
function knowledge (e.g., multipliers)
“Soft” MacroModules
2 instances of 8x8 multiplier module with different aspect ratios
Input to Module compiler
Macrocell generator: optimized connection of standard cells
Soft approach advantages: different aspect ratios can be generated
Synopsys DesignCompiler
Hybrid ASIC design methodology
•Macromodules have changed the semicustom design
landscape: design reuse instead of designing from scratch
•Macrocells can be acquired from third-party vendors, who
make the parts available through royalty or licensing
agreement (Intellectual Property modules, IPs)
•Examples: embedded microprocessors, DSPs, bus
interfaces (e.g., PCI), special purpose functions (FFT, ECC,
MPEG dec.), graphic accelerators (GPUs)
•For an IP to be useful, it has to come with appropriate software tools, not just hardware (e.g., cross-development toolchain, test benches for validation)
“Intellectual Property”
A typical SoC consists of a blend of design styles and modules,
embedding a number of hard or soft macrocells within a sea of
standard cells
A Protocol stack SoC for Wireless
(figure labels: hard macrocells (compiler from process vendor); hard-wired logic (std cells); Tensilica Xtensa soft-core generated from Verilog description)