HiCap: A Fast Hierarchical
Algorithm for 3D Capacitance
Extraction
Weiping Shi
Department of Computer Science
University of North Texas
Outline
Introduction
Previous Research
Integral Equation & N-Body Problem
New Algorithm
Experimental Results
Conclusion
Future Work
Introduction
Capacitance Extraction: Given a set of conductors
in 3-D space, compute the capacitance between all
pairs of conductors.
[Figure: a conductor held at 1 V with induced positive and negative charges on the conductors; at 1 V, C = Q.]
Signal delay = gate delay + interconnect delay
Interconnect delay is caused by RC (resistance and
capacitance) parasitics.
[Figure: interconnect modeled by a resistance R and capacitances C.]
Interconnect delay dominates gate delay in deep
sub-micron VLSI.
[Figure: delay (ps, axis from 0 to 45) versus technology generation (0.85, 0.5, 0.35, 0.25, 0.18, 0.13, 0.11 micron). Curves: Gate; Interconnect (Al+SiO2); Interconnect (Cu+low-k); Sum (Al+SiO2); Sum (Cu+low-k).]
Importance in VLSI
Fast and accurate capacitance extraction is crucial in
the design and verification of VLSI circuits and
packaging.
Current 3D tools are too slow.
FastCap, Raphael, QuickCap, etc.
2D/2.5D/Quasi-3D tools use 3D engines to generate their
libraries, so their accuracy depends on the 3D engines.
Dracula, HyperExtract, Arcadia, Fire&Ice, StarRC, Columbus, etc.
For critical nets and clock trees, 3D accuracy is
necessary.
Importance in MEMS
Accurate capacitance extraction of complex 3-D
structures is also important in design of MEMS
(MicroElectroMechanical Systems).
Design of most motion sensors needs an accurate
estimate of capacitance.
Design of most drivers needs to solve a similar
potential problem.
A recent ARPA report estimates the market for the above
applications at 1 to 3 billion dollars by 2004.
[Figure: enlarged comb driver.]
Previous Research
Differential Maxwell Equation (Finite Difference
Method or Finite Element Method)
Raphael Field Solver
Integral Laplace Equation (Boundary Element
Method)
Multipole algorithm FastCap by Nabors & White.
O(N) time. Kernel dependent.
Pre-corrected FFT algorithm by Phillips & White.
O(N log N) time. Kernel independent.
SVD algorithm IES3 by Kapur & Long. O(N log N)
time. Kernel independent.
Integral Equation Approach
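\[ \phi(x) = \int_{\text{surfaces}} G(x, x') \, \sigma(x') \, da' \]
where \phi(x) is the known surface potential,
\sigma(x') is the charge density,
da' is an incremental conductor surface area,
x' is on da',
G(x, x') is the kernel, e.g. 1 / (4\pi\epsilon_0 \|x - x'\|) in free space.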
Partition conductor surfaces into N panels and
assume uniform charge density on each panel.
Then we have a linear system:
Pq = v
where P is an NxN matrix of potential coefficients,
q is an N-vector of panel charges,
v is an N-vector of known panel potentials.
Each entry pij of potential coefficient matrix P
represents the potential at panel Ai due to unit
charge on panel Aj:
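A Galerkin-style form consistent with this description is sketched below. It is an assumption, not necessarily the slide's own formula: a_i and a_j denote the areas of panels A_i and A_j, G is the kernel of the integral equation, q_j is the total charge on panel A_j, and v_i is the average potential over panel A_i.

```latex
p_{ij} = \frac{1}{a_i \, a_j} \int_{A_i} \int_{A_j} G(x, x') \, da' \, da,
\qquad
\sum_{j=1}^{N} p_{ij} \, q_j = v_i \quad (i = 1, \dots, N)
```

The second relation follows by substituting a piecewise-constant charge density q_j / a_j into the integral equation and averaging the resulting potential over panel A_i.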
Solution q of the linear system Pq = v gives
the capacitance.
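How q turns into capacitances can be sketched as follows: hold one conductor at 1 V and the others at 0 V, solve Pq = v, and sum the panel charges per conductor. This is a minimal dense illustration, not the fast algorithm. The helper names `solve` and `capacitance_matrix`, the `conductor_of` array, and the numbers in P are hypothetical.

```python
def solve(P, v):
    """Dense Gaussian elimination with partial pivoting (illustration only)."""
    n = len(v)
    A = [row[:] + [rhs] for row, rhs in zip(P, v)]  # augmented matrix [P | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    q = [0.0] * n
    for r in reversed(range(n)):
        q[r] = (A[r][n] - sum(A[r][c] * q[c] for c in range(r + 1, n))) / A[r][r]
    return q


def capacitance_matrix(P, conductor_of, n_conductors):
    """C[i][k] = total charge on conductor i when conductor k is at 1 V."""
    C = [[0.0] * n_conductors for _ in range(n_conductors)]
    for k in range(n_conductors):
        # Panels of conductor k at 1 V, all other panels at 0 V.
        v = [1.0 if c == k else 0.0 for c in conductor_of]
        q = solve(P, v)
        for panel, c in enumerate(conductor_of):
            C[c][k] += q[panel]
    return C


# Hypothetical 3-panel example: panels 0 and 1 form conductor 0, panel 2 is conductor 1.
P = [[4.0, 2.0, 0.5],
     [2.0, 4.0, 0.5],
     [0.5, 0.5, 3.0]]
C = capacitance_matrix(P, conductor_of=[0, 0, 1], n_conductors=2)
```

The diagonal entries of C are positive and the coupling entries are negative and symmetric, as expected for a capacitance matrix.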
Challenge
Partition the conductor surfaces into N panels,
Calculate and store the dense NxN matrix P, and
Solve the linear system Pq = v
In O(N) time?
N-body Problem
N-body Problem: Given N particles in 3D space,
compute all forces between the particles.
Hierarchical Algorithm (Appel 85)
O(N) time (Esselink)
Radiosity (Hanrahan, Salzman & Aupperle)
Multipole Algorithm (Greengard & Rokhlin 87)
O(N) time
FastCap
Appel’s Key Ideas
For practical purposes, forces acting on a particle
need only be calculated to within the given precision.
The force due to a cluster of particles at some
distance can be approximated with a single term.
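The second idea can be illustrated with a small sketch that uses potentials rather than forces for brevity: a distant cluster of charges is replaced by one term, its total charge placed at its center of charge. The function names and the sample cluster are hypothetical, and units are dropped (potential taken as q / r).

```python
import math


def exact_potential(target, particles):
    """Sum the contribution of every particle individually."""
    return sum(q / math.dist(target, pos) for pos, q in particles)


def cluster_potential(target, particles):
    """Approximate the whole cluster by a single term (total charge at its center)."""
    total = sum(q for _, q in particles)
    center = tuple(sum(q * pos[k] for pos, q in particles) / total for k in range(3))
    return total / math.dist(target, center)


# Four unit charges within 0.1 of the origin, observed from distance 10.
cluster = [((0.1, 0.0, 0.0), 1.0), ((-0.1, 0.0, 0.0), 1.0),
           ((0.0, 0.1, 0.0), 1.0), ((0.0, -0.1, 0.0), 1.0)]
target = (10.0, 0.0, 0.0)
exact = exact_potential(target, cluster)
approx = cluster_potential(target, cluster)
relative_error = abs(exact - approx) / abs(exact)
```

The relative error shrinks as the cluster gets smaller or farther away, which is what allows a precision bound to decide when the single-term approximation is acceptable.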
Outline of New Algorithm
Adaptively partition conductor surfaces into small
panels according to a user-supplied error bound
Pe.
Approximate potential coefficient matrix P and
store it in a hierarchical data structure of size O(N).
The data structure permits O(N) time matrix-vector
product Px for any N-vector x.
Solve linear system Pq = v using iterative methods.
Adaptive Panel Partition
If the potential coefficient estimate between two
panels is greater than Pe, then partition the panels.
Otherwise, record the coefficient.
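The refinement rule can be sketched as a recursive procedure. This is a minimal sketch under stated assumptions, not the author's implementation: `Panel`, `estimate`, `refine`, and `min_size` are hypothetical names, panels are axis-aligned rectangles in planes of constant z, and `estimate` is a stand-in proxy (largest panel diameter divided by center distance) chosen only because it shrinks as panels are subdivided.

```python
import math
from dataclasses import dataclass, field


@dataclass(eq=False)
class Panel:
    cx: float
    cy: float
    cz: float
    w: float
    h: float
    children: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (other_panel, recorded_estimate)

    def diameter(self):
        return math.hypot(self.w, self.h)

    def split(self):
        """Halve the panel along its longer side; children are created once."""
        if not self.children:
            if self.w >= self.h:
                q = self.w / 4
                self.children = [Panel(self.cx - q, self.cy, self.cz, self.w / 2, self.h),
                                 Panel(self.cx + q, self.cy, self.cz, self.w / 2, self.h)]
            else:
                q = self.h / 4
                self.children = [Panel(self.cx, self.cy - q, self.cz, self.w, self.h / 2),
                                 Panel(self.cx, self.cy + q, self.cz, self.w, self.h / 2)]
        return self.children


def estimate(a, b):
    """Hypothetical proxy for the coefficient estimate (centers must differ)."""
    r = math.dist((a.cx, a.cy, a.cz), (b.cx, b.cy, b.cz))
    return max(a.diameter(), b.diameter()) / r


def refine(a, b, pe, min_size=1e-3):
    """Record a link if the estimate is within pe, else split the larger panel."""
    est = estimate(a, b)
    a_is_bigger = a.diameter() >= b.diameter()
    bigger = a if a_is_bigger else b
    if est <= pe or bigger.diameter() <= min_size:
        a.links.append((b, est))
        return
    for child in bigger.split():
        if a_is_bigger:
            refine(child, b, pe, min_size)
        else:
            refine(a, child, pe, min_size)


def leaves(p):
    return [p] if not p.children else [x for c in p.children for x in leaves(c)]


def collect_links(p):
    found = [(p, other, est) for other, est in p.links]
    for c in p.children:
        found.extend(collect_links(c))
    return found


# Two parallel unit squares, one unit apart.
top = Panel(0.5, 0.5, 0.0, 1.0, 1.0)
bottom = Panel(0.5, 0.5, 1.0, 1.0, 1.0)
refine(top, bottom, pe=0.5)
```

Links end up at different levels of the two panel trees: nearby parts are refined further, distant parts interact through coarse panels.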
[Figure: example of adaptive panel partition, with panels labeled A through N.]
Coefficient Matrix Representation
Entries of P are stored in a hierarchical data
structure as links.
[Figure: panel hierarchy with nodes A through N and links between panels, shown next to the corresponding matrix with block entries.]
It can be shown that the matrix contains O(N) block
entries, where N is the number of panels.
If expanded explicitly, the matrix would contain
NxN entries.
If panel sizes were uniform, the matrix would be
much larger than NxN.
Matrix-Vector Product Px
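Step 1: Compute the charge of every panel in the hierarchy in O(N) time (bottom-up: a parent's charge is the sum of its children's charges).
Step 2: Compute the potential of every panel from its links in O(N) time.
Step 3: Distribute the potential down to the leaf panels in O(N) time.
[Figure, shown once per step: the panel hierarchy with nodes A through N.]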
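The three passes can be sketched on a small explicit tree. This is a sketch under assumptions: `Node`, `link`, and `matvec` are hypothetical names, each link stores one coefficient shared by both directions, and a child inherits its parent's potential unchanged.

```python
class Node:
    def __init__(self, children=()):
        self.children = list(children)
        self.links = []        # (source_node, coefficient)
        self.charge = 0.0
        self.potential = 0.0


def link(a, b, coeff):
    """Record one block entry of P between panels a and b (symmetric)."""
    a.links.append((b, coeff))
    if a is not b:
        b.links.append((a, coeff))


def gather_charge(node):
    """Pass 1 (bottom-up): a parent's charge is the sum of its children's."""
    if node.children:
        node.charge = sum(gather_charge(c) for c in node.children)
    return node.charge


def apply_links(node):
    """Pass 2: potential at each node from the charges at the far end of its links."""
    node.potential = sum(coeff * src.charge for src, coeff in node.links)
    for c in node.children:
        apply_links(c)


def push_down(node, inherited=0.0):
    """Pass 3 (top-down): add the ancestors' potential to every descendant."""
    node.potential += inherited
    for c in node.children:
        push_down(c, node.potential)


def matvec(roots, leaf_nodes, x):
    for leaf, xi in zip(leaf_nodes, x):
        leaf.charge = xi
    for r in roots:
        gather_charge(r)
    for r in roots:
        apply_links(r)
    for r in roots:
        push_down(r)
    return [leaf.potential for leaf in leaf_nodes]


# Conductor A has leaves a1 and a2; conductor B is a single panel.
a1, a2, b = Node(), Node(), Node()
a = Node([a1, a2])
link(a1, a1, 4.0); link(a2, a2, 4.0); link(a1, a2, 2.0)
link(b, b, 3.0)
link(a, b, 0.5)            # one coarse link stands in for two leaf-level entries
y = matvec([a, b], [a1, a2, b], [1.0, 2.0, 3.0])
```

In this example the result equals the product with the explicit matrix [[4, 2, 0.5], [2, 4, 0.5], [0.5, 0.5, 3]], while every link is visited only once per pass.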
Solving Linear Systems
Use iterative methods such as GMRES or MINRES.
Each iteration requires a matrix-vector product Px
and can be completed in O(N) time.
Number of iterations needed is very small, normally
10-20 regardless of N.
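The point that the solver needs P only through the product Px can be illustrated with a textbook unrestarted GMRES in plain Python. This is a sketch, not the implementation behind the slides: `gmres`, `tol`, and `max_iter` are hypothetical names, and the small dense `matvec` in the example stands in for the O(N) hierarchical product.

```python
import math


def gmres(matvec, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES with zero initial guess; returns (x, iterations)."""
    n = len(b)
    beta = math.sqrt(sum(bi * bi for bi in b))
    if beta == 0.0:
        return [0.0] * n, 0
    V = [[bi / beta for bi in b]]   # orthonormal Krylov basis
    R = []                          # columns of the rotated (triangular) Hessenberg matrix
    cs, sn = [], []                 # Givens rotation coefficients
    g = [beta]                      # rotated right-hand side; |g[-1]| is the residual norm
    for k in range(max_iter):
        w = matvec(V[k])            # the only place the matrix is used
        h = []
        for v in V:                 # modified Gram-Schmidt
            hv = sum(wi * vi for wi, vi in zip(w, v))
            h.append(hv)
            w = [wi - hv * vi for wi, vi in zip(w, v)]
        h_next = math.sqrt(sum(wi * wi for wi in w))
        for i in range(k):          # apply earlier rotations to the new column
            h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                              -sn[i] * h[i] + cs[i] * h[i + 1])
        d = math.hypot(h[k], h_next)
        c, s = h[k] / d, h_next / d
        cs.append(c)
        sn.append(s)
        h[k] = d
        g.append(-s * g[k])
        g[k] = c * g[k]
        R.append(h)
        if abs(g[k + 1]) <= tol * beta:
            break
        V.append([wi / h_next for wi in w])
    m = len(R)
    y = [0.0] * m
    for i in reversed(range(m)):    # back substitution on the triangular system
        y[i] = (g[i] - sum(R[j][i] * y[j] for j in range(i + 1, m))) / R[i][i]
    x = [sum(y[j] * V[j][i] for j in range(m)) for i in range(n)]
    return x, m


# Example with a small dense stand-in for the hierarchical product.
P = [[4.0, 2.0, 0.5], [2.0, 4.0, 0.5], [0.5, 0.5, 3.0]]
q, iterations = gmres(lambda x: [sum(p * xi for p, xi in zip(row, x)) for row in P],
                      [9.5, 11.5, 10.5])
```

Each iteration costs one call to `matvec` plus work proportional to the number of basis vectors kept, so a small iteration count keeps the total cost dominated by the matrix-vector products.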
Error and Complexity
The approximation error is controlled by the user-supplied
error bound Pe.
Time complexity is O(N) because each of the above
steps is O(N).
Experimental Results
Test examples: Bus crossing 2x2, 3x3, …, 6x6. In
commercial tools, thousands of these crossings will
be computed to build the library.
[Figure: 2x2 bus crossing.]
Previous 3D Algorithms
FastCap expansion order 2 (assume accurate).
FastCap expansion order 0.
Pre-corrected FFT. 40% faster than FastCap(2) and
uses 1/4 of memory of FastCap(2).
IES3. 60% faster than FastCap(2) and uses 1/5 of
memory of FastCap(2).
CPU time (in seconds):
[Figure: CPU time (axis from 0 to 250 seconds) of FastCap(2), FastCap(0) and the new algorithm for the 2x2, 3x3, 4x4, 5x5 and 6x6 bus crossings.]
40 - 100 times faster than FastCap(2),
14 - 40 times faster than FastCap(0).
Memory (in MB):
[Figure: memory (axis from 0 to 100 MB) of FastCap(2), FastCap(0) and the new algorithm for the 2x2, 3x3, 4x4, 5x5 and 6x6 bus crossings.]
1/60 - 1/100 of memory of FastCap(2),
1/80 - 1/280 of memory of FastCap(0).
Error with respect to FastCap(2):
[Figure: error (axis from 0% to 10%) of FastCap(0) and the new algorithm for the 2x2, 3x3, 4x4, 5x5 and 6x6 bus crossings.]
Less than 2.7% error with respect to
FastCap(2), 3 times more accurate than
FastCap(0).
Conclusion
A new algorithm significantly faster than previous
best algorithms. It provides the possibility for 3D
extraction of clock trees and critical nets. It can also
be used to generate libraries for commercial 2D/2.5D
tools.
Kernel independent. Can be applied to multi-layered
dielectrics.
Adaptive refinement scheme produces good partition
of conductor surfaces.
Hierarchical data structure is much more efficient
than previous data structures.
Future Research
Capacitance Extraction
High order basis function
Bottom-up construction of hierarchy
Full chip and critical net extraction
Inductance Extraction
FastHenry is too slow
No commercial tool for mutual inductance.
Variational Parasitic Extraction
MEMS application