NSF/NSC workshop

Giga-Scale System-On-A-Chip
International Center on System-on-a-Chip (ICSOC)
Jason Cong
University of California, Los Angeles
Tel: 310-206-2775, Email: [email protected]
(Other participants are listed inside)
Background: “Double Exponential” Growth of
Design Complexity
• C1: complexity due to exponential increase of chip capacity
– More devices
– More power
– Heterogeneous integration, ……
• C2: complexity due to exponential decrease of feature size
– Interconnect delay
– Coupling noise
– EMI, ……
• Design Complexity ∝ C1 x C2
Motivation: Productivity Gap
[Figure: Chip Capacity and Designer Productivity. Axes: Logic Transistors/Chip (K) and Transistor/Staff-Month, plotted over time (tick labels 1998 and 2003 survive). Complexity growth rate: 58%/Yr. Productivity growth rate: 21%/Yr. Source: NTRS’97]
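The two growth rates in the chart compound into a widening gap. A minimal sketch of that arithmetic, assuming both rates stay constant and both curves start from a common baseline (the function name `gap_factor` is illustrative and not taken from the NTRS report):

```python
def gap_factor(years, complexity_rate=0.58, productivity_rate=0.21):
    """Ratio of complexity growth to productivity growth after `years`.

    Assumes constant annual rates (58%/yr and 21%/yr from the chart)
    and a shared starting point, which is a simplification.
    """
    return ((1 + complexity_rate) / (1 + productivity_rate)) ** years


for years in (1, 5, 10):
    print(f"after {years:2d} years: gap x{gap_factor(years):.1f}")
```

Under these assumptions the gap widens by about 30% per year, or roughly 3.8x over five years.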
Project Summary
• Develop new design methodology to enable efficient
giga-scale integration for system-on-a-chip (SOC)
designs
• Project includes three major components
– SOC synthesis tools and methodologies
– SOC verification, test, and diagnosis
– SOC design driver – network processor
Research Team by Institutions
• US
– UCLA: Jason Cong
– UC Santa Barbara: Tim Cheng
• Taiwan
– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
– NCTU: Jing-Yang Jou
• China
– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
– Peking Univ.: Xu Cheng
– Zhejiang Univ.: Xiaolang Yan
Current Research Team
• US
– UCLA: Jason Cong
– UC Santa Barbara: Tim Cheng
• Taiwan
– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
– NCTU: Jing-Yang Jou
• China
– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
– Peking Univ.: Xu Cheng
– Zhejiang Univ.: Xiaolang Yan
• Several new faculty members in the 7 institutions
• Guest members from National University of Singapore, Purdue Univ., and UCLA (EE Dept)
Thrust 1 -- SOC Synthesis Environment/Methodology
(Led by Jason Cong)
[Figure: SOC synthesis flow diagram. Blocks, top to bottom: Design Spec (VHDL/C); VHDL/C Co-Simulation; Design Partitioning; ASIC Synthesis; Code Generation for Retargetable Compiler and Assembler Generator; DSP Synthesis and Optimization; FPGA Synthesis and Technology Mapping; Interconnect-Driven High-level Synthesis; Synthesis for IP Reuse; Physical Synthesis for Full-Chip Assembly. Target components: Embedded Processors, DSPs, Embedded FPGAs, Customized Logic]
Interconnect Bottleneck in Nanometer Designs
• Challenge: Single-cycle full chip communication is no longer possible
• Not supported by the current CAD toolset
• Setting: ITRS’01 0.07μm Tech; 5.63 GHz across-chip clock; 800 mm² (28.3mm x 28.3mm); IPEM BIWS estimations; Buffer size: 100x; Driver/receiver size: 100x
• On semi-global layer (tier 3):
– Can travel up to 11.4 mm in one cycle
– Need 5 clock cycles from corner to corner
[Figure: die map showing regions reachable in 1 cycle through 5 cycles, with distance marks at 0, 11.4, 22.8, and 28.3 mm]
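The 5-cycle figure can be reproduced from the two numbers on the slide. A minimal sketch, assuming corner-to-corner wires are routed rectilinearly so that the path length is the sum of the two die sides (the slide does not state the routing model, and `cycles_needed` is an illustrative name):

```python
import math


def cycles_needed(distance_mm, reach_mm_per_cycle):
    """Clock cycles for a signal to cover `distance_mm` of wire."""
    return math.ceil(distance_mm / reach_mm_per_cycle)


DIE_SIDE_MM = 28.3  # from the slide: 28.3mm x 28.3mm die
REACH_MM = 11.4     # from the slide: reach per cycle on tier 3

# Assumption: Manhattan (rectilinear) routing from corner to corner.
corner_to_corner_mm = 2 * DIE_SIDE_MM
print(cycles_needed(corner_to_corner_mm, REACH_MM))
```

With a straight diagonal the same calculation would give fewer cycles, so the rectilinear assumption is what matches the slide's count.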
Regular Distributed Register Architecture
[Figure: RDR architecture diagram. Labels: Island; Cluster with area constraint (Hi, Wi); Local Computational Cluster (LCC) with ADD, MUL, MUX; FSM; Register File with banks for 1 cycle, 2 cycle, … k cycle; Global Interconnect]
Use register banks:
• Registers in each island are partitioned into k banks for 1 cycle, 2 cycle, … k cycle interconnect communication in each island
Highly regular
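One way to picture the bank partitioning is as a lookup on communication distance. The sketch below is hypothetical: it assumes islands sit on a regular grid, that distance is measured rectilinearly between island origins, and that a transfer within one island fits in a single cycle. None of the names or dimensions come from the slide.

```python
def comm_cycles(src, dst, island_w, island_h, reach_per_cycle):
    """Cycles to move a value between two islands on the grid.

    src, dst: (column, row) island coordinates.
    Distances use integer units (e.g. micrometers) to avoid rounding.
    Assumption: Manhattan distance between island origins; a transfer
    inside one island always takes one cycle.
    """
    dx = abs(src[0] - dst[0]) * island_w
    dy = abs(src[1] - dst[1]) * island_h
    distance = dx + dy
    return max(1, -(-distance // reach_per_cycle))  # ceiling division


def register_bank(src, dst, island_w, island_h, reach_per_cycle, k):
    """Index (1..k) of the bank that would hold a value for this transfer."""
    cycles = comm_cycles(src, dst, island_w, island_h, reach_per_cycle)
    if cycles > k:
        raise ValueError(f"transfer needs {cycles} cycles but only {k} banks exist")
    return cycles


# Hypothetical numbers: 6000um x 6000um islands, 11400um reach per cycle.
print(register_bank((0, 0), (2, 1), 6000, 6000, 11400, k=5))
```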
MCAS: Architectural Synthesis for Multi-Cycle Communication Using RDR Architecture
MCAS (Multi-Cycle Architectural Synthesis) flow:
• Input: C program
• CDFG generation (produces the CDFG)
• Resource allocation & functional unit binding (produces the ICG)
• Scheduling-driven placement (produces locations)
• Placement-driven rescheduling & rebinding
• Register and port binding
• Datapath & FSM generation
• Outputs: RTL VHDL, floorplan constraints, multi-cycle path constraints
MCAS flow vs. Synopsys Behavioral Compiler (on Virtex-II)

Design  Flow         Cycles  Reg  ALU  MULT  fmax (MHz)  LUTs  Latency (ns)  MCAS vs. BC
pr      Synopsys BC      25   28    5     8       95.87   877        260.78
pr      MCAS             27   34    6     2       86.07  1477        313.69      120.29%
wang    Synopsys BC      29   36    7     8       63.02  1143        460.17
wang    MCAS             14   35    5     8      140.31  1523         99.78       21.68%
mcm     Synopsys BC      43  142   23     7       51.09  3256        841.60
mcm     MCAS             34   35    6     3       53.59  2561        634.44       75.39%
honda   Synopsys BC      29   44    8    14       52.13  2112        556.31
honda   MCAS             23   42    6     8       71.95  2606        319.65       57.46%

• Synopsys Behavioral Compiler setting: default (optimizing latency)
• Average latency ratio of MCAS vs. BC: 69%
[Figure: two bar charts comparing Synopsys BC and MCAS on pr, wang, mcm, and honda. Left: Latency (0 to 900 ns). Right: Resource (0 to 3500 LUTs)]
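The latency and ratio columns can be rechecked from the cycle counts and clock frequencies. A minimal sketch, assuming latency is simply cycles divided by fmax (the slide does not state the formula, but the tabulated values agree with it to within rounding):

```python
# (design, flow) -> (cycles, fmax in MHz, reported latency in ns), from the table
RESULTS = {
    ("pr", "BC"): (25, 95.87, 260.78),
    ("pr", "MCAS"): (27, 86.07, 313.69),
    ("wang", "BC"): (29, 63.02, 460.17),
    ("wang", "MCAS"): (14, 140.31, 99.78),
    ("mcm", "BC"): (43, 51.09, 841.60),
    ("mcm", "MCAS"): (34, 53.59, 634.44),
    ("honda", "BC"): (29, 52.13, 556.31),
    ("honda", "MCAS"): (23, 71.95, 319.65),
}
DESIGNS = ("pr", "wang", "mcm", "honda")


def latency_ns(cycles, fmax_mhz):
    """Latency = cycle count x clock period (assumed relationship)."""
    return cycles / fmax_mhz * 1000.0


def latency_ratio(design):
    """Reported MCAS latency divided by reported BC latency."""
    return RESULTS[(design, "MCAS")][2] / RESULTS[(design, "BC")][2]


ratios = {design: latency_ratio(design) for design in DESIGNS}
average = sum(ratios.values()) / len(ratios)
for design in DESIGNS:
    print(f"{design}: {ratios[design]:.2%}")
print(f"average: {average:.0%}")
```

The average here is the arithmetic mean of the four per-design ratios, which is the reading that reproduces the 69% on the slide.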
Optimality Study of Large-Scale Circuit Placement
• Construction of Placement Example with Known Optimal (PEKO) [C. Chang et al, 2003]
– Construct instances with known optimal solutions using the characteristics of the original problem
– First quantitative evaluation of optimality for the circuit placement problem
– Existing placement algorithms can be 70% to 150% away from the optimal
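The idea of a benchmark with a known optimum can be shown on a toy case. The sketch below is a deliberately simplified illustration, not the construction from the cited work: it assumes unit-size cells on integer grid sites and only 2-pin nets between neighbouring cells, so every net has wirelength 1 in the reference placement and no legal placement can do better.

```python
import random


def hpwl(net, position):
    """Half-perimeter wirelength of one net under a placement."""
    xs = [position[cell][0] for cell in net]
    ys = [position[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, position):
    return sum(hpwl(net, position) for net in nets)


def build_instance(side):
    """Cells on a side x side grid with nets between adjacent cells.

    Cells are named by their site in the reference placement. Two cells
    never share a site, so each 2-pin net needs wirelength >= 1, which
    the reference placement achieves: it is optimal by construction.
    """
    optimal = {(x, y): (x, y) for x in range(side) for y in range(side)}
    nets = []
    for x in range(side):
        for y in range(side):
            if x + 1 < side:
                nets.append(((x, y), (x + 1, y)))
            if y + 1 < side:
                nets.append(((x, y), (x, y + 1)))
    return nets, optimal


def random_placement(cells, seed=0):
    """Stand-in for a placer under test: a random legal placement."""
    sites = list(cells)
    random.Random(seed).shuffle(sites)
    return dict(zip(cells, sites))


nets, optimal = build_instance(8)
best = total_wirelength(nets, optimal)
trial = total_wirelength(nets, random_placement(list(optimal)))
print(f"optimal={best}, trial={trial}, {100 * (trial - best) / best:.0f}% away from optimal")
```

A real placer would replace `random_placement`; the gap it leaves against `best` is the kind of figure the slide reports.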
High Interest in the Community
• Coverage in three EE Times articles
– Placement tools criticized for hampering IC designs [Feb’03]
– IC placement benchmarks needed, researchers say [April’03]
– FPGA placement performance [Nov’03]
• More than 150 downloads from our website
– Cadence, IBM, Intel, Magma, Mentor Graphics, Synopsys, etc.
– CMU, SUNY, UCB, UCSB, UCSD, UIC, UMichigan, UWaterloo, etc.
• Used in every placement since its
publication
http://ballade.cs.ucla.edu/~pubbench
Floorplanning & Interconnect Planning
• Based on the Corner Block List (CBL) representation, proposed several extensions (Extended Corner Block List (ECBL), CCBL, and SUB-CBL) to speed up floorplanning and handle more complicated L/T-shaped and rectilinear blocks.
• Proposed floorplanning algorithms with geometric constraints such as boundary, abutment, and L/T-shaped blocks.
• Proposed integrated floorplanning and buffer planning algorithms that take congestion into account.
• Using research results from UCLA on interconnect planning
• About 30 papers published in DAC, ICCAD, ISPD, ASPDAC, ISCAS
and Transactions.
P/G Network Analysis & Optimization
• Proposed area minimization of power distribution networks using efficient nonlinear programming techniques (ICCAD 2001; accepted by IEEE Trans. on CAD)
• Proposed a decoupling capacitance optimization algorithm for robust on-chip power delivery (ASPDAC 2004, ASICON 2003)
Parasitic R/L/C Extraction
• 3-D R/C Extraction using Boundary Element Method (BEM)
• Quasi-Multiple Medium (QMM) BEM algorithms
• Hierarchical Block BEM (HBBEM) technique
• Fast 3-D Inductance Extraction (FIE)
• Papers published in ASPDAC, ASICON, and IEEE Transactions on MTT
Thrust 2 -- SOC Verification, Test, and Diagnosis
(Led by Tim Cheng)
Verification and Testing
• Testing and diagnosis for heterogeneous SOC
– Self-testing using on-chip programmable components
– Self-testing for on-chip analog/mixed-signal components
– New test techniques for deep-submicron embedded memories
• Enabling techniques for semi-formal functional verification
– Scalable constraint-solving techniques
– Automatic/semi-automatic functional vector generation from HDL code
– Integrated framework for simulation, vector generation and model checking
Key Results - Verification
• Developed and released ATPG-based SAT solvers for circuits
(Univ. of California, Santa Barbara)
– Integrating structural ATPG and SAT techniques with new conflict learning
– CSAT: Fast combinational solver (released in March 2003)
• Demonstrated 10-100X speedup over state-of-the-art SAT solvers on industrial
test cases (reported by Intel and Calypto)
• Has been integrated into Intel’s FV verification system and a startup’s
verification engine
• Publications: DATE2003 and DAC2003
– Satori2: Fast sequential solver (released in Dec. 2003)
• Demonstrated 10X-200X speedup over a commercial, sequential ATPG engine
on public benchmark circuits
• Publications: ICCAD2003, HLDVT2003 and ASPDAC2004
Tim Cheng
Key Results - Testing
A new Statistical Delay Testing and Diagnosis framework consisting of five major components (UCSB):
• Statistical timing analysis
• Statistical critical path selection [DAC’02, ICCAD’02]
– Selecting statistical long & true paths whose tests maximize detection of parametric failures
• Path coverage metric [ASPDAC’03]
– Estimating the quality of a path set
• Selection/Generation of high quality tests for target paths [ITC’01][DATE 2004]
– Identifying tests that activate longer delay along the target path
• Delay fault diagnosis based on statistical timing model [DATE’03, VTS’03, DAC’03]
– Ref: Krstic, Wang, Cheng, & Abadir, DATE’03, Best Paper Award in Test
[Figure: framework diagram. Blocks: Statistical Timing Analysis Framework (Cell-based characterization); Static Timing Analysis; Dynamic Timing Simulator; Path Filtering; Critical Path Selection; ATPG/Pattern Selection; Defect Injection & Simulation; Diagnosis]
Key Results - Testing
• On-Chip Jitter Extraction for Bit-Error-Rate (BER) Testing of Multi-GHz Signals (UCSB)
– Using on-chip, single-shot measurement unit to sample signal
periods for spectral analysis
– Demonstrated, through simulation, accurate extraction of
multiple sinusoids and random jitter components for a 3GHz
signal
– Publications: ASPDAC2004 and DATE2004
Thrust 3 – Design Driver: Network Security Processor
(Led by Prof. C. W. Wu & Xu Cheng)
• Applications: IPSec, SSL, VPN, etc.
• Functionalities:
– Public key: RSA, ECC
– Secret key: AES
– Hashing (Message authentication): HMAC (SHA-1/MD5)
– Truly random number generator (FIPS 140-1, 140-2 compliant)
• Target technology: 0.18μm or below
• Clock rate: 200MHz or higher (internal)
• 32-bit data and instruction word
• 10Gbps (OC192)
• Power: 1 to 10mW/MHz at 3V (LP to HP)
• Die size: 50mm²
• On-chip bus: AMBA (Advanced Microcontroller Bus Architecture)
Encryption Modules (PKEM)
• Public key encryption module
– Operations:
• 32-bit word-based modular multiplication
• Multiplication over GF(p) and GF(2^m)
• An RSA cryptography engine with small area overhead and high speed
• Scalable word-width
• TSMC 0.35μm
• 34K gates (1.7×1.8 mm²)
• 100MHz clock
• Scalable key length
• Throughput
– 512-bit key: 1.79Kbps/MHz
– 1024-bit key: 470bps/MHz
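The per-MHz throughput figures translate into absolute rates at the stated 100MHz clock. A minimal sketch of that conversion, assuming throughput scales linearly with clock frequency and that one operation processes one key-length block (both assumptions, as are the function names):

```python
def rsa_throughput_bps(rate_bps_per_mhz, clock_mhz):
    """Absolute throughput, assuming linear scaling with clock rate."""
    return rate_bps_per_mhz * clock_mhz


def time_per_block_ms(key_bits, throughput_bps):
    """Assumption: one operation processes one key-length block."""
    return key_bits / throughput_bps * 1000.0


CLOCK_MHZ = 100
RATES = {512: 1790.0, 1024: 470.0}  # key bits -> bps/MHz, from the slide

for key_bits, rate in RATES.items():
    tput = rsa_throughput_bps(rate, CLOCK_MHZ)
    print(f"{key_bits}-bit key: {tput / 1000:.0f} Kbps, "
          f"{time_per_block_ms(key_bits, tput):.1f} ms per block")
```

Doubling the key length cuts the per-MHz rate by a factor of about 3.8, in line with the steep growth in modular multiplication work as operands get longer.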
Encryption Modules (SKEM)
• Secret key encryption module
– Operations:
• Matrix operations, manipulation
• AES cryptography
• 32-bit external interface
• 58K gates
• Over 200MHz clock
• Throughput: 2Gbps
• Support key length of 128/192/256 bits

Technology   TSMC 0.25μm CMOS
Package      128CQFP
Core Size    1,279 x 1,271 μm²
Gate Count   63.4K
Max. Freq.   250MHz
Throughput   2.977 Gbps (128-bit key)
             2.510 Gbps (192-bit key)
             2.169 Gbps (256-bit key)
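The three throughput figures can be turned into clock cycles per AES block. A minimal sketch, assuming the throughputs were measured at the 250MHz maximum frequency and taking the middle row as the standard 192-bit key size (the slide does not state the measurement clock, and `cycles_per_block` is an illustrative name):

```python
AES_BLOCK_BITS = 128  # the AES block size is fixed at 128 bits


def cycles_per_block(throughput_gbps, fmax_mhz):
    """Clock cycles spent per 128-bit block at the given clock rate."""
    blocks_per_second = throughput_gbps * 1e9 / AES_BLOCK_BITS
    return fmax_mhz * 1e6 / blocks_per_second


MEASURED = {128: 2.977, 192: 2.510, 256: 2.169}  # key bits -> Gbps, from the table

for key_bits, gbps in MEASURED.items():
    print(f"{key_bits}-bit key: {cycles_per_block(gbps, 250):.2f} cycles per block")
```

The results come out near 10.75, 12.75, and 14.75 cycles, which is consistent with roughly one cycle per AES round (10, 12, and 14 rounds for the three key sizes) plus a small fixed overhead. That reading is an inference from the arithmetic, not a statement from the slide.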
International Collaborations
• Joint NSF/NSC workshop in Aug. 1999 on SOC (Hsin-Chu, Taiwan)
• First team preparation meeting for the proposed center in Jan. 2000 (Yokohama, Japan)
• 2nd planning meeting held in April 2000 (Hawaii, US)
• 3rd planning meeting in Aug. 2000 (Chengde, China)
• Proposal submitted to NSF in Aug. 2000 and funded in Dec. 2000
• Workshops
– March 30-31, 2001 in Taipei, Taiwan
– June 23-24, 2001 in Los Angeles, USA
– August 31-September 1, 2001 in Hangzhou, China
– March 28-29, 2002, National Tsing Hua University, Hsinchu, Taiwan
– August 20-21, 2002, Peking University, Beijing, China
– November 15-16, 2002, University of California, Santa Barbara
– March 27-29, 2003, National Taiwan University, Taipei, Taiwan
– December 19-21, 2003, Yunnan University, Kunming, China
Publications
• 56 research publications up to this point
• 17 in top conferences/journals (DAC, ICCAD,
ASPDAC, ITC, etc.) in the field
People & Education
• Many interactions among participants from different
institutes
• Two new IEEE fellows:
– Prof. Xianlong Hong, Tsinghua Univ.
– Prof. Cheng-Wen Wu, National Tsing Hua Univ.
• Involved many young faculty members and researchers
• Trained an army of graduate students