Transcript p140_guo_s

High performance field programmable gate array for gigahertz applications
Jong-Ru Guo, C. You, M. Chu, K. Zhou, Jin-Woo Kim, B.S. Goda*, R.P. Kraft, J.F. McDonald
Rensselaer Polytechnic Institute, Troy, NY, 12180
* United States Military Academy, West Point, N.Y. 10096
Jong-Ru Guo
1
2004 MAPLD: 140
Gigahertz era
A high speed reconfigurable system is needed to handle the increasing amount of data.
However, CMOS FPGAs operate at only hundreds of MHz.
→ A GHz reconfigurable system is needed.
Introduction
Field Programmable Gate Array (FPGA)
FPGA:
A reconfigurable chip that can be programmed for a specific function.
Status:
There are no FPGAs that operate at GHz microprocessor clock rates, much less at K-band or X-band.
Goal:
Change this situation for the better.
K-band: 10.9~36 GHz
X-band: 8-12 GHz
[Figure: FPGA structure with parts labeled A, B and C, showing the I/O Cell, Routing Cell and Logic Cell]
FPGA Applications
1. Prototyping
2. Digital Networks
- Mobile Subscriber Equipment
- High Speed Switching Nodes
3. Real Time Signal/Image Processing
- Radar
- Pattern Recognition
4. Digital System Processing
- Filters
- Fourier Transform
5. Satellite Systems
6. Wireless
High speed IBM SiGe HBT Process
Approximate cut-off frequency:
IBM 0.5 & 0.25 um generations (5HP): ~ 50 GHz
IBM 0.18 um generation (7HP): ~ 120 GHz
IBM 0.13 um generation (8HP): ~ 180 GHz
[Figure: cut-off frequency curves for the 5HP, 7HP and 8HP processes. Observe the logarithmic Ic axis.]
Ref. “40-Gb/s Circuits Built From a 120-GHz fT SiGe Technology”, IEEE Journal of Solid-State Circuits, vol. 37, no. 9, Sept. 2002.
SiGe Graded Base Bipolar Transistor
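Ref. Flash Comm
[Figure: Si/SiGe band diagram of the graded-base transistor. Labels: band edges EC and EV; n+ Si emitter, p-SiGe base (compared with p-Si), n- Si collector; electrons (e-) and holes (h+); drift field in the base; Ge concentration plotted against position x.]
$\Delta E_{g,Ge}(\mathrm{grade}) = \Delta E_{g,Ge}(x=W_b) - \Delta E_{g,Ge}(x=0)$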
Ref. Yuan Taur and Tak H. Ning “Fundamentals of Modern
VLSI Devices”, Cambridge University Press, p364, 1998.
New Structure: Input and Output Block and
Function Unit (FU)
[Figure: Schematic of the new function unit, based on the XC6200.
Blocks shown: input routing block (one 16:1 and two 17:1 input MUXes); Function Unit (FU) with a 2:1 MUX, CLK, and a new D-FF (master-slave latch), with outputs C and Q; output routing block (North, South, East and West output MUXes) and output drivers toward the North, South, East and West neighbors; memory configuration structure.
Input signal groups: Ce, Qe, E, E4; Cw, Qw, W, W4; Cs, Qs, S, S4; Cn, Qn, N, N4.
Output MUX signal groups as labeled: E,S,N; W,S,N; E,W,S; E,W,N.
Dimensions shown: 170 um x 210 um.]
Area improvement-BC
[Figure: Basic Cell layouts with dimensions 170 um x 210 um and 130 um x 135 um]
7HP: 0.18 um process
8HP: 0.13 um process
49% layout area saved
Smaller layout → better performance, more configurable cells.
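The area figure can be checked with a few lines of arithmetic. The sketch below assumes the four dimensions on this slide pair up as 170 um x 210 um for the larger cell and 130 um x 135 um for the smaller one; that pairing, and the variable names, are assumptions rather than something the slide states.

```python
# Cell dimensions in micrometres, as read from the slide (pairing is assumed).
OLD_CELL_UM = (170, 210)
NEW_CELL_UM = (130, 135)


def area_um2(dims):
    """Return the rectangular layout area in square micrometres."""
    width, height = dims
    return width * height


old_area = area_um2(OLD_CELL_UM)
new_area = area_um2(NEW_CELL_UM)
area_ratio = new_area / old_area  # new area as a fraction of the old area
area_saved = 1.0 - area_ratio     # fraction of the old area no longer needed

print(f"old = {old_area} um^2, new = {new_area} um^2")
print(f"new/old = {area_ratio:.1%}, saved = {area_saved:.1%}")
```

Under that pairing the smaller cell takes roughly half the area of the larger one, which is in line with the figure of about 49% quoted on the slide.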
Prediction of the performance improvement
by the different generation processes
[Figure: Propagation delay and power consumption comparisons between different processes, plotted against HBT generation (5HP, 7HP, 8T, 9HP?).
Delay values shown: 130 ps, 100 ps, 42 ps, 30 ps???
Power values shown: 71 mW, 52 mW, 13.8 mW, 4.2 mW]
Information for the old, new, and
future Basic Cells-BC
Cell                                   Process  Vcc, Vee   Current trees  Iref    Power (PON)  Tp
BC-I                                   5HP      0, -3.4 V  30             0.7 mA  71.4 mW      239 ps
Basic Cell-II                          7HP      0, -2.8 V  21             0.8 mA  47.04 mW     100 ps
Basic Cell-II (high performance case)  8HP      0, -2.2 V  21             0.7 mA  32.34 mW     42 ps
Basic Cell-II (power saving case)      8HP      0, -2.2 V  21             0.3 mA  13.86 mW     75 ps

1. The 8HP cases (high performance case and power saving case) are based on simulations.
2. The two 8HP cases differ in how the transistors are set: in the high performance case they are set to the maximum cutoff frequency, while in the power saving case they are set to match the maximum cutoff frequency of the 7HP process.
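The power column appears to follow from the other columns as |Vee| x Iref x (number of current trees). The sketch below checks that relation against the listed values. The formula is inferred from the numbers rather than stated on the slide, and the function name is hypothetical.

```python
def cell_power_mw(vee_volts, iref_ma, current_trees):
    """Fully-on power in mW: supply span (Vcc = 0 V) times total tree current."""
    return abs(vee_volts) * iref_ma * current_trees


# (label, Vee in V, Iref in mA, current trees, listed power in mW)
cells = [
    ("BC-I, 5HP", -3.4, 0.7, 30, 71.4),
    ("Basic Cell-II, 7HP", -2.8, 0.8, 21, 47.04),
    ("Basic Cell-II, 8HP high performance", -2.2, 0.7, 21, 32.34),
    ("Basic Cell-II, 8HP power saving", -2.2, 0.3, 21, 13.86),
]

for label, vee, iref, trees, listed in cells:
    computed = cell_power_mw(vee, iref, trees)
    print(f"{label}: computed {computed:.2f} mW, listed {listed} mW")
```

All four rows agree with the listed power to two decimal places, which suggests the power figures are for every current tree switched on.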
Test circuit
Four stage Basic Cell (BC) ring oscillator
[Figure: Measurement result of the 7HP ring oscillator]
[Figure: Measurement result of the 5HP Basic Cell]
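For orientation, an ideal N-stage ring oscillator runs at f = 1 / (2 N tp), because a transition must travel around the ring twice per period. The sketch below applies that textbook relation with the Basic Cell delays quoted elsewhere in this talk (239 ps for 5HP, 100 ps for 7HP). Treating those delays as the per-stage delay of the test circuit is an assumption, and the measured results are not reproduced here.

```python
def ring_oscillator_hz(stages, tp_seconds):
    """Ideal ring-oscillator frequency: one period is two trips around the ring."""
    return 1.0 / (2 * stages * tp_seconds)


STAGES = 4  # four stage ring oscillator, per the slide

# Per-stage delays are assumed equal to the Basic Cell Tp values from the talk.
for process, tp in (("5HP", 239e-12), ("7HP", 100e-12)):
    freq_ghz = ring_oscillator_hz(STAGES, tp) / 1e9
    print(f"{process}: tp = {tp * 1e12:.0f} ps -> about {freq_ghz:.2f} GHz")
```

A real measurement would also include loading from output buffers and interconnect, so this estimate is only an upper-bound style sanity check.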
Power-saving scheme- Basic Cell
Design                             Tree #      Usage
BC Maximum Usage                   21          100%
Case I (Comb./Sequential Logic)    10/12       47.6%/57.1%
Case II: Sequential, One Redir.    15          71.4%
Case II: Sequential, Two Redir.    18          85.7%
Case II: Sequential, Three Redir.  21          100%
Case III: Redirect function only   3 tree/dir  14.2%/dir
Power-saving scheme usage [12]
Case I: Only combinational logic or sequential logic is used.
Case II: Sequential logic and redirection function are used.
Case III: Only redirection function is used.
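The usage percentages are the number of active current trees divided by the 21 trees in a Basic Cell. A minimal sketch of that arithmetic follows; the dictionary keys are paraphrased labels, not names from the source.

```python
TOTAL_TREES = 21  # current trees in one Basic Cell (maximum usage)


def usage_percent(active_trees, total_trees=TOTAL_TREES):
    """Share of the cell's current trees that stay switched on, in percent."""
    return 100.0 * active_trees / total_trees


active_trees_by_case = {
    "Case I, combinational logic": 10,
    "Case I, sequential logic": 12,
    "Case II, one redirection": 15,
    "Case II, two redirections": 18,
    "Case II, three redirections": 21,
    "Case III, per redirection": 3,
}

for case, trees in active_trees_by_case.items():
    print(f"{case}: {trees}/{TOTAL_TREES} trees = {usage_percent(trees):.1f}%")
```

The rounded result for three trees is 14.3%; the slide lists 14.2%, which matches the same ratio truncated rather than rounded.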
Summary: Basic Cell
• Layout size has been reduced by 49%.
With the latest Basic Cell, a 48x48 Basic Cell array fits in a 7mm x 7mm area.
• Propagation delay has been reduced by 82.5%.
• Power consumption has been reduced by 80.6% (5HP case versus 8HP power saving case) for the fully turned-on case.
• 94% of the power is saved when the power-saving scheme is enabled.
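The delay and power reductions can be reproduced from the Basic Cell figures given earlier in the talk (239 ps and 71.4 mW for the 5HP cell, 42 ps for the 8HP high performance case, 13.86 mW for the 8HP power saving case). The sketch below also estimates the footprint of a 48x48 array, assuming 130 um x 135 um cells placed edge to edge with no routing or pad overhead; that cell size and the zero-overhead packing are assumptions.

```python
def reduction_percent(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


delay_cut = reduction_percent(239.0, 42.0)  # ps: 5HP cell vs 8HP high performance
power_cut = reduction_percent(71.4, 13.86)  # mW: 5HP cell vs 8HP power saving

# Footprint of a 48x48 array of abutted cells (assumed 130 um x 135 um each).
ROWS = COLS = 48
array_w_mm = COLS * 130 / 1000
array_h_mm = ROWS * 135 / 1000

print(f"delay reduction ~ {delay_cut:.1f}%")
print(f"power reduction ~ {power_cut:.1f}%")
print(f"{ROWS * COLS} cells occupy about {array_w_mm:.2f} mm x {array_h_mm:.2f} mm")
```

The power figure matches the quoted 80.6%, the delay figure lands within a tenth of a percent of the quoted 82.5%, and the estimated array footprint sits inside a 7 mm x 7 mm die under the stated assumptions.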
High speed reconfigurable system
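[Figure: Block diagram of the high speed reconfigurable system.
Blocks shown: high speed inputs, high speed front end, interleaving block, SiGe FPGA, CMOS FPGA, de-interleaving block, high speed back end, high speed outputs.
Paths shown: interleaving data path, de-interleaving data path, and a connection to processors or other circuits.
Uses noted: DSP and other applications, such as poly-phase filters and digital filters.
Frequency ranges marked: 10 GHz ~ 80 GHz, 500 MHz ~ 10 GHz, 100 MHz ~ 700 MHz]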
Application:
High speed data acquisition system
MUX-DEMUX
• The SiGe FPGA can be configured for DSP and other applications.
• To compare the performance of the SiGe and CMOS FPGAs, the SiGe FPGA is configured as a 4:1 MUX and a 1:4 DEMUX.
• The results can be used to demonstrate its interleaving and de-interleaving functions.
High speed data acquisition
MUX-DEMUX
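[Figure: The block diagram of the 4:1 MUX. Labels: differential inputs CH1/CH1B, CH2/CH2B, CH3/CH3B, CH4/CH4B; three 2:1 MUXes; 1/2 DIV generating 1/2 CLK and 1/2 CLKB from CLK and CLKB; Output and OutputB.]
[Figure: The timing diagram of the 4:1 MUX (x represents 1, 2, 3 and 4). Labels: inputs CHx with period 4T; output with period T; sequence labels CH1 CH2 CH3 CH4 and CH1 CH3 CH2 CH4.]
[Figure: The building blocks of the 1:4 DEMUX. Labels: Data and CLK inputs; three 1:2 DEMUXes; two 1/2 DIV blocks; 1/4 CLK out; differential outputs D1/D1B and D3/D3B from one second-stage DEMUX, D2/D2B and D4/D4B from the other.]
[Figure: The timing diagram of the 1:4 DEMUX. Labels: data input with period T; outputs CHx with period 4T.]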
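The 1:4 DEMUX built from 1:2 DEMUX blocks can be modeled behaviorally as a two-level tree: the first stage splits the serial stream into even and odd bits, and each second-stage block splits its half again. The sketch below is a software model under that assumption, ignoring clock dividers and latency. The grouping of D1 with D3 and D2 with D4 follows the output labels in the figure, and the function names are hypothetical.

```python
def demux_1to2(bits):
    """Split a serial stream into two half-rate streams (even and odd positions)."""
    return bits[0::2], bits[1::2]


def demux_1to4(bits):
    """Model a 1:4 DEMUX as a tree of three 1:2 DEMUX stages."""
    even, odd = demux_1to2(bits)  # first stage, full-rate clock
    d1, d3 = demux_1to2(even)     # second stage, half-rate clock
    d2, d4 = demux_1to2(odd)      # second stage, half-rate clock
    return d1, d2, d3, d4


# Number the serial bits 0..7 so each output shows which positions it received.
stream = list(range(8))
d1, d2, d3, d4 = demux_1to4(stream)
print("D1:", d1, "D2:", d2, "D3:", d3, "D4:", d4)
```

Each output receives every fourth bit, so a serial input with bit period T produces four streams with bit period 4T, as in the timing diagram.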
Layout of the 4:1 MUX and 1:4 DEMUX
implemented by the SiGe Basic Cells
Simulation results show that both the 4:1 MUX and the 1:4 DEMUX can operate at up to 10 GHz.
By comparison, the same circuits on a CMOS FPGA (Xilinx Virtex) run at up to 183 MHz.
[Figure: Layout of the 4:1 MUX]
[Figure: Layout of the 1:4 DEMUX]
Simulation results (MUX)
Simulation result of the 4:1 MUX.
Inputs: CH_A: 1010011, CH_B: 0010100, CH_C: 0101001 and CH_D: 0001010.
Output: 0010-1000-0011-1100-0001-0110.
[Figure: Simulated eye diagram of the 4:1 MUX programmed on the SiGe FPGA, running at 10 Gbps]
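The listed output can be reproduced with a simple bit-interleaving model. The sketch below is a behavioral model only, and the channel order C, D, A, B is an inference: it is the order that reproduces the listed output from the first six bits of each input. The slide does not state the order, which in the real circuit may reflect the MUX tree wiring or a latency offset between channel pairs.

```python
def mux_4to1(channels, order):
    """Interleave equal-rate channels bit by bit, visiting them in `order`."""
    length = min(len(channels[name]) for name in order)
    return "".join(channels[name][t] for t in range(length) for name in order)


# Input bit strings as listed on the slide.
channels = {"A": "1010011", "B": "0010100", "C": "0101001", "D": "0001010"}

# Channel order is an assumption chosen to match the listed output.
serial = mux_4to1(channels, order="CDAB")
words = [serial[i:i + 4] for i in range(0, len(serial), 4)]
print("-".join(words[:6]))
```

The first six 4-bit words match the output string on the slide; the seventh input bit of each channel falls outside the listed output window.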
SiGe and CMOS FPGA
Performance comparisons
Circuit             Tx rate                      Power (mW)  Used CLB
4:1 MUX (SiGe)      10 Gbps                      258.3       7
4:1 MUX (Virtex)    170 MHz                      61          7
1:4 DEMUX (SiGe)    2.5 Gbps (Input: 10 Gbps)    194.52      8
1:4 DEMUX (Virtex)  45.5 MHz (Input: 182 MHz)    91          8

Virtex results are based on the following environment:
Software: Foundation 2.1
Xilinx power consumption worksheet V1.5
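The DEMUX output rates are the input rates divided by four, and the speed advantage of the SiGe FPGA is the ratio of the two input rates. The sketch below works through that arithmetic. It treats the Virtex MHz figures as one bit per clock cycle, which is an assumption, and the function name is hypothetical.

```python
def per_channel_rate(input_rate_hz, ways=4):
    """Output rate of each DEMUX channel for a given serial input rate."""
    return input_rate_hz / ways


sige_demux_out = per_channel_rate(10e9)     # SiGe: 10 Gbps serial input
virtex_demux_out = per_channel_rate(182e6)  # Virtex: 182 MHz serial input

# Speed ratios, assuming one bit per clock cycle on the Virtex.
mux_speed_ratio = 10e9 / 170e6
demux_speed_ratio = 10e9 / 182e6

print(f"SiGe DEMUX output: {sige_demux_out / 1e9:.1f} Gbps per channel")
print(f"Virtex DEMUX output: {virtex_demux_out / 1e6:.1f} MHz per channel")
print(f"MUX speed ratio ~ {mux_speed_ratio:.1f}x, DEMUX ~ {demux_speed_ratio:.1f}x")
```

Both per-channel rates agree with the table, and the SiGe implementation comes out more than fifty times faster on either circuit while drawing roughly two to four times the power.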
Larger scale SiGe FPGA
• A 20x20 Basic Cell array (400 Basic Cells) has been fabricated by IBM (7HP), with dimensions of 7mm x 7mm.
• A 48x48 Basic Cell array has been developed (8HP) with a high speed ADC integrated.
[Figure: chip layout with dimensions marked 5.8mm and 7mm]
Conclusion
• The SiGe FPGA can reach operating speeds of up to 20GHz (8HP generation).
• The layout area has been reduced by 49% between the 5HP and 8HP generations.
• Applications running in the GHz range have been proposed.
• A 4:1 MUX and a 1:4 DEMUX have been configured to compare the performance of the SiGe and CMOS FPGAs.
Future work
• Test the 20x20 Basic Cell array that has been fabricated by IBM (7HP).
• Develop a high speed data acquisition system.
• Implement DSP applications, such as software radar and poly-phase filtering.
• Develop 10GHz and 20GHz SiGe FPGAs.
• Integrate with high speed front-end and back-end circuits.
[Figure: Primitive layout of the 48x48 SiGe FPGA]