27-FPGAEvolution

Evolution of Implementation Technologies
 Discrete devices: relays, transistors (1940s-50s)
 Discrete logic gates (1950s-60s)
 Integrated circuits (1960s-70s)
Trend toward higher levels of integration
 e.g. TTL packages: Data Book for 100’s of different parts
 Map your circuit to the Data Book parts
 Gate Arrays (IBM 1970s)
 “Custom” integrated circuit chips
 Design using a library (like TTL)
 Transistors are already on the chip
 Place and route software puts the chip together automatically
 + Large circuits on a chip
 + Automatic design tools (no tedious custom layout)
 - Only good if you want 1000’s of parts
CS 150 – Fall 2005 - Lec #27: FPGA Evolution – 1
Gate Array Technology (IBM - 1970s)
 Simple logic gates
 Use transistors to
implement combinational
and sequential logic
 Interconnect
 Wires to connect inputs and
outputs to logic blocks
 I/O blocks
 Special blocks at periphery
for external connections
 Add wires to make connections
 Done when chip is fabbed: “mask-programmable”
 Construct any circuit
Programmable Logic
 Disadvantages of the Data Book method
Constrained to parts in the Data Book
Parts are necessarily small and standard
Need to stock many different parts
 Programmable logic
Use a single chip (or a small number of chips)
Program it for the circuit you want
No reason for the circuit to be small
Programmable Logic Technologies
 Fuse and anti-fuse
 Fuse makes or breaks link between two wires
 Typical connections are 50-300 ohm
 One-time programmable (testing before programming?)
 Very high density
 EPROM and EEPROM
 High power consumption
 Typical connections are 2K-4K ohm
 Fairly high density
 RAM-based
 Memory bit controls a switch that connects/disconnects two wires
 Typical connections are .5K-1K ohm
 Can be programmed and re-programmed in the circuit
 Low density
Programmable Logic
 Program a connection
 Connect two wires
 Set a bit to 0 or 1
 Regular structures for two-level logic (1960s-70s)
 All rely on two-level logic minimization
 PROM connections - permanent
 EPROM connections - erase with UV light
 EEPROM connections - erase electrically
 PROMs
Program connections in the _____________ plane
 PLAs
Program the connections in the ____________ plane
 PALs
Program the connections in the ____________ plane
PAL Logic Building Block
 Programmable AND gates
 Fixed OR/NOR gate
 Flipflop/Registered Output
 Feedback to Array
 Tri-state Output
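The combinational part of this building block can be sketched in a few lines of Python. This is only an illustrative model: the term encoding, the function names, and the example programming `F = A·B' + C` are assumptions made for the sketch, not part of any real PAL programming format.

```python
from itertools import product


def pal_output(inputs, and_plane):
    """Evaluate one PAL output: programmable AND terms feeding a fixed OR.

    inputs    -- dict of signal name -> 0/1
    and_plane -- list of product terms; each term maps a signal name to the
                 literal kept in the array (1 = true, 0 = complemented)
    """
    def term_value(term):
        # A product term is 1 only when every programmed literal matches.
        return all(inputs[name] == polarity for name, polarity in term.items())

    # The OR gate is fixed: it simply combines all programmed product terms.
    return int(any(term_value(term) for term in and_plane))


# Hypothetical programming of the AND plane: F = A·B' + C
F_TERMS = [{"A": 1, "B": 0}, {"C": 1}]


def truth_table(terms, names=("A", "B", "C")):
    """List the output for every input combination, in binary counting order."""
    return [pal_output(dict(zip(names, bits)), terms)
            for bits in product((0, 1), repeat=len(names))]


if __name__ == "__main__":
    print(truth_table(F_TERMS))
```

The registered output, feedback path, and tri-state driver listed above are left out to keep the sketch short.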
XOR PALs
 Useful for comparator logic, arithmetic sums, etc.
Use of XOR gates can dramatically reduce the number of
AND plane inputs needed to realize certain functions
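A rough way to see the saving is to count product terms for a parity function. An n-input parity function has no adjacent minterms, so a plain AND-OR realization needs one product term per minterm. The sketch below compares that count with a hypothetical arrangement in which a single output XOR combines the parities of two halves of the inputs; the split into halves is an assumption chosen for illustration.

```python
from itertools import product


def parity_minterms(n):
    """Count the minterms (product terms) of n-input odd parity in AND-OR form."""
    return sum(1 for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1)


def xor_split_terms(n):
    """Product terms needed when one output XOR combines two half-width parities."""
    left = n // 2
    right = n - left
    return parity_minterms(left) + parity_minterms(right)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, parity_minterms(n), xor_split_terms(n))
```

For four inputs the plain AND-OR form needs eight product terms, while the XOR arrangement needs four.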
XOR PAL
 And/Or/XOR Logic
 Feedback
 Registered Outputs
 Tri-State Outputs
Another Variation: Synchronous vs. Asynchronous Outputs
[Figure: sequential (D flip-flop) outputs Q0 and Q1 and combinational output Open; signals CLK, N, D, Reset]
Making Large Programmable Logic Circuits
 Alternative 1 : “CPLD”
Put a lot of PLDs on a chip
Add wires between them whose connections can be
programmed
Use fuse/EEPROM technology
 Alternative 2: “FPGA”
Emulate gate array technology
Hence Field Programmable Gate Array
You need:
A way to implement logic gates
A way to connect them together
Field-Programmable Gate Arrays
 PALs, PLAs = 10s – 100s Gate Equivalents
 Field Programmable Gate Arrays = FPGAs
Altera MAX Family
Actel Programmable Gate Array
Xilinx Logical Cell Array
 1000s - 100000(s) of Gate Equivalents!
Field-Programmable Gate Arrays
 Logic blocks
 To implement combinational
and sequential logic
 Interconnect
 Wires to connect inputs and
outputs to logic blocks
 I/O blocks
 Special logic blocks at
periphery of device for
external connections
 Key questions:
 How to make logic blocks programmable?
 How to connect the wires?
 After the chip has been fabbed
Tradeoffs in FPGAs
 Logic block - how are functions implemented: fixed functions
(manipulate inputs) or programmable?
 Support complex functions: need fewer blocks, but they are bigger,
so fewer of them fit on chip
 Support simple functions: need more blocks, but they are smaller, so
more of them fit on chip
 Interconnect
 How are logic blocks arranged?
 How many wires will be needed between them?
 Are wires evenly distributed across chip?
 Programmability slows wires down – are some wires specialized to
long distances?
 How many inputs/outputs must be routed to/from each logic block?
 What utilization are we willing to accept? 50%? 20%? 90%?
Altera EPLD (Erasable Programmable
Logic Devices)
 Historical Perspective
 PALs: same technology as program-once bipolar PROMs
 EPLDs: CMOS erasable programmable ROM (EPROM) erased by UV light
 Altera building block = MACROCELL
[Figure: macrocell: 8 product term AND-OR array plus programmable MUXes; Clk MUX, sequential logic block, output MUX driving the I/O pin pad, invert control (programmable polarity), and F/B MUX (programmable feedback) back to the AND array]
Altera EPLD: Synchronous vs.
Asynchronous Mode
Altera EPLDs contain 10s-100s of independently programmed macrocells
[Figure: Clk MUX personalized by EPROM bits, choosing between the global CLK and the OE/Local CLK signal]
Synchronous Mode: flip-flop controlled by global clock signal; local signal computes output enable
Asynchronous Mode: flip-flop controlled by locally generated clock signal
+ Seq Logic: could be D, T positive or negative edge triggered
+ product term to implement clear function
Altera Multiple Array Matrix (MAX)
AND-OR structures are relatively limited
Cannot share signals/product terms among macrocells
[Figure: Logic Array Blocks (similar to macrocells), LAB A through LAB H, arranged around the PIA]
Global Routing: Programmable Interconnect Array (PIA)
EPM5128:
8 Fixed Inputs
52 I/O Pins
8 LABs
16 Macrocells/LAB
32 Expanders/LAB
LAB Architecture
[Figure: LAB with inputs and the PIA feeding a macrocell array and an expander product term array; macrocell P-terms and expander P-terms; I/O block connecting to the I/O pads]
Expander Terms shared among all
macrocells within the LAB
• Efficient way to use AND plane resources
P22V10 PAL
[Figure: 22V10 logic diagram: fuse-numbered product term array (first fuse numbers and increments), ten output logic macrocells each with P and R fuses, asynchronous reset (to all registers), and synchronous preset (to all registers)]
Supports large number of product terms per output
Latches and muxes associated with output pins
Actel Programmable Gate Arrays
Rows of programmable
logic building blocks
+
rows of interconnect
Anti-fuse Technology:
Program Once
Use Anti-fuses to build
up long wiring runs from
short segments
8 input, single output combinational logic blocks
FFs constructed from discrete cross coupled gates
Actel Logic Module
Basic Module is a Modified 4:1 Multiplexer
[Figure: three 2:1 MUXes with data inputs D0-D3, select inputs SOA, SOB, S0, S1, and output Y]
Example: Implementation of S-R Latch
[Figure: the module's 2:1 MUXes wired with constants "0" and "1" and inputs S and R to produce Q]
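The idea of building logic from multiplexers can be sketched as follows. The select and data wiring here is an assumption chosen for illustration (a plain 4:1 multiplexer made of three 2:1 multiplexers, and a set-dominant latch equation); it does not reproduce the exact pin assignment of the Actel module or of the figure.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: pass d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def mux4(s1, s0, d0, d1, d2, d3):
    """4:1 multiplexer composed of three 2:1 multiplexers."""
    return mux2(s1, mux2(s0, d0, d1), mux2(s0, d2, d3))


# Two-input gates obtained by tying the data inputs to constants.
def and2(a, b):
    return mux4(a, b, 0, 0, 0, 1)


def or2(a, b):
    return mux4(a, b, 0, 1, 1, 1)


def xor2(a, b):
    return mux4(a, b, 0, 1, 1, 0)


def sr_latch_next(s, r, q):
    """Next state of a set-dominant S-R latch, Q+ = S + R'·Q, using two muxes.

    The output q is fed back as a data input, which is what makes the
    multiplexer structure hold state.
    """
    return mux2(s, mux2(r, q, 0), 1)


if __name__ == "__main__":
    print([xor2(a, b) for a in (0, 1) for b in (0, 1)])
```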
Actel Interconnect
[Figure: interconnection fabric: logic modules, horizontal tracks, vertical tracks, and anti-fuses]
Actel Routing Example
[Figure: route from a logic module output to two logic module inputs]
Jogs cross an anti-fuse
minimize the # of jogs for speed critical circuits
2 - 3 hops for most interconnections
Actel’s Next Generation: Axcelerator
 C-Cell
Basic multiplexer logic plus
more inputs and support for
fast carry calculation
Carry connections are “direct”
and do not require propagation
through the programmable
interconnect
Actel’s Next Generation: Axcelerator
 R-Cell
 Core is D flip-flop
 Muxes for altering the clock and
selecting an input
 Feed back path for current
value of the flip-flop for simple
hold
 Direct connection from one C-cell output of logic module to an
R-cell input; eliminates need to
use the programmable
interconnect
 Interconnection Fabric
 Partitioned wires
 Special long wires
Xilinx Programmable Gate Arrays
 CLB - Configurable Logic Block
 5-input, 1 output function
 or 2 4-input, 1 output functions
 optional register on outputs
 Built-in fast carry logic
 Can be used as memory
 Three types of routing
 direct
 general-purpose
 long lines of various lengths
 RAM-programmable
 can be reconfigured
[Figure: array of CLBs surrounded by IOBs, with wiring channels between the CLBs]
[Figure: chip layout: I/O Blocks (IOBs) with slew rate control, passive pull-up/pull-down, output buffer, input buffer, delay, and pad; Configurable Logic Blocks (CLBs) joined by programmable interconnect and switch matrices]
The Xilinx 4000 CLB
[Figure: CLB with function generators F (inputs F1-F4), G (inputs G1-G4), and H; control inputs C1-C4 (H1, DIN, S/R, EC); two D flip-flops with S/R control, clock K, and enable EC; outputs X and Y plus registered Q outputs]
Two 4-input functions, registered output
5-input function, combinational output
CLB Used as RAM
Fast Carry Logic
Xilinx 4000 Interconnect
Switch Matrix
Xilinx 4000 Interconnect Details
Global Signals - Clock, Reset, Control
Xilinx 4000 IOB
Xilinx FPGA Combinational Logic Examples
 Key: General functions are limited to 5 inputs
(4 even better - 1/2 CLB)
No limitation on function complexity
 Example
2-bit comparator:
A B = C D and A B > C D implemented with 1 CLB
(GT) F = A C' + A B D' + B C' D'
(EQ) G = A'B'C'D'+ A'B C'D + A B'C D'+ A B C D
 Can implement some functions of > 5 inputs
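The two comparator equations can be checked exhaustively. The sketch below assumes A and C are the high-order bits of the two 2-bit numbers AB and CD; the function names are chosen for illustration.

```python
from itertools import product


def gt(a, b, c, d):
    """(GT) F = A C' + A B D' + B C' D'"""
    nc, nd = 1 - c, 1 - d
    return (a & nc) | (a & b & nd) | (b & nc & nd)


def eq(a, b, c, d):
    """(EQ) G = A'B'C'D' + A'B C'D + A B'C D' + A B C D"""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (na & nb & nc & nd) | (na & b & nc & d) | (a & nb & c & nd) | (a & b & c & d)


def equations_match_arithmetic():
    """Compare both equations with integer comparison for all 16 input rows."""
    for a, b, c, d in product((0, 1), repeat=4):
        x, y = 2 * a + b, 2 * c + d
        if gt(a, b, c, d) != int(x > y) or eq(a, b, c, d) != int(x == y):
            return False
    return True


if __name__ == "__main__":
    print(equations_match_arithmetic())
```

Both functions depend on only four inputs, which is why the pair fits in the two 4-input function generators of a single CLB.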
Xilinx FPGA Combinational Logic
 Examples
N-input majority function: 1 whenever more than half of the inputs are 1
N-input parity functions: 5 inputs per CLB; 2 levels of CLBs yield 25 inputs!
[Figure: 5-input majority circuit (1 CLB); 7-input majority circuit (3 CLBs); 9-input parity logic (2 CLBs)]
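The parity numbers can be reproduced with a small model that treats one CLB as a single 5-input function. The helper names are hypothetical, and the block and level counts are simple bounds for a tree of 5-input blocks: each block removes at most four signals, and L levels reach at most 5^L inputs.

```python
from functools import reduce
from itertools import product
from operator import xor


def parity5(bits):
    """One CLB-sized block: parity of at most five inputs."""
    assert len(bits) <= 5
    return reduce(xor, bits, 0)


def parity9(bits):
    """9-input parity in two blocks: the second takes the first's output plus 4 inputs."""
    first = parity5(list(bits[:5]))
    return parity5([first] + list(bits[5:]))


def blocks_needed(n):
    """Fewest 5-input blocks for n-input parity (each block removes up to 4 signals)."""
    return 0 if n <= 1 else -(-(n - 1) // 4)


def levels_needed(n):
    """Fewest levels of 5-input blocks that can reach n inputs."""
    levels, reach = 0, 1
    while reach < n:
        reach *= 5
        levels += 1
    return levels


if __name__ == "__main__":
    print(blocks_needed(9), levels_needed(9), blocks_needed(25), levels_needed(25))
```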
Xilinx FPGA Adder Example
 Example
2-bit binary adder - inputs: A1, A0, B1, B0, CIN
outputs: S0, S1, Cout
[Figure: 4-bit adder built two ways, with inputs A3-A0, B3-B0, Cin, internal carries C0-C2, and outputs S3-S0, Cout]
Full Adder: 4 CLB delays to final carry out
2 x Two-bit Adders (3 CLBs each): 2 CLB delays to final carry out
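The two-bit-adder decomposition can be modeled as below. Each output of the 2-bit block (S0, S1, and the carry) is a function of at most the five inputs A1, A0, B1, B0, Cin, so each fits in one 5-input function generator. Integer arithmetic stands in for the truth tables here, and the function names are illustrative.

```python
def two_bit_adder(a1, a0, b1, b0, cin):
    """2-bit adder block: returns (S0, S1, carry out)."""
    total = (2 * a1 + a0) + (2 * b1 + b0) + cin
    return total & 1, (total >> 1) & 1, (total >> 2) & 1


def four_bit_adder(a, b, cin):
    """4-bit adder from two 2-bit blocks; the carry crosses only one block boundary."""
    s0, s1, c2 = two_bit_adder((a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1, cin)
    s2, s3, cout = two_bit_adder((a >> 3) & 1, (a >> 2) & 1,
                                 (b >> 3) & 1, (b >> 2) & 1, c2)
    return s0 | (s1 << 1) | (s2 << 2) | (s3 << 3), cout


if __name__ == "__main__":
    print(four_bit_adder(9, 7, 0))
```

The final carry passes through two carry functions instead of four, which matches the "2 CLBs to final carry out" figure for this arrangement.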
Xilinx Virtex-II Family
 88-1000+ pins
 64-10000+ CLBs
 Combinational and sequential logic using lookup tables and flip-flops
 Random-access memory
 Shift registers for use as buffer storage
 Multipliers regularly placed throughout the CLB array to
accelerate digital signal processing applications
 E.g., the XC2V8000: 11,648 CLBs, 1108 IOBs, 90,000+ FFs,
3Mbits RAM (168 x 18Kbit blocks), 168 multipliers
 Equivalent to eight million two-input gates!
Xilinx Virtex-II Family IOB
 Tri-state/bidirectional driver
 Registers for each of three
signals involved: input,
output, tri-state enable.
 Two registers to latch values
with separate clocks.
 For large pinouts, separate
clocks stagger signal
changes to avoid large
current spikes
 FFs used for synchronization
as well as latching
Xilinx Virtex-II Family CLB
 Four basic slices in two groups
 Each has a fast carry-chain
 Local interconnect to wire logic
of each slice and connect to
the CLB array: switch matrix is
large collection of
programmable switches
Xilinx Virtex-II Family CLB Internals
 Just ½ of one slice!
 4-input LUT + FF
 Fast carry logic
 Many programmable
interconnections
for sync vs. async
operation
Xilinx Virtex-II Family Fast Carry Logic
(AB)Ci+AB
A
B
0
LUT
Co
C
Mux
1
B
(AB)Ci
0
LUT
1
1
1 1
AB
1
A
B
A
Mux
1
(AB)
0
Ci
CS 150 – Fall 2005 - Lec #27: FPGA Evolution – 44
(ABCi)
Xilinx Virtex-II Family CLB
 Sequential Portion
 Two positive edge-triggered
flip-flops
 Transparent latches or flip-flops
 Asynchronous or synchronous
sets and resets
 Initialize to different values
at power-up
 Clocks and load enables
complemented or not
Xilinx Virtex-II Family Slice Personality
 4-input function generator
 OR 16 bits of dual-ported
random-access memory (with
separate address inputs for read
- G1 to G4 - and write - WG1 to
WG4)
 OR a 16-bit variable-tap shift
register
 With muxes, CLB can implement
any function of 8 inputs and
some functions of 9 inputs
 Registered and unregistered
versions of function block
outputs
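How multiplexers widen a function generator can be sketched with Shannon expansion: two 4-input lookup tables hold the function for the extra input equal to 0 and to 1, and a 2:1 multiplexer picks between them. This is a generic model with hypothetical helper names, not the exact Virtex-II multiplexer structure.

```python
from itertools import product


def make_lut4(func):
    """Return the 16 configuration bits of a 4-input function, in counting order."""
    return [func(*bits) for bits in product((0, 1), repeat=4)]


def read_lut4(lut, a, b, c, d):
    """Look up the stored bit addressed by the four inputs."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]


def make_func5(func):
    """Build a 5-input function from two LUT4s and a 2:1 mux on the extra input e."""
    lut0 = make_lut4(lambda a, b, c, d: func(0, a, b, c, d))  # cofactor for e = 0
    lut1 = make_lut4(lambda a, b, c, d: func(1, a, b, c, d))  # cofactor for e = 1

    def evaluate(e, a, b, c, d):
        return read_lut4(lut1 if e else lut0, a, b, c, d)

    return evaluate


def majority5(*bits):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(sum(bits) >= 3)


if __name__ == "__main__":
    f = make_func5(majority5)
    print(all(f(*bits) == majority5(*bits) for bits in product((0, 1), repeat=5)))
```

Each further level of multiplexing doubles the number of lookup tables used and adds one input.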
Xilinx Virtex-II Family Interconnections
 Methods of interconnecting
CLBs and IOBs:
(1) direct fast connections within
a CLB
(2) direct-connections between
adjacent CLBs
(3) double-lines to fanout signals
to CLBs one or two away
(4) hex lines to connect to CLBs
three or six away
(5) long lines that span the entire
chip
 Fast access to neighbors
vertically and horizontally with
direct connections
 Double and hex lines provide a
slightly larger range
 Long lines saved for time-critical signals w/ min signal
skew
Programmable Logic Summary
 Discrete Gates
 Packaged Logic
 PLAs
 Ever more general architectures of programmable combinational
+ sequential logic and interconnect
 Altera
 Actel
 Xilinx: 4000 series to Virtex
CLBs implementing logic function generators, RAMs, Shift registers, fast
carry logic
Local, inter-CLB, and long line interconnections