PLD Organization and Designing with FPGAs(Abdul)


Programmable Logic Devices
by
Abdulqadir Alaqeeli
1/27/98
Programmable Logic
— Programming Methods
— Programmable Logic Devices
– SPLDs
– CPLDs
– FPGAs
Designing for FPGAs
– Metastability
– Synchronous Designs
– Designing State Machines
2
Programming Methods
Fusible link
EPROM
EEPROM
SRAM
3
FUSE
• Fuses are the basic storage element in TTL programmable circuits.
• Passing a large current through the fuse layer blows it. The IC stores data by having its fuses selectively blown.
4
EPROM
• In CMOS, the metal fuse is replaced by a FAMOS (floating-gate avalanche-injection MOS) transistor.
• Hot-electron injection places a charge on the floating gate, which provides the switching action.
• UV erasable.
5
EEPROM and SRAM
• EEPROM
– Electrically erasable floating gate.
– No UV light needed.
• SRAM
– Configuration data is loaded into memory cells that control the logic and interconnect (e.g., pass transistors).
– To erase, turn the power off.
6
Programming Technologies
1) Bipolar fusible link
- Closed device, burned open by high current
2) SRAM based
- Uses pass transistors controlled by SRAM
- CMOS based
3) EPROM/EEPROM based
- Floating gate
- CMOS based
7
Programmable Logic Devices
• Simple PLDs:
– PALs
– PLAs
– PROMs
– GALs
• Complex PLDs (CPLDs)
• FPGAs
8
Programmable Array Logic (PALs)
• Programmable AND array.
• Fixed OR array.
• Bipolar, fuse.
• Large number of inputs.
• Each output is relatively independent.
9
Programmable Logic Arrays (PLAs)
• Programmable AND array.
• Programmable OR array.
• Bipolar, fuse.
• Large number of inputs.
• Output functions share some product terms.
10
Programmable ROM (PROM)
• Fixed AND array.
• Programmable OR array.
• Fuse.
• Limited number of inputs.
• Strong independence among the outputs.
11
• PALs: the most popular PLD architecture.
• PLAs: the most flexible of the combinatorial PLDs.
• PROMs: can be used to store any logic function of their address inputs.
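The AND-array/OR-array structure shared by PALs, PLAs, and PROMs can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular device: the `make_pla` helper and the full-adder example are assumptions introduced here. In these terms, a PAL would fix the `or_plane`, and a PROM would fix the `and_plane` to every input combination.

```python
from itertools import product

def make_pla(and_plane, or_plane):
    """Build an evaluator for a two-level AND/OR array.

    and_plane: list of product terms; each term maps an input index to the
               bit value that input must have (missing inputs are don't-care).
    or_plane:  one list per output, naming the product terms that are ORed.
    """
    def evaluate(inputs):
        terms = [all(inputs[i] == bit for i, bit in term.items())
                 for term in and_plane]
        return tuple(int(any(terms[t] for t in out)) for out in or_plane)
    return evaluate

# Hypothetical example: 1-bit full adder with inputs (a, b, cin).
and_plane = [
    {0: 1, 1: 1},          # a & b
    {0: 1, 2: 1},          # a & cin
    {1: 1, 2: 1},          # b & cin
    {0: 1, 1: 0, 2: 0},    # a & ~b & ~cin
    {0: 0, 1: 1, 2: 0},    # ~a & b & ~cin
    {0: 0, 1: 0, 2: 1},    # ~a & ~b & cin
    {0: 1, 1: 1, 2: 1},    # a & b & cin
]
or_plane = [
    [3, 4, 5, 6],          # sum
    [0, 1, 2],             # carry
]
full_adder = make_pla(and_plane, or_plane)

for a, b, cin in product((0, 1), repeat=3):
    print(a, b, cin, "->", full_adder((a, b, cin)))
```

Both planes are plain data here, which mirrors the idea that programming the device means choosing which connections exist.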
12
Generic Array Logic (GALs)
• Configurable PAL-type architecture.
• Electrically erasable CMOS technology.
• Replaces many PAL devices.
13
Complex Programmable Logic Devices (CPLDs)
14
XC7300 Dual Block Architecture
[Figure: XC7300 block diagram. Fast Function Blocks and High-Density Function Blocks, with I/O blocks and input registers, connected through the Universal Interconnect Matrix (UIM).]
• Universal Interconnect Matrix (UIM), SMARTswitch
• PAL-like function blocks
• 3.3 V / 5 V I/O
• Fast: 5 ns pin to pin, fCLK = 167 MHz
• Fast: tSU = 4.0 ns, tCO = 5.5 ns
• High drive: 24 mA
15
XC9500 - Flexible Architecture
[Figure: XC9500 block diagram. A JTAG port feeds the JTAG controller and the in-system programming controller. I/O blocks and Function Blocks 1 to n connect through the FastCONNECT switch matrix. Global signals: 3 global clocks, 1 global set/reset, 2 or 4 global tri-states.]
16
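XC9500 Function Block
[Figure: XC9500 function block. 36 inputs from the FastCONNECT switch matrix feed the AND array and product-term allocator, which drive Macrocells 1 to 18. Macrocell outputs go to the I/O blocks and back to FastCONNECT. Global inputs: 3 global clocks, 2 or 4 global tri-states.]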
17
XC9500 Architectural Features
• Uniform, PAL-like architecture
• Flexible function block
– 36 inputs with 18 outputs
– Expandable to 90 product terms per macrocell
– Product-term and global 3-state enables
– Product-term and global clocks
• 3.3 V / 5 V I/O operation
18
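XC9500 Optimizes Pin-Locking
[Figure: pin-locking example. 36 inputs pass through the FastCONNECT switch matrix into the function block logic, whose D/T flip-flop drives a fixed output pin. Design changes shown: add another FB input, add more logic, add another pin or FB output.]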
19
XC9500 Product Family
0.6µ Phase I Family

Device           9536/9536F  9572/9572F  95108/95108F  95144  95180  95216  95288
Macrocells       36          72          108           144    180    216    288
Usable gates     800         1600        2400          3200   4000   4800   6400
tPD (ns)         5           7.5         7.5           7.5    10     10     10
Registers        36          72          108           144    180    216    288
Max. user I/Os   34          72          108           133    168    168    192

Packages listed: 44PC1, 44VQ, 84PC1, 100TQ, 100PQ1, 100PQ, 160PQ1, 160PQ, 208HQ, 304HQ
20
Field Programmable Gate Arrays (FPGAs)
21
FPGA Architecture
[Figure: FPGA architecture. An array of Configurable Logic Blocks (CLBs) joined by programmable interconnect and switch matrices, surrounded by I/O Blocks (IOBs). IOB detail: pad, input buffer with delay, output buffer with slew-rate control, passive pull-up/pull-down, input and output flip-flops. CLB detail: F, G, and H function generators, two flip-flops with S/R control and clock enable (EC), inputs F1-F4 and G1-G4, control inputs C1-C4 (H1, DIN, S/R, EC), clock K, outputs X and Y.]
22
XC4000 Configurable Logic Blocks
• 2 four-input function generators (look-up tables)
– Each usable as 16x1 RAM or as a logic function
• 2 registers
– Each can be configured as a flip-flop or latch
– Independent clock polarity
– Synchronous and asynchronous set/reset
[Figure: XC4000 CLB. Inputs G1-G4 and F1-F4 feed the G and F function generators; their outputs G' and F', together with H1, feed the H function generator. Control inputs C1-C4 carry H1, DIN, S/R, and EC. Two flip-flops clocked by K, each with S/R control and clock enable, drive outputs XQ and YQ; the combinatorial outputs are X and Y.]
23
Look Up Tables
• Combinatorial logic is stored in 16x1 SRAM look-up tables (LUTs) in a CLB.
• Example: the four inputs A, B, C, D form a 4-bit address, and the addressed bit is the output Z.
• Capacity is limited by the number of inputs, not by complexity: a 4-input LUT can hold any of 2^(2^4) = 64K functions.
• Each function generator can be used as 4-input logic (LUT) or as high-speed synchronous dual-port RAM.
[Figure: a combinatorial logic circuit with inputs A, B, C, D and output Z, shown next to its truth table as stored in the G function generator (inputs G1-G4, write enable WE).]
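A short Python sketch may make the look-up table idea concrete. It is an illustration only: `make_lut`, `lut_read`, and the sample function `example_logic` are hypothetical names, not Xilinx primitives. The table is filled once (as configuration would do) and afterwards only read, and the count of possible 4-input functions follows from the 16 independent table bits.

```python
def make_lut(func, n_inputs=4):
    """Fill a 2**n_inputs entry table by evaluating func at every address."""
    table = []
    for addr in range(1 << n_inputs):
        # Most significant address bit is the first input (A).
        bits = [(addr >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        table.append(int(bool(func(*bits))))
    return table

def lut_read(table, *bits):
    """Use the input bits as an address into the table."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | (b & 1)
    return table[addr]

def example_logic(a, b, c, d):
    # Hypothetical function chosen for illustration.
    return (a and b) or (c ^ d)

table = make_lut(example_logic)
print(table)
print(lut_read(table, 1, 1, 0, 0))

# 16 table bits, each 0 or 1, give 2**16 distinct functions.
num_functions = 2 ** (2 ** 4)
print(num_functions)
```

However complicated `example_logic` is, the table stays 16 bits, which is the sense in which capacity depends on input count rather than complexity.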
24
ROM is Equivalent to Logic
• Using a ROM simply defines logic functions in look-up table format
– Memory might be an easier way to define logic
– Xilinx provides ROM library cells
• FPGA look-up tables are essentially blocks of RAM
– Data is written during configuration
– Data is read after configuration
– They effectively operate as a ROM
• Example: a 2-input AND
– As gates: O = I1*I2
– As ROM (address inputs A0, A1; output DOUT): DATA(0)=0, DATA(1)=0, DATA(2)=0, DATA(3)=1
25
RAM Provides 16X the Storage of Flip-Flops
• 32 bits versus 2 bits of storage per CLB
– Two 16x1 RAMs or one 32x1 single-port RAM fit in one CLB
– One 16x1 dual-port RAM fits in one CLB
[Figure: one CLB used as a 32x1 RAM (data D1, address A0-A4, write enable WE, output O1; 32 bits) beside one CLB used as two flip-flops (D1/Q1, D2/Q2, clock CLK; 2 bits).]
• 32x8 shift register with RAM = 11 CLBs
– Using flip-flops takes 128 CLBs for data alone
– Address decoders not included
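The CLB counts on this slide can be checked with simple arithmetic. The sketch below assumes the per-CLB capacities stated above (32 RAM bits or 2 flip-flops) and treats RAM packing as a rough bits-per-CLB estimate; the function names are hypothetical. The slide gives 11 CLBs as the RAM-based total; attributing the 3 CLBs beyond data storage to addressing and control is an assumption here, not something the slide states.

```python
RAM_BITS_PER_CLB = 32     # one 32x1 (or two 16x1) RAMs per CLB, per the slide
FLIP_FLOPS_PER_CLB = 2    # two registers per CLB, per the slide

def ceil_div(a, b):
    return -(-a // b)

def clbs_for_flip_flop_storage(depth, width):
    """CLBs needed to hold depth x width bits in flip-flops (data only)."""
    return ceil_div(depth * width, FLIP_FLOPS_PER_CLB)

def clbs_for_ram_storage(depth, width):
    """Rough CLB estimate to hold depth x width bits in LUT RAM (data only)."""
    return ceil_div(depth * width, RAM_BITS_PER_CLB)

ff_clbs = clbs_for_flip_flop_storage(32, 8)
ram_clbs = clbs_for_ram_storage(32, 8)
slide_total_with_ram = 11  # figure quoted on the slide

print("flip-flop data CLBs:", ff_clbs)
print("RAM data CLBs:", ram_clbs)
print("CLBs beyond data storage in the slide's total:",
      slide_total_with_ram - ram_clbs)
```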
26
Using Function Generator As RAM
27
RAM Guidelines
• Fewer than 32 words is best
– 32x1 or 16x2 per RAM requires only one CLB
– Delays are short (one level of logic)
– Data and output MUXes are required to expand depth
• Fewer than 256 words recommended per RAM
– Use external memory for 256 words or more
• Width is easily expanded
– Connect the address lines to multiple blocks
• Recommendation: use less than 1/2 of the maximum memory resources
– Maximum memory uses all the logic resources of the CLBs
28
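XC4000E I/O Block Diagram
[Figure: XC4000E I/O block. Output side: signals O and T/OE, a flip-flop clocked by OK (output clock), and the output buffer with slew-rate control driving the pad, which has a passive pull-up/pull-down. Input side: the pad drives the input buffer, which reaches I1 and I2 directly or through a delay element and a flip-flop with clock enable CE, clocked by IK (input clock).]
Elements shown in blue in the original figure are not in the XC3000 family.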
29
Xilinx FPGA Routing
• Fast direct interconnect: CLB to CLB
• General-purpose interconnect: uses switch matrices
• Long lines
– Segmented across chip
– Global clocks, lowest skew
– 2 tri-states per CLB for busses
[Figure: CLBs connected through switch matrices.]
30
Fast Direct Interconnect
• Direct connections from a CLB to an adjacent CLB or IOB
• Fastest interconnect
– Less than 1 ns delay
[Figure: direct connections between adjacent CLBs.]
31
Flexible General-Purpose Interconnect
• Flexible, but slow if a route crosses many channels
• XC3000
– 5 lines per channel
• XC4000
– 8 similar Single-Length lines
– 4 Double-Length lines skip every other switch matrix
– 4 Quad-Length lines skip three switch matrices
[Figure: CLBs around a switch matrix.]
32
Use Long Lines for High Fanout Nets
• Single metal lines that traverse the length and width of the chip
• Lowest skew
• Ideal for high fan-out signals
• Ideal for clocking
• Internal three-state buffers for buses and wide functions
[Figure: long lines running past a row of CLBs.]
33
CPLD or FPGA?
CPLD
• Non-volatile
• Wide fan-in
• Fast counters, state machines
• Combinational logic
FPGA
• SRAM reconfiguration
• Excellent for computer architecture, DSP, registered designs
• PROM required for non-volatile operation
34
Designing For FPGAs
35
Avoiding Metastability
• Metastability is caused by violating timing specifications such as setup time
• The in-between state takes an unknown time to resolve
– Two destinations could be responding to different values
• The error rate decreases by a factor of 40 for every additional 1 ns of delay before destinations respond to the signal
• Be aware but not paranoid!
[Figure: a D flip-flop whose data and clock change simultaneously, producing a metastable output on Q.]
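The factor-of-40 rule compounds quickly, which a small calculation shows. This sketch takes the slide's figure (40x per additional nanosecond) at face value; the function name and the baseline error rate are hypothetical values chosen only for illustration.

```python
def error_rate_reduction(extra_delay_ns, factor_per_ns=40.0):
    """Factor by which the error rate drops after extra_delay_ns of added
    settling time, using the slide's 40x-per-ns rule of thumb."""
    return factor_per_ns ** extra_delay_ns

baseline_errors_per_day = 1.0   # hypothetical starting point

for delay_ns in (0, 1, 2, 3):
    reduction = error_rate_reduction(delay_ns)
    print(f"{delay_ns} ns extra delay: {reduction:g}x fewer errors, "
          f"{baseline_errors_per_day / reduction:g} errors/day")
```

Under this rule, three extra nanoseconds reduce the error rate by 40^3 = 64,000, which is the quantitative side of "be aware but not paranoid".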
36
Use Synchronous Design
• Internal timing of synchronous designs is easy to analyze
• Hold time is not an issue
– Clock skew is guaranteed to be much shorter than the minimum clock-to-Q of any CLB
• Use global clock distribution networks
– If not, check for clock skew problems
[Figure: two D flip-flops in series, annotated with example delays of 3.0 ns, 2.5 ns, and 3.1 ns.]
37
Avoid Gated Clock or Asynchronous Reset
• Move gating to a non-clock pin to prevent a glitch from affecting logic
[Figure: a 3-bit counter (Q0, Q1, Q2) whose Carry decode drives a D flip-flop, and an alternative that uses a Carry-1 decode with a D flip-flop.]
• Or separate input signal changes by at least one CLB delay to minimize the likelihood of a glitch
38
Pipeline for Speed
• Register-rich FPGAs encourage pipelining
• Pipelining improves speed
– Consider wherever latency is not an issue
– Use for terminal counts, carry lookahead, etc.
• Clock period will be approximately
– 2 x (number of combinatorial levels) x (speed grade)
– XC3100A-3: 3 levels x 2 x 3 ns = 18 ns clock period
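The clock-period rule of thumb above is easy to turn into a quick estimator. This is a sketch of the slide's approximation only, not a timing analysis; the function names are hypothetical, and the one-level case is an added illustration of what pipelining would buy under the same rule.

```python
def approx_clock_period_ns(combinatorial_levels, speed_grade_ns):
    """Slide's rule of thumb: 2 x levels x speed grade (in ns)."""
    return 2 * combinatorial_levels * speed_grade_ns

def approx_max_clock_mhz(combinatorial_levels, speed_grade_ns):
    return 1000.0 / approx_clock_period_ns(combinatorial_levels, speed_grade_ns)

# XC3100A-3 example from the slide: 3 levels at a 3 ns speed grade.
print(approx_clock_period_ns(3, 3), "ns")
print(round(approx_max_clock_mhz(3, 3), 1), "MHz")

# Pipelining the same logic down to 1 level between registers.
print(approx_clock_period_ns(1, 3), "ns")
```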
39
Use Dedicated Carry for Large Counters
• Use XC4000/XC5000 carry logic to improve counter speed and density
– Especially for counters of more than 5 bits
[Figure: an adder feeding a register, with delays tADDER, tNET, and tCO.]
40
Use One-Hot Encoding for State Machines
• A shift register is always fast and dense
– "One-hot" uses one flip-flop for each count
– Useful for state machine encoding
[Figure: a chain of five D flip-flops forming a shift register.]
• Use Moore-type state machines.
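A small behavioral model can show what one-hot encoding looks like in practice. The four states, the `go` input, and the `busy` output below are hypothetical and chosen only for illustration. Each state gets its own flip-flop, each next-state equation depends on only a few signals, and the Moore-type output depends on the state alone.

```python
STATES = ["IDLE", "LOAD", "RUN", "DONE"]   # hypothetical state names

def next_state(state_bits, go):
    """One clock edge of a one-hot machine: one flip-flop per state."""
    idle, load, run, done = state_bits
    return [
        int((idle and not go) or done),   # IDLE: wait for go, or return from DONE
        int(idle and go),                 # LOAD
        int(load),                        # RUN
        int(run),                         # DONE
    ]

def busy_output(state_bits):
    """Moore-type output: a function of the current state only."""
    _, load, run, _ = state_bits
    return int(load or run)

state = [1, 0, 0, 0]                      # reset into IDLE
for go in (0, 1, 0, 0, 0):
    state = next_state(state, go)
    name = STATES[state.index(1)]
    print(name, state, "busy =", busy_output(state))
```

Exactly one bit is set after every edge, so decoding the current state needs no extra logic.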
41
Use LFSRs for Fixed Count
• Consider a Linear Feedback Shift Register (LFSR) for speed when the terminal count is all that is needed
– Or when any regular sequence is acceptable (e.g., FIFO)
• Maximal-length sequence of 2^n - 1
• Use XNOR feedback to make the lockup state all 1s
[Figure: a 10-bit shift register with input D1 and outputs Q1, Q7, and Q10 labeled.]
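The LFSR behavior described above can be checked with a short simulation. The sketch assumes the feedback is the XNOR of Q7 and Q10 driving D1, which is consistent with the labels in the figure but is an assumption about the wiring; the function names are hypothetical.

```python
def lfsr_step(state, n=10, taps=(7, 10)):
    """Advance an n-bit XNOR-feedback LFSR by one clock.

    Bit i-1 of `state` holds Q_i. The new Q1 is the XNOR of the tapped
    outputs; every other stage takes the value of the stage before it.
    """
    feedback = 1
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1   # 1 ^ a ^ b == XNOR(a, b)
    return ((state << 1) | feedback) & ((1 << n) - 1)

def sequence_length(start=0, n=10, taps=(7, 10)):
    """Count clocks until the register returns to its starting state."""
    state = lfsr_step(start, n, taps)
    count = 1
    while state != start:
        state = lfsr_step(state, n, taps)
        count += 1
    return count

print(sequence_length())            # length of the cycle starting at all 0s
print(hex(lfsr_step(0x3FF)))        # all 1s maps to itself: the lockup state
```

With XNOR feedback the all-0s reset state is part of the counting sequence, and the only state to avoid is all 1s.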
42
Use Global Clock Buffers
• Use clock buffers for the highest-fanout clocks
– They drive low-skew, high-speed long line resources
– Use the BUFG primitive to be family-independent
• Limit the number of clocks to ease placement issues
– XC3000: 2 (GCLK, ACLK)
– XC4000/XC5000: 4 (BUFGP / BUFG)
• Additional clocks might be routable on long lines
– Otherwise they are routed on general interconnect, which is slower and has higher skew
43
Using a Clock Generated Off-Chip
• Connect the IPAD directly to the clock buffer primitive
– Required for BUFGP
• Provides higher speed and uses fewer routing resources
[Figure: an IPAD driving a BUFG, which clocks a D flip-flop.]
44
Generating Clock On-Chip
• XC4000
– Internal clock available after configuration
– Use the OSC4 primitive
[Figure: the OSC4 primitive with outputs F8M, F500K, F16K, F490, and F15; one output drives a BUFGS.]
45
Use Clock Enables Instead of Gating Clock
• Use clock enable when using most or all of the logic inputs
– Gating the clock signal directly is not recommended
• Use muxed data when using only 1-2 logic inputs
– Easier to route
• Some macros use logic for clock enable while others use the CE pin
– Make sure CE, if unused, is always connected to VCC
[Figure: an FDxE flip-flop with D, CE, and Q pins, and a D flip-flop with a muxed data input.]
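The two alternatives to a gated clock can be modeled behaviorally. In this sketch (function names are hypothetical) the flip-flop sees every clock edge; the enable decides only whether new data is captured or the old value is held. The muxed-data form produces the same behavior by feeding Q back through a multiplexer in front of D.

```python
def clock_edge_with_ce(q, d, ce):
    """Flip-flop with a clock enable pin: capture D only when CE is 1."""
    return d if ce else q

def clock_edge_with_muxed_data(q, d, sel):
    """Plain D flip-flop whose D input is a mux of new data and its own Q."""
    mux_out = d if sel else q
    return mux_out

def simulate(step, d_values, enable_values, q0=0):
    """Apply one clock edge per (d, enable) pair and record Q after each."""
    q, trace = q0, []
    for d, en in zip(d_values, enable_values):
        q = step(q, d, en)
        trace.append(q)
    return trace

d_values      = [1, 0, 0, 1]
enable_values = [1, 0, 1, 0]
print(simulate(clock_edge_with_ce, d_values, enable_values))
print(simulate(clock_edge_with_muxed_data, d_values, enable_values))
```

Because the clock itself is never gated, a glitch on the enable logic can matter only if it is present at a clock edge.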
46