PLD Organization and Designing with FPGAs(Abdul)

Programmable Logic Devices
by
Abdulqadir Alaqeeli
1/27/98
Programmable Logic
— Programming Methods
— Programmable Logic Devices
– SPLDs
– CPLDs
– FPGAs
Designing for FPGAs
– Metastability
– Synchronous Designs
– Designing State Machines
2
Programming Methods
Fusible
EPROM
EEPROM
SRAM
3
FUSE
 Fuses are the basic storage element in TTL programmable circuits.
 Passing a large current through the fuse layer blows it. This allows the IC to store data by having the fuses selectively blown.
4
EPROM
 In CMOS, the metal fuse is replaced by a FAMOS transistor.
 By hot-electron injection, a charge is placed onto the floating gate, which provides the switch action.
 UV erasable.
5
EEPROM and SRAM
 EEPROM
   - Electrically erasable floating gate.
   - No UV.
 SRAM
   - Configuration is loaded into memory cells that control the logic and the interconnect (i.e., the pass transistors).
   - To erase, turn the power off.
6
Programming Technologies
1) Bipolar fusible link
- Closed device, burned open by high current
2) SRAM based
- Uses pass transistors controlled by SRAM
- CMOS based
3) EPROM/EEPROM based
- Floating gate
- CMOS based
7
Programmable Logic
Devices
 Simple PLDs:
– PALs
– PLAs
– PROMs
– GALs
 Complex PLDs
 FPGAs
8
Programmable Array Logic
PALs
 Programmable AND array.
 Fixed OR array.
 Bipolar, Fuse.
 Large number of Inputs.
 Each Output relatively independent.
9
Programmable Logic Arrays
PLAs
 Programmable AND array.
 Programmable OR array.
 Bipolar, Fuse.
 Large number of Inputs.
 Output functions share some product terms.
10
Programmable ROM
PROM
 Fixed AND array.
 programmable OR array.
 Fuse.
 Limited number of Inputs.
 Strong independence among the Outputs.
11
 PALs: most popular PLD architecture.
 PLAs: most flexible of the combinatorial PLDs.
 PROMs: can be used to store any logic function.
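The PAL, PLA and PROM slides differ only in which of the two planes (AND, OR) is programmable. A minimal Python sketch of an AND plane feeding an OR plane may make that concrete. The helper names `product_term` and `evaluate`, and the 3-input example, are illustrative assumptions and do not come from the slides.

```python
# Sketch of an AND plane feeding an OR plane (all names are illustrative).
# A product term maps input index -> required value (1 = true, 0 = complemented);
# inputs that are not listed do not take part in the term.

def product_term(term, inputs):
    """AND of the selected literals."""
    return all(inputs[i] == v for i, v in term.items())

def evaluate(and_plane, or_plane, inputs):
    """Each output ORs the product terms whose indices it lists."""
    terms = [product_term(t, inputs) for t in and_plane]
    return [int(any(terms[i] for i in sel)) for sel in or_plane]

# Hypothetical 3-input example (A, B, C): P0 = A*B, P1 = /A*C, P2 = B*C
and_plane = [{0: 1, 1: 1}, {0: 0, 2: 1}, {1: 1, 2: 1}]
# PLA style: both outputs share product term P0.
or_plane = [[0, 1], [0, 2]]

print(evaluate(and_plane, or_plane, [1, 1, 0]))  # [1, 1]
```

In a PAL, only `and_plane` would be programmable and `or_plane` would be fixed. In a PROM, the AND plane would be a fixed decoder of every input combination and only `or_plane` would be programmed.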
12
Generic Array Logic
GALs
 Configurable PAL-type.
 CMOS.
 Electrically Erasable CMOS technology
 Replaces many PAL devices.
13
Complex Programmable Logic
Devices
( CPLDs )
14
XC7300 Dual Block Architecture
[Figure: XC7300 block diagram. A Universal Interconnect Matrix (UIM, SMARTswitch) connects PAL-like High Density Function Blocks, Fast Function Blocks (FO), input registers and I/O.]
 Figure callouts:
   - 3.3/5 Volt I/O
   - FAST: 5 ns pin to pin, fCLK = 167 MHz
   - High Drive: 24 mA
   - FAST: tSU = 4.0 ns, tCO = 5.5 ns
15
XC9500 - Flexible Architecture
[Figure: XC9500 block diagram. The JTAG port feeds the JTAG Controller and the In-System Programming Controller. I/O blocks and Function Blocks 1 to n connect through the FastCONNECT Switch Matrix. Signal counts labeled in the figure: JTAG port 3, global clocks 3, global set/reset 1, global tri-states 2 or 4.]
16
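XC9500 Function Block
[Figure: XC9500 function block. 36 inputs from FastCONNECT drive the AND array; the Product Term Allocator feeds Macrocells 1 to 18, whose outputs go to the I/O blocks and back to FastCONNECT. Global inputs: 3 global clocks and 2 or 4 global tri-state signals.]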
17
XC9500 Architectural Features
 Uniform, PAL-like architecture
 Flexible function block
—36 inputs with 18 outputs
—Expandable to 90 product terms per macrocell
—Product term and global 3-state enables
—Product term and global clocks
 3.3V/5V I/O operation
18
XC9500 Optimizes Pin-Locking
[Figure: a fixed output pin driven by function block logic and a D/T flip-flop, with 36 inputs from the FastCONNECT Switch Matrix. Labeled design changes: add more logic, add another FB input, add another pin or FB output.]
19
XC9500 Product Family
0.6µ Phase I Family

Device          Macrocells  Usable gates  tPD (ns)  Registers  Max. user I/Os  Packages
9536, 9536F     36          800           5         36         34              44PC1, 44VQ
9572, 9572F     72          1600          7.5       72         72              84PC1, 100TQ, 100PQ1
95108, 95108F   108         2400          7.5       108        108             84PC1, 100TQ, 100PQ1, 160PQ1
95144           144         3200          7.5       144        133             100PQ, 160PQ
95180           180         4000          10        180        168             160PQ, 208HQ
95216           216         4800          10        216        168             160PQ, 208HQ
95288           288         6400          10        288        192             208HQ, 304HQ
20
Field Programmable Gate
Arrays
( FPGAs )
21
FPGA Architecture
[Figure: FPGA architecture. An array of Configurable Logic Blocks (CLBs) is linked by switch matrices and programmable interconnect and surrounded by I/O Blocks (IOBs). IOB detail: pad, input buffer with delay, output buffer, D/Q storage elements, slew rate control, passive pull-up/pull-down. CLB detail: F, G and H function generators (inputs F1-F4, G1-G4, H1), control inputs C1-C4 (H1, DIN, S/R, EC), clock K, two D flip-flops with S/R control (SD, RD, EC), outputs X and Y.]
22
XC4000 Configurable Logic Blocks
 2 four-input function generators (Look Up Tables)
   - 16x1 RAM or logic function
 2 registers
   - Each can be configured as flip-flop or latch
   - Independent clock polarity
   - Synchronous and asynchronous Set/Reset
[Figure: XC4000 CLB. Function generators F (inputs F1-F4), G (inputs G1-G4) and H; control inputs C1-C4 carrying H1, DIN, S/R and EC; clock K; two D flip-flops with S/R control (SD, RD, EC); outputs X, Y, XQ and YQ.]
23
Look Up Tables
 Combinatorial logic is stored in 16x1 SRAM Look Up Tables (LUTs) in a CLB
 Example: the inputs A, B, C, D of the combinatorial logic form the 4-bit address of the look-up table, and Z is the stored output

   A B C D | Z
   0 0 0 0 | 0
   0 0 0 1 | 0
   0 0 1 0 | 0
   0 0 1 1 | 1
   0 1 0 0 | 1
   0 1 0 1 | 1
   . . .
   1 1 0 0 | 0
   1 1 0 1 | 0
   1 1 1 0 | 0
   1 1 1 1 | 1

 Capacity is limited by the number of inputs, not by complexity: 2^(2^4) = 64K possible functions
 Choose to use each function generator as 4-input logic (LUT) or as high-speed synchronous dual-port RAM
[Figure: G function generator with inputs G1-G4 and write enable WE]
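The look-up table idea on this slide can be sketched in a few lines of Python. The helpers `build_lut` and `read_lut` and the function `example` are hypothetical names chosen for illustration; the slide's own truth table is only partly shown, so it is not reproduced here.

```python
# Sketch of a 16x1 look-up table: the 4 inputs form an address into 16 stored bits.
from itertools import product

def build_lut(func):
    """Tabulate a 4-input Boolean function into 16 bits (address = ABCD, A is the MSB)."""
    return [int(bool(func(a, b, c, d))) for a, b, c, d in product((0, 1), repeat=4)]

def read_lut(lut, a, b, c, d):
    """Evaluating the logic is just a memory read at address ABCD."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]

# Hypothetical function, chosen only for illustration.
def example(a, b, c, d):
    return (a and b) or (c ^ d)

lut = build_lut(example)

# Any 4-input function fits, however complex: there are 2^(2^4) possible tables.
NUM_FUNCTIONS = 2 ** (2 ** 4)
print(NUM_FUNCTIONS)  # 65536, i.e. 64K
```

The table has 16 entries no matter how complicated `example` is, which is the sense in which capacity depends on the number of inputs rather than on complexity.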
24
ROM is Equivalent to Logic
 Using a ROM simply defines logic functions in a look-up table format
   - Memory might be an easier way to define logic
   - Xilinx provides ROM library cells
 FPGA lookup tables are essentially blocks of RAM
   - Data is written during configuration
   - Data is read after configuration
      - Effectively operate as a ROM
 As gates: O = I1*I2
 As ROM: DATA(0)=0, DATA(1)=0, DATA(2)=0, DATA(3)=1, with address inputs A0, A1 and output DOUT
[Figure: the same function mapped onto LUT inputs F1, F2 and output X, once as gates and once as ROM]
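The equivalence between the gate form and the ROM form can be checked directly. This short Python sketch uses the ROM contents given on the slide; the function names `as_gates` and `as_rom` are illustrative.

```python
# The 2-input AND from the slide, written as a gate and as a 4x1 ROM.

def as_gates(i1, i2):
    return i1 & i2                  # O = I1*I2

DATA = [0, 0, 0, 1]                 # DATA(0)..DATA(3) from the slide

def as_rom(a1, a0):
    return DATA[(a1 << 1) | a0]     # the inputs form the ROM address

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, as_gates(i1, i2), as_rom(i1, i2))
```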
25
RAM Provides 16X the Storage of Flip-Flops
 32 bits versus 2 bits of storage
   - Two 16x1 RAMs or one 32x1 single-port RAM fit in one CLB
   - One 16x1 dual-port RAM fits in one CLB
[Figure: a CLB used as RAM (data D1, address A0-A4, write enable WE, output O1) holds 32 bits; a CLB used as two flip-flops (D1/Q1, D2/Q2, shared CLK) holds 2 bits]
 32x8 shift register with RAM = 11 CLBs
   - Using flip-flops, takes 128 CLBs for data alone
   - Address decoders not included
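The CLB counts on this slide follow from simple division. The sketch below reproduces the 128-CLB figure for flip-flops and shows that the RAM version needs 8 CLBs for storage alone. The slide quotes 11 CLBs in total and does not break down the other 3; presumably they hold addressing and control logic, but that is an assumption. The function name `clbs_needed` is illustrative.

```python
FF_BITS_PER_CLB = 2      # two flip-flops per CLB (from the slide)
RAM_BITS_PER_CLB = 32    # one 32x1 RAM per CLB (from the slide)

def clbs_needed(words, width, bits_per_clb):
    """CLBs required to store words x width bits, rounded up."""
    total_bits = words * width
    return -(-total_bits // bits_per_clb)   # ceiling division

print(clbs_needed(32, 8, FF_BITS_PER_CLB))    # 128 CLBs using flip-flops
print(clbs_needed(32, 8, RAM_BITS_PER_CLB))   # 8 CLBs of RAM for the data itself
```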
26
Using Function Generator As
RAM
27
RAM Guidelines
 Less than 32 words is best
—32x1 or 16x2 per RAM requires only one CLB
– Delays are short, (one level of logic)
—Data and output MUXes are required to expand
depth
 Less than 256 words recommended per RAM
—Use external memory for 256 words or more
 Width easily expanded
—Connect the address lines to multiple blocks
 Recommendation: Use less than 1/2 of max memory
resources
—Maximum memory uses all logic resources of
CLBs
28
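XC4000E I/O Block Diagram
[Figure: XC4000E I/O block. Output path: O and T/OE inputs, D/Q output register clocked by OK (Output Clock), output buffer with slew rate control, pad with passive pull-up/pull-down. Input path: input buffer, delay, D/Q input register with CE clocked by IK (Input Clock), outputs I1 and I2.]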
Elements in BLUE are not in the XC3000 family.
29
Xilinx FPGA Routing
 Fast Direct Interconnect - CLB to CLB
 General Purpose Interconnect - Uses switch matrix
 Long Lines
   - Segmented across chip
   - Global clocks, lowest skew
   - 2 Tri-states per CLB for busses
[Figure: CLBs connected through switch matrices]
30
Fast Direct Interconnect
 Direct connections from CLB to adjacent CLB or IOB
 Fastest interconnect
   - Less than 1 ns delay
[Figure: four adjacent CLBs with direct connections]
31
Flexible General-Purpose Interconnect
 Flexible, but slow if a route crosses many channels
 XC3000
   - 5 lines per channel
 XC4000
   - 8 similar Single-Length lines
   - 4 Double-Length lines skip every other switch matrix
   - 4 Quadruple-Length lines skip three switch matrices
[Figure: CLBs around a switch matrix]
32
Use Long Lines for High Fanout Nets
 Single metal lines that traverse the length and width of the chip
 Lowest skew
 Ideal for high fan-out signals
 Ideal for clocking
 Internal three-state buffers for buses and wide functions
[Figure: CLBs along long lines]
33
CPLD or FPGA?
CPLD
 Non-volatile
 Wide fan-in
 Fast counters, state machines
 Combinational logic
FPGA
 SRAM reconfiguration
 Excellent for computer architecture, DSP, registered designs
 PROM required for non-volatile operation
34
Designing For
FPGAs
35
Avoiding Metastability
 Metastability is caused by violating timing specifications such as setup time
 The in-between state takes an unknown time to resolve
   - Two destinations could be responding to different values
 Error rate decreases by a factor of 40 for every additional 1 ns of delay before destinations respond to the signal
 Be aware but not paranoid!
[Figure: D flip-flop with a metastable output when data and clock change simultaneously]
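The factor-of-40 rule compounds quickly. The Python sketch below applies it as stated on the slide; the function name `error_rate_reduction` is illustrative, and the rule is treated as exact only for the sake of the arithmetic.

```python
FACTOR_PER_NS = 40.0   # from the slide: error rate drops 40x per extra nanosecond

def error_rate_reduction(extra_delay_ns):
    """Factor by which the error rate falls after waiting extra_delay_ns longer."""
    return FACTOR_PER_NS ** extra_delay_ns

print(error_rate_reduction(1))   # 40.0
print(error_rate_reduction(2))   # 1600.0
print(error_rate_reduction(3))   # 64000.0
```

A few nanoseconds of extra settling time therefore cut the error rate by several orders of magnitude, which fits the slide's advice to be aware but not paranoid.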
36
Use Synchronous Design
 Easy to analyze internal timing of synchronous
designs
 Hold time is not an issue
—Clock skew is guaranteed to be much shorter
than the minimum clock-to-Q of any CLB
 Use global clock distribution networks
—If not, check for clock skew problems
[Figure: two D flip-flops on a common clock, with delays labeled 3.0 ns, 2.5 ns and 3.1 ns]
37
Avoid Gated Clocks or Asynchronous Resets
 Move gating to a non-clock pin to prevent a glitch from affecting logic
[Figure: two versions of a 3-bit counter (Q0, Q1, Q2) driving a D flip-flop, one using the Carry signal and one using Carry-1]
 Or separate input signal changes by at least a CLB delay to minimize the likelihood of a glitch
38
Pipeline for Speed
 Register-rich FPGAs encourage pipelining
 Pipelining improves speed
—Consider wherever latency is not an issue
—Use for terminal counts, carry lookahead, etc.
 Clock period will be approximately
—2 x (number of combinatorial levels) x (speed
grade)
—XC3100A-3: 3 levels x 2 x 3ns = 18 ns clock
period
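The rule of thumb above is easy to evaluate for different pipeline depths. In this Python sketch the function name `approx_clock_period_ns` is illustrative; the first call reproduces the slide's XC3100A-3 example, and the second assumes pipelining has cut the path to a single level.

```python
def approx_clock_period_ns(levels, speed_grade_ns):
    """Slide's rule of thumb: 2 x (combinatorial levels) x (speed grade)."""
    return 2 * levels * speed_grade_ns

print(approx_clock_period_ns(3, 3))   # 18 ns, the XC3100A-3 example
print(approx_clock_period_ns(1, 3))   # 6 ns with one level between registers
```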
39
Use Dedicated Carry for
Large Counters
 Use XC4000/XC5000 carry logic to improve counter speed
and density
—Especially for counters of >5 bits
[Figure: adder followed by a register, with path delays labeled tCO, tNET and tADDER]
40
Use One-Hot Encoding
for State Machines
 Shift register is always fast and dense
—“One-hot” uses one flip-flop for each count
—Useful for state machine encoding
[Figure: chain of five D flip-flops forming a shift register]
 Use Moore-type state machines.
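A one-hot counter is just a ring shift register with a single 1 circulating. The Python sketch below models five flip-flops, matching the five stages in the figure; the function name `one_hot_step` and the reset value are illustrative assumptions.

```python
def one_hot_step(state):
    """Shift the single 1 to the next flip-flop, wrapping from the last to the first."""
    return state[-1:] + state[:-1]

state = [1, 0, 0, 0, 0]   # assumed reset value: the 1 sits in the first flip-flop
for _ in range(5):
    print(state, "count =", state.index(1))   # the position of the 1 is the count
    state = one_hot_step(state)
```

No decoder is needed to detect a given count, since each count has its own flip-flop output.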
41
Use LFSRs for Fixed Count
 Consider a Linear Feedback Shift Register for speed when a terminal count is all that is needed
   - Or when any regular sequence is acceptable (e.g., FIFO)
 Maximal length sequence of 2^n - 1
 Use XNOR feedback to make the lockup state all 1s
[Figure: 10-bit shift register with input D1 and outputs Q1, Q7 and Q10]
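The 10-bit case can be simulated in Python. The figure labels Q7 and Q10, so this sketch assumes they are the feedback taps and that D1 = XNOR(Q7, Q10); that wiring is an assumption, since the slide text does not spell it out. The function names are illustrative.

```python
N = 10
MASK = (1 << N) - 1

def lfsr_step(state):
    """Shift Q1 -> Q2 -> ... -> Q10 and load D1 = XNOR(Q7, Q10).
    Bit i-1 of `state` holds Qi."""
    q7 = (state >> 6) & 1
    q10 = (state >> 9) & 1
    d1 = 1 ^ (q7 ^ q10)              # XNOR of the two taps
    return ((state << 1) | d1) & MASK

def period(start):
    """Number of clocks until the register returns to `start`."""
    state, count = lfsr_step(start), 1
    while state != start:
        state = lfsr_step(state)
        count += 1
    return count

print(period(0))          # 1023 = 2**10 - 1, the maximal length
print(lfsr_step(MASK))    # 1023: the all-1s state maps to itself (lockup)
```

With XNOR feedback the all-0s state after reset is part of the normal sequence, and all 1s is the one state the counter never enters on its own.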
42
Use Global Clock Buffers
 Use clock buffers for highest fanout clocks
—Drive low-skew, high-speed long line resources
—Use BUFG primitive to be family-independent
 Limit number of clocks to ease placement issues
—XC3000: 2 (GCLK, ACLK)
—XC4000/XC5000: 4 (BUFGP / BUFG)
 Additional clocks might be routable on long lines
—Otherwise routed on general interconnect
–Slower and higher skew
43
Using a Clock Generated Off-Chip
 Connect IPAD directly to clock buffer
primitive
—Required for BUFGP
 Provides higher speed and uses fewer routing
resources
[Figure: IPAD driving a BUFG clock buffer]
44
Generating Clock On-Chip
 XC4000
—Internal clock available
after configuration
—Use OSC4 primitive
[Figure: OSC4 primitive with outputs F8M, F500k, F16k, F490 and F15, shown with a BUFGS buffer]
45
Use Clock Enables Instead of Gating Clock
 Use clock enable when using most of or all logic inputs
   - Not recommended to gate clock signal directly
[Figure: FDxE flip-flop with D, CE and Q pins]
 Use muxed data when using only 1-2 logic inputs
   - Easier to route
[Figure: flip-flop with D and Q pins and a CE signal]
 Some macros use logic for clock enable while others use the CE pin
   - Make sure CE, if unused, is always connected to VCC
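The two options on this slide behave identically at the register output. The Python sketch below is a behavioral model only, not the Xilinx FDxE primitive itself; the class and function names are illustrative.

```python
class FlipFlopCE:
    """D flip-flop with a clock-enable pin: Q changes only on edges where CE = 1."""
    def __init__(self):
        self.q = 0

    def clock_edge(self, d, ce=1):    # CE defaults to 1, like tying an unused CE to VCC
        if ce:
            self.q = d
        return self.q

def muxed_data_edge(q, d, enable):
    """Plain flip-flop with muxed data: Q is fed back to D when not enabled."""
    return d if enable else q

ff, q = FlipFlopCE(), 0
for d, ce in [(1, 0), (1, 1), (0, 0), (0, 1)]:
    ff.clock_edge(d, ce)
    q = muxed_data_edge(q, d, ce)
    print(d, ce, ff.q, q)             # both columns match on every edge
```

In both versions the clock itself is never gated, so a glitch on the enable logic cannot create an extra clock edge.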
46
