overarch - "PLDWorld.com"


Xilinx FPGA Architecture
Overview
Virtex/Spartan-II
Top-level Architecture
 Gate-array-like architecture
 Configurable logic blocks
— Implement logic here!
 I/O blocks
— 16 signal standards
 Block RAM
— On-chip memory for higher performance
 Clocks & Delay-Locked Loop
 Interconnect resources
— Three-state internal buses
®
www.xilinx.com
Logic Cell Capacity
 A better first-order alternative to gate counting
 Better comparisons among different FPGAs
 Logic cell definition:
— 4-input look-up table + dedicated flip-flop
 Logic cells per CLB:
— XC4000/Spartan: 2.375 (two 4-LUTs, one 3-LUT, two FFs)
— Virtex/Spartan-II: 4.5 (four 4-LUTs, one F5MUX, four FFs)
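The per-CLB figures above turn a CLB count into a first-order logic cell capacity. The following is a minimal sketch of that arithmetic; the function name and the CLB count of 1,176 are illustrative assumptions, not values taken from the slides.

```python
# Logic cells per CLB, as quoted on the slide.
LOGIC_CELLS_PER_CLB = {
    "XC4000/Spartan": 2.375,
    "Virtex/Spartan-II": 4.5,
}

def logic_cell_capacity(family: str, clb_count: int) -> float:
    """Return the first-order logic cell count for a device (hypothetical helper)."""
    return LOGIC_CELLS_PER_CLB[family] * clb_count

# Hypothetical device with 1,176 CLBs (illustrative figure only).
print(logic_cell_capacity("Virtex/Spartan-II", 1176))  # 5292.0
print(logic_cell_capacity("XC4000/Spartan", 1176))     # 2793.0
```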
Configurable Logic Block (CLB)
 Combinational logic generated in a lookup table (LUT)
— Any function of available inputs
 LUT output feeds CLB output or D input of flip-flop
[Figure: Inputs feed the combinational logic function (LUT), whose output drives the flip-flop and the CLB Outputs]
Virtex/Spartan-II Function Generators
 Four 4-input function generators
— Independent inputs (4 functions of 4 inputs)
 MUXF5 combines 2 LUTs to form
— 4x1 multiplexer
— Or any 5-input function
 MUXF6 combines 2 slices to form
— 8x1 multiplexer
— Or any 6-input function
[Figure: CLB with two slices; in each slice two LUTs feed a MUXF5, and a MUXF6 combines the two slices]
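The claim that a MUXF5 plus two 4-input LUTs can form any 5-input function follows from splitting the function on its fifth input: one LUT holds the function with that input at 0, the other with it at 1, and the multiplexer selects between them. The sketch below checks this for one example function. All names (`make_lut4`, `muxf5`, `parity5`) are hypothetical and model only the logic, not the actual device primitives.

```python
from itertools import product

def make_lut4(func):
    """Build a 16-entry truth table for a 4-input function."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]

def lut4(table, a, b, c, d):
    """Look up one bit; (a, b, c, d) form the address, with a as the MSB."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]

def muxf5(lut_lo, lut_hi, sel, a, b, c, d):
    """Both LUTs share inputs a..d; sel picks which LUT drives the output."""
    return lut4(lut_hi, a, b, c, d) if sel else lut4(lut_lo, a, b, c, d)

# Example target: 5-input parity, split on the fifth input e.
def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e

lut_e0 = make_lut4(lambda a, b, c, d: parity5(a, b, c, d, 0))
lut_e1 = make_lut4(lambda a, b, c, d: parity5(a, b, c, d, 1))

ok = all(
    muxf5(lut_e0, lut_e1, e, a, b, c, d) == parity5(a, b, c, d, e)
    for a, b, c, d, e in product((0, 1), repeat=5)
)
print(ok)  # True
```

The same split, applied once more with a second select input, is the idea behind MUXF6 and 6-input functions.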
Lookup Table
 Generates any function of its inputs
— Typically 4 inputs
 Logically equivalent to a 16 x 1 ROM
Inputs  Output
0000    0
0001    1
0010    1
0011    0
[Figure: LUT truth table excerpt]
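The equivalence between a 4-input LUT and a 16 x 1 ROM can be sketched as a list indexed by the input bits. The function stored here (4-input parity) is an assumption, chosen only because it agrees with the four truth-table rows shown on the slide; `build_rom` and `lut_read` are hypothetical helper names.

```python
def build_rom(func, n_inputs=4):
    """Evaluate func for every input combination; entry i is the output at address i."""
    return [
        func(*[(i >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)])
        for i in range(2 ** n_inputs)
    ]

def lut_read(rom, *inputs):
    """Form the ROM address from the input bits (first input is the MSB) and read it."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return rom[addr]

rom = build_rom(lambda a, b, c, d: a ^ b ^ c ^ d)
print(len(rom))                   # 16
print(rom[:4])                    # [0, 1, 1, 0]
print(lut_read(rom, 0, 0, 1, 0))  # 1
```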
Targeting LUT-based Logic
 LUT limit is on inputs, not complexity
— Reducing inputs per function (fan-in) to fit CLBs improves density and speed
— Automatically done by Xilinx synthesis and implementation tools
 Inverters are free
[Figure: CLB lookup table]
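One way to see why inverters are free in a LUT: inverting an input only rearranges the stored truth table, so the inverted and non-inverted versions of a function occupy the same 16 bits. The sketch below illustrates this under that interpretation; `build_table` is a hypothetical helper.

```python
def build_table(func):
    """16-entry truth table; address bits are (a, b, c, d) with a as the MSB."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(16)]

# A 4-input AND, and the same AND with input d inverted.
f = build_table(lambda a, b, c, d: a & b & c & d)
g = build_table(lambda a, b, c, d: a & b & c & (1 - d))

# Both fit in one 16-bit table, and g is f with address bit 0 flipped.
print(len(f) == len(g) == 16)                    # True
print(all(g[i] == f[i ^ 1] for i in range(16)))  # True
```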
Duplicating Logic Can Improve Results
 Collapsing of logic into CLBs affects the number of levels required and therefore speed
 The gates you use will determine mapping
— Nets with a fanout >1 may be outside a CLB
[Figure: two versions of a circuit from input I1 to output O1. Original: N1 must go to two places, so O1 may require a second level of logic. With duplication: duplicating the first gate (N1A, N1B) allows N1A to always be collapsed inside a single lookup table.]
Defining Lookup Tables With Gate Primitives
 Example of gate primitive: AND2
 Up to five inputs with all combinations of inversion
— AND2B1 indicates 1 “bubbled” or inverted input
 Up to nine inputs non-inverted
— Add external INV primitives if desired
Flip-Flops
 Stores data (D) on rising edge of clock (K)
— Clock enable (CE)
— Asynchronous clear (C)
K       CE  C  D  Q
X       x   1  x  0
rising  1   0  d  d
0       x   0  x  q
[Figure: flip-flop symbol with inputs D, CE, K, C and output Q]
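The flip-flop behavior described above (load D on a rising edge of K when CE is high, with an asynchronous clear C) can be modeled in a few lines. This is a behavioral sketch only; the class and method names are hypothetical, and edge detection is approximated by comparing the current clock level with the previous one.

```python
class DFlipFlop:
    """D flip-flop with clock enable (CE) and asynchronous clear (C)."""

    def __init__(self):
        self.q = 0
        self._prev_k = 0

    def update(self, k, ce, c, d):
        """Apply one set of input levels and return Q."""
        if c:
            # Asynchronous clear overrides the clock.
            self.q = 0
        elif k and not self._prev_k and ce:
            # Rising edge on K with CE high: store D.
            self.q = d
        self._prev_k = k
        return self.q

ff = DFlipFlop()
print(ff.update(k=1, ce=1, c=0, d=1))  # 1  (rising edge, CE high: load D)
print(ff.update(k=0, ce=1, c=0, d=0))  # 1  (no rising edge: hold)
print(ff.update(k=1, ce=0, c=0, d=0))  # 1  (rising edge, CE low: hold)
print(ff.update(k=1, ce=0, c=1, d=1))  # 0  (clear)
```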
Additional Flip-Flop Controls
 Reset (Clear) and/or Set
 Global initialization (GSR)
— Use to initialize all flip-flops
 Programmable clock polarity
 Clock enable can be left unconnected
Virtex/Spartan-II CLB Slice
 1 CLB holds 2 slices
 Each slice has two sets of
— Four-input LUT
– Any 4-input logic function
– Or 16-bit x 1 RAM
– Or 16-bit shift register
— Carry & Control
– Fast arithmetic logic
– Multiplier logic
– Multiplexer logic
— Storage element
– Latch or flip-flop
– Set and reset
– True or inverted inputs
– Sync. or async. control
Dedicated Multiplier Logic
 Highly efficient ‘Shift & Add’ implementation
— For a 16x16 multiplier
– 30% reduction in area
– 1 less logic level
[Figure: slice multiplier logic showing the LUT, MULT_AND, CY_MUX (S, DI, CI, CO) and CY_XOR, with inputs A and B and partial product AxB]
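For readers unfamiliar with the "shift & add" scheme, the sketch below shows the arithmetic only: each set bit of one operand contributes a shifted copy of the other operand to the sum. It does not model how the slice maps this onto MULT_AND and the carry chain, and the function name is hypothetical.

```python
def shift_add_multiply(a: int, b: int, width: int = 16) -> int:
    """Multiply two unsigned width-bit integers by shift and add."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            # Bit i of b is set: add a shifted left by i (one partial product).
            product += a << i
    return product

print(shift_add_multiply(13, 11))                             # 143
print(shift_add_multiply(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF)  # True
```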
On-chip RAM
 All Xilinx FPGAs use RAM-based programming
 Adding a write enable to a LUT creates on-chip SelectRAM memory
SelectRAM Benefits
 Single-Port
— Synchronous
— Simple timing
[Figure: single-port RAM with Data, Write Enable, Write Clock and Address inputs and one Output]
 Dual-Port
[Figure: dual-port RAM with Data, Write Enable, Write Clock, Write Address/Single-Port Read Address and Dual-Port Read Address inputs, and Single-Port and Dual-Port Outputs]
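The dual-port arrangement in the figure, with one shared write/read address and one extra read address, can be sketched as a 16 x 1 memory with a clocked write and two read outputs. This model assumes reads are combinational, which the slide does not state; the class and parameter names are hypothetical.

```python
class DualPortRam16x1:
    """16 x 1 RAM: one write port, two read addresses (behavioral sketch)."""

    def __init__(self):
        self.bits = [0] * 16

    def clock(self, we, addr, data):
        """Write-clock edge: store data at addr when write enable is high."""
        if we:
            self.bits[addr & 0xF] = data & 1

    def read(self, addr, dp_addr):
        """Return (single-port output, dual-port output)."""
        return self.bits[addr & 0xF], self.bits[dp_addr & 0xF]

ram = DualPortRam16x1()
ram.clock(we=1, addr=3, data=1)
ram.clock(we=0, addr=5, data=1)     # ignored: write enable is low
print(ram.read(addr=3, dp_addr=5))  # (1, 0)
```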
Memory Bandwidth and Flexibility
 Virtex/Spartan-II On-Chip SelectRAM+ Memory
 200 MHz Memory Continuum
— Distributed RAM (bytes): 16x1, shallow/wide; DSP coefficients, small FIFOs
— Block RAM (kilobytes): 4Kx1, 2Kx2, 1Kx4, 512x8, 256x16, deep/wide; large FIFOs, packet buffers, video line buffers, cache tag memory
— External RAM (megabytes): SDRAM, ZBTRAM, SSRAM, SGRAM
Spartan-II Memory
 CLB LUTs provide small distributed RAM (16 bits/LUT)
 Block RAM provides 4K bits each
— Dual read/write port. Each port has…
– Independent Clock, R/W, and Enable
– Independently configurable data width from 4K x 1 to 256 x 16
[Figure: Spartan-II dual-R/W-port block RAM; Port A and Port B each have read (R) and write (W) access]
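The configurable data widths all describe the same 4K bits of storage: widening the data bus shortens the address range in proportion. A small sketch of that relationship follows; the helper name is hypothetical.

```python
BLOCK_RAM_BITS = 4096  # 4K bits per block RAM, as stated on the slide

def depth_for_width(width: int) -> int:
    """Return the number of addressable words for a given port data width."""
    if width not in (1, 2, 4, 8, 16):
        raise ValueError("width must be 1, 2, 4, 8 or 16")
    return BLOCK_RAM_BITS // width

for w in (1, 2, 4, 8, 16):
    depth = depth_for_width(w)
    addr_bits = depth.bit_length() - 1  # depth is a power of two
    print(f"{depth} x {w} ({addr_bits} address bits)")
```

Because each port is configured independently, one port could use one of these shapes while the other uses a different one over the same 4,096 bits.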
I/O Block (IOB)
 Periphery of identical I/O blocks
— Input, output, or bi-directional
— Direct or registered (or latched input)
— Pullup/Pulldown
— Programmable slew rate
— Three-state output
— Programmable thresholds
[Figure: IOB between the internal signals (I, O, TS, Clocks) and a pad bonded to a package pin]
Use Special IOB Primitives
 User explicitly defines what resources in the IOB are to be used
 I/Os are defined with
— 1 pad primitive
— At least 1 function primitive
– 1 input element, 1 output element or both
– Inverters may also be pulled into IOBs
[Figure: IPAD and IBUF]
Locking Down I/O Locations
 LOC=Pxx attribute defines I/O pad location(s)
 Avoid locking IOBs early
— Makes routing more difficult
 Use IOB LOC= to lock pins late in design cycle once PCB is built
— Can lock IOBs if floorplanning the connected CLBs
Use Pullups/Pulldowns
 Pullup automatically connected on unused IOBs
 User can specify PULLUP or PULLDOWN primitive on used IOBs
 Inputs should not be left floating
— Add a pullup to design inputs that may be left floating to reduce power and noise
[Figure: IPAD and IBUF]
Faster Setup With NODELAY
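 Delay included by default
— Compensates for clock routing delay so that no hold time is required
 NODELAY attribute removes delay element
— Shortens setup time but creates a hold-time requirement
[Figure: example IOB. External data passes from the pad through the input buffer and delay element to the D input of the flip-flop; the external clock reaches the flip-flop through the clock routing delay. Timing diagram: external clock, routed clock, external data, delayed data.]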
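The setup/hold trade-off can be illustrated with simple path arithmetic: the hold requirement at the pads appears when the clock path is slower than the data path, and the delay element exists to prevent that. The sketch below uses a simplified timing model and entirely hypothetical delay values in picoseconds; none of the numbers come from the slides or a data sheet.

```python
def pad_timing(clock_delay_ps, data_delay_ps, ff_setup_ps, ff_hold_ps):
    """Return (setup, hold) requirements seen at the pads, in picoseconds."""
    setup = data_delay_ps + ff_setup_ps - clock_delay_ps
    hold = max(0, clock_delay_ps + ff_hold_ps - data_delay_ps)
    return setup, hold

# Hypothetical delays in picoseconds, chosen only for illustration.
CLOCK_ROUTING = 1000
INPUT_BUFFER = 800
DELAY_ELEMENT = 1500
FF_SETUP, FF_HOLD = 700, 0

with_delay = pad_timing(CLOCK_ROUTING, INPUT_BUFFER + DELAY_ELEMENT, FF_SETUP, FF_HOLD)
no_delay = pad_timing(CLOCK_ROUTING, INPUT_BUFFER, FF_SETUP, FF_HOLD)
print(with_delay)  # (2000, 0)   default: longer setup, no hold time
print(no_delay)    # (500, 200)  NODELAY: shorter setup, hold time appears
```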
Slew Rate Control
 Slew rate controls output speed
 Default slow slew rate reduces noise & ground bounce
 Use fast slew rate wherever speed is important
— FAST parameter on output logic primitive
[Figure: OBUF with the FAST attribute driving OPAD]
Output Three-State Control
 Free inverter on output buffer control
— Use OBUFE macro for active-high enable
— Use OBUFT primitive for active-low enable
[Figure: OBUFE and OBUFT symbols, each showing the OE and T control signals]
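The difference between the two buffers is only the polarity of the enable, which the free inverter absorbs. A minimal behavioral sketch follows; the `Z` marker and the Python function names are assumptions made for illustration and mirror only the primitive names given above.

```python
Z = "Z"  # high-impedance marker used in this sketch

def obuft(i, t):
    """Active-low enable: drives i when t is 0, high impedance when t is 1."""
    return i if t == 0 else Z

def obufe(i, e):
    """Active-high enable: OBUFT with the control input inverted."""
    return obuft(i, 1 - e)

print(obuft(1, 0), obuft(1, 1))  # 1 Z
print(obufe(1, 1), obufe(1, 0))  # 1 Z
```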
Global Three-State
 Three-state control is local, via a dedicated global net, or both
— Global three-state controlled by STARTUP... primitive
[Figure: STARTUP symbol with GTS and GSR]
Virtex/Spartan-II I/O Block (Simplified)
Multiple I/O Interface Standards
 16 to 20 I/O interface standards supported
 CMOS, HSTL, SSTL, GTL, CTT, PCI
 As many as eight banks on a device
— Package dependent
 Different banks can support different standards at the same time
— Logic level translation
— Boards with mixed standards
High Performance Routing
 Hierarchical Routing
— Singles, Hexes, Longs
 Sparse connections on longer interconnects for high speed
 Routing delay depends primarily on distance
— Direction independent
— Device-size independent
 Predictable for early design analysis
[Figure: CLB array with a 2 ns routing-delay annotation]
Flexible General-Purpose Interconnect
 Flexible, but slow if a route crosses many channels
— Programmable switch matrix at each channel crossing
— Connects across, changes direction or fans out
Switch Matrix
 Bidirectional pass transistors
 High routing flexibility
Reduce Fanout
 Higher fanout nets (>16 loads) are harder to route & slower
 Consider duplicating source in schematic to improve routing or speed
[Figure: three copies of the source logic fn1, each feeding its own flip-flop (D, Q)]
Long Lines for High Fanout Nets
 Metal lines that traverse length & width of chip
 Lowest skew
 Ideal for high fan-out signals
 Ideal for clocking
 Requires vertical or horizontal alignment of loads
[Figure: long line connecting four CLBs]
Internal Three-State Buses
 Two 3-state drivers per CLB
 OR-AND logic implementation in place of 3-state drivers
— With no drivers enabled, bus is a logic 1
 Low power
— No danger of contention when multiple BUFTs enabled
— No physical pullups or large capacitance to drive
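The OR-AND scheme can be sketched as follows: each driver contributes a 1 when disabled and its data when enabled, and the bus is the AND of all contributions. This reproduces the two properties listed above (an idle bus reads 1, and multiple enabled drivers cannot contend). The sketch assumes an active-low enable `t` per driver; the function names are hypothetical.

```python
def buft_term(i, t):
    """OR stage: a disabled driver (t=1) contributes 1; an enabled one contributes its data."""
    return i | t

def bus_value(drivers):
    """AND stage over all (data, t) driver pairs."""
    value = 1
    for i, t in drivers:
        value &= buft_term(i, t)
    return value

print(bus_value([(0, 1), (1, 1)]))  # 1: no driver enabled, bus is logic 1
print(bus_value([(0, 0), (1, 1)]))  # 0: one driver enabled, driving 0
print(bus_value([(0, 0), (1, 0)]))  # 0: two drivers enabled, result is their AND
```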
General Clock Support
 Use clock buffers for highest fanout clocks
— Drive high-speed long line resources
– Lowest skew across a device
– No internal hold times
— Use generic BUFG primitive
– Allows software to choose best type of buffer
– Allows easy migration across families
 Four dedicated global low skew buffers
— Dedicated input pin (clock distribution only)
 Additional shared resources (i.e., long lines)
— Distribute low-skew/high-fanout signals (10ns max.)
 Four delay-locked loops on each device
— All-digital implementation
— Two global buffers associated with each DLL pair
Configuration
 Schematic or HDL description is converted to a configuration file by the Xilinx development system
 Configuration file is loaded into FPGA on power-up
— Stored in configuration latches
— Controls CLBs, IOBs, interconnect, etc.
Configuration Bitstream
 Binary programming file
 Length depends only on device, not utilization
— Load time is typically 1 µs per bit (total from a few ms to <1 s)
 FPGA can load its configuration automatically on power-up, or under microprocessor control
 Can be loaded directly into the device or into a configuration PROM
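Since bitstream length depends only on the device, configuration time is a simple product of bit count and per-bit load time. The sketch below assumes a load time of 1 µs per bit, which is the rate consistent with the stated total of a few ms to under 1 s; the bitstream lengths are hypothetical and do not correspond to specific devices.

```python
def configuration_time_ms(bitstream_bits: int, us_per_bit: float = 1.0) -> float:
    """Return the configuration time in milliseconds (hypothetical helper)."""
    return bitstream_bits * us_per_bit / 1000.0

# Hypothetical bitstream lengths, for illustration only.
for bits in (200_000, 800_000):
    print(bits, configuration_time_ms(bits), "ms")
```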
Configuration Modes
 Bit-serial configuration
— Simple, uses few device pins
— Controlled by FPGA (Master) or externally (Slave)
— Xilinx serial PROMs available
 Byte-parallel configuration
— Can drive PROM addresses (Master)
— Can be microprocessor-controlled
Configuration Pins
 Configuration starts on power-up
 Mode pin(s) checked to determine method
— Usable as extra I/O after configuration
 All I/Os not used for configuration are disabled
 Reconfiguration possible by pulling PROGRAM pin low
Readback
 Configuration data can be read back serially
— Allows verification of programming
 Readback data can include user-register values
— Allows in-circuit functional verification
— Requires READBACK... symbol
[Figure: READBACK symbol with CLK, TRIG, DATA and RIP pins]
Boundary Scan
 IEEE 1149.1-compatible boundary scan (JTAG)
 Available before configuration
 Configuration & readback possible via boundary scan logic
Power Consumption
 CMOS SRAM technology provides low standby power
 Operating power is mostly dynamic
— Proportional to transition frequency of internal nodes
— Xilinx segmented interconnect minimizes amount of metal capacitance to switch, minimizing power
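The two points about dynamic power can be combined in the usual first-order CMOS estimate, in which the power of each node scales with its capacitance, the square of the supply voltage and its transition frequency. That formula is not given on the slide and is used here as an assumption; all capacitances, frequencies and the supply voltage below are hypothetical.

```python
def relative_dynamic_power(nodes, vdd):
    """Sum C * Vdd**2 * f over (capacitance_farads, transitions_per_second) pairs."""
    return sum(c * vdd ** 2 * f for c, f in nodes)

# Hypothetical nodes: (capacitance in farads, transition frequency in Hz).
short_segments = [(0.5e-12, 50e6), (0.5e-12, 50e6)]
long_wires = [(2.0e-12, 50e6), (2.0e-12, 50e6)]

p_short = relative_dynamic_power(short_segments, vdd=2.5)
p_long = relative_dynamic_power(long_wires, vdd=2.5)
print(p_long > p_short)  # True: more switched capacitance means more power
```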