Xilinx XC4000 FPGA devices - Mahanakorn University of


Introduction to CPLD/FPGA
Technology, Devices and Tools
Theerayod Wiangtong
Electronic Department
Mahanakorn University of Technology
1
Outline
• Programmable Logic
– CPLD
– FPGA
• Architecture: Basic & Advanced
• Examples
• Features
• Vendors and Devices
• Design Tools
2
World of Integrated Circuits
• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable: PLD, FPGA
3
ASIC
• ASIC: Application Specific
Integrated Circuit
• Designs must be sent for expensive and
time consuming fabrication in
semiconductor foundry
• Designed all the way from behavioral
description to physical layout
4
CPLD/FPGA
• CPLD: Complex Programmable Logic Device
• FPGA: Field Programmable Gate Array
• Small development overhead
• No NRE (non-recurring engineering) costs
• Quick time to market
• No minimum quantity order
• Reprogrammable
5
Which Way to Go?
ASICs
• High performance
• Low power
• Low cost in high volumes
CPLD/FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
6
Other Advantages
• Manufacturing cycle for ASIC is very costly,
lengthy and engages lots of manpower
– Mistakes not detected at design time have large
impact on development time and cost
– FPGAs are perfect for rapid prototyping of digital
circuits
• Easy upgrades, as with software
• Unique applications
– Reconfigurable computing
7
Programmable Logic
CPLD/FPGA
8
Programmable Logic
• Programmable digital integrated circuit
• Standard off-the-shelf parts
• Desired functionality is implemented by configuring on-chip logic blocks and interconnections
• Types of programmable logic:
– Complex PLDs (CPLD)
– Field Programmable Gate Arrays (FPGA)
9
PLD - Sum of Products
Programmable AND array followed by fixed fan-in OR gates
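[Figure: inputs A, B, C pass through programmable switches or fuses into the AND plane; fixed OR gates sum the resulting product terms into outputs f1 and f2. In the slide's example, f1 is the sum of two product terms over A, B, C, and f2 is the sum of one product term over A, B and one over A, B, C.]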
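The AND-plane/OR-gate structure can be modeled in a few lines of Python. This is only a behavioral sketch: the function `make_pld`, the dictionary-based fuse map, and the example output `g` are all made up for illustration and are not the slide's f1 or f2.

```python
from itertools import product

def make_pld(product_terms_per_output):
    """Build a PLD model from per-output lists of product terms.

    Each product term maps an input name to the level it must have
    (True = literal used as-is, False = literal complemented).
    Inputs missing from a term are not connected to that AND gate.
    """
    def evaluate(inputs):
        outputs = {}
        for name, terms in product_terms_per_output.items():
            # AND plane: every connected literal must match.
            and_outputs = [
                all(inputs[var] == level for var, level in term.items())
                for term in terms
            ]
            # Fixed OR gate sums the product terms.
            outputs[name] = any(and_outputs)
        return outputs
    return evaluate

# Hypothetical programming (not the slide's f1/f2): g = A.B + (not A).C
pld = make_pld({"g": [{"A": True, "B": True}, {"A": False, "C": True}]})

for a, b, c in product([False, True], repeat=3):
    print(int(a), int(b), int(c), int(pld({"A": a, "B": b, "C": c})["g"]))
```

Changing the dictionaries corresponds to blowing different fuses: the OR gates stay fixed, only the AND-plane connections change.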
10
PLD - Macrocell
Can implement combinational or sequential logic
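[Figure: macrocell. Inputs A, B, C enter the AND plane; the summed product terms go either directly or through a D flip-flop (D, Q, Clock) to a MUX controlled by Select; Enable gates the output f1.]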
11
CPLD Structure
Integration of several PLD blocks with a programmable
interconnect on a single chip
[Figure: four PLD blocks, each with its own I/O block, connected through a central interconnection matrix.]
12
CPLD Example - Altera MAX7000
EPM7000 Series Block Diagram
13
CPLD Example - Altera MAX7000
EPM7000 Series Device
Macrocell
14
FPGA Architecture
15
FPGA - Generic Structure
FPGA building blocks:
• Programmable logic blocks
– Implement combinatorial and sequential logic
• Programmable interconnect
– Wires to connect inputs and outputs to logic blocks
• Programmable I/O blocks
– Special logic blocks at the periphery of device for external connections
[Figure: array of logic blocks and interconnection switches surrounded by I/O blocks.]
16
FPGA – Basic Logic Element
• LUT to implement combinatorial logic
• Register for sequential circuits
• Additional logic (not shown):
– Carry logic for arithmetic functions
– Expansion logic for functions requiring more than 4 inputs
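[Figure: inputs A, B, C, D feed a LUT; the LUT output goes both directly and through a D flip-flop (D, Q, Clock) to a multiplexer controlled by Select, which drives Out.]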
17
Look-Up Tables (LUT)
• Look-up table with N-inputs can be used to implement
any combinatorial function of N inputs
• LUT is programmed with the truth-table
Truth-table
A B C D Z
0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 1
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 1
1 0 0 0 0
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
[Figure: the same function shown as a gate implementation and as a LUT implementation, each with inputs A, B, C, D and output Z.]
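To make "the LUT is programmed with the truth table" concrete, here is a small Python sketch. The helper names `program_lut` and `lut_read`, the bit ordering (first input is the most significant address bit), and the example function `z_gates` are assumptions chosen for illustration; `z_gates` is not the function from the slide.

```python
def program_lut(func, n_inputs):
    """Return the 2**n configuration bits for an n-input function.

    Entry i holds func's output when the inputs, read as a binary
    number with the first input as the most significant bit, equal i.
    """
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs simply form an address into the table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return table[index]

# Illustrative gate-level function: Z = (A xor B) or (C and D)
def z_gates(a, b, c, d):
    return (a ^ b) | (c & d)

z_table = program_lut(z_gates, 4)
print(z_table)
```

Any other 4-input function uses the same hardware; only the 16 stored bits differ, which is why a LUT's delay does not depend on the function it implements.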
18
LUT Implementation
• Example: 3-input LUT
• Based on multiplexers
(pass transistors)
• LUT entries stored in
configuration memory cells
[Figure: eight configuration memory cells, each holding 0/1, feed a tree of multiplexers controlled by X1, X2 and X3; the tree output is F.]
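The multiplexer-based read path of a 3-input LUT can be sketched as follows. The cell ordering (X1 as the most significant address bit) and the majority-function contents are assumptions for this example, not details taken from the slide.

```python
def mux2(select, when0, when1):
    """2:1 multiplexer, the building block of the LUT read path."""
    return when1 if select else when0

def lut3(cells, x1, x2, x3):
    """Read a 3-input LUT through a three-level multiplexer tree.

    cells: eight configuration bits; x1 is treated as the most
    significant address bit (an assumption of this sketch).
    """
    assert len(cells) == 8
    # Level 1: x3 picks one cell out of each adjacent pair.
    level1 = [mux2(x3, cells[i], cells[i + 1]) for i in range(0, 8, 2)]
    # Level 2: x2 picks between pairs of level-1 outputs.
    level2 = [mux2(x2, level1[i], level1[i + 1]) for i in range(0, 4, 2)]
    # Level 3: x1 makes the final choice.
    return mux2(x1, level2[0], level2[1])

# Configuration bits for a 3-input majority function (illustrative).
majority_cells = [0, 0, 0, 1, 0, 1, 1, 1]
print(lut3(majority_cells, 1, 0, 1))
```

The tree needs 4 + 2 + 1 = 7 two-input multiplexers for 8 cells, which is the structure the pass-transistor implementation realizes in hardware.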
19
Programmable Interconnect
• Interconnect hierarchy (not shown)
– Fast local interconnect
– Horizontal and vertical lines of various lengths
[Figure: logic elements (LE) connected to each other through switch matrices.]
20
Switch Matrix Operation
• 6 pass transistors per switch matrix interconnect point
• Pass transistors act as programmable switches
• Pass transistor gates are driven by configuration memory cells
[Figure: a switch matrix before programming and after programming.]
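One way to see where the figure of 6 pass transistors can come from: if four wire ends meet at an interconnect point (assumed here to arrive from the north, east, south and west), one switch per pair of wire ends gives C(4,2) = 6 switches. The sketch below models that assumption; the side names and the `connected_sides` helper are illustrative, not taken from the slides.

```python
from itertools import combinations

SIDES = ("north", "east", "south", "west")
# One pass transistor per unordered pair of wire ends.
SWITCHES = list(combinations(SIDES, 2))

def connected_sides(config_bits, start):
    """Sides electrically reachable from `start`.

    config_bits maps a (side, side) pair from SWITCHES to 0/1, playing
    the role of the configuration memory cells that drive the gates.
    """
    reached = {start}
    changed = True
    while changed:
        changed = False
        for a, b in SWITCHES:
            if config_bits.get((a, b), 0) and ((a in reached) != (b in reached)):
                reached |= {a, b}
                changed = True
    return reached

print(len(SWITCHES))
print(sorted(connected_sides({("north", "south"): 1}, "north")))
```

With all six bits at 0 every wire end is isolated ("before programming"); setting bits closes the corresponding switches ("after programming").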
21
Configuration Storage Elements
• Static Random Access Memory (SRAM)
– each switch is a pass transistor controlled by the state of an
SRAM bit
– FPGA needs to be configured at power-on
• Flash Erasable Programmable ROM (Flash)
– each switch is a floating-gate transistor that can be turned off by
injecting charge onto its gate. FPGA itself holds the program
– reprogrammable, even in-circuit
• Fusible Links (“Antifuse”)
– Forms a low-resistance path when electrically programmed
– one-time programmable in special programming machine
– radiation tolerant
22
FPGA Technology Roadmap
Year              1995   1996    1997    2000    2003    2004
Technology        0.6µ   0.35µ   0.25µ   0.18µ   0.13µ   0.09µ
Transistor count  3.5M   12M     23M     75M     430M    1B
23
Special Features
• Clock management
– PLL,DLL
– Eliminate clock skew between external clock input and
on-chip clock
– Low-skew global clock distribution network
• Embedded memory blocks
• Support for various interface standards
• High-speed serial I/Os
• Embedded processor cores
• DSP blocks
24
FPGA Vendors & Device
Families
• Xilinx
– Virtex-II/Virtex-4: Feature-packed high-performance SRAM-based FPGA
– Spartan 3: low-cost feature reduced version
– CoolRunner: CPLDs
• Altera
– Stratix/Stratix-II
• High-performance SRAM-based FPGAs
– Cyclone/Cyclone-II
• Low-cost feature reduced version for cost-critical applications
– MAX3000/7000 CPLDs
– MAX-II: Flash-based FPGA
• Actel
– Anti-fuse based FPGAs
• Radiation tolerant
– Flash-based FPGAs
• Lattice
– Flash-based FPGAs
– CPLDs (EEPROM)
• QuickLogic
– ViaLink-based FPGAs
25
State of the Art in FPGAs
• 90 nm process on 300 mm wafers
– Lower cost per function (LUT + register)
– Smaller and faster transistors: Higher speed
• System speed up to 500 MHz
– Mainly through smart interconnects, clock management, dedicated
circuits, flexible I/O.
– Integrated transceivers running at 10 Gigabits/sec
• More Logic and Better Features:
– >100,000 LUTs & flip-flops
– >200 embedded RAMs, and same number 18 x 18 multipliers
• 1156 pins (balls) with >800 GP I/O
– 50 I/O standards, incl. LVDS with internal termination
• 16 low-skew global clock lines
– Multiple clock management circuits
• On-chip microprocessor(s) and multi-Gbps transceivers
26
Latest Devices: Capacity & Features
Xilinx Virtex-4
• 90nm process
• Up to 960 I/Os
• >200000 logic cells
• Up to 552 18kb block RAMs (~10Mb RAM)
• 192 DSP slices (18x18 multiplier-accumulator)
• 20 digital clock managers (DCM)
• 24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
• Up to four PowerPC 405 cores
Altera Stratix-II
• 90nm process
• Up to 1170 I/Os
• 179000 logic elements
• 9.6Mb embedded RAM
• 96 DSP blocks: 380 18x18 multipliers
• 12 PLLs
• Serial I/O up to 1Gb/s
• No hard processor cores
27
ALTERA
28
Device Families & Tools
29
Device Roadmap
30
Technology
31
Logic Density
32
Pricing Roadmap
33
FLEX10K Basic Architecture
34
Logic Array Block: FLEX10K
35
Logic Element of FLEX10K
36
Advanced Altera Architecture
37
Stratix Device
38
Stratix Device Family
39
Altera: Embedded DSP Blocks
• Two DSP Block columns per device
• Number varies by height of column
• Can implement:
– Eight 9x9 multipliers
– Four 18x18 multipliers
– One 36x36 multiplier
• Contains adder/subtractor/accumulator
• Registered inputs can become shift register
40
Altera: Embedded DSP Block
41
Embedded RAM
Dual-Port RAM
– M512 – 512 x 1
– M4K – 4096 x 1
– M-RAM – 64K x 8
42
Embedded RAM Block
43
ALTERA High Speed I/O
44
Embedded Processor
• Soft Processor: NIOS 32bit @150MHz
• Hard Processor: ARM922T 32bit RISC @200
MHz (Excalibur device)
• Additional features
– Communication Controller
– Integrated MMU (Memory Management Unit)
– High-Speed Memory Interface
– C-Level Simulation
– Multi-Processor Support
45
NIOS II Family
46
Max II Device
47
Xilinx
48
Product Overview
High Volume
Low Cost
CPLD
Rom-based
High Performance
High Density
Low Power
Low Cost
49
Xilinx FPGA Families
• Old families
– XC3000, XC4000, XC5200
– Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for
modern designs.
• High-performance families
– Virtex (0.22µm)
– Virtex-E, Virtex-EM (0.18µm)
– Virtex-II, Virtex-II PRO (0.13µm)
• Low Cost Family
– Spartan/XL – derived from XC4000
– Spartan-II – derived from Virtex
– Spartan-IIE – derived from Virtex-E
– Spartan-3
50
Basic FPGA Architecture
Spartan-II
51
CLB Structure
[Figure: CLB containing two slices. Each slice holds two 4-input look-up tables (inputs F1-F4 and G1-G4, output O), each followed by carry and control logic and a storage element (D, CK, EC, S, R, Q). Slice outputs are X, XB, Y and YB; other labeled signals are CIN, COUT, CLK, CE, SR, BY and F5IN.]
• Contains 2 slices
• Each slice has 2 LUT-FF pairs with associated carry logic
• Two 3-state buffers (BUFT) associated with each CLB,
accessible by all CLB outputs
52
CLB Slice Structure
• Each slice contains two sets of the
following:
– Four-input LUT
• Any 4-input logic function,
• or 16-bit x 1 sync RAM
• or 16-bit shift register
– Carry & Control
• Fast arithmetic logic
• Multiplier logic
• Multiplexer logic
– Storage element
• Latch or flip-flop
• Set and reset
• True or inverted inputs
• Sync. or async. control
53
Example: 5-Input Functions implemented
using two LUTs
X5 X4 X3 X2 X1 Y
0 0 0 0 0 0
0 0 0 0 1 1
0 0 0 1 0 0
0 0 0 1 1 0
0 0 1 0 0 1
0 0 1 0 1 1
0 0 1 1 0 0
0 0 1 1 1 0
0 1 0 0 0 1
0 1 0 0 1 0
0 1 0 1 0 0
0 1 0 1 1 1
0 1 1 0 0 1
0 1 1 0 1 1
0 1 1 1 0 1
0 1 1 1 1 1
1 0 0 0 0 0
1 0 0 0 1 0
1 0 0 1 0 0
1 0 0 1 1 0
1 0 1 0 0 0
1 0 1 0 1 0
1 0 1 1 0 0
1 0 1 1 1 1
1 1 0 0 0 0
1 1 0 0 1 1
1 1 0 1 0 0
1 1 0 1 1 1
1 1 1 0 0 0
1 1 1 0 1 1
1 1 1 1 0 0
1 1 1 1 1 0
[Figure: two 4-input LUTs (each usable as LUT, ROM or RAM; inputs A1-A4 driven by F1-F4, plus WS and DI) feed the F5 multiplexer, which is selected by BX and drives output X.]
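The 32-row truth table above can be split across two 4-input LUTs and a 2:1 multiplexer, which is the idea behind the F5 path. The Python sketch below assumes that X5 drives the multiplexer select (the role BX plays in the figure) and that X4..X1 address both LUTs; those pin assignments are assumptions of this example.

```python
# Y column of the truth table, rows ordered X5 X4 X3 X2 X1 = 00000 ... 11111.
Y = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

LUT_LOW = Y[:16]    # contents of the LUT used when X5 = 0
LUT_HIGH = Y[16:]   # contents of the LUT used when X5 = 1

def lut4(table, x4, x3, x2, x1):
    """4-input LUT read: the inputs form a 4-bit address."""
    return table[(x4 << 3) | (x3 << 2) | (x2 << 1) | x1]

def f5(x5, x4, x3, x2, x1):
    """5-input function: both LUTs see X4..X1, the mux picks one by X5."""
    low = lut4(LUT_LOW, x4, x3, x2, x1)
    high = lut4(LUT_HIGH, x4, x3, x2, x1)
    return high if x5 else low

print([f5((i >> 4) & 1, (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
       for i in range(32)] == Y)
```

This is a Shannon expansion on X5: the function is split into its X5 = 0 and X5 = 1 halves, each of which depends on only four inputs.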
54
Dedicated Expansion Multiplexers
• MUXF5 combines 2 LUTs to create
– Any 5-input function (LUT5)
– Or selected functions up to 9 inputs
– Or 4x1 multiplexer
• MUXF6 combines 2 slices to form
– Any 6-input function (LUT6)
– Or selected functions up to 19 inputs
– 8x1 multiplexer
[Figure: CLB with two slices; in each slice a MUXF5 combines the two LUTs, and MUXF6 combines the two MUXF5 outputs.]
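As an illustration of the "4x1 multiplexer" case, each LUT can be programmed as a 2:1 multiplexer and MUXF5 can then choose between the two LUT outputs. The sketch below is a behavioral model only; the address-bit assignment inside `program_mux2_lut` and the name `mux4` are assumptions made for this example.

```python
def program_mux2_lut():
    """16 entries for a 4-input LUT acting as a 2:1 mux.

    Assumed address bits, MSB to LSB: unused, sel, d1, d0.
    """
    table = []
    for addr in range(16):
        sel = (addr >> 2) & 1
        d1 = (addr >> 1) & 1
        d0 = addr & 1
        table.append(d1 if sel else d0)
    return table

LUT_F = program_mux2_lut()
LUT_G = program_mux2_lut()

def mux4(d, s1, s0):
    """4:1 mux: d holds four data bits, (s1, s0) selects d[2*s1 + s0]."""
    f_out = LUT_F[(s0 << 2) | (d[1] << 1) | d[0]]   # picks d[0] or d[1]
    g_out = LUT_G[(s0 << 2) | (d[3] << 1) | d[2]]   # picks d[2] or d[3]
    return g_out if s1 else f_out                    # MUXF5 stage

print(mux4([0, 1, 1, 0], 0, 1))
```

Six signals (four data, two select) are handled even though each LUT has only four inputs, which is an example of the "selected functions" of more than five inputs mentioned above.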
55
Distributed RAM
• CLB LUT configurable as Distributed RAM
– A LUT equals 16x1 RAM
– Implements Single and Dual-Ports
– Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/Asynchronous read
– Accompanying flip-flops used for synchronous read
[Figure: one LUT forms a RAM16X1S (D, WE, WCLK, A0-A3, O). Two LUTs form a RAM32X1S (D, WE, WCLK, A0-A4, O), or a RAM16X2S (D0, D1, WE, WCLK, A0-A3, O0, O1), or a RAM16X1D (D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO).]
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
– Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing running from the LSB to the MSB.]
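A carry chain can be pictured as one small stage per bit, running from LSB to MSB. The sketch below assumes a common arrangement in which the LUT computes a propagate signal, a dedicated multiplexer forwards or regenerates the carry, and a dedicated XOR forms the sum bit; the slides do not spell out these details, so treat the structure as an assumption.

```python
def add_with_carry_chain(a, b, width, carry_in=0):
    """Add two unsigned integers bit by bit along a carry chain.

    Returns (sum modulo 2**width, carry out of the MSB).
    """
    carry = carry_in
    total = 0
    for i in range(width):                      # LSB to MSB
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi                     # computed in the LUT
        total |= (propagate ^ carry) << i       # sum bit via dedicated XOR
        carry = carry if propagate else ai      # carry multiplexer
    return total, carry

print(add_with_carry_chain(11, 6, 4))
```

Only the carry signal travels from stage to stage, which is why giving it dedicated routing, separate from the general interconnect, speeds up adders and counters.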
57
Basic I/O Block Structure
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
• Inputs can be delayed
[Figure: IOB with three registers (D, Q, EC, SR), each with an FF enable: one for three-state control, one in the output path, and one in the input path, which offers both direct and registered input. Clock and Set/Reset are shared.]
58
Advanced Xilinx Architecture
59
Virtex-II Pro
• 130nm CMOS Copper Low-K
• 1200 I/Os, 1696 Pin Package
• 125,000 Logic Cells
• 10 Megabits of RAM
• 556 XTREME DSP Multipliers
• 16 3.125 Gbps transceivers
• 4 PowerPC CPUs
60
Virtex-II Pro
PowerPC 405
Dedicated
multipliers
and memory
Digital Clock Management (DCM)
provides
• 16 independent clock domains
• Clock divide, multiply, phase shift
• Enhanced Phase Locked Loops (PLLs)
Routing Resources (90%)
61
Block RAM
• Most efficient memory implementation
– Dedicated blocks of memory
• Ideal for most memory requirements
– 4 to 14 memory blocks
• 4096 bits per block
– Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
[Figure: Spartan-II True Dual-Port Block RAM with Port A and Port B.]
62
Dual-Port Bus Flexibility
RAMB4_S4_S16
Port A: 1K depth, 4-bit width. Inputs WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0]; output DOA[3:0]
Port B: 256 depth, 16-bit width. Inputs WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0]; output DOB[15:0]
• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic
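The width conversion can be illustrated with a toy model of one 4096-bit block accessed through ports of different widths. The class `BlockRam4K` and its methods are hypothetical, and the bit ordering (the word at address n occupies bits n*width upward, least significant bit first) is an assumption of this sketch that should be checked against the device documentation.

```python
class BlockRam4K:
    """Toy model of one 4096-bit block RAM shared by ports of any width."""

    SIZE_BITS = 4096

    def __init__(self):
        self.bits = [0] * self.SIZE_BITS

    def write(self, width, addr, value):
        """Write a `width`-bit word at word address `addr`."""
        assert 0 <= addr < self.SIZE_BITS // width
        base = addr * width
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width, addr):
        """Read a `width`-bit word at word address `addr`."""
        assert 0 <= addr < self.SIZE_BITS // width
        base = addr * width
        return sum(self.bits[base + i] << i for i in range(width))

bram = BlockRam4K()
bram.write(16, 3, 0xBEEF)                      # 16-bit port, address 3
nibbles = [bram.read(4, 12 + k) for k in range(4)]   # 4-bit port, addresses 12..15
print([hex(n) for n in nibbles])
```

The 16-bit port sees 256 words and the 4-bit port sees 1024 words of the same storage, so a word written on one side can be read back in four pieces on the other without extra logic.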
63
Two Independent Single-Port RAMs
RAMB4_S1_S1
Port A: 2K depth, 1-bit width. Address formed from VCC, ADDR[10:0]; inputs WEA, ENA, RSTA, CLKA, ADDRA[10:0], DIA[0]; output DOA[0]
Port B: 2K depth, 1-bit width. Address formed from GND, ADDR[10:0]; inputs WEB, ENB, RSTB, CLKB, ADDRB[10:0], DIB[0]; output DOB[0]
• Can split a Dual-Port 4K RAM into two Single-Port 2K RAM
– Simultaneous independent access to each RAM
• To access the lower RAM
– Tie the MSB address bit to Logic Low
• To access the upper RAM
– Tie the MSB address bit to Logic High
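The address trick can be sketched in Python: both ports share one 4096 x 1 array, and fixing the top address bit confines each port to its own half. The class and helper names below are hypothetical.

```python
class DualPort4Kx1:
    """4096 x 1 memory whose two ports share one storage array."""

    def __init__(self):
        self.bits = [0] * 4096

    def access(self, addr, we=0, di=0):
        """One clocked access: optionally write, then return the stored bit."""
        if we:
            self.bits[addr] = di & 1
        return self.bits[addr]

def upper_ram_address(addr11):
    """MSB tied to logic high (VCC): reaches addresses 2048..4095."""
    return (1 << 11) | (addr11 & 0x7FF)

def lower_ram_address(addr11):
    """MSB tied to logic low (GND): reaches addresses 0..2047."""
    return addr11 & 0x7FF

ram = DualPort4Kx1()
ram.access(upper_ram_address(5), we=1, di=1)   # write through one port
ram.access(lower_ram_address(5), we=1, di=0)   # write through the other
print(ram.access(upper_ram_address(5)), ram.access(lower_ram_address(5)))
```

Because the two address ranges never overlap, the ports cannot collide and behave like two independent 2K single-port RAMs.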
64
Rocket I/O
• From 4 to 24 RocketIO MGTs per Virtex-II Pro™ device
• Continuous operating range 622 Mbps to 3.125 Gbps
Virtex 4: 11.1 Gbps !!!
65
Embedded Processor
• Soft Processor: MicroBlaze 32bit @150MHz
• Hard Processor: IBM PowerPC405 32bit RISC @300MHz (in Virtex-II Pro)
– Low Power Consumption: 0.9 mW/MHz
– Five-Stage Data Path Pipeline
– Hardware Multiply/Divide Unit
– Thirty-Two 32-bit General Purpose Registers
– Memory Management Unit (MMU)
– Dedicated On-Chip Memory (OCM) Interface
– Supports IBM CoreConnect™ Bus Architecture
– Debug and Trace Support
66
FPGA Design Tools
67
Design process (1)
Specification (Lab Experiments)
Design and implement a simple unit permitting to speed up encryption with RC5-similar cipher with fixed key set on 8031 microcontroller. Unlike in the experiment 5, this time your unit has to be able to perform an encryption algorithm by itself, executing 32 rounds…..
VHDL description (Your Source Files)
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr: in std_logic;
    data_input: in std_logic_vector(31 downto 0);
    data_output: out std_logic_vector(31 downto 0);
    out_full: in std_logic;
    key_input: in std_logic_vector(31 downto 0);
    key_read: out std_logic  -- last port: no trailing semicolon
  );
end RC5_core;  -- name must match the entity declaration
Functional simulation
Synthesis
Post-synthesis simulation
68
Design process (2)
Implementation
Timing simulation
Configuration
On chip testing
69
Active-HDL
70
Simulation and Synthesis Tools
71
Logic Synthesis
VHDL description
Circuit netlist
architecture MLU_DATAFLOW of MLU is
  signal A1: STD_LOGIC;
  signal B1: STD_LOGIC;
  signal Y1: STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
begin
  A1 <= A when (NEG_A='0') else
        not A;
  B1 <= B when (NEG_B='0') else
        not B;
  Y  <= Y1 when (NEG_Y='0') else
        not Y1;
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;
  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;
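A software reference model is a convenient way to check the synthesized netlist against the intended behavior. The Python function below mirrors the MLU dataflow description signal for signal; the function name `mlu` and the use of 0/1 integers for STD_LOGIC values are choices made for this sketch.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Reference model of the MLU dataflow architecture (all values 0 or 1)."""
    a1 = a ^ neg_a                      # A1 <= A or not A, depending on NEG_A
    b1 = b ^ neg_b                      # B1 <= B or not B, depending on NEG_B
    mux = [
        a1 & b1,                        # MUX_0: and
        a1 | b1,                        # MUX_1: or
        a1 ^ b1,                        # MUX_2: xor
        1 - (a1 ^ b1),                  # MUX_3: xnor
    ]
    y1 = mux[(l1 << 1) | l0]            # with (L1 & L0) select
    return y1 ^ neg_y                   # Y <= Y1 or not Y1, depending on NEG_Y

# NAND of A and B: select the AND function and invert the output.
print([mlu(a, b, 0, 0, 1, 0, 0) for a in (0, 1) for b in (0, 1)])
```

Running all 128 input combinations through such a model and through the post-synthesis simulation gives a simple equivalence check.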
72
Features of synthesis tools
• Interpret RTL code
• Produce synthesized circuit netlist in a
standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics
corresponding to EDIF netlist
73
Implementation
• After synthesis the entire implementation process
is performed by FPGA vendor tools
• Xilinx ISE foundation 11.1i
• Altera Quartus II 9.2
• 3rd party tools for alliance version
74
Circuit Compilation
1. Technology Mapping
2. Placement
Assign a logical LUT to a physical location.
3. Routing
Select wire segments and switches for interconnection.
75
Routing Example
FPGA
Programmable Connections
76
Configuration
• Once a design is implemented, you must
create a file that the FPGA can understand
– This file is called a bit stream or configuration file
• The BIT file can be downloaded directly to the
FPGA, or can be converted into a PROM file
which stores the programming information
77
QUESTIONS?
THANK YOU
78