vir - "PLDWorld.com"
Redefining the FPGA
The first fully programmable system solution
designed specifically for intellectual property.
Agenda
Technology Roadmap
Redefining the FPGA
Architecture Overview
The CLB Tile, Vector Based Interconnect, Internal Bus
Support, SelectRAM+, Clocking & DLLs, SelectI/O,
Thermal Management & The SelectMAP Interface
Software & Cores Support
Summary - A System Level Solution
Technology Roadmap
(Chart: density/performance by year, 1995 to 1999)
– XC4000E: 2LM, 0.5µm (XC4025E)
– XC4000EX: 2LM, 0.5µm (XC4036EX)
– XC4000XL: 3LM, 0.35µm (XC4085XL)
– XC4000XV: 3LM, 0.25µm (XC40250XV)
– Virtex: 5LM, 0.25µm (7LM, 0.18µm); 1 Million+ System Gates with a High Performance System Solution
Redefining the FPGA
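(Figure: a system built around the FPGA, connected to a CPU, Chip 1, Chip 2, 133MHz SDRAM, an SRAM cache (Mbytes) and a high speed system backplane, using 1x and 2x clocks and LVCMOS, LVTTL, SSTL3, GTL+ and low voltage interfaces; callouts 1 to 4 mark the four areas that follow)
"Virtex moves FPGAs from glue to system component"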
Redefining the FPGA
1 System Integration
2 System Memory
3 System Timing
4 System Interfaces
Value Extends Beyond the Socket
Redefining the FPGA
1 System Integration
Advanced Process Technology Allows for Almost 10x the Density of Today’s FPGAs
Extremely Dense: 1,728 to 27,648 Logic Cells
Predictable Routing Delays Produce a Core Friendly Architecture With Fast Place & Route Times
(Figure: CLB array annotated with 2ns routing delays)
Redefining the FPGA
2 System Memory
200 MHz Distributed SelectRAM
200 MHz Block SelectRAM
200 MHz Access to External Memory
(Figure: RAMB4_S4_S16 block symbol. Port A: WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]; port B: WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0])
Redefining the FPGA
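3 System Timing
(Figure: the CLKDLL primitive, with inputs CLKIN, CLKFB and RST and outputs CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV and LOCKED. A 90 MHz clock enters Virtex, where DLLs produce 45 MHz (divide by 2) and 180 MHz (multiply by 2) clocks that can also be routed to other devices)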
Redefining the FPGA
4 System Interfaces
SelectI/O Allows Connection Directly to External Signals of Varied Voltages & Thresholds
(Figure: 5.0V, 3.3V, 2.5V and 1.8V signals and the PCI, GTL, GTL+, AGP, SSTL and HSTL standards)
Future Standards Can be Supported Without Having to Make Silicon Changes
Redefining the FPGA
1 System Integration
– Intellectual Property is Critical for High Density Design & Must Drop in Easily Without Penalty Across an Entire Family
2 System Memory
– Memory Bandwidth is Always Key
– Size & Depth Requirements Vary Depending on the Application
3 System Timing
– Chip to Chip Performance Typically Limits System Speeds
– Clock Skew is an Important Factor in High Performance Systems
4 System Interfaces
– Process Technology Leads to Mixed Voltage Systems
– High Performance, Lower Power Signal Standards Have Emerged
Redefining the FPGA
(Figure: design reuse. Designers #1 and #2, working in VHDL and Verilog design environments, combine new modules with IP modules from AllianceCore, CoreGen and LogiCore, such as 133MHz SDRAM, DSP, FIFO, Giga-bit Ethernet, CPU and 66MHz PCI, in a Virtex device with 160 MHz I/O, 133 MHz memory and 1 Million+ system gates)
Redefining the FPGA
Extremely Dense
– 50,000 to 1,000,000 System Gates
– 1,728 to 27,648 Logic Cells
System Performance & Features
– 160 MHz+ System Performance
– Multiple DLLs & Block SelectRAM
– Supports Multiple I/O Standards
Internal Performance & Features
– 100 MHz+ at 3 to 4 Logic Levels
– TBUFs & Distributed SelectRAM
– Fast, Flexible I/Os
System Building Blocks
IP: Superior Intellectual Property Infrastructure - CoreGen & Web
Software: Proven Software Flows for High Density & Performance - M1.5
Segmented Routing
4-Input LUT Architecture
Leading Edge Process Technology
The World’s First Fully Programmable System-Level Architecture
Architecture Overview
(Figure: Virtex architecture overview, showing (1) the CLB tile, (2) distributed and block SelectRAM with a RAMB4_S4_S16 symbol, (3) a CLKDLL, (4) SelectI/O standards and voltages, plus thermal management and SelectMAP configuration)
The CLB Tile
1 System Integration
Advanced Process Technology Allows for Almost 10x the Density of Today’s FPGAs
Extremely Dense: 1,728 to 27,648 Logic Cells
Predictable Routing Delays Produce a Core Friendly Architecture With Much Faster Place & Route Times
(Figure: CLB array annotated with 2ns routing delays)
CLB Tile is Composed of a Switch Matrix, Configurable Logic Block, and Associated General Routing Resources
All CLB Inputs Have Access to Interconnect on All 4 Sides
CLB is Divided into Two Identical Slices
– Slices Have a Bit Pitch of 2
Fast Local Feedback Within the CLB & Direct Connects to Adjacent Horizontal Neighbors
Wide Single CLB Functions
(Figure: the CLB tile, with the switch matrix, two slices with carry chains, local feedback and direct connects, single, hex and long lines, and internal and three-state busses)
Simplified CLB Structure
(Figure: simplified CLB. Each of the two slices holds two LUTs with carry logic, each followed by a storage element with D, CE, PRE and CLR inputs and a Q output)
2 Slices in Each CLB
Virtex Slice is Similar in Contents to the Current XC4000 CLB
2 BUFTs Associated with Each CLB, Accessible by All 8 CLB Outputs
Detailed Slice Structure
(Figure: detailed slice structure. Two LUTs, each configurable as LUT, RAM, ROM or shift register (inputs G1-G4 and F1-F4, write strobe WS, data in DI), drive the Y and X outputs, the carry logic (CIN to COUT, with outputs YB and XB) and the F5 multiplexer (which also takes F5 from the other slice). Each path feeds a storage element with D, CE, CLK, S and R inputs and a YQ or XQ output; BX and BY are independent inputs, and the write strobe, CE, SR and GSR control logic is shared.)
* Controlled by the same pair of memory cells
** Implemented as extra inputs on the BX input mux
*** CLK and SR inputs are common to both slices
Wide Single CLB Functions
(Figure: two LUTs in different slices of one CLB, joined by local interconnect: 1.1ns + 0.3ns + 1.1ns = 2.5ns)
Implement 13-Input Functions in a Single CLB
– Builds on the XC4000 Architecture’s 9-Input Function
– 2 Logic Levels and 1 Local Interconnect Yield a 2.5ns Max Delay
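The 13-input figure follows from the LUT count. Three LUTs take 4 inputs each (12 signals), and their outputs use three inputs of the fourth LUT, whose last input is the 13th signal. The XC4000 case works the same way: F and G feeding H gives 4 + 4 + 1 = 9. The behavioral sketch below uses a 13-input parity function to illustrate this. It covers only functions that decompose in this way, not arbitrary 13-input functions, and the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

def make_lut(func: Callable[..., int]) -> list[int]:
    """Truth table for a 4-input function; input k is bit k of the table index."""
    return [func(*((i >> k) & 1 for k in range(4))) & 1 for i in range(16)]

def lut_eval(table: Sequence[int], inputs: Sequence[int]) -> int:
    """A 4-input LUT behaves as a 16x1 memory addressed by its inputs."""
    return table[sum(bit << k for k, bit in enumerate(inputs))]

# Every LUT in this example computes 4-input parity (XOR of its inputs).
PARITY4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)

def parity13(bits: Sequence[int]) -> int:
    """13-input parity from four LUTs: three first-level LUTs (12 inputs)
    feed a fourth LUT whose remaining input is the 13th signal."""
    if len(bits) != 13:
        raise ValueError("expected 13 input bits")
    g1 = lut_eval(PARITY4, bits[0:4])
    g2 = lut_eval(PARITY4, bits[4:8])
    g3 = lut_eval(PARITY4, bits[8:12])
    return lut_eval(PARITY4, (g1, g2, g3, bits[12]))

# Exhaustive check against a direct parity computation (8192 cases).
assert all(parity13(b) == sum(b) % 2 for b in product((0, 1), repeat=13))
```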
Slice Features
Two 4-Input LUTs in Each Slice
Includes 2 Highly Flexible Sequential Elements
Dedicated Logic for 4x1 & 8x1 Muxes
Fast Look Ahead Carry Logic
Dedicated Multiplier Fabric
New SelectShift Feature
Create Shift Registers up to 16 Cycles Deep in a Single 4-Input LUT
4-Input LUTs can be used as Distributed SelectRAM
Same as XC4000 Synchronous Modes - Single & Dual Port
Flexible Sequential Elements
Sequential Elements Can be Flip-Flops or Latches
– 2 in Each Slice, 4 in Each CLB
– Can be Sourced from LUTs or an Independent CLB Input
Separate Set & Reset Controls
– Controls Can be Synchronous or Asynchronous
– GSR Can be Used for Power On Set/Reset
– All Controls Can be Inverted
– Controls are Shared Within Each Slice
(Figure: primitive symbols FDRSE (D, S, CE, R, Q), FDCPE (D, PRE, CE, CLR, Q) and LDCPE (D, PRE, CE, G, CLR, Q))
Fast Efficient Muxes
Primary Use of the XC4000 HMAP was to Implement a 2x1 Mux
Dedicated Muxes are Faster & More Space Efficient
Space Freed Up is Used for Muxes & Other Special Logic
MUXF5 Can be Used to Combine the Two LUTs in a Slice to Create a 4x1 Mux or Any Function of 5 Inputs
MUXF6 Can be Used to Combine the Two Slices in a CLB to Create an 8x1 Mux or Any Function of 6 Inputs
(Figure: a CLB with two slices; MUXF5 combines each slice's two LUTs and MUXF6 combines the two slices)
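A 4-input LUT can hold a 2:1 mux (two data inputs and a select), so MUXF5 and MUXF6 can extend the mux tree without using more LUTs. The behavioral sketch below shows how the three select bits divide across the LUTs, MUXF5 and MUXF6 to form an 8:1 mux in one CLB. The function names are hypothetical.

```python
from typing import Sequence

def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer; models both a LUT-based mux and the MUXF5/MUXF6 stages."""
    return in1 if sel else in0

def mux4_slice(d: Sequence[int], s0: int, s1: int) -> int:
    """One slice: the F and G LUTs each implement a 2:1 mux; MUXF5 picks between them."""
    f_lut = mux2(s0, d[0], d[1])
    g_lut = mux2(s0, d[2], d[3])
    return mux2(s1, f_lut, g_lut)          # MUXF5

def mux8_clb(d: Sequence[int], sel: int) -> int:
    """One CLB: two 4:1 slices combined by MUXF6 into an 8:1 mux."""
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    lower = mux4_slice(d[0:4], s0, s1)
    upper = mux4_slice(d[4:8], s0, s1)
    return mux2(s2, lower, upper)          # MUXF6

data = [0, 1, 1, 0, 1, 0, 0, 1]
assert [mux8_clb(data, s) for s in range(8)] == data
```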
Fast Look Ahead Carry Logic
(Figure: a column of four LUTs, each driving a 0/1 carry multiplexer in the vertical carry chain)
Simple, Fast & Complete Arithmetic Logic
Vertical, Up Only Carry Direction
Look Ahead Carry Implementation Yields 32-Bit Counters &
Arithmetic Functions that Perform at 100MHz+
Discrete XOR Component for Single Level Sum Completion
2 Separate Carry Chains in CLB Allow for 3 Operand Functions
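The per-bit arithmetic can be modeled as follows. The LUT computes a propagate signal, CY_MUX either passes the incoming carry or generates one, and CY_XOR completes the sum in one level. This Python sketch evaluates the chain bit by bit; the real chain is dedicated hardware. Feeding the operand bit to the CY_MUX data input as the generate value is an assumption of this model.

```python
def carry_chain_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Bit-by-bit model of the slice carry logic; returns (sum, carry_out)."""
    total, carry = 0, cin & 1
    for i in range(width):                  # carry moves up the column, bit 0 first
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                         # LUT: propagate signal, drives the CY_MUX select
        total |= (p ^ carry) << i           # CY_XOR: single-level sum completion
        carry = carry if p else ai          # CY_MUX: pass carry in, or generate from ai
    return total, carry

s, co = carry_chain_add(0xFFFF_FFFF, 1, 32)
print(hex(s), co)   # 0x0 1
```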
Dedicated Multiplier Fabric
(Figure: a LUT with inputs A and B driving CY_MUX (select S, data DI, carry in CI, carry out CO) and CY_XOR; MULT_AND supplies A×B to the DI input)
Highly Efficient ‘Shift & Add’ Implementation
Logic Added for Implementation of Binary Tree Style Multipliers
30% Reduction in Area for a 16x16 Multiply & 1 Less Logic Level
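In a shift-and-add multiplier, one carry chain can add two partial-product rows, a·b0 and (a·b1) shifted left by one. Bit i needs x = a[i]·b0 and y = a[i-1]·b1. These four signals fill the LUT, which computes p = x XOR y. When p = 0 the two terms are equal, so the carry to generate is x. A dedicated AND gate can regenerate x for the CY_MUX data input without a second LUT. The sketch below models that reading of the figure; the function name is hypothetical.

```python
def two_row_product(a: int, b0: int, b1: int, width: int) -> int:
    """a*b0 + (a*b1 << 1) with one carry chain; b0 and b1 are single multiplier bits."""
    result, carry = 0, 0
    for i in range(width + 1):                         # one extra bit for the shifted row
        x = ((a >> i) & 1) & b0                        # MULT_AND output (also a LUT term)
        y = ((a >> (i - 1)) & 1) & b1 if i else 0      # second partial-product term
        p = x ^ y                                      # LUT uses all 4 inputs
        result |= (p ^ carry) << i                     # CY_XOR
        carry = carry if p else x                      # CY_MUX, DI driven by MULT_AND
    return result | (carry << (width + 1))

# 4-bit a times 2-bit b: one adder row per pair of multiplier bits.
assert all(two_row_product(a, b & 1, b >> 1, 4) == a * b
           for a in range(16) for b in range(4))
```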
SelectShift
Dynamically Addressable Shift Registers (DASRs)
– Ultra-Efficient Programmable Clock Cycle Delay
– Serial In, Serial Out, Clock, Clock Enable, and Shift Depth Address
– Single LUT Maximum Cycle Delay of 16
– Cascade DASRs for Cycle Delays Greater than 16
– CLB Flip-Flops Can be Used for Other Functions or to Add to DASR Depth
(Figure: a LUT configured as a shift register with IN, CE, CLK, DEPTH[3:0] and OUT, drawn as a chain of clock-enabled D flip-flops, and mapped onto the four LUTs of a CLB)
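Behaviorally, a DASR is a 16-bit shift register with a 4-bit read address. Each enabled clock edge shifts IN in, and DEPTH[3:0] selects which stage appears on OUT. The sketch below assumes that DEPTH = 0 returns the most recently captured bit (a one-cycle delay) and DEPTH = 15 gives the 16-cycle maximum. The class name is hypothetical.

```python
class SelectShift16:
    """Behavioral sketch of one LUT used as a 16-deep addressable shift register."""

    def __init__(self) -> None:
        self.bits = [0] * 16          # bits[0] holds the most recently shifted-in value

    def clock(self, d: int, ce: int = 1) -> None:
        """Rising clock edge: shift in IN when the clock enable is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def out(self, depth: int) -> int:
        """OUT selected by DEPTH[3:0]: depth n returns the value captured n edges
        before the most recent one, an (n + 1)-cycle delay (assumed numbering)."""
        return self.bits[depth & 0xF]

srl = SelectShift16()
stream = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
for bit in stream:
    srl.clock(bit)
print(srl.out(0), srl.out(3))   # 1 1
```

Cascading, as the slide describes, would feed the last stage of one LUT into IN of the next.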
SelectShift
(Figure: a 64-bit data path in which Operation A (4 cycles) and Operation B (8 cycles) take 12 cycles in total, while the parallel Operation C takes 3 cycles, leaving a 9-cycle imbalance)
Register Rich FPGAs Allow for the Addition of Pipeline Stages to Increase Throughput
Data Paths Must be Balanced to Maintain Desired Functionality
SelectShift
(Figure: Operation D, a 9-cycle NOP, follows the 3-cycle Operation C so that both 64-bit paths take 12 cycles; paths statically balanced)
SelectShift Feature of the 4-Input LUT Can be Used to Create NOPs
Above Example Uses 64 LUTs to Replace 576 Flip-Flops (64 × 9)
SelectShift
(continued)
(Figure: the same paths, with Operation D as a NOP whose length is set by a # NOP Cycles input (1/10 cycles); paths dynamically balanced)
SelectShift Depth Can be Dynamically Changed
Above Example Uses 64 LUTs to Replace 704 Flip-Flops & 64 2x1 Muxes
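The resource counts on these SelectShift slides come from simple arithmetic, which the sketch below reproduces. The 704 flip-flops in the dynamic case are read here as 64 bits × (1 + 10) register stages, with one 2:1 mux per bit choosing between the two chains. That reading is an assumption that matches the quoted counts, and the function names are hypothetical.

```python
import math

SRL_MAX_DEPTH = 16   # maximum cycle delay of a single SelectShift LUT

def selectshift_luts(width: int, delay: int) -> int:
    """LUTs needed to delay a `width`-bit path by `delay` cycles, cascading past 16."""
    return width * math.ceil(delay / SRL_MAX_DEPTH)

def flipflops_fixed(width: int, delay: int) -> int:
    """Flip-flops for the same fixed delay built as a register chain."""
    return width * delay

def flipflops_selectable(width: int, depths: list[int]) -> tuple[int, int]:
    """Flip-flops and 2:1 muxes if each bit has one register chain per selectable
    depth and muxes choose between them (assumed reading of the figure)."""
    return width * sum(depths), width * (len(depths) - 1)

imbalance = 12 - 3                                   # longer path minus shorter path
print(selectshift_luts(64, imbalance), flipflops_fixed(64, imbalance))   # 64 576
print(flipflops_selectable(64, [1, 10]))                                  # (704, 64)
```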
Internal Bus Support
One Pair of BUFTs Associated with Each CLB
Same ‘Pitch’ as Slice Carry Logic - 2 Bits/Slice
Each BUFT has an Independent Control Input
All CLB Outputs can Source Either BUFT Data Input
Combine BUFTs to Create Wide Muxes
Replace LUT Based Mux Logic to Increase Density
Much Faster than Previous Architectures
Approximately 10ns to Span Entire XCV1000 - 96
Columns
Ties Groups of 4 BUFTs with Bi-directional Look Ahead
Scheme Similar to Slice Carry Logic
Internal Bus Support
And-Or Implementation Replaces Three-State Drivers
Simultaneously Driving BUFTs will not Cause Contention
Capacitance of Entire Load Reduced Dramatically
Slow, Power Hungry Pullups & Weak Keepers Unnecessary
Output Flexibility
Removal of Pullups Allows for Outputs to Span Rows
Segments of 4 Columns Allow for Many Outputs Per Row
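An AND-OR bus computes OR over all drivers of (enable AND data), so it always has a defined value. Several enabled drivers merge instead of contending, and no pullup is needed when nothing drives. The sketch below models such a bus and uses one-hot enables to build a wide mux. Treating an undriven bus as reading 0 is this model's assumption, and the function name is hypothetical.

```python
from typing import Sequence

def and_or_bus(drivers: Sequence[tuple[int, int]], width: int) -> int:
    """Value on a `width`-bit internal bus built as AND-OR logic.

    Each driver is (enable, data); the bus is the OR of the data words whose
    enable is high. With nothing enabled the result is 0 in this model.
    """
    mask = (1 << width) - 1
    value = 0
    for enable, data in drivers:
        if enable:
            value |= data & mask
    return value

# A 4:1 wide mux: one-hot enables select which source reaches the bus.
sources = [0x3, 0xA, 0x5, 0xC]
select = 2
print(hex(and_or_bus([(int(i == select), d) for i, d in enumerate(sources)], 4)))  # 0x5
```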
High Performance Routing
General Purpose Routing
– Routing Delay Depends on Radial Distance
– Routing Structure Designed to Handle High Fanout Nets: 1000+ Loads in Under 10ns
– Much More Predictable; Predictability is Critical for Core Integration & Reuse
– Optimized for 5 Layer Metal
(Figure: CLB array annotated with 2ns routing delays)
High Performance Routing
Significant Compile Time Reduction Without Performance Penalty
Algorithmically Friendly Structure
– Allows for Optimal Connection Delay, Power, Capacitance & Resource Utilization
– Combined With Timing Driven Place & Route, Yields Superior Path Delays
Increasing Device Utilization Does Not Decrease Design Performance
Resource Mix Optimized for Large Devices and for 5 Layer Metal
(Figure: the CLB tile's switch matrix, slices, carry chains, local feedback, direct connects, single, hex and long lines, and internal and three-state busses)
Segmented Routing Architecture
High Performance Routing
Advanced Local CLB Routing
Massive Hierarchical General Routing Resources
Designed For Speed
24 Singles, 72 Hexes, 12 Longs per Tile
(4KXL: 8 Singles, 4 Doubles, 12 Quads, 12 Longs per Tile)
Selective Connectivity Between Resource Types to Limit
Loading
Longs and Hexes Can be Used as Secondary Global
Resources for Clocks and Controls With Sub 10ns
Delays
Special Backbone Routing in Top and Bottom I/O Edges
to Connect Vertical Longs to Create Low Skew
Resources
Increased Switch Matrix Connectivity
Higher Connectivity Eliminates Congestion
Advanced Local CLB Routing
Each LUT Output Can Connect to the Three Other LUTs
– 100ps to 300ps Maximum Delay
– Create 13-Input Functions Within the Same CLB - 2.5ns Total Delay
– Synthesis Tools Use FastConnects on Critical Paths
IMUX Receives 96 Connections from the General Routing Matrix (GRM)
– Highly Exhaustive Connection Matrix
OMUX Equivalent to an 8-Bit 13x1 Mux
– All 8 Outputs Connect to the GRM
– 2 Outputs Can be Used to Connect Directly to the Horizontal Neighbors
– All Outputs Can Feed the 2 BUFTs
(Figure: a CLB with two slices of two LUTs each)
Massive Hierarchical Resources
Routing Needs Based On XCV1000
Loading of Resources Minimized
While Connectivity Increased
Both Long Lines & Hexes are
Buffered To Reduce RC Delays
Longs Have Access Every 6 Tiles
Hexes Have Access at Ends &
Middle
Special Hexes Added to Top and
Bottom to Create High Fanout
Resources with Vertical Long
Lines
Horizontal Singles Connect
Directly to Vertical Long Lines for
Fast Control Signal Distribution
Increased Matrix Connectivity
Previous Families Use Planar Pipulation
– Allows for Routing Along the Same Channel
– Restricts Connectivity of Dissimilar Resources
Virtex Devices Use Non-Planar Pipulation
– Allows for Routing Across Resource Types
– Longs Drive Hexes; Hexes Drive Hexes and Singles; Singles Drive Singles and CLB IMUXs; Vertical Hexes Drive CLB Control Inputs As Well
– CLB OMUXs Drive All Types
Switch Matrix Connectivity Determines Design Routability
– Increased Switch Matrix Connectivity Alleviates Congestion
(Figure: planar pipulation compared with non-planar pipulation)
SelectRAM+
SelectRAM+ Hierarchy
Distributed SelectRAM
Proven Synchronous RAM of the XC4000 Families
16x1 Implemented in a LUT - 4 in Each CLB
32x1 Implemented in a Slice - 2 in Each CLB
Ideal for DSP Applications
Block SelectRAM
True Dual Port, Fully Synchronous RAM
4096-Bit Block Configurable in Widths From 1 to 16
Ideal for Data Buffers & FIFOs
Fast Access to External RAM
133MHz Direct Interface to SSTL3, 3.3V Synchronous DRAM
Distributed SelectRAM
Builds on XC4000 Tradition
– Synchronous Write
– Asynchronous Read
– No Asynchronous Write
Use a Single LUT to Create a RAM16X1S
Use a Pair of LUTs to Create a RAM32X1S or RAM16X1D
– RAM16X1D Comes With One R/W Address & One Read-Only Address
Accompanying Flip-Flops Can Be Used to Register the Read Data
(Figure: primitive symbols. RAM16X1S: D, WE, WCLK, A0-A3, output O. RAM32X1S: D, WE, WCLK, A0-A4, output O. RAM16X1D: D, WE, WCLK, A0-A3, DPRA0-DPRA3, outputs SPO and DPO)
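The RAM16X1D semantics above are synchronous write and asynchronous read on two addresses. The behavioral sketch below models them: a write happens only on a WCLK edge with WE high, while SPO (at A0-A3) and DPO (at DPRA0-DPRA3) follow their addresses with no clock. The class and method names are hypothetical, and the model ignores timing.

```python
class Ram16x1D:
    """Behavioral sketch of RAM16X1D: synchronous write, asynchronous read."""

    def __init__(self, init: int = 0) -> None:
        self.mem = [(init >> i) & 1 for i in range(16)]   # 16 one-bit locations

    def wclk_edge(self, we: int, a: int, d: int) -> None:
        """Rising edge of WCLK: D is written to address A only when WE is high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a: int) -> int:
        """Read through the R/W address A0-A3 (combinational, no clock)."""
        return self.mem[a & 0xF]

    def dpo(self, dpra: int) -> int:
        """Read through the read-only address DPRA0-DPRA3 (combinational)."""
        return self.mem[dpra & 0xF]

ram = Ram16x1D()
ram.wclk_edge(we=1, a=3, d=1)   # write port and SPO address = 3
ram.wclk_edge(we=0, a=4, d=1)   # WE low: nothing is written
print(ram.dpo(3), ram.dpo(4), ram.spo(3))   # 1 0 1
```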
Block SelectRAM
True Dual Port Synchronous RAM
– 2 R/W Ports with Independent Controls
– Synchronous Read & Write
Flexible 4096-Bit Block
– Variable Aspect Ratio
– Each Port can be a Different Width
– Synchronous Reset & INIT Values
Block Count Increases With FPGA Size
– 8 Blocks in the XCV50 - 32Kb
– 32 Blocks in the XCV1000 - 128Kb
– Located on Left & Right Sides with 1 Block Every 4 Rows
State Machines, Decodes, Etc
Sub-10ns Cycle Time For All Widths
(Figure: RAMB4_S#_S# symbol. Port A: WEA, ENA, RSTA, CLKA, ADDRA[#:0], DIA[#:0], DOA[#:0]; port B: WEB, ENB, RSTB, CLKB, ADDRB[#:0], DIB[#:0], DOB[#:0])
Allowed Widths
Width  Depth  ADDR    DATA
1      4096   (11:0)  (0:0)
2      2048   (10:0)  (1:0)
4      1024   (9:0)   (3:0)
8      512    (8:0)   (7:0)
16     256    (7:0)   (15:0)
Block SelectRAM
Library Name Specifies Port Configuration
(Figure: RAMB4_S4_S16. Port A: inputs WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0] and output DOA[3:0], 1K deep and 4 bits wide. Port B: inputs WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0] and output DOB[15:0], 256 deep and 16 bits wide)
Each Port of the Dual-Port RAM Can be Configured with a Different Width
Block SelectRAM
The Dual Ports Access the Same 4096 Bits
– The Depth/Width Ratio Determines How the Bits are Accessed
– For Example: A RAMB4_S4_S16 Has a 1kx4 Port & a 256x16 Port
– Provides Easy Data Width Conversion Without Any Additional Logic
Combine Blocks For Additional Depth & Width
(Figure: 4096-bit storage viewed by a port configured as 1kx4, read on DOA[0:3]. Nibble 1 holds Bit0 to Bit3, nibble 2 holds Bit4 to Bit7, and so on up to nibble 16, which holds Bit60 to Bit63)
(Figure: the same storage viewed by a port configured as 256x16, read on DOB[0:15]. Word 1 holds Bit0 to Bit15, word 2 holds Bit16 to Bit31, word 3 holds Bit32 to Bit47 and word 4 holds Bit48 to Bit63)
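The bit maps show that word n of a port of width w occupies bits n·w through n·w + w - 1 of the shared 4096-bit array, and each port's depth is 4096 / w. The sketch below models that sharing to show width conversion: four nibble writes through a 1kx4 port read back as one word through a 256x16 port. It assumes that data bit k maps to the word's lowest-numbered storage bit plus k. It also omits clocks and enables, although real reads and writes are synchronous. The class name is hypothetical.

```python
class Ramb4Storage:
    """4096 storage bits seen through ports of different widths (e.g. RAMB4_S4_S16)."""
    SIZE = 4096
    WIDTHS = (1, 2, 4, 8, 16)

    def __init__(self) -> None:
        self.bits = [0] * self.SIZE

    def _base(self, width: int, addr: int) -> int:
        """First storage bit of word `addr` for a port of the given width."""
        if width not in self.WIDTHS or not 0 <= addr < self.SIZE // width:
            raise ValueError("address out of range for this port width")
        return addr * width

    def write(self, width: int, addr: int, data: int) -> None:
        base = self._base(width, addr)
        for k in range(width):
            self.bits[base + k] = (data >> k) & 1

    def read(self, width: int, addr: int) -> int:
        base = self._base(width, addr)
        return sum(self.bits[base + k] << k for k in range(width))

ram = Ramb4Storage()
for nibble_addr, value in enumerate([0x1, 0x2, 0x3, 0x4]):
    ram.write(4, nibble_addr, value)      # port A: 1K x 4
print(hex(ram.read(16, 0)))               # port B: 256 x 16 -> 0x4321
```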
Block SelectRAM
Build State Machines & PROM Based Address Decodes
– Subdivide a 32-Bit Address Space into 4096 1MB Blocks
– Using a DLL, the Enable is Available Only 5.1ns After the Rising Edge of the External System Clock
(Figure: a RAMB4_S1 used as a 4096x1 ROM with WE = 0, EN = 1 and RST = 0, clocked by the system clock. Address bits A[31:20] drive ADDR[11:0], DI is not connected, and DO is the Enable output. Location 0 covers addresses 000XXXXX, location 1 covers 001XXXXX, and so on up to location 4095, which covers FFFXXXXX)
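The decode works because the top 12 address bits index a 4096x1 ROM whose contents mark the selected regions. Each location therefore covers 2^20 bytes (1 MB). The sketch below models the ROM contents and the lookup. The selected block numbers are hypothetical, and the model leaves out the synchronous read that makes Enable appear just after the clock edge.

```python
def build_decode_rom(selected_blocks: set[int]) -> list[int]:
    """INIT contents for a 4096x1 RAMB4_S1: 1 in every location whose 1 MB block
    should assert Enable (the block numbers used below are hypothetical)."""
    return [1 if block in selected_blocks else 0 for block in range(4096)]

def decode(rom: list[int], address: int) -> int:
    """ADDR[11:0] = A[31:20], so each ROM location covers 2**20 bytes (1 MB)."""
    return rom[(address >> 20) & 0xFFF]

rom = build_decode_rom({0x000, 0x002, 0xFFF})
print(decode(rom, 0x0000_1234), decode(rom, 0x0010_0000), decode(rom, 0xFFFF_FFFF))  # 1 0 1
```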
Clocking & DLLs
General Clock Support
4 Dedicated Global Low Skew Buffers
Dedicated Input Pin - Intended to Distribute Clocks Only
66 MHz PCI Performance With 500ps Maximum Skew
– 3ns TSetup / 0ns THold - Input IOB Flip-Flop with No Data Delay
– 6ns TClock2Out - Output IOB Flip-Flop
24 Additional Shared Resources
Intended to Distribute Low Skew/High Fanout Signals
Distribute Control Signals Across the Device in Under 10ns
– Additional Clocks, Clock Enables, Three-State Controls & Resets
4 Delay Lock Loops on Each Device
100% Digital Implementation
2 Global Buffers Associated with Each DLL Pair
DLLs Versus PLLs
Both Types are Used to Remove Clock Delay & Provide Additional Clocking Functionality
– Frequency Synthesis, Phase Adjustment & Clock Conditioning
– Both Can be Implemented Using Either Analog or Digital Logic
DLLs Use a Programmable Delay Line in Conjunction with Control Logic that Selects the Delay to Match the Clock Distribution Delay
PLLs Use Programmable Oscillators in Conjunction with Phase Detectors & Filters to Phase Adjust the Clock
(Figure: in the DLL, CLKIN passes through a programmable delay line and the clock distribution network to CLKOUT, and control logic compares CLKFB with CLKIN to set the delay. In the PLL, control logic compares CLKIN with CLKFB and adjusts a programmable oscillator that drives the clock distribution network to CLKOUT)
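A DLL adds delay until the delay-line delay plus the distribution delay equals a whole number of clock periods. At that point the distributed clock edge lines up with the next CLKIN edge. The sketch below models this as a simple control loop that adds one tap at a time. The tap size, the tap limit and the search order are hypothetical, not the actual Virtex control algorithm.

```python
def dll_lock_taps(period_ps: int, distribution_ps: int, tap_ps: int, max_taps: int) -> int:
    """Delay-line taps at which the feedback clock lines up with the next CLKIN edge.

    On each pass, the phase comparison measures how far the CLKFB edge still is
    from the next CLKIN edge; the control logic adds one tap until that gap is
    smaller than a single tap.
    """
    for taps in range(max_taps + 1):
        arrival = distribution_ps + taps * tap_ps     # CLKFB edge time after a CLKIN edge
        gap = (-arrival) % period_ps                  # time left until the next CLKIN edge
        if gap < tap_ps:
            return taps
    raise RuntimeError("delay line too short to lock at this frequency")

# 100 MHz clock (10 ns period), 3.2 ns distribution delay, hypothetical 50 ps taps.
print(dll_lock_taps(10_000, 3_200, 50, 400))   # 136 taps = 6.8 ns of added delay
```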
DLLs Versus PLLs
The Oscillator Used in a PLL Inherently
Introduces Instability & Phase Error
The DLL Architecture is Unconditionally Stable
and Does Not Accumulate Phase Error
It is Generally Accepted that DLLs are Better for
Delay Compensation and Clock Conditioning
PLLs Typically Have an Advantage When
Performing Frequency Synthesis and Can
Operate Over a Larger Input Clock Frequency Range
DLL Functions
(Figure: DLL functions in Virtex)
– Speedup Tc2o: Zero-Delay Internal Clock Buffer
– Clock Phase Synthesis for Use Internally or Externally
– Clock Multiplication & Division for Use Internally or Externally
– Clock Mirror: Zero-Delay Board Clock Buffer
DLL Functions
Speedup Tc2o by Eliminating Clock Distribution Delay
Generate Phase Shifted Clocks
Perform Clock Multiplication & Division
Clean Up Clocks with 50/50 Duty Cycle Correction
Generate Clock Lock for Internal & External Use
Can Require Configuration to Synchronize with DLL Lock
DLL Feedback can be Connected Internally or
Externally
Can be Used to Create Clock Mirrors & Perform
System Synchronization
DLL Tc2o Speedup
(Figure: with the DLL, Tclock = 0ns, so Tc2o = Tc2q + Tout; the external clock pin CLKext and the internal clock CLKint are aligned)
Nullify Clock Delay - Fast Tc2o on XCV1000
– External CLKext Pin and Internal CLKint Clock are Aligned
– 2.5ns Setup/0.0ns Hold & 3.5ns Tc2o on All Devices
Optional Duty Cycle Correction
– 50/50 Duty Cycle Correction Applied When Specified
– Not Sensitive to Clock Input Noise - Use Standard Oscillator Cans
DLL Phase Shift
Coarse Phase Shifts Available
– 0°, 90°, 180°, and 270°
– Available for Internal & External Use
50/50 Duty Cycle Correction Available
(Figure: 100MHz - 180° Phase Shift. A DLL takes a 100 MHz input (0 phase) and produces a 100 MHz output shifted by 180°)
DLL Multiplication
(Figure: a DLL-generated 2x clock driving a data buffer between the I/O and internal logic, with 16-bit and 32-bit data paths)
Generate 2x & 4x Clocks
– Reduce Board EMI and Trace Concerns by Routing Low Frequency Clocks Externally and Multiplying Internally
Cross Clock Domains Without Worry
– Multiplied & Divided Clocks Have Synchronized Edges
– No External Clock Drift & Minimal External Clock Skew Eliminate Metastable Events
DLL Multiplication
2 DLLs on Top & Bottom
– Use 1 DLL on an Edge for 2x Multiplication or Both for 4x Multiplication
– 180 MHz Maximum Output Frequency
(Figure: 66MHz - 2x Clock Multiplication. A DLL multiplies a 66 MHz input by 2 to produce 132 MHz)
DLL Division
Selectable Division Values
– 1.5, 2, 2.5, 3, 4, 5, 8, or 16
– 50/50 Duty Cycle Correction Available
Use DLL Pair to Combine Functions
(Figure: 30 MHz 180° Phase Shift - Clock Multiply & Clock Divide. A pair of DLLs takes a 30 MHz input and produces a 180°-shifted 30 MHz clock, a 15 MHz clock (divide by 2) and a 60 MHz clock (multiply by 2); one 30 MHz output is used for feedback)
Clock Mirrors
Generate Clock Mirrors for
Cascaded & Other Devices
Extremely Low Output
Skew
Rising Edge Skew -20ps*
Falling Edge Skew +40ps*
*Actual Device Measurements
(Figure: 100MHz - 100MHz Clock Mirror. A DLL takes a 100 MHz LVTTL input and drives a 100 MHz LVTTL output, with feedback taken from the external trace)
System Synchronization
Synchronize All Devices
– Eliminate Clock Skew
– Nullify Clock Input & Board Delay in Addition to Internal Distribution Delay
– Chip to Chip Race Conditions Removed
– Increase Chip to Chip Interface Speed - 160MHz
(Figure: a common clock distributed to FPGA 1, FPGA 2, FPGA 3 through FPGA N, each aligned by its own DLL)
DLL Modes
Low Frequency
Input Frequency Range - 25 MHz to 100 MHz
Minimum High/Low Time - 2.2 ns
All 6 Outputs Available for Use Internally & Externally
– CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV
High Frequency
Input Frequency Range - 60 MHz to 200 MHz
Minimum High/Low Time - 2.2 ns
3 Outputs Available for Use Internally & Externally
– CLK0, CLK180 & CLKDV
Both Modes Supported with Simple Design
Primitives
VHDL & Verilog Simulation Support Available
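The mode rules and division values above determine which outputs a given input clock can produce. The sketch below encodes them: it checks the input range for each mode, allows only the listed CLKDV divisors, and offers CLK90, CLK270 and CLK2X only in low-frequency mode. It does not enforce the 180 MHz maximum output quoted for multiplication. The function and mode names are hypothetical.

```python
DIVIDE_VALUES = (1.5, 2, 2.5, 3, 4, 5, 8, 16)   # selectable CLKDV divisors

def dll_outputs(f_in_mhz: float, mode: str, divide: float = 2) -> dict[str, float]:
    """Output frequencies (MHz) of one DLL for the given input clock and mode."""
    ranges = {"low": (25.0, 100.0), "high": (60.0, 200.0)}
    if mode not in ranges:
        raise ValueError("mode must be 'low' or 'high'")
    lo, hi = ranges[mode]
    if not lo <= f_in_mhz <= hi:
        raise ValueError(f"{f_in_mhz} MHz is outside the {mode}-frequency input range")
    if divide not in DIVIDE_VALUES:
        raise ValueError(f"unsupported CLKDV divisor: {divide}")
    outputs = {"CLK0": f_in_mhz, "CLK180": f_in_mhz, "CLKDV": f_in_mhz / divide}
    if mode == "low":
        # CLK90, CLK270 and CLK2X are available only in low-frequency mode.
        outputs.update(CLK90=f_in_mhz, CLK270=f_in_mhz, CLK2X=2 * f_in_mhz)
    return outputs

clocks = dll_outputs(90, "low")
print(clocks["CLK2X"], clocks["CLKDV"])   # 180 45.0
```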
DLL Software Support
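Use the BUFGDLL Macro for Common Clock Usage (0ns Clock Delay)
Build Complex Structures Using the CLKDLL Primitive
(Figure: BUFGDLL equivalent structure. The clock PAD drives an IBUFG into the CLKIN input of a CLKDLL; CLK0 drives a BUFG onto the distributed clock network, and that clock returns to CLKFB. CLKDLL ports: CLKIN, CLKFB and RST inputs; CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV and LOCKED outputs)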
SelectI/O
Supply Voltage Migration
(Chart: supply voltage (5.0, 3.3, 2.5, 1.8 and 1.3V) and feature size (µm) by year, 1990 to 2002, with the point where Virtex FPGAs ship marked; each process step brings lower cost, faster speed, higher density and lower power)
Process Technology Migration Leads to Mixed Voltage Systems
Supply Voltage Migration
(Figure: Virtex & XC4000XV use 2.5V logic and 3.3V I/O supplies; their I/Os accept 5V levels from any 5V device (XC4000E) and meet TTL levels for any 3.3V device (XC4000XL))
Supply Voltage Sequencing Independent
Virtex Supports Additional I/O Standards
SelectI/O
Allows Connection & Use of a Wide Variety of
Devices
Processors, Memory, Bus Specific Standards, Mixed Signal...
Provides Industry Standard IEEE/JEDEC I/O Standards
Maximizes Speed/Noise Tradeoff - Use Only What is Needed
Can Connect to or Create High Performance Backplanes
– PCI, GTL+, HSTL
– DIY - Virtex Based Backplane Design in Progress
Define I/O by Simply Placing Desired Input And/Or
Output Buffers Into the Design
Special IBUF and OBUF Components Provided in Schematic
Based and HDL Based Design Flows
For Example: SSTL3, Class I Output Buffer - OBUF_SSTL3_I
Simplified IOB Structure
Fast I/O Drivers
Separate Registers for Input, Output & Three-State Control
– Asynchronous Set or Reset Available on Each Flip-Flop
– Common Clock, Separate Clock Enables
Programmable Slew Rate, Pullup, Input Delay, Etc
Selectable I/O Standard Support
– Supported Standards List can be Updated After Testing
(Figure: IOB with three DFF/LATCH elements (D, CE, S/R, Q) for input, output and three-state control, connected to the PAD)
How It Works
(Figure: placing OBUF_SSTL3_I or IBUF_SSTL3_I in the design sets the configuration bits that select the SSTL3 Class I output driver for a SelectI/O output or the SSTL3 Class I input receiver for a SelectI/O input)
How It Works
Separate I/O & Core Supply Rails
Programmable Driver Strength
– P & N Drivers Individually Controlled
– 16 Different Settings for Each
Variable I/O & Vref Voltages
– 8 Banks on Each Device
– Specific I/Os are Used as Reference Inputs
Differential Inputs Supported
– nMOS for High Vref
– pMOS for Low Vref
Currently Supported Standards
Standard          VCCO  Vref
LVTTL             3.3   na
LVCMOS2           2.5   na
PCI 33MHz 3.3V    3.3   na
PCI 33MHz 5.0V    3.3   na
PCI 66MHz 3.3V    3.3   na
GTL               na    0.80
GTL+              na    1.00
HSTL-I            1.5   0.75
HSTL-III          1.5   0.90
SSTL3-I           3.3   0.90
SSTL3-II          3.3   1.50
SSTL2-I           2.5   1.10
CTT               3.3   1.50
AGP               3.3   1.32
Applications: General Purpose, PCI, Back-Plane, Hitachi SRAM, SDRAM, Memory, Graphics
I/O Performance
(Chart: Virtex chip-to-chip I/O performance, 0 to 200 MHz, by I/O standard: SSTL3, AGP, HSTL IV, PCI-3.3V, LVCMOS2.5V, TTL-Fast 24mA, TTL-Fast 12mA, TTL-Slow 12mA, TTL-Slow 2mA)
Maximum Chip to Chip I/O Frequency = 1/(Tsetup + Tc2o)*
*DLLs Used to Eliminate Clock Distribution Delay
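The formula can be checked against the DLL timing quoted earlier (2.5ns setup, 3.5ns Tc2o). That gives 1/6ns, or about 167 MHz, which is consistent with the 160 MHz chip-to-chip claims. The sketch below is a direct transcription of the formula with a hypothetical function name. Like the formula, it ignores board trace delay.

```python
def max_chip_to_chip_mhz(t_setup_ns: float, t_c2o_ns: float) -> float:
    """Maximum chip-to-chip I/O frequency = 1 / (Tsetup + Tc2o), in MHz.

    Assumes DLLs have removed clock distribution delay; board trace delay is ignored.
    """
    return 1000.0 / (t_setup_ns + t_c2o_ns)

# 2.5 ns setup and 3.5 ns Tc2o, the DLL figures quoted for all devices.
print(round(max_chip_to_chip_mhz(2.5, 3.5), 1))   # 166.7
```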
SelectI/O Banks
(Figure: the eight SelectI/O banks, BANK 0 to BANK 7, two on each side of the device)
SelectI/O Banks
Each Device is Divided into 8 Banks Regardless of Size
2 Banks on Each Side of the Device
Each Bank has Voltage Sources Shared Among
Associated I/Os in that Bank
All I/O Requiring a Voltage Source Must be of the Same Type
Input Banking - Vref
I/O Standards Which use a Differential Amplifier Require a
Voltage Reference Input
All Fixed Location/Dual Purpose Vref Inputs in a Bank Must be
Used When Supplying a Voltage Reference
Output Banking - Vcco
Dedicated Pins provide drive source voltage for output pins
SelectI/O Input Banks
1 Voltage Reference can be Supplied in a Bank
Any Input Not Requiring a Vref can be Placed in the Bank
Flexible Use of Voltage Reference Inputs
– Pins Can be Used as General Purpose I/O If a Voltage Reference is Not Needed; Otherwise All Must be Used to Supply the Voltage Reference
– Locations are Fixed for Each Device/Package Combination
Any Single Output Buffer Type Can be Placed in the Bank
Multiple Output Buffer Types Must Adhere to Output Bank Rules
OBUFTs with Keeper Circuits Requiring a Voltage Reference are Treated as IOBUFs
SelectI/O Output Banks
Only One Vcco is Supplied to Each Bank
Any Output Not Requiring Use of Vcco can be Placed in the Bank
Any Single Input Buffer Type Can be Placed in the
Bank
Multiple Input Buffer Types Must Adhere to Input Bank
Rules
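The core of the banking rules is that one bank has one Vref and one Vcco. Every input that needs a reference must agree on Vref, and every output that needs a supply must agree on Vcco. The sketch below checks those two rules for one bank, using (VCCO, Vref) values for a few entries from the supported-standards table. It is a simplification: bidirectional I/Os must be listed as both inputs and outputs, and pin-location rules are ignored. The names are hypothetical.

```python
from typing import Optional

# (VCCO, Vref) in volts for a few standards from the supported-standards table;
# None means the standard does not require that voltage.
STANDARDS: dict[str, tuple[Optional[float], Optional[float]]] = {
    "LVTTL": (3.3, None),
    "LVCMOS2": (2.5, None),
    "GTL": (None, 0.80),
    "GTL+": (None, 1.00),
    "HSTL-I": (1.5, 0.75),
    "SSTL3-II": (3.3, 1.50),
    "AGP": (3.3, 1.32),
}

def check_bank(inputs: list[str], outputs: list[str]) -> list[str]:
    """Rule violations for one bank: one Vref for all inputs that need one,
    and one VCCO for all outputs that need one."""
    problems = []
    vrefs = {STANDARDS[s][1] for s in inputs if STANDARDS[s][1] is not None}
    if len(vrefs) > 1:
        problems.append(f"inputs need different Vref values: {sorted(vrefs)}")
    vccos = {STANDARDS[s][0] for s in outputs if STANDARDS[s][0] is not None}
    if len(vccos) > 1:
        problems.append(f"outputs need different VCCO values: {sorted(vccos)}")
    return problems

print(check_bank(["HSTL-I", "LVTTL"], ["HSTL-I"]))   # []
print(check_bank(["GTL", "GTL+"], ["LVCMOS2"]))      # ['inputs need different Vref values: [0.8, 1.0]']
```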
Special Consideration Must be Given to
Configuration I/O
Configuration I/O is Located on the Right Side of the
Device
Serial PROM Downloads Require Vcco Set to 3.3V In
Banks 2 & 3
Non-PROM Serial Downloads Will Generate a Warning (Even Though the Required Vcco Depends on the Data Source)
Thermal Management
Thermal Challenge
Today’s FPGA Density is Absorbing Large Percentages of Board Designs
Because of its Highly Dynamic Nature, Power Can Only be Estimated Before Design Completion
Even as Voltages Decrease, Power Consumption is a Major Concern
How do I Know My Die Temp is Within Spec?
(Figure: factors affecting die temperature: ambient temp, data demands, heat sinking and Vcc tolerance)
Virtex XCV1000
– 75M Transistors*
– 100+ MHz
– Advanced Signal Processing Apps
– 20W+ Power Dissipation
* Pentium II = 7.5 Million Transistors
Thermal Solution
(Figure: the Virtex DXP and DXN pins connect to the DXP and DXN inputs of a Maxim MAX1617, which reports over a 2-pin SMBus serial interface (SMBCLK, SMBDATA) and signals an ALERT* interrupt)
Remote Die Sensor
– Specially Designed to be Used With the Maxim MAX1617
– Simple 2-Pin Interface with No Calibration Required
Provides Two Channels
– FPGA Die Temp Reported from -40°C to +125°C at +/- 3°C
– Maxim Die Temp also at +/- 3°C
Programmable Over-Temp & Under-Temp Alarms
Same Technology as Pentium II
System Management is Now Possible
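On the system side, temperature readings arrive from the MAX1617 as single bytes. The sketch below assumes the MAX1617's 8-bit two's-complement format at 1 °C per LSB and converts a byte to degrees. It also mirrors the over- and under-temperature limits in software. The SMBus transaction itself is not shown, and the function names are hypothetical.

```python
def decode_temp_byte(raw: int) -> int:
    """Temperature byte -> degrees C, assuming 8-bit two's complement at 1 °C per LSB."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("expected one byte")
    return raw - 0x100 if raw & 0x80 else raw

def check_alarm(raw: int, low_c: int, high_c: int) -> str:
    """Software equivalent of programmable under/over-temperature limits."""
    t = decode_temp_byte(raw)
    if t > high_c:
        return "over-temp"
    if t < low_c:
        return "under-temp"
    return "ok"

print(decode_temp_byte(0x7D), decode_temp_byte(0xD8))   # 125 -40
print(check_alarm(0x5A, low_c=0, high_c=85))             # over-temp
```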
SelectMAP
Advanced Configuration
– Master/Slave Serial: Simple Serial Interface
– JTAG: System Integrated Serial
– SelectMAP: High Performance Parallel
Simplified Configuration Mode Set
50 Megabyte/Second Download Rate Using
SelectMAP
Dedicated JTAG Port - No Contention Issues
No Master Parallel Support
Direct, JTAG & SelectMAP Device Readback
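On an 8-bit port, 50 Megabytes/second corresponds to 400 Mb/s, or one byte per cycle of a 50 MHz configuration clock. The sketch below turns that rate into a load time. The one-byte-per-clock assumption and the bitstream size in the usage line are hypothetical; they are not figures for any particular device.

```python
def selectmap_load_ms(bitstream_bits: int, cclk_mhz: float = 50.0, bus_bits: int = 8) -> float:
    """Milliseconds to shift a bitstream through SelectMAP at one byte per CCLK cycle."""
    n_bytes = -(-bitstream_bits // bus_bits)      # round up to whole bytes
    return n_bytes / (cclk_mhz * 1_000)           # cclk_mhz * 1000 = bytes per millisecond

# 50 MHz x 8 bits = 400 Mb/s (50 MB/s); the 8,000,000-bit bitstream is hypothetical.
print(selectmap_load_ms(8_000_000))   # 20.0
```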
Software & Cores Support
HDL Design Entry Focus
Synthesis Support is Critical for Large Designs
Architecture Decisions Made Based on Synthesis Tool Tendencies
Xilinx Relationships With Synthesis Vendors Initiated Direct 4-Input
LUT & Carry Chain Synthesis - The Building Blocks of XL & Virtex
Xilinx Will Continue to Drive Synthesis Vendors to Support Virtex
Specific Features - Block SelectRAM, SelectShift & CLKDLLs
Virtex Architecture Adds Additional Resources That Synthesis
Vendors Easily Synthesize To Today
Implementation Software Written With Synthesis Tool Flow Focus
All Three Major Synthesis Vendors Supported Virtex for Beta
Large Designs Also Require Team Based Design
Must be able to Support Multiple Designers on the Same Device as
Well as Core Integration
Implementation Software
Virtex Software is built on proven M1 technology
Builds on Robust Integration with Third Party Design Entry Tools
Emphasizes Constraint Driven Design Philosophy
Vector Based Interconnect Yields More Predictable Routing
Results
Predictable Results Allow the Placement Algorithms to Make
Better Routing Estimations in Much Less Time
Architecture fully software tested before 1st silicon
Virtex Implementation Software Was Available 18 Months
Before Actual Silicon was Produced
Used Proven Place & Route Software as a Gauge of the
Architecture’s Ability to Meet Density & Performance
Needs
Early Software Allowed for Changes to be Made in the
Finalization of the Architecture - Necessary Routing Mix,
Special Features, etc
A System Level Solution
Virtex is a True System Level Solution
A System Level Solution
Virtex Opens New System Level Applications to FPGAs
1 Extremely Dense - 50,000 to 1,000,000 System Gates
Flexible Architecture
– Efficient for Random Logic, Memory, DSP & Data Path Circuits
– Automatically Implemented by Today’s Leading Synthesis Vendors
Vector Based Interconnect
– Much More Predictable Before Place & Route
– Enhances Synthesis Based Flows
Excellent Platform for Core Integration
– Software Based on Proven M1 Timing Driven Place & Route
2 Hierarchical Memory Support
SelectRAM+ Can be Used to Create Bytes or KBytes of Internal Storage and Access MBytes of Fast External Memory
A System Level Solution
3 System Speedup & Synchronization
Nullify Clock Distribution Delays - 160 MHz System Performance
Synthesize Clocks for Internal and External Use
Synchronize Systems - Create Clock Mirrors & Nullify Board Delay
4 Flexible System Interface
Controllable Current, Input Vref and Vcco Characteristics
Connect Directly to Existing & Emerging I/O Standards
SelectMAP Protocol Allows for Easy Interfacing to µControllers and µProcessors
– 400+ Mb/sec Configuration, Verify & Debug Using a Simple 8-Bit Interface
– SelectMAP Port Can Remain on After Configuration
– JTAG Can Also be Used to Configure