viretech - "PLDWorld.com"

DLLs
The Need for Clock Management
 As system speeds increase, we can no longer
ignore clock skew and noise problems
— A 2ns clock skew matters more with a 6ns clock, than it
does with a 20ns clock
 Need a way to control clock skew and
decrease the effect of noise on the clock
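To put numbers on the skew example above, the short Python sketch below expresses a fixed 2ns skew as a share of each clock period. The function name is illustrative only and does not come from any Xilinx tool.

```python
def skew_fraction(skew_ns: float, period_ns: float) -> float:
    """Return the clock skew as a fraction of the clock period."""
    if period_ns <= 0:
        raise ValueError("period_ns must be positive")
    return skew_ns / period_ns


# The same 2 ns skew against the two clock periods named on the slide.
for period_ns in (20.0, 6.0):
    share = skew_fraction(2.0, period_ns)
    print(f"{period_ns:.0f} ns clock: 2 ns skew uses {share:.0%} of the period")
```

The skew takes a tenth of a 20ns period but a third of a 6ns period, which is why it can no longer be ignored at higher system speeds.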
Ways to Manage the Clock
 PLLs
— Uses analog VCO
— Can suppress incoming clock jitter
— Adds undefined output jitter
— Susceptible to analog noise
— Not easily transferable from one process technology to another
 DLLs
— All digital
— Triggered by incoming clock edge
— Creates output jitter less than 50ps
— Less susceptible to analog noise
— Easily transferable from one process technology to another
DLL Basics
 A DLL works by inserting delay on the clock net until the next
clock input rising edge is in phase with the clock feedback
rising edge.
 Requires a well designed low-skew clock distribution network
so that the clock edges arrive simultaneously everywhere in
the part.
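The locking behaviour described above can be sketched as a small behavioural model in Python. It is only an illustration: the 50 ps tap size, the tap limit and the example delays are assumptions, not Virtex DLL specifications, and a real DLL adjusts its delay line continuously rather than searching from zero.

```python
def dll_lock(period_ps, network_delay_ps, tap_ps=50, max_taps=1000):
    """Step a delay line until CLKFB edges line up with CLKIN edges.

    Returns (taps, inserted_delay_ps, phase_error_ps). tap_ps and max_taps
    are hypothetical values chosen for illustration only.
    """
    for taps in range(max_taps + 1):
        inserted_ps = taps * tap_ps
        # CLKFB lags CLKIN by the inserted delay plus the distribution delay.
        lag_ps = (inserted_ps + network_delay_ps) % period_ps
        # Distance to the nearest CLKIN rising edge, in either direction.
        error_ps = min(lag_ps, period_ps - lag_ps)
        if 2 * error_ps <= tap_ps:
            return taps, inserted_ps, error_ps
    raise RuntimeError("no lock within the available delay taps")


# Assumed example: 100 MHz clock (10 ns period), 3.2 ns distribution delay.
taps, inserted_ps, error_ps = dll_lock(period_ps=10_000, network_delay_ps=3_200)
print(f"locked with {taps} taps ({inserted_ps} ps inserted), error {error_ps} ps")
```

In this example the inserted delay plus the network delay adds up to exactly one clock period, so the clock seen at the flip-flops is in phase with CLKIN.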
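[Figure: DLL block diagram. CLKIN passes through a chain of delay elements to CLKOUT and on to the clock distribution network; the phase delay control compares CLKIN with CLKFB, the feedback from that network.]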
DLL Functions
 Speedup Tc2o
— Zero-Delay Internal Clock Buffer
 Clock Phase Synthesis
— For Use Internally Or Externally
 Clock Multiplication & Division
— For Use Internally Or Externally
 Clock Mirror
— Zero-Delay Board Clock Buffer
DLL Tclock-to-out Speedup
[Figure: the DLL aligns the external clock pin (CLKext) with the internal clock (CLKint), so Tclock = 0ns and Tc2o = Tc2q + Tout for the output flip-flop.]
 Nullify clock delay - fast Tc2o on XCV1000
—External CLKext pin and internal CLKint pin are aligned
—2.5ns setup/0.0ns hold & 3.5ns Tc2o on all devices
 Optional Duty Cycle correction
—50/50 Duty Cycle correction applied when specified
DLL Multiplication
[Figure: CLK feeds a 2x DLL inside the FPGA; the IO data buffer and the internal logic are shown with 16-bit and 32-bit data paths.]
 Generate 2x & 4x clocks
— Reduce board EMI and trace concerns by routing low frequency clocks
externally and multiplying internally
 Cross clock domains without worry
— Multiplied & divided clocks have synchronized edges
— No external clock drift & minimal external clock skew
DLL Division
 Selectable Division Values
— 1.5, 2, 2.5, 3, 4, 5, 8, or 16
— 50/50 Duty Cycle
correction available
— Use DLL pair to combine
functions
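As a rough sketch of the frequency relationships above, the Python function below computes the DLL output frequencies for a given input clock and checks the divider against the selectable values listed on the slide. The function and constant names are illustrative; only the output names CLK0, CLK180, CLK2X and CLKDV come from the slides.

```python
from fractions import Fraction

# Selectable CLKDV division values from the slide.
ALLOWED_DIVIDERS = (Fraction(3, 2), 2, Fraction(5, 2), 3, 4, 5, 8, 16)


def dll_outputs(clkin_mhz, divide_by=2):
    """Return the output frequencies (MHz) of one DLL for a given CLKIN."""
    divider = Fraction(divide_by)
    if divider not in ALLOWED_DIVIDERS:
        raise ValueError(f"unsupported division value: {divide_by}")
    f_in = Fraction(clkin_mhz)
    return {
        "CLK0": f_in,        # same frequency, deskewed
        "CLK180": f_in,      # same frequency, 180 degree phase shift
        "CLK2X": 2 * f_in,   # multiplied by 2
        "CLKDV": f_in / divider,
    }


for name, mhz in dll_outputs(30, divide_by=2).items():
    print(f"{name}: {float(mhz):g} MHz")
```

With a 30 MHz input and a divide-by-2 setting this reproduces the 30 MHz, 60 MHz and 15 MHz clocks used in the slide's example.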
[Figure: a DLL pair driven from a 30 MHz input produces 30 MHz with a 180° phase shift, 15 MHz (divide by 2) and 60 MHz (multiply by 2); one 30 MHz output is used for feedback.]
System Synchronization
 Synchronize all devices
— Eliminate board clock skew
— Nullifies clock input & board
delay in addition to internal
distribution delay
— Removes chip to chip race
conditions
— Increases chip to chip interface
speed - 240MHz for Virtex-E
[Figure: one board CLK distributed to FPGA 1 through FPGA N, each using its DLLs to synchronize to it.]
DLL Applications
 Clock to out Speedup
— High Speed Memory interfaces
— High Speed chip to chip requirements
 Clock Multiplication/Division
— Multiply clock internally, so that the external clock is slower, thus
decreasing the signal integrity problems on the board
 Clock Phase Shift and Duty Cycle Correction
— Double Data Rate applications
— Generation of multiple clocks
 Clock Mirroring
— Generate extra external clocks for fanout issues
— Board level clock management
11
Virtex-E DLL Modes
 Low Frequency
— Input Frequency Range - 25 MHz to 160 MHz
— Maximum Output Frequency - 320 MHz
— Minimum High/Low Time - 2.0 ns*
— All 6 Outputs Available for use Internally & Externally
– CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV
 High Frequency
— Input Frequency Range - 60 MHz to 320 MHz
— Maximum Output Frequency - 320 MHz
— Minimum High/Low Time - 1.3 ns*
— 3 Outputs Available for use Internally & Externally
– CLK0, CLK180 & CLKDV
 Both Modes Supported with Simple Design Primitives
— VHDL & Verilog Simulation Support Available
* Varies with frequency
12
DLL Software Support
 Use BUFGDLL macro for common clock usage
 Build complex structures using clkdll primitive
[Figure: BUFGDLL (0ns clock delay) and its equivalent structure: PAD, IBUFG, CLKDLL and BUFG, with the BUFG output driving the distributed clock network and fed back to CLKFB. CLKDLL pins: CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED.]
What happens if
the CLKIN phase shifts?
 The outputs will phase shift 1-4 clock edges
after the CLKIN shifts.
— Due to this delay inter-chip communication could
have problems since the clock sources are not
aligned.
 LOCKED will stay asserted and the control
logic will remain at the previous setting
 Advice: Keep the phase shift to a longer LOW
pulse.
14
What happens if
the CLKIN changes frequency?
 The control logic may not be able to track
period changes of 1.0ns or more
 The outputs may start to destabilize as the
control logic tries to adjust the delay lines to
compensate.
 What to do: Make sure that a change of
frequency is followed by a reset of the
CLKDLL.
15
What happens if
the operating temperature changes?
 The DLL will automatically adjust for
temperature variance
 DLL specs are guaranteed for chip
temperatures between 0ºC and 85ºC
16
Why can’t I
mux the CLKIN line?
 The CLKIN input must come from an IBUFG,
a BUFG driven from another CLKDLL, or
a DLLIOB
 If a LUT or other routing is placed in the clock path,
the CLKDLL cannot adjust for this unknown
delay
 What to do: Route the net out of the chip and
into an IBUFG or DLLIOB
17
DLL Information
 XAPP132: Using the Virtex DLL
 XAPP400: DLL usage in Software
http://www.xilinx.com/apps/virtexapp.htm
18
Differential Signaling
LVDS, LVPECL, BusLVDS
19
Moore’s Law at Work
Blasting Thru the 100M Transistor Barrier
[Chart: transistor count by device, 1998 to 2000. XCV1000: 75M transistors; XCV2000E: 125M transistors; XCV3200E: 211M transistors.]
I/O Bandwidth Trends
[Chart: I/O bandwidth (MB/s, log scale from 10 to 10,000) against year, 1986 to 2002, with curves for Ethernet, SCSI and Internet backbone.]
I/O Signaling
 Single-Ended I/O Signaling
— TTL
— HSTL
— SSTL
 Differential I/O Signaling
— LVDS
— BLVDS
— LVPECL
The Problem
 As the process shrinks, the absolute I/O noise
margin shrinks as well
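[Chart: Logic 0 and Logic 1 input levels on a 0V to 5V scale for 5V CMOS, 3.3V CMOS and 1.8V CMOS; the gap between the two levels narrows as the supply voltage drops. Marked levels include 1.6 V, 1.0 V and 0.86 V.]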
Differential Signaling
The Solution
 Differential I/O signaling has a higher noise
immunity
 The data is transmitted in the voltage
difference of two lines
 The noise affects both lines, but the voltage
difference stays about the same, which means
that the data is not affected by the noise
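The noise-cancelling argument above can be illustrated with a small Python sketch. The line voltages use the +/- 175 mV swing around a 1.25V midpoint quoted later in the deck for Virtex-E LVDS; the noise values and the single-ended receiver with a fixed 1.25V threshold are made up purely for comparison.

```python
def differential_rx(q_mv, qb_mv):
    """Recover a bit from the sign of the voltage difference Q - QB."""
    return 1 if q_mv - qb_mv > 0 else 0


def single_ended_rx(q_mv, threshold_mv=1250):
    """Recover a bit by comparing one line against a fixed threshold."""
    return 1 if q_mv > threshold_mv else 0


HIGH_MV, LOW_MV = 1425, 1075            # +/- 175 mV around a 1.25 V midpoint
bits = [1, 0, 1, 1, 0]
noise_mv = [300, -250, 120, -400, 90]   # made-up noise coupled onto both lines

diff_out, single_out = [], []
for bit, noise in zip(bits, noise_mv):
    q, qb = (HIGH_MV, LOW_MV) if bit else (LOW_MV, HIGH_MV)
    # The same noise voltage is added to both lines of the pair.
    diff_out.append(differential_rx(q + noise, qb + noise))
    single_out.append(single_ended_rx(q + noise))

print("sent        ", bits)
print("differential", diff_out)
print("single-ended", single_out)
```

The differential receiver recovers every bit because the common noise term drops out of Q - QB, while the single-ended comparison misreads the fourth bit when the noise pulls the line below the threshold.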
24
Differential Signaling
The Benefits
 The benefits:
—
—
—
—
High Noise Immunity… Huge Benefit
Low Power
High Speed I/O transfer
Low EMI
– Noise due to switching cancels between the
two lines, since both lines switch at the same
time, in the opposite direction
25
Differential Configurations
Multidrop
Point to Point
Multi-Point
26
Signal Interconnect Classification
Dual-Pin Differential
 Point-to-Point
— LVDS, LVPECL
— 50 Ω Transmission Lines
 Multi-Drop
— Bus LVDS, LVPECL
— Typically found in backplanes
— 30 Ω Transmission Lines
 Multi-Point
— Bus LVDS, LVPECL
— Typically found in backplanes
— 30 Ω Transmission Lines
27
VIRTEX-E as a Differential Receiver
Point-to-point configuration
[Figure: an LVDS/LVPECL line driver (Q, QB) drives two Zo = 50 Ω lines into the Virtex-E FPGA inputs IN and INX, with termination resistor Rt across the receiver inputs.]
 VIRTEX-E can be driven by any standard LVDS or
LVPECL driver
 VIRTEX-E receiver complies with the LVDS or LVPECL
specs
28
VIRTEX-E as a Differential Driver
Point-to-point configuration
[Figure: Virtex-E FPGA outputs OUT and OUTX (Q, QB) drive through series resistors Rs and a divider resistor Rdiv into two Zo = 50 Ω lines, terminated by Rt at a standard LVDS or LVPECL receiver, or a VIRTEX-E LVDS or LVPECL receiver.]
 Capable of driving any standard LVDS or LVPECL
receiver
29
LVDS
 LVDS stands for:
— Low Voltage Differential Signaling.
 It is a way of communicating using a low voltage
swing (~350 mV) over two differential
connections.
 The Big motivation for developing LVDS
is the need for noise immunity for board
to board communication
30
BLVDS
 BLVDS stands for:
— Bus LVDS
 Bidirectional LVDS
— The device can transmit and receive
LVDS signals through the same pins
 Requires different termination than LVDS
31
Virtex-E LVDS Signaling
+/- 175 mV Swing @ 1.25V Midpoint
[Figure: Q and QB waveforms on a 0.0V to 1.5V scale, with the computed differential signal 2 x (Q-QB).]
32
LVDS Standards
Parameter                  RS-422      PECL               LVDS
Driver output voltage      ~2 - 5 V    ~600 - 1,000 mV    ~250 - 450 mV
Receiver input threshold   ~200 mV     ~200 - 300 mV      ~100 mV
Data Rate                  <30 Mbps    > 400 Mbps         > 400 Mbps
Dynamic Power              Low         High               Low
Noise                      Low         Low                Low
Cost                       Medium      High               Low
LVDS Characteristics
 Termination
— The transmission medium must be terminated with 100 Ω ± 20 Ω.
— The resistor is placed across the differential inputs
— With this termination an LVDS driver can drive signals over
several meters at speeds in excess of 155.5 Mbps (77.7
MHz).
— The real limitation of speed is:
– How fast data can be delivered to the driver.
– The bandwidth performance of the selected media.
— The simple LVDS termination is easy to implement
— ECL and PECL require more complex termination schemes.
34
LVDS Advantages
 Saving Power
— LVDS technology saves power in several important
ways.
— Power dissipation at the terminator is ~1.2 mW
– RS-422 driver delivers 3 V across a termination of 100 Ω, for 90
mW power consumption... 75 times more than LVDS!
— Due to the current mode driver design, the frequency
component of Icc is greatly reduced.
– Compared to TTL / CMOS transceivers where the dynamic
power consumption increases exponentially with the
frequency.
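The terminator power figures above follow from P = V²/R. The Python sketch below redoes that arithmetic, assuming the ~350 mV LVDS swing quoted earlier in the deck appears across a 100 Ω termination; the function name is illustrative.

```python
def resistor_power_mw(voltage_v: float, resistance_ohm: float) -> float:
    """Power dissipated in a resistor, P = V^2 / R, returned in milliwatts."""
    return voltage_v ** 2 / resistance_ohm * 1000.0


lvds_mw = resistor_power_mw(0.350, 100.0)   # ~350 mV LVDS swing
rs422_mw = resistor_power_mw(3.0, 100.0)    # 3 V RS-422 drive

print(f"LVDS terminator:   {lvds_mw:.3f} mW")
print(f"RS-422 terminator: {rs422_mw:.1f} mW")
print(f"ratio:             {rs422_mw / lvds_mw:.1f}x")
```

The exact ratio comes out slightly below the slide's factor of 75, which corresponds to dividing 90 mW by the rounded 1.2 mW figure.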
35
LVDS Advantages
 Save Money
— High performance can be achieved using off the shelf
FPGAs
— LVDS consumes less power, therefore one can use
cheaper power supplies, or fewer fans
— LVDS is low noise, so no more EMI headaches (save
time).
— Since LVDS is much faster than CMOS / TTL, LVDS
signals can be serialized. This results in smaller
packages, simpler connectors, etc
36
LVPECL
 LVPECL stands for
— Low Voltage Positive Emitter Coupled Logic
 Well known industry standard for fast clocking
 Voltage swing (~750 mV) over two differential
connections.
 Virtex-E offers easy interface with other standard
LVPECL chips
38
LVPECL Clocking
 TTL is not the most desired clocking technique
for clock frequencies higher than 150 MHz
[Chart: system clock speed covered by each clocking standard; TTL is shown up to 150 MHz, with LVPECL covering the range above it.]
39
Clock Sources
 TTL oscillator: TTL/CMOS, up to ~135MHz
 Generic LVPECL oscillator: LVPECL, up to ~250 MHz
— Example: Saronix SEL3400 Series
 Quartz crystal (16MHz nominal) with LVPECL clock synthesizer: LVPECL, up to ~400 MHz
— Example: Motorola MC12429, Synergy SY89429V
Virtex-E 300+ MHz LVPECL Clocking
[Figure: an LVPECL clock source feeds an LVPECL clock distributor (example devices: Motorola MC10/100E111, Synergy SY10E111LE), which drives Virtex-E 1 through Virtex-E n over equal-length point-to-point LVPECL PCB clock traces. No LVPECL-TTL translator is needed.]
 Typical discrete solution: PECL-to-TTL
— Motorola MC100EPT23 Dual Differential PECL to TTL Translator, TPD = 2.0ns & Skew
 Virtex-E eliminates the converters and the 2ns delay
41
Virtex-E LVPECL Clock Conversion
Receive and convert high speed clocks with zero delay
Zero-Delay Local Clock Generation to Any of Virtex-E I/O Standards
[Figure: an LVPECL clock enters the Virtex-E; DLLs regenerate it as TTL and SSTL clocks for external RAM, etc.]
Putting it All Together ...
[Figure: the LVPECL clock source and clock distributor (example devices: Motorola MC10/100E111, Synergy SY10E111LE) feed Virtex-E 1 through Virtex-E n over equal-length point-to-point LVPECL PCB clock traces, with no LVPECL-TTL translator; each Virtex-E in turn clocks its attached devices.]
Designing With LVDS and LVPECL
 Some Facts
— Impedance Matching is VERY important
— Discontinuities in impedance WILL create
reflections.
— Reflections degrade signals and show up as
Common Mode Noise.
— Common Mode Noise cancels the magnetic shield
effect of differential lines and radiates as EMI.
— Do not make sharp turns since this causes
impedance discontinuities.
— Keep stubs and uncontrolled tracks < 10 mm.
44
Designing With LVDS and LVPECL
(Continued)
 PCB guidelines:
— Use at least 4 PCB layers (LVDS signals, ground, power, TTL/CMOS
signals)
— Separate TTL/CMOS signals from the LVDS signals
— Keep LVDS driver/receiver connections as close to the
connectors as possible.
— Decouple the power supply as well as possible.
— Connect all the VCC and Ground pins of the
component.
— Make power and ground tracks as wide as possible.
— Connect to power and ground tracks with multiple
vias.
45
Designing With LVDS and LVPECL
(Continued)
 PCB guidelines
— Match the tracks to the impedance of your
transmission medium and termination resistor.
— Run differential tracks as close together as
possible as soon as they leave the IC
— Use Microstrip or Stripline for tracks
— Match electrical length of tracks to reduce skew.
— Keep the distance of a pair of tracks as constant
as possible to avoid discontinuities in impedance.
46
Designing With LVDS and LVPECL
(Continued)
 PCB guidelines
— Use a good matching termination resistor.
– LVDS will not work without resistor termination.
— Typically a single resistor at the receiver is OK.
— Surface mount resistors are best.
– Stubs are short.
– Distance between receiver and termination is short.
– No component leads.
— At extra cost you can use the center tap capacitance
termination scheme.
[Figure: center-tap termination, with two R/2 resistors in series across the pair (total R) and capacitor C at the center tap.]
47
More LVDS and LVPECL Info
At Xilinx’ website:
http://www.xilinx.com/apps/xapp.htm
Look at AppNotes
XAPP230, XAPP231, XAPP232
48
Memory Interfaces
ZBT RAM, SDRAM, DDR SDRAM
49
Virtex-E and High Speed
Memory Interfaces
 Features needed for interface to high speed
memory
— Fast I/Os
— Clock management capabilities
 Virtex-E has both:
— SSTL2, HSTL, LVDS, LVPECL and many more
— 8 on-chip DLLs - use for Clk-to-Out speed up,
clock deskew, clock multiplication/division
50
Benefits of using an FPGA for the
Memory Interface
 Easy to implement
 Can add functionality in the future easily
— ASIC is a one-time-deal
 Combine multiple discrete devices into the
FPGA
— Save space, money, and power
51
High Speed
Memory Interfaces
ZBT RAM Interface
SDRAM Interface
DDR SDRAM Interface
52
Zero Bus Turn-around SRAM
 Extremely high bandwidth
— Other non-cache applications in telecom, test equipment, DSP
and embedded memory applications
 ZBT stands for “Zero Bus Turnaround”
— No idle cycles between read-to-write and write-to-read
— 100% bus use
— Previous architectures had a Turnaround Cycle
 Completely Deterministic Timing - Simplifies System Design
— Any cycle can perform any operation
53
ZBT SRAM Parameters
 Densities: 2, 4 and 8 Mbits
 Data bus widths: 18, 32, and 36-bit
 IO Voltage and standards: 2.5V, 3.3V, LVTTL
 Flow thru speed: 8, 10ns (Clock cycle time)
 Pipeline speed: 5, 6, 7.5ns (Clock cycle time)
ZBT Flow-Through Timing
Read Operation - data available after single clock latency
Write Operation - “Late Write” data to be written is presented on next clock
[Figure: read and write waveforms over clocks 1 and 2, showing Clk, Control, Address and Data.]
55
ZBT Pipelined Timing
Read Operation - data available after two clock latency
Write Operation - “Late Write” data is written 2 cycles later
[Figure: read and write waveforms over clocks 1 to 3, showing Clk, Control, Address and Data.]
56
ZBT 100% Bus Use
Write/Write/Read/Write/Read/Burst Read
[Figure: pipelined part's timing over clocks T1 to T8. Commands Write1, Write2, Read1, Write3, Read2 and RdBrst are issued on consecutive clocks with addresses Addw1, Addw2, AddR1, Addw3 and AddR2; the DQ bus carries w1, w2, R1, w3, R2 and R2+1 on successive cycles with no idle cycle.]
Virtex-E ZBT Bandwidth
800 Mbytes/sec @ 32bits wide
Device                   Frequency  Cycle Time  MAX* Bandwidth  READ/WRITE Cycle  READ/WRITE Burst of 4
                         (MHz)      (ns)        (MByte/sec)     Bandwidth         Bandwidth
ZBT Pipelined            200        5           800             800               800
ZBT Pipelined            166        6           666             666               666
ZBT Pipelined            143        7           572             572               572
ZBT Pipelined            133        7.5         533             533               533
SyncBurst Pipelined      133        7.5         533             267               426
ZBT FlowThrough          100        10          400             400               400
SyncBurst Flow-Through   83         12          332             221               295
Very High Performance Synchronous, Static Memory
NOTE:
The bandwidth figures presented in this table are for a 32 bit data path, the
raw bandwidth is 12.5% higher if a 36 bit data path is used.
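The peak figures above are simply clock frequency times bus width. A minimal Python sketch of that arithmetic follows; it assumes one transfer per clock and does not model the turnaround penalties that lower the SyncBurst read/write figures. The function name is illustrative.

```python
def peak_bandwidth_mbytes(clock_mhz: float, data_bits: int) -> float:
    """Peak bandwidth in MByte/s for one transfer of data_bits per clock."""
    return clock_mhz * data_bits / 8


bw_32 = peak_bandwidth_mbytes(200, 32)
bw_36 = peak_bandwidth_mbytes(200, 36)

print(f"32-bit @ 200 MHz: {bw_32:.0f} MByte/s")
print(f"36-bit @ 200 MHz: {bw_36:.0f} MByte/s ({bw_36 / bw_32 - 1:.1%} higher)")
print(f"36-bit @ 200 MHz: {200 * 36 / 1000:.1f} Gbit/s")
```

The 36-bit result matches both the 12.5% note above and the 7.2 Giga-bits/s figure quoted for the reference design.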
58
ZBT Interface Reference Design
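[Figure: ZBT interface reference design in an XCV300-E. DLL 1 and DLL 2 take CLKin and generate Clk2x; a tester block (Reset, Error) exchanges Addr, Data in and Data out with the controller, which drives Addr, Data and RW# to the ZBT SRAM.]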
ZBT Interface Application Note
• 7.2 Giga-bits/s @ 36 bits wide
• 200 MHz Synthesisable HDL Controller Design
• XCV300-E, -6 speed grade
ZBT Controller Interface with tester resource utilisation: 93 Logic Cells, 502 Flip Flops, 71 IO
Part        Logic Cells  Logic Cell   Flip Flops  Flip Flop    IO         IO
            available    Utilisation  available   Utilisation  available  Utilisation
XCV50-E     1,728        5.38%        1,536       32.68%       180        39.44%
XCV100-E    2,700        3.44%        2,400       20.92%       180        39.44%
XCV200-E    5,292        1.76%        4,704       10.67%       284        25.00%
XCV300-E    6,912        1.35%        6,144       8.17%        316        22.47%
XCV400-E    10,800       0.86%        9,600       5.23%        404        17.57%
XCV600-E    15,552       0.60%        13,824      3.63%        512        13.87%
XCV1000-E   27,648       0.34%        24,576      2.04%        512        13.87%
ZBT Bus Contention - Real World
143 MHz Clock
R/W
Address [0]
Data [0]
Scope shot taken directly from the ZBT controller reference board.
61
Virtex-E High Speed
SDRAM Interface
 SDRAM Overview
— Features
 Virtex-E SDRAM controller
— Features
— Block diagram
— Timing
62
SDRAM
 Features:
— Synchronous interface (free system from wait states)
— Burst mode access (reduce CAS access time)
— Multiple banks
(parallel processing: access one bank,
precharge/refresh the other)
— LVTTL, 3.3V
— Programmable burst length, CAS latency
[Figure: READ timing with CAS latency=2 and burst length=4. After the READ command and column address (Col), data D1 to D4 appears on DQ on successive clocks.]
SDRAM Controller
Application Note
 Synthesizable Verilog/VHDL
 Programmable burst length (1, 2, 4, 8)
 Programmable CAS latency (2, 3)
 Automatically issues refresh commands
 Supports LOAD_MR, AUTO_REFRESH, PRECHARGE,
ACT_ROW, READA, WRITEA, BURST_STOP, NOP
 Interfaces with SDRAM at 125MHz (Virtex-E, -6 speed)
 Uses 2 DLLs and 165 CLB slices (5% of XCV300E)
64
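SDRAM controller
[Figure: block diagram. The system side (62.5MHz clock, controls, data_addr_n, 32-bit AD bus) connects to the controller in an XCV300-E -6, which drives controls, an 11-bit addr bus and a 32-bit data bus to 16M (x16) SDRAM using the 125MHz clock.]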
SDRAM controller IO timing
 Read Cycle is the critical timing:
— SDRAM-8 clk-to-out = 6.0ns
— Virtex-6 setup = 1.7ns
— 125 MHz operation (8ns cycle), 300ps left for board routing on
data lines
 Write Cycle:
— Virtex-6 clk-to-out = 3.9ns
— SDRAM-8 setup = 2.0ns
— 125 MHz operation (8ns cycle), 2.1ns left for board routing
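The routing margins above come from subtracting the clock-to-out and setup times from the cycle time. The Python sketch below repeats that budget in integer picoseconds to avoid rounding; the function name is illustrative, and clock skew is not included because the slide does not quote one.

```python
def board_margin_ps(cycle_ps: int, clk_to_out_ps: int, setup_ps: int) -> int:
    """Time left for board routing after clock-to-out and setup are met."""
    return cycle_ps - clk_to_out_ps - setup_ps


CYCLE_PS = 8_000  # 125 MHz operation

# Read: SDRAM clk-to-out 6.0 ns, Virtex setup 1.7 ns.
read_margin_ps = board_margin_ps(CYCLE_PS, 6_000, 1_700)
# Write: Virtex clk-to-out 3.9 ns, SDRAM setup 2.0 ns.
write_margin_ps = board_margin_ps(CYCLE_PS, 3_900, 2_000)

print(f"read margin:  {read_margin_ps} ps")
print(f"write margin: {write_margin_ps} ps")
```

The read path leaves seven times less slack than the write path, which is why the slide calls it the critical timing.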
67
Virtex-E DDR-SDRAM
Interface
 DDR SDRAM Overview
— Features
— Differences from SDRAM
 Virtex-E SDRAM controller
— Features
— Block diagram
— Timing
— Board layout guideline
68
DDR SDRAM
 Features:
— Next generation SDRAM
— DDR data I/O (twice the bandwidth at the same
clock frequency as SDRAM)
— Peak bandwidth: 1.6 GBytes/s (64-bit @ 100MHz)
— 2.5V, SSTL2, 100/133MHz
— Advantages over RDRAM: cost, package, open
industry spec, compatible with existing spec
— Supported by major vendors: Micron, Samsung, IBM,
Fujitsu, Hitachi, Hyundai, Toshiba,...
69
DDR SDRAM
 Differences compared to standard SDRAM:
— All IOs are SSTL2, 2.5V (reduce power and noise)
— Differential clock (CLK and CLKB). Positive edge
clock is the crossing of CLK going high and CLKB
going low.
— Bidirectional data strobe (clock-to-data skew is
eliminated)
— Double Data Rate data transfer
70
Write Cycle
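[Figure: write cycle waveforms. SDRAM: clk, cmd (ACT, NOP, WRITE), addr (ROW, COL) and data D1 to D4, one word per clock. DDR SDRAM: clk and clkb, the same cmd and addr sequence, plus dqs, with data D1 to D4 transferred on both clock edges.]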
Read Cycle
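[Figure: read cycle waveforms. SDRAM: clk, cmd (ACT, NOP, READ), addr (ROW, COL) and data D1 to D4, one word per clock. DDR SDRAM: clk and clkb, the same cmd and addr sequence, plus dqs, with data D1 to D4 transferred on both clock edges.]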
DDR SDRAM controller
Application Note
 Synthesizable Verilog
 Virtex-E, -6 speed grade: 100 MHz Clk
— 200 MHz Data rate
— 1.6 Giga-Bytes/S bandwidth @ 64 bits wide
 Programmable CAS latency, burst length
 2 DLLs, 474 slices (15% of XCV300-E)
 Uses “Logic Accessible Clock” technique
 Uses Clock to latch Read Data, instead of DQS
73
DDR SDRAM controller
Virtex-E
74
DDR SDRAM IO timing
Data Lines: Read Cycle
 Data Lines
— Read cycle is critical. Data is strobed by clk,
instead of DQS
— Minimum DDR clk-out (relative to ddr_clk): -0.8ns
— Minimum Virtex-E hold time: -0.4ns
Minimum trace delay on data = 0.8ns - 0.4ns - clock skew between ddr_clk & fpga_clk
= 0.4ns - clock skew
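The hold-time calculation above can be written as a one-line budget. The Python sketch below uses integer picoseconds and keeps the slide's sign convention for clock skew; the function name and the 150 ps skew in the second call are assumptions for illustration.

```python
def min_data_trace_delay_ps(tac_min_ps: int, fpga_hold_ps: int,
                            clock_skew_ps: int = 0) -> int:
    """Minimum data trace delay needed to meet the FPGA hold time.

    Data can change as early as tac_min_ps relative to ddr_clk and must stay
    stable until fpga_hold_ps relative to the capturing clock edge.
    """
    return fpga_hold_ps - tac_min_ps - clock_skew_ps


# Slide values: minimum DDR clk-out -0.8 ns, minimum Virtex-E hold -0.4 ns.
print(min_data_trace_delay_ps(tac_min_ps=-800, fpga_hold_ps=-400))
# Same values with a hypothetical 150 ps skew between ddr_clk and fpga_clk.
print(min_data_trace_delay_ps(-800, -400, clock_skew_ps=150))
```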
75
DDR SDRAM IO timing
Addr/Cntrl Lines
 Address and Control lines are generated on the
negative edge of the clock, to guarantee DDR hold
time
— Negative edge to next positive edge of ddr_clk: 5ns
— Virtex-E clk_out (max): 2.4ns
— DDR setup time: 1.2ns
Maximum trace delay on Addr/Cntrl = 5ns - 2.4ns - 1.2ns - clock skew
= 1.4ns - clock skew
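The setup budget above can be checked the same way. This Python sketch works in integer picoseconds; the function name and the 200 ps skew in the second call are assumptions for illustration.

```python
def max_addr_trace_delay_ps(half_period_ps: int, fpga_clk_to_out_ps: int,
                            ddr_setup_ps: int, clock_skew_ps: int = 0) -> int:
    """Maximum address/control trace delay that still meets DDR setup time.

    The signals launch on the negative clock edge and are captured on the
    next positive edge, so half a clock period is available.
    """
    return half_period_ps - fpga_clk_to_out_ps - ddr_setup_ps - clock_skew_ps


# Slide values: 5 ns window, 2.4 ns Virtex-E clk_out (max), 1.2 ns DDR setup.
print(max_addr_trace_delay_ps(5_000, 2_400, 1_200))
# Same values with a hypothetical 200 ps clock skew.
print(max_addr_trace_delay_ps(5_000, 2_400, 1_200, clock_skew_ps=200))
```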
76
DDR SDRAM IO timing
Summary
 The I/O spec for DDR is very tight
 Carefully calculate data and address trace
delays to guarantee setup and hold times
 The minimum trace delay on the data lines
can be eliminated by delaying the ddr_clk
— Since DDR has negative tAC(min), delaying the ddr_clk
helps meet Virtex-E’s hold time requirement
77
Board Layout Guideline
 All high speed memory interfaces
— Virtex device and the memory chips must be placed close to each
other
— Consider/Simulate board level signal integrity and timing, pay
particular attention to clocks
— Use matched impedance traces
 DDR
— All bi-directional signals use IOBUF_SSTL2_II (data & data strobes)
other output signals use OBUF_SSTL2_I
— DQ lines must be closely matched, and kept short to minimize cross
talk
— DQS trace lengths should match DQ
— CLK and CLKB delays and loads should match (CLKB can also be
routed back to an unused IOB near the feedback pin)
78
Memory Interface
Application Notes
 ZBT RAM: XAPP136
 SDRAM: XAPP134
 DDR SDRAM: XAPP200
http://www.xilinx.com/apps/virtexapp.htm
79
CAM in Virtex-E
80
CAM Overview
 Content Addressable Memory
 Storage Array (like RAM)
 Find a location of a particular stored value
 Compare input against data in memory
– If Match found, output the Address
– Maximum performance, if match in a single
clock cycle
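The behaviour described above can be sketched in software. The Python class below is a behavioural model only: a hardware CAM compares every location in parallel within a clock cycle, whereas this loop visits them one by one. The class and method names are illustrative.

```python
class Cam:
    """Behavioural model of a content addressable memory."""

    def __init__(self, depth: int):
        self.words = [None] * depth   # None marks an empty location

    def write(self, address: int, value: int) -> None:
        """Store value at address, as in an ordinary RAM write."""
        self.words[address] = value

    def match(self, value: int) -> list:
        """Return every address whose stored word equals value."""
        return [addr for addr, word in enumerate(self.words) if word == value]


cam = Cam(depth=16)
cam.write(2, 0x99)
cam.write(7, 0x3C)

print(cam.match(0x99))   # addresses holding 0x99
print(cam.match(0x11))   # no match gives an empty list
```

A RAM takes an address and returns data; the model shows the reverse lookup, taking data and returning the address or addresses where it is stored.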
81
CAM Overview
 Simple RAM and CAM compared
RAM
Add [9:0]
1024 x 8
Dout [7:0]
CAM
Add [9:0]
Din [7:0]
1024 x 8
Match
82
CAM Applications
 Telecommunications
 Networking
 Ethernet
 ATM
 Protocol
83
CAM Overview
 CAM features:
— Word Size (width)
— Number of Words (depth)
— Match or Compare Time (read)
— Significance of Write Speed
— Clock Frequency
— Masks
— Decoded and/or Encoded Address (outputs)
84
CAMs in Virtex-E
Depth  Width  Size      Match       Device    Logic
32     8      256 bits  4.5 ns      XCV50-6   BRAM
256    8      2Kbits    8.5 ns      XCV50E-6  BRAM
32     16     512 bits  8 ns        XCV50-6   SRL16
128    40     5Kbits    12 ns       XCV300-6  SRL16
4096   16     64Kbits   16 x 20 ns  XCV400-6  RAM16x1
 Flexible CAM designs in Virtex and Virtex-E
— CAM implemented in a LUT
— CAM implemented in a Block SelectRAM
85
Designing CAM in Virtex slices
 XAPP203: “Designing Flexible, Fast CAMs with
Virtex Family FPGAs”:
— VHDL and Verilog Reference Designs available
 Features
— 4 bits per LUT
— 16-word x 4-bit organization
— Match in one clock cycle
— 16 Write clock cycles
— Decoded address output
— Generic word width from 4 bits up to any multiple of 4
— Generic number of 16 words CAM blocks
— Cascadable
— Address Encoder in logic or tri-state buffers (TBUF)
86
CAM in a LUT
Match Operation
8
DATA_IN
MATCH_SIGNAL
D
D
A[0:3]
4
Q
Q
FF
SRL16
CLK
LUT
D
4
Q
A[0:3]
SRL16
“1”
Wide AND
LUT
1 slice
Reconfigurable 8-bit Word Comparator
87
Match Waveforms for CAM in a LUT
“…1001”
DATA_IN
MATCH_ENABLE
MATCH
“xxxx xxxx xxxx xxxx”
“0000 0000 0000 0100”
“xxxx”
“0010”
R_MATCH_OK
R_MATCH_ADDR
CLK
Match_cycle
DATA_IN
CAM
16WORDS
MATCH_ENABLE
88
MATCH
Encode_cycle
R_MATCH_ADDR
ENCODE
R_MATCH_OK
CAM in a LUT
Write Operation
DATA_IN
8
4-bit
Compare
4
D
A[0:3]
Q
SRL16
MSB
LUT
4
4
4-bit
Compare
D
Q
A[0:3]
SRL16
LSB
LUT
Counter
1 slice
Reconfigurable 8-bit Word Comparator
89
Cascading CAMs in LUTs
 CAM match path (1 CLK) & encode (1 CLK)
DATA_IN
8
Array of N x 16_WORDS
16
D
CAM_16WORDS
Encode
4 LSB
CAM_16WORDS
Encode
4 LSB
CAM_16WORDS
Encode
4 LSB
CAM_16WORDS
CLK
MATCH_ENABLE
90
MATCH_ADDR
Encode
4 LSB
Q
FF
Encode
MSB
16 FFs
MATCH_OK
D
Q
FF
CAM in Block SelectRAM
 XAPP204: “Using Block SelectRAM+ for High-Performance Read/Write CAMs”:
— VHDL and Verilog Reference Designs available
 Features
— 128 bits per Block SelectRAM+
— 16-word x 8-bit organization
— Match in one clock cycle
— Write in one clock cycle (and Erase in one clock cycle)
— Decoded address output
— Fully synchronous match and write ports (Independent)
— Cascadable
— Address Encoder in logic or tri-state buffers (TBUF)
91
CAM in a Block SelectRAM+
 CAM 16x8 Macro in 1 Block SelectRAM+
ERASE_WRITE
DATA_WRITE[7:0]
ADDR[3:0]
DIA[0]
8
12
PORT A
ADDRA[11:0]
WEA
ENA
4
WRITE_ENABLE
“0”
CLK_WRITE
RSTA
DOA
N.C.
CLKA
“0000….0000”
DATA_MATCH[7:0]
“0”
MATCH_ENABLE
MATCH_RST
DIB[15:0]
ADDRB[7:0]
WEB
ENB
RSTB
CLK_MATCH
PORT B
DOB[15:0]
CLKB
RAMB4_S1_S16
92
MATCH[15:0]
Cascading Block SelectRAM+
CAMs for bigger depth
 CAM 64-word x 8-bit in Read Mode
8
8
8
CAM (16x8)
CAM (16x8)
CAM (16x8)
MATCH[63:0]
8
CAM (16x8)
DATA_MATCH[7:0]
64
[63:48]
48
[47:32]
32
[31:16]
CLK_MATCH
[15:0]
93
16
Cascading Block SelectRAM+ CAMs
for higher width
 CAM 16-word x 16-bit in Read Mode
MATCH[15:0]
[15:8]
[15]
CAM (16x8)
[15]
[7:0]
[15]
CAM (16x8)
[1]
DATA_MATCH[15:0]
[1]
[1]
[0]
[0]
[15:0]
CLK_MATCH
[15:0]
94
[0]
CAM in Block SelectRAM+
The final picture
 CAM16x8 Macro
— Match flag and encoded outputs
Write port A
(4096 x 1)
DATA[7:0]
ADDRB[7:0]
MATCH[15:0]
Read port B
(256 x 16)
DOB[15:0]
CLK_MATCH
ENCODE
Decoded Address
MATCH_ADDR[3:0]
16
D
CLKB
Q
FF
CLK_MATCH
95
4
MATCH_SIGNAL
CAM in Virtex FPGAs
 Basic decoder/comparator block designed using:
— Virtex slices configured as 16-bit shift registers (8 bits per slice)
— Virtex dual port block SelectRAM+ (128 bits per block)
 Use an array of basic blocks to implement a CAM
[Chart: achievable CAM width (bits, 0 to 300) against CAM depth in words (128 to 15360) on an XCV2000E, for BRAM 16x8b blocks (size = 20,480 bits) and slice 1x8b blocks (size = 122,880 bits).]
XILINX CAMs comparison
Device              VIRTEX & VIRTEX-E        VIRTEX & VIRTEX-E       VIRTEX & VIRTEX-E
Implementation      Slices RAM16x1 based     Slices SRL16 based      Block SelectRAM
Min. CAM size       10 bits per LUT          4 bits per LUT          128 bits per Block
Max CAM size        ~500 Kbits (XCV3200E)    ~200 Kbits (XCV3200E)   26 Kbits (XCV3200E)
MATCH (# of clock)  16 cycles                1 cycle                 1 cycle
WRITE (# of clock)  1 cycle                  16 cycles               1 cycle (+1 erase cycle)
Min. CAM width      1 bit                    4 bits                  8 bits
Min. CAM depth      16 words                 1 word                  16 words
Max. CAM depth      ~64 K 8-bit words        ~25 K 8-bit words       3,328 8-bit words
Fastest Match       16 x 12 ns               7.5 ns                  4.5 ns
Decoded Address     yes (by 16)              yes                     yes
Design              Ref. Design 202          Ref. Design 203         Ref. Design 204
SRL16
SelectShift LUT
 Dynamically addressable Shift Registers, implemented in one LUT
[Figure: a 16-stage shift register (stages 0 to 15, shared CE and CLK) fed from IN; ADDR[3:0] selects which stage drives OUT. Each CLB contains two slices and each slice two LUTs.]
SelectShift Features
 Serial In, Serial Out
 Does not require an address counter
 Programmable cycle delay from 1 to 16
— Addr[3:0] specifies the desired delay
 Cascade for cycle delays greater than 16
 CLB Flip-Flops can be used to add depth
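The features above can be illustrated with a small behavioural model in Python. It is a sketch for intuition, not the Xilinx simulation primitive: the class and helper names are made up, and initial contents are assumed to be all zeros.

```python
class Srl16:
    """Behavioural model of a 16-bit shift register with an addressable tap."""

    def __init__(self):
        self.stages = [0] * 16

    def clock(self, d: int, ce: int = 1) -> None:
        """Clock edge: shift d in at stage 0 when clock enable is high."""
        if ce:
            self.stages = [d] + self.stages[:15]

    def q(self, addr: int) -> int:
        """Output of the stage selected by ADDR[3:0]."""
        return self.stages[addr & 0xF]


def delay_line(samples, addr):
    """Feed samples in one per clock and record Q after each clock edge."""
    srl = Srl16()
    out = []
    for d in samples:
        srl.clock(d)
        out.append(srl.q(addr))
    return out


# A single pulse emerges after addr + 1 clock edges, here 4 edges for addr=3.
print(delay_line([1, 0, 0, 0, 0, 0], addr=3))
```

No address counter is involved: changing `addr` alone moves the tap, which is how the delay is programmed from 1 to 16 cycles.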
100
Software Support
 Primitives available in software
— SRL16: 16-bit Shift Register Look-Up-Table (pins D, CLK, A0, A1, A2, A3, Q)
— SRL16E: 16-bit Shift Register Look-Up-Table with Clock Enable (pins D, CE, CLK, A0, A1, A2, A3, Q)
 Positive or negative clock edge triggered
 Clock Enable optional
 Available for VHDL or Verilog instantiations
101
SRL16 Applications
 Shift Registers
 Delayed Signal Generation
 Linear Feedback Shift Registers (LFSRs)
 CRC circuits
102
Virtex-E Configuration
103
Agenda
 Review of configuration Modes
— Serial, Parallel, JTAG
 Startup Sequence
 XC1800 PROM interfacing
 Daisy Chaining
 Tips in debugging configuration issues
 JTAG Configuration
104
Operation Flow
 POWER UP, then CONFIGURATION (Serial Mode, Parallel Mode or JTAG), then Device Operational
 Configuration Data stored in a PROM or downloaded through a cable
 Configuration time depends on
— device size
— type of configuration
— clock speed
Configuration Modes
 Serial Modes
— Master
— Slave
 Parallel Mode
— SelectMAP
 JTAG
106
Serial Mode Configuration
[Figure: Master Serial Configuration Mode. Virtex-E CCLK drives PROM CLK, PROM DATA drives DIN, DONE drives /CE and /INIT drives /RESET/OE.]
 Serial Configuration
— Master mode: the Virtex-E device is initiating the
configuration
— Slave mode: the Virtex-E device is waiting for
some other device to start the configuration
107
Serial Mode Configuration
 Data is loaded serially- one bit per CCLK
 A Virtex-E device in Master Serial Mode
produces its own CCLK
— CCLK rate is controllable in software
— Mode used with a PROM
 In a Slave Serial Mode, Virtex-E device needs
a CCLK provided by another device
— All download cables do this
108
Parallel Mode Configuration
SelectMAP
[Figure: a microprocessor drives the Virtex-E SelectMAP pins D0-D7, /CS, /WRITE, CCLK and PROG, and monitors DONE.]
 One byte loaded per CCLK
 Designed to be driven by other logic device
— Another FPGA or CPLD
— Processor
— Microcontroller
— MultiLinx Cable
109
Important Signals in SelectMAP
 Data(D0-D7)- bi-directional data bus
— D0 is the MSB
 /WRITE- direction of data on the bus
— Low for configuration (Write)
— High for readback
 /CS- enable for the data bus
— a High will ignore CCLK transitions
 BUSY- output that indicates when data can be
received
— Not needed for CCLK < 50 MHz
110
SelectMAP- Things to Know
 Initialization needed after /INIT goes high
— 3 CCLKs needed
— If /CS and /WRITE are asserted early , no data
will be transferred on the first CCLK
 To strobe data, use /CS, not /WRITE
— If a CCLK rising edge occurs when /CS is
asserted and /WRITE is de-asserted, an ABORT
will occur
– Need to reload Sync Word and redo last
packet
111
Virtex-E Bitstream Format
 10 internal configuration registers
 Bitstream is actually a set sequence of writes
into those registers
 Configuration data still broken into frames
 All data is encapsulated into packets- Type I
and Type II
 When migrating from Virtex to Virtex-E a new
bitstream is needed
112
Configuration Registers
Symbol  Register Name/Description
CMD     Command Register- executes commands to control read/write, CRC, etc.
FLR     Frame Length- indicates frame size (available in XAPP138)
COR     Configuration Option Register- some user selected options from Bitgen
MASK    Mask Register- masks out bits of CTL register for security
CTL     Control Register- handles internal functions like Port Persistence
FAR     Frame Address Register- sets the starting frame address
FDRI    Frame Data Input- pipelined input register that receives frame data
CRC     Cyclic Redundancy Check- loaded with CRC value that checks for errors
FDRO    Frame Data Output- pipelined output register for reading frame data
LOUT    Legacy Data Output- pipelines data to the DOUT pin
 Each register has a 5-bit address
 Detailed information in XAPP 138
113
Configuration Startup Sequence
 Four signals to control
— GWE (Global Write Enable)
— GSR (Global Set/Reset)
— GTS (Global 3-State)
— DONE (External Done Pin)
 Six phases to select assertion/de-assertion (1-6)
 Sequencer will wait in the DONE phase until DONE
goes high
 Can create “Sync-To-Done” behavior by setting GTS,
GSR, and GWE to same cycle as DONE
114
Startup Sequence
[Figure: startup timing over StartupClk phases 0 to 7, showing the phase at which DONE, GTS, GSR and GWE change; the default phase is shown in bold.]
115
Virtex-E and XC1800 PROM’s
 Can program via serial or SelectMAP mode
— serial vs. parallel controlled in software
116
Daisy Chaining
[Figure: a PROM feeds DIN of the master Virtex-E #1, whose DOUT drives DIN of slave Virtex-E #2, whose DOUT drives DIN of slave Virtex/4kX #3.]
 Available only in Serial or JTAG Mode
 Concatenation of bitstreams does not work
— Use the software to generate the necessary
bitstreams (PROMGen)
117
Debugging Tips and Info
 What causes /INIT to go low?
— CRC check fails
— Internal error, e.g. data loaded too fast
 When will an error stay undetected?
— A bit is missed or added- this will misalign the
instructions, and the CRC check won’t happen
 Mode pin considerations
— Internal pullups are guaranteed
— Make sure pulldown is strong enough (4.7k)
118
JTAG Configuration
119
What is JTAG?
 JTAG - Joint Test Action Group
— Developed as standard testing interface
— Boundary Scan, IEEE STD 1149.1
 Four Dedicated Pins Required:
— TDI, TDO, TMS, and TCK
— TRST is an optional 5th pin that Xilinx does not
use
120
JTAG Standard
 JTAG Standard - 16 State, State Machine
— TAP (Test Access Port)
— IR (Instruction Register)
— DR (Data Register)
— Bypass Register
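The 16-state TAP controller named above can be written down as a transition table. The Python sketch below encodes the standard IEEE 1149.1 transitions as (next state for TMS=0, next state for TMS=1); the helper names are illustrative, and the model covers state sequencing only, not the instruction or data registers.

```python
def _scan_column(reg):
    """States of one scan column (reg is 'DR' or 'IR') as {state: (tms0, tms1)}."""
    return {
        f"Capture-{reg}": (f"Shift-{reg}", f"Exit1-{reg}"),
        f"Shift-{reg}": (f"Shift-{reg}", f"Exit1-{reg}"),
        f"Exit1-{reg}": (f"Pause-{reg}", f"Update-{reg}"),
        f"Pause-{reg}": (f"Pause-{reg}", f"Exit2-{reg}"),
        f"Exit2-{reg}": (f"Shift-{reg}", f"Update-{reg}"),
        f"Update-{reg}": ("Run-Test/Idle", "Select-DR-Scan"),
    }


TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    **_scan_column("DR"),
    **_scan_column("IR"),
}


def walk(state, tms_bits):
    """Apply one TMS value per TCK rising edge and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


# From reset, TMS = 0,1,0,0 reaches Shift-DR; TMS = 0,1,1,0,0 reaches Shift-IR.
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

One useful property of this table is that holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is why a dedicated TRST pin is optional.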
JTAG Tap Controller
[Figure: TAP controller state diagram with TMS-labelled transitions between Test-Logic-Reset, Run-Test/Idle, Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR, Select-IR-Scan, Capture-IR, Shift-IR, Exit1-IR, Pause-IR, Exit2-IR and Update-IR.]
JTAG TAP Controller:
Architecture
123
BSDL Files
 Boundary Scan Description Language
 BSDL Files define the hardware
— Description of the die, with pins and scan chain
order
— Information about the size of the various chip
specific registers (e.g. instruction register length)
 Unconfigured BSDL files are provided
— Assumes all I/Os are bidirectional
124
BSDL Availability
 Files on the web are continuously updated
— Current software does not always have most
recent BSDL file
 HTTP://support.xilinx.com -> Software
125
JTAG Programmer
Software Support for Virtex-E
 JTAG Software Support in M2.1i SP3
— Non invasive: Idcode, Bypass, Usercode
— SVF file generation
 Stay current with the download tools
— Service packs
— Web Pack (pc only)
Foundation or Alliance software updates at:
http://support.xilinx.com/support/techsup/sw_updates/
JTAG Programmer at:
http://www.xilinx.com/sxpresso/webpack.htm
126
Cables
 Provided by Xilinx
 Multilinx
— Supported in 2.1i sp2 JTAG Programmer
— USB or Serial ports
— Win 98 only
 Parallel Cable III
 XChecker
127
Cables: JTAG Connections
* If there is a TRST trace on the board, it should be tied high
128
JTAG Debugging Tips
 Debug Chain Software Tool (Logic Probe)
 /TRST pin should be tied high on 3rd party chips
 Noise or bad parallel port
 ISP Checklist app note XAPP104
 Know all devices in chain and the order
 Virtex-E does not tolerate 5V signals directly
129
Good References
 Virtex-E Datasheet- basic information on configuration modes
 XAPP138- Configuration modes, packets and readback
 XAPP151- Detailed bitwise explanation of configuration
registers, partial reconfiguration hints and advanced concepts in
readback
 XAPP139 - Detailed information on JTAG configuration and
readback for VIRTEX devices
 XAPP153 - Status and Control register information for partial
reconfiguration information
http://www.xilinx.com/apps/virtexapp.htm
130