Digital Interface Design
EECS150 Fall 2008 – Lecture #23
Greg Gibeling
Slides adapted from everywhere
11/18/2008
EECS150 Lecture #23
1
Motivation
Any useful system includes at least two interfaces: input and output
In a computer: keyboard & screen
In your project: audio & video
The most difficult work in any system is matching incompatible interfaces
Compare CS70 and CS61B
Compare K-maps or adder design and your project
You will be designing interfaces
Either hardware or software
The basic ideas presented here apply fairly widely
Outline
Quick Review: SDRAM and Audio
Principles
Interfaces
Metrics: Bandwidth, Latency, Pin Count & Logic Overhead
Datapath & Control (States & Events)
Synchronization: Clock & Reset
Handshaking (Ready/Valid)
Protocols (structure, syntax, semantics)
Simple Interfaces: SPI, I2C, UART, N64
Intermediate Interfaces: LCD, Ethernet (10M-10G), Interchip
CPU Interfaces: ISA, PCIe
Design
Back to principles
Reuse & Standardization
Modeling, Verification & Debugging
Quick Review (1 of 4)
So What?
Almost everything needs storage
Lots of space -> DRAM
SDRAM
SDRAM is BIG
Time multiplex address lines
2 Dimensional Address (Row & Column)
Often Shared
Arbitration for access
Affects performance
[Figure: block diagram of shared SDRAM access. Labeled blocks: Mux, SDRAM Arbiter, SDRAM Control, Micron SDRAM Chips. Labeled signals: Address, Data, Handshaking, Select, RST, and the commands Act, Read, Write, Ref.]
Quick Review (2 of 4)
SDRAM (cont)
Steps to Read/Write
Send Row Address (RAS)
Send Column Address (CAS)
Send/Get Data (For 2,4,8 cycles)
Wait (precharge, autorefresh, etc)
Synchronous Interface
Uses a clock & bursts to increase bandwidth
Control requires precise timing
Issue sequences of commands
Timing must be matched to clock frequency
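The read steps above can be sketched as a per-cycle command sequence. This is a minimal Python model, not a real controller: the command names (ACTIVE, READ, NOP, PRECHARGE), the 9-bit column width and the CAS latency of 2 are illustrative assumptions rather than values from the lecture or a specific Micron part, and required gaps between commands (for example between activate and read) are left out.

```python
# Hypothetical model: command names, address widths and latencies are assumptions.
COL_BITS = 9  # assumed number of column address bits


def split_address(addr, col_bits=COL_BITS):
    """Split a flat address into (row, column) for the time-multiplexed address lines."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)


def read_burst_commands(addr, burst_len=4, cas_latency=2):
    """Return one (command, argument) pair per clock cycle for a single read burst."""
    row, col = split_address(addr)
    cmds = [("ACTIVE", row)]           # step 1: send row address (RAS)
    cmds.append(("READ", col))         # step 2: send column address (CAS)
    # step 3: idle while the CAS latency elapses and the burst is on the data bus
    cmds += [("NOP", None)] * (cas_latency + burst_len - 1)
    cmds.append(("PRECHARGE", None))   # step 4: close the row before the next access
    return cmds


if __name__ == "__main__":
    for cycle, cmd in enumerate(read_burst_commands(0x1234)):
        print(cycle, cmd)
```

The point of the sketch is that the controller issues a fixed sequence whose length depends on the burst length and latency, which is why the timing has to be matched to the clock frequency.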
Quick Review (3 of 4)
So what?
Example data stream
Low bandwidth
Includes control
Audio
Primary interfaces are analog
Audio is analog
Mixers, etc…
Bit Serial
Low & fixed bandwidth
Low complexity
Expandable (e.g. 5.1, 7.1)
Quick Review (4 of 4)
Audio (cont)
Pair of shifters
Simple sync framing
Abstract registers
Highly stateful
VERY low bandwidth
[Figure: AC97Controller block diagram. Labeled blocks: Decode, Sync, IORegister, BitCount, shift registers on SDataIn and SDataOut, Mux, Driver, FullVolumeControl, Audio Buffer, Control, Audio Codec. Labeled signals: CodecReady, {CMD_A, CMD_D}, CMD_Request, CMD_Valid, PHY_RX_CLK (~25MHz), AP_BIT_CLOCK (12MHz), 32b PCM Audio Data, 32b PCM Audio Recorded Data, Handshaking.]
Metrics (1 of 3)
So What?
We need some way to judge good vs bad
Allows us to compare interfaces without guessing
Evaluate tradeoffs and requirements in a formal manner
Objective Metrics
Bandwidth
Latency
Pin Count
State & Logic Overhead
Subjective Metrics
Documentation
Ease of use or debugging
Elegance
Metrics (2 of 3)
Bandwidth
High or Low
Higher is always better, but e.g. humans can only hear so much
Video, Audio are classic, but programs need instructions, which means DRAM bandwidth
Fixed or Variable
Raw video or audio have fixed bandwidth, compression (e.g. MP3) can make this vary
Network bandwidth varies because of sharing
Latency
High or Low
Lower is usually better
If there’s no elastic buffer (no way to say “I’m not ready”)
This can cause data loss or require extra buffering, which is costly
Humans are very sensitive to gross latency
Generally reducing latency is VERY HARD without affecting the clock rate
Fixed or variable
Generally referred to as “jitter”
E.g. on VOIP phones, Audio is fixed latency, network is variable, so we have a problem
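The claim that raw audio and video have fixed bandwidth can be checked with simple arithmetic. The sketch below is illustrative only: the sample rates, bit depths and the 640x480 frame size are assumed example values, not figures from the lecture.

```python
def raw_bandwidth_bps(samples_per_second, bits_per_sample, channels=1):
    """Bits per second of an uncompressed stream; constant for a fixed format."""
    return samples_per_second * bits_per_sample * channels


# Assumed example formats (not from the lecture):
AUDIO_BPS = raw_bandwidth_bps(48_000, 16, channels=2)   # stereo 16-bit PCM at 48 kHz
VIDEO_BPS = raw_bandwidth_bps(640 * 480 * 30, 24)       # 24-bit pixels at 30 frames/s

if __name__ == "__main__":
    print(f"audio: {AUDIO_BPS / 1e6:.3f} Mbps")
    print(f"video: {VIDEO_BPS / 1e6:.3f} Mbps")
```

Under these assumptions the audio stream needs about 1.5 Mbps and the video stream about 221 Mbps, regardless of content. A compressed stream has no such closed form, because its bandwidth depends on the data.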
Metrics (3 of 3)
Pincount
Fast becoming a major problem
Chip area grows with N^2, Pins are N for DIPs or N^2 for BGAs
Either way pins are just physically large
They require a lot of area
They are slow and power hungry
Serial vs Parallel
Old: Parallel for high bandwidth
New: Serial for high bandwidth
What changed?
State & Logic Overhead
This is where major cost & complexity come into play
The bigger the circuit the more places to have a bug
Also affects power, yield and price
Interfaces can be very large
For example DDR2 SDRAM on a Virtex2 Pro
The FPGA couldn’t support the clocking/handshaking easily
Required an incredible amount of logic to make up for this
Never very reliable as a result
Datapath & Control (1 of 4)
So What?
Separates the data & control
Allows us to understand the meaning of signals
Separates timing from dataflow
Datapath
Variable information not known until runtime
Regular structure or meaning (e.g. all integers)
Easy to design and debug
Control
Circuits which deal with meaning and timing
Small, irregular and complicated
Difficult to design and debug, even harder to extend
Datapath & Control (2 of 4)
Datapath Signals
Wires which carry a value with temporal significance
Form the backbone of the datapath
May include “control” values
E.g. that this is a value to be written to DRAM
This is common in “data stationary control”
Coding
Common Codes
Binary: easy to understand, easy to work with
One-hot: allows inexpensive decoding
Gray Code: asynchronous logic, one bit change at a time
Other issues: state coding, floating point, etc
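The common codes listed above can be compared with a short sketch. The function names are hypothetical helpers for illustration; the conversions themselves are the standard ones.

```python
def to_one_hot(value, width):
    """One-hot: exactly one bit set, so decoding is a single-wire test."""
    if not 0 <= value < width:
        raise ValueError("value does not fit in the given width")
    return 1 << value


def to_gray(value):
    """Binary to reflected Gray code: adjacent values differ in one bit."""
    return value ^ (value >> 1)


def from_gray(gray):
    """Gray code back to binary by XOR-ing all right shifts together."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


if __name__ == "__main__":
    for n in range(8):
        print(n, format(n, "03b"), format(to_gray(n), "03b"), format(to_one_hot(n, 8), "08b"))
```

The one-bit-per-step property is what makes Gray code useful when a value is sampled asynchronously: a sample taken mid-transition is off by at most one count.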
Datapath & Control (3 of 4)
Control Signals
Wires which carry timing, but little data
Form the backbone of the control logic
Enables, resets, and so forth fall into this category
Event Coding
Edge (neg or pos)
Generally we only use the clock edge in FPGA designs
Latch based designs use edges all the time, of course
Pulse High
Do something when a wire is 1, usually relative to a clock edge
Pulse Change
Do something when a signal is different than on the last cycle
Time
Do something a certain amount of time after a previous event
Measured with a clock in synchronous systems
Possible to build “delay lines” using transistors and gates
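The event codings above can be illustrated on a signal sampled once per clock cycle. This is a software sketch with a hypothetical function name, not hardware: it remembers the previous sample the way a one-bit register would and classifies each cycle.

```python
def detect_events(samples):
    """Classify each cycle of a 0/1 signal sampled once per clock.

    Returns a list of (cycle, kinds), where kinds may contain
    "high" (pulse high), "posedge", "negedge" and "change" (pulse change).
    """
    events = []
    prev = samples[0]  # plays the role of a register holding last cycle's value
    for cycle, cur in enumerate(samples[1:], start=1):
        kinds = []
        if cur == 1:
            kinds.append("high")
        if prev == 0 and cur == 1:
            kinds.append("posedge")
        if prev == 1 and cur == 0:
            kinds.append("negedge")
        if cur != prev:
            kinds.append("change")
        events.append((cycle, kinds))
        prev = cur
    return events


if __name__ == "__main__":
    for cycle, kinds in detect_events([0, 1, 1, 0]):
        print(cycle, kinds)
```

Time-based events would add a counter that starts on one of these events and fires when it reaches a chosen number of cycles.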
Synchronization (1 of 4)
Clocking
1 Clock
Fully synchronous, no need to worry about the issue
May have multiple resets
2 Clocks
Clock Crossing, easy to keep straight
E.g. hold video in reset until SDRAM is ready
Can get pretty complex (e.g. CPU & JTag)
Often use Async FIFOs and dual port RAMs on FPGAs
These are expensive in ASICs, use synchronizers
Obviously multiple resets
Local Clocks & LocalResetGen
Often restricted to use in an interface (e.g. interchip)
May not be free-running
Often require careful design to avoid issues
Synchronization (2 of 4)
Reset
1 Clock, no initialization
Multistage Initialization
2 Clocks
Usually reset is synchronous to one clock
May need a shift register to resynchronize reset
Self starting
Reset for one module depends on state of another
Using the ButtonParser is an example of this
Useful for generating a reset for the rest of the system
Any device which “just works” on power-up has one
Can be built on FPGA by using a shift register with an initial value
Local Resets & LocalResetGen
Reset logic can affect clocking & reliability
May be requirements like holding reset for some time
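The self-starting reset built from a shift register with an initial value can be simulated in a few lines. This is a behavioral sketch under assumptions: the register length of 4 and the all-ones initial value are example choices, and the function name is hypothetical.

```python
def power_on_reset(length=4, cycles=8):
    """Simulate a shift register that powers up as all ones and shifts in zeros.

    The last stage drives the reset output, so reset is held high for
    `length` cycles after power-up and then stays low.
    """
    shreg = [1] * length          # initial (power-up) value, an assumption
    trace = []
    for _ in range(cycles):
        trace.append(shreg[-1])   # reset output seen during this cycle
        shreg = [0] + shreg[:-1]  # clock edge: shift a zero in
    return trace


if __name__ == "__main__":
    print(power_on_reset())
```

Making the register longer is one way to meet a requirement such as holding reset for some minimum time.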
Synchronization (3 of 4)
[Figure: diagram not recoverable from the extraction. Signal and stage labels: Clock, Reset, Long Reset Count, LocalClock, LocalClockReset, LocalRegReset, Sync Shift, Resync Shift, Resync/ED/FF (LCSEnable), WasResetValid.]
Synchronization (4 of 4)
[Figure: block diagram not recoverable from the extraction. Labeled blocks: LR Counter, Long Reset Compare, Sync Shift, Resync Shift, TO Counter, Timeout Compare, D FF. Labeled signals: Reset, LocalClock, LocalClocks, LocalClockSelect, LocalClockReset, LocalRegReset, WasResetReady, WasResetValid.]
Handshaking (1 of 4)
So What?
When things happen is vital
Hardware modules must cooperate in order to be useful
Planning out all interaction timings on the drawing board is best, but often hopeless
Handshakes
Pipelined (None)
2 & 4 Cycle (Self-timed)
Ready/Valid (Synchronous)
Handshaking (2 of 4)
4 Cycle
RTZ: Return to Zero
Fewer transistors
Easier to debug
2 Cycle
NRTZ: Non-RTZ
More transistors
Not really faster
Can be synchronous
GasP
RTZ handshaking
Carefully delay matched circuits
No clock!
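The difference between 4 cycle (return to zero) and 2 cycle (non return to zero) handshakes shows up in the wire transitions needed per transfer. The sketch below lists them for a request/acknowledge pair; the signal names req and ack are assumptions, since the slide does not name the wires.

```python
def four_phase(n_transfers):
    """4 cycle (RTZ): both wires rise and then return to zero for every transfer."""
    events = []
    for _ in range(n_transfers):
        events += ["req+", "ack+", "req-", "ack-"]
    return events


def two_phase(n_transfers):
    """2 cycle (NRTZ): every transition, rising or falling, signals a transfer."""
    events = []
    req = ack = 0
    for _ in range(n_transfers):
        req ^= 1
        events.append("req+" if req else "req-")
        ack ^= 1
        events.append("ack+" if ack else "ack-")
    return events


if __name__ == "__main__":
    print(four_phase(1))
    print(two_phase(2))
```

Two transfers in the 2 cycle scheme produce the same wire activity as one transfer in the 4 cycle scheme; the cost is that the receiver must detect changes rather than levels, which is consistent with the note that it takes more transistors.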
Handshaking (3 of 4)
Ready/Valid
Symmetric
Simplifies generation and checking
Independent
Avoid combinational loops
Latency Insensitive
Allows modules to run at their own pace
Trades cost to do this!!
Composable
Allows the pass-through
Coregen FIFOs asymmetric
FIFOPassthrough
Send/Accept
Same signals, new names!
Why? Read on….
[Figure: timing diagram with Clock, Data, Valid and Ready, marking the Transfer; diagram labels also include Send and Accept.]
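A Ready/Valid link can be modeled cycle by cycle: a word moves only in a cycle where the sender asserts Valid and the receiver asserts Ready. The sketch below is a simplified software model with a hypothetical function name; it assumes the sender holds Valid and the data steady until the transfer happens.

```python
def simulate_ready_valid(items, ready_pattern):
    """Return (cycle, item) for each transfer over a Ready/Valid link.

    items: words the sender wants to send, in order.
    ready_pattern: the receiver's Ready signal, one 0/1 value per cycle.
    """
    received = []
    index = 0
    for cycle, ready in enumerate(ready_pattern):
        valid = index < len(items)      # sender has something to offer
        if valid and ready:             # transfer happens only when both are high
            received.append((cycle, items[index]))
            index += 1
    return received


if __name__ == "__main__":
    print(simulate_ready_valid(["a", "b", "c"], [1, 0, 0, 1, 1, 1]))
```

Neither side needs to know the other's latency: the receiver stalls the sender simply by holding Ready low, and no data is lost or duplicated.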
Handshaking (4 of 4)
Composition Failure
Arbiter chooses one of two inputs
Router chooses one of two outputs
Ready0 & Valid1
Any time two modules are connected by two paths…
Classes
Class1: No dependencies
Class2: Dependencies between ports
Class3: Dependencies within ports
[Figure: Arbiter and Router blocks with signals Valid0, Ready0, Valid1 and Ready1.]
Protocols (1 of 5)
So What?
Knowing the data isn’t enough, we need meaning
Just like language we build representations of meaning
Knowing the patterns to meaning, allows us to abstract it
Structure
How the data fits together
Parallel: all the bits at once
Counted: there are a fixed number of words, we count them off
Framed: adding a higher level handshake allows variable length
Syntax
We’ll cover this more in the next few slides
Semantics
What the data means
Highly dependent on the interface in question
Terms: The Band
In Band: the data we’re trying to move
Out of Band: control, metadata and other issues
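Counted and framed structures differ in how the receiver finds packet boundaries. The sketch below is illustrative: the function names and the per-word "last" flag are assumptions standing in for whatever higher level handshake a real interface uses. The flag is an example of out of band information, while the words themselves are in band.

```python
def split_counted(words, count):
    """Counted: every packet has exactly `count` words, so we count them off."""
    if len(words) % count:
        raise ValueError("stream does not contain a whole number of packets")
    return [words[i:i + count] for i in range(0, len(words), count)]


def split_framed(stream):
    """Framed: each word arrives with a flag marking the last word of a packet."""
    packets, current = [], []
    for word, last in stream:
        current.append(word)
        if last:
            packets.append(current)
            current = []
    return packets


if __name__ == "__main__":
    print(split_counted([1, 2, 3, 4], 2))
    print(split_framed([(1, False), (2, True), (3, True)]))
```

The framed version handles variable length packets at the cost of one extra signal per word.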
Protocols (2 of 5)
Dataflow Based
Audio, video, instructions in a CPU
Generally when there’s little (no) OOB data
Usually parallel or counted for simplicity
Benefits
Excellent handling of LTI or independent data values
Simple production and consumption
Little or no state, e.g. a valid bit is all you need
Allows construction of specialized hardware (DSP designs for example)
Drawbacks
Very difficult, if not impossible to deal with exceptions
For playing audio: what if you need data but it’s not there?
When things fail there’s often nothing you can do
Protocols (3 of 5)
Command Based
Useful for low bandwidth peripherals
Organized according to master/slave
E.g. draw a line, write a word to memory
Benefits
Very easy to build new slaves
Clear demarcation of responsibility (Good for CPUs)
Generally very easy to expand, just add new commands
Drawbacks
Tends to be very low performance
Overhead to specify command
No parallelism
Usually requires some polling (interrupts are poll based)
Requires master to know state at all times
Protocols (4 of 5)
Register Based
Stateful peripherals with lots of config
Organized according to master/slave
Often used alongside a dataflow interface
Benefits
Provides a memory-like abstraction
Allows the master to read state easily
Easy to deal with exceptional conditions (error flag)
Drawbacks
Medium performance
Overhead to specify read/write and register address
DMA can help with this
Requires a clear master, often meaning an FSM/CPU
Protocols (5 of 5)
Layering
Uncommon to have one syntax
They are easy to layer
Dataflow on top of command
Each command can be a “write <data>”
Not entirely efficient, but gets the job done
This is how software FIFOs and networks work
Register on top of command
Two commands: read & write
Relatively common, allows command wires to be shared
This is how most memories, especially DRAMs work
Command on top of register
Writing a certain value to a register indicates the command
Perhaps a series of writes to registers
Many CPU peripherals do this
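Layering a command syntax on top of a register syntax can be sketched as a peripheral whose only interface is read and write, where writing a particular register triggers an action. Everything named here (the class, the register addresses, the command codes and the accumulator behavior) is hypothetical and chosen only for illustration.

```python
class RegisterPeripheral:
    """Toy peripheral: the master only reads and writes registers."""

    CMD, ARG, STATUS = 0, 1, 2   # hypothetical register addresses
    CMD_CLEAR, CMD_ADD = 1, 2    # hypothetical command codes

    def __init__(self):
        self.regs = {self.CMD: 0, self.ARG: 0, self.STATUS: 0}
        self.total = 0

    def write(self, addr, value):
        self.regs[addr] = value
        if addr == self.CMD:               # writing CMD is what issues the command
            if value == self.CMD_CLEAR:
                self.total = 0
            elif value == self.CMD_ADD:
                self.total += self.regs[self.ARG]
            self.regs[self.STATUS] = self.total

    def read(self, addr):
        return self.regs[addr]


if __name__ == "__main__":
    p = RegisterPeripheral()
    p.write(p.ARG, 5)            # first write sets up the argument
    p.write(p.CMD, p.CMD_ADD)    # second write issues the command
    print(p.read(p.STATUS))
```

The argument register shows the "series of writes" case: the command takes effect only on the final write, and the master can read the result back as ordinary state.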
Simple Interfaces (1 of 4)
So What?
Uses few wires
No tristates
Synchronous
SPI
Signals: SO, SI, CS, CLK
Uses: CC2420, ADC
Bit Serial
Bidirectional
Often used with register syntax
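The bit serial, bidirectional nature of SPI can be modeled as two shift registers that swap contents over eight clock cycles. This is a sketch under assumptions: bits are sent most significant bit first, SI and SO are named from the slave's point of view, and CS and CLK are implied by the loop rather than modeled.

```python
def spi_exchange(master_byte, slave_byte):
    """Exchange one byte in each direction, MSB first.

    Returns (byte received by the master, byte received by the slave).
    """
    master, slave = master_byte & 0xFF, slave_byte & 0xFF
    for _ in range(8):                       # one iteration per CLK cycle
        si = (master >> 7) & 1               # bit the master drives onto SI
        so = (slave >> 7) & 1                # bit the slave drives onto SO
        master = ((master << 1) | so) & 0xFF
        slave = ((slave << 1) | si) & 0xFF
    return master, slave


if __name__ == "__main__":
    print([hex(b) for b in spi_exchange(0xA5, 0x3C)])
```

With a register syntax on top, the byte sent by the master would typically carry a read/write flag and register address, and the byte shifted back would carry the register contents.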
Simple Interfaces (2 of 4)
So What?
Fewest pins (almost)
Control, not data
Long distance
I2C
Uses two wires
Master/Slave
Includes handshake
Bit Serial
Bidirectional
Often used with register syntax
Simple Interfaces (3 of 4)
So What?
Very few pins (3)
No clock required
Long distance
History
In IBM PCs
RS232 and RS485
Still widely used
UART
Bit serial
No clock signal
Good & Bad
Relies on timing for events
Often used with dataflow syntax
Simple/cheap
Noise resistant
Problems
Low bandwidth
Limited by internal timing clocks
Very low level protocol
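Because a UART has no clock signal, the receiver finds each byte from the framing alone. The sketch below assumes the common 8N1 frame (one start bit, eight data bits sent least significant bit first, one stop bit, idle line high) and works on one sample per bit time; the slide does not specify a frame format, and a real receiver would oversample the line.

```python
def uart_encode(byte):
    """Frame one byte as 8N1: start bit 0, data bits LSB first, stop bit 1."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def uart_decode(bits):
    """Recover bytes from a line sampled once per bit time (idle level is 1)."""
    out, i = [], 0
    while i < len(bits):
        if bits[i] == 1:          # idle line or stop level: keep looking for a start bit
            i += 1
            continue
        frame = bits[i:i + 10]
        if len(frame) < 10 or frame[9] != 1:
            raise ValueError("framing error: missing stop bit")
        out.append(sum(b << k for k, b in enumerate(frame[1:9])))
        i += 10
    return out


if __name__ == "__main__":
    line = [1, 1] + uart_encode(0x41) + [1] + uart_encode(0x00)
    print(uart_decode(line))
```

The dependence on counting bit times from the start bit is the "relies on timing for events" property: both ends must agree on the bit rate in advance.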
Simple Interfaces (4 of 4)
So What?
N64 Controllers
Used in projects
N64
Asynchronous
More robust than UART
Command Syntax
Main: Reset & Read Buttons
Other: Status, Mempack, EEPROM
Receiving a bit:
Look for 1’b1 (Stop) -> 1’b0 (Start)
Wait 1us (why 1us?!?)
Capture Data
[Figure: N64 bit timing. Each bit cell lasts 4us and is labeled Start (1'b0), Data, Stop (1'b1), with a "0 to 1" annotation; an example waveform shows bit cells numbered 0 to 8, cells 0 to 6 labeled 1'b0 and cells 7 and 8 labeled 1'b1.]
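The receive steps above can be tried on an idealized waveform. The sketch assumes one sample per microsecond, a line that idles high, and a 4us bit cell made of 1us of Start (low), 2us of Data and 1us of Stop (high); the function names are hypothetical and real hardware would need to tolerate skew between the two ends.

```python
def n64_encode(bits):
    """Build a waveform at 1 sample/us: each bit becomes Start, Data, Data, Stop."""
    wave = [1]                    # line idles high before the first bit
    for b in bits:
        wave += [0, b, b, 1]      # 4 us per bit
    return wave


def n64_decode(wave):
    """Find each Stop -> Start falling edge, wait 1 us, then capture the data."""
    bits = []
    for i in range(1, len(wave) - 1):
        if wave[i - 1] == 1 and wave[i] == 0:   # 1'b1 (Stop) -> 1'b0 (Start)
            bits.append(wave[i + 1])            # sample taken 1 us after the edge
    return bits


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 0, 1]
    print(n64_decode(n64_encode(message)) == message)
```

Every bit cell begins with its own falling edge, so the receiver resynchronizes on each bit instead of once per byte, which fits the claim that this is more robust than a UART.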
Intermediate Interfaces (1 of 4)
So What?
HD44780, standard
4 or 8b operation
Interesting timing
LCD
Interface
LCD_DB[7:0]: Data
LCD_RS: Register select
LCD_RW: Read/Write
LCD_E
Enable/Strobe
Provides timing
Intermediate Interfaces (2 of 4)
So What?
Used everywhere
Framed structure
Dataflow syntax
10M-1G Ethernet
Bit Serial Link
4/5bit Encoding takes 20% overhead
Bit5 is used for Data-Valid and Error
Preamble used for clock extraction
Inter Frame Gap ensures packets aren’t back-to-back
CRC used to detect errors from transmission
Ethernet Packet Format (each row is 32bits)
Destination [47:16]
Destination [15:0] | Source [47:32]
Source [31:0]
Ethernet Type [15:0] | Data [15:0]
Data [31:0]
Data [31:0]
CRC [31:0]
Intermediate Interfaces (3 of 4)
10M-1G Ethernet
MAC Rx Unit
Wait for DataValid & SFD
Start shifting/FIFOing data
Wait for DataValid to go low
Check CRC, discard/mark packet
Transmit is similar
CRC
An LFSR based code
Appended to the end of each frame
Used to ensure nothing is corrupted
[Figure: MAC Rx FSM (Simplified). Labels: Idle, Preamble, SFD, Receive. Transition conditions: Valid & ~SFD, Valid & SFD, Error | ~Valid.]
[Figure: MAC Rx Detailed Block Diagram. Labeled blocks: FSM, Nibble Counter & Reset, MACShift, Check Ethernet CRC, CRC Valid?. Labeled signals: PHY_RX_D (4b Raw Ethernet Data), PHY_RX_DV (1b Ethernet Data Valid), 1b Data Valid Signal, 32b CRC, 32b Ethernet Packet Data.]
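The "LFSR based code" can be made concrete with a bit-at-a-time CRC-32, which is the shift-and-XOR structure an LFSR implements in hardware. The sketch uses the reflected form of the standard CRC-32 polynomial and checks itself against Python's zlib.crc32, which implements the same CRC; the function name is hypothetical, and the bit ordering of a real Ethernet frame on the wire is not modeled.

```python
import zlib

CRC32_POLY_REFLECTED = 0xEDB88320  # reflected form of the CRC-32 polynomial 0x04C11DB7


def crc32_bitwise(data):
    """Compute CRC-32 one bit at a time, as a software model of an LFSR."""
    crc = 0xFFFFFFFF                          # register starts as all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:                       # feedback bit set: shift and apply the taps
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                   # final inversion


if __name__ == "__main__":
    payload = b"123456789"
    print(hex(crc32_bitwise(payload)), crc32_bitwise(payload) == zlib.crc32(payload))
```

A receiver runs the same computation over the incoming frame and compares the result with the appended value; a mismatch is what leads to the discard/mark step.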
Intermediate Interfaces (4 of 4)
So What?
Source Synchronous
Very high bandwidth
966Mbps per pair
Interchip
Dataflow structure
Send clock alongside data
Requires async FIFO
Differential pairs require special signaling for this
CPU Interfaces (1 of 3)
So What?
Allow CPU to control peripherals
Old: Simplicity of I/O devices (no FPGAs back in the day)
New: Bandwidth (audio & video)
Key Assumptions
CPU is in control
Separation of data (high bandwidth) and control (very low latency)
Basic Organization
Historically “bus” based
Single arbiter, or even single master
Most devices are simple and respond only
Memory/register centric (e.g. read/write ops)
Newer point to point designs
PCIe, HyperTransport
Based on command packets (e.g. read/write ops)
CPU Interfaces (2 of 3)
So What?
Very widespread standard
Simple enough to describe here
ISA
Synchronous bus
Assumes 1 cycle access
8MHz standard
Basic Operations
Address (CPU -> IO)
Control (CPU -> IO)
Data (CPU <-> IO)
Extensions
DMA
Interrupts
History
IBM PC XT
8b and then 16b
PnP Added Later
Open Standard
CPU Interfaces (3 of 3)
So What?
Higher bandwidth than old parallel busses
Overcomes pin limitations
Separates physical and logical transport to allow more complex analog design
PCIe
Based on bit-serial lanes
Very high bandwidth
Channel bonding, similar to 10Gbps Ethernet
Point to Point
Packet/Switch Based
High overhead for small messages (interrupts)
Layers
Physical
Data Link (ack/nak)
Transactions (memory/int)
History
Developed by Intel
2.5 GTps, 5GTps …
Design (1 of 3)
So What?
Well, you’ve been designing some interfaces
You will keep using them
Similar principles apply to hardware and software
Back to Principles
What do you want from the interface (SHOULD)
What do you need from the interface (MUST)
Design (2 of 3)
Reuse & Standardization
May introduce overhead
Leverage well tested modules
Eases debugging & documentation
Modeling, Verification & Debugging
Requires two implementations
E.g. transmitter & receiver
Automated testing
Allows you to quickly verify any changes
Greatly simplifies life for someone else
Design (3 of 3)
Good Interfaces
Simplify the interacting modules
Both the design and implementation
Simplify doesn’t always mean “making smaller”
Are self-documenting
Are naturally widely applicable
Bad Interfaces
Are complex, or hard to debug
Are expensive to design and implement
Make incorrect assumptions
Do more work than necessary
Eliminating timing assumptions, when we know the timing
Otherwise checking invariants we know to be true
A Case Study (1 of 2)
The RAMP DRAM Interface
What MUST we do
Convey address to the controller
Convey data in both directions
Support handshaking to deal with variable latency in controller
What should we do
Allow multiple users to share DRAM
Support extremely high bandwidth
The Design
3 FIFOs with Ready/Valid
Command: read/write and address to controller
DataIn: data to be written (and mask)
DataOut: data which was read (and any error counts for ECC)
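The three FIFO design can be sketched as three queues around a memory model. This is not the RAMP implementation: the class name, the tuple format of a command and the dictionary used as memory are assumptions, the queues are unbounded (so Ready is always true on the input side), and masks and ECC counts are left out.

```python
from collections import deque


class DramModel:
    """Toy controller with Command, DataIn and DataOut FIFOs."""

    def __init__(self):
        self.command = deque()    # ("read" | "write", address) pairs
        self.data_in = deque()    # data to be written
        self.data_out = deque()   # data which was read
        self.memory = {}

    def step(self):
        """Process at most one command; return True if a command completed."""
        if not self.command:
            return False                      # nothing valid on the Command FIFO
        op, addr = self.command[0]
        if op == "write":
            if not self.data_in:
                return False                  # stall until write data is valid
            self.memory[addr] = self.data_in.popleft()
        else:
            self.data_out.append(self.memory.get(addr, 0))
        self.command.popleft()
        return True


if __name__ == "__main__":
    m = DramModel()
    m.command.append(("write", 4))
    print(m.step())           # write data has not arrived yet, so the command stalls
    m.data_in.append(99)
    print(m.step())
    m.command.append(("read", 4))
    m.step()
    print(m.data_out.popleft())
```

Because each FIFO is handshaked independently, the user can queue a write command before its data arrives and collect read data whenever it shows up, which is how the design absorbs variable latency in the controller.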
A Case Study (2 of 2)
Metrics
Bandwidth: maximized by using wide data FIFOs
Latency: minimized by avoiding any serialization
Pin Count: dictated by need for maximum bandwidth
Complexity: low thanks to ready/valid
Datapath & Control
All 3 FIFOs are datapath
Separate initialization & power state for control
Clocking: Each FIFO can have a separate clock
Handshaking is Ready/Valid
Protocol
Low level: dataflow
Intermediate level: commands
High level: register
Summary (1 of 2)
Any useful system includes at least two interfaces: input and output
The most difficult work in any system is matching incompatible interfaces
Principles
Metrics: Bandwidth, Latency, Pin Count & Logic Overhead
Datapath & Control (States & Events)
Synchronization: Clock & Reset
Handshaking (Ready/Valid)
Protocols (structure, syntax, semantics)
Design
Back to principles
Reuse & Standardization
Modeling, Verification & Debugging
Summary (2 of 2)
Interfaces
Simple Interfaces
SPI, I2C, UART, N64
JTag, Slave Serial, MDI (Ethernet)
Intermediate Interfaces
SDRAM, Audio, LCD, Ethernet (10M-10G), Interchip
CC2420, Video Encoder/Decoder
CPU Interfaces
ISA, PCIe
MCA, PCI, PCI-X, HyperTransport, Intel FSB, AGP, AMBA