Digital Interface Design
EECS150 Fall 2008 – Lecture #23
Greg Gibeling
Slides adapted from everywhere
11/18/2008
Motivation

Any useful system includes at least two interfaces: input and output
  In a computer: keyboard & screen
  In your project: audio & video
The most difficult work in any system is matching incompatible interfaces
  Compare CS70 and CS61B
  Compare K-maps or adder design and your project
You will be designing interfaces
  Either hardware or software
  The basic ideas presented here apply fairly widely

Outline


Quick Review: SDRAM and Audio
Principles
  Metrics: Bandwidth, Latency, Pin Count & Logic Overhead
  Datapath & Control (States & Events)
  Synchronization: Clock & Reset
  Handshaking (Ready/Valid)
  Protocols (structure, syntax, semantics)
Interfaces
  Simple Interfaces: SPI, I2C, UART, N64
  Intermediate Interfaces: LCD, Ethernet (10M-10G), Interchip
  CPU Interfaces: ISA, PCIe
Design
  Back to principles
  Reuse & Standardization
  Modeling, Verification & Debugging

Quick Review (1 of 4)

So What?
  Almost everything needs storage
  Lots of space -> DRAM
SDRAM
  SDRAM is BIG
    Time multiplex address lines
    2 Dimensional Address (Row & Column)
  Often Shared
    Arbitration for access
    Affects performance
[Figure: block diagram of several modules sharing Micron SDRAM Chips through Muxes, an SDRAM Arbiter and SDRAM Control. Labels: Data, Address, Handshaking, Select, RST; commands Act, Read, Write, Ref]

Quick Review (2 of 4)

SDRAM (cont)
  Steps to Read/Write
    Send Row Address (RAS)
    Send Column Address (CAS)
    Send/Get Data (For 2,4,8 cycles)
    Wait (precharge, autorefresh, etc)
  Synchronous Interface
    Uses a clock & bursts to increase bandwidth
    Control requires precise timing
      Issue sequences of commands
      Timing must be matched to clock frequency

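The point that SDRAM timing must be matched to the clock frequency can be sketched as a small calculation. This is only an illustration: the function names and the example timing values (20 ns delays, CAS latency 2, burst of 8) are assumptions chosen for the example, not figures from the lecture or from any datasheet.

```python
import math
from fractions import Fraction

def ns_to_cycles(t_ns, clock_mhz):
    """Smallest whole number of clock cycles that covers t_ns."""
    # cycles = t_ns / period_ns, with period_ns = 1000 / clock_mhz
    return math.ceil(Fraction(t_ns) * Fraction(clock_mhz) / 1000)

def read_schedule(clock_mhz, t_rcd_ns, t_rp_ns, cas_latency, burst_len):
    """Cycle numbers for one read: row, column, data burst, precharge wait."""
    read_at = ns_to_cycles(t_rcd_ns, clock_mhz)    # row-to-column delay
    data_at = read_at + cas_latency                # first data word
    precharge_at = data_at + burst_len             # after the whole burst (conservative)
    idle_at = precharge_at + ns_to_cycles(t_rp_ns, clock_mhz)
    return [
        ("ACTIVE (row address)", 0),
        ("READ (column address)", read_at),
        ("DATA burst starts", data_at),
        ("PRECHARGE", precharge_at),
        ("IDLE", idle_at),
    ]

# Hypothetical part: 20 ns row-to-column and precharge delays.
schedule_100mhz = read_schedule(100, 20, 20, cas_latency=2, burst_len=8)
schedule_50mhz = read_schedule(50, 20, 20, cas_latency=2, burst_len=8)
```

The same nanosecond requirements turn into different cycle counts at 50 MHz and 100 MHz, which is why a controller's command sequence cannot simply be reused at a new clock rate.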
Quick Review (3 of 4)

So what?
  Example data stream
  Low bandwidth
  Includes control
Audio
  Primary interfaces are analog
    Audio is analog
    Mixers, etc…
  Bit Serial
    Low & fixed bandwidth
    Low complexity
    Expandable (e.g. 5.1, 7.1)

Quick Review (4 of 4)
Audio (cont)
  Pair of shifters
  Simple sync framing
  Abstract registers
  Highly stateful
  VERY low bandwidth
[Figure: AC97Controller block diagram. Labels: Sync, Decode, IORegister, BitCount, CodecReady, shift registers on SDataIn and SDataOut to the Audio Codec, FullVolumeControl, Driver, Audio Buffer, Control with {CMD_A, CMD_D}, CMD_Request and CMD_Valid, 32b PCM Audio Data and 32b PCM Audio Recorded Data with Handshaking; clocks PHY_RX_CLK (~25MHz) and AP_BIT_CLOCK (12MHz)]

Metrics (1 of 3)

So What?
  We need some way to judge good vs bad
  Allows us to compare interfaces without guessing
  Evaluate tradeoffs and requirements in a formal manner
Objective Metrics
  Bandwidth
  Latency
  Pin Count
  State & Logic Overhead
Subjective Metrics
  Documentation
  Ease of use or debugging
  Elegance

Metrics (2 of 3)

Bandwidth
  High or Low
    Higher is always better, but e.g. humans can only hear so much
    Video and audio are the classic examples, but programs need instructions, which means DRAM bandwidth
  Fixed or Variable
    Raw video or audio have fixed bandwidth, compression (e.g. MP3) can make this vary
    Network bandwidth varies because of sharing
Latency
  High or Low
    Lower is usually better
      If there’s no elastic buffer (no way to say “I’m not ready”)
      This can cause data loss or require extra buffering, which is costly
    Humans are very sensitive to gross latency
    Generally reducing latency is VERY HARD without affecting the clock rate
  Fixed or variable
    Variable latency is generally referred to as “jitter”
    E.g. on VOIP phones, audio is fixed latency, network is variable, so we have a problem

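A rough sketch of how the fixed bandwidth of a raw stream and the cost of absorbing jitter can be estimated. The stream formats and the 20 ms jitter figure are assumptions picked for illustration, and the function names are hypothetical.

```python
def raw_bandwidth_bps(samples_per_second, bits_per_sample, channels=1):
    """Bandwidth of an uncompressed stream in bits per second."""
    return samples_per_second * bits_per_sample * channels

def jitter_buffer_bits(bandwidth_bps, jitter_ms):
    """Bits of elastic buffering needed to ride out jitter_ms of delay variation."""
    return bandwidth_bps * jitter_ms // 1000

# Assumed formats: 48 kHz stereo 16-bit audio, 640x480 video at 60 frames/s, 24 bits/pixel.
audio_bps = raw_bandwidth_bps(48_000, 16, channels=2)
video_bps = raw_bandwidth_bps(640 * 480 * 60, 24)

# A fixed-rate audio stream carried over a link with 20 ms of jitter.
audio_buffer = jitter_buffer_bits(audio_bps, 20)
```

Under these assumptions the audio stream needs about 1.5 Mbps while the video stream needs several hundred Mbps, and the buffer size grows linearly with both bandwidth and jitter.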
Metrics (3 of 3)

Pin Count
  Fast becoming a major problem
    Chip area grows with N^2; pins grow with N for DIPs or N^2 for BGAs
    Either way pins are just physically large
      They require a lot of area
      They are slow and power hungry
  Serial vs Parallel
    Old: Parallel for high bandwidth
    New: Serial for high bandwidth
    What changed?
State & Logic Overhead
  This is where major cost & complexity come into play
    The bigger the circuit the more places to have a bug
    Also affects power, yield and price
  Interfaces can be very large
    For example DDR2 SDRAM on a Virtex2 Pro
      The FPGA couldn’t support the clocking/handshaking easily
      Required an incredible amount of logic to make up for this
      Never very reliable as a result

Datapath & Control (1 of 4)

So What?
  Separates the data & control
  Allows us to understand the meaning of signals
  Separates timing from dataflow
Datapath
  Variable information not known until runtime
  Regular structure or meaning (e.g. all integers)
  Easy to design and debug
Control
  Circuits which deal with meaning and timing
  Small, irregular and complicated
  Difficult to design and debug, even harder to extend

Datapath & Control (2 of 4)

Datapath Signals
  Wires which carry a value with temporal significance
  Form the backbone of the datapath
  May include “control” values
    E.g. that this is a value to be written to DRAM
    This is common in “data stationary control”
Coding
  Common Codes
    Binary: easy to understand, easy to work with
    One-hot: allows inexpensive decoding
    Gray Code: asynchronous logic, one bit change at a time
    Other issues: state coding, floating point, etc

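The common codes listed above can be sketched in a few lines. The helper names are made up for this example; the binary-to-Gray conversion is the standard reflected Gray code.

```python
def to_gray(n):
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)

def from_gray(g):
    """Gray code back to binary by XOR-ing all right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def one_hot(n, width):
    """One-hot code: exactly one of `width` bits set, so decoding is a single wire."""
    if not 0 <= n < width:
        raise ValueError("value does not fit in the one-hot width")
    return 1 << n

gray_sequence = [to_gray(n) for n in range(8)]

# Number of bits that change between consecutive Gray codes (wrapping around).
bit_changes = [
    bin(to_gray(n) ^ to_gray((n + 1) % 8)).count("1") for n in range(8)
]
```

Every step of the Gray sequence flips a single bit, which is the property that makes it attractive when a value is sampled asynchronously.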
Datapath & Control (3 of 4)

Control Signals
  Wires which carry timing, but little data
  Form the backbone of the control logic
  Enables, resets, and so forth fall into this category
Event Coding
  Edge (neg or pos)
    Generally we only use the clock edge in FPGA designs
    Latch based designs use edges all the time, of course
  Pulse High
    Do something when a wire is 1, usually relative to a clock edge
  Pulse Change
    Do something when a signal is different than on the last cycle
  Time
    Do something a certain amount of time after a previous event
    Measured with a clock in synchronous systems
    Possible to build “delay lines” using transistors and gates

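As an illustration of the event codings above, the following sketch treats a wire as a list of per-cycle samples and reports the cycles on which each kind of event fires. The function names and the sample waveform are assumptions made for this example.

```python
def pulse_high(samples):
    """Cycles on which the wire is 1 at the clock edge."""
    return [i for i, s in enumerate(samples) if s == 1]

def pulse_change(samples):
    """Cycles on which the wire differs from its value on the previous cycle."""
    return [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]

def rising_edges(samples):
    """Cycles on which the wire went from 0 to 1 (a sampled positive edge)."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] == 0 and samples[i] == 1]

def after_delay(event_cycles, delay):
    """Time coding: act a fixed number of clock cycles after each earlier event."""
    return [c + delay for c in event_cycles]

wire = [0, 0, 1, 1, 0, 1, 0, 0]
high = pulse_high(wire)
changed = pulse_change(wire)
rose = rising_edges(wire)
delayed = after_delay(rose, 3)
```

The same waveform produces different event lists under each coding, so both sides of an interface have to agree on which one is in use.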
Synchronization (1 of 4)

Clocking
  1 Clock
    Fully synchronous, no need to worry about the issue
    May have multiple resets
      E.g. hold video in reset until SDRAM is ready
      Can get pretty complex (e.g. CPU & JTag)
  2 Clocks
    Clock Crossing, easy to keep straight
      Often use Async FIFOs and dual port RAMs on FPGAs
      These are expensive in ASICs, use synchronizers
    Obviously multiple resets
  Local Clocks & LocalResetGen
    Often restricted to use in an interface (e.g. interchip)
    May not be free-running
    Often require careful design to avoid issues

Synchronization (2 of 4)

Reset
  1 Clock, no initialization
  Multistage Initialization
    Reset for one module depends on state of another
    Using the ButtonParser is an example of this
  2 Clocks
    Usually reset is synchronous to one clock
    May need a shift register to resynchronize reset
  Self starting
    Useful for generating a reset for the rest of the system
    Any device which “just works” on power-up has one
    Can be built on FPGA by using a shift register with an initial value
  Local Resets & LocalResetGen
    Reset logic can affect clocking & reliability
    May be requirements like holding reset for some time

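The self-starting reset built from a shift register with an initial value can be modeled behaviorally as follows. This is a software sketch, not the course's module: the class name `ResetGen` and the length of 4 are assumptions, and the initial value stands in for the power-up contents an FPGA would load.

```python
class ResetGen:
    """Shift register that powers up full of ones and shifts in zeros."""

    def __init__(self, length=4):
        self.length = length
        self.shift = (1 << length) - 1      # initial value: all ones

    @property
    def reset(self):
        # Reset output is the oldest bit in the register.
        return (self.shift >> (self.length - 1)) & 1

    def clock(self):
        # On each clock edge, shift left and bring in a 0.
        self.shift = (self.shift << 1) & ((1 << self.length) - 1)

gen = ResetGen(length=4)
trace = []
for _ in range(6):
    trace.append(gen.reset)   # value seen by the rest of the system this cycle
    gen.clock()
```

Reset stays asserted for `length` cycles after power-up and then releases without any external input, which also covers requirements such as holding reset for a minimum time.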
Synchronization (3 of 4)
[Figure: LocalResetGen timing diagram. Signals: Clock, Reset, LocalClock, LocalClockReset, LocalRegReset, (LCSEnable), WasResetValid. Phases: Long Reset Count, Sync Shift, Resync Shift, and Resync, ED, FF]

Synchronization (4 of 4)
[Figure: LocalResetGen block diagram. Blocks: LR Counter with Compare (Long Reset), Sync Shift register, Resync Shift register, TO Counter with Compare (Timeout), D FF. Signals: Reset, LocalClock, LocalClocks, LocalClockSelect, LocalClockReset, LocalRegReset, WasResetReady, WasResetValid]

Handshaking (1 of 4)

So What?
  When things happen is vital
  Hardware modules must cooperate in order to be useful
  Planning out all interaction timings on the drawing board is best, but often hopeless
Handshakes
  Pipelined (None)
  2 & 4 Cycle (Self-timed)
  Ready/Valid (Synchronous)

Handshaking (2 of 4)

4 Cycle
  RTZ: Return to Zero
  Fewer transistors
  Easier to debug
2 Cycle
  More transistors
  Not really faster
  NRTZ: Non-RTZ
  Can be synchronous
GasP
  RTZ handshaking
  Carefully delay matched circuits
  No clock!

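The difference between 4 cycle (return to zero) and 2 cycle (non return to zero) handshakes can be sketched by listing the request/acknowledge levels after each wire transition. This assumes the usual two-wire req/ack arrangement, which the slide does not spell out; the function name is hypothetical.

```python
def handshake_trace(transfers, transitions_per_transfer):
    """Return the (req, ack) levels after every wire transition."""
    req = ack = 0
    trace = []
    for _ in range(transfers):
        for step in range(transitions_per_transfer):
            if step % 2 == 0:
                req ^= 1      # sender toggles request
            else:
                ack ^= 1      # receiver toggles acknowledge
            trace.append((req, ack))
    return trace

# 4 cycle: req up, ack up, req down, ack down for every transfer.
four_cycle = handshake_trace(transfers=2, transitions_per_transfer=4)
# 2 cycle: one req toggle and one ack toggle for every transfer.
two_cycle = handshake_trace(transfers=2, transitions_per_transfer=2)
```

In this model the 4 cycle wires are back at (0, 0) after every transfer, while the 2 cycle version moves two items with the same four transitions but leaves the wires in a state that depends on how many transfers have happened.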
Handshaking (3 of 4)
Ready/Valid
  Composable
    Allows the pass-through
    Coregen FIFOs asymmetric
    FIFOPassthrough
  Latency Insensitive
    Allows modules to run at their own pace
    Trades cost to do this!!
  Symmetric
    Avoid combinational loops
    Simplifies generation and checking
  Independent
  Send/Accept
    Same signals, new names!
    Why? Read on….
[Figure: ready/valid timing diagram showing Clock, Data, Valid and Ready, with the Transfer cycle marked and Send/Accept labels]

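A minimal cycle-by-cycle sketch of a ready/valid handshake: data moves only on clock cycles where Valid and Ready are both high. The producer here asserts Valid whenever it has data left, and the consumer's Ready pattern is supplied by the caller; both choices, and the function name, are assumptions made for the example.

```python
def ready_valid(data, ready_pattern):
    """Simulate one producer and one consumer; return (cycle, item) transfers."""
    pending = list(data)
    transfers = []
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)            # producer has something to send
        if valid and ready:              # both sides agree: transfer this cycle
            transfers.append((cycle, pending.pop(0)))
        # otherwise the producer simply holds its data and Valid
    return transfers

# The consumer stalls on cycles 0 and 3.
moved = ready_valid(["a", "b", "c"], ready_pattern=[0, 1, 1, 0, 1, 1])
```

Neither side needs to know the other's latency in advance, which is what makes the scheme latency insensitive; the cost is the extra wires and the logic needed to hold data while stalled.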
Handshaking (4 of 4)

Composition Failure
  Arbiter chooses one of two inputs
  Router chooses one of two outputs
  Ready0 & Valid1
  Any time two modules are connected by two paths…
Classes
  Class1: No dependencies
  Class2: Dependencies between ports
  Class3: Dependencies within ports
[Figure: a Router and an Arbiter connected by two ready/valid paths, labeled Valid0/Ready0 and Valid1/Ready1]

Protocols (1 of 5)

So What?
  Knowing the data isn’t enough, we need meaning
  Just like language we build representations of meaning
  Knowing the patterns to meaning allows us to abstract it
Structure
  Parallel: all the bits at once
  Counted: there are a fixed number of words, we count them off
  Framed: adding a higher level handshake allows variable length
Syntax
  How the data fits together
  We’ll cover this more in the next few slides
Semantics
  What the data means
  Highly dependent on the interface in question
Terms: The Band
  In Band: the data we’re trying to move
  Out of Band: control, metadata and other issues

Protocols (2 of 5)

Dataflow Based
  Audio, video, instructions in a CPU
  Generally when there’s little (no) OOB data
  Usually parallel or counted for simplicity
Benefits
  Excellent handling of LTI or independent data values
  Simple production and consumption
    Little or no state, e.g. a valid bit is all you need
    Allows construction of specialized hardware (DSP designs for example)
Drawbacks
  Very difficult, if not impossible to deal with exceptions
    For playing audio: what if you need data but it’s not there?
    When things fail there’s often nothing you can do

Protocols (3 of 5)

Command Based
  Useful for low bandwidth peripherals
  Organized according to master/slave
  E.g. draw a line, write a word to memory
Benefits
  Very easy to build new slaves
  Clear demarcation of responsibility (Good for CPUs)
  Generally very easy to expand, just add new commands
Drawbacks
  Tends to be very low performance
    Overhead to specify command
    No parallelism
    Usually requires some polling (interrupts are poll based)
    Requires master to know state at all times

Protocols (4 of 5)

Register Based
  Stateful peripherals with lots of config
  Organized according to master/slave
  Often used alongside a dataflow interface
Benefits
  Provides a memory-like abstraction
  Allows the master to read state easily
  Easy to deal with exceptional conditions (error flag)
Drawbacks
  Medium performance
    Overhead to specify read/write and register address
    DMA can help with this
  Requires a clear master, often meaning an FSM/CPU

Protocols (5 of 5)

Layering
  Uncommon to have one syntax
  They are easy to layer
Dataflow on top of command
  Each command can be a “write <data>”
  Not entirely efficient, but gets the job done
  This is how software FIFOs and networks work
Register on top of command
  Two commands: read & write
  Relatively common, allows command wires to be shared
  This is how most memories, especially DRAMs work
Command on top of register
  Writing a certain value to a register indicates the command
  Perhaps a series of writes to registers
  Many CPU peripherals do this

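Layering a command syntax on top of a register syntax can be sketched with a toy peripheral: the master only ever reads and writes registers, and a write to one particular register triggers a command. The register map, command codes and class name below are all invented for the illustration.

```python
class Peripheral:
    """Toy register-based slave with a command register."""

    REG_ARG, REG_CMD, REG_RESULT, REG_STATUS = range(4)   # hypothetical register map
    CMD_CLEAR, CMD_ADD = 1, 2                             # hypothetical command codes

    def __init__(self):
        self.regs = [0, 0, 0, 0]

    def read(self, addr):
        return self.regs[addr]

    def write(self, addr, value):
        self.regs[addr] = value
        if addr == self.REG_CMD:          # writing the command register runs it
            self._execute(value)

    def _execute(self, cmd):
        self.regs[self.REG_STATUS] = 0    # clear the error flag
        if cmd == self.CMD_CLEAR:
            self.regs[self.REG_RESULT] = 0
        elif cmd == self.CMD_ADD:
            self.regs[self.REG_RESULT] += self.regs[self.REG_ARG]
        else:
            self.regs[self.REG_STATUS] = 1   # unknown command: raise the error flag

dev = Peripheral()
dev.write(Peripheral.REG_ARG, 5)
dev.write(Peripheral.REG_CMD, Peripheral.CMD_ADD)
dev.write(Peripheral.REG_ARG, 7)
dev.write(Peripheral.REG_CMD, Peripheral.CMD_ADD)
result = dev.read(Peripheral.REG_RESULT)
```

The status register shows the benefit named for register syntax: the master can detect an exceptional condition by reading a flag instead of having to track the slave's state itself.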
Simple Interfaces (1 of 4)

So What?
  Uses few wires
  No tristates
  Synchronous
SPI
  Signals: SO, SI, CS, CLK
  Uses: CC2420, ADC
  Bit Serial
  Bidirectional
  Often used with register syntax

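SPI's bit-serial, bidirectional behavior can be modeled as two shift registers that trade one bit per clock. The sketch assumes CS is already asserted and that bits go most significant bit first; it ignores clock polarity and phase, and the function name is hypothetical.

```python
def spi_exchange(master_byte, slave_byte, bits=8):
    """Clock `bits` cycles; return the final (master, slave) register contents."""
    mask = (1 << bits) - 1
    master, slave = master_byte & mask, slave_byte & mask
    for _ in range(bits):
        to_slave = (master >> (bits - 1)) & 1    # master's serial output
        to_master = (slave >> (bits - 1)) & 1    # slave's serial output
        master = ((master << 1) & mask) | to_master
        slave = ((slave << 1) & mask) | to_slave
    return master, slave

# After 8 clocks the two registers have swapped contents.
master_after, slave_after = spi_exchange(0xA5, 0x3C)
```

Because every clock moves a bit in each direction, a register read is typically done by sending an address while discarding the incoming bits, then sending filler while capturing the reply.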
Simple Interfaces (2 of 4)

So What?
  Fewest pins (almost)
  Control, not data
  Long distance
I2C
  Uses two wires
  Master/Slave
  Includes handshake
  Bit Serial
  Bidirectional
  Often used with register syntax

Simple Interfaces (3 of 4)






So What?
  Very few pins (3)
  No clock required
  Long distance
History
  In IBM PCs
  RS232 and RS485
  Still widely used
UART
  Bit serial
  No clock signal
    Relies on timing for events
  Often used with dataflow syntax
Good & Bad
  Simple/cheap
  Noise resistant
  Problems
    Low bandwidth
    Limited by internal timing clocks
    Very low level protocol

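A small sketch of UART framing. The slide does not name a frame format, so this assumes the common 8N1 layout (one start bit at 0, eight data bits least significant first, one stop bit at 1) and models one list entry per bit time; the function names are hypothetical.

```python
def uart_frame(byte):
    """Line levels for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def uart_decode(bits):
    """Recover the byte from one frame, checking the start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

frame = uart_frame(0x41)
decoded = uart_decode(frame)
```

Since there is no clock wire, the receiver has to find the start bit and then sample at the agreed bit time using its own clock, which is why the internal timing clocks limit the usable rate.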
Simple Interfaces (4 of 4)
So What?
  N64 Controllers
  Used in projects
N64
  Asynchronous
  More robust than UART
  Command Syntax
    Main: Reset & Read Buttons
    Other: Status, Mempack, EEPROM
  Receiving a bit:
    Look for 1’b1 (Stop) -> 1’b0 (Start)
    Wait 1us (why 1us?!?)
    Capture Data
[Figure: N64 bit timing at 4us/Bit. Each bit is Start (1'b0), Data, Stop (1'b1); a 0 to 1 transition separates Data from Stop. Example transmission of bits 0 to 8: bits 0 to 6 are 1'b0, bits 7 and 8 are 1'b1]

Intermediate Interfaces (1 of 4)

So What?
  HD44780, standard
  4 or 8b operation
  Interesting timing
LCD
  Interface
    LCD_DB[7:0]: Data
    LCD_RS: Register select
    LCD_RW: Read/Write
    LCD_E
      Enable/Strobe
      Provides timing

Intermediate Interfaces (2 of 4)

So What?
  Used everywhere
  Framed structure
  Dataflow syntax
10M-1G Ethernet
  Bit Serial Link
    4/5bit Encoding takes 20% overhead
    Bit5 is used for Data-Valid and Error
  Preamble used for clock extraction
  Inter Frame Gap ensures packets aren’t back-to-back
  CRC used to detect errors from transmission
Ethernet Packet Format (32 bits per row)
  Destination [47:16]
  Destination [15:0] | Source [47:32]
  Source [31:0]
  Ethernet Type [15:0] | Data [15:0]
  Data [31:0]
  Data [31:0]
  CRC [31:0]

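The CRC in the packet format can be sketched as a bit-at-a-time computation, which mirrors how an LFSR would process it in hardware. The code uses the CRC-32 parameters that Ethernet shares with Python's `zlib.crc32` (reflected polynomial 0xEDB88320, all-ones initial value, final inversion); it does not model how the bytes are ordered on the wire, and the function name is an assumption.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 computed one bit at a time, like a 32-bit LFSR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320   # feedback taps
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

payload = b"123456789"
crc = crc32_bitwise(payload)

# Flipping a single bit of the payload changes the CRC, so the receiver notices.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
crc_corrupted = crc32_bitwise(corrupted)
```

The transmitter appends the result to the frame; the receiver recomputes it over what arrived and discards or marks the packet on a mismatch.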
Intermediate Interfaces (3 of 4)
10M-1G Ethernet
  FSM
    Wait for DataValid & SFD
    Start shifting/FIFOing data
    Wait for DataValid to go low
    Check CRC, discard/mark packet
  CRC
    An LFSR based code
    Appended to the end of each frame
    Used to ensure nothing is corrupted
  Transmit is similar
[Figure: MAC Rx FSM (Simplified). States: Idle, Preamble, SFD, Receive. Transition labels: Valid & ~SFD, Valid & SFD, Error | ~Valid]
[Figure: MAC Rx Detailed Block Diagram. Inputs: PHY_RX_D (4b Raw Ethernet Data), PHY_RX_DV (1b Ethernet Data Valid). Blocks: MAC Rx Unit FSM, Nibble Counter & Reset, MACShift, Ethernet CRC Check. Outputs: 32b Ethernet Packet Data, 1b Data Valid Signal, 32b CRC, CRC Valid?]

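The receive steps listed above can be sketched as a small state machine over a stream of 4-bit nibbles. The slide's diagram is only partly recoverable, so the exact states and transitions here are an assumption based on its labels; the sketch also assumes the start-of-frame delimiter shows up as the nibble 0xD after a run of preamble nibbles, and it leaves out the CRC check.

```python
SFD_NIBBLE = 0xD   # assumed nibble that ends the preamble

def mac_rx(stream):
    """stream: iterable of (valid, error, nibble); returns the received frames."""
    state = "Idle"
    frame, frames = [], []
    for valid, error, nibble in stream:
        if error or not valid:
            if state == "Receive" and not error:
                frames.append(frame)          # DataValid went low: frame complete
            state, frame = "Idle", []         # on error the partial frame is dropped
        elif state == "Idle":
            state = "Receive" if nibble == SFD_NIBBLE else "Preamble"
        elif state == "Preamble":
            if nibble == SFD_NIBBLE:
                state = "Receive"
        else:                                 # Receive: shift the data nibble in
            frame.append(nibble)
    return frames

good = ([(0, 0, 0)] + [(1, 0, 0x5)] * 3 + [(1, 0, SFD_NIBBLE)]
        + [(1, 0, n) for n in (1, 2, 3, 4)] + [(0, 0, 0)])
received = mac_rx(good)

bad = good[:-1] + [(1, 1, 0)]                 # error flagged before the frame ends
received_bad = mac_rx(bad)
```

A fuller receiver would pack nibbles into 32-bit words and compare the trailing CRC before handing the frame on.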
Intermediate Interfaces (4 of 4)

So What?
  Source Synchronous
  Very high bandwidth
    966Mbps per pair
Interchip
  Dataflow structure
  Send clock alongside data
    Requires async FIFO
  Differential pairs require special signaling for this

CPU Interfaces (1 of 3)

So What?
  Allow CPU to control peripherals
  Old: Simplicity of I/O devices (no FPGAs back in the day)
  New: Bandwidth (audio & video)
Key Assumptions
  CPU is in control
  Separation of data (high bandwidth) and control (very low latency)
Basic Organization
  Historically “bus” based
    Single arbiter, or even single master
    Most devices are simple and respond only
    Memory/register centric (e.g. read/write ops)
  Newer point to point designs
    PCIe, HyperTransport
    Based on command packets (e.g. read/write ops)

CPU Interfaces (2 of 3)

So What?
  Very widespread standard
  Simple enough to describe here
ISA
  Synchronous bus
    Assumes 1 cycle access
    8MHz standard
  Basic Operations
    Address (CPU -> IO)
    Control (CPU -> IO)
    Data (CPU <-> IO)
  Extensions
    DMA
    Interrupts
  History
    IBM PC XT
    8b and then 16b
    PnP Added Later
    Open Standard

CPU Interfaces (3 of 3)

So What?
  Higher bandwidth than old parallel busses
  Overcomes pin limitations
  Separates physical and logical transport to allow more complex analog design
PCIe
  Based on bit-serial lanes
    Very high bandwidth
    Channel bonding, similar to 10Gbps Ethernet
  Point to Point
    Packet/Switch Based
    High overhead for small messages (interrupts)
  Layers
    Physical
    Data Link (ack/nak)
    Transactions (memory/int)
  History
    Developed by Intel
    2.5 GTps, 5GTps …

Design (1 of 3)

So What?
  Well, you’ve been designing some interfaces
  You will keep using them
  Similar principles apply to hardware and software
Back to Principles
  What do you want from the interface (SHOULD)
  What do you need from the interface (MUST)

Design (2 of 3)

Reuse & Standardization
  May introduce overhead
  Leverage well tested modules
  Eases debugging & documentation
Modeling, Verification & Debugging
  Requires two implementations
    E.g. transmitter & receiver
  Automated testing
    Allows you to quickly verify any changes
    Greatly simplifies life for someone else

Design (3 of 3)

Good Interfaces
  Simplify the interacting modules
    Both the design and implementation
    Simplify doesn’t always mean “making smaller”
  Are self-documenting
  Are naturally widely applicable
Bad Interfaces
  Are complex, or hard to debug
  Are expensive to design and implement
  Make incorrect assumptions
  Do more work than necessary
    Eliminating timing assumptions, when we know the timing
    Otherwise checking invariants we know to be true

A Case Study (1 of 2)

The RAMP DRAM Interface
  What MUST we do
    Convey address to the controller
    Convey data in both directions
    Support handshaking to deal with variable latency in controller
  What should we do
    Allow multiple users to share DRAM
    Support extremely high bandwidth
  The Design
    3 FIFOs with Ready/Valid
    Command: read/write and address to controller
    DataIn: data to be written (and mask)
    DataOut: data which was read (and any error counts for ECC)

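The three-FIFO design can be sketched as a behavioral model in which the user side only touches the Command, DataIn and DataOut queues. This is a loose software analogy, not the RAMP implementation: the class name is hypothetical, the write mask and ECC error counts are left out, and an empty queue plays the role of Valid being low.

```python
from collections import deque

class DramModel:
    """Toy memory controller fed by three FIFOs."""

    def __init__(self):
        self.command = deque()    # ("read" | "write", address)
        self.data_in = deque()    # data to be written
        self.data_out = deque()   # data which was read
        self.mem = {}

    def step(self):
        """Process at most one command; stall if its write data has not arrived."""
        if not self.command:
            return
        op, addr = self.command[0]
        if op == "write":
            if not self.data_in:
                return                         # DataIn not valid yet: wait
            self.mem[addr] = self.data_in.popleft()
        else:
            self.data_out.append(self.mem.get(addr, 0))
        self.command.popleft()

dram = DramModel()
dram.command.append(("write", 4))
dram.step()                                    # stalls: no write data yet
stalled = len(dram.command) == 1
dram.data_in.append(0xBEEF)
dram.step()                                    # write completes
dram.command.append(("read", 4))
dram.step()
read_back = dram.data_out.popleft()
```

Because each queue is decoupled from the others, the user can issue commands and data at its own pace and the controller can take a variable number of cycles per request.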
A Case Study (2 of 2)

Metrics
  Bandwidth: maximized by using wide data FIFOs
  Latency: minimized by avoiding any serialization
  Pin Count: dictated by need for maximum bandwidth
  Complexity: low thanks to ready/valid
Datapath & Control
  All 3 FIFOs are datapath
  Separate initialization & power state for control
  Clocking: Each FIFO can have a separate clock
  Handshaking is Ready/Valid
Protocol
  Low level: dataflow
  Intermediate level: commands
  High level: register

Summary (1 of 2)



Any useful system includes at least two interfaces: input and output
The most difficult work in any system is matching incompatible interfaces
Principles
  Metrics: Bandwidth, Latency, Pin Count & Logic Overhead
  Datapath & Control (States & Events)
  Synchronization: Clock & Reset
  Handshaking (Ready/Valid)
  Protocols (structure, syntax, semantics)
Design
  Back to principles
  Reuse & Standardization
  Modeling, Verification & Debugging

Summary (2 of 2)

Interfaces
  Simple Interfaces
    SPI, I2C, UART, N64
    JTag, Slave Serial, MDI (Ethernet)
  Intermediate Interfaces
    SDRAM, Audio, LCD, Ethernet (10M-10G), Interchip
    CC2420, Video Encoder/Decoder
  CPU Interfaces
    ISA, PCIe
    MCA, PCI, PCI-X, HyperTransport, Intel FSB, AGP, AMBA
