Transcript SOC-CH5b

Chapter 5
Interconnect
Computer System Design
System-on-Chip
by M. Flynn & W. Luk
Pub. Wiley 2011 (copyright 2011)
soc 5.1
SOC interconnect design approach
soc 5.2
Interconnect design
• find the cost and performance of alternatives
• iterate to find the least expensive design
that meets the requirements
• consider the larger issues: reliability,
scalability, design costs, availability of IP
soc 5.3
SOC module with interconnect
soc 5.4
Many alternatives
• find requirements: number of nodes,
performance requirements, marginal and
development cost
• bus based: purchased IP or proprietary
• NOC based: static vs dynamic
soc 5.5
AMBA bus based system
soc 5.6
Bus terminology
• protocol
• master / slave: agents on the bus
• arbitration / arbitrator: assigns bus ownership
• bridge: communications between protocols
• physical configuration: wires, bidirectional
• synchronization: clock management
• bus wrapper: manages multiple protocols
soc 5.7
MUX connects 3 masters, 4 slaves
soc 5.8
Simple AHB transfer
soc 5.9
CoreConnect SOC
soc 5.10
PLB transfer protocol
soc 5.11
Bus types and ideal performance
(a) Simple Bus
(b) Bus with arbitration support
(c) Tenured split bus: 4 bytes wide
(d) Tenured split bus: 16 bytes wide
soc 5.12
bus transmission time: 1 cycle
OCP and bus wrappers
soc 5.13
Sonics microNetwork
soc 5.14
Hardware gates
for write buffer
Performance of
buffer; burst mode
soc 5.15
Analyzing bus performance
• find offered occupancy (r) for each source
(master); find the number of sources (n)
– note a complex superscalar can have multiple
sources as I, D caches can prefetch independently
• does the source immediately resubmit a
request if it is denied?
• find achieved occupancy (r_a)
• overall the system’s performance is
reduced by (r_a/r)
soc 5.16
Without resubmissions
• Prob(processor does not access bus) = 1 – r
• Prob(n processors do not access bus) = (1 – r)^n
• Prob(bus is busy) = 1 – (1 – r)^n
= bus bandwidth
= bus B(r, n)
• Bw = Bus B(r, n) / Tbw
• achieved bandwidth per processor r_a is
n × r_a = B(r, n)
r_a = B(r, n) / n
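The no-resubmission model above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example values (r = 0.2, n = 4) are assumptions, not taken from the slides.

```python
def bus_busy_probability(r: float, n: int) -> float:
    """B(r, n) = 1 - (1 - r)^n: probability that at least one of n sources uses the bus."""
    return 1.0 - (1.0 - r) ** n


def achieved_occupancy(r: float, n: int) -> float:
    """r_a = B(r, n) / n: occupancy each source actually achieves."""
    return bus_busy_probability(r, n) / n


if __name__ == "__main__":
    r, n = 0.2, 4  # assumed example: 4 masters, each offering 20% occupancy
    b = bus_busy_probability(r, n)
    r_a = achieved_occupancy(r, n)
    print(f"B(r, n) = {b:.4f}")
    print(f"r_a     = {r_a:.4f}")
    print(f"r_a / r = {r_a / r:.4f}")
```

The ratio r_a / r printed at the end is the performance reduction factor named on the "Analyzing bus performance" slide.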
soc 5.17
Resubmissions: iterate to find ra
• let the actual offered occupancy be a; initially set a = r
• find new a = r / (r + (r_a / r)(1 – r))
• n × r_a = 1 – (1 – a)^n
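The iteration can be sketched as a simple fixed-point loop. This is a minimal sketch under stated assumptions: the function name, the convergence tolerance, and the iteration cap are illustrative choices that do not appear in the slides; only the two update formulas come from the bullets above.

```python
def resubmission_occupancy(r: float, n: int, tol: float = 1e-9, max_iter: int = 1000):
    """Iterate a and r_a for n sources that resubmit denied requests.

    r is the occupancy each source would offer with no contention (0 < r <= 1).
    Returns (a, r_a): actual offered occupancy and achieved occupancy.
    """
    a = r                                # initial guess: a = r
    r_a = (1.0 - (1.0 - a) ** n) / n     # n * r_a = 1 - (1 - a)^n
    for _ in range(max_iter):            # max_iter is an assumed safety cap
        a = r / (r + (r_a / r) * (1.0 - r))
        new_r_a = (1.0 - (1.0 - a) ** n) / n
        if abs(new_r_a - r_a) < tol:     # tol is an assumed stopping rule
            return a, new_r_a
        r_a = new_r_a
    return a, r_a


if __name__ == "__main__":
    a, r_a = resubmission_occupancy(r=0.2, n=4)  # assumed example values
    print(f"a = {a:.4f}, r_a = {r_a:.4f}, r_a / r = {r_a / 0.2:.4f}")
```

With resubmissions the actual offered occupancy a rises above r, so the bus is busier than in the no-resubmission case, while each source still achieves less than the r it asked for.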
soc 5.18
SOC interconnect switches (NOC)
• nodes are the units to be connected
• links are the connections
– width, w bits
– cycle time, T_ch, determines bandwidth
– they can be uni or bidirectional
• message consists of header and payload
– header (target node address) of H bits and payload of l bits
– transmission time: T_ch (H/w + l/w)
– h = H/w usually assumed to be 1
• links can be
– static: links between nodes fixed
– dynamic: links vary, as in crossbar
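As a quick worked example of the link transmission formula T_ch (H/w + l/w), here is a small Python sketch. The function name and the example sizes (32-bit header, 256-bit payload, 32-bit link) are assumptions chosen for illustration.

```python
def message_time(header_bits: int, payload_bits: int, width_bits: int,
                 t_ch: float = 1.0) -> float:
    """Time to send one message over a single link: T_ch * (H/w + l/w)."""
    h = header_bits / width_bits      # header cycles, h = H/w
    body = payload_bits / width_bits  # payload cycles, l/w
    return t_ch * (h + body)


if __name__ == "__main__":
    # Assumed example: H = 32, l = 256, w = 32, T_ch = 1 cycle, so h = 1.
    print(message_time(32, 256, 32))
```

Here h = 1, matching the usual assumption on the slide, and the payload takes l/w = 8 further cycles.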
soc 5.19
Static: nodes, links and fanout
soc 5.20
Static (k,d) networks
• networks with
– k nodes per dimension
– d dimensions (k,d)
• total nodes, N = k^d
– in hypercube k=2
• most (k,d) have end-around closure
– fanout = 2d (k>2)
• diameter
– (max internode distance with closure) = d × k/2
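The (k,d) relations above can be tabulated with a short Python sketch. The function name is illustrative. Two points are assumptions beyond the slide: for k = 2 the two neighbours in a dimension coincide, so the fanout is taken as d, and the diameter is computed as d × floor(k/2), which equals d × k/2 for even k.

```python
def kd_network(k: int, d: int) -> dict:
    """Basic properties of a static (k, d) network with end-around closure."""
    if k < 2 or d < 1:
        raise ValueError("need k >= 2 nodes per dimension and d >= 1 dimensions")
    nodes = k ** d                    # N = k^d
    fanout = 2 * d if k > 2 else d    # 2d for k > 2; d for the hypercube (assumption)
    diameter = d * (k // 2)           # d * k/2 for even k
    return {"nodes": nodes, "fanout": fanout, "diameter": diameter}


if __name__ == "__main__":
    print(kd_network(4, 2))  # 4 x 4 grid with closure (2D torus)
    print(kd_network(2, 3))  # 3-dimensional hypercube
```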
soc 5.21
Static network
soc 5.22
Examples of static networks
soc 5.23
Static network analysis
• for a static (k,n) network (n dimensions)
– let k_d be the average number of network hops for a
message to transit a single dimension
– for a bidirectional network with closure, k_d = k/4, k even
• time to transmit a message without contention, T_c
– T_c = n × k_d + (l/w) in network cycles
– for h = 1
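A small Python sketch of the contention-free estimate above, using k_d = k/4 for a bidirectional network with closure and even k. The function names and the example (k = 8, n = 2, 256-bit payload, 32-bit links) are assumptions for illustration only.

```python
def avg_hops_per_dimension(k: int) -> float:
    """k_d = k/4 for a bidirectional network with closure (k even)."""
    if k % 2 != 0:
        raise ValueError("the k/4 rule on the slide is stated for even k")
    return k / 4


def static_message_time(k: int, n: int, payload_bits: int, width_bits: int) -> float:
    """T_c = n * k_d + l/w, in network cycles, assuming h = 1."""
    return n * avg_hops_per_dimension(k) + payload_bits / width_bits


if __name__ == "__main__":
    # Assumed example: 8 x 8 network (k = 8, n = 2), l = 256 bits, w = 32 bits.
    print(static_message_time(8, 2, 256, 32))
```

In this example the routing term n × k_d contributes 4 cycles and the payload term l/w contributes 8.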
soc 5.24
Dynamic network
soc 5.25
Switch based interconnect
soc 5.26
Dynamic, indirect network
soc 5.27
Crossbar 2x2, kxk
soc 5.28
Dynamic, Indirect Networks
• switches are separate from the nodes
- centralized as a MIN (Multistage
Interconnection Network)
• a switch
- k x k crossbar with no storage
• an N-node (1 channel/node) network
- has (N/k)w switches per stage.
• min. number of stages to connect N to N
- ⌈log_k N⌉
soc 5.29
Baseline dynamic network address
selects output
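The minimum stage count for a multistage network built from k x k switches, the ceiling of log base k of N, can be computed with integer arithmetic. This is an illustrative sketch; the function name and the example sizes are assumptions.

```python
def min_stages(n_nodes: int, k: int) -> int:
    """Smallest s with k^s >= N, i.e. the ceiling of log_k(N), without floating point."""
    if n_nodes < 1 or k < 2:
        raise ValueError("need N >= 1 and k >= 2")
    stages, reach = 0, 1
    while reach < n_nodes:
        reach *= k       # each extra stage multiplies the reachable outputs by k
        stages += 1
    return stages


if __name__ == "__main__":
    for n_nodes, k in [(64, 2), (64, 4), (10, 4)]:
        print(n_nodes, k, min_stages(n_nodes, k))
```

The loop avoids the rounding errors that a floating-point logarithm can introduce when N is an exact power of k.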
soc 5.30
Xfabric (direct network with 2D grid)
soc 5.31
Xfabric
Junction
soc 5.32
Format of Nexus burst
soc 5.33
NOC layer architecture
soc 5.34
Typical layered NOC
soc 5.35
NOC layered architecture
• physical layer
– how packets are transmitted over physical wires
• transport layer
– packet routing
• transaction layer
– NIU provides service to the IP
• each layer transparent to the other
soc 5.36
NOC layered advantages
• layers can be independently optimized
• scalable
• better Quality of Service control
– more optimization points of control
• flexible throughput
– can reallocate physical layer resources as required
• multiple clock domain operation
soc 5.37
Transaction, transport and physical
layers of an NOC
soc 5.38
PivotPoint Architecture, 3x3 crossbar
soc 5.39
Dynamic vs Static
• Section 5.9 assumes
– h=1 (header sent in 1 cycle)
– wormhole routing (message can begin to
leave node after h=1 cycles)
• spreadsheet can be used to compare
configurations
soc 5.40
Message and header
soc 5.41
Bus pros (+) and cons (-)
• Every unit attached adds
parasitic capacitance (-)
• Bus timing is difficult in deep
sub-micron process (-)
• Bus testability is problematic and
slow (-)
• Bus arbiter delay grows with the
number of masters. The arbiter is
also instance-specific (-)
• Bandwidth is limited and shared
by all units attached (-)
• Bus latency is zero once arbiter
has granted control (+)
• The silicon cost of a bus is low
for small systems (+)
• Any bus is directly compatible
with most IPs, including
software running on CPUs (+)
• The concepts are simple and
well understood (+)
soc 5.42
NOC pros (+) and cons (-)
• Only point-to-point one-way
wires are used for all network
sizes (+)
• Network wires can be pipelined
because the network protocol is
globally asynchronous (+)
• Dedicated BIST is fast and
complete (+)
• Routing decisions are distributed
and the same router is
reinstantiated for all network
sizes (+)
• Aggregated bandwidth scales
with the network size (+)
• Internal network contention
causes a small latency (-)
• Network uses significant
silicon area (-)
• Software needs clean
synchronization in
multiprocessor systems (-)
• System designers need re-education for new concepts (-)
soc 5.43
Summary
• SOC interconnect design
– find the cost and performance of alternatives
• common choices include
– buses, e.g. AMBA, CoreConnect
– Network-on-Chip NOC, static/dynamic networks
• iterate to find the least expensive design
that meets the requirements
• consider the larger issues: reliability,
scalability, design costs, availability of IP
soc 5.44