IEEE TRANSACTIONS ON COMPUTERS, VOL. 57, NO. 9, SEPTEMBER 2008
Photonic Networks-on-Chip for Future
Generations of Chip Multiprocessors
Assaf Shacham, Member, IEEE, Keren Bergman, Senior Member, IEEE, and
Luca P. Carloni, Member, IEEE
A. Shacham is with Aprius Inc., 440 N. Wolfe Rd., Sunnyvale, CA 94085. E-mail: [email protected].
K. Bergman is with the Department of Electrical Engineering, Columbia University, 500 W. 120th St., 1300 Mudd, New
York, NY 10027. E-mail: [email protected].
L.P. Carloni is with the Department of Computer Science, Columbia University, 466 Computer Science Building, 1214
Amsterdam Avenue, Mail Code: 0401, New York, NY 10027-7003. E-mail: [email protected].
2011. 06. 08.
Kim Yeo-myung
RFAD LAB, YONSEI University
CONTENTS
 I. INTRODUCTION
 II. RELATED WORK
 III. HYBRID NOC MICROARCHITECTURE
 IV. NETWORK DESIGN
 V. DESIGN ANALYSIS AND OPTIMIZATION
 VI. COMPARATIVE POWER ANALYSIS
 VII. CONCLUSION
INTRODUCTION
 Parallel Computational Core
– New commercial release for driving performance
– The role of interconnect and associated global communication
infrastructure is becoming central to the chip performance
 Issue of Network-on-Chip(NoC)
– Large Bandwidth & stringent latency requirements
– Electrical NoCs can provide enough performance, but at the cost of
large power consumption → Photonic NoC
 Photonic NoCs can deliver a dramatic reduction in the power
expended on intrachip global communications while
satisfying the high-bandwidth requirements of CMPs
 Hybrid NoC Architecture – Photonic + Electronic
RELATED WORK
 Relative performance of optical and electrical on-chip
interconnects <Collet et al>
– The penetration of on-chip optical interconnects can be
envisioned in lengths larger than 1,000 times the wavelength
where they can have lower power and latency than electronic
interconnects
 Multicore processor architecture where remote
memory accesses are implemented as transactions on a
global on-chip optical bus <Kirman et al>
– A latency reduction as high as 50 percent for some applications
and a power reduction of about 30 percent over a baseline
electrical bus
RELATED WORK
 An optical NoC based on a wavelength-routed crossbar
<Briere et al>
– The crossbar is composed of passive resonator devices; routing
between an input-output pair is achieved by selecting
the appropriate wavelength
– Problem: requires either widely tunable laser sources or large
arrays of fixed-wavelength sources with fast wavelength-selection switches
 Benefits of optical intrachip interconnects <Intel>
– While optical clock distribution networks are not especially
attractive, wavelength division multiplexing (WDM) does offer
interesting advantages for intrachip optical interconnects over
copper in deep-submicron processes.
HYBRID NOC MICROARCHITECTURE
 Meaning of Hybrid
– Optical + Electronic
– Circuit-switched network(bulk message) + packet-switched
network(short message)
 Why Hybrid?
– Photonic packet switching? Two necessary functions for packet
switching, namely, buffering and header processing, are very
difficult to implement with optical devices
– Electronic NoC Problem? Electronic NoCs do have many
advantages in flexibility and abundant functionality, but tend to
consume high power, which scales up with the transmitted
bandwidth
HYBRID NOC MICROARCHITECTURE
 Operation of optical circuit switching
1. Electronic control packet is transmitted → routed in the
electronic network & setting up a photonic path
2. Buffering takes place for the electronic packets during the path-setup phase
3. The established paths are optical circuits between processing
cores → enabling low power, low latency, high BW.
 Advantage of photonic path
– Bit-rate transparency: the ability of a device to handle an optical
signal regardless of its transmission speed (bit rate) → In electronics,
dynamic (switching) power dissipation scales with the bit rate,
but photonic switches switch on and off once per
message, so their energy dissipation does not depend on the bit
rate
– Low loss in optical wave guides
HYBRID NOC MICROARCHITECTURE
 Exploiting Photonics in NoC Design
* Optical Clock Distribution Network
* Torus Networks
* Off-Chip Laser
* WDM
[Figure labels: Optical Switch (microring-resonator structure), Waveguide & Fiber, Coupling lens, Modulator]
The photonic NoC is constructed in a single layer, above the metal
HYBRID NOC MICROARCHITECTURE
 Life of a Message in the Photonic NoC
1. A write operation is started from a processing unit in one
core to a memory that is located in another core.
2. As soon as the write address is known, a path-setup packet is
sent on the electronic control network.
3. The control packet is routed in the electronic network, reserving
the photonic switches along the path for the photonic message
which will follow it.
4. When the path-setup packet reaches the destination port, the
photonic path is reserved and is ready to route the message.
5. A short light pulse can then be transmitted onto the waveguide
in the opposite direction (from the destination to the source),
signaling to the source that the path is open.
6. After the message transmission is completed, a path teardown
packet is sent to free the path resources for usage by other
messages.
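The reserve-then-release pattern in steps 2 to 6 can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `PhotonicPathSketch`, the switch identifiers, and the choice to release a partial path as soon as a switch is found busy are all hypothetical and are not taken from the paper.

```python
class PhotonicPathSketch:
    """Toy model: each photonic switch is held by at most one message."""

    def __init__(self):
        self.reserved = {}  # switch id -> id of the message holding it

    def setup(self, msg_id, switches):
        """Steps 2-4: reserve every switch along the path, hop by hop."""
        taken = []
        for sw in switches:
            if sw in self.reserved:
                # Assumption: on blocking, undo the partial reservation.
                for t in taken:
                    del self.reserved[t]
                return False
            self.reserved[sw] = msg_id
            taken.append(sw)
        # Step 5: the destination would now signal "path open" to the source.
        return True

    def teardown(self, msg_id):
        """Step 6: free all switches held by this message."""
        held = [sw for sw, owner in self.reserved.items() if owner == msg_id]
        for sw in held:
            del self.reserved[sw]


net = PhotonicPathSketch()
first_ok = net.setup("m1", ["s0", "s1", "s2"])
second_ok = net.setup("m2", ["s3", "s1"])  # shares s1 with m1, so it is blocked
net.teardown("m1")
retry_ok = net.setup("m2", ["s3", "s1"])   # succeeds once m1 has been torn down
```

The second message fails only while the first one holds the shared switch, which is why the teardown packet in step 6 matters for other traffic.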
NETWORK DESIGN(Building Blocks)
 Photonic Switching Element(PSE)
– Microring-resonator structure(similar device : optically pumped)
– OFF state: The resonant frequency of the rings is different from
the wavelength on which the optical data is modulated
– ON state: The switch is turned on by the injection of electrical
current into p-n contacts surrounding the rings
– Switching time : 30 ps
– Their merit lies mainly in their extremely small footprint, with ring
diameters of approximately 12 µm, and their low power consumption
NETWORK DESIGN(Building Blocks)
 Photonic Switching Element(PSE)
– 4 X 4 switches (controlled by an electronic circuit termed an electronic router, ER)
– Control packets are received in the ER, processed, and sent to
their next hop, while the PSEs are switched ON and OFF
accordingly
– The switch is internally blocking: some connections cannot be made at the
same time. (Nonblocking switches offer improved
performance and simplify network management and routing.)
NETWORK DESIGN(Topology)
 4 X 4 folded torus network
– The communication requirements of a CMP are best served by a
2D regular topology such as a mesh or a torus
– A regular 2D topology requires 5 X 5 switches which are overly
complex to implement using photonic technology.
– Therefore, a folded-torus topology is used as a base and augmented
with access points for the gateways.
NETWORK DESIGN(Topology)
 4 X 4 folded torus network
– The access points for the gateways are designed with two goals
in mind: 1) to facilitate injection and ejection without interference
with the through traffic on the torus and 2) to avoid blocking
between injected and ejected traffic which may be caused by the
switches' internal blocking.
NETWORK DESIGN(Flow Control)
 XY dimension-order routing on the torus network
– Path-setup time is required: the control packet traverses a number of ERs and
undergoes some processing (and possibly blocking) at each hop (on the order of nanoseconds)
– The transmission latency of the optical data is very short and
depends only on the group velocity of light in a silicon waveguide:
about 300 ps for 2 cm
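As a rough illustration of XY dimension-order routing, the sketch below corrects the X coordinate first and the Y coordinate second on a square torus. The function name `xy_route`, the coordinate convention, and the rule of taking the shorter way around each ring are assumptions made for this example; the slides do not say how wrap-around links are chosen.

```python
def xy_route(src, dst, size):
    """Return the list of (x, y) nodes visited from src to dst.

    X is fully corrected first, then Y. In each dimension the shorter
    direction around the ring is taken (ties go in the +1 direction).
    """
    def steps(a, b):
        forward = (b - a) % size
        backward = (a - b) % size
        return (1, forward) if forward <= backward else (-1, backward)

    x, y = src
    path = [(x, y)]
    dx, nx = steps(x, dst[0])
    for _ in range(nx):
        x = (x + dx) % size
        path.append((x, y))
    dy, ny = steps(y, dst[1])
    for _ in range(ny):
        y = (y + dy) % size
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1), 4)       # two X hops, then one Y hop
wrapped = xy_route((0, 0), (3, 3), 4)     # uses the wrap-around link in X and in Y
```

The number of hops, `len(path) - 1`, is what determines how many ERs the path-setup packet has to traverse.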
DESIGN ANALYSIS AND OPTIMIZATION
 Simulation Setup
– Developed POINTS (Photonic On-chip Interconnection Network
Traffic Simulator)
– 36-core CMP, 6X6 Planar layout, 22nm CMOS tech.
– The chip size is assumed to be 20 mm along its edge, so each
core is 3.3 X 3.3 mm in size.
– The network is a 6 X 6 folded-torus network augmented with
36 gateway access points, so it uses a matrix of 12 X 12
switches.
– A propagation delay of 15.4 ps/mm is assumed for the optical signals
in a silicon waveguide
– The inter-PSE delay and interrouter delay are, therefore, 13 and
220 ps, respectively
– The PSE setup time is assumed to be 1 ns and the router
processing latency is 600 ps
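The figures in this setup can be cross-checked with a few lines of arithmetic. The constants below are taken from the bullets above; the function `contention_free_setup_ps` is a hypothetical simplification that charges one router latency plus one interrouter delay per hop and ignores queuing, blocking, and the PSE setup time.

```python
CHIP_EDGE_MM = 20.0
CORES_PER_SIDE = 6
OPTICAL_DELAY_PS_PER_MM = 15.4
ROUTER_LATENCY_PS = 600.0
INTERROUTER_DELAY_PS = 220.0

# Each core occupies one sixth of the chip edge: about 3.3 mm.
core_edge_mm = CHIP_EDGE_MM / CORES_PER_SIDE

# Optical time of flight across one full chip edge (2 cm): about 308 ps.
edge_crossing_ps = CHIP_EDGE_MM * OPTICAL_DELAY_PS_PER_MM


def contention_free_setup_ps(hops):
    """Assumed lower bound on path-setup latency for a given hop count."""
    return hops * (ROUTER_LATENCY_PS + INTERROUTER_DELAY_PS)


ten_hop_setup_ps = contention_free_setup_ps(10)  # 8200 ps, i.e. 8.2 ns
```

Even without contention, setting up a ten-hop path takes far longer than the optical time of flight, which is consistent with path setup dominating the latency.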
DESIGN ANALYSIS AND OPTIMIZATION
 Dealing with Deadlock
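– Deadlock:
1. Program 1 requests resource A and is granted it.
2. Program 2 requests resource B and is granted it.
3. Program 1 additionally requests resource B, but because resource B is in use by
another program, it waits in the queue until the resource becomes available.
4. Program 2 additionally requests resource A, but because resource A is in use by
another program, it waits in the queue until the resource becomes available.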
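The four-step scenario above forms a cycle of programs waiting on each other. A minimal sketch of detecting such a cycle follows; the function name `has_deadlock` and the two dictionaries are hypothetical, and each program is assumed to wait for at most one resource.

```python
def has_deadlock(holds, wants):
    """Detect a wait-for cycle.

    holds: resource -> program that currently owns it
    wants: program -> resource it is waiting for
    """
    for start in wants:
        seen = set()
        prog = start
        # Follow "waits for the owner of" links until they end or repeat.
        while prog in wants and prog not in seen:
            seen.add(prog)
            prog = holds.get(wants[prog])
        if prog in seen:
            return True
    return False


holds = {"A": "P1", "B": "P2"}                       # steps 1 and 2
deadlocked = has_deadlock(holds, {"P1": "B", "P2": "A"})  # steps 3 and 4
safe = has_deadlock(holds, {"P1": "B"})              # only step 3 has happened
```

In the network setting, the "programs" would correspond to path-setup packets and the "resources" to the switches or links they hold.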
DESIGN ANALYSIS AND OPTIMIZATION
 Optimizing Message Size
– Large messages → Link utilization is compromised and
serialization latency is increased.
– Small messages → The relative overhead of the path-setup
latency becomes too large and efficiency is degraded.
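The small-message side of this trade-off can be illustrated with a simple model: the fraction of time a path spends carrying data rather than being set up. The function `link_efficiency` and the 10 ns setup latency used below are assumptions for illustration only, and the 960 Gbps default is the peak optical bandwidth quoted in the power analysis. The model does not capture the large-message penalties (contention and serialization), which come from the simulations.

```python
def link_efficiency(message_bytes, setup_ns, bandwidth_gbps=960.0):
    """Fraction of the path-holding time spent transmitting data."""
    # bits divided by Gbit/s gives nanoseconds
    transmit_ns = message_bytes * 8 / bandwidth_gbps
    return transmit_ns / (setup_ns + transmit_ns)


ASSUMED_SETUP_NS = 10.0  # hypothetical value, not from the slides
efficiency_by_size = {
    size: link_efficiency(size, ASSUMED_SETUP_NS)
    for size in (64, 1200, 4096, 16384)
}
```

With these assumed numbers, a 1200-byte message spends exactly as long in setup as in transmission, and the efficiency approaches 1 only for multi-kilobyte messages.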
DESIGN ANALYSIS AND OPTIMIZATION
 Optimizing Message Size
– The optimal DMA block size for the transactions over the
photonic NoC ranges between 4 and 16 Kbytes
DESIGN ANALYSIS AND OPTIMIZATION
 Increasing Path Multiplicity
DESIGN ANALYSIS AND OPTIMIZATION
 Evaluating Path-setup Procedures
– Reductions in path-setup latency translate to improved efficiency
of the network interfaces and to higher average bandwidth.
– tq is a major contributor to the overall setup latency
– Some techniques are proposed to reduce tq
(e.g., immediately dropping any path-setup packet that is blocked
instead of buffering it)
COMPARATIVE POWER ANALYSIS
 Power Analysis → Power reduction is the main motivation for the
design of a photonic NoC
– To evaluate the power savings, a comparative high-level
power analysis is performed.
 Condition of Power Analysis
– Same bandwidth & same number of processing cores
– Assume: 22nm CMOS technology, hosting 36 processing cores,
each requiring a peak bandwidth of 800 Gbps and an average bandwidth
of 512 Gbps
– Assume: uniform traffic model, mesh topology, XY dimension-order routing
COMPARATIVE POWER ANALYSIS
 Reference Electronic NoC
1. Reading from a buffer (high bandwidth requires wide parallel lines),
2. Traversing the router's internal crossbar,
3. Transmission across the interrouter link,
4. Writing to a buffer in the subsequent router, and
5. Triggering an arbitration decision.
COMPARATIVE POWER ANALYSIS
 Proposed Photonic NoC
1. The photonic data-transfer network (6X6 CMP)
Path multiplicity factor : 2 → 12 X 12 Photonic mesh (576 PSEs)
Power of PSE : On state → 10 mW, Off state → no dissipation
Total Power consumption (statistic)
2. Electronic Control network (6X6 CMP)
Each photonic message is accompanied by two 32-bit control
packets and the typical size of a message is 2 Kbytes.
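The counts on this slide can be checked with simple arithmetic. In the sketch below, the value of 4 PSEs per switch is inferred from 576 / (12 X 12), 2 Kbytes is read as 2048 bytes, and `photonic_network_power_mw` is a hypothetical helper that multiplies the number of PSEs in the ON state by the 10 mW figure; the average number of PSEs that are ON is not given here and is left as an input.

```python
SWITCHES_PER_SIDE = 12
PSES_PER_SWITCH = 4      # assumption inferred from 576 / (12 * 12)
PSE_ON_POWER_MW = 10.0   # OFF-state PSEs dissipate nothing

total_pses = SWITCHES_PER_SIDE ** 2 * PSES_PER_SWITCH  # 576

# Two 32-bit control packets accompany each 2-Kbyte photonic message.
control_bits = 2 * 32
message_bits = 2 * 1024 * 8
control_overhead = control_bits / message_bits  # about 0.39 percent


def photonic_network_power_mw(pses_on):
    """Power of the data-transfer network for a given count of ON PSEs."""
    return pses_on * PSE_ON_POWER_MW


worst_case_mw = photonic_network_power_mw(total_pses)  # all PSEs ON at once
```

The control traffic is a fraction of a percent of the data volume, which is why the electronic control network can stay small and low power.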
COMPARATIVE POWER ANALYSIS
 Proposed Photonic NoC
3. The electro-optic interfaces (modulators and receivers)
960 Gbps BW → 40 Gbps X 24 wavelengths → 24 modulators and
receivers are required.
With silicon ring-resonator modulators and SiGe photodetectors, the energy is estimated to decrease to about 0.2 pJ/bit in the next 8-10 years
(Supplementary circuits that are usually required for the implementation of optical
receivers (CDR, serializer, etc.) are not needed in an ultrashort link in which the
modulation rate is equal to the chip clock rate)
(The off-chip laser sources consume an estimated power of 10 mW per wavelength.
Although a large number of lasers are required to exploit the bandwidth potential of
the optical NoC, their power is dissipated off-chip and does not contribute to the chip
power density)
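A back-of-the-envelope estimate of the interface power follows from the figures above. This is a sketch under one labeled assumption: each of the 36 gateways is taken to move the 512 Gbps average bandwidth stated in the analysis conditions, at the projected 0.2 pJ/bit.

```python
WAVELENGTHS = 24
RATE_PER_WAVELENGTH_GBPS = 40
ENERGY_PJ_PER_BIT = 0.2   # projected modulator/receiver energy
AVERAGE_BW_GBPS = 512     # assumed per-gateway average load
GATEWAYS = 36

# Aggregate peak bandwidth of one gateway: 24 x 40 Gbps.
peak_bw_gbps = WAVELENGTHS * RATE_PER_WAVELENGTH_GBPS  # 960

# Power = energy per bit x bits per second.
per_gateway_w = (ENERGY_PJ_PER_BIT * 1e-12) * (AVERAGE_BW_GBPS * 1e9)
total_interface_w = per_gateway_w * GATEWAYS  # roughly 3.7 W
```

Under this assumption the interfaces together draw a few watts, and the laser power is excluded because it is dissipated off-chip.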
CONCLUSION
 The motivation behind our work
– 1. Multicore processors step into an era where high-bandwidth
communication among large numbers of cores is a key
driver of computing performance.
– 2. Power dissipation has clearly become the limiting factor in
the design of high-performance microprocessors
– 3. Recent breakthroughs in the field of silicon photonics
suggest that the integration of optical elements with CMOS
electronics is likely to become viable in the near future.
 This paper aims at laying the groundwork for future
research progress by providing a complete discussion of
the fundamental issues that need to be addressed to
design a photonic NoC for CMPs