Slides used by Karthik


Intel Single Chip Cloud Computer (SCC) – An Overview
by Karthik V. M.
Motivations for SCC [1]
Many-core processor research

High-performance power-efficient fabric

Fine-grain power management

Message-based programming support
Parallel Programming Research

Better support for scale-out model servers


OS, Communication architecture
Scale out programming model for client

Programming languages, runtimes
Courtesy: Intel
SCC Feature Set




First silicon with 48 IA cores on a single die
Power envelope 125 W, core @ 1 GHz, mesh @ 2 GHz
Message passing architecture

No coherent shared memory

Proof of concept for scalable many-core solution
Next generation 2D mesh interconnect


Bisection bandwidth 1.5 Tb/s to 2 Tb/s, average power 6 W to 12 W
Fine grain dynamic power management
Courtesy: Intel
SCC system overview
Courtesy: Intel
Die Architecture
Courtesy: Intel
Voltage and Frequency islands
Courtesy: Intel
Package and Test Board
Courtesy: Intel
Core & Router Fmax
Courtesy: Intel
SCC Platform Board Overview
Courtesy: Intel
SCC Software





SCC customized Linux
Cross compilers for the Pentium processor available for C++ & Fortran
Cross-compiled MPI2, including the iTAC trace analyzer, available
C++ programming framework “baremetal C” available for creating baremetal apps, OS etc.
Management Console PC software

sccGui
Courtesy: Intel
Programmer's view of SCC
Courtesy: Intel
RCCE – A small library for many-core communication [2][3]



Compact, lightweight communication library
Research vehicle to see how message-passing APIs map to many cores
One can work close to the hardware (e.g., manipulate the MPB)

Same program executes at all cores

Has MPI-style APIs & power management APIs

Two-level APIs – gory & non-gory

RCCE emulator
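
The "same program executes at all cores" model with MPI-style send and receive can be illustrated with a toy emulation. This is a minimal sketch in Python using threads, not RCCE itself: the names `send`, `recv`, `program`, `run_ring`, and `NUM_CORES` are hypothetical, the per-pair queues only stand in for slots in the message-passing buffer (MPB), and the core count is reduced for the demo. Each emulated core passes a running sum around a ring.

```python
import queue
import threading

NUM_CORES = 4  # assumption: small count for the demo; the SCC has 48 cores

# One queue per (receiver, sender) pair, standing in for MPB slots.
# Hypothetical structure for illustration; not how RCCE lays out the MPB.
inbox = {(dst, src): queue.Queue()
         for dst in range(NUM_CORES) for src in range(NUM_CORES)}


def send(src, dst, payload):
    """Deposit a message for core `dst` (non-blocking in this toy model)."""
    inbox[(dst, src)].put(payload)


def recv(dst, src):
    """Block until a message from core `src` arrives at core `dst`."""
    return inbox[(dst, src)].get(timeout=5)


def program(core_id, results):
    """The same program runs on every core; behaviour depends only on core_id."""
    right = (core_id + 1) % NUM_CORES
    left = (core_id - 1) % NUM_CORES
    if core_id == 0:
        send(core_id, right, 0)        # core 0 starts the ring
        total = recv(core_id, left)    # and receives the final sum
    else:
        total = recv(core_id, left) + core_id
        send(core_id, right, total)
    results[core_id] = total


def run_ring():
    """Start one thread per emulated core and collect each core's value."""
    results = [None] * NUM_CORES
    threads = [threading.Thread(target=program, args=(i, results))
               for i in range(NUM_CORES)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_ring())
```

Every thread runs the identical `program` function and branches only on its own id, which is the single-program style the slide describes.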
Courtesy: Intel
Software Managed Cache Coherence


Implementing hardware-managed cache coherence is difficult

Limited Power budget

High complexity and validation effort
Software Managed Coherence


Scales with number of cores
Multiple apps running in separate coherency domains

Dynamically reconfigurable coherency domains

Most apps are RO-shared, few RW-shared
Courtesy: Intel
Software Managed Cache Coherence (cont.)



Shared virtual memory can be used to support coherency (like DSM)
Coherency is maintained by regions being owned exclusively
A region can then be handed over to another core for exclusive operation

Some regions are jointly accessible

No coherence traffic until ownership is changed

Consistency guaranteed only at release/acquire points
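
The ownership scheme above can be sketched as a small model. This is an illustrative Python toy under stated assumptions, not Intel's implementation: the classes `Region` and `Core`, the `transfers` counter, and the function `demo` are all hypothetical, a dictionary stands in for shared memory, and jointly accessible regions are left out. It shows that writes stay in the owner's private copy, that shared memory is updated only at release, and that the only "coherence traffic" is the ownership change itself.

```python
class Region:
    """A shared-memory region with at most one exclusive owner at a time."""

    def __init__(self, data):
        self.backing = dict(data)  # stands in for shared (off-chip) memory
        self.owner = None          # id of the owning core, or None
        self.transfers = 0         # number of ownership changes so far


class Core:
    """A core with a private, non-coherent cache of the regions it owns."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.cache = {}            # Region -> private copy of its data

    def acquire(self, region):
        """Take exclusive ownership and refresh the private copy."""
        if region.owner is not None:
            raise RuntimeError("region is owned by core %d" % region.owner)
        region.owner = self.core_id
        region.transfers += 1
        self.cache[region] = dict(region.backing)

    def _check_owner(self, region):
        if region.owner != self.core_id:
            raise PermissionError("core %d does not own the region" % self.core_id)

    def write(self, region, key, value):
        """Write to the private copy only; shared memory is not touched."""
        self._check_owner(region)
        self.cache[region][key] = value

    def read(self, region, key):
        """Read from the private copy."""
        self._check_owner(region)
        return self.cache[region][key]

    def release(self, region):
        """Flush the private copy to shared memory and give up ownership."""
        self._check_owner(region)
        region.backing = dict(self.cache.pop(region))
        region.owner = None


def demo():
    """Core 0 writes twice, releases, then core 1 acquires and reads."""
    region = Region({"x": 0})
    a, b = Core(0), Core(1)
    a.acquire(region)
    a.write(region, "x", 1)
    a.write(region, "x", 2)
    before_release = region.backing["x"]  # shared memory not yet updated
    a.release(region)
    b.acquire(region)
    return before_release, b.read(region, "x"), region.transfers


if __name__ == "__main__":
    print(demo())
```

In `demo`, the two writes by core 0 cause no ownership transfer, and core 1 observes the new value only because it acquired the region after core 0 released it.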
Courtesy: Intel
Separated Coherency Domains
Courtesy: Intel
Multiple SCC Chips – Wider Coherency
Courtesy: Intel
References
[1] J. Howard et al., “A 48-core IA-32 message-passing processor with
DVFS in 45nm CMOS,” in Solid-State Circuits Conference Digest of
Technical Papers (ISSCC), 2010 IEEE International, 7-11 Feb. 2010,
pp. 108–109.
[2] T. G. Mattson and R. F. Van der Wijngaart, “RCCE: a small library for
many-core communication,” Intel Corporation, Tech. Rep., May 2010.
[3] T. G. Mattson, M. Riepen, T. Lehnig, P. Brett, W. Haas, P. Kennedy,
J. Howard, S. Vangal, N. Borkar, G. Ruhl, and S. Dighe, “The 48-core
SCC processor: the programmer’s view,” in Proceedings of the 2010
ACM/IEEE International Conference for High Performance Computing,
Networking, Storage and Analysis, ser. SC ’10. Washington, DC,
USA: IEEE Computer Society, 2010, pp. 1–11. [Online]. Available:
http://dx.doi.org/10.1109/SC.2010.53