Slides used by Karthik
Intel Single-chip Cloud Computer (SCC): An Overview
by Karthik.V.M.
Motivations for SCC [1]
Many-core processor research
High-performance, power-efficient fabric
Fine-grain power management
Message-based programming support
Parallel programming research
Better support for scale-out-model servers: OS, communication architecture
Scale-out programming model for clients: programming languages, runtimes
Courtesy: Intel
SCC Feature Set
First silicon with 48 IA cores on a single die
Power envelope 125 W; cores at 1 GHz, mesh at 2 GHz
Message-passing architecture
No coherent shared memory
Proof of concept for a scalable many-core solution
Next-generation 2D mesh interconnect
Bisection bandwidth 1.5 Tb/s to 2 Tb/s; average mesh power 6 W to 12 W
Fine-grain dynamic power management
Courtesy: Intel
SCC System Overview
Courtesy: Intel
Die Architecture
Courtesy: Intel
Voltage and Frequency Islands
Courtesy: Intel
Package and Test Board
Courtesy: Intel
Core & Router Fmax
Courtesy: Intel
SCC Platform Board Overview
Courtesy: Intel
SCC Software
SCC-customized Linux
Cross-compilers for the Pentium processor, available for C++ and Fortran
Cross-compiled MPI-2, including the Intel Trace Analyzer and Collector (ITAC)
C++ programming framework "baremetal C" for creating bare-metal apps, operating systems, etc.
Management Console PC software: sccGui
Courtesy: Intel
Programmer's View of SCC
Courtesy: Intel
RCCE – A Small Library for Many-Core Communication [2][3]
Compact, lightweight communication library
Research vehicle for studying how message-passing APIs map onto many cores
Lets the programmer work close to the hardware (e.g., manipulating the message-passing buffer, MPB, directly)
The same program executes on all cores (SPMD)
Provides MPI-style APIs and power-management APIs
Two API levels: "gory" and "non-gory"
RCCE emulator
Courtesy: Intel
Software-Managed Cache Coherence
Implementing hardware-managed cache coherence is difficult:
Limited power budget
High complexity and validation effort
Software-managed coherence:
Scales with the number of cores
Lets multiple apps run in separate coherency domains
Allows coherency domains to be reconfigured dynamically
Fits typical sharing patterns: most apps share data read-only (RO), few share it read-write (RW)
Courtesy: Intel
Software-Managed Cache Coherence (cont.)
Shared virtual memory can be used to support coherency (as in distributed shared memory, DSM)
Coherency is maintained by having each region owned exclusively by one core
A region can then be handed over to another core for exclusive use
Some regions are jointly accessible
No coherence traffic until ownership changes
Consistency is guaranteed only at release/acquire points
Courtesy: Intel
Separated Coherency Domains
Courtesy: Intel
Multiple SCC Chips – Wider Coherency
Courtesy: Intel
References
[1] J. Howard et al., “A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS,” in 2010 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb. 7–11, 2010, pp. 108–109.
[2] T. G. Mattson and R. F. van der Wijngaart, “RCCE: a small library for many-core communication,” Intel Corporation, Tech. Rep., May 2010.
[3] T. G. Mattson, M. Riepen, T. Lehnig, P. Brett, W. Haas, P. Kennedy, J. Howard, S. Vangal, N. Borkar, G. Ruhl, and S. Dighe, “The 48-core SCC processor: the programmer’s view,” in Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’10), Washington, DC, USA: IEEE Computer Society, 2010, pp. 1–11. [Online]. Available: http://dx.doi.org/10.1109/SC.2010.53