Caching in multiprocessor systems
Tiina Niklander
In AMICT 2009, Petrozavodsk
19.5.2009
Background
More transistors on one chip
Multiple cores
Larger cache
Multiple on-chip caches
More functionality (more functional units, dedicated multimedia / deciphering units, integrated GPU)
Multiple cores introduce new design questions:
Cache organization
Private vs shared caches
Cache coherence
Cache organization
Common organization:
L1 is private
Last-level cache is shared
With three levels:
L1 private
L2: private or shared
L3 Shared
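A quick way to see this organization on a given Linux machine is the sysfs cache topology files; the short C sketch below is illustrative only and assumes the standard /sys/devices/system/cpu/.../cache layout. It prints, for cpu0, each cache level and which CPUs share it, so a private L1 lists only one CPU while a shared last-level cache lists several.

#include <stdio.h>
#include <string.h>

/* Read one sysfs file into buf, stripping the trailing newline. */
static int read_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f || !fgets(buf, (int)len, f)) {
        if (f)
            fclose(f);
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';
    fclose(f);
    return 1;
}

int main(void)
{
    char path[128], level[16], cpus[64];
    for (int idx = 0; ; idx++) {   /* index0, index1, ... one entry per cache */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
        if (!read_line(path, level, sizeof level))
            break;                 /* no more caches listed for cpu0 */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/shared_cpu_list", idx);
        if (!read_line(path, cpus, sizeof cpus))
            strcpy(cpus, "?");
        printf("L%s shared by CPUs %s\n", level, cpus);  /* a single CPU => private */
    }
    return 0;
}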
Private vs Shared cache
Fully private, fully shared, partially shared
Private L2 (a pair of processors may share one)
Shared L2 (all cores can access the whole L2)
F. Sibai: On the performance benefits of sharing and privatizing second- and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412
Shared cache
Simple coherence (only one copy exists)
Different latencies (depend on the distance between CPU and cache bank)
Cache access competition (a core may have to wait for another)
M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In SC2008. IEEE, 2008, pp. 1-12
Private cache
No access competition, smaller latencies,
But coherence becomes an issue!
Same data in multiple caches -> invalidate on write
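The cost of these write invalidations is easy to demonstrate. The following sketch is illustrative C with pthreads, not from the slides (compile with -pthread): two threads update counters that normally end up in the same 64-byte cache line, so every write invalidates the copy held in the other core's private cache. Padding each counter to its own line would remove the traffic.

#include <pthread.h>
#include <stdio.h>

#define ITERS 100000000UL

/* Both counters share one cache line on purpose ("false sharing"). */
static struct { volatile unsigned long c[2]; } shared_line;

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (unsigned long i = 0; i < ITERS; i++)
        shared_line.c[id]++;   /* each write invalidates the other core's copy */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int ids[2] = { 0, 1 };
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    printf("%lu %lu\n", shared_line.c[0], shared_line.c[1]);
    return 0;
}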
Cache partitioning
Design time: Fixed partitioning
Run time:
Fixed partitioning (configuration issue)
Dynamic (based on current need)
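One way such partitioning is often realized in hardware is way partitioning of a set-associative shared cache. The conceptual C sketch below is not from the slides; all names are invented here and it uses the GCC/Clang __builtin_popcount. It contrasts a fixed configuration-time split of the ways with a dynamic rebalance driven by observed miss counts.

#include <stdint.h>
#include <stdio.h>

#define WAYS 8

/* Fixed, configuration-time split of an 8-way shared cache:
 * core 0 may allocate into ways 0-3, core 1 into ways 4-7. */
static uint8_t way_mask[2] = { 0x0F, 0xF0 };

/* Dynamic variant: move one way from the core missing less often to the
 * core missing more often ("based on current need"). */
static void rebalance(const unsigned long misses[2])
{
    int taker = (misses[1] > misses[0]) ? 1 : 0;
    int donor = 1 - taker;
    for (int w = 0; w < WAYS; w++) {
        uint8_t bit = (uint8_t)(1u << w);
        if ((way_mask[donor] & bit) && __builtin_popcount(way_mask[donor]) > 1) {
            way_mask[donor] &= (uint8_t)~bit;   /* donor gives up way w          */
            way_mask[taker] |= bit;             /* taker may now allocate there  */
            return;
        }
    }
}

int main(void)
{
    unsigned long misses[2] = { 1000, 5000 };   /* core 1 is missing more often */
    rebalance(misses);
    printf("core 0 ways: 0x%02x, core 1 ways: 0x%02x\n", way_mask[0], way_mask[1]);
    return 0;
}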
Cache coherence
Protocols: MESI, MSI, MOSI, MOESI
Invalidation message: RFO (Read for ownership)
Each cache snoops the bus to monitor memory ops
State compatibility of one cache line held in two different caches (adapted from Wikipedia):

      M   E   S   I
  M   N   N   N   Y
  E   N   N   N   Y
  S   N   N   Y   Y
  I   Y   Y   Y   Y
M – Modified
(O – Owned, in MOSI/MOESI)
E – Exclusive
S – Shared
I – Invalid
N – combination not allowed
Y – combination allowed
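As a rough illustration of how a snooping cache applies these rules, the C sketch below (not tied to any particular processor) models one cache line's MESI state and the transitions caused by a local write, a remote read, and a remote RFO.

#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Local CPU writes the line.  From I or S the cache first broadcasts an RFO
 * so every other copy is invalidated; from E the upgrade is silent. */
static mesi_t on_local_write(mesi_t s)
{
    (void)s;
    return MODIFIED;
}

/* Another cache broadcasts an RFO for this line: our copy must be dropped
 * (written back first if it was MODIFIED). */
static mesi_t on_remote_rfo(mesi_t s)
{
    (void)s;
    return INVALID;
}

/* Another cache merely reads the line: we keep it, but only as SHARED. */
static mesi_t on_remote_read(mesi_t s)
{
    return (s == INVALID) ? INVALID : SHARED;
}

int main(void)
{
    mesi_t s = EXCLUSIVE;      /* we loaded the line and no one else holds it */
    s = on_local_write(s);     /* E -> M: silent upgrade, no bus traffic      */
    s = on_remote_read(s);     /* M -> S: we supply the data, then share it   */
    s = on_remote_rfo(s);      /* S -> I: another core wants to write         */
    printf("final state: %d (0 = Invalid)\n", (int)s);
    return 0;
}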
(Distributed) cooperative caches
Add a directory structure
The directory knows which local caches hold each data block
Cache-to-cache copying
When the block is in another cache (the directory locates it)
On eviction (the block is stored temporarily in another cache)
E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In PACT'08. ACM, 2008, pp. 134-142
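A minimal sketch of such a directory entry, with invented names and a simple bit vector of sharers, might look as follows: a read miss first asks the directory for a cache that can supply the block and falls back to memory only when no cache holds it.

#include <stdint.h>
#include <stdio.h>

#define MAX_CORES 32

/* One directory entry per memory block (names invented for this sketch). */
struct dir_entry {
    uint32_t sharers;  /* bit i set => core i's private cache holds a copy     */
    int      owner;    /* core holding the master (possibly dirty) copy, or -1 */
};

/* On a read miss by 'core', ask the directory which cache can supply the
 * block; only if nobody holds it do we go to main memory. */
static int lookup_source(const struct dir_entry *e, int core)
{
    if (e->owner >= 0 && e->owner != core)
        return e->owner;                    /* cache-to-cache copy from owner  */
    for (int i = 0; i < MAX_CORES; i++)
        if (i != core && ((e->sharers >> i) & 1u))
            return i;                       /* any other sharer can supply it  */
    return -1;                              /* not cached anywhere: use memory */
}

/* Record that 'core' now holds a copy of the block. */
static void add_sharer(struct dir_entry *e, int core)
{
    e->sharers |= 1u << core;
    if (e->owner < 0)
        e->owner = core;
}

int main(void)
{
    struct dir_entry e = { 0, -1 };
    add_sharer(&e, 2);                      /* core 2 brings the block in */
    printf("core 5 misses, copy from core %d\n", lookup_source(&e, 5));
    return 0;
}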
New improvement ideas for cache performance (1/2)
Split the cache for different tasks
Dynamically allocate cache areas
Software-controlled eviction
GOAL: a thread moves data it no longer needs, but that is strongly shared, to the shared cache to improve the performance of the other threads
A new instruction, evict, tells the processor to move some data from the private L1 or L2 to the shared L3
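No shipping ISA provides exactly this evict instruction; the closest widely available hint on x86 is CLFLUSH, which evicts a line all the way to memory rather than demoting it to the shared L3. The sketch below only illustrates how such a software-controlled eviction hint would be used from C; the function name hand_off is invented here.

#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */
#include <stddef.h>

/* The producing thread is finished with this block; push it out of the
 * core's private cache so other cores will not have to invalidate a dirty
 * private copy when they start using the data.  Note: CLFLUSH evicts all
 * the way to memory, whereas the proposed evict would only demote to L3. */
void hand_off(double *block, size_t n)
{
    const char *p = (const char *)block;
    for (size_t off = 0; off < n * sizeof(double); off += 64)  /* 64-byte lines */
        _mm_clflush(p + off);
    _mm_mfence();        /* make the flushes globally visible before handing off */
}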
New improvement ideas for cache performance (2/2)
Helper threads
GOAL: an additional thread executes parts of the code ahead of the actual thread to 'prefetch' data into the cache
Generate memory traces for the programmer, for tuning the software performance
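A minimal sketch of the helper-thread idea, assuming pthreads and the GCC/Clang __builtin_prefetch hint (compile with -pthread), looks like this: the helper walks the same array slightly ahead of the worker so the worker's reads hit in the cache more often. It is only an illustration, not the cited authors' implementation.

#include <pthread.h>
#include <stdio.h>
#include <stddef.h>

#define N (1 << 22)                 /* 32 MB of doubles: larger than typical caches */
static double data[N];

/* Helper thread: walk the same array ahead of the worker and touch each
 * cache line with a prefetch hint. */
static void *helper(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N; i += 8)           /* 8 doubles = one 64-byte line        */
        __builtin_prefetch(&data[i], 0, 1);     /* read access, low temporal locality  */
    return NULL;
}

int main(void)
{
    for (size_t i = 0; i < N; i++)
        data[i] = (double)i;

    pthread_t t;
    pthread_create(&t, NULL, helper, NULL);     /* start prefetching ahead of the worker */

    double sum = 0.0;                           /* the "actual" thread's computation */
    for (size_t i = 0; i < N; i++)
        sum += data[i];

    pthread_join(t, NULL);
    printf("sum = %f\n", sum);
    return 0;
}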
Conclusion
Current focus is on fine-tuning cache performance
Cache coherence itself was solved earlier
It is not always used (non-coherent usage may be allowed)
L2 and L3 caches
Shared or private
Cache partitioning
Support for software-based improvements
Eviction hints
Traces
Prefetching (e.g. with a helper thread)
References
S. Fide, S. Jenks: Proactive use of shared L3 caches to enhance cache communications in multi-core processors. IEEE Computer Architecture Letters, vol. 7 (2008), pp. 57-60
E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In Proc. of the Conf. on Parallel Architectures and Compilation Techniques (PACT'08). ACM, 2008, pp. 134-142
M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In Proc. of the 2008 ACM/IEEE Conf. on Supercomputing (SC2008). IEEE, 2008, pp. 1-12
L. Peng et al.: Memory hierarchy performance measurement of commercial dual-core desktop processors. Journal of Systems Architecture 54 (2008), pp. 816-828
F. Sibai: On the performance benefits of sharing and privatizing second- and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412
J. Zhang, X. Fan, S.H. Liu: A Pollution Alleviate L2 Cache Replacement Policy for Chip Multiprocessor Architecture. In Int. Conf. on Networking, Architecture and Storage. IEEE, 2008, pp. 310-316