Transcript SMP OS
Operating System Issues in
Multi-Processor Systems
John Sung
Hardware Engineer
Compaq Computer Corporation
www.compaq.com
Outline
Multi-Processor Hardware Issues
Snoopy Bus System Architecture
AMD Athlon’s Snoopy Protocol
ccNUMA System Architecture
AMD Athlon’s LDT System Bus
SGI Origin’s ccNUMA System Architecture
Alpha 21364 System Architecture
ccNUMA and CPU Scheduling
Conclusion
Multi-Processor Hardware Issues
Bandwidth/Latency
Scalability
Processor to Processor
Processor to Memory
Processor to I/O
Performance should increase as CPUs and memory are added
Coherency/Synchronization
Give software coherent view of memory
Provide synchronization primitives
Snoopy Bus System Architecture
Snoopy Bus System Architecture
A shared bus connects processors, memory, and I/O
Scales up to ~16 processors
Limited by bus bandwidth
Cache Coherency Protocol
Snoops the bus for memory traffic
Each processor’s cache has to “listen” for addresses it holds
Does the “right thing” (e.g., supplies dirty data or invalidates its copy) to give software a coherent view of memory
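What the “right thing” means can be sketched with a simplified MSI (Modified/Shared/Invalid) protocol. This is a generic textbook-style illustration, not the protocol of any specific processor named in this talk; the class and method names are hypothetical. Every cache sees every bus transaction, downgrades a Modified line when another CPU reads it, and invalidates its copy when another CPU writes.

```python
from enum import Enum


class State(Enum):
    """Simplified MSI states for one cache line (hypothetical model)."""
    MODIFIED = "M"  # only copy, dirty
    SHARED = "S"    # clean copy, other caches may also hold it
    INVALID = "I"   # no valid copy


class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> State

    def state(self, addr):
        return self.lines.get(addr, State.INVALID)


class SnoopyBus:
    """Every cache observes (snoops) every transaction on the shared bus."""

    def __init__(self, caches):
        self.caches = caches

    def read(self, requester, addr):
        # A cache holding the line Modified would supply the data
        # (data movement is not modeled here) and drops to Shared.
        for cache in self.caches:
            if cache is not requester and cache.state(addr) is State.MODIFIED:
                cache.lines[addr] = State.SHARED
        if requester.state(addr) is State.INVALID:
            requester.lines[addr] = State.SHARED

    def write(self, requester, addr):
        # All other copies are invalidated so only the writer holds the line.
        for cache in self.caches:
            if cache is not requester and cache.state(addr) is not State.INVALID:
                cache.lines[addr] = State.INVALID
        requester.lines[addr] = State.MODIFIED


a, b = SnoopingCache("cpu0"), SnoopingCache("cpu1")
bus = SnoopyBus([a, b])
bus.write(a, 0x40)
bus.read(b, 0x40)
print(a.state(0x40).value, b.state(0x40).value)  # S S
```

Because every cache must examine every transaction, the shared bus is also the bottleneck that limits this design to roughly 16 processors.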
Snoopy Bus System Architecture
[Figure: three CPU cores, each with its own cache, connected to memory and I/O over a single shared bus]
ccNUMA System Architecture
ccNUMA System Architecture
Cache-Coherent Non-Uniform Memory Access
Memory is distributed and attached to processors
An interconnect network connects the processor/memory sets
Each processor owns part of the memory space
Cache coherency protocol
Gives software coherent view of memory
Protocol primitives for synchronization
A directory tracks which processors hold a copy of each memory block
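A minimal sketch of the directory idea, assuming a simplified home-node entry with a sharer set and a single exclusive owner. The names `DirectoryEntry`, `handle_read`, and `handle_write` are hypothetical, and real protocols have more states and message types. Instead of broadcasting on a bus, the home node consults the directory and sends messages only to the processors that actually hold a copy.

```python
class DirectoryEntry:
    """Directory state for one memory block at its home node (simplified)."""

    def __init__(self):
        self.sharers = set()  # processors holding a clean, read-only copy
        self.owner = None     # processor holding the exclusive (possibly dirty) copy


def handle_read(entry, cpu):
    """Read miss from `cpu`; returns the coherence messages the home node sends."""
    if entry.owner == cpu:
        return []  # requester already holds the exclusive copy
    messages = []
    if entry.owner is not None:
        # The owner must write back its dirty data; it keeps a shared copy.
        messages.append(("writeback", entry.owner))
        entry.sharers.add(entry.owner)
        entry.owner = None
    entry.sharers.add(cpu)
    return messages


def handle_write(entry, cpu):
    """Write request from `cpu`; every other copy must be invalidated."""
    targets = set(entry.sharers)
    if entry.owner is not None:
        targets.add(entry.owner)
    targets.discard(cpu)
    entry.sharers = set()
    entry.owner = cpu
    return [("invalidate", p) for p in sorted(targets)]


entry = DirectoryEntry()
handle_read(entry, 0)
handle_read(entry, 2)
print(handle_write(entry, 1))  # [('invalidate', 0), ('invalidate', 2)]
print(handle_read(entry, 3))   # [('writeback', 1)]
```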
ccNUMA System Architecture
CPU
Core
CPU
Core
Cache
Cache
Memory Network
Directory Router
I/O
Memory Network
Directory Router
I/O
Network Fabric
SGI Origin System Architecture
SGI CrayLink™
Node = 2 CPUs and their caches
Module = Memory + Directory + Hub
2 Modules per Router
System = Modules + Routers + CrayLink™ Network
SGI CrayLink™
[Figure: processor system network and its bisection bandwidth]
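Bisection bandwidth is the total bandwidth of the links that cross a cut splitting the network into two equal halves. As an illustration only, the sketch below assumes a hypercube of routers (the slide does not name the topology) and counts the links crossing the cut on the top address bit; for a hypercube this is a minimum bisection, so the count is N/2 links for N routers. The function names and the `link_bandwidth` parameter are hypothetical.

```python
def hypercube_links(dim):
    """Undirected links of a dim-dimensional hypercube: nodes differing in one bit."""
    nodes = 1 << dim
    return [(u, u ^ (1 << bit))
            for u in range(nodes)
            for bit in range(dim)
            if u < u ^ (1 << bit)]


def bisection_links(dim):
    """Links crossing the cut that splits the nodes in half on the top address bit."""
    half = 1 << (dim - 1)
    return sum(1 for u, v in hypercube_links(dim) if (u < half) != (v < half))


def bisection_bandwidth(dim, link_bandwidth):
    """Bisection bandwidth in the same units as link_bandwidth."""
    return bisection_links(dim) * link_bandwidth


for d in range(1, 6):
    print(d, 1 << d, bisection_links(d))  # dimension, routers, crossing links
```

Doubling the number of routers doubles the crossing links, which is why this kind of network keeps bisection bandwidth growing with system size, unlike a single shared bus.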
ccNUMA and CPU Scheduling Issues
OS’s Questions
Single CPU System
What to schedule next?
ccNUMA System
What to schedule next?
Which CPU should it run on?
Where should the process’s information be located?
One OS instance or many?
OS’s Choices for a Process
Single CPU System
Process has 1 choice of CPU
Process information has 1 choice of memory
ccNUMA System with N CPUs and M memories
Process has N choices of CPU
Process information has M choices per virtual page
“Distance” between a process and its information
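The size of the ccNUMA choice space, and one way to score “distance”, can be sketched as follows. This assumes a hypothetical 4-node ring with one memory per CPU and hop count as the distance metric; real systems have their own topologies and latencies. A process with P virtual pages has N × M^P possible placements, and a scheduler can prefer the CPU with the smallest total distance to the process’s pages.

```python
def placement_count(n_cpus, m_memories, n_pages):
    """Choices for one process: N CPUs times M homes for each of its pages."""
    return n_cpus * m_memories ** n_pages


def total_distance(cpu, page_homes, dist):
    """Sum of hops from the CPU to the memory holding each virtual page."""
    return sum(dist[cpu][mem] for mem in page_homes)


def best_cpu(page_homes, dist):
    """CPU that minimizes the process's total distance to its pages."""
    return min(range(len(dist)), key=lambda cpu: total_distance(cpu, page_homes, dist))


# Hypothetical 4-node ring: one memory per CPU, distance = hops around the ring.
N = 4
DIST = [[min(abs(i - j), N - abs(i - j)) for j in range(N)] for i in range(N)]

pages = [0, 0, 1]  # home memory of each of the process's 3 virtual pages
print(placement_count(N, N, len(pages)))  # 256
print(best_cpu(pages, DIST))              # 0
```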
Context Switch Penalty
Single CPU System
Saving/Restoring process state (PCB)
Scheduling routine
ccNUMA System
Saving/Restoring process state (PCB)
Scheduling routine
Moving the process’s information
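The difference between the two penalties can be written as a simple additive cost model. The cost values below are hypothetical placeholders in arbitrary time units, not measurements; the point is only that the ccNUMA case adds a term that grows with the amount of process information moved, while the single-CPU case never pays it.

```python
from dataclasses import dataclass


@dataclass
class SwitchCosts:
    """Hypothetical per-step costs, in arbitrary time units."""
    save_restore_pcb: float  # save old PCB, restore new one
    scheduling: float        # run the scheduling routine
    move_per_page: float     # move one page of process information to a new node


def context_switch_penalty(costs, pages_moved=0):
    """Single CPU: pages_moved is always 0. ccNUMA: it grows with migration."""
    return costs.save_restore_pcb + costs.scheduling + costs.move_per_page * pages_moved


costs = SwitchCosts(save_restore_pcb=2.0, scheduling=1.0, move_per_page=5.0)
print(context_switch_penalty(costs))                  # 3.0 (no migration)
print(context_switch_penalty(costs, pages_moved=10))  # 53.0 (migrate 10 pages)
```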
Some Common Sense
Replicate parts of the OS across processors
Minimize process movement
System calls will happen often
Cost of moving a process to another CPU is high
Less than swapping to disk, most of the time
Higher than simple context switching
But if you have to move a process
Minimize the amount of information to move
Opportunity for a cache?
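The two rules of thumb above (minimize process movement; when moving, minimize what is moved) can be sketched as an affinity-scheduling policy. The threshold, the queue-length load metric, and the per-page access counts are all hypothetical choices for illustration, not a policy taken from any particular OS.

```python
def choose_cpu(last_cpu, queue_lengths, migrate_threshold):
    """Affinity scheduling: stay on the last CPU unless the imbalance is large."""
    least_loaded = min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
    if queue_lengths[last_cpu] - queue_lengths[least_loaded] > migrate_threshold:
        return least_loaded  # imbalance outweighs the cost of moving
    return last_cpu          # keep the process near its information


def pages_to_move(access_counts, min_accesses):
    """When a move is unavoidable, take only frequently used pages."""
    return sorted(page for page, count in access_counts.items() if count >= min_accesses)


print(choose_cpu(0, [3, 5, 1, 4], migrate_threshold=2))  # 0
print(choose_cpu(1, [3, 5, 1, 4], migrate_threshold=2))  # 2
print(pages_to_move({0: 50, 1: 2, 2: 30}, min_accesses=10))  # [0, 2]
```

A higher `migrate_threshold` trades load balance for fewer migrations; lowering the context-switch penalty in hardware would justify a lower threshold.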
Conclusion
Hardware
Bandwidth and Latency for performance
Cache Coherency for correctness
Operating System
ccNUMA adds complexity in CPU scheduling
Better HW performance = lower context switch penalty
=> more flexibility in scheduling choices for a process
References
Alpha
http://www.digital.com/alphaoem/present/ev7forum98.ppt
http://www.compaq.com/InnovateForum99/presentation/session31/
http://www.digital.com/alphaoem/
AMD
http://www.amd.com/products/cpg/mpf/speech/slides99.ppt
SGI
http://www-europe.sgi.com/origin/numa_tech.html
Benchmarks
http://www.spec.org/
http://www.tpc.org/
Abbreviation Index
AMD - Advanced Micro Devices
SGI - Silicon Graphics Inc.
ECC - Error Correction Code
SECDED - Single Error Correct Double Error Detect
API - Alpha Processor Inc
AGP - Accelerated Graphics Port
DDR DRAM - Double Data Rate Dynamic RAM
LDT - Lightning Data Transport
PCI - Peripheral Component Interconnect
CMOS - Complementary Metal Oxide Semiconductor
CAS - Column Address Strobe
TPC-C - Transaction Processing Performance Council Benchmark C
ccNUMA - Cache-Coherent Non-Uniform Memory Access
SMP - Symmetric Multi-Processing