
Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System
By: Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm
Presented by: Holly Grimes
Locality

In uniprocessors:
- Spatial locality: accessing neighboring memory locations
- Temporal locality: accessing memory locations that were accessed recently

In multiprocessors, locality also means each processor using data from its own cache.
Why is Locality Important?

Modern multiprocessors exhibit:
- Higher memory latency
- Large write-sharing costs
- Large cache lines (false sharing)
- Larger system sizes
- Large secondary caches
- NUMA effects
Why is Locality Important?

Example: a counter shared by all CPUs
- Sharing the counter requires moving it back and forth between the CPU caches
- Solution?
  - Split the counter into an array of integers
  - Each CPU gets its own counter
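The split-counter idea can be sketched as follows (a minimal illustration, not Tornado's code; `NCPUS` and the function names are assumed for the example):

```c
#define NCPUS 4   /* assumed CPU count for illustration */

/* One counter slot per CPU: each CPU increments only its own slot,
 * so the hot cache line is not ping-ponged between caches. */
static long counters[NCPUS];

static void inc_counter(int cpu) {
    counters[cpu]++;              /* touches only this CPU's slot */
}

static long read_counter(void) {  /* a read must sum all the slots */
    long total = 0;
    for (int i = 0; i < NCPUS; i++)
        total += counters[i];
    return total;
}
```

The trade-off: increments become cheap and local, while the rarer read operation pays the cost of summing every slot.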
Achieving Locality


- Using an array of counters seems to solve our problem…
- But what happens if both array elements map to the same cache line?
  - False sharing
- Solution: pad each array element
  - Different elements then map to different cache lines
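The padding fix might look like this (a sketch; the 64-byte line size is an assumption, real code would query the hardware):

```c
#define NCPUS 4
#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* Pad each per-CPU slot out to a full cache line so two CPUs'
 * counters can never share a line: no more false sharing. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

static struct padded_counter per_cpu[NCPUS];
```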
Comparing Counter Implementations
NUMA Effects
Achieving Locality in an OS

- We’ve seen several ways to achieve locality in the implementation of a counter
- Now we extend these concepts to see how locality can be achieved in an OS
- Tornado’s approach:
  - Build a new OS from the ground up
  - Make locality the primary design goal
Tornado’s Approach

[Diagram: the locality and independence of resources in the user application are mirrored by locality and independence in the OS implementation. Illustration by Philip Howard]
Tornado’s Locality-Maximizing Features

- Object-oriented Design
- Clustered Objects
- A New Locking Strategy
- Protected Procedure Calls
Object-oriented Design
Object-oriented Structure

- Each OS resource is represented as a separate object
- All locks and data structures are internal to the objects
  - This localizes and encapsulates the locks and data
- This structure allows different resources to be managed
  - without accessing shared data structures
  - without acquiring shared locks
- Simplifies OS implementation
  - modular design
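The encapsulation idea can be illustrated with a small sketch (the `region_t` type and `region_resize` are hypothetical names, not Tornado's actual interfaces):

```c
#include <stdatomic.h>

/* Each resource is an object that encapsulates its own lock and
 * data: operating on one resource never takes another resource's
 * lock and never touches shared global state. */
typedef struct region {
    atomic_flag lock;     /* internal to this object only */
    unsigned long start;
    unsigned long length;
} region_t;

static void region_resize(region_t *r, unsigned long new_len) {
    while (atomic_flag_test_and_set(&r->lock))
        ;                 /* acquire this region's private lock */
    r->length = new_len;  /* update data private to the object */
    atomic_flag_clear(&r->lock);
}
```

Because the lock lives inside the object, two CPUs resizing two different regions never contend.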
Example: Memory Management Objects

[Diagram: per-process memory-management objects (Process, Region, FCM, COR, HAT, DRAM) and their connections. Illustration by Philip Howard]

- HAT: Hardware Address Translation
- FCM: File Cache Manager
- COR: Cached Object Representative
- DRAM: memory manager
Object-oriented Design

- For each resource, this design provides one object to be shared by all CPUs
  - Comparable to having one copy of the counter shared among all CPUs
- To maximize locality, something more needs to be done
Clustered Objects

- Clustered objects are composed of a set of representative objects
  - There can be one rep for the system, one rep per processor, or one rep for a cluster of processors
- Clients access a clustered object using a common reference to the object
  - Each call to an object using this reference is automatically directed to the appropriate rep
  - Clients do not need to know anything about the location or organization of the reps to use a clustered object
- The impact on locality is similar to the effect of padded arrays on the counter
Keeping Clustered Objects Consistent

- When a clustered object has multiple reps, there must be a way of keeping the reps consistent
- Coordination between reps can happen via
  - Shared memory
  - Protected Procedure Calls
Clustered Object Implementation

- Each processor has a translation table
  - The table is located at the same virtual address in every processor
  - For each clustered object, the table contains a pointer to the rep that serves the given processor
- A clustered object reference is just a pointer into this table
- Reps for a clustered object are created dynamically when they are first accessed
  - Dynamic rep creation is handled by the global miss handler
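A minimal sketch of the lookup path, under the assumption that a reference is an index into the per-processor table and that a shared miss handler fills empty slots on first access (all names here are illustrative, not Tornado's):

```c
#include <stddef.h>

#define MAX_OBJS 128   /* illustrative table size */

typedef struct rep { int owner_cpu; } rep_t;  /* contents illustrative */

/* One such table exists per processor, mapped at the same virtual
 * address everywhere; a slot points to the rep serving this CPU. */
static rep_t *trans_table[MAX_OBJS];

static rep_t *global_miss_handler(int obj) {
    static rep_t reps[MAX_OBJS];   /* stand-in for real rep creation */
    trans_table[obj] = &reps[obj];
    return trans_table[obj];
}

static rep_t *resolve(int obj) {
    rep_t *r = trans_table[obj];
    return r ? r : global_miss_handler(obj);  /* create rep lazily */
}
```

After the first miss, every call resolves to the local rep with a single table load, which is what makes the common case fast and local.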
A New Locking Strategy
Synchronization

Two kinds of synchronization issues must be dealt with:
- Using locks to protect data structures
- Ensuring the existence of needed data structures
Locks

- Tornado uses spin-then-block locks to minimize the overhead of the lock/unlock instructions
- Tornado limits lock contention by
  - Encapsulating the locks in objects to limit the scope of each lock
  - Using clustered objects to provide multiple copies of a lock
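The spin-then-block idea can be sketched like this (details are assumptions, not Tornado's code; a real blocking lock would queue the thread with the scheduler, for which `sched_yield()` stands in here):

```c
#include <stdatomic.h>
#include <sched.h>

/* Spin briefly hoping the holder releases soon (no context switch
 * in the common short-hold case), then give up the CPU rather than
 * burn cycles spinning against a long-held lock. */
typedef struct { atomic_flag held; } stb_lock_t;

enum { STB_SPIN_LIMIT = 1000 };   /* assumed tuning constant */

static void stb_lock(stb_lock_t *l) {
    for (;;) {
        for (int i = 0; i < STB_SPIN_LIMIT; i++)
            if (!atomic_flag_test_and_set_explicit(&l->held,
                                                   memory_order_acquire))
                return;        /* acquired while spinning */
        sched_yield();         /* "block": let the holder run */
    }
}

static void stb_unlock(stb_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```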
Existence Guarantees

- Traditional approach: use locks to protect object references
  - Prevents races where one process destroys an object that another process is using
- Tornado’s approach: semi-automatic garbage collection
  - Garbage collection destroys a clustered object only when there are no more references to the object
  - When clients use an existing reference to access a clustered object, they are guaranteed that the referenced object still exists
  - References can be safely accessed without the use of (global) locks
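The "destroy only when no references remain" property can be illustrated with plain reference counting (Tornado's actual scheme is a semi-automatic garbage collector, not this; `obj_t`, `obj_get`, and `obj_put` are illustrative names):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct obj {
    atomic_int refs;
    int data;
} obj_t;

static obj_t *obj_get(obj_t *o) {   /* caller already holds a valid ref */
    atomic_fetch_add(&o->refs, 1);
    return o;                       /* object cannot be destroyed now */
}

static int obj_put(obj_t *o) {      /* drop a reference */
    if (atomic_fetch_sub(&o->refs, 1) == 1) {
        free(o);                    /* last reference gone: destroy */
        return 1;                   /* report that destruction ran */
    }
    return 0;
}
```

While a client holds a reference, the count cannot reach zero, so the object is guaranteed to exist for the duration of the access with no global lock taken.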
Protected Procedure Calls
Interprocess Communication

- Tornado uses Protected Procedure Calls (PPCs) to bring locality and concurrency to interprocess communication
- A PPC is a call from a client object to a server object
  - It acts like a clustered object call that passes between the protection domains of the client and server processes
Advantages of PPC

- Client requests are always serviced on their local processor
- Clients and servers share the CPU in a manner similar to handoff scheduling
- The server has one thread of control for each client request
Performance
Conclusions

- Tornado’s increased locality and concurrency have produced a scalable OS design for multiprocessors
- This locality is provided by several key system features:
  - An object-oriented design
  - Clustered objects
  - A new locking strategy
  - Protected procedure calls