Transcript K42

COMPUTATIONAL RESEARCH DIVISION
High End Computing
with K42
Paul H. Hargrove and
Katherine Yelick
Lawrence Berkeley National Lab
Angela Demke Brown and
Michael Stumm
University of Toronto
Patrick Bridges
University of New Mexico
Orran Krieger and
Dilma Da Silva
IBM
FastOS PI Meeting
June 9, 2005
• The HECRTF and FastOS reports
enumerate unmet needs in the area of
Operating Systems for HEC, including
– Availability of Research Frameworks
– Support for Architectural Innovation
– Performance Visibility
– Adaptability to Application Requirements
• This project uses the K42 Operating
System to address these needs
What is K42?
– K42 is a GPLed research O/S at IBM
– Framework for PERCS (DARPA HPCS) work
• O/S research and architectural innovation
– API and ABI compatible w/ Linux on PPC64
– Runs most Linux kernel modules (fs, etc)
– Many OS services implemented in user-space
– Object oriented
• Every virtual or physical instance is an object
• Every class may have multiple implementations
• Implementations can be hot-swapped, per-instance
• Very modular for easy addition/modification
– Has extensive performance/tracing capabilities
– Design/implementation is very SMP-scalable
Who?
• IBM
– The group most experienced with K42 and
responsible for its continued development
– The group performing O/S research for PERCS
– Has contributed K42-developed ideas to Linux
• Linux Trace Toolkit (LTT)
• Object-based reverse mapping (memory mgmt)
• Read-Copy-Update (RCU)
Who?
• LBNL
– DOE applications and application scientists
• Especially via BIPS and SciDAC PERC project
– Scalable Systems Software SciDAC project
• BLCR (system-initiated checkpointing for Linux)
– Access to OpenMPI team (former LAM/MPI)
– UPC and Titanium teams (GASNet runtime)
– Linux kernel experience
• Including M-VIA and BLCR
Who?
• University of New Mexico
– Experience with implementing and
porting light-weight kernels
• SUNMOS, Puma, Cougar and Catamount
– Experience with the development of the
Portals API
– Experience in configurable/adaptable
systems software
• X-kernel, Scout and Cactus
Who?
• University of Toronto
– A prominent member of the existing K42
research community
– Origin of Tornado, the direct
predecessor to K42
– Key preliminary work in arbiter-object
technology for hot swap and dynamic
adaptation
What are we doing?
• Work divides into three major areas:
– Framework for OS/Runtime research for
HEC applications
– Dynamic Adaptation
– Architecture of a parallel operating
system
Framework for OS/Runtime research
for HEC
• Make K42 usable as a platform
– To perform basic O/S and Runtime
research of importance to HEC
– To develop/run/debug/tune HEC
applications
Framework (1)
Issue: K42 is hard to build/install
Approach: Create a distribution for a dual-boot K42/Linux system (src and bin)
Issue: K42 runs only on PPC64
Approach: Port to AMD64 (maybe EM64T?)
Issue: K42 lacks a full HEC environment
Approach: Build/port the required
environment (SSS OSCAR)
– Numeric libraries (OSCAR)
– Batch system (Scalable System Software suite)
– Programming models (MPI, UPC, Titanium, CAF)
Dynamic Adaptation
• Utilize K42’s design
– To expose performance information
below the app-O/S interface
– To allow static and dynamic
specialization of O/S and runtime
services
Dynamic Adaptation (1)
• Issue: O/Ses and runtimes are
performance-opaque
• Approach: Extend what K42 has
– K42 already has extensive
performance/tracing capabilities
– Expose K42’s object structure
– Use arbiter objects for per-object
collection of h/w counter data
– Develop graphical tools to connect
performance data to OS/runtime objects
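As a rough illustration of the per-object collection idea above, the sketch below places a wrapper in front of one object instance and records call counts and elapsed time per method. It is written in Python for brevity rather than K42's C++. The names `Arbiter` and `PageCache` are hypothetical, and wall-clock time stands in for hardware counter data; none of this is K42's actual interface.

```python
import time
from collections import defaultdict


class Arbiter:
    """Interposes on one object instance and records per-method statistics."""

    def __init__(self, target):
        self._target = target
        self.calls = defaultdict(int)        # method name -> call count
        self.seconds = defaultdict(float)    # method name -> total elapsed time

    def __getattr__(self, name):
        # Only reached for attributes not found on the Arbiter itself,
        # i.e. the attributes of the wrapped object.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def measured(*args, **kwargs):
            start = time.perf_counter()
            try:
                return attr(*args, **kwargs)
            finally:
                self.calls[name] += 1
                self.seconds[name] += time.perf_counter() - start

        return measured


class PageCache:
    """Hypothetical stand-in for an OS object; not a K42 class."""

    def __init__(self):
        self._pages = {}

    def lookup(self, page_no):
        return self._pages.get(page_no)

    def insert(self, page_no, data):
        self._pages[page_no] = data


cache = Arbiter(PageCache())   # only this instance is instrumented
cache.insert(7, b"data")
cache.lookup(7)
cache.lookup(8)
```

Because the wrapper is attached to a single instance, the statistics in `cache.calls` and `cache.seconds` describe that object alone, which is what lets performance data be connected back to individual OS/runtime objects.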
Dynamic Adaptation (2)
• Issue: What to adapt?
• Approach: Study HEC Apps
– Use performance tools to identify
OS/Runtime objects which are
bottlenecks and in what situations
– Investigate what alternative
implementations offer better
performance in those situations
– Add these implementations to K42
Dynamic Adaptation (3)
• Issue: When to adapt?
• Approach 1: user-directed
customization
• Approach 2: compiler-directed
customization
• Approach 3: Runtime adaptation
using arbiter objects to monitor and
adapt to changing conditions
Dynamic Adaptation (4)
• Issue: How to adapt?
• Approach: Use hot-swapping of
object implementations in K42
– Allows one to replace implementations,
per-instance, without the need to block
the application
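The per-instance replacement described above can be sketched with one level of indirection: clients hold a stable handle, and the implementation behind it is exchanged after its state is transferred. This is a minimal Python sketch under stated assumptions. `Handle`, `ListCounterSet`, `DictCounterSet`, `export_state` and `from_state` are hypothetical names, and the sketch deliberately ignores the hard part, which is performing the switch safely while calls are in flight.

```python
class Handle:
    """Stable reference held by clients; the implementation behind it can change."""

    def __init__(self, impl):
        self._impl = impl

    def swap(self, new_cls):
        # Build the replacement from the old implementation's state, then
        # switch the reference. Only this one instance is affected.
        self._impl = new_cls.from_state(self._impl.export_state())

    def __getattr__(self, name):
        # Forward every other attribute access to the current implementation.
        return getattr(self._impl, name)


class ListCounterSet:
    """Hypothetical implementation A: keeps every event, counts on demand."""

    def __init__(self, events=None):
        self._events = list(events or [])

    def add(self, key):
        self._events.append(key)

    def count(self, key):
        return self._events.count(key)

    def export_state(self):
        return list(self._events)

    @classmethod
    def from_state(cls, events):
        return cls(events)


class DictCounterSet:
    """Hypothetical implementation B: same interface, constant-time counting."""

    def __init__(self, events=None):
        self._counts = {}
        for key in events or []:
            self.add(key)

    def add(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1

    def count(self, key):
        return self._counts.get(key, 0)

    def export_state(self):
        return [k for k, n in self._counts.items() for _ in range(n)]

    @classmethod
    def from_state(cls, events):
        return cls(events)


a = Handle(ListCounterSet())
b = Handle(ListCounterSet())
a.add("fault")
a.add("fault")
b.add("fault")
a.swap(DictCounterSet)   # only instance `a` changes implementation
```

After the swap, callers of `a` keep using the same handle and see the same counts, while `b` still runs the original implementation.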
Dynamic Adaptation (5)
• Issue: Need a small set of
applications, representative of HEC
today and in the future
• Approach: LBNL has identified a set
of applications that we feel are a
good starting set.
– separate presentation if time allows
Building a Parallel O/S
• K42’s design principles yield
excellent scaling on SMPs, with
minimal UP impact
• Apply these principles to parallel
runtime services
• Integrate these services with the O/S
Build a Parallel O/S (1)
• Issue: K42 lacks a native RPC mechanism
• Approach:
– Adapt the best design features of
• Protected Procedure Calls (inter-address space)
• Active Messages (inter-node)
– Simple but powerful AM-style mechanism for
parallel runtime services
– Reduce to protected procedure call in the
intra-node case
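One way to picture the approach above is a single call path that dispatches to a named handler, taking a direct procedure call when the destination is the local node and a serialized message otherwise. The Python sketch below is illustrative only: `Node`, its handler table, the in-memory `network` dictionary and the JSON encoding are all assumptions, not a description of K42's protected procedure calls or of any Active Messages implementation.

```python
import json


class Node:
    """Hypothetical node with a table of named handlers (active-message style)."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network          # shared dict: node_id -> Node
        self.handlers = {}
        self.remote_sends = 0           # counts requests that crossed nodes
        network[node_id] = self

    def register(self, name, fn):
        self.handlers[name] = fn

    def call(self, dest_id, name, *args):
        if dest_id == self.node_id:
            # Intra-node case: reduce to a direct procedure call.
            return self.handlers[name](*args)
        # Inter-node case: serialize the request, deliver it, decode the reply.
        self.remote_sends += 1
        wire = json.dumps({"handler": name, "args": args})
        reply = self.network[dest_id].deliver(wire)
        return json.loads(reply)

    def deliver(self, wire):
        msg = json.loads(wire)
        result = self.handlers[msg["handler"]](*msg["args"])
        return json.dumps(result)


network = {}
n0 = Node(0, network)
n1 = Node(1, network)
for n in (n0, n1):
    n.register("add", lambda x, y: x + y)

local_result = n0.call(0, "add", 2, 3)    # no message is built
remote_result = n0.call(1, "add", 2, 3)   # goes through deliver()
```

The caller uses the same `call` interface in both cases, so a runtime service does not need to know whether its peer is on the same node.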
Build a Parallel O/S (2)
• Issue: AM has no “name service”
• Approach: Design and prototype a
simple mechanism for locating
required services
– Services may be load-balanced
– Services may migrate/fail-over
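A minimal sketch of such a locating mechanism, assuming a simple registry in which each service name maps to the instances currently offering it. `NameService` and the string addresses are hypothetical, and round-robin is just one possible load-balancing policy.

```python
import itertools


class NameService:
    """Hypothetical registry mapping a service name to its live instances."""

    def __init__(self):
        self._instances = {}    # name -> list of addresses
        self._cursor = {}       # name -> round-robin counter

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)
        self._cursor.setdefault(name, itertools.count())

    def unregister(self, name, address):
        # Called when an instance fails or migrates away from `address`.
        self._instances[name].remove(address)

    def lookup(self, name):
        instances = self._instances.get(name, [])
        if not instances:
            raise LookupError(f"no instance of {name!r} is registered")
        # Round-robin over whatever instances are currently registered.
        return instances[next(self._cursor[name]) % len(instances)]


ns = NameService()
ns.register("event", "node3")
ns.register("event", "node5")
first = ns.lookup("event")            # load-balanced across both instances
second = ns.lookup("event")

ns.unregister("event", "node3")       # fail-over: node3 goes away
after_failure = ns.lookup("event")

ns.unregister("event", "node5")       # migration: node5 -> node9
ns.register("event", "node9")
after_migration = ns.lookup("event")
```

Clients always resolve by name at call time, so fail-over and migration need no client-side change in this sketch.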
Build a Parallel O/S (3)
• Issue: Asynchronous events are
common in a parallel environment
• Approach: Reusable event service
– K42 is already an event-driven system
– Design and prototype a distributed
event service
– Simple Publish/Subscribe API?
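To make the publish/subscribe question above concrete, here is a single-process Python sketch of what the shape of such an API might look like. `EventService`, the topic strings and the callback signature are assumptions for illustration, and the distributed delivery that the slide calls for is left out entirely.

```python
from collections import defaultdict


class EventService:
    """Hypothetical in-process publish/subscribe hub."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic, callback):
        self._subscribers[topic].remove(callback)

    def publish(self, topic, payload):
        # Copy the list so callbacks may unsubscribe during delivery.
        delivered = 0
        for callback in list(self._subscribers[topic]):
            callback(topic, payload)
            delivered += 1
        return delivered


events = EventService()
seen = []


def on_node_down(topic, payload):
    seen.append((topic, payload))


events.subscribe("node.down", on_node_down)
count_down = events.publish("node.down", {"node": 12})
count_other = events.publish("job.exit", {"job": 42})   # nobody is listening
```

Publishers and subscribers share only a topic name, which is what would let the same interface sit in front of a distributed implementation.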
Build a Parallel O/S (4)
• Issue: Parallel job management
– Spawn, signal, ps
• Approach: Extended Process Spaces
– Not the same as SSI, more like PAGs
– Process id tuple: (ProcSpace, ID)
– Each parallel job is a ProcSpace
– A process can see those ProcSpaces to
which it is “attached” (creator, member
or observer)
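The identifier and visibility rules above can be sketched as a small data model. This is an illustrative Python sketch, not a K42 interface: `Pid`, `ProcSpace`, `spawn`, `attach` and `ps` are hypothetical names, and the rule that a spawned process is attached to its own space as a member is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pid:
    """Process id as a (ProcSpace, ID) tuple."""
    space: str
    id: int


@dataclass
class ProcSpace:
    name: str
    members: set = field(default_factory=set)     # Pids running in this space
    attached: dict = field(default_factory=dict)  # Pid -> role

    def spawn(self, local_id):
        pid = Pid(self.name, local_id)
        self.members.add(pid)
        self.attached[pid] = "member"
        return pid

    def attach(self, pid, role):
        if role not in ("creator", "member", "observer"):
            raise ValueError(f"unknown role: {role}")
        self.attached[pid] = role


def ps(viewer, spaces):
    """List the processes in every ProcSpace the viewer is attached to."""
    visible = []
    for space in spaces:
        if viewer in space.attached:
            visible.extend(sorted(space.members, key=lambda p: p.id))
    return visible


login = ProcSpace("login")
shell = login.spawn(1)

job = ProcSpace("job42")              # one parallel job is one ProcSpace
job.attach(shell, "creator")
ranks = [job.spawn(i) for i in range(3)]

other = ProcSpace("job43")
other.spawn(0)

all_spaces = [login, job, other]
shell_view = ps(shell, all_spaces)    # sees login and job42, not job43
rank_view = ps(ranks[0], all_spaces)  # sees only its own job
```

The same local ID can appear in several spaces without conflict, because the space name is part of the identifier.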
Build a Parallel O/S (5)
• Issue: Just TCP/IP sockets in K42
• Approach:
– Characterize the application impact of s/w
communication overheads
– Investigate App/kernel/NIC APIs
• Offload of communication processing
• Application-specific customization
– Implement O/S support for other
communication abstractions
• Active Messages
• RDMA
Applications Performance Evaluation
• Can K42 be a production HEC
environment?
Applications Performance
Evaluation
• Head-to-head Linux-vs-K42
comparisons
– Many comparisons already possible
– Port of HEC environment will allow
more complete comparison
• Port to Opteron will allow head-to-head comparison to Catamount
– Interesting: Apps programming in K42
is as easy as Linux, but can K42
perform as well as Catamount on HEC?
And then what?
• What can/should one do with the
resulting framework?
Possible Follow-on Work (1)
• The research framework will
allow/ease many potentially
interesting research areas
– Some cut from our proposal
– Some are part of other FastOS projects
– Others are new
– Presented on following slides in no
particular order
Possible Follow-on Work (2)
• Filesystem work via native RPC
mechanisms and RDMA networking
• Cluster software management
– K42 hot-swapping can allow full OS/Runtime
upgrade of a live system with no downtime
(like TELCO equipment)
• Co-scheduling
– Reduce O/S-induced load-imbalance (“noise”)
that perturbs collective operations (especially
barriers)
Possible Follow-on Work (3)
• Scheduling and resource management for
multi-threaded and multi-core processors
– Example: page coloring for cache partitioning
• Virtualization, checkpoint/restart and
migration
– Interposing objects makes virtualization trivial
– Use of RCU makes most (all?) quiescing
unnecessary when checkpointing
• High performance network drivers for K42
– InfiniBand, Quadrics QsNetII, Myrinet
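For the page-coloring example mentioned in the list above, the usual arithmetic is that a physically indexed, set-associative cache has cache_size / (associativity × page_size) page colors, and frames with the same color compete for the same cache sets. The Python sketch below shows how an allocator could confine a task to a subset of colors. The function names, the cache parameters and the free-list representation are assumptions for illustration and do not describe any K42 allocator.

```python
def num_colors(cache_bytes, ways, page_bytes):
    """Number of page colors for a physically indexed set-associative cache."""
    return cache_bytes // (ways * page_bytes)


def color_of(frame_no, colors):
    """Color of a physical page frame: frames with equal color share cache sets."""
    return frame_no % colors


def allocate(free_frames, allowed_colors, colors):
    """Remove and return the first free frame whose color is allowed."""
    for frame in free_frames:
        if color_of(frame, colors) in allowed_colors:
            free_frames.remove(frame)
            return frame
    raise MemoryError("no free frame with an allowed color")


# Hypothetical cache: 2 MiB, 8-way set-associative, 4 KiB pages.
colors = num_colors(2 * 1024 * 1024, 8, 4096)

free = list(range(256))
task_a_colors = set(range(0, colors // 2))        # first half of the cache
task_b_colors = set(range(colors // 2, colors))   # second half of the cache

frame_b = allocate(free, task_b_colors, colors)
frame_a = allocate(free, task_a_colors, colors)
```

Giving the two tasks disjoint color sets means their pages map to disjoint cache sets, which is the partitioning effect the slide refers to.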
In Conclusion…
Summary (1)
• Produce a K42 platform that:
– Is easy to install and use for O/S and
runtime research in HEC
– Includes nearly all of the HEC
environment a user is expecting
– Helps users to track/understand
performance within the O/S and runtime
– Accepts static customization hints from
users and/or compilers
Summary (2)
• Produce a K42 platform that:
– Will dynamically identify performance
bottlenecks in the O/S and runtime and
dynamically switch to more appropriate
object implementations
– Includes custom HEC-appropriate
application/kernel/network APIs
– Includes an infrastructure for building
of parallel operating environments
– Includes a scalable mechanism for
parallel job control
Summary (3)
• Work with DOE SC applications
– To determine HEC-appropriate
implementations/policies/APIs
– To improve applications performance
• Evaluate the performance of K42 as a
production HEC platform
– Head-to-head vs. Linux
– Head-to-head vs. Catamount