Overview of XtreemOS
Christine Morin
XtreemOS scientific coordinator
Phenix Workshop, Rennes
December 07, 2006
XtreemOS IP project is funded by the European Commission under contract IST-FP6-033576
Grid Environment & VO
[Figure: VO1, VO2, WAN]
• Multiple users from different institutions
• Multiple geographically distributed resources in different administrative domains
• Large scale: uncountable number of resources
• Dynamicity: VO, users, resources
Overview of XtreemOS - Phenix Workshop, December 7, 2006
State of the Art
Current operating systems are neither Grid-aware nor VO-aware
A variety of Grid middleware & toolkits for Grid computing
• Resource management
• Changing interfaces
• Security pitfalls
• Complexity for users, programmers & administrators
XtreemOS Objectives
Design & implement a reference open source Grid operating
system based on Linux
– Native support for virtual organizations
Validate the XtreemOS Grid OS with a set of real use cases on
a large Grid testbed
Promote XtreemOS software in the Linux community and
create communities of users and developers
XtreemOS Research Challenges
Identify fundamental functionalities to be embedded in Linux for
secure application execution in Grids
Build a set of scalable self-healing OS services for secure
resource management in very large dynamic grids
Provide a simple Grid API compliant with POSIX while adding
new functionality and supporting Grid-aware applications
Aggregate cluster resources into powerful grid nodes by
integrating single system image mechanisms in Linux
Build an XtreemOS flavour for mobile devices enabling
ubiquitous access to grid resources
XtreemOS Flavours
[Figure: software stacks per flavour (applications, middleware, XtreemOS, Linux, computer)]
PC
Federation of PCs
– Cluster
Mobile device
– PDA
– Mobile phone
XtreemOS Architecture
[Figure: XtreemOS architecture layers]
Scientific Applications
Business Applications
XtreemOS API
VO & Security, Application Management, Data Management
Infrastructure for Highly Available and Scalable Services
Linux-XOS: Grid-enabled Linux Operating System
– Linux-XOS for PC
– Linux-XOS for Cluster
– Linux-XOS for Mobile Devices
XtreemOS Use Cases
14 applications
– Simulation applications (aerospace, energy)
– Business applications
– Bioinformatics application
– Virtual reality application
– Finance application
– Telecom application
XtreemOS & Linux
Acceptance in the Linux community is key for the success of the
XtreemOS project
– Packaging for multiple Linux distributions
• Mandriva Linux
• Red Flag Linux
• Debian
– Integration in OSCAR
– Get XtreemOS patches accepted in Linux OS
XtreemOS Project Phases
Phase 1 (M1-M6)
– Specification of XtreemOS
Phase 2 (M7-M18)
– Design and implementation of XtreemOS basic version
– Preliminary experiments with LinuxSSI
Phase 3 (M19-M24)
– Integration of all XtreemOS components
– Delivery of first XtreemOS prototype
Phase 4 (M25-M48)
– Evaluation with real use cases
– Design and implementation of advanced features of
XtreemOS
– Public releases
XtreemOS Sub-projects
SP1 - Project Management
SP2 - Linux for Virtual Organizations
SP3 - Grid Support for Linux
SP4 - Software integration, packaging, experimentation & validation
SP5 - Communication, dissemination, exploitation & training
VO and Security Management
[Figure: XtreemOS architecture diagram, as on the "XtreemOS Architecture" slide]
VO & Security Management
A VO can be seen as a temporary or permanent
coalition of geographically dispersed entities
(individuals, groups, organizational units or entire
organizations) that pool resources, capabilities and
information to achieve common objectives.
– Legal or contractual arrangements between
entities
– Resources can be physical equipment or other
capabilities such as knowledge, information or
data
Some Lessons from the State of the
Art
Open issues
– Scalability of VO management in the large
• Short-lived VOs
– Ease of management of VO and VO identities
– Security and VO policy enforcement at the
node and site level
VO & Security Management
Key components of VO
– Owner/administrator of the VO
– A set of participating users in different
participating domains
– A set of participating resources in different
participating domains
– A set of roles which users/resources can play
in the VO
– A set of rules/policies on resource availability
and access control
– A (renewable) expiry time of the VO
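The key components listed above can be pictured as a small data structure. The following is a minimal sketch, not XtreemOS code: the names `Member`, `Policy`, `VirtualOrganization`, `grant_role`, `renew` and `is_allowed`, and the example domain, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Member:
    """A user or resource, identified by name and home domain (hypothetical)."""
    name: str
    domain: str


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: holders of `role` may perform `action`."""
    role: str
    action: str


@dataclass
class VirtualOrganization:
    owner: Member
    expires_at: datetime
    users: set = field(default_factory=set)
    resources: set = field(default_factory=set)
    roles: dict = field(default_factory=dict)   # Member -> set of role names
    policies: set = field(default_factory=set)

    def grant_role(self, member: Member, role: str) -> None:
        self.roles.setdefault(member, set()).add(role)

    def renew(self, extension: timedelta) -> None:
        """The expiry time is renewable."""
        self.expires_at += extension

    def is_allowed(self, member: Member, action: str, now: datetime) -> bool:
        """Allow an action only while the VO is alive and a held role permits it."""
        if now >= self.expires_at:
            return False
        held = self.roles.get(member, set())
        return any(p.action == action and p.role in held for p in self.policies)


alice = Member("alice", "inria.example")
vo = VirtualOrganization(owner=alice, expires_at=datetime(2007, 1, 1))
vo.users.add(alice)
vo.grant_role(alice, "submitter")
vo.policies.add(Policy(role="submitter", action="submit_job"))
```

Keeping roles and policies separate mirrors the slide: roles say what a member is, policies say what a role may do.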
VO Lifecycle
VO identification
– Identify and name VO candidates
VO formation
– Creation and configuration of the VO according to the
anticipated roles of members
VO operation
– Members should be identified for effective logging and
auditing
– The VO should be able to classify resources into different
access control levels for effective management
VO evolution
– Managing change in participating entities or in their conditions
of use
– Members can be added and linked into a VO by authorization
– Users can be classified at different levels with associated
operation rights
VO dissolution
– Non-persistent information should be deleted, credentials
reclaimed, and users and resource providers notified
– Should take place after all activities have finished
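The lifecycle above reads naturally as a state machine. This is a minimal sketch under stated assumptions: the transition table (evolution returns to operation, dissolution is final) and the names `Phase`, `VOLifecycle` and `running_activities` are illustrative guesses, not taken from the XtreemOS specification.

```python
from enum import Enum


class Phase(Enum):
    IDENTIFICATION = "identification"
    FORMATION = "formation"
    OPERATION = "operation"
    EVOLUTION = "evolution"
    DISSOLUTION = "dissolution"


# Assumed transition table: evolution returns to operation; dissolution is final.
TRANSITIONS = {
    Phase.IDENTIFICATION: {Phase.FORMATION},
    Phase.FORMATION: {Phase.OPERATION},
    Phase.OPERATION: {Phase.EVOLUTION, Phase.DISSOLUTION},
    Phase.EVOLUTION: {Phase.OPERATION, Phase.DISSOLUTION},
    Phase.DISSOLUTION: set(),
}


class VOLifecycle:
    def __init__(self):
        self.phase = Phase.IDENTIFICATION
        self.running_activities = 0  # hypothetical counter of active jobs

    def advance(self, target: Phase) -> None:
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(f"cannot go from {self.phase.name} to {target.name}")
        # Dissolution should take place only after all activities have finished.
        if target is Phase.DISSOLUTION and self.running_activities > 0:
            raise RuntimeError("activities still running")
        self.phase = target
```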
VO Management
Two levels
– VO level (administration)
• Performed by XtreemOS-G services
▪ Distributed information management for membership
tracking and accounting of users and resources
– Node level
• Performed by XtreemOS-F
• Add mechanisms to Linux OS for recognizing, controlling,
and enforcing usage of global Grid entities
▪ Grid identity management
▪ Resource access granting and accounting
▪ VO policy checking, auditing and enforcing
Node Level VO Management
Minimal changes to the kernel code, to reduce the pressure of
getting VO-related changes accepted by the Linux community
– Keep changes localized in dynamically loadable kernel
modules
Features
– PAM-plug-in based authentication
– Static and dynamic identity mapping to local user/group ids
– Kernel level key retention mechanisms
– ACL mechanisms
• A VO model that is flexible, secure, efficient and easily sustainable
from a software engineering point of view
Investigation of synergies with existing security enhancements for
Linux
– Linux Security Module (LSM)
• Refinement of access control and enforcement mechanisms
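Static and dynamic identity mapping, listed among the features above, can be sketched as a lookup table backed by a pool of local uids. This is only an illustration: the class `IdentityMapper`, its methods, and the uid values are hypothetical, and the real mechanism (PAM plug-ins, kernel key retention) is not modelled here.

```python
class IdentityMapper:
    """Map (grid user, VO) pairs to local uids (hypothetical sketch)."""

    def __init__(self, static_map, pool):
        self.static_map = dict(static_map)  # pre-configured mappings
        self.free = list(pool)              # local uids reserved for grid users
        self.dynamic = {}                   # mappings created on demand

    def lookup(self, grid_user, vo):
        key = (grid_user, vo)
        # Static mappings take precedence over dynamic ones.
        if key in self.static_map:
            return self.static_map[key]
        if key not in self.dynamic:
            if not self.free:
                raise RuntimeError("no free local uid for grid identity")
            self.dynamic[key] = self.free.pop(0)
        return self.dynamic[key]

    def release(self, grid_user, vo):
        """Return a dynamically allocated uid to the pool, e.g. at session end."""
        uid = self.dynamic.pop((grid_user, vo), None)
        if uid is not None:
            self.free.append(uid)
```

A dynamic pool avoids creating a local account for every grid user in advance, at the cost of having to reclaim uids when sessions end.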
Infrastructure for Highly Available
and Scalable Services
[Figure: XtreemOS architecture diagram, as on the "XtreemOS Architecture" slide]
Infrastructure for Highly Available
and Scalable Grid Services
Grid
– Very large number of nodes that are distributed worldwide
– Dynamicity: nodes join, leave, fail
Applications
– Standalone (interact only with the user that launched
them)
– Services (present an interface to the outside world and
can be invoked)
• System level functionalities
• Application-level functionalities
Targets of the infrastructure
– XtreemOS-G services
– Application-level services
Infrastructure for Highly Available
and Scalable Grid Services
Management of collections of nodes
Infrastructure for Highly Available
and Scalable Grid Services
Toolbox
– Facilities to construct structured collections
• Application initialization
• DHT, N-dimensional matrix, ranked nodes
– Distributed servers
• Present a single stable address to the external world
hiding the internal organization of the service
– Virtual nodes
• Fault tolerant groups of nodes capable of taking over each
other’s tasks
– Publish/Subscribe
• Useful for applications and also to build structured
collections
• Fully decentralized implementation
– Directory service
• Node monitoring and failure detection
• Adapt to the dynamicity of the monitored attributes
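One of the structured collections named above is a DHT. The slides do not say which DHT design is used, so the sketch below substitutes a well-known technique, consistent hashing on a ring, to show how keys can stay with their node while other nodes join or leave. The names `HashRing`, `join`, `leave` and `lookup` are illustrative only.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    """Hash a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.join(node)

    def join(self, node: str) -> None:
        point = _h(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def leave(self, node: str) -> None:
        point = _h(node)
        self._points.remove(point)
        del self._owner[point]

    def lookup(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        if not self._points:
            raise LookupError("empty ring")
        i = bisect.bisect_right(self._points, _h(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a node leaves, only the keys it owned move to its successor; all other keys keep their owner, which suits a grid where nodes join, leave and fail.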
Application Management
[Figure: XtreemOS architecture diagram, as on the "XtreemOS Architecture" slide]
Application Management
Entities taking part in job execution
– Job
• One or more processes that collaborate to achieve a common goal
• Resource allocation unit
– Resources
• Physical or virtual component of limited availability within a
computer system
• Have static and dynamic characteristics
Application execution management
– Job submission and scheduling
– Job and resource control
– Job and resource monitoring
Application Life Cycle
Application Execution Management
AEM is as generic and flexible as possible
– Does not target specific users or types of jobs
AEM allows users to exploit advantages of executing a job in a Grid
AEM provides an easy to use job submission, control and monitoring
interface
– Unix-like submission (with default description of requirements)
– Batch-like submission
• Requirements
• Hints (additional information optionally provided by users)
– Adaptive and accurate monitoring
AEM deals with Grid dynamicity
– Job migration and checkpointing
– Hide failures and changes from users as much as possible
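The two submission styles above differ in how much the user states up front. The sketch below is one way to picture that, assuming a job description made of requirements (which must be met) and hints (which are optional). The default values and all names (`JobDescription`, `submit_unix_like`, `submit_batch_like`, `eligible`) are hypothetical, not the AEM interface.

```python
from dataclasses import dataclass, field

# Assumed defaults used when the user gives no description of requirements.
DEFAULT_REQUIREMENTS = {"nodes": 1, "memory_mb": 512}


@dataclass
class JobDescription:
    command: list
    requirements: dict = field(default_factory=lambda: dict(DEFAULT_REQUIREMENTS))
    hints: dict = field(default_factory=dict)


def submit_unix_like(command):
    """Unix-like submission: just a command line, default requirements."""
    return JobDescription(command=list(command))


def submit_batch_like(command, requirements, hints=None):
    """Batch-like submission: explicit requirements plus optional hints."""
    merged = {**DEFAULT_REQUIREMENTS, **requirements}
    return JobDescription(list(command), merged, dict(hints or {}))


def eligible(job, resource):
    """A resource qualifies if it meets every requirement; hints never exclude."""
    return all(resource.get(k, 0) >= v for k, v in job.requirements.items())
```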
Application Execution Management
AEM has to guarantee access to authorized resources and their limited utilization
– Jobs executed in the context of a grid user and a VO
– Rely on VO and security management services (WP2.1, WP3.5)
Scalability and fault tolerance taken into account in the design of AEM
– Most of AEM services are in the scope of a job which is suitable for scalability
• JExecMng and jMonitor could potentially have to manage hundreds of nodes
– JobDirectory and jController need to be fault tolerant
– WP3.2 services will be used as appropriate
• Resource discovery
• Distributed servers
Tight integration with the Linux OS
– Enforcement in the usage of agreed resources (quota, access control)
• Job-id to be known by XtreemOS-F
– Users will have more information and control on how their jobs are running
• Performance metrics, occurred errors, exit status, …
AEM provides a basic set of system-level functionalities
– Users may rely on user-level services (e.g. workflow manager, SAGA runtime)
Data management
[Figure: XtreemOS architecture diagram, as on the "XtreemOS Architecture" slide]
Data Management
XtreemFS
– Federated object-based file system for Grid environments
• Centralised metadata servers replaced by a federation of metadata
servers
▪ Independence of participating organizations while maintaining a global
view of the system
• Designed with wide-area networks in mind
▪ File replication
▪ Location and access management based on an intelligent monitoring
service
o Access pattern-aware replication
• Semantic naming and advanced query functions to allow users to
find data in huge archives
– Object Sharing Service (OSS)
• Inter-process communication via volatile memory, mapped files,
dynamically allocated objects and grid pipes
XtreemFS Components
Object Storage Device (OSD)
– Data access in the file system
• Read/write access, concurrency control
– Object-based storage interface to hide complexity of
underlying block-based storage mechanisms
Metadata and Replica Catalogue (MRC)
– Maintenance of all file system metadata
• POSIX metadata
• Extended (user defined) metadata
• Information on replica locations
Replica Management Service (RMS)
– Decides when files have to be replicated and how replicas are
distributed among OSDs
– Replica removal
Client
– Hosts running the access layer (file system adapter or
XtreemFS library)
• Traditional Linux file system interface for transparent access to
MRC, OSD, RMS
• Native XtreemFS interface
Object Storage Device (OSD)
Container of objects
– Reliably store and retrieve data from physical media
– Security enforcement for access to stored objects
• Capabilities built by MRC and received with each request
– Multi-object files
• Striping and/or replication
• Each file replica has its own striping policy
– Transactional files
• Changes performed on a local copy (and not forwarded to
other OSDs) and committed or rolled back at some time
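Striping a multi-object file means mapping each byte offset to an object and to the OSD holding that object. The slides do not specify the layout, so this sketch assumes simple round-robin striping with a fixed stripe size. `StripingPolicy`, `locate`, `split` and the OSD names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripingPolicy:
    """Per-replica striping policy (assumed round-robin layout)."""
    stripe_size: int  # bytes stored in each object
    osds: tuple       # names of the OSDs holding this replica

    def locate(self, offset: int):
        """Return (object number, OSD name, offset within the object)."""
        obj = offset // self.stripe_size
        return obj, self.osds[obj % len(self.osds)], offset % self.stripe_size

    def split(self, offset: int, length: int):
        """Break a byte range into (OSD, object, offset in object, length) pieces."""
        pieces = []
        end = offset + length
        while offset < end:
            obj, osd, within = self.locate(offset)
            n = min(self.stripe_size - within, end - offset)
            pieces.append((osd, obj, within, n))
            offset += n
        return pieces
```

Because each replica carries its own policy, two replicas of the same file could use different stripe sizes or OSD sets, and a client would split its requests against whichever replica it reads.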
Replica Management Service (RMS)
Takes care of autonomous creation and deletion of replicas
Replication policies
– Must satisfy security needs and comply with local regulations
• Countries, real organization, VO, racks in a data centre
Replica creation
– Gathering information from other services to decide when and
where to create a replica
• Each time a file is opened
▪ RMS is contacted to see if a better replica should be created
o Decision depends on the file size, OSD availability
o A client may start accessing a “bad replica” during the creation of a new
one
• MRC may keep track of opens to predict future access from the
previous ones
• AEM can inform RMS that a job is about to start its execution
▪ RMS can anticipate the creation of a new replica before the job
execution
Removing “obsolete” replicas
– Lack of free space, file or replica very seldom used, nearby
replicas no longer useful, …
– A replica can be removed at any time even while being used
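The replica creation decision described above depends on file size, OSD availability and policy compliance. The following sketch shows one possible shape for that decision at file-open time. The latency criterion, the threshold value, the dictionary keys and the function name `choose_new_replica` are all assumptions made for illustration. The slides do not define the actual decision rule.

```python
def choose_new_replica(file_size, current_latency_ms, osds,
                       latency_threshold_ms=50.0):
    """Return the name of the OSD that should receive a new replica, or None.

    `osds` is a list of dicts with hypothetical keys:
    name, latency_ms, free_bytes, policy_ok.
    """
    if current_latency_ms <= latency_threshold_ms:
        return None  # the replica in use is considered good enough
    candidates = [
        o for o in osds
        if o["policy_ok"]                        # replication policy allows it
        and o["free_bytes"] >= file_size         # the OSD has room for the file
        and o["latency_ms"] < current_latency_ms # it would actually be better
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o["latency_ms"])["name"]
```

While a new replica is being created, the client keeps using the existing one, which matches the note that a client may start on a "bad replica".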
MetaData and Replica Catalogue
(MRC)
MRC
– Acts logically as one service but will be composed of replicated
service instances to improve availability and performance
– Access control management
• Support of a variety of policies
• Volume ACL
Data model
– Hierarchical directory structure and/or extended metadata
– Core abstraction for controlling access to file metadata and file data is
the volume
– Files can be copied between volumes and links to files in other
volumes can be created
Internal architecture
– Exactly one meta object per physical object on a storage device
• To what extent is it possible to decouple system components
while preserving a global view of the system?
Object Sharing Service (OSS)
Inter-process communication via volatile memory, mapped
files, dynamically allocated objects and grid pipes
– All components designed to be scalable and fault
tolerant to deal with the dynamic behaviour of the Grid
Features
– Management of shared objects containing references
– Object access detection
• Page based
– Object access monitoring to control false sharing and
object replicas
– Object consistency management
• Strict, weak and transactional memory consistency models
LinuxSSI: Linux-XOS for Clusters
[Figure: XtreemOS architecture diagram, as on the "XtreemOS Architecture" slide]
LinuxSSI: XtreemOS-F Cluster
Flavour
LinuxSSI will leverage the Kerrighed SSI OS for
clusters
Five work directions for LinuxSSI
– Scalability to hundreds of processors
– LinuxSSI file system
– Automatic reconfiguration of LinuxSSI
– Checkpoint/restart mechanisms for parallel
applications
– Customizable scheduler
Scalability & Reconfiguration
Management
Scalability to hundreds of processors
– Removing hard limits on the number of nodes
– Evaluating the scalability of Kerrighed internal
algorithms
Automatic reconfiguration of LinuxSSI
– Node addition, eviction or failure management
– Leverage the existing mechanisms provided by
Kerrighed in the HotPlug module
LinuxSSI File System
LinuxSSI file system
– Exploitation of the disks attached to cluster
nodes
• Single name space (root file system)
• Policies for placing/replicating data on disk
• Efficient parallel accesses to large data volumes
– Performance as a primary target in LinuxSSI
basic version
– LinuxSSI file system should not fail in the
event of failures
• Better support for failures in the advanced version of
LinuxSSI
Checkpoint/Restart in LinuxSSI
Checkpoint and restart of parallel application units in a cluster
– Shared memory and message-passing programming models
will be supported
– Checkpointer multi-level architecture
• Kernel checkpointer
▪ Process/thread checkpointing
▪ Based on Kerrighed mechanisms
▪ Transparent or application-aware checkpointing
• System checkpointer
▪ Application unit checkpointing (inside a cluster)
▪ Coordination of thread/process checkpoints for parallel applications
▪ Configurable service
• Grid checkpointer
▪ Application checkpointing (an application may span multiple Grid
nodes)
▪ Coordination of application unit checkpoints for an application
comprising multiple units
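The three checkpointer levels above delegate downwards: the grid level coordinates application units, the system level coordinates processes, and the kernel level saves one process. The sketch below shows only that delegation. The all-or-nothing rule, the dictionary layout and the function names are assumptions. Real checkpointing in Kerrighed is far more involved.

```python
def kernel_checkpoint(process):
    """Stand-in for process/thread checkpointing; returns a snapshot or raises."""
    if process.get("faulty"):
        raise RuntimeError(f"cannot checkpoint pid {process['pid']}")
    return {"pid": process["pid"], "state": dict(process["state"])}


def system_checkpoint(unit):
    """Checkpoint every process of one application unit inside a cluster."""
    return [kernel_checkpoint(p) for p in unit["processes"]]


def grid_checkpoint(application):
    """Checkpoint all units of an application spanning several grid nodes.

    Assumed policy: all-or-nothing, so a failure in any unit discards the
    whole checkpoint and None is returned.
    """
    try:
        return {u["name"]: system_checkpoint(u) for u in application["units"]}
    except RuntimeError:
        return None
```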
Customizable Scheduler
Customizable scheduler
– Long-term scheduler
• Application admission in the cluster (job queuing system)
– Load balancing scheduler
• Balance the current workload between cluster nodes
Long-term scheduler
– DRMAA standard interface
– Adapted to take advantage of the SSI “virtual multiprocessor”
– Resource sharing (a CPU may not be dedicated to a single application)
– Advanced monitoring capabilities
Load balancing scheduler
– Policy customization
• Multilevel architecture (probes, analyzers, decision-making)
– Self adaptation of policy based on the current state of the cluster
– Advanced policies
• Shared memory, IPC
Interaction with the Grid level services when needed
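The multilevel architecture of the load balancing scheduler (probes, analyzers, decision-making) can be pictured as three pluggable stages. This is a minimal sketch: the load metric, the imbalance threshold of 0.25 and the names `cpu_probe`, `imbalance_analyzer` and `decide` are hypothetical, and policy customization is represented simply by passing different functions.

```python
def cpu_probe(node):
    """Probe: read one measurement from a node (here a precomputed load)."""
    return node["load"]


def imbalance_analyzer(loads, threshold=0.25):
    """Analyzer: return (busiest, idlest) if the load spread exceeds the threshold."""
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    if loads[busiest] - loads[idlest] > threshold:
        return busiest, idlest
    return None


def decide(nodes, probe=cpu_probe, analyzer=imbalance_analyzer):
    """Decision-making: propose a migration, or None if the cluster is balanced."""
    loads = {n["name"]: probe(n) for n in nodes}
    pair = analyzer(loads)
    if pair is None:
        return None
    return {"migrate_from": pair[0], "migrate_to": pair[1]}
```

Swapping in another probe (for example memory use) or another analyzer changes the policy without touching the decision stage.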
From LinuxSSI to LinuxSSI-XOS
Virtual organization support
– Support of the kernel key retention system
• Impact on the Ghost module
– XtreemOS-G services will run as a single
instance on a LinuxSSI cluster
• Example: daemons in charge of mapping global user,
VO and group identities onto the Linux UID/GID
XtreemOS Consortium
19 partners
– 1 public financial institution as coordinator
– 9 research centers & universities
– 9 industrial partners
• 4 SME
8 countries
– Europe
• France, Germany, Italy, Slovenia, Spain, The Netherlands, UK
– China
XtreemOS Partners
Fact Sheet
Start date
– June 1st, 2006
Duration
– 4 years
Budget
– Approx. 30 Meuros
– EC funding 14.2 Meuros
Website
– http://www.xtreemos.eu
Administrative and
financial coordinator
– CDC, Jean-Noël Forget
Scientific and technical
staff
– More than 100 persons