On-Demand Virtual Workspaces: Quality of Life in the Grid


Division of Labor: Tools for Growing and Scaling Grids
Tim Freeman, Kate Keahey, Ian Foster, Abhishek Rana, Frank Wuerthwein, Borja Sotomayor
Division of Labor
The greatest improvements in the productive powers of labour, and the greater part of the skill, dexterity, and judgment with which it is anywhere directed, or applied, seem to have been the effects of the division of labour. (Adam Smith)
How can we implement division of labor in Grid computing?
- tools to implement an abstraction
- requirements for an abstraction
12/05/06
ICSOC ‘06
Overview

- Problem Definition
  - The Edge Service Use Case
- Workspace Service
  - Overview of the workspace service
  - Extensions to workspace service
- Implementation and Evaluation
  - CPU enforcement
  - Network Enforcement
- Status of the Edge Services Project
- Conclusions
Providers and Consumers
Resource provider
- Has a limited number of resources
- Has to balance the software needs of multiple users
- Has to provide a limited execution environment for security reasons
Resource consumers
- Want the resources when they need them & as much as they need
- Want to use specific software packages
- Want as much control as possible over resources
The Edge Service Use Case
Edge Services: Challenges

- VO-specific Edge Services
  - Each VO has very specific configuration requirements
- Resource management
  - The VOs would like to provide quality of service to their users
  - The resource needs of the VOs change dynamically
- Dynamic, policy-based deployment and management of Edge Services
  - Updates, ephemeral edge services, infrastructure testing, short-term usage
Division of Labor Dimensions

- Environment and Configuration
- Isolation
  - Critical from the point of view of the provider if the VOs are to be allowed some independence
- Resource usage and accounting
  - Application-independent
  - Management along different resource aspects
  - Dynamically renegotiable/adaptable
GT4 workspace service

The GT4 Virtual Workspace Service (VWS) allows an authorized client to deploy and manage workspaces on-demand.
- GT4 WSRF front-end
- Leverages multiple GT services
- Currently implements workspaces as VMs
  - Uses the Xen VMM but others could also be used
Current release 1.2.1 (December 2006)
- http://workspace.globus.org
Workspace Service Usage Scenario
- The VWS manages a set of nodes inside the Trusted Computing Base (TCB), typically a cluster. This set is called the node pool.
- The workspace service has a WSRF front-end that allows users to deploy and manage virtual workspaces.
- Each node must have a VMM (Xen) installed, along with the workspace backend (software that manages individual nodes).
- VM images are staged to a designated image node inside the TCB.
Diagram labels: VWS Service, VWS Node, Image Node, and twelve pool nodes, all inside the Trusted Computing Base (TCB)
Deploying Workspaces

Adapter-based implementation model
- Transport adapters
  - Default scp, then gridftp
- Control adapters
  - Default ssh
  - Deprecated: PBS, SLURM
- VW deployment adapter
  - Xen
  - Previous versions: VMware
Diagram labels: Workspace (workspace metadata, resource allocation), VWS Service, Image Node, and twelve pool nodes
Interacting with Workspaces
- The workspace service publishes information on each workspace as standard WSRF Resource Properties.
- Users can query those properties to find out information about their workspace (e.g. what IP the workspace was bound to).
- Users can interact directly with their workspaces the same way they would with a physical machine.
Diagram labels: VWS Service, Image Node, and twelve pool nodes inside the Trusted Computing Base (TCB)
Deployment Request Arguments

A workspace, composed of:
- VM image
- Workspace metadata
  - XML document
  - Includes deployment-independent information:
    - VMM and kernel requirements
    - NICs + IP configuration
    - VM image location
  - Need not change between deployments
Resource Allocation
- Specifies availability, memory, CPU%, disk
- Changes during or between deployments
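The split between deployment-independent metadata and a changeable resource allocation can be sketched as plain data types. This is only an illustration, not the service's XML schema: the class names, field names, the `with_allocation` helper, and the kernel, NIC, availability, disk and image-location values are all hypothetical. Only the 896 MB and 45% figures appear elsewhere in the talk.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class WorkspaceMetadata:
    """Deployment-independent description; need not change between deployments."""
    vmm: str                 # VMM requirement
    kernel: str              # kernel requirement (hypothetical value below)
    nics: Tuple[str, ...]    # NICs + IP configuration (hypothetical format)
    image_location: str      # where the VM image lives (hypothetical value below)


@dataclass(frozen=True)
class ResourceAllocation:
    """Deployment-specific request; may change during or between deployments."""
    availability: str
    memory_mb: int
    cpu_percent: int
    disk_mb: int


@dataclass(frozen=True)
class DeploymentRequest:
    metadata: WorkspaceMetadata
    allocation: ResourceAllocation


def with_allocation(request: DeploymentRequest, **changes) -> DeploymentRequest:
    """Return a new request with the same metadata and an adjusted allocation."""
    return replace(request, allocation=replace(request.allocation, **changes))


metadata = WorkspaceMetadata(
    vmm="xen",
    kernel="example-kernel",
    nics=("eth0: dhcp",),
    image_location="example-image-location",
)
first = DeploymentRequest(metadata, ResourceAllocation("now", 896, 45, 2048))
second = with_allocation(first, cpu_percent=30)
```

Here `second` reuses the metadata of `first` untouched and differs only in the CPU share, which mirrors the point that only the resource allocation changes between deployments.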
Workspace Service Interfaces
Workspace Factory Service
- Handles creation of workspaces
- Also publishes information on what types of workspaces it can support
- Create() takes the workspace (meta-data/image) and a resource allocation
Workspace Service
- Handles management of each created workspace (start, stop, pause, migrate, inspecting VW state, ...)
Workspace Resource Instance
- Resource Properties publish the assigned resource allocation, how the VW was bound to metadata (e.g. IP address), duration, and state
Diagram arrows: authorize & instantiate, inspect & manage, notify
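The factory/instance split above can be mimicked with a small in-memory model. This is a hedged sketch and not the GT4 WSRF interface: the class and method names, the allowed state transitions, the client name, and the addresses (taken from the 192.0.2.0/24 documentation range) are all assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List, Set


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"


class WorkspaceInstance:
    """Stand-in for one workspace resource instance (inspect, manage, notify)."""

    def __init__(self, workspace_id: int, allocation: Dict[str, int], ip_address: str) -> None:
        self.workspace_id = workspace_id
        self.state = State.CREATED
        self._allocation = dict(allocation)
        self._ip_address = ip_address
        self._listeners: List[Callable[[int, State], None]] = []

    def subscribe(self, listener: Callable[[int, State], None]) -> None:
        """Register a callback that is notified on every state change."""
        self._listeners.append(listener)

    def resource_properties(self) -> Dict[str, object]:
        """Published view: assigned allocation, bound IP address, and state."""
        return {
            "allocation": dict(self._allocation),
            "ip_address": self._ip_address,
            "state": self.state.value,
        }

    def _transition(self, allowed_from: Set[State], new_state: State) -> None:
        if self.state not in allowed_from:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        for listener in self._listeners:
            listener(self.workspace_id, new_state)

    def start(self) -> None:
        self._transition({State.CREATED, State.PAUSED}, State.RUNNING)

    def pause(self) -> None:
        self._transition({State.RUNNING}, State.PAUSED)

    def stop(self) -> None:
        self._transition({State.RUNNING, State.PAUSED}, State.STOPPED)


class WorkspaceFactory:
    """Stand-in for the factory: authorize the client, then instantiate."""

    def __init__(self, authorized_clients: Set[str], free_ips: List[str]) -> None:
        self._authorized = set(authorized_clients)
        self._free_ips = list(free_ips)
        self._next_id = 1

    def create(self, client: str, allocation: Dict[str, int]) -> WorkspaceInstance:
        if client not in self._authorized:
            raise PermissionError(f"{client} is not authorized")
        if not self._free_ips:
            raise RuntimeError("no free IP address to bind")
        instance = WorkspaceInstance(self._next_id, allocation, self._free_ips.pop(0))
        self._next_id += 1
        return instance


factory = WorkspaceFactory({"vo1-admin"}, ["192.0.2.10", "192.0.2.11"])
events = []
workspace = factory.create("vo1-admin", {"memory_mb": 896, "cpu_percent": 45})
workspace.subscribe(lambda wid, state: events.append((wid, state.value)))
workspace.start()
workspace.pause()
```

After these calls the instance reports the IP address it was bound to and a paused state through `resource_properties()`, and the subscriber has received one notification per transition.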
Extensions to Resource Allocation
Edge Services Today
Compute Element (CE) implemented as GT GRAM
- Both VOs share the same resource
- Job throughput is low as both VOs are equally impacted by the high VO1 traffic
Diagram labels, in extraction order: VO1, 7.83 jpm, VO1, 8 jpm, VO2, GRAM
Allocating Resources for Edge Services
Resource Allocation (shown twice with identical values, once per GRAM workspace):
- MEM: 896 MB
- CPU %: 45%
- CPU arch: AMD Athlon
Dom0 CPU %: 10%
Job throughput for VO2 is high as it is unimpacted by the high VO1 traffic
Diagram labels, in extraction order: VO1, VO1, 4.18 jpm, GRAM, 22.36 jpm, VO2, GRAM, Workspace Service
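The shares on this slide add up to the whole machine: 45% + 45% for the two workspaces plus 10% for Dom0. A minimal sketch of that arithmetic follows, assuming (the slide does not say so explicitly) that all shares are drawn from a single 100% CPU budget. The function names are hypothetical.

```python
from typing import Iterable


def total_cpu_percent(dom0_cpu_percent: int, workspace_cpu_percents: Iterable[int]) -> int:
    """Sum the Dom0 share and every workspace share."""
    return dom0_cpu_percent + sum(workspace_cpu_percents)


def fits_on_host(dom0_cpu_percent: int, workspace_cpu_percents: Iterable[int],
                 capacity_percent: int = 100) -> bool:
    """True if the requested shares stay within the assumed CPU budget."""
    return total_cpu_percent(dom0_cpu_percent, workspace_cpu_percents) <= capacity_percent


# Values from the slide: two workspaces at 45% each, Dom0 at 10%.
slide_total = total_cpu_percent(10, [45, 45])
slide_fits = fits_on_host(10, [45, 45])
# A third workspace, even a small one, would exceed the assumed budget.
extra_fits = fits_on_host(10, [45, 45, 5])
```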
Tracking Requests Over Time
[Figure: Comparison of Request Throughput over Time. Series: VO1Client, VO2Client. Y-axis: Completed jobs. X-axis: Time (in 30 second buckets)]
- Histogram of request throughput
- Resource usage is enforced on an “as needed” basis
Increasing Load on VO1
[Figure: VO2 (under changing VO1 load conditions). Series: 1mill-VO2, 2mill-VO2, 3mill-VO2. Y-axis: Jobs completed. X-axis: Time (in 30 second buckets)]
- Histogram of request throughput
- The load on VO1 increases 2x and 3x
- Request throughput for VO2 is unimpacted
Network Resource Allocation
- Processing network traffic requires CPU
  - In Xen: for both dom0 and guest domains
- CPU allocation tradeoffs
  - Scheduling frequency
- The mechanism is general
  - Save for direct drivers
Diagram labels: domU, dom0, B, domU
Network Resource Allocation

- Network Allocation Implementation
  - CPU allocations based on a parameter sweep
    - Close to maximum bandwidth
  - Linux network shaping tools
- Negotiating network resource allocations
  - Policy: accepting only CPU allocations that match the bandwidth
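The negotiation policy above can be sketched as a lookup against the results of a parameter sweep. The sweep values below are hypothetical placeholders and not measurements from the talk. Reading "match" as "the requested CPU share is at least what the sweep says the bandwidth needs" is also an assumption, as are the function names.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical sweep results, sorted by bandwidth:
# (incoming bandwidth in MB/s, CPU % needed to sustain it).
SWEEP: List[Tuple[float, int]] = [(2.0, 5), (4.0, 10), (8.0, 25)]


def required_cpu_percent(bandwidth_mb_s: float,
                         sweep: List[Tuple[float, int]] = SWEEP) -> Optional[int]:
    """CPU % of the smallest sweep point that covers the requested bandwidth."""
    bandwidths = [bandwidth for bandwidth, _ in sweep]
    index = bisect_left(bandwidths, bandwidth_mb_s)
    if index == len(sweep):
        return None  # the sweep never reached this bandwidth
    return sweep[index][1]


def accept(cpu_percent: int, bandwidth_mb_s: float,
           sweep: List[Tuple[float, int]] = SWEEP) -> bool:
    """Accept only requests whose CPU share can sustain the requested bandwidth."""
    needed = required_cpu_percent(bandwidth_mb_s, sweep)
    return needed is not None and cpu_percent >= needed
```

With this table, a request for 8.0 MB/s paired with a 6% CPU share would be rejected, while the same bandwidth with a 25% share would be accepted.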
Storage Element (SE) Edge Service
Resource Allocation (shown twice with identical values, once per GridFTP workspace):
- MEM: 128 MB
- CPU %: 6%
- CPU arch: AMD Athlon
- NIC: Incoming: 4.1 MB/s
Dom0 CPU %: 22%
Diagram labels: VO1, VO2, Workspace Service, VO1 GridFTP, VO2 GridFTP
Negotiating Bandwidth
Renegotiating CPU and Bandwidth
Resource Allocation (VO1 GridFTP):
- MEM: 128 MB
- CPU %: 14% (was 6%)
- CPU arch: AMD Athlon
- NIC: Incoming: 8.2 MB/s (was 4.1 MB/s)
Resource Allocation (VO2 GridFTP):
- MEM: 128 MB
- CPU %: 6%
- CPU arch: AMD Athlon
- NIC: Incoming: 4.1 MB/s
Dom0 CPU %: 22%
Diagram labels: VO1 GridFTP, VO2 GridFTP, Workspace Service
Renegotiating CPU and Bandwidth
Renegotiating CPU
Resource Allocation (VO1 GridFTP):
- MEM: 128 MB
- CPU %: 34% (was 14%)
- CPU arch: AMD Athlon
- NIC: Incoming: 8.2 MB/s
Resource Allocation (VO2 GridFTP):
- MEM: 128 MB
- CPU %: 6%
- CPU arch: AMD Athlon
- NIC: Incoming: 4.1 MB/s
Dom0 CPU %: 22%
Diagram labels: VO1 GridFTP, VO2 GridFTP, Workspace Service
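The renegotiation steps on these slides can be replayed with a small model of a host that adjusts a live allocation in place. This is a sketch under assumptions: the `Host` and `Allocation` names are hypothetical, and the rule that a change is refused when Dom0 plus all workspaces would exceed 100% CPU is an assumed policy, not one stated in the talk. The starting and renegotiated values are the ones shown on the slides.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Allocation:
    cpu_percent: int
    incoming_mb_s: float


class Host:
    """One physical node: a fixed Dom0 share plus per-workspace allocations."""

    def __init__(self, dom0_cpu_percent: int, capacity_percent: int = 100) -> None:
        self.dom0_cpu_percent = dom0_cpu_percent
        self.capacity_percent = capacity_percent
        self.allocations: Dict[str, Allocation] = {}

    def used_cpu_percent(self) -> int:
        return self.dom0_cpu_percent + sum(a.cpu_percent for a in self.allocations.values())

    def renegotiate(self, name: str, cpu_percent: Optional[int] = None,
                    incoming_mb_s: Optional[float] = None) -> bool:
        """Apply the change if it fits the assumed CPU budget, else leave state untouched."""
        current = self.allocations[name]
        proposed = Allocation(
            current.cpu_percent if cpu_percent is None else cpu_percent,
            current.incoming_mb_s if incoming_mb_s is None else incoming_mb_s,
        )
        projected = self.used_cpu_percent() - current.cpu_percent + proposed.cpu_percent
        if projected > self.capacity_percent:
            return False
        self.allocations[name] = proposed
        return True


host = Host(dom0_cpu_percent=22)
host.allocations["VO1 GridFTP"] = Allocation(6, 4.1)
host.allocations["VO2 GridFTP"] = Allocation(6, 4.1)

# Step 1 (slide "Renegotiating CPU and Bandwidth"): CPU 6% to 14%, NIC 4.1 to 8.2 MB/s.
step1 = host.renegotiate("VO1 GridFTP", cpu_percent=14, incoming_mb_s=8.2)
# Step 2 (slide "Renegotiating CPU"): CPU 14% to 34%, bandwidth unchanged.
step2 = host.renegotiate("VO1 GridFTP", cpu_percent=34)
# A request that would overcommit the host is refused and changes nothing.
step3 = host.renegotiate("VO2 GridFTP", cpu_percent=50)
```

After the two accepted steps the host is at 22% + 34% + 6% = 62% of the assumed budget, so the refused request in the last line is the only one that would have exceeded it.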
Renegotiating CPU
Edge Services: Status

- OSG activity
  - www.opensciencegrid.org/esf
- Edge Services in use (database caches)
  - ATLAS: mysql-gsi db built by the DASH project
  - CMS: frontier database
- Base Image library
  - SDSC: SL3.0.3, FC4, CentOS4.1
  - FNAL: SL3.0.3, SL4, LTS 3, LTS 4
- Sites
  - Production: SDSC
  - also testing at FNAL, UC and ANL
Related Work

- Edge Service efforts
  - VO boxes, EGEE
  - APAC, static Edge Services
  - Grid-Ireland, static Edge Services
- OGF efforts: WS-Agreement, JSDL
- Managed Services
- QoS with Xen
  - Padma Apparo, Intel (VTDC paper)
  - Rob Gardner & team, HP
  - Credit-based scheduler
- Grid computing and virtualization
  - Work at University of Florida, Purdue, Northwestern, Duke and others
Conclusions


- VM-based workspaces are a promising tool to implement “division of labor”
- Renegotiation is an important resource management tool
  - Enforcement methods: dynamic reallocation, migration, etc.
  - Aggregate resource allocations
  - Protocols
- Different resource aspects influence each other
- More work on managing VM resources is needed