Grid Resources

Download Report

Transcript Grid Resources

Uni Innsbruck Informatik - 1
Peer-to-Peer and Grid Systems
Grid computing
Michael Welzl http://www.welzl.at
DPS NSG Team http://dps.uibk.ac.at/nsg
Institute of Computer Science
University of Innsbruck, Austria
TU Darmstadt
Darmstadt, Germany
2-4 January 2007
Uni Innsbruck Informatik - 2
Complexity
• Grid poses difficult problems
– Heterogeneity and dynamicity of resources
– Secure access to resources with different users in various roles,
belonging to VTs which belong to VOs
• “run program X at site Y subject to community policy P, providing
access to data at Z according to policy Q”
– Efficient assignment of data and tasks to machines (“scheduling“)
Heterogeneous
distributed
systems
Massively
parallel
systems
Uni Innsbruck Informatik - 3
Grid requirements
• Computer scientists can tackle these problems
– Grid application users and programmers are often not computer scientists
• Important goal: ease of use
– Programmer should not worry (too much) about the Grid
– User should worry even less
– Ultimate goal: write and use an application as if using a single computer
(power grid metaphor)
• How do computer scientists simplify?
• Abstraction.
• We build layers.
• In a Grid, we typically have Middleware.
Uni Innsbruck Informatik - 4
Grid Middleware
Grid Architecture
Uni Innsbruck Informatik - 5
Grid computing without middleware
• Example manual Grid application execution
1. scp code to 10 machines
2. log in to the 10 machines via ssh and start “application > result“
everywhere
3. Estimate running time, or let application tell you that it‘s done
(e.g. via TCP/IP communication in app code)
4. retrieve result files via scp
• Tedious process - so write a script file
• Do this again for every application / environment?
• What if your colleagues need something similar?
• Standards needed, tools introduced
Uni Innsbruck Informatik - 6
Toolkits
• Most famous: Globus Toolkit
– Evolution from GT2 via GT3 to GT4 influenced the whole Grid community
– Reference implementation of Open Grid Forum (OGF) standards
• Other well-known examples
– Condor
• Exists since mid-1980‘s
• No Grid back then - system gradually evolved towards it
• Traditional goal: harvest CPU power of normal user workstations
 many Grid issues always had to be addressed anyway
• Special interfaces now enable Condor-Globus communication (“Condor-G“)
– Unicore (used in D-Grid)
– gLite (used in EGEE)
• Issues that these middlewares (should) address
–
–
–
–
–
Load Balancing, error management
Authentification, Authorization and Accounting (AAA)
Resource discovery, naming
Resource access and monitoring
Resource reservation and QoS management
Uni Innsbruck Informatik - 7
How the tools are applied in practice
Web
Browser
Compute
Server
Simulation
Tool
Web
Portal
Registration
Service
Data
Viewer
Tool
Chat
Tool
Credential
Repository
Telepresence
Monitor
Application services
organize VOs & enable
access to other services
Camera
Camera
Database
service
Data
Catalog
Database
service
Database
service
Certificate
authority
Users work
with client
applications
Compute
Server
Collective services
aggregate &/or
virtualize resources
Resources implement
standard access &
management interfaces
Source: Globus presentation by Ian Foster
Uni Innsbruck Informatik - 8
Example: Globus Toolkit version 4 (GT4)
Contrib/
Preview
Grid
Telecontrol
Protocol
Delegation
Data
Replication
Community
Scheduling
Framework
Community Data Access Workspace
Authorization & Integration Management
Deprecated
WebMDS
Python
WS Core
Trigger
C
WS Core
Reliable
File
Transfer
Grid Resource
Allocation &
Management
Pre-WS
Authentication
Authorization
GridFTP
Pre-WS
Pre-WS
Grid Resource Monitoring
Alloc. & Mgmt & Discovery
Credential
Mgmt
Replica
Location
Security
Data Mgmt
Authentication
Authorization
Example
Info
Services
Web Services
Components
Java
WS Core
Index
C Common
Libraries
eXtensible
IO (XIO)
Execution
Mgmt
Core
Non-WS
Components
Common
Runtime
Source: Globus presentation by Ian Foster
Uni Innsbruck Informatik - 9
Grid Resource Allocation Manager (GRAM)
• Globus tool for job execution
– Unified, resource independent replacement for steps in “manual Grid“ example
• Unified way to set environment variables:
Resource Specification Language (RSL) (stdout = x, arguments = y, ..)
• Steps 1-4 become
– Blocking: “globus-job-run -stage hostname applicationname“
• -stage option copies code to remote machine
• Different architectures: recompilation needed – but not supported!
– Nonblocking: scp code, then “globus-job-submit hostname applicationname“
(staging not yet supported)
• Obtain unique URL, continuously use it to query job status
• When done, use “globus-job-get-output URL stdout“ to retrieve stdout
• More complex systems are built on top of GRAM
– E.g. Message Passing Interface (MPI) for the Grid: MPICH-G2
Uni Innsbruck Informatik - 10
GRAM /2
• GRAM leaves a lot of questions unanswered
– How to recompile application for different architectures?
(automatically + in a unified way)
– What if your computer‘s IP address changes?
– What if the 10 accessed computer‘s IP addresses change?
– What if two of the computers becomes unavailable?
– What if 3 other users start to work with 5 of the 10 computers?
• A tool for each problem...
– General-purpose Architecture for Reservation and Allocation (GARA)
Integrated QoS via “advance reservation“ of resources (CPU, Disk, Network)
– Monitoring and Discovery System (MDS) for locating and monitoring resources
– Resource Broker (Globus: do it yourself; Condor: “matchmaker“) translates
requirement specification (CPU, memory, ..) into IP address
• Diversity of complex tools standardized + available in Globus,
addressing some but not all of the issues  need for an architecture
Uni Innsbruck Informatik - 11
Open Grid Services Architecture (OGSA)
•
OGSA supports:
–
–
–
–
Creation,
Maintenance, and
Application of services
OGSA is based on a layered
architecture
• Two core components:
1. OGSI provides base
infrastructure
2. OGSA Platform provides core
services
–
•
Policy, logging, etc.
Applications built on top of
platform services
Domain-specific
Services
OGSA Platform
Services
Open Grid Service
Infrastructure (OGSI)
Hosting
Environment
Protocol
Uni Innsbruck Informatik - 12
Resource vs. Service
Service
Interface
Interface
Client
Service
Resource
Interface
• Resource is something sharable
• Resource has several interfaces for accessing it
• Service realizes the interface
– With all message binding etc. for use of the client
Uni Innsbruck Informatik - 13
OGSA Goals
1.
2.
3.
4.
Identify use cases for OGSA platform components
Identify and define OGSA platform components
Define hosting and platform-specific bindings
Define interoperable resource models
•
Standardization in Open Grid Forum (OGF)
–
–
•
formerly Global Grid Forum (GGF)
modelled after IETF
Lots of smaller goals under the above large goals:
–
Distributed resource management, seamless QoS, management
solutions, open interfaces, integration of existing solutions
(e.g. web services), …
Uni Innsbruck Informatik - 15
OGSA Core Components
Domain-specific Services
OGSA Meta-OS Services
CMM
Logging
OGSA Domain Services
Policy
Data Access
Integration
Provisioning
Service
Collection
OGSI
Notification
HandleMapper
Factory
Manageability
Registry
Hosting Platform (= Native OS, …)
• Core: OGSA Platform and OGSI
• Independent of layers above and below
Lifecycle
Discovery
Uni Innsbruck Informatik - 16
OGSI
• OGSI layer is based on web services
– Services called grid services
• Every grid service is also a web service
– Converse is not necessarily true
• OGSI specifies
– How grid service instances are named
– Common interfaces and behavior
– How to specify additional interfaces
• Focus is on message-level interoperability
– Common XML format
Uni Innsbruck Informatik - 17
Grid Services vs. Web Services
•
Web services already widely used and well specified
–
WSDL for descriptions, XML for message formats
•
•
Web services typically stateless
Grid services typically long-running processes with state
•
Two kinds of stateful services
1. Service maintains state about itself
• In grid services, service state = state of the resource
2. Interaction pattern between client and service is stateful
•
OGSI tackles the first problem
–
–
–
Need to separate physical state from logical state
Logical state maintained by service
Service might not contain all of physical state
Uni Innsbruck Informatik - 18
OGSI Model
• Grid services layered on web
services
• Grid services contain state
• Both communicate with XML
messages
• Grid services described in
GWSDL
OGSI
WS + OGSI
Service
data
– HTTP, SOAP, ...
State
GWSDL
WSDL
Client
Web service
– Extension of WSDL
• Client programming is the
same
• Transport selected at
runtime
Grid service
WS
XML Messages
Web service
XML
Uni Innsbruck Informatik - 19
Terminology
• Web service
– Software component identified by a URI and with an XML interface
• Stateful web service
– Web service that maintains state between interactions
• Grid service
– Stateful web service that conforms to OGSI
• Grid service description
– Description of interface and behavior of a grid service
– Defined in combination of WSDL and GWSDL
• Grid service instance
– Instance of a grid service, identified with a grid service handle (GSH)
• Grid service references
– Description of a grid service endpoint
• Service data element
– Publicly accessible state of a service
Uni Innsbruck Informatik - 20
Common Management Model (CMM)
• CMM is an abstract representation of real resources
• Key terminology:
• Manageable resource
– Entity that has some state to which management operations can be applied
– For example, hardware, software, ...
• Manageability
– Concept where resource defines information that can be used to manage it
• Management
– Actual process of managing a resource, monitoring, modifying, making
decisions, ...
Uni Innsbruck Informatik - 21
CMM
GSH
• Each management resource is a grid
service
• Manageability Interface
– Interfaces and behavior common to all
CMM services
• Domain-specific Interface
– Additional, resource- and domainspecific interfaces for managing the
resource
Manageability
Interface
Domainspecific
Interface
Grid Service Facade to
Managed Resource
• Three aspects of manageability
– XML schema for modeling the resource
– Collection of manageability portTypes
– Guidelines for modeling the resource
Resource
Uni Innsbruck Informatik - 22
Resource Modeling
• Resources defined with CMM typically coarse-grained services
– Services are self-contained, few relationships to other services
• Resource lifecycle modeling
– Lifecycle is a set of states that a resource can have
• How to define a generic lifecycle model?
• CMM defines a lifecycle model with 5 states
–
–
–
–
–
Down
Starting
Up
Stopping
Failed
Uni Innsbruck Informatik - 23
Policy
• No clear definition; OGSA defines policy as
“Definitive goal, course, or method of action based on a set of
conditions to guide and determine present and future decisions”
• Policies are implemented and used in context
– For example, security, workload, networking, etc. policies
• OGSA defines a framework for policies
• OGSA policy model is a collection of rules based on conditions and
actions
• In general, policy is: if <condition> then <action>
– For example: if <customer is boss> then <provide best QoS>
Uni Innsbruck Informatik - 24
Levels of Policy
• Policies can be defined on several
levels
Business Level
• Business level
– Typically SLA between institutions
Domain Level
• Domain level
– Canonical form defined by OGSA
policy framework
• Device level
– Device-specific format
– Used at enforcement points
Device Level
Uni Innsbruck Informatik - 25
Policy Framework
Policy Service Core
Producer of Policies
Policy Admin
Policy Tools
Policy Autonomic
Manager
Canonical
policies
Repository
Service
Consumer of Policies
Policy Enforcement
Point
Policy
Transformation
Service
Policy Service
Manager
Canonical
policies
Policy Service
Agent
Non-canonical (changes to resource)
Common
Management Model
• OGSA Policy Service Core
Policy Validation
Service
Policy Resolution
Service
Uni Innsbruck Informatik - 26
Policy Framework Components
• Policy Manager
– Responsible for controlling access
to policy repository
– Expects policies in canonical format
• Policy Repository
– Stores policy documents
– Provides abstract interface
– Any storage, accessed through Data
Access Interface Services
• Policy Enforcement Points
– Execute policy
– Work with Policy Service Agent to
resolve conflicts
• Policy Service Agent
– Policy decision maker agents
• Policy Transformation Service
– Transforms business objectives and
canonical policies to device-level
configurations
• Policy Validation Service
– Validate policy changes
• Policy Resolution Service
– Evaluate policies in context of
business-level SLA
• Policy Tools and Autonomic
Managers
– Creation of policy documents
Uni Innsbruck Informatik - 27
Logging
• OGSA defines distributed logging system
• Provides facilities for
– Decoupling of log producers and consumers
– Transforming logs to a common representation
• Uses XML and XSLT
– Filtering and aggregation
– Configurable persistency
– Consumption patterns
– Secure logging
• Requirements result in a publish/subscribe-type logging system
Uni Innsbruck Informatik - 28
Data Access and Replication
• Data Access and Integration Services (DAIS)
• DAIS group is working on a common data management and interface
solution to data access
• OGSA requirements on data management:
– Data access service
• Uniform access regardless of heterogeneity
– Data replication
• Allows local copies of data for improved performance
– Data caching service
– Metadata catalog and services
– Schema transformation services
– Storage services
Uni Innsbruck Informatik - 29
Conceptual Model
DRM
EDRM
EDR
DR
EDR
EDRM
• Two kinds of resources
– Resources external to OGSI
– OGSI-compliant logical counterparts of the above
EDRM
Uni Innsbruck Informatik - 30
Conceptual Model
• External Data Resource Manager (EDRM) and Data Resource Manager
(DRM)
– Data management system, e.g., file system
– DRM represents EDRM
• External Data Resource (EDR) and Data Resource (DR)
– EDR is data managed by EDRM, e.g. directory in a file system
– DR is a grid service that binds to EDR
– DR is the contact point to data and exposes metadata
Uni Innsbruck Informatik - 31
Why Grid Security is hard
• Resources may be valuable and the problems sensitive
• Resources often located in distinct administrative domains
– Each resource has its own policies and procedures
• Set of resources used by a single computation may be large,
dynamic, and unpredictable
– Not just client/server
– Requires delegation?
• Security must be broadly available and applicable
– Standard, well-tested, well-understood protocols
– Integrated with wide variety of tools
Uni Innsbruck Informatik - 32
Grid Security Requirements
• User view
1. Easy to use
2. Single sign-on
3. Run applications
• ftp,ssh,Web,…
4. User based trust model
5. Proxies/agents (delegation)
• Resource owner view
1. Specify local access control
2. Auditing, accounting, etc.
3. Integration w/ local system
• Kerberos, AFS, license
mgr.
4. Protection from compromised
resources
• Developer view
1. API/SDK with authentication,
flexible message protection,
2. Flexible communication,
delegation, ...
3. Direct calls to various security
functions
4. Security integrated into
higher-level SDKs
• How to meet all these
requirements?
Uni Innsbruck Informatik - 33
Grid Security Infrastructure (GSI)
• Extensions to standard protocols & APIs
– Standards: SSL/TLS, X.509 & CA, GSS-API
– Extensions for single sign-on and delegation
• Globus Toolkit reference implementation of GSI
– SSLeay/OpenSSL + GSS-API + SSO/delegation
– Tools and services to interface to local security
– Tools for credential management
• Login, logout, etc.
• Smartcards
• MyProxy: Web portal login and delegation
• K5cert: Automatic X.509 certificate creation
Uni Innsbruck Informatik - 34
Community Authorization Service
• Question: How does a large community grant its users access to a
large set of resources?
– Should minimize burden on both the users and resource providers
• Community Authorization Service (CAS)
–
–
–
–
–
Community negotiates access to resources
Resource outsources some authorization to CAS
CAS handles user registration, group membership…
User who wants access to resource asks CAS for a capability credential
Resources can also do local access control
Uni Innsbruck Informatik - 35
Security Summary
• GSI successfully addresses wide variety of Grid security issues
• Broad acceptance, deployment, integration with tools
• Standardization ongoing in OGF
• Community Authorization Service to address community-based
allocation of resources
– Continuing development
Uni Innsbruck Informatik - 36
Architectural evolution of the Grid
• OGSI / OGSA: Open Grid Service Infrastructure / Architecture
– OGSI failed: too complex, not compliant with Web Service standards
– There are different (standards compliant) ways to achieve the same
• Move to WSRF is a general move towards service orientation (SOA)
GT3
GT4
Source: Globus presentation by Ian Foster
Uni Innsbruck Informatik - 37
Resource Virtualisation
• Handle heterogeneous and low-level resources in a uniform and highlevel manner using Web technologies
Service
Orientation
Resource
Orientation
Uni Innsbruck Informatik - 38
Compute Resource Virtualisation
Web Service-based
GRAM
PBS
Posix Fork
ssh
Uni Innsbruck Informatik - 39
Service Oriented Architecture
Service
Registry
UDDI
Register
Find
Service
Consumer
Client
UDDI
WSDL
Service
Contract
Bind
SOAP
Service
Provider
Service
Uni Innsbruck Informatik - 40
Web Services Standards
UDDI
Registry
points to description
describes
service
finds
service
Service
Consumer
WSDL
SOAP
communicates with
XML messages
Web
Service
Uni Innsbruck Informatik - 41
Web Services and the Grid
• Standards are important
–
–
–
–
–
Stability
Interoperability
Tool support
Implementation re-use
Web standards are present world wide
• But very few standards in Web Services
– Young technology
– SOAP, WSDL, UDDI
• Are these three standards enough to build a service-oriented Grid
infrastructure?
Uni Innsbruck Informatik - 42
Business Applications
• Web services are designed for business applications
• Short lifetime
– State is not really needed
• Reliability is first priority
– Statelessness ensures easy recovery upon crashes
– Transaction-based
• Persistent
– Never shut-down
– E.g. Amazon, google
– Lifetime is not an issue
Uni Innsbruck Informatik - 43
Grid Resources
• Jobs are typical example of Grid resources
– One implementation
– Multiple instances that start and end
• Resources are transient
– Limited lifetime (start and end times, not persistent)
– Web Services implementations support multiple and shared instances,
but this is not portable
• Resources have state
–
–
–
–
Queued
Running
Terminated
Failed
Uni Innsbruck Informatik - 44
Web Services Lifetime Management
• Web services are persistent
• Lifetime management supported by different Web services
implementations, but
– This is not standardised
– Implementation specific
– No portablility and no interoperability
• Sample Web services lifecycle events
– Deployment, Creation, Execution, Destruction, Undeployment, Failure
– All have different names and APIs
– Axis  WSDP  ETTK  Systinet  …
Uni Innsbruck Informatik - 45
Web Services State
• A Web service is a remote procedure call over the Internet using XML
messages
– No words about state
• Stateless service implements message exchanges with no access or
use of information not contained in the input message
– Cannot remember information
– Serve don’t care requests
– E.g., compress and uncompress a file
• Requirements
– Manage internal data and attributes across multiple
• Invocations
• Clients
• Other dependent services
Uni Innsbruck Informatik - 46
Stateless Web Service Invocation
Uni Innsbruck Informatik - 47
Stateful Web Service Invocation
Uni Innsbruck Informatik - 48
Web Services Resource Framework
Domain-Specific Applications / Services
OGF
Job Management
WS-GRAM
Data Services
GridFTP
OGSA Core Services
OASIS
W3C
WS-Resource Framework
Web Services
Uni Innsbruck Informatik - 49
WSRF Scope
• Models stateful resources
– Using existing Web services standards (unmodified SOAP & WSDL)
– Defines new thin standards
WS-Resource
Properties
Modeling
Stateful
Resources with
Web Services
WSRenewable
References
Uni Innsbruck Informatik - 50
Web Service Standards
Service
Composition
Quality of
Experience
(QoX)
WS-Reliable Messaging
WS-Security
WS-Resource Properties
Description
XSD
Messaging
XML
Transports
HTTP/HTTPS
WSDL
SOAP
BPEL4WS
WS-Notification
WS-Service Group
WS-Transaction
WS-Resource Lifetime
WS-Base Faults
WS-Policy
WS-Addressing
SMTP
WS-Metadata Exchange
WS-Renewable References
RMI / IIOP
JMS
Uni Innsbruck Informatik - 51
WSRF Standards
• WS-Addressing
– Reference and identification of stateful resources in a Web services
context
• WS-ResourceProperties
– Modeling of state as an XML document.
– Accessing state as WSDL defined interfaces
• WS-ResourceLifetime
– Management of leases on resource access
– Create, destroy, expire
• WS-ServiceGroups
– Creating and managing aggregations of Web services
• WS-BaseFaults
– Baseline for extensible fault framework
– Ability to reproduce exception hierarchies, as in Java
Uni Innsbruck Informatik - 52
Problem: performance demand
vs. large stacks
Grid apps
Middleware
DoD
WS-RF
SOAP
HTTP
TCP
IP
Source: http://img.dell.com/images/global/topics/power/ps1q02-broadcom1.gif
Uni Innsbruck Informatik - 53
Layering inefficiency example:
loss of “connection“ semantics
Can‘t reuse
connections!
Breaking the chain
Stateful Grid Service
Stateless
Doesn‘t care,
can do both
Stateless
Connection
state
Stateless
HTTP 1.0
TCP
IP
Web Service
SOAP
HTTP 1.1
Reuses
connections
Can‘t reuse
connections!
WS-RF
Could reuse
connections,
but doesn‘t!
Connection
state
Uni Innsbruck Informatik - 54
Research towards the power outlet
Automatic parallelization
Uni Innsbruck Informatik - 55
Current SoA
• Standards are only specified when mechanisms are known to work
– Globus only includes such working elements
• Lots of important features missing
• Practical issues with existing middlewares
– Submitting a Globus job is very slow (Austrian Grid: approx. 20 seconds)
 significant granularity limit for parallelization!
– Globus is a huge piece of software
• Currently, some confusion about right location of features
– On top of middleware? (research on top of Globus)
– In middleware? (other Middleware projects)
– In the OS? (XtreemOS)
 Upcoming slides concern mechanisms which are mostly on top
and partially within middleware
Uni Innsbruck Informatik - 56
Automatic parallelization
• Has been addressed in the past
• Microcode parallelism (pipelining in CPU)
– Relatively easy: simple dependencies
• Instruction level parallelism
– More complex dependencies
– Can automatically be analyzed by compiler
• Reordering, loop unrolling, ..
for (i=1; i<100; i++)
a[i] = a[i] + b[i] * c[i];
(Intel C++ compiler)
Source: WIKIPEDIA
/* Thread 1 */
for (i=1; i<50; i++)
a[i] = a[i] + b[i] * c[i];
/* Thread 2 */
for (i=50; i<100; i++)
a[i] = a[i] + b[i] * c[i];
Uni Innsbruck Informatik - 57
Automatic parallelization /2
• Parallel Computing: complete applications parallelized
– Very complex dependencies
– Decomposition methods + mapping of tasks onto processors: usually not
automatic (depends on problem and interconnection network)
– Algorithm specific methods developed (matrix operations, sorting, ..)
– Some parts can be automatized, but not everything
 explicit parallelism (OpenMP) and even allocation (MPI) quite popular
• Some research efforts on half-automatic
parallelization (“manual“ aid)
– Programmer knows about problem-specific
locality needs (interacting code elements)
– Examples:
• Java extensions such as JavaSymphony
[Thomas Fahringer, Alexandru Jugravu]
• HPF+ HALO concept
[Siegfried Benkner]
Source: http://www.par.univie.ac.at/~sigi/aurora/project2/
Uni Innsbruck Informatik - 58
Automatic parallelization in Grids
• Scheduling; important issue for “power outlet“ goal
– Automatic distribution of tasks and inter-task data transmissions = scheduling
• Grid scheduling encompasses
– Resource Discovery
• Authorization Filtering, Application Requirement Definition,
Minimal Requirement Filtering
– System Selection
• Dynamic Information Gathering
• System Selection
– Job Execution
• (optional) Advance Reservation
• Job Submission
• Preparation Tasks
• Monitoring Progress
• Job Completion
• Clean-up Tasks
• So far, most scheduling efforts consider embarassingly parallel
applications - typically parameter sweeps (no dependencies)
Uni Innsbruck Informatik - 59
The Scheduling Problem
• Hypothesis
– N tasks
– M machines
– Execution time of each task on each machine
• Goal
– Find the optimum mapping of the N tasks on the M machines that
produces the fastest completion time
• Solution
– Finding the optimum requires to evaluate all search space points
– There are MN possible mapping
– This is impossible in real time
• NP-complete optimization problem
– Cannot be solved in polynomial time
– Requires heuristics to find approximate “good” solutions
Uni Innsbruck Informatik - 60
Example algorithms
• Minimum Execution Time (MET)
– Schedule a task on the machine that executes it fastest
– Do not consider queuing time
– Fast machines will be overloaded and slow machines will be idle
• Minimum Completion Time (MCT)
– The earliest completion time on a machine
– I/O transfer time + MET + machine ready time
Note: needs
predictions of this
information!
• Switching Algorithm (SA)
– Use MET and MCT heuristics in a cyclic fashion
– Use MET at the expense of load balance until a given threshold
– Then use MCT to smooth the load across the machines
Uni Innsbruck Informatik - 61
Gantt Chart
• Scheduling chart
• How tasks are mapped to
machines over time
• A timetable of machine usage
Gantt Chart
T2
Machine
M1
M2
T1
M3
Timeline
T4
T6
T3
T8
T5
T7
Uni Innsbruck Informatik - 62
Condor case study
• Application name, parameters, etc. + requirements specified in ClassAds
– “Requirements = Memory >= 256 && Disk > 10000; Rank = (KFLOPS*10000) + Memory“
 only use computers which match requirements (else error), order them by rank
– Explicit support for parameter sweeps: loop variables
• Resources registered with description; “central manager“ checks pool against
application ClassAds (“matchmaking“) every 5 minutes, assigns jobs
• Checkpointing in Condor: need to recompile applications,
link with special library (redirects syscalls)
– Save current state for fault tolerance or vacating jobs
• Because preempted by higher priority job, machine busy, or user demands it
• Used in Grid Application Development Software Project (GrADS) for
rescheduling (dynamic scheduling) and metascheduling (negotiation between
multiple applications); ClassAds language extended
– e.g., aggregation functions such as Max, Min, Sum
Uni Innsbruck Informatik - 63
Grid workflow applications
• Dependencies between applications (or large parts of applications)
typically specified in Directed Acyclic Graph (DAG)
– Condor: DAG manager (DAGMan) uses .dag file for simple dependencies
– “Do not run job ‘B’ until job ‘A’ has completed successfully”
• DAGMan scheduling: for all tasks do...
– Find task with earliest starting time
– Allocate it to processor with Earlierst Finish Time
– Remove task from list
• GriPhyN (Grid Physics Network) facilitates workflow design
with “Pegasus“ (Planning for Execution in Grids) framework
– Specification of abstract workflow: identify application components,
formulate workflow specifying the execution order, using
logical names for components and files
– Automatic generation of concrete wf (map components to resources)
– Concrete workflow submitted to Condor-G/DAGMan
Uni Innsbruck Informatik - 64
Grid Workflow Applications /2
Grid Workflows
based on activities
Dynamic Instantiation
Service Orchestration
Quality of Service
Web Services
Service Description
Discovery, Selection
Deployment, Invocation
Components
Descriptor Generation
Component Interaction
Optimization, Adaptation
Legacy codes
OMP
MPI
MPI
HPF
OMP
HPF
MPI
Java
Legacy Codes
Source: presentation by Thomas Fahringer
• Components are built, Web (Grid) Services are defined,
Activities are specified
• Several projects (e.g. K-WF Grid) and systems (e.g. ASKALON) exist
• Most applications have simple workflows
– E.g. Montage: dissects space image, distributes processing, merges results
Uni Innsbruck Informatik - 65
Source: http://www.dps.uibk.ac.at/projects/teuta/
Uni Innsbruck Informatik - 66
Scheduling example: HEFT algorithm
Step 1 - task prioritizing
Task P1
P2
Task
Rank calculation
1
1
5
0.5
1
2
0.5
1.5
4
2+0.5+7=9.5
3
2
2
3
2+0.5+2=4.5
4
1.5
2.5
2
5
0.5
0.5
1+max(0.5+2+2+1,
0.5+7+2+2)=12.5
1
1
3
16.5
2
1+max(12.5+3, 0.5+7+2+4)=16.5
2
12.5
4
9.5
3
4.5
5
0.5
• Rank of a task: longest “distance“
to the end
(Mean processing + transfer costs)
• Tasks are sorted by decreasing
rank order
4
2
1
1
Task Rank
1
1
4
2
3
7
2
5
0.5
2
Uni Innsbruck Informatik - 67
Step 2 - processor selection (EFT)
P1
1
0
1
1
4
2
1
4
2
2
4
3
7
2
5
FT(T2, P1) = 1+0.5=1.5
FT(T2, P2) = 1+3+1.5=5.5
2
2
2
FT(T1, P1) = 1
FT(T1, P2) = 1
1
3
1
P2
3
3
3
4
Task
P1
P2
1
1
1
2
0.5
1.5
4
1.5
2.5
3
2
2
5
0.5
0.5
FT(T4, P1) = 1.5+1.5=3
FT(T4, P2) = 1.5+2+2.5=6
FT(T5, P1) = 4.5+2+0.5=7
FT(T5, P2) = 3+7+0.5=10.5
5
6
7
FT(T3, P1) = 3+2=5
FT(T3, P2) = 1.5+1+2=4.5
5
Processor idle + task ready
Data transfer
Task processing
Uni Innsbruck Informatik - 68
HEFT discussion
• HEFT is not a solution, just a heuristic
– problem is known to be NP-complete
• Outperformed competitors (DAGMan
scheduling, genetic algorithm) in ASKALON
real-life experiments
– Still, many improvements possible
e.g., other functions than mean, and
extension for rescheduling suggested
• Heterogeneous network
capacities and traffic
interactions ignored
Not detected!
Tasks = {T1, T2, T3, T4}
Resources = {R1, R2, R3, R4}
Data transfers = {D1, D2, D3, D4}
Uni Innsbruck Informatik - 69
How far have we come?
• Remember: systems on last slides are still research
– Not standardized, not part of reference middleware implementations
– Right place (OS / Middleware / App) for some functions still undecided
• A lot is still manual
– Basically three choices for deploying an application on the Grid
• Simply use it if it‘s a parameter sweep
• “Gridify“ it (rewrite using customized allocation - e.g. MPICH-G2)
• Utilize a workflow tool
• Convergence between P2P systems and Grids has only just begun
• Several issues and possible improvements
– Large number of layers are a mismatch for high performance demands
– Network usage simplistic, no customized mechanisms
Uni Innsbruck Informatik - 70
How far have we come? /2
• Strangely, parallel processing background seems to be ignored
– E.g., work on task-processor mapping + P2P overlays such as hypercube = ?
Manual aid
needed
Arbitrary parallel
applications
Complexity
(Dependencies)
Workflow
applications
Instruction level
parallelism
Parameter
sweeps
Microcode
Granularity
Untouched!
Uni Innsbruck Informatik - 71
Grid InterNetworking
An example of ongoing research
Note: “InterNetworking“ refers to networking assuming
infrastructure = public Internet
(some different work was done for large dedicated networks
(e.g. User Controlled LightPath, UCLP) )
Uni Innsbruck Informatik - 72
Research gap: Grid-specific
network enhancements
Enriched with customised
network mechanisms
Original Internet technology
Bringing the Grid to its full potential !
Today‘s Grid
applications
Driving a racing car
on a public road
Traditional Internet
applications
(web browser, ftp, ..)
EC-GIN
EC-GIN enabled
Grid applications
Applications with special
network properties and
requirements
Real-time multimedia
applications (VoIP,
video conference, ..)
Uni Innsbruck Informatik - 73
Grid-network peculiarities
• Special behavior
– Predictable traffic pattern - this is totally new to the Internet!
– Web: users create traffic
– FTP download: starts ... ends
– Streaming video: either CBR or depends on content! (head movement, ..)
• Could be exploited by congestion control mechanisms
– Distinction: Bulk data transfer (e.g. GridFTP) vs. control messages (e.g. SOAP)
– File transfers are often “pushed“ and not “pulled“
– Distributed System which is active for a while
• overlay based network enhancements possible
– Multicast
– P2P paradigm: “do work for others for the sake of enhancing the whole system (in
your own interest)“ can be applied - e.g. act as a PEP, ...
• sophisticated network measurements possible
– can exploit longevity and distributed infrastructure
• Special requirements
– file transfer delay predictions
• note: useless without knowing about shared bottlenecks
– QoS, but for file transfers only (“advance reservation“)
Uni Innsbruck Informatik - 74
What is EC-GIN?
• European project: Europe-China Grid InterNetworking
– STREP in IST FP6 Call 6
– 2.2 MEuro, 11 partners (7 Europe + 4 China)
– Networkers developing mechanisms for Grids
Uni Innsbruck Informatik - 75
Research Challenges
• Research Challenges:
– How to model Grid traffic?
• Much is known about web traffic (e.g. self-similarity) - but the Grid is different!
– How to simulate a Grid-network?
• Necessary for checking various environment conditions
• May require traffic model (above)
• Currently, Grid-Sim / Net-Sim are two separate worlds
(different goals, assumptions, tools, people)
– How to specify network requirements?
• Explicit or implicit, guaranteed or “elastic“, various possible levels of granularity
– How to align network and Grid economics?
• Combined usage based pricing for various resources including the network
– What P2P methods are suitable for the Grid?
• What is the right means for storing short-lived performance data?
Uni Innsbruck Informatik - 76
NWS: The Network Weather Service
• Most common tool for performance prediction
– Important for making good scheduling decisions
• Distributed system consisting of
–
–
–
–
Name Server (boring)
Sensor - actual measurement instance, regularly stores values in......
Persistent State
Forecaster (calculations based on data in Persistent State)
• Interesting parts:
Duration of a
long TCP transfer
RTT of a small
message
– Sensor
Measured resources: availableCpu, bandwidthTcp, connectTimeTcp, currentCpu,
freeDisk, freeMemory, latencyTcp
– Forecaster
Apply different models for prediction, compare with actual measurement data,
choose best match
Uni Innsbruck Informatik - 77
NWS critique
• Architecture (splitting into sensors, forecaster etc.) seems reasonable;
open source  consider integrating new work in NWS
• Sensor
– active measurements even though non-intrusiveness was an important design goal
- does not passively monitor TCP (i.e. ignores available data)
– strange methodology:
ssthresh (Large message throughput) “Empirically, we have observed that a message size
of 64K bytes (..) yields meaningful results“
– ignores packet size ( = measurement granularity ) and path characteristics
– trivial method - much more sophisticated methods available
– point-to-point measurements: distributed infrastructure not taken into account
• Forecaster
– relies on these weird measurements, where we don‘t know much about the
distribution (but we do know some things about net traffic IFF properly measured)
– uses quite trivial models (but they may in fact suffice...)
Uni Innsbruck Informatik - 78
NWS measurements (Austrian Grid)
Muhammad Murtaza Yousaf, Michael Welzl, Malik Muhammad Junaid: "Fog in the Network Weather Service: A case
for novel Approaches", MetroGrid workshop, co-located with GridNets 2007, Lyon, France, 19 October 2007.
• Salzburg-Linz (left): more than 20 MB needed to saturate link
• Within Innsbruck (right), Gigabit link: around 100 MB needed
• NWS supposedly designed to be non-intrusive...
Uni Innsbruck Informatik - 79
The impact of shared bottlenecks
• Example problem:
– C allocates tasks to A and B (CPU, memory available); both send results to C
– B hinders A - task of B should have been kept at C!
• Path changes are rare - thus, possible to detect potential problem in advance
– generate test messages from A, B to C - identify signature from B in A‘s traffic
• Another issue in this scenario: how valid is a prediction that A obtains if a
measurement / prediction system does not know about the shared bottleneck?
Uni Innsbruck Informatik - 80
EC-GIN Large File Transfer Scenario (LFTS)
Multipath file transfer
Multipath file transfer not beneficial
(AB + ACB) beneficial
due to shared bottleneck
Questions: when does this make sense, how to expose this functionality,
how to authenticate and authorize…?
Uni Innsbruck Informatik - 81
Shared bottleneck detection with SVD
Muhammad Murtaza Yousaf, Michael Welzl, Bulent Yener (2007); under submission
• Input: end-to-end forward delays of multiple flows
• Analysis
– Multivariate Analysis Method SVD (Singular Value Decomposition)
• Matrix operation which yields clustered values for correlating flows
– Calculate differences between values, consider changes between clusters as
outliers, apply simple outlier detection method
• Output: clusters of flows which share a bottleneck
– Very precise, easy to calculate, can cluster multiple flows at the same time (other
work uses pairwise cross correlation)
Uni Innsbruck Informatik - 82
Extending the Padhye equation to N flows
Dragana Damjanovic, Werner Heiss, Michael Welzl: "An Extension of the TCP Steady-State Throughput Equation for
Parallel TCP Flows", poster, ACM SIGCOMM 2007, 27-31 August, Kyoto, Japan.
• Fair amount of work done, but so far, no (easily usable) approximation exists
which also takes loss into account
• Useful in a Grid for multiple reasons:
– Prediction of GridFTP throughput (multiple TCP flows)
– Protocol with tunable aggression (MulTFRC)
• Because, if a Grid application uses two flows which share a bottleneck, flow 1
may be 3.7 times as important to it as flow 2 (e.g. if flow 2 is from replication)
• For fairness: if we take a break, we may earn “aggression points“
0  12bpE[W ]4  (4bp  16np  12bnp) E[W ]3 
(6bnp  32np  5bn 2 p  bp  32n ) E[W ]2 
(4bn 3 p  4bnp) E[W ]  (4bn 4 p  8bn 3 p  4bn 2 p )
we get E[W]
depending on n, p
and b
Uni Innsbruck Informatik - 83
Other current EC-GIN work
• Scheduling of advance reservations for bulk data transfers (using
high-speed congestion control mechanisms)
– another NP-complete scheduling problem
• Work package dedicated to modeling (and ns-2 simulation code)
– currently collecting measurements from everywhere...
– Grid-Net scheduling simulator using ns-2 already available
• High-speed SOAP engine
– Tackling multiple parts of the SOAP stack: asynchronous I/O, pull based
XML parsing, data mapping template
• Extending P2P incentive + security mechanisms for the Grid
Uni Innsbruck Informatik - 84
Conclusion
• Grids exist, they are working and growing
– and complex
– only briefly touched upon several aspects of the Grid here
• At the same time, still a lot of room for enhancement
– e.g. Grid InterNetworking is a research gap
• We are still quite far away from the power outlet goal
• Recently, the term “Grid“ has become unpopular
– replaced with Service Oriented [whatever] (SO-?)
– New EC vision: “Service Oriented Knowledge Utility“ (SOKU)
• Web Services + Semantics + Dynamics …
• Terms may change, but the user demand and researcher / developer interest
will not go away; Grid will stay, but evolution still ongoing