Studied Grid Technologies - Center for Systems and Software
The Anatomy and Physiology
of the Grid Revisited
Nenad Medvidovic
USC-CSSE and Computer Science Department
University of Southern California
http://csse.usc.edu/~neno/
Collaborative work with
Joshua Garcia, Ivo Krka, Chris Mattmann, and Daniel Popescu
What is the grid?
• A distributed systems technology that enables the
sharing of resources across organizations scalably,
efficiently, reliably, and securely
• Analogous to the electric grid
Why Study the Grid?
• A highly successful technology
• Deficiencies in the existing guidance for building
grids (more to come)
• Grids are not easy to build
– See CERN’s Large Hadron Collider
• Their architecture was published very early
– “anatomy” and “physiology”
• Yet
“What is (not) a grid?”
is still a subject of debate
The Architectural Perspective
• Grids are large, complex systems
– Thousands of nodes or more
– Span many agency boundaries
• Qualities of Service (QoS) are critical
– Scalability
– Security
– Performance
– Reliability ...
• Software architecture is just what the doctor
ordered
– The set of principal design decisions about a software
system [Taylor, Medvidovic, Dashofy 2009]
So, What Did We Set out to Do?
• Study the grid’s reference requirements and
architecture
• Study the architectures of existing grid
technologies
• Compare the two
• Suggest how to fix any discrepancies
– Knowing that there will likely be very few
straightforward answers
Architectural Recovery Approach
Original grid reference architecture
Some Reference Requirements
Studied Grid Technologies
Technology           Language(s)           KSLOC   # Modules
Alchemi              C# (.NET)              26.2         186
Apache Hadoop        Java, C/C++            66.5        1643
Apache HBase         Java, Ruby, Thrift     14.1         362
Condor               Java, C/C++            51.6         962
DSpace               Java                   23.4         217
Ganglia              C                      19.3          22
GLIDE                Java                      2          57
Globus 4.0 (GT 4.0)  Java, C/C++          2218.7        2522
Grid Datafarm        Java, C                51.4         220
Gridbus Broker       Java                   30.5         566
Jcgrid               Java                    6.7         150
OODT                 Java                     14         320
Pegasus              Java, C                  79         659
SciFlo               Python                 18.5         129
iRODS                Java, C/C++            84.1         163
Sun Grid Engine      Java, C/C++           265.1         572
Unicore              Java                    571        3665
Wings                Java                    8.8          97
Architecture Recovery Technique - Focus
• Establish idealized architecture and candidate
architectural style(s)
• Identify data and processing components
– Group implementation modules according to a set of
rules
• Map identified data and processing components onto
an idealized architecture
• Examine
– Source code
– Documentation
– Runtime behavior
• Tie to requirements satisfied by component
Rules of Focus
1. Group based on isolated classes
2. Group based on generalization
3. Group based on aggregation
4. Group based on composition
5. Group based on two-way association
6. Identify domain classes
7. Merge classes with a single originating domain class
association into domain class
8. Group classes along a domain class circular dependency
path
9. Group classes along a path with a start node and end
node that reference a domain class
10. Group classes along paths with the same end node, and
whose start node references the same domain class
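A minimal sketch of how one grouping rule (rule 5, two-way association) might be mechanized. It assumes associations have already been extracted as (source class, target class) pairs and reads "two-way" as "each class references the other". The function name and the class names are hypothetical, not part of Focus itself.

```python
from collections import defaultdict


def group_two_way(associations):
    """Group classes that are linked by mutual (two-way) associations.

    associations: iterable of (source_class, target_class) pairs.
    Returns a sorted list of sorted class-name lists.
    """
    parent = {}

    def find(name):
        # Union-find lookup with path halving.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    edges = set(associations)
    for src, dst in edges:
        find(src)
        find(dst)
        # Merge only when the association exists in both directions.
        if (dst, src) in edges:
            parent[find(dst)] = find(src)

    groups = defaultdict(set)
    for name in parent:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical input: Job and Scheduler reference each other,
# Scheduler references Logger in one direction only.
groups = group_two_way([
    ("Job", "Scheduler"),
    ("Scheduler", "Job"),
    ("Scheduler", "Logger"),
])
```

Here Job and Scheduler end up in one group, while Logger stays on its own because its association is one-way.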
Some Refinements to the Rules
• Domain class rules
– Class with large majority of outgoing calls
• Exclusion rules
– Class with large majority of incoming calls
– Utility classes
– Heavily passed data-structures
– Benchmarking and test classes
• Additional groupings
– By exception
– By interface
– By package if idealized architecture matches first-class
component
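The call-direction refinements can be sketched as a simple ratio test. This is an illustration only: the slides say "large majority" without giving a number, so the 0.8 threshold, the function name, and the class names are all assumptions.

```python
from collections import Counter


def classify_by_calls(calls, majority=0.8):
    """Label classes by the direction of most of their calls.

    calls: iterable of (caller_class, callee_class) pairs.
    majority: assumed cut-off for a "large majority" (not from the slides).
    """
    outgoing, incoming = Counter(), Counter()
    for caller, callee in calls:
        outgoing[caller] += 1
        incoming[callee] += 1

    labels = {}
    for cls in set(outgoing) | set(incoming):
        total = outgoing[cls] + incoming[cls]
        out_share = outgoing[cls] / total
        if out_share >= majority:
            labels[cls] = "domain"        # mostly outgoing calls
        elif 1 - out_share >= majority:
            labels[cls] = "excluded"      # mostly incoming calls
        else:
            labels[cls] = "undecided"
    return labels


# Hypothetical call graph.
labels = classify_by_calls([
    ("Manager", "Executor"),
    ("Manager", "StringUtil"),
    ("Executor", "StringUtil"),
])
```

In this toy input, Manager only makes calls, StringUtil only receives them, and Executor does both equally, so the three classes land in three different categories.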
Focus Rules for Distributed Systems
• Infer distributor connectors from idealized
architecture
• Classes with methods and names similar to
first-class components are domain classes
• Classes importing network communication
libraries are domain classes
• main() functions often identify first-class
components
• Classes deployed onto different hosts must be
grouped separately
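Two of these rules lend themselves to a quick textual check on Java sources. The sketch below is an assumption about how such a check could look, not the tooling used in the study. The list of package prefixes is illustrative and incomplete, and the helper names are hypothetical.

```python
import re

# Standard Java packages related to network communication (illustrative list).
NETWORK_PREFIXES = ("java.net", "java.rmi", "java.nio.channels", "javax.net")

IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?([\w.]+)(?:\.\*)?\s*;", re.MULTILINE
)
MAIN_RE = re.compile(r"public\s+static\s+void\s+main\s*\(")


def imports_network_library(java_source):
    """True if the source imports from one of the network packages."""
    for name in IMPORT_RE.findall(java_source):
        if any(name == p or name.startswith(p + ".") for p in NETWORK_PREFIXES):
            return True
    return False


def declares_main(java_source):
    """True if the source declares a public static void main method."""
    return MAIN_RE.search(java_source) is not None


# Hypothetical source file contents.
sample = """
package demo;
import java.net.Socket;
public class Executor {
    public static void main(String[] args) { }
}
"""
is_domain_candidate = imports_network_library(sample)
is_component_candidate = declares_main(sample)
```

A purely textual scan like this misses fully qualified names used without an import, so it would only be a first filter before manual inspection.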
Discovered discrepancies
• Empty layers
• Skipped layers
• Up-calls
• Multi-layer components
Empty Layers - Wings
Skipped Layers - Pegasus
Upcalls - Hadoop
Multi-Layer Components - iRODS
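The four discrepancy types can be stated precisely once components are mapped to layers. The sketch below assumes the five layers of the original grid reference architecture, ordered top to bottom, and a hand-made mapping. The component names and the report format are hypothetical.

```python
# Layers of the original reference architecture, top to bottom.
LAYERS = ["Application", "Collective", "Resource", "Connectivity", "Fabric"]
RANK = {name: index for index, name in enumerate(LAYERS)}


def find_discrepancies(component_layers, calls):
    """Report empty layers, multi-layer components, skipped layers, upcalls.

    component_layers: dict mapping component name to a list of layer names.
    calls: iterable of (caller_component, callee_component) pairs.
    """
    used = {layer for layers in component_layers.values() for layer in layers}
    report = {
        "empty_layers": [layer for layer in LAYERS if layer not in used],
        "multi_layer_components": sorted(
            name for name, layers in component_layers.items() if len(layers) > 1
        ),
        "skipped_layer_calls": [],
        "upcalls": [],
    }
    for caller, callee in calls:
        src, dst = component_layers[caller], component_layers[callee]
        if len(src) != 1 or len(dst) != 1:
            continue  # direction is ambiguous for multi-layer components
        gap = RANK[dst[0]] - RANK[src[0]]
        if gap > 1:
            report["skipped_layer_calls"].append((caller, callee))
        elif gap < 0:
            report["upcalls"].append((caller, callee))
    return report


# Hypothetical recovered architecture.
report = find_discrepancies(
    {
        "Submitter": ["Application"],
        "Planner": ["Collective"],
        "Agent": ["Collective", "Resource"],
        "Store": ["Fabric"],
    },
    [("Submitter", "Planner"), ("Planner", "Store"),
     ("Store", "Planner"), ("Submitter", "Agent")],
)
```

With this input the Connectivity layer is empty, Agent spans two layers, Planner to Store skips layers, and Store to Planner is an upcall.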
What about Globus?
[Figure: components recovered from Globus (for example ServiceContainer, GridContext, and WSDL2Java) mapped onto the Application, Collective, Resource, Connectivity, and Fabric layers. Callouts in the figure read "Upcall", "Two layer boundary AND upcall", and "Couldn’t determine right “layer”".]
Discrepancies Found
Revised Grid Architecture
• The connectivity layer is eliminated
• Explicitly addressing deployment view
• Subsystem types rather than layer-oriented
• Four architectural styles comprise the grid
– Client/server
– Peer-to-peer
– Layered
– Event-based
• An improved classification of grid technologies
Revised Grid Reference Architecture
Grid Styles – C/S
• Application components are clients
to Collective components
– e.g., application components query
for resource component locations
from collective components
• Application components are clients
to Resource components
– e.g., direct job submission from
application components to resource
components
• Resource components can act as
clients to Collective components
– e.g., resource components may obtain
locations of other resource
components through collective
components
Grid Styles – p2p
• Resource components are
peers
– e.g., Grid Datafarm Filesystem
Daemon (gfsd) instance makes
requests for file data from
other gfsds
• Collective components are
peers
– e.g., iRODS agents
communicate with each other
to exchange data to create
replicas
Grid Styles – Event-Based
• Resource components notify
Collective components that
monitor them
– e.g., executors send heartbeats
to managers
Grid Architectural Styles – Layered
• Collective or Resource
components request
services from Fabric
components
– e.g., iRODS agent accesses a
DBMS with metadata
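The interactions named on the four style slides can be collected into one lookup table. This is one reading of the bullets, not a complete specification of the revised architecture. The table and function names are hypothetical.

```python
# (source subsystem, target subsystem) -> styles under which the
# source may interact with the target, as listed on the style slides.
ALLOWED = {
    ("Application", "Collective"): {"client/server"},
    ("Application", "Resource"): {"client/server"},
    ("Resource", "Collective"): {"client/server", "event-based"},
    ("Resource", "Resource"): {"peer-to-peer"},
    ("Collective", "Collective"): {"peer-to-peer"},
    ("Collective", "Fabric"): {"layered"},
    ("Resource", "Fabric"): {"layered"},
}


def styles_for(source, target):
    """Return the styles that cover an interaction, or an empty set."""
    return ALLOWED.get((source, target), set())


# A Resource component may query a Collective component (client/server)
# or notify it, e.g. with heartbeats (event-based).
resource_to_collective = styles_for("Resource", "Collective")
# No slide describes Fabric components calling Application components.
fabric_to_application = styles_for("Fabric", "Application")
```

An interaction that maps to an empty set is a candidate violation to inspect, not necessarily an error, since the slides list examples rather than an exhaustive rule set.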
Grid Technology Classification
• Computational grid
– Implementing all
Collective components
– e.g., Alchemi and Sun
Grid Engine
Grid Technology Classification
• Data grid
– Job scheduling
components in Collective
subsystem are not
required
– e.g., Grid Datafarm and
Hadoop
Grid Technology Classification
• Hybrid
– Resource components
providing services either
to perform operations
on a storage repository
or to execute a job or
task
– e.g., Gridbus Broker and
iRODS
[Figure labels: File Resource, Computational Resource]
Correcting Violations in
the Reference Architecture
• Why were there originally so many upcalls?
– Legitimate client-server and event-based communication
• Why so many skipped layer calls?
– The Fabric layer was at the wrong level of abstraction
– Mostly utility classes that should be abstracted away
• Why so many multi-layer components?
– Connectivity layer was at the wrong level of abstraction
– Not a layer, but utility libraries to enable connector functionality
– Also accounts for skipped layer calls
• Benefit of the deployment view
– Essential for distributed systems
– Helped to identify that the Fabric layer was not abstracted
properly
Where Are We Currently?
• There are remaining violations
– Are they legitimate or a result of an improperly recast
reference architecture?
• Original Focus is not ideal for recovering systems
of these types
– Distributed systems realized by a middleware
• A more automated approach that combines static
and dynamic analysis would be preferable
• Use the recast reference architecture to build a
new grid
• What are the overarching grid principles?
Evolving Grid Principles
1. A grid is a collection of logical resources (computing and data) distributed
across a wide-area network of physical resources (hosts).
2. In a single grid-based application, the logical resources are owned by a single
agency, while the physical resources are owned by multiple agencies.
3. All resources in a grid are described using a common meta-resource language.
4. Atomic-level logical resources are defined independently of the atomic-level
physical resources.
5. The allocation of the atomic-level logical resources to the atomic-level
physical resources can be N:M.
6. All computation in a grid is initiated by a client, which is a physical resource.
The client sends the logical resources to the servers, which are also physical
resources. A server can, in turn, delegate the requested computation to other
physical resources.
7. All agencies that own physical resources in a grid must be able to specify
policies that enforce the manner in and extent to which their physical
resources can be used in grid applications.