Peer-to-Peer Approaches to Grid Resource Discovery

Ann Chervenak
University of Southern California Information Sciences Institute
Joint work with Shishir Bharathi, Min Cai
Resource Discovery In Grids
• Applications running in wide area distributed
environments or Grids need information about
resources and services
• Resource information has an impact on
– Scheduling decisions
– Replica selection
– Planning for workflow execution (e.g., Pegasus)
• Resource discovery services need to be
– Highly available, fault tolerant, reliable
– Highly scalable
– Able to provide flexible query interfaces
• Typical Grid resource discovery services are organized as hierarchies of index services
• Applying P2P techniques offers the promise of self-organization, self-healing, and improved scalability
Outline
• Resource Discovery in Grids: current approaches
• Peer-to-Peer Resource Discovery
• Applying P2P techniques to Grid Resource
Discovery
– Unstructured P2P Information Service
• Joint work with Shishir Bharathi
– Structured P2P Replica Location Service
• Joint work with Min Cai
• Summary and Future Work
Typical Grid Resource Discovery
Currently, most services for resource discovery in Grids are
query-based index services
• One or more indexes aggregate information about
resources
• Often distributed using a hierarchical structure
• Service provides a front end query interface
• Database back end stores resource information
• Service responds to queries by identifying resources that
match desired properties
• Scalable
– Holds large amounts of resource information
– Supports high query rates
• Often specialized for particular types of resource information and/or queries
– Specialized query APIs, resource schemas, etc.
Index-based Discovery Services
• Globus Monitoring and Discovery System (MDS)
– Hierarchical, distributed Index Service
– Aggregates information about resources (CPUs, storage
systems, etc.)
– Answers queries for resources with specified properties
– Typically close to resources, whose information may
change frequently (e.g., index co-located with a cluster)
Index-based Discovery Services (Cont.)
• Globus Replica Location Service (RLS)
– Hierarchical, distributed index
– Provides mappings from logical names for data to physical locations of replicas
[Figure: Replica Location Index (RLI) nodes aggregating information from Local Replica Catalogs (LRCs)]
• Metadata Catalog Service (MCS)
– Centralized database of metadata attributes associated with data items
– Answers queries for resources with specified metadata characteristics
• Storage Resource Broker MCAT
– Centralized (or partitioned) catalog with metadata, replica location and resource information
Challenges for Resource Discovery Services
• Grids are growing larger
– Increasing number of resources and resource discovery
service instances
• Organization of resource discovery services is
challenging
– Creation/maintenance of efficient hierarchies
– Avoiding hotspots, eliminating update cycles
• Much of the configuration and maintenance of
these services is done manually
– Few capabilities for self-configuration or self-healing
– Limits scalability
– Makes services complex to deploy and maintain
• Goal: Use peer-to-peer techniques to make services more self-configuring, reliable and scalable
Peer-to-Peer Systems
• Service instances create an overlay
network
• Queries and responses forwarded/routed
in the overlay
• Structured overlay – Chord, Pastry, etc.
– Distributed Hash Table (DHT) based
– Effective when storing/retrieving <key, value>
pairs
– Strong bounds on performance
• Unstructured overlay – Gnutella, KaZaA
– Effective when querying on attributes
– Not DHT based
– Flooding algorithms
• Hybrid approaches also possible
• Good scalability, self-organization,
reliability, self-healing
Structured P2P Networks
• Maintain a structured overlay network among peers and use
message routing
• Basic functionality: lookup(key), which returns the identity of the node storing the object with that key
• Often based on Distributed Hash Table (DHT)
• Objects are associated with a key that can be produced by
hashing the object name
• Nodes have identifiers that share the same space as keys
• Each node is responsible for storing a range of keys and
corresponding objects
• Nodes maintain an overlay network, with each node having
several other nodes as neighbors
• When a lookup (key) request is issued from one node, the
lookup message is routed through the overlay network to the
node responsible for the key
Structured P2P Networks (cont.)
• Different DHT systems construct a variety of overlay networks and employ different routing algorithms
• They can guarantee to finish a lookup operation in O(log N) or O(d·N^(1/d)) hops
• Each node only maintains the information of O(log N) or d neighbors for an N-node network (where d is the dimension of the hypercube organization of the network)
• So DHT systems provide good scalability as well as fault tolerance
• DHT systems include Pastry, Chord, CAN and Koorde
Example: Chord Structured P2P System
• Chord algorithm proposed by Stoica, et al.
• Chord uses a one-dimensional circular identifier space modulo 2^m for both node identifiers and object keys
• Every node in Chord is assigned a unique m-bit identifier by hashing its IP address and port number
• All nodes self-organize into a ring topology based on their node
identifiers in the circular space
• Each object is also assigned a unique m-bit identifier called its
object key
• Object keys are assigned to nodes by using consistent hashing
– Key k is assigned to the first node whose identifier is equal to or
follows the identifier of k in the circular space
– This node stores the object with key k and is called its successor
node
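
A rough sketch (not from the talk) of these two steps in Python; the small identifier space m = 6 matches the example ring on the next slide:

    # Minimal sketch of Chord-style identifier assignment and key placement.
    import hashlib

    M = 6  # identifier bits; the example ring below uses identifiers < 2^6

    def chord_id(name, m=M):
        # Hash a name (e.g. "IP:port" or an object name) to an m-bit identifier
        digest = hashlib.sha1(name.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** m)

    def successor(key, node_ids):
        # First node whose identifier is equal to or follows the key in the
        # circular space; this node stores the object with that key
        ring = sorted(node_ids)
        return next((n for n in ring if n >= key), ring[0])  # wrap around

    # With nodes [4, 8, 20, 24, 40, 48, 54, 60] as on the next slide:
    # successor(18, ...) == 20 and successor(52, ...) == 54
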
An Example of Chord Network
[Figure: Chord ring with nodes N4, N8, N20, N24, N40, N48, N54 and N60; Key18 is stored at its successor N20, Key31 at N40, and Key52 at N54]
Chord Structured P2P System (cont.)
• Each Chord node maintains two sets of neighbors: its
successors and its fingers
• Successor nodes immediately follow the node in the
identifier space
• Finger nodes are spaced exponentially around the identifier
space
• Each node has a constant number of successors and at most
m fingers
• The i-th finger for the node with identity n is the first node that succeeds n by at least 2^(i-1) on the identifier circle, where 1 <= i <= m
• The first finger (i = 1) is the immediate successor of n
An Example of Chord Network
[Figure: Finger table of node N4 on the example ring: N4+1 => N8, N4+2 => N8, N4+4 => N8, N4+8 => N20, N4+16 => N20, N4+32 => N40]
Chord (cont.)
• When node n wants to look up the object with key k, it will route a lookup request to the successor node of key k
• If the successor node is far away from n, node n forwards
the request to the finger node whose identifier most
immediately precedes the successor node of key k
• By repeating this process, the request gets closer and closer
to the successor node
• Eventually, the successor node receives the lookup request
for the object with key k, finds the object locally and sends
the result back to node n
• Each hop from one node to the next node covers at least
half the identifier space (clockwise) between that node and
the successor node of key k
• Number of routing hops for a lookup is O(log N) for a Chord
network with N nodes
– Insertion time also O(log N)
• Each node maintains pointers to O(log N) neighbors
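
A compact sketch of finger construction and finger-based routing (illustrative only; a global view of the ring stands in for real inter-node messages):

    M = 6
    RING = [4, 8, 20, 24, 40, 48, 54, 60]  # the example ring from these slides

    def successor(key):
        return next((n for n in sorted(RING) if n >= key), min(RING))

    def finger_table(n):
        # i-th finger: first node succeeding n by at least 2^(i-1), 1 <= i <= M
        return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

    def in_open(x, a, b):
        # True if x lies strictly inside the clockwise interval (a, b)
        return (a < x < b) if a < b else (x > a or x < b)

    def lookup(start, key):
        # Repeatedly hop to the finger most immediately preceding the
        # successor of the key; expected O(log N) hops
        path, node = [start], start
        while node != successor(key):
            fingers = finger_table(node)
            nxt = next((f for f in reversed(fingers) if in_open(f, node, key)),
                       fingers[0])  # no closer finger: hop to own successor
            path.append(nxt)
            node = nxt
        return path

    print(lookup(4, 52))  # [4, 40, 48, 54], as in the figure on the next slide
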
An Example of Chord Network
[Figure: lookup(key52) issued at node N4; using N4's finger table (N4+32 => N40), the request is forwarded N4 -> N40 -> N48 -> N54, the successor node that stores Key52]
Unstructured P2P Systems
• An unstructured P2P network usually does not impose any
constraints on links between nodes in the system
• Choice of neighbors to peer with is less restrictive and is
often probabilistic or randomized
• Unstructured overlays do not create associations between
nodes or links in the system and the information stored in
those nodes
– Do not require that information adhere to a particular format
or be tied to the structure of the overlay
– Unlike DHT systems that store <key,value> pairs and hash on
key value
• Information is usually stored only at the node where it was
generated or replicated in a probabilistic manner
• Query-response pathways are also not well defined
• Queries are propagated in the system using flooding based
algorithms
– Responses are routed back on the same path as the queries
Unstructured P2P Networks (cont.)
• Cannot provide guarantees on query performance
– Don’t know the number of hops taken by a query message to
reach a node that can answer the query
– Unlike structured overlays
• Cannot guarantee that results will be returned if they exist
in the network
– The time-to-live field in the message dictates how far the
message travels in the network
– Message may not reach all nodes
• Applications must be capable of dealing with these issues as
they do with other failure modes
Unstructured P2P Networks (cont.)
• Examples of unstructured P2P systems: Napster, Gnutella
and Kazaa
• Successful in internet file sharing applications
• Allow peers to host content, discover content on other
peers, and download that content
• Popular in the Internet community despite known
disadvantages:
– Vulnerability of central indexes in Napster
– High network loads imposed by Gnutella’s flooding algorithms
• Optimizations of unstructured systems have been developed
based on file and query distributions and on the use of
replication and caching
Outline
• Resource Discovery in Grids: current approaches
• Peer-to-Peer Resource Discovery
• Applying P2P techniques to Grid Resource
Discovery
– Unstructured P2P Information Service
• Joint work with Shishir Bharathi
– Structured P2P Replica Location Service
• Joint work with Min Cai
• Summary and Future Work
Applying Peer-to-Peer Techniques to
Grid Resource Discovery
• P2P technologies successful in internet file sharing
applications
– Gnutella, Kazaa, Napster, etc.
– Allow peers to host content, discover content and download
• Grid resource discovery services have similar
requirements
• Would like to use peer-to-peer technologies for
resource discovery in Grids
– Improved configuration, scalability, etc.
• Convergence of P2P and Grid has been predicted
– But not yet a reality…
Challenges in Applying P2P
Technologies to Grids
• Performance
– May require multiple network hops to resolve a query
– Some P2P overlays distribute resource information
widely (e.g., structured overlays)
– Still want to make use of specialized Grid indexes
• Security issues
– Access to resources and information about resources
may need to be controlled
– Need a security model that allows us to use P2P safely
• Practical Issues
– Has taken several years to make Grid discovery services
scalable, stable
– To support greater scalability, need further
improvements in simple and dynamic deployment
P2P Grid Information Service
• Explore organizing an existing grid information service (GT4 Index Service) as a peer-to-peer system
• Background
– Grid Information Services
– GT4 Index Service
• P2P Index Service Design
– Issues and design choices
– Optimizations
• Implementation
• Experimental results
“Design of a Scalable Peer-to-Peer Information
System Using the GT4 Index Service”, Shishir
Bharathi and Ann Chervenak, CCGrid 2007
Conference, May 2007
Grid Information Services
• Collect information about resources
– skynet.isi.edu has 96 nodes
– GridFTP runs on port 2811
– Avg. load on skynet-login.isi.edu is 2.55
• Aggregate Information from multiple types of
resources and services
• Queried by schedulers, workflow planners, clients
that need information about current resource state
• Process different types of queries
– What port does GridFTP run on?
• One response expected
– What servers have load < 3.0?
• Expect multiple responses; requires an information-gathering step
Organization of GT4 Indexes
• Globus Toolkit Version 4 Index Service
– Part of the Monitoring and Discovery System (MDS)
– Aggregates information about resources, responds to queries
– Issues: Designing efficient hierarchies, avoiding hot spots,
avoiding update cycles, scaling with amount of information
– Dealing with multiple administrative domains
GT4 Index Service
• WS-RF service, part of the
Monitoring and Discovery
System (MDS)
• Information Providers
generate resource
information in XML format
– E.g. Hawkeye, Ganglia,
GridFTP
• Aggregator sources aggregate
information
• Index Service publishes
aggregated information in a
single Resource Property
document
• Processes XPath queries and
returns matching XML
content
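
For flavor, a hypothetical XPath query against an aggregated Resource Property document might look like the sketch below; the element names (Host, LoadAverage, Service, Port) are invented for illustration and differ from the real MDS4 schema:

    from lxml import etree  # third-party library with full XPath 1.0 support

    rp_document = b"""
    <ResourceProperties>
      <Host Name="skynet-login.isi.edu"><LoadAverage>2.55</LoadAverage></Host>
      <Host Name="wind.isi.edu"><LoadAverage>4.10</LoadAverage></Host>
      <Service Name="GridFTP" Host="skynet.isi.edu"><Port>2811</Port></Service>
    </ResourceProperties>"""

    doc = etree.fromstring(rp_document)
    # "What port does GridFTP run on?"
    print(doc.xpath("//Service[@Name='GridFTP']/Port/text()"))  # ['2811']
    # "What hosts have load < 3.0?"
    print(doc.xpath("//Host[LoadAverage < 3.0]/@Name"))  # ['skynet-login.isi.edu']
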
Design of P2P GT4 Index Service
• Modified GT4 Index Services to create P2P Index Service
• P2P Indexes organize themselves into an unstructured overlay
• Each P2P Index can be accessed both via the overlay and as a
standalone GT4 Index Service
Design of P2P GT4 Index Service (cont.)
• Overlay used only for self-organization and for
forwarding and routing of queries and responses
– Continue to use specialized MDS4 Index Services that
aggregate resource information
• Resource information is not stored or updated via
the overlay
– Each P2P Index acquires resource information via an out-of-band mechanism
– Policies may dictate when and how indexes are updated
• Resource information is not replicated via the
overlay
– May change quickly, so replication is not effective
• Separate “storing of information” from “querying
for information”
Design Issue: Security Model
• Grid services typically impose a strict security model
– Client and server go through mutual authentication and
authorization
• P2P systems impose a less strict security model
– Access to information and access to resources via the same overlay
• Separation of “access to information” from “access
to resources”
– User authenticates at any node and queries for information
• Trusted at VO level to access information.
• e.g. Find compute resource that can execute job
– User accesses resource directly and not through the overlay
• Involves mutual authentication and authorization
• Trusted at individual site level to access resources.
• e.g. Submit job to resource directly
Design Issue: Choice of Overlay
• Our choice: Unstructured Overlays
– Easy overlay management
– Suitable for storing arbitrary content
– Support VO-defined topologies
– Previous work mostly in file-sharing applications
• Why not Structured Overlays?
– Well researched in context of information services. However…
– Not ideal for storing XML resource information
– Policy restrictions may prevent nodes from storing/indexing information generated at other nodes
Issues for Unstructured Overlays
• Unstructured overlay + no replication + flooding
algorithm means…
– Cannot guarantee answer will be found
– Depends on max-hops field
– No guarantees on the number of hops taken to reach answer
• Exponential growth in the number of messages sent
– Need to optimize message forwarding to counter this
explosion
• Typical message handling
– Process query locally. If result not found, forward to peers
– Reduces number of messages sent
– BUT slow if the client expects multiple responses
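
A toy sketch of process-and-forward flooding with a max-hops bound (illustrative only; real peers exchange WS messages rather than sharing Python objects):

    class Peer:
        def __init__(self, name, data=None):
            self.name = name
            self.data = data or {}   # locally held resource information
            self.neighbors = []

        def query(self, key, max_hops, seen=None):
            seen = seen if seen is not None else set()
            if self.name in seen:
                return []                      # drop duplicate query copies
            seen.add(self.name)
            results = [self.data[key]] if key in self.data else []
            if max_hops > 0:                   # TTL bounds the flood's reach
                for peer in self.neighbors:    # message count still grows fast
                    results += peer.query(key, max_hops - 1, seen)
            return results                     # responses retrace the query path

    a, b, c = Peer("A"), Peer("B", {"skynet.load": 2.55}), Peer("C")
    a.neighbors, c.neighbors = [b, c], [b]
    print(a.query("skynet.load", max_hops=2))  # [2.55]
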
Optimization: Query Caching with
Probabilistic Forwarding
• Goal: Reduce number of messages sent
– Replication/Caching most popular technique
• Cannot cache query responses
– Information may change quickly, policy restrictions, etc.
• Can cache queries
• Query Caching with Probabilistic Forwarding
– Cache information about which nodes responded to query
– If a node responded earlier to same query,
deterministically forward query to that node
– Forward query to other nodes with low probability
• Identify nodes that have been updated with new information
• Set up caches along duplicate paths correctly
– Similar to other learning-based & probabilistic approaches
• Effective for applications that may issue queries
repeatedly (e.g., Pegasus workflow planner)
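
One possible shape for this optimization, as a sketch under assumed details (the cache layout, forwarding probability and names below are ours, not the paper's):

    import random

    FORWARD_PROB = 0.1   # assumed tuning value

    def choose_targets(query, neighbors, query_cache):
        # Pick the neighbors a repeated query should be forwarded to
        targets = []
        for peer in neighbors:
            if peer in query_cache.get(query, set()):
                targets.append(peer)           # deterministic: past responder
            elif random.random() < FORWARD_PROB:
                targets.append(peer)           # probabilistic: may have new info
        return targets

    def record_response(query, responder, query_cache):
        # Cache who answered, not the answer itself (answers may go stale)
        query_cache.setdefault(query, set()).add(responder)
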
Optimization: Early Forwarding
• Goal: Improve performance of attribute based queries
– Process and forward model may be slow
• Distinguish between “Return one” vs. “Return all” semantics
• Return one - explicit queries
– What is the load on skynet.isi.edu?
– Requires a single response
– Process query locally before forwarding to reduce messages
• Return all – attribute based queries
– What sites have load < 3.0?
– Likely to be multiple responses
– Forward query before processing locally to reduce response
time (“early forwarding”)
• Tag QUERY messages with hints (“Return one” or “Return
all”) that indicate whether to do early forwarding
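
The forwarding decision itself is simple; a sketch with assumed hint names:

    def handle_query(msg, neighbors, process_locally, forward):
        if msg.get("hint") == "return_all":
            # Attribute query: many responders expected, so forward before
            # processing locally to overlap remote and local work
            forward(msg, neighbors)
            process_locally(msg)
        elif not process_locally(msg):
            # Explicit query: answer locally if possible to save messages
            forward(msg, neighbors)
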
Implementation of P2P Index Service
• Layered implementation
– P2P Resource component
maintains overlay and
processes messages
– IndexServiceResource
component processes
queries
– “Almost” plug-and-play
• Gnutella-like message
forwarding
• Updated using standard
MDS aggregator
framework and not
through the overlay
• Query Caching and Early
Forwarding optimizations
Experiments
Experimental Set-up
• Evaluate overhead of applying a P2P overlay
• Evaluate wide-area performance
• Test beds - LAN at ISI, PlanetLab
– Mostly comparison tests – small networks
Applications
• Pegasus (A workflow planning application)
– Site and transformation catalogs used by Pegasus
• Simple random query client
– Artificial records
Metrics
• Time taken by Pegasus to generate a plan
• Query rates
Experiments: Overhead of P2P Layer
Comparison of a single Index Service and a single P2P Index Service, as an indicator of the P2P layer's overhead
[Chart: time taken by Pegasus to generate a plan (seconds, 0-600) versus number of data sets in the index (20-160), for the Index Service and the P2P Index Service]
• Pegasus planning a 100-job workflow
• Query overhead reasonably constant as the number of datasets in the index increases
Experiments: Wide Area Performance
Scalability tests on 3 networks: 1000 datasets at each index, 2 peers per index
[Chart: queries per minute (0-300) versus number of clients (1-64, 4 threads each) for the World WAN, US WAN and LAN configurations]
• WAN measurements on the PlanetLab test bed
• 8 nodes, 2 peers per node, up to 256 concurrent client threads
• Query rates in the “World” WAN slightly higher than in the “US” WAN
– Higher load on the US PlanetLab nodes
– Query processing is compute intensive
P2P Index Service: Conclusions
• P2P Organization of GT4 Information Service
– Low overhead from adding a P2P layer
• Key design features:
– Separation of storage of information from querying for
information (Overlay used only to forward queries)
– Separation of access to information from access to
resources (Security model: Choose what is exposed at VO
level and apply additional restrictions at resource level)
• Simple optimizations help address issues with flooding
(results not shown here)
– Query caching with probabilistic forwarding
– Early Forwarding
• Future Work
– Scale to larger sizes
– P2P version of Replica Location Service using overlay
– Experiment with replicating relatively static information
Outline
• Resource Discovery in Grids: current
approaches
• Peer-to-Peer Resource Discovery
• Applying P2P techniques to Grid Resource
Discovery
– P2P Information Service
• Joint work with Shishir Bharathi
– P2P Replica Location Service
• Joint work with Min Cai
• Summary and Future Work
Peer-to-Peer Replica Location Service
• Implemented a P2P Replica Location Service based on:
– Globus Toolkit Version 3.0 RLS
– Chord structured Peer-to-Peer overlay network
“A Peer-to-Peer Replica Location Service Based on A
Distributed Hash Table,” Min Cai, Ann Chervenak, Martin
Frank, Proceedings of SC2004 Conference, November
2004.
“Applying Peer-to-Peer Techniques to Grid Replica
Location Services,” Ann L. Chervenak, Min Cai, Journal of
Grid Computing, 2006.
The Globus Replica Location Service
• The existing Globus Replica Location Service (RLS) is a
distributed registry service that records the locations of
data copies and allows discovery of replicas
• Maintains mappings between logical identifiers and target
names
• Local Replica Catalogs (LRCs) contain logical-to-target mappings
• Replica Location Index Nodes (RLIs) aggregate information about LRCs
• Soft state updates sent from LRCs to RLIs
[Figure: hierarchy of Replica Location Index (RLI) nodes above Local Replica Catalogs (LRCs)]
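
The two-level mapping structure can be sketched as follows (illustrative; the catalog names and URLs are hypothetical):

    # An LRC holds logical-name -> replica-location mappings; an RLI records
    # which LRC(s) know each logical name, fed by periodic soft state updates.
    lrc_catalog = {"lfn1001": ["gsiftp://host.isi.edu/data/file1"]}

    rli_index = {}   # logical name -> set of LRCs holding mappings for it

    def soft_state_update(rli, lrc_name, catalog):
        # The LRC (re)announces the logical names it knows about; entries
        # that stop being refreshed can simply time out at the RLI
        for lfn in catalog:
            rli.setdefault(lfn, set()).add(lrc_name)

    soft_state_update(rli_index, "rls://lrc.isi.edu", lrc_catalog)
    # A client asks the RLI which LRC knows lfn1001, then queries that LRC
    print(rli_index["lfn1001"])   # {'rls://lrc.isi.edu'}
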
Motivation for a Peer-to-Peer RLS
• Each RLS deployment is statically configured
– If upper level RLI fails, the lower level LRCs need to be
manually redirected
• More automated and flexible membership management
is desirable for:
– larger deployments
– dynamic environments where servers frequently join and leave
• We use a peer-to-peer approach to provide a distributed RLI index for {logical-name, LRC} mappings
• Consistent with our security model: resource discovery
at the RLI level, stricter security at LRC level
• In P2P RLS, replicate mappings, unlike in P2P MDS
– Easier to hash on logical name than on arbitrary XML content
– Mappings are much less dynamic than resource information
P2P Replica Location Service (P-RLS) Design
• A P-RLS server consists of:
– An unchanged Local Replica Catalog (LRC) to maintain
consistent {logical-name, target-name} mappings
– A Peer-to-Peer Replica Location Index node (P-RLI)
• The P-RLS design uses a Chord overlay network to
self-organize P-RLI servers
– Chord is a distributed hash table that supports scalable key
insertion and lookup
– Each node has log (N) neighbors in a network of N nodes
– A key is stored on its successor node (first node with ID
equal to or greater than key)
– Key insertion and lookup in log (N) hops
– Stabilization algorithm for overlay construction and topology
repair
P-RLS Design (cont.)
• Uses Chord algorithm to store mappings of logical
names to LRC sites
– Generates Chord key for a logical name by applying SHA1
hash function
– Stores {logical-name, LRC} mappings on the P-RLI successor node of the mapping
• When P-RLI node receives a query for LRC(s) that
store mappings for a logical name:
– Answers the query if it contains the logical-to-LRC
mapping(s)
– If not, routes query to the successor node that contains
the mappings
• Then query LRCs directly for mappings from logical
names to replica locations
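
A minimal sketch of these steps on the example ring (the key values match the next slide's stylized SHA1 outputs, which are not real hash values):

    RING = [4, 8, 20, 24, 40, 48, 54, 60]
    store = {n: {} for n in RING}    # per-node {logical-name: LRC} mappings

    def home_node(key):
        # Successor node of the key, where the mapping is stored
        return next((n for n in sorted(RING) if n >= key), min(RING))

    def put_mapping(lfn, lrc_url, key):
        store[home_node(key)][lfn] = lrc_url

    def rli_get_lrc(lfn, key):
        # In the real system the query is routed here in O(log N) hops
        return store[home_node(key)][lfn]

    put_mapping("lfn1001", "rlsn://lrc1001", key=52)  # SHA1("lfn1001") = 52
    print(rli_get_lrc("lfn1001", key=52))             # rlsn://lrc1001, on N54
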
An Example of P-RLS Network
[Figure: the example P-RLS ring with SHA1(“lfn1000”) = 18, SHA1(“lfn1001”) = 52, SHA1(“lfn1002”) = 31; the mapping <lfn1000, rlsn://lrc1000> is stored at N20, <lfn1002, rlsn://lrc1002> at N40, and <lfn1001, rlsn://lrc1001> at N54; a client's rli_get_lrc(“lfn1001”) at N4 becomes lookup(key52), which the finger tables route N4 -> N40 -> N48 -> N54]
P-RLS Implementation
• Implemented a prototype of P-RLS
• Extends RLS implementation in Globus Toolkit 3.0
• Each P-RLS node consists of an unchanged LRC server and a
peer-to-peer P-RLI server
• The P-RLI server implements the Chord protocol operations,
including join, update, query, successor, probing & stabilization
• LRC, RLI & Chord protocols implemented on top of RLS RPC layer
[Figure: two P-RLS nodes, each consisting of an unchanged LRC server and a P-RLI server; the LRC, RLI and Chord protocols (successor, join, update, query, probing, stabilization) sit on top of the RLS RPC layer; the P-RLI servers communicate over the Chord network, and clients use the RLS Client API]
P-RLS Performance
• P-RLS network runs on a 16-node cluster
• 1000 updates (add operations) on each node, updates overwrite
existing mappings, and maximum 1000 mappings in the network
• Update latencies increase on log scale with number of nodes
[Chart: update latency (ms, 0-6) versus number of nodes (1-15)]
P-RLS Measurements (cont.)
• Query latencies with 100,000 and 1 million mappings
• Total number of mappings has little effect on query times
– Uses hash table to index mappings on each P-RLI node
• Query times increase on log scale with number of nodes
[Chart: query latency (ms, 0-5) versus number of nodes (1-15), with 100,000 and 1,000,000 preloaded mappings]
Successor Replication in P-RLS
• Need better reliability when P-RLI nodes fail or leave
– Replicate mappings so they are not lost
• Min Cai proposes Adaptive Successor Replication:
– Extends the previously described scheme
– Replicates each mapping on the k successor nodes of the root node
• Provides reliability despite P-RLI failures
– No mappings lost unless all k successors fail simultaneously
• Distributes mappings more evenly among P-RLI nodes as
replication factor increases
– Improves load balancing for popular mappings
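
A sketch of choosing the replica set (illustrative; whether the root node counts toward k is our assumption):

    RING = [4, 8, 20, 24, 40, 48, 54, 60]

    def replica_nodes(key, k):
        # Root (successor) node of the key plus its k immediate successors
        ring = sorted(RING)
        root = next((i for i, n in enumerate(ring) if n >= key), 0)
        return [ring[(root + j) % len(ring)] for j in range(k + 1)]

    # Key 52 with k = 2 is stored on N54, N60 and N4; the mapping survives
    # unless all of these nodes fail at the same time
    print(replica_nodes(52, k=2))   # [54, 60, 4]
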
Summary: P-RLS Work
• Implemented a P2P Replica Location Service based on:
– Globus Toolkit Version 3.0 RLS
– Chord structured Peer-to-Peer overlay network
• Measured the performance of our P-RLS system with
up to 15 nodes
– Query and update latencies increase at a rate of O(log N) with the size of the P-RLS network
• Simulated the performance of larger P-RLS networks
• Replication of mappings results in more even
distribution of mappings among nodes
• Successor replication scheme provides query load
balancing for popular mappings
Related Work: P2P and Grids
Other approaches to applying P2P to Grid services
• GLARE
– Mumtaz Siddiqui et al., University of Innsbruck
– Structured P2P approach to Grid information services
• P2P Replica Location Service
– Matei Ripeanu et al., University of British Columbia
– Use bloom filters and an unstructured P2P network for
replica location service
Structured Peer-to-Peer Networks
• Examples: Chord, CAN, Tapestry, Pastry
Unstructured Peer-to-Peer Networks
• Examples: Gnutella, KaZaA, Gia
Summary
• Resource discovery services in Grids are well-suited to P2P
approaches
– Similar in function to internet file sharing applications
• P2P approaches are attractive because
– Scale of Grids growing larger
– Organization of hierarchical resource discovery services is
challenging, especially at large scale
– Need self-configuration, self-healing
• Security, performance & practical issues to be overcome
• Two systems implemented and evaluated
– P2P Information Service: Uses unstructured overlay to
support resource information specified in XML
– P2P Replica Location Service: Uses structured overlay to distribute mappings among indexes, bounding query response time
Future Work
• Continued research
– Larger scale
– Different overlay schemes
– Additional optimizations to improve query performance,
reduce message counts
• Incorporate P2P techniques into real-world Grid
resource discovery services
– Make GT4-based peer-to-peer overlay available as open
source contribution to Globus Toolkit
– Release P2P Index Service component for MDS4
– Tested a version of Replica Location Service using the
unstructured GT4 overlay
– Additional improvements to services for easy, dynamic
deployment to make these approaches practical