PowerPoint - University of Houston

Download Report

Transcript PowerPoint - University of Houston

Automatic Clustering of Grid
Nodes
Nov 14, 2005
Qiang Xu, Jaspal Subhlok
University of Houston
High Performance Systems Lab
Texas Learning and Computation Center
Grid Scheduler
Computational
Resource
|
CPU, memory
Network Link
|
Latency,
Bandwidth
I will decide which
group of nodes are best
for an application!!!
Network
Topology
High Performance Systems Lab
Texas Learning and Computation Center
Network Topology
• Fine-grained physical network topology --- Hard!
heterogeneous, dynamic, and distributed nature of a
grid system
• We focus on the “logical” network topology
logical network topology: the connectivity between nodes
based on the observed behavior.
1) Easier to compute
2) Sufficient to tackle the resource selection problem
High Performance Systems Lab
Texas Learning and Computation Center
Discover Clusters/Logical Topology
A set of nodes with IP addresses / hostnames
Connectivity?
High Performance Systems Lab
Texas Learning and Computation Center
Discover Clusters/Logical Topology
Cluster A
Dist(A—B)
Dist(A—C)
Dist(B—C)
Cluster B
Cluster C
nodes close to each other  same cluster
High Performance Systems Lab
Texas Learning and Computation Center
Outline
•
•
•
•
•
Introduction
Internet  Geometric Space
Automatic Clustering
Experiments and Result
Conclusion
High Performance Systems Lab
Texas Learning and Computation Center
Internet Topology Map 1
A macroscopic snapshot of the Internet : 4 April 2005 - 17 April 2005.
High Performance Systems Lab
Texas Learning and Computation Center
Internet Topology Map 2
Internet map as of 1998 by
Bill Cheswick, Bell Labs
Hal Burch, CMU
High Performance Systems Lab
Texas Learning and Computation Center
Why Geometric Space ?
Internet Topology Map --- Complex!
I can’t tell the distance
between nodes!!
Geometric Space (N-Dimension Euclidean Space)
GNP(Global Network Positioning) --- T. S. Eugene Ng and
Hui Zhang, INFOCOM'02
High Performance Systems Lab
Texas Learning and Computation Center
Magic Landmarks!
12
3
8
Landmark
Node
Landmarks: A set of distributed nodes across the internet
High Performance Systems Lab
Texas Learning and Computation Center
Geometric Space
1. One axis per
landmark
X4=12
Y4=8
2. Coordinate of
nodes ≡ Latency
from each
landmark.
Z4=3
High Performance Systems Lab
Texas Learning and Computation Center
Internet  Geometric Space
Complex Internet Structure
Simple Geometric Space
High Performance Systems Lab
Texas Learning and Computation Center
Advantage of Geometric Space
• Simple --- distance in Geometric Space is well defined,
e.g. the Euclidean distance.
• Scalable --- for M Nodes
Pairwise distance among M nodes  M*M probes
Mapping to Geometric space  M*N probes
N is the number of landmarks – a number ~7 is
known to be sufficient.
• Easy to manage --- only need to control the landmarks
High Performance Systems Lab
Texas Learning and Computation Center
Outline
•
•
•
•
•
Introduction
Internet  Geometric Space
Automatic Clustering
Experiments and Result
Conclusion
High Performance Systems Lab
Texas Learning and Computation Center
Again the problem!
Cluster A
Dist(A—B)
Dist(A—C)
Dist(B—C)
Cluster B
Cluster C
High Performance Systems Lab
Texas Learning and Computation Center
Place Nodes in Geometric Space !
How do I cluster?
Simple Geometric Space
High Performance Systems Lab
Texas Learning and Computation Center
Distance and Threshold
• Network Distance:
Euclidean_Distance( A , B ) 
• Threshold:
N T
N
2
(A

B
)
 i i
i 1
If Distance < Threshold, nodes belong to the same logical
cluster
– N is the # of landmarks
– T parameter describes how close nodes have to be to
be in the same cluster
• for a typical domain to be one cluster ,T = 1ms
High Performance Systems Lab
Texas Learning and Computation Center
Build Unidirected Graph
• All grid nodes are graph nodes
• Add an edge between nodes if Distance < Threshold
High Performance Systems Lab
Texas Learning and Computation Center
Typical Case
• Edge exist if Distance < Threshold
Clusters are
obvious and
easy to
distinguish!
High Performance Systems Lab
Texas Learning and Computation Center
Pathological Case
• Border Node ?
Where are the
clusters?
General Case: Find
maximal cliques in
the graph – each
clique is a cluster
High Performance Systems Lab
Texas Learning and Computation Center
Summary of Inter-domain Clustering
1. Place Nodes in the geometric space.
2. Calculate the Euclidean distance.
3. Build a graph based on distance and Threshold.
4. Find the maximal cliques.
inter-domain clustering ---  good!
intra-domain clustering ---  not good enough!
High Performance Systems Lab
Texas Learning and Computation Center
Intra-domain clustering
•
Nodes in the same domain but in different subnets.
•
Short latency --- less than 1ms.
•
Landmark-based approach --- resolution is not
sufficient!
measurement error ~ real latency
We need to change the approach for intradomain clustering !
High Performance Systems Lab
Texas Learning and Computation Center
Intra-domain Clustering
1. Distance between nodes is directly measured latency
instead of projected geometrical distance.
(M × M but M is smaller and measurements are
quick.)
2. Basis for clustering is relative
Distance between any two nodes inside a cluster
is within β% of the smallest distance in the
cluster.
High Performance Systems Lab
Texas Learning and Computation Center
Intra-domain Clustering Procedure
Initially each node is a cluster
Each edge is measured latency
REPEAT:
Select least cost edge, say connecting clusters A and B
If A and B are not the same cluster; and if this edge cost is
within β% of least cost edges inside A and B, then combine
them into one cluster
High Performance Systems Lab
Texas Learning and Computation Center
Outline
•
•
•
•
•
Introduction
Internet  Geometric Space
Automatic Clustering
Experiments and Result
Conclusion
High Performance Systems Lab
Texas Learning and Computation Center
Experiments
• Inter-Domain Clustering
3 Landmarks: UT(Austin), Rice, CMU
36 Compute Nodes: Rice, UT-Dallas, TAMU-College
Station, TAMU-Galveston
• Intra-Domain Clustering
4 clusters at University of Houston:
PGH201, Itanium, Opetron, Stokes
• TCP Ping(not ICMP Ping) to measure latency
High Performance Systems Lab
Texas Learning and Computation Center
Inter-domain Cluster ( 2 landmarks)
+ UT Dallas
TAMU Galveston
TAMU College Station
Rice
Cannot
distinguish
between
UT Dallas
&
TAMU
Galveston
High Performance Systems Lab
Texas Learning and Computation Center
Inter-domain Cluster ( 3 landmarks)
+ UT Dallas
TAMU Galveston
TAMU College Station
Rice
4 clusters
are well
distinguished
High Performance Systems Lab
Texas Learning and Computation Center
Inter-domain Cluster ( 2 landmarks)
+ UT Dallas
TAMU Galveston
TAMU College Station
Rice
High Performance Systems Lab
Texas Learning and Computation Center
Intra-domain Cluster latency
Clusters PGH201 Opteron Itanium
Stokes
PGH201
0.09
0.32
0.32
0.30
Opteron
0.25
0.09
0.09
0.50
Itanium
0.30
0.10
0.10
0.35
Stokes
0.40
0.50
0.60
0.10
Latency between Nodes (ms)
High Performance Systems Lab
Texas Learning and Computation Center
Illustration of Intra-domain Clusters
+ UT Dallas
TAMU Galveston
TAMU College Station
Rice
High Performance Systems Lab
Texas Learning and Computation Center
Future Work
• Integrate into a grid scheduling system
• Use Bandwidth as a factor for clustering
• Dynamically update logical clusters
• Nodes behind a NAT (Network address
translation) -- nodes with local IP addresses
High Performance Systems Lab
Texas Learning and Computation Center
Conclusions
• Efficient and scalable procedure to
hierarchically group distributed nodes into
logical clusters
• Validation with experiments on nodes
distributed across Texas
• An important step for scheduling in a grid
environment.
High Performance Systems Lab
Texas Learning and Computation Center
Questions?
Thank you!
High Performance Systems Lab
Texas Learning and Computation Center