
Hyper-Scaling
Xrootd Clustering
Andrew Hanushevsky
Stanford Linear Accelerator Center
Stanford University
29-September-2005
http://xrootd.slac.stanford.edu
ROOT 2005 Users Workshop
CERN
September 28-30, 2005
Outline
Xrootd Single Server Scaling
Hyper-Scaling via Clustering
  Architecture
  Performance
Configuring Clusters
  Detailed relationships
  Example configuration
  Adding fault-tolerance
Conclusion
Latency Per Request (xrootd)
[plot: latency per request for a single xrootd server]
Capacity vs Load (xrootd)
[plot: delivered capacity versus offered load for a single xrootd server]
xrootd Server Scaling
Linear scaling relative to load
  Allows deterministic sizing of server
    Disk
    NIC
    Network fabric
    CPU
    Memory
Performance tied directly to hardware cost (see the sizing sketch below)
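Because a single server scales linearly until one hardware component saturates, farm sizing reduces to simple arithmetic. A minimal sizing sketch in Python follows; the per-server NIC and disk rates and the target aggregate load are illustrative assumptions, not xrootd measurements.

# Deterministic sizing sketch: how many identical servers are needed to
# sustain a target aggregate data rate, given per-server hardware limits.
# All numbers below are illustrative assumptions, not xrootd measurements.
import math

nic_rate_mb_s = 110.0      # assumed usable rate of one gigabit NIC
disk_rate_mb_s = 60.0      # assumed sustained rate of the disk subsystem
per_server_mb_s = min(nic_rate_mb_s, disk_rate_mb_s)   # the bottleneck wins

target_aggregate_mb_s = 2000.0    # assumed aggregate client load

servers_needed = math.ceil(target_aggregate_mb_s / per_server_mb_s)
print(f"One server delivers ~{per_server_mb_s:.0f} MB/s; "
      f"{servers_needed} servers cover {target_aggregate_mb_s:.0f} MB/s")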
Hyper-Scaling
xrootd servers can be clustered
  Increase access points and available data
    Complete scaling
  Allow for automatic failover
    Comprehensive fault-tolerance
The trick is to do so in a way that
  Cluster overhead (human & non-human) scales linearly
    Allows deterministic sizing of cluster
  Cluster size is not artificially limited
  I/O performance is not affected
Basic Cluster Architecture
Software crossbar switch
  Allows point-to-point connections between client and data server
  I/O performance not compromised
    Assuming switch overhead can be amortized
Scale interconnections by stacking switches
  Virtually unlimited connection points
  Switch overhead must be very low
Single Level Switch
[Diagram: the client sends "open file X" to the redirector (head node); the redirector asks data servers A, B, and C "Who has file X?"; C answers, the redirector caches the file location and replies "go to C"; a second open of X is answered from the cache and goes straight to C.]
Client sees all servers as xrootd data servers (the lookup flow is sketched below)
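A minimal sketch of the redirection-with-caching flow shown above, written in Python. It only mirrors the message sequence in the diagram; the class and method names (Redirector, who_has, open) are illustrative and are not the xrootd/olbd API.

# Minimal sketch of single-level redirection with location caching.
# Mirrors the diagram's message flow; not the xrootd/olbd protocol.
class Redirector:
    def __init__(self, data_servers):
        self.data_servers = data_servers   # e.g. {"A": set(), "C": {"X"}}
        self.location_cache = {}           # file path -> server that has it

    def who_has(self, path):
        # "Who has file X?" -- here each server just checks a local file set.
        for server, files in self.data_servers.items():
            if path in files:
                return server
        return None

    def open(self, path):
        # First open broadcasts the query and caches the answer;
        # later opens are redirected straight from the cache.
        if path not in self.location_cache:
            server = self.who_has(path)
            if server is None:
                raise FileNotFoundError(path)
            self.location_cache[path] = server
        return f"go to {self.location_cache[path]}"

redirector = Redirector({"A": set(), "B": set(), "C": {"X"}})
print(redirector.open("X"))   # queries the servers, then answers "go to C"
print(redirector.open("X"))   # answered from the location cache: "go to C"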
Two Level Switch
[Diagram: the client sends "open file X" to the redirector (head node) and is told "go to C", where C is a supervisor (sub-redirector) in front of data servers D, E, and F; the client repeats "open file X" at the supervisor and is told "go to F". Data servers A and B attach directly to the redirector.]
Client sees all servers as xrootd data servers
Making Clusters Efficient
Cell size, structure, & search protocol are critical
  Cell size is 64
    Limits direct inter-chatter to 64 entities
    Compresses incoming information by up to a factor of 64
    Can use very efficient 64-bit logical operations
  Hierarchical structures usually most efficient
    Cells arranged in a B-Tree (i.e., B64-Tree)
    Scales as 64^h (where h is the tree height)
      Client needs h-1 hops to find one of 64^h servers (2 hops for 262,144 servers)
      Number of responses is bounded at each level of the tree
  Search is a directed broadcast query / rarely-respond protocol
    Provably best scheme if less than 50% of servers have the wanted file
      Generally true if number of files >> cluster capacity
Cluster protocol becomes more efficient as cluster size increases (see the sketch below)
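A small sketch of why 64 is a convenient cell size and how the B64-tree numbers work out. The bitmask encoding here is purely an illustration of "very efficient 64-bit logical operations" (one bit per cell member); the actual olbd message format is not described in the slides.

# Illustration only: one 64-bit word summarizes, for a whole cell, which of
# its (up to) 64 members reported having a file -- a factor-of-64 compression
# handled with single AND/OR operations. Not the actual olbd message format.
have_file = 0
for member in (2, 17, 40):            # assumed cell members reporting the file
    have_file |= 1 << member
print(f"cell summary word: {have_file:#018x} "
      f"({bin(have_file).count('1')} members have the file)")

# B64-tree capacity and client hops: a tree of height h addresses 64**h
# servers and the client needs h-1 redirection hops to reach one of them.
for h in (1, 2, 3):
    print(f"height {h}: up to {64**h:,} servers, {h - 1} hop(s)")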
Cluster Scale Management
Massive clusters must be self-managing
  Scales as 64^n, where n is the height of the tree
  Scales very quickly (64^2 = 4,096; 64^3 = 262,144)
    Well beyond direct human management capabilities
Therefore clusters self-organize
  Single configuration file for all nodes
  Uses a minimal spanning tree algorithm
  280 nodes self-cluster in about 7 seconds
  890 nodes self-cluster in about 56 seconds
    Most overhead is in wait time to prevent thrashing
Clustering Impact
Redirection overhead must be amortized
  This is a deterministic process for xrootd
    All I/O is via point-to-point connections
    Can trivially use single-server performance data
Clustering overhead is non-trivial
  100-200 µs additional for an open call
  Not good for very small files or short "open" times
However, compatible with HEP access patterns (see the sketch below)
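A rough amortization sketch: only the 100-200 µs open overhead comes from the slide above; the per-file usage times are assumptions chosen to contrast a very small file with a typical long HEP read.

# Sketch: when is the redirection overhead negligible?
# The open overhead is from the slide; the usage times are assumptions.
open_overhead_s = 200e-6   # worst case quoted above

for label, time_file_is_used_s in [("tiny file, 1 ms of I/O", 1e-3),
                                   ("long HEP read, 10 min per file", 600.0)]:
    fraction = open_overhead_s / (open_overhead_s + time_file_is_used_s)
    print(f"{label}: redirection is {fraction:.4%} of total time")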
Detailed Cluster Architecture
A cell is 1-to-64 entities (servers or cells) clustered around a cell manager
The cellular process is self-regulating and creates a B-64 tree
[Diagram: cells of servers grouped under cell managers, rooted at the head node, M; a sketch of the resulting tree shape follows below]
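The sketch below only illustrates the shape of the resulting B-64 tree (group up to 64 entities per cell, repeat until one head node remains) and the hop counts it implies; it is not the olbd self-organization algorithm, which the earlier slides describe as a minimal spanning tree built from a single shared configuration file.

# Sketch: group N servers into cells of at most 64, then group the cell
# managers, until one head node remains. Illustrative only -- not the
# actual olbd self-organization algorithm.
import math

CELL_SIZE = 64

def build_levels(n_servers):
    """Return the number of nodes at each level, leaves first."""
    levels = [n_servers]
    while levels[-1] > 1:
        # one cell manager per group of up to 64 entities
        levels.append(math.ceil(levels[-1] / CELL_SIZE))
    return levels

for n in (64, 4096, 262_144):
    levels = build_levels(n)
    print(f"{n:>7,} servers -> nodes per level {levels}, "
          f"tree height {len(levels) - 1}, client hops {len(levels) - 2}")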
xrootd: The Internal Details
[Diagram: redirectors, supervisors, and data servers each run an xrootd daemon (ofs, odc, and olb components) paired with an olbd daemon (manager M or server S role).
Control network (olbd): links managers, supervisors, and servers; carries resource information and file locations.
Data network (xrootd): redirectors steer data clients to the data; data servers provide the data.]
xrootd: Schema Configuration
Redirectors (head node):
  ofs.redirect remote
  odc.manager host port
  olb.role manager
  olb.port port
  olb.allow hostpat
Supervisors (sub-redirector):
  ofs.redirect remote
  ofs.redirect target
  olb.role supervisor
  olb.subscribe host port
  olb.allow hostpat
Data servers (end-node):
  ofs.redirect target
  olb.role server
  olb.subscribe host port
Example: SLAC Configuration
[Diagram: data servers kan01, kan02, kan03, kan04, ..., kanxx; redirectors kanrdr01 and kanrdr02, addressed as kanrdr-a; client machines contact the redirectors.]
Hidden details: the configuration file behind this picture is shown next.
xrootd: Configuration File
The same file is used on every node; the if/else selects the directives for the host it runs on.

if kanrdr-a+
  olb.role manager
  olb.port 3121
  olb.allow host kan*.slac.stanford.edu
  ofs.redirect remote
  odc.manager kanrdr-a+ 3121
else
  olb.role server
  olb.subscribe kanrdr-a+ 3121
  ofs.redirect target
fi
xrootd: Potential Simplification?

Current form:

if kanrdr-a+
  olb.role manager
  olb.port 3121
  olb.allow host kan*.slac.stanford.edu
  ofs.redirect remote
  odc.manager kanrdr-a+ 3121
else
  olb.role server
  olb.subscribe kanrdr-a+ 3121
  ofs.redirect target
fi

Proposed form:

olb.port 3121
all.role manager if kanrdr-a+
all.role server if !kanrdr-a+
all.subscribe kanrdr-a+
olb.allow host kan*.slac.stanford.edu

Is the simplification really better? We're not sure; what do you think?
Adding Fault Tolerance
[Diagram: a manager (head node), supervisors (intermediate nodes), and data servers (leaf nodes), each node running xrootd and olbd.]
Fault-tolerance techniques, per level:
  Manager (head node): fully replicate
  Supervisor (intermediate node): hot spares
  Data server (leaf node): data replication, restaging, proxy search*
*xrootd has built-in proxy support today; discriminating proxies will be available in a near-future release.
Conclusion
High-performance data access systems are achievable
  The devil is in the details
High performance and clustering are synergistic
  Allows unique performance, usability, scalability, and recoverability characteristics
  Such systems produce novel software architectures
Challenges
  Creating applications that capitalize on such systems
Opportunities
  Fast, low-cost access to huge amounts of data to speed discovery
Acknowledgements
Fabrizio Furano, INFN Padova
  Client-side design & development
Principal collaborators
  Alvise Dorigo (INFN), Peter Elmer (BaBar), Derek Feichtinger (CERN), Geri Ganis (CERN), Guenter Kickinger (CERN), Andreas Peters (CERN), Fons Rademakers (CERN), Gregory Sharp (Cornell), Bill Weeks (SLAC)
Deployment teams
  FZK, DE; IN2P3, FR; INFN Padova, IT; CNAF Bologna, IT; RAL, UK; STAR/BNL, US; CLEO/Cornell, US; SLAC, US
US Department of Energy
  Contract DE-AC02-76SF00515 with Stanford University