Hyper-Scaling
Xrootd Clustering
Andrew Hanushevsky
Stanford Linear Accelerator Center
Stanford University
29-September-2005
http://xrootd.slac.stanford.edu
Root 2005 Users Workshop
CERN
September 28-30, 2005
Outline
Xrootd Single Server Scaling
Hyper-Scaling via Clustering
Architecture
Performance
Configuring Clusters
Detailed relationships
Example configuration
Adding fault-tolerance
Conclusion
Latency Per Request (xrootd)
Capacity vs Load (xrootd)
xrootd Server Scaling
Linear scaling relative to load
Allows deterministic sizing of server
Disk
NIC
Network Fabric
CPU
Memory
Performance tied directly to hardware cost
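A toy illustration of what deterministic sizing means here (the throughput figures below are assumptions chosen for the arithmetic, not measurements from this talk): the number of concurrent clients one server sustains is bounded by its slowest component.

\[
N_{\text{clients}} \approx \min\!\left(\frac{C_{\text{disk}}}{r},\ \frac{C_{\text{NIC}}}{r}\right)
= \min\!\left(\frac{60\ \text{MB/s}}{5\ \text{MB/s}},\ \frac{120\ \text{MB/s}}{5\ \text{MB/s}}\right) = 12
\]

With a disk subsystem good for roughly 60 MB/s, a gigabit NIC worth roughly 120 MB/s, and clients that each read about 5 MB/s, the disk is the binding component, so hardware cost maps directly onto capacity.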
Hyper-Scaling
xrootd servers can be clustered
Increase access points and available data
Complete scaling
Allow for automatic failover
Comprehensive fault-tolerance
The trick is to do so in a way that
Cluster overhead (human & non-human) scales linearly
Allows deterministic sizing of cluster
Cluster size is not artificially limited
I/O performance is not affected
Basic Cluster Architecture
Software crossbar switch
Allows point-to-point connections
Client and data server
I/O performance not compromised
Assuming switch overhead can be amortized
Scale interconnections by stacking switches
Virtually unlimited connection points
Switch overhead must be very low
Single Level Switch
Diagram: the client sends "open file X" to the redirector (head node); the redirector asks data servers A, B, and C "Who has file X?", caches the file location, and replies "go to C"; a second open of X is redirected to C straight from the cache
Client sees all servers as xrootd data servers
Two Level Switch
Diagram: the client sends "open file X" to the redirector (head node) and is told "go to C", where C is a supervisor (sub-redirector) fronting data servers D, E, and F; the client then sends "open file X" to C and is told "go to F"; data servers A and B hang directly off the head node
Client sees all servers as xrootd data servers
Making Clusters Efficient
Cell size, structure, & search protocol are critical
Cell Size is 64
Limits direct inter-chatter to 64 entities
Compresses incoming information by up to a factor of 64
Can use very efficient 64-bit logical operations
Hierarchical structures usually most efficient
Cells arranged in a B-Tree (i.e., B64-Tree)
Scales as 64^h (where h is the tree height)
Client needs h-1 hops to find one of 64^h servers (2 hops for 262,144 servers)
Number of responses is bounded at each level of the tree
Search is a directed broadcast query/rarely respond protocol
Provably best scheme if less than 50% of servers have the wanted file
Generally true if number of files >> cluster capacity
Cluster protocol becomes more efficient as cluster size increases
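As a quick check of the hop arithmetic above (a worked restatement of the numbers on this and the following slide):

\[
N_{\text{servers}} = 64^{h}, \qquad \text{client redirection hops} = h - 1
\]
\[
h = 2:\ 64^{2} = 4{,}096 \text{ servers reached in 1 hop}; \qquad
h = 3:\ 64^{3} = 262{,}144 \text{ servers reached in 2 hops}
\]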
Cluster Scale Management
Massive clusters must be self-managing
Scales as 64^n where n is the height of the tree
Scales very quickly (64^2 = 4,096; 64^3 = 262,144)
Well beyond direct human management capabilities
Therefore clusters self-organize
Single configuration file for all nodes
Uses a minimal spanning tree algorithm
280 nodes self-cluster in about 7 seconds
890 nodes self-cluster in about 56 seconds
Most overhead is in wait time to prevent thrashing
Clustering Impact
Redirection overhead must be amortized
This is a deterministic process for xrootd
All I/O is via point-to-point connections
Can trivially use single-server performance data
Clustering overhead is non-trivial
100-200 µs additional for an open call
Not good for very small files or short “open” times
However, compatible with the HEP access patterns
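To put the open overhead in perspective (the file-lifetime figures below are illustrative assumptions; only the 100-200 µs cost comes from the slide), the redirection cost matters only relative to how long the file stays open:

\[
\text{file open for } \sim 100\ \text{s (typical HEP analysis read):}\quad
\frac{200\ \mu\text{s}}{100\ \text{s}} = 2\times10^{-6} \approx 0.0002\%
\]
\[
\text{file open for only } \sim 1\ \text{ms (tiny file or transient open):}\quad
\frac{200\ \mu\text{s}}{1\ \text{ms}} = 20\%
\]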
Detailed Cluster Architecture
A cell is 1-to-64 entities (servers or cells) clustered around a cell manager
The cellular process is self-regulating and creates a B-64 Tree
Diagram: cells of servers, each grouped around its cell manager (M), stacked in a tree beneath the head node
The Internal Details
Diagram: every node pairs an xrootd server (with its ofs, odc, and olb components) with an olbd. The olbd daemons, running as manager (M) on redirectors and as server (S) on data servers, form the control network that carries resource information and file locations among managers, supervisors, and servers. Data clients use the data network: redirectors steer clients to the data, and data servers provide the data.
Schema Configuration
Redirectors (Head Node):
   olb.role manager
   olb.port port
   olb.allow hostpat
   ofs.redirect remote
   odc.manager host port
Supervisors (sub-redirector):
   olb.role supervisor
   olb.subscribe host port
   olb.allow hostpat
   ofs.redirect remote
   ofs.redirect target
Data Servers (end-node):
   olb.role server
   olb.subscribe host port
   ofs.redirect target
Example: SLAC Configuration
Diagram: data servers kan01, kan02, kan03, kan04, ... kanxx sit behind redirectors kanrdr01 and kanrdr02; client machines address the pair through the single name kanrdr-a, and the rest of the cluster remains hidden details from their point of view
Configuration File
The same file is used on every node:

if kanrdr-a+
   olb.role manager
   olb.port 3121
   olb.allow host kan*.slac.stanford.edu
   ofs.redirect remote
   odc.manager kanrdr-a+ 3121
else
   olb.role server
   olb.subscribe kanrdr-a+ 3121
   ofs.redirect target
fi
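The SLAC file above defines only manager and server roles. Purely as a sketch of how the supervisor directives from the Schema Configuration slide could slot in, here is a hypothetical branch for a supervisor host; the host name kansup01, and the idea that the servers beneath it would then subscribe to it rather than to kanrdr-a+, are our assumptions, not part of the SLAC setup:

if kansup01
   olb.role supervisor
   olb.subscribe kanrdr-a+ 3121
   olb.allow host kan*.slac.stanford.edu
   ofs.redirect remote
   ofs.redirect target
fi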
Potential Simplification?
As written today:

if kanrdr-a+
   olb.role manager
   olb.port 3121
   olb.allow host kan*.slac.stanford.edu
   ofs.redirect remote
   odc.manager kanrdr-a+ 3121
else
   olb.role server
   olb.subscribe kanrdr-a+ 3121
   ofs.redirect target
fi

A possible shorter form:

olb.port 3121
all.role manager if kanrdr-a+
all.role server if !kanrdr-a+
all.subscribe kanrdr-a+
olb.allow host kan*.slac.stanford.edu

Is the simplification really better? We're not sure; what do you think?
Adding Fault Tolerance
Diagram: every node type runs a paired xrootd and olbd; fault tolerance is provided differently at each level of the tree
Manager (Head Node): Fully Replicate
Supervisor (Intermediate Node): Hot Spares
Data Server (Leaf Node): Data Replication, Restaging, Proxy Search*
*xrootd has built-in proxy support today; discriminating proxies will be available in a near-future release.
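To tie the fully replicated head node back to the SLAC example: both kanrdr01 and kanrdr02 sit behind the kanrdr-a name, so the single shared file can drive a redundant manager pair. A commented sketch of that reading follows; treating the "+" suffix as "every host behind this name" and the use of "#" comments are our assumptions, not spelled out in the slides:

# Every host behind kanrdr-a matches here and runs as a manager
if kanrdr-a+
   olb.role manager
   olb.port 3121
   olb.allow host kan*.slac.stanford.edu
   ofs.redirect remote
   odc.manager kanrdr-a+ 3121
else
   # Data servers subscribe to the same name, so they can report to
   # whichever redirector is reachable
   olb.role server
   olb.subscribe kanrdr-a+ 3121
   ofs.redirect target
fi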
Conclusion
High performance data access systems achievable
The devil is in the details
High performance and clustering are synergistic
Allows unique performance, usability, scalability, and recoverability characteristics
Such systems produce novel software architectures
Challenges
Creating applications that capitalize on such systems
Opportunities
Fast low cost access to huge amounts of data to speed discovery
Acknowledgements
Fabrizio Furano, INFN Padova
Client-side design & development
Principal Collaborators
Alvise Dorigo (INFN), Peter Elmer (BaBar), Derek Feichtinger (CERN),
Geri Ganis (CERN), Guenter Kickinger (CERN), Andreas Peters (CERN),
Fons Rademakers (CERN), Gregory Sharp (Cornell), Bill Weeks (SLAC)
Deployment Teams
FZK, DE; IN2P3, FR; INFN Padova, IT; CNAF Bologna, IT;
RAL, UK; STAR/BNL, US; CLEO/Cornell, US; SLAC, US
US Department of Energy
Contract DE-AC02-76SF00515 with Stanford University