Transcript BASF

Distributed BELLE
Analysis Framework
HIGUCHI Takeo
Department of Physics, Faculty of Science, University of Tokyo
Representing dBASF Development Team
Introduction to B Factory at KEK
KEK-B
The accelerator is an asymmetric-energy e+ e- collider:
3.5 GeV/c for positrons
8.0 GeV/c for electrons
Design luminosity is 1.0 x 10^34 cm^-2 s^-1
KEK-B is now operated at ~5.0 x 10^32 cm^-2 s^-1
BELLE Experiment
The goal of the BELLE experiment is to study CP violation in B meson decays
The experiment is in progress at KEK
BELLE Detector
SVD: Precise vertex detection
CDC: Track momentum reconstruction; particle ID with dE/dx
ACC: Aerogel Cherenkov counter for particle ID
TOF: Particle ID and trigger
ECL: Electromagnetic calorimeter for e- and gamma reconstruction
EFC: Electromagnetic calorimeter for luminosity measurement
KLM: Muon and K_L detection
Current Event Reconstruction
Computing Environments
Event reconstruction is performed on 8 SMP machines
Ultra Enterprise servers x 7, equipped with 28 CPUs
Total CPU power is 1,200 SPECint95
Sharing CPUs with user analysis jobs
MC production is done on a PC farm (16 units of 4-CPU P3 500 MHz servers)
Reconstruction Speed
15 Hz/server
70 Hz/server with L4 (5.0 x 10^32 cm^-2 s^-1)
Necessity for System Upgrade
In the Future
We will have higher luminosity
200 Hz after L4 (1.0 x 10^34 cm^-2 s^-1)
Data size may increase further
Possibly due to background
This will cause a shortage of computing power
We will need 10 times the current computing power when DST reproduction and user analysis activities are taken into account
Next Computing System
Low Cost Solution
We will build a new computing farm with sufficient computing power
Computing servers will consist of
~50 units of 4-CPU PC servers with Linux
~50 units of 4-CPU SPARC servers with Solaris
Total CPU power will be 12,000 SPECint95
Configuration of Next System
[Diagram] The tape library (tape I/O: 24 MB/s) and the file server (FS) connect through a switch to the Sun I/O servers; a Gigabit switch links the I/O servers to 100Base-T hubs, behind which sit the PC servers.
Current Analysis Framework
BELLE AnalysiS Framework (B.A.S.F.)
B.A.S.F. supports event-by-event parallel processing on SMP machines, hiding the parallel processing nature from users (illustrated by the sketch below)
B.A.S.F. is currently used widely in BELLE, from DST production to user analysis
We are developing an extension to B.A.S.F. that utilizes many PC servers connected via a network, to be used in the next computing system
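Below is a toy illustration of the event-by-event parallelism idea only, not the actual B.A.S.F. implementation: the user module is written as if it handles one event at a time, and the framework farms the events out to the CPUs.

    import multiprocessing as mp

    def user_module(event):
        # Stand-in for a user analysis module: it only ever sees one event;
        # the parallelism is handled entirely outside of it.
        return sum(event)

    if __name__ == "__main__":
        events = [[i, i + 1, i + 2] for i in range(1000)]   # dummy event records
        with mp.Pool(processes=4) as pool:                  # one worker per CPU
            results = pool.map(user_module, events)         # events farmed out
        print(len(results), "events processed")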
New Analysis Framework
New Framework Should Provide:
Event-by-event parallel processing capability over the network
Resource usage optimization
Maximize total CPU usage
Draw maximum I/O rate from tape servers
Capability of handling purposes other than DST production
User analysis, Monte Carlo simulation, or anything else
Applicable to parallel processing at university sites
dBASF – Distributed B.A.S.F.
Super-framework for B.A.S.F.
Link of dBASF Servers
[Diagram] Job clients initiate and terminate B.A.S.F. daemons on the allocated PC and SPARC servers; I/O daemons feed the B.A.S.F. processes; the daemons report their resource usage to the Resource manager, which dynamically changes the node allocation among the running jobs.
Communication among Servers
Functionality
Call a function on a remote node by sending a message
Shared memory expanded over the network
Implementation
NSM – Network Shared Memory
Home-grown product
Originally used for BELLE DAQ
Based on TCP and UDP
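As a rough sketch of the two communication patterns (a remote function call carried by a message, and shared data replicated over the network), the Python fragment below may help; the class and method names are illustrative only and do not reflect the actual NSM API.

    import socket, struct, threading

    class RemoteCall:
        """Call a function on a remote node by sending a small TCP message."""
        def __init__(self, host, port):
            self.addr = (host, port)
        def call(self, func_name, payload=b""):
            msg = func_name.encode().ljust(32, b"\0") + payload
            with socket.create_connection(self.addr) as s:
                s.sendall(struct.pack("!I", len(msg)) + msg)   # length-prefixed message

    class SharedBoard:
        """Crude stand-in for 'shared memory expanded over the network':
        every write is broadcast over UDP so each node keeps a local copy."""
        def __init__(self, port, peers):
            self.data, self.peers, self.port = {}, peers, port
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.sock.bind(("", port))
            threading.Thread(target=self._listen, daemon=True).start()
        def write(self, key, value):
            self.data[key] = value
            for host in self.peers:
                self.sock.sendto(f"{key}={value}".encode(), (host, self.port))
        def _listen(self):
            while True:
                pkt, _ = self.sock.recvfrom(4096)
                key, value = pkt.decode().split("=", 1)
                self.data[key] = value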
Components of dBASF
dBASF Client
User interface
Accepts from user:
B.A.S.F. execution script
Number of CPUs to be allocated for analysis
Asks the Resource manager to allocate B.A.S.F. daemons
The Resource manager returns the allocated nodes
Initiates B.A.S.F. execution on the allocated nodes
Waits for completion
Notified by the B.A.S.F. daemons when the job ends (see the sketch below)
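A minimal sketch of this client flow, with purely illustrative names (ResourceManager, BasfDaemon, and the notification queue are stand-ins, not the real dBASF interfaces):

    import queue

    class ResourceManager:
        def __init__(self, free_nodes):
            self.free_nodes = list(free_nodes)
        def request_allocation(self, n_cpus):
            nodes, self.free_nodes = self.free_nodes[:n_cpus], self.free_nodes[n_cpus:]
            return nodes

    class BasfDaemon:
        def __init__(self, node, notify_q):
            self.node, self.notify_q = node, notify_q
        def initiate(self, script):
            # The real daemon forks B.A.S.F. processes; here we finish at once
            # and notify the client that this node is done.
            self.notify_q.put(self.node)

    def run_dbasf_job(script, n_cpus, manager, daemons, notify_q):
        nodes = manager.request_allocation(n_cpus)   # ask the Resource manager for nodes
        for n in nodes:
            daemons[n].initiate(script)              # start B.A.S.F. on every node
        finished = set()
        while finished != set(nodes):                # wait for end notifications
            finished.add(notify_q.get())
        return nodes

    if __name__ == "__main__":
        q = queue.Queue()
        daemons = {n: BasfDaemon(n, q) for n in ["pc01", "pc02", "pc03"]}
        manager = ResourceManager(daemons.keys())
        print(run_dbasf_job("myjob.basf", 2, manager, daemons, q))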
Components of dBASF
Resource Manager
Collects resource usage from the B.A.S.F. daemons through NSM shared memory
CPU load
Network traffic rate
Monitors idling B.A.S.F. daemons of each dBASF session
Increases/decreases the number of allocated B.A.S.F. daemons dynamically when a better assignment is discovered
Components of dBASF
B.A.S.F. Daemon
Runs on each computing server
Accepts an 'initiation request' from the dBASF client and forks B.A.S.F. processes
Reports resource usage to the Resource manager through NSM shared memory
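A minimal sketch of the daemon's two duties; the "basf" executable name and the write_to_shared_memory callback are assumptions used only for illustration.

    import os, subprocess, time

    def handle_initiation(basf_script, n_processes):
        # Fork one B.A.S.F. process per requested CPU on this server
        # ("basf" is a placeholder for the real executable).
        return [subprocess.Popen(["basf", basf_script]) for _ in range(n_processes)]

    def report_resource_usage(write_to_shared_memory, interval=10):
        # Periodically publish the CPU load into the NSM shared memory.
        while True:
            load1, _, _ = os.getloadavg()
            write_to_shared_memory({"cpu_load": load1, "timestamp": time.time()})
            time.sleep(interval)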
Components of dBASF
I/O Daemon
Reads tapes or disk files and distributes events through the network to the B.A.S.F. processes running on each node
Collects processed data from B.A.S.F. through the network and writes it to tapes or disk files
In the case of Monte Carlo event generation, the event generator output is distributed to the B.A.S.F. processes where the detector simulation is running
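A rough sketch of the distribution side only, treating fixed-size chunks of a disk file as "events" and dispatching them round-robin over TCP connections; the real daemon reads tapes and speaks the B.A.S.F. event format.

    import itertools, socket

    def distribute_events(event_file, worker_addrs, event_size=64 * 1024):
        # Connect to the B.A.S.F. process on every computing node.
        socks = [socket.create_connection(addr) for addr in worker_addrs]
        targets = itertools.cycle(socks)                        # round-robin dispatch
        with open(event_file, "rb") as f:
            for event in iter(lambda: f.read(event_size), b""):
                next(targets).sendall(event)
        for s in socks:
            s.close()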
Components of dBASF
Miscellaneous Servers
Histogram server
Merges histogram data accumulated on each node
Output server
Collects standard output from each node and saves it to a file
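A minimal sketch of the histogram-merging step, assuming each node ships its histograms as a mapping from histogram id to a list of bin contents (a simplification of the real format):

    def merge_histograms(per_node_hists):
        """per_node_hists: list of {hist_id: [bin contents]} dictionaries, one per node."""
        merged = {}
        for hists in per_node_hists:
            for hid, bins in hists.items():
                acc = merged.setdefault(hid, [0.0] * len(bins))
                for i, content in enumerate(bins):
                    acc[i] += content        # bin-by-bin sum over nodes
        return merged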
Resource Management
Best Performance
Achieved when the total I/O rate is at its maximum with the minimum number of CPUs
Dynamic Load Balancing
CPU bound:
Increase the number of computing servers so that the I/O speed becomes maximal
I/O bound:
Decrease the number of computing servers so as not to change the I/O speed
Resource Management
Load Balancing
When n_now CPUs are assigned to a job, the best number of CPUs to assign, n_new, is given by:

n_new = n_now x (IO limit rate / IO current rate) x (CPU analysis spent time / CPU real spent time)
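A small numeric check of this rule (the variable names are mine):

    def best_cpu_count(n_now, io_limit_rate, io_current_rate,
                       cpu_analysis_time, cpu_real_time):
        # Scale up while the tape I/O limit is not yet reached (CPU bound);
        # scale down when CPUs sit idle waiting for data (I/O bound).
        ratio = (io_limit_rate / io_current_rate) * (cpu_analysis_time / cpu_real_time)
        return max(1, round(n_now * ratio))

    # CPU bound: I/O at half the limit, CPUs fully busy -> double the nodes.
    print(best_cpu_count(10, 24.0, 12.0, 600.0, 600.0))   # 20
    # I/O bound: I/O at the limit, CPUs busy only half the time -> halve the nodes.
    print(best_cpu_count(10, 24.0, 24.0, 300.0, 600.0))   # 5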
Resource Management
[Diagram] A Job Client initiates B.A.S.F. on its allocated nodes; the nodes report resource usage to the Resource manager, which repeatedly checks whether the current allocation is the best one and increases or decreases the number of nodes assigned to each job; the Job Client terminates B.A.S.F. when the job ends.
Data Flow
[Diagram] I/O daemons on the SPARC servers read the raw data and distribute events over TCP/IP to the B.A.S.F. processes running on the PC servers; the processed data, STDOUT, and histograms flow back over TCP/IP to the I/O and miscellaneous servers and are written out.
Status
System tests are in progress on the BELLE PC farm, which consists of 16 units of 4-CPU P3 550 MHz servers
The node-to-node communication framework has been developed and is being tested
The resource management algorithm is under study
Basic speed tests of network data transfer have been finished
Fast Ethernet: point-to-point, 1-to-n
Gigabit Ethernet: point-to-point, 1-to-n
The new computing system will be available in March 2001
Summary
We will build a computing farm of 12,000 SPECint95 with Linux PC and Solaris servers to solve the impending shortage of computing power
We have begun to develop a management scheme for the computing system by extending the current analysis framework
We have developed the communication framework and are studying the resource management algorithm