Off-line data processing grid


JOINT INSTITUTE FOR NUCLEAR RESEARCH
OFF-LINE DATA PROCESSING GRID-SYSTEM MODELLING
FOR NICA
Nechaevskiy A.
Dubna, 2012
AGENDA






• NICA off-line data processing parameters
• Tasks for simulation
• Simulation platform choice
• Model efficiency estimation
• First results
• Conclusion
DATA PROCESSING SCHEMA FOR NICA MPD
NICA data flow parameters:
• high event generation rate (up to 6 kHz);
• about 1000 particles are produced in a central Au-Au collision;
• a file with simulated detector data for 100,000 events occupies about 5 TB.
MPD parameters
№  Parameter                                                     Value
1  Data collection rate from all detector components             4.7 GB/s
2  Duration of the data-taking period per year                   120 days
3  Event rate at the output of the facility                      6 kHz
4  Dead time after an event                                      1 cycle (50%)
5  Average number of tracks per event                            500
6  Average number of particle collisions                         20
7  Average number of bytes per collision                         45
8  Average event reconstruction time on a 1 kSI2K processor      2 s
SOURCE DATA
Specification of requirements for NICA experiment off-line data processing
№  Requirement                                                   Value
1  Number of events to process per year                          1.87e10
2  Total data volume to store per year                           8.4 PB
3  Total disk space per year with RAID6 storage (+25%)           10 PB
4  Minimum number of CPUs in the grid needed to reconstruct
   events at the rate they are collected, assuming 7000
   astronomical hours of operation per year                      1480
5  Number of grid sites                                          20
6  Minimum data transfer speed from JINR to the sites            2.5 Gb/s
The expected number of processed events is about 19 billion per year. With a
data rate from the sensors of 4.7 GB/s, the total volume of source data is
estimated at 30 PB annually, or 8.4 PB after processing.
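The slides quote the sizing figures without the arithmetic behind them. The sketch below is one assumed reconstruction, not the author's stated method: it treats MPD parameters 5 to 7 as multiplicative factors of the event size and reads the yearly work time as 7000 hours. All names in the code are hypothetical; only the numeric inputs come from the tables.

```python
# Sizing check for the requirements table. Inputs are the values quoted on the
# slides; the formulas are an assumed reconstruction, not the author's method.

EVENTS_PER_YEAR = 1.87e10      # events to process per year
TRACKS_PER_EVENT = 500         # MPD parameter 5
COLLISIONS = 20                # MPD parameter 6 (used as a per-track factor: assumption)
BYTES_PER_COLLISION = 45       # MPD parameter 7
RECO_SECONDS_PER_EVENT = 2.0   # MPD parameter 8, on a 1 kSI2K processor
WORK_HOURS_PER_YEAR = 7000     # assumed reading of the yearly work time
RAID6_OVERHEAD = 0.25          # +25% disk space for RAID6

PB = 1e15  # decimal petabyte, in bytes


def event_size_bytes():
    """Event size as the product of parameters 5, 6 and 7 (assumption)."""
    return TRACKS_PER_EVENT * COLLISIONS * BYTES_PER_COLLISION


def stored_volume_pb():
    """Yearly volume of processed data, in PB."""
    return EVENTS_PER_YEAR * event_size_bytes() / PB


def disk_space_pb():
    """Yearly disk space including the RAID6 overhead, in PB."""
    return stored_volume_pb() * (1 + RAID6_OVERHEAD)


def cpus_needed():
    """CPUs needed to reconstruct a year of events within the work time."""
    cpu_seconds = EVENTS_PER_YEAR * RECO_SECONDS_PER_EVENT
    return cpu_seconds / (WORK_HOURS_PER_YEAR * 3600)


if __name__ == "__main__":
    print(f"event size  : {event_size_bytes() / 1e3:.0f} kB")
    print(f"stored data : {stored_volume_pb():.2f} PB/year")
    print(f"disk space  : {disk_space_pb():.2f} PB/year")
    print(f"CPUs needed : {cpus_needed():.0f}")
```

Under these assumptions the sketch gives about 450 kB per event, about 8.4 PB per year and roughly 1484 CPUs, which is close to the 8.4 PB and 1480 CPUs quoted in the table. The RAID6 figure comes out near 10.5 PB against the quoted 10 PB.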
GRID FOR EXPERIMENTS
A hierarchical grid infrastructure with Tier 0/1/2 computing centers is
already used by the ALICE experiment and others.
The PANDA experiment also intends to use it.
Questions for simulation
• Grid infrastructure architecture?
• Number of resource centers?
• Amount of resources?
• Network capacity?
• Resource distribution between user groups?
• etc.
Urgency
Recommendations and specifications are needed for building the NICA grid infrastructure
SIMULATION TASKS
Task 1.
Task 2.
GRIDSIM SIMULATION PACKAGE
• Allows simulation of various classes of heterogeneous resources, users,
applications and brokers;
• There is no restriction on the number of jobs that can be sent to a resource;
• The network capacity between resources can be set;
• Supports simulation of static and dynamic schedulers;
• Statistics for all or selected operations can be recorded;
• Implemented in Java;
• Configuration files are used to set the simulation parameters;
• Source code is available;
• Many examples of GridSim usage are available;
• A multilevel architecture makes it easy to add new components.
http://www.gridbus.org/gridsim/
GridSim Architecture
MODEL EFFICIENCY ESTIMATION
Parameters of the model efficiency:
a) Average network load per day [%]
b) Number of running/waiting jobs
c) Number of CPUs in use
d) Total data transferred per hour [GB]
e) Total storage usage [%]
f) Cluster usage [%]
g) Refused CPUs [%]
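As an illustration of how two of these parameters, (b) running/waiting jobs and (f) cluster usage, could be derived from a job log, here is a minimal sketch. The `JobRecord` layout, the function names and the sample log are hypothetical; the slides do not specify how the model records or computes these values.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    submit: float  # time the job entered the queue
    start: float   # time the job began running
    end: float     # time the job finished


def running_and_waiting(jobs, t):
    """Count jobs that are running and jobs that are queued at time t."""
    running = sum(1 for j in jobs if j.start <= t < j.end)
    waiting = sum(1 for j in jobs if j.submit <= t < j.start)
    return running, waiting


def cluster_usage_percent(jobs, n_cpus, horizon):
    """Busy CPU time in [0, horizon] as a share of available CPU time.

    Assumes each job occupies exactly one CPU and starts at t >= 0.
    """
    busy = sum(min(j.end, horizon) - min(j.start, horizon) for j in jobs)
    return 100.0 * busy / (n_cpus * horizon)


# Hypothetical log: three single-CPU jobs on a two-CPU cluster.
log = [JobRecord(0, 0, 4), JobRecord(0, 0, 2), JobRecord(0, 2, 6)]

if __name__ == "__main__":
    print(running_and_waiting(log, t=1))
    print(cluster_usage_percent(log, n_cpus=2, horizon=6))
```

Sampling `running_and_waiting` over a range of times gives the kind of time series that a waiting/running jobs plot would show.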
MODEL COMPONENTS
1. User interface (edit/add model)
2. MySQL database to save simulation parameters
3. Simulation system
4. Results visualization tools
TEST SIMULATION
Clusters: 1 Machine 2 CPUs
Users: 1
Jobs: 10
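A scenario of this size can be reproduced by hand, which is useful when verifying a simulator. The sketch below is plain Python, not GridSim code. It reads the configuration as one machine with two CPUs and assumes first-come-first-served scheduling, all ten jobs submitted at time 0, and an equal job length of 10 time units. None of these assumptions is stated on the slide, and the function name is hypothetical.

```python
import heapq


def simulate_fcfs(job_lengths, n_cpus):
    """Return a (start, end) pair per job under first-come-first-served.

    All jobs are assumed to be submitted at time 0; each job takes the
    CPU that becomes free earliest.
    """
    free_at = [0.0] * n_cpus  # time at which each CPU becomes free
    heapq.heapify(free_at)
    schedule = []
    for length in job_lengths:
        start = heapq.heappop(free_at)
        end = start + length
        heapq.heappush(free_at, end)
        schedule.append((start, end))
    return schedule


# Test scenario: 10 jobs on 2 CPUs; the job length of 10 units is assumed.
schedule = simulate_fcfs([10.0] * 10, n_cpus=2)
makespan = max(end for _, end in schedule)

if __name__ == "__main__":
    for i, (start, end) in enumerate(schedule, 1):
        print(f"job {i:2d}: start={start:5.1f} end={end:5.1f}")
    print("makespan:", makespan)
```

With equal job lengths the jobs run in pairs, so the expected behaviour of a full simulator on the same input is easy to check against this schedule.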
EXAMPLE OF GRAPHIC REPRESENTATION OF THE SIMULATION RESULTS
1. Waiting and Running Jobs
2. Average Clusters Usage
DONE!




• A web interface for editing the model has been created, with one test
scenario of grid operation;
• Key parameters for evaluating the model have been identified;
• Results visualization tools have been created;
• The simulation has passed the debugging and verification phase.
CONCLUSION
The model will make it possible:
• to evaluate different architectures (parameters) of the data
processing system by changing only the input data;
• to compare various technical solutions and choose the optimal one,
using a library of scenarios (data processing, architectures, etc.).
Plans:
• development of the user interface;
• debugging the model in a client-server architecture;
• development of scenario sets for grid system operation;
• user editing and addition of grid model parameters.
Questions?