Off-line data processing grid
JOINT INSTITUTE FOR NUCLEAR RESEARCH
OFF-LINE DATA PROCESSING GRID-SYSTEM MODELLING
FOR NICA
Nechaevskiy A.
1
Dubna, 2012
AGENDA
NICA off-line data processing parameters
Tasks for simulation
Simulation platform choice
Model efficiency estimation
First results
Conclusion
2
DATA PROCESSING SCHEMA FOR NICA MPD
NICA data flow parameters:
• high event generation rate (up to 6 kHz),
• about 1000 particles are produced in a central Au-Au collision,
• a file with simulated detector data for 100,000 events occupies about 5 TB.
MPD parameters
№   Parameter                                                     Value
1   Data collection rate from all detector components             4.7 GB/s
2   Duration of the data-taking period per year                   120 days
3   Event rate at the detector output                             6 kHz
4   Dead time after an event                                      1 cycle (50%)
5   Average number of tracks per event                            500
6   Average number of particle collisions                         20
7   Average number of bytes per collision                         45
8   Average event reconstruction time on a 1 kSI2K processor      2 s
3
SOURCE DATA
Requirements for NICA experiment off-line data processing
№   Requirement                                                   Value
1   Number of events to process per year                          1.87 × 10^10
2   Total data volume to store per year                           8.4 PB
3   Total disk space per year with RAID6 storage (+25%)           10 PB
4   Minimum number of CPUs in the grid needed to reconstruct      1480
    data at the data-taking rate, assuming 7000 astronomical
    hours of operation per year
5   Number of grid sites                                          20
6   Minimum data transfer rate from JINR to the sites             2.5 Gb/s
About 19 billion events are expected to be processed per year. With a
data rate of 4.7 GB/s from the sensors, the total volume of raw data is
estimated at 30 PB per year, or 8.4 PB after processing.
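The stored volume and the CPU count in the requirements table can be cross-checked from the MPD parameters. The sketch below is only a back-of-the-envelope check, not the author's calculation: it assumes that the event size is the product of MPD parameters 5, 6 and 7 (tracks per event × collisions × bytes per collision), and the variable names are illustrative.

```python
# Rough cross-check of the requirements table (assumptions noted inline).
SECONDS_PER_HOUR = 3600

events_per_year = 1.87e10       # requirements table, row 1
tracks_per_event = 500          # MPD parameter 5
collisions_per_track = 20       # MPD parameter 6 (assumed to be per track)
bytes_per_collision = 45        # MPD parameter 7
reco_seconds_per_event = 2.0    # MPD parameter 8, on a 1 kSI2K processor
working_hours_per_year = 7000   # requirements table, row 4
raid6_overhead = 0.25           # requirements table, row 3

# Assumed event size: product of parameters 5, 6 and 7.
event_size_bytes = tracks_per_event * collisions_per_track * bytes_per_collision

# Volume to store per year, in PB (1 PB = 1e15 bytes).
stored_pb = events_per_year * event_size_bytes / 1e15

# Disk space including the RAID6 overhead.
disk_pb = stored_pb * (1 + raid6_overhead)

# CPUs needed so that reconstruction keeps up with data taking.
cpu_seconds_needed = events_per_year * reco_seconds_per_event
cpus = cpu_seconds_needed / (working_hours_per_year * SECONDS_PER_HOUR)

print(f"event size: {event_size_bytes} bytes")
print(f"stored volume: {stored_pb:.2f} PB")
print(f"disk with RAID6: {disk_pb:.2f} PB")
print(f"CPUs: {cpus:.0f}")
```

Under these assumptions the stored volume comes out at about 8.4 PB and the CPU count at about 1480, in line with rows 2 and 4; the disk figure comes out slightly above the rounded 10 PB in row 3.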
4
GRID FOR EXPERIMENTS
A hierarchical grid infrastructure with Tier 0/1/2 computing
centers is already used by the ALICE experiment and
others.
The PANDA experiment also intends to use it.
Questions for simulation
• Grid infrastructure architecture?
• Number of resource centers?
• Amount of resources?
• Capacity of the network?
• Resource distribution between user groups?
• etc.
Relevance
Recommendations and specifications for building the NICA grid infrastructure
5
SIMULATION TASKS
Task 1.
Task 2.
6
GRIDSIM SIMULATION PACKAGE
• Simulates various classes of heterogeneous
resources, users, applications and brokers;
• There is no restriction on the number of jobs that can
be submitted to a resource;
• Network capacity between resources can be set;
• Supports simulation of static and
dynamic schedulers;
• Statistics for all or selected operations can be
recorded;
• Implemented in Java;
• Configuration files are used to set simulation
parameters;
• Source code is available;
• Many examples of GridSim usage are available;
• Multilayer architecture makes it easy to add new
components
http://www.gridbus.org/gridsim/
GridSim Architecture
7
MODEL EFFICIENCY ESTIMATION
Model efficiency parameters:
a) Average network load per day [%]
b) Number of running/waiting jobs
c) Number of CPUs in use
d) Total data transferred per hour [GB]
e) Total storage usage [%]
f) Cluster usage [%]
g) Refused CPUs [%]
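As an illustration of how two of these parameters (running/waiting jobs and cluster usage) might be derived from a job log, here is a minimal sketch. It does not use the GridSim API; the `JobRecord` type, the function names, and the sample log are hypothetical, and each job is assumed to occupy exactly one CPU.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical log entry: times in arbitrary simulation units."""
    submit: float  # when the job entered the queue
    start: float   # when it started running
    end: float     # when it finished


def running_and_waiting(jobs, t):
    """Count jobs running and jobs waiting in the queue at time t."""
    running = sum(1 for j in jobs if j.start <= t < j.end)
    waiting = sum(1 for j in jobs if j.submit <= t < j.start)
    return running, waiting


def cluster_usage_percent(jobs, n_cpus, horizon):
    """Share of available CPU time used in [0, horizon], in percent.

    Assumes one CPU per job; run time beyond the horizon is not counted.
    """
    busy = sum(min(j.end, horizon) - min(j.start, horizon) for j in jobs)
    return 100.0 * busy / (n_cpus * horizon)


# Made-up sample log: three jobs on a two-CPU cluster.
log = [JobRecord(0, 0, 4), JobRecord(0, 0, 6), JobRecord(1, 4, 9)]

print(running_and_waiting(log, t=2))
print(cluster_usage_percent(log, n_cpus=2, horizon=10))
```

The other parameters (network load, data transfers, storage usage) could be aggregated from transfer and storage logs in the same way.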
8
MODEL COMPONENTS
1. User interface (edit/add model)
2. MySQL database to save simulation parameters
3. Simulation system
4. Results visualization tools
9
TEST SIMULATION
Clusters: 1 (one machine with 2 CPUs)
Users: 1
Jobs: 10
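A scenario of this size can be reproduced by hand. The sketch below is a plain Python toy model, not GridSim code: it assumes that all 10 jobs are submitted at time 0 and run in FIFO order on 2 identical CPUs, and the job durations are made up for illustration.

```python
import heapq


def simulate_fifo(durations, n_cpus):
    """Run jobs (all submitted at t=0) in FIFO order on n_cpus identical CPUs.

    Returns a list of (start, end) times, one pair per job.
    """
    # Min-heap of the times at which each CPU becomes free.
    free_at = [0.0] * n_cpus
    heapq.heapify(free_at)
    schedule = []
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available CPU
        end = start + d
        heapq.heappush(free_at, end)
        schedule.append((start, end))
    return schedule


# Hypothetical durations for the 10 jobs, in arbitrary time units.
durations = [4.0, 2.0, 3.0, 1.0, 5.0, 2.0, 4.0, 3.0, 1.0, 2.0]
n_cpus = 2

schedule = simulate_fifo(durations, n_cpus)
makespan = max(end for _, end in schedule)
usage = 100.0 * sum(durations) / (n_cpus * makespan)

print(f"makespan: {makespan}")
print(f"cluster usage: {usage:.2f} %")
```

The per-job start and end times in `schedule` are the kind of data from which the waiting/running job counts and the cluster usage can be plotted.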
10
EXAMPLE OF GRAPHIC REPRESENTATION OF THE SIMULATION RESULTS
1. Waiting and Running Jobs
2. Average Clusters Usage
11
DONE!
A web interface for editing the model has been created, with one test
scenario of grid operation;
Key parameters for evaluating the model have been identified;
Results visualization tools have been created;
The simulation has passed the debugging and verification phase.
12
CONCLUSION
The model will make it possible:
to evaluate different architectures (parameters) of the data
processing system by changing only the input data;
to compare various technical solutions and choose the optimal one,
using a library of scenarios (data processing, architectures,
and others).
Plans:
• further development of the user interface;
• debugging the model in a client-server architecture;
• development of sets of grid operation scenarios;
• user editing and addition of grid model parameters
13
Questions?
14