ppt - monarc 2 - California Institute of Technology

Download Report

Transcript ppt - monarc 2 - California Institute of Technology

National Center for Information Technology
MONARC 2
- distributed systems simulation -
Corina Stratan
Ciprian Mihai Dobre
California
POLITEHNICA
Institute of Technology
University of Bucharest
The Goals of the Project
• To perform realistic simulation and modelling of large
scale distributed computing systems, customised for
specific large scale HEP applications.
• To provide a design framework to evaluate the
performance of a range of possible computer systems, as
measured by their ability to provide the physicists with the
requested data in the required time, and to optimise the
cost.
• To narrow down a region in this parameter space in which
viable models can be chosen by any of the future LHC-era
experiments.
• To offer a dynamic and flexible simulation environment.
LHC Computing: Different from
Previous Experiment Generations
One of the four LHC
detectors (CMS)
online system
multi-level trigger
filter out background
reduce data volume
Raw recording rate 0.1 – 1 GB/sec
3 - 8 PetaBytes / year
Off-Line LHC Computing
Data Analysis
Geographical dispersion: of people and resources
Complexity: the detector and the LHC environment;
Scale: ~100 times more processing power; Petabytes per year of data
CMS
1800 Physicists
150 Institutes
32 Countries
VERY LARGE SCALE DISTRIBUTED SYSTEM AND IT HAS TO PROVIDE
(NEAR) REAL-TIME DATA ACCESS FOR ALL THE PARTICIPANTS
Experiment
Regional Center Hierarchy
(Worldwide Data Grid)
~PBytes/sec
Online System
Bunch crossing per 25 nsecs.
Event is ~1 MByte in size
100–1000 MBytes/sec
Tier 0 +1
~0.6 - 2.5 Gbits/sec
FNAL Center
Italy Center
UK Center
Tier 1
Offline Farm,
CERN Computer
France Center
~2.4 Gbits/sec
Tier 2
Tier2 Center
Tier2 Center
Tier2 Center
Tier2 Center
Tier2 Center
~622 Mbits/sec
Tier 3
InstituteInstitute Institute
~0.25TIPS
Physics data cache
Workstations
Institute
100 - 1000
Mbits/sec
Tier 4
Physicists work on analysis “channels”.
Processing power:
~200,000 of today’s fastest PCs
Simulation Models
 The simulation model:
 abstracts the components of the real system and their
interactions
 must be equivalent to the simulated system
 Simulation models:
 continuous time - the system is described by a set of
differential equations
 discrete time - the state changes only at certain time
moments
 In MONARC: one of the discrete time models (Discrete
Event Simulation – DES); the events represent important
activities from the system, managed with the aid of an
internal clock
A Global View for Modelling
Computing Models
Specific Components
Basic Components
LAN
DB
Simulation
Engine
WAN
CPU
Job
Scheduler
MetaData
Catalog
Distributed Scheduler
Analysis
Jobs
MONITORING
REAL
Systems
Testbeds
Regional Center Model
DB
Index
REGIONAL CENTER
Link
Port
DB
Server
Link
Port
LAN
FARM
Link
Port
AJob AJob
AJob ...
CPU
WAN
DB
Server
Link
Port
AJob AJob
AJob ...
CPU
Link
Port
AJob AJob
AJob ...
CPU
Job
Scheduler
Job
Activity
Job
Activity
Job
Activity
The Simulation Engine
 Provides the multithreading mechanism for the simulation
 The entities with time dependent behavior are mapped on
“active objects”
 In the simulation engine: management of active objects and
events
 Thread reusability (thread pool)
Activity
Scheduler
AJob
Job
Task
Farm
JobScheduler
WorkerThread
Engine
CPUUnit
Event
EventQueue
Pool
Multitasking Processing Model
Concurrent running tasks share resources (CPU, memory, I/O)
“Interrupt” driven scheme:
For each new task or when one task is finished, an interrupt is
generated and all “processing times” are recomputed.
It provides:
An efficient mechanism
to simulate multitask
processing.
Handling of concurrent
jobs with different
priorities.
An easy way to apply
different load balancing
schemes.
Engine tests
Processing a TOTAL of 100 000 simple jobs in
1 , 10, 100, 1000, 2 000 , 4 000, 10 000 CPUs
(number of CPUs = number of parallel threads):
10000
2X2.4 GHz, Linux
2X450MHz Solaris
2X3GHz, Windows
Time [s]
1000
100
10
1
10
100
1000
10000
100000
No of THREADS
more tests:
http://monalisa.cacr.caltech.edu/MONARC/
Job Scheduling
 Dynamically loadable modules for each regional
center
 Basic job scheduler: assigns the jobs to CPUs from
the local farm
 More complex schedulers: allow job migration
between regional centers
CPU FARM
Site A
JobScheduler
Dynamically loadable
module
Centralized Scheduling
CPU FARM
CPU FARM
Site A
Site B
JobScheduler
JobScheduler
GLOBAL
Job Scheduler
Distributed Scheduling
– market model –
CPU FARM
Site A
CPU FARM
COST
Site B
JobScheduler
JobScheduler
Request
DECISION
JobScheduler
CPU FARM
Site A
Example: simple distributed scheduling
 Very simple scheduling
algorithm, based on
searching the center with
the minimum load
 We simulated the activity of
4 regional centers
 When all the centers are
heavily loaded, the number
of job transfers grows
unnecessarily
Network Model
Farm
LinkPort
Simulated network
components
Farm
WA
N
WA
N
LAN
LinkPort
LAN
Simulated
local traffic
Simulated
inter-regional
traffic
LAN/WAN Simulation Model
Link
Node
Link
Node
LAN
Node
ROUTER
Node
“Interrupt” driven simulation :
for each new message an
interrupt is created and for all
the active transfers the speed
and the estimated time to
complete the transfer are
recalculated.
Internet
Connections
LAN
Node
Node
ROUTER
Link
Node
LAN
Node
Node
Continuous Flow between events !
An efficient and realistic way to simulate concurrent transfers
having different sizes / protocols.
Network Model
The TCP/IP layers are closely followed
Application Layer
Network
Job
Protocol:
TCPProtocol
UDPProtocol
Message
LinkPort,
LAN,
WAN
Transport Layer
Internet Layer
Network Access
Layer
Data Model
Database Index
Client
Mapare
LinkPort
Task Database Entity
Database
Database
Database
DContainer
DContainer
Database Server
Mass Storage
DContainer
Data Model
Generic Data
Container
Size
Event Type
Event Range
Access Count
 INSTANCE
FILE
Data Base
FTP Server
Node
DB Server
Export / Import
META DATA Catalog
Replication Catalog
Network
FILE
NFS Server
Custom Data
Server
Data Model
Data Processing
JOB
Data Request
META DATA Catalog
Replication Catalog
Data Container
Data Container
Data Container
Data Container
Select from the options
JOB
List Of IO Transactions
Activities: Arrival Patterns
A flexible mechanism to define the Stochastic
process of how users perform data processing tasks
Dynamic loading of “Activity” tasks, which are threaded objects and
are controlled by the simulation scheduling mechanism
Physics Activities
Injecting “Jobs”
Regional Centre Farm
Job
Job
Job
Activity
Activity
for( int k =0; k< jobs_per_group; k++) {
Job job = new Job( this, Job.ANALYSIS, "TAG”, 1, events_to_process);
farm.addJob(job ); // submit the job
sim_hold ( 1000 ); // wait 1000 s
}
Each “Activity” thread generates data processing jobs
These dynamic objects are used to model the users behavior
Output of the simulation
Node
DB
Simulation
Engine
Output Listener
Filters
Router
Output Listener
Filters
GRAPHICS
Log Files
EXCEL
User C
Any component in the system can generate generic results objects
Any client can subscribe with a filter and will receive the results it is
Interested in .
VERY SIMILAR structure as in MonALISA . We will integrate soon
The output of the simulation framework into MonaLISA
Conclusions
 Modelling and understanding current systems, their
performance and limitations, is essential for the design of the
large scale distributed processing systems. This will require
continuous iterations between modelling and monitoring
 Simulation and Modelling tools must provide the functionality
to help in designing complex systems and evaluate different
strategies and algorithms for the decision making units and
the data flow management.
 For future development: efficient distributed scheduling algorithms,
data replication, more complex examples.
http://monalisa.cacr.caltech.edu/MONARC