Simulation_Emulation


Simulation, Emulation
Sathish Vadhiyar
Sources / Credits: MicroGrid, SimGrid
Importance
Needed for characterizing the behavior of future Grid systems
During development, to test methodologies under repeatable conditions
For simulating “what if” scenarios
Needed when no real grid is available
Needed in India
MicroGrid
Enables systematic design and evaluation of middleware, applications, and network services for the computational Grid
Provides an environment for scientific and repeatable experiments
MicroGrid can also predict performance on future and hypothetical topologies
Features
Enables Globus applications to run without change by virtualizing the execution environment, providing the illusion of a virtual Grid
Uses global virtual time to preserve simulation accuracy
Provides basic resource simulation models for computing, memory and networking
Virtualizing resources
Uses a mapping table to map virtual IP addresses to physical IP addresses
Intercepts relevant library calls
gethostbyname
bind, send, receive
Process creation – processes are created through Globus resource management functions
Users log in directly to a physical host and submit jobs to virtual hosts
The Globus gatekeeper, job managers and client hosts run on virtual hosts
All socket interfaces and information services are also virtualized
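The virtual-to-physical translation can be sketched as two table lookups. This Python illustration is not MicroGrid's code: the host names, addresses, and the functions virtual_gethostbyname and to_physical are hypothetical, and the interception of the real C library calls is modeled here as plain wrapper functions.

```python
# Hypothetical mapping tables (illustrative values only).
VIRTUAL_HOSTS = {            # virtual host name -> virtual IP address
    "vhost1.grid": "10.0.0.1",
    "vhost2.grid": "10.0.0.2",
}
VIRTUAL_TO_PHYSICAL = {      # virtual IP address -> physical IP address
    "10.0.0.1": "192.168.1.10",
    "10.0.0.2": "192.168.1.10",   # two virtual hosts may share one physical host
}


def virtual_gethostbyname(name: str) -> str:
    """Stand-in for an intercepted gethostbyname: the application only sees virtual IPs."""
    try:
        return VIRTUAL_HOSTS[name]
    except KeyError:
        raise LookupError(f"unknown virtual host: {name}") from None


def to_physical(virtual_ip: str) -> str:
    """Translate a virtual IP to the physical IP used underneath bind/send/receive."""
    return VIRTUAL_TO_PHYSICAL[virtual_ip]


if __name__ == "__main__":
    vip = virtual_gethostbyname("vhost2.grid")
    print(vip, "->", to_physical(vip))  # 10.0.0.2 -> 192.168.1.10
```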
Global Coordination
Simulation rate (SR) – the rate at which the simulator runs relative to real time, which determines how much of the real CPU the simulation uses
Each resource has a minimum feasible simulation rate, set by the desired virtual resource and the actual capacity of the physical resource
The minimum SR that is feasible for all resources (the largest of the per-resource minimums) is the fastest rate at which the simulation can run in a functionally correct manner
Simulation Rate Examples
Given physical = 1 GHz and virtual = 2 GHz, the simulation rate cannot be less than 2; otherwise the virtual CPU would need more than 100% of the physical CPU!
Given physical = 2 GHz and virtual = 1 GHz, the simulation rate cannot be less than 0.5, by the same argument.
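The rule behind these examples can be written as a small calculation. This Python sketch assumes, based on the examples, that SR is the ratio of real time to virtual time, so a virtual resource of speed v hosted on a physical resource of speed p needs a fraction v / (p * SR) of it. The function names are hypothetical, and applying the same rule to every resource type is an assumption.

```python
def min_rate(virtual_speed: float, physical_speed: float) -> float:
    """Smallest SR at which the physical resource can host the virtual one
    without needing more than 100% of its capacity."""
    return virtual_speed / physical_speed


def global_rate(resources) -> float:
    """SR feasible for every (virtual, physical) pair: the largest per-resource minimum."""
    return max(min_rate(v, p) for v, p in resources)


def cpu_fraction(virtual_speed: float, physical_speed: float, sr: float) -> float:
    """Fraction of the physical CPU a virtual CPU needs when run at simulation rate sr."""
    return virtual_speed / (physical_speed * sr)


print(min_rate(2.0, 1.0))                      # 2.0  (virtual 2 GHz on physical 1 GHz)
print(min_rate(1.0, 2.0))                      # 0.5  (virtual 1 GHz on physical 2 GHz)
print(global_rate([(2.0, 1.0), (1.0, 2.0)]))   # 2.0
print(cpu_fraction(600, 1000, 2))              # 0.3
```

Taking the maximum rather than the minimum follows directly from the examples: an SR below any single resource's minimum would oversubscribe that resource.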
More
Another parameter (say x) determines how fast time progresses in the application
The greater the value, the faster time progresses in the application
Calls like gettimeofday and select use these parameters to return appropriately adjusted times
Thus a virtual CPU twice the speed of the real CPU, with simulation rate = 2 and x = 2, will give ½ the time for a code fragment
Resource Simulation
Simulation rate is divided equally across all processes executing on the physical host
The resulting fractions are then enforced by the local MicroGrid CPU scheduler
It is a scheduler daemon that uses signals to allocate local physical CPU capacity to local MicroGrid tasks
How to ensure CPU usage
Naïve strategy – calculate the CPU usage allowed for processes on the virtual machine and give all processes the same share
E.g., if (virtual / physical) is 25% and 2 processes are running on the virtual machine, assign each process 10 milliseconds every 80 milliseconds
Not good. Desired properties:
An application process should always be ready to run if it has not used its available CPU slots
A computation-intensive process should be able to fully utilize the quota for the virtual machine
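The arithmetic of the naïve strategy, and why it falls short, fits in a few lines of Python. The function name and the blocked-process scenario are illustrative assumptions.

```python
def naive_slice_ms(vm_fraction: float, n_procs: int, period_ms: float) -> float:
    """Naive strategy: split the virtual machine's CPU share equally and statically."""
    return vm_fraction * period_ms / n_procs


slice_ms = naive_slice_ms(0.25, 2, 80)
print(slice_ms)  # 10.0

# Weakness: if one of the two processes blocks (e.g., waiting on I/O), its slice is
# wasted, and the compute-bound process still gets only 10 ms per 80 ms period.
runnable = 1
print(slice_ms * runnable / 80)  # 0.125, instead of the full 0.25 quota
```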
MicroGrid CPU Controller
One CPU controller runs on each physical host
Uses SIGSTOP and SIGCONT to stop and continue processes
Consists of 3 parts
Live process interception – whenever a virtual process is created or destroyed on MicroGrid (via main() or exit()), the CPU controller traps it and updates its process table
CPU usage monitoring – every sliding window, the controller reads the CPU usage of the processes in its process table from /proc
Process scheduling – the controller calculates the CPU usage of each virtual host in a time window. If the effective cycles exceed the speed of the virtual host, the controller sends SIGSTOP to all processes of that virtual host; otherwise, it wakes the processes up and lets them proceed
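A minimal Python sketch of the monitoring and scheduling parts, assuming Linux, where /proc/<pid>/stat exposes user and system CPU time in clock ticks (jiffies). The function names, the per-window tick budget (allowed_ticks), and the bookkeeping of ticks at the start of each window are hypothetical; this is not MicroGrid's controller, and process interception is omitted.

```python
import os
import signal


def read_cpu_ticks(pid: int) -> int:
    """Return utime + stime (clock ticks) for a process, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name (field 2) is parenthesized and may contain spaces, so parse
    # only what follows the last ')'. Field 3 (state) then becomes index 0.
    fields = data.rpartition(")")[2].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return utime + stime


def decide(used_ticks: int, allowed_ticks: int) -> signal.Signals:
    """Stop a virtual host's processes once their usage exceeds its budget for the window."""
    return signal.SIGSTOP if used_ticks > allowed_ticks else signal.SIGCONT


def schedule_window(vhosts, prev_ticks, allowed_ticks):
    """One scheduling step (hypothetical bookkeeping).

    vhosts:        virtual host name -> list of pids in the process table
    prev_ticks:    pid -> ticks observed at the start of the window
    allowed_ticks: virtual host name -> ticks the host may use per window
    """
    for host, pids in vhosts.items():
        used = sum(read_cpu_ticks(pid) - prev_ticks[pid] for pid in pids)
        sig = decide(used, allowed_ticks[host])
        for pid in pids:
            os.kill(pid, sig)
```

Only decide() is free of side effects; schedule_window() sends real signals, so it should only be applied to processes the controller owns.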
CPU Controller
Determining sliding window size
E - design accuracy error
p - scaled virtual machine speed (fraction of
physical CPU)
w - the sliding window size in jiffies
n - the available jiffies in a sliding window
n should satisfy: $w = \mathrm{round}(n/p)$ and $\left| 1 - \frac{n}{p\,w} \right| < E$
Find the smallest n that satisfies $\left| 1 - \frac{n/p}{\mathrm{round}(n/p)} \right| < E$, then find w.
Example
Real machine – 1 GHz
Virtual machine – 600 MHz
Simulation rate – 2
E – 0.05
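p = 600/1000 = 60%; with simulation rate 2, this is 30% of the real CPU, so p = 0.3 and n/p = 10n/3
Find the smallest n that satisfies $\left| 1 - \frac{10n/3}{\mathrm{round}(10n/3)} \right| < 0.05$
Try n = 1, 2, 3…
n = 1: 10/3 ≈ 3.33, round = 3, error ≈ 0.11 (too large)
n = 2: 20/3 ≈ 6.67, round = 7, error ≈ 0.048 < 0.05
Here, n = 2
w = 7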
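The search in this example is easy to mechanize. A minimal Python sketch (the function name is hypothetical) that tries n = 1, 2, 3, … and returns the first (n, w) pair meeting the accuracy bound; it rounds halves upward, which is an assumption since the slides do not say how ties are rounded.

```python
import math


def window_size(p: float, err: float, max_n: int = 10_000):
    """Smallest n (available jiffies per window) with |1 - (n/p)/round(n/p)| < err,
    together with the window size w = round(n/p) in jiffies."""
    for n in range(1, max_n + 1):
        w = math.floor(n / p + 0.5)  # round half up
        if w > 0 and abs(1 - (n / p) / w) < err:
            return n, w
    raise ValueError("no window found within max_n")


# Example from the slide: p = 600 MHz / 1000 MHz / simulation rate 2 = 0.3, E = 0.05
print(window_size(0.3, 0.05))  # (2, 7)
```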
Network Simulation
Based on MaSSF – a scalable packet-level network simulator that supports direct execution of unmodified applications
Uses a distributed simulation engine
Can model many kinds of network protocols, including TCP/IP, UDP and user-defined protocols
Intercepts live network streams at the socket level using a wrapper library called WrapSocket
Live traffic interception
Scalability
Given a network topology and the available cluster nodes, MaSSF partitions the virtual network into multiple blocks and assigns each block to a cluster node
Every cluster node runs a discrete-event simulation engine
Events are exchanged among simulation engines. Cluster nodes also need to synchronize periodically. Both involve communication traffic
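At its core, each engine is an event loop over a time-ordered queue. The following Python sketch shows a minimal sequential discrete-event engine; the class and handler names are hypothetical, and the cross-engine event exchange and periodic synchronization that MaSSF's distributed engine performs are not modeled.

```python
import heapq
import itertools


class Engine:
    """A minimal sequential discrete-event simulation engine."""

    def __init__(self):
        self.now = 0.0
        self._queue = []               # heap of (time, seq, handler, args)
        self._seq = itertools.count()  # tie-breaker: same-time events run in FIFO order

    def schedule(self, delay, handler, *args):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), handler, args))

    def run(self, until=float("inf")):
        # Always process the earliest pending event, advancing the simulated clock to it.
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(*args)


# Usage: a packet crossing two links with 2 ms and 3 ms latency.
log = []
eng = Engine()


def arrive(node, pkt):
    log.append((eng.now, node, pkt))
    if node == "router":
        eng.schedule(3.0, arrive, "dst", pkt)


eng.schedule(2.0, arrive, "router", "p1")
eng.run()
print(log)  # [(2.0, 'router', 'p1'), (5.0, 'dst', 'p1')]
```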
Hence network mapping has to be done carefully to minimize communication of simulation events between simulation engine nodes and to achieve load balance across partitions
The network mapping problem is modeled as a graph partitioning problem: the number of simulation events on each link can be estimated and used to calculate its edge weight
Improving scalability
Graph partitioning for the network mapping problem
Input graph – network structure, with traffic information defining the edge weights
Constraints – the vertex weight (a weighted sum of computation and memory requirements) must be balanced across the simulation engine nodes
Objective – communication across partitions (edge cut) must be minimized
The partitioned network defines the mapping of simulated network nodes to physical resources
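The partitioning objective and constraint can be made concrete with a toy example. This Python sketch (all names hypothetical) evaluates the edge cut and load balance of a 2-way partition and finds the best one by brute force. Edge weights stand in for estimated simulation events per link, and vertex weights stand in for the combined computation and memory load. Brute force only works for tiny graphs; practical partitioners rely on heuristics, and the slides do not say which one MaSSF uses.

```python
from itertools import combinations


def edge_cut(edges, part):
    """Sum of weights of edges whose endpoints lie in different partitions."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def imbalance(vertex_w, part, k):
    """Heaviest partition load divided by the average load (1.0 = perfect balance)."""
    loads = [0.0] * k
    for v, w in vertex_w.items():
        loads[part[v]] += w
    return max(loads) / (sum(loads) / k)


def best_bisection(vertex_w, edges, max_imbalance=1.1):
    """Exhaustive 2-way partition: minimize edge cut subject to a balance limit.
    Returns (cut, part), or None if no partition satisfies the limit."""
    nodes = sorted(vertex_w)
    best = None
    for r in range(1, len(nodes)):
        for side in combinations(nodes, r):
            part = {v: (0 if v in side else 1) for v in nodes}
            if imbalance(vertex_w, part, 2) > max_imbalance:
                continue  # violates the load-balance constraint
            cut = edge_cut(edges, part)
            if best is None or cut < best[0]:
                best = (cut, part)
    return best


# Two busy LANs joined by a lightly used link.
vertex_w = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {("a", "b"): 10, ("c", "d"): 10, ("b", "c"): 1}
cut, part = best_bisection(vertex_w, edges)
print(cut, part)  # 1 {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```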
Real applications on MicroGrid
A lot more to do…
SimGrid
You know it
References / Sources / Credits
Xin Liu, Huaxia Xia, and Andrew Chien. Validating and Scaling the MicroGrid: A Scientific Instrument for Grid Dynamics. Journal of Grid Computing, to appear.
Song, Liu, Jakobsen, Bhagwan, Zhang, Taura, and Chien. The MicroGrid: a Scientific Tool for Modeling Computational Grids. Proceedings of SC2000.
Simgrid: A Toolkit for the Simulation of Application Scheduling. CCGrid 01.
JUNK!
Calls
Setting up the simulated application and computation environment
Simulating the application execution once the tasks have been assigned to resources – SG_simulate
Scheduling algorithms
Based on performance prediction – SG_getPrediction
Implementation of a scheduling decision – SG_scheduleTaskOnResource
Also supports runtime scheduling algorithms, for which control must be returned from SG_simulate to the scheduling algorithm itself. For a work queue, control is returned after each task completes. For other algorithms, the user can specify how long a simulation should run before control is returned. SG_unscheduleTask can be used to modify scheduling decisions for tasks. Many API calls help the user keep track of past scheduling decisions.
SG_getclock returns the virtual global time
Post-mortem analysis is possible: using resource usage and task start and end times, one can compute various metrics and study how the simulation behaved
SimGrid-2 paper
Simulations allow
Repeatable experiments
Exploration of a wide range of application and resource scenarios
Simgrid
For developing and evaluating scheduling algorithms
Objectives – good usability, fast simulations, configurable, tunable and extensible simulations, scalability
Aims toward simulation standardization
Simgrid components
Agent – implements a scheduling algorithm; contains code, private data and a location
Location – where an agent runs; defined by location, mailboxes for communicating with other agents, and private data
Task – defined by amount of computation, data size and private data
Path – routing abstraction
Channel – abstraction representing communication between agents
Simulation program steps
Definition of code for each agent
Modeling the application
Done with MSG_Task_Get, MSG_Task_Put, MSG_Task_Execute
Creation of resources
Modeling the physical platform
Hosts, links, routing table paths
MSG_host_create, MSG_link_create, MSG_routing_table_set
Creation and allocation of agents to locations
Application deployment
MSG_process_create
Starting the simulation
MSG_main
SimGrid supports resource sharing through different models
FIFO
FRFO
SHARED – fair sharing or priority-based sharing
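To illustrate how the sharing model changes simulated completion times, here is a Python sketch for a single resource. It assumes that FIFO means tasks use the resource one at a time in arrival order, and that SHARED with fair sharing means all unfinished tasks split the capacity equally. FRFO and priority-based sharing are not modeled, and the function names are hypothetical.

```python
def fifo_completion(amounts, capacity):
    """Tasks use the resource one at a time, in arrival order."""
    t, done = 0.0, []
    for a in amounts:
        t += a / capacity
        done.append(t)
    return done


def fair_share_completion(amounts, capacity):
    """All unfinished tasks share the capacity equally (processor sharing)."""
    remaining = dict(enumerate(amounts))
    t, done = 0.0, [0.0] * len(amounts)
    while remaining:
        share = capacity / len(remaining)
        i_min = min(remaining, key=remaining.get)  # next task to finish
        dt = remaining[i_min] / share
        t += dt
        for i in list(remaining):
            remaining[i] -= share * dt
            if remaining[i] <= 1e-9:  # finished (tolerance for rounding error)
                done[i] = t
                del remaining[i]
    return done


print(fifo_completion([4.0, 2.0], 1.0))        # [4.0, 6.0]
print(fair_share_completion([4.0, 2.0], 1.0))  # [6.0, 4.0]
```

With fair sharing the short task finishes earlier and the long one later, while the last completion time (6.0) is the same under both models.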
Challenges
Enabling users to construct large simulated platforms
Simulating the complex network contention behavior of applications executing on these platforms
Modeling grid topologies
Simgrid allows users to import platform descriptions obtained with Effective Network View (ENV).
Thus SimGrid uses ENV and the Network Weather Service (NWS) to instantiate platform models that represent realistic platforms, both in terms of topology and in terms of traffic.
Bandwidth sharing models
The algorithm first considers all bottleneck links and the flows on these links
It assigns bandwidth to the flows on these links inversely proportional to their RTTs
It then reduces the available bandwidth on the other links traversed by these flows
The process is repeated until bandwidth has been assigned to all flows
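One way to read the bandwidth sharing algorithm above is as progressive filling with per-flow weights 1/RTT: repeatedly find the link whose remaining capacity per unit weight is smallest, fix the rates of its flows, and subtract those rates from the other links they cross. The Python sketch below follows that reading. It treats every link as shared, the names are hypothetical, and SimGrid's actual implementation may differ in detail.

```python
def share_bandwidth(capacity, routes, rtt):
    """RTT-weighted progressive filling.

    capacity: link -> bandwidth
    routes:   flow -> list of links the flow traverses
    rtt:      flow -> round-trip time
    Returns flow -> assigned bandwidth.
    """
    remaining = dict(capacity)
    active = set(routes)
    rate = {}
    while active:
        # Find the bottleneck: the link offering the least bandwidth per unit weight.
        best_link, best_unit = None, float("inf")
        for link, cap in remaining.items():
            weight = sum(1.0 / rtt[f] for f in active if link in routes[f])
            if weight > 0 and cap / weight < best_unit:
                best_link, best_unit = link, cap / weight
        if best_link is None:
            break  # remaining flows traverse no links
        # Flows on the bottleneck get bandwidth inversely proportional to their RTT.
        fixed = [f for f in active if best_link in routes[f]]
        for f in fixed:
            rate[f] = best_unit / rtt[f]
            for link in routes[f]:
                remaining[link] -= rate[f]
            active.discard(f)
    return rate


# A shared backbone (10 units) plus a local link (2 units) used only by flow B.
print(share_bandwidth({"bb": 10.0, "lanB": 2.0},
                      {"A": ["bb"], "B": ["bb", "lanB"]},
                      {"A": 1.0, "B": 2.0}))  # {'B': 2.0, 'A': 8.0}
```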
Simgrid makes it possible to define two types of links: those where bandwidth is shared and those where it is not
Useful for modeling grid topologies in which local networks are connected by a shared backbone
GridSim
Individual resource brokers and central schedulers
Simjava
Simulations in Simjava contain a number of entities, each running in its own thread
Entities call simulation functions (sim_schedule, sim_hold, sim_wait), which generate events
Every event has a source entity and a destination entity
NPB with MicroGrid
Scheduling quanta length and Modeling Accuracy
Internal Performance
NPB run on a real Alpha cluster of 4 machines and on MicroGrid with a CPU fraction of 4%
The periodic execution times were obtained every 1 second for the Alpha cluster and every _? second(s) for MicroGrid
Close match, with a root mean square percentage difference of 3.08%
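The slide does not give the formula behind the 3.08% figure. A common definition is sketched below in Python. It assumes that the real-cluster times are the reference and that each sample's difference is expressed as a percentage of the reference value. The function name and the sample values are illustrative, not the NPB measurements.

```python
import math


def rms_percent_diff(reference, measured):
    """Root mean square of per-sample percentage differences relative to `reference`."""
    if not reference or len(reference) != len(measured):
        raise ValueError("series must be non-empty and of equal length")
    sq = [((m - r) / r * 100.0) ** 2 for r, m in zip(reference, measured)]
    return math.sqrt(sum(sq) / len(sq))


print(rms_percent_diff([10.0, 10.0], [11.0, 9.0]))  # 10.0
```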