Performance of Parallel Logic Event Simulation on PC Cluster
Proceedings of the 7th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'04), 1087-4089/04 $20.00 © 2004 IEEE
Presenter: Che-wei Chang
Outline
Introduction
Experiment Environments
Parallel Simulation Algorithm
The Parallel Simulation Experiment
Experimental Results
Conclusion
Introduction
This work studies the effects of PC-cluster communication latencies on the performance of parallel discrete event simulation.
The simulation is synchronized by the time warp mechanism, and the problem domain is partitioned for the best parallel performance.
Experiment Environments
An 8-node Beowulf cluster is used for both systems:
600MHz Pentium III processor
256MB RAM
On one system:
100Mb/s Ethernet cards
On the other system:
33MHz PCI Myrinet network cards
1MB of SRAM
DMA engine
Parallel Simulation Algorithm (1/5)
Parallel simulation performance is mainly governed by:
Defining the dependence and independence of the events.
Selecting the best-performing parallel partitioning scheme for the model.
The synchronization methods:
Conservative synchronization
Optimistic synchronization
Parallel Simulation Algorithm (2/5)
Dependent objects located on different processors communicate with each other by time-stamped messages.
In the optimistic simulation technique, the time warp and roll-back processes are used to automatically maintain the concurrency of the model.
Parallel Simulation Algorithm (3/5)
Each processor acts independently of the others while keeping track of the messages coming into its input message queue.
Each processor receives and acts upon the messages in the input queue one by one, in the order of their time-stamps, until the queue is exhausted.
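The per-processor queue discipline above can be sketched with a priority queue keyed on time-stamps. This is a minimal illustration, not the paper's implementation; the message format `(timestamp, payload)` is an assumption.

```python
import heapq

# Sketch of a processor's input message queue: messages arrive in any
# order, but are consumed lowest time-stamp first until the queue is empty.
class InputQueue:
    def __init__(self):
        self._heap = []  # min-heap ordered by time-stamp

    def receive(self, timestamp, payload):
        heapq.heappush(self._heap, (timestamp, payload))

    def drain(self, handle):
        # Act upon messages one by one in time-stamp order.
        while self._heap:
            timestamp, payload = heapq.heappop(self._heap)
            handle(timestamp, payload)

q = InputQueue()
q.receive(30, "c")
q.receive(10, "a")
q.receive(20, "b")
order = []
q.drain(lambda ts, p: order.append(ts))
# order is [10, 20, 30]
```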
Parallel Simulation Algorithm (4/5)
Figure 1. An example of optimistic simulation.
Parallel Simulation Algorithm (5/5)
To be able to roll back, the time warp mechanism saves:
The state of the objects
The input messages of the objects
After restoring a previous state, the simulation starts simulating forward again.
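The checkpoint-and-restore behavior described above can be sketched as follows. This is a hypothetical, simplified model (the single `"count"` state field and the event format are assumptions), not the simulator's actual data structures.

```python
import copy

# Sketch of time warp bookkeeping: before each event is processed, the
# object's state is checkpointed so a straggler message with an earlier
# time-stamp can trigger a roll-back to a previous simulated time.
class TimeWarpObject:
    def __init__(self, state):
        self.state = state
        self.clock = 0
        self._saved = []  # (event time-stamp, prior clock, prior state)

    def process(self, timestamp, delta):
        # Save the pre-event state so this step can be undone later.
        self._saved.append((timestamp, self.clock, copy.deepcopy(self.state)))
        self.clock = timestamp
        self.state["count"] += delta

    def roll_back(self, to_time):
        # Undo every event whose time-stamp is later than to_time;
        # afterwards the simulation starts simulating forward again.
        while self._saved and self._saved[-1][0] > to_time:
            _, self.clock, self.state = self._saved.pop()

obj = TimeWarpObject({"count": 0})
obj.process(10, 1)
obj.process(20, 1)
# A straggler with time-stamp 15 arrives: undo the event at time 20.
obj.roll_back(15)
# obj.clock is 10 and obj.state["count"] is 1 again
```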
The Parallel Simulation Experiment (1/4)
The IP router model for the "next generation network" includes:
A packet processing engine
A switch architecture
The packet processing engine is responsible for:
QoS
Packet routing
The Parallel Simulation Experiment (2/4)
The QoS function is responsible for:
Performing flow detection
Managing the packet sending order by scheduling
The packet routing is responsible for:
Transmitting data within the network according to a routing scheme, including:
Routing table searches
Modifying the packet headers in accordance with the routing
The Parallel Simulation Experiment (3/4)
Our switch model is a 3x3 configuration with distributed schedulers, input buffers, and a matrix switch array.
The internal messages are taken from a global event list and are processed in order.
The Parallel Simulation Experiment (4/4)
External messages received in the communication layer buffers are inserted into the global event list as internal events.
When a processor has no more internal messages to process, it sends out a null-message and gets all external events from the communication buffer.
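The event-handling step above can be sketched as a loop that drains the global event list, announces progress with a null-message, and then merges buffered external messages into the event list. The function and variable names here are illustrative assumptions, not the paper's code.

```python
import heapq

# Sketch of one simulation step: process internal events in time-stamp
# order; once the internal list is empty, send a null-message and pull
# external messages from the communication buffer into the event list.
def simulation_step(event_list, comm_buffer, send_null_message):
    while event_list:
        timestamp, event = heapq.heappop(event_list)
        # ... process the internal event here ...
    # No internal messages left: announce progress, then merge
    # external events so they become ordinary internal events.
    send_null_message()
    while comm_buffer:
        timestamp, event = comm_buffer.pop(0)
        heapq.heappush(event_list, (timestamp, event))

events = [(5, "a")]
heapq.heapify(events)
buffer = [(12, "x"), (8, "y")]
sent = []
simulation_step(events, buffer, lambda: sent.append("null"))
# events now holds the former external messages, ordered by time-stamp
```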
Experimental Results (1/3)
TR : real-time (execution time)
TM : overall message communication time
NM : number of messages
TL : per-message latency
NP : number of processors

SP_Theory = (TR - NM * TL) / NP = (TR - TM) / NP
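The expression above can be evaluated directly. The sketch below uses the 2-PE Myrinet row from Table 1 (TR = 581 s for the 1-PE run, NM = 1,812,011 messages); the per-message latency TL = 16 microseconds is an assumed value chosen only for illustration, not a measurement from the paper.

```python
# Evaluate SP_Theory = (TR - NM * TL) / NP as written on the slide.
def sp_theory(tr, nm, tl, np_):
    return (tr - nm * tl) / np_

# Assumed inputs: TR and NM from Table 1; TL is illustrative only.
sp = sp_theory(tr=581.0, nm=1_812_011, tl=16e-6, np_=2)
# sp is about 276.0
```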
Experimental Results (2/3)
Table 1. Non-rolling-back results from the Myrinet system

#PEs  Real-time (s)  Overall speed-up  # of messages  Theo. speed-up
1     581            -                 -              -
2     344            1.69              1,812,011      1.9
4     258            2.25              5,180,086      3.9
6     227            2.56              6,056,217      5.8
8     224            2.59              7,123,671      7.8
Table 2. Non-rolling-back results from the Ethernet system

#PEs  Real-time (s)  Overall speed-up  # of messages  Theo. speed-up
1     581            -                 -              -
2     563            1.03              1,812,011      1.9
4     888            0.65              5,180,086      3.8
6     962            0.60              6,056,217      5.7
8     1087           0.53              7,123,671      7.7
Experimental Results (3/3)
Table 3. Rolling-back results from the Myrinet system

#PEs  Real-time (s)  Overall speed-up  # of messages  Theo. speed-up
1     581            -                 -              -
2     405            1.31              1,812,011      1.6
4     371            1.57              5,180,086      2.2
6     302            1.92              6,056,217      3.3
8     266            2.18              7,123,671      5.0
Table 4. Rolling-back results from the Ethernet system

#PEs  Real-time (s)  Overall speed-up  # of messages  Theo. speed-up
1     581            -                 -              -
2     672            0.86              1,812,011      1.4
4     1099           0.53              5,180,086      1.6
6     1202           0.48              6,056,217      1.7
8     1334           0.44              7,123,671      1.8
Conclusion
Rolling back consumes much of the potential speed-up, so the faster the CPU, the harder it is to obtain parallel speed-up.
Algorithms that can avoid or minimize computational roll-back must be investigated with more effort.
Thanks for Your Attention.