Performance of Parallel Logic Event Simulation on PC Cluster
Proceedings of the 7th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'04), 1087-4089/04 $20.00 © 2004 IEEE
Presenter: Che-wei Chang
Outline
 Introduction
 Experiment Environments
 Parallel Simulation Algorithm
 The Parallel Simulation Experiment
 Experimental Results
 Conclusion
Introduction
 This work studies the effects of PC-cluster communication latencies on the performance of parallel discrete event simulation.
 The simulation synchronizes by the time warp mechanism, and the problem domain is partitioned for the best parallel performance.
Experiment Environments
 8-node Beowulf cluster; on both systems:
 600MHz Pentium III processor
 256MB RAM
 On one system:
 100Mb/s Ethernet cards
 On the other system:
 33MHz PCI Myrinet network cards
 1MB of SRAM
 DMA engine
Parallel Simulation Algorithm (1/5)
 Parallel simulation performance is mainly governed by:
 Defining which events are dependent and which are independent.
 Selecting the best-performing parallel partitioning scheme for the model.
 The synchronization methods:
 Conservative synchronization
 Optimistic synchronization
Parallel Simulation Algorithm (2/5)
 Dependent objects located on different processors communicate with each other by time-stamped messages.
 In the optimistic simulation technique, the time warp and roll-back processes are used to automatically maintain the concurrency of the model.
Parallel Simulation Algorithm (3/5)
 Each processor acts independently of the others while keeping track of the messages coming into its input message queue.
 Each processor receives and acts upon the messages in the input queue one by one, in the order of their time-stamps, until the queue is exhausted.
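The time-stamp-ordered processing just described can be sketched with a binary heap; `run_until_empty` and `handle` are illustrative names, not from the paper:

```python
import heapq

def run_until_empty(event_queue, handle):
    """Pop and process events in time-stamp order until the queue is empty.

    event_queue: a heap of (timestamp, event) pairs.
    handle: user-supplied event handler (hypothetical callback).
    """
    processed = []
    while event_queue:
        timestamp, event = heapq.heappop(event_queue)  # smallest time-stamp first
        handle(timestamp, event)
        processed.append(timestamp)
    return processed

# Usage: messages may arrive out of order but are consumed in time-stamp order.
queue = []
for ts, ev in [(5, "b"), (1, "a"), (3, "c")]:
    heapq.heappush(queue, (ts, ev))
order = run_until_empty(queue, lambda ts, ev: None)
# order == [1, 3, 5]
```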
Parallel Simulation Algorithm (4/5)
Figure 1. An example of optimistic simulation.
Parallel Simulation Algorithm (5/5)
 To be able to roll back, the time warp mechanism saves:
 The state of the objects
 The input messages of the objects
 After restoring a previous state, the simulation starts simulating forward again.
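A minimal sketch of the state-saving and roll-back steps above, assuming a simple counter as the object's state. Class and method names are illustrative; a full time warp implementation would also re-enqueue the rolled-back input messages, which is omitted here:

```python
class TimeWarpObject:
    """Illustrative time-warp object with state saving and roll-back."""

    def __init__(self):
        self.lvt = 0      # local virtual time
        self.state = 0    # simulation state (a plain counter here)
        self.saved = []   # checkpoints: (event timestamp, prior lvt, prior state)

    def process(self, timestamp, message):
        if timestamp < self.lvt:          # straggler message: roll back first
            self.rollback(timestamp)
        # save state before acting, so this event can be undone later
        self.saved.append((timestamp, self.lvt, self.state))
        self.lvt = timestamp
        self.state += message             # stand-in for real event handling

    def rollback(self, timestamp):
        # undo every event processed at or after the straggler's time-stamp;
        # the simulation then starts forward again from the restored state
        while self.saved and self.saved[-1][0] >= timestamp:
            _, self.lvt, self.state = self.saved.pop()

# Usage:
obj = TimeWarpObject()
obj.process(10, 1)
obj.process(20, 2)
obj.process(15, 1)   # straggler: the event at time 20 is rolled back
# obj.lvt == 15, obj.state == 2
```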
The Parallel Simulation Experiment (1/4)
 The IP router model for the "next generation network" includes:
 A packet processing engine
 A switch architecture
 The packet processing engine is responsible for:
 QoS
 Packet routing
The Parallel Simulation Experiment (2/4)
 The QoS component is responsible for:
 Performing flow detection
 Managing the packet sending order by scheduling
 The packet routing component is responsible for:
 Transmitting data within the network according to a routing scheme, including:
 Routing table searches
 Modifying the packet headers in accordance with the routing
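As an illustration of the two routing steps just listed, a longest-prefix lookup followed by a header update can be sketched as below. The routing table, prefixes, and interface names are hypothetical, not taken from the paper:

```python
import ipaddress

# Hypothetical routing table: prefix -> outgoing interface.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
    ipaddress.ip_network("0.0.0.0/0"): "eth0",   # default route
}

def route_packet(packet):
    """Routing table search (longest prefix match) plus header modification."""
    dst = ipaddress.ip_address(packet["dst"])
    best = max((n for n in ROUTES if dst in n), key=lambda n: n.prefixlen)
    packet["ttl"] -= 1              # modify the header, as in the slide
    packet["out_if"] = ROUTES[best]
    return packet

pkt = route_packet({"dst": "10.1.2.3", "ttl": 64})
# pkt["out_if"] == "eth2", pkt["ttl"] == 63
```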
The Parallel Simulation Experiment (3/4)
 Our switch model is a 3x3 configuration with distributed schedulers, input buffers, and a matrix switch array.
 The internal messages are taken from a global event list and are processed in order.
The Parallel Simulation Experiment (4/4)
 External messages received in the communication layer buffers are inserted into the global event list as internal events.
 When a processor has no more internal messages to process, it sends out a null-message and gets all external events from the communication buffer.
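The loop on this slide can be sketched as follows; `simulation_step` and its arguments are illustrative names, under the assumption that events are plain list entries:

```python
def simulation_step(internal_events, comm_buffer, send_null_message):
    """One iteration: drain internal events, announce progress with a
    null-message, then pull external events in as new internal events."""
    processed = []
    while internal_events:                  # process internal messages first
        processed.append(internal_events.pop(0))
    send_null_message()                     # no more internal work: notify peers
    internal_events.extend(comm_buffer)     # external events become internal
    comm_buffer.clear()
    return processed

# Usage:
internal = [1, 2]
buf = [3]
sent = []
done = simulation_step(internal, buf, lambda: sent.append("null"))
# done == [1, 2]; internal == [3]; sent == ["null"]
```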
Experimental Results (1/3)
 TR: real time
 TM: overall message communication time
 NM: number of messages
 TL: latency
 NP: number of processors
 SP_Theory = (TR - TM) / NP = (TR - NM × TL) / NP
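Transcribing the slide's formula directly (the transcript does not give the latency or processor-count values actually used, so the numbers below are purely illustrative):

```python
def theoretical_speedup(tr, nm, tl, np_):
    """SP_Theory = (TR - TM) / NP, with TM = NM * TL, as transcribed
    from the slide. All parameter values here are illustrative."""
    tm = nm * tl                 # overall message communication time
    return (tr - tm) / np_

# Illustrative numbers only (not the paper's measurements):
# theoretical_speedup(tr=100, nm=10, tl=1.0, np_=2) == 45.0
```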
Experimental Results (2/3)

Table 1. Non-rolling back results from the Myrinet system

#PEs | Real-time (s) | Overall speed-up | # of messages | Theo. speed-up
   1 |           581 |                - |             - |              -
   2 |           344 |             1.69 |     1,812,011 |            1.9
   4 |           258 |             2.25 |     5,180,086 |            3.9
   6 |           227 |             2.56 |     6,056,217 |            5.8
   8 |           224 |             2.59 |     7,123,671 |            7.8
Table 2. Non-rolling back results from the Ethernet system

#PEs | Real-time (s) | Overall speed-up | # of messages | Theo. speed-up
   1 |           581 |                - |             - |              -
   2 |           563 |             1.03 |     1,812,011 |            1.9
   4 |           888 |             0.65 |     5,180,086 |            3.8
   6 |           962 |             0.60 |     6,056,217 |            5.7
   8 |          1087 |             0.53 |     7,123,671 |            7.7
Experimental Results (3/3)

Table 3. Rolling back results from the Myrinet system

#PEs | Real-time (s) | Overall speed-up | # of messages | Theo. speed-up
   1 |           581 |                - |             - |              -
   2 |           405 |             1.31 |     1,812,011 |            1.6
   4 |           371 |             1.57 |     5,180,086 |            2.2
   6 |           302 |             1.92 |     6,056,217 |            3.3
   8 |           266 |             2.18 |     7,123,671 |            5.0
Table 4. Rolling back results from the Ethernet system

#PEs | Real-time (s) | Overall speed-up | # of messages | Theo. speed-up
   1 |           581 |                - |             - |              -
   2 |           672 |             0.86 |     1,812,011 |            1.4
   4 |          1099 |             0.53 |     5,180,086 |            1.6
   6 |          1202 |             0.48 |     6,056,217 |            1.7
   8 |          1334 |             0.44 |     7,123,671 |            1.8
Conclusion
 Rolling back consumes much of the speed-up, so the faster the CPU, the harder it is to obtain parallel speed-up.
 Algorithms that can avoid or minimize computational rolling back must be investigated with more effort.
Thanks for Your Attention.