Slide 1 - National Sun Yat


Presenter: Zong Ze-Huang
Rohit Sinha, Aayush Prakash, and Hiren D. Patel
17th Asia and South Pacific Design Automation Conference (ASP-DAC), 2012
1
2015/7/21
This work presents a methodology that parallelizes the simulation of
mixed-abstraction level SystemC models across multicore CPUs, and
graphics processing units (GPUs) for improved simulation
performance. Given a SystemC model, we partition it into processes
suitable for GPU execution and CPU execution. We convert the
processes identified for GPU execution into GPU kernels with
additional SystemC wrapper processes that invoke these kernels. The
wrappers enable seamless communication of events in all directions
between the GPUs and CPUs. We alter the OSCI SystemC simulation
kernel to allow parallel execution of processes. Hence, we co-simulate
in parallel the SystemC processes on multiple CPUs and the GPU
kernels on the GPUs, exploiting both the CPUs and GPUs for faster
simulation. We experiment with synthetic benchmarks and a set-top
box case study.
2
What’s the problem?
• SystemC’s reference implementation can’t exploit
parallel processing.
Proposed method
• A methodology that parallelizes the simulation of
mixed-abstraction level SystemC models across
multicore CPUs and GPUs.
3
 [2] A conservative approach to SystemC parallelization
 [3] parSC: synchronous parallel SystemC simulation on multi-core host architectures
 [8] Multi-core parallel simulation of system-level description languages
◦ [2], [3], [8]: conservative approach to parallel simulation
 [7] SCGPSim: a fast SystemC simulator on GPUs
◦ Parallelizes the simulation of SystemC RTL models on GPUs
 [9] Optimizing system models for simulation efficiency
◦ A technique that converts SC_THREADs to SC_METHODs by generating a finite state machine
 [6] Optimistic parallelisation of SystemC
 This paper
◦ Presents an opportunistic approach to the simulation of RTL and TL models
4
Delta cycle
Immediate cycle
Simulation cycle
5
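The delta cycle, immediate cycle, and simulation cycle above are phases of the SystemC scheduler: within one simulated timestamp, the kernel repeatedly evaluates runnable processes, commits signal updates, and wakes processes on delta notifications. A minimal plain-C++ sketch of that loop (hypothetical names, not the OSCI kernel API) is:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Minimal sketch of a SystemC-like scheduler loop. One simulation cycle
// at a timestamp may contain several delta cycles: evaluate all runnable
// processes, apply the pending signal updates, then wake the processes
// waiting on delta notifications (still at the same simulated time).
struct Scheduler {
    std::vector<std::function<void()>> runnable;        // evaluate phase
    std::vector<std::function<void()>> pending_updates; // update phase
    std::vector<std::function<void()>> delta_waiters;   // delta notifications
    int delta_cycles = 0;

    void run_one_timestep() {
        while (!runnable.empty()) {
            ++delta_cycles;
            // Evaluate: run every runnable process once.
            auto procs = std::move(runnable);
            runnable.clear();
            for (auto& p : procs) p();
            // Update: commit signal values written during evaluation.
            for (auto& u : pending_updates) u();
            pending_updates.clear();
            // Delta notification: woken processes run in the next delta
            // cycle of the same timestep.
            runnable = std::move(delta_waiters);
            delta_waiters.clear();
        }
    }
};
```

A process that writes a signal and wakes a delta-sensitive process therefore consumes two delta cycles before the timestep ends.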
 Method (SC_METHOD)
◦ Behaves like a function
◦ Triggered whenever a signal on its sensitivity list changes
 Thread (SC_THREAD)
◦ When called, keeps executing or waits for some event to occur
 Clocked Thread (SC_CTHREAD)
◦ An SC_CTHREAD process is registered with the process name and
the clock that triggers the process.
6
7
 CUDA is a parallel computing platform and programming model
created by NVIDIA and implemented by the graphics processing
units (GPUs) that they produce.
 CUDA code is compiled into instructions that are executed by
the GPU.
8
 Specification stage
◦ Designers use SystemC at RTL or TL abstraction to specify their design.
 GPU-Suitability Analysis stage
◦ SystemC processes are identified as suitable for GPU execution.
 Translation stage
◦ Translates processes a and b into their equivalent GPU kernels and SystemC
wrappers.
 Compilation stage
◦ Compiles the processes and links them together into one GPU-CPU binary.
 Simulation stage
◦ Takes the binary and co-simulates the SystemC design on multiple CPUs and GPUs.
9
10
 Converts SC_THREADs to SC_METHODs by generating a finite
state machine equivalent to the original SC_THREAD
specification.
11
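The SC_THREAD-to-SC_METHOD conversion can be illustrated with a plain C++ sketch (hypothetical names and code, not the paper's actual translator output): the thread body is cut at each wait() point, and each segment becomes one state of a finite state machine driven by a method that runs once per triggering event.

```cpp
#include <cassert>

// Hedged illustration of the SC_THREAD -> SC_METHOD conversion idea.
// Conceptual original thread body:
//   x = 1; wait(); x += 10; wait(); x += 100;  // then terminates
// Each call of method() executes the code up to the next wait() and
// records where to resume, so no thread stack needs to be preserved.
struct CounterFSM {
    enum State { S0, S1, S2, DONE } state = S0;
    int x = 0;

    void method() {  // invoked once per triggering event
        switch (state) {
        case S0: x = 1;    state = S1;   break; // segment before 1st wait()
        case S1: x += 10;  state = S2;   break; // segment before 2nd wait()
        case S2: x += 100; state = DONE; break; // tail after last wait()
        case DONE: break;                       // thread has terminated
        }
    }
};
```

Eliminating the suspended thread stack this way is what makes the process body expressible as a GPU kernel that can simply return at each wait point.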
 A Gpu_sc_process contains three types of statements:
• wait
• notify
• other SystemC statements
12
 wait():
◦ The wrapper process first fetches <e,v,t> from GPU global
memory.
◦ While the wrapper process waits, other GPU kernels and
processes on the CPUs continue execution.
◦ When the wrapper process resumes execution, <e,v,t> is
reset in the GPU memory.
 notify():
◦ Stores the <e,v,t> tuple in GPU global memory.
13
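The &lt;e,v,t&gt; handshake above can be sketched in plain C++ (field and function names are assumptions for illustration; real GPU global memory and the paper's wrapper code are stood in for by a shared struct): the GPU-side notify() stores the tuple, and the CPU-side wrapper fetches it, delivers the event to the SystemC world, then resets the slot.

```cpp
#include <cassert>

// Hypothetical stand-in for a tuple slot in GPU global memory.
struct EventTuple {
    int  e = -1;        // event id
    int  v = 0;         // notification kind (e.g. immediate/delta/timed)
    long t = 0;         // timestamp
    bool valid = false; // whether a notification is pending
};

// GPU-kernel side: notify() writes the <e,v,t> tuple into the slot.
void gpu_notify(EventTuple& slot, int e, int v, long t) {
    slot = EventTuple{e, v, t, true};
}

// CPU-wrapper side: fetch a pending tuple and reset the slot, so the
// wrapper can re-raise event e at time t in the SystemC kernel.
bool wrapper_fetch(EventTuple& slot, EventTuple& out) {
    if (!slot.valid) return false; // nothing pending: wrapper keeps waiting
    out = slot;
    slot = EventTuple{};           // reset <e,v,t> in "GPU memory"
    return true;
}
```

While wrapper_fetch reports nothing pending, other GPU kernels and CPU processes keep running, which is what makes the event exchange non-blocking in both directions.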
 12 Intel Xeon E5645 processors
 NVIDIA Tesla C2075 graphics card
 Linux 2.6.32-33 operating system
14
15
 Conclusions
• Presents a methodology that parallelizes mixed-abstraction
SystemC models across multicore CPUs and GPUs.
• Presents a translation algorithm that converts selected
SystemC processes to GPU CUDA kernels.
 My comments
◦ This paper helped me understand SystemC processes and the
simulator architecture.
◦ GPGPU is a powerful technique for improving execution
efficiency.
16