Large Scale Simulations - Computer Science & Engineering

RADICAL
HRL PROPRIETARY
Large Scale Simulations
HRL Shared Software Framework
GPU Computing cluster
Narayan Srinivasa
June 18, 2010
Aleksey Nogin
Work performed by HRL under DARPA contract HRL0011-09-C-001
Shared Software Infrastructure
• Infrastructure overview – three aspects:
– Legal: a “limited LGPL”-like agreement
– The General Public License (GPL) does not permit incorporating HRL code into
proprietary programs. Since the HRL code is a subroutine library, it is
more useful to permit linking proprietary applications (which will
be our partners’ code) with the library. This is allowed by the LGPL.
– Subversion server for sharing code
– The API and the software itself
• Summary of the latest:
– Legal agreement is “stuck” on some technicalities and it would take time to
resolve
• In the meantime, we will rely on existing subcontracts for HRL<->Sub sharing
– The subversion server “ExRep” is fully operational
• Already contains the HRL Shared Infrastructure code.
– The GPU cluster is fully operational
– We have ported our infrastructure to GPUs (full 1ms updates!)
– Most of the multi-GPU/multi-node code is written
• Some refactoring of the initialization and “glue” code still needed.
HRL Shared Source Agreement
• Terms (reminder):
– LGPL-style, but limited to “SyNAPSE Team Members” and “SyNAPSE purposes”
only
– “Shared Source” code can be modified and redistributed to any “SyNAPSE Team
Members”
– Object code has to be accompanied by source – or source can be placed in the
Subversion repository
– Code for separate pieces that only use the “shared source” infrastructure through its
APIs does not have to become a part of “shared source”
• You do not have to release your models to “shared source”
• Currently “stuck” on export restriction technicalities
– Would take time to resolve
• Unfortunately our legal turnaround is very slow
– For now, we will rely on existing subcontracts for 2-way HRL ↔ Sub sharing
• Disable Sub ↔ Sub sharing not covered by subcontracts:
– Provide a Shared area with read-only access for non-HRL users
– Separate areas for those who want to share with HRL
Subversion Repository
• The subversion server “ExRep” is fully operational
– Already contains the HRL Shared Infrastructure code.
• You have to agree to the ExRep Terms and Conditions to get access
– This is not SyNAPSE-specific and is separate from the subcontracts and the Shared
Source Agreement
– Agreement binds you as an ExRep user, not your Institution
• E.g. you promise not to share your account credentials with others
– Aleksey emailed all prospective users a copy of the Agreement
• You need to send Aleksey an email stating that you agree.
• SSH public keys are used to grant access
– Aleksey has emailed all prospective users instructions
– You need to email Aleksey a copy of your public key
• ExRep is capable of sending email notifications for all commits
– We are waiting on IT to allow outgoing emails to non-HRL accounts
GPU-Based High Performance
Computing Cluster
HRL has purchased a high-performance
computing cluster at no cost to DARPA
– SyNAPSE project will be the primary user
– Head node:
• 2 of: NVIDIA Tesla C1060 GPUs, each with:
– 933 GFLOPS peak performance
– 4GB of GDDR3 memory, at 102 GB/sec
– PCIe 2.0 x16 interconnect (16 GB/sec)
• 48GB RAM
• 2 of: 4-core Nehalem 2.66 GHz CPUs (64-bit)
• 11TB HDDs (RAID configuration – 8.5TB usable)
– 91 compute nodes, each:
• 2 of: NVIDIA Tesla M1060 GPUs
• 12 GB RAM
• 2 of: 4-core Nehalem 2.26 GHz CPUs (64-bit)
– High-speed 20Gbps InfiniBand interconnect
– 1Gbps Ethernet switch
The cluster is now fully operational
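The aggregate capacity of the compute nodes follows directly from the per-node figures. The sketch below is only illustrative arithmetic: the struct and function names are hypothetical, and it assumes the Tesla M1060 in the compute nodes matches the per-GPU figures quoted for the head node's C1060 (933 GFLOPS peak, 4GB).

```cpp
#include <cassert>

// Per-node figures copied from the slide; names are hypothetical.
struct ClusterSpec {
    int compute_nodes;      // 91 compute nodes
    int gpus_per_node;      // 2 Tesla M1060 GPUs per node
    double gflops_per_gpu;  // 933 GFLOPS peak (assumed equal to the C1060 figure)
    int gpu_mem_gb;         // 4 GB per GPU (same assumption)
    int ram_per_node_gb;    // 12 GB host RAM per node
};

// Number of GPUs in the compute nodes (head node excluded).
int TotalGpus(const ClusterSpec& c) {
    assert(c.compute_nodes >= 0 && c.gpus_per_node >= 0);
    return c.compute_nodes * c.gpus_per_node;
}

// Aggregate peak throughput in TFLOPS.
double PeakTflops(const ClusterSpec& c) {
    return TotalGpus(c) * c.gflops_per_gpu / 1000.0;
}

// Aggregate GPU memory in GB.
int TotalGpuMemoryGb(const ClusterSpec& c) {
    return TotalGpus(c) * c.gpu_mem_gb;
}

// Aggregate host RAM in GB.
int TotalHostRamGb(const ClusterSpec& c) {
    return c.compute_nodes * c.ram_per_node_gb;
}
```

With the slide's figures (91 nodes, 2 GPUs each) this gives 182 GPUs, roughly 170 TFLOPS of peak throughput, 728 GB of GPU memory, and 1092 GB of host RAM, not counting the head node.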
GPU Cluster – InfiniBand Fabric
[Figure: 96-port fast InfiniBand fabric built from six 36-port switches. Groups of 16 compute nodes (20Gbps each) attach to the switches, which are linked by 4x40Gbps connections.]
• Switches run at 40Gbps
• Interface cards run at 20Gbps
• Each 2 switches connected at 160Gbps
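The figure's numbers can be cross-checked with a small port budget. This sketch assumes that every pair of the six switches is directly connected (a full mesh); the slide does not state the topology outright, but the numbers are consistent with it. All names are hypothetical.

```cpp
#include <cassert>

// Fabric figures taken from the slide; the full-mesh layout is an assumption.
struct FabricSpec {
    int switches;          // six 36-port switches
    int ports_per_switch;  // 36
    int nodes_per_switch;  // 16 compute nodes per switch
    int links_per_pair;    // 4 links between each pair of switches
    int node_gbps;         // 20 Gbps interface cards
    int link_gbps;         // 40 Gbps switch-to-switch links
};

// Ports used on one switch: node ports plus links to every other switch.
int PortsUsedPerSwitch(const FabricSpec& f) {
    assert(f.switches >= 1);
    return f.nodes_per_switch + (f.switches - 1) * f.links_per_pair;
}

// Bandwidth between any two switches, in Gbps.
int PairBandwidthGbps(const FabricSpec& f) {
    return f.links_per_pair * f.link_gbps;
}

// Node-facing ports across the whole fabric.
int TotalNodePorts(const FabricSpec& f) {
    return f.switches * f.nodes_per_switch;
}
```

Under the full-mesh assumption each switch uses 16 + 5 x 4 = 36 ports, every pair of switches is connected at 160Gbps, and the fabric offers 96 node ports, matching the figures on the slide.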
GPU and multi-GPU code
• We have ported our infrastructure to GPUs
– Full 1ms updates, do not have to rely on UCI 1s batching
• A closer match to CPU simulations and hardware
– Do not implement axonal delays
– Artificial “80%/20%” uniformly connected network:
• 10^5 neurons, 10^7 synapses @ 10Hz – runs in real time
– A 2D 2-layer random Gaussian connectivity network:
• 0.3*10^5 neurons, 0.8*10^7 synapses @ 10Hz – 3.2x faster than real time
– Generic experiment code runs the same on CPU/GPU based on a
compilation flag in a configuration file.
• We have mostly implemented an MPI-based framework:
– Running on multiple GPUs, multiple CPUs, or even a mix of the two
– Initialization code needs to be rewritten to work with MPI
– The API for specifying the experiments needs to be updated to work
with the new code.
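To put the benchmark figures in perspective, the synaptic event rate can be estimated from the synapse count and the mean firing rate. The sketch below reads the slide's figures as 10^5 neurons and 10^7 synapses, and assumes each synapse delivers one event per presynaptic spike; the function names are hypothetical.

```cpp
#include <cassert>

// Synaptic events per second of simulated time:
// each synapse delivers one event per presynaptic spike.
double EventsPerSimSecond(double synapses, double rate_hz) {
    assert(synapses >= 0.0 && rate_hz >= 0.0);
    return synapses * rate_hz;
}

// Synaptic events handled in one update of the given length (in ms).
double EventsPerStep(double synapses, double rate_hz, double step_ms) {
    return EventsPerSimSecond(synapses, rate_hz) * step_ms / 1000.0;
}

// Events processed per wall-clock second when the simulation runs
// `speedup` times faster than real time (1.0 = real time).
double EventsPerWallSecond(double synapses, double rate_hz, double speedup) {
    return EventsPerSimSecond(synapses, rate_hz) * speedup;
}
```

For the “80%/20%” network this gives 10^8 synaptic events per simulated second, or 10^5 per 1ms update. For the 2D Gaussian network running 3.2x faster than real time, it gives about 2.56 x 10^8 events per wall-clock second.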
Shared Simulation & Experimentation
Infrastructure
For each experiment, a custom binary is compiled, with 4 components (each listed with its code and its glue):
• Network – creating a description of the neural network to be simulated (connectivity, parameters, etc.)
– Glue: PyNN-style C++ API; translation code
• Inputs – generating the input signals for the network, or taking the input signals from the virtual environment
– Glue: C++ API
• Computation – simulating the spiking neural network on a CPU, GPU, or a cluster; may have experiment-specific compilation options
– Glue: C++ API; build scripts
• Analysis – printing experiment-specific and generic statistics during the simulation; saving synaptic weights and/or spike trains for off-line analysis
– Glue: C++ APIs, on-line and off-line
• Portions of the code will be experiment-specific
• Portions of the code will be provided by the shared infrastructure
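As a rough illustration of how the four components might fit together in one binary, the sketch below wires them up as callbacks around a step loop. None of these types come from the HRL framework: `Network`, `InputFn`, `StepFn`, `AnalysisFn`, and `RunExperiment` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the four components.
struct Network { int num_neurons; };  // Network: description of what to simulate

// Inputs: fills in the input current for each neuron at a given step.
using InputFn = std::function<void(int step, std::vector<float>& current)>;
// Computation: advances the network one step, returns the number of spikes.
using StepFn = std::function<int(const Network&, const std::vector<float>& current)>;
// Analysis: on-line statistics hook, called after every step.
using AnalysisFn = std::function<void(int step, int spikes)>;

// Runs `steps` updates and returns the total number of spikes.
long RunExperiment(const Network& net, int steps, const InputFn& inputs,
                   const StepFn& compute, const AnalysisFn& analyze) {
    assert(steps >= 0 && net.num_neurons >= 0);
    std::vector<float> current(net.num_neurons, 0.0f);
    long total = 0;
    for (int t = 0; t < steps; ++t) {
        inputs(t, current);                         // experiment-specific
        const int spikes = compute(net, current);   // shared infrastructure
        analyze(t, spikes);                         // experiment-specific or generic
        total += spikes;
    }
    return total;
}
```

In this arrangement the loop plays the role of shared infrastructure, while the callbacks are where experiment-specific code would plug in.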
Neural Networks –
Levels of Flexibility
Currently we support three different levels of flexibility:
– Per-simulation – compile-time switches and compile-time global
constants defined in build scripts (including “experiment definition
files”). Fastest and most efficient, least flexible.
– Per-neuron – including defining properties of synapses as a
property of pre- or post-synaptic neurons.
– Per-synapse – memory-intensive, would like to avoid.
In general, would prefer to have the least flexibility that we can
get away with.
Simulator may support features that are not (yet?) expected to
be included in hardware, but we have to be careful.
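The memory argument behind these levels can be made concrete. The sketch below is a hypothetical illustration, not the framework's data layout: a per-simulation setting is a single compile-time constant, a per-neuron setting costs one record per neuron, and a per-synapse setting costs one record per synapse.

```cpp
#include <cassert>
#include <cstddef>

// Per-simulation: one compile-time constant for the whole network
// (the name and value are hypothetical).
constexpr float kStdpMaxWeight = 10.0f;

// Per-neuron: one record per neuron.
struct NeuronParams { float a, b, c, d; };  // Izhikevich parameters

// Per-synapse: one record per synapse.
struct SynapseParams { float weight; float delay; };

// Bytes needed to store per-neuron parameters.
std::size_t PerNeuronBytes(std::size_t neurons) {
    return neurons * sizeof(NeuronParams);
}

// Bytes needed to store per-synapse parameters.
std::size_t PerSynapseBytes(std::size_t synapses) {
    return synapses * sizeof(SynapseParams);
}

// How many times more memory the per-synapse data takes than the per-neuron data.
double PerSynapseToPerNeuronRatio(std::size_t neurons, std::size_t synapses) {
    assert(neurons > 0);
    return static_cast<double>(PerSynapseBytes(synapses)) /
           static_cast<double>(PerNeuronBytes(neurons));
}
```

For a network with 10^5 neurons and 10^7 synapses, two floats per synapse already take 50 times the memory of four floats per neuron, which is why per-synapse flexibility is the most expensive level.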
Neural Model Flexibility
(What is configurable per-simulation, per-neuron, and per-synapse)
• Neuron model
– Per-simulation: LIF or Izhikevich
– Per-neuron: Izhikevich a, b, c, d
• Synapse model
– Per-simulation:
» Enabled or not: inhibitory STDP, short-term plasticity, weighted STDP
» Output: instant. or exp. decay
» Parameters: STDP (A+, A-, t+, t-), max weight, STP, Inh STDP, etc.
– Per-neuron:
» Whether or not: plastic (post & pre – plastic when both say “yes”), inhibitory (pre)
» Parameters: will be made per-neuron as needed
– Per-synapse: axonal delays (may decide not to support)
• External inputs: spike trains (“dummy neurons”), current injection
• Outputs (off-line data): what to collect; spike trains (new); synaptic weights; spike trains
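For readers unfamiliar with the per-neuron a, b, c, d parameters, the sketch below shows one update of the standard Izhikevich neuron model with a 1ms forward-Euler step. It illustrates the published model, not HRL's implementation; the struct and function names are hypothetical.

```cpp
#include <cassert>

struct IzhikevichParams { float a, b, c, d; };  // per-neuron parameters
struct IzhikevichState { float v, u; };         // membrane potential and recovery variable

// One 1 ms forward-Euler update of the Izhikevich model:
//   v' = 0.04 v^2 + 5 v + 140 - u + I,   u' = a (b v - u)
// On a spike (v >= 30 mV): v <- c, u <- u + d. Returns true if the neuron spiked.
bool StepIzhikevich(const IzhikevichParams& p, IzhikevichState& s, float input_current) {
    const float v = s.v;
    const float u = s.u;
    s.v = v + (0.04f * v * v + 5.0f * v + 140.0f - u + input_current);
    s.u = u + p.a * (p.b * v - u);
    if (s.v >= 30.0f) {
        s.v = p.c;
        s.u += p.d;
        return true;
    }
    return false;
}

// Simulates one neuron for `steps_ms` milliseconds with constant input
// and returns the number of spikes.
int CountSpikes(const IzhikevichParams& p, int steps_ms, float input_current) {
    assert(steps_ms >= 0);
    IzhikevichState s{p.c, p.b * p.c};  // start at the reset potential
    int spikes = 0;
    for (int t = 0; t < steps_ms; ++t) {
        if (StepIzhikevich(p, s, input_current)) ++spikes;
    }
    return spikes;
}
```

With the commonly used regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8), the neuron stays silent with no input and fires repeatedly with a constant input current of 10.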
API Overview
[Diagram: arrows show API/control dependencies, not data flow]
• BuildNetwork – incremental construction of neural networks; called from the user’s code for constructing a network
• Network – immutable portion of the network state (connectivity, parameters)
• State – mutable portion of the network state (weights, statistics)
• Compute – execute simulation steps; CPU and CUDA variants
• Statistics – at regular interval, save data for future analysis and print basic stats
• Experiment – controls the computation; users extend, if needed
• InputGen – call-back functions to fill in input spike trains and/or currents; Virtual Environment (optional)
• Main – users extend, if needed
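The split between an immutable network and a mutable state maps naturally onto const-correctness in C++. The sketch below is a hypothetical illustration of that idea: the types share names with the boxes in the diagram, but their contents and the `Step` and `Run` functions are assumptions, not the framework's API.

```cpp
#include <cassert>
#include <vector>

// Immutable portion: connectivity, fixed once the network is built.
struct Network {
    std::vector<int> pre;   // presynaptic neuron of each synapse
    std::vector<int> post;  // postsynaptic neuron of each synapse
};

// Mutable portion: weights and statistics, updated as the simulation runs.
struct State {
    std::vector<float> weights;
    long steps_run = 0;
};

// Compute: one simulation step. The network is passed by const reference,
// so only the state can change (the actual update is omitted here).
void Step(const Network& net, State& state) {
    assert(state.weights.size() == net.pre.size());
    ++state.steps_run;
}

// Experiment: drives the computation and calls the statistics hook
// at a regular interval.
template <typename StatsFn>
void Run(const Network& net, State& state, int steps, int stats_interval, StatsFn stats) {
    assert(stats_interval > 0);
    for (int t = 1; t <= steps; ++t) {
        Step(net, state);
        if (t % stats_interval == 0) stats(net, state);
    }
}
```

Keeping the connectivity read-only during the run means a CPU and a CUDA backend could share the same network description while each owns its copy of the mutable state.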
Building networks incrementally.
API Fragments (Simplified)
struct NeuronKind {
    NeuronKind SetInhibitory(bool inhibitory = true); // Mark the neurons as inhibitory
    NumberGen a, b, c, d; // Izhikevich parameters – constant, or probability distribution parameters
};
class BuildNetwork { // Add a new set of neurons to the network
public:
    Population NewPopulation(int size, const NeuronKind& neuron);
};
struct SynapseKind {
    NumberGen weight; // Initial weight
    NumberGen delay; // Axonal delay
};
class Population { // New synapses to a target population; each method returns the number of synapses created
public:
    int ConnectFull(Population& to, const SynapseKind& synapse);
    int Connect1to1(Population& to, const SynapseKind& synapse);
    int ConnectRandom(Population& to, float probability, const SynapseKind& synapse);
    int ConnectGauss(Population& to, float max_probability, float expected_inputs, const SynapseKind& synapse);
    int ConnectFixedPreNum(Population& to, float n, const SynapseKind& synapse);
};
Building networks incrementally.
Example
BuildNetwork build;
SynapseKind synapse; // Synapse parameters (weight, delay) shared by all four projections
Population excitatory = build.NewPopulation(800, NeuronKind());
Population inhibitory = build.NewPopulation(200, NeuronKind().SetInhibitory());
excitatory.ConnectRandom(excitatory, 0.2, synapse); // E -> E
excitatory.ConnectRandom(inhibitory, 0.2, synapse); // E -> I
inhibitory.ConnectRandom(excitatory, 0.2, synapse); // I -> E
inhibitory.ConnectRandom(inhibitory, 0.2, synapse); // I -> I
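To make the semantics of random connectivity concrete, here is a toy, self-contained stand-in for the classes used in the example. It is a sketch under stated assumptions, not the HRL implementation: it leaves out `SynapseKind` and `NumberGen`, allows self-connections, takes a hypothetical RNG seed in the `BuildNetwork` constructor, and adds hypothetical `NumNeurons` and `NumSynapses` accessors.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Toy stand-ins for the slide's API; only the parts needed here.
struct NeuronKind {
    bool inhibitory = false;
    NeuronKind SetInhibitory(bool value = true) const {
        NeuronKind k = *this;
        k.inhibitory = value;
        return k;
    }
};

class BuildNetwork;

class Population {
public:
    Population(BuildNetwork* owner, int first, int size)
        : owner_(owner), first_(first), size_(size) {}
    // Connects each (pre, post) pair independently with the given probability;
    // returns the number of synapses created.
    int ConnectRandom(const Population& to, float probability);
private:
    BuildNetwork* owner_;
    int first_;  // global index of the first neuron in this population
    int size_;
};

class BuildNetwork {
public:
    explicit BuildNetwork(unsigned seed = 1) : rng_(seed) {}
    Population NewPopulation(int size, const NeuronKind& kind) {
        assert(size > 0);
        const int first = static_cast<int>(inhibitory_.size());
        inhibitory_.insert(inhibitory_.end(), size, kind.inhibitory);
        return Population(this, first, size);
    }
    int NumNeurons() const { return static_cast<int>(inhibitory_.size()); }
    int NumSynapses() const { return static_cast<int>(synapses_.size()); }
private:
    friend class Population;
    std::mt19937 rng_;
    std::vector<bool> inhibitory_;
    std::vector<std::pair<int, int>> synapses_;  // (pre, post) neuron indices
};

inline int Population::ConnectRandom(const Population& to, float probability) {
    assert(probability >= 0.0f && probability <= 1.0f);
    std::bernoulli_distribution coin(probability);
    int created = 0;
    for (int i = 0; i < size_; ++i) {
        for (int j = 0; j < to.size_; ++j) {
            if (coin(owner_->rng_)) {
                owner_->synapses_.emplace_back(first_ + i, to.first_ + j);
                ++created;
            }
        }
    }
    return created;
}
```

Building the 800/200 network with probability 0.2 on all four projections in this toy version yields about 1000 x 1000 x 0.2 = 200,000 synapses, with the exact count depending on the seed.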
Overview of the
Shared Framework Code
• Major components:
– The simulator and related glue code (meant to be immutable)
– mk/config file - selects the main parts:
• Which experiment to run (some experiments have variants)
• Which computation engine to use (cpu or cuda)
• Which communication engine to use (null or mpi)
– Experiment-definition file (roughly one per experiment):
• Defines the per-simulation parameters
• Specifies which files contain the experiment code modules
– Experiment code:
• May be split into several files
• Pieces of an experiment code can be reused in different experiments
– Analyzers for off-line data analysis
• Generic and experiment-specific
• Code in ExRep contains the complete simulator, and
several sample experiments and analyzers.
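One common way for a build configuration to select an engine at compile time is a preprocessor macro set by the build scripts. The sketch below shows that general technique only: the macro names `HRLSIM_COMPUTE_CUDA` and `HRLSIM_COMM_MPI` and the `BinaryName` helper are hypothetical, and the actual mechanism used by the framework is not described on the slide.

```cpp
#include <cassert>
#include <string>

// Hypothetical macros that a build script could define from mk/config.
#if defined(HRLSIM_COMPUTE_CUDA)
constexpr const char* kComputeEngine = "cuda";
#else
constexpr const char* kComputeEngine = "cpu";
#endif

#if defined(HRLSIM_COMM_MPI)
constexpr const char* kCommEngine = "mpi";
#else
constexpr const char* kCommEngine = "null";
#endif

// Name of the simulator binary for a given computation engine.
std::string BinaryName(const std::string& compute_engine) {
    assert(compute_engine == "cpu" || compute_engine == "cuda");
    return compute_engine == "cuda" ? "sim-cuda" : "sim";
}
```

With neither macro defined, this selects the cpu computation engine and the null communication engine, so generic experiment code can stay identical across configurations.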
ExRep Directory Structure
• ExRep repository root:
svn+ssh://[email protected]/exrep/
– Note: you do not have access to ExRep root, only to some
particular subdirectories
• Many versions of Subversion have problems with this kind of partial access; make sure to use
svn version 1.6.11, the latest version, which fixes some bugs related to this
scenario.
– SyNAPSE area in ExRep:
…/CRAD/SyNAPSE/
• SyNAPSE Shared Area – a subdirectory of …/SyNAPSE
…/Code/Shared
– Mentioned by name in the Shared Source Agreement
– Right now you’ll only get read access – and only to this subdirectory
• Other subdirectories of SyNAPSE directory will be created as needed
SyNAPSE Shared Framework
Directory Structure
Under svn+ssh://[email protected]/exrep/CRAD/SyNAPSE/Code/Shared/
– The OMake subdirectory contains some generic build scripts for the OMake Build
Tool – CUDA, MPI, etc.
– The Sim subdirectory contains the framework itself, with subdirectories:
• hrlsim – core framework, C++ code and headers
– hrlsim/config.h – generated by the build process, summarizes all the per-simulation parameters (with comments) – more on next slide
• mk – core framework, build scripts
– mk/config – global configuration file for the build (not in ExRep, will be created on
first invocation of the build tool)
– mk/compute-consts.om – default simulation parameters
• sample_exp – sample simulation experiments and helper/template code
– …/mk/*.exp – experiment definition files
– …/src/ – C++ source files for experiments:
» set up a network, generate inputs, print extra statistics in on-line mode
– …/analyzers/ – off-line analysis templates and samples (C++)
– …/scripts/ – shell/Python scripts for follow-up analysis and visualization
• Data – directory for temporary off-line data (weights, spikes, etc).
Running an Existing Experiment
• Once:
– Download the OMake Build Tool from http://omake.metaprl.org/
• We will probably need to release an updated version soon
– Go to Sim directory
– Run “omake” – this will create a default mk/config file
• Edit the mk/config file
– It has several configuration variables, each fully commented
• Which experiment, which computation engine, initial RNG seed, etc
– The file is re-created by OMake on every run
• Only value changes for existing variables are allowed/preserved
– The list of valid experiments is generated from experiment definition files
(sample_exp/mk/*.exp)
• Run “omake” to build the custom simulator
– Generates the ./sim or ./sim-cuda binary
– Will generate hrlsim/config.h in the process
• Useful summary of per-simulation parameters
– Will also build all applicable analyzers
• Run the custom simulator “./sim N” (or “./sim-cuda N”)
– Where “N” is the simulation duration in virtual seconds
– “N” can be omitted when the experiment definition file gives a default duration
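The “./sim N” convention, with an optional default duration, could be handled along the following lines. This is a sketch only: `ParseDurationSeconds` and `StepsForDuration` are hypothetical names, and returning -1 for invalid input is an assumption rather than the simulator's documented behavior.

```cpp
#include <cassert>
#include <cstdlib>

// Returns the simulation duration in virtual seconds: argv[1] if present,
// otherwise the default from the experiment definition.
// Returns -1 if the argument is not a positive integer.
long ParseDurationSeconds(int argc, const char* const argv[], long default_seconds) {
    if (argc < 2) return default_seconds;  // "N" omitted
    char* end = nullptr;
    const long n = std::strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0' || n <= 0) return -1;
    return n;
}

// Number of 1 ms updates needed to cover the given duration.
long StepsForDuration(long seconds) {
    assert(seconds >= 0);
    return seconds * 1000;
}
```

For example, “./sim 100” would run 100 virtual seconds, which is 100,000 updates at 1ms each.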
Defining a New Experiment
• Create a Sim/private_exp directory
– With the subdirectories following the structure of the sample_exp
• Create a new experiment definition file
– Needs to go into Sim/private_exp/mk/
– With a .exp extension
– Use an existing sample file as a template
• Create the C++ code
– Needs to go into Sim/private_exp/src/
– The experiment definition file should list all the .cpp files you are
using – from either private_exp/src or sample_exp/src
• Proceed as described in the previous slide
– After you create your new experiment definition file and run “omake”
for the first time, the list of available experiments in mk/config will
include your new experiment
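The way a list of experiments can be derived from definition files can be sketched as a simple filter over file names. This is an illustration, not the OMake logic itself: the function name is hypothetical, and it assumes an experiment is named after its .exp file with the directory and extension removed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Given file paths from the sample_exp/mk and private_exp/mk directories,
// returns the sorted, de-duplicated names of all *.exp files.
std::vector<std::string> ExperimentNames(const std::vector<std::string>& paths) {
    const std::string ext = ".exp";
    std::vector<std::string> names;
    for (const std::string& path : paths) {
        // Strip the directory part, if any.
        const std::size_t slash = path.find_last_of('/');
        const std::string file =
            (slash == std::string::npos) ? path : path.substr(slash + 1);
        // Keep only files that end in ".exp" and have a non-empty stem.
        if (file.size() > ext.size() &&
            file.compare(file.size() - ext.size(), ext.size(), ext) == 0) {
            names.push_back(file.substr(0, file.size() - ext.size()));
        }
    }
    std::sort(names.begin(), names.end());
    names.erase(std::unique(names.begin(), names.end()), names.end());
    assert(std::is_sorted(names.begin(), names.end()));
    return names;
}
```

Under this assumption, adding a new definition file under private_exp/mk is enough for its name to show up alongside the sample experiments.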