Pattern-Forming Instabilities in the Swift

Download Report

Transcript Pattern-Forming Instabilities in the Swift

Pattern-Forming Instabilities in
the Swift-Hohenberg Model
Micah Brodsky
6.338 Spring ‘08
Swift-Hohenberg Model
• Scalar partial differential equation
• Can derive as approximation to NavierStokes at onset of convection
• But used generally as case study in
pattern formation
• Has a Lyapunov functional – relaxes
towards a steady state, no turbulence
– But does interesting things along the way
The Model
• ∂u/∂t = εu - u3 – (∇2 – 1)2u + g2u2
– Expands to:
(ε - 1)·u - ∇2u - 2∇4u + g2u2 - u3
Linear instability
High-frequency
stabilization
Low-order
nonlinearities
Preferred length scale
• One of the simplest possible patternforming equations
Saturation
term!
Let’s try it out…
ε = 1.5, g2 = 0
Initialized with low-amplitude random perturbations about zero
What Do We Understand?
• We can easily predict simple features
– Instability
– Characteristic wavelength
• Perturbational analysis gives a bit more
– Wavelength instabilities (e.g. Zigzag and
“Eckman”)
– Higher order wave effects (e.g. hexagons!)
• But what about high-amplitude behavior?
Scales of order
ε=4
ε = 0.5
ε = 0.1
Competition Among Patterns
ε = 0.1, g2 = 0.5
Initialized with low-amplitude random perturbations about zero,
along with a strip of -1.
High-Amplitude Instabilities
• Initially saturates to
uniform steady state
solutions
– 0 is linearly unstable,
but ±√(ε – 1) are
linearly stable
• But then…
ε = 3, g2 = 0
Initialized with low-amplitude random perturbations plus a slight positive
bias, along with a strip of -1.
Implementation
• Explicit finite difference solver
– I.e. ui+1[x] = ui[x] + dt * ( … ui[x-1] - 2ui[x] + ui[x+1] ... )
– Instability is the bottleneck (and ∇4 makes it hit hard)
• C++ / MPI on SiCortex
– Domain split into vertical slabs
– Processors exchange 2-cell-wide boundaries after
every time step
Output
• SDL for real-time visualization
– Forwarding X to the rank-0 node is a pain but
can be done
• Raw data dump to file, imported into
Matlab with fread()
Where’s The Bottleneck?
• Worst case communication / computation:
~ 4 doubles / ~ 66 flops
– But SiCortex has lots of bandwidth
• Most of the time is spent computing
– Compiler effects
– Memory, memory, memory…
Compiler
• -O3 goes without saying
– But wait, why is it generating such crummy machine
code (sometimes you have to look!)
– Pointer analysis!
• Compiler can use registers and software pipelining efficiently
only if it knows that it’s not tripping over its own calculations
• Use __restrict__ pointers wherever possible
• Means no other pointer points to the same data
• 50% FLOPS improvement on simpler kernel
(diffusion equation)
• But… only about 10% here.
Memory Hierarchy
• Profile with papiex -a
• L1 cache easily overrun
– Solution: block striping
Memory access
stencil:
– Adds some overhead, but about 10% net savings on
“tall” problems (why not more?)
• L2 usually okay
– (though not if we go 3D)
Communication
• SiCortex has a lot of bandwidth
– 1 double for every 2 clock cycles (without congestion)
– This is plenty to keep us busy
• But, communication is still up to 25% of runtime
(use mpipex, subtract startup overhead)
– We can still gain by overlapped communication &
computation
– Or can we?
Overlapped Communication
• Solution:
– Compute edges first, then start non-blocking transfers
(Isend, Irecv)
– Wait for results only when
we need to compute next
edges
• Problem! This kills our L1 cache performance
gains
– Communication is heaviest when middle block is
narrowest
• Net result: 5-7% gain on large problems
• 13% on large, “thin” problems
The real problems
• Kernel has too many instructions!
– Processor has poor instruction-level parallelism (ILP)
• Not to mention the time step is *really* slow
– Implicit / Crank-Nicholson methods?
– Spectral methods?
• Scalability
– Slab slicing limits to width / 2 processors
– Switch to block decomposition?
• 3D
– I really want to see 3D results. In 3D. =)