Dynamic Scenes

Download Report

Transcript Dynamic Scenes

Parallelism Just Isn’t Enough
Dynamic Scenes
Paul Arthur Navrátil
Outline
• Toward Rapid Reconstruction for Animated Ray
Tracing
Lext and Akenine-Möller, Eurographics 2001
– Lessons for parallel implementations?
• Parallel Tree Building on a Range of Shared Address
Space Multiprocessors: Algorithms and Application
Performance
Shan and Singh, Proc. IPPS/SPDP 1998
– Application to Ray Tracing acceleration structures?
Animated Ray Tracing: Motivation
[Lext and Akenine-Möller, 2001]
• Two competing goals in graphics processing
– Generate photo-realistic images
– Render at real-time rates ( > 20 fps )
• Can Ray Tracing give us both?
– Parallelizing RT and frameless rendering help
– Latest efforts yielding interactive rates for static scenes
(e.g., [Wald et al., 01])
– Dynamic scenes still too computationally intensive
Why doesn’t parallelism solve the
problem?
• Data Structure overhead
– Reconstructing the
acceleration data structures
has worse complexity and
less obvious parallelism
– In a dynamic scene, all
changed objects need to be
updated in the acceleration
structure
– Traditionally, this means
rebuild it!
Previous Work
• Special-case animated objects [Parker et al., 1999]
– Objects outside the acceleration structure not scalable
• Reuse frame information to save render time
[Adelson and Hodges, 1995]
– Performance improved 92%, but only for scenes with
eye movement
• Use lazy evaluation to prune acceleration structure
[McNeill et al., 1992]
– Evaluates only the structure that is actually used
– Only tested on static scenes
Insight: Only Part of the Scene is
Dynamic!
• Distinguish between static and dynamic objects in
acceleration structure
• Dynamic objects exhibit spatial locality
• Update transform matrices for each scene node,
transform rays before calculating intersection
[Wald et al., 2002]
Dynamic versus Static Parts
Idea: Be Lazy!
• If modifying the scene graph fails to provide
significant speedup (or even if it does) use lazy
evaluation of the acceleration structure
– Evaluate a subsection only when a ray enters the voxel
• Adapt acceleration structure according to use
– Simplify or eliminate the structure if usage is low
– Do this at runtime, based on some feedback measure?
• Neither of these ideas were in the tested system,
but it can be extended to include them
Data Structure: Hierarchical
Bounding Boxes
• Surround each set of primitives with a minimum
area Oriented Bounding Box (OBB)
– Set defined as primitives to which one transform is
applied (static or dynamic)
– Put a recursive grid in each OBB
• Encapsulate all top-level dynamic OBBs in a
special OBB-grid
– These are recalculated every frame due to the
movement of the contained grids
Algorithm Execution
• Create OBB grids
– One grid for static objects, the rest contain all dynamic
objects
• Update OBB grids
– Transform to root node CS, then to original node CS
(previous frame?) Recurse if this contains subgrids
– Apply incremental transformations to primitives
– Create new OBB around subgrids (and primitives?)
– Transform to new OBB CS
BART Robots Benchmark video
http://www.ce.chalmers.se/BART/
A Benchmark for Animated Ray Tracing (BART) [Lext et al., 2001]
Results: A Silver Lining
Tree Building Methods: Motivation
• Use tree structures to organize work solving
N-body problems
– Classic example: find positions of N bodies attracted by gravity
after a period of time
– Graphics corollary? Find position of dynamic objects in given
frame
• Studies 5 strategies across 4 systems
–
–
–
–
Physically distributed memory (SGI Origin 2000)
Bus-based shared memory MP (SGI Challenge)
Shared virtual memory in software (Intel Paragon)
Configurable memory in software with hardware assistance
(Wisconsin Typhoon-zero)
Strategy Characteristics
• ORIG
– Global octree built by processors loading objects into a single
shared tree.
– Split cells into 8 subcells when objects within cell exceed threshold
– Processor operates on cells it ‘created’
• ORIG-LOCAL
–
–
–
–
Optimized version of ORIG
Uses different data structures for internal nodes than leaves
Processor allocates and manages its own cell and leaf arrays
Thus cells can be kept in contiguous memory
Strategy Characteristics
• UPDATE
– Insight: distributions evolve slowly over time
(in animation too?)
– Objects that move out of the cell in which they’re
placed (think: location in the scene) is small
– Update only as much as the tree as is necessary
– Leverage tree hierarchy to find new cell (if cells
arranged according to scene space)
Strategy Characteristics
• PARTREE
– Insight: in previous algorithms, a lock is needed to
ensure mutual exclusivity on the single global tree
– Causes synchronization overhead, contention, and
remote access overhead for root and high inner nodes of
tree
– Solution: each processor creates a local tree, populates
it, then merges tree into global tree
– Uses ‘tree template’ to simplify merge
– Global inserts and synchronizations reduced
– Redundant work minimized if spatial locality used to
distribute objects to processors
Strategy Characteristics
• SPACE
– Divide up space among processors, rather than objects
(Pharr?)
– Each process loads objects that are in its space
Ideally space units (voxels) map to tree cells
– No need for locking, but high potential for load imbalances if
number of objects per space is unbound
– Can lose data locality since processors don’t necessarily compute
on the objects they put in the tree (true for graphics?)
– No locking during global tree assembly, because only one
processor has the cells for a given subtree