Kd-Tree Acceleration Structures for a GPU Raytracer

Download Report

Transcript Kd-Tree Acceleration Structures for a GPU Raytracer

GH05
KD-Tree Acceleration Structures
for a GPU Raytracer
Tim Foley, Jeremy Sugerman
Stanford University
Motivation
GH05
• Accelerated raytracing
– On commodity HW
– Production rendering
– Real-time applications?
• Performance trend
– 9800 XT : 170M ray-triangle intersects/s
– X800 XT PE: 350M ray-triangle intersects/s
GPU Raytracing
• Promising early results
– Simple scenes
• Uniform grid
– Problems with complex scenes
• Hierarchical accelerator (kd-tree)
– Improve scalability
GH05
Outline
• Background
– GPU Raytracing
– KD-Tree Algorithm
• KD-Restart, KD-Backtrack
• Results
• Future Work
GH05
Background
• RayEngine [Carr et al. 2002]
– Parallel ray-triangle intersection
– Host controls culling
• [Purcell et al. 2002]
– Entire raytracing pipeline
– Many rays required for efficiency
– Uniform Grid
GH05
Why not KD-Tree?
• Uniform grid acceleration structure
– Regular structure = efficient traversal
– Regular structure = poor partitioning
• KD-Trees
– Adapt to scene complexity
– Compact storage, efficient traversal
– “Best” for CPU raytracing [Havran 2000]
GH05
KD-Tree
tmin
GH05
Z
X
B
X
Y
C
Y
D
Z
A
A
tmax
B
C
D
KD-Tree Traversal
GH05
Z
X
B
X
Y
C
Y
D
Z
A
A
B
C
D
Per-Fragment Stacks
• Parallel (per-ray) push
– No indexed write in fragment program
• Per-ray stack storage
• [Ernst et al. 2004]
– Emulate push with extra passes
– Impractical, slow
GH05
Our Contribution
• Stackless kd-tree traversal algorithms
– KD-Restart
– KD-Backtrack
GH05
Observation
GH05
Z
X
B
X
Y
C
Y
D
Z
A
A
B
Current leaf’s tmax = Next leaf’s tmin
C
D
KD-Restart
GH05
Z
X
B
• Standard traversal
– Omit stack operations
– Proceed to 1st leaf
Y
C
A
D
• If no intersection
– Advance (tmin,tmax)
– Restart from root
• Proceed to next leaf
KD-Restart
• Restart traversal after each leaf
– m leaves
– Average depth d
– Cost O(m*d)
• Balanced tree of n nodes
– Upper bound: O(n log(n))
• Standard algorithm: O(n)
– Expected: O( log(n) )
GH05
Observation
GH05
Z
X
B
X
Y
C
Y
D
Z
A
A
B
Ancestor of A is parent of Z
C
D
KD-Backtrack
GH05
Z
X
• If no intersection
B
– Advance (tmin, tmax)
– Start backtracking
Y
C
A
D
• If node intersects
(tmin, tmax)
– Resume traversal
• Proceed to next leaf
KD-Backtrack
• Backtrack after leaf
– Revisits previous nodes
– At most twice: from left, right
• Within constant factor of standard traversal
– Upper bound: O(n)
– Expected: O( log(n) )
• Requires additional storage
– Parent pointers
– Bounding boxes for internal nodes
GH05
Implementation
GH05
• Built GPU raytracer in Brook [Buck et al.]
• 4 intersection schemes:
– Brute Force
– Uniform Grid
– KD-Restart
– KD-Backtrack
Scenes
GH05
Cornell Box
Stanford Bunny
32 triangles
69451 triangles
BART Robots
BART Kitchen
71708 triangles
110561 triangles
Results
Box
9
GH05
Bunny
Robots
Kitchen
12.9
8
7
Brute
Grid
Restart
Backtrack
6
5
4
3
2
1
0
Relative speedup over brute-force intersection.
Results
GH05
Traverse
Backtrack
Ideal
10.86M
0
Restart
21.80M
0
Backtrack
10.86M
7.78M
Intersect
5.91M
5.91M
5.91M
Rays in each state throughout traversal.
Discussion
• Absolute performance
– Trails best CPU implementations 5-6x
• Sources of inefficiency
– Load balancing
– Data reuse
GH05
Load Balancing
• Subset of rays intersecting, traversing
– Occlusion queries to select kernel
– Early-Z to cull inactive rays
• Approximately 5x overhead
– Query, kernel switch overhead
– Worse with fewer rays
GH05
Data Reuse
GH05
• Every kernel
– Loads ray origin/direction
– Load/Store traversal state
• Consumes streaming bandwidth
– We are bandwidth-limited
– CPU implementation stores these in registers
Branching
GH05
• Merge multiple passes into larger kernel
– Fragment branches for load balancing
– Avoid load/store of reused data
• Current branching has high overhead
• Shifts efficiency burden to HW
Conclusion
• Stackless Traversal
– Allows efficient GPU kd-tree
– Scales to larger, more complex scenes
• Future Work
– Changes in HW
– Alternative acceleration structures
– “Out-of-core” scenes
– Dynamic scenes
GH05
Acknowledgements
• Tim Purcell (NVIDA)
– Streaming raytracer
• Mark Segal (ATI)
– Demo machine
• NVIDIA, ATI : HW
• DARPA, Rambus : Funding
GH05
Questions
GH05