Interactive Rendering with Coherent Raytracing


Interactive Rendering With
Coherent Ray Tracing
Eurographics 2001
Wald, Slusallek, Benthin, Wagner
Comp 238, UNC-CH, September 10, 2001
Joshua Stough
The Gist
• The authors present “a highly optimized implementation
of a ray tracer that improves performance by more than an
order of magnitude compared to currently available ray
tracers…makes better use of computational
resources…and better exploits image and object space
coherence.”
Organization
• Why Ray Tracing over Rasterization?
• An Optimized Ray Tracing Implementation
– Code structure, Caching, Coherence
– Intersections
– Volume Traversal (Memory Layout, Overhead)
• Performance of the Ray Tracing Engine
Why Ray Tracing Over Raster?
• Automatic Occlusion Culling
• Logarithmic Complexity in number of scene primitives
• Flexible sampling – allows for more effective use of time
• Efficient Shading – “avoids computation for invisible geometry”
• Shader Programming – direct use versus pipeline model
• More Correct Physically – and can use the same approximations
• “Trivially Parallel” – though initial resources required are higher
Coherence
“Coherence is the key to efficiency.”
• Basic (Recursive Tree) Ray Tracer lacks concern for:
– Modern CPU design – pipeline execution
– Caching to hide low bandwidth and high latency on main memory
• Instead, “pay particular attention to:”
– Caching – efficient/aligned data structures, traversing mechanisms
– Pipelining
– Parallel execution possibilities
• “We show that even today the performance of a software
ray tracer on a single PC can challenge dedicated
rasterization hardware for complex environments.”
An Optimized Ray Tracing Implementation
• Reducing Code Complexity
• Optimizing cache usage
• Reducing memory bandwidth
• Prefetching Data
• And with SIMD/SSE:
– Ray intersections
– Scene traversal
– Shading
Code Complexity
• Few conditionals, Tight Inner loops
• Axis aligned BSP Tree – iterative algorithm possible
• Triangles only – reduces branches
• Shading less important – runs once per ray, versus 40-50 traversals
and 5-10 intersections
Caching
• Performance bound by bandwidth, not CPU speed
– BSP traversal, low computation to bandwidth ratio
• Memory is fetched an entire cache line at a time
• Carefully lay out data
– Data together only if used together (geometry vs. shading)
– Separate read-only (preprocessing) data from read-write (mailboxes)
• Hide latency with prefetching
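The layout rules above can be illustrated with a minimal sketch. All type and function names here (TriGeometry, TriShading, Scene, needsTest) are hypothetical and not taken from the paper; the mailbox is assumed to work in the usual way, storing the id of the last ray tested against each triangle so that a triangle referenced by several leaves is tested only once per ray.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hot, read-only: the only per-triangle data that traversal and intersection touch.
struct TriGeometry { float data[12]; };  // 48 bytes of precomputed intersection data

// Cold, read-only: used once per ray, after the closest hit is known.
struct TriShading {
    float normal[3];
    float color[3];
    std::uint32_t materialId;
};

struct Scene {
    std::vector<TriGeometry> geometry;   // written during preprocessing only
    std::vector<TriShading> shading;     // same index as geometry, separate storage
    std::vector<std::uint32_t> mailbox;  // read-write: id of the last ray tested per triangle
};

Scene makeScene(std::size_t triangleCount) {
    Scene s;
    s.geometry.resize(triangleCount);
    s.shading.resize(triangleCount);
    s.mailbox.assign(triangleCount, 0u);  // ray id 0 is reserved as "never tested"
    return s;
}

// Returns true if triangle `tri` still has to be tested against ray `rayId`,
// and records the test in the mailbox. Only the mailbox array is written.
bool needsTest(Scene& scene, std::size_t tri, std::uint32_t rayId) {
    assert(tri < scene.mailbox.size() && rayId != 0u);
    if (scene.mailbox[tri] == rayId) return false;  // already tested for this ray
    scene.mailbox[tri] = rayId;
    return true;
}
```

Keeping the three arrays separate means that writes to the mailboxes never dirty cache lines holding geometry, and shading data never competes with geometry for cache space during traversal.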
Ray-Triangle Intersection
• Compute distance to plane (defined by triangle) along ray
• If distance is within current interval for testing (via BSP):
– Compute hit point
– Project into an axis-aligned plane (largest angle to normal)
– Compute barycentric coordinates of the hit point in 2D
• Data alignment – two 2D edge equations, plane equation for
distance, tag for projection axis = 9 floats + tag. Padded to
48 bytes (memory tradeoff).
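The steps above can be sketched for a single ray in scalar C++. This is an illustration under stated assumptions, not the authors' code: the names TriAccel, makeTriAccel and intersect are hypothetical, the plane normal is scaled so that its dominant component is 1 (which leaves three floats for the plane equation), and the projection axis is taken to be that dominant component.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using Vec3 = std::array<float, 3>;

// 9 floats + projection tag, padded to 48 bytes.
struct TriAccel {
    float n_u, n_v, n_d;   // plane: p[k] + n_u*p[u] + n_v*p[v] = n_d
    float b_u, b_v, b_d;   // 2D edge equation giving barycentric beta
    float c_u, c_v, c_d;   // 2D edge equation giving barycentric gamma
    std::int32_t k;        // projection axis (dropped coordinate)
    std::int32_t pad[2];   // padding
};
static_assert(sizeof(TriAccel) == 48, "expected a 48-byte triangle record");

TriAccel makeTriAccel(const Vec3& A, const Vec3& B, const Vec3& C) {
    const Vec3 e1{B[0] - A[0], B[1] - A[1], B[2] - A[2]};
    const Vec3 e2{C[0] - A[0], C[1] - A[1], C[2] - A[2]};
    const Vec3 N{e1[1] * e2[2] - e1[2] * e2[1],
                 e1[2] * e2[0] - e1[0] * e2[2],
                 e1[0] * e2[1] - e1[1] * e2[0]};
    int k = 0;  // dominant axis of the normal
    if (std::fabs(N[1]) > std::fabs(N[k])) k = 1;
    if (std::fabs(N[2]) > std::fabs(N[k])) k = 2;
    assert(N[k] != 0.0f);  // degenerate triangles are not supported
    const int u = (k + 1) % 3, v = (k + 2) % 3;
    const float inv = 1.0f / N[k];  // N[k] is also the 2D determinant of (e1, e2)
    TriAccel t{};
    t.n_u = N[u] * inv;
    t.n_v = N[v] * inv;
    t.n_d = (N[0] * A[0] + N[1] * A[1] + N[2] * A[2]) * inv;
    t.b_u = e2[v] * inv;
    t.b_v = -e2[u] * inv;
    t.b_d = (A[v] * e2[u] - A[u] * e2[v]) * inv;
    t.c_u = -e1[v] * inv;
    t.c_v = e1[u] * inv;
    t.c_d = (A[u] * e1[v] - A[v] * e1[u]) * inv;
    t.k = k;
    return t;
}

// Returns true and writes tHit if the ray hits the triangle within [tNear, tFar].
bool intersect(const TriAccel& tri, const Vec3& org, const Vec3& dir,
               float tNear, float tFar, float& tHit) {
    const int k = tri.k, u = (k + 1) % 3, v = (k + 2) % 3;
    const float denom = dir[k] + tri.n_u * dir[u] + tri.n_v * dir[v];
    if (denom == 0.0f) return false;  // ray parallel to the plane
    const float t = (tri.n_d - org[k] - tri.n_u * org[u] - tri.n_v * org[v]) / denom;
    if (!(t >= tNear && t <= tFar)) return false;  // outside the current interval
    const float hu = org[u] + t * dir[u];  // hit point, projected to 2D
    const float hv = org[v] + t * dir[v];
    const float beta = tri.b_u * hu + tri.b_v * hv + tri.b_d;
    if (beta < 0.0f) return false;
    const float gamma = tri.c_u * hu + tri.c_v * hv + tri.c_d;
    if (gamma < 0.0f || beta + gamma > 1.0f) return false;
    tHit = t;
    return true;
}
```

The early exits mirror the order on the slide: the cheap distance test rejects most candidates before any 2D work is done.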
CPU Cost of Ray-Triangle Test
Cycles   Bary. C Code   Plücker SSE   Bary. SSE   SpeedUp
Min      78             77            22          3.5
Max      148            123           41          3.7
• 41 cycles ~ 20M ray-triangle intersections/sec
• SSE requires bundling four rays at a time.
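The cycle count and the throughput quoted above can be cross-checked with a little arithmetic. The clock rate f is not stated on the slide, so the value below is only what the two numbers imply:

```latex
f \approx 41\ \frac{\text{cycles}}{\text{test}} \times 20 \times 10^{6}\ \frac{\text{tests}}{\text{s}}
  = 8.2 \times 10^{8}\ \frac{\text{cycles}}{\text{s}} \approx 820\ \text{MHz}
```

In other words, the quoted rate is consistent with one CPU in the 800 MHz class spending all of its cycles on intersection tests.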
The Bundling of Four Rays at Once
• Better than four Triangles/One Ray
• Requires new Traversal algorithm
• Potential Overhead
• Primary rays versus shadow rays
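A minimal sketch of the four-rays-against-one-triangle idea, assuming a structure-of-arrays packet so that each arithmetic step maps to one four-wide operation. It uses a plain loop over four lanes instead of SSE intrinsics to stay portable; RayPacket4, intersect4 and the TriAccel field names are hypothetical, and the triangle record is assumed to hold a plane equation normalized along axis k plus two 2D edge equations.

```cpp
#include <cassert>
#include <cstdint>

struct TriAccel {
    float n_u, n_v, n_d;   // plane: p[k] + n_u*p[u] + n_v*p[v] = n_d
    float b_u, b_v, b_d;   // 2D edge equation for barycentric beta
    float c_u, c_v, c_d;   // 2D edge equation for barycentric gamma
    std::int32_t k;        // projection axis
    std::int32_t pad[2];
};

// Four rays stored as [axis][lane], so one row corresponds to one SIMD register.
struct RayPacket4 {
    float org[3][4];
    float dir[3][4];
    float tNear[4];
    float tFar[4];   // shrinks to the closest hit found so far
};

// Tests one triangle against four rays. Returns a bit mask of the lanes that hit
// and shortens tFar for those lanes. Every lane runs the same instruction
// sequence; the result is selected per lane at the end instead of branching early.
unsigned intersect4(const TriAccel& tri, RayPacket4& rays) {
    assert(tri.k >= 0 && tri.k < 3);
    const int k = tri.k, u = (k + 1) % 3, v = (k + 2) % 3;
    unsigned hitMask = 0;
    for (int i = 0; i < 4; ++i) {
        const float denom = rays.dir[k][i] + tri.n_u * rays.dir[u][i] + tri.n_v * rays.dir[v][i];
        const bool parallel = (denom == 0.0f);
        const float t = (tri.n_d - rays.org[k][i] - tri.n_u * rays.org[u][i]
                         - tri.n_v * rays.org[v][i]) / (parallel ? 1.0f : denom);
        const float hu = rays.org[u][i] + t * rays.dir[u][i];
        const float hv = rays.org[v][i] + t * rays.dir[v][i];
        const float beta = tri.b_u * hu + tri.b_v * hv + tri.b_d;
        const float gamma = tri.c_u * hu + tri.c_v * hv + tri.c_d;
        const bool hit = !parallel && t >= rays.tNear[i] && t <= rays.tFar[i]
                         && beta >= 0.0f && gamma >= 0.0f && beta + gamma <= 1.0f;
        if (hit) {
            rays.tFar[i] = t;
            hitMask |= 1u << i;
        }
    }
    return hitMask;
}
```

The triangle data is loaded once and reused by all four lanes, which is one reason a ray packet is attractive compared with testing four triangles against one ray.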
BSP Traversal
• Previously, traversal took 2x-3x more time than intersections
• Axis Aligned BSP Tree
– Only 2 binary decisions – efficient in parallel
– If any ray must traverse a child node, all four traverse it in parallel
• Algorithm
– Maintain current ray segment [near, far]
– Calculate distance to splitting plane
– Three cases
– Update segments and traverse children if necessary
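The algorithm outlined above can be sketched for a single ray as an iterative loop with an explicit stack. This is a common textbook formulation rather than the paper's packet version; Node, StackEntry and traverse are hypothetical names, children are assumed to be stored next to each other, and a leaf only records its id where a real tracer would intersect its triangles and stop at the first hit inside [near, far].

```cpp
#include <array>
#include <cassert>
#include <vector>

using Vec3 = std::array<float, 3>;

struct Node {
    bool  leaf;
    int   axis;     // inner node: split axis (0, 1 or 2)
    float split;    // inner node: split coordinate
    int   left;     // inner node: index of the left child; the right child is left + 1
    int   leafId;   // leaf: stands in for the leaf's triangle list
};

struct StackEntry { int node; float tNear; float tFar; };

// Returns the ids of the leaves pierced by the ray segment, front to back.
std::vector<int> traverse(const std::vector<Node>& nodes, const Vec3& org,
                          const Vec3& dir, float tNear, float tFar) {
    assert(!nodes.empty() && tNear >= 0.0f && tNear <= tFar);
    std::vector<int> visited;
    std::vector<StackEntry> stack;
    int cur = 0;
    for (;;) {
        while (!nodes[cur].leaf) {
            const Node& n = nodes[cur];
            const float o = org[n.axis], d = dir[n.axis];
            const bool originLeft = o < n.split || (o == n.split && d <= 0.0f);
            const int nearChild = originLeft ? n.left : n.left + 1;
            const int farChild  = originLeft ? n.left + 1 : n.left;
            if (d == 0.0f) { cur = nearChild; continue; }  // never crosses the plane
            const float tSplit = (n.split - o) / d;        // distance to splitting plane
            if (tSplit > tFar || tSplit <= 0.0f) {
                cur = nearChild;                           // case 1: near side only
            } else if (tSplit < tNear) {
                cur = farChild;                            // case 2: far side only
            } else {
                stack.push_back({farChild, tSplit, tFar}); // case 3: both, near first
                cur = nearChild;
                tFar = tSplit;
            }
        }
        visited.push_back(nodes[cur].leafId);
        if (stack.empty()) return visited;
        const StackEntry e = stack.back();
        stack.pop_back();
        cur = e.node;
        tNear = e.tNear;
        tFar = e.tFar;
    }
}
```

In a packet version the same three-way decision would be evaluated for four rays at once, descending into a child whenever at least one ray needs it.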
BSP Tree Memory Layout
• Caching and Prefetching in mind
• 1 pointer to the child nodes, node type flag, split coordinate
– = 8 bytes/node = 4 nodes/cache line.
– Aligned children
– Memory bandwidth reduced by 4x.
• Possible Overhead
– Incoherent rays = high overhead
– Worst case = no worse than normal
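One way to fit a node into 8 bytes is sketched below. The exact bit assignment is an assumption made for illustration: a 32-bit child index stands in for the pointer, its two low bits carry the node type flag (split axis or leaf), and both children are stored next to each other so a single index suffices. KdNode and the helper names are hypothetical. The 32-byte cache line follows from the slide's figure of four 8-byte nodes per line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct KdNode {
    // bits 0..1 : 0, 1, 2 = split axis of an inner node, 3 = leaf
    // bits 2..31: inner node: index of the first child (second child follows it)
    //             leaf: index of the leaf's item list
    std::uint32_t flagAndOffset;
    float split;  // split coordinate (unused for leaves)
};
static_assert(sizeof(KdNode) == 8, "expected an 8-byte node");

constexpr std::size_t kCacheLineBytes = 32;  // implied by 4 nodes of 8 bytes per line
constexpr std::size_t kNodesPerCacheLine = kCacheLineBytes / sizeof(KdNode);

inline KdNode makeInner(unsigned axis, std::uint32_t firstChild, float split) {
    assert(axis < 3u && firstChild < (1u << 30));
    return KdNode{static_cast<std::uint32_t>((firstChild << 2) | axis), split};
}

inline KdNode makeLeaf(std::uint32_t firstItem) {
    assert(firstItem < (1u << 30));
    return KdNode{static_cast<std::uint32_t>((firstItem << 2) | 3u), 0.0f};
}

inline bool isLeaf(const KdNode& n) { return (n.flagAndOffset & 3u) == 3u; }
inline unsigned axisOf(const KdNode& n) { return n.flagAndOffset & 3u; }
inline std::uint32_t offsetOf(const KdNode& n) { return n.flagAndOffset >> 2; }
inline std::uint32_t leftChild(const KdNode& n) { return offsetOf(n); }
inline std::uint32_t rightChild(const KdNode& n) { return offsetOf(n) + 1u; }
```

Because siblings sit side by side, fetching one child usually brings the other into the cache with the same line.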
Performance of the Ray Tracer
Considerations
• 11-15x Performance Increase!
• RTRT on 256MB RAM, others on 1GB!
BUT
• Difference in features
• Others not limited to triangles
• Others did not target performance
Comparison With Raster Hardware
Miscellaneous
• Reflections/Shadows
– Coherence less likely
– Hot spots, but same hacks as in raster (environment maps).
• Linear scaling for rasterization vs. logarithmic for ray
tracing => higher scene complexity favors ray tracing
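A rough worked example of the scaling argument, under an idealized model that ignores constant factors: rasterization cost is taken as proportional to the primitive count N, ray tracing cost per ray as proportional to log N. The symbols and the two scene sizes are chosen for illustration only. Going from N_1 = 10^4 to N_2 = 10^6 primitives:

```latex
\frac{C_{\text{raster}}(N_2)}{C_{\text{raster}}(N_1)} = \frac{N_2}{N_1} = 100,
\qquad
\frac{C_{\text{rt}}(N_2)}{C_{\text{rt}}(N_1)} = \frac{\log N_2}{\log N_1} = \frac{6}{4} = 1.5
```

Under these assumptions a hundredfold increase in scene size costs rasterization about 100x and ray tracing about 1.5x, so beyond some scene size the ray tracer comes out ahead even if it starts with a much larger constant factor.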
http://graphics.cs.uni-sb.de/%7Ewald/Publications/EG2001_IRCRT/Gallery.html