Mapping Computational Concepts to GPUs


General-Purpose Computation on
Graphics Hardware
Introduction
David Luebke
University of Virginia
Course Introduction
• The GPU on commodity video cards has
evolved into an extremely flexible and
powerful processor
– Programmability
– Precision
– Power
• This course will address how to harness that
power for general-purpose computation
Motivation: Computational Power
• GPUs are fast…
– 3.0 GHz dual-core Pentium 4: 24.6 GFLOPS
– NVIDIA GeForce 7800: 165 GFLOPS
– 1066 MHz FSB Pentium Extreme Edition: 8.5 GB/s
– ATI Radeon X850 XT Platinum Edition: 37.8 GB/s
• GPUs are getting faster, faster
– CPUs: 1.4× annual growth
– GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
Courtesy Kurt Akeley,
Ian Buck & Tim Purcell, GPU Gems (see course notes)
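The figures on this slide can be turned into ratios with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, the inputs are the peak numbers quoted above, and it assumes the annual growth factors stay constant, which the slide does not claim.

```python
# Peak figures quoted on the slide (not measured application performance).
CPU_GFLOPS = 24.6   # 3.0 GHz dual-core CPU quoted on the slide
GPU_GFLOPS = 165.0  # NVIDIA GPU quoted on the slide
CPU_GBPS = 8.5      # CPU front-side-bus bandwidth quoted on the slide
GPU_GBPS = 37.8     # ATI GPU memory bandwidth quoted on the slide

CPU_GROWTH = 1.4           # annual growth factor for CPUs
GPU_GROWTH_PIXELS = 1.7    # annual growth factor for GPU pixel throughput
GPU_GROWTH_VERTICES = 2.3  # annual growth factor for GPU vertex throughput


def compound(factor: float, years: int) -> float:
    """Total growth after `years` of constant annual growth `factor`."""
    return factor ** years


def relative_gain(gpu_factor: float, cpu_factor: float, years: int) -> float:
    """How much further the GPU pulls ahead of the CPU over `years`."""
    return compound(gpu_factor, years) / compound(cpu_factor, years)


compute_ratio = GPU_GFLOPS / CPU_GFLOPS   # GPU vs. CPU peak arithmetic
bandwidth_ratio = GPU_GBPS / CPU_GBPS     # GPU vs. CPU peak bandwidth
```

With these inputs, `compute_ratio` is about 6.7 and `bandwidth_ratio` is about 4.4. Under the constant-growth assumption, `relative_gain(GPU_GROWTH_PIXELS, CPU_GROWTH, 5)` is about 2.6 and `relative_gain(GPU_GROWTH_VERTICES, CPU_GROWTH, 5)` is about 12.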
Motivation: Computational Power
Courtesy Ian Buck, John Owens
An Aside: Computational Power
• Why are GPUs getting faster so fast?
– Arithmetic intensity: the specialized nature of GPUs
makes it easier to use additional transistors for
computation, not cache
– Economics: multi-billion dollar video game market is
a pressure cooker that drives innovation
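Arithmetic intensity is commonly defined as the ratio of arithmetic operations to bytes of memory traffic. The sketch below assumes that definition and compares two textbook kernels. The function names are hypothetical, and the matrix-multiply estimate assumes ideal data reuse (each matrix touched once), which real hardware only approximates.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def saxpy_intensity(n: int, bytes_per_float: int = 4) -> float:
    """y[i] = a * x[i] + y[i] for n elements.

    Per element: one multiply and one add (2 flops);
    read x[i], read y[i], write y[i] (3 memory accesses).
    """
    return arithmetic_intensity(2 * n, 3 * n * bytes_per_float)


def matmul_intensity(n: int, bytes_per_float: int = 4) -> float:
    """Dense n x n matrix multiply C = A * B.

    2 * n**3 flops; assumes A and B are each read once and C is
    written once (ideal reuse), i.e. 3 * n * n values of traffic.
    """
    return arithmetic_intensity(2 * n ** 3, 3 * n * n * bytes_per_float)
```

Under these assumptions SAXPY performs 1/6 flop per byte regardless of size, while a 1024 × 1024 matrix multiply performs roughly 171 flops per byte. Kernels of the second kind are the ones that benefit most from transistors spent on computation rather than cache.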
Motivation: Flexible and Precise
• Modern GPUs are deeply programmable
– Programmable pixel, vertex, video engines
– Solidifying high-level language support
• Modern GPUs support high precision
– 32 bit floating point throughout the pipeline
– High enough for many (not all) applications
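One way to see why 32-bit floating point is not enough for every application is to emulate it on the CPU. The sketch below rounds Python's 64-bit floats to IEEE 754 single precision using the standard `struct` module. The helper names are hypothetical, and GPUs of this generation did not necessarily follow IEEE 754 rounding exactly, so this is an approximation of the behavior rather than a model of any particular chip.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_add(a: float, b: float) -> float:
    """Add two numbers as if both operands and the result were 32-bit floats."""
    return to_float32(to_float32(a) + to_float32(b))


# A 32-bit float has a 24-bit significand, so integers above 2**24
# can no longer all be represented: adding 1 to 2**24 has no effect.
big = float(2 ** 24)
lost_increment = float32_add(big, 1.0) == big

# Decimal fractions such as 0.1 pick up a larger rounding error than
# they do in 64-bit arithmetic.
error_in_tenth = abs(to_float32(0.1) - 0.1)
```

Here `lost_increment` is `True`, and `error_in_tenth` is on the order of 1e-9. Counters, large index computations, and long-running accumulations are typical cases where this matters.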
Motivation: The Potential of GPGPU
• In short:
– The power and flexibility of GPUs make them an
attractive platform for general-purpose computation
– Example applications range from in-game physics
simulation to conventional computational science
– Goal: make the inexpensive power of the GPU
available to developers as a sort of computational
coprocessor
The Problem: Difficult To Use
• GPUs designed for & driven by video games
– Programming model unusual
– Programming idioms tied to computer graphics
– Programming environment tightly constrained
• Underlying architectures are:
– Inherently parallel
– Rapidly evolving (even in basic feature set!)
– Largely secret
• Can’t simply “port” CPU code!
Course goals
• A detailed introduction to general-purpose
computing on graphics hardware
• We emphasize:
– Core computational building blocks
– Strategies and tools for programming GPUs
– Tips & tricks, perils & pitfalls of GPU programming
• Case studies to bring it all together
Why a SIGGRAPH Course?
• Why SIGGRAPH, not (say) Supercomputing?
– Many graphics applications can benefit from GPGPU
• “Hot topic” examples: shadows, level sets, fluids
• Keeping computation on-card!
– Many graphics applications strive for visual
plausibility rather than rigorous scientific realism
• Better tolerate GPU limitations in precision, memory
• Well suited as GPGPU “early adopters”
– GPGPU programming still requires expertise of
SIGGRAPH audience
Course Prerequisites
• We assume
– Familiarity with interactive graphics and computer
graphics hardware
– Ideally, some experience programming vertex and
pixel shaders
• Target audience
– Researchers interested in GPGPU
– Graphics and games developers interested in
incorporating these techniques into their work
– Attendees wishing a survey of this exciting field
Course Topics
• GPU building blocks
• Languages and tools
• Effective GPU programming
• GPGPU case studies
Course Topics: Details
• GPU building blocks
– Linear algebra
– Sorting and searching
– Geometric Computing
• Languages and tools
– High-level languages
– Debugging tools
Course Topics: Details
• Effective GPU programming
– Efficient data-parallel programming
– GPU memory resources & data layout approaches
– GPU computation strategies & tricks
– Data structures
• Case studies in GPGPU Programming
– Databases and data mining operations on GPUs
– Particles & grids on GPUs
– Adaptive shadow maps & octree textures on GPUs
Speakers
• In order of appearance:
– David Luebke, University of Virginia
– Mark Harris, NVIDIA
– Jens Krüger, TU-Munich
– Tim Purcell, NVIDIA
– Naga Govindaraju, University of North Carolina
– Ian Buck, NVIDIA
– Cliff Woolley, University of Virginia
– Aaron Lefohn, University of California Davis
Schedule
8:30 Introduction (Luebke)
– Welcome, overview, the graphics pipeline
GPU Building Blocks
8:50 Computational concepts: CPU to GPU (Harris)
– Streaming, resources, CPU-GPU analogies, branching
9:15 Linear algebra (Krüger)
– Representations, operations, example algorithms
9:50 Sorting & Searching (Purcell)
– Bitonic sort, Binary & k-nearest neighbor search
10:15 Break
Schedule
10:30 Geometric computation (Govindaraju)
– Visibility, collision & proximity, reliable computation
Languages and Tools
11:00 High-level languages (Buck)
– Cg/HLSL/GLslang, Sh, Brook
11:30 Debugging tools (Purcell)
– imdebug, DirectX/OpenGL shader IDEs, ShadeSmith
Schedule
Effective GPGPU Programming
11:50 GPU program optimization (Woolley)
– Computational frequency, profiling, load balancing
12:15 Lunch break
1:45 GPU memory models (Lefohn)
– Memory objects, layout of data structures, FBOs
2:15 GPU computation strategies & tricks (Buck)
– Precision, performance, scatter, branching
2:55 GPU data structures (Lefohn)
– High-level data structures
3:30 Break
Schedule
Case Studies
3:45 Databases & data mining on GPUs (Govindaraju)
– Queries, aggregation, mining frequencies & quantiles
4:15 Geometry processing on GPUs (Krüger)
– Particles, grids, PBO/VBO vs. FBO vs. VTF/SM3.0
4:45 Applications of adaptive data structures (Lefohn)
– Adaptive shadow maps, octree textures
Conclusion
5:15 Question-and-answer session (All)
5:30 Wrap!
GPU Fundamentals: The Graphics Pipeline
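Figure (pipeline diagram): stages in order, with the data passed between them
– CPU: Application → Vertices (3D)
– GPU: Transform & Light → Transformed, Lit Vertices (2D)
– GPU: Assemble Primitives → Screenspace triangles (2D)
– GPU: Rasterize → Fragments (pre-pixels)
– GPU: Shade → Final Pixels (Color, Depth)
– Graphics State configures the GPU stages
– Video Memory (Textures), with render-to-texture writing output back to it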
• A simplified graphics pipeline
– Note that pipe widths vary
– Many caches, FIFOs, and so on not shown
GPU Fundamentals: The Modern Graphics Pipeline
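• Programmable vertex processor!
• Programmable pixel processor!
Figure (pipeline diagram): stages in order, with the data passed between them
– CPU: Application → Vertices (3D)
– GPU: Vertex Processor (the Transform stage) → Transformed, Lit Vertices (2D)
– GPU: Assemble Primitives → Screenspace triangles (2D)
– GPU: Rasterize → Fragments (pre-pixels)
– GPU: Fragment Processor (the Shade stage) → Final Pixels (Color, Depth)
– Graphics State configures the GPU stages
– Video Memory (Textures), with render-to-texture writing output back to it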
The Coming Soon Graphics Pipeline
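• Programmable primitive assembly!
• More flexible memory access!
Figure (pipeline diagram): stages in order, with the data passed between them
– CPU: Application → Vertices (3D)
– GPU: Vertex Processor → Transformed, Lit Vertices (2D)
– GPU: Geometry Processor (the Assemble Primitives stage) → Screenspace triangles (2D)
– GPU: Rasterize → Fragments (pre-pixels)
– GPU: Fragment Processor → Final Pixels (Color, Depth)
– Graphics State configures the GPU stages
– Video Memory (Textures), with render-to-texture writing output back to it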
GPU Pipeline: Transform
• Vertex processor (multiple in parallel)
– Transform from “world space” to “image space”
– Compute per-vertex lighting
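The per-vertex work described above can be sketched on the CPU in a few lines. This is a minimal illustration, not the course's code. The function names are hypothetical, the matrix is assumed to be a combined model-view-projection matrix in row-major order, and the lighting is assumed to be a simple Lambertian diffuse term with unit-length vectors.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def transform_vertex(mvp, position, width, height):
    """Map a 3D position to pixel coordinates plus normalized depth."""
    x, y, z, w = mat_vec(mvp, [position[0], position[1], position[2], 1.0])
    # Perspective divide: clip space -> normalized device coordinates.
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transform: [-1, 1] -> [0, width] and [0, height].
    px = (ndc_x + 1.0) * 0.5 * width
    py = (ndc_y + 1.0) * 0.5 * height
    return px, py, ndc_z


def diffuse(normal, light_dir):
    """Lambertian term: cosine of the angle, clamped at zero."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)


# Hypothetical projection whose last row sets w = -z (camera looks down -z).
SIMPLE_PERSPECTIVE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]
```

For example, `transform_vertex(SIMPLE_PERSPECTIVE, (1.0, 1.0, -2.0), 640, 480)` gives `(480.0, 360.0, -1.0)`. Each vertex is processed independently of the others, which is what lets the hardware run several vertex processors in parallel.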
GPU Pipeline: Rasterize
• Rasterizer
– Convert geometric rep. (vertex) to image rep.
(fragment)
• Fragment = image fragment
– Pixel + associated data: color, depth, stencil, etc.
– Interpolate per-vertex quantities across pixels
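Rasterization and interpolation can be sketched with edge functions and barycentric weights. The code below is a simplified CPU illustration with hypothetical names. It samples each pixel at its center, interpolates a single scalar attribute, and counts pixels that lie exactly on an edge as covered, whereas real hardware applies a fill rule so that shared edges are not drawn twice.

```python
def edge(a, b, p):
    """Twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return (x, y, value) fragments for a 2D triangle.

    v0..v2 are screen-space vertices; c0..c2 are the per-vertex
    values of one attribute to interpolate across the triangle.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # Degenerate triangle covers no pixels.
    fragments = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # Sample at the pixel center.
            # Barycentric weights; dividing by area handles both windings.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                value = w0 * c0 + w1 * c1 + w2 * c2
                fragments.append((x, y, value))
    return fragments


frags = rasterize((0, 0), (4, 0), (0, 4), 0.0, 1.0, 0.0, 4, 4)
```

With this triangle on a 4 × 4 grid, `frags` holds 10 fragments, and the first one is `(0, 0, 0.125)`. The interpolated value grows toward the vertex whose attribute is 1.0.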
GPU Pipeline: Shade
• Fragment processors (multiple in parallel)
– Compute a color for each pixel
– Optionally read colors from textures (images)
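A fragment program can be thought of as a small function applied independently to every fragment. The sketch below assumes a hypothetical fragment layout (a dict with `color` and `uv` entries), texture coordinates in [0, 1], and nearest-neighbor texture lookup. It multiplies the interpolated color by the texel, which is one common choice among many.

```python
def sample_nearest(texture, u, v):
    """Look up the texel nearest to (u, v); texture is a list of rows."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)    # Clamp so u == 1.0 stays in range.
    y = min(int(v * height), height - 1)  # Clamp so v == 1.0 stays in range.
    return texture[y][x]


def shade(fragment, texture):
    """Fragment program: modulate the interpolated color by a texel."""
    r, g, b = fragment["color"]
    tr, tg, tb = sample_nearest(texture, *fragment["uv"])
    return (r * tr, g * tg, b * tb)


def shade_all(fragments, texture):
    """Apply the same program to every fragment.

    No fragment depends on another, so a GPU can spread this loop
    across its parallel fragment processors.
    """
    return [shade(f, texture) for f in fragments]


# A 2x2 RGB texture and a single grey fragment, for illustration only.
TEXTURE = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
]
FRAGMENTS = [{"color": (0.5, 0.5, 0.5), "uv": (0.75, 0.25)}]
```

Here `shade_all(FRAGMENTS, TEXTURE)` returns `[(0.0, 0.5, 0.0)]`. For GPGPU work, the same structure serves as a data-parallel map, with the texture acting as an input array.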
Coming Up
• Next: Mapping computational concepts
to the GPU
• Also coming up:
– Core building blocks for GPGPU computing
– Memory layout, data structures, and algorithms
– Detailed advice on writing high performance
GPGPU code
– Lots of examples
Course Evaluation Form
• Please help us improve the GPGPU Course
• Fill out the SIGGRAPH evaluation form:
http://www.siggraph.org/cgi-bin/cgi/exCoursesEval.html
Choose Course 39: GPGPU