Technical Approach - Geometric Algorithms for Modeling


Interactive CGF Computations using COTS Graphics Processors
Dinesh Manocha
University of North Carolina at Chapel Hill
[email protected]
http://gamma.cs.unc.edu/LOS
11/28/2005
Manocha
1
UNC Collaborators
• Co-PI
– Ming C. Lin
• Research Staff
– Naga Govindaraju
– Dave Tuft
• Graduate Students
– Russ Gayle
– Brandon Lloyd
– Brian Salomon
– Avneesh Sud
– Sungeui Yoon
– Talha Zaman
Collaborative Effort
• RDECOM
– Maria Bauer
– Angel Rodriguez
• SAIC
– Eric Root
– Marlo Verdesca
– Jaeson Munro
• Stanford
– Pat Hanrahan
– Ian Buck
Acknowledgements
• BCSEO
• DARPA
• RDECOM
• PEO STRI
Real-time Computational Challenges for Computer Generated Forces (CGF)
• Atmospheric transport models
• Vehicle dynamics
• Wide area sensors
• Petabyte Urban Terrain Databases
Real-time Terrain Reasoning for Computer Generated Forces
• Best algorithms are O(N^2), where N = number of objects/entities in the
CGF database (e.g., sensors, platforms, buildings, people)
• Currently over 40% of CGF CPU time for battalion-level
scenarios is spent in:
– Collision detection
– Line of sight computation
– Terrain placement
• The current system can barely handle 300 entities on a 300K-polygon terrain
model at 10m x 10m resolution
• A 200-500x improvement is needed to handle sub-meter
resolution terrain models
• CPUs are progressing at Moore’s law (1.7x per year), so they need
more than 7-8 years to catch up
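The scaling argument on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes a steady 1.7x yearly CPU gain and that a pairwise algorithm examines every unordered pair of entities once.

```python
import math

def years_to_reach(speedup, annual_growth=1.7):
    """Years of compounding annual_growth needed to reach a given speedup."""
    return math.log(speedup) / math.log(annual_growth)

def entity_pairs(n):
    """Number of unordered entity pairs an O(N^2) pairwise algorithm examines."""
    return n * (n - 1) // 2

# Time for CPUs alone to deliver the 200-500x improvement quoted on the slide.
for target in (200, 500):
    print(f"{target}x at 1.7x/year: {years_to_reach(target):.1f} years")

# Growth of pairwise work between the current limit and a larger scenario.
for n in (300, 5000):
    print(f"{n} entities: {entity_pairs(n)} pairs")
```

Under these assumptions the wait comes out at roughly 10 to 12 years, which is consistent with the slide's "more than 7-8 years".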
Current Desktop System
[Figure: block diagram of a current desktop system. Components shown: CPU (3 GHz, 2 x 1 MB cache); system memory (2 GB); AGP memory (512 MB); PCI-E bus (4 GB/s); GPU (500 MHz); video memory (512 MB). Bandwidth labels: 6.4 GB/s and 35.2 GB/s.]
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
GeForce 7800 – 302M Transistors
CPU vs. GPU
(Henry Moreton: NVIDIA, Aug. 2005)

Metric                  PEE 840    7800GTX    GPU/CPU
Graphics GFLOPs            25.6       1300       50.8
Shader GFLOPs              25.6        313       12.2
Die Area (mm^2)             206        326        1.6
Die Area normalized         206        218        1.1
Transistors (M)             230        302        1.3
Power (W)                   130         65        0.5
GFLOPS/mm                   0.1        6.0       47.9
GFLOPS/tr                   0.1        4.3       38.7
GFLOPS/W                    0.2       20.0      101.6
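The three efficiency rows of this table appear to be derived from the rows above them. The sketch below recomputes them on the assumption (inferred from the numbers, not stated on the slide) that they use Graphics GFLOPs and the normalized die area; the dictionary keys and function names are invented for this example.

```python
# Figures copied from the table: Graphics GFLOPs, normalized die area,
# transistor count (millions) and power (watts).
cpu = {"gflops": 25.6, "area_mm2": 206, "transistors_m": 230, "power_w": 130}
gpu = {"gflops": 1300, "area_mm2": 218, "transistors_m": 302, "power_w": 65}

def per_unit(chip, resource):
    """Graphics GFLOPs delivered per unit of the given resource."""
    return chip["gflops"] / chip[resource]

def gpu_cpu_ratio(resource):
    """How many times more GFLOPs per unit of resource the GPU delivers."""
    return per_unit(gpu, resource) / per_unit(cpu, resource)

for resource in ("area_mm2", "transistors_m", "power_w"):
    print(resource,
          round(per_unit(cpu, resource), 1),
          round(per_unit(gpu, resource), 1),
          round(gpu_cpu_ratio(resource), 1))
```

The per-transistor and per-watt ratios match the table after rounding. The per-area ratio comes out slightly above the tabulated 47.9, presumably because the slide's inputs were themselves rounded.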
Goal: Exploit GPUs for CGF Computations
GPUs: Growing Faster than Moore’s Law
This graph highlights the relative growth rate of GPUs vs. CPUs. GPUs have
been growing at a rate faster than Moore’s law and this trend is expected to
continue for at least 5 more years.
Issues in using GPUs
• Programmability
• Precision
• Handling large data
Project Accomplishments
• GPU-based LOS algorithm
– 150-200x improvement in LOS query
– Integration into OneSAF: 15-20x simulation speed improvement (5000 entities)
• Region-based visibility algorithms to accelerate LOS (supported by ATO)
– 4-10x further improvement in LOS query
– Integration into OneSAF: 10x simulation speed improvement in urban environments (3000 entities)
• GPU-based route planning
– 10-30x improvement in route computation
– 10x simulation speed improvement (3000 entities)
• GPU-based collision detection
– 10x estimated improvement in collision query
– 10x simulation speed improvement (150 entities)
Project Accomplishments
• Successful demonstration at DARPATech’2005; I/ITSEC’04; I/ITSEC’05 (RDECOM Booth #2266)
• Other GPU-based algorithms & applications
– Database, data streaming, numerical computation, fluid dynamics, sorting, motion planning
LOS Integration Process
Process stages, with the responsible party in parentheses:
• OneSAF/GPU Requirements (SAIC/UNC)
• OneSAF Technical Report (SAIC)
• GPU Algorithm Creation (UNC)
• Integration into OOS (SAIC)
– Add several OpenGL DLLs to the ERC libraries
– Place the C++ header files for OpenGL among the ERC code
– Create a new directory among the ERC code
  - Set up a new makefile/buildfile so that the GPU code builds as its own library
– Add calls to ERC initialization to:
  - Gather all the triangles in the entire database
  - Gather all features in the database
  - Pass all triangles and features into the initialization for the GPU
– Replace all original LOS calls with their GPU counterparts
• Execute Unit Test (SAIC/UNC)
• OneSAF Scenario Creation (SAIC)
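The integration steps listed for this slide (gather every triangle once at initialization, then route each LOS call to a replacement service) can be outlined as follows. This is a CPU-only sketch under stated assumptions, not the GPU algorithm and not OneSAF code: the class `BruteForceLOS` and its methods are hypothetical names, and a brute-force Moller-Trumbore segment-triangle test stands in for the real query.

```python
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]

def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def segment_hits_triangle(p: Vec3, q: Vec3, tri: Triangle, eps: float = 1e-9) -> bool:
    """Moller-Trumbore intersection test restricted to the segment p -> q."""
    v0, v1, v2 = tri
    d = _sub(q, p)
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    h = _cross(d, e2)
    a = _dot(e1, h)
    if abs(a) < eps:
        return False  # segment is parallel to the triangle's plane
    f = 1.0 / a
    s = _sub(p, v0)
    u = f * _dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    w = _cross(s, e1)
    v = f * _dot(d, w)
    if v < 0.0 or u + v > 1.0:
        return False
    t = f * _dot(e2, w)
    return eps < t < 1.0 - eps  # hit lies strictly between the endpoints

class BruteForceLOS:
    """Hypothetical stand-in for an LOS service initialized with all triangles."""

    def __init__(self, triangles: Iterable[Triangle]):
        # Mirrors the "gather all triangles, pass them to initialization" step.
        self.triangles = list(triangles)

    def has_line_of_sight(self, a: Vec3, b: Vec3) -> bool:
        # Sight is blocked if any triangle intersects the segment a -> b.
        return not any(segment_hits_triangle(a, b, t) for t in self.triangles)
```

Each query here visits every triangle, so the cost grows with terrain size; keeping the query behind a single method is what would let an accelerated implementation be swapped in without touching the callers.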
OneSAF Benchmark Results
(SAIC)
OneSAF with GPU-based LOS Algorithm: Demonstration
• Average time for standard LOS service call: 1-2 milliseconds (w/o GPU-based algorithm)
• Average time for GPU LOS service call: 8-12 microseconds
– Almost 200x speedup for a single LOS query
• 15-20x improvement in OneSAF simulation speed on JRTC terrain with 5000 entities
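The gap between the per-query speedup and the whole-simulation speedup can be explored with simple arithmetic. The sketch below is an illustration, not a measurement: the function names are invented, and it assumes Amdahl's law applies, with LOS as the only accelerated component and everything else unchanged.

```python
def speedup_range(cpu_lo_s, cpu_hi_s, gpu_lo_s, gpu_hi_s):
    """Smallest and largest speedup implied by two timing ranges (in seconds)."""
    return cpu_lo_s / gpu_hi_s, cpu_hi_s / gpu_lo_s

def amdahl(fraction, component_speedup):
    """Overall speedup when `fraction` of the run time is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)

def fraction_for(overall_speedup, component_speedup):
    """Fraction of run time that must be accelerated to reach overall_speedup."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / component_speedup)

# 1-2 ms per standard call versus 8-12 microseconds per GPU call.
low, high = speedup_range(1e-3, 2e-3, 8e-6, 12e-6)
print(f"single-query speedup between {low:.0f}x and {high:.0f}x")

# Share of run time LOS would need to occupy for a 200x query speedup
# to yield a 15x or 20x simulation speedup.
for overall in (15, 20):
    print(f"{overall}x overall needs LOS share of {fraction_for(overall, 200):.1%}")
```

Under these assumptions the quoted timings bracket the "almost 200x" figure, and a 15-20x overall gain would imply that LOS dominated the run time of the 5000-entity scenario before acceleration.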
Databases: Predicate Evaluation
CPU implementation: Intel compiler 7.1 with SSE optimizations
The combined CPU + GPU implementation is ~20 times faster than the CPU alone
SIGMOD 2004
Comparison on Different GPUs: Super-Moore’s Law
GPUSort: 32-bit floating point inputs
[Figure: sorting time (seconds, 0 to 6) versus input size (in millions, 0 to 16). Series: GPUSort on GeForce 6800 Ultra; CLAPACK slasrt() on 3.4 GHz Pentium IV; optimized sorting using Hyper-threading on 3.4 GHz Pentium IV; GPUSort on GeForce 7800 GTX.]
GPUSORT: slashdot.org & Tom’s Hardware guide (750 downloads in 6 weeks)
LU-Decomposition with Partial Pivoting (32-bit inputs)
IEEE/ACM SuperComputing 2005
Project Status
• Integration of GPU-based algorithms in OOS
– Line-of-sight
– Route planning
– Collision detection
• 35 publications in last 18 months
• 2 best paper awards (Pacific Graphics’04; IEEE VR’05)
• Paper presentations on GPU technology in OOS
• Poster presentation at Army Science Conference’04
• Best paper in Research & Development Track at I/ITSEC’05
– Nominated for best overall paper award at I/ITSEC’05
• Other applications: sorting, stream data mining, surgical simulation, physical simulation, computer animation, high-performance computing
• Other collaborators: NVIDIA, Intel, ATI, AGEIA, Disney
Future Goals
• Develop novel GPU-based algorithms
• Other LOS computations: attenuation, handling smoke
• Force and atmospheric simulations
• Combine with multi-resolution representations
• Handle very large and complex terrains
• GPU clusters for modeling and simulation
• Extension to multiple simulation environments: WARSIM, JMTK, GIG