Technical Approach - Geometric Algorithms for Modeling
Interactive CGF Computations using COTS Graphics Processors
Dinesh Manocha
University of North Carolina at Chapel Hill
http://gamma.cs.unc.edu/LOS
11/28/2005
Manocha
1
UNC Collaborators
Co-PI
Ming C. Lin
Research Staff
Naga Govindaraju
Dave Tuft
Graduate Students
Russ Gayle
Brandon Lloyd
Brian Salomon
Avneesh Sud
Sungeui Yoon
Talha Zaman
Collaborative Effort
RDECOM
Maria Bauer
Angel Rodriguez
SAIC
Eric Root
Marlo Verdesca
Jaeson Munro
Stanford
Pat Hanrahan
Ian Buck
Acknowledgements
BCSEO
DARPA
RDECOM
PEO STRI
Real-time Computational Challenges for Computer Generated Forces (CGF)
• Atmospheric transport models
• Vehicle dynamics
• Wide area sensors
• Petabyte Urban Terrain Databases
Real-time Terrain Reasoning for Computer Generated Forces
• Best algorithms are O(N²), where N = number of objects/entities in the CGF database (e.g., sensors, platforms, buildings, people)
• Currently over 40% of CGF CPU time for battalion-level scenarios is spent in:
– Collision detection
– Line of sight computation
– Terrain placement
• Current system can barely handle 300 entities on a 300K-polygon terrain model at 10m x 10m resolution
• Need 200-500 times improvement to handle a sub-meter resolution terrain model
• CPUs progressing at Moore’s law (1.7x per year) would need more than 7-8 years to catch up
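The arithmetic behind the last bullets can be checked with a short sketch. It assumes simple compound growth at the slide's 1.7x per year and counts unordered entity pairs as a stand-in for the O(N²) cost; the function names are illustrative and not from the original deck. Under these assumptions a 200x gain takes roughly 10 years and a 500x gain roughly 11.7 years, which is consistent with "more than 7-8 years".

```python
import math


def years_to_reach(speedup: float, growth_per_year: float) -> float:
    """Years of compound growth needed to reach a target speedup."""
    return math.log(speedup) / math.log(growth_per_year)


def entity_pairs(n: int) -> int:
    """Number of unordered entity pairs, the source of the O(N^2) cost."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Growth rate (1.7x per year) and targets (200x, 500x) are taken from the slide.
    for target in (200, 500):
        years = years_to_reach(target, 1.7)
        print(f"{target}x at 1.7x/year: {years:.1f} years")
    # Pair counts for the entity counts quoted in the deck.
    for n in (300, 5000):
        print(f"{n} entities: {entity_pairs(n)} pairs")
```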
Current Desktop System
• CPU (3 GHz), 2 x 1 MB cache
• System memory (2 GB): 6.4 GB/s bandwidth
• AGP memory (512 MB)
• PCI-E bus (4 GB/s)
• GPU (500 MHz)
• Video memory (512 MB): 35.2 GB/s bandwidth
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
GeForce 7800: 302M transistors
CPU vs. GPU
(Henry Moreton: NVIDIA, Aug. 2005)

Metric                          PEE 840   7800GTX   GPU/CPU
Graphics GFLOPS                    25.6      1300      50.8
Shader GFLOPS                      25.6       313      12.2
Die area (mm²)                      206       326       1.6
Die area, normalized (mm²)          206       218       1.1
Transistors (M)                     230       302       1.3
Power (W)                           130        65       0.5
GFLOPS/mm²                          0.1       6.0      47.9
GFLOPS/M transistors                0.1       4.3      38.7
GFLOPS/W                            0.2      20.0     101.6
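The derived rows of the table can be recomputed from the raw rows. The sketch below assumes the efficiency rows use graphics GFLOPS and the normalized die area, which is the reading that matches the slide's ratios to within about 0.1; the dictionary keys and function names are illustrative only.

```python
# Raw figures from the slide (PEE 840 CPU vs. GeForce 7800 GTX GPU).
# Assumption: the per-area row uses the normalized die area (218 mm^2 for the GPU).
cpu = {"gflops": 25.6, "area_mm2": 206, "transistors_m": 230, "power_w": 130}
gpu = {"gflops": 1300, "area_mm2": 218, "transistors_m": 302, "power_w": 65}


def per_unit(chip: dict, key: str) -> float:
    """Graphics GFLOPS per unit of the given resource (area, transistors, power)."""
    return chip["gflops"] / chip[key]


def gpu_cpu_ratio(key: str) -> float:
    """How many times more GFLOPS per unit of resource the GPU delivers."""
    return per_unit(gpu, key) / per_unit(cpu, key)


if __name__ == "__main__":
    for key in ("area_mm2", "transistors_m", "power_w"):
        print(f"{key}: CPU {per_unit(cpu, key):.2f}, "
              f"GPU {per_unit(gpu, key):.2f}, ratio {gpu_cpu_ratio(key):.1f}")
```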
Goal: Exploit GPUs for CGF Computations
GPUs: Growing Faster than Moore’s Law
This graph highlights the relative growth rate of GPUs vs. CPUs. GPUs have
been growing at a rate faster than Moore’s law and this trend is expected to
continue for at least 5 more years.
Issues in using GPUs
Programmability
Precision
Handling large data
Project Accomplishments
• GPU-based LOS algorithm
– 150-200x improvement in LOS query
– Integration into OneSAF: 15-20x simulation speed improvement (5000 entities)
• Region-based visibility algorithms to accelerate LOS (supported by ATO)
– 4-10x further improvement in LOS query
– Integration into OneSAF: 10x simulation speed improvement in urban environments (3000 entities)
• GPU-based route planning
– 10-30x improvement in route computation
– 10x simulation speed improvement (3000 entities)
• GPU-based collision detection
– 10x estimated improvement in collision query
– 10x simulation speed improvement (150 entities)
• Successful demonstrations at DARPATech 2005, I/ITSEC’04, and I/ITSEC’05 (RDECOM Booth #2266)
• Other GPU-based algorithms & applications
– Databases, data streaming, numerical computation, fluid dynamics, sorting, motion planning
LOS Integration Process
• OneSAF/GPU Requirements (SAIC/UNC)
• OneSAF Technical Report (SAIC)
• GPU Algorithm Creation (UNC)
• Integration into OOS (SAIC)
  – Add several OpenGL DLLs to the ERC libraries
  – Place C++ header files for OpenGL among the ERC code
  – Create a new directory among the ERC code
    - Set up a new makefile/buildfile to allow the GPU code to build as its own library
  – Add calls to ERC initialization to:
    - Gather all the triangles in the entire database
    - Gather all features in the database
    - Pass all triangles and features into the initialization for the GPU
  – Replace all original LOS calls with their GPU counterparts
• Execute Unit Test (SAIC/UNC)
• OneSAF Scenario Creation (SAIC)
• OneSAF Benchmark Results (SAIC)
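To make concrete what an LOS query over the gathered terrain triangles computes, here is a minimal CPU reference sketch. It is not the GPU algorithm described in this deck and not the OneSAF API: it is a brute-force check that tests the observer-to-target segment against every triangle with the standard Möller-Trumbore intersection test. All names (`segment_hits_triangle`, `has_line_of_sight`) are hypothetical.

```python
from typing import Iterable, Tuple

Vec = Tuple[float, float, float]
Triangle = Tuple[Vec, Vec, Vec]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p: Vec, q: Vec, tri: Triangle, eps: float = 1e-9) -> bool:
    """Moller-Trumbore test, restricted to the open segment from p to q."""
    v0, v1, v2 = tri
    d = sub(q, p)                      # segment direction (not normalized)
    e1, e2 = sub(v1, v0), sub(v2, v0)  # triangle edges
    h = cross(d, e2)
    a = dot(e1, h)
    if abs(a) < eps:
        return False                   # segment is parallel to the triangle plane
    f = 1.0 / a
    s = sub(p, v0)
    u = f * dot(s, h)                  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    qv = cross(s, e1)
    v = f * dot(d, qv)                 # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = f * dot(e2, qv)                # position along the segment, 0 at p, 1 at q
    return eps < t < 1.0 - eps


def has_line_of_sight(observer: Vec, target: Vec, triangles: Iterable[Triangle]) -> bool:
    """True if no triangle blocks the straight segment from observer to target."""
    return not any(segment_hits_triangle(observer, target, t) for t in triangles)


if __name__ == "__main__":
    wall = ((5.0, -10.0, -10.0), (5.0, 10.0, -10.0), (5.0, 0.0, 10.0))
    print(has_line_of_sight((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), [wall]))  # blocked by the wall
    print(has_line_of_sight((0.0, 0.0, 0.0), (0.0, 10.0, 0.0), [wall]))  # runs parallel to the wall
```

Each query here costs time proportional to the number of triangles, so a brute-force version like this is only useful as a correctness baseline when unit-testing a faster replacement.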
OneSAF with GPU-based LOS Algorithm: Demonstration
• Average time for standard LOS service call: 1-2 milliseconds (without GPU-based algorithm)
• Average time for GPU LOS service call: 8-12 microseconds
– Almost 200x speedup for a single LOS query
• 15-20x improvement in OneSAF simulation speed on JRTC terrain with 5000 entities
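The gap between the roughly 200x per-query speedup and the 15-20x overall simulation speedup is what Amdahl's law predicts when only part of the run time is accelerated. The sketch below is an illustrative inference, not a measurement from the deck: it assumes LOS is the only accelerated component and asks what fraction of the original run time LOS would have to occupy to yield the reported overall gains. Function names are hypothetical.

```python
def overall_speedup(fraction: float, component_speedup: float) -> float:
    """Amdahl's law: total speedup when `fraction` of run time is sped up."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


def implied_fraction(overall: float, component_speedup: float) -> float:
    """Invert Amdahl's law: fraction of run time the accelerated part must occupy."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / component_speedup)


if __name__ == "__main__":
    # Per-query timings from the slide: 1-2 ms on the CPU, 8-12 microseconds on the GPU.
    slowest_ratio = 1000.0 / 12.0   # 1 ms vs. 12 us
    fastest_ratio = 2000.0 / 8.0    # 2 ms vs. 8 us
    print(f"per-query speedup range: {slowest_ratio:.0f}x to {fastest_ratio:.0f}x")
    # Assumption: a 200x LOS speedup and no other change to the simulation.
    for overall in (15.0, 20.0):
        share = implied_fraction(overall, 200.0)
        print(f"{overall:.0f}x overall implies LOS share of about {share:.1%}")
```

Under these assumptions the LOS share of the original run time would be around 94-95% in the 5000-entity scenario, which fits the O(N²) growth of LOS work with entity count.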
Databases: Predicate Evaluation
CPU implementation — Intel compiler 7.1 with SSE optimizations
(CPU + GPU) is ~20 times faster than the CPU alone
SIGMOD 2004
Comparison on Different GPUs: Super-Moore’s Law
GPUSort: 32-bit floating point inputs
[Figure: sorting time (seconds) versus input size (in millions, up to 16) for GPUSort on a GeForce 6800 Ultra, GPUSort on a GeForce 7800 GTX, CLAPACK slasrt() on a 3.4 GHz Pentium IV, and optimized sorting using Hyper-threading on a 3.4 GHz Pentium IV]
GPUSORT: slashdot.org & Tom’s Hardware guide (750 downloads in 6 weeks)
LU-Decomposition with Partial Pivoting (32-bit inputs)
IEEE/ACM SuperComputing 2005
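For readers unfamiliar with the factorization named on this slide, the following is a minimal CPU reference sketch of LU decomposition with partial pivoting (row swaps chosen by largest absolute pivot). It is a textbook formulation in plain Python, not the GPU implementation presented at SuperComputing 2005; the function name and the packed return format are assumptions made for illustration.

```python
from typing import List, Tuple


def lu_partial_pivot(a: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Factor a square matrix so that P*A = L*U.

    Returns (lu, perm): `lu` packs the unit-lower-triangular L (below the
    diagonal) and U (on and above the diagonal); `perm[k]` is the index of
    the original row that ended up in position k.
    """
    n = len(a)
    lu = [row[:] for row in a]   # work on a copy, leave the input untouched
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # Eliminate column k below the diagonal, storing the multipliers in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


if __name__ == "__main__":
    lu, perm = lu_partial_pivot([[0.0, 2.0], [1.0, 3.0]])
    print(lu, perm)
```

The inner update of the trailing submatrix touches every remaining entry independently, which is the part of the algorithm that lends itself to data-parallel hardware.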
Project Status
• Integration of GPU-based algorithms in OOS
– Line-of-sight
– Route planning
– Collision detection
• 35 publications in last 18 months
– 2 best paper awards (Pacific Graphics’04; IEEE VR’05)
• Paper presentations on GPU technology in OOS
– Poster presentation at Army Science Conference’04
– Best paper in Research & Development Track at I/ITSEC’05; nominated for best overall paper award at I/ITSEC’05
• Other applications: sorting, stream data mining, surgical simulation, physical simulation, computer animation, high-performance computing
• Other collaborators: NVIDIA, Intel, ATI, AGEIA, Disney
Future Goals
• Develop novel GPU-based algorithms
– Other LOS computations: attenuation, handling smoke
– Force and atmospheric simulations
• Combine with multi-resolution representations
– Handle very large and complex terrains
• GPU clusters for modeling and simulation
• Extension to multiple simulation environments: WARSIM, JMTK, GIG