Status – Week 260
Victor Moya
Summary
shSim.
GPU design.
Future Work.
Rumors and News.
Imagine.
shSim
Currently working:
Command Processor: reads a text-based trace file
(programs, parameters, vertices, commands to the
rasterizer).
Shader: simulates an N-thread multithreaded, VS1-capable
‘vertex’ shader with variable latency support.
Rasterizer: OpenGL ‘emulator’, accepts resolution
and clip plane changes, receives ‘shaded’ vertices
from the shader (only 2 QuadFloats, vertex
position + color), displays the triangles in a GL
window.
shSim
Tests:
Single shader with 2/4 threads (plus another 2/4
input buffers).
Fixed latency of 3 cycles. Shader to Rasterizer
latency of 4. CommandProcessor to Rasterizer
latency of 6.
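A minimal sketch of how fixed latencies like the ones above could be modelled with delay queues. It assumes the 3-cycle figure is the shader's execution latency and that one vertex is issued per cycle; `LatencyLink` and `simulate` are hypothetical names, not taken from shSim.

```python
from collections import deque


class LatencyLink:
    """Fixed-latency connection between two simulator boxes."""

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # (arrival_cycle, item), kept in send order

    def send(self, cycle, item):
        self.in_flight.append((cycle + self.latency, item))

    def receive(self, cycle):
        """Return every item whose latency has elapsed by this cycle."""
        ready = []
        while self.in_flight and self.in_flight[0][0] <= cycle:
            ready.append(self.in_flight.popleft()[1])
        return ready


def simulate(num_vertices, shader_latency=3, link_latency=4):
    """Issue one vertex per cycle; return the cycle each reaches the rasterizer."""
    shader = LatencyLink(shader_latency)       # assumed shader execution latency
    to_rasterizer = LatencyLink(link_latency)  # shader -> rasterizer latency
    arrivals = {}
    cycle = 0
    while len(arrivals) < num_vertices:
        if cycle < num_vertices:
            shader.send(cycle, cycle)          # vertex id equals its issue cycle
        for vertex in shader.receive(cycle):
            to_rasterizer.send(cycle, vertex)
        for vertex in to_rasterizer.receive(cycle):
            arrivals[vertex] = cycle
        cycle += 1
    return arrivals
```

With these defaults, a vertex issued at cycle c reaches the rasterizer at cycle c + 7.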
Simple coordinate change traces (shader.input,
shader.input.2).
Ripple vertex shader example from DX8 & DX9
SDK (ripple.input):
Around 300 triangles (1100 vertices).
Color is calculated from vertex position.
shSim
Ripple.vsh.
shSim
Screenshots from frames rendered by
shSim:
GPU Architecture
Based on current GPUs:
NV30
R300
Based on other graphics processors:
PS3
Imagine
GPU Architecture
Based on an API:
DX8
DX9
DX10
OpenGL 1.4 and extensions.
OpenGL 2.0
Based on an architecture model:
Vector
Scalar
Multithreaded
GPU Specification
Shader Model:
Language:
DX9:
– VS2.0/PS2.0.
– VS3.0/PS3.0.
OpenGL:
– NV_vertex_program_2/NV_fragment_program.
– ARB_vertex_program/ARB_fragment_program.
Our own language.
GPU Specification
Shader Architecture:
Architectural model:
Scalar.
SIMD.
Multithreaded.
Vector.
Out-of-order.
GPU Specification
Configuration:
Integer Unit:
– Number.
– Precision.
– SIMD or scalar?
Floating Point Unit:
– Number.
– Precision.
– SIMD or scalar?
GPU Specification
Memory Unit:
– Number.
– Texture modes.
– Filtering modes.
Register Banks:
– Number.
– Ports.
– Size.
– Scalar or SIMD?
XBOX (NV2A) Vertex Shader
Future Work
Shader:
Add branch/call/ret instructions.
Add texture instructions (Pixel Shader).
Command Processor:
Define a trace specification: binary, gzipped?
Define an interface with OpenGL (Mesa?) or
DX8/DX9 (driver?).
Primitive Assembly:
Implement vertex cache and primitive assembly
(only triangles?).
Implement culling and clipping?
Future Work
Deferred rendering?
Transformed geometry must be stored in video
memory.
Geometry must be sorted:
Tiles.
Front to back.
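A minimal sketch of the sorting step for deferred rendering, assuming screen-space triangles, square tiles and a depth convention where smaller z is closer. `bin_triangles` and the 16-pixel tile size are illustrative choices, not part of the current design.

```python
def bin_triangles(triangles, width, height, tile_size=16):
    """Assign each screen-space triangle to every tile its bounding box touches.

    triangles: list of triangles, each a list of three (x, y, z) vertices.
    Returns {(tile_x, tile_y): [triangle indices sorted front to back]}.
    """
    tiles = {}
    max_tx = (width - 1) // tile_size
    max_ty = (height - 1) // tile_size
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the bounding box to the screen and convert it to tile coordinates.
        tx0 = max(0, int(min(xs)) // tile_size)
        ty0 = max(0, int(min(ys)) // tile_size)
        tx1 = min(max_tx, int(max(xs)) // tile_size)
        ty1 = min(max_ty, int(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles.setdefault((tx, ty), []).append(index)
    # Front to back: smaller z is assumed to be closer to the viewer.
    for indices in tiles.values():
        indices.sort(key=lambda i: min(v[2] for v in triangles[i]))
    return tiles
```

Binning by bounding box is conservative: a triangle may be listed in a tile it does not actually cover, which costs some extra work but never loses geometry.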
Rasterization:
Triangle Setup and Fragment Generation.
Any suitable method: Olano & Greer, DDA?
MSAA support?
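For reference, a minimal sketch of triangle setup and fragment generation using plain 2D edge functions over a bounding box. This is neither the Olano & Greer method nor a DDA; it is only the simplest baseline to compare them against. It samples pixel centers and applies no fill rule, so pixels exactly on a shared edge would be generated by both triangles. The function names are hypothetical.

```python
def edge(a, b, p):
    """Cross product of (b - a) and (p - a); its sign gives the side of edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers fall inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle, nothing to draw
    if area < 0:
        v1, v2 = v2, v1  # normalize the winding so that inside means >= 0
    # Triangle setup: bounding box clamped to the viewport.
    x0 = max(0, int(min(v0[0], v1[0], v2[0])))
    y0 = max(0, int(min(v0[1], v1[1], v2[1])))
    x1 = min(width - 1, int(max(v0[0], v1[0], v2[0])))
    y1 = min(height - 1, int(max(v0[1], v1[1], v2[1])))
    # Fragment generation: test every pixel center in the box against three edges.
    fragments = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            p = (x + 0.5, y + 0.5)
            if edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0 and edge(v2, v0, p) >= 0:
                fragments.append((x, y))
    return fragments
```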
Future Work
Early Z and Hierarchical Z?
Pixel Shader:
Unified shader architecture (shared with vertex shaders)?
Queue/buffering mechanism? Pixels need a lot of buffering
(memory/texture latency is very large).
Implement a TMU simulator (filter algorithms,
memory access, texture compression, cache).
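As an example of the kind of filter algorithm a TMU simulator would need, here is a minimal sketch of bilinear filtering on a single-channel texture. It assumes texel centers at half-integer coordinates and clamp-to-edge addressing; `bilinear_sample` is a hypothetical name, and no memory access, compression or cache behaviour is modelled.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a 2D list of floats (texture[row][col]) at normalized (u, v)."""
    height = len(texture)
    width = len(texture[0])
    # Move to texel space; texel centers are assumed to sit at integer + 0.5.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def texel(ix, iy):
        # Clamp-to-edge addressing.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```

Each sample reads four texels, which is one reason the buffering and cache questions above matter.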
Future Work
Fixed fragment operations:
Implement using the shader?
Fog: remove?
Pixel Ownership: remove?
Scissor Test: implement (needed if clipping is not
implemented).
Alpha test: same as Z Test.
Z Test and Stencil Test: must be implemented, but
could be added to a generic shader unit?
Blending: add to shader?
Dithering: remove.
Logical Op: remove or add to shader.
MSAA Operations: ?
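A minimal sketch of the fixed fragment tests kept in the list above, applied in OpenGL order (scissor, alpha, depth). The stencil test is omitted, the alpha function is assumed to be GREATER and the depth function LESS, and the scissor rectangle is assumed to lie inside the depth buffer. `fragment_passes` and the fragment layout are hypothetical.

```python
def fragment_passes(frag, scissor, alpha_ref, depth_buffer):
    """Run scissor, alpha and depth tests; update the depth buffer on success.

    frag: dict with integer "x", "y" and float "z", "alpha".
    scissor: (x, y, width, height) rectangle in pixels.
    depth_buffer: 2D list indexed as depth_buffer[y][x].
    """
    x, y = frag["x"], frag["y"]
    sx, sy, sw, sh = scissor
    # Scissor test: discard fragments outside the rectangle.
    if not (sx <= x < sx + sw and sy <= y < sy + sh):
        return False
    # Alpha test (assumed function: GREATER than the reference value).
    if frag["alpha"] <= alpha_ref:
        return False
    # Z test (assumed function: LESS than the stored depth).
    if frag["z"] >= depth_buffer[y][x]:
        return False
    depth_buffer[y][x] = frag["z"]
    return True
```

The alpha and Z tests are both a compare followed by a conditional discard, which is what makes moving them into a generic shader unit plausible.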
Future Work
Framebuffer:
Z compression.
Color compression.
SSAA or MSAA support?
News and Rumors
NV30 architecture:
4x2 pixel pipes?
8x zixel pipes (Z Test & Stencil only).
ATI ready to release R350 and RV350 in a
couple of weeks.
R350: Updated R300 core with additional features
(?) and increased clock frequency (375 – 400
MHz).
RV350: value chip based on the R300 core. Maybe 8x1
core, 128-bit bus. Clock frequency 300 – 400
MHz. 75 million transistors.
Imagine
‘Computer Graphics on a Stream
Architecture’, John Douglas Owens, PhD
dissertation.
Not read yet either.