Status-Week.260

Status – Week 260
Victor Moya
Summary
shSim.
GPU design.
Future Work.
Rumors and News.
Imagine.

shSim

Currently working:

Command Processor: reads a text-based trace file
(programs, parameters, vertices, commands to the
rasterizer).
Shader: simulates an N-thread multithreaded,
VS1-capable ‘vertex’ shader with variable
latency support.
Rasterizer: OpenGL ‘emulator’; accepts resolution
and clip plane changes, receives ‘shaded’ vertices
from the shader (only 2 QuadFloats: vertex
position + color), and displays the triangles in a GL
window.
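A minimal sketch of how a multithreaded shader can hide instruction latency. This is not shSim code: the function name `simulate`, the round-robin issue policy, the limit of one issued instruction per cycle, and the per-instruction latency lists are all assumptions made for illustration.

```python
def simulate(programs, num_threads):
    """Run one vertex per thread slot; return {vertex id: cycle it finished}.

    programs: one list of instruction latencies (in cycles) per vertex.
    Assumed model: a thread cannot issue its next instruction until the
    previous one has completed, and the shader issues at most one
    instruction per cycle, picking ready threads in round-robin order.
    """
    waiting = list(enumerate(programs))  # vertices not yet assigned a thread
    slots = [None] * num_threads         # one entry per hardware thread
    finish_cycle = {}
    last_issued = -1
    cycle = 0
    while waiting or any(s is not None for s in slots):
        for i, s in enumerate(slots):
            # Retire a thread once its last instruction has completed.
            if s is not None and s["pc"] == len(s["lat"]) and s["ready"] <= cycle:
                finish_cycle[s["vertex"]] = s["ready"]
                slots[i] = None
            # Load the next vertex into a free thread slot.
            if slots[i] is None and waiting:
                vertex, lat = waiting.pop(0)
                slots[i] = {"vertex": vertex, "lat": lat, "pc": 0, "ready": cycle}
        # Issue at most one instruction, starting after the last issuing thread.
        for k in range(1, num_threads + 1):
            i = (last_issued + k) % num_threads
            s = slots[i]
            if s is not None and s["pc"] < len(s["lat"]) and s["ready"] <= cycle:
                s["ready"] = cycle + s["lat"][s["pc"]]
                s["pc"] += 1
                last_issued = i
                break
        cycle += 1
    return finish_cycle
```

With two vertices of two 3-cycle instructions each, one thread finishes them at cycles 6 and 12, while two threads overlap the latencies and finish at cycles 6 and 7.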
shSim

Tests:

Single shader with 2/4 threads (and another 2/4
input buffers).
Fixed latency of 3 cycles. Shader to Rasterizer
latency of 4 cycles. CommandProcessor to
Rasterizer latency of 6 cycles.
Simple coordinate change traces (shader.input,
shader.input.2).
Ripple vertex shader example from the DX8 & DX9
SDK (ripple.input):
– Around 300 triangles (1100 vertices).
– Color is calculated from vertex position.
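The fixed unit-to-unit latencies used in these tests can be modelled with a simple delay queue. The sketch below is an assumption about how such a link might look, not the shSim implementation; `LatencyLink`, `run`, and the item names are hypothetical. Only the latency values (4 and 6 cycles) come from the test configuration.

```python
from collections import deque


class LatencyLink:
    """Delivers each item a fixed number of cycles after it was sent."""

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # (arrival cycle, item), in send order

    def send(self, cycle, item):
        self.in_flight.append((cycle + self.latency, item))

    def receive(self, cycle):
        """Return every item whose arrival cycle has been reached."""
        ready = []
        while self.in_flight and self.in_flight[0][0] <= cycle:
            ready.append(self.in_flight.popleft()[1])
        return ready


SHADER_TO_RASTERIZER = 4   # cycles, from the test configuration
COMMAND_TO_RASTERIZER = 6  # cycles, from the test configuration


def run(cycles=10):
    """Send one item on each link at cycle 0 and log when each arrives."""
    shader_link = LatencyLink(SHADER_TO_RASTERIZER)
    command_link = LatencyLink(COMMAND_TO_RASTERIZER)
    shader_link.send(0, "vertex0")
    command_link.send(0, "set_resolution")
    log = []
    for cycle in range(cycles):
        for item in shader_link.receive(cycle) + command_link.receive(cycle):
            log.append((cycle, item))
    return log
```

Calling `run()` logs the shaded vertex at cycle 4 and the command at cycle 6.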
shSim

Ripple.vsh.
shSim

Screenshots from frames rendered by
shSim:
GPU Architecture

Based on current GPUs:
NV30
R300

Based on other graphics processors:
PS3
Imagine

GPU Architecture

Based on an API:

DX8
DX9
DX10
OpenGL 1.4 and extensions.
OpenGL 2.0

Based on an architecture model:

Vector
Scalar
Multithreaded
GPU Specification

Shader Model:

Language:

DX9:
– VS2.0/PS2.0.
– VS3.0/PS3.0.

OpenGL:
– NV_vertex_program_2/NV_fragment_program.
– ARB_vertex_program/ARB_fragment_program.

Our own language.
GPU Specification

Shader Architecture:

Architectural model:
Scalar.
SIMD.
Multithreaded.
Vector.
Out-of-order.

GPU Specification

Configuration:

Integer Unit:
– Number.
– Precision.
– SIMD or scalar?

Floating Point Unit:
– Number.
– Precision.
– SIMD or scalar?
GPU Specification

Memory Unit:
– Number.
– Texture modes.
– Filtering modes.

Register Banks:
– Number.
– Ports.
– Size.
– Scalar or SIMD?
XBOX (NV2A) Vertex Shader
Future Work

Shader:

Add branch/call/ret instructions.
Add texture instructions (Pixel Shader).

Command Processor:

Define a trace specification: binary, gzipped?
Define an interface with OpenGL (Mesa?) or
DX8/DX9 (driver?).

Primitive Assembly:

Implement vertex cache and primitive assembly
(only triangles?).
Implement culling and clipping?
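One possible shape for the primitive assembly stage, sketched under assumptions: a FIFO post-shading vertex cache, triangle lists only, and back-face culling by signed screen-space area with counter-clockwise triangles treated as front-facing. `VertexCache`, `signed_area`, and `assemble_triangles` are hypothetical names, and clipping is not covered.

```python
from collections import OrderedDict


class VertexCache:
    """FIFO cache of shaded vertices, keyed by vertex index."""

    def __init__(self, size, shade):
        self.size = size
        self.shade = shade  # callback: vertex index -> shaded (x, y) position
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, index):
        if index in self.entries:
            self.hits += 1
            return self.entries[index]
        self.misses += 1
        result = self.shade(index)
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[index] = result
        return result


def signed_area(a, b, c):
    """Positive for counter-clockwise triangles, negative for clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def assemble_triangles(indices, cache, cull_back_faces=True):
    """Group an index list into triangles, dropping back-facing ones."""
    triangles = []
    for i in range(0, len(indices) - 2, 3):
        a, b, c = (cache.fetch(k) for k in indices[i:i + 3])
        if cull_back_faces and signed_area(a, b, c) <= 0:
            continue  # back-facing or degenerate
        triangles.append((a, b, c))
    return triangles
```

The hit and miss counters give a quick way to see how much shading work a given cache size would save on a trace.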
Future Work

Deferred rendering?


Transformed geometry must be stored in video
memory.
Geometry must be sorted:
– Tiles.
– Front to back.
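A sketch of the sorting step, assuming transformed triangles arrive as three (x, y, z) screen-space vertices, that smaller z means nearer, and that binning by bounding box is accurate enough. The function name `bin_triangles` and the 16-pixel tile size are illustrative choices, not decisions from this report.

```python
import math


def bin_triangles(triangles, width, height, tile_size=16):
    """Map each tile (tx, ty) to the ids of triangles whose bounding box
    overlaps it, with each tile's list sorted front to back."""
    tiles_x = math.ceil(width / tile_size)
    tiles_y = math.ceil(height / tile_size)
    bins = {}
    for tri_id, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Tile range covered by the bounding box, clamped to the screen.
        first_tx = max(math.floor(min(xs) / tile_size), 0)
        last_tx = min(math.floor(max(xs) / tile_size), tiles_x - 1)
        first_ty = max(math.floor(min(ys) / tile_size), 0)
        last_ty = min(math.floor(max(ys) / tile_size), tiles_y - 1)
        for ty in range(first_ty, last_ty + 1):
            for tx in range(first_tx, last_tx + 1):
                bins.setdefault((tx, ty), []).append(tri_id)

    def nearest_depth(tri_id):
        return min(v[2] for v in triangles[tri_id])

    # Front to back: nearest triangle first (assumes smaller z is nearer).
    for tri_ids in bins.values():
        tri_ids.sort(key=nearest_depth)
    return bins
```

A bounding-box test can place a triangle in tiles it does not actually touch; an exact triangle/tile overlap test would tighten the bins at extra cost.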
Rasterization:

Triangle Setup and Fragment Generation.


Any suitable method: Olano & Greer, DDA?
MSAA support?
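For reference, fragment generation can be sketched with plain 2D edge functions evaluated at pixel centers. This is a simpler stand-in, not the homogeneous Olano & Greer formulation or a DDA; the function names are hypothetical, and there is no fill rule for shared edges, no attribute interpolation, and no MSAA.

```python
import math


def edge(a, b, p):
    """Signed test of point p against the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle
    sign = 1 if area > 0 else -1  # accept either winding order
    # Bounding box of the triangle, clamped to the screen.
    min_x = max(math.floor(min(v0[0], v1[0], v2[0])), 0)
    max_x = min(math.floor(max(v0[0], v1[0], v2[0])), width - 1)
    min_y = max(math.floor(min(v0[1], v1[1], v2[1])), 0)
    max_y = min(math.floor(max(v0[1], v1[1], v2[1])), height - 1)
    fragments = []
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = sign * edge(v1, v2, p)
            w1 = sign * edge(v2, v0, p)
            w2 = sign * edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                fragments.append((x, y))
    return fragments
```

Because pixels exactly on an edge are accepted, two triangles sharing that edge would both generate them; a real rasterizer needs a tie-breaking fill rule.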
Future Work

Early Z and Hierarchical Z?

Pixel Shader:

Implement unified with the vertex shaders?
Queue/buffering mechanism? (memory/texture
latency is very large).
Pixel Shader:



Unified shader architecture?
Pixels need a lot of buffering (memory/texture
operations).
Implement a TMU simulator (filter algorithms,
memory access, texture compression, cache).
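As an example of one filter algorithm a TMU simulator would need, here is a sketch of bilinear filtering. It assumes a single-channel texture stored as a list of rows, normalized (u, v) coordinates, texel centers at half-integer positions, and clamp-to-edge addressing. The function name `bilinear` is hypothetical; memory access, caching, and compression are left out.

```python
import math


def bilinear(texture, u, v):
    """Sample a single-channel texture at normalized coordinates (u, v)."""
    height = len(texture)
    width = len(texture[0])
    # Move to texel space; texel centers sit at half-integer positions.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def texel(ix, iy):
        # Clamp-to-edge addressing.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```

Each sample reads four texels, which is why the memory and cache model matters as much as the filter arithmetic.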
Future Work

Fixed fragment operations:










Implement using the shader?
Fog: remove?
Pixel Ownership: remove?
Scissor Test: implement (needed if clipping is not
implemented).
Alpha test: same as Z Test.
Z Test and Stencil Test: must be implemented, but
could be added to a generic shader unit?
Blending: add to shader?
Dithering: remove.
Logical Op: remove or add to shader.
MSAA Operations: ?
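To make the ordering of these operations concrete, the sketch below runs one fragment through scissor test, alpha test, Z test, and blending, in the order OpenGL applies them. Stencil, fog, dithering, and logical ops are omitted. The function name, the "less than" depth comparison, the "greater than" alpha comparison, and source-alpha blending are assumptions chosen for illustration.

```python
def process_fragment(fragment, color_buffer, depth_buffer, scissor, alpha_ref=0.0):
    """Apply the fixed per-fragment tests; return which stage decided.

    fragment: (x, y, z, (r, g, b, a)); scissor: (x0, y0, x1, y1),
    with x1 and y1 exclusive.
    """
    x, y, z, color = fragment
    x0, y0, x1, y1 = scissor
    # Scissor test: discard fragments outside the rectangle.
    if not (x0 <= x < x1 and y0 <= y < y1):
        return "scissor"
    # Alpha test (assumed comparison: pass if alpha > reference).
    if color[3] <= alpha_ref:
        return "alpha"
    # Z test (assumed comparison: pass if nearer than the stored depth).
    if z >= depth_buffer[y][x]:
        return "depth"
    depth_buffer[y][x] = z
    # Blending: source-alpha over the stored color.
    a = color[3]
    dst = color_buffer[y][x]
    color_buffer[y][x] = tuple(a * s + (1 - a) * d for s, d in zip(color[:3], dst))
    return "written"
```

Each stage is a compare plus a few multiply-adds on four-component data, which is what makes folding them into a generic shader unit plausible.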
Future Work

Framebuffer:
Z compression.
Color compression.
SSAA or MSAA support?

News and Rumors

NV30 architecture:



4x2 pixel pipes?
8x zixel pipes (Z Test & Stencil only).
ATI is ready to release R350 and RV350 in a
couple of weeks.

R350: Updated R300 core with additional features
(?) and increased clock frequency (375 – 400
MHz).
RV350: value chip based on the R300 core. Maybe an
8x1 core with a 128-bit bus. Clock frequency 300 – 400
MHz. 75 million transistors.
Imagine
‘Computer Graphics on a Stream
Architecture’, John Douglas Owens, PhD
dissertation.
 Not read yet either.
