Direct3D 9
Or why programmable hardware kicks ass
Matthew M Trentacoste
Introduction
► The Direct3D API has changed fundamentally to meet changes in hardware
► The API has been adjusted to fit the paradigm shift that has occurred in real-time graphics
► It has adapted to the fact that someone programming real-time graphics is writing code for two asymmetric processors: the GPU and the CPU
Differences
► OpenGL: not so much bad as outdated
► OpenGL was very well designed, but that was 15 years ago
► Everything that has happened in real-time graphics since then has been stapled on
► Direct3D is more of a pain for beginners learning to program graphics
► But it is much more elegant once you are experienced enough to fully utilize the functionality provided
► Much less of a state machine than OpenGL
Differences (2)
► Has no immediate mode; you can't just specify vertices, colors, etc. directly from code
► Built around a stream-based model of data
► All data must be put into a buffer of elements to be loaded onto the hardware
► Tries to give graceful control of the flow of data between CPU and GPU while still being efficient
► The API is getting there: streamlining of functionality means fewer objects to accomplish all tasks
Direct3D 9 API Object List
► IDirect3DSwapChain9 (back buffers)
► IDirect3DTexture9 (textures)
► IDirect3DVolumeTexture9 (volume textures)
► IDirect3DVertexBuffer9 (vertex lists)
► IDirect3DIndexBuffer9 (index lists)
► IDirect3DSurface9 (render targets)
► IDirect3DStateBlock9 (render state container)
► IDirect3DVertexDeclaration9 (vertex format)
► IDirect3DVertexShader9 (vertex shader)
► IDirect3DPixelShader9 (pixel shader)
Other Reasons D3D Rocks
► D3DX!!!!
► All the math you could possibly need for graphics, already written
► Vectors, matrices, quaternions, textures, models, etc.
► Optimized code using all the special instruction sets (3DNow!, SSE2, and whatnot)
► The best solution for almost anything you could want to do, barring some crazy special case
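One thing worth knowing before leaning on D3DX: its math follows the row-vector convention. A point is multiplied on the left of the matrix (v * M), so translation lives in the bottom row. The sketch below mimics the documented behavior of D3DXVec3TransformCoord: it treats (x, y, z) as (x, y, z, 1), transforms it, and divides by the resulting w. `Vec3`, `Mat4`, `transformCoord`, and `translation` are hypothetical stand-ins for illustration, not D3DX itself.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-ins for D3DXVECTOR3 / D3DXMATRIX (row-major storage, row vectors).
struct Vec3 { float x, y, z; };
using Mat4 = std::array<std::array<float, 4>, 4>;

// Like D3DXVec3TransformCoord: (x, y, z, 1) * M, then divide by the resulting w.
Vec3 transformCoord(const Vec3& v, const Mat4& m) {
    float out[4];
    for (int c = 0; c < 4; ++c)
        out[c] = v.x * m[0][c] + v.y * m[1][c] + v.z * m[2][c] + 1.0f * m[3][c];
    return {out[0] / out[3], out[1] / out[3], out[2] / out[3]};
}

// In the row-vector convention the translation sits in the bottom row.
Mat4 translation(float tx, float ty, float tz) {
    return {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {tx, ty, tz, 1}}};
}
```

With this convention, transforms compose left to right: `world * view * proj` applies world first.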
Cool Shit
► Still with me?
► High-order primitives
► Adaptive tessellation
► Displacement maps
► And pretty pictures of them
Higher Order Primitives
► Current primitives are not ideal for representing smooth surfaces
► Direct3D 9 supports points, lines, triangles, and grid primitives
► Higher-order interpolation methods, such as cubic polynomials, allow more accurate rendering of curved shapes
► The application need only provide a desired level of tessellation
► Transmit the data using standard triangle syntax that includes normal vectors
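To make "cubic polynomial interpolation plus a tessellation level" concrete, here is a 1-D simplification: a cubic Bézier curve evaluated at `level + 1` evenly spaced parameters. Direct3D 9's N-patches use cubic Bézier triangles rather than curves. The helpers `bezier3` and `tessellate` are hypothetical and only illustrate the idea that the application picks the level and the curve supplies the smooth in-between points.

```cpp
#include <cassert>
#include <vector>

struct P3 { float x, y, z; };

// Evaluate a cubic Bezier curve at parameter t in [0, 1] using the Bernstein basis.
P3 bezier3(const P3& p0, const P3& p1, const P3& p2, const P3& p3, float t) {
    float u = 1.0f - t;
    float b0 = u * u * u, b1 = 3 * u * u * t, b2 = 3 * u * t * t, b3 = t * t * t;
    return {b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
            b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y,
            b0 * p0.z + b1 * p1.z + b2 * p2.z + b3 * p3.z};
}

// The application only chooses `level` (number of segments); the curve does the rest.
std::vector<P3> tessellate(const P3& p0, const P3& p1, const P3& p2, const P3& p3, int level) {
    std::vector<P3> pts;
    for (int i = 0; i <= level; ++i)
        pts.push_back(bezier3(p0, p1, p2, p3, static_cast<float>(i) / level));
    return pts;
}
```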
Adaptive Tessellation
► Adaptively tessellates a patch based on the depth value of the control vertex in eye space
► Tessellation level computed per vertex
   – From the API value scaled by 1.0 / Z_eye
► Then the surface is tessellated accordingly
► The API takes triangles, defines high-order surfaces from them, and then tessellates those surfaces as needed
► Meaning: more detail the closer you get
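The per-vertex rule above is simple enough to write down directly. This sketch scales the API tessellation value by 1/Z_eye, as on the slide. The clamp to a `[minLevel, maxLevel]` range is an assumption added here so that vertices very close to or far from the camera still get a usable level. `adaptiveTessLevel` is a hypothetical name, and `zEye` is assumed positive.

```cpp
#include <algorithm>
#include <cassert>

// Per-vertex tessellation level = API value * (1 / Z_eye), then clamped.
// The clamp range is an assumption for illustration; zEye must be > 0.
float adaptiveTessLevel(float apiLevel, float zEye, float minLevel, float maxLevel) {
    float level = apiLevel * (1.0f / zEye);
    return std::clamp(level, minLevel, maxLevel);
}
```

Halving the distance doubles the level, which is exactly "more detail the closer you get".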
Demo Time #1
Displacement Mapping
► Adaptive tessellation enables us to use a texture to deform a surface
► A texture of a height field is spread across a high-order surface
► The surface is tessellated until the geometry is detailed enough to represent the height field
► Changes the shape of the surface to match the displacement, as opposed to merely modifying the surface normal vector so that it appears deformed
► What bump maps wish they were
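Once the surface is finely tessellated, displacement amounts to moving each generated vertex along its normal by the height sampled at its texture coordinate. The sketch below uses a nearest-texel lookup into a tiny height grid; real hardware would filter. `HeightMap`, `displace`, and the `scale` parameter are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct V3 { float x, y, z; };

// Hypothetical height field: a row-major grid of texels.
struct HeightMap {
    int w, h;
    std::vector<float> texels;
    // Nearest-texel lookup with (u, v) clamped to the grid.
    float sample(float u, float v) const {
        int x = std::min(w - 1, std::max(0, static_cast<int>(u * w)));
        int y = std::min(h - 1, std::max(0, static_cast<int>(v * h)));
        return texels[y * w + x];
    }
};

// Move a tessellated vertex along its (unit) normal by the sampled height.
V3 displace(const V3& p, const V3& n, float u, float v, const HeightMap& hm, float scale) {
    float d = hm.sample(u, v) * scale;
    return {p.x + n.x * d, p.y + n.y * d, p.z + n.z * d};
}
```

Unlike a bump map, the position itself changes, so silhouettes and occlusion follow the height field.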
Demo Time #2
DirectX Graphics Architecture
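[Diagram: DirectX graphics architecture. Vertex components (Vec0–Vec3) from the vertex buffer (VB) feed the vertex shader; geometry/primitive ops pass position, color, and texture coordinates (TC1, TC2) on to the pixel shader, which reads textures (Tex0–Tex2) through samplers; pixel ops write the output pixels to the image surface]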
Pipeline Overview
► Create VertexBuffer (where the model goes)
► Set up Vertex Stream (put the model there)
► Define VertexDecl (what the data means)
► Vertex Shader Object (operate on the model)
► Pixel Shader Object (render the model)
► FrameBuffer blender (add the image of the model to the scene)
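The last stage, the frame buffer blender, is still fixed-function in Direct3D 9. It is configured through render states such as source and destination blend factors. A common setting is SRCBLEND = SRCALPHA, DESTBLEND = INVSRCALPHA, BLENDOP = ADD. The sketch below reproduces what that setting computes per pixel, with the same factors applied to alpha. `RGBA` and `blendOver` are hypothetical names; this is a CPU model, not the hardware path.

```cpp
#include <cassert>

struct RGBA { float r, g, b, a; };

// result = src * srcAlpha + dst * (1 - srcAlpha), applied to every channel,
// matching SRCBLEND = SRCALPHA, DESTBLEND = INVSRCALPHA, BLENDOP = ADD.
RGBA blendOver(const RGBA& src, const RGBA& dst) {
    float s = src.a, d = 1.0f - src.a;
    return {src.r * s + dst.r * d, src.g * s + dst.g * d,
            src.b * s + dst.b * d, src.a * s + dst.a * d};
}
```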
Vertex Declaration Object
► New syntax for describing vertex formats for the DMA engine and tessellator behavior
► New object IDirect3DVertexDeclaration9
► Separately creatable
   – CreateVertexDeclaration()
► Separately settable
   – SetVertexDeclaration()
   – Settable independently of the vertex shader
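A vertex declaration is essentially a table that says, for each piece of a vertex, which stream it comes from, its byte offset, its data type, and what it means (its usage). The sketch mirrors a subset of the fields of Direct3D 9's D3DVERTEXELEMENT9 (Stream, Offset, Type, Usage, UsageIndex) with hypothetical stand-in types. It computes the offsets from the application's vertex struct. Real code would use the d3d9.h types, terminate the array with D3DDECL_END(), and pass it to CreateVertexDeclaration().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for D3DDECLTYPE / D3DDECLUSAGE values.
enum class DeclType : uint8_t { Float3, Color };
enum class Usage : uint8_t { Position, Normal, Color };

// Subset of the fields of D3DVERTEXELEMENT9 (Method omitted for brevity).
struct VertexElement {
    uint16_t stream;
    uint16_t offset;
    DeclType type;
    Usage usage;
    uint8_t usageIndex;
};

// The application's vertex as it sits in a vertex buffer on stream 0.
struct Vertex {
    float pos[3];
    float normal[3];
    uint32_t diffuse;  // packed ARGB color, like D3DCOLOR
};

// Describe what each byte range of a Vertex means.
std::vector<VertexElement> describeVertex() {
    return {
        {0, static_cast<uint16_t>(offsetof(Vertex, pos)), DeclType::Float3, Usage::Position, 0},
        {0, static_cast<uint16_t>(offsetof(Vertex, normal)), DeclType::Float3, Usage::Normal, 0},
        {0, static_cast<uint16_t>(offsetof(Vertex, diffuse)), DeclType::Color, Usage::Color, 0},
    };
}
```

Because the table lives in its own object, the same declaration can be reused with any vertex shader that consumes those usages.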
Default Semantics
► VertexDecl now supports a "usage" field
   – Position, Normal, Tangent, Binormal, etc.
   – Provided to enable default semantics
► Allows the implementation to connect shaders together without requiring a fixed register convention
► Acts as a symbol table for run-time linking of shaders to the core API, and therefore to the hardware
► No additional policy is imposed over DirectX 8
   – Default semantics can be overridden
► Deals with concepts, not memory addresses
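The "symbol table" idea can be sketched in a few lines. The vertex side publishes where each (usage, index) pair lives. The shader side declares which input register wants which pair, as in `dcl_position v0`. Linking then matches them by meaning rather than by register number. The types and the `link` function are hypothetical; the driver's actual linking is not exposed like this.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// A (usage, index) pair such as ("position", 0) or ("texcoord", 1).
using Semantic = std::pair<std::string, int>;

// Vertex side: byte offset of each semantic within the vertex.
using DeclTable = std::map<Semantic, int>;

// Shader side: input register number declared for each semantic.
using ShaderInputs = std::map<Semantic, int>;

// Map each shader input register to the vertex offset carrying the same semantic.
// Returns std::nullopt if the shader needs data the vertex does not provide.
std::optional<std::map<int, int>> link(const DeclTable& decl, const ShaderInputs& inputs) {
    std::map<int, int> registerToOffset;
    for (const auto& [semantic, reg] : inputs) {
        auto it = decl.find(semantic);
        if (it == decl.end()) return std::nullopt;
        registerToOffset[reg] = it->second;
    }
    return registerToOffset;
}
```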
DirectX 8 Vertex Declaration
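[Diagram: DirectX 8 vertex declaration. The vertex layout in Strm0 and Strm1 is mapped by the declaration directly onto input registers (v0, skip, v1); the declaration and the shader program (vs 1.1, mov r0, v0, …) are bound together into a single shader handle]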
New Vertex Declaration
[Diagram: new vertex declaration. The declaration describes the vertex layout (pos, norm, diff across Strm0 and Strm1) by usage, separately from the shader program, which declares its own inputs; the declaration and the shader are set independently]
     vs_1_1
     dcl_position v0
     dcl_color v1
     mov r0, v0
     …
Vertex Shader Architecture
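[Diagram: vertex shader architecture. Vertex inputs Vec0–Vec15 feed the vertex ALU, which uses the address register A0, temporary registers R0–R11, and constant registers Const0–Const95, and writes the outputs Hpos, TC0–TC3, Color0, and Color1]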
Vertex Shaders
Vertex Shader 2.0 Register Reference
Name   r/w?   Description              Count   Port Count
bn     r      Boolean                  16      1
in     r      Integer (loop control)   16      1
an     *      4-D address              1       1
cn     r      Constant                 256     1
rn     r/w    Temporary                12      3
vn     r      Vertex input             16      1
* an can only be written to by mova, and the result is used as an integer offset in relative addressing
Note: Port Count = number of different registers of that class that can be used in a single instruction
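The port-count rule trips people up, so here is a checker for it. It uses the limits from the vs_2_0 table: 1 for constants, 3 for temporaries, 1 for inputs. Reading the same register twice costs one port, while two different registers of a port-1 class break the rule. This is a sketch under simplifying assumptions: operands are plain names like `c4` with no swizzles, modifiers, or relative addressing, and `respectsPortLimits` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Read-port limits per register class, from the vs_2_0 register table.
const std::map<char, int> kPortLimit = {{'c', 1}, {'r', 3}, {'v', 1}};

// Source operands only (the destination does not use a read port),
// written without swizzles, e.g. {"c1", "r0", "c1"}.
bool respectsPortLimits(const std::vector<std::string>& sources) {
    std::map<char, std::set<std::string>> used;  // distinct registers per class
    for (const auto& s : sources) used[s[0]].insert(s);
    for (const auto& [cls, regs] : used) {
        auto it = kPortLimit.find(cls);
        if (it != kPortLimit.end() && static_cast<int>(regs.size()) > it->second) return false;
    }
    return true;
}
```

For example, `mad r0, c1, c2, r1` is rejected because it reads two different constants. The usual fix is to `mov` one of them into a temporary first.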
Math Instructions
► Parallel ops (componentwise):
   – add, sub, mul, mad, frc, cmp
► Vector ops:
   – dp3, dp4
► Scalar ops:
   – rcp, rsq, exp2, log2
► Macros:
   – LRP, NRM3, POW, CRS, SINCOS, SGN, ABS
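"Macro" means the instruction expands into several simpler ones and costs several slots. The sketch shows one plausible expansion for two of them, written as C++ over a 4-component type. NRM3 becomes dp3, rsq, and a multiply. LRP(t, a, b) = t·a + (1 − t)·b becomes a subtract and a mad. The actual expansion used by a driver may differ. The w handling in `nrm3` is an assumption, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct F4 { float x, y, z, w; };

float dp3(const F4& a, const F4& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float rsq(float s) { return 1.0f / std::sqrt(s); }  // assumes s > 0
F4 mad(const F4& a, const F4& b, const F4& c) {
    return {a.x * b.x + c.x, a.y * b.y + c.y, a.z * b.z + c.z, a.w * b.w + c.w};
}

// NRM3: dp3 tmp, v, v ; rsq tmp, tmp ; mul out.xyz, v, tmp (w passed through here).
F4 nrm3(const F4& v) {
    float inv = rsq(dp3(v, v));
    return {v.x * inv, v.y * inv, v.z * inv, v.w};
}

// LRP: t*a + (1-t)*b rewritten as t*(a - b) + b, i.e. one sub and one mad.
F4 lrp(const F4& t, const F4& a, const F4& b) {
    F4 diff{a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w};
    return mad(t, diff, b);
}
```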
Vertex Shaders
Instruction Reference
max    Maximum
min    Minimum
sge    Set on greater than or equal
slt    Set on less than
rcp    Reciprocal
rsq    Reciprocal square root
expp   Exponential, 16-bit precision
logp   Logarithm, 16-bit precision
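sge and slt are how a shader without branches makes decisions. Each produces 1.0 or 0.0, so multiplying by them selects between values arithmetically. The sketch models one component. `selectGE` is a hypothetical helper that picks `x` when `a >= b` and `y` otherwise, using only the two set instructions, multiplies, and an add.

```cpp
#include <cassert>

// sge: 1.0 if a >= b, else 0.0. slt: 1.0 if a < b, else 0.0 (per component on the GPU).
float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }
float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }

// Branch-free select: exactly one of sge/slt is 1 for ordinary inputs.
float selectGE(float a, float b, float x, float y) {
    return sge(a, b) * x + slt(a, b) * y;
}
```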
Vertex Shader Flow Control
DirectX 9 vertex shaders (vs_2_0) support flow control
► The result is a "structured assembly" language
► Control logic is based on constants only
► Required by ISVs to solve:
   – Enabling/disabling environment mapping, etc.
   – The "varying number of lights" problem
   – Brings support on par with the non-programmable pipeline
► Ideally, a better skinning approach
   – The "varying number of bones" problem
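Here is what "control logic based on constants only" buys for the varying-number-of-lights problem. The loop count comes from a constant the application sets per draw call, not from per-vertex data, so one shader covers 0 to 8 lights. This C++ model of that shader loop is a sketch: `LightConstants`, `kMaxLights`, and `diffuseTerm` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Vec { float x, y, z; };
float dot(const Vec& a, const Vec& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

constexpr int kMaxLights = 8;

// Stand-ins for shader constants; numLights plays the role of an integer loop constant.
struct LightConstants {
    int numLights;
    std::array<Vec, kMaxLights> directions;  // unit vectors toward each light
};

// Sum of clamped N.L over the active lights: the same count for every vertex,
// hence "static" flow control.
float diffuseTerm(const Vec& normal, const LightConstants& lc) {
    float sum = 0.0f;
    for (int i = 0; i < lc.numLights; ++i)
        sum += std::max(0.0f, dot(normal, lc.directions[i]));
    return sum;
}
```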
Instruction Counts vs. Slots
Flow control means slots != counts
► The instruction store is 256 slots, but more instructions can be executed than are stored
► The executed instruction count limit is higher
   – Recommend not exceeding 1024
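The slots-versus-count distinction is just arithmetic once a loop is involved. The sketch assumes a hypothetical shader made of a prologue, a loop body run once per light, and an epilogue. It ignores the cost of the loop and endloop instructions themselves. With 20 + 30 + 10 instructions and 8 lights, the shader occupies 60 slots but executes 20 + 8 × 30 + 10 = 270 instructions.

```cpp
#include <cassert>

// Hypothetical shader shape (loop/endloop overhead ignored).
struct ShaderShape { int prologue, loopBody, epilogue; };

// Slots: what is stored in the 256-entry instruction store.
int slotsUsed(const ShaderShape& s) { return s.prologue + s.loopBody + s.epilogue; }

// Executed count: what actually runs for a given loop trip count.
int instructionsExecuted(const ShaderShape& s, int iterations) {
    return s.prologue + iterations * s.loopBody + s.epilogue;
}
```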
Sampler State Separation
► TextureStageState (TSS) has been split
   – One category for texture sampler data
   – One category for texture iterator control
► Why?
   – Sampler state has 16 elements, as 16 textures may be sampled in one pass
   – Other state has only 8 elements
   – Much of this state is for legacy pipelines
► All enum indices remain the same
   – DDI impact is minimal
Pixel Shaders
► Float data precision supported
► Enables photoreal rendering of high-dynamic-range scenes (cf. Debevec)
► Pixel shader ALU must support:
   – At least s10e5 precision for color data
   – At least s16e7 precision for all other data
► This applies to any 32-bit float input data, such as texture iterators or reads of 32-bit float texture formats
► _pp modifier supported on any instruction
   – Highlights operations where reduced precision is acceptable for performance
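To get a feel for what s10e5 (10 explicit mantissa bits) costs compared with IEEE single precision (23 bits), the sketch rounds a value's mantissa to a given number of bits using round-to-nearest-even. Only the mantissa is modeled. Exponent range, overflow, and denormals are ignored, so this approximates a real half-float conversion rather than reproducing it. `quantizeMantissa` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Round x to `mantissaBits` explicit mantissa bits (10 for s10e5, 23 for IEEE single).
// Uses the default rounding mode (round-to-nearest-even). Exponent range is not modeled.
double quantizeMantissa(double x, int mantissaBits) {
    if (x == 0.0) return 0.0;
    int e = 0;
    double m = std::frexp(x, &e);                       // x = m * 2^e, 0.5 <= |m| < 1
    double scale = std::ldexp(1.0, mantissaBits + 1);   // explicit bits + implicit leading bit
    return std::ldexp(std::nearbyint(m * scale) / scale, e);
}
```

For example, 1 + 1/4096 survives at 16 bits but collapses to 1.0 at 10 bits, and 2049 rounds to 2048. That is why `_pp` belongs on color math but not on texture-coordinate math for large textures.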
Demo Time #3
Pixel Shader 2.0 Architecture
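[Diagram: pixel shader 2.0 architecture. Color inputs v0–v1 and texture coordinates t0–t7 feed the pixel ALU, which uses temporary registers r0–r11 and constant registers c0–c31, and writes the color outputs oC0–oC3]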
Pixel Shaders
Pixel Shader 2.0 Register Reference
Name   r/w?   Description          Count   Port Count
vn     r      Color                2       1
tn     r      Texcoord iterators   8       1
rn     r/w    Temporary            12      3
cn     r      Constant             32      1*
sn     r      Texture samplers     16      1
Note: Port Count = number of different registers of the same class that can be used in one instruction
Texture Load Instructions
► 3 instructions provided in ps_2_0
► Standard texture load:
     texld r0, t1, s3
► Texture load with per-pixel LOD bias:
     texldb r0, t0, s2
   – Bias value stored in t0.w
► Projected texture load:
     texldp r1, t2, s0
   – Does a perspective divide (by t2.w) before the lookup
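The two variants differ from plain texld only in how the coordinate's w component is used. texldp divides the coordinate by w before sampling, which is what projective texturing such as spotlight projection relies on. texldb adds w as a bias to the mip level the hardware would otherwise choose. The sketch models both for a 2-D texture. The clamp of the biased level to the available mip range is an assumption, and the names are hypothetical.

```cpp
#include <cassert>

struct TexCoord4 { float x, y, z, w; };
struct UV { float u, v; };

// texldp on a 2-D texture: sample at (x/w, y/w).
UV projectedCoord(const TexCoord4& t) { return {t.x / t.w, t.y / t.w}; }

// texldb: mip level = hardware-computed level + t.w, clamped to [0, maxLod] (assumed).
float biasedLod(float computedLod, const TexCoord4& t, float maxLod) {
    float lod = computedLod + t.w;
    if (lod < 0.0f) lod = 0.0f;
    if (lod > maxLod) lod = maxLod;
    return lod;
}
```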
Dependent Reads
► Can be serialized, but only to a maximum depth of 4:
     dcl t0.xy
     dcl_2d s0
     dcl_2d s1
     texld r0, t0, s0
     texld r1, r0, s1
     texld r2, r1, s1
     texld r3, r2, s1
► Is legal
Dependent Reads Rock
► What's so great?
► Textures become functional maps
► Any continuous function that takes up to 3 inputs and produces up to 4 outputs can be stored as a texture
► Pre-compute the results and store them in a texture
► Load the texture at the coordinates of the input
► The value returned at that point is the output
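Here is the "textures as functional maps" recipe in its smallest form, a 1-D table. It bakes a function at texel centers once, then answers queries with linear filtering and clamp-to-edge addressing, the way a sampler would. The example function is a specular falloff pow(x, 16), chosen only for illustration. `LookupTexture1D` is a hypothetical class.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// A 1-D "texture" holding f(x) precomputed at texel centers over [0, 1].
struct LookupTexture1D {
    std::vector<float> texels;

    LookupTexture1D(const std::function<float(float)>& f, int size) : texels(size) {
        for (int i = 0; i < size; ++i)
            texels[i] = f((i + 0.5f) / size);  // texel centers, as a GPU samples them
    }

    // Linear filtering between neighboring texels, clamp-to-edge addressing.
    float sample(float u) const {
        int n = static_cast<int>(texels.size());
        float x = u * n - 0.5f;                // position in texel space
        int i0 = static_cast<int>(std::floor(x));
        float frac = x - i0;
        auto at = [&](int i) { return texels[std::min(std::max(i, 0), n - 1)]; };
        return at(i0) * (1.0f - frac) + at(i0 + 1) * frac;
    }
};
```

The table is built once on the CPU. Each later lookup costs one filtered fetch, however expensive the original function was.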
Dependent Reads Rock (2)
► Allows results far too complicated to be calculated in real time to be used on the GPU at minimal cost
► Stop thinking of textures as mere images; they are stores of data
► Lookup tables, noise generators, and most arbitrary functions can all be emulated quickly on current hardware
Multi-Render Target (MRT)
► A step towards rationalizing textures and vertex buffers
► Allows writing out multiple values from a single pixel shader pass
   – Up to 4 color elements plus Z/depth
   – Facilitates multipass algorithms
► A pixel shader can output four 4-vectors plus depth for each pixel
► That is 17 floating-point values per pixel that can be stored
MRT Example: Depth of Field
[Images: Original | Alpha of Original | Blurred Result]
The images on the left are the originals. The center column is the alpha map: black is in focus, white is out of focus. We can move the focal plane anywhere we like.
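The alpha map is a per-pixel blur factor that can be written alongside color in the same MRT pass. A later pass blends the sharp image toward a pre-blurred copy by that factor. The slide does not give the formula. The sketch assumes a linear ramp on the distance from the focal plane, with hypothetical `focalDepth` and `focalRange` parameters. Moving `focalDepth` moves the focal plane.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// 0 (black) = in focus, 1 (white) = fully out of focus. Linear ramp is an assumption.
float blurFactor(float eyeDepth, float focalDepth, float focalRange) {
    return std::clamp(std::fabs(eyeDepth - focalDepth) / focalRange, 0.0f, 1.0f);
}

// Final composite: lerp from the sharp image toward the blurred copy.
float composite(float sharp, float blurred, float factor) {
    return sharp + (blurred - sharp) * factor;
}
```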
MRT Example: Edge Detection
[Images: world-space normals and eye-space depth → edge detect → outlines]
Edge detection images courtesy of ATI Technologies, Inc.
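With normals and depth written to separate render targets in one pass, outlines come from looking for discontinuities. A pixel is marked as an edge when its depth jumps or its normal turns sharply relative to a neighbor. This sketch compares each pixel with its right and lower neighbors. The thresholds are hypothetical tuning parameters, and ATI's demo may use a different filter.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct N3 { float x, y, z; };

// What an MRT pass might write: eye-space depth and world-space normal per pixel.
struct GBuffer {
    int w, h;
    std::vector<float> depth;
    std::vector<N3> normal;
};

// Edge if depth differs by more than depthThreshold, or the cosine between
// normals drops below normalThreshold, versus the right or lower neighbor.
std::vector<bool> detectEdges(const GBuffer& g, float depthThreshold, float normalThreshold) {
    std::vector<bool> edge(g.w * g.h, false);
    auto differs = [&](int a, int b) {
        const N3& n0 = g.normal[a];
        const N3& n1 = g.normal[b];
        float cosAngle = n0.x * n1.x + n0.y * n1.y + n0.z * n1.z;
        return std::fabs(g.depth[a] - g.depth[b]) > depthThreshold || cosAngle < normalThreshold;
    };
    for (int y = 0; y < g.h; ++y)
        for (int x = 0; x < g.w; ++x) {
            int i = y * g.w + x;
            if ((x + 1 < g.w && differs(i, i + 1)) || (y + 1 < g.h && differs(i, i + g.w)))
                edge[i] = true;
        }
    return edge;
}
```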
MRT Example: Edge Detection (2)
► Composite the outlines to get a cel-shaded effect. Images courtesy of ATI
High Level Shader Language
► Why?
► Because assembly sucks
► Allows all the things that make C so much better than machine code
► Can separate pixel and vertex shader code from data
► No longer have to map elements of a stream to registers; it's done semantically
DirectX® 8 Assembly
     tex t0           ; base texture
     tex t1           ; environment map
     add r0, t0, t1   ; apply reflection
DirectX 9 HLSL Syntax
     outColor = tex2D( baseTexture, baseTextureCoord ) +
                texCUBE( Environment, EnvironmentMapCoord );
Maybe more characters, but it makes much more sense
Datatypes
► Ints, bools, floats, etc.
► All the things you know and love
► Plus things that make graphics easy, like vectors and matrices
► 1x1 up to 4x4 first-class floating-point data
► float4x4, not float[4][4]
► All operations designed to operate natively on data types up to 4x4
Vertex Shader Input Semantics
► position[n]      untransformed position
► blendweight[n]   skinning blending weight
► blendindices[n]  skinning blending indices
► normal[n]        normal vector
► psize[n]         point size (particle system)
► color[0]         diffuse (matte) color
► color[1]         specular (shiny) color
► texcoord[n]      texture coordinates
► tangent[n]       together with the normal vector,
► binormal[n]      these two make a 3D coordinate system
VS Output / PS Input Semantics
► position      transformed position
► psize         point size
► fog           fog blending value
► color[n]      computed colors
► texcoord[n]   texture coordinates
Uses for Semantics
► A data binding protocol:
   – Between vertex data and shaders
   – Between pixel and vertex shaders
   – Between pixel shaders and hardware
   – Between shader fragments
► One smooth process of describing the flow of data in and out of the various elements of the render process
So…
► Yeah, we've got all this programmable hardware
► What does it really give us?
OPTIONS!!!
► We are finally able to compute what we want
► No longer the fixed-function pipeline's bitch
► Can render Pong, even Wolfenstein, on the GPU
► Think of the GPU as a signal processor of vertex and pixel data, not merely something that renders pictures
Finally
► All graphics that use the fixed-function pipeline, i.e. the standard lighting equation, fundamentally look the same
► Many hacks to work around this
► But still stuck with: ambient + diffuse + specular
► Programmability allows graphics programmers to tailor the look of their work to fit the content
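Here is the "ambient + diffuse + specular" look everything is stuck with, as a sketch for one directional light. The specular term uses a halfway vector (Blinn-Phong style). Direct3D's full fixed-function equation also has emissive, global ambient, attenuation, and material-source terms, which are left out here. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct C3 { float r, g, b; };
struct D3 { float x, y, z; };

float dot3(const D3& a, const D3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
D3 normalize(const D3& v) {
    float len = std::sqrt(dot3(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// ambient + diffuse * max(N.L, 0) + specular * max(N.H, 0)^power.
// N, L, V are unit vectors; L and V point away from the surface.
C3 standardLighting(const D3& N, const D3& L, const D3& V,
                    const C3& ambient, const C3& diffuse, const C3& specular, float power) {
    float nDotL = std::max(0.0f, dot3(N, L));
    float spec = 0.0f;
    if (nDotL > 0.0f) {
        D3 H = normalize({L.x + V.x, L.y + V.y, L.z + V.z});  // halfway vector
        spec = std::pow(std::max(0.0f, dot3(N, H)), power);
    }
    return {ambient.r + diffuse.r * nDotL + specular.r * spec,
            ambient.g + diffuse.g * nDotL + specular.g * spec,
            ambient.b + diffuse.b * nDotL + specular.b * spec};
}
```

Every fixed-function object is some tuning of these three terms, which is why it all tends to look alike.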
Choose Your Look
► Pick a unique "look" and do it
► Toon: several methods
► Cheesy: unlit or flat shaded
► Retro: standard FF pipelines
► Radiosity: soft lighting only
► Shadows: horror movie, Doom III
► Gritty: ultra-realistic
► And many more
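Here is one of the "several methods" for a toon look: quantize the diffuse term N·L into a few flat bands instead of a smooth ramp. On the GPU this is often done with a small 1-D texture lookup, which is another use of dependent reads. The `bands` parameter (at least 2) and the mapping of bands to evenly spaced intensities are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Map N.L in [0, 1] to one of `bands` flat intensities 0, 1/(bands-1), ..., 1.
float toonShade(float nDotL, int bands) {
    float d = std::clamp(nDotL, 0.0f, 1.0f);
    float band = std::floor(d * bands);
    if (band >= bands) band = bands - 1.0f;  // N.L == 1 stays in the top band
    return band / (bands - 1);
}
```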
Time for Hands On
Hemisphere Model
[Diagram: sky color and ground color are combined by the hemisphere model into the final color]
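Hemisphere lighting replaces a point light with two colors, sky above and ground below. Each point blends them by how much its normal faces up. The weight 0.5 + 0.5·cos θ, with θ the angle between the normal and the up axis, is a common choice and is assumed here. The names are hypothetical.

```cpp
#include <cassert>

struct RGB { float r, g, b; };
struct Dir { float x, y, z; };

// Blend ground -> sky by a = 0.5 + 0.5 * cos(theta); normal and up are unit vectors.
RGB hemisphereLight(const Dir& normal, const Dir& up, const RGB& sky, const RGB& ground) {
    float cosTheta = normal.x * up.x + normal.y * up.y + normal.z * up.z;
    float a = 0.5f + 0.5f * cosTheta;  // 1 facing the sky, 0 facing the ground
    return {ground.r + (sky.r - ground.r) * a,
            ground.g + (sky.g - ground.g) * a,
            ground.b + (sky.b - ground.b) * a};
}
```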
Distributed Light Model
[Diagram: hemisphere of possible incident light directions at angle θ; the surface normal (microfacet normal) defines the axis of the hemisphere]
2-Hemisphere Model
[Diagram: sky color and ground color hemispheres, separated by angle θ]
Distributed Light Model
[Diagram: hemisphere of possible incident light directions at angle θ over microfacets]
Other facets can shadow this one: occlusion
Ray Cast Occlusion Model
[Diagram: rays cast from a microfacet]
Some rays hit this object; others miss it
Occlusion Representations
► Can store the result in various ways
► Compute the fraction of rays that hit (rather than miss)
   – Occlusion factor
   – A single scalar parameter
   – Should weight with cosine
► Use it to blend in the shadow color
► Sufficient for hemisphere lighting
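The occlusion factor can be precomputed per vertex or per texel by casting rays over the hemisphere and weighting each hit by cos θ, as the slide suggests. The sketch uses a regular grid over θ and φ instead of random rays so that results are deterministic. The sin θ factor accounts for the solid-angle measure of the grid cells. `isBlocked` stands in for a real ray cast against scene geometry. The function names and the linear blend toward a shadow color are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Ray { float x, y, z; };  // direction in a local frame where +z is the surface normal

// Cosine-weighted fraction of the hemisphere that is blocked.
double occlusionFactor(const std::function<bool(const Ray&)>& isBlocked, int thetaSteps, int phiSteps) {
    const double pi = 3.14159265358979323846;
    double blocked = 0.0, total = 0.0;
    for (int i = 0; i < thetaSteps; ++i) {
        double theta = (i + 0.5) * (pi / 2) / thetaSteps;
        double w = std::cos(theta) * std::sin(theta);  // cosine weight * solid-angle term
        for (int j = 0; j < phiSteps; ++j) {
            double phi = (j + 0.5) * (2 * pi) / phiSteps;
            Ray r{static_cast<float>(std::sin(theta) * std::cos(phi)),
                  static_cast<float>(std::sin(theta) * std::sin(phi)),
                  static_cast<float>(std::cos(theta))};
            total += w;
            if (isBlocked(r)) blocked += w;
        }
    }
    return blocked / total;
}

// Blend from the lit color toward the shadow color by the occlusion factor.
double shade(double litColor, double shadowColor, double occlusion) {
    return litColor + (shadowColor - litColor) * occlusion;
}
```

Because the result is a single scalar, it fits in a vertex component or a texture channel and costs one lerp at run time.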
Hemisphere Lighting +Occlusion
[Diagram: sky color, ground color, and object color are combined through the sphere model and the occlusion factor to give the final color]
Back to Work