Transcript Document

High Dynamic Range
Emeka Ezekwe (M11)
Chris Thayer (M12)
Shabnam Aggarwal (M13)
Charles Fan (M14)
4/8/2016
Manager: Matthew Russo
Agenda
• Project Description (Charles)
• Marketing (Shabnam)
• Behavioral Description (Emeka)
• Design Process (Chris)
• Floorplan Evolution (Shabnam)
• Layout (Charles)
• Design Specifications (Chris)
• Conclusion (Emeka)
Project Description
• HDR (High Dynamic Range) lets bright colors stay bright and dark colors stay dark, while keeping details clearly visible.
• Without HDR, these colors look distorted and incorrect.
• HDR data is currently very large: it requires 48 bits per pixel.
  ◦ This consumes large amounts of storage space and memory bandwidth.
• Our goal was to create a chip that implements a more efficient HDR decoding algorithm.
http://en.wikipedia.org/wiki/Image:Lostcoasttrnsmsn.jpg
http://en.wikipedia.org/wiki/Image:Farcryhdr.jpg
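To make the 48-bit figure concrete: assuming each of R, G, and B is stored as a 16-bit IEEE 754 half float (the format named for the inputs in the algorithm description), the short sketch below packs one pixel with Python's struct module and scales the result to a 4x4 block and to a 1024x1024 texture. The texture size is hypothetical and chosen only for illustration.

```python
import struct

def pack_rgb_half(r: float, g: float, b: float) -> bytes:
    """Pack one RGB pixel as three little-endian IEEE 754 half floats."""
    return struct.pack('<3e', r, g, b)

# 3 channels x 16 bits per channel
BITS_PER_PIXEL = len(pack_rgb_half(1.0, 0.5, 0.25)) * 8
# One 4x4 block of uncompressed pixels
BLOCK_BITS = 16 * BITS_PER_PIXEL
# Hypothetical 1024x1024 texture, in mebibytes
TEXTURE_MIB = 1024 * 1024 * BITS_PER_PIXEL / 8 / 2**20

print(BITS_PER_PIXEL, BLOCK_BITS, TEXTURE_MIB)  # 48 768 6.0
```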
Marketing
NVIDIA Corporation, the worldwide leader in programmable graphics processor technologies, recently announced the extension of the award-winning NVIDIA GeForce 8 Series line-up to include three new GPUs. Windows Vista also now requires a GPU to realize its full graphics potential.

Our decoder is designed to interface between specially encoded textures stored in the GPU’s memory and one of the GPU’s texture caches that feed into the shader processor. Each ROP on NVIDIA’s G80 is capable of processing 4 pixels per clock cycle. To match this, we plan for our hardware to decode the texture information for 4 pixels during each clock cycle.

This decoder will allow smaller textures to be stored in the GPU’s memory, which will allow graphics cards to provide the same functions with less memory. Ultimately, this decoder can provide savings in cost, power consumption, heat dissipation, and size in current graphics cards.
Algorithmic Description
Encoding (luminance values)
• $R_p$, $G_p$, and $B_p$ are in half-float format.
• Compute the per-pixel luminance values: $L_p = (R_p + 2G_p + B_p)/4$
• Find the pixel luminance with the smallest value; this is $L_{bias}$.
• Compute the differential luminance for each pixel: $\mathrm{bits}(L'_p) = \mathrm{bits}(L_p) - \mathrm{bits}(L_{bias})$
• Count the leading zeros in the bit pattern of the largest $L'_p$. This number becomes $n_{zeros}$.
• Truncate the $L'_p$ bit patterns by dropping the $n_{zeros}+1$ most significant bits. The $+1$ is for the sign bit.
• Round the truncated bit patterns to the four most significant bits.

Decoding (luminance values)
• Decoding luminance does not require floating-point calculations.
• Construct $L_{bias}$ from its bit pattern.
• To reconstruct $L'_p$, the four bits in the compressed block are substituted for the bits immediately following the $n_{zeros}+1$ most significant bits. The rest of the bits are set to zero.
• To reconstruct $L_p$, $L_{bias}$ is added to $L'_p$.

Encoding (chrominance values)
• Divide the 4x4 pixel block into 2x2 quads.
• Compute $R_Q$, $G_Q$, and $B_Q$ for each quad.
• Because the $R_Q$, $G_Q$, and $B_Q$ values are normalized (they sum to 1), $G_Q$ can be discarded and recalculated during the decoding process.

Decoding (chrominance values)
• Calculate $G_Q = 1 - (R_Q + B_Q)$.
• Compute the final RGB values for each pixel.

[Figures: Bit allocation for encoded luminance values; Bit allocation for encoded chrominance values]
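The luminance encode and decode steps can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not the chip's logic. Luminances are assumed to be non-negative, finite binary16 values handled as 16-bit integer bit patterns. Subtracting and adding bit patterns is meaningful here because non-negative IEEE floats sort in the same order as their bit patterns. Rounding is assumed to be round-to-nearest with saturation at 15, and $n_{zeros}$ is capped at 11 so the four kept bits never fall below bit 0. All helper names are hypothetical.

```python
import struct

# Assumptions for this sketch: non-negative, finite binary16 luminances.
WORD_BITS = 16   # width of a binary16 bit pattern
Q_BITS = 4       # bits kept per pixel after rounding

def half_bits(x: float) -> int:
    """IEEE 754 binary16 bit pattern of x, as an unsigned integer."""
    return struct.unpack('<H', struct.pack('<e', x))[0]

def bits_to_half(b: int) -> float:
    """Interpret a 16-bit pattern as a binary16 value."""
    return struct.unpack('<e', struct.pack('<H', b))[0]

def luminance(r: float, g: float, b: float) -> float:
    """L_p = (R_p + 2G_p + B_p) / 4."""
    return (r + 2 * g + b) / 4

def encode_luminance(lums):
    """Return (bits(L_bias), n_zeros, 4-bit codes) for a block of luminances."""
    bits = [half_bits(l) for l in lums]
    l_bias = min(bits)
    # bits(L'_p) = bits(L_p) - bits(L_bias): plain integer subtraction.
    diffs = [b - l_bias for b in bits]
    # Leading zeros of the largest L'_p, counted below the sign bit.
    n_zeros = (WORD_BITS - 1) - max(diffs).bit_length()
    # Assumption: cap n_zeros so the kept bits never fall below bit 0.
    n_zeros = min(n_zeros, WORD_BITS - 1 - Q_BITS)
    # Drop n_zeros + 1 MSBs (the +1 is the sign bit); keep the next 4 bits.
    shift = WORD_BITS - (n_zeros + 1) - Q_BITS
    half_step = (1 << shift) >> 1
    # Round to nearest; assumption: saturate if rounding overflows 4 bits.
    codes = [min((d + half_step) >> shift, (1 << Q_BITS) - 1) for d in diffs]
    return l_bias, n_zeros, codes

def decode_luminance(l_bias, n_zeros, codes):
    """Integer-only decode: rebuild L'_p from the codes, then add L_bias."""
    shift = WORD_BITS - (n_zeros + 1) - Q_BITS
    # Code bits go right after the n_zeros + 1 MSBs; lower bits are zero.
    return [l_bias + (q << shift) for q in codes]

l_bias, n_zeros, codes = encode_luminance([1.0, 1.5, 2.0, 0.5])
print(hex(l_bias), n_zeros, codes)  # 0x3800 3 [4, 6, 8, 0]
print([bits_to_half(b) for b in decode_luminance(l_bias, n_zeros, codes)])
# [1.0, 1.5, 2.0, 0.5]
```

In this example the values are spaced so that four bits capture them exactly. Real blocks lose the low-order bits that the rounding step discards.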
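The chrominance steps can be sketched the same way, but the slides do not say exactly how $R_Q$, $G_Q$, $B_Q$ are computed or how the final RGB is rebuilt. The sketch therefore makes two illustrative assumptions. First, each quad's channel sums are normalized so that $R_Q + G_Q + B_Q = 1$. Second, each pixel's color is its quad's chromaticity scaled to match its decoded luminance. Because $R_Q + 2G_Q + B_Q = 1 + G_Q$, that scale is $4L_p/(1 + G_Q)$. The function names are hypothetical.

```python
def encode_chroma_block(block):
    """block: 4x4 grid of (R, G, B). Returns a 2x2 grid of (R_Q, B_Q).

    Assumption: each quad's channel sums are normalized so that
    R_Q + G_Q + B_Q = 1, which lets G_Q be dropped.
    """
    quads = []
    for qy in range(2):
        row = []
        for qx in range(2):
            pixels = [block[2 * qy + dy][2 * qx + dx]
                      for dy in range(2) for dx in range(2)]
            r = sum(p[0] for p in pixels)
            g = sum(p[1] for p in pixels)
            b = sum(p[2] for p in pixels)
            total = r + g + b
            if total == 0:
                # Assumption: treat an all-black quad as neutral gray.
                row.append((1 / 3, 1 / 3))
            else:
                row.append((r / total, b / total))
        quads.append(row)
    return quads

def decode_pixel(lum, r_q, b_q):
    """Rebuild one pixel from its luminance L_p and its quad's (R_Q, B_Q)."""
    g_q = 1.0 - (r_q + b_q)  # G_Q = 1 - (R_Q + B_Q)
    # Assumption: pixel = s * (R_Q, G_Q, B_Q) with (R + 2G + B)/4 = L_p.
    # Since R_Q + 2G_Q + B_Q = 1 + G_Q, s = 4 * L_p / (1 + G_Q).
    s = 4.0 * lum / (1.0 + g_q)
    return (s * r_q, s * g_q, s * b_q)

# Uniform chromaticity (0.2, 0.5, 0.3) with brightness varying per pixel.
blk = [[(0.2 * k, 0.5 * k, 0.3 * k) for k in range(4 * y + 1, 4 * y + 5)]
       for y in range(4)]
quads = encode_chroma_block(blk)
r, g, b = blk[3][3]
print(decode_pixel((r + 2 * g + b) / 4, *quads[1][1]))  # approximately (3.2, 8.0, 4.8)
```

The block in the usage example has a single chromaticity, so decoding recovers it up to floating-point error. Blocks whose pixels differ in hue within a quad share one averaged chromaticity per quad.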
Data Flow
[Figure: data flow diagram. Labeled blocks: Find G, Int to FP, four "Compute 1 pixel" units, and a "Serialize 16 output" register for each unit; register stages (Reg) with 4-, 7-, 8-, and 16-bit buses.]
Design Process
• Initially we were going to use an SRAM along with one 32-bit register.
  ◦ The SRAM would store the entire compressed 16-pixel chunk.
  ◦ This increased our input count from 97 to 104.
  ◦ We decided against it because the bit widths of the values we needed are irregular and would require a complex system to read them out of the SRAM.
• Removed the module to store $n_{zeros}$ and $L_{bias}$.
• Removed the floating-point multipliers.
  ◦ Integer multiplication is done by Wallace trees and Booth encoding.
• Removed denormal support in the integer-to-floating-point conversion.
• Shooting for 400 MHz (2 or 3 pipeline stages).
  ◦ Speed was clearly our goal, but power and size were also important.
  ◦ Critical adders are carry-select.
• 4 pixels per cycle, 4 cycles per block
  ◦ No wasted cycles, unlike before when storing special luminance values.
• Decided against the ROM because it was slower, bigger, and unnecessary. We decided to go with a combinational implementation for the Int-FP conversion.
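A combinational integer-to-floating-point converter needs three pieces: a leading-one detector, a normalizing shift, and an exponent derived from the shift amount. The sketch below models that logic in Python for a hypothetical unsigned 11-bit input converted to IEEE binary16. The input width and the function names are assumptions, since the slides do not give the converter's interface. Every nonzero 11-bit integer fits in binary16's 11 significant bits (10 stored plus the hidden 1), so this case needs no rounding and never produces a denormal.

```python
import struct

def int11_to_half_bits(n: int) -> int:
    """Convert an unsigned 11-bit integer to a binary16 bit pattern.

    Hypothetical interface; models the logic a combinational converter
    would need, with no ROM lookup.
    """
    if not 0 <= n < (1 << 11):
        raise ValueError("input must fit in 11 unsigned bits")
    if n == 0:
        return 0x0000                      # +0.0
    msb = n.bit_length() - 1               # leading-one detector: 0..10
    exponent = msb + 15                    # binary16 exponent bias is 15
    mantissa = (n << (10 - msb)) & 0x3FF   # normalize, then drop the hidden 1
    return (exponent << 10) | mantissa

def half_bits_to_float(bits: int) -> float:
    """Decode a binary16 bit pattern (used here only to check results)."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

print(hex(int11_to_half_bits(1)), hex(int11_to_half_bits(2047)))  # 0x3c00 0x67ff
```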
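The design notes say integer multiplication uses Booth encoding and Wallace trees. The following is a behavioral sketch of that combination, not the chip's netlist. Radix-4 Booth recoding turns a signed multiplier into digits in {-2, -1, 0, 1, 2}, and each digit selects a shifted, possibly negated, partial product. A Wallace-style tree of 3:2 carry-save adders then reduces the partial products to two numbers before one final carry-propagate addition. Operand width is a parameter here because the slides do not give the multipliers' widths, and the function names are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Radix-4 Booth digits of y, read as a signed `bits`-bit two's-complement value."""
    if bits % 2:
        raise ValueError("bits must be even")
    y &= (1 << bits) - 1          # two's-complement bit pattern
    digits, prev = [], 0          # prev holds y[i-1], with y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)   # each digit is in {-2, ..., 2}
        prev = b1
    return digits

def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: a + b + c == sum_bits + carry_bits."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def wallace_reduce(terms: list[int]) -> int:
    """Reduce partial products with layers of 3:2 compressors, then add the last two."""
    while len(terms) > 2:
        next_layer = []
        for i in range(0, len(terms) - 2, 3):
            next_layer.extend(carry_save_add(*terms[i:i + 3]))
        next_layer.extend(terms[len(terms) - len(terms) % 3:])  # 0 to 2 leftovers
        terms = next_layer
    return sum(terms)             # final carry-propagate addition

def booth_wallace_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by the signed `bits`-bit value y via Booth digits and a Wallace tree."""
    partials = [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, bits))]
    return wallace_reduce(partials)

print(booth_wallace_multiply(-7, 5, 4), booth_wallace_multiply(1023, 1023, 12))  # -35 1046529
```

Radix-4 recoding halves the number of partial products compared with one per multiplier bit, which shortens the Wallace tree that has to sum them.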
Verification Of Design
Simulations
(need to add the final simulation)
Stage 1
Critical path: 1.533 ns
Stage 3
Critical path: 2.179 ns
[Figures: pipeline stage schematics. Labeled blocks include Find G, Int-FP, Serial Reg, Reg, Mult, >> (shift), Int (+) adders, and FP pixel Out; most buses are 11 bits wide.]
Floorplan Evolution
[Figure panels: Initial Floorplan; Revised Initial Floorplan (still bad!); Better Floorplan; Close to final Floorplan (after some optimizations…); Final and complete layout!]
Full Chip Layout
Including functional blocks (need to add layer masks)
Design Specifications
Delays
• Stage one pipeline: 1.5 ns
• Stage two pipeline: 1.53 ns
• Stage three pipeline: 2.179 ns
• Maximum clock-to-Q: 300 ps
Size: 442 x 453 microns
Aspect ratio: 1:1.024
Transistors: 42,772
Density: 0.21 transistors/micron^2
Our current clock speed: 403 MHz

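The listed clock speed is consistent with the slowest stage plus the clock-to-Q delay: 1/(2.179 ns + 0.300 ns) ≈ 403 MHz. The density is consistent with 42,772 transistors over 442 x 453 square microns. The short check below reproduces both figures. Setup time is not listed, so it is left out. The pixel-rate line assumes the pipeline sustains the four pixels per cycle from the design notes; it is derived from the listed numbers, not a measured result.

```python
# Values copied from the specification list.
STAGE_DELAYS_NS = {"stage 1": 1.5, "stage 2": 1.53, "stage 3": 2.179}
CLK_TO_Q_NS = 0.300
WIDTH_UM, HEIGHT_UM = 442, 453
TRANSISTORS = 42_772
PIXELS_PER_CYCLE = 4  # from the design process notes

period_ns = max(STAGE_DELAYS_NS.values()) + CLK_TO_Q_NS  # slowest stage sets the clock
f_max_mhz = 1e3 / period_ns
density = TRANSISTORS / (WIDTH_UM * HEIGHT_UM)           # transistors per square micron
pixel_rate_gpix = PIXELS_PER_CYCLE * f_max_mhz / 1e3     # derived, not measured

print(f"{period_ns:.3f} ns -> {f_max_mhz:.1f} MHz")  # 2.479 ns -> 403.4 MHz
print(f"density {density:.2f} per um^2, {pixel_rate_gpix:.2f} Gpixel/s")
# density 0.21 per um^2, 1.61 Gpixel/s
```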
Design Specifications: Components
• Chart including each large component with its transistor count, area, and density
• Speed in schematic and layout, and delay in both schematic and layout
Concluding Remarks
Need to add screenshot of last simulation,
schematics(?), layer masks (simple), and circuit specs
(also simple)