Transcript 1 - CMU/ECE
High Dynamic Range
Emeka Ezekwe M11
Christopher Thayer M12
Shabnam Aggarwal M13
Charles Fan M14
3/25/2017
Manager: Matthew Russo
Agenda
Project Description (Charles)
Marketing (Shabnam)
Behavioral Description (Emeka)
Design Process (Chris)
Floorplan Evolution (Shabnam)
Design Specifications (Chris)
Layout (Charles)
Conclusion (Emeka)
Project Description
Charles Fan
Project Description
High Dynamic Range??
Bright colors are BRIGHT
Dark colors are DARK
Details are seen CLEARLY
Otherwise… colors and lights look distorted & bland
FP HDR format requires 48 bits per pixel
Problem: Too much storage space & memory bandwidth!!
Solution: HDR encoding yields 6:1 compression
OUR GOAL: Implement efficient HDR decoding in hardware
6:1 pixel compression
Increases usable storage space six-fold
Decreases required memory bandwidth six-fold
Effectively increases performance
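To make the 6:1 figure concrete, the short calculation below counts the bytes one 4x4 block occupies at 48 bpp and at 8 bpp. It assumes the 48 bpp format is three 16-bit half-precision channels, which the slide does not state explicitly; the constant and function names are illustrative.

```python
# Storage for one 4x4 texture block (16 pixels) at the two rates on the slide.
PIXELS_PER_BLOCK = 4 * 4
FP_HDR_BPP = 48    # uncompressed FP HDR (assumed: 3 channels x 16-bit floats)
ENCODED_BPP = 8    # HDR-encoded rate

def block_bytes(bits_per_pixel: int) -> int:
    """Bytes needed to store one 4x4 block at the given bits per pixel."""
    return PIXELS_PER_BLOCK * bits_per_pixel // 8

raw = block_bytes(FP_HDR_BPP)        # uncompressed block size
encoded = block_bytes(ENCODED_BPP)   # encoded block size
ratio = raw // encoded

# Fetching one block moves `encoded` bytes over the memory bus instead of `raw`,
# and a memory that holds N uncompressed textures holds `ratio` * N encoded ones.
print(f"{raw} B -> {encoded} B per block, {ratio}:1")
```

The same ratio applies to both capacity and bandwidth because every texture fetch reads whole encoded blocks.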
Marketing
Shabnam Aggarwal
Marketing
AMD’s ATI Mobility Radeon X1900
48-bit floating point HDR
HDR compression is currently NOT supported
Performance hit deters developers
OLED (Organic Light Emitting Diode) displays are being developed by Sony
Contrast ratio: 1,000,000:1
Windows Vista also now requires a high-end GPU to realize its full graphics potential.
Laptops & portable devices are using dedicated processors for graphics
Marketing
Our decoder is designed to interface between specially encoded textures stored in the GPU’s memory and one of the GPU’s texture caches that feed into the shader processor.
Each ROP on ATI GPUs is capable of processing 4 pixels per clock cycle. We plan for our hardware to decode the texture information for 4 pixels during each clock cycle.
This decoder will allow smaller textures to be stored in the GPU’s memory, which will allow graphics cards to provide the same functions with less memory.
Ultimately, this decoder can provide savings in cost, power consumption, heat dissipation, and size for current graphics cards.
Our HDR Decoder!!
Marketing
Our HDR Decoder:
Smaller textures stored in GPU’s memory
Same functions…less memory
Savings in:
Cost
Power consumption
Heat dissipation
Size
HDR is the next generation of display technology
Behavioral & Algorithmic Description
Emeka Ezekwe
Algorithmic Description
Encoding
Break texture into 4X4 pixel blocks.
Extract luminance value of each pixel.
Normalize red and blue values and average over each 2X2 block.
Green can be recalculated while decoding.
Allocate more bits to luminance values.
After encoding, a 4X4 block of pixels can be compressed from 48 bpp to 8 bpp.
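The encoding steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the team's encoder: luminance is taken as L = R + G + B, so the normalized chroma r = R/L, g = G/L, b = B/L sums to 1 and green can be recovered as g = 1 - r - b; the function name `encode_block` is hypothetical; and the bit allocation that brings a block down to 8 bpp is not modelled, so values stay as floats.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) linear HDR values

def encode_block(pixels: List[Pixel]):
    """Encode one 4x4 block given in row-major order (16 pixels).

    Assumptions (not from the slides): L = R + G + B, r = R/L, b = B/L.
    Quantization and bit packing are omitted.
    """
    assert len(pixels) == 16
    lum = [0.0] * 16
    r_avg = [0.0] * 4   # one (r, b) pair per 2x2 sub-block
    b_avg = [0.0] * 4
    for i, (r, g, b) in enumerate(pixels):
        L = r + g + b
        lum[i] = L                          # full-resolution luminance
        x, y = i % 4, i // 4
        sub = (y // 2) * 2 + (x // 2)       # which 2x2 sub-block (0..3)
        # Black pixels carry no chroma; use neutral grey (1/3, 1/3).
        rn = r / L if L > 0 else 1 / 3
        bn = b / L if L > 0 else 1 / 3
        r_avg[sub] += rn / 4                # average over the 2x2 sub-block
        b_avg[sub] += bn / 4
    # Green is dropped: since r + g + b = 1, the decoder recovers g = 1 - r - b.
    return lum, r_avg, b_avg

lum, r_avg, b_avg = encode_block([(2.0, 1.0, 1.0)] * 16)
```

Keeping luminance per pixel while sharing chroma across each 2x2 sub-block mirrors the slide's choice to spend more bits on luminance, to which the eye is more sensitive.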
Algorithmic Description
Decoding (Luminance values)
Reconstruct Lp: 1 logical shift + 1 integer addition
Calculate GQ: 1 integer addition
Calculate final pixel values: 3 floating-point multiplications
Total calculations: 1 logical shift + 2 integer additions + 3 floating-point multiplications
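The operation counts above can be illustrated with a hypothetical decoder for one pixel. The slides do not give the bit layout, so everything here is an assumption: Lp is rebuilt by shifting an 8-bit per-pixel luminance code into the mantissa of a per-block FP16 base value and adding the two bit patterns (1 shift + 1 integer addition); chroma is 8-bit fixed point with 255 meaning 1.0, so GQ = 255 - (rQ + bQ), which costs one addition followed by a bitwise inversion; and the final R, G, B are the 3 floating-point multiplications. Names such as `decode_pixel` and `SHIFT` are illustrative.

```python
import struct

SHIFT = 2  # hypothetical: puts an 8-bit code in the top of the 10-bit FP16 mantissa

def half_from_bits(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision float."""
    return struct.unpack('<e', struct.pack('<H', bits & 0xFFFF))[0]

def decode_pixel(base_bits: int, lum_code: int, r_q: int, b_q: int):
    """Decode one pixel from hypothetical block/pixel fields.

    base_bits: per-block FP16 luminance base (bit pattern)
    lum_code:  8-bit per-pixel luminance code
    r_q, b_q:  8-bit normalized chroma of the pixel's 2x2 sub-block,
               255 meaning 1.0; the encoder must keep r_q + b_q <= 255
    """
    # Reconstruct Lp: 1 logical shift + 1 integer addition on the bit pattern.
    lp = half_from_bits(base_bits + (lum_code << SHIFT))
    # Calculate GQ: 1 integer addition, then a bitwise inversion;
    # for 0 <= s <= 255, ~s & 0xFF == 255 - s.
    g_q = ~(r_q + b_q) & 0xFF
    # Int to FP conversion of the chroma (modelled here as q / 255) ...
    r, g, b = r_q / 255, g_q / 255, b_q / 255
    # ... then the 3 floating-point multiplications.
    return lp * r, lp * g, lp * b

rgb = decode_pixel(0x3C00, 0, 128, 64)   # base luminance 1.0, code 0
```

Because positive IEEE 754 values have monotonically increasing bit patterns, adding integer offsets to the base steps the luminance in roughly logarithmic increments, which suits HDR data spanning many orders of magnitude.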
Data Flow
[Figure: datapath diagram. Labeled blocks: Reg, Find G, Int to FP, four "Compute 1 pixel" units, and four "Serialize output" stages. Labeled bus widths: 4, 7, 8, and 16 bits.]
Design Process
Chris Thayer
Design Process
Goal: Speed
400 MHz
4 pixels per cycle, 4 cycles per block
Architectural decisions
No denormal support in Floating Point Multiplier
Pipelined design
Storing input values
Integer Multiplication
Wallace trees
Booth encoding
Critical adders
Carry select
Integer-to-Floating-Point Conversion
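The integer-multiplication choices listed above (Booth encoding, Wallace-tree reduction, and a carry-select adder) can be modelled behaviorally in a few lines. The sketch below is a bit-accurate software model, not the team's netlist: it uses radix-4 Booth recoding, reduces the partial products with layers of 3:2 carry-save compressors, and finishes with a carry-select adder whose 4-bit block size is an arbitrary choice. The 11-bit operand width in the example assumes FP16 significands (10 stored bits plus the hidden 1).

```python
# Radix-4 Booth digit for each 3-bit window (y[2i+1], y[2i], y[2i-1]).
BOOTH = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
         0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def booth_digits(y: int, n: int) -> list:
    """Recode an unsigned n-bit multiplier into n//2 + 1 digits in {-2..2}."""
    ybits = y << 1                      # append y[-1] = 0
    return [BOOTH[(ybits >> (2 * i)) & 0b111] for i in range(n // 2 + 1)]

def csa(a: int, b: int, c: int, mask: int):
    """3:2 carry-save compressor: a + b + c == s + carry (mod 2**width)."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask

def wallace_reduce(rows: list, mask: int) -> list:
    """Reduce rows layer by layer with 3:2 compressors until two remain."""
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for j in range(0, full, 3):
            nxt.extend(csa(rows[j], rows[j + 1], rows[j + 2], mask))
        nxt.extend(rows[full:])         # leftover rows pass to the next layer
        rows = nxt
    return rows

def carry_select_add(a: int, b: int, width: int, block: int = 4) -> int:
    """Add block by block; each block precomputes sums for both carry-ins."""
    bmask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, width, block):
        a_blk, b_blk = (a >> lo) & bmask, (b >> lo) & bmask
        s0 = a_blk + b_blk              # sum assuming carry-in 0
        s1 = a_blk + b_blk + 1          # sum assuming carry-in 1
        s = s1 if carry else s0         # mux picks one when the real carry arrives
        result |= (s & bmask) << lo
        carry = s >> block
    return result & ((1 << width) - 1)

def multiply(x: int, y: int, n: int) -> int:
    """Unsigned n x n-bit multiply: Booth rows, Wallace tree, carry-select add."""
    width = 2 * n + 2                   # room for the product plus sign handling
    mask = (1 << width) - 1
    # Negative partial products are kept in two's complement modulo 2**width.
    rows = [((d * x) << (2 * i)) & mask for i, d in enumerate(booth_digits(y, n))]
    rows += [0] * max(0, 2 - len(rows))
    a, b = wallace_reduce(rows, mask)
    return carry_select_add(a, b, width)

product = multiply(1500, 1234, 11)
```

Radix-4 Booth recoding turns an 11-bit multiplier into 6 signed digits instead of 11 bits, roughly halving the rows the Wallace tree must reduce; the carry-save layers defer all carry propagation to the single carry-select addition at the end.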
Design Process
Circuit level decisions
Mirror FA’s to reduce carry-chain delay
Two different HA’s
AOI/OAI gates
Gate sizing along critical paths
Utilize Q and ~Q outputs from registers
Clock buffers built into register blocks
Double/Triple strapped VDD and GND
Repeaters to break up long wires
Balanced clock tree
Device Folding
Verification Process
C Implementation
Structural Verilog
Gate Level Schematic
Layout
Major Modules
Pipeline Stages
Global Signals
Floorplan Evolution
Shabnam Aggarwal
Floorplan Evolution
Design Specifications
Chris Thayer
Design Specifications
Delays
Stage one pipeline: 1.8 ns
Stage two pipeline: 1.53 ns
Stage three pipeline: 2.479 ns
Skew
Resulting Clock Speed: 500 MHz
2 BILLION pixels per second
Size: 442x453 microns
Stage one: x
Stage two: x
Stage three: x
Aspect Ratio: 1:1.024
Transistors: 42,772
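The throughput figures follow from simple arithmetic. The sketch below takes the three stage delays listed above, bounds the clock by the slowest stage (ignoring skew and register overhead, which are not quantified on the slide), and computes pixel and block rates at the reported 500 MHz with 4 pixels per cycle and 16 pixels per block. Variable names are illustrative.

```python
# Stage delays from the slide, in nanoseconds.
stage_ns = {"stage one": 1.8, "stage two": 1.53, "stage three": 2.479}
critical_ns = max(stage_ns.values())   # slowest stage sets the minimum period
f_max_mhz = 1e3 / critical_ns          # ignores skew and register overhead

PIXELS_PER_CYCLE = 4
PIXELS_PER_BLOCK = 16

def throughput(clock_mhz: float):
    """Return (pixels per second, 4x4 blocks per second) at a given clock."""
    pixels = PIXELS_PER_CYCLE * clock_mhz * 1e6
    return pixels, pixels / PIXELS_PER_BLOCK

print(f"critical stage {critical_ns} ns -> f_max ~ {f_max_mhz:.0f} MHz")
pix_500, blk_500 = throughput(500)
```

With these inputs, the 2.479 ns third stage alone bounds the clock at about 403 MHz, close to the 400 MHz design goal; at 500 MHz, 4 pixels per cycle gives 2 billion pixels per second, or 125 million 4x4 blocks per second at 4 cycles per block.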
Layout
Charles Fan
Floating Point Multiplier Layout
Pretty beautiful
Floating Point Multiplier Data Flow
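The floating-point multiplier appears only as layout images here. As a behavioral reference for its function, a sketch of a half-precision multiplier without denormal support follows. Assumptions not taken from the slides: FP16 operands (suggested by the 16-bit datapaths and 48 bpp pixels), truncation rounding, overflow to infinity, and no handling of infinity or NaN inputs. Denormal inputs, and results that would be denormal, are flushed to signed zero, which is one common way of omitting denormal support in hardware. The function names are hypothetical.

```python
import struct

def half_bits(x: float) -> int:
    """IEEE 754 binary16 bit pattern of x (round-to-nearest via struct)."""
    return struct.unpack('<H', struct.pack('<e', x))[0]

def bits_half(b: int) -> float:
    """Float value of a binary16 bit pattern."""
    return struct.unpack('<e', struct.pack('<H', b))[0]

def fp16_mul_ftz(a: int, b: int) -> int:
    """Multiply two binary16 bit patterns with no denormal support.

    Hypothetical behaviour: denormal inputs and denormal results flush to
    signed zero, the significand is truncated, overflow gives infinity, and
    infinity/NaN inputs are not handled.
    """
    sign = ((a ^ b) >> 15) & 1
    ea, eb = (a >> 10) & 0x1F, (b >> 10) & 0x1F
    if ea == 0 or eb == 0:             # zero or denormal operand
        return sign << 15
    # 11-bit significands with the hidden 1 restored; product lies in [2^20, 2^22).
    prod = ((a & 0x3FF) | 0x400) * ((b & 0x3FF) | 0x400)
    exp = ea + eb - 15                 # add exponents, remove one bias
    if prod & (1 << 21):               # significand in [2, 4): renormalize
        prod >>= 1
        exp += 1
    frac = (prod >> 10) & 0x3FF        # truncate to 10 fraction bits
    if exp <= 0:                       # result would be denormal: flush to zero
        return sign << 15
    if exp >= 31:                      # overflow: signed infinity
        return (sign << 15) | 0x7C00
    return (sign << 15) | (exp << 10) | frac

result = bits_half(fp16_mul_ftz(half_bits(1.5), half_bits(1.5)))
```

Dropping denormals removes the leading-zero count and variable normalization shift that subnormal operands and results would otherwise require.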
Poly Layer
Metal One Layer
Metal Two Layer
Metal Three Layer
Metal Four Layer
Questions?