Transcript 1 - CMU/ECE

High Dynamic Range
Emeka Ezekwe M11
Christopher Thayer M12
Shabnam Aggarwal M13
Charles Fan M14
3/25/2017
Manager: Matthew Russo
Agenda
- Project Description: Charles
- Marketing: Shabnam
- Behavioral Description: Emeka
- Design Process: Chris
- Floorplan Evolution: Shabnam
- Design Specifications: Chris
- Layout: Charles
- Conclusion: Emeka
Project Description
Charles Fan
Project Description
High Dynamic Range??
- Bright colors are BRIGHT
- Dark colors are DARK
- Details are seen CLEARLY
- Otherwise… colors and lights look distorted & bland

FP HDR format requires 48 bits per pixel
Problem: Too much storage space & memory bandwidth!!
Solution: HDR encoding yields 6:1 compression
OUR GOAL: Implement efficient HDR decoding in hardware

6:1 pixel compression
- Increases usable storage space 6-fold
- Decreases memory bandwidth 6-fold
- Effectively increases performance
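
The 6:1 figure can be checked with simple arithmetic. The sketch below compares a texture stored at 48 bits per pixel with the same texture at 8 bits per pixel (the encoded size quoted on the Encoding slide). The 1024x1024 texture size and the helper name `texture_bytes` are illustrative assumptions, not values from the presentation.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Storage in bytes for a width x height texture at the given bit depth."""
    return width * height * bits_per_pixel // 8

# Assumed example texture: 1024 x 1024 pixels.
raw = texture_bytes(1024, 1024, 48)     # 48-bit floating-point HDR
encoded = texture_bytes(1024, 1024, 8)  # 8 bpp after HDR encoding
ratio = raw / encoded

print(raw, encoded, ratio)  # 6291456 1048576 6.0
```

The same ratio applies to memory bandwidth, since each fetched pixel moves one sixth as many bits.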
Marketing
Shabnam Aggarwal
Marketing
AMD’s ATI Mobility Radeon X1900
- 48-bit floating point HDR
- HDR compression is currently NOT supported
- Performance hit deters developers

OLED (Organic Light Emitting Diode) displays are being developed by Sony
- Contrast ratio: 1,000,000:1

Windows Vista also now requires a high-end GPU to realize its full graphics potential.
Laptops & portable devices are using dedicated processors for graphics.
Marketing
- Our decoder is designed to interface between specially encoded textures stored in the GPU’s memory and one of the GPU’s texture caches that feed into the shader processor.
- Each ROP on (**ATI) is capable of processing 4 pixels per clock cycle. We plan for our hardware to decode the texture information for 4 pixels during each clock cycle.
- This decoder will allow smaller textures to be stored in the GPU’s memory, which will allow graphics cards to provide the same functions with less memory.
- Ultimately, this decoder can provide savings in cost, power consumption, heat dissipation, and size in current graphics cards.
Our HDR Decoder!!
Marketing
Our HDR Decoder:
- Smaller textures stored in GPU’s memory
- Same functions…less memory
Savings in:
- Cost
- Power consumption
- Heat dissipation
- Size
HDR is the next generation of display technology
Behavioral & Algorithmic Description
Emeka Ezekwe
Algorithmic Description

Encoding
- Break texture into 4X4 pixel blocks.
- Extract luminance value of each pixel.
- Normalize red and blue values and average over each 2X2 block.
- Green can be recalculated while decoding.
- Allocate more bits to luminance values.
- After encoding, a 4X4 block of pixels can be compressed from 48 bpp to 8 bpp.
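
The encoding steps above can be sketched for a single 4X4 block. This is a minimal illustration under stated assumptions, not the team's encoder: luminance is assumed to be the plain sum R + G + B (so that the normalized channels sum to 1 and green can be recovered from red and blue), and quantization and bit packing into 8 bpp are left out because the slides do not describe them. The function name `encode_block` is hypothetical.

```python
def encode_block(block):
    """block: 4x4 list of (R, G, B) float tuples. Returns (lum, chroma).

    lum    -- 4x4 list of per-pixel luminance values
    chroma -- 2x2 list of (r, b) pairs, each averaged over one 2x2 sub-block
    """
    # Assumption: luminance is the channel sum, so r + g + b == 1 after normalizing.
    lum = [[sum(px) for px in row] for row in block]
    chroma = []
    for by in range(0, 4, 2):
        chroma_row = []
        for bx in range(0, 4, 2):
            r_sum = b_sum = 0.0
            for y in range(by, by + 2):
                for x in range(bx, bx + 2):
                    R, G, B = block[y][x]
                    L = lum[y][x] or 1.0  # avoid dividing by zero on black pixels
                    r_sum += R / L        # normalized red
                    b_sum += B / L        # normalized blue
            chroma_row.append((r_sum / 4, b_sum / 4))  # average over the 2x2 sub-block
        chroma.append(chroma_row)
    return lum, chroma

# Usage: a uniform block keeps 16 luminance values but only 4 chroma pairs.
block = [[(2.0, 1.0, 1.0)] * 4 for _ in range(4)]
lum, chroma = encode_block(block)
```

Green is never stored in this sketch, which matches the slide's point that it can be recalculated during decoding.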
Algorithmic Description

Decoding (Luminance values)
- Reconstruct Lp
  - 1 Logical shift
  - 1 Integer addition
- Calculate GQ
  - 1 Integer addition
- Calculate final pixel values
  - 3 floating-point multiplications
- Total calculations
  - 1 logical shift + 2 Integer additions + 3 floating-point multiplications
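
The operation count above can be made concrete with a per-pixel sketch. Everything about the field layout here is an assumption for illustration: the slides do not give the formulas for Lp or GQ, so the sketch assumes luminance is a block base plus a shifted per-pixel delta, that red and blue are 7-bit quantized fractions, and that green is whatever remains. The names `decode_pixel`, `base`, `delta`, `shift`, and `chroma_mask` are hypothetical, and constant scale factors are left out.

```python
def decode_pixel(base, delta, shift, rq, bq, chroma_mask=0x7F):
    """Decode one pixel from hypothetical quantized fields.

    base, delta, shift -- assumed luminance fields (block base, pixel delta, scale)
    rq, bq             -- quantized red and blue, with rq + bq <= chroma_mask
    """
    lp = base + (delta << shift)  # 1 logical shift + 1 integer addition
    # 1 integer addition; XOR with the all-ones mask equals chroma_mask - (rq + bq).
    gq = (rq + bq) ^ chroma_mask
    lf = float(lp)                # integer to floating-point conversion
    # 3 floating-point multiplications, one per color channel.
    return (float(rq) * lf, float(gq) * lf, float(bq) * lf)

print(decode_pixel(16, 3, 2, 64, 32))  # (1792.0, 868.0, 896.0)
```

Under these assumptions the per-pixel cost matches the slide's total: 1 logical shift, 2 integer additions, and 3 floating-point multiplications.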
Data Flow
[Block diagram. Labeled blocks: Reg (input and pipeline registers), Find G, Int to FP, Compute 1 pixel (x4), Serialize output (x4). Labeled bus widths: 4, 7, 8, 16.]
Design Process
Chris Thayer
Design Process
Goal: Speed
- 400 MHz
- 4 pixels per cycle, 4 cycles per block

Architectural decisions
- No denormal support in Floating Point Multiplier
- Pipelined design
- Storing input values
- Integer Multiplication
  - Wallace trees
  - Booth encoding
- Critical adders
  - Carry select
- Integer-to-Floating-Point Conversion
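
As an illustration of why carry select suits a critical adder, here is a small behavioral model. It is a sketch of the general technique, not the team's circuit: the 16-bit width and 4-bit block size are assumptions, and `carry_select_add` is a hypothetical name. Each block computes both possible sums up front, so the incoming carry only has to drive a selection rather than ripple through the block.

```python
def carry_select_add(a, b, width=16, block=4):
    """Add two unsigned `width`-bit integers block by block, carry-select style.

    Assumes `width` is a multiple of `block`. Returns (sum, carry_out).
    """
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, width, block):
        x = (a >> lo) & mask
        y = (b >> lo) & mask
        # Both candidate sums are formed without waiting for the incoming carry.
        sum0 = x + y        # candidate for carry-in = 0
        sum1 = x + y + 1    # candidate for carry-in = 1
        chosen = sum1 if carry else sum0  # the real carry acts only as a mux select
        result |= (chosen & mask) << lo
        carry = chosen >> block           # carry out of this block
    return result, carry

print(carry_select_add(1234, 4321))   # (5555, 0)
print(carry_select_add(0xFFFF, 1))    # (0, 1)
```

In hardware the two candidate sums in every block are computed in parallel, so the delay along the carry path is roughly one multiplexer per block instead of one full adder per bit.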
Design Process

Circuit level decisions
- Mirror FA’s to reduce carry-chain delay
- Two different HA’s
- AOI/OAI gates
- Gate sizing along critical paths
- Utilize Q and ~Q outputs from registers
- Clock buffers built into register blocks
- Double/Triple strapped VDD and GND
- Repeaters to break up long wires
- Balanced clock tree
- Device Folding
Verification Process
- C Implementation
- Structural Verilog
- Gate Level Schematic
- Layout
- Major Modules
- Pipeline Stages
- Global Signals
Floorplan Evolution
Shabnam Aggarwal
Floorplan Evolution
Design Specifications
Chris Thayer
Design Specifications
Delays
- Stage one pipeline: 1.8 ns
- Stage two pipeline: 1.53 ns
- Stage three pipeline: 2.479 ns
Skew
Resulting Clock Speed: 500 MHz
- 2 BILLION pixels per second
Size: 442x453 microns
- Stage one: x
- Stage two: x
- Stage three: x
Aspect Ratio: 1:1.024
Transistors: 42,772
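
The throughput figure follows directly from the clock speed and the 4-pixels-per-cycle design goal. The sketch below reproduces that arithmetic, along with the 4 cycles needed per 4X4 block. It also reports the clock ceiling implied by the slowest listed stage delay taken at face value, ignoring skew and setup time. That ceiling comes out near 403 MHz, close to the 400 MHz goal; the slides do not say how the 500 MHz figure was reached, so the sketch simply reports both numbers. All function names are hypothetical.

```python
PIXELS_PER_CYCLE = 4       # from the Design Process slide
PIXELS_PER_BLOCK = 4 * 4   # one 4X4 texture block

def pixels_per_second(clock_hz, pixels_per_cycle=PIXELS_PER_CYCLE):
    """Decoded pixels per second at the given clock frequency."""
    return clock_hz * pixels_per_cycle

def cycles_per_block(pixels_per_cycle=PIXELS_PER_CYCLE):
    """Clock cycles needed to emit one full 4X4 block."""
    return PIXELS_PER_BLOCK // pixels_per_cycle

def stage_limit_mhz(stage_delays_ns):
    """Clock ceiling in MHz implied by the slowest stage alone (no skew or setup)."""
    return 1000.0 / max(stage_delays_ns)

print(pixels_per_second(500_000_000))               # 2000000000
print(cycles_per_block())                           # 4
print(round(stage_limit_mhz([1.8, 1.53, 2.479])))   # 403
```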
Layout
Charles Fan
Floating Point Multiplier Layout
Pretty beautiful
Floating Point Multiplier Data Flow
Poly Layer
Metal One Layer
Metal Two Layer
Metal Three Layer
Metal Four Layer
Questions?