1
MAD MAC 525
Farhan Mohamed Ali
Jigar Vora
Sonali Kapoor
Avni Jhunjhunwala
Design Manager: Zack Menegakis
1st May, 2006
Final Presentation
Design a crucial part of a GPU called the Multiply Accumulate
Unit (MAC) which is revolutionizing graphics
2
Agenda
Marketing – Jigar
Project and Algorithm Description – Farhan
Implementation Part I – Farhan
Implementation Part II – Sonali
Floorplan – Sonali
Layout – Avni
Verification – Avni
Design Specifications – Avni
Conclusion – Jigar
3
Marketing
Jigar
4
Purpose
MAD MAC 525 accelerates FP16 blending to
enable true HDR graphics
Huh??
5
Marketing Description Implementing Floorplan Layout Verify Specifications
6
Beauty of High Dynamic Range
With HDR rendering, pixel intensity can extend
beyond the range of traditional graphics
Nature doesn’t have a limited pixel intensity and neither should Computer Graphics
In other words:
Bright things can be really bright
Dark things can be really dark
And the details can be seen in both
7
Applications of HDR
8
Target Market
Target Market Segment
Graphic chip manufacturers
High speed DSP manufacturers
CPU co-processors
Potential Customers
9
Design Comparison
Top 180nm graphics chip is the NVIDIA NV16.
Highest speed only 250MHz
9 bit Integer precision
As games are becoming more advanced, they are
in need of fast graphics chips
Conclusion:
Market Needs a FAST MAD MAC
10
Description and Implementation I
Farhan
11
Project Description
• Multiply Accumulate unit (MAC)
• Executes the function AB+C on 16-bit floating-point inputs.
• Format – 1-bit sign, 5-bit exponent and 10-bit significand
• Multiply and add in parallel to greatly speed up operation
• Rounding performed only once, so greater accuracy than individual multiply and add functions.
• Also known as:
• Fused Multiply Add (FMA)
• Multiply Add (MAD/MADD) in graphics shader programs
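The single-rounding point can be checked numerically. The sketch below is an illustration only, not the unit's logic: `round_fp16`, `mul_then_add`, and `fused_mac` are hypothetical helper names, and Python's half-precision packing stands in for the hardware rounder. The inputs are chosen so that the rounded product cancels against C exactly.

```python
import struct
from fractions import Fraction

def round_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value via the struct 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return round_fp16(round_fp16(a * b) + c)

def fused_mac(a: float, b: float, c: float) -> float:
    """AB+C with an exact intermediate and a single rounding at the end.

    The exact sum passes through a double before the FP16 rounding, which is
    adequate for this example but is not a general correctly rounded FMA.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return round_fp16(float(exact))

# A = B = 1 + 2^-10, C = -(1 + 2^-9); all three are exact FP16 values.
a = b = 1 + 2**-10
c = -(1 + 2**-9)

separate = mul_then_add(a, b, c)  # the 2^-20 term is lost when the product is rounded
fused = fused_mac(a, b, c)        # the 2^-20 term survives to the final rounding
print(separate, fused)
```

With these inputs the separate path returns 0.0, while the fused path keeps the residual 2^-20.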
12
Algorithm
FP Multiply (A*B)
Multiply significands
Add exponents
Normalize
Round
FP Add (A+B)
Align smaller number to larger number
Add significands
Normalize
Round
13
Algorithm
FP Multiply-Add (AB+C)
Align sig C based on exp A+B-C
Multiply significands A and B
Add sig A*B result to aligned sig C
Normalize
Round
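The multiply-add steps above can be sketched on integer significands. This is a simplified software model under stated assumptions, not the unit's datapath: it handles positive normal FP16 inputs only, aligns by shifting left in unbounded integers instead of shifting C right with sticky bits, and raises on overflow. The names `unpack_fp16` and `fma_fp16` are hypothetical.

```python
BIAS, FRAC_BITS = 15, 10  # FP16: 5-bit exponent with bias 15, 10-bit fraction

def unpack_fp16(bits: int):
    """Return (11-bit significand with hidden 1, exponent of its LSB)."""
    exp = (bits >> FRAC_BITS) & 0x1F
    frac = bits & 0x3FF
    # Assumption of this sketch: sign bit clear and a normal exponent.
    assert bits >> 15 == 0 and 1 <= exp <= 30, "positive normals only"
    return (1 << FRAC_BITS) | frac, exp - BIAS - FRAC_BITS

def fma_fp16(a_bits: int, b_bits: int, c_bits: int) -> int:
    sig_a, ea = unpack_fp16(a_bits)
    sig_b, eb = unpack_fp16(b_bits)
    sig_c, ec = unpack_fp16(c_bits)

    # Multiply significands A and B; the exponents add.
    prod, ep = sig_a * sig_b, ea + eb

    # Align: bring the product and C to a common exponent (exp A+B vs exp C).
    e = min(ep, ec)
    total = (prod << (ep - e)) + (sig_c << (ec - e))

    # Normalize: keep the top 11 bits, remember what was shifted out.
    shift = total.bit_length() - (FRAC_BITS + 1)
    sig = total >> shift
    rem = total & ((1 << shift) - 1)

    # Round once, to nearest, ties to even.
    half = (1 << shift) >> 1
    if shift and (rem > half or (rem == half and sig & 1)):
        sig += 1
        if sig >> (FRAC_BITS + 1):  # rounding carried out of the significand
            sig >>= 1
            shift += 1

    exp_field = e + shift + BIAS + FRAC_BITS
    if exp_field > 30:
        raise OverflowError("result exceeds the FP16 normal range")
    return (exp_field << FRAC_BITS) | (sig & 0x3FF)

# 1.5 * 2.25 + 0.5 = 3.875, which is 0x43C0 in FP16
print(hex(fma_fp16(0x3E00, 0x4080, 0x3800)))
```

Because the inputs are restricted to positive values, no cancellation occurs and the normalize step never has to shift left; the hardware needs the leading zero anticipator precisely for the case this sketch leaves out.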
14
Block Diagram
[Figure: block diagram. Inputs A, B, C; blocks Exp Calc, Multiplier, Align, Adder, Leading 0 Anticipator, Normalize, Round, Ovf Checker; output Y]
15
Implementation
Design target: 300MHz
Speed is the design goal
Ambitious target?
How we planned to achieve this
Fast Logic – parallelize ops as much as possible
Pipelining
16
Implementation
Adder
Carry Select vs Carry Lookahead tree
17
Implementation
Adder
Han-Carlson based carry lookahead adder
6 lookahead logic stages for 32 bit adder
Less logic than a Kogge-Stone adder
Less wiring than a Brent-Kung adder
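A bit-level sketch of a Han-Carlson style prefix adder helps show where the six stages come from. It is a behavioral model written for this note, assuming carry-in is 0 and a power-of-two width; `han_carlson_add` is a hypothetical name and the code makes no claim to match the team's gate-level design.

```python
def han_carlson_add(a: int, b: int, n: int = 32):
    """Add two n-bit integers; returns (sum mod 2**n, carry out, prefix stages)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G, P = g[:], p[:]
    stages = 0

    # Stage 1: each odd bit combines with its even neighbour below it.
    for i in range(1, n, 2):
        G[i] = g[i] | (p[i] & g[i - 1])
        P[i] = p[i] & p[i - 1]
    stages += 1

    # Middle stages: Kogge-Stone style doubling, but only on the odd bits.
    d = 2
    while d < n:
        prev_g, prev_p = G[:], P[:]
        for i in range(1, n, 2):
            if i - d >= 0:
                G[i] = prev_g[i] | (prev_p[i] & prev_g[i - d])
                P[i] = prev_p[i] & prev_p[i - d]
        stages += 1
        d *= 2

    # Final stage: each even bit picks up the carry from the odd bit below it.
    for i in range(2, n, 2):
        G[i] = g[i] | (p[i] & G[i - 1])
    stages += 1

    # Sum logic: carry into bit i is the group generate of bits 0..i-1.
    carry_in = [0] + G[:-1]
    total = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return total, G[n - 1], stages

print(han_carlson_add(0x12345678, 0x0FEDCBA8))
```

For n = 32 the model counts one pairing stage, four doubling stages, and one final stage, which is consistent with the six lookahead stages quoted above.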
18
Implementation
Multiplier
Carry-Save Multiplier
Avoids having ripple carry in every stage
Enables regular and compact layout
Easy to pipeline
Final 10 bit add stage using carry lookahead adder
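The carry-save idea can be sketched with word-wide 3:2 compression: every partial product is folded into a sum word and a carry word, and carries only propagate in one final add. This is an illustrative model, assuming 11-bit significands (10 bits plus the hidden 1) and a full-width final add for simplicity; `carry_save_multiply` is a hypothetical name.

```python
def carry_save_multiply(a: int, b: int, n: int = 11) -> int:
    """Multiply two n-bit unsigned integers using carry-save accumulation."""
    mask = (1 << (2 * n)) - 1
    s, c = 0, 0  # running sum word and carry word
    for i in range(n):
        # Partial product for bit i of b.
        pp = (a << i) if (b >> i) & 1 else 0
        # 3:2 compressor applied to every bit position at once:
        # s + c + pp == new_s + new_c, with no carry rippling between bits.
        new_s = s ^ c ^ pp
        new_c = ((s & c) | (s & pp) | (c & pp)) << 1
        s, c = new_s & mask, new_c & mask
    # The only carry-propagate addition happens here, at the end.
    return (s + c) & mask

print(carry_save_multiply(1536, 1152))  # significands of 1.5 and 1.125
```

Each loop iteration corresponds to one row of the array, which is why the structure lays out regularly and accepts pipeline registers between rows.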
19
Implementation
Leading Zero Anticipator
Predicts number of shifts to do in normalize
Normalize begins with zero delay
Operates in parallel with adder so normalize shifts can
be predicted with accuracy of 1 shift to left or right
20
Implementation
Latches
Pulse Latches
Practically eliminates setup time
16 transistors per pulse generator
Simplified version of those used in a certain high speed CPU
Clock pulse
generator
21
Implementation II and Floorplan
Sonali
22
Design Decision: Pass Logic
Extensive use of Pass Logic
Reduces transistor count
Reduces area
Transistor count reduced from 20,200 to 12,800
Example
Normalize: 3400 -> 942
Align: 1500 -> 530
Ensure all pass logic is buffered
23
Design Decision: Pipelining
Initially planned 6 pipeline stages
Reduced to 4 pipeline stages
Adder – Fast Carry Lookahead architecture
Multiplier – Ripple Carry to Carry Lookahead
24
Pipeline Stages
[Figure: pipeline diagram with blocks Reg A, Reg B, Reg C, Exp Calc, Multiplier, Align C, Ld Zero, Adder, Normalize, Round, Output]
25
Schematics
Multiplier
[Figure: schematic with regions labeled INPUTS, PIPELINE, and OUTPUTS]
26
Schematic
Adder
[Figure: schematic. From INPUTS to OUTPUTS: six Look Ahead Logic stages followed by Sum Logic]
27
Floorplan Evolution
Initial Floorplan
[Figure: floorplan with blocks Reg A, Reg B, Reg C, Exp Calc, Multiplier, Align C, Ld Zero, Adder, Round, Normalize, Overflow checker, Reg Y, and three Pipeline Regs]
28
Floorplan Evolution
Final Floorplan
[Figure: floorplan with blocks Reg C, Reg A, Reg B, Exponents, Output, Ld zero, Round, Ovf, Align, Normalize, Multiplier, Adder]
29
Layout, Verification & Specification
Avni
30
Layout Decisions
3 cell heights – 6.03, 5.04 and 3.55
Uniform width vdd and ground rails
Wider vdd and ground rails in power hungry modules
Max of 8 latches per clock pulse generator
Uniform metal directionality within each block
31
Final Layout
32
Final Layout
MULTIPLIER
33
Multiplier
Height: 191.6
Width: 206.38
Area: 20,388
[Figure: multiplier layout with regions labeled IN, BIT SLICE, PIPELINE, OUTPUT REG, and OUTPUT]
34
Final Layout
MULTIPLIER
ADDER
35
Adder
Height: 122.9
Width: 110.2
Area: 13,202
ADDER
INCREMENTER
36
Final Layout
[Figure: final layout with blocks Input, Input, Exponents, Ld zero, Align, OUT, Ovf, Round, Normalize, Multiplier, Adder]
37
Layer Masks
Active: 14.04%
38
Layer Masks
Poly : 9.25%
39
Layer Masks
Metal 1 : 34.08%
40
Layer Masks
Metal 2 : 18.00%
41
Layer Masks
Metal 3 : 14.99%
42
Layer Masks
Metal 4 : 6.23%
43
Verification Of Design
Behavioral and Structural Verilog
Schematic and Layout testing
Extensive Testing – Unable to find C or Matlab Code
Analog Simulations – Compare Output with Behavioral
Full Chip Verification
44
Design Specifications
Critical path delay = 2.25ns
Clock speed = 400MHz
Pipeline stages = 4
Height by width = 195.26 um * 303.255 um
Area = 59,214 um^2
Aspect ratio = 1:1.55
Transistor density = 0.22 transistors/um^2
Total Pin Count = 67
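The derived figures in this list can be cross-checked from the primary ones. The short sketch below is only an arithmetic check; the variable names are ours, and the transistor total of 12,820 is taken from the per-block totals reported elsewhere in the deck.

```python
# Primary figures quoted in the deck
height_um, width_um = 195.26, 303.255
transistors = 12_820
clock_mhz = 400
critical_path_ns = 2.25

# Derived figures
area_um2 = height_um * width_um           # compare with Area = 59,214 um^2
aspect = width_um / height_um             # compare with Aspect ratio = 1:1.55
density = transistors / area_um2          # compare with Transistor density = 0.22
period_ns = 1e3 / clock_mhz               # clock period at 400 MHz
slack_ns = period_ns - critical_path_ns   # margin left over the critical path

print(round(area_um2), round(aspect, 2), round(density, 2), period_ns, slack_ns)
```

The 2.25 ns critical path fits inside the 2.5 ns period of a 400 MHz clock with 0.25 ns to spare.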
45
Power (mW)
Block                     Schematic    Layout       Schematic    Layout
                          (400 MHz)    (400 MHz)    (100 MHz)    (100 MHz)
Multiplier w/ pipeline    2.281        2.354        0.6168       0.6297
Exponents                 0.3514       0.4094       0.0875       0.1029
Align                     0.0782       0.0926       0.0278       0.0324
Adder                     4.471        4.896        1.118        1.232
Leading 0                 0.1313       0.1722       0.033        0.0433
Normalize                 0.5865       0.6238       0.1481       0.1692
Round                     0.6339       0.6782       0.1593       0.1709
OvfCheck                  0.1632       0.1666       0.0408       0.04165
Total                     12.25        13.008       3.065        3.297
46
Block             Area       Transistor   Transistor   Schematic    Layout
                  (um^2)     Count        Density      Delay (ns)   Delay (ns)
Multiplier        20,388     4,496        0.22         3.38         N/A
 -w/ pipeline                                          1.9          2.25
Exponents         5,163      738          0.14         1.01         1.2
Align             3,995      500          0.13         0.480        0.637
Adder             13,202     3,174        0.24         1.34         1.7
Leading 0         1,253      364          0.29         0.506        0.551
Normalize         3,190      942          0.3          0.407        0.437
Round             1,802      494          0.28         0.864        0.986
OvfCheck          200        70           0.35         0.453        0.475
Registers, etc    N/A        2,038        N/A          0.179        0.193
Total             59,214     12,820       0.22         -
47
Conclusion
Jigar
48
Everyone Needs a MAD MAC
Graphics – HDR Rendering, Blending and Shader ops
• Fastest 180nm GPU: 250 MHz (9-bit Int)
• MAD MAC 525: 400 MHz (16-bit FP)
49
Everyone Needs a MAD MAC
DSPs – Computing Vector Dot-Products in Digital Filters
50
Everyone Needs a MAD MAC
Enables Fast Division, Square Root
• Eliminates extra Hardware to handle such computation
• Available in many new CPUs such as STI’s Cell
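One common way a multiply-add unit supports division is Newton-Raphson reciprocal iteration, where every step has the AB+C shape. The sketch below illustrates that general technique under our own assumptions; the deck does not say which method was intended. `mac` is a plain-Python stand-in (not fused, double precision rather than FP16), and `reciprocal` with its linear starting guess is hypothetical.

```python
import math

def mac(a: float, b: float, c: float) -> float:
    """Stand-in for a hardware AB+C; in plain Python this is not fused."""
    return a * b + c

def reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for d > 0 using only multiply-add operations."""
    m, e = math.frexp(d)              # d = m * 2**e with 0.5 <= m < 1
    x = mac(-32 / 17, m, 48 / 17)     # linear first guess for 1/m
    for _ in range(iterations):
        err = mac(-m, x, 1.0)         # err = 1 - m*x
        x = mac(x, err, x)            # x = x + x*err, error roughly squares
    return math.ldexp(x, -e)          # undo the scaling: 1/d = (1/m) * 2**-e

print(reciprocal(3.0))
```

A quotient n/d then costs one more multiply, `n * reciprocal(d)`, so no dedicated divider is required.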
51
Future Enhancements
16 to 32 Bits
Newer process technology
Possible modifications for low power apps
52
Everyone Wants A
MAD MAC 525
53