Implementing


1
MAD MAC 525
Farhan Mohamed Ali
Jigar Vora
Sonali Kapoor
Avni Jhunjhunwala
Design Manager: Zack Menegakis
1st May, 2006
Final Presentation
Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC), which is revolutionizing graphics
2
Agenda

Marketing – Jigar
Project and Algorithm Description – Farhan
Implementation Part I – Farhan
Implementation Part II – Sonali
Floorplan – Sonali
Layout – Avni
Verification – Avni
Design Specifications – Avni
Conclusion – Jigar
3
Marketing
Jigar
4
Purpose
MAD MAC 525 accelerates FP16 blending to enable true HDR graphics
Huh??
5
Marketing Description Implementing Floorplan Layout Verify Specifications
6
Beauty of High Dynamic Range

With HDR rendering, pixel intensity can extend beyond the range of traditional graphics
• Nature doesn't have a limited pixel intensity, and neither should computer graphics

In other words:
• Bright things can be really bright
• Dark things can be really dark
• And the details can be seen in both
7
Applications of HDR
8
Target Market

Target Market Segment
• Graphics chip manufacturers
• High-speed DSP manufacturers
• CPU co-processors

Potential Customers
9
Design Comparison

Top 180nm graphics chip is the NVIDIA NV16
• Highest speed is only 250MHz
• 9-bit integer precision

As games become more advanced, they need faster graphics chips

Conclusion:
Market Needs a FAST MAD MAC
10
Description and Implementation I
Farhan
11
Project Description
• Multiply Accumulate unit (MAC)
• Executes the function AB+C on 16-bit floating point inputs
• Format – 1 sign bit, 5-bit exponent and 10-bit significand
• Multiplies and adds in parallel to greatly speed up the operation
• Rounding is performed only once, giving greater accuracy than separate multiply and add operations
• Also known as:
• Fused Multiply Add (FMA)
• Multiply Add (MAD/MADD) in graphics shader programs
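
The single-rounding point above can be illustrated in software. This is a minimal Python sketch, not the chip's logic: it uses the half-precision format of the standard `struct` module to split a value into the 1/5/10-bit fields, then compares rounding once (fused) with rounding after the multiply and again after the add. The helper names `fp16_fields`, `to_fp16`, `mad_unfused` and `mad_fused` are hypothetical, and the inputs are chosen so that the double-precision intermediate is exact.

```python
import struct

def fp16_fields(x):
    """Return the (sign, biased exponent, significand) bit fields of x as FP16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

def to_fp16(x):
    """Round a Python float to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mad_unfused(a, b, c):
    """Separate multiply and add: the result is rounded twice."""
    return to_fp16(to_fp16(a * b) + c)

def mad_fused(a, b, c):
    """Multiply-add with a single rounding at the end."""
    return to_fp16(a * b + c)

# Illustrative inputs (assumed, not from the slides): all three are exact FP16 values.
a = 1 + 2**-10
b = 1 + 2**-9
c = -1.0

print(fp16_fields(1.0))        # sign, exponent and significand fields of 1.0
print(mad_unfused(a, b, c))    # low-order product bit is lost before the add
print(mad_fused(a, b, c))      # low-order product bit survives
```

With these inputs the two results differ in the last significand bit, because the unfused version discards the 2^-19 term of the product before C cancels the leading 1.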
12
Algorithm

FP Multiply (A*B)
• Multiply significands
• Add exponents
• Normalize
• Round

FP Add (A+B)
• Align smaller number to larger number
• Add significands
• Normalize
• Round
13
Algorithm

FP Multiply-Add (AB+C)
• Align the significand of C based on the exponent difference exp(A) + exp(B) - exp(C)
• Multiply significands A and B
• Add the A*B significand result to the aligned significand of C
• Normalize
• Round
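
The steps above can be sketched on integer significands. This is a simplified Python model under stated assumptions: it handles only positive, normal FP16 inputs, ignores overflow and underflow, and keeps every bit by shifting left in arbitrary-precision integers, where hardware would use a fixed-width alignment shifter on C. The names `split` and `fma_positive` are hypothetical.

```python
import math

def split(x):
    """Return (exp, sig) with x == sig * 2**(exp - 10) and 1024 <= sig < 2048."""
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= m < 1
    return e - 1, int(m * 2048)       # 11-bit significand, hidden bit included

def fma_positive(a, b, c):
    """a*b + c for positive normal FP16 values, rounded once (ties to even)."""
    ea, sa = split(a)
    eb, sb = split(b)
    ec, sc = split(c)

    prod = sa * sb                    # multiply significands (up to 22 bits)
    ep = ea + eb                      # exponent of the product
    d = ep - ec                       # alignment distance: exp A + exp B - exp C

    # Align: bring product and C to a common scale without losing bits.
    if d >= 0:
        prod <<= d
        sc <<= 10
        scale = ec - 20
    else:
        sc <<= 10 - d
        scale = ep - 20

    total = prod + sc                 # add; the exact value is total * 2**scale

    # Normalize: keep the top 11 bits, remember what was shifted out.
    drop = total.bit_length() - 11
    sig = total >> drop
    rem = total & ((1 << drop) - 1)

    # Round to nearest, ties to even, exactly once.
    half = 1 << (drop - 1)
    if rem > half or (rem == half and sig & 1):
        sig += 1
    return math.ldexp(sig, scale + drop)

print(fma_positive(1.5, 2.5, 0.25))
print(fma_positive(0.5, 0.5, 4.0))
```

The two branches of the alignment step cover the cases where the product has the larger exponent and where C does.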

14
Block Diagram
[Figure: block diagram with inputs A, B and C; blocks Exp Calc, Multiplier, Align, Adder, Leading 0 Anticipator, Normalize, Round and Ovf Checker; output Y]
15
Implementation

Design target: 300MHz
• Speed is the design goal
• Ambitious target?

How we planned to achieve this
• Fast logic – parallelize operations as much as possible
• Pipelining

16
Implementation

Adder

Carry Select vs Carry Lookahead tree
17
Implementation

Adder

Han-Carlson based carry lookahead adder
• 6 lookahead logic stages for a 32-bit adder
• Less logic and wiring than a Kogge-Stone adder
• Fewer logic stages than a Brent-Kung adder
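
The stage count quoted above can be checked with a bit-level model. This is a hedged Python sketch of a Han-Carlson style prefix network, not the team's schematic: the first stage merges each odd bit with its even neighbour, the middle stages run Kogge-Stone style doubling on the odd bits only, and a last stage fills in the even bits. The function name `han_carlson_add` is hypothetical, and `width` is assumed to be a power of two with no carry-in.

```python
def han_carlson_add(a, b, width=32):
    """Add two unsigned ints; return (sum mod 2**width, carry_out, prefix_stages)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # bit propagate
    G, P = g[:], p[:]

    def combine(hi, lo):
        # Prefix operator: merge a (generate, propagate) pair with the pair below it.
        return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]

    # Stage 1: each odd bit absorbs its even neighbour.
    for i in range(1, width, 2):
        G[i], P[i] = combine((g[i], p[i]), (g[i - 1], p[i - 1]))
    stages = 1

    # Middle stages: doubling distances, applied to odd bits only.
    d = 2
    while d < width:
        prev_G, prev_P = G[:], P[:]
        for i in range(1 + d, width, 2):
            G[i], P[i] = combine((prev_G[i], prev_P[i]),
                                 (prev_G[i - d], prev_P[i - d]))
        stages += 1
        d *= 2

    # Last stage: even bits pick up the finished prefix of the odd bit below.
    for i in range(2, width, 2):
        G[i], P[i] = combine((g[i], p[i]), (G[i - 1], P[i - 1]))
    stages += 1

    carries = [0] + G[:-1]                                  # carry into each bit
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, G[width - 1], stages

print(han_carlson_add(123456789, 987654321))
```

For a 32-bit width the model reports 6 prefix stages: one pairing stage, four doubling stages and one final stage.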
18
Implementation

Multiplier

Carry-Save Multiplier
• Avoids having a ripple carry in every stage
• Enables a regular and compact layout
• Easy to pipeline
• Final 10-bit add stage uses a carry lookahead adder
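
The carry-save idea above can be sketched as follows. This is an illustrative Python model, not the actual array: each partial product is folded into a (sum, carry) pair with a bitwise 3:2 compressor, so no carry ripples inside a stage, and a single carry-propagate add is done at the end. The names `carry_save_add` and `carry_save_multiply` are hypothetical, the 11-bit default width (10-bit significand plus hidden bit) is an assumption, and the final add here spans the full result rather than only the 10 bits mentioned on the slide.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: return (s, c) with s + c == x + y + z, no carry propagation."""
    s = x ^ y ^ z                              # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carry, moved up one position
    return s, c

def carry_save_multiply(a, b, width=11):
    """Multiply two unsigned width-bit ints by carry-save accumulation."""
    s, c = 0, 0
    for i in range(width):
        partial = (a << i) if (b >> i) & 1 else 0   # one row of the array
        s, c = carry_save_add(s, c, partial)
    return s + c                                # the only carry-propagate add

print(carry_save_multiply(1025, 1026))
```

Each loop iteration corresponds to one row of the array, which is also a natural place to cut the pipeline.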
19
Implementation

Leading Zero Anticipator

Predicts the number of shifts to do in normalize
• Normalize begins with zero delay
• Operates in parallel with the adder, so normalize shifts can be predicted to within 1 shift to the left or right
20
Implementation

Latches

Pulse Latches
• Practically eliminate setup time
• 16 transistors per pulse generator
• Simplified version of those used in a certain high speed CPU

[Figure: clock pulse generator]
21
Implementation II and Floorplan
Sonali
22
Design Decision: Pass Logic

Extensive use of pass logic
• Reduces transistor count
• Reduces area

Transistor count reduced from 20,200 to 12,800
Example
• Normalize: 3400 -> 942
• Align: 1500 -> 530

Ensure all pass logic is buffered
23
Design Decision: Pipelining

Initially planned 6 pipeline stages

Reduced to 4 pipeline stages
• Adder – fast carry lookahead architecture
• Multiplier – ripple carry changed to carry lookahead
24
Pipeline Stages
[Figure: pipeline stage diagram. Labels: Reg A, Reg B, Reg C, Exp Calc, Multiplier, Align C, Ld Zero, Adder, Normalize, Round, Output]
25
Schematics
[Figure: Multiplier schematic. Labels: INPUTS, PIPELINE, PIPELINE, OUTPUTS, OUTPUTS]
26
Schematic
[Figure: Adder schematic. Labels: INPUTS, six Look Ahead Logic stages, Sum Logic, OUTPUTS]
27
Floorplan Evolution
Initial Floorplan
[Figure: initial floorplan. Blocks: Reg A, Reg B, Reg C, Exp Calc, Multiplier, Align C, Pipeline Reg, Ld Zero, Pipeline Reg, Adder, Pipeline Reg, Round, Normalize, Overflow checker, Reg Y]
28
Floorplan Evolution
Final Floorplan
[Figure: final floorplan. Blocks: Reg C, Reg A, Reg B, Exponents, Output, Ld zero, Round, Ovf, Align, Normalize, Multiplier, Adder]
29
Layout, Verification & Specification
Avni
30
Layout Decisions
• 3 cell heights – 6.03, 5.04 and 3.55
• Uniform width vdd and ground rails
• Wider vdd and ground rails in power hungry modules
• Max of 8 latches per clock pulse generator
• Uniform metal directionality within each block

31
Final Layout
32
Final Layout
MULTIPLIER
33
Multiplier
• Height: 191.6
• Width: 206.38
• Area: 20,388
[Figure: multiplier layout. Labels: IN, BITSLICE, IN, PIPELINE, OUTPUT REG, OUTPUT]
34
Final Layout
MULTIPLIER
ADDER
35
Adder
• Height: 122.9
• Width: 110.2
• Area: 13,202
[Figure: adder layout. Labels: ADDER, INCREMENTER]
36
Final Layout
[Figure: final layout. Blocks: Input, Input, Exponents, Ld zero, Align, OUT, Ovf, Round, Normalize, Multiplier, Adder]
37
Layer Masks
Active: 14.04%
38
Layer Masks
Poly : 9.25%
39
Layer Masks
Metal 1 : 34.08%
40
Layer Masks
Metal 2 : 18.00%
41
Layer Masks
Metal 3 : 14.99%
42
Layer Masks
Metal 4 : 6.23%
43
Verification Of Design
• Behavioral and Structural Verilog
• Schematic and Layout testing
• Extensive Testing – Unable to find C or Matlab Code
• Analog Simulations – Compare Output with Behavioral
• Full Chip Verification
44
Design Specifications
• Critical path delay = 2.25ns
• Clock speed = 400MHz
• Pipeline stages = 4
• Height by width = 195.26 um * 303.255 um
• Area = 59,214 um^2
• Aspect ratio = 1:1.55
• Transistor density = 0.22 transistors/um^2
• Total pin count = 67
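
The figures above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch in Python: the variable names are hypothetical, the 12,820 transistor total is taken from the per-block table later in the deck, and the latency line assumes one clock period per pipeline stage.

```python
critical_path_ns = 2.25
clock_mhz = 400
pipeline_stages = 4
height_um, width_um = 195.26, 303.255
transistors = 12820                      # total from the per-block table

max_clock_mhz = 1e3 / critical_path_ns   # fastest clock the critical path allows
area_um2 = height_um * width_um
aspect_ratio = width_um / height_um
density = transistors / area_um2         # transistors per square micron
latency_ns = pipeline_stages * 1e3 / clock_mhz

print(f"max clock    : {max_clock_mhz:.1f} MHz")
print(f"area         : {area_um2:.0f} um^2")
print(f"aspect ratio : 1:{aspect_ratio:.2f}")
print(f"density      : {density:.2f} per um^2")
print(f"latency      : {latency_ns:.1f} ns")
```

The critical path leaves some margin over the quoted 400MHz clock, and the area, aspect ratio and density agree with the listed values after rounding.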

45
Power (mW)

Block                    Schematic   Layout      Schematic   Layout
                         (400 MHz)   (400 MHz)   (100 MHz)   (100 MHz)
Multiplier w/ pipeline   2.281       2.354       0.6168      0.6297
Exponents                0.3514      0.4094      0.0875      0.1029
Align                    0.0782      0.0926      0.0278      0.0324
Adder                    4.471       4.896       1.118       1.232
Leading 0                0.1313      0.1722      0.033       0.0433
Normalize                0.5865      0.6238      0.1481      0.1692
Round                    0.6339      0.6782      0.1593      0.1709
OvfCheck                 0.1632      0.1666      0.0408      0.04165
Total                    12.25       13.008      3.065       3.297
46
Block            Area     Transistor  Transistor  Schematic   Layout
                 (um^2)   Count       Density     Delay (ns)  Delay (ns)
Multiplier       20,388   4,496       0.22        3.38        N/A
  w/ pipeline                                     1.9         2.25
Exponents        5,163    738         0.14        1.01        1.2
Align            3,995    500         0.13        0.480       0.637
Adder            13,202   3,174       0.24        1.34        1.7
Leading 0        1,253    364         0.29        0.506       0.551
Normalize        3,190    942         0.3         0.407       0.437
Round            1,802    494         0.28        0.864       0.986
OvfCheck         200      70          0.35        0.453       0.475
Registers, etc   N/A      2,038       N/A         0.179       0.193
Total            59,214   12,820      0.22        -
47
Conclusion
Jigar
48
Everyone Needs a MAD MAC
Graphics – HDR Rendering, Blending and Shader ops
• Fastest 180nm GPU: 250 MHz (9-bit Int)
• MAD MAC 525: 400 MHz (16-bit FP)
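
Blending maps directly onto the AB+C operation. As a rough illustration only, the Python sketch below performs an additive blend `src * alpha + dst` once with an FP16 result and once on clamped integer channels. The function names are hypothetical, and the integer path assumes 8-bit channels for simplicity rather than the 9-bit precision quoted above.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def blend_fp16(src, alpha, dst):
    """Additive blend as one multiply-add, rounded to FP16."""
    return to_fp16(src * alpha + dst)

def blend_int8(src, alpha, dst):
    """Same blend on 0..255 integer channels, saturating at 255."""
    return min(255, (src * alpha) // 255 + dst)

# A light four times brighter than "white", blended at half strength over white.
print(blend_fp16(4.0, 0.5, 1.0))    # intensity above 1.0 is preserved
print(blend_int8(255, 255, 200))    # the integer channel saturates
```

The FP16 path keeps intensities above 1.0, which is the property HDR rendering relies on, while the integer path clips them.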
49
Everyone Needs a MAD MAC
DSPs – Computing Vector Dot-Products in Digital Filters
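
A dot product is a chain of multiply-accumulates, one per element. The Python sketch below is an assumed illustration, not code from the project: `mac` stands in for one pass through the unit, and `dot` and `fir` are hypothetical helpers showing how a digital filter output is built from it.

```python
def mac(a, b, acc):
    """One multiply-accumulate step: a*b + acc."""
    return a * b + acc

def dot(xs, ys):
    """Dot product as a chain of MAC operations."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = mac(x, y, acc)
    return acc

def fir(samples, taps):
    """FIR filter: each output is the dot product of the taps with recent samples."""
    out = []
    for n in range(len(samples)):
        # Most recent sample first; samples before the start count as zero.
        window = [samples[n - k] if n - k >= 0 else 0.0 for k in range(len(taps))]
        out.append(dot(taps, window))
    return out

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(fir([1.0, 0.0, 0.0, 0.0], [0.5, 0.25]))
```

Feeding a unit impulse through `fir` returns the taps themselves, which is a quick way to check the indexing.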
50
Everyone Needs a MAD MAC
Enables Fast Division, Square Root
• Eliminates extra hardware to handle such computation
• Available in many new CPUs such as STI's Cell
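
Division can be built from multiply-adds by refining a reciprocal with Newton-Raphson. The Python sketch below shows the idea under stated assumptions: `fma` is a plain-float stand-in (Python rounds the product and the sum separately, unlike a fused unit), the linear first guess and the iteration count are common textbook choices rather than anything from the slides, and only positive divisors are handled.

```python
import math

def fma(a, b, c):
    """Stand-in for a hardware multiply-add; not truly fused in plain Python."""
    return a * b + c

def reciprocal(d, iterations=5):
    """Approximate 1/d for d > 0 using only multiply-add steps."""
    m, e = math.frexp(d)                  # d == m * 2**e with 0.5 <= m < 1
    r = fma(-32 / 17, m, 48 / 17)         # linear first guess for 1/m
    for _ in range(iterations):
        err = fma(-m, r, 1.0)             # err = 1 - m*r
        r = fma(r, err, r)                # r = r + r*err, error roughly squares
    return math.ldexp(r, -e)              # undo the scaling: 1/d = (1/m) * 2**-e

def divide(n, d):
    """n / d as a multiply by the refined reciprocal."""
    return n * reciprocal(d)

print(reciprocal(3.0))
print(divide(10.0, 4.0))
```

Square root follows the same pattern with a different update formula, which is why a single multiply-add unit can replace dedicated divide and square-root hardware.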
51
Future Enhancements
• 16 to 32 bits
• Newer process technology
• Possible modifications for low power apps
52
Everyone Wants A
MAD MAC 525
53