Transcript AMD Athlon

Advanced Micro Devices - Athlon



Buddy Guest
Mike Lewitt
Bill McCorkle
November 28, 2001
Where is the Competition?
What Have We Seen So Far?
IA-32
RISC
IA-64
Overview of Today’s Events


Company History
Differences in AMD Athlon Architecture






System Bus
Macro vs. Micro Operations
Floating Point Operations
Branch Prediction
Memory Management
Comparing Processor Performance
AMD

May 1, 1969 – founded
Semiconductor company
1975 8080A and AM2900
1976 Sign cross-licensing agreement
1987 AMD & Intel go to court
1991 AM386 (breaks Intel monopoly)
1992 Court awards full rights to AMD to produce AM386 processor
1993 AM486
1997 AMD-K6
1999 Athlon – 1st 7th-generation processor

Intel

July 18, 1968 – founded
Semiconductor memory
1971 4004 introduced
1972 8008 introduced
1976 Sign cross-licensing agreement
1978 16-bit 8086
1982 286 (on-board memory)
1985 32-bit 386
1989 486
1993 Pentium
1997 Pentium II
1998 Celeron
Architecture Summary

AMD Approach


Balanced approach: optimize work per clock (IPC) and
improve the operating frequency at the same time.
Intel Approach


Increased pipeline depth to handle more
instructions in flight, which reduced the work
done per clock (IPC).
Solution: compensated for the lower IPC with a much
higher frequency to stay competitive.
Architecture Summary

Overall Improvement to Performance

Frequency Improvements





Smaller Geometries
Faster Transistors (“process shrinks”)
Deeper Pipelines
Fewer Gates Per Clock Cycle
Work Per Clock Improvements




Superscalar Architectures
Dynamic Instruction Schedulers
Larger On-Chip Caches
Advanced Branch Prediction
Architecture Summary
Clock Speed / EV6 Bus




Designed with very high clock speeds in
mind
K7 has very deep buffers to enable those
high clock speeds, offering up to 72 x86
instructions in-flight.
Uses Rising Edge and Falling Edge
Detection For Bus



100 MHz clock → 200 MHz effective processor bus
133 MHz clock → 266 MHz effective processor bus
AMD vs. Intel comparing same clock
Architecture Summary

EV6 Bus on AMD Athlon



Scalable up to 200 MHz Yielding
Effective frequency 400 MHz
Multiprocessor support
Highest bus bandwidth (1.60 GB/s)

Intel using 133 MHz
(1.01 GB/s)
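
The bus figures above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a 64-bit data path on both buses (the slides do not state the width) and uses a hypothetical helper name. Under that assumption the double-pumped 100 MHz case reproduces the 1.60 GB/s figure, while the single-pumped 133 MHz case lands close to, but not exactly on, the 1.01 GB/s quoted for Intel, which may reflect a different unit convention.

```python
def bus_bandwidth_gb_s(clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits=64 is an assumption, not a figure from the slides.
    """
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# EV6: data on both the rising and falling clock edge (2 transfers per clock)
athlon_bus = bus_bandwidth_gb_s(clock_mhz=100, transfers_per_clock=2)

# Pentium III bus: one transfer per clock at 133 MHz
p3_bus = bus_bandwidth_gb_s(clock_mhz=133, transfers_per_clock=1)

print(f"Athlon EV6 bus:  {athlon_bus:.2f} GB/s")
print(f"Pentium III bus: {p3_bus:.2f} GB/s")
```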
AMD Athlon
PIII
Architecture Summary

Instruction Control Unit

Holds 72 MOps Before Assignment
(MOp = x86 instruction, therefore Athlon can
have 72 “in-flight” instructions)

P6 Only Holds 13 in-flight MOps
Architecture Summary

Execution Ports


AMD Has No Less Than 9
Intel Has 5

2 Dedicated to memory stores
Enhanced Parallelism Inside Athlon
Micro-OPs / Macro-OPs

Athlon has 3 parallel x86 instruction decoders that
translate instructions into Macro-Ops for the 72-entry ICU

Uses 2 decoding pipelines (Intel uses 1)

-Decoding common instructions (direct path)
-Decoding complex x86 instructions (vector path)
Integer scheduler holds a maximum of 15 Macro-Ops,
representing up to 30 Ops at a time
Feeds 3 parallel integer execution units
Micro-OPs / Macro-OPs

Athlon Decoders: 3-Way Instruction Decoding

Has 3 parallel decoding units

Each is a “fully capable” decoder, so any combination of
instructions can go to any of its decoders
Handles complex and simple instructions
Intel Decoders

Has 3 parallel decoding units



1 Complex
2 Simple
Handles Complex / Simple / Simple
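
To see why three fully capable decoders matter, here is a toy model of the two decoder layouts described above. It is a deliberate simplification, not a description of either chip: instructions are just "C" (complex) or "S" (simple), decoding is strictly in order, and the function names are made up for this sketch.

```python
import math


def cycles_fully_capable(stream, width=3):
    """Every decoder accepts any instruction, so up to `width` decode per cycle."""
    return math.ceil(len(stream) / width)


def cycles_complex_simple_simple(stream):
    """In-order decode where only the first slot accepts a complex instruction."""
    cycles = 0
    i = 0
    while i < len(stream):
        i += 1  # slot 0 takes the next instruction, complex or simple
        for _ in range(2):  # slots 1 and 2 accept simple instructions only
            if i < len(stream) and stream[i] == "S":
                i += 1
            else:
                break  # a complex instruction must wait for slot 0 next cycle
        cycles += 1
    return cycles


stream = list("CCCSSS")  # three complex instructions followed by three simple ones
print("fully capable x3:      ", cycles_fully_capable(stream), "cycles")
print("complex/simple/simple: ", cycles_complex_simple_simple(stream), "cycles")
```

On a stream of only simple instructions the two layouts tie in this model; the gap opens up only when complex instructions arrive back to back.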
3DNOW!
MMX Developed When FPUs Not As Important

                                    3DNOW! (Athlon)    SSE (Intel)
Pipelines (parallel)                2                  2
Instructions (how wide)             2                  4
Effective Instructions per Cycle    4*                 4
Registers Used                      3DNOW! / FPU       No FPU

Every 4-wide Intel SSE instruction is actually 2 Athlon micro-ops
*AMD takes advantage of rising edge as well as falling edge
**SSE Cannot be used with MMX Registers
3DNOW!
Each pipeline can do any instruction above.
The second pipeline can do any instruction in any
group except the group the first pipeline has chosen.
3DNOW!

Conclusion of 3DNOW! Vs SSE

Both have pairing restrictions

SSE Separate Unit
-implementation more difficult
-program with more freedom


MMX-add & prefetch-instructions slightly better
for SSE
Final Conclusion: DRAW
AMD Athlon
Full Architecture views
PIII
Looking at the ALUs
Floating Point Operations


Fully pipelined FPU
3 parallel Floating Point Execution Units,
each on its own port

Pentium III also has 3, but they sit behind only one
port
FPU can execute two 80-bit extended-precision
Ops

Intel can currently only execute one
Pipelining Differences

Determining the length


Execution rate of pipeline (ALU)
Degree of Parallelism
                                  AMD Athlon    Intel Pentium III
Integer Pipeline Length           10            12-17
Floating Point Pipeline Length    15            25
(AMD-Athlon)
Branch Prediction

Example:
if (x > 0){
a=0;
b=1;
c=2;
}
d=3;

[Figure: pipeline diagrams over cycles 1-7 with stages Fetch, Decode, Execute and Save, for three cases. When x>0: if (x>0), a=0, b=1, c=2 pass through every stage. When x<0: a=0 and b=1 are squashed after if (x>0), and d=3 follows. Predicting x<0: if (x>0) is followed directly by d=3.]
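
The cost of a wrong guess in the example above can be sketched with a very small model. The assumptions here are ours, not the slides': one instruction is fetched per cycle starting at cycle 1, the branch outcome is known once the branch reaches Execute, and anything fetched behind it on the wrong path is squashed. The function and variable names are hypothetical.

```python
STAGES = ["Fetch", "Decode", "Execute", "Save"]
# Assumption: the branch resolves in Execute, so this many wrong-path
# instructions are fetched behind it before the outcome is known.
RESOLVE_STAGE = STAGES.index("Execute")


def finish_cycle(correct_path, wrong_path=()):
    """Cycle in which the last correct-path instruction leaves the Save stage."""
    squashed = min(len(wrong_path), RESOLVE_STAGE)
    last_fetch = len(correct_path) + squashed  # fetch slots used, starting at cycle 1
    return last_fetch + len(STAGES) - 1


# x > 0, fall-through predicted correctly: everything retires
taken_body = finish_cycle(["if (x>0)", "a=0", "b=1", "c=2", "d=3"])

# x < 0, fall-through predicted wrongly: a=0 and b=1 are fetched, then squashed
mispredicted = finish_cycle(["if (x>0)", "d=3"], wrong_path=["a=0", "b=1", "c=2"])

# x < 0 predicted correctly: d=3 is fetched right behind the branch
predicted = finish_cycle(["if (x>0)", "d=3"])

print(taken_body, mispredicted, predicted)
```

In this model the misprediction costs exactly as many cycles as there are stages in front of the point where the branch resolves, which is why deeper pipelines lean harder on accurate prediction.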
Branch Prediction

AMD Athlon

Branch Target Buffer size of 2048 entries
Branch History Table can store 4096 entries

Intel Pentium III

Dynamic Branch Predictor can store 512 entries
Approximate Correct Branch Predictions


AMD Athlon: 95%
Intel Pentium III: 90-92%
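
A rough way to read these accuracy figures is as an average number of cycles lost per branch. The sketch below assumes, purely for illustration, that a misprediction costs about as many cycles as the integer pipeline is long (10 for Athlon, 12-17 for Pentium III, per the pipeline comparison); real penalties depend on where the branch resolves, so treat the outputs as ballpark values.

```python
def expected_penalty_cycles(accuracy, flush_cycles):
    """Average cycles lost per branch: misprediction rate times flush cost.

    flush_cycles is approximated by integer pipeline length (an assumption).
    """
    return (1.0 - accuracy) * flush_cycles


athlon = expected_penalty_cycles(accuracy=0.95, flush_cycles=10)
p3_best = expected_penalty_cycles(accuracy=0.92, flush_cycles=12)
p3_worst = expected_penalty_cycles(accuracy=0.90, flush_cycles=17)

print(f"Athlon:      {athlon:.2f} cycles per branch")
print(f"Pentium III: {p3_best:.2f} to {p3_worst:.2f} cycles per branch")
```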
Memory Management

Level 2 Cache



512kB to 8 MB
Rate of 1/3, 1/2, 2/3, 1/1 the clock frequency
External to the CPU (Weakness of Athlon)





Intel L2: 256kB ‘on-die’
Intel moving away from Slot1 and back to socket
AMD will need to move to ‘on-die’ and socket
connections to stay competitive
Main push towards 0.18 µm process
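
As a quick illustration of what the L2 rate options mean in practice, the sketch below converts each divider into an L2 clock. The 600 MHz core clock is a hypothetical value chosen only because it divides evenly; the helper name is also made up.

```python
from fractions import Fraction

# L2 rate options listed above, as fractions of the core clock
L2_RATIOS = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1, 1)]


def l2_clock_mhz(core_mhz, ratio):
    """L2 cache clock for a given core clock and divider."""
    return float(core_mhz * ratio)


core_mhz = 600  # hypothetical core clock, for illustration only
for ratio in L2_RATIOS:
    print(f"{ratio} of {core_mhz} MHz -> {l2_clock_mhz(core_mhz, ratio):.0f} MHz L2")
```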
Level 1 Cache


64kB data and instruction caches (4x Pentium III)
Scalability
Which One Is Better?

In the past (286, 386, 486)

Performance = Frequency

In Today’s World

Performance = IPC * Frequency
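
The second formula is easy to play with. The numbers below are hypothetical, chosen only to show how a lower-IPC design can match a higher-IPC one by running at a higher clock; they are not measurements of any Athlon or Pentium part.

```python
def performance(ipc, frequency_mhz):
    """Performance = IPC * Frequency, in millions of instructions per second."""
    return ipc * frequency_mhz


def break_even_mhz(ipc_a, frequency_a_mhz, ipc_b):
    """Clock that design B needs to match design A's performance."""
    return performance(ipc_a, frequency_a_mhz) / ipc_b


# Hypothetical designs: "balanced" favours IPC, "deep" favours clock speed
balanced = performance(ipc=1.2, frequency_mhz=1400)
deep = performance(ipc=0.9, frequency_mhz=1900)

print(f"balanced: {balanced:.0f}  deep: {deep:.0f}")
print(f"deep design breaks even at {break_even_mhz(1.2, 1400, 0.9):.0f} MHz")
```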
How else do we compare?

Benchmarking
Benchmarking


Software that performs different tasks
to obtain comparisons between
processors.
Problems:



Processor frequencies.
Other processes already running.
Types of programs

Some programs are written to take advantage
of a certain architecture.
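
A minimal sketch of the idea, and of two of the problems listed above: each workload is timed several times and the best run is kept, which reduces (but does not remove) interference from other running processes, and two different workloads are used because a single program type can favour one architecture. The workloads and function names are invented for this example and are far simpler than any real benchmark suite.

```python
import time


def best_of(workload, repeats=5):
    """Run `workload` several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return min(timings)


def integer_workload(n=200_000):
    """Integer-heavy loop (stand-in for integer-bound programs)."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def float_workload(n=200_000):
    """Floating-point-heavy loop (stand-in for FPU-bound programs)."""
    total = 0.0
    for i in range(1, n):
        total += 1.0 / i
    return total


print(f"integer: {best_of(integer_workload):.4f} s")
print(f"float:   {best_of(float_workload):.4f} s")
```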
Photo Editing Software
Animation Software
3D Graphics Editor
3D Gaming
Various Benchmarks
Summary




Over the past couple of years, AMD and Intel have
taken different approaches.
We have gone over the main
architectural differences.
We have shown how they compare.
It will be very interesting to see how
the market plays out.
Questions?
References








http://www.amd.com
http://www.amdzone.com
http://www.intel.com
Gardner, Ryan. AMD employee, CPU Specialist (email correspondence).
Hsieh, Paul. 7th Generation CPU Comparisons.
http://www.azillionmonkeys.com/qed/cpujihad.shtml . 11/30/00
Pabst, Thomas. The New Athlon Processor – AMD is Finally Overtaking Intel .
http://www6.tomshardware.com/cpu/99q3/990809/index.html. 8/9/99
Pabst, Thomas. AMD Processors vs. Intel Processors – Facts and Lies.
http://www6.tomshardware.com/cpu/00q4/001017/athlon-02.html. 10/12/00
Morgan, Rob. Power Mac G4 Dual 500 vs. Pentium 4 vs. Athlon.
http://www.barefeats.com/pentium.html . 1/08/01