New Opportunities for Computer Architecture Research

AMD Microprocessor Technologies
06/21/06
Ben Sander
AMD Principal Member of Technical Staff
2006
Motivation : PC Jargon Demystified
• “AMD Athlon™ 64 4200+* dual-core processor with
64-bit platform, Direct Connect Architecture and
HyperTransport™ Technology for increased
multitasking performance; improved security with
Enhanced Virus Protection**; Cool'n'Quiet™
Technology to minimize heat and noise”
Talk Outline
• Motivation
• Recent innovations
– Dual-core processors
– Direct Connect Architecture™ and HyperTransport™
– Power-efficient design (and Cool’n’Quiet™)
– AMD64 Architecture
• What’s next?
– Direct Connect Architecture™ enhancements
– HTX “Accelerators”
– Core enhancements
– Virtualization and AMD-V
• Summary and Conclusion
Dual-Core AMD Opteron™ Processor Design
• Two AMD Opteron™ processor cores on a single
die
– Each with 1MB L2 cache
• Shared Northbridge
– Three HyperTransport™ technology links
– Dual-channel (128 bit) DDR interface
• AMD Opteron processor designed as CMP from the start
– 2nd port on SRI, request management, 2 APICs, clocking microcode
• Two complete CPUs
– Symmetric multiprocessor programming (SMP) model
– Simpler, less restrictive programming model than ‘virtual CPU’ approach
[Figure: Existing AMD Opteron™ Processor Design. CPU0 and CPU1, each with a 1MB L2 cache, connect through the System Request Interface and Crossbar Switch to the Memory Controller and HyperTransport™ links 0, 1 and 2.]
MPF 2004 - AMD Dual-Core Processor Chip
Integration:
• Two 64-bit CPU cores
• 2MB L2 cache
• On-chip Northbridge & Memory Controller
Bandwidth:
• Dedicated 64-bit L2 busses for each core
• Dual channel DDR (128-bit) memory bus
• 3 HT links (16-bit each x 2 GT/sec x 2)
Usability and Scalability:
• Socket compatible: Platform and TDP!
• Glueless SMP up to 4 sockets
• Memory capacity & BW scale w/ CPUs
Power Efficiency:
• PowerNow! Optimized power management
• Leadership system level power attributes
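The link bandwidth bullet above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the trailing "x 2" in "16-bit each x 2 GT/sec x 2" means two directions per link, and the function name is hypothetical rather than anything from the talk.

```python
def ht_link_bandwidth_gb_per_s(width_bits: int, gigatransfers_per_s: float,
                               directions: int = 2) -> float:
    """Aggregate bandwidth of one link in GB/s.

    Assumption: each transfer moves `width_bits` bits, and the final
    "x 2" on the slide counts both directions of the link.
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * gigatransfers_per_s * directions


# Figures taken from the slide: 16-bit links at 2 GT/sec, three links per chip.
per_link = ht_link_bandwidth_gb_per_s(16, 2.0)
all_links = 3 * per_link
print(f"per link: {per_link} GB/s, three links: {all_links} GB/s")
```

Under that reading, one link works out to the same 8 GB/s figure that labels the HyperTransport connections in the Direct Connect diagrams later in the talk.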
AMD64 Dual-Core Physical Design
• 90nm
– Approximately same die size as
130nm single-core AMD Opteron™
processor
– ~205 million transistors
• 68/95 watt power envelope
– Fits into 90nm power infrastructure
• 939/940 Socket compatible
– Fits into existing sockets
Dual-Core : Customer Value
• What is it?
– Two processing cores on the same die
• AMD: Clean single-core to multi-core upgrade path
– Same pinout
– Same power envelope!
• Server customers
– Server apps scale extremely well with increasing processors
Transaction processing, web serving
– Doubles compute density
More compute power from the same motherboard
More compute power in a server rack
– More efficient software licensing
• Consumers
– Efficiently run multiple programs at the same time
Operating system + background application
Virus checker + photo-editing software
– Significantly improves performance of threaded applications
Video editing, MP3 encoding
Dual-Core AMD Opteron™ Processor Design
• AMD Direct Connect Architecture
– Everything connected directly to CPU
– Reduces system architecture bottlenecks
– Further reduces latency by directly connecting two
cores on same die
Direct Connect : Advantages of good plumbing
[Figure: block diagrams of the two system designs. Legacy system: processor chips (MCPs) sharing a bus to a Memory Controller Hub with XMB memory buffers, PCI-E bridges and an I/O Hub (USB, PCI). Direct Connect system: each processor has its own SRQ, Crossbar, memory controller and HyperTransport (HT) links at 8 GB/S, with PCIe™ bridges and an I/O Hub (USB, PCI).]
Legacy x86 Architecture
• 20-year old front-side bus (FSB) architecture
• CPUs, Memory, I/O all share a bus
• Major bottleneck to performance
• Faster CPUs or more cores ≠ performance
AMD64’s Direct Connect Architecture
• Industry-standard technology
• Direct Connect eliminates the FSB bottleneck
• HyperTransport™ interconnect offers scalable high bandwidth and low latency
AMD Direct Connect : Customer Value
• What is it?
– Direct connection of CPU to the DRAM/memory
– And CPU-to-CPU for multi-processor systems
• Increased performance
– Reduced memory latency
– Reduced chip communication latency
• Reduced power
– Reduced chip-count in system
– Reduced external pin switching
• Scalability
– Unlocks the potential of faster CPUs and additional cores
What’s Consuming all the Power?
• Server power consumption: 38% - 63%
• Computer Room Air Conditioner power consumption: 23% - 54%
• Battery Backup power consumption: 6% - 13%
• Lighting power consumption: 1% - 2%
Server Power Consumption Impacts Power throughout the Datacenter
System-level Power Consumption – Present Day
[Figure: the legacy and Direct Connect system diagrams annotated with power. Legacy system: 692 watts across the processor packages, 14 watts for the Memory Controller Hub and 8.5 watts for each of four XMBs. Dual-Core AMD Opteron™ system: 380 watts across the processors.]
Dual-Core Packages with legacy technology
• 692 watts for processors (173w each)
• 48 watts for external memory controller
• Total: 740 watts (95% More Power)
Dual-Core AMD Opteron™ processors
• 380 watts for processors (95w each)
• Integrated memory controllers
• Total: 380 watts
Source: Mixture of publicly available data sheets and AMD internal estimates.
Actual system power measurements may vary based on configuration and components used
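The totals on this slide follow from the per-component figures. The sketch below only reproduces that arithmetic; the socket count of four is inferred from 692 / 173 and 380 / 95 rather than stated on the slide, and the variable names are illustrative.

```python
SOCKETS = 4  # inferred: 692 W / 173 W each and 380 W / 95 W each

# Legacy dual-core packages plus external memory controller (slide figures).
legacy_cpu_watts = 173
legacy_memory_controller_watts = 48
legacy_total = SOCKETS * legacy_cpu_watts + legacy_memory_controller_watts

# Dual-Core AMD Opteron processors with integrated memory controllers.
opteron_cpu_watts = 95
opteron_total = SOCKETS * opteron_cpu_watts

extra_fraction = (legacy_total - opteron_total) / opteron_total
print(f"legacy: {legacy_total} W, Opteron: {opteron_total} W, "
      f"legacy uses {extra_fraction:.0%} more")
```

The 48-watt memory controller figure is also consistent with the diagram labels: 14 watts for the hub plus four buffers at 8.5 watts each.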
Reducing Power and Cooling Requirements
with Processor Performance States
P-State      Frequency   Voltage   Average CPU Core Power (measured at CPU)
P0 (HIGH)    2600MHz     1.40V     ~95watts
P1           2400MHz     1.35V     ~90watts
P2           2200MHz     1.30V     ~76watts
P3           2000MHz     1.25V     ~65watts
P4           1800MHz     1.20V     ~55watts
P5 (LOW)     1000MHz     1.10V     ~32watts
P-states run from P0 at HIGH processor utilization to P5 at LOW processor utilization.
[Figure: bar chart of Power (W), scale 0 to 25, comparing PowerNow! ENABLED with PowerNow! DISABLED at three load levels: 10500 Connections (~62% CPU Utilization), 5000 Connections (~40% CPU Utilization) and Idle (in OS). Annotated savings: -33%, -62%, -75%.]
Up to 75% power savings!
Power-efficient design : Customer Value
• What is it?
– PowerNow! Technology changes frequency in response to workload
At lower frequencies, voltage is reduced as well
– Power efficiency “designed-in”
Appropriate frequency targets
Integrate external chipset logic (aka Direct Connect)
“Fine gating” and other design-for-power techniques
• Customer value
– Server: Save $$$ on server power and air conditioning
– Desktop: Quieter operation via “Cool’n’Quiet™” technology
– Notebook: Longer battery life
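Why lowering voltage along with frequency matters can be sketched with the textbook CMOS dynamic-power relation, P ∝ V² · f. That relation is an assumption brought in here for illustration and is not stated in the talk; the operating points are the frequency, voltage and power figures listed on the P-state slide, and the function name is hypothetical.

```python
# (frequency in MHz, voltage in V, average core power in W) from the P-state slide.
OPERATING_POINTS = [
    (2600, 1.40, 95),
    (2400, 1.35, 90),
    (2200, 1.30, 76),
    (2000, 1.25, 65),
    (1800, 1.20, 55),
    (1000, 1.10, 32),
]


def scaled_dynamic_power(freq_mhz, volts, reference=OPERATING_POINTS[0]):
    """Scale the reference point's power by (V / V_ref)^2 * (f / f_ref).

    Assumption: all power is dynamic switching power, which is a
    simplification; real parts also have leakage and other fixed costs.
    """
    ref_freq, ref_volts, ref_watts = reference
    return ref_watts * (volts / ref_volts) ** 2 * (freq_mhz / ref_freq)


for freq, volts, slide_watts in OPERATING_POINTS:
    estimate = scaled_dynamic_power(freq, volts)
    print(f"{freq} MHz @ {volts:.2f} V: slide ~{slide_watts} W, "
          f"V^2*f estimate {estimate:.1f} W")
```

The estimates come out below the slide's figures at the lower operating points, which would be consistent with some power not scaling with frequency, although the slide does not break that down.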
AMD64 : Evolutionary 64-bit ISA
• What is it?
– Evolutionary extension to support “64-bits” on x86 processors
– Now an industry standard supported by other processor vendors
• Why 64 bits?
– Driven by apps needing large amounts of memory
CAD tools, large databases, simulations
– 64-bit integer arithmetic
Security and encryption applications
• Why extend x86 to 64 bits?
– x86 is the most widely installed instruction set in the world
– Delivers 64-bit advantages while providing full x86 compatibility
– Doesn’t require a completely new tool chain
• User benefits from 64 bits:
– Large-memory applications
Some applications see 10x speedup from additional memory.
64-bit flat programming model massively easier for software developers
– Some performance improvement from additional registers and wider data operations
– AMD64: Backwards compatibility allows migration on customer’s timeframe
Design Goals for AMD64 Technology
• Processor is fully compatible with existing x86 modes
• Straightforward extensions for 64 bits
– Minimize architectural divergences
Maintain consistency with existing architecture
– Minimize instruction set encoding changes
– Straightforward implementation & verification
• Double the number of Integer and SSE registers
• Architectural support for 64 bits of virtual address
space and 52 bits of physical address space
– Implementations may support less
• 64-bit integer operations
• Eliminate unused/underutilized arcane x86 features
within the context of 64-bit mode
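To put the address-width goals above in concrete terms, the following sketch converts a bit width into an addressable size. It is plain arithmetic, not anything from the AMD64 specification itself; the helper names are illustrative and sizes use binary units.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


def human_readable(size_bytes: int) -> str:
    """Render a power-of-two byte count using binary units."""
    unit = 0
    while size_bytes >= 1024 and unit < len(UNITS) - 1:
        size_bytes //= 1024
        unit += 1
    return f"{size_bytes} {UNITS[unit]}"


# 32-bit x86 for comparison, then the architectural limits named on the slide.
for label, bits in [("32-bit x86", 32), ("AMD64 physical", 52), ("AMD64 virtual", 64)]:
    print(f"{label}: {bits} bits -> {human_readable(address_space_bytes(bits))}")
```

Since implementations may support fewer bits than the architectural limit, the same helper can be applied to whatever width a given part provides.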
AMD64 Programmer’s Model
[Figure: register diagram of the AMD64 programmer’s model, including RAX]
REX prefix byte
Instruction format: Instruction Prefixes | Optional REX Prefix Byte | Opcode | MODRM | SIB | Displacement | Immediate
REX prefix byte:
Bit     7 6 5 4 3 2 1 0
Value   0 1 0 0 W R X B
• Additional registers encoded without altering existing
instruction format
• Optional REX prefix specifies 64-bit operation size override
– Plus 3 additional register encoding bits
• REX is actually a family of 16 prefixes (40-4F)
• Average instruction length in 64-bit mode increased by 0.4
bytes
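The REX layout described above (high nibble 0100, low nibble W R X B) can be decoded in a few lines. This is a minimal sketch for illustration: the `Rex` type and `decode_rex` function are hypothetical names, and the check applies to 64-bit mode only, since in legacy modes the byte values 40-4F are ordinary one-byte instructions.

```python
from typing import NamedTuple, Optional


class Rex(NamedTuple):
    w: bool  # 64-bit operand size override
    r: bool  # extra register encoding bit for the ModRM reg field
    x: bool  # extra register encoding bit for the SIB index field
    b: bool  # extra register encoding bit for ModRM r/m, SIB base, or opcode reg


def decode_rex(byte: int) -> Optional[Rex]:
    """Return the REX fields if `byte` is in 0x40-0x4F, else None."""
    if byte & 0xF0 != 0x40:
        return None
    return Rex(
        w=bool(byte & 0b1000),
        r=bool(byte & 0b0100),
        x=bool(byte & 0b0010),
        b=bool(byte & 0b0001),
    )


print(decode_rex(0x48))  # REX with only W set
print(decode_rex(0x90))  # not a REX prefix
```

Counting the byte values for which `decode_rex` returns a result gives the family of 16 prefixes mentioned on the slide.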
Talk Outline
• Motivation
• Recent innovations
– Dual-core processors
– Direct Connect Architecture™ and HyperTransport™
– Power-efficient design (and Cool’n’Quiet™)
– AMD64 Architecture
• What’s next?
– Direct Connect Architecture™ enhancements
– HTX “Accelerators”
– Core enhancements
– Virtualization and AMD-V
• Summary and Conclusion
Co-processors and Accelerators
• Excellent way to get power-efficient performance boosts
– Special-purpose, tuned solutions for common functions
– Drop to low-power states when not in use
– Enabled by modern APIs
• Aligns with modularity imperative
– Co-processor becomes another (optional) “IP block”
– Micro-architecture: Command delivery, Synchronization, Streaming
• Promising concept
– Many possible opportunities now, and/or in the future
– Media processing
– JVM/CLR runtime hosting
– NIC integration (TOE, XML, SSL, etc)
HyperTransport HTX™ Enables System-level
Coprocessing Today
AMD’s Next Generation Processor
Technology
• Scalable performance and balance
– Faster HyperTransport links (up to 5.2 GT/sec)
– Additional bandwidth enhancements
– On-chip shared L3 cache
• Maintain performance per watt leadership
– Independent NB and CPU power management
– Independent CPU P-state and C-state controls
• Performance on diverse workloads
– Enhanced IPC CPU core; >2X FPU performance
– 48-bit virtual and physical address space
– 1GB large page support
– Platform support for co-processors
• Compatibility
– DDR2 memory support with migration to DDR3
– FBDIMM Gen1 and Gen2 at the appropriate time
– HT-1 backwards compatibility
• Enhanced Virtualization
– I/O Virtualization
– Nested paging support
• Enhanced RAS
– Memory mirroring
– Data poisoning support
– HT retry protocol support
AMD’s Next Generation Processor
Technology
• Optimized for 65nm SOI and beyond
• Native quad core die
• Expandable shared L3 cache
• IPC enhanced CPU cores
– 32B instruction fetch
– Improved branch prediction
– Out-of-order load execution
– Up to 4 DP FLOPS/cycle
– Dual 128-bit SSE dataflow
– Dual 128-bit loads per cycle
– Improved core and Northbridge prefetchers
– Bit Manipulation extensions (LZCNT/POPCNT)
– SSE extensions (EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)
• Enhanced Direct Connect Architecture and Northbridge
– HT-3 links (5.2GT/sec)
– Enhanced crossbar
– DDR2 with migration path to DDR3
– FBDIMM when appropriate
– Enhanced power management
– Enhanced RAS
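For readers unfamiliar with the bit manipulation extensions named on this slide, the sketch below models what LZCNT and POPCNT compute: the number of leading zero bits and the number of set bits in a fixed-width operand. It is a software model for illustration only, with hypothetical function names, and says nothing about how the hardware implements them.

```python
def popcnt(value: int, width: int = 64) -> int:
    """Count the bits set to 1 in the low `width` bits of `value`."""
    mask = (1 << width) - 1
    return bin(value & mask).count("1")


def lzcnt(value: int, width: int = 64) -> int:
    """Count leading zero bits in a `width`-bit operand.

    A zero operand yields `width`, since every bit is a leading zero.
    """
    mask = (1 << width) - 1
    return width - (value & mask).bit_length()


print(popcnt(0xFF), lzcnt(1), lzcnt(0))
```

Without dedicated instructions, software typically falls back on loops or lookup tables for these operations, which is the kind of cost a single-instruction form avoids.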
Virtualization
Virtualization is the pooling and abstraction of resources in a way that masks the physical nature and boundaries of those resources from the resource users
Virtualization: Customer Value
• What is it?
– Allows a single computer to efficiently run multiple guest
Operating Systems and associated applications
– AMD-V provides hardware acceleration for virtualization
and simplifies the development process
• Benefits:
– Consolidation
More efficient use of compute resources
Eliminate “single-application” servers
Consolidate old unsupported servers onto newer hardware
– Migration/reliability
If a server fails, can easily move app to another server
– Allows developers to easily test multiple OS environments on
a single machine.
– Upgrades can be tested on hardware before deployment
Virtualization Methods
• Software-only virtualization
– Software acts as a translator between OS and hardware
– No need to modify the operating system
– Available today
– Can be slow
• OS-enabled virtualization
– Host OS and virtualization software tightly integrated
Offers improved performance
But requires changes to OS
• Processor-supported virtualization
– Processor protects memory locations so that only
virtualization software can access them
– Processor provides hooks on all system-level instructions
– Accelerated performance and better security
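The idea of processor hooks on system-level instructions can be pictured with a toy model: ordinary guest operations run directly, while system-level ones are handed to the virtualization software. Everything below is a hypothetical illustration; the operation names, the `ToyHypervisor` class and the `run_guest` function are invented for this sketch and do not correspond to real x86 instructions or to any AMD-V interface.

```python
# Hypothetical operation names standing in for system-level instructions.
SYSTEM_LEVEL_OPS = {"write_page_table_base", "disable_interrupts", "io_write"}


class ToyHypervisor:
    """Records each intercepted operation and pretends to emulate it."""

    def __init__(self):
        self.intercepts = []

    def on_intercept(self, guest, op, arg):
        self.intercepts.append((guest, op, arg))
        return f"emulated {op}"


def run_guest(guest, program, hypervisor):
    """Run a list of (op, arg) pairs, diverting system-level ops to the hypervisor."""
    results = []
    for op, arg in program:
        if op in SYSTEM_LEVEL_OPS:
            # The "hook": control passes to the virtualization software.
            results.append(hypervisor.on_intercept(guest, op, arg))
        else:
            results.append(f"ran {op} directly")
    return results


hv = ToyHypervisor()
trace = run_guest("guest0", [("add", 1), ("io_write", 0x3F8), ("add", 2)], hv)
print(trace)
```

In this picture the common case stays at native speed and only the intercepted operations pay for a trip into the virtualization software, which is one way to read the slide's claim of accelerated performance.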
AMD-V: Overview
• Virtualization is being used in several server scenarios
today
• AMD expects that virtualization will prove valuable for PC
clients too
• There are ways to modify the x86 architecture so that
virtualization is easier to accomplish, performs better, and
provides more security
• AMD’s AMD-V technology is being developed for future
AMD64 CPUs for servers and clients
• Key technologies include adding new instructions,
supporting different methods of handling page tables,
handling host and guest interrupts (including SMI/SMM), and
providing DMA protection
Summary and Conclusion
AMD is focused on customer-centric innovation and
value
– Dual-core processors
– Direct Connect Architecture and HyperTransport
– Power-efficient design
– AMD64 Architecture
– And more!
AMD is investing heavily in extending our leadership
– Next generation Direct Connect Architecture technology
– Next generation CPU technology
– AMD-V and hardware virtualization
– Developing a fundamental understanding of important emerging trends
Thank you !
www.amd.com/power
© 2006 Advanced Micro Devices, Inc. All rights reserved.
AMD, the AMD Arrow, AMD Athlon, AMD Opteron and combinations thereof, are trademarks of Advanced Micro Devices, Inc.
HyperTransport is a trademark of the HyperTransport Consortium. PCI-X, PCIe and PCI Express are trademarks of PCI-SIG.
Other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.
Backup
AMD Architectural Generations
Now
• AMD64 Architecture
• Dual Core Architecture
• Direct Connect Architecture
• Enhanced Virus Protection
• HyperTransport™ v1.0, v2.0
• DDR, DDR2
• AMD PowerNow!™ Technology
• High Reliability RAS
• System Performance
Coming Soon
• Extensions to AMD64
• Multi-core Architecture
• Scalable SMP Architecture
• AMD-V Virtualization
• HyperTransport v3.0
• DDR3, FBDIMM
• Partitioned PowerNow!
• Mainframe-class reliability
• System Perf. / Watt
Future
• FPU Extensions to AMD64
• Throughput Architecture
• On-chip Coprocessors
• Secure Execution
• HyperTransport v4.0
• DDR4, FBD2
• System Resource Mgmnt
• Best-in-class Reliability
• Throughput / Watt / $$