SLATE A System-Level Analysis Tool for Early Exploration


© 2006 IBM Corporation
IBM Research
Multi-Core Design Automation Challenges
John Darringer
IBM T. J. Watson Research Center
Yorktown Heights, NY, USA
DAC 2007
© 2007 IBM Corporation
System Performance Requires An Integrated Approach
• Scaling no longer provides traditional performance boost
• Power limits everything
• Advances will come from entire performance stack
[Figure: Device Performance vs. Production Date, 1998 to 2008, comparing the Recent and Historical trends]
[Figure: performance stack with layers Application, System Level, Chip Level, and Technology. Labels: Languages, Software Tuning, Efficient Programming, Middleware, Dynamic optimization, Assist Threads, FPG, Fast Computation, Power Optimization, Compiler Support, SMT, Accelerators, Power Management, Interconnect, Circuits, Multiple Cores, Packaging, Cooling, New Devices, Dense SRAM, eDRAM, Optics, Memory]
Innovation in System Design
[Figure: processor die photos with unit labels IFU, IDU, ISU, FXU, FPU, LSU, BXU, L2, and L3 Directory/Control]
• Power 4: Multi-Core, 2001
• Power 5: Multi-Thread, 2004
• CELL: Accelerators, 2006
• Power 6: 4.7 GHz, 2007
Trend to Modular, Application-Optimized Systems
[Figure: modular components (SMP, Core, Accelerator, Cache, Memory, ...) and Blades]
• Growing use of diverse modular components
• Chip integration may evolve to component assembly
• Challenge is in system-level design
– Optimizing architecture for specific applications
Multi-Core ASICs
• Multi-core ASIC SoCs are common today
– Address broad range of markets
– Enable high functional integration
– Provide rapid time to market
• One example from 2004
– Cisco Silicon Packet Processor
– 188 32-bit RISC processors
– 47 BIPS
Multi-Core Processors
• Power efficient, reusable cores
• Application matched accelerators
• Flexible scalable interconnect
• Optimized memory hierarchy
• High speed I/O
• Energy management
• Deliver system performance
• Rapid chip assembly to serve diverse markets
CHALLENGE
• System Design
– Continued performance growth
– Increasing power efficiency
– Optimizing for new applications
• Design Automation
– Custom design efficiency
– ASIC productivity
– Design and verification
• Enablers
– Physical Architecture
– Integrated Early Analysis
– Multi-Core Verification
Physical Architecture
• Complement logical architecture
• Streamline chip integration
• Plan for interconnect
• Provide predictable results
• Multiple strategies
– Fixed layout per block
– Parametric or generated
– Extended synthesis
[Figures: Example Logical Architecture; Example Physical Architecture]
Modular Components
• Components need self-contained vertical stack
– with clean interfaces to enable automated integration
• Current “Component”
– Mixed fabric and component function; custom interface
• Current Chips
– Custom crafting of clock, data, and power meshes
• Future Component
– Component function, interface, component fabric
• Future Chips
– Automated connection with parametric fabric
Custom Design
• Careful interconnect design
– Communication
– Clock distribution
– Power and ground
• Better power efficiency
– Clock gating, power gating
– Detailed transistor sizing
• High bandwidth memory and I/O
• Higher frequency operation
Challenges of Modular Design
• Custom Layout
– Flexible shape and orientation
– Optimum mesh for power and clock
– Distributed communication and test
– Manually optimized
• Modular Layout
– Constrained shape and orientation
– Separate power and clock per core
– Automatic connection to fabric
– Parametric interconnect fabric
[Figure: core arrangements for the custom and modular layouts]
Custom Clock Design
• Distribution network
– Latches and clocked gates
– Control skew and jitter
– Minimize power
– Survive variation and noise
• Interconnect models
– Inductance critical
– Transmission line
– Buffer placement
• Hand optimized
– Still an art
Phillip Restle
Custom Power Distribution
• Distribute to all devices
• Multiple voltage domains
• Simulate detailed power demand
• Model chip and package
• Consider ground coupling
• Balance mesh and trees
• Allocate decoupling capacitors
• Focus on resonant frequency
• Explore clock/power gating scenarios
Howard Chen
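As a rough illustration of why decoupling capacitance and resonant frequency are linked, the sketch below uses the textbook LC resonance formula f = 1 / (2π√(LC)). Treating the supply as a single lumped package inductance resonating with a single lumped on-chip capacitance is an assumption made here for illustration, and the component values are hypothetical; the slide does not give a model or numbers.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal lumped LC circuit: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def capacitance_for_resonance_f(inductance_h: float, target_hz: float) -> float:
    """Capacitance that places the LC resonance at target_hz for a given inductance."""
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * inductance_h)


# Hypothetical values: 10 pH effective package inductance, 100 nF on-chip decap.
package_l = 10e-12
decap_c = 100e-9
f_res = resonant_frequency_hz(package_l, decap_c)
print(f"resonance near {f_res / 1e6:.1f} MHz")
```

Adding decoupling capacitance lowers the resonant frequency in this simplified model, which is one reason allocation has to be considered together with the expected power-demand spectrum.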
Challenges of Modular Design
• Custom Wiring
– Optimized over chip
– Resources shared
– Variation minimized
– Complex analysis and integration
• Modular Wiring
– Optimized at block level
– Fixed resource allocation
– Some variation in results
– Requires automated integration
Spectrum of Strategies
[Figure: spectrum of strategies with labels Modular Reuse, Fixed Layout, Extended Synthesis, Parametric, Generated]
• Fixed physical architecture
– Careful block design
– Custom within block
– Automated block connect
– Predictable results
– Good for planned cases
– Stresses design
• Generated physical architecture
– More abstract layout
– Heavy physical synthesis
– Unique block configuration
– Results will vary
– Flexible restructuring
– Stresses tools
Systems Demand Early Analysis
• To explore many more options
– Cores, Accelerators, Interconnect, Memory Hierarchy, …
• To consider many design criteria simultaneously
– Power, Performance, Latency, Hotspots, Reliability, …
• To optimize system for specific market
• Environment exists for early functional modeling
• But today’s tools are not linked to physical design
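One common way to weigh several design criteria at once is to keep only the Pareto-optimal candidates, that is, those not beaten on every metric by some other candidate. The sketch below is a minimal illustration of that idea, not a method taken from the talk; the configuration names and the power and latency numbers are hypothetical.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    """One candidate configuration with estimated metrics (all values hypothetical)."""
    name: str
    power_w: float      # estimated power, lower is better
    latency_ns: float   # estimated latency, lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = a.power_w <= b.power_w and a.latency_ns <= b.latency_ns
    better = a.power_w < b.power_w or a.latency_ns < b.latency_ns
    return no_worse and better


def pareto_front(points: list[DesignPoint]) -> list[DesignPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [
    DesignPoint("2 cores, bus", 20.0, 90.0),
    DesignPoint("4 cores, bus", 35.0, 70.0),
    DesignPoint("4 cores, switch", 40.0, 50.0),
    DesignPoint("4 cores, switch, big cache", 45.0, 70.0),  # dominated by "4 cores, bus"
]
front = pareto_front(candidates)
print([p.name for p in front])
```

The pairwise comparison is quadratic in the number of candidates, which is acceptable for the small option sets typical of early exploration.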
Early System Analysis
[Figure: Design Team surrounded by Assumptions (Design, Floorplan, Package, Technology), Performance Models, Interconnect Analysis, Power Analysis, Thermal Analysis, and Implementation]
• Loosely coupled disciplines with multiple experts and distinct models
Performance Modeling Is Changing
• New parallel workloads emerging
– Execution vs. trace driven
• Shifting to multi-core designs
– Stresses balance of model performance and accuracy
• Complex interconnect fabric and memory hierarchy
– Bus, switch, network, asynchronous, …
• Increasing use of SystemC
– For early software development and component sharing
Early Physical Planning is Essential
• Interconnect requires full chip layout
– Estimate component area before implementation
– Need more accurate methods
– Have to plan for all facilities to predict chip size
• Placement coupled to many factors
– Interconnect performance
– Power
– Thermal and reliability concerns
– Yield
Modeling Interconnects in Multi-Core Designs
• Interconnect delays
– Affect performance
– Depend on placement
– Require accurate modeling
[Figure: cores and caches connected through an interconnect fabric to a memory controller, showing interconnect delays and an async/sync interface with parametric delay]
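To make the dependence of delay on placement concrete, the sketch below estimates the delay of an unbuffered wire between two placed blocks with the Elmore approximation for a distributed RC line, t ≈ 0.5·r·c·L². The block coordinates and the per-millimeter resistance and capacitance are hypothetical, and a real flow would model buffering, inductance, and the fabric itself; this is only a first-order illustration.

```python
def manhattan_mm(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Manhattan distance between two block centers, in millimeters."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def elmore_delay_ps(length_mm: float,
                    r_ohm_per_mm: float = 100.0,
                    c_pf_per_mm: float = 0.2) -> float:
    """Elmore delay of an unbuffered distributed RC wire.

    ohm * pF = ps, so the result is in picoseconds.
    The default r and c values are hypothetical placeholders.
    """
    return 0.5 * r_ohm_per_mm * c_pf_per_mm * length_mm ** 2


core = (1.0, 1.0)
cache_near = (3.0, 1.0)   # hypothetical placement A
cache_far = (4.0, 5.0)    # hypothetical placement B

for label, cache in (("near", cache_near), ("far", cache_far)):
    length = manhattan_mm(core, cache)
    print(f"{label}: {length:.1f} mm, {elmore_delay_ps(length):.0f} ps")
```

Because the unbuffered delay grows with the square of the length, moving the cache from 2 mm to 7 mm away raises the estimate by roughly a factor of twelve in this model.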
Power is a Key Criterion, but Hard to Predict
• Need estimate before implementation
– Voltage/frequency scaling, voltage islands, clock gating, leakage
• Not just core, but many diverse chip components
– Core, cache, interconnect, controllers, I/O, pervasive
• Model “interesting” states and transitions
• Scale known implementations
– Complex measurement process for calibration
– Requires data from chip layout
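A first-order way to scale a known implementation is to start from a measured reference power and apply the standard dynamic-power relation P ∝ C·V²·f. The sketch below does this under loudly simplified assumptions: clock gating is modeled as a fraction of dynamic power removed, and leakage is scaled linearly with voltage, which is a crude placeholder. The function name and all numbers are hypothetical and not taken from the talk.

```python
def scaled_power_w(ref_dynamic_w: float,
                   ref_leakage_w: float,
                   ref_v: float,
                   ref_f_ghz: float,
                   v: float,
                   f_ghz: float,
                   clock_gated_fraction: float = 0.0) -> float:
    """Scale a reference power measurement to a new voltage and frequency.

    Dynamic power follows V^2 * f. Gating removes a fixed fraction of dynamic
    power (assumption). Leakage scales linearly with V (crude assumption).
    """
    dynamic = (ref_dynamic_w
               * (v / ref_v) ** 2
               * (f_ghz / ref_f_ghz)
               * (1.0 - clock_gated_fraction))
    leakage = ref_leakage_w * (v / ref_v)
    return dynamic + leakage


# Hypothetical reference: 10 W dynamic and 2 W leakage at 1.0 V, 2.0 GHz.
baseline = scaled_power_w(10.0, 2.0, 1.0, 2.0, v=1.0, f_ghz=2.0)
scaled = scaled_power_w(10.0, 2.0, 1.0, 2.0, v=0.9, f_ghz=1.5)
print(f"baseline {baseline:.3f} W, scaled {scaled:.3f} W")
```

Calibrating the reference numbers is the hard part, which is why the slide stresses measurement and layout data.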
Integrated Early System Analysis
• Couple all forms of early analysis
• Share data in central repository
• Industry standard data model
– OpenAccess
• Hand-off to chip integration
– Assumptions, blocks, layout, …
• Graphic interface for editing
• Stage is set for optimization
[Figure: Design Team sharing Assumptions (Design, Floorplan, Package, Technology) and Results across Performance, Power, Interconnect, and Thermal analyses, followed by Optimize, Handoff, and Implementation]
Multi-Core Verification
• Verification has always been the greatest challenge
• Complexity grows with each generation
• Challenge is to exploit reuse with multi-core designs
– Requires clear interface definition
[Figure: Traditional Approach vs. Multi-Core Approach, showing Core Verification and System Verification]
Core Verification
• Complexity growing
– Clock/power gating, voltage and frequency scaling
• Formal methods are used
– Checking RTL = netlist
– Checking assertions
– Proving implementation equivalent to reference model
• Simulation still dominates
• Need higher level of specification
– Improve quality
– Stretch synthesis and verification tools
• Reuse verification environment
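The idea behind checking that an implementation is equivalent to a reference model can be shown on a toy scale. The sketch below compares a behavioral 1-bit full adder against a gate-level version by enumerating every input combination. Exhaustive enumeration is a stand-in chosen for clarity; production equivalence checkers rely on techniques such as BDDs or SAT solving, and the function names here are hypothetical.

```python
from itertools import product


def reference_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Behavioral reference: add three bits and return (sum, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1


def netlist_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Gate-level version built from XOR, AND, and OR."""
    x = a ^ b
    s = x ^ cin
    cout = (a & b) | (x & cin)
    return s, cout


def find_mismatch(ref, impl, n_inputs: int):
    """Return the first input vector where ref and impl differ, or None if equivalent."""
    for bits in product((0, 1), repeat=n_inputs):
        if ref(*bits) != impl(*bits):
            return bits
    return None


print(find_mismatch(reference_full_adder, netlist_full_adder, 3))
```

A counterexample vector, when one exists, is the useful output: it tells the designer exactly which input exposes the difference.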
System Verification
• More complex systems
– Many cores, accelerators, networks, asynchronous links
• Memory and network contention is critical area
• Formal methods have made impact
– Verifying abstract memory protocols
• Simulation is still the final check
• Need system-level test case generation
– Use system knowledge to expose resource contention issues
Summary
• Exciting and challenging times
– Designing application optimized multi-core systems
– Delivering custom efficiency with ASIC productivity
• Focus areas
– Physical Architecture to streamline chip integration
– Integrated Early Analysis to explore design space
– Multi-core verification that exploits reuse
• Long history of invention in today’s RTL flow
• Innovation is needed now at the system level
Acknowledgements
• Thanks to the following people
– Emrah Acar, Reinaldo Bergamaschi, Pradip Bose, Howard Chen, Nagu Dhanwada, Steven German, Steve Kosonocky, Indira Nair, Ruchir Puri, Phillip Restle, Albert Ruehli, Michael Vinov.