SLATE A System-Level Analysis Tool for Early Exploration
© 2006 IBM Corporation
IBM Research
Multi-Core Design Automation Challenges
John Darringer
IBM T. J. Watson Research Center
Yorktown Heights, NY, USA
DAC 2007
© 2007 IBM Corporation
System Performance Requires An Integrated Approach
Scaling no longer provides traditional performance boost
Power limits everything
Advances will come from entire performance stack
[Chart: Device Performance versus Production Date, 1998 to 2008, comparing the Recent and Historical trends]
Application
– Languages, Software Tuning
– Efficient Programming
– Middleware
System Level
– Dynamic optimization
– Assist Threads
– FPG
– Fast Computation
– Power Optimization
– Compiler Support
Chip Level
– SMT
– Accelerators
– Power Management
– Interconnect
– Circuits
– Compiler Support
– Multiple Cores
Technology
– Packaging, Cooling
– New Devices
– Dense SRAM, eDRAM
– Optics
– Memory
Innovation in System Design
– Power 4: Multi-Core, 2001
– Power 5: Multi-Thread, 2004
– CELL: Accelerators, 2006
– Power 6: 4.7 GHz, 2007
[Figure: processor die photo with units labeled ISU, IDU, IFU, FXU, FPU, LSU, BXU, L2, and L3 Directory/Control]
Trend to Modular Application Optimized Systems
SMP
Core
Accelerator
Cache
Memory
...
Blades
Growing use of diverse modular components
Chip integration may evolve to component assembly
Challenge is in system-level design
– Optimizing architecture for specific applications
Multi-Core ASICs
Multi-core ASIC SoCs are common today
– Address broad range of markets
– Enables high functional integration
– Provides rapid time to market
One example from 2004
– Cisco Silicon Packet Processor
– 188 32-bit RISC processors
– 47 BIPS
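As a rough sanity check on these figures, the average throughput per core follows from dividing the aggregate rate by the core count. The sketch below assumes that BIPS means billions of instructions per second and that 47 BIPS is the aggregate across all 188 cores; the function name is illustrative and does not come from the talk.

```python
def per_core_mips(total_bips: float, cores: int) -> float:
    """Average throughput per core in MIPS, assuming work is split evenly."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    # 1 BIPS = 1000 MIPS
    return total_bips * 1000.0 / cores


if __name__ == "__main__":
    # Figures quoted on the slide: 188 cores, 47 BIPS in aggregate.
    print(f"{per_core_mips(47, 188):.0f} MIPS per core")
```

Under those assumptions each core averages about 250 MIPS, which suggests the chip gets its throughput from many modest cores rather than a few fast ones.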
Multi-Core Processors
Power efficient, reusable cores
Application matched accelerators
Flexible, scalable interconnect
Optimized memory hierarchy
High speed I/O
Energy management
Deliver system performance
Rapid chip assembly to serve diverse markets
CHALLENGE
System Design
– Continued performance growth
– Increasing power efficiency
– Optimizing for new applications
Design Automation
– Custom design efficiency
– ASIC productivity
– Design and verification
Enablers
– Physical Architecture
– Integrated Early Analysis
– Multi-Core Verification
Physical Architecture
Complement logical architecture
Streamline chip integration
Plan for interconnect
Provide predictable results
Example Logical Architecture
Multiple strategies
– Fixed layout per block
– Parametric or generated
– Extended synthesis
Example Physical Architecture
Modular Components
Components need self-contained vertical stack
– with clean interfaces to enable automated integration
Current Chips
– Mixed fabric and component function; custom interface
– Custom crafting of clock, data, and power meshes
Future Chips
– Automated connection with parametric fabric
[Figure: current “component” compared with future component, with parts labeled Component Function, Interface, and Component Fabric]
Custom Design
Careful interconnect design
– Communication
– Clock distribution
– Power and ground
Better power efficiency
– Clock gating, Power gating
– Detailed transistor sizing
High bandwidth memory and I/O
Higher frequency operation
Challenges of Modular Design
Custom Layout
– Flexible shape and orientation
– Optimum mesh for power and clock
– Distributed communication and test
– Manually optimized
Modular Layout
– Constrained shape and orientation
– Separate power and clock per core
– Automatic connection to fabric
– Parametric interconnect fabric
[Figure: core arrangements for the custom and modular layouts]
Custom Clock Design
Distribution network
– Latches and clocked gates
– Control skew and jitter
– Minimize power
– Survive variation and noise
Interconnect models
– Inductance critical
– Transmission line
– Buffer placement
Hand optimized
– Still an art
Phillip Restle
Custom Power Distribution
Distribute to all devices
Multiple voltage domains
Simulate detailed power demand
Model chip and package
Consider ground coupling
Balance mesh and trees
Allocate decoupling capacitors
Focus on resonant frequency
Explore clock/power gating scenarios
Howard Chen
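The focus on resonant frequency can be illustrated with the standard lumped LC relation f = 1 / (2π√(LC)). This is a minimal sketch, assuming the package and mesh inductance and the on-chip decoupling capacitance can each be lumped into a single value; the component values are hypothetical and are not taken from the talk.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of a lumped LC model: f = 1 / (2*pi*sqrt(L*C))."""
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


if __name__ == "__main__":
    l_pkg = 10e-12     # hypothetical effective inductance: 10 pH
    c_decap = 100e-9   # hypothetical decoupling capacitance: 100 nF
    base = resonant_frequency_hz(l_pkg, c_decap)
    more_decap = resonant_frequency_hz(l_pkg, 4 * c_decap)
    print(f"baseline resonance: {base / 1e6:.1f} MHz")
    print(f"with 4x decap:      {more_decap / 1e6:.1f} MHz")
```

In this simplified model, quadrupling the decoupling capacitance halves the resonant frequency, which is one way allocation choices move the resonance relative to the frequencies at which power demand fluctuates.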
Challenges of Modular Design
Custom Wiring
– Optimized over chip
– Resources shared
– Variation minimized
– Complex analysis and integration
Modular Wiring
– Optimized at block level
– Fixed resource allocation
– Some variation in results
– Requires automated integration
Spectrum of Strategies
Spectrum: Fixed Layout … Parametric … Generated
Modular Reuse
– Fixed physical architecture
– Careful block design
– Custom within block
– Automated block connect
– Predictable results
– Good for planned cases
– Stresses design
Extended Synthesis
– Generated physical architecture
– More abstract layout
– Heavy physical synthesis
– Unique block configuration
– Results will vary
– Flexible restructuring
– Stresses tools
Systems Demand Early Analysis
To explore many more options
– Cores, Accelerators, Interconnect, Memory Hierarchy, …
To consider many design criteria simultaneously
– Power, Performance, Latency, Hotspots, Reliability, …
To optimize system for specific market
Environment exists for early functional modeling
But today’s tools are not linked to physical design
Early System Analysis
[Figure: diagram with blocks labeled Design Team, Assumptions, Design, Floorplan, Performance Models, Interconnect Analysis, Power Analysis, Thermal Analysis, Package, Technology, and Implementation]
Loosely coupled disciplines with multiple experts and distinct models
Performance Modeling Is Changing
New parallel workloads emerging
– Execution vs. trace driven
Shifting to multi-core designs
– Stresses balance of model performance and accuracy
Complex interconnect fabric and memory hierarchy
– Bus, switch, network, asynchronous,…
Increasing use of SystemC
– For early software development and component sharing
Early Physical Planning is Essential
Interconnect requires full chip layout
– Estimate component area before implementation
– Need more accurate methods
– Have to plan for all facilities to predict chip size
Placement coupled to many factors
– Interconnect performance
– Power
– Thermal and reliability concerns
– Yield
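One simple way to predict chip size before implementation is to sum estimated component areas and divide by an assumed utilization that leaves room for wiring, power, clock, and other facilities. The sketch below is illustrative only: the block names, areas, and the 80% utilization figure are hypothetical, and a real flow would use calibrated estimates.

```python
import math


def estimate_chip_area_mm2(block_areas_mm2: dict, utilization: float) -> float:
    """Chip area needed when blocks may occupy only `utilization` of the die."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return sum(block_areas_mm2.values()) / utilization


if __name__ == "__main__":
    # Hypothetical early estimates, in mm^2.
    blocks = {
        "cores (4 x 20)": 80.0,
        "caches (4 x 10)": 40.0,
        "interconnect fabric": 15.0,
        "memory controller and I/O": 25.0,
    }
    area = estimate_chip_area_mm2(blocks, utilization=0.8)
    print(f"estimated die area: {area:.0f} mm^2")
    print(f"side of a square die: {math.sqrt(area):.1f} mm")
```

Even an estimate this coarse shows how sensitive the result is to the utilization assumption, which is why the remaining facilities have to be planned rather than ignored.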
Modeling Interconnects in Multi-Core Designs
Interconnect delays
– Affect performance
– Depend on placement
– Require accurate modeling
[Figure: cores and caches connected through an Interconnect Fabric to a Memory Controller, annotated with Interconnect Delays and an Async/Sync Interface with Parametric delay]
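The dependence of interconnect delay on placement can be sketched with a first-order model: take the Manhattan distance between two block centers on the floorplan, multiply by a delay per millimeter, and round up to whole clock cycles. This is an assumption-laden illustration, not the method used in the talk: it assumes a buffered wire whose delay grows linearly with length, and the 100 ps/mm figure, clock rate, and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    x_mm: float  # center x coordinate on the floorplan
    y_mm: float  # center y coordinate on the floorplan


def manhattan_mm(a: Block, b: Block) -> float:
    """Routing distance estimate between two block centers."""
    return abs(a.x_mm - b.x_mm) + abs(a.y_mm - b.y_mm)


def wire_delay_cycles(a: Block, b: Block, ps_per_mm: float, clock_ghz: float) -> int:
    """Wire delay in whole clock cycles, assuming delay is linear in length."""
    delay_ps = manhattan_mm(a, b) * ps_per_mm
    cycle_ps = 1000.0 / clock_ghz
    return math.ceil(delay_ps / cycle_ps)


if __name__ == "__main__":
    core = Block("core0", 1.0, 1.0)
    far_cache = Block("cache0", 9.0, 4.0)
    near_cache = Block("cache0", 3.0, 1.0)
    for cache in (far_cache, near_cache):
        cycles = wire_delay_cycles(core, cache, ps_per_mm=100.0, clock_ghz=4.0)
        print(f"{manhattan_mm(core, cache):.0f} mm apart: {cycles} cycle(s)")
```

A performance model that takes its latencies from a calculation like this, rather than from fixed constants, will change its answer when the floorplan changes.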
Power is a Key Criterion, but Hard to Predict
Need estimate before implementation
– Voltage/frequency scaling, voltage islands, clock gating, leakage
Not just core, but many diverse chip components
– Core, cache, interconnect, controllers, I/O, pervasive
Model “interesting” states and transitions
Scale known implementations
– Complex measurement process for calibration
– Requires data from chip layout
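The effect of voltage and frequency scaling on an early estimate can be shown with the textbook first-order model: dynamic power is activity × switched capacitance × V² × f, and leakage power is V × leakage current. This is only a sketch with hypothetical component values; it treats leakage current as constant, whereas in practice it also varies with voltage and temperature, and clock gating is represented simply as a lower activity factor.

```python
def dynamic_power_w(activity: float, cap_farads: float,
                    vdd_volts: float, freq_hz: float) -> float:
    """First-order switching power: activity * C * V^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


def leakage_power_w(vdd_volts: float, leak_amps: float) -> float:
    """Static power, with leakage current treated as a constant (assumption)."""
    return vdd_volts * leak_amps


def total_power_w(activity: float, cap_farads: float, vdd_volts: float,
                  freq_hz: float, leak_amps: float) -> float:
    return (dynamic_power_w(activity, cap_farads, vdd_volts, freq_hz)
            + leakage_power_w(vdd_volts, leak_amps))


if __name__ == "__main__":
    # Hypothetical component: 1 nF switched capacitance, 20% activity, 0.1 A leakage.
    nominal = total_power_w(0.2, 1e-9, 1.0, 4.0e9, 0.1)
    scaled = total_power_w(0.2, 1e-9, 0.8, 3.2e9, 0.1)   # V and f both scaled by 0.8
    gated = total_power_w(0.05, 1e-9, 1.0, 4.0e9, 0.1)   # clock gating lowers activity
    print(f"nominal: {nominal:.3f} W, scaled: {scaled:.3f} W, gated: {gated:.3f} W")
```

Scaling both voltage and frequency by 0.8 cuts the dynamic term to about half (0.8³ ≈ 0.51) in this model, while the leakage term shrinks only linearly, which is part of why each operating state needs its own estimate.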
Integrated Early System Analysis
Couple all forms of early analysis
Share data in central repository
Industry standard data model
– Open Access
Hand-off to chip integration
– Assumptions, blocks, layout, …
Graphic interface for editing
Stage is set for optimization
[Figure: diagram with blocks labeled Design Team, Performance, Design, Floorplan, Package, Technology, Assumptions, Results, Power, Interconnect, Thermal, Optimize, Handoff, and Implementation]
Multi-Core Verification
Verification has always been the greatest challenge
Complexity grows with each generation
Challenge is to exploit reuse with multi-core designs
– Requires clear interface definition
[Figure: Traditional Approach compared with Multi-Core Approach, showing Core Verification and System Verification applied to groups of cores]
Core Verification
Complexity growing
– Clock/Power gating, Voltage and frequency scaling
Formal methods are used
– Checking RTL = netlist
– Checking assertions
– Proving implementation equivalent to reference model
Simulation still dominates
Need higher level of specification
– Improve quality
– Stretch synthesis and verification tools
Reuse verification environment
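The idea behind checking that an implementation matches a reference model can be shown on a toy scale by comparing two descriptions of a full adder over every input combination. This brute-force sketch is only an illustration: production equivalence checkers rely on techniques such as BDDs or SAT solving rather than enumeration, and all function names here are hypothetical.

```python
from itertools import product


def reference_full_adder(a: int, b: int, cin: int) -> tuple:
    """Behavioral reference: add three bits, return (sum, carry)."""
    total = a + b + cin
    return total & 1, total >> 1


def gate_level_full_adder(a: int, b: int, cin: int) -> tuple:
    """Gate-level version built from XOR, AND, and OR."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)


def first_mismatch(ref, impl, num_inputs: int):
    """Return the first input vector where ref and impl differ, else None."""
    for bits in product((0, 1), repeat=num_inputs):
        if ref(*bits) != impl(*bits):
            return bits
    return None


if __name__ == "__main__":
    print(first_mismatch(reference_full_adder, gate_level_full_adder, 3))

    def buggy(a, b, cin):
        # Carry logic drops the propagate term.
        return a ^ b ^ cin, a & b

    print(first_mismatch(reference_full_adder, buggy, 3))
```

Enumeration grows exponentially with the number of inputs, so it only works for tiny blocks; that scaling problem is what formal tools are built to avoid.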
System Verification
More complex systems
– Many cores, accelerators, networks, asynchronous links
Memory and network contention is critical area
Formal methods have made impact
– Verifying abstract memory protocols
Simulation is still the final check
Need system-level test case generation
– Use system knowledge to expose resource contention issues
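One way to use system knowledge in test generation is to bias every core's memory traffic toward a small set of shared cache lines, so that accesses collide far more often than with uniformly random addresses. The generator below is a minimal sketch under that assumption; the 128-byte line size, the operation names, and the function itself are hypothetical and not taken from the talk.

```python
import random


def contention_test(num_cores: int, ops_per_core: int, hot_lines: int,
                    seed: int = 0, line_bytes: int = 128) -> list:
    """Per-core lists of (op, address) that all target a few shared lines."""
    if num_cores <= 0 or ops_per_core < 0 or hot_lines <= 0:
        raise ValueError("invalid test parameters")
    rng = random.Random(seed)  # seeded so a failing test can be replayed
    hot_addrs = [i * line_bytes for i in range(hot_lines)]
    return [
        [(rng.choice(("load", "store")), rng.choice(hot_addrs))
         for _ in range(ops_per_core)]
        for _ in range(num_cores)
    ]


if __name__ == "__main__":
    for core_id, ops in enumerate(contention_test(4, 4, hot_lines=2)):
        print(core_id, ops)
```

Keeping the generator seeded makes each stimulus reproducible, which matters when a contention bug shows up only under one particular interleaving.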
Summary
Exciting and challenging times
– Designing application optimized multi-core systems
– Delivering custom efficiency with ASIC productivity
Focus areas
– Physical Architecture to streamline chip integration
– Integrated Early Analysis to explore design space
– Multi-core verification that exploits reuse
Long history of invention in today’s RTL flow
Innovation is needed now at the system level
Acknowledgements
Thanks to the following people
– Emrah Acar, Reinaldo Bergamaschi, Pradip Bose, Howard Chen, Nagu Dhanwada, Steven German, Steve Kosonocky, Indira Nair, Ruchir Puri, Phillip Restle, Albert Ruehli, Michael Vinov.