CPU Performance


SoC Architecture Course
Oct 2008 – Jan 2009, KTH
Zhonghai Lu / Axel Jantsch
[email protected]
Course Information





Course responsible: Dr. Zhonghai Lu
Course examiner: Prof. Axel Jantsch
12 Lectures, 4 Tutorials, 3 Labs
Home page: www.ict.kth.se/courses/IL2207
Course Material




Dally, Towles: Principles and Practices of Interconnection Networks
Distributed Material
Slides
Advanced-level course, more demanding
July 20, 2015
SoC Architecture
2
Lecture Overview












L1: Introduction
L2: Buses and Arbitration (Dally: 22, 18)
L3: Shared Memory Multiprocessors
L4: Cache Coherency Protocols
L5: Memory Consistency
L6: Introduction to Network-on-Chip, Topologies (Dally: 1, 2, 3, 4, 5)
L7: Routing Algorithms and Mechanics (Dally: 8, 9, 10, 11)
L8: Flow Control (Dally: 12, 13)
L9: Deadlock and Livelock (Dally: 12, 13, 14)
L10: Router Architecture and Network Interface (Dally: 16, 17, 20)
L11: Quality of Service and Performance Analysis (Dally: 15)
L12: Course Summary and Trends
Tutorial Overview

T1: Bus, arbitration and cache coherency
  After Lecture 5, on Nov. 5
  By Prof. Jantsch
T2: Memory consistency and network topology
  After Lecture 7, on Nov. 12
  By Dr. Lu
T3: Interconnection networks (routing, flow control, deadlock etc.)
  After Lecture 8, on Nov. 19
  By Dr. Lu
T4: Router architecture, QoS and performance analysis
  After Lecture 12, on Dec. 3
  By Prof. Jantsch
Lab Overview





Laboratory 1: Uniprocessor SoC Design with Altera
Laboratory 2: Multiprocessor SoC Design with Altera
Laboratory 3: Wormhole Networks
Students work in groups of max. 2
Good preparation is required.
Course Requirements
To pass the course the student has to fulfill the following
requirements:

Pass the final exam. The grade for the exam will
be the grade of the course.




Final exam: Dec. 16, 2008, 9:00-13:00,
Rooms 432, 438, 439, 530
** Register for the exam in Daisy 2 weeks before the
exam date in order to guarantee a seat! **
Attend tutorials
Complete all labs
Observations in
System Design
Advances in Integration
Intel 4004 (1971): 108 kHz, 2,300 transistors
Intel Pentium 4 (2000): 1.5 GHz, 42 million transistors
If automobile speed had increased similarly over the same
period, we could now drive from Stockholm to Shanghai in
about 23 seconds.
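The size of this jump can be checked with a few lines of arithmetic. The sketch below uses only the clock rates and transistor counts quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide
clock_4004_hz = 108e3          # Intel 4004 (1971): 108 kHz
clock_p4_hz = 1.5e9            # Intel Pentium 4 (2000): 1.5 GHz
transistors_4004 = 2_300
transistors_p4 = 42_000_000

# Growth factors over the period 1971 to 2000
clock_ratio = clock_p4_hz / clock_4004_hz
transistor_ratio = transistors_p4 / transistors_4004

print(f"Clock rate grew by a factor of about {clock_ratio:,.0f}")
print(f"Transistor count grew by a factor of about {transistor_ratio:,.0f}")
```

Both factors are on the order of 10^4, which is the kind of increase the automobile comparison is meant to convey.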
Advances in Integration - 2007

Intel Teraflop Chip 2007
http://techresearch.intel.com/articles/TeraScale/1449.htm
Growing Design-Productivity Gap
[Figure: Potential Design Complexity and Designer Productivity. Two log-scale plots: "Moore’s Law: Standard cell density and speed", showing density (Kgates/mm2) and ASIC clock (MHz), and "Design Productivity Crisis", showing logic transistors per chip (M) against productivity (K transistors per staff-month). Source: SRC 1997]
Designs do not only get more complex, but also much more expensive!
The Role of the Market!
Source: Smith 1997
Moore’s Law drives the development
of System-in-Chip Architectures
[Figure: Yesterday’s SOC: a Processor, Memory, I/O, and a few RTL function blocks (RTL function 1, 2, 3)]
The growing number of transistors on an SOC drives the trend towards more RTL blocks on the chip
[Figure: Today’s SOC: Ctl, Proc, DSP, Mem, and I/O blocks among many RTL blocks]
Source: Leibson (DAC2004)
Verification Costs

Verification accounts for a continuously increasing
share of the total design costs
(at present 50-70% for large designs)
Platforms reduce Costs
SOC Flexibility = Per-Unit Cost Reduction
Source: Leibson 2004
(Model: 100K and 1M system volumes)
[Figure: total per-unit cost (0 to 120) versus system designs per chip design (1 to 7, from "One Chip" to "Many System Designs"), with one curve each for system volumes of 100 000 and 1 000 000; labelled products: low-end still camera, high-end still camera, video camcorder]
Model assumptions: $10M design cost, $15 manufacturing cost, 5% premium for programmability
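One plausible reading of the model behind this chart can be written down directly. The formula below is an assumption made for illustration, not Leibson's published model: the design cost is amortized over all units built from one chip design, and the programmability premium is applied to the manufacturing cost. The function name and parameters are hypothetical.

```python
def per_unit_cost(designs_per_chip, volume_per_system,
                  design_cost=10_000_000.0,
                  manufacturing_cost=15.0,
                  programmability_premium=0.05):
    """Per-unit cost under the assumed model: amortized design cost
    plus manufacturing cost including the programmability premium."""
    if designs_per_chip < 1 or volume_per_system < 1:
        raise ValueError("need at least one design and one unit")
    total_units = designs_per_chip * volume_per_system
    amortized_design = design_cost / total_units
    unit_manufacturing = manufacturing_cost * (1.0 + programmability_premium)
    return amortized_design + unit_manufacturing


# Sweep the x-axis of the chart (1 to 7 system designs per chip design)
for volume in (100_000, 1_000_000):
    costs = [round(per_unit_cost(n, volume), 2) for n in range(1, 8)]
    print(volume, costs)
```

Under these assumptions the per-unit cost falls as more system designs share one chip design, and the effect is strongest at the lower volume, where the design cost dominates.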
Platform Example: Nexperia
Nexperia Instance: Viper
ARM-based MPSoC Platform
Texas Instruments
OMAP
A SOC Platform
based on
Peter Cumming: ”The TI OMAP
Platform Approach to SOC”
The OMAP platform



OMAP products are combinations of hardware and
software that allow multimedia capabilities to be included
in 2.5G and 3G wireless handsets and PDAs
Critical design parameters are: Performance, Power,
Cost and Time-to-Market
First Approach: ”Opportunistic Reuse”


No planned reuse, but try to reuse whenever possible
Second Approach: ”Structured Approach”

Systematic Reuse, SoC Platform
What is a platform?

OMAP defines a platform as


”a packaged capability used in subsequent stages of the
development to reduce development costs”
Platforms have the following characteristics:



Between silicon and systems many platforms may be
developed and used in subsequent stages of the
development
Platforms are valuable due to the notion of reuse (good for
economy)
They include hardware, software, assemblies and tools!
Examples of platforms



Transistor and ASIC libraries are the lowest
hardware platforms
Instruction Set Architecture and associated
Assembly Language Tools are the lowest
levels in Software
These well-understood levels are used by
other OMAP platforms
OMAP: Hierarchy of Platforms
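[Figure: hierarchy of platforms, listed top to bottom as on the slide: Application-Specific Reference Design; Application Platform; SoC Platform; OMAP Products; OMAP Infrastructure; ASIC Library & Tools; Silicon Technology. The hierarchy is annotated with "Reuse".]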
OMAP uses platforms on different levels
This is a precondition for reuse
SoC Platform

The SoC platform consists of
  A library of hardware components
  An architecture for their interconnection
The Application Platform (the OMAP product)
  Processor and Peripherals
  Low-Level Software (Drivers)
  Development Environment
The System Platform
  The platform includes the code that controls all aspects of the
  system from device driver to system interface
  TI has a reference design group in order to understand the new
  demands for OMAP
OMAP Products

The OMAP product range consists of several
families of devices for different markets, e.g.


Application processors for 3G: OMAP 1510 and
1610
Application processors for 2.5G: OMAP 710 and
730
OMAP 1510

OMAP 1510 is based on



Enhanced ARM 925 core (RISC processor)
TI C55x core
DMA, SRAM, Buses, Peripherals
Current OMAP platform for
Wireless Handset & PDA

OMAP™ 3 architecture combines mobile
entertainment with high performance productivity
applications (Source: Texas Instruments)
Strength of the OMAP concept


The main strength of the OMAP concept is that several actors can
make extensive Reuse of development efforts at several levels of
the design process
Actors:
  Mobile Device Manufacturers
  Software Developers
  TI’s internal Development Teams
Levels:
  Common Hardware and Software Interfaces
  Common Development Environment
  Single Low-Level Software Framework (Code can be used for several
  products)
  Single SoC Platform
OMAPI is an interface standard for OMAP founded by TI and ST
OMAP Architecture

The OMAP architecture, consisting of a general-purpose
processor and a DSP, was chosen because of the
application area




Need for Performance
Energy and Area Constraints
Two Main Tasks: User Interface and Signal
Processing
Flexibility and Reuse
Requirements on Software
Platform

Hardware architecture requires a matching
software approach



Well-defined Set of Application Programming
Interfaces in the high-level OS running on the
general purpose processor
System Software that links General Purpose
Applications to DSP components
Well-defined Standard for DSP Components
(TMS320 Algorithm Standard or eXpressDSP)
Summary

The OMAP platform




Covers a wide range of products and allows
reuse of Hardware and Software
Hardware Architecture adapted to the Application
Area
Software Architecture uses the features of the Hardware
Architecture
Efficient SOC Platform with Definitions for
Hardware and Software Reuse
Emerging
Architectures
System-on-Chip Architectures

A system-on-chip architecture integrates several
heterogeneous components on a single chip
[Figure: a Microcontroller, DSP, Memory, FPGA, Custom Hardware, and Analog-Digital and Digital-Analog blocks, all attached to a Communication Structure]
A key challenge is to design the communication between the
different entities of a SoC in order to minimize the
communication overhead
System-on-Chip Architecture:
A bus-based SoC
[Figure: System on a chip in which Microprocessor, Memory, Custom Logic, DSP, and I/O are attached to a bus]
System-on-Chip Architecture:
Network-on-Chip
[Figure: network-on-chip in which the resources PE1, PE2, PE3, and MEM each connect through a Network Interface (NI) to Switches linked by Channels]


The resources are connected to the network via
network interfaces
The topology of the network and the capability of the
switches and communication channels determine
the capacity of the network
ASIC Technologies
What is an ASIC?



ASIC = Application Specific Integrated Circuit
An ASIC is an integrated circuit for a specific
application and (generally) produced in
relatively small volumes.
An ASIC technology helps to shorten the
design time by providing a semi-fabricated
integrated circuit
ASIC families
The term ASIC is often reserved for circuits that are fabricated
in a silicon foundry, while circuits that can be programmed at
the customer’s site are called Programmable Logic.
Programmable Logic
• Programmable Logic Device (PLD)
• Field Programmable Gate Array
ASIC
• Standard Cell
• Gate Array
The term full custom is reserved for circuits where all silicon
layers can be optimized. This implies a long design process
and thus full custom is mainly used for high-volume high-end
circuits.
Standard Cell



Standard cells are often referred to
as Cell-Based Integrated Circuits
(CBIC)
All mask layers are customized
The standard cell library defines
logic elements of varying
complexity: SSI, MSI logic, data
path blocks, memories and
system-level blocks.
Standard Cells


Cells are configured in rows and have constant height and variable
width
Each cell is optimized for an efficient implementation
Gate Array


A gate array chip contains prefabricated
adjacent rows of PMOS and NMOS
transistors
The gate array is configured by the
interconnect structure
Channeled Gate Array


Only the interconnect is
customized
The interconnect uses
spaces between rows
of base cells
Channelless Gate Array
(Sea of Gates)


Only the interconnect is
customized
Cells are connected via
unused transistors
Field Programmable Gate
Arrays



None of the layers are
customized
Basic logic cells and
interconnect can be
programmed
Basic cells can be SRAM
based, Flash Memory
based or fuse-based
(one time programmable)
Programmable Logic Device
• No customized mask
layers or logic cells
• A single large block of
interconnects
• Macrocells consist of
programmable array
logic followed by a flip-flop or latch
Comparison
FPGA, Gate Array, Standard Cell
                   FPGA     Gate Array   Standard Cell
Initial Cost       Low                   High
Cost per part      High                  Low
Performance        Low                   High
Fabrication Time   Short                 Long
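The trade-off between initial cost and cost per part implies a break-even production volume. The sketch below assumes a simple linear cost model (total cost = initial cost + cost per part × volume); the model, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
def total_cost(initial_cost, cost_per_part, volume):
    """Total cost of producing `volume` parts under a linear cost model."""
    return initial_cost + cost_per_part * volume


def break_even_volume(low_initial, high_initial):
    """Volume at which two technologies cost the same.

    Each argument is an (initial_cost, cost_per_part) pair. The first
    has the lower initial cost and the higher cost per part.
    """
    (init_a, part_a), (init_b, part_b) = low_initial, high_initial
    if part_a <= part_b:
        raise ValueError("first technology must have the higher cost per part")
    return (init_b - init_a) / (part_a - part_b)


# Hypothetical numbers, not taken from the slide
fpga = (10_000.0, 50.0)            # low initial cost, high cost per part
standard_cell = (1_000_000.0, 5.0)  # high initial cost, low cost per part

volume = break_even_volume(fpga, standard_cell)
print(f"Break-even at about {volume:,.0f} parts")
```

Below the break-even volume the technology with the low initial cost is cheaper; above it the technology with the low cost per part wins.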
Design Trade-Offs
[Figure: Design Time versus Performance for, in the order listed on the slide, Full Custom, Standard Cell, Gate Array, Programmable Logic, and Microprocessor]
Challenges for
System Design
How to design a system-on-chip?

Specification
• Design productivity increases
with the level of abstraction
• The task of functional verification
is very difficult at low abstraction
levels
Implementation
• Efficient implementations require
exploiting the low-level features of
the target architecture
[Figure: Design spans the Abstraction Gap between the abstract Idea (Specification) and the detailed Product (Implementation)]
Challenge for
System Design!
SoC Design


The continuous progress in silicon process technology allows
more and more functionality to be integrated on a single chip =>
Systems on a chip become reality
Market-driven forces:
• Shorter product design schedules and life spans
• Products have to conform to standards
• The design has to be right from the start. An
implementation error means heavy loss of money or
product death
• Large designs are integrated into a single chip
The SoC design process must address these driving forces
The Design Process
[Figure: a sequence of Design Steps leads through Intermediate Models across the Abstraction Gap, from the Design Specification to the Implementation; axes: Abstraction Level and Design Space]
Requirements on Design Flow

Design Entry



Well-defined abstract specification model
Efficient verification methodology
Design Refinement



Well-defined models at all abstraction levels
Well-defined refinement steps
Verification at all levels
Requirements on Design Flow

Implementation Mapping



Efficient platform architecture with well-defined
API
Mapping detailed implementation model to API
services
Tool Support




Verification
Design Refinement
Implementation Mapping
Estimation of Properties
Design Process

A design specification has to be mapped on an
architecture
[Figure: the Design Specification and the Architecture Specification are the inputs to the Design Process, which produces the Design Implementation]
Design Process
(Uniprocessor)

A program is compiled to assembler code for a chosen
uniprocessor and operating system
[Figure: Program (Parallel Tasks) and Uniprocessor + Operating System are the inputs to Compilation, which produces Executable Code]
Design Process

The design process for SoC applications is a very
complex task
  Many components work in parallel and communicate with
  each other
  A task can be mapped onto different components
  The overhead for communication depends on where tasks
  are located
  The designer has to choose an appropriate SoC
  architecture, since different architectures have different
  strengths and weaknesses
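The dependence of communication overhead on task placement can be made concrete with a toy cost function. The sketch below assumes a 2x2 mesh, Manhattan hop counts as the distance measure, and an invented traffic table; none of these come from the slides, and all names are hypothetical.

```python
def hops(a, b):
    """Manhattan distance between two (row, column) positions on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_cost(traffic, placement):
    """Sum of traffic volume times hop count over all communicating task pairs."""
    return sum(volume * hops(placement[src], placement[dst])
               for (src, dst), volume in traffic.items())


# Hypothetical traffic between four tasks (arbitrary units)
traffic = {("A", "B"): 100, ("B", "C"): 10, ("C", "D"): 100, ("A", "D"): 10}

# Two placements of the same tasks onto the same 2x2 mesh
neighbours = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
scattered = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}

print(communication_cost(traffic, neighbours))
print(communication_cost(traffic, scattered))
```

The same tasks on the same architecture give a noticeably different cost, because the second placement puts the heavily communicating pairs two hops apart.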
Design Process
(System-On-Chip)

A specification shall be mapped onto a SoC architecture with several heterogeneous components
[Figure: a Specification (Parallel Tasks) and a SoC architecture with several components are the inputs to Partitioning, Mapping, and Compilation, which produce hardware descriptions for components A and B and code for processors X and Y]
Platform-Based Design
The idea of a platform is to simplify the
design process
[Figure: layer stack, top to bottom: Programmers Model; Hardware Abstraction; Hardware Platform, which contains a Microcontroller, FPGA, DSP, Memory, Custom Hardware, and Analog-Digital and Digital-Analog blocks around a Communication Structure]
System-on-Chip Platform

The layered concept makes it possible to
• Change the physical
architecture of the SoC
without affecting the
application
• Add new services on top
of the existing architecture
Changes in one layer
affect only the layer
itself and its interfaces
[Figure: layer stack, top to bottom: API (Services with Guarantees); Transaction (Messages, Load/Store); Transport (Packets); Physical (Wires, Clocks)]
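The layering idea can be sketched in code: the top layer deals only in messages, the transport layer splits them into packets, and the physical layer moves packets. The function names, the packet size, and the trivial "wire" are all assumptions made for illustration; a real NoC stack would be far more involved.

```python
PACKET_PAYLOAD = 4  # bytes per packet; arbitrary value for this sketch


def physical_transfer(packet):
    """Physical layer stand-in: hands the packet over unchanged."""
    return packet


def transport_send(message: bytes):
    """Transport layer: split a message into numbered packets and
    move each one over the physical layer."""
    offsets = range(0, len(message), PACKET_PAYLOAD)
    packets = [(seq, message[i:i + PACKET_PAYLOAD])
               for seq, i in enumerate(offsets)]
    return [physical_transfer(p) for p in packets]


def transport_receive(packets):
    """Transport layer: reassemble the payloads in sequence order."""
    return b"".join(payload for _, payload in sorted(packets))


def transaction_send(message: bytes) -> bytes:
    """Transaction layer: message in, message out. Packets stay hidden."""
    return transport_receive(transport_send(message))


print(transaction_send(b"hello, NoC"))
```

Replacing `physical_transfer` with a different mechanism would not change `transaction_send` or its callers, which is the point of the layered concept.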
Concurrency
Embedded Systems have to
cope with Parallelism
[Figure: an Embedded System exchanges signals A, B, C, and D with a Reactive Environment that acts as Source and Sink]



Provides an alternative to faster clock for performance
Applies at all levels of system design
Is essential within embedded system design, where the
system has to react to several inputs from the
environment
System-on-Chip:
A Parallel Architecture

A parallel computer is a collection of processing
elements that cooperate to solve large problems fast

Resources
  Processing capacity of the components
  Distributed and/or global memory
Data access, Communication and Synchronization
  Communication protocol
  Communication capacity
  Communication abstraction and primitives
Objectives
  Performance and Scalability
Components in a Parallel SoC




Microprocessor cores or DSPs are cheap and
optimized for their application area
Customizable hardware can be used to guarantee
high performance for a special task
Often each parallel task does not need
tremendous processing power
It is important how the parallel tasks are
mapped onto the SoC, so that the parallel nature of
the system can be fully exploited
Communication Primitives
System on Chip

There are two main paradigms


Shared Memory
Message Passing
Communication Primitives
System on Chip

Shared memory is typical for bus systems, since a
memory that all processing entities can access is
naturally connected to the bus
[Figure: System on a chip in which Memory, Microprocessor, Custom Logic (ASIC), DSP, and I/O share a bus]
Communication Primitives
Network on Chip
[Figure: network-on-chip in which PE1, PE2, PE3, and MEM each connect through a Network Interface (NI) to Switches linked by Channels]
Message passing looks very natural for networks-on-chip, since a shared memory is usually not available
However, locality is important, since otherwise huge
amounts of data have to be sent over the network
Message Passing
[Figure: process P1 sends a Message to process P2]
Processes communicate by sending messages to each other
A message has a sender and one or more receivers
Primitives are Send and Receive
The programming model does not include a shared memory
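The Send and Receive primitives can be illustrated with two threads and a queue. This is a minimal sketch, not the course's reference implementation: Python's `queue.Queue` stands in for the channel, the two threads stand in for processes P1 and P2, and the `None` end marker is a convention of this example.

```python
import queue
import threading

channel = queue.Queue()   # carries messages from P1 to P2
received = []             # written only by P2, read after both threads finish


def p1():
    for value in (1, 2, 3):
        channel.put(("P1", value))   # Send
    channel.put(None)                # end marker (convention of this sketch)


def p2():
    while True:
        message = channel.get()      # Receive: blocks until a message arrives
        if message is None:
            break
        received.append(message)


threads = [threading.Thread(target=p1), threading.Thread(target=p2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)
```

P1 and P2 interact only through `channel`; neither reads or writes the other's variables, which mirrors a programming model without shared memory.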
Programming Model for
Message Passing
[Figure: processes A, B, C, and D; each Process waits in Receive Message (Wait for message) and communicates with the others via Send Message]


Natural Model for NoCs: Communicating Finite State
Machines
Communication is done by message passing
(languages like SDL are suitable)
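A pair of communicating finite state machines can be sketched in a few lines. The `Machine` class, the state names, and the message names below are hypothetical; the example only shows the pattern of waiting for a message, changing state, and optionally sending a reply.

```python
from collections import deque


class Machine:
    """A finite state machine that reacts to messages in its inbox."""

    def __init__(self, name, transitions, state):
        self.name = name
        # (state, message) -> (next_state, reply or None)
        self.transitions = transitions
        self.state = state
        self.inbox = deque()

    def step(self, peer):
        """Consume one message, change state, and optionally send a reply."""
        if not self.inbox:
            return False
        message = self.inbox.popleft()
        self.state, reply = self.transitions[(self.state, message)]
        if reply is not None:
            peer.inbox.append(reply)
        return True


a = Machine("A", {("waiting", "start"): ("sent", "request"),
                  ("sent", "ack"): ("done", None)}, "waiting")
b = Machine("B", {("idle", "request"): ("served", "ack")}, "idle")

a.inbox.append("start")          # stimulus from the environment
while a.step(b) or b.step(a):    # run until no machine has a pending message
    pass

print(a.state, b.state)
```

Each machine sees only its own state and inbox, so all interaction happens through messages, as in the model described above.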
Implementation of a Message
Passing Programming Model
• A programming model based on message
passing can still be implemented on a
shared memory architecture
• Each layer has to use the primitives that are
provided by its lower-layer neighbour
[Figure: layer stack, top to bottom: Source Code (uses High-Level Comm. Primitives); Compiled Program (uses Low-Level Comm. Primitives); Operating System (uses Hardware Drivers); Hardware (P1, P2, and Mem; here Shared Memory Comm., can also be NoC)]
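How message passing can sit on top of shared memory can be sketched with a small mailbox. The `Mailbox` class is hypothetical: its buffer plays the role of the shared memory, and `send` and `receive` are the high-level primitives built from the lower-level lock and condition variable.

```python
import threading
from collections import deque


class Mailbox:
    """Send/Receive built on a buffer in shared memory."""

    def __init__(self):
        self._buffer = deque()                 # the shared memory region
        self._cond = threading.Condition()     # low-level synchronization

    def send(self, message):
        with self._cond:
            self._buffer.append(message)
            self._cond.notify()                # wake one waiting receiver

    def receive(self):
        with self._cond:
            while not self._buffer:            # wait until a message exists
                self._cond.wait()
            return self._buffer.popleft()


mailbox = Mailbox()
results = []


def consumer():
    for _ in range(3):
        results.append(mailbox.receive())


worker = threading.Thread(target=consumer)
worker.start()
for i in range(3):
    mailbox.send(i * i)
worker.join()

print(results)
```

Code that calls `send` and `receive` does not need to know that a shared buffer lies underneath, so the same application code could in principle run over a NoC-based implementation of the two primitives.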
Summary



Systems-on-chip are heterogeneous and
parallel
Good communication is the key to an
efficient parallel architecture
In the course we will mainly focus on
communication architectures
  Buses
  Network-on-chip