CPU Performance


IL2207
SoC Architecture Course
Jan – March 2011, KTH
Dr. Zhonghai Lu
[email protected]
(鲁中海)
Course Information

Course staff
  Responsible: Dr. Zhonghai Lu, [email protected]
  Examiner: Prof. Axel Jantsch, [email protected]
  Assistants: Huimin She, [email protected]; Abbas Eslami Kiasari, [email protected]
12 Lectures, 4 Tutorials, 3 Labs
Home page: www.ict.kth.se/courses/IL2207/1101
Course Material
  Dally, Towles: Principles and Practices of Interconnection Networks
  Distributed materials and slides
Advanced-level course, 7.5 credits, 40x5=200 hours
July 20, 2015
SoC Architecture
2
Lecture Overview












L1: Introduction
L2: Buses and Arbitration (Dally: 22, 18)
L3: Shared Memory Multiprocessors
L4: Cache Coherency Protocols
L5: Memory Consistency
L6: Introduction to Network-on-Chip, Topologies (Dally: 1, 2, 3, 4, 5)
L7: Routing Algorithms and Mechanics (Dally: 8, 9, 10, 11)
L8: Flow Control (Dally: 12, 13)
L9: Deadlock and Livelock (Dally: 12, 13, 14)
L10: Router Architecture and Network Interface (Dally: 16, 17, 20)
L11: Network Performance Analysis and Quality of Service (Dally: 23, 15)
L12: Course Summary
Tutorial Overview

T1: Bus, arbitration and cache coherency
T2: Memory consistency and network topology
T3: Interconnection networks (routing, flow control, deadlock etc.)
T4: Router architecture, QoS and performance analysis

Tutorials will be given by Abbas.

For each tutorial, answers to 2 questions should be handed in to Abbas
before the tutorial session. The hand-ins count for 10% of the final
grade.



Lab Overview

Laboratory 1: Uniprocessor SoC Design on FPGA
  Assistant: Huimin
Laboratory 2: Multiprocessor SoC Design with Altera FPGA
  Assistant: Huimin
Laboratory 3: Wormhole Networks
  Assistant: Abbas
Each lab has 2 sessions: a, b.
Students work in groups of max. 2 students
Good preparation is required.
Take good care of the FPGA boards.
Course Requirements
To pass the course, the student has to fulfill the following
requirements:

Pass the final exam. The exam grade counts for 90% of the course
grade (scale: A, B, C, D, E, Fx, F).
  Final exam: March 16, 2011, 9:00-13:00,
  Register for the exam in Daisy 2 weeks before the exam
  date in order to guarantee a seat!
Complete all labs: Pass | Fail
Attend lectures, tutorials and labs
Labs

3 labs in total
  Lab 1 and 2: FPGA board. (Assistant: Huimin She [email protected])
  Lab 3: Network simulator. (Assistant: Abbas Eslami Kiasari [email protected])
Only 2 (NOT 3) lab sessions for each lab
  Possible cancelled sessions: 13:00-16:00 (Jan. 24, Feb. 7, Feb. 21). Please note the final changes in TimeEdit.
  Evenly distributed to avoid long waiting time (approx. 20 persons in each session)
Lab partners
  Two persons in a group
  If you also take IL2212 Embedded Software, please choose the same partner as you have for IL2212
  The FPGA boards must be returned after lab 1
Note schedule changes
  We are resolving schedule conflicts with the IL2201 course: Digital Integrated Circuit Design – VLSI
  For each lab, we have 3 sessions booked; two will remain and one will be cancelled.

Tomorrow's IL2207 lecture will be 10:00 to 12:00 in Ka-C21 (Electrum)
Observations in System Design
Observations

Good news
  Chip capacity increases following Moore’s law
  Functionality increases accordingly to exploit these transistors
Bad news
  Difficult to design, productivity decreases
  Cost increases
Platform-based design can reduce cost

Architecture is key!
Advances in Integration
Intel 4004 (1971): 108 kHz, 2,300 transistors
Intel Pentium 4 (2000): 1.5 GHz, 42 million transistors
If automobile speed had increased similarly over the same
period, we could now drive from Stockholm to Shanghai in
about 23 seconds.
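
The analogy can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the clock rates come from the slide, while the trip distance (about 7,800 km) and the 1971 driving speed (90 km/h) are assumptions chosen for illustration.

```python
# Clock rates taken from the slide.
F_4004_HZ = 108e3        # Intel 4004 (1971)
F_PENTIUM4_HZ = 1.5e9    # Intel Pentium 4 (2000)

# Assumptions for illustration only (not from the slide).
DISTANCE_KM = 7_800.0         # rough Stockholm-Shanghai distance
CAR_SPEED_1971_KMH = 90.0     # typical driving speed in 1971


def clock_speedup(f_old_hz, f_new_hz):
    """Ratio between the new and the old clock frequency."""
    return f_new_hz / f_old_hz


def scaled_trip_seconds(distance_km, base_speed_kmh, speedup):
    """Travel time in seconds if the car were `speedup` times faster."""
    hours = distance_km / (base_speed_kmh * speedup)
    return hours * 3600.0


speedup = clock_speedup(F_4004_HZ, F_PENTIUM4_HZ)
trip_s = scaled_trip_seconds(DISTANCE_KM, CAR_SPEED_1971_KMH, speedup)
print(f"clock speedup: {speedup:,.0f}x")
print(f"scaled trip time: {trip_s:.1f} s")
```

Under these assumptions the clock speedup is close to 14,000x and the trip takes a little over 20 seconds, in line with the figure quoted on the slide.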
Intel chips with Moore’s law
October, 2008
Seminar at National Institute of Informatics, Tokyo
Scaling ARM9
ARM 9 die area by process node:
180 nm: 11.8 mm²
130 nm: 5.2 mm²
90 nm: 2.6 mm²
65 nm: 1.4 mm²
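
The ARM9 figures can be compared with ideal scaling. The sketch below assumes (this is not stated on the slide) that die area shrinks with the square of the feature size, and uses the 180 nm value as the baseline.

```python
# Die areas from the slide, keyed by process node in nm.
ARM9_AREA_MM2 = {180: 11.8, 130: 5.2, 90: 2.6, 65: 1.4}


def ideal_area(base_area_mm2, base_node_nm, node_nm):
    """Area predicted when both dimensions shrink linearly with the node."""
    return base_area_mm2 * (node_nm / base_node_nm) ** 2


for node_nm, actual in ARM9_AREA_MM2.items():
    predicted = ideal_area(ARM9_AREA_MM2[180], 180, node_nm)
    print(f"{node_nm:>4} nm: slide {actual:5.1f} mm^2, ideal {predicted:5.2f} mm^2")
```

The areas on the slide come out somewhat below this simple quadratic estimate at every node, so the estimate is only a first-order guide.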
Growing Design-Productivity Gap
Design Productivity Crisis
[Figure: Potential Design Complexity and Designer Productivity. Axes: Logic Transistors per Chip (M) and Productivity (K) Trans./Staff-Mo., both on logarithmic scales. Annotation: Equivalent Added Complexity.]
Designs do not only get more complex, but also much more expensive!
The Role of the Market!
Source: Smith 1997
Time-to-Market pressure!
Verification Costs

Verification accounts for a continuously increasing share of
total design costs
(at present 50-70% for large designs)
Moore’s Law drives the development
of System-in-Chip Architectures
[Figure: Yesterday’s SOC, with a processor, memory, I/O and three RTL function blocks, next to today’s SOC, where control, processor, DSP, memory and I/O blocks are surrounded by many RTL blocks.]
The growing number of transistors on an SOC drives the trend towards more RTL blocks on the chip
Source: Leibson (DAC2004)
From ASIC to SoC, MPSoC

We get more and more cores on a single chip
SoC: both hardware and software
(processor plus memory)
ASIP: Application Specific Instruction Set Processors
Platforms reduce Costs
SOC Flexibility = Per-Unit Cost Reduction
[Figure: total per-unit cost (0 to 120) versus system designs per chip design (1 = one chip, up to 7 = many system designs), with curves for system volumes of 100 000 and 1 000 000. Annotated products: low-end still camera, high-end still camera, video camcorder.]
Model: 100K and 1M system volumes; $10M design cost, $15 manf. cost, 5% premium for programmability
Source: Leibson 2004
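
The trend in the chart can be reproduced with a simple amortization model. The sketch below is one plausible reading of the stated parameters, not necessarily the exact model behind the figure: the design cost is spread over all units of all systems that share the chip, and the programmability premium is applied to the manufacturing cost. The function name is hypothetical.

```python
# Parameters quoted on the slide.
DESIGN_COST_USD = 10_000_000
MANUFACTURING_COST_USD = 15.0
PROGRAMMABILITY_PREMIUM = 0.05


def per_unit_cost(volume_per_system, systems_per_chip):
    """Per-unit cost when one chip design is reused in several systems.

    Assumption: every system ships `volume_per_system` units and the
    design cost is amortized over all of them.
    """
    total_units = volume_per_system * systems_per_chip
    amortized_design = DESIGN_COST_USD / total_units
    manufacturing = MANUFACTURING_COST_USD * (1 + PROGRAMMABILITY_PREMIUM)
    return amortized_design + manufacturing


for volume in (100_000, 1_000_000):
    costs = [per_unit_cost(volume, n) for n in range(1, 8)]
    print(volume, [round(c, 2) for c in costs])
```

With these assumptions, reuse matters most at low volume: at 100K units the design cost dominates a single-system chip, and sharing the chip across several systems removes most of it.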
Platform Example: Nexperia
Nexperia Instance: Viper
Arm based MPSoC Platform
OMAP from Texas Instruments


TI’s OMAP (Open Multimedia Application Platform) is a family of
proprietary systems-on-chip for portable and mobile multimedia
applications.
A number of mobile phones use OMAP SoCs.
OMAP: Hierarchy of Platforms
[Figure: stack of platform levels, labelled Application Specific Ref Design, Appl. Platform, SoC Platform, OMAP Products, OMAP Infrastructure, ASIC Library & Tools, Silicon Technology. Annotation: Reuse.]
OMAP uses platforms on different levels
This is a precondition for reuse
SoC Platform

The SoC platform consists of
  A library of hardware components
  An architecture for their interconnection
The Application Platform
  Processor and Peripherals
  Low-Level Software (Drivers)
  Development Environment
The System Platform
  OS and Middleware
  Includes the code that controls all aspects of the system from device driver to system interface
  Compilers and tools
OMAP 1510

OMAP 1510 is based on
  Enhanced ARM 925 core (RISC processor)
  TI C55x core
  DMA, SRAM, Buses, Peripherals
Current OMAP platform for
Wireless Handset & PDA

OMAP™ 3 architecture combines mobile
entertainment with high performance productivity
applications (Source: Texas Instruments)
Evolving SoC
Architectures
System-on-Chip Architectures

A system-on-chip architecture integrates several heterogeneous components on a single chip
[Figure: a communication structure connecting a microcontroller, DSP, memory, FPGA, custom hardware, and analog-to-digital and digital-to-analog converters.]
A key challenge is to design the communication between the different entities of a SoC in order to minimize the communication overhead
Questions on Interconnects
1. To interconnect 2 IP hardware blocks, how would you let them communicate with each other?
2. What if 5 to 10 IP modules?
3. What if 20 IP blocks?
4. What if 200 IP blocks?
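
One way to approach these questions is to count what a naive solution would cost. The sketch below assumes, purely for illustration, that every pair of blocks gets its own dedicated bidirectional link; the function name is hypothetical.

```python
def point_to_point_links(n_blocks):
    """Dedicated links needed to connect every pair of blocks directly."""
    return n_blocks * (n_blocks - 1) // 2


for n in (2, 10, 20, 200):
    print(f"{n:>3} blocks -> {point_to_point_links(n):>5} dedicated links")
```

The link count grows quadratically, from 1 link for 2 blocks to 19,900 links for 200 blocks, which is why shared or switched interconnects become necessary as the number of blocks grows.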
System-on-Chip Architecture:
A bus-based SoC
[Figure: a system on a chip in which a microprocessor, memory, DSP, custom logic and I/O are connected by a bus.]
Technology Impact on
Communication

Chip
  Computation, storage by transistors
  Communication by wires
How does technology scaling affect communication delay?
Scaling and Delays


Transistors are “free”; wires are “expensive”, slowing
down performance.
Long wires should be avoided, and the whole chip
cannot be treated as a monolithic piece and is preferably
segmented into communicating regions.
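
Why long wires hurt can be illustrated with the common first-order estimate that the delay of a distributed RC wire is about 0.38 times its total resistance times its total capacitance, so it grows with the square of the length. The sketch below uses that estimate; the per-millimetre resistance and capacitance and the repeater delay are assumed values for illustration, not technology data from the course.

```python
# Assumed wire parameters (illustrative only).
R_PER_MM_OHM = 200.0     # wire resistance per millimetre
C_PER_MM_F = 0.2e-12     # wire capacitance per millimetre


def wire_delay_s(length_mm):
    """Approximate delay of an unbuffered distributed RC wire."""
    r_total = R_PER_MM_OHM * length_mm
    c_total = C_PER_MM_F * length_mm
    return 0.38 * r_total * c_total


def segmented_delay_s(length_mm, segments, repeater_delay_s=20e-12):
    """Delay when the wire is cut into equal segments joined by repeaters."""
    per_segment = wire_delay_s(length_mm / segments) + repeater_delay_s
    return segments * per_segment


for length in (1.0, 5.0, 10.0):
    print(f"{length:4.1f} mm: {wire_delay_s(length) * 1e12:7.1f} ps unbuffered, "
          f"{segmented_delay_s(length, 5) * 1e12:6.1f} ps in 5 segments")
```

A wire that is ten times longer is about a hundred times slower in this model, whereas cutting it into segments makes the delay grow roughly linearly with length. This is the motivation for dividing the chip into communicating regions.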
Number of Cores on Chip
June 2009
By ITRS (International Technology Roadmap for Semiconductors).
Communication architectures

Evolving from buses to networks


Buses are not scalable in bandwidth, power and
performance
Network-on-Chip provides


Scalable architectures
Concurrent pipelined communication
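
The scalability argument can be made concrete with a small comparison. This is a sketch under simplifying assumptions: a shared bus carries one transfer at a time, the network is a square 2D mesh, and every bus or link has the same assumed bandwidth. The aggregate mesh figure is an upper bound that real traffic will not reach.

```python
import math

LINK_BANDWIDTH_GBPS = 32.0  # assumed bandwidth of the bus and of each mesh link


def bus_bandwidth_per_core(n_cores, bus_bandwidth_gbps=LINK_BANDWIDTH_GBPS):
    """A shared bus divides its bandwidth among all attached cores."""
    return bus_bandwidth_gbps / n_cores


def mesh_links(n_cores):
    """Number of bidirectional links in a k x k mesh with k*k cores."""
    k = math.isqrt(n_cores)
    if k * k != n_cores:
        raise ValueError("n_cores must be a perfect square")
    return 2 * k * (k - 1)


def mesh_aggregate_bandwidth(n_cores, link_bandwidth_gbps=LINK_BANDWIDTH_GBPS):
    """Upper bound: all links carry traffic at the same time."""
    return mesh_links(n_cores) * link_bandwidth_gbps


for n in (4, 16, 64):
    print(f"{n:>2} cores: bus {bus_bandwidth_per_core(n):5.2f} Gb/s per core, "
          f"mesh {mesh_links(n):>3} links, "
          f"{mesh_aggregate_bandwidth(n):7.1f} Gb/s aggregate")
```

The bus share per core shrinks as cores are added, while the number of mesh links that can operate concurrently grows with the number of cores.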
System-on-Chip Architecture:
Network-on-Chip
[Figure: a network of switches linked by channels; the resources (PE1, PE2, PE3, MEM) attach to the switches through network interfaces (NI).]
The resources are connected to the network via network interfaces
The topology of the network and the capability of the switches and communication channels determine the capacity of the network
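
How topology bounds capacity can be sketched with the usual bisection argument: under uniform random traffic, half of all injected traffic has to cross the cut that splits the network in two. The example assumes a k x k mesh with even k and one pair of unidirectional channels between neighbouring switches; the function names are hypothetical.

```python
def mesh_bisection_channels(k):
    """Unidirectional channels crossing the middle cut of a k x k mesh."""
    return 2 * k  # k neighbouring pairs across the cut, one channel each way


def mesh_uniform_capacity(k, channel_bandwidth):
    """Upper bound on per-node throughput under uniform random traffic.

    Half of the traffic of all N = k*k nodes crosses the bisection, so
    N * throughput / 2 <= bisection bandwidth.
    """
    n_nodes = k * k
    bisection_bandwidth = mesh_bisection_channels(k) * channel_bandwidth
    return 2 * bisection_bandwidth / n_nodes


for k in (4, 8, 16):
    print(f"{k:>2} x {k:<2} mesh: at most {mesh_uniform_capacity(k, 1.0):.2f} "
          f"of one channel's bandwidth per node")
```

The bound simplifies to four times the channel bandwidth divided by k, so per-node capacity falls as the mesh grows unless the channels get wider or the topology changes.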
Intel Teraflop Chip - 2007
80 cores
100 million transistors
65 nm process
3.16 GHz
0.95 V
62 W
1.62 Terabit/s aggregate bandwidth
91 Gb/s bisection bandwidth
1.01 Teraflops
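
A few derived figures help put these numbers in perspective. The sketch below only divides the values listed on the slide; it assumes the 1.01 Teraflops figure is the peak rate at 3.16 GHz and 62 W.

```python
# Values from the slide.
CORES = 80
CLOCK_HZ = 3.16e9
PEAK_FLOPS = 1.01e12
POWER_W = 62.0

# Floating-point operations each core must complete per clock cycle.
flops_per_core_per_cycle = PEAK_FLOPS / CORES / CLOCK_HZ

# Energy efficiency in GFLOPS per watt.
gflops_per_watt = PEAK_FLOPS / 1e9 / POWER_W

print(f"FLOPs per core per cycle: {flops_per_core_per_cycle:.2f}")
print(f"GFLOPS per watt: {gflops_per_watt:.1f}")
```

The result is about 4 floating-point operations per core per cycle and roughly 16 GFLOPS per watt.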
Tilera Gx Family

5 mesh networks: 32 bit; dimension order routing; 1-2 cycle traversal
  Static Network (STN)
  User Dynamic Network (UDN)
  I/O Dynamic Network (IDN)
  Tile Dynamic Network (TDN)
  Memory Dynamic Network (MDN)
4x4, 6x6, 8x8, 10x10 chips
3 instructions per cycle per core
32 MB on chip cache
750 GOPS (32 bit operations)
200 Tbps on chip interconnect bandwidth
500 Gbps memory bandwidth
~ 1 GHz operating frequency
10W – 55W power consumption
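
Dimension order routing, mentioned above for the mesh networks, can be sketched in a few lines. This is a generic illustration of the technique on a 2D mesh (route fully along X, then along Y), not Tilera's implementation; the function name and the coordinate convention are assumptions.

```python
def dimension_order_route(src, dst):
    """Return the tiles visited from src to dst, X dimension first, then Y.

    src and dst are (x, y) tile coordinates in a 2D mesh.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # First correct the X coordinate, one hop at a time.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Then correct the Y coordinate.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


print(dimension_order_route((0, 0), (2, 1)))
```

Because every packet between the same pair of tiles follows the same path, the route is deterministic and needs no routing tables. The hop count equals the distance in X plus the distance in Y.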
Questions on Network Design

Network does
  1 to 1 communication: unicast
  1 to N communication: multicast
  N to 1 communication: gather
1. What problems need to be solved in order to realize unicast?
2. What performance metrics do you envision?
3. What factors influence the network performance?
In the Course

Bus-based architectures
  Buses and arbitration
  Shared memory multiprocessors
  Cache coherency
  Memory consistency
Network-on-Chip (NoC) architectures
  Topology
  Routing
  Flow control
  Performance analysis