CPU Performance
IL2207
SoC Architecture Course
Jan – March 2011, KTH
Dr. Zhonghai Lu
[email protected]
(鲁中海)
Course Information
Advanced-level course, 7.5 credits, 40x5=200 hours
12 Lectures, 4 Tutorials, 3 Labs
Home page: www.ict.kth.se/courses/IL2207/1101
Course staff
Responsible: Dr. Zhonghai Lu, [email protected]
Examiner: Prof. Axel Jantsch, [email protected]
Assistants: Huimin She, [email protected]
Abbas Eslami Kiasari, [email protected]
Course Material
Dally, Towles: Principles and Practices of Interconnection Networks
Distributed materials and slides
July 20, 2015
SoC Architecture
2
Lecture Overview
L1: Introduction
L2: Buses and Arbitration (Dally: 22, 18)
L3: Shared Memory Multiprocessors
L4: Cache Coherency Protocols
L5: Memory Consistency
L6: Introduction to Network-on-Chip, Topologies (Dally: 1, 2, 3, 4, 5)
L7: Routing Algorithms and Mechanics (Dally: 8, 9, 10, 11)
L8: Flow Control (Dally: 12, 13)
L9: Deadlock and Livelock (Dally: 12, 13, 14)
L10: Router Architecture and Network Interface (Dally: 16, 17, 20)
L11: Network Performance Analysis and Quality of Service (Dally: 23, 15)
L12: Course Summary
Tutorial Overview
T1: Bus, arbitration and cache coherency
T2: Memory consistency and network topology
T3: Interconnection networks (routing, flow control, deadlock etc.)
T4: Router architecture, QoS and performance analysis
Tutorials will be given by Abbas.
For each tutorial, 2 questions should be answered and handed in to Abbas before the tutorial session. This counts for 10% of the final grade.
Lab Overview
Laboratory 1: Uniprocessor SoC Design on FPGA
Assistant: Huimin
Laboratory 2: Multiprocessor SoC Design with Altera FPGA
Assistant: Huimin
Laboratory 3: Wormhole Networks
Assistant: Abbas
Each lab has 2 sessions: a, b.
Students work in groups of max. 2 students
Good preparation is required.
Take good care of the FPGA boards.
Course Requirements
To pass the course the student has to fulfill the following
requirements:
Pass the final exam. The exam grade counts for 90% of the course grade: A, B, C, D, E, Fx, F
Final exam: March 16, 2011, 9:00-13:00
Register for the exam in Daisy 2 weeks before the exam date in order to guarantee a seat!
Complete all labs: Pass | Fail
Attend lectures, tutorials and labs
Labs
3 labs in total
Only 2 (NOT 3) lab sessions for each lab
Possibly cancelled sessions: 13:00-16:00 (Jan. 24, Feb. 7, Feb. 21). Please note the final changes in TimeEdit.
Lab 1 and 2: FPGA board. (Assistant: Huimin She [email protected])
Lab 3: Network simulator. (Assistant: Abbas Eslami Kiasari [email protected])
Sessions are evenly distributed to avoid long waiting times (approx. 20 persons in each session)
Lab partners
Two persons in a group
If you also take IL2212 Embedded Software, please choose the
same partner as you have for IL2212
The FPGA boards must be returned after lab 1
Note schedule changes
We are resolving schedule conflicts with the IL2201 course: Digital Integrated Circuit Design – VLSI
For each lab, 3 sessions are booked; two will remain and one will be cancelled.
Tomorrow's IL2207 lecture will be from 10:00 to 12:00 in Ka-C21 (Electrum)
Observations in System Design
Observations
Good news
Chip capacity increases following Moore’s law
Functionality increases accordingly to exploit these transistors
Bad news
Difficult to design; productivity decreases
Cost increases
Platform-based design can reduce cost
Architecture is key!
Advances in Integration
Intel 4004 (1971): 108 kHz, 2,300 transistors
Intel Pentium 4 (2000): 1.5 GHz, 42 million transistors
If automobile speed had increased similarly over the same
period, we could now drive from Stockholm to Shanghai in
about 23 seconds.
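The comparison above can be checked with a little arithmetic. The sketch below uses only the clock rates and transistor counts from the slide; the Stockholm to Shanghai distance (about 7,800 km) and the 1971 cruising speed (90 km/h) are assumptions added for illustration, not figures from the course.

```python
import math

# Figures taken from the slide.
CLOCK_4004_HZ = 108e3          # Intel 4004 (1971)
CLOCK_P4_HZ = 1.5e9            # Intel Pentium 4 (2000)
TRANSISTORS_4004 = 2_300
TRANSISTORS_P4 = 42e6
YEARS = 2000 - 1971

# Assumptions for illustration only (not from the slide).
DISTANCE_KM = 7_800            # rough Stockholm-Shanghai distance
CAR_SPEED_1971_KMH = 90        # typical cruising speed


def clock_speedup():
    """Ratio of the two clock frequencies."""
    return CLOCK_P4_HZ / CLOCK_4004_HZ


def trip_seconds():
    """Travel time if car speed had scaled like the clock frequency."""
    scaled_speed_kmh = CAR_SPEED_1971_KMH * clock_speedup()
    return DISTANCE_KM / scaled_speed_kmh * 3600


def doubling_period_years():
    """Average time for the transistor count to double between the two chips."""
    doublings = math.log2(TRANSISTORS_P4 / TRANSISTORS_4004)
    return YEARS / doublings


print(f"clock speed-up:   {clock_speedup():,.0f}x")
print(f"trip time:        {trip_seconds():.1f} s")
print(f"doubling period:  {doubling_period_years():.2f} years")
```

Under these assumptions the trip takes roughly 22 seconds, close to the slide's figure, and the transistor count doubles about every two years.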
Intel chips with Moore’s law
October, 2008
Seminar at National Institute of Informatics, Tokyo
Scaling ARM9
180 nm: 11.8 mm2
130 nm: 5.2 mm2
90 nm: 2.6 mm2
65 nm: 1.4 mm2
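To put the ARM9 areas in context, the sketch below compares them with ideal quadratic scaling, where area shrinks with the square of the feature size. The quadratic rule is an assumption used here as a reference point; the slide itself only lists the areas.

```python
# Die areas from the slide: process node (nm) -> area (mm^2).
ARM9_AREA_MM2 = {180: 11.8, 130: 5.2, 90: 2.6, 65: 1.4}


def ideal_area(base_node_nm, base_area_mm2, node_nm):
    """Area if both dimensions shrank in proportion to the node (assumption)."""
    return base_area_mm2 * (node_nm / base_node_nm) ** 2


def compare(areas, base_node_nm=180):
    """Return (node, listed area, ideally scaled area) from largest node down."""
    base_area = areas[base_node_nm]
    rows = []
    for node, listed in sorted(areas.items(), reverse=True):
        rows.append((node, listed, ideal_area(base_node_nm, base_area, node)))
    return rows


for node, listed, ideal in compare(ARM9_AREA_MM2):
    print(f"{node:>3} nm: listed {listed:4.1f} mm^2, ideal {ideal:4.1f} mm^2")
```

For these numbers the listed areas are at or slightly below the ideally scaled values, so the core shrinks at least as fast as the simple rule predicts.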
Growing Design-Productivity Gap
Design Productivity Crisis
[Figure: Potential Design Complexity and Designer Productivity. Curves: Logic Tr./Chip and Tr./S.M. Axes: Logic Transistor per Chip (M) and Productivity (K) Trans./Staff – Mo. Annotation: Equivalent Added Complexity]
Designs do not only get more complex, but also much more expensive!
The Role of the Market!
Source: Smith 1997
Time-to-Market pressure!
Verification Costs
The share of verification in total design cost is continuously increasing (at present 50-70% for large designs)
Moore’s Law drives the development
of System-in-Chip Architectures
The growing number of transistors on an SOC drives the trend towards more RTL blocks on the chip
[Figure: Yesterday’s SOC: Processor, Memory, I/O and RTL blocks (RTL function 1, 2, 3)]
[Figure: Today’s SOC: Ctl, Proc, DSP, Mem and I/O blocks among many RTL blocks]
Source: Leibson (DAC2004)
From ASIC to SoC, MPSoC
We get more and more cores on a single chip
SoC: both hardware and software
(processor plus memory)
ASIP: Application Specific Instruction Set Processors
Platforms reduce Costs
SOC Flexibility = Per-Unit Cost Reduction
Source: Leibson 2004
(Model: 100K and 1M system volumes)
[Figure: Total per unit cost (0 to 120) versus System designs per chip design (1 to 7, from One Chip to Many System Designs), with curves for 100 000 and 1 000 000 system volumes. Labelled products: Low-end still camera, High-end still camera, Video camcorder]
$10M design cost, $15 manufacturing cost, 5% premium for programmability
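The effect of sharing one chip design across several system designs can be sketched with the parameters listed above. The way they are combined here (design cost amortized over all units, plus manufacturing cost with the programmability premium) is an assumed reading of the model, not a formula given in the slides; `per_unit_cost` is a hypothetical helper name.

```python
DESIGN_COST = 10_000_000          # $10M chip design cost (from the slide)
MANUFACTURING_COST = 15.0         # $15 per unit (from the slide)
PROGRAMMABILITY_PREMIUM = 0.05    # 5% premium (from the slide)


def per_unit_cost(system_volume, designs_per_chip):
    """Assumed model: amortized design cost plus manufacturing cost with premium."""
    units = system_volume * designs_per_chip
    amortized_design = DESIGN_COST / units
    manufacturing = MANUFACTURING_COST * (1 + PROGRAMMABILITY_PREMIUM)
    return amortized_design + manufacturing


for volume in (100_000, 1_000_000):
    costs = [per_unit_cost(volume, n) for n in range(1, 8)]
    print(volume, [round(c, 2) for c in costs])
```

In this reading the design cost dominates at low volume, so reusing the chip in more system designs lowers the per-unit cost most sharply for the 100K case.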
Platform Example: Nexperia
Nexperia Instance: Viper
Arm based MPSoC Platform
OMAP from Texas Instruments
TI’s OMAP (Open Multimedia Application Platform) is a category of proprietary systems on chip with capabilities for portable and mobile multimedia applications.
A number of mobile phones use OMAP SoCs.
OMAP: Hierarchy of Platforms
Application Specific
Ref Design
Appl. Platform
SoC Platform
OMAP Products
OMAP Infrastructure
ASIC Library & Tools
Silicon Technology
Reuse
OMAP uses platforms on different levels
This is a precondition for reuse
SoC Platform
The SoC platform consists of
The Application Platform
A library of hardware components
An architecture for their interconnection
Processor and Peripherals
Low-Level Software (Drivers)
Development Environment
The System Platform
OS and Middleware
Includes the code that controls all aspects of the system from
device driver to system interface
Compilers and tools
OMAP 1510
OMAP 1510 is based on
Enhanced ARM 925 core (RISC processor)
TI C55x core
DMA, SRAM, Buses, Peripherals
Current OMAP platform for
Wireless Handset & PDA
OMAP™ 3 architecture combines mobile
entertainment with high performance productivity
applications (Source: Texas Instruments)
Evolving SoC
Architectures
System-on-Chip Architectures
A system-on-chip architecture integrates several
heterogeneous components on a single chip
[Figure: Microcontroller, DSP, Memory, FPGA, Custom Hardware, Analog/Digital and Digital/Analog blocks connected through a Communication Structure]
A key challenge is to design the communication between the
different entities of a SoC in order to minimize the
communication overhead
Questions on Interconnects
1. To interconnect 2 IP hardware blocks, how would you let them communicate with each other?
2. What if there are 5 to 10 IP modules?
3. What if there are 20 IP blocks?
4. What if there are 200 IP blocks?
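One way to approach these questions is to count wires. The sketch below compares the number of dedicated links needed to connect every pair of blocks directly with the number of links in a 2-D mesh of the same size. It is only an illustration of how the two options scale, not an answer prescribed by the course; the function names are hypothetical.

```python
def point_to_point_links(n):
    """Dedicated links needed to connect every pair of n blocks directly."""
    return n * (n - 1) // 2


def mesh_links(rows, cols):
    """Links in a rows x cols 2-D mesh (each block linked to its neighbors only)."""
    return rows * (cols - 1) + cols * (rows - 1)


for n in (2, 10, 20, 200):
    print(f"{n:>3} blocks: {point_to_point_links(n):>6} point-to-point links")

print("4 x 5 mesh:  ", mesh_links(4, 5), "links")
print("10 x 20 mesh:", mesh_links(10, 20), "links")
```

Full point-to-point wiring grows quadratically with the number of blocks, while the mesh grows roughly linearly, which is one reason the answer changes between 2 blocks and 200.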
System-on-Chip Architecture:
A bus-based SoC
[Figure: System on a chip with Microprocessor, Memory, DSP, Custom Logic and I/O connected by a bus]
Technology Impact on
Communication
Chip
Computation, storage by transistors
Communication by wires
How does technology scaling affect communication delay?
Scaling and Delays
Transistors are “free”; wires are “expensive” and slow down performance.
Long wires should be avoided: the whole chip cannot be treated as a monolithic piece and is preferably segmented into communicating regions.
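The case against long wires can be illustrated with a first-order delay model. The sketch below treats a wire as a distributed RC line, whose delay is commonly approximated as 0.38·R·C and therefore grows with the square of the length, and compares it with the same wire cut into repeated segments. The resistance, capacitance and repeater delay values are hypothetical, and driver and load effects are ignored.

```python
def unrepeated_delay(r_per_mm, c_per_mm, length_mm):
    """Distributed RC wire: delay ~ 0.38 * R_total * C_total (quadratic in length)."""
    return 0.38 * (r_per_mm * length_mm) * (c_per_mm * length_mm)


def segmented_delay(r_per_mm, c_per_mm, length_mm, segments, repeater_delay):
    """Same wire split into equal segments, each followed by a repeater."""
    segment_mm = length_mm / segments
    per_segment = unrepeated_delay(r_per_mm, c_per_mm, segment_mm) + repeater_delay
    return segments * per_segment


# Hypothetical values for illustration only.
R_PER_MM = 200.0        # ohm per mm
C_PER_MM = 0.2e-12      # farad per mm
REPEATER_DELAY = 20e-12  # seconds per repeater

for length in (5, 10, 20):
    plain = unrepeated_delay(R_PER_MM, C_PER_MM, length)
    split = segmented_delay(R_PER_MM, C_PER_MM, length, length // 2, REPEATER_DELAY)
    print(f"{length:>2} mm: unrepeated {plain * 1e9:.2f} ns, segmented {split * 1e9:.2f} ns")
```

Doubling the length quadruples the unrepeated delay, while the segmented delay only doubles, which matches the slide's advice to split the chip into communicating regions.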
Number of Cores on Chip
June 2009
By ITRS (International Technology Roadmap for Semiconductors).
Communication architectures
Evolving from buses to networks
Buses are not scalable in bandwidth, power and
performance
Network-on-Chip provides
Scalable architectures
Concurrent pipelined communication
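The bandwidth argument can be made concrete with a rough comparison. In the sketch below a bus carries one transfer at a time, so its bandwidth is divided among all nodes, whereas in a k x k mesh k links cross a cut through the middle of the network, so the bandwidth across the cut grows with the network size. The link bandwidth value and function names are hypothetical, and k is assumed to be even.

```python
import math


def bus_share(link_bw, nodes):
    """Average bandwidth per node when all nodes share a single bus."""
    return link_bw / nodes


def mesh_bisection_bw(link_bw, nodes):
    """Bandwidth across the middle cut of a k x k mesh (one direction counted)."""
    k = math.isqrt(nodes)
    if k * k != nodes:
        raise ValueError("nodes must be a perfect square")
    return k * link_bw


LINK_BW = 32  # hypothetical link bandwidth in Gb/s

for nodes in (16, 64, 256):
    print(f"{nodes:>3} nodes: bus share {bus_share(LINK_BW, nodes):.3f} Gb/s per node, "
          f"mesh bisection {mesh_bisection_bw(LINK_BW, nodes)} Gb/s")
```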
System-on-Chip Architecture:
Network-on-Chip
[Figure: Resources (PE1, PE2, PE3, MEM) attached through Network Interfaces (NI) to Switches, which are linked by Channels]
The resources are connected to the network via
network interfaces
The topology of the network and the capability of the
switches and communication channels determine
the capacity of the network
Intel Teraflop Chip - 2007
80 Cores
100 Million transistors
65nm process
3.16 GHz
0.95V
62 W
1.62 Terabit/s aggregate bandwidth
91 Gb/s bisection bandwidth
1.01 Teraflops
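The headline numbers above can be related to each other with simple arithmetic. The sketch below derives the floating-point operations per core per cycle and the energy efficiency from the listed core count, clock, throughput and power; it uses no information beyond the slide.

```python
# Figures from the slide.
CORES = 80
CLOCK_HZ = 3.16e9
PEAK_FLOPS = 1.01e12
POWER_W = 62


def flops_per_core_per_cycle():
    """Floating-point operations each core must complete per clock cycle."""
    return PEAK_FLOPS / (CORES * CLOCK_HZ)


def gflops_per_watt():
    """Energy efficiency in GFLOPS per watt."""
    return PEAK_FLOPS / POWER_W / 1e9


print(f"{flops_per_core_per_cycle():.2f} FLOPs per core per cycle")
print(f"{gflops_per_watt():.1f} GFLOPS per watt")
```

The result is about 4 operations per core per cycle and roughly 16 GFLOPS per watt.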
Tilera Gx Family
4x4, 6x6, 8x8, 10x10 Chips
3 instructions per cycle per core
32 MB on chip cache
750 GOPS (32 bit operations)
200 Tbps on chip interconnect bandwidth
500 Gbps memory bandwidth
~ 1 GHz operating frequency
10W – 55W power consumption
5 mesh networks: 32 bit; Dimension order routing; 1-2 cycle traversal
Static Network (STN)
User Dynamic Network (UDN)
I/O Dynamic Network (IDN)
Tile Dynamic Network (TDN)
Memory Dynamic Network (MDN)
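Dimension order routing, mentioned for the Tilera mesh networks, can be sketched in a few lines: a packet first travels along one dimension until that coordinate matches the destination, then along the other. The sketch below routes X first and then Y on a 2-D mesh; the choice of X before Y and the coordinate convention are assumptions, since the slide does not say which order Tilera uses.

```python
def xy_route(src, dst):
    """Return the tiles visited from src to dst, moving in X first and then in Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # correct the X coordinate first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then correct the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src, dst):
    """Number of links traversed, equal to the Manhattan distance."""
    return len(xy_route(src, dst)) - 1


print(xy_route((0, 0), (2, 1)))
print(hop_count((3, 3), (0, 1)))
```

Because every packet between the same pair of tiles takes the same path, the route is deterministic and needs no routing tables, at the cost of not adapting to congestion.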
Questions on Network Design
A network provides
1 to 1 communication: unicast
1 to N communication: multicast
N to 1 communication: gather
1. What problems need to be solved in order to realize unicast?
2. What performance metrics do you envision?
3. What factors influence the network performance?
In the Course
Bus-based architectures
Buses and arbitration
Shared memory multiprocessors
Cache coherency
Memory consistency
Network-on-Chip (NoC) architectures
Topology
Routing
Flow control
Performance analysis