Transcript SOC-CH1b

Chapter 1
Introduction to the Systems Approach
Computer System Design
System-on-Chip
by M. Flynn & W. Luk
Pub. Wiley 2011 (copyright 2011)
soc 1.1
SOC architecture and design
• system-on-chip (SOC)
– processors: become components in a system
• SOC covers many topics
– processors, cache, memory, interconnect, design tools
• need to know
– user view: variety of processors
– basic information: technology and tools
– processor internals: effect on performance
– storage: cache, embedded and external memory
– interconnect: buses, network-on-chip
– evaluation: processor, cache, memory, interconnect
– advanced: specialized processors, reconfiguration
– design productivity: system modelling, design exploration
soc 1.2
System on a Chip: driven by semiconductor advances
soc 1.3
Basic system-on-chip model
soc 1.4
SOC vs processors on chip
• with lots of transistors, designs move in 2 ways:
– complete system on a chip
– multi-core processors with lots of cache
                 System on chip                    Processors on chip
processor        multiple, simple, heterogeneous   few, complex, homogeneous
cache            one level, small                  2-3 levels, extensive
memory           embedded, on chip                 very large, off chip
functionality    special purpose                   general purpose
interconnect     wide, high bandwidth              often through cache
power, cost      both low                          both high
operation        largely stand-alone               need other chips
soc 1.5
iPhone: has System-on-Chip
Source: UC Berkeley
soc 1.6
iPhone SOC
[Figure: iPhone SOC, showing I/O, processor (1 GHz ARM Cortex A8) and memory blocks. Source: UC Berkeley]
soc 1.7
AMD’s Barcelona Multicore Processor
• 4 out-of-order cores
• 1.9 GHz clock rate
• 65nm technology
• 3 levels of caches
• integrated Northbridge
[Figure: Barcelona die with Cores 1-4, each with a 512KB L2 cache, a 2MB shared L3 cache, and the Northbridge]
http://www.techwarelabs.com/reviews/processors/barcelona/
soc 1.8
SOC design: key ideas
• to design and evaluate an SOC, designers
need to understand:
– its components: processors, memory, interconnect
– applications that it targets
• SOC economics heavily dependent on:
– costs: initial design, marginal production
– volume: applicability, lifetime
• reducing design complexity
– Intellectual Property (IP)
– reconfigurable technology
soc 1.9
SOC processors
• usually a mix of special-purpose and general-purpose (GP) processors
– can be proprietary designs or purchased IP
• commonly, the GP processor is purchased IP
– includes OS and compiler support
• GP processor optimized for an application by
– additional instructions
– vector units
soc 1.10
soc 1.11
Some processors for SOCs
SOC                                 Basic ISA     Processor description
Freescale e600: signal processing   PowerPC       Superscalar with vector extension
ClearSpeed CSX600: general          Proprietary   Array processor with 96 processing elements
PlayStation 2: gaming               MIPS          Pipelined with 2 vector coprocessors
ARM VFP11: general                  ARM           Configurable vector coprocessor
soc 1.12
Processor types: overview
Processor type   Architecture / implementation approach
SIMD             Single instruction applied to multiple functional units
Vector           Single instruction applied to multiple pipelined registers
VLIW             Multiple instructions issued each cycle under compiler control
Superscalar      Multiple instructions issued each cycle under hardware control
soc 1.13
Adding instructions
• additional instructions to support specialized resources
– exception: superscalar, with hardware control
• instructions can be added to the base processor for coprocessor control
– VLIW: Very Long Instruction Word
– Array
– Vector
soc 1.14
Sequential and parallel machines
• basic single stream processors
– pipelined: basic sequential
– superscalar: transparently concurrent
– VLIW: compiler-generated concurrency
• multiple stream
– array processors
– vector processors
• multiprocessors
soc 1.15
Sequential processors
• operation
– generally transparent to the sequential programmer
– appears as in-order instruction execution
• pipelined processor
– executes instructions in order
– limited to one instruction execution per cycle
• superscalar processor
– multiple instructions per cycle, managed by hardware
• VLIW processor
– multiple operations per cycle, managed by the compiler
soc 1.16
Pipelined processor
Instruction #1  IF ID AG DF EX WB
Instruction #2     IF ID AG DF EX WB
Instruction #3        IF ID AG DF EX WB
Instruction #4           IF ID AG DF EX WB
                Time ->
• IF: Instruction Fetch
• ID: Instruction Decode
• AG: Address Generation
• DF: Data Fetch
• EX: Execution
• WB: Write Back
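The overlap in the diagram can be reproduced with a short Python sketch. It assumes an ideal six-stage pipeline with no stalls, hazards or branches, so instruction i (counting from 0) enters IF in cycle i and n instructions need 6 + n - 1 cycles; the function names are illustrative only.

```python
# Ideal in-order pipeline: one instruction enters IF each cycle; no stalls or hazards.
STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]


def pipeline_table(n_instructions):
    """Row i lists the stage instruction i occupies in each cycle ('' when idle)."""
    total_cycles = len(STAGES) + n_instructions - 1
    table = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters the pipeline in cycle i
        table.append(row)
    return table


def print_table(table):
    for i, row in enumerate(table, start=1):
        print(f"Instruction #{i}: " + " ".join(f"{cell:>2}" for cell in row))


if __name__ == "__main__":
    table = pipeline_table(4)
    print_table(table)
    print("cycles:", len(table[0]))  # 6 stages + 4 instructions - 1 = 9
```

With four instructions the pipeline finishes in 9 cycles instead of the 24 a non-overlapped design would need; as n grows, the rate approaches the one-instruction-per-cycle limit noted for pipelined processors.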
soc 1.17
Superscalar and VLIW processors
Instruction #1  IF ID AG DF EX WB
Instruction #2  IF ID AG DF EX WB
Instruction #3     IF ID AG DF EX WB
Instruction #4     IF ID AG DF EX WB
Instruction #5        IF ID AG DF EX WB
Instruction #6        IF ID AG DF EX WB
                Time ->
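A similar sketch, under the same idealized assumptions (no dependences between instructions, no stalls), shows how issuing `width` instructions per cycle shortens the schedule. It models only the timing, which is the same whether the hardware (superscalar) or the compiler (VLIW) chose the instructions that issue together; all names are hypothetical.

```python
import math

STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]


def issue_cycle(i, width):
    """Cycle in which instruction i (0-based) enters IF when `width` instructions issue per cycle."""
    return i // width


def total_cycles(n_instructions, width):
    """Cycles to complete n instructions on an ideal width-issue pipeline."""
    return len(STAGES) + math.ceil(n_instructions / width) - 1


def stage_at(i, cycle, width):
    """Stage occupied by instruction i in the given cycle, or None if it is not in the pipeline."""
    s = cycle - issue_cycle(i, width)
    return STAGES[s] if 0 <= s < len(STAGES) else None


if __name__ == "__main__":
    for width in (1, 2):
        print(f"width {width}: 6 instructions take {total_cycles(6, width)} cycles")
```

For six instructions, `total_cycles(6, 2)` gives 8 cycles versus 11 for single issue. Real machines fall short of this when instructions depend on each other: a superscalar processor detects such dependences at run time, while a VLIW compiler must schedule around them.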
soc 1.18
[Figure: Superscalar and VLIW]
soc 1.19
[Figure: Superscalar and VLIW]
soc 1.20
Parallel processors
• execution managed by programmer
• array processors
– single instruction stream, multiple data streams: SIMD
• vector processors
– SIMD
• multiprocessors
– multiple instruction streams, multiple data streams: MIMD
soc 1.21
Array processors
• perform op if condition = mask
• operand can come from neighbour
[Figure: instruction format mask | op | dest | sr1 | sr2; one instruction is issued to all n PEs, each with its own memory and neighbour communications]
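The masked, single-instruction operation can be illustrated with a small Python model. The PE register names, the ring topology (each PE's neighbour is the PE to its left, with wraparound) and the `sr2_from_neighbour` flag are assumptions for illustration, not details taken from the slide.

```python
import operator
from dataclasses import dataclass

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass
class ArrayInstruction:
    mask: bool                        # a PE executes only if its condition flag equals this
    op: str                           # key into OPS
    dest: str                         # destination register name
    sr1: str                          # first source register (always local)
    sr2: str                          # second source register
    sr2_from_neighbour: bool = False  # hypothetical: read sr2 from the left neighbour


def broadcast(instr, regs, cond):
    """Issue one instruction to all PEs.

    regs[p] is PE p's local register file (a dict); cond[p] is its condition flag.
    """
    n = len(regs)
    # Read every operand before any write, so all PEs see the same pre-instruction
    # state, as in a lockstep SIMD array.
    operands = []
    for p in range(n):
        src_pe = (p - 1) % n if instr.sr2_from_neighbour else p  # ring of PEs
        operands.append((regs[p][instr.sr1], regs[src_pe][instr.sr2]))
    for p in range(n):
        if cond[p] == instr.mask:  # masked-off PEs leave dest unchanged
            a, b = operands[p]
            regs[p][instr.dest] = OPS[instr.op](a, b)
```

For example, with four PEs holding r1 = p and r2 = 10p, an `add r3, r1, r2` with PE 1 masked off leaves r3 = [0, 0, 22, 33]; the same instruction reading r2 from the neighbour gives [30, 1, 12, 23].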
soc 1.22
Vector processors
• vector registers, eg 8 regs x 64 words x 64 bits
• vector instructions: VR3 <- VR2 VOP VR1
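A Python sketch of the vector register file described above (8 registers × 64 words × 64 bits) and of one instruction of the form VR3 <- VR2 VOP VR1. It assumes integer elements that wrap modulo 2^64 and an optional vector length argument; the operation names `vadd`, `vsub` and `vmul` are hypothetical.

```python
import operator

NUM_VREGS = 8               # 8 vector registers ...
VLEN = 64                   # ... of 64 words each ...
WORD_MASK = (1 << 64) - 1   # ... of 64 bits; integer results wrap modulo 2**64

VOPS = {"vadd": operator.add, "vsub": operator.sub, "vmul": operator.mul}


class VectorRegisterFile:
    def __init__(self):
        self.vr = [[0] * VLEN for _ in range(NUM_VREGS)]

    def load(self, r, values):
        """Fill VR[r] from a list, zero-padding to VLEN words."""
        if len(values) > VLEN:
            raise ValueError("vector longer than register")
        self.vr[r] = [v & WORD_MASK for v in values] + [0] * (VLEN - len(values))

    def execute(self, vop, vd, va, vb, length=VLEN):
        """VR[vd] <- VR[va] VOP VR[vb], element by element for `length` elements.

        One instruction specifies `length` operations; a real vector unit streams
        them through a pipelined functional unit, modelled here as a simple loop.
        """
        f = VOPS[vop]
        a, b = self.vr[va], self.vr[vb]
        for k in range(length):
            self.vr[vd][k] = f(a[k], b[k]) & WORD_MASK
```

The slide's VR3 <- VR2 VOP VR1 with a multiply becomes `rf.execute("vmul", 3, 2, 1)`: a single instruction that specifies 64 multiplications.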
soc 1.23
[Figure: Array processors and vector processors]
soc 1.24
SOC multiprocessors
soc 1.25
Memory and addressing
• many SOC memory designs use simple embedded memory
– a single-level cache
– real (rather than virtual) addressing
• as SOCs become more complex
– their designs are expected to use more complex memory and addressing configurations
soc 1.26
Three levels of addressing
soc 1.27
User view of memory: addressing
• a program: process address (offset + base + index)
– virtual address: process address + process id
• a process: assigned a segment base and bound
– system address: segment base + process address
• pages: active localities in main/real memory
– virtual address: translated by table lookup to real address
– page miss: virtual page not in the page table
• TLB (translation look-aside buffer): recent translations
– TLB entry: corresponding real and (virtual, id) address
• a few hashed virtual address bits address TLB entries
– if (virtual, id) = TLB (virtual, id), then use the translation
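The address chain above can be sketched in Python: a process address is checked against its segment bound and relocated by the segment base, and the resulting address is translated through a small TLB tagged with (virtual page, process id), falling back to the page table on a TLB miss. The page size, the 16-entry direct-mapped TLB and its hash function are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size
TLB_SETS = 16     # assumed: a few hashed bits select one of 16 TLB entries


class SegmentFault(Exception):
    pass


class PageMiss(Exception):
    pass


def process_address(offset, base, index):
    return offset + base + index


def system_address(proc_addr, seg_base, seg_bound):
    """Check the process address against the segment bound, then add the segment base."""
    if not 0 <= proc_addr < seg_bound:
        raise SegmentFault(f"address {proc_addr:#x} outside bound {seg_bound:#x}")
    return seg_base + proc_addr


class MMU:
    def __init__(self, page_table):
        # page_table maps (virtual page number, process id) -> real page number
        self.page_table = page_table
        self.tlb = [None] * TLB_SETS  # each entry: (vpn, pid, real page number)
        self.hits = self.misses = 0

    def _tlb_index(self, vpn, pid):
        # Hypothetical hash of a few virtual-page bits and the process id.
        return (vpn ^ (vpn >> 4) ^ pid) % TLB_SETS

    def translate(self, vaddr, pid):
        vpn, page_offset = divmod(vaddr, PAGE_SIZE)
        i = self._tlb_index(vpn, pid)
        entry = self.tlb[i]
        if entry is not None and entry[0] == vpn and entry[1] == pid:
            self.hits += 1        # (virtual, id) matches: use the stored translation
            rpn = entry[2]
        else:
            self.misses += 1      # TLB miss: look up the page table
            if (vpn, pid) not in self.page_table:
                raise PageMiss(f"page {vpn:#x} of process {pid} not in page table")
            rpn = self.page_table[(vpn, pid)]
            self.tlb[i] = (vpn, pid, rpn)  # keep the recent translation
        return rpn * PAGE_SIZE + page_offset
```

Because the tag includes the process id, two processes can use the same virtual page number without the TLB being flushed on a process switch.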
soc 1.28
The TLB and the MMU
soc 1.29
SOC interconnect
• interconnecting multiple active agents requires
– bandwidth: capacity to transmit information (bps)
– protocol: logic for non-interfering message transmission
• bus
– AMBA (Advanced Microcontroller Bus Architecture) from ARM, widely used for SOCs
– bus performance: can determine system performance
• network on chip
– array of switches
– statically switched: eg mesh
– dynamically switched: eg crossbar
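To make the bandwidth point concrete, the following Python sketch computes peak bus bandwidth from width and clock rate and shows what happens when the agents' combined demand exceeds it. Sharing a saturated bus in proportion to demand is a simplifying assumption; real bus arbiters typically use priority or round-robin schemes.

```python
def bus_bandwidth_bps(width_bits, clock_hz, cycles_per_transfer=1):
    """Peak bus bandwidth in bits per second."""
    return width_bits * clock_hz / cycles_per_transfer


def delivered_bandwidth(demands_bps, capacity_bps):
    """Bandwidth each agent actually gets, assuming proportional sharing when saturated."""
    total = sum(demands_bps)
    if total <= capacity_bps:
        return list(demands_bps)      # the bus is not the bottleneck
    scale = capacity_bps / total      # saturated: every agent is slowed down
    return [d * scale for d in demands_bps]


if __name__ == "__main__":
    capacity = bus_bandwidth_bps(32, 200e6)  # 32-bit bus at 200 MHz
    print(capacity)                          # 6.4e9 bits per second
    print(delivered_bandwidth([2e9, 3e9, 4e9], capacity))
```

Here three agents asking for 2, 3 and 4 Gbps on a 6.4 Gbps bus each receive about 71% of their demand, so the bus rather than the processors sets system performance.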
soc 1.30
Bus-based SOC
soc 1.31
Network on a Chip
soc 1.32
SOC design approach
• understand the application (compiler, OS, memory and real-time constraints)
• select initial die area, power and performance targets; select initial processors, memory and interconnect
• assume target processor and interconnect performance; design and evaluate memory
• evaluate and redesign processors with memory
• design interconnect to support processors and
memory
• repeat and iterate to optimize
soc 1.33
SOC design approach
soc 1.34
Processor optimization example
• given embedded ARM processor
– in an SOC chip
• 1 vs 2 vs 3 vs 4 integer ALUs (IALUs)
– instructions per cycle?
• 16KB vs 32KB L1 instruction cache
– how much improvement? less power?
• static branch prediction: predict taken vs predict not-taken
– misprediction rate?
• aim: explore this large design space
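The design space on this slide is small enough to enumerate exhaustively. The sketch below lists all 4 × 2 × 2 = 16 configurations and picks the best feasible one; the `evaluate` and `constraint` functions are placeholders the caller must supply (for example, IPC from a cycle-accurate simulator and an area or power check), and no performance figures are implied.

```python
from itertools import product

IALU_COUNTS = [1, 2, 3, 4]
ICACHE_KB = [16, 32]
BRANCH_PREDICTION = ["predict-taken", "predict-not-taken"]


def design_space():
    """All combinations of the options listed on the slide."""
    return [
        {"ialus": a, "icache_kb": c, "branch": b}
        for a, c, b in product(IALU_COUNTS, ICACHE_KB, BRANCH_PREDICTION)
    ]


def explore(evaluate, constraint=lambda cfg: True):
    """Exhaustive search: the feasible configuration with the highest score, or None.

    `evaluate(cfg)` returns a score (higher is better); `constraint(cfg)` rejects
    configurations that break a budget. Both are hypothetical, caller-supplied models.
    """
    feasible = [cfg for cfg in design_space() if constraint(cfg)]
    return max(feasible, key=evaluate) if feasible else None
```

For instance, `explore(lambda c: c["ialus"], lambda c: c["ialus"] <= 2)` returns a two-IALU configuration. Each additional option multiplies the count, which is why larger spaces call for sampling or heuristic search instead.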
soc 1.35
Design cost: product economics
• increasingly, product cost is determined by
– design costs, including verification
– rather than the marginal cost of production
• manage complexity in die technology by
– engineering effort
– engineering cleverness
soc 1.36
Design complexity
soc 1.37
Cost: product program vs engineering
[Figure: product cost divided into fixed and variable costs; labels include chip design, verify & test, labor costs, software, CAD support, CAD programs, engineering costs, mask costs, capital equipment, fixed project costs, marketing, sales, administration, and manufacturing costs]
soc 1.38
Two scenarios
• fixed costs Kf, support costs 0.1 x Kf x fct(n), and variable costs Kv x n, so
  $\text{Cost} = K_f + (0.1\,K_f)\,\sqrt[3]{n} + K_v\,n$
• designs get more complex while production costs decrease
– Kf increases while Kv decreases
– implicitly requires higher volumes to break even
• compared with 1995, in 2005
– Kf increased by 10 times
– Kv decreased by the same factor
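The cost formula can be evaluated directly. The sketch below, assuming made-up values Kf = 1000 and Kv = 1 for the earlier design and Kf = 10000, Kv = 0.1 for the later one (the 10× shift described above), finds the volume beyond which the more complex design becomes cheaper overall. It takes fct(n) to be the cube root of n, as in the formula.

```python
def total_cost(kf, kv, n):
    """Cost = Kf + 0.1*Kf*n**(1/3) + Kv*n for a production volume of n units."""
    return kf + 0.1 * kf * n ** (1 / 3) + kv * n


def cost_per_unit(kf, kv, n):
    return total_cost(kf, kv, n) / n


def crossover_volume(kf_a, kv_a, kf_b, kv_b, n_limit=10**12):
    """Smallest volume n at which scenario b costs less in total than scenario a.

    Assumes b has the higher fixed cost and the lower variable cost, so the cost
    difference changes sign only once; found by doubling, then bisection.
    Returns None if there is no crossover below n_limit.
    """
    def b_cheaper(n):
        return total_cost(kf_b, kv_b, n) < total_cost(kf_a, kv_a, n)

    hi = 1
    while not b_cheaper(hi):
        if hi > n_limit:
            return None
        hi *= 2
    lo = hi // 2  # b is not cheaper at lo (or lo == 0)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if b_cheaper(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    n = crossover_volume(1000, 1, 10000, 0.1)
    print("break-even volume:", n)
```

With these hypothetical numbers the crossover falls at roughly 46,000 units; below that volume the older, cheaper-to-design product wins, which is the higher break-even volume the slide refers to.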
soc 1.39
Two scenarios
soc 1.40
Product volume dictates design effort
[Figure: balance between basic physical tradeoffs and design time and effort; the balance point depends on n, the number of units]
soc 1.41
Reduce complexity: use IP
• IP: Intellectual Property
soc 1.42
Reduce complexity: reconfig. tech.
• reconfigurable technology: no fabrication costs
– lower non-recurring engineering (NRE) costs
• reconfigurable design: faster and cheaper
– improve time-to-market
• reconfigurability
– in-system upgrade: improve time-in-market
– run-time adaptation: respond to run-time conditions
– compile-time reconfiguration: retarget accelerator
• overhead: lower performance, larger area, lower energy efficiency
– less efficient than an ASIC (application-specific integrated circuit)
soc 1.43
Summary
• to design and evaluate an SOC, designers
need to understand:
– its components: processors, memory, interconnect
– applications that it targets
• SOC economics heavily dependent on:
– costs: initial design, marginal production
– volume: applicability, lifetime
• reducing design complexity
– Intellectual Property (IP)
– reconfigurable technology
soc 1.44