Flynn's Classifications (1972) [1]

Paper Review
A New Generation of DSP Architectures
Bryan Ackland and Paul D’Arcy
Lucent Technologies
Babak Noory
Professor Maitham Shams
97.575
March 18, 2002
Agenda
1. Look at the evolution of Digital Signal Processors
2. Review the emerging system requirements
3. Summarize recent advances in low power DSP techniques
4. Look at a number of new high performance architectures
5. Describe a bus based multi-core architecture for task level parallelism
Introduction
General Purpose Digital Signal Processors
Introduced in 1980
- High performance engines
- MAC speed advantage of 50:1 over the best microprocessors
Today
- Modest performance improvements
- Outperformed by microprocessors
DSP Evolution
Performance of DSPs vs. Microprocessors
[Figure: peak MAC performance (log scale, 1 to 1K) plotted against year (1980 to 2000). DSPs shown: DSP-1, DSP-32C, DSP-16, DSP-1600, DSP-16210. Microprocessors shown: M68000, 80286, 80386, Pentium, Pentium MMX.]
And yet, DSPs generate over $3 billion for the semiconductor industry every year.
DSP Evolution
Power and Cost of DSPs vs. Microprocessors
[Figure: power (mW/MIP, log scale, 1 to 10K) plotted against year (1980 to 2000), with unit prices. Microprocessors: M68000 ($200), 80286 ($200), 80386 ($300), Pentium ($500). DSPs: DSP-1 ($150), DSP-32C ($250), DSP-16A ($15), DSP-1600 (<$10).]
Lower cost
Higher MOP/mm² and MOP/mW
Emerging Applications
Very Low Power Applications
Portable Applications: functionalities such as video and web browsing added to cellular phones, PDAs, and multimedia laptops
Average power becomes the main design constraint
High Performance Applications
Embedded Applications: digital audio broadcast and smart phones
PC based Applications: 3-D graphics and real-time video communications
Infrastructure Applications: modem head-end and wireless basestations
Low Power Techniques
1. Full Custom Datapath Layout
Circuit Topology
Transistor Sizing
Layout Parasitics
Layout Topology    Drain Capacitance
Simple             45.6 fF
Finger             18.7 fF
Ring               10.8 fF
[Figure: transistor layouts for a) Simple, b) Finger, and c) Ring topologies, with source (S) and drain (D) regions and the widths W, W/2, and W/4 marked.]
Courtesy [1]
Low Power Techniques
2. Clock Gating
System Level Clock Gating: Limit data transition and clock dissipation to active sub-systems
Local Clock Gating: Deactivate non-active elements in a sequential circuit
[Figure: a crystal oscillator supplies the system clock; gating signals (Gate CPU Section 1, 2, 3) are ANDed with the system clock to feed boards 1-3, 4-6, and 7-9.]
Operation Mode          Power
Normal Mode (80 MHz)    120 mW
Standby (Halt)          21 mW
Slow Clock (16 kHz)     2.3 mW
StopClk                 30 uW
Courtesy [4]
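Because average power is the constraint that matters for portable devices, the per-mode figures on this slide can be combined into a time-weighted average. The following is a minimal sketch only: the mode powers are read from the slide in the order they appear, while the usage profile (the fraction of time spent in each mode) and all function and variable names are hypothetical and not taken from the source.

```python
# Power per operating mode in milliwatts, as listed on the slide.
MODE_POWER_MW = {
    "normal": 120.0,     # Normal Mode (80 MHz)
    "standby": 21.0,     # Standby (Halt)
    "slow_clock": 2.3,   # Slow Clock (16 kHz)
    "stop_clk": 0.030,   # StopClk (30 uW)
}


def average_power_mw(time_fractions):
    """Return the time-weighted average power for the given mode fractions."""
    total = sum(time_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(MODE_POWER_MW[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical usage profile (an assumption, not from the source).
profile = {"normal": 0.10, "standby": 0.20, "slow_clock": 0.30, "stop_clk": 0.40}
avg = average_power_mw(profile)
print(f"average power: {avg:.3f} mW")
```

Under this assumed profile, the time spent in normal mode dominates the average, which is why gating the clock during idle periods pays off.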
Low Power Techniques
3. Minimizing Data Transitions
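Applicable to circuits where data transitions are well understood
Difficult to estimate internal node activity for complex circuits
[Figure: two gate orderings that compute the same output Z from inputs A, B, and C, with P(A=1) = 0.5, P(B=1) = 0.2, P(C=1) = 0.1.]
First ordering (A and B combine at internal node x, then C): activity at node x = 0.09
Second ordering (B and C combine at internal node x, then A): activity at node x = 0.0196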
Courtesy [3]
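The two activity figures on this slide, 0.09 and 0.0196, can be reproduced under two assumptions that the transcript does not state explicitly: the gates are 2-input ANDs, and the inputs are statistically independent, so that the activity of a node is the probability of a 0-to-1 transition, (1 - p) * p, where p is the probability the node is 1. The function names below are illustrative.

```python
def and_output_probability(p_inputs):
    """Probability that an AND gate outputs 1, assuming independent inputs."""
    prob = 1.0
    for p in p_inputs:
        prob *= p
    return prob


def switching_activity(p_one):
    """Probability of a 0 -> 1 transition between two independent samples."""
    return (1.0 - p_one) * p_one


# Input probabilities from the slide.
P_A, P_B, P_C = 0.5, 0.2, 0.1

# First ordering: x = A AND B, then Z = x AND C.
x_first = switching_activity(and_output_probability([P_A, P_B]))
# Second ordering: x = B AND C, then Z = x AND A.
x_second = switching_activity(and_output_probability([P_B, P_C]))

print(round(x_first, 4), round(x_second, 4))  # prints 0.09 0.0196
```

Combining the two least active inputs first keeps the internal node quiet, while the output Z has the same activity in both orderings because it computes the same function.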
Low Power Techniques
4. Partitioned Memory Architecture
Memories occupy a great deal of silicon area, but activity factors in these individual circuits are very low.
Adopt hierarchical sub-banking
Replace large memory blocks with several smaller blocks
Make use of gated clocks to limit switching activity to active blocks
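The sub-banking idea can be illustrated with a toy model in which an address selects one bank and only that bank is activated per access. This is a sketch under stated assumptions, not a description of any real memory: the class name, the equal-sized banks, and the activation counter (standing in for a gated clock enable) are all hypothetical.

```python
class BankedMemory:
    """Toy model: a memory split into equal banks, one bank enabled per access."""

    def __init__(self, total_words, num_banks):
        if total_words % num_banks != 0:
            raise ValueError("total_words must be a multiple of num_banks")
        self.total_words = total_words
        self.words_per_bank = total_words // num_banks
        self.banks = [[0] * self.words_per_bank for _ in range(num_banks)]
        # Counts how often each bank's (hypothetical) gated clock is enabled.
        self.bank_activations = [0] * num_banks

    def _select(self, address):
        if not 0 <= address < self.total_words:
            raise IndexError("address out of range")
        bank, offset = divmod(address, self.words_per_bank)
        self.bank_activations[bank] += 1  # all other banks stay idle
        return bank, offset

    def write(self, address, value):
        bank, offset = self._select(address)
        self.banks[bank][offset] = value

    def read(self, address):
        bank, offset = self._select(address)
        return self.banks[bank][offset]


mem = BankedMemory(total_words=1024, num_banks=8)
mem.write(5, 42)
mem.write(300, 7)
print(mem.read(5), mem.bank_activations)
```

In this model each access touches one 128-word bank instead of the full 1024-word array, which is the switching-activity saving the slide describes.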
Low Power Techniques
5. Technology & Voltage Scaling
Adjusting supply voltages to meet performance requirements
Mixed voltage & mixed threshold logic families
Dynamic voltage scaling: Supply voltage and clock speed vary continuously according to processor load
Supply “cut off”: High threshold transistors used to cut off the power when the chip enters sleep mode
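The benefit of scaling voltage and clock together follows from the standard first-order CMOS dynamic power relation, P = a * C * Vdd^2 * f. The sketch below applies that relation; the capacitance, voltage, and frequency values are hypothetical placeholders chosen only to show the ratio, and leakage power is ignored.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f (watts)."""
    return activity * c_eff * vdd ** 2 * freq


# Hypothetical operating points (assumptions, not from the source).
baseline = dynamic_power(c_eff=1e-9, vdd=3.3, freq=80e6)
scaled = dynamic_power(c_eff=1e-9, vdd=1.65, freq=40e6)

ratio = scaled / baseline
print(f"scaled / baseline power = {ratio:.3f}")
```

Halving both the supply voltage and the clock frequency reduces dynamic power to one eighth in this model, because voltage enters quadratically and frequency linearly.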
Emerging Applications (Revisited)
Very Low Power Applications
Portable Applications: functionalities such as video and web browsing added to cellular phones, PDAs, and multimedia laptops
Average power becomes the main design constraint
High Performance Applications
Embedded Applications: digital audio broadcast and smart phones
PC based Applications: 3-D graphics and real-time video communications
Infrastructure Applications: modem head-end and wireless basestations
New Class of architectures
Minor enhancements in combination with process improvement will not meet the requirements of emerging applications. The new architectures must provide:
Performance ranging from hundreds of MOPS to tens of GOPS
Parallel architectures, many operations/clock
Large memory and I/O bandwidth
Cache hierarchies
Compiler driven programming environment
High-level programming languages
Scalability
Media Processors
              TI C80                Chromatics MPACT      Philips Tri-Media     IBM MFAST             Samsung MSP-1
Architecture  4 64b DSP             VLIW/SIMD             VLIW                  VLIW/SIMD             32-way SIMD
              + 32b RISC            4 ALUs                25 exec. units        4by4 folded array     + 32b RISC
Clock         40 MHz                62 MHz                100 MHz               50 MHz                100 MHz
Performance   1.2 GOPS              2.0 GOPS              4.0 GOPS              20 GOPS               6.4 GOPS
Memory        DRAM                  RAMBUS                SDRAM                 SDRAM                 SDRAM
              400 MB/s              500 MB/s              400 MB/s              800 MB/s              800 MB/s
Programming   Compiler + Assembler  In-house              VLIW Compiler         Compiler + Assembler  Compiler + Assembler
Very high performance
Very fast memories
Yet all programs (save Tri-Media) have been cancelled
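One way to read the clock and performance rows together is as operations per clock cycle, the measure of parallelism that the new architectures are asked to provide. The sketch below divides peak performance by clock rate for each processor. It assumes the clock and GOPS values pair with the processors in the order they appear on the slide; the dictionary and function names are illustrative.

```python
# (clock in Hz, peak performance in operations per second), in slide order.
MEDIA_PROCESSORS = {
    "TI C80": (40e6, 1.2e9),
    "Chromatics MPACT": (62e6, 2.0e9),
    "Philips Tri-Media": (100e6, 4.0e9),
    "IBM MFAST": (50e6, 20e9),
    "Samsung MSP-1": (100e6, 6.4e9),
}


def ops_per_clock(clock_hz, peak_ops_per_s):
    """Peak operations completed per clock cycle."""
    return peak_ops_per_s / clock_hz


parallelism = {
    name: ops_per_clock(clock, ops) for name, (clock, ops) in MEDIA_PROCESSORS.items()
}
for name, value in parallelism.items():
    print(f"{name}: {value:.1f} ops/clock")
```

All five parts reach tens to hundreds of operations per clock at modest clock rates, so their peak performance comes from parallel hardware rather than clock speed.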
Media Processors
Reasons:
1. Programmability Issues
- Required large quantities of assembly code
- Explicit management of task level and instruction level parallelism
2. Lack of Scalability
- Single price/performance (except for C80)
3. Difficult Market
- Multimedia applications on PC
- Caught between high-performance ASICs and software solutions
Daytona MIMD Architecture
Task Level Parallelism
Scalability
Bus support for N DSP cores
Cache memory
Simulation has shown that N can be in the range of 8 to 10 processors!
[Figure: Daytona block diagram. Labels: external memory (code and data), I/O, memory & I/O controller, STBus, host, and several DSP cores, each with a cache.]
Daytona DSP Core
Architecture
- LIW Machine
- 32b SPARC + 64b SIMD
Instruction level parallelism:
- 64b instructions
- 2 x 32b RISC operations
- 32b RISC + 32b coprocessor extension
DSP core programming in C
[Figure: DSP core block diagram. Labels: STBus, bus interface, 8kB instruction and data cache, 32b SPARC RISC µP, 64b 8-way SIMD vector coprocessor.]
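To make "64b 8-way SIMD" concrete, the sketch below treats a 64-bit word as eight independent 8-bit lanes and adds two such words lane by lane. The 8-bit lane width is inferred from 64 bits divided by 8 ways, and the wraparound (modulo 256) arithmetic is an assumption; the slides do not say how the Daytona coprocessor handles overflow or which lane widths it supports. All names are illustrative.

```python
LANES = 8
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack eight 8-bit values into one 64-bit word, lane 0 in the low byte."""
    if len(values) != LANES:
        raise ValueError("expected exactly 8 lane values")
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add(a, b):
    """Lane-wise add: each 8-bit lane wraps independently, no carry between lanes."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 4, 5, 6, 7, 255])
b = pack([10] * LANES)
result = unpack(simd_add(a, b))
print(result)
```

The last lane overflows and wraps without disturbing its neighbours, which is the property that distinguishes a SIMD add from an ordinary 64-bit integer add.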
Conclusions (1)
The DSP world is changing
Emerging applications in combination with few backward compatibility issues require new architectures, which can maximize:
Parallelism
Scalability
Programmability
Generality
While other measures must be taken to minimize:
Cost
Time to Market
Conclusions (2)
The DSP world is changing
What will separate DSPs from general purpose microprocessors in the future will simply be the cost factor.
Advances in the programmable hardware field are also very promising, and could further change the DSP landscape in the future.
References
[1] A. P. Chandrakasan and R. W. Brodersen, “Low Power Digital CMOS Design,” Kluwer Academic Publishers: Norwell, 1995.
[2] K. D. Wagner, “Clock System Design,” IEEE Design & Test of Computers, pp. 9-27, October 1988.
[3] L. Wanhammar, “DSP Integrated Circuits,” Academic Press: London, 1999.
[4] K. Hwang, “Advanced Computer Architecture: Parallelism, Scalability, Programmability,” McGraw-Hill: New York, 1993.
[5] T. Kuroda and T. Sakurai, “Overview of Low-Power ULSI Circuit Techniques,” IEICE Transactions on Electronics, vol. E78-C, no. 4, pp. 334-344, April 1995.
[6] C. Hamacher, Z. Vranesic and S. Zaky, “Computer Organization,” fifth edition, McGraw-Hill: New York, 2002.
[7] M. M. Mano, “Computer System Architecture,” McGraw-Hill: New York, 1993.