DSP Architectures
Additional Slides
Professor S. Srinivasan
Electrical Engineering Department
I.I.T. Madras, Chennai 600 036
Figure 4.3(a) Block diagram of a barrel shifter
Figure 4.3(b) Implementation of a 4-bit, shift-right barrel shifter
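A barrel shifter produces any shift amount in a single combinational step, because each output bit simply selects one of the input bits. The following is a minimal sketch of a 4-bit shift-right version. The function name, the MSB-first list representation, and the `arithmetic` flag (sign-fill versus zero-fill) are assumptions made for illustration and are not taken from the figure.

```python
def barrel_shift_right(bits, shift, arithmetic=False):
    """Shift a list of bits (MSB first) right by `shift` positions.

    Every output position picks exactly one input bit (or the fill value),
    which is what a row of multiplexers does in hardware.
    """
    fill = bits[0] if arithmetic else 0  # sign bit or zero (assumed options)
    return [bits[i - shift] if i - shift >= 0 else fill
            for i in range(len(bits))]


word = [1, 0, 1, 1]                      # A3 A2 A1 A0
logical = barrel_shift_right(word, 1)    # zero is shifted in at the MSB
signed = barrel_shift_right(word, 1, arithmetic=True)  # A3 is replicated
```

The cost of the shift does not depend on the shift amount, unlike a shift register that needs one clock per position.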
Figure 4.5 A MAC unit with accumulator guard bits
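Guard bits widen the accumulator so that a sum of many products can exceed the product width without wrapping around. The sketch below compares a 32-bit accumulator with a 40-bit one (8 guard bits). The widths, the function names, and the sample values are assumptions chosen for illustration; the figure itself does not fix them.

```python
def wrap_signed(value, width):
    """Wrap an integer to a two's-complement value of `width` bits."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def mac(samples, coeffs, acc_width):
    """Multiply-accumulate with an accumulator of `acc_width` bits."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = wrap_signed(acc + x * h, acc_width)  # accumulator wraps on overflow
    return acc


samples = [32767] * 4                     # largest positive 16-bit values
coeffs = [32767] * 4
no_guard = mac(samples, coeffs, 32)       # sum overflows and becomes negative
with_guard = mac(samples, coeffs, 40)     # 8 guard bits hold the true sum
```

With 8 guard bits, up to 256 full-scale products can be accumulated before the wider accumulator itself can overflow.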
Figure 4.6 A schematic diagram of the saturation logic
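Saturation logic replaces an overflowed result with the largest positive or most negative representable value instead of letting it wrap. A minimal sketch, assuming a 32-bit result width (the width and the function name are illustrative and are not taken from the figure):

```python
def saturate(value, width=32):
    """Clamp `value` to the signed two's-complement range of `width` bits."""
    highest = (1 << (width - 1)) - 1
    lowest = -(1 << (width - 1))
    return max(lowest, min(highest, value))


too_big = saturate(4294705156)        # clamps to the 32-bit positive maximum
too_small = saturate(-5000000000)     # clamps to the 32-bit negative minimum
in_range = saturate(123)              # values in range pass through unchanged
```

Clamping keeps the error bounded, whereas wrap-around would turn a large positive result into a large negative one.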
Figure 4.7 Block diagram of an arithmetic logic unit
Figure 4.9 Register pointer updating algorithm for circular buffer addressing mode: SAR = start address register contents, EAR = end address register contents, PNTR = pointer
Figure 4.10 Different cases that arise in updating the pointer in circular buffer addressing mode
Figure 4.10 Continued
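The pointer update for circular buffer addressing can be sketched as below, using the SAR, EAR, and PNTR names from Figure 4.9. This is an illustrative sketch and not a transcription of the figure: it assumes that the step size does not exceed the buffer length, and the function name and example addresses are hypothetical.

```python
def update_pointer(pntr, step, sar, ear):
    """Advance PNTR by `step` (positive or negative) inside [SAR, EAR].

    Assumes abs(step) <= buffer length, so one wrap correction suffices.
    """
    length = ear - sar + 1
    new = pntr + step
    if new > ear:        # ran past the end: wrap back to the start region
        new -= length
    elif new < sar:      # ran before the start: wrap to the end region
        new += length
    return new


SAR, EAR = 0x100, 0x107                        # an 8-word buffer (assumed)
forward = update_pointer(0x106, 3, SAR, EAR)   # wraps past EAR
backward = update_pointer(0x101, -3, SAR, EAR) # wraps before SAR
inside = update_pointer(0x102, 2, SAR, EAR)    # no wrap needed
```

Doing this comparison and correction in the address generation unit removes the wrap test from the program's inner loop.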
Figure 4.11 Block diagram of an address generation unit
Bit-reversal Hardware
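Bit-reversed addressing is used to reorder the input or output of a radix-2 FFT. The sketch below computes the reversed index in software to show the mapping that the hardware provides; the function name is chosen for illustration.

```python
def bit_reverse(index, bits):
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the LSB of index into result
        index >>= 1
    return result


# Access order for an 8-point FFT (3 address bits)
order = [bit_reverse(i, 3) for i in range(8)]
```

In hardware the same mapping costs nothing at run time, because it amounts to wiring the address bits in reverse order.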
Figure 4.12 A conceptual diagram of a program sequencer
Instruction Level Parallelism
VLIW architecture
• Each instruction specifies several operations to be done in parallel
• Advantages: simple hardware; compilers can spot ILP easily
• Disadvantages: little compatibility between generations; explicit NOPs bloat code size
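The code-size cost of explicit NOPs can be illustrated with a small sketch. It packs groups of operations that may issue together into fixed-width instruction words and fills the unused slots with NOPs. The 4-slot width, the function name, and the operation strings are assumptions for illustration and do not describe any particular VLIW machine.

```python
def pack_bundles(groups, width=4):
    """Pack each group of parallel operations into fixed-width words.

    Unused slots are filled with explicit NOPs, as in a simple VLIW encoding.
    """
    bundles = []
    for group in groups:
        for start in range(0, len(group), width):
            chunk = group[start:start + width]
            bundles.append(chunk + ["NOP"] * (width - len(chunk)))
    return bundles


# Three dependent steps with little parallelism in each (hypothetical code)
groups = [["ld a", "ld b"], ["mul"], ["add", "st", "inc"]]
bundles = pack_bundles(groups)
nop_slots = sum(word.count("NOP") for word in bundles)
total_slots = sum(len(word) for word in bundles)
```

In this example half of the encoded slots are NOPs, which is the code bloat the slide refers to.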
Superscalar architecture
• Hardware is responsible for finding ILP in a sequential program
• Advantage: compatibility between generations
• Disadvantage: very complex hardware
Explicitly Parallel Instruction Computing (EPIC)
• Combines VLIW and superscalar architectures
• Instructions are grouped into 3 operation blocks and a template block
• The template block tells the hardware whether the instructions can be executed in parallel
• It also indicates whether the group as a whole can be executed in parallel with the next one
ILP versus Power
Increasing instructions per cycle
• Requires fewer cycles to execute a task
• Allows a longer clock period (lower clock rate) for the same performance
• Allows a lower supply voltage
• And hence uses less power
However, too many functional units and too many transitions per clock cycle increase power consumption.
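The trade-off above can be made concrete with the standard approximation for CMOS dynamic power, P = C x Vdd^2 x f. The sketch below compares a baseline processor with one that issues twice as many instructions per cycle at half the clock rate. All numbers are normalized, illustrative assumptions (doubled switched capacitance for the extra functional units, supply voltage lowered to 0.7 of the baseline); they are not measurements.

```python
def dynamic_power(capacitance, vdd, frequency):
    """Approximate CMOS dynamic power: P = C * Vdd^2 * f."""
    return capacitance * vdd ** 2 * frequency


baseline = dynamic_power(1.0, 1.0, 1.0)

# Twice the ILP at half the clock, with the supply lowered (assumed 0.7)
wide_low_vdd = dynamic_power(2.0, 0.7, 0.5)

# Same wide machine if the supply voltage cannot be lowered
wide_same_vdd = dynamic_power(2.0, 1.0, 0.5)

ratio_low_vdd = wide_low_vdd / baseline     # about 0.49 under these assumptions
ratio_same_vdd = wide_same_vdd / baseline   # no saving: extra hardware cancels the slower clock
```

Under these assumptions the saving comes entirely from the quadratic dependence on supply voltage; extra functional units alone do not reduce power.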
Low Power architecture
• Trade-off: power consumed by additional circuits vs. the ability to lower the clock rate while maintaining performance
• Circuits must be highly utilized
• Move complexity into software
• Voltage scaling: reduce Vdd
• Clock gating: turn off the clock when the chip is not in use (also applies to sub-modules of the chip)

• VLIW is more suitable than superscalar for low power
- VLIW is smaller for the same number of functional units
- The compiler is better at finding parallelism than hardware
• Put multiple processors on a chip rather than lots of functional units in one processor
- This helps in running independent tasks
General Purpose Microprocessor in 2000
• GHz clock speed
• 32-bit address or more
• 32-bit bus, 128-bit instructions
• Complex MMU
• Superscalar CPU
• MMX instructions
• On-chip cache
• Single-cycle execution
• 32-bit floating-point ALU on board
• Very expensive
• Tens of watts of power

DSP in 2000
• Clock 100 to 200 MHz
• 16-bit fixed point or 32-bit floating point
• 16- to 24-bit address space
• Large on-chip and off-chip memories
• Single-cycle execution of most instructions
• Harvard architecture
• Lots of special DSP instructions
• 50 mW to 2 W power
• Cheap
Future of DSP Microprocessor
• Sufficiently unique for an independent class of applications (HDD, cell phone)
• Low power consumption, low cost
• High performance within power and cost constraints (MIPS/mW, MIPS/$)
• Fixed point & floating point
• Better compilers - but users must be informed
• Hybrid DSP/GP systems