DSP Architectures
Additional Slides
Professor S. Srinivasan
Electrical Engineering Department
I.I.T.-Madras, Chennai –600 036
Figure 4.3(a) Block diagram of a barrel shifter
Figure 4.3(b) Implementation of a 4-bit, shift-right barrel shifter
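As a rough illustration of what a 4-bit, shift-right barrel shifter computes, the sketch below models it in Python as two multiplexer stages (shift by 1, then shift by 2), each enabled by one bit of the shift amount. The staged structure and the name barrel_shift_right4 are assumptions made for illustration; the implementation drawn in Figure 4.3(b) may be arranged differently, for example as a switch matrix.

```python
def barrel_shift_right4(x, shift):
    """Logical right shift of a 4-bit value by 0..3 positions.

    Models a barrel shifter as two multiplexer stages. Each stage is
    enabled by one bit of the shift amount, so any shift distance is
    produced in a single pass through the logic.
    """
    if not 0 <= x <= 0xF:
        raise ValueError("x must be a 4-bit value")
    if not 0 <= shift <= 3:
        raise ValueError("shift must be in 0..3")
    # Stage 1: shift by 1 if bit 0 of the shift amount is set.
    stage1 = (x >> 1) if (shift & 1) else x
    # Stage 2: shift by 2 if bit 1 of the shift amount is set.
    stage2 = (stage1 >> 2) if (shift & 2) else stage1
    return stage2 & 0xF
```

The point of the staged form is that the delay does not depend on the shift distance, unlike a shift register that moves one bit per clock.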
Figure 4.5 A MAC unit with accumulator guard bits
Figure 4.6 A schematic diagram of the saturation logic
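The interplay between accumulator guard bits and saturation logic can be sketched in Python as follows. The widths used here (16-bit operands, a 40-bit accumulator with 8 guard bits, and a 32-bit saturated result) are assumptions chosen for illustration and are not taken from Figures 4.5 and 4.6; the function names are hypothetical.

```python
ACC_BITS = 40   # assumed: 32-bit result plus 8 guard bits
OUT_BITS = 32   # assumed width of the saturated output

def wrap_signed(value, bits):
    """Wrap an integer into a two's-complement field of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mac(acc, a, b):
    """One multiply-accumulate step into the guarded accumulator."""
    return wrap_signed(acc + a * b, ACC_BITS)

def saturate(acc, bits=OUT_BITS):
    """Clamp to the largest or smallest value representable in `bits`."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, acc))

# Four products of about 2**30 each overflow 32 bits,
# but the guard bits keep the running sum exact.
acc = 0
for _ in range(4):
    acc = mac(acc, 32767, 32767)

result = saturate(acc)   # clamped to the 32-bit maximum
```

The guard bits let intermediate sums exceed the output range without wrapping; saturation is applied only when the result is narrowed, so an overflow becomes a clipped value rather than a sign flip.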
Figure 4.7 Block diagram of an arithmetic logic unit
Figure 4.9 Register pointer updating algorithm for circular buffer addressing mode: SAR = start address register contents, EAR = end address register contents, PNTR = pointer
Figure 4.10 Different cases that arise in updating the pointer in circular buffer addressing mode
Figure 4.10 Continued
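A minimal Python sketch of pointer updating in circular buffer addressing is given below, using the SAR, EAR and PNTR names from the Figure 4.9 caption. It assumes SAR <= EAR and an increment no larger in magnitude than the buffer length, so that one wrap correction suffices. The figures may cover further cases (for example SAR > EAR) that this sketch does not handle.

```python
def update_pointer(pntr, incr, sar, ear):
    """Advance a circular-buffer pointer by incr (may be negative).

    Assumes sar <= ear and abs(incr) <= buffer length, so a single
    wrap correction is enough.
    """
    length = ear - sar + 1
    if abs(incr) > length:
        raise ValueError("increment exceeds buffer length")
    new = pntr + incr
    if new > ear:        # ran past the end: wrap to the start
        new -= length
    elif new < sar:      # ran before the start: wrap to the end
        new += length
    return new

# Buffer occupying addresses 100..107 (8 locations).
forward = update_pointer(106, 3, 100, 107)    # 109 wraps to 101
backward = update_pointer(101, -3, 100, 107)  # 98 wraps to 106
inside = update_pointer(103, 2, 100, 107)     # 105, no wrap
```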
Figure 4.11 Block diagram of an address generation unit
Bit-reversal Hardware
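Bit-reversed addressing is typically used to reorder FFT inputs or outputs. The Python sketch below shows the address sequence such hardware would generate for an 8-point transform. The function name bit_reverse is hypothetical, and the loop is a software model of the reordering, not a description of the hardware on the slide.

```python
def bit_reverse(index, bits):
    """Return index with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move LSB of index into result
        index >>= 1
    return result

# Access order for an 8-point FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
# order is [0, 4, 2, 6, 1, 5, 3, 7]
```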
Figure 4.12 A conceptual diagram of a program sequencer
Instruction Level Parallelism
VLIW architecture
• Each instruction specifies several operations to be done in parallel
• Advantages: Simple hardware; compilers can spot ILP easily
• Disadvantages: Little compatibility between generations; explicit NOPs bloat code size
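To make the code-size cost of explicit NOPs concrete, the toy Python sketch below pads groups of parallel operations to a fixed issue width. The 4-slot width, the operation names and the pack_bundles helper are all hypothetical; real VLIW encodings and compilers are considerably more involved.

```python
def pack_bundles(groups, width):
    """Pad each group of parallel operations to a fixed-width bundle."""
    bundles = []
    for ops in groups:
        if len(ops) > width:
            raise ValueError("group has more operations than issue slots")
        # Unused issue slots must still be encoded, as explicit NOPs.
        bundles.append(list(ops) + ["NOP"] * (width - len(ops)))
    return bundles

# Three cycles of work for an assumed 4-slot machine.
groups = [["MUL", "ADD", "LOAD"], ["ADD"], ["STORE", "ADD"]]
bundles = pack_bundles(groups, width=4)
nops = sum(op == "NOP" for bundle in bundles for op in bundle)
# Half of the 12 encoded slots are NOPs in this example.
```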
Super scalar architecture
• Hardware is responsible for finding ILP in a sequential program
• Advantage: Compatibility between generations
• Disadvantage: Very complex hardware
Explicitly Parallel Instruction Computing (EPIC)
• Combines VLIW and super scalar architectures
• Instructions are grouped into 3 operating blocks and a template block
• Template block tells hardware if the instructions in the group can be executed in parallel
• Template block also indicates whether the group can be executed in parallel with the next group
ILP versus Power
Increasing instructions / cycle
Requires fewer cycles to execute a task
Allows a longer clock period for the same performance
Allows a lower supply voltage
And hence uses less power
However, too many functional units and too many transitions per clock cycle increase power consumption.
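The trade-off can be checked with the usual first-order model of CMOS dynamic power, P = C × Vdd² × f. In the Python sketch below, every number (switched capacitance, supply voltages, clock rates) is a hypothetical value chosen only to show the trend; none comes from the slides.

```python
def dynamic_power(c_eff, vdd, freq):
    """First-order CMOS dynamic power: P = C * Vdd^2 * f (in watts)."""
    return c_eff * vdd ** 2 * freq

# Baseline: one unit of switched capacitance at 200 MHz and 1.8 V.
base = dynamic_power(c_eff=1e-9, vdd=1.8, freq=200e6)

# Twice the instructions per cycle: assume twice the switched
# capacitance, half the clock rate, and a supply lowered to 1.2 V.
wide = dynamic_power(c_eff=2e-9, vdd=1.2, freq=100e6)

# Same wider machine, but without lowering the supply voltage.
wide_same_vdd = dynamic_power(c_eff=2e-9, vdd=1.8, freq=100e6)
```

Under these assumptions the saving comes entirely from the lower supply voltage: halving the clock while doubling the switched capacitance leaves power unchanged, which is why extra functional units that cannot be kept busy cost power rather than save it.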
Low Power architecture
Power consumed by additional circuits vs. ability to lower clock rate while maintaining performance
Circuits must be highly used
Move complexity into software
Voltage scaling: Reduce Vdd
Clock gating: Turn off clock when chip is not in use (applies to sub-modules of chip also)
VLIW is more suitable than super scalar for low power
- VLIW is smaller for same number of functional units
- Compiler is better at finding parallelism than hardware
Put multiple processors on chip rather than lots of functional units in one processor
Helps in running independent tasks
General Purpose Microprocessor in 2000
GHz clock speed
32-bit address or more
32-bit bus, 128-bit instructions
Complex MMU
Super scalar CPU
MMX instructions
On chip cache
Single cycle execution
32-bit floating point ALU on board
Very expensive
10s of watts of power
DSP in 2000
Clock 100 ~ 200 MHz
16-bit fixed point or 32-bit floating point
16-24 bits address space
Large on-chip and off-chip memories
Single cycle execution of most instructions
Harvard architecture
Lots of special DSP instructions
50 mW to 2 W power
Cheap
Future of DSP Microprocessor
Sufficiently unique for an independent class of applications (HDD, cell phone)
Low power consumption, low cost
High performance within power, cost constraints (MIPS/mW, MIPS/$)
Fixed point & floating point
Better compilers - but users must be informed
Hybrid DSP/ GP systems