Lower Power Algorithm for Multimedia Systems(1)


L27:Lower Power Algorithm
for Multimedia Systems
1999. 8
Sungkyunkwan University, Jun-Dong Cho
http://vada.skku.ac.kr
Contents
• Algorithmic Effects on Low Power
• Low Power Management
• Low Power Applications
– Low Power Video Processor
– Single Chip Video Camera
– Vector Quantization
– Data Encoding
– CDMA Searcher
– Viterbi Decoder
Low Power Algorithm
Algorithm Selection
• Example: 8x8 matrix DCT
Strength Reduction: DIGLOG multiplier
$C_{mult}(n) = 253n^2$, $C_{add}(n) = 214n$,
where $n$ = word length in bits
$A = 2^j + A_R$, $B = 2^k + B_R$
$A \cdot B = (2^j + A_R)(2^k + B_R) = 2^{j+k} + 2^j B_R + 2^k A_R + A_R B_R$
                        1st Iter    2nd Iter    3rd Iter
Worst-case error        -25%        -6%         -1.6%
Prob. of Error < 1%     10%         70%         99.8%
With an 8 by 8 multiplier, the exact result is obtained in at most
seven iteration steps (worst case).
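The iteration described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `diglog_multiply` and the `iterations` parameter are hypothetical, operands are non-negative integers, and each pass applies the decomposition $A = 2^j + A_R$, $B = 2^k + B_R$ to the residual product $A_R \cdot B_R$ left over from the previous pass. How the slides count iteration steps may differ from the loop counter used here.

```python
def diglog_multiply(a: int, b: int, iterations: int) -> int:
    """Approximate a * b using only shifts and adds (DIGLOG-style sketch).

    Each pass writes a = 2^j + a_r and b = 2^k + b_r, accumulates
    2^(j+k) + 2^j * b_r + 2^k * a_r, and leaves a_r * b_r for the next pass.
    Stopping early drops the remaining a_r * b_r term (an under-estimate).
    """
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break  # remaining product is zero, so the result is exact
        j = a.bit_length() - 1          # position of the leading one in a
        k = b.bit_length() - 1          # position of the leading one in b
        a_r = a - (1 << j)              # residual of a
        b_r = b - (1 << k)              # residual of b
        result += (1 << (j + k)) + (b_r << j) + (a_r << k)
        a, b = a_r, b_r                 # next pass refines a_r * b_r
    return result


def relative_error(a: int, b: int, iterations: int) -> float:
    """Signed relative error of the approximation (negative = under-estimate)."""
    exact = a * b
    return (diglog_multiply(a, b, iterations) - exact) / exact
```

Calling `relative_error(255, 255, 1)` shows the under-estimate after a single pass, and raising `iterations` shrinks it, which is the trade-off between accuracy and work that the table summarizes.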
Logarithmic Number System
$L_x = \log_2 |x|$,
$L_{A \cdot B} = L_A + L_B$, $L_{A/B} = L_A - L_B$,
$L_{A^2} = L_A \ll 1$, $L_{\sqrt{A}} = L_A \gg 1$,
--> Significant Strength Reduction
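A minimal sketch of the strength reduction in a logarithmic number system, assuming a fixed-point log representation with `FRAC_BITS` fractional bits (an assumed parameter, not taken from the slides). All function names are hypothetical, and the sign of x and the value zero are not handled here. Multiplication and division become an addition and a subtraction, while squaring and square root become one-bit shifts.

```python
import math

FRAC_BITS = 8  # assumed fixed-point precision of the stored logarithm


def to_lns(x: float) -> int:
    """Encode |x| (x != 0) as a fixed-point value of log2|x|."""
    return round(math.log2(abs(x)) * (1 << FRAC_BITS))


def from_lns(l: int) -> float:
    """Decode a fixed-point logarithm back to a magnitude."""
    return 2.0 ** (l / (1 << FRAC_BITS))


def lns_mul(la: int, lb: int) -> int:
    return la + lb      # log(A * B) = log A + log B


def lns_div(la: int, lb: int) -> int:
    return la - lb      # log(A / B) = log A - log B


def lns_square(la: int) -> int:
    return la << 1      # log(A^2) = 2 * log A, a left shift by one


def lns_sqrt(la: int) -> int:
    return la >> 1      # log(sqrt A) = log A / 2, a right shift by one
```

For example, `from_lns(lns_mul(to_lns(6), to_lns(7)))` returns a value close to 42. The small deviation comes from quantizing the logarithm to `FRAC_BITS` fractional bits.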
Switching Activity Reduction
[Figure: (a) Average activity in a multiplier as a function of the constant value. (b) Parallel and serial implementations of an adder tree.]
System-Level Solutions
• System management, system partitioning, algorithm selection
• Precompute the physical capacitance of the interconnect and the switching
activity (number of bus accesses)
• Regularity: to minimize the power in the control hardware and the
interconnection network.
• Modularity: to exploit data locality through distributed processing
units, memories and control.
– Spatial locality: an algorithm can be partitioned into natural
clusters based on connectivity
– Temporal locality: average lifetimes of variables (less temporary
storage; data referenced in the recent past is likely to be
accessed again).
• Few memory references: references to memories are
expensive in terms of power.
System-Level Solutions - cont.
• Simulator: Instruction-level Energy
Estimation
• Software: Energy Efficient Algorithms
• OS: Voltage Scheduling Algorithms
• OS: Multiprocessing for Energy
• Microprocessor: Dynamic Caches
Processor Systems: High Power
• Thinkpad (Pentium) 0.3 Hours/AA
• InfoPad (ARM) 0.8 Hours/AA
• Toshiba Portable (486) 0.9 Hours/AA
• Newton (ARM) 2.0 Hours/AA
Operations per Battery Life: Minimize Energy Consumed per Operation
Operations per Second: Maximize Throughput (Operations/second)
DPM vs SPM
Identify power-hungry modules and look for opportunities to reduce power
• DPM (Dynamic Power Management): stops the clock, produced by the
clock generators, from switching in a specific unit.
• SPM (Static Power Management): when the system remains idle for a
significant period of time, it is shut down.
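The two policies can be contrasted with a small Python sketch. Everything here is an illustrative assumption rather than a description of a real power manager: the `Unit` class, the per-unit power figures used in the usage note, the idle timeout, and the simplification that a clock-gated unit draws no power (leakage is ignored).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Unit:
    name: str
    active_power_mw: float      # assumed power while the unit is clocked
    clock_enabled: bool = True


def dpm_power(units: List[Unit], busy_names: Iterable[str]) -> float:
    """DPM sketch: gate the clock of every unit that is not busy right now.

    Returns the power drawn by the units that remain clocked.
    """
    busy = set(busy_names)
    total = 0.0
    for unit in units:
        unit.clock_enabled = unit.name in busy
        if unit.clock_enabled:
            total += unit.active_power_mw
    return total


def spm_should_shut_down(idle_time_s: float, timeout_s: float) -> bool:
    """SPM sketch: shut the whole system down after a long enough idle period."""
    return idle_time_s >= timeout_s
```

With hypothetical units `Unit("fpu", 100.0)` and `Unit("alu", 50.0)`, calling `dpm_power(units, {"alu"})` gates the FPU clock and counts only the ALU's power, while `spm_should_shut_down` acts on the system as a whole.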
Vdd vs Delay
• Use variable voltage scaling or scheduling for real-time processing
• Use architecture optimization to compensate for slower operation,
e.g., parallel processing and pipelining, which increase concurrency
and shorten the critical path.
• Scale down device sizes to compensate for delay (interconnects
do not scale proportionately and can become dominant)
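The trade-off between supply voltage and delay can be explored with a rough Python sketch. It assumes the common first-order models, dynamic power proportional to C·Vdd²·f and gate delay proportional to Vdd/(Vdd − Vt)², neither of which is stated in the slides. The function names, the threshold voltage, and the 5 V baseline are hypothetical values chosen for illustration, and the overhead of the extra routing and multiplexing in a parallel design is ignored.

```python
def dynamic_power(c_eff: float, vdd: float, freq: float) -> float:
    """First-order dynamic power: P = C * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq


def gate_delay(vdd: float, vt: float, k: float = 1.0) -> float:
    """First-order delay model: delay = k * Vdd / (Vdd - Vt)^2, for Vdd > Vt."""
    return k * vdd / (vdd - vt) ** 2


def min_vdd_for_delay(target_delay: float, vt: float, v_max: float,
                      k: float = 1.0, steps: int = 60) -> float:
    """Lowest supply voltage that still meets target_delay (bisection).

    Relies on gate_delay decreasing monotonically as Vdd rises above Vt,
    and on v_max being fast enough to meet the target.
    """
    lo, hi = vt + 1e-9, v_max
    for _ in range(steps):
        mid = (lo + hi) / 2
        if gate_delay(mid, vt, k) > target_delay:
            lo = mid    # too slow: needs a higher voltage
        else:
            hi = mid    # fast enough: try a lower voltage
    return hi


def parallel_power_ratio(vdd_ref: float, vt: float, ways: int) -> float:
    """Power of a `ways`-parallel datapath relative to the single baseline.

    Each copy runs at 1/ways of the clock rate, so it may be `ways` times
    slower; capacitance grows by `ways` (overhead ignored).
    """
    relaxed_delay = ways * gate_delay(vdd_ref, vt)
    vdd_new = min_vdd_for_delay(relaxed_delay, vt, vdd_ref)
    baseline = dynamic_power(1.0, vdd_ref, 1.0)
    parallel = dynamic_power(float(ways), vdd_new, 1.0 / ways)
    return parallel / baseline
```

Calling `parallel_power_ratio(5.0, 0.8, 2)` compares a two-way parallel datapath against the baseline at the same throughput. The saving comes entirely from the lower supply voltage that the relaxed timing allows.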
Power PC 603 Strategy
• Baseline: supply the right voltage and the right frequency to each part of
the system. If the system has to wait for some input to occur,
only a small circuit needs to wait and wake up the main circuit
when the input occurs.
• PowerPC 603 is a 2-issue processor (2 instructions read at a time) with 5
parallel execution units.
• 4 modes:
– Full on mode for full speed
– Doze mode, in which the execution units are not running
– Nap mode, which also stops the bus clocking
– Sleep mode, which stops the clock generator with or without
the PLL (20-100mW).
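The four modes listed above can be summarized as a small lookup table. The sketch below is a simplified model built only from that list: the block names (`execution_units`, `bus`, `clock_generator`) and the selection function `deepest_mode` are hypothetical and do not describe the actual PowerPC 603 control registers or wake-up logic.

```python
from enum import Enum
from typing import Set


class Mode(Enum):
    FULL_ON = "full on"
    DOZE = "doze"
    NAP = "nap"
    SLEEP = "sleep"


# Which blocks are still clocked in each mode, following the list above.
RUNNING = {
    Mode.FULL_ON: {"execution_units", "bus", "clock_generator"},
    Mode.DOZE: {"bus", "clock_generator"},      # execution units stopped
    Mode.NAP: {"clock_generator"},              # bus clocking also stopped
    Mode.SLEEP: set(),                          # clock generator stopped
}


def deepest_mode(required_blocks: Set[str]) -> Mode:
    """Pick the most power-saving mode that keeps the required blocks clocked.

    Hypothetical policy for illustration: try the deepest mode first.
    """
    for mode in (Mode.SLEEP, Mode.NAP, Mode.DOZE, Mode.FULL_ON):
        if required_blocks <= RUNNING[mode]:
            return mode
    raise ValueError(f"unknown blocks requested: {required_blocks}")
```

For instance, `deepest_mode({"bus"})` selects doze mode, because nap mode would already stop the bus clocking.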
Power PC 603 Power Management
TI Structures
• Two DSPs: TMS320C541, TMS320C542 reduce power, chip count and
system cost for wireless communication applications
• C54X DSPs, 2.7V, 5V, Low-Power Enhanced Architecture DSP (LEAD) family:
with three different power-down modes, these devices are well suited for wireless
communications products such as digital cellular phones, personal digital
assistants, and wireless modems, with low-power voice coding and decoding
• The TMS320LC548 features:
– 15-ns (66 MIPS) or 20-ns (50 MIPS) instruction cycle times
– 3.0- and 3.3-V operation
• 32K 16-bit words of RAM and 2K 16-bit words of boot ROM on-chip
• Integrated Viterbi accelerator that reduces the Viterbi butterfly update to four
instruction cycles for GSM channel decoding
• Powerful single-cycle instructions (dual operand, parallel instructions, conditional
instructions)
InfoPad Architecture, UC-Berkeley
[Figure: InfoPad system diagram. Labels: Internet, Wireless Basestation, “PadServer”, Speech Recognizer, Web Browser, InfoPad. Example: hand-held speech-enabled web browser.]
Transmit audio and raw bitmaps across the wireless link
Maintain state in the network, not on the Pad
Perform all computation in the network to minimize client
energy dissipation
InfoPad Hardware Flexibility
Main data-flow handled by custom low-power ASICs
Embedded software responsible for high-level functions
Only header sent to microprocessor
Entire packet routed to dedicated hardware
[Figure: packet routing diagram. Labels: Radio, RX Packet, Packet Header, 10 MIPS μProcessor, Control, Statistics, Reliability, Debugging, Framebuffer update, Frame Buffer.]
• Use hardware/software integration to
provide energy-efficient high-level functionality
Multimedia I/O Terminal
InfoPad Evolution
Total Power: ~7 W
Where did the power go?
[Figure labels: Inefficient implementation; InfoPad; Commercial DC/DC; Energy-Efficient Processors; Intercom; No local computation?; Commercial radios]
• High-level system design optimizes the complete
solution and drives new research
Power-Down Techniques
Low Power Memory