Embedded System Hardware - University of Saskatchewan

Embedded System Hardware
Embedded system hardware is frequently used in a loop
("hardware in a loop"):
[Figure: hardware loop, including actuators]
Embedded Systems - processing
- 1-
Processing units
Need for efficiency (power + energy):
Why worry about
energy and power?
“Power is considered as the most important constraint
in embedded systems”
[in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
Current UMTS phones can hardly be operated for more
than an hour, if data is being transmitted.
[from a report of the Financial Times, Germany, on an analysis by Credit
Suisse First Boston; http://www.ftd.de/tm/tk/9580232.html?nv=se]
The energy/flexibility conflict
[Figure: intrinsic power efficiency in operations per watt [MOPS/mW], axis from 0.01 to 10, plotted against technology (1.0µ, 0.5µ, 0.25µ, 0.13µ, 0.07µ). Curve labels: Ambient Intelligence, DSP-ASIPs, µPs, poor design techniques, generation.]
Necessary to optimize!
[H. de Man, Keynote, DATE'02;
T. Claasen, ISSCC99]
Power and energy are related to each other
$E = \int P \, dt$
[Figure: power P plotted over time t; the energy E is the area under the curve.]
In many cases, faster execution also means less energy,
but the opposite may be true if power has to be increased
to allow faster execution.
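The relation between power, time and energy can be illustrated with a small sketch. It assumes a piecewise-constant power trace, so the integral becomes a sum of P * dt terms; the function name `energy_joules` and all numbers are hypothetical and serve only to show both cases mentioned above.

```python
def energy_joules(power_trace):
    """Sum P * dt over a piecewise-constant power trace.

    power_trace: list of (power_in_watts, duration_in_seconds) pairs.
    """
    return sum(power_w * duration_s for power_w, duration_s in power_trace)


# Hypothetical numbers, chosen only to illustrate the two cases.
baseline = [(1.0, 2.0)]           # 1 W for 2 s
faster_same_power = [(1.0, 1.0)]  # same power, half the time
faster_more_power = [(3.0, 1.0)]  # half the time, but three times the power

e_baseline = energy_joules(baseline)
e_same = energy_joules(faster_same_power)
e_more = energy_joules(faster_more_power)

print(e_baseline, e_same, e_more)
```

With these assumed values, halving the run time at constant power halves the energy, whereas halving the run time at three times the power increases the energy.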
Low Power vs. Low Energy Consumption
• Minimizing the power consumption is important for
  – the design of the power supply
  – the design of voltage regulators
  – the dimensioning of interconnect
  – short-term cooling
• Minimizing the energy consumption is important because of
  – restricted availability of energy (mobile systems)
    • limited battery capacities (only slowly improving)
    • very high costs of energy (solar panels, in space)
  – cooling
    • high costs
    • limited space
  – dependability
    • long lifetimes, low temperatures
Application Specific Circuits (ASICs)
or Full Custom Circuits
Custom-designed circuits are
necessary if ultimate speed or
energy efficiency is the goal and
large numbers can be sold.
The approach suffers from long
design times and high costs (e.g.
mask costs in the millions of dollars).
Processors
At the chip level, embedded chips include micro-controllers
and microprocessors. Micro-controllers are the true
workhorses of the embedded family. They are the original
'embedded chips' and include those first employed as
controllers in elevators and thermostats [Ryan, 1995].
Key requirements:
1. Energy-efficiency
2. Code-size efficiency:
Memory is a scarce resource in embedded systems,
in particular for “systems-on-a-chip”.
3. Run-time efficiency
New ideas can actually reduce
energy consumption
[Figure: Pentium and Crusoe compared while running the same multimedia application. As published by Transmeta [www.transmeta.com]]
Dynamic power management (DPM)
Example: STRONGARM SA1100
RUN: operational (400 mW)
IDLE: a software routine may stop the CPU when not in use, while monitoring interrupts (50 mW)
SLEEP: shutdown of on-chip activity (160 µW)
State transition times:
RUN -> IDLE: 10 µs
IDLE -> RUN: 10 µs
RUN -> SLEEP: 90 µs
IDLE -> SLEEP: 90 µs
SLEEP -> RUN: 160 ms
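Whether entering SLEEP pays off depends on how long the processor stays unused. The sketch below estimates a break-even gap length from the SA1100 figures quoted above. Two things are assumptions and not taken from the slide: that the 90 µs and 160 ms values belong to the IDLE -> SLEEP and SLEEP -> RUN transitions, and that the processor draws RUN power during transitions. All function names are hypothetical.

```python
# Power per state in watts (figures quoted for the SA1100 example).
P_RUN = 0.400
P_IDLE = 0.050
P_SLEEP = 160e-6

# Transition times in seconds. ASSUMPTION: 90 us is IDLE -> SLEEP and
# 160 ms is SLEEP -> RUN.
T_TO_SLEEP = 90e-6
T_WAKE_UP = 160e-3
T_OVERHEAD = T_TO_SLEEP + T_WAKE_UP


def energy_stay_idle(gap_s):
    """Energy in joules if the CPU simply idles for the whole gap."""
    return P_IDLE * gap_s


def energy_via_sleep(gap_s, p_transition=P_RUN):
    """Energy in joules if the CPU sleeps during the gap.

    ASSUMPTION: transitions are charged at p_transition (RUN power by default).
    Returns None if the gap is shorter than the transition overhead.
    """
    if gap_s < T_OVERHEAD:
        return None
    return p_transition * T_OVERHEAD + P_SLEEP * (gap_s - T_OVERHEAD)


def break_even_gap(p_transition=P_RUN):
    """Gap length at which sleeping and idling cost the same energy.

    Solves P_IDLE * T = p_transition * T_OVERHEAD + P_SLEEP * (T - T_OVERHEAD).
    """
    return T_OVERHEAD * (p_transition - P_SLEEP) / (P_IDLE - P_SLEEP)


print(f"break-even gap: {break_even_gap():.3f} s")
```

Under these assumptions, gaps shorter than the break-even length are better spent in IDLE; a power manager would only select SLEEP when it expects a longer gap.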
Fundamentals of dynamic voltage scaling
(DVS)
Power consumption of CMOS
circuits (ignoring leakage):
$P = \alpha \, C_L \, V_{dd}^2 \, f$
with
α: switching activity
C_L: load capacitance
V_dd: supply voltage
f: clock frequency
Delay for CMOS circuits:
$\tau = k \, C_L \, \frac{V_{dd}}{(V_{dd} - V_t)^2}$
with
V_t: threshold voltage
(V_t substantially smaller than V_dd)
⇒ Decreasing V_dd reduces P quadratically,
while the run-time of algorithms is only linearly increased
(ignoring the effects of the memory system).
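The two CMOS formulas above can be evaluated directly. The following is a minimal sketch; the function names and all parameter values are hypothetical, and the energy formula additionally assumes that a task needs a fixed number of clock cycles, so that energy = P * (cycles / f).

```python
def dynamic_power(alpha, c_load, vdd, f):
    """P = alpha * C_L * Vdd^2 * f (leakage ignored)."""
    return alpha * c_load * vdd ** 2 * f


def gate_delay(k, c_load, vdd, vt):
    """tau = k * C_L * Vdd / (Vdd - Vt)^2."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * c_load * vdd / (vdd - vt) ** 2


def task_energy(alpha, c_load, vdd, cycles):
    """Energy for a fixed cycle count: P * cycles / f = alpha * C_L * Vdd^2 * cycles."""
    return alpha * c_load * vdd ** 2 * cycles


# Hypothetical parameters, for illustration only.
ALPHA, C_L, K, VT = 0.5, 1e-9, 1.0, 0.4

for vdd in (5.0, 2.5):
    p = dynamic_power(ALPHA, C_L, vdd, 1e8)
    tau = gate_delay(K, C_L, vdd, VT)
    e = task_energy(ALPHA, C_L, vdd, 1_000_000)
    print(f"Vdd={vdd} V  P={p:.4f} W  tau={tau:.3e} s  E={e:.3e} J")
```

Halving V_dd at a fixed clock frequency quarters the power, while the delay grows by a factor close to two as long as V_t is small compared with V_dd.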
Voltage scaling: Example
[Figure: voltage scaling example as a function of Vdd. Courtesy: Yasuura, 2000]
Exploitation discussed in codesign chapter.
Code-size efficiency
• CISC machines: RISC machines are designed for run-time efficiency,
not for code-size efficiency
• Compression techniques: key idea
Code-size efficiency
• Compression techniques (continued):
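– 2nd instruction set, e.g. ARM Thumb instruction set:
16-bit Thumb instr.: ADD Rd #constant
  001 (major opcode) | 10 (minor opcode) | Rd (source = destination) | Constant
Expanded 32-bit ARM instruction:
  1110 | 001 | 01001 | 0 Rd | 0 Rd | 0000 | Constant
Fields are zero extended where the ARM format is wider (0 Rd, 0000 Constant).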
• Reduction to 65-70 % of original code size
• 130% of ARM performance with 8/16 bit memory
• 85% of ARM performance with 32-bit memory
[ARM, R. Gupta]
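The expansion of the 16-bit Thumb instruction into its 32-bit ARM counterpart can be sketched as bit manipulation. The field layout follows the encoding shown above (major opcode 001, minor opcode 10, 3-bit Rd, 8-bit constant); the function name `expand_thumb_add_imm` is hypothetical, and the sketch covers only this one instruction form.

```python
def expand_thumb_add_imm(thumb):
    """Expand a 16-bit Thumb 'ADD Rd, #constant' into a 32-bit ARM word.

    Thumb layout: 001 | 10 | Rd (3 bits) | Constant (8 bits)
    ARM layout:   1110 | 001 | 01001 | 0 Rd | 0 Rd | 0000 | Constant
    """
    if (thumb >> 11) & 0b11111 != 0b00110:
        raise ValueError("not a Thumb ADD Rd, #constant instruction")
    rd = (thumb >> 8) & 0b111   # source and destination register
    constant = thumb & 0xFF     # 8-bit immediate

    return (
        (0b1110 << 28)     # condition field: always
        | (0b001 << 25)    # immediate operand format
        | (0b01001 << 20)  # ADD opcode with flag update
        | (rd << 16)       # first operand register, zero extended to 4 bits
        | (rd << 12)       # destination register, zero extended to 4 bits
        | (0b0000 << 8)    # no rotation of the immediate
        | constant
    )


# ADD r1, #5 in Thumb is 0011 0001 0000 0101.
arm_word = expand_thumb_add_imm(0b0011000100000101)
print(hex(arm_word))
```

Because every Thumb instruction of this form maps to exactly one ARM instruction, a decoder of this kind can sit in front of an unchanged ARM execution core.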
Two-level control store concept
(indirect addressing of instructions)
[Figure: the instruction address indexes a memory S whose entries are much narrower than 32 bits (<< 32 bit); each entry of S addresses a table of used instructions (32 bit), which supplies the instruction to the CPU.]
For each instruction address, S contains the table address of the instruction.
Each instruction pattern is stored only once, and not repeatedly stored for each instruction address for which it is needed.
Similar to the concept of a colour lookup table.
Can be extended to include subroutines in the lookup table.
Called nanoprogramming in the Motorola 68000.
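The indirect addressing scheme can be sketched in a few lines: each distinct instruction word is stored once in a table, and S holds a narrow index per instruction address. The function names and the example program are hypothetical; the size estimate assumes that every entry of S uses the minimum number of bits needed to address the table.

```python
def build_control_store(program):
    """Split a program into S (one table index per address) and a table of unique words."""
    table = []
    index_of = {}
    s = []
    for word in program:
        if word not in index_of:
            index_of[word] = len(table)
            table.append(word)
        s.append(index_of[word])
    return s, table


def fetch(s, table, address):
    """Two-level fetch: S yields the table address, the table yields the instruction."""
    return table[s[address]]


def compressed_bits(s, table, word_bits=32):
    """Storage in bits for S plus the table (index width is an assumption)."""
    index_bits = max(1, (len(table) - 1).bit_length())
    return len(s) * index_bits + len(table) * word_bits


# Hypothetical program with repeated 32-bit instruction words.
program = [0xE2911005, 0xE1A00000, 0xE2911005,
           0xE1A00000, 0xE2911005, 0xEAFFFFFE]
s, table = build_control_store(program)

print(s, [hex(w) for w in table])
print(compressed_bits(s, table), "bits instead of", len(program) * 32)
```

The saving grows with the number of repeated instruction patterns; a program in which every word is unique would become larger, since S is added on top of the full table.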
Run-time optimization:
Domain-oriented architectures (DSP)
Application: $y[j] = \sum_{i=0}^{n-1} x[j-i] \cdot a[i]$
$\forall i: 0 \le i \le n-1: \; y_i[j] = y_{i-1}[j] + x[j-i] \cdot a[i]$
Architecture: Example: Data path ADSP210x
[Figure: ADSP 2100 data path. Address registers A0, A1, A2, .. in the address generation unit (AGU) supply the addresses i+1 and j-i+1 to the memories P and D, which hold a and x. ALU with registers AX, AY, AF, AR (operations +,-,..). Multiplier with registers MX (x[j-i]), MY (a[i]) and MF forms x[j-i]*a[i], which is added to y_{i-1}[j] in MR.]
- Parallelism
- Dedicated registers
MR:=0; A1:=1; A2:=n-2;
MX:=x[n-1]; MY:=a[0];
for (j:=1 to n)
{ MR:=MR+MX*MY;
  MY:=a[A1]; MX:=x[A2];
  A1++; A2-- }
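The multiply/accumulate loop can be mirrored in software to check that it computes the filter sum. The sketch below is a plain Python model, not ADSP code: the names `fir_output` and `mac_loop` are hypothetical, the register names follow the pseudocode above, and a guard is added so that the final iteration does not prefetch beyond the arrays.

```python
def fir_output(x, a, j):
    """Direct evaluation of y[j] = sum over i of x[j-i] * a[i]; requires j >= len(a) - 1."""
    n = len(a)
    if j < n - 1:
        raise ValueError("j must be at least n - 1")
    return sum(x[j - i] * a[i] for i in range(n))


def mac_loop(x, a):
    """Register-level model of the MAC loop, computing y[n-1].

    MR accumulates, MX and MY hold the operands fetched one iteration ahead,
    A1 and A2 play the role of the address registers.
    """
    n = len(a)
    mr = 0
    a1, a2 = 1, n - 2
    mx, my = x[n - 1], a[0]
    for _ in range(n):
        mr += mx * my              # multiply/accumulate
        if a1 < n:                 # guard added here: no prefetch past the end
            my, mx = a[a1], x[a2]  # fetch next operands
        a1 += 1                    # address updates done by the AGU
        a2 -= 1
    return mr


x = [1, 2, 3]
a = [1, 0, -1]
print(mac_loop(x, a), fir_output(x, a, 2))
```

On the DSP, the accumulate, the two operand fetches and the address updates of one iteration can proceed in parallel, which is what the dedicated registers and the separate address generation unit make possible.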
Digital Signal Processing (DSP) Processors - Features (1)
• Multiply/accumulate (MAC) and zero-overhead loop (ZOL) instructions (as shown)
• Heterogeneous registers (as shown)
• Separate address generation units (AGUs) (as in ADSP 210x)
Digital Signal Processing (DSP) Processors - Features (2)
• Modulo addressing: Am++ ≡ Am:=(Am+1) mod n
(implements ring or circular buffer in memory)
[Figure: sliding window over the signal x between times t1 and t2.]
Memory, t=t1: .., x[n-2], x[n-1], x[0], x[1], ..
Memory, t2=t1+1: .., x[n-3], x[n-2], x[n-1], x[n], x[1]
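Modulo addressing can be modelled in software as a ring buffer whose write pointer wraps around. The following is a minimal sketch; the class name `CircularBuffer` and its methods are hypothetical, and the attribute `am` stands in for the address register Am.

```python
class CircularBuffer:
    """Ring buffer of n samples using the update Am := (Am + 1) mod n."""

    def __init__(self, n):
        self.n = n
        self.data = [0] * n
        self.am = 0  # models the address register Am

    def push(self, sample):
        """Overwrite the oldest sample, then advance Am with wrap-around."""
        self.data[self.am] = sample
        self.am = (self.am + 1) % self.n

    def window(self):
        """Samples of the sliding window, ordered from oldest to newest."""
        return self.data[self.am:] + self.data[:self.am]


buf = CircularBuffer(4)
for sample in range(4):     # fill with x[0] .. x[3]
    buf.push(sample)
state_t1 = list(buf.data)   # memory at time t1

buf.push(4)                 # x[4] arrives and replaces x[0]
state_t2 = list(buf.data)   # memory at time t2 = t1 + 1

print(state_t1, state_t2, buf.window())
```

Only one memory word changes per time step; without modulo addressing, the whole window would have to be shifted or the wrap-around checked explicitly in software.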