TajanaRosingTalk
Download
Report
Transcript TajanaRosingTalk
Resource management in
embedded systems
Tajana Šimunić Rosing
HP Labs & Stanford University
Embedded System Trends
Increasingly complex systems
Mixed hard & soft real-time requirements
Coordination of subsystem & multi-system control
Demand for dynamic & adaptive response
Operation in unpredictably changing contexts
Variable performance demands
Management of resources (power, performance, availability,
accessibility, throughput, security etc.)
Demand for autonomy
Human feasibility restrictions
Faster hardware
Fast processors and networks
Integrated processing, common platforms
FPGAs vs ASICs, DSPs; SoCs
Source : NSF
EU-US
Workshop
‘01
Tajana
Simunic
Rosing
Overview of Contributions
Resource
management in wireless embedded systems
Utilize
information already present in the system to deliver lower
energy consumption with excellent quality of service
Simultaneous increase in accessibility
Power
management of embedded HW
Evolution
of SoCs into NoCs
Optimal power and performance management policy for NoCs
Energy
optimization of embedded software
Methodology
for lowering energy consumption in data-intensive
embedded software algorithms using symbolic algebra
techniques
Tajana Simunic Rosing
Resource Management of
Heterogeneous Wireless
Embedded Systems
Networked Embedded Systems – Sensor Nodes
Power consumption of sensor node subsystems
Power (mW)
20
15
10
5
0
SENSORS CPU
Energy breakdown for voice
TX
RX
IDLE
SLEEP
Energy breakdown for MPEG video
Encode Decode
Receive
RADIO
Encode
Transmit
Decode
Transmit
Receive
Lucent WaveLAN at 2 Mbps & SA-1100 CPU at 150 MPIS
Source : Mobicom’01
SensorsTutorial
Tajana Simunic
Rosing
802.11b PM - Doze Mode
- MAC layer PM is not sufficient due to continual network polling for data, increased
RTT and broadcast traffic issues
- Network layer management has no knowledge of application characteristics or of
behavior of other clients in the environment
2
Heavy traffic conditions:
the card is awake most of the time
1.8
1.6
Power(W)
1.4
awake state
1.2
1
0.8
0.6
0.4
2
0.2
1.8
987
929
871
813
755
697
639
581
523
465
407
349
291
233
175
117
59
1
0
1.6
tim e (10-4 s e c)
1.2
1
0.8
doze state
0.6
0.4
0.2
Medium traffic conditions:
the card is awake half of the time
945
886
827
768
709
650
591
532
473
414
355
296
237
178
119
time (10-4 sec)
network polling
2
1.8
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
tim e (10-4 s e c)
973
919
865
811
757
703
649
595
541
487
433
379
325
271
217
163
109
1
0
55
Light traffic conditions:
the card is sleeping most of the time
Power (W)
1
0
60
Power (W)
1.4
Tajana Simunic Rosing
Related Work in PM for Networks
Homogeneous
networks:
802.11e standard allows applications to specify their QoS needs
Separate control and data channels for 802.11 based networks [Shi’02]
• Decoupled low power control channel, device sends a wake-up call
Protocols for sensor networks [Estrin’02]
• Low duty cycle operations on radio
Power efficient coordination in 802.11b networks [Chen’01]
• Forwarding nodes remain active while neighboring nodes are in power save
Heterogeneous
networks
Improved network and link layers for mobility
• Mobile IP protocol for host mobility [Monarch CMU]
• Contact networking [HPL & UIUC, MobiSys '03 ]
Improved hand-off mechanism in overlay networks
• Buffering of data at multiple base stations [Barwan UCB]
Distributed file system for wireless [Coda,Odyssey CMU]
• Caching of files at clients and duplication on accessible servers
• Adjust quality of accessed data to match available resources
Tajana Simunic Rosing
Related work in PM
Command
Policy
Data
Heuristic techniques
Power
Manager
Focus on power
management of a single
device (e.g. CPU, hard
disk)
Time-out and predictive
models [Karlin’94, Hwang 97]
DVS algorithms
Stochastic methods
Requests
Active
MDP [Benini 99, Qiu 99]
• Memoryless
distributions only
• Discrepancy between
predicted and measured
savings
TISMDP [Simunic 01]
• Finite history of user
occurrences
• Decisions based on
event occurrences
• Large energy savings
measured
Queue
4 32 1
Idle
min
Client Power States
Active
Idle
Sleep
Costenergy( s, a) f ( s, a)
s a
s.t.
f ( s, a) M ( s' , s, a) f ( s' , a) 0;
a
s '
s' a
T ( s, a) f ( s, a) 1
s a
Cost perf ( s, a) f ( s, a) Constr.
s a
Tajana Simunic Rosing
System Resource Management
Concentration on resource management
aspects of diverse devices in a
heterogeneous wireless network
• maintain desired QoS
• increase accessibility
• maximize battery lifetime
Implementation of policies to define
• Which devices should communicate using
which network interface
• When should this communication take place
• When should the communicating device be
in a low power state
Client RM
Server RM
Separation of policy and control
“smart” clients communicate their needs
to the resource manager
stochastic modeling of communication
patterns for independent clients
policy takes into account client’s needs
and determines control decisions
WAN
WLAN
WPAN
Tajana Simunic Rosing
Comparison
Server-driven Management
State of the art
DATA
Server
Client
Client
Server
PM CONTROLS
Server
Continuously
Streaming
wireless
always on
• 802.11b
• Doze interval regulation
• Manual on/off
• Bluetooth
• Park,sniff,hold modes have to be
specifically initiated by the server
Server
Adaptive
Buffering
wireless
on only when
data arrives
• Control power state of all components
• Use application knowledge – higher
savings possible than just at MAC and
network layers
• Schedule/buffer with multiple clients
Tajana Simunic Rosing
Server RM Architecture Detail
Client application
Server application
RTP/UDP/TCP
APPLICATION
LAYER
Server RM
TCP
CLIENT RM
OS
LP Device
Drivers
Device RM
Dynamic clock
Speed setting
CPU
Idle/
Sleep Mode
Tajana Simunic Rosing
Server RM algorithm
Server has the additional knowledge of
multiple clients workload & traffic conditions
Efficient transmission coordination in multiple
clients environment
Scheduling based on the link quality, client data
consumption rates and power needs
Estimation
Server decides the RM policy
Enable/disable various resouces (e.g. BT park)
Adjust resource parameters (e.g. doze interval)
Set duration of the sleep intervals
Schedule on time wake-up (no delay)
Server RM relies on the client RM
Buffer size calculation
Client RM controls the devices
Utilizes Rate Monotonic and Earliest Deadline
First algorithms to schedule communication
with multiple clients
Memory
energy
Communication
energy & switch
evaluation
WNIC settings
Tajana Simunic Rosing
Estimation and buffer sizing process
Maximum likelihood estimator keeps track of changes in WNIC throughput and data
usage pattern of the applications
m
new
ln( Pmax ) ( w c 1) ln
new old t j
old
j k
Size of buffer chosen to maximize sleep times
Total buffer size:
Buffer region actively involved
in data transfer in steady state:
Bactive Bsleep Bontrans Boff trans
Buffer size required during
Bswitch Tswitchu
interface switch:
B Bactive Bswitch Bcush
Average sleep time:
Tsleep
Bsleep t u
t u
Tajana Simunic Rosing
Energy consumed by WNIC and RAM
EW NIC Pactive Ttransfer Tcushion PtransTtrans PsleepTsleep PswitchTswitch
WNIC energy:
Interface
Switching
ERAM Eactive Enon active
RAM energy:
250
0.125
0.12
200
0.115
0.11
150
0.105
100
0.1
0.095
50
0.09
0
0
1
2
3
4
5
6
7
8
9
0.085
100
200
300
400
500
600
700
800
900
7
x 10
Average RAM power vs. buffer size
Total energy:
Average WNIC power vs. buffer size
Etotal EWNIC ERAM
Tajana Simunic Rosing
Server RM Scheduling Algorithms
Classical scheduling theory result: There is no dynamical scheduling algorithm for
dynamically changing tasks on multiple processors that is provably optimal [Liu’73]
Consequence: server RM scheduler for a multi-client system has to be heuristic
Two different algorithms are compared:
Earliest Deadline First:
• Schedule clients with earliest deadline first
• Optimal dynamic preemptive algorithm based on dynamic priorities for multiple tasks
on a uniprocessor can achieve 100% utilization if:
W CTE j
Tj
1
j
Rate Monotonic:
• schedule clients with highest data consumption rate first (shortest period)
• according to RM a set of periodic, independent task can be scheduled to meet their
deadlines on a uniprocessor, if the sum of the utilization factors is given as:
• Where:
WCTE j
Tj
U (n) n(21/ n 1)
j
– WCTE is worst-case execution time of task j
– Tj is the period of task j
– U(n) is the utilization bound for n tasks
Tajana Simunic Rosing
Experimental setup
Research prototype of HP’s hotspot server
HP’s IPAQ 3970 with Bluetooth (CSR) and
802.11b (CISCO Aironet 350)
Simulator for multiple client scenarios
The applications used are
MPEG4 video
MP3 audio
Email
Telnet
WWW
DSR
Results for:
Server RM to single client
Server RM to multiple clients
Data consumption rate of applications kbps
Tajana Simunic Rosing
Server RM with MPEG4 Video Streaming
WLAN Avg Power (mW)
1200
WLAN Avg Power (mW)
No PM
802.11b PM
RM
1100
1100
No PM
802.11b PM
RM
900
1000
900
700
800
500
700
600
300
size= 50
size = 70
size = 80
size = 80
size = 70
size= 50
size=20
30 frame/sec to a mobile
15 frame/sec to a mobile
40% of power reduction
65% of power reduction
larger number of concurrent clients with real time playback
Tajana Simunic Rosing
Effect of Traffic Conditions
Power Consumption vs Traffic Conditions
for Video Streaming
Average Power
(mW)
1000
800
600
SPM
802.11 PM
400
200
0
No Traffic
Stanford
(light)
HP
(Medium)
Bologna
(Heavy)
Traffic Conditions
Server RM saves 50% of power in heavy traffic conditions with respect to 802.11b PM
The server RM performs better in medium/heavy traffic conditions
802.11b PM is useful for systems where broadcast traffic cannot be discarded
Both implementations have real time video playback
Tajana Simunic Rosing
Server RM with MP3 Audio over 802.11b
180
160
140
120
100
80
60
40
20
Stream B Energy (J)
0
Stream A Energy (J)
RM
802.11 PM
802.11
•
Large energy savings with concurrent increase in number of clients supported
–
–
•
factor of 24 relative to current usage model for 802.11b
factor of 3 with respect to 802.11 PM with no broadcast traffic
Quality of service remains constant – playback is in real time
Tajana Simunic Rosing
Server RM for MP3 & E-mail via Bluetooth
0.18
0.16
0.14
0.12
0.1
0.08
0.06
0.04
2x concur r ent client s
0.02
1000x client s
0
M P3 B T
E-mail B T
On
On
•
•
M P3 B T
Park
Power (W)
M P3 B T
DS
E-mail B T
DS
Large increase in availability of server to clients
• Factor of 2 for MP3 streaming
• Factor of 1000 for email
Concurrent large savings in power
• Factor of 2 for MP3 streaming and a factor of 28 for email
Tajana Simunic Rosing
WNIC switch
Comparison of power consumption for an application trace consisting of MP3
audio, Email, Telnet, WWW and MPEG4 video.
A factor of 3.0x improvement over solely using Bluetooth or 802.11b
Tajana Simunic Rosing
Server RM to Multiple Clients
BLUETOOTH
802.11b
MP3
MP3
Sch. Algo *BC: best client
*WC: worst client
Avg. power
Data
(W)
trans
Sch. Algo *BC: best client Avg. powe r
*WC: worst
client
(MB)
RM
EDF
BC (24.4kB/s)
0.0737
WC (19kB/s)
0.0422
41.5
BC (24.4kB/s)
0.0635
62
WC (19kB/s)
0.0499
49.5
BC(29.3kB/s)
0.0876
81.2
WC(14.7kB/s)
0.0039
2.6
EDF
BC(29.3 kB/s)
0.0544
53.6
WC(14.7kB/s)
0.0366
31.3
BC(5.86 kB/s)
0.0179
15.8
WC(4.11kB/s)
0.0042
0.08
BC(5.86 kB/s)
0.0191
16.9
WC(4.11kB/s)
0.0105
6.7
RM
EDF
EDF
RM
EDF
EDF
63.4
WC(19 kB/s)
0.2949
37.6
BC(24.4 kB/s)
0.3784
72.7
WC (19 kB/s)
0.2578
50.74
BC(29.3kB/s)
0.0696
75.6
WC(14.7kB/s)
0.4863
0
BC(29.3 kB/s)
0.4159
84.8
WC(14.7kB/s)
0.6201
4.33
BC(5.86 kB/s)
0.0205
14.9
WC(4.11kB/s)
1.0423
0.3
EDF
BC(5.86 kB/s)
0.7785
17.9
WC(4.11kB/s)
1.0426
5.55
SENSO R NO DE
BC(14.7kB/s)
0.0442
39.3
WC(12.5kB/s)
0.0367
32.7
BC(14.7kB/s)
0.0428
40.7
WC(12.5kB/s)
0.0365
34.4
Maximum power savings factor of 42
Average a factor of 4.6
compared to always on: 0.18 W
0.0594
WWW
RM
SENSO R NO DE
RM
BC(24.4 kB/s)
MPEG4
WWW
RM
trans
(MB)
68.7
MPEG4
RM
(W)
Data
RM
EDF
BC(14.7kB/s)
0.0362
40
WC(12.5kB/s)
0.4583
11.2
BC(14.7kB/s)
0.4488
43.9
WC(12.5kB/s)
0.5768
27.3
Maximum power savings factor of 40
Average a factor of 2
compared to always on: 0.8W with 802.11b PM
Compared Server RM Rate Monotonic scheduler with Earliest Deadline First in time period of 2hrs on
two WNICs: Bluetooth (3 clients) and WLAN (30 clients) with multiple applications
Clients with lower scheduling priority had an average 10% delay penalty
Tajana Simunic Rosing
Managing Power Consumption
in Networks on Chips
Related Work
SOC interconnect standards [AMBA,CoreConnect,VSI,OCP]
NOC architecture based on packet model
Fat tree router topology [Guerrier00]
Tiled architecture with flit-reservation flow control [Dally01]
Correct-by-construction protocol stack – MESCAL tools [Sgroi01]
Reduction of energy consumption in NOCs
Maia processor has 21 satellite units; its configuration changes according to
application needs – large energy savings [Wan00]
Node and network-centric power management suggested [Benini02]
Recently proposed power management systems
Exclusively node-centric, with little or no outside information utilized
Power management & dynamic voltage scaling occur separately
Open loop control
• policies designed once with no further optimization at run time
Tajana Simunic Rosing
Network on a Chip
ARM
Core
PCM
Buffer
DMA
Controller
Flash MPEG Audio Core PM
Audio Out
Controller
ARM
Core
DSP
DMA
Controller
Speech Processing
PM
Audio In
Controller
RAM
Flash
RAM
Router
ARM
Core
DSP
Flash
DMA
Controller
MAC
controller
MPEG Video Core PM
Display
Controller
Specifications
Embedded
CPU
Communications
PM
RAM
Baseband
DSP
RAM
Flash
EEPROM
Audio
Video
Speech
Comm.
Active (mW)
700
1885
1500
1055
5140
Idle (mW)
216
235
1000
208
1659
Sleep (mW)
0.3
1.4
100
0.6
102.2
A-S-A (ms)
45.6
54.6
40
54.6
54.6
11
11
3
11
11
150
150
100
150
150
#DVS settings
DVS switch (us)
Radio
Total
Tajana Simunic Rosing
Power Manager Implementation
Local Power Manager
Router
Core
Traffic
Core
Network PM
Request
Control
Policy
Core
Function
Estimator
Core
Power management
Node-centric – fully contained in a local power manager
Network-centric – network power management requests
Local power manager implements closed-loop power management:
Estimator
• Observes incoming core traffic, core state & network PM requests
• Estimates parameters used in recalculation of power management policy
Controller
• Sets core’s energy and performance states based on estimator input
Tajana Simunic Rosing
Node-centric PM
Power management is based on Renewal Model
Departure
Active State
foVo
Idle State
queue > 0
queue = 0
Renewal
point
Arrival
Transition to
Sleep State
Arrival
Transition to
Active State
queue > 0
No Arrival
Arrival
Component
State
User
Queue > 0
Queue = 0
Device
Active
Transition
Distribution
Exponential
Pareto
Exponential
Uniform
Sleep State
Exp 1 e et
Pareto 1 b t a
t tmin
t t tmax
Uniform tmaxtmin min
0
else
Tajana Simunic Rosing
Renewal Policy Optimization
Basic assumptions:
general distribution governs the first request arrival
exponential distribution represents arrivals after the first arrival
user, device and queue are stationary
Optimize average performance under average power constraint
randomized policy
d ( j ) p( j )
min
j
T ( j ) p( j )
j
s.t.
( Energy( j ) PconstrT ( j )) p( j ) 0
j
p( j ) 1
j
Globally optimal policy calculated in seconds using LP
Tajana Simunic Rosing
Closed-loop Renewal Policy Optimization
Formulate dual of the Lagrangian
Variables v,u & are the Lagrangian multipliers
min v
s.t. d ( j ) v( j )t ( j ) u ( j )[e( j ) t(j)Pconstr ] ( j ) 0 j
Obtain a minimum crossing point of a set of lines specified by the
following equation:
v( j )
[e( j ) t(j)Pconstr ]
d ( j)
u( j )
t ( j)
t ( j)
Indexes of Lagrangian multipliers which form a solution, together with
original constraints, are used to obtain the probabilities of transitioning
into sleep state
Real-time closed-loop control is possible
Globally optimal policy calculated in milliseconds
Tajana Simunic Rosing
Node-centric estimation
Estimation of exponential and Pareto distribution parameters
Exponential
Pareto
Pareto 1 b t a
Exp 1 e et
calculate maximum likelihood ratio for
all rate settings
calculate interarrival (or interservice)
time sums ( S tj )
evaluate natural log of maximum
likelihood ratio, ln (Pmax )
n po int s
new
ln( Pmax ) k ln
( new old ) t j
old
j nchange
if ratio is larger than the one obtained
from the lookup table, assume that the
rate has changed
estimate parameters a & b using leastsquares method on the log of Pareto
distribution
1
Prob(T>t)
0.1
0.01
Experimental
Pareto Estimate
0.001
0.01
0.1
1
Idle time (t) in sec.
10
Tajana Simunic Rosing
Controller implementation
Consists of LFSR for generating probability & policy logic
Controller on entry to idle state:
obtains a random number RND & finds jh for which RND>p(jh)
if no arrival during jh seconds, the core enters sleep state,
otherwise it stays active
Frequency and voltage are set so the average expected processing delay in
the queue is kept constant:
device
Prdelay
60%
user ( user device)
50%
Error (%)
Optimal Policy
40%
30%
20%
Pow er Err
10%
Penalty Err
Idle time Probability
(ms)
to sleep
jh
p(jh)
0
0.00
10
0.00
20
0.12
30
0.43
40
0.75
50
0.87
60
0.91
70
1.00
0%
3
8
13
LFSR Bits
FPGA synthesis
LFSR
Bits
5-15
LFSR Regs
# LABs
Max ns
1
4
Policy
# LABs
2
Max ns
35
Synposys synthesis
LFSR Regs
#FFs
% area
5
14%
9
14%
15
12%
Policy
#gates
% area
193
86%
417
86%
855Tajana Simunic
87%Rosing
Network centric PM
Local Power Manager
Router
Control
Policy
Core
Traffic
Core
Core
Function
Estimator
Network PM
Request
Core
Departure
Active State
foVo
Idle State
queue > 0
queue = 0
Arrival
Arrival
Transition to
Active State Network
queue > 0 request
Arrival
Renewal
point
Network
request
Transition to
Sleep State
No Arrival
Sleep State
Tajana Simunic Rosing
Network centric PM implementation
Estimator continues to have the same function as before
Renewal model is expanded to include network requests
Controller implementation changes:
When all network cores release the local core, the probability
of transition to sleep is 1.0
Source
Node
Network
Idle Time
(ms)
0
70
120
Any time
Transition Probability
0
0.3
1
1
As soon as a request comes from a network core to the local
core, the local core transitions to the active state with
probability 1.0
Node-centric PM is still needed to implement DVS and PM in
situations when early network requests are not available
Tajana Simunic Rosing
Network-centric results
Specifications
Audio
Video
Speech
Comm.
Active (mW)
700
1885
1500
1055
5140
Idle (mW)
216
235
1000
208
1659
Sleep (mW)
0.3
1.4
100
0.6
102.2
A-S-A (ms)
45.6
54.6
40
54.6
54.6
11
11
3
11
11
150
150
100
150
150
#DVS settings
DVS switch (us)
Total
MPEG Audio Core
Speech Processing
MPEG Video Core
Communications
Power savings factor
PM
None
Node
Centric
PM Type
None
DVS only
DPM only
DVS &DPM
Network DVS &DPM
Audio
1
1.4
2
2.9
Video
1
2
1.5
2.9
Comm.
1
1
3
3
S peech
1
3.9
2
5.8
Total
1
1.2
2.4
2.9
3.7
3.6
4.2
6.4
4.1
Network-centric DPM increases power savings from a factor of
2.9 to 4.1, while at the same time reducing performance penalty
by more than 10%
Tajana Simunic Rosing
Complex Library Mapping
for Embedded Software
Using Symbolic Algebra
Related Work
Tree covering code generation [Aho]
Retargetable compiler [Goossens, Paulin, Marwedel]
Map to simple processor instruction
Instruction mapping of ASIPs
Power aware compiling
Memory access optimization [Catthoor, Kandemir]
Instruction reordering [Tiwari]
Symbolic algebra for data-flow synthesis [ICCAD’01]
Tajana Simunic Rosing
Methodology
Pre-optimized
Embedded Library
Algorithmic-level
C Code
Profiling
Critical Code
Polynomial
Formulation
Target Code
Library
Characterization
Symbolic
Library Mapping
Optimized C Code using
Embedded Library
Tajana Simunic Rosing
Library characterization
Target library:
Commercial library (e.g. Intel’s integrated performance primitives library)
A set of in-house pre-optimized routines
IEEE floating-point math library for Linux OS
Each library element is labeled with:
Type of inputs and outputs
Performance and energy consumption obtained from cycle-accurate
simulator
Functionality & accuracy of its polynomial representation
Library Element
Perf (s)
Factor
float SubBandSyn
0.94813
1
fixed SubBandSyn
0.01026
92
IPP SubBandSyn
0.00198
479
float IMDCT
0.3872
1
fixed IMDCT
IPP IMDCT
0.0144
0.0002
27
1898
xi
n
1
2
k 0
2n
y k cos(
(2i 1
n
)( 2k 1))
2
Tajana Simunic Rosing
Target code identification
Algorithm
Identify critical code segments with a profiler
Formulate maximum size polynomials
Higher likelihood of finding more complex
library elements
Achieved by transformation such as loop
unrolling, constant and variable
propagation, inlining…
Polynomial representation for the critical
code segments is calculated as follows:
• Linear functions are extracted directly
• Bit manipulations or Boolean functions
use interpolation-based algorithms
[Smith01]
• Nonlinear functions are approximated
with a Taylor or Chebyshev expansion
whose accuracy is verified via cycleaccurate simulation
Profiling
Critical Code
Polynomial
Formulation
Target Code
Source Code
Software Profile
for ( i=0; i<30; i++)
{
x[i] = y[i] + 2 * x[i + 1];
z[i] -= x[i];
y[i] = x[i] + z[i];
}
fun
energy
----------------15%
getD
sort
10%
init
2%
...
LD R21, #30;
ADD R21, R23,R27;
...
ARM Instruction-level Simulator
Profiler
Processor Core Model
L1 Cache
Energy
Consumption
Processor & L1 Cache Energy Model
Interconnect Energy Model
L2 Cache
Energy Model
DC-DC
Converter
Energy Model
Battery
Memory
Energy Model
Tajana Simunic Rosing
Mapping Algorithm
Polynomial Representation of Critical Code
THR
Factor
Expand
Polynomial Rep. of
Library Elements
Horner
Select Side
Relation Set
Add to Side
Relation Set
No
Simplify
Mapped?
Yes
Choose Best Solution
Tajana Simunic Rosing
Example
Phase shift keying modulation
Map a code segment of PSK to Library
S :=
1-.5*x0^2-x0*x1-.5*x1^2+.041667*x0^4+.166668*x0^3*x1
+.250002*x0^2*x1^2+.166668*x0*x1^3+.041667*x1^4;
Butterfly
IDCT
PSK
Cos
Sin
Mac
Library
Tajana Simunic Rosing
Example
Phase shift keying modulation
Map a code segment of PSK to Library
S :=
1-.5*x0^2-x0*x1-.5*x1^2+.041667*x0^4+.166668*x0^3*x1
+.250002*x0^2*x1^2+.166668*x0*x1^3+.041667*x1^4;
Butterfly
IDCT
› siderel := {y=x0+x1};
PSK
› simplify(S, siderel, [x0,x1,y]);
Cos
1.+.041667*y^4-.5*y^2
y := x0 + x1;
s := cos(y);
Sin
Mac
Library
Tajana Simunic Rosing
Experimental Setup
PCMCIA power
side 1
PCMCIA slot
Processor
power
SA-1100 based
embedded system with
MP3 from ISO as an
application
side 2
System & component
power measurements,
timer for performance
Tool implemented in C
with calls to Maple V
SDRAM slot
Fine grain power
measurements by
data acquisition board
Data acquisition board
Tajana Simunic Rosing
Execution time (s)
Original Code
r
rd
e
III
_
ec
o
_d
fm
an
hu
One frame decodes in 2.6 seconds
Profiler results show three critical functions
re
o
d.
..
eo
st
er
III
_
Functions
III
_
III
_
an
tia
lia
br
id
III
_
hy
tL
in
v
yn
th
an
dS
_m
dc
s
es
i
..
e_
s.
Su
bB
tiz
qu
an
de
III
_
s
1.4000
1.2000
1.0000
0.8000
0.6000
0.4000
0.2000
0.0000
Generate as large as possible polynomials for the critical
functions
Tajana Simunic Rosing
MP3 Final Results
Code version
Perf (s)
Original
503.92
IPP SubBand
301.43
IPP SubBand & IMDCT
211.27
In-House (IH) Library
5.47
In-House + IPP SubBand
3.33
IH+IPP SubBand & IMDCT
1.43
IPP MP3
0.41
Factor Energy (mWhr)
Factor
1.0 509.553
1.0
1.7 292.516
1.7
2.4 199.123
2.6
92.1
4.461 114.2
151.4
2.7955 182.3
352.4
1.1708 435.2
1240.8
0.3133 1626.4
Runs a factor of four faster than real-time
Additional energy savings are possible by using frequency
and voltage scaling
Tajana Simunic Rosing
Summary
Server Resource Management
Scheduling algorithms for multi-client heterogeneous wireless
embedded systems
Large energy savings with little or no cost in performance
Embedded HW management of NOCs
Node and network-centric approaches with closed-loop control
Power savings of a factor of 4 with network-centric approach,
while performance penalty reduced by more than 10%
Embedded software library mapping methodology
Symbolic algebra method maps to pre-optimized library elements
Significant productivity improvement with large energy savings
Tajana Simunic Rosing
Next steps
Server Resource Management
Combine independent and managed clients
Prediction techniques for changes in the channel
Better scheduling
• Knowledge of the environment for help with scheduling (e.g. location)
• Seamless transition to WAN when needed
• Heuristics to help EDF/RM algorithms handle conflicts
Embedded HW management of NOCs
Integrated resource management for cores and the interconnect topology
Reliability as another aspect of RM
Embedded software optimization
Integration of optimizations with compilers
Driver & OS optimization
Hardware-driven software optimization
• Given specific set of hardware components, apply appropriate optimizations
automatically
Tajana Simunic Rosing
Thank you!!!
Bluetooth
Supports point-to-point and point-to-multipoint (piconet) synchronous and
asynchronous connections
Maximum throughput for asynchronous connections is 109-723 kbps
Supported low power modes: Hold, Park, Sniff and Deep sleep (vendor specific)
Tran.Time
(m sec)
OFF
STANDBY
INQUIRE
(UNKNOWN
ADDRESS)
PAGE
(KNOWN
ADDRESS)
Active Mode
0.18
Hold Mode
0.061
Hold entry
1.68
0.068
Hold exit
11.62
0.216
Park Mode
TRANSMIT
DATA (DH/DM
PACKET)
CONNECTED
Master/Slave
PARK
HOLD
DEEP
SLEEP
Power
(W)
0.061
Park entry
2.16
0.077
Park exit
4.12
0.126
Sniff Mode
SNIFF
0.061
Sniff entry
0.94
0.078
Sniff exit
7.36
0.194
Deep Sleep
270μ
Tajana Simunic Rosing
WaveLAN-802.11b
Designed to work in adhoc as well as infrastructure mode
System states: Active (Transmit/Receive), Doze (802.11b PM) & Off
IEEE standard power management
Traffic indication map (TIM) after every 100ms
Doze mode activation if no data present
Tran. Time
(ms)
Power (W)
Transmit
-
1.4
Receive
-
0.93
Doze
-
0.045 –0.93
Off to Doze State
Doze entry
Doze exit
0.1
1.4
1
1.6
Active to Off State
Off entry
Off exit
1
1.7
300
2.3
Tajana Simunic Rosing
Server RM versus Client PM
Server
has the additional
knowledge of multiple clients
workload & traffic conditions
Efficient transmission coordination in
multiple clients environment
Scheduling based on the link quality
Server
•
Client PM can predict and schedule
only according to its user’s and
application requirements, no
knowledge of the rest of the
environment
decides the RM policy
Enable/disable various resouce
characteristics (e.g. BT park mode)
Adjust resource parameters (e.g.
doze interval for 802.11b PM)
Switch off wireless, set duration of
the OFF intervals
On time wake-up (no delay)
Server
RM relies on the client PM
Client PM controls the devices
Client PM Architecture
OS
PM
Apps
Kernel
Device driver
traffic
Hardware
Tajana Simunic Rosing
Client PM API
operating system
Power Manager
resource needs
Applications
Kernel
workload
Device driver
CLK speed, power states
Hardware
Power management of devices
Active, Idle & Sleep state control
Access mechanisms
Buffer management
Client power management
• Application driven
• Frequency/voltage scaling
• Idle and Sleep state implementation
Tajana Simunic Rosing
Video Streaming Example
SERVER
PM
SHUT-OFF, DELAY,
PM ENABLE/DISABLE,
SCHEDULE COMM
Client
device
CLIENT
PM
TCP
DECODING RATE, SNR,
BUFFER SIZE
MEDIA
SERVER
VIDEO DATA
RTP/UDP
MPEG4
Decoder
FEEDBACK
RCTP
Tajana Simunic Rosing
Client Power Management
2.8
2.4
2
1.6
No Policy
Power (W)
DPM DPM+DCS@191MHz
DCS = Dynamic Processor Clock Setting
DPM = Dynamic Power Management of Processor and Devices
real-time MP3 and Distributed Speech Recognition
with 30% power reduction using Client PM
Tajana Simunic Rosing
Exponential estimation
Maximum likelihood estimator versus exponential average and ideal
120
Exp. Average (gain=0.03)
Exp. Average (gain=0.05)
Maximum likelihood
Ideal
100
Rate
80
60
40
20
0
0
100
200
300
400
500
600
700
800
900
1000
Sample Number
Exp 1 e et
Tajana Simunic Rosing
Pareto estimation
Estimate Value
1
0.1
Estimate (a)
Estimate(b)
0.01
0
20
40
60
80
100
Sample Number
Pareto 1 b t
a
Estimate errors are below 10% in steady state
Tajana Simunic Rosing
Symbolic Library Mapping
Given: Polynomial rep. of critical code sections S
A characterized library of complex elements
A routine for accuracy and cost feedback
Goal: Decompose S into available complex library
Optimize power consumption and performance
Use symbolic computer algebra routines & algorithms
Higher level optimizations possible
Example: factorization and simplification
Output: library element(s) implementing S
Tajana Simunic Rosing