TajanaRosingTalk

Download Report

Transcript TajanaRosingTalk

Resource management in
embedded systems
Tajana Šimunić Rosing
HP Labs & Stanford University
Embedded System Trends




Increasingly complex systems
 Mixed hard & soft real-time requirements
 Coordination of subsystem & multi-system control
Demand for dynamic & adaptive response
 Operation in unpredictably changing contexts
 Variable performance demands
 Management of resources (power, performance, availability,
accessibility, throughput, security etc.)
Demand for autonomy
 Human feasibility restrictions
Faster hardware
 Fast processors and networks
 Integrated processing, common platforms
 FPGAs vs ASICs, DSPs; SoCs
Source : NSF
EU-US
Workshop
‘01
Tajana
Simunic
Rosing
Overview of Contributions
 Resource
management in wireless embedded systems
 Utilize
information already present in the system to deliver lower
energy consumption with excellent quality of service
 Simultaneous increase in accessibility
 Power
management of embedded HW
 Evolution
of SoCs into NoCs
 Optimal power and performance management policy for NoCs
 Energy
optimization of embedded software
 Methodology
for lowering energy consumption in data-intensive
embedded software algorithms using symbolic algebra
techniques
Tajana Simunic Rosing
Resource Management of
Heterogeneous Wireless
Embedded Systems
Networked Embedded Systems – Sensor Nodes
Power consumption of sensor node subsystems
Power (mW)
20
15
10
5
0
SENSORS CPU
Energy breakdown for voice
TX
RX
IDLE
SLEEP
Energy breakdown for MPEG video
Encode Decode
Receive
RADIO
Encode
Transmit
Decode
Transmit
Receive
Lucent WaveLAN at 2 Mbps & SA-1100 CPU at 150 MPIS
Source : Mobicom’01
SensorsTutorial
Tajana Simunic
Rosing
802.11b PM - Doze Mode
- MAC layer PM is not sufficient due to continual network polling for data, increased
RTT and broadcast traffic issues
- Network layer management has no knowledge of application characteristics or of
behavior of other clients in the environment
2
Heavy traffic conditions:
the card is awake most of the time
1.8
1.6
Power(W)
1.4
awake state
1.2
1
0.8
0.6
0.4
2
0.2
1.8
987
929
871
813
755
697
639
581
523
465
407
349
291
233
175
117
59
1
0
1.6
tim e (10-4 s e c)
1.2
1
0.8
doze state
0.6
0.4
0.2
Medium traffic conditions:
the card is awake half of the time
945
886
827
768
709
650
591
532
473
414
355
296
237
178
119
time (10-4 sec)
network polling
2
1.8
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
tim e (10-4 s e c)
973
919
865
811
757
703
649
595
541
487
433
379
325
271
217
163
109
1
0
55
Light traffic conditions:
the card is sleeping most of the time
Power (W)
1
0
60
Power (W)
1.4
Tajana Simunic Rosing
Related Work in PM for Networks
 Homogeneous
networks:


802.11e standard allows applications to specify their QoS needs
Separate control and data channels for 802.11 based networks [Shi’02]
• Decoupled low power control channel, device sends a wake-up call
 Protocols for sensor networks [Estrin’02]
• Low duty cycle operations on radio
 Power efficient coordination in 802.11b networks [Chen’01]
• Forwarding nodes remain active while neighboring nodes are in power save
 Heterogeneous
networks

Improved network and link layers for mobility
• Mobile IP protocol for host mobility [Monarch CMU]
• Contact networking [HPL & UIUC, MobiSys '03 ]
 Improved hand-off mechanism in overlay networks
• Buffering of data at multiple base stations [Barwan UCB]
 Distributed file system for wireless [Coda,Odyssey CMU]
• Caching of files at clients and duplication on accessible servers
• Adjust quality of accessed data to match available resources
Tajana Simunic Rosing
Related work in PM


Command
Policy
Data
Heuristic techniques



Power
Manager
Focus on power
management of a single
device (e.g. CPU, hard
disk)
Time-out and predictive
models [Karlin’94, Hwang 97]
DVS algorithms
Stochastic methods

Requests
Active
MDP [Benini 99, Qiu 99]
• Memoryless
distributions only
• Discrepancy between
predicted and measured
savings
 TISMDP [Simunic 01]
• Finite history of user
occurrences
• Decisions based on
event occurrences
• Large energy savings
measured
Queue
4 32 1
Idle
min
Client Power States
Active
Idle
Sleep
 Costenergy( s, a) f ( s, a)
s a
s.t.
 f ( s, a)   M ( s' , s, a) f ( s' , a)  0;
a
s '
s' a
 T ( s, a) f ( s, a)  1
s a
 Cost perf ( s, a) f ( s, a)  Constr.
s a
Tajana Simunic Rosing
System Resource Management

Concentration on resource management
aspects of diverse devices in a
heterogeneous wireless network
• maintain desired QoS
• increase accessibility
• maximize battery lifetime

Implementation of policies to define
• Which devices should communicate using
which network interface
• When should this communication take place
• When should the communicating device be
in a low power state

Client RM
Server RM
Separation of policy and control
 “smart” clients communicate their needs
to the resource manager
 stochastic modeling of communication
patterns for independent clients
 policy takes into account client’s needs
and determines control decisions
WAN
WLAN
WPAN
Tajana Simunic Rosing
Comparison
Server-driven Management
State of the art
DATA
Server
Client
Client
Server
PM CONTROLS
Server
Continuously
Streaming
wireless
always on
• 802.11b
• Doze interval regulation
• Manual on/off
• Bluetooth
• Park,sniff,hold modes have to be
specifically initiated by the server
Server
Adaptive
Buffering
wireless
on only when
data arrives
• Control power state of all components
• Use application knowledge – higher
savings possible than just at MAC and
network layers
• Schedule/buffer with multiple clients
Tajana Simunic Rosing
Server RM Architecture Detail
Client application
Server application
RTP/UDP/TCP
APPLICATION
LAYER
Server RM
TCP
CLIENT RM
OS
LP Device
Drivers
Device RM
Dynamic clock
Speed setting
CPU
Idle/
Sleep Mode
Tajana Simunic Rosing
Server RM algorithm

Server has the additional knowledge of
multiple clients workload & traffic conditions
Efficient transmission coordination in multiple
clients environment
 Scheduling based on the link quality, client data
consumption rates and power needs
Estimation


Server decides the RM policy
Enable/disable various resouces (e.g. BT park)
 Adjust resource parameters (e.g. doze interval)
 Set duration of the sleep intervals
 Schedule on time wake-up (no delay)


Server RM relies on the client RM


Buffer size calculation
Client RM controls the devices
Utilizes Rate Monotonic and Earliest Deadline
First algorithms to schedule communication
with multiple clients
Memory
energy
Communication
energy & switch
evaluation
WNIC settings
Tajana Simunic Rosing
Estimation and buffer sizing process

Maximum likelihood estimator keeps track of changes in WNIC throughput and data
usage pattern of the applications
m
new
ln( Pmax )  ( w  c  1) ln
 new  old  t j
old
j k

Size of buffer chosen to maximize sleep times

Total buffer size:

Buffer region actively involved

in data transfer in steady state:
Bactive  Bsleep  Bontrans  Boff trans
Buffer size required during
Bswitch  Tswitchu
interface switch:

B  Bactive  Bswitch  Bcush
Average sleep time:
Tsleep 
Bsleep t  u 
t u
Tajana Simunic Rosing
Energy consumed by WNIC and RAM

EW NIC  Pactive Ttransfer  Tcushion   PtransTtrans  PsleepTsleep  PswitchTswitch
WNIC energy:
Interface
Switching

ERAM  Eactive  Enon active
RAM energy:
250
0.125
0.12
200
0.115
0.11
150
0.105
100
0.1
0.095
50
0.09
0
0
1
2
3
4
5
6
7
8
9
0.085
100
200
300
400
500
600
700
800
900
7
x 10
Average RAM power vs. buffer size

Total energy:
Average WNIC power vs. buffer size
Etotal  EWNIC  ERAM
Tajana Simunic Rosing
Server RM Scheduling Algorithms

Classical scheduling theory result: There is no dynamical scheduling algorithm for
dynamically changing tasks on multiple processors that is provably optimal [Liu’73]
 Consequence: server RM scheduler for a multi-client system has to be heuristic

Two different algorithms are compared:
 Earliest Deadline First:
• Schedule clients with earliest deadline first
• Optimal dynamic preemptive algorithm based on dynamic priorities for multiple tasks
on a uniprocessor can achieve 100% utilization if:

W CTE j
Tj
1
j

Rate Monotonic:
• schedule clients with highest data consumption rate first (shortest period)
• according to RM a set of periodic, independent task can be scheduled to meet their
deadlines on a uniprocessor, if the sum of the utilization factors is given as:
• Where:

WCTE j
Tj
 U (n)  n(21/ n  1)
j
– WCTE is worst-case execution time of task j
– Tj is the period of task j
– U(n) is the utilization bound for n tasks
Tajana Simunic Rosing
Experimental setup

Research prototype of HP’s hotspot server

HP’s IPAQ 3970 with Bluetooth (CSR) and
802.11b (CISCO Aironet 350)

Simulator for multiple client scenarios

The applications used are
MPEG4 video
 MP3 audio
 Email
 Telnet
 WWW
 DSR


Results for:
 Server RM to single client
 Server RM to multiple clients
Data consumption rate of applications kbps
Tajana Simunic Rosing
Server RM with MPEG4 Video Streaming
WLAN Avg Power (mW)
1200
WLAN Avg Power (mW)
No PM
802.11b PM
RM
1100
1100
No PM
802.11b PM
RM
900
1000
900
700
800
500
700
600
300
size= 50
size = 70
size = 80
size = 80
size = 70
size= 50
size=20
30 frame/sec to a mobile
15 frame/sec to a mobile
40% of power reduction
65% of power reduction
larger number of concurrent clients with real time playback
Tajana Simunic Rosing
Effect of Traffic Conditions
Power Consumption vs Traffic Conditions
for Video Streaming
Average Power
(mW)
1000
800
600
SPM
802.11 PM
400
200
0
No Traffic
Stanford
(light)
HP
(Medium)
Bologna
(Heavy)
Traffic Conditions

Server RM saves 50% of power in heavy traffic conditions with respect to 802.11b PM

The server RM performs better in medium/heavy traffic conditions

802.11b PM is useful for systems where broadcast traffic cannot be discarded

Both implementations have real time video playback
Tajana Simunic Rosing
Server RM with MP3 Audio over 802.11b
180
160
140
120
100
80
60
40
20
Stream B Energy (J)
0
Stream A Energy (J)
RM
802.11 PM
802.11
•
Large energy savings with concurrent increase in number of clients supported
–
–
•
factor of 24 relative to current usage model for 802.11b
factor of 3 with respect to 802.11 PM with no broadcast traffic
Quality of service remains constant – playback is in real time
Tajana Simunic Rosing
Server RM for MP3 & E-mail via Bluetooth
0.18
0.16
0.14
0.12
0.1
0.08
0.06
0.04
2x concur r ent client s
0.02
1000x client s
0
M P3 B T
E-mail B T
On
On
•
•
M P3 B T
Park
Power (W)
M P3 B T
DS
E-mail B T
DS
Large increase in availability of server to clients
• Factor of 2 for MP3 streaming
• Factor of 1000 for email
Concurrent large savings in power
• Factor of 2 for MP3 streaming and a factor of 28 for email
Tajana Simunic Rosing
WNIC switch

Comparison of power consumption for an application trace consisting of MP3
audio, Email, Telnet, WWW and MPEG4 video.

A factor of 3.0x improvement over solely using Bluetooth or 802.11b
Tajana Simunic Rosing
Server RM to Multiple Clients
BLUETOOTH
802.11b
MP3
MP3
Sch. Algo *BC: best client
*WC: worst client
Avg. power
Data
(W)
trans
Sch. Algo *BC: best client Avg. powe r
*WC: worst
client
(MB)
RM
EDF
BC (24.4kB/s)
0.0737
WC (19kB/s)
0.0422
41.5
BC (24.4kB/s)
0.0635
62
WC (19kB/s)
0.0499
49.5
BC(29.3kB/s)
0.0876
81.2
WC(14.7kB/s)
0.0039
2.6
EDF
BC(29.3 kB/s)
0.0544
53.6
WC(14.7kB/s)
0.0366
31.3
BC(5.86 kB/s)
0.0179
15.8
WC(4.11kB/s)
0.0042
0.08
BC(5.86 kB/s)
0.0191
16.9
WC(4.11kB/s)
0.0105
6.7
RM
EDF
EDF
RM
EDF
EDF

63.4
WC(19 kB/s)
0.2949
37.6
BC(24.4 kB/s)
0.3784
72.7
WC (19 kB/s)
0.2578
50.74
BC(29.3kB/s)
0.0696
75.6
WC(14.7kB/s)
0.4863
0
BC(29.3 kB/s)
0.4159
84.8
WC(14.7kB/s)
0.6201
4.33
BC(5.86 kB/s)
0.0205
14.9
WC(4.11kB/s)
1.0423
0.3
EDF
BC(5.86 kB/s)
0.7785
17.9
WC(4.11kB/s)
1.0426
5.55
SENSO R NO DE
BC(14.7kB/s)
0.0442
39.3
WC(12.5kB/s)
0.0367
32.7
BC(14.7kB/s)
0.0428
40.7
WC(12.5kB/s)
0.0365
34.4
Maximum power savings factor of 42
Average a factor of 4.6
compared to always on: 0.18 W

0.0594
WWW
RM
SENSO R NO DE
RM
BC(24.4 kB/s)
MPEG4
WWW
RM
trans
(MB)
68.7
MPEG4
RM
(W)
Data
RM
EDF
BC(14.7kB/s)
0.0362
40
WC(12.5kB/s)
0.4583
11.2
BC(14.7kB/s)
0.4488
43.9
WC(12.5kB/s)
0.5768
27.3
Maximum power savings factor of 40
Average a factor of 2
compared to always on: 0.8W with 802.11b PM
Compared Server RM Rate Monotonic scheduler with Earliest Deadline First in time period of 2hrs on
two WNICs: Bluetooth (3 clients) and WLAN (30 clients) with multiple applications
Clients with lower scheduling priority had an average 10% delay penalty
Tajana Simunic Rosing
Managing Power Consumption
in Networks on Chips
Related Work




SOC interconnect standards [AMBA,CoreConnect,VSI,OCP]
NOC architecture based on packet model
 Fat tree router topology [Guerrier00]
 Tiled architecture with flit-reservation flow control [Dally01]
 Correct-by-construction protocol stack – MESCAL tools [Sgroi01]
Reduction of energy consumption in NOCs
 Maia processor has 21 satellite units; its configuration changes according to
application needs – large energy savings [Wan00]
 Node and network-centric power management suggested [Benini02]
Recently proposed power management systems
 Exclusively node-centric, with little or no outside information utilized
 Power management & dynamic voltage scaling occur separately
 Open loop control
• policies designed once with no further optimization at run time
Tajana Simunic Rosing
Network on a Chip
ARM
Core
PCM
Buffer
DMA
Controller
Flash MPEG Audio Core PM
Audio Out
Controller
ARM
Core
DSP
DMA
Controller
Speech Processing
PM
Audio In
Controller
RAM
Flash
RAM
Router
ARM
Core
DSP
Flash
DMA
Controller
MAC
controller
MPEG Video Core PM
Display
Controller
Specifications
Embedded
CPU
Communications
PM
RAM
Baseband
DSP
RAM
Flash
EEPROM
Audio
Video
Speech
Comm.
Active (mW)
700
1885
1500
1055
5140
Idle (mW)
216
235
1000
208
1659
Sleep (mW)
0.3
1.4
100
0.6
102.2
A-S-A (ms)
45.6
54.6
40
54.6
54.6
11
11
3
11
11
150
150
100
150
150
#DVS settings
DVS switch (us)
Radio
Total
Tajana Simunic Rosing
Power Manager Implementation
Local Power Manager
Router
Core
Traffic
Core
Network PM
Request
Control
Policy
Core
Function
Estimator
Core

Power management



Node-centric – fully contained in a local power manager
Network-centric – network power management requests
Local power manager implements closed-loop power management:
Estimator
• Observes incoming core traffic, core state & network PM requests
• Estimates parameters used in recalculation of power management policy
 Controller
• Sets core’s energy and performance states based on estimator input

Tajana Simunic Rosing
Node-centric PM

Power management is based on Renewal Model
Departure
Active State
foVo
Idle State
queue > 0
queue = 0
Renewal
point
Arrival
Transition to
Sleep State
Arrival
Transition to
Active State
queue > 0
No Arrival
Arrival
Component
State
User
Queue > 0
Queue = 0
Device
Active
Transition
Distribution
Exponential
Pareto
Exponential
Uniform
Sleep State
Exp  1  e  et
Pareto 1  b  t a
 t tmin
t  t  tmax
Uniform   tmaxtmin min
 0
else
Tajana Simunic Rosing
Renewal Policy Optimization

Basic assumptions:




general distribution governs the first request arrival
exponential distribution represents arrivals after the first arrival
user, device and queue are stationary
Optimize average performance under average power constraint
 randomized policy
 d ( j ) p( j )
min
j
 T ( j ) p( j )
j
s.t.
 ( Energy( j )  PconstrT ( j )) p( j )  0
j
 p( j )  1
j
Globally optimal policy calculated in seconds using LP
Tajana Simunic Rosing
Closed-loop Renewal Policy Optimization

Formulate dual of the Lagrangian

Variables v,u &  are the Lagrangian multipliers
min v
s.t. d ( j )  v( j )t ( j )  u ( j )[e( j )  t(j)Pconstr ]  ( j )  0 j

Obtain a minimum crossing point of a set of lines specified by the
following equation:
v( j ) 


[e( j )  t(j)Pconstr ]
d ( j)
 u( j )
t ( j)
t ( j)
Indexes of Lagrangian multipliers which form a solution, together with
original constraints, are used to obtain the probabilities of transitioning
into sleep state
Real-time closed-loop control is possible
Globally optimal policy calculated in milliseconds
Tajana Simunic Rosing
Node-centric estimation

Estimation of exponential and Pareto distribution parameters
Exponential
Pareto
Pareto 1  b  t a
Exp  1  e  et


calculate maximum likelihood ratio for
all rate settings
calculate interarrival (or interservice)
time sums ( S tj )
evaluate natural log of maximum
likelihood ratio, ln (Pmax )
n po int s
 new
ln( Pmax )  k ln
 ( new   old )  t j
 old
j  nchange

if ratio is larger than the one obtained
from the lookup table, assume that the
rate has changed

estimate parameters a & b using leastsquares method on the log of Pareto
distribution
1
Prob(T>t)

0.1
0.01
Experimental
Pareto Estimate
0.001
0.01
0.1
1
Idle time (t) in sec.
10
Tajana Simunic Rosing
Controller implementation


Consists of LFSR for generating probability & policy logic
Controller on entry to idle state:


obtains a random number RND & finds jh for which RND>p(jh)
if no arrival during jh seconds, the core enters sleep state,
otherwise it stays active
Frequency and voltage are set so the average expected processing delay in
the queue is kept constant:
 device
Prdelay 
60%
 user ( user   device)
50%
Error (%)

Optimal Policy
40%
30%
20%
Pow er Err
10%
Penalty Err
Idle time Probability
(ms)
to sleep
jh
p(jh)
0
0.00
10
0.00
20
0.12
30
0.43
40
0.75
50
0.87
60
0.91
70
1.00
0%
3
8
13
LFSR Bits
FPGA synthesis
LFSR
Bits
5-15
LFSR Regs
# LABs
Max ns
1
4
Policy
# LABs
2
Max ns
35
Synposys synthesis
LFSR Regs
#FFs
% area
5
14%
9
14%
15
12%
Policy
#gates
% area
193
86%
417
86%
855Tajana Simunic
87%Rosing
Network centric PM
Local Power Manager
Router
Control
Policy
Core
Traffic
Core
Core
Function
Estimator
Network PM
Request
Core
Departure
Active State
foVo
Idle State
queue > 0
queue = 0
Arrival
Arrival
Transition to
Active State Network
queue > 0 request
Arrival
Renewal
point
Network
request
Transition to
Sleep State
No Arrival
Sleep State
Tajana Simunic Rosing
Network centric PM implementation

Estimator continues to have the same function as before

Renewal model is expanded to include network requests

Controller implementation changes:

When all network cores release the local core, the probability
of transition to sleep is 1.0
Source
Node
Network


Idle Time
(ms)
0
70
120
Any time
Transition Probability
0
0.3
1
1
As soon as a request comes from a network core to the local
core, the local core transitions to the active state with
probability 1.0
Node-centric PM is still needed to implement DVS and PM in
situations when early network requests are not available
Tajana Simunic Rosing
Network-centric results
Specifications
Audio
Video
Speech
Comm.
Active (mW)
700
1885
1500
1055
5140
Idle (mW)
216
235
1000
208
1659
Sleep (mW)
0.3
1.4
100
0.6
102.2
A-S-A (ms)
45.6
54.6
40
54.6
54.6
11
11
3
11
11
150
150
100
150
150
#DVS settings
DVS switch (us)
Total
MPEG Audio Core
Speech Processing
MPEG Video Core
Communications
Power savings factor
PM
None
Node
Centric
PM Type
None
DVS only
DPM only
DVS &DPM
Network DVS &DPM

Audio
1
1.4
2
2.9
Video
1
2
1.5
2.9
Comm.
1
1
3
3
S peech
1
3.9
2
5.8
Total
1
1.2
2.4
2.9
3.7
3.6
4.2
6.4
4.1
Network-centric DPM increases power savings from a factor of
2.9 to 4.1, while at the same time reducing performance penalty
by more than 10%
Tajana Simunic Rosing
Complex Library Mapping
for Embedded Software
Using Symbolic Algebra
Related Work

Tree covering code generation [Aho]


Retargetable compiler [Goossens, Paulin, Marwedel]


Map to simple processor instruction
Instruction mapping of ASIPs
Power aware compiling
Memory access optimization [Catthoor, Kandemir]
 Instruction reordering [Tiwari]


Symbolic algebra for data-flow synthesis [ICCAD’01]
Tajana Simunic Rosing
Methodology
Pre-optimized
Embedded Library
Algorithmic-level
C Code
Profiling
Critical Code
Polynomial
Formulation
Target Code
Library
Characterization
Symbolic
Library Mapping
Optimized C Code using
Embedded Library
Tajana Simunic Rosing
Library characterization

Target library:




Commercial library (e.g. Intel’s integrated performance primitives library)
A set of in-house pre-optimized routines
IEEE floating-point math library for Linux OS
Each library element is labeled with:
Type of inputs and outputs
Performance and energy consumption obtained from cycle-accurate
simulator
 Functionality & accuracy of its polynomial representation


Library Element
Perf (s)
Factor
float SubBandSyn
0.94813
1
fixed SubBandSyn
0.01026
92
IPP SubBandSyn
0.00198
479
float IMDCT
0.3872
1
fixed IMDCT
IPP IMDCT
0.0144
0.0002
27
1898
xi 
n
1
2

k 0
2n
 y k cos(
(2i  1 
n
)( 2k  1))
2
Tajana Simunic Rosing
Target code identification
Algorithm
Identify critical code segments with a profiler

Formulate maximum size polynomials
 Higher likelihood of finding more complex
library elements
 Achieved by transformation such as loop
unrolling, constant and variable
propagation, inlining…
 Polynomial representation for the critical
code segments is calculated as follows:
• Linear functions are extracted directly
• Bit manipulations or Boolean functions
use interpolation-based algorithms
[Smith01]
• Nonlinear functions are approximated
with a Taylor or Chebyshev expansion
whose accuracy is verified via cycleaccurate simulation
Profiling
Critical Code
Polynomial
Formulation
Target Code
Source Code
Software Profile
for ( i=0; i<30; i++)
{
x[i] = y[i] + 2 * x[i + 1];
z[i] -= x[i];
y[i] = x[i] + z[i];
}
fun
energy
----------------15%
getD
sort
10%
init
2%
...
LD R21, #30;
ADD R21, R23,R27;
...
ARM Instruction-level Simulator
Profiler
Processor Core Model
L1 Cache
Energy
Consumption
Processor & L1 Cache Energy Model
Interconnect Energy Model
L2 Cache
Energy Model
DC-DC
Converter
Energy Model
Battery

Memory
Energy Model
Tajana Simunic Rosing
Mapping Algorithm
Polynomial Representation of Critical Code
THR
Factor
Expand
Polynomial Rep. of
Library Elements
Horner
Select Side
Relation Set
Add to Side
Relation Set
No
Simplify
Mapped?
Yes
Choose Best Solution
Tajana Simunic Rosing
Example

Phase shift keying modulation
Map a code segment of PSK to Library

S :=
1-.5*x0^2-x0*x1-.5*x1^2+.041667*x0^4+.166668*x0^3*x1
+.250002*x0^2*x1^2+.166668*x0*x1^3+.041667*x1^4;
Butterfly
IDCT
PSK
Cos
Sin
Mac
Library
Tajana Simunic Rosing
Example

Phase shift keying modulation
Map a code segment of PSK to Library

S :=
1-.5*x0^2-x0*x1-.5*x1^2+.041667*x0^4+.166668*x0^3*x1
+.250002*x0^2*x1^2+.166668*x0*x1^3+.041667*x1^4;
Butterfly
IDCT
› siderel := {y=x0+x1};
PSK
› simplify(S, siderel, [x0,x1,y]);
Cos
1.+.041667*y^4-.5*y^2
y := x0 + x1;
s := cos(y);
Sin
Mac
Library
Tajana Simunic Rosing
Experimental Setup
PCMCIA power
side 1
PCMCIA slot

Processor
power

SA-1100 based
embedded system with
MP3 from ISO as an
application
side 2

System & component
power measurements,
timer for performance

Tool implemented in C
with calls to Maple V
SDRAM slot
Fine grain power
measurements by
data acquisition board
Data acquisition board
Tajana Simunic Rosing
Execution time (s)
Original Code
r
rd
e
III
_
ec
o
_d
fm
an
hu

One frame decodes in 2.6 seconds

Profiler results show three critical functions

re
o
d.
..
eo
st
er
III
_
Functions
III
_
III
_
an
tia
lia
br
id
III
_
hy
tL
in
v
yn
th
an
dS
_m
dc
s
es
i
..
e_
s.
Su
bB
tiz
qu
an
de
III
_
s
1.4000
1.2000
1.0000
0.8000
0.6000
0.4000
0.2000
0.0000
Generate as large as possible polynomials for the critical
functions
Tajana Simunic Rosing
MP3 Final Results
Code version
Perf (s)
Original
503.92
IPP SubBand
301.43
IPP SubBand & IMDCT
211.27
In-House (IH) Library
5.47
In-House + IPP SubBand
3.33
IH+IPP SubBand & IMDCT
1.43
IPP MP3
0.41


Factor Energy (mWhr)
Factor
1.0 509.553
1.0
1.7 292.516
1.7
2.4 199.123
2.6
92.1
4.461 114.2
151.4
2.7955 182.3
352.4
1.1708 435.2
1240.8
0.3133 1626.4
Runs a factor of four faster than real-time
Additional energy savings are possible by using frequency
and voltage scaling
Tajana Simunic Rosing
Summary



Server Resource Management

Scheduling algorithms for multi-client heterogeneous wireless
embedded systems

Large energy savings with little or no cost in performance
Embedded HW management of NOCs

Node and network-centric approaches with closed-loop control

Power savings of a factor of 4 with network-centric approach,
while performance penalty reduced by more than 10%
Embedded software library mapping methodology
Symbolic algebra method maps to pre-optimized library elements
 Significant productivity improvement with large energy savings

Tajana Simunic Rosing
Next steps

Server Resource Management

Combine independent and managed clients

Prediction techniques for changes in the channel

Better scheduling
• Knowledge of the environment for help with scheduling (e.g. location)
• Seamless transition to WAN when needed
• Heuristics to help EDF/RM algorithms handle conflicts


Embedded HW management of NOCs

Integrated resource management for cores and the interconnect topology

Reliability as another aspect of RM
Embedded software optimization
Integration of optimizations with compilers
 Driver & OS optimization
 Hardware-driven software optimization

• Given specific set of hardware components, apply appropriate optimizations
automatically
Tajana Simunic Rosing
Thank you!!!
Bluetooth

Supports point-to-point and point-to-multipoint (piconet) synchronous and
asynchronous connections

Maximum throughput for asynchronous connections is 109-723 kbps

Supported low power modes: Hold, Park, Sniff and Deep sleep (vendor specific)
Tran.Time
(m sec)
OFF
STANDBY
INQUIRE
(UNKNOWN
ADDRESS)
PAGE
(KNOWN
ADDRESS)
Active Mode
0.18
Hold Mode
0.061
Hold entry
1.68
0.068
Hold exit
11.62
0.216
Park Mode
TRANSMIT
DATA (DH/DM
PACKET)
CONNECTED
Master/Slave
PARK
HOLD
DEEP
SLEEP
Power
(W)
0.061
Park entry
2.16
0.077
Park exit
4.12
0.126
Sniff Mode
SNIFF
0.061
Sniff entry
0.94
0.078
Sniff exit
7.36
0.194
Deep Sleep
270μ
Tajana Simunic Rosing
WaveLAN-802.11b

Designed to work in adhoc as well as infrastructure mode

System states: Active (Transmit/Receive), Doze (802.11b PM) & Off

IEEE standard power management
Traffic indication map (TIM) after every 100ms
 Doze mode activation if no data present

Tran. Time
(ms)
Power (W)
Transmit
-
1.4
Receive
-
0.93
Doze
-
0.045 –0.93
Off to Doze State
Doze entry
Doze exit
0.1
1.4
1
1.6
Active to Off State
Off entry
Off exit
1
1.7
300
2.3
Tajana Simunic Rosing
Server RM versus Client PM
 Server
has the additional
knowledge of multiple clients
workload & traffic conditions


Efficient transmission coordination in
multiple clients environment
Scheduling based on the link quality
 Server
•
Client PM can predict and schedule
only according to its user’s and
application requirements, no
knowledge of the rest of the
environment
decides the RM policy
Enable/disable various resouce
characteristics (e.g. BT park mode)
 Adjust resource parameters (e.g.
doze interval for 802.11b PM)
 Switch off wireless, set duration of
the OFF intervals
 On time wake-up (no delay)

 Server

RM relies on the client PM
Client PM controls the devices
Client PM Architecture
OS
PM
Apps
Kernel
Device driver
traffic
Hardware
Tajana Simunic Rosing
Client PM API
operating system
Power Manager
resource needs
Applications
Kernel
workload
Device driver
CLK speed, power states
Hardware
Power management of devices
 Active, Idle & Sleep state control
 Access mechanisms
 Buffer management
Client power management
• Application driven
• Frequency/voltage scaling
• Idle and Sleep state implementation
Tajana Simunic Rosing
Video Streaming Example
SERVER
PM
SHUT-OFF, DELAY,
PM ENABLE/DISABLE,
SCHEDULE COMM
Client
device
CLIENT
PM
TCP
DECODING RATE, SNR,
BUFFER SIZE
MEDIA
SERVER
VIDEO DATA
RTP/UDP
MPEG4
Decoder
FEEDBACK
RCTP
Tajana Simunic Rosing
Client Power Management
2.8
2.4
2
1.6
No Policy
Power (W)
DPM DPM+DCS@191MHz
DCS = Dynamic Processor Clock Setting
DPM = Dynamic Power Management of Processor and Devices
real-time MP3 and Distributed Speech Recognition
with 30% power reduction using Client PM
Tajana Simunic Rosing
Exponential estimation

Maximum likelihood estimator versus exponential average and ideal
120
Exp. Average (gain=0.03)
Exp. Average (gain=0.05)
Maximum likelihood
Ideal
100
Rate
80
60
40
20
0
0
100
200
300
400
500
600
700
800
900
1000
Sample Number
Exp  1  e  et
Tajana Simunic Rosing
Pareto estimation
Estimate Value
1
0.1
Estimate (a)
Estimate(b)
0.01
0
20
40
60
80
100
Sample Number
Pareto 1  b  t
a
Estimate errors are below 10% in steady state
Tajana Simunic Rosing
Symbolic Library Mapping
Given: Polynomial rep. of critical code sections S
A characterized library of complex elements
A routine for accuracy and cost feedback
Goal: Decompose S into available complex library

Optimize power consumption and performance

Use symbolic computer algebra routines & algorithms


Higher level optimizations possible

Example: factorization and simplification
Output: library element(s) implementing S
Tajana Simunic Rosing