2011-sam - Rice University -

Download Report

Transcript 2011-sam - Rice University -

Energy-Efficient Soft Real-Time CPU
Scheduling for Mobile Multimedia
Systems
Authors: Wanghong Yuan, Klara
Narhstedt
Appears in SOSP 2003
Presented by: Samuel Kim
Table of Contents
•
•
•
•
•
•
•
About the Authors
Introduction
Algorithm
Implementation
Results
Related Works
Summary
About the Authors
• Wanghong Yuan
▫ B.S., M.S. Peking University
▫ Ph.D. University of Illinois at Urbana-Champaign
▫ Software Engineer at Google
• Klara Nahrstedt
▫ Ph.D. University of Pennsylvania
▫ Professor at University of Illinois at UrbanaChampaign
Table of Contents
•
•
•
•
•
•
•
About the Authors
Introduction
Algorithm
Implementation
Experimental Results
Related Works
Conclusion
Introduction
• Multimedia Becoming A Standard in Mobile
Computing
▫ Audio
▫ Video
▫ Data
• Goal on Mobile Systems
▫ Manage System Resources
 Quality of Service - High Performance
 Energy Efficiency - Battery Life
Greater Control Over System Resources
• Hardware Adaptability
▫ CPU Voltage Scaling
▫ E = a•C•V•f2•t
• Software Adaptability
▫ Application Quality levels
▫ Statistical Performance Requirements
 Soft Real-Time guarantees
How Do We Approach System Resource
Management?
▫ Adapt resources based on system layers
Adaptive Layers
Application
Operating System /Network
Hardware
▫ Most approaches in research adapt a single layer
▫ Possible to adapt across multiple layers?
Multiple Layer Adaptation Requires
Coordination
• Conflict Adapting Multiple System Layers
▫ Scale down CPU
▫ Increase the application QoS
• Different Objectives
▫ Minimize Energy Consumption
▫ Maximize Quality/Performance
• Coordinate Objectives at a Higher Level
The Purpose of GRACE Framework:
Cross-Layer Adaptation
• Global Resource Adaptation via CoopEration
GRACE
Current Approaches
Operating
System
Operating System
Architecture, Hardware
Coordinator
Application
Network Protocols
Architecture,
Hardware
Application
Network
Protocols
• Adaptation over 1 or 2 layers
• Global cooperation of resources
Figures from S. Adve. “The Illinois GRACE Project: Global Resource Adaptation through
CoopEration”, Workshop on Self-Healing Adaptive and Self-Managed Systems, 2002
GRACE-OS: Enhanced CPU Scheduler
• Previous Methods
▫ Soft Real-Time (SRT) Scheduling
▫ Dynamic Voltage Scaling (DVS)
• GRACE-OS
▫ DVS is integrated into the CPU Scheduler
▫ Continue to keep performance guarantees of SRT
Scheduling
Table of Contents
•
•
•
•
•
•
•
About the Authors
Introduction
Design and Algorithm
Implementation
Experimental Results
Related Works
Conclusion
Design of GRACE-OS
• Profiler
▫ How to estimate cycle usage?
 Monitor CPU cycle usage of a task
 Estimate demand by online profiling
• SRT Scheduler
▫ How to allocate CPU Resources?
 Allocate CPU cycles to task based on profiler
• Speed Adapter
▫ How to set CPU Speed/Voltage?
 Set CPU to minimum required speed based on
#cycles allocated
Algorithm: Profiler
• How to estimate the cycle usage?
• Estimate based on statistical distribution instead
than instantaneous demand
▫ More stability in CPU speeds
▫ Meets performance requirements of SRT
• Profile during run-time
Algorithm: SRT Scheduler
• Determine which task to execute
▫ When and how long (# CPU cycles)
• Grace-OS is a stochastic scheduler
▫ Decide # cycles to allocate based on:
 Performance requirement, p
 Demand distribution of task
▫ F(C) = P[X ≤ C] ≥ p
 X, # cycles required for task
 C, # cycles allocated to task
Algorithm: Dynamic DVS
• As cycle number increases, CPU accelerates
• Minimize energy consumption
▫ Constraint: CPU period is less than period
allocated for task
• Frequency a function of cycle count
▫ ∀𝑓𝑏𝑖 =
𝑚
𝑗=0
𝑠𝑗3(1−𝐹(𝑏𝑗))
𝑠𝑖 (1−𝐹 𝑏𝑖 )
Table of Contents
•
•
•
•
•
•
•
About the Authors
Introduction
Algorithm
Implementation
Experimental Results
Related Works
Conclusion
Implementation
• Testbed
▫ HP Pavilion N5470 Laptop (Athlon Processor)
▫ Red Hat Linux 7.2
▫ Modified Linux kernel 2.4.18 (GRACE-OS)
• Software Architecture of Implementation
Implementation
• System calls added to support SRT tasks
▫
▫
▫
▫
▫
start_srt – start real-time mode
exit_srt – exit real-time mode
finish_job – tell scheduler that task finished job
set_budget – allocate cycles for task
set_dvspnt – set CPU speed in task’s speed schedule
• Modifying the process control block
▫
5 attributes
Table of Contents
•
•
•
•
•
•
•
About the Authors
Introduction
Algorithm
Implementation
Experimental Results
Related Works
Conclusion
System Call Overhead
• System Calls: 900-1300 cycles
• Multimedia Processing: 2x105 - 2x108 cycles
▫ 0.0004% - 0.5% of cycles per job
Profiling and Estimation Overhead
• Profiling Cost: 26-38 cycles
• Overhead for online demand estimation is high
(0.1% - 100% of cycles per job)
▫ Demand estimation should be infrequent
▫ Stable models allow for infrequent estimation
Figure: Cost of Demand Estimation
Speed Scaling Overhead
• Costs 8,000 to 16,000 cycles (~10-50 us)
• Should be invoked infrequently (500 us in GRACE-OS)
• Speed change overhead should improve with processor
design
Stability of Demand Distribution
• Codec: mpgplay
a) Cycle usage varies greatly
b) Demand distribution
remains stable
Stability of Demand Distribution
(Other Codecs)
Efficiency of GRACE-OS
• Compare to other allocation schemes
• Running Single Applications
▫ Misses deadlines 0.3%-0.6%
▫ 92% CPU busy time at lowest CPU speed
▫ 53.4%-71.6% reduction in energy
• Running Multiple Applications
▫ Misses deadlines 4.9%
▫ 83.8% CPU busy time at lowest CPU speed
CPU Usage for Multiple Applications
• Dynamic DVS spends more time in lowest CPU
speed than other DVS schemes
Energy Efficiency of GRACE-OS
• toast and madplay – Low CPU demand
• GRACE-OS savings limited by CPU settings
Impact of Setting Performance p
• Normalized energy increases p = 0.5 to p = 0.95
• Fewer energy savings p = 0.95 to p = 1.0
▫ Need more CPU settings
Impact of p on Normalized Energy
Impact of Mixed Workload
• Extra allocation to extra best-effort applications
increases energy consumption
▫ Less time for each application
▫ Increases total CPU demand
Impact of Mixed Workload
Table of Contents
•
•
•
•
•
•
•
About the Authors
Introduction
Algorithm
Implementation
Experimental Results
Related Works
Conclusion
Related Works: Soft Real-Time
Scheduling
• Proportional Sharing
▫ A. Chandra, M. Adler, P. Goyal, and P. Shenoy. Surplus fair scheduling: A proportionalshare CPU scheduling algorithm for symmetric multiprocessors. In Proc. of 4th
Symposium on Operating System Design and Implementation, Oct. 2000.
• CPU Reservations
▫ M. Jones, D. Rosu, and M. Rosu. CPU reservations & time constraints: Efficient,
predictable scheduling of independent activities. In Proc. of 16th Symposium on
Operating Systems Principles, Oct. 1997.
• Real-Time Scheduling Algorithms
▫ C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard
real-time environment. JACM, 20(1):46–61, Jan. 1973.
• Stochastic Scheduling
▫
K. Gardner. Probabilistic analysis and scheduling of critical soft real-time systems. PhD thesis,
Department of Computer Science, University of Illinois at Urbana-Champaign, 1999.
Related Works: Dynamic Voltage
Scaling
• General Purpose DVS based on Average CPU Utilization
▫ D. Grunwald, P. Levis, K. Farkas, C. Morrey III, and M. Neufeld. Policies for
dynamic clock scheduling. In Proc. of 4th Symposium on Operating System
Design and Implementation, Oct. 2000.
• Real Time DVS
▫ P. Pillai and K. G. Shin. Real-time dynamic voltage scaling for low-power
embedded operating systems. In Proc. of 18th Symposium on Operating Systems
Principles, Oct. 2001.
• Stochastic DVS
▫ J. Lorch and A. Smith. Improving dynamic voltage scaling algorithms with PACE.
In Proc. of ACM SIGMETRICS 2001 Conference, June 2001.
▫ F. Gruian. Hard real-time scheduling for low energy using stochastic data and
DVS processors. In Proc. Of Intl. Symp. on Low-Power Electronics and Design,
Aug. 2001.
Table of Contents
•
•
•
•
•
•
•
About the Authors
Introduction
Algorithm
Implementation
Experimental Results
Related Works
Conclusion
Conclusion
• Pros
▫
▫
▫
▫
▫
Optimizes multiple layers of system resources
Conserve energy while ensuring quality of service
Small overhead
Support for multiple tasks
Thorough testing
• Cons
▫ Estimate energy savings without measurement
▫ Testing limited to multimedia applications
▫ Limited number of tests per codec
 8 runs per test
 Discard largest and smallest values
▫ Limited CPU speed settings decreases energy savings