SigmaXi-Tu - Computer Science and Engineering
Control-Based Load Shedding in Data Stream Management
Yicheng Tu†, Song Liu‡, Sunil Prabhakar†, Bin Yao‡
†Indiana Center of Database Systems, Department of Computer Sciences, 305 N. University Street, West Lafayette, IN 47907
‡School of Mechanical Engineering, 140 S. Intramural Drive, West Lafayette, IN 47907
Introduction
Data Stream Management Systems (DSMSs) process a
large number of data streams to answer user-specified
queries. These systems are generally built on a
query-passive, data-active model, in which all data are
pushed to the database server for processing and query
results are sent to users continuously. Processing delay
is critical in DSMSs because query results generated
from old data are useless to users. When the system is
overloaded, some data tuples must be discarded without
processing in order to achieve the desired processing delay.
This is called load shedding.
Our approach
- View load shedding as a feedback control problem
- Develop a dynamic model for a specific DSMS
- Design the controller via rigorous control-theoretical methods
- Work on a real DSMS: the open-source Borealis system
Key Questions:
• When?
• How much?
• Where?
We focus on the first two questions.
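The first two questions (when to shed and how much) can be illustrated with a minimal sketch. It assumes a hypothetical random-drop shedder: tuples are dropped only when the measured arrival rate exceeds the rate allowed into the DSMS, and each tuple is dropped with a probability equal to the excess fraction. The function names `drop_probability` and `shed` are illustrative and are not part of Borealis.

```python
import random

def drop_probability(arrival_rate, allowed_rate):
    """Fraction of tuples to shed so the admitted rate does not exceed allowed_rate."""
    if arrival_rate <= 0 or arrival_rate <= allowed_rate:
        return 0.0  # "When?": no overload, so nothing is shed
    allowed_rate = max(allowed_rate, 0.0)
    return 1.0 - allowed_rate / arrival_rate  # "How much?": the excess fraction

def shed(tuples, p, rng=random):
    """Keep each tuple independently with probability 1 - p (hypothetical random drop)."""
    return [t for t in tuples if rng.random() >= p]
```

For example, with 100 tuples/s arriving and 80 tuples/s allowed, about 20% of the tuples would be dropped; with 50 tuples/s arriving, none would be.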
Figure 3. The feedback control loop for load
shedding. Output (y): average tuple delay; Input
(u): tuple injection rate to DSMS; target delay
value (yr) and control error (e).
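The loop in Figure 3 can be sketched as a small simulation. This is only an illustration under stated assumptions, not the controller or model from the poster: the plant is a toy queue whose delay is the queue length times a per-tuple cost `c`, the control law is a generic PD form, and the gains `kp` and `kd` are hypothetical values chosen so that the toy loop is stable.

```python
def simulate(arrival_rates, c=0.01, T=1.0, y_target=2.0, kp=50.0, kd=10.0):
    """Toy closed loop for load shedding; returns (delays, shed_fraction).

    arrival_rates: offered load (tuples/s), one value per control period T.
    c: assumed per-tuple processing cost in seconds (toy plant parameter).
    kp, kd: hypothetical PD gains, not taken from the poster.
    """
    service_rate = 1.0 / c   # tuples/s the toy server can process
    queue = 0.0              # tuples waiting to be processed
    e_prev = None
    delays, offered, admitted = [], 0.0, 0.0
    for a in arrival_rates:
        y = c * queue                    # output y: estimated tuple delay
        e = y_target - y                 # control error e = yr - y
        de = 0.0 if e_prev is None else e - e_prev
        v = service_rate + kp * e + kd * de   # generic PD law (assumed form)
        u = min(a, max(0.0, v))          # input u: admitted rate; the excess is shed
        queue = max(0.0, queue + T * (u - service_rate))
        e_prev = e
        delays.append(y)
        offered += a * T
        admitted += u * T
    shed_fraction = 1.0 - admitted / offered if offered > 0 else 0.0
    return delays, shed_fraction
```

Calling `simulate([200.0] * 60)` models a sustained overload at twice the toy service rate: the delay settles near the 2-second target while roughly half of the offered tuples are shed. With an underloaded input such as `simulate([50.0] * 10)`, nothing is shed.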
Figure 5. Relative performance of CTRL to AURORA and
BASELINE. A, B, C: various aspects of delay violations;
D: percentage of data discarded.
Figure 1. Push-based DSMS system model (diagram labels: Data, User, Query, Results, DSMS).
Results
- Obtained a first-order linear model for Borealis
- Pole placement-based design yielded a PD controller [equation not preserved in this transcript], where c and H are system-specific constants and T is the control period
- Identified and solved several DSMS-specific problems
- Control framework evaluated with real and synthetic data
Figure 6. Robustness of CTRL and AURORA tested with
input streams of different burstiness (a smaller bias factor
represents a burstier stream).
Objective
To design and implement a load shedding framework that
• minimizes data loss;
• maintains the target processing delay despite disturbances:
- bursty data arrivals;
- internal dynamics of the DSMS.
• is robust, i.e., works for a wide range of input streams.
Conclusions
1. First database work that uses feedback-control-theoretical
methods;
2. Rigorous system modeling and controller design
generate a PD controller that controls average tuple
delays by adjusting the amount of load shedding;
3. Control framework implemented and evaluated in a real
DSMS. Experiments show that the feedback-control-based
method significantly improves control of delays with the
same amount of data loss as current solutions;
4. The above solution is also robust.
Figure 2. Examples of disturbances in data
processing in DSMS. Top: bursty arrival rates;
Bottom: unit processing costs.
Figure 4. Performance of our load shedding solution (CTRL);
AURORA, an open-loop solution that represents the state of the art
in DSMS load shedding; and BASELINE, a naïve
feedback-based solution.
Acknowledgements
This is joint work with my advisor, Prof. Sunil Prabhakar, and with Dr. Song Liu and
Prof. Bin Yao of the School of Mechanical Engineering at Purdue University. The author
would also like to thank Ms. Nesime Tatbul and Prof. Stan Zdonik, both from the
Computer Science department of Brown University, for providing the Aurora/Borealis
source code.