0624849-Nuggets - Wayne State University

Machine Learning for Adaptive and Highly Reliable
Networked Computer Systems, NSF DMS-0624849
Cheng-Zhong Xu, George G. Yin, and Le Yi Wang
Wayne State University, Detroit, MI
Research Objectives: Design of adaptive, highly reliable, and
self-manageable networked computer systems via machine learning.
Approaches: Aggregated reinforcement learning, statistical
scheduling and optimization, and autonomic resource management.
Significant Results:
Fig. 1 QoS assurance for premium clients on the Internet when the requests of WorldCup’98 were replayed. It adapts to changes in network conditions and server load.
• Proved it feasible to use on-line feedback control approaches to provide guaranteed quality of service on Internet servers. Previous studies were limited to quality control of simple static web pages. Our eQoS approach advances the technology to the level of control over more practical dynamic multi-object web pages [1].
• Proved it feasible to use statistical learning techniques to predict system failures.
Previous studies focused on modeling and analysis of temporal correlations of
failures. Our hPredictor approach additionally quantifies spatial correlations between
processing nodes by using time-efficient aggregate stochastic models [2].
• Developed regime-switching models in stochastic optimization. We have shown that, under simple conditions, a continuous-time interpolation of the iterates of the recursive algorithm converges weakly to a system of ODEs with regime switching, and that a suitably scaled sequence of the tracking errors converges to a system of switching diffusions [3]. This paves the way for the use of stochastic approximation in autonomic resource management of large-scale distributed computer systems.
Fig. 2 hPredictor has been in operation online to predict node failures of the WSU campus grid of high-performance clusters since April 6, 2006.
[Plot: number of observed faults, number of predicted faults, and correlated faults vs. time (in units of time window)]
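The regime-switching result above concerns recursive algorithms whose target changes with a hidden regime. As a rough illustration only, the sketch below runs a generic constant-step stochastic approximation that tracks a mean governed by a two-state Markov chain. It is not the algorithm analyzed in [3]; the function name `track_switching_mean` and all parameters (`levels`, `switch_prob`, `step`, `noise_std`) are hypothetical choices made for this example.

```python
import random


def track_switching_mean(levels, switch_prob, step, n_steps, noise_std, seed=0):
    """Track a regime-dependent mean with a constant-step recursive update.

    Illustrative assumption: the regime is a two-state Markov chain that
    flips with probability `switch_prob` at every step, and each noisy
    observation is centered at `levels[regime]`.
    """
    rng = random.Random(seed)
    regime = 0
    estimate = 0.0
    history = []
    for _ in range(n_steps):
        # Regime dynamics: flip between state 0 and state 1.
        if rng.random() < switch_prob:
            regime = 1 - regime
        observation = levels[regime] + rng.gauss(0.0, noise_std)
        # Recursive update: move a small step toward the new observation.
        estimate += step * (observation - estimate)
        history.append((regime, estimate))
    return history


if __name__ == "__main__":
    run = track_switching_mean(
        levels=(2.0, 5.0), switch_prob=0.01, step=0.1, n_steps=500, noise_std=0.5
    )
    for t in range(0, 500, 50):
        regime, estimate = run[t]
        print(f"t={t:3d} regime={regime} estimate={estimate:.3f}")
```

With a small step size the estimate moves smoothly toward the current regime's level and lags briefly after each switch, which is the kind of tracking behavior that the ODE and switching-diffusion limits describe in the scaled setting.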
Broader Impacts: Enhance computer systems’ sustainable performance and reliability in real applications;
advance mathematical models and theories in new applications; motivate students to participate in interdisciplinary
research in computer and mathematical sciences.
1. IEEE Trans. on Computers, 2006 (in press)
2. IEEE Trans. on DSC, to be submitted (2006)
3. SIAM J. Optim. (2004)
http://www.ece.eng.wayne.edu/~czxu/siloam.html