Reading Report 9 Yin Chen 29 Mar 2004


Reading Report 9
Yin Chen
29 Mar 2004
Reference:
Multivariate Resource Performance Forecasting in the
Network Weather Service, Martin Swany and Rich Wolski
http://sc-2002.org/paperpdfs/pap.pap292.pdf
Problems Overview

Frequently, monitoring data is used as a prediction of future performance.

The Common Method
“Forecast” based on the last value:
Assume the performance of a given transfer will be the same as it was the last
time it was performed.
 For example, for network bandwidth estimation, many users simply conduct a lengthy data
transfer between end hosts, observe the throughput, and use that observation as
the prediction.
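The last-value method can be sketched in a few lines. This is a minimal illustration, not code from the paper: the helper name `last_value_forecasts` and the throughput numbers are made up for the example.

```python
def last_value_forecasts(history):
    """Pair each measurement with the forecast made for it: the previous value.

    Returns a list of (forecast, actual) tuples. Hypothetical helper,
    for illustration only.
    """
    return list(zip(history[:-1], history[1:]))


# Made-up throughput observations in megabits/sec; the third is a transient dip.
history = [8.0, 9.5, 4.0, 9.0]
for forecast, actual in last_value_forecasts(history):
    error = abs(forecast - actual)
    print(f"forecast={forecast:.1f} actual={actual:.1f} error={error:.1f}")
```

In this made-up series a single transient dip produces two large errors in a row: once when the dip is not anticipated, and again when it is used as the next forecast.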


Two Problems

It is not clear that the last observation is a good estimate.
 It requires the resource to be “loaded” enough:
 On high-throughput networks with lengthy round-trip times, enough data
must be transferred to match the bandwidth-delay product for the end-to-end
route.
 It is costly: resources are wasted and time is lost while the probe takes place.
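To give a feel for how much data a probe must move, the bandwidth-delay product can be worked out directly. The sketch below uses a hypothetical path (1 Gbit/s, 100 ms round-trip time); these figures are assumptions for illustration and do not come from the paper.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_sec, rtt_sec):
    """Bytes that must be in flight to fill the path: bandwidth * RTT / 8."""
    return bandwidth_bits_per_sec * rtt_sec / 8.0


# Hypothetical path: 1 Gbit/s bandwidth with a 100 ms round-trip time.
bdp = bandwidth_delay_product_bytes(1e9, 0.100)
print(f"{bdp / 1e6:.1f} MB")  # prints "12.5 MB"
```

On such a path a probe has to keep roughly 12.5 MB in flight just to fill the pipe, which is why a probe that loads the resource properly is expensive.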
Network Weather Service’s Solutions

Using statistical techniques

Combine other performance monitoring tools

Combine infrequent, irregularly spaced expensive measurements with regularly
spaced, but far less intrusive probes

Combine short NWS bandwidth probes with the history of previous HTTP transfers

Combine data from two or more measurement streams

Combine heavy-weight operations with lightweight, inexpensive probes of a
resource
Forecasting Using Correlation

The univariate forecaster represents the current and future performance of a
single dataset.

The multivariate forecaster operates on some subset of a collection of
measurement series, characterized by units and frequency, with a
combination of correlation and forecasting.

The correlator serves to map X values onto Y values where the X values are
plentiful and “cheap” and the Y values are “rare” and expensive.

Given two correlated variables, knowledge of one at a given point in time provides
information as to the value of the other.

Fixing the value of one of the variables allows reasonable assumptions about the
probability of various values of the other variable, based on their history of
correlation.
Possible Correlator Methodology

Possible Mapping Methods :

Linear regression
 Traditional correlation operates on datasets of equal sizes
 Assumes that the variables in question are related linearly

Problems with measurement data gathered from a variety of sources:

Data sets may be of different sizes
 Data items in each set may be gathered with different frequencies or with
different regularities
 The units of measure may be different between data sets.
Correlator Methodology of the Work

Rank Correlation Techniques

A rank correlation measure sorts datasets and does linear correlation based on
their position in the sorted list (their “rank”).
 It is non-parametric, which is appropriate because the distribution of the data cannot be
known with certainty.
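As an illustration of the idea of correlating ranks rather than raw values, here is a minimal Spearman-style sketch. The slides do not say which rank correlation measure the paper has in mind, so this is one common choice, not the authors' method. It assumes two paired series of equal length with some variation in each; the function names are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(a, b):
    """Plain linear (Pearson) correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def rank_correlation(x, y):
    """Linear correlation computed on ranks instead of raw values."""
    return pearson(ranks(x), ranks(y))


# y is a monotonic but nonlinear function of x, so the ranks agree perfectly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]
print(rank_correlation(x, y))
```

Because only the ordering matters, the cubic relationship above scores as a perfect correlation even though it is far from linear in the raw values.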

Cumulative Distribution Function (CDF)

Defined as the probability for some set of sample points that a value is less than or
equal to some real valued x.
 If the probability density function (PDF) is known, the CDF is defined as:
$$CDF(x) = \int_{-\infty}^{x} PDF(t)\,dt$$

Reasons for using the CDF
 Practicality of delivering this information to the Grid
 A CDF dataset may be compressed with varying degrees of loss; e.g., a
handful of points can describe the CDF with linear interpolation between
specified points.
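The compression idea can be sketched as follows: keep a handful of (value, probability) points from the empirical CDF and interpolate linearly between them. The function names and the choice of evenly spaced ranks are assumptions made for this sketch, not details from the paper.

```python
def compress_cdf(samples, num_points):
    """Keep num_points (value, probability) pairs evenly spaced in rank.

    Assumes num_points >= 2 and a non-empty sample list.
    """
    s = sorted(samples)
    n = len(s)
    points = []
    for k in range(num_points):
        idx = round(k * (n - 1) / (num_points - 1))
        points.append((s[idx], (idx + 1) / n))
    return points


def cdf_from_points(points, x):
    """Evaluate the compressed CDF at x by linear interpolation."""
    if x < points[0][0]:
        return 0.0
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if x0 <= x < x1:
            return p0 + (p1 - p0) * (x - x0) / (x1 - x0)
    return points[-1][1]


# 100 made-up samples described by only five points.
samples = [float(v) for v in range(1, 101)]
points = compress_cdf(samples, 5)
print(points)
print(cdf_from_points(points, 38.0))
```

Fewer points mean a smaller description to ship to clients but a coarser approximation. For the evenly spread samples here, five points already recover the CDF at 38.0 almost exactly.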
Correlator Methodology of the Work (Cont.)

Correlator Methodology

Approximate the CDF by:

$$CDF_X(x) = \frac{|\{x_i \in X : x_i \le x\}|}{count_x}$$

where count_x is the total number of elements in the sample set. This is equivalent to

$$CDF_X(x) = \frac{\text{position of } x \text{ in sorted } X}{count_x}$$

Rank measurements in datasets of different sizes by computing the CDF for each of them.
Use the position of the value in X (x_target) to produce a value of Y (y_forecast):

$$y_{forecast} = CDF_Y^{-1}(CDF_X(x_{target}))$$

This gives the value in Y (in CDF_Y) associated with the position of x_target.
If the CDF position of x_target (CDF_X(x_target)) falls between measured Y values, linear interpolation is
used to determine the value of y_forecast.
This mapping is similar to a quantile-quantile comparison.
The x_target can be chosen in a variety of ways.
Choose x_target by using the NWS forecasters to produce the value:
x_target = Forecast(X)
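The mapping described on this slide can be sketched as below. This is a minimal illustration under stated assumptions, not the NWS implementation. The function names are hypothetical, the interpolation convention (sorted sample k sits at CDF position (k + 1) / n) is one reasonable choice, and x_target is passed in directly instead of coming from the NWS forecasters.

```python
from bisect import bisect_right


def cdf_position(sorted_x, x_target):
    """Empirical CDF of X: fraction of samples less than or equal to x_target."""
    return bisect_right(sorted_x, x_target) / len(sorted_x)


def value_at_position(sorted_y, position):
    """Inverse empirical CDF of Y, interpolating linearly between samples."""
    n = len(sorted_y)
    # Assumed convention: sample k (0-based) sits at CDF position (k + 1) / n.
    rank = position * n - 1.0
    if rank <= 0:
        return sorted_y[0]
    if rank >= n - 1:
        return sorted_y[-1]
    lo = int(rank)
    frac = rank - lo
    return sorted_y[lo] + frac * (sorted_y[lo + 1] - sorted_y[lo])


def forecast_y(x_history, y_history, x_target):
    """Map x_target onto Y through matching CDF positions."""
    xs = sorted(x_history)
    ys = sorted(y_history)
    return value_at_position(ys, cdf_position(xs, x_target))


# Made-up data: eight cheap X probes and only four expensive Y measurements.
x_history = [10, 20, 30, 40, 50, 60, 70, 80]
y_history = [100, 200, 300, 400]
print(forecast_y(x_history, y_history, 40))  # position 0.5 in X
print(forecast_y(x_history, y_history, 50))  # position 0.625, between two Y samples
```

The two histories have different lengths and different units, which is exactly the case that plain linear regression on paired samples does not handle.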
Correlator Methodology of the Work (Cont.)

Figure 1 shows the NWS (X) values.

Figure 2 shows the HTTP (Y) values.

To compute the forecast for Y, use the position
of x_target in the CDF of X, which yields some real
number between 0 and 1.

This predictive method forms both the X and Y
CDFs. At each prediction cycle, a univariate
forecast of X is taken and that X value’s
position in its CDF is determined. That position is used to
determine the value in Y at the corresponding
position, which is the prediction.
Correlator Methodology of the Work (Cont.)

Error

Compute the Mean Normalized Error Percentage (MNEP) as the last
prediction’s error over the current average value.
 This is required due to the forecast-oriented application of the system.
 The mean used (mean_t) is the current mean and is reevaluated at each timestep.
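One way to read this definition is sketched below: at each timestep the absolute error of the prediction is divided by the running mean of the observations so far, and the results are averaged and expressed as a percentage. This is an interpretation of the slide text, not the paper's exact formula. The function name `mnep` and the sample numbers are assumptions.

```python
def mnep(forecasts, observations):
    """Mean of |forecast_t - y_t| / mean_t, as a percentage.

    mean_t is the running mean of the observations up to and including t,
    so it is re-evaluated at every timestep. Assumes equal-length inputs
    and positive observations.
    """
    total = 0.0
    running_sum = 0.0
    for t, (f, y) in enumerate(zip(forecasts, observations), start=1):
        running_sum += y
        mean_t = running_sum / t
        total += abs(f - y) / mean_t
    return 100.0 * total / len(observations)


# Made-up values: a constant forecast of 10 against three observations.
print(mnep([10.0, 10.0, 10.0], [10.0, 12.0, 8.0]))
```

Normalizing by the current mean makes the errors comparable across series with different units or magnitudes.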
Experimental Methodology

Generate default-sized NWS bandwidth
measurements (64K bytes), using the
TCP/IP end-to-end sensor between a
pair of machines every 10 seconds.

Initiate a 16MB HTTP transfer every
minute and record the observed transfer
bandwidth.

Figure 3 shows the NWS data

Figure 4 shows the HTTP data

The exact correlative relationship is not
clear visually, but the shapes appear
similar.
Experimental Methodology (Cont.)

Goal
 Use this data to investigate the accuracy with which the short 64K NWS transfer could be used to
predict the long 16MB HTTP transfers given successively longer periods.

Why HTTP?
 Widely used in the Internet
 A key component of SOAP and Web services
 Used more and more in high-performance computing
 This data represents:
  The relationship between short and long network experiments.
  Instrumentation data being fed into the forecasting service.

Compare with 3 prediction methods
 Last value method
 The univariate NWS forecast applied solely to the 16MB probe history
 The new correlative methods

Why 16MB?
 To limit intrusiveness
 Less variability in the observed bandwidth as the amount of data transferred gets larger.
 Any transient periods of high congestion are amortized over the longer transfer time.
 Not so long as to make things easy, but long enough to represent real applications and real data.
Results

Figure 5 shows the mean absolute error (MAE) of the univariate
compared to the multivariate forecasters.

The multivariate forecaster performs better.
 As Y measurements become less frequent, the univariate
forecasters lose accuracy.
 Using the new forecasting technique, the
NWS can predict 16MB bandwidth using 64K
measurements with a mean error of 0.47
megabits/sec.

Figure 5. Comparison of MAE between univariate
and multivariate forecasts for different
frequencies of HTTP measurements.
Figure 6 shows the MNEP of the MAE.

The MNEP shows the percentage of the
current average value that an error
represents.
 At 500 min., the 64K transfers can predict
16MB transfer bandwidth to within 7% on
average.
Figure 6. Comparison of MNEP of MAE between
univariate and multivariate forecasts for different
frequencies of HTTP measurements.
Results (Cont.)

Figure 7 shows the square root of the mean squared error (MSE)
of both types of forecasts.

This value bears resemblance to the standard
deviation and tends to indicate the variance
of the error.
 The new method predicts with a relatively high level
of accuracy.
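For reference, the two error measures used in these results can be computed as below. This is a generic sketch with made-up numbers, not the paper's evaluation code. The function names are hypothetical.

```python
from math import sqrt


def mae(forecasts, actuals):
    """Mean absolute error: average size of the forecast errors."""
    errors = [abs(f - a) for f, a in zip(forecasts, actuals)]
    return sum(errors) / len(errors)


def rmse(forecasts, actuals):
    """Square root of the mean squared error."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return sqrt(sum(squared) / len(squared))


# Made-up values: three small errors and one large one.
forecasts = [10.0, 10.0, 10.0, 10.0]
actuals = [9.0, 11.0, 10.0, 14.0]
print(mae(forecasts, actuals))
print(rmse(forecasts, actuals))
```

Squaring weights the single large error more heavily, so the root of the MSE comes out larger than the MAE here. That sensitivity to spread is why it behaves like a standard deviation of the error.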

Figure 8 shows the MNEP of the MAE for the
CDF forecaster and the “Last Value”
predictor.
 The Last Value predictor has over 100% error.

Figure 7. Comparison of the square root of the MSE
between univariate and multivariate forecasts for
different frequencies of HTTP measurements.
Figure 8. Comparison of MNEP of MAE between
“Last Value” and multivariate forecasts.
END…