Transcript Example 1

Vagueness of PlanetSeer's Measurement
Presented by Karl Deng
Example 1: Number of anomalies
“Passive monitoring allows us to detect more anomalies in less
time: we have confirmed nearly 272,000 anomalies in three
months. This is roughly 3,000 a day, and is 10 to 100 times
more than reported previously.”
• What do these numbers tell us?
• PlanetSeer can detect far more anomalies in less time?
• How many of these anomalies are unique?
• PlanetSeer tends to report the same anomaly repeatedly.
• A single anomaly may be reported many times, e.g., 1,000 times.
• Without knowing how often each anomaly is reported more than once, we cannot tell how many unique anomalies PlanetSeer detects.
• We can only guess: maybe PlanetSeer detects more anomalies than previous approaches.
Example 2: Anomaly Distributions
“Due to our wide coverage, we see new failure distribution and
location properties. … Tier 3 seems to be the most problematic,
accounting for almost half of the loops, path changes, and path
outages that we see.”
[Figure: PlanetSeer anomaly distribution plots; panels: Loops; Route change and outage; Number of hops in loops]
• To what extent can we trust these distribution numbers?
(Distribution of samples ≠ real distribution)
• How good are the measurement samples?
How good are the samples?
(MonDs run on 120 CoDeeN nodes in North America)
1. Paths between CoDeeN and the clients
2. Intra-CoDeeN paths
3. Paths between CoDeeN and the origin servers.
• The sampled paths may be strongly biased.
• PlanetSeer may report far more anomalies in some regions than in others.
• Maybe Tier 3 is the most problematic, but the samples alone cannot establish it.
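The gap between "distribution of samples" and "real distribution" can be illustrated with post-stratification reweighting, a standard survey-sampling correction brought in here for illustration (PlanetSeer does not describe doing this). This is a minimal Python sketch assuming we knew, per tier, how many paths were sampled and how many exist in the population of interest; the tier names and all numbers below are hypothetical, not PlanetSeer's data.

```python
def share(counts):
    """Fraction of the total contributed by each key."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def reweighted_share(anomalies, sampled_paths, population_paths):
    """Post-stratified anomaly share per tier.

    anomalies[t]        -- anomalies observed on sampled paths in tier t
    sampled_paths[t]    -- number of sampled paths in tier t
    population_paths[t] -- number of paths in tier t in the population (hypothetical)
    """
    # Per-path anomaly rate in each tier, estimated from the sample.
    rate = {t: anomalies[t] / sampled_paths[t] for t in anomalies}
    # Anomalies expected if the population's paths had these rates.
    expected = {t: rate[t] * population_paths[t] for t in rate}
    return share(expected)


if __name__ == "__main__":
    # Hypothetical inputs: Tier 3 paths are over-sampled 4x relative to the others.
    anomalies = {"tier1": 100, "tier2": 100, "tier3": 200}
    sampled = {"tier1": 100, "tier2": 100, "tier3": 400}
    population = {"tier1": 100, "tier2": 100, "tier3": 100}
    print(share(anomalies)["tier3"])
    print(reweighted_share(anomalies, sampled, population)["tier3"])
```

With these made-up inputs, Tier 3 accounts for half of the raw anomaly count but only 20% after reweighting, because its paths were over-sampled and are individually less anomalous. Without knowing how the sampled paths are spread across tiers, a claim like "almost half" cannot be told apart from this kind of sampling artifact.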
Summary from the two examples
Vagueness 1: We have no idea of the number or distribution of unique anomalies.
No estimate of the degree of duplicate reporting is provided.
Vagueness 2: We have no idea how representative the measurement results are.
No estimate of measurement error is provided.
• Errors caused by sampling bias.
• Errors caused by bias in which samples are measurable
(due to the conservative methodology).
Example of Conservative Methodology
When analyzing the scope of path outages, the authors state:
“Among all the outages, about 47% have no complete
reference path. In the following, we use only those with
complete reference paths in the scope analysis.”
• Oh, nearly half of the samples are discarded because they cannot be measured!
• How much does this affect the representativeness of the quantitative measurement results?
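One way to put a number on "how much": worst-case bounds that assume nothing about the discarded outages, a standard partial-identification technique brought in here for illustration rather than something the paper does. If only a fraction r of outages is analyzed and a fraction p of those has some property (say, a given scope), the true fraction over all outages is only known to lie between r·p and r·p + (1 − r). With about 47% discarded, r = 0.53, so the interval is always 0.47 wide. A minimal Python sketch; the observed fraction used in the example is hypothetical.

```python
def worst_case_bounds(p_observed, retained_fraction):
    """Bounds on the population fraction when the unretained share is unobserved."""
    if not (0.0 <= p_observed <= 1.0 and 0.0 <= retained_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    r = retained_fraction
    low = r * p_observed             # every discarded outage lacks the property
    high = r * p_observed + (1 - r)  # every discarded outage has the property
    return low, high


if __name__ == "__main__":
    # r = 0.53 because about 47% of outages lacked a complete reference path.
    low, high = worst_case_bounds(0.6, 0.53)  # 0.6 is a hypothetical observed share
    print(f"true share lies in [{low:.3f}, {high:.3f}]")
```

Under these assumptions an observed share of 60% is compatible with a true share anywhere from about 32% to 79%. Narrowing that interval requires evidence about the discarded outages, which is exactly the estimate the slides say is missing.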
Conclusion
Vagueness 1: We have no idea of the number or distribution of unique anomalies.
Vagueness 2: We have no idea how representative the measurement results are.
• The measurement results are of very limited use.
• Many conclusions are not reliable.