Transcript slides

Enabling Flow-level Latency Measurements across Routers in Data Centers
Parmjeet Singh, Myungjin Lee, Sagar Kumar, Ramana Rao Kompella
Latency-critical applications in data centers

Guaranteeing low end-to-end latency is important:
- Web search (e.g., Google's instant search service)
- Retail advertising
- Recommendation systems
- High-frequency trading in financial data centers

Operators want to troubleshoot latency anomalies:
- End-host latencies can be monitored locally
- Detection, diagnosis, and localization across the network are hard: routers/switches have no native support for latency measurements
Prior solutions

- Lossy Difference Aggregator (LDA)
  - Kompella et al. [SIGCOMM '09]
  - Aggregate latency statistics
- Reference Latency Interpolation (RLI)
  - Lee et al. [SIGCOMM '10]
  - Per-flow latency measurements
  - More suitable for our purpose, since its measurements are more fine-grained
Deployment scenario of RLI

Upgrading all switches/routers in a data center network
- Pros
  - Provides the finest granularity of latency anomaly localization
- Cons
  - Significant deployment cost
  - Possible downtime of the entire production data center

In this work, we consider partial deployment of RLI
- Our approach: RLI across Routers (RLIR)
Overview of RLI architecture

[Figure: a router with ingress interface I and egress interface E]

Goal: per-flow latency statistics between a pair of interfaces

Problem setting:
- Storing a timestamp for every packet at ingress and egress is infeasible due to high storage and communication cost
- Regular packets do not carry timestamps
Overview of RLI architecture

[Figure: a Reference Packet Injector at ingress I injects reference packets (R) among regular packets (L); a Latency Estimator at egress E observes their delays]
Premise of RLI: delay locality

Approach
[Figure: delay vs. time; a linear interpolation line between two consecutive reference packets (R) yields an interpolated delay for each regular packet (L) arriving between them]
1) The injector sends reference packets at regular intervals
2) Each reference packet carries its ingress timestamp
3) Linear interpolation: the latency estimator computes a per-packet latency estimate for every regular packet (see the sketch below)
4) Per-flow estimates are formed by aggregating the per-packet estimates
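To make steps 3 and 4 concrete, here is a minimal Python sketch of the egress-side estimator under the assumptions stated on this slide: reference packets carry their ingress timestamp, regular packets do not, and each regular packet's delay is linearly interpolated between the delays of the two reference packets surrounding it. The names (`LatencyEstimator`, `on_reference`, `on_regular`) are hypothetical, not from the RLI paper.

```python
from collections import defaultdict

class LatencyEstimator:
    """Sketch of an RLI egress-side estimator (illustrative structure)."""

    def __init__(self):
        self.prev_ref = None      # (egress_time, delay) of the last reference packet
        self.pending = []         # regular packets waiting for the next reference
        self.flow_stats = defaultdict(lambda: [0.0, 0])  # flow -> [delay_sum, count]

    def on_reference(self, ingress_ts, egress_ts):
        """Reference packets carry their ingress timestamp (step 2)."""
        ref = (egress_ts, egress_ts - ingress_ts)     # exact delay of this reference
        if self.prev_ref is not None:
            (t1, d1), (t2, d2) = self.prev_ref, ref
            for flow, t in self.pending:
                # Step 3: linear interpolation between the surrounding references
                d = d1 + (d2 - d1) * (t - t1) / (t2 - t1) if t2 > t1 else d1
                self.flow_stats[flow][0] += d
                self.flow_stats[flow][1] += 1
        self.pending.clear()
        self.prev_ref = ref

    def on_regular(self, flow, egress_ts):
        """Regular packets carry no timestamp; only their arrival time is known."""
        self.pending.append((flow, egress_ts))

    def mean_delay(self, flow):
        """Step 4: per-flow estimate aggregated from per-packet estimates."""
        s, n = self.flow_stats[flow]
        return s / n if n else None
```

A flow's estimate is then simply an aggregate (here, the mean) of the interpolated delays of its packets.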
Full vs. Partial deployment

[Figure: six switches (Switch 1-6) with RLI Senders (Reference Packet Injectors) and RLI Receivers (Latency Estimators) on their links]

- Full deployment: 16 RLI sender-receiver pairs
- Partial deployment: 4 RLI senders + 2 RLI receivers
  - 81.25% deployment cost reduction
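The 81.25% figure follows from counting deployed modules, assuming cost scales with their number (each sender-receiver pair in full deployment counts as two modules):

1 − (4 + 2) / (16 + 16) = 1 − 6/32 = 0.8125, i.e., an 81.25% reduction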

Case 1: Presence of cross traffic

[Figure: the RLI sender performs link utilization estimation on Switch 1; cross traffic merges downstream and creates a bottleneck link on the way to the RLI receiver]

Issue: inaccurate link utilization estimation at the sender leads to a high reference packet injection rate

Approach:
- Do not actively address the issue
- Evaluation shows little impact on the packet loss rate
- Details are in the paper
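For context on why this is an issue: RLI adapts the reference packet injection rate to the sender's estimate of link utilization, injecting more when the link appears idle. The toy controller below (a made-up rule, not RLI's actual algorithm) shows how underestimating utilization, e.g., because cross traffic joins downstream, yields an injection rate tuned for a much emptier path:

```python
def injection_rate(estimated_util, max_rate=1000.0, min_rate=10.0):
    """Toy utilization-adaptive controller (not RLI's actual rule): inject
    reference packets aggressively when the link looks idle, back off as
    it fills, so probes use spare capacity without adding loss."""
    headroom = max(0.0, 1.0 - estimated_util)   # fraction of the link still free
    return max(min_rate, max_rate * headroom)   # reference packets per second

# The sender only observes its local link. With cross traffic joining
# downstream, it may see 30% utilization while the bottleneck sits at 93%:
print(injection_rate(0.30))  # ~700: rate chosen from the (wrong) local estimate
print(injection_rate(0.93))  # ~70: rate a correct estimate would pick
```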
Case 2: RLI Sender side

[Figure: six switches; traffic from the RLI sender can branch toward different RLI receivers at an intermediate switch]

Issue: traffic may take different routes at an intermediate switch
Approach: the sender sends reference packets to all receivers
Case 3: RLI Receiver side

[Figure: six switches; regular packets arriving at an RLI receiver may have traversed different paths, interleaved with reference packets from multiple RLI senders]

Issue: it is hard to associate reference packets with the regular packets that traversed the same path

Approaches (a sketch of the second follows):
- Packet marking: requires native support from routers
- Reverse ECMP computation: 'reverse'-engineer intermediate routes using the ECMP hash function
- IP prefix matching, in limited situations
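To illustrate the reverse ECMP idea: if the receiver knows the topology and can model the hash each switch uses for ECMP, it can replay the hash on a regular packet's 5-tuple to reconstruct the route the packet took, and thus pick the matching reference packet stream. A minimal sketch; the CRC-based hash and the topology fragment are stand-ins, since real switches use vendor-specific hash functions:

```python
import zlib

def ecmp_next_hop(switch, flow, next_hops):
    """Replay one switch's ECMP choice for a flow (CRC stands in for the
    vendor-specific hash a real switch would use)."""
    key = f"{switch}:{flow}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

def reverse_ecmp_path(flow, topology, ingress):
    """Reconstruct a packet's route by replaying every ECMP decision from
    the ingress switch until a switch with no further next hops is reached."""
    path, current = [ingress], ingress
    while current in topology:
        current = ecmp_next_hop(current, flow, topology[current])
        path.append(current)
    return path

# Hypothetical topology fragment: switch -> candidate next hops
topology = {"S1": ["S3"], "S3": ["S5", "S6"]}
flow = ("10.0.0.1", "10.0.1.2", 6, 12345, 80)   # src IP, dst IP, proto, sport, dport
print(reverse_ecmp_path(flow, topology, "S1"))  # e.g. ['S1', 'S3', 'S5']
```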
Deployment example in fat-tree topology

[Figure: a fat-tree with RLI Senders (Reference Packet Injectors) and RLI Receivers (Latency Estimators); IP prefix matching suffices in some placements, while others need reverse ECMP computation or IP prefix matching]
Evaluation

Simulation setup
- Trace: regular traffic (22.4M pkts) + cross traffic (70M pkts)
- Simulator: [Figure: a Traffic Divider splits the packet trace into regular and cross traffic; the RLI sender injects reference packets at a 10% or 1% injection rate; a Cross Traffic Injector adds the cross traffic, and packets traverse Switch 1 and Switch 2 before reaching the RLI receiver]
Results

Accuracy of per-flow latency estimates
[Figure: CDFs of the relative error of per-flow latency estimates at bottleneck link utilizations of 93% and 67%, each with 10% and 1% reference packet injection rates; annotated relative errors are 1.2%, 4.5%, 18%, and 31%]
Summary

- Low-latency applications run in data centers; localization of latency anomalies is important
- RLI provides flow-level latency statistics, but full deployment (i.e., on all routers/switches) is expensive
- We proposed a solution enabling partial deployment of RLI
  - Little loss in localization granularity (i.e., localization to every other router)
Thank you! Questions?