Transcript slides
Enabling Flow-level Latency Measurements across Routers in Data Centers
Parmjeet Singh, Myungjin Lee, Sagar Kumar, Ramana Rao Kompella
Latency-critical applications in data centers
Guaranteeing low end-to-end latency is important
Web search (e.g., Google’s instant search service)
Retail advertising
Recommendation systems
High-frequency trading in financial data centers
Operators want to troubleshoot latency anomalies
End-host latencies can be monitored locally
Detection, diagnosis, and localization across the network are hard: routers/switches have no native support for latency measurements
Prior solutions
Lossy Difference Aggregator (LDA)
Kompella et al. [SIGCOMM ’09]
Aggregate latency statistics
Reference Latency Interpolation (RLI)
Lee et al. [SIGCOMM ’10]
Per-flow latency measurements
RLI is more suitable here due to its finer-grained measurements
Deployment scenario of RLI
Upgrading all switches/routers in a data center network
Pros: provides the finest granularity of latency anomaly localization
Cons: significant deployment cost; possible downtime of the entire production data center
In this work, we consider partial deployment of RLI
Our approach: RLI across Routers (RLIR)
Overview of RLI architecture
Goal: per-flow latency statistics between a router's ingress interface I and egress interface E
Problem setting
Storing a timestamp for every packet at the ingress and egress is infeasible due to high storage and communication cost
Regular packets do not carry timestamps
[Figure: RLI architecture. A Reference Packet Injector at ingress I injects reference packets L and R among regular packets 1 and 2; a Latency Estimator at egress E measures their delay.]
Premise of RLI: delay locality
Approach
[Figure: delays of reference packets L and R over time define a linear interpolation line; the delays of regular packets 1 and 2 are estimated as the interpolated delay on this line.]
1) The injector sends reference packets at regular intervals
2) Each reference packet carries its ingress timestamp
3) Linear interpolation: the latency estimator computes per-packet latency estimates for regular packets between consecutive reference packets
4) Per-flow estimates are obtained by aggregating per-packet estimates (see the sketch below)
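A minimal sketch of steps 3) and 4) in Python (names and structure are ours, not the paper's): per-packet delays are linearly interpolated between the two reference packets that bracket each arrival, then aggregated per flow.

    from collections import defaultdict

    def interpolate_delay(t_pkt, t_left, d_left, t_right, d_right):
        # Linear interpolation between the bracketing reference packets:
        # t_left <= t_pkt <= t_right, with measured reference delays
        # d_left and d_right.
        if t_right == t_left:
            return d_left
        frac = (t_pkt - t_left) / (t_right - t_left)
        return d_left + frac * (d_right - d_left)

    flow_delay_sum = defaultdict(float)
    flow_pkt_count = defaultdict(int)

    def record_packet(flow_key, t_pkt, left_ref, right_ref):
        # left_ref / right_ref are (timestamp, measured_delay) pairs for
        # the reference packets before and after this regular packet.
        (t_l, d_l), (t_r, d_r) = left_ref, right_ref
        flow_delay_sum[flow_key] += interpolate_delay(t_pkt, t_l, d_l, t_r, d_r)
        flow_pkt_count[flow_key] += 1

    def mean_flow_delay(flow_key):
        # Per-flow estimate: average of the per-packet estimates.
        return flow_delay_sum[flow_key] / flow_pkt_count[flow_key]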
Full vs. Partial deployment
[Figure: six-switch example topology annotated with RLI Senders (reference packet injectors) and RLI Receivers (latency estimators).]
Full deployment: 16 RLI sender-receiver pairs
Partial deployment: 4 RLI senders + 2 RLI receivers
81.25% deployment cost reduction
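Where the 81.25% figure comes from (assuming each sender and each receiver counts as one deployed element):
Full: 16 pairs x 2 = 32 elements
Partial: 4 + 2 = 6 elements
Reduction: 1 - 6/32 = 0.8125, i.e., 81.25%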
Case 1: Presence of cross traffic
[Figure: six-switch topology. The RLI Sender (reference packet injector) performs link utilization estimation on Switch 1; cross traffic merges downstream and creates a bottleneck link on the path to the RLI Receiver (latency estimator).]
Issue: inaccurate link utilization estimation at the sender leads to an overly high reference packet injection rate
The sender adapts its injection rate to the utilization it estimates locally, but it cannot observe the cross traffic that joins downstream at the bottleneck link (illustrated in the sketch below)
Approach: do not actively address the issue
Evaluation shows little increase in the packet loss rate
Details in the paper
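To make the failure mode concrete, here is a hypothetical utilization-adaptive injection rule; the budget-based policy below is our illustration, not RLI's published algorithm. Underestimating utilization inflates the perceived idle capacity and hence the injection rate.

    def reference_rate(link_capacity_pps, est_utilization, budget=0.01):
        # Spend at most `budget` (e.g., 1%) of the link's idle capacity
        # on reference packets. If cross traffic is invisible to the
        # sender, est_utilization is too low, idle capacity is
        # overestimated, and the computed rate is too high for the real
        # bottleneck link.
        idle_pps = link_capacity_pps * max(0.0, 1.0 - est_utilization)
        return budget * idle_pps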
Case 2: RLI Sender side
[Figure: six-switch topology with RLI Senders (reference packet injectors) and RLI Receivers (latency estimators); traffic from one sender can fork at an intermediate switch toward different receivers.]
Issue: traffic may take different routes at an intermediate switch, so the sender does not know which receiver its traffic will reach
Approach: the sender sends reference packets to all receivers (see the sketch below)
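A minimal sketch of this fan-out, assuming the sender is configured with the set of deployed receivers and a hypothetical send_reference helper:

    def inject_references(receivers, ingress_ts, send_reference):
        # One reference packet per receiver, each carrying the ingress
        # timestamp: whichever route regular packets take at the
        # intermediate switch, some receiver holds matching references.
        for rcv in receivers:
            send_reference(dst=rcv, ingress_ts=ingress_ts)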
Case 3: RLI Receiver side
[Figure: six-switch topology with RLI Senders (reference packet injectors) and RLI Receivers (latency estimators); the receiver observes reference and regular packets that may have arrived over different paths.]
Issue: it is hard to associate reference packets with the regular packets that traversed the same path
Approaches
Packet marking: requires native support from routers
Reverse ECMP computation: 'reverse engineer' intermediate routes using the ECMP hash function (see the sketch below)
IP prefix matching: applicable only in limited situations
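A minimal sketch of the reverse ECMP idea, under assumptions that are ours rather than the paper's: the receiver knows each switch's next-hop tables and can reproduce its ECMP hash (the MD5-based hash below is an illustrative stand-in; real switches use vendor-specific hashes). Replaying the hash decisions recovers the route a packet took, so reference and regular packets can be grouped by path.

    import hashlib

    def ecmp_next_hop(five_tuple, next_hops):
        # Illustrative stand-in for a switch's ECMP hash function.
        key = ",".join(map(str, five_tuple)).encode()
        bucket = int(hashlib.md5(key).hexdigest(), 16) % len(next_hops)
        return next_hops[bucket]

    def recover_path(five_tuple, next_hop_tables, src, dst):
        # Replay ECMP decisions hop by hop; assumes a loop-free topology
        # where each next-hop choice makes progress toward dst.
        path = [src]
        cur = src
        while cur != dst:
            cur = ecmp_next_hop(five_tuple, next_hop_tables[cur])
            path.append(cur)
        return tuple(path)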
Deployment example in fat-tree topology
[Figure: fat-tree topology annotated with RLI Senders (reference packet injectors) and RLI Receivers (latency estimators); depending on the stage, packets are associated with paths via IP prefix matching or via reverse ECMP computation / IP prefix matching.]
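A companion sketch of the IP prefix matching association, assuming (as in standard two-level fat-tree routing) that the destination prefix alone determines the downward path; the prefix table below is illustrative.

    import ipaddress

    # Illustrative downward-route prefixes (hypothetical addresses).
    DOWN_PREFIXES = [
        ipaddress.ip_network("10.0.1.0/24"),
        ipaddress.ip_network("10.0.2.0/24"),
    ]

    def path_group(dst_ip):
        # Longest-prefix match on the destination address; reference and
        # regular packets in the same group are assumed to share a path.
        addr = ipaddress.ip_address(dst_ip)
        matches = [p for p in DOWN_PREFIXES if addr in p]
        return max(matches, key=lambda p: p.prefixlen, default=None)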
Evaluation
Simulation setup
Trace: regular traffic (22.4M pkts) + cross traffic (70M pkts)
[Figure: simulator diagram. A Traffic Divider splits the packet trace into regular traffic and cross traffic packets; a Cross Traffic Injector replays the cross traffic; the RLI Sender injects reference packets at a 10% or 1% injection rate; traffic traverses Switch1 and Switch2 before reaching the RLI Receiver.]
Results
Accuracy of per-flow latency estimates
Bottleneck link utilization: 93% and 67%
[Figure: CDFs of the relative error of per-flow latency estimates under 10% and 1% reference packet injection rates; labeled relative errors are 1.2%, 4.5%, 18%, and 31%.]
Summary
Low-latency applications in data centers make localizing latency anomalies important
RLI provides flow-level latency statistics, but full deployment (i.e., at all routers/switches) is expensive
We proposed a solution enabling partial deployment of RLI
Little loss in localization granularity (anomalies are localized to every other router)
Thank you! Questions?