Slide transcript
Fine-Grained Latency and Loss Measurements in the Presence of Reordering
Myungjin Lee, Sharon Goldberg, Ramana Rao Kompella, George Varghese
Trend toward low-latency networks
Low latency: one of the key metrics in designing a network
Switch vendors are introducing switches that provide low latency
Financial data centers are demanding ever more stringent latency
Benefits of low-latency networks
[Figure: a financial service provider's network connects a content provider to brokerages; the provider advertises an end-to-end latency SLA of a few microseconds]
With low latency:
An automated trading program can buy shares cheaply
A cluster application can run thousands more instructions
But…
Guaranteeing low latency in data centers is hard
Reason 1: No traffic models for different applications
Congestion must be kept below a certain level
Hinders managers from predicting which applications will offend
Reason 2: A new application’s behavior is often unforeseen until it is actually deployed
E.g., the TCP incast problem [SIGCOMM ’09]
Latency & loss measurements are crucial
Need latency & loss measurements on a continuous basis
Detect problems
Fix them: re-route the offending application, upgrade links, etc.
Goal: Providing fine-grained end-to-end aggregate latency and loss measurements in data center environments
[Figure: measurements are taken between edge routers A and B of the provider's network, covering the aggregate traffic between a content provider and a brokerage]
Measurement model
[Figure: packet filters at edge routers A and B pick out the measured stream; between A and B the network offers multiple paths, so packets can be delivered out of order]
Out-of-order packet delivery due to multiple paths
Packet filtering associates the packet streams observed at A and B (a sketch follows)
Time synchronization: IEEE 1588, GPS clocks, etc.
No header changes: Regular packets carry no timestamps
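To make the filtering assumption concrete, here is a minimal Python sketch; the prefixes and the function name are hypothetical illustrations, not part of the paper.

```python
from ipaddress import ip_address, ip_network

# Hypothetical prefixes standing in for the A-side and B-side of the
# measured aggregate; a real deployment would configure its own filter.
SRC_PREFIX = ip_network('10.1.0.0/16')
DST_PREFIX = ip_network('10.2.0.0/16')

def in_measured_stream(src: str, dst: str) -> bool:
    # Both A and B apply the same filter, so they observe the same packet
    # stream; regular packets are not modified (no timestamps in headers).
    return ip_address(src) in SRC_PREFIX and ip_address(dst) in DST_PREFIX
```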
Measurement model
[Figure: measurement modules at Routers A and B; interval messages injected at A divide the measured packet stream into measurement intervals]
Interval message: A special ‘sync’ control packet that marks off a measurement interval
Injected by measurement modules at an edge (e.g., Router A)
Measurement interval: The set of packets ‘bookended’ by a pair of interval messages
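A minimal sketch of how interval messages could drive a measurement module; the class and method names are assumptions for illustration, not the paper's implementation.

```python
class MeasurementModule:
    """Hypothetical per-router module: an interval message snapshots and
    resets the per-interval state, so A's and B's intervals line up."""

    def __init__(self, make_state):
        self.make_state = make_state   # factory for per-interval state
        self.state = make_state()      # current measurement interval
        self.finished = []             # completed intervals, ready to compare

    def on_packet(self, pkt: bytes, timestamp: float):
        self.state.update(pkt, timestamp)

    def on_interval_message(self):
        # A pair of interval messages 'bookends' one measurement interval:
        # everything since the previous message belongs to a single interval.
        self.finished.append(self.state)
        self.state = self.make_state()
```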
Existing solutions
Active probes
Problem: Not effective due to the huge probe rate required
Storing timestamps and packet digests locally
Problem: Significant communication overhead
Packet sampling: A trade-off between accuracy and overhead
Lossy Difference Aggregator (LDA) [Kompella, SIGCOMM ’09]
State-of-the-art solution, but assumes FIFO packet delivery
Problem: Not suitable where packets can be reordered
LDA in packet loss case
[Figure: Router A sends four packets at times 1, 2, 5, 7; the packet sent at time 2 (marked X) is lost. Each router hashes every packet to a bucket holding a (packet count, timestamp sum) pair. At A: bucket 1 = (2, 1 + 5 = 6), bucket 2 = (2, 2 + 7 = 9). The three surviving packets reach B at times 3, 9, 11, giving bucket 1 = (2, 3 + 9 = 12) and a corrupted bucket 2 = (1, 11).]
True delay = [(3 – 1) + (9 – 5) + (11 – 7)] / 3 ≈ 3.3
Estimated delay = (12 – 6) / 2 = 3, using only bucket 1, where the counts match
Estimation error ≈ 9%
Key point: Only useful buckets must be used for estimation (see the sketch below)
A useful bucket: a bucket updated by the same set of packets at A and B
Bad packets: lost packets that corrupt buckets
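The following is a minimal Python sketch of an LDA bank and its estimator under these rules; the names and sizes are assumed, and the real LDA additionally uses packet sampling and multiple banks.

```python
import zlib

NUM_BUCKETS = 1024  # illustrative size

def bucket_of(pkt: bytes) -> int:
    # A and B hash the packet's invariant bytes identically, so the same
    # packet updates the same bucket on both sides.
    return zlib.crc32(pkt) % NUM_BUCKETS

class LDABank:
    def __init__(self):
        self.count = [0] * NUM_BUCKETS    # packets hashed into each bucket
        self.tsum = [0.0] * NUM_BUCKETS   # sum of their timestamps

    def update(self, pkt: bytes, timestamp: float):
        b = bucket_of(pkt)
        self.count[b] += 1
        self.tsum[b] += timestamp

def lda_mean_delay(a: LDABank, b: LDABank) -> float:
    # Use only 'useful' buckets, i.e., buckets with equal packet counts at
    # A and B; a count mismatch signals corruption by loss. (As the next
    # slide shows, this test is fooled when loss and reordering cancel out.)
    dsum, n = 0.0, 0
    for i in range(NUM_BUCKETS):
        if a.count[i] == b.count[i]:
            dsum += b.tsum[i] - a.tsum[i]
            n += b.count[i]
    return dsum / n if n else float('nan')
```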
LDA in packet loss + reordering case
[Figure: the same example, now with reordering. The packet sent at time 2 (marked X) is again lost, and each bucket is frozen after the interval message updates it. Without reordering, B's buckets would be (2, 12) and (1, 11), and bucket 2's count mismatch would expose the corruption. With reordering, a stray packet from the next interval arrives at time 13 and lands in bucket 2 before it is frozen, giving (2, 12) and (2, 11 + 13 = 24): all counts now match A's.]
True delay = 3.3
Estimated delay = (12 + 24 – 6 – 9) / 4 = 5.25
Estimation error ≈ 59%
Problem: LDA confounds loss and reordering
Matching packet counts in buckets at A and B is insufficient
Reordered packets are also bad packets
Result: significant error in loss and aggregate latency estimation
Quick fix of LDA: per-path LDA
Let LDA operate on a per-path basis
Exploits the fact that packets within a flow are not reordered by ECMP
Issues
(1) Associating a flow with a path is difficult
(2) Not scalable: potentially millions of separate TCP flows to handle
Packet reordering in IP networks
Today’s trend
No reordering among packets in a flow
No reordering across flows between two interfaces
New trend: Data centers exploit path diversity
ECMP splits flows across multiple equal-cost paths
Reordering can occur across flows
Future direction: Switches may allow reordering within a switch for improved load balancing and utilization
Reordering-tolerant TCP for use in data centers
Proposed approach: FineComb
Objective
Detect and correct unusable buckets
Control the number of unusable buckets
Key ideas
1) Incremental stream digests: Detect unusable buckets
2) Stash recovery: Correct corrupted buckets to make them usable
3) Packet sampling: Control the number of bad packets included
Incremental stream digests (ISDs)
An ISD = H(pkt1) ⊕ H(pkt2) ⊕ … ⊕ H(pktk), where ⊕ is an invertible, commutative operator (e.g., XOR)
Property 1: Low collision probability
Two different packet streams hash to different values (with high probability)
Allows detection of corrupted buckets
Property 2: Invertibility
Easy addition/subtraction of a packet digest from an ISD
The basis of stash recovery
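A minimal sketch of an XOR-based ISD; the digest function is an assumption (any fixed-width packet digest works), and ⊕ is realized as XOR. Because XOR is commutative, two streams with the same packets in any order yield the same ISD; because it is self-inverse, a digest is as easy to remove as to add.

```python
import hashlib

def digest(pkt: bytes) -> int:
    # Fixed-width digest of the packet's invariant content (sketch).
    return int.from_bytes(hashlib.blake2b(pkt, digest_size=8).digest(), 'big')

class ISD:
    def __init__(self):
        self.value = 0

    def add(self, pkt: bytes):
        self.value ^= digest(pkt)    # commutative: arrival order is irrelevant

    def remove(self, pkt: bytes):
        self.value ^= digest(pkt)    # XOR is self-inverse: easy subtraction
```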
ISDs handle loss and reordering
[Figure: the loss + reordering example, with each bucket extended to a (packet count, timestamp sum, ISD) triple; each packet's digest is XORed into its bucket's ISD. Bucket 1 receives the same set of packets at A and B, so both sides hold ISD 09. In bucket 2, the lost packet's digest (04) is replaced by the reordered packet's digest (10): A holds ISD 2E = 2A ⊕ 04 while B holds 3A = 2A ⊕ 10. The counts match (2 = 2), but the ISDs don't, so the corruption is detected. True delay = 3.3.]
ISDs detect buckets corrupted by loss and reordering
Buckets are usable only if both the packet counts and the ISDs match between A and B
Latency and loss estimation
[Figure: bucket tables at the end of an interval; each bucket holds (packet count, timestamp sum, ISD).
Router A: (2, 6, 09), (2, 9, 2E), (3, 9, A1)
Router B: (2, 12, 09); (2, 24, 3A), reduced to (1, 19, 9C) once the reordered packet is removed; and a third bucket left with count 1 after two losses. Only bucket 1's count and ISD match A's.]
Average latency estimation (usable buckets only)
Delay sum = (12 – 6) + (0 – 0) + (0 – 0) = 6
Count = 2 + 0 + 0 = 2
Average latency = 6 / 2 = 3.0
Loss estimation (over all buckets)
Loss count sum = (2 – 2) + (2 – 1) + (3 – 1) = 3
Total packets = 2 + 2 + 3 = 7
Loss rate = 3 / 7 ≈ 0.43
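Combining the two rules, here is a sketch of the comparison step at the end of an interval; the function name and the bucket representation are assumed.

```python
def estimate(buckets_a, buckets_b):
    """buckets_a, buckets_b: parallel lists of (count, tsum, isd) triples
    collected at Routers A and B for the same measurement interval."""
    delay_sum = used = lost = total = 0
    for (ca, sa, ia), (cb, sb, ib) in zip(buckets_a, buckets_b):
        total += ca
        lost += ca - cb                 # receiver-side deficit per bucket
        if ca == cb and ia == ib:       # usable: counts AND ISDs both match
            delay_sum += sb - sa
            used += cb
    avg_latency = delay_sum / used if used else float('nan')
    loss_rate = lost / total if total else 0.0
    return avg_latency, loss_rate
```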
Stash recovery
Stash: A set of (timestamp, bucket index, hash value) tuples for packets that are potentially reordered
(-) stash
Contains packets potentially added at the receiver (Router B)
In recovery, their packet digests are subtracted from bad buckets at the receiver
(+) stash
Contains packets potentially missing at the receiver (Router B)
In recovery, their packet digests are added to bad buckets at the receiver
Stash recovery
A bad bucket can be recovered iff reordered packets alone corrupted it
Reordered packets are then not counted as lost packets → more accurate loss estimation
[Figure: recovery of a bad bucket via the (-) stash. A's bucket: (2, 12, 2E). B's bad bucket: (3, 34, 3E); the counts and ISDs don't match. B's (-) stash holds two candidate entries that hashed to this bucket: (timestamp 1, bucket 2, digest 04) and (timestamp 5, bucket 2, digest 10). All subsets of the stash are tried against the bad bucket; subtracting the (5, ·, 10) entry yields (2, 29, 2E), whose count and ISD match A's, so the bucket is recovered.]
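A sketch of (-) stash recovery by brute-force subset search (exponential in the stash size, which is one reason the stash is kept small); the names are assumed, and the bucket index is omitted because the entries are already grouped per bucket. On the example above, subtracting the (5, 0x10) entry turns B's (3, 34, 0x3E) into (2, 29, 0x2E), matching A's count and ISD.

```python
from itertools import combinations

def recover(bucket_a, bucket_b, stash):
    """bucket_a, bucket_b: (count, tsum, isd) at A and B for one bucket.
    stash: list of (timestamp, digest) candidates that hashed to this bucket.
    Returns B's corrected bucket, or None if loss (not just reordering)
    corrupted it."""
    ca, _, ia = bucket_a
    cb, sb, ib = bucket_b
    for r in range(len(stash) + 1):
        for subset in combinations(stash, r):
            count, tsum, isd = cb - r, sb, ib
            for ts, d in subset:
                tsum -= ts    # undo the candidate's timestamp contribution
                isd ^= d      # XOR out the candidate's digest (invertibility)
            if count == ca and isd == ia:
                return (count, tsum, isd)
    return None
```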
Sizing buckets and stashes
Known loss and reordering rates
Given a fixed storage size, we obtain the optimal packet sampling rate p* (see the sketch below)
We provision the stash and buckets based on p*
Unknown loss and reordering rates
Use multiple banks, each optimized for a different loss and reordering rate
Details can be found in our paper
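One standard way to realize packet sampling at rate p* consistently at both routers is hash-based sampling, sketched below; the salt and threshold construction are assumptions, not the paper's exact mechanism.

```python
import zlib

def sampled(pkt: bytes, p: float) -> bool:
    # Deterministic, hash-based decision: both routers keep or skip the
    # same packet, so the sampled streams remain directly comparable.
    # p would be the optimal rate p* from the provisioning analysis.
    return zlib.crc32(b'sampling-salt' + pkt) / 0xFFFFFFFF < p
```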
Accuracy of latency estimation
[Plot: average relative error of latency estimates vs. reordering rate; packet loss rate = 0.01%, 5M packets, true mean delay = 10 μs. FineComb (ISD + stash) shows up to a 1000x lower error than LDA; FineComb- uses ISDs only, without the stash]
Accuracy of loss estimation
[Plot: average relative error of loss estimates vs. reordering rate; packet loss rate = 0.01%, 5M packets. The stash helps obtain accurate loss estimates]
Summary
Data centers require fine-grained end-to-end latency and loss measurements
We proposed a data structure called FineComb
Resilient to packet loss and reordering
Incremental stream digests detect corrupted buckets
The stash recovers buckets that were corrupted only by reordered packets
Evaluation shows FineComb achieves higher accuracy in latency and loss estimation than LDA
Thank you! Questions?
Backup
[Plot: microscopic loss estimation; average relative error vs. reordering rate]
[Plot: handling unknown loss & reordering rates; average relative error vs. reordering rate. LDA uses 2 banks, FineComb 4 banks, with the same total memory]