Three Challenges in Reliable Data Transport over Heterogeneous

Reliable Data Transport over Heterogeneous Wireless Networks
Hari Balakrishnan
MIT Lab for Computer Science
Motivation
Rapid growth
[Figure: number of units/hosts (millions) vs. year, 1993-1997, for cellular phones and Internet hosts. Sources: Ericsson, Inc.; Matthew Gray, MIT.]
• But wireless data is floundering...
– Enormous heterogeneity
– Poor performance
Goal: To make wireless devices first-class Internet citizens
Wireless Heterogeneity
[Figure: wireless technologies arranged by coverage area (in-building, campus-area, metro-area, regional-area): IBM Infrared, Lucent WaveLAN, Metricom Ricochet, packet radio, Cellular Digital Packet Data (CDPD).]
Wireless Performance
Technology              Rated Bandwidth   Typical TCP Throughput
IBM Infrared            1 Mbps            100-800 Kbps
Lucent WaveLAN          2 Mbps            50 Kbps-1.5 Mbps
Metricom Ricochet       100 Kbps          10-35 Kbps
Hybrid wireless cable   10 Mbps           0.5-3.0 Mbps
Goal: To bridge the gap between perceived and rated performance
TCP Overview
1. Loss recovery
[Figure: a stream of packets with one packet lost; the receiver returns duplicate ACKs for the last in-order packet.]
– Timeouts based on mean round-trip time (RTT) and deviation
– Fast retransmissions based on duplicate ACKs
2. Congestion control
– Window-based algorithm to determine sustainable rate
– Upon congestion, reduce window
– “ACK clocking” sends data smoothly
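The two loss-recovery triggers named above can be sketched in a few lines. This is a minimal illustration, assuming the smoothing constants and the "mean plus four deviations" rule standardized in RFC 6298 and the usual threshold of three duplicate ACKs; the slides do not give these values. The names RttEstimator and should_fast_retransmit are hypothetical.

```python
class RttEstimator:
    """Timeout from smoothed RTT and RTT deviation (constants assumed from RFC 6298)."""

    def __init__(self, alpha=1 / 8, beta=1 / 4, k=4):
        self.alpha = alpha    # gain for the smoothed RTT
        self.beta = beta      # gain for the RTT deviation
        self.k = k            # deviation multiplier in the timeout
        self.srtt = None
        self.rttvar = None

    def update(self, sample):
        """Feed one RTT sample (seconds) and return the new timeout."""
        if self.srtt is None:
            # First sample initializes both estimates.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Deviation is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - sample)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.rto()

    def rto(self):
        return self.srtt + self.k * self.rttvar


def should_fast_retransmit(dup_acks, threshold=3):
    """Retransmit without waiting for the timer once enough duplicate ACKs arrive."""
    return dup_acks >= threshold
```

The sketch omits the minimum timeout and timer backoff that real stacks apply. Note how a large deviation inflates the timeout, which matters on links with highly variable latency.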
TCP Dynamics
[Figure: sequence number (bytes) vs. time (s) trace showing data, ACKs, duplicate ACKs, a fast retransmission, the window, and the RTT.]
Wireless Transport: The Three Challenges
• Preponderance of wireless bit-errors
– Corruption vs. congestion losses
– Solution: Snoop protocol
• Asymmetric effects
– Bandwidth asymmetry & latency variability
– Solution: TCP mods + link-layer optimizations
• Low channel bandwidths
– Small windows
– Solution: Limited Transmit, an optimization to TCP’s loss recovery
Challenge #1: Wireless Bit-Errors
[Figure: path across the Internet and a router, with packets lost in transit.]
– Burst losses lead to coarse-grained timeouts
– TCP assumes loss ==> congestion
– Result: Low throughput
Performance Degradation
[Figure: sequence number (bytes) vs. time (s). Best possible TCP with no errors: 1.30 Mbps. TCP Reno: 280 Kbps.]
2 MB wide-area TCP transfer over 2 Mbps Lucent WaveLAN
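To make the gap concrete, the two throughputs quoted for this transfer can be turned into completion times. This is a small worked calculation, assuming "2 MB" means 2,000,000 bytes; the helper name transfer_time_s is hypothetical.

```python
def transfer_time_s(size_bytes, throughput_bps):
    """Seconds needed to move size_bytes at a steady throughput in bits per second."""
    return size_bytes * 8 / throughput_bps


SIZE_BYTES = 2_000_000  # assumption: 2 MB taken as 2 * 10^6 bytes

for name, bps in [("TCP with no errors", 1.30e6), ("TCP Reno", 280e3)]:
    print(f"{name}: {transfer_time_s(SIZE_BYTES, bps):.1f} s")
```

Under that assumption the error-free transfer finishes in roughly 12 s, while Reno over the lossy link needs nearly a minute.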
Conventional Approaches
• Link-layer protocols
• Adverse interactions with transport layer
– Timer interactions
– Interactions with fast retransmissions
– Large round-trip time variation
• Split connections
– Wireless connection need not be TCP
• Hard state at base station
– Complicates mobility
– Vulnerable to failures
• Violates end-to-end semantics
[Figure: end-to-end path through a base station that joins a wired connection and a wireless connection; label: ARQ/FEC.]
Our Solution: Snoop Protocol
• Shield TCP sender from wireless vagaries
– Eliminate adverse interactions between protocol layers
– Congestion control only when congestion occurs
• Preserve current TCP/IP service model
– Maintain end-to-end semantics
– Is connection splitting fundamentally important?
• Eliminate non-TCP protocol messages
– Is link-layer messaging fundamentally important?
Fixed to mobile: transport-aware link protocol
Mobile to fixed: link-aware transport protocol
Snoop Protocol: FH to MH
[Figure: FH sender, base station running the snoop agent, and mobile host, with packets 1-6 in flight.]
Snoop agent: active interposition agent
– Snoops on TCP segments and ACKs
– Detects losses by duplicate ACKs and timers
– Suppresses duplicate ACKs before they reach the FH sender
Cross-layer protocol design: Snoop agent state is soft
[Animation, FH to MH: the sender's packets 1-6 pass through the base station to the mobile host.]
– Packet 1 is lost on the wireless link; packets 2, 3, 4 arrive and the mobile host returns duplicate ACKs (ack 0)
– Retransmit from cache at higher priority
– Suppress duplicate ACKs
– Clean cache on new ACK (ack 4, then ack 5 and ack 6)
Active soft state agent at base station
Transport-aware reliable link protocol
Preserves end-to-end semantics
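The snoop agent's FH-to-MH behavior (cache, detect loss by duplicate ACKs, retransmit locally, suppress duplicates) can be sketched as follows. This is an illustrative model, not the actual implementation: the class and action names are hypothetical, local retransmission timers are omitted, and an ACK number is taken to mean the highest in-order packet received, as in the "ack 0" / "ack 4" labels of the slides.

```python
class SnoopAgent:
    """Toy model of a snoop agent at the base station (FH -> MH direction)."""

    def __init__(self, last_ack=0):
        self.cache = {}            # seq -> payload, not yet ACKed by the mobile host
        self.last_ack = last_ack   # highest cumulative ACK seen so far
        self.dup_acks = 0          # duplicate ACKs seen for last_ack

    def on_data_from_sender(self, seq, payload):
        """Cache the segment, then forward it over the wireless link."""
        self.cache[seq] = payload
        return [("to_mobile", seq)]

    def on_ack_from_mobile(self, ack):
        """Return the actions triggered by an ACK arriving from the mobile host."""
        if ack > self.last_ack:
            # New ACK: clean the cache and let the ACK through to the sender.
            for seq in [s for s in self.cache if s <= ack]:
                del self.cache[seq]
            self.last_ack = ack
            self.dup_acks = 0
            return [("to_sender_ack", ack)]

        missing = ack + 1
        if missing not in self.cache:
            # The packet never reached the base station, so the loss is not a
            # wireless loss: forward the duplicate ACK and let TCP react.
            return [("to_sender_ack", ack)]

        self.dup_acks += 1
        if self.dup_acks == 1:
            # First duplicate ACK: retransmit locally from the cache.
            return [("to_mobile_high_priority", missing)]
        # Later duplicates are suppressed so the sender does not shrink its window.
        return []
```

Replaying the slide sequence: after packets 1-6 are cached, the first "ack 0" triggers a local retransmission of packet 1, further "ack 0" copies are dropped, and "ack 4" is forwarded while packets 1-4 leave the cache. Because the cache is only an optimization, losing it degrades performance but not correctness, which is what makes the state soft.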
Handling Mobility: Use Local Multicast
[Figure: packets from the sender travel via the home agent and are multicast locally to two base stations, each running a snoop agent.]
Snoop Protocol: MH to FH
[Figure: mobile sender, base station, and fixed receiver; a packet is lost on the wireless link before it reaches the base station.]
Caching and retransmission will not work
– Losses occur before packet reaches BS
– Congestion losses should not be hidden
Solution #1: Negative ACKs (NACKs)
– NACK from BS to MH on wireless loss
Solution #2: Explicit Loss Notifications (ELN)
– In-band message to TCP sender
– General solution framework
[Animation, MH to FH: the mobile sender's packets 0-6 pass through the base station to the fixed receiver.]
– Packet 1 is lost on the wireless link: add 1 to list of holes after checking for congestion
– Receiver returns duplicate ACKs (ack 0)
– ELN marking: base station sets ELN information on duplicate ACKs
– Sender retransmits on dup ACK + ELN; no congestion control now
– Clean holes on new ACK (ack 6)
Link-aware transport decouples congestion control from loss recovery
Technique generalizes nicely to wireless transit links
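The ELN mechanism in the MH-to-FH direction can be sketched in two parts: bookkeeping at the base station and the sender's reaction. This is a simplified model under stated assumptions: the names ElnAgent and sender_on_dup_ack are hypothetical, the congestion check is reduced to a boolean flag, and the non-ELN branch uses a Reno-like "halve on the third duplicate ACK" rule for contrast.

```python
class ElnAgent:
    """Toy base-station agent that tracks holes and marks ELN on duplicate ACKs."""

    def __init__(self):
        self.next_expected = 0
        self.holes = set()   # sequence numbers presumed lost on the wireless link

    def on_data_from_mobile(self, seq, queue_congested=False):
        # A gap in the sequence space is recorded as a hole only if the base
        # station itself is not congested (the "checking for congestion" step).
        if seq > self.next_expected and not queue_congested:
            self.holes.update(range(self.next_expected, seq))
        self.next_expected = max(self.next_expected, seq + 1)

    def on_ack_from_receiver(self, ack, is_duplicate):
        # Clean holes covered by the cumulative ACK, then mark ELN if the
        # packet the receiver is waiting for is a known hole.
        self.holes = {h for h in self.holes if h > ack}
        eln = is_duplicate and (ack + 1) in self.holes
        return ack, eln


def sender_on_dup_ack(cwnd, dup_count, eln):
    """Return (retransmit_now, new_cwnd) for the dup_count-th duplicate ACK."""
    if eln:
        return True, cwnd                   # wireless loss: recover, keep the window
    if dup_count >= 3:
        return True, max(cwnd // 2, 1)      # assumed congestion response
    return False, cwnd
```

The point of the split is visible in sender_on_dup_ack: with ELN the retransmission happens immediately and the window is untouched, so loss recovery no longer implies a congestion response.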
End-to-End Enhancements
• Decouple congestion control from loss recovery
– Explicit Loss Notification (ELN)
• Burst losses
– Selective ACKs (SACKs) [FF96,KM96,MMFR96,B96]
• Snoop protocol: no changes to fixed hosts on the Internet
[Figure: selective ACKs; the receiver returns "ack 0 [sack 2]" and then "ack 0 [sack 2,4]".]
Snoop Performance Improvement
[Figure: sequence number (bytes) vs. time (s). Best possible TCP: 1.30 Mbps. Snoop: 1.11 Mbps. TCP Reno: 280 Kbps.]
2 MB wide-area TCP transfer over 2 Mbps Lucent WaveLAN
Performance: FH to MH
[Figure: throughput (Mbps) vs. 1/bit-error rate (1 error every x Kbits) for Snoop+SACK, Snoop, SPLIT-SACK, TCP SACK, SPLIT, and TCP Reno, with the range of typical error rates marked.]
• Snoop+SACK and Snoop perform best
• Connection splitting not essential
• TCP SACK performance disappointing
2 MB local-area TCP transfer over 2 Mbps Lucent WaveLAN
Empirical Error Modeling
[Figure: CDF of error duration and error-free duration vs. duration (ln ms).]
Data collected from Reinas Env. Monitoring Network, Santa Cruz, CA
Real-World Web Performance
[Figure: number of downloads in 1000 s for Reno, SACK, and Snoop with 1-4 concurrent connections and with P-HTTP.]
           1 conn.   2 conns.   3 conns.   4 conns.   P-HTTP
Reno       170       186        102        206        966
SACK       179       203        177        76         985
Snoop      849       975        1033       1085       3000
Snoop performance improvement: 3X-6X over Reno & SACK
Empirical wireless error model from real traces of Reinas wireless network, UC Santa Cruz
Empirical Web workload model from real traces
Benefits of TCP-Awareness
[Figure: congestion window (bytes) vs. time (sec) for Snoop and for LL (no duplicate ack suppression).]
• 30-35% improvement for Snoop: LL congestion window is small (but no coarse timeouts occur)
• Connection bandwidth-delay product = 25 KB
Suppressing duplicate acknowledgments and TCP-awareness lead to better utilization of link bandwidth and better performance
Summary: Wireless Bit-Errors
• Problem: Wireless corruption mistaken for congestion
• Solution: Snoop Protocol
• General lessons
– Lightweight soft-state agent in network infrastructure: fully conforms to the IP service model; automatic instantiation and cleanup
– Cross-layer protocol design & optimizations
[Figure: protocol stack (Transport, Network, Link, Physical) annotated with "Transport-aware link (Snoop agent at BS)" and "Link-aware transport (ELN)".]
Challenge #2: Asymmetric Effects
• Asymmetric access technologies
– ADSL, (wireless) cable modems, DBS, etc.
– Low-bandwidth ACK channel [LM97, KVR98]
• Packet radio networks
– Metricom’s Ricochet, CDPD, etc.
– Adverse interactions between data and ACK flow
Problem: Imperfect ACK feedback degrades TCP performance
The Character of Asymmetry
[Figure: server and client connected through routers, with a forward (data) path and an ACK path.]
The network and traffic characteristics in one direction significantly affect performance in the other
– Bandwidth: 10-1000 times more in the forward direction
– Latency: Variability due to MAC protocol interactions
– Packet loss: Higher loss- or error-rate in one direction
Bandwidth Asymmetry Problems
[Figure: the server sends Data 8-11 over the forward path; ACKs from the client queue at the bottleneck router on the reverse path.]
1. Acks arrive slowly (large buffer)
2. Acks are dropped (small buffer)
3. Acks are queued behind data packets
Hybrid Wireless Cable Measurements
[Figure: TCP throughput (Mbps) vs. socket buffer size (KB) for different return channels: 10 Mbps Ethernet, 28.8 C-SLIP, 9.6 C-SLIP, 28.8 SLIP, 9.6 SLIP.]
Return channel speed and latency affect performance
Latency Asymmetry: Packet Radio Networks
[Figure: a fixed host (FH) on the Internet reaches the mobile host (MH) through a gateway (GW), Ethernet radios (ER), poletop radios (PT), and a modem (PR); radios exchange RTS/CTS before sending.]
– Half-duplex radios
– Synchronization before communication
Packet Radio Networks
[Figure: same topology with data and ACK flows in opposite directions; an RTS gets no response from a busy radio, leading to exponential backoff.]
Problem: Large and variable communication latency
Problem: Large Round-Trip Time Variations
Example: Metricom Ricochet Wireless Network
[Figure: sequence number trace (bytes vs. time, sec) marking fast retransmissions and timeouts, and RTT estimate (msec) vs. sample number.]
• Mean rtt = 2.45s, std deviation = 1.5s ==> long timeout!
• Long idle periods after multiple losses (~ 20 Kbps)
• In contrast, UDP throughput = 50-64 Kbps
• ACK flow affects data latency
Solutions
• Problems arise because of imperfections in the ACK feedback
• Reduce frequency of acks
– ACK Filtering (AF)
– ACK Congestion Control (ACC)
• Handle infrequent acks
– Sender Adaptation (SA)
– ACK Reconstruction (AR)
General solution approach for asymmetric situations
ACK Filtering (AF)
[Figure: ACKs 1, 3, 5, 7, 9, 11, 13 from the client queued at the router on the constrained reverse channel.]
• Purge all redundant, cumulative ACKs from constrained reverse queue
• Used in conjunction with sender adaptation or ACK reconstruction
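A minimal sketch of ACK filtering at the reverse-channel queue follows. It assumes the queue holds only pure ACKs represented as (connection id, cumulative ACK number) tuples; the function name enqueue_ack and that representation are hypothetical.

```python
from collections import deque


def enqueue_ack(queue, conn_id, ack_no):
    """Return a new queue with the ACK appended and older ACKs it covers removed.

    Only strictly smaller ACK numbers of the same connection are purged, so
    duplicate ACKs (which signal loss) stay in the queue.
    """
    kept = deque(
        (cid, ack) for cid, ack in queue
        if not (cid == conn_id and ack < ack_no)
    )
    kept.append((conn_id, ack_no))
    return kept
```

Because TCP ACKs are cumulative, the newest ACK carries everything the purged ones did. The cost is that the sender sees fewer, larger ACK jumps, which is why filtering is paired with sender adaptation or ACK reconstruction.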
ACK Congestion Control (ACC)
Adaptive extension of TCP delayed ACKs based on congestion feedback from router or sender
[Animation: server sending data through the router to the client over the forward path, with ACKs returning.]
– Client starts with delack factor = 2
– Router: RED [FJ93] marking of ECN bit [F94] (Explicit Congestion Notification)
– Server: echo ECN marking to receiver
– Client increases delack factor to 4
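The receiver side of ACC can be sketched as a delayed-ACK counter whose factor grows when congestion on the ACK path is echoed back. The doubling from 2 to 4 follows the slides; the cap, the lower bound, and the gradual decrease in relax() are assumptions for illustration, and the class name AccReceiver is hypothetical.

```python
class AccReceiver:
    """Toy receiver that sends one ACK per `delack` data packets."""

    def __init__(self, delack=2, min_delack=1, max_delack=16):
        self.delack = delack
        self.min_delack = min_delack   # assumed lower bound
        self.max_delack = max_delack   # assumed cap
        self.pending = 0               # data packets received since the last ACK

    def on_data(self):
        """Return True if an ACK should be sent for this data packet."""
        self.pending += 1
        if self.pending >= self.delack:
            self.pending = 0
            return True
        return False

    def on_congestion_echo(self):
        """The sender echoed an ECN mark seen on the ACK path: ACK less often."""
        self.delack = min(self.delack * 2, self.max_delack)

    def relax(self):
        """Assumed policy: ACK slightly more often after a quiet period."""
        self.delack = max(self.delack - 1, self.min_delack)
```

A real receiver would also bound the delay with a timer so that a small window does not stall waiting for packets that never arrive.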
Sender Adaptation (SA)
• Infrequent ACKs cause slow window growth
• Sender tends to be bursty
[Figure: ACKs 1, 9, 15 reach the server, which then sends packets 19, 20, 21, 22, ...]
1. Increment window by amount of data ack’d: cwnd += 8 or cwnd += 8/cwnd
2. Regulation: pace packets out at rate estimated by cwnd/srtt. This reduces burstiness.
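Both parts of sender adaptation fit in a short sketch. Reading the two update rules as the slow-start and congestion-avoidance cases is an assumption, as are the function names and the use of segments as the unit for cwnd.

```python
def grow_cwnd(cwnd, ssthresh, segments_acked):
    """Grow the window by the amount of data an ACK covers, not by one per ACK."""
    if cwnd < ssthresh:
        # Assumed slow-start case: cwnd += 8 when one ACK covers 8 segments.
        return cwnd + segments_acked
    # Assumed congestion-avoidance case: cwnd += 8 / cwnd.
    return cwnd + segments_acked / cwnd


def pacing_interval(cwnd, srtt):
    """Gap in seconds between packets when sending at a rate of cwnd / srtt."""
    return srtt / cwnd
```

For example, with cwnd = 20 segments and srtt = 2 s the sender would release one packet every 0.1 s instead of sending a burst each time a sparse ACK opens the window.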
ACK Reconstruction (AR)
[Figure: an ACK filter on the client side of the reverse channel and an ACK reconstructor on the server side, which regenerates the filtered ACKs (3, 5, 7, 9).]
• Regenerates ACKs at other end of reverse channel
• Shields sender from large gaps in ack sequence
• AR rate determined by
– input ACK rate
– target ACK spacing
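One way to picture the reconstructor: when an ACK arrives that jumps past the previous one, emit the intermediate ACKs at a chosen spacing. The sketch below assumes a fixed step between regenerated ACK numbers and a fixed time spacing; both parameters and the function name are hypothetical, whereas the slides derive the rate from the input ACK rate and a target spacing.

```python
def reconstruct_acks(last_ack, new_ack, arrival_time, ack_step=2, spacing=0.01):
    """Return [(send_time, ack_no), ...] filling the gap between two ACKs."""
    out = []
    send_time = arrival_time
    ack_no = last_ack + ack_step
    while ack_no < new_ack:
        out.append((send_time, ack_no))
        send_time += spacing
        ack_no += ack_step
    # The ACK that actually arrived is always forwarded last.
    out.append((send_time, new_ack))
    return out
```

With last_ack = 1 and new_ack = 9 the sender sees ACKs 3, 5, 7, 9 spread over time rather than a single jump, so an unmodified sender keeps its ACK clock.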

Bandwidth Asymmetry Performance


– TCP transfers in the forward direction alone
– Maximum window size 100 KB; no losses on forward path
[Figure: throughput (Mbps) of Reno, ACC, AF, and AF+AR for the configurations 10 pkt, C/10 pkt, 50 pkt, C/50 pkt.]
– Header compression helps
– Large reverse channel buffer hurts for Reno and ACC
– Fairness greatly improves using AF and ACC for multiple transfers
Performance: Single Transfer
• AF reduces chances that peer radio is busy
– MAC backoffs less frequent
• Round-trip std deviation reduces from 1.5 s to 0.6 s
[Figure: throughput (Kbps) of Reno, Reno+ACC, and Reno+AF over 1, 2, and 3 hops.]
AF: 20-35% throughput improvement compared to Reno
Performance: Concurrent Transfers
• Metrics: utilization and fairness
• Simultaneous connections over 2-hop network
– Performance more predictable and consistent with AF
• Unpredictable performance caused by long timeouts
[Figure: Jain's fairness index vs. number of connections (2-12) for AF and Reno.]
AF: 25% improvement in fairness over Reno
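For reference, Jain's fairness index over n per-connection throughputs x_i is (sum x_i)^2 / (n * sum x_i^2). It equals 1 when all connections get the same throughput and 1/n when a single connection gets everything. A minimal sketch (the function name is hypothetical):

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero throughput")
    return total * total / (n * squares)
```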
Summary: Asymmetric Effects
• General definition of asymmetry
– Problem: ACK channel impacts TCP performance
• Classification of types of asymmetry
– Bandwidth asymmetry due to technologies
– Latency asymmetry due to MAC interactions
• General solutions: Two-pronged approach
– Reduce frequency of ACKs (AF, ACC)
– Handle infrequent ACKs (SA, AR)
• Status
– BSD/OS 3.0 implementation
– Soon-to-be Internet RFC
Challenge #3: Low Bandwidth
Low channel bandwidths
Burst packet losses
Short Web transfers
• Small transmission window size
– Timeouts for most losses
• Result: Unacceptably low throughput
[Figure: sender transmitting packets 1-4 to the receiver.]
Enhanced TCP Loss Recovery
Goal: Better data-driven loss recovery
Web trace analysis: 25% of all timeouts occurred after at least 1 packet was successfully received
Limited Transmit
– Early “fast recovery”: send new packet on dup ACK
– Need to guard against packet reordering
[Figure: the 1st dup ack (ack 0) reaching the sender triggers transmission of new packets 5 and 6.]
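The Limited Transmit decision can be sketched as a single function. The limits used here (at most two segments beyond the congestion window, and the usual threshold of three duplicate ACKs) follow RFC 3042 as I understand it; the function name, the action strings, and the use of segments as the unit are illustrative assumptions.

```python
def on_duplicate_ack(dup_count, flight_size, cwnd, rwnd, has_new_data, dupthresh=3):
    """Return the sender's action for the dup_count-th duplicate ACK.

    All window quantities are in segments. cwnd itself is not changed here.
    """
    if dup_count >= dupthresh:
        return "fast_retransmit"
    # One new segment may go out if the data in flight stays within the
    # receiver's window and within cwnd plus two segments.
    within_limit = flight_size + 1 <= min(cwnd + 2, rwnd)
    if has_new_data and within_limit:
        return "send_new_segment"
    return "wait"
```

With a window of three or four packets a single loss cannot produce three duplicate ACKs, so a plain sender falls back to a timeout. Sending a new packet on each early duplicate ACK generates the extra ACKs needed to reach the threshold, while still waiting for the threshold guards against reordering.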
Performance: Enhanced Recovery
[Figure: packet sequence # vs. time (s) for Enhanced Recovery and TCP SACK.]
• Timeouts occur only on persistent congestion
– Entire window is lost
– Retransmission is lost
TCP Loss Recovery: Status
• SACK implementation in BSD/OS
– Released March 1996 (IETF presentation); patches June 1996
• Enhanced loss recovery
– BSD/OS implementation
– Experiments over Internet paths and Ricochet network
– Now documented as RFC 3042

Summary
• Three fundamental challenges to efficient reliable data transport over wireless networks
– Wireless bit-errors: Berkeley Snoop protocol (local recovery + ELN)
– Asymmetric effects: Two-pronged approach with end-to-end and link schemes (AF, ACC, SA, AR)
– Low channel bandwidths: Enhanced TCP loss recovery
• Lessons for protocol design
– Cross-layer protocol optimizations: Snoop, ELN, AF
– Soft-state network agents: Snoop, AR
– Data-driven loss recovery: Snoop, Limited Transmit protocol