Transcript Slide 1

Enabling New Applications
with Optical Circuit-Switched
Networks
Xuan Zheng
April 27, 2004
Outline

Background and problem statement

Proposed RESCUE service


Application I: High-speed optical Dial-Up
Internet access service using RESCUE
circuits
Application II: end-to-end RESCUE circuits
to improve file transfer delays

Implementation of application II

Summary
2
Background

Current optical network architectures
[Figure labels: enterprise building; Ethernet hosts; Ethernet switch/IP router; leased lines; access service provider node; metro optical access network; metro optical core network; inter-switch circuits; wide-area optical network; Internet service provider router; Internet packet-switched backbone network (IP routers interconnecting various networks)]
Current optical network applications


Leased access circuits for enterprise users
High-speed inter-switch/inter-router circuits
3
Gaps between User Needs and Current
Network Solutions

Access link bottleneck problem
  Data rates of access links are still slow.
  Access links are often heavily utilized.

TCP limitations
  TCP is not suited for High-Delay-Bandwidth-Product (HDBP) networks because of its congestion control scheme.

Hard to create end-to-end connections to provide QoS for interactive real-time applications
  Current Internet is connectionless.
4
Prior work

In packet-switched networks

Packet-switched ring (RPR) is proposed for access links
  Increasing the circuit rate does not help much if the packet loss rate remains high.

TCP enhancements are proposed to achieve high end-to-end TCP throughputs
  HighSpeed TCP, Scalable TCP, FAST TCP, etc.
  They do not change the shared nature of the Internet; no end-to-end QoS guarantee.

QoS in IP based networks
  IntServ, DiffServ, TCP switching, etc.
  Implemented at IP routers instead of end hosts.
  Not scalable, especially when traffic is large.
5
Prior work

In circuit-switched networks

Traditionally, bandwidth-on-demand is primarily focused on inter-switch/inter-router circuits in service provider networks.
  Fast restoration and rapid provisioning
  Centralized resource management with human interventions

Latest efforts on bandwidth-on-demand
  UCLP in Canarie network, ESnet, etc.
  Provide user-controlled end-to-end optical circuit provisioning
  Still centralized approach
  Applications are limited to the elephant data transfer and other eScience applications in a small community
  Too costly
  Does not scale for commodity service
6
Problem Statement

Design new network architectures
exploiting advances in optical switching
technologies to bridge the gaps between
user needs and network limitations.


High-speed circuit switches
Dynamic distributed control with
signaling/routing protocols
7
Proposed Architecture: Reconfigurable
Ethernet/SONET Circuits for End Users (RESCUE)
[Figure labels: enterprise building; Ethernet hosts; application + RESCUE software; software upgrade; OS; second NIC; NIC 1; NIC 2; Ethernet switch/IP router; to ISP's router; to ISP's router or another signaling-capable network switch; primary Internet leased access circuit; RESCUE circuit; from other end hosts; MSPP with Ethernet interface and SONET interface; optical circuit-switched network; other enterprises]
8
RESCUE: An “Add-on” Service to
Primary Internet Access

Two paths between two entities: the primary TCP/IP
path and an Ethernet/SONET circuit.
[Figure: End host I and End host II, connected through both the Packet-switched Internet and the Optical Circuit-switched Network]

“Parallel-hybrid” architecture vs. traditional
“sequential-hybrid” architecture
9
RESCUE: Applications


High-speed optical Dial-Up Internet access service: Gap #1

End-to-end file transfers: Gap #2
10
Application I: Dial-Up Internet Access
Service using RESCUE Circuits
[Figure labels: enterprise building; Ethernet hosts; user space: application + RESCUE software; OS; NIC 1; NIC 2; ARP table (map MAC addresses to newly set up RESCUE circuit); routing table (map IP address to newly set up RESCUE circuit); Ethernet switch/IP router; primary Internet leased access circuit; SONET MSPP with Ethernet interface; from other end hosts; optical circuit-switched access network; RESCUE circuit for Dial-Up service; Dial-Up server (signaling + configuration software); Internet service provider]
Example with $f = 100$ MB:
Primary path: $r_{primary} = 100$ Mbps, $T_{prop} = 50$ ms, $P_{primary} = 0.01$: transfer delay of about 7 min
Dial-Up circuit: $r_{dialup} = 100$ Mbps, $T_{prop} = 50$ ms, $P_{dialup} = 0.00001$: transfer delay of about 10 sec
Dial-Up circuit: $r_{dialup} = 45$ Mbps, $T_{prop} = 50$ ms, $P_{dialup} = 0.00001$: transfer delay of about 19 sec
11
Application II: End-to-end RESCUE
Circuits to Improve File Transfer Delays
[Figure labels: two enterprise buildings, each with Ethernet hosts (user space: application + RESCUE software; kernel space: OS; NIC 1; NIC 2), an Ethernet switch/IP router, a primary Internet leased access circuit, and a SONET MSPP with Ethernet interface (from other end hosts); Internet packet switches (IP routers interconnecting various networks); optical circuit-switched networks; RESCUE circuit for End-to-End file transfer service]
Example with $f = 1$ TB:
Primary TCP/IP path: $r_{primary} = 1$ Gbps, $T_{prop} = 50$ ms, $P_{primary} = 0.0001$: transfer delay of 4 days and 15.3 hours
RESCUE circuit: $r_{rescue} = 1$ Gbps, $T_{prop} = 50$ ms: transfer delay $= f/r_{rescue} + T_{prop}/2 = 8000$ sec $\approx 2.2$ hours
Use new transport protocols other than TCP on end-to-end RESCUE circuits
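The RESCUE-circuit delay on this slide is the file size divided by the circuit rate, plus half the propagation delay. A minimal Python sketch of that arithmetic, assuming that 1 TB means 10^12 bytes and that T_prop is given in seconds (the function name is illustrative and not from the slides):

```python
def circuit_transfer_delay(file_bytes, rate_bps, t_prop_s):
    """Transfer delay on a dedicated circuit: f / r + T_prop / 2, in seconds."""
    return file_bytes * 8 / rate_bps + t_prop_s / 2


# Slide values: f = 1 TB (assumed 10**12 bytes), r_rescue = 1 Gbps, T_prop = 50 ms.
delay_s = circuit_transfer_delay(10**12, 1e9, 0.050)
print(f"{delay_s:.3f} s = {delay_s / 3600:.2f} h")  # prints: 8000.025 s = 2.22 h
```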
12
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
$E[T_{rescue}] = (1 - P_b)\,(E[T_{setup}] + T_{transfer}) + P_b\,(E[T_{fail}] + E[T_{tcp}])$   (1)
$P_b$: the call-blocking probability on the optical circuit-switched network,
$E[T_{setup}]$: the mean call-setup delay of a successful circuit setup,
$E[T_{fail}]$: the mean call-setup delay of a failed circuit setup,
$E[T_{tcp}]$: the mean time to transfer the file using the primary access link.
$T_{transfer} = \frac{f}{r_c} + \frac{T_{prop}}{2}$: the time to transfer the file on the RESCUE circuit
$f$: the size of the file being transferred
$r_c$ (also written $r_{rescue}$): the data rate of the circuit
13
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
Compare $E[T_{rescue}]$ from (1) with $E[T_{tcp}]$:
if $E[T_{rescue}] \ge E[T_{tcp}]$, resort directly to the TCP/IP path
if $E[T_{rescue}] < E[T_{tcp}]$, attempt circuit setup   (2)
By approximating $E[T_{fail}]$ to be equal to $E[T_{setup}]$, we get:
if $\frac{E[T_{setup}]}{1 - P_b} \ge E[T_{tcp}] - T_{transfer}$, resort directly to the TCP/IP path   (3)
if $\frac{E[T_{setup}]}{1 - P_b} < E[T_{tcp}] - T_{transfer}$, attempt circuit setup
14
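The comparison in (1) to (3) on the preceding slides can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the numeric inputs are hypothetical, all delays are in seconds, and rule (3) relies on the slide's approximation E[T_fail] = E[T_setup].

```python
def expected_rescue_delay(p_b, t_setup, t_fail, t_transfer, t_tcp):
    """Equation (1): mean delay when a circuit setup is attempted first."""
    return (1 - p_b) * (t_setup + t_transfer) + p_b * (t_fail + t_tcp)


def should_attempt_circuit(p_b, t_setup, t_transfer, t_tcp):
    """Rule (3): attempt setup if E[T_setup] / (1 - P_b) < E[T_tcp] - T_transfer."""
    if not 0 <= p_b < 1:
        raise ValueError("blocking probability must be in [0, 1)")
    return t_setup / (1 - p_b) < t_tcp - t_transfer


# Hypothetical inputs, chosen only to exercise the rule.
p_b, t_setup, t_transfer, t_tcp = 0.3, 0.05, 8.0, 20.0
via_rule_3 = should_attempt_circuit(p_b, t_setup, t_transfer, t_tcp)
# Rule (2) evaluated directly from equation (1), with T_fail set equal to T_setup.
via_rule_2 = expected_rescue_delay(p_b, t_setup, t_setup, t_transfer, t_tcp) < t_tcp
print(via_rule_3, via_rule_2)
```

With E[T_fail] = E[T_setup], rules (2) and (3) are algebraically equivalent, so both flags agree for any valid input.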
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
$E[T_{tcp}] = E[T_{ss}] + E[T_{loss}] + E[T_{ca}] + E[T_{delay}] = F(r_{primary}, T_{prop}, P_{loss})$   (4)
[1] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Throughput: A Simple Model and its Empirical Validation," IEEE/ACM Transactions on Networking, vol. 9, pp. 31-46, February 2001.
[2] N. Cardwell, S. Savage, and T. Anderson, "Modeling TCP Latency," Proceedings of IEEE Infocom, vol. 3, pp. 1742-1751, Tel-Aviv, Israel, March 2000.
$E[T_{setup}] = \frac{m_{sig}}{r_s}\left(1 + \frac{\rho_{sig}}{2(1 - \rho_{sig})}\right)(k + 1) + T_{sp}\left(1 + \frac{\rho_{sp}}{2(1 - \rho_{sp})}\right)k + T_{prop}^{dialup}$   (5)
$m_{sig}$: the cumulative size of signaling messages used in call setup,
$r_s$: the signaling link rate,
$\rho_{sig}$: the traffic load on the signaling link with an M/D/1 queue,
$\rho_{sp}$: the traffic load on the call processor with an M/D/1 queue,
$k$: the number of switches on the Dial-Up circuit path,
$T_{sp}$: the call-processing delay incurred at each switch,
$T_{prop}^{dialup}$: round-trip propagation delay between the Dial-Up end host and the ISP's IP router.
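A minimal Python sketch of the setup-delay expression (5), reading it as: signaling emission time inflated by the M/D/1 factor over k + 1 links, plus call-processing time inflated by the M/D/1 factor over k switches, plus the round-trip propagation delay. The function names are illustrative, and the example sets T_sp and the propagation delay to zero only to isolate the signaling term.

```python
def md1_factor(rho):
    """1 + rho / (2 (1 - rho)): service time plus mean M/D/1 wait, per unit service time."""
    if not 0 <= rho < 1:
        raise ValueError("load must be in [0, 1)")
    return 1 + rho / (2 * (1 - rho))


def mean_setup_delay(m_sig_bytes, r_s_bps, rho_sig, rho_sp, k, t_sp_s, t_prop_rtt_s):
    """Equation (5), in seconds."""
    emission = m_sig_bytes * 8 / r_s_bps  # time to emit the signaling messages once
    return (emission * md1_factor(rho_sig) * (k + 1)
            + t_sp_s * md1_factor(rho_sp) * k
            + t_prop_rtt_s)


# Signaling term only: m_sig = 100 B, r_s = 10 Mbps, load 0.7, k = 20 switches.
signaling_only = mean_setup_delay(100, 10e6, 0.7, 0.7, 20, 0.0, 0.0)
print(f"{signaling_only * 1e3:.2f} ms")
```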
15
Application II: Analytical Basis for the
Routing Decision -Delay Analysis
$r_{rescue} = r_{primary} = 100$ Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$, $m_{sig} = 100$ B, $r_s = 10$ Mbps
[Plots: one panel for $T_{prop} = 0.1$ ms and one for $T_{prop} = 50$ ms]
16
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
Crossover file sizes when $r_{rescue} = r_{primary} = 100$ Mbps and $T_{prop} = 0.1$ ms
For example: $P_b = 0.3$ and $P_{loss} = 0.01$ give a crossover file size of 180KB
17
Application II: Analytical Basis for the
Routing Decision - Utilization Analysis
Total network utilization $u = u_a \times u_c$
1) $u_c$: per-circuit utilization
$u_c = \frac{E[T_{transfer}]}{E[T_{setup}] + E[T_{transfer}]}$, where $E[T_{transfer}] = \frac{E[X \mid X \ge \chi]}{r_c}$
$E[X \mid X \ge \chi] = \frac{\alpha\chi}{\alpha - 1}$: the fractional average file size with Pareto distribution
$\alpha = 1.06$: the shape parameter of Pareto distribution
$k = 1000$: the scale parameter of Pareto distribution
$\chi$: the crossover file size
$r_c$: the circuit rate
[Figure: symmetric three-link network model. Labels: $N$; $m_{access}$; $m_{core}$; $(v_{access}, P_b^{access})$; $(v_{core}, P_b^{core})$; local traffic; long distance traffic]
$(u_a^{access}, u_a^{core})$ is calculated by using the fixed-point reduced load approximation.
2) $u_a$: aggregate circuit utilization
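The per-circuit utilization in item 1) can be sketched in Python. Assumptions here: file sizes and the Pareto scale parameter are in bytes (the slide gives k = 1000 without a unit), the circuit rate is in bits per second, and the function names are illustrative.

```python
def pareto_mean_above(chi, alpha=1.06, scale=1000.0):
    """E[X | X >= chi] = alpha * chi / (alpha - 1) for Pareto(alpha, scale) file sizes."""
    if alpha <= 1:
        raise ValueError("the mean is finite only for alpha > 1")
    chi = max(chi, scale)  # below the scale parameter the condition is always true
    return alpha * chi / (alpha - 1)


def per_circuit_utilization(chi, r_c_bps, t_setup_s, alpha=1.06, scale=1000.0):
    """u_c = E[T_transfer] / (E[T_setup] + E[T_transfer])."""
    t_transfer = pareto_mean_above(chi, alpha, scale) * 8 / r_c_bps
    return t_transfer / (t_setup_s + t_transfer)


# Hypothetical comparison: a larger crossover file size raises per-circuit utilization.
low = per_circuit_utilization(chi=1e3, r_c_bps=100e6, t_setup_s=0.05)
high = per_circuit_utilization(chi=1e6, r_c_bps=100e6, t_setup_s=0.05)
print(low < high)
```

Raising the crossover file size means only larger files use circuits, so the setup overhead is amortized over longer transfers.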
18
Application II: Analytical Basis for the
Routing Decision - Utilization Analysis
$N = 100$, $f = 0.8$, $T_{prop}^{local} = 0.1$ ms, $T_{prop}^{longdist} = 50$ ms, and $m_{core} = 10\,m_{access}$
[Plots: access link utilization $u_{access}$; core link utilization $u_{core}$. Annotated values: 93%, 84%]
19
Analytical Basis for the Routing Decision

In low propagation-delay environments



Delay-based decision
Crossover file size depends upon the link rates
and the loading conditions on the two paths
In high propagation-delay environments


Utilization-based decision
A lower bound is needed for crossover file size
20
Implementation of Application II

End-host RESCUE software



A high-speed transport protocol module for end-to-end file-transfer applications,
A routing decision module,
A signaling module.
RESCUE software
Routing decision
Database
Application
Signaling
High speed
transport
protocol
TCP
NIC I
NIC II
Primary TCP/IP path
End-to-end RESCUE circuit
21
High-speed Transport Protocol: Design
Rationale

Flow control: rate-based scheme to achieve high circuit utilization.

Error control: selective-Automatic-Repeat-reQuest (selective-ARQ) scheme to achieve a high efficiency.
  Negative Acknowledgements (NAK) are used because of the guaranteed in-sequence delivery of data blocks on dedicated circuits.
  Positive Acknowledgements (ACK) are still needed to update sender’s retransmission buffers.

Dual communication paths
  Implementation is not trivial.
  Use primary TCP/IP path to transport reverse-path control messages.

Our transport solution: Fixed Rate Transport Protocol (FRTP).
22
High-speed Transport Protocol: FRTP
Specification

The model of FRTP connections
The sender
Control process
Data transfer
process
The receiver
Control channel over
primary TCP/IP path
Data channel over
RESCUE circuit
Control process
Data transfer
process
23
High-speed Transport Protocol: An
Implementation of FRTP protocol

FRTP is implemented as an application-level process using a
combination of UDP and TCP.
[Flowchart: FRTP sender and FRTP receiver]
FRTP sender:
  Initiation; establish TCP control channel; FRTP parameter exchange (TCP channel).
  Disk-IO thread: copy one block of data into the retransmission buffer.
  Network-IO thread: (*) check and process feedback from the receiver; if the loss list is empty, encapsulate a new DATA packet, otherwise pick up a lost packet from the loss list; transmit a DATA packet (UDP channel); wait one inter-packet time.
FRTP receiver:
  Initiation; listening; establish TCP control channel; FRTP parameter exchange (TCP channel).
  Network-IO thread: receive DATA packet into the resequencing buffer; if an error is detected, send an ERR packet to the sender; update the loss list and the next expected sequence number; (**) send feedback to the sender if necessary (TCP channel).
  Disk-IO thread: move one block of data out of the resequencing buffer.
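The sender-side choice in the flowchart (retransmit from the loss list if it is not empty, otherwise send new data, then wait one inter-packet time) can be sketched as below. This is a simplified illustration and not the FRTP implementation: the class and function names are hypothetical, and sockets, buffers, and threads are omitted.

```python
from collections import deque


def inter_packet_time(packet_bytes, rate_bps):
    """Pacing gap, in seconds, that holds a fixed sending rate."""
    return packet_bytes * 8 / rate_bps


class SenderScheduler:
    """Chooses the sequence number of the next DATA packet to transmit."""

    def __init__(self):
        self.loss_list = deque()  # sequence numbers reported lost by the receiver
        self.next_seq = 0         # next new sequence number

    def report_loss(self, seq):
        if seq not in self.loss_list:
            self.loss_list.append(seq)

    def next_packet(self):
        """Return (sequence number, is_retransmission)."""
        if self.loss_list:        # lost packets take priority over new data
            return self.loss_list.popleft(), True
        seq = self.next_seq
        self.next_seq += 1
        return seq, False


sched = SenderScheduler()
sent = [sched.next_packet() for _ in range(3)]  # new packets 0, 1, 2
sched.report_loss(1)                            # receiver feedback: packet 1 was lost
sent.append(sched.next_packet())                # retransmission of packet 1
sent.append(sched.next_packet())                # back to new data: packet 3
gap = inter_packet_time(1500, 500e6)            # 1500-byte packets at 500 Mbps
```

A real sender would sleep for `gap` seconds between transmissions; the sketch only computes it.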
24
High-speed Transport Protocol: An
Implementation of FRTP protocol

Experimental environment:
  Connections: Two Dell Precision 650 workstations connected via a Dell PowerConnect Gigabit Ethernet switch.
  Hardware configurations:
    A 2.4-GHz Intel CPU connected to a 533-MHz front-side bus (34Gbps CPU bandwidth),
    An E7505 chipset with 512MB of DDR 266MHz memory (17Gbps memory bandwidth),
    An 80GB ATA/100 7200 RPM EIDE disk drive with 2MB cache (400Mbps average access rate measured by Bonnie [66]), and,
    A 64bit/100MHz PCI-X bus for the GbE NIC (6.4Gbps network bandwidth).
  Operating system: RedHat Linux 9 with version 2.4.20-30.9 kernel.
25
High-speed Transport Protocol: An
Implementation of FRTP protocol

Experimental results with default settings

256KB UDP buffer size, 1500Bytes DATA packet size, 40MB FRTP
buffer size, and 8MB block size for disk I/O operations.
FRTP throughput
FRTP packet-loss rate
26
High-speed Transport Protocol: An
Implementation of FRTP protocol

Impact of UDP buffer size

500Mbps sending rate, 1500Bytes DATA packet size, 40MB FRTP
buffer size, and 8MB block size for disk I/O operations.
FRTP throughput
FRTP packet-loss rate
27
High-speed Transport Protocol: An
Implementation of FRTP protocol

Impact of FRTP DATA packet size

500Mbps sending rate, 256KB UDP buffer size, 40MB FRTP buffer
size, and 8MB block size for disk I/O operations.
FRTP throughput
FRTP packet-loss rate
28
Routing Decision Module Design
QUERY (f, dest) -> run-time module: table look up, then file size comparison
Pre-computation module -> database

Database:
Dest IP       Ploss   Pb    Tprop   r         rc        Crossover file size
...           ...     ...   ...     ...       ...       ...
192.168.0.2   0.01    10%   30ms    100Mbps   100Mbps   27MB
192.168.0.8   0.001   10%   30ms    10Mbps    100Mbps   600KB
...           ...     ...   ...     ...       ...       ...

Attempt circuit setup if f > fc
Use TCP/IP path if f < fc
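The run-time lookup and file size comparison can be sketched in Python using the two database rows shown on the slide. Assumptions: MB and KB are decimal units, an unknown destination falls back to the TCP/IP path, and a file exactly at the crossover size also uses TCP/IP (the slide does not specify these cases). The names are illustrative.

```python
# Crossover file sizes in bytes, keyed by destination IP (rows from the slide).
CROSSOVER_BYTES = {
    "192.168.0.2": 27 * 10**6,   # 27MB
    "192.168.0.8": 600 * 10**3,  # 600KB
}


def route(file_bytes, dest):
    """Return 'circuit' to attempt circuit setup, or 'tcp' to use the TCP/IP path."""
    crossover = CROSSOVER_BYTES.get(dest)
    if crossover is None:
        return "tcp"  # assumption: no database entry means no circuit attempt
    return "circuit" if file_bytes > crossover else "tcp"


print(route(50 * 10**6, "192.168.0.2"))  # 50 MB exceeds the 27MB crossover
print(route(100 * 10**3, "192.168.0.8"))  # 100 KB is below the 600KB crossover
```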
29
Signaling Module Design

An RSVP-TE implementation
[Figure labels: Dell workstation 1 and Dell workstation 2, each running Application and RESCUE software (routing decision, signaling, FRTP) over TCP, with NIC I and NIC II; Ethernet switch; Dell workstation 3; two Cisco MSPPs; two Sycamore switches; TL1 messages; RSVP-TE messages]
30
Contributions

New network architecture




“Parallel-hybrid” instead of traditional “sequential-hybrid”
Dedicated end-to-end high-speed connectivity between end hosts
Distributed, dynamic end-to-end circuit provisioning instead of
centralized resource management.
Objective: a large-scale network providing commodity services

High aggregate network utilization

Commodity services: the elephant data transfer as well as small data
transfer



Call blocking mode with packet-switched back-up paths.
High circuit utilization



High traffic load -> high utilization -> low cost
Superfast provisioning: distributed + hardware signaling
High-speed rate-based flow control
Leveraging current conditions of Ethernet and SONET


Circuit-switched SONET networks are widely deployed in wide-area networks.
Ethernet dominates local-area networks.
31
Publications from this work

Journal papers:
  M. Veeraraghavan and X. Zheng, “A Reconfigurable Ethernet/SONET Circuit Based Metro Network Architecture,” IEEE JSAC on Advances in Metropolitan Optical Networks (Architectures and Control), 2004.
  M. Veeraraghavan, X. Zheng, W. Feng, Hojun Lee, E. Chong, and H. Li, “Scheduling and transport for file transfers on high-speed optical circuits,” JOGC on High Performance Networking, 2004.

Conference papers:
  X. Zheng, M. Veeraraghavan, and H. Lee, “Using Dial-Up Optical Circuits to Address the Access Link Bottleneck Problem,” under revision based on reviews from Infocom 2004.
  M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W. Feng, “CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture,” Proceedings of Opticomm 2003, Dallas, TX, Oct. 13-16, 2003 (Best Student Paper Award).
  T. Moors, M. Veeraraghavan, Z. Tao, X. Zheng, R. Badri, “Experiences in automating the testing of SS7 Signaling Transfer Points,” International Symposium on Software Testing and Analysis (ISSTA), July 22-24, 2002, Via di Ripetta, Rome, Italy.

Magazine paper:
  M. Veeraraghavan, D. Logothetis, and X. Zheng, “Using dynamic optical networking for high-speed access,” Optical Networks Magazine, special issue on “Dynamic Optical Networking around the Corner or Light Years Away?”, vol. 4, no. 5, pp. 30-40, Sep. 2003.

Workshop papers:
  M. Veeraraghavan, H. Lee, and X. Zheng, “File transfers across optical circuit-switched networks,” PFLDnet 2003, Geneva, Switzerland, Feb. 3-4, 2003.
32
Questions?
Thanks! 
33