Transcript Slide 1

Enabling New Applications
with Optical Circuit-Switched
Networks
Xuan Zheng
April 27, 2004
Outline

Background and problem statement

Proposed RESCUE service


Application I: High-speed optical Dial-Up
Internet access service using RESCUE
circuits
Application II: end-to-end RESCUE circuits
to improve file transfer delays

Implementation of application II

Summary
2
Background

Current optical network architectures
Enterprise building
Internet - Packet Switched backbone network
(IP routers interconnecting various networks)
Internet service
provider router
Ethernet
hosts
Ethernet
switch/
IP router
Access service provider
node
Metro optical
access network
Leased lines

Metro optical
core network
Inter-switch
circuits
Wide-area
optical network
Current optical network applications


Leased access circuits for enterprise users
High-speed inter-switch/inter-router circuits
3
Gaps between User Needs and Current
Network Solutions

Access link bottleneck problem
  Data rates of access links are still slow.
  Access links are often heavily utilized.
TCP limitations
  TCP is not suited for High-Delay-Bandwidth-Product (HDBP) networks because of its congestion control scheme.
Hard to create end-to-end connections to provide QoS for interactive real-time applications
  Current Internet is connectionless.
4
Prior work

In packet-switched networks

Packet-switched ring (RPR) is proposed for access links
  Increasing the circuit rate does not help a lot if the packet loss rate remains high.
TCP enhancements are proposed to achieve high end-to-end TCP throughputs
  HighSpeed TCP, Scalable TCP, FAST TCP, etc.
  Did not touch the shared nature of Internet; no end-to-end QoS guarantee.
QoS in IP based networks
  IntServ, DiffServ, TCP switching, etc.
  Implemented at IP routers instead of end hosts.
  Not scalable, especially when traffic is large.
5
Prior work

In circuit-switched networks

Traditionally, bandwidth-on-demand is primarily focused on inter-switch/inter-router circuits in service provider networks.
  Fast restoration and rapid provisioning
  Centralized resource management with human interventions
Latest efforts on bandwidth-on-demand
  UCLP in Canarie network, ESnet, etc.
  Provide user-controlled end-to-end optical circuit provisioning
  Still centralized approach
  Applications are limited to the elephant data transfer and other eScience applications in a small community
  Too costly
  Does not scale for commodity service
6
Problem Statement

Design new network architectures
exploiting advances in optical switching
technologies to bridge the gaps between
user needs and network limitations.


High-speed circuit switches
Dynamic distributed control with
signaling/routing protocols
7
Proposed Architecture: Reconfigurable
Ethernet/SONET Circuits for End Users (RESCUE)
Enterprise building
Application +
Ethernet
Software upgrade
RESCUE software
hosts
To ISP's router or
another signaling-capable
To ISP's router
network switch
OS
Second NIC
NIC 2
NIC 1
Ethernet
switch/IP router
Optical circuit-switched network
Primary Internet
leased access circuit
RESCUE circuit
From other end
hosts
MSPP Ethernet
Interface
SONET
Interface
Other
Enterprises
8
RESCUE: An “Add-on” Service to
Primary Internet Access

Two paths between two entities: the primary TCP/IP
path and an Ethernet/SONET circuit.
Packet-switched
Internet
End host
I

Optical Circuit-switched
Network
End host
II
“Parallel-hybrid” architecture vs. traditional
“sequential-hybrid” architecture
9
RESCUE: Applications


High-speed optical Dial-Up Internet access service
End-to-end file transfers

Gap #1

Gap #2
10
Application I: Dial-Up Internet Access
Service using RESCUE Circuits
[Figure labels: enterprise building; Ethernet hosts (user space + application, RESCUE software, OS, NIC 1, NIC 2); Ethernet switch/IP router; primary Internet leased access circuit; SONET MSPP with Ethernet interface; optical circuit-switched access network; RESCUE circuit for Dial-Up service; Dial-Up server (signaling + configuration software); Internet service provider; from other end hosts.]
ARP table: map MAC addresses to newly setup RESCUE circuit
Routing table: map IP address to newly setup RESCUE circuit
Example transfer delays for a file of size f = 100MB with Tprop = 50ms:
Primary path: rprimary = 100Mbps, Pprimary = 0.01; transfer delay: 7min
Dial-Up circuit: rdialup = 100Mbps, Pdialup = 0.00001; transfer delay: 10sec
Dial-Up circuit: rdialup = 45Mbps, Pdialup = 0.00001; transfer delay: 19sec
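As a rough check on the Dial-Up delays quoted on this slide, the sketch below computes only the serialization time plus one-way propagation, borrowing the f/r + Tprop/2 form used later for RESCUE circuits. It assumes 1MB = 10^6 bytes and that Tprop is a round-trip value; the function name is illustrative and not from the slides.

```python
def ideal_transfer_delay(file_bytes: float, rate_bps: float, t_prop_s: float) -> float:
    """Serialization time plus one-way propagation (t_prop_s is taken as round-trip)."""
    return file_bytes * 8 / rate_bps + t_prop_s / 2


FILE_BYTES = 100e6  # assumption: 100MB means 100 * 10**6 bytes

for label, rate_bps in (("100Mbps", 100e6), ("45Mbps", 45e6)):
    delay = ideal_transfer_delay(FILE_BYTES, rate_bps, 0.050)
    print(f"{label}: {delay:.1f} s")
```

This gives roughly 8 s and 17.8 s, somewhat below the 10sec and 19sec on the slide; the difference presumably reflects call setup and protocol overhead, which this sketch does not model.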
11
Application II: End-to-end RESCUE
Circuits to Improve File Transfer Delays
[Figure labels: two enterprise buildings, each with Ethernet hosts (user space: application + RESCUE software; kernel space: OS; NIC 1 and NIC 2), an Ethernet switch/IP router, a primary Internet leased access circuit, and a SONET MSPP with Ethernet interface; Internet packet switches (IP routers interconnecting various networks); optical circuit-switched networks; RESCUE circuit for End-to-End file transfer service; from other end hosts.]
Example transfer delays for a file of size f = 1TB with Tprop = 50ms:
Primary TCP/IP path: rprimary = 1Gbps, Pprimary = 0.0001; transfer delay: 4 days and 15.3 hours
End-to-end RESCUE circuit: rrescue = 1Gbps; transfer delay = f/rrescue + Tprop/2 ≈ 8000sec ≈ 2.2 hours
Use new transport protocols other than TCP on end-to-end
RESCUE circuits
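The RESCUE-circuit figure on this slide can be reproduced by hand. The worked step below assumes 1TB = 10^12 bytes and treats Tprop as a round-trip value, so Tprop/2 is the one-way delay.

```latex
\[
T_{transfer} = \frac{f}{r_{rescue}} + \frac{T_{prop}}{2}
= \frac{8 \times 10^{12}\ \text{bits}}{10^{9}\ \text{bits/s}} + \frac{0.05\ \text{s}}{2}
= 8000.025\ \text{s} \approx 2.2\ \text{hours}
\]
```

The propagation term is negligible at this file size, so the delay is set almost entirely by the circuit rate.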
12
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
$$E[T_{rescue}] = (1 - P_b)\,(E[T_{setup}] + T_{transfer}) + P_b\,(E[T_{fail}] + E[T_{tcp}]) \qquad (1)$$
$P_b$: the call-blocking probability on the optical circuit-switched network,
$E[T_{setup}]$: the mean call-setup delay of a successful circuit setup,
$E[T_{fail}]$: the mean call-setup delay of a failed circuit setup,
$E[T_{tcp}]$: the mean time to transfer the file using the primary access link,
$T_{transfer} = \frac{f}{r_c} + \frac{T_{prop}}{2}$: the time to transfer the file on the RESCUE circuit,
$f$: the size of the file being transferred,
$r_c$ (also written $r_{rescue}$): the data rate of the circuit.
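Equation (1) can be evaluated directly once its inputs are known. The following is a minimal sketch, not the authors' code: the function names are illustrative, all delays are in seconds, and E[T_tcp] is supplied by the caller rather than computed from a TCP model.

```python
def transfer_time(file_bits: float, r_c: float, t_prop: float) -> float:
    """T_transfer = f / r_c + T_prop / 2 (file size in bits, rate in bits/s)."""
    return file_bits / r_c + t_prop / 2.0


def expected_rescue_delay(p_b: float, t_setup: float, t_fail: float,
                          t_tcp: float, t_transfer: float) -> float:
    """Equation (1): success branch weighted by (1 - P_b), fallback branch by P_b."""
    success = t_setup + t_transfer   # circuit is set up, file goes over the circuit
    blocked = t_fail + t_tcp         # setup fails, file falls back to the TCP/IP path
    return (1.0 - p_b) * success + p_b * blocked


# Hypothetical inputs: 1 GB file, 1 Gbps circuit, 50 ms round-trip delay.
t_tr = transfer_time(8e9, 1e9, 0.05)
print(expected_rescue_delay(p_b=0.3, t_setup=0.1, t_fail=0.1, t_tcp=100.0, t_transfer=t_tr))
```

With P_b = 0 the result reduces to the setup delay plus the circuit transfer time; with P_b = 1 it reduces to the failed-setup delay plus the TCP transfer time.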
13
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
Compare $E[T_{rescue}]$ from (1) with $E[T_{tcp}]$:
$$\begin{cases}
E[T_{rescue}] \ge E[T_{tcp}] & \Rightarrow \text{resort directly to the TCP/IP path} \\
E[T_{rescue}] < E[T_{tcp}] & \Rightarrow \text{attempt circuit setup}
\end{cases} \qquad (2)$$
By approximating $E[T_{fail}]$ to be equal to $E[T_{setup}]$, we get:
$$\begin{cases}
\dfrac{E[T_{setup}]}{1 - P_b} \ge E[T_{tcp}] - T_{transfer} & \Rightarrow \text{resort directly to the TCP/IP path} \\[2ex]
\dfrac{E[T_{setup}]}{1 - P_b} < E[T_{tcp}] - T_{transfer} & \Rightarrow \text{attempt circuit setup}
\end{cases} \qquad (3)$$
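The decision rule in (3) is a single comparison. The sketch below is illustrative only: the function names are assumptions, and the tie case (equality) is sent to the TCP/IP path, which is one reading of the slide. A second helper evaluates (1) with E[T_fail] = E[T_setup] so the two forms can be checked against each other.

```python
def attempt_circuit_setup(t_setup: float, p_b: float,
                          t_tcp: float, t_transfer: float) -> bool:
    """Rule (3): attempt a circuit only if E[T_setup] / (1 - P_b) < E[T_tcp] - T_transfer."""
    if not 0.0 <= p_b < 1.0:
        return False  # with P_b = 1 every setup attempt fails, so use the TCP/IP path
    return t_setup / (1.0 - p_b) < t_tcp - t_transfer


def rescue_delay_equal_fail(t_setup: float, p_b: float,
                            t_tcp: float, t_transfer: float) -> float:
    """Equation (1) under the approximation E[T_fail] = E[T_setup]."""
    return (1.0 - p_b) * (t_setup + t_transfer) + p_b * (t_setup + t_tcp)


# Hypothetical inputs, in seconds.
print(attempt_circuit_setup(t_setup=0.1, p_b=0.3, t_tcp=10.0, t_transfer=8.0))
```

Substituting E[T_fail] = E[T_setup] into (1) gives E[T_setup] + (1 - P_b) T_transfer + P_b E[T_tcp]; requiring this to be less than E[T_tcp] and dividing by (1 - P_b) yields the form in (3).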
14
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
$$E[T_{tcp}] = E[T_{ss}] + E[T_{loss}] + E[T_{ca}] + E[T_{delay}] = F(r_{primary}, T_{prop}, P_{loss}) \qquad (4)$$
[1] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP Throughput: A Simple Model and its Empirical Validation,” IEEE/ACM Transactions on Networking, vol. 9, pp. 31-46, February 2001.
[2] N. Cardwell, S. Savage, and T. Anderson, “Modeling TCP Latency,” Proceedings of IEEE Infocom, vol. 3, pp. 1742-1751, Tel-Aviv, Israel, March 2000.
$$E[T_{setup}] = \frac{m_{sig}}{r_s}\left(1 + \frac{\rho_{sig}}{2(1 - \rho_{sig})}\right)(k + 1) + T_{sp}\left(1 + \frac{\rho_{sp}}{2(1 - \rho_{sp})}\right)k + T_{prop}^{dialup} \qquad (5)$$
$m_{sig}$: the cumulative size of signaling messages used in call setup,
$r_s$: the signaling link rate,
$\rho_{sig}$: the traffic load on the signaling link with an M/D/1 queue,
$\rho_{sp}$: the traffic load on the call processor with an M/D/1 queue,
$k$: the number of switches on the Dial-Up circuit path,
$T_{sp}$: the call-processing delay incurred at each switch,
$T_{prop}^{dialup}$: round-trip propagation delay between the Dial-Up end host and the ISP's IP router.
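The setup-delay model in (5) can be sketched numerically. The code below assumes the equation reads E[T_setup] = (k + 1)(m_sig / r_s)(1 + ρ_sig / (2(1 - ρ_sig))) + k T_sp (1 + ρ_sp / (2(1 - ρ_sp))) + T_prop, where each bracket is the M/D/1 mean sojourn time expressed as a multiple of the service time. The function names and the T_sp value in the usage lines are hypothetical; the slides do not state T_sp.

```python
def md1_factor(rho: float) -> float:
    """Mean M/D/1 sojourn time as a multiple of the service time: 1 + rho / (2 (1 - rho))."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load must satisfy 0 <= rho < 1")
    return 1.0 + rho / (2.0 * (1.0 - rho))


def mean_setup_delay(m_sig_bits: float, r_s: float, rho_sig: float, rho_sp: float,
                     k: int, t_sp: float, t_prop: float) -> float:
    """Equation (5): signaling-link delays + call-processing delays + propagation."""
    link_delay = (m_sig_bits / r_s) * md1_factor(rho_sig) * (k + 1)  # k + 1 signaling links
    proc_delay = t_sp * md1_factor(rho_sp) * k                       # k call processors
    return link_delay + proc_delay + t_prop


# m_sig = 100 B, r_s = 10 Mbps, loads 0.7, k = 20 as on the following slide;
# t_sp = 1 ms is a placeholder value.
print(mean_setup_delay(800, 10e6, 0.7, 0.7, 20, t_sp=1e-3, t_prop=0.05))
```

With those parameters the signaling-link term alone is about 3.64 ms, so at Tprop = 50ms the setup delay is dominated by propagation unless the per-switch processing delay is large.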
15
Application II: Analytical Basis for the
Routing Decision -Delay Analysis
$r_{rescue} = r_{primary} = 100$ Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$, $m_{sig} = 100$ B, $r_s = 10$ Mbps
[Figure panels: Tprop = 0.1ms; Tprop = 50ms]
16
Application II: Analytical Basis for the
Routing Decision - Delay Analysis
Crossover file sizes when $r_{rescue} = r_{primary} = 100$ Mbps and $T_{prop} = 0.1$ ms
For example: Pb = 0.3 and Ploss = 0.01 give a crossover file size of 180KB
17
Application II: Analytical Basis for the
Routing Decision - Utilization Analysis
Total network utilization: $u = u_a \times u_c$
1) $u_c$: per-circuit utilization
$$u_c = \frac{E[T_{transfer}]}{E[T_{setup}] + E[T_{transfer}]}, \quad \text{where } E[T_{transfer}] = \frac{E[X \mid X \ge \chi]}{r_c}$$
$E[X \mid X \ge \chi] = \frac{\alpha\chi}{\alpha - 1}$: the average file size with Pareto distribution,
$\alpha = 1.06$: the shape parameter of Pareto distribution,
$k = 1000$: the scale parameter of Pareto distribution,
$\chi$: the crossover file size,
$r_c$: the circuit rate.
2) $u_a$: aggregate (fractional) circuit utilization ($u_a^{access}$, $u_a^{core}$) is calculated by using the fixed-point reduced load approximation.
Symmetric three-link network model
[Figure labels: $N$, $m_{access}$, $m_{core}$, $(v_{access}, P_b^{access})$, $(v_{core}, P_b^{core})$, local traffic, long distance traffic]
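The per-circuit utilization on this slide can be sketched as follows, reading it as u_c = E[T_transfer] / (E[T_setup] + E[T_transfer]) with E[T_transfer] = E[X | X ≥ χ] / r_c and E[X | X ≥ χ] = αχ / (α - 1). The function names and the setup delay in the usage lines are hypothetical; the conditional-mean formula requires α > 1 and a crossover size χ at or above the Pareto scale parameter.

```python
def pareto_mean_above(alpha: float, chi: float) -> float:
    """E[X | X >= chi] = alpha * chi / (alpha - 1) for a Pareto file-size distribution."""
    if alpha <= 1.0:
        raise ValueError("the mean is finite only for alpha > 1")
    return alpha * chi / (alpha - 1.0)


def per_circuit_utilization(alpha: float, chi_bits: float,
                            r_c: float, t_setup: float) -> float:
    """u_c: fraction of the circuit holding time spent transferring data."""
    t_transfer = pareto_mean_above(alpha, chi_bits) / r_c
    return t_transfer / (t_setup + t_transfer)


# alpha = 1.06 from the slide; chi = 180KB at 100 Mbps from the crossover example;
# the 10 ms setup delay is a placeholder.
print(per_circuit_utilization(1.06, 180e3 * 8, 100e6, t_setup=0.010))
```

Because only files larger than χ are sent over circuits, raising the crossover size lengthens the mean transfer time and pushes u_c toward 1 for a fixed setup delay.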
18
Application II: Analytical Basis for the
Routing Decision - Utilization Analysis
$N = 100$, $f = 0.8$, $T_{prop}^{local} = 0.1$ ms, $T_{prop}^{long\text{-}dist} = 50$ ms, and $m_{core} = 10\,m_{access}$
[Figure: access link utilization $u_{access}$ and core link utilization $u_{core}$; the values 93% and 84% are marked on the plots]
19
Analytical Basis for the Routing Decision

In low propagation-delay environments



Delay-based decision
Crossover file size depends upon the link rates
and the loading conditions on the two paths
In high propagation-delay environments


Utilization-based decision
A lower bound is needed for crossover file size
20
Implementation of Application II

End-host RESCUE software



A high-speed transport protocol module for end-to-end file-transfer applications,
A routing decision module,
A signaling module.
RESCUE software
Routing decision
Database
Application
Signaling
High speed
transport
protocol
TCP
NIC I
NIC II
Primary TCP/IP path
End-to-end RESCUE circuit
21
High-speed Transport Protocol: Design
Rationale

Flow control: rate-based scheme to achieve high
circuit utilization.


Error control: selective-Automatic-Repeat-reQuest
(selective-ARQ) scheme to achieve a high
efficiency.



Negative Acknowledgements (NAK) because of the
guaranteed in-sequence delivery of data blocks on
dedicated circuits.
Positive Acknowledgements (ACK) are still needed to
update sender’s retransmission buffers.
Dual communication paths


Implementation is not trivial.
Use primary TCP/IP path to transport reverse-path
control messages.
Our transport solution: Fixed Rate Transport
Protocol (FRTP).
22
High-speed Transport Protocol: FRTP
Specification

The model of FRTP connections
The sender
Control process
Data transfer
process
The receiver
Control channel over
primary TCP/IP path
Data channel over
RESCUE circuit
Control process
Data transfer
process
23
High-speed Transport Protocol: An
Implementation of FRTP protocol

FRTP is implemented as an application-level process using a
combination of UDP and TCP.
[Flowchart, reconstructed from the slide labels]
FRTP sender
  Initiation; establish TCP control channel; FRTP parameter exchange (TCP channel)
  Disk-IO thread: copy one block of data into retransmission buffer
  Network-IO thread:
    * Check and process feedback from the receiver (TCP channel)
    The loss list is empty?
      Yes: encapsulate a new DATA packet (from the retransmission buffer)
      No: pick up a lost packet (from the loss list)
    Transmit a DATA packet (UDP channel)
    Wait one inter-packet time
FRTP receiver
  Initiation; listening; establish TCP control channel; FRTP parameter exchange (TCP channel)
  Network-IO thread:
    Receive DATA packet (UDP channel) into the resequencing buffer
    If an error is detected: send ERR packet to the sender
    Update the loss list and the next expected sequence number
    ** Send feedback to the sender if necessary
  Disk-IO thread: move one block of data out of resequencing buffer
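The sender's packet-selection step in the flowchart (retransmit from the loss list first, otherwise send new data, then wait one inter-packet time) can be sketched without any networking. The class and method names below are hypothetical and are not taken from the FRTP implementation; ERR feedback is modeled as a plain method call.

```python
from collections import deque
from typing import Optional


class FrtpSenderSketch:
    """Toy model of the sender's network-IO loop: loss list first, then new data."""

    def __init__(self, total_packets: int, rate_bps: float, packet_bytes: int = 1500):
        self.total_packets = total_packets
        self.next_seq = 0              # sequence number of the next new DATA packet
        self.loss_list = deque()       # sequence numbers reported lost by the receiver
        # Rate-based flow control: fixed gap between consecutive DATA packets.
        self.inter_packet_time = packet_bytes * 8 / rate_bps

    def on_err(self, seq: int) -> None:
        """Record a loss reported by the receiver (an ERR packet in the flowchart)."""
        if seq not in self.loss_list:
            self.loss_list.append(seq)

    def next_packet(self) -> Optional[int]:
        """Return the sequence number to transmit next, or None when nothing is pending."""
        if self.loss_list:                       # loss list not empty: retransmit
            return self.loss_list.popleft()
        if self.next_seq < self.total_packets:   # loss list empty: send new data
            seq = self.next_seq
            self.next_seq += 1
            return seq
        return None


sender = FrtpSenderSketch(total_packets=3, rate_bps=500e6)
print(sender.next_packet(), sender.inter_packet_time)
```

A real sender would sleep for `inter_packet_time` between transmissions and release retransmission-buffer space on positive acknowledgements; both are left out here to keep the selection logic visible.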
24
High-speed Transport Protocol: An
Implementation of FRTP protocol

Experimental environment:


Connections: Two Dell Precision 650 workstations
connected via a Dell PowerConnect Gigabit Ethernet
switch.
Hardware configurations:





A 2.4-GHz Intel CPU connected to a 533-MHz front-side
bus (34Gbps CPU bandwidth),
An E7505 chipset with 512MB of DDR 266MHz memory
(17Gbps memory bandwidth),
An 80GB ATA/100 7200 RPM EIDE disk drive with 2MB
cache (400Mbps average access rate measured by Bonnie
[66]), and,
A 64-bit/100-MHz PCI-X bus for the GbE NIC (6.4Gbps
network bandwidth).
Operating system: RedHat Linux 9 with version
2.4.20-30.9 kernel.
25
High-speed Transport Protocol: An
Implementation of FRTP protocol

Experimental results with default settings

256KB UDP buffer size, 1500Bytes DATA packet size, 40MB FRTP
buffer size, and 8MB block size for disk I/O operations.
FRTP throughput
FRTP packet-loss rate
26
High-speed Transport Protocol: An
Implementation of FRTP protocol

Impact of UDP buffer size

500Mbps sending rate, 1500Bytes DATA packet size, 40MB FRTP
buffer size, and 8MB block size for disk I/O operations.
FRTP throughput
FRTP packet-loss rate
27
High-speed Transport Protocol: An
Implementation of FRTP protocol

Impact of FRTP DATA packet size

500Mbps sending rate, 256KB UDP buffer size, 40MB FRTP buffer
size, and 8MB block size for disk I/O operations.
FRTP throughput
FRTP packet-loss rate
28
Routing Decision Module Design
Run-time module: QUERY (f, dest), table look up, file size comparison
  if f > fc: attempt circuit setup
  if f < fc: use TCP/IP path
  (fc: the crossover file size stored in the database)
Pre-computation module
Database:
Dest IP       Ploss   Pb    Tprop   r         rc        Crossover file size
...           ...     ...   ...     ...       ...       ...
192.168.0.2   0.01    10%   30ms    100Mbps   100Mbps   27MB
192.168.0.8   0.001   10%   30ms    10Mbps    100Mbps   600KB
...           ...     ...   ...     ...       ...       ...
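The run-time lookup can be sketched as a dictionary keyed by destination. This is an illustration only: the names are hypothetical, 1MB is taken as 10^6 bytes, and two cases the slide leaves open (f equal to fc, and a destination missing from the database) are sent to the TCP/IP path as an assumption.

```python
# Crossover file sizes in bytes for the two destinations shown in the database table.
CROSSOVER_BYTES = {
    "192.168.0.2": 27_000_000,  # 27MB
    "192.168.0.8": 600_000,     # 600KB
}


def choose_path(file_bytes: int, dest: str) -> str:
    """Return "circuit" if f > fc for this destination, otherwise "tcp"."""
    fc = CROSSOVER_BYTES.get(dest)
    if fc is None:
        return "tcp"  # assumption: unknown destinations use the primary TCP/IP path
    return "circuit" if file_bytes > fc else "tcp"


print(choose_path(50_000_000, "192.168.0.2"))
print(choose_path(1_000_000, "192.168.0.2"))
```

Keeping the crossover sizes precomputed means the per-transfer decision is a single lookup and comparison, which matches the split between the pre-computation and run-time modules on the slide.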
29
Signaling Module Design

An RSVP-TE implementation
[Figure labels: Dell workstation 1 and Dell workstation 2, each with Application, RESCUE software (routing decision, signaling), FRTP, TCP, NIC I and NIC II; Ethernet switch; Dell workstation 3; two Cisco MSPPs; two Sycamore switches; RSVP-TE messages; TL1 messages]
30
Contributions

New network architecture




“Parallel-hybrid” instead of traditional “sequential-hybrid”
Dedicated end-to-end high-speed connectivity between end hosts
Distributed, dynamic end-to-end circuit provisioning instead of
centralized resource management.
Objective: a large-scale network providing commodity services

High aggregate network utilization

Commodity services: the elephant data transfer as well as small data
transfer



Call blocking mode with packet-switched back-up paths.
High circuit utilization



High traffic load -> high utilization -> low cost
Superfast provisioning: distributed + hardware signaling
High-speed rate-based flow control
Leveraging current conditions of Ethernet and SONET


Circuit-switched SONET is widely deployed in wide-area networks.
Ethernet dominates local-area networks.
31
Publications from this work

Journal papers:
M. Veeraraghavan and X. Zheng, “A Reconfigurable Ethernet/SONET Circuit Based Metro Network Architecture,” IEEE JSAC on Advances in Metropolitan Optical Networks (Architectures and Control), 2004.
M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. Chong, and H. Li, “Scheduling and transport for file transfers on high-speed optical circuits,” JOGC on High Performance Networking, 2004.
Conference papers:
X. Zheng, M. Veeraraghavan, and H. Lee, “Using Dial-Up Optical Circuits to Address the Access Link Bottleneck Problem,” under revision based on reviews from Infocom 2004.
M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W. Feng, “CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture,” Proceedings of Opticomm 2003, Dallas, TX, Oct. 13-16, 2003. Best Student Paper Award.
T. Moors, M. Veeraraghavan, Z. Tao, X. Zheng, and R. Badri, “Experiences in automating the testing of SS7 Signaling Transfer Points,” International Symposium on Software Testing and Analysis (ISSTA), July 22-24, 2002, Via di Ripetta, Rome, Italy.
Magazine paper:
M. Veeraraghavan, D. Logothetis, and X. Zheng, “Using dynamic optical networking for high-speed access,” Optical Networks Magazine, special issue on “Dynamic Optical Networking around the Corner or Light Years Away?”, vol. 4, no. 5, pp. 30-40, Sep. 2003.
Workshop papers:
M. Veeraraghavan, H. Lee, and X. Zheng, “File transfers across optical circuit-switched networks,” PFLDnet 2003, Geneva, Switzerland, Feb. 3-4, 2003.
32
Questions?
Thanks!
33