The California Institute for Telecommunications and

Physical Buildout of the OptIPuter
at UCSD
What Speeds and Feeds Have Been Deployed Over the Last 10 Years
[Figure: Performance per dollar spent (doublings) versus number of years, comparing uplink speed and endpoint speed for Wiglaf (10Mb), Rockstar (1000Mb), and the OptIPuter infrastructure (10000Mb), with DWDM capability at 16 to 32x. Chart credit: Scientific American, January 2001.]
The UCSD OptIPuter Deployment: UCSD is Prototyping a Campus-Scale OptIPuter
[Figure: Campus map (½ mile scale). A Juniper T320 (0.320 Tbps backplane bandwidth) connects to CENIC and NLR, and dedicated fibers between sites link the Linux clusters. Labels include a Chiaro Enstara, a Cisco 6509 (8 – 10GigE), Collocation Node M, and the sites SDSC, SDSC Annex, JSOE, Engineering, Calit2, Preuss High School, SOM (Medicine), Phys. Sci Keck, CRCA, 6th College, and SIO (Earth Sciences). Source: Phil Papadopoulos, SDSC; Greg Hidley, Cal-(IT)2]
UCSD Packet Test Bed: OptIPuter Year 2
[Figure: Network diagram of the Year 2 packet test bed spanning SDSC, JSOE, CRCA, Preuss, SIO, SOM, and 6th College. Hardware labels include Chiaro Enstara, Extreme 400, Dell 5224, Dell 6024F, and Fujitsu, with 1 and 10 GigE links. External links go to StarLight via NLR and to UCI, ISI, and StarLight via CalREN-XD and NLR, alongside the shared UCSD & CalREN-HPR IP network. Endpoints include Infiniband clusters (64 and 4 nodes); Sun, IBM, HP, and Dell compute, storage, visualization, and control clusters of 3 to 128 nodes, some shared; Geowall and Geowall 2 tiled displays; and IBM 9-megapixel display pairs.]
A Different Kind of Experimental Infrastructure
• UCSD Campus Infrastructure
– A campus-wide experimental apparatus
• Different Kinds of Cluster Endpoints (scaling in the usual dimensions)
– Compute
– Storage
– Visualization
– 300+ nodes available for experimentation (ia32, Opteron, Linux)
– 7 different labs
• Clusters and network can be allocated and configured by the researcher at the lowest level
– Machine SW configuration: OS (kernel, networking modules, etc.), middleware, OptIPuter system software, application software
– Root access given to researchers when needed
– As close to chaos as we can get
• Networks
– Packet-oriented network. 10 Gbps/site. Multiple 10GigE where needed
– Adding lambda capability (Quartzite: Research Instrumentation Award)
What’s Coming Soon?
• 10 GigE Switching
– Force 10 e1200. Initially with sixteen 10GigE connections
– Expansion is $6K/port + optics ($2K for grey, $5K for DWDM)
– Line cards and grey optics here; awaiting chassis
– Force 10 S50 edge switches
– 48-port GigE + two 10GigE uplinks, ~$10K with grey optics
• 10 GigE NICs
– Neterion
– PCI-X (Intel OEM) with XFP (just received)
– Myrinet 10G (PCI Express): ready to place order
• DWDM
– On order: four 10GigE XFPs, 40 km, channels 31 and 32 (2 each)
– Delayed: expect arrival in March (sigh)
– Following NASA’s lead on the DWDM hardware (very good results on Dragon)
– Arrived: two 8-channel mux/demux units from Finisar
• DWDM Switching
– Expect a wavelength-selective switch this summer
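The expansion pricing above reduces to simple arithmetic. This is a minimal sketch that uses only the per-port and optics prices quoted on the slide. The port counts in the usage lines are hypothetical, and chassis and line-card costs are not modeled because the slide gives no price for them.

```python
# Prices quoted on the slide (USD): $6K per 10GigE port plus optics.
PORT_COST = 6_000
OPTICS_COST = {"grey": 2_000, "dwdm": 5_000}


def expansion_cost(ports: int, optics: str) -> int:
    """Cost of adding `ports` 10GigE ports, each with the given optics type.

    Chassis and line-card costs are not included (not quoted on the slide).
    """
    if ports < 0:
        raise ValueError("ports must be non-negative")
    if optics not in OPTICS_COST:
        raise ValueError(f"unknown optics type: {optics!r}")
    return ports * (PORT_COST + OPTICS_COST[optics])


if __name__ == "__main__":
    # Hypothetical port counts, for illustration only.
    print(expansion_cost(1, "grey"))  # 8000
    print(expansion_cost(4, "dwdm"))  # 44000
```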
What’s Changing II
• “Center Switching Complex” moving to Calit2
• Should be done by end of March
• A modest number of endpoints for OptIPuter research will be added
• A larger number (e.g., CAMERA) of “production” resources added
• Increasing emphasis on longer-haul connections
– Connections to UCI
Quartzite: Reconfigurable Networking
• NSF Research Instrumentation, Papadopoulos, PI
• Packet network is great
– Give me bigger and faster of what I already know
– Even though TCP is challenged on big pipes
– What about lambdas? And switching lambdas?
• Existing fiber plant is fixed
– Want to experiment with different topologies? -> “buy” a telecom worker to reconnect cables as needed
• Quartzite: Research Instrumentation Award (Started 15 Sep)
– Hybrid network “switch stack” at our collocation point
– Packet switch
– Transparent optical switch
– Allows us to build new physical topologies without manual rewiring
– Wavelength-selective switch
– Experimental device from Lucent
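The remark that TCP is challenged on big pipes comes down largely to the bandwidth-delay product: to keep a link full, a sender must have rate × RTT bytes in flight. This is a minimal sketch. The round-trip times are hypothetical values chosen for illustration. The 65,535-byte constant is the largest TCP window available without the window-scale option.

```python
UNSCALED_TCP_WINDOW = 65_535  # bytes; largest window without the window-scale option


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes: float, rtt_s: float) -> float:
    """Best-case throughput when at most `window_bytes` can be unacknowledged."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    link = 10e9  # a 10 Gb/s path
    # Hypothetical RTTs: on campus vs. a long-haul path.
    for label, rtt in [("campus", 0.001), ("long-haul", 0.070)]:
        need = bdp_bytes(link, rtt)
        cap = window_limited_rate_bps(UNSCALED_TCP_WINDOW, rtt)
        print(f"{label}: need {need / 1e6:.2f} MB in flight; "
              f"an unscaled window caps TCP at {cap / 1e6:.1f} Mb/s")
```

The in-flight requirement grows linearly with RTT, so a window that is adequate on campus falls far short on a long-haul path at the same line rate.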
Quartzite: DWDM
• Cheap uncooled lasers
• 0 W (passive) optical splitters/combiners
• 0.8 nm spacing for DWDM
• 1GigE, 10GigE
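The 0.8 nm spacing corresponds to the 100 GHz ITU-T DWDM grid near 1550 nm, because Δλ ≈ λ²Δf/c. This is a minimal sketch. It assumes the common vendor channel numbering f = 190.0 THz + n × 0.1 THz for channels such as 31 and 32. That numbering is an assumption, and a given vendor's datasheet may differ.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def channel_frequency_thz(channel: int) -> float:
    """Assumed numbering on the 100 GHz grid: f = 190.0 THz + channel * 0.1 THz."""
    return 190.0 + channel * 0.1


def wavelength_nm(freq_thz: float) -> float:
    """Vacuum wavelength in nm for an optical frequency in THz."""
    return C / (freq_thz * 1e12) * 1e9


def spacing_nm(freq_thz: float, spacing_ghz: float = 100.0) -> float:
    """Wavelength spacing for a frequency spacing, via d(lambda) ~= lambda^2 * d(f) / c."""
    lam_m = C / (freq_thz * 1e12)
    return lam_m**2 * (spacing_ghz * 1e9) / C * 1e9


if __name__ == "__main__":
    for ch in (31, 32):
        f = channel_frequency_thz(ch)
        print(f"channel {ch}: {f:.1f} THz, {wavelength_nm(f):.2f} nm")
    print(f"100 GHz spacing near 193.1 THz: {spacing_nm(193.1):.2f} nm")
```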
$10K/switch
Bonded or separate
Vendors: www.aurora.com, www.optoway.com, www.fibredyne.com
Single fiber pair: ($2K/channel (mux/demux) + $5K/XFP) at each end = $14K/connected pair
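The $14K figure only adds up if each end of a link carries its own mux/demux channel and its own XFP: 2 × ($2K + $5K) = $14K. This is a minimal sketch of that arithmetic. The two-ends reading is an assumption. The $10K/switch line is left out because the slide does not say how it combines with the per-channel costs.

```python
MUX_DEMUX_PER_CHANNEL = 2_000  # $2K/channel (mux/demux), from the slide
XFP_PER_END = 5_000            # $5K per DWDM XFP, from the slide
ENDS_PER_LINK = 2              # assumption: one channel and one XFP at each end


def cost_per_connected_pair() -> int:
    """Per-wavelength hardware cost for one end-to-end connection."""
    return ENDS_PER_LINK * (MUX_DEMUX_PER_CHANNEL + XFP_PER_END)


def cost_for_channels(channels: int) -> int:
    """Per-channel hardware cost for `channels` wavelengths on one fiber pair."""
    return channels * cost_per_connected_pair()


if __name__ == "__main__":
    print(cost_per_connected_pair())  # 14000
    # Hypothetical: fill an 8-channel mux/demux.
    print(cost_for_channels(8))       # 112000
```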
UCSD Quartzite Core at Completion (Year 5 of OptIPuter)
Quartzite Communications Core Year 3
• Funded 15 Sep 2004
• Physical HW to enable OptIPuter and other campus networking research
• Hybrid network instrument
[Figure: Reconfigurable network and endpoints around the Quartzite core. Labels include a wavelength-selective switch, a production OOO switch, 32 10GigE, GigE switches with dual 10GigE uplinks to cluster nodes, 10GigE links to cluster node interfaces and other switches, a Chiaro Enstara (GigE, 10GigE, 4 GigE, 4 pair fiber), and a Juniper T320 connecting to the CalREN-HPR research cloud and the campus research cloud.]
Scalable and automated network mapping for OptIPuter/Quartzite Network
OptIPuter AHM Meeting
San Diego, CA
January 17, 2006
Praveen Jagadishprasad, Hassan Elmadi (Calit2, UCSD)
Phil Papadopoulos (SDSC)
Mason Katz
[Figure: Network map (01/16/2006)]
Motivation
• Management
– Inventory
– Troubleshooting
• Programming the network
– Ability to view and manipulate the network as a single entity
– Aid network reconfiguration in a heterogeneous network
– Experimental networks have a high degree of reconfiguration
• Glimmerglass-based physical changes
• VLAN-based logical topology changes
– Final goal: automate the reconfiguration process
• Focus on switch/router configuration process
Automated Discovery
• Minimal input needed
– One gateway might be sufficient
• SNMP-based discovery
– Not tied to a vendor protocol
– Tested with Cisco, HP, Dell, Extreme, etc.
– Almost all major vendors support SNMP
• Fast
– Discovery process highly threaded
– 3 minutes for the UCSD OptIPuter network (~600 hosts and 20 switches)
• Framework-based
– Extensible to include MIBs for specific switch/router models, for example:
– Cisco VLANs
– Extreme trunking
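The discovery process can be pictured as a breadth-first search outward from a single gateway, with each frontier of devices queried in parallel. This is a minimal sketch and uses no SNMP library. `query_neighbors` is a hypothetical stand-in for the SNMP walks a real tool would issue, for example of MIB-II's ipRouteTable. `TOY_TOPOLOGY` is invented data.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented toy network standing in for real devices.
TOY_TOPOLOGY = {
    "gw": ["sw1", "sw2"],
    "sw1": ["gw", "sw3"],
    "sw2": ["gw"],
    "sw3": ["sw1"],
}


def query_neighbors(device: str) -> list[str]:
    """Hypothetical stand-in for an SNMP walk that returns a device's neighbors."""
    return TOY_TOPOLOGY.get(device, [])


def discover(seed: str, workers: int = 16) -> set[str]:
    """Level-by-level BFS from one seed, querying each frontier concurrently."""
    seen = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # pool.map runs the queries in parallel and yields results in input order.
            for neighbors in pool.map(query_neighbors, frontier):
                for device in neighbors:
                    if device not in seen:
                        seen.add(device)
                        next_frontier.append(device)
            frontier = next_frontier
    return seen


if __name__ == "__main__":
    print(sorted(discover("gw")))  # ['gw', 'sw1', 'sw2', 'sw3']
```

Slow SNMP responses dominate discovery time. Running a whole frontier concurrently means the total time scales with network depth rather than device count.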
Design for discovery and mapping
• Phase 1 (Layer 3)
– Router discovery
– Subnet discovery
• Phase 2 (Layer 2)
– Switch discovery
– Host discovery
– Switch <---> host mapping
– IP-ARP mapping
• Phase 3
– Network mapping
– Form integrated map through novel algorithms
– Area of research
• Phase 4
– Web-based visualization
– Database storage
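Phase 2's switch-to-host and IP-ARP mapping can be illustrated by joining per-switch forwarding tables with an ARP table. This is a minimal sketch built on one heuristic and may not match the authors' algorithm. The heuristic treats a port as an uplink when another switch's MAC address has been learned on it, and places each host on the remaining (edge) port where its MAC appears. All table layouts and names are hypothetical. A real tool would fill the tables over SNMP, for example from BRIDGE-MIB forwarding tables.

```python
def map_hosts(
    fdb: dict[str, dict[int, set[str]]],
    switch_macs: dict[str, str],
    arp: dict[str, str],
) -> dict[str, tuple[str, int]]:
    """Return {ip: (switch, port)} for hosts learned on edge ports.

    fdb:         switch -> port -> MAC addresses learned on that port
    switch_macs: switch -> that switch's own MAC address
    arp:         IP address -> MAC address
    """
    mac_location: dict[str, tuple[str, int]] = {}
    for switch, ports in fdb.items():
        other_switches = set(switch_macs.values()) - {switch_macs[switch]}
        for port, macs in ports.items():
            if macs & other_switches:
                continue  # heuristic: another switch is reachable here, so it's an uplink
            for mac in macs:
                mac_location[mac] = (switch, port)
    # Resolve IPs through ARP; hosts not seen on any edge port are left out.
    return {ip: mac_location[mac] for ip, mac in arp.items() if mac in mac_location}


if __name__ == "__main__":
    fdb = {
        "sw1": {1: {"s2", "h1", "h2"}, 2: {"h3"}},       # port 1 faces sw2
        "sw2": {1: {"s1", "h3"}, 5: {"h1"}, 6: {"h2"}},  # port 1 faces sw1
    }
    switch_macs = {"sw1": "s1", "sw2": "s2"}
    arp = {"10.0.0.1": "h1", "10.0.0.2": "h2", "10.0.0.3": "h3"}
    print(map_hosts(fdb, switch_macs, arp))
    # {'10.0.0.1': ('sw2', 5), '10.0.0.2': ('sw2', 6), '10.0.0.3': ('sw1', 2)}
```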
Future work
• Reliable discovery of logical topology (VLANs)
• Automate generation of switch/router configs
– Use physical topology information to aid config generation
– Fixed templates for each switch/router model
– Templates are extended depending on configuration needed
• Batch configuration of switches/routers
– Support custom VLANs with only end-host specification
– Construct spanning tree of end hosts and intermediate switches/routers
– Schedule dependencies for step-by-step configuration
– Physical topology information essential
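The batch-configuration items can be sketched over the physical topology graph: a custom VLAN given only by its end hosts, a spanning tree of those hosts and the intermediate switches, and an ordered configuration schedule. This is a minimal sketch that assumes the topology is an adjacency map. It builds a breadth-first tree from the first end host and prunes branches that reach no requested host. That shortest-path-tree heuristic is a swapped-in technique and is not guaranteed to produce the smallest (Steiner) tree. The root-outward order is only one plausible schedule. All names are hypothetical.

```python
from collections import deque


def vlan_tree(adj: dict[str, list[str]], hosts: list[str]):
    """Nodes and links needed to join `hosts` in one VLAN, plus a config order.

    Returns (nodes, links, order): `order` lists kept nodes root-first (BFS order),
    one possible sequence for pushing per-device configuration.
    """
    root = hosts[0]
    parent = {root: None}
    bfs_order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj.get(u, [])):
            if v not in parent:
                parent[v] = u
                bfs_order.append(v)
                queue.append(v)

    keep = set()
    for h in hosts:
        if h not in parent:
            raise ValueError(f"{h} is not reachable from {root}")
        node = h
        # Walk toward the root, stopping once we join a branch already kept.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent[node]

    links = {(parent[n], n) for n in keep if parent[n] is not None}
    order = [n for n in bfs_order if n in keep]
    return keep, links, order


if __name__ == "__main__":
    adj = {
        "h1": ["swA"], "h2": ["swC"], "h3": ["swD"],
        "swA": ["h1", "swB"], "swB": ["swA", "swC", "swD"],
        "swC": ["swB", "h2"], "swD": ["swB", "h3"],
    }
    nodes, links, order = vlan_tree(adj, ["h1", "h2"])
    print(order)  # ['h1', 'swA', 'swB', 'swC', 'h2']
```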
OptIPuter Network Inventory Management: Logical View
• Logical topology adds a VLAN table to the physical topology tables
– VLAN composed of trunks
– Each trunk can be a single or multiple port-to-port connection between the same set of switches
– Schema supports retaining the VLAN ID when modifying trunks, and vice versa
[Figure: Logical topology graph (single VLAN)]
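The logical-view data model can be sketched with plain data classes: VLANs made of trunks, trunks made of one or more links between the same switches, and VLAN IDs that survive trunk edits. This is a minimal in-memory sketch that mirrors the database tables the slide describes. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """One physical port-to-port connection."""
    switch_a: str
    port_a: int
    switch_b: str
    port_b: int


@dataclass
class Trunk:
    """One or more parallel links, all between the same pair of switches."""
    links: list[Link]

    def __post_init__(self) -> None:
        pairs = {frozenset((link.switch_a, link.switch_b)) for link in self.links}
        if len(pairs) != 1:
            raise ValueError("a trunk needs links that all join the same two switches")


@dataclass
class Vlan:
    vlan_id: int
    trunks: list[Trunk] = field(default_factory=list)

    def replace_trunk(self, old: Trunk, new: Trunk) -> None:
        """Swap one trunk for another; the VLAN id is left unchanged."""
        self.trunks[self.trunks.index(old)] = new


if __name__ == "__main__":
    single = Trunk([Link("sw1", 1, "sw2", 1)])
    bonded = Trunk([Link("sw1", 1, "sw2", 1), Link("sw1", 2, "sw2", 2)])
    vlan = Vlan(100, [single])
    vlan.replace_trunk(single, bonded)  # trunk modified, VLAN id kept
    vlan.vlan_id = 200                  # VLAN id modified, trunks kept
    print(vlan.vlan_id, len(vlan.trunks[0].links))  # 200 2
```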
Look at Parallel Data Serving
• 128-node Rockstar cluster (same as SC2003 build)
• 1 SCSI drive per file server node
[Figure: 48-port GigE + 10GigE uplink layout. Four 48-port GigE switches, each serving 8 Lustre clients, plus eight groups labeled "10 Lustre File Servers".]
Basic Performance
• 4, 8, 16, and 32 nodes reading the same 32 GB file
• Under these ideal circumstances, able to read more than 1.4 GB/s from disk
• Writing different 10 GB files from each node: about 700 MB/s
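Here is a back-of-the-envelope check on these numbers. It assumes each Lustre client reads through a single GigE port (about 125 MB/s of raw line rate, before protocol overhead) and uses decimal units (1 GB = 1000 MB). It shows the minimum number of clients each aggregate rate requires and how long one pass over the 32 GB file takes at 1.4 GB/s.

```python
import math

GIGE_MB_S = 1000 / 8  # 125 MB/s raw GigE line rate per client, before protocol overhead


def min_clients(aggregate_mb_s: float, per_client_mb_s: float = GIGE_MB_S) -> int:
    """Fewest clients whose links could carry `aggregate_mb_s` between them."""
    return math.ceil(aggregate_mb_s / per_client_mb_s)


def read_time_s(size_gb: float, rate_gb_s: float) -> float:
    """Seconds to move `size_gb` at a sustained `rate_gb_s`."""
    return size_gb / rate_gb_s


if __name__ == "__main__":
    print(min_clients(1400))                # 12 (1.4 GB/s read)
    print(min_clients(700))                 # 6 (700 MB/s write)
    print(f"{read_time_s(32, 1.4):.1f} s")  # 22.9 s for the 32 GB file
```

Under these assumptions, the 4- and 8-node runs could not reach 1.4 GB/s over GigE alone (8 × 125 = 1000 MB/s). The peak figure presumably comes from the larger client counts.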
Why a Hybrid Structure
• Create different physical topologies quickly
• Change how a site/node is connected: via packet, lambda, or a hybrid combination
– Want to understand the practical challenges in different circumstances
• Circuits don’t scale in the Internet sense
• Packet switches will be congested for long-haul
– Real QoS is unreachable in the ossified Internet
• The engineering compromise is likely a hybrid network
– Packet paths always exist (Internet scalability argument)
– Circuit paths on demand
– Think private high-speed networks, not just point-to-point
Summary
• OptIPuter is addressing a subset of the research needed for figuring out how to waste (I mean utilize) bandwidth
• Work at multiple levels of the software stack: protocols, virtual machine construction, storage retrieval
• Trying to understand how lambdas are presented to applications
– Explicit?
– Hidden?
– Hybrid?
• Building an experimental infrastructure as large as our budget will allow
– OptIPuter is already international in scale at 10 gigabits
– Approximating the Terabit Campus with Quartzite