Networking - Internet2


Transcript: Networking - Internet2

What's Next for the Net? Grid Computing
Internet2 Member Meeting
Sept 21, 2005
Debbie Montano
[email protected]
www.force10networks.com
Global Grid – Networking

• Debbie Montano
– Director R&E Alliances, Force10 Networks
• Force10 Networks
– GigE / 10 GigE switch/routers

• Will our networks be able to provide the high-speed access that Grid users will need and demand?

• Grid - Sharing Resources
– Computing Cycles
– Software
– Databases / Storage
– Network Bandwidth…!

Global Grid – Vision to Reality
Themes…

• Networks WILL keep up (or catch up) with needs of Grids
• Flexible use of Bandwidth will become integral to Grids
• Ethernet is key
Networks will support Grids

• If Grids are the driving applications, the network will be there
• The need is recognized for
– robust networks
– increased bandwidth
– new network infrastructure
to support vast amounts of data and grid collaborations
• Example: SC2005 supercomputing & high performance networking conference:
– Over 55 x 10 Gbps of WAN bandwidth is converging on Seattle
– Approx 40 x 10 GigE of bandwidth for the Bandwidth Challenge
TeraGrid – NSF investment
Credits: Graphics: N.R. Fuller, National Science Foundation. Bottom images (left to right): (1) A. Silvestri, AMANDA Project, University of California, Irvine; (2) B. Minsker, University of Illinois, Urbana-Champaign, using an MT3DMS model developed at the Army Corps of Engineers and modified by C. Zheng, University of Alabama; (3) M. Wheeler, University of Texas, Austin; J. Saltz, Ohio State University; M. Parashar, Rutgers University; (4) P. Coveney, University College London / Pittsburgh Supercomputing Center; (5) A. Chourasia, Visualization Services, San Diego Supercomputer Center and The Southern California Earthquake Center Community Modeling Environment
• NSF investing $150M – on top of the initial >$100M investment – to ensure access to and use of this Grid resource!
• Most TeraGrid nodes use Force10 switch/routers for access to users
Top 500: Customer Segment
Segment      2004    2005
Industry     55.0%   52.8%
Research     22.0%   22.2%
Academic     16.0%   18.6%
Classified    3.0%    3.4%
Vendor        3.8%    2.8%
Others        0.2%    0.2%

• In the top 500 supercomputers, more than half of the clusters are owned by Industry
• That type of investment will drive efficient use and the necessary supporting infrastructure
• Over 41% of clusters are in research & academic environments
• The days of exclusive ownership and control are being replaced by sharing across disciplines, across university systems, research labs, states and even around the world
CERN – International Resource

• CERN – International Resource; International Collaboration
• Scientific partners around the world
• Investing in networking:
– Announced Monday, 9/19/2005: CERN will deploy the TeraScale E-Series family of switch/routers as the foundation of its new 2.4 Terabit per second (Tbps) high performance grid computing farm
– The TeraScale E-Series will connect more than 8,000 processors and storage devices
– Also provides the first intercontinental 10 Gigabit Ethernet WAN links in a production network
State & Regional Investment

• Networking Investment at all Layers
• Regional Optical Networks (RONs) are Growing
– States and universities investing in their own fiber and optical infrastructure to ensure affordable growth and abundant bandwidth
– Southern Light Rail
– I-Light Indiana
– LEARN – Texas
– Louisiana Optical Networking Initiative (LONI)
• Additional GigaPOP Layer 2/3 Services
• Costs are continuing to go down
– Ethernet port costs, for example, continuing to drop
– Densities for GigE and 10 GigE continuing to improve
– Lower cost technologies being used more
Flexibility of Bandwidth
• Lots of bandwidth, but “smart” use
• High-speed links dedicated to specific grids versus shared, flexible use of bandwidth
• Network links as a resource on the grid itself, to be shared, managed and allocated as needed (a minimal sketch of this idea follows below)
• Need flexible layers above the “dedicated lambdas”
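A minimal sketch of the "network links as a grid resource" idea (my illustration, not code from the talk; link names and capacities are hypothetical): bandwidth on a shared lambda is reserved and released by grid jobs the same way compute cycles or storage would be.

```python
"""Minimal sketch (not from the talk) of treating network links as grid
resources that can be reserved and released, like compute cycles or storage.
Link names and capacities below are hypothetical examples."""


class LinkPool:
    """Tracks how much of each link's capacity (in Gbps) is still free."""

    def __init__(self, links):
        self.capacity = dict(links)   # total Gbps per link
        self.available = dict(links)  # unreserved Gbps per link

    def reserve(self, link, gbps):
        """Reserve bandwidth on a link; fail if not enough is free."""
        if self.available.get(link, 0) < gbps:
            return False
        self.available[link] -= gbps
        return True

    def release(self, link, gbps):
        """Return previously reserved bandwidth to the pool."""
        self.available[link] = min(self.capacity[link],
                                   self.available[link] + gbps)


# Example: one 10 GigE lambda shared by two grid jobs instead of a
# dedicated link per job.
pool = LinkPool({"chicago-seattle": 10.0})
print(pool.reserve("chicago-seattle", 6.0))   # True  - bulk data transfer
print(pool.reserve("chicago-seattle", 6.0))   # False - would oversubscribe
pool.release("chicago-seattle", 6.0)
print(pool.reserve("chicago-seattle", 6.0))   # True  - capacity freed up
```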
New Architectures: HOPI
[Diagram: a HOPI node. The optical layer ties an NLR 10 GigE lambda (via NLR optical terminals) and a Regional Optical Network (RON) to an optical cross connect; the packet layer is a Force10 E600 switch/router connected to the Abilene Network 10 GigE backbone (Abilene core router) and to GigaPOPs, with control, measurement, support and out-of-band (OOB) management.]
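As a reading aid only (not part of the talk), the HOPI node connections described in the diagram can be written as a small tagged adjacency list; device names follow the diagram's labels, and the helper function is a hypothetical illustration.

```python
# Minimal sketch (not from the talk) of the HOPI node connectivity shown in
# the diagram, expressed as an adjacency list tagged by layer.

hopi_node_links = [
    # (endpoint A,                       endpoint B,                    layer)
    ("NLR 10 GigE lambda",               "Optical cross connect",       "optical"),
    ("Regional Optical Network (RON)",   "Optical cross connect",       "optical"),
    ("Optical cross connect",            "Force10 E600 switch/router",  "optical"),
    ("Force10 E600 switch/router",       "Abilene 10 GigE backbone",    "packet"),
    ("Force10 E600 switch/router",       "GigaPOP",                     "packet"),
]


def neighbors(node):
    """All devices directly attached to `node`, with the layer of each link."""
    return [(b if a == node else a, layer)
            for a, b, layer in hopi_node_links
            if node in (a, b)]


print(neighbors("Force10 E600 switch/router"))
```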
Ethernet is Key

• Local Area Network (LAN)
• Metropolitan Area Network (MAN)
– Metro Ethernet
– Ethernet Aggregation
• Wide Area Network (WAN)
– Carriers moving to Ethernet and IP services
– WAN PHY (Physical Interface) playing a role
• All the way down to CPU-to-CPU communication in supercomputers
– Ethernet adoption is continuing to grow
What Drives Grid / Cluster Topology?
Four Networking Requirements
[Diagram: a 5000-node Linux "compute" cluster and its surrounding networks: I/O to users over the campus backbone or WAN (reaching the users, user directory and applications), I/O to storage over 2 Gigabit fiber connections to 15 TByte storage units at 700 MBytes/sec with a SAN interconnect, and a management network (node-to-node communication).]
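As a further reading aid (my illustration, not from the slide), the four networking requirements in the diagram can be captured as a small configuration sketch; the bandwidth and sizing notes echo the diagram's labels, and everything else is a hypothetical placeholder.

```python
# Illustrative sketch (not from the slide): the four cluster networks from the
# diagram, captured as simple records. Notes echo the diagram's labels;
# anything else is a hypothetical placeholder.
from dataclasses import dataclass


@dataclass
class ClusterNetwork:
    purpose: str      # which of the four requirements this network serves
    technology: str   # link technology named in the diagram
    notes: str        # capacity or sizing hints taken from the diagram


networks = [
    ClusterNetwork("I/O to users", "Campus backbone or WAN",
                   "front door for the user community"),
    ClusterNetwork("I/O to storage", "2 Gigabit fiber to SAN",
                   "15 TByte arrays, ~700 MBytes/sec"),
    ClusterNetwork("Management", "Ethernet",
                   "reaches every one of the 5000 Linux compute nodes"),
    ClusterNetwork("Node-to-node (IPC)", "Cluster interconnect",
                   "inter-processor communication between compute nodes"),
]

for net in networks:
    print(f"{net.purpose:22} {net.technology:28} {net.notes}")
```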
Grids / Clusters

• System Interconnects
– Node-to-node: Inter-processor Communication (IPC)
– Management Network
– I/O to users, outside world (campus, LAN, WAN)
– Storage, servers & storage subsystems
• IPC Interconnect Technology – GigE now #1
– Top 500 Supercomputers
– Ethernet Rapid Growth
– Favored in Clusters
• Other System Interconnection
– Major reliance on Ethernet
Type          2004    2005
Ethernet      35.2%   42.4%
Myrinet       38.6%   28.2%
SP Switch      9.2%    9.0%
NUMAlink       3.4%    4.2%
Crossbar       4.6%    4.2%
Proprietary      –     3.4%
Infiniband     2.2%    3.2%
Quadrics       4.0%    2.6%
Other          2.8%    2.8%
Interconnects – Ethernet NICs

• Speedup methods (a rough estimate of the CPU cost they address follows below)
– Stateless offload (performance improvement without breaking the I/O stack; compatible with off-the-shelf OS TCP/IP)
– TOE - TCP Offload Engine
– OS bypass / eliminate context switching
– RDMA / remote DMA / eliminate payload copying
– iWARP / combination of TOE, OS Bypass, and RDMA
• Hot 10 GbE NIC vendors:
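A rough motivation for these offload techniques (my own back-of-envelope figure, using the commonly cited rule of thumb of about 1 Hz of CPU per bit per second of host-processed TCP; not a number from the talk):

```python
# Back-of-envelope sketch (not from the slides), using the commonly cited
# rule of thumb of roughly 1 Hz of CPU per 1 bit/s of TCP throughput when
# the host CPU does all protocol processing.


def cpu_ghz_needed(throughput_gbps, hz_per_bps=1.0):
    """Rough CPU clock (GHz) consumed by software TCP at a given rate."""
    return throughput_gbps * hz_per_bps


for rate in (1, 10):
    print(f"{rate:>2} Gbps line rate -> ~{cpu_ghz_needed(rate):.0f} GHz of CPU "
          "for TCP alone (hence TOE, OS bypass, RDMA, iWARP)")
```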
Management I/O
What Makes Sense?

• Management network is ALWAYS required
– Out-of-band, in-band, control & management
– CPU & memory utilization per node, system temperature, cooling
• Management has to touch each node – device density is important, helping to simplify topology
• If the cluster is in trouble, the management network is needed to fix it – must be reliable!
• With Ethernet, Management is FREE
User Gateway
What Makes Sense?

• Ethernet is ALWAYS the user gateway
– Dominant installed base & knowledge base
– End systems are connected via Ethernet
• Ethernet advantages (a quick check of the numbers follows this list)
– No distance limitation
– 5 microseconds per mile
– 7 Gbps over 20 km (541 GB of data in 10 min.)
– Data center or cluster core switch/router extends directly into the LAN
– Fewer devices, simplifying topology
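A quick arithmetic check of the figures above (my own calculation, not from the slide): moving 541 GB in 10 minutes needs a sustained rate of about 7.2 Gbps, in line with the quoted 7 Gbps, and at 5 microseconds per mile a 20 km span adds only around 62 microseconds of one-way propagation delay.

```python
# Back-of-envelope check (my own) of the Ethernet-advantages figures above.

data_gb = 541            # GB moved (decimal gigabytes)
minutes = 10
required_gbps = data_gb * 8 / (minutes * 60)
print(f"Sustained rate for {data_gb} GB in {minutes} min: {required_gbps:.1f} Gbps")

distance_km = 20
us_per_mile = 5          # propagation delay quoted on the slide
miles = distance_km / 1.609
print(f"One-way propagation over {distance_km} km: {miles * us_per_mile:.0f} microseconds")
```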
An Example Of Long Distance Sharing
NSF / DoE TeraGrid
[Diagram: TeraGrid sites connected by an extensible backplane network through LA and Chicago hubs (30 Gb/s site links, 40 Gb/s between the hubs): a 256-node compute-intensive site, an 814-node compute-intensive site, a 128-node data-intensive site, a 112-node visualization site, and a 55-node data collection/analysis site. Data sets stored at one site are moved to another site for computing.]
Role of Ethernet – Benefits
• Industry Standard (IEEE)
• Ubiquitous (everywhere) and proven technology
• Standard communication technology when the cluster talks to the rest of the world (Grid)
• Does not suffer from distance limitations
• Scales to 1000s and even 10,000s of nodes
• Allows for single-fabric design
• Easy to configure, manage, and administer for cluster environments (competing fabrics require cumbersome multi-chassis solutions & COMPLEX mapping)
• 53% yr/yr reduction in price/bit in 15 yrs (ref: Gartner; see the compounding check below)
• Almost all shipping servers include one or more 1000Base-TX NICs w/ TOE
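Reading the Gartner figure as a compound 53% annual decline (my interpretation; the slide does not spell this out), the cumulative effect over 15 years is dramatic:

```python
# Rough compounding check (my illustration, not from the slide) of the quoted
# "53% yr/yr reduction in price/bit in 15 yrs" (Gartner), read as a compound
# 53% annual decline.

annual_decline = 0.53          # 53% cheaper each year
years = 15

remaining = (1 - annual_decline) ** years
print(f"Price per bit after {years} years: {remaining:.2e} of the original")
print(f"That is roughly a {1 / remaining:,.0f}x reduction")
```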
Global Grid – Vision to Reality
Themes…
• Networks WILL keep up (or catch up) with needs of Grids
• Flexible use of Bandwidth will become integral to Grids
• Ethernet is key
Thank You
www.force10networks.com