Clusterix: National IPv6 Computing Facility in Poland

Artur Binczewski
Radosław Krzywania
Maciej Stroiński
Jan Węglarz
Agenda
• Clusterix Project
• PIONIER Network
• Clusterix Network Architecture
• Network as a resource
• Dynamic Computing Resources
Clusterix Project
• Initiated in 2003 by 12 Polish computing centers
• Objectives:
– To build a productive and efficient GRID environment
– To provide enhanced security for the resulting GRID infrastructure
– To introduce IPv6-based communication to GRID applications
– To create a scalable computing infrastructure with dynamic resource attachment
Clusterix Project
• 64-bit Intel computing nodes
• Over 800 processors with a total computing power of 4.4 TFLOPS
• Linux operating system (Debian distribution)
• IPv6 as the primary protocol (coexisting with IPv4)
• Communication over dedicated channels within the PIONIER network
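Here is a rough sanity check on the figures above. It assumes that 4.4 TFLOPS is the aggregate peak of all processors, which the slide does not state explicitly. Dividing by the processor count gives the average per-processor rate. Because the slide says "over 800", the result is an upper bound:

```latex
\frac{4.4\ \text{TFLOPS}}{800\ \text{processors}}
  = \frac{4.4 \times 10^{12}\ \text{FLOPS}}{800}
  = 5.5 \times 10^{9}\ \text{FLOPS}
  = 5.5\ \text{GFLOPS per processor (at most)}
```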
PIONIER network
• Polish Optical Internet – PIONIER
– Modern fiber-based network
– Connects 21 academic and research centres
– Over 5500 km of fiber planned (over 3500 km already in place)
– Built on DWDM infrastructure
– 10 Gbps capacity already available
PIONIER network
[Figure: map of the PIONIER network. Nodes: Gdańsk, Koszalin, Olsztyn, Szczecin, Bydgoszcz, Białystok, Toruń, Poznań, Warszawa, Zielona Góra, Łódź, Radom, Wrocław, Częstochowa, Kielce, Opole, Puławy, Lublin, Katowice, Rzeszów, Kraków, Bielsko-Biała, plus metropolitan area networks. Legend: PIONIER's fibers; 2 x 10 Gb/s (2 lambdas); 10 Gb/s (1 lambda); 1 Gb/s. External links: GÉANT 10+10 Gb/s; BASNET 34 Mb/s; CESNET, SANET 10 Gb/s; CBDF 10GE.]
Clusterix Network Architecture
• Communication to all clusters passes through a router/firewall
• Routing is based on the IPv6 protocol, with IPv4 kept for backward compatibility
• Applications and the Clusterix middleware are adapted to IPv6
• For security reasons, only outgoing connections to the Internet are permitted
• Two 1 Gbps VLANs are used to improve management of network traffic
– The communication VLAN is dedicated to message exchange between nodes
– The NFS VLAN is dedicated to file transfer
[Figure: cluster network architecture. Labelled elements: PIONIER core switch, 1 Gbps backbone traffic, Internet network, access router/firewall, access node, local cluster switch, communication & NFS VLANs, computing nodes, Clusterix storage element.]
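A toy stateful filter can illustrate the outbound-only rule. This is a minimal Python sketch, not the Clusterix firewall configuration. The names `Packet` and `OutboundOnlyFirewall` are hypothetical. The prefix `2001:db8:1::/48`, taken from the IPv6 documentation range, is also hypothetical. Connection timeouts and TCP state tracking are ignored.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address, IPv6Network

# Hypothetical cluster prefix from the IPv6 documentation range;
# the real Clusterix addressing plan is not given in the slides.
CLUSTER_PREFIX = IPv6Network("2001:db8:1::/48")


@dataclass(frozen=True)
class Packet:
    proto: str          # e.g. "tcp" or "udp"
    src: IPv6Address
    src_port: int
    dst: IPv6Address
    dst_port: int


class OutboundOnlyFirewall:
    """Toy stateful filter: only the cluster side may open new flows."""

    def __init__(self, inside: IPv6Network) -> None:
        self.inside = inside
        self.flows: set = set()  # flows opened from inside the cluster

    def allow(self, p: Packet) -> bool:
        src_in, dst_in = p.src in self.inside, p.dst in self.inside
        if src_in and not dst_in:
            # Outgoing connection to the Internet: permit it and remember the flow.
            self.flows.add((p.proto, p.src, p.src_port, p.dst, p.dst_port))
            return True
        if dst_in and not src_in:
            # Incoming packet: accept only if it answers a flow the cluster opened.
            return (p.proto, p.dst, p.dst_port, p.src, p.src_port) in self.flows
        # Intra-cluster traffic is allowed; transit between two outside hosts is not.
        return src_in and dst_in


fw = OutboundOnlyFirewall(CLUSTER_PREFIX)
node = IPv6Address("2001:db8:1::10")
remote = IPv6Address("2001:db8:ffff::1")
print(fw.allow(Packet("tcp", remote, 5555, node, 22)))   # unsolicited inbound: False
print(fw.allow(Packet("tcp", node, 40000, remote, 443)))  # outbound: True
print(fw.allow(Packet("tcp", remote, 443, node, 40000)))  # reply to it: True
```

A production router would express the same policy with connection tracking in the packet filter. The sketch only shows why replies to cluster-initiated connections still get through.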
Network as a resource
• Network management application
– Objectives and features
• Tracking and monitoring network status
• Performing measurements
• Locating failures
• Providing network statistics for GRID services
• Layer 3 QoS management
• Automatic configuration of measurement sessions
• Resilience to failures
Network as a resource – Measurements
• Measurement architecture
– Distributed two-level measurement agent mesh (backbone/cluster)
– Centralized control manager (multiple redundant instances)
– Switches are monitored via SNMP
– Reports are stored by the manager (and forwarded to a database)
– The IPv6 protocol and addressing scheme are used for measurements
[Figure: measurement architecture. Labelled elements: PIONIER backbone measurements, SNMP monitoring, measurement network manager, reports, computing cluster, local cluster measurements.]
Network as a resource – Architecture
• Manager architecture
– The GUI shows network status and configures the manager
– Backup managers improve failure recovery (active manager switching)
– External applications are allowed to retrieve various network statistics
– Device and agent management modules collect network data
– Statistics are stored in an external database (a short-term backup is kept in the manager)
[Figure: manager architecture. Labelled elements: system manager, system logic, GUI, external interfaces, database controller, redundancy controller, measurement agents manager, device manager, backup manager, database, external clients, external entities, system resources, devices, backbone measurements, local cluster measurements.]
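Here is a rough sketch of how the manager could split storage between an external database and its own short-term copy. It rests on several assumptions. An in-memory SQLite database stands in for the external database. The `StatisticsStore` class, its table layout and the `backup_size` parameter are hypothetical.

```python
import sqlite3
from collections import deque


class StatisticsStore:
    """Forwards statistics to a database and keeps a short in-memory backup."""

    def __init__(self, db: sqlite3.Connection, backup_size: int = 100) -> None:
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS stats (link TEXT, metric TEXT, ts REAL, value REAL)"
        )
        # Short-term copy kept inside the manager; oldest rows fall off first.
        self.recent = deque(maxlen=backup_size)

    def store(self, link: str, metric: str, ts: float, value: float) -> None:
        row = (link, metric, ts, value)
        self.recent.append(row)
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO stats VALUES (?, ?, ?, ?)", row)

    def latest(self, link: str, metric: str):
        """Answer from the in-manager backup without querying the database."""
        for r_link, r_metric, _, value in reversed(self.recent):
            if (r_link, r_metric) == (link, metric):
                return value
        return None

    def history(self, link: str, metric: str) -> list:
        """Full history, as an external application might retrieve it."""
        cur = self.db.execute(
            "SELECT value FROM stats WHERE link = ? AND metric = ? ORDER BY ts",
            (link, metric),
        )
        return [v for (v,) in cur.fetchall()]
```

`latest()` answers from the manager's own copy. `history()` stands in for the statistics that external applications retrieve from the database.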
Network as a resource – Protocol
• Active Measurement Protocol
– All agent types use the same communication protocol
– The first implementation was based on OWAMP
– One-way measurements were abandoned in favour of a round-trip measurement approach
– Further modifications were made because of variable message lengths and additional requirements
– The protocol supports both IPv6 and IPv4
– The measurement traffic pattern can be specified for more detailed network examination
– Network metrics:
  • RTT
  • Jitter
  • Packet loss
  • Duplicated packets
  • Packets out of order
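The metrics listed above can be derived from round-trip probe records. The following Python sketch is illustrative only and makes these assumptions:

- Each probe carries a sequence number.
- The agent logs send and receive timestamps.
- The name `probe_stats` is hypothetical.
- Jitter is the mean absolute difference between consecutive RTT samples. This is one common definition; the slides do not say which one Clusterix uses.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProbeStats:
    rtt_avg: float      # mean round-trip time, same unit as the timestamps
    jitter: float       # mean |RTT[i+1] - RTT[i]| over consecutive replies
    loss: float         # fraction of probes that never came back
    duplicates: int     # extra copies of an already-received reply
    out_of_order: int   # replies arriving after a higher sequence number


def probe_stats(sent: dict, replies: list) -> ProbeStats:
    """sent: {seq: send_time}; replies: [(seq, receive_time)] in arrival order."""
    seen = set()
    rtts = []
    duplicates = out_of_order = 0
    highest = -1
    for seq, t_recv in replies:
        if seq not in sent:
            continue  # stray packet, not one of our probes
        if seq in seen:
            duplicates += 1
            continue
        seen.add(seq)
        if seq < highest:
            out_of_order += 1
        else:
            highest = seq
        rtts.append(t_recv - sent[seq])
    jitter = mean(abs(b - a) for a, b in zip(rtts, rtts[1:])) if len(rtts) > 1 else 0.0
    return ProbeStats(
        rtt_avg=mean(rtts) if rtts else float("nan"),
        jitter=jitter,
        loss=(len(sent) - len(seen)) / len(sent) if sent else 0.0,
        duplicates=duplicates,
        out_of_order=out_of_order,
    )


# Timestamps in ms: probe 3 is lost, probe 1 arrives late and twice.
sent = {0: 0.0, 1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
replies = [(0, 5.0), (2, 27.0), (1, 16.0), (1, 17.0), (4, 45.0)]
print(probe_stats(sent, replies))
```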
Network as a resource – Monitoring
• Monitoring
– Core switches are monitored via SNMP to track:
  • Interface status
  • Maximum available capacity
  • Current link utilization
– SNMP views are used to improve device security
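Link utilization and remaining capacity are typically derived from two successive polls of interface octet counters. The sketch below assumes the standard IF-MIB objects:

- `ifInOctets`/`ifOutOctets` as Counter32.
- `ifHCInOctets`/`ifHCOutOctets` as Counter64.
- A known nominal interface speed.

The SNMP polling itself is left out, and the function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32   # ifInOctets / ifOutOctets (IF-MIB Counter32)
COUNTER64_MODULUS = 2 ** 64   # ifHCInOctets / ifHCOutOctets (IF-MIB Counter64)


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two polls, tolerating a single counter wrap."""
    return (curr - prev) % modulus


def link_utilization(prev: int, curr: int, interval_s: float, speed_bps: int,
                     modulus: int = COUNTER32_MODULUS) -> float:
    """Fraction of the link's nominal capacity used during the polling interval."""
    return counter_delta(prev, curr, modulus) * 8 / (interval_s * speed_bps)


def available_capacity_bps(utilization: float, speed_bps: int) -> float:
    """Remaining capacity, assuming speed_bps is the interface's nominal rate."""
    return max(0.0, 1.0 - utilization) * speed_bps


# 1 Gb/s link polled every 10 s; the Counter32 wrapped between the two samples.
u = link_utilization(4_294_000_000, 624_032_704, 10, 1_000_000_000)
print(u, available_capacity_bps(u, 1_000_000_000))
```

At 10 Gb/s a 32-bit octet counter wraps in about 3.4 s (2^32 × 8 bits / 10^10 bit/s). On links of that speed the 64-bit counters are the safer choice. `ifHighSpeed`, reported in Mb/s, is also needed there, because `ifSpeed` saturates at 4,294,967,295 bit/s.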
Network as a resource – Fail Safe
Regular operation
[Figure: regular operation. Labelled elements: manager, backup manager, synchronization data, measurement network.]
• Only one active manager is allowed (the election algorithm is based on the Bully algorithm)
• Required data are exchanged between the active and backup managers
• Measurement agents register with the active manager only
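The slides state only that the manager election is based on the Bully algorithm. In that algorithm, the manager with the highest identifier among the live ones becomes active. A minimal single-process simulation might look like this. The manager IDs and the `BullyElection` class are hypothetical, and real message exchange and timeouts are replaced by look-ups in an `alive` table.

```python
from typing import Dict, Optional


class BullyElection:
    """Single-process simulation of the Bully algorithm among manager instances.

    ELECTION / OK / COORDINATOR messages and their timeouts are replaced by
    direct look-ups in the `alive` table, so only the outcome is modelled.
    """

    def __init__(self, manager_ids) -> None:
        self.alive: Dict[int, bool] = {m: True for m in manager_ids}
        self.coordinator: Dict[int, Optional[int]] = {m: None for m in manager_ids}

    def fail(self, manager_id: int) -> None:
        self.alive[manager_id] = False

    def recover(self, manager_id: int) -> int:
        # A restarted manager calls an election at once and may take over again.
        self.alive[manager_id] = True
        return self.elect(manager_id)

    def elect(self, initiator: int) -> int:
        # ELECTION goes to every higher-ID manager; any live one answers OK and takes over.
        higher = [m for m, up in self.alive.items() if up and m > initiator]
        if higher:
            return self.elect(min(higher))
        # Nobody higher answered: the initiator wins and broadcasts COORDINATOR.
        for m, up in self.alive.items():
            if up:
                self.coordinator[m] = initiator
        return initiator


election = BullyElection([1, 2, 3])
print(election.elect(1))    # 3
election.fail(3)
print(election.elect(1))    # 2
print(election.recover(3))  # 3
```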
Network as a resource – Fail Safe
Failure event
• In case of failure, a new active manager is elected
• Agents do not register until the new active manager is elected
• Measurements are still performed, and results are temporarily stored on the agent side
• The newly elected manager recovers the system state and accepts agent registrations
• The system is ready to serve again
[Figure: manager failure followed by election of a new manager.]
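Agent-side behaviour during a failover can be sketched as a buffer that is flushed on re-registration. This is a hypothetical illustration. The `MeasurementAgent` class is an assumption, as is the callback used as the manager connection. The bounded buffer size is also assumed, since the slides do not say how much data an agent keeps or what happens when storage runs out.

```python
from collections import deque
from typing import Callable, Deque, Optional


class MeasurementAgent:
    """Keeps measuring while no manager is active and replays results later."""

    def __init__(self, max_buffered: int = 1000) -> None:
        self.sink: Optional[Callable[[dict], None]] = None
        # Bounded buffer: the oldest results are dropped if the outage lasts too long.
        self.pending: Deque[dict] = deque(maxlen=max_buffered)

    def manager_lost(self) -> None:
        self.sink = None

    def register(self, sink: Callable[[dict], None]) -> None:
        # Register with the newly elected manager, then flush stored results in order.
        self.sink = sink
        while self.pending:
            sink(self.pending.popleft())

    def report(self, result: dict) -> None:
        if self.sink is None:
            self.pending.append(result)  # no active manager: store on the agent side
        else:
            self.sink(result)


old_manager, new_manager = [], []
agent = MeasurementAgent(max_buffered=2)
agent.register(old_manager.append)
agent.report({"rtt_ms": 1.2})
agent.manager_lost()
for rtt in (1.3, 1.4, 1.5):
    agent.report({"rtt_ms": rtt})
agent.register(new_manager.append)
print(new_manager)  # [{'rtt_ms': 1.4}, {'rtt_ms': 1.5}]
```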
Network as a resource – GUI
• GUI
– Provides a view of network status
– Gives access to statistics
– Simplifies network troubleshooting
– Allows configuration of measurement sessions
– Useful for topology browsing
Dynamic Computing Resources
Dynamic Computing Resources – Motivation
• External clusters can be easily attached to the Clusterix infrastructure in order to:
– Increase computing power with new clusters
– Utilize external clusters at night or during idle periods
– Make the Clusterix infrastructure scalable
Dynamic Computing Resources – Architecture
• Dynamic cluster attachment:
– Requirements need to be checked against new clusters:
  • Installed software
  • SSL certificates
– Communication goes through the router/firewall
– The Network Management System will automatically discover new resources
– A new cluster can then provide computing power on a regular basis
[Figure: dynamic resource attachment. Labelled elements: regular cluster, dynamic resources, local switch, router/firewall, PIONIER backbone switch, Internet.]
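The requirement check could be expressed as a simple validation function. This sketch is hypothetical: the slides name only installed software and SSL certificates. The package names in `REQUIRED_SOFTWARE`, the `CandidateCluster` fields and the assumption that certificate trust is verified elsewhere are all illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Set

# Hypothetical requirement list: the slides do not say which packages are checked.
REQUIRED_SOFTWARE = {"grid-middleware", "batch-system"}


@dataclass
class CandidateCluster:
    name: str
    installed: Set[str]
    cert_trusted: bool        # certificate chains to the grid's CA (verified elsewhere)
    cert_not_after: datetime  # certificate expiry date


def attachment_problems(cluster: CandidateCluster, now: datetime) -> List[str]:
    """Return the reasons a cluster cannot be attached yet (empty list means OK)."""
    problems = []
    missing = REQUIRED_SOFTWARE - cluster.installed
    if missing:
        problems.append("missing software: " + ", ".join(sorted(missing)))
    if not cluster.cert_trusted:
        problems.append("certificate not issued by a trusted CA")
    if cluster.cert_not_after <= now:
        problems.append("certificate expired")
    return problems
```

Returning a list of reasons, rather than a single boolean, lets an operator see everything that must be fixed before the cluster is admitted behind the router/firewall.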
Summary
• Fast interconnection of computing centers through PIONIER
• The IPv6 protocol is introduced into the GRID environment
• Failure-resistant network monitoring system
• The network is used as a regular GRID resource
• Dynamic architecture allows easy upgrades of computing power
Thank you for your attention!
Visit http://www.clusterix.pcz.pl