The University of Sunderland Grid Computer


The University of Sunderland
Cluster Computer
IET Lecture by John Tindle
Northumbria Network, ICT Group
Monday 11 February 2008
Overview of talk







SRIF3 and Potential Vendors
General Requirements
Areas of Application
Development Team
Cluster Design
Cluster System Hardware + Software
Demonstrations
United Kingdom – Science Research Investment Fund (SRIF)

The Science Research Investment Fund
(SRIF) is a joint initiative by the Office of
Science and Technology (OST) and the
Department for Education and Skills (DfES).
The purpose of SRIF is to contribute to higher
education institutions' (HEIs) long-term
sustainable research strategies and address
past under-investment in research
infrastructure.
SRIF3




SRIF3 – 90% and UoS – 10%
Project duration about two years
Made operational by late December 2007
Heriot-Watt University – coordinator
Potential Grid Computer Vendors






Dell – selected vendor
CompuSys – SE England
Streamline – Midlands
Fujitsu – Manchester
ClusterVision – Netherlands
OCF – Sheffield
General requirements
General requirements






High performance general purpose computer
Built using standard components
Commodity off the shelf (COTS)
Low cost PC technology
Reuse existing skills - Ethernet
Easy to maintain - hopefully
Designed for Networking Experiments





Require flexible networking infrastructure
Modifiable under program control
Managed switch required
Unmanaged switch often employed in standard cluster systems
Fully connected programmable intranet
System Supports






Rate limiting
Quality of service (QoS)
Multiprotocol Label Switching (MPLS)
VLANs and VPNs
IPv4 and IPv6 supported in hardware
Programmable queue structures
Special requirements 1


Operation at normal room temperature
Typical existing systems require





a low air inlet temperature (< 5 °C)
a dedicated server room with air conditioning
Low acoustic noise output
Dual boot capability
Windows or Linux in any proportion
Special requirements 2

Concurrent processing, for example
   boxes with 75% of cores running Windows
   boxes with 25% of cores running Linux
CPU power control – 4 levels
High resolution displays for media and data visualisation
Advantages of design




Heat generated is not vented to the outside atmosphere
Air conditioning running costs are not incurred
Heat is used to heat the building
Compute nodes (height 2U) use relatively large diameter, low noise fans
Areas of application
Areas of application
1. Media systems – 3D rendering
2. Networking experiments
   MSc Network Systems – large cohort
3. Engineering computing
4. Numerical optimisation
5. Video streaming
6. IP Television
Application cont 1
7. Parallel distributed computing
8. Distributed databases
9. Remote teaching experiments
10. Semantic web
11. Search large image databases
12. Search engine development
13. Web based data analysis
Application cont 2
14. Computational fluid dynamics
15. Large scale data visualisation using high resolution colour computer graphics
UoS Cluster Development Team

From left to right

Kevin Ginty
Simon Stobart
John Tindle
Phil Irving
Matt Hinds

Note - all wearing Dell tee shirts




UoS Team
UoS Cluster Work Area
At last all up and running!
UoS Estates Department

Very good project work was completed by the UoS Estates Department


Electrical network design
Building air flow analysis




Computing Terraces
Heat dissipation
Finite element (FE) study and analysis
Work area refurbishment
Cluster Hardware
Cluster Hardware




The system has been built using
Dell compute nodes
Cisco networking components
Grid design contributions from both Dell and Cisco
Basic Building Block

Compute nodes
Dell PE2950 server
Height 2U
Two dual core processors
Four cores per box
RAM 8 GB, 2 GB per core

http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/





Compute Nodes




Network interface cards – 3 off
Local disk drives – 250 GB SATA II
The large amount of RAM facilitates virtual computing experiments
VMware Server and MS Virtual PC
Cisco 6509 switch





Cisco 6509 (1 off)
Cisco 720 supervisor engines (2 off)
Central network switch for the cluster
RSM – router switch module
6509 Provides






720 Gbps full duplex (4 off port cards)
Virtual LANs - VLAN
Virtual private networks - VPN
Link bandwidth throttling
Traffic prioritisation, QoS
Network experimentation
Cluster Intranet

The network has three buses
1. Data
2. IPC
3. IPMI
1. Data bus


User data bus
A normal data bus required for interprocessor communication between user applications
2. IPC Bus



Inter process communication (IPC)
“The Microsoft Windows operating system provides mechanisms for facilitating communications and data sharing between applications. Collectively, the activities enabled by these mechanisms are called interprocess communications (IPC). Some forms of IPC facilitate the division of labor among several specialized processes.”
IPC Bus




“Other forms of IPC facilitate the division of labor among computers on a network.”
Ref: Microsoft website
IPC is controlled by the OS
For example, IPC is used to transfer and install new disk images on compute nodes
Disk imaging is a complex operation
3. IPMI Bus

IPMI

The Intelligent Platform Management Interface (IPMI) specification defines a set of common interfaces to computer hardware and firmware which system administrators can use to monitor system health and manage the system.
Master Rack A







Linux and Microsoft
2 – PE2950 control nodes
5 – PE1950 web servers
Cisco Catalyst 6509
   2 * 720 supervisor engines
   4 * 48 port cards (192 ports)
Master Rack A cont

Compute nodes require

40*3 = 120 connections

Disk storage 1 – MD1000

http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/


Master rack resilient to mains failure
Power supply
   6 kVA APC UPS (hard wired, 24 Amp)
Master Rack A KVM Switch






Ethernet KVM switch
Keyboard, Video display, Mouse - KVM
Provides user access to the head nodes
Windows head node, named – “Paddy”
Linux head node, named - “Max”
Movie USCC MVI_6991.AVI
Rack B Infiniband



InfiniBand is a switched fabric communications link primarily used in high-performance computing.
Its features include quality of service and failover, and it is designed to be scalable.
The InfiniBand architecture specification defines a connection between processor nodes and high performance I/O nodes.
Infiniband Rack B




6 – PE2950 each with two HCAs
1 – Cisco 7000P router
Host channel adapter (HCA) link
http://157.228.27.155/website/CLUSTER-GRID/Ciscodocs1/HCA/

Infiniband

http://en.wikipedia.org/wiki/InfiniBand
Cisco Infiniband




Cisco 7000P
High speed bus – 10 Gbit/s
Low latency – under 1 microsecond
Infiniband 6 compute nodes


24 CPU cores
High speed serial communication
Infiniband



Many parallel channels
PCI Express bus (serial DMA)
Direct memory access (DMA)
General compute Rack C


11 – PE2950 compute nodes
Product details
Racks





A*1 - 2 control (+5 servers) GigE
B*1 - 6 Infiniband (overlay)
C*3 - 11 (33) GigE
N*1 - 1 (Cisco Netlab + VoIP)
Total compute nodes

2+6+33+1 = 42
Rack Layout
-CCBACNF C C B A C N F
 Future expansion – F
 KVM video - MVI_6994.AVI

Summary - Dell Server 2950







Number of nodes 40 + 1 (Linux) + 1 (Windows)
Number of compute nodes 40
Intel Xeon Woodcrest 2.66 GHz
Two dual core processors
GigE NICs – 3 off per server
RAM 8 GB, 2 GB per core
Disks 250 GB SATA II
Summary - cluster speedup






Compare time taken to complete a task
Time on cluster = 1 hour
Time using a single CPU = 160 hours, i.e. 160/24 ≈ 6.6 days (about one week) – see the worked figures below
Facility available for use by companies
“Software City” startup companies
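As a rough worked example, assuming the task spreads evenly over all 40 compute nodes (4 cores each, 160 cores in total):

Speedup S = T_single / T_cluster = 160 h / 1 h = 160
Efficiency E = S / p = 160 / 160 = 1

so the quoted times correspond to near-ideal linear scaling across the cluster's cores.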
Data storage

Master nodes via PERC5e to MD1000 using 15 x 500 GB SATA drives
Disk storage 7.5 TB
Linux – 7 disks
MS Windows Server 2003 HPC – 8 disks
MD1000 URL

http://157.228.27.155/website/CLUSTER-GRID/Dell-docs2/




Power



Total maximum load generated by the Dell cluster cabinets
Total load = 20,742 W (approximately 20.7 kW)
Values determined using Dell's integrated system design tool

Power and Noise
Web servers

PE1950
Height 1U
Five servers
Web services
Domain controller, DNS, DHCP, etc.

http://157.228.27.155/website/CLUSTER-GRID/Dell-docs1/




Access Workstations



Dell workstations (10 off)
Operating system – Windows XP Pro
HD displays LCD (4 off)





Size 32 inch wall mounted
Graphics NVS285 – 8*2 GPUs
Graphics NVS440 – 2*4 GPU
Graphics processor units
Support for HDTV
Block Diagram
University of Sunderland Cluster Computer (USCC)

Campus network and cluster gateway
   Web servers – 5 * PE1950 (3 * Win2003 Server, 2 * Linux), online support for users

Central switch – Cisco 6509
   2 * 720 supervisor engines, 720 Gb aggregate bandwidth (full duplex), 40 Gb per slot
   4 * 48 port line cards = 192 ports, 1 Gb Ethernet links (copper)
   Support in hardware for VPNs, QoS, MPLS, rate limiting, private VLANs, IPv4 and IPv6 routing

Intranet – 3 GigE LANs (data, control, monitor, spare)

Compute nodes – 42 * PE2950 (2 * head nodes Lin/Win, 40 compute nodes)
   2 CPUs/node, 2 cores/CPU, 4 cores/node
   RAM 8 GB and a 250 GB SATA disk per node; distributed storage 40 * 250 GB = 10 TB
   NAS 7.5 TB (8 * 500 GB Linux, 7 * 500 GB Windows)

Infiniband overlay – 7000P switch
   6 nodes, 2 * HCA/node, 10 Gbps links

Local access workstations – 10 * Dell PCs
   Visualisation of data, display area
   4 * 37 inch LCD flat screens
Movie USCC

MVI_6992.AVI
Cluster Software
Cluster Software



Compute Node Operating Systems
Scientific Linux (based on Red Hat)
MS Windows Server 2003

High performance computing - HPC
Scali

Scali Management


Scali is used to control the cluster





software to manage high performance cluster computers
start and stop processes, upload data/code and schedule tasks
Scali datasheet
http://www.scali.com/
http://157.228.27.155/website/CLUSTER-GRID/Scali/
Other software





Apache web services
Tomcat, Java server side programming (see the example after this list)
Compilers C++, Java
Servers FTPD
3D modelling and animation


Blender
Autodesk 3DS Max software
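As a small illustration of the “Java server side programming” item above, a minimal JSP page of the kind Tomcat serves; the file name and content are hypothetical, not part of the cluster's actual services:

<%-- hello.jsp: minimal JSP page; reports which cluster node served the request --%>
<%@ page import="java.net.InetAddress, java.util.Date" %>
<html>
  <body>
    <% String host = InetAddress.getLocalHost().getHostName(); %>
    <p>Served by node <%= host %> at <%= new Date() %></p>
  </body>
</html>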
Virtual Computing
Virtual Network Security Experiment – example


Virtual Network VMware Appliances
Components
(1) NAT router
(2) WinXP-SP2 – attacks FC5 across the network
(3) Network hub – interconnection
(4) Firewall – protection
(5) Fedora Core FC5 target system
Network Security Experiment
VMware host
   XP Pro SP2 – Eth0, NAT (VMnet8)
   SW2 – Red side, eth1, Ethernet 2
   FC5 – Green side, eth0, Ethernet; loads the Apache (httpd) web server
   NAT Firewall – forwards port 80 from Red to FC5’s IP
   HUB (VMnet4) – Eth0
Security Experiment





A total of 5 virtual networking devices using just one compute box
Port scanning attack (Nessus)
Intrusion detection (Snort)
Tunnelling using SSH and PuTTY
RAM required – 500K+ for each network component
Cisco Netlab





Cisco Netlab provides
   Remote access to network facilities for experimental purposes
Netlab is installed in the Network cabinet
Plus
   VoIP demonstration system for teaching purposes
Network Research
Current Research
Network Planning



Network Planning Research
Network model using OOD
Hybrid parallel search algorithm based upon features of
   Parallel genetic algorithm (GA)
   Particle swarm optimisation (PSO)
   Ring of communicating processes (see the sketch below)
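A minimal sketch of the ring-of-processes idea, assuming an island-model arrangement in which each worker evolves its own sub-population and migrates its best result to the next worker in the ring; the class and method names are illustrative, not the project's actual planning code:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative island-model ring: each worker "evolves" locally (stubbed here
// as random sampling) and migrates its best cost to its clockwise neighbour
// once per generation.
public class RingSearchSketch {
    public static void main(String[] args) throws InterruptedException {
        final int islands = 4, generations = 20;

        // One inbound queue per island acts as the ring link from its neighbour.
        List<BlockingQueue<Double>> links = new ArrayList<>();
        for (int i = 0; i < islands; i++) links.add(new ArrayBlockingQueue<>(generations));

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < islands; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                Random rnd = new Random(id);
                double best = Double.MAX_VALUE;                      // minimisation
                try {
                    for (int g = 0; g < generations; g++) {
                        // Stand-in for one GA/PSO step: evaluate a new candidate cost.
                        double candidate = rnd.nextDouble() * 100.0;
                        best = Math.min(best, candidate);
                        links.get((id + 1) % islands).put(best);     // migrate clockwise
                        best = Math.min(best, links.get(id).take()); // accept migrant
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("Island " + id + " best cost " + best);
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
    }
}

On the cluster itself the ring links would be network connections between compute nodes rather than in-process queues, but the migration pattern is the same.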
Network Planning Research




Web services
Server side programs - JSP
FTP daemon, URL objects, XML
Pan–Reif solver – based on Newton's method (see the iteration below)
Steve Turner, PhD student
Submission May 2008 – first to use the USCC
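For context, Pan–Reif style solvers build on the Newton (Newton–Schulz) iteration for matrix inversion; a standard statement of the update (not taken from the thesis itself):

X(k+1) = X(k) (2I − A X(k))

The iterate X(k) converges quadratically to A⁻¹ provided ‖I − A X(0)‖ < 1, and each step is just matrix multiplication, which maps naturally onto a parallel cluster.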
UoS Cluster Computer USCC
Hybrid GA
Telecom Network Planning
DSL for ISP
DSL Network Plan
Schematic Diagram
Numerical output from GA optimiser – PON equipment
Data visualisation – multidimensional data structure: location, time and service types
Demonstrations
1. IPTV
2. Java test program
Demonstration 1 - IPTV





IP television demonstration
IP – internet protocol
VideoLAN client – VLC
Number of servers and clients – 10
Video streams, standard definition – 4 to 5 Mbps
Multicasting – Class D addressing (see the receiver sketch below)
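A minimal sketch of how a receiving client joins a Class D multicast group, which is what triggers the IGMP membership messages mentioned on the next slide; the group address and port are illustrative assumptions, not the demonstration's actual settings:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Joins an illustrative multicast group and reads a few datagrams of the stream.
public class MulticastReceiverSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("239.1.1.1"); // Class D address (assumed)
        int port = 5004;                                        // assumed UDP port
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(group);                 // sends an IGMP membership report
            byte[] buf = new byte[2048];
            for (int i = 0; i < 10; i++) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);              // one chunk of the video stream
                System.out.println("Received " + packet.getLength() + " bytes");
            }
            socket.leaveGroup(group);                // IGMP leave
        }
    }
}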
IPTV

IGMP





Internet group management protocol
Video streams HD – 16 Mbps
HD only uses 1.6% of 1 Gbps
Rudolf Nureyev dancing
Six Five Special 1957


Don Lang and the Frantic Five
New dance demonstration - Bunny Hop
Demonstration 2






Java demonstration test program
Compute node processes – 40
Workstation server – 1
Communication via UDP
Graphical display on the local server of data sent from the compute nodes
Network configuration – star (a sketch of a node's send loop follows the star network diagram)
Star network
Cluster Demonstration Program (schematic): Node 1, Node 2, … Node 39, Node 40 connected to the Server Node; star and ring configurations shown
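A minimal sketch of what each compute node process might do in this demonstration, assuming it evaluates the equation shown on the configuration file slide and sends each value to the hub server over UDP; the port number and message format are illustrative:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

// Illustrative compute-node sender for the star demo: computes a damped
// sinusoid and sends each sample to the hub server over UDP.
public class StarDemoNodeSketch {
    public static void main(String[] args) throws Exception {
        InetAddress hub = InetAddress.getByName("192.168.1.50"); // hub server address from ipadd.txt
        int port = 9000;                                         // assumed port
        int nodeId = 1;
        double tau = 10.0, theta = 0.0;
        try (DatagramSocket socket = new DatagramSocket()) {
            for (int t = 0; t < 100; t++) {
                theta += 0.2;
                double val = 100 * (0.5 + Math.exp(-t / tau) * 0.5 * Math.sin(theta));
                byte[] payload = (nodeId + " " + val).getBytes();
                socket.send(new DatagramPacket(payload, payload.length, hub, port));
                Thread.sleep(100); // pace the updates for the bar graph display
            }
        }
    }
}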
Cluster configuration file

Description of file ipadd.txt (see the reading sketch below)

1              – Node id
192.168.1.50   – Hub server address
192.168.1.5    – Previous compute node
192.168.1.7    – Next compute node
192.168.1.51   – Hub2 (spare)

Equation computed by each node

double val = 100 * ( 0.5 + Math.exp(-t/tau) * 0.5 * Math.sin(theta)) ;

Screenshot of hub server bar graph display
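A minimal sketch of how a node might read ipadd.txt, assuming the five values appear one per line in the order listed above:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Illustrative reader for the per-node configuration file ipadd.txt.
public class IpAddConfigSketch {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("ipadd.txt"));
        int nodeId      = Integer.parseInt(lines.get(0).trim()); // node id
        String hub      = lines.get(1).trim();                   // hub server address
        String previous = lines.get(2).trim();                   // previous compute node
        String next     = lines.get(3).trim();                   // next compute node
        String hub2     = lines.get(4).trim();                   // spare hub
        System.out.println("Node " + nodeId + " reports to hub " + hub
                + " (ring neighbours " + previous + " / " + next + ", spare hub " + hub2 + ")");
    }
}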
USCC configuration

Single demo in a compute node
   Dirs 1+4 = 5 (top level + one per core)
All compute nodes
   40*5 = 200
Workstations 10
   10*200 = 2000
Ten demos
   10*2000 = 20,000 directories to set up
Java program to configure the cluster (a sketch follows)
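A minimal sketch of the kind of Java program that could create those directories, assuming a layout of demo / workstation / node / per-core sub-directories; the naming scheme is illustrative, not the actual configuration tool:

import java.io.File;

// Illustrative directory set-up matching the counts on the slide:
// 10 demos x 10 workstations x 40 compute nodes x (1 top level + 4 per-core) = 20,000.
public class ClusterDirSetupSketch {
    public static void main(String[] args) {
        int created = 0;
        for (int demo = 1; demo <= 10; demo++) {
            for (int ws = 1; ws <= 10; ws++) {
                for (int node = 1; node <= 40; node++) {
                    File top = new File("demos/demo" + demo + "/ws" + ws + "/node" + node);
                    if (top.mkdirs()) created++;                   // top level directory
                    for (int core = 1; core <= 4; core++) {
                        if (new File(top, "core" + core).mkdirs()) created++; // one per core
                    }
                }
            }
        }
        System.out.println("Created " + created + " directories");
    }
}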
UoS Cluster Computer
Inaugural Event
UoS Cluster Computer
Inaugural Event




Date: Thursday 24 April 2008
Time: 5.30pm
Venue: St Peter’s Campus
Three speakers (20 minutes each)



John MacIntyre - UoS
Robert Starmer - Cisco San Jose
TBA - Dell Computers
USCC Inaugural Event



Attendance is free
Anyone wishing to attend is asked to register beforehand to facilitate catering
Contact via email
[email protected]
The End


Thank you for your attention
Any questions?

Slides and further information available at the URL below

http://157.228.27.155/website/CLUSTER-GRID/
