June_2006_paris_pillons_xavier_windows_compute.pps


Windows Compute Cluster Server
Overview and Update
Paris OpenFabrics Workshop 2006
Xavier Pillons – [email protected]
Principal Consultant
Microsoft France
Top 500 Supercomputer Trends
• Clusters over 70%
• Industry usage rising
• GigE is gaining (50% of systems)
• x86 is leading (Pentium 41%, EM64T 16%, Opteron 11%)
Windows Compute Cluster Server 2003
• Faster time-to-insight through simplified cluster deployment, job submission and status monitoring
• Better integration with existing Windows infrastructure, allowing customers to leverage existing technology and skill-sets
• Familiar development environment allows developers to write parallel applications from within the powerful Visual Studio IDE
Windows Compute Cluster Server 2003 = Compute Cluster Edition + Compute Cluster Pack
Compute Cluster Edition
• Support for high-performance hardware (x64 architecture)
• RDMA support for high-performance interconnects (Gigabit Ethernet, InfiniBand, Myrinet, and others)
Compute Cluster Pack
• Support for the MPI2 industry standard
• Integrated job scheduler
• Cluster resource management tools
CCS Key Features
Node Deployment and Administration
Task-based configuration for head and compute nodes
UI and command line-based node management
Monitoring with Performance Monitor (Perfmon), Microsoft Operations Manager
(MOM), Server Performance Advisor (SPA), & 3rd-party tools
Integration with existing Windows and management infrastructure
Integrates with Active Directory, Windows security technologies, management,
and deployment tools
Extensible job scheduler
3rd-party extensibility at job submission and/or job assignment
Submit jobs from command line, UI, or directly from applications
Simple job management, similar to print queue management
Secure MPI
User credentials secured in job scheduler and compute nodes
Standardized MPI stack
Microsoft provided stack reduces application/MPI incompatibility issues
Integrated Development Environment
OpenMP Support in Visual Studio, Standard Edition
Parallel Debugger in Visual Studio, Professional Edition
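The "simple job management, similar to print queue management" idea above can be sketched as a tiny queue model. This is an illustrative sketch only, not the CCS scheduler API; the class and method names are invented for this example.

```python
from collections import deque

class JobQueue:
    """Toy job queue illustrating print-queue-style job management.
    Hypothetical names; not the CCS scheduler API."""

    def __init__(self):
        self._queue = deque()       # pending (job_id, command) pairs
        self._next_id = 1
        self.status = {}            # job_id -> "Queued" | "Finished" | "Cancelled"

    def submit(self, command):
        """Submit a job, as from the command line, UI, or an application."""
        job_id = self._next_id
        self._next_id += 1
        self._queue.append((job_id, command))
        self.status[job_id] = "Queued"
        return job_id

    def cancel(self, job_id):
        """Cancel a queued job, like deleting a document from a print queue."""
        self._queue = deque(j for j in self._queue if j[0] != job_id)
        self.status[job_id] = "Cancelled"

    def run_next(self):
        """Dequeue the next runnable job (a real scheduler would also match
        jobs to free compute nodes and enforce policy)."""
        job_id, command = self._queue.popleft()
        self.status[job_id] = "Finished"
        return job_id, command

q = JobQueue()
a = q.submit("myapp.exe /input:data1")
b = q.submit("myapp.exe /input:data2")
q.cancel(a)
print(q.run_next())   # prints (2, 'myapp.exe /input:data2')
```

Monitoring a queue of jobs with per-job status is the same mental model an administrator uses for a print queue, which is the point of the slide.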
Windows Compute Cluster Server 2003
(Diagram: the head node, joined to Active Directory and backed by a database/file server, provides job management, cluster management, scheduling, and resource management. Users (e.g. Domain\UserA) submit jobs from a desktop application, the Job Manager UI, or the command line; administrators issue management tasks, policy, and reports from the admin console or command line. Compute nodes run a node manager for job execution, and user applications communicate over MPI across a high-speed, low-latency interconnect.)
End-To-End Security
(Diagram: the client authenticates with Kerberos and passes its credential to the scheduler over a secure channel; the scheduler protects it with the Data Protection API and stores job data in MSDE. When a task is scheduled, the credential travels over a secure channel to the node manager, which logs on as the user to spawn the task; the LSA performs automatic Kerberos ticket renewal, and the logon token gives the task access to data on the DB/FS.)
Typical Cluster Topology
(Diagram: the head node (job scheduler, RIS, NAT, admin/user console) bridges the public network and the cluster's private network, with an optional dedicated MPI network for message traffic. Each compute node runs the node manager, MPI, and management agents. The cluster plugs into the existing corporate IT infrastructure: Active Directory, DNS, DHCP, Windows Update, monitoring (MOM or 3rd party), and systems management (SMS or 3rd party).)
Job/Task Conceptual Model
Serial job: a single task running one process
Parallel MPI job: a single task spanning multiple processes that communicate via IPC
Parameter sweep job: many independent tasks, each running its own process
Task flow job: multiple tasks executed in dependency order
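The last two job types above can be sketched concretely: a parameter sweep expands one command template into many independent tasks, while a task flow runs tasks in dependency order. A minimal sketch, assuming made-up command names; this is not the CCS job model API.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Parameter sweep job: one independent task per parameter value.
def expand_sweep(command_template, start, end, step=1):
    """Expand a templated command into independent per-parameter tasks."""
    return [command_template.format(i=i) for i in range(start, end + 1, step)]

tasks = expand_sweep("render.exe /frame:{i}", 1, 4)
print(tasks)  # four independent tasks, runnable in any order

# Task flow job: tasks run in dependency order (a small DAG).
# Each key maps to the tasks that must finish before it may start.
flow = {
    "split": [],
    "work1": ["split"],
    "work2": ["split"],
    "merge": ["work1", "work2"],
}
order = list(TopologicalSorter(flow).static_order())
print(order)  # "split" first, "merge" last
```

A scheduler executing the task-flow job would dispatch tasks in any order consistent with this topological sort, running "work1" and "work2" concurrently on free nodes.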
Compute Cluster Server’s
Developer Environment
Compute Cluster Server’s Scheduler software
Programmatic job submission/control
Compute Cluster Server’s MPI software
Derived from Argonne National Lab’s MPI-2 implementation
(MPICH2)
MS MPI consists of 2 parts
For ISVs: full-featured API of 160+ functions
For users: command-line (mpiexec) or GUI tool to launch jobs
Why did the MS HPC team choose MPI?
MPI has emerged as the de-facto standard for parallel programming
Visual Studio 2005
New parallel debugger!
MPI & OpenMP support
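The send/receive model at the heart of MPI can be illustrated without an MPI installation. The sketch below uses Python's standard library to mimic what a blocking send and receive between two ranks does; it is a conceptual analogy, not MS-MPI or any MPI binding.

```python
from multiprocessing import Process, Pipe

def worker(conn):
    """'Rank 1': block until a message arrives, compute, send the result back.
    conn.recv()/conn.send() play the role of MPI_Recv/MPI_Send."""
    data = conn.recv()
    conn.send(sum(data))
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()            # the 'interconnect' between two ranks
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])         # analogous to MPI_Send to rank 1
    print(parent.recv())              # analogous to MPI_Recv; prints 10
    p.join()
```

A real MPI program would instead be launched as N identical processes by mpiexec, each discovering its rank at startup; the point here is only the message-passing semantics.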
Proliferation of MPI libraries
(Diagram: each application is built against one of many MPI implementations (MPICH, LAM, vendor stacks 1 through N), which in turn sit on multiple interconnects (GigE, Myrinet, InfiniBand), OS distributions (Red Hat, SuSE), and kernel versions, producing a combinatorial support matrix.)
Customers as integrators
(Diagram: three applications, each tied to its own combination of MPI implementation (MPICH, LAM, vendor stacks), interconnect (GigE, Myrinet, InfiniBand), and distribution/kernel version (Red Hat, SuSE); the customer must integrate and maintain every one of these combinations.)
IT Manager’s Challenges
Which kernel versions am I running?
If application X has an MPI error, which vendor should I call for
support?
If application X drops Kernel version Y support, which nodes do I
need to upgrade?
If I remove a node for maintenance, what applications won’t be able
to run anymore?
CCS brings back the sanity
Leverage Winsock Direct for best
performance (latency & throughput)
and CPU efficiency.
Increased Flexibility: Users can
upgrade their network interconnect with
no changes to the application or MPI
stack.
Increased Utilization: All applications
on the cluster benefit from the faster
network…not just MPI applications.
(Diagram: a single stack: Application over MS-MPI over Winsock Direct over any interconnect, on Windows Server 2003.)
Targeting Industry Standard
Interconnect Fabrics:
Gigabit Ethernet to minimize costs for a fast network
InfiniBand for latency-sensitive and bandwidth-intensive applications
MS-MPI Leverages Winsock Direct
(Diagram: in user mode, the HPC application calls MPI through the WinSock DLL, and the Winsock switch routes traffic based on sub-net. RDMA-capable paths (InfiniBand or GigE with RDMA) go through an IHV WinSock provider DLL, a verbs-based user API, and a user host channel adapter driver that manages hardware resources such as send and receive queues in user space. Plain Ethernet traffic takes the kernel path instead: TCP/IP over NDIS with a GigE or IPoIB miniport, the virtual bus driver, a verbs-based kernel API, and the host channel adapter driver, down to the networking hardware. The diagram distinguishes OS components from IHV-provided components.)
Networking Performance Continuum
(Diagram: a continuum of application characteristics against networking-offload availability. Today, legacy and enhanced sockets applications run over either a traditional layer-2 NIC or an RDMA-enabled NIC supporting WSD; in a future Windows Server release, RDMA-aware sockets applications will run over RDMA-enabled NICs supporting RDMA Chimney.)
WinIB
Mellanox InfiniBand software stack for Windows
Based on OpenFabrics development
InfiniBand HCA verbs driver and Access Layer
InfiniBand subnet management
IP-over-InfiniBand (IPoIB) driver
SDP driver
WinSock Direct Driver (WSD)
SCSI RDMA Protocol Driver (SRP)
Windows Server 2003, Windows Compute
Cluster Server 2003, Windows Server “Longhorn”*
* WinIB on Windows XP SP2 is supported by Mellanox – It is unsupported by Microsoft
Win IB Software Stack
(Diagram: applications sit on Winsock and MPI2*. In user mode, the socket switch directs traffic to the WSD SAN provider or to the SDP WinSock provider (SDP SPI); management tools, the Access Layer library, and the verbs provider library sit alongside. In kernel mode, NDIS hosts the IPoIB miniport and StorPort hosts the SDP and SRP miniports, all over the Access Layer, the verbs provider driver, and the HCA hardware; RDMA paths bypass the kernel TCP/UDP/ICMP/IP stack. Components are grouped as Windows, Win IB, or hardware. * MPI2 ships with Windows Compute Cluster Server 2003.)
SDP vs. WSD
| | Sockets Direct Protocol (SDP) | Windows Sockets Direct (WSD) |
|---|---|---|
| API | Winsock 2.x, POSIX/BSD-like API | SAN / Winsock Direct |
| Wire protocol specification | Standard | Microsoft proprietary |
| Wire protocol interoperability | Any OS that conforms to the SDP specification | |
| WHQL | None | Windows Server 2003 SP1, Windows Server code name "Longhorn" |
| OS supported | Windows Server code name "Longhorn" * | Windows Server 2003, Windows Server code name "Longhorn" |
| IHV module | WinSock service provider library, SDP kernel module | SAN provider library |
| Implementation domain | Mostly kernel mode | Mostly user mode |
* SDP on Windows XP SP2 and Windows Server 2003 SP1 is supported by Mellanox.
It is unsupported by Microsoft.
WHQL for InfiniBand
Background
Driven by Windows
Networking Team
Collaborated with
OpenIB
Available since mid-May 2006
Details
A test suite for WSD
providers and IP over
IB Miniport drivers
Includes functional tests only (no code coverage)
Signature covers
networking only (no
storage)
Partners
Resources
Main Windows HPC Page
http://www.microsoft.com/hpc
Windows HPC Community
http://www.windowshpc.net
Scalable Networking
http://www.microsoft.com/technet/itsolutions/network/snp/default.
mspx
Download WinIB
http://windows.openib.org/downloads/binaries/
OpenFabrics InfiniBand Windows drivers development –
sign up to contribute
http://windows.openib.org/openib/contribute.aspx
Questions?
© 2006 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.