STORAGE - Part III
A Tutorial
Designing Cluster Computers and High Performance Storage Architectures

HPC ASIA 2002, Bangalore, INDIA
December 16, 2002

By
Dheeraj Bhardwaj
N. Seetharama Krishna

Department of Computer Science & Engineering
Indian Institute of Technology, Delhi, INDIA
e-mail: [email protected]

Centre for Development of Advanced Computing
Pune University Campus, Pune, INDIA
e-mail: [email protected]
Dheeraj Bhardwaj <[email protected]>
N. Seetharama Krishna <[email protected]>
1
Acknowledgments
• All the contributors to Linux
• All the contributors to cluster technology
• All the contributors to the art and science of parallel computing
• Department of Computer Science & Engineering, IIT Delhi
• Centre for Development of Advanced Computing (C-DAC) and collaborators
Disclaimer
• The information and examples provided are based on a Red Hat Linux 7.2 installation on Intel PC platforms (our specific hardware configuration)
• Much of it should be applicable to other versions of Linux
• There is no warranty that the materials are error-free
• The authors will not be held responsible for any direct, indirect, special, incidental, or consequential damages related to any use of these materials
Outline
• Introduction
  – Brief history of storage technologies
  – Importance of storage subsystems
  – Recent requirements and developments
• Overview of storage components
• Overview of storage models
• File systems
• I/O
• Designing the storage architectures
• Discussions
Introduction
Brief History of Storage Technologies - Make 2-3 slides
Introduction
Importance of Storage Subsystems
• Greater demand from technical and commercial users for:
  – Higher capacity to meet growing demands
  – Higher performance to serve an increased user base
  – Very high performance to maintain the balance between compute and I/O in technical computing
Introduction
Importance of Storage Subsystems
• Greater demand from technical and commercial users for:
  – Meeting the manageability challenges of data, since a large user base demands:
    • Large capacity
    • Ever-increasing throughput
    • Ever-changing application configurations
Introduction
• Required capabilities:
  – Meet the demands of multi-teraflop compute power
  – Scalable from 1 TF needs to 10 TF needs
  – Network-centered architecture
  – Scalable in performance and capacity
  – Centralized backup, archive, and management
Introduction
• Required capabilities:
  – Built-in parallel operation
  – A design based on standard components
  – Multiple hierarchies and classes of service
  – Support for heterogeneous compute systems
  – Large file size support
  – Balanced architecture for mixed workloads
Introduction
Today’s Storage Challenges
• Managing the increasing volume of data
• Providing continuous access to information
• Adopting an evolving set of storage technologies
• Investment protection for legacy resources
• Multi-vendor interoperability issues
Introduction
Today’s Storage Challenges
Solution:
• An open, standards-based approach to storage management must be the rule, not the exception
• Open standards address key concerns:
  – Supporting changing requirements
  – Managing heterogeneous device topologies
  – Incorporating best-of-breed products to create a complete storage solution
Objective
• To create state-of-the-art scalable, enterprise-wide, interoperable, manageable, modular, and high performance storage, involving:
  – Study of existing technologies
  – Sizing the requirements: capacity and performance
  – An architecture that serves both the HPC and non-HPC user communities
  – Meeting mixed and ever-changing workload patterns
Objective
• To create state-of-the-art scalable, enterprise-wide, interoperable, manageable, modular, and high performance storage, involving:
  – A central storage facility accessible to authenticated in-house and remote users
  – A central backup facility covering the storage as well as local clients
  – A cost-effective storage solution
Outline
• Introduction
• Overview of storage components
  – Disks
  – Interfaces
  – Protocols (SCSI, FC-AL, iSCSI, FC-IP)
  – Secondary storage (RAID)
  – Tertiary storage (backup tapes)
• Overview of storage models
• File systems
• Parallel I/O
• Storage management software
• Security
• Designing the storage architectures
• Discussions
Storage Components - Disks
Please add at least one slide for one component
Storage Components - Interfaces
Please add at least one slide for one component
Storage Components - Protocols
Please add at least one slide for one component
Storage Components – Secondary Storage (RAID)
Please add at least one slide for one component
Storage Components – Tertiary Storage (Tape)
Please add at least one slide for one component
Outline
• Introduction
• Overview of storage components
• Overview of storage models
  – DAS
  – NAS
  – SAN
  – FAS (NAS & SAN co-exist)
• File systems
• Parallel I/O
• Storage management software
• Security
• Designing the storage architectures
• Discussions
Overview of Storage Models - DAS
Direct Attached Storage (DAS) Model
Direct Attached Storage
Please write features, advantages, and disadvantages
Network Attached Storage (NAS)

[Diagram: Network Attached Storage (NAS) model – servers on a TCP/IP LAN reach a NAS head over the NFS/CIFS protocol; behind the NAS head sits a RAID disk array, with a path to tape]
Network Attached Storage (NAS)
Please write features, advantages, and disadvantages
Storage Area Network (SAN)

[Diagram: Storage Area Network (SAN) model – servers on a TCP/IP LAN connect through a Fibre Channel switch to a RAID disk array, with a path to tape]
Storage Area Network (SAN)
Please write features, advantages, and disadvantages
Fiber Attached Storage (FAS)

[Diagram: Fiber Attached Storage (FAS) model – NAS and SAN co-exist: servers and a NAS device share the LAN (CIFS/NFS), while host adapters in the servers connect through a switch/hub to the disk array]
NAS and SAN co-exist
Justify NAS and SAN co-existence – pick up from our papers
Advantages of FAS
• Centralizing management to improve staff efficiency in monitoring and administration
• Making storage more readily available to any server on the network, increasing the value of stored information and the utility of the network itself
• Improving the availability, usefulness, and distribution of business applications
• Simplifying automation, and reducing IT operational costs and staffing requirements
• Providing greater visibility into the availability and performance of storage components
• Facilitating continuous-availability requirements
Outline
• Introduction
• Overview of storage components
• Overview of storage models
• File systems
  – Overview
  – File system calculations
  – VFS
  – CFS
  – PFS
  – HPSS
• Parallel I/O
• Storage management software
• Security
• Designing the storage architectures
• Discussions
File System Calculation
Aggregate bandwidth rates for one parallel job:
  Teraflops: 1
  Memory size: 700 GB
  I/O rates: 1.17 – 2 GB/s
• Assumptions:
  – For the lower memory estimate, it is assumed that an n-teraflop machine requires n·(3/4) TB of memory.
  – For the higher memory estimate, it is assumed that an n-teraflop machine requires (2/3)·n TB.
  – Reference: Statement of Work: SGS File System, Report, DOE National Nuclear Security Administration, USA, April 2001
Assumptions for File System Capacity Calculations
• The lower I/O rate estimate is based on the throughput needed to store one half of the smaller memory in five minutes:
  – (1/2 × 700 GB) / (5 × 60 s) = 1.17 GB/s
• The higher I/O rate estimate assumes that applications store one byte for every 500 floating point operations, a common rule of thumb:
  – 1 TF / 500 flops per byte = 2 GB/s
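The two rate estimates above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only (it assumes decimal units, 1 GB = 10^9 bytes) and is not part of the tutorial material:

```python
# Illustrative sketch of the two I/O-rate estimates (decimal units assumed).
mem_gb = 700.0                           # smaller memory estimate for a 1 TF machine

# Lower estimate: store half of the smaller memory in five minutes.
lower_gb_s = (0.5 * mem_gb) / (5 * 60)   # about 1.17 GB/s

# Higher estimate: one byte stored per 500 floating point operations.
peak_flops = 1e12                        # 1 teraflop
higher_gb_s = peak_flops / 500 / 1e9     # 2.0 GB/s

print(f"{lower_gb_s:.2f} - {higher_gb_s:.1f} GB/s")
```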
Assumptions for File System Capacity Calculations
• For the number of directories, it is assumed that every user will have approximately 5,000 directories:
  – 300 users × 5,000 directories = 1.5×10^6 directories
• For the number of files, a minimum of 25 and a maximum of 200,000 files per directory is assumed:
  – Minimum: 1.5×10^6 directories × 25 files = 37.5×10^6
  – Maximum: 1.5×10^6 directories × 2×10^5 files = 3×10^11
Assumptions for File System Capacity Calculations
• File system size is derived using the formula:
  – File system size = 1.25 × (7 to 18 × peak performance in TF) TB
• Minimum: 1.25 × (7 × 1 TF) = 8.75 TB
  – Minimum by file count: 37.5×10^6 files × 256 KB = 9.6 TB
• Maximum: 1.25 × (18 × 1 TF) = 22.5 TB
• For the number of devices per subsystem, 72 GB drives are assumed:
  – 8.75 TB / 72 GB ≈ 121 drives
  – 22.5 TB / 72 GB ≈ 312 drives
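The directory-count, file-count, and sizing arithmetic of the last two slides can be checked with a short script. This is an illustrative sketch under stated assumptions (decimal units, rounding drive counts down as the slides do):

```python
# Illustrative check of the capacity sizing above (1 TB = 1e12 B, 72 GB = 72e9 B).
users, dirs_per_user = 300, 5000
n_dirs = users * dirs_per_user               # 1.5e6 directories

files_min = n_dirs * 25                      # 37.5e6 files
files_max = n_dirs * 200_000                 # 3e11 files

peak_tf = 1
fs_min_tb = 1.25 * 7 * peak_tf               # 8.75 TB
fs_max_tb = 1.25 * 18 * peak_tf              # 22.5 TB
by_files_tb = files_min * 256e3 / 1e12       # 37.5e6 files x 256 KB = 9.6 TB

drives_min = int(fs_min_tb * 1e12 // 72e9)   # 121 x 72 GB drives
drives_max = int(fs_max_tb * 1e12 // 72e9)   # 312 drives
```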
In Summary
File System Capacities
  Teraflops: 1
  Number of users: 300
  Number of directories: 1.5×10^6
  Number of files: 37.5×10^6 to 3×10^11
  File system size: 8.75 – 22.5 TB
  Number of devices/subsystem: 121 – 312 (72 GB drives)
I/O Bandwidth
• The file system maximum sustained bandwidth can be obtained from the formula:
  – B_fs = N × B_drives × E
  where
    B_fs: maximum sustained bandwidth of the file system
    N: number of drives
    B_drives: sustained bandwidth of the slowest disk
    E: file system efficiency factor (0.85)
• Minimum B_fs = 121 × 100 MB/s × 0.85 = 10.28 GB/s
• Maximum B_fs = 312 × 100 MB/s × 0.85 = 26.52 GB/s
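As a quick check of the formula B_fs = N × B_drives × E, here is a small helper; the 100 MB/s per-drive figure and 0.85 efficiency factor are the slide's own assumptions, and decimal units are assumed:

```python
# Sustained file-system bandwidth per the slide's formula: Bfs = N * Bdrives * E.
def fs_bandwidth_gb_s(n_drives, drive_mb_s=100, efficiency=0.85):
    """Maximum sustained file-system bandwidth in GB/s (decimal units)."""
    return n_drives * drive_mb_s * efficiency / 1000

bfs_min = fs_bandwidth_gb_s(121)   # about 10.28 GB/s
bfs_max = fs_bandwidth_gb_s(312)   # about 26.52 GB/s
```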
Parallel File System
• PFS is designed as a client-server system with multiple I/O servers, each with disks/RAID attached. Each PFS file is striped across the disks on the I/O nodes.
• PFS also has a manager that handles only metadata operations, such as permission checking for file creation, open, close, and remove operations.
• Direct parallel I/O
  – All participating clients access the storage directly via requests to the parallel I/O servers.
  – This provides the maximum throughput, as it bypasses the overheads of intermediate file servers.
Cluster File System
Outline
• Introduction
• Overview of storage components
• Overview of storage models
• File systems
• Parallel I/O
  – Introduction
  – Parallel I/O approaches
  – (You can add some more)
• Storage management software
• Security
• Designing the storage architectures
• Discussions
Introduction
Parallel & Serial I/O: Write the basic differences
I/O Approaches
• The following five I/O approaches can be used for data distribution across the participating processors in a parallel program:
  – UNIX I/O on NFS
  – Parallel I/O on NFS
  – UNIX I/O with PFS support
  – Parallel I/O with PFS support
  – Direct parallel I/O
• UNIX I/O on NFS
  – The process with rank zero reads the input file using standard UNIX read, partitions it, and distributes it to the other processors.
  – The file is NFS-mounted only on the processor running the rank-zero process.
• Parallel I/O on NFS
  – All processors open the file concurrently and read their required data blocks by moving the offset pointer to the beginning of their corresponding data block in the input file.
  – The file is NFS-mounted from the server on all compute nodes.
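The "Parallel I/O on NFS" pattern above can be sketched as follows. Ranks are simulated sequentially here purely for illustration; in a real parallel program each rank would be a separate process (e.g. under MPI), each seeking to its own block of the shared file:

```python
# Sketch of offset-based parallel reading: every "rank" opens the shared file,
# seeks to the start of its own block, and reads only that block.
import os
import tempfile

data = bytes(range(16)) * 4              # a 64-byte stand-in for the input file
nprocs = 4
blk = len(data) // nprocs                # equal-sized block per rank

fd, path = tempfile.mkstemp()
os.write(fd, data)
os.close(fd)

parts = []
for rank in range(nprocs):               # each iteration plays one rank
    with open(path, "rb") as f:          # every rank opens the file itself
        f.seek(rank * blk)               # move to this rank's data block
        parts.append(f.read(blk))        # read only the local block

os.remove(path)
```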
I/O Approaches
• UNIX I/O with PFS support
– Define these terms
• Parallel I/O with PFS support
– Define these terms
• Direct Parallel I/O
Outline
• Introduction
• Overview of storage components
• Overview of storage models
• File systems
• Parallel I/O
• Storage management software
  – Overview
  – Features
  – Details of available software and their features
  – Etc.
• Security
• Designing the storage architectures
• Discussions
Storage Management Software
Please make a few slides --- say 8-10
Outline
• Introduction
• Overview of storage components
• Overview of storage models
• File systems
• Parallel I/O
• Storage management software
• Storage security
  – Overview
  – Other aspects
• Designing the storage architectures
• Discussions
Storage Security
Make some slides on security aspects of storage systems, e.g. Kerberos etc.
Outline
• Introduction
• Overview of storage components
• Overview of storage models
• File systems
• Parallel I/O
• Storage management software
• Security
• Design of storage architecture
  – Approach
  – Traditional
  – Ideal
  – Logical
  – Proposed
  – Etc.
• Discussions
Approach on Architecture
• Compute Nodes
• File Servers and File Systems
  – To support high bandwidth we have to use special-purpose file systems rather than traditional file systems such as UFS or CIFS.
  – Cluster File System (CFS) is a highly available, distributed, cache-coherent file system that allows a UFS file system to be accessed concurrently on multiple cluster nodes.
  – Parallel File System (PFS) is necessary to stripe a data file across multiple disks to increase the total I/O throughput.
  – A set of file servers configured with the cluster file system (CFS) and parallel file system (PFS) ensures high availability and high throughput of data to the users.
• Distribution Networks
  – As of today, there are two networks (standard Ethernet and proprietary) available to connect compute nodes to file servers for data transfer.
  – A third approach, extending the SAN directly to the compute nodes and avoiding file servers (direct parallel I/O), would reduce the network bottleneck but is an expensive option.

[Diagram: compute nodes C1–C70 on the PARAM System Area Network connect through a Gigabit switch to a cluster of file servers FS1–FS8 running the cluster file system and parallel file system; the file servers connect through a fibre switch to the storage array]
Design of Architecture
• We propose an architecture that is a mix of DAS, NAS, and SAN connected together to the high performance computing cluster.
• We have chosen direct attached storage, connected directly to the application server, to cater to its application development needs such as compilers, tools, and source codes.
• It is advisable to keep the application and data storage spaces separate, both to get the best performance and to avoid a single point of failure.
• To achieve high throughput, build a massively scalable storage system by combining multiple disk arrays, or use a single large array with a large number of FC-AL interfaces.
• To achieve multi-gigabyte throughput at the file system level, we have to size the storage array output to twice the requirement.
Design of Architecture
• We also have to size the number of disks so that they can deliver the desired sustained performance.
• Our approach of keeping application data on DAS, sequential users' data on NAS, and high performance computing data on SAN-attached storage automatically separates these data sets from one another.
• A highly automated tape library connected to the storage array, NAS, and DAS over the Fibre Channel interface, accompanied by a backup master server, helps take online backups in a server-free, LAN-free environment.
• This frees the file servers' CPUs from backup and restore jobs, letting them focus on serving the high performance computing users.
Scalability
• The quantities which should scale are:
  – Access
  – Storage capacity
  – SAN
  – I/O bandwidth
• Access: parallel access to multiple devices.
• Storage capacity: this can be addressed in two ways:
  – Big monolithic storage box: supports several hundred disks, but realizing a large disk array may have limitations in bandwidth scalability and reliability.
  – Multiple RAID arrays: connect them to the Fibre Channel SAN and configure them as a single storage unit to enhance capacity without affecting bandwidth.
• SAN: chassis-based storage directors that scale from eight ports to a few hundred ports, providing non-blocking, full-fledged scalability in the SAN.
• I/O bandwidth: a Parallel File System (PFS) that stripes a data file across the multiple disks in the array through the I/O nodes to increase the total I/O throughput.
Typical Storage Architecture

[Diagram: Typical storage architecture – Cluster A and Cluster B each sit on their own system area network with their own file system; both attach to a Storage Area Network that also hosts a backup/archive system with a tape system; NFS/CIFS servers bridge the storage to a LAN serving NFS/CIFS clients and visualization, with a WAN link to other sites]
Ideal Storage Architecture

[Diagram: Ideal storage architecture – Cluster A and Cluster B, each on its own system area network, share common file system servers (CFS/PFS) with GPFS over a Storage Area Network; NFS/CIFS servers and a backup/archive server with a tape system attach to a Gigabit LAN serving NFS/CIFS clients and visualization, with a WAN link]
Physical Storage Components Connectivity

[Diagram: Physical connectivity – the I/O storage nodes FS1–FS8 and management nodes M0–M3 (I/O spare, backup device, storage manager) connect to two 32-port switches (A and B); each switch fans out over redundant A/B paths to four storage controllers, each fronting a disk subsystem, and to the tape library]
Network Based Scalable High Performance Storage Architecture

[Diagram: PARAM 20000 compute nodes C1–C70 on the PARAM System Area Network connect over Gigabit and Fast/Trunk Ethernet to a cluster of file servers FS1–FS8 running the cluster file system, and to miscellaneous servers M0–M7; the file servers attach through a 2 GBps FC-AL switch on the Storage Area Network to the storage array (2 TB – 20 TB) and backup library (20 TB – 200 TB); a NAS server (1 TB – 3 TB) on Gigabit Ethernet holds project and MIS data; a router provides Internet access, and DAS serves local needs]

M0 – Scheduler
M1 – Spare Server
M2 – Developmental User Nodes
M3 – Storage Mgmt. Server
M4 – Visualization Server
M5 – Gateway & Authentication Server
M6 – Backup Server
M7 – Spare Server
FS1 – FS8 – File Servers
C1 – C70 – Compute Nodes
Outline
• Introduction
• Overview of storage components
• Overview of storage models
• File systems
• Parallel I/O
• Storage management software
• Security
• Designing the storage architectures
• Discussions
  – Suggested technologies
  – Future
  – Other aspects
  – Conclusion
Recommended Technologies
• Disks: min 72 GB, dual-port FC-AL, 10,000 RPM
• Protocol: SCSI
• Interface: FC-AL
• Storage connectivity: 2 Gb/s multi-path fibre switches
• Storage array: host-intelligence based, with a modular, linearly scalable architecture
• File system access: direct, PFS, and NFS v4 through a Gigabit network
• File system: POSIX-compliant (IEEE/ANSI 1003.x) cluster file system with PFS
• Backup: fibre tape libraries with HSM
• Compute node: access through NFS and PFS on Gigabit Ethernet
• Architecture: FAS based; a combination of DAS, NAS & SAN
Futuristic C-DAC Enterprise File System by 2005

[Diagram: a GRID-suitable architecture in which special purpose computers, visualization workstations, R&D project SMP/NUMA systems, PARAM 20000, and DB servers all share the enterprise file system]