Parallel IO for Cluster Computing

Tran, Van Hoai
Why does data matter?
The ultimate performance of a machine depends heavily on the quality of its input/output (I/O) operations
• People often say:
– "Let us assume that the data has been pre-loaded into the various processors"
– "Suppose that data can be sent to the processors in linear time"
– or make other false premises
Technology development
• Moore's law: the number of transistors on a chip doubles every 18 months or so
• Areal disk storage densities grow by 60 to 80% per year
• Disk access times have improved by less than 10% per year
A typical disk drive is about 10^5 times slower at performing a random access than the main memory of a computer
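A quick calculation shows how far apart these rates drift. The sketch below compounds the growth rates from this slide over an assumed ten-year window and checks the disk/memory gap using assumed round latencies (about 10 ms for a random disk access, about 100 ns for a main-memory access); those latencies are illustrative values, not figures from the slides.

```python
# Illustrative arithmetic for the technology trends above.
# Growth rates come from the slide; the latencies are ASSUMED typical values.

def compound(rate_per_year: float, years: int) -> float:
    """Total improvement factor after `years` of growth at `rate_per_year` (0.6 means 60%)."""
    return (1.0 + rate_per_year) ** years

YEARS = 10                            # assumed observation window
density_low = compound(0.60, YEARS)   # areal density growing at 60% per year
density_high = compound(0.80, YEARS)  # areal density growing at 80% per year
access_time = compound(0.10, YEARS)   # access time improving at (at most) 10% per year

DISK_RANDOM_ACCESS_S = 10e-3  # assumed ~10 ms seek + rotational latency
DRAM_ACCESS_S = 100e-9        # assumed ~100 ns main-memory access
gap = DISK_RANDOM_ACCESS_S / DRAM_ACCESS_S

if __name__ == "__main__":
    print(f"density gain over {YEARS} years: {density_low:.0f}x to {density_high:.0f}x")
    print(f"access-time gain over {YEARS} years: at most {access_time:.1f}x")
    print(f"disk/memory random-access gap: {gap:.0e}")
```

Over ten years, 60 to 80% annual growth compounds to roughly a 110x to 357x density gain, while a 10% annual improvement yields at most about 2.6x, so capacity outpaces access speed by a widening margin.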
Amdahl's law
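• Sequential I/O operations limit overall performance: however many processors are added, the speedup can never exceed 1 divided by the fraction of run time spent in sequential I/O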
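Amdahl's law makes the point quantitative: if a fraction s of the original run time is sequential (for example, I/O that has not been parallelized), the speedup on N processors is S(N) = 1 / (s + (1 - s)/N), which approaches but never exceeds 1/s. A minimal sketch, using an assumed 10% sequential I/O fraction:

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: S(N) = 1 / (s + (1 - s) / N), where s is the fraction of the
    original run time that cannot be parallelized (here: sequential I/O)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

if __name__ == "__main__":
    # Assumed example: 10% of the run time is spent in sequential I/O.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.10, n), 2))
    # The speedup stays below 1 / 0.10 = 10, however many processors are added.
```

With s = 0.10 the speedup is about 5.3 on 10 processors and about 9.2 on 100, and adding more processors only creeps toward the ceiling of 10.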
Main topics in Parallel IO
• File Systems and Parallel I/O for Clusters
• Data Distribution and Load Balancing in the Presence of I/O
Operations
• Novel Hardware and Software I/O Architectures
• Parallel Disk Models and Algorithms
• Parallel I/O Support for Databases
• I/O Performance Analysis: Resources and Tools for Benchmarking
• Drivers and Application Programming Interfaces
• Advances in Storage Technology
• Tools for Operating and Managing I/O Operations
• Compiler Techniques for High-Performance I/O Operations
• Language and Runtime Libraries
• Network Attached Storage and Storage Area Network
• Standards and Industrial Experiences with Massive Data Sets
Driver level
• Direct reads/writes in disk transfers
Driver level (2)
Bus activity with two concurrent reads
Driver level (3)
Rewriting the Linux SCSI driver to avoid one copy to memory
Direct access vs normal access
Throughput improved by 25%
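The slides obtain their gain by modifying the kernel's SCSI driver itself. On a stock Linux system, a related but different technique is available from user space: opening a file with O_DIRECT asks the kernel to move data between the device and the application's buffer without going through the page cache, which removes one memory copy. The sketch below assumes Linux, assumes a 4096-byte alignment requirement (the real requirement depends on the device and file system), and falls back to an ordinary buffered read where O_DIRECT is unavailable or rejected; the function names are hypothetical.

```python
import mmap
import os

BLOCK_SIZE = 4096  # ASSUMED alignment; O_DIRECT needs aligned buffer, offset and length

def _read_into(path: str, buf: mmap.mmap, flags: int) -> int:
    """Open path with the given flags and read into buf; returns bytes read."""
    fd = os.open(path, flags)
    try:
        return os.readv(fd, [buf])  # the kernel fills buf directly
    finally:
        os.close(fd)

def read_direct(path: str, nbytes: int) -> tuple[bytes, bool]:
    """Read up to nbytes from the start of path. Returns (data, used_o_direct)."""
    # Round the transfer length up to a whole number of blocks.
    size = max(BLOCK_SIZE, -(-nbytes // BLOCK_SIZE) * BLOCK_SIZE)
    with mmap.mmap(-1, size) as buf:  # anonymous mappings are page-aligned
        direct_flag = getattr(os, "O_DIRECT", 0)  # Linux-specific; absent on e.g. macOS
        if direct_flag:
            try:
                n = _read_into(path, buf, os.O_RDONLY | direct_flag)
                # Copied out only to hand the caller an ordinary bytes object.
                return bytes(buf[:min(n, nbytes)]), True
            except OSError:
                pass  # e.g. EINVAL: this file system does not support O_DIRECT
        n = _read_into(path, buf, os.O_RDONLY)  # ordinary read through the page cache
        return bytes(buf[:min(n, nbytes)]), False
```

The anonymous mmap supplies a page-aligned buffer and the length is rounded up to whole blocks because O_DIRECT rejects misaligned transfers; reads that run past the end of the file simply return fewer bytes.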
Parallel File System
• A parallel file system transparently stripes
data across multiple disks and I/O nodes
• It provides a global namespace, which simplifies file management and allows flexible access to files
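Striping can be pictured as dealing fixed-size stripe units of a file round-robin across the I/O nodes. The sketch below assumes that simple layout (real parallel file systems may distribute data differently), and its function names are hypothetical. It maps a logical byte offset to an I/O node and an offset in that node's local file, and splits one contiguous request into per-node pieces that can be serviced in parallel.

```python
def locate(offset: int, stripe_size: int, num_nodes: int) -> tuple[int, int]:
    """Map a logical file offset to (I/O node index, offset in that node's local file)."""
    stripe, within = divmod(offset, stripe_size)  # which stripe unit, and where inside it
    node = stripe % num_nodes                     # stripe units are dealt round-robin
    local_stripe = stripe // num_nodes            # units this node already holds before it
    return node, local_stripe * stripe_size + within

def split_request(offset: int, length: int, stripe_size: int,
                  num_nodes: int) -> list[tuple[int, int, int]]:
    """Split a contiguous request into pieces (node, local_offset, nbytes)."""
    pieces = []
    end = offset + length
    while offset < end:
        node, local = locate(offset, stripe_size, num_nodes)
        # A piece never crosses a stripe-unit boundary.
        nbytes = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((node, local, nbytes))
        offset += nbytes
    return pieces
```

For example, with a 64 KiB stripe size and 4 I/O nodes, a 200 KiB request starting at offset 100 KiB touches all four nodes: 28 KiB on node 1, 64 KiB each on nodes 2 and 3, and 44 KiB on node 0. The application still sees a single file in the global namespace; the split happens inside the file system.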
Parallel Virtual File System (PVFS)
• A user-level system that uses TCP and the existing local file system on each I/O node
• File data is striped across I/O nodes according to a user-specified layout
• Client-server architecture
– I/O daemon: runs on each I/O node and controls reading and writing for that node
– Single manager: a daemon that stores metadata and controls file operations
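The division of labor can be sketched as an in-process simulation under simplifying assumptions: one manager object holds each file's metadata and striping layout, one daemon object per I/O node stores that node's share of the data, and a client consults the manager for metadata and then exchanges file data directly with the per-node daemons. All class and field names are hypothetical, network communication (TCP in PVFS) is replaced by method calls, and each node's local file system is replaced by an in-memory byte array.

```python
from dataclasses import dataclass

@dataclass
class FileMeta:
    """Striping layout and size of one file (hypothetical field names)."""
    first_node: int   # index of the first I/O node used by the file
    node_count: int   # number of I/O nodes the file is striped across
    stripe_size: int  # bytes per stripe unit
    size: int = 0     # logical file size in bytes

class Manager:
    """Single metadata daemon: owns the namespace and layouts, never file data."""
    def __init__(self) -> None:
        self.files: dict[str, FileMeta] = {}

    def create(self, name: str, meta: FileMeta) -> None:
        if name in self.files:
            raise FileExistsError(name)
        self.files[name] = meta

    def lookup(self, name: str) -> FileMeta:
        return self.files[name]

    def grow(self, name: str, new_end: int) -> None:
        meta = self.files[name]
        meta.size = max(meta.size, new_end)

class IODaemon:
    """One per I/O node: reads and writes that node's share of each file."""
    def __init__(self) -> None:
        self.local: dict[str, bytearray] = {}  # stands in for the node's local file system

    def write(self, name: str, offset: int, data: bytes) -> None:
        buf = self.local.setdefault(name, bytearray())
        if len(buf) < offset + len(data):
            buf.extend(bytes(offset + len(data) - len(buf)))  # zero-fill any hole
        buf[offset:offset + len(data)] = data

    def read(self, name: str, offset: int, n: int) -> bytes:
        return bytes(self.local.get(name, bytearray())[offset:offset + n])

class Client:
    """Asks the manager for metadata, then talks to the I/O daemons directly."""
    def __init__(self, manager: Manager, daemons: list[IODaemon]) -> None:
        self.manager = manager
        self.daemons = daemons

    def _pieces(self, meta: FileMeta, offset: int, length: int):
        """Yield (daemon index, local offset, position in request, nbytes), round-robin."""
        pos = 0
        while pos < length:
            stripe, within = divmod(offset + pos, meta.stripe_size)
            node = (meta.first_node + stripe % meta.node_count) % len(self.daemons)
            local = (stripe // meta.node_count) * meta.stripe_size + within
            n = min(meta.stripe_size - within, length - pos)
            yield node, local, pos, n
            pos += n

    def write(self, name: str, offset: int, data: bytes) -> None:
        meta = self.manager.lookup(name)  # metadata request to the manager
        for node, local, pos, n in self._pieces(meta, offset, len(data)):
            self.daemons[node].write(name, local, data[pos:pos + n])  # data to I/O node
        self.manager.grow(name, offset + len(data))

    def read(self, name: str, offset: int, length: int) -> bytes:
        meta = self.manager.lookup(name)
        length = max(0, min(length, meta.size - offset))  # do not read past end of file
        out = bytearray()
        for node, local, _pos, n in self._pieces(meta, offset, length):
            out += self.daemons[node].read(name, local, n)
        return bytes(out)
```

In this sketch the manager answers only metadata requests and never handles file data, so reads and writes flow between the client and several I/O nodes at once.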
Parallel Virtual File System