Trends and Directions of Mass Storage in
the Scientific Computing Arena
CAS 2001
Gene Harano
National Center for Atmospheric Research
Scientific Computing Division
Vision
• How do we accomplish that vision?
• Handling large datasets – Analysis and Visualization
• Shared File Systems and Cache Pools
• Middleware and layering
• Management tools
• Emerging Technologies
• (To name a few)
Scientific Computing Division
CAS 2001 – October 30, 2001
Copyright © 2001 University Corporation for Atmospheric Research
Large Datasets
• The NCAR MSS was originally a tape-based archive.
• The NCAR MSS average file size is 35 MB (11 M files). It is small because of historical restrictions (single-volume datasets, model history files) and because a large share of files (25%) are under 1 MB (user backups).
• Single TB-sized files are common for visualization and analysis.
• Currently these large files are sliced up prior to landing in the archive.
• Access is generally sequential, with some random access.
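As a rough check on scale, the figures above imply a total archive volume. A minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and taking the 35 MB average and 11 M file count at face value; the variable names are illustrative only:

```python
# Back-of-the-envelope archive size from the slide's figures.
# Assumption: decimal units, 1 TB = 1,000,000 MB.
AVG_FILE_MB = 35             # average file size stated on the slide
FILE_COUNT = 11_000_000      # number of files stated on the slide
SMALL_FILE_FRACTION = 0.25   # share of files under 1 MB

total_tb = AVG_FILE_MB * FILE_COUNT / 1_000_000
small_files = int(FILE_COUNT * SMALL_FILE_FRACTION)

print(f"Estimated archive size: {total_tb:.0f} TB")
print(f"Files under 1 MB: {small_files:,}")
```

Under these assumptions the archive holds a few hundred terabytes, with millions of sub-megabyte files contributing very little of that volume.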
Large Datasets
• Are tape based archives obsolete?
• No, but there is a need to reevaluate the entire storage structure at NCAR.
• Cache pools
• Data warehouses, data sub-setting
• The NCAR MSS is being treated as a shared file system rather than an archive.
Shared File System
• Heterogeneous
• High-Performance
• High-Capacity
• Doesn’t yet exist.
[Diagram labels: Shared Data; Web/GRID servers; Programmatic; Command Line]
Cache Pools
• External to the archive
• Minimize archive activity
• Temporary data stays out of the archive
• Customized for a smaller set of associated data
• Internal to the archive
• Minimize tape activity
• Improve response time
• Federate and distribute
• Repackage small files for tape storage under system control
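Repackaging small files can be pictured as grouping them into larger bundles before they go to tape. The sketch below is one simple way to do that, not the NCAR MSS method; the function name, the 1 MB small-file limit, and the 100 MB bundle target are all assumptions made for illustration:

```python
def bundle_small_files(files, small_limit_mb=1.0, bundle_target_mb=100.0):
    """Group files smaller than small_limit_mb into bundles of at most
    bundle_target_mb each. `files` is an iterable of (name, size_mb) pairs.
    Both thresholds are hypothetical values chosen for this sketch."""
    bundles = []
    current, current_size = [], 0.0
    for name, size_mb in files:
        if size_mb >= small_limit_mb:
            continue  # larger files would be written to tape on their own
        # Close the current bundle if this file would push it past the target.
        if current and current_size + size_mb > bundle_target_mb:
            bundles.append(current)
            current, current_size = [], 0.0
        current.append(name)
        current_size += size_mb
    if current:
        bundles.append(current)
    return bundles


example = [("a", 0.5), ("b", 0.4), ("c", 50.0), ("d", 0.3)]
print(bundle_small_files(example, bundle_target_mb=1.0))
```

Writing one bundle instead of many tiny files reduces the number of tape positioning operations, which is the motivation given on the slide for doing this under system control.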
Terascale Modeling & Analysis
[Diagram labels: Advanced Research Computing System (IBM SP); MSS Proxy; Data analysis; GPFS Shared File System]
Terascale Analysis & Visualization
[Diagram labels: MSS Proxy; Vislab; Data analysis; Storage Area Network; Shared File System]
Data Provisioning & Access
[Diagram labels: MSS Proxy; Data Processor; DSS server; CDP/ESG; Unidata, DODS; Storage Area Network; Shared File System]
Internal Cache Pools
• NCAR MSS event log modeling (April 2000 – April 2001) – looking at tape activity
• 20 TB cache pool – can be federated and distributed
• 30 day average cache residency
• 70% reduction in tape read-backs
• Greatly enhanced response time
• Reduce the amount of tape resources or redefine their use.
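The kind of event log modeling described above can be sketched as a replay of read and write events against a simulated cache. This is a simplified illustration, not the model NCAR used: the event format `(day, op, file_id)`, the function name, and the rule that a file stays cached for a fixed number of days after its last access are assumptions, and cache capacity is ignored entirely.

```python
def tape_readback_reduction(events, residency_days=30):
    """Replay (day, op, file_id) events, where op is "read" or "write",
    and return the fraction of reads served from cache instead of tape.
    A file counts as cached if it was written or read within the last
    residency_days days. Capacity limits are not modeled."""
    last_touch = {}
    reads = 0
    tape_reads = 0
    # Sort by day only, so same-day events keep their logged order.
    for day, op, file_id in sorted(events, key=lambda e: e[0]):
        if op == "read":
            reads += 1
            touched = last_touch.get(file_id)
            if touched is None or day - touched > residency_days:
                tape_reads += 1  # cache miss: data must come back from tape
        last_touch[file_id] = day  # any access refreshes cache residency
    if reads == 0:
        return 0.0
    return 1.0 - tape_reads / reads


log = [
    (0, "write", "a"),
    (5, "read", "a"),    # hit: written 5 days earlier
    (50, "read", "a"),   # miss: last touched 45 days earlier
    (60, "read", "a"),   # hit: re-cached by the read on day 50
    (0, "read", "b"),    # miss: never seen before
]
print(tape_readback_reduction(log))
```

Without a cache every read is a tape read-back, so the returned fraction is directly comparable to the reduction figure quoted on the slide.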
Middleware and Layering
Role of an archive
• An archive performs two basic functions:
• Reliably storing data
• Returning data on demand
• Data analysis, data mining, data assimilation, distributed data servers, etc. are functions of middleware that sits on top of an archive; they should be implemented independently of the underlying archive.
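The layering argument can be illustrated with a two-method archive interface and a middleware function that depends only on that interface. This is a sketch under stated assumptions: the names `Archive`, `InMemoryArchive`, and `subset` are hypothetical and do not correspond to any NCAR MSS API.

```python
from typing import Protocol


class Archive(Protocol):
    """The two basic archive functions: store data, return it on demand."""

    def store(self, name: str, data: bytes) -> None: ...

    def retrieve(self, name: str) -> bytes: ...


class InMemoryArchive:
    """Stand-in archive for illustration. A tape-backed implementation
    would expose the same two calls."""

    def __init__(self) -> None:
        self._files = {}

    def store(self, name: str, data: bytes) -> None:
        self._files[name] = data

    def retrieve(self, name: str) -> bytes:
        return self._files[name]


def subset(archive: Archive, name: str, start: int, length: int) -> bytes:
    """Middleware example: data subsetting built only on retrieve()."""
    return archive.retrieve(name)[start:start + length]


archive = InMemoryArchive()
archive.store("history.dat", b"0123456789")
print(subset(archive, "history.dat", 2, 3))
```

Because `subset` never reaches past `retrieve`, the archive behind it could be swapped for a different implementation without touching the middleware.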
Middleware and Layering
• Separate archive functionality from:
• Visualization
• Data servers
• Data warehousing, data mining, data subsetting
• Web and Grid access
• Etc.
• Maximally enables the use of COTS
• Allows (transparent) replacement of components as needed
• Fill the gaps with custom software
Future Data
[Diagram labels, in extracted order: Web Services; Visualization; Data Analysis/Mining/Assimilation; Digital Libraries, Data Servers; Data Cataloging/Searching; Data Storage; File Services; Cache Pools; NCAR MSS Archive]
Management Tools
• There is a need for better user and system management tools as MSS capacity scales.
• How does a single user manage 1 million files?
• How does an MSS administrator dynamically tune a system, predict workloads, find and correct bottlenecks?
Management Tools
NCAR MSS tools
• Defining new roles
• Single ordinary user
• MSS superuser
• As users come and go, there is a need for:
• Project superuser (new)
• Division data administrator (new)
• Web-based metadata user tools
• List, search, catalog holdings – metadata mining
• Remove unwanted files
Management Tools
NCAR MSS tools
• From the system perspective – utilize data warehousing and data mining techniques
• System modeling using event logs.
• Capacity planning
• Identify bottlenecks
• Operational monitoring
• Track errors, identify trends (media problems)
• Intrusion detection
• Dynamic system tuning
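Tracking errors to spot media problems is one of the simpler mining tasks listed above. A minimal sketch, assuming a hypothetical log of `(volume_id, status)` pairs and an arbitrary threshold of three errors; neither reflects the actual NCAR MSS log format:

```python
from collections import Counter


def suspect_volumes(events, min_errors=3):
    """Return the tape volumes with at least min_errors logged errors.
    `events` is an iterable of (volume_id, status) pairs, where status
    is "ok" or "error". The format and threshold are assumptions."""
    errors = Counter(vol for vol, status in events if status == "error")
    return sorted(vol for vol, count in errors.items() if count >= min_errors)


event_log = [
    ("V1", "ok"), ("V1", "error"),
    ("V2", "error"), ("V2", "error"), ("V2", "error"),
    ("V3", "ok"),
]
print(suspect_volumes(event_log))
```

The same counting pattern extends to other keys, such as drive or library, when looking for bottlenecks rather than bad media.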
Emerging Technologies
• Data Path
• Tape
• Holographic Storage
• Probe-Based MEMS
• High-Density Rosetta (analog)
Data Path
• HIPPI in use today in the NCAR archive
• Fibre Channel will replace our HIPPI in the near term
• FC SAN for RAID Cache Pools
• FC SAN for Tape sharing
• Others
• iSCSI
• FC over IP
• InfiniBand
Tape
[Chart: Data Rate (MB/sec) versus Native Cartridge Capacity (GB) for linear and helical tape drives. Drives plotted: 3480/90, 3490 E, 9490 EE, 3570, 3570C, 3590, 3590E, 9840, 9840B, 9940, DLT-4000, DLT-7000, SDLT, AIT, AIT-2, Mammoth, Mammoth 2, DTF, SD-3, Accelis, Ultrium (2001). Roadmap annotations as extracted: 200GB 1Q02; 2H02; 500GB 2003; Opt 2003 1 TB; 1 TB, 60MB, 2004.]
Tape
• To be competitive with magnetic disk, magnetic tape capacity must grow 10x every 5 years.
• This is achieved by a combination of increased areal density and longer (and possibly wider) tape.
(from a storage vendor)
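The 10x-in-5-years target translates into a compound annual growth rate. A short worked calculation, assuming steady year-over-year growth (the slide does not say how the growth is distributed):

```python
import math

# Assumption: growth compounds evenly, so five equal annual factors multiply to 10.
annual_factor = 10 ** (1 / 5)
annual_growth_pct = (annual_factor - 1) * 100
doubling_years = math.log(2) / math.log(annual_factor)

print(f"Annual growth factor: {annual_factor:.3f}")          # about 1.585
print(f"Annual growth: {annual_growth_pct:.1f}% per year")   # about 58.5%
print(f"Doubling time: {doubling_years:.2f} years")          # about 1.5 years
```

In other words, the target amounts to capacity doubling roughly every year and a half.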
Tape
• RAIT (Redundant Array of Independent Tapes)
• Increased Performance
• Higher Reliability with the use of parity
• Higher single “volume” Capacity
• Large datasets on a single “volume”
• RAIL (Redundant Array of Independent Libraries)
• Greater total system capacity
• Improved response time
• These are resource intensive solutions – dedicated libraries and drives
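The reliability gain from parity in RAIT can be illustrated with a single XOR parity stripe, in the style of RAID. This is an assumption for illustration; the slide does not specify which parity scheme a RAIT product would use, and the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe written across three data tapes plus one parity tape.
data_tapes = [b"ATMO", b"SPHE", b"RIC!"]
parity_tape = xor_blocks(data_tapes)

# Suppose the second tape becomes unreadable: XOR the survivors with parity.
survivors = [data_tapes[0], data_tapes[2], parity_tape]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_tapes[1])
```

The example also shows the cost noted on the slide: every stripe ties up an extra tape and drive for parity.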
Holographic
• Large capacity – 10 GB in a single cubic centimeter (10 Gbits/in² for magnetic disk)
• High-speed – 2 Gigabits/sec
• Low power
• Billions of write cycles
Probe-Based MEMS
• MEMS – Micro-Electro-Mechanical Systems
• Probe-based storage arrays
• Dense
• Highly parallel to achieve high bandwidth
• Rectilinear 2D positioning
• Commercial devices in the next several years
HD Rosetta
• Product marketed by Norsam Technologies
• Developed at Los Alamos National Lab
• Analog
• Lifetime of 1000s of years
• Can be read back with only a microscope
• Stores text and images