GTS SAN Usage: A UNIX SysAdmin’s View of How A SAN Works


Storage Area Network Usage
A UNIX SysAdmin’s View of How A SAN Works
Disk Storage
Embedded
  Internal Disks within the System Chassis
Directly Attached
  External Chassis of Disks connected to a Server via a Cable
Directly Attached Shared
  External Chassis connected to more than one Server via a Cable
Networked Storage
  NAS, SAN, others
Disk Storage – 2000-2004

Type     Bus Speed  Distance   Cable Pins
ATA      100 MB/s   18 inches  40
SCSI     320 MB/s   12 m       68 or 80
FC       400 MB/s   10 km      4
SATA-II  300 MB/s   6 m        22
SAS      300 MB/s   10 m       22
Deficiencies of Direct Connect Storage
Single System Bears Entire Cost of Storage
  Small Server in an EMC Shop
  Large Server cannot easily share its unused storage
Manageability
  Fragmented and Isolated
Scalability
  Limited
  What happens when you run out of peripheral bus slots?
Availability
  “SCSI Bus Reset”
  Failover is a complicated add-on, if available at all
DASD
Direct Access Storage Device
  They still call it this in an IBM Mainframe Shop
Basic Limits of Disk Storage Recognized

Latency
  Rotation Speed of the disk
Seek Time
  Radial Movement of the Read/Write Heads
Buffer Sizes
  Stop sending me data, I can’t write fast enough!
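
As a rough worked example (mine, not from the slides), average rotational latency is the time for half a revolution:

```python
# Average rotational latency: on average, half a revolution passes
# under the head before the target sector arrives.
for rpm in (5400, 7200, 10_000, 15_000):
    latency_ms = (60 / rpm) / 2 * 1000
    print(f"{rpm:>6} RPM -> {latency_ms:.2f} ms average rotational latency")
```

Even a 15K RPM drive spends 2 ms of every random access just waiting for the platter, which is why spindle count matters so much later in this deck.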
SCSI
SCSI – Small Computer System Interface
  From Shugart’s 1979 SASI implementation
    SASI: Shugart Associates System Interface
Both Hardware and I/O Protocol Standards
  Both have evolved over time
  Hardware is the source of most limitations
  I/O Protocol has long-term potential
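
Because the command set is independent of the transport, the very same command descriptor blocks (CDBs) later ride over FibreChannel and iSCSI unchanged. A minimal sketch (illustrative, not from the slides) of a standard READ(10) CDB:

```python
import struct

def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a SCSI READ(10) command descriptor block.

    The 10-byte layout (opcode 0x28, 32-bit LBA, 16-bit transfer
    length) stays the same whether the transport is parallel SCSI,
    FibreChannel, or iSCSI -- the protocol outlives the hardware.
    """
    return struct.pack(
        ">BBIBHB",
        0x28,     # opcode: READ(10)
        0,        # flags
        lba,      # logical block address (big-endian, 32-bit)
        0,        # group number
        blocks,   # transfer length in blocks (big-endian, 16-bit)
        0,        # control byte
    )

print(read10_cdb(lba=2048, blocks=8).hex())  # 28000000080000000800
```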
SCSI - Pro
Device Independence
  Mix and match device types on the bus
  Disk, Tape, Scanners, etc.
Overlapping I/O Capability
  Multiple read & write commands can be outstanding simultaneously
Ubiquitous
SCSI - Con
Distance vs. Speed
  Double the Signaling Rate: 40, 80, 160, 320 MB/s
  Halve the Cable Length Limits
Device Count: 16 Maximum
  Low-voltage Differential Ultra3 SCSI can support only 16 devices on a 12 m cable at 160 MB/s
Server Access to Data Resources
  Hardware changes are disruptive
SCSI – Overcoming the Con
New Hardware & Signaling Platforms
SCSI-3 Introduces Serial SCSI Support
  Fibre Channel
  Serial Storage Architecture (SSA)
    Primarily an IBM implementation
  FireWire (IEEE 1394 – Apple fixes SCSI)
    Attractive in consumer market
Retains SCSI I/O Protocol
Scaling SCSI Devices
Increase Controller Count within Server
  Increasing burden on the CPU
    Device overhead
    Bus controllers can be saturated
  You can run out of slots
  Many Queues, Many Devices
    Queuing Theory 101 (the check-out line) – undesirable, as the sketch below shows
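
A quick simulation (mine, not from the slides) of the check-out-line effect: with the same total service capacity, jobs scattered across private per-controller queues wait far longer than jobs feeding one shared queue.

```python
import heapq
import random

random.seed(1)
N_JOBS, N_SERVERS, UTIL = 200_000, 4, 0.9

def workload():
    # Poisson arrivals at 90% of total capacity; exponential service times.
    t = 0.0
    for _ in range(N_JOBS):
        t += random.expovariate(N_SERVERS * UTIL)
        yield t, random.expovariate(1.0)

# Many queues: each job is dispatched at random to one dedicated server.
free = [0.0] * N_SERVERS
wait_many = 0.0
for t, svc in workload():
    i = random.randrange(N_SERVERS)
    start = max(t, free[i])
    wait_many += start - t
    free[i] = start + svc

# One queue: whichever server frees up first takes the next job.
free = [0.0] * N_SERVERS
heapq.heapify(free)
wait_one = 0.0
for t, svc in workload():
    start = max(t, heapq.heappop(free))
    wait_one += start - t
    heapq.heappush(free, start + svc)

print(f"{N_SERVERS} private queues: mean wait {wait_many / N_JOBS:.2f}")
print(f"one shared queue:  mean wait {wait_one / N_JOBS:.2f}")
```

At 90% utilization the shared queue’s mean wait is several times lower, which is exactly why the next slide’s dedicated external controller (“one queue, many devices”) wins.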
Scaling SCSI Devices
Use Dedicated External Device Controller
  Hides Individual Devices
    Provides One Large Virtual Resource
  Offloads Device Overhead
  One Queue, Many Devices – good
  Cost and Benefit
    Still borne by one system
RAID
Redundant Array of Inexpensive Disks
Combine multiple disks into a single virtual device
How this is implemented determines different strengths:
  Storage Capacity
  Speed
    Fast Read or Fast Write
  Resilience in the face of device failure
RAID Functions
Striping
  Write consecutive logical bytes/blocks on consecutive physical disks
Mirroring
  Write the same block on two or more physical disks
Parity Calculation (see the sketch below)
  Given N disks, N-1 consecutive blocks are data blocks; the Nth block is for parity
  When any of the N-1 data blocks are altered, N-2 XOR calculations are performed on these N-1 blocks
  The Data Block(s) and Parity Block are written
  Destroy one of these N blocks, and that block can be reconstructed using N-2 XOR calculations on the remaining N-1 blocks
  Destroy two or more blocks – reconstruction is not possible
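
A small sketch (illustrative, not from the slides) of the parity arithmetic: parity is simply the XOR of the N-1 data blocks, and XOR-ing N-1 blocks together takes N-2 pairwise operations.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-sized blocks; len(blocks)-1 operations."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"AAAA", b"BBBB", b"CCCC"]   # N-1 = 3 data blocks (so N = 4 disks)
parity = xor_blocks(data)            # N-2 = 2 XOR operations

# Lose any single block: rebuild it from the surviving N-1 blocks.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]            # lose two blocks and this is impossible
```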
RAID Function – Pro & Con
Striping
  Pro: Increases spindle count for increased throughput
  Con: Does not provide redundancy
Mirroring
  Pro: Provides redundancy without parity calculation
  Con: Requires at least 100% disk resource overhead
Parity Calculation
  Pro: Cuts disk resource overhead to 1/N
  Con: Parity calculation is expensive
    N-2 calculations are required
    If all N-1 data blocks are not in cache, they must be read
RAID Types
RAID 0
  Stripe with No Parity
RAID 1
  Mirror two or more disks
RAID 0+1
  Stripe on Inside, Mirror on Outside
RAID 1+0
  Mirrors on Inside, Stripe on Outside
RAID 3
  Synchronous, Subdivided Block Access; Dedicated Parity Drive
RAID 4
  Independent, Whole Block Access; Dedicated Parity Drive
RAID 5
  Like RAID 4, but Parity striped across multiple drives
[Slide diagrams: RAID 0, RAID 1, RAID 3, RAID 5, RAID 1+0, and RAID 0+1 layouts]
Breaking the Direct Connection
Now you have high-performance RAID
  The storage bottleneck has been reduced
  You’ve invested $$$ to do it
How do you extend this advantage to N servers without spending N x $$$?
How about using existing networks?
How to Provide Data Over IP
NFS (or CIFS) over a TCP/IP Network
  This is Network Attached Storage (NAS)
  Overcomes some distance problems
  Full Filesystem Semantics are Lacking
    …such as file locking
  Speed and Latency are problems
  Security and Integrity are problems as well
IP encapsulation of I/O Protocols
  Not yet established in the marketplace
  Current speed & security issues
NAS and SAN
NAS – Network Attached Storage
  File-oriented access
  Multiple Clients, Shared Access to Data
SAN – Storage Area Network
  Block-oriented access
  Single Server, Exclusive Access to Data
NAS: Network Attached Storage
File Objects and Filesystems
  OS Dependent
  OS Access & Authentication
Possible Multiple Writers
  Require locking protocols
Network Protocol: e.g., IP
“Front-end” Network
SAN: Storage Area Network
Block Oriented Access To Data
Device-like Object is presented
Unique Writer
I/O Protocol: SCSI, HIPPI, IPI
“Back-end” Network
A Storage Area Network
Storage
  StorageWorks MA8000 (24), EVA (2)
  HDS is 2nd Approved Storage Vendor
    9980 Enterprise Storage Array – EMC-class storage
Switches
  Brocade 12000 (8), 3800 (20), & 2800 (34)
    3900s are being deployed – 32 ports
UNIX Servers on the SAN
  Solaris (56), IRIX (5), HP-UX (5), Tru64 (1)
Storage Volume Connected to UNIX Servers
  13,000 GB as of May 2003
Windows Servers
  Windows 2000 (74), NT 4.0 (16)
SAN Implementations
FibreChannel
  FC Signaling Carrying SCSI Commands & Data
  Non-Ethernet Network Infrastructure
iSCSI
  SCSI Encapsulated by IP
  Ethernet Infrastructure
FCIP – FibreChannel over IP
  FibreChannel Encapsulated by IP
  Extends FibreChannel over WAN Distances
  Future Bridge between Ethernet & FibreChannel
  iFCP – another gateway implementation
NAS & SAN in the Data Center
FCIP In The Data Center
FibreChannel
How SCSI Limitations are Addressed
  Speed
  Distance
  Device Count
  Access
FibreChannel – Speed
266 Mbps – ten years ago
1063 Mbps – common in 1998
2125 Mbps – available today
4 Gbps – near-future products
  Backward compatible with 1 & 2 Gbps
10 Gbps – 2005?
  Not backward compatible with 1/2/4 Gbps
  But 10 Gig Ethernet will compete
  Remember FDDI & ATM
Why I/O Protocols are Coming to IP
IP Networking is ubiquitous
Gigabit Ethernet is here
  10 Gbps Ethernet is just becoming available
Don’t have to invest in a second network
  Just upgrade the one you have
IP & Ethernet software is well understood
  Existing talent pool for vendors to leverage
    Developers, not end-user Network Engineers
FibreChannel – Distance
1063 Mbps
  175 m (62.5 um – multi-mode)
  500 m (50.0 um – multi-mode)
  10 km (9 um – single-mode)
2125 Mbps
  500 m (50.0 um – multi-mode)
  2 km (9 um – single-mode)
FibreChannel – A Network
Layer 1 – Physical (Media: fiber, copper)
  Fibre: 62.5, 50.0, & 9.0 um
  Copper: Cat6, Twinax, Coax, other
Layer 2 – Data Link (Network Interface & MAC)
  WWPN: World Wide Port Name
  WWNN: World Wide Node Name
    In a single-port node, usually WWPN = WWNN
  64-bit device address
  Comparable to 48-bit Ethernet device addresses
Layer 3 – Network (IP & SCSI)
  24-bit fabric address
  Comparable to an IP address
FibreChannel Terminology: Port Types
N_Port
  Node port – Computer, Disk, or Storage Node
F_Port
  Fabric port – Found only on a Switch
E_Port
  Expansion Port – Switch-to-Switch port
NL_Port
  Node port with Arbitrated Loop Capabilities
FL_Port
  Fabric port with Arbitrated Loop Capabilities
G_Port
  Generic Switch Port: Can act as any of F_Port, E_Port, or FL_Port
FibreChannel - Topology
Point-to-Point
Arbitrated Loop
Fabric
FibreChannel – Point-to-point
Direct Connection of Server and Storage Node
Two N_Ports and One Link
FibreChannel - Arbitrated Loop
Up to 126 Devices in a Loop via NL_Ports
Token-access, Polled Environment (like FDDI)
Wait For Access Increases with Device Count
FibreChannel - Fabric
Arbitrary Topology
Requires At Least One Switch
Up to 15 million ports can be concurrently logged in with the 24-bit address ID
Dedicated Circuits between Servers & Storage
  via Switches
Interoperability Issues Increase With Scale
FibreChannel – Device Count
126 devices in Arbitrated Loop
15 Million in a fabric (24-bit addresses, decoded in the sketch below)
  Bits 0-7: Port or Arbitrated Loop address
  Bits 8-15: Area, identifies the FL_Port
  Bits 16-23: Domain, address of the switch
    239 of 256 domain addresses are available
  256 x 256 x 239 = 15,663,104
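
A tiny decoder (illustrative only) makes the 24-bit layout concrete:

```python
def decode_fc_address(addr: int) -> dict:
    """Split a 24-bit FibreChannel fabric address into its fields."""
    return {
        "domain": (addr >> 16) & 0xFF,  # bits 16-23: the switch
        "area":   (addr >> 8) & 0xFF,   # bits 8-15: the FL_Port
        "port":   addr & 0xFF,          # bits 0-7: port or AL_PA
    }

print(decode_fc_address(0x0A1B2C))  # {'domain': 10, 'area': 27, 'port': 44}
print(256 * 256 * 239)              # 15663104 usable addresses
```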
FibreChannel Definitions
WWPN
Zone & Zoning
LUN
LUN Masking
FibreChannel - WWPN
World-Wide Port Name
A unique 64-bit hardware address for each FibreChannel Device
Analogous to a 48-bit Ethernet hardware address
WWNN – World-Wide Node Name
FibreChannel – Zone & Zoning
Switch-Based Access Control
Analogous to an Ethernet Broadcast Domain
Soft Zone
  Zoning based on the WWPN of the Nodes Connected
  Preferred
Hard Zone
  Zoning based on the Port Number on the Switch to which the Nodes are Connected
FibreChannel - LUN
Logical Unit Number
The Storage Node allocates storage and assigns a LUN
Appears to the server as a unique device (disk)
FibreChannel – LUN Masking
Storage-Node-Based Access Control List (ACL)
LUNs and visible server connections (WWPNs) are allowed to see each other through the ACL
LUNs are masked from servers not in the ACL (see the sketch below)
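
A toy model (hypothetical names throughout, not any storage node’s real interface) of what the masking ACL accomplishes:

```python
# Hypothetical masking table on the storage node: WWPN -> LUNs it may see.
ACL = {
    "50:06:04:82:ca:fd:77:42": {0, 1, 2},   # database server
    "50:06:04:82:ca:fd:77:43": {3},         # web server
}
ALL_LUNS = {0, 1, 2, 3, 4, 5}

def visible_luns(wwpn):
    """LUNs presented to this initiator; everything else is masked."""
    return ACL.get(wwpn, set())

web = "50:06:04:82:ca:fd:77:43"
print(visible_luns(web))             # {3}
print(ALL_LUNS - visible_luns(web))  # masked from the web server
```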
LUN Security
Host Software
HBA-based
  Firmware or driver configuration
Zoning
LUN Masking
LUN Security
Host-based & HBA




Both these methods rely on correct security
implemented at the edges
Most difficult to manage due to large numbers and
types of servers
Storage Managers may not be Server Managers
Don’t trust the consumer to manage resources
 Trusting the fox to guard the hen house
LUN Security
Zoning
  An access control list
  Establishes a conduit
    A circuit will be constructed through it
  Allows only selected Servers to see a Storage Node
  Lessons learned:
    Implement in parallel with LUN Masking
    Segregate OS types into different Zones
    Always promptly remove entries for retired servers
LUN Security
LUN Masking
  The Storage Node’s Access Control List
    Sees the Server’s WWPN
    Masks all LUNs not allocated to that server
    Allows the Server to see only its assigned LUNs
  Implement in parallel with Fabric Zoning
LUN - Persistent Binding
Persistent Binding of LUNs to Server Device IDs
Permanently assign a System SCSI ID to a LUN
Ensures the Device ID remains consistent across reconfiguration reboots
Different HBAs use different binding methods & syntax
Tape drive device changes have been a repeated source of NetBackup Media Server failure
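
Because every HBA vendor has its own syntax, here is only a schematic sketch (hypothetical file name and logic, mine) of what any persistent-binding scheme must do: remember the WWPN-to-target-ID assignment instead of re-enumerating it on every reboot.

```python
import json
from pathlib import Path

BINDINGS = Path("fc_bindings.json")  # hypothetical stand-in for driver config

def target_id_for(wwpn: str) -> int:
    """Return a stable SCSI target ID for a storage port’s WWPN.

    The first sighting allocates the next free ID; every later boot
    returns the same ID, so device names never shuffle."""
    table = json.loads(BINDINGS.read_text()) if BINDINGS.exists() else {}
    if wwpn not in table:
        table[wwpn] = max(table.values(), default=-1) + 1
        BINDINGS.write_text(json.dumps(table))
    return table[wwpn]

print(target_id_for("50:06:04:82:ca:fd:77:42"))  # same ID on every run
```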
SAN Performance
Storage Configuration
Fabric Configuration
Server Configuration
SAN - Storage Configuration
More Spindles are Better
Faster Disks are Better
RAID 1+0 vs. RAID 5
  “RAID 5 performs poorly compared to RAID 0+1 when both are implemented with software RAID” – Allan Packer, Sun Microsystems, 2002
  Where does RAID 5 underperform RAID 1+0?
    Random Write (see the sketch below)
Limit Partition Numbers Within RAIDsets
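
The random-write gap comes from RAID 5’s read-modify-write cycle. A back-of-the-envelope count (mine, not from the slides) of physical I/Os per random logical write when the old data and parity are not already in cache:

```python
# RAID 1+0: write the block to both sides of the mirror.
raid10_ios = 2
# RAID 5: read old data, read old parity, write new data, write new parity
# (new_parity = old_parity XOR old_data XOR new_data).
raid5_ios = 4

for writes in (1_000, 10_000):
    print(f"{writes:>6} random writes -> "
          f"RAID 1+0: {writes * raid10_ios:>6} I/Os, "
          f"RAID 5: {writes * raid5_ios:>6} I/Os")
```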
SAN - Fabric Configuration

Common Switch for Server & Storage
  Multiple “hops” reduce performance
  Increases Reliability
Large Port-count Switches
  32 ports or more
  16-port switches create larger fabrics simply to carry their own overhead
SAN - Server Configuration
Choose the Highest-Performance HBA Available
  PCI: 64-bit is better than 32-bit
  PCI: 66 MHz is better than 33 MHz
Place in the Highest-Performance Slot
  Choose the widest, fastest slot in the system
  Choose an underutilized controller
Size LUNs by RAIDset Disk Size
  BAD: LUN sizes smaller than the underlying disk size
SAN Resilience
At Least Two Fabrics
Dual-Path Server Connections
  Each Server N_Port is connected to a different Fabric
  Circuit failover upon Switch failure
Automatic Traffic Rerouting
Hot-Pluggable Disks & Power Supplies
SAN Resilience – Dual Path
Multiple FibreChannel Ports within the Server
Active/Passive Links
Most GPRD SAN disruptions have affected single-attached servers
SAN – Good Housekeeping
Stay Current with OS Drivers & HBA Firmware
Before You Buy a Server’s HBA
  Is it supported by the switch & storage vendors?
Coordinate Firmware Upgrades
  With Storage & other Server Admin Teams using the SAN
Monitor Disk I/O Statistics
  Be proactive; identify and eliminate I/O problems
SAN Backups
Why We Should
  Offload the Front-end IP Network
    Most Servers are still connected to 100baseT IP
  1 or 2 Gbps FC Links Increase Throughput
  Shrink Backup Times
Why We Don’t
  Cost
    NetBackup Media Server License: starts at $5K list
Backup Futures
Incremental Backups
  No longer stored on tape
  Use “near-line” cheap disk arrays
    Several vendors are under current evaluation
Still over IP
  1 Gbps Ethernet is commonly available on new servers
  10 Gbps Ethernet needed in the core
Questions