Transcript Document

Storage
Storage 101
What is a storage array?
Agenda
• Definitions
• What is a SAN Fabric
• What is a storage array
• Front-end connections
• Controllers
• Back-end connections
• Physical Disks
• Management
• Performance
• Future – Distributed storage
Definitions
• SAN – Storage Area Network
– This is generally used as a catch-all term for all of the following definitions
– For storage personnel SAN does NOT equal storage array
• LUN – Logical Unit Number, also known as a volume
• WWN – World Wide Name
– MAC address for storage networks
• Fabric – Network that connects hosts to storage
• iSCSI – Internet SCSI
• SCSI – Small Computer System Interface
• FC – Fibre Channel
• FCoE – Fibre Channel over Ethernet
• FCIP – Fibre Channel over IP
• Storage Array – Storage device that provides block level access to
volumes
• DAS/DASD – Direct Attached Storage
– Storage directly attached to a server without any network
• NAS – Network Attached Storage
– Storage device that provides file level access to volumes
• RAID – Redundant Array of Independent Disks
– A way to combine multiple physical disks into a logical entity providing
different performance and protection characteristics.
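As a rough illustration of those protection and capacity trade-offs, the usable space for the common RAID levels can be sketched as follows (a simplification; the function name and figures are illustrative, not any vendor's sizing rules):

```python
def usable_capacity(num_disks, disk_gb, raid_level):
    """Approximate usable capacity in GB for common RAID levels."""
    if raid_level == 1:          # mirroring: half the raw space
        return num_disks // 2 * disk_gb
    if raid_level == 5:          # one disk's worth of parity
        return (num_disks - 1) * disk_gb
    if raid_level == 6:          # two disks' worth of parity
        return (num_disks - 2) * disk_gb
    raise ValueError("unsupported RAID level")

print(usable_capacity(8, 600, 5))  # 4200
```

For eight 600GB disks, RAID 5 yields 4200GB usable, RAID 6 3600GB, and RAID 1 only 2400GB.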
What is a SAN fabric?
• A network comprising hosts, storage arrays they access,
and storage switches that provide the network connectivity
SAN Fabric Details
• A SAN Fabric has hosts that connect to the network
– Each host has a physical connection and some logical
addresses
 pWWN (Port WWN) is the equivalent of a MAC address for the port on
the host that is connected to the network
 FCID is a dynamic address that represents the connection as well
– Only HP-UX 11v2 and below use this
– Typically hosts connect into some storage switch
 These look like traditional network switches in many ways and
operate the same way.
 These switches will contain both host ports and storage ports, or
in the storage world, initiators and targets
– Storage arrays that provide storage also connect into these
switches to provide the full network
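Since a pWWN plays the role of a MAC address on the fabric, tooling often has to normalize the various notations it appears in (with and without colons, mixed case). A minimal sketch, using a made-up example WWN:

```python
import re

def normalize_wwn(raw):
    """Normalize a WWN/pWWN to colon-separated lowercase hex,
    e.g. '50060160C46036A0' -> '50:06:01:60:c4:60:36:a0'."""
    hex_digits = re.sub(r'[^0-9a-fA-F]', '', raw).lower()
    if len(hex_digits) != 16:    # a WWN is 8 bytes, i.e. 16 hex digits
        raise ValueError("a WWN must contain 16 hex digits")
    return ':'.join(hex_digits[i:i + 2] for i in range(0, 16, 2))

print(normalize_wwn('50060160C46036A0'))  # 50:06:01:60:c4:60:36:a0
```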
What is a storage array?
• A storage array is a system that consists of components
that provide storage available for consumption
– The components are front-end ports, controllers, back-end
ports, and physical disk drives
Front-end connections
• Front-end connections are used for individual hosts to
connect to the storage array and utilize the volumes
available
– Hosts can also connect directly in a small or medium-sized
SAN, or in a DAS environment
• The physical transport mechanism can be fibre or copper
• The logical transport protocols can be block level protocols
such as iSCSI, FC, or FCoE
– Some arrays additionally support file level protocols, acting as
NAS devices
• The larger arrays tend to have more front-end connections
to aggregate bandwidth and provide load balancing
• Volumes are typically presented via one or more front-end
connections to hosts
Controllers
• Controllers are the brains that translate the request from
the front-end ports and determine how to fulfill the request
• Controllers run code optimized for moving data and
performing mathematical calculations needed to support
RAID levels
• Controllers also have a certain amount of on-board
memory, or cache, to help reduce the amount of data that
has to come from spinning disks.
– Many arrays perform some level of read-ahead caching and
write caching to optimize performance
• They also have some diagnostics routines and
management in order to support the operations of the
array.
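The read-cache idea can be illustrated with a toy least-recently-used cache (a sketch only; real controller caches also handle write-back, prefetch, and cache mirroring between controllers):

```python
from collections import OrderedDict

class ReadCache:
    """Tiny LRU read cache: serve repeat reads from memory
    instead of going back to the disks."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block address -> data

    def read(self, addr, read_from_disk):
        if addr in self.blocks:                 # cache hit
            self.blocks.move_to_end(addr)
            return self.blocks[addr]
        data = read_from_disk(addr)             # cache miss: go to disk
        self.blocks[addr] = data
        if len(self.blocks) > self.capacity:    # evict least recently used
            self.blocks.popitem(last=False)
        return data
```

Any read that hits the cache never touches the back end, which is exactly how cache reduces the load on spinning disks.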
Back-end connections
• Back-end connections run from the controllers to the
physical disk shelves or individual disks.
– These carry the actual commands that direct the disks to
retrieve or write blocks of data.
– These connections are usually transparent to all but the most
sophisticated storage engineer.
 Oftentimes these have specific fan-out ratios where each disk
shelf may have two or four connections and split the bandwidth
available in some way.
– Back-end connections are rarely a bottleneck
Physical Disks
• These days physical disks come in all shapes and sizes
– Spinning drives come in capacities of anywhere from 146GB
to 3TB, with the space increasing year over year (though not
performance)
 These drives also come in various rotational speeds anywhere
from 5400 RPM in a laptop drive to 15000 RPM in an enterprise
class drive, which directly affects performance
– Non-spinning drives, also known as SSDs, come in
capacities that don’t yet match spinning drives, though there
are SSD cards that have up to 960GB of storage space
available.
– These physical disks directly impact the performance of the
storage array system, and are usually the bottleneck for most
enterprise class storage systems.
Provisioning
• Provisioning storage is a multi-step process
– Configure the host with any software including multi-path
support
– Alias the host port WWN
– Zone the host port alias to a storage array WWN
– Activate the updated zone information
– Create host representation on storage array
– Create volume on storage array
– Present/LUN Mask volume to correct host
– Format volume for use
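The aliasing, zoning, and LUN-masking steps above can be modeled as simple lookup tables (a hypothetical illustration; the names and structures are invented, not any vendor's API):

```python
# Hypothetical data model for the provisioning steps above.
aliases = {"esx01_hba0": "10:00:00:90:fa:12:34:56"}       # alias the host pWWN
zones = {"z_esx01_array1": ["esx01_hba0", "array1_fe0"]}  # zone host to array port
lun_masking = {                                           # present volume to host
    "array1": {"esx01": ["vol_esx01_data01"]},
}

def host_can_see(host, volume, array):
    """True if the volume is masked to the host on the given array."""
    return volume in lun_masking.get(array, {}).get(host, [])

print(host_can_see("esx01", "vol_esx01_data01", "array1"))  # True
```

Each step in the checklist maps to one of these tables: miss any one of them and the host simply never sees the volume.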
Performance
• There are many statistics you can use to monitor your storage devices;
however, two key ones tend to directly impact
performance more than most.
• IOPS – Input/Output Operations Per Second
– This is based on the number of disks that support the volume being
used and the RAID level of the volume
 15k RPM disks provide 200 IOPS raw without any RAID write penalty
 RAID 1 has a 1:2 ratio for writes. For every 1 write command sent to the array,
2 commands are sent to the disks.
 RAID 5 has a 1:4 ratio, while RAID 6 has a 1:6 ratio
– Read existing data block, Read Parity 1, Read Parity 2, Calculate XOR
(the parity calculation itself is not an I/O), Write data, Write Parity 1, Write Parity 2
 Read commands are always 1:1
 For an application that has a requirement of 10,000 IOPS and a 50/50 read to
write ratio on a RAID 6 volume:
– 5,000 read IOPS, translating into 25 physical disks
– 5,000 write IOPS translating into 30,000 back-end operations requiring 150
physical disks
– Total requirement is 175 physical disks just to support the performance
needed!
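The sizing arithmetic above can be sketched in a few lines, assuming the 200 IOPS-per-15k-disk figure used here:

```python
def disks_required(host_iops, read_fraction, write_penalty, disk_iops=200):
    """Back-end disk count needed to satisfy a host IOPS requirement.
    write_penalty: back-end ops per host write
    (RAID 1 -> 2, RAID 5 -> 4, RAID 6 -> 6)."""
    reads = host_iops * read_fraction                    # reads are 1:1
    writes = host_iops * (1 - read_fraction) * write_penalty
    return (reads + writes) / disk_iops

# The RAID 6 example above: 10,000 IOPS at 50/50 read/write
print(disks_required(10_000, 0.5, 6))  # 175.0
```

Rerunning it with a RAID 1 penalty of 2 shows why RAID level choice matters: the same workload needs only 75 disks.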
Performance
• Bandwidth
– This is based on the speed of the connections from the host
to the array as well as how much oversubscription is taking
place within the SAN Fabric.
• Fibre Channel currently supports 16Gb full duplex, though
8Gb is more common
– That's roughly 1600 MBps in each direction, transferring about 1.6GB of
data each second one way or 3.2GB bi-directionally.
• FCoE currently supports 10Gb, though the roadmap
includes 40Gb and 100Gb
– 10Gb is roughly 1250 MBps in each direction, while 100Gb is roughly
12500 MBps, over 12GB per second!
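A rough per-link throughput table makes these conversions easy to check (approximate figures with encoding and protocol overhead folded in; the labels are illustrative):

```python
# Approximate per-direction throughput of common link speeds, in MB/s.
link_mbps = {
    "8G FC": 800,
    "16G FC": 1600,
    "10G FCoE": 1250,
    "100G Ethernet": 12500,
}

def full_duplex(link):
    """Aggregate MB/s when both directions are saturated."""
    return 2 * link_mbps[link]

print(full_duplex("16G FC"))  # 3200
```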
• Besides speed, there is the matter of oversubscription
Performance
• Oversubscription – The practice of providing less aggregate
bandwidth than the sum of what the attached hosts could demand
• In an environment with 100 servers having dual 8Gb FC
connections we’d have a total of 1600Gb that is directed at
a storage array via some SAN switch
• The storage array may only have a total of eight 8Gb FC
connections for 64Gb aggregated bandwidth
• We have a ratio of 1600:64 or 25:1.
– This is done in networking all the time and is now standard
practice in the storage world.
– The assumption is that all 100 hosts will never need to
transmit at their full bandwidth at the same time
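The oversubscription arithmetic above (1600Gb of host bandwidth against 64Gb of array bandwidth) can be sketched as:

```python
from math import gcd

def oversubscription(host_ports, host_gb, array_ports, array_gb):
    """Reduce host-side vs array-side bandwidth to a simple ratio."""
    host_total = host_ports * host_gb      # 100 servers x 2 ports x 8Gb = 1600Gb
    array_total = array_ports * array_gb   # 8 ports x 8Gb = 64Gb
    g = gcd(host_total, array_total)
    return (host_total // g, array_total // g)

print(oversubscription(200, 8, 8, 8))  # (25, 1)
```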
Storage Futures
• Converged Infrastructure
– Datacenters designed today are built around converged
infrastructure
 One HP Blade enclosure can encompass servers, networking, and
storage components that need to be configured in a holistic
manner
 Virtualization has helped speed this convergence up, though
organizational design is usually still far behind.
– Storage arrays are beginning to support target based zoning
 The goal is to reduce the administration needed to configure a
host to storage mapping letting the storage array do more
intelligent administration without human intervention
Storage Futures
• Over the last few years storage has begun transitioning
from “big old iron” to distributed systems where data is
spread across multiple nodes for capacity and
performance.
– EMC Isilon
– HP Ibrix
– Nutanix
– VMware VSAN
– Nexenta
• As always in IT, the pendulum is swinging back to the
distributed platforms for storage where each node hosts a
small amount of data instead of a big platform hosting all
of the data.
Storage Futures
• Data protection is maturing beyond traditional RAID levels such as
1, 1+0, 5, 6, etc.
– RAID levels do offer protection against drive failure; however, they
don't protect against data corruption most of the time
– RAID levels also have performance implications that are usually
negative to the applications residing upon them
• These days the solution is to create multiple copies of files or
blocks based upon some rules
– Most of the large public cloud providers use this approach, including
Amazon S3 (Simple Storage Service)
 It just so happens that, by default, anything stored in S3 has three copies!
• The ‘utopia’ is a world where each application carries some
metadata that declares the protection level and performance
characteristics it requires
– This would enable these applications to run internally or externally
yet provide the same experience regardless.
– This is the essence of the SDDC, or Software-Defined Data Center: the
application's requirements define where it runs without any
human intervention.
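The "multiple copies based on rules" idea can be sketched with a toy deterministic placement function (rendezvous-style hashing; the node names and function are entirely illustrative):

```python
import hashlib

def replica_nodes(block_id, nodes, copies=3):
    """Pick `copies` distinct nodes for a block by ranking all nodes
    on a per-block hash, so every client computes the same placement."""
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256(f"{block_id}:{n}".encode()).hexdigest())
    return ranked[:copies]

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = replica_nodes("block-42", nodes)
print(len(placement))  # 3 distinct nodes hold this block
```

Because the ranking is derived from the block ID, any node can recompute where the copies live without consulting a central catalog, which is what makes this style of placement attractive for distributed storage.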