Storage 101
What is a storage array?
Agenda
• Definitions
• What is a SAN Fabric
• What is a storage array
• Front-end connections
• Controllers
• Back-end connections
• Physical Disks
• Management
• Performance
• Future – Distributed storage
Definitions
• SAN – Storage Area Network
– This is generally used as a catch-all term for all of the following definitions
– For storage personnel, SAN does NOT equal storage array
• LUN – Logical Unit Number, also known as a volume
• WWN – World Wide Name
– MAC address for storage networks
• Fabric – Network that connects hosts to storage
• iSCSI – Internet SCSI
• SCSI – Small Computer System Interface
• FC – Fibre Channel
• FCoE – Fibre Channel over Ethernet
• FCIP – Fibre Channel over IP
• Storage Array – Storage device that provides block level access to volumes
• DAS/DASD – Direct Attached Storage
– Storage directly attached to a server without any network
• NAS – Network attached Storage
– Storage device that provides file level access to volumes
• RAID – Redundant Array of Independent Disks
– A way to combine multiple physical disks into a logical entity providing
different performance and protection characteristics.
What is a SAN fabric
• A network comprising hosts, the storage arrays they access, and the storage switches that provide the network connectivity
SAN Fabric Details
• A SAN Fabric has hosts that connect to the network
– Each host has a physical connection and some logical
addresses
pWWN (Port WWN) is the equivalent of a MAC address for the port on the host that is connected to the network
FCID is a dynamic address that also represents the connection
– Only HP-UX 11i v2 and below use this
– Typically hosts connect into some storage switch
These look like traditional network switches in many ways and operate in much the same way.
These switches will contain both host ports and storage ports, or, in storage terms, initiators and targets
– Storage arrays that provide storage also connect into these
switches to provide the full network
What is a storage array?
• A storage array is a system that consists of components
that provide storage available for consumption
– The components are front-end ports, controllers, back-end
ports, and physical disk drives
Front-end connections
• Front-end connections are used for individual hosts to
connect to the storage array and utilize the volumes
available
– Hosts can be directly connected in a small or medium sized SAN, or in a DAS environment
• The physical transport mechanism can be fibre or copper
• The logical transport protocols can be block level protocols
such as iSCSI, FC, or FCoE
– Some arrays also support file level protocols, acting as NAS devices
• The larger arrays tend to have more front-end connections
to aggregate bandwidth and provide load balancing
• Volumes are typically presented via one or more front-end
connections to hosts
Controllers
• Controllers are the brains that translate requests arriving on the front-end ports and determine how to fulfill them
• Controllers run code optimized for moving data and performing the mathematical calculations needed to support RAID levels (a parity sketch follows at the end of this slide)
• Controllers also have a certain amount of on-board
memory, or cache, to help reduce the amount of data that
has to come from spinning disks.
– Many arrays perform some level of read-ahead caching and
write caching to optimize performance
• They also run diagnostic routines and management functions to support the operation of the array.
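Since RAID parity comes up again under Performance, here is a minimal runnable sketch in Python of the XOR parity math a controller performs; the block contents and sizes are purely illustrative.

# Byte-wise XOR parity, the core RAID 5/6 calculation.
# XORing the parity with the surviving blocks rebuilds a lost block.
def xor_blocks(blocks):
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks on three disks
parity = xor_blocks(data)            # parity block on a fourth disk

# Simulate losing disk 2 and rebuilding it from survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]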
Back-end connections
• From the controllers themselves to the physical disk shelves or disks there are back-end connections.
– These carry the actual commands instructing the disks to read or write blocks of data.
– These connections are usually transparent to all but the most sophisticated storage engineers.
Oftentimes these have specific fan-out ratios, where each disk shelf may have two or four connections that split the available bandwidth in some way.
– Back-end connections are rarely a bottleneck
Physical Disks
• These days physical disks come in all shapes and sizes
– Spinning drives come in capacities of anywhere from 146GB
to 3TB, with the space increasing year over year (though not
performance)
These drives also come in various rotational speeds anywhere
from 5400 RPM in a laptop drive to 15000 RPM in an enterprise
class drive, which directly affects performance
– Non-spinning drives, also known as SSDs, come in capacities that don’t yet match spinning drives, though there are SSD cards with up to 960GB of storage space available.
– These physical disks directly impact the performance of the
storage array system, and are usually the bottleneck for most
enterprise class storage systems.
Provisioning
• Provisioning storage is a multi-step process (sketched in code after this list)
– Configure the host with any software including multi-path
support
– Alias the host port WWN
– Zone the host port alias to a storage array WWN
– Activate the updated zone information
– Create host representation on storage array
– Create volume on storage array
– Present/LUN Mask volume to correct host
– Format volume for use
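A minimal sketch of that workflow as runnable Python; every step description and identifier below is a hypothetical stand-in, not any vendor's actual CLI or API.

# Hypothetical provisioning checklist, in order; a real environment
# would replace each step with switch- and array-specific commands.
STEPS = [
    "Configure host software, including multi-path support",
    "Alias the host port WWN ({pwwn})",
    "Zone the host alias to the array port WWN ({array_pwwn})",
    "Activate the updated zone set on the fabric",
    "Create the host representation on the storage array",
    "Create the {size_gb}GB volume on the storage array",
    "Present/LUN-mask the volume to the correct host",
    "Format the volume on the host for use",
]

def provision(pwwn, array_pwwn, size_gb):
    for n, step in enumerate(STEPS, 1):
        print(f"{n}. " + step.format(pwwn=pwwn, array_pwwn=array_pwwn,
                                     size_gb=size_gb))

provision("10:00:00:00:c9:12:34:56", "50:06:01:60:12:34:56:78", 500)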
Performance
• There are many statistics you can use to monitor your storage devices; however, two key ones tend to directly impact performance more than most.
• IOPS – Input/Output Operations Per Second
– This is based on the number of disks that support the volume being
used and the RAID level of the volume
15k RPM disks provide roughly 200 IOPS raw, before any RAID write penalty
RAID 1 has a 1:2 ratio for writes: for every 1 write command sent to the array, 2 commands are sent to the disks.
RAID 5 has a 1:4 ratio, while RAID 6 has a 1:6 ratio
– Read existing data block, read parity 1, read parity 2, calculate XOR (the parity calculation itself is not I/O), write data, write parity 1, write parity 2
Read commands are always 1:1
For an application that requires 10,000 IOPS with a 50/50 read-to-write ratio on a RAID 6 volume:
– 5,000 read IOPS, translating into 25 physical disks
– 5,000 write IOPS, translating into 30,000 back-end operations requiring 150 physical disks
– The total requirement is 175 physical disks just to support the performance needed!
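As a minimal sketch, that disk-count arithmetic can be written out in Python; the 200 IOPS-per-disk figure and the write penalties are the rules of thumb from this slide, not vendor specifications.

# Rough disk count from front-end IOPS, read/write mix, and RAID
# write penalty (back-end operations per front-end write).
IOPS_PER_DISK = 200                              # 15k RPM rule of thumb
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4, "RAID6": 6}

def disks_needed(total_iops, read_fraction, raid_level):
    reads = total_iops * read_fraction           # reads are 1:1
    writes = total_iops * (1 - read_fraction)    # multiplied by penalty
    backend_ops = reads + writes * WRITE_PENALTY[raid_level]
    return backend_ops / IOPS_PER_DISK

# The example above: 10,000 IOPS, 50/50 mix, RAID 6 -> 175.0 disks
print(disks_needed(10_000, 0.5, "RAID6"))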
Performance
• Bandwidth
– This is based on the speed of the connections from the host
to the array as well as how much oversubscription is taking
place within the SAN Fabric.
• Fibre Channel currently supports 16Gb full duplex, though 8Gb is more common
– That’s roughly 1600 MBps in each direction, transferring about 1.6GB of data each second one way or 3.2GB bi-directionally.
• FCoE currently supports 10Gb, though the roadmap includes 40Gb and 100Gb
– 10Gb is roughly 1200 MBps in each direction, while 100Gb is roughly 12000 MBps each way, about 23.4GB per second bi-directionally!
• Besides raw speed, there is the matter of oversubscription
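Before moving on to oversubscription, here is a small illustrative Python sketch of the bandwidth conversions above; the MBps values are the conventional per-direction throughput figures, and the GB/s output uses 1GB = 1024MB, so it lands slightly below the decimal numbers quoted above.

# Conventional usable throughput per direction (megabytes per second).
THROUGHPUT_MBPS = {
    "8Gb FC": 800,
    "16Gb FC": 1600,
    "10Gb FCoE": 1200,
    "100Gb Ethernet": 12000,
}

def full_duplex_gb_per_sec(link):
    """Bi-directional throughput in GB/s, with 1GB = 1024MB."""
    return 2 * THROUGHPUT_MBPS[link] / 1024

for link in THROUGHPUT_MBPS:
    print(f"{link}: {full_duplex_gb_per_sec(link):.1f} GB/s full duplex")
# 100Gb Ethernet prints 23.4 GB/s, the figure quoted above.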
Performance
• Oversubscription – The practice of providing less aggregate bandwidth than the environment could theoretically demand
• In an environment with 100 servers, each with dual 8Gb FC connections, we’d have a total of 1600Gb of potential bandwidth directed at a storage array via some SAN switch
• The storage array may only have a total of eight 8Gb FC
connections for 64Gb aggregated bandwidth
• We have a ratio of 1600:64 or 25:1.
– This is done in networking all the time and is now standard practice in the storage world.
– The assumption is that all 100 hosts will never need to transmit at their full bandwidth 100% of the time
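The same arithmetic as a small Python sketch, using the example figures from this slide.

# Oversubscription ratio: potential host bandwidth vs. array bandwidth.
def oversubscription(hosts, links_per_host, link_gb, array_ports, port_gb):
    host_bw = hosts * links_per_host * link_gb    # 100 * 2 * 8 = 1600Gb
    array_bw = array_ports * port_gb              # 8 * 8 = 64Gb
    return host_bw / array_bw

print(oversubscription(100, 2, 8, 8, 8))          # -> 25.0, i.e. 25:1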
Storage Futures
• Converged Infrastructure
– Datacenter designs today revolve around converged infrastructure
One HP Blade enclosure can encompass servers, networking, and
storage components that need to be configured in a holistic
manner
Virtualization has helped speed this convergence up, though
organizational design is usually still far behind.
– Storage arrays are beginning to support target-based zoning
The goal is to reduce the administration needed to configure a host-to-storage mapping, letting the storage array perform more intelligent administration without human intervention
Storage Futures
• Over the last few years storage has begun transitioning
from “big old iron” to distributed systems where data is
spread across multiple nodes for capacity and
performance.
– EMC Isilon
– HP Ibrix
– Nutanix
– VMware VSAN
– Nexenta
• As always in IT, the pendulum is swinging back to the
distributed platforms for storage where each node hosts a
small amount of data instead of a big platform hosting all
of the data.
Storage Futures
• Data protection is maturing beyond traditional RAID levels such as 1, 1+0, 5, 6, etc.
– RAID levels do offer additional protection; however, most of the time they don’t protect against corruption
– RAID levels also have performance implications that are usually negative for the applications residing upon them
• These days the solution is to create multiple copies of files or blocks based upon some rules (a placement sketch follows at the end of this slide)
– Most of the large public cloud providers use this approach, including Amazon S3 (Simple Storage Service)
It just so happens that, by default, anything stored in S3 keeps three copies!
• The ‘utopia’ world is a place where each application has some
metadata that controls what protection level and performance
characteristics are required
– This would enable these applications to run internally or externally
yet provide the same experience regardless.
– This is the essence of the SDDC, the Software Defined Data Center: application requirements will define where applications run, without any human intervention.
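As a purely illustrative sketch of rule-based copies, here is a hash-based replica placement function in Python; the node names and the placement scheme are hypothetical, with the replication factor of 3 echoing the S3 default mentioned above.

# Place each block's copies on N distinct nodes chosen from a hash
# of its key (hypothetical scheme, not any particular product's).
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]
REPLICAS = 3   # three copies, like the S3 default noted above

def place(block_key, nodes=NODES, replicas=REPLICAS):
    start = int(hashlib.md5(block_key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

print(place("volume7/block42"))   # e.g. ['node-c', 'node-d', 'node-e']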