Storage Management 2003

Hosted by
Storage Network Designs for OLTP
Business Continuity
Marc Farley
President, Building Storage Networks, Inc.
Agenda
 The Vendor Neutral Approach
 Overview of OLTP & High Availability
 I/O Redundancy Methods
 Storage Network Technologies
 Storage Networking for HA OLTP
Vendor Neutral Approach
 Generic terms, not vendor terms
 Assumes basic knowledge of SAN, NAS, and RAID
And now, for something completely different...
OLTP Environments
 Mission critical business applications
• Business in real-time
 Expensive equipment and software
 Aggressive performance objectives
 Highly skilled IT staff
• Hands-on computing operations
OLTP Database Software
 Oracle
• 8i Oracle Parallel Server (OPS)
• 9i Real Application Cluster (RAC)
 IBM
• DB2 UDB
• Informix
 MS SQL Server
 Sybase, MySQL, others
OLTP OS Platforms
 IBM S/390 MVS
 Unix Systems
 Windows 2000+
 HA Linux
OLTP Requirements
 99.999% uptime
 Non-degrading response time
 High transaction rates
 Seamless scalability
 Cost relief
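To put the 99.999% uptime figure in perspective, the short Python sketch below converts an uptime percentage into a yearly downtime budget. The function name and the 365-day year are illustrative assumptions, not part of the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime (in minutes) permitted by an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, "five nines" leaves a little over five minutes of total downtime per year, which is why every planned and unplanned outage has to be accounted for.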
Database Storage Approaches
 Raw partitions
• Bypass OS I/O buffering
 File system
• Facilitates data management
 NFS mounted
• Offloads the DB server, e.g. NetApp + Oracle
ACID Properties of OLTP
Atomicity – No partial transactions
Consistency – All tables are in a consistent state before and after a completed transaction
Isolation – One transaction cannot contaminate other transactions
Durability – Transactions are complete only when the database updates are written to disk storage
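As a small illustration of atomicity, the Python sketch below uses the standard library's sqlite3 module (not one of the OLTP databases named in this deck) to show a two-step transfer that either commits both updates or rolls both back. The table, account names, and amounts are made up for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

transfer(conn, "a", "b", 60)      # succeeds
try:
    transfer(conn, "a", "b", 60)  # would overdraw "a", so both updates are rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

The failed second transfer leaves no partial update behind, which is the "no partial transactions" property in miniature.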
Challenges of OLTP
 Major systems integration effort
• Intricate tuning and monitoring
• Little tolerance for errors
 Complex data structures & relationships
 Time and sequence-sensitive processes
• Must be adhered to for data integrity
 Shifting workloads and bottlenecks
OLTP Database Files
 Data files
• Database data, tablespaces
 Redo log files, archive log files
• Reconstruct or rollback transactions
 Control files
• File layout information
OLTP Table Space Storage
 Use many spindles to distribute hot spots
 RAID 0+1 recommended
 File system recommended over raw partitions
• Easier data management
Striping for Performance
[Diagram: a RAID controller (microsecond performance) striping across six disk drives (millisecond performance, from rotational latency and seek time)]
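The way striping spreads consecutive blocks across spindles can be sketched with a simple address mapping. This is a minimal Python illustration, assuming round-robin placement and a configurable stripe depth; real RAID controllers differ in the details, and the function name is hypothetical.

```python
def locate_block(logical_block: int, num_disks: int, stripe_blocks: int = 1):
    """Map a logical block to (disk index, block offset on that disk) for plain striping."""
    stripe_number, within_stripe = divmod(logical_block, stripe_blocks)
    disk = stripe_number % num_disks
    offset = (stripe_number // num_disks) * stripe_blocks + within_stripe
    return disk, offset


# Eight consecutive logical blocks on a six-drive stripe set
layout = [locate_block(block, num_disks=6) for block in range(8)]
print(layout)
```

Because neighbouring blocks land on different drives, a burst of I/O to a hot region is served by several spindles at once rather than queuing on one.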
My Personal Favorite, RAID 0+1
[Diagram: a RAID controller with five pairs of disk drives, numbered 1 to 5]
Mirrored Pairs of Striped Members
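A rough sketch of the layout in the diagram: each logical block is striped to one of the five pairs and written to both drives in that pair. The drive numbering (drives 0 and 1 form pair 0, and so on) and the function names are assumptions made for this Python example only.

```python
def mirrored_stripe_targets(logical_block: int, num_pairs: int = 5):
    """Return the two (drive, offset) targets that receive one logical block."""
    pair = logical_block % num_pairs       # striping across the pairs
    offset = logical_block // num_pairs
    first_drive = 2 * pair                 # hypothetical numbering: pair n uses drives 2n and 2n+1
    return [(first_drive, offset), (first_drive + 1, offset)]


def surviving_copies(logical_block: int, failed_drives, num_pairs: int = 5):
    """List the copies of a block that remain readable after drive failures."""
    return [
        target
        for target in mirrored_stripe_targets(logical_block, num_pairs)
        if target[0] not in failed_drives
    ]


print(mirrored_stripe_targets(7))
print(surviving_copies(7, failed_drives={4}))
```

Losing one drive of a pair still leaves a readable copy of every block that pair holds, while the striping keeps the hot spots spread across all five pairs.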
OLTP Redo Log Storage
 Raw partitions recommended
• Sequential high speed writes
 Separate mirror pairs per log file group
 Capacity for 30 – 60 minutes of data
 Goal is to limit disk contention for current and active log files
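The 30 to 60 minute guideline translates into a capacity figure once a redo generation rate is known. The Python arithmetic below is only a sketch; the 2 MB/s rate is a made-up number, and the real rate has to be measured on the system in question.

```python
def redo_log_capacity_mb(redo_rate_mb_per_s: float, minutes: float) -> float:
    """Capacity needed to hold the given number of minutes of redo data."""
    return redo_rate_mb_per_s * 60 * minutes


assumed_rate = 2.0  # MB/s, hypothetical
low = redo_log_capacity_mb(assumed_rate, 30)
high = redo_log_capacity_mb(assumed_rate, 60)
print(f"Plan for roughly {low:.0f} to {high:.0f} MB of redo log capacity")
```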
OLTP Archive Log Storage
 File system or NFS mounting is required
• NFS mounting is recommended
 Mirroring or RAID
 Goal is to have easy access in case they are needed for reconstruction
High Availability
 The ability of a system or application to immediately continue its mission after loss of or damage to system components, systems, facilities, and data
Availability Threats
 Expected
• Scaling limitations
 Processor
 Storage capacity
 Network
• Consolidations
• Product life cycles
 Unexpected
• Failures
• Bugs
• Virus
• Operator errors
• Disasters
HA Engages All Elements
 Systems
• Application
 Network connections
• Network services
 Storage and I/O subsystems
Scoping the Risks
Scope         System                    Network                      Storage
Component     HBA                       Cable                        Disk drive
System        Server                    Switch                       Subsystem
Pathological  Virus attack on platform  Service provider outage      Environmental media loss
Site          Server rooms gutted       All external communications  Total data loss
Managing the Risks
 Local copies of data
• Immediate availability
 (Remote) Nearby
• Immediate availability to several hours
 Remote Far away
• One to several days availability
Disaster/Availability Radii
Local
Remote Nearby
Remote Far Away
Nobody Expects…..
 Weird things to happen to them
 Disintegration of media
 Underground flooding through tunnels
 Fires in Telco switching centers
High Availability for OLTP
 Duplication of functions
• Without degrading performance
• Without risking data integrity
 Brute force techniques
 Automation and efficiency
 Cost is always an issue
• And high availability DOES cost
A Long Time Ago in a Job Not So Far Away...
[Cartoon: "Jedi" Jim Gast instructs Marc "Skyfaller" Farley]
Jim: You must learn to be a master of redundancy if you are going to be a storage geek.
Jim: Remember Marc, there is only one concept: REDUNDANCY!
Marc: Redundancy. Got it Jim.
Other speech bubbles: "Again!", "Whatever", "Let’s Eat!"
Eventually, I Learned to Appreciate His Teachings...
• REDUNDANCY
NSPoF (No Single Point of Failure)
Don’t get the giant spicy Polish for lunch – it’s too much for the digestion
OLTP HA Requires Complete Redundancy Protection
 Client network
 Server systems and components
 Application modules
 I/O Channels and Networks
 Storage subsystems and components
 Data
A Quick Look At Clustered Storage
Shared Nothing: Each server controls its own storage address space
Shared Everything: Both servers share control of a common storage address space
Examples of OLTP Clusters
Microsoft SQL Server
Oracle 9.1 RAC
Data is exchanged between servers
Failover paths only
Data is accessed directly from storage
One more time, with subsystems…
Microsoft SQL Server: Same subsystem but different address spaces
Oracle 9.1 RAC: All storage is shared by all cluster nodes
I/O Redundancy
 Host to subsystem
• Mirroring: Host to independent targets
• Multi-pathing: Host to a single target
 Subsystem to subsystem
• Store and forward:
 Local
 Remote
Disk Mirroring: Redundant Storage Targets
Independent, identically sized storage address spaces
One controller
Two controllers
Disk Mirroring: I/Os to 2 Targets
 “Brute force” redundancy: fast and simple
 Both read and write I/Os
• Overlapped reads for performance
 Local connections
 Limited capacity*
 I/O Bottlenecks* for random I/O activity
* if targets are disk drives
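The read and write behaviour listed above can be sketched as a toy host-based mirror in Python: every write goes to both targets, and reads alternate between them so they can overlap. The class, its in-memory "targets", and the simple alternation policy are illustrative assumptions, not a description of any particular volume manager.

```python
class MirroredVolume:
    """Toy host-based mirror over two independent, identically sized targets."""

    def __init__(self, num_blocks: int):
        self.targets = [[None] * num_blocks, [None] * num_blocks]
        self.reads_per_target = [0, 0]
        self._next_reader = 0

    def write(self, block: int, data: bytes) -> None:
        # A write is complete only after both targets hold the data.
        for target in self.targets:
            target[block] = data

    def read(self, block: int) -> bytes:
        # Alternate between the two members so reads are spread across both.
        index = self._next_reader
        self._next_reader = 1 - self._next_reader
        self.reads_per_target[index] += 1
        return self.targets[index][block]


vol = MirroredVolume(num_blocks=4)
vol.write(2, b"row")
results = [vol.read(2) for _ in range(4)]
print(results, vol.reads_per_target)
```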
Disk Mirroring for Redo Log Files
 Log files are a common bottleneck
 Use raw partitions
 Redundancy is required
• Mirroring is adequate
 Use highest RPM with lowest seek times
 Put on a separate channel from database I/O
 Use separate mirrored pairs per group
Mirroring to Storage Subsystems
Independent, identically sized storage address spaces
[Diagram: two controllers mirroring to two storage subsystems]
Mirroring to Subsystems
 Targets are subsystems, not disks
• Separate address spaces
 Capacity scales to subsystem max
 Double level redundancy
• Mirroring plus RAID
 Multiple disk spindles reduce I/O bottlenecks
Disk Mirroring Datafiles from Host to
Storage Subsystems
 Disk mirroring + subsystem RAID
 Excellent capacity scaling
 Adjacent and across campus/town
• One subsystem outside site radius
 Requires longer distance cabling
 Reads and writes both transmitted
Multi-Pathing: Redundant Paths Between
a Host & Subsystem
Pathing software detects that a transmission error has occurred and switches to a redundant path
Application
data volume
Multi-pathing vs Mirroring
 Mirroring assumes independent, but
similar storage targets
 Multi-pathing assumes multiple paths to
the exact same target
 Mirroring can use a single HBA; multi-pathing needs two HBAs
Path Failures
1. HBA problem
2. Link, switch or network problem
3. Subsystem controller problem
Application
data volume
Transmission failures recognized after SCSI timeouts are exceeded
I/O sent to storage
No ack received
The I/O is retried and eventually an error is passed back to the process that issued the I/O
Path Failover for OLTP I/O
 Redundant path resources take over activities for a failed path to sustain operations without disrupting service or risking data integrity
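The timeout, retry, and failover sequence can be sketched in a few lines of Python. Everything here is hypothetical: the PathTimeout exception stands in for an exceeded SCSI timeout, and each path is modelled as a plain function rather than a real HBA driver.

```python
class PathTimeout(Exception):
    """Stands in for an I/O whose acknowledgement never arrived in time."""


def send_with_failover(io_request, paths, retries_per_path=2):
    """Try each path in order; return (path name, result) from the first that acknowledges."""
    for name, send in paths:
        for _attempt in range(retries_per_path):
            try:
                return name, send(io_request)
            except PathTimeout:
                continue  # retry on the same path before failing over
    # Only when every path is exhausted does the error reach the issuing process.
    raise OSError("all paths to the target failed")


def broken_path(io_request):
    raise PathTimeout()


def healthy_path(io_request):
    return f"ack:{io_request}"


used, result = send_with_failover(
    "write-42", [("hba0", broken_path), ("hba1", healthy_path)]
)
print(used, result)
```

The application sees a single successful I/O; the failed attempts on the first path are absorbed by the pathing layer.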
Store and Forward
Independent, identically sized storage address spaces
[Diagram: a host connected to subsystem A, which forwards to subsystem B]
Store & Forward: One Host I/O and Two
Copies of Data
 Only real option for remote copies
 Does not forward read I/Os
 Proprietary protocols and methods
• Standards are emerging, e.g. FC/IP
 First step to storage snapshots
Store and Forward: Acknowledgements
Asynchronous: host I/O to A, A ACKs the host, then A forwards to B
Synchronous: host I/O to A, A forwards to B, B ACKs A, then A ACKs the host
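The difference between the two acknowledgement orders can be shown with a toy Python model. The class names and the in-memory dictionaries standing in for subsystems A and B are assumptions for illustration; no real replication product is being described.

```python
class Subsystem:
    def __init__(self):
        self.blocks = {}


class StoreAndForward:
    """Toy model of subsystem A forwarding host writes to subsystem B."""

    def __init__(self, synchronous: bool):
        self.a, self.b = Subsystem(), Subsystem()
        self.synchronous = synchronous
        self.pending = []  # writes already ACKed to the host but not yet on B

    def host_write(self, block, data) -> str:
        self.a.blocks[block] = data
        if self.synchronous:
            self.b.blocks[block] = data         # forward and wait for B before ACKing
        else:
            self.pending.append((block, data))  # ACK now, forward later
        return "ACK"

    def drain(self) -> None:
        """Forward any queued writes to B."""
        while self.pending:
            block, data = self.pending.pop(0)
            self.b.blocks[block] = data


sync_copy = StoreAndForward(synchronous=True)
sync_copy.host_write(1, b"txn")

async_copy = StoreAndForward(synchronous=False)
async_copy.host_write(1, b"txn")
lag_before_drain = len(async_copy.pending)
b_had_data_before_drain = 1 in async_copy.b.blocks
async_copy.drain()
```

In the asynchronous case the host has its ACK while B is still behind, which is exactly the window in which the state of the copy is uncertain.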
Trade-offs with
Acknowledgement Handling
 Synchronous
• Always preferred
• Slowest performance
• State of copy is precise
 Asynchronous:
• Fastest performance
• Least precise knowledge of copy status
Store & Forward:
Local and Remote Copies
 Local & nearby copy techniques
• Synchronous
• Fiber optic cabling, optical/DWDM services
 Remote-far away copy techniques
• Asynchronous
• ATM gateways, OC-12 or less, FC/IP
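One reason synchronous copying suits local and nearby sites while far-away sites use asynchronous copying is propagation delay. The Python sketch below estimates the round-trip time added to each synchronous write, assuming light travels through fiber at roughly 200 km per millisecond. It ignores switch, gateway, and protocol overhead, so real figures will be higher.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber


def sync_round_trip_ms(distance_km: float) -> float:
    """Minimum added latency per synchronous write: out to the copy and back."""
    return 2 * distance_km / FIBER_KM_PER_MS


for km in (1, 50, 1000):
    print(f"{km:>5} km adds at least {sync_round_trip_ms(km):.2f} ms per write")
```

At campus or metro distances the penalty is small compared with disk service times; at a thousand kilometres it is on the order of a full disk access for every write.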
Mirroring vs Synchronous Store and
Forward for Local & Nearby Copies
 Mirroring
• Async I/O
• Reads and writes
• No snapshot tie-in
• Uses more host slots
• Least costly
 Store and Forward
• Async or Sync I/O
• Writes only
• Snapshot ready
• May conserve host I/O slots
• Most costly
Combining Mirroring with Store and Forward
Store and Forward Radius
Local
Nearby
Mirroring Radius
Remote Far Away
Data Redundancy for OLTP
 Backup
 Snapshots
 Delta (log files)
Backup for OLTP
 A whole subject unto itself
 Disaster recovery primarily
 Cold? Who can afford to do that anymore?
 Hot – put DB in backup mode
 Backup snapshot image of data
Subsystem Snapshots for OLTP
1. Flush host buffers (sync, sync)
2. Create snapshot
[Diagram: a database server and disk storage subsystems A, B, and C]
Logical Snapshots for OLTP
1. The address space is mapped
2. First updates
3. Second updates
Overwritten data locations are not returned to the free space pool. (Undelete)
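One way to read these steps is as a redirect-on-write scheme: the snapshot is a frozen copy of the address map, and later updates go to new locations while the old ones stay allocated. The slide does not name the mechanism, so the Python sketch below is an interpretation under that assumption, with hypothetical class and method names.

```python
class SnapshotVolume:
    """Toy logical snapshot: a frozen address map plus never-freed old blocks."""

    def __init__(self):
        self.physical = []   # append-only pool of physical block contents
        self.live_map = {}   # logical block -> index into self.physical
        self.snapshots = []  # frozen copies of earlier address maps

    def write(self, logical: int, data: bytes) -> None:
        # New data lands in a new location; the overwritten one is not freed.
        self.physical.append(data)
        self.live_map[logical] = len(self.physical) - 1

    def snapshot(self) -> int:
        # Step 1: map the address space as it stands right now.
        self.snapshots.append(dict(self.live_map))
        return len(self.snapshots) - 1

    def read(self, logical: int, snapshot_id=None) -> bytes:
        mapping = self.live_map if snapshot_id is None else self.snapshots[snapshot_id]
        return self.physical[mapping[logical]]


vol = SnapshotVolume()
vol.write(0, b"v0")
snap = vol.snapshot()
vol.write(0, b"v1")  # first update
vol.write(0, b"v2")  # second update
print(vol.read(0), vol.read(0, snap))
```

The live volume sees the latest data, while the snapshot still resolves to the original block, which is what makes the "undelete" behaviour possible.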
Delta Redundancy with Log Files
 Recording of all transaction activities
 Roll forward, bring up to date
 Roll Backward, go to known good state
 Terrific tool for remote redundancy
 Not HA
 Process cannot have holes in it
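The "no holes" requirement can be made concrete with a small roll-forward routine that refuses to apply records past a gap in the sequence. This Python sketch uses a made-up record format of (sequence number, key, value); real redo logs are far richer.

```python
def roll_forward(state: dict, log_records, last_applied: int):
    """Apply log records in sequence order; stop with an error at the first gap."""
    state = dict(state)  # work on a copy so the caller's state is untouched
    for seq, key, value in sorted(log_records):
        if seq <= last_applied:
            continue  # already reflected in the state
        if seq != last_applied + 1:
            raise ValueError(f"log gap: expected {last_applied + 1}, got {seq}")
        state[key] = value
        last_applied = seq
    return state, last_applied


recovered, applied = roll_forward({"x": 1}, [(2, "y", 5), (1, "x", 2)], last_applied=0)
print(recovered, applied)

try:
    roll_forward({"x": 1}, [(1, "x", 2), (3, "y", 5)], last_applied=0)
    gap_detected = False
except ValueError:
    gap_detected = True
```

A missing record stops the recovery instead of silently producing a database that looks current but is not.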
Remote Redundancy w/ Log Files
d(x) = f(x) - f(x-1)
d(x): Latest redo log file
f(x-1): Previous instance
f(x): Current to log file switch checkpoint
Managing Redundancy is Hard Work
[Cartoon captions: "He never does anything except eat and sleep" and "How come I always end up doing all the work?"]
Redundancy is a way of life
And now, some thoughts from our sponsor...
SAN Considerations
 Fabrics and SAN Islands
 Zoning
 Switches and directors
 Multiplexing (oversubscribing)
 Security
Fabrics ARE the SAN Environment
 One size does not fit all applications
 Larger fabrics carry more risks
 VSANs are probably a good idea
 Only use switches supporting hot, stateful firmware upgrades
SAN Islands May be Best for OLTP
 Most risk averse approach
 Dual fabrics, one fabric per I/O path
 Switch problems do not cascade
 But, higher management costs
Zoning & OLTP
 All ports defined to zones
• No rogue ports and zombie zones
 Restrict access to current servers
• Need-to-access only
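A zoning audit along these lines could be sketched as follows. The definitions used here are assumptions for the example: a "rogue port" is a fabric port that belongs to no zone, and a "zombie zone" is a zone none of whose members exist on the fabric any more. The data structures are hypothetical and do not reflect any switch vendor's configuration format.

```python
def audit_zoning(fabric_ports, zones):
    """Return (rogue ports, zombie zones) for a fabric and its zone definitions."""
    fabric = set(fabric_ports)
    zoned = set()
    for members in zones.values():
        zoned |= set(members)
    rogue_ports = sorted(fabric - zoned)
    zombie_zones = sorted(
        name for name, members in zones.items() if not (set(members) & fabric)
    )
    return rogue_ports, zombie_zones


rogue, zombies = audit_zoning(
    fabric_ports=["p1", "p2", "p3"],
    zones={"oltp": {"p1", "p2"}, "old": {"p9"}},
)
print(rogue, zombies)
```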
Switches and Directors
 Redundancy eats slots and ports
• Pathing, mirroring
• Separate channels for data and logs
 Avoid traversing ISLs, if possible
• Added latency and blocking potential
• Trunking must have NSPoF
Security
 Admin security for an OLTP SAN should be as strong as possible
• No monkey business
 No default passwords left
 WAN encryption of log files
Recommendations:
 Determine OLTP availability needs
• Where copies should be, time to access
 Match storage network implementation to DB file types
 Develop availability-driven policies
• Equipment
• Processes