Storage Management 2003
Hosted by
Storage Network Designs for OLTP
Business Continuity
Marc Farley
President, Building Storage Networks, Inc.
Agenda
The Vendor Neutral Approach
Overview of OLTP & High Availability
I/O Redundancy Methods
Storage Network Technologies
Storage Networking for HA OLTP
Vendor Neutral Approach
Generic terms, not vendor terms
Assumes basic knowledge of SAN, NAS, and RAID
And now, for something
completely different…..
OLTP Environments
Mission critical business applications
• Business in real-time
Expensive equipment and software
Aggressive performance objectives
Highly skilled IT staff
• Hands-on computing operations
OLTP Database Software
Oracle
• 8i Oracle Parallel Server (OPS)
• 9i Real Application Clusters (RAC)
IBM
• DB2 UDB
• Informix
MS SQL Server
Sybase, MySQL, others
OLTP OS Platforms
IBM S/390 MVS
Unix Systems
Windows 2000+
HA Linux
OLTP Requirements
99.999% uptime
Non-degrading response time
High transaction rates
Seamless scalability
Cost relief
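As a rough worked figure for the 99.999% uptime requirement, the yearly downtime budget can be computed directly. This is a minimal sketch; the 365-day year and the function name are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Five nines leaves a little over five minutes per year, which is why planned outages for upgrades and consolidations count against the budget just as failures do.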
Database Storage Approaches
Raw partitions
• Bypass OS I/O buffering
File system
• Facilitates data management
NFS mounted
• Offload DB server, NTAP + Oracle
ACID Properties of OLTP
Atomicity – No partial transactions
Consistency – All tables are in a consistent state
before and after a completed transaction
Isolation – One transaction cannot contaminate other
transactions
Durability – Transactions are complete only when the
database updates are written to disk storage
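Atomicity can be illustrated with a small sketch using Python's built-in sqlite3 module. The table, account names, and `transfer` function are hypothetical, and an in-memory database is used for brevity, so durability is not demonstrated here.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move funds between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

ok = transfer(conn, "a", "b", 150)  # fails midway; the debit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The failed transfer leaves both balances untouched: there is no partial transaction, and the table is consistent before and after.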
Challenges of OLTP
Major systems integration effort
• Intricate tuning and monitoring
• Little tolerance for errors
Complex data structures & relationships
Time and sequence-sensitive processes
• Must be adhered to for data integrity
Shifting workloads and bottlenecks
OLTP Database Files
Data files
• Database data, tablespaces
Redo log files, archive log files
• Reconstruct or rollback transactions
Control files
• File layout information
OLTP Table Space Storage
Use many spindles to distribute hot spots
RAID 0+1 recommended
File system recommended over raw
partitions
• Easier data management
Striping for Performance
RAID controller (microsecond performance)
Disk drives (millisecond performance, from rotational latency and seek time)
[Diagram: one RAID controller striping data across six disk drives]
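How striping spreads hot spots across spindles can be sketched as an address mapping. The function name, the six-disk count, and the 16-block stripe unit are assumptions chosen for illustration, not values from the slide.

```python
def locate_block(logical_block: int, num_disks: int, stripe_unit_blocks: int):
    """Map a logical block to (disk index, block offset on that disk) for simple striping."""
    stripe_unit = logical_block // stripe_unit_blocks  # which stripe unit overall
    disk = stripe_unit % num_disks                     # units rotate across the disks
    row = stripe_unit // num_disks                     # full stripes before this unit
    offset = row * stripe_unit_blocks + logical_block % stripe_unit_blocks
    return disk, offset


for block in (0, 16, 95, 96, 100):
    print(block, locate_block(block, num_disks=6, stripe_unit_blocks=16))
```

Consecutive stripe units land on different drives, so a run of sequential or clustered accesses is serviced by several spindles in parallel rather than queuing on one.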
My Personal Favorite, RAID 0+1
RAID controller
[Diagram: five pairs of disk drives, numbered 1 to 5]
Mirrored Pairs of Striped Members
OLTP Redo Log Storage
Raw partitions recommended
• Sequential high speed writes
Separate mirror pairs per log file group
Capacity for 30 – 60 minutes of data
Goal is to limit disk contention for current
and active log files
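The 30 to 60 minute capacity guideline turns into a simple sizing calculation once the redo write rate is known. This is a sketch only: the 2 MB/s rate is a hypothetical figure and should be replaced with a rate measured on the target system.

```python
def redo_log_capacity_mb(redo_rate_mb_per_sec: float, minutes: float) -> float:
    """Capacity needed to hold `minutes` worth of redo at a sustained write rate."""
    return redo_rate_mb_per_sec * 60 * minutes


rate = 2.0  # MB/s, hypothetical; measure the real redo rate before sizing
low = redo_log_capacity_mb(rate, 30)
high = redo_log_capacity_mb(rate, 60)
print(f"Plan for {low:.0f} MB to {high:.0f} MB of redo log capacity")
```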
OLTP Archive Log Storage
File system or NFS mounting is required
• NFS mounting is recommended
Mirroring or RAID
Goal is to have easy access in case they
are needed for reconstruction
High Availability
The ability for a system or application to immediately
continue its mission after loss or damage to system
components, systems, facilities and data
Availability Threats
Expected
• Scaling limitations: processor, storage capacity, network
• Consolidations
• Product life cycles
Unexpected
• Failures
• Bugs
• Viruses
• Operator errors
• Disasters
HA Engages All Elements
Systems
• Application
Network connections
• Network services
Storage and I/O subsystems
Scoping the Risks
Scope                        System                    Network                      Storage
Component                    HBA                       Cable                        Disk drive
System                       Server                    Switch                       Subsystem
Pathological, Environmental  Virus attack on platform  Service provider outage      Media loss
Site                         Server rooms gutted       All external communications  Total data loss
Managing the Risks
Local copies of data
• Immediate availability
(Remote) Nearby
• Immediate availability to several hours
Remote Far away
• One to several days availability
Disaster/Availability Radii
Local
Remote Nearby
Remote Far Away
Nobody Expects…..
Weird things to happen to them
Disintegration of media
Underground flooding through tunnels
Fires in Telco switching centers
High Availability for OLTP
Duplication of functions
• Without degrading performance
• Without risking data integrity
Brute force techniques
Automation and efficiency
Cost is always an issue
• And high availability DOES cost
A Long Time Ago in a Job Not So Far Away…………….
[Cartoon: "Jedi Jim Gast" instructs "Marc Skyfaller Farley"; speech bubbles overlap in the original]
Jim: Remember Marc, there is only one concept: REDUNDANCY!
Jim: You must learn to be a master of redundancy if you are going to be a storage geek.
Marc: Redundancy. Got it, Jim.
Jim: Again!
Marc: Whatever. Let's eat!
Eventually, I Learned to Appreciate His Teachings……
Don't get the giant spicy Polish for lunch: it's too much for the digestion
NSPoF (No Single Point of Failure)
• REDUNDANCY
OLTP HA Requires Complete Redundancy
Protection
Client network
Server systems and components
Application modules
I/O Channels and Networks
Storage subsystems and components
Data
A Quick Look At Clustered Storage
Shared Nothing
• Each server controls its own storage address space
Shared Everything
• Both servers share control of a common storage address space
Examples of OLTP Clusters
Microsoft SQL Server
• Data is exchanged between servers
• Failover paths only
Oracle 9i RAC
• Data is accessed directly from storage
One more time, with subsystems…
Microsoft SQL Server
• Same subsystem but different address spaces
Oracle 9i RAC
• All storage is shared by all cluster nodes
I/O Redundancy
Host to subsystem
• Mirroring: Host to independent targets
• Multi-pathing: Host to a single target
Subsystem to subsystem
• Store and forward: local and remote
Disk Mirroring:
Redundant storage targets
Independent, identically sized storage
address spaces
One controller
Two controllers
Disk Mirroring: I/Os to 2 Targets
“Brute force” redundancy: fast and simple
Both read and write I/Os
• Overlapped reads for performance
Local connections
Limited capacity*
I/O bottlenecks* for random I/O activity
* if targets are disk drives
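The mirroring behavior described here, with every write going to both targets and reads overlapped across them, can be modeled with a toy class. The class name and the round-robin read policy are assumptions for illustration; real volume managers may choose the read target differently.

```python
class Mirror:
    """Toy host-based mirror: writes go to both targets, reads alternate between them."""

    def __init__(self):
        self.targets = [{}, {}]  # two independent address spaces (block -> data)
        self.reads = [0, 0]      # per-target read counters
        self._next = 0

    def write(self, block, data):
        for target in self.targets:  # every write is issued to both targets
            target[block] = data

    def read(self, block):
        i = self._next               # round-robin so reads are spread over both targets
        self._next = (i + 1) % 2
        self.reads[i] += 1
        return self.targets[i][block]


m = Mirror()
m.write(7, b"row")
results = [m.read(7) for _ in range(4)]
print(results, m.reads)
```

Writes cost two I/Os each, while the read load is halved per target, which is the trade the slide calls fast and simple brute force.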
Disk Mirroring for Redo Log Files
Log files are a common bottleneck
Use raw partitions
Redundancy is required
• Mirroring is adequate
Use highest RPM with lowest seek times
Put on a separate channel from database I/O
Use separate mirrored pairs per group
Mirroring to Storage Subsystems
[Diagram: host with two controllers, each connected to its own storage subsystem]
Independent, identically sized storage address spaces
Mirroring to Subsystems
Targets are subsystems, not disks
• Separate address spaces
Capacity scales to subsystem max
Double level redundancy
• Mirroring plus RAID
Multiple disk spindles reduce I/O bottlenecks
Disk Mirroring Datafiles from Host to
Storage Subsystems
Disk mirroring + subsystem RAID
Excellent capacity scaling
Adjacent and across campus/town
• One subsystem outside site radius
Requires longer distance cabling
Reads and writes both transmitted
Multi-Pathing: Redundant Paths Between
a Host & Subsystem
Pathing software determines that a transmission error has occurred
and switches to a redundant path
Application
data volume
Multi-pathing vs Mirroring
Mirroring assumes independent, but
similar storage targets
Multi-pathing assumes multiple paths to
the exact same target
Mirroring can use a single HBA; multi-pathing needs two HBAs
Path Failures
1. HBA problem
2. Link, switch or network problem
3. Subsystem controller problem
Application
data volume
Transmission failures recognized
after SCSI timeouts are exceeded
I/O sent to storage
No ack received
The I/O is retried and eventually an
error is passed back to the process
that issued the I/O
Path Failover for OLTP I/O
Redundant path resources take over activities for a failed
path to sustain operations without disrupting service or
risking data integrity
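The retry-then-failover sequence can be sketched as follows. Everything here is a stand-in: `send_io`, the path dictionaries, and the retry count are hypothetical, and a real multi-pathing driver works at the SCSI layer with actual timeouts rather than a health flag.

```python
class PathTimeout(Exception):
    """Raised when no acknowledgement arrives before the timeout."""


def send_io(path, request):
    """Stand-in for issuing an I/O on one path; a failed path never acknowledges."""
    if not path["healthy"]:
        raise PathTimeout(path["name"])
    return f"ack:{request} via {path['name']}"


def issue_with_failover(paths, request, retries_per_path=2):
    """Retry on the current path, then fail over to the next redundant path."""
    for path in paths:
        for _ in range(retries_per_path):
            try:
                return send_io(path, request)
            except PathTimeout:
                continue  # retry the same path before giving up on it
    raise IOError("all paths failed")  # only now does the caller see an error


paths = [{"name": "hba0", "healthy": False}, {"name": "hba1", "healthy": True}]
print(issue_with_failover(paths, "write-42"))
```

The issuing process sees an error only when every path is exhausted; a single path failure is absorbed below the application.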
Store and Forward
Independent, identically sized
storage address spaces
[Diagram: host writes to subsystem A, which forwards to subsystem B]
Store & Forward: One Host I/O and Two
Copies of Data
Only real option for remote copies
Does not forward read I/Os
Proprietary protocols and methods
• Standards are emerging, e.g. FC/IP
First step to storage snapshots
Store and Forward: Acknowledgements
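Asynchronous
1. I/O from host to A
2. ACK from A to host
3. A forwards to B
Synchronous
1. I/O from host to A
2. A forwards to B
3. ACK from B to A
4. ACK from A to host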
Trade-offs with
Acknowledgement Handling
Synchronous
• Always preferred
• Slowest performance
• State of copy is precise
Asynchronous:
• Fastest performance
• Least precise knowledge of copy status
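The difference in copy-state precision can be shown with a small simulation of a link failure. The function and its parameters are hypothetical, and forwarding is modeled as instantaneous while the link is up, which understates the lag a real asynchronous copy would have.

```python
def replicate(writes, synchronous, link_fails_after):
    """Return (writes acknowledged to the host, writes present on remote copy B).

    `link_fails_after` is the number of writes forwarded before the link drops.
    """
    acked, on_b = [], []
    for n, w in enumerate(writes):
        link_up = n < link_fails_after
        if link_up:
            on_b.append(w)  # A forwards the write and B stores it
        if synchronous and not link_up:
            break           # no ACK from B, so the host is never acknowledged
        acked.append(w)     # host receives its ACK
    return acked, on_b


writes = ["t1", "t2", "t3", "t4"]
print("sync :", replicate(writes, synchronous=True, link_fails_after=2))
print("async:", replicate(writes, synchronous=False, link_fails_after=2))
```

In the synchronous case the acknowledged writes always match what is on B; in the asynchronous case the host has been told four writes completed while B holds only two.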
Store & Forward:
Local and Remote Copies
Local & nearby copy techniques
• Synchronous
• Fiber optic cabling, optical/DWDM services
Remote-far away copy techniques
• Asynchronous
• ATM gateways, OC-12 or less, FC/IP
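One reason synchronous copies stay local or nearby is propagation delay, which is added to every write. The sketch below assumes roughly 5 microseconds per kilometer in optical fiber (light at about 200,000 km/s in glass) and ignores switch, gateway, and protocol overhead, so real figures would be higher.

```python
FIBER_DELAY_US_PER_KM = 5.0  # assumed: light travels roughly 200,000 km/s in glass


def sync_round_trip_ms(distance_km: float) -> float:
    """Propagation delay added to each synchronous write (forward plus ACK)."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000


for km in (1, 10, 100, 1000):
    print(f"{km:>5} km: +{sync_round_trip_ms(km):.2f} ms per write")
```

At campus distances the penalty is small next to disk latency; at a thousand kilometers it is comparable to or larger than a disk access, which pushes far-away copies toward asynchronous operation.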
Mirroring vs Synchronous Store and
Forward for Local & Nearby Copies
Mirroring
• Async I/O
• Reads and writes
• No snapshot tie-in
• Uses more host slots
• Least costly
Store and Forward
• Async or Sync I/O
• Writes only
• Snapshot ready
• May conserve host I/O slots
• Most costly
Combining Mirroring with Store and Forward
Store and
Forward Radius
Local
Nearby
Mirroring Radius
Remote Far
Away
Data Redundancy for OLTP
Backup
Snapshots
Delta (log files)
Backup for OLTP
A whole subject unto itself
Disaster recovery primarily
Cold? Who can afford to do that
anymore?
Hot – put DB in backup mode
Backup snapshot image of data
Subsystem Snapshots for OLTP
1. Flush host buffers (sync, sync)
2. Create snapshot
[Diagram: database server with disk storage subsystems A, B, and C]
Logical Snapshots for OLTP
1. The address space is mapped
2. First updates
3. Second updates
Overwritten data locations are not returned to the free space pool (undelete)
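The logical snapshot steps can be sketched with a toy volume in which every update is written to a new location and the old one is kept. The class and method names are hypothetical, and this sketch never reclaims space; a real implementation would free locations that no snapshot still references.

```python
class SnapshotVolume:
    """Toy volume where overwritten locations are kept, not freed, after a snapshot."""

    def __init__(self):
        self.store = {}       # physical location -> data
        self.live = {}        # logical block -> physical location
        self.snapshot = None  # frozen copy of the mapping
        self._next_loc = 0

    def write(self, block, data):
        loc = self._next_loc  # every update goes to a new location
        self._next_loc += 1
        self.store[loc] = data
        self.live[block] = loc  # the old location stays in `store`

    def take_snapshot(self):
        self.snapshot = dict(self.live)  # map the address space as it is now

    def read(self, block, from_snapshot=False):
        mapping = self.snapshot if from_snapshot else self.live
        return self.store[mapping[block]]


vol = SnapshotVolume()
vol.write(0, "v1")
vol.take_snapshot()
vol.write(0, "v2")  # first update after the snapshot
vol.write(0, "v3")  # second update
print(vol.read(0), vol.read(0, from_snapshot=True))
```

Because the snapshot is only a saved mapping, creating it is fast, and the pre-update data remains readable for backup while the database keeps running.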
Delta Redundancy with Log Files
Recording of all transaction activities
Roll forward, bring up to date
Roll Backward, go to known good state
Terrific tool for remote redundancy
Not HA
Process cannot have holes in it
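Rolling forward, and why the log sequence cannot have holes, can be sketched as below. The record layout `(sequence, key, value)` and the function name are assumptions for illustration; roll backward is not modeled.

```python
def roll_forward(state, log_records):
    """Apply redo records in sequence order; refuse to continue past a gap."""
    state = dict(state)  # work on a copy of the last known good state
    expected = None
    for seq, key, value in sorted(log_records):
        if expected is not None and seq != expected:
            raise ValueError(f"missing log sequence {expected}")
        state[key] = value
        expected = seq + 1
    return state


base = {"balance": 100}  # last known good copy at the remote site
log = [(1, "balance", 90), (2, "balance", 120), (3, "balance", 75)]
print(roll_forward(base, log))

try:
    roll_forward(base, [(1, "balance", 90), (3, "balance", 75)])
except ValueError as err:
    print("cannot roll forward:", err)
```

A single missing log file stops recovery at the gap, which is why shipping logs to a remote site is a good redundancy tool but not a high availability mechanism on its own.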
Remote Redundancy w/ Log Files
d(x) = f(x) - f(x-1)
d(x): latest redo log file
f(x-1): previous instance
f(x): current to log file switch checkpoint
Managing Redundancy is Hard Work
[Cartoon speech bubbles]
"He never does anything except eat and sleep"
"How come I always end up doing all the work?"
"Redundancy is a way of life"
And now, some thoughts from our sponsor…..
SAN Considerations
Fabrics and SAN Islands
Zoning
Switches and directors
Multiplexing (oversubscribing)
Security
Fabrics ARE the SAN Environment
One size does not fit all applications
Larger fabrics carry more risks
VSANs are probably a good idea
Only use switches supporting hot, stateful
firmware upgrades
SAN Islands May be Best for OLTP
Most risk averse approach
Dual fabrics, one fabric per I/O path
Switch problems do not cascade
But, higher management costs
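Whether two I/O paths really sit on separate fabrics can be checked by looking for components they share. The component names below are hypothetical, and each list stands for one host-to-subsystem path.

```python
def shared_components(path_a, path_b):
    """Return components that appear on both I/O paths (single points of failure)."""
    return set(path_a) & set(path_b)


# Hypothetical component names; each list is one host-to-subsystem path.
dual_fabric = (["hba0", "switch-A", "ctrl0"], ["hba1", "switch-B", "ctrl1"])
one_fabric = (["hba0", "switch-A", "ctrl0"], ["hba1", "switch-A", "ctrl1"])

print(shared_components(*dual_fabric))
print(shared_components(*one_fabric))
```

An empty result means a switch problem on one path cannot take down the other; any shared name is a single point of failure.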
Zoning & OLTP
All ports defined to zones
• No rogue ports and zombie zones
Restrict access to current servers
• Need-to-access only
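A zoning audit along these lines can be sketched as a set comparison. The port and zone names are hypothetical, and treating a "zombie zone" as a zone whose members are no longer on the fabric is an assumption about what the slide means.

```python
def audit_zoning(fabric_ports, zones):
    """Return (ports in no zone, zones with no members currently on the fabric)."""
    zoned = set().union(*zones.values())
    unzoned_ports = sorted(set(fabric_ports) - zoned)
    stale_zones = sorted(
        name for name, members in zones.items()
        if not set(members) & set(fabric_ports)
    )
    return unzoned_ports, stale_zones


fabric_ports = ["p1", "p2", "p3", "p4"]                 # hypothetical port names
zones = {"oltp_db": {"p1", "p2"}, "old_app": {"p9"}}    # hypothetical zone config
print(audit_zoning(fabric_ports, zones))
```

Ports that come back unzoned are candidates for rogue access, and zones with no live members are configuration left over from servers that are no longer current.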
Switches and Directors
Redundancy eats slots and ports
• Pathing, mirroring
• Separate channels for data and logs
Avoid traversing ISLs, if possible
• Added latency and blocking potential
• Trunking must have NSPoF
Security
Admin security for an OLTP SAN should
be as strong as possible
• No monkey business
No default passwords left
WAN encryption of log files
Recommendations:
Determine OLTP availability needs
• Where copies should be, time to access
Match storage network implementation to
DB file types
Develop availability-driven policies
• Equipment
• Processes