What Influences Performance?

Download Report

Transcript What Influences Performance?

Avanade: Tid for oppetid
Bernt Lervik
Infrastructure Architect
Avanade
Avanade is the leading technology integrator
specialising in the Microsoft platform.
Our people help customers around the world
maximise their IT investment and create
comprehensive solutions that dive business
results.
Additional information can be found at
www.avanade.com
Agenda
Failover Clustering
Basic principles of failover clustering
Best practices for failover clustering
Best practices for geo clustering
Database Mirroring
How database mirroring works
What influences performance
Failover considerations
Deployment considerations
Best practices database mirroring
Basic Principle of Failover Clustering
Public Network
SQL Server Virtual Server
Microsoft
Cluster Server
Heartbeat
Node A
* Disks in a shared-nothing
configuration – each drive letter is
owned by only one node (at a time)
Microsoft
Cluster Server
Node B
Shared
Disk
Array*
Best Practices for Failover Clustering
Increasing reliability for site/server/database failure
Understand the technology – multiple nodes access a shared disk array
in a shared-nothing configuration
Pros
Provides redundancy at the server level
Supports automatic detection and automatic failover
Clients unaware of where SQL Server is running – SQL Server runs as a “Virtual
Server” on the Windows Cluster
Entire server is protected (there is ONLY one database/server, no external
dependencies)
Relatively easy once it’s set up and configured
Cons
Only *one* copy of the database
Single point of failure, failover when shared disk unaffected
No standby server (aka, no copy) for reporting
Proprietary hardware can be expensive
Only choose hardware from the Microsoft
Windows Server Catalog, Cluster Solutions:
http://www.microsoft.com/windows/catalog/server/default.aspx?subID
=22&xslt=categoryProduct&pgn=8b712458-b91c-4a7d-8695-23e9cd3ae95b
Best Practices for Geo Clustering
Increasing reliability for site/server/database
failure
Understand the Technology – based on a common hardware strategy
called remote mirroring, where disk activity is mirrored remotely to a
second copy of the database. Can cause the secondary system to be
corrupt if write order and block size are not preserved (be sure to choose
a supported configuration)
Pros
Provides zero to minimal data loss through redundant storage area
networks/arrays, providing redundancy at the system level
Removes single point of failure when compared to failover clustering
Cons
Performance may be impacted in synchronous (no data loss) configurations but
data loss is possible in high performance configurations (there’s your key trade-off
in a mirroring solution)
Proprietary hardware can be expensive
Only choose hardware from the Windows Server Catalog, Geographically
Dispersed Cluster Solution Category:
http://www.microsoft.com/windows/catalog/server/default.aspx?subID=22&xslt=categoryPro
duct&pgn=b55095f4-71f3-4b26-98b1-05f3a9506d0d
How Database Mirroring Works
No Mirroring
Application
Principal
Commit
3
1
SQL Server
2
>2
Log
Data
How Database Mirroring Works
Asynchronous Mirroring
Application
Principal
Mirror
Commit
3
1
>2
SQL Server
2
>2
Log
SQL Server
>>>2
Data
>>>2
>>2
Log
Data
How Database Mirroring Works
Synchronous Mirroring
Witness
Application
Principal
Mirror
Commit
5
1
2.1
SQL Server
2
4
>2
Log
SQL Server
Data
3
>3
Log
Data
Transaction Safety
Synchronous
SAFETY FULL (Default)
ALTER DATABASE <database name> SET SAFETY FULL
Guaranteed protection of data
High availability / High protection (with a witness)
Allows automatic failover (with a witness)
Asynchronous
SAFETY OFF
ALTER DATABASE <database name> SET SAFETY OFF
Potential loss of data in the event of failure
High Performance mode
Force service for failover
What Influences Performance?
Synchronous Mirroring
Witness
Application
Principal
The most important factor
is the log generation rate
Commit
Mirror
5
1
2.1
SQL Server
2
4
>2
Log
SQL Server
Data
3
>3
Log
Data
What Influences Performance ?
Log generation rate
Network latency and bandwidth
Transaction safety level
Number of concurrent user connections
Transaction size and volume
Hardware sizing (spindles spindles spindles)
Test Workloads
Characteristic
Workload1
Workload2
40
20
1000
20
Maximum think time between
transactions (sec)
4
0
Baseline (No Mirroring) %CPU
4
40
Baseline (No Mirroring) Transactions
/ sec
241
215
Baseline (No Mirroring) log
generation rate (KB / sec)
720
12000
Database size (GB)
Number of concurrent user
connections
Transaction Safety vs. Performance
Workload1
Transaction Throughput for Workload1
Transactions/sec
250
200
Marginal impact when log
generation rate is low
150
100
50
241
241
238
No Mirroring
Safety OFF
Safety FULL
0
Transaction Safety Levels
Transaction Safety vs. Performance
Workload2
Transaction Throughput for Workload2
More impact when log
generation rate is high
Transactions/sec
250
200
150
100
50
215
211
158
No Mirroring
Safety OFF
Safety FULL
0
Transaction Safety Levels
Impact of Network Latency
Synchronous with Workload1
250
12
200
10
8
150
6
100
4
50
2
0
0
2
14
20
50
100
200
Round Trip Time (ms)
Transactions/sec
Response Time (sec)
Response Time (sec)
Transactions / sec
Synchronous Mirroring with Network Latency for Workload1
Impact of Network Bandwidth
Synchronous with Workload1
Synchronous Mirroring with varied Network Bandwidth for
Workload1
20
250
15
200
150
10
100
5
50
0
0
1
10
100
1000
Network Bandwidth (Mbps)
Transactions/sec
Response Time (sec)
Response Time
(sec)
Transactions/sec
300
Failover Considerations
Failover is at a database level
No group / instance failover
Data outside the database is not propagated
Master: logins, user written stored procedure, etc.
MSDB: Jobs, histories, etc.
Events During an Automatic Failover
Database
available
Time to coordinate
with witness.
Failure
occurs
Fixed
overhead
Time to detect
failure.
Time
Failure
detected
Redo
Phase
Redo
Complete
Decide to
failover
Undo
Phase
The time from detecting the failure of the principal to the
time the mirror assumes the role of the principal is the
database failover time
Failure Detection for Automatic
Failover
Two different types of failures
SQL Server
Ping each other once a second
By default if 10 “pings” are missed, then declare a
failure
Outside SQL Server
Operating system
Network errors
IO errors
Process errors
Examples of Failures
Fast
SQL Server instance crashes
Endpoint closes port quickly
Network retry from partner quickly fails
OS says that the port is closed
Fast failure!
Failover begins in seconds
Examples of Failures
Not as fast
Catastrophic server failure
Power supply fails
Network retry from partner waits for timeout
SQL Server “ping” will most likely fail first
Failover begins in 10 seconds
Examples of Failures
Slower…
Someone pulls the log drive on principal
Pending IOs to the log drive queue up
SQL Server “pings” are working fine
After 20 seconds, SQL Server issues IO warning
After 40 seconds, SQL Server declares IO failure
Failover begins 40 seconds after log drive is
pulled
Examples of Failures
Either No failover or Fast failover
Database page fails checksum
Client connection is broken
Transaction rolls back automatically
No failover
Database page fails checksum
Transaction was in the middle of a rollback
Now the database is inconsistent
Database goes SUSPECT
Fast failover!!!
Issues with Extended Disconnects
Long Disconnects
Mirror unavailable → DISCONNECTED
Mirroring session suspended → SUSPENDED
Log records keep accumulating at the principal
Transaction log can NOT be truncated, even if you backup
transaction log
May eventually fill up the transaction log space and the database comes
to halt
Look at LOG_REUSE_WAIT_DESC column in sys.databases
RESUME the mirroring session, or break it (manually resynchronize via
backup/copy/restore, resume mirroring – just as when you setup
mirroring)
Deployment Considerations 2
Customer stories
Mission critical applications deploying synchronous with witness
For DR, customers deploy asynchronous with great success
Some customers want synchronous, but prefer manual failover
Multiple databases
Corporate IT policies demand human involvement
Start simple with asynchronous mirroring
Increase complexity as needed – one at a time
Turn on synchronous
Add a witness
Summary Performance Considerations
Applications generating more transaction log experience
higher performance impact with database mirroring
Applications with fewer connections experience more impact
on transaction throughput when synchronous mirroring is
turned on
Applications with smaller transaction size experience
relatively larger performance impact with database mirroring
Applications with low transaction log rate may sustain
acceptable throughput with slight reduction in network
bandwidth or slight increase in the network latency
Applications with high transaction log rate may experience
severe performance degradation with lower network
bandwidth or higher network latency
While using asynchronous mirroring, monitor send queue to
determine the possible data loss in the event of failure of the
principal
Best Practices for Database Mirroring
Increasing reliability for site/server/database
failure
Understand the technology – Communicates through a dedicated TCP
endpoint and continuously sends transactional information to the mirror
(copy) database
Pros
Provides zero to minimal data loss through a configured database mirroring partnership –
which includes a copy of the database, database redundancy
Removes single point of failure when compared to failover clustering
No hardware dependencies
Includes transparent client redirect for better client connection management
Cons
Performance may be impacted in synchronous (no data loss) configurations but data loss is
possible in high performance configurations (again, the key trade-off in any mirroring solution)
Understand the Configurations
High Availability-Synchronous mirroring, Automatic detection/failover, no
data loss
High Protection-Synchronous mirroring, manual failover, no data loss
High Performance-Asynchronous mirroring, manual failover,
some data loss possible
Summary Best Practices Recommendations
Start simple (asynchronous) and then gradually increase complexity to
synchronous without witness (therefore without automatic
detection/automatic failover) and then add the witness
If you are not interested in automatic failover, don’t setup a witness
Understand performance and availability requirements of the application
Synchronous database mirroring is “generally” not recommended for a
remote mirror
Keep the mirror prepared for a failover, but transferring the logins, jobs,
etc.
Test performance implications thoroughly before setting up in production
Test performance over network before deploying mirroring between two
geographically distant servers
Test failover with different failure scenarios