Local Checkpoint Tuning

0 to 60 in 3.1
Presented by,
MySQL AB® & O’Reilly Media, Inc.
Tyler Carlton
Cory Sessions
<Insert funny joke here>
The Project
 Medium-sized demographics data-mining project
 1,700,000+ user base
 Hundreds of data points per user
“Legacy” System – Why Upgrade?
 Main DB (External Users)
 Offline backup (Internal Users)
 Weekly manual copy backups
 Max of 3 simultaneous data pulls
 8hr+ data pull times for complex data pulls
 Random index corruption
Smaller is Better
On average, CPU usage with MySQL was 20% lower than with our old database solution.
Why We Chose MySQL Cluster
 Scalable
 Distributed processing
 5-9's (99.999%) reliability
 Instant data availability between internal & external users
What We Built – NDB Data Nodes
 8-node NDB cluster
Dual Core 2 Quad, 1.8 GHz
16 GB RAM (data memory)
6× RAID 10 SAS 15K RPM drives
What We Built – API & MGMT Nodes
 3 API nodes + 1 management node
Dual Core 2 Quad, 1.8 GHz
8 GB RAM
300 GB 7200 RPM (RAID 0)
NDB Issues with a Large Data Set
 NDB load times
Loading from backup: ~ 1 hour
Restarting NDB nodes: ~ 1 hour
Note: Load times differ depending on your data size
NDB Issues with a Large Data Set
 Indexing Issues
FORCE INDEX often needed (NDB picks the wrong index) – see the sketch below
Index creation/modification order matters (Seriously!)
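As a hedged illustration (the table and index names are hypothetical, not from the deck), FORCE INDEX overrides the optimizer when it picks the wrong index for an NDB table:

    -- Force the optimizer to use a specific index instead of the one
    -- it would otherwise choose for this NDB table.
    SELECT user_id, age, household_income
    FROM demographics FORCE INDEX (idx_zip_code)
    WHERE zip_code = '84601';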
 Local Checkpoint Tuning (see the config sketch below)
TimeBetweenLocalCheckpoints – 20 means 4 MB (4 × 2^20 bytes) of write operations between local checkpoints
NoOfFragmentLogFiles – number of sets of 4 × 16 MB redo log files
None are deleted until 3 local checkpoints have completed
On startup: local checkpoint buffers would overrun
RTFM (two, maybe three times)
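Both parameters live in the cluster's config.ini under [NDBD DEFAULT]. The values below are an illustrative sketch of this kind of tuning, not the exact settings used on this cluster:

    [NDBD DEFAULT]
    # 20 => a local checkpoint starts after 4 x 2^20 bytes (4 MB) of writes
    TimeBetweenLocalCheckpoints=20
    # Number of sets of 4 x 16 MB redo log files; size this so the redo
    # log is not exhausted before 3 local checkpoints can complete
    NoOfFragmentLogFiles=300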
NDB Network Issues
 Network transport packet size
The buffer would fill and overrun
This caused nodes to miss their heartbeats and drop
 This would happen when:
A backup was running
A local checkpoint was running at the same time
 Solved by: increasing the network packet buffer (see the sketch below)
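The deck does not name the exact parameter that was raised; one plausible knob for the transporter buffers in config.ini is SendBufferMemory (and ReceiveBufferMemory) in the [TCP DEFAULT] section. The values are illustrative only:

    [TCP DEFAULT]
    # Enlarge the transporter send/receive buffers so they do not
    # overflow when a backup and a local checkpoint overlap
    SendBufferMemory=8M
    ReceiveBufferMemory=8M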
Issues - IN Statements
 IN statements die with engine_condition_pushdown=ON once the IN list reaches roughly 10,000 values (hit this with zip codes)
 We really need engine_condition_pushdown=ON, but this broke it for us, so… we had to disable it (see the example below)
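As a hedged illustration (hypothetical table and column names), this is the shape of query that triggered the problem, and one way to disable pushdown for a session in the 5.1 era:

    -- Workaround: turn condition pushdown off for this session
    SET SESSION engine_condition_pushdown = OFF;

    -- An IN list of ~10,000 zip codes broke with pushdown enabled
    SELECT user_id
    FROM demographics
    WHERE zip_code IN ('84601', '84602', '84604' /* ...thousands more... */);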
Structuring Apps: Redundancy
 Redundant power supply + dual power sources
 Port trunking w/ redundant Gig-E switches
 # of NDB replicas: 2 (2×4 setup), 64 GB max data size (see the sketch below)
 MySQL (API node) heads: load balanced with automatic failover
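A hedged sketch of how the replica count and per-node memory map to config.ini (illustrative values; the deck does not show the actual file). With 8 data nodes, 2 replicas, and roughly 16 GB per node, usable data size tops out around 64 GB:

    [NDBD DEFAULT]
    # 2 replicas across 8 data nodes = 4 node groups (the "2x4 setup")
    NoOfReplicas=2
    # 8 nodes x ~16 GB / 2 replicas => ~64 GB usable data; in practice
    # leave headroom for IndexMemory and the OS
    DataMemory=16384M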
Structuring Apps: Internal Apps
 Ultimate goal: offload the data-intensive processing to the MySQL nodes
The Good Stuff: Stats!
Queries per Second (over 20 days)
 Average 1,100-1,500 queries/sec during our peak times
 Average 250 queries/sec
Website Traffic
Stats for March 2008
Net Usage: NDB Node
 All NDB data nodes have nearly identical network bandwidth usage
 MySQL (API) nodes use about 9 MB/s max under our current structure
 Totaling 75 MB/s during peak (600 Mbit/s)
Monitoring & Maintenance
 SNMP Monitoring: CPU, Network, Memory, Load, Disk
 Cron Scripts:
Node status & node-down notification
Backups
Database maintenance routines
The MySQL Clustering book provided the base for the scripts
Net Usage: Next steps…
 Dolphin NIC testing
 4-node test cluster
 4× overall performance
 Brand-new patch to handle automatic Ethernet failover / Dolphin failover (beta as of March 28)
Questions?
Contact Information
 Tyler Carlton
www.qdial.com
[email protected]
 Cory Sessions
CorySessions.com
OrangeSoda.com