CS300 Fall ’00


Self-Managing Cost Models
Shivnath Babu
Stanford University
Cost Models in Database Systems
• Conventional query optimization:
– Enumerate query plans
– Estimate physical cost of each plan (e.g., execution time,
total resources--CPU & I/O--required)
– Choose plan with minimum cost
• Estimation of physical cost is based on (operator)
cost models
• Very important to have fairly accurate cost models
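The enumerate / estimate / choose loop above can be sketched in a few lines. This is only an illustration: the `Plan` class, the plan names, and the numbers in `estimated_cost` are hypothetical and do not come from any real optimizer.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """A candidate query plan (hypothetical, for illustration only)."""
    name: str
    estimated_cost: float  # output of some operator cost model, arbitrary units

def choose_plan(plans):
    """Return the enumerated plan with the minimum estimated cost."""
    if not plans:
        raise ValueError("no plans were enumerated")
    return min(plans, key=lambda p: p.estimated_cost)

# Step 1: enumerate plans (made-up candidates).
# Step 2: each plan already carries a cost estimate from the cost model.
candidates = [
    Plan("full_scan", 1050.0),
    Plan("index_scan", 125.0),
    Plan("hash_join", 320.0),
]

# Step 3: choose the plan with minimum estimated cost.
best = choose_plan(candidates)
print(best.name)
```

The choice is only as good as `estimated_cost`: if the cost model is off, the minimum is taken over the wrong numbers, which is why accuracy matters.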
Current Approach to Deriving Cost Models
• Trial and error
• Classic: Linear combination of CPU cost & the
number of disk blocks accessed
• Sequential Vs. Random accesses
– Data layout, data access pattern
• Buffer pool hit ratio
– Buffer pool size, data access pattern, number of
concurrent queries
• L1, L2, L3 cache hit ratio
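As a rough sketch of the factors listed above, the classic linear model can be extended with separate weights for sequential and random I/O and a buffer pool hit ratio. The function name, the weights, and the way the hit ratio discounts I/O are all assumptions made for illustration; real systems calibrate these differently.

```python
def operator_cost(cpu_ops, seq_blocks, rand_blocks, hit_ratio=0.0,
                  w_cpu=0.001, w_seq=1.0, w_rand=4.0):
    """Linear cost model: CPU term plus I/O terms for buffer pool misses.

    All weights are hypothetical. hit_ratio is the assumed fraction of
    block requests served from the buffer pool (no disk access needed).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    io_cost = w_seq * seq_blocks + w_rand * rand_blocks
    return w_cpu * cpu_ops + miss_ratio * io_cost

# Same operator, cold buffer pool vs. a 50% hit ratio.
cold = operator_cost(cpu_ops=1000, seq_blocks=100, rand_blocks=10)
warm = operator_cost(cpu_ops=1000, seq_blocks=100, rand_blocks=10, hit_ratio=0.5)
print(cold, warm)
```

Even this small model has three weights and a hit ratio that depend on hardware, data layout, and concurrent load, which hints at why tuning by trial and error does not port well.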
Problems with the Current Approaches
• Growing importance of:
– Autonomic Computing
– Diverse data management needs in many new apps
– Non-monolithic uses of database software
– Better user experience (Ex: SLAs, progress bars)
• Current manual approach to cost model
management is a hindrance in this new world:
– Hard to port across system configurations (Ex: Local disk
Vs. RAID Vs. NAS Vs. Remote database) or workloads
– Complex, many lines of code, hard to maintain
– Assumptions (Ex: ignores interference across queries)
– Severely restricts auto-configuration and plug & play
Solution #1: Get Rid of Cost Models
• Use Eddies: no plan, no optimizer → no cost
models
• Jury is still out
#2: Automated Cost-Model Management
1. Bootstrapping--Start with:
• An overall objective (Ex: minimize execution time)
• A common-case model (Ex: CPU + Seq. I/O + Rand. I/O)
• A list of other factors that could affect cost (Ex: cache
misses, #concurrent processes)
2. Detect deviations from model during execution
• Ignore deviations resulting from stats. estimation errors
3. Troubleshoot online (challenging)
• Does the deviation matter?
• What is the cause? Use extra “probe queries”
4. Update model and test: Online what-if analysis
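A minimal sketch of steps 1, 2, and 4, under stated assumptions: the class name, the 20% deviation threshold, and the feature names are hypothetical, and the update rule is a normalized least-mean-squares step swapped in as a simple stand-in, since the slides do not prescribe an update method. Features are assumed to be counts measured at run time, so a deviation reflects the model rather than a statistics estimation error. Troubleshooting with probe queries (step 3) is not modeled.

```python
class SelfManagingCostModel:
    """Linear cost model that flags deviations and nudges its own weights."""

    def __init__(self, weights, threshold=0.2, step=0.5):
        self.weights = dict(weights)   # bootstrapped common-case model
        self.threshold = threshold     # relative error that counts as a deviation
        self.step = step               # update step size (0 < step <= 1)

    def predict(self, features):
        return sum(w * features.get(k, 0.0) for k, w in self.weights.items())

    def deviates(self, features, observed):
        """Step 2: does the observed cost differ from the prediction enough to matter?"""
        error = abs(observed - self.predict(features))
        return error > self.threshold * max(abs(observed), 1e-9)

    def update(self, features, observed):
        """Step 4: normalized LMS step that moves the prediction toward the observation."""
        error = observed - self.predict(features)
        norm = sum(features.get(k, 0.0) ** 2 for k in self.weights) + 1e-12
        for k in self.weights:
            self.weights[k] += self.step * error * features.get(k, 0.0) / norm

# Step 1: bootstrap with CPU + sequential I/O + random I/O (made-up weights).
model = SelfManagingCostModel({"cpu_ops": 0.001, "seq_blocks": 1.0, "rand_blocks": 4.0})
run = {"cpu_ops": 1000, "seq_blocks": 100, "rand_blocks": 10}
observed = 300.0  # hypothetical measured cost for this run

before = model.predict(run)
flagged = model.deviates(run, observed)
if flagged:
    model.update(run, observed)
after = model.predict(run)
print(before, flagged, after)
```

With `step=0.5` the update closes half of the gap between prediction and observation for this run; a real system would also test the revised model, for example through what-if analysis, before trusting it.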
Epilogue
• Related work:
– In data integration (e.g., CORDS-MDBS, Garlic)
– In main-memory databases (e.g., Monet)
– Not comprehensive or fully automated
• Self-managing cost models:
– A big step toward Autonomic Database Systems
– Will improve re-usability of DB software
– Should improve overall performance and user experience