Self-Tuning and Self-Configuring Systems
Zachary G. Ives
University of Pennsylvania
CIS 650 – Database & Information Systems
March 16, 2005
Administrivia
No class 3/21 – out of town
 Read and summarize the Natix paper for Wednesday 3/23
 Tomorrow, 3PM, Levine 101:
Shuchi Chawla, CMU, Path-Planning Algorithms
 Tomorrow, 4:30PM, Levine 307:
Sihem Amer-Yahia, AT&T Labs:
“Full-Text Querying in XML: A Little Bit of Standards and
Lots o' Research”
 Tuesday, 3/21, Levine 101:
Nick Feamster, MIT: Robust Internet Routing
Today’s Trivia Question
Midterm Mini-Retrospective
 We’ve now seen many of the major issues in
databases
 … Which are?
 Mike Stonebraker thinks we’ve run out of good
things to work on
 Is he right?
 What problems should people be working on now?
A Few of My Thoughts
(Please chime in with your own!)
 More automation
 Different data types
 “Schema mostly”, text, …
 Semantic reconciliation and mapping
 Perhaps we’ll never solve this, but we can clearly do better
 Uncertainty and inconsistency
 Probabilities, inconsistencies, different perspectives, …
 Truly scalable data sharing
 Can’t we share at the level of the Web?
 Two-way data exchange
 Streams and sensors
Self-Tuning Systems
 Databases are complicated!
 Schema design is hard
 Lots of “knobs” to tweak
 Need appropriate information
 Does the DB approach give us more ability to “self-tune” than some other approach (e.g., Java)?
What Would We Like to Auto-Tune?
 Query optimization – statistics, bad decisions, …
 The schema itself?
 Indices
 Auxiliary materialized views
 Data partitioning
 Perhaps logging?
What Are The Challenges in
Building Adaptive Systems?
 Really, a generalization of those in adaptive query
processing
 Information gathering – how do we get it?
 Extrapolating – how do we do this accurately and
efficiently?
 Sampling or piloting
 Minimizing the impact of mistakes if they happen
 Using app-specific knowledge
Who’s Interested in these Problems?
 Oracle:
 Materialized view “wizard”
 Microsoft “AutoAdmin”:
 Index selection, materialized view selection
 Stats on materialized views
 Database layout
 IBM SMART (Self-Managing And Resource Tuning):
 Histogram tuning (“LEO” learning optimizer)
 Partitioning in clusters
 Index selection
 Adaptive query processing
A Particular Instance:
Microsoft’s Index Tuning Wizard
 Why not let the system choose the best index combination(s) for a workload?
 The basic idea:
 Log a whole bunch of queries that are frequently run
 See what set of indices is best
 Why is this hard? Why not index everything?
 Create these indices with little or no human input
Possible Approaches
 Obviously: only consider indices that would be useful
 The optimizer can “tell” which indices it might use in executing a
query
 But that can still be a lot of indices!
 Can exhaustively compare all possible indices
 Note that indices can interact (esp. for updates)
 How do we compare costs and benefits of indices? (one option sketched below)
 Execute for real
 Use optimizer cost model with whatever stats we have
 Gather some stats (e.g., build histograms, sample) and use cost model
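
To make the cost-model option concrete, here is a minimal sketch in Python of comparing candidate configurations for a single query. Everything here is hypothetical: estimate_cost stands in for whatever the optimizer's cost model would return given a query and a set of available indices; it is not a real SQL Server API.

    # Hypothetical sketch: choose the cheapest index configuration for
    # one query by asking a cost model "what if these indices existed?".
    # estimate_cost(query, indices) is an assumed stand-in for the
    # optimizer's cost estimate.
    from itertools import combinations

    def best_configuration(query, candidate_indices, estimate_cost, k=2):
        """Return the cheapest configuration of at most k indices."""
        best, best_cost = frozenset(), estimate_cost(query, frozenset())
        for size in range(1, k + 1):
            for combo in combinations(candidate_indices, size):
                cost = estimate_cost(query, frozenset(combo))
                if cost < best_cost:
                    best, best_cost = frozenset(combo), cost
        return best, best_cost

The default limit of two indices per query mirrors the per-query restriction described on the next slide.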
SQL Server Architecture
Their Approach in More Detail
 For a workload of n queries:
 Generate a separate workload with each query
 Evaluate the candidate indices for this query to find the best
“configuration” – limited to 2 indices, 2 tables, single joins
 Candidate index set for workload is the union of all configurations
 Too expensive to enumerate all; use a greedy algorithm:
 Exhaustively enumerate (using optimizer) best m-index configuration
 Pick a new index I to add – the one that seems to save the most cost, compared both to adding some other I′ and to the current configuration
 Repeat until we’ve added “enough” (k) indices – greedy loop sketched below
 “Despite interaction among indices, the largest cost reductions often
result from indices that are good candidates by themselves”
 They iteratively expand to 2-column indices – the index on the leading column must itself be a desirable candidate for the 2-column index to be considered
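
A minimal sketch of the greedy loop described above, again in Python. workload_cost is an assumed stand-in that sums optimizer cost estimates over the logged workload given a set of built indices.

    # Hypothetical sketch of the greedy step: start from the exhaustively
    # chosen m-index seed, then repeatedly add whichever remaining
    # candidate index lowers estimated workload cost the most, stopping
    # at k indices or when no candidate helps.
    def greedy_index_selection(candidates, seed, workload_cost, k):
        chosen = set(seed)
        current_cost = workload_cost(chosen)
        while len(chosen) < k:
            best_index, best_cost = None, current_cost
            for idx in candidates - chosen:
                cost = workload_cost(chosen | {idx})
                if cost < best_cost:
                    best_index, best_cost = idx, cost
            if best_index is None:   # no remaining index saves cost
                break
            chosen.add(best_index)
            current_cost = best_cost
        return chosen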
How Many Candidates?
Savings Due to Considering Single Joins
Compared to Baseline
The baseline considers all indices during enumeration, using the greedy algorithm mentioned previously
Further Enhancements
 Use the tool for “what-if” analysis
 What if a table grows by a substantial amount? (sketched below)
 Supplement with extra info gathered from real query
execution
 Maybe we can “tweak” estimates for certain selectivities
 An attempt to compensate for the “exponential error”
problem
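
One way to picture the “what-if” mode: scale a table's statistics before re-running the analysis, so the cost model sees the hypothetical future table. A minimal sketch, where TableStats is an assumed record of the statistics the cost model consults, not an actual SQL Server structure:

    # Hypothetical sketch: derive the statistics a table would have
    # after growing by some factor, for use in "what-if" re-analysis.
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class TableStats:   # assumed shape of the stats the cost model reads
        rows: int
        pages: int

    def what_if_growth(stats: TableStats, factor: float) -> TableStats:
        return replace(stats,
                       rows=int(stats.rows * factor),
                       pages=int(stats.pages * factor))

    # e.g., feed what_if_growth(stats, 10.0) to the index-selection tool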
Where Next?
 Perhaps we can go further, automating database
design itself?
 How would we start to tackle this problem?
Next Time: XML
 The bridge between databases and the Web in
general