Self-Tuning and Self-Configuring Systems
Zachary G. Ives
University of Pennsylvania
CIS 650 – Database & Information Systems
March 16, 2005
Administrivia
No class 3/21 – out of town
Read and summarize the Natix paper for Wednesday 3/23
Tomorrow, 3PM, Levine 101:
Shuchi Chawla, CMU, Path-Planning Algorithms
Tomorrow, 4:30PM, Levine 307:
Sihem Amer-Yahia, AT&T Labs:
“Full-Text Querying in XML: A Little Bit of Standards and Lot's o' Research”
Tuesday, 3/21, Levine 101:
Nick Feamster, MIT: Robust Internet Routing
Today’s Trivia Question
Midterm Mini-Retrospective
We’ve now seen many of the major issues in databases
… Which are?
Mike Stonebraker thinks we’ve run out of good things to work on
Is he right?
What problems should people be working on now?
A Few of My Thoughts
(Please chime in with your own!)
More automation
Different data types
“Schema mostly”, text, …
Semantic reconciliation and mapping
Perhaps we’ll never solve this, but we can clearly do better
Uncertainty and inconsistency
Probabilities, inconsistencies, different perspectives, …
Truly scalable data sharing
Can’t we share at the level of the Web?
Two-way data exchange
Streams and sensors
Self-Tuning Systems
Databases are complicated!
Schema design is hard
Lots of “knobs” to tweak
Need appropriate information
Does the DB approach give us more ability to “self-tune” than some other approach (e.g., Java)?
What Would We Like to Auto-Tune?
Query optimization – statistics, bad decisions, …
The schema itself?
Indices
Auxiliary materialized views
Data partitioning
Perhaps logging?
What Are the Challenges in Building Adaptive Systems?
Really, a generalization of those in adaptive query processing
Information gathering – how do we get it?
Extrapolating – how do we do this accurately and efficiently?
Sampling or piloting
Minimizing the impact of mistakes if they happen
Using app-specific knowledge
Who’s Interested in these Problems?
Oracle:
Materialized view “wizard”
Microsoft “AutoAdmin”:
Index selection, materialized view selection
Stats on materialized views
Database layout
IBM SMART (Self-Managing And Resource Tuning):
Histogram tuning (“LEO” learning optimizer)
Partitioning in clusters
Index selection
Adaptive query processing
A Particular Instance:
Microsoft’s Index Tuning Wizard
Why not let the system choose the best index combination(s) for a workload?
The basic idea:
Log a whole bunch of queries that are frequently run
See what set of indices is best
Create these indices with little or no human input
Why is this hard? Why not just index everything?
Possible Approaches
Obviously: only consider indices that would be useful
The optimizer can “tell” which indices it might use in executing a query
But that can still be a lot of indices!
Could exhaustively compare all possible index sets
Note that indices can interact (esp. for updates)
How do we compare the costs and benefits of indices?
Execute for real
Use the optimizer cost model with whatever stats we have
Gather some stats (e.g., build histograms, sample) and use the cost model
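The second option above — ranking candidates with the optimizer’s cost model rather than real execution — can be sketched as follows. Here `estimate_cost` is a hypothetical stand-in for a what-if optimizer call (not SQL Server’s actual API); no index is built and no query runs.

```python
# Rank candidate indices by estimated savings over a workload, using only
# a (hypothetical) optimizer cost estimate. estimate_cost(query, config)
# returns the estimated cost of the query under an index configuration.

def rank_candidates(workload, candidates, estimate_cost):
    # Cost of the workload with no candidate indices at all.
    baseline = sum(estimate_cost(q, frozenset()) for q in workload)
    savings = {}
    for idx in candidates:
        with_idx = sum(estimate_cost(q, frozenset({idx})) for q in workload)
        savings[idx] = baseline - with_idx
    # Largest estimated saving first; a saving of 0 means the index never helps.
    return sorted(savings.items(), key=lambda kv: -kv[1])
```

Note this ranks each index in isolation; it deliberately ignores the index interactions the slide warns about, which is why the paper’s greedy step re-evaluates costs as indices are added.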
SQL Server Architecture
Their Approach in More Detail
For a workload of n queries:
Generate a separate workload containing each query
Evaluate the candidate indices for that query to find the best “configuration” – limited to 2 indices, 2 tables, single joins
The candidate index set for the workload is the union of all configurations
Too expensive to enumerate all subsets; use a greedy algorithm:
Exhaustively enumerate (using the optimizer) the best m-index configuration
Pick a new index I to add that seems to save the most cost relative to adding some other I′ or to the current configuration
Repeat until we’ve added “enough” (k) indices
“Despite interaction among indices, the largest cost reductions often result from indices that are good candidates by themselves”
They iteratively expand to 2-column indices – an index on the leading column must be desirable for the 2-column index to be desirable
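The greedy loop above can be sketched roughly as follows. This is a toy reconstruction, not the paper’s implementation: `cost_fn` stands in for the optimizer’s what-if cost estimate, and the index names are made up.

```python
from itertools import combinations

def total_cost(workload, config, cost_fn):
    """Estimated cost of the whole workload under an index configuration."""
    return sum(cost_fn(q, config) for q in workload)

def greedy_select(workload, candidates, cost_fn, m=2, k=4):
    """candidates: a set of index names. Returns the chosen configuration."""
    # Step 1: exhaustively find the best configuration of at most m indices.
    best = min(
        (set(c) for r in range(m + 1) for c in combinations(candidates, r)),
        key=lambda cfg: total_cost(workload, cfg, cost_fn),
    )
    # Step 2: greedily add the single index with the largest estimated saving,
    # re-costing the workload each round (indices can interact).
    while len(best) < k:
        gains = {
            i: total_cost(workload, best, cost_fn)
               - total_cost(workload, best | {i}, cost_fn)
            for i in candidates - best
        }
        idx, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:          # no remaining index saves cost: stop early
            break
        best.add(idx)
    return best
```

The exhaustive first phase captures small interacting sets; the greedy phase relies on the slide’s observation that the largest reductions usually come from indices that are good on their own.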
How Many Candidates?
Savings Due to Considering Single Joins
Compared to Baseline
Baseline considers all indices during enumeration, with the greedy algorithm mentioned previously
Further Enhancements
Use the tool for “what-if” analysis
What if a table grows by a substantial amount?
Supplement with extra info gathered from real query execution
Maybe we can “tweak” estimates for certain selectivities
An attempt to compensate for the “exponential error” problem
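The “what if a table grows” question can be illustrated with a deliberately simplified cost model (the page and selectivity numbers are invented, and this is not SQL Server’s model): re-estimate plan costs at a larger cardinality and see whether the preferred access path flips.

```python
import math

def scan_cost(rows, rows_per_page=100):
    # A sequential scan reads every page of the table.
    return rows / rows_per_page

def index_cost(rows, selectivity=0.001):
    # B-tree descent plus one fetch per matching row.
    return math.log2(max(rows, 2)) + rows * selectivity

def best_plan(rows):
    """Which access path the toy model prefers at a given table size."""
    return "index" if index_cost(rows) < scan_cost(rows) else "scan"
```

Under these assumed constants the scan wins at 1,000 rows but the index wins at 100,000 — exactly the kind of crossover a what-if tool tries to predict before the table actually grows.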
Where Next?
Perhaps we can go further, automating database design itself?
How would we start to tackle this problem?
Next Time: XML
The bridge between databases and the Web in general