Query Optimization - Lyle School of Engineering

Download Report

Transcript Query Optimization - Lyle School of Engineering

Query Optimization
Allison Griffin
Importance of Optimization





Time is money
Queries are faster
Helps everyone who uses the server
Solution to speed lies in the algorithm
Different performance improvements with
different database engines and schemas
Brief History


Before 1970’s: Dark days, manual optimization
Late 70’s to mid 80’s:
–
–
–
–

Mid 80’s to early 90’s:
–

Birth of relational data model and declarative SQL
Optimization is job of system
System R-beginning work on join order optimization
Dynamic Programming: Heuristic Optimizers
Extensible query optimization (Exodus)
Mid 90’s to late 90’s:
–
Materialized Views
Volcano Extensible Query
Optimizer Generator

General purpose cost based query optimizer,
based on equivalence rules in algebra
–
–
–
Equivalences: join associativity, select push
down, aggregate push down
Extensible: new operations and equivalences can
be easily added
Developed by Graefe and McKenna 1993
Materialized Views

Can materialize (pre-compute and store) views
to speed up queries
–
Incremental maintenance

–
when database is updated, propagate updates to
materialized view without complete re-computation
Deciding when to use materialized views

even if query does not refer to materialized view, optimizer
can figure out it can be used
Deciding What to Materialize

Maintenance cost and query cost
–
Workload depends on what is materialized:



queries and update transactions
weights for each component of workload
Goal: find set of views that gives minimum
cost if materialized, subject to space
constraints
What we already know…

Query optimizer analyzes set of query
execution plans and gives optimal (least
cost)
–
–
Heavily dependent on optimizer’s estimate for
number of rows that will result at each step of
QEP
Estimates rely on statistics typically stored in
histograms
Recent Approaches to
Improve Statistics

Paper “Distinct-Value Synopses for Multiset
Operations” by Kevin Beyer, Rainer Gemulla,
Peter J. Haas, Berthold Reinwalk, and
Yannis Sismanis, 2007

IBM’s LEO (Learning Empirical Results in
Query Optimization), 2001
Summary of Paper Results



Addresses the problem of efficient estimate
of number of distinct values of an attribute
Builds on leveraging of randomized
algorithms
Claim to have unbiased estimator for distinct
values with lower mean squared error
–
Past attempts tend to by higher than the actual
number so they have come up with way to cut
that number down to be more reasonable
Distinct-Value Estimation

Propose summary structure (synopsis) for a
relation
–
–
–
Synopsis can be used to estimate number of DVs in the
partition
Synopses can be combined to create synopses for
compound partitions created from base partitions using
multiset union, intersection or difference operations
Updates can be performed on compound partitions by using
synopses from base relations
LEO - Learning Emperical Results in
Query Optimization



Autonomic feedback loops that create a selftuning database query optimizer
Self-validates and adjusts to improve query
optimization and execution without requiring
user interaction to repair incorrect statistics
or cardinality estimates
Reduces the total cost of owning database
management systems by simplifying
database administration
How it works




Monitors queries as they execute
Compares the optimizer’s estimates with
actuals at each step in a QEP
Then computes adjustments to its estimates
that may be used during future optimizations
of similar queries
Moreover, estimation errors can also trigger
re-optimization of a query in mid-execution.
Challenges in Research of LEO


(1) ensuring stability and convergence of the
autonomic system
(2) guaranteeing consistency of the overall
optimizer's model upon refinements
Results




Reduction of query execution time by orders
of magnitude at negligible additional run-time
cost
Reduced administration time
Fewer problem queries
Overall improved query performance with
increased robustness and predictability of
query response times
Bibliography






“LEO-Learning Empirical Results in Query Optimization.” IBM.
<http://domino.watson.ibm.com/comm/research.nsf/pages/r.datamgmt.innovatio
n.html>.
“Optimizing for Query Speed”. SQL.
<http://www.devshed.com/c/a/MySQL/Optimizing-for-Query-Speed/1/
“Optimizing Database Queries”. IBM.
<http://www.stevengould.org/portfolio/developerWorks/efficientPHP/waeffphp/wa-effphp-4-1.html>.
“Optimize Queries Theory in Practice”.
<http://www.serverwatch.com/tutorials/article.php/2175621/How-to-OptimizeQueries-Theory-an-Practice.htm>.
Beyer, Kevin, Gemulla, Rainer, Haas, Peter J., Reinwald, Berthold, Sismani,
Yannis. “Distinct-Value Synopses for Multiset Operations”. Communications of
the ACM. Vol. 52. October 2009.
Chaudhuri, Surajit. “Technical Perspective: Relational Query Optimization-Data
Management Meets Statistical Estimation”. Communications of the ACM. Vol.
52. October 2009.