Transcript Slide 1

Outline

SQL Server Optimizer
 Enumeration
architecture
 Search space: flexibility/extensibility
 Cost and statistics

Automatic Physical Tuning
 Database
Tuning Advisor
 New Directions
Running Example (TPC-H Database)
Primary Index
Secondary Index
Running Example (Query)
Sample query: Obtain information about certain ordered lineitems that are filtered by suppliers and parts.
SELECT l_orderkey, l_linenumber, o_orderstatus
FROM
lineitem JOIN orders ON l_orderkey = o_orderkey
WHERE l_suppkey < 2000 AND l_partkey < 2000
l_orderkey, l_linenumber, o_orderstatus
l_suppkey<2000, l_partkey<2000
l_orderkey=o_orderkey
lineitem
orders
SQL Server Query Optimizer

Based on Cascades Framework
 Transformation-based,
top-down approach
 Optimization = Tasks + Memo
( Programs


= Algorithms + Data Structures )
Fully cost-based
Flexible and Extensible
 Search
space easy to change
 New operators and rules easy to add
SQL
Query
Parsing,
Validation
Normalization
(predicate unfolding,
join collapsing, view
substitution, etc.)
Cost-based
Optimization
Execution
Plan
The Memo

Search Space Memory



Compactly stores all explored alternatives (AND-OR graph)
Groups together equivalent operator trees and their plans
Provides memoization, duplicate detection, property and cost
management, etc.
Groups
a<10
R
b>20
S
b>20
a<10
1) SELECT (a<10, )
R
R
Expressions
1) SELECT (a<10, )
2) JOIN (x=y, , )
3) ...
1) GET(R)
R
b>20
1) SELECT (b>20, )
S
S
S
1) GET(S)
1) JOIN (x=y, , )
2) ...
Optimization Tasks
Optimize Inputs
Initialize Memo
Optimize Root Group
Obtain optimization context
Optimize children groups
Calculate costs, pick best
Optimize Group
Apply Rule
Explore Group
Obtain enforcers and
implementation rules for all
expressions
Apply Rules by Promise
Pruning checks
Generate bindings, substitutes
Restrict ruleset
optimizing?Opt Inputs:Explore
Explore Group
For each expression
-Get Guidance
-Explore Expression.
Explore Expression
Apply-always rules
Explore Inputs
Rest of Apply-always Rules
Other Rules by Promise
SQL Server Optimizer: Summary

Transformation-based, top-down approach
No need for bottom-up interesting orders

Fully Cost-based
No separation into phases (heuristic+cost)

Flexible and Extensible
New operators, rules, and strategies are simple to add

Adaptive
 Automatic
statistics create and refresh
 Automatic optimization levels
 Physical Tuning
Problem Statement
Workload
Database
Physical
Design Tool
Configuration
Set of physical
structures (i.e.,
indexes and
views) that make
similar workloads
execute as fast as
possible
Challenges

Recommend a variety of physical structures.
 Indexes,


indexed views, partitions, XML indexes, etc.
Support space constraints, update queries.
Exceptionally large search space, especially for
materialized views and partitions.
 Strong
interaction among access paths.
 Merging needed: Optimal solution with suboptimal
parts.
 Cannot implement and test alternatives!

Industrial-strength quality (not trivial!)
Design Principles



Workload-driven: Take into account database
usage.
What-if API: Determine impact without actually
materializing physical design.
Don’t second-guess the query optimizer!
 An
index is useful only if the query optimizer decides
to use it.
 Don’t use external estimator of goodness, but the
optimizer-estimated cost.

Keep tool reasonably separated from today’s
optimizer (extensibility).
Database Tuning Advisor in Yukon
Workload
Compress
Workload
Tuning
Client
Candidate Selection
(per query)
Merging
What-if
API
Query
Optimizer
Database
Server
Metadata
…
Enumeration
Yes
Time?
No
Recommendation
- Create Hypothetical Index/View.
- Optimize Query with respect to
hypothetical configurations.
New Architecture
Workload
Get Optimal Configuration
(per query)
Tuning
Client
Relaxation
Yes
Requests
API
Request
Identification
What-if
API
Query
Optimizer
Time?
No
Recommendation


Instrumenting the query optimizer.
Search strategy based on relaxations.
Metadata
…
Database
Server
Related Bibliography





Surajit Chaudhuri
An Overview of Query Optimization in Relational Systems.
PODS 1998: 34-43
Goetz Graefe
The Cascades Framework for Query Optimization.
IEEE Data Eng. Bull. 18(3): 19-29 (1995)
Surajit Chaudhuri, Vivek R. Narasayya
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server.
VLDB 1997: 146-155
Sanjay Agrawal, Surajit Chaudhuri, Vivek R. Narasayya
Automated Selection of Materialized Views and Indexes in SQL Databases.
VLDB 2000: 496-505
Nicolas Bruno, Surajit Chaudhuri
Automatic Physical Database Tuning: A Relaxation-based Approach.
SIGMOD 2005