What is a Database System?
Download
Report
Transcript What is a Database System?
Query optimisation
Query optimisation
1
Query optimisation
Example - hospital database
Doctors
Name
Office
…
L. Johnson
Sur1-left
…
…
…
…
P. Thomson IC100
…
…
C. Craig
Int-100
100 tuples
Patients
2500 tuples
Name
Disease
D_name
…
C. Reed
M. Fox
C. Fish
…
…
P. Wolf
prk11
blood press
stomach-ul
…
…
kidn-fail
P. Thomson
L. Johnson
L. Johnson
…
…
U. Ulrich
…
…
…
…
…
…
2
Query optimisation
Query
Get the name of the doctors who treat patients suffering
from the prk11 disease
SELECT Doctors.name
FROM
Doctors, Patients
WHERE Disease = ‘prk11’ AND
D_name = Doctors.Name
3
Query optimisation
Evaluation #1
restrict Patients to those who suffer from prk11
• read: 2500 tuples; result: estimated 50 tuples; no need to
write intermediate result - sufficiently small
join above result with Doctors
• read: 100 tuples (Doctors); result 50 tuples; no need to
write to disk intermediate result
project result over Doctors.name
• the desired result is in the memory
estimated cost (read and write) 2600
4
Query optimisation
Evaluation #2
• suppose the internal memory allows only some 350 tuples
join Patients with Doctors
• read Patients in batches of 250 tuples; therefore read
Doctors 10 times; read: 2500 + 1000 = 3500; write
intermediate result (too big) to disk: 2500;
restrict above result
• read 2500; result: estimated 50 tuples;
project
cost: 8500 (read and write)
5
Query optimisation
Intermediate conclusions
the evaluation strategy (procedural aspect) can
lead to very big differences in computation time,
for the same query
• computation time: read from and write to disk (quintessential)
• processor time
the actual evaluation procedures are far more
complex than in the previous introductory
example
6
Query optimisation
Optimisation - what
deciding upon the best strategy of evaluating
a query
it is performed automatically by the optimiser
of the DBMS
not just for data retrieval operations, but for
updating operations as well (e.g. UPDATE)
not guaranteed to give the best result
7
Query optimisation
Optimisation - how
based on statistical information about the
specific database (not necessarily, though)
perform
expression transformation (cast query in some
internal form and convert to respective canonical form
candidate low level procedures selection
query plans generation and selection
statistical information - could you think of examples?
cardinality of base relations, indexes, ...
8
Query optimisation
Cast (transform) query in some internal form
internal format
• more suitable for automatic processing
• trees (syntax tree or query tree)
from a conceptual point of view is is easier to
assume that the internal format is relational
algebra
9
Query optimisation
Convert to canonical form
the initial expression is transformed into an
equivalent but more efficient form
• “efficient form” = efficient when executed
• these transformation are performed independently from
actual data values and access paths
10
Query optimisation
Expression transformation
examples
(A WHERE condition#1) WHERE condition#2
(A WHERE condition#1 AND condition#2)
(A [projection#1] ) [projection#2]
A [projection#2]
(A [projection]) WHERE condition
(A WHERE condition) [restriction]
11
Query optimisation
Expression transformation
distributivity
commutativity and associativity
idempotence
scalar expressions
conditional expressions
semantic transformation
12
Query optimisation
Set level operations
the operators of relational algebra are set
level
• i.e. they manipulate sets (relations) and not individual
tuples
however, these operators are implemented by
internal (DBMS) procedures
• these procedures, inherently, need tuple-access (in fact,
they need access to scalar values)
13
Query optimisation
Choose candidate low-level procedures
the optimiser decides how to execute the query
(expressed in canonical form)
• access paths are relevant at this stage
in the main, each basic operation (join , restriction, …)
has a set of procedures that implement it
• e.g. RESTRICTION - (1) on candidate key; (2) on indexed
key; (3) on other attributes …
• each procedure has associated a cost function (usually
based on the required I/O disk operations); these functions
are used in the next stage
14
Query optimisation
Implementing JOIN - examples
R and P - two relations to be joined
J - the attribute on which the (natural) join is
performed
R[i] and P[j] mean the i-th tuple of R and the
j-th tuple of P, respectively
R[i].J means the value of the attribute J for
the i-th tuple of the relation R
R has M and P has N tuples, respectively
15
Query optimisation
Implementing JOIN - brute force
for i:=1 to M
for j := 1 to N do
if R[i].J = P[j].J then
add joined tuple R[i]*P[j] to result
end
end
end
16
Query optimisation
Index lookup
index X on Patients.D_name
D_name
Johnson
Johnson
Johnson
Thomson
Thomson
U. Ulrich
Pointer
Name
Disease
D_name
…
C. Reed
M. Fox
C. Fish
M. Maria
P. Bosh
P. Wolf
prk11
blood press
stomach-ul
ear
nose
kidn-fail
Thomson
Johnson
Johnson
Thomson
Johnson
Ulrich
…
…
…
…
…
…
17
Query optimisation
Implementing JOIN - index lookup
/* index X on P.J */
for i:=1 to M
for j := 1 to K[i] do
add joined tuple (R[i] * PK[j]) to result
/* PK[j] represents the tuple of P
that K[j] points to */
end
end
18
Query optimisation
Choose the cheapest query plan
construct query plans (query evaluation plan)
•
•
•
•
combine candidate low level procedures
choose the cheapest
total cost = the sum of individual costs
individual costs depend on the actual data values;
estimates are used instead, based on statistical data
• usually not all possible evaluation procedures are
generated; the search space is reduced by applying
heuristics
19
Query optimisation
Database statistics - in the data dictionary
for each base table
• cardinality
• space occupied
• etc.
for each column of each base table
•
•
•
•
no of distinct values
maximum, minimum and average values
histogram of values
…
...
20
Query optimisation
An optimiser is never perfect
the following example is a real life example
suppose a Postgres definition for
• base relation: Treatment(Patient, Drug, Disease, …)
the query
• get all the drugs that are taken by patients that suffer from prk11
• (all the drugs, not only those for prk11)
SELECT DISTINCT Drug FROM Treatment
WHERE Patient IN
(SELECT Patient FROM Treatment
WHERE Disease = ‘prk11’) ;
the query is far slower that the equivalent one (next) ...
21
Query optimisation
An optimiser is never perfect
/* this query is faster than the previous one,
even though it seems to be performing more
computations - Patient is not unique! */
CREATE VIEW V_Treatment AS
SELECT * FROM Treatment
SELECT DISTINCT Treatment.Drug
FROM Treatment, V_Treatment
WHERE Treatment.Patient = V_Treatment.Patient
AND Disease = ‘prk11’ ;
22