What is a Database System?

Download Report

Transcript What is a Database System?

Query optimisation
Query optimisation
1
Query optimisation
Example - hospital database
Doctors
Name
Office
…
L. Johnson
Sur1-left
…
…
…
…
P. Thomson IC100
…
…
C. Craig
Int-100
100 tuples
Patients
2500 tuples
Name
Disease
D_name
…
C. Reed
M. Fox
C. Fish
…
…
P. Wolf
prk11
blood press
stomach-ul
…
…
kidn-fail
P. Thomson
L. Johnson
L. Johnson
…
…
U. Ulrich
…
…
…
…
…
…
2
Query optimisation
Query
Get the name of the doctors who treat patients suffering
from the prk11 disease
SELECT Doctors.name
FROM
Doctors, Patients
WHERE Disease = ‘prk11’ AND
D_name = Doctors.Name
3
Query optimisation
Evaluation #1
 restrict Patients to those who suffer from prk11
• read: 2500 tuples; result: estimated 50 tuples; no need to
write intermediate result - sufficiently small
 join above result with Doctors
• read: 100 tuples (Doctors); result 50 tuples; no need to
write to disk intermediate result
 project result over Doctors.name
• the desired result is in the memory
 estimated cost (read and write) 2600
4
Query optimisation
Evaluation #2
• suppose the internal memory allows only some 350 tuples
 join Patients with Doctors
• read Patients in batches of 250 tuples; therefore read
Doctors 10 times; read: 2500 + 1000 = 3500; write
intermediate result (too big) to disk: 2500;
 restrict above result
• read 2500; result: estimated 50 tuples;
 project
 cost: 8500 (read and write)
5
Query optimisation
Intermediate conclusions
 the evaluation strategy (procedural aspect) can
lead to very big differences in computation time,
for the same query
• computation time: read from and write to disk (quintessential)
• processor time
 the actual evaluation procedures are far more
complex than in the previous introductory
example
6
Query optimisation
Optimisation - what
 deciding upon the best strategy of evaluating
a query
 it is performed automatically by the optimiser
of the DBMS
 not just for data retrieval operations, but for
updating operations as well (e.g. UPDATE)
 not guaranteed to give the best result
7
Query optimisation
Optimisation - how
 based on statistical information about the
specific database (not necessarily, though)
perform
 expression transformation (cast query in some
internal form and convert to respective canonical form
 candidate low level procedures selection
 query plans generation and selection
 statistical information - could you think of examples?
 cardinality of base relations, indexes, ...
8
Query optimisation
Cast (transform) query in some internal form
 internal format
• more suitable for automatic processing
• trees (syntax tree or query tree)
 from a conceptual point of view is is easier to
assume that the internal format is relational
algebra
9
Query optimisation
Convert to canonical form
 the initial expression is transformed into an
equivalent but more efficient form
• “efficient form” = efficient when executed
• these transformation are performed independently from
actual data values and access paths
10
Query optimisation
Expression transformation
 examples
(A WHERE condition#1) WHERE condition#2
(A WHERE condition#1 AND condition#2)
(A [projection#1] ) [projection#2]
A [projection#2]
(A [projection]) WHERE condition
(A WHERE condition) [restriction]
11
Query optimisation
Expression transformation
 distributivity
 commutativity and associativity
 idempotence
 scalar expressions
 conditional expressions
 semantic transformation
12
Query optimisation
Set level operations
 the operators of relational algebra are set
level
• i.e. they manipulate sets (relations) and not individual
tuples
 however, these operators are implemented by
internal (DBMS) procedures
• these procedures, inherently, need tuple-access (in fact,
they need access to scalar values)
13
Query optimisation
Choose candidate low-level procedures
 the optimiser decides how to execute the query
(expressed in canonical form)
• access paths are relevant at this stage
 in the main, each basic operation (join , restriction, …)
has a set of procedures that implement it
• e.g. RESTRICTION - (1) on candidate key; (2) on indexed
key; (3) on other attributes …
• each procedure has associated a cost function (usually
based on the required I/O disk operations); these functions
are used in the next stage
14
Query optimisation
Implementing JOIN - examples
 R and P - two relations to be joined
 J - the attribute on which the (natural) join is
performed
 R[i] and P[j] mean the i-th tuple of R and the
j-th tuple of P, respectively
 R[i].J means the value of the attribute J for
the i-th tuple of the relation R
 R has M and P has N tuples, respectively
15
Query optimisation
Implementing JOIN - brute force
for i:=1 to M
for j := 1 to N do
if R[i].J = P[j].J then
add joined tuple R[i]*P[j] to result
end
end
end
16
Query optimisation
Index lookup
index X on Patients.D_name
D_name
Johnson
Johnson
Johnson
Thomson
Thomson
U. Ulrich
Pointer
Name
Disease
D_name
…
C. Reed
M. Fox
C. Fish
M. Maria
P. Bosh
P. Wolf
prk11
blood press
stomach-ul
ear
nose
kidn-fail
Thomson
Johnson
Johnson
Thomson
Johnson
Ulrich
…
…
…
…
…
…
17
Query optimisation
Implementing JOIN - index lookup
/* index X on P.J */
for i:=1 to M
for j := 1 to K[i] do
add joined tuple (R[i] * PK[j]) to result
/* PK[j] represents the tuple of P
that K[j] points to */
end
end
18
Query optimisation
Choose the cheapest query plan
 construct query plans (query evaluation plan)
•
•
•
•
combine candidate low level procedures
choose the cheapest
total cost = the sum of individual costs
individual costs depend on the actual data values;
estimates are used instead, based on statistical data
• usually not all possible evaluation procedures are
generated; the search space is reduced by applying
heuristics
19
Query optimisation
Database statistics - in the data dictionary
 for each base table
• cardinality
• space occupied
• etc.
 for each column of each base table
•
•
•
•
no of distinct values
maximum, minimum and average values
histogram of values
…
 ...
20
Query optimisation
An optimiser is never perfect
 the following example is a real life example
 suppose a Postgres definition for
• base relation: Treatment(Patient, Drug, Disease, …)
 the query
• get all the drugs that are taken by patients that suffer from prk11
• (all the drugs, not only those for prk11)
SELECT DISTINCT Drug FROM Treatment
WHERE Patient IN
(SELECT Patient FROM Treatment
WHERE Disease = ‘prk11’) ;
 the query is far slower that the equivalent one (next) ...
21
Query optimisation
An optimiser is never perfect
/* this query is faster than the previous one,
even though it seems to be performing more
computations - Patient is not unique! */
CREATE VIEW V_Treatment AS
SELECT * FROM Treatment
SELECT DISTINCT Treatment.Drug
FROM Treatment, V_Treatment
WHERE Treatment.Patient = V_Treatment.Patient
AND Disease = ‘prk11’ ;
22