E.age

Transcript E.age

Chapter 13: Query Processing
Database System Concepts
21.1
©Silberschatz, Korth and Sudarshan
Chapter 13: Query Processing
 Overview
 Measures of Query Cost
 Selection Operation
 Sorting
 Join Operation
2
Database System Concepts
21.2
©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
3
Database System Concepts
21.3
©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing
(Cont.)
 Parsing and translation
 translate the query into its internal form. This is then translated into
relational algebra.
 Parser checks syntax, verifies relations
 Evaluation
 The query-execution engine takes a query-evaluation plan, executes
that plan, and returns the answers to the query.
 Query Optimization: Amongst all equivalent evaluation plans
choose the one with lowest cost.
 Cost is estimated using statistical information from the
database catalog
 e.g. number of tuples in each relation, size of tuples, etc.
4
Database System Concepts
21.4
©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing :
Optimization
 A relational algebra expression may have many equivalent
expressions
 E.g., balance2500(balance(account)) is equivalent to
balance(balance2500(account))
 Each relational algebra operation can be evaluated using one of
several different algorithms
 Correspondingly, a relational-algebra expression can be evaluated in
many ways.
 Annotated expression specifying detailed evaluation strategy is
called an evaluation-plan.
 E.g., can use an index on balance to find accounts with balance < 2500,
 or can perform complete relation scan and discard accounts with
balance  2500
5
Database System Concepts
21.5
©Silberschatz, Korth and Sudarshan
Equivalent execution plans
select d.customer-name
from branch b, account a, depositor d
where b.branch-name = a.branch-name
and a.account-number = d.account-number
and b.branch-city = ‘Brooklyn’
6
Database System Concepts
21.6
©Silberschatz, Korth and Sudarshan
Measures of Query Cost
 Cost is generally measured as total elapsed time for
answering query
 Many factors contribute to time cost
 disk accesses, CPU, or even network communication
 Typically disk access is the predominant cost, and is also
relatively easy to estimate. Measured by taking into
account
 Number of seeks
* average-seek-cost
 Number of blocks read * average-block-read-cost
 Number of blocks written * average-block-write-cost
 Cost to write a block is greater than cost to read a block
– data is read back after being written to ensure that the
write was successful ( MAIS o CUSTO DAS
TRANSACÇÔES!!!)
7
Database System Concepts
21.7
©Silberschatz, Korth and Sudarshan
SORTING
8
Database System Concepts
21.8
©Silberschatz, Korth and Sudarshan
Sorting
 We may build an index on the relation, and then use the index to
read the relation in sorted order. May lead to one disk block
access for each tuple.
 For relations that fit in memory, techniques like quicksort can be
used. For relations that don’t fit in memory, external
sort-merge is a good choice.
9
Database System Concepts
21.9
©Silberschatz, Korth and Sudarshan
External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
Repeatedly do the following till the end of the relation:
(a) Read M blocks of relation into memory
(b) Sort the in-memory blocks
(c) Write sorted data to run Ri; increment i.
Let the final value of i be N
2. Merge the runs (next slide)…..
10
Database System Concepts
21.10
©Silberschatz, Korth and Sudarshan
External Sort-Merge (Cont.)
2. Merge the runs (N-way merge). We assume (for now)
that N < M.
1. Use N blocks of memory to buffer input runs, and 1 block to
buffer output. Read the first block of each run into its buffer
page
2. repeat
1. Select the first record (in sort order) among all buffer
pages
2. Write the record to the output buffer. If the output buffer
is full write it to disk.
3. Delete the record from its input buffer page.
If the buffer page becomes empty then
read the next block (if any) of the run into the buffer.
3. until all input buffer pages are empty:
11
Database System Concepts
21.11
©Silberschatz, Korth and Sudarshan
External Sort-Merge (Cont.)
 If N  M, several merge passes are required.
 In each pass, contiguous groups of M - 1 runs are merged.
 A pass reduces the number of runs by a factor of M -1, and
creates runs longer by the same factor.
 E.g. If M=11, and there are 90 runs, one pass reduces
the number of runs to 9, each 10 times the size of the
initial runs
 Repeated passes are performed till all runs have been
merged into one.
12
Database System Concepts
21.12
©Silberschatz, Korth and Sudarshan
Example: External Sorting Using Sort-Merge
13
Database System Concepts
21.13
©Silberschatz, Korth and Sudarshan
JOIN
14
Database System Concepts
21.14
©Silberschatz, Korth and Sudarshan
Nested-Loop (NL) Join
 To compute the theta join
r

s
for each tuple tr in r do begin
for each tuple ts in s do begin
test pair (tr,ts) to see if they satisfy the join condition 
if they do, add tr.ts to the result.
end
end
 r is called the outer relation and s the inner relation of the join.
 Requires no indices and can be used with any kind of join condition.
 Expensive since it examines every pair of tuples in the two relations.
 A tabela interior deve ser a mais pequena por causa do LRU (para estar
sempre em memória)
 Pode usar um indice para evitar-se aceder à tabela interior
15
Database System Concepts
21.15
©Silberschatz, Korth and Sudarshan
Block Nested-Loop Join
 Variant of nested-loop join in which every block of inner
relation is paired with every block of outer relation.
for each block Br of r do begin
for each block Bs of s do begin
for each tuple tr in Br do begin
for each tuple ts in Bs do begin
Check if (tr,ts) satisfy the join condition
if they do, add tr • ts to the result.
end
end
end
end
16
Database System Concepts
21.16
©Silberschatz, Korth and Sudarshan
Merge-Join
1. Sort both relations on their join attribute (if not already sorted on the join
attributes).
2. Merge the sorted relations to join them
1. Join step is similar to the merge stage of the sort-merge algorithm.
2. Main difference is handling of duplicate values in join attribute: every pair with
same value on join attribute must be matched
17
Database System Concepts
21.17
©Silberschatz, Korth and Sudarshan
Hash-Join Algorithm
The hash-join of r and s is computed as follows.
1. Partition the relation s using hashing function h. When
partitioning a relation, one block of memory is reserved as the
output buffer for each partition.
2. Partition r similarly.
3. For each i:
(a) Load si into memory and build an in-memory hash index on it
using the join attribute. This hash index uses a different hash
function than the earlier one h.
(b) Read the tuples in ri from the disk one by one. For each tuple tr
locate each matching tuple ts in si using the in-memory hash
index. Output the concatenation of their attributes.
Relation s is called the build input and
r is called the probe input.
18
Database System Concepts
21.18
©Silberschatz, Korth and Sudarshan
Hash-Join (Cont.)
 r tuples in ri need only to be compared with s tuples in si
Need not be compared with s tuples in any other partition,
since:
 an r tuple and an s tuple that satisfy the join condition will have
the same value for the join attributes.
 If that value is hashed to some value i, the r tuple has to be in ri
and the s tuple in si.
19
Database System Concepts
21.19
©Silberschatz, Korth and Sudarshan
Hash-Join (Cont.)
20
Database System Concepts
21.20
©Silberschatz, Korth and Sudarshan
Índices - revisão
 O que é:
 É uma estrutura auxiliar de acesso à informação
 Um índice definido sobre uma chave K, permite aceder
rapidamente aos registos contendo a chave K.
 Um índice tem 2 componentes:
 A que permite encontrar a chave K dentro do índice
 A que permite encontar a informação dos registos que contêm a
chave K
 A chave pode ser composta por vários atributos
 Podem existir vários índices sobre a mesma tabela/atributos
 A alteração/remoção/inserção do valor de K de algum registo na
tabela implica a re-ordenação da estrutura de índice.
21
Database System Concepts
21.21
©Silberschatz, Korth and Sudarshan
Índices -1
Ficheiro
de
Índice
Tabelas
Index
Entries
Define o algoritmo de procura
da chave
Data
Entries
Contêm o endereço dos
blocos de dados no disco
com uma dada chave
Data
Records
Registos dos Dados
Index
O algoritmo de procura da chave pode ser :
Entries
- Sequencial
- Hash Table
- B-tree
Database System Concepts
22
21.22
©Silberschatz, Korth and Sudarshan
Índices - 2
Define o algoritmo de procura
da chave
Index
Entries
Ficheiro
de
Índice
Data
Entries
Tabelas
Contêm o endereço dos blocos
de dados no disco
Data
Records
Data
Entries
Registos dos Dados
Contêm 1 de 3 hipóteses:
- O próprio Data record com chave k
- <k, endereço do data record no disco com chave k>
- <k, lista de endereços de data records no disco com
chave k>
23
Database System Concepts
21.23
©Silberschatz, Korth and Sudarshan
Índices - 3
Define o algoritmo de procura
da chave
Index
Entries
Ficheiro
de
Índice
Data
Entries
Tabelas
Data
Records
Data
Records
Database System Concepts
Contêm o endereço dos blocos
de dados no disco
Registos dos Dados
- Contêm os dados propriamente ditos
- Podem:
- não estar ordenados fisicamente pela
chave k (tal como figura em cima)
- estar ordenados fisicamente pela chave k
(tal como figura do próximo slide) 24
21.24
©Silberschatz, Korth and Sudarshan
Índices - 4
Define o algoritmo de procura
da chave
Index
Entries
Ficheiro
de
Índice
Data
Entries
Tabelas
Contêm o endereço dos blocos
de dados no disco
Data
Records
Data
Records
Registos dos Dados
- Quando os registos estão ordenados
fisicamente, diz-se que a tabela/indíce está
“Clustered”
25
Database System Concepts
21.25
©Silberschatz, Korth and Sudarshan
Types of Queries
1.
Point Query
3.
SELECT number
FROM accounts
WHERE balance > 10000;
SELECT balance
FROM accounts
WHERE number = 1023;
2.
Multipoint Query
Range Query
4.
Prefix Match Query
SELECT *
FROM employees
WHERE name = ‘Jensen’
and firstname = ‘Carl’
and age < 30;
SELECT balance
FROM accounts
WHERE branchnum = 100;
26
Database System Concepts
21.26
©Silberschatz, Korth and Sudarshan
Types of Queries
5.
Function Query
SELECT *
FROM accounts
WHERE balance =
max(select balance from accounts)
6.
7.
Grouping Query
SELECT branchnum, avg(balance)
FROM accounts
GROUP BY branchnum;
8.
Join Query
Ordering Query
SELECT distinct branch.adresse
FROM accounts, branch
WHERE
accounts.branchnum =
branch.number
and accounts.balance > 10000;
SELECT *
FROM accounts
ORDER BY balance;
27
Database System Concepts
21.27
©Silberschatz, Korth and Sudarshan
Heurísticas de criação de índices (1)
 Quando as queries forem simples.
 Se não os índices não são usados!!!! Exemplo, se um atributo
aparece como argumento de uma função os índices não são
normalmente usados.
 Quando os resultados representarem uma pequena proporção
do total da tabela (< 20%) ?
 Quando os valores distintos de uma coluna representarem uma
pequena proporção do total dos valores da coluna (<5%) ?
 Se houver condições com multiplos atributos:
 As chaves compostas devem que ser declaradas da mesma ordem
que usadas
 As chaves compostas podem ser usadas para incluir no Data
Entries a própria resposta à query..
28
Database System Concepts
21.28
©Silberschatz, Korth and Sudarshan
Heurísticas de criação de índices (2)
 Quando a chave for usada em condições nas Cláusulas de
Where.
 Condições de “range” sugerem índices sequenciais ou Btree
 Condições de igualdade sugerem índices hashed
 Quando a tabela for grande (o número de Data Records for
grande)
 Só se não existirem muitos outros índices na mesma tabela
29
Database System Concepts
21.29
©Silberschatz, Korth and Sudarshan
Heurísticas de criação de índices(3)
 Como um índice sobre a chave K:
 acelera as leituras de registos identificados por k
 acelera os updates de atributos (que não sejam parte de K) e que
sejam de registos identificados por k
 atrasa as inserções e eliminações de registos
 atrasa as alterações aos atributos que pertencem à chave K
Só será interessante quando o número acessos dos dois primeiros
cenários for bastante superior ao número de acessos dos dois
últimos cenários.
Regra por omissão é 1 para 5!!
30
Database System Concepts
21.30
©Silberschatz, Korth and Sudarshan
Heurísticas para a criação de Ìndices
“Clustered”
 Os índices Clustered
 implicam a re-ordenação física dos dados sempre que são
inseridos novos registos na tabela.
 Optimizam as pesquisas de “ranges” (intervalos)
 Logo só são interessantes quando:
 existem queires importantes com cláusulas de Where com “ranges”
 as insersões/modificações da chave são pouco frequentes (ou
então se o factor de compactação for muito baixo.)
31
Database System Concepts
21.31
©Silberschatz, Korth and Sudarshan
Heurísticas para a criação de Ìndices
“Bitmaps”
 Os índices bitmaps
 implicam a criação de um array de bits por cada valor discreto que
a coluna assume
 Logo só são interessantes quando:
 Os valores distintos de uma coluna são uma pequena % dos
valores totais da coluna (< 0,1%)
 as insersões/modificações da são reduzidas
32
Database System Concepts
21.32
©Silberschatz, Korth and Sudarshan
Agregração de várias tabelas


É possível definir um Índice Clustered de várias
tabelas simultâneamente.

BI=100
 Joao, Lisboa
Exemplo 1:
 Bx-23-45, Fiat
 00-23-IP, Rover
Pessoa(BI, Nome, Morada), Carro(Mat, marca, BI),

BI=110
100, Joao, Lisboa
Bx-23-45, Fiat, 100
 Pedro, Sintra
110, Pedro, Sintra
00-23-IP, Rover, 100
 XX-23-45, Fiat
200, Pedro, Sintra
XX-23-45, Fiat, 110
201, Rui, Almada
00-23-II, Opel, 110
33-23-II, Opel, 110
 00-23-II, Opel
 33-23-II, Opel
 00-23-II, Opel

 Pedro, Sintra
00-23-II, Opel, 110

Um Índice em BI clustered sobre Pessoa e Carro seria
armazenado da seguinte forma:
BI=200
BI=201
 Rui, Almada
33
Database System Concepts
21.33
©Silberschatz, Korth and Sudarshan
Indexes with Composite Search Keys
 Composite Search Keys: Search on a
combination of fields.

Examples of composite key
indexes using lexicographic order.
Equality query: Every field value is equal
to a constant value. E.g. wrt <sal,age>
index:
 age=20 and sal =75

11,80
11
12,10
12
Range query: Some field value is not a
constant. E.g.:
12,20
 age =20; or age=20 and sal > 10
13,75
<age, sal>
 Data entries in index sorted by search
key to support range queries.

Lexicographic order, or

Spatial order.
10,12
20,12
75,13
name age sal
bob 12
10
cal 11
80
joe 12
20
sue 13
75
<sal, age>
Data entries in index
sorted by <sal,age>
21.34
13
<age>
10
Data records
sorted by name
80,11
Database System Concepts
12
20
75
80
<sal>
Data entries
sorted by <sal>
©Silberschatz, Korth and Sudarshan
Multi-Attribute Index Keys
 To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal>
would be better than an index on age or an index on sal.

Such indexes also called composite or concatenated indexes.

Choice of index key orthogonal to clustering etc.
 If condition is: 20<age<30 AND 3000<sal<5000:

Clustered tree index on <age,sal> or <sal,age> is best.
 If condition is: age=30 AND 3000<sal<5000:

Clustered <age,sal> index much better than <sal,age> index!
 Composite indexes are larger, updated more often.
35
Database System Concepts
21.35
©Silberschatz, Korth and Sudarshan
Example 1
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname=‘Toy’ AND E.dno=D.dno
 Hash index on D.dname supports ‘Toy’ selection.

Given this, index on D.dno is not needed.
 Hash index on E.dno allows us to get matching (inner) Emp tuples for each
selected (outer) Dept tuple.
 What if WHERE included: `` ... AND E.age=25’’ ?

Could retrieve Emp tuples using index on E.age, then join with Dept tuples
satisfying dname selection. Comparable to strategy that used E.dno index.

So, if E.age index is already created, this query provides much less motivation
for adding an E.dno index.
36
Database System Concepts
21.36
©Silberschatz, Korth and Sudarshan
Example 2
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.sal BETWEEN 10000 AND 20000
AND E.hobby=‘Stamps’ AND E.dno=D.dno
 What index should we build on Emp?

B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only
one of these is needed, and which is better depends upon the selectivity of the
conditions.
 As a rule of thumb, equality selections more selective than range selections.
 Clearly, Emp should be the outer relation.

Suggests that we build a hash index on D.dno.
 As both examples indicate, our choice of indexes is guided by the plan(s)
that we expect an optimizer to consider for a query. Have to understand
37
optimizers!
Database System Concepts
21.37
©Silberschatz, Korth and Sudarshan
Examples of Clustering
 B+ tree index on E.age can be used to get
qualifying tuples.

How selective is the condition?

Is the index clustered?
SELECT E.dno
FROM Emp E
WHERE E.age>40

SELECT E.dno, COUNT (*)
If many tuples have E.age > 10, using E.age FROM Emp E
index and sorting the retrieved tuples may be WHERE E.age>10
costly.
GROUP BY E.dno

Clustered E.dno index may be better!
 Consider the GROUP BY query.
 Equality queries and duplicates:

SELECT E.dno
FROM Emp E
WHERE E.hobby=´Stamps´
38
Clustering on E.hobby helps!
Database System Concepts
21.38
©Silberschatz, Korth and Sudarshan
Clustering and Joins
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname=‘Toy’ AND E.dno=D.dno
 Clustering is especially important when accessing inner tuples.

Should make index on E.dno clustered.
 Suppose that the WHERE clause is instead:
WHERE E.hobby=‘Stamps´ AND E.dno=D.dno

If many employees collect stamps, Sort-Merge join may be worth considering. A
clustered index on D.dno would help.
 Summary: Clustering is useful whenever many tuples are to be retrieved.
39
Database System Concepts
21.39
©Silberschatz, Korth and Sudarshan
Index-Only Plans
<E.dno>
 A number of queries
can be answered
without retrieving any
tuples from one or
more of the relations
involved if a suitable
index is available.
<E.dno,E.eid>
Tree index!
SELECT D.mgr
FROM Dept D, Emp E
WHERE D.dno=E.dno
SELECT D.mgr, E.eid
FROM Dept D, Emp E
WHERE D.dno=E.dno
<E.dno>
SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno
<E.dno,E.sal>
Tree index!
SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
<E. age,E.sal> SELECT AVG(E.sal)
or
FROM Emp E
<E.sal, E.age> WHERE E.age=25 AND
40
Tree! E.sal BETWEEN 3000 AND 5000
Database System Concepts
21.40
©Silberschatz, Korth and Sudarshan

E.age

Transcript E.age

Directory