Transcript branch-name
Chapter 21:
Database System Concepts
21.1
©Silberschatz, Korth and Sudarshan
Índices e Queries
Database System Concepts
21.2
©Silberschatz, Korth and Sudarshan
Índices
Indices
Sequenciais, Hash,
Únicos e não únicos
primários
Simples e compostos
Vantagens e desvantagens dos indices
Declaração em SQL:
CREATE UNIQUE INDEX nome ON tabela(a1,....,an)
3
Database System Concepts
21.3
©Silberschatz, Korth and Sudarshan
Índices e Queries
Select A1,..,An from T where Ai = x
O índices tipo hash são melhores nestes casos
Select A1,..,An from T where Ai < x AND Ai > y
O índices tipo sequenciais são melhores nestes casos.
Select A1,..,An from T where Ai = x AND Aj = y
Podemos usar índices simples em Ai e Aj
Podemos usar índices compostos (Ai, Aj)
4
Database System Concepts
21.4
©Silberschatz, Korth and Sudarshan
Basic Concepts
Indexing mechanisms used to speed up access to desired data.
E.g., author catalog in library
Search Key - attribute to set of attributes used to look up
records in a file.
An index file consists of records (called index entries) of the
form
search-key
pointer
Index files are typically much smaller than the original file
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
5
Database System Concepts
21.5
©Silberschatz, Korth and Sudarshan
Index Evaluation Metrics
Access types supported efficiently. E.g.,
records with a specified value in the attribute
or records with an attribute value falling in a specified range of
values.
Access time
Insertion time
Deletion time
Space overhead
6
Database System Concepts
21.6
©Silberschatz, Korth and Sudarshan
Ordered Indices
Indexing techniques evaluated on basis of:
In an ordered index, index entries are stored sorted on the
search key value. E.g., author catalog in library.
Primary index: in a sequentially ordered file, the index whose
search key specifies the sequential order of the file.
Also called clustering index
The search key of a primary index is usually but not necessarily the
primary key.
Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called
non-clustering index.
Index-sequential file: ordered sequential file with a primary index.
7
Database System Concepts
21.7
©Silberschatz, Korth and Sudarshan
Hash Indices
Hashing can be used not only for file organization, but also for
index-structure creation.
A hash index organizes the search keys, with their associated
record pointers, into a hash file structure.
Strictly speaking, hash indices are always secondary indices
if the file itself is organized using hashing, a separate primary hash
index on it using the same search-key is unnecessary.
However, we use the term hash index to refer to both secondary
index structures and hash organized files.
8
Database System Concepts
21.8
©Silberschatz, Korth and Sudarshan
Example of Hash Index
9
Database System Concepts
21.9
©Silberschatz, Korth and Sudarshan
Comparison of Ordered Indexing and Hashing
Cost of periodic re-organization
Relative frequency of insertions and deletions
Is it desirable to optimize average access time at the expense of
worst-case access time?
Expected type of queries:
Hashing is generally better at retrieving records having a specified
value of the key.
If range queries are common, ordered indices are to be preferred
10
Database System Concepts
21.10
©Silberschatz, Korth and Sudarshan
Index Definition in SQL
Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch-name)
Use create unique index to indirectly specify and enforce the
condition that the search key is a candidate key is a candidate
key.
Not really required if SQL unique integrity constraint is supported
To drop an index
drop index <index-name>
11
Database System Concepts
21.11
©Silberschatz, Korth and Sudarshan
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select account-number
from account
where branch-name = “Perryridge” and balance = 1000
Possible strategies for processing query using indices on
single attributes:
1. Use index on branch-name to find accounts with balances of
$1000; test branch-name = “Perryridge”.
2. Use index on balance to find accounts with balances of $1000;
test branch-name = “Perryridge”.
3. Use branch-name index to find pointers to all records pertaining
to the Perryridge branch. Similarly use index on balance. Take
intersection of both sets of pointers obtained.
12
Database System Concepts
21.12
©Silberschatz, Korth and Sudarshan
Indices on Multiple Attributes
Suppose we have an index on combined search-key
(branch-name, balance).
With the where clause
where branch-name = “Perryridge” and balance = 1000
the index on the combined search-key will fetch only records
that satisfy both conditions.
Using separate indices in less efficient — we may fetch many
records (or pointers) that satisfy only one of the conditions.
Can also efficiently handle
where branch-name = “Perryridge” and balance < 1000
But cannot efficiently handle
where branch-name < “Perryridge” and balance = 1000
May fetch many records that satisfy the first but not the
second condition.
13
Database System Concepts
21.13
©Silberschatz, Korth and Sudarshan
Performance Tuning
Database System Concepts
21.14
©Silberschatz, Korth and Sudarshan
Performance Tuning
Adjusting various parameters and design choices to improve
system performance for a specific application.
Tuning is best done by
1. identifying bottlenecks, and
2. eliminating them.
Can tune a database system at 3 levels:
Hardware -- e.g., add disks to speed up I/O, add memory to
increase buffer hits, move to a faster processor.
Database system parameters -- e.g., set buffer size to avoid
paging of buffer, set checkpointing intervals to limit log size. System
may have automatic tuning.
Higher level database design, such as the schema, indices and
transactions (more later)
15
Database System Concepts
21.15
©Silberschatz, Korth and Sudarshan
Desempenho dos Sistemas
Como desempenho do Processadores actualmente,
porquê que ainda temos problemas com o desempenho?
Decomposição do tempo
gasto numa transacção
óptima:
10% – CPU
30% – Rede
30% – Base de Dados
30% – Idle (caso
contrário as queues
entram em “trash”)
Rede
Processador
& Memória
IO
Base de
Dados
Log
Com transacções
distribuídas, o cenário é
bastante pior
Tipicamente, 5 a 20 I/Os
por transacção
16
Database System Concepts
21.16
©Silberschatz, Korth and Sudarshan
Bottlenecks
Performance of most systems (at least before they are tuned)
usually limited by performance of one or a few components:
these are called bottlenecks
Transactions request a sequence of services
e.g. CPU, Disk I/O, locks
E.g. 80% of the code may take up 20% of time and 20% of code
takes up 80% of time
Worth spending most time on 20% of code that take 80% of time
Bottlenecks may be in hardware (e.g. disks are very busy, CPU
is idle), or in software
Removing one bottleneck often exposes another
De-bottlenecking consists of repeatedly finding bottlenecks, and
removing them
This is a heuristic
17
Database System Concepts
21.17
©Silberschatz, Korth and Sudarshan
Identifying Bottlenecks
With concurrent transactions, transactions may have to wait for a
requested service while other transactions are being served
Can model database as a queueing system with a queue for each
service
transactions repeatedly do the following
request a service, wait in queue for the service, and get serviced
Bottlenecks in a database system typically show up as very high
utilizations (and correspondingly, very long queues) of a particular
service
E.g. disk vs CPU utilization
100% utilization leads to very long waiting time:
Rule of thumb: design system for about 70% utilization at peak load
utilization over 90% should be avoided
18
Database System Concepts
21.18
©Silberschatz, Korth and Sudarshan
Queues In A Database System
19
Database System Concepts
21.19
©Silberschatz, Korth and Sudarshan
Tunable Parameters
Tuning of hardware
Tuning of schema
Tuning of indices
Tuning of materialized views
20
Database System Concepts
21.20
©Silberschatz, Korth and Sudarshan
Tuning of Hardware
Even well-tuned transactions typically require a few I/O
operations
Typical disk supports about 100 random I/O operations per second
Suppose each transaction requires just 2 random I/O operations.
Then to support n transactions per second, we need to stripe data
across n/50 disks (ignoring skew)
Number of I/O operations per transaction can be reduced by
keeping more data in memory
If all data is in memory, I/O needed only for writes
Keeping frequently used data in memory reduces disk accesses,
reducing number of disks required, but has a memory cost
21
Database System Concepts
21.21
©Silberschatz, Korth and Sudarshan
Hardware Tuning: Five-Minute Rule
Question: which data to keep in memory:
If a page is accessed n times per second, keeping it in memory saves
n*
price-per-disk-drive
accesses-per-second-per-disk
Cost of keeping page in memory
price-per-MB-of-memory
ages-per-MB-of-memory
Break-even point: value of n for which above costs are equal
If accesses are more then saving is greater than cost
Solving above equation with current disk and memory prices leads to:
5-minute rule: if a page that is randomly accessed is used
more frequently than once in 5 minutes it should be kept in
memory
(by buying sufficient memory!)
22
Database System Concepts
21.22
©Silberschatz, Korth and Sudarshan
Hardware Tuning: One-Minute Rule
For sequentially accessed data, more pages can be read per
second. Assuming sequential reads of 1MB of data at a time:
1-minute rule: sequentially accessed data that is accessed
once or more in a minute should be kept in memory
Prices of disk and memory have changed greatly over the years,
but the ratios have not changed much
so rules remain as 5 minute and 1 minute rules, not 1 hour or 1
second rules!
23
Database System Concepts
21.23
©Silberschatz, Korth and Sudarshan
RAID Levels
RAID 0: nonredudant striping
(Block striping)
C
C
RAID 1: mirroed disks
C
C
(Block striping)
P
RAID 3: Bit interleaved parity
RAID 5: Block interleaved distributed parity
P
P
P
P
P
24
Database System Concepts
21.24
©Silberschatz, Korth and Sudarshan
Hardware Tuning: Choice of RAID Level
To use RAID 1 or RAID 5?
Depends on ratio of reads and writes
RAID 5 requires 2 block reads and 2 block writes to write out one
data block
If an application requires r reads and w writes per second
RAID 1 requires r + 2w I/O operations per second
RAID 5 requires: r + 4w I/O operations per second
For reasonably large r and w, this requires lots of disks to handle
workload
RAID 5 may require more disks than RAID 1 to handle load!
Apparent saving of number of disks by RAID 5 (by using parity, as
opposed to the mirroring done by RAID 1) may be illusory!
Thumb rule: RAID 5 is fine when writes are rare and data is very
large, but RAID 1 is preferable otherwise
If you need more disks to handle I/O load, just mirror them since
disk capacities these days are enormous!
25
Database System Concepts
21.25
©Silberschatz, Korth and Sudarshan
Tuning the Database Design
Schema tuning
Vertically partition relations to isolate the data that is accessed most
often -- only fetch needed information.
• E.g., split account into two, (account-number, branch-name) and
(account-number, balance).
• Branch-name need not be fetched unless required
Improve performance by storing a denormalized relation
• E.g., store join of account and depositor; branch-name and
balance information is repeated for each holder of an account, but
join need not be computed repeatedly.
• Price paid: more space and more work for programmer to keep
relation consistent on updates
• better to use materialized views (more on this later..)
Cluster together on the same disk page records that would
match in a frequently required join,
compute join very efficiently when required.
26
Database System Concepts
21.26
©Silberschatz, Korth and Sudarshan
Tuning the Database Design (Cont.)
Index tuning
Create appropriate indices to speed up slow queries/updates
Speed up slow updates by removing excess indices (tradeoff between
queries and updates)
Choose type of index (B-tree/hash) appropriate for most frequent types
of queries.
Choose which index to make clustered
Index tuning wizards look at past history of queries and updates
(the workload) and recommend which indices would be best for the
workload
27
Database System Concepts
21.27
©Silberschatz, Korth and Sudarshan