Lecture - Department of Computing
Download
Report
Transcript Lecture - Department of Computing
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
3
Physical Design
1
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Introduction
Database Design Methodology
requirements specification
ER/EER modelling
validation of ER/EER models; aggregation of different views
transformation of ER/EER model into relational model
normalisation
physical design
monitor and tune operational system
2
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Outline
overview of physical design
base relations and enterprise constraints
• known from before
transactions analysis
file organisation and indexes
3
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
1
4
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Physical Design – Rationale
all the documentation produced until physical design
represents a detailed specification of what we intend
to build, of what is required
informally, physical design is the process that
transforms these specifications into a good working
system, using the functionality provided by the
chosen DMBS
5
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Overview of Physical Design
express logical model using the DDL language of the
chosen/target DBMS
design optimal storage
design user views
design security mechanisms
in next lecture
6
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Express Logical Model in Target DBMSd
essentially covered in the first term
design and implement base relations
analyse and document derived data
design enterprise constraints
7
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Design Base Relations
are domains supported?
are attribute constraints supported?
• NOT NULL, UNIQUE
are keys supported?
candidate
• primary
• alternate (UNIQUE + NOT NULL)
foreign
• referential integrity
• FK rules
8
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Analyse and Document Derived Data
derived attributes
not present in the relational model;
but present in the EER model
• it is possible that not all derived attributes are documented thus far
• then, this issue can be given further consideration at this point
trade-off
calculate a derived attribute each time it is being used
• may be too time-expensive
store it in the database
• redundancy, therefore more space required (space-expensive) and
possibility of inconsistencies
• to maintain consistency – integrity constraints of active rules; now
this may become time-expensive
9
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Design Enterprise Constraints
support for integrity constraints
e.g., SQL’s
• CONSTRAINT … CHECK … NOT EXISTS …
support for active rules or triggers
e.g., Postgres’:
• CREATE RULE … ON UPDATE TO … WHERE … DO …
example?
10
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Design Optimal Storage
criteria
maximise transaction throughput
maximise response time
minimise storage space
(they determine and are determined by the system resources)
issues
transactions analysis
file organisation and indexes
11
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
2
12
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Analyse Transactions
identify critical transactions to the functionality of the
information system
e.g., transactions that should never fail
identify transactions that put a significant load on the
DBMS
run frequently
processing time is high
identify the periods when the database system is
heavily used (down to individual transactions)
identify type of users by whom or locations from
where the database is going to be heavily used
13
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Map Transactions [Paths] to Relations
identifies mostly used
relations
matrix
transactions
relations
elements of the matrix:
• Y/N (i.e. used, not used)
• number of accesses
(per hour, day …)
T1
T2
T3
T4
I R UD
I R UD
I R UD
I R UD
rel 1
rel 2
rel 3
rel 4
rel 5
14
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Determine Frequency Information
determine average, maximum and minimum number
of times a transaction runs per hour, per day, …
transaction usage map
determine peak periods
determine demanding transactions that have in
common some of the resources they access
problems due to locking
15
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Bookings for Personal Tutor
(A) check availability
• see if tutor is available
at a specific time
(B) see my appointments
• list all my appointments
for given period
(C) see details of my
appointments
• list all my appointments,
including the tutor’s
nameand office
Student
Tutor
name
office
email
With
(A)
name
programme
email
(C)
For
Booking
day
time
topic
(B)
16
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Transaction Usage Maps
frequency: per hour
Student
300
Tutor
20
1
1
With
(A)
avg: 20
max: 200
For
avg: 20
(C) max: 100
0..*
Booking
1500
0..*
(B)
avg: 40
max: 150
17
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Analyse Data Usage
more detailed analysis
(relations) attributes and the type of access
• updated attributes – not candidates for access structures
attributes used in predicates
• are candidates for access structures
attributes involved in joins
• candidates for access structures
attributes affecting performance of critical transactions
• higher priority for access structures
transaction analysis form
• refer to Connolly, p.490
18
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Analyse Transactions - Conclusion
essentially, the transaction analysis identifies the
critical aspects related to the usage of the database
(e.g., relations used frequently, attributes involved in
“expensive predicates”, …)
on its basis, file organisations and indexes can be
chosen
19
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
3
20
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
File Organisation
data has to be stored in an efficient way
• all 4 operations (insert, retrieve, update and delete) require
efficiency
efficient has different meanings in different contexts
different storage structures or file organisations
represent efficient ways for different contexts; e.g.
• “heap” structure is suitable for bulk-loading and bulk-retrieval
(all student names, all programmes, …)
• “hash” structure is suitable for “exact match” queries (student
name = ‘Joe Bloggs’)
note that a structure that is efficient in one context
may not be efficient in a different context
21
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Index
a structure that allows the DBMS to locate records in
a table/file more quickly
the decision as of
which attributes to be chosen as indexes, and
which type of indexes they should be (which type of file
organisation)
… is determined by the results of the transaction
analysis
22
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
File Organisation vs Index
file organisation – method of storing data on disk,
with our without the use of indexes
index – data structure used to access records more
quickly
primary and clustering index – part of the storage of the
actual records; the records themselves are physically
ordered according to the index
secondary index – auxiliary data structure; the records may
be (usually are) unordered according to the index;
index is sometimes used to mean secondary index
the issue of indexes is subsumed by the issue of file
organisation
23
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
File Structures/Types
Heap / unordered
Index Sequential Access Method
Hash
B+ tree
24
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Heap / Unordered
records are written in the file in the same way as they
are inserted, at the end of the file
insertion is efficient, in particular for bulk-loading
there is no ordering
retrieval is very inefficient if it involves predicates/conditions
similarly update and deletion
the space freed by deleted records is not
automatically reused
administrator has to run routines
25
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Indexed Sequential Access (ISAM) / Ordered
records are ordered on the basis of some attributes –
the index field
primary index – ordering attributes are a key
• one index value to a tuple
clustering index – ordering attributes are not a key
• one index value to a group of tuples
sequentially ordered secondary indexes can also be
created
what is the difference be between a clustering index and a
secondary sequential index?
26
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Use of ISAM Files
recommended
exact matching (based on the index field)
range of values (based on the index field)
drawback
ISAM indexes are static (created when the file is created)
not recommended
updates to index field
• the access key sequence deteriorates
27
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Hash (“random” or “direct” access)
a hash function calculates the address where each
record is to be stored
the calculation is performed on the basis of some fields
records appear to be randomly distributed across the file
space
the function should be chosen such that it leads to an as
good as possible distribution of the records in the
available space
problem : most hashing functions do not guarantee a unique
address, because the file space much smaller than possible
values of hash field
• address generated by hash function BUCKET (with SLOTS)
• COLLISION (SYNONYMS) same bucket, different slots
hash attributes – secondary index
28
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Use of Hash Files
recommended
for retrievals based on exact matches, in particular when the
access order (the order in which queries arise) is random
not recommended
retrieval based on pattern match
retrieval based on ranges of values
retrieval based on other fields than the exact hash filed
when the hash field is frequently updated
29
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
B+ Tree
B Balanced tree
more versatile than the hash structure
details – optional issue
recommended
exact match (index filed)
pattern matching (index filed)
range of values (index filed)
part index filed specification
advantage
B+ tree is dynamic – it grows as the relation grows
performance does not deteriorate with updates
B+ tree – secondary index
30
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Choosing Indexes
consider results from transactions analysis
primary/clustering index
attribute(s) used in joins
attribute(s) most often used to access the relation
secondary indexes
trade-off: maintenance of an index vs. efficiency of queries
• work in class on maintenance operations
choosing secondary indexes
index primary key (if not already a primary index)
index attributes often involved in joins and selection criteria
do not index attributes which are frequently updated
…
31
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
File Organisation - Conclusion
pre-defined file-structures exists that provide better
efficiency of certain database operations in certain
contexts
32
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
–
33
Term 2, 2004, Lecture 5, Physical Design
Marian Ursu, Department of Computing, Goldsmiths College
Conclusions
physical design
what it consists of
transaction analysis
identifies “hot-spots” of the database
file organisation and indexes
make work with the “hot-spots” more efficient
34