Transcript Pclec06

A Brief Look at Reality
datastructures 6.1
Aircraft Noise Monitoring System
1
Sydney Airport - 7000 complaints per day
Software Used : CA Windows/4GL - Application
Development Tool
CA-OpenIngres
- DBMS
Global Environment Management System
MOSAIX Technologies Spatial Application
Development Tool
datastructures 6.2
Aircraft Noise Monitoring System
2
• Developers - Lochard Environment Systems
(Melbourne group 1 of many involved)
Have installed noise and flight plan systems at:
Sydney, Melbourne, Brisbane, Cairns, Perth, Adelaide,
Coolangatta
All systems are integrated into 1 system monitored from the
offices of Air Services, Canberra
datastructures 6.3
Aircraft Noise Monitoring System
3
Some Features:
Environment Monitoring Units (EMU)
(data source - 16 at Sydney
- A reading every second)
Exception Levels - Report of noise level > permissible
threshold
Correlated with
Secondary Surveillance Radar
and
Flight Plan Information from the Air Traffic Control
System
datastructures 6.4
Aircraft Noise Monitoring System
4
Noise and Flight Plan overlaid to a map of surrounding
area - viewed in real time
Records stored in CA-OpenIngres database
Geographic Information System interface
GEMS database contains attributes such as:
aircraft information
type
call sign
engine capacity
address of owner/operator
datastructures 6.5
Aircraft Noise Monitoring System
Hardware Platform
Sun Microsystems:
Quad processor SunSPARC 514
256Mb RAM, 50Gbyte disk storage
Communications
TCP/IP network
- CA-Ingres/NET Communications
software
datastructures 6.6
Aircraft Noise Monitoring System
MOSAIX
Technologies
Spatial Application
Development
EMU’s
CA Windows/4GL
CA-Open Ingres
Global Environment
Management
System
datastructures 6.7
6.0 Design and Data Structures
Data structures are the bricks and mortar that hold
databases together.
Data structures (for the ANSI/SPARC standard) are defined
in the internal model level and implemented in the physical
data organisation.
datastructures 6.8
Design and Data Structures
Data structures are often hidden from the application
programmer, since they are primarily used by the DBMS
and the Operating System
Understanding data structures is important for performance
reasons, to improve program design and allow easier
communication with DBMS specialists
datastructures 6.9
Goals of Relational Design
To determine what Relations should exist and what
Attributes they should contain
Avoid Redundancy if possible
- minimise storage space
Avoid Anomalies
Avoid Nulls
Avoid Joins which produce misleading or incomplete rows
datastructures 6.10
Assumptions
A group of attributes has a natural “inherent” structure
This structure is independent of the way the data is used
Normalisation
• introduced by E.F. Codd together with relational database
theory
• originally Codd defined three normal forms
• later expanded to Boyce-Codd and fourth and fifth
normal forms
datastructures 6.11
Dependency Theory
" one truly scientific part of the field [of database design]"
Date 6th Ed p.380 - needs more research
Relational Database Design - a mechanical approach to
producing a database schema with certain desirable
properties
Review Normal Forms and the Problems they Solve
Study Algorithms which produce Normalised Relations
with Desirable Properties
datastructures 6.12
Normal Forms
Formal measure of why one grouping of attributes
may be better than another
Each Normal Form requires that a Relation satisfies
criteria for that form. This eliminates a different
kind of redundancy
Normalised Relations will remain consistent following
database operations and will store each fact only once
Database operations applied to unnormalised relations
may lead to anomalies
datastructures 6.13
Anomalies
Person_Id Project_budget
Project
Time_Spent_on_Project
S75
32
P1
7
S75
40
P2
8
S79
32
P1
4
S79
27
P3
1
S80
40
P2
5
-
17
P4
-
Relation Assign
Null Values are considered to be anomalies
datastructures 6.14
Anomalies
Insertion Anomaly
add row (ASSIGN , <S85,35,P1,9>)
- two conflicting budgets for P1 S75 32 P1 7
S79 32 P1 4
Deletion Anomaly
delete row(ASSIGN, <S79,27,P3,11>)
- removes project budget for P3 S79 27 P3 1
(the previous ‘table’ was more of a ‘report’ format)
datastructures 6.15
Functional Dependencies
- the values of one set of attributes effect the values
of another attributes
X
Y
The value of X determines the value of Y
The value of Y depends on the value of X
The simplest case is 1 attribute determines another single
attribute
datastructures 6.16
Functional Dependencies
Project
Project Budget
Person-Id
Project
Time Spent on Project
Functional Dependency Diagram
Project
Project Budget
datastructures 6.17
Minimal Functional Dependencies
Employee
Department
Employee determines Department
Department
Location
Department determines Location
Employee
Location
Employee determines Location - This Functional Dependency
Is Redundant
datastructures 6.18
Normalisation
What is it ?
It is a method for increasing the quality of database design.
It also doubles up as a theoretical base for definition of the
properties of tables.
It probably has its origins in the early days of moving from
‘files’ to database tables.
It supports and verifies the database model.
datastructures 6.19
Normalisation
1st Normal Form
Repeating groups must not occur
Subject
Student No. Result
COT7180
COT2138
COT7700
9142717
9131618
9077184
8967384
8737980
9142717
6932475
C
D
P
N
P
P
HD
Name
Wilson
Renoir
Gilbey
Breton
Balzac
Wilson
Gilbey
1st Normal Form
datastructures 6.20
Normalisation
Corrected Table
Subject
COT7180
COT7180
COT7180
COT2138
COT2138
COT7700
COT7700
StudentNo
9142717
9131618
9077184
8967384
8737980
9142717
6932475
Result
C
D
P
N
P
P
HD
Name
Wilson
Renoir
Gilbey
Breton
Balzac
Wilson
Gilbey
Formally expressed as:
Results(subject code, studentno,result,name
datastructures 6.21
Normalisation - Examples
1. 1NF or First Normal Form Rule
Each Row MUST CONTAIN the same number of
columns
Example: Course Instructor table
class code
lecturer
tutor
c3576
k4567
b6745
r3289
tutor
tutor
Doe,J
Jones,R Smith,V
Nguyen,P
Fabbri,M Ong,W
Pratt,W
Archer,V Barrat,N
Ng,K
The number of columns (attributes) is not consistent
Create a table of TUTORS with Class Codes
datastructures 6.22
Normalisation - 2
2. 2NF or Second Normal Form
(1) The table must be in 1 NF
(2) Each attribute which is NOT part of the key
must be functionally dependent on the whole key
Building
A
A
B
Room
214
242A
213
Seats
85
25
135
No. of Levels
4
4
6
The number of levels is functionally dependent on
BUILDING only, not Building and Room
datastructures 6.23
Normalisation 3
3. 3NF or Third Normal Form
(1) Table must be in 2NF ( and therefore 1NF)
(2) Every attribute which is NOT part of the
Primary Key must be dependent ONLY on
the P.K. (i.e. not dependent on any other
NON-KEY attribute)
Class Code
c3567
k4567
b7654
Lecturer
Doe,J
Nguyen,L
Fabbri,M
Lecturer’s Office
101 A Building
209 F Building
312 B Building
NOT
3NF
datastructures 6.24
Normalisation 3a
Table1
Class Code Lecturer
c3576
Doe,J
k4567
Nguyen,L
b7645
Fabbri,M
Table 2
Lecturer Lecturer’s Office
Doe,J
101 A Building
Nguyen,L 209 F Building
Fabbri,M 312 B Building
These tables are in 3NF.
datastructures 6.25
Functional Dependencies
supplier
name
p.k
supplier
number
status
city
Notice that these
attributes are
dependencies of the
primary key
part
name
colour
p.k
part
number
mass
city
Not functionally dependent
city assets
datastructures 6.26
A Simple Test for 3NF
Each attribute should depend on :
• the key
• the whole key
• and nothing but the key
datastructures 6.27
Normal Forms Based on FD’s
Review of Kent's Process
1NF - the shape of the record - fixed length
- no repeating groups
- Relational Model does not handle repeating groups
Relationship between key and non-key fields
1:1 or 1:N
2NF is violated when a non-key field is a fact about a
subset of the key
datastructures 6.28
Normal Forms Based on FD’S
Problems with relations in 2NF
- repeated information
- update anomalies
- potential inconsistency
- delete anomalies
3NF
- is violated when a non-key field is a fact about another
non-key field
Problems with relations in 3NF
when a non-key field is a fact about a key field
datastructures 6.29
Modelling - some hints
1. Do not prematurely combine entities into tables
2. Concentrate on Access Mechanisms which can be
shared among requests
3. Deviate from the data model in a responsible manner
4. Use table/view and column names which closely reflect
data model names
5. Do not define multiple attributes as one (composite)
column in a table. (i.e. attribute = column)
6. Develop an ‘architecture’ to support business rules (rules,
procedures, integrity)
7. Position your organisation for future technology
(windows, CASE, optimisers)
datastructures 6.30
Physical Access Considerations
We are going to have a quick look at some
aspects of accessing data from the database
datastructures 6.31
3 level architecture
external
view A
Cobol
+ DML
external
view B
C
+ DML
DBMS
external
view C
DML
external
view X
Assembler
+ DML
Conceptual Model
Internal Model
operating
system
Physical Level
External
Schemas
user
interface
Conceptual
Schema
Internal
Schema
logical
record
interface
stored
record
interface
physical
record
interface
datastructures 6.32
Transaction Processes
Locate Record
Bring Block
into Buffer
Rewrite
CREDITS field xxx
in buffer
3
1
2
4
6
Write block
to disk
5
datastructures 6.33
Database Access
DBMS
FUNCTIONS
1
2
Request for
Stored Record
Request for
Stored Page
Stored Record returned
for processing
FILE
MANAGER
Stored Page Returned
6
5
DISK
MANAGER
3
Operating
System
Disk I/O
Operation
Data Read from Disk
DATA
BASE
4
DISK MANAGER
Physical Disk Addresses
(data and free space)
FILE MANAGER
Files and Page Sets
datastructures 6.34
Implementation Design - File Organisation and Access
File Access and Organisation
datastructures 6.35
Table Access Calculation
Number of PAGES or BLOCKS
No. of Disk I/O’s per second
Assume 1 disk I/O per page
Assume HEAP storage structure
300,000
3
select * from employee
where employee.name = ‘Johnston’;
[closure feature of SQL]
datastructures 6.36
Table Access Calculation
Time taken = 300000/30 = 10,000 seconds
(nearly 3 hours !)
Indicates the need of a storage structure which
provides access on a ‘key’
Modify existing structure to ? ? on what key
(attribute or multiple attributes)
Other items: fillfactor minpages maxpages
compression leaffill location
datastructures 6.37
Indexes - Performance Improvers
Performance is dependent on how quickly an application can
access table data (remember caching in Session 2 ?)
In Oracle there are these points to consider in
Implementation Design
1. Applications can access table data with or without indexes
2. IF an index is present and IF the index will help performance,
Oracle will use the index
3. Oracle will automatically update an index to keep it in
synchronisation with its table
datastructures 6.38
Indexes - Performance Improvers
It is not a good tactic to index every attribute set.
Indexes should be restricted to key attributes
Index maintenance generates overheads - time and storage
datastructures 6.39
File Organisation
A file organisation is a technique for physically arranging the records of a
file on a secondary storage device.
File organisations
Sequential
Sequential
(block index)
Hardwaredependent
(ISAM)
Indexed
Nonsequential
(full index)
Direct
RelativeAddressed
HashAddressed
Hardwareindependent
(VSAM)
datastructures 6.40
Record Access Modes
Sequential Access
In sequential access, record storage starts at a designated point, usually
the beginning, and proceeds in a linear sequence through the file. Each
record can only be retrieved by accessing all the records that physically
precede it.
Random Access
In random access, a given record is accessed "out of the blue" without
referencing other records in the file.
datastructures 6.41
File Organisation and Access Mode
A File organisation is established when the file is created, and is rarely
changed. However, record access mode can change each time the file
is used.
File
Organisation
Record access mode
Sequential
Random
Sequential
Yes
No (impractical)
Indexed Seq.
Yes
Yes
Direct-Relative
Yes
Yes
Direct-Hashed
No
(impractical)
Yes
datastructures 6.42
Indexed Sequential File Organisation
There are two basic implementations of the indexed sequential
organisation:
- hardware-dependent uses block index on the key, disk
address to the prime area which contains the data records and
the track index for the cylinder
- hardware-independent uses a control interval which may be
considered a virtual track, free space for new records is
provided by distributed free space.
datastructures 6.43
Relative Files
- Each record can be retrieved by specifying
its relative record number.
- The relative record number is a number 0 to n
that gives the position to the record relative to
the beginning of the file.
- This provides a method of direct file organisation.
datastructures 6.46
Hashed Files
For applications which do updates and retrievals in random
mode, and there is rarely the need for sequential access to the
data records (e.g. reservation systems). Hashed file
organisation provides rapid access to individual records based
on a key.
The major disadvantage of hash organisation is that sequential
organisation is not convenient because the records are not
stored in primary key sequence. But highly concurrent
environments doing random access are suitable for using hash
organisation.
The basis of a hash file is an addressing algorithm which
transforms the record identifier into a relative address.
datastructures 6.47
Binary Trees
A non-linear data structure, each element having several
"next" elements ( branching ).
A binary tree has a maximum of two branches per element or
node.
A node consist of some data and a maximum of two pointers,
a left pointer to the left branch and right pointer to the right
branch. If there is no left or right branch then a nil pointer is
used.
datastructures 6.54
A Diagram of a Binary Tree
Basic binary
PRODUCTNO
LLINK
RLINK
tree record
Primary Data
Less Than Greater Than
layout for
PRODUCT
Key
Pointer
Pointer
__________________________________________
1000
1000
<
>
1600
(1) Initial tree
< 1000
0350
(2) Insert 1000
(3) Insert 1600
<
1600
>
2000
(5) Insert 2000
0350
1600
(4) Insert 0350
<
0350
>
1600
>
0975
(6) Insert 0975
>
2000
>
0350
1000
>
1000
1000
>
1600
>
< 0975
>
2000
0625
(7) Insert 0625
datastructures 6.55
An Example of a Binary Tree
1000
<
1600
<
0350
<
0100
>
<
0625
>
0975
1250
>
1425
>
2000
<
1775
Task: Indicate the different traversals on this diagram.
datastructures 6.57
B
Trees
The problem with Binary Trees is balance, the tree can easily
deteriorate to a linked list.
Consequently, the reduced search times are lost.
This problem is overcome in B-trees.
B for Balanced, where all the leaves are the same distance
from the root.
B-trees guarantee a predictable efficiency.
datastructures 6.58
B+ Trees
There are several varieties of B-trees, most applications use
the B+ tree.
The + indicates the presence of an Index -
A B-tree index is an ordered tree of index nodes
Each index node contains one or more index entries
Each index entry corresponds to a row in a table
The index entry contains :
the indexed attribute vale (or set) for the row
the ROWID (physical disk location)
datastructures 6.59
Example of a B+ Tree
Leaves
1250
0625
0350
0350
1000
0625
0625
1425
1000
1000
1250 1300
1250
2000
1425 1600
2000
1425
1300
2000
1600
Actual Data Records
datastructures 6.63
A Review of Trees
Can permit rapid retrieval of data for both random and
sequential processing.
Can be used on primary or secondary keys.
Trees are special cases of networks; in networks, records from
different files are joined without a strict hierarchy
being observed. This is addressed in the hierarchical and
network model lectures.
datastructures 6.64
Other Methods
Bit Mapping
This is another table and its contents are
– a bit to indicate the presence of some value
– a row i.d. to reference the row
Row I.D.
Row 1
Row 2
Row 3
Row 4
Green
Red
1
1
1
% of distinct values to total should be low
Useful in DSS/data warehouse applications
Not good for frequent update or insert applications
datastructures 6.65
Other Methods
Clustering - Oracle
Is a technique which ‘clusters’, or groups together, related
rows of one or more tables in the same data block.
The objective is to store (on disk) rows of an application which
are used together (e.g. Orders and Items ) - this saves disk
I/O on analysis applications
A cluster key is necessary for each cluster.
Not very successful for high volume processing
datastructures 6.66
Storage Structure Selection
Requirements
Table size : small
Heap
2
Hash
1
1
ISAM BTree
1
3
Table size : medium (modify
disk space available)
4
1
1
1
Table size : large (>1/2 disk
2**
4
4
1
Deletes Frequent
4
1
1
3
Updates Frequent
4
1
1
2
1
1
Secondary Index Structure
N/A
1
** secondary indexes used with a heap structure
datastructures 6.67
Storage Structure Selection
2
Requirements
Heap
Hash
ISAM
BTree
Need Pattern Matching
4
4
1
1
Need Range Searches
4
4
1
1
Exact Match Key Retrieval
4
1
2
2
Sorted Data
4
4
2
1
Concurrent Updates
4
1
1
2
Add Data - No Modify
2
3
3
1
Sequential Addition of Data
1**
2
5
1
datastructures 6.68
Storage Structure Selection
Requirements
Heap
Hash
3
ISAM BTree
Initial Bulk Copy of Data
1
2
2
2
Table Growth - nil/static
N/A
1
1
2
Table growth - low (15%)
Plan to modify periodically
N/A
1
1
2
Table growth - high
Too fast to modify
3
3
3
1
datastructures 6.69