Database Introduction -2

Download Report

Transcript Database Introduction -2

And Franchise Colleges
HSQ - DATABASES & SQL
Supplement 01(b)
Database-Introduction-2
By MANSHA NAWAZ
Supplement 01 (b)
Database Introduction-2
The Database Concept (4GL TOOLS & TECHNIQUES)
•
A Database
A data base is a collection on non-redundant data shareable
between different application systems. [Howe, D.R. 1989]
– This does not imply a computer system.
– Is a well organised filing cabinet a database by this definition?
•
A Database Management System (DBMS)
– A sophisticated software development package capable of handling a systems
database stored needs.
•
We are particularly interested in Relational Database Management Systems in
this module (RDBMS).
Supplement 01 (b)
Database Introduction-2
Application Systems Sharing Data
Admission
System
Operation
Scheduling
Report
Medical
Staff Report
Database
•
The Applications (Application Systems / Programs etc.
– The Admission System uses patient and medical staff data.
– The Operation Scheduling Report uses operating theatre, patient and medical staff
data.
– The Medical Staff Report uses medical staff data.
•
The application systems share data.
Supplement 01 (b)
Database Introduction-2
Data Models
•
A Data Model provides a particular way of thinking about data, at least in terms of its structure
Data Models include data descriptions, data relationships, data semantics, consistency constraints
[Silberschatz et. al.]
Data model comprises three components: data structures, data manipulators, and general integrity
rules [Codd (1970)]
•
There are many types of data model.
– Hierarchic
– Entity-Relationship
•
- Network
- Object-Oriented
- Relational
- Multimedia
- Multidimensional
- etc.
Each database uses a definition language
– imposes restrictions on
• what can be defined
• how entities relate to each other
•
In this module we are interested mainly in the Relational and Entity-Relationship data models.
•
Why? The principles involved in Entity Relationship Modelling apply to all data models
Supplement 01 (b)
Database Introduction-2
The Relational Model

Relational model first proposed in 1970 by Dr E F (Ted) Codd in the paper ‘A
relational model of data for large shared data banks’.
– Achieve program/data independence by treat data in a disciplined way
– Has a mathematical basis – the term “relation” comes from this
• Apply rigour of mathematics
• Use set theory
•
Determining data structure.
– Data is stored in a structure of relations (tables) defined by a data definition language
(DDL).
– The elements of data structure used in relational models are relations (tables),
attributes, tuples (rows), and domains. [Rolland p39-44]
•
Defining data integrity.
– Data integrity means that data remains stable, secure, and accurate.
– It is maintained by internal constraints known as integrity rules that are invisible to
users. [Rolland p45-48]
Supplement 01 (b)
Database Introduction-2
The Relational Database
•
•
A relational database is made up of relations (tables) in which data are stored.
A relation (table) is a 2-dimensional structure made up of attributes (columns) and
tuples (rows).
Relation
•
A relation is a table that obeys the following rules:
–
–
–
–
There are no duplicate rows in the table.
The order of the rows is immaterial.
The order of the columns is immaterial.
Each attribute value is atomic, ie each cell can contain one and only one data value.
Supplement 01 (b)
Database Introduction-2
Example of a Table (Relation)
•
•
Relations can be manipulated and changed using a data manipulation language
(DML) that employs relational operators.
– These operators are based on the concepts of relational algebra.
Information is represented as two dimensional tables as below.
ANIMAL TABLE
ANIMAL
Supplement 01 (b)
ANAME
AFAMILY
Candice
Camel
1800
Zona
Zebra
900
Sam
Snake
5
Elmer
Elephant
5000
Leonard
Lion
1200
Database Introduction-2
WEIGHT
Tables and Keys – for relationships
•
•
•
•
A primary key is a unique identifier for each row in a table. Can consist of one or
more columns. Each table contains data about one entity.
A foreign key is a column or columns in one table which reference(s) a primary key
column or columns in another table.
Values in a foreign key must match an existing value in the primary key or be NULL.
This is known as the referential integrity rule.
ANO in the ANIMAL-FOOD table is part of the primary key and also a foreign key.
ANIMAL-FOOD
ANO FOOD
ANIMAL
CA1 Hay
ANO ANAME AFAMILY WEIGHT
CA1 Buns
CA1 Candice Camel
1800
ZE4 Brush
ZE4
Zona
Zebra
900
SN1 Mice
SN1 Sam
Snake
5
SN1 People
EL3
Elmer
Elephant
5000
EL3 Leaves
LI2
Leonard Lion
1200
LI2
People
LI2
Meat
Supplement 01 (b)
Database Introduction-2
Relational Database Terminology
•
•
•
•
•
•
•
Relation
Tuple
Attribute
Primary key
Domain
Degree
Cardinality
Supplement 01 (b)
a table with rows and columns
a row of a relation
a named column of a relation
a unique identifier for each row in a relation
the set of allowable values for a column
the number of columns in a relation
the number of rows in a relation
Database Introduction-2
Flight Table Example
Flight:
Flight#
BA143
BA142
KT222
KT401
KT402
KT111
KT112
DE477
DE478
BA101
BA102
Origin
NAP
ROM
LHR
JFK
DUL
CCG
MIA
ATH
CDG
EDI
LHR
Destination
ROM
NAP
JFK
DUL
JFK
MIA
CCG
CDG
ATH
LHR
EDI
Arrival
10.15
10.22
10.34
10.45
10.54
11.06
11.11
11.34
11.56
12.04
12.33
Table type usually written as follows:
Flight: (Flight#, Origin, Destination, Arrival)
Supplement 01 (b)
Database Introduction-2
Relational Database Terminology
Flight:
Key Field
Tuple, Row or
Record
Flight#
BA143
BA142
KT222
KT401
KT402
KT111
KT112
DE477
DE478
BA101
BA102
Origin
NAP
ROM
LHR
JFK
DUL
CCG
MIA
ATH
CDG
EDI
LHR
Destination
ROM
NAP
JFK
DUL
JFK
MIA
CCG
CDG
ATH
LHR
EDI
Arrival
10.15
10.22
10.34
10.45
10.54
11.06
11.11
11.34
11.56
12.04
12.33
Column,
Attribute, Type
or Field
Intension
or
table type
Attribute
Value, Data
or Value
Domain
= set of values drawn upon by a particular attribute
Cardinality = No. of Rows in a relation
Q: Which is most
Degree
= No. of Columns in a relation
likely to change?
Supplement 01 (b)
Database Introduction-2
Your turn – define the following terms!!
Order:
Client#
C1
C2
C2
C2
C3
C4
Value
Domain
Attribute type
Attribute Value
Tuple
Extension
Supplement 01 (b)
Part#
P1
P1
P2
P3
P2
P1
Qty-ordered
25
12
8
4
3
20
Row
Degree of Order
Cardinality of Order
Table Type
Table Occurrence
Relation
Database Introduction-2
Date
22/08/99
24/08/99
24/08/99
24/08/99
25/08/99
25/08/99
Column
Intension
Record
Field
Attribute
Why Tables ?
Shop:
Supplement 01 (b)
Shop ID
1
2
3
4
5
6
7
Area-code
2
1
1
2
3
4
1
Location
Edinburgh
London
London
Edinburgh
Birmingham
Ipswich
London
Database Introduction-2
Primary Key Data Duplication
Shop:
Area:
Supplement 01 (b)
Shop ID
1
2
3
4
5
6
7
Area-code
2
1
3
4
Area-code
2
1
1
2
3
4
1
Location
Edinburgh
London
Birmingham
Ipswich
Database Introduction-2
Duplication of
primary key data
maintains the link
(relationships)
between tables
(data)
Normalisation is
the process used
to make sure
tables are nonredundant
Keys within the Relational Data Model
•
•
The Primary Key is the field which uniquely identifies a record
The Primary Key (Unique Identifier) concept:
Example :
student (student# , name , …) (the # character means 'number’)
•
If student# is the primary key then a particular student#, e.g. 'S4', can only
occur once in that column of the table.
student#
S4
S2
S9
S11
Supplement 01 (b)
name
Ramesh
Peter
Anthony
Priti
Database Introduction-2
•
The Primary Key uniquely identifies a student and thus that student can only
have one row in the student table.
•
Breaking that rule ….
student#
S4
S2
S9
S11
S4
•
•
name
Ramesh
Peter
Anthony
Priti
Fred
The new row in the table does not make much sense.
– The first row for 'S4' is sufficient to hold the name and we cannot allow a
second row with 'S4' as the primary key value.
Why?
Supplement 01 (b)
Database Introduction-2
•
Example
TASK (employee#, project#, role, supervisor, ours_allocated,
hours-so-far, hours-required , …)
employee#
project#
role
supervisor
hours-allocated
hours-so-far
hours-required
E2
P9
program
E123
120
85
100
E2
P4
design
E101
300
250
200
E101
P9
design
E101
60
128
56
E22
P11
test
E345
40
0
40
•
This table has a composite primary key.
– The primary key is composed of the three attributes
[employee#, project# , role].
•
In each row, the composite of the values for these attributes must be
unique.
– For example, the first row has the values ['E2, P9, program'] for these
attributes.
– No other row is allowed to have the same combination.
Supplement 01 (b)
Database Introduction-2
•
Breaking that rule …
employee#
project#
role
supervisor
hours-allocated hours-sofar
hoursrequired
E2
P9
program
E123
120
85
100
E2
P4
design
E101
300
250
200
E101
P9
design
E101
60
128
56
E22
P11
test
E345
40
0
40
E2
P9
program
E99
12
0
2000
•
Again this makes no sense or the basic design is wrong
– Perhaps an employee can be re-allocated to a project, with a different
supervisor, to do more programming.
– In this case the chosen primary key is wrong and needs the addition of an
extra attribute such as date.
•
For example, [employee#, project# , role, start_date] might be an
appropriate identifier.
Supplement 01 (b)
Database Introduction-2
Foreign Keys (Posted Identifier) concept
•
Example:
student(student#, course#, student_name, …)
course (course#. course_name, …)
•
course# is the identifier (primary key) of the course table.
•
The course# is posted into the student table and is thus called a
FOREIGN KEY (or posted identifier).
– Now for any student we can easily find the appropriate course# and look up
futher details of that course in the course table is needed.
– This is easy to do in any relational database.
Supplement 01 (b)
Database Introduction-2
Candidate Keys (Candidate Identifier)
– A favourite example
•
•
Employee(emp#, name, address, National_Insurance#, ..)
Or
Employee(National_Insurance# , name, address, emp#,..)
•
Both National_Insurance# and emp# can be primary keys (unique identifiers) of
employee.
•
You choose one as the most appropriate from the two candidate (possible)
keys.
•
You could argue that the composite [name, address] is another candidate
primary key.
Supplement 01 (b)
Database Introduction-2
Examples - Primary and Foreign keys
Shop:
Visit:
Shop ID Area-code
1
2
2
1
3
1
4
2
5
3
6
4
7
1
Visit#
V1
V2
V3
V4
V5
Area:
Area-code
2
1
3
4
Supplement 01 (b)
Location
Edinburgh
London
Birmingham
Ipswich
Patient#
P1
P1
P2
P3
P1
Doctor#
D1
D1
D1
D1
D2
Date
22/08/99
24/08/99
24/08/99
24/08/99
25/08/99
Order:
Client#
Client#
C1
C2
C2
C2
C3
C4
Database Introduction-2
Part#
Part#
P1
P1
P2
P3
P2
P1
Qty-ordered
Qty-ordered
25
12
8
4
3
20
Date
Date
22/08/99
24/08/99
24/08/99
24/08/99
25/08/99
25/08/99
Examples - Primary and Foreign keys
Employee:
Marriage:
Supplement 01 (b)
Emp#
E1
E2
E3
E4
E5
E6
Man#
Man#
P1
P2
P2
P3
P4
P5
Emp_name
Fred Brown
Eve Munsen
Joyce Goldberg
Paul Samuels
Paul Josephs
Terry Wain
Women#
Women#
P6
P7
P8
P9
P10
P8
Database Introduction-2
Dept#
Mtg
R&D
Admin
Mtg
R&D
Production
Date
Date
22/04/94
23/08/95
24/04/97
2/01/99
5/07/99
5/08/99
Status
Manager
Manager
G1
G4
G3
Manager
Anatomy of a table - a reminder

A Table Occurrence: using a variation of the Library Copies table
access_no
isbnx
price
now_price
condition
timesloaned
4,887,642
0-7131-3688-X
£12.95
02.06.92
A2
4
4,887,657
0-7131-3688-X
£12.95
17.09.91
B1
47
6,055,432
0-7248-1045-5
£37.65
12.04.92
A2
17
9,387,263
0-6542-1212-B
£15.99
14.02.91
B2
37
7,365,241
0-2435-3468-V
£27.40
19.11.91
A3
7
3,874,652
0-2435-3468-V
£27.40
19.11.91
A1
11
Attribute: Example: date-purchased
Value:
Example: 02.06.92
Table
Example: COPIES(access_no, isbnx, price, now_price, condition, times-loaned)
– What is special about isbnx in this table?
Supplement 01 (b)
Database Introduction-2
The 4 Rules for Normalised Tables [Rolland p72-]
•
No row order significance.
•
No column order significance.
•
No multiple values at row/column intersections.
•
No duplicate rows.
•
Snapshots of table occurrences.
– When we look at a paper copy of a table remember that the data in a real database
table can be expected to change all the time.
– The COPIES table could have 5 rows the first time we look and on another day
there could be hundreds or thousands.
– Always
table really
has thousands of rows.
Supplement
01 (b) assume a databaseDatabase
Introduction-2
The 4 rules for Normalised Tables broken
•
No row order significance (Rule broken).
access_no
isbnx
price
now_price
condition
timesloaned
4,887,642
0-7131-3688-X
£12.95
02.06.92
A2
4
17.09.91
B1
47
4,887,657
6,055,432
0-7248-1045-5
£37.65
12.04.92
A2
17
9,387,263
0-6542-1212-B
£15.99
14.02.91
B2
37
7,365,241
0-2435-3468-V
£27.40
19.11.91
A3
7
19.11.91
A1
11
3,874,652
•
If you swap the rows you lose information as copies are sometimes dependent
on the row above for their ISBNX number.
Supplement 01 (b)
Database Introduction-2
No Column Order Significance (Rule broken).
access_no
•
•
•
isbnx
price
condition
times-loaned
4,887,642
0-7131-3688-X £12.95 02.06.92
A2
4
4,887,657
0-7131-3688-X £12.95 17.09.91 28.07.93
B1
47
6,055,432
0-7248-1045-5 £37.65 12.04.92
A2
17
9,387,263
0-6542-1212-B £15.99 14.02.91 31.08.93
B2
37
7,365,241
0-2435-3468-V £27.40 19.11.91
A3
7
3,874,652
0-2435-3468-V £27.40 19.11.91
A1
11
The two columns with no attribute type shown are intended to indicate the datepurchased followed by the date-removed.
– The date-removed column would contain a significant number of NULLS
(explained later).
The two columns are now dependent on the column order for their meaning.
– If you move the first date column to the end of the table then the meaning is lost.
Clearly having each column with its own attribute type is simpler and makes the
columns order independent of each other.
Supplement 01 (b)
Database Introduction-2
No Multiple Values at Row/column Intersections (Rule Broken).
•
Or No Repeating Groups
access_no
isbnx
price
date
condition
timesloaned
4887642,
4887657
0-7131-3688-X
£12.95
02.06.92,
17.09.91
A2, B1
4, 47
6,055,432
0-7248-1045-5
£37.65
12.04.92
A2
17
9,387,263
0-6542-1212-B
£15.99
14.02.91
B2
37
7365241,
3874652
0-2435-3468-V
£27.40
19.11.91,
19.11.91
A3, A1
7, 11
•
This is just too complicated – it makes searching and sorting difficult.
•
Can you sort this into date order??
•
Can it be easily searched on access_no ??
Supplement 01 (b)
Database Introduction-2
No Duplicate Rows (Rule broken).
access_no
isbnx
price
date
condition
times-loaned
4,887,642
0-7131-3688-X
£12.95
02.06.92
A2
4
4,887,657
0-7131-3688-X
£12.95
17.09.91
B1
47
6,055,432
0-7248-1045-5
£37.65
12.04.92
A2
17
9,387,263
0-6542-1212-B
£15.99
14.02.91
B2
37
7,365,241
0-2435-3468-V
£27.40
19.11.91
A3
7
3,874,652
0-2435-3468-V
£27.40
19.11.91
A1
11
4,887,642
0-7131-3688-X
£12.95
02.06.92
A2
4
•
Why is this such a BAD idea?
•
If you allow redundantly duplicated data in a real system, what will be the end
result?
Supplement 01 (b)
Database Introduction-2
Domains
•
An attribute cannot contain just any data.
– For example we could have the attribute student_date_of_birth
– Whilst '11-January-1980' might be a suitable value, 'FRED BLOGGS'
clearly is not - the wrong data type.
•
So any value of student_date_of_birth at least should be a valid date.
– Other rules might apply to the attribute - dates before the year 1900 seem
unlikely to be useful etc.
•
The Domain concept carries this a bit further.
– A Domain is the pool of values from which an attribute draws its actual
values.
Supplement 01 (b)
Database Introduction-2
•
We may say that the attribute student_name is of data type string.
– So 'JOE BLOGGS' is a valid value.
•
However, 'X23Y B&&9' is also a string but is not a student name.
(unless that student has particularly annoying parents).
•
We could argue that there the finite list of possible student names is a
subset of all possible random strings.
– We can't predict what a student will be called.
– Thus we have to implement the domain of student_name by using the
data type string.
•
In other cases, for example the attribute, student_eye_colour, we could
easily define a list of values that defines the full range of possibilities.
Supplement 01 (b)
Database Introduction-2
NULLS
•
There are two basic types of NULL value 'not applicable' and 'not known'.
emp#
emp-name
age
car-reg#
E4
D.Jones
34
F345DRT
E77
L.Smith
27
E9
J.Smith
E2
N.Patel
G467BBT
55
Null(Not known)
K976BJT
Null (Not
applicable)
•
Every employee would have an age but it might not be known in a
particular case.
•
However, not every employee need own a car ( with a car-reg#).
Supplement 01 (b)
Database Introduction-2
Summary
•
Introduction to Database Concepts
•
Introduction to Data Modelling
•
Introduction to Databases and Redundancy.
•
Relational Data Model
•
Terminology associated with Relational Data Model.
•
Duplicated Data
•
Primary and Foreign Keys
•
Duplicated Data (Foreign key to Primary key references to link data in tables)
but not redundant.
•
Additional supplementary material on Normalisation available
Supplement 01 (b)
Database Introduction-2
END OF LECTURE
Supplement 01 (b)
Database Introduction-2