Relational Data Analysis

Download Report

Transcript Relational Data Analysis

An Introduction to
Databases
Dr Stephen Swift
The Intelligent Data Analysis Group
Brunel University
1
An Introduction to
Databases
• Databases
• The Parts of a Database
• A Brief Description of SQL
• Examples Using Microsoft Access
2
What is a Database? (1)
• A Database System is a Computerised
Record Keeping System
• Rather Like an Electronic Filing Cabinet
• The Data can be Added to, Deleted,
Modified etc…
• The Data Contained is of the Same Type
• Would Not Have a Database Containing
Patient Records and the Sales Records of
a Pet Shop, For Example
3
What is a Database? (2)
• In Large Organisations, a Database
System is Usually a Subsystem of a
Larger Information System
• An Information System Supports the
Information Handling Requirements of
an Organisation
• Smaller Organisations Might Just Have a
Single Database
• A Database Management System (DBMS)
is a Software System that Enables Users
to Define, Create, Maintain and Control
Access to a Database
4
Why Are Databases
Needed?
• A Huge Amount of Data is Being
Collected Every Second of the Day
• The Data:
– Is Often Complex
– Large in Size
– Requires Sophisticated Manipulation
• Databases and DBMS are Essential
to Successfully Manage Such Data
5
An MS Access Database
An MDB
File
Tables
Queries
Forms &
Reports
Macros &
Modules
6
Microsoft Access
• A Stand Alone Database System
• All Aspects of the Database are
Contained in a Single MDB File
• Slow When Handling Huge Volumes
of Data
• Can be Used to Create Database
Applications
7
Tables
Rows
Foreign
key
Column name
Primary
key
Patient
No.
Patient
Surname
923
109
854
231
459
Moneybags
Foot
Hare
Knee
Legg
Patient
Forename
Sex
Maurice
Ivor
Susan
Boris
Brian
Male
Male
Female
Male
Male
Date of
birth
23/7/53
3/4/41
13/11/61
4/2/31
10/2/70
Male
Female
Domains
Patient No.
Surname
Sex
Forename
Ward
No.
10
11
10
7
10
1–12
Date of birth
Ward No.
The Patient table
Ward
No.
Ward
Name
3
11
10
Nightingale
Fleming
Barnard
Type
Medical
Medical
Surgical
The Ward table
Number
of beds
8
12
21
8
Table Properties (1)
• Rows (or Records)
– Shows Occurrences of Patient
– Each Row Must be Uniquely
Identifiable
– The Order of the Rows MUST NOT Be
Significant
9
Table Properties (2)
Columns (or Fields)
– Each Column has a Type, e.g. Number,
Text, Boolean, Multimedia, etc…
– The Order of the Columns MUST NOT be
Significant
– Only One Value Should be Associated
With Each Column/Row Intersection in
the Table
10
Table Properties (3)
• Domain
– A Pool of Possible Values From Which
the Actual Values Appearing in the
Columns of the Table are Drawn
• e.g. The Domain of Patient Numbers
Includes all of the Possible Patient
Numbers, Not Just the Ones Currently in
Hospital
– Very Important for Comparing Values
from Different Tables
11
The Primary Key
• A Special Type of Field
• Not All Tables Have a Primary Key
• Usually a Number or String, e.g. Patient
Number
• Used to Relate Data Between Tables
12
Worked Examples
• Check That Microsoft Access
Loads
• Check That You Can See Four Files:
–
–
–
–
“Functions.xls”
“Gene ID.xls”
“spellman_yeast_alpha.xls”
“annette2004.ppt”
13
Worked Example (1)
• We Will:
– Create a Microsoft Access Database
– Import Some Data
– Make Sure the Fields are the Correct
Type
– Create Three Tables
– Look at the Tables (Datasheet View)
14
Queries (1)
• A Query Selects or Modifies a
Subset of One or More Tables
• E.g. All Female Patients Under 18
Years Old
• A Query is Often Expressed in a
Special Language Called SQL
15
SQL
• “Structured Query Language”
• Originally a Proprietary Language
from IBM
• Now an International Standard
High Level Language Supported by
Most Database Products
• Used to Modify Data Within a
Database
16
Data Manipulation
• Data is Manipulated by Rows and
Columns
• A Subset of Data is Selected and
then Modified
• The Selection is Made by the User,
Usually Some Set of Requirements
• E.g. Select All Female Patients
Under 18 Years Old and Delete All
Their Records
17
Queries (2)
A SELECT Query Selects a Subset of One
or More Tables
SELECT <Fields> FROM <Table>
WHERE <Condition>;
SELECT Alpha.* FROM Alpha
WHERE Alpha.alpha63="NULL";
18
Queries (3)
A Make Table Query Creates a Subset of
One or More Tables and Puts the Results
Into a New Table. The Destination Table
is Replaced
SELECT <Fields> INTO <Destination Table>
FROM <Source Table> WHERE <Condition>;
SELECT Alpha.* INTO Temp FROM Alpha
WHERE Alpha.ORF Like "YP*";
19
Queries (4)
An Update Query Changes the Values of
One or More Fields in One or More Tables
UPDATE <Table> SET <Fields to Values>
WHERE <Condition>;
UPDATE Alpha SET Alpha.alpha63 = "0“
WHERE Alpha.alpha63="NULL";
20
Queries (5)
An Append Query Selects a Subset of One
Tables and Adds it into Another Table
INSERT INTO <Destination Table> SELECT
<Fields> FROM <Source Table> WHERE
<Condition>;
INSERT INTO Temp SELECT Alpha.* FROM
Alpha WHERE Alpha.alpha63="NULL";
21
Queries (6)
A Delete Query Removes a Subset of One or
More Tables From the Database
DELETE <Rows> FROM <Table> WHERE
<Condition>;
DELETE Alpha.*FROM Alpha WHERE
Alpha.alpha63="NULL";
22
Queries (7)
A Crosstab Query is Very
Complex and Will Therefore
Not be Covered!
23
Worked Example (2)
• We Have Some Import Errors
• We Must Locate What Fields are in
Error
• We Must Then Use an UPDATE
Query to Modify the Erroneous Data
24
Forms
Forms are Used to View/Add/Manipulate Data
25
Data Entry (1)
• The User Should Only be Able to
Enter the Domain of a Field on a
Form
• E.g. If There are Only 10 Wards in a
Hospital, They Should Only be Able
to Enter 1-10 in the Wards Field
• In the Example Above, Allowing
Any Number Would Increase the
Chance of Data Errors
26
Data Entry (2)
• Pick Lists and Check Boxes Can
Help to Maintain Data Integrity
• Validation Rules on Form Fields
Can Prevent the User From
Entering Invalid Data
• Minimise Free Text Entry to Fields
• The Application Should Help the
User in Completing Forms
Correctly
27
Reports
Reports are Used to Display Data
28
Macros and Modules
• Macros are a User-Defined List of
Database Actions to be Carried Out
• Usually Commonly Performed
Tasks
• A Module Contains Functions and
Subroutines that Carry Out More
Complex Tasks
• Modules are Constructed Using a
29
Form of Visual Basic
Joins
• A Join Combines Two Tables into
One Virtual Table
• Tables are Joined Together Based
on a Common Value in a Field
• The Field That the Two Tables are
Joined on Must be the Same Type
30
Worked Example (3)
• We Are Going to Join Our Tables
Together
• Using “Tools-Relationships”
• Add the Three Tables We Imported
• Join “Alpha-ORF” and “Gene ID-ORF”
• Join “Gene ID-SGD” and “FunctionSGD”
31
Worked Example (4)
• Now Look at the Effect on:
–
–
Building a SELECT Query on All of
the Tables
The Datasheet View For One of the
Tables
• Without Joins it Would be Very
Difficult to Relate and/or Compare
Data From Different Tables
–
Why is This Important?
32
Normalising a Table
Normalisation is:
“The Organisation of a System's
Attributes into a Set of Compact
and Meaningful Tables”
33
Normalising a Table
Well Normalised Tables Avoid:
– Unnecessary Duplication of Data
• i.e. No Redundant Data
– Problems With Modifying, Inserting
and Deleting Data
• N.B. Sometimes Referred to as “Update
Anomalies”
34
Stages of Normalisation (1)
• Normalisation Takes Place in Stages
• Each Stage is Known as a Normal
Form
• Each Stage is a Development From
the Previous Stage
35
Stages of Normalisation (2)
Un-Normalised
Form
First Normal
Form
Second Normal
Form
Third Normal
Form
36
Un-Normalised Form
• Column Headings (Field Names)
Should be Meaningful
• Choice of Primary Key
– Must be Unique for the Particular Data
Source
– May Require Two or More Fields
– Use the Smallest Number of Fields
Possible
– Avoid Textual Keys (Degrades Speed) 37
1st, 2nd and 3rd Normal Form
• 1st : Separate any Repeating Groups
of Fields to Other/New Tables
• 2nd : Separate Fields that Only
Depend Upon Part of the Key to
Other/New Tables
• 3rd : Separate any Fields That are
Not Directly and Fully Dependent on
the Key to Other/New Tables
38
Sample Source of Data
DRUG CARD
Patient No.
Ward No.
923
10
Surname
Forename
Moneybags
Maurice
Ward Name Barnard
Drugs Prescribed
Length of
Date
Drug Code
20/5/88
CO2355P
Drug Name
Dosage
Treatment
2 pills 3 x day
Cortisone
after meals
14 days
20/5/88
MO3416T
Morphine
Injection every
4 hours
5
25/5/88
MO3416T
Morphine
Injection
3
every 8 hours
26/5/88
PE8694N
Penicillin
1 pill 3 x day
7
39
for additional drugs continue on another card
After Normalisation
SYSTEM:
Source ID No.:
UNF
Patient Number
Patient Surname
Patient Forename
Ward Number
Ward Name
Prescription Date
Drug Code
Drug Name
Dosage
Length of Treatment
Hospital
DATE
/
/
AUTHOR
Name of Source:
1NF
Drug Card
2NF
Patient Number
Patient Surname
Patient Forename
Ward Number
Ward Name
Patient Number
Patient Surname
Patient Forename
Ward Number
Ward Name
Patient Number
Prescription Date
Drug Code
Drug Name
Dosage
Length of Treatment
Patient Number
Prescription Date
Drug Code
Dosage
Length of Treatment
Drug Code
Drug Name
3NF
Patient Number
Patient Surname
Patient Forename
* Ward Number
Ward Number
Ward Name
Patient Number
Prescription Date
Drug Code
Dosage
Length of Treatment
Drug Code
Drug Name
40
Tables as a Logical Data
Structure
Ward
Patient
Pat No
923
109
Surname
Moneybags
Foot
Forename
Wd No
Wd No
Maurice
10
Ivor
11
Ward Name
10
Barnard
11
Fleming
Drug
Prescription
Pat No
Prescr Date Drug Code
Trt Lgth
923
20/5/88
MO3416T
923
25/5/88
MO3416T
923
26/5/88
PE8694N
Dosage
2 pills 3 x day
after meals
Injection
every 4
hours
Injection
every 8
hours
1 pill 3 x day
109
15/5/88
AS473A
2 pills 3 x day
after meals
7
109
20/5/88
VA231M
2 per day
5
923
20/5/88
CO2355P
14
5
Drug Code
Drug Name
CO2355P
Cortisone
MO3416T
Morphine
PE8694N
Penicillin
AS473A
Aspirin
VA231M
Valium
3
7
41
Worked Example (4)
• Create a SELECT Query that Just
Displays the Functional Groups
• Check that it Contains What We are
After
• Change the SELECT Query to a
MAKE TABLE Query
42
References
• Further Reading and Source for this
Presentation:
– Database Systems: “A Practical
Approach to Design, Implementation
and Management”, 3rd Edition, T.
Connolly and C. Begg, Addison
Wesley, 2001
– “An Introduction to Database
Systems”, 8th Edition, C. J. Date,
Addison Wesley, 2004
43