Powerpoint slides - Dynamic Connectome Lab
Download
Report
Transcript Powerpoint slides - Dynamic Connectome Lab
Database Systems
Marcus Kaiser
School of Computing Science
Newcastle University
Recap: Data Inconsistency when
a Computer Fails
A bank wishes to move £500 from account 12 to
account 17
The sequence of actions is:
1. Reduce the Balance of account 12 by £500
2. Increase the Balance of account 17 by £500
What if the computer crashes after 1 and before 2 ?
Transactions
A Transaction is a logical unit of work
A Transaction can consist of a sequence of database
operations
e.g.
1. Reduce the Balance of account 12 by £500
2. Increase the Balance of account 17 by £500
A Transaction either executes in its entirety or is totally
cancelled
Transactions (examples)
A Transaction either executes in its entirety or is totally cancelled
e.g.
Start Transaction
1. Reduce the Balance of account 12 by £500
2. Increase the Balance of account 17 by £500
End Transaction
A. Both steps of the transaction complete successfully
The database has been changed
B. The computer crashes after Step 1.
the DBMS restores the database to the state it was in before
the Transaction began
How does it do this ?......
Logging
The DBMS keeps a log (on Disk) in which it records:
transactions starts
db updates (old and new values)
transaction ends
e.g. Start Transaction
1. Reduce the Balance of account 12 by £500
2. Increase the Balance of account 17 by £500
End Transaction
if all goes well, the log entries are:
Start Transaction
Update (Account: 12 , 1000 -> 12 , 500)
Update (Account: 17 , 2000 -> 17 , 2500)
End Transaction
Logging
e.g.
Start Transaction
1. Reduce the Balance of account 12 by £500
2. Increase the Balance of account 17 by £500
End Transaction
If the computer crashes after Step 1, the log entries are:
Start Transaction
Update (Account: 12 , 1000 -> 12 , 500)
When the computer comes up again, the DBMS can undo all
updates made by incomplete transactions
Logging Example
A personnel department keep a database with 2 Tables:
Payroll:
Employee:
Id
Surname Initial
Title
12
Smith
A
Mr
75
Smith
K
34
Brown
M
Job
Id
Wage
Designer 1
12
15000
Ms
Designer 2
75
17000
Mrs
Implementer
34
16000
Mrs Brown is promoted to Manager, and her salary increased to 20000
The transaction is:
Start Transaction
1. Update the Job of Employee Id 34 to Manager
2. Update the Wage of Id 34 to 20000
End Transaction
Avoiding Data Loss when a Disk
Fails
The Log can also allow us to Recover from Disk Failure
The database is regularly copied onto tape (archiving)
nightly is common
The log is stored on a different disk to the database
Actions on Disk Failure:
If a Database Disk fails:
1. Replace the Disk
2. Copy the last database archive back onto the disk
3. Process all log entries made after the last archive
if the log entry is an update for a completed
transaction then do it
Recap: Simultaneous Access to
the Data
•
•
Sometimes problems can occur when a file is being updated if there
is more than one user.
e.g. Sue and Jim have a joint bank account.
• they go shopping separately and both run out of money at the
same time
• they both head for the nearest ATM.......
Sue @ Newcastle University ATM
11.00 Check Balance
11.01 ATM says there’s £200
11.02 Ask for £200
11.03 ATM Finds £200 in account
11.04 ATM Gives £200
11.05 ATM Stores £0 in Balance
Jim @ Metro Centre ATM
11.00 Check Balance
11.01 ATM says there’s £200
11.02 Ask for £200
11.03 ATM Finds £200 in account
11.04 ATM Gives £200
11.05 ATM Stores £0 in Balance
Atomic Transactions
Transactions can be made atomic
An atomic action has exclusive access to the data
Changing the balance on an account is atomic
Either Sue or Jim will get exclusive access to the
balance change
The other will need to wait until the atomic transaction is
finished
At which stage they will see the new balance
Keys
Primary Keys, Foreign Keys and Candidate
Keys
Keys
Keys are a subset of the fields of a table which
uniquely define a record
Primary Key – the key within a table
Staff Id
Name
Postcode
Phone
001
Scott T
NE1 7RU
0191 222 2222
Primary Key
Foreign key – a primary key in another table
Staff Id
Job Title
Salary
001
Chef
£10,000
002
Manager
£11,000
Candidate Key – one of the possible options for
primary key
Database Normalization
Helping to identify good designs
Database Normal forms
Normal forms are a set of requirements on a database
They won’t tell you you’ve got a good design
Just tell you your design isn’t bad
There are ~7 Normal Forms each becoming
progressively more restrictive
Though the first three are the ones that are used
most often
When you’ve designed your tables you can use them
to check you’ve not made a bad design
First Normal Form 1NF
A table is in 1NF if:
1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly
one value from the applicable domain (and nothing
else).
5. All columns are regular [i.e. rows have no hidden
components such as row IDs, object IDs, or hidden
timestamps].
Examples of NOT 1NF
Name
ID
Module
Name
ID
Mod
Mod2
Scott
1
M1
Scott
1
M1
M2
M2
Simons
2
M1
M2
M1
Trevor
3
M3
Name
ID
Mod
Scott
1
M1 M2
Simons
2
Trevor
3
M1 M2 M3
M3
Simons
2
M2
M3
Trevor
3
M3
Name
ID
Mod
Scott
1
M1,M2
Simons
2
M1,M2,M3
Trevor
3
M3
Mod3
M3
Fixing 1NF problems
In general splitting a table into separate tables can fix
1NF problems
Name
ID
StudentID ModuleID
Scott
1
1
M1
Simons
2
1
M2
Trevor
3
2
M1
2
M2
2
M3
2
M3
Second Normal Form 2NF
A table is in 2NF if
It is in 1NF and
Any attribute in the table depends on the whole of
the candidate key and not just part of it.
Examples of NOT 2NF
CK
Student ID
A-level Subject
Degree
001
Maths
Comp-Sci
001
Physics
Comp-Sci
001
Biology
Comp-Sci
002
Maths
Maths
002
Physics
Maths
002
Business
CK
Maths
Name
Sport
Address
Tim
Football
11, The Acres
Tim
Tennis
11, The Acres
Mary
Football
44, Beech St
Mary
Badminton
44, Beech St
Fixing 2NF problems
Again in general splitting the table up will solve 2NF
problems
Student ID
A-level Subject
Name Address
001
Maths
Tim
11, The Acres
001
Physics
Mary
44, Beech St
001
Biology
002
Maths
002
Physics
Name
Sport
002
Business
Tim
Football
Tim
Tennis
Mary
Football
Mary
Badminton
Student ID
Degree
001
Comp-Sci
002
Maths
Third Normal Form 3NF
A Table is in 3NF if
It is in 2NF and
Any attribute is only dependent on the candidate
key and nothing else
Examples of NOT 3NF
CK
Module
Module Year
Module Leader ID
Module Leader Name
CSC8010
2009
0001
Jenny Palmer
CSC8010
2010
0004
Paul Watson
CSC8304
2009
0002
Marcus Kaiser
CSC8304
2010
0002
Marcus Kaiser
CSC8010
2008
0003
Pete Lee
Fixing 3NF problems
Again splitting into separate tables can fix 3NF
Module
Module Year
Module Leader ID
CSC8010
2009
0001
CSC8010
2010
0004
CSC8304
2009
0002
CSC8304
2010
0002
CSC8010
2008
0003
Module Leader ID
Module Leader Name
0001
Jenny Palmer
0002
Marcus Kaiser
0003
Pete Lee
0004
Paul Watson
Mnemonic for normal forms
Data should depend on
the key
(1NF)
no duplicate entries
the whole key
(2NF)
not only part of the candidate key
and nothing but the key (3NF)
no dependency on other attribute
More SQL
Running Example
A Company keeps records for boat hires
Each boat is crewed by a sailor
Boats can be reserved
Three tables: Sailors, Boats and Reservations
Sailor
Sid
Sname
Rating
Boat
Bid
Bname
Color
Reservation
Sid
Bid
Day
Age
Rough Matching
If you don’t know exactly what you’re looking for in a
string you can use LIKE
_ - Matches with exactly one unknown character
% - Matches with 0 or more unknown characters
SELECT Age
FROM Sailor
WHERE Sailor.Sname LIKE ‘Ne_o’
Would match ‘Nemo’, ‘Neto’, ‘Nebo’, ….
SELECT Age
FROM Sailor
WHERE Sailor.Sname LIKE ‘Ne%o’
Would match ‘Neo’, ‘Nemo’, ‘Nemo von bo’, …
Mathematical Operations
You can use mathematical operations within the
SELECT statement
You’ve already seen this as MIN, MAX, AVG
You can also have +, -, * , /, %
What is the sailors rating per year?
SELECT Rating / Age
FROM Sailor
Union, Intersect, Except
SQL provides set-manipulation constructs:
UNION ( )
INTERSECT ( )
EXCEPT ( )
By default, duplicates are eliminated in results
To retain duplicates, use UNION ALL, INTERSECT
ALL, EXCEPT ALL
Find Those who’ve hired a red
OR Green boat
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid
AND (B.colour=‘red’ OR B.colour=‘green’)
OR
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.colour=‘red’
UNION
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.colour=‘green’
Why?...
Find Those who’ve hired a red
AND Green boat
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid
AND (B.colour=‘red’ AND B.colour=‘green’)
OR
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.colour=‘red’
INTERSECT
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.colour=‘green’
Find Those who’ve hired a red
boat but not a Green boat
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid
AND (B.colour=‘red’ AND B.colour!=‘green’)
OR
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.colour=‘red’
EXCEPT
SELECT S.sid
FROM Sailors AS S, Boats AS B, Reservations AS R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.colour=‘green’
GROUP BY
So far, we’ve applied aggregate operators to all
(qualifying) records. Sometimes, we want to apply
them to each of several groups of records.
Consider: Find the age of the youngest sailor for each
rating level.
How many rating levels are there? What are the rating
values for these levels? In general, we don’t know!
Suppose we know that rating values go from 1 to 10;
we can write 10 queries that look like this (!):
For i = 1, 2, ... , 10:
SELECT MIN (S.age)
FROM Sailors S
WHERE S.rating = i
GROUP BY
To write such queries, we need GROUP BY clause, a
major extension to the basic SQL query form.
E.g. Find the age of the youngest sailor for each rating
level - can be expressed as follows
SELECT
S.rating, MIN (S.age)
FROM
Sailors AS S
GROUP BY
S.rating
S.rating after ‘GROUP BY’ is called a grouping-list
HAVING
What if we’re only interested in some of the groups?
We can restrict this with HAVING
For sailors over 18 what is the highest rating for each
age?
SELECT
S.age, MAX(S.rating)
FROM
Sailors AS S
GROUP BY
S.age
HAVING
S.age > 18
Summary
Databases are the primary way in which information is managed
in organisations
They offer a range of valuable functions
querying, security, transactions…
Database design is important
Normal forms (1NF, 2NF, 3NF):
data should depend on the key, the whole key, and
nothing but the key
You should now be able to:
explain the main functions of databases
identify opportunities to exploit them to meet business needs
design databases
perform queries against an SQL database