Powerpoint slides - Dynamic Connectome Lab
Download
Report
Transcript Powerpoint slides - Dynamic Connectome Lab
Database Systems
Marcus Kaiser
School of Computing Science
Newcastle University
Aims
provide an understanding of the role of Databases
provide an understanding of the functionality of
Databases
introduce the skills of Database design
Student Objectives
an awareness and understanding of the potential
problems of storing data outside a Database system
a knowledge and understanding of what a database
system can do
an awareness of when it is appropriate to use a
Database
an introduction to how Database systems store
information and process queries
an introduction to the skills of designing databases
Core Content
Storing Data Outside a Database Management
System
Methods
Issues
Advantages of Database Management Systems
Overview of Database Management Systems
Introduction to Database Design
How Databases Store Information
How Databases Access Information
Advanced Issues
Security
Importance:
•
Database Systems are the main technology used
to manage business information
• Databases are used extensively in Bioinformatics
• Computer scientists use Databases in most
projects
By the end of this Theme you
should be able to:
identify opportunities for the use of a Database in
business/Bioinformatics/Computing
design Databases
use a Database
explain what Databases offer
explain how Databases store and access data
What is a Database ?
A collection of related Data about a subject that
provides a base for procedures such as
retrieving information and making decisions
Why are they important
All organisations are highly dependent on the effective
use of information
day-to-day operations
historic analysis
prediction
Storing Data outside a Database
System
Storing Data in Files (similar problems with spreadsheets)
Data often stored in as a delimited file (‘,’)
e.g. a Companies Personnel Address Database held in a file
has one entry for each employee:
Name , Initial ,Street, Town, Post Code, Telephone Number
Smith,A,73 Dover St., Newcastle, NE2 3SR,01912234554
Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,01914566547
We could then write a program to access the file to pick out fields
e.g. telnum Smith would print 01912234554
e.g. address Jones would print
25 Spring Rd., Sunderland, S12 7DD
but this has limitations....
Problems with this Approach :
Data Redundancy
Data Redundancy
If there is no central database then the same information may be
held in several different places
e.g.
The personnel department has the address list shown above
but, the payroll group needs some different information and keeps its
own records on each employee
Name , Initial , Street, Town, Post Code, National Insurance Number, Salary
Smith,A,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000
Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,WM32345,21450
There is redundancy in the name and address information
stored in multiple places
wastes storage space
updates take twice the work
if they happen .....
Problems: Data Integrity
When common data is kept in separate files
discrepancies often develop
e.g. Name and Address information
One may be updated but not the other
Interdependence of Programs
and Data Files
The program which accesses the data has to know the format of the data
Change the format, and you have to change the access program
e.g. consider previous address database
Name , Initial, Street, Town, Post Code, National Insurance Number, Salary
Smith,A,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000
Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,WM32345,21450
a program has been written to generate address labels from this file
but.. someone has complained about being addressed as A. Smith
rather than as Dr. A. Smith ... so a title field is added to the file:
Name , Initial, Title, Street, Town, Post Code, National Insurance Number, Salary
Smith,A,Dr,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000
Jones,C.A.,Mrs,25 Spring Rd,Sunderland,S12 7DD,WM32345,21450
so now all programs which access this file have to change.
Data Inconsistency when a
Computer Fails
Consider the following Sequence of events:
Sue @ Newcastle University ATM
11.00 Check Balance
11.01 ATM says there’s £200
11.02 Ask for £200
11.03 ATM Finds £200 in account
11.04 ATM Gives £200
11.05 ATM Stores £0 in Balance
What if the computer crashes at 11.04:30 ?
the balance is wrong
Data Loss when a Computer
Fails
What if the Disk holding the data file breaks ?
Do we just loose all that information?
Simultaneous Access to the
Data
•
•
Sometimes problems can occur when a file is being updated if there
is more than one user.
e.g. Sue and Jim have a joint bank account.
• they go shopping separately and both run out of money at the
same time
• they both head for the nearest ATM.......
Sue @ Newcastle University ATM
11.00 Check Balance
11.01 ATM says there’s £200
11.02 Ask for £200
11.03 ATM Finds £200 in account
11.04 ATM Gives £200
11.05 ATM Stores £0 in Balance
Jim @ Metro Centre ATM
11.00 Check Balance
11.01 ATM says there’s £200
11.02 Ask for £200
11.03 ATM Finds £200 in account
11.04 ATM Gives £200
11.05 ATM Stores £0 in Balance
Security
Some data needs to be protected
only certain users are allowed to access it
e.g.
Anyone in Personnel can see someone's address
(to send them a letter)
Only the Head of Personnel can see their salary
Only the Head of Personnel can change a salary
Remote Access
Due to the problems of keeping multiple copies of data...
...many organisations wish to centralize information
If the organisation is distributed so everyone can’t use the
same computer, how do they all get access to the data ?
Remote Access to the data ?
?
?
?
Requirements
Remove Redundancy – or at least control it
Data Integrity
Separation of data and program
Prevent inconsistency when systems fail
Allow multiple (simultaneous) interactions
High level of security
Remote access
Access by humans and computers
Database Management
Systems: Structure
DBMS
Software
Databas
e
Query Language
Application
Program
Pre-Defined
Applications:
Report Generator
Query by Forms
User
Query
Database Design
Designing a Database
Before creating a Database we need to design it
Database design cannot be done purely mechanically
but there are techniques which help
The steps are:
1. Decide on what subjects are covered by the database
2. Decide how they are related to each other
3. Decide on what characteristics they have
4. Derive the database tables from the design
5. Create the database tables using database software
Decide what Subjects are Covered
by the Database
Rules:
Look at what the subject is about, rather than the
conclusions you want to find
Think about the subject independently of any particular
database software (or computing at all)
Don’t design database on computer
Focus on the data you are likely to collect, rather than
how a computer will organise the data
Decide on what subjects are
covered by the DB
Subjects are called Entities:
things that hold interest for you in your database
represent a class of things
precise definition
different for every database you create
Example: a Crime database
The police want to track cases
When a robbery has happened who is involved?
What was taken?
Did this go to court?
What was the verdict?
What are the Entities (subjects) here?
Police Officer
Stolen Item
Case
Result
Example: a Crime database
What are the entities?
Decide how the entities are
related to one another
Relationships
a significant association between the entities
represented as a diamond
Each relationship has:
a name
a degree (more on this later)
Names
Pick a simple word that encapsulates the relationship
Concentrate on the main entities first: the rest can
come later when you understand the problem better
Naming relationships
Deciding on the Attributes of the
Entities
Attributes are details about the state of an entity
They are things we want to know about an entity
Ensure each has a unique name within the Entity
They are usually drawn as spokes on Entity-Relationship
diagrams:
Summary
Database requirements
Remove Redundancy – or at least control it
Data Integrity
Separation of data and program
Prevent inconsistency when systems fail
Allow multiple (simultaneous) interactions
High level of security
Database structure (Entity-Relationship model)
Entity
Attribute
Relationship