Powerpoint slides - Dynamic Connectome Lab

Download Report

Transcript Powerpoint slides - Dynamic Connectome Lab

Database Systems
Marcus Kaiser
School of Computing Science
Newcastle University
Aims



provide an understanding of the role of Databases
provide an understanding of the functionality of
Databases
introduce the skills of Database design
Student Objectives





an awareness and understanding of the potential
problems of storing data outside a Database system
a knowledge and understanding of what a database
system can do
an awareness of when it is appropriate to use a
Database
an introduction to how Database systems store
information and process queries
an introduction to the skills of designing databases
Core Content

Storing Data Outside a Database Management
System
 Methods
 Issues
 Advantages of Database Management Systems
 Overview of Database Management Systems
 Introduction to Database Design
 How Databases Store Information
 How Databases Access Information
 Advanced Issues
 Security
Importance:
•
Database Systems are the main technology used
to manage business information
• Databases are used extensively in Bioinformatics
• Computer scientists use Databases in most
projects
By the end of this Theme you
should be able to:





identify opportunities for the use of a Database in
business/Bioinformatics/Computing
design Databases
use a Database
explain what Databases offer
explain how Databases store and access data
What is a Database ?
A collection of related Data about a subject that
provides a base for procedures such as
retrieving information and making decisions
Why are they important

All organisations are highly dependent on the effective
use of information
 day-to-day operations
 historic analysis
 prediction
Storing Data outside a Database
System

Storing Data in Files (similar problems with spreadsheets)
 Data often stored in as a delimited file (‘,’)
 e.g. a Companies Personnel Address Database held in a file
has one entry for each employee:
Name , Initial ,Street, Town, Post Code, Telephone Number
Smith,A,73 Dover St., Newcastle, NE2 3SR,01912234554
Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,01914566547

We could then write a program to access the file to pick out fields
 e.g. telnum Smith would print 01912234554


e.g. address Jones would print
25 Spring Rd., Sunderland, S12 7DD
but this has limitations....
Problems with this Approach :
Data Redundancy

Data Redundancy
 If there is no central database then the same information may be
held in several different places
 e.g.
 The personnel department has the address list shown above
 but, the payroll group needs some different information and keeps its
own records on each employee
Name , Initial , Street, Town, Post Code, National Insurance Number, Salary
Smith,A,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000
Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,WM32345,21450

There is redundancy in the name and address information
 stored in multiple places
 wastes storage space
 updates take twice the work
 if they happen .....
Problems: Data Integrity


When common data is kept in separate files
discrepancies often develop
 e.g. Name and Address information
One may be updated but not the other
Interdependence of Programs
and Data Files

The program which accesses the data has to know the format of the data
 Change the format, and you have to change the access program
 e.g. consider previous address database
Name , Initial, Street, Town, Post Code, National Insurance Number, Salary
Smith,A,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000
Jones,C.A.,25 Spring Rd., Sunderland, S12 7DD,WM32345,21450


a program has been written to generate address labels from this file
but.. someone has complained about being addressed as A. Smith
rather than as Dr. A. Smith ... so a title field is added to the file:
Name , Initial, Title, Street, Town, Post Code, National Insurance Number, Salary
Smith,A,Dr,73 Dover St., Newcastle, NE2 3SR, WM12234, 15000
Jones,C.A.,Mrs,25 Spring Rd,Sunderland,S12 7DD,WM32345,21450

so now all programs which access this file have to change.
Data Inconsistency when a
Computer Fails


Consider the following Sequence of events:
 Sue @ Newcastle University ATM
 11.00 Check Balance
 11.01 ATM says there’s £200
 11.02 Ask for £200
 11.03 ATM Finds £200 in account
 11.04 ATM Gives £200
 11.05 ATM Stores £0 in Balance
What if the computer crashes at 11.04:30 ?
 the balance is wrong
Data Loss when a Computer
Fails

What if the Disk holding the data file breaks ?
 Do we just loose all that information?
Simultaneous Access to the
Data
•
•







Sometimes problems can occur when a file is being updated if there
is more than one user.
e.g. Sue and Jim have a joint bank account.
• they go shopping separately and both run out of money at the
same time
• they both head for the nearest ATM.......
Sue @ Newcastle University ATM
11.00 Check Balance
11.01 ATM says there’s £200
11.02 Ask for £200
11.03 ATM Finds £200 in account
11.04 ATM Gives £200
11.05 ATM Stores £0 in Balance







Jim @ Metro Centre ATM
11.00 Check Balance
11.01 ATM says there’s £200
11.02 Ask for £200
11.03 ATM Finds £200 in account
11.04 ATM Gives £200
11.05 ATM Stores £0 in Balance
Security


Some data needs to be protected
 only certain users are allowed to access it
e.g.
 Anyone in Personnel can see someone's address
(to send them a letter)
 Only the Head of Personnel can see their salary
 Only the Head of Personnel can change a salary
Remote Access



Due to the problems of keeping multiple copies of data...
...many organisations wish to centralize information
If the organisation is distributed so everyone can’t use the
same computer, how do they all get access to the data ?
Remote Access to the data ?
?
?
?
Requirements
Remove Redundancy – or at least control it
 Data Integrity
 Separation of data and program
 Prevent inconsistency when systems fail
 Allow multiple (simultaneous) interactions
 High level of security
 Remote access
 Access by humans and computers

Database Management
Systems: Structure
DBMS
Software
Databas
e
Query Language
Application
Program
Pre-Defined
Applications:
Report Generator
Query by Forms
User
Query
Database Design
Designing a Database

Before creating a Database we need to design it
 Database design cannot be done purely mechanically
but there are techniques which help
 The steps are:
1. Decide on what subjects are covered by the database
2. Decide how they are related to each other
3. Decide on what characteristics they have
4. Derive the database tables from the design
5. Create the database tables using database software
Decide what Subjects are Covered
by the Database
Rules:
 Look at what the subject is about, rather than the
conclusions you want to find
 Think about the subject independently of any particular
database software (or computing at all)
 Don’t design database on computer
 Focus on the data you are likely to collect, rather than
how a computer will organise the data
Decide on what subjects are
covered by the DB
Subjects are called Entities:
 things that hold interest for you in your database
 represent a class of things
 precise definition
 different for every database you create
Example: a Crime database


The police want to track cases
 When a robbery has happened who is involved?
 What was taken?
 Did this go to court?
 What was the verdict?
What are the Entities (subjects) here?
 Police Officer
 Stolen Item
 Case
 Result
Example: a Crime database

What are the entities?
Decide how the entities are
related to one another



Relationships
 a significant association between the entities
 represented as a diamond
Each relationship has:
 a name
 a degree (more on this later)
Names
 Pick a simple word that encapsulates the relationship
 Concentrate on the main entities first: the rest can
come later when you understand the problem better
Naming relationships
Deciding on the Attributes of the
Entities




Attributes are details about the state of an entity
They are things we want to know about an entity
Ensure each has a unique name within the Entity
They are usually drawn as spokes on Entity-Relationship
diagrams:
Summary
Database requirements
Remove Redundancy – or at least control it
Data Integrity
Separation of data and program
Prevent inconsistency when systems fail
Allow multiple (simultaneous) interactions
High level of security
Database structure (Entity-Relationship model)
Entity
Attribute
Relationship