Unit CGnnn - Databases

Download Report

Transcript Unit CGnnn - Databases

Introduction
Objectives of the Lecture :
•To consider “What is a Database ?”
•To consider “What is a Relational Database ?”
i.e. what is a “relation”, and therefore what
differentiates a relational database from any other
kind of database.
What is a Database ?
Definition :
“A collection of data that is permanently stored on a computer”.
In a database we should be able to :



have different types of data in the collection;
record relationships between different data items;
have varying sizes of data collections to suit our needs.
We need to be able to :



insert new data, delete old data, and amend existing data in the collection;
retrieve data from the collection;
manage the collection so that it can be permanently stored in the face of
various hazards that would otherwise corrupt or lose data in it.
Permanent Data Storage
By “permanent” is meant that once data is put into the computer, it
stays there until someone explicitly removes it, or until the
computer is damaged or broken and this causes the data to be
lost.
The data is not lost when the computer is switched off or the
database is not used for a long time.
A computer has 2 classes of data storage :


Random Access Memory (= RAM). This is used by the computer’s
Central Processing Unit (= CPU) to temporarily hold data that the CPU is
processing.
Backing Store. This is used to store data permanently. Typically magnetic
discs are used for this purpose, although other types of storage device, e.g.
compact discs and magnetic tapes, are also used.
Thus the database invariably uses the backing store to make the data
storage permanent.
Different Types of Data
There are many kinds or types of data, such as :
‘Janet & John’

Numbers
1, -999, 169.58

Text

Dates
14 January 2004

Pictures

Audio

Maps

Videos

etc.
To store useful information, many types may be needed together.
Example : recording observations of a species of bird.
Number of birds observed, textual description of characteristics,
dates of observations, map of migration routes, pictures of birds,
audio recording of bird calls, video of flight.
Relationships Between Data
In practice, we are not interested in just isolated items of data.
We also want to know the relationships between the data
items.
Example : species of bird observations
It is important to know which bird numbers, descriptions, etc
relate to which bird species. It may also be useful to know
which bird species have similar migration routes.
All sorts of relationships are possible : some essential to make
sense of data, some useful for different purposes.
Quantities of Data
Amounts of data may be large or small:

Example (small) :
Names and addresses of members of a golf club (say 500
members).

Example (large) :
Electricity company stores all customers’ account details (say
name & address, a/c no., meter reading, amount owed,
payment method, …. ; 1,000,000 customers).

Example (very large) :
Supermarket stores customer purchase details for analysis
(say average 10 purchases × 1000 customers × 400 stores ×
355 days × 5 years = 7.1 billion purchase details).
Computerised Databases
What is the benefit of a computerised database (= DB)
compared to a manual paper-based collection of data ?
 Compact. Less bulky than filing cabinets.
 Speedy. Computer can retrieve and change data faster than a
human being.
 Less drudgery. Removes tedium.
 Current. Easier to ensure data is up-to-date.
Often databases are shared by several/many users. Then there
are further benefits :
 Consistency. All users have the same data.
 Centralised management. The data collection can be
properly managed on behalf of the whole organisation.
Sharing consistent and well-managed data is much easier to
achieve if the database is computerised.
Database Management Systems


Need special software to manage the DB if the benefits are to
be obtained.
Don’t want to have to write our own computer programs to
do this.
Therefore buy a
Database Management System (= a DBMS).
A DBMS should provide the means to :
 change the DB contents to keep them up-to-date;
 find and retrieve data from the DB;
 protect the data from accidents and intruders;
 store it in an efficient way on the computer.
What is a Relational DB ?
Definition :
“A DB in which all the data is stored in relations”.
A relation is a simple logical structure which contains related
data (hence the name “relation”). It can be pictured as a
table of data
Example :
ENo EName M-S Sal
A relation storing the
E3
1 Smith
5
S
2 12,500
2
employee numbers (= ENo),
2 Mitchell
6
M
4 21,000
4
employee names (= EName), E5
marital statuses (= M-S),
E1
3 Robson
7
D
6 32,500
6
and salaries (= Sal)
E6
4 Blake
8
M
8 54,000
8
of 4 employees.
Row data is related because it all refers to the same employee.
The rows are related because they all relate to employees.
A Relation
The example relation has
4 tuples,
depicted as rows
4 attributes, depicted as columns
ENo
E3
1
E5
2
E1
3
EName M-S Sal
Smith
5
S
2 12,500
2
Mitchell
6
M
4 21,000
4
Robson
7
D
6 32,500
6
E6
4 Blake
8
M
8 54,000
8
As relations are just structures, they can hold any kind of data.
So they can hold all the different kinds of data mentioned earlier,
and more besides, without limit.
Relational Databases

Relations themselves are based on simple mathematical
principles.

Relations can be manipulated by users in ways that are
conceptually simple.

A relational DB is perceived and used as a collection of
relations.

The relational DBMS manages the physical processing of
data so that the user doesn‘t need to know what underlying
computer processing goes on and merely perceives things in
terms of relations.
Importance of Relational DBMSs



Because relational databases are built on sound theoretical
mathematical principles, the theory is very practical.
It makes relational databases easy to learn and use.
Nevertheless relational databases can be very powerful and
flexible.
Therefore relational databases :
 are the most common type of database on the market (over
90% share);
 are extremely important in practice.
Commercial relational DBMSs usually use a database
programming language called SQL. (Thus sometimes they
may be called SQL DBMSs).
Terminology


Because relations are usually depicted as tables, the word
table is used in SQL instead of relation.
Consequently, as tables have columns and rows, the
following tabular names are used in SQL (and often more
generally) instead :
table
instead of
relation
column
instead of
attribute
row
instead of
tuple
Note 1 : some relations are better represented as graphs or
networks.
Note 2 : don’t confuse a relation with its representation.
Relational Implementations
Principles
Calculus
Algebra
RAQUEL
Domain
Tuple
Graphical
QBE
Text
SQL
IBM’s DB2
MS Access
MS SQL Server
Ingres
Oracle
Sybase
Relational Standards

QBE and relational algebra have not been standardised.

SQL has been standardised. Later standards include
additions to SQL compared to earlier standards.

The SQL standards do not adhere completely to relational
principles.


Vendors’ SQL implementations often don’t adhere
completely to SQL standards, and so are even more ad hoc.
This course uses Oracle SQL, because it is popular in the
database market, but it does not adhere completely to the
SQL standards.
Other Data Models
There have been databases that used other data models :Hierarchical : data items (= facts) are arranged in a
hierarchical structure.
Codasyl/Network : data items are arranged in a network
structure.
Object-oriented : data items (= facts) are considered as :
• objects containing data,
• with their own ways of handling their internal data,
• usually arranged in a network structure.