Databases and DBMS - University of Wollongong
Download
Report
Transcript Databases and DBMS - University of Wollongong
Databases and DBMS
Paul Wong
bld39 rm6
Tues 12:30-2:30
Thurs 8:30-1030
A Brief Recap
Data: individual facts
Information: organisation of one or more pieces of
data to answer a complex question
Knowledge: inferred and applied information to
establish useful patterns
Database System
Databases store data that has a common structure
For example
the customers in a bank,
the car parts in an inventory system,
the books in a library
the students enrolled in a University
For every customer we need more or less the same
data: Name, address, phone, account number etc.
Database Concepts
Each of the things we want to know about a
customer (car part, book or student) is known as an
attribute and is stored in a separate “field”
Name and phone numbers are fields
Name could be divided into three fields
Title, Family Name and Given Name
All of the fields that describe a customer are
arranged to form a “record”
G. Smith, 3 Square St, Perth, 5554321, C50716
A set of records is known as a “file”. A “database”
may be a single file but usually it is several files
Databases, flat files and tables
When the data in an individual file is displayed it
looks like this:
ID
Num
Name
Size
Type
1
0175
Nail
15mm
Clout
2
0254
Nut
35mm
Hex head
Such files only have two dimensions and are
called flat files or tables.
Some applications only need a single flat file
Field attributes
Each field in the record must have:
a unique name (not a unique identifier, see later)
Part_name, Customer_name, PhoneWork, PhoneHome
a field type - number, character, date, logical etc
numbers may be defined as integers or decimals for most
business databases, but floating-point may be required
for scientific data
logical is just Yes or No (or True and False)
useful for indicating status: Payment received - No
a field width - the number of spaces required
Each record usually has a unique number (identifier)
Transaction systems
If you were building a system that stored data about
which parts a customer had purchased, the system
would be a Transaction Processing System (TPS)
You could do this with a single file but it would be
VERY redundant. This is called data redundancy
problem. Redundancies can be wasteful and degrade
performance of the system. BUT, it may lead to more
serious problems.
Name
Phone
1
G.Smith
2
Address
Name
Type
Size
5554321
Nut
hex head
35mm
G.Smith
5554321
Nail
clout
15mm
3
P.Ng
5556789
Nut
hex head
35mm
4
P.Ng
5556789
Nail
clout
15mm
5
Every time we make a sale, we have to enter all the
customer’s details and all the details for every part. If a
customer changes his address, it is now wrong on every
previous sale - which address do we use? Hence
redundancies may lead to data inconsistencies.
Database organisation
We can solve all these problems we need to rethink about
how we structure our data to avoid these problems.
One possibility is to use multiple files, one for each entity
that we are interested in.
In our previous example, we need a customer file, a parts
file and a sales file.
These files form a database that can be
relational - based on tables structure
hierarchical - based on trees structure
network - based on graphs structure
Relational are the most common so we’ll stick to those but
the other two are also useful. The underlying structures are
all based on precisely defined concepts. So they have
special meaning.
Relational database terminology
Relational databases use slightly different terms
The “files” in a relational database are known as tables
and each table describes a collection of real world entity,
e.g. the collection of customers, the collection of parts etc.
Each table consists of a set of records or tuples
Each tuple is made up of fields or attributes
The name relational refers to the fact that special
relationships are defined between the tables or entities in
the database, e.g. sales file. These relationships are known
as
“one to one”, “one to many” and “many to many”
How Does it Work?
customers
parts
sales
nail, 15mm
G. Smith,
1 Bruce Rd
nut, 35mm
File organisation
Unlike text files, DB files have their own internal
structure - the record. Records in a DB file are often
stored in sequential order e.g. part number order.
This is useful for reporting on all parts etc.
When searching a DB file, we often want a single
record or group of records with a common property.
In this case, it is better to store the file for direct
access - the address of every record is known in
advance. If we know we want part #512, we can go
directly to the record for that part.
Direct access files use relative or absolute addresses
Indexing
Obviously, we cannot actually put a file in order of
two different fields e.g. part_num & part_name
If we need to search on both these fields we can
build an index file for each one.
An index file could have all part_nums in order
and beside each, the corresponding record number
An index on part name would have every part
name in alphabetical order with a corresponding
record number
Deletion and indexing
When we delete a record in a database, we only
mark it as deleted - it still exists and it still has a
valid record number.
When we purge deleted records, the record that
used to be number 23, could become 21.
We should regenerate the index files immediately
after any purge procedure
DBMS
Organisations often have a number of applications that use
overlapping data
Our customer file may be used by both the sales
department and by the marketing department.
When the Sales TPS was written, it may have used the
field name: Customer_ID
While the Marketing application uses: CustId
The names must coincide exactly with the field names in
the database. But this makes application development more
difficult.
Database Management Systems - DBMS
Make it possible for different applications to use
different names for the same fields and entities
Alternative names are called aliases and are stored in
a data dictionary.
The data dictionary also contains all the definitions
of tables and fields, these are called metadata – data
about the structure of data
Application developers do not need to inspect each
table or file to find out the type or length of a field,
they simply use the data dictionary
Data dictionaries also contain access rights
Goals of DBMS
Data efficiency
reduces redundancy of data compared to paper ???
Access flexibility
provides simultaneous access to many users
Data integrity
the quality of the data is more reliable, up to date
Data independence
developers interact with the DBMS, not the data
Data security
granting of authorization to modify data, access
etc.
Database Administrator (DBA)
Documents all databases and database applications
(as they affect the DBMS)
Set up new databases and integrate new database
applications
Modifying database structures
Notifying staff of database alterations
Monitoring and tuning the databases
Carries out routine functions like purging and reindexing
Ensures correct access for authorised users
Selects DBMS and develops policies
User and developer views
Many DB packages (engines), particularly for the
PC provide the developer with the ability to view
all the records or step through the records one at a
time. The records can be edited interactively.This
is handy BUT dangerous
e.g an adventurous user could delete all index
files or delete the primary key from a table,
because he or she “never uses them”
Users do NOT usually access the database engine
itself. Instead, a DB application provides each
user with the required data and functions
DB procedural languages & SQL
Many DB packages (DB4, Oracle etc) provide their
own procedural languages to access the DB
Many different languages made it difficult. IBM
developed the Structured Query Language which is
now widely used by DB systems
SELECT ITEM, PRICE
FROM SALES_FILE
WHERE NUMBER .3
ORDER BY ITEM
This produces a list of the items and prices for all
sales where more than 3 items were purchased. The
list is in alphabetical order on item name
Uses of SQL
If a user typed in the code we just saw, we would
say it was interactive SQL use
SQL commands can also be embedded in other
procedural languages, like Cobol - embedded SQL
Although intended to be simple, SQL was still too
hard for most managers to use
Complex queries need to “join” relational tables to
get the required result. This is complex and slow
Access control and Data sharing
DBs, generally, and SQL, specifically, provide
mechanisms to control who gets access to data.
Remember that DBs allow multiple users to share
the data simultaneously. This has some problems.
If one user accesses a client RECORD to change a
the phone number while another user is changing
the address, they BOTH have copies of the record
open. The user who saves the record first will lose
the changes made when the second user saves his
or her record.
Smith, 5554321, 3 Big St Perth
Concurrency
When two or more users access a record
simultaneously it is called concurrency.
DBMS provide file locking and record locking to
overcome concurrency problems.
Several users can view the record without problems
The first person to use a system function to EDIT or
DELETE a record also places a lock on that record.
No other user can then start to edit or delete that same
record until the first user has finished
This causes delays and doesn’t solve all problems
Object oriented DBs
Object Oriented (OO) DBs store special data
called objects. An object contains not only the data
but the procedures associated with that data.
They are more flexible than traditional databases
and can handle more complex queries than SQL
Graphics/video - MDDB
Most early databases were text based - all of the
entities described in all the fields were either
characters or numbers
More modern databases often include graphics e.g. mug shots, house photos etc. They can also
include video clips and sound files
One recent development is the MultiDimensional
database (MDDB) which allows users to rapidly
analyse statistics about a company’s performance
e.g. no. of fridges sold in each state last month
Client Server
Many of the DBs we have seen use a powerful search
engine to access data. This engine may be located on
a remote computer whose task is to serve up the
requested data - a DB server
As PCs have become more powerful, they have taken
on some of the processing task from DB servers.
They do this by running their own “client” software
which integrates with the DB server
The client asks for a set of records and then it orders
them and filters out unwanted data
The DB engine/server is often called the “back end”
and the client software is called the “front end”
Data Warehouses (DW) & the Web
Many organisations have a lot of DBs and DB
applications. Frequently, the data in these DBs has
grown in an ad hoc fashion and cannot be easily
integrated. To overcome this, may organisations
are now developing data warehouses, where
snapshots of the active DBs can be stored
Many organisations prefer to provide access to
DWs through browser based applications. This
allows people all over the organisation to access
the data through a common interface