Chapter 1 - Computer Science


COMP 207
Database development and design
www.csc.liv.ac.uk/~khan/COMP207.html
M S Khan
Room G.22
Dept of computer science
University of Liverpool
[email protected]
What is this module about?
In COMP 102 you learn the basics of relational databases.
Here we focus on:
• Database design principles for ER models:
• Logical and physical layer
• Transaction models
• Concurrency
• Recoverability
• Object oriented models
• Semi structured models
Prerequisites
• Entity relationship (E/R) modelling
• Relational data model
• Good knowledge of SQL
Content of the course
Part 1: Revise basic notions of relational Databases
Part 2: Advanced database design: Normalisation,
Logical and Physical DB design methodologies;
Part 3: Transaction management;
Part 4: Overview on Object Oriented and Object
Relational databases;
Part 5: Semi-structured models: XML;
Part 6: Data Mining.
Aims
To introduce students to:
• a systematic design approach to developing
databases, including the use of normalisation;
• the problems arising from concurrency in
databases, and how they are solved;
• the problems involved in the integration of
heterogeneous sources of information and semi-structured data;
• non-relational databases;
• techniques for analyzing large amounts of data.
Learning outcomes
At the conclusion of this module, students should:
• understand the principles of the relational database model.
• understand the design process underlying the design of
databases and the use of normalization;
• understand the concepts involved in transaction
management;
• understand the XML terminology and syntax and
understand how to use XML;
• understand OODBS and their advantages and
disadvantages compared with relational DBS;
• understand basic concepts of data mining;
Textbooks
• Connolly & Begg. “Database systems”. Addison Wesley (fifth edition);
• R. Elmasri, S.B. Navathe. “Fundamentals of Database Systems” (third
edition) Addison Wesley;
• H. Garcia-Molina, J.D. Ullman, J. Widom. “Database systems. The
complete book”. Prentice Hall (First or Second Edition);
Useful readings:
• J.D. Ullman, J. Widom. “A First Course in Database Systems”. Prentice
Hall (Third Edition);
• Silberschatz, Korth and Sudarshan. “Database System Concepts”
McGraw-Hill
• N. Shah, "Database Systems using Oracle", Prentice Hall
Assessment weightings
• 80% Exam;
• Note - all of the material covered by the module is relevant, and
thus any of it could appear in the exam...!
• 20% Coursework;
1. The coursework will consist of two assignments, most probably one in
week 5 and the other in week 9;
2. There are 4 lab slots available, in order to avoid conflicts with lectures;
3. The lab slots can also be used to access the weekly exercises on Vital!
Module Delivery
Lecture Times:
– Tuesday 12:00 (SHER-LT1)
– Thursday 13:00 (CHAD-LT CHAD)
– Friday 09:00 (SHER-LT1)
Lab Classes:
– Formal Labs (with exercises) weekly at:
– Thursdays: 09:00 – 10:00 and 10:00 – 11:00
– Fridays: 10:00 – 11:00 and 11:00 – 12:00
– Commence in Week 3 in Lab 3, Holt Building
– Check the course website to see which class you’re in
Resources
Printouts of the lecture notes will be available from the
Computer Science Helpdesk
– This module will evolve as it proceeds. Whilst we
will strive to adhere to the notes made available at the beginning of
the module (lecture 3), these may vary slightly from the
slides delivered. Additional or amended notes will appear on
the web after each lecture.
Exercises, and answers to common questions, will be
posted on Vital.
Expectations
The field of Databases is vast and still evolving:
– Exams and Exercise questions rely on
understanding and applying much of the material
in this module.
– Don’t rely on simply remembering the notes, as this
won’t help you pass...!
Finally...
The obvious...
– Switch off all mobile phones during lectures
– Remember to sign the register
– Do not sign the register on behalf of others
– Attend lectures and attempt the exercises set
– this will help you pass the exam
– Attend the practical classes
– these will help you do the coursework
– Ask questions if there is anything that you do not understand
– If you can’t hear me in the back, let me know!
And respect your fellow students...
– There are people here who want to learn!
– If you want to talk or mess around, then don’t bother attending!
Lecture 1
Introduction to Databases
Examples of Database Applications
Purchases from the supermarket:
Queries for stock, prices, etc
Purchases using your credit card:
Queries for credit limit, available credit…
Booking a holiday at the travel agents
Using the local library
Taking out insurance
Renting a video
Using the Internet: Amazon, iTunes…
Studying at university
File-Based Systems
Collection of application programs that perform
services for the end users (e.g. reports).
Each program defines and manages its own
data.
• Programmers need to write routines to:
- Enter new data (in the right position);
- Update data making sure that no inconsistencies
arise;
- Delete data;
- Query the relevant data;
- Make sure changes are propagated throughout the
file!
Limitations of File-Based Approach
Separation and isolation of data
• Each program maintains its own set of data:
– Spider and Tulip would maintain separate data!
• Users of one program may be unaware of
potentially useful data held by other programs.
Duplication of data
• Same data is held by different programs.
• Wasted space and potentially different values
and/or different formats for the same item
– Loss of data integrity
Data dependence
• File structure is defined in the program code:
– Every change in format requires reading each record of the
original file and writing it, in the modified format, to a
temporary file. The original file is then deleted and the
temporary file renamed… Lots of work.
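The rewrite procedure described above can be sketched as follows. This is only an illustration: the "name,phone" record layout, the added email field and the function name add_email_column are assumptions made for the example, not part of the module material.

```python
import os
import tempfile


def add_email_column(path, default_email=""):
    """Convert a file of "name,phone" records to "name,phone,email".

    The record layout is hard-coded here, which is exactly the data
    dependence problem: every program reading the file must change too.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Read each record of the original file and write the modified
    # record to a temporary file in the same directory.
    with open(path, "r", encoding="utf-8") as old, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as tmp:
        for line in old:
            name, phone = line.rstrip("\n").split(",")
            tmp.write(f"{name},{phone},{default_email}\n")
        tmp_path = tmp.name
    # Replace the original file with the converted temporary file.
    os.replace(tmp_path, path)
```

Any other program that still expects two fields per line would now break, which is why a format change in a file-based system tends to ripple through every application.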
Incompatible file formats
• Programs are written in different languages, and so
cannot easily access each other’s files.
– Makes the integration of different applications difficult
Fixed Queries/Proliferation of application programs
• Programs are written to satisfy particular functions.
• Any new requirement needs a new program.
• Access restricted to one user at a time.
Database Approach
Arose because:
• Definition of data was embedded in application
programs, rather than being stored separately and
independently.
• No control over access and manipulation of data
beyond that imposed by application programs.
Result:
• the database and Database Management System
(DBMS).
We will look at many DBMS functionalities, such
as transaction management, recoverability,
etc…
Database
Shared collection of logically related data (and a
description of this data), designed to meet the
information needs of an organization:
• From disconnected files with redundant data to
integrated data items with minimum amount of
duplication
System catalog (metadata) provides description of
data to enable program–data independence.
Logically related data comprises entities, attributes,
and relationships of an organization’s information.
Database Management System (DBMS)
A software system that enables users to define,
create, maintain, and control access to the
database:
• Allows the definition of the DB and its constraints (Data
Definition Language);
• Allows the update, deletion and retrieval of data (Data
Manipulation Language);
- In relational databases SQL is both the DDL and the DML.
(Database) application program: a computer
program that interacts with the database by issuing an
appropriate request (SQL statement) to the DBMS.
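As a minimal sketch of such an application program, the Python fragment below issues SQL statements to a DBMS. SQLite (via Python's built-in sqlite3 module) stands in for the DBMS here, and the staff table with its rows is invented purely for illustration.

```python
import sqlite3


def run_demo():
    """Issue DDL and DML statements to an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # DDL: define the structure of the data.
    conn.execute(
        "CREATE TABLE staff ("
        " staff_no TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " salary REAL)"
    )
    # DML: insert, update and delete data.
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?, ?)",
        [("S1", "Ann", 30000.0), ("S2", "Bob", 24000.0), ("S3", "Cat", 18000.0)],
    )
    conn.execute("UPDATE staff SET salary = salary + 1000 WHERE staff_no = ?", ("S2",))
    conn.execute("DELETE FROM staff WHERE staff_no = ?", ("S3",))
    # DML: retrieve data with a query.
    rows = conn.execute("SELECT name, salary FROM staff ORDER BY staff_no").fetchall()
    conn.close()
    return rows
```

Note that the program never says how the rows are laid out on disk; it only states what it wants, and the DBMS does the rest.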
Database Approach
Data definition language (DDL).
• Permits specification of data types, structures and
any data constraints.
• All specifications are stored in the database.
Data manipulation language (DML).
• General enquiry facility (query language) of the
data.
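The idea that DDL specifications, including constraints, are stored in the database itself can be illustrated with SQLite, whose catalog is exposed as the sqlite_master table. This is a sketch under assumptions: the module table and the code 'X101' are hypothetical, and other DBMSs expose their catalogs differently.

```python
import sqlite3


def define_and_inspect():
    """Define a table with constraints, then read its definition back."""
    conn = sqlite3.connect(":memory:")
    # DDL: data types, structure and constraints.
    conn.execute(
        """
        CREATE TABLE module (
            code    TEXT PRIMARY KEY,
            title   TEXT NOT NULL,
            credits INTEGER CHECK (credits > 0)
        )
        """
    )
    # The specification is stored in the database and can be queried.
    catalog = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    # The DBMS, not the application, enforces the CHECK constraint.
    try:
        conn.execute("INSERT INTO module VALUES ('X101', 'Example', 0)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return catalog, rejected
```

Here the invalid row (credits of 0) is refused by the DBMS, whichever application program tries to insert it.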
Database Approach
The DBMS provides controlled access to
database including:
• a security system
• an integrity system
• a concurrency control system
• a recovery control system
• a user-accessible catalog.
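Two of these services, integrity and recovery, can be sketched with SQLite. The account table, the balances and the transfer function are assumptions made for the example; concurrency control and security are not shown.

```python
import sqlite3


def make_accounts():
    """Create a hypothetical account table with an integrity constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit to dst was undone as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))
```

If the debit would make a balance negative, the integrity system rejects it and the rollback removes the credit that had already been applied, so the database is never left half-updated.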
Views
Allows each user to have his or her own view of
the database.
A view is essentially some subset of the
database.
Views - Benefits
Reduce complexity:
• Users see only the data they are interested in;
Provide a level of security:
• Data is excluded from what a user can access;
Provide a mechanism to customize the
appearance of the database:
• Allows users to rename data items;
Present a consistent, unchanging picture of the
structure of the database, even if the
underlying database is changed
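A view as a renamed subset of the database can be sketched in SQLite as follows. The staff table, the branch codes and the view name branch_b1_staff are hypothetical. SQLite has no user privileges, so the example shows only the subset and renaming ideas, not enforced security.

```python
import sqlite3


def build_view_demo():
    """Create a base table and a view exposing part of it under new names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staff ("
        " staff_no TEXT PRIMARY KEY, name TEXT, branch TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?, ?, ?)",
        [
            ("S1", "Ann", "B1", 30000.0),
            ("S2", "Bob", "B2", 24000.0),
            ("S3", "Cat", "B1", 18000.0),
        ],
    )
    # The view hides salary and other branches, and renames the columns.
    conn.execute(
        "CREATE VIEW branch_b1_staff AS"
        " SELECT staff_no AS id, name AS full_name"
        " FROM staff WHERE branch = 'B1'"
    )
    return conn


def view_columns(conn):
    """Return the column names and rows that a user of the view sees."""
    cur = conn.execute("SELECT * FROM branch_b1_staff ORDER BY id")
    return [d[0] for d in cur.description], cur.fetchall()
```

A user querying branch_b1_staff sees two columns and only the B1 rows. If the base table later gains extra columns, the view's appearance stays the same.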
Components of DBMS Environment
Hardware
• Can range from a PC to a network of
computers.
Software
• DBMS, operating system, network software (if
necessary) and also the application programs.
Data
• Used by the organization, together with a description of this
data called the schema.
Procedures
• Instructions and rules that should be applied
to the design and use of the database and
DBMS.
People
Roles in the Database Environment
Data Administrator (DA)
Database Administrator (DBA)
Database Designers (Logical and Physical)
Application Programmers
End Users (naive and sophisticated)
History of Database Systems
First-generation
• Hierarchical and Network
Second generation
• Relational
Third generation
• Object-Relational
• Object-Oriented
Advantages of DBMSs
Control of data redundancy
Data consistency
More information from the same amount of data
Sharing of data
Improved data integrity
Improved security
Enforcement of standards
Economy of scale
Balance conflicting requirements
Improved data accessibility and
responsiveness
Increased productivity
Improved maintenance through data
independence
Increased concurrency
Improved backup and recovery services
Disadvantages of DBMSs
Complexity
Size
Cost of DBMS
Additional hardware costs
Cost of conversion
Performance
Higher impact of a failure