M359 Relational databases: theory and practice

Download Report

Transcript M359 Relational databases: theory and practice

1
Explain relational database concepts.
Explain database and DBMS.
Explain the use of the database language SQL.
Explain about database development and the techniques.
2
Chapter -1
1.3
1.5
1.6
1.9
Modern Data Processing
Databases and DBMSs
Are all database systems the same?
Enduring issues in data management
3
Modern Data Processing
The file-based approach.
The database Management System approach.
11
File based systems
Made up of data files and associated programs.
Each data file contains data records specific to one
application.
Each application needs to know about the format and
access mechanisms for each file (interface).
12
Interface
13
The explicit interface
14
Productive maintenance
15
Productive Maintenance: Maintenance that leads to a
benefit by meeting a new requirement , not just a
continuation of the previous state
16
Unproductive maintenance
17
Unproductive Maintenance: making changes to the
program not because of any new requirement for that
program but in order to continue using a shared data
file that has been changed for some other reason
18
Unproductive maintenance-
change program
because shared data file has been changed &not because of new requirement for that
program
19
Data Dependence:
File based processing is closely tied to physical
structure of the data stored in the files.
Disadvantages of file based systems:
 Unproductive maintenance.
 Duplication of data in multiple files.
 Need to generate a new application for each request.
20
The Database Approach
 It hides physical representation from users or
application programs.
 A s/w system (DBMS) is responsible for all
interactions of data files with application programs.
 The application program doesn’t know how data is
stored or how to manipulate records.
 Application program uses a logical interface.
21
The Database Management System
Approach
22
Statements to the logical interface
Example for logical interface : Select Postcode From
Student;
 DBMS turns logical request to physical instruction.
 It internally hold a mapping b/w logical name & physical
representation.
 To change physical structure , change the mapping but not
to change logical interface& application process.
 If we change the physical structure, then the mapping can
be changed. But no change in logical name & application
program.
 The separation of physical storage of data from application
process (data independence).
23
The enduring principles of data
storage and management
 The data records must be durable.
 The data records must be interpretable.
 The data records must be trustworthy and secure.
 Agreeing the structure and content of data collections can
enable data sharing.
 Structured data can be processed automatically.
 Data structure representation must be flexible.
 Data access must be simple and logical, rather than tied to
physical representations.
 Data collections can be used for many purposes; their
processing must be timely to be useful.
24
Data vs Information
Data is a term given to the many recordable facts about
the world we inhabit.
 Example: a clock may show both hands pointing
towards the 12 
Information is said to be meaningful data.
 Example: if the clock is not working or is not set to local
time, we cannot say what  means!
25
Database terminology
Database: a collection of structured, persistent,
shared data.
Structured data: the data has a regular, defined,
structure within the database and can support the
manipulation of the data.
Persistent data: the data will continue to be stored
even when the applications creating and using it are
no longer running. Data must be maintained without
loss.
Shared data: can be used by many user applications.
26
Database terminology
 Database Management System (DBMS): the collection of
software that enable storage, manipulation and protection of
databases.
 Database engine: A software program for accessing DB and
manipulating data.
 Redundancy: duplication of data.
 Query: a request to the database to access some data.
 User process: user program running outside the DBMS.
 Data dictionary (system catalog): description of the data held
in the database.
 Query optimizer: works out how to turn a logical query into an
efficient series of physical processing steps to achieve the desired
result.
 Constraints: Rules that define representation and values for
data in the DB.
27
DBMS-embedded functions
28
Separation of concerns in a DBMS:
Physical storage, logical description & individual user requirements
29
Advantages of the database and
DBMS approach
Centralized functionality leads to consistency in the
management of the data, its storage, validation,
security, processing and control.
Data can be shared reducing the need for redundant
duplication.
There is a reduction of the unproductive maintenance
incurred when the requirements of data and programs
are changed.
The separation of concerns allows a focused way of
considering data representations, requirements and
processing.
30
Advantages of the database and
DBMS approach
The separation of concerns simplifies the task for
programmers developing user processes.
Ad hoc querying (an unplanned, interactive style of
query generation) becomes easily achievable because
the DBMS can translate a logical user query into the
more complex physical requirements of the underlying
computer system.
31
Disadvantages of the database and
DBMS approach
Complex.
Expensive.
Risks of failure and security breaches are severe.
32
The eras of database management
systems
Pre-relational.
Relational.
Post-relational.
33
Pre-relational systems
Hierarchical systems
• Data is arranged in a hierarchy.
• A tree structured hierarchy.
• Search is not efficient.
Network systems
• A data item can have multiple parent data items.
• It has increased flexibility.
• More complex to navigate.
34
Hierarchical systems
Student enrolled in courses at different years
35
Relational systems
The student relation
 E F Codd proposed this structure.
 A logical data structure in which all data appeared in tabular
form known as relation.
36
Relational systems
Selecting Rows (Postcode starts with MK)
37
Selecting certain columns
38
The Closure Property
Selecting rows and columns
Closure property: The property of applying operations to relations &
producing relations.
39
Problems with the relational model
 Representation difficulties: It is difficult to
represent objects in the real world into tables.
 Impedance mismatch: convert between the two
approaches whenever data is transferred from the setbased to the value-based environment.
 Semantics: Fails to capture the meaning of data
stored in the database.
40
Post-relational systems
Object-oriented database systems.
XML database systems.
41
Object-oriented database systems
We can represent data using objects.
Objects contain both value(data) and
behaviors(methods).
Encapsulation: the internal state is only accessible
through the methods.
Types of Object-Oriented Systems
Object-oriented DBMS (OODBMS).
Object-relational DBMS (ORDBMS).
42
eXtensible Markup Language (XML)
database systems
 Uses tagging to indicate the meaning of data.
 Example:
8 High Street MR245TD United Kingdom
<address> <housenumber> 8 </housenumber>
<streetname> High Street </streetname> <postcode>
MR245TD </postcode> <country> United Kingdom
</country> </address>
43
Chapter -2
46
Corporate information strategy(CIS)
 Information strategy for a large organization.
 Is developed to ensure that the technology is meeting the
company’s needs and is appropriate for those needs
It can be broken into:
 Information systems strategy (IS)
 Information management strategy (IM)
 Information technology strategy (IT)
47
Corporate information strategy
48
 IS strategy
 Address how to implement information technology to improve the
business
 Ex: make an online website , to make site attractive give free services,
promotional offers etc.
 IM strategy
 Clarify the duties of employees related with the new system to
implement IS strategy
 Their tasks will be manage & operate the new system
 Keep its information up-to-date, accurate and understandable
 IT strategy
 It includes an outline of activities & processes needed to implement
the IS strategy within the IM strategy defined
 It involves supply of technology and equipment
49
The cost of data
 Costs are usually tangible, except for
 Costs related to poor quality or missing data (time,
effort, frustration & dissatisfaction)
 Can be divided under 4 categories:
 System costs (building, equipment , network, people ,)
 Origination costs (acquisition, cleaning &
preparation)
 Communication costs (setting up & delivery)
 Execution costs (training employee, additional
equipment & software)
50
Data quality
 Data have high quality if it satisfies the requirements of its
intended use.
 It lacks quality when it fails to satisfy the requirements.
Quality Management of data
 establishing standards for quality of data.
 establishing procedures or methods which ensure that
these standards of quality are met.
 monitoring the actual quality.
 taking control when quality falls below acceptable
thresholds.
51
Data quality
Data must include the following qualities
 Accuracy : data value & data representation should be
accurate.
 Completeness: should have complete set of data values to
take decisions.
 Timeliness: data is available for user needs within a
reasonable time period.
 Relevance: the appropriateness of the data to the
requirement.
 Understandable: meaning and interpretation must be
clear to the users.
 Trusted: require an adequate guarantee for the security,
privacy and ownership of data.
52
The impact of poor data quality
Operational effect: lowering customer & employee
satisfaction, increasing costs.
Management effect: poor or invalid decision making,
taking longer to make decisions, resulting in internal
organizational mistrust.
Strategic effect: creating difficulties in setting
strategy, raising issues of data ownership and control,
diverting management attention.
53
ACTIVITY 2.1 Messy mailing lists
1.
2.
3.
4.
5.
Open the Notlaw Supermarket
Choose the Mailing List Generation application
Use the By Postcode
Use the partial postcode MK.
Compare it to the list in the ALL option
 List data quality problems that could have been
avoided by data cleaning or better constraints?
 What data quality problems cannot be avoided?
 How will poor quality affect the Supermarket?
56
ACTIVITY 2.2 Refund problems
1. Run the Notlaw Till Management app. give a date in last week.
2. Record the purchase of one four-pint carton of milk
3. Write a note of the till receipt No. & the line No. of the milk
and the code No. & price of the milk
4. Run Price Management app.
5. Give the date as yesterday.
6. Increase the price by 3 pence.
7. Run the Refund Management app.
8. Give today’s date.
9. Use the till receipt No. & receipt line No. you noted earlier.
 How much profit can the customer make in this situation?
 Can you do the same at the Walton Supermarket?
57
ACTIVITY 2.3 Moving problems
1. A customer informs the supermarket of a change of
address.
2. Using the Notlaw Customer Maintenance application:
3. Select the data for Geraldine Taylor, by entering her
customer number C12345.
4. Change this customer’s address to any new address
5. Save this change
6. Using the Orders Pending app. look for outstanding
orders still to be delivered to Geraldine (there is only one).
 What problem is evident in the data the applications
display?
 Is this problem present in the Walton Supermarket?
58
ACTIVITY 2.4 Confusion reigns
1. We are to develop an application to report the details of
incomplete orders to suppliers
2. Look at the document notlaworderdatadescription.doc in
the \Activities\Block 1\Notlaw\OtherDocs subfolder
within your M359 installation folder.
3. Using this description identify those data items you would
need to use to decide if an order was incomplete.
 What is your understanding of the meaning of the Notlaw
data?
 Now look at the file waltonorderdatadescription.doc why is
it easier to understand?
59
ACTIVITY 2.5 Using nonstandard data representations
1. Open the Notlaw Book Catalogue application.
2. Examine the data held about the latest Airy Porter book
(currently Volume 216).
3. Make a note of the main details for this book.
4. Use the Books-4-Booksellers Order app. to order another
6 copies of this book. Use the Customer Reference “1234 “
and “notlaw “ for the Customer Name.
 that is, if you can!
 follow the above instructions for the Walton Supermarket.
60
ACTIVITY 2.6 Satisfying a Data
Protection Act request
1. A customer who previously requested a copy of their data held by the
supermarket has disputed the alleged non-payment of a bill. This customer
says that on the occasion in question they did not make an order with
Notlaw Supermarket so there should be no outstanding bill related to their
account.
2. The supermarket admits it has made a mistake by confusing this customer’s
records with that of another customer. The supermarket has agreed to
correct the fact that the customer is recorded as having a poor payment
history.
3. Using the Notlaw Customer Maintenance application
4. Look at the entry for Gerald Tailor, customer No. C19342.
5. You should see that the customer details show a Poor Payment flag on
screen;
6. remove that flag by changing the yes to no.
7. Update the changed data in the database.
8. Now, retrieve Gerald’s data a second time and confirm that the Poor
Payment flag shows no.
61
ACTIVITY 2.6 Satisfying a Data
Protection Act request
9. Now execute the End-Of-Month Reconciliation application.
10. This application would do some automated reconciliation checks that the
supermarket might make to maintain the internal consistency of data; it
might also result in the payment of bills, sending of overdue reminders and
other such activities. We’ve coded this so that it checks if there are any
unpaid bills and sets the poor payment flag for those customers who have
unpaid bills.
11. Incidentally, the original choice of the 10-digit ISBN has proved to be
insufficient to cope with the volume of books being published; prompting
the recent introduction of 13-digit ISBNs – so expect a mini-millenium bug
type flurry of activity in the field of book databases.
12. Using the Customer Maintenance application, check the state of the poor
payment flag for Gerald Tailor. You should see that the poor payment flag
has been reset by the reconciliation application.
62
ACTIVITY 2.6 Satisfying a Data
Protection Act request
Now use the Credit-Check application:
13. Enter Gerald’s customer number C19342 to bring up the details of any
outstanding payments. As you can see, changing the poor payment flag has
not changed the data which shows that Gerald has an unpaid bill. In fact it
should be inconsistent to have unpaid bills and a good payment record, but
this is permitted by the Notlaw database. A change to the poor payment
flag should require a change to the related data so that the reconciliation
application cannot reset the poor payment flag at some later date.
14. Use the Walton Supermarket applications to make the above alterations to
Gerald’s data (he has the same customer number). Remember that the
Walton data is separate from the Notlaw data, so Gerald’s record hasn’t been
corrected yet. When you attempt to update the poor payment flag, the
Walton application will behave differently compared to the Notlaw
application. Follow the on-screen instructions to reallocate the disputed
order to the correct customer Geraldo Tailon C19324, and run the
reconciliation application
15. and check Gerald’s customer records.
63
ACTIVITY 2.7 Maintenance
overheads
1.
2.
3.
4.
5.
Run the Notlaw Customer Profiling application.
use today’s date
Look at the customer details listed.
Now request the report for the same date two years ago
Compare the detail of a few customers shown on the two
reports;
 What is unusual?
 For comparison, look at the customer list produced by the
Walton Customer Profiling application for a similar span
of dates.
64
ACTIVITY 2.8
Accessing archived records
1. Your task in this activity is to extract the contents of an archive
file and print out the customer data.
2. Locate the data file Notlaw_May_1_1977.nrc in the
\Activities\Block 1
3. \Notlaw\OtherDocs folder of your M359 installation folder.
4. This data file is an archival record of the content of the Notlaw
customer database on 1 May 1977. Try to extract some
meaningful content from this file.
5. Don’t spend too long trying to interpret the content of the file;
you can use a simple text editor like Microsoft Notepad to open
the file, then skip straight on to the discussion.
65
The data island syndrome
 Independent database inside the company for the use
of a single department or even a single person.
 Could be independent of the company’s policies and
outside the company’s direct data management.
 Creates isolated islands of data within the
organization.
67
End of First
Meeting
69