04CM20145Lecture7 - Department of Computer Science

Download Report

Transcript 04CM20145Lecture7 - Department of Computer Science

Dr Alwyn Barry
Dr Joanna Bryson
CM20145
Database Design
Lecture Plan
1. Basic Concepts
2. Data, Information & Knowledge
3. Data Models (The E-R Model)
4. The Relational Algebra
5. Introduction to SQL
6. Further SQL (Joins, RA Equivalences)
7. Database Design
8. Further DB Design – Normalisation
9. Architectures and Implementations
10. Integrity and Security
Lecture Plan
`Reading’ (PROJECT) Week
11. Ethics and Professional Conduct
12. Legal Issues
13. Transactions
14. Recovery
15. Concurrency Control
16. Storage and File Structure
17. Indexing and Hashing
18. Query Processing & Optimisation
January… Review Session
Overview
 Database Design and Software



Engineering.
Example from real life.
Views and Users.
Redundancy and
Decomposition.
Loosely based on A. Keller, 2002, USCU
What Is Relational Algebra For?
 Fundamental Operators:
Unary
Binary
Fundamental
 Restriction
 Projection
 Cartesian Product
 Union
 Difference
 Useful Operators:
Not
Primitive
 Join
 Intersection
 Division
File Problems DBs Should Solve
 Data redundancy & inconsistency
 Duplication of information; Different formats
 Data isolation
 Different formats; Different locations
 Limited access
 Need new program for each query
 Cannot support “Business Rules”
 Consistency
 Validity
 Atomicity
 Failures leave data in inconsistent state; Two users
cannot update the database at once
 Security
 Needs bespoke security in each application
Database Design
 DB design usually covers formal
methods for trying to ensure
databases meet all these goals.
 No formal system is perfect:
 Formal methods take a long time,
may be computationally intractable.
 Human error: mistakes in proofs.
 Formally correct databases can still
have errors (will be examples.)
 Design is an art; but science can
help.
E-R Model – System Design
From Elmasri & Navathe, 2003, pg 51
Problem
domain
DBMS Dependentt
DBMS Independent
Requirements
Collection & Analysis
Functional Requirements
Data Requirements
Functional Analysis
Conceptual Design
High-level Transaction Spec.
Application Program
Design
Conceptual Schema
Logical Design
(Data model mapping)
Logical Schema
Transaction
Implementation
Physical Design
Internal Schema
Application Programs
Software Development Iterates
http://www.extremeprogramming.org/
Waterfall Feigning Iteration
 This is not agile SE.
 People don’t want to know, but
iteration happens.
 The question is how well you cope.
Software Development Iterates
 Models change as programmers
understand the problem better.
 Requirements change as users
understand possibilities better.
 More resources become available.
 So:
 Save and maintain all modelling and
planning tools.
 Interact with users frequently.
 Learn rules to recognize common
failures.
Overview
 Database Design and Software



Engineering.
Example from real life.
Views and Users.
Redundancy and
Decomposition.
Loosely based on A. Keller, 2002, USCU
Where Do Models Come From?
 Demand – What do the users
want to see?
 Data – Don’t throw anything
away.
 Design – Never store anything
twice.
 Efficiency.
 Integrity.
 Clarity.
 Security.
Automating Trading Operations
Automating Trading Operations
 Data from:
 trading cards,
 market prices,
 underlying
values.
 Demand from:
 Knowing what is
owned for
determining risk.
 Knowing what
individuals did for
determining pay.
Entities and attributes
 Trades
 Instrument, trade,
price, 2 traders, 2
clearing firms.
 Traders
 Positions, trades.
 Trading cards
 Trades, trader,
number, time.
 Instruments
 Daily values,
volatilities, expiration
dates.
(draw on board)
New Requirement: Reconciling
 Business process – dual entry:
 Operations: primary concerns are
risk, execution on trading floor,
relations with individual traders.
 Clearing: primary concerns are
accounting, banks, law, relations
with other clearing firms.
 The same trading cards resulted
in different trade quantities.
 The same trade has different
dates!
Models will change!
 Design – Find out models are
inefficient or clumsy to maintain.
 Data – Discover new categories,
salient values.
 Demand – Users see new
potential.
Overview
 Database Design and Software



Engineering.
Example from real life.
Views and Users.
Redundancy and
Decomposition.
Loosely based on A. Keller, 2002, USCU
Views: Why they’re important.
 The same data may be seen by
different users in different ways.
 Shorthand for frequent joins,
formulas – may be more efficient.
 Automate / enforce security –
make access to tables and views
depend on user’s function.
 Keep reports logically
independent from underlying
representation – protect the
users!
Why Limit / Protect Users?
 Databases contain all
data for a company.
 Limits on access eliminate
suspects for errors, crime.
 Information overload.
 Smart, authoritative users
still need to find things
quickly.
 Some users really are
naïve.
Programmers are users.
Overview
 Database Design and Software



Engineering.
Example from real life.
Views and Users.
Redundancy and
Decomposition.
Loosely based on A. Keller, 2002, USCU
Software Development Iterates
 Models change as programmers
understand the problem better.
 Requirements change as users
understand possibilities better.
 More resources become available.
 So:
 Save and maintain all modelling and
planning tools.
 Interact with users frequently.
 Learn rules to recognize common
failures.
Software Development Iterates
 Models change as programmers
understand the problem better.
 Requirements change as users
understand possibilities better.
 More resources become available.
 So:
 Save and maintain all modelling and
planning tools.
 Interact with users frequently.
 Learn rules to recognize common
failures.
Pitfalls in Relational DB Design
A bad design may lead to:
 redundant information,
 difficulty in representing certain
information, or
 difficulty in checking integrity
constraints.
Design goals:
 Avoid redundant data.
 Ensure that relationships among
attributes are represented.
 Facilitate the checking of updates
for violation of integrity constraints.
©Silberschatz, Korth and Sudarshan
Modifications & additions by S Bird, Melbourne
Example of Bad Design
 Consider the relation schema:
Lending-schema = (branch-name,branch-city,assets,
customer-name,loan-number,amount)
 Redundant Information:
 Data for branch-name, branch-city, assets are
repeated for each loan that a branch makes.
 Wastes space and complicates updates, introducing
possibility of inconsistency of assets value.
 Difficulty representing certain information:
 Cannot store info. about a branch if no loans exist.
 Can use null values, but they are difficult to handle.
Solution: Decomposition
 Break up redundant tables into
multiple tables - this operation is
called decomposition.
 E.g. Lending-schema = (branchname, branch-city, assets, customername, loan-number, amount)
 Branch-schema = (branch-name,
branch-city,assets)
 Loan-info-schema = (customer-name,
loan-number, branch-name, amount)
Lossless-Join Decomposition:
 Want to ensure that the original
data is recoverable.
1.All attributes of the original schema
(R) must appear in the decomposition
(R1, R2), i.e. R = R1  R2
2.Decomposition must be a lossless-join
decomposition.
 Definition: R1,R2 is a lossless-join
decomposition of R
if, for all possible relations r(R)
 r = R1 (r) ⋈ R2 (r)
Bad Decomposition Example
 A Non Lossless-Join Decomposition
R = (A, B)  R1 = (A), R2 = (B)
A B
A
B
A
B





1
2
A(r)
B(r)




1
2
1
2
1
2
1
r
A (r) ⋈ B (r)
 Thus, r is different to A (r) ⋈ B (r)
 So A,B is not a lossless-join
decomposition of R.
Overview
 Database Design and Software



Engineering.
Example from real life.
Views and Users.
Redundancy and
Decomposition.
Loosely based on A. Keller, 2002, USCU
Summary
 Database design is an ongoing,
iterative process.
 Requirements come from data, user
demands, design issues.
 Change occurs:
 Corporations & technologies grow.
 Programmers & users learn.
 Views / security.
 Lossless-join decomposition
Next: Science for improving design.
Reading & Exercises
 Reading
 Connolly & Begg Chapter 9, (13, 14)
 Silberschatz Chapter 7.
 Much of 7 will be in the next lecture!
 Exercises:
 C&B: 9.2, 9.3/11, 9.9, 9.10
 Silberschatz:7.1, 7.2, 7.16
 These need functional dependencies,
which are covered next lecture.