Here - NYU Computer Science Department

Download Report

Transcript Here - NYU Computer Science Department

Dr. Zvi Kedem’s changes, if any, are
marked in green, they are not
copyrighted by the authors, and the
authors are not responsible for them.
Dennis Shasha has also made changes
in blue.
Goals of the Course
 To provide students a fundamental understanding of the
principles underlying both currently available and state of the art
databases.
 To acquaint you with basic tools used in the design and
implementations of databases, so you are better prepared to use
commercially available systems. It turns out that the more you
understand about how these systems are built, the better you will
be at using a database system.
 If you like it, then it will like you.
Database System Concepts
1.2
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
A Sample Relational Database
 NAME
A
A
B
C
E#
121
132
101
106
 E#
DATE
85.7.13
85.8.26
 E#
DATE
84.6.20
85.5.16
BOOK
132 a
121 b
ILLNS
101 cold
121 flu
 E#
101
121
106
132
ADDR
aaa
bbb
ccc
ddd
Database System Concepts
BIRTH
1948.2.12
1951.6.17
1963.2.28
1951.5.10
GRADE
2
3
4
2
SALARY
80
70
70
80
BIRTH
1963.2.28
1948.2.12
1951.5.10
1951.6.17
1.3
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Some Typical Queries
 Some typical queries
• Give names of all employees born before 1960.1.1.
• Give names and addresses of all employees who took out a book
before 1985.8.1.
• Send notices to all employees who had a flu to come for a checkup.
 Note that some queries involve a single table, and some involve
several tables.
 We would like to have a convenient language, as close as
possible to a natural language, to express these queries, and
similar ones. Such languages exist; principally, SQL.
Database System Concepts
1.4
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
The Need for Good Design
 It is important also to think carefully about the correct (or just
good!) choice of which tables to use and what should be their
structure. Some sample problems with our design:
 Bad structures can lead to inconsistency or loss of data.
• Here we have the birth date appearing twice.
• There is a danger that this will be inconsistent.
• Even more dangerous would be multiple storage of employee's
address.
• If the employee moves, the address must be changed
everywhere.
 Note, however, that it might be useful to replicate information, to
speed up the access. This introduces a field called database
tuning, which is mostly the province of advanced databases,
though we will touch on it here.
Database System Concepts
1.5
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Different Users Need Different Data
 It may be our goal to create a design that best reflects the
inherent properties of the data.
• Various user groups may need to look at the data assuming different
organization.
 For privacy/security reasons we may want to give different users
different access privileges to the database. For example:
• The payroll department can see salaries but cannot see diseases.
• The health department can see diseases but cannot see salaries.
 Users may prefer to look at different aspects of the information.
• The payroll department may prefer to see the salary in a different
currency.
• The health department may prefer to see AGE instead of BIRTH.
Database System Concepts
1.6
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Views
 A possible solution: give each user (class of users) privileges to
look at a view, that is a small derived database. For example,
• the health department may think that there is a table:
E#
101
121
NAME
B
A
ADDRSS
aaa
bbb
AGE
25
37
DISEASE
COLD
FLU
 The database should provide such a view, which is computed
from the existing tables, without the user knowing other
(prohibited for this user) information.
 We need to leave flexibility for unanticipated queries.
• Some people may have the right and want to ask the query:
How are salaries and diseases correlated?
Database System Concepts
1.7
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Physical Design
 The database system must be organized so that it is able to
process queries efficiently. To do that:
• Files must be organized appropriately.
• Indices may be employed
 For example, if we frequently want to find the GRADE for various
E#, maybe the file should be hashed on this value, allowing
direct access.
 But, if we want to print the salaries of all the employees born in
1963, maybe the file should be sorted by BIRTH.
 What to do in general? Physical design of databases deals with
such issues, which are also closely related to the optimization of
query processing.
Database System Concepts
1.8
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Recovery
 The database must be resilient despite the fact that the system is
prone to faults.
 Assume one more table, describing the balances of employees'
accounts in the credit union.
• E#
101
106
121
132
SAVNGS
40
40
0
10
CHECKNG
30
20
80
0
 We want to give each employee a bonus of 10 in the savings
account.
• To do that, a transaction (a portion of a user program) will
sequentially change the values of SAVINGS.
Database System Concepts
1.9
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Example of a Problem
 The file describing the table is stored on a disk, values are read
into random access memory, modified and written out: if X is a
local variable then we have a trace of the desired execution (in
shorthand):
• X:=SAVINGS[101]
X:=X+10
SAVINGS[101]:=X
.
.
.
Read
Processing
Write
 What if the system crashes in the middle (say, electricity stops)?
 We do not know which of the values have been changed, so
what to do to recover?
 Various techniques exist for managing the execution, so that
reliable execution is possible.
Database System Concepts
1.10
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Concurrency
 There may also be problems because of the concurrent
execution of several transactions in a time sharing system.
 Assume that Alice is running a transaction, T1, that should
compute and print the sum of the total bank accounts:
• Account BALANCE
Checking 70
Savings 80
• Concurrently Bob wants to move 40 from CHECKING to SAVINGS,
using transaction T2.
 In a concurrent system we could have an incorrect execution.
 We write CH for CHECKING and SA for SAVINGS.
Database System Concepts
1.11
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Execution Trace of Two Transactions
 T1
(Bob)
T2 (Alice) .
.
.
Remove 40 from checking
Add 40 to savings
.
Read and sum up checking plus
savings
.
 Alice wonders what is going on.
Database System Concepts
1.12
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
The Layers/Levels of the Ideal Database
 It is customary to think of the database as made of layers or
levels.
 The standard considers the top 3 of the 4 below:
• External (User)
View[1], ..., View[n]
• Conceptual (Community)
Description of the enterprise. Table
design.
• Internal
Files, access methods, indices,
distribution
• Database O.S.
Recovery and concurrency,
hardware and OS.
 Different levels have different tasks.
Database System Concepts
1.13
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
The External Level
 The external level is seen by various users.
 Each view (subschema) is like a small conceptual level.
 It can also change in time.
• (A particular view may be modified, deleted or added).
Database System Concepts
1.14
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
The Internal Level
 The internal level deals with file organization/storage
management.
 It changes in time too.
• New storage devices are brought
• files may have indices created because some queries are frequent
• the data may be geographically distributed,
• …
Database System Concepts
1.15
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
The Database Operating System Level
 The database operating system level deals with concurrency and
recovery.
 The database operating system can change too.
 The manufacturer of the database management system may
discover better methods to handle recovery/concurrency.
Database System Concepts
1.16
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Independence Among Levels
 A very important goal is (Data-) independence between levels:
e.g. the indexes that are chosen should not affect the semantics
of the application.
 We must make sure that changes in one level disturb as little as
possible the other levels. Sophisticated users soon realize that
performance considerations may cause one to violate this ideal.
But you should know the tradeoffs.
Database System Concepts
1.17
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,
Who Does What?
 The vendor sends:
• The database operating system
• Tools to create and manipulate the three top levels: external,
conceptual, and internal.
 The database administrator and his/her staff discuss with with
the users what information should the database contain and its
structure.
• A common model (language for describing reality) is needed for
them to communicate.
• Entity-relationship model is frequently used.
 The administrator and the users design the conceptual and the
external levels.
Database System Concepts
1.18
©Zvi
M. Kedem Korth and Sudarshan
©Silberschatz,