Transcript Review 1

Database: Review
Database
SQL: DDL, DML
Relational algebra,
Relational data model
B+-tree
Hashing
system architecture,
Basic concepts,
Introduction
1
ACS-3902
Yangjun Chen
Sept. 2009
Database: Review
What is a database?
The main characters of a database
The basic database design method
The entity-relationship data model
for application modeling
Sept. 2009
Yangjun Chen
ACS-3902
2
Database: Review
The main characteristics of the database approach:
single repository of data
• sharable by multiple users
• concurrency control and transaction concept
• security and integrity constraints
• self-describing - system catalogue contains meta data
• program-data independence
• some changes to the database are transparent to
programs/users
• multiple views of data - to support individual needs of
programs/users
Sept. 2009
Yangjun Chen
ACS-3902
3
Database: Review
Data modeling using
ER-model
Sept. 2009
Entity-relationship model
- Entity types
- strong entities
- weak entities
- Relationships among entities
- Attributes - attribute classification
- Constraints
- cardinality constraints
- participation constraints
ER-to-Relation-mapping
Yangjun Chen
ACS-3902
4
Database: Review
ER-model:
fname
sex
address
employee
1
M hours
M
supervisor
1
supervisee
1
number of
employees
N
N
works on
supervision
M
controls
manages
bdate
1
department
works for
N
salary
startdate
ssn
location
1
lname
minit
name
name number
dependents of
project
name number
location
N
dependent
name
Sept. 2009
sex
birthdate relationship
Yangjun Chen
ACS-3902
5
Database: Review
Database schema, Schema evolution,
Database state
Concepts and
Architecture
Working process with a database system
Database system architecture
Data independence concept
Sept. 2009
Yangjun Chen
ACS-3902
6
Database: Review
Database schema
Course CName CNo CrHrs Dept
Database 8803 3
CS
C
2606 3
CS
Relation schema
Schema evolution
Database state
Student Name StNo Class Major
Smith 17
1 CS
Brown
8
2
Grades StNo Sid Grade
17 25 A
CS
17
43
B
Section SId CNo Semester Yr
Instructor
32 8803 Spring 2000 Smith
25
43
Sept. 2009
8803 Winter 2000 Smith
2606 Spring 2000 Jones
Yangjun Chen
ACS-3902
7
Database: Review
Working process with a database system:
Definition
•record structure
•data elements
•names
•data types
•constraints
etc
Sept. 2009
Construction
•create database
files
•populate the
database with
records
Yangjun Chen
ACS-3902
Manipulation
•querying
•updating
8
Database: Review
Database Management System (DBMS)
• collection of software facilitating the definition,
construction and manipulation of databases
DBMS
Request
manager
Users/
actors
Sept. 2009
Storage
manager,
Query
evaluation
Yangjun Chen
ACS-3902
Meta data
Stored
database
9
Database: Review
Three-schema architecture
External
view
External
view
Describes the
whole database
for all users
Conceptual
schema
Physical storage
structures and
details
Internal
schema
Sept. 2009
Yangjun Chen
A specific user or
groups view of the
database
ACS-3902
10
Database: Review
external hashing
static hashing & dynamic hashing
hash function
mathematical function that maps a key to a
Hashing technique
bucket address
collisions
collision resolution scheme
- open addressing
- chaining
- multiple hashing
linear hashing
Sept. 2009
Yangjun Chen
ACS-3902
11
Database: Review
External hashing: the data are on the disk.
Static hashing:
using a hashing function to map keys to bucket addresses
primary area can not be changed
collision resolusion scheme:
open addressing
chaining
multiple hashing
Dynamic hashing:
primary area can be changed
linear hashing
Sept. 2009
Yangjun Chen
ACS-3902
12
Database: Review
Linear hashing:
1. What is a phase?
2. When to split a bucket?
3. How to split a bucket?
4. What bucket will be chosen to split next?
5. How do we find a record inserted into a linear hashing file?
Sept. 2009
Yangjun Chen
ACS-3902
13
Database: Review
Linear hashing:
initially hash file contains M buckets
hi = key mod (2iM) (i = 0, 1, 2, ...)
insertion process can be divided into several phases
phase 1:
insertion using h0 = key mod M
splitting using h1 = key mod (2M)
splitting rule: overflow of a bucket or
if load factor > constant (e.g., 0.70)
overflow will be put in the overflow area or redistributed through
splitting a bucket
splitting buckets from n = 0 to n = M- 1 (after each splitting
n is increased by 1.
Phase 1 finishes when n = M (in this case, the primary area
becomes 2M buckets long)
Sept. 2009
Yangjun Chen
ACS-3902
14
Database: Review
phase 2:
insertion using h1 = key mod (2M)
splitting using h2 = key mod (4M)
splitting rule: overflow of a bucket or
if load factor > constant (e.g., 0.70)
overflow will be put in the overflow area or redistributed
through
splitting a bucket
splitting buckets from n = 0 to n = 2M- 1 (after each splitting
n is increased by 1.
Phase 1 finishes when n = 2M (in this case, the primary area
will contain 4M buckets.)
phase 3: ... … h2 = …, h3 = …, ...
Sept. 2009
Yangjun Chen
ACS-3902
15
Database: Review
Linear Hashing including two Phases:
- collision resolution strategy: chaining
- split rule: load factor > 0.7
- initially M = 4 (M: size of the primary area)
- hash functions: hi(key) = key mod 2i  M (i = 0, 1, 2, …)
- bucket capacity = 2
Trace the insertion process of the following keys into a linear
hashing file:
3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15.
Sept. 2009
Yangjun Chen
ACS-3902
16
Database: Review
The first phase – phase0
• when inserting the sixth record we would have
4
8
1
0
1
2
14
3
2
3
n=0 before the split
(n is the point to the
bucket to be split.)
• but the load factor 6/8= 0.75 > 0.70 and so bucket 0 must be
split (using h1 = Key mod 2M):
8
2
14
1
0
Sept. 2009
1
3
2
n=1 after the split
load factor: 6/10=0.6
no split
4
3
Yangjun Chen
4
ACS-3902
17
Database: Review
insert(5)
8
0
8
0
Sept. 2009
1
1
1
5
1
2
14
3
2
3
2
14
3
2
3
Yangjun Chen
4
4
4
n=1
load factor: 7/10=0.7
no split
4
ACS-3902
18
Database: Review
insert(10)
8
0
8
1
5
1
1
5
2
14
3
2
3
2
14
3
4
4
4
n=1
load factor: 8/10=0.8
split using h1.
overflow
10
Sept. 2009
Yangjun Chen
ACS-3902
19
Database: Review
8
0
1
1
2
14
3
2
3
4
4
5
5
overflow
10
n=2
load factor: 8/12=0.66
no split
Sept. 2009
Yangjun Chen
ACS-3902
20
Database: Review
insert(7)
8
1
2
14
3
4
n=2
load factor: 9/12=0.75
split using h1.
overflow
10
8
0
1
1
5
2
14
3
7
2
3
4
4
5
5
overflow
10
Sept. 2009
Yangjun Chen
ACS-3902
21
Database: Review
8
1
2
10
3
7
4
5
14
n=3
load factor: 9/14=0.642
no split.
insert(24)
8
Sept. 2009
1
2
10
3
7
Yangjun Chen
4
ACS-3902
5
14
22
Database: Review
8
24
1
2
10
3
7
4
5
14
n=3
load factor: 10/14=0.71
split using h1.
8
24
Sept. 2009
1
2
10
3
Yangjun Chen
4
ACS-3902
5
14
7
23
Database: Review
8
24
1
2
10
3
4
5
14
7
n=4
The second phase – phase1
n = 0; using h1 = Key mod 2M to insert and
h2 = Key mod 4M to split.
insert(17)
8
24
Sept. 2009
1
2
10
3
Yangjun Chen
4
ACS-3902
5
14
7
24
Database: Review
8
24
1
2
10
3
4
5
14
7
n=4
The second phase – phase1
n = 0; using h1 = Key mod 2M to insert and
h2 = Key mod 4M to split.
insert(17)
8
24
Sept. 2009
1
2
10
3
Yangjun Chen
4
ACS-3902
5
14
7
25
Database: Review
8
24
1
17
2
10
3
4
5
14
7
n=0
load factor: 11/16=0.687
no split.
insert(13)
8
24
Sept. 2009
1
17
2
10
3
Yangjun Chen
4
ACS-3902
5
14
7
26
Database: Review
8
24
1
17
2
10
3
5
13
4
14
7
n=0
load factor: 12/16=0.75
split bucket 0, using h2.
1
17
Sept. 2009
2
10
3
4
Yangjun Chen
5
13
14
ACS-3902
7
8
24
27
Database: Review
insert(15)
1
17
2
10
1
17
2
10
3
3
4
5
13
4
5
13
14
7
8
24
14
7
15
8
24
n=1
load factor: 13/18=0.722
split bucket 1, using h2.
1
17
Sept. 2009
2
10
3
4
Yangjun Chen
5
13
14
ACS-3902
7
15
8
24
28
Database: Review
tree
- root, internal, leaf, subtree
- parent, child, sibling
Multi-level
index
balanced, unbalanced
b+-tree
- splits on overflow; merge on underflow
- in practice it is usually 3 or 4 levels deep
search, insert, delete algorithms
Sept. 2009
Yangjun Chen
ACS-3902
29
Database: Review
B+-tree Structure
non-leaf node (internal node or a root)
• < P1, K1, P2, K2, …, Pq-1, Kq-1, Pq > (q  pinternal)
• K1 < K2 < ... < Kq-1
(i.e. it’s an ordered set)
• For any key value, X, in the subtree pointed to by Pi
•Ki-1 < X  Ki for 1 < i < q
•X  K1
for i = 1
•Kq-1 < X
for i = q
• Each internal node has at most pinternal pointers.
• Each node except root must have at least pinternal/2 pointers.
• The root, if it has some children, must have at least 2 pointers.
Sept. 2009
Yangjun Chen
ACS-3902
30
Database: Review
B+-tree Structure
leaf node (terminal node)
• < (K1, Pr1), (K2, Pr2), …, (Kq-1, Prq-1), Pnext >
• K1 < K2 < ... < Kq-1
• Pri points to a record with key value Ki, or Pri points to a page
containing a record with key value Ki.
• Maximum of pleaf key/pointer pairs.
• Each leaf has at least pleaf/2 keys.
• All leaves are at the same level (balanced).
• Pnext points to the next leaf node for key sequencing.
Sept. 2009
Yangjun Chen
ACS-3902
31
Database: Review
A B+-tree
pinternal = 3,
pleaf = 2.
5
3
1
3
5
7
6
7
8
8
9
12
Records in a file
Sept. 2009
Yangjun Chen
ACS-3902
32
Database: Review
B+-tree insertion: leaf node splitting, internal node splitting
Leaf splitting
When a leaf splits, a new leaf is allocated
• the original leaf is the left sibling, the new one is the right sibling
• key and pointer pairs are redistributed: the left sibling will have smaller
keys than the right sibling
• a 'copy' of the key value which is the largest of the keys in the left sibling
is promoted to the parent
22 33
33
12 22 33
44 48 55
12 22
31 33
44 48 55
insert 31
Sept. 2009
Yangjun Chen
ACS-3902
33
Database: Review
Internal node splitting
If an internal node splits and it is not the root,
• insert the key and pointer and then determine the middle key
• a new 'right' sibling is allocated
• everything to its left stays in the left sibling
• everything to its right goes into the right sibling
• the middle key value along with the pointer to the new right sibling is
promoted to the parent (the middle key value 'moves' to the parent to
become the discriminator between this left and right sibling)
26 55
55
22 33
22
33
Insert
26
Sept. 2009
Yangjun Chen
ACS-3902
34
Database: Review
Internal node splitting
When a new root is formed, a key value and two pointers must
be placed into it.
40
26 55
26
55
Insert
40
Sept. 2009
Yangjun Chen
ACS-3902
35
Database: Review
Deleting nodes from a B+-tree:
1. When deleting a key from a node A, check whether the
number of the remaining keys (or pointers) is  p/2.
2. If it is not the case, redistribute the keys in the left sibling B or
in the right sibling C if it is possible. Otherwise, merge A and B or merge
A and C.
3. When redistributing or merging, change the key values in the
parent node so that the following condition is satisfied:
• < P1, K1, P2, K2, …, Pq-1, Kq-1, Pq >
• K1 < K2 < ... < Kq-1
(i.e. it is an ordered set)
• for the key values, X, in the subtree pointed to by Pi
• Ki-1 < X <= Ki for 1 < i < q
• X <= K1
for i = 1
• Kq-1 < X
for i = q
Sept. 2009
Yangjun Chen
ACS-3902
36
Database: Review
A b+-tree
pinternal = 3,
pleaf = 2.
5
3
1
3
5
7
6
7
8
8
9
12
Records
Sept. 2009
Yangjun Chen
ACS-3902
37
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
7
6
7
9
9
12
Deleting 8 causes the node redistribute.
Sept. 2009
Yangjun Chen
ACS-3902
38
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
7
6
7
9
12 is removed.
Sept. 2009
Yangjun Chen
ACS-3902
39
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
6
6
7
9 is removed.
Sept. 2009
Yangjun Chen
ACS-3902
40
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
6
6
Deleting 7 makes this pointer no use.
Therefore, a merge at the level above
the leaf level occurs.
Sept. 2009
Yangjun Chen
ACS-3902
41
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
A
5
This point becomes useless.
The corresponding node
should also be removed.
B
1
3
5
6
C
For this merge, 5 will be taken as a key value in A since
any key value in B is less than or equal to 5 but any key
value in C is larger than 5.
Sept. 2009
Yangjun Chen
ACS-3902
42
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
3
1
Sept. 2009
3
5
Yangjun Chen
5
6
ACS-3902
43
Database: Review
Relational Data Model
Data modeling using
Relational model
Relational algebra
-
relation schema, relations
-
database schema (relational schema),
database state
-
integrity constraints and updating
Relational algebra
-
select, project, join, cartesian product
-
division
-
set operations:
union, intersection, difference,
Sept. 2009
Yangjun Chen
ACS-3902
44
Database: Review
Integrity Constraints
• any database will have some number of constraints that must
be applied to ensure correct data (valid states)
1. domain constraints
• a domain is a restriction on the set of valid values
• domain constraints specify that the value of each
attribute A must be an atomic value from the domain
dom(A).
2. key constraints
• a superkey is any combination of attributes that
uniquely identify a tuple: t1[superkey]  t2[superkey].
- Example: <Name, SSN> (in Employee)
• a key is superkey that has a minimal set of attributes
- Example: <SSN> (in Employee)
Sept. 2009
Yangjun Chen
ACS-3902
45
Database: Review
Integrity Constraints
• If a relation schema has more than one key, each of them is
called a candidate key.
• one candidate key is chosen as the primary key (PK)
• foreign key (FK) is defined as follows:
i) Consider two relation schemas R1 and R2;
ii) The attributes in FK in R1 have the same domain(s) as the
primary key attributes PK in R2; the attributes FK are said to
reference or refer to the relation R2;
iii) A value of FK in a tuple t1 of the current state r(R1) either
occurs as a value of PK for some tuple t2 in the current state
r(R2) or is null. In the former case, we have t1[FK] = t2[PK],
and we say that the tuple t1 references or refers to the tuple t2.
Example:
FK
Employee(SSN, …, Dno)
Sept. 2009
Yangjun Chen
Dept(Dno, … )
ACS-3902
46
Database: Review
Integrity Constraints
3. entity integrity
• no part of a PK can be null
4. referential integrity
• domain of FK must be same as domain of PK
• FK must be null or have a value that appears as a PK
value
5. semantic integrity
• other rules that the application domain requires:
• state constraint: gross salary > net income
• transition constraint: Widowed can only follow
Married; salary of an employee cannot decrease
Sept. 2009
Yangjun Chen
ACS-3902
47
Database: Review
Updating and constraints
insert
• Insert the following tuple into EMPLOYEE:
<‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy
Lane, Katy, TX’, F, 40000, null, 4>
• When inserting, the integrity constraints should be
checked: domain, key, entity, referential, semantic
integrity
update
• Update the SALARY of the EMPLOYEE tuple with ssn =
‘999887777’ to 30000.
• When updating, the integrity constraints should be
checked: domain, key, entity, referential, semantic
integrity
Sept. 2009
Yangjun Chen
ACS-3902
48
Database: Review
Updating and constraints
delete
• Delete the WORK_ON tuple with Essn = ‘999887777’ and
pno = 10.
• When deleting, the referential constraint will be checked.
- The following deletion is not acceptable:
Delete the EMPLOYEE tuple with ssn = ‘999887777’
- reject, cascade, modify
Sept. 2009
Yangjun Chen
ACS-3902
49
Database: Review
cascade – a strategy to enforce referential integrity
Employee
ssn
...
123456789
...
Works-on
Essn
123456789
...
Sept. 2009
delete
Pno
5
...
Yangjun Chen
delete
ACS-3902
50
Database: Review
cascade – a strategy to enforce referential integrity
Employee
ssn
... ...
supervisor
234589710
123456789
... ...
234589710
null
delete
Employee
ssn
... ...
supervisor
234589710
123456789
... ...
delete
234589710
Sept. 2009
not reasonable
null
Yangjun Chen
ACS-3902
delete
51
Database: Review
Modify – a strategy to enforce referential integrity
Employee
ssn
...
123456789
...
Works-on
Essn
123456789
...
delete
Works-on
Essn
Pno
null
5
...
...
Pno
5
...
This violates the entity constraint.
Sept. 2009
Yangjun Chen
ACS-3902
52
Database: Review
Relational Algebra
a set of relations
relation specific
a set of operations
set operations
Sept. 2009
Yangjun Chen
ACS-3902
select
project
join
division
union
intersection
difference
cartesian product
53
Database: Review
Relational algebra
Retrieve for each female employee a list of the names of her
dependents:
FEMALE_EMPS  SEX = ‘F’ (EMPLOYEE)
EMPNAMES  FNAME,LNAME, SSN(FEMALE_EMPS)
ACTUAL_DEPENDENTS  EMPNAMES
DEPENDENT
SSN = ESSN
RESULT FNAME, LNAME, DEPENDENT_NAME(ACTUAL_DEPENDENTS )
Sept. 2009
Yangjun Chen
ACS-3902
54
Database: Review
Query: Retrieve the name of employees who work on all
the projects that ‘John Smith’ works on.
SMITH  FNAME = ‘John’ and LNAME = ‘Smith’(EMPLOYEE)
SMITH_PNOs  PNO(WORK_ON
ESSN = SSNSMITH)
SSN_PNO  ESSN,PNO(WORK_ON)
SSNS(SSN)  SSN_PNO : SMITH_PNOs
RESULT  FNAME, LNAME(SSNS * EMPLOYEE)
Sept. 2009
Yangjun Chen
ACS-3902
55
Database: Review
Division
The DIVISION operator can be expressed as a sequence of
, , and - operations as follows:
Z = {A1, …, An, B1, …, Bm}, X = {B1, …, Bm},
Y = Z - X = {A1, …, An},
T1  Y( R)
R(Z) : S(X)
T2  Y((S  T1) - R)
T  T1 - T2
result
Sept. 2009
Yangjun Chen
ACS-3902
56
Database: Review
DDL
- creating schemas
- modifying schemas
DML
SQL
- select-from-where clause
- group by, having, order by
- update
- view
Sept. 2009
Yangjun Chen
ACS-3902
57
Database: Review
DDL - Examples:
• Create schema:
Create schema COMPANY authorization JSMITH;
• Create table:
Create table EMPLOYEE
(FNAME
VARCHAR(15)
NOT NULL,
MINIT
CHAR,
LNAME
VARCHAR(15)
NOT NULL,
SSN
CHAR(9)
NOT NULL,
BDATE
DATE,
ADDRESS
VARCHAR(30),
SEX
CHAR,
SALARY
DECIMAL(10, 2),
SUPERSSN
CHAR(9),
DNO
INT
NOT NULL,
PRIMARY KEY(SSN),
FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN),
FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER));
Sept. 2009
Yangjun Chen
ACS-3902
58
Database: Review
DDL - Examples:
• drop schema
DROP SCHEMA CAMPANY CASCADE;
DROP SCHEMA CAMPANY RESTRICT;
• drop table
DROP TABLE DEPENDENT CASCADE;
DROP TABLE DEPENDENT RESTRICT;
• alter table
ALTER TABLE COMPANY.EMPLOYEE
ADD JOB VARCHAR(12);
ALTER TABLE COMPANY.EMPLOYEE
DROP ADDRESS CASCADE;
Sept. 2009
Yangjun Chen
ACS-3902
59
Database: Review
DML - select-from-where clause
Retrieve a list of employees and the projects they are working on, ordered by
department, within each department, ordered alphabetically by last name, first
name:
SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE DNUMBER = DNO AND SSN = ESSN AND
PNO = PNUMBER
ORDER BY DNAME, LNAME, FNAME
Sept. 2009
Yangjun Chen
ACS-3902
60