Transcript Review 2

Database: Review
Database
Hierarchical databases
Multi-dimensional indexes
Lossless join
Normalization
JDBC (DB application in Java)
Relational algebra,
Relational data model
B+-tree
Hashing
Introduction
1
not covered
ACS-3902
Yangjun Chen
Sept. 2012
Database: Review
What is a database?
The main characters of a database
The basic database design method
The entity-relationship data model
for application modeling
Sept. 2012
Yangjun Chen
ACS-3902
2
Database: Review
The main characteristics of the database approach:
single repository of data
• sharable by multiple users
• concurrency control and transaction concept
• security and integrity constraints
• self-describing - system catalogue contains meta data
• program-data independence
• some changes to the database are transparent to
programs/users
• multiple views of data - to support individual needs of
programs/users
Sept. 2012
Yangjun Chen
ACS-3902
3
Database: Review
Database schema, Schema evolution,
Database state
Concepts and
Architecture
Working process with a database system
Database system architecture
Data independence concept
Sept. 2012
Yangjun Chen
ACS-3902
4
Database: Review
Database schema
Course CName CNo CrHrs Dept
Database 8803 3
CS
C
2606 3
CS
Relation schema
Schema evolution
Database state
Student Name StNo Class Major
Smith 17
1 CS
Brown
8
2
Grades StNo Sid Grade
17 25 A
CS
17
43
B
Section SId CNo Semester Yr
Instructor
32 8803 Spring 2000 Smith
25
43
Sept. 2012
8803 Winter 2000 Smith
2606 Spring 2000 Jones
Yangjun Chen
ACS-3902
5
Database: Review
Working process with a database system:
Definition
•record structure
•data elements
•names
•data types
•constraints
etc
Sept. 2012
Construction
•create database
files
•populate the
database with
records
Yangjun Chen
ACS-3902
Manipulation
•querying
•updating
6
Database: Review
Database Management System (DBMS)
•collection of software facilitating the definition,
construction and manipulation of databases
DBMS
Request
manager
Users/
actors
Sept. 2012
Storage
manager,
Query
evaluation
Yangjun Chen
ACS-3902
Meta data
Stored
database
7
Database: Review
Three-schema architecture
External
view
Sept. 2012
External
view
A specific user or
groups view of the
database
Conceptual
schema
Describes the
whole database
for all users
Internal
schema
Physical storage
structures and
details
Yangjun Chen
ACS-3902
8
Database: Review
Data modeling using
ER-model
Sept. 2012
Entity-relationship model
- Entity types
- strong entities
- weak entities
- Relationships among entities
- Attributes - attribute classification
- Constraints
- cardinality constraints
- participation constraints
- is-a relationship
ER-to-Relation-mapping
Yangjun Chen
ACS-3902
9
Database: Review
ER-model:
fname
sex
address
employee
1
M hours
M
supervisor
1
supervisee
1
number of
employees
N
N
works on
supervision
M
controls
manages
bdate
1
department
works for
N
salary
startdate
ssn
location
1
lname
minit
name
name number
dependents of
project
name number
location
N
dependent
name
Sept. 2012
sex
birthdate relationship
Yangjun Chen
ACS-3902
10
Database: Review
Subtype is determined by
the student_class attribute
student
A student must be a
graduate or undergraduate
The bubble and the d
imply disjoint subtypes
(o - overlap subtypes)
Student_class
d
graduate
undergraduate
The arc implies graduate
and undergraduate are
subtypes of student
• Participation of supertype may be mandatory or optional
• Subtypes may be disjoint or overlapping
• a predicate (on an attribute) determines the subtype: e.g. attribute
Student_class
Student_class = ‘graduate’; Student_class = ‘undergraduate’
Sept. 2012
Yangjun Chen
ACS-3902
11
Database: Review
Mapping to a relational database
• 4 choices:
1. Create separate relations for the supertype and each of the
subtypes.
2. Create relations for the subtypes only - each contains
attributes from the supertype.
3. (disjoint subtypes) Create only one relation - includes all
of the attributes for the supertype and all for the subtypes,
and one discriminator attribute.
4. (overlapping subtypes) Create only one relation includes all of the attributes for the supertype and all for the
subtypes, and one logical discriminator attribute per subtype.
PK is always the same - determined from the supertype
Sept. 2012
Yangjun Chen
ACS-3902
12
Database: Review
fname
minit
lname
name
Ssn
bDates Address
JobType
EMPLOYEE
TypingSpeed
d
EngType
TGrade
SECRETARY
TECHNICIAN
ENGINEER
EMPLOYEE
fname, minit, lname, ssn, bdate, address, JobType
SECRETARY
Essn, TypingSpeed
Sept. 2012
TECHNICIAN
ENGINEER
Essn, TGrade
Yangjun Chen
ACS-3902
Essn, EngType
13
Database: Review
Example for super- &
sub-types: choice 2
VehicleId
Price
LicensePlate
Vehicle
TNoOfPassengers
d
NoOfAxles
CAR
MaxSpeed
TRUCK
Tonnage
CAR
VehicleId, LicensePlate, Price, MaxSpeed, NoOfPassenger
TRUCK
VehicleId, LicensePlate, Price, NoOfAxles, Tonnage
Sept. 2012
Yangjun Chen
ACS-3902
14
Database: Review
fname
minit
lname
name
Ssn
Example for super- &
sub-types: choice 3
bDates Address
JobType
EMPLOYEE
TypingSpeed
d
EngType
TGrade
SECRETARY
TECHNICIAN
ENGINEER
EMPLOYEE
fname, minit, lname, ssn, bdate, address, JobType, TypingSpeed, Tgrade, EngType
Sept. 2012
Yangjun Chen
ACS-3902
15
Database: Review
Example for super- &
sub-types: choice 4
Description
PartNo
Part
manufactureDate
o
Supplier
DrawingNo
Manufacture_Part
ListPrice
BatchNo
Purchased_Part
Part
PartNo, Desription, MFlag, Drawing, ManufactureDate, BatchNo, Pflag, Supplier, ListPrice
Sept. 2012
Yangjun Chen
ACS-3902
16
Database: Review
external hashing
static hashing & dynamic hashing
hash function
mathematical function that maps a key to a
Hashing technique
bucket address
collisions
collision resolution scheme
- open addressing
- chaining
- multiple hashing
linear hashing
Sept. 2012
Yangjun Chen
ACS-3902
17
Database: Review
External hashing: the data are on the disk.
Static hashing:
using a hashing function to map keys to bucket addresses
primary area can not be changed
collision resolution schema:
open addressing
chaining
multiple hashing
Dynamic hashing:
primary area can be changed
linear hashing
Sept. 2012
Yangjun Chen
ACS-3902
18
Database: Review
Linear hashing:
1. What is a phase?
2. How to split a bucket?
3. When to split a bucket?
4. What bucket will be chosen to split next?
Sept. 2012
Yangjun Chen
ACS-3902
19
Database: Review
Linear hashing:
initially hash file contains M buckets
hi = key mod 2iM (i = 0, 1, 2, ...)
insertion process can be divided into several phases
phase 1:
insertion using h0 = key mod M
splitting using h1 = key mod 2M
splitting rule: overflow of a bucket or
if load factor > constant (e.g., 0.70)
overflow will be put in the overflow area or redistributed through
splitting a bucket
splitting buckets from n = 0 to n = M- 1 (after each splitting
n is increased by 1.
Phase 1 finishes when n = M (in this case, the primary area
becomes 2M buckets long)
Sept. 2012
Yangjun Chen
ACS-3902
20
Database: Review
phase 2:
insertion using h1 = key mod 2M
splitting using h2 = key mod 4M
splitting rule: overflow of a bucket or
if load factor > constant (e.g., 0.70)
overflow will be put in the overflow area or redistributed
through
splitting a bucket
splitting buckets from n = 0 to n = 2M- 1 (after each splitting
n is increased by 1.
Phase 1 finishes when n = 2M (in this case, the primary area
will contain 4M buckets.)
phase 3: ... … h2 = …, h3 = …, ...
Sept. 2012
Yangjun Chen
ACS-3902
21
Database: Review
tree
- root, internal, leaf, subtree
- parent, child, sibling
Multi-level
index
balanced, unbalanced
b+-tree
- splits on overflow; merge on underflow
- in practice it is usually 3 or 4 levels deep
search, insert, delete algorithms
Sept. 2012
Yangjun Chen
ACS-3902
22
Database: Review
Motivation
• B+-tree provides a short access path.
file of records
page1
Index
page2
page3
Inverted index
Signature file
B+-tree
Hashing
……
Sept. 2012
Yangjun Chen
ACS-3902
23
Database: Review
A B+-tree
pinternal = 3,
pleaf = 2.
5
3
1
3
5
7
6
7
8
8
9
12
Records in a file
Sept. 2012
Yangjun Chen
ACS-3902
24
Database: Review
B+-tree insertion: leaf node splitting, internal node splitting
Leaf splitting
When a leaf splits, a new leaf is allocated
•the original leaf is the left sibling, the new one is the right sibling
•key and pointer pairs are redistributed: the left sibling will have smaller
keys than the right sibling
•a 'copy' of the key value which is the largest of the keys in the left sibling
is promoted to the parent
22 33
33
12 22 33
44 48 55
12 22
31 33
44 48 55
insert 31
Sept. 2012
Yangjun Chen
ACS-3902
25
Database: Review
Internal node splitting
If an internal node splits and it is not the root,
• insert the key and pointer and then determine the middle key
• a new 'right' sibling is allocated
• everything to its left stays in the left sibling
• everything to its right goes into the right sibling
• the middle key value along with the pointer to the new right sibling is
promoted to the parent (the middle key value 'moves' to the parent to
become the discriminator between this left and right sibling)
26 55
55
22 33
22
33
Insert
26
Sept. 2012
Yangjun Chen
ACS-3902
26
Database: Review
Internal node splitting
When a new root is formed, a key value and two pointers must
be placed into it.
40
26 55
26
55
Insert
40
Sept. 2012
Yangjun Chen
ACS-3902
27
Database: Review
Deleting nodes from a B+-tree:
1.
When deleting a key from a node A, check whether the number of the
remaining keys (or pointers) is  p/2.
2.
If it is not the case, redistribute the keys in the left sibling B or in the right
sibling C if it is possible. Otherwise, merge A and B or merge A and C.
3.
When redistributing or merging, change the key values in the parent node so
that the following condition is satisfied:
• < P1, K1, P2, K2, …, Pq-1, Kq-1, Pq >
• K1 < K2 < ... < Kq-1
(i.e. it is an ordered set)
• for the key values, X, in the subtree pointed to by Pi
• Ki-1 < X <= Ki for 1 < i < q
• X <= K1
for i = 1
• Kq-1 < X for i = q
Sept. 2012
Yangjun Chen
ACS-3902
28
Database: Review
A b+-tree
p = 3,
pleaf = 2.
5
3
1
3
5
7
6
7
8
8
9
12
Records
Sept. 2012
Yangjun Chen
ACS-3902
29
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
7
6
7
9
9
12
Deleting 8 causes the node redistribute.
Sept. 2012
Yangjun Chen
ACS-3902
30
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
7
6
7
9
12 is removed.
Sept. 2012
Yangjun Chen
ACS-3902
31
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
6
6
7
9 is removed.
Sept. 2012
Yangjun Chen
ACS-3902
32
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
1
3
5
6
6
Deleting 7 makes this pointer no use.
Therefore, a merge at the level above
the leaf level occurs.
Sept. 2012
Yangjun Chen
ACS-3902
33
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
5
3
A
5
This point becomes useless.
The corresponding node
should also be removed.
B
1
3
5
6
C
For this merge, 5 will be taken as a key value in A since
any key value in B is less than or equal to 5 but any key
value in C is larger than 5.
Sept. 2012
Yangjun Chen
ACS-3902
34
Database: Review
Entry deletion
- deletion sequence: 8, 12, 9, 7
3
1
Sept. 2012
3
5
Yangjun Chen
5
6
ACS-3902
35
Database: Review
Relational Data Model
Data modeling using
Relational model
Relational algebra
-
relational schema (database schema)
-
relation schema, relations, database state
-
integrity constraints and updating
Relational algebra
-
select, project, join, cartesian product
-
division
-
set operations:
union, intersection, difference,
Sept. 2012
Yangjun Chen
ACS-3902
36
Database: Review
Integrity Constraints
• any database will have some number of constraints that must
be applied to ensure correct data (valid states)
1. domain constraints
• a domain is a restriction on the set of valid values
• domain constraints specify that the value of each
attribute A must be an atomic value from the domain
dom(A).
2. key constraints
• a superkey is any combination of attributes that
uniquely identify a tuple: t1[superkey]  t2[superkey].
- Example: <Name, SSN> (in Employee)
• a key is superkey that has a minimal set of attributes
- Example: <SSN> (in Employee)
Sept. 2012
Yangjun Chen
ACS-3902
37
Database: Review
Integrity Constraints
• If a relation schema has more than one key, each of them is called
a candidate key.
• one candidate key is chosen as the primary key (PK)
• foreign key (FK) is defined as follows:
i) Consider two relation schemas R1 and R2;
ii
) The attributes in FK in R1 have the same domain(s) as the
primary key attributes PK in R2; the attributes FK are said to
reference or refer to the relation R2;
iii) A value of FK in a tuple t1 of the current state r(R1) either
occurs as a value of PK for some tuple t2 in the current state
r(R2) or is null. In the former case, we have t1[FK] = t2[PK],
and we say that the tuple t1 references or refers to the tuple t2.
Example:
FK
Employee(SSN, …, Dno)
Sept. 2012
Yangjun Chen
Dept(Dno, … )
ACS-3902
38
Database: Review
Integrity Constraints
3. entity integrity
• no part of a PK can be null
4. referential integrity
• domain of FK must be same as domain of PK
• FK must be null or have a value that appears as a PK value
5. semantic integrity
• other rules that the application domain requires:
• state constraint: gross salary > net income
• transition constraint: Widowed can only follow Married; salary
of an employee cannot decrease
Sept. 2012
Yangjun Chen
ACS-3902
39
Database: Review
Other SQL capabilities
• Assertions can be used for some constraints
• e.g. Create Assertion ... ...
Executed and
enforced by DBMS
Constraint: The salary of an employee must not be greater than
the salary of the manager of the department that the employee
works for.
CREATE ASSERTION salary_constraint
CHECK (NOT EXISTS (SELECT * FROM employee e,
employee m, department d
where e.salary > m.salary and e.dno=d.dnumber and
d.mgrssn=m.ssn));
Sept. 2012
Yangjun Chen
ACS-3902
40
Database: Review
Relational algebra
Retrieve for each female employee a list of the names of her
dependents:
FEMALE_EMPS  SEX = ‘F’ (EMPLOYEE)
EMPNAMES  FNAME,LNAME, SSN(FEMALE_EMPS)
ACTUAL_DEPENDENTS  EMPNAMES
DEPENDENT
SSN = ESSN
RESULT FNAME, LNAME, DEPENDENT_NAME(ACTUAL_DEPENDENTS )
Sept. 2012
Yangjun Chen
ACS-3902
41
Database: Review
DDL
- creating schemas
- modifying schemas
DML
SQL
- select-from-where clause
- group by, having, order by
- update
- view
Sept. 2012
Yangjun Chen
ACS-3902
42
Database: Review
DDL - Examples:
• Create schema:
Create schema COMPANY authorization JSMITH;
• Create table:
Create table EMPLOYEE
(FNAME
VARCHAR(15)
NOT NULL,
MINIT
CHAR,
LNAME
VARCHAR(15)
NOT NULL,
SSN
CHAR(9)
NOT NULL,
BDATE
DATE,
ADDRESS VARCHAR(30),
SEX
CHAR,
SALARY
DECIMAL(10, 2),
SUPERSSN CHAR(9),
DNO
INT
NOT NULL,
PRIMARY KEY(SSN),
FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN),
FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER));
Sept. 2012
Yangjun Chen
ACS-3902
43
Database: Review
DDL - Examples:
• drop schema
DROP SCHEMA CAMPANY CASCADE;
DROP SCHEMA CAMPANY RESTRICT;
• drop table
DROP TABLE DEPENDENT CASCADE;
DROP TABLE DEPENDENT RESTRICT;
• alter table
ALTER TABLE COMPANY.EMPLOYEE
ADD JOB VARCHAR(12);
ALTER TABLE COMPANY.EMPLOYEE
DROP ADDRESS CASCADE;
Sept. 2012
Yangjun Chen
ACS-3902
44
Database: Review
DDL - Examples:
• Specifying constraints:
Create table EMPLOYEE
(…,
DNO
INT
NOT NULL
DEFAULT 1,
CONSTRAINT EMPPK
PRIMARY KEY(SSN),
CONSTRAINT EMPSUPERFK
FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN)
ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT EMPDEPTFK
FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER)
ON DELETE SET DEFAULT ON UPDATE CASCADE);
• Create domain:
CREATE DOMAIN SSN_TYPE AS CHAR(9);
Sept. 2012
Yangjun Chen
ACS-3902
45
Database: Review
set null or cascade: strategies to maintain data consistency
Employee
ssn
... ...
supervisor
234589710
123456789
... ...
234589710
null
delete
Employee
ssn
... ...
supervisor
234589710
123456789
... ...
234589710
Sept. 2012
null
Yangjun Chen
ACS-3902
not reasonable
delete
cascade
delete
46
Database: Review
set null or cascade: strategies to maintain data consistency
Employee
ssn
... ...
supervisor
234589710
123456789
... ...
234589710
null
delete
Employee
ssn
... ...
supervisor
null
123456789
... ...
234589710
Sept. 2012
reasonable
set null
null
Yangjun Chen
ACS-3902
delete
47
Database: Review
set default: strategy to maintain data consistency
Employee
ssn
... ...
DNO
4
123456789
... ...
234589710
……
Department
DNUMBER
... ...
change this
value to the
default value
1.
……
……
1
... ...
……
4
Sept. 2012
Yangjun Chen
ACS-3902
delete
48
Database: Review
DML - select-from-where clause
Retrieve a list of employees and the projects they are working on, ordered by
department, within each department, ordered alphabetically by last name, first
name:
SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE DNUMBER = DNO AND SSN = ESSN AND
PNO = PNUMBER
ORDER BY DNAME, LNAME, FNAME
order by – clause
group by – clause
having – clause
aggregation functions: max, min, average, count, sum
Sept. 2012
Yangjun Chen
ACS-3902
49
Database: Review
DML - select-from-where clause
• Insert
• Update
• Delete
INSERT INTO employee ( fname, lname, ssn, dno )
VALUES ( "Joe", "Smith", 909, 1);
Note that Access changes the above to read:
INSERT INTO employee ( fname, lname, ssn, dno )
SELECT "Joe", "Smith", 909, 1;
UPDATE employee SET salary = 100000
WHERE ssn=909;
DELETE FROM employee WHERE ssn=909;
Sept. 2012
Yangjun Chen
ACS-3902
50
Database: Review
View definition
• Use a Create View command
• essentially a select specifying the data that makes up the
view
• Create View Enames as select lname, fname from employee
CREATE VIEW
AS SELECT
FROM
Sept. 2012
Enames (lname, fname)
LNAME, FNAME
EMPLOYEE
Yangjun Chen
ACS-3902
51
Database: Review
CREATE VIEW
AS SELECT
FROM
WHERE
GROUP BY
DEPT_INFO (DEPT_NAME,
NO_OF_EMPS,
TOTAL_SAL)
DNAME, COUNT(*), SUM(SALARY)
DEPARTMENT, EMPLOYEE
DNUMBER = DNO
DNAME;
Sept. 2012
Yangjun Chen
ACS-3902
52
Database: Review
JDBC
(Database application in Java)
Sept. 2012
Yangjun Chen
ACS-3902
53
Database: Review
To develop a database application, JDBC or ODBC should
be used.
JDBC – JAVA Database Connectivity
ODBC – Open Database Connectivity
Client
Server
JDBC-ODBC Bridge
Database
ODBC Driver
Database Client
Sept. 2012
Yangjun Chen
ACS-3902
54
Database: Review
Connection to a database:
1. Loading driver class
Class.forName(“sun.jdbc.odbc.JdbcOdbcDriver”);
2. Connection to a database
String url = “jdbc:odbc:<databaseName>”;
Connction con =
DriverManager.getConnection(url, <userName>,
<password>)
Sept. 2012
Yangjun Chen
ACS-3902
55
Database: Review
3. Sending SQL statements
Statement stmt = con.createStatement();
ResultSet rs = stmt.executeQuery(“SELECT *
FROM Information WHERE Balance >= 5000”);
4. Getting results
while (rs.next())
{ …
}
Sept. 2012
Yangjun Chen
ACS-3902
56
Database: Review
import java.sql.*;
public class DataSourceDemo1
{ public static void main(String[] args)
{ Connection con = null;
try
{//load driver class
Class.forName{“sun.jdbs.odbs.JdbsOdbcDriver”);
//data source
String url = “jdbs:odbc:Customers”;
//get connection
con = DriverManager.getConnection(url,
“sa”, “ “)
password
Sept. 2012
Yangjun Chen
ACS-3902
57
Database: Review
//create SQL statement
Statement stmt = con.createStatement();
//execute query
Result rs = stmt.executeQuery(“SELECT *
FROM Information WHERE Balance >= 5000”);
String firstName, lastName;
Date birthDate;
float balance;
int accountLevel;
Sept. 2012
Yangjun Chen
ACS-3902
58
Database: Review
while(rs.next())
{ firstName = rs.getString(“FirstName”);
lastName = rs.getString(“lastName”);
balance = rs.getFloat(“Balance”);
System.out.println(firstName + “ “ +
lastName + “, balance = “ + balance);
}
}
catch(Exception e)
{e.printStackTrace();}
finally
{try{con.close();}
catch(Exception e){ }
}
}
}
Sept. 2012
Yangjun Chen
ACS-3902
59
Database: Review
Programming in an dynamical environment:
Disadvantage of DataSourceDemo1:
If the JDBC-ODBC driver, database, user names, or
password are changed, the program has to be modifid.
Solution:
file name: datasource.config
Configuration file:
config.driver=sun.jdbc.odbc.JdbcOdbcDriver
config.protocol=jdbc
<property> = <property value>
config.subprotocol=odbc
config.dsname=Customers
config – datasource name
config.username=sa
config.password=
Sept. 2012
Yangjun Chen
ACS-3902
60
Database: Review
import java.sql.*;
import java.io.*;
import java.util.Properties;
public class DatabaseAccess
{ private String configDir;
//directory for configuration file
private String dsDriver = null;
private String dsProtocol = null;
private String dsSubprotocol = null;
private String dsName = null;
private String dsUsername = null;
private String dsPassword = null;
Sept. 2012
Yangjun Chen
ACS-3902
61
Database: Review
public DatabaseAccess(String configDir)
{ this.configDir = configDir; }
public DatabaseAccess()
{ this(“.”); }
//source: data source name
//configFile: source configuration file
public Connection getConnection(String source,
String configFile) throws SQLException, Exception
{ Connection con;
try
{Properties prop = loadConfig(ConfigDir, ConfigFile);
Sept. 2012
Yangjun Chen
ACS-3902
62
Database: Review
if (prop != null)
{dsDriver = prop.getProperty(source + “.driver”);
dsProtocol = prop.getPropert(source + “.protocol”);
dsSubprotocol = prop.getPropert(source +
“.subprotocol”);
if (dsName == null)
dsName = prop.getProperty(source +
“.dsName”);
if (dsUsername == null)
dsUsername = prop.getProperty(source +
“.username”);
if (dsPassword == null)
dsPassword = prop.getProperty(source +
“.password”);
Sept. 2012
Yangjun Chen
ACS-3902
63
Database: Review
//load driver class
Class.forName(dsDriver);
//connect to data source
String url = dsProtocol + “:” + dsSubprotocol + “:”
+ dsName;
con = DriverManager.getConnection(url, dsUsername,
dsPassword)
}
else
throw new Exception(“*Cannot find property file” +
configFile);
return con;
}
catch (ClassNotFoundException e)
{ throw new Exception(“* Cannot find driver class “ +
dsDriver + “!”); }
}
Sept. 2012
Yangjun Chen
ACS-3902
64
Database: Review
//dir: directory of configuration file
//filename: file name
public Properties loadConfig(String dir, String filename)
throws Exception
{ File inFile = null;
Properties prop = null;
try
{ inFile = new File(dir, filename);
if (inFile.exists()
{ prop = new Properties();
prop.load(new FileInputStream(inFile));
}
else throw new Exception(“* Error in finding “ +
inFile.toString());
}
finally {return prop;}
}
}
Sept. 2012
Yangjun Chen
ACS-3902
65
Database: Review
Using class DatabaseAccess, DataSourceDemo1 should be
modified a little bit:
DatabaseAccess db = new databaseAccess();
con = db.getConnection(“config”,
“datasource.config”);
Sept. 2012
Yangjun Chen
ACS-3902
66
Database: Review
function dependencies
- data redundancy, update anomalies
- what is a function dependency?
- inference rules, minimal set of FDs
Normalization
normal forms
- first normal form
- second normal form
- third normal form
- Boyce Codd normal form
Sept. 2012
Yangjun Chen
ACS-3902
67
Database: Review
Data redundancy and update anomalies:
EmployeeDepartment
ename
ssn
bdate
address dnumber
dname
This is similar to Employee, but we have included
dname.
Sept. 2012
Yangjun Chen
ACS-3902
68
Database: Review
EmployeeProject
ssn pnumber
hours
ename
plocation
This is similar to Works_on, but we have included
ename and plocation
Sept. 2012
Yangjun Chen
ACS-3902
69
Database: Review
In the two prior cases with EmployeeDepartment and
EmployeeProject, we have redundant information in the
database …
• if two employees work in the same department, then
that department name is replicated
• if more than one employee works on a project then the
project location is replicated
• if an employee works on more than one project his/her
name is replicated
Redundant data leads to
• additional space requirements
• update anomalies
Sept. 2012
Yangjun Chen
ACS-3902
70
Database: Review
Suppose EmployeeDepartment is the only relation where
department name is recorded
insert anomalies
• adding a new department is complicated unless there
is also an employee for that department
deletion anomalies
• if we delete all employees for some department, what
should happen to the department information?
modification anomalies
• if we change the name of a department, then we must
change it in all tuples referring to that department
Sept. 2012
Yangjun Chen
ACS-3902
71
Database: Review
Functional dependencies:
Suppose we have a relation R comprising attributes X,Y, …
We say a functional dependency exists between the
attributes X and Y,
X
Y
if, whenever a tuple exists with the value x for X, it will
always have the same value y for Y.
X
LHS
Sept. 2012
Y
RHS
Yangjun Chen
ACS-3902
72
Database: Review
Student
course_no
student_no student_name gender
Student_no
Student_name
gender
Given a specific student number, there is
only one value for student name and only
one value for gender found with it.
Sept. 2012
Yangjun Chen
ACS-3902
73
Database: Review
Inference Rules for Function Dependencies
• From a set of FDs, we can derive some other FDs
Example:
F = {ssn  {Ename, Bdate, Address, dnumber},
dnumber  {dname, dmgrssn}}
inference
ssn  {dname, dmgrssn},
ssn  dnumber,
dnumber  dname.
• F+ (closure of F): The set of all FDs that can be deduced from
F (with F together) is called the closure of F.
Sept. 2012
Yangjun Chen
ACS-3902
74
Database: Review
Inference Rules for Function Dependencies
• Inference rules:
- IR1 (reflexive rule): If X  Y, then X  Y. (X  X.)
- IR2 (augmentation rule): {X  Y} |= ZX  ZY.
- IR3 (transitive rule): {X  Y, Y  Z} |= X  Z.
- IR4 (decomposition, or projective, rule):
{X  ZY} |= X  Y, X  Z.
- IR5 (union, or additive, rule): {X  Y, Y  Z} |= X  ZY.
- IR6 (pseudotransitive rule): {X  Y, WY  Z} |= WX  Z.
Sept. 2012
Yangjun Chen
ACS-3902
75
Database: Review
Equivalence of Sets of FDs
E and F are equivalent if E+ = F+.
Minimal sets of FDs
• every dependency has a single attribute on the RHS
• the attributes on the LHS of a dependency are minimal
• we cannot remove any dependency from F and still
have a set of dependencies that is equivalent to F.
ssn
pnumber
hours
ename
plocation
{ssn, pnumber}  hours,
ssn  ename,
pnumber  plocation.
Sept. 2012
Yangjun Chen
ACS-3902
76
Database: Review
Normal Forms
• A series of normal forms are known that have,
successively, better update characteristics.
• We’ll consider 1NF, 2NF, 3NF, and BCNF.
• A technique used to improve a relation is
decomposition, where one relation is replaced by two
or more relations. When we do so, we want to
eliminate update anomalies without losing any
information.
Sept. 2012
Yangjun Chen
ACS-3902
77
Database: Review
1NF - First Normal Form
The domain of an attribute must only contain atomic values.
• This disallows repeating values, sets of values, relations
within relations, nested relations, …
• In the example database we have a department located in
possibly several locations: department 5 is located in
Bellaire, Sugarland, and Houston.
• If we had the relation
Department
dnumber dname dmgrssn
dlocations
5
Research 333445555 Bellaire, Sugarland, Houston
then it would not be 1NF because there are multiple values to
be kept in dlocations.
Sept. 2012
Yangjun Chen
ACS-3902
78
Database: Review
1NF - First Normal Form
If we have a non-1NF relation we can decompose it, or
modify it appropriately, to generate 1NF relations.
There are 3 options:
• option 1: split off the problem attribute into a new
relation (create a DepartmentLocation relation).
Department
DepartmentLocation
dnumber dname dmgrssn
dnumber dlocation
5
Research 333445555
5
Bellaire
5
5
Generally considered the best
solution
Sept. 2012
Yangjun Chen
ACS-3902
Sugarland
Houston
79
Database: Review
2NF - Second Normal Form
• full functional dependency
X  Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold
any more.
EmployeeProject
ssn pnumber hours
ename plocation
{ssn, pnumber}  hours is a full dependency
(neither ssn  hours , nor pnumber  hours).
Sept. 2012
Yangjun Chen
ACS-3902
80
Database: Review
2NF - Second Normal Form
• partial functional dependency
X  Y is a partial functional dependency if removal of
some attribute A from X does not affect the dependency.
EmployeeProject
ssn pnumber
hours
ename
plocation
{ssn, pnumber}  ename is a partial dependency
because ssn ename holds.)
Sept. 2012
Yangjun Chen
ACS-3902
81
Database: Review
2NF - Second Normal Form
A relation schema is in 2NF if
(1) it is in 1NF and
(2) every non-key attribute must be fully functionally
dependent on the primary key.
If we had the relation
EmployeeProject
ssn
pnumber
hours
ename
plocation
then this relation would not be 2NF because of two separate
violations of the 2NF definition:
Sept. 2012
Yangjun Chen
ACS-3902
82
Database: Review
2NF - Second Normal Form
•We correct this by decomposing the relation into three
relations - splitting off the offending attributes - splitting
off partial dependencies on the key.
EmployeeProject
ssn pnumber
ssn
pnumber
hours
ename
plocation
hours
2NF
ssn
ename
pnumber plocation
Sept. 2012
Yangjun Chen
ACS-3902
83
Database: Review
3NF - Third Normal Form
• Transitive dependency
A functional dependency X  Y in a relation schema R is a
transitive dependency if there is a set of attributes Z that is not
a subset of any key of R, and both X  Z and Z  Y hold.
EmployeeDept
ename
ssn
bdate
address dnumber
dname
ssn  dnumber and dnumber  dname
Sept. 2012
Yangjun Chen
ACS-3902
84
Database: Review
3NF - Third Normal Form
A relation schema is in 3NF if
(1) it is in 2NF and
(2) each non-key attribute must not be fully functionally
dependent on another non-key attribute (there must be no
transitive dependency of a non-key attribute on the PK)
• If we had the relation
ename
ssn
bdate
address dnumber
dname
then this relation would not be 3NF because
• dname is functionally dependent on dnumber and neither is
• a key attribute
Sept. 2012
Yangjun Chen
ACS-3902
85
Database: Review
3NF - Third Normal Form
• We correct this by decomposing - splitting off the transitive
dependencies
EmployeeDept
ename
ssn
ename
ssn
bdate
bdate
address dnumber
dname
address dnumber
3NF
dnumber
Sept. 2012
Yangjun Chen
ACS-3902
dname
86
Database: Review
Boyce Codd Normal Form, BCNF
• Consider a different definition of 3NF, which is
equivalent to the previous one.
A relation schema R is in 3NF if, whenever a
function dependency X  A holds in R, either
(a)
X is a superkey of R, or
(b)
A is a prime attribute of R.
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S R
with the propertity that no tuples t1 and t2 in any legal state r of R will have
t1[S] = t2[S].
An attribute is called a prime attribute if it is a member of any key.
Sept. 2012
Yangjun Chen
ACS-3902
87
Database: Review
Boyce Codd Normal Form, BCNF
• If we remove (b) from the previous definition for 3NF,
we have the definition for BCNF.
• A relation schema is in BCNF if every determinant is a
superkey key. Stronger than 3NF:
- no partial dependencies
- no transitive dependencies where a non-key attribute
is dependent on another non-key attribute
- no non-key attributes appear in the LHS of a
functional dependency.
Sept. 2012
Yangjun Chen
ACS-3902
88
Database: Review
Boyce Codd Normal Form, BCNF
Consider:
Instructor teaches one
course only.
student_no course_no instr_no
In 3NF!
Student takes a course
and has one instructor.
{student_no, course_no}  instr_no
instr_no  course_no
Sept. 2012
Yangjun Chen
ACS-3902
89
Database: Review
Boyce Codd Normal Form, BCNF
Some sample data:
student_no course_no instr_no
Sept. 2012
121
121
222
1803
1903
1803
99
77
66
222
1903
77
Yangjun Chen
ACS-3902
Instructor 99 teaches 1803
Instructor 77 teaches 1903
Instructor 66 teaches 1803
90
Database: Review
Boyce Codd Normal Form, BCNF
Some sample data:
student_no course_no instr_no
Sept. 2012
121
121
222
1803
1903
1803
99
77
66
222
1903
77
Yangjun Chen
ACS-3902
Instructor 99 teaches 1803
Instructor 77 teaches 1903
Instructor 66 teaches 1803
91
Database: Review
Boyce Codd Normal Form, BCNF
student_no course_no instr_no
121
121
222
1803
1903
1803
99
77
66
222
1903
77
Instructor 99 teaches 1803
Instructor 77 teaches 1903
Instructor 66 teaches 1803
Deletion anomaly: If we delete all rows for course 1803 we’ll
lose the information that instructors 99 teaches student 121 and
66 teaches student 222.
Insertion anomaly: How do we add the fact that instructor 55
teaches course 2906?
Sept. 2012
Yangjun Chen
ACS-3902
92
Database: Review
Boyce Codd Normal Form, BCNF
Which decomposition preserves all the information?
S#
C#
C#
I#
121
121
222
222
1803
1903
1803
1903
1803
1903
1803
99
77
66
student_no course_no
?
course_no instr_no
student_no course_no
Joining these two tables leads to
spurious tuples - result includes
121 1803 66
222 1803 99
student_no instr_no
student_no instr_no
course_no instr_no
Sept. 2012
Yangjun Chen
ACS-3902
93
Database: Review
student_no course_no
course_no instr_no
student_no
course_no
instr_no
121
1803
99
S#
C#
C#
I#
121
1903
77
121
1803
1803
99
222
1803
66
121
1903
1903
77
222
1903
77
222
1803
1803
66
222
1903

Sept. 2012
Yangjun Chen
ACS-3902
94
Database: Review
Boyce Codd Normal Form, BCNF
Which decomposition preserves all the information?
S#
C#
S#
I#
121
121
222
222
1803
1903
1803
1903
121
121
222
222
99
77
66
77
Joining these two tables leads to
spurious tuples - result includes
121 1803 77
121 1903 99
222 1803 77
222 1903 66
Sept. 2012
Yangjun Chen
student_no course_no
course_no instr_no
student_no course_no
?
student_no instr_no
student_no instr_no
course_no instr_no
ACS-3902
95
Database: Review
student_no course_no
student_no instr_no
student_no
course_no
instr_no
121
1803
99
S#
C#
S#
I#
121
1903
77
121
1803
121
99
222
1803
66
121
1903
121
77
222
1903
77
222
1803
222
66
222
1903
222
77

Sept. 2012
Yangjun Chen
ACS-3902
96
Database: Review
Boyce Codd Normal Form, BCNF
This decomposition preserves all the information.
S#
I#
C#
I#
121
121
222
222
99
77
66
77
1803
1903
1803
99
77
66
Only FD is
instr_no
student_no instr_no
course_no instr_no
course_no
but the join preserves
{student_no, course_no}
Sept. 2012
Yangjun Chen
ACS-3902
instr_no
97
Database: Review
student_no
Instr_no
course_no instr_no
student_no
course_no
instr_no
121
1803
99
S#
I#
C#
I#
121
1903
77
121
99
1803
99
222
1803
66
121
77
1903
77
222
1903
77
222
66
1803
66
222
77
Sept. 2012
Yangjun Chen
ACS-3902
98
Database: Review
Definition of lossless join property
- relation decomposition
- lossless join property
Lossless
join
Testing algorithm
- matrix construction
- matrix initialization
- matrix modification
Sept. 2012
Yangjun Chen
ACS-3902
99
Database: Review
• Basic definition of Lossless-join
A decomposition D = {R1, R2,..., Rm} of R has the lossless join
property with respect to the set of dependencies F on R if, for
every relation r of R that satisfies F, the following holds,
(R1(r), ..., Rm(r)) = r,
where  is the natural join of all the relations in D.
The word loss in lossless refers to loss of information, not to
loss of tuples.
Sept. 2012
Yangjun Chen
ACS-3902
100
Database: Review
Emp_PROJ
SSN
PNUM
hours
ENAME PNAME PLOCATION
F = {SSN  ENAME, PNUM  {PNAME, PLOCATION},
{SSN, PNUM}  hours}
R1
SSN
ENAME
R3
SSN
PNUM
Sept. 2012
R2
PNUM PNAME PLOCATION
hours
Yangjun Chen
Lossless join
ACS-3902
101
Database: Review
• decomposion-1
A1
SSN
A2
ENAME
A3
PNUM
A4
PNAME
A5
PLOCATION
A6
hours
R1
b11
b12
b13
b14
b15
b16
R2
b21
b22
b23
b24
b25
b26
R3
b31
b32
b33
b34
b35
b36
R1
a1
a2
b13
b14
b15
b16
R2
b21
b22
a3
a4
a5
b26
R3
a1
b32
a3
b34
b35
a6
Sept. 2012
Yangjun Chen
ACS-3902
102
Database: Review
SSN  ENAME
SSN
ENAME
R1
a1
a2
b13
b14
b15
b16
R2
b21
b22
a3
a4
a5
b26
R3
a1
a2
a3
b34
b35
a6
PNUM  {PNAME, PLOCATION}
PNUM
PNAME
PLOCATION
R1
a1
a2
b13
b14
b15
b16
R2
b21
b22
a3
a4
a5
b26
R3
a1
a2
a3
a4
a5
a6
Sept. 2012
Yangjun Chen
ACS-3902
103
Database: Review
• Example: decomposition-2
Emp_PROJ
SSN
PNUM
hours
ENAME PNAME PLOCATION
F = {SSN  ENAME, PNUM  {PNAME, PLOCATION},
{SSN, PNUM}  hours}
R1
ENAME PLOCATION
Not lossless join
R2
SSN PNUM
Sept. 2012
hours
PNAME PLOCATION
Yangjun Chen
ACS-3902
104
Database: Review
• decomposition-2
A1
SSN
A2
ENAME
A3
PNUM
A4
PNAME
A5
PLOCATION
A6
hours
R1
b11
b12
b13
b14
b15
b16
R2
b21
b22
b23
b24
b25
b26
R1
b11
a2
b13
b14
a5
b16
R2
a1
b22
a3
a4
a5
a6
SSN  ENAME
PNUM  {PNAME, PLOCATION}
{SSN, PNUM}  hours
The matrix can not be changed!
Sept. 2012
Yangjun Chen
ACS-3902
105
Database: Review
Multi-Dimensional Indexes
• Multiple-key indexes
• kd-trees
• Quad trees
• R-trees
• Bit map
• Inverted files
Sept. 2012
Yangjun Chen
ACS-3902
106
Database: Review
Multiple-key indexes
(Indexes over more than one attributes)
Employee
ename
ssn
age
salary dnumber
Aaron, Ed
Abbott, Diane
Adams, John
Adams, Robin
Sept. 2012
Yangjun Chen
ACS-3902
107
Database: Review
Multiple-key indexes
(Indexes over more than one attributes)
Index on age
Sept. 2012
Index on salary
Yangjun Chen
ACS-3902
108
Database: Review
Multiple-key indexes
60
400
260
25
30
45
50
60
70
85
60
350
75
100
120
275
260
110
140
Sept. 2012
Yangjun Chen
ACS-3902
109
Database: Review
kd-Trees
(A generalization of binary trees)
A kd-tree is a binary tree in which interior nodes have an associated
attribute a and a value v that splits the data points into two parts:
those with a-value less than v and those with a-value equal or larger
than v.
Sept. 2012
Yangjun Chen
ACS-3902
110
Database: Review
kd-Trees
salary 150
age 60
70, 110
85, 140
salary 80
50, 100
50, 120
age 38
25, 60
Sept. 2012
age 47
salary 300
30, 260
50, 275
60, 260
25, 400
45, 350
45, 60
50, 75
Yangjun Chen
ACS-3902
111
Database: Review
Insert a new entry into a kd-tree:
insert(35, 500):
salary 150
age 60
70, 110
85, 140
salary 80
50, 100
50, 120
age 38
25, 60
Sept. 2012
age 47
30, 260
salary 300
50, 275
60, 260
25, 400
45, 350
45, 60
50, 75
Yangjun Chen
ACS-3902
112
Database: Review
Insert a new entry into a kd-tree:
salary 150
insert(35, 500):
age 60
70, 110
85, 140
salary 80
50, 100
50, 120
age 38
25, 60
Sept. 2012
age 47
30, 260
age 35
25, 400
45, 60
50, 75
Yangjun Chen
50, 275
60, 260
salary 300
ACS-3902
35, 500
45, 350
113
Database: Review
Quad-trees
In a Quad-tree, each node corresponds to a square region in two
dimensions, or to a k-dimensional cube in k dimensions.
• If the number of data entries in a square is not larger than what
will fit in a block, then we can think of this square as a leaf node.
• If there are too many data entries to fit in one block, then we treat
the square as an interior node, whose children correspond to its
four quadrants.
Sept. 2012
Yangjun Chen
ACS-3902
114
Database: Review
Quad-trees
name
age
…
salary
…
…
25
…
400
…
400k
salary
0
Sept. 2012
100
age
Yangjun Chen
ACS-3902
115
Database: Review
400k
Quad-trees
SW
25, 60
46, 60
50, 75
50, 100
SW – south-west
SE – south-east
Sept. 2012
SE NE
75, 100
85, 140
100
0
50, 200
NW
50, 275
60, 260
50, 120
70, 110
25, 300
30, 260
25, 400
45, 350
NW – north-west
NE – north-east
Yangjun Chen
ACS-3902
116
Database: Review
R-trees
An R-tree is an extension of B-trees for
multidimensional data.
• An R-tree corresponds to a whole area (a rectangle for two-dimensional data.)
• In an R-tree, any interior node corresponds to some interior
regions, or just regions, which are usually a rectangle
• Each region x in an interior node n is associated with a link to a
child of n, which corresponds to all the subregions within x.
Sept. 2012
Yangjun Chen
ACS-3902
117
Database: Review
Suppose that the local cellular phone company adds a POP (point
of presence, or base station) at the position shown below.
100
POP
school
road1
house2
house1
pipeline
road2
Sept. 2012
0
Yangjun Chen
ACS-3902
100
118
Database: Review
R-trees
((0, 0), (60, 50))
road1
Sept. 2012
road2
house1
((20, 20), (100, 80))
school house2 pipeline pop
Yangjun Chen
ACS-3902
119
Database: Review
Insert a new region r into an R-tree.
100
POP
school
road1
house2
house1
pipeline
road2
house3
100
0
Sept. 2012
((70, 5), (980, 15))
Yangjun Chen
ACS-3902
120
Database: Review
Insert a new region r into an R-tree.
1. Search the R-tree, starting at the root.
2. If the encountered node is internal, find a subregion into which
r fits.
• If there is more than one such region, pick one and go to its
corresponding child.
• If there is no subregion that contains r, choose any subregion
such that it needs to be expanded as little as possible to contain
r.
((70, 5), (980, 15))
((0, 0), (60, 50))
road1
Sept. 2012
road2
house1
Yangjun Chen
((20, 20), (100, 80))
school house2 pipeline pop
ACS-3902
121
Database: Review
Two choices:
• If we expand the lower subregion, corresponding to the first
leaf, then we add 1000 square units to the region.
• If we extend the other subregion by lowering its bottom by 5
units, then we add 1200 square units.
((0, 0), (80, 50))
road1 road2 house1 house3
Sept. 2012
Yangjun Chen
((20, 20), (100, 80))
school house2 pipeline pop
ACS-3902
122
Database: Review
Insert a new region r into an R-tree.
3. If the encountered node v is a leaf, insert r into it. If there is no
room for r, split the leaf into two and distribute all subregions in
them as evenly as possible. Calculate the ‘parent’ regions for the
new leaf nodes and insert them into v’s parent. If there is the
room at v’s parent, we are done. Otherwise, we recursively split
nodes going up the tree.
Suppose that each
leaf has room for
6 regions.
road1
Sept. 2012
((0, 0), (100, 100))
road2
Add POP (point of
presence, or base
station)
house1 school house2 pipeline
Yangjun Chen
ACS-3902
123
Database: Review
Bit map
1. Image that the records of a file are numbered 1, …, n.
2. A bitmap for a data field F is a collection of bit-vector of
length n, one for each possible value that may appear in the
field F.
3. The vector for a specific value v has 1 in position i if the ith
record has v in the field F, and it has 0 there if not.
Sept. 2012
Yangjun Chen
ACS-3902
124
Database: Review
Example
Employee
ename
ssn
age
salary dnumber
Aaron, Ed
Abbott, Diane
30
30
60
60
Adams, John
Adams, Robin
Brian, Robin
Brian, Mary
Widom, Jones
40
50
55
55
60
75
75
78
80
100
Bit maps for age:
30: 1100000
40: 0010000
50: 0001000
Sept. 2012
Bit maps for salary:
55: 0000110
60: 0000001
Yangjun Chen
60: 1100000
75: 0011000
78: 0000100
ACS-3902
80: 0000010
100: 0000001
125
Database: Review
Query evaluation
Select ename
From Employee
Where age = 55 and salary = 80
In order to evaluate this query, we intersect the vectors for
age = 55 and salary = 80.

0000110
0000010
vector for age = 55
vector for salary = 80
0000010
This indicates the 6th tuple is the answer.
Sept. 2012
Yangjun Chen
ACS-3902
126
Database: Review
Range query evaluation
Select ename
From Employee
Where 30 < age < 55 and 60 < salary < 78
We first find the bit-vectors for the age values in (30, 50); there are only two:
0010000 and 0001000 for 40 and 50, respectively.
Take their bitwise OR: 0010000  0001000 = 0011000.
Next find the bit-vectors for the salary values in (60, 78) and take their bitwise
OR: 1100000  0011000 = 1111000.
0011000
 1111000
0011000
The 3rd and 4th tuples are the answer.
Sept. 2012
Yangjun Chen
ACS-3902
127
Database: Review
Compression of bitmaps
Run-length encoding:
Run in a bit vector: a sequence of i 0’s followed by a 1.
000000010001
This bit vector contains two runs.
Run compression: a run r is represented as another bit string r’
composed of two parts.
part 1: i expressed as a binary number, denoted as b1(i).
part 2: Assume that b1(i) is j bits long. Then, part 2 is a sequence
of (j – 1) 1’s followed by a 0, denoted as b2(i).
r’ = b2(i)b1(i).
Sept. 2012
Yangjun Chen
ACS-3902
128
Database: Review
Compression of bitmaps
Run-length encoding:
Run in a bit vector s: a sequence of i 0’s followed by a 1.
000000010001
This bit vector contains two runs.
r’ = b2(i)b1(i).
r1 = 00000001
r1’ = 110111
b11 = 7 = 111, b12 = 110
r2 = 0001
r2’ = 1011
b11 = 3 = 11, b12 = 10
Sept. 2012
Yangjun Chen
ACS-3902
129
Database: Review
000000010001
r1’ r2’ = 1101111011
Starting at the beginning, find the first 0
at the 3rd bit, so j = 3. The next 3 bits are
111, so we determine that the first integer
is 7. In the same way, we can decode1011.
Decoding a compressed sequence s’:
1. Scan s’ from the beginning to find the first 0.
2. Let the first 0 appears at position j. Check the next j bits. The
corresponding value is a run.
3. Remove all these bits from s’. Go to (1).
Sept. 2012
Yangjun Chen
ACS-3902
130
Database: Review
Inverted files
An inverted file - A list of pairs of the form: <key word, pointer>
… the cat is
fat
cat
… was raining
cats and dogs …
dog
… Fido the
Dogs …
a bucket of pointers
Sept. 2012
Yangjun Chen
ACS-3902
131
Database: Review
Inverted files
When we use “buckets” of pointers to occurrences of each word,
we may extend the idea to include in the bucket array some
information about each occurrence.
cat
type
position
title
header
anchor
text
5
10
3
57
dog
…
… the cat is
fat
…
…
…
Sept. 2012
Yangjun Chen
ACS-3902
… was raining
cats and dogs …
… Fido the
Dogs …
132
Database: Review
Hierarchical database schema
- hierarchical schema
- record type, PCR type
Hierarchical
databases
- virtual PCR: virtual child, virtual parent
Database languages
- HDDL
- HDML
Sept. 2012
Yangjun Chen
ACS-3902
133
Database: Review
ERD for Chapter 6
database example
dependent
n
n
1
1
m
Works on
employee
n
project
1
n
1
1
n
1
Dept_locations
Sept. 2012
n
Yangjun Chen
1
ACS-3902
department
134
Database: Review
• Virtual Parent-child Relationships
-
Hierarchical schema using VPCR - for a Company database
D
Department
Dname Dnum
E
Employee
Ename Minit … ...
P
L
Dlocation
Location
Project
Pname … ...
Esupervisee
SPTR
S
Y Demployee
EPTR
M
Dmanager
StartDate MPTR
Sept. 2012
T
Dependent
DEPname Minit ...
W
Pworker
Hours WPTR
Yangjun Chen
ACS-3902
135