Levels of Abstraction in DBMS data
• Many views, single conceptual (logical) schema and physical schema.
– Views describe how users see the data (possibly different data models for different views).
– Conceptual schema defines the logical structure of the entire data enterprise.
– Physical schema describes the underlying files and indexes used.
– Called the ANSI schema model.
[Figure, top to bottom: View 1, View 2, View 3; Conceptual Schema; Physical Schema]
* Schemas are defined using DDL; data is modified/queried using DML.
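The three levels and the DDL/DML split can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the index name students_by_age and the view name StudentNames are hypothetical, and SQLite stands in for a generic DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL defines the schemas at each level.
conn.executescript("""
    -- conceptual (logical) schema: the relation itself
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    -- physical schema: an index chosen for performance (hypothetical name)
    CREATE INDEX students_by_age ON Students (age);
    -- external schema: a view exposing part of the data (hypothetical name)
    CREATE VIEW StudentNames AS SELECT sid, name FROM Students;
""")

# DML modifies and queries the data.
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
view_rows = conn.execute("SELECT * FROM StudentNames").fetchall()
print(view_rows)
```

Dropping the index would change only the physical level; queries against the table or the view would still return the same rows.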
Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.
[Figure, layers from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]
Overview of Database Design
• Conceptual design: (ER Model is used at this stage.)
– What are the entities and relationships in the enterprise?
– What information about these entities and relationships should we store in the database?
– What integrity constraints or business rules hold?
– A database 'schema' in the ER Model can be represented pictorially (ER diagrams).
– Then we can map an ER diagram into a relational schema.
ER Model Basics
[Figure: entity set Employees with attributes ssn, name, lot]
• Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes.
• Entity Set: A collection of similar entities. E.g., all employees.
– All entities in an entity set have the same set of attributes. (Except when we consider ISA hierarchies, anyway!)
– Each entity set has a key (the chosen identifier attribute(s)).
– Each attribute has a domain (the universe of allowable values).
ER Model Basics (Contd.)
• Relationship: Association among two or more entities. E.g., Jones works in Pharmacy department.
[Figure: Works_In (attribute since) connects Employees (ssn, name, lot) and Departments (did, dname, budget). Degree=2 relationship between entities, Employees and Departments.]
[Figure: Reports_To connects Employees (ssn, name, lot) to itself, with roles supervisor and subordinate. Degree=2 relationship between entities, Employees and Employees. Must specify the "role" of each entity to distinguish them.]
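One common way to map the Employees, Departments, and Works_In diagram to relations is sketched below, assuming SQLite via Python's sqlite3 module. The column types and the sample values (ssn, lot, did, budget, date) are made up for illustration; only the names come from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    -- one table per entity set, keyed by its key attribute
    CREATE TABLE Employees   (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);
    -- relationship set: keys of both participants plus the attribute 'since'
    CREATE TABLE Works_In (
        ssn   TEXT    REFERENCES Employees,
        did   INTEGER REFERENCES Departments,
        since TEXT,
        PRIMARY KEY (ssn, did)
    );
""")

# "Jones works in Pharmacy department" (all values are illustrative)
conn.execute("INSERT INTO Employees VALUES ('123-22-3666', 'Jones', 48)")
conn.execute("INSERT INTO Departments VALUES (51, 'Pharmacy', 100000.0)")
conn.execute("INSERT INTO Works_In VALUES ('123-22-3666', 51, '1991-01-01')")

works = conn.execute("""
    SELECT E.name, D.dname
    FROM Employees E, Works_In W, Departments D
    WHERE E.ssn = W.ssn AND W.did = D.did
""").fetchall()
print(works)
```

The composite primary key (ssn, did) lets an employee appear with many departments and a department with many employees.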
Key Constraints
• (many-to-many) Consider Works_In: An employee can work in many depts; a dept can have many employees.
[Figure: Works_In (since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]
• (1-many) In contrast, it may be required that each dept have at most one manager.
[Figure: Manages (since) between Employees (ssn, name, lot) and Departments (did, dname, budget). Legend: 1-to-1, 1-to-Many, Many-to-1, Many-to-Many]
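The "at most one manager per dept" key constraint can be sketched by making did alone the key of a Manages table. This is one possible encoding, assuming SQLite via sqlite3; the column types and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Manages (
        ssn   TEXT,
        did   INTEGER,
        since TEXT,
        PRIMARY KEY (did)   -- key constraint: at most one Manages row per dept
    )
""")

conn.execute("INSERT INTO Manages VALUES ('111-11-1111', 51, '2001-01-01')")
try:
    # a second manager for dept 51 violates the key constraint
    conn.execute("INSERT INTO Manages VALUES ('222-22-2222', 51, '2002-01-01')")
    second_manager_accepted = True
except sqlite3.IntegrityError:
    second_manager_accepted = False

# the same employee may still manage another dept (1-to-many)
conn.execute("INSERT INTO Manages VALUES ('111-11-1111', 52, '2003-01-01')")
manages_count = conn.execute("SELECT COUNT(*) FROM Manages").fetchone()[0]
print(second_manager_accepted, manages_count)
```

For the many-to-many Works_In case the key would instead be the pair (ssn, did).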
Participation Constraints
• Does every department have to have a manager?
– If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).
• Every did value in Departments table must appear in a row of the Manages table (with a non-null ssn value!)
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget), connected by Manages (since) and Works_In (since)]
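Total participation of Departments in Manages can be approximated by folding the manager into the department row and declaring ssn NOT NULL. The sketch below assumes SQLite via sqlite3; the table name Dept_Mgr, the column types, and the sample values are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    -- Departments and Manages combined (hypothetical table name)
    CREATE TABLE Dept_Mgr (
        did    INTEGER PRIMARY KEY,
        dname  TEXT,
        budget REAL,
        ssn    TEXT NOT NULL REFERENCES Employees,  -- every dept has a manager
        since  TEXT
    );
""")

conn.execute("INSERT INTO Employees VALUES ('111-11-1111', 'Jones', 48)")
conn.execute(
    "INSERT INTO Dept_Mgr VALUES (51, 'Pharmacy', 100000.0, '111-11-1111', '2001-01-01')"
)
try:
    # a department with no manager is rejected by NOT NULL
    conn.execute("INSERT INTO Dept_Mgr VALUES (52, 'Toys', 50000.0, NULL, NULL)")
    unmanaged_dept_accepted = True
except sqlite3.IntegrityError:
    unmanaged_dept_accepted = False
print(unmanaged_dept_accepted)
```

This works because the key constraint already limits each department to one manager, so a single non-null column can hold it.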
ISA ('is a') Hierarchies
* Attributes are inherited.
* If we declare A ISA B, every A entity is also considered to be a B entity (e.g., every Hourly_Emps and every Contract_Emps ISA Employees).
[Figure: Employees (ssn, name, lot) with ISA children Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
• Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
• Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
• Reasons for using ISA:
– To add descriptive attributes specific to a subclass (e.g., Hourly_Emps have hourly_wages but Contract_Emps don’t)
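One possible relational encoding of the ISA hierarchy keeps the inherited attributes in Employees and stores only subclass-specific attributes in Hourly_Emps, linked by ssn. This is a sketch assuming SQLite via sqlite3; the column types and sample values are made up, and other encodings exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    -- every Hourly_Emps row must also be an Employees row (A ISA B)
    CREATE TABLE Hourly_Emps (
        ssn          TEXT PRIMARY KEY REFERENCES Employees ON DELETE CASCADE,
        hourly_wages REAL,
        hours_worked REAL
    );
""")

conn.execute("INSERT INTO Employees VALUES ('123-22-3666', 'Joe', 12)")
conn.execute("INSERT INTO Hourly_Emps VALUES ('123-22-3666', 10.0, 40.0)")

# inherited attributes (name, lot) are recovered by joining on the key
hourly = conn.execute("""
    SELECT E.name, E.lot, H.hourly_wages
    FROM Employees E, Hourly_Emps H
    WHERE E.ssn = H.ssn
""").fetchall()
print(hourly)
```

Overlap and covering constraints are not expressed by this schema; they would need additional checks.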
Conceptual Design Using the ER Model
• Design choices:
– Should a concept be modeled as an entity or an attribute?
– Should a concept be modeled as an entity or a relationship?
Why Study the Relational Model?
• Most widely used model.
– Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc.
• A competitor: object-oriented model
– ObjectStore, Versant, Ontos
– A synthesis emerging: object-relational model
  • Informix Universal Server, UniSQL, O2, Oracle, DB2
  • Really just a more flexible relational model
Relational Database: Working Definitions
• Relational database: a set of relations
• Relation: made up of 2 parts:
– Instance or occurrence: a table, with rows and columns. #Rows = cardinality, #fields = degree / arity.
– Schema or type: specifies name of relation, plus name and type of each column/attribute.
• E.g., Students(sid: string, name: string, login: string, age: integer, gpa: real).
• Strictly speaking, a relation is a set of tuples, but it is commonplace to think of it as a table (a sequence of rows, each made up of a sequence of attribute values).
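The set-of-tuples definition can be made concrete with a plain Python set. This is only an illustration; the two tuples are the Students rows shown elsewhere in these slides.

```python
# Schema: relation name plus the name and type of each field
schema = (("sid", str), ("name", str), ("login", str), ("age", int), ("gpa", float))

# Instance: a set of tuples, so row order and duplicates carry no meaning
instance = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
}
instance.add(("53666", "Jones", "jones@cs", 18, 3.4))  # duplicate: no effect

cardinality = len(instance)  # number of rows
degree = len(schema)         # number of fields (arity)
print(cardinality, degree)
```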
Relational Query Languages
• A major strength of the relational model: supports simple, powerful querying of data.
• Queries can be written intuitively (what, not how), and the DBMS is responsible for efficient evaluation.
– Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change.
The SQL Query Language
• Developed by IBM (System R) in the 1970s
• Need for a standard since it is used by many vendors
• Standards:
– SQL-86
– SQL-89 (minor revision)
– SQL-92 (major revision, current standard)
– SQL-99 (major extensions)
  – Procedural constructs (if-then-else, loops, procs)
  – OO constructs (inheritance, polymorphism, …)
A look at the SQL Query Language
• One of the simplest languages on earth (compared to C++ or Java or …)
• To find all 18-year-old students, we can write:
SELECT *
FROM Students S
WHERE S.age = 18

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2

• To find just names and logins, replace the first line:
SELECT S.name, S.login
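Both queries can be tried with Python's built-in sqlite3 module, as sketched below. The column types are assumptions, and the third row (a 19-year-old student) is borrowed from the Students instance shown later in these slides so that the WHERE clause has something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@ee", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# all fields of the 18-year-old students
all_fields = conn.execute(
    "SELECT * FROM Students S WHERE S.age = 18 ORDER BY S.sid"
).fetchall()

# only names and logins
names_logins = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.age = 18 ORDER BY S.sid"
).fetchall()
print(all_fields)
print(names_logins)
```

The ORDER BY clause is added only to make the printed order deterministic; a relation itself has no row order.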
Querying Multiple Relations
• What does the following query produce?
SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'

Given this instance of Enrolled:

sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B

we get:

S.name  E.cid
Smith   Topology112
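The join can be reproduced with sqlite3 as a quick check. The Enrolled rows are the ones in the table above; the Students rows are taken from the Students instance shown later in these slides, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
""")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("53831", "Carnatic101", "C"),
        ("53831", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
    ],
)

# pair each student with their enrollments, keeping only grade 'A'
result = conn.execute("""
    SELECT S.name, E.cid
    FROM Students S, Enrolled E
    WHERE S.sid = E.sid AND E.grade = 'A'
""").fetchall()
print(result)
```

Note that sid 53831 has no matching Students row, so its Enrolled tuples could never appear in the join result regardless of grade.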
Creating Relations in SQL
• Creates the Students relation. Observe that the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified.
CREATE TABLE Students
(sid CHAR(20),
name CHAR(20),
login CHAR(10),
age INTEGER,
gpa REAL)
• As another example, the Enrolled table holds information about courses that students take.
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2))
Destroying and Altering Relations
DROP TABLE Students
• Destroys the relation Students. The schema information and the tuples are deleted.
ALTER TABLE Students
ADD COLUMN year INTEGER
• The schema of Students is altered by adding a new field; every tuple in the current instance is extended with a null value in the new field.
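The effect of adding a field can be observed with sqlite3, as in this small sketch. Column types are assumptions; Python shows SQL null as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

# add a new field to the schema of an existing relation
conn.execute("ALTER TABLE Students ADD COLUMN year INTEGER")

cur = conn.execute("SELECT * FROM Students")
columns = [d[0] for d in cur.description]  # field names after the change
rows = cur.fetchall()                      # existing tuple extended with null
print(columns)
print(rows)
```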
Adding and Deleting Tuples
• Can insert a single tuple using:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)
• Can delete all tuples satisfying some condition (e.g., name = Smith):
DELETE
FROM Students S
WHERE S.name = 'Smith'
* Powerful variants of these commands are available!
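A minimal sketch of both commands using sqlite3 follows. The column types and the second tuple (Jones) are assumptions added so that something remains after the delete; the table alias is left out of the DELETE because not every system accepts one there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)

# insert single tuples
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)"
)

# delete every tuple that satisfies the condition
deleted = conn.execute("DELETE FROM Students WHERE name = 'Smith'").rowcount
remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
print(deleted, remaining)
```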
Integrity Constraints (ICs)
• IC: condition that must be true for any instance of the database; e.g., domain constraints.
– ICs are specified when schema is defined.
– ICs are checked when relations are modified.
• A legal instance of a relation is one that satisfies all specified ICs.
– DBMS should not allow illegal instances.
• If the DBMS checks ICs, stored data is more faithful to real-world meaning.
– Avoids data entry errors, too!
Primary Key Constraints
• A set of fields is a key (strictly: candidate key) for a relation if:
1. (Uniqueness) No two distinct tuples can have the same values in the key field (all key fields if composite), and
2. (Minimality) This is not true for any subset of a composite key.
– If Part 2 is false, it’s called a superkey (superset of a key).
– If there is more than one key for a relation, one of the keys is chosen (by the DBA) to be the primary key.
• E.g., sid is a key for Students. The set {sid, gpa} is a superkey.
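The two conditions can be tested against a given instance with a few lines of Python. The helper names below are hypothetical, and the check is only a sketch: it can show that an instance violates a proposed key, but passing it never proves the key holds for all legal instances.

```python
from itertools import combinations

def violates_uniqueness(rows, fields):
    """True if two rows of this instance agree on every field in `fields`."""
    seen = set()
    for row in rows:
        value = tuple(row[f] for f in fields)
        if value in seen:
            return True
        seen.add(value)
    return False

def minimal_on_instance(rows, fields):
    """True if every proper, non-empty subset of `fields` has duplicates here."""
    return all(
        violates_uniqueness(rows, subset)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )

students = [
    {"sid": "53666", "name": "Jones", "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "gpa": 3.8},
]

sid_unique = not violates_uniqueness(students, ["sid"])           # consistent with a key
name_unique = not violates_uniqueness(students, ["name"])         # two Smiths
pair_unique = not violates_uniqueness(students, ["sid", "gpa"])   # unique as a set
pair_minimal = minimal_on_instance(students, ["sid", "gpa"])      # but sid alone suffices
print(sid_unique, name_unique, pair_unique, pair_minimal)
```

On this instance gpa alone also happens to be unique, which is exactly why a key cannot be inferred from data: two students may well share a gpa.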
Entity integrity
• No column of the primary key can contain a
null value.
Foreign Keys, Referential Integrity
• Foreign key: Set of fields in one relation that is used to 'refer' to a tuple in another relation. (Must correspond to primary key of the second relation.) Like a 'logical pointer'.
• E.g., sid is a foreign key referring to Students in Enrolled(sid: string, cid: string, grade: string)
– If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references.
– That is, an Enrolled record cannot have an sid that is not present in Students.
Foreign Keys in SQL
• Only students listed in the Students relation should be allowed to enroll for courses.
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students )

Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
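The foreign key declaration can be exercised with sqlite3, as sketched below. Two assumptions: Students is given sid as its primary key (the slides do not show that declaration), and SQLite is told to check foreign keys with a PRAGMA, since it does not do so by default. The rejected sid 53831 is chosen because it is absent from Students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
    CREATE TABLE Students
      (sid CHAR(20) PRIMARY KEY, name CHAR(20), login CHAR(10),
       age INTEGER, gpa REAL);
    CREATE TABLE Enrolled
      (sid CHAR(20), cid CHAR(20), grade CHAR(2),
       PRIMARY KEY (sid, cid),
       FOREIGN KEY (sid) REFERENCES Students);
""")

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C')")  # accepted

try:
    # no Students tuple has sid 53831, so this would be a dangling reference
    conn.execute("INSERT INTO Enrolled VALUES ('53831', 'Reggae203', 'B')")
    dangling_accepted = True
except sqlite3.IntegrityError:
    dangling_accepted = False

enrolled_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(dangling_accepted, enrolled_count)
```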
Enforcing Referential Integrity
• Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.
• What should be done if an Enrolled tuple with a non-existent student id is inserted? (Reject it!)
• What should be done if a Students tuple is deleted?
– Also delete all Enrolled tuples that refer to it.
– Disallow deletion of a Students tuple that is referred to.
– Set sid in Enrolled tuples that refer to it to a default sid.
– (In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting 'unknown' or 'inapplicable'.)
• Similar if primary key of Students tuple is updated.
Referential Integrity in SQL/92
• SQL/92 supports all 4 options on deletes and updates.
– Default is NO ACTION (delete/update is rejected)
– CASCADE (also delete all tuples that refer to deleted tuple)
– SET NULL / SET DEFAULT (sets foreign key value of referencing tuple)
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid)
REFERENCES Students
ON DELETE CASCADE
ON UPDATE SET DEFAULT )
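The CASCADE option can be watched in action with sqlite3. This sketch makes a few assumptions: Students declares sid as its primary key, SQLite's foreign key checking is switched on with a PRAGMA, and only ON DELETE CASCADE is declared and exercised (the ON UPDATE clause is left out to keep the example short). The data is the Enrolled and Students instance shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students
      (sid CHAR(20) PRIMARY KEY, name CHAR(20), login CHAR(10),
       age INTEGER, gpa REAL);
    CREATE TABLE Enrolled
      (sid CHAR(20), cid CHAR(20), grade CHAR(2),
       PRIMARY KEY (sid, cid),
       FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE);
""")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("53666", "Carnatic101", "C"),
        ("53666", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
    ],
)

# deleting the student also deletes every Enrolled tuple that refers to it
conn.execute("DELETE FROM Students WHERE sid = '53666'")
left = conn.execute("SELECT sid, cid, grade FROM Enrolled").fetchall()
print(left)
```

With the default NO ACTION instead of CASCADE, the same DELETE would be rejected while Enrolled tuples still refer to the student.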
Where do ICs Come From?
• ICs are based upon the semantics of the real-world enterprise that is being described in the database relations (i.e., users decide, not DB experts!).
• We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at the instances.
– An IC is a statement about all possible instances! It is not a statement that can be inferred from the set of existing instances.
• Key and foreign key ICs are the most common; more general ICs are supported too.
Views
• A view is just a relation, but we store a definition, rather than a set of tuples.
CREATE VIEW YoungActiveStudents (name, grade)
AS SELECT S.name, E.grade
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND S.age < 21
• Views can be dropped using the DROP VIEW command.
– How to handle DROP TABLE if there’s a view on the table? The DROP TABLE command has options to let the user specify this.
• Views can be used to present necessary information (or a summary), while hiding details in underlying relation(s).
– Given YoungActiveStudents, but not Students or Enrolled, we can find students who are enrolled, but not the cid’s of the courses they are enrolled in.
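The view definition can be tried with sqlite3 to see what it exposes and what it hides. The sketch assumes simple column types and reuses the Students and Enrolled instances shown earlier in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
    -- only the definition is stored; rows are computed when the view is queried
    CREATE VIEW YoungActiveStudents (name, grade)
      AS SELECT S.name, E.grade
         FROM Students S, Enrolled E
         WHERE S.sid = E.sid AND S.age < 21;
""")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("53666", "Carnatic101", "C"),
        ("53666", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
    ],
)

cur = conn.execute("SELECT * FROM YoungActiveStudents ORDER BY name, grade")
view_columns = [d[0] for d in cur.description]  # no sid or cid is exposed
view_rows = cur.fetchall()
print(view_columns)
print(view_rows)
```

A user with access only to the view sees that Jones and Smith are enrolled and their grades, but not which courses those grades belong to.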
Who decides what the primary key is? (and other design choices?)
The Database design expert?
NO! - not in isolation anyway.
Someone from the enterprise who understands the data and the procedures should be consulted.
The following story illustrates this point.
CAST: Mr. Goodwrench = MG (parts manager); Database Expert = DE
DE: I've looked at your data, and I have decided Part Number (P#) will be designated the primary key for the relation, PARTS(P#, COLOR, WT, TIME-OF-ARRIVAL).
MG: You're the expert, but I think we should use the weight (WT)!
DE: Well, according to textbooks P# should be the primary key, because it is the main lookup attribute!
... later
MG: Why is the system so slow?
DE: Do you store parts in the stock room ordered by P#?
MG: No. We store by weight. When a shipment comes in, I take each part into the back room and throw it as far as I can. The lighter ones go further than the heavy ones so they get ordered by weight!
DE: But weight doesn't have the Uniqueness property! Parts with the same weight end up together in a pile!
MG: No they don't. I tire quickly, so the first one goes furthest, etc.
DE: Then use a composite primary key, (WT, TIME-OF-ARRIVAL).
MG: OK.
The point: This conversation should have taken place during the 1st meeting.