Fall 2008, INFS614 - GMU Computer Science

Download Report

Transcript Fall 2008, INFS614 - GMU Computer Science

Database Management Systems
INFS 614-001
Fall 08
Lecture One - Introduction
Instructor: Carlotta Domeniconi
[email protected]
http://www.cs.gmu.edu/~carlotta/teaching/INFS-614s08/info.html
Fall 2008, INFS614
1
Outline

Course syllabus
– Course Schedule
– Homeworks & exams

Satisfaction of prerequisites
– Strictly enforced: GMU HONOR CODE applies!

Introduction to DB & DBMS
– Outline of the entire course material
Fall 2008, INFS614
2
Front matters

To communicate with me:
– Email: [email protected], I will try to reply
promptly.
– Office hours: by appointment.

Sign up for your Mason account. You may
forward all your Mason emails to your
favorite email address.
Fall 2008, INFS614
3
Front matters (cont.)

GTA:
– Huaming Liu and Chun-Kit Ngan
– Email: [email protected]
– Office hours: Tuesday, 4-6pm (Huaming)
– Room: 330 -ST2.
Fall 2008, INFS614
4
Front matters (cont.)
Required textbooks:
– Database Management Systems, 3rd ed. by
Raghu Ramakrishnan & Johannes Gehrke,
McGraw-Hill.
– Oracle 9i Programming: a Primer, by Rajshkhar
Sunderraman, Addison Wesley, ISBN 0-32119498-5
 On-Line Course Resources:
http://www.cs.gmu.edu/~carlotta/teaching/INFS614-f08/info.html
– You are required to read all the material there.
The content will be updated frequently. So
check the web site periodically, at least once
every week, and every time before class!

Fall 2008, INFS614
5
Date
Topic (chapter/section)
Aug 28
Introduction (chapter 1)
Sep 4
ER model (chapter 2)
Sep 11
Relational Model (chapter 3)
Sep 18
Relational Algebra (sections 4.1-4.2)
Sep 25
Relational Algebra (continued)
Oct 2
SQL (sections 3.4, 5.1-5.5)
Oct 9
Review
Oct 16
Midterm Exam
Oct 23
SQL (sections 5.6-5.15)
Oct 30
Functional Dependencies
Nov 6
Functional Dependencies (continued)
Nov 13
Decomposition and Normal Forms
Nov 20
Advanced topics (or catch up)
Dec 4
Review
Dec 11
Final Exam
Fall 2008, INFS614
HW assignment
HW due
1
2
1
3
2
3
4
5
4
5
6
Submission and Grading





Late submissions are not accepted:
no exceptions!
No make-up exams!
On-time: within 5 minutes after the class begins.
Important: your homeworks must run properly
under the Oracle system in the labs.
Final grades:
–
–
–
–
homework assignments (20%)
Project (15%)
midterm exam (25%)
final exam (40%)
Fall 2008, INFS614
7
Honor Code System


GMU honor Code
http://honorcode.gmu.edu/
For this class
– Homeworks & exams require individual work. Study
groups are encouraged, but homeworks’ solutions and
write up must be individual.
– Exams: individual effort, closed books

Satisfaction of prerequisites:
– Honor code invoked.
Fall 2008, INFS614
8
Satisfaction of prerequisites

Prerequisites (strictly enforced)
– INFS-501 (Discrete mathematics)
– INFS-515 (Computer architecture/organization)
– INFS-590 (Program Design and data structures)
Specifically:
– Good background in discrete mathematics (e.g.,
set theory, mathematical logic, relations and
functions);
– Programming (good knowledge of either C, C++ or
Java);
– Data structures and algorithms, Computer
architecture, and Operating systems.
Fall 2008, INFS614
9
Satisfaction of prerequisites
For INFS/SWE/ISA students:

Consult your letter of acceptance. It specifies your
status with respect to these foundation courses. For
each course, it must be that either
– You were waved from the course (the evidence
should be either in the acceptance letter or in a
subsequent official document).
– You took the course and received a grade of B or
better.
Fall 2008, INFS614
10
Satisfaction of prerequisites

For non-IS/SWE/ISA students, MUST DO THE
FOLLOWING (by next week):
– Consult the description of each of the
prerequisite courses in the university catalog.
– For each course, provide a list of one or more
courses taken, that cover the subjects of that
course, as follows: course-number, course-title,
institution, year, final-grade;
– Syllabus of each course taken;
– Copy of transcripts that shows equivalent classes taken
(with grade B or better);
– Current status;
Fall 2008, INFS614
11
Useful links for your computing
needs


http://labs.ite.gmu.edu/ (click on FAQ’s) for IT&E
computing labs, IT&E cluster account, and Oracle
DBMS information.
http://cs.gmu.edu/~ami/teaching/infs614/current/oracl
e.html
for information on our particular computing
environment.
Fall 2008, INFS614
12
Introduction to DB and
DBMS
Fall 2008, INFS614
13
What is a Database?



Database : A very large, integrated
collection of data.
Data : Known facts about the real-world
that can be recorded and have implicit
meaning;
A database models real-world scenarios :
– Entities
– Relationships between entities
Fall 2008, INFS614
14
University Database

Information about university environment
• Entities :
• Students
• Faculty
• Courses
• Classrooms
Fall 2008, INFS614
• Relationships :
•Students’ enrollment course
•Faculty teaching courses
•Use of classroom for course
•Prerequisite courses
15
What is a DBMS?

A Database Management System (DBMS) is
a software package designed to store,
provide access and manage databases
– One DBMS, many databases;


Database System: A database and a
DBMS.
Why use a DBMS?
Fall 2008, INFS614
16
A simple problem: address list



Solution 1: a blank notebook, entries recorded with
a pen, in time order.
Advantages: simple, private, reliable, space
efficient.
Disadvantages:
–
–
–
–
–
Hard to search;
Hard to add information (e.g., e-mail);
Hard to update information;
Hard to extract information (print Christmas cards);
Integrity and consistency (Mary Jones: see P. Jones
address, P. Jones-Smith entry);
– Loosing it is a catastrophe!
Fall 2008, INFS614
17
Solution 2: a loose-leaf notebook
with n entries per page

Better:
– Can keep it sorted by key;
– Insertions & deletions can be done;

Same as Sol. 1 in other aspects:
– No search by other keys (e.g., phone number).
Fall 2008, INFS614
18
Solution 3: Text format,managed
by text editor

Advantages:
–
–
–
–
–
–

Free format;
“Unlimited” size;
Easily copied (for backup);
Easily shared;
Sub-string searchable;
Easy Update.
Disadvantages:
– Change requirements?
Fall 2008, INFS614
19
Complications with Solution 3

File gets very large
– Search gets slow and possibly imprecise.
(E.g., search for “Elm Street” may yield
“Wilhelm Street”)

Solution: structure entries into records
with fields and add indexes over fields.
Database Concepts: Record Organization, Keys, Indexes
Fall 2008, INFS614
20
Complication 2: need to separate
families from addresses

Why?
–
–
–
–

People move;
Might forget to update addresses;
Want space economy: single point of update;
Important to separate for applications: 1 Christmas card
per residence!
Solution: two files (one: people, one: addresses).
How do we link them? How many residences a
person can have?
Database Concepts: Consistency, Normalization,
Foreign Keys
Fall 2008, INFS614
21
Complication 3: multiple
association
People own, rent, manage residences
 May want to impose constraints in the
number of residences per person or vice
versa.
 Examples: Many to many (rich people);
Many to one (single family); One to many
(Builder); One to one (legal residence)

Database Concepts: Relationships; Cardinality
Fall 2008, INFS614
22
Complication 4: dynamic
nature of the data

Add new information:
– Cards sent and received
– Zip+4

Requirements:
– Adding fields
– Summarizing
Database Concepts: Data Abstraction; Data Evolution
Fall 2008, INFS614
23
Complication 5: Ad hoc
analysis and retrieval

Example:
– Find who sent me cards over the past 5 years,
but received less than 3 cards from me.

Requirements:
– A language
– An implementation of retrieval functions
(correct and efficient).
Database Concepts: Query languages; Query
optimization
Fall 2008, INFS614
24
Complication 6: Sharing

Different users, different organizations
– Other family members want to see names and addresses
together
– You don’t want to give update access over your business
contacts to anybody.

Solutions:
– Use stored queries as ”windows” or “views” over the
database.
– Ability to “reunite” data from different files.
– Data not selected by the query is “not there”
– Permissions
Database Concepts: Joins; Views; Security
Fall 2008, INFS614
25
Complication 7: Required
existence of associated data


Examples:
– Can’t send Christmas card to somebody without an
address
– Names are not unique; only when associated with
residence.
Solution:
– Don’t insert a name if there is no address and vice versa
– Or tolerate multiple non-unique names
Database Concepts: Referential Integrity; Weak
entity sets
Fall 2008, INFS614
26
Complication 8: Multiple
updates in an all or none basis

Examples
– Two households merge (marriage)
– Need to change residences (or other data) for
a group of people
– Computer crashes in the middle of updates

Solution
– Illusion of updates being done simultaneously
– Commit or rollback an entire chunk of work
Database Concepts: Transactions; ACID properties;
Recovery
Fall 2008, INFS614
27
Complication 9: computer
crashes

Will I have my data after the crash?
– Uncorrupted?
– Consistent?

Solution:
– Make sure data is available uncorrupted at a
point in the past (checkpoint)
Database Concepts: Durability; Consistency (ACID);
Recovery
Fall 2008, INFS614
28
Complication 10: multimedia
Pictures, Audio, Text, …
 Requirements:

– Ability to store new data types
– Content search
– Integration with text and numeric data
Database Concepts: Multimedia databases;
Query by content
Fall 2008, INFS614
29
Complication 11: You become
President! …

Of something …(US, Corporation, Local chapter of
charity, your household)
– Your address list grows exponentially
– You realize some of the information is useful!

Examples:
– zip codes in states where there are less than 5%
difference in Rep./Dem. Votes in 2004?
– Which combinations of products sold best last year?
Database Concepts: Data Warehousing; Data
Mining
Fall 2008, INFS614
30
Files vs. DBMS
Application must store large datasets
between main memory and secondary
storage (e.g., buffering, page-oriented
access, etc.);
 Special code to answer different queries;
 Must protect data from inconsistency due
to multiple concurrent users;
 Crash recovery;
 Security and access control.

Fall 2008, INFS614
31
Why Use a DBMS?
Easier and More Efficient
Data independence and efficient access;
 Reduced application development time;
 Data integrity and security;
 Uniform data administration;
 Concurrent access, recovery from crashes.

Fall 2008, INFS614
32
Data Models
A data model is a collection of concepts
for describing data.
 A schema is a description of a particular
collection of data, using a given data model.
 The relational model of data is the most
widely used model today.

–
–
Main concept: relation, basically a table with
rows and columns.
Every relation has a schema, which describes
the columns, or fields.
Fall 2008, INFS614
33
Relational Model

The main concept is a relation:
– A table with rows and columns
sid
name
login
age
gpa
5366 Jones
[email protected]
18
2.4
5498 Boon
[email protected]
19
3.9
23
3.5
…
..
6756 Gioffrey [email protected]
..

…
…
Each row in the table is called a tuple
Fall 2008, INFS614
34
Relational Model (cont.)

The relation schema specifies:
– name of the relation,
– name of each attribute (column,field) and its
type. Every attribute has an atomic type.
Relation Name
Student(sid:string, login:string, age:integer, gpa:real);
Attribute Name

A Relation (Relation instance): a set of tuples.
Fall 2008, INFS614
35
Levels of Abstraction

Many views, single
conceptual (logical)
schema and physical
schema.
–
–
–
Views describe how users
see the data.
Conceptual schema defines
logical structure
Physical schema describes
the files and indexes used.
View 1
View 2
View 3
Conceptual Schema
Physical Schema
 Schemas are defined using DDL (Data Definition Language);
Data is modified/queried using DML (Data Manipulation Language).
Fall 2008, INFS614
36
Levels of Abstraction
*
Conceptual Schema : the data is described through the data
model. It describes structure and constraints for the whole
database.
*
External Schema : how the users see and use the data. Many
views of the data.
*
Physical schema : describes the physical structure of the DB
*
Mappings among schema levels are also needed. Programs and
applications refer to an external schema, and are mapped by
the DBMS to the conceptual schema for execution.
Conceptual, External Schemas are defined using Data
Definition Language (DDL) : specification for defining
the database schema
Fall 2008, INFS614
37
Example: University Database

Conceptual schema:
– Student (sid: string, name: string, login: string,
age: integer, gpa: real)
– Courses (cid: string, cname: string, credits:
integer)
– Enrolled (sid: string, cid: string, grade: string)

Physical schema:
– Relations stored as unordered files.
– Index on first column of Students…

External schema (View):
– Course_info (cid: string, enrollment: integer)
Fall 2008, INFS614
38
Data Independence
Applications insulated from how data is
structured and stored.
 Logical data independence: Protection
from changes in logical structure of data.
 Physical data independence: Protection
from changes in physical structure of data.

 One of the most important benefits of using a DBMS!
Fall 2008, INFS614
39
Easy Manipulation & Access of the
Information


A DBMS enables users to create , update and query the data
through Data Languages
Data Definition Language (DDL)
– Specification notation to create the Database schema

Data Manipulation Language (DML)
– A language for manipulating the data : updating the data
and accessing the data
– The portion of a DML that allows to access the information
through formulating queries is called the Query Language
Query : Request for retrieving data stored in a DBMS.
Fall 2008, INFS614
40
Concurrency Control

Concurrent execution of user programs is
essential for good DBMS performance
– Because disk accesses are frequent, and
relatively slow, it is important to keep the cpu
not idle by working on several user programs
concurrently.
Interleaving actions of different user
programs can lead to inconsistency: e.g.,
check is cleared while account balance is
being computed.
 DBMS ensures such problems don’t arise:
users can pretend they are using a singleuser system.

Fall 2008, INFS614
41
Concurrency Control (cont.)
Example:
 One
course still has space for one
more student.
 Two students are trying to enroll in
that course at the same time.
 The DBMS executes the two requests
in a serial order.
 Thus, only one student will be
enrolled.
Fall 2008, INFS614
42
Transaction
An execution of a DB program
 Key concept is transaction, which is an atomic
sequence of database actions (reads/writes).
 ACID properties

–
–
–
–

A – Atomicity
C – Consistency
I – Isolation
D – Durability
How: log and concurrency control sub-system
Fall 2008, INFS614
43
Scheduling Concurrent Transactions

DBMS ensures that execution of {T1,…, Tn} is
equivalent to some serial execution T1’…Tn’.
– Before reading/writing an object, a transaction
requests a lock on the object, and waits till the
DBMS gives it the lock. All locks are released at
the end of the transaction. (Strict 2PL locking
protocol.)
– Idea: If an action of Ti (say, writing X) affects Tj
(e.g., reads X), one of them, say Ti, will obtain the
lock on X first and Tj is forced to wait until Ti
completes; this effectively orders the transactions.
Fall 2008, INFS614
44
Ensuring Atomicity
DBMS ensures atomicity (all-or-nothing
property) even if system crashes in the middle
of a transaction.
 Idea: Keep a log (history) of all actions carried
out by the DBMS while executing a set of
transactions:

– Before a change is made to the database, the
corresponding log entry is forced to a safe location.
(WAL – Write-Ahead Log – protocol);
– After a crash, the effects of partially executed
transactions are undone using the log.
Fall 2008, INFS614
45
The Log

The following actions are recorded in the log:
– Ti writes an object: the old and new value.

Log record must go to disk before the changed page!
– Ti commits/aborts: a log record indicating this
action.
Log records chained together by the
transaction id, so it’s easy to undo a specific
transaction (e.g., resolve a deadlock)
 All log related activities are handled
transparently by the DBMS.

Fall 2008, INFS614
46
Structure of a DBMS
A typical DBMS has a
layered architecture
 Each layer is
composed of several
modules
 The architecture
varies from vendor to
vendor

Fall 2008, INFS614
47
Main cost of using a DBMS
• High initial investment and possible need for
additional hardware.
• Overhead for providing generality, security,
recovery, integrity and concurrency control.
When a DBMS may be unnecessary
• If the Database and application are simple,
well-defined and not expected to change.
• If there are stringent real-time requirements,
that may not be met due to DBMS overhead.
• If access to data by multiple users is not
required.
Fall 2008, INFS614
48
Database Users


End users (or DB application users)
DB application programmers (more precisely, they
are DBMS users)
–
–

E.g. smart webmasters
This course is mostly to learn how to (start to) be a
DB application programmer.
Database administrator (DBA)
–
–
–
–
Designs logical /physical schemas
Handles security and authorization
Data availability, crash recovery
Database tuning as needs evolve
Must understand how a DBMS works!
Fall 2008, INFS614
49
Summary
DBMS used to maintain, query large datasets.
 Benefits include recovery from system
crashes, concurrent access, quick application
development, data integrity and security.
 Levels of abstraction give data independence.
 We will learn how to

– Set up a database

Design (ERD and Relational Model), and refine (Relational
Normalization Theory)
– Use to query the database

Relational Algebra and SQL
Fall 2008, INFS614
50