intro352-web - COW :: Ceng

Download Report

Transcript intro352-web - COW :: Ceng

CENG 352
Database Management Systems
İsmail Sengör Altıngövde
email: [email protected]
URL: http://www.ceng.metu.edu.tr/~altingovde
CENG 352
•
•
•
•
Instructor: İsmail Sengör Altıngövde
Office: A-203
Email: [email protected]
Lecture Hours: Mon 15:40-17:30 (BMB4);
Thu. 16:40 (BMB4)
• Course Web page: http://cow.ceng.metu.edu.tr
• Teaching Assistant: Abdullah Doğan
– A-206, [email protected]
2
Text Books and References
1. Raghu Ramakrishnan, Database Management Systems,
McGraw Hill, 3rd edition, 2003 (text book).
2. M. Kifer, A. Bernstein and P. Lewis, Database
Systems: An Application Oriented Approach, 2nd
edition, Addison Wesley, 2005.
3. A. Silberschatz, H.F. Korth, S. Sudarshan, Database
System Concepts, McGraw Hill, 6th edition, 2010.
4. R. Elmasri, S.B. Navathe, Fundamentals of Database
Systems, 6th edition, Addison-Wesley, 2011.
5. H. Garcia-Molina, J. D. Ullman, J. Widom, Database
Systems The Complete Book, Prentice Hall, 2002.
3
Required Background
• Material from CENG351 (EntityRelationship modeling, relational data
model, relational algebra, B+-trees and
hash-based indexing, algorithms for
sorting/merging data files)
• Intermediate programming skills
– C/C++/Java
Course Outline
• Review (Relational Model)
– E/R modeling
– SQL
• Database Application Development and Internet
Applications
– Embedded SQL, cursors, dynamic SQL, JDBC classes and
interfaces, stored procedures.
– http protocol, html/xml documents, xml dtd, html forms,
javascript, style sheets, cgi, application servers, servlets, java
server pages.
Course Outline (cont.)
• Relational Database Design
– Functional Dependency Theory
• Attribute closure, FD set closure, 3NF, BCNF,
decomposition lossless-join decomposition, dependencypreserving decomposition.
– Normalization of Relations
• Decomposition into 3NF, BCNF.
– Other Types of Dependencies
• Multivalued dependencies, join dependencies, 4NF, 5NF.
Course Outline (cont.)
• Query Processing
– Algorithms for relational operators (sorting, indexing
(hashing, tree-indexing), nested-loops join, sort-merge join,
hash join), query evaluation plans.
• Query Optimization
– Translation of SQL queries to RA, estimating the cost of an
execution plan, equivalences of RA expressions, enumeration
of alternative plans.
Course Outline (cont.)
• Transaction Processing
– Properties of transactions, concurrent execution,
schedules, recoverability.
• Concurrency Control
– Serializability theory, two-phase locking, deadlock
problem, timestamp-ordering, optimistic concurrency
control.
• Crash Recovery
– Logging, checkpointing, recovering from a system
crash, example systems like ARIES.
Grading
•
•
•
•
•
Midterm
Final Exam
Assignments & Quizzes
Project
Attendance:
25 %
35 %
20 %
20 %
– Once in every week
– 50% attendance is mandatory to get the final
exam
9
Grading Policies
• Policy on final exam:
– Attendance should be > 50%, AND
– The weighted average of the assignments & quizzes,
OR the midterm grade should be at least 30
• Policy on missed midterm:
– make-up exam in the following week (only for the
legal excuses and if informed beforehand)
• Lateness policy:
– No late submission for the written assignments.
– For the programming homework(s): can be late up to 3
days. Afterwards, will be penalized by 10%* day2
– All assignments are to be your own work.
10
Projects
• Projects in groups of two.
– Determine your groups till the end of
add/drops, i.e., Oct 3 Friday, 24:00
• 3 stages:
– Proposal & Preliminary E/R Report
– Design Report
– Final Report + Demo
11
Questions?
Files
• Data is not scattered hither and thither on
disk.
• Instead, it is organized into files.
• Files are organized into records.
• Records are organized into fields.
CENG 351
13
File-Based Systems
• Collection of application programs that
perform services for the end users (e.g.
reports).
• Each program defines and manages its own
data.
14
File-Based Processing
Pearson Education © 2009
15
Limitations of File-Based
Approach
• Separation and isolation of data
– Each program maintains its own set of data.
– Users of one program may be unaware of
potentially useful data held by other programs.
• Duplication of data
– Same data is held by different programs.
– Wasted space and potentially different values
and/or different formats for the same item.
16
Limitations of File-Based
Approach
• Data dependence
– File structure is defined in the program code.
• Incompatible file formats
– Programs are written in different languages, and so
cannot easily access each other’s files.
• Fixed Queries/Proliferation of application
programs
– Programs are written to satisfy particular functions.
– Any new requirement needs a new program.
17
Database Approach
• Arose because:
– Definition of data was embedded in application
programs, rather than being stored separately and
independently.
– No control over access and manipulation of data
beyond that imposed by application programs.
• Result:
– the database and Database Management System
(DBMS).
18
Database Management Systems
Chapter 1
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
19
What Is a DBMS?
A very large, integrated collection of data.
 Models real-world enterprise.




Entities (e.g., students, courses)
Relationships (e.g., Madonna is taking CS564)
A Database Management System (DBMS) is a
software package designed to store and
manage databases.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
20
Typical DBMS Functionality




Define a database : in terms of data types, structures
and constraints
Construct or load the database on a secondary storage
medium
Manipulating the database : querying, generating
reports, insertions, deletions and modifications to its
content
Concurrent Processing and Sharing by a set of users
and programs – yet, keeping all data valid and
consistent
21
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
21
Database Management System
(DBMS)
Pearson Education © 2009
22
Why Use a DBMS?
Data independence and efficient access.
 Reduced application development time.
 Data integrity and security.
 Uniform data administration.
 Concurrent access, recovery from crashes.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
23
Why Study Databases??

Shift from computation to information



at the “low end”: scramble to webspace (a mess!)
at the “high end”: scientific applications
Datasets increasing in diversity and volume.



?
Digital libraries, interactive video, Human
Genome project, EOS project
... need for DBMS exploding
DBMS encompasses most of CS

OS, languages, theory, AI, multimedia, logic
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
24
Data Models
A data model is a collection of concepts for
describing data.
 A schema is a description of a particular
collection of data, using a given data model.
 The relational model of data is the most widely
used model today.



Main concept: relation, basically a table with rows
and columns.
Every relation has a schema, which describes the
columns, or fields.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
25
Instance of Students Relation
Students( sid: string, name: string, login: string,
age: integer, gpa: real )
sid
53666
53688
53650
name
Jones
Smith
Smith
login
jones@cs
smith@ee
smith@math
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
age
18
18
19
gpa
3.4
3.2
3.8
26
Data Models

Object-Based Data Models





Entity-Relationship
Semantic
Functional
Object-Oriented.
Record-Based Data Models
 Relational Data Model
 Network Data Model
 Hierarchical Data Model.

Physical Data Models
Pearson Education © 2009
27
Relational Data Model
Pearson Education © 2009
28
Network Data Model
Pearson Education © 2009
29
Hierarchical Data Model
Pearson Education © 2009
30
Levels of Abstraction
View 3 External shemata (views) describe
how users see the data.
Conceptual schema defines
Conceptual Schema
logical structure
View 1
View 2
Physical Schema

Physical schema describes the files
and indexes used
Single conceptual (logical) schema and physical
schema, many views
 Schemas are defined using DDL; data is modified/queried using DML.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
31
Example: University Database

Conceptual schema:




Physical schema:



Students(sid: string, name: string, login: string,
age: integer, gpa:real)
Courses(cid: string, cname:string, credits:integer)
Enrolled(sid:string, cid:string, grade:string)
Relations stored as unordered files.
Index on first column of Students.
External Schema (View):

Course_info(cid:string,enrollment:integer)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
32
Data Independence *
Applications insulated from how data is
structured and stored.
 Logical data independence: Protection from
changes in logical structure of data.
 Physical data independence: Protection from
changes in physical structure of data.

 One of the most important benefits of using a DBMS!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
33
Data Independence
• Logical Data Independence
– Refers to immunity of external schemas to
changes in conceptual schema.
– Conceptual schema changes (e.g.
addition/removal of entities).
– Should not require changes to external schema
or rewrites of application programs.
34
Data Independence
• Physical Data Independence
– Refers to immunity of conceptual schema to
changes in the internal schema.
– Physical schema changes (e.g. using different
file organizations, storage structures/devices).
– Should not require change to conceptual or
external schemas.
35
Concurrency Control

Concurrent execution of user programs
is essential for good DBMS performance.

Because disk accesses are frequent, and relatively
slow, it is important to keep the cpu humming by
working on several user programs concurrently.
Interleaving actions of different user programs
can lead to inconsistency: e.g., check is cleared
while account balance is being computed.
 DBMS ensures such problems don’t arise: users
can pretend they are using a single-user system.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
36
Transaction: An Execution of a DB Program
Key concept is transaction, which is an atomic
sequence of database actions (reads/writes).
 Each transaction, executed completely, must
leave the DB in a consistent state if DB is
consistent when the transaction begins.




Users can specify some simple integrity constraints on
the data, and the DBMS will enforce these constraints.
Beyond this, the DBMS does not really understand the
semantics of the data. (e.g., it does not understand
how the interest on a bank account is computed).
Thus, ensuring that a transaction (run alone) preserves
consistency is ultimately the user’s responsibility!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
37
These layers
must consider
concurrency
control and
recovery
Structure of a DBMS



A typical DBMS has a
Query Optimization
layered architecture.
and Execution
The figure does not
Relational Operators
show the concurrency
Files and Access Methods
control and recovery
components.
Buffer Management
This is one of several
Disk Space Management
possible architectures;
each system has its own
variations.
DB
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
38
Database Languages
• Data Definition Language (DDL)
– Allows the DBA or user to describe and name
entities, attributes, and relationships required
for the application
– plus any associated integrity and security
constraints.
• Data Manipulation Language (DML)
– Provides basic data manipulation operations on
data held in the database.
39
System Catalog
• Repository of information (metadata)
describing the data in the database.
• One of the fundamental components of
DBMS.
• Typically stores:
–
–
–
–
–
names, types, and sizes of data items;
constraints on the data;
names of authorized users;
data items accessible by a user and the type of access;
usage statistics.
40
The DBMS Marketplace
• Relational DBMS companies – Oracle, Sybase – are among the
largest software companies in the world.
• IBM offers its relational DB2 system.
• Microsoft offers SQL-Server, plus Microsoft Access for the cheap
DBMS on the desktop, answered by “lite” systems from other
competitors.
• Relational companies also challenged by “object-oriented DB”
companies.
• But countered with “object-relational” systems, which retain the
relational core while allowing type extension as in OO systems.
41
Databases make these folks happy
...
• End users and DBMS vendors
• DB application programmers
–
E.g., smart webmasters
• Database administrator (DBA)
Designs logical /physical schemas
– Handles security and authorization
– Data availability, crash recovery
– Database tuning as needs evolve
Must understand how a DBMS works!
–
Summary
• DBMS used to maintain, query large datasets.
• Benefits include recovery from system
crashes, concurrent access, quick application
development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs
and are well-paid! 
• DBMS R&D is one of the broadest,
most exciting areas in CS.