CS 430 Database Theory

Download Report

Transcript CS 430 Database Theory

CS 430
Database Theory
Winter 2005
Lecture 1: Introduction
1
What’s a Database


“Collection” of “related” “data”
Contains data about some aspect of the “real
world”



Refers to a Universe of Discourse (UoD)
A “logically coherent” collection of data
Has a “specific purpose”
2
Typical Characteristics of Databases
(1 of 3)

“Large”


Typically bigger than a spreadsheet
May be very large

Example, IRS Tax return database:




About 200M returns per year, 5 year retention
About 1K-10K bytes per return (guess)
About 1 – 10 Terabytes (without overhead)
Shared

More than single user and single application
3
Typical Characteristics (2 of 3)

Structured


More than a simple flat table
Self describing



Contains Metadata (data about data) describing
the data contained in the database
Metadata maintained separately from applications
that use and manipulate the data
Has a Catalog which is a “database” of the
Metadata
4
Typical Characteristics (3 of 3)

Supports multiple views of the data


Different users and applications can view the data
differently
ACID properties




Atomicity – Atomic transactions (updates are all or
nothing)
Consistency – Enforces integrity constraints
Isolation – Transactions are isolated from each
other
Durability – Data from completed transactions is
never lost
5
A Little History of Databases (1 of 3)

Mid to late 1960s - first databases

Applications



Multiple data models


Maintain parts data for Lunar Lander
Airline reservations
Hierarchical, Network, Inverted File System
Early, mid 1970s - Relational data model


Edgar Codd – Father of Relation database
Basis for SQL (Structured or Standard Query
Language)
6
History (2 of 3)

1979 – Oracle Version 2



Late 1980s – IBM DB2 Version 1


Initial version (marketing decision)
Incomplete and slow
Used to define the SQL standard
Late 1980s – Object Oriented databases

Created to manage data for “non-traditional”
applications
7
History (3 of 3)

1990s – Object Relational Databases


Pioneered by Michael Stonebraker
Today

Dominant technology: Relational DBMS (RDBMS)




Oracle, MS SQL Server, IBM DB2, …
MySQL, PostgreSQL, …
OO capabilities being added to RDBMS
New: Object-Relational Mapping Software

Try to handle “impedance mismatch” between RDBMS
and OO programming languages
8
Database Applications (1 of 2)

Traditional

Business applications



Traditional data types


Numbers, strings, dates
Data warehousing


Personnel, accounting, ...
Student and Course data
Large “historical” databases for analytic support
Manufacturing Control

Real-time issues
9
Database Applications (2 of 2)

Non-traditional



Image and Video
GIS (Geographic Information Systems)
Engineering


Time Series



CAD (Computer Aided Drafting or Design)
Stock market data
Full text search
Environmental and Remote Sensing
10
Data Base Management System (DBMS)

Software that manages and or facilitates

Data definition


Data construction



E.g. loading data into the database
Data manipulation


E.g. creating and maintaining the catalog
Applications retrieving and updating the database
Data sharing
ACID properties
11
DBMS In Context
Users/Programmers
Application Programs
External Queries
Database System
DBMS Software
Query Processing
Application Program Interface
Access/Update Stored Data
Metadata
Catalog
The Data
Elmasri and Navathe, Figure 1.1, Page 6
12
Database People (Actors) (1 of 2)

Data Administrator


Database Administrator


Configure DBMS, manage data storage, DBMS
performance tuning
Database Designer


Responsible for correctness of the data
Design the database
All three of these may be same person or
group of people
13
Database People (2 of 2)

Application Analysts and Developers


Responsible for analyzing, designing, building,
and maintaining database applications
End Users

Use the database to accomplish useful work
14
Why use a DBMS? (1 of 2)

Manage redundancy


Access Control


If the same data is stored multiple times (often
enough, without periodic reconciliation) it is
guaranteed to be inconsistent
Not all the users can view and/or update all the
data
Persistent storage of program data

Rather than having to implement your own DBMS
internal to your application
15
Why a DBMS? (2 of 2)

Efficiency



Enforce integrity constraints


Defined and enforced once
Share data


DBMS vendors have done a lot of work to make their
products work efficiently
Mixed blessing (see “Why not to use a DBMS?”)
Among multiple applications, GUIs, users
ACID Properties

Difficult to implement correctly
16
Why not to use a DBMS?

Learning curve


“It takes four years to learn to be an Oracle DBA”
Overhead costs (time and space)




Generality
Concurrency and transactions
Multiple application and user access
Complex data structures

Rule of thumb: Using an RDBMS doubles the space
required for the data (e.g. versus a text file)
17
Course Administration

Course web site



Textbook


http://faculty.cs.wwu.edu/reedyc/CS_430_Winter_2005
Email: [email protected]
Elmasri, Navathe, Fundamentals of Database Systems,
Fourth Edition
Assignments

Use MySQL


Most convenient form of access?
Get hands dirty:



Design a database
Create database and load the data
Write a database application
18
Course Outline (1 of 2)

Introduction to Databases


Introduction to Data Modeling


Chapter 3 (partial)
Relation Data Model, Algebra, and Calculus


Chapters 1 and 2
Chapters 5, 6
Functional Dependencies and Normalization

Chapters 10 and 11 (partial)
19
Course Outline (2 of 2)

SQL Database Programming


Entity-Relationship Modeling


More of chapters 3, 4, and 7
Overview: What’s inside a DBMS?


Chapters 8 and 9
CS530, Chapters 13-19
Overview of additional topics


Object-Oriented and Object Relational DBMSs
XML in Databases
20