DavidShaoChapter3Fall2003CS157B

Download Report

Transcript DavidShaoChapter3Fall2003CS157B

Chapter 3 The Relational
Model, Additional Notes
By David Shao
CS 157B Fall 2003
Instructor: Dr. Lee
Overview
Codd’s Original Paper
 Alternative: Network Model
 IBM Develops System R
 How to Fund Basic Research?

Codd’s Original Paper
“A Relational Model of Data for Large
Shared Data Banks”
 Communications of the ACM, Volume 13,
Number 6, June 1970
 Lower level (basement) of the new Martin
Luther King, Jr. Library
 Get to roll the shelves apart to access
journals.

Codd’s Reasons

Data independence from database
implementation such as machine
representation
 Natural structure of data
 Can be analyzed mathematically (Codd was
a mathematician by training)
Alternative: Network Model

Charles A. Bachman 1973 ACM Turing
Award Lecture “The Programmer as
Navigator”
 Communications of the ACM, Volume 16,
Number 11, November 1973, pp. 653-657
Bachman’s Perspective

Compared his network model to the work of
Copernicus
 Derided “computer-centric” databases,
proposed a future where programmers
navigate an “n-dimensional data space”
CODASYL

CODASYL: Conference on Data Systems
Languages
 Formed 1959, helped develop COBOL
 1971 Report of the CODASYL Data Base
Task Group (DBTG)
 Bachman one of DBTG’s founding
members
Network: Navigating Links

Unlike the relational model where
everything is a table in the database, entities
and relationships are not treated in a
uniform fashion.
Sample Example Fragment

DBTG initial specification used COBOL
with promises to add Fortran later.
RECORD NAME IS ELECTION
LOCATION MODE IS VIA ALL-ELECTIONS-88
WITHIN PRESIDENTIAL-AREA
02 ELECTION-YEAR PIC “9999”
02 ELECTION-WON-ELECTORAL-VOTES PIC “999”
Relational vs. Network

Reference: Entire issue of Computing
Surveys, Volume 8, Number 1, March 1976
 IBM competitors DEC, Univac, Xerox, and
Honeywell offered commercial DBTG
databases on their hardware, and Cullinane
Corp. offered one for IBM’s System/360
 As of 1976, the relational model was
unproven.
IBM Develops System R
“The 1995 SQL Reunion: People, Projects,
and Politics”, edited by Paul McJones
 http://gatekeeper.dec.com/pub/DEC/SRC/te
chnical-notes/SRC-1997-018html/sqlr95.html
 “A History and Evaluation of System R”,
Chamberlin et al., CACM 24 (10), 1981

Codd Not Involved with
System R

Codd was NOT part of the development
team and not an author in the CACM paper
 Codd tried to form a team to investigate a
language to more closely resemble the
mathematics behind the first order predicate
calculus.
 Express notions such as x y f(x, y)
System R Timeline

Phase Zero: 1974-1975, throw-away
prototype
 Phase One: 1976-1977, full scale
implementation including other features of a
“real” database
 Phase Two: 1978-1979, testing System R
internally and with trusted customers
System R Features

Ability to process SQL commands
 Support for transactions
 Support for multiple users
Transactions

ACID: Atomicity, Consistency, Isolation,
Durability
 Don’t lose data when for example the hard
drive crashes or the power goes out.
 Basic idea: Keep a copy of the old data in a
separate place and keep a log of all changes
since the last known good state.
Coordinating Multiple Users

Users want access to the same resource
 Potential problems when one user writes
and another user writes or reads at the same
time.
Resource-Sharing Example

Suppose users A and B are attempting to
share resource X at the same time in a
multi-programming environment
 Different results depending on which order
A and B are scheduled to run:
 A writes X; B reads X; B writes X
 B reads X; A writes X; B writes X
 B reads X; B writes X; A writes X
Resource-Sharing
Implementation
Locks—reserve a resource for one’s usage,
at least prohibiting someone else from
changing the resource while one is using it.
 Many pitfalls—whole books have been
written on potential problems such as
deadlocks, starvation, and thrashing.

Database Implementation
Reference

Title: Transaction Processing: Concepts and
Techniques
 Jim Gray and Andreas Reuter
 Jim Gray is one of the authors of the System
R paper previously cited.
System R Support for SQL

Need for speed—universal answer back
then was to write a compiler.
 Pre-compiled code eliminates time spent reparsing SQL and figuring out the best way
to access the data.
 Used B-trees to support indexing.
Compiling SQL Queries

Compiled SQL queries into pre-optimized
machine code
 Discovered about 100 machine code
fragments were sufficient.
 Not too much impact on users when
interactive queries were compiled and not
interpreted.
B-Trees

Key idea—one node distinguishes many
Many Children for One Node?

Data is pulled into memory from secondary
storage in pages—relatively large pieces
 Accessing data from the hard drive is
incredibly slower than from memory—
maybe a million times slower!
 Conclusion: Number of pages accessed
needs to be kept as small as possible.
B-Tree Advantages

If each internal node can reference 200
children, in two accesses 200 * 200 =
40,000 branches can be distinguished.
 A tree has an intrinsic notion of an ordering
as opposed to a hash.
 Maybe data can be arranged so that related
items will be on the same page as one
recently accessed.
Join Example Using B-Trees

Suppose we want to perform a join
Bob
7
Tim
3
5
7
3
Mary
Anne
Sue
Bob
7
Anne
Tim
3
Sue
Using B-Tree Indexing

Use an idea similar to merge sort
 Index the first table on its column 2, index
the second table on its column 1, then
merge on identical values
 Avoid m * n comparisons
Illustration of Merging

Good
Bad
3
3
3
3
5
7
5
7
7
7
Who Made Winning Possible
Researchers—without Codd’s insight
relational models don’t happen
 Programmers—without System R relational
databases and SQL are unproven
 Management—without Larry Ellison and
the other founders of Oracle, relational
databases do not win commercially

Or Was It a Debacle?

Researchers lost—no one wants to read
mathematics, and corporate funding for
basic research has been slashed
 Programmers lost—ordinary users don’t
want SQL, they want simple Excel-like
spreadsheets. And SQL commoditizes
programmers.
IBM Lost?

IBM funded the basic research in relational
databases, they employed Codd and the
System R programmers
 But it was Larry Ellison who made tens of
billions of dollars.
 IBM previously had a near-monopoly with
IMS in business databases, but products like
Oracle help IBM competitors like Sun.
How to Fund Basic Research?
“Funding a Revolution: Government
Support for Computing Research”
 Copyright 1999 by the National Academy
of Sciences
 http://www.nap.edu/readingroom/books/far/
 Chapter 6 “The Rise of Relational
Databases”
