DavidShaoChapter3Fall2003CS157B
Download
Report
Transcript DavidShaoChapter3Fall2003CS157B
Chapter 3 The Relational
Model, Additional Notes
By David Shao
CS 157B Fall 2003
Instructor: Dr. Lee
Overview
Codd’s Original Paper
Alternative: Network Model
IBM Develops System R
How to Fund Basic Research?
Codd’s Original Paper
“A Relational Model of Data for Large
Shared Data Banks”
Communications of the ACM, Volume 13,
Number 6, June 1970
Lower level (basement) of the new Martin
Luther King, Jr. Library
Get to roll the shelves apart to access
journals.
Codd’s Reasons
Data independence from database
implementation such as machine
representation
Natural structure of data
Can be analyzed mathematically (Codd was
a mathematician by training)
Alternative: Network Model
Charles A. Bachman 1973 ACM Turing
Award Lecture “The Programmer as
Navigator”
Communications of the ACM, Volume 16,
Number 11, November 1973, pp. 653-657
Bachman’s Perspective
Compared his network model to the work of
Copernicus
Derided “computer-centric” databases,
proposed a future where programmers
navigate an “n-dimensional data space”
CODASYL
CODASYL: Conference on Data Systems
Languages
Formed 1959, helped develop COBOL
1971 Report of the CODASYL Data Base
Task Group (DBTG)
Bachman one of DBTG’s founding
members
Network: Navigating Links
Unlike the relational model where
everything is a table in the database, entities
and relationships are not treated in a
uniform fashion.
Sample Example Fragment
DBTG initial specification used COBOL
with promises to add Fortran later.
RECORD NAME IS ELECTION
LOCATION MODE IS VIA ALL-ELECTIONS-88
WITHIN PRESIDENTIAL-AREA
02 ELECTION-YEAR PIC “9999”
02 ELECTION-WON-ELECTORAL-VOTES PIC “999”
Relational vs. Network
Reference: Entire issue of Computing
Surveys, Volume 8, Number 1, March 1976
IBM competitors DEC, Univac, Xerox, and
Honeywell offered commercial DBTG
databases on their hardware, and Cullinane
Corp. offered one for IBM’s System/360
As of 1976, the relational model was
unproven.
IBM Develops System R
“The 1995 SQL Reunion: People, Projects,
and Politics”, edited by Paul McJones
http://gatekeeper.dec.com/pub/DEC/SRC/te
chnical-notes/SRC-1997-018html/sqlr95.html
“A History and Evaluation of System R”,
Chamberlin et al., CACM 24 (10), 1981
Codd Not Involved with
System R
Codd was NOT part of the development
team and not an author in the CACM paper
Codd tried to form a team to investigate a
language to more closely resemble the
mathematics behind the first order predicate
calculus.
Express notions such as x y f(x, y)
System R Timeline
Phase Zero: 1974-1975, throw-away
prototype
Phase One: 1976-1977, full scale
implementation including other features of a
“real” database
Phase Two: 1978-1979, testing System R
internally and with trusted customers
System R Features
Ability to process SQL commands
Support for transactions
Support for multiple users
Transactions
ACID: Atomicity, Consistency, Isolation,
Durability
Don’t lose data when for example the hard
drive crashes or the power goes out.
Basic idea: Keep a copy of the old data in a
separate place and keep a log of all changes
since the last known good state.
Coordinating Multiple Users
Users want access to the same resource
Potential problems when one user writes
and another user writes or reads at the same
time.
Resource-Sharing Example
Suppose users A and B are attempting to
share resource X at the same time in a
multi-programming environment
Different results depending on which order
A and B are scheduled to run:
A writes X; B reads X; B writes X
B reads X; A writes X; B writes X
B reads X; B writes X; A writes X
Resource-Sharing
Implementation
Locks—reserve a resource for one’s usage,
at least prohibiting someone else from
changing the resource while one is using it.
Many pitfalls—whole books have been
written on potential problems such as
deadlocks, starvation, and thrashing.
Database Implementation
Reference
Title: Transaction Processing: Concepts and
Techniques
Jim Gray and Andreas Reuter
Jim Gray is one of the authors of the System
R paper previously cited.
System R Support for SQL
Need for speed—universal answer back
then was to write a compiler.
Pre-compiled code eliminates time spent reparsing SQL and figuring out the best way
to access the data.
Used B-trees to support indexing.
Compiling SQL Queries
Compiled SQL queries into pre-optimized
machine code
Discovered about 100 machine code
fragments were sufficient.
Not too much impact on users when
interactive queries were compiled and not
interpreted.
B-Trees
Key idea—one node distinguishes many
Many Children for One Node?
Data is pulled into memory from secondary
storage in pages—relatively large pieces
Accessing data from the hard drive is
incredibly slower than from memory—
maybe a million times slower!
Conclusion: Number of pages accessed
needs to be kept as small as possible.
B-Tree Advantages
If each internal node can reference 200
children, in two accesses 200 * 200 =
40,000 branches can be distinguished.
A tree has an intrinsic notion of an ordering
as opposed to a hash.
Maybe data can be arranged so that related
items will be on the same page as one
recently accessed.
Join Example Using B-Trees
Suppose we want to perform a join
Bob
7
Tim
3
5
7
3
Mary
Anne
Sue
Bob
7
Anne
Tim
3
Sue
Using B-Tree Indexing
Use an idea similar to merge sort
Index the first table on its column 2, index
the second table on its column 1, then
merge on identical values
Avoid m * n comparisons
Illustration of Merging
Good
Bad
3
3
3
3
5
7
5
7
7
7
Who Made Winning Possible
Researchers—without Codd’s insight
relational models don’t happen
Programmers—without System R relational
databases and SQL are unproven
Management—without Larry Ellison and
the other founders of Oracle, relational
databases do not win commercially
Or Was It a Debacle?
Researchers lost—no one wants to read
mathematics, and corporate funding for
basic research has been slashed
Programmers lost—ordinary users don’t
want SQL, they want simple Excel-like
spreadsheets. And SQL commoditizes
programmers.
IBM Lost?
IBM funded the basic research in relational
databases, they employed Codd and the
System R programmers
But it was Larry Ellison who made tens of
billions of dollars.
IBM previously had a near-monopoly with
IMS in business databases, but products like
Oracle help IBM competitors like Sun.
How to Fund Basic Research?
“Funding a Revolution: Government
Support for Computing Research”
Copyright 1999 by the National Academy
of Sciences
http://www.nap.edu/readingroom/books/far/
Chapter 6 “The Rise of Relational
Databases”