CSCI 242 Advanced Database


Introduction
» Misunderstood topics
˃ Normalization
˃ Database design
˃ Performance
˃ SQL
» Advanced topics
˃ Time in databases
˃ Translucency
˃ Performance
» Realistic experience
˃ Realistic team size
˃ Accountability
˃ Emerging requirements
» Current developments
˃ Big data
˃ NoSQL
˃ Cloud computing
» Early applications:
˃ Programs wrote information into files on disk
˃ Programs included lots of information about the files
+ Where they were stored
+ Type of storage
+ Exact format of each record
˃ Changing programs is, in general, very hard
+ Programming is exacting work
+ Testing takes lots of time
+ People change jobs
˃ Early programs were very hard to change
+ If data moved, programs had to change
+ If data changed, programs had to change
+ Events tend to force changes in data
» It was discovered that many programs fit a paradigm:
˃ They stored some data
˃ Then later they changed it
˃ Hard problems of changing the structure of the data remained, however
» Many useful applications could be built on this notion of a “stored data base”
˃ Database systems were developed to help manage the data
˃ They provided uniform backup and recovery
˃ Later, they even made changing the data easier
» Earlier database systems: hierarchies and networks as data models
˃ Data could be moved around easily
˃ Relationships represented as physical connections
˃ Structure of relationships embedded in applications
˃ When the structure changed, programs had to change
» Relational: independent tables as the data model
˃ Relationships “represented” by equal values of data
˃ Structure of relationships invisible to applications
˃ Relationships change as data values change
˃ Much greater ease of change
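A minimal Python sketch of relationships represented by equal values: the two relations below are plain sets of tuples holding made-up sample data, and the hypothetical function works_in relates them only by comparing department numbers. No physical link is stored anywhere, so changing a single data value is enough to change the relationship.

```python
# Hypothetical sample data; each relation is a set of tuples.
# employee: (employee_no, job, salary, department_no)
employee = {
    (101, "analyst", 70000, 10),
    (102, "clerk", 40000, 20),
    (103, "engineer", 90000, 10),
}
# department: (department_no, department_name, location)
department = {
    (10, "Research", "Boston"),
    (20, "Sales", "Chicago"),
}

def works_in(employees, departments):
    """Pair each employee number with a department name by matching values."""
    return {
        (emp_no, dept_name)
        for (emp_no, _job, _salary, emp_dept) in employees
        for (dept_no, dept_name, _loc) in departments
        if emp_dept == dept_no  # the relationship is nothing but equal values
    }

print(sorted(works_in(employee, department)))
# [(101, 'Research'), (102, 'Sales'), (103, 'Research')]

# Hypothetical update: move employee 101 to department 20 by changing one value.
employee = {(e, j, s, 20) if e == 101 else (e, j, s, d) for (e, j, s, d) in employee}
print(sorted(works_in(employee, department)))
# [(101, 'Sales'), (102, 'Sales'), (103, 'Research')]
```

Contrast this with the hierarchical and network models, where the same move would mean rewiring a stored physical connection that application code knows about.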
E. F. Codd
» Inventor of the relational approach
» Received the Turing Award
» Mathematician at IBM Research
» Was looking for a true formalism for data
» Relational Database: a set of relations
» Relation: a set of ordered pairs
» Ordered pair: a pair of values such that interchanging the two values changes the meaning
˃ That is, <a,b> = <c,d> iff a = c and b = d; in particular, <a,b> = <b,a> only when a = b
» Specifying a relation by enumeration:
R={<a,b>,<c,d>,<e,f>}
˃ This is a relation consisting of three ordered pairs.
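One way to see these definitions in code, assuming Python tuples stand in for ordered pairs and a Python set stands in for the relation. The helper same_pair is hypothetical; it simply spells out componentwise equality.

```python
# A relation specified by enumeration, modeled as a Python set of 2-tuples.
R = {("a", "b"), ("c", "d"), ("e", "f")}

def same_pair(p, q):
    """<a,b> = <c,d> iff a = c and b = d (componentwise equality)."""
    return p[0] == q[0] and p[1] == q[1]

print(len(R))                              # 3
print(same_pair(("a", "b"), ("b", "a")))   # False: swapping distinct values changes the pair
print(("x", "x")[::-1] == ("x", "x"))      # True: swapping equal values is harmless
print(("a", "b") in R, ("b", "a") in R)    # True False
```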
» Ordered pairs can model more than two values through nesting:
˃ <a, b, c> == <<a,b>, c>
˃ <a, b, c, d> == <<<a,b>, c>, d>
˃ And so on
» This extends the ordered pair so that it can model a tuple of any length
» Now a relation starts to look like our notion of a file, with each tuple corresponding to our notion of a record
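The nesting can be sketched in Python with two hypothetical helpers: nest builds a left-nested pair from a flat tuple, and flatten recovers the flat tuple. This sketch assumes the component values are not themselves tuples, because flatten relies on that to tell nesting apart from data.

```python
from functools import reduce

def nest(values):
    """Build a left-nested pair: (a, b, c, d) -> (((a, b), c), d)."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a pair")
    # Start from the first pair, then wrap it with each remaining value.
    return reduce(lambda pair, v: (pair, v), values[2:], (values[0], values[1]))

def flatten(pair):
    """Undo nest(): (((a, b), c), d) -> (a, b, c, d)."""
    left, right = pair
    if isinstance(left, tuple):  # assumes data values are never tuples
        return flatten(left) + (right,)
    return (left, right)

print(nest(("a", "b", "c")))           # (('a', 'b'), 'c')
print(nest(("a", "b", "c", "d")))      # ((('a', 'b'), 'c'), 'd')
print(flatten(nest((1, 2, 3, 4, 5))))  # (1, 2, 3, 4, 5)
```

Every value produced by nest is a genuine two-element pair, which is the point of the slide: pairs alone are enough to represent tuples of any length.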
» A relation is a set of ordered pairs (modeling a set of tuples), so:
» 1. Exchanging the order of values within a tuple changes the meaning of the tuple
» 2. Exchanging the order of tuples within a relation does not change the meaning of the relation
» 3. Duplicate tuples are not allowed
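These three properties can be checked with Python frozensets of tuples; the sample data is invented. One caveat: a Python set silently collapses a duplicate rather than rejecting it. SQL tables are not strict sets either: a table may hold duplicate rows unless a key or uniqueness constraint prevents it, and queries return duplicates unless SELECT DISTINCT is used.

```python
# Relations as frozensets of tuples (hypothetical sample data).
r1 = frozenset({(1, "Alice"), (2, "Bob")})
r2 = frozenset({(2, "Bob"), (1, "Alice")})                 # same tuples, other order
r3 = frozenset([(1, "Alice"), (1, "Alice"), (2, "Bob")])   # one tuple listed twice

print(r1 == r2)                      # True: order of tuples does not matter
print(len(r3))                       # 2: the duplicate collapses into one tuple
print((1, "Alice") == ("Alice", 1))  # False: order of values within a tuple matters
```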
» Now we build a database as a collection of independent relations, each describing instances of a single entity type
» For example:
˃ Employee (employee#, job, salary, department)
˃ Department (department#, departmentname, location)
» We need a way to insert data into the database, retrieve data from the database, and change values that are stored in the database
» We define a data language that can be used from any programming language to do that
» The data language (SQL) has a lot of power and can save a lot of programming work if you understand it
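As one illustration of using the data language from a host language, here is a sketch with Python's built-in sqlite3 module. Table and column names are adapted from the Employee and Department examples (with # renamed to _no for portability), and the rows are invented sample data. The SQL statements cover the three tasks above: inserting data, changing stored values, and retrieving data.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE department (department_no INTEGER PRIMARY KEY, "
            "department_name TEXT, location TEXT)")
cur.execute("CREATE TABLE employee (employee_no INTEGER PRIMARY KEY, job TEXT, "
            "salary INTEGER, department_no INTEGER)")

# Insert data (hypothetical sample values).
cur.executemany("INSERT INTO department VALUES (?, ?, ?)",
                [(10, "Research", "Boston"), (20, "Sales", "Chicago")])
cur.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                [(101, "analyst", 70000, 10), (102, "clerk", 40000, 20)])

# Change stored values: a 5% raise for department 10 (integer arithmetic).
cur.execute("UPDATE employee SET salary = salary * 105 / 100 "
            "WHERE department_no = ?", (10,))

# Retrieve data: the join relates the tables only through equal department_no values.
rows = cur.execute(
    "SELECT e.employee_no, e.salary, d.department_name "
    "FROM employee AS e JOIN department AS d ON e.department_no = d.department_no "
    "ORDER BY e.employee_no").fetchall()
print(rows)  # [(101, 73500, 'Research'), (102, 40000, 'Sales')]
conn.close()
```

Note that the program never says where or how the rows are stored; it states only what to insert, change, and retrieve, which is the independence from storage details that early file-based programs lacked.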
» Now we’ll talk about course mechanics