History of database

Download Report

Transcript History of database

Database History
Why we want to use them
Copyright © 1998-2016 by Curt Hill
Six Generations of Data
Management
•
•
•
•
•
•
•
Manual – Prehistory
Punch cards – 1900
Programmed Unit record – 1955
Online database – 1965
Client Server, Relational – 1980
Multimedia – 1995
All of these still continue
Copyright © 1998-2016 by Curt Hill
Manual
• Writing has existed for millennia
• Kings used writing to inventory their
goods, record their laws
• Sumerian tablets date from 2000 BC or
before
Copyright © 1998-2016 by Curt Hill
Punch cards
• Originally used by Jacquard to
program silk weaving machines
– Not really data management
• Herman Hollerith used to record
census data in 1890
• Created a suite of machines that would
punch, sort, print and tabulate from
cards
– Programmed by rewiring control panels
• Known as unit record or electronic
accounting machines
Copyright © 1998-2016 by Curt Hill
Machines
Copyright © 1998-2016 by Curt Hill
Programmed Unit record
• Stored program computers change the
face of data management
• Cards still used for input
• Tapes store the data much more
densely than cards
• Programming removes the limits on
what sort of calculations or
transformations may be done on the
data
• Produced a file-oriented record
processing approach
Copyright © 1998-2016 by Curt Hill
Consider an example
• A college has many files that describe
their system
–
–
–
–
–
–
–
Faculty with department and specialty
Staff with position
Payroll
Catalog of courses
Grades
Students
Among many others
• We will look at payroll and grades as
an example
Copyright © 1998-2016 by Curt Hill
Payroll File Fields
•
•
•
•
•
Name
Address
Salary/wage
Earnings Year to Date
Among very many more
Copyright © 1998-2016 by Curt Hill
Payroll File
Name
Addr
Salary
YTD
Martin, Barb
94 Ridge
Jones, Al
121 Colton $30,000 $21,402
Johnson, Tom 604 46th
$32,000 $25,425
$35,000 $26,952
Copyright © 1998-2016 by Curt Hill
Grades Fields
• Course
– Including section
•
•
•
•
Student name
Term
Letter grade
Instructor
Copyright © 1998-2016 by Curt Hill
Grades File
Crs.
Name
Term
Grade
Instr
Math 103 Smith, Bill
Fall 03
A
Johnson
CIS 385
Smith, Bill
Fall 03
C
Hill
Hist 101
Johns, Sarah Fall 03
B
Rupert
Copyright © 1998-2016 by Curt Hill
Background Vocabulary
• Fields
– Collection of related characters
• Records
– Collection of related fields
• Files
– Collection of related records
• Database
– Collection of related files
Copyright © 1998-2016 by Curt Hill
How to use
• Actions we might want on each file:
–
–
–
–
Create
Update (add, remove, change records)
Sort
Generate any of several reports
• Each action for each file would be a program
for an overworked programming staff
– Typically a COBOL program
– Eight programs, or sections of programs, for two
files
• These are typically done in a batch
environment
Copyright © 1998-2016 by Curt Hill
Online database
• Many things do not work well in batch
mode:
– Travel reservations need up to the second
information
– The database is born
• Started out as disk based unit record,
but that is not the best organization for
this type of application
• Originally developed into two models:
– Hierarchical and network
Copyright © 1998-2016 by Curt Hill
Client Server, Relational
• EF Codd suggests the relational model
and he and others develop a
substantial theoretical base
• Queries may now be simple and short
– Needs to know a schema, but not
complete organization
– This allows transmission of a simple query
– Client server computing is born
Copyright © 1998-2016 by Curt Hill
Relational Database
• The key
– All the programs previously described are
about the same – every update is nearly
the same
– All that changes is the underlying file
• The solution
– Describe the file in a general way
– Generate a program that handles the file
based on the description
Copyright © 19982016 by Curt Hill
How to describe a file
• A file is a collection of records
• Each record is a collection of fields
– Typically only one type of record in a
file
• Each field is described by a:
– Name
– Type
• For example numeric, string, boolean etc.
– Length
• Booleans have a predefined length, others
require specification
Copyright © 19982016 by Curt Hill
SQL
• Structured Query Language
• Has become the “standard” for queries
• A relational database does not have to
accept SQL
– Unless it wants to be commercially viable
• SQL is mostly declarative but with
some procedural features
– Declarative – what is wanted
– Procedural – how to get it
Copyright © 1998-2016 by Curt Hill
The Files
Name
Addr
Salary
YTD
Martin, Barb
94 Ridge
$32,000
$25,425
Jones, Al
121 Colton $30,000
$21,402
Johnson, Tom 604 46th
Crs.
Name
$35,000
Term
$26,952
Grade
Instr
Math 103 Smith, Bill
Fall 03
A
Johnson
CIS 385
Smith, Bill
Fall 03
C
Hill
Hist 101
Johns, Sarah Fall 03
B
Rupert
Copyright © 1998-2016 by Curt Hill
What is wrong with the original
example?
• Redundancy in faculty description
– Space is wasted
– Discrepancies may occur between grades
and payroll
• Some reports need to access multiple
files
– Eg. Transcript generation
– Complicates the programming issue
Copyright © 19982016 by Curt Hill
Advantages (1 of 3)
• Data independence
– Application program no longer need
some or all of the files
– Do not know or care how data is stored,
aka abstraction
– Simplifies application development
• Efficient access
– The DBMS employs sophisticated
access techniques seldom used by
normal programmers
Copyright © 19982016 by Curt Hill
Advantages (2 of 3)
• Integrity constraints
– The DBMS may check data in a way
seldom done in normal file processing
– Eg. Account validity
• Security
– A DBMS may enforce requirements on
who can access the data and in what
way
Copyright © 19982016 by Curt Hill
Advantages (3 of 3)
• Administration
– Minimize redundancy
– Manage sharing of the data
– Optimize for the enterprise, not a small
group
– Easier to backup the data
• Concurrent access
– Manages the simultaneous update
problem
Copyright © 19982016 by Curt Hill
Disadvantages
• A DBMS is:
–
–
–
–
Complex
Expensive
Bulky
Simple file access is much quicker and
less expensive
– In the era of inexpensive hardware
these are not as significant as they
once were
• The view a DBMS provides may not
be helpful to a particular application
Copyright © 19982016 by Curt Hill
The Champion
• The relational model was undisputed
champion for a long time
• It is still the most widely used database
model
• There are now many alternatives
• Most fill a niche for which the relational
model is not very well suited
– There are many such niches now
Copyright © 1998-2016 by Curt Hill
NoSQL
• All manner of new types of databases
have now appeared
• These are the so called NoSQL
databases
– SQL is the universal query language
– NoSQL may mean no SQL or Not Only SQL
• This is a field that is not finished
developing
Copyright © 1998-2016 by Curt Hill
Finally
• Enterprises have learned that the
database is foundational to business
intelligence
• It is imperative to be able to store and
retrieve the data in both a simple way
and also in way not previously
anticipated
Copyright © 1998-2016 by Curt Hill