Transcript Slide 1
Databases and
Statistical Databases
Session 4
Mark Viney
Australian Bureau of Statistics
5 June 2007
Terms
Database
ƒ
A shared collection of logically related data
(and description of this data), designed to
meet the information needs of an
organisation
DataBase Management System (DBMS)
ƒ
A software system that enables users to
define, create and maintain the database and
provides controlled access to this database
Terms (example)
Database
Personnel Database
ƒ Stock database
ƒ Statistical Database
ƒ
DataBase Management System (DBMS)
Oracle
ƒ DB2
ƒ Access
ƒ MySql
ƒ FoxPro
ƒ Firebird
ƒ
Why keep information in
databases?
Accessibility of data
Increased concurrency (reads and writes)
ƒ Sharing data
ƒ
Improved data integrity
Improved security
ƒ
access only to necessary data
Relatable
ƒ
More information from same amount of data
Visible
Why keep information in
databases? (continued)
backup and recovery
Improved productivity
ƒ
common tools /
common processes
Disadvantages of databases
Complexity
Size
Cost of DBMS
Need to upgrade versions
Additional hardware costs
Higher impact of failure
www.cableready.net/newsletter/winter99.html
Databases
used to be solely mainframe
commonly on minicomputers
increasingly available on microcomputers
mostly accessed by SQL
Relational Databases
entities
datatypes
ƒ validation
ƒ
relationships
ƒ
rules for interaction
Database Tables
rows and columns
fixed number of columns
multiple rows (records)
columns are of same datatype
Structured Query Language - SQL
Standard database language that allows:Database creation and relation structures
ƒ Basic data management tasks
ƒ Both simple and complex queries
ƒ
SQL - Data Definition - DDL
allows creation, modification and deletion of
database objects
Creation - CREATE
CREATE TABLE TAB1 (COL1 NUMBER, COL2
NUMBER);
ƒ Modification - ALTER
ALTER TABLE TAB1 ADD COL3 NUMBER;
ƒ Deletion - DROP
DROP TABLE TAB1;
ƒ
Structured Query Language
- SQL Data Manipulation - DML
Standard language to allow access the data
stored in databases
Extraction - SELECT
SELECT COL1,COL2 from TAB1;
ƒ Loading - INSERT
INSERT INTO TAB1 (COL1,COL2) VALUES(7,22);
ƒ Manipulation - UPDATE
UPDATE TAB1 SET COL2 = COL1 + 2;
ƒ Deletion - DELETE
DELETE FROM TAB1 WHERE COL2 = 4;
ƒ
Database Modeling
representation of "real world"
conceptual model
logical model
physical model
Keys
Primary Keys
ƒ
uniquely identifies a record
Foreign Keys
ƒ
pointer to a Primary Key in another table
Indexes
May be applied to columns to allow fast
data access
May be applied to single columns or
several columns
Direct pointers to rows containing specific
values in the indexed column(s)
may be unique or non-unique
May have more than one index per table
Normalisation
A technique for producing a set of
relations with desirable properties, given
the data requirements of an enterprise
Normalisation
- unnormalised
A representation of the data that contains
repeating groups
Normalization
- unnormalised form
Normalisation
- 1st normal form
A relation in which the intersection of
each row and column contains one and
only one value
1NF
Normalization
- 1st normal form
1NF
Normalisation
- 2nd normal form
A relation that is
in first normal form
ƒ every non-primary key attribute is fully
functionally dependent on the primary key
ƒ
2NF
Normalization
- 2nd normal form
2NF
Normalisation
- 3rd normal form
A relation that is
in first and second normal form and
ƒ in which no non-primary key attribute is
transitively dependent on the primary key
ƒ
3NF
Normalization
- 3rd normal form
3NF
Loading data into databases
Bulk loading tool
Data Integrity
Validation
ad-hoc loading
Data Extraction
Assemble data into usable format
Spreadsheet
Timeseries
Data Cube
Publication
Data manipulation
Inside database
ƒ
Sophisticated manipulation language - SQL
Outside database
Timeseries
Seasonal Adjustment
Chain Volume Measures (Constant Price)
ƒ SAS, SPSS
ƒ
Transactional Integrity
the ability to apply rules to the data via
database constraints
ability to group several discrete data
insertion or data manipulation into one
logical data change
In SQL, controlled via COMMIT and
ROLLBACK statements
Transactional Integrity
database constraints
ƒ
values must conform to specific rules
exist in a specific column
belong to a "set"
uniqueness
If a validation against a constraint fails
ƒ
the current transaction fails
Transactions & Recovery
Each transaction is logged by the DBMS
Backups taken periodically
Data can be recovered
to an archived backup
ƒ to a point in time
ƒ
Transaction example
COMMIT;
transaction 1
INSERT INTO TABLE1 (COL1,COL2) VALUES(7,22);
UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22;
DELETE FROM TABLE1 WHERE COL1 = 7;
ROLLBACK;
transaction 2
INSERT INTO TABLE2 (COL3,COL4) VALUES('ABC',11);
UPDATE TABLE2 SET COL3 = 'XYZ';
DELETE FROM TABLE2 WHERE COL3 = 'xyz';
COMMIT;
References
Database Systems a Practical Approach to Design,
Implementation and Management
Thomas Connolly, Carolyn Begg, Anne Strachan
(Addison-Wesley) 1999
cartoons - Randy Glassbergen
Questions?