Transcript Slide 1

Databases and
Statistical Databases
Session 4
Mark Viney
Australian Bureau of Statistics
5 June 2007
Terms
Database
ƒ
A shared collection of logically related data
(and description of this data), designed to
meet the information needs of an
organisation
DataBase Management System (DBMS)
ƒ
A software system that enables users to
define, create and maintain the database and
provides controlled access to this database
Terms (example)
Database
Personnel Database
ƒ Stock database
ƒ Statistical Database
ƒ
DataBase Management System (DBMS)
Oracle
ƒ DB2
ƒ Access
ƒ MySql
ƒ FoxPro
ƒ Firebird
ƒ
Why keep information in
databases?
Accessibility of data
Increased concurrency (reads and writes)
ƒ Sharing data
ƒ
Improved data integrity
Improved security
ƒ
access only to necessary data
Relatable
ƒ
More information from same amount of data
Visible
Why keep information in
databases? (continued)
backup and recovery
Improved productivity
ƒ
common tools /
common processes
Disadvantages of databases
Complexity
Size
Cost of DBMS
Need to upgrade versions
Additional hardware costs
Higher impact of failure
www.cableready.net/newsletter/winter99.html
Databases
used to be solely mainframe
commonly on minicomputers
increasingly available on microcomputers
mostly accessed by SQL
Relational Databases
entities
datatypes
ƒ validation
ƒ
relationships
ƒ
rules for interaction
Database Tables
rows and columns
fixed number of columns
multiple rows (records)
columns are of same datatype
Structured Query Language - SQL
Standard database language that allows:Database creation and relation structures
ƒ Basic data management tasks
ƒ Both simple and complex queries
ƒ
SQL - Data Definition - DDL
allows creation, modification and deletion of
database objects
Creation - CREATE
CREATE TABLE TAB1 (COL1 NUMBER, COL2
NUMBER);
ƒ Modification - ALTER
ALTER TABLE TAB1 ADD COL3 NUMBER;
ƒ Deletion - DROP
DROP TABLE TAB1;
ƒ
Structured Query Language
- SQL Data Manipulation - DML
Standard language to allow access the data
stored in databases
Extraction - SELECT
SELECT COL1,COL2 from TAB1;
ƒ Loading - INSERT
INSERT INTO TAB1 (COL1,COL2) VALUES(7,22);
ƒ Manipulation - UPDATE
UPDATE TAB1 SET COL2 = COL1 + 2;
ƒ Deletion - DELETE
DELETE FROM TAB1 WHERE COL2 = 4;
ƒ
Database Modeling
representation of "real world"
conceptual model
logical model
physical model
Keys
Primary Keys
ƒ
uniquely identifies a record
Foreign Keys
ƒ
pointer to a Primary Key in another table
Indexes
May be applied to columns to allow fast
data access
May be applied to single columns or
several columns
Direct pointers to rows containing specific
values in the indexed column(s)
may be unique or non-unique
May have more than one index per table
Normalisation
A technique for producing a set of
relations with desirable properties, given
the data requirements of an enterprise
Normalisation
- unnormalised
A representation of the data that contains
repeating groups
Normalization
- unnormalised form
Normalisation
- 1st normal form
A relation in which the intersection of
each row and column contains one and
only one value
1NF
Normalization
- 1st normal form
1NF
Normalisation
- 2nd normal form
A relation that is
in first normal form
ƒ every non-primary key attribute is fully
functionally dependent on the primary key
ƒ
2NF
Normalization
- 2nd normal form
2NF
Normalisation
- 3rd normal form
A relation that is
in first and second normal form and
ƒ in which no non-primary key attribute is
transitively dependent on the primary key
ƒ
3NF
Normalization
- 3rd normal form
3NF
Loading data into databases
Bulk loading tool
Data Integrity
Validation
ad-hoc loading
Data Extraction
Assemble data into usable format
Spreadsheet
Timeseries
Data Cube
Publication
Data manipulation
Inside database
ƒ
Sophisticated manipulation language - SQL
Outside database
Timeseries
Seasonal Adjustment
Chain Volume Measures (Constant Price)
ƒ SAS, SPSS
ƒ
Transactional Integrity
the ability to apply rules to the data via
database constraints
ability to group several discrete data
insertion or data manipulation into one
logical data change
In SQL, controlled via COMMIT and
ROLLBACK statements
Transactional Integrity
database constraints
ƒ
values must conform to specific rules
exist in a specific column
belong to a "set"
uniqueness
If a validation against a constraint fails
ƒ
the current transaction fails
Transactions & Recovery
Each transaction is logged by the DBMS
Backups taken periodically
Data can be recovered
to an archived backup
ƒ to a point in time
ƒ
Transaction example
COMMIT;
transaction 1
INSERT INTO TABLE1 (COL1,COL2) VALUES(7,22);
UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22;
DELETE FROM TABLE1 WHERE COL1 = 7;
ROLLBACK;
transaction 2
INSERT INTO TABLE2 (COL3,COL4) VALUES('ABC',11);
UPDATE TABLE2 SET COL3 = 'XYZ';
DELETE FROM TABLE2 WHERE COL3 = 'xyz';
COMMIT;
References
Database Systems a Practical Approach to Design,
Implementation and Management
Thomas Connolly, Carolyn Begg, Anne Strachan
(Addison-Wesley) 1999
cartoons - Randy Glassbergen
Questions?