CS263Lecture1

Download Report

Transcript CS263Lecture1

CS263 Information Modeling
Course tutor: Tony Browne [email protected]
Some diagrams in these notes are small when printed 6 to a page,
however all notes are available on departmental webpage:
http://www.computing.surrey.ac.uk/courses/cs263
• Now: Download and install: MySQL 5 from http://www.mySql.com
• DBDesigner from http://www.fabforce.net
• Database System by P. Rob, Thompson Publishers, ISBN 1-41883593-5 (out this year)
• A guide to MySQL by P. J. Pratt, Thompson Publishing, 1-41883635-4
Definitions
• Data: Meaningful facts, text, graphics, images, sound,
video segments.
• Database: An organized collection of logically related data.
• Information: Data processed to be useful in decision
making.
• Metadata: Data that describes data.
Evolution of
database systems
• 1960’s – file processing systems: punch cards, paper tape,
magnetic tape – sequential access and batch processing
• 1970s - Hierarchical and Network (legacy, some still used
today) – difficulties = hard to access data (navigational
record-at-a-time procedures), limited data independence,
no widely accepted theoretical model (unlike relational)
• 1980s - Relational – E.F. Codd and others developed this
theoretically well-founded model – all data represented in
the form of tables – Oracle, DB2, Ingres
• 1990s - Object-oriented, but some organisations have to
handle large amounts of both structured and unstructured
data, so Object-relational databases developed.
Evolution of database systems
• 2000 and beyond – multi –tier, client-server,
distributed environments, web-based, contentaddressable storage, data mining
Evolution of database technologies
Three file processing systems
Disadvantages of file processing
systems
Still widely used today (e.g. for backup) but have the following problems:
• Program-Data Dependence (see Fig.) – file descriptions are stored
within each application that accesses file, so change to file structure
requires changes to all file descriptions in all programs.
• Data Redundancy (Duplication of data) – wasteful, inconsistent, loss
of metadata integrity (same data has different names in different files,
or same name may be used for different data in different files).
• Limited Data Sharing – users have little opportunity to share data
outside their own applications.
• Lengthy Development Times – little opportunity to re-use previous
development efforts.
• Excessive Program Maintenance – factors above combine to create
heavy maintenance load
Advantages of the
database approach
•
•
•
•
•
•
•
Minimal Data Redundancy/Improved Consistency
Improved Data Sharing
Increased Application Development Productivity
Improved Data Quality (Constraints)
Better Data Accessibility/ Responsiveness
Security, Backup/Recovery, Concurrency
DISADVANTAGES? Specialized Personnel required,
Management Cost and Complexity, Organizational
Conflict
Pine valley furniture company
• We will be using this fictitious company as a case study
• The company’s first step was to create an Enterprise Data
Model (a model of the organisation that provides valuable
information about how the organisation functions, as well
as important constraints – it stresses the integration of data
and processes by focussing on entities, relationships and
business rules)
• Enterprise data model is a graphical model that shows the
high-level entities and the associations among them (see
Fig. 3): An ENTITY-RELATIONSHIP DIAGRAM.
Pine valley furniture company
• The three associations (relationships) are shown by lines
connecting the entities
• Each CUSTOMER places any number of ORDERS
(conversely, each ORDER is placed by exactly one
CUSTOMER)
• Each ORDER contains any number of ORDER LINEs
(conversely, each ORDER LINE is contained in exactly
one ORDER)
• Each PRODUCT has any number of ORDER LINES
(conversely, each ORDER LINE is for exactly one
PRODUCT)
Figure 3
Tables
• Relational databases views all data in the form of tables
• Each column represents an attribute, e.g. the Customer
table has attributes ID, Name, Address..etc.
• Relationships between entities are represented by values
stored in columns of the corresponding tables, e.g.
Customer_ID is an attribute of both the Customer table and
the Order table. This makes it easy to link an order with its
customer.
SQL: Structured Query Language (some
for historical reasons call it ‘Sequel’)
• The standard and most common language for
relational database management systems
• An SQL-based relational database application
involves a user interface, a set of tables in the
database, and a RDBMS with an SQL capability
• Within the RDBMS SQL will be used to create the
tables, translate user requests, maintain the data
dictionary and system catalog, update an maintain
the tables, establish security, and carry out backup
and recovery procedures
The SQL environment
.
• Each database will have a set of schemas associated with a
catalog.
• Schema = the structure that contains descriptions of objects
created by a user (base tables, views, constraints)
• FOLLOWING IS A CRASH-COURSE IN SQL – DON’T
PANIC! – WE WILL BE GOING OVER THESE
CONCEPTS IN THE FOLLOWING WEEKS
3 types of SQL commands
• 1. Data Definition Language (DDL) commands that define a database, including creating, altering,
and dropping tables and establishing constraints
• 2. Data Manipulation Language (DML)
commands - that maintain and query a database
• 3. Data Control Language (DCL) commands - that
control a database, including administering
privileges and committing data
DDL, DML, DCL, and the database development process
SQL Data types
• CHAR(n) – fixed-length character data, n characters long
Maximum length = 2000 bytes
• VARCHAR2(n) – variable length character data,
maximum 4000 bytes
• LONG – variable-length character data, up to 4GB.
Maximum 1 per table
• NUMBER(p,q) – general purpose numeric data type
• INTEGER(p) – signed integer, p digits wide
• FLOAT(p) – floating point in scientific notation with p
binary digits precision
• DATE – fixed-length date/time in dd-mm-yy form
SQL database definition
• Each of the previous create commands may be reversed
using a DROP command, so DROP TABLE will destroy a
table (including its definitions and contents)
• Usually only the table creator may delete the table.
• ALTER TABLE may be used to change the definition of an
existing table
Creating tables
• Once data model is designed and normalised, the columns
needed for each table can be defined using the CREATE
TABLE command. The syntax for this is shown in the
following Fig. These are the seven steps to follow:
• 1. Identify the appropriate datatype for each, including
length and precision
• 2. Identify those columns that should accept null values.
Column controls that indicate a column cannot be null are
established when a table is created and are enforced for
every update of the table
Creating tables
• 3. Identify those columns that need to be UNIQUE - when
the data in that column must have a different value (no
duplicates) for each row of data within that table. Where a
column or set of columns is designated as UNIQUE, this is
a candidate key. Only one candidate key may be
designated as a PRIMARY KEY
• 4. Identify all primary key-foreign key mates. Foreign keys
can be established immediately or later by altering the
table. The parent table in such a parent-child relationship
should be created first. The column constraint
REFERENCES can be used to enforce referential integrity
Creating tables
• 5. Determine values to be inserted into any columns for
which a DEFAULT value is desired - can be used to define
a value that is automatically inserted when no value is
provided during data entry.
• 6. Identify any columns for which domain specifications
may be stated that are more constrained than those
established by data type. Using CHECK it is possible to
establish validation rules for values to be inserted into the
database
• 7. Create the table and any desired indexes using the
CREATE TABLE and CREATE INDEX statements
Table creation
Table creation
• The following Fig. Shows SQL database definition
commands
• Here some additional column constraints are shown, and
primary and foreign keys are given names
• For example, the CUSTOMER table’s primary key is
CUSTOMER_ID
• The primary key constraint is named CUSTOMER_PK,
without the constraint name a system identifier would be
assigned automatically and the identifier would be difficult
to read
SQL database definition commands for Pine Valley Furniture
STEP 1
Defining
attributes and
their data types
STEP2
Non-nullable
specifications
Note: primary
keys should not
be null
STEP 3
Identifying
primary keys
This is a composite
primary key
STEP 4
Identifying
foreign keys and
establishing
relationships
STEPS 5 and 6
Default values
and domain
constraints
STEP 7
Overall table
definitions