Introduction to Database

Download Report

Transcript Introduction to Database

The Database Environment
Lecture 1
The database environment
• To be able to function, an organisation needs information,
e.g. list of books in a library, customer details in a retail
business, specifications of cars and their components for a
car manufacturer
• Information may be defined as data represented in a
meaningful form. Same data shown in different ways will
provide different information to different viewers
• A major requirement of any computer system is to store
and retrieve data in a way that is meaningful to the end
user – so the core of any Information System is data, which
is to be transformed into information through data
modelling
Definitions
• Data: Meaningful facts, text, graphics, images, sound,
video segments.
• Database: An organized collection of logically related data.
• Information: Data processed to be useful in decision
making.
• Metadata: Data that describes data.
Data in Context
Summarized data
Disadvantages of file processing
systems
Still widely used today (e.g. for backup) but have the following problems:
• Program-Data Dependence (see Fig.) – file descriptions are stored
within each application that accesses file, so change to file structure
requires changes to all file descriptions in all programs.
• Data Redundancy (Duplication of data) – wasteful, inconsistent, loss
of metadata integrity (same data has different names in different files,
or same name may be used for different data in different files).
• Limited Data Sharing – users have little opportunity to share data
outside their own applications.
• Lengthy Development Times – little opportunity to re-use previous
development efforts.
• Excessive Program Maintenance – factors above combine to create
heavy maintenance load
Three file processing systems
Advantages of the
database approach
• Minimal Data Redundancy/Improved Consistency
• Data Integration
• Multiple Relationships
Database management system
• A DBMS is a data storage and retrieval system which
permits data to be stored non-redundantly while making it
appear to the user as if the data is well-integrated.
Advantages of the
database approach
•
•
•
•
•
•
•
Data Independence/Reduced Maintenance
Improved Data Sharing
Increased Application Development Productivity
Enforcement of Standards
Improved Data Quality (Constraints)
Better Data Accessibility/ Responsiveness
Security, Backup/Recovery, Concurrency
Costs and risks of the
database approach
•
•
•
•
•
New, Specialized Personnel required
Installation Management Cost and Complexity
Conversion Costs
Need for Explicit Backup and Recovery
Organizational Conflict
Pine valley furniture company
• We will be using this fictitious company as a case study
• The company’s first step was to create an Enterprise Data
Model (a model of the organisation that provides valuable
information about how the organisation functions, as well
as important constraints – it stresses the integration of data
and processes by focussing on entities, relationships and
business rules)
• First make a list of all the high-level activities that support
business activities
Pine valley furniture company
• Enterprise data model is a graphical model that shows the
high-level entities and the associations among them (see
Fig.): An ENTITY-RELATIONSHIP DIAGRAM.
• The three associations (relationships) are shown by lines
connecting the entities
• Each CUSTOMER places any number of ORDERS
(conversely, each ORDER is placed by exactly one
CUSTOMER)
• Each ORDER contains any number of ORDER LINEs
(conversely, each ORDER LINE is contained in exactly
one ORDER)
Pine valley furniture company
• Each PRODUCT has any number of ORDER LINES
(conversely, each ORDER LINE is for exactly one
PRODUCT)
Segment from enterprise data model
Figure 3
Tables
• Relational databases views all data in the form of tables
• Following Figs. a and b shows four tables, Customer,
Product, Order and OrderLine (the 4 entities shown in the
previous ER diagram).
• Each column represents an attribute, e.g. the Customer
table has attributes ID, Name, Address..etc.
• Relationships between entities are represented by values
stored in columns of the corresponding tables, e.g.
Customer_ID is an attribute of both the Customer table and
the Order table. This makes it easy to link an order with its
customer.
Fig. (a) Order and Order_Line tables
Fig. (b) Product and Customer tables
The range of
database applications
• Personal Database PCs/PDAs, Cellphones – OK in special situations
where need to share data amongst users is unlikely to arise
• Workgroup Database. Designed to support collaboration in a small
team (less than 25 people)
• Department Database typically larger than a workgroup (25-100
people) and more diverse range of functions – e.g. personnel database
• Enterprise Database – scope of the whole organisation. May be more
than one, as a single database for a large organisation may be
impractical due to performance difficulties for large databases, diverse
needs of user groups, and difficulty of achieving common definition of
data (metadata) for all users.
Typical data
from a
personal
computer
database
Workgroup database with local area network
An example of departmental database (Personnel
Department)
An enterprise
data
warehouse
Components of the
database environment
• CASE Tools – automated tools used to design databases
and applications
• Repository – generalised knowledge for all data
definitions, relationships, screen/report formats – an
extended set of metadata for managing databases and other
components of the information system
• Database Management System (DBMS) – software
(sometimes specialised hardware) used to define, create,
maintain and provide controlled access to the database and
the repository.
• Database – an organised collection of logically related data
occurrences
Components of the database
environment
• Application Programs – software used to create and
maintain the database and provide information to users.
• User Interface – languages, menus etc by which users
interact with other system components
• Data Administrators – people responsible for overall
information resources of an organisation.
• Systems analysts/programmers and end Users – people
who add, delete and modify the database and who get
information from it.
Components
of the
database
environment
Evolution of
database systems
• 1960’s – file processing systems: punch cards, paper tape,
magnetic tape – sequential access and batch processing
• 1970s - Hierarchical and Network (legacy, some still used
today) – difficulties = hard to access data (navigational
record-at-a-time procedures), limited data independence,
no widely accepted theoretical model (unlike relational)
• 1980s - Relational – E.F. Codd and others developed this
theoretically well-founded model – all data represented in
the form of tables – Oracle, DB2, Ingres
• 1990s - Object-oriented, but some organisations have to
handle large amounts of both structured and unstructured
data, so Object-relational databases developed.
Evolution of database systems
• 2000 and beyond – multi –tier, client-server,
distributed environments, web-based, contentaddressable storage, data mining
Evolution of database technologies
SQL Is:
• Structured Query Language (some for historical
reasons call it ‘Sequel’
• The standard and most common language for
relational database management systems
• An SQL-based relational database application
involves a user interface, a set of tables in the
database, and a RDBMS with an SQL capability
• Within the RDBMS SQL will be used to create the
tables, translate user requests, maintain the data
dictionary and system catalog, update an maintain
the tables, establish security, and carry out backup
and recovery procedures
The SQL environment
• Following Fig. Shows a schematic of an SQL environment
• Each database is contained in a catalog, which describes any
object that is part of the database (regardless of which user
created that object). Most companies keep at least two
versions of any database they are using. PROD_C is the live
production version, which captures real business data and
this must be very tightly controlled and monitored. DEV_C
is the development version used when the database is being
built and continues to serve as a development tool where
enhancements and maintenance efforts can be tested before
application to the production database.
• Each database will have a set of schemas associated with a
catalog.
• Schema = the structure that contains descriptions of objects
created by a user (base tables, views, constraints)
A simplified schematic of a typical SQL environment
3 types of SQL commands
• 1. Data Definition Language (DDL) commands - that
define a database, including creating, altering, and
dropping tables and establishing constraints
• 2. Data Manipulation Language (DML) commands - that
maintain and query a database
• 3. Data Control Language (DCL) commands - that control
a database, including administering privileges and
committing data
DDL, DML, DCL, and the database development process
SQL Data types
• CHAR(n) – fixed-length character data, n characters long
Maximum length = 2000 bytes
• VARCHAR2(n) – variable length character data,
maximum 4000 bytes
• LONG – variable-length character data, up to 4GB.
Maximum 1 per table
• NUMBER(p,q) – general purpose numeric data type
• INTEGER(p) – signed integer, p digits wide
• FLOAT(p) – floating point in scientific notation with p
binary digits precision
• DATE – fixed-length date/time in dd-mm-yy form
Syntax used in these notes
• Capitals = command syntax (may not be required by the
RDBMS)
• Lowercase = values that must be supplied by user
• Brackets = enclose optional syntax
• Ellipses (...) = indicate that the accompanying syntactic
clause may be repeated as necessary
• Each SQL command ends with a semicolon ‘;’
• In interactive mode, when the user presses the RETURN
key, the SQL command will execute
SQL Database Definition
• Data Definition Language (DDL) has these
major CREATE statements:
– CREATE SCHEMA – defines a portion of the database
owned by a particular user. Schemas are dependent on a
catalog and contain schema objects, including base
tables and views, domains, constraints, assertions,
character sets, collations etc.
– CREATE TABLE – defines a new table and its
columns. The table may be a base table or a derived
table. Tables are dependent on a schema. Derived tables
are created by executing a query that uses one or more
tables or views.
SQL Database Definition
• CREATE VIEW – defines a logical table from one or more
tables or views. There are limitations on updating data
through a view. Where views can be updated, those
changes can be transferred to the underlying base tables
originally referenced to create the view.
SQL database definition
• Each of the previous create commands may be reversed
using a DROP command, so DROP TABLE will destroy a
table (including its definitions, contents and any schemas
or views associated with it)
• Usually only the table creator may delete the table
• DROP SCHEMA and DROP VIEW will also delete the
named schema or view.
• ALTER TABLE may be used to change the definition of an
existing table
Creating tables
• Once data model is designed and normalised, the columns
needed for each table can be defined using the CREATE
TABLE command. The syntax for this is shown in the
following Fig. These are the seven steps to follow:
• 1. Identify the appropriate datatype for each , including
length and precision
• 2. Identify those columns that should accept null values.
Column controls that indicate a column cannot be null are
established when a table is created and are enforced for
every update of the table
•
Creating tables
• 3. Identify those columns that need to be UNQUE - when
the data in that column must have a different value (no
duplicates) for each row of data within that table. Where a
column or set of columns is designated as UNIQUE, this is
a candidate key. Only one candidate key may be
designated as a PRIMARY KEY
• 4. Identify all primary key-foreign key mates. Foreign keys
can be established immediately or later by altering the
table. The parent table in such a parent-child relationship
should be created first. The column constraint
REFERENCES can be used to enforce referential integrity
Creating tables
• 5. Determine values to be inserted into any columns for
which a DEFAULT value is desired - can be used to define
a value that is automatically inserted when no value is
provided during data entry.
• 6. Identify any columns for which domain specifications
may be stated that are more constrained than those
established by data type. Using CHECK it is possible to
establish validation rules for values to be inserted into the
database
• 7. Create the table and any desired indexes using the
CREATE TABLE and CREATE INDEX statements
Table creation
General syntax for CREATE TABLE
Table creation
• The following Fig. Shows SQL database definition
commands
• Here some additional column constraints are shown, and
primary and foreign keys are given names
• For example, the CUSTOMER table’s primary key is
CUSTOMER_ID
• The primary key constraint is named CUSTOMER_PK,
without the constraint name a system identifier would be
assigned automatically and the identifier would be difficult
to read
SQL database definition commands for Pine Valley Furniture
STEP 1
Defining
attributes and
their data types
STEP2
Non-nullable
specifications
Note: primary
keys should not
be null
STEP 3
Identifying
primary keys
This is a composite
primary key
STEP 4
Identifying
foreign keys and
establishing
relationships
STEPS 5 and 6
Default values
and domain
constraints
STEP 7
Overall table
definitions