02 Lecture 1

Download Report

Transcript 02 Lecture 1

Lecture 1
BIS4435 – Database Management for
Decision Support
Dr. Nawaz Khan
School of Computing Science
E-mail: [email protected]
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
BIS4435 – Data Management for Decision Support 1
Coursework
 Discussion Topics
 Each week you will have a discussion topic (DQ)
 DT will be available on Fridays. Reply to the DQ by Tuesdays
 Review and comment on your mates’ responses (at least one).
Lecture 1
 DQ requirements
 Initial response to DQ should be 250 words.
 Review and comments on your mates’ initial response should be
150 words.
 Appropriate references at the end of the text and proper citation of
references through out the text (avoid quotes). Harvard style ref.
 Print your initial post, response to your mates’ initial response and
any comment on your post that is made by your mates. Preserve it
in order to attach it with your CW portfolio.
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
BIS4435 - Online Database Systems
2
More on Coursework: Self Reflection
 Make sure you have one/one and half pages self reflection
on both the coursework.
 It should contain:
Lecture 1







What you have achieved so far
What you find the most difficult thing
What you find the easiest thing
What challenges you face to do the coursework
How have you overcome the challenges
What motivated you
What are the new concepts you come across and how it
helps to achieve the learning outcomes
 What you could have done better
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
3
Introduction
 Copy coursework from someone ??
Lecture 1
 See University Plagiarism policy in students’ handbook
 Don’t forget to put references (follow specific guidelines for
referencing)
 Acknowledge properly, since collusions are not accepted
 Make sure you have cited your references
 Examination: followed by the University regulations
 Contents of the module: 3 main parts
 Theoretical aspects of relational database
 Large Database: Data Warehouse and Data Mining
 Decision Support System
 General diagram of the module: see next slide
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
4
Introduction
DB Back-End
Lecture 1
Theoretical Aspects
of Relational DB
Example Relational DB:
Oracle
Decision support
System
Uncertainty, probability and Linear model,
feed forward network, self-organising
map,
Expert system and knowledge
engineering
Large Data
Management
Data Strategy and Data
Warehousing
Intelligent Database and
Data Mining
Fuzzy logic, genetic algorithm and
hybrid intelligent
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
5
Reading Materials
Lecture 1
 Connolly, T.M., and Begg, C.E., Database Systems: A
Practical Approach to Design, Implementation and
Management, Addison-Wesley, 4th Edition, ISBN:
0321210255
 Belavkin, R., Blundell, B., Cairns, P., Huyck, C.,
Mitchell, I., Stockman, T. (2005). Management Support
Systems, Middlesex University Press, ISBN: 1-89825368-4
 Module Learning Units at OasisPlus
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
6
Introduction
 File-based approach
 Shared file approach
 Database approach: 3-schema architecture
Lecture 1
 External schema
 Conceptual schema
 Internal schema





DBMS components
Characteristics of the database approach
Brief history of database systems and applications
Extending database capabilities for new applications
Reading suggestion:

Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design,
Implementation and Management, Addison-Wesley, 4th Edition, ISBN:
0321210255(chapters 1)
 Global campus materials on OASIS: http://oasis.mdx.ac.uk/ (unit 1)
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
7
Introduction
 File-based approach


Lecture 1

Data is stored in one or more separate computer files
Data is then processed by computer programs applications
Problems:
 Data redundancy
 Data inconsistency
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
8
Introduction
Files
Customer
File
Applications
Stock
File
Customer
Orders
Order
File
Customer
File
Lecture 1
Stock
File
Customer
Invoicing
Files
Applications
Customer
File
Customer
Orders
Stock
File
Customer
Invoicing
Order
File
Order
File
Purchase
Orders
Stock
File
Purchase
Orders
Supplier
File
Supplier
File
Stock
Control
Stock
File
Stock
Control
Order
File
File-based approach
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Shared file approach
9
Introduction
 Shared file approach
Lecture 1




Data (files) is shared between different applications
Data redundancy problem is alleviated
Data inconsistency problem across different versions of the
same file is solved
Other problems:
 Rigid data structure: If applications have to share files, the file
structure that suits one application might not suit another
 Physical data dependency: If the structure of the data file
needs to be changed in some way, this alteration will need to
be reflected in all application programs that use that data file
 No support of concurrency control: While a data file is being
processed by one application, the file will not be available for
other applications or for ad hoc queries
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
10
Introduction
 Database approach

Lecture 1





DataBase Management System (DBMS): a generalpurpose software system that facilitates the processes of
defining, constructing, manipulating, and sharing databases
among various users and applications.
Database: a collection of related data managed by a DBMS
Data: known facts that can be recorded and that have
implicit meaning
Database system = the database + DBMS software
DBMS provides facilities for querying, data security, and
integrity and concurrent control
Database application - set of programs that use DBMS to
perform a particular business function
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
11
Introduction
 Database approach

Lecture 1
Advantage - physical
and logical data
independence achieved by hierarchy of
levels of data
specification
External
Schema 1
External
Schema N
External
Schema 2
Interface between
conceptual schema and
external schemas
Conceptual
Schema
Interface between
conceptual schema
and internal schema
Internal
Schema
Database
physically stored
in files on disks
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
12
Introduction
 Database approach

External schema:
Lecture 1
 Describes database as it is seen by user and applications reflects a simplified model of the world
 Allows applications to see as much of data as they require,
while excluding unrelated data items
 Interfaces with conceptual schema
 May be modified or created without altering physical storage
of data, modification reflected in the interface
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
13
Introduction
 Database approach

Conceptual schema:
Lecture 1
 Describes the universe of interest to the users of the database
system - data required
 Concerned with data rather than storage or access,
concentrates on describing entities, data types, relationships,
user operations, and constraints
 Interfaces with external and internal schema
 Logical data independence is the capability to change the
conceptual schema without having to change external
schemas or application programs
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
14
Introduction
 Database approach

Internal schema:
Lecture 1
 Definition of the way in which data is physically stored
 Interface with conceptual schema identifies how an item in the
conceptual schema is stored and accessed
 Physical data independence is the capability to change the
internal schema without having to change the conceptual
schema. Hence, the external schemas need not to be
changed as well
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
15
Introduction
 File-based approach
 Shared file approach
 Database approach: 3-schema architecture
Lecture 1
 External schema
 Conceptual schema
 Internal schema




DBMS components
Characteristics of the database approach
Brief history of database systems and applications
Extending database capabilities for new applications
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
16
Introduction
 DBMS components
Lecture 1








DBMS engine - central component
User interface - languages & interfaces
Data dictionary - data structure, user & access right, rules
Performance management - query optimisation & DBMS
reorganisation
Data integrity - intra-record, referential integrity &
concurrency
Backup and recovery - log
Application development - CASE tool
Security management - protect and control access to
database & data dictionary
 See the reading suggestion for more details
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
17
Introduction
 Characteristics of the database approach
 Self-describing nature of a database systems
 Insulation between programs and data, and data abstraction
Lecture 1
 Program-data independence + Program-operation
independence = Data abstraction
 A data model is a type of data abstraction
 Support of multiple views of the data
 Sharing of data and multi-user transaction processing
 Other advantages of using the DBMS approach
 Controlling redundancy
 Restricting unauthorized access
 Providing persistent storage for program objects
 Providing storage structures for efficient query processing
 Providing backup and recovery
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
18
Introduction
 Characteristics of the database approach
 Other advantages of using the DBMS approach (cont.)
Lecture 1
 Providing multiple user interfaces
 Representing complex relationships among data
 Enforcing integrity constraints
 Permitting inference and actions using rules: deductive
database systems
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
19
Introduction
 Extending database capabilities for new applications
Lecture 1
 Example applications: storage and retrieval of images,
videos, data mining (large amounts of data need to be stored
and analyzed), spatial databases, time series applications,
…
 More complex data structures than relational representation
 New data types except for the basic numeric and character
string types
 New operations and query languages for new data types
 New storage and retrieval methods
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
20
Distributed Database
Lecture 1
 A distributed database management systems employ a
number of computer workstations at different sites. These
workstations are part of a local network system. The
workstations contain a set of hardware and software that
allow them to be an integral part of this network and the
DDBMS must rely on these network components for its
data exchange. The workstations are needed to be
attached to each other through a communication media
that allow the sites to interact and to carry data
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
21
Introduction
 Summary:
Lecture 1








Introduction to the module, BIS4435File-based approach
Shared file approach
Database approach
DBMS components
Characteristics of the database approach
Brief history of database systems and applications
Distributed Database
Reading suggestion: do not forget !!
 Next week: Relational Data Model
 Reading suggestion:
 [1]: Chapter 1 (The relational data model and relational database
constraints)
 [3]: Unit 1: Global Campus Unit
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
22
Lecture 1
BIS4435 – Data Management for
Decision Support
Lecture 2: Relational Data Model
Dr. Nawaz Khan
School of Computing Science
E-mail: [email protected]
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
BIS4435
23
Introduction
Unit 1: Introduction to the Module: Introduction to the Database
Unit 2: Fundamentals of Relational and Object Model:
Reading Suggestion:
Lecture 1
Unit 1:
Connolly, T.M., and Begg, C.E., Database Systems: A Practical
Approach to Design, Implementation and Management, Addison-Wesley,
4th Edition, ISBN: 0321210255(chapters 1)
Global campus materials on OASIS: http://oasis.mdx.ac.uk/ (unit 1)
Unit 2:
Connolly, T.M., and Begg, C.E., Database Systems: A Practical
Approach to Design, Implementation and Management, Addison-Wesley,
4th Edition, ISBN: 0321210255(chapters 3)
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
BIS4435
24
Relational Data Model
 Outline
Lecture 1
 Basic Concepts: relational data model, relation schema,
domain, tuple, cardinality & degree, database schema, etc.
 Relational Data Model Constraints
 key, primary key & foreign key
 entity integrity constraint
 referential integrity
 Update Operations on Relations
 insert
 deletion
 modification
 Summary
 Q&A
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
25
Relational Data Model
 Basic concepts
Lecture 1
 Relational data model: represents a database in the form of
relations - 2-dimensional table with rows and columns of data. A
database may contain one or more such tables. A relation schema
is used to describe a relation
 Relation schema: R(A1, A2,…, An) is made up of a relation name
R and a list of attributes A1, A2, . . ., An. Each attribute Ai is the
name of a role played by some domain D in the relation schema R.
R is called the name of this relation. The degree of a relation is the
number of attributes n of its relation schema.
 Domain D: D is called the domain of Ai and is denoted by dom(Ai).
It is a set of atomic values and a set of integrity constraints
STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA)
Degree = ??
dom(GPA) = ??
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
26
Relational Data Model
 Basic concepts
Lecture 1




Tuple: row in table, record
Cardinality: number of tuples (rows)
Database schema S = {R1, R2,…, Rm}
A relation (or relation state, relation instance) r of the relation
schema R(A1, A2, . . ., An), also denoted by r(R), is a set of ntuples r = {t1, t2, . . ., tm}. Each n-tuple t is an ordered list of n
values t = <v1, v2, . . ., vn>, where each value vi, i=1..n, is an
element of dom(Ai) or is a special null value. The ith value in tuple
t, which corresponds to the attribute Ai, is referred to as t[Ai]
Relational data model
Database schema
Relation schema
Relation
Tuple
Attribute
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
27
Relational Data Model
Lecture 1
 A relation can be conveniently represented by a table, as the
above example shows
 The columns of the tabular relation represent attributes
 Each attribute has a distinct name, and is always referenced by
that name, never by its position
 Each row of the table represents a tuple. The ordering of the
tuples is immaterial and all tuples must be distinct
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
28
Relational Data Model
 Outline
 Basic Concepts: relational data model, relation schema,
domain, tuple, cardinality & degree, database schema, etc.
 Relational Data Model Constraints
Lecture 1
 key, primary key & foreign key
 entity integrity constraint
 referential integrity
 Update Operations on Relations
 insert
 deletion
 modification
 Summary
 Q&A
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
29
Relational Data Model
 Relational Data Model Constraints
 Key, primary key & foreign key
 Key: A key K of a relation R is a subset of the attributes of R
which has the following time-independent properties:
Lecture 1
– Unique identification: The value of K must uniquely identify each
tuple in R
– Non-redundancy: No attribute in K can be discarded without
destroying property 1
 Primary Key: There may be more than one set of attributes
which satisfy both properties. Each of them is called a
candidate key. In this case only one candidate key must be
chosen as the key called primary key. Primary key (PK)
cannot contain null value
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
30
Relational Data Model
 Relational Data Model Constraints
 Key, primary key & foreign key
 Single primary key:
Lecture 1
TASK (TASK_NAME, START_DATE,EXPECTED_COMP_DATE,
COMP_DATE, EMPNO)
 Combined primary key: If we want to store in the task table, the
details of tasks for a number of different projects, TASK_NAME
is no longer a unique identifier. The solution is to use a further
attribute to provide a unique identifier for each task.
TASK (PROJECT_NAME, TASK_NAME, START_DATE,
EXPECTED_COMP_DATE, COMP_DATE, EMPNO)
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
31
Relational Data Model
Example: A relational database schema that we call COMPANY = {EMPLOYEE,
DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, DEPENDENT}
Lecture 1
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
32
Relational Data Model
 Relational Data Model Constraints
 Key, primary key & foreign key
Lecture 1
 Foreign Key (FK): A set of attributes is a FK of R1 if and only if it is the
primary key of another relation schema R2
 FK represents the relationship between R1 and R2
 FK can contain null value, but we may avoid this by replacing with a
flag
 Example:
TASKS (TASK_NAME, START_DATE, EXP_COMP_DATE, COMP_DATE,
EMPNO, PROJECT_NAME*)
PROJECT (PROJECT_NAME, START_DATE, EXP_COMP_DATE,
COMP_DATE, PROJECT_LEADER)
 To form the link between the two tables, we place the primary key of
the PROJECT table into the TASK table. Through the use of
PROJECT_NAME as a FK, we can see for any given task, the project
to which it belongs
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
33
Relational Data Model
 Relational Data Model Constraints
Lecture 1
 Entity Integrity Constraint: all entities in a database must be
identified by the primary key. That is why NO primary key value can
be null
 Referential Integrity Constraint: The value of a foreign key must
be meaningful, which means that a tuple in one relation that refers
to another relation must refer to an existing tuple in that relation
 Example: JobList and Company tables (see notes for labs 02, 03)
JobList
JobID CompanyID JobTitle Salary ….
Company
CompanyID CompanyName Address ...
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
34
Relational Data Model
 Outline
 Basic Concepts: relational data model, relation schema,
domain, tuple, cardinality & degree, database schema, etc.
 Relational Data Model Constraints
Lecture 1
 key, primary key & foreign key
 entity integrity constraint
 referential integrity
 Update Operations on Relations
 insert
 deletion
 modification
 Summary
 Q&A
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
35
Relational Data Model
 Update Operations on Relations
 Insertion: to insert a new tuple t into a relation R. When
inserting a new tuple, it should make sure that the database
constraints are not violated:
Lecture 1
 The value of an attribute should be of the correct data type (i.e.
from the appropriate domain).
 The value of a prime attribute (i.e. the key attribute) must not be
null
 The key value(s) must not be the same as that of an existing
tuple in the same relation
 The value of a foreign key (if any) must refer to an existing
tuple in the corresponding relation (labs !!)
 Two options if the constraints are violated:
 Reject the operation
 Rectify the operation
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
36
Relational Data Model
 Update Operations on Relations
 Deletion: to remove an existing tuple t from a relation R.
When deleting a tuple, the following constraints must not be
violated:
Lecture 1
 The tuple must already exist in the database
 The referential integrity constraint is not violated
 Four options if the constraints are violated:
 Reject the operation
 Rectify the operation to find the existing tuple concerned
 Also remove the tuples that reference the tuple being deleted
 Modify the referencing attribute values (i.e. values of foreign
keys) that cause the violation
 Modification: to change values of some attributes of an
existing tuple t in a relation R
 Test all above operations and constraints during the labs
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
37
ACTIVITY
Lecture 1
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
BIS4229 – Industrial Data Management
Technology
38
Introduction
 Summary:
 Relational Data Model
Lecture 1
 Basic concepts
 Constraints
 Update operations on relations
 Reading suggestion: do not forget !! (other relational operations)
 Next week:
 ER Modeling:
 Reading suggestion:
– [1]: Chapter 3, 7
– [3]: Unit 2
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
39
Relational Data Model
 Outline
 Basic Concepts: relational data model, relation schema,
domain, tuple, cardinality & degree, database schema, etc.
 Relational Data Model Constraints
Lecture 1
 key, primary key & foreign key
 entity integrity constraint
 referential integrity
 Update Operations on Relations
 insert
 deletion
 modification
 Summary
 Q&A
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
40