Data Models - Department of Computer Science

Download Report

Transcript Data Models - Department of Computer Science

CS6482
Topics on Data Engineering
Qing Li
(E-mail: [email protected])
Dept of Computer Science
City University of Hong Kong
2009 Qing Li
Course Overview

Course Format:
Tutorial classes and exercises which provide
students with supervised problem-solving exercises
Class on Wednesday in Y4701:
6:30 - 7:20pm (tutorials only)
Regular lectures, each lecturing session is about
two-hour
Classes on Wednesday in Y4701:
7:30 - 9:20pm (lectures only)
2009 Qing Li
Suggested Assessment

Continuous assessment -- 70% :
Term project -- 35%
Midterm quiz -- 25%
tutorial exercises -- 10%

Final examination -- 30%
2009 Qing Li
Course Materials

Reference books
R. Elmasri and S. Navathe, Fundamental of Database
Systems, 5th Edition (or later), Addison-Wesley.
M.T. Ozsu and P. Valduriez, Principles of Distributed
Database Systems, 2nd Edition, Prentice-Hall.
M. Stonebraker and J.M. Hellerstein, Readings in
Database Systems, 3rd Edition (or later), Morgan
Kaufmann.

Literature
selected papers from research journals, surveys,
conf. proceedngs, and collection of readings
2009 Qing Li
DB Systems: an Overview

Motivations
 Information about a particular enterprise
 File-processing Systems
permanent records stored in various files
application programs written to extract & add records
 Disadvantages
data redundancy & inconsistency
difficulty in accessing data
data isolation & different data formats
concurrent access anomalies
security problem
integrity problem
2009 Qing Li
DB Systems: an Overview

What is a Database (DB)?
 A non-redundant, persistent collection of logically related
records/files that are structured to support various processing
and retrieval needs

Database Management System (DBMS)
 A set of software programs for creating, storing, updating, and
accessing the data of a DB.
Software interface
DB
DBMS
2009 Qing Li
DB Systems: an Overview

Difference between
systems
DBMS & other programming
 the ability to manage persistent data
 primary goal of DBMS: to provide an environment that is
convenient, efficient, and robust to use in retrieving & storing
data

Other DBMS capabilities
 data modeling
 high-level languages to define, access and manipulate data
 transaction managent & concurrency control
 access control
 resiliency (recovery)
2009 Qing Li
DB Systems: an Overview

Data Abstraction
 Abstract view of the data
simplify interaction with the system
hide details of how data is stored and manipulated
 Levels of abstraction (“ANSI/SPARC 3 level architecture)
physical/internal level: data structures; how data are actually
stored
conceptual level: schema, what data are actually stored
view/external level: partial schema
2009 Qing Li
Data Abstracion: 3-level architecture
2009 Qing Li
Data Models

What is a data model?
 A data model is a collection of conceptual tools for describing
data, data relationships, operations, data semantics and
consistency constraints
 the “core” of a database

Catagories of data models
 Object-based logical models (conceptual & view levels)
the Entity-Relationship (ER) model -- mid 70’s
the Object-Oriented data models -- late 80’s
the Semantic Data Models -- early/mid 80’s
 Record-bsaed logical models (conceptual & view levels)
the Relational model -- early 70’s
the Network and Hierarchical models -- 60’s
2009 Qing Li
Data Models

Catagories of data models (cont’d)
 Physical data models (internal level)
Unifying model
Frame memory model
(these will NOT be studied in this course.)

Basic Concepts and Terminologies
 instance
- the collection of data (information) stored in the DB at a
particular moment (ie, a snapshot)
 scheme/schema
- the overall structure (design) of the DB -- relatively static
2009 Qing Li
Data Models

Basic Concepts and Terminologies (cont’d)
 Data Independence
- the ability to modify a schema definition in one level without
affecting a schema in the next higher level
- there are two kinds (a result of the 3-level architecture):
physical data independence
-- the ability to modify the physical schema without
altering the conceptual schema and thus, without
causing the application programs to be rewritten
logical data independence
-- the ability to modify the conceptual schema without
causing the application programs to be rewritten
2009 Qing Li
Data Models

Basic Concepts and Terminologies (cont’d)
 Data Definition Language (DDL)
- a language for defining DB schema
- DDL statements compile to a data dictionary which is a file
containing metadata (data about data), eg, descriptions
about the tables
 Data Manipulation Language (DML)
- a language that enables users to access and manipulate
data as organised by appropriate data model
- an important subset for retrieving data is called Query
Language
- two types of DML: procedural (specify “what” & “how”) vs.
declarative (just specify “what”)
2009 Qing Li
Data Models

Basic Concepts and Terminologies (cont’d)
 Database Administrator (DBA)
- DBA is the person who has central control over the DB
- Main functions of DBA:
schema definition
storage structure and access method definition
schema and physical organization modification
granting of authorization for data access
integrity constraint specification
2009 Qing Li
Data Models

Basic Concepts and Terminologies (cont’d)
 Database Users
- Application Programmers
embedded DML in a host language
fourth-generation languages (4GL)
- Interactive Users:
query language
- Specialized Users:
non-traditional applications
-Naive Users:
running application programs
2009 Qing Li
“Reference” DB System Architecture
Naïve user
Interactive
user
Appl. Prog’er
Application
interfaces
Application
programs
DML compiler
Application
programs
object code
DBA
(SQL) query
Query
processor
DB schema
DDL compiler
Database
manager
DBMS
File manager
Data files
2009 Qing Li
disk storage
Data dict.
DB
DB Concepts and Architecture
2009 Qing Li
“Reference” System Architecture


File Manager
 allocation of space
 operations on files
DB Manager
 interface between stored data and application programs/queries
 translate conceptual level commands into physical level ones
 responsible for
access control
concurrency control
backup & recovery
integrity
2009 Qing Li
“Reference” System Architecture



Query Processor
 translate high-level queries into low-level instructions
 query optimization
DML (Pre)compiler
 translates DML statements embedded in application program
into procedure calls
DDL (Pre)compiler
 converts DDL statements to data dictionary items (eg, table
descriptions)
2009 Qing Li
DB Concepts and Architecture

DB System Environment (cont’d)
 DB System Utilities
loading
back up
file re-organization
report generation
data dictionary
…
NEXT: Classification of DBMSs!
2009 Qing Li
Classification of DBMSs


Criteria:
 Data/Database Model
 Number of Users
single-user (eg, PC databases)
multi-user (concurrency control)
 Number of sites
centralized (logically, physically)
decentralized (logically, physically)
homogeneity vs. heterogeneity
Other Criterion:
 cost
 general-purpose vs. specialized DBMSs, ...
2009 Qing Li
Classification of DBMSs

Classification based on Data Model
 Hierarchical (late 60’s)
 Network (late 60’s)
 Relational (70’s)
 Entity-Relationship (ER)
 Semantic (80’s)
 Functional
 Object-Oriented (late 80’s/early 90’s)
 “Intelligent”
logic-based/deductive
expert/knowledge-based
hypermedia, ...
2009 Qing Li
The Entity-Relationship Model


Preliminaries
 Proposed by P. Chen in 1976
 One of the earliest “semantic” database model
 Mainly a design tool for record-based (ie, hierarchical, network,
relational) databases
Modeling Constructs
 Entity -- a distinguishable object with an independent existence
Example: John Chan, CityU, HK Bank, …
 Entity Set -- a set of entities of the same type
Example: Student, Employee, Customers, ...
2009 Qing Li
The Entity-Relationship Model

Modeling Constructs (cont’d)
 Attribute (Property) -- a piece of information describing an
entity
Example: Name, ID, Address, DoB are attributes of a
student entity
Each attribute can take a value from a domain
Example: Name  Character String,
ID  Integer, ...
Formally, an attribute A is a function which maps from
an entity set E into a domain D:
A: E  D
2009 Qing Li
The Entity-Relationship Model

Modeling Constructs (cont’d)
 Relationship -- an association among several entities
Example: Patrick and Eva are friends
Patrick is taking cs3450
 Relationship Set -- a set of relationships of the same type
Example:
taking
John
cs3450
mary
cs2578
may
ee4532
Formally, a relationship R is a subset of:
{ (e1, e2, …, ek) | e1  E1, e2  E2, …, ek  Ek) }
2009 Qing Li
The Entity-Relationship Model

Modeling Constructs (cont’d)
 Relationship vs. Attribute
an attribute A: E  D is a “simplified” form of a relationship:
If we allow D to be an Entity Set, then A becomes a
relationship
a relationship can carry attributes
properties of the relationship
Example: Patrick takes cs2450 with a grade of B+
Supplier S supplies item T with a price of P
2009 Qing Li
The Entity-Relationship Model

Modeling Constructs (cont’d)
 Entity Set vs. Attribute
What constitutes an attribute, and what constitutes an entity
set?
Example: Employee and Phone
1) employee entity set with attribute phone#
2) empPhn relationship set with entity sets employee
and phone#
No simple answer, depending on
- what we want to model
- meaning of attributes
2009 Qing Li
The Entity-Relationship Model

Integrity Constraints
 Mapping Cardinalities
One - to - One (1:1)
a
1
b
2
c
3
One - to - Many (1:M) / Many - to - One (N:1)
a
b
Many - to - Many (M:N)
??
2009 Qing Li
c
1
2
The Entity-Relationship Model
2009 Qing Li
The Entity-Relationship Model

Integrity Constraints (cont’d)
 Keys: to distinguish individual entities or relationships

 Insertion/Deletion Constraints: => “strong” vs. “weak” entities
ER Diagram
 rectangle: Entity Set
 diamond: Relationship Set
 ellipse: Attribute
 others (such as double rectangle for “weak entity set”, double
ellipses for “multi-valued attribute, underlined attribute for
key,…)
2009 Qing Li
SUMMARY OF ER-DIAGRAM NOTATION
Symbol
Meaning
ENTITY TYPE
WEAK ENTITY TYPE
RELATIONSHIP TYPE
IDENTIFYING RELATIONSHIP TYPE
ATTRIBUTE
KEY ATTRIBUTE
MULTIVALUED ATTRIBUTE
COMPOSITE ATTRIBUTE
DERIVED ATTRIBUTE
E1
E1
R
R
2009 Qing Li
E2
R
N
(min,max)
TOTAL PARTICIPATION OF E2 IN R
E2
CARDINALITY RATIO 1:N FOR E1:E2 IN R
E
STRUCTURAL CONSTRAINT (min, max) ON
PARTICIPATION OF E IN R
The Entity-Relationship Model

Integrity Constraints (cont’d)
 Keys: to distinguish individual entities or relationships
superkey -- a set of one or more attributes which, taken
together, identify uniquely an entity in an entity set
Example: {student ID, Name} identify a student
candidate key -- minimal set of attributes which allow to
identify uniquely an entity in an entity set
a superkey for which no proper subset is a superkey
Example: student ID identify a student,
but Name is not a candidate key (WHY?)
primary key -- a candidate key chose by the DB designer to
identify an entity in an entity set
2009 Qing Li
The Entity-Relationship Model

ER Diagram
 Rectangles:
 Ellipses:
 Diamonds:
 Lines:
Entity Sets
Attributes
Relationship Sets
Attributes to Entity/Relationship Sets
or, Entity Sets to Relationship Sets
m
m
1
2009 Qing Li
R
R
R
n
1
1
The Entity-Relationship Model

Weak Entity Set
 an entity set that does NOT have enough attributes to form a
primary/candidate key
Acct. no
account

trans. no
balance
log
Role Indicators
Emp. name
transaction
Multi-value attri.
Phone#
manager
employee
2009 Qing Li
date
worker
Works-for
amount
ER DIAGRAM FOR A BANK
DATABASE
2009 Qing Li
ER DIAGRAM WITH ROLE NAMES
AND MINI-MAX CONSTRAINTS
2009 Qing Li
The Entity-Relationship Model

Transformation of ER diagram to Record-based schema
 Standard transformation algorithms are available
 Mapping from ER to relational and network schemas are
straightforward
 Mapping from ER to hierarchical schema is relatively harder
Eg., for the Many - to - Many (M:N) relationships

ER Data Abstractions
 Aggregation (limited form)
 Association (Yes)
 Classification (Yes)
 Recursion (Yes)
2009 Qing Li
The Entity-Relationship Model

Summary
 The ER Model is the 1st “semantic” model centered around
relationships, not attributes
 It combines successfully the best features of the network and
relational models
 simple and easy to understand
2009
 The original model falls short of supporting more complex
applications
 Recent “Trend” on ER:
building ER database systems / interfaces
applications of ER approaches
extending the original ER to capture more “semantics”
=> Extended ER (EER) Models
Qing Li