Data Models - Department of Computer Science
Download
Report
Transcript Data Models - Department of Computer Science
CS6482
Topics on Data Engineering
Qing Li
(E-mail: [email protected])
Dept of Computer Science
City University of Hong Kong
2009 Qing Li
Course Overview
Course Format:
Tutorial classes and exercises which provide
students with supervised problem-solving exercises
Class on Wednesday in Y4701:
6:30 - 7:20pm (tutorials only)
Regular lectures, each lecturing session is about
two-hour
Classes on Wednesday in Y4701:
7:30 - 9:20pm (lectures only)
2009 Qing Li
Suggested Assessment
Continuous assessment -- 70% :
Term project -- 35%
Midterm quiz -- 25%
tutorial exercises -- 10%
Final examination -- 30%
2009 Qing Li
Course Materials
Reference books
R. Elmasri and S. Navathe, Fundamental of Database
Systems, 5th Edition (or later), Addison-Wesley.
M.T. Ozsu and P. Valduriez, Principles of Distributed
Database Systems, 2nd Edition, Prentice-Hall.
M. Stonebraker and J.M. Hellerstein, Readings in
Database Systems, 3rd Edition (or later), Morgan
Kaufmann.
Literature
selected papers from research journals, surveys,
conf. proceedngs, and collection of readings
2009 Qing Li
DB Systems: an Overview
Motivations
Information about a particular enterprise
File-processing Systems
permanent records stored in various files
application programs written to extract & add records
Disadvantages
data redundancy & inconsistency
difficulty in accessing data
data isolation & different data formats
concurrent access anomalies
security problem
integrity problem
2009 Qing Li
DB Systems: an Overview
What is a Database (DB)?
A non-redundant, persistent collection of logically related
records/files that are structured to support various processing
and retrieval needs
Database Management System (DBMS)
A set of software programs for creating, storing, updating, and
accessing the data of a DB.
Software interface
DB
DBMS
2009 Qing Li
DB Systems: an Overview
Difference between
systems
DBMS & other programming
the ability to manage persistent data
primary goal of DBMS: to provide an environment that is
convenient, efficient, and robust to use in retrieving & storing
data
Other DBMS capabilities
data modeling
high-level languages to define, access and manipulate data
transaction managent & concurrency control
access control
resiliency (recovery)
2009 Qing Li
DB Systems: an Overview
Data Abstraction
Abstract view of the data
simplify interaction with the system
hide details of how data is stored and manipulated
Levels of abstraction (“ANSI/SPARC 3 level architecture)
physical/internal level: data structures; how data are actually
stored
conceptual level: schema, what data are actually stored
view/external level: partial schema
2009 Qing Li
Data Abstracion: 3-level architecture
2009 Qing Li
Data Models
What is a data model?
A data model is a collection of conceptual tools for describing
data, data relationships, operations, data semantics and
consistency constraints
the “core” of a database
Catagories of data models
Object-based logical models (conceptual & view levels)
the Entity-Relationship (ER) model -- mid 70’s
the Object-Oriented data models -- late 80’s
the Semantic Data Models -- early/mid 80’s
Record-bsaed logical models (conceptual & view levels)
the Relational model -- early 70’s
the Network and Hierarchical models -- 60’s
2009 Qing Li
Data Models
Catagories of data models (cont’d)
Physical data models (internal level)
Unifying model
Frame memory model
(these will NOT be studied in this course.)
Basic Concepts and Terminologies
instance
- the collection of data (information) stored in the DB at a
particular moment (ie, a snapshot)
scheme/schema
- the overall structure (design) of the DB -- relatively static
2009 Qing Li
Data Models
Basic Concepts and Terminologies (cont’d)
Data Independence
- the ability to modify a schema definition in one level without
affecting a schema in the next higher level
- there are two kinds (a result of the 3-level architecture):
physical data independence
-- the ability to modify the physical schema without
altering the conceptual schema and thus, without
causing the application programs to be rewritten
logical data independence
-- the ability to modify the conceptual schema without
causing the application programs to be rewritten
2009 Qing Li
Data Models
Basic Concepts and Terminologies (cont’d)
Data Definition Language (DDL)
- a language for defining DB schema
- DDL statements compile to a data dictionary which is a file
containing metadata (data about data), eg, descriptions
about the tables
Data Manipulation Language (DML)
- a language that enables users to access and manipulate
data as organised by appropriate data model
- an important subset for retrieving data is called Query
Language
- two types of DML: procedural (specify “what” & “how”) vs.
declarative (just specify “what”)
2009 Qing Li
Data Models
Basic Concepts and Terminologies (cont’d)
Database Administrator (DBA)
- DBA is the person who has central control over the DB
- Main functions of DBA:
schema definition
storage structure and access method definition
schema and physical organization modification
granting of authorization for data access
integrity constraint specification
2009 Qing Li
Data Models
Basic Concepts and Terminologies (cont’d)
Database Users
- Application Programmers
embedded DML in a host language
fourth-generation languages (4GL)
- Interactive Users:
query language
- Specialized Users:
non-traditional applications
-Naive Users:
running application programs
2009 Qing Li
“Reference” DB System Architecture
Naïve user
Interactive
user
Appl. Prog’er
Application
interfaces
Application
programs
DML compiler
Application
programs
object code
DBA
(SQL) query
Query
processor
DB schema
DDL compiler
Database
manager
DBMS
File manager
Data files
2009 Qing Li
disk storage
Data dict.
DB
DB Concepts and Architecture
2009 Qing Li
“Reference” System Architecture
File Manager
allocation of space
operations on files
DB Manager
interface between stored data and application programs/queries
translate conceptual level commands into physical level ones
responsible for
access control
concurrency control
backup & recovery
integrity
2009 Qing Li
“Reference” System Architecture
Query Processor
translate high-level queries into low-level instructions
query optimization
DML (Pre)compiler
translates DML statements embedded in application program
into procedure calls
DDL (Pre)compiler
converts DDL statements to data dictionary items (eg, table
descriptions)
2009 Qing Li
DB Concepts and Architecture
DB System Environment (cont’d)
DB System Utilities
loading
back up
file re-organization
report generation
data dictionary
…
NEXT: Classification of DBMSs!
2009 Qing Li
Classification of DBMSs
Criteria:
Data/Database Model
Number of Users
single-user (eg, PC databases)
multi-user (concurrency control)
Number of sites
centralized (logically, physically)
decentralized (logically, physically)
homogeneity vs. heterogeneity
Other Criterion:
cost
general-purpose vs. specialized DBMSs, ...
2009 Qing Li
Classification of DBMSs
Classification based on Data Model
Hierarchical (late 60’s)
Network (late 60’s)
Relational (70’s)
Entity-Relationship (ER)
Semantic (80’s)
Functional
Object-Oriented (late 80’s/early 90’s)
“Intelligent”
logic-based/deductive
expert/knowledge-based
hypermedia, ...
2009 Qing Li
The Entity-Relationship Model
Preliminaries
Proposed by P. Chen in 1976
One of the earliest “semantic” database model
Mainly a design tool for record-based (ie, hierarchical, network,
relational) databases
Modeling Constructs
Entity -- a distinguishable object with an independent existence
Example: John Chan, CityU, HK Bank, …
Entity Set -- a set of entities of the same type
Example: Student, Employee, Customers, ...
2009 Qing Li
The Entity-Relationship Model
Modeling Constructs (cont’d)
Attribute (Property) -- a piece of information describing an
entity
Example: Name, ID, Address, DoB are attributes of a
student entity
Each attribute can take a value from a domain
Example: Name Character String,
ID Integer, ...
Formally, an attribute A is a function which maps from
an entity set E into a domain D:
A: E D
2009 Qing Li
The Entity-Relationship Model
Modeling Constructs (cont’d)
Relationship -- an association among several entities
Example: Patrick and Eva are friends
Patrick is taking cs3450
Relationship Set -- a set of relationships of the same type
Example:
taking
John
cs3450
mary
cs2578
may
ee4532
Formally, a relationship R is a subset of:
{ (e1, e2, …, ek) | e1 E1, e2 E2, …, ek Ek) }
2009 Qing Li
The Entity-Relationship Model
Modeling Constructs (cont’d)
Relationship vs. Attribute
an attribute A: E D is a “simplified” form of a relationship:
If we allow D to be an Entity Set, then A becomes a
relationship
a relationship can carry attributes
properties of the relationship
Example: Patrick takes cs2450 with a grade of B+
Supplier S supplies item T with a price of P
2009 Qing Li
The Entity-Relationship Model
Modeling Constructs (cont’d)
Entity Set vs. Attribute
What constitutes an attribute, and what constitutes an entity
set?
Example: Employee and Phone
1) employee entity set with attribute phone#
2) empPhn relationship set with entity sets employee
and phone#
No simple answer, depending on
- what we want to model
- meaning of attributes
2009 Qing Li
The Entity-Relationship Model
Integrity Constraints
Mapping Cardinalities
One - to - One (1:1)
a
1
b
2
c
3
One - to - Many (1:M) / Many - to - One (N:1)
a
b
Many - to - Many (M:N)
??
2009 Qing Li
c
1
2
The Entity-Relationship Model
2009 Qing Li
The Entity-Relationship Model
Integrity Constraints (cont’d)
Keys: to distinguish individual entities or relationships
Insertion/Deletion Constraints: => “strong” vs. “weak” entities
ER Diagram
rectangle: Entity Set
diamond: Relationship Set
ellipse: Attribute
others (such as double rectangle for “weak entity set”, double
ellipses for “multi-valued attribute, underlined attribute for
key,…)
2009 Qing Li
SUMMARY OF ER-DIAGRAM NOTATION
Symbol
Meaning
ENTITY TYPE
WEAK ENTITY TYPE
RELATIONSHIP TYPE
IDENTIFYING RELATIONSHIP TYPE
ATTRIBUTE
KEY ATTRIBUTE
MULTIVALUED ATTRIBUTE
COMPOSITE ATTRIBUTE
DERIVED ATTRIBUTE
E1
E1
R
R
2009 Qing Li
E2
R
N
(min,max)
TOTAL PARTICIPATION OF E2 IN R
E2
CARDINALITY RATIO 1:N FOR E1:E2 IN R
E
STRUCTURAL CONSTRAINT (min, max) ON
PARTICIPATION OF E IN R
The Entity-Relationship Model
Integrity Constraints (cont’d)
Keys: to distinguish individual entities or relationships
superkey -- a set of one or more attributes which, taken
together, identify uniquely an entity in an entity set
Example: {student ID, Name} identify a student
candidate key -- minimal set of attributes which allow to
identify uniquely an entity in an entity set
a superkey for which no proper subset is a superkey
Example: student ID identify a student,
but Name is not a candidate key (WHY?)
primary key -- a candidate key chose by the DB designer to
identify an entity in an entity set
2009 Qing Li
The Entity-Relationship Model
ER Diagram
Rectangles:
Ellipses:
Diamonds:
Lines:
Entity Sets
Attributes
Relationship Sets
Attributes to Entity/Relationship Sets
or, Entity Sets to Relationship Sets
m
m
1
2009 Qing Li
R
R
R
n
1
1
The Entity-Relationship Model
Weak Entity Set
an entity set that does NOT have enough attributes to form a
primary/candidate key
Acct. no
account
trans. no
balance
log
Role Indicators
Emp. name
transaction
Multi-value attri.
Phone#
manager
employee
2009 Qing Li
date
worker
Works-for
amount
ER DIAGRAM FOR A BANK
DATABASE
2009 Qing Li
ER DIAGRAM WITH ROLE NAMES
AND MINI-MAX CONSTRAINTS
2009 Qing Li
The Entity-Relationship Model
Transformation of ER diagram to Record-based schema
Standard transformation algorithms are available
Mapping from ER to relational and network schemas are
straightforward
Mapping from ER to hierarchical schema is relatively harder
Eg., for the Many - to - Many (M:N) relationships
ER Data Abstractions
Aggregation (limited form)
Association (Yes)
Classification (Yes)
Recursion (Yes)
2009 Qing Li
The Entity-Relationship Model
Summary
The ER Model is the 1st “semantic” model centered around
relationships, not attributes
It combines successfully the best features of the network and
relational models
simple and easy to understand
2009
The original model falls short of supporting more complex
applications
Recent “Trend” on ER:
building ER database systems / interfaces
applications of ER approaches
extending the original ER to capture more “semantics”
=> Extended ER (EER) Models
Qing Li