Lecture2 - The University of Texas at Dallas
Data and Applications Security
Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Lecture #2
Supporting Technologies: Data Management
January 13, 2005
Objective of the Unit
This unit will provide an overview of the concepts and
developments in data management
Reference: Data Management Systems: Evolution and
Interoperation, Thuraisingham, CRC Press, 1997
Outline of the Unit
Concepts in database systems
Types of database systems
Distributed Data Management
Heterogeneous database integration
Federated data management
Concepts in Database Systems
Definition of a Database system
Early systems
Metadata
Architectural Issues
- Schema, Functional
DBMS Design Issues
Other Issues
- Database design, Administration
Database System
Consists of database, hardware, Database Management System
(DBMS), and users
Database is the repository for persistent data
Hardware consists of secondary storage volumes, processors, and
main memory
DBMS handles all users’ access to the database
Users include application programmers, end users, and the
Database Administrator (DBA)
Need: reduce redundancy, avoid inconsistency, share data, enforce
standards, apply security restrictions, maintain integrity, and
balance conflicting requirements
We have used the definition of a database management system
given in C. J. Date’s Book (Addison Wesley, 1990)
An Example Database System
[Figure: Users and Application Programs access the Database through the
Database Management System. Adapted from C. J. Date, Addison Wesley, 1990]
Metadata
Metadata describes the data in the database
- Example:
Database D consists of a relation EMP with
attributes SS#, Name, and Salary
Metadatabase stores the metadata
- Could be physically stored with the database
Metadatabase may also store constraints and administrative
information
Metadata is also referred to as the schema or data dictionary
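As a minimal sketch of this idea, the snippet below uses Python's built-in sqlite3 module to create the EMP relation from the example and then read its description back from the system catalog. The choice of SQLite, the column types, and the helper name describe_relation are assumptions made for illustration; the lecture does not prescribe a particular DBMS.

```python
import sqlite3


def describe_relation(conn, relation):
    """Return (column name, declared type) pairs for a relation.

    PRAGMA table_info is SQLite's way of exposing catalog metadata;
    other systems use different catalog tables or views.
    """
    rows = conn.execute(f"PRAGMA table_info({relation})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


conn = sqlite3.connect(":memory:")
# Column types are assumed; the slide only names the attributes.
conn.execute(
    'CREATE TABLE EMP ("SS#" TEXT PRIMARY KEY, Name TEXT, Salary INTEGER)'
)

emp_metadata = describe_relation(conn, "EMP")
print(emp_metadata)
```

Here the metadata is physically stored with the database itself, which is one of the options mentioned above.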
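Three-level Schema Architecture: Details
[Figure: three-level schema architecture, top to bottom]
External level
- Users A1, A2, A3: External Model A, described by External Schema A
- Users B1, B2: External Model B, described by External Schema B
External/Conceptual Mapping A and External/Conceptual Mapping B
Conceptual level
- Conceptual Model, described by the Conceptual Schema
Conceptual/Internal Mapping
Internal level
- Internal Model (Stored Database), described by the Internal Schema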
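One common way to picture the external and conceptual levels is with SQL views. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, treats the EMP relation as the conceptual schema, and introduces a hypothetical view named EMP_PUBLIC as an external schema that hides Salary. The two sample rows are illustrative data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full EMP relation (column types are assumed).
conn.execute(
    'CREATE TABLE EMP ("SS#" TEXT PRIMARY KEY, Name TEXT, Salary INTEGER)'
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("1", "John", 20), ("2", "Paul", 30)],
)

# External level: a view that hides Salary from one group of users.
# The view definition plays the role of an external/conceptual mapping.
conn.execute('CREATE VIEW EMP_PUBLIC AS SELECT "SS#", Name FROM EMP')

public_rows = conn.execute(
    'SELECT * FROM EMP_PUBLIC ORDER BY "SS#"'
).fetchall()
print(public_rows)
```

The internal level (how rows are laid out on disk) is handled entirely by the storage engine and does not appear in the code, which is the point of the conceptual/internal mapping.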
Functional Architecture
Data Management
- User Interface Manager
- Schema (Data Dictionary) Manager (metadata)
- Query Manager
- Security/Integrity Manager
- Transaction Manager
Storage Management
- File Manager
- Disk Manager
DBMS Design Issues
Query Processing
- Optimization techniques
Transaction Management
- Techniques for concurrency control and recovery
Metadata Management
- Techniques for querying and updating the metadatabase
Security/Integrity Maintenance
- Techniques for processing integrity constraints and enforcing
access control rules
Storage management
- Access methods and index strategies for efficient access to the
database
Other Issues
Database design
- Generally a two-step process
Semantic data model to capture the entities of the
application and the relationships between the entities
Generate the conceptual schema; theory of normal forms for
relational databases
- Research on object-oriented approaches for database design
Database Administration
- Creating and deleting databases; backup and recovery,
enforcing policies, auditing, etc.
Types of Database Systems
Relational Database Systems
Object Database Systems
Deductive Database Systems
Other
- Real-time, Secure, Parallel, Scientific, Temporal, Wireless,
Functional, Entity-Relationship, Sensor/Stream Database
Systems, etc.
Relational Database: Informal Overview
A collection of tables, also called relations
Each table has one or more columns, also called attributes
Each table has zero or more rows, also called tuples
Elements of a row take values from a pool of legal values
The values of one or more columns in a row uniquely identify
the row. These columns form an identifier (also called a key)
One identifier is designated as the unique identifier (also called
the primary key)
Relational databases are queried using a language called SQL
(Structured Query Language)
Relational Database: Example
Relation S:
S#    SNAME    STATUS    CITY
S1    Smith    20        London
S2    Jones    10        Paris
S3    Blake    30        Paris
S4    Clark    20        London
S5    Adams    30        Athens
Relation P:
P#    PNAME    COLOR    WEIGHT    CITY
P1    Nut      Red      12        London
P2    Bolt     Green    17        Paris
P3    Screw    Blue     17        Rome
P4    Screw    Red      14        London
P5    Cam      Blue     12        Paris
P6    Cog      Red      19        London
Relation SP:
S#    P#    QTY
S1    P1    300
S1    P2    200
S1    P3    400
S1    P4    200
S1    P5    100
S1    P6    100
S2    P1    300
S2    P2    400
S3    P2    200
S4    P2    200
S4    P4    300
S4    P5    400
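To make the example concrete, the following sketch loads relations S, P, and SP into an in-memory SQLite database and runs two SQL queries against them. The use of SQLite via Python's sqlite3 module, the column types, and the key declarations are assumptions for illustration; the slide gives only the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Key choices (S#, P#, and the pair S#/P#) are assumed, not stated on the slide.
conn.executescript('''
CREATE TABLE S  ("S#" TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
CREATE TABLE P  ("P#" TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT,
                 WEIGHT INTEGER, CITY TEXT);
CREATE TABLE SP ("S#" TEXT, "P#" TEXT, QTY INTEGER,
                 PRIMARY KEY ("S#", "P#"));
''')
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
    ("P5", "Cam", "Blue", 12, "Paris"), ("P6", "Cog", "Red", 19, "London"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S1", "P4", 200), ("S1", "P5", 100), ("S1", "P6", 100),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
    ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
])

# Names of the suppliers that supply part P2 (a join of S and SP).
suppliers_of_p2 = [row[0] for row in conn.execute('''
    SELECT S.SNAME
    FROM S JOIN SP ON S."S#" = SP."S#"
    WHERE SP."P#" = 'P2'
    ORDER BY S."S#"
''')]

# Total quantity shipped by supplier S1 (an aggregate over SP).
total_qty_s1 = conn.execute(
    'SELECT SUM(QTY) FROM SP WHERE "S#" = ?', ("S1",)
).fetchone()[0]

print(suppliers_of_p2, total_qty_s1)
```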
Concepts in Object Database Systems
Objects: every entity is an object
- Example: Book, Film, Employee, Car
Class
- Objects with common attributes are grouped into a class
Attributes or Instance Variables
- Properties of an object class inherited by the object instances
Class Hierarchy
- Parent-Child class hierarchy
Composite objects
- Example: a Book object made up of sections, paragraphs, etc.
Methods
- Functions associated with a class
Example Class Hierarchy
Document Class
- Attributes: ID, Name, Author, Publisher
- Method1: Print-doc-att(ID)
- Method2: Print-doc(ID)
- Instances: D1, D2
Book Subclass
- Attribute: # of Chapters
- Instance: B1
Journal Subclass
- Attribute: Volume #
- Instance: J1
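The class hierarchy can be sketched in an object-oriented language. The Python below is a rough illustration only: the method name print_doc_att mirrors Method1 from the slide, while the constructor signatures and all attribute values are placeholders invented for the example.

```python
class Document:
    """Parent class: attributes shared by every document."""

    def __init__(self, doc_id, name, author, publisher):
        self.doc_id = doc_id
        self.name = name
        self.author = author
        self.publisher = publisher

    def print_doc_att(self):
        """Counterpart of Method1, Print-doc-att(ID): describe the attributes."""
        return f"{self.doc_id}: {self.name} by {self.author} ({self.publisher})"


class Book(Document):
    """Subclass that adds the number of chapters."""

    def __init__(self, doc_id, name, author, publisher, num_chapters):
        super().__init__(doc_id, name, author, publisher)
        self.num_chapters = num_chapters


class Journal(Document):
    """Subclass that adds the volume number."""

    def __init__(self, doc_id, name, author, publisher, volume):
        super().__init__(doc_id, name, author, publisher)
        self.volume = volume


# Placeholder instances named after B1 and J1 on the slide.
b1 = Book("B1", "Example Book", "A. Author", "Example Press", num_chapters=10)
j1 = Journal("J1", "Example Journal", "B. Editor", "Example Press", volume=3)

# Both subclasses inherit the attributes and the method of Document.
print(b1.print_doc_att())
print(j1.print_doc_att())
```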
Example Composite Object
[Figure: a Composite Document Object made up of a Section 1 Object and a
Section 2 Object, with Paragraph 1 and Paragraph 2 Objects at the level
below the sections]
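A composite object can be modeled as an object whose attributes are themselves objects. The sketch below assumes Python dataclasses and hypothetical class names (CompositeDocument, Section, Paragraph). Which section owns the two paragraphs is not recoverable from the slide, so placing both under Section 1 is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    text: str


@dataclass
class Section:
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class CompositeDocument:
    title: str
    sections: List[Section] = field(default_factory=list)

    def paragraph_count(self):
        """Count paragraphs by walking the component objects."""
        return sum(len(section.paragraphs) for section in self.sections)


# Placement of the paragraphs under Section 1 is assumed for illustration.
doc = CompositeDocument(
    title="Composite Document",
    sections=[
        Section("Section 1", [Paragraph("Paragraph 1"), Paragraph("Paragraph 2")]),
        Section("Section 2"),
    ],
)
print(doc.paragraph_count())
```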
Deductive Database Systems
Database systems augmented with inference engines to deduce new
data from existing data and rules
Example
- Rule: parent of a parent is a grandparent
- Data: John is Jane’s parent; Jane is Robert’s parent
- From the above, infer John is Robert’s grandparent
Loose and tight coupling architectures between the database system
and inference engine
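The grandparent example can be sketched in a few lines of Python. This is a toy illustration of applying one rule to stored facts, not a description of how any particular inference engine works; the function name infer_grandparents and the tuple representation of facts are assumptions.

```python
def infer_grandparents(parent_facts):
    """Apply the rule: parent(X, Y) and parent(Y, Z) imply grandparent(X, Z).

    parent_facts is a set of (parent, child) pairs.
    """
    return {
        (x, z)
        for (x, y1) in parent_facts
        for (y2, z) in parent_facts
        if y1 == y2
    }


# Data from the slide: John is Jane's parent; Jane is Robert's parent.
parent_facts = {("John", "Jane"), ("Jane", "Robert")}

grandparents = infer_grandparents(parent_facts)
print(grandparents)  # {('John', 'Robert')}
```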
A Definition of a Distributed Database System
A collection of database systems connected via a network
The software that is responsible for interconnection is a Distributed
Database Management System (DDBMS)
Each DBMS executes local applications and should be involved in at
least one global application (Ceri and Pelagatti)
Homogeneous environment
Architecture
[Figure: Sites 1, 2, and 3 connected by a Communication Network]
Site 1: Database 1, DBMS 1, Distributed Processor 1
Site 2: Database 2, DBMS 2, Distributed Processor 2
Site 3: Database 3, DBMS 3, Distributed Processor 3
Distributed Processor
- Network Interface
- Distributed Query/Update Processor
- Distributed Metadata Management
- Distributed Transaction Manager
- Integrity/Security Manager
- Local DBMS Interface
Data Distribution
SITE 1
EMP1
SS#    Name      Salary    D#
1      John      20        10
2      Paul      30        20
3      James     40        20
4      Jill      50        20
5      Mary      60        10
6      Jane      70        20
DEPT1
D#     Dname      MGR
10     C. Sci.    Jane
30     English    David
40     French     Peter
SITE 2
EMP2
SS#    Name      Salary    D#
9      Mathew    70        50
7      David     80        30
8      Peter     90        40
DEPT2
D#     Dname      MGR
50     Math       John
20     Physics    Paul
Distributed Database Functions
Distributed Query Processing
- Optimization techniques across the databases
Distributed Transaction Management
- Techniques for distributed concurrency control and
recovery
Distributed Metadata Management
- Techniques for managing the distributed metadata
Distributed Security/Integrity Maintenance
- Techniques for processing integrity constraints and
enforcing access control rules across the databases
Query Processing Example (Concluded)
[Figure: three sites connected by a Network, each with a DQP
(Distributed Query Processor) and a local DBMS]
Site 1 (DBMS 1): EMP1 (20)
Site 2 (DBMS 2): EMP2 (30), DEPT2 (20)
Site 3 (DBMS 3): EMP1 (20), EMP3 (50), DEPT3 (30)
Query at site 1: Join EMP and DEPT on D#
- Move EMP2 to site 3; Merge EMP1, EMP2, EMP3 to form EMP
- Move DEPT2 to site 3; Merge DEPT2 and DEPT3 to form DEPT
- Join EMP and DEPT; Move result to site 1
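The merge-then-join strategy can be sketched as follows. This is a simplified illustration, not a distributed query processor: it uses two sites instead of three, borrows the EMP1, EMP2, DEPT1, and DEPT2 rows from the Data Distribution slide as sample data, and models "moving" a fragment as passing a Python list. The helper names rows, merge, and join_on are hypothetical.

```python
def rows(columns, tuples):
    """Build a list of dict rows from column names and value tuples."""
    return [dict(zip(columns, t)) for t in tuples]


EMP_COLS = ("SS#", "Name", "Salary", "D#")
DEPT_COLS = ("D#", "Dname", "MGR")

site1 = {
    "EMP": rows(EMP_COLS, [(1, "John", 20, 10), (2, "Paul", 30, 20),
                           (3, "James", 40, 20), (4, "Jill", 50, 20),
                           (5, "Mary", 60, 10), (6, "Jane", 70, 20)]),
    "DEPT": rows(DEPT_COLS, [(10, "C. Sci.", "Jane"), (30, "English", "David"),
                             (40, "French", "Peter")]),
}
site2 = {
    "EMP": rows(EMP_COLS, [(9, "Mathew", 70, 50), (7, "David", 80, 30),
                           (8, "Peter", 90, 40)]),
    "DEPT": rows(DEPT_COLS, [(50, "Math", "John"), (20, "Physics", "Paul")]),
}


def merge(*fragments):
    """Union horizontal fragments into one relation (assumes no duplicates)."""
    merged = []
    for fragment in fragments:
        merged.extend(fragment)
    return merged


def join_on(left, right, column):
    """Equi-join two relations on a shared column using a hash index."""
    index = {}
    for r in right:
        index.setdefault(r[column], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[column], [])]


# Step 1: move the fragments to one site and merge them.
emp = merge(site1["EMP"], site2["EMP"])
dept = merge(site1["DEPT"], site2["DEPT"])

# Step 2: join at that site; the result would then be sent to the query site.
result = join_on(emp, dept, "D#")
print(len(result), result[0])
```

A real optimizer would also weigh fragment sizes and network costs when choosing the site at which to perform the join; the sketch ignores that decision.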
Transaction Processing Example
Issues: Concurrency control, Recovery, Data Replication
Site 1 (Coordinator): DTM (Distributed Transaction Manager),
responsible for executing the distributed transaction Tj
Site 2 (Participant): Subtransaction Tj2
Site 3 (Participant): Subtransaction Tj3
Site 4 (Participant): Subtransaction Tj4
Two-phase commit:
- Coordinator queries participants whether they are ready to commit
- If all participants agree, then coordinator sends request for
the participants to commit
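The two phases can be sketched as below. This is a minimal in-memory illustration under stated assumptions: the Participant class and two_phase_commit function are hypothetical names, the abort branch reflects the standard protocol behavior when some participant is not ready, and logging, timeouts, and failure recovery are left out entirely.

```python
class Participant:
    """A site executing one subtransaction of the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether the local subtransaction can commit."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1: ask every participant whether it is ready to commit.
    votes = [p.prepare() for p in participants]

    # Phase 2: send the global decision to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


# Sites 2, 3, and 4 act as participants, as on the slide.
sites = [Participant("Site 2"), Participant("Site 3"), Participant("Site 4")]
outcome_ok = two_phase_commit(sites)

# Same run, but one participant is not ready to commit.
sites_with_failure = [Participant("Site 2"),
                      Participant("Site 3", can_commit=False),
                      Participant("Site 4")]
outcome_fail = two_phase_commit(sites_with_failure)

print(outcome_ok, outcome_fail)
```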
Interoperability of Heterogeneous Database
Systems
[Figure: three database systems connected by a Network]
Database System A (Relational)
Database System B (Object-Oriented)
Database System C (Legacy)
Transparent access to heterogeneous databases for both users and
application programs; query and transaction processing
Technical Issues on the Interoperability of
Heterogeneous Database Systems
Heterogeneity with respect to data models, schema, query
processing, query languages, transaction management, semantics,
integrity, and security policies
Interoperability based on client-server architectures
Federated database management
- Collection of cooperating, autonomous, and possibly
heterogeneous component database systems, each belonging
to one or more federations
Different Data Models
[Figure: four nodes connected by a Network, each with its own database]
Node A: Relational Model
Node B: Network Model
Node C: Hierarchical Model
Node D: Object-Oriented Model
Developments: Tools for interoperability; commercial products
Challenges: Global data model
Schema Integration and Transformation: An
Approach
[Figure: layered schemas, top to bottom]
External Schemas I, II, and III
Global Schema: Integrate the generic schemas
Generic schemas describing the relational, network, hierarchical, and
object-oriented databases
Schemas describing the relational, network, hierarchical, and
object-oriented databases
Challenges: Selecting appropriate generic representation;
maintaining consistency during transformations;
schema evolution
Semantic Heterogeneity
Semantic heterogeneity occurs when there is a disagreement about
the meaning or interpretation of the same data
Example: Object O
- Node A database: Object O interpreted as a passenger ship
- Node B database: Object O interpreted as a submarine
Challenges: Standard definitions; Repositories
Federated Database Management
[Figure: Database Systems A, B, and C participating in Federations F1
and F2]
Cooperating database systems yet maintaining some degree of autonomy
Autonomy
[Figure: Components A, B, and C]
- Component A receives a local request and a request from a component;
component A honors the local request first
- Component A does not communicate with component C
- Component B: communication through federation
Challenges: Adapt techniques to handle autonomy, e.g., transaction
processing, schema integration; transition research to products
Schema Integration and Transformation in a
Federated Environment
[Figure: schema levels in a federated environment, top to bottom]
External Schemas: 1.1, 1.2, 2.1, 2.2
Federated Schemas: for FDS - 1, for FDS - 2
Export Schemas: for Component A; I and II for Component B; for Component C
Generic Schemas: for Components A, B, and C
Component Schemas: for Components A, B, and C
Local Schemas: 1, 2
Adapted from Sheth and Larson, ACM Computing Surveys, September 1990
Federated Data and Policy Management
Data/Policy for Federation
- Export Data/Policy from Component Data/Policy for Agency A
- Export Data/Policy from Component Data/Policy for Agency B
- Export Data/Policy from Component Data/Policy for Agency C
Current Status and Directions
Developments
- Several prototypes and some commercial products
- Tools for schema integration and transformation
- Standards for interoperable database systems
Challenges being addressed
- Semantic heterogeneity
- Autonomy and federation
- Global transaction management
- Integrity and Security
New challenges
- Scale
- Web data management