database - GCG-42
Download
Report
Transcript database - GCG-42
RELATIONAL DATABASE MANAGEMENT SYSTEM - I
Subject code : BCA-12
Internal Assessment : 40
External Assessment : 60
1
Overview of DBMS
Section A
2
Database
• Related information when placed in an organized
form makes a database.
• Examples
– Dictionary
– Telephone Directory
– Student record register
– Address book
• Two approaches for storing data in computers– File based approach
– Database approach
3
Why do we need a database?
• Keep records of our:
– Clients
– Staff
– Volunteers
• To keep a record of
activities and
interventions;
• Keep sales records;
• Develop reports;
• Perform research
File based Approach
• A group of files storing data of an organization.
• Each file is independent from one another.
• Each file contains and processes information
for one specific function, such as accounting or
inventory.
• Files are designed using programs written in
programming languages such as COBOl,C,C++.
• File systems may use a storage device such as
hard disk or CD-ROM etc and involve
maintaining physical location of the files.
5
Database Approach
6
• A shared collection of logically related data, designed
to meet the information needs of an organization.
• Database is a single, large repository of data, which can
be used simultaneously by many departments and
users.
Characteristics of Database
• Shared
• Persistence
• Validity/Integrity/Correctness
• Security
• Consistency
• Non-redundancy
• Data Independence
Database Management System (DBMS)
A software system that
• Allows users to define, create and maintain a
database
• Provides controlled access to database
• DBMS provides an environment that is both
convenient and efficient to use.
Examples
– Computerized Library systems
– Automated teller machines
– Flight reservations systems
7
• Database Applications:
– Banking: all transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Manufacturing: production, inventory, orders,
supply chain
– Human resources: employee records, salaries, tax
deductions
• Commercially available DBMS are
8
–
–
–
–
Dbase
FoxPro
IMS
Oracle
Purpose of Database System
• In the early days, database applications were built on
top of file systems
• Drawbacks of using file systems to store data:
– Data redundancy and inconsistency
• Multiple file formats, duplication of information
in different files
– Difficulty in accessing data
• Need to write a new program to carry out each
new task
– Data isolation — multiple files and formats
– Integrity problems
• Integrity constraints (e.g. account balance > 0)
become part of program code
• Hard to add new constraints or change existing
ones
9
Purpose of Database Systems (Cont.)
10
• Drawbacks of using file systems (cont.)
– Atomicity of updates
• Failures may leave database in an inconsistent state
with partial updates carried out
• E.g. transfer of funds from one account to another
should either complete or not happen at all
– Concurrent access by multiple users
• Concurrent accessed needed for performance
• Uncontrolled concurrent accesses can lead to
inconsistencies.E.g. two people reading a balance
and updating it at the same time
– Security problems
• Database systems offer solutions to all the above
problems
Major Components of Database System
•
•
•
•
Data
Hardware (Computer System)
Software (DBMS)
Users (Naïve ,Online, Sophisticated users, Application
Programmers, DBA)
• Procedures (Instructions)
DBMS
DATABASE
DATA AND PROGRAMS
APPLICATION PROGRAMMERS
11
END USERS
Advantages of DBMS
•
•
•
•
•
•
•
12
Controlling Redundancy
Integrity can be enforced
Inconsistency can be avoided
Data can be shared
Standards can be enforced
Restricting unauthorized access
Providing Backup and recovery
Disadvantages of DBMS
•
•
•
•
•
•
13
Complexity
Size
Performance
Higher impact of failure
Cost of DBMS
Additional Hardware costs
Architecture of DBMS
• The American National Standards Institute (ANSI)
Standards Planning and Requirements Committee
(SPARC) proposed a three layer model for DBMS.
• 3 Levels of DBMS are:– External Level
– Conceptual Level
– Internal Level
• This model was later known as ANSI SPARC
architecture of DBMS.
14
Levels of DBMS
• External level or User View
– Highest level of DBMS
– Closest to the user
– Concerned with the way data is viewed by individual
users
– Only those portions of the database that are of concern
to the user or application program are included.
– Same database can have different views for different
users.
– Each External view is described by means of a scheme
called an external Schema.
– The external schema consists of the definition of logical
records and the relationships in external view.
•
15
Note:-A schema/scheme is an outline or plan that describes the records
and relationships existing in the view.
Conceptual level or Logical View
• Level of indirection between external and internal
level.
• Describes what data are actually stored in the
database and the relationship that exist among the
data.
• Defined by Conceptual schema which describes all
records and relationships included in the database.
• There is only one Conceptual Schema per database.
• Defined by DBA.
16
Internal level or Physical View
• Lowest level of Architecture
• Closest to physical Storage i.e. how data is stored in the
database.
• Describes the data structures and Access methods to be
used by the database.
• Expressed by Internal Schema which contains the
definition of the stored record, the method of
representing the data fields and the access aids used.
• Defined by DBA.
17
View of Data
An architecture for a database system
18
3 Level ANSI Sparc Architecture of DBMS
END USER
EXTERNAL
LEVEL
END USER
EXTERNAL VIEW A
EXTERNAL VIEW B
EXTERNAL/CONCEPTUAL
MAPPING A
EXTERNAL/CONCEPTUAL
MAPPING B
DBMS
CONCEPTUAL
LEVEL
CONCEPTUAL VIEW
CONCEPTUAL/INTERNAL
MAPPING
INTERNAL
LEVEL
19
STORED DATABASE (INTERNAL VIEW)
Mapping b/w Views
• External/Conceptual Mapping
– Each external schema is related to conceptual
schema by external /conceptual mapping
– It gives the correspondence among records and
relationships of external/conceptual views.
• Conceptual/Internal Mapping
– Conceptual schema is related to Internal Schema by
20
Conceptual / internal mapping
– This enables the DBMS to find the actual record or
combination of records in physical storage that
constitute a logical record in conceptual schema.
– Mapping b/w two levels specifies the method of
deriving the conceptual record from physical
database.
Data Abstraction
• Major purpose of DBMS is to provide an abstract view
of data
• i.e. system hides certain details of how data is stored
and maintained.
• Since many users are not computer trained complexity
is hidden from them through several levels of
abstraction.
• 3 levels of abstraction
– Internal Level or Physical Level
– Conceptual Level or Logical Level
– External Level or View Level
21
Levels of Abstraction
• Physical level describes how a record (e.g.,
customer) is stored.
• Logical level: describes data stored in database,
and the relationships among the data.
type customer = record
name : string;
street : string;
city : integer;
end;
• View level: application programs hide details of
data types.
22
Instances and Schemas
• Schema – the logical structure of the database or overall
description of the database.
– e.g., the database consists of information about a set of
customers and accounts and the relationship between them)
– Analogous to type information of a variable in a program
– Physical schema: database design at the physical level
– Logical schema: database design at the logical level
– Subschema: External view is defined by a subschema
• Instance – the actual content of the database at a particular
point in time
– Analogous to the value of a variable
23
DATA INDEPENDENCE
• Ability to modify schema definition in one level without
effecting a schema definition in next higher level is Data
Independence.
• Physical Data Independence – the ability to modify the
physical schema without changing the logical schema
– Applications depend on the logical schema
– In general, the interfaces between the various levels
and components should be well defined so that
changes in some parts do not seriously influence
others.
• Logical Data Independence – indicates that conceptual
schema can be changed without affecting existing
external schema
24
Database Administrator(DBA)
• A person or group of persons responsible for overall
control of database systems.
• Main responsibilities of DBA are:
– Deciding the information content of the database
– Deciding the storage structures and access strategy
– Deciding authorization checks and validation
procedures
– Granting of authorization for data access
– Defining a strategy for backup and recovery
– Monitoring performance and responding to changes
in requirements.
25
Data Dictionary
• A metadata (Data Dictionary) is the data about the
data.
• Also known as system catalog.
• DD is used to control the database operation, data
integrity and accuracy.
• Contains information regarding database
• Provides the name of the data element, its description
and data structure in which it may be found.
• Provides great assistance in producing a report of
where a data element is used in all programs that
mention it.
26
Database Languages
• Data Definition Languages (DDL)
– It is a language that allows the users to define data and their
relationship to other types of data.
– It is mainly used to create files, databases , data dictionary and
tables within databases.
• Data Manipulation Languages (DML)
– It is a language that provides a set of operations to support the
–
•
•
•
•
basic data manipulation operations on the data held in the
databases.
By manipulation we mean:
to retrieve information stored in the database
To insert new information in the database
To delete unwanted information from the database
To modify existing data in the database
• Data Control Language (DCL)
– DCL helps to control access to data and the database using
27
statements grant and revoke
Types of Database Systems
• On the basis of no. of users
– Single user DBMS
– Multi user DBMS
• On the basis of Site Location
– Centralized DBMS
– Parallel DBMS
– Distributed DBMS
– Client/Server DBMS
28
Data Models
• A model is a representation of reality,' real world’
objects and events, and their associations.
• An integrated collection of concepts for describing and
manipulating
– data
– data relationships
– data semantics
– data constraints
• A data model comprises of 3 components
– A structural part
– A manipulative part
– A set of integrity rules
29
Types of Data Models
• Basically data models fall into 3 broad categories
– Object based Data Models
• Entity-Relationship model
• Object Oriented Model
– Physical Data Model
– Record based Data Model
• Hierarchical Model
• Network Model
• Relational Model
30
Entity-Relationship Model
Example of schema in the entity-relationship model
31
Entity Relationship Model (Cont.)
• E-R model of real world
– Entities (objects)
• E.g. customers, accounts, bank branch
– Relationships between entities
• E.g. Account A-101 is held by customer Johnson
• Relationship set depositor associates customers
with accounts
• Widely used for database design
– Database design in E-R model usually converted to
design in the relational model (coming up next)
which is used for storage and processing
32
Relational Model
Attributes
• Example of tabular data in the relational model
Customer
-id
customername
192-83-7465
Johnson
South
Palo Alto
A-101
019-28-3746
Smith
North
Rye
A-215
192-83-7465
Johnson
South
Palo Alto
A-201
321-12-3123
Jones
West
Harrison
A-217
019-28-3746
Smith
North
Rye
A-201
33
customerstreet
customercity
accountnumber
A Sample Relational Database
34
Data Definition Language (DDL)
• Specification notation for defining the database schema
– E.g.
create table account (
account-number char(10),
balance
integer)
• DDL compiler generates a set of tables stored in a data
dictionary
• Data dictionary contains metadata (i.e., data about data)
– database schema
– Data storage and definition language
• language in which the storage structure and access
methods used by the database system are specified
• Usually an extension of the data definition language
35
Data Manipulation Language (DML)
• Language for accessing and manipulating the data
organized by the appropriate data model
– DML also known as query language
• Two classes of languages
– Procedural – user specifies what data is required
and how to get those data
– Nonprocedural – user specifies what data is
required without specifying how to get those data
• SQL is the most widely used query language
36
DATA MODELS
SECTION B
ENTITY RELATIONSHIP MODEL
Entity-Relationship Model
•
•
•
•
•
•
•
•
•
Entity Sets
Relationship Sets
Design Issues
Mapping Constraints
Keys
E-R Diagram
Extended E-R Features
Design of an E-R Database Schema
Reduction of an E-R Schema to Tables
Entity Sets
• A database can be modeled as:
– a collection of entities,
– relationship among entities.
• An entity is an object that exists and is
distinguishable from other objects.
– Example: specific person, company, event,
plant
• Entities have attributes
– Example: people have names and addresses
• An entity set is a set of entities of the same type
that share the same properties.
– Example: set of all persons, companies, trees,
holidays
Entity Sets customer and loan
Custid
CustCustname
street
Custcity
loan-
amt
number
Attributes
• An entity is represented by a set of attributes, that is
descriptive properties possessed by all members of an
Example:
entity set.
customer = (customer-id, customer-name,
customer-street, customer-city)
loan = (loan-number, amount)
• Domain – the set of permitted values for each attribute
• Attribute types:
– Simple and composite attributes.
– Single-valued and multi-valued attributes
• E.g. multivalued attribute: phone-numbers
– Derived attributes
• Can be computed from other attributes
• E.g. age, given date of birth
Composite Attributes
Relationship Sets
• A relationship is an association among several
entities
Example:
Hayes
depositor
A-102
customer entity relationship set account entity
• A relationship set is a mathematical relation among
n 2 entities, each taken from entity sets
{(e1, e2, … en) | e1 E1, e2 E2, …, en En}
where (e1, e2, …, en) is a relationship
– Example:
(Hayes, A-102) depositor
Relationship Set borrower
Relationship Sets (Cont.)
• An attribute can also be property of a relationship set.
• For instance, the depositor relationship set between entity sets
customer and account may have the attribute access-date
Degree of a Relationship Set
• Refers to number of entity sets that participate in a
relationship set.
• Relationship sets that involve two entity sets are
binary (or degree two). Generally, most relationship
sets in a database system are binary.
• Relationship sets may involve more than two entity
sets.
E.g. Suppose employees of a bank may have jobs
(responsibilities) at multiple branches, with different jobs
at different branches. Then there is a ternary relationship
set between entity sets employee, job and branch
• Relationships between more than two entity sets are
rare. Most relationships are binary. (More on this
later.)
Mapping Cardinalities
• Express the number of entities to which another
entity can be associated via a relationship set.
• Most useful in describing binary relationship
sets.
• For a binary relationship set the mapping
cardinality must be one of the following types:
– One to one
– One to many
– Many to one
– Many to many
Mapping Cardinalities
One to one
One to many
Note: Some elements in A and B may not be mapped to any
elements in the other set
Mapping Cardinalities
Many to one
Many to many
Note: Some elements in A and B may not be mapped to any
elements in the other set
Mapping Cardinalities affect ER Design
Can make access-date an attribute of account, instead of a
relationship attribute, if each account can have only one
customer i.e., the relationship from account to customer is
many to one, or equivalently, customer to account is one to
many
E-R Diagrams
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to
relationship sets.
Ellipses represent attributes
Double ellipses represent multivalve attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes
E-R Diagram With Composite,
Multivalued, and Derived Attributes
Relationship Sets with Attributes
Roles
• Entity sets of a relationship need not be distinct
• The labels “manager” and “worker” are called roles; they
specify how employee entities interact via the works-for
relationship set.
• Roles are indicated in E-R diagrams by labeling the lines that
connect diamonds to rectangles.
• Role labels are optional, and are used to clarify semantics of
the relationship
Cardinality Constraints
• We express cardinality constraints by drawing either a directed
line (), signifying “one,” or an undirected line (—), signifying
“many,” between the relationship set and the entity set.
• E.g.: One-to-one relationship:
– A customer is associated with at most one loan via the
relationship borrower
– A loan is associated with at most one customer via
borrower
One-To-Many Relationship
• In the one-to-many relationship a loan is associated
with at most one customer via borrower, a customer is
associated with several (including 0) loans via borrower
Many-To-One Relationships
• In a many-to-one relationship a loan is associated with
several (including 0) customers via borrower, a customer
is associated with at most one loan via borrower
Many-To-Many Relationship
• A customer is associated with several (possibly 0) loans
via borrower
• A loan is associated with several (possibly 0) customers
via borrower
Participation of an Entity Set in a
Relationship Set
Total participation (indicated by double line):
every entity in the entity set participates in at
least one relationship in the relationship set
E.g. participation of loan in borrower is total
every loan must have a customer
associated to it via borrower
Partial participation: some entities may not
participate in any relationship in the relationship
set
E.g. participation of customer in borrower is
partial
Contd..
Alternative Notation for Cardinality
Limits
Cardinality limits can also express participation
constraints
Keys
• A super key of an entity set is a set of one or more
attributes whose values uniquely determine each
entity.
• A candidate key of an entity set is a minimal super
key
– Customer-id is candidate key of customer
– account-number is candidate key of account
• Although several candidate keys may exist, one of
the candidate keys is selected to be the primary
key.
Keys for Relationship Sets
• The combination of primary keys of the participating entity
sets forms a super key of a relationship set.
– (customer-id, account-number) is the super key of
depositor
– NOTE: this means a pair of entity sets can have at most
one relationship in a particular relationship set.
• E.g. if we wish to track all access-dates to each
account by each customer, we cannot assume a
relationship for each access.
We can use a
multivalued attribute though
• Must consider the mapping cardinality of the relationship set
when deciding the what are the candidate keys
• Need to consider semantics of relationship set in selecting
the primary key in case of more than one candidate key
E-R Diagram with a Ternary
Relationship
Cardinality Constraints on Ternary
Relationship
• We allow at most one arrow out of a ternary (or greater degree)
relationship to indicate a cardinality constraint
• E.g. an arrow from works-on to job indicates each employee
works on at most one job at any branch.
• If there is more than one arrow, there are two ways of defining
the meaning.
– E.g a ternary relationship R between A, B and C with arrows to
B and C could mean
– 1. each A entity is associated with a unique entity from B and
C or
– 2. each pair of entities from (A, B) is associated with a unique
C entity, and each pair (A, C) is associated with a unique B
– Each alternative has been used in different formalisms
– To avoid confusion we outlaw more than one arrow
Design Issues
• Use of entity sets vs. attributes
Choice mainly depends on the structure of the
enterprise being modeled, and on the semantics
associated with the attribute in question.
• Use of entity sets vs. relationship sets
Possible guideline is to designate a relationship set to
describe an action that occurs between entities
• Binary versus n-ary relationship sets
Although it is possible to replace any nonbinary (n-ary,
for n > 2) relationship set by a number of distinct binary
relationship sets, a n-ary relationship set shows more
clearly that several entities participate in a single
relationship.
• Placement of relationship attributes
How about doing an ER design
interactively on the board?
Suggest an application to be modeled.
Weak Entity Sets
• An entity set that does not have a primary key is referred to
as a weak entity set.
• The existence of a weak entity set depends on the existence
of a identifying entity set
– it must relate to the identifying entity set via a total,
one-to-many relationship set from the identifying to the
weak entity set
– Identifying relationship depicted using a double diamond
• The discriminator (or partial key) of a weak entity set is the
set of attributes that distinguishes among all the entities of a
weak entity set.
• The primary key of a weak entity set is formed by the
primary key of the strong entity set on which the weak
entity set is existence dependent, plus the weak entity set’s
discriminator.
Weak Entity Sets (Cont.)
• We depict a weak entity set by double rectangles.
• We underline the discriminator of a weak entity set with a
dashed line.
• payment-number – discriminator of the payment entity set
• Primary key for payment – (loan-number, payment-number)
Weak Entity Sets (Cont.)
• Note: the primary key of the strong entity set is not
explicitly stored with the weak entity set, since it is
implicit in the identifying relationship.
• If loan-number were explicitly stored, payment could
be made a strong entity, but then the relationship
between payment and loan would be duplicated by an
implicit relationship defined by the attribute loannumber common to payment and loan
More Weak Entity Set Examples
• In a university, a course is a strong entity and a courseoffering can be modeled as a weak entity
• The discriminator of course-offering would be
semester (including year) and section-number (if there
is more than one section)
• If we model course-offering as a strong entity we
would model course-number as an attribute.
Then the relationship with course would be implicit in
the course-number attribute
Specialization
• Top-down design process; we designate subgroupings
within an entity set that are distinctive from other
entities in the set.
• These subgroupings become lower-level entity sets that
have attributes or participate in relationships that do
not apply to the higher-level entity set.
• Depicted by a triangle component labeled ISA (E.g.
customer “is a” person).
• Attribute inheritance – a lower-level entity set inherits
all the attributes and relationship participation of the
higher-level entity set to which it is linked.
Specialization Example
Generalization
• A bottom-up design process – combine a number of
entity sets that share the same features into a higherlevel entity set.
• Specialization and generalization are simple
inversions of each other; they are represented in an
E-R diagram in the same way.
• The terms specialization and generalization are used
interchangeably.
Specialization and Generalization
(Contd.)
• Can have multiple specializations of an entity set based
on different features.
• E.g. permanent-employee vs. temporary-employee, in
addition to officer vs. secretary vs. teller
• Each particular employee would be
– a member of one of permanent-employee or
temporary-employee,
– and also a member of one of officer, secretary, or
teller
• The ISA relationship also referred to as superclass subclass relationship
Design Constraints on a
Specialization/Generalization
• Constraint on which entities can be members of a given
lower-level entity set.
– condition-defined
• E.g. all customers over 65 years are members of
senior-citizen entity set; senior-citizen ISA person.
– user-defined
• Constraint on whether or not entities may belong to
more than one lower-level entity set within a single
generalization.
– Disjoint
• an entity can belong to only one lower-level entity
set
• Noted in E-R diagram by writing disjoint next to the
ISA triangle
– Overlapping
• an entity can belong to more than one lower-level
entity set
Design Constraints on a
Specialization/Generalization (Contd.)
• Completeness constraint -- specifies whether or not an
entity in the higher-level entity set must belong to at least
one of the lower-level entity sets within a generalization.
– total : an entity must belong to one of the lower-level
entity sets
– partial: an entity need not belong to one of the lowerlevel entity sets
Aggregation
Consider the ternary relationship works-on, which we saw
earlier
Suppose we want to record managers for tasks performed by
an
employee at a branch
Aggregation (Cont.)
• Relationship sets works-on and manages represent overlapping
information
– Every manages relationship corresponds to a works-on
relationship
– However, some works-on relationships may not correspond to
any manages relationships
• So we can’t discard the works-on relationship
• Eliminate this redundancy via aggregation
– Treat relationship as an abstract entity
– Allows relationships between relationships
– Abstraction of relationship into new entity
• Without introducing redundancy, the following diagram
represents:
– An employee works on a particular job at a particular branch
– An employee, branch, job combination may have an
associated manager
E-R Diagram With Aggregation
E-R Design Decisions
• The use of an attribute or entity set to represent an
object.
• Whether a real-world concept is best expressed by an
entity set or a relationship set.
• The use of a ternary relationship versus a pair of binary
relationships.
• The use of a strong or weak entity set.
• The use of specialization/generalization – contributes
to modularity in the design.
• The use of aggregation – can treat the aggregate entity
set as a single unit without concern for the details of its
internal structure.
E-R Diagram for a Banking Enterprise
How about doing another ER design
interactively on the board?
Summary of Symbols Used in E-R
Notation
Summary of Symbols (Cont.)
Alternative E-R Notations
Reduction of an E-R Schema to Tables
• Primary keys allow entity sets and relationship sets to
be expressed uniformly as tables which represent the
contents of the database.
• A database which conforms to an E-R diagram can be
represented by a collection of tables.
• For each entity set and relationship set there is a
unique table which is assigned the name of the
corresponding entity set or relationship set.
• Each table has a number of columns (generally
corresponding to attributes), which have unique
names.
• Converting an E-R diagram to a table format is the
basis for deriving a relational database design from an
E-R diagram.
Representing Entity Sets as Tables
• A strong entity set reduces to a table with the
same attributes.
Composite and Multivalve Attributes
• Composite attributes are flattened out by creating a separate
attribute for each component attribute
– E.g. given entity set customer with composite attribute
name with component attributes first-name and last-name
the table corresponding to the entity set has two
attributes
name.first-name and name.last-name
• A multivalued attribute M of an entity E is represented by a
separate table EM
– Table EM has attributes corresponding to the primary key
of E and an attribute corresponding to multivalued
attribute M
– E.g. Multivalued attribute dependent-names of employee
is represented by a table
employee-dependent-names( employee-id, dname)
Composite and Multivalued
Attributes(Cont.)
– Each value of the multivalued attribute maps to a
separate row of the table EM
• E.g., an employee entity with primary key John
and
dependents Johnson and Johndotir maps to
two rows:
(John, Johnson) and (John, Johndotir)
Representing Weak Entity Sets
A weak entity set becomes a table that includes a
column for the primary key of the identifying strong
entity set
Representing Relationship Sets as Tables
• A many-to-many relationship set is represented as a
table with columns for the primary keys of the two
participating entity sets, and any descriptive attributes
of the relationship set.
• E.g.: table for relationship set borrower
Redundancy of Tables
Many-to-one and one-to-many relationship sets
that are total on the many-side can be
represented by adding an extra attribute to the
many side, containing the primary key of the one
side
E.g.: Instead of creating a table for relationship
account-branch, add an attribute branch to the
entity set account
Redundancy of Tables (Cont.)
• For one-to-one relationship sets, either side can be
chosen to act as the “many” side
– That is, extra attribute can be added to either of
the tables corresponding to the two entity sets
• If participation is partial on the many side, replacing a
table by an extra attribute in the relation
corresponding to the “many” side could result in null
values
• The table corresponding to a relationship set linking a
weak entity set to its identifying strong entity set is
redundant.
– E.g. The payment table already contains the
information that would appear in the loan-payment
table (i.e., the columns loan-number and paymentnumber).
Hierarchical Data Model
GRAND PARENTS
PARENTS
CHILDREN
PARENTS
CHILDREN
PARENTS
CHILDREN
CHILDREN
Hierarchical Data Model
• Developed in late 1950s by North American Rockwell Company and
IBM.
• Tree like structure with records forming nodes and fields forming
branches of the tree.
• Organizes data elements as tabular rows, one for each instance of
entity.
• Example :Consider a company’s organizational structure
GENERAL MANAGER
SALES
DEPT
EMP1
EMP2
DEPUTY
GM
DEPUTY
GM
ADVT
DEPT
PROD DEPT
EMP1
DEPUTY
GM
INVENTORY
DEPT
EMP1
EMP1
EMP2
EMP2
Characteristics of Hierarchical Model
• Relationship is represented as a parent-child .
• Each Hierarchical tree can have one root
record type and this root record type does not
have a parent record type.
• Root can have any no. of child record types
and each of which can itself be a root of
hierarchical subtree.
• Each child record type can have only one
parent record type. Thus a many to many
relationship cannot be directly expressed b/w
two record types.
• Data in a parent record type applies to all its
children records.
• A child record occurrence must have a parent
record occurrence. Deleting a parent record
occurrence requires deleting all its children
record occurrences.
Hierarchical View for Database
• Consider a Supplier-Parts Database
• The records for suppliers, parts and shipments are
given in following tables.
Sno Name
Status
City
S1
Suneet
20
Mohali
S2
Ankit
10
Amritsar
S3
Amit
10
Amritsar
Supplier table
Pno
Name
Color
City
P1
Nut
Red
Mohali
P2
Bolt
Green
Amritsar
P3
Screw
Blue
Jalandhar
P4
Screw
Red
Mohali
Parts Table
Hierarchical Database
Shipment Table
Sno
Pno
Qty
S1
P1
250
S1
P2
300
S1
P3
500
S2
P1
250
S2
P2
500
S3
P2
300
P3
Screw
Blue
S1
Suneet
20
P2
Screw
Red
Mohali
Green
500
Amritsar
S1
Suneet
20 Mohali
S2
Ankit
10
Amritsar 500
S3
Amit
10
Amritsar 300
P1
P4
Bolt
Jalandhar
Nut
Red
300
Mohali
Mohali
S1
Suneet
20
Mohali
250
S2
Ankit
10
Amritsar 250
Operations on Hierarchical Model
• Four basic operations that can be
performed on Hierarchical model are
– Insert Operation
– Update Operation
– Delete Operation
– Retrieve Operation
Advantages
• Advantages
– Simplicity
– Data Security
– Data Integrity
– Efficiency
Disadvantages
– Implementation Complexity
– Database Management problems
– Lack of Structural Independence
– Program Complexity
– Operational Anomalies
– Implementation Limitation
Network Database Model
• Network Model replaces the hierarchical tree
with a graph.
• Thus allowing a more general connections
among the nodes.
• Relationship is represented as owner-member
type.
• Main difference of Network model from
Hierarchical model is its ability to handle Many
to Many relationships (M:N).
• In other words ,it allows a record to have more
than one owner.
Terminology of Network Model
• A relationship is a set
• Each set is made up of two types of records: an owner
record and a member record.
• An owner can have any no. of member records .
• Each member can have more than one owner record
types.
• Thus 1:1,1:M,M:N relationship among entities is
allowed.
• Member and owner records are connected with each
other with a third record type known as connector.
• A connector occurrence specifies the association b/w
member and owner records.
Network view of Database
S1
Suneet
250
P1 Nut Red
Mohali
20
Mohali
300
S2
Ankit
500
P2 Bolt Green
Amritsar
10
250
Amritsar
S3
500
P3 Screw Blue
Jalandhar
Amit
10
Amritsar
300
P4 Screw Red
Mohali
Operations on Network Model
•
•
•
•
Insert operation
Update operation
Delete operation
Retrieve operation
Advantages & Disadvantages
• Advantages
– Conceptual Simplicity
–
–
–
–
–
•
Capability to handle more relationship types
Ease of data access
Data Integrity
Data Independence
Database standards
Disadvantages
– System Complexity
– Operational anomalies
– Absence of structural Independence
Relational Model
•
•
•
•
•
Structure of Relational Databases
Relational Algebra
Tuple Relational Calculus
Domain Relational Calculus
Extended Relational-AlgebraOperations
• Modification of the Database
• Views
Relational Model
• Concept was proposed by Dr. E.F. Codd, in 1960s.
• Data is stored in the form of tables.
• Relational model consists of 3 major components:– Set of relations and set of domains (Data
Structure)
– Integrity Rules (Data Integrity)
– Operations (Data Manipulation)
• Characteristics of Relational Database
– Whole data is conceptually represented as an
orderly arrangement of data into rows and
columns called a relation or table.
– All values are scalar.
– All operations are performed on an entire relation
and result is an entire relation.
Basic terminology used in Relational Model
• Tuple- Each row of data is a tuple. Actually each row
is an n-tuple (n=1,2,3……).
• Cardinality of a relation -No. of tuples in a relation is
cardinality.
• Attributes-Each column in a relation is an attribute.
• Degree of a relation-No. of attributes in a relation
determines its degree.
• Domain- Set of all possible values that an attribute
may contain.
• Body of a relation-The body of a relation consists of
unordered set of zero or more tuples.
Keys of a Relation
• Primary key- Is an attribute which possess the
property of uniqueness and irreducibility.
• Foreign key-is an attribute or set of attributes which
helps to establish relationship b/w two tables.
Operations in Relational Model
– Insert operation
– Update operation
– Delete operation
– Retrieve operation
Advantages & Disadvantages
• Advantages
– Structural Independence
– Conceptual Simplicity
– Design, Implementation, maintenance and usage
ease
– Adhoc query capability
•
Disadvantages
– Hardware overheads
– Ease of design lead to bad design
Example of a Relation
RDBMS
• Concept was given by Dr. E.F.Codd.
• In RDBMS, data is stored in relations i.e. in the form of tables.
• Relational Database Management System Components– Data Structure
– Data Integrity
– Data Manipulation
•
Relational data structure
–
–
–
–
–
–
Relation
Attribute
Domain
Tuple
Cardinality
Degree
• Extension of a relation
– The set of tuples appearing in that relation at any given
instant of time.
– Extension varies with time.
– It changes as tuples are created, destroyed and
updated.
– It is same as view of a table.
• Intension of a relation
– It is permanent part of a relation and independent of
time.
– Intension is combination of two things
• Naming structure – consists of relation name ,name
of attributes.
• Integrity constraints- key constraints, referential
constraints etc.
Data Integrity
• Relational Keys
– What is a key???
• Key is a set of one or more columns whose
combined values are unique among all
occurrences in a given table.
• Types of Keys
– Primary key
– Foreign key
– Super key
– Candidate key
– Alternate key
Integrity Rules
• There are two important rules which are constraints
or restrictions that apply to all instances of the
database.
• Two integrity rules are
– Entity Integrity Rule
– Referential Integrity Rule
• Entity Integrity Rule
– It states that in a base relation, value of attribute
of a primary key cannot be null.
• Referential Integrity Rule
– It states that if a foreign key exists in a relation,
either the foreign key value must match values in
primary key or the foreign key value must be null.
Relational Data Manipulation
• The manipulative part of the relational model
consists of a set of operators known as Relational
Algebra and Relational Calculus.
• Relational Algebra is a procedural language that can
be used to tell DBMS how to build a new relation
from one or more relations in the database.
• Relational Calculus is a non procedural language that
can be used to formulate the definition of a relation
in terms of one or more database relations.
• Note :- Relational Algebra and Calculus will be
discussed in detail later.
CODD’s Rules
•
•
•
•
Dr. E.F.Codd, founder of relational database systems,
framed 12 rules to assist a database to qualify as
relational.
An RDBMS has to satisfy at least six of the 12 rules to be
accepted as full fledged RDBMS.
There is no RDBMS package commercially available that
satisfies all the 12 rules.
These rules are1. Information Rule:- All information in a relational
database is represented in the form of tables. It speeds
up design and learning process. User productivity is
improved since knowledge of only one language is
necessary to access all data.
2. Guaranteed Access Rule:- Every piece of data in a
relational database, can be accessed using a primary
key value that identifies the row & column.
3. Comprehensive Data Sub-language Rule:- The RDBMS may support
several languages. But at least one of them should allow the user
to do all the following: define tables and views, query and update
data, set integrity constraints, set authorizations etc.
4. View updating Rule:-Any view that can be updated theoretically can
be updated using RDBMS. Data consistency is ensured since the
changes made in the view are transmitted in the base table and
vice-versa.
5. High level insert, Update and delete:-RDBMS supports insertion,
deletion and updation at table level.
6. Physical Data Independence
7. Logical Data Independence
8. Integrity Independence:-Like table definitions, integrity constraints
are stored in the on line catalog or data dictionary and can
therefore be changed without necessitating changes in the
application programs.
9. Non subversion rule:- If RDBMS has a language that accesses
information of a record at a time, then that language should not
be used to bypass the integrity constraints.
10. Systematic treatment of Null values:-In RDBMS, null values
should be supported for representation of missing or
inapplicable information.
11. Database description rule:- The description of a database is
stored and maintained in the form of tables. This implies that
Data dictionary must be present within the RDBMS.
12. Distributed Independence:- The RDBMS package must have
distribution independence. It helps to distribute database across
multiple computers even though they are having heterogeneous
platforms both for hardware and operating system. This is the
most attractive aspects of RDBMS.
Basic Structure
• Formally, given sets D1, D2, …. Dn a relation r is a subset of
D1 x D2 x … x Dn
Thus a relation is a set of n-tuples (a1, a2, …, an) where
each ai Di
• Example: if
customer-name = {Jones, Smith, Curry, Lindsay}
customer-street = {Main, North, Park}
customer-city = {Harrison, Rye, Pittsfield}
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield)}
is a relation over customer-name x customer-street x
customer-city
Relation Schema
• A1, A2, …, An are attributes
• R = (A1, A2, …, An ) is a relation schema
E.g. Customer-schema =
(customer-name, customer-street,
customer-city)
• r(R) is a relation on the relation schema R
E.g.
customer (Customer-schema)
Relation Instance
• The current values (relation instance) of a relation
are specified by a table
• An element t of r is a tuple, represented by a row in a
table
attributes
(or columns)
customer-name customer-street
Jones
Smith
Curry
Lindsay
Main
North
North
Park
customer
customer-city
Harrison
Rye
Rye
Pittsfield
tuples
(or rows)
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in an
arbitrary order)
E.g. account relation with unordered tuples
Query Languages
• Language in which user requests information from
the database.
• Categories of languages
– procedural
– non-procedural
• “Pure” languages:
– Relational Algebra
– Tuple Relational Calculus
– Domain Relational Calculus
• Pure languages form underlying basis of query
languages that people use.
Relational Algebra
• Procedural language
• User has to specify what is required and what are
the steps to obtain the required output.
• The operators take one or more relations as inputs
and give a new relation as a result.
• Two types of relational operators– Traditional Set operators
– Special operators
• Formal and non-user friendly language.
Traditional Set Operators
• Traditional set operators are:–
–
–
–
Union
Intersection
Difference
Cartesian Product
• Union:-A union of two relations is a relation that
contain all data elements belonging to both relations
while writing the duplicate values only once.
– In order to perform the union operation , both operand
relations must be union compatible i.e. they must have
same no. of columns drawn from same domain (i.e. must
be of same data type).
– It is represented by symbol ‘U’.
Union
Cust_name
Cust_status
Cust_name
Cust_status
Sham
Good
Karan
Bad
Rahul
Excellent
Sham
Good
S
R
Cust_name
Cust_status
Sham
Good
Rahul
Excellent
Karan
Bad
RUS
Intersection
• Intersection of two relations consists of a third
relation with all rows that are common to both
relations.
• It is represented by symbol ‘∩’.
• Taking above example, R ∩ S is
Cust_name
Cust_status
Sham
Good
R∩S
Difference
• Difference of two relations consist of a third relation
which consists of elements which are present in one
relation and not in another.
• It is represented using symbol ‘-’.
• Minus is not associative.
Cust_name
Cust_status
Cust_name
Cust_status
Rahul
Excellent
Karan
Bad
R-S
S-R
Cartesian Product
• Cartesian product of two relations include each
row of one relation is associated with each row of
another relation.
• It is denoted by cross (X).
A B
C
D
A
B
C
D
α 1
α
10
α
1
α
10
β 2
β
20
α
1
β
20
α
11
α
1
α
11
α
24
α
1
α
24
β
2
α
10
β
2
β
20
β
2
α
11
β
2
α
24
R
S
RXS
Special Relational Operations
• Four special relational operators are:– Selection
– Projection
– Join
– Division
• Selection:- Yields a horizontal subset of a given
relation.
• Those tuples or rows of a table are selected which
satisfy a particular condition (predicate).
• It is represented by Greek letter sigma ‘σ’.
• Notation: p (r)
• p is called the selection predicate.
Selection
• E.g. Suppose from supplier table, we have to select
values where city is
‘Mohali’.
Sno
Sname
Status
City
S1
Suneet
20
Mohali
S2
Ankit
10
Amritsar
S3
Amit
30
Amritsar
S4
Raj
20
Chandigarh
The query is like
•
σcity= ‘Mohali’ (Supplier)
Another example
•
branch-name=“Perryridge”
(account)
Projection
• Projection operation on a table simply forms another table by
copying specified columns from original table while eliminating
any duplicated rows.
• Projection is denoted by ‘π’
• Consider the following employee table.
Empno Ename
Age
Sal
E01
Smith
25
7500
E02
John
36
10000
E03
Allen
42
10000
• Projections of Age
Age
25
36
42
πAge (employee)
Project Operation – Example
• Relation r:
A
A,C
B
C
10
1
20
1
30
1
40
2
(r)
A
C
A
1
1
1
1
2
C
=
1
2
Project Operation
• Notation:
A1, A2, …, Ak (r)
where A1, A2 are attribute names and r is a relation
name.
• The result is defined as the relation of k columns
obtained by erasing the columns that are not listed
• Duplicate rows removed from result, since relations
are sets
Natural-Join Operation
Notation: r
s
• Let r and s be relations on schemas R and S respectively.
Then, r s is a relation on schema R S obtained as follows:
– Consider each pair of tuples tr from r and ts from s.
– If tr and ts have the same value on each of the attributes in R
S, add a tuple t to the result, where
• t has the same value as tr on r
• t has the same value as ts on s
• Example:
R = (A, B, C, D)
S = (E, B, D)
– Result schema = (A, B, C, D, E)
– r s is defined as:
r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))
Natural Join Operation – Example
• Relations r, s:
A
B
C
1
2
4
1
2
D
B
D
a
a
b
a
b
1
3
1
2
3
a
a
a
b
b
r
r
s
s
A
B
1
1
1
1
2
C
D
a
a
a
a
b
E
E
Division Operation
• Is used to divide a relation A of degree m+n (no. of columns in a
relation) by relation B of degree n, produces a relation of degree m.
• It is denoted by ‘ ‘.
•
E.g.
A
Sno
Pno
S1
P1
S1
P2
S2
P1
S3
P2
S3
p3
A
Pn
o
Pno
P1
P2
P3
II
I
B
Sno
S1
S2
B
Sno
S3
II
I
Division Operation – Example
Relations r, s:
A
r
s:
A
r
B
B
1
2
3
1
1
1
3
4
6
1
2
1
2
s
Another Division Example
Relations r, s:
A
B
C
a
a
a
a
a
a
a
a
D
E
D
E
a
a
b
a
b
a
b
b
1
1
1
1
3
1
1
1
a
b
1
1
r
r
s:
A
B
a
a
C
s
Assignment Operation
• The result to the right of the is assigned to the
relation variable on the left of the .
• AB
• Value of B is assigned to A.
Aggregate Functions and Operations
• Aggregation function takes a collection of values and returns
a single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
• Aggregate operation in relational algebra
G1, G2, …, Gn g F1( A1), F2( A2),…, Fn( An) (E)
– E is any relational-algebra expression
– G1, G2 …, Gn is a list of attributes on which to group (can be
empty)
– Each Fi is an aggregate function
– Each Ai is an attribute name
Aggregate Operation – Example
• Relation r:
A
B
C
7
7
3
10
g sum(c) (r)
sum-C
27
Aggregate Operation – Example
• Relation account grouped by branch-name:
branch-name account-number
Perryridge
Perryridge
Brighton
Brighton
Redwood
branch-name
g
balance
A-102
A-201
A-217
A-215
A-222
sum(balance)
400
900
750
750
700
(account)
branch-name
Perryridge
Brighton
Redwood
balance
1300
1500
700
Aggregate Functions (Cont.)
• Result of aggregation does not have a name
– Can use rename operation to give it a name
– For convenience, we permit renaming as part of
aggregate operation
branch-name
g
sum(balance) as sum-balance (account)
Outer Join
• An extension of the join operation that avoids loss of
information.
• Computes the join and then adds tuples form one
relation that do not match tuples in the other relation
to the result of the join.
• Uses null values:
– null signifies that the value is unknown or does not
exist
– All comparisons involving null are (roughly
speaking) false by definition.
• Will study precise meaning of comparisons with
nulls later
Outer Join – Example
• Relation loan
loan-number
branch-name
L-170
L-230
L-260
Downtown
Redwood
Perryridge
amount
3000
4000
1700
Relation borrower
customer-name loan-number
Jones
Smith
Hayes
L-170
L-230
L-155
Outer Join – Example
• Inner Join
loan Borrower
loan-number branch-name
L-170
L-230
Downtown
Redwood
amount
customer-name
3000
4000
Jones
Smith
amount
customer-name
Left Outer Join
loan
loan-number
L-170
L-230
L-260
Borrower
branch-name
Downtown
Redwood
Perryridge
3000
4000
1700
Jones
Smith
null
Outer Join – Example
• Right Outer Join
loan
borrower
loan-number
L-170
L-230
L-155
branch-name
Downtown
Redwood
null
amount
3000
4000
null
customer-name
Jones
Smith
Hayes
Full Outer Join
loan
borrower
loan-number
L-170
L-230
L-260
L-155
branch-name
Downtown
Redwood
Perryridge
null
amount
3000
4000
1700
null
customer-name
Jones
Smith
null
Hayes
Null Values
• It is possible for tuples to have a null value, denoted by null,
for some of their attributes
• null signifies an unknown value or that a value does not exist.
• The result of any arithmetic expression involving null is null.
• Aggregate functions simply ignore null values
– Is an arbitrary decision. Could have returned null as result
instead.
– We follow the semantics of SQL in its handling of null
values
• For duplicate elimination and grouping, null is treated like
any other value, and two nulls are assumed to be the same
– Alternative: assume each null is different from each other
– Both are arbitrary decisions, so we simply follow SQL
Modification of the Database
• The content of the database may be modified using
the following operations:
– Deletion
– Insertion
– Updating
• All these operations are expressed using the
assignment operator.
Deletion
• A delete request is expressed similarly to a query,
except instead of displaying tuples to the user, the
selected tuples are removed from the database.
• Can delete only whole tuples; cannot delete values on
only particular attributes
• A deletion is expressed in relational algebra by:
rr–E
where r is a relation and E is a relational algebra
query.
Deletion Examples
• Delete all account records in the Perryridge branch.
account
account –
branch-name = “Perryridge”
(account)
Delete all loan records with amount in the range of 0 to 50
loan
loan –
amount
0 and amount
50 (loan)
Insertion
• To insert data into a relation, we either:
– specify a tuple to be inserted
– write a query whose result is a set of tuples to be
inserted
• in relational algebra, an insertion is expressed by:
r r E
where r is a relation and E is a relational algebra
expression.
• The insertion of a single tuple is expressed by letting
E be a constant relation containing one tuple.
Insertion Example
• Insert information in the database specifying that
Smith has $1200 in account A-973 at the Perryridge
branch.
account
account
{(“Perryridge”, A-973, 1200)}
depositor
depositor
{(“Smith”, A-973)}
Updating
• A mechanism to change a value in a tuple without
changing all values in the tuple
• Use the generalized projection operator to do this task
r F1, F2, …, FI, (r)
• Each Fi is either
– the ith attribute of r, if the ith attribute is not updated, or,
– if the attribute is to be updated Fi is an expression, involving
only constants and the attributes of r, which gives the new
value for the attribute
Update Examples
• Make interest payments by increasing all balances by 5
percent.
account
AN, BN, BAL * 1.05 (account)
where AN, BN and BAL stand for account-number, branch-name
and balance, respectively.
Pay all accounts with balances over $10,000 6 percent interest
and pay all others 5 percent
account
AN, BN, BAL * 1.06 (
AN, BN, BAL * 1.05 (
BAL
BAL
10000 (account))
10000 (account))
Relational Calculus
Tuple oriented relational calculus
Domain Oriented relational calculus
Relational Calculus
• Non procedural language
• User is not concerned with the procedure to obtain the results,
he/she just tell the results and output is available without
knowing the method of retrieval.
• A query is expressed as a formula consisting of a no. of variables
and an expression involving these variables.
• It is up to DBMS to transform these non procedural queries into
procedural queries.
• This concept was first proposed by Codd.
• It is based on predicate calculus.
• Propositions specifying a property consist of an expression that
names an individual object and another expression called the
predicate stand for the property that object possesses.
• E.g. MCA is a class
Object –MCA, India
India is a country.
Predicate- Class, Country
Contd..
• A convenient method of writing the statements is to place the
predicate first and follow it with the object enclosed in
parenthesis.
• E.g. MCA is a class can be written as
is a class (MCA) or
Class (MCA)
• Finally, if we use symbols for both the predicate and the object, we
can rewrite the statement as P(X) where P is the predicate and X is
the object.
• These are known as one place predicate.
• Two place predicate-require two objects.
• E.g. Naresh is taller than Rajesh
ABC is intelligent than XYZ
These can be written as
TALLER_THAN (Naresh, Rajesh)
INTELLIGENT_THAN (ABC,XYZ)
Atomic Formula
• A predicate followed by its arguments is called an
Atomic Formula.
• E.g. CLASS(MCA)
TALLER_THAN (Naresh ,Rajesh)
• Atomic formulas can also be combined using logical
connectors such as
NOT or Negation
┐
OR
V
AND
^
IMPLICATION
→
• E.g. oracle is a DBMS and ABC is a company
Can be written in atomic formula as
DBMS (oracle) ^ COMPANY(ABC)
Tuple Oriented Relational Calculus
• Based on specifying a number of tuple variables.
• A nonprocedural query language, where each query is of the
form
{t | P (t) }
• It is the set of all tuples t such that predicate P is true for t
• t is a tuple variable, t[A] denotes the value of tuple t on attribute
A
• t r denotes that tuple t is in relation r
• P is a formula similar to that of the predicate calculus.
• E.g. {t | Book (t) and t. Price>100}
Will get you all books whose price is greater than 100.It retrieves
attributes for each selected value.
• To retrieve only some of the attributes say TITLE,AUTHOR,PRICE,
Query is like
{ t.TITLE, t. AUTHOR, t. PRICE |BOOK (t) and t. PRICE>200}
CONTD..
• Thus in a tuple calculus, we need
– A condition to select the required tuples from the relation.
– A set of attributes to be retrieved.
• WFF (Well Formed Formula)
Every condition is a WFF.A WFF is constructed from
conditions, Boolean operations (AND,OR,NOT) and
quantifiers like for all ()
and there exists().
t r (Q(t)) ”there exists” a tuple in t in relation r
such that predicate Q(t) is true
t r (Q(t)) Q is true “for all” tuples t in relation r
Free and Bound variables
•
Bound Variables:- Bound variables are those range variables when
meaning of the formula remain unchanged if all the occurrence of
range variables say ‘x’ were replaced by some other variable say ‘y’.
Then x is called bound variable.
• E.g. x (x>3) means exists x (x>3)
• If x is replaced by ‘y’
exists y (y>3)
Meaning of the expression has not changed. So, x is a bound variable.
• Free Variables:- Free variables are those range variables if all the
occurrences of range variable say ‘x’ were replaced by some other
variable say ‘y’, meaning of the formula is changed .
• E.g. exists x (x>3) and y<0
• If x is replaced by ‘y’
Exists y (y>3) and y<0
Meaning of the expression is totally changed.
Open and Closed WFF
• Closed WFF:- A WFF in which all variables are bound is
Closed WFF.
• E.g. x (x>3) is closed WFF
• Open WFF:- A WFF in which at least one variable is
free variable is Open WFF.
• E.g. x (x>3) and y<0 is open WFF
Domain Oriented Relational Calculus
• In Domain Calculus the variables range over single
values from domains of attributes rather than ranging
over tuples.
• An expression of domain calculus is –
{ x1, x2, …, xn | P(x1, x2, …, xn)}
– x1, x2, …, xn represent domain variables
– P represents a formula similar to that of the predicate calculus
• Free and bound Variables-(rules same as tuple
calculus)
• E.g. Get emp no in relation emp
EX where emp (empno: EX)
• Get emp no for job ’Clerk’
EX where emp (empno: Ex, job=‘Clerk’)
Comparison of Relational Algebra and Relational Calculus
RELATIONAL ALGEBRA
RELATIONAL CALCULUS
1. It is a procedural method of 1. It is a non procedural method
solving the queries.
of solving the queries.
2. The solution is obtained by 2. The solution is obtained by
stating what is required and
what are the steps to obtain
that information.
stating what is required and
letting the system find the
answer.
3. It is used as a vehicle for 3. Relational Calculus queries
implementation
Relational Calculus.
4. Two
of
types-Traditional set
operators,
Special
operators.
are converted into equivalent
relational
algebra format
using
codd’s
Reduction
algorithm and than it is
implemented using relational
operators.
4. Two types- Tuple oriented ,
Domain oriented.
Relational Database Design
Normalization
Normalization
• A design technique used to design Relational Model.
• Can be defined as a process during which redundant
relation schemas are decomposed by breaking up their
attributes into smaller relation schemas that possess
desirable properties.
• Basic objective of normalization is to reduce
redundancy.
• Storing information several times leads to –
– Wastage of storage space
– High cost
– Inconsistencies
Properties of a Normalized Relation
• No data values should be duplicated in different rows
unnecessarily.
• A value must be specified for every attribute in a row.
• Each relation should be self- contained. In other words, if
a row is deleted, important information should not be
accidentally lost.
• When a row is added to a relation, other relations in a
database should not be affected.
• A value of an attribute in a tuple may be changed
independent of other tuples in the relation and other
relations.
Types of Normal Forms
•
•
•
•
•
•
First Normal Form(1NF)
Second Normal Form(2NF)
Third Normal Form(3NF)
Boyce- Codd Normal Form (BCNF)
Fourth Normal Form(4NF)
Fifth Normal Form(5NF)
5NF
4NF
BCNF
3NF
2NF
1NF
Dependencies
• Functional Dependency (FD):- An association between
two attributes of the same relational database.
• One attribute is called determinant and the other is
called determined.
• For each value of the determinant there is associated
one and only one value of the determined.
• If A is the determinant and B is determined ,then we
can say that A functionally determines B and is
represented as A → B.
For each value of A ,there is only one value
of B
A
B
1
1
2
4
3
9
4
16
Fully Functional Dependency (FFD)
• Is defined as attribute Y is FFD on attribute X if it is FD
on X and not FD on any proper subset of X.
• E.g. (A,B) → C
X
Y
• C depends on both A & B and not on A or B.
Multivalued Dependency(MVD)
• MVD is when one attribute value is potentially a multi
valued fact about another.
• If a relation R having A,B,C as attributes, B and C are
multivalued facts about A which is represented as
A→→B and A →→C, then multivalued dependency exist
only if B and C are independent of each other.
Transitive Dependency
• Assume that A,B,C are the attributes of a relation R.
If in a relation
A→B
B→C
Then A → C
• This is transitive dependency
First Normal Form
• A relation is in 1NF if and only if all underlying domains
contain atomic values or single value only.
Course
code
Course
name
Teacher
name
Roll no
Name
System
_used
Hourly
_rate
Total_
hrs
C1
Visual
Basic
ABC
100
A1
P-1
20
7
101
A2
P-II
30
3
Oracle
DEF
100
A1
P-I
20
7
104
A7
P-III
37
3
106
A7
P-IV
40
3
C2
C3
C++
KJP
Table -1 Unnormalized Relation
1NF
• Two approaches used to convert a relation to 1NF– Flattening the table
– Decomposition of the table
•
Flattening the table:- It removes repeating groups by filling in the
“missing” entries of each incomplete row of the table with copies of
their non – repeating attributes.
Course
code
Course
name
Teacher
name
Roll no
Name
System_us
ed
Hourly_
rate
Total_h
rs
C1
Visual Basic
ABC
100
A1
P-1
20
7
C1
Visual Basic
ABC
101
A2
P-II
30
3
C2
Oracle
DEF
100
A1
P-I
20
7
C2
Oracle
DEF
104
A7
P-III
37
3
C3
C++
KJP
106
A7
P-IV
40
3
Table 2 Normalized relation
1NF (Contd…)
2. Decomposition of the table:- In this original table is
decomposed into two new tables.
• Involves separating the attributes of the relation to
create the schemas of two new relations.
• Before decomposing the original table, it is necessary to
identify an attribute or set of attributes that can be
used as table identifier.
• Rule of decomposition
– One of the two tables contains the table identifier of
the original table and all the non- repeating
attributes,
– The other table contains a copy of the table
identifier and all the repeating attributes.
1NF (Contd…)
Cours
e code
Course
name
C1
Visual Basic ABC
C2
Oracle
DEF
C3
C++
KJP
PK
Table 4
Course_student
Teacher
name
Course
code
Table 3 Course
Roll
no
Name
Syste
m_us
ed
Hourl
y_rat
e
Total
_hrs
C1
100
A1
P-1
20
7
C1
101
A2
P-II
30
3
C2
100
A1
P-I
20
6
C2
104
A7
P-III
37
3
C3
106
A7
P-IV
40
3
PK
Second Normal Form
• A relation is said to be in 2NF if and only if it is in 1NF and every
non - key attribute is fully dependent on the primary key.
• Table 4 Course_student is in 1NF but not in 2NF because non key
attributes name, system_ used and hourly_ rate are not fully
dependent on PK (Course code, Rollno).
• To convert the above table in 2NF ,Rule is
The
A*
B*
C
D
A*
B*
+
A*
D
C
first relation contains the PK and the attributes that are fully
dependent on the PK.
Second relation contains the attributes that are partially dependent on
the key and the key attribute on which it is dependent.
2NF(Contd…)
• Transformation of Course_student into 2NF
Course
Code
Roll
no
Total
_hrs
C1
100
7
C1
101
3
C2
100
6
C2
104
3
c3
106
3
Table 6
System
Charge
Contains
attributes that
are partially
dependent
Table 5 Hours_assigned
Contains attribute Total_hrs which is fully
dependent on PK (Course code, Rollno)
Roll no
Name
System_used
Hourly_rate
100
A1
P-I
20
101
A2
P-II
30
104
A7
P-III
37
106
A7
P-IV
40
Third Normal Form
• A relation is said to be in 3NF if it satisfies the following
conditions simultaneously –
–
R is already in 2NF
– No non prime attribute is transitively dependent on the key i.e
no non prime attribute functionally determines any other non
–prime attribute.
• Table 6 System charge is not in 3NF because there is
transitive dependency of a non prime attribute on the
PK of the relation.
• In table 6 ,non-prime attribute Hourly_rate is
transitively dependent on the pk rollno through FD.
Rollno → System_used
System_used → Hourly_rate
=>
Rollno → Hourly_rate
This is transitive dependency
3NF(Contd…)
• Rule to convert to 3NF
A*
B
C
A*
B*
+
B
C
Table 6 can be represented in 3NF by dividing it into two relations.
First relation contains the pk and the attributes that are
dependent
on it.
Second relation contains the non-prime attribute and another
non- prime attribute on which it is FD and making it as pk.
3NF(Contd…)
Rollno
Name
System
_used
100
A1
P-I
101
A2
P-II
104
A7
P-III
106
A7
P-IV
PK
Table 8 Charges
Table 7 System used
System_used
Hourly_rate
P-I
20
P-II
30
P-III
37
P-IV
40
PK
Boyce Codd Normal Form
• BCNF states that –
A relation R is said to be in BCNF if and only if every
determinant is a candidate key.
• Consider table 6, System Charge (Course code, Rollno,
Name, System_used, Hourly_rate, Total_hrs)
(Course code, Rollno) → Total_hrs
Rollno → (Name, System_used, Hourly_rate)
Rollno is a determinant but not a candidate key.
System_used is again a determinant but not unique.
So, Table 4 is not in BCNF.
BCNF (Contd…)
• To make it in BCNF ,divide it into three tables• First table Includes Course_time (Course code,
rollno,Total_hrs)
• Second table includes Student_system (Rollno,
system_used, Hourly_rate)
• Third includes Charge (System_used, Hourly_rate)
Difference between 3NF and BCNF
• A table in 3NF has to be in 2NF but in BCNF, a relation
need not be in 2NF or 1NF.
• BCNF is stronger definition of 3NF.
Fourth Normal Form
• A relation is said to be in 4NF if and only if– If R is already in 3NF or BCNF
– If it contains no multi valued dependencies.
• Two things must be noted– Firstly, in order for a table to contain MVD, it must
have 3 or more attributes.
– It is possible to have table containing two or more
attributes which are independent multi valued facts
about another attribute.
4NF(Contd…)
• Consider a relation
Course
Student_name
Textbook
Physics
Ankit
Mechanics
Physics
Ankit
Optics
Physics
Rahat
Mechanics
Physics
Rahat
Optics
Chemistry
Ankit
Organic
Chemistry
Ankit
Inorganic
English
Raj
Literature
English
Raj
Grammar
Table 9 Course_student_book
MVDs in the above table are-
Course →→ Student_name
Course →→ Textbook
4NF(contd…)
• Problems with MVD
– If a new student join the physics course, then we have to make
two insertions for that student in the database which is equal
to the no. of physics books. And if there are hundred textbooks
for a subject?
– Sly, if a new book is introduced for a course , then again we’ll
have to make multiple insertions which is equal to the no. of
students for that course.
So, there is high degree of redundancy which will lead to update
problems.
Rule to transform a relation to 4NFA relation R having A,B,C as attributes can be no loss decomposed
into two tables R1(A,B),R2(A,C) if and only if MVD A →→B/C hold in
R.
4NF(Contd…)
• To convert table 9 into 4NF,two separate tables are
formed– Course_student → (Course, Student_name)
– Course_book → (Course, Textbook)
Course
Student_
name
Physics
Ankit
Physics
Rahat
Chemistry
Ankit
English
Raj
Table 10 Course_student
Course
Textbook
Physics
Mechanics
Physics
Optics
Chemistry
Organic
Chemistry
Inorganic
English
Literature
English
Grammar
Table 11 Course_book
Fifth Normal Form
• A relation is said to be in 5NF if and only if– R is already in 4NF
– It cannot be further non- loss decomposed.
• Non loss decomposition is possible because of the
availability of the join operator.
• In 5NF,consideration must be given to tables where
non-loss decomposition can be achieved by
decomposition into 3 or more separate tables.
• In join, a new table is formed which contains all the
columns from both the joined tables whose tuples are
defined by the restriction applied.
5NF(Contd…)
• E.g. consider a relation
Agent
Compan
y
Product
name
Suneet
ABC
Nut
Raj
ABC
Bolt
Raj
ABC
Nut
Suneet
CDE
Bolt
Suneet
ABC
Bolt
Table 12 It states that ABC
makes both Nuts and Bolts but
CDE makes only Bolts.
This table can be decomposed into 3 tables –
P1 (Agent, Company)
P2 (Agent, Product name)
P3 (Company, Product name)
5NF(Contd…)
• P1
Agent
P2
Company
Suneet
ABC
Suneet
CDE
Raj
ABC
Agent
P3
Company Product
name
ABC
Nut
ABC
Bolt
CDE
Bolt
Produc
t name
Suneet
Nut
Suneet
Bolt
Raj
Bolt
Raj
Nut
5NF(Contd…)
• Join P1 and P2
Agent
Join P4 with P3
Agent
Company Product_
name
Suneet
ABC
Nut
Suneet
ABC
Bolt
Suneet
CDE
Bolt
Raj
ABC
Bolt
Raj
ABC
Nut
Company Product_
name
Suneet
ABC
Nut
Suneet
ABC
Bolt
Suneet
CDE
Nut*
Suneet
CDE
Bolt
Raj
ABC
Bolt
Raj
ABC
Nut
Table P4
Correct recomposition of the
original table. So, the
original table is in 5NF.
Further Normal Forms
• Join dependencies generalize multivalued
dependencies
– lead to project-join normal form (PJNF) (also called
fifth normal form)
• A class of even more general constraints, leads to a
normal form called domain-key normal form.
• Problem with these generalized constraints: are hard
to reason with, and no set of sound and complete set
of inference rules exists.
• Hence rarely used
Distributed Database
Section-A
Distributed Databases
•
•
•
•
•
•
•
•
Definition of Distributed Databases
DDBMS ,its characteristics
Topology of DDBMS
Advantages and Disadvantages of DDBMS
Heterogeneous and Homogeneous Databases
Distributed Data Storage
Architecture of DDBMS
DDBMS Design
Distributed Databases
“A logically interrelated collection of
shared data physically distributed over a
network.”
DDBMS
• DDBMS stands for Distributed Database
Management System.
Characteristics of DDBMS
• A DDBMS has the following characteristics
– A collection of logically shared data
– The data is split into a number of fragments
– Fragments may be replicated
– Fragments/Replicas are allocated to sites
– The sites are linked by a communication network
– The data at each site is under the control of a
DBMS
– The DBMS at each site can handle local applications
autonomously
•
NOTE:-It is not necessary for every site in the system to have its own local
database.
Topology of DDBMS
Site 1
DB
Site 4
Computer
Network
DB
Site 2
DB
Site 3
Advantages of DDBMS
•
•
•
•
•
•
•
Reflects organizational structure
Improved share ability and local autonomy
Improved availability
Improved reliability
Improved Performance
Economics
Modular growth
Disadvantages of DDBMS
•
•
•
•
•
•
Complexity
Cost
Security
Integrity Control more difficult
Lack of standards
Lack of experience
Homogeneous Distributed Databases
• In a homogeneous distributed database
– All sites have identical software
– Are aware of each other and agree to cooperate in
processing user requests.
– Each site surrenders part of its autonomy in terms
of right to change schemas or software
– Appears to user as a single system
Heterogeneous Distributed Databases
• In a heterogeneous distributed database
– Different sites may use different schemas and
software
• Difference in schema is a major problem for
query processing
• Difference in software is a major problem for
transaction processing
– Sites may not be aware of each other and may
provide
only
limited facilities for cooperation in transaction
processing
Distributed Data Storage
• Replication
– System maintains multiple copies of data, stored in
different sites, for faster retrieval and fault
tolerance.
• Fragmentation
– Relation is partitioned into several fragments
stored in distinct sites
• Replication and fragmentation can be combined
– Relation is partitioned into several fragments:
system maintains several identical replicas of each
such fragment.
Data Replication
• A relation or fragment of a relation is replicated if it is
stored redundantly in two or more sites.
• Full replication of a relation is the case where the
relation is stored at all sites.
• Fully redundant databases are those in which every site
contains a copy of the entire database.
Data Replication (Cont.)
• Advantages of Replication
– Availability: failure of site containing relation r does
not result in unavailability of r is replicas exist.
– Parallelism: queries on r may be processed by several
nodes in parallel.
– Reduced data transfer: relation r is available locally at
each site containing a replica of r.
• Disadvantages of Replication
– Increased cost of updates: each replica of relation r
must be updated.
– Increased complexity of concurrency control:
concurrent updates to distinct replicas may lead to
inconsistent data unless special concurrency
control mechanisms are implemented.
• One solution: choose one copy as primary copy
and apply concurrency control operations on
primary copy
Data Fragmentation
• Division of relation r into fragments r1, r2, …, rn which
contain sufficient information to reconstruct relation r.
• Horizontal fragmentation: each tuple of r is assigned to
one or more fragments
• Vertical fragmentation: the schema for relation r is
split into several smaller schemas
– All schemas must contain a common candidate key
(or superkey) to ensure lossless join property.
– A special attribute, the tuple-id attribute may be
added to each schema to serve as a candidate key.
• Example : relation account with following schema
• Account-schema = (branch-name, account-number,
balance)
Horizontal Fragmentation of account
Relation
branch-name
Hillside
Hillside
Hillside
acc-number
A-305
A-226
A-155
balance
500
336
62
account1=branch-name=“Hillside” (account)
branch-name
Valleyview
Valleyview
Valleyview
Valleyview
account-number
A-177
A-402
A-408
A-639
balance
205
10000
1123
750
account2=branch-name=“Valley view” (account)
Vertical Fragmentation of employee-info Relation
branch-name
tuple-id
customer-name
Lowman
Camp
Camp
Kahn
Kahn
Kahn
Green
Hillside
Hillside
Valleyview
Valleyview
Hillside
Valleyview
Valleyview
1
2
3
4
5
6
7
deposit1=branch-name, customer-name, tuple-id (employee-info)
Acc-number
A-305
A-226
A-177
A-402
A-155
A-408
A-639
balance
500
336
205
10000
62
1123
750
tuple-id
1
2
3
4
5
6
7
deposit2=account-number, balance, tuple-id (employee-info)
Advantages of Fragmentation
• Horizontal:
– allows parallel processing on fragments of a relation
– allows a relation to be split so that tuples are located
where they are most frequently accessed
• Vertical:
– allows tuples to be split so that each part of the tuple is
stored where it is most frequently accessed
– tuple-id attribute allows efficient joining of vertical
fragments
– allows parallel processing on a relation
• Vertical and horizontal fragmentation can be mixed.
– Fragments may be successively fragmented to an
arbitrary depth.
Data Transparency
• Data transparency: Degree to which system
user may remain unaware of the details of
how and where the data items are stored in a
distributed system
• Consider transparency issues in relation to:
– Fragmentation transparency
– Replication transparency
– Location transparency
Architecture of DDBMS
• The reference architecture of DDBMS includes
– A set of global external schema
– A global conceptual schema
– A fragmentation schema and allocation schema
– A set of schemas for each local DBMS
• Global Conceptual Schema
– Is a logical description of the whole database (as if it
were not distributed)
– Contains entities, relationships, constraints, security
and integrity information.
• Fragmentation and Allocation schemas
– fragmentation is how data is logically
partitioned.
– Allocation schema is where data is to be
located
• Local Schemas
– each local schema has its own set of
schemas.
Reference Architecture of DDBMS
Global external
Schema
Global Conceptual
Schema
Fragmentation
Schema
Allocation
Schema
Local mapping
schema
Local Conceptual
schema
Local Internal
Schema
Local mapping
schema
Local mapping
schema
Local Conceptual
Schema
Local Conceptual
Schema
Local Internal
Schema
Database
Local Internal
Schema
Distributed Database Design
• The factors to be considered for the
design are:
– Fragmentation
• Horizontal Fragmentation
• Vertical Fragmentation
– Allocation
– Replication
• Design should meet the following
objectives
– Locality of reference
– Improved reliability and availability
– Acceptable performance
– Balanced storage capacities and
costs
– Minimal Communication costs
Database Security and Integrity
Database Security
• Database security is important because data in the
database is very valuable and sensitive.
• So data should be protected from
– Abuse
– Unauthorized access
– Update
• Importance of security in database environment
–
–
–
–
–
–
–
–
Data Security Risks
Data Tampering
Data Theft
Falsifying User Identities
Password related threats
Unauthorized access to tables and Columns
Lack of accountability
Complex user management requirements.
Security Levels
• To protect the database, we must take security
measures at several levels–
–
–
–
–
•
Physical Level
Human Level
Operating System
Network Level
Database System
Data Security Requirements
– Confidentiality
– Integrity
– Availability
Physical Level Security
• Protection of equipment from floods, power failure, etc.
• Protection of disks from theft, erasure, physical damage,
etc.
• Protection of network and terminal cables from wiretaps
non-invasive electronic eavesdropping, physical damage,
etc.
Solutions:
• Replicated hardware:
– mirrored disks, dual busses, etc.
– multiple access paths between every pair of devises
• Physical security: locks,police, etc.
• Software techniques to detect physical security breaches.
Human Level Security
• Protection from stolen passwords, sabotage,
etc.
• Primarily a management problem:
– Frequent change of passwords
– Use of “non-guessable” passwords
– Log all invalid access attempts
– Data audits
– Careful hiring practices
Operating System Level Security
• Protection from invalid logins
• File-level access protection (often not very helpful
for database security)
• Protection from improper use of “superuser”
authority.
• Protection from improper use of privileged machine
intructions.
Network-Level Security
• Each site must ensure that it communicate with
trusted sites (not intruders).
• Links must be protected from theft or modification of
messages
• Mechanisms:
– Identification protocol (password-based),
– Cryptography.
Database-Level Security
• Assume security at network, operating system,
human, and physical levels.
• Database specific issues:
– each user may have authority to read only part of
the data and to write only part of the data.
– User authority may correspond to entire files or
relations, but it may also correspond only to parts
of files or relations.
• Local autonomy suggests site-level authorization
control in a distributed database.
• Global control suggests centralized control.
Confidentiality
• Privacy of Communications
– DBMS should be able to control the spread
of personal information
– It should keep corporate data such as trade
secrets, proprietary info, competitive
analysis
secure
and
away
from
unauthorized access.
• Secure storage of sensitive data
– Integrity and privacy of confidential data
must be protected.
• Authentication
– System must maintain security of database
by verifying user’s identity.
• Authorization
– Authorization is second layer of security.
– It is the process through which system
obtains information about authenticated
user including database operations user
can perform and object that he can access.
Integrity
• It is concerned with the maintenance of the correctness
and consistency of the data.
• Integrity rules are
– Domain Integrity rule
• Is concerned with maintaining the correctness of
attribute values within relation.
– Entity Integrity rule
• Primary key attribute cannot have null values.
– Referential Integrity rule
• foreign key attribute can only have primary key
values or it can be null.
Availability
• A secure system makes data available to authorized
users ,without delay.
View –Another way to provide Security
Views provide security by restricting access to a set of rows
and columns of a table.
A View is a logical table based on one or more tables, providing
window to them.
Views themselves do not contain data.
They just act as a window through which contents of a table can
be viewed.
One can also manipulate the contents of the original table
through these views.
Syntax to create views in SQL
• Create or replace view <View name> as <select
operation>;
Uses of View
• Views restrict access to the database.
• Critical data in the database is safeguarded as access
to such data can be controlled using views.
• Views allow users to make simple queries like joining
of tables without even knowing how tom perform join.
• Views allow the same data to be seen in different ways
by different users.
• Views can be used to select insert or update data .All
the changes will be actually made in the base table.
Protecting the data within the Database
• Two methods b which the access control is done –
– Privilege
– Role
• Privilege- A permission to access a named object. A
privilege can be
– Database Privilege
– System Privilege
– Object Privilege
• Roles – A role is a mechanism that can be used to
provide authorization.
• Granting and Revoking Privileges
– Grant Command
– Revoke Command
Role of DBA to provide security to a system
• DBA plays an important role in enforcing
security.
• DBA deals with –
– Creating new accounts- Each new user or
group of users must be assigned an
authorized id and a password.
– Mandatory Control Issues- DBA must assign
security class to each database object and
security clearances to each authorization id
in accordance with chosen security policy.
Encryption
• Data may be encrypted when database
authorization provisions do not offer sufficient
protection.
• Properties of good encryption technique:
– Relatively simple for authorized users to
encrypt and decrypt data.
– Encryption scheme depends not on the
secrecy of the algorithm but on the secrecy
of a parameter of the algorithm called the
encryption key.
– Extremely difficult for an intruder to
determine the encryption key.
Data Encryption
• Encryption helps to protect information in certain
situation where the normal mechanisms of DBMS are not
adequate.
Intruder
Plain
text
Encryption
Method
Encryption Key
Plain
text
Decryption
Method
Cipher text
Decryption Key
This process is called Cryptography
Encryption (Cont.)
•
Data Encryption Standard (DES) substitutes
characters and rearranges their order on the basis of
an encryption key which is provided to authorized
users via a secure mechanism. Scheme is no more
secure than the key transmission mechanism since
the key has to be shared.
• Advanced Encryption Standard (AES) is a new
standard replacing DES, and is based on the Rindael
algorithm, but is also dependent on shared secret
keys
• Public-key encryption is based on each user
having two keys:
– public key – publicly published key used to
encrypt data, but cannot be used to decrypt
data
– private key -- key known only to individual
user, and used to decrypt data.
Need not be transmitted to the site doing
encryption.
Encryption scheme is such that it is impossible
or extremely hard to decrypt data given only
the public key.
Concurrency
What is a transaction?
• A set of changes that must all be made together.
• A program unit in which either all the operations should
be performed or none should be done
• It is necessary for consistency of the database.
Processes of Transaction
• Read operation- When value of a data object from disk
is copied to a program variable onto main memory.
• Write operation- After modification when copy of data
object is written to disk.
Transaction Properties
• ACID Properties
• Atomicity
• Consistency
• Isolation
• Durability
• Scheduling of Transactions
•
•
•
•
Complete schedule
Serial Schedule
Non –serial Schedule
Serializiable Schedule
Object Relational
DBMS (ORDBMS)
ORDBMS
• An object-relational database (ORD) or object-relational
database management system (ORDBMS) is a database
management system (DBMS) similar to a relational
database, but with an object-oriented database model:
objects, classes and inheritance are directly supported
in database schemas and in the query language. In
addition, it supports extension of the data model with
custom data-types and methods.
Definition
• An object relational database is also called an object
relational database management system (ORDBMS).
This system simply puts an object oriented front end
on a relational database (RDBMS).
• When applications interface to this type of database,
it will normally interface as though the data is stored
as objects.
• However the system will convert the object
information into data tables with rows and colums
and handle the data the same as a relational
database.
• Likewise, when the data is retrieved, it must be
reassembled from simple data into complex objects.
• ORDBMS was created to handle new types of data
such as audio, video, and image files that relational
databases were not equipped to handle. In addition,
its development was the result of increased usage of
object-oriented programming languages, and a large
mismatch between these and the DBMS software.
• ORDBMS are systems that “attempt to extend
relational database systems with the functionality
necessary to support a broader class of applications
and, in many ways, provide a bridge between the
relational and object-oriented paradigms.”
Why Extend Relational Data Model
To eradicate the following weaknesses
• Poor representation of ‘real world’ conceptual
model
– Usually the relational schema does not
correspond to real world entities
– Difficult to change schema without affecting
the applications;
• Semantic overloading
– The same relation is used to represent
entities as well as relationships
• Poor support for integrity and business rules
• Fixed number of attributes & all
attribute values must be atomic
• Limited operations
• Difficult to handle recursive queries
• Impedance mismatch (when SQL is
embedded in PLs)
• Short-lived transactions (strict locking
and recovery mechanisms)
ORDBMS Products
• IBM’s DB2 Universal Database system.
• Oracle Corporation and IBM have developed
product as Universal Database Servers.
• Informix’s Universal server that handles
applications such as text, video, sound and
large binary objects.
Advantages
• Easy to define complex objects and new
data types
• Reduce application development time
• Support object views
• Support powerful Query language. e.g.
SQL 3
• Simplicity of development
Disadvantages
• Complexity
• Immature
• Increased Costs
Object Oriented
Databases
OOD Concepts
•
•
•
•
•
•
•
•
•
Object
Object identity
Attribute
Object state
Methods and messages
Class
Abstraction and Encapsulation
Inheritance
Method overloading
Object
• An abstract representation of the real world entity that
has unique identity, embedded properties and ability to
interpret with other objects.
• Each object consist of two components- a state and a
behavior.
• State- is a condition of an object or set of circumstances
describing the object.
• Behavior- the operations that can be defined for an
object describe the behavior of the object with respect
to other objects.
Object identity
•
An object must have unique identity. It is
represented by an object ID(OID). The OID is assigned
by the system at the moment of object creation and
cannot be changed under any circumstances.
• OID is maintained by the system and not by the user.
• The OID can be deleted only if the object is deleted
and that OID can never be reused.
Attribute
• An attribute is a data value held by the object in a
class.
• Attributes are also known as instance variables .
• Each attribute has a unique name and a data type
associated with it.
• Attributes also have a domain.
Object State
• The object state is a set of values that the
objects attribute can have at a given time.
• E.g. state of electric light would be ON/OFF.
• State of an object can vary, but its OID
remains same.
• To change object’s state, change the values of
object’s attribute.
Methods and Messages
• A method is a code that performs a specific operation
on object’s data.
• Methods protect data from direct and unauthorized
access by other objects.
• Methods are equivalent to procedures in programming
languages.
• Every method is identified by a name and has a body.
• A message is a request to an object to invoke one of its
methods.
• A message therefore contains specification of the
receiver object, name of the method and arguments of
the methods.
Class
• A class is a collection of similar objects with shared
attributes, methods and common relationships to
other objects.
• Each object of a class is called an instance of a class.
• Each instance has its own values for each attribute
but shares the same attribute name and methods
with other instances of a class.
Abstraction and Encapsulation
• Abstraction is focusing on the essential aspects of an
entity and ignoring its unimportant details.
• Encapsulation consists of separating the external
aspects of an object ,which are accessible to other
objects ,from the internal implementation details of
the objects which are hidden from other objects.
• Encapsulation forms a sort of data independence.
Inheritance
• The ability of an object within a hierarchy to inherit
the attributes and methods of classes above it.
• Inheritance is sharing of attributes and methods
among classes based on hierarchical relationship.
• If there are two classes A and B .The objects of class B
have access to attributes and methods of class A
without the need to redefine them, then class A is
known as Super class and class B is known as Sub class.
Method overloading and Polymorphism
• Method overloading offers the possibilities to use
same name for different methods.
• Polymorphism allows different objects to response
to the same message in different ways.
• It is important feature of OO systems as its existence
allows objects to behave according to their specific
characteristics.
• To sum up, polymorphism means that same
operation may behave differently on different
classes.
OODBMS
In OODBMS,
• object oriented programming concepts such as
inheritance, encapsulation etc. are enforced in
addition to database management concepts such as
ACID properties
• which lead to
– System integrity
– Support for adhoc query language
– Secondary storage management systems.
OODBMS
Object – Oriented
Features
OO Concepts
OO data Model
OOPL
OODBMS
Persistence
Conventional
DBMS Features
Backup and Recovery
Transaction
Concurrency
Administration
Security and Integrity
Data Accessibility
OODBMS Products
•
•
•
•
‘Object Store’ by Design Inc.
Objectivity/DB’ by Objectivity Inc.
Versant OODBMS’ by versant software
‘Inter Systems’ cache OODBMS by Ajou
University Medical Center in South Korea.
• Oracle Berkeley DB by Oracle Corporation.
Applications of OODBMS
• Financial applications in risk management
• Telecommunication applications such as network
configuration management applications.
• Medical applications that handle digitized data such as
X-rays, Ultrasounds together with textual data used
for medical research and patient medical history
analysis.
• CAD and CAM
• Computer Assisted Software Engineering (CASE),
which are designed to handle very large amount of
inter related data.
Advantages
• Enhanced Modeling Capabilities
– With the introduction of objects, the data and its
behavior can be grouped as a unit.
– Also, the objects can interrelate with other objects
and form complex objects which cannot be handled
by conventional data models.
• Applicability to advanced database Applications
– Conventional databases simply lack the ability to
provide efficient applications in CAD, CAM, Medical
Imaging etc.
• Extensibility
– The database system comes with a set of predefined types, an OODBMS allows new data
types to be defined from the existing data types.
• Improved performance
– OODBMS provides performance improvements
compared to RDBMS when managing complex
objects.
• Faster application development time
– It is obtained
reusability.
through
inheritance
and
Disadvantages
• Lack of theoretical foundation
– The OODBMS is based on object model which lacks the
solid theoretical foundation of the relational model on
which the RDBMS is built.
• Competition
– OODBMS’s face strong and effective opposition from the
firmly established RDBMS.
• Lack of standard Tools
• Lack of experience
• Lack of standard adhoc query language
•
•
•
•
Lack of support for views
Lack of foreign key concepts
Lack of support for security
Complexity
Object Query Language (OQL)
• Declarative query language
– Not computationally complete
• Syntax based on SQL (select, from, where)
• Additional flexibility (queries with user
defined operators and types)
Example of OQL query
The following is a sample query
“what are the names of the black product?”
Select distinct p.name From products p
Where p.color = “black”
Valid in both SQL and OQL, but results are
different.
Result of the query (SQL)
Original table
Product
no
Name
Color
P1
P2
P3
Ford Mustang Black
Toyota Celica Green
Mercedes SLK Black
RESULT
Name
Ford Mustang
Mercedes SLK
The statement queries a
relational database.
=> Returns a table with
rows.
Result of the query (OQL)
Original table
Product no Name
Color
P1
P2
P3
Black
Green
Black
Ford Mustang
Toyota Celica
Mercedes SLK
RESULT
String
Ford
Mustang
String
Mercedes
SLK
The statement queries a
object-oriented database
=> Returns a collection
of objects.
Comparison
Queries look very similar in SQL and
OQL, sometimes they are the same
In fact, the results they give are very
different
Query returns:
OQL
SQL
Object
Collection of
objects
Tuple
Table
Recovery of Data
Causes of Failure
•
•
•
•
•
•
•
•
•
System Crash
User Error
Carelessness
Sabotage (Intentional Corruption)
Statement Failure (Inability to execute SQL
Statement)
Application S/W failure (Logical Error)
Network Failure
Media Failure (Hard Disk crash)
Natural Physical disasters (fire, floods, earthquake)
Recovery from failures
• First we need to identify failure modes.
• Next must consider how these failures affect the
contents of the database.
• Then propose the recovery algorithm to recover the
database.
• Recovery algos perform two types of actions– Actions taken during normal transaction to ensure
enough information exists to allow recovery from
failure.
– Actions taken following a failure to recover the
database to a state that is known to be correct.
Terminology used in recovery process
• Disk is partitioned into fixed length storage units
called Blocks.
• Blocks are the units of data transfer to and from disk.
• When a transaction starts the data required for the
transaction is transferred from the disk to main
memory in blocks.
• When all the operations are performed on the block
present in main memory, then the block is transferred
from main memory to disk for final storage of results.
• The blocks residing in main memory are referred to as
buffer block.
• The blocks residing on the disk are called Physical
block.
Contd…
Buffer
Block
A
A
Main Memory
Physical
Block
Disk
Block movements between disk and main memory are
categorized into two operations•Input operation- Transfer of physical block to main memory
•Output operation- Transfer of buffer block in main memory to disk
Recovery and Atomicity of a transaction
•
Consider a banking transaction that transfer Rs.50 from account A to
account B with initial values of A and B being Rs.1000 and Rs.2000
respectively.
•
T1
Read (A, a)
A=a-50
Write (A, a)
Output (A)
Read (B, b)
B=b+50
Write (B, b)
Output (B)
•
Suppose that a system crash occurs
during the execution of transactions
T1 after output (A) has taken place
but before output (B) has executed.
Since output (A) operation was
performed successfully ,it means
that data item A has value of 950 but
since output (B) is not performed so
its value remain 2000.
System enters into inconsistent state because there is a loss of Rs.50
after execution of transaction T1.
To recover from above problem two
methods are• Re-execute T1- If transaction T1 is re-executed
successfully then the value of A is Rs.900 and B is
Rs.2050.It again results in incorrect information as
A’s correct value as 950 and not 900.
• Do not Re-execute T1- The current state of
system is A=950 and B=2000 which again is an
inconsistent state.
• So, when simple recovery schemes do not work, we
use
– Log based Recovery
– Shadow Paging
Log Based Recovery
• A log file is maintained for recovery purpose.
• Log File is a sequence of log records
• Log record maintain a record of all operations of the
database.
• There are several types of Log records– <Start>log record
– <Update>log record
– <Commit> log record
– <Abort> log record
Types of Log records
• <Start> log record
– It contains about the start of each transaction.
– It includes transaction identifier. Transaction identifier is
the unique identification of the transaction.
– <Ti, start>
• <Update > log record
– It describes a single database and has the following fields.
– <Ti,Xj,V1,V2>
Ti-Transaction identifier, Xj- Data
item,V1-Old value,V2- New value
• <Commit> log record
– When a transaction successfully commits or completes, <Ti,
commit > log record is stored.
• <Abort> log record
– When a transaction Ti is aborted due to some reason, a <Ti,
abort> log record is stored in log file.
Techniques for Log based Recovery
• Two techniques– Deferred Database Modification
– Immediate Database Modification
• Deferred Database Modification
– It ensures transaction atomicity by recording all database
modifications in log but deferring the execution of all write
operations until the transaction partially commits.
– A transaction is said to be partially committed once the final
action of the transaction has been executed.
– It means that during write operation the modified values of
local variable are not copied into database items but the
corresponding new and old values are stored in the log file.
• When the transaction successfully performs all its
operations, then the information stored in log record is
used to set the values of data items.
• But if the transaction fails to complete, then the data item
retain their old values and information in log record is
simply ignored.
Recovery Scheme
• Redo (Ti)- It sets the value of all data items updated by the
transaction Ti to the new value from the log of records.
• After a failure has occurred , the recovery subsystem
consults the log to determine which transaction needs to
be redone.
• Transaction Ti needs to be redone if and only if the log
contain both records <Ti, Start> and <Ti, Commit>.
• In other cases, ignore the log record and re-execute the
transaction.
Immediate Database Modification
• It allows database modification to be output in the
database while the transaction is still in the active state.
• If a system crashes or transaction aborts then the old value
field of the log record is used to restore the modified data
items.
• This restoration is accomplished by undo operation.
Recovery Scheme
• Undo (Ti)- It restores the values of all data items updated
by transaction Ti to old values.
• After a failure had occurred, the recovery scheme
determine which transaction needs to be undone.
• A transaction needs to be undone if the log record contains
record <Ti, Start> and not <Ti, Commit>.
• Redo (Ti)- As discussed before
Shadow Paging
• Two types of recovery concepts– In-place updating- In this, system writes the data
item in the same location, thus overwriting the old
value of the data item on disk.
– Thus there is a single copy of each data item
maintained on disk. This technique is used in Log
based recovery.
– Shadowing- In this, system writes new data item
at different disk location. Thus there are multiple
copies of a data item maintained on disk. This is
known as Shadow Paging.
• Database is partitioned into some no. of fixed length
blocks, which are referred to as Pages.
• Suppose there are n pages numbered 1 through n which
do not need to be stored in particular order on disk.
• A page table is maintained for each page n disk. This
helps in finding ith page of the database as each entry
contains a pointer to a page on disk.
• The first entry contains a pointer to first page and so
on.
• The key idea behind shadow paging is to maintain two
page tables during the life of a transaction- the current
page table and the shadow page table.
• When the transaction starts both tables are identical.
• The shadow page table is never changed over the
duration of the transaction .
• The current page table may be changed when a
transaction performs a write operation.
Write operation
•
•
Suppose a transaction performs a write (x) operation
and that x resides on the ith page.
Write operation is executed as follows1. If the ith page is not main memory ,then first
bring that page from disk to main memory (Input
operation).
2. If it is the first write operation on the ith page by
a transaction, then1. First find an unused page on disk.
2. Delete the page found from list of free pages.
3. Modify the current page table such that ith
page points to new page.
4. Assign new modified value to that page.
Fig. showing Shadow paging
1
2
3
4
5
6
7
Shadow Page table
1
2
3
4
5
6
7
8
Pages on disk
1
2
3
4
5
6
7
Current Page
table
Example:A
• T1
Read (A, a)
B
a= a-100
C
Write (A, a)
Read (B, b)
b=b+ 100
Write (B, b)
• T2
Shadow Page
Read (C, c)
Table
C=C-200
Write (C, c)
[A=1000,B=2000,C=3000]
***
1000
3000
***
A
B
C
2000
Pages on
disk
Current Page
Table
Structured Query Language
Oracle • The Oracle corporation is world’s leading supplier for information
management and world’s second largest dependent S/W company.
• Oracle is the product of Oracle corporation and is the top selling
multi-user DBMS.
• First released in 1970s by Larry Elision & his team and organization
software development laboratories (SDL).
• In 1984,oracle was ready to run over PCs released by companies
such as IBM and Apple.
• In 1985 and 1986,oracle released its versions 5.0 and 5.1 of its
RDBMS.
• In 1992,after intense research & development and testing oracle
released version 7.
• In 1997,released oracle 8.
• In 1999 oracle 8i was released with added functionality.
• Various new versions like Oracle 9i,10g,11i are being widely used.
Products of Oracle 8i
• Four main products of 8i are– Oracle 8i- Oracle serves for small no. of users
and small database size.
– Oracle 8i Enterprise Edition- For large no. of
users or large database size.
– Oracle 8i Personal Edition- Single version of
Oracle.
– Oracle 8i Lite – Light weight database engine
for mobile computing on notebooks.
Features of Oracle 8
•
•
•
•
•
•
•
•
Client/Server Architecture
Large database & space management
Concurrent Processing
High transaction processing performance
Controlled availability
Manageable Security
Portability
Compatibility
SQL
• To communicate with Oracle engine, User needs
some language.
• The most acceptable language is ANSI SQL
(Structured query language)
• Oracle includes interactive SQL tool called SQL * plus
which allows the users to enter ANSI SQL sentences
and pass them to oracle engine for execution.
• These sentences allow the user to create ,access and
maintain data structure like tables, indexes etc.
Steps to invoke SQL * Plus/SQL
• Start your Operating System normally.
• Click on the Start button and then on
programs. It displays the list of
programs.
• Log
Click
on
Oracle
95
and
the
select
SQL*
On
Plus.
UserName
: is prompted for oracle login
• The
user
Scott
Password
:
ID
and password
so
that
user
may
get
Tiger
Host String :
connected to oracle.
OK
Cancel
Contd…
• Once Login id and password are entered ,an SQL
prompt appears.
• At this point requesting user is connected with the
oracle engine.
• User can interact by giving its queries to the oracle
engine for the execution.
SQL Fundamentals
• SQL is a simple and powerful language used to
create ,access and manipulate data and structure in
the database.
• SQL is like plain English, easy to understand and
write.
• SQL is not case sensitive.
SQL Statements
• Data Definition Language (DDL) – Create, alter, drop,
rename
• Data Manipulation Language (DML) – Delete, Insert,
Select, Update
• Data Control Statements (DCL)– Grant ,Revoke
• Transaction Control Statements (TCL)- Commit, Rollback,
Savepoint
Data Types
• The most commonly used data types are:–
–
–
–
–
–
Char
Varchar or Varchar2
Number
Date
Long
Long raw/raw
• Char(n) – Used to store character strings of fixed size.
Size of the character string is determined by numeric
value of n. This data type can hold maximum of 255
characters.
• Varchar(n)/Varchar2(n) – Used to store variable length
alphanumeric data. It can store maximum of 2000
characters.
Contd…
• Number (p, s) – Used to store numbers fixed or
floating point. The precision p determines the length
of the data while Scale (S) determines the number of
places after the decimal.
• Date – This data type stores date values in a special
format internal to oracle. The default format in which
date is stored is DD-MON-YY .
• Long – Used to store up to 2 gigabytes of
alphanumeric text data.
• Long Raw / Raw – used to store graphics & sound files
when used in conjunction with long to form long raw.
Creating & Managing Tables
• A table is composed of rows & columns.
• A table can represent a single entity.
• A table must have unique name through which it can be
referred.
• A table has no. of characteristics. The characteristics of a
table are called its attributes.
Table creation rules
• Table names and column names must begin with a letter
and can be 1-30 characters long.
• Names must contain only the characters A-Z,a-z,09,_(underscore),$, #.
• Name must not be an oracle server reserved word or SQL
reserved word.
• Names must not duplicate the names of other objects.
• Table name is not case sensitive.
Constraints
• Constraints are the restrictions that are enforced so that
the data stored in the database is valid and prohibits
wrong data to be entered into the database.
• SQL will reject any value that violates the criteria that was
defined.
• Categories of constraints– Column level Constraint- applied on individual columns.
– Table constraints – applied to more than one column of a table.
• Types of Constraints–
–
–
–
–
Not Null Constraint
Unique Constraint
Primary Key Constraint
Check Constraint
Default constraint
Types of Constraints
• Not Null Constraint
– It prevents the column from accepting Null values.
– This can be applied only at Column level.
– Syntax- Columnname data type (size) Not Null
• Unique Constraint
– This allows unique values to be entered into the column i.e.
every value is a distinct value in that column.
– When we apply unique constraint on a group of columns ,then
each value of the group should be unique.
– A table can have more than one unique key.
– Syntax- Column level Constraint
• Columnname data type (size) Unique;
– Table level Constraint
• Columnname1 data type (size),columnname2 data type
(Size),
Unique (Columnname1, Columnname 2));
Contd…
• Primary Key Constraint
– Is used to identify each row of a table uniquely.
– A pk may be a single column or group of columns.
– It is a combination of both the unique key and not null
constraint.
– When pk is applied on a single column it is called as simple
key.
– When applied on multiple columns it is called as composite
primary key.
– Syntax- column level
Columnname data type (size) primary key
– Table level
Columnname1 data type (size),columnname2 data type (Size),
primary key (Columnname1, Columnname 2));
Contd…
• Check constraint
– It allows Oracle to verify the validity of data being entered on a
table against a set of constant values.
– It consists of keyword CHECK followed by parenthesized
conditions.
– Check Constraint must be specified as a logical expression that
evaluates either to be true or false.
– If an SQL statement causes the condition to be false an error
message will be displayed.
– Syntax:Columnname data type (size) check (logical expression)
• Default Constraint
– It allows the user to insert values in the columns where the user
do not want to insert the value.
– Syntax
Columnname data type (size) default value;
Contd…
• Foreign Key Constraint
–
–
–
–
–
–
–
–
–
It helps to establish a relationship among tables.
A fk may be single column or the combination of columns which
derive their values based on primary key or unique key values
from another table.
A fk is also known as referential integrity constraint.
A table in which there is a foreign key is called detail table or the
child table.
And the table to which it refers the values of the pk is called
master table or parent table.
A fk must have corresponding primary key in master table
Whenever you insert a value in the foreign key column, it checks
the value in the primary key of the parent table.
If value is not present ,it gives an error.
SyntaxColumnname data type (size) references table name (column
name) [on delete cascade]