Infrastructure
Download
Report
Transcript Infrastructure
An Introduction to DBMS
Technology
Yelena Yesha
Olga Streltchenko
1
Presentation Overview
Database functionality
Relational Data Model
SQL
Web-Database Connectivity
2
Transaction
A transaction is an exchange of:
Information;
Goods;
Services;
Currency/currencies;
Transaction properties
Atomicity: a transaction must be all or nothing.
Consistency: a transaction takes the system
from one consistent state to another.
Isolation.
Durability.
3
Databases and Information
Economy
Shift from computation to information;
corporate computing;
personal computing and the Internet;
scientific computing.
Growing importance of transaction-orientation
and information retrieval.
The database field concentrates on the
efficient management of large amounts of
persistent, reliable shared data.
4
Database versus File System
Database management systems (DBMS)
is a software that provides transaction
support by implementing
Data independence;
Data access efficiency;
Concurrency control ;
Data integrity;
Reliability;
Security;
Data distribution and heterogeneity.
5
Data Independence and
Access Efficiency
DBMS allows to avoid rewriting all access routines
every time the data format changes or data is
added/modified/deleted.
insulate applications from data storage details.
Logical independence: protection from changes in
logical structure of data.
Physical independence: protection from changes
in physical structure of data.
DBMS maintains data structures and implements
algorithms allowing to avoid linear search
indexing: search in O(log n);
fast access even on complex data queries.
6
Concurrency Control and
Data Integrity
Interleaving actions of different applications
boosts performance.
DBMS insures semantically correct access to
the same data by concurrent applications
two programs accessing the same data at the
same time can result in an inconsistent update;
implement sharing in a controlled manner.
Data semantics may require certain constraints
to be satisfied.
DBMS guarantees that application programs
comply with the constraints when
adding/modifying the data.
7
Reliability and Security
DBMS provides techniques for recovery
from software and hardware failures
guarantee survival of the data across
catastrophes.
DBMS prevents unauthorized users
from accessing/modifying data or
denying service to other users.
8
Data Distribution and
Heterogeneity
Centralization is the enemy of scalability
a vast number of modern applications are
distributed.
Data sharing in a distributed environment is
a challenge.
Heterogeneity applies to networks, hardware,
operating systems, programming languages,
data formats, etc.
Distributed applications must mask the
differences.
Need distributed data management.
9
Categories of Data Models
High-level or conceptual
entities, attributes, relationships.
Representational or implementation or
logical
relational, network hierarchical, objectoriented, object-relational.
Physical or low-level
data storage.
10
Levels of Abstraction in a
Database
Schema versus Instance
schema = description of the data that
captures data types, relationships, constraints on the
data;
l
meta-data (data about data), knowledge, e.g.,
Employees(EmpName, EmpNo, Dept, Sal)
is independent of any application program
changes infrequently
instance = set of records/tuples/rows for that schema,
the actual data in the database at a given time
time-varying
e.g., <Jane, 201, Shoe, 1M>,<Susan, 302, Toy, 1M>
11
3-schema Architecture
Physical level description of a database:
how things are stored on disk:
files, record structures,
indices,
data structures for disk blocks,
methodology for dealing with too long records, etc.
Conceptual level description of a database
The description of application data (its schema)
using one of the traditional data models.
12
3-Schema Architecture
(cont'd)
View-level description of a database
What users of a particular application see
their own customized schema, e.g., for payroll, for the
ticket agent, for a simulation program.
Multiple levels
helps with data independence;
helps with maintenance.
Many views, single logical and physical
schema.
Levels of abstraction give data
independence.
13
The Entity-Relational
Model
Entity: a distinguishable object.
Entity set: a set of entities all of the same
type.
Attribute: a single property of an entity;
simple vs composite;
single-valued vs multi-valued;
stored vs derived;
null values.
Domain: set of values permitted for that
attribute.
14
The E-R Model (2)
Relationship: an association between two or
more entities.
Relationship set: a set of relationships all of the
same type
There is no correct schema for a batch of data.
Which schema is best depends on the
application.
Many basic data modelling choices depend on
an understanding of the application.
15
Data Model
Data model: notation for describing data,
plus a set of operations used to
manipulate that data.
a set of primitives for defining the structure of a
DB;
a set of operations for specifying the retrievals
and updates on a DB;
relational, hierarchical, network, object-oriented.
16
The Relational Model
(Codd 1970)
The relational data model is the most
important data model currently existing.
Value-oriented, i.e., allows operations on
relations whose results are relations, thus
enables to combine operations.
As opposed to object-oriented models, in which
Operations cannot be applied to the result of other
operations;
The result of an operation may be a new data type, and
operations may not be available for this type.
17
Definitions: Domain &
Relation
A domain is a set of atomic values.
A relation is a finite subset of the cartesian
product of a finite list of domains;
relation is a set of tuples;
order of tuples is irrelevant and
no relation has 2 identical tuples;
each tuple value is atomic
no composite attributes;
no multi-valued attributes.
18
Relational Model (cont’d)
Everything is represented by relations
– Formally: Given sets D1, D2, ....Dn (not necessarily distinct), a
relation R D1 X D2 X ...X Dn
– Di 's are the domains and n is the arity (degree) of R
– elements of R are called tuples
– number of tuples in R is the cardinality of R
relational data model helps to view a relation as a table
Observe the following properties:
19
Relational Model (cont’d)
Everything is represented by relations;
Given sets D1, D2, ....Dn (not necessarily distinct), a relation R
D1 X D2 X ...X Dn;
Di 's are the domains and n is the arity (degree) of R;
elements of R are called tuples;
number of tuples in R is the cardinality of R.
Relational data model helps to view a relation as a table:
each row represents a tuple (record);
each column represents an attribute (field).
Properties:
no two rows are identical;
the ordering of tuples is unimportant;
the ordering of columns is important.
20
Keys
Let R be a relation schema and K R.
K is a superkey of R if it can uniquely identify any
tuple in any r(R). There are no tuples t and t' such
that t[K] = t'[K}.
K is a candidate key if K is a minimal superkey.
There is no K' K such that K' is also a superkey of
r(R)
A primary key is one of the candidate keys,
remaining candidate keys are alternate keys;
Every relation has a key.
A key is a property of a relation schema, not a
relation.
21
Integrity Constraints
Relational database schema is a set of relation schemas
and a set of integrity constraints
Integrity constraint: condition that must be true for any
instance of a database.
Integrity constraints are expected to hold on every
database instance of the schema
Integrity constraints
Structural:
key constraint: uniqueness of keys;
entity integrity constraint: no primary key value can be null;
referential integrity constraint: reference from relation R to
relation S must refer to an existing tuple of S
Semantic.
22
Foreign Keys
A set of attributes (FK) of R is a foreign key if
the attributes in FK have the same domain as the
primary key (PK) attributes of another relation S, and
for each instance of R, the values of FK occur as a
value of PK for some instance in S, or is null.
In the relational model, the only way an entity can
reference another entity is through the value of
the primary key of the second entity.
Foreign keys don't have to be unique or non-null,
but if one component is null, then all components
must be null.
23
E-R to Relations (I.e.,
Defining Relations)
Done using DDL (Data Definition Language)
Name whole database schema
Declare domains for attributes
Define relations:
name
attribute names and domains
primary and other keys
foreign keys
24
Translating from E-R
Represent entity set E by a relation whose
attributes are all the E-R attributes of E. Then
each tuple represents one entity of E.
To represent relation R between entity sets E1,
…, Ek, create relation R with key attributes of E1,
…, key attributes of Ek, as attributes (rename
duplicates). Each tuple of the relation represents
one combination of entities that are related to
one another.
You might have some redundant relations, which
you can delete.
25
Schema Normalization
Formal theory of database design
based on grouping attributes in a particular way
using attribute dependencies to achieve ‘good’
schemas
1NF, 2NF, 3NF, BCNF, 4NF, …
Goal:
don’t store redundant information
can represent everything (otherwise, the schema is
useless!)
26
Query and Update
Languages
DDL : data definition language
used by DBA
to define schemas, create views, create
indices
DML : data manipulation language
used by sophisticated casual user
to query data or
update data
27
Relational Query Languages
Query languages allow manipulation and retrieval of data from
a database.
Relational model supports simple, powerful query languages;
strong formal foundation based on logic;
allows for optimization.
Two mathematical languages form the basis for relational
languages (e.g., SQL) and for implementation:
Relational Algebra: More operational, useful for representing
execution plans;
Relational Calculus: Lets users describe what they want, rather
than how to compute it (non-operational, declarative).
Basic operations:
selection, projection, cross-product, set-difference, union,
intersection, join, division
28
SQL
SQL (Structured Query Language) is the query
language for the System R developed at IBM San
Jose [Astraham, Gray, Lindsay, Selinger ..]
SQL is now the query language for IBM's DB2 and
the de-facto standard on most commercial RDBMS.
SQL is a comprehensive language providing
statements for data definition, query and update.
Hence it is both DDL and DML.
29
SQL (cont’d)
SQL allows to create views, it can be embedded in a
general-purpose programming language (C or
PASCAL).
SQL has one basic statement for retrieving data from
the database:
the SELECT statement
SELECT <attribute list>
FROM <table list>
WHERE <condition>
Standards:
SQL or SQL1 (ANSI 1986);
SQL2 or SQL-92 (ANSI 1992);
SQL3 underway: extends SQL with OO and other concepts.
30
SQL Data Types
Numeric
Integers of various ranges: INTEGER (or INT), SMALLINT;
Real numbers of various precision: FLOAT, REAL, DOUBLE
PRECISION;
Formatted numbers: DECIMAL(i,j) or DEC(i,j) or
NUMERIC(i,j).
Character Strings
Fixed length n: CHAR(n) or CHARACTER(n);
Variable length of maximum n: VARCHAR(n) or CHAR
VARYING(n) (default n =1).
Bit strings:
Fixed length n: BIT(n);
Varying length of maximum n: VARBIT(n) or BIT VARYING(n).
31
SQL Data Types (cont’d)
Date & Time [SQL2]:
DATE (10 positions): YYYY-MM-DD;
TIME (8 positions): HH:MM:SS;
TIMESTAMP:date, time with 6 fractions of seconds and
optional time zone TIME(i) defines i decimal fractions of
seconds
(8+1+i positions): HH:MM:SS:ddd...d;
TIME WITH TIME ZONE includes the displacement from
standard universal time zone [+13:00 to -12:59] (6
additional positions): HH:MM:SS+/-HH:MM
INTERVAL: Year/Month or Day/TIME
32
Data Definition Language
DDL is used to define the (schema of) database
to create a database schema;
to create a domain;
to create, drop. alter a table;
to create, remove an index [defunct in SQL2];
to create or drop a view;
to define integrity constraints;
to define access privileges to users (Oracle: CONNECT,
RESOURCE, DBA);
to GRANT or REVOKE privileges ON/TO object/user
SQL2 supports multiple schemas
CREATE SCHEMA name AUTHORIZATION user;
– CREATE SCHEMA EMPLOYEE AUTHORIZATION yesha;
33
SQL Schema
EMP(Name,SSN,DNO,BirthPlace)
DEPT(DName,DNO,MGRSSN)
PROJECT(PName,PNO,PLocation,DNum)
WORKSON(ESSN,PNO,Hours)
CREATE SCHEMA 'COMPANY';
CREATE TABLE EMP
(
EName
name_dom
NOT NULL,
SSN
CHAR(9)
NOT NULL,
DNO
INTEGER
NOT NULL,
BirthPlace
city_dom,
PRIMARY KEY(SSN),
FOREIGN KEY (DNO) REFERENCES DEPT (DNO)
);
34
Drop
DROP command can be used to remove
–
a schema:
DROP SCHEMA Company CASCADE;
DROP SCHEMA Company RESTRICT
–
CASCADE option removes everything: tuples, tables,
domains, ...
RESTRICT option removes the schema if it has no elements
in it
a table:
DROP TABLE EMP CASCADE;
DROP SCHEMA EMP RESTRICT
CASCADE option removes the table and all references to it
RESTRICT option removes the table if it is not referenced
35
Alter
The ALTER allows to:
–
–
–
–
–
alter the domain of an attribute
ALTER TABLE Student
ALTER GPA NUMBER(4,2);
set or drop default value of an attribute
ALTER TABLE Student
ALTER GPA DROP DEFAULT;
ALTER TABLE Student
ALTER GPA SET DEFAULT 0.00;
add a new attribute to a relation
ALTER TABLE Student
ALTER Admission DATE;
drop an attribute (not in SQL1)
ALTER TABLE Student
DROP GPA [CASCADE/RESTRICT];
36
Data Manipulation Language
SELECT
tuple queries
aggregate queries
INSERT
DELETE
UPDATE
CREATE VIEW
37
Data Query: SELECT
Used for retrieval.
Used to specify subqueries for retrieval and for the
other operations.
Not the sigma or select operator of the relational
algebra.
The general form of a SELECT statement:
SELECT <attribute list>
FROM <table list>
WHERE <condition>
GROUP BY <attribute list>
HAVING <condition>
ORDER BY <attribute,{ASC/DESC} pair>
38
Relational Operators in SQL
Projection: SELECT A,B FROM R
Selection: SELECT * FROM R WHERE F
The WHERE condition can be a Boolean combination
of conditions.
Product of two tables, A X B:
SELECT
FROM
R., S.
R, S
39
Data Update
Examples:
Create a new instance of an entity explicitly
with all details:
INSERT table VALUES (v1, v2, ..., vn) inserts a
tuple (v1, v2, ..., vn) into table
Unspecified columns get their default value
if available,NULL if allowed (and no default
is defined), (fails otherwise):
INSERT table(c1, c2, cm) VALUES (v1, v2, ...,
vm) inserts a tuple with m columns set to (v1,
v2, ..., vm).
40
Data Update (cont’d)
Examples:
UPDATE table SET c1 = v1, ..., cm = vm [WHERE
expr]
Update all instances matching some local criterion:
UPDATE table SET ? WHERE?
Remove instances matching some local criterion:
DELETE table WHERE ?
Remove instances matching a non-local criterion
DELETE table FROM table, table2 WHERE AND
?
41
SQL Views
An SQL view is a table derived from other
tables
base (physical) tables—ultimately depend on
these defining tables
other views
Views are
usually virtual
sometimes materialized
42
View Specification
Views are specified with the paradigm
CREATE VIEW view
AS SELECT …
The SELECT … part is the defining query
In a view definition, we can
define column names of view
use aggregation (GROUP BY)
43
View Resolution
Views are
computed by a sort of macro expansion
when needed—view resolution
query modification: compute fresh
view materialization: store
always up to date: modifications to the
defining tables are automatically reflected in
the view
44
Updating Views
A view tuple may have as its source a
combination of base tuples, because of
joins
aggregations
Therefore, updating a view
is nontrivial to determine
may potentially lead to more than one update on the
defining relations
is often not allowed
45
Updating Views (2)
Restricted kinds of views may be updated.
The subselect must have
only one defining table
columns including a candidate key of the
table
only column names, not expressions
no joins, e.g., no correlative subqueries
no DISTINCT keyword
no aggregates
46
CHECK OPTION
Specified for updatable views
Checks whether an INSERT or UPDATE to a
view would immediately cause the tuple to
disappear from the view
the new tuple should satisfy the WHERE
clause of the view definition
47
SQL Constraints
SQL allows declarative constraints such as
CREATE ASSERTION assertion-name
CHECK (enhanced subselect query)
The DBMS checks if any ASSERTION is
ever violated
48