Infrastructure

Download Report

Transcript Infrastructure

An Introduction to DBMS
Technology
Yelena Yesha
Olga Streltchenko
1
Presentation Overview
Database functionality
Relational Data Model
SQL
Web-Database Connectivity
2
Transaction


A transaction is an exchange of:
 Information;
 Goods;
 Services;
 Currency/currencies;
Transaction properties
 Atomicity: a transaction must be all or nothing.
 Consistency: a transaction takes the system
from one consistent state to another.
 Isolation.
 Durability.
3
Databases and Information
Economy
Shift from computation to information;
corporate computing;
personal computing and the Internet;
scientific computing.
Growing importance of transaction-orientation
and information retrieval.
The database field concentrates on the
efficient management of large amounts of
persistent, reliable shared data.
4
Database versus File System
Database management systems (DBMS)
is a software that provides transaction
support by implementing
Data independence;
Data access efficiency;
Concurrency control ;
Data integrity;
Reliability;
Security;
Data distribution and heterogeneity.
5
Data Independence and
Access Efficiency

DBMS allows to avoid rewriting all access routines
every time the data format changes or data is
added/modified/deleted.
 insulate applications from data storage details.
Logical independence: protection from changes in
logical structure of data.
Physical independence: protection from changes
in physical structure of data.

DBMS maintains data structures and implements
algorithms allowing to avoid linear search
indexing: search in O(log n);
fast access even on complex data queries.
6
Concurrency Control and
Data Integrity


Interleaving actions of different applications
boosts performance.
DBMS insures semantically correct access to
the same data by concurrent applications
two programs accessing the same data at the
same time can result in an inconsistent update;
implement sharing in a controlled manner.


Data semantics may require certain constraints
to be satisfied.
DBMS guarantees that application programs
comply with the constraints when
adding/modifying the data.
7
Reliability and Security
DBMS provides techniques for recovery
from software and hardware failures
guarantee survival of the data across
catastrophes.
DBMS prevents unauthorized users
from accessing/modifying data or
denying service to other users.
8
Data Distribution and
Heterogeneity
Centralization is the enemy of scalability
a vast number of modern applications are
distributed.
Data sharing in a distributed environment is
a challenge.
Heterogeneity applies to networks, hardware,
operating systems, programming languages,
data formats, etc.
Distributed applications must mask the
differences.
Need distributed data management.
9
Categories of Data Models
High-level or conceptual
entities, attributes, relationships.
Representational or implementation or
logical
relational, network hierarchical, objectoriented, object-relational.
Physical or low-level
data storage.
10
Levels of Abstraction in a
Database
 Schema versus Instance

schema = description of the data that




captures data types, relationships, constraints on the
data;
l
meta-data (data about data), knowledge, e.g.,
Employees(EmpName, EmpNo, Dept, Sal)
is independent of any application program
changes infrequently
instance = set of records/tuples/rows for that schema,
the actual data in the database at a given time


time-varying
e.g., <Jane, 201, Shoe, 1M>,<Susan, 302, Toy, 1M>
11
3-schema Architecture
Physical level description of a database:
how things are stored on disk:
files, record structures,
indices,
data structures for disk blocks,
methodology for dealing with too long records, etc.
Conceptual level description of a database
The description of application data (its schema)
using one of the traditional data models.
12
3-Schema Architecture
(cont'd)
View-level description of a database
What users of a particular application see
their own customized schema, e.g., for payroll, for the
ticket agent, for a simulation program.
Multiple levels
helps with data independence;
helps with maintenance.
Many views, single logical and physical
schema.
Levels of abstraction give data
independence.
13
The Entity-Relational
Model
Entity: a distinguishable object.
Entity set: a set of entities all of the same
type.
Attribute: a single property of an entity;
simple vs composite;
single-valued vs multi-valued;
stored vs derived;
null values.
Domain: set of values permitted for that
attribute.
14
The E-R Model (2)
Relationship: an association between two or
more entities.
Relationship set: a set of relationships all of the
same type
There is no correct schema for a batch of data.
Which schema is best depends on the
application.
Many basic data modelling choices depend on
an understanding of the application.
15
Data Model
Data model: notation for describing data,
plus a set of operations used to
manipulate that data.
a set of primitives for defining the structure of a
DB;
a set of operations for specifying the retrievals
and updates on a DB;
relational, hierarchical, network, object-oriented.
16
The Relational Model
(Codd 1970)
The relational data model is the most
important data model currently existing.
Value-oriented, i.e., allows operations on
relations whose results are relations, thus
enables to combine operations.
As opposed to object-oriented models, in which
Operations cannot be applied to the result of other
operations;
The result of an operation may be a new data type, and
operations may not be available for this type.
17
Definitions: Domain &
Relation
A domain is a set of atomic values.
A relation is a finite subset of the cartesian
product of a finite list of domains;
relation is a set of tuples;
order of tuples is irrelevant and
no relation has 2 identical tuples;
each tuple value is atomic
no composite attributes;
no multi-valued attributes.
18
Relational Model (cont’d)

Everything is represented by relations
– Formally: Given sets D1, D2, ....Dn (not necessarily distinct), a
relation R  D1 X D2 X ...X Dn
– Di 's are the domains and n is the arity (degree) of R
– elements of R are called tuples
– number of tuples in R is the cardinality of R

relational data model helps to view a relation as a table

Observe the following properties:
19
Relational Model (cont’d)
 Everything is represented by relations;
Given sets D1, D2, ....Dn (not necessarily distinct), a relation R
 D1 X D2 X ...X Dn;
Di 's are the domains and n is the arity (degree) of R;
elements of R are called tuples;
number of tuples in R is the cardinality of R.
 Relational data model helps to view a relation as a table:
each row represents a tuple (record);
each column represents an attribute (field).
 Properties:
no two rows are identical;
the ordering of tuples is unimportant;
the ordering of columns is important.
20
Keys
 Let R be a relation schema and K  R.
 K is a superkey of R if it can uniquely identify any
tuple in any r(R). There are no tuples t and t' such
that t[K] = t'[K}.
 K is a candidate key if K is a minimal superkey.
There is no K'  K such that K' is also a superkey of
r(R)
 A primary key is one of the candidate keys,
remaining candidate keys are alternate keys;
 Every relation has a key.
 A key is a property of a relation schema, not a
relation.
21
Integrity Constraints
 Relational database schema is a set of relation schemas
and a set of integrity constraints
 Integrity constraint: condition that must be true for any
instance of a database.
 Integrity constraints are expected to hold on every
database instance of the schema
 Integrity constraints
Structural:
key constraint: uniqueness of keys;
entity integrity constraint: no primary key value can be null;
referential integrity constraint: reference from relation R to
relation S must refer to an existing tuple of S
Semantic.
22
Foreign Keys
A set of attributes (FK) of R is a foreign key if
the attributes in FK have the same domain as the
primary key (PK) attributes of another relation S, and
for each instance of R, the values of FK occur as a
value of PK for some instance in S, or is null.
In the relational model, the only way an entity can
reference another entity is through the value of
the primary key of the second entity.
Foreign keys don't have to be unique or non-null,
but if one component is null, then all components
must be null.
23
E-R to Relations (I.e.,
Defining Relations)
Done using DDL (Data Definition Language)
Name whole database schema
Declare domains for attributes
Define relations:
name
attribute names and domains
primary and other keys
foreign keys
24
Translating from E-R
Represent entity set E by a relation whose
attributes are all the E-R attributes of E. Then
each tuple represents one entity of E.
To represent relation R between entity sets E1,
…, Ek, create relation R with key attributes of E1,
…, key attributes of Ek, as attributes (rename
duplicates). Each tuple of the relation represents
one combination of entities that are related to
one another.
You might have some redundant relations, which
you can delete.
25
Schema Normalization
Formal theory of database design
based on grouping attributes in a particular way
using attribute dependencies to achieve ‘good’
schemas
1NF, 2NF, 3NF, BCNF, 4NF, …
Goal:
don’t store redundant information
can represent everything (otherwise, the schema is
useless!)
26
Query and Update
Languages
DDL : data definition language
used by DBA
to define schemas, create views, create
indices
DML : data manipulation language
used by sophisticated casual user
to query data or
update data
27
Relational Query Languages
 Query languages allow manipulation and retrieval of data from
a database.
 Relational model supports simple, powerful query languages;
strong formal foundation based on logic;
allows for optimization.
 Two mathematical languages form the basis for relational
languages (e.g., SQL) and for implementation:
Relational Algebra: More operational, useful for representing
execution plans;
Relational Calculus: Lets users describe what they want, rather
than how to compute it (non-operational, declarative).
 Basic operations:
selection, projection, cross-product, set-difference, union,
intersection, join, division
28
SQL
 SQL (Structured Query Language) is the query
language for the System R developed at IBM San
Jose [Astraham, Gray, Lindsay, Selinger ..]
 SQL is now the query language for IBM's DB2 and
the de-facto standard on most commercial RDBMS.
 SQL is a comprehensive language providing
statements for data definition, query and update.
Hence it is both DDL and DML.
29
SQL (cont’d)
 SQL allows to create views, it can be embedded in a
general-purpose programming language (C or
PASCAL).
 SQL has one basic statement for retrieving data from
the database:
the SELECT statement
SELECT <attribute list>
FROM <table list>
WHERE <condition>
 Standards:
SQL or SQL1 (ANSI 1986);
SQL2 or SQL-92 (ANSI 1992);
SQL3 underway: extends SQL with OO and other concepts.
30
SQL Data Types
 Numeric
Integers of various ranges: INTEGER (or INT), SMALLINT;
Real numbers of various precision: FLOAT, REAL, DOUBLE
PRECISION;
Formatted numbers: DECIMAL(i,j) or DEC(i,j) or
NUMERIC(i,j).
 Character Strings
Fixed length n: CHAR(n) or CHARACTER(n);
Variable length of maximum n: VARCHAR(n) or CHAR
VARYING(n) (default n =1).
 Bit strings:
Fixed length n: BIT(n);
Varying length of maximum n: VARBIT(n) or BIT VARYING(n).
31
SQL Data Types (cont’d)
 Date & Time [SQL2]:
DATE (10 positions): YYYY-MM-DD;
TIME (8 positions): HH:MM:SS;
TIMESTAMP:date, time with 6 fractions of seconds and
optional time zone TIME(i) defines i decimal fractions of
seconds
(8+1+i positions): HH:MM:SS:ddd...d;
TIME WITH TIME ZONE includes the displacement from
standard universal time zone [+13:00 to -12:59] (6
additional positions): HH:MM:SS+/-HH:MM
INTERVAL: Year/Month or Day/TIME
32
Data Definition Language
 DDL is used to define the (schema of) database
to create a database schema;
to create a domain;
to create, drop. alter a table;
to create, remove an index [defunct in SQL2];
to create or drop a view;
to define integrity constraints;
to define access privileges to users (Oracle: CONNECT,
RESOURCE, DBA);
to GRANT or REVOKE privileges ON/TO object/user
 SQL2 supports multiple schemas
CREATE SCHEMA name AUTHORIZATION user;
– CREATE SCHEMA EMPLOYEE AUTHORIZATION yesha;
33
SQL Schema
EMP(Name,SSN,DNO,BirthPlace)
DEPT(DName,DNO,MGRSSN)
PROJECT(PName,PNO,PLocation,DNum)
WORKSON(ESSN,PNO,Hours)
CREATE SCHEMA 'COMPANY';
CREATE TABLE EMP
(
EName
name_dom
NOT NULL,
SSN
CHAR(9)
NOT NULL,
DNO
INTEGER
NOT NULL,
BirthPlace
city_dom,
PRIMARY KEY(SSN),
FOREIGN KEY (DNO) REFERENCES DEPT (DNO)
);
34
Drop
DROP command can be used to remove
–
a schema:
DROP SCHEMA Company CASCADE;
DROP SCHEMA Company RESTRICT


–
CASCADE option removes everything: tuples, tables,
domains, ...
RESTRICT option removes the schema if it has no elements
in it
a table:
DROP TABLE EMP CASCADE;
DROP SCHEMA EMP RESTRICT


CASCADE option removes the table and all references to it
RESTRICT option removes the table if it is not referenced
35
Alter
The ALTER allows to:
–
–
–
–
–
alter the domain of an attribute
ALTER TABLE Student
ALTER GPA NUMBER(4,2);
set or drop default value of an attribute
ALTER TABLE Student
ALTER GPA DROP DEFAULT;
ALTER TABLE Student
ALTER GPA SET DEFAULT 0.00;
add a new attribute to a relation
ALTER TABLE Student
ALTER Admission DATE;
drop an attribute (not in SQL1)
ALTER TABLE Student
DROP GPA [CASCADE/RESTRICT];
36
Data Manipulation Language
SELECT
tuple queries
aggregate queries
INSERT
DELETE
UPDATE
CREATE VIEW
37
Data Query: SELECT
 Used for retrieval.
 Used to specify subqueries for retrieval and for the
other operations.
 Not the sigma or select operator of the relational
algebra.
 The general form of a SELECT statement:
SELECT <attribute list>
FROM <table list>
WHERE <condition>
GROUP BY <attribute list>
HAVING <condition>
ORDER BY <attribute,{ASC/DESC} pair>
38
Relational Operators in SQL
Projection: SELECT A,B FROM R
Selection: SELECT * FROM R WHERE F
The WHERE condition can be a Boolean combination
of conditions.
Product of two tables, A X B:
SELECT
FROM
R., S.
R, S
39
Data Update
Examples:
Create a new instance of an entity explicitly
with all details:
INSERT table VALUES (v1, v2, ..., vn) inserts a
tuple (v1, v2, ..., vn) into table
Unspecified columns get their default value
if available,NULL if allowed (and no default
is defined), (fails otherwise):
INSERT table(c1, c2, cm) VALUES (v1, v2, ...,
vm) inserts a tuple with m columns set to (v1,
v2, ..., vm).
40
Data Update (cont’d)
Examples:
UPDATE table SET c1 = v1, ..., cm = vm [WHERE
expr]
Update all instances matching some local criterion:
UPDATE table SET ? WHERE?
Remove instances matching some local criterion:
DELETE table WHERE ?
Remove instances matching a non-local criterion
DELETE table FROM table, table2 WHERE AND
?
41
SQL Views
An SQL view is a table derived from other
tables
base (physical) tables—ultimately depend on
these defining tables
other views
Views are
usually virtual
sometimes materialized
42
View Specification
Views are specified with the paradigm
CREATE VIEW view
AS SELECT …
The SELECT … part is the defining query
In a view definition, we can
define column names of view
use aggregation (GROUP BY)
43
View Resolution
Views are
computed by a sort of macro expansion
when needed—view resolution
query modification: compute fresh
view materialization: store
always up to date: modifications to the
defining tables are automatically reflected in
the view
44
Updating Views
A view tuple may have as its source a
combination of base tuples, because of
joins
aggregations
Therefore, updating a view
is nontrivial to determine
may potentially lead to more than one update on the
defining relations
is often not allowed
45
Updating Views (2)
Restricted kinds of views may be updated.
The subselect must have
only one defining table
columns including a candidate key of the
table
only column names, not expressions
no joins, e.g., no correlative subqueries
no DISTINCT keyword
no aggregates
46
CHECK OPTION
Specified for updatable views
Checks whether an INSERT or UPDATE to a
view would immediately cause the tuple to
disappear from the view
the new tuple should satisfy the WHERE
clause of the view definition
47
SQL Constraints
SQL allows declarative constraints such as
CREATE ASSERTION assertion-name
CHECK (enhanced subselect query)
The DBMS checks if any ASSERTION is
ever violated
48