Insert Title Here

Download Report

Transcript Insert Title Here

1
The Relational Data Model
and SQL
10:30AM—12noon
Monday, July 18th, 2005
CSIG05
Chaitan Baru
2
OUTLINE
• Foundations of the relational data model
– Data models and database systems
– Relations, attributes, keys
• Introduction to SQL
• “Hands-on” with SQL
– Vishwanath Nandigam
3
Historical note: IBM’s role in
database systems
• IBM’s early database systems were based on the
hierarchical data model – IMS (Information Management
System).
– IMS serves more than 95 percent of Fortune 1000 companies
– Manages 15 petabytes of production data
– Supports more than two hundred million users per day
S1
Student
S2
Course
C1
C2
C2
C3
C4
Instructor
I1
I2
I2
I3
I1
4
IBM and Relational Database
Systems
• Relational model was introduced to provide separation of
application logic from the data representations
• DB2  IBM’s second database product!
• Foundations of relational model were invented at IBM by
E.F.Codd, IBM
– A Relational Model of Data for Large Shared Data Banks, E. F.
Codd, June 1970
• First prototype, System R, was developed at IBM Almaden
in mid-70’s
– Introduced SQL
– Provided as SQL/DS on IBM mainframe systems
– Oracle was based on early System R work
5
Why the Relational Model?
• Deals with the application logic / data separation in
business data processing, unlike the earlier “network” and
“hierarchical” data model
• Plus, an algebra for manipulating relations
 the key innovation
Student
N
takes
S# student info
M
Course
C# course info
S# C#
N
1
teaches Instructor
6
What is a Database System?
• Database (system) =
– Database Instance (e.g. set of tables of rows)
– Database Management System (DBMS)
• Origins in the commercial world:
– to organize, query, and manipulate data more
effectively, efficiently, and independently
• Scientific databases
– often special features:
• spatial, temporal, spatiotemporal, GIS, units, uncertainty, raw
& derived data, …
7
Why not just use files as
“databases”?
• Works for some applications…
• But:
– scanning & ‘grep’ing large files can be very inefficient
– no language support for selecting desired data, joining them, etc.
• cannot express the kinds of questions/queries you’d like to ask
• ‘grep’ is no substitute for a query language
–
–
–
–
–
–
–
redundant and/or inconsistent storage of data
no transaction management and concurrency control among multiple users
no security
no recovery
no data independence (application data)
no data modeling support
…
8
Features of a Database System
• A data model (relational, object-oriented, XML)
prescribes how data can be organized:
– as relations (tables) of tuples (rows)
– as classes of (linked) objects
– as XML trees
• A (database) schema (stored in the “data dictionary”)
defines the structure of a specific database instance:
– Relational schema
– OO schema
– XML Schema (or XML DTD)
• A query language
– Allows ad hoc, declarative (non-procedural) queries on schema
9
Features of a Database System
• Data is treated uniformly and separately from the
application
• Efficient data access
• Queries and views are expressed over the schema
• Integrity constraints (checking and enforcement)
• Transactions combine sets of operations into logical units
(all-or-nothing)
• Synchronization of concurrent user transactions
• Recovery (after system crash)
– not to be confused w/ backup
– instead: guarantee consistency by “roll-back” of partially executed
transactions (how? Hint: logging)
• …
10
DB features:
E.g. Concurrency Control
• Concurrent execution of simultaneous requests
– long before web servers where around...
– transaction management guarantees consistency despite
concurrent/interleaved execution
• Transaction (= sequence of read/write operations)
– Atomicity: a transaction is executed completely or not at all
– Consistency: a transaction creates a new consistent DB state, i.e.,
in which all integrity constraints are maintained
– Isolation: to the user, a transaction seems to run in isolation
– Durability: the effect of a successful (“committed”) transaction
remains even after system failure
11
Levels of Abstraction:
Architecture Overview
User
Conceptual
…
View 1
View 2
logical data
independence
Level
ER-Model
(Entity-Relationship)
OO Models (Classes…)
part of DB design
 conceptual design
… often lost in the
process…
View n Export schemas
Logical (“conceptual”) level Tables
physical data independence
Physical level Index structures
DB
instances
12
Database Design: EntityRelationship (ER) Model
Name
•
•
•
•
Entities:
Relationships:
Attributes:
ER Model:
Salary
Employee
Name
works-for
Manager
Department
since
– initial, high-level DB design (conceptual model)
– easy to map to a relational schema (database tables)
– comes with more constraints (cardinalities, aggregation) and
extensions: EER (is-a => class hierarchies)
– related: UML (Unified Modeling Language) class diagrams
13
The Relational Model
Employee
Emp
tom
tim
sally
carol
carol
….
Salary DNo
60k
1
57k
1
45k
3
30k
1
35k
2
FK: foreign key,
pointing to another key
Department
DNo
1
2
3
Name
Toys
Comp.
Shoes
Mgr
carol
carol
sam
• Relation/Table Name:
– employee, dept
• Attributes = Column Names:
– Emp, Salary, DeptNo, Name, Mgr
• Relational Schema:
– employee(Emp:string, Salary:integer,
DeptNo:integer), ...
• Tuple = Row of the table:
– (“tom”, “60000”, “1”)
• Relation = Set of tuples:
– {(...), (...), ...}
14
Creating a Relational Database in SQL
CREATE TABLE employee (
ssn
CHAR(11),
name
VARCHAR(30),
deptNo
INTEGER,
PRIMARY KEY (ssn),
FOREIGN KEY (deptNo) REFERENCES department
)
CREATE TABLE department (
deptNo
INTEGER,
name
VARCHAR(20),
manager CHAR(11),
PRIMARY KEY (deptNo),
FOREIGN KEY (manager) REFERENCES employee(ssn) )
15
What is a Query?
• Intuitively:
– An “executable question” in terms of a database schema
– Evaluating a query Q against a database instance D yields a set of
answer objects:
• Relational tuples or XML elements
• Example:
– Who are the employees in the ‘Toys’ dept.?
– Who is (are) the manager(s) of ‘Tom’?
– Show all pairs (Employee, Mgr)
• Technically:
– A mapping from an input schema (the given table schemas) to a
result schema (the new columns you are interested in) defined in
some query language
16
Why (Declarative) Query
Languages?
• Things we talk and think about in PLs and QLs
– Assembly languages:
• registers, memory locations, jumps, ...
– C and the likes:
• if-then-else, for, while, memory (de-)allocation, pointers, ...
– Object-oriented languages:
•
•
•
•
C++: C plus objects, methods, classes, ...
Java: objects, methods, classes, references, ...
Smalltalk: objects, objects, objects, ...
OQL: object-query language
17
Why (Declarative) Query
Languages?
• Things we talk and think about in PLs and QLs
– Functional languages (Haskell, ML):
• (higher-order) functions, fold(l|r), recursion,
patterns, ...
=> Relational languages (SQL, Datalog)
• relations (tables), tuples (rows); conceptual level: ER
• relational operations: , , , , ..., ,,,,,..., , , |X|
=> Semistructured/XML (Tree) & Graph Query
Languages
• trees, graphs, nodes, edges, children nodes, siblings, …
• XPath, XQuery, …
• Also: Focus on what, and not how!
18
Example: Querying a Relational
Database
input tables
Employee
Department
Emp
anne
john
DeptNoMgr
1
anne
2
anne
Salary DeptNo
62k
2
60k
1
join
SQL query
(or view def.)
SELECT e.Emp, d.Mgr
we don’t say how to
FROM Employee e, Department d
evaluate this
expression
WHERE e.DeptNo = d.DeptNo
result
answer
(or view)
Emp
john
anne
Mgr
anne
anne
19
Example Query: SQL vs
DATALOG
• “List all employees and their managers”
• In SQL:
SELECT e.name, d.manager
FROM Employee e, Department d
WHERE e.deptNo = d.deptNo
a “join” operation
• In DATALOG:
q(E, M) :- employee(E, S, D), department(D, N, M).
20
Important Relational Operations
• select(R, Condition)
– filter rows of a table wrt. a condition
• project(R, Attr)
– remove unwanted columns; keep rest
• join(R1, A2, R2, A2, Condition)
– find “matches” in a “related” table
– e.g. match R1.foreign key = R2.primary key
• cartesian product(R1, R2)
• union (“OR”), intersection (“AND”)
• set-difference (“NOT IN”)
21
Queries, Views, Integrity Constraints
• … can all be seen as “special queries”
• Query q(…) :- …
ad-hoc queries
• View v(…) :- …
exported views;
• Integrity Constraints
– ic (…) :- …. MgrSal < EmpSal …
– say what shouldn’t happen
– if it does: alert the user (or refuse an update, …)
22
Query Evaluation vs Reasoning
• Query evaluation
– Given a database instance D and a query Q, run Q(D)
– What databases do all the time
• Reasoning (aka “Semantic Query Optimization”)
– Given a query Q and a constraint C, “optimize” Q&C (e.g., given
C, Q might be unsatisfiable)
– Given Q1 and Q2 decide whether Q1 Q2
– Given Q1,Q2, C decide whether Q1 Q2 | C
– Note: we are NOT given a database instance D here; just the
schema and the query/IC expressions
23
Summary QLs for Relational
Databases
Natural Join: same attribute name
 add condition that values must match
24
Relational Algebra
25
Relational Algebra
26
Relational Algebra
27
Relational Algebra
28
Relational Algebra