Advanced Querying

Download Report

Transcript Advanced Querying

Extending Relational Databases
Toon Calders
[email protected]
Last Lectures
• Relational query languages are limited
• Transitive closure cannot be expressed
• Gaifman-locality
• Also w.r.t. data storage relational model is limited
• Simple data
• No set types, arrays, …
This lecture
• Extending SQL
• Recursion
• Nested relations
• Object-oriented and object-relational database
systems
This lecture
• Extending SQL
• Recursion
• Nested relations
• Object-oriented and object-relational database
systems
Recursion in SQL
• SQL:1999 permits recursive view definition
• Example: find all employee-manager pairs, where
the employee reports to the manager directly or
indirectly (that is manager’s manager, manager’s
manager’s manager, etc.)
This example view, empl, is called the transitive
closure of the manager relation
Recursion in SQL
with recursive empl (employee_name, manager_name ) as (
select employee_name, manager_name
from manager
union
select manager.employee_name,
empl.manager_name
from manager, empl
where manager.manager_name =
empl.employee_name )
select *
from empl
This lecture
• Extending SQL
• Recursion
• Nested relations
• Object-oriented and object-relational database
systems
Nested Relations
• Very simple example:
• Class book
− set of authors
− title
− set of keywords
Extremely simple to model in any programming
language
Hard in relational database!
Nested Relations
• Either we ignore the multivalued dependencies
Title
Author
Keyword
Database System Concepts
Silberschatz
Database
Database System Concepts
Korth
Database
Database System Concepts
Sudarshan
Database
Database System Concepts
Silberschatz
Storage
Database System Concepts
Korth
Storage
Database System Concepts
Sudarshan
Storage
• This table is in 3NF, BCNF
Nested Relations
• Or we go to 4NF
Title
Author
Database System Concepts
Silberschatz
Database System Concepts
Korth
Database System Concepts
Sudarshan
Title
Keyword
Database System Concepts
Database
Database System Concepts
Storage
Nested Relations
• 4NF design requires users to include joins in their
queries.
• 1NF relational view
• eliminates the need for users to perform joins,
• but loses the one-to-one correspondence between
tuples and objects.
• has a large amount of redundancy
Nested Relational Algebra
• Types:
• Set of constants C = {c1,c2, …}
• If T1, …, Tk are types, then also {(T1,…,Tk)}
• Domain:
• Dom(C) = {c1,c2, …}
• Dom ( {(T1,…,Tk)} ) = P( { (x1,…,xk) | xi Ti, 1 ≤ i ≤ k} )
Nested Relational Algebra
• Example:
• C = {1,2,3,…}
• Type { (C, { ( C, { (C) } ) } ) } has domain:
− {(C)} 
{ {}, {(1)}, {(2)}, {(1),(2)}, ... }

Set of natural numbers
− { ( C, { (C) } ) }

Set of pairs (c,S), c is a number,
S a set of numbers
− All

Set of pairs (c,T), T is set of pairs (c,S)
Nested Relational Algebra
• Nested relation of type (T1,…,Tn)
• Subset of Dom( {(T1,…,Tn)} ).
• Nested relational algebra
• The usual operators , , , -, ,
• Also Nest and Unnest:
− U$i (R): remove nesting from ith column of R
− N$i1, …$ik (R): nest columns $i1 … $ik
Nested Relational Algebra
R =
{
N$2 R ={
( a, b ),
( a, c ),
( d, b ),
( d, c )
}
( a, {b,c} ),
( d, {b,c} )
}
U$2 (N$2 R) produces again original R
Flat-Flat Theorem
• Let Q be a nested-relational algebra expression
• Q takes a non-nested relation as input
• Q produced a non-nested relation as output
• Type (C,…,C)  (C,…,C)
• Then:
• Q can be rewritten as a normal relational algebra
expression; (i.e., one without nesting)
• Result is actually stronger:
• Nested d deep  nested d’ deep: no need for
intermediate results having depth > d, d’
Flat-Flat Theorem
• Importance?
• Can be used by query optimizers

No need to introduce intermediate nesting

Standard techniques can be used
Nesting in SQL
• Nesting is the opposite of unnesting, creating a
collection-valued attribute
• Many commercial database systems support nesting
in one way or another
• NOTE: SQL:1999 does not support nesting
• Nesting can be done in a manner similar to
aggregation, but using the function set() in place of
an aggregation operation, to create a set
Nesting in SQL
select title, author, pub_name, pub_branch,
set(keyword) as keyword-list
from flat-books
groupby title, author, pub_name, pub_branch
select title, set(author) as author-list,
pub_name, pub_branch,
set(keyword) as keyword-list
from flat-books
groupby title, pub_name, pub_branch,
Nesting (Cont.)
• Another approach to creating nested relations is to
use subqueries in the select clause.
select title,
( select author
from flat-books as M
where M.title=O.title) as author-set,
pub-name, pub-branch,
(select keyword
from flat-books as N
where N.title = O.title) as keyword-set
from flat-books as O
This lecture
• Extensions to SQL
• Recursion
• Enumeration types
• Nested relations
• Object-oriented and object-relational database
systems
Motivation for OO Databases
• Many applications require the storage and
manipulation of complex data
• design databases
• geometric databases
• …
• Object-Oriented programming languages manipulate
complex objects
• classes, methods, inheritance, polymorphism
Motivation for OO Databases
• In many applications persistency of the data is
nevertheless required
• protection against system failure
• consistency of the data
• Mapping: object in OO language  tuples of atomic
values in relational database is often problematic
• Impedance mismatch
Motivation for OO Databases
• Need for object-oriented features in databases
• Object-Oriented Databases
• Object-Relational Databases
• Need for more powerful data manipulation
• Extend expressive power of SQL; include functions,
recursion
Motivation for OO Databases
• Often we want to manipulate the data in the database
itself
• no data transmission costs
• client has limited computation power
• Expressive power of SQL is quite limited
• allows for efficient query optimization
• disallows complex data operations
Motivation for OO Databases
• Whole spectrum:
Relational
Database

Object
Relational 
Database
Persistent OO
Programming
Language
Complex Types and SQL:1999
• Extensions to SQL to support complex types
include:
• Collection and large object types
− Nested relations are an example of collection
types
• Structured types
− Nested record structures like composite
attributes
• Inheritance
• Object orientation
− Including object identifiers and references
Structured Types and Inheritance in SQL
• Structured types can be declared and used in SQL
create type Name as
(firstname
varchar(20),
lastname
varchar(20))
final
create type Address as
(street
varchar(20),
city
varchar(20),
zipcode
varchar(20))
not final
• Note: final and not final indicate whether subtypes can be
created
Structured Types and Inheritance in SQL
• Structured types can be used to create tables with
composite attributes
create table customer (
name Name,
address Address,
dateOfBirth date
)
• Dot notation used to reference components:
Name.firstname
Methods
• Can add a method declaration with a structured type.
method ageOnDate (onDate date)
returns interval year
• Method body is given separately.
create instance method ageOnDate (onDate date)
returns interval year
for CustomerType
begin
return onDate - self.dateOfBirth;
end
Inheritance
• Suppose that we have the following type definition for
people:
create type Person
(name varchar(20),
address varchar(20))
• Using inheritance to define the student and teacher
types
create type Student under Person
(degree
varchar(20),
department varchar(20))
create type Teacher under Person
(salary
integer,
department varchar(20))
Object-Identity and Reference Types
• Define a type Department with a field name and a
field head which is a reference to the type Person,
with table people as scope:
create type Department (
name varchar (20),
head ref (Person) scope people)
• We can then create a table departments as follows
create table departments of Department
Persistent Programming Languages
• Languages extended with constructs to handle
persistent data
• Programmer can manipulate persistent data directly
• no need to fetch it into memory and store it back to disk
(unlike embedded SQL)
• Persistent objects:
•
•
•
•
by class - explicit declaration of persistence
by creation - special syntax to create persistent objects
by marking - make objects persistent after creation
by reachability - object is persistent if it is declared
explicitly to be so or is reachable from a persistent
object
Object Identity and Pointers
• Degrees of permanence of object identity
• Intraprocedure: only during execution of a single
procedure
• Intraprogram: only during execution of a single
program or query
• Interprogram: across program executions, but not if
data-storage format on disk changes
• Persistent: interprogram, plus persistent across data
reorganizations
Object Identity and Pointers
• Persistent versions of C++ and Java have been
implemented
• C++
− ODMG C++
− ObjectStore
• Java
− Java Database Objects (JDO)
Comparison of O-O and O-R Databases
• Relational systems
• simple data types, powerful query languages, high
protection.
• Persistent-programming-language-based OODBs
• complex data types, integration with programming
language, high performance.
• Object-relational systems
• complex data types, powerful query languages, high
protection.
• Note: Many real systems blur these boundaries
• E.g. persistent programming language built as a
wrapper on a relational database offers first two
benefits, but may have poor performance.
Self-Study
• Read Chapter 9 of the book “Database System
Concepts, 5th edition”.
• Corresponds roughly to Chapters 8 and 9 of the 4th
edition, except the parts on ODMG (sections 8.5.1 and
8.5.2) which are not present in the 5th edition anymore.
• During the lecture of next Friday 20/02/09 there will be
time for questions and answers regarding the topics
covered in this book chapter.