Advanced Databases Introduction

Advanced Databases
Introduction
dr. Toon Calders
prof. dr. Jan Paredaens
Outline
• Motivation for the course
• Other DH courses
• Practical organization
• Course topics
• Project
• Overview of changes
Motivation for the Course
• Database = a piece of software to handle data:
− store,
− maintain, and
− query
• The ideal system depends on the situation:
• data type: simple / semi-structured / complex / …
• types of queries: simple lookup / analytical / …
• type of usage: multi-user / single-user / distributed / …
• …
Motivation for the Course
• Relational databases are tuned towards:
• simple data
• simple, ad-hoc queries
• multiple users
• Other models are more suitable for other types of data:
• Object-Oriented,
• Deductive,
• Semi-Structured Databases,
• Data warehouses
Motivation for the Course
• Study different data models
• Advantages, disadvantages
• Conceptual level
− what are the important notions?
• What’s underneath?
• In a scientific way
• exact, not just claims
Motivation for the Course
• Student knows:
• different database models
• Understands:
• why they are introduced
• conceptual notions
• Is able to:
• quickly master vendor-specific products
Other DH Courses
• Relational database systems
(2ID05) Databases and Data Modelling
(2ID35) Database Technology
transactions, indexing, query optimization, distributed DB
• Other database models
(2ID45) Advanced Databases
• (2II15) Data Mining
• (2ID25) Information Retrieval
• (2ID99) Capita Selecta DH
Practical Organization
In principle …
• Wed 8:45 - 10:30
Practical session, room M 1.46
− no new material
− opportunity to practice, ask questions
− solve exercises together
• Fri 10:45 - 12:30
Lectures, room HG 6.09
− XML: Paredaens (6 lectures)
− other parts: Calders
Practical Organization
• Important information
http://wwwis.win.tue.nl/~tcalders/teaching/advancedDB/
• Subscribe to 2ID45 on studyweb !
• messages to the whole class group
− lecture postponed, room changes, …
• [email protected]
Practical Organization
• Course material
• Book:
Silberschatz, Korth, Sudarshan. Database System
Concepts, 5th edition. McGraw-Hill International
• Lots of additional material on course webpage
− papers
− slides
− solutions to exercises
− …
Practical Organization
• Grades:
• 70% written exam
• 30% group project
• No project = no grade
• The project grade can be transferred to August;
the same holds for the exam grade
• Grades expire in August
Course Topics
• Limitations of the relational model
• Deductive databases
• Object-Oriented Databases
• Data Warehousing & OLAP
• Semi-Structured data
Limitations of the relational model
• Not every query can be expressed
• Transitive closure cannot be expressed in Relational
Algebra
− Give all cities reachable from Antwerp by plane
− Give all smallest components of a part
− Give all descendants of person X
• Not even if you’re very smart …
− proof
• Extension to other relational query languages
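The reachability examples above all need a number of joins that depends on the data, which a fixed relational algebra expression cannot provide. As a minimal sketch (the `flights` relation and its city pairs are invented for illustration, not taken from the course), the following Python code repeats the join until no new tuples appear:

```python
# Hypothetical flight relation: (origin, destination) pairs, invented for illustration.
flights = {
    ("Antwerp", "London"),
    ("London", "New York"),
    ("New York", "Chicago"),
    ("Paris", "Rome"),
}


def reachable_from(start, edges):
    """Return all cities reachable from `start` in one or more flights."""
    reached = {dst for (src, dst) in edges if src == start}
    while True:
        # One more join of the current result with the flight relation.
        step = {dst for (src, dst) in edges if src in reached}
        new = step - reached
        if not new:  # fixpoint: another join adds nothing
            return reached
        reached |= new


print(sorted(reachable_from("Antwerp", flights)))
```

The number of loop iterations grows with the length of the longest route in the data, whereas any single relational algebra expression contains a fixed number of joins.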
Deductive Databases
• Motivation is two-fold:
• add deductive capabilities to databases; the database
contains:
− facts (extensional relations)
− rules to generate derived facts (intensional relations)
Database is knowledge base
• Extend the querying
− datalog allows for recursion
Deductive Databases
• Datalog as engine of deductive databases
• similarities with Prolog
• has facts and rules
• rules define (possibly recursive) views
• Semantics not always clear
• safety
• negation
• recursion
Deductive Databases
g(a,b). g(b,c). g(a,d).
reach(X,X) :- g(X,Y).
reach(X,Y) :- g(X,Y).
reach(X,Z) :- reach(X,Y), reach(Y,Z).
node(X) :- g(X,Y).
node(Y) :- g(X,Y).
unreach(X,Y) :- node(X), node(Y), not reach(X,Y).
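To make the meaning of this program concrete, here is a rough Python sketch of a naive bottom-up evaluation. It is one possible reading, not a Datalog engine: `reach` is computed to a fixpoint first, and the negation in `unreach` is applied only afterwards. The function name `eval_reach` is hypothetical.

```python
# Facts from the program above: g(a,b). g(b,c). g(a,d).
g = {("a", "b"), ("b", "c"), ("a", "d")}


def eval_reach(g):
    """Naive bottom-up evaluation of the three reach rules until fixpoint."""
    reach = set()
    while True:
        derived = set(reach)
        # reach(X,X) :- g(X,Y).
        derived |= {(x, x) for (x, _) in g}
        # reach(X,Y) :- g(X,Y).
        derived |= set(g)
        # reach(X,Z) :- reach(X,Y), reach(Y,Z).
        derived |= {(x, z) for (x, y) in reach for (y2, z) in reach if y == y2}
        if derived == reach:  # no rule produced a new fact
            return reach
        reach = derived


reach = eval_reach(g)

# node(X) :- g(X,Y).  node(Y) :- g(X,Y).
node = {x for (x, _) in g} | {y for (_, y) in g}

# unreach(X,Y) :- node(X), node(Y), not reach(X,Y).
# Evaluated only after reach is complete, so the negation is well defined.
unreach = {(x, y) for x in node for y in node if (x, y) not in reach}

print(sorted(reach))
print(sorted(unreach))
```

With these rules, reach(X,X) is derived only for nodes that have an outgoing edge, so (c,c) and (d,d) end up in `unreach`.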
Deductive Databases
• In this topic we study:
• How to handle negation and recursion in the same
program
• How to efficiently evaluate Datalog queries
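One standard technique for efficient evaluation is semi-naive evaluation; whether the lectures cover exactly this technique is an assumption. The sketch below applies it to the recursive rule reach(X,Z) :- reach(X,Y), reach(Y,Z), seeded with the edge facts only. The function name `semi_naive_closure` is hypothetical.

```python
def semi_naive_closure(edges):
    """Transitive closure of `edges` using semi-naive evaluation."""
    total = set(edges)  # all facts derived so far
    delta = set(edges)  # facts that are new since the previous round
    while delta:
        # Every join uses at least one fact from delta, so derivations
        # that involve only old facts are not repeated.
        candidates = {(x, z) for (x, y) in delta for (y2, z) in total if y == y2}
        candidates |= {(x, z) for (x, y) in total for (y2, z) in delta if y == y2}
        delta = candidates - total
        total |= delta
    return total


g = {("a", "b"), ("b", "c"), ("a", "d")}
print(sorted(semi_naive_closure(g)))
```

Compared with naive evaluation, which re-joins the full relation with itself in every round, each round here only considers combinations that include a newly derived fact.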
OO Databases
• Many applications require the storage and
manipulation of complex data
• design databases
• geometric databases
• …
• Object-Oriented programming languages manipulate
complex objects
• classes, methods, inheritance, polymorphism
OO Databases
• Very simple example:
• Class book
− set of authors
− title
− set of keywords
Extremely simple to model in OO language
Hard in relational database!
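As an illustration of how directly the book class maps to an OO language, here is a small Python sketch. Python stands in for any OO language, and the attribute names are chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """A book with set-valued attributes, as on the slide."""
    title: str
    authors: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)


book = Book(
    title="Database System Concepts",
    authors={"Silberschatz", "Korth", "Sudarshan"},
    keywords={"Database", "Storage"},
)
print(book.title, len(book.authors), len(book.keywords))
```

The two set-valued attributes live inside one object, whereas a relational schema allows only atomic values in each tuple.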
OO Databases
• In many applications persistency of the data is
nevertheless required
• protection against system failure
• consistency of the data
• Mapping: object in OO language → tuples of atomic
values in relational database is often problematic
OO Databases
• Either we ignore the multivalued dependencies
Title                     Author        Keyword
Database System Concepts  Silberschatz  Database
Database System Concepts  Korth         Database
Database System Concepts  Sudarshan     Database
Database System Concepts  Silberschatz  Storage
Database System Concepts  Korth         Storage
Database System Concepts  Sudarshan     Storage
• This table is in 3NF and BCNF
OO Databases
• Or we go to 4NF
Title                     Author
Database System Concepts  Silberschatz
Database System Concepts  Korth
Database System Concepts  Sudarshan

Title                     Keyword
Database System Concepts  Database
Database System Concepts  Storage
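The relationship between the single table and its 4NF decomposition can be checked with a short Python sketch. Relations are modelled as sets of tuples, and the variable names are chosen for this example.

```python
title = "Database System Concepts"
authors = {"Silberschatz", "Korth", "Sudarshan"}
keywords = {"Database", "Storage"}

# Single all-key table: every author is paired with every keyword.
flat = {(title, a, k) for a in authors for k in keywords}

# 4NF decomposition: one table per multivalued dependency.
book_author = {(t, a) for (t, a, _) in flat}
book_keyword = {(t, k) for (t, _, k) in flat}

# The natural join on title reconstructs the original table (lossless).
rejoined = {
    (t, a, k)
    for (t, a) in book_author
    for (t2, k) in book_keyword
    if t == t2
}

print(len(flat), len(book_author), len(book_keyword), rejoined == flat)
```

The flat table needs 3 × 2 = 6 rows for one book, while the decomposition needs 3 + 2 = 5, and the gap widens as the sets grow.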
OO Databases
• Basically OODB = persistent OO programming
language
• Very important concept
• rather uninteresting scientifically
• This topic will mainly be self-study
• Reading book chapter + Q&A session
Data Warehousing & OLAP
[Figure: data warehouse architecture. Data Sources (operational DBs, other sources) → Extract / Transform / Load / Refresh (Monitor & Integrator) → Data Storage (data warehouse, data marts, metadata) → OLAP Engine (OLAP server, ROLAP server) → Front-End Tools (analysis, query/reporting, data mining)]
Data Warehousing & OLAP
Transaction processing (example: flight reservations)
• Operational setting
− ticket sales
• Up-to-date = critical
− do not sell a seat twice
• Simple data
− reservation, date, name
• Simple queries; only « touch » a small part of the database
− Give flight details of X
− List flights to Y
Data Warehousing & OLAP
Decision support (example: flight company)
• Off-line setting
− Evaluate ROI flights
• « Historical » data
− Flights of last year
• Summarized data
− # passengers per carrier for destination X
• Integrate different databases
− Passengers, fuel costs, maintenance info
• Statistical queries
− Average % of seats sold/month/destination
Data Warehousing & OLAP
• In this topic we will study:
• Conceptual models for decision support
• Database explosion problem
• Efficient implementation strategies
− indexing, view materialization
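To give a feeling for why materializing views can explode, the sketch below computes every possible group-by over a tiny fact table. The fact table and the function name `data_cube` are invented for illustration. It also assumes that the database explosion problem refers to the growth in the number and size of precomputed aggregates.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table, invented for illustration:
# (month, destination, carrier, passengers)
facts = [
    ("2008-01", "Rome", "KL", 120),
    ("2008-01", "Rome", "AZ", 80),
    ("2008-01", "Oslo", "KL", 60),
    ("2008-02", "Rome", "KL", 100),
]
dimensions = ("month", "destination", "carrier")


def data_cube(rows, dims):
    """Sum passengers for every subset of the dimensions (every group-by)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in group)
                totals[key] += row[-1]
            cube[tuple(dims[i] for i in group)] = dict(totals)
    return cube


cube = data_cube(facts, dimensions)
print(len(cube))  # number of group-bys: 2 ** 3
print(cube[("destination",)])
```

With n dimensions there are 2^n such group-bys, so deciding which ones to materialize and index is a real design question.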
XML
• Why is XML important?
• simple, open, non-proprietary, widely accepted data
exchange format
• XML is like HTML but
• no fixed set of tags
− X = “extensible”
• no fixed semantics (or representation) of tags
− representation determined by separate ‘stylesheet’
− semantics determined by application
• no fixed structure
− user-defined schemas
XML
<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address>
        <Number>123</Number>
        <Street>Turnstreet</Street>
      </Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address>
        <Street>Hole Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>
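The document above has no fixed structure: the second person has no Name and no Number. A short Python sketch using the standard `xml.etree.ElementTree` module shows how an application might cope with that. The placeholder strings for missing values are choices made for this example.

```python
import xml.etree.ElementTree as ET

doc = """<PersonList Type="Student" Date="2004-12-12">
  <Title Value="Student List"/>
  <Contents>
    <Person>
      <Name>Jan Vijs</Name>
      <Id>11</Id>
      <Address>
        <Number>123</Number>
        <Street>Turnstreet</Street>
      </Address>
    </Person>
    <Person>
      <Id>66</Id>
      <Address>
        <Street>Hole Rd</Street>
      </Address>
    </Person>
  </Contents>
</PersonList>"""

root = ET.fromstring(doc)
print(root.get("Type"), root.get("Date"))

for person in root.iter("Person"):
    # findtext returns the default when the element is absent.
    name = person.findtext("Name", default="(no name)")
    ident = person.findtext("Id")
    number = person.findtext("Address/Number", default="(no number)")
    street = person.findtext("Address/Street")
    print(ident, name, number, street)
```

The application, not the format, decides what a missing element means. This is the semi-structured flavour that a fixed relational schema does not offer directly.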
XML
• In this topic:
• XML
• XQuery, XSLT
• LiXQuery
• Taught by prof. Paredaens
Project
• Pick one of the 4 topics:
• deductive databases / rule-based systems
• object-oriented databases
• data warehouses
• semi-structured databases
• Formulate your own project
• illustrating the different course concepts
• showing you mastered the technology
Project
• Make a project proposal (WEEK 10)
• examples of last year will be given
• fulfilling certain constraints
• listing technologies to be used
• Status report (WEEK 15)
• Final report (WEEK 20)
• Project presentations (WEEKS 21 & 22)
Overview of Changes
• First some facts and figures regarding Spring 2008
• Heterogeneous group
− Outside NL, HBO, BSc TU/e
− CSE, BIS
Overview of Changes
• Some suggestions I decided to act upon:
1. Start with the difficult material:
− expressiveness of RA
− Gaifman locality
2. Too much time is being spent on XML
− (5+5) → (6+3), and a topic (XSLT) has been added
3. Disproportional weight given to XML in exam
− project no longer exclusively XML
Overview of Changes
• Some suggestions I decided to act upon:
4. Some materials and instruction just too hard
− extra exercises will be added; more modular
5. The course was split up in lots of individual subjects,
with no apparent relation to one another
− tried to handle that in the course motivation
Overview of Changes
• Some suggestions that were ignored:
A google for 'advanced databases' returns quite
some courses from other universities that look
interesting to me. Perhaps the lecturers could take a
look at those.
− When (re-)constructing the course last year other
universities’ ADB courses were surveyed. Many of the
interesting topics are already handled in other courses
(Data Mining, Information retrieval, Database technology)
Overview of Changes
• Some suggestions that were ignored:
Don't discuss prerequisite knowledge too much, it is
prerequisite.
→ Heterogeneous group.
Balance the course subjects more, TC was discussed
very specific while the other 3 subjects were treated
in global.
→ Time spent on TC (transitive closure) is justified by its
difficulty and its importance for database theory; it also
motivates OODB & Deductive DB
Overview of Changes
• Take-away message
• (some?) lecturers do act on questionnaires
• filling out the questionnaires is useful
Summary
• Relational model has limitations
• simple queries
• simple data
• OODBs allow complex data types
• Deductive databases and Datalog allow complex queries
• Somewhere in-between: data warehouses and OLAP
• special requirements, special data structures
• Semi-structured data can be stored in XML
• Project complements theoretical lectures
• Instructions for clarification
!! See you on Friday !!