Introduction to Database Systems

Download Report

Transcript Introduction to Database Systems

Database Management
Systems
CSE 594
Introduction
September 28, 2000
Staff
Instructor: Alon Halevy
Sieg, Room 310, [email protected]
Office hours: Thursdays 5pm, email.
TAs: Maya Rodrig
Office hours: Thursdays 5pm, or by
appointment.
Mailing list: cse594@cs
Web page:
http://www.cs.washington.edu/education/courses/594/00au/
Goals of the Course
Purpose:
Principles of building database applications
Foundations of database management
systems.
Issues in building database systems.
Have fun: databases are not just bunches of
tuples.
Not an introduction to the nitty gritty of any
specific commerical system.
Grading
Paper homeworks: 25%
Very little regurgitation.
Meant to be challenging (I.e., fun).
Two programming projects: 40%
Work in pairs.
Build a database application
Build an XML query processor
Final Exam: 25% (currently scheduled for Dec.
14th).
Intangibles (e.g., participation): 10%
Textbook
Two volume collection, available as a pair
in the bookstore:
A First Course on Database Systems:
 Ullman & Widom
Database System Implementation:
Garcia-Molina, Ullman and Widom.
A few comments about the books.
Other Useful Texts
 Database Management Systems: Ramakrishnan and
Gehrke
 Foundations of Databases (Abiteboul, Hull & Vianu)
 Parallel and Distributed DBMS (Ozsu and Valduriez)
 Transaction Processing (Gray and Reuter)
 Database Systems (Silberschatz, Korth and Sudarshan)
 Principles of Transaction Processing (Bernstein and
Newcomer)
 Readings in Database Systems (Stonebraker and
Hellerstein)
 Proceedings of SIGMOD, VLDB, PODS conferences.
Prerequisites
Real Prerequisites
Operating systems
Data structures and
algorithms
Distributed systems
Complexity theory
Mathematical Logic
Knowledge
Representation
User interface design
Programming
languages
Artificial Intelligence
(Search)
 Greek, Hebrew,
French
Why use a DBMS?
Suppose we are building a system to store the
information pertaining to the university.
Several questions arise:
how do we store the data? (file organization, etc.)
how do we query the data? (write programs…)
make sure that updates don’t mess things up?
Provide different views on the data? (registrar versus
students)
how do we deal with crashes?
Way too complicated! Go buy a database system!
Why Use a DBMS?
All programs manipulate data, so why
use a database?
Many data manipulation tasks involve recurring
operations:
• Large amounts of data (Giga’s, Tera’s)
• Data is very structured
• Persistent data
• Valuable data
• Performance requirements
• Concurrent access to the data
• Restricted access to data
Functionality of a DBMS
Persistent storage management
Transaction management
Resiliency: recovery from crashes.
Separation between logical and physical
views of the data.
High level query and data manipulation
language.
Efficient query processing
Interface with programming languages
Bird’s Eye View of
How to build a database application
The different components of a database
system.
Building an Application
with a Database System
Requirements modeling (conceptual, pictures)
Decide what entities should be part of the application and
how they should be linked.
Schema design and implementation
Decide on a set of tables, attributes.
Define the tables in the database system.
Populate database (insert tuples).
Write application programs using the DBMS
way easier now that the data management is taken care of.
name
category
Conceptual
Modeling
name
ssn
Takes
Course
Student
quarter
Advises
Teaches
Professor
address
name
field
Relational Terminology
Attribute names
Product (relation name)
Name
Price
Category
Manufacturer
gizmo
$19.99
gadgets
GizmoWorks
Power gizmo $29.99
gadgets
GizmoWorks
SingleTouch $149.99
photography
Canon
MultiTouch
household
Hitachi
tuples
$203.99
(Arity=4)
Product(name: string, Price: real, category: enum, Manufacturer: string)
Schema Design and
Implementation
Table Students
Student
Course
Quarter
Charles
CS 444
Fall, 1997
Dan
CS 142
…
…
Winter,
1998
…
Note: Separation of the logical view from
the physical view of the data.
Normalization (theory).
Querying a Database
Find all the students who have taken CSE444 in
Fall, 1997.
S(tructured) Q(uery) L(anguage)
select E.name
from Enroll E
where E.course=CS444 and

E.quarter=“Fall, 1997”
Query processor figures out how to answer the
query efficiently.
An acquired taste…
Other query languages exist (OO, OR, datalog)
Writing Application Code
Use ODBC/JDBC.
Create a connection with a database.
Embed SQL in application code.
Specify transaction borders
May need physical tuning of the database.
User/
Application
Query
update
Query optimizer
Execution engine
Record, index
requests
Query execution
plan
Index/record mgr.
Page
commands
Buffer manager
Read/write
pages
Storage manager
storage
Storage Management
Becomes a hard problem because of the
interaction with the other levels of the
DBMS:
What are we storing?
Efficient indexing, single and multidimensional
Exploit “semantic” knowledge
Issue: interaction with the operating
system. Should we rely on the OS?
TP and Recovery
For efficient use of resources, we want
concurrent access to data.
Systems sometimes crash.
A “real” database guarantees ACID:
Atomicity: all or nothing of a transaction.
Consistency: always leave the DB consistent.
Isolation: every transaction runs as if it’s the
only one in the system.
Durability: if committed, we really mean it.
Do we really want ACID?
Data Integration
mybooks.com Mediated Schema
Books
Internet
Inventory
Orders
WAN
MorganKaufman
PrenticeHall
...
Shipping
Internet
East
West
Orders
Reviews
Internet
FedEx
Customer
Reviews
UPS
NYTimes
...
alt.books.
reviews
Uniform query capability across autonomous,
heterogeneous data sources on LAN, WAN, or
Internet
XML: Semi-structured Data
eXtensible Markup Language:
Emerging
format for
data
exchange
on the web
and
between
applications
.
<db>
<book>
<title>Complete Guide to DB2</title>
<author>Chamberlin</author>
</book>
<book>
<title>Transaction Processing</title>
<author>Bernstein</author>
<author>Newcomer</author>
</book>
<publisher>
<name>Morgan Kaufman</name>
<state>CA</state>
</publisher>
</db>
Database Industry
Relational databases are a great success
of theoretical ideas.
Oracle has a market cap of over $200B
Other players: IBM, MS, Sybase, Informix
Trends:
warehousing and decision support
data integration
XML, XML, XML.
Course (Rough) Outline
The basics: (quickly)
The relational model
SQL
Views, integrity constraints
XML
Physical representation:
Index structures.
Course Outline (cont)
Query execution: (Zack Ives)
Algorithms for joins, selections, projections.
Query Optimization
Data Integration
semi-structured data
Transaction processing and recovery (Phil
Bernstein)
Projects
Goal: identify and solve a problem in
database systems.
(almost) anything goes.
Groups of 2-3
Groups assembled end of week 2;
Proposals, end of week 3.
Touch base with me: every two weeks.
Example projects on web site.
Start Early.