Introduction to Database Systems
Download
Report
Transcript Introduction to Database Systems
Database Management
Systems
CSE 594
Introduction
September 28, 2000
Staff
Instructor: Alon Halevy
Sieg, Room 310, [email protected]
Office hours: Thursdays 5pm, email.
TAs: Maya Rodrig
Office hours: Thursdays 5pm, or by
appointment.
Mailing list: cse594@cs
Web page:
http://www.cs.washington.edu/education/courses/594/00au/
Goals of the Course
Purpose:
Principles of building database applications
Foundations of database management
systems.
Issues in building database systems.
Have fun: databases are not just bunches of
tuples.
Not an introduction to the nitty gritty of any
specific commerical system.
Grading
Paper homeworks: 25%
Very little regurgitation.
Meant to be challenging (I.e., fun).
Two programming projects: 40%
Work in pairs.
Build a database application
Build an XML query processor
Final Exam: 25% (currently scheduled for Dec.
14th).
Intangibles (e.g., participation): 10%
Textbook
Two volume collection, available as a pair
in the bookstore:
A First Course on Database Systems:
Ullman & Widom
Database System Implementation:
Garcia-Molina, Ullman and Widom.
A few comments about the books.
Other Useful Texts
Database Management Systems: Ramakrishnan and
Gehrke
Foundations of Databases (Abiteboul, Hull & Vianu)
Parallel and Distributed DBMS (Ozsu and Valduriez)
Transaction Processing (Gray and Reuter)
Database Systems (Silberschatz, Korth and Sudarshan)
Principles of Transaction Processing (Bernstein and
Newcomer)
Readings in Database Systems (Stonebraker and
Hellerstein)
Proceedings of SIGMOD, VLDB, PODS conferences.
Prerequisites
Real Prerequisites
Operating systems
Data structures and
algorithms
Distributed systems
Complexity theory
Mathematical Logic
Knowledge
Representation
User interface design
Programming
languages
Artificial Intelligence
(Search)
Greek, Hebrew,
French
Why use a DBMS?
Suppose we are building a system to store the
information pertaining to the university.
Several questions arise:
how do we store the data? (file organization, etc.)
how do we query the data? (write programs…)
make sure that updates don’t mess things up?
Provide different views on the data? (registrar versus
students)
how do we deal with crashes?
Way too complicated! Go buy a database system!
Why Use a DBMS?
All programs manipulate data, so why
use a database?
Many data manipulation tasks involve recurring
operations:
• Large amounts of data (Giga’s, Tera’s)
• Data is very structured
• Persistent data
• Valuable data
• Performance requirements
• Concurrent access to the data
• Restricted access to data
Functionality of a DBMS
Persistent storage management
Transaction management
Resiliency: recovery from crashes.
Separation between logical and physical
views of the data.
High level query and data manipulation
language.
Efficient query processing
Interface with programming languages
Bird’s Eye View of
How to build a database application
The different components of a database
system.
Building an Application
with a Database System
Requirements modeling (conceptual, pictures)
Decide what entities should be part of the application and
how they should be linked.
Schema design and implementation
Decide on a set of tables, attributes.
Define the tables in the database system.
Populate database (insert tuples).
Write application programs using the DBMS
way easier now that the data management is taken care of.
name
category
Conceptual
Modeling
name
ssn
Takes
Course
Student
quarter
Advises
Teaches
Professor
address
name
field
Relational Terminology
Attribute names
Product (relation name)
Name
Price
Category
Manufacturer
gizmo
$19.99
gadgets
GizmoWorks
Power gizmo $29.99
gadgets
GizmoWorks
SingleTouch $149.99
photography
Canon
MultiTouch
household
Hitachi
tuples
$203.99
(Arity=4)
Product(name: string, Price: real, category: enum, Manufacturer: string)
Schema Design and
Implementation
Table Students
Student
Course
Quarter
Charles
CS 444
Fall, 1997
Dan
CS 142
…
…
Winter,
1998
…
Note: Separation of the logical view from
the physical view of the data.
Normalization (theory).
Querying a Database
Find all the students who have taken CSE444 in
Fall, 1997.
S(tructured) Q(uery) L(anguage)
select E.name
from Enroll E
where E.course=CS444 and
E.quarter=“Fall, 1997”
Query processor figures out how to answer the
query efficiently.
An acquired taste…
Other query languages exist (OO, OR, datalog)
Writing Application Code
Use ODBC/JDBC.
Create a connection with a database.
Embed SQL in application code.
Specify transaction borders
May need physical tuning of the database.
User/
Application
Query
update
Query optimizer
Execution engine
Record, index
requests
Query execution
plan
Index/record mgr.
Page
commands
Buffer manager
Read/write
pages
Storage manager
storage
Storage Management
Becomes a hard problem because of the
interaction with the other levels of the
DBMS:
What are we storing?
Efficient indexing, single and multidimensional
Exploit “semantic” knowledge
Issue: interaction with the operating
system. Should we rely on the OS?
TP and Recovery
For efficient use of resources, we want
concurrent access to data.
Systems sometimes crash.
A “real” database guarantees ACID:
Atomicity: all or nothing of a transaction.
Consistency: always leave the DB consistent.
Isolation: every transaction runs as if it’s the
only one in the system.
Durability: if committed, we really mean it.
Do we really want ACID?
Data Integration
mybooks.com Mediated Schema
Books
Internet
Inventory
Orders
WAN
MorganKaufman
PrenticeHall
...
Shipping
Internet
East
West
Orders
Reviews
Internet
FedEx
Customer
Reviews
UPS
NYTimes
...
alt.books.
reviews
Uniform query capability across autonomous,
heterogeneous data sources on LAN, WAN, or
Internet
XML: Semi-structured Data
eXtensible Markup Language:
Emerging
format for
data
exchange
on the web
and
between
applications
.
<db>
<book>
<title>Complete Guide to DB2</title>
<author>Chamberlin</author>
</book>
<book>
<title>Transaction Processing</title>
<author>Bernstein</author>
<author>Newcomer</author>
</book>
<publisher>
<name>Morgan Kaufman</name>
<state>CA</state>
</publisher>
</db>
Database Industry
Relational databases are a great success
of theoretical ideas.
Oracle has a market cap of over $200B
Other players: IBM, MS, Sybase, Informix
Trends:
warehousing and decision support
data integration
XML, XML, XML.
Course (Rough) Outline
The basics: (quickly)
The relational model
SQL
Views, integrity constraints
XML
Physical representation:
Index structures.
Course Outline (cont)
Query execution: (Zack Ives)
Algorithms for joins, selections, projections.
Query Optimization
Data Integration
semi-structured data
Transaction processing and recovery (Phil
Bernstein)
Projects
Goal: identify and solve a problem in
database systems.
(almost) anything goes.
Groups of 2-3
Groups assembled end of week 2;
Proposals, end of week 3.
Touch base with me: every two weeks.
Example projects on web site.
Start Early.