Chapter 1: Introduction - CS-People by full name
Download
Report
Transcript Chapter 1: Introduction - CS-People by full name
CAS CS 460
Introduction to Database Systems
Thanks to Prof. George Kollios, Boston University and
Prof. Mitch Cherniack Brandeis University for lecture materials
About the course – Administrivia
Instructor:
Ravi Kothuri, [email protected]
Office, Hours: MCS 147, Mon/Wed 5-6PM and 7:30-8PM
Teaching Fellow:
Panagiotis Papapetrou, [email protected]
MCS 147, Tue/Thu 11 - 12:30 AM
Home Page:
http://www.cs.bu.edu/rkothuri
Check frequently! Syllabus, schedule, assignments,
announcements…
1.2
Grading
Homeworks: 20%
4-5 assignments
Midterm 20%
Final 30%
Projects 30%
5-6 parts
1.3
My Background
Oracle Corporation
PhD from University of California, Santa Barbara
Research:
Multi-dimensional indexing
Mobile Databases
Spatial, GIS systems and CAD/CAM databases
Google Maps type of technologies for Enterprise
Geometric algorithms for terrain management, city modeling,…
Data Mining (spatial, financial, …)
RFID technologies
Semantic –web (RDF) technologies
Book: “Pro Oracle Spatial”, Nov 2004
Teaching on invitation from Prof. George Kollios
1.4
Who uses Databases?
Universities (records for students, faculty, courses,…
Airlines (passengers, flights, luggage, …)
Banking (customers, loans, …)
Utilities (customers, usage history, bills); e.g. telecom, electric,..
Any Company: human resources
Employees, depts, facilities,…
“Data is the primary and integral part of information industry. Proper
management of the data using database technology is essential
for any large-scale company, organization.’’
1.5
What is a Database System?
Database:
A very large collection of related data
Models a real world enterprise:
Entities (e.g., teams, games / students, courses)
Relationships (e.g., The Patriots are playing in the
Superbowl)
Even active components (e.g. “business logic”)
DBMS: A software package/system that can be used
to store, manage and retrieve data form databases
Database System: DBMS+data (+ applications)
1.6
Why Study Databases??
Shift from computation to information
Always true for corporate computing
More and more true in the scientific world
and of course, Web
DBMS encompasses much of CS in a practical discipline
OS, languages, theory, AI, logic
1.7
Managing data: A naïve approach
Why not store everything on flat files: use the file system of the
OS, cheap/simple…
Name, Course, Grade
John Smith, CS112, B
Mike Stonebraker, CS234, A
Jim Gray, CS560, A
John Smith, CS560, B+
…………………
Yes, but not scalable…
Filesize limitations, access/update performance is slow,..
1.8
Problem 1
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
(say, in different departments)
John Smith, [email protected], CS112, B
John Smith, Arts560, [email protected], B+
Smith J, [email protected], Math212, A
Why is this a problem?
Wasted space
Potential inconsistencies (multiple formats, John
Smith vs Smith J.)
1.9
Problem 2
Data retrieval:
Find the students who took CS560
Find the students with GPA > 3.5
For every query we need to write a program!
Need a Query/Retrieval engine that can support
different ways to access data
Easy to write
Execute efficiently
1.10
Problem 3
Data Integrity
No support for sharing:
Prevent simultaneous modifications
No coping mechanisms for system crashes
No means of Preventing Data Entry Errors (checks must be hard-coded
in the programs)
Security problems
1.11
Database Systems
Database systems offer solutions to all the mentioned problems
Database systems:
Support Modeling of the data
Provide Levels of Abstraction of the data
Provide programs to allow you to Retrieve/modify the data
SQL
• For easy, standard specification of queries
Query Optimizer
• To process your queries efficiently
Ensure Integrity Maintenance
Transaction Manager/Recovery Manager
• to ensure atomicity/integrity in concurrent transactions
• to ensure integrity after system crashes)
…………….
1.12
Database Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.13
Data Model
A framework for describing
data
data relationships
data semantics
data constraints
Entity-Relationship model (Ch. 6)
A set of entities to model real-world objects
Relationships among entities
Relational model
Data as a set (or sets) of “records” or “tuples”
Each tuple in the set has the same set of attributes
Other models:
object-oriented model: inheritance, abstraction,…
semi-structured data models, XML: tuples in a set can have
different attributes
1.14
Entity-Relationship Model
Example of schema in the entity-relationship model
1.15
Entity Relationship Model (Cont.)
E-R model of real world
Entities (objects)
E.g. customers, accounts, bank branch
Relationships between entities
E.g. Account A-101 is held by customer Johnson
Relationship set depositor associates customers with accounts
Widely used for database design
Database design in E-R model usually converted to design in the
relational model (coming up next) which is used for storage and
processing
1.16
Relational Model
Attributes
Example of tabular data in the relational model
Customer-id
customername
192-83-7465
Johnson
019-28-3746
Smith
192-83-7465
Johnson
321-12-3123
Jones
019-28-3746
Smith
customerstreet
customercity
accountnumber
Alma
Palo Alto
A-101
North
Rye
A-215
Alma
Palo Alto
A-201
Main
Harrison
A-217
North
Rye
A-201
1.17
Database Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.18
Levels of Abstraction
Data storage Involves Complex data structures
Hide complexity from users
Abstract views of the data (e.g., for storing a customer record)
Physical level: how a customer record is stored as
bytes/words on disk
• Mostly hidden from database users/programmers
Logical level: describes “types” inside the database
type customer = record
name : string;
street : string;
city : integer;
ssn; integer;
end;
View level: application programs hide details of data types.
Views can also hide information (e.g., ssn) for security purposes.
1.19
View of Data
A logical architecture for a database system
1.20
Physical Level: Data Organization
Data Storage (Ch 11)
Where can data be stored?
Main memory
Secondary memory (hard disks)
Optical store
Tertiary store (tapes)
Move data? Determined by buffer manager
Mapping data to files? Determined by file manager
1.21
Database Architecture
(physical level data organization)
DBA
DDL Commands
DDL Interpreter
File Manager
Buffer Manager
Storage Manager
Data
Secondary Storage
Metadata
Schema
1.22
Database Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.23
Data retrieval
Queries (Ch 3, 4)
Query = Declarative data retrieval
describes what data, not how to retrieve it
Ex. Give me the students with GPA > 3.5
vs
Scan the student file and retrieve the records with gpa>3.5
Why?
1. Easier to write
2. Efficient to execute
1.24
Data retrieval
Query
Query Processor
Plan
Query Optimizer
Query Evaluator
Data
Query Optimizer
“compiler” for queries (aka “DML Compiler”)
Plan ~ Assembly Language Program
Optimizer Does Better With Declarative Queries:
1. Algorithmic Query (e.g., in C) 1 Plan to choose from
2. Declarative Query (e.g., in SQL) n Plans to choose from
1.25
Specifying the Query using SQL
SQL: widely used (declarative) non-procedural language
E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-name
from customer
where customer.customer-id = ‘192-83-7465’
E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer-id = ‘192-83-7465’ and
depositor.account-number = account.account-number
Procedural languages: C++, Java, relational algebra
1.26
Data retrieval:
Indexing (Ch 12)
How to answer fast the query: “Find the student with SID = 101”?
One approach is to scan the student table, check every student, retrurn
the one with id=101… very slow for large databases
Any better idea?
1st keep student record over the SID. Do a binary search…. Updates…
2nd Use a dynamic search tree!! Allow insertions, deletions, updates and at the
same time keep the records sorted! In databases we use the B+-tree (multiway
search tree)
3rd Use a hash table. Much faster for exact match queries… but cannot support
Range queries. (Also, special hashing schemes are needed for dynamic data)
1.27
1.28
180
200
150
156
179
120
130
100
101
110
30
35
3
5
11
180
150
100
30
120
B+Tree Example
B=4
Root
Database Architecture
(data retrieval)
DB Programmer
User
Code w/ embedded queries
DBA
Query
DDL Commands
Query Optimizer
DML Precompiler
Query Evaluator
Query Processor
File Manager
Storage Manager
Buffer Manager
Secondary Storage
Indices
Data
Statistics
Metadata
Schema
1.29
DDL Interpreter
Database Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.30
Data Integrity
Transaction processing (Ch 15, 16)
Why Concurrent Access to Data must be Managed?
John and Jane withdraw $50 and $100 from a common
account…
John:
1. get balance
2. if balance > $50
3. balance = balance - $50
4. update balance
Jane:
1. get balance
2. if balance > $100
3. balance = balance - $100
4. update balance
Initial balance $300. Final balance=?
It depends…
1.31
Data Integrity
Recovery (Ch 17)
Transfer $50 from account A ($100) to account B ($200)
1. get balance for A
2. If balanceA > $50
3. balanceA = balanceA – 50
4.Update balanceA in database
5. Get balance for B
System crashes….
6. balanceB = balanceB + 50
7. Update balanceB in database
Recovery management
1.32
Database Architecture
DB Programmer
DBA
User
Code w/ embedded queries
DDL Commands
Query
Query Optimizer
DML Precompiler
Query Evaluator
Query Processor
File Manager
Transaction Manager
Recovery Manager
Buffer Manager
Storage Manager
Secondary Storage
DDL Interpreter
Indices
Data
Metadata
Integrity Constraints
Statistics
Schema
1.33
Outline
1st half of the course: application-oriented
How to develop database applications: User + DBA
2nd part of the course: system-oriented
Learn the internals of a relational DBMS (Oracle..)
Last few lectures on Oracle-specific features such as XDB….
1.34