Notes 1 - Duke Computer Science

Download Report

Transcript Notes 1 - Duke Computer Science

CPS 216: Advanced Database
Systems
Shivnath Babu
Outline for Today
• What this class is about: Data management
• What we will cover in this class
• Logistics
What does a Database System mean to you?
(Hint: What are they used for? Give examples)
User/Application
Data Management
Query
Query
Query
Data
DataBase Management System (DBMS)
Example: At a Company
Query 1: Is there an employee named “Nemo”?
Query 2: What is “Nemo’s” salary?
Query 3: How many departments are there in the company?
Query 4: What is the name of “Nemo’s” department?
Query 5: How many employees are there in the
“Accounts” department?
Employee
Department
ID
Name
DeptID
Salary
…
ID
Name
…
10
Nemo
12
120K
…
12
IT
…
20
Dory
156
79K
…
34
Accounts
…
40
Gill
89
76K
…
89
HR
…
52
Ray
34
85K
…
156
Marketing
…
…
…
…
…
…
…
…
…
DataBase Management System (DBMS)
High-level
Query Q
Answer
DBMS
Data
Translates Q into
best execution plan
for current conditions,
runs plan
Example: Store that Sells Cars
Make
Model OwnerID
ID Name
Owners of
Honda Accords Honda Accord
12
12 Nemo
who are <=
Honda Accord
156
156 Dory
23 years old
Join (Cars.OwnerID = Owners.ID)
Filter (Make = Honda and
Model = Accord)
Cars
Age
22
21
Filter (Age <= 23)
Owners
Make
Model
OwnerID
ID
Name
Age
Honda
Accord
12
12
Nemo
22
Toyota
Camry
34
34
Ray
42
Mini
Cooper
89
89
Gill
36
Honda
Accord
156
156
Dory
21
…
…
…
…
…
…
DataBase Management System (DBMS)
High-level
Query Q
Answer
DBMS
Keeps data safe
and correct
despite failures,
concurrent
updates, online
processing, etc.
Data
Translates Q into
best execution plan
for current conditions,
runs plan
DBMS is multi-user
• Example
Get account balance from database;
If balance > amount of withdrawal then
balance = balance - amount of withdrawal;
dispense cash;
store new balance into database;
• Homer at ATM1 withdraws $100
• Marge at ATM2 withdraws $50
• Initial balance = $400, final balance = ?
– Should be $250 no matter who goes first
Final balance = $250
Homer withdraws $100:
read balance; $400
if balance > amount then
balance = balance - amount; $300
write balance; $300
Marge withdraws $50:
read balance; $300
if balance > amount then
balance = balance - amount; $250
write balance; $250
Final balance = $300
Homer withdraws $100:
Marge withdraws $50:
read balance; $400
read balance; $400
If balance > amount then
balance = balance - amount; $350
write balance; $350
if balance > amount then
balance = balance - amount; $300
write balance; $300
Final balance = $350
Homer withdraws $100:
Marge withdraws $50:
read balance; $400
read balance; $400
if balance > amount then
balance = balance - amount; $300
write balance; $300
if balance > amount then
balance = balance - amount; $350
write balance; $350
Concurrency control in DBMS
• Similar to concurrent programming problems
– But data is not all in main-memory
• Appears similar to file system concurrent
access?
– Approach taken by MySQL initially; now
MySQL offers better alternatives
• But want to control at much finer granularity
• Or else one withdrawal would lock up all
accounts!
Recovery in DBMS
• Example: balance transfer
decrement the balance of account X
by $100;
increment the balance of account Y
by $100;
• Scenario 1: Power goes out after the first
instruction
• Scenario 2: DBMS buffers and updates data in
memory (for efficiency); before they are written
back to disk, power goes out
• Log updates; undo/redo during recovery
DataBase Management System (DBMS)
High-level
Query Q
Answer
DBMS
Keeps data safe
and correct
despite failures,
concurrent
updates, online
processing, etc.
Data
Translates Q into
best execution plan
for current conditions,
runs plan
Summary of modern DBMS features
• Persistent storage of data
• Logical data model; declarative queries and
updates ! physical data independence
• Multi-user concurrent access
• Safety from system failures
• Performance, performance, performance
– Massive amounts of data (terabytes ~
petabytes)
– High throughput (thousands ~ millions
transactions per minute)
– High availability (¸ 99.999% uptime)
Modern DBMS Architecture
Applications
SQL
DBMS
Parser
Logical query plan
Query Optimizer
Physical query plan
Query Executor
Access method API calls
Storage Manager
Storage system API calls File system API calls
OS
Disk(s)
Course Outline
• 40% of the class is about core DBMS concepts
– Query execution, query optimization, transactions,
recovery, etc.
– Textbook material
• 60% of the class is on “what is happening today
in data management”
–
–
–
–
–
–
New developments on textbook material
Data streams
Web search – Google, Yahoo!
Data integration (structured data + unstructured data)
Data mining
Unsolved challenges
Using a Traditional DBMS
User/Application
Query
Query
…
Loader
Result
Result
…
Table R
Table S
New Approach for Data Streams
User/Application
Register
Continuous Query
(Standing Query)
Input streams
Result
Stream Query
Processor
Example Continuous (Standing) Queries
• Web
– Amazon’s best sellers over last hour
• Network Intrusion Detection
– Track HTTP packets with destination address
matching a prefix in given table and content
matching “*\.ida”
• Finance
– Monitor NASDAQ stocks between $20 and
$200 that have moved down more than 2% in
the last 20 minutes
New Challenges in DBMSs
High-level
Query Q
Answer
DBMS
TeraBytes  PetaBytes
Data
<CD>
<TITLE>Empire B.</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia
</COMPANY>
<PRICE>10.90</PRICE>
</CD>
Course Logistics
• Reference: Database Systems: The Complete
Book, by H. Garcia-Molina, J. D. Ullman, and J.
Widom
• Web site: http://www.cs.duke.edu/courses/fall07/cps216
• Grading:
– Project 30%
– Homework Assignments 20%
– Midterm 20%
– Final 30%
Summary: Data Management is
Important
• Core aspect of most sciences and
engineering today
• Core need in industry
• Cool mix of theory and systems
• Chances are you will find something
interesting even if you primary interest is
elsewhere