Data Abstraction

Download Report

Transcript Data Abstraction

CMSC424: Review
Database Management Systems
Manage data
►Store
data
►Update
data
►Answer
questions about the data
What kind of data ?
► Enterprise






Banking
Supermarkets, Sales
Airlines
Universities
Manufacturing
Human resources
► More




data
recent:
Semi-structured Data (XML)
Scientific data
Biological data
Sensor network data etc etc…
Naïve solutions
► Don’t
offer:
 Consistency
►Atomicity,




durability etc
Concurrency
Declarative data retrieval
Control of redundancy
Dynamic data evolution
DBMS
► Database
Management Systems provide
 Data abstraction
► Key
in evolving systems
the most important purpose of a DBMS
► Goal: Hiding low-level details from the users of the system
► Probably
 Guarantees about data integrity
► In
presence of concurrent access, failures…
 Speed !!
Data Abstraction
What data users and
application programs
see ?
View Level
View 1
What data is stored ?
describe data properties such as
data semantics, data relationships
How data is actually stored ?
e.g. are we using disks ? Which
file system ?
View 2
Logical
Level
Physical
Level
…
View n
DBMS at a Glance
1.
Data Modeling
2.
Data Retrieval
3.
Data Storage
4.
Data Integrity
Data Modeling
►
A data model is a collection of concepts for
describing data properties and domain knowledge:
 Data relationships
 Data semantics
 Data constraints
►
We discussed two models:
 Entity-relationship Model
► Diagrammatic representation
► Easier to work with
► Syntax not important, but remember
► Remember what you can model
the “meaning”
 Relational Model
► Only one abstract concept
► Closer to the physical representation
► Normalization
on disk
Data Retrieval
►
Query = Declarative data retrieval program
 describes what data to acquire, not how to acquire it
 Non-declarative:
► scan the accounts file
► look for number 55 in
► subtract $50 from the
the 2nd field
3rd field
 Declarative (posed against the tables abstraction):
update accounts
set balance = balance - 50
where acct_no = 55
►
Why ?
 Easier to write
 More efficient to execute
► Database
system can decide how to execute it
Data Storage
► Where
and how to store data ?
 Main memory ?
► What
 Disks
if the database larger than memory size ?
► We
discussed properties of disks
► RAID
► How to move data between memory and disk ?
 Buffer Management
► LRU, MRU, Clock
 Indexes
► Closely
tied to data retrieval
► B+-trees, Hashing
Data Integrity
Manage concurrency and crashes
 Transaction: A sequence of database actions enclosed within
special tags
 Properties:
►
►
Atomicity: Entire transaction or nothing
Consistency: Transaction, executed completely, take database from
one consistent state to another
►
►
Isolation: Concurrent transactions appear to run in isolation
Durability: Effects of committed transactions are not lost
 Consistency: Transaction programmer needs to guarantee that
►
DBMS can do a few things, e.g., enforce constraints on the data
 Rest: DBMS guarantees
►
Haven’t covered in class yet
Data Integrity
► Semantic
constraints
 Typically specified at the logical level
 E.g. balance > 0
► Assert
statements
► Functional dependencies
 kinda
DBMS at a glance
►
Data Models
 Conceptual representation of the data
►
Data Retrieval
 How to ask questions of the database
 How to answer those questions
►
Data Storage
 How/where to store data, how to access it
►
Data Integrity
 Manage crashes, concurrency
 Manage semantic inconsistencies
►
Not fully disjoint categorization !!
SQL Assignment
Report the home run champs in the last
three years (2002 to 2004).
select h.year, firstname, lastname, h.hrs
from playerinfo p, hitting h
where p.playerid = h.playerid and
h.hrs = (
select max(hrs)
from hitting h2
where h.year = h2.year);
SQL Assignment
Report the last name of the batter who
would be reported first in alphabetical order.
select firstname, lastname
from playerinfo
where firstname <= all (select firstname from playerinfo)
and
lastname <= all (
select lastname
from playerinfo p2
where p2.firstname = playerinfo.firstname);
SQL Assignment
20 Create the dream NL batting team (that will
have the most total RBI) from 2004 statistics.
Remember, a NL team consists of 1 LF, 1CF,
1RF, 1 SS, 1 2B, 1 3B, 1 1B, 1 Catcher,
and 1 Pitcher. Only consider the position at
which the hitter played the maximum number of
games; so a player will only qualify at one
fielding position. Break Ties arbitrarily.
create table rbistable as
select p.firstname, p.lastname, p.playerid, h.rbis, f.pos
from playerinfo p, hitting h, fielding f
where p.playerid = h.playerid and f.playerid = h.playerid and
f.year = 2004 and h.year = 2004
and f.numgames = (select max(f2.numgames)
from fielding f2
where f2.playerid = f.playerid and f2.year = f.year);
create table rbistable2 as
select firstname, lastname, pos, rbis, playerid
from rbistable r1
where r1.rbis = (select max(r2.rbis) from rbistable r2
where r2.pos = r1.pos);
select firstname, lastname, pos, rbis from rbistable2 r
where playerid <= all (select playerid from rbistable2 r2
where r2.pos = r.pos);
22 Rank the 2004 teams by their number of wins.
The output should contains a table with two
columns: Team Name, and Rank (between 1 and 30),
and it should be sorted by Rank. Two teams
with same number of wins will be ranked the
same, and the next rank will be skipped in
that case.
select t1.teamname, t1.wins, 31 - count(t2.teamname) as rank
from teams t1, teams t2
where t1.year = 2004 and t1.year = t2.year and
t1.losses <= t2.losses
group by t1.teamname, t1.wins
order by rank;