Transcript Slides
CSC 370 – Database Systems
Introduction
What’s a database?
• In essence a database is nothing more than a collection of
information that exists over a long period of time.
• Databases are empowered by a body of knowledge and technology
embodied in specialized software called a database management
system, or DBMS.
• A DBMS is a powerful tool for creating and managing large amounts of
data efficiently and allowing it to persist over long periods of time, safely.
• Among the most complex types of software available.
The database [management] system
1.
Allows users to create new databases and specify their schema (logical
structure of the data), using a data-definition language.
2.
Enables users to query and modify the data, using a query language and datamanipulation language.
3.
Supports intelligent storage of very large amounts of data.
•
Protects data from accident or not proper use.
•
Example: We can require from the DBMS to not allow the insertion of two
different employees with the same SIN.
•
Allows efficient access to the data for queries and modifications.
•
Example: Indexes over a specified fields
4.
Controls access to data from many users at once (concurrency), without
allowing “bad” interactions that can corrupt the data accidentally.
5.
Recovers from software failures and crashes.
Early database systems and file syst.
•
The first commercial database systems evolved from file systems.
•
The file systems allow the storage of big amounts of data (not safely
though).
•
But file systems do not provide a query language for the data in files.
•
People had to write programs in order to extract even the most elementary
information from a set of files.
Example: Suppose we have stored in a file called Employees records
having the fields:
(code, name, dept_code)
and in another file called Departments records having the fields:
(dept_code, dept_name)
Suppose now that given an employee, for instance with name
“Smith”, we want to find what department is he working for.
Cont.
In the absence of a query language we have to write a program which will:
1. open the file Employees
2. declare a variable of the same type as the records stored in the file
3. scan the file:
while the end of the file is not yet encountered assign the current
record to above variable.
4. If the value of the name field is “Smith” get the value of the
dept_code field. Suppose it is “10000”
5. Search in a similar way for a record with “10000” for the dept_code
field in the Department file.
6. Print the dept_name when successfully finding the dept_code.
Very painful procedure even for the simplest queries.
Compare it to the short and elegant SQL query
SELECT dept_name
FROM Employees, Department
WHERE Employees.name=”Smith” AND
Employees.dept_code = Department.dept_code
The first important applications of
DBMS’s
•
The ones where the data was composed of
•
•
•
•
•
many small items, and
many queries or modifications were made.
Airline reservation systems
Banking systems
Corporate records
Airline Reservation Systems
•
Here the items of data include:
–
–
–
•
Typical queries ask for:
–
•
•
Flights leaving about a certain time from one given city to another, what seats are
available, and at what prices.
Typical data modifications include:
–
•
•
Reservations by a single customer on a single flight, including such information as
assigned seat…
Flights information – the airport they fly from and to, their departure and arrival
times…
Ticket information – prices, requirements, and availability.
Making a reservation in a flight for a customer, assign a seat etc.
Many agents will be accessing parts of the data at any given time.
The DBMS must allow concurrent accesses preventing problems such as two
agents assigning the same seat simultaneously.
Also, the DBMS should protect against loss of records if the system
suddenly fails.
Banking Systems
•
Data items include:
–
–
–
–
Customers, their names, addresses etc.
Accounts, and their balances
Loans, and their balances
Connections between customers and their accounts and loans.
•
Typical queries are those for account and loan balances.
•
Typical modifications are those representing a payment from or deposit to an
account.
•
In banking systems failures cannot be tolerated.
–
E.g, once the money has been ejected from an ATM machine, the bank must
record the debit, even if the power immediately fails.
–
On the other hand, it is not permissible for the bank to record the debit and then
not to deliver the money because the power fails.
–
The proper way to handle this operation is far from obvious and is one of the
significant achievements in DBMS architecture.
Early DBMS’s (1960’s)
•
They encouraged the user to view the data much as it was stored.
•
The chief models were the Hierarchical and Network.
•
The main characteristic of these models was the possibility of
easy jumping or navigating from one object to another through
pointers.
–
•
However these models didn’t provide a high-level query language
for the data.
–
•
E.g. From one employee to his department.
So, one had still to write programs for querying the data.
Also they didn’t allow on-line schema modifications.
Relational databases
•
Codd (1970): A database system should present the user with a
view of data organized as tables (also called relations).
•
Behind the scene there could be a complex data structure that
allows rapid response to a variety of queries.
•
•
But the user would not be concerned with the storage structure.
Queries could be expressed in a very high-level language, which
greatly increases the efficiency of database programmers.
•
This high-level query language for relational databases is called:
Structured Query Language (SQL)
Example of a Relational DB
•
•
Relations = Tables. The columns are “headed” by attribute names.
A relation Accounts might be:
accountNo
12345
67890
…
•
•
balance
1000.00
2846.92
…
type
savings
checking
…
Below the attributes are the rows, or tuples.
Suppose we want to know the balance of account “67890”. We could ask this
query in SQL as in (1).
1 SELECT balance
FROM Accounts
WHERE accountNo = 67890;
2
SELECT accountNo
FROM Accounts
WHERE type = ‘savings’ AND balance < 0;
•
For another e.g., we ask for the sav. accounts with neg. balances (2).
•
•
•
We examine all the tuples of the relation Accounts in FROM-clause.
Pick out those tuples that satisfy some criterion in the WHERE-clause,
Produce as an answer certain attributes of those tuples, as indicated in the
SELECT-clause.
Architecture of a DBMS
•
The “cylindrical” component contains
not only data, but also metadata, i.e. info
about the structure of data.
•
If DBMS is relational metadata includes:
– names of relations,
– names of attributes of those
relations, and
– data types for those attributes (e.g.,
integer or character string).
•
A database maintains indexes for the
data.
– Indexes are part of the stored data.
– Description of which attributes have
indexes is part of the metadata.
A few words about indexes
•
Similar to a book indexes.
•
•
•
A book index associates words with page numbers where they appear.
A database index associates values of some object field(s) with the
physical address of the corresponding objects in the disk.
Two are the main properties of an index:
a) it is sorted, and
b) its size is much smaller than the record set being indexed.
•
Hence, searching in an index is much faster than searching in the
corresponding record set.
•
•
Storage Manager
The job of the storage manager is to
– obtain data from the data storage, and
– modify the data to the data storage
when requested.
Storage manager has two components:
– File manager handles files.
–
Buffer manager handles main
memory.
•
Keeps track of the location of files
Obtains block(s) of a file on request
from the buffer manager.
Obtains and returns blocks of data
from/to the file manager
Stores blocks temporarily in main
memory pages.
1 block = 1 page = 4,000 to 16,000 bytes.
–
Smallest unit of data that is read/written
from/to disk.
Query Processor
•
Query processor handles: queries and
modifications to the data.
• Finds the best way to carry out a
requested operation and
• Issues commands to the storage
manager that will carry them out.
•
E.g. A bank has a DB with two relat.:
Customers (name, SIN, address),
Accounts (accountNo, balance, SIN)
Query: “Find the balances of all accounts
of which Sally is the owner.”
SELECT Accounts.balance
FROM Customers, Accounts
WHERE Customers.SIN = Accounts.SIN
AND Customers.name = 'Sally';
Query Processor (Cont.)
•
What this query logically says is:
1.
2.
3.
•
If answer this query as it says the performance would be terrible.
–
•
•
Make Cartesian product of tables specified in the FROM-clause,
Choose from R the tuples satisfying the condition in the WHERE clause.
Produce as answer only the values of attributes in SELECT-clause.
Because of the usually enormous Cartesian product.
Suppose we have
–
Index on name of Customer and
–
Index on SIN of Accounts.
Then, query processor will cleverly create a plan which
inexpensively:
–
–
Retrieves the tuple for “Sally” and gets the SIN number.
Retrieves the account tuples for this SIN number.
Transaction Manager
•
Transaction manager is responsible for
the integrity of the system. It must
assure that:
– several queries running
simultaneously do not interfere with
each other and that,
– the system will not lose data even if
there is a power failure.
•
Transaction manager interacts with:
• query manager,
– Because it may need to delay
certain query operations to
avoid conflicts.
• storage manager
– Because schemes for protecting
data involve storing a log of
changes to the data.
Database Studies
• Design of databases.
– What kinds of information go into the database?
– How is the information structured?
– How do data items connect?
• Database programming.
– How does one express queries on the database?
– How does one use other capabilities of a DBMS, such as transactions or
constraints, in an application?
– How is database programming combined with conventional
programming?
• Database system implementation.
– How does one build a DBMS, including such matters as query
processing, transaction processing and organizing storage for efficient
access?