Transcript slides
COMP 530 Database Architecture and
Implementation
1. Introduction
Department of Computer Science, HKUST Slide 1
Why Learn DBMS?
You want to find a JOB !!!
Department of Computer Science, HKUST Slide 2
Big Names in Database Systems
Company
Product
Remarks
Oracle
Oracle 8i, 9i,
etc.
World’s 2nd largest software company; CEO,
Larry Ellison, world’s 2nd richest
IBM
DB2,
Universal
Server
World’s 2nd largest after Informix acquisition
in 2001
Microsoft
SQL Server,
Access
Access comes with MS Office
Sybase
Adaptive
Server
CEO John Chen, grown up in HK
Oracle
MySQL
Open Source, acquired by Sun in 2007, which
was acquired by Oracle in 2009
Postgres
“World’s most advanced Open Source DBMS”
Department of Computer Science, HKUST Slide 3
Who Needs Database Systems
Corporate databases
Web data management
Typical Applications:
Personnel management
Inventory and purchase order
Insurance policies and customer data
……
Typical Applications:
Web page management
Personalize web pages
……
Department of Computer Science, HKUST Slide 4
There is a difference between DBMSs
(Database Management Systems) and
Databases
A few people work for Oracle, etc., to develop,
enhance or maintain their DBMS products
Most people make a living working as DB
designers, DB programmers or DB
Administrators
Department of Computer Science, HKUST Slide 5
What is in a Database?
• A database contains information about a particular
enterprise or a particular application.
E.g., a database for an enterprise may contain everything
needed for the planning and operation of the enterprise:
customer information, employee information, product
information, sales and expenses, etc.
You don’t have to be a company to use a database: you can
store your personal information, expenses, phone numbers
in a database (e.g., using Access on a PC).
As a matter of fact, you could store all data pertinent to a
particular purpose in a database.
This usually means that a database stores data that are
related to each other.
Department of Computer Science, HKUST Slide 6
Database Design
Academic Registration database:
students: names, address, …
courses: course-no, course-names, …
classroom: number, location, …
HKUST
db designer 1
Estate Management database:
db designer 2
classroom: number, location, …
office: number, location, …
faculty-residence: building-no, …
student-residence: hall-no, …
Department of Computer Science, HKUST Slide 7
Is a database the same as a bunch of files?
• You can store data in a file or a set of files, but …
– How do you input data and to get back the data from the
files?
• A database is managed by a DBMS.
Department of Computer Science, HKUST Slide 8
Before we have DBMS
Applications
User
Inventory
Control
User
Customer
Order
Question: When a customer ordered 10 PC
monitors, how many files do you have to update?
Data Files
Key issues: data sharing,
data redundancy
Department of Computer Science, HKUST Slide 9
A Simple Architecture
SQL
DBMS
C/C++
Programs
Applications
Shared facilities:
• Backup and recovery
• Data storage and access
modules
• Programming tools, etc.
Department of Computer Science, HKUST Slide 10
Databases
Purposes of Database Systems
Database management systems were developed to handle
the difficulties caused by different people writing different
applications independently.
Department of Computer Science, HKUST Slide 11
Specifically …
• A DBMS attempts to resolve the following problems:
– Data redundancy and inconsistency by keeping one copy of a data
item in the database
– Difficulty in accessing data by provided query languages and
shared libraries
– Data isolation (multiple files and formats)
– Integrity problems by enforcing constraints (age > 0)
– Atomicity of updates
– Concurrent access by multiple users
– Security problems
Department of Computer Science, HKUST Slide 12
Data Independence
• One big problem in application development is the separation of
applications from data
• Do I have change my program when I …
– replace my hard drive?
– store the data in a b-tree instead of a hash file?
– partition the data into two physical files (or merge two physical files
into one)?
– store salary as floating point number instead of integer?
– develop other applications that use the same set of data?
– add more data fields to support other applications?
– ……
Independence between Data and Programs/Applications
Department of Computer Science, HKUST Slide 13
Data Independence
• Ability to modify a schema definition in one level without
affecting a schema definition in the next higher level.
• The interfaces between the various levels and components
should be well defined so that changes in some parts do not
seriously influence others.
• Two levels of data independence:
- Physical data independence
- Logical data independence
Department of Computer Science, HKUST Slide 14
Data Abstraction
• The answer to the previous questions is to introduce
levels of abstraction of indirection.
• Consider how do function calls allow you to change a
part of your program without affecting other parts?
Main
Program
function
s
function
s
Department of Computer Science, HKUST Slide 15
data
Three Levels of Abstraction
Payroll
view 1
Inventory
view 2
..……...
Sales
view n
Logical
view
Company database
Physical
view
Files on disks
Department of Computer Science, HKUST Slide 16
Application
view
View definitions
logical
Logical schema
physical
Physical schema
Department of Computer Science, HKUST Slide 17
Three Levels of Abstraction (cont.)
• Physical level: describe how a record is stored on disks.
• e.g., “Divide the customer records into 3 partitions and store them
on disks 1, 2 and 3.”
• Logical level: describes data stored in database, and the
relationships among the data. Similar to defining a record type
in Pascal or C:
Type customer = record
name: string;
street: string;
city: integer; end;
• View level: Define a subset of the database for a particular
application. Views can also hide information (e.g. salary) for
security purposes.
Department of Computer Science, HKUST Slide 18
An Example of Data Independence
Data on disk
1129
John Law
program
……
Program accessing data directly has to know:
• first 4 bytes is an ID number
• next 10 bytes is an employee name
Schema
Data on disk
1129
John Law
……
Employee:
ID: integer
Name char(10)
DBMS
program
Department of Computer Science, HKUST Slide 19
Instances and Schemas
• Each level is defined by a schema, which defines the
data at the corresponding level
– A logical schema defines the logical structure of the
database (e.g., set of customers and accounts and the
relationship between them)
– A physical schema defines the file formats and locations
• A database instance refers to the actual content of
the database at a particular point in time. A database
instance must conform to the corresponding schema
Department of Computer Science, HKUST Slide 20
Data Models
• A collection of tools for describing:
– data
– data relationships
– data semantics
– data constraints
Department of Computer Science, HKUST Slide 21
Entity-Relationship Model
• Example of entity-relationship model
social-security
customer-street
account-number
balance
customer-city
customer-name
CUSTOMER
DEPOSITER
Department of Computer Science, HKUST Slide 22
ACCOUNT
Relational Model
Example of tabular data in the relational model:
customername
Johnson
Smith
Johnson
Jones
Smith
socialsecurity
192-83-7465
019-28-3746
192-83-7465
321-12-3123
019-28-3746
customerstreet
Alma
North
Alma
Main
North
account-number
A-101
A-201
A-215
A-217
customercity
Palo Alto
Rye
Palo Alto
Harrison
Rye
accountnumber
A-101
A-215
A-201
A-217
A-201
balance
500
900
700
750
Department of Computer Science, HKUST Slide 23
Data Definition Language (DDL)
• Specification notation for defining the database schema
– Express what were in the previous two slides to the DBMS in a
formal language
• Data storage and definition language - special type of DDL in
which the storage structure and access methods used by the
database system are specified
Department of Computer Science, HKUST Slide 24
Data Manipulation Language (DML)
• Language for accessing and manipulation the data organized by
the appropriate data model
• Two classes of languages
– Procedural - user specifies what data is required and how to
get those data.
– Nonprocedural - user specifies what data is required without
specifying how to get those data
Department of Computer Science, HKUST Slide 25
Transaction Management
•A transaction is a collection of operations that performs a single logical
function in database application
time
Transaction 1
Transaction 1
Transaction 2
Conflicting read/write
Department of Computer Science, HKUST Slide 26
Transaction Management (cont.)
• Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures
(e.g. power failures and operating system crashes) and
transaction failures.
• Concurrency-control manager controls the interaction among
the concurrent transactions, to ensure the consistency of the
database.
Department of Computer Science, HKUST Slide 27
Storage Management
• A storage manager is a program module that provides the
interface between the low-level data stored in the database and
the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
– interaction with the file manager
– efficient storing, retrieving, and updating of data.
Department of Computer Science, HKUST Slide 28
Database Administrator (DBA)
•
•
Coordinates all the activities of the database system; the database
administrator has good understanding of the enterprise’s information
resources and needs.
Database administrator’s duties include:
Primary job of a database
– Schema definition
designer
– Specifying integrity constraints
– Storage structure and access method definition
– Schema and physical organization modification
More system
– Granting user authority to access the database
oriented
– Acting as liaison with users
– Monitoring performance and responding to changes in
requirements
Department of Computer Science, HKUST Slide 29
Database Users
•
•
Users are differentiated by the way they expected to interact with the
system
Application programmers
– Develop applications that interact with DBMS through DML calls
•
Sophisticated users
– form requests in a database query language
– mostly one-time ad hoc queries
•
End users
– invoke one of the existing application programs (e.g., print monthly sales
report)
– Interact with applications through GUI
Department of Computer Science, HKUST Slide 30