Information Technology and Applications


CSC443 Database Management
Course Introduction
Professor Pepper
adapted from presentations given by
Professor Juliana Freire &
Karl Aberer
& Yan Chen
& Silberschatz, Korth and Sudarshan
Today’s Goals
Course Overview
 Why study databases?
 Why use databases?
 Intro to Databases
Major Course Objectives
Design and diagram relational databases
Create Access and Oracle databases
Use SQL commands
Be able to design a good relational
database
Know how to get information out of a
database to answer any question
Diagramming
 Use Case
 Class Diagram
 Entity Relationship Diagram
 Algebraic Relation Model
Tools
 Panther


Unix
Oracle 9.2.0.1.0
 FTP Explorer – register for trial
 MS Access
Books
 Database System Concepts 5th Ed


Theory
Cross Reference for fourth ed
 Oracle 9i Programming - A Primer

Practical examples
 See course syllabus
 Available in Library
Learning Resources
 Blackboard: my.adelphi.edu
 Web site Database System Concepts:
 www.db-book.com/
 My office hours:
 Tuesday & Thursday 12:15-1:30; Wed 12-12:30
 Alumni 114 or Science Lab
 My email: [email protected]
 My phone: 516-747-2362
 My Web: www.adelphi.edu/~pepperk
Adelphi Account Setup





Panther
Oracle
Blackboard
E-mail
Signin Sheet
Projects / Grading
 Projects: 40%


Access – 15
Oracle - 25
 Homework assignments: 20%
 Midterm: 20%
 Final: 20%.
Assignments
 2% dropped for anything 1 day late.
 10% dropped for anything 2 weeks late.
Delivering assignments







Email
ftp
drop box
discussion board
mailbox in math department
E-mail me if making a change in delivery place.
forward your email from Adelphi
What is a
Database Management System?
Database Management System = DBMS
 A collection of files that store the data
 A big program written by someone else that
accesses and updates those files for you
Relational DBMS = RDBMS
 Data files are structured as relations (tables)
Why Study
Databases?
What is behind this Web Site?






http://www.ticketmaster.com/
Search on a large database
Specify search conditions
Many users
Updates
Access through a web interface
Central to Modern Computer Science
Database Systems: Then
Database Systems: Today
Field is developing quickly
From Friendster.com on-line tour
Other databases you may use
Databases are
EVERYWHERE
Current Commercial Outlook
 A major part of the software industry:



Oracle, IBM, Microsoft, Sybase
also Informix (now IBM), Teradata
smaller players: java-based dbms, devices, OO, …
 Well-known benchmarks (esp. TPC)
 Lots of related industries

data warehouse, document management, storage,
backup, reporting, business intelligence, app integration
 Relational products dominant and evolving

adapting for extensibility (user-defined types), adding
native XML support.
 Open Source coming on strong

MySQL, PostgreSQL, BerkeleyDB
Why Study Databases?
 Need exploded


Corporate: retail swipe/clickstreams, “customer
relationship mgmt”, “supply chain mgmt”,
“data warehouses”, etc.
Scientific: digital libraries, Human Genome
project, NASA Mission to Planet Earth,
physical sensors, grid physics network
Why study databases?
 Data is valuable:
 bank account records, tax records,
student records…
 Protect It! - no matter what
• Hurricane
• Flood
• Human error
Why study databases?
Data often structured:
 Example: Bank account records all
follow the same structure
 We can exploit this regular structure


To retrieve data in useful ways (that is, we
can use a query language)
To store data efficiently
Why Study Databases Summary





Central to modern computer science
Databases are everywhere
Commercially successful
Fast moving technology
Plethora of structured data that business and
people need
What is a database?
 Whiteboard Exercise
Database Definition
 Database

– a very large, integrated collection of data. (the stuff)
 Models a real-world enterprise


Entities (e.g., teams, games)
Relationships
(e.g., The Forty-Niners are playing in The Superbowl)
 Database Management System

– software that stores and manages databases (the tools)
A database is better than a simple file
system because file systems suffer from:
 Data redundancy, inconsistency and
isolation
 Difficult to access
 Integrity problems
 Atomicity of updates (change one file and
die before the other completes)
 Multiple user issues
So a Database Has:
 representing information
 data modeling
 languages and systems for querying data
 complex queries with real semantics*
 over massive data sets
 concurrency control for data manipulation
 controlling concurrent access
 ensuring transactional semantics
 reliable data storage
 maintain data semantics even if you pull the
plug
• * semantics: the meaning or relationship of meanings of a sign or set of signs
Why Use a Database
 Why use a database presentation
What is in a database?
Describing Data: Data Models
 A data model is a collection of concepts for
describing data.
 A schema is a description of a particular collection
of data, using a given data model.
 A relation is the data stored in a certain schema
 The relational model of data is the most widely
used model today.



Entities and relations among them
Integrity constraints and business rules
Perspective dependent (warehouse & sales view item
differently)
Database Design
The process of designing the general structure of the
database:
 Logical Design – Deciding on the database
schema.


Business decision – What attributes
Computer Science decision – What relation schemas
 Physical Design – Deciding on the physical layout
of the database
Data
Models
 A collection
of tools for describing




Data
Data relationships
Data semantics
Data constraints
 Relational model
 Entity-Relationship data model (mainly for
database design)
 Object-based data models (Object-oriented and
Object-relational)
 Semistructured data model (XML)
 Other older models:


Network model
Hierarchical model
The Entity-Relationship Model
 Models an enterprise as a collection of entities and relationships

Entity: a “thing” or “object” in the enterprise that is distinguishable
from other objects
• Described by a set of attributes

Relationship: an association among several entities
 Represented diagrammatically by an entity-relationship diagram:
Relational Model
 ER for concept → map to Algebraic
Relational Model
 Relations (tables of possible data)
 Instance (actual data at a given time)
 Schema (description of those tables, their
relations)
Relational Model Terminology
Relational Model Look
 Notation: $\sigma_p(r)$
 p is called the selection predicate
 Defined as:
$\sigma_p(r) = \{t \mid t \in r \text{ and } p(t)\}$
Where p is a formula in propositional calculus
consisting of terms connected by: $\land$ (and), $\lor$ (or),
$\lnot$ (not)
Each term is one of:
<attribute> op <attribute> or <constant>
where op is one of: $=, \ne, >, \ge, <, \le$
 Example of selection:
$\sigma_{branch\_name = \text{"Perryridge"}}(account)$
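The selection operator keeps exactly those tuples of a relation that satisfy the predicate. A minimal Python sketch of that idea follows; the sample rows of account and the helper name select are assumptions made for illustration, not part of the course material.

```python
# Hypothetical sample tuples for the account relation (values are made up).
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Perryridge", "balance": 900},
]


def select(relation, predicate):
    """Return the tuples t of relation for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]


# Selection with the predicate branch_name = "Perryridge".
perryridge = select(account, lambda t: t["branch_name"] == "Perryridge")

# Terms can be combined with and / or / not, as in the definition.
large_perryridge = select(
    account,
    lambda t: t["branch_name"] == "Perryridge" and t["balance"] > 500,
)

print(perryridge)
print(large_perryridge)
```

The result of a selection is again a relation with the same attributes, only with fewer tuples.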
Object-Relational Data Models
 Extend the relational data model by including
object orientation and constructs to deal with
added data types.
 Allow attributes of tuples to have complex types,
including non-atomic values such as nested
relations.
 Preserve relational foundations, in particular the
declarative access to data, while extending
modeling power.
 Provide upward compatibility with existing
relational languages.
Design Goals
 Design Goals:
 Avoid redundant data
 Ensure that relationships among
attributes are represented
 Ensure constraints are properly
modeled, so that updates are checked for
violations of database integrity constraints
Bad Design
Queries
 What the programmer sees
Some Basic SQL Commands





Select – Get rows of data
* - everything
From – the name of the table (relation) will follow
Where – Only get the stuff that matches
Example: Select * from movies where theater = 'Loews'
 Exercise –

Write down the query to select all of your friends that
live in NY State
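To try the Select / From / Where pattern without Oracle or Access, the movies example can be run against SQLite through Python's built-in sqlite3 module. This is only a sketch: the column names title and theater and the sample rows are assumptions, and SQLite stands in for the course databases.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed layout of the movies table; the slides do not define its columns.
conn.execute("CREATE TABLE movies (title TEXT, theater TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Movie A", "Loews"), ("Movie B", "Loews"), ("Movie C", "Clearview")],
)

# Select * (every column) From movies Where the theater matches.
# ORDER BY is added only to make the row order predictable.
loews = conn.execute(
    "SELECT * FROM movies WHERE theater = 'Loews' ORDER BY title"
).fetchall()

# Without a Where clause every row comes back.
everything = conn.execute("SELECT * FROM movies").fetchall()

print(loews)
print(len(everything))
```

The exercise query has the same shape: a different table name after From and a different condition after Where.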
Example: University Database
 Conceptual schema:



Students(sid: string, name: string,
login: string, age: integer, gpa:real)
Courses(cid: string, cname:string,
credits:integer)
Enrolled(sid:string, cid:string,
grade:string)
 External Schema (View):
Course_info(cid:string,enrollment:
integer)

 Physical schema:



Relations stored as unordered files.
Index on first column of Students.
Key to good performance
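The three schema levels of the university example can be sketched with SQLite via Python's sqlite3 module. The tables follow the conceptual schema, the view plays the role of the external schema, and the index is a physical-level choice. The definition of Course_info as a count of Enrolled rows per course, the index name, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual schema: the relations themselves.
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- External schema: a view derived from the stored relations.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;

    -- Physical schema: an index on the first column of Students.
    CREATE INDEX idx_students_sid ON Students (sid);
    """
)

# Hypothetical sample enrollments.
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("s1", "CSC443", "A"), ("s2", "CSC443", "B"), ("s1", "CSC171", "A")],
)

# Users of the view never see Enrolled directly.
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

With this view definition a course with no enrolled students does not appear in Course_info at all; a join against Courses would be needed to report zero.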
View 1
View 2
View 3
Conceptual Schema
Physical Schema
DB
Data Independence (levels of
abstraction)
 Applications insulated from
how data is structured and
stored.
 Logical data independence:
Protection from changes in
logical structure of data –
stabilize views.
 Physical data independence:
Protection from changes in
physical structure of data.
 Q: Why are these particularly
important for DBMS?
View 1
View 2
View 3
Conceptual Schema
Physical Schema
DB
Queries





Change and get data from a database
Run over data model
Easy & efficient
Not good for complex calculations
DML and DDL
Data Manipulation Language
(DML)
 Language for accessing and manipulating the data
organized by the appropriate data model

DML also known as query language
 Two classes of languages


Procedural – user specifies what data is required and
how to get those data
Declarative (nonprocedural) – user specifies what
data is required without specifying how to get those
data
 SQL is the most widely used query language
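The difference between the two classes of languages can be seen by answering one question both ways. In the sketch below, the loop spells out how to find the rows, while the SQL statement only states which rows are wanted. The account data and the 450 threshold are made up for illustration, and SQLite (through Python's sqlite3) stands in for the course DBMS.

```python
import sqlite3

# Hypothetical (account_number, balance) pairs.
accounts = [("A-101", 500), ("A-102", 400), ("A-201", 900)]

# Procedural: we say WHAT we want and HOW to get it (scan, test, collect).
procedural = []
for number, balance in accounts:
    if balance > 450:
        procedural.append(number)

# Declarative: we say only WHAT we want; the DBMS decides how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account "
        "WHERE balance > 450 ORDER BY account_number"
    )
]

print(procedural)
print(declarative)
```

Both approaches return the same accounts, but only the declarative version leaves the DBMS free to use an index or a different scan order.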
Data Definition Language (DDL)
 Specification notation for defining the database schema
Example: create table account (
account_number char(10),
balance integer)
 DDL compiler generates a set of tables stored in a data
dictionary
 Data dictionary contains metadata (i.e., data about data)
 Database schema
 Data storage and definition language
• Specifies the storage structure and access methods
used
 Integrity constraints
• Domain constraints
• Referential integrity (references constraint in SQL)
• Assertions
 Authorization
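A small sketch of these DDL ideas, using SQLite through Python's sqlite3 module as a stand-in for Oracle or Access. The branch table, the non-negative balance rule, and the sample rows are assumptions added to show a domain constraint and a references constraint; sqlite_master is SQLite's own catalog table and plays the role of the data dictionary here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces references constraints only when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE branch (branch_name TEXT PRIMARY KEY);
    CREATE TABLE account (
        account_number CHAR(10) PRIMARY KEY,
        branch_name    TEXT REFERENCES branch (branch_name), -- referential integrity
        balance        INTEGER CHECK (balance >= 0)          -- domain constraint
    );
    """
)

# The data dictionary holds metadata: here, the names of the defined tables.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

conn.execute("INSERT INTO branch VALUES ('Perryridge')")
conn.execute("INSERT INTO account VALUES ('A-102', 'Perryridge', 400)")

# Rows that violate a constraint are refused by the DBMS.
rejected = []
for row in [("A-900", "Perryridge", -5), ("A-901", "Nowhere", 10)]:
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

print(tables)
print(rejected)
```

The first rejected row breaks the balance check and the second names a branch that does not exist, so neither reaches the table.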
Queries - What does it look like?
SELECT eid, ename, title
FROM Emp E
WHERE E.sal > 50000

SELECT E.loc, AVG(E.sal)
FROM Emp E
GROUP BY E.loc
HAVING Count(*) > 5

SELECT COUNT(DISTINCT E.eid)
FROM Emp E, Proj P, Asgn A
WHERE E.eid = A.eid
AND P.pid = A.pid
AND E.loc <> P.loc
 System handles query
plan generation &
optimization; ensures
correct execution.

[Figure: query plan trees built from Select, Join, Group(agg), Having and Count distinct operators over the Emp (Employees), Proj (Projects) and Asgn (Assignments) tables]
 Issues: view reconciliation, operator ordering, physical operator choice,
memory management, access path (index) use, …
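To see the system choose a plan on its own, the join query over Emp, Proj and Asgn can be run in SQLite through Python's sqlite3 module, and the DBMS can be asked how it intends to execute it. This is a sketch under assumptions: the column lists and sample rows are invented, and EXPLAIN QUERY PLAN is SQLite-specific, so other systems expose their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Emp  (eid INTEGER, ename TEXT, title TEXT, sal INTEGER, loc TEXT);
    CREATE TABLE Proj (pid INTEGER, loc TEXT);
    CREATE TABLE Asgn (eid INTEGER, pid INTEGER);
    """
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann", "Eng", 60000, "NY"),
     (2, "Bob", "Eng", 40000, "LA"),
     (3, "Cy", "Mgr", 70000, "NY")],
)
conn.executemany("INSERT INTO Proj VALUES (?, ?)", [(10, "NY"), (20, "LA")])
conn.executemany(
    "INSERT INTO Asgn VALUES (?, ?)", [(1, 10), (1, 20), (2, 20), (3, 20)]
)

# How many employees work on a project located somewhere other than their own location?
query = """
    SELECT COUNT(DISTINCT E.eid)
    FROM Emp E, Proj P, Asgn A
    WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc
"""
remote_workers = conn.execute(query).fetchone()[0]

# Ask SQLite which plan it picked; the last column describes each step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(remote_workers)
for step in plan:
    print(step)
```

The query text never says which table to scan first or how to join; the plan printed at the end is the optimizer's decision and may differ between SQLite versions.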
SQL
 SQL: widely used non-procedural language

Example: Find the name of the customer with customer-id 192-83-7465
select customer.customer_name
from customer
where customer.customer_id = '192-83-7465'

Example: Find the balances of all accounts held by the customer with
customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer_id = '192-83-7465' and
depositor.account_number = account.account_number
 Application programs generally access databases through one of

Language extensions to allow embedded SQL

Application program interface (e.g., ODBC/JDBC) which allow SQL
queries to be sent to a database
 For us: Oracle and Access SQL languages
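An application program typically sends these queries through a programming interface. The sketch below uses Python's built-in sqlite3 module, which plays a role comparable to ODBC/JDBC, to run the two example queries. The table layouts, customer names and balances are sample data assumed for illustration, and SQLite stands in for Oracle or Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
    INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                 ('192-83-7465', 'A-201'),
                                 ('019-28-3746', 'A-215');
    """
)

customer_id = "192-83-7465"

# The ? placeholder lets the interface pass the value safely to the DBMS.
name = conn.execute(
    "SELECT customer.customer_name FROM customer "
    "WHERE customer.customer_id = ?",
    (customer_id,),
).fetchone()[0]

# Join depositor and account to find every balance held by this customer.
balances = [
    row[0]
    for row in conn.execute(
        "SELECT account.balance FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "AND depositor.account_number = account.account_number "
        "ORDER BY account.balance",
        (customer_id,),
    )
]

print(name)
print(balances)
```

Passing the customer id as a parameter rather than pasting it into the SQL text is the usual habit with such interfaces, because it avoids quoting mistakes.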
A Look underneath
Concurrency Control
 Concurrent execution of user programs: key to
good DBMS performance.


Disk accesses frequent, pretty slow
Keep the CPU working on several programs
concurrently.
 Interleaving actions of different programs:
trouble!

e.g., account-transfer & print statement at same time
 DBMS ensures such problems don’t arise.


Users/programmers can pretend they are using a single-user system. (called “Isolation”)
Thank goodness! Don’t have to program “very, very
carefully”.
Transactions: ACID Properties
 Key concept is a transaction: a sequence of database
actions (reads/writes).
 DBMS ensures atomicity (all-or-nothing property)
even if system crashes in the middle.
 Each transaction, executed completely, must take the
DB between consistent states or must not run at all.
 DBMS ensures that concurrent transactions appear to
run in isolation.
 DBMS ensures durability of committed Xacts even if
system crashes.
 DBMS can enforce simple integrity constraints on the
data.
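The all-or-nothing property can be tried out with an account transfer. In this sketch, which uses SQLite through Python's sqlite3 module, a transfer that would overdraw the source account fails halfway and the DBMS undoes the part that already ran. The function name transfer, the non-negative balance rule and the sample balances are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_number TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # simple integrity constraint
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-102", 400)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A-101", "A-102", 200)       # both updates succeed
failed = transfer(conn, "A-101", "A-102", 1000)  # debit violates the check

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to A-102 had already been applied when the debit was rejected; the rollback removes it, so the total across both accounts is unchanged.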
These layers
must consider
concurrency
control and
recovery
Structure of a DBMS
 A typical DBMS has a
layered architecture.
 The figure does not
show the concurrency
control and recovery
components.
 Each database system
has its own variations.
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Overall System Structure
Databases make these folks
happy ...
 DBMS vendors, programmers $20 million industry

Oracle, IBM, MS, Sybase, …
 End users
 Business, education, science, …
 DB application programmers

E.g., smart webmasters

Build web services that run off DBMSs
 Database administrators (DBAs)

Design logical/physical schemas

Handle security and authorization

Data availability, crash recovery

Database tuning as needs evolve
…must understand how a DBMS works
Summary
What is a database – lots of data organized into entities
and schemes with a manager
Why study databases? – common use, needed for
programming apps
Why use databases? – all the advantages over flat file
systems
Intro to Databases
Logical layer:
Query language, data models, transactions
Physical layer
Actual files with indexes, query processing,
concurrency, recovery & logs