PPT - NYU Stern School of Business

Download Report

Transcript PPT - NYU Stern School of Business

ORCL1 Oracle 8i:
SQL & PL/SQL
Session #1
Matthew P. Johnson
CISDD, CUNY
Fall, 2004
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
1
Personnel

Instructor: Matthew P. Johnson


TA: Myo Zin


mattpj at gmail.com
myokyawzin at msn.com
Admin: Dawn Kleinberger

dkleinberger at gc.cuny.edu
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
2
Communications

Web page:

http://pages.stern.nyu.edu/~mjohnson/oracle/

syllabus

course policies

reading assignments

etc.
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
3
Acknowledgements


Thanks to Ramesh, Ullman, et al., Raghu and
Johannes, Dan Suciu, Arthur Keller, David
Kuijt for course materials
See classpage for other related, antecedent
DBMS courses
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
4
What is a Database?


A very large, integrated collection of data.
Models real-world enterprise.

Entities


Relationships




students, courses, instructors, TAs
George is currently taking OCL1
Dick is currently teaching OCL1
Condi is currently TA-ing OCL1 but took it last semester
Database Management System (DBMS):
large software package designed to store and
manage databases
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
5
Databases are everywhere: Ordering a pizza
Databases involved?
Pizza Hut’s DB

1.


stores previous orders by customer
stores previous credit cards used
Credit card records
2.


huge databases of (attempted) purchases
location, date, amount, parties
Got approved by credit-report companies
phone company’s records
3.
4.

Local Usage Details (“Pull his LUDs, Lenny.”)
Caller ID
5.

ensures reported address matches destination
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
6
Your wallet is full of DB records








Driver’s license
Credit cards
Medical insurance card
Social security card
Gym membership
Individual checks
Dollar bills (w/serial numbers)
Maybe even photos (ids on back)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
7
Databases are everywhere


Q: Websites backed by DBMSs?
 retail: Amazon, etc.
 data-mining: “Page You Made”
 search engines: Google, etc.
 directories: Internic, etc.
 searchable DBs: IMDB, tvguide.com, etc.
Q: Non-web examples of DBMSs?
 airline bookings
 criminal/terrorist: TIA
 NYPD’s CompStat
 all serious crime stats by precinct
 Retailers: Wal-Mart, etc.
 when to re-order, purchase patterns, data-mining

Genomics!
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
8
Example of a Traditional DB App

Suppose we are building a system to store
the information about:




checking accounts
savings accounts
account holders
state of each of each person’s accounts
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
9
Can we do it without a DBMS?
Sure we can! Start by storing the data in files:
checking.txt
savings.txt
customers.txt
Now write C or Java programs to implement
specific tasks
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
10
Doing it without a DBMS...

Transfer $100 from George’s savings acc to
checking:
Write a C program to do the following:
Read ‘savings.txt’
Find & update the record “George”
balance -= 100
Write ‘savings.txt’
Read ‘checking.txt’
Find & update the record “George”
balance += 100
Write ‘checking.txt’
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
11
Problems without an DBMS...
1. System crashes:
Read ‘savings.txt’
Find&update the rec “George.”
Write ‘savings.txt’
Read ‘checking.txt’
Find&update the rec “George”
Write ‘checking.txt’



CRASH !
Q: What is the problem?
A: George lost his $100
Same problem even if reordered
2. Simultaneous access by many users


George and Dick visit ATMs at same
Lock checking.txt before each use– what is the problem?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
12
Problems without an DBMS...
3.Large data sets (say 50GB)


No indices


Why is this a problem?
Finding “George” in huge flatfile is expensive
Modifications intractable without better data
structures


“George”  “Georgie” is very expensive
Deletions are very expensive
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
13
Problems without an DBMS...
5.Security?


File system may be insecure
File system security may be coarse
6.Application programming interface (API)?

suppose need other apps to access DB
7.How to interact with other DBMSs?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
14
General problems to solve

In building our own system, many Qs arise:



how do we store the data? (file organization, etc.)
how do we query the data? (write programs…)
make sure that updates don’t mess things up?


provide different views on the data?


leave the DB “consistent”
e.g., ATM user’s view v. bank teller’s view
how do we deal with crashes?

Too hard! Go buy Oracle!

Q: How does a DBMS solve these problems?
A: Long story; see other courses/books

Matthew P. Johnson, OCL1, CISDD CUNY, F2004
15
Big issue: Transaction processing


Grouping of several queries (or other
database actions) into one transaction
ACID properties

Atomicity


Consistency


constraints on relationships
Isolation



all or nothing
concurrency control
Simulated solipsim
Durability

Crash recovery
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
16
Atomicity & Durability


Saw how George lost $100 with makeshift DBMS
DBMS prevents this outcome


xacts are all or nothing
One idea: Keep a log (history) of all actions in set
of xacts


Durability: Use log to redo or undo certain ops in
crash recovery
Atomicity: don’t really commit changes until end
 Then, all at once
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
17
Isolation

Concurrent execution is essential for performance.




Interleaving actions of different user programs
can lead to inconsistency:


Frequent, slow disk accesses
 don’t waste CPU – keep running
e.g., two programs simultaneously withdraw from the
same account
DBMS ensures such problems don’t arise:

users can pretend they are using a single-user system
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
18
Isolation

Contrast with a file in two Notepads




Strategy: ignore multiple users
whichever saves last wins
first save is overwritten
Contrast with a file in two Words



Strategy: blunt isolation
One can edit
To the other it’s read-only
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
19
Consistency

Each xant (on a consistent DB) must leave it
in a consistent state


can define integrity constraints
checks the defined claims about the data remain
true
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
20
Data Models



Any DBMS uses a data model: collection of
concepts for describing data
Schema: description of partic set of data,
using some data model
Relational data model: most widely used (by
far) data model



Oracle, DB2, SQLServer, other SQL DBMSs
main concept: relation ~ table of rows & columns
a rel’s schema defines its fields
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
21
Example: university database

Conceptual schema:




Physical schema:



Students(ssn: string, name: string, login: string,
age: int, gpa: real)
Courses(cid: string, cname: string, credits: int)
Enrolled(sid:string, cid:string, grade: string)
Relations stored as unordered text files.
Indices on first column of each rel
External Schema (View):


Course_info(ssn: string, name: string)
My_courses(cname: string, grade: string)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
22
How the programmer sees the DBMS

Start with DDL to create tables:
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
) ...

Continue with DML to populate tables:
INSERT INTO Students
VALUES(‘Howard, ‘123456789’, ‘undergraduate’)
. . . .
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
23
How the programmer sees the DBMS

Tables:
Takes:
Students:
SSN
123-45-6789
234-56-7890
Courses:
CID
C20.0046
C20.0056

Name
Howard
Wesley
…
Category
undergrad
grad
…
SSN
123-45-6789
CID
C20.0046
123-45-6789
C20.0056
234-56-7890
C20.0046
…
CName
Databases
Advanced Software
semester
Spring,
2004
Spring,
2004
Fall, 2003
Still implemented as files, but behind the
scenes can be quite complex
“data independence” = separate logical view
from physical implementation
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
24
Querying: Structured Query Language

Find all the students who have taken OCL1:


Find all the students who OCL1 last fall:


SELECT SSN
FROM Takes
WHERE CID=“OCL1” AND Semester=“Fall, 2003”
Find the students’ names:


SELECT SSN
FROM Takes
WHERE CID=“OCL1”
SELECT Name
FROM Students, Takes
WHERE Students.SSN=Takes.SSN AND
CID=“OCL1” AND Semester=“Fall, 2003”
Query processor does this efficiently
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
25
Database Industry

Relational databases are a great success of
theoretical ideas






based on most “theoretical” type of math there is: set theory
DBMS companies are among the largest software
companies in the world
Oracle, IBM (with DB2), Microsoft (SQL Server,
Microsoft Access), Sybase
Also opensource: MySQL, PostgreSQL, etc.
$20B+ industry
XML (“semi-structured data”) also important

New lingua franca for exchanging data
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
26
The Study of DBMS

Several aspects:




This course covers all three


Modeling and design of databases
DBMS programming: querying and update
DBMS implementation
though more time on first two
Also will look at some more advanced areas

XML, data-mining, LDAP?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
27
Databases are used by

DB app programmers



Database administrators (DBAs)






desktop app programmers
web developers
design schemas
security/authorization
crash recovery
tuning
Everyone else (perhaps indirectly)
“You may not be interested in databases, but
databases are interested in you.” - Trotsky
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
28
Course outline

Two biggest topics:


SQL
PL/SQL

But also:

Database design:



Entity/Relationship models
Modeling constraints
The relational model:


Relational algebra
Transforming E/R models to relational schemas
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
29
Outline (Continued)

SQL views and triggers
Connecting to a database from a
programming language
Web apps
XML

May change as course progresses




partly in response to audience
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
30
Textbook

Mastering Oracle SQL





Hardcover: 492 pages
Publisher: O'Reilly
ISBN: 0596006322
2nd Edition June 2004
Distributed in class,
hopefully next time
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
31
SQL Readings


Optional reference: SQL in a Nutshell
Online (free) SQL tutorials include:

SQL for Web Nerds
(http://philip.greenspun.com/sql/)

A Gentle Introduction to SQL (http://sqlzoo.net/)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
32
On-going Feedback

Don’t wait until the class is over to give
feedback on improving it

too late for you then!

Send mail if you have questions or concerns

“We’re in touch, so you be in touch.”
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
33
Summary

DBMS used to maintain, query large datasets

Benefits include recovery from system
crashes, concurrent access, data integrity,
security, and quick application development

Database skills are critical in financial
services, marketing and other business
areas!
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
34
So what is this course about, really?
A bit of everything!
 Languages: SQL, XPath, XQuery
 Data modeling
 Some theory!





Functional dependencies, normal forms
e.g., how to find most efficient schema for data
Lots of implementation and hacking in end-ofcourse project
Business DBMS examples/cases
Most importantly: how to meet real-world needs
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
35
For right now: written survey




name
email
previous cs/is/math/logic courses/background
previous programming experience





Perl?
PHP?
HTML?
Job:: programmer, DBA, etc.
why taking class
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
36
Agenda


Last time: intro, RDBMS, ACID test
This time: E/R model
6.
Identify entity sets, relations and attributes
One-one, one-many, many-many relations
Simple ER diagrams to model a situation
3-way relationships; Converting to binary
Entities with multiple roles
Subclasses

Design issues
1.
Principles of faithfulness & simplicity in ER diagrams
Redundancy
Whether an element should be an attribute or entity set
Replacing a relationships with entity sets
1.
2.
3.
4.
5.
2.
3.
4.
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
37
DB development path
the
World
E/R
design
Relational
schema
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
Relational
DB
38
Entity Relationship (E/R) Model




A popular data model – useful to database
designers
Graphical representation of miniworld
Helps design the database, not implement it
E/R design is translated to a relational design


relational design then implemented in an RDBMS
Elements of model




Entities
Entity Sets
Attributes
Relationships (!= relations!)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
39
Elements of E/R Model: Entity Sets

Entity: like an object



Entity set: set of one sort of entities or a
concept




e.g. President Bush
Particular instance of a concept
e.g. World leaders
Generally, same set of attributes
World Leader
Represented by a rectangle
A “good” entity set – you decide


Common properties
Correspond to class of phys. or bus. objects

(People, products, accounts, grades, etc.)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
40
Elements of E/R Model: Attributes

Properties of entities in entity set






Like fields in a struct
Like columns in a table/spreadsheet
Like data members in an object
Values in some domain (e.g., ints, strings)
Represented by ovals:
ID
Name
Assumed atomic


But could have limited structure
Ints, strings, etc.
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
Student
41
Elements of E/R Model:
Relationships
 Connect two or more entity sets




e.g. students enroll in courses
Binary relationships: connect two entity sets –
most common
Multiway relationships: connect several entity
sets
Represented by diamonds
Students
Enroll
Courses
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
42
Elms of E/R Model: Rel’ships (cont’d)



Students Enroll in courses
Courses are Held in rooms
The E/R data model:
Students
ID
Enroll
Courses
Name
Held
Rooms
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
43
A little set theory


A mathematical set a collection of members
A set is defined by its members



Sets specified by listing:



“Are you in or are you out?”
No other structure, no order, no duplicates allowed
{1, 2, 3, …} = N
{1, 2, George Bush} (not useful in DBMS)
Or by “set-builder” notation:



{ x in N: 2 divides x} = ?
{ x in Presidents | reelected(x)} = ?
{2x: x in N} = ?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
44
A little set theory

One set can be a subset of another (which is a
superset)





ReelectedPresidents is a subset of Presidents
Also, RP is a proper subset of Pres – some lost reelection
Given two sets X and Y, the cross product or
Cartesian product is
X x Y = {(x,y): x in X, y in Y}
= the set of all ordered pairs in which the first comes
from X and the second comes from Y
Important: (x,y) != {x,y}
In an order pair or tuple


Order matters
Duplicates are allowed
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
45
A little set theory


Mathematically, a relation(ship) between X and Y is
just a subset of X x Y
= all those pairs (x,y) s.t. x is related to y
Example: owner-of O on People, Cats


The equals relation E on N, N:




O(MPJ, Izzy) holds
E(3,3) holds because 3 = 3
E(3,4) does not hold
E is still a set: E = {(1,1), (2,2), (3,3), …}
Father of relation F on People, People:



F(GHWB, GWB) holds
F(GWB, GHWB) does not hold
 Relations aren’t necessarily symmetric
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
46
Multiplicity of Relationships
Many-many
Many-one
One-one
Representation of relationships



No arrow: many-to-many
Sharp arrow: many-to-one
Rounded arrow: “exactly one”


“key constraint”
One-one:
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
47
Multiplicity of Relationships
Many-to-many:
Students
Enrolls
Courses
Many to one: a student lives in <= 1 residence hall
Student
Live
Residence hall
Many to exactly one: a student must live in a residence hall
Student
Live
Residence hall
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
48
Multiplicity, set-theoretically


Assume no vars below are equal
Many-one means:


One-many means:



if (x1, y1) in R then (x2,y1) cannot be in R
One-one means:


if (x1,y1) in R then (x1,y2) cannot be in R
if (x1,y1) in R, then neither (x2,y1) nor (x1,y2) can
be in R
Notice: one-one is stronger than many-one
One-one implies both many-one and onemany
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
49
E/R Diagram
Name
Name
Students
Enrolls
Courses
ID
ID
Assisting
ID
TA
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
Name
50
E/R Diagrams

Works if each TA is a TA of all students.
Connection student-TA is only through the
course

But what if students were divided into
sections, each section with a separate TA?


Then, a student in OCL1 would be related to only
one of the TA's for OCL1. Which one?
3-way relationship is helpful here
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
51
Multiway Relationships
Students
Courses
Enrolls
Enrolls entries:
Students
Carol
Richard
Al
…
Courses
OCL1
OCL1
OCL1
…
TAs
Donald
Paul
Colin
…
TAs
NB: Enrolls
determines TA:
(student, course) 
at most one TA
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
52
Converting multiway relships to binary



Some models (e.g. ODL) limit relationships to binary
Multiway relationship – equivalent collection of binary,
many to one relationships
Replace relationship with connecting entity set
Course-of
Students
Student-of
NB: Enrolls has no
attributes!
Courses
Enrolls
TA- of
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
TAs
53
Second multiway e.g.: renting movies

Scenario: a Customer Rents a Movie from a
VideoStore on a certain date
date
VideoStore
Rental
Movie
Customer



Q: Which entity does date belong to?
A: To the fact of the renting
Relationships can have attributes

always (implicitly) many-one
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
54
Second multiway e.g.: renting movies


But they don’t have to
Relationship attributes can be replaced with
(trivial) new entities
date
Date
VideoStore
Rental
Movie
Customer
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
55
Second multiway e.g.: renting movies

Where can we draw arrows?
date
VideoStore
Rental
Movie
Customer

(store, video, customer)  date ?




Date is a relship att, implicitly determinied
(store, video, date)  customer ?
(store, date, customer)  video ?
(video, date, customer)  store ?
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
56
Arrow-drawing


Q: Why does it matter?
Round arrow benefit:



Regular arraw benefit:




Obvious: One item takes less space than many
Less obvious: easier to access one item x than set of
one item {x}
 In programming: an int v. a linked list with just one int
Mapping to a set of either one elm or none seems bad
But not implemented this way
Always one element, but that value may be NULL
Lesson: it pays to identify your relship’s multiplicity
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
57
Second multiway e.g.: renting movies
date
VideoStore

Convert to binary?
Rental
Movie
Customer
date
Rental
MovieOf
Movie
StoreOf
Store
BuyerOf
Customer
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
58
Roles in relationships

Entity set appears more than once in a relship



Generally distinct entities
Each appearance is in a different role
Edges labeled by roles
Successor
Pre-req
Course
Prereq
Course
(Pre-req)
Accounting
Finance-I
Finance-I
Calculus
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
Course
(Successor)
Finance-I
Derivatives
Finance-II
Derivatives
59
Subclasses in the E/R model


Some entities are special cases of other
Conversely: some are generalizations


Humans are specialized mammals
Grad students are specialized students
 And,


in turn, specialized mammals
NB: These aren’t examples but subclasses
Subclass A isa B





Represented by a triangle
Always one-to-one, though arrows omitted
Root is more general
Multiple inheritance is allowed!
A single entity may consist of all components (sets of
fields) in aribtrary ESs and their ancestors
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
60
Subclasses
length
stars
Voices
title
“Lion
year
King”:
atts of Movies;
relship Voices
Movies
isa
Lion King
TX
isa
Chainsaw
Massacre
Weapon
MurderMysteries
Cartoons
“Roger
Component
Roger
Rabbit
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
Rabbit”:
atts of Movies;
relship Voices;
att weapon
61
E/R inheritance v. OO inheritance

In a OOP class hierarchy, children also inherit
“attributes” from parents


In E/R, an entity may be composed of components
from multiple, not-directly-related ESs



But an object is an instance of one class
Roger Rabbit is composed of components from Cartoons,
Murder Mysteries, and Movies
We could create a Cartoon Murder Mysteries ES if there
were any atts specific to them
So the real difference: In E/R, can have implicit
multiple inheritance between any set of IS-Aconnected nodes (sharing a root)
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
62
Design Principles





Faithfulness
Avoiding redundancy
Simplicity
Choice of relationships
Picking elements
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
63
Faithfulness




Is the relationship many-many or many-one?
Are the attributes appropriate?
Are the relationships applicable to the
entities?
Examples

Courses & instructors


maybe many-one, maybe many-many
Bosses & subordinates

maybe one-many, maybe many-many
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
64
Simplicity


Einstein: Theories as simple as possible, but not
simpler.
Use as few elements as possible




Minimum required relations
No unnecessary attributes (will you be using this
attribute?)
Eliminate “spinning wheels”
Example: how can we simplify this?
Movies
Owned-by
Ownings
Owns
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
Studios
65
Avoiding redundancy

Say everything exactly once


Minimize database storage requirements
More important: prevent possible update errors


simplest but not only e.g.: modify data one place
but not the other – more later
Example: Spot the redundancy
Studios
Own
Name
Movies
Length
Address
Name
StudioName
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
66
Next time


We’ll finish E/R models and begin the
relational model
Readings will be posted
Matthew P. Johnson, OCL1, CISDD CUNY, F2004
67