PPT - NYU Stern School of Business
Download
Report
Transcript PPT - NYU Stern School of Business
OCL2 Oracle 10g:
SQL & PL/SQL
Session #1
Matthew P. Johnson
CISDD, CUNY
January, 2005
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
1
Personnel
Instructor: Matthew P. Johnson
TA: Ratna Priya Moganti
mattpj-at-gmail.com
rmoganti7-at-yahoo.com
Admin: Dawn Kleinberger
dkleinberger-at-gc.cuny.edu
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
2
Communications
Web page:
http://pages.stern.nyu.edu/~mjohnson/oracle/
syllabus
course policies
reading assignments
etc.
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
3
Acknowledgements
Thanks to Ramesh at NYU, Ullman, et al.,
Raghu and Johannes, Dan Suciu, Arthur
Keller, David Kuijt for course materials
See classpage for other related, antecedent
DBMS courses
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
4
What is a Database?
A very large, integrated collection of data.
Models real-world enterprise.
Entities
Relationships
students, courses, instructors, TAs
George is currently taking OCL2
Dick is currently teaching OCL2
Condi is currently TA-ing OCL2 but took it last semester
Database Management System (DBMS):
large software package designed to store and
manage databases
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
5
Databases are everywhere: Ordering a pizza
Databases involved?
Pizza Hut’s DB
1.
stores previous orders by customer
stores previous credit cards used
Credit card records
2.
huge databases of (attempted) purchases
location, date, amount, parties
Got approved by credit-report companies
phone company’s records
3.
4.
Local Usage Details (“Pull his LUDs, Lenny.”)
Caller ID
5.
ensures reported address matches destination
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
6
Your wallet is full of DB records
Driver’s license
Credit cards
Medical insurance card
Social security card
Gym membership
Individual checks
Dollar bills (w/serial numbers)
Maybe even photos (ids on back)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
7
Databases are everywhere
Q: Websites backed by DBMSs?
retail: Amazon, etc.
data-mining: “Page You Made”
search engines: Google, etc.
directories: Internic, etc.
searchable DBs: IMDB, tvguide.com, etc.
Q: Non-web examples of DBMSs?
airline bookings
criminal/terrorist: TIA
NYPD’s CompStat
all serious crime stats by precinct
Retailers: Wal-Mart, etc.
when to re-order, purchase patterns, data-mining
Genomics!
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
8
Example of a Traditional DB App
Suppose we are building a system to store
the information about:
checking accounts
savings accounts
account holders
state of each of each person’s accounts
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
9
Can we do it without a DBMS?
Sure we can! Start by storing the data in files:
checking.txt
savings.txt
customers.txt
Now write C or Java programs to implement
specific tasks
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
10
Doing it without a DBMS...
Transfer $100 from George’s savings acc to
checking:
Write a C program to do the following:
Read ‘savings.txt’
Find & update the record “George”
balance -= 100
Write ‘savings.txt’
Read ‘checking.txt’
Find & update the record “George”
balance += 100
Write ‘checking.txt’
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
11
Problems without a DBMS...
1. System crashes:
Read ‘savings.txt’
Find&update the rec “George.”
Write ‘savings.txt’
Read ‘checking.txt’
Find&update the rec “George”
Write ‘checking.txt’
CRASH !
Q: What is the problem?
A: George lost his $100
Same problem even if reordered
2. Simultaneous access by many users
George and Dick visit ATMs at same
Lock checking.txt before each use– what is the problem?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
12
Problems without an DBMS...
3.Large data sets (say 50GB)
No indices
Why is this a problem?
Finding “George” in huge flatfile is expensive
Modifications intractable without better data
structures
“George” “Georgie” is very expensive
Deletions are very expensive
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
13
Problems without a DBMS...
5.Security?
File system may be insecure
File system security may be coarse
6.Application programming interface (API)?
suppose need other apps to access DB
7.How to interact with other DBMSs?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
14
General problems to solve
In building our own system, many Qs arise:
how do we store the data? (file organization, etc.)
how do we query the data? (write programs…)
make sure that updates don’t mess things up?
leave the DB “consistent”
provide different views on the data?
e.g., ATM user’s view v. bank teller’s view
how do we deal with crashes?
Too hard! Go buy Oracle!
Q: How does a DBMS solve these problems?
A: Long story; see other courses/books
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
15
Big issue: Transaction processing
Grouping of several queries (or other
database actions) into one transaction
ACID properties
Atomicity
Consistency
constraints on relationships
Isolation
all or nothing
concurrency control
Simulated solipsism
Durability
Crash recovery
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
16
Atomicity & Durability
Saw how George lost $100 with makeshift DBMS
Real DBMSs prevent this outcome
xacts are all or nothing
One idea: Keep a log (history) of all actions in set
of xacts
Durability: Use log to redo or undo certain ops in
crash recovery
Atomicity: don’t really commit changes until end
Then, all at once
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
17
Isolation
Concurrent execution is essential for performance.
Interleaving actions of different user programs
can lead to inconsistency:
Frequent, slow disk accesses
don’t waste CPU – keep running
e.g., two programs simultaneously withdraw from the
same account
DBMS ensures such problems don’t arise:
users can pretend they are using a single-user system
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
18
Isolation
Contrast with a file in two Notepads
Strategy: ignore multiple users
whichever saves last wins
first save is overwritten
Contrast with a file in two Words
Strategy: blunt isolation
One can edit
To the other it’s read-only
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
19
Consistency
Each xant (on a consistent DB) must leave it
in a consistent state
can define integrity constraints
checks the defined claims about the data remain
true
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
20
Data Models
Every DBMS uses some data model:
collection of concepts for describing data
Schema: description of partic set of data,
using some data model
Relational data model: most widely used (by
far) data model
Oracle, DB2, SQLServer, other SQL DBMSs
main concept: relation ~ table of rows & columns
a rel’s schema defines its fields
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
21
Example: university database
Conceptual schema:
Physical schema:
Students(ssn: string, name: string, login: string,
age: int, gpa: real)
Courses(cid: string, cname: string, credits: int)
Enrolled(sid:string, cid:string, grade: string)
Relations stored as unordered text files.
Indices on first column of each rel
External Schema (View):
Course_info(ssn: string, name: string)
My_courses(cname: string, grade: string)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
22
How the programmer sees the DBMS
Start with DDL to create tables:
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
) ...
Continue with DML to populate tables:
INSERT INTO Students
VALUES(‘Howard, ‘123456789’, ‘undergraduate’)
. . . .
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
23
How the programmer sees the DBMS
Tables:
Takes:
Students:
SSN
123-45-6789
234-56-7890
Courses:
CID
C20.0046
C20.0056
Name
Howard
Wesley
…
Category
undergrad
grad
…
SSN
123-45-6789
CID
C20.0046
123-45-6789
C20.0056
234-56-7890
C20.0046
…
CName
Databases
Advanced Software
semester
Spring,
2004
Spring,
2004
Fall, 2003
Still implemented as files, but behind the
scenes can be quite complex
“data independence” = separate logical view
from physical implementation
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
24
Querying: Structured Query Language
Find all the students who have taken OCL2:
SELECT SSN
FROM Takes
WHERE CID=“OCL2”
Find all the students who OCL2 last fall:
SELECT SSN
FROM Takes
WHERE CID=“OCL2” AND Semester=“Fall, 2003”
Find the students’ names:
SELECT Name
FROM Students, Takes
WHERE Students.SSN=Takes.SSN AND
CID=“OCL2” AND Semester=“Fall, 2003”
Query processor does this efficiently
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
25
Database Industry
Relational databases are a great success of
theoretical ideas
based on most “theoretical” type of math there is: set theory
DBMS companies are among the largest software
companies in the world
Oracle, IBM (with DB2), Microsoft (SQL Server,
Microsoft Access), Sybase
Also opensource: MySQL, PostgreSQL, etc.
$20B+ industry
XML (“semi-structured data”) also important
New lingua franca for exchanging data
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
26
The Study of DBMS
Several aspects:
This course covers all three
Modeling and design of databases
DBMS programming: querying and update
DBMS implementation
though more time on first two
Also will look at some more advanced areas
XML, data-mining, LDAP?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
27
Databases are used by
DB app programmers
Database administrators (DBAs)
desktop app programmers
web developers
design schemas
security/authorization
crash recovery
tuning
Everyone else (perhaps indirectly)
“You may not be interested in databases, but
databases are interested in you.” - Trotsky
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
28
Course outline
Two biggest topics:
SQL
PL/SQL
But also:
Database design:
Entity/Relationship models
Modeling constraints
The relational model:
Relational algebra
Transforming E/R models to relational schemas
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
29
Outline (Continued)
SQL views and triggers
Connecting to a Oracle from programming
languages
Web apps
XML
May change as course progresses
partly in response to audience
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
30
Textbook
Oracle Database 10g PL/SQL 101
by Christopher Allen
Hardcover: 416 pages
Publisher: McGraw-Hill/
Osborne Media
ISBN: 0072255404
1 edition (August 10, 2004)
Distributed in class
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
31
SQL Readings
Optional reference: SQL in a Nutshell
Online (free) SQL tutorials include:
SQL for Web Nerds
(http://philip.greenspun.com/sql/)
A Gentle Introduction to SQL (http://sqlzoo.net/)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
32
On-going Feedback
Don’t wait until the class is over to give
feedback on improving it
too late for you then!
Send mail if you have questions or concerns
“We’re in touch, so you be in touch.”
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
33
Summary
DBMS used to maintain, query large datasets
Benefits include recovery from system
crashes, concurrent access, data integrity,
security, and quick application development
Database skills are critical in financial
services, marketing and other business
areas!
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
34
So what is this course about, really?
A bit of everything!
Languages: SQL, XPath, XQuery
Data modeling
Some theory!
Functional dependencies, normal forms
e.g., how to find most efficient schema for data
Lots of implementation and hacking in end-ofcourse project
Business DBMS examples/cases
Most importantly: how to meet real-world needs
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
35
For right now: written survey
Email to mattpj-at-gmail.com:
name
email
previous cs/is/math/logic courses/background
previous programming experience
Perl?
PHP?
HTML?
Job: programmer, DBA, etc.
why taking class
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
36
Agenda
Last time: intro, RDBMS, ACID test
This time: E/R model
6.
Identify entity sets, relations and attributes
One-one, one-many, many-many relations
Simple ER diagrams to model a situation
3-way relationships; Converting to binary
Entities with multiple roles
Subclasses
Design issues
1.
Principles of faithfulness & simplicity in ER diagrams
Redundancy
Whether an element should be an attribute or entity set
Replacing a relationships with entity sets
1.
2.
3.
4.
5.
2.
3.
4.
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
37
DB development path
the
World
E/R
design
Relational
schema
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Relational
DB
38
Entity Relationship (E/R) Model
A popular data model – useful to database
designers
Graphical representation of miniworld
Helps design the database, not implement it
E/R design is translated to a relational design
relational design then implemented in an RDBMS
Elements of model
Entities
Entity Sets
Attributes
Relationships (!= relations!)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
39
Elements of E/R Model: Entity Sets
Entity: like an object
Entity set: set of one sort of entities or a
concept
e.g. President Bush
Particular instance of a concept
e.g. World leaders
Generally, same set of attributes
World Leader
Represented by a rectangle
A “good” entity set – you decide
Common properties
Correspond to class of phys. or bus. objects
(People, products, accounts, grades, etc.)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
40
Elements of E/R Model: Attributes
Properties of entities in entity set
Like fields in a struct
Like columns in a table/spreadsheet
Like data members in an object
Values in some domain (e.g., ints, strings)
Represented by ovals:
ID
Name
Assumed atomic
But could have limited structure
Ints, strings, etc.
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Student
41
Elements of E/R Model:
Relationships
Connect two or more entity sets
e.g. students enroll in courses
Binary relationships: connect two entity sets –
most common
Multiway relationships: connect several entity
sets
Represented by diamonds
Students
Enroll
Courses
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
42
Elms of E/R Model: Rel’ships (cont’d)
Students Enroll in courses
Courses are Held in rooms
The E/R data model:
Students
ID
Name
Enroll
Courses
Held
Rooms
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
43
A little set theory
A mathematical set a collection of members
A set is defined by its members
Sets specified by listing:
“Are you in or are you out?”
No other structure, no order, no duplicates allowed
{1, 2, 3, …} = N
{1, 2, George Bush} (not useful in DBMS)
Or by “set-builder” notation:
{ x in N: 2 divides x} = ?
{ x in Presidents | reelected(x)} = ?
{2x: x in N} = ?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
44
A little set theory
One set can be a subset of another (which is then a
superset of it)
ReelectedPresidents is a subset of Presidents
Also, RP is a proper subset of Pres – some lost reelection
Given two sets X and Y, the cross product or
Cartesian product is
X x Y = {(x,y): x in X, y in Y}
= the set of all ordered pairs in which the first comes
from X and the second comes from Y
Important: (x,y) != {x,y}
In an order pair or tuple
Order matters
Duplicates are allowed
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
45
A little set theory
Mathematically, a relation(ship) between X and Y is
just a subset of X x Y
= all those pairs (x,y) s.t. x is related to y
Example: owner-of O on People, Cats
The equals relation E on N, N:
O(MPJ, Gödel) holds
E(3,3) holds because 3 = 3
E(3,4) does not hold
E is still a set: E = {(1,1), (2,2), (3,3), …}
Father of relation F on People, People:
F(GHWB, GWB) holds
F(GWB, GHWB) does not hold
Relations aren’t necessarily symmetric
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
46
Multiplicity of Relationships
Many-many
Many-one
One-one
Representation of relationships
No arrow: many-to-many
Sharp arrow: many-to-one
Rounded arrow: “exactly one”
“key constraint”
One-one:
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
47
Multiplicity of Relationships
Many-to-many:
Students
Enrolls
Courses
Many to one: a student lives in <= 1 residence hall
Student
Live
Residence hall
Many to exactly one: a student must live in a residence hall
Student
Live
Residence hall
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
48
Multiplicity, set-theoretically
Assume: no vars below are equal
Many-one means:
One-many means:
if (x1, y1) in R then (x2,y1) cannot be in R
One-one means:
if (x1,y1) in R then (x1,y2) cannot be in R
if (x1,y1) in R, then neither (x1,y2) nor (x2,y1) can
be in R
Notice: one-one is stronger than many-one
One-one implies both many-one and onemany
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
49
E/R Diagram
Name
Name
Students
Enrolls
Courses
ID
ID
Assisting
ID
TA
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Name
50
E/R Diagrams
Works if each TA is a TA of all students.
Connection student-TA is only through the
course
But what if students were divided into
sections, each section with a separate TA?
Then, a student in OCL2 would be related to only
one of the TAs for OCL2. Which one?
3-way relationship is helpful here
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
51
Multiway Relationships
Students
Courses
Enrolls
Enrolls entries:
Students
Alberto
Richard
Bernard
…
Courses
OCL2
OCL2
OCL2
…
TAs
Donald
Paul
Colin
…
TAs
NB: Enrolls
determines TA:
(student, course)
at most one TA
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
52
Converting multiway relships to binary
Some models limit relationships to binary
Multiway relationship – equivalent collection of binary,
many to one relationships
Replace relationship with connecting entity set
Course-of
Students
Student-of
NB: Enrolls has no
attributes!
Courses
Enrolls
TA- of
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
TAs
53
Second multiway e.g.: renting movies
Scenario: a Customer Rents a Movie from a
VideoStore on a certain date
date
VideoStore
Rental
Movie
Customer
Q: Which entity does date belong to?
A: To the fact of the renting
Relationships can have attributes
always (implicitly) many-one
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
54
Second multiway e.g.: renting movies
But they don’t have to
Relationship attributes can be replaced with
(trivial) new entities
date
Date
VideoStore
Rental
Movie
Customer
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
55
Second multiway e.g.: renting movies
Where can we draw arrows?
date
VideoStore
Rental
Movie
Customer
(store, video, customer) date ?
Date is a relship att, implicitly determined
(store, video, date) customer ?
(store, date, customer) video ?
(video, date, customer) store ?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
56
Arrow-drawing
Q: Why does it matter?
Round arrow benefit:
Obvious: One item takes less space than many
Less obvious: easier to access one item x than set of
one item {x}
Regular arrow benefit:
In programming: an int v. a linked list with just one int
Mapping to a set of either one elm or none seems bad
But not implemented this way
Always one element, but that value may be NULL
Lesson: it pays to identify your relship’s multiplicity
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
57
Second multiway e.g.: renting movies
date
VideoStore
Convert to binary?
Rental
Movie
Customer
date
Rental
MovieOf
Movie
StoreOf
Store
BuyerOf
Customer
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
58
Roles in relationships
Entity set appears more than once in a relship
Generally distinct entities
Each appearance is in a different role
Edges labeled by roles
Successor
Pre-req
Course
Prereq
Course
(Pre-req)
Accounting
Finance-I
Finance-I
Calculus
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Course
(Successor)
Finance-I
Derivatives
Finance-II
Derivatives
59
Subclasses in the E/R model
Some entities are special cases of other
Conversely: some are generalizations
Humans are specialized mammals
Grad students are specialized students
And, in turn, specialized mammals
NB: These aren’t examples but subclasses
Subclass A isa B
Represented by a triangle
Always one-to-one, though arrows omitted
Root is more general
Multiple inheritance is allowed!
A single entity may consist of all components (sets of
fields) in arbitrary ESs and their ancestors
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
60
Subclasses
length
stars
Voices
title
“Lion
year
King”:
atts of Movies;
relship Voices
Movies
isa
Lion King
TX
isa
Chainsaw
Massacre
Weapon
MurderMysteries
Cartoons
“Roger
Component
Roger
Rabbit
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Rabbit”:
atts of Movies;
relship Voices;
att weapon
61
E/R inheritance v. OO inheritance
In a OOP class hierarchy, children also inherit
“attributes” from parents
In E/R, an entity may be composed of components
from multiple, not-directly-related ESs
But an object is an instance of one class
Roger Rabbit is composed of components from Cartoons,
Murder Mysteries, and Movies
We could create a Cartoon Murder Mysteries ES if there
were any atts specific to them
So the real difference: In E/R, can have implicit
multiple inheritance between any set of IS-Aconnected nodes (sharing a root)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
62
Design Principles
Faithfulness
Avoiding redundancy
Simplicity
Choice of relationships
Picking elements
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
63
Faithfulness
Is the relationship many-many or many-one?
Are the attributes appropriate?
Are the relationships applicable to the
entities?
Examples
Courses & instructors
maybe many-one, maybe many-many
Bosses & subordinates
maybe one-many, maybe many-many
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
64
Simplicity
Einstein: Theories as simple as possible, but not
simpler.
Use as few elements as possible
Minimum required relations
No unnecessary attributes (will you be using this
attribute?)
Eliminate “spinning wheels”
Example: how can we simplify this?
Movies
Owned-by
Ownings
Owns
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
Studios
65
Avoiding redundancy
Say everything exactly once
Minimize database storage requirements
More important: prevent possible update errors
simplest but not only e.g.: modify data one place
but not the other – more later
Example: Spot the redundancy
Studios
Own
Name
Movies
Length
Address
Name
StudioName
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
66
Next time
We’ll finish E/R models and begin the
relational model
Readings will be posted
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
67