- Courses - University of California, Berkeley

Download Report

Transcript - Courses - University of California, Berkeley

Relational Algebra and Calculus
University of California, Berkeley
School of Information Management and
Systems
SIMS 257: Database Management
10/3/2000
SIMS 257: Database Management -- Ray Larson
Review
• Physical Database Design
• RAID
• Data Integrity
10/3/2000
SIMS 257: Database Management -- Ray Larson
Physical Design Decisions
• There are several critical decisions that will
affect the integrity and performance of the
system.
–
–
–
–
–
10/3/2000
Storage Format
Physical record composition
Data arrangement
Indexes
Query optimization and performance tuning
SIMS 257: Database Management -- Ray Larson
When to Index
• Tradeoff between time and space:
– Indexes permit faster processing for searching
– But they take up space for the index
– They also slow processing for insertions, deletions, and
updates, because both the table and the index must be
modified
• Thus they SHOULD be used for databases where
search is the main mode of interaction
• The might be skipped if high rates of updating and
insertions are expected
10/3/2000
SIMS 257: Database Management -- Ray Larson
When to Use Indexes
• Rules of thumb
– Indexes are most useful on larger tables
– Specify a unique index for the primary key of each
table
– Indexes are most useful for attributes used as search
criteria or for joining tables
– Indexes are useful if sorting is often done on the
attribute
– Most useful when there are many different values for an
attribute
– Some DBMS limit the number of indexes and the size
of the index key values
– Some indexed will not retrieve NULL values
10/3/2000
SIMS 257: Database Management -- Ray Larson
RAID Technology
One logical disk drive
Parallel
Writes
Disk 1
Disk 2
Disk 3
Disk 4
1
5
2
6
3
7
4
8
9
*
*
*
10
*
*
*
11
*
*
*
12
*
*
*
Stripe
Stripe
Stripe
Parallel
Reads
10/3/2000
SIMS 257: Database Management -- Ray Larson
Raid 0
One logical disk drive
Parallel
Writes
Disk 1
Disk 2
Disk 3
Disk 4
1
5
2
6
3
7
4
8
9
*
*
*
10
*
*
*
11
*
*
*
12
*
*
*
Stripe
Stripe
Stripe
Parallel
Reads
10/3/2000
SIMS 257: Database Management -- Ray Larson
RAID-1
Parallel
Writes
Disk 1
Disk 2
Disk 3
Disk 4
1
3
1
3
2
4
2
4
5
*
*
*
5
*
*
*
6
*
*
*
6
*
*
*
Stripe
Stripe
Stripe
Parallel
Reads
10/3/2000
SIMS 257: Database Management -- Ray Larson
RAID-2
Writes span all drives
Disk 1
Disk 2
Disk 3
Disk 4
1a
1b
ecc
ecc
Stripe
2a
3a
*
*
*
2b
3b
*
*
*
ecc ecc
ecc ecc
*
*
*
*
*
*
Stripe
Stripe
Reads span all drives
10/3/2000
SIMS 257: Database Management -- Ray Larson
RAID-3
Writes span all drives
Disk 1
Disk 2
Disk 3
Disk 4
1a
1b
1c
ecc
Stripe
2a
3a
*
*
*
2b
3b
*
*
*
2c
3c
*
*
*
ecc
ecc
*
*
*
Stripe
Stripe
Reads span all drives
10/3/2000
SIMS 257: Database Management -- Ray Larson
Raid-4
Parallel
Writes
Disk 1
Disk 2
Disk 3
Disk 4
1
2
3
ecc
Stripe
4
7
*
*
*
5
8
*
*
*
6
9
*
*
*
ecc
ecc
*
*
*
Stripe
Stripe
Parallel
Reads
10/3/2000
SIMS 257: Database Management -- Ray Larson
RAID-5
Parallel
Writes
Disk 1
Disk 2
Disk 3
Disk 4
1
5
2
6
3
7
4
8
9
*
*
ecc
10
*
*
ecc
11
*
*
ecc
12
*
*
ecc
Stripe
Stripe
Stripe
Parallel
Reads
10/3/2000
SIMS 257: Database Management -- Ray Larson
Integrity Constraints
• The constraints we wish to impose in order to
protect the database from becoming inconsistent.
• Five types
–
–
–
–
–
10/3/2000
Required data
attribute domain constraints
entity integrity
referential integrity
enterprise constraints
SIMS 257: Database Management -- Ray Larson
10/3/2000
SIMS 257: Database Management -- Ray Larson
Required Data
• Some attributes must always contain a value -they cannot have a null
• For example:
– Every employee must have a job title.
– Every diveshop diveitem must have an order number
and an item number.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Attribute Domain Constraints
• Every attribute has a domain, that is a set of values
that are legal for it to use.
• For example:
– The domain of sex in the employee relation is “M” or
“F”
• Domain ranges can be used to validate input to the
database.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Entity Integrity
• The primary key of any entity cannot be NULL.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Referential Integrity
• A “foreign key” links each occurrence in a relation
representing a child entity to the occurrence of the
parent entity containing the matching candidate
key.
• Referential Integrity means that if the foreign key
contains a value, that value must refer to an
existing occurrence in the parent entity.
• For example:
– Since the Order ID in the diveitem relation refers to a
particular diveords item, that item must exist for
referential integrity to be satisfied.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Referential Integrity
• Referential integrity options are declared when
tables are defined (in most systems)
• There are many issues having to do with how
particular referential integrity constraints are to be
implemented to deal with insertions and deletions
of data from the parent and child tables.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Insertion rules
• A row should not be inserted in the referencing
(child) table unless there already exists a matching
entry in the referenced table.
• Inserting into the parent table should not cause
referential integrity problems.
• Sometimes a special NULL value may be used to
create child entries without a parent or with a
“dummy” parent.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Deletion rules
• A row should not be deleted from the referenced
table (parent) if there are matching rows in the
referencing table (child).
• Three ways to handle this
– Restrict -- disallow the delete
– Nullify -- reset the foreign keys in the child to some
NULL or dummy value
– Cascade -- Delete all rows in the child where there is a
foreign key matching the key in the parent row being
deleted
10/3/2000
SIMS 257: Database Management -- Ray Larson
Referential Integrity
• This can be implemented using external programs
that access the database
• newer databases implement executable rules or
built-in integrity constraints (e.g. Access)
10/3/2000
SIMS 257: Database Management -- Ray Larson
Enterprise Constraints
• These are business rule that may affect the
database and the data in it
– for example, if a manager is only permitted to manage
10 employees then it would violate an enterprise
constraint to manage more
10/3/2000
SIMS 257: Database Management -- Ray Larson
Today
•
•
•
•
Relational Operations
Relational Algebra
Relational Calculus
Introduction to SQL via Access
10/3/2000
SIMS 257: Database Management -- Ray Larson
Relational Algebra Operations
•
•
•
•
•
•
•
•
Select
Project
Product
Union
Intersect
Difference
Join
Divide
10/3/2000
SIMS 257: Database Management -- Ray Larson
Select
• Extracts specified tuples (rows) from a
specified relation (table).
10/3/2000
SIMS 257: Database Management -- Ray Larson
Project
• Extracts specified attributes(columns) from
a specified relation.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Product
• Builds a relation from two specified
relations consisting of all possible
concatenated pairs of tuples, one from each
of the two relations.
Product
a
b
c
10/3/2000
x
y
a
a
b
b
c
c
x
y
x
y
x
y
SIMS 257: Database Management -- Ray Larson
Union
• Builds a relation consisting of all tuples
appearing in either or both of two specified
relations.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Intersect
• Builds a relation consisting of all tuples
appearing in both of two specified relations
10/3/2000
SIMS 257: Database Management -- Ray Larson
Difference
• Builds a relation consisting of all tuples
appearing in first relation but not the
second.
10/3/2000
SIMS 257: Database Management -- Ray Larson
Join
• Builds a relation from two specified
relations consisting of all possible
concatenated pairs of, one from each of the
two relations, such that in each pair the two
tuples satisfy some condition.
A1 B1
A2 B1
A3 B2
10/3/2000
B1 C1
B2 C2
B3 C3
(Natural
or Inner)
Join
A1 B1 C1
A2 B1 C1
A3 B2 C2
SIMS 257: Database Management -- Ray Larson
Outer Join
• Outer Joins are similar to PRODUCT -- but
will leave NULLs for any row in the first
table with no corresponding rows in the
second.
Outer
Join
A1
A2
A3
A4
10/3/2000
B1
B1
B2
B7
B1 C1
B2 C2
B3 C3
SIMS 257: Database Management -- Ray Larson
A1 B1 C1
A2 B1 C1
A3 B2 C2
A4 * *
Divide
• Takes two relations, one binary and one
unary, and builds a relation consisting of all
values of one attribute of the binary relation
that match (in the other attribute) all values
in the unary relation.
Divide
a
a
a
b
c
10/3/2000
x
y
z
x
y
a
x
y
SIMS 257: Database Management -- Ray Larson
ER Diagram: Acme Widget Co.
Emp#
Wage
ISA
Hourly
Sales
Cust#
Customer
Employee
Sales-Rep
Writes
Orders
Invoice
Part#
Invoice# Quantity
Contains
Line-Item
Contains
Invoice# Rep#
Part#
Count
Price
Cust#
10/3/2000
Part
SIMS 257: Database Management -- Ray Larson
Employee
Firstname
Janet
Thomas
Charles
Roberto
10/3/2000
Middlename Birthdate
Mary
6/25/63
Frederick
8/4/70
Robert
9/23/61
Garcia
7/8/58
Address
City
Zip
234 State Berkeley 92654
12 Lambert Oakland 93587
44 Central Berkeley 94720
76 Highland Berkeley 94654
SIMS 257: Database Management -- Ray Larson
Part
Part #
1
2
3
4
5
6
7
8
9
10/3/2000
Name
Price
Count
Big blue widget
3.76
2
Small blue Widget
7.35
4
Tiny red widget
5.25
7
large red widget
157.23
23
double widget rack
10.44
12
Small green Widget
30.45
58
Big yellow widget
7.96
1
Tiny orange widget
81.75
42
Big purple widget
55.99
9
SIMS 257: Database Management -- Ray Larson
Sales-Rep
SSN
Rep # Sales
123-76-3423
1 $12,345.45
843-36-7659
2 $231,456.75
Hourly
SSN
Wage
342-88-7865
$12.75
486-87-6543
$20.50
10/3/2000
SIMS 257: Database Management -- Ray Larson
Customer
Cust #
COMPANY
STREET1
Integrated Standards
1 Ltd.
35 Broadway
STREET2
STATE ZIPCODE
New York
NY
02111
34 Bureaucracy Plaza Floors 1-172
3 Control Elevation
Cyber Assicates
Place
Center
Phildelphia
PA
03756
Cyberoid
NY
08645
35 Libra Plaza
Nashua
NH
09242
1 Broadway
Middletown
IN
32467
88 Oligopoly Place
3 Independence
Parkway
Sagrado
TX
78798
Rivendell
CA
93456
8 Little Mighty Micro
34 Last One Drive
Orinda
CA
94563
9 SportLine Ltd.
38 Champion Place
Compton
CA
95328
2 MegaInt Inc.
3 Cyber Associates
General
4 Consolidated
Consolidated
5 MultiCorp
Internet Behometh
6 Ltd.
Consolidated
7 Brands, Inc.
10/3/2000
Floor 12
CITY
Suite 882
SIMS 257: Database Management -- Ray Larson
Invoice
Invoice # Cust #
Rep #
93774
3
84747
4
88367
5
88647
9
776879
2
65689
6
10/3/2000
SIMS 257: Database Management -- Ray Larson
1
1
2
1
2
2
Line-Item
Invoice # Part #
93774
84747
88367
88647
776879
65689
93774
88367
10/3/2000
Quantity
3
10
23
1
75
2
4
3
22
5
76
12
23
10
34
2
SIMS 257: Database Management -- Ray Larson
Join Items
Part #
Invoice # Part #
Quantity
93774
3
10
84747
23
1
88367
75
2
88647
4
3
776879
22
5
65689
76
12
93774
23
10
88367
34
2
1
2
3
4
5
6
7
8
9
Cust #
COMPANY
STREET1
Integrated Standards
1 Ltd.
35 Broadway
10/3/2000
Rep #
3
4
5
9
2
6
1
1
2
1
2
2
STREET2
STATE
ZIPCODE
NY
02111
34 Bureaucracy Plaza Floors 1-172
3 Control Elevation
Cyber Assicates
Place
Center
Phildelphia
PA
03756
Cyberoid
NY
08645
35 Libra Plaza
Nashua
NH
09242
1 Broadway
Middletown
IN
32467
88 Oligopoly Place
3 Independence
Parkway
Sagrado
TX
78798
Rivendell
CA
93456
8 Little Mighty Micro
34 Last One Drive
Orinda
CA
94563
9 SportLine Ltd.
38 Champion Place
Compton
CA
95328
3 Cyber Associates
General
4 Consolidated
Consolidated
5 MultiCorp
Internet Behometh
6 Ltd.
Consolidated
7 Brands, Inc.
SIMS 257: Database Management -- Ray Larson
Floor 12
CITY
New York
2 MegaInt Inc.
Invoice # Cust #
93774
84747
88367
88647
776879
65689
Name
Price
Count
Big blue widget
3.76
2
Small blue Widget
7.35
4
Tiny red widget
5.25
7
large red widget
157.23
23
double widget rack
10.44
12
Small green Widget
30.45
58
Big yellow widget
7.96
1
Tiny orange widget
81.75
42
Big purple widget
55.99
9
Suite 882
Relational Algebra
• What is the name of the customer who
ordered Large Red Widgets?
–
–
–
–
–
10/3/2000
Select “large Red Widgets” from Part as temp1
Join temp1 with Line-item on Part # as temp2
Join temp2 with Invoice on Invoice # as temp3
Join temp3 with customer on cust # as temp4
Project Name from temp4
SIMS 257: Database Management -- Ray Larson
Relational Calculus
• Relational Algebra provides a set of explicit
operations (select, project, join, etc) that can
be used to build some desired relation from
the database.
• Relational Calculus provides a notation for
formulating the definition of that desired
relation in terms of the relations in the
database without explicitly stating the
operations to be performed
• SQL is based on the relational calculus.
10/3/2000
SIMS 257: Database Management -- Ray Larson
SQL
• Structured Query Language
• Database Definition and Querying
• Basic language is standardized across
relational DBMSs. Each system may have
proprietary extensions to standard.
• Relational Calculus combines Select,
Project and Join operations in a single
command. SELECT.
10/3/2000
SIMS 257: Database Management -- Ray Larson
SELECT
• Syntax:
– SELECT attr1, attr2,…, attr3 FROM rel1 r1,
rel2 r2,… rel3 r3 WHERE condition1 {AND |
OR} condition2 ORDER BY attr1, attr3
• Examples in Access...
10/3/2000
SIMS 257: Database Management -- Ray Larson