CS405G: Introduction to Database Systems

Download Report

Transcript CS405G: Introduction to Database Systems

CS405G: Introduction to
Database Systems
Final Review
Database Design
7/18/2015
Jinze Liu @ University of Kentucky
2
E-R model
• E-R model
• Entities
• Attributes
• Relationships
7/18/2015
Jinze Liu @ University of Kentucky
4
Database Design
7/18/2015
5
From E-R Diagram to Relations
•
•
•
•
Relations
Schemas
Converting E-R diagram to relations
Keys
• Super keys
• Candidate keys
• Primary keys
• Relational integrity constraints
Key Constraints
• Superkey:
– (Uniqueness constraints) A set of attributes where
no two distinct tuples can have the same values
– Every relation has at least one superkey:
• The set of all attributes.
•
Key: A minimal superkey
–
–
Uniqueness constraint (superkey)
Minimum Constraint
•
7/18/2015
No attribute can be removed and still satisfy the
uniqueness constraints.
Jinze Liu @ University of Kentucky
7
Relational Integrity Constraints
•
Constraints are conditions that must hold on
all valid relation instances. There are four
main types of constraints:
1. Domain constraints
1. The value of a attribute must come from its domain
2. Key constraints
3. Entity integrity constraints
4. Referential integrity constraints
7/18/2015
8
Database Normalization
• Functional Dependency
• Functional Closure
• Keys
– Redefined
– Based on functional dependency
• DB Norm Form
– 1st, 2nd, 3rd, BCNF
Database Query
7/18/2015
Luke Huan Univ. of Kansas
10
Relational Algebra and SQL
• Relational algebra
• SQL query
 SFW
 Group by …, Having
 Subqueries
 Relationship between R.A. and SQL
Relational algebra
A language for querying relational databases based on operators:
RelOp
RelOp
• Core set of operators:
– Selection, projection, cross product, union, difference, and renaming
• Additional, derived operators:
– Join, natural join, intersection, etc.
• Compose operators to make complex queries
7/18/2015
Jinze Liu @ University of Kentucky
12
•
•
•
•
•
•
Summary of core operators
σp R
πL R
RXS
RS
R-S
ρ S(A1, A2, …) R
Selection:
Projection:
Cross product:
Union:
Difference:
Renaming:
– Does not really add
“processing” power
7/18/2015
Jinze Liu @ University of Kentucky
13
Summary of derived operators
• Join:
• Natural join:
• Intersection:
7/18/2015
R pS
RS
R S
Jinze Liu @ University of Kentucky
14
Classification of relational operators
•
•
•
•
•
•
•
•
Selection: σp R
Projection: πL R
Cross product: R X S
Join: R p S
Natural join: R  S
Union: R U S
Difference: R - S
Intersection: R ∩ S
7/18/2015
Monotone
Monotone
Monotone
Monotone
Monotone
Monotone
Monotone w.r.t. R; non-monotone w.r.t S
Monotone
Jinze Liu @ University of Kentucky
15
Update Operations on Relations
• Update operations
– INSERT a tuple.
– DELETE a tuple.
– MODIFY a tuple.
• Constraints should not be violated in
updates
7/18/2015
Jinze Liu @ University of Kentucky
16
Basic queries: SFW statement
• SELECT A1, A2, …, An
FROM R1, R2, …, Rm
WHERE condition;
• Also called an SPJ (select-project-join) query
• (almost) Equivalent to relational algebra query
π A1, A2, …, An (σ condition (R1 X R2 X … X Rm))
7/18/2015
Luke Huan Univ. of Kansas
17
Semantics of SFW
• SELECT E1, E2, …, En
FROM R1, R2, …, Rm
WHERE condition;
• For each t1 in R1:
For each t2 in R2: … …
For each tm in Rm:
If condition is true over t1, t2, …, tm:
Compute and output E1, E2, …, En as a row
• t1, t2, …, tm are often called tuple variables
• Not 100% correct, we will see
7/18/2015
Luke Huan Univ. of Kansas
18
Operational semantics of GROUP BY
SELECT … FROM … WHERE … GROUP BY
…;
• Compute FROM
• Compute WHERE
• Compute GROUP BY: group rows according to
the values of GROUP BY columns
• Compute SELECT for each group
– For aggregation functions with DISTINCT inputs,
first eliminate duplicates within the group
Number of groups = number of rows in the final
output
7/18/2015
Jinze Liu @ University of Kentucky
19
Database Design
7/18/2015
Jinze Liu @ University of Kentucky
20
physical data organization
• Storage hierarchy (DC vs. Pluto) ! count I/O’s
• Disk geometry: three components of access cost; random
vs. sequential I/O
• Data layout
– Record layout (handling variable-length fields, NULL’s)
– Block layout (NSM, PAX) ! inter-/intra-record locality
• Access paths
– Primary versus secondary indexes
– Tree-based indexes: ISAM, B+-tree
! Again, reintroduce redundancy to improve performance
! Fundamental trade-off: query versus update cost
21
Performance Issues on Indexes
• Indexes
– ISAM
– B+ Tree
• Metrics
– Storage
– IO-costs
• Operations
– Single value query & range query
– Insertion and deletion
22
Query Processing Implementation
• Typical Query Processings
– Selection
– Join
– Set operations.
• Typical approaches
– Sequential scans in unsorted database
– Sorted database
– What are the tradeoffs.
23