...Looking back

Download Report

Transcript ...Looking back

...Looking back
Why use a DBMS?
How to design a database?
How to query a database?
How does a DBMS work?
Why do we need to use a DBMS?
• DBMS vs File Systems
• DBMS offers a lot of functionality that makes Apps easier
to develop
– data integrity
– query processing
•
•
•
•
algorithms
optimization
indexes
data storage
– transactions
• concurrency
• atomicity
Database Design
• Conceptual - Logical - Physical models
• Conceptual: Entity Relationship diagram
–
–
–
–
great tool for putting design on paper
types of attributes, relationships
designing an E-R diagram for an application
Translating it into Relational Schema
Logical Design
• Relational Databases
– efficient, precise way to convey data to the DBMS
• Normalization
– good database design eliminates redundancy
– uses decomposition
• lossless-join
• dependency preserving
• Boyce-Codd Normal Form
– the “purest” normal form - no redundancy at all
• 3NF
– allows some non-key dependencies
• PRACTICE: balance decomposition with query joins.
Integrity Constraints
• Domain constraints
• Key declarations
• Referential integrity
– Emp.DeptName ----> Dept.DeptName
• Assertions
– say that something can not happen
• Triggers
– what to do if something happens
Functional Dependencies
• Relationships between attributes in a relation
• A -> B means that A determines B
– if t.A = s.A, then t.B = s.B (t,s are tuples)
• Closure of dependencies
– all dependencies derivable from the given set
– use Armstrongs axioms
• Closure of attribute sets
– simple transitive closure algorithm
• Canonical cover
– eliminate redundant attributes from the dependency
• Practice: look for dependencies and their effect on updates
Query Languages
• SQL
– Intergalactic language in databases
– Select .. From .. Where .. Group By .. Having .. Order By..
•
Relational Algebra
– Concise way to express queries
– Used inside a DBMS for translation, optimization
• Practice: know your SQL, it helps.
Data Storage
• Databases mainly rely on Disks and Main Memory
• Buffer manager used for keeping data in memory
• Representation of attributes, tuples
– fixed-length representation
– heap representation (pointers pointing to the data)
• Tables
– Heap (unordered set of tuples)
– Sorted on some attribute
– Indexed
Index Structures
• ISAM
– static multi-level tree structure
– gets bad as updates happen
• B+Trees
– dynamic, balanced tree
– update algorithm
– helps in range, equality selections, joins
• Hash
– use a hash function to determine where a tuple stays
– helps only in equality predicate selections and join queries
Query Processing
• Translating SQL query into disk/memory accesses.
• Key component: Query Optimizer
– selecting the best access plan for the query
• Multiple choices for execution
– selection: heap/index
– join: join algorithms (nested loops, merge join), heap/index
• Optimizer uses cost estimates to pick the cheapest plan
– what is the best plan for select * from R, S where R.a = S.a?
Transactions
•
•
•
•
Atomicity
Consistency
Isolation
Durability
• Extremely important and complex.
Isolation
• How do we schedule concurrent transactions
• Serial is good, but slow
• Serializable is good and fast
• Check for serializability of a schedule
– look for cycles in the precedence graph
Durability/Atomicity
• Shadow database scheme works, but too costly
– make a copy of the database before updates
• Practice: use Commit and Recovery algorithms
– keep a Log of what updates we did to the data
– Push the Log and the Commit flag to the disk
– On crash and recovery, reconstruct the data from the Log
Looking Forward...
• OLAP / Data Warehousing / Decision Support
– fast query processing
– new operators (data cube etc.)
• Data Mining
• Geographical and Multimedia Databases
• World Wide Web
– XML