Databases and Web-programming RW 334 Database Systems

Download Report

Transcript Databases and Web-programming RW 334 Database Systems

Databases and Web-programming
RW 334
Database Systems
Willem Visser
[email protected] or [email protected]
Main Engineering A517
Computer Science
Stellenbosch University
Overview
•
•
•
•
•
Why and How?
Historical View of Database systems
Data Models
Relational Model
Much of the source for these notes comes
from
“Database Systems: Design, Implementation
& Management” – Rob, Coronel and Crockett
Data versus Information
• Data are raw facts
• Information is the result of processing
raw data to reveal its meaning
DATA
INFORMATION
Business Survival
•
•
•
•
•
Data is required to derive knowledge
Information comes from processing data
Information reveal the meaning of data
Decision making requires good information
Good decision making is key to business
survival
Data Management is the practice of
proper generation, storage and retrieval of data
and is the cornerstone on which any
successful business is built
Database Systems
• Database is a shared and integrated computer
structure that stores:
– Raw data
– Metadata
• i.e. data about data, or a description of the structure of
the raw data
• A database management systems (DBMS) is a
collection of programs that manages the
database structure and controls access to the
data stored in the database
Advantages of DBMS
• Better data integration
– If designed correctly!
– One part of the company knows what the other is doing
• Minimizes data inconsistency
– If designed correctly!
– If one part of the company knows what the other is doing
• Improved data access
– If designed correctly!
– Efficient data queries
• Good database systems leads to
– Improved decision making
– Increased productivity
Types of Databases
• Many dimensions
– Such as single user, multiple user, centralized or distributed,
etc.
• From a DB design point we will consider multiple user and centralized
• Two fundamental types
– Operational
• Supports companies day to day business
• We will focus almost exclusively on these
• Also called production databases
– Analytical
• Required to make business decisions
• Sometimes these are called data warehouses (more about that later
in the course) and they store aggregations of data from various
operational DBs typically over time
• Of course sometimes the operational DB is used as if it is an
analytical DB, but that can be very dangerous (more about that later
as well)
Database Design
• Is more important than you can ever imagine
• Bad design leads to bad data
• Bad data leads to bad information
• Bad information leads to bankruptcy
• That is why database designers earn BIG bucks
History of Databases
• Manual File systems to Computer File systems
• File System Shortcomings
– Extensive programming!
– Difficult to make any changes or ad hoc queries
– No structural independence
• Change the structure and lots of dependents need to change
– No data independence
• Change the data and lots of dependents need to change
• Biggest Issue
– must tell system not just what to do, but also how to do it
• Fosters data redundancy
– Same data in many places
• Causes data inconsistency
– Changing some versions of the data but not others
Database Systems
• Hardware
• Software
– OS, DBMS, Application programs
• People
– System administrators
• Oversee general operations
– Database administrators (DBA)
• Manages DBMS
– DB Designers
• Design the structure of the DB
– Programmers
• Build the applications that use the DB
– End Users
• Procedures
– How to use the DB in the organization
• Data
Database Management Systems
• Level of abstraction to hide the complexities of
the data storage and retrieval
– Ensures data and structural independence
– Hides difference between logical (what user sees)
and physical (what computer sees) data format
• Data storage management
– Data is stored in very complex ways under the hood
and is hidden from the user, until you need to
optimize the performance!
• Security management
– Allow users to have different access
Database Management Systems
• Concurrency control
– Important in multi-user DBs to keep consistency
• Backup and Recovery
– Disks fail!!!
• Data Integrity
– Minimizes redundancy and maximizes consistency
• Easy Access
– Efficient Query languages and programming language
access support
– Good connectivity to other environments
DBMS The Dark Side
• Increased Costs
– Some of these DBMS systems are VERY expensive and run
on expensive hardware
• Management Complexities
– DBAs are very well paid, for good reason!
• Maintenance
– Must keep the system up to date and patches are frequent
• Vendor dependence
– A killer!
– Can become very expensive
• How many boats do Larry Ellison own again?
Data Models
• How to represent real world entities in computer
accessible form
– Abstraction of complex object or event
• Database Environment
–
–
–
–
Data structures and their characteristics
Relationships
Constraints
Transformations
• Models help one see the big picture
– Always true not just in database design
Data Models Consists of
• Entities
– About which data is to be collected
– People, places, objects, events,…
• Characteristics
– Of the entity
• Relationships
–
–
–
–
Between entities
One to one, 1-1
One to Many, 1-*
Many to Many, *-*
• Constraints
– On the entities, characteristics and relationships
Business Rules
• Is a brief, precise and unambiguous description of a
policy, procedure or principle within an organization
– “Students may use many books” and “A book is a used by
only one student”
• Two entities: STUDENT and BOOK
• 1-* Relationship (1 = student, * = book)
– “A class must have at least 5 students and no more than
30”
• Constraint , two entities and a relationship
• Business Rules can be most easily discovered through
targeted interviews
– Iteratively! Not everyone will agree on the rules…at first
Rules into Data
• Noun becomes Entity
• Verb becomes a Relationship
• “A customer may generate many invoices”
– Nouns: Entities CUSTOMER and INVOICE
– Verb: Relationship “GENERATE”
• Relationships are bidirectional
– How many instances of B are related to one instance of A?
– How many instances of A are related to one instance of B?
• How many classes can one student enroll in?
– Many
• How many students can be enrolled in one class?
– Many
• Relationship between Student and Class is:
– Many to Many, *-*
History of Data Models
• File System
– 1960s
• Hierarchical and Network Models
– 1970s
• Relational
– 1970-present
• Object-Oriented Extended Relational
– Mid 1980s-present
• Unstructured (XML format)
– Extended Relational
– 2000s - future
Hierarchical Model
Parent
Children
• Parent to Children is a One to Many (1-*) Relationship
• Children refine the Parent Entity
• AGENT (parent) represents various ENTERTAINERS (children)
• Benefits:
• Fast retrieval and referential integrity (every child must be linked to parent)
• Problems:
• Rigid structure that is hard to change, i.e. no structural independence
• Not easy to present *-* relationships, can lead to redundant and inconsistent data
Network Model
• A child can have more than one parent
• Major drawback of both Hierarchical and Network model
– No structurally independent way of extracting information!
– And of course any change in the systems was a major headache
• These models were abandoned in favor of the relational model in
the 1980s
Relational Model
• Edgar F. Codd (IBM) published the following paper in
1970
– “A Relational Model of Data for Large Shared Databanks”
– Considered Ingenious but impractical…too much machine
overhead…computers became faster and the problem
went away
Relational Model
• Collection of Tables
–
–
–
–
–
Also called a Relation
Row/Column matrix
Each column is called a field
Each row entry is called a record (or tuple)
Tables can have common fields that relate their entries
• STUDENT’s Number and LIBRARYBOOK’s Number can show which student
has a library book checked out
• Relational Database Management System (RDBMS) hides the real
complexity of the model from the user
– User sees the logical view and the physical view is completely hidden
– This is what made it so popular…and the query language, SQL, of
course
– SQL is called declarative, since it says what to do, but not how to do it
Entity Relationship (ER) Model
• Designing a database can be complex and a notation for doing
it graphically was required
• Peter Chen introduced the ER data model in 1976
– Shown in an Entity Relationship diagram
• Unified Modeling Language (UML) can be used
– Specifically the class diagram part without method descriptions
• Entities
• Relationships
– 1 (one)
– 0..* (many)
Object Oriented Data Model
• You all know this one!
• Some databases that use this as its data
model
• Mostly this model makes sense when
combined with the Relational Model
– Also called the Extended Relational Model
Summary
• Data Models are abstractions
– Get them from business rules
– Consists of entities, attributes, relationships and
constraints
• Relational Model is most popular data model
in the database world
– Tables are related through common attributes
– ER models are popular for DB design
• Next, Relational Model in detail