Transcript Lecture 2

INFO 340
Lecture 2
Intro to Databases
• Book
– Need it
– Order individually through UW Bookstore
– Or Amazon
Previous Class
• How many cards did your system
require total?
• How difficult was it to answer the
standard queries?
– The ad-hoc queries?
What if -- iTunes Store via File Based Approach
•
•
Imagine all details of a song encapsulated in a single file (or nest of files)
Problems
– Difficulty in accessing data
• We need that particular application to access the data
– Duplication of data
• do we want a separate copy of the album cover for every single song ?
– Data dependence
• What if we want to get at the same data, but with a different application?
– File format incompatibilities
• What if the applications use different file formats? Ie one uses text, another
XML, another binary, etc
– Fixed queries
• Largely dependent on app developer
– Results in fixed (non- ‘ad-hoc’) queries and/or
– Multiple applications
Limitations of Files
•
•
•
•
•
•
Separation and isolation of data
Duplication of data
Data Dependence
Incompatible file formats
Fixed queries / proliferation of apps
Two main problems emerge:
– Data definition is embedded in application
– No control over access and data manipulation.
Introducing Databases
• Solves many problems introduced in a
file system.
• Multi-user, simultaneous connections
• Security
• Separation of application and data
• Common query language
• One copy of data
Database Def (pg 15)
A shared collection of logically related
data, and a description of this data,
designed to meet the information needs
of an organization
Types of databases
• Relational (we’ll focus on this type in
this class)
• Network and Hierarchical model
• Object Oriented
Database are “Self Describing”
• System catalog
• Data dictionary
• Meta-data
It is this local or in-place description of the
data that allows “program-data
independence”
(BTW good quiz item)
Data abstraction
• Similar to programming classes
– You can change the internal coding to the
object (instantiated class) as long as the
external behavior (behavior presented to
the outside world) doesn’t change
• Similarly, in databases, we can
separate the data from the application
Logically related
• Entities -- distinct object
• Attributes -- describes some aspect of
the object
• Relationships -- associations between
the entities
DBMS
Database Management System
• Provides facility to define the database via a Data
Definition Language -- DDL
• Provides facility to store, retrieve, & manipulate data
through Data Manipulation Language
– SQL (Structured Query Language) is the standard
– Note SQL & MySQL are 2 different things
DBMS
• Controlled access to database:
– Authorization & possibly authentication
– Control of integrity/consistency
– Handle multiple simultaneous users (concurrency)
– Handle recovery -- return to a known, consistent
state
– User-accessible catalog
Database Application Program
• Program that interacts with a database.
• Database can expose Views to the application.
Views help to
– Present the data to the user in a useful format
– Choose who sees what
• ie not all should see payroll
– Customize data presentation
• Same db can have many different looks/appearances
• Generally better than having multiple databases
– Allows db to hide any evolutionary structural changes
to the database ‘behind the scenes’
Three (or four) levels of the
DBMS
External Level
Conceptual
Level
• User’s view of the DB
• Can change per user
• What data is stored
• Complete view of the database and relationships
Internal Level
• How data is stored
• Represents how data is physically stored.
Physical Data
Organization
• Operating system / DBMS interaction.
Database Design
• Data structure first & application second
– Data structure will be the basis
– Data structure is also harder to change on
the fly
– At the design phase, want to think it
through as much as possible
– Data design decisions can make
application design easier (or harder)
Database Design
• Data Administrator -- “Data Monk”
• Database Administrator
• Logical Database Designer
– Business rules
• Ie only HR or finance person can see payroll
• Physical Database Designer
• Application developers
• User types
– Naïve
– Sophisticated
DBMS Advantages
• Control (though not elimination) of redundancy
• Consistency - reduce points of
update/modification
– Caused by ‘same’ data existing in multiple places
•
•
•
•
•
•
Data sharing
Integrity - use of constraints
Security - users, roles, privileges
Standardization forced
Concurrency
Backup & recovery
DBMS Disadvantages
• You pay for robustness
• Complexity - Lots of forethought
• Cost (upfront, but generally not over
time)
• Higher impact of failure - centralization
• Conversion cost
Schema & Instance
Schema -- Description of the database
Instance -- A functioning database at any
point in time
-- sometimes refers to a particular
running installation
DDL & DML
• DDL
– Language to specify the db
• System uses sets of tables itself to support user tables/schema
– System catalog
– Data dictionary
• DML
– Language to update the db & retrieve data
– Procedural
• How
– Non-Procedural
• What
• SQL
DBMS Functions
• Data storage, retrieval, update
• User-accessible catalog
– Data about data -- metadata
– System catalog
• Transaction support -- ie ‘rollback’
– Failure of db, network, etc before operation is complete
• Concurrency support
• Authorization services
• Data comm support
• Integrity services
• Utility services - imports, exports, analysis,
monitoring/alerts
Some Behind the Scenes Processes of DBMS
•
•
•
•
•
•
•
•
•
Query Processor
File Manager
DML preprocessor
DDL compiler - creates metadata tables
Authorization control
Integrity checker
Query optimizer
Transaction processor
Recovery manager
Entity Relationship Modeling/Entity Relationship
Diagram (ERD) Introduction
Example - Used car business with multiple lots,
multiple cars, multiple sales people
color
make
address
name
Engine size
Phone number
car
M
year
M
has
has
1
1
Sales
lots
location
Lighted?
size
salesperson
In-class work
• Groups of 5
• Create ERD for following 3 scenarios
– iTunes (w/customers, songs, artists)
– Live inventory (w/computers, users, & ‘onnet’ snapshots)
– Hacker attack incident database
• Attacker, victim, locations, computers
(addresses, OS’s, other pertinent info,etc)
• Present your results for each
Assignment
• Read Chapter 3-4 of Connolly & Begg