DataBase Systems - Baldwin Wallace University

Download Report

Transcript DataBase Systems - Baldwin Wallace University

DATABASE SYSTEMS
Lecture 21
Dr. Adam P. Anthony
Overview








Database Fundamentals
System components
Schemas
Database Models
The Relational Model
Survey of Database Products
Data Mining
Social Issues
What is a Database?

Non-Technical: A computer program that stores
data
 Examples

of Databases using this definition?
Technical: A computer program that stores structured
data in an organized fashion, and provides the
ability to grant limited, or restricted access to
different users based on their data requirements.
 Examples
of Databases using this definition?
 Anything ruled out that we came up with above?
Database Systems

Initially: each department
maintained their own
“database”




Sometimes just a text or
Excel file
Problems with duplication?
Problems with sharing
information?
Now: One database holds
everything, and each
department is given access
only to the data they need
Schemas

Schema  full description of all the database,
including what information is provided and how it
interacts
Used to maintain and program the database
 Provides a nice overview of what data a business collects
and (perhaps) how they use that data


Subschema  Partial description of only a few relevant
items
Anything in a Subschema will also appear in the database’s
Schema
 Based on providing only what the user needs to do their job
 Provides basic security: payroll record example

The Database Management System

Software applications can be enhanced by working with a
database


Especially on the web! (Google, Amazon, etc.)
Database Management System (DBMS): has all the
necessary code for organizing, manipulating data

When application needs data, it just asks the DBMS to provide it
More on Database Management
Systems


Database Management System (DBMS): A software
layer that manipulates a database in response to requests
from applications
Schemas and subschemas are created, managed by the
DBMS


Distributed Database: A database stored on multiple
machines



Can have a different subschema for each application that
accesses the DBMS
Data is shared between each machine over a network
DBMS will mask this organizational detail from its users
Data independence: The ability to change the
organization of a database without changing the
application software that uses it
Database Models



Bits and bytes are hard to talk about!
Want to reflect the organization of our data without
thinking about the complex programming techniques
needed to keep our data organized
A Database Model provides a conceptual view of the
database that gives us a ‘general idea’ of how the
data is organized
Flat File Model (not really a DB)
 Hierarchical Model
 Object-Oriented Model
 Relational Model

The Relational Database Model


Pretty Simple, on the surface
A Relation is just a rectangular table of data
 An
Attribute is any column in that table
 A Tuple is any row in the table

There is complexity not in the table, but in how two
different tables can interact
 If
two relations have a common Attribute, then we can
combine tuples from the two tables to get a more
complex piece of data!
A relation containing employee
information
9-10
Designing a Good Relational
Database

When we set up the DBMS, we could just put all our
data in one relation, like a big Excel file!
 But
that’s no fun!
 And, it can cause problems



Redundancy: When the same data is repeated
unnecessarily
Deletion Anomalies: When deleting a tuple results in
the loss of important information
We can prevent some of these problems using a multirelation design!
A relation containing redundancy
9-12
An employee database consisting
of three relations
9-13
The Employee Database Schema
Employee
Empl ID
Name
Address
SSN
Job
Job ID
Job Title
Skill Code
Dept
Assignment
Empl ID
Job ID
Start Date
Term Date
Combining Data From Multiple Tables

In which departments has G. Jerry Smith worked?
Designing a Relational Database

Given a data problem:
 What
are the entities/objects in this problem?
 How do those entities interact?


Make one relational table for each entity
Make one relational table for each interaction
Identifying/Avoiding Redundancy

Sometimes it’s obvious in the design



Two columns with same/similar names
Sometimes you need to focus on how the data will be
used
Common perpetrators:
Addresses—more than one person can live at same
address
 Organizations—more than one person belongs to a single
organization
 Owners—a person/company can own more than one asset
 Creators—a creator may have many creations

In-Class Example (from book)

Chapter Review, question 18 (similar to assigned
problem 20)
 Design
a relational database containing information
about music performers, their recordings, and the
composers of the music they recorded (Avoid
redundancies similar to those in Figure 9.4)
SQL (Structured Query Language)


Developed at IBM as a part of their Relational DBMS
project
Basic language humans can use to express, with
precision, the data that they want from each (separate)
relational table, and how it should be combined:
select EmplId, Dept
from ASSIGNMENT, JOB
where ASSIGNMENT.JobId = JOB.JobId
and ASSIGNMENT.TermData = “*”
Survey of Database Products




MySQL
Microsoft Access
Microsoft SQLServer
Oracle DB
MySQL



Free, Open-Source database product
Little support from creators, minimal software for managing the system
Must know a great deal about database design fundamentals in order to
use successfully on a large scale


Can be done, but requires expert knowledge
Good Choice for:


temporary data solutions
Zero-budget scenarios (but only if you have expert users to run it!)


Internet applications


Hiring people to run the system costs money!
Facebook??
Poor Choice for:


Non-Expert users
MASSIVE data problems (but Facebook??)
Microsoft Access


Cheap, business-oriented database product
Emphasis on user-friendliness, ease of development



Limits on amount of data that can be stored in a single database


Data may be as complex as on any other DBMS, just limited in size
Good choice for:




Packaged with ‘data templates’ that might automatically fit your data needs, no
design required!
very nice application-development features that are popular in the business
world
Small business settings
Non-Expert Users
Temporary data solutions
Poor choice for:


Large, enterprise-level data solutions
Sharing data with a large number of people
Microsoft SQLServer and Oracle DB



Large, expensive enterprise-level DBMS’s
Distributed DBMS support
Provides both expert-level and non-technical tools
for interaction with DBMS
 SQLServer

Good For:
 Everything
 Large

more-so than Oracle
(if you can afford them!)
jobs
Bad for:
 Low-budget
scenarios (except for SQLServer Express!)
Data Mining

Data Mining = Buzz word that covers any action that seeks to detect
a previously unknown pattern from a large data set


Class Description:




Encompasses many areas of Computer Science, Mathematics, Statistics
Divide data set into pre-determined groups (e.g., Mexican food lovers
and Italian food lovers)
Characterize each group by identifying common patterns
“What type of person enjoys Mexican food?”
Class Discrimination:



Start with an initial data set, generate a class description for each
group
Given a new (incomplete) item, predict which group it will fall into:
“Will this new customer prefer Mexican or Italian food?”
Data Mining, Continued

Cluster Analysis:
Similar to Class Description but now the groups aren’t
predetermined!
 Must find the most distinct groups AND describe them at the
same time!
 Much more difficult


Association Analysis
Find patters between groups!
 Big in market research
 “Are Mexican food lovers also lovers of imported beers?”

Social Impact of Databases

Problems

Massive amounts of personal data are being collected

Often without knowledge or meaningful consent of affected people
Data merging produces new, more invasive information
 Errors are widely disseminated and hard to correct


Remedies
Existing legal remedies often difficult to apply
 Negative publicity may be more effective
